cbloom rants: 01-14-09 - Allocator Alignment

1/15/2009

01-14-09 - Allocator Alignment

I like it when allocations of size N are aligned to the next lowest power of 2 below N.

So eg. an allocation of 4000 bytes is aligned to 2048. It means you can do things like just malloc a 128-bit vector and it's aligned and you never have to worry about it. You never have to manually ask for alignment as long as the alignment you want is <= the size of your object (which it almost always is).

eg. if you want to malloc some MAT4x4 objects, you just do it and you know that they are aligned to sizeof(MAT4x4).

Is there any disadvantage to this? (eg. does it waste a lot of memory compared to more conservative alignment schemes?)

Also, I used to always do my malloc debug tracking "intrusively" , that is by sticking an info header at the front of the block and allocating and bigger piece for each alloc, then linking them together. The advantage of this is that it's very fast - when you free you just go to (ptr - sizeof(info_header)).

I think I am now convinced that that is the wrong way to do it. It's better to have a separate tracker which hashes from the pointer address to an info struct. The big advantage of the "non-intrusive" way like this is that it doesn't change the behavior of the allocator at all. So things like alignment aren't affected, and neither is cache usage or optimization issues (for example if you're using a GC-type arena allocator and adjacency of items is important to performance).

In general now I'm more eager for debugging and instrumentation schemes like this which have *zero* affect on the behavior of the core functionality, but basically just watch it from the outside.

(For example on consoles where you have 256M of memory in the real consoles and an extra 256M in the dev kits, it's ideal to make two separate allocators, one in the lower 256 where all your real data goes and one for the upper 256 where all your non-intrusive debug extra data goes; in practice this is a pain in the butt, but it is the ideal way to do things, so that you have the real final version of the game running all the time in the lower 256).

4 comments:

Anonymous said...: I guess you'd have to run simulations, but I would expect it to waste a lot of memory, yes. I'm not sure how it would shake out after a lot of freeing, but if you just sat there allocating 64K+1 byte over and over, you'd end up with 50% wastage.

Allocating to an alignment of the largest power of 2 that divides the requested size probably gets you what you want with less wastage. I didn't do that in our malloc because variable amounts of alignment would require extra overhead per malloc, which would suck at the low end.; January 15, 2009 at 4:05 PM
cbloom said...: BTW super-secret MS alignment disclosure :

http://msdn.microsoft.com/en-us/library/ycsb6wwf(vs.80).aspx; January 15, 2009 at 4:56 PM
cbloom said...: "Allocating to an alignment of the largest power of 2 that divides the requested size probably gets you what you want with less wastage"

Oh yeah, that's obviously better. Uh.. that's what I meant to say.; January 15, 2009 at 9:17 PM
Arseny Kapoulkine said...: 16b malloc alignment is only for x64, if your application is 32 bit then it's 8b, so you have to override it to use i.e. SSE w/aligned loads.

As for alignment, it depends on the allocation needs of application - i.e. for allocating 4, 1024, 4, 1024, etc. bytes sequentially you'll get large overhead.

Why do you need it anyway? IMO explicit alignment is the only way to go for realistic usage. Perhaps along with larger default alignment (i.e. 16 instead of 8).; January 15, 2009 at 10:20 PM

cbloom rants

1/15/2009

01-14-09 - Allocator Alignment

4 comments:

old rants