6/14/2011

06-14-11 - A simple allocator

You want to be able to allocate slots, free slots, and iterate on the allocated slot indexes. In particular :


int AllocateSlot( allocator );
void FreeSlot( allocator , int slot );
int GetNextSlot( iterator );

Say you can limit the maximum number of allocations to 32 or 64, then obviously you should use bit operations. But you also want to avoid variable shifts. What do you do ?

Something like this :


static int BottomBitIndex( register U32 val )
{
    ASSERT( val != 0 );
    #ifdef _MSC_VER
    unsigned long b = 0;
    _BitScanForward( &b, val );
    return (int)b;
    #elif defined(__GNUC__)
    return __builtin_ctz(val); // ctz , not clz
    #else
    #error need bottom bit index
    #endif
}

int __forceinline AllocateSlot( U32 & set )
{
    U32 inverse = ~set;
    ASSERT( inverse != 0 ); // no slots free!
    int index = BottomBitIndex(inverse);
    U32 mask = inverse & (-inverse);
    ASSERT( mask == (1UL<<index) );
    set |= mask;
    return index;
}

void __forceinline FreeSlot( U32 & set, int slot )
{
    ASSERT( set & (1UL<<slot) );
    set ^= (1UL<<slot);
}

int __forceinline GetNextSlot( U32 & set )
{
    ASSERT( set != 0 );
    int slot = BottomBitIndex(set);
    // turn off bottom bit :
    set = set & (set-1);
    return slot;
}

/*

// example iteration :

    U32 cur = set;
    while(cur)
    {
        int i = GetNextSlot(cur);
        lprintfvar(i);
    }

*/

However, this uses the bottom bit index, which is not as portably fast as using the top bit index (aka count leading zeros). (there are some platforms/gcc versions where builtin_ctz does not exist at all, and others where it exists but is not fast because there's no direct instruction set correspondence).

So, the straightforward version that uses shifts and clz is probably better in practice.

ADDENDUM : Duh, version of same using only TopBitIndex and no variable shifts :


U32 __forceinline AllocateSlotMask( U32 & set )
{
    ASSERT( (set+1) != 0 ); // no slots free!
    U32 mask = (~set) & (set+1); // find lowest off bit
    set |= mask;
    return mask;
}

void __forceinline FreeSlotMask( U32 & set, U32 mask )
{
    ASSERT( set & mask );
    set ^= mask;
}

U32 __forceinline GetNextSlotMask( U32 & set )
{
    ASSERT( set != 0 ); // iteration over when set == 0
    U32 mask = set & (-set); // lowest on bit
    set ^= mask;
    return mask;
}

int __forceinline MaskToSlot( U32 mask )
{
    int slot = TopBitIndex(mask);
    ASSERT( mask == (1UL<<slot) );
    return slot;
}

(note the forceinline is important because the use of actual references is catastrophic on many platforms (due to LHS), we need these to get compiled like macros).

3 comments:

Arseny Kapoulkine said...

Bottom and top bit indices are related: value & -value gives you the number where the only set bit is the bottom (least significant) one, then you can use clz.

cbloom said...

Yeah, of course. Amended.

Yann Collet said...

Up to now, i've been using a LIFO list or table in order to track slot allocation.
Obviously, since the table costs one pointer per slot, each slot has to be "reasonably sized" for this strategy to be effective.

old rants