cbloom rants: 03-15-14 - Bit IO of Elias Gamma

Making a record of something not new :

Elias Gamma codes are made by writing the position of the top bit using unary, and then the lower bits raw. eg to send 30, 30 = 11110 , the top bit is in position 4 so you send that as "00001" and then the bottom 4 bits "1110". The first few values (starting at 1) are :


1 =   1 : 1
2 =  10 : 01 + 0
3 =  11 : 01 + 1
4 = 100 : 001 + 00
5 = 101 : 001 + 01
...

The naive way to send this code is :


void PutEliasGamma( BITOUT, unsigned val )
{
    ASSERT( val >= 1 );

    int topbit = bsr(val);

    ASSERT( val >= (1<<topbit) && val < (2<<topbit) );

    PutUnary( BITOUT, topbit );

    val -= (1<<topbit); // or &= (1<<topbit)-1;

    PutBits( BITOUT, val, topbit );
}

But it can be done more succinctly.

We should notice two things. First of all PutUnary is very simple :


void PutUnary( BITOUT, unsigned val )
{
    PutBits( BITOUT , 1, val+1 );
}

That is, it's just putting the value 1 in a variable number of bits. This gives you 'val' leading zero bits and then a 1 bit, which is the unary encoding.

The next is that the 1 from the unary is just the same as the 1 we remove from the top position of 'val'. That is, we can think of the bits thusly :


5 = 101 : 001 + 01

unary of two + remaining 01
or

5 = 101 : 00 + 101

two zeros + the value 5

The result is a much simplified elias gamma coder :


void PutEliasGamma( BITOUT, unsigned val )
{
    ASSERT( val >= 1 );

    int topbit = bsr(val);

    ASSERT( val >= (1<<topbit) && val < (2<<topbit) );

    PutBits( BITOUT, val, 2*topbit+1 );
}

note that if your bit IO is backwards then this all works slightly differently (I'm assuming you can combine two PutBits into one with the first PutBits in the top of the second; that is

PutBits(a,na)+PutBits(b,nb) == PutBits((a<<nb)|b,na+nb)

Perhaps more importantly, we can do a similar transformation on the reading side.

The naive reader is :


int GetEliasGamma( BITIN )
{
    int bits = GetUnary( BITIN );

    int ret = (1<<bits) + GetBits( BITIN, bits );

    return ret;
}

(assuming your GetBits can handle getting zero bits, and returns a value >= 1). The naive unary reader is :


int GetUnary( BITIN )
{
    int ret = 0;
    while( GetOneBit( BITIN ) == 0 )
    {
        ret++;
    }
    return ret;
}

but if your bits are top-justified in your bit input word (as in ans_fast for example, or see the end of this post for a reference implementation), then you can use count_leading_zeros to read unary :


int GetUnary( BITIN )
{
    int clz = count_leading_zeros( BITIN );

    ASSERT( clz < NumBitsAvailable(BITIN) );

    int one = GetBits( BITIN, clz+1 );
    ASSERT( one == 1 );

    return clz;
}

here the GetBits is just consuming the zeros and the one on bit of the unary code. Just like in the Put case, the key thing is that the trailing 1 bit of the unary is the same as the top bit value ( "(1<<bits)" ) that we added in the naive reader. That is :


int GetEliasGamma( BITIN )
{
    int bits = count_leading_zeros( BITIN );

    ASSERT( bits < NumBitsAvailable(BITIN) );

    int one = GetBits( BITIN, bits+1 );
    ASSERT( one == 1 );

    int ret = (1<<bits) + GetBits( BITIN, bits );

    return ret;
}

can be simplified to combine the GetBits :

int GetEliasGamma( BITIN )
{
    int bits = count_leading_zeros( BITIN );

    ASSERT( bits < NumBitsAvailable(BITIN) );

    int ret = GetBits( BITIN, 2*bits + 1 );

    ASSERT( ret >= (1<<bits) && ret < (2<<bits) );

    return ret;
}

again assuming that your GetBits combines like big-endian style.

You can do the same for "Exp Golomb" of course, which is just Elias Gamma + some raw bits. (Exp Golomb is the special case of Golomb codes with a power of two divisor).

Summary :

//===============================================================================

// Elias Gamma works on vals >= 1
// these assume that the value fits in your bit word
// and your bit reader is big-endian and top-justified

#define BITOUT_PUT_ELIASGAMMA(bout_bits,bout_numbits,val) do { \
    ASSERT( val >= 1 ); \
    uint32 topbit = bsr64(val); \
    BITOUT_PUT(bout_bits,bout_numbits, val, 2*topbit + 1 ); \
    } while(0)

#define BITIN_GET_ELIASGAMMA(bitin_bits,bitin_numbits,val) do { \
    uint32 nlz = clz64(bitin_bits); \
    uint32 nbits = 2*nlz+1; \
    BITIN_GET(bitin_bits,bitin_numbits,nbits,val); \
    } while(0)

//===============================================================================
// MSVC implementations of bsr and clz :

static inline uint32 bsr64( uint64 val )
{
    ASSERT( val != 0 );
    unsigned long b = 0;
    _BitScanReverse64( &b, val );
    return b;
}

static inline uint32 clz64(uint64 val)
{
    return 63 - bsr64(val);
}

//===============================================================================
// and for completeness, reference bitio that works with those functions :
//  (big endian ; bit input word top-justified)

#define BITOUT_VARS(bout_bits,bout_numbits,bout_ptr) \
    uint64 bout_bits; \
    int64 bout_numbits; \
    uint8 * bout_ptr;

#define BITOUT_START(bout_bits,bout_numbits,bout_ptr,buf) do { \
        bout_bits = 0; \
        bout_numbits = 0; \
        bout_ptr = (uint8 *)buf; \
    } while(0)

#define BITOUT_PUT(bout_bits,bout_numbits,val,nb) do { \
        ASSERT( (bout_numbits+nb) <= 64 ); \
        ASSERT( (val) < (1ULL<<(nb)) ); \
        bout_numbits += nb; \
        bout_bits |= ((uint64)(val)) << (64 - bout_numbits); \
    } while(0)
    
#define BITOUT_FLUSH(bout_bits,bout_numbits,bout_ptr) do { \
        *((uint64 *)bout_ptr) = _byteswap_uint64( bout_bits ); \
        bout_bits <<= (bout_numbits&~7); \
        bout_ptr += (bout_numbits>>3); \
        bout_numbits &= 7; \
    } while(0)
    
#define BITOUT_END(bout_bits,bout_numbits,bout_ptr) do { \
        BITOUT_FLUSH(bout_bits,bout_numbits,bout_ptr); \
        if ( bout_numbits > 0 ) bout_ptr++; \
    } while(0)

//===============================================================

#define BITIN_VARS(bitin_bits,bitin_numbits,bitin_ptr) \
    uint64 bitin_bits; \
    int64 bitin_numbits; \
    uint8 * bitin_ptr;

#define BITIN_START(bitin_bits,bitin_numbits,bitin_ptr,begin_ptr) do { \
        bitin_ptr = (uint8 *)begin_ptr; \
        bitin_bits = _byteswap_uint64( *( (uint64 *)bitin_ptr ) ); \
        bitin_ptr += 8; \
        bitin_numbits = 64; \
    } while(0)

#define BITIN_REFILL(bitin_bits,bitin_numbits,bitin_ptr) do { if ( bitin_numbits <= 56 ) { \
        ASSERT( bitin_numbits > 0 && bitin_numbits <= 64 ); \
        uint64 next8 = _byteswap_uint64( *( (uint64 *)bitin_ptr ) ); \
        int64 bytesToGet = (64 - bitin_numbits)>>3; \
        bitin_ptr += bytesToGet; \
        bitin_bits |= next8 >> bitin_numbits; \
        bitin_numbits += bytesToGet<<3; \
        ASSERT( bitin_numbits >= 56 && bitin_numbits <= 64 ); \
    } } while(0)

#define BITIN_GET(bitin_bits,bitin_numbits,nb,ret) do { \
        ASSERT( nb <= bitin_numbits ); \
        ret = (bitin_bits >> 1) >> (63 - nb); \
        bitin_bits <<= nb; \
        bitin_numbits -= nb; \
    } while(0)

//=========================================================

and yeah yeah I know this bitio could be faster. It's a reference implementation that's trying to avoid obfuscations. GTFO.

Added exp-golomb. The naive put is :


PutEliasGamma( val >> r );
PutBits( val & ((1<<r)-1) , r );

but you do various reductions and get to :

//===============================================================================

// this Exp Golomb is for val >= 0
// Exp Golomb is Elias Gamma + 'r' raw bits

#define BITOUT_PUT_EXPGOLOMB(bout_bits,bout_numbits,r,val) do { \
    ASSERT( val >= 0 ); \
    uint64 up = (val) + (1ULL<<(r)); \
    uint32 topbit = bsr64(up); \
    ASSERT( topbit >= (uint32)(r) ); \
    BITOUT_PUT(bout_bits,bout_numbits, up, 2*topbit + 1 - r ); \
    } while(0)
    
#define BITIN_GET_EXPGOLOMB(bitin_bits,bitin_numbits,r,val) do { \
    uint32 nbits = 2*clz64(bitin_bits)+1+r; \
    BITIN_GET(bitin_bits,bitin_numbits,nbits,val); \
    ASSERT( val >= (1ULL<<r) ); \
    val -= (1ULL<<r); \
    } while(0)

//=========================================================

5 comments:

Fabian 'ryg' Giesen said...: http://fgiesen.wordpress.com/2011/01/19/a-small-note-on-the-elias-gamma-cod/ :)

While working on Bink 2 which uses LSB-first bottom-justified bit packing, I realized unary is easy there too: calculate "x & -x" (clears all but the lowest set bit in x) and then you can do "32 - clz32" (see also "http://fgiesen.wordpress.com/2013/10/18/bit-scanning-equivalencies/"). A bit more work but still short and branch-free. You do lose the super-nice Gamma decoder though.; March 15, 2014 at 9:54 PM
Fabian 'ryg' Giesen said...: Eh, 31 - clz32 and 63 - clz64 respectively. You get the idea.; March 15, 2014 at 9:55 PM
cbloom said...: Ah yes. At least I didn't cover it before on my own blog, which I've been known to do.

One nice thing about 64-bit BITIO and the grab-8 method of refilling is that you always have >= 57 bits available, which means you don't need to even check for the overflow case for typical value ranges.; March 16, 2014 at 11:08 AM
Fabian 'ryg' Giesen said...: Yup. With 64-bit BitIO I'm always torn between byte-based refills (sooo many spare bits to work with, so many refill checks that can be skipped!) and 32-bit-at-a-time refills (nice predictable rare branch, always do 32-bit aligned accesses).

With old school-ish RISCs (especially if they don't have good branch predictors), I think 32-bit aligned is superior; on x86/ARM with their fast support for unaligned reads it's a bit more complicated.

GCN GPUs have "alignbit" and "alignbyte":

v_alignbit_b32 dest, s0, s1, s2

dest = (s0 | (s1 << 32)) >> (s2 & 31)

v_alignbyte_b32 dest, s0, s1, s2

dest = (s0 | (s1 << 32)) >> (8 * (s2 & 3))

I'd sure like to have those on general-purpose CPUs...; March 16, 2014 at 5:19 PM
cbloom said...: Added Exp Golomb to the post.; March 16, 2014 at 6:24 PM

cbloom rants

3/15/2014

03-15-14 - Bit IO of Elias Gamma

5 comments:

old rants