Doing binary arithmetic coding without a divide is trivial. Previous methods for division free multi-symbol arithmetic coding have been at the cost of low coding precision, that is coding in more bits than -log2 of the symbol probabilities (for example see DCC95 and SM98).

code is here :

recip_arith is public domain.

Our method uses an approximate map from the cdf interval to the range interval, scaling by only a few top bits of range. This approximate map introduces coding loss, but that can be made quite small with just 8 bits from range.

The advantage of this approximate map is that it can be exactly inverted using a table of reciprocal multiplies :

x / y == (x * recip_table[y]) >> 32 recip_table[y] = ((1<<32) + y-1) / yx/y in integers is the floor divide, and recip_table is the ceil reciprocal. (see Alverson, "Integer Division using reciprocals"). The choice of a 32 bit numerator is somewhat arbitrary, but we do want the reciprocals to fit in 32 bit words to minimize the size of recip_table in L1 cache.

Crucially because only the top 8 bits of "range" are used in the forward map, then "y" needed in the divide is only 8 bits, so the recip_table can be quite small. Furthermore, 8 bits + 24 bits of cdf in the numerator "x" can be inverted exactly.

** Coding Efficiency **

The coding loss of "recip_arith" with 8 bit tables is reliably under 0.005 bpb (around 0.1%) , in constrast to SM98 where the coding loss can be 10X higher (0.05 bpb or 1.0%)

Calgary Corpus file "news" , len=377,109 cdf_bits = 13 (symbol frequencies sum to 1<<13) from smallest to largest output : recip_arith coder (down/up 8-bit) : 244,641 = 5.190 bpb cacm87: 244,642 = 5.190 bpb range coder: 244,645 = 5.190 bpb recip_arith coder (with end of range map) : 244,736 = 5.192 bpb recip_arith coder: 244,825 = 5.194 bpb recip_arith coder (down/up 2-bit) (aka Daala "reduced overhead") : 245,488 = 5.208 bpb SM98 : 245,690 = 5.212 bpb(the down/up and "end of range map" variants will be discussed later)

The crucial step of recip_arith is the way the arithmetic coding interval is shrunk to encode a symbol.

The standard range coder does :

uint32_t r_norm = ac->range >> cdf_bits; ac->low += cdf_low * r_norm; ac->range = cdf_freq * r_norm;while recip_arith does :

uint32_t range = ac->range; int range_clz = clz32(range); uint32_t r_top = range >> (32 - range_clz - RECIP_ARITH_TABLE_BITS); uint32_t r_norm = r_top << ( 32 - range_clz - RECIP_ARITH_TABLE_BITS - cdf_bits); ac->low += cdf_low * r_norm; ac->range = cdf_freq * r_norm;where "r_top" is just the highest RECIP_ARITH_TABLE_BITS bits of range, and r_norm is those bits back in their original position, then shifted down by cdf_bits. In the end the way the interval is shrunk is exactly the same as the range coder, except that all bits below the top RECIP_ARITH_TABLE_BITS of "r_norm" are turned off.

** Is it useful ? **

It's unclear if there are really killer practical advantages to recip_arith at the moment. On many modern high end CPUs, 32 bit divides are pretty fast, so if you just take recip_arith and use it to replace a standard range coder you might not see much difference. On CPUs with slow divides there is a more significant advantage.

recip_arith provides an interesting continuum that connects very low precision coders like SM98 to the full precision range coder. The "up/down" variant with a very small reciprocal table (4 bits?) might be interesting for hardware implementations, where division is not desirable.

recip_arith works with 64-bit coding state (the standard range coder would require a 64-bit divide, which is often much slower than a 32-bit divide), which can provide advantages. recip_arith also works with the encoder & decoder not in lock step; eg. you could encode with 32-bit state and decode with 64-bit state, allowing you independence of the bitstream & implementations, which is very desirable; not all arithmetic coders have this property.

I think primarily it is theoretically interesting at the moment, and it remains to be seen if that turns into a compelling practical application.

** What's Next **

I'll be doing a couple of followup posts going into the details of how/why this works. We'll talk about the theory of arithmetic coders and how to think about them. Then we'll look at the up/down and "map end" variants.

** Index of posts **

recip_arith on github

1. cbloom rants About arithmetic coders and recip_arith in particular

2. cbloom rants Arithmetic coder division-free maps

3. cbloom rants Working through recip_arith

4. cbloom rants recip_arith without unused range

## No comments:

Post a Comment