Working through recip_arith

Time to look at the actual recip_arith map and how it works.

recip_arith takes the top N bits of range (always a high 1 bit, and then N-1 fractional bits in the fixed point view) and uses them to scale the (power of 2 total) CDF up to the arithmetic domain.

The recip_arith forward map is :

int range_clz = clz32(range);
uint32_t r_top = range >> (32 - range_clz - RECIP_ARITH_TABLE_BITS);
uint32_t r_norm = r_top << (32 - range_clz - RECIP_ARITH_TABLE_BITS - cdf_bits);

forward(cdf) = cdf * r_norm;


forward(cdf) = (cdf * r_top) << (32 - range_clz - RECIP_ARITH_TABLE_BITS - cdf_bits);
r_norm is the scaling factor, close to (range/cdf_tot), differing in that all bits below RECIP_ARITH_TABLE_BITS are set to zero. Recall the standard "range coder" map is :
uint32_t r_norm = ac->range >> cdf_bits;

forward(cdf) = cdf * r_norm;
the only difference in recip_arith is that we are only using the top RECIP_ARITH_TABLE_BITS of "range" in the scaling. This is a floor of the ideal scaling factor, and means we use less of range than we would like.

Note that the "range coder" itself only uses (r_bits - cdf_bits) in its scaling factor, so it relies on r_bits reasonably larger than cdf_bits. This is in contract to CACM87 or full precision scaling.

The recip_arith map can be thought of as starting with the "identity map" from the last post, and then adding fractional bits to the fixed point in [1,2) , possibly 1.5X, 1.25X, etc. refining the scaling factor.

The point of course is to able to invert it. The naive inverse would be :

inverse(code) = code / r_norm;
which works, but has the divide we are trying to avoid. Our idea is that because r_top is small, we can use it to look up a reciprocal multiply. How should we do that? There are two choices, where we invert the steps of the forward map one by one, in reverse order :
forward(cdf) = (cdf * r_top) << (32 - range_clz - RECIP_ARITH_TABLE_BITS - cdf_bits);

inverse(code) = (code >> (32 - range_clz - RECIP_ARITH_TABLE_BITS - cdf_bits)) / r_top;


forward(cdf) = (cdf << (32 - range_clz - RECIP_ARITH_TABLE_BITS - cdf_bits)) * r_top;

inverse(code) = (code / r_top) >> (32 - range_clz - RECIP_ARITH_TABLE_BITS - cdf_bits);
if you actually do the divide, these are the same, but with the reciprocal multiply they are not. The issue is our reciprocal will only be exactly equal to the divide for some number of numerator bits. If we used the second form, we would need (code / r_top) , and "code" can be very large (24-31 bits or 56-63 bits). To do that reciprocal exactly would require an intermediate multiply larger than 64 bits.

Therefore we need the first form : first shift down code to only the necessary bits to resolve cdf boundaries, then do the divide :

inverse(code) :

uint32_t code_necessary_bits = code >> (32 - range_clz - RECIP_ARITH_TABLE_BITS - cdf_bits);

ret = code_necessary_bits / r_top;

ret = ( code_necessary_bits * recip_table[r_top] ) >> RECIP_ARITH_NUMERATOR_BITS;
(for 32-bit systems you might want to choose RECIP_ARITH_NUMERATOR_BITS = 32 so that this is a 32x32 -> hi word multiply; on 64-bit that's moot)

"code_necessary_bits" has its top bit at cdf_bits + RECIP_ARITH_TABLE_BITS ; in practice this is something like 14 + 8 = 22 . This is just enough bits that after dividing by r_top , you still resolve every cdf value exactly. In order for recip_table[] to have an exact reciprocal, it needs to be 1 more bit than the numerator, eg. cdf_bits + RECIP_ARITH_TABLE_BITS + 1. This means the intermediate that we need to multiply easily fits in 64 bits.

This acts as the limit on RECIP_ARITH_TABLE_BITS, aside from the consideration of how much space in L1 you can afford to give it.

With RECIP_ARITH_TABLE_BITS = 1 , recip_arith is the "identity map". With RECIP_ARITH_TABLE_BITS = 8, coding loss is very low (0.1%).

The fraction of "range" that can be wasted by the recip_arith map is 1/2^RECIP_ARITH_TABLE_BITS , which means the coding loss is :

coding loss @ maximum ~= -log2( 1 - 1/2^RECIP_ARITH_TABLE_BITS )

(eg. at RECIP_ARITH_TABLE_BITS = 8 this is 0.00565 bits)
The maximum occurs when all bits uner the top RECIP_ARITH_TABLE_BITS are 1; when they're all 0 this map is exact and there's no coding loss. The real world coding loss seems to be about 75% of the above formula.

Note that in a standard "range coder" you are already approximating to only use the bits of range between r_bits and cdf_bits; so eg. r_bits in 24-31 , cdf_bits of 14, means you are using 10-17 top bits of range in r_norm (the scaling factor). recip_arith just always uses 8 (for example) rather than the variable 10-17.

Note that recip_arith does not map all of range. We'll talk about variants that do use the whole range in the next post. (The standard range coder also doesn't map all of range). This means there are "code" values which the decoder cannot handle through its normal paths, where "code" is out of the mapped range. A valid stream produced by the encoder will never lead to those values being seen in the decoder, but a corrupt/attack/fuzzy stream could have them. Therefore a safe decoder must detect when code is out of the mapped range and at least handle it well enough not to crash. Testing and returning failure is an efficient way to handle it since it's just a branch that's never taken.

In review : the fully general CACM87 map uses 2 divides to encode, and 3 to decode. (encode is 2 forward maps, decode is 1 inverse map + 2 forward maps). By choosing cdf_tot to be a power of 2 we can easily eliminate the divide in the forward map. The divide in the inverse map is harder to remove, but we do it by reducing the number of bits used so that we can use a table of reciprocal multiplies.

The decoder critical path of instructions in recip_arith is : clz -> shift -> table lookup -> multiply . On modern desktops this is about the same speed or just a little faster than a divide.

No comments:

old rants