10-06-08 - 7

Followup on the "Russian Range Coder". So, first of all let me stress once again that this thing is pointless and you should not use it. It's an "optimization" that gives up some compression and doesn't actually make you faster at all. You should never do approximations or optimizations unless they actually help, even if they look like they should be faster. Best to stick with full quality everywhere lest it come back to bite you.

But I was thinking about it and I realized you can improve the renorm loop. The original loop in the case of underflow just truncates the range down to the bottom part to make the top bits always equal to the top bits of "low". But that can be really bad in some cases, since you might be straddling a boundary, but most of your range could be above the straddle, not below. It's easy enough to just check whether you have more range above or below the straddle :

    while(rc->range < (1<<16))
		int above = (rc->low + rc->range) & 0xFFFF;
		int below = (~rc->low) & 0xFFFF;
		//int above = rc->range - 1 - below;
		U32 hi = rc->low + rc->range;
		U32 hi_below = rc->low + below;
		U32 lo_above = hi & 0xFFFF0000;
		if ( above > below )
			rc->low += rc->range - above;
			rc->range = above;	
			rc->range = below;
		*rc->ptr++ = (U8) (rc->low >> (RANGE_SIZE - 8));
		rc->range <<= 8;
		rc->low <<= 8;

That is, we either push the top bits of low up to the top bits of hi, or push the top bits of hi down to the top bits of low, depending on which gives us more code space. Again, this doesn't hurt speed at all because it's rare. In fact, that's why you should just use a proper virtual-queue range coder that handles underflow right.

Just to make it super clear what's happening, here is an actual example :

    rc->low + rc->range            0x4d0043e4
    rc->low                        0x4cffa580
    above                          0x000043e4
    below                          0x00005a7f
    rc->low + below                0x4cffffff
    rc->low + rc->range - above    0x4d000000

Some numbers :

m:\ccc\book1 : inLen : 768,771

True Entropy : 435,042.6 : 4.527 bpb

radArith : encLen : 435,116 = 4.528 bpb
	bytes exceeding entropy : 73

original RRC : encLen : 435,284 = 4.530 bpb
	bytes exceeding entropy : 241

cbloom RRC : encLen : 435,216 = 4.529 bpb
	bytes exceeding entropy : 173

radArith cycles: 69.6
original rc cycles: 74.2
cbloom rc cycles: 70.3

Here "radArith" is a normal Michael Schindler range coder with virtual queue. My modified "russian range coder" has about half the waste of the original (compared to the canonical range coder, not compared to entropy). Note that the arithmetic coders here are all using a max cumulative frequency of 16384, while the "true entropy" uses the exact character counts in the file.

BTW these are the links to the authors of the "Russian Range Coder" - they seem to all be dead. If anyone has live links to these people let me know and I'll hook them up.


  Arithmetic coder based on "carryless rangecoder" by D. Subbotin.
  * http://www.softcomplete.com/algo/pack/rus-range.txt
    (Original C++ code by Subbotin)
  * http://hem.spray.se/mikael.lundqvist/
    (C impelementation by M. Lundqvist)
  * H. Okumura: C magazine, Vol.14, No.7, pp.13-35, Jul. 2002
    (Tutorial of data compression with sample codes, in japanese).

No comments:

old rants