10-06-08
Followup on the "Russian Range Coder". So, first of all let me stress once again that this thing is pointless and you should not use it. It's an "optimization" that gives up some compression and doesn't actually make you faster at all. You should never do approximations or optimizations unless they actually help, even if they look like they should be faster. Best to stick with full quality everywhere lest it come back to bite you.But I was thinking about it and I realized you can improve the renorm loop. The original loop in the case of underflow just truncates the range down to the bottom part to make the top bits always equal to the top bits of "low". But that can be really bad in some cases, since you might be straddling a boundary, but most of your range could be above the straddle, not below. It's easy enough to just check whether you have more range above or below the straddle :
while(rc->range < (1<<16)) { int above = (rc->low + rc->range) & 0xFFFF; int below = (~rc->low) & 0xFFFF; //int above = rc->range - 1 - below; /* U32 hi = rc->low + rc->range; U32 hi_below = rc->low + below; U32 lo_above = hi & 0xFFFF0000; */ if ( above > below ) { rc->low += rc->range - above; rc->range = above; } else { rc->range = below; } *rc->ptr++ = (U8) (rc->low >> (RANGE_SIZE - 8)); rc->range <<= 8; rc->low <<= 8; }
That is, we either push the top bits of low up to the top bits of hi, or push the top bits of hi down to the top bits of low, depending on which gives us more code space. Again, this doesn't hurt speed at all because it's rare. In fact, that's why you should just use a proper virtual-queue range coder that handles underflow right.
Just to make it super clear what's happening, here is an actual example :
rc->low + rc->range 0x4d0043e4 rc->low 0x4cffa580 above 0x000043e4 below 0x00005a7f rc->low + below 0x4cffffff rc->low + rc->range - above 0x4d000000
Some numbers :
m:\ccc\book1 : inLen : 768,771 True Entropy : 435,042.6 : 4.527 bpb radArith : encLen : 435,116 = 4.528 bpb bytes exceeding entropy : 73 original RRC : encLen : 435,284 = 4.530 bpb bytes exceeding entropy : 241 cbloom RRC : encLen : 435,216 = 4.529 bpb bytes exceeding entropy : 173 radArith cycles: 69.6 original rc cycles: 74.2 cbloom rc cycles: 70.3
Here "radArith" is a normal Michael Schindler range coder with virtual queue. My modified "russian range coder" has about half the waste of the original (compared to the canonical range coder, not compared to entropy). Note that the arithmetic coders here are all using a max cumulative frequency of 16384, while the "true entropy" uses the exact character counts in the file.
BTW these are the links to the authors of the "Russian Range Coder" - they seem to all be dead. If anyone has live links to these people let me know and I'll hook them up.
/* Arithmetic coder based on "carryless rangecoder" by D. Subbotin. * http://www.softcomplete.com/algo/pack/rus-range.txt (Original C++ code by Subbotin) * http://hem.spray.se/mikael.lundqvist/ (C impelementation by M. Lundqvist) * H. Okumura: C magazine, Vol.14, No.7, pp.13-35, Jul. 2002 (Tutorial of data compression with sample codes, in japanese). */
Even those older entries are very interesting!
ReplyDeleteI looked up the missing links on the Wayback Machine and found them here:
Subbotin's range coder (http://www.softcomplete.com/algo/pack/rus-range.txt)
Mikael Lundqvist's page (http://hem.spray.se/mikael.lundqvist/)