Anyway, in there I found this little nugget that I had completely forggoten about. I wrote :
Step 3 uses a method introduced by F. Rubin [3], and popularized by G.V. Cormack�s DMC implementation [1]. It is: while ( R < 256 ) { output(L>>16); R <<= 8; L = (L << 8) & 0xFFFFFF; if ( (L + R) > 0xFFFFFF ) R = 0xFFFFFF - L; } This method is approximate, and therefore produces slightly longer output than the canonical CACM-87 style normalization [5]. However, the output is byte-aligned, and the normalization loop is almost always only performed once; these properties make this method extremely fast. The line: if ( (L + R) > 0xFFFFFF ) R = 0xFFFFFF - L; is where the extra output comes from. It can be seen that R is shrunk to less than it need be, so extra normalizations are needed, and more bytes are output. However, this only happens 1/65536th of the time, so the excess output is negligible.Well what do you fucking know. This is a "carryless range coder" (Rant on New Arithmetic Coders) which was apparently rediscovered 20+ years later by the russians. ( see also (Followup on the "Russian Range Coder") ). The method of reducing range to avoid the carry is due to Rubin :
F. Rubin, "Arithmetic Stream Coding Using Fixed Precision Registers", IEEE Trans. Information Theory IT-25 (6) (1979), p. 672 - 675And I found it in Cormack's DMC which amazingly is still available at waterloo : dmc.c
Cormack in 1987 wrote in the dmc.c code :
while ((max-min) < 256) { if(bit)max--; putchar(min >> 16); outbytes++; min = (min << 8) & 0xffff00; max = (max << 8) & 0xffff00; if (min >= max) max = 0x1000000; }which is a little uglier than my version above, but equivalent (he has a lot of ugly +1/-1 stuff cuz he didn't get that quite right).
And actually in the Data Compression Using Dynamic Markov Modelling paper that goes with the dmc.c code, they describe the arithmetic coder due to Guazzo and it is in fact an "fpaq0p" style carryless arithmetic coder (it avoids carries by letting range get very small and only works on binary alphabets). I don't have the original paper due to Guazzo so I can't confirm that attribution. The one in the paper does bit by bit output, but as I've stressed before that doesn't characterize the coder, and Cormack then did bytewise output in the implementation.
Anyway, I had completely forgotten about this stuff, and it changes my previous attribution of byte-wise arithmetic coding to Schindler '98 ; apparently I knew about it in '95 and Cormack did it in '87. (The only difference between the Schindler '98 coder and the NTiCM '95 coder is that I was still doing (Sl*L)/R while Schindler moved to Sl*(L/R) which is a small but significant change).
Apparently the "russian range coder" is actually a "Rubin arithmetic coder" (eg. avoid carries by shrinking range), and the "fpaq0p binary carryless range coder" is actually a "Guazzo arithmetic coder" (avoid carries by being binary and ensuring only range >= 2).
Fits in with the general pattern in maths where most of the important concepts are discovered at least twice and tend to be named after the second or third person to write about them :)
ReplyDeleteHere are all arithmetic coders I got source from:
ReplyDeletehttp://pyramidworkshop.svn.sourceforge.net/viewvc/pyramidworkshop/libcoding-0.0.0/Coding/Arithmetic/
When I took a look at F0also detected it's the Guazzo-Coder, as one can see here:
http://pyramidworkshop.svn.sourceforge.net/viewvc/pyramidworkshop/libcoding-0.0.0/Coding/Arithmetic/BitCoder/GuazzoLow.hpp?revision=312&view=markup
http://pyramidworkshop.svn.sourceforge.net/viewvc/pyramidworkshop/libcoding-0.0.0/Coding/Arithmetic/BitCoder/F0Low.hpp?revision=312&view=markup
I tried to make the code appear as similar as possible.