RealTime Data Compression : A comparison of Arithmetic Encoding with FSE
rANS notes - The ryg blog
Getting back to my slow exposition track.
We talked before about how strings like "ABC" specify an ANS state machine. The string is the symbols that should be output by the decoder in each state, and there's an implicit cyclic repeat, so "ABC" means "ABCABC...". The cyclic repeat corresponds to the decoder only looking at the state modulo the string length when selecting its output.
Simple enumerations of the alphabet (like "ABC") are just flat codes. We saw before that power of two binary-distributed strings like "ABAC" are Huffman codes.
What about something like "AAB"? States 0 and 1 both output an A. State 2 outputs a B. That means A should have twice the probability of B.
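To make that reading concrete, here's a tiny sketch (in Python; the function name is mine) of the decode table implied by "AAB" - the decoder just emits the symbol at position (state mod 3) in the string :

```python
# Decode table implied by the output string "AAB": the decoder
# emits output_string[x % 3] when it is in state x.
output_string = "AAB"

def symbol_for_state(x):
    return output_string[x % len(output_string)]

# States 0 and 1 emit 'A', state 2 emits 'B', then it repeats cyclically:
print([symbol_for_state(x) for x in range(6)])  # ['A', 'A', 'B', 'A', 'A', 'B']
```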
How do we encode with a state machine like that? Putting in a B is obvious: we need the bottom of the new state to be
a 2 (mod 3) :
x' = x*3 + 2
but to encode an A, if we just did one of the naive ops :
x' = x*3 + 0
x' = x*3 + 1
we're wasting a value. Either a 0 or a 1 at the bottom would produce an A, so we have a free bit there.
We need to make use of that bit or we're wasting code space, so we need to find a bit to transmit
that exercises that freedom. Fortunately, we have a value sitting
around that needs to be transmitted and that we can pack into that bit - x itself!
take a bit off x :
b = x&1
x >>= 1
x' = x*3 + b
or, combined into one step :
x' = (x/2)*3 + (x%2)
More generally, if the output string is of length M and symbol s occurs Fs times, you would do :
x' = (x/Fs)*M + (x%Fs) + Bs
where Bs is the start of symbol s's block of states (the cumulative frequency count of the symbols before s). This is the formula for rANS.
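As a sketch, here is that formula and its exact inverse for the "AAB" example, with M=3, F_A=2, F_B=1, B_A=0, B_B=2 (the table names are mine, not from the post) :

```python
# rANS encode/decode steps for the "AAB" example.
M = 3                   # length of the output string
F = {'A': 2, 'B': 1}    # Fs: how many states output each symbol
B = {'A': 0, 'B': 2}    # Bs: start of each symbol's block of states

def encode(x, s):
    # x' = (x/Fs)*M + (x%Fs) + Bs
    return (x // F[s]) * M + (x % F[s]) + B[s]

def decode(xp):
    slot = xp % M                        # where we are within the cycle
    s = 'A' if slot < B['B'] else 'B'    # whose block the slot falls in
    x = F[s] * (xp // M) + (slot - B[s])
    return x, s
```

encode and decode are exact inverses: decode(encode(x, s)) == (x, s) for any state x, which is what makes the scheme a valid code.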
Now, note that rANS always makes output strings where the symbols are not interleaved. That is, it can make "AAB" but it can't make "ABA". The states that output the same symbol are in consecutive blocks of length Fs.
This is actually not what we want; it's an approximation that rANS makes.
For example, consider a 3-letter alphabet and M=6. rANS corresponds to an output string of "AABBCC".
We'd prefer "ABCABC". To see why, recall the arithmetic coding formula that we wanted to use :
x' = (x/P) + C
the important part being the (x/P). We want x to grow by that amount, because that's what gives us
compression to the entropy. If x grows by too much or too little, we aren't coding with codelens that
match the probabilities, so there will be some loss.
P = F/M , and we will assume for now that the probabilities are such that the rational expression F/M
is the true probability. What we want is to do :
x' = (x*M)/F + C
to get a more accurate scaling of x. But we can't do that because in general that division will
cause x' to not fall in the bucket [Cs,Cs+1) , which would make decoding impossible.
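Here's a quick numeric sketch of that failure, using the "AABBCC" string from above (so A's slots are 0 and 1; the specific values are illustrative) :

```python
F, M, C = 2, 6, 0     # symbol 'A' in "AABBCC": F_A = 2, block starts at C_A = 0
x = 7
x_ideal = (x * M) // F + C    # the more accurate scaling: 21
slot = x_ideal % M            # 21 % 6 = 3
# But A's slots are 0 and 1 -- slot 3 falls in B's block, so the decoder
# would read back a B instead of an A. Decoding is broken.
```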
So instead, in rANS we did :
x' = floor(x/F)*M + C + (x%F)
the key part here being that we had to do floor(x/F)*M instead of (x*M)/F = x/P , which means the bottom bits of x are
not contributing to the 1/P scaling the way we want them to.
For example :
x = 7
F = 2
M = 6
P = 1/3
should scale like
x -> x/P = 21
instead scales like
x -> (x/F)*M + (x%F) = (7/2)*6 + (7%2) = 3*6 + 1 = 19
because we lost the bottom bit of 7 when we did (7/2)
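The arithmetic above, checked in Python :

```python
x, F, M = 7, 2, 6                 # P = F/M = 1/3
ideal = (x * M) // F              # x/P : 7*6/2 = 21
rans = (x // F) * M + (x % F)     # floor(7/2)*6 + 7%2 = 3*6 + 1 = 19
print(ideal, rans)                # 21 19
```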
In practice this does in fact make a difference when the state value (x) is small. When x is generally
large (vs. M), then (x/F) is a good approximation of the correct scaling. The closer x is to M, the
worse the approximation.
In practice with rANS, you should use something like a 32-bit x and M under 16 bits, so you have decent resolution. For tANS we will be using much smaller state values, so getting this right is important.
As a concrete example :
alphabet = 3
1000 random symbols coded
H = 1.585
K = 6
6 <= x < 12
output string "ABCABC"
wrote 1608 bits
output string "AABBCC"
wrote 1690 bits
And a drawing of what's happening :
I like the way Jarek Duda called rANS the "range coder" variant of ANS. While the analogy is not perfect, there is a similarity in the way it approximates the ideal scaling and gains speed.
The crucial difference between a "range coder" and prior (more accurate) arithmetic coders is that the
range coder does a floor divide :
range coder :
symhigh * (range / symtot)
CACM (and such) :
(symhigh * range) / symtot
this approximation is good as long as range is large and symtot is small, just like with rANS.
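The same floor-divide approximation, sketched numerically (the values here are illustrative, not from the post) :

```python
range_, symtot, symhigh = 1000003, 256, 200

cacm = (symhigh * range_) // symtot         # divide last: more accurate
rangecoder = symhigh * (range_ // symtot)   # floor divide first: cheaper

# The range coder result comes out slightly low because the floor divide
# throws away (range_ % symtot) before the multiply; the relative error
# stays small as long as range_ is large and symtot is small.
print(cacm, rangecoder)  # 781252 781200
```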