Plural Tunstall VTF coding is, in general, extremely fast to decode. It works best with 12-bit tables (the table must stay in L1), which means it only works well at entropy <= 4 bpb (bits per byte).
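To make "extremely fast" concrete, here's a minimal sketch of the decode hot loop. The table layout and names are my assumptions, not any particular implementation's, and the 12-bit codes are shown pre-unpacked to 16-bit values for clarity :

#include <cstdint>
#include <cstring>

struct Entry
{
    uint8_t len;        // true word length
    uint8_t bytes[15];  // word contents, padded to a fixed 16-byte entry
};
// note : 4096 entries * 16 bytes = 64KB, so a real implementation
// packs entries tighter than this to actually stay in L1

uint8_t * tunstall_decode(const Entry table[4096],
                          const uint16_t * codes, int num_codes,
                          uint8_t * out)
{
    for (int i = 0; i < num_codes; i++)
    {
        const Entry & e = table[codes[i] & 0xFFF]; // 12-bit table index
        memcpy(out, e.bytes, 15);  // fixed-size copy, no branch on length
                                   // (output buffer needs 15 bytes of slack)
        out += e.len;              // advance by the true word length
    }
    return out;
}

There's no carried state and no data-dependent branching between iterations, which is where the speed comes from.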
Marlin introduces an improved word probability model that captures the way the first letter probability is skewed by the longer-string exclusion. (this is just like the LAM exclusion in LZ)
That is, after coding with word W, the chars that extend W in the dictionary (Wa, Wb) cannot be the next character to start a word (the encoder would have coded the longer word instead), so the probability of the next word's first char being >= c is increased.
The Marlin word probability model improves compression by 1-4% over naive plural Tunstall.
The simplest implementation of the Marlin probability adjustment is like this :
P_naive(W) = Prod[chars c in W] P('c')
P_word(W) = P_scale_first_char( W[0] ) * P_naive(W) * Ptail( num_children(W) )
(where Ptail is the tail-cumulative-probability :
Ptail(x) = Sum[ c >= x ] P('c') (sum of character probabilities to end)
)
(instead of scaling by Ptail you can subtract off the child node probabilities as you make them)
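As a sketch of that computation in code (my names; P is the character probability vector, sorted most-probable first, as in the dumps below) :

#include <vector>

// Ptail(x) = Sum[ c >= x ] P('c')
double Ptail(const std::vector<double> & P, int x)
{
    double sum = 0;
    for (int c = x; c < (int)P.size(); c++)
        sum += P[c];
    return sum;
}

// P_word(W) per the formula above; "word" holds the symbol indices of W,
// num_children = number of children of W currently in the dictionary
double P_word(const std::vector<double> & P,
              const std::vector<double> & P_scale_first_char,
              const std::vector<int> & word,
              int num_children)
{
    double p = P_scale_first_char[word[0]];
    for (int c : word)                  // P_naive(W) = Prod P('c')
        p *= P[c];
    return p * Ptail(P, num_children);  // only chars >= num_children can follow W
}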
on the first iteration of dictionary building, set P_scale_first_char() = 1.0
this is "naive plural Tunstall"
after building the dictionary, you now have a set of words and P(W) for each
compute :
P_state(i) = Sum[ words W with i children ] P(W)
(state i means only chars >= i can follow)
(iterating P_state -> P(W) a few times here is possible but unnecessary)
P_scale_first_char(c) = Sum[ i <= c ] P_state(i) / Ptail(i)
(that is, the sum over all states i that allow char c, each weighted by P_state(i) and renormalized by its allowed tail Ptail(i))
then repeat dictionary building one more time
(you can optionally repeat more, but once seems fine)
And that's it!
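In code, a sketch of the pass between the two dictionary builds (assuming dictionary building hands you each word's P(W) and child count; names are mine) :

#include <vector>

struct Word { double P; int num_children; };

std::vector<double> compute_scale_first_char(const std::vector<Word> & words,
                                             const std::vector<double> & P)
{
    int n = (int)P.size();

    // P_state(i) = Sum[ words W with i children ] P(W)
    std::vector<double> P_state(n + 1, 0.0);
    for (const Word & w : words)
        P_state[w.num_children] += w.P;

    // Ptail(i), built back to front
    std::vector<double> Ptail(n + 1, 0.0);
    for (int i = n - 1; i >= 0; i--)
        Ptail[i] = Ptail[i + 1] + P[i];

    // P_scale_first_char(c) = Sum[ i <= c ] P_state(i) / Ptail(i)
    std::vector<double> scale(n, 0.0);
    double sum = 0;
    for (int c = 0; c < n; c++)
    {
        if (P_state[c] > 0)   // guard the 0/0 case for states that never occur
            sum += P_state[c] / Ptail[c];
        scale[c] = sum;
    }
    return scale;
}

// usage :
//   pass 1 : P_scale_first_char = all 1.0, build dictionary -> "naive plural Tunstall"
//   pass 2 : P_scale_first_char = compute_scale_first_char(words, P), rebuild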
What do these values actually look like? I thought it might be illuminating to dump them.
This is on a pretty skewed file, so the effect is large; the larger the MPS probability, the bigger the effect.
R:\tunstall_test\monarch.tga.rrz_filtered.bmp
filelen = 1572918
H = 2.917293
P of chars =
[0] 0.46841475525106835
[1] 0.11553621994280693
[2] 0.11508546535801611
[3] 0.059216055763873253
[4] 0.058911526220693004
[5] 0.036597584870921435
[6] 0.036475518749229136
[7] 0.018035269480036465
[8] 0.017757441900976400
[9] 0.010309501194595012
[10] 0.0097379520102128646
P_state =
[0] 0.62183816678155190
[1] 0.15374679894466811
[2] 0.032874234239563829
[3] 0.063794018822874776
[4] 0.026001940955215786
[5] 0.011274295764837820
[6] 0.028098911350290755
[7] 0.012986055279597277
[8] 0.0013397794289329405
.. goes to zero pretty fast ..
P_scale_first_char =
[0] 0.62183816678155202
[1] 0.91106139196208336
[2] 0.99007668169839014
[3] 1.2020426052137052
[4] 1.3096008656256881
[5] 1.3712643080249607
[6] 1.5634088663186672
[7] 1.6817189544649160
[8] 1.6963250203077103
[9] 1.9281295477496172
[10] 1.9418127462353780
[11] 2.0234438458996773
[12] 2.0540542047979415
[13] 2.1488636999462676
[14] 2.2798060244386895
[15] 2.2798060244386895
[16] 2.2798060244386895
[17] 2.3660205062350039
[18] 2.3840557757150402
[19] 2.3840557757150402
[20] 2.3840557757150402
[21] 2.4066061022686838
[22] 2.5584098628550294
[23] 2.5584098628550294
[24] 2.6690752676624321
.. gradually goes up to 4.14
estimated real first char probability
= P(c) * P_scale_first_char(c) =
[0] 0.29127817269875372
[1] 0.10526058936313110
[2] 0.11394343565337962
[3] 0.071180221940886246
[4] 0.077150585733949978
[5] 0.050184961893408854
[6] 0.057026149416117611
[7] 0.030330254533459933
the effect is to reduce the skewing of the probabilities in the post-word alphabet.
I think this problem space is not fully explored yet and I look forward to seeing more work in this domain in the future.
I'm not convinced that the dictionary building scheme here is optimal. Is there an optimal plural VTF dictionary?
I think maybe there's some space between something like RANS and Tunstall. Tunstall consumes blocks of N bits and carries no state between them. RANS pulls blocks of N bits and carries the full state of the unused value range between them (and is therefore a lot slower, because of the dependency chains). Maybe there's something in between?
Another way to think about this Marlin ending state issue is a bit like an arithmetic coder. When you send the index of a word that has children, you have not only sent that word, you've also sent information about the *next* word. That is, some fractional bits of information that you'd like to carry forward.
Say you have {W,Wa,Wb} in the dictionary and you send a W. You've also sent that the next word starts with a char >= c. That's like saying your arithmetic coder cumprob is >= (Pa+Pb). You've refined the range of the next interval.
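For example (with made-up numbers) : say Pa + Pb = 0.25. Coding W then also tells the decoder that an event of probability 0.75 happened for the next word, which is -log2(0.75) ~= 0.415 bits of information about it; fixed-length Tunstall has no way to carry those fractional bits forward.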
This could be done with Tunstall by doing the multi-table method (separate tables for states 0, 1, and 2+), but unfortunately that doesn't fit in L1.
BTW you can think of Tunstall in an arithmetic codey way. Maybe I'll draw a picture because it's easier to show that way...
I noted this in one of the posts, but I think it bears repeating :
The Marlin probability adjustment is not the only way to handle the tail-cant-be-head exclusion. An alternative that was previously in the literature is to have a separate Tunstall table for each tail exclusion case.
A hybrid approach is also possible & interesting.
When the alphabet is quite skewed, the MPS (most probable symbol) has high probability.
In that case you might like to make two Tunstall tables : one where the MPS can be the first symbol, and one where it can't. (The case where the MPS can't be the first symbol is common and important, because the MPS will often be available as a tail symbol.) You could then do the Marlin probability adjustment on both of those tables, which would compensate for the non-MPS exclusion cases.
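A sketch of what that might look like at decode time, with a hypothetical layout where each table entry carries a flag for whether the MPS is among its children (all names mine) :

#include <cstdint>
#include <cstring>

struct Entry2
{
    uint8_t len;            // true word length
    uint8_t mps_is_child;   // 1 if the MPS can't start the next word
    uint8_t bytes[14];      // word contents, padded to 16-byte entries
};

uint8_t * decode_two_table(const Entry2 tables[2][4096],
                           const uint16_t * codes, int num_codes,
                           uint8_t * out)
{
    int state = 0; // table 0 : MPS allowed as first char, table 1 : excluded
    for (int i = 0; i < num_codes; i++)
    {
        const Entry2 & e = tables[state][codes[i] & 0xFFF];
        memcpy(out, e.bytes, 14);
        out += e.len;
        state = e.mps_is_child; // carry the exclusion into the next word
    }
    return out;
}

Note that unlike the single-table loop, this carries a (one-bit) dependency from word to word, so presumably some of the decode speed is given back.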
The tail-cant-be-head is similar to the LAM exclusion in LZ77.