PPM :
make a few contexts of previous characters
order4 = * ((U32 *)(ptr-4));
etc..
look up observed counts from each context :
counts_o4 = lookup_stats( order4 );
counts_o2 = lookup_stats( order2 );
counts_o1 = lookup_stats( order1 );
counts_o0 = lookup_stats( order0 );
estimate escape probability from counts at each context :
esc_o4 = estimate_escape( order4 );
esc_o2 = estimate_escape( order2 );
...
code in order from most likely best to least :
if ( arithmetic_code( (1-esc_o4) * counts_o4 ) ) return; else arithmetic_code( esc_o4 );
exclude counts_o4
if ( arithmetic_code( (1-esc_o2) * counts_o2 ) ) return; else arithmetic_code( esc_o2 );
exclude counts_o2
...
update counts :
counts_o4 += sym;
...
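To make that concrete, here's a rough C++ sketch of one PPM coding step on a byte alphabet. The arithmetic coder interface, the PPMC-style escape estimate, and the helper names are just illustrative placeholders, not anybody's actual implementation :

#include <cstdint>

// assumed arithmetic coder interface, not implemented here :
struct ArithEncoder { void encode( uint32_t cum_freq, uint32_t sym_freq, uint32_t total_freq ); };

// observed symbol counts for one context :
struct Context { uint32_t count[256]; };

// Try to code 'sym' from 'ctx', skipping symbols already excluded at higher orders.
// Returns true if the symbol was coded here, false if an escape was coded.
bool ppm_code_step( ArithEncoder & ac, Context & ctx, int sym, bool excluded[256] )
{
    uint32_t total = 0, distinct = 0, cum = 0, freq = 0;
    for ( int s = 0; s < 256; s++ )
    {
        if ( excluded[s] || ctx.count[s] == 0 ) continue;
        if ( s < sym )  cum  += ctx.count[s];
        if ( s == sym ) freq  = ctx.count[s];
        total += ctx.count[s];
        distinct++;
    }
    uint32_t esc = distinct ? distinct : 1;   // PPMC-style : escape count = # of distinct symbols seen
                                              // (an empty or fully excluded context codes a "free" escape)
    if ( freq > 0 )
    {
        ac.encode( cum, freq, total + esc );  // symbol found : code it from this context
        return true;
    }
    ac.encode( total, esc, total + esc );     // symbol not found : code the escape
    for ( int s = 0; s < 256; s++ )           // exclude everything this context could have coded
        if ( ctx.count[s] ) excluded[s] = true;
    return false;
}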
Now let's do context mixing :
CM :
make a few contexts of previous characters
order4 = * ((U32 *)(ptr-4));
etc..
look up observed counts from each context :
counts_o4 = lookup_stats( order4 );
counts_o2 = lookup_stats( order2 );
counts_o1 = lookup_stats( order1 );
counts_o0 = lookup_stats( order0 );
estimate weights from counts at each context :
w_o4 = estimate_weight( order4 );
w_o2 = estimate_weight( order2 );
...
make blended counts :
counts = w_o4 * counts_o4 + w_o2 * counts_o2 + ...
now code :
arithmetic_code( counts );
...
update counts :
counts_o4 += sym;
...
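And a similarly rough sketch of the CM step, reusing the hypothetical Context and ArithEncoder from the PPM sketch above. The weights are taken as given here; a real mixer would adapt them (e.g. from past coding cost), and the scaling to integer frequencies is just the crudest thing that works :

// Blend counts from several contexts with given weights and code 'sym' from the blend.
void cm_code_step( ArithEncoder & ac, Context * ctxs[], const double weights[], int nctx, int sym )
{
    double blended[256] = { };
    for ( int c = 0; c < nctx; c++ )
        for ( int s = 0; s < 256; s++ )
            blended[s] += weights[c] * ctxs[c]->count[s];

    // turn the blended real-valued counts into integer frequencies for the coder;
    // the +1 makes sure no symbol ever gets zero probability
    uint32_t freq[256], total = 0, cum = 0;
    for ( int s = 0; s < 256; s++ )
    {
        freq[s] = (uint32_t)( blended[s] * 16.0 ) + 1;
        total += freq[s];
    }
    for ( int s = 0; s < sym; s++ )
        cum += freq[s];

    ac.encode( cum, freq[sym], total );
}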
It should be clear we can put them together :
make a few contexts of previous characters
order4 = * ((U32 *)(ptr-4));
etc..
look up observed counts from each context :
counts_o4 = lookup_stats( order4 );
counts_o2 = lookup_stats( order2 );
counts_o1 = lookup_stats( order1 );
counts_o0 = lookup_stats( order0 );
if ( CM )
{
estimate weights from counts at each context :
w_o4 = estimate_weight( order4 );
w_o2 = estimate_weight( order2 );
...
make blended counts :
counts = w_o4 * counts_o4 + w_o2 * counts_o2 + ...
// now code :
arithmetic_code( counts );
...
}
else PPM
{
estimate escape probability from counts at each context :
esc_o4 = estimate_escape( order4 );
esc_o2 = estimate_escape( order2 );
...
code in order from most likely best to least :
if ( arithmetic_code( (1-esc_o4) * counts_o4 ) ) return; else arithmetic_code( esc_o4 );
exclude counts_o4
if ( arithmetic_code( (1-esc_o2) * counts_o2 ) ) return; else arithmetic_code( esc_o2 );
exclude counts_o2
...
}
update counts :
counts_o4 += sym;
...
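As a sketch, the shared driver might look like this, again using the hypothetical pieces from the sketches above (lookup_stats, hash_order, and estimate_weight here are placeholders) :

void code_symbol( ArithEncoder & ac, const uint8_t * ptr, int sym, bool use_cm )
{
    // same stats lookup either way :
    Context * ctxs[4] = {
        lookup_stats( hash_order( ptr, 4 ) ),   // order4
        lookup_stats( hash_order( ptr, 2 ) ),   // order2
        lookup_stats( hash_order( ptr, 1 ) ),   // order1
        lookup_stats( hash_order( ptr, 0 ) ),   // order0
    };

    if ( use_cm )
    {
        double w[4];
        for ( int c = 0; c < 4; c++ )
            w[c] = estimate_weight( ctxs[c] );      // one weight per context
        cm_code_step( ac, ctxs, w, 4, sym );        // blend and code once
    }
    else // PPM
    {
        bool excluded[256] = { };
        for ( int c = 0; c < 4; c++ )
            if ( ppm_code_step( ac, *ctxs[c], sym, excluded ) )
                break;                              // coded from this context, stop
        // (a real PPM falls back to an order -1 uniform model if even order0 escapes)
    }

    // same update either way :
    for ( int c = 0; c < 4; c++ )
        ctxs[c]->count[sym] += 1;
}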
In particular if we do our PPM in a rather inefficient way we can make them very similar :
make a few contexts of previous characters
order4 = * ((U32 *)(ptr-4));
etc..
look up observed counts from each context :
counts_o4 = lookup_stats( order4 );
counts_o2 = lookup_stats( order2 );
counts_o1 = lookup_stats( order1 );
counts_o0 = lookup_stats( order0 );
accumulate into blended_counts :
blended_counts = 0;
if ( CM )
{
estimate weights from counts at each context :
w_o4 = estimate_weight( order4 );
w_o2 = estimate_weight( order2 );
...
}
else PPM
{
estimate escape probability from counts at each context :
esc_o4 = estimate_escape( order4 );
esc_o2 = estimate_escape( order2 );
...
do exclude :
exclude counts_o4 from counts_o2
exclude counts_o4,counts_o2 from counts_o1
...
make weights :
w_o4 = (1 - esc_o4);
w_o2 = esc_o4 * (1 - esc_o2);
...
}
make blended counts :
blended_counts += w_o4 * counts_o4;
blended_counts += w_o2 * counts_o2;
...
arithmetic_code( blended_counts );
update counts :
counts_o4 += sym;
...
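Written out as code, the escape-to-weight mapping is just a running product : each order gets the probability mass of escaping every higher order and then not escaping here, which is exactly the weight PPM implicitly assigns to it. (Rough sketch, same caveats as before.)

// Convert per-order escape probabilities (highest order first) into mixing weights.
void escapes_to_weights( const double esc[], double w[], int norders )
{
    double escaped_so_far = 1.0;            // probability of having escaped all higher orders
    for ( int o = 0; o < norders; o++ )
    {
        w[o] = escaped_so_far * ( 1.0 - esc[o] );
        escaped_so_far *= esc[o];
    }
    // e.g. esc = { 0.2, 0.5, 0.5 } gives w = { 0.8, 0.1, 0.05 },
    // with the remaining 0.05 going to the order -1 model
}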
Note that I haven't mentioned whether we are doing a binary alphabet or a large alphabet, or any other practical issues, because they don't affect the algorithm in a theoretical way.
While I'm at it, let me take the chance to mark up the PPM pseudocode with where "modern" PPM differs from "classical" PPM :
(by "modern" I mean 2002/PPMii and by "classical" I mean 1995/"before PPMZ").
PPM :
make a few contexts of previous characters
order4 = * ((U32 *)(ptr-4));
etc..
also make non-contiguous contexts like
skip contexts : AxBx
contexts containing only a few top bits from each byte
contexts involving a word dictionary
contexts involving current position in the stream
look up observed counts from each context :
counts_o4 = lookup_stats( order4 );
counts_o2 = lookup_stats( order2 );
counts_o1 = lookup_stats( order1 );
counts_o0 = lookup_stats( order0 );
possibly rescale counts using a "SEE"-like operator
e.g. use the counts as an observation which you then model to predict the actual coding probabilities
estimate escape probability from counts at each context :
esc_o4 = estimate_escape( order4 );
esc_o2 = estimate_escape( order2 );
...
do secondary escape estimation, something like PPMZ (a rough SEE-like sketch follows after this pseudocode)
using not just the current context but also other contexts and side information
code in order from most likely best to least :
use LOE (local order estimation) to choose the best order to start from, not necessarily the largest context
also don't skip down through the full set, rather choose a reduced set
if ( arithmetic_code( (1-esc_o4) * counts_o4 ) ) return; else arithmetic_code( esc_o4 );
exclude counts_o4
if ( arithmetic_code( (1-esc_o2) * counts_o2 ) ) return; else arithmetic_code( esc_o2 );
exclude counts_o2
...
update counts :
counts_o4 += sym;
...
do "partial exclusion" like PPMii, do full update down to coded context
and then reduced update to parents to percolate out information a bit
do "inheritance" like PPMii - novel contexts updated from parents
do "fuzzy updates" - don't just update your context but also neighbors
which are judged to be similar in some way
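For the SEE / secondary escape estimation bullets above, here's a rough sketch of the kind of thing I mean : quantize a bit of context state (say total count and number of distinct symbols) into an index and keep an adaptive table of observed escape frequencies there, instead of trusting the raw estimate. The names and the particular quantization are purely illustrative, not PPMZ's or PPMii's actual tables :

// A tiny SEE-like secondary escape estimator.
struct SeeEntry { uint32_t escapes = 1, hits = 1; };    // small priors so the prob is never 0 or 1

struct SeeTable
{
    SeeEntry table[16][16];

    SeeEntry & entry( uint32_t total, uint32_t distinct )
    {
        // crude log2-ish quantization of the context state :
        auto quant = []( uint32_t x ) { int q = 0; while ( x > 1 && q < 15 ) { x >>= 1; q++; } return q; };
        return table[ quant(total) ][ quant(distinct) ];
    }

    double escape_prob( uint32_t total, uint32_t distinct )
    {
        SeeEntry & e = entry( total, distinct );
        return (double) e.escapes / ( e.escapes + e.hits );
    }

    void update( uint32_t total, uint32_t distinct, bool escaped )
    {
        SeeEntry & e = entry( total, distinct );
        ( escaped ? e.escapes : e.hits ) += 1;
        if ( e.escapes + e.hits > 1024 )                // periodic halving so it keeps adapting
        {
            e.escapes = ( e.escapes + 1 ) / 2;
            e.hits    = ( e.hits + 1 ) / 2;
        }
    }
};

The coder asks escape_prob() before each escape decision and calls update() with what actually happened; the raw escape estimate or other side information can be folded into the index too.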
And that's all folks.
1. Did you see this http://encode.ru/threads/541-Simple-bytewise-context-mixing-demo ?
2. My opinion is that PPM is a speed optimization of CM - imagine taking a CM and caching the mixed freqs of lower orders in the context node.
Your "green" and "tree" programs? Yeah, I downloaded them and like them very much, they're a good introduction to CM.
Why is no one else doing bytewise mixing? Have you tried making the weighting more generic instead of hand-tweaked?
Also, yeah I think it is a good way to see PPM as an optimization of CM. PPM is a particular type of simplified mixing in which you can stop doing any more work once you code from a context.