PPM :

make a few contexts of previous characters :
  order4 = * ((U32 *)(ptr-4)); etc..

look up observed counts from each context :
  counts_o4 = lookup_stats( order4 );
  counts_o2 = lookup_stats( order2 );
  counts_o1 = lookup_stats( order1 );
  counts_o0 = lookup_stats( order0 );

estimate escape probability from counts at each context :
  esc_o4 = estimate_escape( order4 );
  esc_o2 = estimate_escape( order2 );
  ...

code in order from most likely best to least :
  if ( arithmetic_code( (1-esc_o4) * counts_o4 ) ) return;
  else arithmetic_code( esc_o4 ); exclude counts_o4
  if ( arithmetic_code( (1-esc_o2) * counts_o2 ) ) return;
  else arithmetic_code( esc_o2 ); exclude counts_o2
  ...

update counts :
  counts_o4 += sym;
  ...
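For concreteness, here's a minimal sketch of what estimate_escape could look like, using the classic PPMC rule (escape count = number of distinct symbols seen). The Stats struct and the names here are just illustrative, not from any particular implementation :

  #include <cstdint>

  // hypothetical per-context statistics ; names are illustrative
  struct Stats
  {
      uint32_t counts[256] = { };   // observed count per symbol in this context
      uint32_t total = 0;           // sum of counts
      uint32_t distinct = 0;        // number of symbols with nonzero count
  };

  // PPMC-style escape estimate : P(escape) = distinct / (total + distinct)
  double estimate_escape_ppmc( const Stats & s )
  {
      if ( s.total == 0 ) return 1.0;   // never seen this context -> always escape
      return (double) s.distinct / (double)( s.total + s.distinct );
  }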
Now let's do context mixing :
CM :

make a few contexts of previous characters :
  order4 = * ((U32 *)(ptr-4)); etc..

look up observed counts from each context :
  counts_o4 = lookup_stats( order4 );
  counts_o2 = lookup_stats( order2 );
  counts_o1 = lookup_stats( order1 );
  counts_o0 = lookup_stats( order0 );

estimate weights from counts at each context :
  w_o4 = estimate_weight( order4 );
  w_o2 = estimate_weight( order2 );
  ...

make blended counts :
  counts = w_o4 * counts_o4 + w_o2 * counts_o2 + ...

now code :
  arithmetic_code( counts );
  ...

update counts :
  counts_o4 += sym;
  ...
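Similarly, here's a sketch of one possible estimate_weight heuristic. This particular formula is just an assumption for illustration (real mixers tune or learn the weighting) : trust a context more when it has more observations and a more peaked distribution :

  #include <cstdint>
  #include <algorithm>

  struct Stats
  {
      uint32_t counts[256] = { };
      uint32_t total = 0;
  };

  // illustrative heuristic, not a recommendation : weight rises with the amount of
  // data in the context and with how skewed its distribution is
  double estimate_weight_simple( const Stats & s )
  {
      if ( s.total == 0 ) return 0.0;
      uint32_t maxc = 0;
      for ( int i = 0; i < 256; i++ ) maxc = std::max( maxc, s.counts[i] );
      double confidence = (double) s.total / (double)( s.total + 16 );  // more observations -> more trust
      double peakedness = (double) maxc / (double) s.total;             // near-deterministic contexts -> more trust
      return confidence * peakedness;
  }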
It should be clear we can put them together :
make a few contexts of previous characters :
  order4 = * ((U32 *)(ptr-4)); etc..

look up observed counts from each context :
  counts_o4 = lookup_stats( order4 );
  counts_o2 = lookup_stats( order2 );
  counts_o1 = lookup_stats( order1 );
  counts_o0 = lookup_stats( order0 );

if ( CM )
{
  estimate weights from counts at each context :
    w_o4 = estimate_weight( order4 );
    w_o2 = estimate_weight( order2 );
    ...

  make blended counts :
    counts = w_o4 * counts_o4 + w_o2 * counts_o2 + ...

  // now code :
  arithmetic_code( counts );
  ...
}
else PPM
{
  estimate escape probability from counts at each context :
    esc_o4 = estimate_escape( order4 );
    esc_o2 = estimate_escape( order2 );
    ...

  code in order from most likely best to least :
    if ( arithmetic_code( (1-esc_o4) * counts_o4 ) ) return;
    else arithmetic_code( esc_o4 ); exclude counts_o4
    if ( arithmetic_code( (1-esc_o2) * counts_o2 ) ) return;
    else arithmetic_code( esc_o2 ); exclude counts_o2
    ...
}

update counts :
  counts_o4 += sym;
  ...

In particular, if we do our PPM in a rather inefficient way, we can make them very similar :
make a few contexts of previous characters :
  order4 = * ((U32 *)(ptr-4)); etc..

look up observed counts from each context :
  counts_o4 = lookup_stats( order4 );
  counts_o2 = lookup_stats( order2 );
  counts_o1 = lookup_stats( order1 );
  counts_o0 = lookup_stats( order0 );

accumulate into blended_counts :
  blended_counts = 0;

if ( CM )
{
  estimate weights from counts at each context :
    w_o4 = estimate_weight( order4 );
    w_o2 = estimate_weight( order2 );
    ...
}
else PPM
{
  estimate escape probability from counts at each context :
    esc_o4 = estimate_escape( order4 );
    esc_o2 = estimate_escape( order2 );
    ...

  do exclude :
    exclude counts_o4 from counts_o2
    exclude counts_o4, counts_o2 from counts_o1
    ...

  make weights :
    w_o4 = (1 - esc_o4);
    w_o2 = esc_o4 * (1 - esc_o2);
    ...
}

make blended counts :
  blended_counts += w_o4 * counts_o4;
  blended_counts += w_o2 * counts_o2;
  ...

arithmetic_code( blended_counts );

update counts :
  counts_o4 += sym;
  ...

Note that I haven't mentioned whether we are doing a binary alphabet or a large alphabet, or any other practical issues, because it doesn't affect the algorithm in a theoretical way.
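For concreteness, here's a sketch of the escape-to-weight blending above. It assumes the per-order counts already have exclusions applied, and it normalizes each order so the result is a proper mixture (which is what makes it match what PPM would code). Names and the 256-entry layout are illustrative :

  #include <vector>
  #include <cstddef>

  // counts_per_order[k] : 256 (excluded) counts for the k'th context, highest order first
  // esc_per_order[k]    : escape probability estimated for that context
  std::vector<double> blend_like_ppm(
      const std::vector< std::vector<double> > & counts_per_order,
      const std::vector<double> & esc_per_order )
  {
      std::vector<double> blended( 256, 0.0 );
      double reach = 1.0;   // probability that PPM coding reaches this order = product of escapes above it
      for ( size_t k = 0; k < counts_per_order.size(); k++ )
      {
          double w = reach * ( 1.0 - esc_per_order[k] );   // w_o4 = (1-esc_o4), w_o2 = esc_o4*(1-esc_o2), ...
          double total = 0.0;
          for ( double c : counts_per_order[k] ) total += c;
          if ( total > 0.0 )
              for ( int s = 0; s < 256; s++ )
                  blended[s] += w * counts_per_order[k][s] / total;
          reach *= esc_per_order[k];
      }
      return blended;   // feed this to the arithmetic coder as the blended distribution
  }

When the weights come from the escape chain like this, coding from the blended distribution gives the same probabilities PPM would have assigned by escaping down the orders; the difference is purely where the work is done.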
While I'm at it, let me take the chance to mark up the PPM pseudocode with where "modern" PPM differs from "classical" PPM : (by "modern" I mean 2002/PPMii and by "classical" I mean 1995/"before PPMZ").
PPM :

make a few contexts of previous characters :
  order4 = * ((U32 *)(ptr-4)); etc..

  modern : also make non-continuous contexts, like :
    skip contexts : AxBx
    contexts containing only a few top bits from each byte
    contexts involving a word dictionary
    contexts involving the current position in the stream

look up observed counts from each context :
  counts_o4 = lookup_stats( order4 );
  counts_o2 = lookup_stats( order2 );
  counts_o1 = lookup_stats( order1 );
  counts_o0 = lookup_stats( order0 );

  modern : possibly rescale the counts using a "SEE"-like operator,
    eg. use the counts as an observation which you then model to predict coding probabilities

estimate escape probability from counts at each context :
  esc_o4 = estimate_escape( order4 );
  esc_o2 = estimate_escape( order2 );
  ...

  modern : secondary-estimate the escape using something like PPMZ,
    using not just the current context but also other contexts and side information
    (a small sketch of this idea follows after this block)

code in order from most likely best to least :

  modern : use LOE to choose the best order to start from, not necessarily the largest context ;
    also don't skip down through the full set, rather choose a reduced set

  if ( arithmetic_code( (1-esc_o4) * counts_o4 ) ) return;
  else arithmetic_code( esc_o4 ); exclude counts_o4
  if ( arithmetic_code( (1-esc_o2) * counts_o2 ) ) return;
  else arithmetic_code( esc_o2 ); exclude counts_o2
  ...

update counts :
  counts_o4 += sym;
  ...

  modern : do "partial exclusion" like PPMii : do the full update down to the coded context,
    then a reduced update to the parents, to percolate the information out a bit
  modern : do "inheritance" like PPMii : novel contexts are initialized from their parents
  modern : do "fuzzy updates" : don't just update your context but also neighbors
    which are judged to be similar in some way
And that's all folks.
2 comments:
1. Did you see this
http://encode.ru/threads/541-Simple-bytewise-context-mixing-demo?
2. My opinion is that PPM is a speed optimization of CM - imagine taking a CM and caching the mixed freqs of lower orders in the context node.
Your "green" and "tree" programs? Yeah, I downloaded them and like them very much, they're a good introduction to CM.
Why is no one else doing bytewise mixing? Have you tried making the weighting more generic instead of hand-tweaked?
Also, yeah I think it is a good way to see PPM as an optimization of CM. PPM is a particular type of simplified mixing in which you can stop doing any more work once you code from a context.