10-02-12 - Small note on Adaptive vs Static Modeling

Even most people who work in compression don't realize this, but in most cases Adaptive Models and Static Models can achieve essentially the same amount of compression.

Let me now try to make that note more precise :

With an adaptive model, to really do things right you must :

1. Initialize to a smart/tuned initial condition (not just 50/50 probabilities or an empty model)

2. Train the model with carefully tuned rates; perhaps faster learning at first, then slowing down;
perhaps different rates in different parts of the model (see the sketch after this list)

3. Reset the model smartly at data-change boundaries, or perhaps have multiple learning scales

4. Be careful not to make the adaptive model too big for your data; eg. don't use a huge model space
that will be overly sparse on small files, but also don't use a tiny model that can't capture the
statistics of big files
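
To make points 1 and 2 concrete, here is a minimal sketch of a shift-based binary adaptive probability, assuming a binary arithmetic coder back end; the initial probability, the rate schedule, and all the names are illustrative placeholders, not taken from any particular codec :

#include <cstdint>

// One adaptive binary probability in 16-bit fixed point (65536 ~ 1.0).
struct AdaptiveBit
{
    uint16_t p1;     // P(bit == 1)
    uint32_t count;  // updates seen since the last reset

    // point 1 : start from a tuned prior, not 50/50 and not an empty model
    explicit AdaptiveBit(uint16_t tunedP1 = 20000) : p1(tunedP1), count(0) { }

    // point 2 : learn fast at first, then slow down (a bigger shift adapts slower)
    void update(int bit)
    {
        ++count;
        int shift = (count < 16) ? 3 : (count < 256) ? 4 : 6;
        if (bit) p1 = (uint16_t)(p1 + ((65535 - p1) >> shift));
        else     p1 = (uint16_t)(p1 - (p1 >> shift));
    }

    // point 3 : at a data-change boundary, blend back toward the prior
    // rather than keeping stale statistics or wiping them entirely
    void softReset(uint16_t tunedP1 = 20000)
    {
        p1 = (uint16_t)((p1 + tunedP1) / 2);
        count = 0;
    }
};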

With a static model, to do things right you must :

1. Transmit the model very compactly, using assumptions about what the model typically looks like;
transmit model refreshes as deltas

2. Send model refreshes in the appropriate places; the encoder must optimally choose model refresh points

3. Be able to send variable amounts of model; eg. with order-1 Huffman decide which contexts get their
own statistics and which go into a shared group (see the sketch after this list)

4. Be able to send the model with varying degrees of precision; eg. be able to approximate when that's better
for the overall size(model) + size(coded bits)
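
Here is a sketch of the kind of decision point 3 implies, for one order-1 context : give it its own statistics only if its estimated coded size plus a table-header cost beats coding it against the shared group. The smoothing, the header-cost parameter, and the function names are all illustrative assumptions :

#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Estimated bits to code 'counts' using the distribution implied by 'model'
// (cross-entropy; equals the entropy estimate when model == counts).
static double codedBits(const std::vector<uint32_t> & counts,
                        const std::vector<uint32_t> & model)
{
    double modelTotal = 0;
    for (uint32_t m : model) modelTotal += m;

    double bits = 0;
    for (std::size_t s = 0; s < counts.size(); ++s)
        if (counts[s] > 0)
            bits -= counts[s] * std::log2((model[s] + 0.5) / (modelTotal + 0.5 * model.size()));
    return bits;
}

// point 3 : does this context earn its own statistics, or should it be
// folded into the shared group?  tableHeaderBits is a placeholder estimate
// of the cost of transmitting one more table.
static bool deservesOwnTable(const std::vector<uint32_t> & contextCounts,
                             const std::vector<uint32_t> & groupCounts,
                             double tableHeaderBits)
{
    double own    = codedBits(contextCounts, contextCounts) + tableHeaderBits;
    double shared = codedBits(contextCounts, groupCounts);
    return own < shared;
}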

We've seen over and over in compression that these can be the same. For example with linear-prediction lossless image compression, assuming you are doing LSQR fits to make predictors, you can either use the local neighborhood and recompute the LSQR fit in the decoder each time, or you can transmit the LSQR fits at the start of the file. It turns out that either way compression is about the same (!! BUT only if the encoder in the latter case is quite smart about deciding how many fits to send, how precise they are, and which pixels they apply to).
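
For the record, here is a sketch of the fit itself, assuming a 3-tap causal predictor from the W, N, and NW neighbors; it's just ordinary least squares via the normal equations, and it's the piece shared by both variants (the decoder refitting from the local neighborhood, or the encoder fitting per region and transmitting the quantized coefficients). The names and the 3-tap choice are my own illustration :

#include <array>
#include <cmath>
#include <utility>
#include <vector>

// One training sample : the causal neighbors and the actual pixel value.
struct Sample { double w, n, nw, value; };

// Ordinary least squares fit of value ~ a*W + b*N + c*NW via the normal
// equations, solved by Gauss-Jordan elimination with partial pivoting.
static std::array<double, 3> fitPredictor(const std::vector<Sample> & samples)
{
    double A[3][3] = {};   // X^T X
    double b[3]    = {};   // X^T y
    for (const Sample & s : samples)
    {
        const double x[3] = { s.w, s.n, s.nw };
        for (int i = 0; i < 3; ++i)
        {
            for (int j = 0; j < 3; ++j) A[i][j] += x[i] * x[j];
            b[i] += x[i] * s.value;
        }
    }

    for (int col = 0; col < 3; ++col)
    {
        // partial pivoting : bring the largest remaining entry to the diagonal
        int pivot = col;
        for (int r = col + 1; r < 3; ++r)
            if (std::fabs(A[r][col]) > std::fabs(A[pivot][col])) pivot = r;
        for (int j = 0; j < 3; ++j) std::swap(A[col][j], A[pivot][j]);
        std::swap(b[col], b[pivot]);

        if (A[col][col] == 0.0) continue;  // degenerate neighborhood

        // eliminate this column from every other row
        for (int r = 0; r < 3; ++r)
        {
            if (r == col) continue;
            double f = A[r][col] / A[col][col];
            for (int j = 0; j < 3; ++j) A[r][j] -= f * A[col][j];
            b[r] -= f * b[col];
        }
    }

    std::array<double, 3> coeff = { 0.0, 0.0, 0.0 };
    for (int i = 0; i < 3; ++i)
        if (A[i][i] != 0.0) coeff[i] = b[i] / A[i][i];
    return coeff;
}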

Same thing with coding the residuals of predictions in images. You can either do an adaptive coder (which needs to be pretty sophisticated these days; it should have variable learning rates and tiers, a la the Fenwick symbol-ranking work; most people do this without realizing it just by having independent statistics for the low values and the high values) or you can create static shaped Laplacian models and select a model for each coefficient. It turns out they are about the same.
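
A sketch of the static side of that choice : estimate the coded size of a coefficient's residuals under a two-sided geometric ("discrete Laplacian") model with parameter theta, and pick the cheapest theta from a small fixed codebook; the picked index is what gets signaled in the stream instead of adapting. The codebook idea and all names here are illustrative assumptions :

#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Code length in bits of one residual under a two-sided geometric
// ("discrete Laplacian") model with parameter theta in (0,1) :
//   P(r) = ((1 - theta) / (1 + theta)) * theta^|r|
static double laplacianBits(int r, double theta)
{
    return -std::log2((1.0 - theta) / (1.0 + theta))
           - std::abs(r) * std::log2(theta);
}

// Pick, for one coefficient's residuals, the cheapest model from a small
// fixed codebook of thetas; a static scheme signals this index rather than
// adapting per sample.
static int pickLaplacian(const std::vector<int> & residuals,
                         const std::vector<double> & thetaCodebook)
{
    int    best     = 0;
    double bestBits = 1e300;
    for (std::size_t m = 0; m < thetaCodebook.size(); ++m)
    {
        double bits = 0;
        for (int r : residuals) bits += laplacianBits(r, thetaCodebook[m]);
        if (bits < bestBits) { bestBits = bits; best = (int)m; }
    }
    return best;
}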

The trade-off is that the static-model way needs a very sophisticated encoder which can optimize the total size (sizeof(transmitted model) + sizeof(coded bits)), but in exchange it gets a simpler decoder.

(caveat : this is not applicable to compressors where the model is huge, like PPM/CM/etc.)

A lot of people incorrectly think that adaptive models offer better compression. That's not really true, but it is *much* easier to write a compressor that achieves good compression with an adaptive model. With static models there is a huge over-complete set of ways to encode the data, and you need a very complex optimizing encoder to find the smallest representation (see, eg. video coders).

Even something as simple as doing order-0 Huffman and choosing the optimal points to retransmit the model is a very hard unsolved problem. And that's just the very tip of the iceberg for static models; even just staying with order-0 Huffman you could do much more: eg. instead of retransmitting a whole model, send a delta. Instead of sending the delta to the ideal code lengths, send a smaller delta to non-ideal code lengths (when that makes a smaller total length); instead of sending new code lengths, select one of your previous Huffmans. Perhaps have 16 known Huffmans that you can select from without transmitting anything (which would help a lot on small buffers). Etc. etc. It's very complex.
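
As a sketch of just the first couple of layers of that decision, per block : keep the previous code lengths for free, send a delta, or send a whole new table, whichever minimizes header plus coded bits. The per-symbol header costs are crude placeholders, and a real encoder would consider many more options (non-ideal lengths, stock tables, refresh placement) :

#include <cstddef>
#include <cstdint>
#include <vector>

// Coded size of a block, in bits, given per-symbol code lengths.
static double blockBits(const std::vector<uint32_t> & counts,
                        const std::vector<int> & codeLens)
{
    double bits = 0;
    for (std::size_t s = 0; s < counts.size(); ++s)
        bits += (double)counts[s] * codeLens[s];
    return bits;
}

// Three of the many choices an optimizing encoder has per block :
//   0 = keep the previous code lengths (no header at all)
//   1 = send a delta from the previous lengths (cheap when few symbols change)
//   2 = send a whole new set of lengths
static int chooseHuffmanUpdate(const std::vector<uint32_t> & counts,
                               const std::vector<int> & prevLens,
                               const std::vector<int> & idealLens)
{
    const double bitsPerChangedLen = 4.0;  // placeholder delta-header cost
    const double bitsPerNewLen     = 5.0;  // placeholder full-header cost

    int changed = 0;
    for (std::size_t s = 0; s < prevLens.size(); ++s)
        if (prevLens[s] != idealLens[s]) ++changed;

    double keepCost  = blockBits(counts, prevLens);
    double deltaCost = blockBits(counts, idealLens)
                     + (double)idealLens.size()              // 1 flag bit per symbol
                     + bitsPerChangedLen * changed;          // plus a delta per change
    double newCost   = blockBits(counts, idealLens)
                     + bitsPerNewLen * (double)idealLens.size();

    if (keepCost <= deltaCost && keepCost <= newCost) return 0;
    return (deltaCost <= newCost) ? 1 : 2;
}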

Another issue with static models is that you really need to boil the data down to its simplest form for them to work well. For example, with images you want to be in post-predictor space with bias adjusted and all that gubbins before using a static model; on text you want to be in post-BWT space or something like that; eg. you want to get as close to decorrelated as possible. With adaptive models it's much easier to just toss in some extra context bits and let the model do the decorrelation for you. Put another way, static models need much more human guidance and study in their design to be minimal, whereas adaptive models work much better when you treat them badly.
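
A tiny sketch of the "toss in some extra context bits" side, assuming image residual coding : quantize a couple of neighbor gradients into two bits each and use them to pick one of 16 independent adaptive models. The particular context and the names are arbitrary illustration :

#include <cstdlib>

// Placeholder for whatever adaptive model the coder actually uses
// (eg. built from the AdaptiveBit sketch near the top).
struct AdaptiveModel { /* ... */ };

// Bucket a signed neighbor-derived value into 2 bits of context.
static int quantize2Bits(int v)
{
    int a = std::abs(v);
    return (a == 0) ? 0 : (a < 4) ? 1 : (a < 16) ? 2 : 3;
}

// "Toss in some extra context bits" : 4 bits of context -> 16 models,
// and let adaptation do the decorrelation for you.
static AdaptiveModel & selectModel(AdaptiveModel (&models)[16],
                                   int gradWest, int gradNorth)
{
    int context = (quantize2Bits(gradWest) << 2) | quantize2Bits(gradNorth);
    return models[context];
}

The models downstream of that context split learn the conditional statistics on their own, which is exactly the hand-tuning work that the static approach pushes onto the human.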
