# cbloom rants

## 8/11/2010

### 08-11-10 - ROLZ and Links

I found this little tutorial on Fenwick Trees a little while ago. Peter's original paper is a better way to learn it, but the graphics on this page are really nice; you can really grock the structure of the tree best when you see it visually.

I also found this : Anatomy of ROLZ data archiver , which is the only actual algorithm description I've ever found of ROLZ , since Ilia doesn't write up his work. (there's also a brief description at the Russian Wikipedia ).

Anyway, it's pretty obvious how you would do ROLZ, there are few unexpected cool things on the "Anatomy of ROLZ data archiver" page.

1. The way he keeps the lists of offsets for each context by just stepping back through the history of the file already processed is pretty cool. It means there's no actual separate [context][maxoffsets] table at all, the offsets themselves are pointers back a linked list. It also means that you can do sliding-window trivially.

2. In the BALZnoROLZ.txt file he has Ilia Muraviev's binary probability updater :

```
//This is predictor of Ilya Muraviev
class TPredictor {
private:
unsigned short p1, p2;
public:
TPredictor(): p1(1 << 15), p2(1 << 15) {}
~TPredictor() {}
int P() {
return (p1 + p2);
}
void Update(int bit) {
if (bit) {
p1 += unsigned short(~p1) >> 3;
p2 += unsigned short(~p2) >> 6;
}
else {
p1 -= p1 >> 3;
p2 -= p2 >> 6;
}
}
};

```

First of all, let's back up a second, what is this? It's a probability update for binary arithmetic coding. A very standard way to do fast probability updates for binary arithmetic coding is to do :

```
#define PROB_ONE    (1<<14) // or whatever
#define PROB_UPD_SHIFT  (6) // or something

prob = PROB_ONE >> 1; // start at 1/2

if ( bit )
prob += (PROB_ONE - prob) >> PROB_UPD_SHIFT;
else
prob -= prob >> PROB_UPD_SHIFT;

```
what this is doing is when you get a zero bit :
```
prob *= (1 - 2^-PROB_UPD_SHIFT);

```
that's equivalent to a normal counting probability update if you put :
```
n1 = prob*N
n0 = N - n1

when I get a zero bit n0++ and N++

prob = n1 / N

so update is

prob := prob*N / (N+1)

or prob *= N / (N+1)

so

N/(N+1) = (1 - 2^-PROB_UPD_SHIFT)

which means

N = (2^PROB_UPD_SHIFT - 1)

```
then you keep prob and reset N; that is, this update is equivalent to pretending you have such an n0 and N and you increment them and compute the new probability, but then you don't actually store N, so the next update will have the same weight (if N increased then each update has a smaller effect than the last). This is an IIR filter that acts a bit like a moving average of the last N. The larger N is, the bigger window we are effectively using. A small N adapts very quickly.

So Ilia's probability update is a 2^3-1 and 2^6-1 window size, and then averaged. That's a very simple and neat idea that never occured to me - use two simple probability estimators, one that adapts very fast and one that adapts more slowly, and just blend them.

ryg said...

The blended probability estimator is really neat and simple. Never thought about that either.

One note about this type of estimator in general: there's a somewhat slicker way to write the update rule that usually leads to better code.

The general form for the usual expression is:

if (bit)
p += (max - p) >> n;
else
p -= p >> n;

You can munge the second expression a bit to make it look more like the first:

p -= p >> n;
<=> p += -(p >> n);
<=> p += (bias - p) >> n; (note this is an arithmetic right shift now!)

where bias = (1 << n) - 1 is a constant. So you write the whole update rule as:

p += ((bit ? max : bias) - p) >> n;

and compilers will usually be able to transform that into branchless code (either with cmovs or bit-fu). Fairly easy, but I never saw this mentioned anywhere and most code you find has the other expression (which is admittedly way easier to understand). I used this when size-optimizing the depacker for my executable compressor "kkrunchy".

Some bit of fairly useless trivia: When you're writing depacker code in x86 ASM and end up reading individual bits, the awesome thing for your GetBit function/macro to do is to just return the bit you just read in the carry flag. It's usually easy enough to arrange the code so this happens more or less automatically, and it's very useful to have:
- Carry is untouched by most non-arithmetic ops and inc/dec, so it's a fairly decent flag to return results in (unlike e.g. the zero flag)
- You can jump directly based on the carry flag (jc / jnc)
- You can materialize the bit as integer in an arbitrary register if you want: setc (for byte regs), xor reg, reg / adc reg, reg otherwise.
- You can turn it into a mask with one instruction (sbb reg, reg)

Another staple of tiny LZ decoders is the reordered gamma / Exp-Golomb code: instead of first encoding the length then the "value" bits, you interleave the two. The decoding loop is just:

int val = 1;
while (getbit()) val = val * 2 + getbit();
val--;

or the variant

val = 1;
do
val = val * 2 + getbit();
while (getbit());
val -= 2; // or just skip it

Complete x86 ASM code for this: (with getbit-returns-in-carry)

xor ecx, ecx
inc ecx
nextbit:
call getbit
call getbit
jc nextbit

which is pretty much the perfect way to encode match lens (and offsets for that matter) for tiny simple LZs. Both APack (DOS EXE-packer in the mid-90s) and NRV/UCL (the UPX back end) are based on this.

ryg said...

Holy shit I want a preformatted/code tag...

cbloom said...

"One note about this type of estimator in general: there's a somewhat slicker way to write the update rule that usually leads to better code. ...
and compilers will usually be able to transform that into branchless code "

Yeah, but in real code you almost always put that probability update right inside your arithmetic decode.

I guess you could actually make the whole arithmetic decode + update branchless, but then you often branch on the result of the decode anyway.

Certainly you should only have one branch for decode + prob update + act on decode result. People often have two or three or four branches to do that.

cbloom said...

"Some bit of fairly useless trivia: When you're writing depacker code in x86 ASM and end up reading individual bits, the awesome thing for your GetBit function/macro to do is to just return the bit you just read in the carry flag"

Oh yeah, I've always wanted that. Unfortunately it's pretty inexpressible in C :(

I'm finding your tidbits really interesting, you should write this stuff up.