# cbloom rants

## 2/14/2014

### 02-14-14 - Understanding ANS - 11

I want to do some hand waving about the different ways you can conceptually look at ANS.

Perhaps the best way to understand ANS mathematically is via the analogy with arithmetic coding . While ANS is not actually building up an arithmetic coder interval for the file, each step is very much like a LIFO arithmetic coder, and the (x/P) scaling is what makes x grow the right amount for each symbol. This is the most useful way to think about rANS or uANS, I find.

But there are other ways to think about it.

One is Duda's "asymmetric numeral system", which is how he starts the explanation in the paper, and really confused me to begin with. Now that we've come at ANS from the back way we can understand what he was on about.

The fundamental op in ANS is :

```
integer x contains some previous value

make x' = x scaled up in some way + new value

```
with a normal "symmetric numeral system" , you would just do base-b math :
```
x' = x * b + v

```
which gives you an x' where the old value (x) is distributed evenly, and the v's just cycle :
```
b = 3 for example

x':  0  1  2  3  4  5  6  7  8 ...
x :  0  0  0  1  1  1  2  2  2
v :  0  1  2  0  1  2  0  1  2

```
x' is a way of packing the old value x and the new value v together. This symmetric packing corresponds to the output string "012" in the parlance of this series. The growth factor (x'/x) determines the number of bits required to send our value, and it's uniform.

But it doesn't need to be uniform.

```
0102 :

x':  0  1  2  3  4  5  6  7  8 ...
x :  0  0  1  0  2  1  3  1  4
v :  0  1  0  2  0  1  0  2  0

```
Intuitively, the more often a symbol occurs in the output string, the more slots there are for the previous value (x) to get placed; that is, more bits of x can be sent in lower values of x' when the symbol occurs in many slots. Hence x' grows less. If you're thinking in terms of normalized x's, growing less means you have to output fewer bits to stay in the renormalization range.

You can draw these asymmetric numeral lines in different ways, which Duda does in the paper. For example :

```
input x as the axis line,
output x' in the table :

"AB"
0 1 2 3 4 5 6  x
A 0 2 4          x'
B 1 3 5

"AAB"

0 1 2 3 4 5 6  x
A 0 1 3 4 6 7 9  x'
B 2 5 8 11

output x' as the axis line
input x in the table :

"AB"
0 1 2 3 4 5 6  x'
A 0   1   2   3  x
B   0   1   2

"AAB"
0 1 2 3 4 5 6  x'
A 0 1   2 3   4  x
B     0     1

output x' line implicit
show x and output symbol :

"AB"
0 0 1 1 2 2 3
A B A B A B A

"AAB"
0 1 0 2 3 1 4
A A B A A B A

```
That is, it's a funny way of just doing base-b math; we're shifting up the place value and adding our value in, but we're in an "asymmetric numeral system", so the base is nonuniform. I find this mental image not very useful when thinking about how the coding actually works.

There's another way to think about tANS in particular (tANS = table-based ANS), which is what Yann is talking about .

To get there mentally, we actually need to optimize our tANS code.

When I covered tANS encoding before , I described it something like this :

```
x is the current state
x starts in the range I = [L,2L-1]

to encode the next symbol s
we need to reach the precursor range Is = [Fs,2Fs-1]

to do that, output bits from x
b = x&1; x >>= 1;
until x is lowered to reach Is

then take the state transition C()
this takes x back into I

```
this should be familiar and straightforward.

To optimize, we know that x always starts in a single power of 2 interval [L,2L-1] , and it always lands in a power of 2 interval [Fs,2Fs-1]. That means the minimum number of bits we ever output is from L to 2Fs-1 , and the maximum number of bits is only 1 more than that. So the renormalization can be written as :

```
precompute :

max_state = 2Fs - 1;
min_num_bits = floor(log2(L/Fs));

to renormalize :

x in [L,2L-1]
output min_num_bits from x
x >>= min_num_bits

now ( x >= Fs && x < 2*max_state );

if ( x > max_state ) output 1 more bit; x>>= 1;

now x in [Fs,2Fs-1]

```
But you can move the check for the extra output bit earlier, before shifting x down :
```
precompute :

min_num_bits = log2(L) - log2ceil(Fs);  // if L is power of 2
threshold = (2*Fs)<`<`num_bits;

to renormalize :

x in [L,2L-1]
num_bits = min_num_bits;
if ( x >= threshold ) num_bits ++;
output num_bits from x
x >>= num_bits

x in [Fs,2Fs-1]

```
and then use C(x) since x is now in Is.

It's just straightforward optimization, but it actually allows us to picture the whole process in a different way. Let's write the same encoder, but just in terms of a table index :

```
let t = x - L
t in [0,L-1]

t is a table index.

to encode s :

num_bits = min_num_bits[s] + ( t >= threshold[s] );
bitoutput( t, num_bits );
t = encode_state_table[s][ (t+L)>>num_bits ];

```
That is, we're going from a table index to another table index. We're no longer thinking about going back to the [Fs,2Fs-1] precursor range at all.

Before we got our desired code len by the scaling of the intervals [L,2L)/[Fs,2Fs) , now the code len is the stored number of bits. We can see that we get fractional bits because sometimes we output one more.

Let's revisit an example that we went through previously , but with this new image.

```
L = 8
Fs = {3,3,2}
output string = ABCABCAB

```
We can see right away that our table index t is 3 bits. To encode a 'C' there will be only two slots on our numeral line that correspond to a lower digit of C, so we must output 2 bits and keep 1 bit of t. To encode an 'A' we can keep 3 values, so we can output 1 bit for t in [0,3] and 2 bits for t in [4,7] ; that will give us 2 retained values in the first region and 1 retained value in the second.

Explicitly :

```
t in [0,7]
I want to encode an A
so I want to reach {AxxAxxAx}

t in [0,3]
output t&1
index = (t+L)>>1 = 4 or 5
take the last two A's {xxxAxxAx}
so state -> 3 or 6

t in [4,7]
output t&3
index = (t+L)>>2 = 3
take the first A {Axxxxxxx}
state -> 0

```
Note that the way we're doing it, high states transition to low states, and vice versa. These comes up because of the +L sentry bit method to separate the subranges produced by the shift.

The tANS construction creates this encode table :

```
encode:
A : b=1+(t>=4) : {0,3,6}
B : b=1+(t>=4) : {1,4,7}
C : b=2+(t>=8) : {2,5}

```
It should be obvious that we can now drop all our mental ideas about "ANS" and just make these coding tables directly. All you need is an output string, and you think about doing these kinds of mapping :
```
t in [0,7]

I want to encode a B

[xxxxxxxx] -> [xBxxBxxB]

output bits to reduce the 3 values
and transition to one of the slots with a B

```
The decode table is trivial to make from the inverse :
```
decode:
0: A -> 4 + getbits(2)
1: B -> 4 + getbits(2)
2: C -> 0 + getbits(2)
3: A -> 0 + getbits(1)
4: B -> 0 + getbits(1)
5: C -> 4 + getbits(2)
6: A -> 2 + getbits(1)
7: B -> 2 + getbits(1)

```
Note that each symbol's decode covers the entire origin state range :
```
decode:
0: A -> 4 + getbits(2)  from [4,7]
3: A -> 0 + getbits(1)  from [0,1]
6: A -> 2 + getbits(1)  from [2,3]

1: B -> 4 + getbits(2)  from [4,7]
4: B -> 0 + getbits(1)  from [0,1]
7: B -> 2 + getbits(1)  from [2,3]

2: C -> 0 + getbits(2)  from [0,3]
5: C -> 4 + getbits(2)  from [4,7]

```
During decode we can think about our table index 't' as containing two pieces of information : one is the current symbol to output, but there's also some information about the range where t will be on the next step. That is, the current t contains some bits of the next t. The number of bits depends on where we are in the table. eg. in the example above; when t=4 we specify a B, but we also specify 2 bits worth of the next t.

Doing another example from that earlier post :

```
Fs={7,6,3}

ABCABABACBABACBA

encode:
A : b=1+(t>=12) : {0,3,5,7,10,12,15}
B : b=1+(t>=8) : {1,4,6,9,11,14}
C : b=2+(t>=8) : {2,8,13}

decode:
0: A -> 12 + getbits(2)
1: B -> 8 + getbits(2)
2: C -> 8 + getbits(3)
3: A -> 0 + getbits(1)
4: B -> 12 + getbits(2)
5: A -> 2 + getbits(1)
6: B -> 0 + getbits(1)
7: A -> 4 + getbits(1)
8: C -> 0 + getbits(2)
9: B -> 2 + getbits(1)
10: A -> 6 + getbits(1)
11: B -> 4 + getbits(1)
12: A -> 8 + getbits(1)
13: C -> 4 + getbits(2)
14: B -> 6 + getbits(1)
15: A -> 10 + getbits(1)

```
and this concludes our conception of tANS in terms of just an [0,t-1] table.

I'm gonna be super redundant and repeat myself some more. I think it's intriguing that we went through all this ANS entropy coder idea, scaling values by (x/P) and so on, and from that we constructed tANS code. But you can get to the exact same tANS directly from the idea of the output string!

Let's invent tANS our new way, starting from scratch.

I'm given normalized frequencies {Fs}. Sum{Fs} = L. I want a state machine with L entries. Take each symbol and scatter it into our output string in some way.

To encode each symbol, I need to map the state machine index t in [0,L-1] to one of its occurances in the output string.

```
There are Fs occurances in the output string

I need to map an [0,L-1] value to an [0,Fs-1] value
by outputting either b or b+1 bits

```
now clearly if (L/Fs) is a power of 2, then the log2 of that is just b and we always output that many bits. (eg L=16, Fs=4, we just output 2 bits). In general if (L/Fs) is not a power of 2, then
```
b = floor(log2(L/Fs))
b+1 = ceil(log2(L/Fs))

```
so we just need two sub-ranges of L such that the total adds up to Fs :
```
threshold T
values < T output b bits
values >= T output b+1 bits

total of both ranges after output should equal Fs :

(T>>b) + (L-T)>>(b+1) = Fs

(2T + L-T)>>(b+1) = Fs

L+T = Fs<<(b+1)

T = (Fs<<(b+1)) - L

```
and that's it! We've just made a tANS encoder without talking about anything related to the ANS ideas at all.

The funny thing to me is that we got the exact same condition before from "b-uniqueness". That is, in order to be able to encode symbol s from any initial state, we worked out that the only valid precursor range was Is = [Fs,2*Fs-1] . That leads us to the renormalization loop :

```
while x > (2*Fs-1)
output a bit from x; x>>= 1;

```
And from that we computed a minimum number of output bits, and a threshold state for one more. That threshold we computed was
```
(max_state + 1)<`<`min_num_bits

= (2*Fs-1 + 1)<`<`b
= Fs<<(b+1)

```
which is the same.