EncodeMod is just the idea that you send each token (byte, word, nibble, whatever) with two ranges; in one range the values are terminal (no more tokens), while in the other range it means "this is part of the value" but more tokens follow. You can then optimize the division point for a wide range of applications.
In my original pseudo-code I was writing the ranges with the "more tokens" follow at the bottom, and
terminal values at the top. That is :
Specifically for the case of byte tokens and pow2 mod
mod = 1<
Fabian spotted that the code is slightly simpler if you switch the ranges. Use the low range
[0,upper) for terminal values and [upper,256) for non-terminal values. The ranges are the same, so you
get the same encoded lengths.
<
bits
in each token we send "bits" of values that don't currently fit
upper = 256 - mod
"upper" is the number of terminal values we can send in the current token
I was writing
[0,mod) = bits of value + more tokens follow
[mod,256) = terminal value
(BTW it also occurred to me when learning about ANS that EncodeMod is reminiscent of simple ANS. You're trying to send a bit - "do more bytes follow". You're putting that bit in a token, and you have some extra information you can send with that bit - so just put some of your value in there. The number of slots for bit=0 and 1 should correspond to the probability of each event.)
The switched encodemod is :
U8 *encmod(U8 *to, int val, int bits)
{
const int upper = 256 - (1<
The simplification of the encoder here :
<
bits); // binary, this is 1110000 or similar (8-bits ones, bits zeros)
while (val >= upper)
{
*to++ = (U8) (upper | val);
val = (val - upper) >> bits;
}
*to++ = (U8) val;
return to;
}
const U8 *decmod(int *outval, const U8 *from, int bits)
{
const int upper = 256 - (1<<
bits);
int shift = 0;
int val = 0;
for (;;)
{
int byte = *from++;
val += byte << shift;
if (byte < upper)
break;
shift += bits;
}
*outval = val;
return from;
}
*to++ = (U8) (upper | val);
val = (val - upper) >> bits;
written in long-hand is :
low = val & ((1<
Basically by using "upper" like this, the mask of low bits and add of upper is done in one op.
<
bits)-1);
*to++ = upper + low; // (same as upper | low, same as upper | val)
val -= upper;
val >>= bits;
or
val -= upper;
low = val & ((1<<
bits)-1);
*to++ = upper + low; // (same as upper | low, same as upper | val)
val >>= bits;
and the val -= upper can be done early or late because val >= upper it doesn't touch "low"