ANS (TANS or RANS) in the straightforward implementation writes a large minimum number of bytes.
To be concrete I'll consider a particular extremely bad case : 64-bit RANS with 32-bit renormalization.
The standard coder is :
initialize encoder (at end of stream) :
x = 1<<31
renormalize so x stays in the range x >= (1<<31) and x < (1<<63)
flush encoder (at the beginning of the stream) :
output all 8 bytes of x
decoder initializes by reading 8 bytes of x
decoder renormalizes via :
if ( x < (1<<31) )
{
x <<= 32; x |= get32(ptr); ptr += 4;
}
decoder terminates and can assert that x == 1<<31
this coder outputs a minimum of 8 bytes, which means it wastes up to 7 bytes on low-entropy data
(assuming 1 byte minimum output and that the 1 byte required to byte-align output is not "waste").
In contrast, it's well known how to do minimal flush of arithmetic coders. When the arithmetic coder reaches the end, it has a "low" and "range" specifying an interval. "low" might be 64-bits, but you don't need to output them all, you only need to output enough such that the decoder will get something in the correct interval between "low" and "low+range".
Historically people often did arithmetic coder minimum flush assuming that the decoder would read zero-valued bytes after EOF. I no longer do that. I prefer to do a minimum flush such that decoder will get something in the correct interval no matter what byte follows EOF. This allows the decoder to just read past the end of your buffer with no extra work. (the arithmetic coder reads some # of bytes past EOF because it reads enough to fill "low" with bits, even though the top bits are all that are needed at the end of the stream).
The arithmetic coder minimum flush outputs a number of bytes proportional to log2(1/range) , which is the number of bits of information that are currently held pending in the arithmetic coder state, which is good. The excess is at most 1 byte.
So, to make ANS as clean as arithmetic coding we need a minimal flush. There are two sources of the waste in the normal ANS procedure outlined above.
One is the initial value of x (at the end of the stream). By setting x to (1<<31) , the low end of the renormalization interval, we have essentually filled it with bits it has to flush. (the pending bits in x is log2(x)). But those bits don't contain anything useful (except a value we can check at the end of decoding). One way to remove that waste is to stuff some other value in the initial state which contains bits you care about. Any value you initialize x with, you get back at the end of decoding, so then those bits aren't "wasted". But this can be annoying to find something useful to put in there, since you don't get that value out until the end of decoding.
The other source of waste is the final flush of x (at the beginning of the stream). This one is obvious - the # of pending bits stored in x at any time is log2(x). Clearly we should be flushing the final value of x in a # of bits proportional to log2(x).
So to do ANS minimal flush, here's one way :
initialize encoder (at end of stream) :
x = 0
renormalize so x stays in the range x < (1<<63)
flush encoder (at the beginning of the stream) :
output # of bytes with bits set in x, and those bytes
decoder initializes by reading variable # of bytes of x
decoder renormalizes via :
if ( x < (1<<31) )
{
if ( ptr < ptrend )
{
x <<= 32; x |= get32(ptr); ptr += 4;
}
}
decoder terminates and can assert that x == 0
This ANS variant will output only 1 byte on very-low-entropy data.
There are now two phases of the coder. In the beginning of encoding (at the ending of the stream), x is allowed to be way below the renormalization range. During this phase, encoding just puts information into x, and the value of x grows. (note that x can actually stay 0 and never hold any bits if your consists of entirely the bottom symbol in RANS). Once x grows up into the renormalization interval, you enter the next phase where bits of x are pushed to the output to keep x in the renormalization interval. Decoding, in the first phase you read bytes from the stread to fill x with bits and keep it in the renormalization interval. Once the decoder read pointer hits the end, you switch to the second phase, and now x is allowed to shrink below the renormalization minimum and you can continue to decode the remaining information held in it.
This appears to add an extra branch to the decoder renormalization, but that can be removed by duplicating your decoder into "not near the end" and "near the end" variants.
The #sigbit output of x at the head is just the right thing and should always be done in all variants of ANS.
The checking ptr vs. ptrend and starting x = 0 is the variant that I call "minimal ANS".
Unfortunately "minimal ANS" doesn't play well with the ILP multi-state interleaved ANS. To do interleaved ANS like this you would need an EOF marker for each state. That's possible in theory (and could be done compactly in theory) but is a pain in the butt in practice.