First you define a pattern. Mine was :
"coder" pattern implements :
void reset();
int decode( arithmetic_decoder * dec );
void encode( arithmetic_encoder * enc, int val );
int cost(int val) const;
where encode & decode also both adapt the model (if any).
You can then implement that pattern in various ways.
Perhaps the first "coder" you implement is just a single modeled bit :
template <int t_tot_shift, int t_upd_shift>
struct arithbit_updshift
{
    U16 m_p0;

    void reset()
    {
        m_p0 = 1<<(t_tot_shift-1);
    }

    .. etc ..
};
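The ".. etc .." elides the actual coding and adaptation. As a rough sketch of what the encode/decode of a shift-update bit model do, it's something like the following (encode_bit/decode_bit are hypothetical stand-ins for whatever binary arithmetic coder interface you actually have, not necessarily the real calls) :

// sketch only : members of arithbit_updshift ; encode_bit/decode_bit are hypothetical
void encode( arithmetic_encoder * enc, int bit )
{
    enc->encode_bit( bit, m_p0, t_tot_shift ); // code "bit" with P(0) = m_p0 / (1<<t_tot_shift)
    // shift-update adaptation : move m_p0 toward the bit just seen
    if ( bit ) m_p0 -= m_p0 >> t_upd_shift;
    else       m_p0 += ((1<<t_tot_shift) - m_p0) >> t_upd_shift;
}

int decode( arithmetic_decoder * dec )
{
    int bit = dec->decode_bit( m_p0, t_tot_shift ); // hypothetical call
    if ( bit ) m_p0 -= m_p0 >> t_upd_shift;
    else       m_p0 += ((1<<t_tot_shift) - m_p0) >> t_upd_shift;
    return bit;
}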
And then you do some N-ary coders. The standard ones being :

template <int t_alphabet_size>
struct counting_coder;

template <int t_numbits, typename t_arithbit>
struct bitwise_topdown_coder;

template <int t_numbits, typename t_arithbit>
struct bitwise_bottomup_coder;

template <int t_maxval, typename t_arithbit>
struct unary_coder;

I'm not going to show implementations of all of these in this post, but I'll do a few here and there for clarity. The point is more about the architecture than the implementation.

For example :

template <int t_numbits, typename t_arithbit>
struct bitwise_topdown_coder
{
    enum { c_numvals = 1<<t_numbits };

    //t_arithbit m_bins[c_numvals-1]; // <- what's needed (use ctx-1 index)
    t_arithbit m_bins[c_numvals]; // padded for simpler addressing (use ctx index)

    int decode( arithmetic_decoder * dec )
    {
        int ctx = 1;
        for(int i=0;i<t_numbits;i++)
        {
            int bit = m_bins[ctx].decode(dec);
            ctx <<= 1; ctx |= bit;
        }
        return ctx & (c_numvals-1);
    }

    .. etc ..
};
and "t_arithbit" is your base coder pattern for doing a single modeled bit. By making that a template parameter it can be something like arithbit_countn0n1 , or a template like arithbit_updshift<12,5> , etc.
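The elided encode side of bitwise_topdown_coder just mirrors the decode loop, walking the bits top-down and using the prefix of already-coded bits as the context; a sketch :

// sketch : member of bitwise_topdown_coder
void encode( arithmetic_encoder * enc, int val )
{
    int ctx = 1;
    for(int i=t_numbits-1;i>=0;i--)
    {
        int bit = (val>>i) & 1;
        m_bins[ctx].encode(enc,bit);
        ctx <<= 1; ctx |= bit;
    }
}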
The fun thing is you can start combining them to make new coders which also fit the pattern. For example, say you want to do a bitwise decomposition coder, but you don't have an even power of 2 worth of values? And maybe you don't want to put your split points right in the middle?

// t_fracmul256 is the split fraction times 256
// eg. t_fracmul256 = 128 is middle splits (binary subdivision)
// t_fracmul256 = 0 is a unary coder
template <int t_numvals, int t_fracmul256, typename t_arithbit>
struct bitwise_split_coder
{
    enum { c_numvals = t_numvals };
    enum { c_numlo = RR_CLAMP( ((t_numvals*t_fracmul256)/256) , 1, (t_numvals-1) ) };
    enum { c_numhi = t_numvals - c_numlo };

    t_arithbit m_bin;
    bitwise_split_coder<c_numlo,t_fracmul256,t_arithbit> m_lo;
    bitwise_split_coder<c_numhi,t_fracmul256,t_arithbit> m_hi;

    void reset()
    {
        m_bin.reset();
        m_lo.reset();
        m_hi.reset();
    }

    void encode(arithmetic_encoder * enc,int val)
    {
        if ( val < c_numlo )
        {
            m_bin.encode(enc,0);
            m_lo.encode(enc,val);
        }
        else
        {
            m_bin.encode(enc,1);
            m_hi.encode(enc, val - c_numlo );
        }
    }

    .. etc ..
};
+ explicit template specialize for t_numvals=1,2

This lets you compile-time generate funny branching trees to be able to handle something like "my alphabet has 37 symbols, and I want to code each interval as a binary flag at 1/3 of the range, so the first event is [0-12][13-36]" and so on.
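For concreteness, the terminal specializations that stop the recursion might look something like this (a sketch; a 1-value alphabet carries no information so it codes nothing, and a 2-value alphabet is just a single t_arithbit) :

// sketch of the terminal specializations
template <int t_fracmul256, typename t_arithbit>
struct bitwise_split_coder<1,t_fracmul256,t_arithbit>
{
    void reset() { }
    void encode(arithmetic_encoder * enc,int val) { /* nothing to send */ }
    int decode(arithmetic_decoder * dec) { return 0; }
};

template <int t_fracmul256, typename t_arithbit>
struct bitwise_split_coder<2,t_fracmul256,t_arithbit>
{
    t_arithbit m_bin;
    void reset() { m_bin.reset(); }
    void encode(arithmetic_encoder * enc,int val) { m_bin.encode(enc,val); }
    int decode(arithmetic_decoder * dec) { return m_bin.decode(dec); }
};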
And then you can make yourself some generic tools for plugging coders together. The main ones I use are :

// val_split_coder :
// use t_low_coder below t_low_count
// then t_high_coder above t_low_count
template <int t_low_count, typename t_low_coder, typename t_high_coder , typename t_arithbit>
struct val_split_coder
{
    t_arithbit m_bin;
    t_low_coder m_lo;
    t_high_coder m_hi;

    void encode(arithmetic_encoder * enc,int val)
    {
        if ( val < t_low_count )
        {
            m_bin.encode(enc,0);
            m_lo.encode(enc,val);
        }
        else
        {
            m_bin.encode(enc,1);
            m_hi.encode(enc, val - t_low_count );
        }
    }

    .. etc ..
};
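The elided decode is the mirror image: decode the split flag, then decode from the matching sub-coder and add the offset back on. Roughly :

// sketch : member of val_split_coder
int decode(arithmetic_decoder * dec)
{
    if ( m_bin.decode(dec) == 0 )
        return m_lo.decode(dec);
    else
        return m_hi.decode(dec) + t_low_count;
}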
// bit_split_coder :
// use t_low_coder for t_low_bits
// use t_high_coder for higher bits
// (high and low bits are independent)
template <int t_low_bit_count, typename t_low_coder, typename t_high_coder >
struct bit_split_coder
{
    t_low_coder m_lo;
    t_high_coder m_hi;

    void encode(arithmetic_encoder * enc,int val)
    {
        int low = val & ((1<<t_low_bit_count)-1);
        int high = val >> t_low_bit_count;
        m_lo.encode(enc,low);
        m_hi.encode(enc,high);
    }

    .. etc ..
};
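Decode just pulls the two fields back out in the same order and recombines them; a sketch :

// sketch : member of bit_split_coder
int decode(arithmetic_decoder * dec)
{
    int low = m_lo.decode(dec);
    int high = m_hi.decode(dec);
    return low | (high << t_low_bit_count);
}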
// bit_split_coder_contexted :
// split bits, code hi then low with hi as context
// (used for decomposition of a value where the bits are dependent)
template <int t_low_bit_count, typename t_low_coder, int t_high_bit_count, typename t_high_coder >
struct bit_split_coder_contexted
{
    t_high_coder m_hi;
    t_low_coder m_lo[(1<<t_high_bit_count)];

    void encode(arithmetic_encoder * enc,int val)
    {
        int high = val >> t_low_bit_count;
        int low = val & ((1<<t_low_bit_count)-1);
        m_hi.encode(enc,high);
        m_lo[high].encode(enc,low);
    }

    .. etc ..
};
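Note that the ordering matters here: the high bits must be decoded first so they can select the context for the low bits, exactly as in encode. A sketch :

// sketch : member of bit_split_coder_contexted
int decode(arithmetic_decoder * dec)
{
    int high = m_hi.decode(dec);
    int low = m_lo[high].decode(dec);
    return (high << t_low_bit_count) | low;
}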
So that gives us a bunch of tools. Then you put them together and make complicated things.

For example, my LZ match length coder is this :

val_split_coder< 8 ,
    bitwise_topdown_coder< 3 , arithbit_updshift<12,5> > ,  // low val coder
    numsigbit_coder< unary_coder<16, arithbit_updshift<12,5> > > ,  // high val coder
    arithbit_updshift<12,5> >
which says : first code a binary event for length < 8 or not. If length < 8 then code it as a 3-bit binary value. If >= 8 then code it using a num-sig-bits decomposition, where the # of sig bits is written using unary (arithmetic coded) and the remainder is sent raw.
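Because the composed type is itself just another "coder", using it is nothing special; something like this (the typedef name, the enc variable, and the len value here are made up for illustration) :

typedef val_split_coder< 8 ,
    bitwise_topdown_coder< 3 , arithbit_updshift<12,5> > ,
    numsigbit_coder< unary_coder<16, arithbit_updshift<12,5> > > ,
    arithbit_updshift<12,5> > t_match_len_coder;

t_match_len_coder lencoder;
lencoder.reset();
// then, per match :
lencoder.encode(enc, len);  // "enc" is your encoder state, "len" the (possibly biased) match length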
And the standard LZ offset coder that I described in the last post is something like this :
val_split_coder< 64 ,
    bitwise_topdown_coder< 6 , arithbit_updshift<12,5> > ,  // low vals
    bit_split_coder< 5 ,  // high vals
        bitwise_bottomup_coder< 5 , arithbit_updshift<12,5> > ,  // low bits of high vals
        numsigbit_coder< unary_coder<30, arithbit_updshift<12,5> > >  // high bits of high vals
    > ,
    arithbit_updshift<12,5>
>
To be clear : the advantage of this approach is that you can easily play with different variations and decompositions, plug in different coders for each portion of the operation, etc.
The code generated by this is very good, but once you fully lock down your coders you probably want to copy out the exact cases you used and hand code them, since the human eye can do things the optimizer can't.