First you define a pattern. Mine was :
"coder" pattern implements :
void reset();
int decode( arithmetic_decoder * dec );
void encode( arithmetic_encoder * enc, int val );
int cost(int val) const;
where encode & decode also both adapt the model (if any).
You can then implement that pattern in various ways.
Perhaps the first "coder" pattern is just a single bit :
template
<
int t_tot_shift, int t_upd_shift>
struct arithbit_updshift
{
U16 m_p0;
void reset()
{
m_p0 = 1<<(t_tot_shift-1);
}
.. etc ..
};
And then you do some N-ary coders.
The standard ones being :
template
I'm not going to show implementations of all of these in this post, but I'll do a few here and there
for clarity. The point is more about the architecture than the implementation.
<
int t_alphabet_size>
struct counting_coder;
template <
int t_numbits, typename t_arithbit>
struct bitwise_topdown_coder;
template <
int t_numbits, typename t_arithbit>
struct bitwise_bottomup_coder;
template <
int t_maxval, typename t_arithbit>
struct unary_coder;
For example :
template
and "t_arithbit" is your base coder pattern for doing a single modeled bit. By making that a template parameter
it can be something like
<
int t_numbits, typename t_arithbit>
struct bitwise_topdown_coder
{
enum { c_numvals = 1<<
t_numbits };
//t_arithbit m_bins[c_numvals-1]; // <- what's needed (use ctx-1 index)
t_arithbit m_bins[c_numvals]; // padded for simpler addressing (use ctx index)
int decode( arithmetic_decoder * dec )
{
int ctx = 1;
for(int i=0;i<
t_numbits;i++)
{
int bit = m_bins[ctx].decode(dec);
ctx <<= 1; ctx |= bit;
}
return ctx & (c_numvals-1);
}
etc ...
};
arithbit_countn0n1
or a template like :
arithbit_updshift<12,5>
etc.
The fun thing is you can start combining them to make new coders which also fit the pattern.
For example, say you want to do a bitwise decomposition coder, but you don't have an even power of 2 worth
of values? And maybe you don't want to put your split points right in the middle?
// t_fracmul256 is the split fraction times 256
// eg. t_fracmul256 = 128 is middle splits (binary subdivision)
// t_fracmul256 = 0 is a unary coder
template
This lets you compile-time generate funny branching trees to be able to handle something like
"my alphabet has 37 symbols, and I want to code each interval as a binary flag at 1/3 of the range,
so the first event is [0-12][13-37]" and so on.
<
int t_numvals, int t_fracmul256, typename t_arithbit>
struct bitwise_split_coder
{
enum { c_numvals = t_numvals };
enum { c_numlo = RR_CLAMP( ((t_numvals*t_fracmul256)/256) , 1, (t_numvals-1) ) };
enum { c_numhi = t_numvals - c_numlo };
t_arithbit m_bin;
bitwise_split_coder<
c_numlo,t_fracmul256,t_arithbit> m_lo;
bitwise_split_coder<
c_numhi,t_fracmul256,t_arithbit> m_hi;
void reset()
{
m_bin.reset();
m_lo.reset();
m_hi.reset();
}
void encode(arithmetic_encoder * enc,int val)
{
if ( val < c_numlo )
{
m_bin.encode(arith,0);
m_lo.encode(arith,val);
}
else
{
m_bin.encode(arith,1);
m_hi.encode(arith, val - c_numlo );
}
}
.. etc ..
};
+ explicit template specialize for t_numvals=1,2
And then you can make yourself some generic tools for plugging coders together. The main ones I
use are :
// val_split_coder :
// use t_low_coder below t_low_count
// then t_high_coder above t_low_count
template
So that gives us a bunch of tools. Then you put them together and make complicated things.
<
int t_low_count, typename t_low_coder, typename t_high_coder , typename t_arithbit>
struct val_split_coder
{
t_arithbit m_bin;
t_low_coder m_lo;
t_high_coder m_hi;
void encode(arithmetic_encoder * enc,int val)
{
if ( val < t_low_count )
{
m_bin.encode(arith,0);
m_lo.encode(arith,val);
}
else
{
m_bin.encode(arith,1);
m_hi.encode(arith, val - t_low_count );
}
}
.. etc ..
};
// bit_split_coder :
// use t_low_coder for t_low_bits
// use t_high_coder for higher bits
// (high and low bits are independent)
template <
int t_low_bit_count, typename t_low_coder, typename t_high_coder >
struct bit_split_coder
{
t_low_coder m_lo;
t_high_coder m_hi;
void encode(arithmetic_encoder * enc,int val)
{
int low = val & ((1<<
t_low_bit_count)-1);
int high = val >> t_low_bit_count;
m_lo.encode(enc,low);
m_hi.encode(enc,high);
}
.. etc ..
};
// bit_split_coder_contexted :
// split bits, code hi then low with hi as context
// (used for decomposition of a value where the bits are dependent)
template <
int t_low_bit_count, typename t_low_coder, int t_high_bit_count, typename t_high_coder >
struct bit_split_coder_contexted
{
t_high_coder m_hi;
t_low_coder m_lo[(1<<
t_high_bit_count)];
void encode(arithmetic_encoder * enc,int val)
{
int high = val >> t_low_bit_count;
int low = val & ((1<<
t_low_bit_count)-1);
m_hi.encode(enc,high);
m_lo[high].encode(enc,low);
}
.. etc ..
};
For example, my LZ match length coder is this :
val_split_coder< 8 ,
bitwise_topdown_coder< 3 , arithbit_updshift<12,5> > , // low val coder
numsigbit_coder< unary_coder<16, arithbit_updshift<12,5> > > , // high val coder
arithbit_updshift<12,5> >
which says : first code a binary event for length < 8 or not. If length < 8 then code it as a 3-bit
binary value. If >= 8 then code it using a num-sig-bits decomposition where the # of sig bits are
written using unary (arithmetic coded) and the remainder is sent raw.
And the standard LZ offset coder that I described in the last post is something like this :
val_split_coder< 64 ,
bitwise_topdown_coder< 6 , arithbit_updshift<12,5> > , // low vals
bit_split_coder< 5 , // high vals
bitwise_bottomup_coder< 5 , arithbit_updshift<12,5> > , // low bits of high vals
numsigbit_coder< unary_coder<30, arithbit_updshift<12,5> > > , // high bits of high vals
> ,
arithbit_updshift<12,5>
>
To be clear : the advantage of this approach is that you can easily play with different variations and decompositions, plug in different coders for each portion of the operation, etc.
The code generated by this is very good, but once you fully lock down your coders you probably want to copy out the exact cases you used and hand code them, since the human eye can do things the optimizer can't.
Since this is for development and not for code that would ever actually ship, I wonder how good you could get things by using conventional constants rather than template parameters.
ReplyDeleteYou get a structural win in that commensurate patterns are automatically reflected in the encode/decode methods...but those being the only two methods that really need to be in sync somewhat blunts the benefit since you are essentially trading off 2x repetition for having to deal with template metaprogramming. The essential dilemma is that you really do want metaprogramming sometimes, but C++ template metaprogramming is such a bad option a lot of times. I sure wish it weren't!
Well, you can't actually use constants because constants in C aren't properly compile-time.
ReplyDelete(they should be)
If constants were better in the compiler, and branches that evaluated to
if ( false )
were not even attempted to be compiled, then you could do a lot more with just constants.
Some of the stuff here could be done with macros, things like ENCODE_VAL_SPLIT / DECODE_VAL_SPLIT , sure could be macros. But IMO macros just worse than templates in almost every way except compile time and they optimize better.
Some of the stuff here can't be done with anything but templates, like bitwise_split_coder which is recursive and requires the template-specialization-termination trick.
And more generally -
ReplyDeleteI actually think C++ templates are fantastic for little isolated things like this.
As long as you can write them in one file and use them in that file, they're super nice and useful. I use them for things like generating image compressors that can work on different pixel formats.
The place where templates really fall apart (IMO) is when you try to use them as a fundamental architecture and get them in your shared headers and they start crudding up your whole compile.
"As long as you can write them in one file and use them in that file, they're super nice and useful."
ReplyDeleteYeah, I guess the key to sanity is some kind of isolation discipline.