10-18-06 - 1

I just randomly stumbled upon Matt Mahoney's PAQ data compressor, which perhaps is the state of the art these days. I would say it follows in the tradition of PPMii-PPMZ/Rkive, though he doesn't really cite those as he apparently started from a neural net background which is weird. He's doing a lot of the things Malcolm Taylor invented in Rkive, such as using funny non-linearly-preceding contexts (eg. not just ABCD, BCD, CD, D, but also "xBxD" and "AxCx" and so on). The PAQ implementations that win challenges seem to be tweaked out to all hell, which is sort of lame, but that's what you have to do to win in practice. There's some very elegant stuff he's done. There's a short paper about it on his page but I can't really find a thorough description anywhere where he talks about different things he tried & why he's doing them like he is. Alas. He's doing bit-by-bit coding instead of byte-by-byte which I always thought was much more elegant (you don't have charactes and escapes, just 0's and 1's) but I always got worse performance with bits rather than bytes; I guess I was leaking some efficiency in each coding op which adds up when you do 8X more coding ops.

No comments:

old rants