6/20/2010

06-20-10 - KZip

While futzing around on PNG I discovered that there is this whole community of people who are hard-core optimizing PNG's for distribution sizes. They have these crazy tools like pngcrush/optipng/advancecomp/pngout.

AdvPNG is just the Zip encoder from 7zip (Igor Pavlov). It's a semi-optimal parser using a forward-pass multi-lookahead heuristic like Quantum. Some day I need to try to read the source code to figure out what he's doing exactly, but god damnit the code is ugly and it's not documented. Dammit Igor.

PNGOUT is a lossless "deflate" (zip) optimizer. The engine is the same as KZip by Ken Silverman. Unfortunately Ken has not documented his algorithm at all anywhere. Come on Ken! Nobody is gonna pay you for KZip! Just write about it!

Anyway, my guess from reading his brief description and looking at what it does is : KZip has some type of optimal parser. Who knows what kind of optimal parser; my guess is that knowing a bit about how Ken codes it is probably not an actual Storer-Szymanski optimal parser, but rather some kind of heuristic, perhaps a like 7zip/LZMA. KZip also clearly has some kind of Huffman split point optimizer (similar to what I just did ). Again just guessing from the command line options it looks like his Huffman splitter is single pass and is just based on a heuristic that detects changes in the statistics and decides to put a split there. Hmmm, I wish I'd found this months ago.

KZip appears to be the best zip optimizer in existence. Despite claims of being crazy slow I actually think it's quite fast by my standards. No surprise KZip is a lot smaller and faster than my optimal parser, but I do make smaller files. Ken ! Set your algorithm free! (ADDENDUM : whoops, that was only on PNG data; for some reason it's pretty fast on image data, but it's slow as balls in some other cases, not sure what's going on there; 7zip is a lot faster than kzip and the file sizes are usually very close (once in a while kzip does significiantly better)).

For a while I've been curious to try my RAD LZ optimizer on a Zip token stream. It would be a nice sanity check to test it against KZip, but I'm not motivated enough to take the pain of figuring out how to write Zip tokens.

1 comment:

Gilles said...

I have sent it to Ken :)He is a nice guy, I am happy you like his app.

old rants