10-02-12 - Small note on LZHAM

When I did my comparison of various compressors a little while ago, I also tested LZHAM, but I didn't include it in the charts because the numbers I was seeing from it were very strange. In particular, I saw very slow decode speeds, which surprised me because it tests well in other people's benchmarks.

So I finally had a deeper look to sort it out. The short answer is that LZHAM has some sort of very long initialization (even for just the decoder) which makes its speed extremely poor on small buffers. I was seeing speeds like 2 MB/s, much worse than LZMA (which generally gets 10-25 MB/s on my machine). (This is just from calling lzham_lib_decompress_memory.)

On large buffers, LZHAM is in fact pretty fast (some numbers below). The space-speed tradeoff is very good (on large buffers); it gets almost LZMA compression with much faster decodes. Unfortunately the breakdown on small buffers makes it not a general solution at the moment, IMO (it's also very slow on incompressible and nearly-incompressible data). I imagine it's something like the Huffman table construction being very slow, which gets hidden on large files but dominates small ones.
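If that guess is right, a fixed per-call init cost amortized over the buffer size would reproduce exactly this behavior. Here's a back-of-the-envelope model; the init time and steady-state speed are made-up illustrative numbers, not measurements of LZHAM:

```python
# Model: each decompress call pays a fixed init cost, then runs at a
# steady-state speed. Effective throughput = bytes / total time.
# (Hypothetical numbers below, just to show the shape of the curve.)

def effective_mbps(buffer_bytes, init_sec, steady_mbps):
    """Effective decode speed when a fixed init cost is paid per call."""
    decode_sec = buffer_bytes / (steady_mbps * 1e6)
    return buffer_bytes / ((init_sec + decode_sec) * 1e6)

# Assume a 5 ms init and 100 MB/s steady state: a 16k buffer collapses
# to a few MB/s while a 64 MB buffer barely notices the init.
for size in (16 * 1024, 1 << 20, 64 << 20):
    print(size, round(effective_mbps(size, 5e-3, 100.0), 2))
```

The same fixed cost that's invisible on enwik8-sized inputs dominates completely at 16k, which matches the numbers above.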

Anyhoo, here are some numbers. The decode column is in MB/s.

BTW BEWARE : don't pay too much attention to enwik8 results; compressing huge amounts of text is irrelevant to almost all users. The results on lzt99 are more reflective of typical use.

name        lzt99      decode
raw         24700820   inf
lz4         14814442   1718.72
zlib        13115250   213.99
oodlelzhlw  10164511   287.54
lzham       10066153   61.24
lzma        9344463    29.77

name        enwik8     decode
raw         100000000  inf
lz4         42210253   1032.34
zlib        36445770   186.96
oodlelzhlw  27729121   258.46
lzham       24769055   103.01
lzma        24772996   54.59

(lzma should beat lzham on enwik8, but I can't be bothered to fiddle with all the compression options to find the ones that make it win; this just sets both to "uber" (and -9) parse level and dict size = 2^29 for both.)

And some charts for lzt99. See the previous post for how to read the charts.



Can't LZMA2 do multi-threaded encode/decode?


Ah, so can LZHAM. I'd be interested to see which is better multi-threaded. I expect the same improvement, but the implementations might differ.

cbloom said...

Neither one can multi-thread decode, which is what I am timing. Both can multi-thread encode in different ways.

Rich Geldreich said...

Thanks for trying out LZHAM. This is a very interesting way of testing codecs. The current decompressor suffers from having to clear too many Huffman context tables at startup. I need to optimize this (maybe add a lazy clearing scheme that only clears those tables that are actually used), and also offer a choice to use fewer contexts (trading off some compression for faster init).

Also, the latest version of LZHAM (currently only checked into SVN) has faster startup times than the latest downloadable version (I reduced the amount of memory allocation that was occurring during decompressor init). I need to get off my butt and release it.
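The lazy clearing Rich mentions can be as cheap as a generation counter. Here's a minimal sketch of that idea (my own illustration of the technique, not LZHAM's actual code; the class and names are hypothetical):

```python
# Lazy table clearing via generation counters: "reset everything"
# becomes O(1), and each table pays its clear cost only if it is
# actually touched during decode.

class LazyTables:
    def __init__(self, num_tables, table_size):
        self.tables = [[0] * table_size for _ in range(num_tables)]
        self.gen = [0] * num_tables   # generation each table was last cleared at
        self.cur_gen = 0
        self.table_size = table_size

    def reset_all(self):
        # O(1) "clear" of every table: just advance the generation.
        self.cur_gen += 1

    def get(self, i):
        # Pay the O(table_size) clear only for tables actually used.
        if self.gen[i] != self.cur_gen:
            self.tables[i] = [0] * self.table_size
            self.gen[i] = self.cur_gen
        return self.tables[i]
```

On a small buffer that only touches a handful of contexts, this turns "clear thousands of tables" into "clear a handful", which is exactly the small-file init cost discussed in the post.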

cbloom said...

BTW an obvious way to multi-thread an LZ77 decoder (without just chunking, which costs compression) would be to have one thread doing all the code-stream unpacking and another thread reading the unpacked code stream and writing out the match copies and literals to the output buffer. Should provide some speedup. Not really a great way to use your CPU, though.
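A toy version of that two-thread split, with the code stream pre-parsed into commands so only the handoff is shown (illustrative sketch, not a real LZ77 format):

```python
# Thread A ("unpack") would do the entropy decoding and emit literal /
# match commands; thread B ("copy") applies them to the output buffer.
# Entropy decoding is stubbed out here: commands arrive pre-parsed.

import queue, threading

def unpack_thread(commands, q):
    for cmd in commands:
        q.put(cmd)
    q.put(None)  # end-of-stream sentinel

def copy_thread(q, out):
    while True:
        cmd = q.get()
        if cmd is None:
            break
        if cmd[0] == 'lit':
            out.append(cmd[1])
        else:  # ('match', offset, length)
            _, off, length = cmd
            for _ in range(length):
                out.append(out[-off])  # byte-by-byte handles overlapping copies

cmds = [('lit', ord('a')), ('lit', ord('b')), ('match', 2, 4)]
out = bytearray()
q = queue.Queue(maxsize=1024)
t = threading.Thread(target=unpack_thread, args=(cmds, q))
t.start()
copy_thread(q, out)   # run the copy stage on the main thread
t.join()
# out is now b'ababab'
```

The bounded queue is the whole design: the entropy decode (the serial bottleneck) runs ahead while the memcpy-heavy copy stage trails it, so the two stages overlap instead of alternating.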

cbloom said...

@Rich - Yeah, LZHAM does worst on really small files (16k or less). Obviously one option for a user is just to drop down to a simpler compressor (e.g. zip) on tiny buffers.

Also the Huffman fast decode tables should use fewer peek bits on small files.
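For context on why peek bits matter: in a standard table-driven Huffman decoder the fast path indexes a table of 2^peek_bits entries, so the cost of building (or clearing) that table scales directly with peek_bits. A generic MSB-first sketch, not LZHAM's actual table layout:

```python
# Fast Huffman decode table: every possible peek_bits-wide bit prefix
# maps directly to (symbol, code_length), so decode is one table lookup.
# Build cost is O(2**peek_bits) -- halving peek_bits halves init work
# per extra bit dropped, at the cost of more slow-path decodes.

def build_fast_table(code_lengths, peek_bits):
    """code_lengths: {symbol: (code, length)} with length <= peek_bits.
    Returns a table mapping any peek_bits-wide prefix to (symbol, length)."""
    table = [None] * (1 << peek_bits)
    for sym, (code, length) in code_lengths.items():
        # Every bit pattern that starts with this code maps to this symbol.
        pad = peek_bits - length
        for fill in range(1 << pad):
            table[(code << pad) | fill] = (sym, length)
    return table

# Toy prefix code: 'a' = 0 (1 bit), 'b' = 10, 'c' = 11 (2 bits each).
codes = {'a': (0b0, 1), 'b': (0b10, 2), 'c': (0b11, 2)}
small = build_fast_table(codes, 4)    # 16 entries to fill
big = build_fast_table(codes, 11)     # 2048 entries: ~128x the init work
```

On a 16k file, a table that takes longer to build than the file takes to decode is pure loss, hence the suggestion to shrink peek_bits for small inputs.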
