05-09-15 - Oodle LZNA

Oodle 1.45 has a new compressor called LZNA. (LZ-nibbled-ANS)

LZNA is a high compression LZ (usually a bit more than 7z/LZMA) with better decode speed. Around 2.5X faster to decode than LZMA.

Anyone who needs LZMA-level compression and higher decode speeds should consider LZNA. Currently LZNA requires SSE2 to be fast, so it only runs full speed on modern platforms with x86 chips.

LZNA gets its speed from two primary changes. 1. It uses RANS instead of arithmetic coding. 2. It uses nibble-wise coding instead of bit-wise coding, so it can do 4x fewer coding operations in some cases. The magic sauce that makes these possible is Ryg's realization about mixing cumulative probability distributions . That lets you do the bitwise-style shift update of probabilities (keeping a power of two total), but on larger alphabets.

LZNA usually beats LZMA compression on binary, slightly worse on text. LZNA is closer to LZHAM decompress speeds.

Some results :


LZNA -z6 : 24,700,820 -> 9,154,248 =  2.965 bpb =  2.698 to 1
decode only      : 0.327 seconds, 43.75 b/kc, rate= 75.65 mb/s

LZMA : 24,700,820 -> 9,329,925 =  3.021 bpb =  2.647 to 1
decode           : 0.838 seconds, 58.67 clocks, rate= 29.47 M/s

LZHAM : 24,700,820 ->10,140,761 =  3.284 bpb =  2.435 to 1
decode           : 0.264 seconds, 18.44 clocks, rate= 93.74 M/s

(note on settings : LZHAM is run at BETTER because UBER is too slow. LZHAM BETTER is comparable to Oodle's -z6 ; UBER is similar to my -z7. LZMA is run at the best compression setting I can find; -m9 and lc=0,lp=2,pb=2 for binary data; with LZHAM I don't see a way to set the context bits. This is the new LZHAM 1.0, slightly different than my previous tests of LZHAM. All 64-bit, big dictionaries.).


LZNA -z6 : 58,788,904 ->12,933,907 =  1.760 bpb =  4.545 to 1
decode only      : 0.677 seconds, 50.22 b/kc, rate= 86.84 mb/s

LZMA : 58,788,904 ->13,525,659 =  1.840 bpb =  4.346 to 1
decode           : 1.384 seconds, 40.70 clocks, rate= 42.49 M/s

LZHAM : 58,788,904 ->15,594,877 =  2.122 bpb =  3.769 to 1
decode           : 0.582 seconds, 17.12 clocks, rate= 100.97 M/s

I'm not showing encode speeds because they're all running different amounts of threading. It would be complicated to show fairly. LZHAM is the most aggressively threaded, and also the slowest without threading.

My "game testset" total sizes, from most compression to least :

Oodle LZNA -z8 :            57,176,229
Oodle LZNA -z5 :            58,318,469

LZMA -mx9 d26:lc0:lp2:pb3 : 58,884,562
LZMA -mx9 :                 59,987,629

LZHAM -mx9 :                62,621,098

Oodle LZHLW -z6 :           68,199,739

zip -9 :                    88,436,013

raw :                       167,495,105

Here's the new Pareto chart for Oodle. See previous post on these charts

This is load+decomp speedup relative to memcpy : (lzt99)

The left-side Y-intercept is the compression ratio. The right-side Y-intercept is the decompression speed. In between you can see the zones where each compressor is the best tradeoff.

With LZMA and LZHAM : (changed colors)

lzt99 is bad for LZHAM, perhaps because it's heterogeneous and LZHAM assumes pretty stable data. (LZHAM usually beats LZHLW for compression ratio). Here's a different example :

load+decomp speedup relative to memcpy : (baby_robot_shell)


Zed said...

can you provide some executables for benchmarking?

Rich Geldreich said...

Interesting charts - you're motivating me to work on LZHAM more.

Rich Geldreich said...

BTW, the latest devel version of lzham has faster compression (and is still compatible with v1.0):


old rants