Density contains 3 algorithms, from super fast to slower : Chameleon, Cheetah, Lion.
They all attain speed primarily by working on U32 quanta of input, rather than bytes. They're sort of LZPish type things that work on U32's, which is a reasonable way to get speed in this modern world. (Cheetah and Lion are really similar to the old LZP1/LZP2 with bit flags for different predictors, or to some of the LZRW's that output forward hashes; the main difference is working on U32 quanta and no match lengths)
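To be concrete, the guts of a Chameleon-style coder are about this simple. This is only a sketch of the idea, not Density's code or anyone's actual implementation ; the hash multiplier and the 32-bit flag packing are placeholder choices, and endianness and tail bytes are ignored :

#include <stdint.h>
#include <string.h>

#define CHAM_HASH_BITS  16
#define CHAM_DICT_SIZE  (1 << CHAM_HASH_BITS)

// placeholder multiplicative hash ; the multiplier is a stand-in, not Density's constant
static uint16_t cham_hash(uint32_t x)
{
    return (uint16_t)((x * 2654435761u) >> (32 - CHAM_HASH_BITS));
}

// encodes the first (srcLen & ~3) bytes ; dst must allow ~3% worst-case expansion
static size_t cham_encode_sketch(const uint8_t * src, size_t srcLen, uint8_t * dst)
{
    static uint32_t dict[CHAM_DICT_SIZE];      // 64K U32 entries = 256 KB
    memset(dict, 0, sizeof(dict));

    const uint8_t * in  = src;
    const uint8_t * end = src + (srcLen & ~(size_t)3);
    uint8_t * out = dst;

    while (in < end)
    {
        uint32_t flags = 0;                    // one flag bit per U32 quantum
        int nflags = 0;
        uint8_t * flagsPtr = out; out += 4;    // reserve room for 32 flag bits

        while (nflags < 32 && in < end)
        {
            uint32_t x; memcpy(&x, in, 4); in += 4;
            uint16_t h = cham_hash(x);

            if (dict[h] == x)
            {
                flags |= 1u << nflags;         // hit : send the 16-bit dictionary index
                memcpy(out, &h, 2); out += 2;
            }
            else
            {
                dict[h] = x;                   // miss : send the raw U32 and update the dictionary
                memcpy(out, &x, 4); out += 4;
            }
            nflags++;
        }

        memcpy(flagsPtr, &flags, 4);
    }

    return (size_t)(out - dst);
}

The decoder just mirrors this : read a flag bit, then either fetch dict[index16] or copy the raw U32 and do the same dictionary update, which is why decode speed winds up in the same ballpark as encode speed.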
The compression ratio is very poor. The highest compression option (Lion) is around LZ4-fast territory, not as good as LZ4-hc. But, are they Pareto? Is it a good space-speed tradeoff?
Well, I can't build Density (I use MSVC) so I can't test their implementation for space-speed.
Compressed sizes :

lzt99 : uncompressed     24,700,820

density c0 Chameleon     19,530,262
density c1 Cheetah       17,482,048
density c2 Lion          16,627,513
lz4 -1                   16,193,125
lz4 -9                   14,825,016
Oodle -1 (LZB)           16,944,829
Oodle -2 (LZB)           16,409,913
Oodle LZNIB              12,375,347
(lz4 -9 is not competitive for encode time, it's just to show the level of compression you could get at very fast decode speeds if you don't care about encode time ; LZNIB is an even more extreme case of the same thing - slow to encode, but decode time comparable to Chameleon).
To check speed I did my own implementation of Chameleon (which I believe to be faster than Density's, so it's a fair test). See the next post to get my implementation.
The results are :
comp_len = 19492042
Chameleon_Encode_Time : seconds:0.0274 ticks per: 1.919 mbps : 901.12
Chameleon_Decode_Time : seconds:0.0293 ticks per: 2.050 mbps : 843.31
round trip time = 0.05670
I get a somewhat smaller file size than Density's version for some unknown reason.
Let's compare to Oodle's LZB (an LZ4ish) :
Oodle -1 :
24,700,820 ->16,944,829 = 5.488 bpb = 1.458 to 1
encode : 0.061 seconds, 232.40 b/kc, rate= 401.85 mb/s
decode : 0.013 seconds, 1071.15 b/kc, rate= 1852.17 mb/s
round trip time = 0.074
Oodle -2 :
24,700,820 ->16,409,913 = 5.315 bpb = 1.505 to 1
encode : 0.070 seconds, 203.89 b/kc, rate= 352.55 mb/s
decode : 0.014 seconds, 1008.76 b/kc, rate= 1744.34 mb/s
round trip time = 0.084
lzt99 is a collection of typical game data files.
We can test on enwik8 (text/html) too :
Chameleon :
enwik8 :
Chameleon_Encode_Time : seconds:0.1077 ticks per: 1.862 mbps : 928.36
Chameleon_Decode_Time : seconds:0.0676 ticks per: 1.169 mbps : 1479.08
comp_len = 61524068
Oodle -1 :
enwik8 :
100,000,000 ->57,267,299 = 4.581 bpb = 1.746 to 1
encode : 0.481 seconds, 120.17 b/kc, rate= 207.79 mb/s
decode : 0.083 seconds, 697.58 b/kc, rate= 1206.19 mb/s
Here Chameleon is much more compelling. It's competitive for size & decode speed, not just encode speed.
Commentary :
Any time you're storing files on disk, this is not the right algorithm. You want something more asymmetric (slow compress, fast decompress).
I'm not sure if Cheetah and Lion are Pareto for round trip time. I'd have to test speed on a wider set of sample data.
When do you actually want a compressor that's this fast and gets so little compression? I'm not sure.
From the comments :

Looking at Chameleon, it is very simple. The only tweak-able part of the algorithm seems to be the hash function. It needs to be very fast and collisions directly reduce the compression rate.
I see they chose to do an unsigned 32-bit multiplication with a 'big' even number (2641295638 = 2*79*3541*4721) and then take the top half.
My first instinct would be to multiply with a big prime; I am not sure what part to take (bottom half may or may not be faster?).
Do you have any insights about the hash function they chose?
My reply :

Hashing for compression in cache tables like this is a bit weird. You don't actually want the most random-valued hash (the way you would for normal hash table lookup). I wrote a bit about it before :
http://cbloomrants.blogspot.com/2010/11/11-19-10-hashes-and-cache-tables.html
There are endless options to play with, and what's fastest will depend highly on your platform.
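For reference, the hash that comment is describing comes out to roughly this (a sketch reconstructed from the description above, not copied from Density's source) :

#include <stdint.h>

// multiply by the 'big' even constant and keep the top half of the 32-bit product
static uint16_t density_style_hash(uint32_t x)
{
    return (uint16_t)((x * 2641295638u) >> 16);
}

Swapping in a different multiplier (prime or otherwise), or taking a different slice of the product, is a one-line change, so it's cheap to measure both the speed and the hit rate on your own data and platform.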