10/25/2008

10-25-08 - 1

The Maximum Compression guy has some bizarro efficiency measure. The right way to measure a compressor is the total time to compress, transmit & decompress over a channel of a given speed. In my case I mainly care about decompression, so I measure quality as the time to transmit the compressed data plus the time to decompress it.

Here's a CSV of his data with a more reasonable efficiency rating. I also compute a "True Decomp" time by subtracting off 3 seconds to remove the executable load overhead and the time to write the uncompressed data to disk. Copy-paste this to a file and load it in Excel as a CSV. The numbers that matter are Decomp+Load at 6 MB/sec and Decomp+Load at 1 MB/sec :

number,archiver,switches,solid,size,ratio (%),Comp time,Decomp time,Efficiency (lower is better),,,True Decomp (sec),,Load @ 6 MB/sec,,Load @ 1 MB/sec,,Decomp+Load @ 6 MB/sec,,Decomp+Load @ 1 MB/sec
99,CABARC 1.00.0106,LZX:17,N,102561882,67.58,175,7,14394,,,4, ,17.093647,,102.561882,,21.093647,,106.561882
45,THOR 0.96,e5,Y,112275112,64.51,24,6,5779,,,3, ,18.71251867,,112.275112,,21.71251867,,115.275112
18,WinRAR 3.71,Normal solid 4096Kb,N,90540300,71.38,152,10,3309,,,7, ,15.09005,,90.5403,,22.09005,,97.5403
73,WINIMP 1.21,1 BEST MM 16Mb B=1020Kb,N,97241095,69.26,204,9,9316,,,6, ,16.20684917,,97.241095,,22.20684917,,103.241095
59,WINIMP 1.21,1 BEST MM,N,99025199,68.7,137,9,7621,,,6, ,16.50419983,,99.025199,,22.50419983,,105.025199
36,WINIMP 1.21,Normal MM,N,100399936,68.26,82,9,5310,,,6, ,16.73332267,,100.399936,,22.73332267,,106.399936
76,GZIP 1.2.4,(none),Y,115442964,63.51,28,7,9570,,,4, ,19.240494,,115.442964,,23.240494,,119.442964
66,AIN 2.32,(none),N,115576936,63.47,25,7,8672,,,4, ,19.26282267,,115.576936,,23.26282267,,119.576936
98,BIX 1.00 b7,(none),N,99694432,68.49,237,10,14196,,,7, ,16.61573867,,99.694432,,23.61573867,,106.694432
80,LHARK 0.4d,(none),N,113439063,64.14,37,8,10132,,,5, ,18.9065105,,113.439063,,23.9065105,,118.439063
63,PKZIP 2.50,#NAME?,N,114022347,63.96,28,8,8178,,,5, ,19.0037245,,114.022347,,24.0037245,,119.022347
52,PKZIP 2.50,(none),N,114501263,63.81,22,8,6775,,,5, ,19.08354383,,114.501263,,24.08354383,,119.501263
85,ESP 1.92,(none),N,115164577,63.6,32,8,10605,,,5, ,19.19409617,,115.164577,,24.19409617,,120.164577
87,DZip 2.90,(none),N,115269882,63.56,33,8,11065,,,5, ,19.211647,,115.269882,,24.211647,,120.269882
86,File2Pack 2.0,(none),N,115305901,63.55,32,8,10772,,,5, ,19.21765017,,115.305901,,24.21765017,,120.305901
41,THOR 0.96,e3,Y,124588656,60.62,6,7,5638,,,4, ,20.764776,,124.588656,,24.764776,,128.588656
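For the record, the derived columns are simple to reconstruct; this is a sketch of the spreadsheet math described above (my reconstruction, assuming 1 MB = 1,000,000 bytes, which matches the table's numbers):

```python
# Sketch of the derived-column math: "True Decomp" strips a flat 3 seconds
# of exe-load and disk-write overhead, and the load times assume
# 1 MB = 1,000,000 bytes (which matches the table's numbers).

def derived_columns(size_bytes, decomp_time_sec, overhead_sec=3.0):
    true_decomp = decomp_time_sec - overhead_sec
    load_6 = size_bytes / (6 * 1_000_000)   # seconds to transmit at 6 MB/sec
    load_1 = size_bytes / (1 * 1_000_000)   # seconds to transmit at 1 MB/sec
    return true_decomp + load_6, true_decomp + load_1

# CABARC row: size = 102561882, decomp time = 7
d6, d1 = derived_columns(102_561_882, 7)   # roughly (21.093647, 106.561882)
```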

Don't put too much stock in these numbers; his timing is very rough (as is my "True Decomp" estimate). Also, stuff like RAR and WINIMP is taking advantage of special multimedia recognition to do so well. Anyway, what you can see is that on modern DVD media, which reads at between 1 MB/sec and 6 MB/sec, you really cannot afford to spend much time in the decompressor. The slower the channel, the more important it is to spend a lot of time compressing really well; if the channel is infinitely fast, memcpy is the best option. Note that when loading from a hard drive you really can't afford to decompress at all (and you shouldn't really be streaming or pointer fixing either; flat load baby!).

If you prefer algebra, I ran these numbers :


6 MB/sec compressed data rate


4.0 bpb at 120 MB/sec

( 256 * 4.0 / 8 ) / 6 + ( 256 / 120 ) = 23.47 seconds

3.75 bpb at 80 MB/sec

( 256 * 3.75 / 8 ) / 6 + ( 256 / 80 ) = 23.20 seconds (** winner)

3.6 bpb at 40 MB/sec

( 256 * 3.60 / 8 ) / 6 + ( 256 / 40 ) = 25.60 seconds

3.5 bpb at 20 MB/sec

( 256 * 3.50 / 8 ) / 6 + ( 256 / 20 ) = 31.47 seconds

//--------------------------------------------------

2 MB/sec compressed data rate


4.0 bpb at 120 MB/sec

( 256 * 4.0 / 8 ) / 2 + ( 256 / 120 ) = 66.13 seconds

3.75 bpb at 80 MB/sec

( 256 * 3.75 / 8 ) / 2 + ( 256 / 80 ) = 63.20 seconds (** winner)

3.6 bpb at 40 MB/sec

( 256 * 3.60 / 8 ) / 2 + ( 256 / 40 ) = 64.00 seconds

3.5 bpb at 20 MB/sec

( 256 * 3.50 / 8 ) / 2 + ( 256 / 20 ) = 68.80 seconds

//--------------------------------------------------

4.0 bpb at 120 MB/sec is stuff like THOR,quicklz,LZRW,LZO
3.75 bpb at 80 MB/sec is stuff like LZX, and my Oodle-LZH
3.6 bpb at 40 MB/sec is stuff like RAR & Quantum
3.5 bpb at 20 MB/sec is stuff like LZMA/7-zip
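That algebra is easy to script; here's a quick sketch (the bpb/speed pairs are the rough characterizations above, and the 256 MB payload is the one used in the arithmetic):

```python
# Reproduces the back-of-the-envelope totals above for a 256 MB payload:
# total time = (compressed size / channel rate) + (payload / decode rate).

candidates = [
    ("THOR/quicklz/LZRW/LZO", 4.00, 120),   # (name, bits per byte, decode MB/sec)
    ("LZX / Oodle-LZH",       3.75,  80),
    ("RAR / Quantum",         3.60,  40),
    ("LZMA / 7-zip",          3.50,  20),
]

def total_time(bpb, decomp_mbps, channel_mbps, payload_mb=256):
    compressed_mb = payload_mb * bpb / 8
    return compressed_mb / channel_mbps + payload_mb / decomp_mbps

for name, bpb, speed in candidates:
    print(f"{name:24s} 6 MB/s: {total_time(bpb, speed, 6):6.2f}s"
          f"  2 MB/s: {total_time(bpb, speed, 2):6.2f}s")
```

On both channel speeds the 3.75 bpb / 80 MB/sec point comes out on top, matching the winners marked above.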

Even though those efficiency numbers are very rough, I was surprised to see Zip up there doing quite well. A lot of people claim to be "better than Zip", but they measure that by the useless metric of how much compression they get. The real measure is the full time to get compressed data loaded into memory and uncompressed.

I've been thinking about making my LZ optimal parser able to output Zip code streams. I think it would be the best Zip recompressor (7-zip also has a very good Zip recompressor; I'd like to test myself against that). Unfortunately the source code for both 7-zip and InfoZip is a total mess, and untangling it to be able to output a Zip code stream seems like an awful lot of work for a pointless curiosity project.

ADDENDUM : I should note that total time is actually not the best measure unless you are completely stalling on the load. Hopefully the actual IO is async and you do other work during it, which means the decoders that spend more time in IO and less time on CPU work are preferred. The slower decoder with higher compression may look good in a pure race on blocked IO, but it's preventing other stuff from using the CPU. The only time a slow decoder really makes sense is when the CPU is very fast and the IO channel is very slow.
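To make that concrete, here's a toy model (my sketch, not measured data) comparing blocked and fully overlapped IO using the codec numbers from the algebra above:

```python
# Toy model of the addendum. With blocking IO the transmit and decompress
# times add; with perfectly overlapped async IO the total is roughly the
# max of the two, and the decoder's CPU seconds are stolen from other work.

def blocked_time(io_sec, cpu_sec):
    # pure race: read everything, then decompress
    return io_sec + cpu_sec

def overlapped_time(io_sec, cpu_sec):
    # ideal async: decompression hides behind the IO (or vice versa)
    return max(io_sec, cpu_sec)

# 256 MB payload on a 6 MB/sec channel, using the codec numbers above:
thor_io, thor_cpu = (256 * 4.0 / 8) / 6, 256 / 120   # ~21.3s IO, ~2.1s CPU
lzma_io, lzma_cpu = (256 * 3.5 / 8) / 6, 256 / 20    # ~18.7s IO, ~12.8s CPU

# Blocked, the fast decoder wins (~23.5s vs ~31.5s); fully overlapped, the
# stronger compressor wins on wall clock (~18.7s vs ~21.3s) but occupies
# the CPU six times longer, which is the hidden cost.
```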
