9/13/2012

09-13-12 - LZNib

LZNib is the straightforward/trivial way to do an LZ77 coder using EncodeMod for variable length numbers and 4-bit nibble aligned IO. That is, literals are always 8 bit; the control word is 4 bits and signals either a literal run len or a match length, using a range division value instead of a flag bit.

eg. if the nibble is < config_divider_lrl , it's a literal run len; if nibble is >= config_divider_lrl, it's a match.

The point of LZNib is to see how much compression is possible while keeping the decode speed close to the fastest of any reasonable compressor (LZ4,snappy,etc).

Testing different values of config_divider_lrl :

name raw config_divider_lrl=4 config_divider_lrl=5 config_divider_lrl=6 config_divider_lrl=7 config_divider_lrl=8 config_divider_lrl=9 config_divider_lrl=10 config_divider_lrl=11 config_divider_lrl=12 best
lzt00 16914 5639 5638 5632 5636 5654 5671 5696 5728 5771 5632
lzt01 200000 199360 199354 199348 199345 199339 199333 199324 199319 199314 199314
lzt02 755121 244328 243844 250836 255146 255177 255257 257754 260107 260597 243844
lzt03 3471552 1746220 1744630 1743728 1743043 1742718 1742728 1743191 1744496 1746915 1742718
lzt04 48649 13932 13939 13968 14015 14120 14184 14268 14319 14507 13932
lzt05 927796 422058 421115 420746 420592 418289 418200 418639 418854 418082 418082
lzt06 563160 414925 414080 412748 412223 409673 408884 408361 408435 407393 407393
lzt07 500000 237756 237318 237004 236910 236771 236949 237381 238091 239176 236771
lzt08 355400 309397 309490 308579 307706 307263 306418 305689 305495 305793 305495
lzt09 786488 302834 303018 303773 304350 305222 307405 308888 310649 314647 302834
lzt10 154624 11799 11785 11792 11800 11821 11843 11866 11885 11923 11785
lzt11 58524 22420 22341 22249 22288 22276 22322 22370 22561 22581 22249
lzt12 164423 28901 28974 28900 28957 29053 29122 29296 29381 29545 28900
lzt13 1041576 1072275 1068614 1058545 1047273 1025641 1025616 1025520 1025404 1024891 1024891
lzt14 102400 52010 51755 51595 51462 51379 51314 51298 51302 51341 51298
lzt15 34664 11846 11795 11767 11760 11740 11740 11756 11831 11837 11740
lzt16 21504 11056 11000 10961 10934 10911 10904 10893 10883 10892 10883
lzt17 53161 20122 20119 20152 20210 20288 20424 20601 20834 21091 20119
lzt18 102400 77317 77307 77274 77045 77037 77020 77006 76964 76976 76964
lzt19 768771 306499 307120 308138 309635 311801 314857 318983 323981 329683 306499
lzt20 1179702 975546 974447 973507 972326 971521 972060 971614 971569 985009 971521
lzt21 679936 99059 99182 99385 99673 100013 100492 101018 101652 102387 99059
lzt22 400000 334796 334533 334357 334027 333860 333733 333543 333501 337864 333501
lzt23 1048576 1029556 1026539 1023978 1021833 1019900 1018124 1016552 1015139 1013815 1013815
lzt24 3471552 1711694 1710524 1708577 1706969 1696663 1695663 1694205 1692996 1688324 1688324
lzt25 1029744 224428 224423 224306 229365 229362 229368 229603 227083 227546 224306
lzt26 262144 240106 239633 239200 238864 238538 238232 237960 237738 237571 237571
lzt27 857241 323147 323098 323274 323133 322050 322068 322799 322182 322573 322050
lzt28 1591760 343555 345586 348549 350601 352455 354077 356025 360583 364438 343555
lzt29 3953035 1445657 1442589 1440996 1440794 1437132 1437593 1440565 1442614 1442914 1437132
lzt30 100000 100668 100660 100656 100655 100653 100651 100651 100643 100643 100643
total 24700817 12338906 12324450 12314520 12308570 12268320 12272252 12283315 12296219 12326039 12212820

The divider at 8 is the same as using a flag bit + 3 bit payload in the nibble.

There does seem to be some win to be had by transmitting divider in the file (but it's not huge). Adaptive divider seems like an interesting thing to explore also, but it will make the decoder slower. Obviously it would be nice to be able to find the optimal divider for each file without just running the compressor N times.


Some comparison to other compressors. I tried to run all compressors with settings for max compression and max decode speed.

I'm having a bit of trouble finding anything to compare LZNib to. Obviously the LZ4+Large Window (LZ4P) that I talked about before is a fair challenger. I thought CRUSH would be the one, but it does very poorly on the big files, indicating that it has a small window (?). If anyone knows of a strong byte-aligned LZ that has large/unbounded window, let me know so I have something to compare against.

(ADDENDUM : tor -c1 can do byte-aligned IO and large windows, but it's a really bad byte-aligned coder)

(obviously things like LZ4 and snappy are not fair challengers because their small window makes them much worse; also note that zlib of course has a small window but is the only one with entropy coding; also several of these are not truly "byte aligned"; they have byte-aligned literals but do matches and control words bit by bit, which is "cheating" in this test (if you're allowed to do variable bit IO you can do much better; hell you may as well do huffman if you're doing variable bit IO) (though I'm also cheating because I believe I'm the only LZ here with a proper optimal parser) (I'm also cheating in a more subtle way, because I have this test set to tweak on and others don't)).

(aside : I don't understand why so many people are still writing small-window LZ's. Yes there is a certain use case for small-window LZ (eg. I did one for use on the SPU), but the most common use case for LZ these days is just "decompress the whole thing into a linear memory buffer", in which case you always have the entire buffer available for matches, so why not use it all?)

raw minilzo snappy quicklz 3 lz4 hc lzopack -9 ULZ c6 crush cx zlib lz4p 332 lznib div8
lzt00 16914 7195 7254 6082 6473 5805 6472 5487 4896 6068 5654
lzt01 200000 198906 198222 200009 198900 198934 199680 222477 198199 198880 199339
lzt02 755121 567421 552599 334625 410695 426187 312889 258902 386203 292427 255177
lzt03 3471552 2456002 2399663 2036985 1820761 1854718 1835797 1938138 1789728 1795951 1742718
lzt04 48649 20602 21170 16894 16709 14858 17359 13869 11903 15584 14120
lzt05 927796 590072 554964 480848 460889 453608 464602 444945 422484 440742 418289
lzt06 563160 536084 516818 563169 493055 490308 428137 432989 446533 419768 409673
lzt07 500000 297306 298207 268255 265688 242029 268271 245662 229426 248500 236771
lzt08 355400 356248 351850 317102 331454 314918 337423 315688 277666 322959 307263
lzt09 786488 460896 472498 372682 344792 345561 329608 304048 325921 325124 305222
lzt10 154624 21355 25960 17249 15139 14013 16238 12540 12577 13299 11821
lzt11 58524 28153 29121 25626 25832 23717 26720 23279 21637 23870 22276
lzt12 164423 52725 53515 38745 33666 31574 36077 29601 27583 30864 29053
lzt13 1041576 1045665 1041684 1041585 1042749 1041633 1048598 1061423 969636 1040033 1025641
lzt14 102400 61394 63638 55124 56525 52102 58509 51491 48155 53395 51379
lzt15 34664 16026 15417 13626 14062 12663 14016 12470 11464 12723 11740
lzt16 21504 12858 13165 11646 12349 11203 12554 11119 10311 11392 10911
lzt17 53161 28075 29415 23478 23141 20979 23829 19877 18518 22028 20288
lzt18 102400 100499 97931 81100 85659 74268 89973 76858 68392 79138 77037
lzt19 768771 495312 524686 411916 363217 360558 333732 302006 312257 335912 311801
lzt20 1179702 1181855 1161896 1037098 1045179 1042190 1013392 952329 952365 993442 971521
lzt21 679936 240528 240244 188446 194075 174892 125322 103608 148267 113461 100013
lzt22 400000 401446 397959 338837 361733 354449 355978 321598 309569 348347 333860
lzt23 1048576 1052692 1048694 1048585 1040701 1030737 1047609 985814 777633 1035197 1019900
lzt24 3471552 2668424 2613405 2425865 2369885 2521469 2040395 2080506 2289316 1934129 1696663
lzt25 1029744 361885 425735 351577 324190 326180 477377 297974 210363 332747 229362
lzt26 262144 261152 259327 244555 246465 242343 252297 237162 222808 244990 238538
lzt27 857241 460703 466747 435522 430350 395926 392139 335655 333120 353497 322050
lzt28 1591760 578289 617498 453170 445806 421804 401166 349753 335243 388712 352455
lzt29 3953035 2570903 2535625 2259281 2235299 2052597 2227835 2013763 1805289 1519904 1437132
lzt30 100000 100397 100015 100009 100394 100033 100782 112373 100020 100393 100653
total 24700817 17231068 17134922 15199691 14815832 14652256 14294776 13573404 13077482 13053476 12268320

BTW I finally got most of these compressors linked into my own test bed for this post. Review of their API's (and some ranting) :

minilzo , quicklz, LZ4 : yes, good, easy. API is basically compress(char *,int len,char *), as it should be.

One minor niggle with these guys : I'm actually leaning towards compress API's taking "void *" now for buffers. It's annoying dealing with API's where somebody thinks "char *" is the way to take an array and someone else wants "const unsigned char *", etc. If it's just memory, take void *, I think. I'm not completely sure about that.

Oh, also "uncompress" ? Where did that come from? It's "decompress". Come on!

Oh, and about LZO : so minilzo is nice and clean, but it only gives you access to the most crappy compressor. To get the good LZO variants you have to use the full LZO sdk which is a nightmare. So I didn't bother and just ran lzopack -9.

miniz : I don't love the single file miniz.c thing; I'd like to have an implementation .c and a header with the externs. It makes it way harder to read the externs and see what is supposed to be public. I also don't like the custom types (mz_uLong and such), I like "int" (I guess "size_t" is okay). But not bad, way better than :

zlib : WTF WTF ridiculous clusterfuck. There's an MSVC make file that seems to make a lib that is not what you want. Then the contrib has some projects for two MSVC versions and not others; then you have to make sure that you get the DLL import and WINAPI defines in your import the same as the built lib. Much hair pulling due to "_deflate not found" link errors.

(BTW I don't entirely blame zlib for that; how many man hours have been wasted due to the linker being so damn unhelpful. Hey jerks who work on the linker, if I'm trying to import "__stdcall deflate" and you only have "__cdecl deflate" then perhaps you could figure out that just maybe I wanted those to hook up. We've only been having this damn problem for the last 20 years. It's the kind of thing where one decent human being on that team would go "hmm I bet we could do something better here". The linker is like the overly precise nerd; if you ask "can you pass the salt" he's like "no, I don't have any salt" , and you're like "fuck you, I can see the salt right next to you" and he's like "nope, no salt", and you're like wtf wtf so you walk around and grab it and you say "okay, what's this?" and he says "that's kosher salt").

Compressed sizes from zlib were slightly smaller than miniz, so I posted them.

lzma : this is a mixed bag; the C++ interface is a total disaster of COM insanity, but you can avoid that and the C interface (LzmaEnc/LzmaDec) is okay. A bit of a mess and you oddly have to handle the props transmission yourself, but so much better than the C++ interface that it looks great in comparison. Very bizarre to expose so many technical encoder settings in the basic API though.

snappy : WTF ? No MSVC build, it uses some crazy STL crap for no reason (iostreams? are you kidding me? using std::string for buffers? WTF), it's all overly complex and bad-C++y. Really gross. There's a mess of files and I can't tell which ones I actually need. Oh, and it generates warnings like crazy; so it's all bloated "clean style" but not actually clean, which is the worst way to be. And then after futzing around with it, I'm rewarded with a really crappy compressor.

I guess I see the point of snappy; it's LZRW1 for modern CPUs. Okay, that's fine. And it is reasonably fast even on incompressible data. So it has a certain use case, perhaps for disk compression, or I guess Google uses it for network data compression or some such. But people have been taking snappy and using it as just a utility data compressor and it is totally terrible for that.

11 comments:

won3d said...

Yeah, Snappy is mostly for RPC. I think most of the pain you're seeing is from the fact that the internal implementation doesn't actually use the C++-y things like iostreams, but whomever open sourced it just jammed it in there. Personally, I don't see why you would use it over LZ4, unless you were already using it everywhere (like we are).

A while back (I might have even asked you), I was going to make a nibble-oriented LZP. Hmm...

NeARAZ said...

Speaking of compressor APIs, did you try lzham? Thoughts on it?

ryg said...

There's also aPLib. Was really good for a pure LZ77 without entropy encoding around 2000 or so, haven't compared it to anything recently.

It's part of the "byte-aligned literals but bitwise control codes" camp though. Good parser (well-tweaked heuristic, not optimal), and IIRC it has unlimited window size (intended for all-in-memory decompression). And an API that doesn't suck (no code for the compressor though, but if you already have an optimal parser it's trivial to write an encoder).

Code is here: http://www.ibsensoftware.com/products_aPLib.html

cbloom said...

@Won - yeah as I was doing this I started thinking about LZP-Nib. (I see from my email history search that you did in fact mail me about it long ago).

I suspect that you don't want to do a "pure" LZP (with only one match string), but rather use a few "ways" in the cache table, or perhaps a few bits of forward hash.

Of course this is mostly just rehashing old ideas :

http://cbloomrants.blogspot.com/2011/05/05-20-11-lzp1-variants.html

Also quicklz is a forward-hash LZP (LZP1c) if I'm understanding it correctly.

cbloom said...

LZHAM needs this -

int DictLog2(int rawLen)
{
int l2 = cb::intlog2ceil( (float)rawLen );
return cb::Clamp(l2,LZHAM_MIN_DICT_SIZE_LOG2,LZHAM_MAX_DICT_SIZE_LOG2_X86);
}

Jyrki said...

would you be willing to try gipfeli?

http://code.google.com/p/gipfeli/

cbloom said...

gipfeli needs at least a download .zip, and really there should be a Win32 .exe . It also appears to be entropy coded using a simple bit-packing scheme.

I believe that if you're doing bit-packing you may as well just do huffman. But perhaps if your speed/ratio is good you would convince me otherwise.

cbloom said...

@ryg - I tested aPLib; compression is very similar to CRUSH

cbloom said...

@gipfeli - I'm crankier than usual this morning. But god damn I'm sick of dealing with Google Code. (sourceforge and github and etc etc are no better). The main page should always just be a link to download a zip of the full source tree.

extrememoderate said...

cbloom, if you're looking for a strong LZ77, take a look at LZMAT. It's not entirely about strength, but on some kinds of data it does extremely well, f.e I think it supports arbitrary match length with logarithmic length encoding.

As to aPlib, last time I checked it didn't come with encoder sources...did I miss something?

cbloom said...

Re : aPlib - Correct, I just ran the EXE to see if the compression it achieved was interesting. It was not.

Re : LZMAT - yeah I tried it; compression ratios are somewhere around LZ4/CRUSH . It does have slightly different characteristics than the others, but it's definitely small window. Total on my set is 14518303

old rants