cbloom rants: 09-13-12

LZNib is the straightforward/trivial way to do an LZ77 coder using EncodeMod for variable length numbers and 4-bit nibble aligned IO. That is, literals are always 8 bit; the control word is 4 bits and signals either a literal run len or a match length, using a range division value instead of a flag bit.

eg. if the nibble is < config_divider_lrl , it's a literal run len; if nibble is >= config_divider_lrl, it's a match.

The point of LZNib is to see how much compression is possible while keeping the decode speed close to the fastest of any reasonable compressor (LZ4,snappy,etc).

Testing different values of config_divider_lrl :

name	raw	config_divider_lrl=4	config_divider_lrl=5	config_divider_lrl=6	config_divider_lrl=7	config_divider_lrl=8	config_divider_lrl=9	config_divider_lrl=10	config_divider_lrl=11	config_divider_lrl=12	best
lzt00	16914	5639	5638	5632	5636	5654	5671	5696	5728	5771	5632
lzt01	200000	199360	199354	199348	199345	199339	199333	199324	199319	199314	199314
lzt02	755121	244328	243844	250836	255146	255177	255257	257754	260107	260597	243844
lzt03	3471552	1746220	1744630	1743728	1743043	1742718	1742728	1743191	1744496	1746915	1742718
lzt04	48649	13932	13939	13968	14015	14120	14184	14268	14319	14507	13932
lzt05	927796	422058	421115	420746	420592	418289	418200	418639	418854	418082	418082
lzt06	563160	414925	414080	412748	412223	409673	408884	408361	408435	407393	407393
lzt07	500000	237756	237318	237004	236910	236771	236949	237381	238091	239176	236771
lzt08	355400	309397	309490	308579	307706	307263	306418	305689	305495	305793	305495
lzt09	786488	302834	303018	303773	304350	305222	307405	308888	310649	314647	302834
lzt10	154624	11799	11785	11792	11800	11821	11843	11866	11885	11923	11785
lzt11	58524	22420	22341	22249	22288	22276	22322	22370	22561	22581	22249
lzt12	164423	28901	28974	28900	28957	29053	29122	29296	29381	29545	28900
lzt13	1041576	1072275	1068614	1058545	1047273	1025641	1025616	1025520	1025404	1024891	1024891
lzt14	102400	52010	51755	51595	51462	51379	51314	51298	51302	51341	51298
lzt15	34664	11846	11795	11767	11760	11740	11740	11756	11831	11837	11740
lzt16	21504	11056	11000	10961	10934	10911	10904	10893	10883	10892	10883
lzt17	53161	20122	20119	20152	20210	20288	20424	20601	20834	21091	20119
lzt18	102400	77317	77307	77274	77045	77037	77020	77006	76964	76976	76964
lzt19	768771	306499	307120	308138	309635	311801	314857	318983	323981	329683	306499
lzt20	1179702	975546	974447	973507	972326	971521	972060	971614	971569	985009	971521
lzt21	679936	99059	99182	99385	99673	100013	100492	101018	101652	102387	99059
lzt22	400000	334796	334533	334357	334027	333860	333733	333543	333501	337864	333501
lzt23	1048576	1029556	1026539	1023978	1021833	1019900	1018124	1016552	1015139	1013815	1013815
lzt24	3471552	1711694	1710524	1708577	1706969	1696663	1695663	1694205	1692996	1688324	1688324
lzt25	1029744	224428	224423	224306	229365	229362	229368	229603	227083	227546	224306
lzt26	262144	240106	239633	239200	238864	238538	238232	237960	237738	237571	237571
lzt27	857241	323147	323098	323274	323133	322050	322068	322799	322182	322573	322050
lzt28	1591760	343555	345586	348549	350601	352455	354077	356025	360583	364438	343555
lzt29	3953035	1445657	1442589	1440996	1440794	1437132	1437593	1440565	1442614	1442914	1437132
lzt30	100000	100668	100660	100656	100655	100653	100651	100651	100643	100643	100643
total	24700817	12338906	12324450	12314520	12308570	12268320	12272252	12283315	12296219	12326039	12212820

The divider at 8 is the same as using a flag bit + 3 bit payload in the nibble.

There does seem to be some win to be had by transmitting divider in the file (but it's not huge). Adaptive divider seems like an interesting thing to explore also, but it will make the decoder slower. Obviously it would be nice to be able to find the optimal divider for each file without just running the compressor N times.

Some comparison to other compressors. I tried to run all compressors with settings for max compression and max decode speed.

I'm having a bit of trouble finding anything to compare LZNib to. Obviously the LZ4+Large Window (LZ4P) that I talked about before is a fair challenger. I thought CRUSH would be the one, but it does very poorly on the big files, indicating that it has a small window (?). If anyone knows of a strong byte-aligned LZ that has large/unbounded window, let me know so I have something to compare against.

(ADDENDUM : tor -c1 can do byte-aligned IO and large windows, but it's a really bad byte-aligned coder)

(obviously things like LZ4 and snappy are not fair challengers because their small window makes them much worse; also note that zlib of course has a small window but is the only one with entropy coding; also several of these are not truly "byte aligned"; they have byte-aligned literals but do matches and control words bit by bit, which is "cheating" in this test (if you're allowed to do variable bit IO you can do much better; hell you may as well do huffman if you're doing variable bit IO) (though I'm also cheating because I believe I'm the only LZ here with a proper optimal parser) (I'm also cheating in a more subtle way, because I have this test set to tweak on and others don't)).

(aside : I don't understand why so many people are still writing small-window LZ's. Yes there is a certain use case for small-window LZ (eg. I did one for use on the SPU), but the most common use case for LZ these days is just "decompress the whole thing into a linear memory buffer", in which case you always have the entire buffer available for matches, so why not use it all?)

	raw	minilzo	snappy	quicklz 3	lz4 hc	lzopack -9	ULZ c6	crush cx	zlib	lz4p 332	lznib div8
lzt00	16914	7195	7254	6082	6473	5805	6472	5487	4896	6068	5654
lzt01	200000	198906	198222	200009	198900	198934	199680	222477	198199	198880	199339
lzt02	755121	567421	552599	334625	410695	426187	312889	258902	386203	292427	255177
lzt03	3471552	2456002	2399663	2036985	1820761	1854718	1835797	1938138	1789728	1795951	1742718
lzt04	48649	20602	21170	16894	16709	14858	17359	13869	11903	15584	14120
lzt05	927796	590072	554964	480848	460889	453608	464602	444945	422484	440742	418289
lzt06	563160	536084	516818	563169	493055	490308	428137	432989	446533	419768	409673
lzt07	500000	297306	298207	268255	265688	242029	268271	245662	229426	248500	236771
lzt08	355400	356248	351850	317102	331454	314918	337423	315688	277666	322959	307263
lzt09	786488	460896	472498	372682	344792	345561	329608	304048	325921	325124	305222
lzt10	154624	21355	25960	17249	15139	14013	16238	12540	12577	13299	11821
lzt11	58524	28153	29121	25626	25832	23717	26720	23279	21637	23870	22276
lzt12	164423	52725	53515	38745	33666	31574	36077	29601	27583	30864	29053
lzt13	1041576	1045665	1041684	1041585	1042749	1041633	1048598	1061423	969636	1040033	1025641
lzt14	102400	61394	63638	55124	56525	52102	58509	51491	48155	53395	51379
lzt15	34664	16026	15417	13626	14062	12663	14016	12470	11464	12723	11740
lzt16	21504	12858	13165	11646	12349	11203	12554	11119	10311	11392	10911
lzt17	53161	28075	29415	23478	23141	20979	23829	19877	18518	22028	20288
lzt18	102400	100499	97931	81100	85659	74268	89973	76858	68392	79138	77037
lzt19	768771	495312	524686	411916	363217	360558	333732	302006	312257	335912	311801
lzt20	1179702	1181855	1161896	1037098	1045179	1042190	1013392	952329	952365	993442	971521
lzt21	679936	240528	240244	188446	194075	174892	125322	103608	148267	113461	100013
lzt22	400000	401446	397959	338837	361733	354449	355978	321598	309569	348347	333860
lzt23	1048576	1052692	1048694	1048585	1040701	1030737	1047609	985814	777633	1035197	1019900
lzt24	3471552	2668424	2613405	2425865	2369885	2521469	2040395	2080506	2289316	1934129	1696663
lzt25	1029744	361885	425735	351577	324190	326180	477377	297974	210363	332747	229362
lzt26	262144	261152	259327	244555	246465	242343	252297	237162	222808	244990	238538
lzt27	857241	460703	466747	435522	430350	395926	392139	335655	333120	353497	322050
lzt28	1591760	578289	617498	453170	445806	421804	401166	349753	335243	388712	352455
lzt29	3953035	2570903	2535625	2259281	2235299	2052597	2227835	2013763	1805289	1519904	1437132
lzt30	100000	100397	100015	100009	100394	100033	100782	112373	100020	100393	100653
total	24700817	17231068	17134922	15199691	14815832	14652256	14294776	13573404	13077482	13053476	12268320

BTW I finally got most of these compressors linked into my own test bed for this post. Review of their API's (and some ranting) :

minilzo , quicklz, LZ4 : yes, good, easy. API is basically compress(char *,int len,char *), as it should be.

One minor niggle with these guys : I'm actually leaning towards compress API's taking "void *" now for buffers. It's annoying dealing with API's where somebody thinks "char *" is the way to take an array and someone else wants "const unsigned char *", etc. If it's just memory, take void *, I think. I'm not completely sure about that.

Oh, also "uncompress" ? Where did that come from? It's "decompress". Come on!

Oh, and about LZO : so minilzo is nice and clean, but it only gives you access to the most crappy compressor. To get the good LZO variants you have to use the full LZO sdk which is a nightmare. So I didn't bother and just ran lzopack -9.

miniz : I don't love the single file miniz.c thing; I'd like to have an implementation .c and a header with the externs. It makes it way harder to read the externs and see what is supposed to be public. I also don't like the custom types (mz_uLong and such), I like "int" (I guess "size_t" is okay). But not bad, way better than :

zlib : WTF WTF ridiculous clusterfuck. There's an MSVC make file that seems to make a lib that is not what you want. Then the contrib has some projects for two MSVC versions and not others; then you have to make sure that you get the DLL import and WINAPI defines in your import the same as the built lib. Much hair pulling due to "_deflate not found" link errors.

(BTW I don't entirely blame zlib for that; how many man hours have been wasted due to the linker being so damn unhelpful. Hey jerks who work on the linker, if I'm trying to import "__stdcall deflate" and you only have "__cdecl deflate" then perhaps you could figure out that just maybe I wanted those to hook up. We've only been having this damn problem for the last 20 years. It's the kind of thing where one decent human being on that team would go "hmm I bet we could do something better here". The linker is like the overly precise nerd; if you ask "can you pass the salt" he's like "no, I don't have any salt" , and you're like "fuck you, I can see the salt right next to you" and he's like "nope, no salt", and you're like wtf wtf so you walk around and grab it and you say "okay, what's this?" and he says "that's kosher salt").

Compressed sizes from zlib were slightly smaller than miniz, so I posted them.

lzma : this is a mixed bag; the C++ interface is a total disaster of COM insanity, but you can avoid that and the C interface (LzmaEnc/LzmaDec) is okay. A bit of a mess and you oddly have to handle the props transmission yourself, but so much better than the C++ interface that it looks great in comparison. Very bizarre to expose so many technical encoder settings in the basic API though.

snappy : WTF ? No MSVC build, it uses some crazy STL crap for no reason (iostreams? are you kidding me? using std::string for buffers? WTF), it's all overly complex and bad-C++y. Really gross. There's a mess of files and I can't tell which ones I actually need. Oh, and it generates warnings like crazy; so it's all bloated "clean style" but not actually clean, which is the worst way to be. And then after futzing around with it, I'm rewarded with a really crappy compressor.

I guess I see the point of snappy; it's LZRW1 for modern CPUs. Okay, that's fine. And it is reasonably fast even on incompressible data. So it has a certain use case, perhaps for disk compression, or I guess Google uses it for network data compression or some such. But people have been taking snappy and using it as just a utility data compressor and it is totally terrible for that.

12 comments:

won3d said...: Yeah, Snappy is mostly for RPC. I think most of the pain you're seeing is from the fact that the internal implementation doesn't actually use the C++-y things like iostreams, but whomever open sourced it just jammed it in there. Personally, I don't see why you would use it over LZ4, unless you were already using it everywhere (like we are).

A while back (I might have even asked you), I was going to make a nibble-oriented LZP. Hmm...; September 13, 2012 at 9:33 PM
NeARAZ said...: Speaking of compressor APIs, did you try lzham? Thoughts on it?; September 13, 2012 at 9:45 PM
ryg said...: There's also aPLib. Was really good for a pure LZ77 without entropy encoding around 2000 or so, haven't compared it to anything recently.

It's part of the "byte-aligned literals but bitwise control codes" camp though. Good parser (well-tweaked heuristic, not optimal), and IIRC it has unlimited window size (intended for all-in-memory decompression). And an API that doesn't suck (no code for the compressor though, but if you already have an optimal parser it's trivial to write an encoder).

Code is here: http://www.ibsensoftware.com/products_aPLib.html; September 13, 2012 at 10:50 PM
cbloom said...: @Won - yeah as I was doing this I started thinking about LZP-Nib. (I see from my email history search that you did in fact mail me about it long ago).

I suspect that you don't want to do a "pure" LZP (with only one match string), but rather use a few "ways" in the cache table, or perhaps a few bits of forward hash.

Of course this is mostly just rehashing old ideas :

http://cbloomrants.blogspot.com/2011/05/05-20-11-lzp1-variants.html

Also quicklz is a forward-hash LZP (LZP1c) if I'm understanding it correctly.; September 14, 2012 at 9:55 AM
cbloom said...: LZHAM needs this -

int DictLog2(int rawLen)
{
int l2 = cb::intlog2ceil( (float)rawLen );
return cb::Clamp(l2,LZHAM_MIN_DICT_SIZE_LOG2,LZHAM_MAX_DICT_SIZE_LOG2_X86);
}; September 15, 2012 at 9:47 AM
Anonymous said...: would you be willing to try gipfeli?

http://code.google.com/p/gipfeli/; September 17, 2012 at 6:48 AM
cbloom said...: gipfeli needs at least a download .zip, and really there should be a Win32 .exe . It also appears to be entropy coded using a simple bit-packing scheme.

I believe that if you're doing bit-packing you may as well just do huffman. But perhaps if your speed/ratio is good you would convince me otherwise.; September 17, 2012 at 8:36 AM
cbloom said...: @ryg - I tested aPLib; compression is very similar to CRUSH; September 17, 2012 at 9:35 AM
cbloom said...: @gipfeli - I'm crankier than usual this morning. But god damn I'm sick of dealing with Google Code. (sourceforge and github and etc etc are no better). The main page should always just be a link to download a zip of the full source tree.; September 17, 2012 at 10:09 AM
Anonymous said...: cbloom, if you're looking for a strong LZ77, take a look at LZMAT. It's not entirely about strength, but on some kinds of data it does extremely well, f.e I think it supports arbitrary match length with logarithmic length encoding.

As to aPlib, last time I checked it didn't come with encoder sources...did I miss something?; September 21, 2012 at 9:52 AM
cbloom said...: Re : aPlib - Correct, I just ran the EXE to see if the compression it achieved was interesting. It was not.

Re : LZMAT - yeah I tried it; compression ratios are somewhere around LZ4/CRUSH . It does have slightly different characteristics than the others, but it's definitely small window. Total on my set is 14518303; September 21, 2012 at 11:56 AM
Unknown said...: I know that this is a very old post. I just wanted to comment that aPLib is not a pure LZ77. It has a somewhat strange mechanism for re-using offsets, where instead of including a required offset, it can re-use the offset used the last time when the match was found.; August 24, 2018 at 5:10 PM

cbloom rants

9/13/2012

09-13-12 - LZNib

12 comments:

old rants