An LZ Codec Designed for SSE Decompression

LZSSE code

Some good stuff.

Basically this is a nibble control word LZ (like LZNIB). The nibble has a threshold value T, < T is an LRL (literal run len), >= T is a match length. LZSSET are various threshold variants. As Conor noted, ideally T would be variable, optimized per file (or even better - per quantum) to adapt to different data better.

LZSSE has a 64k window (like LZ4/LZB16) but unlike them supports MML (minimum match length) of 3. MML 3 typically helps compression a little, but in scalar decoders it really hurts speed.

I think the main interesting idea (other than implementation details) is that by limitting the LRL and ML, with no excess/overflow support (ML overflow is handled with continue-match nibbles), it means that you can do a non-looping output of 8/16 bytes. You get long matches or LRL's by reading more control nibbles.

That is, a normal LZ actually has a nested looping structure :

loop on controls from packed stream
 control specifies lrl/ml

 loop on lrl/ml
   output bytes

LZSSE only has *one* outer loop on controls.

There are some implementation problems at the moment. The LZSSE2 optimal parse encoder is just broken. It's unusably slow and must have some bad N^2 degeneracy. This can be fixed, it's not a problem with the format.

Another problem is that LZSSE2 expands incompressible data too much. Real world data (particularly in games) often has incompressible data mixed with compressible. The ideal fix would be to have the generalized LZSSET and choose T per quantum. A simpler fix would be to do something like cut files into 16k or 64k quanta, and to select the best of LZSSE2/4/8 per-quantum and also support uncompressed quanta to prevent expansion.

I will take this moment to complain that the test sets everyone is using are really shit. Not Conors fault, but enwiks and Silesia are grossly not at all representative of data that we see in the real world. Silesia is mostly text and weird highly-compressible data; the file I like best in there for my own testing is "mozilla" (though BTW mozilla also contains a bunch of internal zlib streams; it benefits enormously from precomp). We need a better test corpus!!!

No comments:

old rants