03-26-13 - Oodle 1.1 and GDC

Hey it's GDC time again, so if you're here come on by the RAD booth and say "hi" (or "fuck you", or whatever).

The Oodle web site just went live a few days ago.

Sometimes I feel embarrassed (ashamed? humiliated?) that it's taken me five years to write a file IO and data compression library. Other times I think I've basically written an entire OS by myself (and all the docs, and marketing materials, and a video compressor, and aborted paging engine, and a bunch of other crap) and that doesn't sound so bad. I suppose the truth is somewhere in the middle. (perhaps with Oodle finally being officially released and selling, I might write a little post-mortem about how it's gone, try to honestly look back at it a bit. (because lord knows what I need is more introspection in my life)).

Oodle 1.1 will be out any day now. Main new features :

Lots more platforms.  Almost everything except mobile platforms now.

LZNIB!  I think LZNIB is pretty great.  8X faster to decode than ZLIB and usually
makes smaller files.

Other junk :
All the compressors can run parallel encode & decode now.
Long-range matcher for LZ matching on huge files (still only in-memory though).
Incremental compressors for online transmission, and faster resets.

Personally I'm excited the core architecture is finally settling down, and we have a more focused direction to go forward, which is mainly the compressors. I hope to be able to work on some new compressors for 1.2 (like a very-high-compression option, which I currently don't have), and then eventually move on to some image compression stuff.


won3d said...

Congrats on launching! This is much more impressive than anything I have ever managed to ship.

Is OodleLZNIB an LZ4-alike with optimal parsing?

cbloom said...

LZNIB is this :


It's nibble-wise not byte-wise ; it's also large-window, whereas LZ4 is small-window. But yeah LZ4 is probably the most comparable thing. LZ4 is almost 2x faster than LZNIB, but LZNIB is faster than most other "fast lz's" like LZO and snappy and such.

I couldn't find anything to directly compare LZNIB against, there don't seem to be any large window fast LZ's.

LZNIB generally beats ZLIB on data > 64k bytes in size, while being 5-10X faster to decode.

One thing I'd like to add to it is the option to do 4-bit literals. Currently LZNIB literals are always 8 bit. On most data it doesn't help but on some files a 4-bit literal table helps massively.
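Here is a minimal sketch of what a 4-bit literal table could look like. LZNIB's real format isn't described here, so this is an assumption throughout. It assumes the encoder picks up to 16 distinct byte values for a block and transmits them as a table. Each literal is then sent as a 4-bit index, two per byte, low nibble first. The layout, nibble order, and the name `decode_nibble_literals` are all hypothetical.

```python
def decode_nibble_literals(table: bytes, packed: bytes, count: int) -> bytes:
    """Expand `count` 4-bit literal indices from `packed` through a 16-entry table.

    Hypothetical layout (not LZNIB's actual format): two indices per byte,
    low nibble first.
    """
    if len(table) != 16:
        raise ValueError("literal table must have exactly 16 entries")
    if count > 2 * len(packed):
        raise ValueError("not enough packed nibbles")
    out = bytearray(count)
    for i in range(count):
        byte = packed[i >> 1]
        # Even positions use the low nibble, odd positions the high nibble.
        nib = byte & 0x0F if (i & 1) == 0 else byte >> 4
        out[i] = table[nib]
    return bytes(out)


# Usage: a block whose literals come from just 16 ASCII characters.
table = b"abcdefghijklmnop"
packed = bytes([0x10, 0x32])  # indices 0,1,2,3
print(decode_nibble_literals(table, packed, 4))  # b'abcd'
```

This halves the bits per literal, but only when a block's literals really fit in 16 values. That matches the observation that it helps massively on some files and does nothing on most.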

won3d said...

Just a straight table? What about something simple like symbol ranking or LRU or something? I guess the adaptation probably wouldn't help compression much but would kill performance?

cbloom said...

Yeah; LRU is crazy-bad for performance, especially on LHS (load-hit-store) platforms.

If I was SPU-only I could probably use a 16-wide byte vector and they have sweet enough instructions to just keep that in a register and move one byte around in there and such, but there aren't enough platforms that have good SIMD like that for me to rely on it being fast.

It's also one of those situations where a flat table with occasional retransmission is just as good as adaptive schemes, and is just more asymmetric (much harder to encode (because you have to optimize the retransmission points), much easier to decode).
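The performance difference is easier to see side by side. Below is a hypothetical comparison that uses move-to-front as a stand-in for the "LRU" idea. With a flat table, each symbol is one independent lookup into a table that never changes. With move-to-front, every decoded symbol rewrites the table, so each lookup reads data that the previous step just stored. That store-then-load chain is exactly what LHS-prone CPUs handle badly. Neither function is Oodle code.

```python
def decode_flat(table: bytes, indices) -> bytes:
    # Flat table: every output is an independent lookup; the table never changes.
    return bytes(table[i] for i in indices)


def decode_mtf(initial: bytes, indices) -> bytes:
    # Move-to-front: after each lookup the symbol moves to slot 0, so the
    # next lookup reads a table that was just written.
    table = list(initial)
    out = bytearray()
    for i in indices:
        sym = table[i]
        out.append(sym)
        del table[i]
        table.insert(0, sym)
    return bytes(out)


alphabet = b"abcd"
print(decode_flat(alphabet, [0, 0, 1, 1]))  # b'aabb'
print(decode_mtf(alphabet, [0, 0, 1, 1]))   # b'aaba'
```

The same index stream decodes differently because the MTF table state depends on all previous symbols. The flat table keeps the decoder trivially simple. The extra work moves to the encoder, which has to decide when to retransmit the table.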

gpakosz said...

You say LZ4 is almost 2x faster than LZNIB yet it doesn't show in the graphs.

What did I miss?

cbloom said...

Dunno what graph you're talking about, please be more specific.

You can email oodle at radgametools for direct information if you like.

gpakosz said...

Pardon my noob question, I'm not a compression expert and I didn't follow the whole series in detail.

I'm not questioning Oodle or the measurements really.

I was pointing out that I couldn't reconcile "LZ4 is almost 2x faster than LZNIB" with the following charts :



But by now I've read more of your posts. Noticing the "load plus decomp" title again, I guess you meant LZ4 is faster at pure decomp, but LZ4's compressed output is larger, which shows up in those load plus decomp curves?

cbloom said...

Yeah, correct. There's only one chart there which shows pure decomp time :


and you can see LZ4 vs LZNIB down at the bottom of the decomp chart.

Most of the charts are showing some form of load+decomp at various simulated disk speeds.
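The "load plus decomp" measure is simple arithmetic: the time to read the compressed bytes at a given disk speed, plus the time to decompress them. Here is a minimal sketch of it. It assumes load and decode are not overlapped, which is a simplification. The sizes, ratios, and speeds are made up for illustration and are not measured Oodle, LZ4, or ZLIB numbers.

```python
def load_plus_decomp_seconds(raw_bytes, ratio, disk_bytes_per_sec, decode_bytes_per_sec):
    """Time to read the compressed data, then decode it (no overlap assumed).

    ratio: compressed size / raw size.
    decode_bytes_per_sec: decoder throughput measured on raw (output) bytes.
    """
    load = raw_bytes * ratio / disk_bytes_per_sec
    decomp = raw_bytes / decode_bytes_per_sec
    return load + decomp


MB = 1_000_000
# Hypothetical codecs: "fast" compresses less but decodes 2x quicker.
fast = dict(ratio=0.60, decode_bytes_per_sec=2000 * MB)
tight = dict(ratio=0.50, decode_bytes_per_sec=1000 * MB)

for disk in (20 * MB, 2000 * MB):
    t_fast = load_plus_decomp_seconds(100 * MB, disk_bytes_per_sec=disk, **fast)
    t_tight = load_plus_decomp_seconds(100 * MB, disk_bytes_per_sec=disk, **tight)
    print(disk // MB, "MB/s:", round(t_fast, 3), round(t_tight, 3))
```

At the slow simulated disk, loading dominates and the tighter codec wins (3.05 s vs 2.6 s). At the fast disk, decode time dominates and the faster decoder wins (0.08 s vs 0.125 s). So a decoder that is 2x faster in isolation can still lose on a load+decomp chart.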

Some of the background on why I make the charts like this is here :



And here you can see a log-log chart of load plus decomp, which shows the higher simulated disk speed range where LZ4 is on the Pareto frontier :
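Being on the Pareto frontier means no other compressor is both smaller and faster to decode. Here is a minimal sketch of that dominance test over hypothetical (compressed size, decode time) pairs. The codec names and numbers are placeholders, not real measurements.

```python
def pareto_frontier(points):
    """Return the (name, size, decode_time) entries that no other entry beats.

    Smaller is better on both axes. An entry is dropped if some other entry is
    <= on both axes and strictly < on at least one.
    """
    frontier = []
    for name, size, t in points:
        dominated = any(
            s2 <= size and t2 <= t and (s2 < size or t2 < t)
            for _, s2, t2 in points
        )
        if not dominated:
            frontier.append((name, size, t))
    return sorted(frontier, key=lambda p: p[1])


# Hypothetical (compressed MB, decode ms) for four codecs.
codecs = [
    ("A", 60, 50),
    ("B", 50, 100),
    ("C", 55, 120),  # bigger AND slower than B: dominated
    ("D", 40, 400),
]
print(pareto_frontier(codecs))  # [('D', 40, 400), ('B', 50, 100), ('A', 60, 50)]
```

Load+decomp time grows with both size and decode time. So any compressor that is the best choice at some simulated disk speed has to be on this frontier, though not every frontier point is necessarily best at some speed.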

