7/25/2016

Oodle Selkie

Selkie is the faster cousin of Mermaid, distant relative of Kraken.

Selkie is all about decode speed, it aims to be the fastest mainstream decompressor in the world, and still gets more compression than anything in the high-speed domain.

Selkie does not currently have a super fast encoder. It's got good optimal parse encoders that produce carefully tuned encoded file which offer excellent space-speed tradeoff.

The closest compressors to Selkie are the fast byte-wise small-window coders like LZ4 and LZSSE (and Oodle's LZB16). These are all just obsolete now (in terms of ratio vs decode speed), Selkie gets a lot more compression (sometimes close to Zlib compression levels!) and is also much faster.

Selkie will not compress tiny buffers, or files that only compress a little bit. For example if you give Selkie something like an mp3, it might be able to compress it to 95% of its original size, saving a few bytes. Selkie will refuse to do that and just give you the original uncompressed file. If you wanted that compression, that means you wanted to save only a few bytes at a large time cost, which means you don't actually want a fast compressor like Selkie. You in fact wanted a compressor that was more willing to trade time for bytes, such as Mermaid or Kraken. Selkie will not abide logical inconsistency.

Selkie generally beats LZ4 compression even on small files (under 64k) but really gets ahead on files larger than 64k where the unbounded match distances can find big wins.

As usual, I'm not picking on LZ4 here because it's bad; I'm comparing to it because it's the best of the rest, and it's widely known. Both decompressors are run fuzz-safe.


Tests on Win64 (Core i7-3770 3.4 GHz) : 
(for reference, this machine runs memcpy at roughly 8 GB/s)
(total of time & size on each test set)

gametestset : ooSelkie    : 143,579,361 ->70,716,380 =  3.940 bpb =  2.030 to 1 
gametestset : decode      : 29.239 millis, 0.69 c/b, rate= 4910.61 mb/s

gametestset : lz4hc       : 143,579,361 ->80,835,018 =  4.504 bpb =  1.776 to 1 
gametestset : decode      : 44.495 millis, 1.05 c/b, rate= 3226.89 mb/s

pd3d : ooSelkie    : 31,941,800 ->13,428,298 =  3.363 bpb =  2.379 to 1 
pd3d : decode      : 8.381 millis, 0.89 c/b, rate= 3811.29 mb/s

pd3d : lz4hc       : 31,941,800 ->14,273,195 =  3.575 bpb =  2.238 to 1 
pd3d : decode      : 13.479 millis, 1.44 c/b, rate= 2369.67 mb/s

seven : ooSelkie    : 80,000,000 ->36,460,084 =  3.646 bpb =  2.194 to 1 
seven : decode      : 21.458 millis, 0.91 c/b, rate= 3728.26 mb/s

seven : lz4hc       : 80,000,000 ->39,990,656 =  3.999 bpb =  2.000 to 1 
seven : decode      : 31.730 millis, 1.35 c/b, rate= 2521.30 mb/s

silesia : ooSelkie    : 211,938,580 ->69,430,966 =  2.621 bpb =  3.053 to 1 
silesia : decode      : 72.340 millis, 1.16 c/b, rate= 2929.77 mb/s

silesia : lz4hc       : 211,938,580 ->77,841,566 =  2.938 bpb =  2.723 to 1 
silesia : decode      : 93.488 millis, 1.50 c/b, rate= 2267.02 mb/s

The edge that Selkie has over LZ4 is even greater on more difficult platforms like the PS4.

To get a better idea of the magic of Selkie it's useful to look at the other Oodle compressors that are similar to Selkie.

LZB16 is Oodle's LZ4 variant; it gets slightly more compression and slightly more decode speed, but they're roughly equal. It's included here for comparison to LZBLW.

Oodle's LZBLW is perhaps the most similar compressor to Selkie. It's like LZB16 (LZ4) but adds large-window matches. That ability to do long-distance matches hurts speed a tiny bit (2873 mb/s -> 2596 mb/s), but helps compression a lot.

Oodle's LZNIB is nibble-wise, with unbounded offsets and a rep match. It gets good compression, generally better than Zlib, with speed much higher than any LZ-Huff. LZNIB is in a pretty unique space speed tradeoff zone without much competition outside of Oodle.


lz4hc     : 24,700,820 ->14,801,510 =  4.794 bpb =  1.669 to 1 
decode    : 9.481 millis, 1.31 c/b, rate= 2605.37 mb/s

ooLZB16   : 24,700,820 ->14,754,643 =  4.779 bpb =  1.674 to 1
decode    : 8.597 millis, 1.18 c/b, rate= 2873.17 mb/s

ooLZNIB   : 24,700,820 ->12,014,368 =  3.891 bpb =  2.056 to 1
decode    : 17.420 millis, 2.40 c/b, rate= 1417.93 mb/s

ooLZBLW   : 24,700,820 ->13,349,800 =  4.324 bpb =  1.850 to 1
decode    : 9.512 millis, 1.31 c/b, rate= 2596.80 mb/s

ooSelkie  : 24,700,820 ->12,752,506 =  4.130 bpb =  1.937 to 1 
decode    : 6.410 millis, 0.88 c/b, rate= 3853.57 mb/s

LZNIB and LZBLW were both pretty cool before Selkie, but now they're just obsolete.

LZBLW gets a nice compression gain over LZB16, but Selkie gets even more, and is way faster!

LZNIB beats Selkie compression, but is way slower, around 3X slower, in fact it's slower than Mermaid (2283.28 mb/s and compresses to 10,838,455 = 3.510 bpb = 2.279 to 1).

You can see from the curves that Selkie just completely covers the curves of LZB16,LZBLW, and LZ4. When a curve is completely covered like that, it means that it was beaten for both space and speed, so there is no domain where that compressor is ever better. LZNIB just peeks out of the Selkie curve because it gets higher compression (albeit at lower speed), so there is a domain where it is the better choice - but in that domain Mermaid just completely dominates LZNIB, so it too is obsolete.

See the index of this series of posts for more information : Introducing Oodle Mermaid and Selkie .
For more about Oodle visit RAD Game Tools

3 comments:

Sanmayce said...

Mr. Bloom, you seem to have a blind spot for LzTurbo, why is that?

> I'm not picking on LZ4 here because it's bad; I'm comparing to it because it's the best of the rest, and it's widely known.

Yann's decompressor is fantastic but Hamid's one is definitely better, as for Conor's LZSSE2 - on text it is the best, why do you say LZ4 is the best?

In my textual '88' benchmark LzTurbo 39 is TURBO-AMAZING, why don't you put it side by side with Selkie/Kraken, 19 also:
https://sanmayce.wordpress.com/2016/05/23/the-88-benchmark/

I would like to see your decompressors compared to LzTurbo 39 and LZSSE2 17.

cbloom said...

I don't generally test on text. On the binary game files I test on, LZSSE8 is much better than LZSSE2. I do run LZSSE8 at level 17, and you're right it is generally slightly better than LZ4 (thought that's largely implementation; my LZB16 which is an LZ4-ish is often faster than LZSSE), though the difference is small on PC's.

I compare to LZ4 instead of LZSSE generally because it's widely known and easy to build on lots of platforms. It's a de-facto standard for fast compression.

As for LZTurbo, you can feel free to run tests with that yourself. I don't generally test other people's exes, there are too many funny issues, I like to test code or libs.

Sanmayce said...

>I don't generally test on text.

Okay, textual realm may be not as profitable as game-oriented data however it is more ... cultural. Accessing fast textual data trumps all other scenarios, literature-wise, even human-wise I dare to say.

>As for LZTurbo, you can feel free to run tests with that yourself. I don't generally test other people's exes, there are too many funny issues, I like to test code or libs.

Fair enough, yet, my feedback is to remind you that the process is two-way - the other way around is also valid - I mean, both Kraken and LzTurbo are special entities defying stupid status-quo, however your performer is not presented even in .exe format let alone library or source - nothing wrong, but that makes your best results kinda ifY (if-adjective used in 'Big Bang Theory' comedy). My point, when one speaks of best there is no room for personal preferences, platform mumbo-jumbo-isms, openness or whatever - just PERFORMANCE, best is either best or not.

>...there are too many funny issues...

Maybe, but, to me, TurboBench is a workinghorse, I see no better benchmark than it, I trust its output.

old rants