cbloom rants


Oodle 2.5.5 - encoder bug fix

Oodle 2.5.5 fixes a bug in the Kraken & Mermaid encoders which could cause them to make compressed data that decodes incorrectly (producing output different than the original) or could cause the decoder to return failure.

This bug was present from Oodle 2.5.0 to 2.5.4 ; if you use those versions you should update to 2.5.5

When the bug occurs, the OodleLZ_Compress call returns success, thinking it made valid compressed data, but it has actually made a damaged bit stream. When you call Decompress it might return failure, or it might return success but produce decompressed output that does not match the original bits.

Any compressed data that you have made which decodes successfully (and matches the original uncompressed data) is fine. The presence of the bug can only be detected by attempting to decode compressed data and checking that it matches the original uncompressed data.

The decoder is not affected by this bug, so if you have shipped user installations that only do decoding, they don't need to be updated. If you have compressed files which were made incorrectly because of this bug, you can patch only those individual compressed files.

Technical details :

This bug was caused by one of the internal bit stream write pointers writing past the end of its bits, potentially over-writing another previously written bit stream. This caused some of the previously written bits to become garbage, causing them to decode into something other than what they had been encoded from.

This only occured with 64-bit encoders. Any data written by 32-bit encoders is not affected by this bug.

This bug could in theory occur on any Kraken & Mermaid compressed data. In practice it's very rare and I've only seen it in one particular case - "whole huff chunks" on data that is only getting a little bit of compression, with uncompressed data that has a trinary byte structure (such as 24-bit RGB). It's also much more likely in pre-2.3.0 compatibility mode (eg. with OodleLZ_BackwardsCompatible_MajorVersion=2 or lower).


Oodle 2.5.4 - now with Windows UWP

Oodle 2.5.4 is out. There's now a separate Windows UWP SDK (separate from Win32).

Oodle for Windows UWP comes with only the "core" library that does memory to memory compression. The Oodle Core library uses no threads, has minimal dependencies (just the CRT), no funny business, making it very portable.

For full details see the Oodle Change Log


Well Crap

I was cleaning my blog, deleting a bunch old posts, and accidentally deleted some I didn't want to. I'm going to repost a few, so if you have a subscription you may see odd old posts floating in because of that.

Unfortunately there's no blogger recover or trash can feature that I can just undo the delete. Frowny face. Also, while I can repost them, the comments are gone. And unfortunately it seems I can't post them to the same URL. The blogger post URL seems to be irrevocably marked with the post date, and even if I retro-date the post, it munges the URL to not be the same as the original.

ADD : I reposted a few of the ones I wanted to save. The new links are :

cbloom rants 09-27-08 On LZ and ACB
cbloom rants 10-05-08 Rant on New Arithmetic Coders
cbloom rants 10-06-08 Followup on the Russian Range Coder
cbloom rants 10-07-08 Random file stuff I've learned
cbloom rants 10-07-08 A little more on arithmetic coding ...
cbloom rants 10-08-08 Arithmetic coders throw away accuracy in lots of little places.
cbloom rants 10-10-08 On LZ Optimal Parsing
cbloom rants 10-10-08 On the Art of Good Arithmetic Coder Use


Oodle Perf with Chunking and Dictionary Size

I get a lot of customers that want to cut their data into small blocks for paging, who ask "what's the benefit of using larger blocks" ?

The larger the block = more compression, and can help throughput (decode speed).

Obviously larger block = longer latency (to load & decode one whole block).

(though you can get data out incrementally, you don't have to wait for the whole decode to get the first byte out; but if you only needed the last byte of the block, it's strictly longer latency).

If you need fine grain paging, you have to trade off the desire to get precise control of your loading with small blocks & the benefits of larger blocks.

(obviously always follow general good paging practice, like amortize disk seeks, combine small resources into paging units, don't load a 256k chunk and just keep 1k of it and throw the rest away, etc.)

As a reference point, here's Kraken on Silesia with various chunk sizes :

Silesia : (Kraken Normal -z4)

 16k : ooKraken    : 211,938,580 ->75,624,641 =  2.855 bpb =  2.803 to 1 
 16k : decode           : 264.190 millis, 4.24 c/b, rate= 802.22 mb/s

 32k : ooKraken    : 211,938,580 ->70,906,686 =  2.676 bpb =  2.989 to 1 
 32k : decode           : 217.339 millis, 3.49 c/b, rate= 975.15 mb/s

 64k : ooKraken    : 211,938,580 ->67,562,203 =  2.550 bpb =  3.137 to 1 
 64k : decode           : 195.793 millis, 3.14 c/b, rate= 1082.46 mb/s

128k : ooKraken    : 211,938,580 ->65,274,250 =  2.464 bpb =  3.247 to 1 
128k : decode           : 183.232 millis, 2.94 c/b, rate= 1156.67 mb/s

256k : ooKraken    : 211,938,580 ->63,548,390 =  2.399 bpb =  3.335 to 1 
256k : decode           : 182.080 millis, 2.92 c/b, rate= 1163.99 mb/s

512k : ooKraken    : 211,938,580 ->61,875,640 =  2.336 bpb =  3.425 to 1 
512k : decode           : 182.018 millis, 2.92 c/b, rate= 1164.38 mb/s

1024k: ooKraken    : 211,938,580 ->60,602,177 =  2.288 bpb =  3.497 to 1 
1024k: decode           : 181.486 millis, 2.91 c/b, rate= 1167.80 mb/s

files: ooKraken    : 211,938,580 ->57,451,361 =  2.169 bpb =  3.689 to 1 
files: decode           : 206.305 millis, 3.31 c/b, rate= 1027.31 mb/s

16k   :  2.80:1 ,   15.7 enc mbps ,  802.2 dec mbps
32k   :  2.99:1 ,   19.7 enc mbps ,  975.2 dec mbps
64k   :  3.14:1 ,   22.8 enc mbps , 1082.5 dec mbps
128k  :  3.25:1 ,   24.6 enc mbps , 1156.7 dec mbps
256k  :  3.34:1 ,   25.5 enc mbps , 1164.0 dec mbps
512k  :  3.43:1 ,   25.4 enc mbps , 1164.4 dec mbps
1024k :  3.50:1 ,   24.6 enc mbps , 1167.8 dec mbps
files :  3.69:1 ,   18.9 enc mbps , 1027.3 dec mbps

(note these are *chunks* not a window size; no carry-over of compressor state or dictionary is allowed across chunks. "files" means compress the individual files of silesia as whole units, but reset compressor between files.)

You may have noticed that the chunked files (once you get past the very small 16k,32k) are somewhat faster to decode. This is due to keeping match references in the CPU cache in the decoder.

Limitting the match window (OodleLZ_CompressOptions::dictionarySize) gives the same speed benefit for staying in cache, but with a smaller compression win.

window 128k : ooKraken    : 211,938,580 ->61,939,885 =  2.338 bpb =  3.422 to 1 
window 128k : decode           : 181.967 millis, 2.92 c/b, rate= 1164.71 mb/s

window 256k : ooKraken    : 211,938,580 ->60,688,467 =  2.291 bpb =  3.492 to 1 
window 256k : decode           : 182.316 millis, 2.93 c/b, rate= 1162.48 mb/s

window 512k : ooKraken    : 211,938,580 ->59,658,759 =  2.252 bpb =  3.553 to 1 
window 512k : decode           : 184.702 millis, 2.97 c/b, rate= 1147.46 mb/s

window 1M : ooKraken    : 211,938,580 ->58,878,065 =  2.222 bpb =  3.600 to 1 
window 1M : decode           : 184.912 millis, 2.97 c/b, rate= 1146.16 mb/s

window 2M :  ooKraken    : 211,938,580 ->58,396,432 =  2.204 bpb =  3.629 to 1 
window 2M :  decode           : 182.231 millis, 2.93 c/b, rate= 1163.02 mb/s

window 4M :  ooKraken    : 211,938,580 ->58,018,936 =  2.190 bpb =  3.653 to 1 
window 4M : decode           : 182.950 millis, 2.94 c/b, rate= 1158.45 mb/s

window 8M : ooKraken    : 211,938,580 ->57,657,484 =  2.176 bpb =  3.676 to 1 
window 8M : decode           : 189.241 millis, 3.04 c/b, rate= 1119.94 mb/s

window 16M: ooKraken    : 211,938,580 ->57,525,174 =  2.171 bpb =  3.684 to 1 
window 16M: decode           : 202.384 millis, 3.25 c/b, rate= 1047.21 mb/s

files     : ooKraken    : 211,938,580 ->57,451,361 =  2.169 bpb =  3.689 to 1 
files     : decode           : 206.305 millis, 3.31 c/b, rate= 1027.31 mb/s

window 128k:  3.42:1 ,   20.1 enc mbps , 1164.7 dec mbps
window 256k:  3.49:1 ,   20.1 enc mbps , 1162.5 dec mbps
window 512k:  3.55:1 ,   20.1 enc mbps , 1147.5 dec mbps
window 1M  :  3.60:1 ,   20.0 enc mbps , 1146.2 dec mbps
window 2M  :  3.63:1 ,   19.7 enc mbps , 1163.0 dec mbps
window 4M  :  3.65:1 ,   19.3 enc mbps , 1158.5 dec mbps
window 8M  :  3.68:1 ,   18.9 enc mbps , 1119.9 dec mbps
window 16M :  3.68:1 ,   18.8 enc mbps , 1047.2 dec mbps
files      :  3.69:1 ,   18.9 enc mbps , 1027.3 dec mbps

WARNING : tuning perf to cache size is obviously very machine dependent; I don't really recommend fiddling with it unless you know the exact hardware you will be decoding on. The test machine here has a 4 MB L3, so speed falls off slightly as window size approaches 4 MB.

If you do need to use tiny chunks with Oodle ("tiny" being 32k or smaller; 128k or above is in the normal intended operating range) here are a few tips to consider :

1. Consider pre-allocating the Decoder object and passing in the memory to the OodleLZ_Decompress calls. This avoids doing a malloc per call, which may or may not be significant overhead.

2. Consider changing OodleConfigValues::m_OodleLZ_Small_Buffer_LZ_Fallback_Size . The default is 2k bytes. Buffers smaller than that will use LZB16 instead of the requested compressor, because many of the new ones don't do well on tiny buffers. If you want to have control of this yourself, you can set this to 0.

3. Consider changing OodleLZ_CompressOptions::spaceSpeedTradeoffBytes . This is the number of bytes that must be saved from the compressed output size before the encoder will choose a slower decode mode. eg. it controls decisions like whether literals are sent raw or with entropy coding. This number is scaled for full size buffers (128k bytes or more). When using tiny buffers, it will choose to avoid entropy coding more often. You may wish to dial down this value to scale to your buffers. The default is 256 ; I recommend trying 128 to see what the effect is.


Oodle on the Nintendo Switch

Oodle is coming soon to the Nintendo Switch (NX), an ARM A57 device.

Initial performance test vs. the software zlib (1.2.8) provided in the Nintendo SDK :

lzt99                : nn_deflate : 1.883 to 1 : 74.750 MB/s
lzt99                : LZNA       : 2.723 to 1 : 24.886 MB/s
lzt99                : Kraken     : 2.549 to 1 : 238.881 MB/s
lzt99                : Hydra      : 2.519 to 1 : 274.433 MB/s
lzt99                : Mermaid    : 2.393 to 1 : 328.930 MB/s
lzt99                : Selkie     : 1.992 to 1 : 660.859 MB/s

total                : nn_deflate : 1.883 to 1 : 74.750 MB/s
total                : LZNA       : 2.723 to 1 : 24.886 MB/s
total                : Kraken     : 2.549 to 1 : 238.881 MB/s
total                : Hydra      : 2.519 to 1 : 274.433 MB/s
total                : Mermaid    : 2.393 to 1 : 328.930 MB/s
total                : Selkie     : 1.992 to 1 : 660.859 MB/s

Kraken is 3.21X faster than zlib, with way more compression (35% more).

All tests single threaded, 64-bit.

I've also included Hydra, at a space-speed tradeoff value between Kraken & Mermaid (sstb=300). It's a bit subtle, perhaps you can see it best in the loglog chart, but Hydra here is not just interpolating between Kraken & Mermaid performance, it's actually beating both of them in a Pareto frontier sense.

UPDATE : these are rather old initial port numbers on the Switch. We've done some optimization work since then, and the speeds are way up, particularly Mermaid & Selkie. Don't read too much into the numbers here. I should update them...


Introducing Oodle Mermaid and Selkie

I'm pleased to announce the release of two new compressors in Oodle 2.3.0 : Mermaid and Selkie.

Mermaid and Selkie are the super-fast-to-decode distant relatives of Kraken. They use some of the same ideas and technology as Kraken, but are independent compressors targetted at even higher speed and lower compression. Mermaid & Selkie make huge strides in what's possible in compression in the high-speed domain, the same way that Kraken did in the high-compression domain.

Mermaid is about twice as fast as Kraken, but with compression around Zlib levels.

Selkie is one of the fastest decompressors in the world, and also gets much more compression than other very-high-speed compressors.

( Oodle is my data compression library that we sell at RAD Game Tools , read more about it there )

Kraken, Mermaid, and Selkie all use an architecture that makes space-speed decisions in the encoder to give the best tradeoff of compressed size vs decoding speed. The three compressors have different performance targets and make decisions suited for each one's usage domain (Kraken favors more compression and will give up some speed, Selkie strongly favors speed, Mermaid is in between).

For detailed information about the new Mermaid and Selkie I've written a series of posts :

cbloom rants Introducing Oodle Mermaid and Selkie
cbloom rants Oodle 2.3.0 All Test Sets
cbloom rants Oodle 2.3.0 ARM Report
cbloom rants Oodle Mermaid and Selkie on PS4
cbloom rants Oodle Mermaid
cbloom rants Oodle Selkie
RAD Game Tools - Oodle Network and Data Compression

Here are some representative numbers on the Silesia test set : (sum of time and size on individual files)

Oodle 2.3.0 Silesia -z6

Kraken     : 4.082 to 1 : 999.389 MB/s
Mermaid    : 3.571 to 1 : 2022.038 MB/s
Selkie     : 3.053 to 1 : 2929.770 MB/s

zstdmax    : 4.013 to 1 : 468.497 MB/s
zlib9      : 3.128 to 1 : 358.681 MB/s
lz4hc      : 2.723 to 1 : 2267.021 MB/s

on Win64 (Core i7-3770 3.4 GHz)

On Silesia, Mermaid is 5.65X faster to decode than zlib, and gets 14% more compression. Selkie is 1.3X faster to decode than LZ4 and gets 12% more compression.

Charts on Silesia total : (charts show time and size - lower is better!)

And the speedup chart on Silesia, which demonstrates the space-speed efficiency of a compressor in different usage domains.

Kraken was a huge step in the Pareto frontier that pushed the achievable speedup factor way up beyond what other compressers were doing. There's a pre-Kraken curve where we thought the best possible tradeoff existed, that most other compressors in the world roughly lie on (or under). Kraken set a new frontier way up on its own with nobody to join it; Mermaid & Selkie are the partners on that new curve that have their peaks at higher speeds than Kraken.

You can also see this big jump of the new family very easily in scatter plots, which we'll see in later posts .

Oodle Mermaid and Selkie on PS4

The PS4 is a lovely platform to benchmark on because it's standard. It's also very easy to run tests on. Performance of these compressors on the Xbox One is extremely similar, small variatation due to clock rate (1.75 vs 1.6 GHz) and compiler (MSVC vs clang/llvm).

Everything is slow on the PS4 in absolute terms (it's a slow chip and difficult to optimize for). The Oodle compressors do very well, even better in relative terms on PS4 than on typical PC's.

Kraken is usually around ~2X faster than ZStd on PC's, but is 3X faster on PS4. Mermaid is usually just slightly slower than LZ4 on PC's, but is solidly faster than LZ4 on PS4.

lzt99 :

Kraken     : 2.477 to 1 : 390.582 MB/s
Mermaid    : 2.279 to 1 : 749.896 MB/s
Selkie     : 1.937 to 1 : 1159.064 MB/s

zstd       : 2.374 to 1 : 133.498 MB/s
miniz      : 1.883 to 1 : 85.654 MB/s
lz4hc-safe : 1.669 to 1 : 673.616 MB/s
LZSSE8     : 1.626 to 1 : 767.106 MB/s

Mermaid is faster than LZ4 on PS4 !! Wow! And the compression level is in a totally different domain than other super-fast decompressors like LZ4 or LZSSE.

lzt99 is a good case for Selkie & Mermaid. Selkie beats zlib compression ratio while being 75% faster than LZ4.

All compressors here are fuzz-safe, and run in safe mode if they have optional safe/unsafe modes.

Charts : (showing time and size - lower is better!)

lzt99 :

the raw data :

PS4 : Oodle 230 : (-z6)

inName : lzt:/lzt99

reference :

miniz      : 24,700,820 ->13,120,668 =  4.249 bpb =  1.883 to 1
miniz_decompress_time : 288.379 millis, 18.61 c/b, rate= 85.65 mb/s

zstd       : 24,700,820 ->10,403,228 =  3.369 bpb =  2.374 to 1
zstd_decompress_time : 185.028 millis, 11.94 c/b, rate= 133.50 mb/s

lz4hc      : 24,700,820 ->14,801,510 =  4.794 bpb =  1.669 to 1
LZ4_decompress_safe_time : 36.669 millis, 2.37 c/b, rate= 673.62 mb/s

LZSSE8     : 24,700,820 ->15,190,395 =  4.920 bpb =  1.626 to 1
decode_time      : 32.200 millis, 2.08 c/b, rate= 767.11 mb/s

Oodle :

Kraken     : 24,700,820 -> 9,970,882 =  3.229 bpb =  2.477 to 1
decode           : 63.241 millis, 4.08 c/b, rate= 390.58 mb/s

Mermaid    : 24,700,820 ->10,838,455 =  3.510 bpb =  2.279 to 1
decode           : 32.939 millis, 2.13 c/b, rate= 749.90 mb/s

Selkie : 24,700,820 ->12,752,506 =  4.130 bpb =  1.937 to 1
decode           : 21.311 millis, 1.38 c/b, rate= 1159.06 mb/s

BTW for reference, the previous best compressor in Mermaid's domain was LZNIB. Before these new compressors, LZNIB was quite unique in that it got good decode speeds, much faster than the LZ-Huffs of the time (eg. 3X faster than ZStd) but with compression usually better than ZLib. Well, LZNIB is still quite good compared to other competition, but it's just clobbered by the new Oceanic Bestiary compressors. The new compressor in this domain is Mermaid and it creams LZNIB for both size and speed :

LZNIB -z6  : 24,700,820 ->12,015,591 =  3.892 bpb =  2.056 to 1 
decode           : 58.710 millis, 3.79 c/b, rate= 420.73 mb/s

Mermaid    : 24,700,820 ->10,838,455 =  3.510 bpb =  2.279 to 1
decode           : 32.939 millis, 2.13 c/b, rate= 749.90 mb/s

See the index of this series of posts for more information : Introducing Oodle Mermaid and Selkie .
For more about Oodle visit RAD Game Tools

Oodle Mermaid

Mermaid is a new compressor with a unique balance of space and speed. Mermaid is very close to LZ4 decode speeds, while usually beating Zlib compression ratios.

There's really nothing even close. It's way beyond what was previously thought possible.

Mermaid supports unbounded distance match references. This is part of how it gets such high compression. It does so in a new way which reduces the speed penalty normally incurred by large-window LZ's.

Mermaid almost always compresses better than ZLib. The only exception is on small files, less than 32k or so. The whole Oceanic Bestiary family is best suited to files over 64k. They work fine on smaller files, but they lose their huge advantage. It's always best to combine small files into larger units for compression, particularly so with these compressors.

There's not really any single compressor to compare Mermaid to. What we can do is compare vs. Zlib's compression ratio and LZ4's speed. A kind of mythological hybrid like a Chimera, the head of a Zlib and the legs of an LZ4.

Tests on Win64 (Core i7-3770 3.4 GHz) :

Silesia :

On Silesia, Mermaid is just slightly slower than LZ4 but compresses much more than Zlib !!

PD3D :

On PD3D, Mermaid gets almost exactly the compression level of ZLib but the decode speed of LZ4. Magic! It turns out you *can* have your cake and eat it too.

Game Test Set :

lzt99 :

Mermaid really compresses well on lzt99 ; not only does it kill Zlib, it gets close to high compression LZ-Huffs like RAR. (RAR gets 10826108 , Mermaid 10838455 bytes).

Seven :

Because of the space-speed optimizing nature of Mermaid, it will make decisions to be slower than LZ4 when it can find big compression gains. For example if you look at the individual files of the "Seven" test set below - Mermaid is typically right around the same speed as LZ4 or even faster (baby7,dds7,exe7,game7,wad7 - all same speed or faster than LZ4). On a few files it decides to take an encoding slower to decode than LZ4 - model7,enwik7, and records7. The biggest differences are enwik7 and records7, but if you look at the compression ratios - those are all the files where it found huge size differences over LZ4. It has an internal exchange rate for time vs. bytes that it must meet in order to take that encoding, trying to optimize for its space-speed target usage.

Seven files :

Silesia              : Mermaid    : 3.571 to 1 : 2022.038 MB/s
Silesia              : lz4hc      : 2.723 to 1 : 2267.021 MB/s
Silesia              : zlib9      : 3.128 to 1 : 358.681 MB/s

GameTestSet          : Mermaid    : 2.284 to 1 : 2718.095 MB/s
GameTestSet          : lz4hc      : 1.776 to 1 : 3226.887 MB/s
GameTestSet          : zlib9      : 1.992 to 1 : 337.986 MB/s

lzt99                : Mermaid    : 2.279 to 1 : 2283.278 MB/s
lzt99                : lz4hc      : 1.669 to 1 : 2605.366 MB/s
lzt99                : zlib9      : 1.883 to 1 : 309.304 MB/s

PD3D                 : Mermaid    : 2.875 to 1 : 2308.830 MB/s
PD3D                 : lz4hc      : 2.238 to 1 : 2369.666 MB/s
PD3D                 : zlib9      : 2.886 to 1 : 382.349 MB/s

Seven                : Mermaid    : 2.462 to 1 : 2374.212 MB/s
Seven                : lz4hc      : 2.000 to 1 : 2521.296 MB/s
Seven                : zlib9      : 2.329 to 1 : 315.370 MB/s

See the index of this series of posts for more information : Introducing Oodle Mermaid and Selkie .
For more about Oodle visit RAD Game Tools

Oodle Selkie

Selkie is the faster cousin of Mermaid, distant relative of Kraken.

Selkie is all about decode speed, it aims to be the fastest mainstream decompressor in the world, and still gets more compression than anything in the high-speed domain.

Selkie does not currently have a super fast encoder. It's got good optimal parse encoders that produce carefully tuned encoded file which offer excellent space-speed tradeoff.

The closest compressors to Selkie are the fast byte-wise small-window coders like LZ4 and LZSSE (and Oodle's LZB16). These are all just obsolete now (in terms of ratio vs decode speed), Selkie gets a lot more compression (sometimes close to Zlib compression levels!) and is also much faster.

Selkie will not compress tiny buffers, or files that only compress a little bit. For example if you give Selkie something like an mp3, it might be able to compress it to 95% of its original size, saving a few bytes. Selkie will refuse to do that and just give you the original uncompressed file. If you wanted that compression, that means you wanted to save only a few bytes at a large time cost, which means you don't actually want a fast compressor like Selkie. You in fact wanted a compressor that was more willing to trade time for bytes, such as Mermaid or Kraken. Selkie will not abide logical inconsistency.

Selkie generally beats LZ4 compression even on small files (under 64k) but really gets ahead on files larger than 64k where the unbounded match distances can find big wins.

As usual, I'm not picking on LZ4 here because it's bad; I'm comparing to it because it's the best of the rest, and it's widely known. Both decompressors are run fuzz-safe.

Tests on Win64 (Core i7-3770 3.4 GHz) : 
(for reference, this machine runs memcpy at roughly 8 GB/s)
(total of time & size on each test set)

gametestset : ooSelkie    : 143,579,361 ->70,716,380 =  3.940 bpb =  2.030 to 1 
gametestset : decode      : 29.239 millis, 0.69 c/b, rate= 4910.61 mb/s

gametestset : lz4hc       : 143,579,361 ->80,835,018 =  4.504 bpb =  1.776 to 1 
gametestset : decode      : 44.495 millis, 1.05 c/b, rate= 3226.89 mb/s

pd3d : ooSelkie    : 31,941,800 ->13,428,298 =  3.363 bpb =  2.379 to 1 
pd3d : decode      : 8.381 millis, 0.89 c/b, rate= 3811.29 mb/s

pd3d : lz4hc       : 31,941,800 ->14,273,195 =  3.575 bpb =  2.238 to 1 
pd3d : decode      : 13.479 millis, 1.44 c/b, rate= 2369.67 mb/s

seven : ooSelkie    : 80,000,000 ->36,460,084 =  3.646 bpb =  2.194 to 1 
seven : decode      : 21.458 millis, 0.91 c/b, rate= 3728.26 mb/s

seven : lz4hc       : 80,000,000 ->39,990,656 =  3.999 bpb =  2.000 to 1 
seven : decode      : 31.730 millis, 1.35 c/b, rate= 2521.30 mb/s

silesia : ooSelkie    : 211,938,580 ->69,430,966 =  2.621 bpb =  3.053 to 1 
silesia : decode      : 72.340 millis, 1.16 c/b, rate= 2929.77 mb/s

silesia : lz4hc       : 211,938,580 ->77,841,566 =  2.938 bpb =  2.723 to 1 
silesia : decode      : 93.488 millis, 1.50 c/b, rate= 2267.02 mb/s

The edge that Selkie has over LZ4 is even greater on more difficult platforms like the PS4.

To get a better idea of the magic of Selkie it's useful to look at the other Oodle compressors that are similar to Selkie.

LZB16 is Oodle's LZ4 variant; it gets slightly more compression and slightly more decode speed, but they're roughly equal. It's included here for comparison to LZBLW.

Oodle's LZBLW is perhaps the most similar compressor to Selkie. It's like LZB16 (LZ4) but adds large-window matches. That ability to do long-distance matches hurts speed a tiny bit (2873 mb/s -> 2596 mb/s), but helps compression a lot.

Oodle's LZNIB is nibble-wise, with unbounded offsets and a rep match. It gets good compression, generally better than Zlib, with speed much higher than any LZ-Huff. LZNIB is in a pretty unique space speed tradeoff zone without much competition outside of Oodle.

lz4hc     : 24,700,820 ->14,801,510 =  4.794 bpb =  1.669 to 1 
decode    : 9.481 millis, 1.31 c/b, rate= 2605.37 mb/s

ooLZB16   : 24,700,820 ->14,754,643 =  4.779 bpb =  1.674 to 1
decode    : 8.597 millis, 1.18 c/b, rate= 2873.17 mb/s

ooLZNIB   : 24,700,820 ->12,014,368 =  3.891 bpb =  2.056 to 1
decode    : 17.420 millis, 2.40 c/b, rate= 1417.93 mb/s

ooLZBLW   : 24,700,820 ->13,349,800 =  4.324 bpb =  1.850 to 1
decode    : 9.512 millis, 1.31 c/b, rate= 2596.80 mb/s

ooSelkie  : 24,700,820 ->12,752,506 =  4.130 bpb =  1.937 to 1 
decode    : 6.410 millis, 0.88 c/b, rate= 3853.57 mb/s

LZNIB and LZBLW were both pretty cool before Selkie, but now they're just obsolete.

LZBLW gets a nice compression gain over LZB16, but Selkie gets even more, and is way faster!

LZNIB beats Selkie compression, but is way slower, around 3X slower, in fact it's slower than Mermaid (2283.28 mb/s and compresses to 10,838,455 = 3.510 bpb = 2.279 to 1).

You can see from the curves that Selkie just completely covers the curves of LZB16,LZBLW, and LZ4. When a curve is completely covered like that, it means that it was beaten for both space and speed, so there is no domain where that compressor is ever better. LZNIB just peeks out of the Selkie curve because it gets higher compression (albeit at lower speed), so there is a domain where it is the better choice - but in that domain Mermaid just completely dominates LZNIB, so it too is obsolete.

See the index of this series of posts for more information : Introducing Oodle Mermaid and Selkie .
For more about Oodle visit RAD Game Tools

Oodle 2.3.0 ARM Report

I prepared a detailed report of Oodle's performance on ARM mobile devices (Android and iOS).

The full report is here :

oodle_arm_report on cbloom.com

It's a thorough test on many devices and several corpora. See the full details there.

Cliff notes is : Oodle's great on ARM.

For example on the iPadAir2 64-bit , on Silesia :

We found that the iOS devices are generally very good and easy to program for. They're more like desktop Intel chips; they don't have any terrible performance cliffs. The Android ARM devices we tested on were rather more difficult. For one thing they have horrible thermal saturation problems that makes testing on them very difficult. They also have some odd performance quirks.

I'm sure we could get a lot more speed on ARM, but it's rather nasty to optimize for. For one thing the thermal problems mean that iterating and getting good numbers is a huge pain. It's hard to tell if a change helped or not. For another, there's a wide variety of devices and it's hard to tell which to optimize for, and they have different performance shortfalls. So there's definitely a lot left on the table here.

Mermaid & Selkie are quite special on ARM. Many of these devices have small caches (as small as 512k L2) and very slow main memory (slow wrst latency; they often have huge bandwidth, but latency is what I need). Mermaid & Selkie are able to use unbounded windows for LZ without suffering a huge speed hit, due to the unique way they are structured. Kraken doesn't have the same magic trick so it benefits from a limited window, as demonstrated in the report.

See the index of this series of posts for more information : Introducing Oodle Mermaid and Selkie .
For more about Oodle visit RAD Game Tools

old rants