cbloom rants

4/30/2016

Oodle Kraken Pareto Frontier

The whole idea of Kraken is to excel in the size-decodespeed tradeoff. From the beginning of its design, every decision was considered in terms of the size vs decodespeed.

So how does it do?

To visualize the size-decodespeed Pareto frontier, I like to use an imaginary loading problem. You want to load compressed data and then decompress it. One option is the null compressor (or "memcpy") that just loads uncompressed data and then memcpys it. Or you can use compressors that give smaller file sizes, thus quicker loads, but take more time to decompress. By dialing the virtual disk speed, you can see which compressor is best at different tradeoffs of space vs. speed.

Of course you may not actually be just trying to optimize load time, but you can still use this imaginary loading problem as a way to study the space-speed tradeoff. If you care more about size, you look at lower imaginary disk speeds, if you core more about CPU use, you look at higher imaginary disk speeds.

I like to show "speedup" which is the factor of increase in speed using a certain compressor to do load+decomp vs. the baseline (memcpy). So the left hand y-intercepts (disk speed -> 0) show the compression ratio, and the right hand y-intercepts side show the decompression speed (disk speed -> inf), and in between shows the tradeoff. (By "show" I mean "is linearly proportional to, so you can actually look at the ratio of Y-intercepts in the Pareto curve and it tells you compression ratio on the left and decompression speed on the right).

The chart showing "millis" shows time, so obviously lower is better. I show a few ways to combine load & decomp. Sum is the time to load then decomp sequentially. Max is the larger of the two, which is the time if they were perfectly overlapped (not usually possible). Personally I think "l2 sum" is most useful. This is the sqrt of sum of squares, which is a kind of sum that biases towards the larger; it's kind of like a mix of "sum" and "max", it means you want the sum to be small, but you also want them to similar times.

Kraken tops the Pareto curve for decodespeed vs size for a huge range of virtual disk speed; from around 2 mb/s up to around 300 mb/s.

Of course the Pareto curve doesn't tell the whole picture. For one thing I don't have encode speed in the equation here at all (and sometimes you need to look at the memory use tradeoff too, so it's really a size-decodespeed-encodespeed-memoryuse surface). For another you sometimes have strict requirements, like I must hit a certain file size, and then you just pick the fastest decoder that meets that requirement. One thing the curves do clearly tell you is when a curve just completely covers another (that is, is greater in Y for all values of X), then that compressor is just never needed (in space & decodespeed).

Game Test Set :

Just for comparison to earlier posts on my Blog about the other Oodle compressors, here's a run of lzt99 with the full Oodle compressor set.

I've included ZStd for reference here just to show that Kraken jumping way up is not because the previous Oodle compressors were bad - far from it, they were already Pareto optimal. (I like to compare against ZStd not because it's bad, but because it's excellent; it's the only other compressor in the world I know of that's even close to Oodle in terms of good compression ratio and high speed (ZStd also has nice super-fast encode speeds, and it's targetted slightly differently, it's better on text and Oodle is better on binary, it's not a direct apples comparison)).

You can see that LZHLW is just completely covered by Kraken, and so is now deprecated in Oodle. Even LZNIB and BitKnit are barely peeking out of the curve, so the range where they are the right answer is greatly reduced to more specialized needs. (for example BitKnit still romps on strongly structured data, and is useful if you just need a size smaller than Kraken)

Anonymous said...

Nit: can you change "mbps" to "MBps" or "MB/s"? :)

cbloom said...

Hmm, I just assumed that / was forbid in URLs but it appears to be fine. So yes.

Joe Duarte said...

I've been confused by your units as well. You seem to be using mb/s to mean megabits per second in some posts, while in others you use mb/s to mean megabytes per second.

For example, in this post I think the numbers you report are actually megabytes per second: http://cbloomrants.blogspot.com/2016/05/ps4-battle-miniz-vs-zlib-ng-vs-zstd-vs.html

I infer this based on values like 255 "mb/s" for Silesia decode whereas in the present post you report almost 2400 "mb/s" in the graph for Silesia decode. It would be helpful if you locked down and normalized your units. In fact, I think something about unit normalization and clarity would be a good addition to your excellent recent post on benchmarking. I think it's a good idea to use the same units throughout, for data size, compressed size, comp rate, decomp rate, etc. What we usually see is size in literal bytes, and then rates in some other unit which is often erroneous because of MB and mb or GB and gb conflation. I would normalize on binary prefixes, with mebibytes (MiB) as the unit, and use three decimal places for data sizes.

cbloom said...

Units are always megabytes, I never use bits, I wish nobody ever used bits. Actually millions of bytes not 2^20 of bytes.

I could pretend that B = byte and b = bit but the idea that that's a clear standard is I think a fantasy.

The inconsistency is just because the PS4 is massively slower than even my very old mid-range PC. The raw numbers are kind of all over the place in my posts because I test on a variety of hardware, not one reference box. That's sort of intentional because the perf characteristics vary by machine so testing on just one standard box would be misleading.

In general it's valid to compare speed numbers within one post but not across posts.