2/01/2017

Oodle Hydra


02-01-17 | Oodle Hydra

Oodle Hydra - the many headed beast.

Hydra is a meta-compressor which selects Kraken, Mermaid, or Selkie per block. It uses the speed fit model of each compressor to do a lagrangian space-speed optimization decision about which compressor is maximizing the desired lagrange cost (size + lambda*time).

It turns out to be quite interesting.

(this is of course in addition to each of those compressors internally making space-speed decisions; each of them can enable or disable internal processing modes using the same lagrange optimization model. (eg. they can turn on and off entropy coding for various streams). And there are additional per-block implicit decisions such as choosing uncompressed blocks and huff-only blocks.)

Hydra is a single entry point to all the Oodle compressors. You simply choose how much you care about size vs. decode speed, that corresponds to a certain lagrange lambda. In Oodle this is called "spaceSpeedTradeoffBytes". It's the # of bytes that compression must save in order to take up N cycles more of decode time. You then no longer think about "do I want Kraken or Mermaid" , Oodle makes the right decision for you that optimizes the goal.

Hydra can interpolate the performance of Kraken & Mermaid to create a meta-compressor that targets the points in between. That in itself is a somewhat surprising result. Say Kraken is at 1000 mb/s , Mermaid is at 2000 mb/s decode speed, but you really want a compressor that's around 1500 mb/s with compression between the two. We don't know of a Pareto-optimal compressor that is between Kraken and Mermaid, so you're sunk, right? No, you can use Hydra.

I should note that Hydra is very much about *whole corpus* performance. That is, if your target is 1500 mb/s, you may not hit that on any one file - that file could go either all-Kraken or all-Mermaid. The target is hit overall. This is intentional and good, but if for whatever reason you are trying to hit a specific speed for an individual file then Hydra is not the way to do that.

It leads to an idea that I've tried to advocate for before : corpus lagrange optimization for bit rate allocation. If you are dealing with a limited resource that you want to allocate well, such as disk size or download size or time to load - you want to allocate that resource to the data that can make the best use of it. eg. spend your decode time where it makes the biggest size difference. (I encourage this for lossy bit rate allocation as well). So with Hydra some files decode slower and some decode faster, but when they are slower it's because the time was worth it.

And now some reports. We're going to look at 3 copora. On Silesia and gametestset, Hydra interpolates as expected. But then on PD3D, something magic happens ...

(Oodle 2.4.2 , level 7, Core i7-3770 x64)

Silesia :

total                : Kraken     : 4.106 to 1 : 994.036 MB/s
total                : Mermaid    : 3.581 to 1 : 1995.919 MB/s
total                : Hydra200   : 4.096 to 1 : 1007.692 MB/s
total                : Hydra288   : 4.040 to 1 : 1082.211 MB/s
total                : Hydra416   : 3.827 to 1 : 1474.452 MB/s
total                : Hydra601   : 3.685 to 1 : 1780.476 MB/s
total                : Hydra866   : 3.631 to 1 : 1906.823 MB/s
total                : Hydra1250  : 3.572 to 1 : 2002.683 MB/s

gametestset :

total                : Kraken     : 2.593 to 1 : 1309.865 MB/s
total                : Mermaid    : 2.347 to 1 : 2459.442 MB/s
total                : Hydra200   : 2.593 to 1 : 1338.429 MB/s
total                : Hydra288   : 2.581 to 1 : 1397.465 MB/s
total                : Hydra416   : 2.542 to 1 : 1581.959 MB/s
total                : Hydra601   : 2.484 to 1 : 1836.988 MB/s
total                : Hydra866   : 2.431 to 1 : 2078.516 MB/s
total                : Hydra1250  : 2.366 to 1 : 2376.828 MB/s

PD3D :

total                : Kraken     : 3.678 to 1 : 1054.380 MB/s
total                : Mermaid    : 3.403 to 1 : 1814.660 MB/s
total                : Hydra200   : 3.755 to 1 : 1218.745 MB/s
total                : Hydra288   : 3.738 to 1 : 1249.838 MB/s
total                : Hydra416   : 3.649 to 1 : 1381.570 MB/s
total                : Hydra601   : 3.574 to 1 : 1518.151 MB/s
total                : Hydra866   : 3.487 to 1 : 1666.634 MB/s
total                : Hydra1250  : 3.279 to 1 : 1965.039 MB/s

Whoah! Magic!

On PD3D, Hydra finds big free wins - it not only compresses more than Kraken, it decodes significantly faster, repeating the above to point it out :


total                : Kraken     : 3.678 to 1 : 1054.380 MB/s

total                : Hydra288   : 3.738 to 1 : 1249.838 MB/s
 Kraken compression ratio is in between here, around 1300 MB/s
total                : Hydra416   : 3.649 to 1 : 1381.570 MB/s

You can see it visually in the loglog plot; if you draw a line between Kraken & Mermaid, the Hydra data points are above that line (more compression) and to the right (faster).

What's happening is that once in a while there's a block where Mermaid gets the same or more compression than Kraken. While that's rare, when it does happen you just get a big free win from switching to Mermaid on that block. More often, Mermaid only gets a little bit less compression than Kraken but a lot less decode time, so switching is advantageous in the space-speed lagrange cost.

Crucial to Hydra is having a decoder speed fit for every compressor that can simulate decoding a block and count cycles needed to decode on an imaginary machine. You need a model because you don't want to actually measure the time by running the decoder on the current machine - it would take lots of runs to get reliable timing, and it would mean that you are optimizing for the exact machine that you are encoding on. I currently use a single virtual machine that is a blend of various real platforms; in the future I might expose the ability to use virtual machines that simulate specific target machines (because Hydra might make decisions differently if it knows it is targeting PC-x64 vs. Jaguar-x64 vs. Aarch64-on-A57 , etc.).

Hydra is exciting to me as a general framework for the future of Oodle. It provides a way to add in new compression modes and be sure that they are never worse. That is, you always can start with Kraken per block, and then new modes could be picked block by block only when they are known to beat Kraken (in a space-speed sense). It lets you mix in compressors that you specifically don't expect to be good in general on all data, but that might be amazing once in a while on certain data.

(Hydra requires compressors that carry no state across blocks, so you can't naively mix in something like PPM or CM/PAQ. To optimize a switching choice with compressors that carry state requires a trellis-quantization like lattice dynamic programming optimization and is rather more complex to do quickly)

No comments:

old rants