Oodle's speed on the Sony PS4 (and Microsoft Xbox One) and Nintendo Switch is superb. With the slower processors in these consoles (compared to a modern PC), the speed advantage of Oodle makes a big difference in total load time or CPU use.
The numbers below are from runs on the private test file "lzt99". I'm mainly looking at the speed numbers here, not the compression ratio (compression-wise, we do so well on lzt99 that it's a bit silly, and also not entirely fair to the competition).
On the Nintendo Switch (clang ARM-A57 AArch64 1.02 GHz) :
Oodle 2.6.0 -z8 :
Leviathan  : 2.780 to 1 :  205.50 MB/s
Kraken     : 2.655 to 1 :  263.54 MB/s
Mermaid    : 2.437 to 1 :  499.72 MB/s
Selkie     : 1.904 to 1 :  957.60 MB/s
zlib from nn_deflate :
zlib       : 1.883 to 1 :   74.75 MB/s
And on the Sony PS4 (clang x64 AMD Jaguar 1.6 GHz) :
Oodle 2.6.0 -z8 :
Leviathan  : 2.780 to 1 :  271.53 MB/s
Kraken     : 2.655 to 1 :  342.49 MB/s
Mermaid    : 2.437 to 1 :  669.34 MB/s
Selkie     : 1.904 to 1 : 1229.26 MB/s
non-Oodle reference (2016) :
brotli-11  : 2.512 to 1 :   77.84 MB/s
miniz      : 1.883 to 1 :   85.65 MB/s
brotli-9   : 2.358 to 1 :   95.36 MB/s
zlib-ng    : 1.877 to 1 :  109.30 MB/s
zstd       : 2.374 to 1 :  133.50 MB/s
lz4hc-safe : 1.669 to 1 :  673.62 MB/s
LZSSE8     : 1.626 to 1 :  767.11 MB/s
The Microsoft Xbox One has similar performance to the PS4. Mermaid & Selkie can decode faster than the
hardware DMA compression engine in the PS4 and Xbox One, and usually compress more, provided they aren't
limited to the small chunks that the hardware DMA engine needs.
Note that the PS4 non-Oodle reference data is from my earlier runs back in 2016, in the posts "Oodle Mermaid and Selkie on PS4" and "PS4 Battle : MiniZ vs Zlib-NG vs ZStd vs Brotli vs Oodle". They should be considered only rough reference points; I imagine some of those codecs are slightly different now, but does even a 10 or 20 or 50% improvement really make much difference? (Also note that there's no true zlib reference in that PS4 set; miniz is close but a little different, and zlib-ng is faster than standard zlib.)
Leviathan is in a different compression class than any of the other options, and is still 2-3X faster than zlib.
Something I spotted while gathering the old numbers that I think is worth talking about:
If you look at the old Kraken PS4 numbers from "Oodle 2.3.0 : Kraken Improvement", you would see :
PS4 lzt99
old :
Oodle 2.3.0 -z6 : 2.477 to 1 : 389.28 MB/s
Oodle 2.3.0 -z7 : 2.537 to 1 : 363.70 MB/s
vs new :
Oodle 2.6.0 -z8 : 2.655 to 1 : 342.49 MB/s
(-z8 encode level didn't exist back then)
Oh no! Oodle's gotten slower to decode!
Well no, it hasn't. But this is a good example of how looking at just space or speed on their own can be misleading.
Oodle's encoders are always optimizing for a space-speed goal. There are a range of solutions to that problem which have nearly the same space-speed score, but have different sizes or speeds.
So part of what's happened here is that Oodle 2.6.0 is just hitting a slightly different spot in the space-speed solution space than Oodle 2.3.0 is. It's finding a bit stream that is smaller, and trades off some decode speed for that. With its space-speed cost model, it measures that tradeoff as being a good value. (the user can set the relative value of time & bytes that Oodle uses in its scoring via the spaceSpeedTradeoffBytes parameter).
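To make the idea of pricing time against bytes concrete, here is a minimal sketch of a generic Lagrangian-style cost. It is not Oodle's actual cost model, and `time_price` is a hypothetical stand-in that is not in the same units as spaceSpeedTradeoffBytes. The two candidates and their numbers are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One possible encoding of the same data (hypothetical numbers)."""
    name: str
    compressed_bytes: int
    decode_seconds: float


def cost(c, time_price):
    """Size plus decode time, with time converted to bytes by time_price.

    time_price is "how many bytes one second of decode time is worth".
    """
    return c.compressed_bytes + time_price * c.decode_seconds


def pick(candidates, time_price):
    """Choose the candidate with the lowest combined cost."""
    return min(candidates, key=lambda c: cost(c, time_price))


candidates = [
    Candidate("smaller, slower", 1_000_000, 0.0030),
    Candidate("larger, faster", 1_020_000, 0.0025),
]

# When time is cheap the smaller stream wins; when time is expensive
# the faster stream wins. Same candidates, different price.
cheap_time = pick(candidates, 10_000_000)
expensive_time = pick(candidates, 100_000_000)
print(cheap_time.name)
print(expensive_time.name)
```

Both candidates are reasonable answers; which one an encoder picks depends only on the price it is told to use.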
But something else has also happened - Oodle 2.6.0 has just gotten much better. It hasn't just stepped along the Pareto curve to a different but equally good solution - it has stepped perpendicularly to the old Pareto curve and is finding better solutions.
At RAD we measure that using the "correct" Weissman score, which provides a way of combining a space-speed point into a single number that can be used to tell whether you have made a real Pareto improvement or just a tangential step.
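The Weissman score formula itself isn't given in this post, so the sketch below does not implement it. Instead it uses a much cruder stand-in: total load time per raw megabyte, assuming the compressed data is read from a disk and then decoded, with no overlap between the two. The function name and the disk speeds are assumptions chosen for illustration; the ratios and decode speeds are the PS4 Kraken numbers quoted above.

```python
def load_seconds_per_raw_mb(ratio, decode_mb_s, disk_mb_s):
    """Seconds to load one MB of raw data: read compressed bytes, then decode.

    Assumes reading and decoding are strictly serial (a simplification).
    """
    read = 1.0 / (ratio * disk_mb_s)   # compressed size is 1/ratio MB
    decode = 1.0 / decode_mb_s
    return read + decode


# (compression ratio, decode MB/s) on PS4 lzt99, from the figures above
points = {
    "2.3.0 -z7": (2.537, 363.70),
    "2.6.0 -z8": (2.655, 342.49),
}


def fastest(disk_mb_s):
    """Name of the point with the lowest modeled load time at this disk speed."""
    return min(points, key=lambda k: load_seconds_per_raw_mb(*points[k], disk_mb_s))


for disk in (50.0, 200.0):
    print(disk, fastest(disk))
```

Under this toy model the smaller stream comes out ahead on the slower disk and the faster-decoding stream comes out ahead on the faster disk, so neither the ratio nor the decode speed alone settles the comparison.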
The easiest way to see that you have definitely made an improvement is to run Oodle 2.6.0 with a different
spaceSpeedTradeoffBytes price, chosen so that the comparison is unambiguous :
PS4 lzt99
new, with spaceSpeedTradeoffBytes = 1024
Oodle 2.6.0 -z8 : 2.495 to 1 : 445.56 MB/s
vs old :
Oodle 2.3.0 -z6 : 2.477 to 1 : 389.28 MB/s
Now we have higher compression and higher speed, so there's no question of whether we lost anything.
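The "no question" case is just Pareto dominance: better or equal on both axes, strictly better on at least one. A small sketch of that check on the PS4 lzt99 numbers from this post (the helper name `dominates` is mine, for illustration):

```python
def dominates(a, b):
    """True if point a is at least as good as b on both axes and better on one.

    Points are (compression ratio, decode MB/s); higher is better for both.
    """
    at_least_as_good = a[0] >= b[0] and a[1] >= b[1]
    strictly_better = a[0] > b[0] or a[1] > b[1]
    return at_least_as_good and strictly_better


old_z6 = (2.477, 389.28)          # Oodle 2.3.0 -z6
new_z8_default = (2.655, 342.49)  # Oodle 2.6.0 -z8, default tradeoff
new_z8_1024 = (2.495, 445.56)     # Oodle 2.6.0 -z8, spaceSpeedTradeoffBytes = 1024

print(dominates(new_z8_1024, old_z6))
print(dominates(new_z8_default, old_z6))
print(dominates(old_z6, new_z8_default))
```

The 1024 run dominates the old point outright. The default run and the old point don't dominate each other, which is exactly the situation where you need a combined space-speed score to say which is better.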
In general the Oodle 2.6.0 Kraken & Mermaid encoders are making decisions that slightly bias for higher compression (and slower decode; though often the decode speed is very close) than before 2.6.0. If you find you've lost a little decode speed and want it back, increase spaceSpeedTradeoffBytes (try 400).
Read more about Leviathan and Oodle 2.6.0 in these other posts on my blog :
Leviathan Rising
Everything new and tasty in Oodle 2.6.0
Leviathan performance on PS4, Xbox One, and Switch
Leviathan detailed performance report
Oodle Hydra and space-speed flexibility
or visit RAD to read more information about the Oodle SDK