5/15/2020

Oodle and UE4 Loading Time

Jon Olick has done a study for us on what benefit Oodle Data compression has on Unreal load times :

Oodle and UE4 Loading Time

This is different than the benchmarks I've shown here, which are just the decompressors being run in isolation in a test bench. Jon measures times in Unreal. "Time to first frame" is the best measurement of load time, but it isn't total load time, as Unreal continues to do async loading after first frame.

Jon measured times on his PC, but with Unreal locked to one thread (as well as we could), and with various simulated disk speeds. This let us see the times more clearly, and gives us times that are more comparable to slower machines, or the last gen of consoles. Consoles like Switch will see the most load time benefit from Oodle, as they need help with both CPU time and IO time, so the double benefit of Oodle really helps there (Oodle gives more compression than zlib and is also much faster to decode, so you save time on both IO and decompression).
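To make the "double benefit" concrete, here is a rough back-of-envelope sketch. It is not Jon's methodology and not how Unreal actually schedules loading. It assumes a purely serial load on one thread (read the compressed bytes, then decompress them) and that decode speed is counted in raw bytes produced. The ratios and decode speeds are the zlib9 and ooKraken7 figures from the Oodle 2.8.6 post on this page. The 1000 MB payload, the disk speeds, and the name `load_seconds` are made up for illustration.

```python
def load_seconds(raw_mb, ratio, decode_mb_per_s, disk_mb_per_s):
    """Serial model: read the compressed bytes from disk, then decompress them."""
    io_seconds = (raw_mb / ratio) / disk_mb_per_s  # compressed size / disk speed
    cpu_seconds = raw_mb / decode_mb_per_s         # assumes speed is in raw bytes out
    return io_seconds + cpu_seconds


# (compression ratio, decode MB/s) from the Oodle 2.8.6 "ect seven" run on the Ryzen
CODECS = {
    "zlib9": (2.33, 328.1),
    "ooKraken7": (3.08, 1258.7),
}

RAW_MB = 1000.0  # hypothetical amount of uncompressed data to load

for disk_mb_per_s in (50.0, 100.0, 500.0):  # hypothetical simulated disk speeds
    for name, (ratio, dec) in CODECS.items():
        t = load_seconds(RAW_MB, ratio, dec, disk_mb_per_s)
        print(f"disk {disk_mb_per_s:5.0f} MB/s  {name:10s} {t:6.2f} s")
```

In this toy model the higher ratio shrinks the IO term and the faster decoder shrinks the CPU term, so the gap between the two codecs shows up at every disk speed. Real load times include plenty of work that is neither IO nor decompression, so treat the output as an illustration only.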

Since Unreal 4.24 we've offered Oodle Data Compression and Network plugins that drop into Unreal without any changes to the Unreal Engine code. This uses the facilities that Epic added for us to make the compressor & network stack pluggable so that we wouldn't have to patch Engine source code to do our integration (thanks Epic!). Now, you just drop in our plugins and Oodle automatically replaces the default compression in Unreal.

Some previous posts on Oodle in Unreal for more :

Oodle Data compression in Unreal

Oodle Network compression test (including Unreal)

5/10/2020

Oodle 2.8.6 Released

Oodle 2.8.6 is out now. See the full changelog at RAD.

Oodle 2.8.6 continues small fixes and tweaks in the 2.8.x major version. Leviathan decompression is now 5-10% faster on modern processors.

I have a new standard test machine, so I want to use this space to leave myself some checkpoint reference numbers on the new machine. My new standard machine is :

AMD Ryzen 9 3950X (CPU locked at 3393 MHz)
Zen2, 16 cores (32 hyper), TSC at 3490 MHz

I always down-clock my test machines and disable turbo (or "boost") for the core clocks. This gives me very reliable profiling (of single threaded work, anyway). If you don't do this, you will see not just variability of runs, but also the first thing you test will usually seem faster than later tests, so your testing may be order dependent. If you don't lock your cores at low clocks, then you should be repeating your tests many times, trying them in different orders, and seeing if your results are stable.
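If you can't lock your clocks, the "repeat many times, in different orders" advice can be sketched like this. This is not my actual test harness, just a minimal Python illustration. It uses the standard library `zlib` as a stand-in workload, and the function names `run_shuffled` and `summarize` are invented for the example. A large gap between the min and the median for a test is a hint that your timings are not stable.

```python
import random
import statistics
import time
import zlib


def time_once(fn, data):
    """Wall-clock seconds for a single call of fn(data)."""
    start = time.perf_counter()
    fn(data)
    return time.perf_counter() - start


def run_shuffled(tests, data, rounds=10, seed=0):
    """Time every test `rounds` times, in a fresh random order each round."""
    rng = random.Random(seed)
    names = list(tests)
    samples = {name: [] for name in names}
    for _ in range(rounds):
        rng.shuffle(names)  # change the order so no test always runs first
        for name in names:
            samples[name].append(time_once(tests[name], data))
    return samples


def summarize(samples):
    """Return {name: (min_seconds, median_seconds, median/min spread)}."""
    out = {}
    for name, times in samples.items():
        best = min(times)
        mid = statistics.median(times)
        out[name] = (best, mid, mid / max(best, 1e-12))
    return out


if __name__ == "__main__":
    data = bytes(range(256)) * 4096  # 1 MiB of synthetic, very compressible data
    tests = {
        "zlib1": lambda d: zlib.compress(d, 1),
        "zlib6": lambda d: zlib.compress(d, 6),
        "zlib9": lambda d: zlib.compress(d, 9),
    }
    for name, (best, mid, spread) in summarize(run_shuffled(tests, data)).items():
        print(f"{name}: min {best * 1e3:.3f} ms, median {mid * 1e3:.3f} ms, spread {spread:.2f}x")
```

Shuffling per round means a clock that ramps up or throttles over the run gets spread across all the tests instead of always favoring whichever one ran first.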

new machine :

ect seven Oodle 2.8.6
AMD Ryzen 9 3950X (CPU locked at 3393 MHz)

ooSelkie7       :  2.19:1 ,    7.7 enc MB/s , 4205.6 dec MB/s
ooMermaid7      :  2.85:1 ,    4.3 enc MB/s , 1985.1 dec MB/s
ooKraken7       :  3.08:1 ,    3.5 enc MB/s , 1258.7 dec MB/s
ooLeviathan7    :  3.22:1 ,    2.2 enc MB/s ,  778.4 dec MB/s

zlib9           :  2.33:1 ,    9.4 enc MB/s ,  328.1 dec MB/s

old machine :

ect seven Oodle 2.8.6
Core i7 3770 (locked at 3.4 GHz)

ooSelkie7       :  2.19:1 ,    7.5 enc MB/s , 3682.9 dec MB/s
ooMermaid7      :  2.85:1 ,    3.9 enc MB/s , 1722.3 dec MB/s
ooKraken7       :  3.08:1 ,    3.0 enc MB/s , 1024.9 dec MB/s
ooLeviathan7    :  3.22:1 ,    1.9 enc MB/s ,  679.4 dec MB/s

zlib9           :  2.33:1 ,    8.0 enc MB/s ,  310.2 dec MB/s

Speeds are all single threaded, except the Oodle Optimal level encoders which use 2 threads for encoding (Jobify).

All reports on my blog before this post, where the run platform was not explicitly identified, were on the Core i7 3770. All reports in the future will be on the Ryzen.

Here's an "example_lz_chart" run on the new machine :

AMD Ryzen 9 3950X (CPU locked at 3393 MHz)
Oodle 2.8.6 example_lz_chart <file>
lz_chart loading r:\testsets\lztestset\lzt99...
file size : 24700820
------------------------------------------------------------------------------
Selkie : super fast to encode & decode, least compression
Mermaid: fast decode with better-than-zlib compression
Kraken : good compression, fast decoding, great tradeoff!
Leviathan : very high compression, slowest decode
------------------------------------------------------------------------------
chart cell shows | raw/comp ratio : encode MB/s : decode MB/s |
All compressors run at various encoder effort levels (SuperFast - Optimal).
Many repetitions are run for accurate timing.
------------------------------------------------------------------------------
       |   HyperFast4|   HyperFast3|   HyperFast2|   HyperFast1|   SuperFast |
Selkie |1.41:834:4353|1.45:742:4355|1.53:557:4112|1.68:465:4257|1.70:412:4232|
Mermaid|1.54:702:3119|1.66:535:2591|1.79:434:2450|2.01:350:2429|2.04:324:2395|
Kraken |1.55:702:2247|1.71:532:1432|1.88:421:1367|2.10:364:1399|2.27:241:1272|
------------------------------------------------------------------------------
compression ratio (raw/comp):
       |   HyperFast4|   HyperFast3|   HyperFast2|   HyperFast1|   SuperFast |
Selkie |    1.412    |    1.447    |    1.526    |    1.678    |    1.698    |
Mermaid|    1.542    |    1.660    |    1.793    |    2.011    |    2.041    |
Kraken |    1.548    |    1.711    |    1.877    |    2.103    |    2.268    |
------------------------------------------------------------------------------
encode speed (MB/s):
       |   HyperFast4|   HyperFast3|   HyperFast2|   HyperFast1|   SuperFast |
Selkie |    834.386  |    742.003  |    557.065  |    465.025  |    412.442  |
Mermaid|    701.818  |    534.711  |    433.517  |    350.444  |    324.358  |
Kraken |    701.792  |    531.799  |    420.887  |    364.245  |    240.661  |
------------------------------------------------------------------------------
decode speed (MB/s):
       |   HyperFast4|   HyperFast3|   HyperFast2|   HyperFast1|   SuperFast |
Selkie |   4352.567  |   4355.253  |   4111.801  |   4256.927  |   4231.549  |
Mermaid|   3118.633  |   2590.950  |   2449.676  |   2429.461  |   2394.976  |
Kraken |   2247.102  |   1431.774  |   1366.672  |   1399.416  |   1272.313  |
------------------------------------------------------------------------------
       |   VeryFast  |   Fast      |   Normal    |   Optimal1  |   Optimal3  |
Selkie |1.75:285:3847|1.83:127:4121|1.86: 55:4296|1.93: 10:4317|1.94:7.2:4297|
Mermaid|2.12:226:2307|2.19:115:2533|2.21: 52:2661|2.37:5.5:2320|2.44:4.2:2256|
Kraken |2.32:152:1387|2.39: 30:1483|2.44: 23:1469|2.55:9.8:1350|2.64:3.5:1292|
Leviath|2.48: 58: 899|2.56: 23: 937|2.62: 11: 968|2.71:3.9: 948|2.75:2.4: 932|
------------------------------------------------------------------------------
compression ratio (raw/comp):
       |   VeryFast  |   Fast      |   Normal    |   Optimal1  |   Optimal3  |
Selkie |    1.748    |    1.833    |    1.863    |    1.933    |    1.943    |
Mermaid|    2.118    |    2.194    |    2.207    |    2.370    |    2.439    |
Kraken |    2.320    |    2.390    |    2.435    |    2.553    |    2.640    |
Leviath|    2.479    |    2.557    |    2.616    |    2.708    |    2.749    |
------------------------------------------------------------------------------
encode speed (MB/s):
       |   VeryFast  |   Fast      |   Normal    |   Optimal1  |   Optimal3  |
Selkie |    284.979  |    127.375  |     55.468  |     10.398  |      7.168  |
Mermaid|    226.279  |    114.597  |     52.334  |      5.457  |      4.229  |
Kraken |    152.400  |     29.891  |     22.928  |      9.849  |      3.530  |
Leviath|     58.356  |     23.379  |     10.845  |      3.927  |      2.380  |
------------------------------------------------------------------------------
decode speed (MB/s):
       |   VeryFast  |   Fast      |   Normal    |   Optimal1  |   Optimal3  |
Selkie |   3846.881  |   4121.199  |   4296.318  |   4317.344  |   4297.364  |
Mermaid|   2307.345  |   2532.950  |   2660.551  |   2320.415  |   2255.556  |
Kraken |   1387.219  |   1483.488  |   1469.246  |   1350.332  |   1292.404  |
Leviath|    899.052  |    937.473  |    968.337  |    948.179  |    932.194  |
------------------------------------------------------------------------------
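
A note on reading the chart: each compact cell is the same three numbers that appear in the detail tables (ratio, encode MB/s, decode MB/s), rounded and packed together. The sketch below shows that packing. The rounding and padding rules are inferred from the rows printed above, not taken from the example_lz_chart source, and `chart_cell` is a hypothetical name.

```python
def chart_cell(ratio, enc_mb_s, dec_mb_s):
    """Pack ratio, encode MB/s and decode MB/s into one 'r.rr:eee:dddd' cell."""

    def speed(value, width):
        # Inferred rule: one decimal place below 10 MB/s, whole numbers otherwise.
        text = f"{value:.1f}" if value < 10 else f"{value:.0f}"
        return text.rjust(width)

    return f"{ratio:.2f}:{speed(enc_mb_s, 3)}:{speed(dec_mb_s, 4)}"


# Values copied from the detail tables above.
print("Kraken  Optimal3 |" + chart_cell(2.640, 3.530, 1292.404) + "|")
print("Leviath VeryFast |" + chart_cell(2.479, 58.356, 899.052) + "|")
```

Feeding in the Kraken Optimal3 and Leviathan VeryFast values from the detail tables gives back the corresponding cells of the compact chart.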