There were two major factors in the gains. One was just some more time optimizing some inner loops (including some new super-tight pathways from Fabian).
The other was more rigorous analysis of the space-speed tradeoff decisions inside Kraken. One of the fundamental things that makes Kraken work is that it considers space-speed when making its internal decisions, but before 230 those decisions were made in a rather ad-hoc way. Making those decisions better means that, even with the same decoder, the new encoder can create files that are the same size but decode faster.
The tradeoff point (technically, the Lagrange lambda, or the exchange rate from time to bytes) that Oodle uses to make space-speed decisions is exposed to the client in OodleLZ_CompressOptions, so you can adjust it to bias toward compression ratio or toward decode speed. Each compressor sets what I believe to be a reasonable default for its usage domain, so adjustments to this value should typically be small. You can't massively change behavior with it: Kraken won't start arithmetic coding things if you set the tradeoff really small, for example. There's a small window where the compressor works well, and you can just bias slightly within that window.
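To make the idea concrete, here is a minimal Python sketch of a Lagrangian space-speed decision. It assumes each candidate encoding comes with a size in bytes and an estimated decode time in cycles, and picks the one minimizing J = bytes + lambda * time. The `Candidate` type, the cost numbers, and the cycle units are all hypothetical; this is not Kraken's actual cost model, nor the OodleLZ_CompressOptions API.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One hypothetical encoding choice for a chunk of data."""
    name: str
    size_bytes: float     # compressed size this choice would produce
    decode_cycles: float  # estimated time to decode it (cycles, assumed unit)


def lagrangian_cost(c: Candidate, lam: float) -> float:
    # J = bytes + lambda * time. Lambda is the exchange rate from time to bytes,
    # so a larger lambda makes decode time "cost" more.
    return c.size_bytes + lam * c.decode_cycles


def choose(candidates: list, lam: float) -> Candidate:
    # Pick the candidate with the lowest combined cost.
    return min(candidates, key=lambda c: lagrangian_cost(c, lam))


# Made-up numbers: entropy-coded literals are smaller but slower to decode.
candidates = [
    Candidate("huffman", size_bytes=1000, decode_cycles=3000),
    Candidate("raw", size_bytes=1200, decode_cycles=500),
]

for lam in (0.0, 0.05, 0.1):
    print(lam, choose(candidates, lam).name)
```

With these made-up costs, lambda = 0.05 still favors the smaller Huffman encoding (1150 vs 1225), while lambda = 0.1 flips the choice to raw literals (1300 vs 1250). Nudging lambda only moves decisions that are already close, which matches the point that the tradeoff biases behavior within a small window rather than switching to a different class of coder.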
Some dry numbers for reference :
On PS4 :
Oodle 230 Kraken -zl4 : 24,700,820 ->10,377,556 = 3.361 bpb = 2.380 to 1
decode only : 65.547 millis, 4.23 c/b, rate= 376.84 mb/s
Oodle 230 Kraken -zl6 : 24,700,820 -> 9,970,882 = 3.229 bpb = 2.477 to 1
decode : 63.453 millis, 4.09 c/b, rate= 389.28 mb/s
Oodle 230 Kraken -zl7 : 24,700,820 -> 9,734,771 = 3.153 bpb = 2.537 to 1
decode : 67.915 millis, 4.38 c/b, rate= 363.70 mb/s
Oodle 220 Kraken -zl4 : 24,700,820 ->10,326,584 = 3.345 bpb = 2.392 to 1
decode only : 0.073 seconds, 211.30 b/kc, rate= 336.76 mb/s
Oodle 220 Kraken -zl6 : 24,700,820 ->10,011,486 = 3.242 bpb = 2.467 to 1
decode : 0.074 seconds, 208.83 b/kc, rate= 332.82 mb/s
Oodle 220 Kraken -zl7 : 24,700,820 -> 9,773,112 = 3.165 bpb = 2.527 to 1
decode : 0.079 seconds, 196.70 b/kc, rate= 313.49 mb/s
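The columns in these logs are related by simple arithmetic. The following Python sketch recomputes them from the raw size, compressed size, and decode time. Two things are inferred from the numbers rather than stated in the post: "mb/s" works out to 10^6 uncompressed bytes per second, and c/b is taken as clock rate times decode time divided by raw bytes, so the clock frequency is an assumed input (the post gives 3.4 GHz for the Win64 machine). The function names are hypothetical.

```python
def bits_per_byte(raw_bytes: int, comp_bytes: int) -> float:
    # Compressed bits spent per original byte ("bpb").
    return 8.0 * comp_bytes / raw_bytes


def ratio(raw_bytes: int, comp_bytes: int) -> float:
    # Compression ratio ("N to 1").
    return raw_bytes / comp_bytes


def rate_mb_s(raw_bytes: int, seconds: float) -> float:
    # Decode throughput of uncompressed bytes, with mb = 10^6 bytes (assumed).
    return raw_bytes / seconds / 1e6


def cycles_per_byte(raw_bytes: int, seconds: float, clock_hz: float) -> float:
    # "c/b": CPU cycles per uncompressed byte, given an assumed clock rate.
    return clock_hz * seconds / raw_bytes


def bytes_per_kilocycle(cpb: float) -> float:
    # "b/kc", the unit in the 2.2.0 logs, is 1000 divided by c/b.
    return 1000.0 / cpb


# PS4, Oodle 230 Kraken -zl4: 24,700,820 -> 10,377,556, decoded in 65.547 ms.
raw, comp, t = 24_700_820, 10_377_556, 0.065547
print(f"{bits_per_byte(raw, comp):.3f} bpb, {ratio(raw, comp):.3f} to 1, "
      f"{rate_mb_s(raw, t):.2f} mb/s")

# Win64 Silesia total at an assumed 3.4 GHz: 211,938,580 bytes in 210.685 ms.
cpb = cycles_per_byte(211_938_580, 0.210685, 3.4e9)
print(f"{cpb:.2f} c/b = {bytes_per_kilocycle(cpb):.1f} b/kc")
```

Logged c/b values may differ from this in the last digit, since the actual clock during a run need not match the nominal one.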
On Win64 (Core i7-3770 3.4 GHz) :
Oodle 2.3.0 :
Silesia Kraken -z6
total : 211,938,580 ->51,918,269 = 1.960 bpb = 4.082 to 1
decode : 210.685 millis, 3.38 c/b, rate= 1005.95 mb/s
Weissman 1-256 : [8.575]
mozilla : 51,220,480 ->14,410,181 = 2.251 bpb = 3.554 to 1
decode only : 51.280 millis, 3.41 c/b, rate= 998.83 mb/s
lzt99 : 24,700,820 -> 9,970,882 = 3.229 bpb = 2.477 to 1
decode only : 20.943 millis, 2.89 c/b, rate= 1179.44 mb/s
win81 : 104,857,600 ->38,222,311 = 2.916 bpb = 2.743 to 1
decode only : 108.344 millis, 3.52 c/b, rate= 967.82 mb/s
Oodle 2.2.0 :
Silesia Kraken -z6
total : 211,938,580 ->51,857,427 = 1.957 bpb = 4.087 to 1
decode : 0.232 seconds, 268.43 b/kc, rate= 913.46 M/s
Weissman 1-256 : [8.431]
"silesia_mozilla"
Kraken 230 : 3.55:1 , 998.8 dec mb/s
Kraken 220 : 3.60:1 , 896.5 dec mb/s
Kraken 215 : 3.51:1 , 928.0 dec mb/s
"lzt99"
Kraken 230 : 2.48:1 , 1179.4 dec mb/s
Kraken 220 : 2.53:1 , 912.0 dec mb/s
Kraken 215 : 2.46:1 , 957.1 dec mb/s
"win81"
Kraken 230 : 2.74:1 , 967.8 dec mb/s
Kraken 220 : 2.77:1 , 818.0 dec mb/s
Kraken 215 : 2.70:1 , 877.0 dec mb/s
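The decode-speed gains of 2.3.0 over 2.2.0 in this summary can be computed directly. The small Python sketch below copies the values from the numbers above, taking the lzt99 2.3.0 speed from the detailed Win64 run (1179.44 mb/s). Note that 2.3.0's compression ratios on these files are slightly lower than 2.2.0's, so this is a speed comparison only.

```python
# Decode speeds in mb/s: (Oodle 2.3.0, Oodle 2.2.0).
decode_mb_s = {
    "silesia_mozilla": (998.8, 896.5),
    "lzt99": (1179.44, 912.0),  # 2.3.0 value from the detailed Win64 run
    "win81": (967.8, 818.0),
}


def speedup_percent(new: float, old: float) -> float:
    # Relative decode-speed gain of the new version over the old one.
    return (new / old - 1.0) * 100.0


for name, (v230, v220) in decode_mb_s.items():
    print(f"{name}: {speedup_percent(v230, v220):+.1f}% decode speed")
```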
NOTE : Oodle 2.3.0 Kraken data cannot be read by Oodle 2.2.0 or earlier. Oodle 230 can load all old Oodle data (new versions of Oodle can always load all data created by older versions). If you need to use Oodle 230 to make data that can be loaded by an older version, you can set the minimum decoder version to something lower (by default it's the current version). Contact Oodle support for details.
Some of the biggest gains were found on ARM, which I'll post about more in the future.