cbloom rants: 03/2018

3/26/2018

Compression of Android Game APK Packages with Oodle

This is a look at compression of Android Game APK Packages. In this study I'm mainly looking at the issue of transmission of the APK to the client, not storage on the client and decompression at runtime. The key difference between the two is that for transmission over the network, you want to compress the package as a "tar", that is without division into files, so that the compressor can use cross-file correlation. For storage on disk and runtime loading, you might want to store files individually (or perhaps combined and/or split into paging units), and you might want some files uncompressed.

The Android APK package is just a zip (thanks to them for just using zip and not changing the header so that it can be easily manipulated with standard tools).

I chose the list of games from this article : Google's instant app tech now lets you try games before you buy which is : Clash Royale, Words With Friends 2, Bubble Witch 3 Saga, Final Fantasy XV: A New Empire, Mighty Battles and -- of course -- Solitaire

I discovered that "Mighty Battles" internally contains a large pre-compressed pak file. (it's named "content.mp3" but is not really an mp3, it's some sort of compressed archive. They use the mp3 extension to get the APK package system to store it without further zlib compression.) Because of that I exluded Might Battles from the test; it would be about the same size with every compressor, and is not reflective of how it should be compressed (their internal package should be uncompressed if we're testing how well the outer compressor does). Later I also saw that "Clash Royale" is also mostly pre-compressed content. Clash Royale has its content in ".sc" files that are opaque compressed data. I left it in the test set, but it should also have those files uncompressed for real use with an outer compressor. I wasn't sure which Solitaire to test; I chose the one by Zynga.

The "tar" is made by unpacking the APK zip and concatenating all the files together. I also applied PNGz0 to turn off zlib compression on any PNGs. I then tested various compressors on the game tars.

	original	tar	zlib	Leviathan
BubbleWitch3	78,032,875	304,736,621	67,311,666	54,443,823
ClashRoyale	101,702,690	124,031,098	98,386,824	93,026,161
FinalFantasyXV	58,933,554	144,668,500	57,104,802	41,093,459
Solitaire	14,814,888	139,177,140	14,071,999	8,337,863
WordsWithFriends2	78,992,339	570,621,614	78,784,623	53,413,494
total	332,476,346	1,283,234,973	315,659,914	250,314,800

original  = size of the source APK (per-file zip with some files stored uncompressed)
tar       = unzipped files, with PNGz0, concatenated together
zlib      = zip -9 applied to the tar ; slightly smaller than original
Leviathan = Oodle Leviathan level 8 (Optimal4) applied to the tar

You can see that Clash Royale doesn't change much because it contains large amounts of pre-compressed data internally. The other games all get much smaller with Leviathan on a tar (relative to the original APK, or zlib on the tar). eg. BubbleWitch3 was 78 MB, Leviathan can send it in 54.4 MB ; Solitaire can be sent in almost half the size.

Leviathan is very fast to decode on ARM. Despite getting much more compression than zlib, it is faster to decode. More modern competitors (ZStd, brotli, LZMA) are also slower to decode than Leviathan on ARM, and get less compression.

For reference, here is the performance on this test set of a few compressors (speeds on Windows x64 Core i7-3770) :

Note that some of the wins here are not accessible to game developers. When a mobile game developer uses Oodle on Android, they can apply Oodle to their own content and get the size and time savings there. But they can't apply Oodle to their executable or Java files. The smaller they reduce their content, the larger the proportion of their APK becomes that is made up of files they can't compress. To compress all the content in the APK (both system and client files, as well as cross-file tar compression) requires support from the OS or transport layer.

I'll also take this chance to remind clients that when using Oodle, you should always try to turn off any previous compression on your data. For example, here we didn't just try the compressors directly on the APK files (which are zip archives and have previous zlib compression), we first unpacked them. We then further took the zlib compression off the PNG's so that the outer compressors in the test could have a chance to compress that data better. The internal compressors used on Clash Royale and Mighty Battles should also have been ideally turned off to maximize compression. On the other hand, turning off previous compression does not apply to data-specific lossy compressors such as audio, image, and video codecs. That type of data should be passed through with no further compression.

3/25/2018

Improving the compression of block-compressed textures Revisited

ADD: the best solution for this now is our new product Oodle Texture

I'm leaving this post for historical reference, but it is now obsolete and you should look at the Oodle Texture results instead.

The Oodle LZ compressors are especially good on binary data, such as the block compressed textures used in games.

(by "block compressed textures" I mean BCn, ETC1, ETC2, etc. textures in fixed size blocks for use with GPU's. I do *not* mean already compressed textures such as JPEG, PNG, or BCn that has already been compressed with crunch. You should not be applying Oodle or any other generic compressor on top of already compressed textures of that type. If you have a lot of PNG data consider PNG without ZLib or look for the upcoming Oodle Lossless Image codec.)

See the Appendix at the bottom for a comparison of modern LZ compressors on BCn data. Oodle LZ gets more compression and/or much faster decode speeds on BCn data.

So you can certainly just create your texture as usual (at maximum quality) and compress it with Oodle. That's fine and gives you the best visual quality.

If you need your texture data to be smaller for some reason, you can use a data-specific lossy compressor like crunch (or Basis), or you could use RDO texture creation followed by Oodle LZ compression.

(I've written about this before, here : Improving the compression of block-compressed textures , but I'm trying to do a rather cleaner more thorough job this time).

RDO texture creation is a modification of the step that creates the block compressed texture (BCn or whatever) from the original (RGBA32 or whatever). Instead of simply choosing the compressed texture blocks that minimize error, blocks are chosen to minimize rate + distortion. That is, sometimes larger error is intentionally chosen when it improves rate. In this case, we want to minimize the rate *after* a following LZ compressor. The block compressed textures always have the same size, but some choices are more compressible than others. The basic idea is to choose blocks that have some relation to preceding blocks, thereby making them more compressible. Common examples are trying to reuse selector bits, or to choose endpoints that match neighbors.

RDO encoding of block compressed textures should always be done from the original non-compressed version of the texture, *not* from a previous block compressed encoding. eg. don't take something already in BC1 and try to run RDO to shrink it further. Doing that would cause the errors to add up, a bit like taking a JPEG and lowering it's "quality" setting to make it smaller - that should always be done from original data.

Now, block compressed textures are already lossy. BC1 is quite bad; BC7 and ASTC less so. So adding more error may not be acceptable at all. If large amounts of error are acceptable in your texture, you may not ever be seeing the largest mip levels. Sending mip levels that are too large and never visible is a *far* larger waste of size than anything we do here, so it's important to have a process in your game to find those textures and shrink them.

The best tool I know of at the moment to do RDO texture creation is crunch by Rich Geldreich / Binomial. I'm told that their newer Basis product has an improved RDO-for-LZ but I don't have a copy to test. What I actually run is Unity's improvement to crunch. The way you use it is something like :

crunch_x64_unity.exe -DXT1 -fileformat dds -file input.png -maxmips 1 -quality 200 -out output.dds

That is, tell it to make fileformat DDS, it will do normal block texture compression, but with rate-favoring decisions.

NOTE : we're talking about lossy compression here, which is always a little subtle to investigate because there are two axes of performance : both size and quality. Furthermore "quality" is hard to measure well, and there is no substitute for human eyes examining the images to decide what level of loss is acceptable. Here I am reporting "imdiff" scores with my "combo" metric. The "imdiff" scores are not like an RMSE; they're roughly on a scale of 0-100 where 0 = no difference and 100 = complete garbage, like a percent difference (though not really).

Some results :


act3cracked_colour
1024x1024

non-RDO fast BC1 : 524,416 bytes
then Leviathan   : -> 416,981
imdiff     : 33.26

crunch RDO quality 200 , then Leviathan : -> 354,203
imdiff     : 36.15

file size 85% of non-RDO
error 109% of non-RDO

adventurer_colour 
1024x1024

non-RDO fast BC1 : 524,416 bytes
then Leviathan   : -> 409,874
imdiff     : 32.96

crunch RDO quality 200 , then Leviathan : -> 334,342
imdiff     : 33.48

file size 81% of non-RDO
error 102% of non-RDO

Personally I like crunch's RDO DDS at these high quality levels, 200 or above. It introduces relatively little error and the file size savings are still significant.

At lower quality levels use of crunch can be problematic in practice. Unfortunately it's hard to control how much error it introduces. You either have to manually inspect textures for damage, or run an external process to measure quality and feed that back into the crunch settings. Another problem is that crunch's quality setting doesn't scale with texture size; smaller size textures get less error and larger size textures get more error at the same "quality" setting, which means you need to choose a quality setting per texture size. (I think the issue is that crunch's codebook size doesn't scale with texture size, which makes it particularly bad for textures at 2048x2048 or above, or for large texture atlases).

Your other option besides doing RDO texture creation followed by LZ is to just use crunch's native "crn" format for textures.

Let's compare RDO+LZ vs crn for size. I will do this by dialing the quality setting until they get the same imdiff "combo" score, so we are comparing a line of equal distortion (under one metric).


act3cracked_colour
1024x1024

crunch crn 255 : -> 211,465
imdiff     : 42.33

crunch rdo dds 95 : -> 264,206
imdiff     : 42.36


adventurer_colour
1024x1024

crunch crn 255 : -> 197,644
imdiff     : 38.48

crunch rdo dds 101 : -> 244,402
imdiff     : 38.67

The native "crn" format is about 20% smaller than RDO + LZ on both of these textures. It is to be expected that custom compressors, well designed for one type of data, should beat general purpose compressors. Note that comparing "crn" sizes to just doing BCn + LZ (without RDO) is not a valid comparison, since they are at different error levels.

If you look at the quality settings, the "crn" mode at maximum quality is still introducing a lot of error. That "quality" setting is not on the same scale for crn mode and dds mode. Maximum quality (255) in crn mode is roughly equivalent to quality = 100 in dds mode. Unfortunately there seems to be no way to get higher quality in the crn mode.

This has been an attempt to provide some facts to help you make a good choice. There are three options : BCn (without RDO) + LZ , RDO BCn + LZ, or a custom compresssed texture format like CRN. They have different pros and cons and the right choice depends on your app and pipeline.

Now we haven't looked at decode speed in this comparison. I've never measured crunch's decode speed (of CRN format), but I suspect that Oodle's LZ decoders are significantly faster. Another possible speed advantage for Oodle LZ is that you can store your BCn data pre-swizzled for the hardware, which may let you avoid more CPU work. I should also note that you should never LZ decompress directly into uncached graphics memory. You either need to copy it over after decoding (which is very fast and recommended) or start the memory as cached for LZ decoding and then change it to uncached GPU memory after the decode is done.

Appendix : Performance of LZ compressors on BCn data

Comparison of Oodle to some other compressors on samples of game texture data.

Repeating the "Game BCn" test from Oodle 2.6.0 : Leviathan detailed performance report : A mix of BCn textures from a game (mostly BC1, BC4, and BC7) :

"Game BCn" :

lzma 16.04 -9           : 3.692 to 1 :   64.85 MB/s
brotli24 2017-12-12 -11 : 3.380 to 1 :  237.78 MB/s
zlib 1.2.11 -9          : 2.720 to 1 :  282.78 MB/s
zstd 1.3.3 -22          : 3.170 to 1 :  485.97 MB/s

Kraken8                 : 3.673 to 1 :  880.99 MB/s
Leviathan8              : 3.844 to 1 :  661.93 MB/s

A different set : "test_data\image\dds" is mostly BC1 with some BC3 and some RGBA32

test_data\image\dds :

lzma 16.04 -9           : 2.354 to 1 :   39.53 MB/s
brotli24 2017-12-12 -11 : 2.161 to 1 :  161.40 MB/s
zlib 1.2.11 -9          : 1.894 to 1 :  222.70 MB/s
zstd 1.3.3 -22          : 2.066 to 1 :  443.96 MB/s

Kraken8                 : 2.320 to 1 :  779.84 MB/s
Leviathan8              : 2.386 to 1 :  540.90 MB/s

(note this is lzma with default settings; lzma with settings tweaked for BCn can sometimes get more compression than Leviathan)

While brotli and ZStd are competitive with Kraken's compression ratio on text (and text-like) files, they lag behind on many types of binary data, such as these block compressed textures.

3/19/2018

Oodle Data Compression Integration for Unreal Engine 4

We have created an integration of Oodle Data Compression for Unreal Engine 4. You can now use Oodle's amazing Kraken, Leviathan, Mermaid and Selkie to compress your Unreal game data smaller and load it faster.

Once installed in your source tree, the Oodle Data Compression integration is transparent. Simply create compressed pak files the way you usually do, and instead of compressing with zlib they will be compressed with Oodle. At runtime, the engine automatically decodes with Oodle or zlib as specified in the pak.

Oodle can compress game data much smaller than zlib. Oodle also decodes faster than zlib. With less data to load from disk and faster decompression, you speed up data loading in two ways.

For example, on the ShooterGame sample game, the pak file sizes are :


ShooterGame-WindowsNoEditor.pak

uncompressed :       1,131,939,544 bytes

Unreal default zlib :  417,951,648   2.708 : 1

Oodle Kraken 4 :       372,845,225   3.036 : 1

Oodle Leviathan 8 :    330,963,205   3.420 : 1

Oodle Leviathan makes the ShooterGame pak file 87 MB smaller than the Unreal default zlib compression!

NOTE this is new and separate from the Oodle Network integration which has been in Unreal for some time. Oodle Network provides compression of network packets to reduce bandwidth in networked games.

The Oodle Data Compression integration is available for Unreal 4.18 and 4.19.

Oodle is an SDK for high performance lossless data compression. For more about Oodle, or licensing inquiries, visit the RAD Game Tools web site. This is my personal blog where I post supplemental material about Oodle.

(these sizes are from the actual Oodle integration in UE4, taking the ShooterGame content and exporting paks with different compression options)

Update June 19 2019 :

Here's another example on a large pak from Fortnite :


FortniteGame-WindowsClient.pak

uncompressed :   14,150,876,787 bytes

zlib 9       :    6,601,033,750 bytes

Oodle Mermaid :   5,878,738,118 bytes

Oodle Leviathan : 5,461,767,960 bytes

zlib and Oodle are both run with 256 KB chunks. Leviathan decodes roughly 3X faster than zlib.

On my Core(TM) i7-8750H zlib is roughly 300 MB/s to decode, Leviathan is 850 MB/s, Oodle Kraken is 1.5 GB/s , Mermaid and Selkie are even faster.

The special thing about Oodle Leviathan is that it gets very high compression levels while still being super fast to decode. Kraken, Mermaid and Selkie are even faster and give you options for different platforms and performance targets.

Note this test on the Fortnite pak was done just by compressing the uncompressed pak that comes with the game, applying 256 KB chunking. This is not quite the same as what you'd get from applying the compression in the Unreal pak file system, because there would also be chunking at the resource level.

3/06/2018

Oodle 2.6.0 : Some more perf comparisons

Here are some more perf comparisons with Oodle 2.6.0, this time with more non-Oodle compressors for reference.

As before, this is Windows x64 on a Core i7-3770, and we're looking at compression ratio vs. decode speed, not considering encode speed, and running all compressors in their highest compression level.

On the "seven" testset, compressing each file independently, then summing decode time and size for each file :

The loglog chart shows log compression ratio on the Y axis and log decode speed on the X axis (the numeric labels show the pre-log values). The upper right is the Pareto frontier.

The raw numbers are :

total      : lzma 16.04 -9           : 3.186 to 1 :   52.84 MB/s
total      : lzham 1.0 -d26 -4       : 2.932 to 1 :  149.57 MB/s
total      : brotli24 2017-12-12 -11 : 2.958 to 1 :  203.91 MB/s
total      : zlib 1.2.11 -9          : 2.336 to 1 :  271.61 MB/s
total      : zstd 1.3.3 -22          : 2.750 to 1 :  474.47 MB/s
total      : lz4hc 1.8.0 -12         : 2.006 to 1 : 2786.78 MB/s
total      : Leviathan8              : 3.251 to 1 :  675.92 MB/s
total      : Kraken8                 : 3.097 to 1 :  983.74 MB/s
total      : Mermaid8                : 2.846 to 1 : 1713.53 MB/s
total      : Selkie8                 : 2.193 to 1 : 3682.88 MB/s

Isolating just Kraken, Leviathan, and ZStd on the same test (ZStd is the closest non-Oodle codec), we can look at file-by-file performance :

The loglog shows each file with a different symbol, colored by the compressor.

The speed advantage of Oodle is pretty uniform, but the compression advantage varies by file type. Some files simply have more air that Oodle can squeeze out beyond what other LZ's find. For example if you only looked at enwik7 (xml/text), then all of the modern LZ's considered here (ZStd,Oodle,Brotli,LZHAM) would get almost exactly the same compression; there's just not a lot of room for them to differentiate themselves on text.

Runs on a couple other standard files :

On the Silesia/Mozilla file :

mozilla    : lzma 16.04 -9           : 3.832 to 1 :   63.79 MB/s
mozilla    : lzham 1.0 -d26 -4       : 3.570 to 1 :  168.96 MB/s
mozilla    : brotli24 2017-12-12 -11 : 3.601 to 1 :  246.68 MB/s
mozilla    : zlib 1.2.11 -9          : 2.690 to 1 :  275.40 MB/s
mozilla    : zstd 1.3.3 -22          : 3.365 to 1 :  503.44 MB/s
mozilla    : lz4hc 1.8.0 -12         : 2.327 to 1 : 2509.82 MB/s
mozilla    : Leviathan8              : 3.831 to 1 :  691.37 MB/s
mozilla    : Kraken8                 : 3.740 to 1 :  985.35 MB/s
mozilla    : Mermaid8                : 3.335 to 1 : 1834.49 MB/s
mozilla    : Selkie8                 : 2.793 to 1 : 3145.63 MB/s

On win81 :

win81      : lzma 16.04 -9           : 2.922 to 1 :   51.87 MB/s
win81      : lzham 1.0 -d26 -4       : 2.774 to 1 :  156.44 MB/s
win81      : brotli24 2017-12-12 -11 : 2.815 to 1 :  214.02 MB/s
win81      : zlib 1.2.11 -9          : 2.207 to 1 :  253.68 MB/s
win81      : zstd 1.3.3 -22          : 2.702 to 1 :  472.39 MB/s
win81      : lz4hc 1.8.0 -12         : 1.923 to 1 : 2408.91 MB/s
win81      : Leviathan8              : 2.959 to 1 :  757.36 MB/s
win81      : Kraken8                 : 2.860 to 1 :  949.07 MB/s
win81      : Mermaid8                : 2.618 to 1 : 1847.77 MB/s
win81      : Selkie8                 : 2.142 to 1 : 3467.36 MB/s

And again on the "seven" testset, but this time with the files cut into 32 kB chunks :

total      : lzma 16.04 -9           : 2.656 to 1 :   43.25 MB/s
total      : lzham 1.0 -d26 -4       : 2.435 to 1 :   76.36 MB/s
total      : brotli24 2017-12-12 -11 : 2.581 to 1 :  178.25 MB/s
total      : zlib 1.2.11 -9          : 2.259 to 1 :  255.18 MB/s
total      : zstd 1.3.3 -22          : 2.363 to 1 :  442.42 MB/s
total      : lz4hc 1.8.0 -12         : 1.849 to 1 : 2717.30 MB/s
total      : Leviathan8              : 2.731 to 1 :  650.23 MB/s
total      : Kraken8                 : 2.615 to 1 :  975.69 MB/s
total      : Mermaid8                : 2.455 to 1 : 1625.69 MB/s
total      : Selkie8                 : 1.910 to 1 : 4097.27 MB/s

By cutting into 32 kB chunks, we remove the window size disadvantage suffered by zlib and LZ4. Now all the codecs have the same match window, and the compression difference only comes from what additional features they provide. The small chunk also stresses startup overhead time and adaptation speed.

The Oodle codecs generally do even better (vs the competition) on small chunks than they do on large files. For example LZMA and LZHAM both have large models that really need a lot of data to get up to speed. All the non-Oodle codecs slow down more on small chunks than the Oodle codecs do.

Read more about Leviathan and Oodle 2.6.0 in these other posts on my blog :

Leviathan Rising
Everything new and tasty in Oodle 2.6.0
Leviathan performance on PS4, Xbox One, and Switch
Leviathan detailed performance report
Oodle Hydra and space-speed flexibility

or visit RAD to read for more information about the Oodle SDK

Oodle 2.6.0 : Hydra and space-speed flexibility

One of the unique things about Oodle is the fact that its compressors are optimizing for a space-speed goal (not just for size), and the user has control over how that combined score is priced.

This is a crucial aspect of Leviathan. Oodle Leviathan considers many options for the encoded data, it rates those options for space-speed and chooses the bit stream that optimizes that goal. This means that Leviathan can consider slower-to-decode bit stream options, and only use them when they provide enough of a benefit that they are worth it.

That is, Leviathan decompression speed varies a bit depending on the file, as all codecs do. However, other codecs will sometimes get slower for no particular benefit. They may choose a slower mode, or simply take a lot more slow encoding options (short matches, or frequent literal-match switches), even if isn't a big benefit to file size. Leviathan only chooses slower encoding modes when they provide a benefit to file size that meets the goal the user has set for space-speed tradeoff.

Each of the new Oodle codecs (Leviathan, Kraken, Mermaid & Selkie) has a different default space-speed goal, which we set to be near their "natural lambda", the place that their fundamental structure works best. Clients can dial this up or down to bias their decisions more for size or speed.

The flexibility of these Oodle codecs allows them to cover a nearly continuous range of compression ratio vs decode speed points.

Here's an example of the Oodle codecs targeting a range of space-speed goals on the file "TC_Foreground_Arc.clb" (Windows x64):

Now you may notice that at the highest compression setting of Kraken (-zs64) it is strictly worse than the fastest setting of Leviathan (-zs1024). If you want that speed-compression tradeoff point, then using Kraken is strictly wrong - you should switch to Leviathan there.

That's what Oodle Hydra does for you. Hydra (the many headed beast) is a meta compressor which chooses between the other Oodle compressors to make the best space-speed encoding. Hydra does not just choose per file, but per block, which means it can do finer grain switching to find superior encodings.

Oodle Hydra on the file "TC_Foreground_Arc.clb" (Windows x64):

When using Hydra you don't explicitly choose the encoder at all. You set your space-speed goal and you trust in Oodle to make the choice for you. It may use Leviathan, Kraken, or Mermaid, so you may get faster or slower decoding on any given chunk, but you do know that when it chooses a slower decoder it was worth it. Hydra also sometimes gets free wins; if you wanted high compression so you would've gone with Leviathan, there are cases where Kraken compresses nearly the same (or even does better), and switching down to Kraken is just a decode speed win for free (no compression ratio sacrificed).

Of course the disadvantage of Hydra is slow encoding, because it has to consider even more encoding options than Oodle already does. It is ideal for distribution scenarios where you encode once and decode many times.

Another way we can demonstrate Oodle's space-speed encoding is to disable it.

We can run Oodle in "max compression only" mode by setting the price of time to zero. That is, when it scores a decision for space-speed, we consider only speed. (I never actually set the price of time to exactly zero; it's better to just make it very small so that ties in size are still broken in favor of speed; specifically we will set spaceSpeedTradeoffBytes = 1).

Here's Oodle Leviathan on the Silesia standard test corpus :

Leviathan default space-speed score (spaceSpeedTradeoffBytes = 256) :

total   : 211,938,580 ->48,735,197 =  1.840 bpb =  4.349 to 1
decode  : 264.501 millis, 4.25 c/B, rate= 801.28 MB/s

Leviathan max compress (spaceSpeedTradeoffBytes = 1) :

total   : 211,938,580 ->48,592,540 =  1.834 bpb =  4.362 to 1
decode  : 295.671 millis, 4.75 c/B, rate= 716.81 MB/s

By ignoring speed in decisions, we've gained 140 kB , but have lost 30 milliseconds in decode time.

Perhaps a better way to look at it is the other way around - by making good space-speed decisions, the default Leviathan setting saves 0.50 cycles per byte in decode time, at a cost of only 0.006 bits per byte of compressed size.