Compression of Android Game APK Packages with Oodle

This is a look at compression of Android Game APK Packages. In this study I'm mainly looking at the issue of transmission of the APK to the client, not storage on the client and decompression at runtime. The key difference between the two is that for transmission over the network, you want to compress the package as a "tar", that is without division into files, so that the compressor can use cross-file correlation. For storage on disk and runtime loading, you might want to store files individually (or perhaps combined and/or split into paging units), and you might want some files uncompressed.

The Android APK package is just a zip (thanks to them for just using zip and not changing the header so that it can be easily manipulated with standard tools).

I chose the list of games from this article : "Google's instant app tech now lets you try games before you buy", which names : Clash Royale, Words With Friends 2, Bubble Witch 3 Saga, Final Fantasy XV: A New Empire, Mighty Battles and -- of course -- Solitaire

I discovered that "Mighty Battles" internally contains a large pre-compressed pak file. (It's named "content.mp3" but is not really an mp3; it's some sort of compressed archive. They use the mp3 extension to get the APK package system to store it without further zlib compression.) Because of that I excluded Mighty Battles from the test; it would be about the same size with every compressor, and is not reflective of how it should be compressed (their internal package should be uncompressed if we're testing how well the outer compressor does). Later I also saw that "Clash Royale" is mostly pre-compressed content as well. Clash Royale has its content in ".sc" files that are opaque compressed data. I left it in the test set, but it should also have those files uncompressed for real use with an outer compressor. I wasn't sure which Solitaire to test; I chose the one by Zynga.

The "tar" is made by unpacking the APK zip and concatenating all the files together. I also applied PNGz0 to turn off zlib compression on any PNGs. I then tested various compressors on the game tars.
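As a rough illustration of the "tar" step described above, here is a minimal Python sketch. It is not the tooling used for the numbers in this post: the archive is a synthetic stand-in built in memory, the helper names (`apk_to_tar_bytes`, `make_demo_apk`) are made up for the example, and zlib and lzma from the Python standard library stand in for the outer compressor because Oodle has no standard Python binding.

```python
import io
import lzma
import zipfile
import zlib

def apk_to_tar_bytes(apk_bytes: bytes) -> bytes:
    """Unpack every member of a zip/APK and concatenate the raw file contents."""
    out = bytearray()
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            out += zf.read(info)  # read() returns the decompressed member
    return bytes(out)

def make_demo_apk() -> bytes:
    """Build a small synthetic zip with many similar tiny files (hypothetical data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for i in range(200):
            zf.writestr(f"assets/item_{i}.json",
                        f'{{"id": {i}, "name": "item", "tags": ["a", "b"]}}')
    return buf.getvalue()

apk = make_demo_apk()
tar = apk_to_tar_bytes(apk)
whole_zlib = zlib.compress(tar, 9)   # one stream over everything
whole_lzma = lzma.compress(tar)      # a stronger stand-in compressor

print("per-file zip :", len(apk))
print("tar          :", len(tar))
print("zlib on tar  :", len(whole_zlib))
print("lzma on tar  :", len(whole_lzma))
```

On data like this, made of many tiny similar files, the single stream over the concatenation should come out far smaller than the per-file zip, because the compressor can reuse matches across file boundaries and the per-file headers disappear.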

                         original            tar           zlib      Leviathan
BubbleWitch3           78,032,875    304,736,621     67,311,666     54,443,823
ClashRoyale           101,702,690    124,031,098     98,386,824     93,026,161
FinalFantasyXV         58,933,554    144,668,500     57,104,802     41,093,459
Solitaire              14,814,888    139,177,140     14,071,999      8,337,863
WordsWithFriends2      78,992,339    570,621,614     78,784,623     53,413,494
total                 332,476,346  1,283,234,973    315,659,914    250,314,800

All sizes are in bytes.
original  = size of the source APK (per-file zip with some files stored uncompressed)
tar       = unzipped files, with PNGz0, concatenated together
zlib      = zip -9 applied to the tar ; slightly smaller than original
Leviathan = Oodle Leviathan level 8 (Optimal4) applied to the tar

You can see that Clash Royale doesn't change much because it contains large amounts of pre-compressed data internally. The other games all get much smaller with Leviathan on a tar (relative to the original APK, or zlib on the tar). e.g. BubbleWitch3 was 78 MB and Leviathan can send it in 54.4 MB ; Solitaire drops from 14.8 MB to 8.3 MB, a little over half the original size.
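To make the comparison concrete, the ratios can be recomputed from the table. This is only arithmetic on the "original" and "Leviathan" columns above; the dictionary layout and variable names are just for the example.

```python
# (original APK bytes, Leviathan-on-tar bytes), copied from the table above
sizes = {
    "BubbleWitch3":      (78_032_875, 54_443_823),
    "ClashRoyale":       (101_702_690, 93_026_161),
    "FinalFantasyXV":    (58_933_554, 41_093_459),
    "Solitaire":         (14_814_888, 8_337_863),
    "WordsWithFriends2": (78_992_339, 53_413_494),
}

# Fraction of the original APK size that Leviathan on the tar needs
ratio = {name: lev / orig for name, (orig, lev) in sizes.items()}

total_orig = sum(orig for orig, _ in sizes.values())
total_lev = sum(lev for _, lev in sizes.values())

for name, r in ratio.items():
    print(f"{name:18s} {r:.3f}")
print(f"{'total':18s} {total_lev / total_orig:.3f}")
```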

Leviathan is very fast to decode on ARM. Despite getting much more compression than zlib, it is faster to decode. More modern competitors (ZStd, brotli, LZMA) are also slower to decode than Leviathan on ARM, and get less compression.

For reference, here is the performance on this test set of a few compressors (speeds on Windows x64 Core i7-3770) :

Note that some of the wins here are not accessible to game developers. When a mobile game developer uses Oodle on Android, they can apply Oodle to their own content and get the size and time savings there. But they can't apply Oodle to their executable or Java files. The smaller they reduce their content, the larger the proportion of their APK becomes that is made up of files they can't compress. To compress all the content in the APK (both system and client files, as well as cross-file tar compression) requires support from the OS or transport layer.

I'll also take this chance to remind clients that when using Oodle, you should always try to turn off any previous compression on your data. For example, here we didn't just try the compressors directly on the APK files (which are zip archives and have previous zlib compression); we first unpacked them. We then further took the zlib compression off the PNGs so that the outer compressors in the test could have a chance to compress that data better. The internal compressors used on Clash Royale and Mighty Battles should ideally also have been turned off to maximize compression. On the other hand, turning off previous compression does not apply to data-specific lossy compressors such as audio, image, and video codecs. That type of data should be passed through with no further compression.
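For readers who want to see what "taking the zlib compression off a PNG" can look like, here is a minimal sketch. It assumes the step amounts to re-encoding the PNG's IDAT stream with deflate level 0 (stored blocks); it is not the PNGz0 tool itself, the function names are made up for the example, and it ignores APNG (fdAT chunks) and compressed text chunks.

```python
import struct
import zlib

PNG_SIG = b"\x89PNG\r\n\x1a\n"

def iter_chunks(png: bytes):
    """Yield (type, data) for each chunk after the 8-byte signature."""
    pos = len(PNG_SIG)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC

def make_chunk(ctype: bytes, data: bytes) -> bytes:
    """Serialize one chunk; the CRC covers the type and the data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)

def png_z0(png: bytes) -> bytes:
    """Rewrite a PNG so its pixel stream is stored with zlib level 0."""
    if not png.startswith(PNG_SIG):
        raise ValueError("not a PNG")
    chunks = list(iter_chunks(png))
    # All IDAT chunks together form one zlib stream of filtered scanlines.
    raw = zlib.decompress(b"".join(d for t, d in chunks if t == b"IDAT"))
    stored = zlib.compress(raw, 0)  # level 0 = stored (uncompressed) blocks
    out = bytearray(PNG_SIG)
    wrote_idat = False
    for ctype, data in chunks:
        if ctype == b"IDAT":
            if not wrote_idat:  # replace the IDAT run with a single chunk
                out += make_chunk(b"IDAT", stored)
                wrote_idat = True
            continue
        out += make_chunk(ctype, data)
    return bytes(out)

def make_demo_png(w: int = 64, h: int = 64) -> bytes:
    """Tiny 8-bit grayscale test image (synthetic gradient, filter type 0)."""
    ihdr = struct.pack(">IIBBBBB", w, h, 8, 0, 0, 0, 0)
    raw = b"".join(b"\x00" + bytes((x + y) & 0xFF for x in range(w)) for y in range(h))
    return (PNG_SIG + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", zlib.compress(raw, 9)) + make_chunk(b"IEND", b""))

demo = make_demo_png()
z0 = png_z0(demo)
print(len(demo), "->", len(z0))
```

The z0 file is larger on its own, which is the point: the pixel data is now visible to the outer compressor instead of being hidden behind a weaker one.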


Rattenhirn said...

Hello, all of this is almost completely equally true for iOS apps as well, including the archive size comparisons. IPA files are also just Zip, and so is the Windows app package format (.appx if I remember correctly).

But only Android does not unpack its apps during installation, which is both good and bad. Good for the users, as the apps usually take less precious storage space than on iOS devices. This is quite ironic, because storage space is much more precious on iOS devices. It's good for analyses like this, because you can just grab the APKs off a device.

However, it is bad, because all data are transparently decompressed when accessed, which not only can cause a lot of confusion when measuring load times, but also makes it doubly bad when using an internal archive and forgetting to turn off outer compression.

This, somewhat indirectly, brings me to a thing I would really like to see from a modern compression library. Your suggestion to use an inner archive format (i.e. tar) has a few big downsides that make it unusable in practice.

It will require decompression at install time. None of the mobile platforms supports this, so it has to happen at the first app start, which is a really bad user experience. It will present as additional load time (no matter what the screen says), and nothing turns users off quicker than load times. Also, platform owners impose (rather squishy) limits on overall load times, so it might even be rejected.

Secondly, both the original install (from the platform store) and the uncompressed inner archive will hog precious storage space on the device. There is no way to get rid of the original install.

Thirdly, usually you'll want to leave some things compressed up until load time, mostly images and sounds. So the simple inner archive approach might be bad for load times, which again, is bad for everybody.

All of these things outweigh the improved download time and bandwidth saving. You install an app once but, hopefully, run it many, many times.

This brings me finally to my wished-for feature: a compression and archive format that combines the space saving of oodled tars with the ability for somewhat random access decompression. I know that these two goals are somewhat contradictory, but I believe there is a sweeter spot to be found than what we currently have.

Finally, I realize that all this is way beyond the scope of your post, but I've had this topic on my chest for some time now and it seemed a good place to unload it and maybe get a discussion going! ;)

cbloom said...

Yes, my look at compression of whole tars is not meant to suggest that developers can or should do that now. Rather it provides a reference of how small this data could be without the constraints of the current APK system. It's also a possible compression level that the OS or transport layer could achieve with platform support (unpacking the tar at install time, possibly recompressing file by file).

For transport over cell networks, minimum size is the main goal, and some time spent unpacking the tar to install is negligible. Oodle decode on modern mobile devices is 500 MB/s or higher. A 50 MB APK takes about 16 seconds to download over LTE in the US (around 3 MB/s), but only 0.1 seconds to unpack the whole thing with Oodle.
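The arithmetic behind those two numbers, spelled out as a quick check. The inputs are the round figures quoted above (50 MB package, about 3 MB/s LTE, 500 MB/s decode); like the text, it treats 50 MB as the amount of data to decode, and the variable names are just for the example.

```python
apk_mb = 50.0            # package size in MB
lte_mb_per_s = 3.0       # rough LTE download rate quoted above
decode_mb_per_s = 500.0  # lower bound on Oodle decode speed quoted above

download_s = apk_mb / lte_mb_per_s
unpack_s = apk_mb / decode_mb_per_s

print(f"download: {download_s:.1f} s")
print(f"unpack  : {unpack_s:.1f} s")
print(f"unpack is {unpack_s / download_s:.1%} of the download time")
```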

There are a couple of different issues -

1. What can developers do right now within the existing APK system?

The best you can do is to take all your data files and put them together in a package like a tar, and then divide that at the seek granularity that you need for random access, maybe 256k or 1M chunks or whatever. (it depends on how you load your data; if you always need to load the whole set of data to run the game, then don't divide at all). As a developer you have to leave your system files out of this pack file. Then compress your pack file with Oodle and name it .mp3 or something so that the APK leaves it uncompressed.

This is better than compressing file by file because it combines tiny files together. You could also compress file by file but only merge together your small files. The point is to make well constructed "paging units" that merge data that is usually loaded at the same time.
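A minimal sketch of the pack-file idea in point 1: concatenate the data files, cut the result at a fixed seek granularity, and compress each chunk independently so any file can be read by decoding only the chunks it touches. zlib stands in for Oodle here, and `build_pack`, `read_file` and the table-of-contents layout are made up for the example, not an Oodle or APK format.

```python
import zlib

def build_pack(files: dict, chunk_size: int = 256 * 1024):
    """Concatenate files, then compress fixed-size chunks independently.

    Returns (toc, chunks): toc maps name -> (offset, size) in the
    uncompressed pack; chunks is the list of compressed chunks.
    """
    blob = bytearray()
    toc = {}
    for name, data in files.items():
        toc[name] = (len(blob), len(data))
        blob += data
    chunks = [zlib.compress(bytes(blob[i:i + chunk_size]), 9)
              for i in range(0, len(blob), chunk_size)]
    return toc, chunks

def read_file(toc, chunks, name: str, chunk_size: int = 256 * 1024) -> bytes:
    """Random access: decode only the chunks that overlap the file."""
    offset, size = toc[name]
    if size == 0:
        return b""
    first = offset // chunk_size
    last = (offset + size - 1) // chunk_size
    data = b"".join(zlib.decompress(chunks[i]) for i in range(first, last + 1))
    start = offset - first * chunk_size
    return data[start:start + size]

# Hypothetical content: many tiny files plus one larger one
files = {f"cfg/{i}.txt": f"setting_{i}=value\n".encode() * 3 for i in range(500)}
files["levels/big.bin"] = bytes(range(256)) * 40

toc, chunks = build_pack(files, chunk_size=4096)
print(len(chunks), "chunks,", sum(len(c) for c in chunks), "compressed bytes")
```

Ordering the files so that things loaded together sit next to each other is what turns the chunks into useful paging units; larger chunks compress better but cost more to decode per read.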

2. There's the separate question of how much better it could be with OS level support, and what is the best thing to do there?

The simplest thing would be to replace zlib compression in the APKs with Oodle. Since that decode is done at every app start, the improved decode speed of Oodle would be a nice benefit (on top of the size savings). The next incremental step would be to move to an archive that has "solid" archiving (combining small files into larger compression units) for the small files. Solid archiving is in pretty much every modern archiver (rar, 7z, etc.)

Another incremental step would be to compress the system files (exe and java and such) for transmission and then unpack them at install time (but leave the other data as transmitted). This can gain you many megabytes from compressing those large files that are currently transmitted uncompressed.

The best of all would be to transmit the game as a compressed tar and unpack at install time.
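To illustrate the "solid" archiving step mentioned above, here is a small sketch that merges small files into shared compression units while leaving large files as their own units. The thresholds, the function names and the in-memory unit layout are assumptions for the example; real archivers such as 7z have their own formats, and zlib again stands in for Oodle.

```python
import zlib

def pack_solid(files: dict, small_limit: int = 4096, unit_target: int = 64 * 1024):
    """Return a list of units: ([(name, size), ...], compressed_bytes)."""
    units = []
    pending_members, pending = [], bytearray()

    def flush():
        nonlocal pending_members, pending
        if pending_members:
            units.append((pending_members, zlib.compress(bytes(pending), 9)))
            pending_members, pending = [], bytearray()

    for name, data in files.items():
        if len(data) >= small_limit:
            # Large file: its own compression unit
            units.append(([(name, len(data))], zlib.compress(data, 9)))
            continue
        # Small file: append to the current solid unit
        pending_members.append((name, len(data)))
        pending += data
        if len(pending) >= unit_target:
            flush()
    flush()
    return units

def unpack_solid(units) -> dict:
    """Decode every unit and split it back into files."""
    out = {}
    for members, comp in units:
        raw = zlib.decompress(comp)
        pos = 0
        for name, size in members:
            out[name] = raw[pos:pos + size]
            pos += size
    return out

# Hypothetical content: 300 tiny files and one large file
files = {f"strings/{i}.json": f'{{"key": "label_{i}", "text": "hello"}}'.encode()
         for i in range(300)}
files["textures/atlas.bin"] = bytes(range(256)) * 64

units = pack_solid(files)
solid_total = sum(len(c) for _, c in units)
per_file_total = sum(len(zlib.compress(d, 9)) for d in files.values())
print("per-file:", per_file_total, "solid:", solid_total)
```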

Rattenhirn said...

Great response!

I like the idea of "solid" archiving, never thought of that for some reason...

Another thought, when combining files into a tar archive, or archives with an index, the files could be in there in any order. I wonder if there is a way to sort files cleverly, in order to improve compression even more?
