Improving the compression of block-compressed textures Revisited

The Oodle LZ compressors are especially good on binary data, such as the block compressed textures used in games.

(by "block compressed textures" I mean BCn, ETC1, ETC2, etc. : textures in fixed size blocks for use with GPUs. I do *not* mean already compressed textures such as JPEG, PNG, or BCn that has already been compressed with crunch. You should not apply Oodle or any other generic compressor on top of already compressed textures of that type. If you have a lot of PNG data, consider PNG without ZLib or look for the upcoming Oodle Lossless Image codec.)

See the Appendix at the bottom for a comparison of modern LZ compressors on BCn data. Oodle LZ gets more compression and/or much faster decode speeds on BCn data.

So you can certainly just create your texture as usual (at maximum quality) and compress it with Oodle. That's fine and gives you the best visual quality.

If you need your texture data to be smaller for some reason, you can use a data-specific lossy compressor like crunch (or Basis), or you could use RDO texture creation followed by Oodle LZ compression.

(I've written about this before, here : Improving the compression of block-compressed textures , but I'm trying to do a cleaner, more thorough job this time.)

RDO texture creation is a modification of the step that creates the block compressed texture (BCn or whatever) from the original (RGBA32 or whatever). Instead of simply choosing the compressed texture blocks that minimize error, blocks are chosen to minimize rate + distortion. That is, sometimes larger error is intentionally chosen when it improves rate. In this case, we want to minimize the rate *after* a following LZ compressor. The block compressed textures always have the same size, but some choices are more compressible than others. The basic idea is to choose blocks that have some relation to preceding blocks, thereby making them more compressible. Common examples are trying to reuse selector bits, or to choose endpoints that match neighbors.
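To make the "rate + distortion" idea concrete, here is a minimal sketch of a Lagrangian block choice. It is not how crunch or any real encoder works internally. It assumes you already have a list of candidate encodings for the current block, each with a precomputed distortion, and it uses zlib from the Python standard library as a crude stand-in for the LZ compressor that follows. The names `lz_rate_bytes`, `choose_block_rdo` and the weight `lam` are made up for illustration.

```python
import zlib

def lz_rate_bytes(history: bytes, candidate: bytes) -> int:
    """Crude rate proxy: extra zlib output bytes caused by appending
    `candidate` to the already-emitted blocks in `history`."""
    with_candidate = len(zlib.compress(history + candidate, 9))
    without_candidate = len(zlib.compress(history, 9))
    return with_candidate - without_candidate

def choose_block_rdo(history: bytes, candidates, lam: float):
    """Pick the (block_bytes, distortion) pair minimizing
    distortion + lam * rate. With lam = 0 this is ordinary
    minimum-error block selection."""
    def cost(candidate):
        block_bytes, distortion = candidate
        return distortion + lam * lz_rate_bytes(history, block_bytes)
    return min(candidates, key=cost)

# Toy example: one candidate repeats the previous block exactly (cheap
# for an LZ coder), the other has lower error but is all-new bytes.
prev_block = bytes([0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0])
new_block = bytes([0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF])
history = prev_block * 16
candidates = [(new_block, 8.0), (prev_block, 10.0)]

min_error_choice = choose_block_rdo(history, candidates, lam=0.0)
rdo_choice = choose_block_rdo(history, candidates, lam=2.0)
```

With `lam = 0` the lower-error block wins; with a larger `lam` the block that repeats its neighbor wins despite its higher error. A real encoder would use a much cheaper rate estimate than recompressing the history for every candidate.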

RDO encoding of block compressed textures should always be done from the original non-compressed version of the texture, *not* from a previous block compressed encoding. eg. don't take something already in BC1 and try to run RDO to shrink it further. Doing that would cause the errors to add up, a bit like taking a JPEG and lowering its "quality" setting to make it smaller - that should always be done from original data.

Now, block compressed textures are already lossy. BC1 is quite bad; BC7 and ASTC less so. So adding more error may not be acceptable at all. If large amounts of error are acceptable in your texture, you may not ever be seeing the largest mip levels. Sending mip levels that are too large and never visible is a *far* larger waste of size than anything we do here, so it's important to have a process in your game to find those textures and shrink them.
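To put a number on how much the largest mip level costs, here is a small sketch that sums the bytes of a BCn mip chain. It relies only on the standard block sizes (BC1 and BC4 are 8 bytes per 4x4 block; BC3, BC5 and BC7 are 16). The function names and the `skip_top_mips` parameter are just for illustration.

```python
def bcn_level_bytes(width: int, height: int, bytes_per_block: int) -> int:
    """Bytes for one mip level stored as 4x4 blocks (dimensions round up)."""
    blocks_x = max(1, (width + 3) // 4)
    blocks_y = max(1, (height + 3) // 4)
    return blocks_x * blocks_y * bytes_per_block

def bcn_chain_bytes(width: int, height: int, bytes_per_block: int,
                    skip_top_mips: int = 0) -> int:
    """Total bytes of a full mip chain down to 1x1, optionally dropping
    the `skip_top_mips` largest levels."""
    total = 0
    level = 0
    while True:
        if level >= skip_top_mips:
            total += bcn_level_bytes(width, height, bytes_per_block)
        if width == 1 and height == 1:
            break
        width = max(1, width // 2)
        height = max(1, height // 2)
        level += 1
    return total

BC1_BYTES_PER_BLOCK = 8
full = bcn_chain_bytes(1024, 1024, BC1_BYTES_PER_BLOCK)
without_top = bcn_chain_bytes(1024, 1024, BC1_BYTES_PER_BLOCK, skip_top_mips=1)
print(full, without_top, f"{100.0 * (1.0 - without_top / full):.1f}% saved")
```

For a 1024x1024 BC1 texture, dropping the top mip removes about 75% of the bytes before any compressor runs, which is far more than the RDO savings measured below. As an aside, a single 1024x1024 BC1 level is 524,288 bytes, and adding a 128 byte DDS header gives the 524,416 bytes reported in the results below. That suggests the test textures are 1024x1024, but that is my inference, not something stated here.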

The best tool I know of at the moment to do RDO texture creation is crunch by Rich Geldreich / Binomial. I'm told that their newer Basis product has an improved RDO-for-LZ but I don't have a copy to test. What I actually run is Unity's improvement to crunch. The way you use it is something like :

crunch_x64_unity.exe -DXT1 -fileformat dds -file input.png -maxmips 1 -quality 200 -out output.dds

That is, when you tell it to write fileformat DDS, it does normal block texture compression, but with rate-favoring decisions.

NOTE : we're talking about lossy compression here, which is always a little subtle to investigate because there are two axes of performance : both size and quality. Furthermore "quality" is hard to measure well, and there is no substitute for human eyes examining the images to decide what level of loss is acceptable. Here I am reporting "imdiff" scores with my "combo" metric. The "imdiff" scores are not like an RMSE; they're roughly on a scale of 0-100 where 0 = no difference and 100 = complete garbage, like a percent difference (though not really).

Some results :


non-RDO fast BC1 : 524,416 bytes
then Leviathan   : -> 416,981
imdiff     : 33.26

crunch RDO quality 200 , then Leviathan : -> 354,203
imdiff     : 36.15

file size 85% of non-RDO
error 109% of non-RDO


non-RDO fast BC1 : 524,416 bytes
then Leviathan   : -> 409,874
imdiff     : 32.96

crunch RDO quality 200 , then Leviathan : -> 334,342
imdiff     : 33.48

file size 82% of non-RDO
error 102% of non-RDO

Personally I like crunch's RDO DDS at these high quality levels, 200 or above. It introduces relatively little error and the file size savings are still significant.

At lower quality levels use of crunch can be problematic in practice. Unfortunately it's hard to control how much error it introduces. You either have to manually inspect textures for damage, or run an external process to measure quality and feed that back into the crunch settings. Another problem is that crunch's quality setting doesn't scale with texture size; smaller size textures get less error and larger size textures get more error at the same "quality" setting, which means you need to choose a quality setting per texture size. (I think the issue is that crunch's codebook size doesn't scale with texture size, which makes it particularly bad for textures at 2048x2048 or above, or for large texture atlases).
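The "measure quality and feed that back" loop can be sketched as a binary search over the quality setting. This is only an outline under assumptions : `measure_error` is a hypothetical callable that you would implement yourself (run the encoder at the given quality, decode, score against the original with whatever metric you trust), and the search assumes the error never increases as quality goes up, which real measurements may violate slightly. The 0..255 default range follows the maximum setting of 255 mentioned in this post.

```python
def lowest_quality_within_budget(measure_error, max_error, lo=0, hi=255):
    """Return the smallest quality in [lo, hi] whose measured error is
    <= max_error, or None if even the highest quality is over budget.

    Assumes measure_error(quality) is non-increasing in quality."""
    if measure_error(hi) > max_error:
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if measure_error(mid) <= max_error:
            hi = mid          # mid is acceptable, try lower quality
        else:
            lo = mid + 1      # mid is too lossy, need higher quality
    return lo

# Stand-in error model purely for demonstration; a real one would
# encode the texture at `quality` and score the result.
def fake_error(quality):
    return 300 - quality

print(lowest_quality_within_budget(fake_error, max_error=120))
```

Each probe costs a full encode plus a metric evaluation, so the roughly 8 probes of a binary search are a lot cheaper than sweeping every setting. Running this per texture also sidesteps the problem of one quality value meaning different things at different texture sizes.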

Your other option besides doing RDO texture creation followed by LZ is to just use crunch's native "crn" format for textures.

Let's compare RDO+LZ vs crn for size. I will do this by dialing the quality setting until they get the same imdiff "combo" score, so we are comparing a line of equal distortion (under one metric).


crunch crn 255 : -> 211,465
imdiff     : 42.33

crunch rdo dds 95 : -> 264,206
imdiff     : 42.36


crunch crn 255 : -> 197,644
imdiff     : 38.48

crunch rdo dds 101 : -> 244,402
imdiff     : 38.67

The native "crn" format is about 20% smaller than RDO + LZ on both of these textures. It is to be expected that custom compressors, well designed for one type of data, should beat general purpose compressors. Note that comparing "crn" sizes to just doing BCn + LZ (without RDO) is not a valid comparison, since they are at different error levels.
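The "about 20%" figure is just the ratio of the sizes listed above at matched imdiff scores. A tiny sketch of the arithmetic, using only the numbers reported in this post (the helper name `percent_smaller` is mine) :

```python
def percent_smaller(candidate_bytes: int, baseline_bytes: int) -> float:
    """How much smaller candidate is than baseline, in percent."""
    return 100.0 * (1.0 - candidate_bytes / baseline_bytes)

# (crn size, RDO dds + LZ size) for the two test textures above
pairs = [(211_465, 264_206), (197_644, 244_402)]
for crn_bytes, rdo_lz_bytes in pairs:
    saving = percent_smaller(crn_bytes, rdo_lz_bytes)
    print(f"crn is {saving:.1f}% smaller than RDO + LZ")
```

That comes out to 20.0% and 19.1% for the two textures.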

If you look at the quality settings, the "crn" mode at maximum quality is still introducing a lot of error. That "quality" setting is not on the same scale for crn mode and dds mode. Maximum quality (255) in crn mode is roughly equivalent to quality = 100 in dds mode. Unfortunately there seems to be no way to get higher quality in the crn mode.

This has been an attempt to provide some facts to help you make a good choice. There are three options : BCn (without RDO) + LZ , RDO BCn + LZ, or a custom compressed texture format like CRN. They have different pros and cons and the right choice depends on your app and pipeline.

Now we haven't looked at decode speed in this comparison. I've never measured crunch's decode speed (of CRN format), but I suspect that Oodle's LZ decoders are significantly faster. Another possible speed advantage for Oodle LZ is that you can store your BCn data pre-swizzled for the hardware, which may let you avoid more CPU work. I should also note that you should never LZ decompress directly into uncached graphics memory. You either need to copy it over after decoding (which is very fast and recommended) or start the memory as cached for LZ decoding and then change it to uncached GPU memory after the decode is done.

Appendix : Performance of LZ compressors on BCn data

Comparison of Oodle to some other compressors on samples of game texture data.

Repeating the "Game BCn" test from Oodle 2.6.0 : Leviathan detailed performance report : A mix of BCn textures from a game (mostly BC1, BC4, and BC7) :

"Game BCn" :

lzma 16.04 -9           : 3.692 to 1 :   64.85 MB/s
brotli24 2017-12-12 -11 : 3.380 to 1 :  237.78 MB/s
zlib 1.2.11 -9          : 2.720 to 1 :  282.78 MB/s
zstd 1.3.3 -22          : 3.170 to 1 :  485.97 MB/s

Kraken8                 : 3.673 to 1 :  880.99 MB/s
Leviathan8              : 3.844 to 1 :  661.93 MB/s

A different set : "test_data\image\dds" is mostly BC1 with some BC3 and some RGBA32

test_data\image\dds :

lzma 16.04 -9           : 2.354 to 1 :   39.53 MB/s
brotli24 2017-12-12 -11 : 2.161 to 1 :  161.40 MB/s
zlib 1.2.11 -9          : 1.894 to 1 :  222.70 MB/s
zstd 1.3.3 -22          : 2.066 to 1 :  443.96 MB/s

Kraken8                 : 2.320 to 1 :  779.84 MB/s
Leviathan8              : 2.386 to 1 :  540.90 MB/s
(note this is lzma with default settings; lzma with settings tweaked for BCn can sometimes get more compression than Leviathan)

While brotli and ZStd are competitive with Kraken's compression ratio on text (and text-like) files, they lag behind on many types of binary data, such as these block compressed textures.
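If you want to run this kind of comparison on your own texture data, here is a rough sketch of measuring compression ratio and decode speed. It uses only the zlib and lzma modules from the Python standard library, since those are the only codecs from the tables that ship with Python; it is not the harness that produced the numbers above. It assumes MB means 10^6 bytes and that the speed column is decode speed, neither of which is stated explicitly here. `measure` and `run_codecs` are illustrative names.

```python
import lzma
import time
import zlib

def measure(compress, decompress, data: bytes, repeats: int = 5):
    """Return (compression ratio, decode MB/s) for one codec.
    Decode time is the best of `repeats` runs to reduce timing noise."""
    packed = compress(data)
    best_seconds = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        unpacked = decompress(packed)
        best_seconds = min(best_seconds, time.perf_counter() - start)
    if unpacked != data:
        raise ValueError("round trip failed")
    ratio = len(data) / len(packed)
    mb_per_s = (len(data) / 1e6) / best_seconds if best_seconds > 0 else float("inf")
    return ratio, mb_per_s

def run_codecs(data: bytes):
    """Measure the stdlib codecs at their highest compression levels."""
    codecs = {
        "zlib -9": (lambda d: zlib.compress(d, 9), zlib.decompress),
        "lzma -9": (lambda d: lzma.compress(d, preset=9), lzma.decompress),
    }
    return {name: measure(c, d, data) for name, (c, d) in codecs.items()}

if __name__ == "__main__":
    # Placeholder input; replace with the bytes of a real BCn file.
    sample = bytes(range(256)) * 4096
    for name, (ratio, speed) in run_codecs(sample).items():
        print(f"{name:8s} : {ratio:6.3f} to 1 : {speed:8.2f} MB/s")
```

Speeds measured through Python bindings on small inputs are not comparable to the figures in the tables above; the point is only the method : ratio is raw size over compressed size, and speed is raw bytes over decode time.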
