6/17/2020

Oodle Texture slashes game sizes

Oodle Texture is a new technology we've developed at RAD Game Tools which promises to dramatically shrink game sizes, reducing what you need to download and store on disk, and speeding up load times even more.

Oodle Texture creates BC1-7 GPU textures that are far more compressible, so that when packaged for storage or distribution they are much smaller - up to 2X smaller. Many games have most of their content in this form, so this leads to a huge impact on compressed game sizes, usually 10%-50% smaller depending on the content and how Oodle Texture is used.

Smaller content also loads faster, so improving the compression ratio by 2X also improves effective IO speed by 2X. This is possible when the decompression is not the bottleneck, such as when you use super fast Oodle Kraken decompression, or a hardware decoder.
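To make the "decompression is not the bottleneck" condition concrete, here is a minimal sketch of the load pipeline as two stages in series. The function name and the example speeds are made up for illustration; they are not measurements of any real drive or decoder.

```python
def effective_load_speed(io_mb_per_s, compression_ratio, decompress_mb_per_s):
    """Rate at which uncompressed data arrives in memory, in MB/s.

    io_mb_per_s         : speed of reading compressed bytes from storage
    compression_ratio   : uncompressed size / compressed size
    decompress_mb_per_s : output speed of the decompressor
    The slower of the two stages limits the whole pipeline.
    """
    delivered_by_io = io_mb_per_s * compression_ratio
    return min(delivered_by_io, decompress_mb_per_s)

# Hypothetical numbers: a 500 MB/s drive feeding a fast or a slow decoder.
for ratio in (1.0, 2.0):
    print(ratio, effective_load_speed(500, ratio, 4000), effective_load_speed(500, ratio, 600))
```

With the fast decoder, doubling the ratio doubles the effective speed. With the slow one, the gain is capped by the decoder.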

At RAD, we previously developed Oodle Kraken, part of Oodle Data Compression, which provides super fast decompression with good compression ratios, which makes Kraken great for game data loading where you need high speed. But Kraken is generic, it works on all types of data and doesn't try to figure out data-specific optimizations. Oodle Texture is able to greatly decrease the size that a following Kraken compression gets by preparing the textures in ways that make them more compressible.

Oodle Texture is specialized for what are called "block compressed textures". These are a form of compressed image data that is used by GPUs to provide the rendering attributes for surfaces in games. Oodle Texture works on BC1-BC7 textures, sometimes called "BCN textures". The BC1-7 are seven slightly different GPU formats for different bit depths and content types, and most games use a mix of different BCN formats for their textures. Modern games use a huge amount of BCN texture data. Shrinking the BCN textures to half their previous compressed size will make a dramatic difference in game sizes.

For an example of what Oodle Texture can do, here are results on a test set made from a small selection of textures from real shipping game content :

127 MB BCN GPU textures, mix of BC1-7, before any further compression

78 MB with zip/zlib/deflate

70 MB with Oodle Kraken

40 MB with Oodle Texture + Kraken
Without Oodle, the game may have shipped the zlib compressed textures at 78 MB. At 40 MB, the Oodle Texture + Kraken compressed game is almost half the size of the traditional zlib-compressed game. While Oodle Texture is great with Kraken, it also works to prepare textures for compression by other lossless back ends (like zlib). We believe that Oodle Texture should be widely used on game textures, even when Kraken isn't available.

While Kraken is a huge technological advance over zip/zlib, it only saved 8 MB in the example above (this is partly because BCN texture data is difficult for generic compressors to work with), while Oodle Texture saved an additional 30 MB, nearly 4X more than Kraken alone. The size savings possible with Oodle Texture are huge, much bigger than we've seen from traditional compressors, and you don't need to accept painful quality loss to get these savings.
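The arithmetic behind those savings, using only the four sizes from the test set above (the variable names are just for this sketch):

```python
# Sizes in MB, taken from the test set listed earlier.
raw_mb, zlib_mb, kraken_mb, texture_kraken_mb = 127, 78, 70, 40

kraken_saving = zlib_mb - kraken_mb              # Kraken vs zlib: 8 MB
texture_saving = kraken_mb - texture_kraken_mb   # Oodle Texture on top of Kraken: 30 MB

print(kraken_saving, texture_saving)             # 8 30
print(texture_saving / kraken_saving)            # 3.75, the "nearly 4X"
print(round(texture_kraken_mb / zlib_mb, 3))     # fraction of the zlib-compressed size
```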

The way that games process texture data is :


RGB uncompressed source art content like BMP or PNG

|     <- this step is where Oodle Texture RDO goes
V

BCN compressed texture

|
V

Kraken or zlib compression on the game package containing the BCN textures

|                                                                                   TOOLS ^
V
sent over network, stored on disk

|                                                                                   RUNTIME v
V

decompress Kraken or zlib to load (sometimes with hardware decompressor)

|
V

BCN compressed texture in memory

|
V

rendered on GPU

Oodle Texture doesn't change this data flow, it just makes the content compress better so that the packaged size is smaller. You still get GPU-ready textures as the output. Note that Oodle Texture RDO isn't required in the runtime side at all.

(Oodle Texture also contains bc7prep, which has slightly different usage; more on that later)

Games don't decompress the BCN encoding, rendering reads directly from BCN. Games use BCN textures directly in memory because GPUs are optimized to consume that format, and they also take less memory than the original uncompressed RGB image would (and therefore also use less bandwidth), but they aren't a great way to do lossy compression to optimize size in packages. For example the familiar JPEG lossy image compression can make images much smaller than BCN can at similar visual quality levels. In Oodle Texture we want to shrink the package sizes, but without changing the texture formats, because games need them to load into BCN. We also don't want to use any slow transcoding step, cause an unnecessary loss of quality, or require decoding at runtime.

Oodle Texture can be used on the new consoles that have hardware decompression without adding any software processing step. You just load the BCN textures into memory and they are decompressed by the hardware, and you get the benefit of much smaller compressed sizes, which also effectively multiplies the load speed.

Oodle Texture RDO can't be used to compress games with existing BCN texture content, as that has already been encoded. We need to re-encode to BCN from source art as part of the game's content baking tools.

BCN textures work on 4x4 blocks of pixels, hence the name "block compressed textures". They are a lossy encoding that stores an approximation of the original source texture in fewer bits. The source colors are typically 24 or 32 bits per texel, while BCN stores them in 4 or 8 bits per texel. So BCN is already a compression factor of something like 6:1 (it varies depending on BC1-7 and the source format).

How does Oodle Texture do it?

To understand the principles of how Oodle Texture finds these savings, we'll have to dig a little into what a BCN encoding is. All the BCN are a little different but have similar principles. I'm going to henceforth talk about BC1 to be concrete as an example that illustrates the main points that apply to all the BC1-7.

BC1 stores 24 bit RGB in 4 bits per texel, which is 64 bits per block of 4x4 texels. It does this by sending the block with two 16-bit endpoints for a line segment in color space (32 bits total for endpoints), and then sixteen 2-bit indices that select an interpolation along those endpoints. 2-bits can encode 4 values for each texel, which are each of the endpoints, or 1/3 or 2/3 of the way between them. (BC1 also has another mode with 3 interpolants instead of 4, but we'll ignore that here for simplicity). The BC1 endpoints are 16-bit in 5:6:5 for R:G:B which is a coarser quantization of the color space than the original 8 bits.
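As a rough illustration of that layout, here is a small BC1 block decoder sketch. It assumes the usual little-endian packing (two 16-bit endpoints followed by 32 bits of indices) and uses simple integer rounding for the interpolants; real GPUs may round slightly differently, and the three-interpolant mode is only stubbed in for completeness.

```python
import struct

def expand565(c):
    """Expand a 16-bit 5:6:5 color to 8-bit-per-channel RGB by bit replication."""
    r5, g6, b5 = (c >> 11) & 31, (c >> 5) & 63, c & 31
    return ((r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2))

def bc1_palette(c0, c1):
    """The four colors a texel's 2-bit index can select."""
    p0, p1 = expand565(c0), expand565(c1)
    if c0 > c1:
        # Four-interpolant mode: both endpoints plus the 1/3 and 2/3 points.
        p2 = tuple((2 * a + b) // 3 for a, b in zip(p0, p1))
        p3 = tuple((a + 2 * b) // 3 for a, b in zip(p0, p1))
    else:
        # Three-interpolant mode: midpoint, and index 3 decodes to black.
        p2 = tuple((a + b) // 2 for a, b in zip(p0, p1))
        p3 = (0, 0, 0)
    return [p0, p1, p2, p3]

def decode_bc1_block(block):
    """Decode one 8-byte BC1 block into sixteen RGB texels."""
    c0, c1, indices = struct.unpack("<HHI", block)
    palette = bc1_palette(c0, c1)
    return [palette[(indices >> (2 * i)) & 3] for i in range(16)]

# White and black endpoints, with the indices cycling 0,1,2,3 across the block.
texels = decode_bc1_block(struct.pack("<HHI", 0xFFFF, 0x0000, 0xE4E4E4E4))
print(texels[:4])
```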

We think of RGB as a "color space" where R, G, and B are the axes of a 3-dimensional coordinate system. A single color is a point in this color space. The original 4x4 block of uncompressed texels is equivalent to sixteen points in this color space. In general those points could be scattered anywhere in this big 3d space, but in practice they usually form a cloud (or a few clusters) that is compact, because colors that are near each other in the image tend to have similar RGB values.

BC1 approximates these points with a line segment that has 4 discrete codable points on the segment, at the endpoints, and 1/3 of the way from each end. Each color in the original sixteen can pick the closest of the 4 codable points with the 2 bits sent per texel. The problem of BC1 encoding is to choose the endpoints for this line segment, so that the reproduced image looks as good as possible. Once you choose the endpoints, it's easy to find the indices that minimize error for that line segment.
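The "easy" half of the problem, picking indices once the endpoints are fixed, can be sketched like this. To keep it short the endpoints here are plain RGB tuples rather than quantized 5:6:5 values, and the helper names are invented for the example.

```python
def palette_from_endpoints(e0, e1):
    """The 4 codable points: both endpoints and the 1/3 points from each end."""
    one_third = tuple((2 * a + b) / 3 for a, b in zip(e0, e1))
    two_thirds = tuple((a + 2 * b) / 3 for a, b in zip(e0, e1))
    return [e0, e1, one_third, two_thirds]

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def best_indices(texels, e0, e1):
    """For each texel pick the closest codable point; return indices and total squared error."""
    palette = palette_from_endpoints(e0, e1)
    indices, total_error = [], 0.0
    for texel in texels:
        errors = [squared_distance(texel, p) for p in palette]
        index = min(range(4), key=errors.__getitem__)
        indices.append(index)
        total_error += errors[index]
    return indices, total_error

grays = [(0, 0, 0), (255, 255, 255), (90, 90, 90), (160, 160, 160)]
print(best_indices(grays, (0, 0, 0), (255, 255, 255)))
```

The hard half is the search over endpoints, which has to call something like this for every candidate line segment.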

The thing that makes BC1 encoding interesting and difficult is that there are a large number of encodings that have nearly the same error. Your goal is to put a line segment through a cluster of points, and slightly different endpoints correspond to stretches or rotations of that line. You can hit any given color with either an endpoint of the segment or a 1/3 interpolant, so you can do these big stretches or contractions of the line segment and still have nearly the same error.

For example, here are two clusters of points (the black dots) in color space, with some possible BC1 encodings that produce similar errors :

If you're only considering distortion, then these options have nearly the same error. In fact you could just put your line segment through the principal axis of the color clusters, and then you are within bounded error of the best possible encoding (if the line segment were sent with real numbers, then the best fit line would in fact minimize squared error, by definition; the quantization of the endpoints means this doesn't necessarily give you minimum error). That's possible because the distortion varies smoothly and convexly (except for quantization effects, which are bounded). This is just a way of saying that there's a minimum error encoding where the line segment goes through the original colors, and if you keep stepping the endpoints away from that line segment, the error gets worse.
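A minimal sketch of that principal-axis starting point, using plain power iteration on the 3x3 covariance of the block's colors. This is a textbook approach written for illustration, not a description of what Oodle Texture does internally; the function names are hypothetical, and a real encoder would go on to quantize the endpoints to 5:6:5 and refine them.

```python
def principal_axis(points, iterations=50):
    """Return (mean, unit axis) of RGB points via power iteration on their covariance."""
    n = len(points)
    mean = [sum(p[k] for p in points) / n for k in range(3)]
    centered = [[p[k] - mean[k] for k in range(3)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centered) for j in range(3)] for i in range(3)]
    # Start guess. A guess exactly orthogonal to the true axis would fail;
    # a production encoder would guard against that case.
    axis = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        axis = [sum(cov[i][j] * axis[j] for j in range(3)) for i in range(3)]
        norm = sum(a * a for a in axis) ** 0.5
        if norm == 0.0:
            return mean, [0.0, 0.0, 0.0]  # flat block (or the degenerate start above)
        axis = [a / norm for a in axis]
    return mean, axis

def initial_endpoints(points):
    """Endpoints at the extremes of the points' projections onto the principal axis."""
    mean, axis = principal_axis(points)
    proj = [sum((p[k] - mean[k]) * axis[k] for k in range(3)) for p in points]
    lo, hi = min(proj), max(proj)
    e0 = tuple(mean[k] + lo * axis[k] for k in range(3))
    e1 = tuple(mean[k] + hi * axis[k] for k in range(3))
    return e0, e1

print(initial_endpoints([(10, 10, 10), (20, 20, 20), (30, 30, 30), (40, 40, 40)]))
```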

Oodle Texture isn't just looking for the lowest error (or "distortion") when encoding to BCN; it does "rate-distortion optimization". This means that in addition to considering the distortion of each possible encoding, it also considers the rate. The "rate" in this case is the estimated size of the chosen block encoding after subsequent compression by a lossless compressor like Kraken or zlib.

By considering rate, Oodle Texture can make smarter encodings that optimize for compressed size as well as quality. Sometimes this is just free, by measuring the rate of different choices you may see that two encodings with equal quality do not have the same rate, and you should choose the one with better rate. Sometimes this means a tradeoff, where you sacrifice a small amount of quality to get a big rate gain.

Rate Distortion Optimization or RDO does not mean that we are introducing loss or bad quality into the encoding. It simply means the encoder is considering two types of cost when it makes decisions. It can balance the desire for maximum quality against the desire for the smallest possible size. Since both are not possible at the same time, a tradeoff must be made, which the game developer can control with a quality parameter. Oodle Texture RDO can produce very high quality encodings that are nearly visually indistinguishable from non-RDO encodings, but compress much more, simply by being a smart encoding which takes into consideration the rate of the choices.

People actually do rate-distortion optimization in games all the time without realizing it. When you choose to use a 4k x 4k texture vs. an 8k x 8k texture, you are making a visual quality vs size decision. Similarly if you choose BC1 vs BC7, you're choosing 4 or 8 bits per texel vs a quality tradeoff. Those are very big coarse steps, and the value of the tradeoff is not systematically measured. The difference with Oodle Texture is that our RDO is automatic, it provides a smooth easy to control parameter, the tradeoff is scored carefully and the best possible ways to trade size for quality are chosen.
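To see how coarse those manual steps are, here is the size of the top mip level for each of those choices, before any lossless compression. Mip chains are ignored to keep the sketch short.

```python
def bcn_size_bytes(width, height, bits_per_texel):
    """Size of a single BCN mip level: BC1 is 4 bits per texel, BC7 is 8."""
    return width * height * bits_per_texel // 8

MIB = 1024 * 1024
for side in (4096, 8192):
    for name, bits in (("BC1", 4), ("BC7", 8)):
        print(f"{side} x {side} {name}: {bcn_size_bytes(side, side, bits) // MIB} MiB")
```

Each step doubles or quadruples the size; there is nothing in between, which is the gap a continuous RDO parameter fills.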

Here's an example of Oodle Texture BC7 encoding made with and without RDO :

BC7 baseline : 1.081 to 1 compression
BC7 RDO lambda=30 : 1.778 to 1 compression

(texture from cc0textures.com, resized to 512x512 before BC7 encoding; compression ratio is with Kraken level 8)

(BC7 textures like this that hardly compress at all without RDO are common)

Oodle Texture RDO encodes source art to BCN, looking at the many different options for endpoints and measuring "rate" and "distortion" on them. We noted previously that distortion is pretty well behaved as you search for endpoints, but in contrast, the rate does not behave the same way. The rate of two different endpoint choices could be vastly different even for endpoints whose colors are right next to each other in color space. Rate does not vary smoothly or monotonically as you explore the endpoint possibilities, it varies wildly up and down, which means a lot more possibilities have to be searched.

The way we get compression of BCN textures is mainly through reuse of components of the block encoding. That is, the back end compressor will find that a set of endpoints or indices (the two 32-bit parts of a BC1 block, for example) are used in two different places, and therefore can send the second use as an LZ77 match instead of transmitting them again. We don't generally look for repetition of entire blocks, though this can reduce rate, because it causes visually obvious repetitions. Instead by looking to repeat the building components that make up the BCN blocks, we get rate reduction without obvious visual repetition.
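A toy experiment shows the effect of component reuse on an LZ77-style back end. It uses zlib from the Python standard library as a stand-in compressor and random 32-bit values as stand-ins for BC1 endpoint and index words, so the blocks are not meaningful textures; only the relative sizes matter.

```python
import random
import struct
import zlib

def pack_blocks(endpoint_words, rng):
    """Pack fake BC1 blocks: a 32-bit endpoint word followed by 32 random index bits."""
    out = bytearray()
    for endpoints in endpoint_words:
        out += struct.pack("<II", endpoints, rng.getrandbits(32))
    return bytes(out)

rng = random.Random(1234)
num_blocks = 4096

# Every block gets fresh endpoints: nothing for the compressor to match.
fresh = [rng.getrandbits(32) for _ in range(num_blocks)]
# Every block picks its endpoints from a small shared pool: lots of 4-byte matches.
pool = [rng.getrandbits(32) for _ in range(16)]
reused = [rng.choice(pool) for _ in range(num_blocks)]

no_reuse_size = len(zlib.compress(pack_blocks(fresh, rng), 9))
with_reuse_size = len(zlib.compress(pack_blocks(reused, rng), 9))
print(num_blocks * 8, no_reuse_size, with_reuse_size)
```

The indices are still random in both cases, so only the endpoint half of each block benefits; an RDO encoder looks for reuse in both halves.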

You might have something like

Encode block 1 with endpoints {[5,10,7] - [11,3,7]} and indices 0xE3F0805C

Block 2 has lots of choices of endpoints with similar distortions

{[6,11,7] - [11,3,7]} distortion 90   rate 32 bits
{[1,10,7] - [16,5,7]} distortion 95   rate 32 bits
{[5,10,7] - [11,3,7]} distortion 100  rate 12 bits

the choice of {[5,10,7] - [11,3,7]} has a rate that's much lower than the others
because it matches previously used endpoints

Part of what makes RDO encoding difficult is that both "rate" and "distortion" are not trivial to evaluate. There's no simple formula for either that provides the rate and distortion we need.

For distortion, you could easily just measure the squared distance error of the encoding (aka RMSE, SSD or PSNR), but that's not actually what we care about. We care about the visual quality of the block, and the human eye does not work like RMSE, it sees some errors as objectionable even when they are quite numerically small. For RDO BCN we need to be able to evaluate distortion millions of times on the possible encodings, so complex human-visual simulations are not possible. We use a very simple approximation that treats errors as more significant when they occur in smooth or flat areas, because those will be more jarring to the viewer; errors that occur in areas that were already noisy or detailed will not be as noticeable, so they get a lower D score. Getting this right has huge consequences, without a perceptual distortion measure the RDO can produce ugly visible blocking artifacts even when RMSE is quite low.
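The idea of weighting errors by local activity can be sketched as below. This is only an illustration of the principle; the actual Oodle Texture metric is not described here, and the `masking` constant and the function names are arbitrary assumptions.

```python
def block_activity(source):
    """Mean squared deviation of the source texels from their average color."""
    n = len(source)
    mean = [sum(t[k] for t in source) / n for k in range(3)]
    return sum((t[k] - mean[k]) ** 2 for t in source for k in range(3)) / n

def perceptual_distortion(source, decoded, masking=0.01):
    """Squared error, scaled down in blocks that were already noisy or detailed.

    masking is a made-up tuning constant: 0 gives plain SSD, larger values
    let busy blocks hide more error.
    """
    ssd = sum((s[k] - d[k]) ** 2 for s, d in zip(source, decoded) for k in range(3))
    weight = 1.0 / (1.0 + masking * block_activity(source))
    return ssd * weight

flat = [(100, 100, 100)] * 16
noisy = [(0, 0, 0), (200, 200, 200)] * 8
# The same numerical error (+2 per channel) applied to both blocks.
flat_d = perceptual_distortion(flat, [(102, 102, 102)] * 16)
noisy_d = perceptual_distortion(noisy, [(r + 2, g + 2, b + 2) for r, g, b in noisy])
print(flat_d, noisy_d)
```

Both blocks have the same SSD, but the flat block scores a much higher D, so the optimizer is steered toward putting its errors in the noisy one.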

To measure the rate of each block coding decision, we need to guess how well a block will compress, but we don't yet have all the other blocks, and the compressors that we use are dependent on context. That is, the actual rate will depend on what comes before, and the encoding we choose for the current block will affect the rate of future blocks. In LZ77 encoding this comes mainly through the ability to match the components of blocks; when choosing a current block you want it to be low "rate" in the sense that it is a match against something in the past, but also that it is useful to match against in the future. We use a mix of techniques to try to estimate how different choices for the current block will affect the final compressed size.

When choosing the indices for the BCN encoding (the four interpolants along the line segment that each texel chooses), the non-RDO encoder just took the closest one, giving the minimum color error. The RDO encoder also considers taking interpolants that are not the closest if it allows you to make index bytes that occur elsewhere in the image, thus reducing rate. Often a given color is nearly the same distance from two interpolants, but they might have very different rate. Also, some choice of endpoints might not give you any endpoint reuse, but it might change the way you map the colors to indices that gives you reuse there. Considering all these possibilities quickly is challenging.
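Here is a rough sketch of that index decision for a single BC1 block. The rate model is deliberately crude and entirely hypothetical: a 32-bit index word costs 32 bits if it is new and 12 bits if it has been seen before, and only whole-word reuse is considered. A real encoder estimates rate far more carefully.

```python
def index_word_error(texel_errors, word):
    """texel_errors[i][j] is the squared error of texel i when it uses interpolant j."""
    return sum(texel_errors[i][(word >> (2 * i)) & 3] for i in range(16))

def closest_index_word(texel_errors):
    """The non-RDO choice: every texel takes its nearest interpolant."""
    word = 0
    for i, errors in enumerate(texel_errors):
        word |= min(range(4), key=errors.__getitem__) << (2 * i)
    return word

def choose_index_word(texel_errors, seen_words, lam, new_bits=32, match_bits=12):
    """Pick the index word minimizing D + lam * R under the toy rate model."""
    best = closest_index_word(texel_errors)
    best_rate = match_bits if best in seen_words else new_bits
    best_j = index_word_error(texel_errors, best) + lam * best_rate
    for word in seen_words:
        j = index_word_error(texel_errors, word) + lam * match_bits
        if j < best_j:
            best, best_j = word, j
    return best

# Texel 0 is almost equidistant from interpolants 0 and 1; the rest clearly want 0.
errors = [[10, 9, 100, 100]] + [[0, 100, 100, 100]] * 15
seen = {0x00000000}
print(hex(choose_index_word(errors, seen, lam=0.0)), hex(choose_index_word(errors, seen, lam=1.0)))
```

At lambda 0 the encoder keeps the slightly closer index for texel 0; at lambda 1 it accepts one unit of extra error to reuse the all-zero index word.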

Oodle Texture measures these rate and distortion scores for lots of possible block encodings, and makes a combined score

J = D + lambda * R
that lets us optimize for a certain tradeoff of rate and distortion, depending on the lambda parameter. You can't minimize distortion and rate at the same time, but you can minimize J, which reaches the ideal mix of rate and distortion at that tradeoff. The client specifies lambda to control if they want maximum quality, or lower quality for more rate reduction. Lambda is a smooth continuous parameter that gives fine control, so there are no big jumps in quality. Oodle Texture RDO can encode to the same quality as the non-RDO encoders at low lambda, and gradually decreases rate as lambda goes up.
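Applying J = D + lambda * R to the three endpoint candidates from the block 2 example earlier shows how lambda moves the decision. The lambda values here are on an arbitrary scale chosen to suit those illustrative D and R numbers; they are not comparable to the lambda=30 or lambda=40 settings quoted elsewhere in this post.

```python
# (endpoints, distortion, rate in bits) from the block 2 example.
candidates = [
    ("{[6,11,7] - [11,3,7]}", 90, 32),
    ("{[1,10,7] - [16,5,7]}", 95, 32),
    ("{[5,10,7] - [11,3,7]}", 100, 12),
]

def best_candidate(candidates, lam):
    """Return the candidate with the lowest J = D + lam * R."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])

for lam in (0.0, 0.25, 1.0):
    name, d, r = best_candidate(candidates, lam)
    print(f"lambda={lam}: {name}  D={d} R={r} J={d + lam * r}")
```

At lambda 0 and 0.25 the lowest-distortion candidate wins. At lambda 1.0 the candidate that matches the previously used endpoints wins, because its 20 bits of rate savings outweigh 10 units of extra distortion. The crossover for these numbers is at lambda = 0.5.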

This optimization automatically finds the rate savings in the best possible places. It takes rate away where it makes the smallest distortion gain (measured with our perceptual metric, so the distortion goes where it is least visible). This means that not all textures get the same rate savings, particularly difficult ones will get less rate reduction because they need the bits to maintain quality. That's a feature that gives you the best quality for your bits across your set of textures. Oodle Texture is a bit like a market trader going around to all your textures, asking who can offer a bit of rate savings for the lowest distortion cost and automatically taking the best price.

Textures encoded with Oodle Texture RDO and then Kraken act a bit more like a traditional lossy encoding like JPEG. Non-RDO BCN without followup compression encodes every 4x4 block to the same number of output bits (either 64 or 128). With Oodle Texture RDO + Kraken, the size of output blocks is now variable depending on their content and how we choose to encode them. Easier to compress blocks will take fewer bits. By allocating bits differently, we can reduce the number of bits a given block takes, and perhaps lower its quality. One way to think about Oodle Texture RDO is as a bit allocation process. It's looking at the number of bits taken by each block (after compression) and deciding where those bits are best spent to maximize visual quality.

Rate-distortion optimization is standard in modern lossy codecs such as H264 & H265. They do similar bit allocation decisions in the encoder, usually by explicitly changing quantizers (a quantizer is like the JPEG quality parameter, but modern codecs can vary quantizer around the image rather than having a single value for the whole image) or thresholding small values to zero. What's different here is that Oodle Texture still outputs fixed size blocks, we don't have direct control of the final compression stage, we can only estimate what it will do. We don't have anything as simple as a quantizer to control block rate, we make the lower rate block encodings by finding ways to pack the RGB to BCN that are likely to compress more.

BC7 textures offer higher quality than BC1 at double the size (before compression). Without RDO, BC7 textures have been particularly large in game packages because they naturally compress very poorly. BC7 has many different modes, and packs its fields off byte alignment, which confuses traditional compressors like Kraken and zlib, and makes it hard for them to find any compression. It's quite common for non-RDO BC7 textures to compress by less than 10%.

Oodle Texture RDO can make BC7 encodings that are much more compressible. For example :

"mysoup1024"

non-RDO BC7 :
Kraken          :  1,048,724 ->   990,347 =  7.555 bpb =  1.059 to 1

RDO lambda=40 BC7 :
Kraken          :  1,048,724 ->   509,639 =  3.888 bpb =  2.058 to 1
Modern games are using more and more BC7 textures because they provide much higher quality than BC1 (which suffers from chunky artifacts even at max quality). This means lots of game packages don't benefit as much from compression as we'd like. Oodle Texture RDO on BC7 fixes this.
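For readers unfamiliar with the units in the "mysoup1024" results above, "bpb" is bits of compressed output per byte of input (8.0 means no compression) and "to 1" is the plain size ratio. A small sketch that reproduces the printed figures from the byte counts:

```python
def bits_per_byte(raw_bytes, compressed_bytes):
    """Compressed bits spent per uncompressed byte; 8.0 means no compression at all."""
    return 8.0 * compressed_bytes / raw_bytes

def compression_ratio(raw_bytes, compressed_bytes):
    return raw_bytes / compressed_bytes

raw = 1_048_724
for label, compressed in (("non-RDO BC7", 990_347), ("RDO lambda=40 BC7", 509_639)):
    bpb = bits_per_byte(raw, compressed)
    ratio = compression_ratio(raw, compressed)
    print(f"{label}: {bpb:.3f} bpb = {ratio:.3f} to 1")
```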

Oodle Texture also has a lossless transform for BC7 called "bc7prep" that rearranges the fields of BC7 to make it more compressible. This gives a 5-15% compression gain on existing BC7 encodings. It works great stacked with RDO in the high quality levels as well.

We think that Oodle Texture is just a better way to encode BCN textures, and it should be used on games on all platforms. Oodle Texture has the potential to dramatically shrink compressed game sizes.

You can read more about Oodle Texture at the RAD Game Tools web site, along with the rest of the Oodle family of data compression solutions.

6 comments:

Unknown said...

Really great technology!

Is there a way for me to upload a single image to evaluate estimated RDO compression improvements with various formats (zip, 7z, etc)?

cbloom said...

If you're a game developer, contact RAD for an evaluation to try it on your own images.

I'll be posting some more content on "sample runs" page that anyone can download and try various compressors on.

It would be neat to have a web page where people could upload images and see results online to try it out. Unfortunately that's a bit beyond my web coding abilities.

Hansa said...

I think there is typo in your text regarding the saved data sizes for the games reaching 100 GB. You state that Oodle Textures saves only 30 MB.

cbloom said...

The 30 MB savings is on the particular test set that I show results on there.

We got 127 MB of uncompressed textures from a game, covering a few characters, some models, and level textures.

Those would have been 78 MB without Oodle (just zlib compression). That's the size a gamer would have downloaded.

With Oodle Texture + Kraken that got down to 40 MB. 38 MB saved out of 78 MB, so almost 50%.

Unknown said...

What was the lambda in the 127MB example? Or did each texture get its own value? And what about BC7, usually Kraken is having a hard time with BC7 but in this example, Kraken without RDO got 45% on average? I'm assuming it had just a few BC7 textures?

cbloom said...

The 127 MB example is run at lambda=40. That's the upper limit of what I think is safe to use without manual inspection.

That test set is 89 MB BC7, 11.5 MB BC1, 11 MB BC3, 15 MB BC4

(sizes of uncompressed BCN files)

You're right that Kraken sometimes finds little compression on BC7, but it depends on the texture. There are several examples posted on the various pages that show that case, where Kraken (without Prep or RDO) gets over 7 bits per byte on BC7, very little compression until it gets some help from Oodle Texture.

This particular set of BC7's has a bunch of normal maps that have big areas of flat normals, and some character charts where the whole texture isn't used. It's a real set from a shipping game in 2019.
