6/17/2020

Oodle Texture slashes game sizes

Oodle Texture is a new technology we've developed at RAD Game Tools which promises to dramatically shrink game sizes, reducing what you need to download and store on disk, and speeding up load times even more.

Oodle Texture creates BC1-7 GPU textures that are far more compressible, so that when packaged for storage or distribution they are much smaller - up to 2X smaller. Many games have most of their content in this form, so this leads to a huge impact on compressed game sizes, usually 10%-50% smaller depending on the content and how Oodle Texture is used.

Smaller content also loads faster, so improving the compression ratio by 2X also improves effective IO speed by 2X. This is possible when the decompression is not the bottleneck, such as when you use super fast Oodle Kraken decompression, or a hardware decoder.

At RAD, we previously developed Oodle Kraken, part of Oodle Data Compression, which provides super fast decompression with good compression ratios, making it great for game data loading where you need high speed. But Kraken is generic: it works on all types of data and doesn't try to figure out data-specific optimizations. Oodle Texture greatly decreases the size that a following Kraken compression achieves by preparing the textures in ways that make them more compressible.

Oodle Texture is specialized for what are called "block compressed textures". These are a form of compressed image data that is used by GPUs to provide the rendering attributes for surfaces in games. Oodle Texture works on BC1-BC7 textures, sometimes called "BCN textures". The BC1-7 are seven slightly different GPU formats for different bit depths and content types, and most games use a mix of different BCN formats for their textures. Modern games use a huge amount of BCN texture data. Shrinking the BCN textures to half their previous compressed size will make a dramatic difference in game sizes.

For an example of what Oodle Texture can do, here are results on a test set made from a small selection of textures from real shipping game content :

127 MB BCN GPU textures, mix of BC1-7, before any further compression

78 MB with zip/zlib/deflate

70 MB with Oodle Kraken

40 MB with Oodle Texture + Kraken

Without Oodle, the game may have shipped the zlib-compressed textures at 78 MB. The Oodle Texture + Kraken compressed game (40 MB) is almost half the size of the traditional zlib-compressed game. While Oodle Texture is great with Kraken, it also works to prepare textures for compression by other lossless back ends (like zlib). We believe that Oodle Texture should be widely used on game textures, even when Kraken isn't available.

While Kraken is a huge technological advance over zip/zlib, it only saved 8 MB in the example above (this is partly because BCN texture data is difficult for generic compressors to work with), while Oodle Texture saved an additional 30 MB, nearly 4X more than Kraken alone. The size savings possible with Oodle Texture are huge, much bigger than we've seen from traditional compressors, and you don't need to accept painful quality loss to get these savings.

The way that games process texture data is :


RGB uncompressed source art content like BMP or PNG

|     <- this step is where Oodle Texture RDO goes
V

BCN compressed texture

|
V

Kraken or zlib compression on the game package containing the BCN textures

|                                                                                   TOOLS ^
V
sent over network, stored on disk

|                                                                                   RUNTIME v
V

decompress Kraken or zlib to load (sometimes with hardware decompressor)

|
V

BCN compressed texture in memory

|
V

rendered on GPU

Oodle Texture doesn't change this data flow, it just makes the content compress better so that the packaged size is smaller. You still get GPU-ready textures as the output. Note that Oodle Texture RDO isn't required on the runtime side at all.

(Oodle Texture also contains bc7prep which has slightly different usage; see more later, or here)

Games don't decompress the BCN encoding, rendering reads directly from BCN. Games use BCN textures directly in memory because GPUs are optimized to consume that format, and they also take less memory than the original uncompressed RGB image would (and therefore also use less bandwidth), but they aren't a great way to do lossy compression to optimize size in packages. For example the familiar JPEG lossy image compression can make images much smaller than BCN can at similar visual quality levels. In Oodle Texture we want to shrink the package sizes, but without changing the texture formats, because games need them to load into BCN. We also don't want to use any slow transcoding step, cause an unnecessary loss of quality, or require decoding at runtime.

Oodle Texture can be used on the new consoles that have hardware decompression without adding any software processing step. You just load the BCN textures into memory and they are decompressed by the hardware, and you get the benefit of much smaller compressed sizes, which also effectively multiplies the load speed.

Oodle Texture RDO can't be used to compress games with existing BCN texture content, as that has already been encoded. We need to re-encode to BCN from source art as part of the game's content baking tools.

BCN textures work on 4x4 blocks of pixels, hence the name "block compressed textures". They are a lossy encoding that stores an approximation of the original source texture in fewer bits. The source colors are typically 24 or 32 bits per texel, while BCN stores them in 4 or 8 bits per texel. So BCN is already a compression factor of something like 6:1 (it varies depending on BC1-7 and the source format).

How does Oodle Texture do it?

To understand the principles of how Oodle Texture finds these savings, we'll have to dig a little into what a BCN encoding is. All the BCN are a little different but share similar principles. From here on I'll talk about BC1 to be concrete, as an example that illustrates the main points that apply to all of BC1-7.

BC1 stores 24 bit RGB in 4 bits per texel, which is 64 bits per block of 4x4 texels. It does this by sending the block with two 16-bit endpoints for a line segment in color space (32 bits total for endpoints), and then sixteen 2-bit indices that select an interpolation along those endpoints. 2-bits can encode 4 values for each texel, which are each of the endpoints, or 1/3 or 2/3 of the way between them. (BC1 also has another mode with 3 interpolants instead of 4, but we'll ignore that here for simplicity). The BC1 endpoints are 16-bit in 5:6:5 for R:G:B which is a coarser quantization of the color space than the original 8 bits.

We think of RGB as a "color space" where the R,G, and B are axes of a 3d dimensional coordinate system. A single color is a point in this color space. The original 4x4 block of uncompressed texels is equivalent to sixteen points in this color space. In general those points are scattered around this big 3d space, but in practice they usually form a cloud (or a few clusters) that is compact, because colors that are nearby each other in the image tend to have similar RGB values.

BC1 approximates these points with a line segment that has 4 discrete codable points on the segment, at the endpoints, and 1/3 of the way from each end. Each color in the original sixteen can pick the closest of the 4 codable points with the 2 bits sent per texel. The problem of BC1 encoding is to choose the endpoints for this line segment, so that the reproduced image looks as good as possible. Once you choose the endpoints, it's easy to find the indices that minimize error for that line segment.

The thing that makes BC1 encoding interesting and difficult is that there are a large number of encodings that have nearly the same error. Your goal is to put a line segment through a cluster of points, and slightly different endpoints correspond to stretches or rotations of that line. You can hit any given color with either an endpoint of the segment or a 1/3 interpolant, so you can do these big stretches or contractions of the line segment and still have nearly the same error.

For example, here are two clusters of points (the black dots) in color space, with some possible BC1 encodings that produce similar errors :

If you're only considering distortion, then these options have nearly the same error. In fact you could just put your line segment through the principal axis of the color clusters, and then you are within bounded error of the best possible encoding (if the line segment was sent with real numbers, then the best fit line would in fact minimize squared error, by definition; the quantization of the endpoints means this doesn't necessarily give you minimum error). That's possible because the distortion varies smoothly and convexly (except for quantization effects, which are bounded). This is just a way of saying that there's a minimum error encoding where the line segment goes through the original colors, and if you keep stepping the endpoints away from that line segment, the error gets worse.

Oodle Texture isn't just looking for the lowest error (or "distortion") when encoding to BCN; it does "rate-distortion optimization". This means that in addition to considering the distortion of each possible encoding, it also considers the rate. The "rate" in this case is the estimated size of the chosen block encoding after subsequent compression by a lossless compressor like Kraken or zlib.

By considering rate, Oodle Texture can make smarter encodings that optimize for compressed size as well as quality. Sometimes this is just free: by measuring the rate of different choices you may see that two encodings with equal quality do not have the same rate, and you should choose the one with the better rate. Sometimes this means a tradeoff, where you sacrifice a small amount of quality to get a big rate gain.

Rate Distortion Optimization or RDO does not mean that we are introducing loss or bad quality into the encoding. It simply means the encoder is considering two types of cost when it makes decisions. It balances the desire for maximum quality against the desire for the smallest possible size; since both are not possible at the same time, a tradeoff must be made, which the game developer can control with a quality parameter. Oodle Texture RDO can produce very high quality encodings that are nearly visually indistinguishable from non-RDO encodings, but compress much more, simply by being a smart encoding that takes the rate of the choices into consideration.

People actually do rate-distortion optimization in games all the time without realizing it. When you choose to use a 4k x 4k texture vs. an 8k x 8k texture, you are making a visual quality vs size decision. Similarly if you choose BC1 vs BC7, you're choosing 4 or 8 bits per texel vs a quality tradeoff. Those are very big coarse steps, and the value of the tradeoff is not systematically measured. The difference with Oodle Texture is that our RDO is automatic, it provides a smooth easy to control parameter, the tradeoff is scored carefully and the best possible ways to trade size for quality are chosen.

Here's an example of Oodle Texture BC7 encoding made with and without RDO :

BC7 baseline      : 1.081 to 1 compression
BC7 RDO lambda=30 : 1.778 to 1 compression

(texture from cc0textures.com, resized to 512x512 before BC7 encoding; compression ratio is with Kraken level 8)

(BC7 textures like this that hardly compress at all without RDO are common)

Oodle Texture RDO encodes source art to BCN, looking at the many different options for endpoints and measuring "rate" and "distortion" on them. We noted previously that distortion is pretty well behaved as you search for endpoints, but in contrast, the rate does not behave the same way. The rate of two different endpoint choices could be vastly different even for endpoints whose colors are right next to each other in color space. Rate does not vary smoothly or monotonically as you explore the endpoint possibilities, it varies wildly up and down, which means a lot more possibilities have to be searched.

The way we get compression of BCN textures is mainly through reuse of components of the block encoding. That is, the back end compressor will find that a set of endpoints or indices (the two 32-bit parts of a BC1 block, for example) are used in two different places, and therefore can send the second use as an LZ77 match instead of transmitting them again. We don't generally look for repetition of entire blocks, though this can reduce rate, because it causes visually obvious repetitions. Instead by looking to repeat the building components that make up the BCN blocks, we get rate reduction without obvious visual repetition.

You might have something like

Encode block 1 with endpoints {[5,10,7] - [11,3,7]} and indices 0xE3F0805C

Block 2 has lots of choices of endpoints with similar distortions

{[6,11,7] - [11,3,7]} distortion 90   rate 32 bits
{[1,10,7] - [16,5,7]} distortion 95   rate 32 bits
{[5,10,7] - [11,3,7]} distortion 100  rate 12 bits

the choice of {[5,10,7] - [11,3,7]} has a rate that's much lower than the others
because it matches previously used endpoints

Part of what makes RDO encoding difficult is that both "rate" and "distortion" are not trivial to evaluate. There's no simple formula for either that provides the rate and distortion we need.

For distortion, you could easily just measure the squared distance error of the encoding (aka RMSE, SSD or PSNR), but that's not actually what we care about. We care about the visual quality of the block, and the human eye does not work like RMSE: it sees some errors as objectionable even when they are numerically quite small. For RDO BCN we need to be able to evaluate distortion millions of times on the possible encodings, so complex human-visual simulations are not possible. We use a very simple approximation that treats errors as more significant when they occur in smooth or flat areas, because those will be more jarring to the viewer; errors that occur in areas that were already noisy or detailed will not be as noticeable, so they get a lower D score. Getting this right has huge consequences: without a perceptual distortion measure the RDO can produce ugly visible blocking artifacts even when RMSE is quite low.

To measure the rate of each block coding decision, we need to guess how well a block will compress, but we don't yet have all the other blocks, and the compressors that we use are dependent on context. That is, the actual rate will depend on what comes before, and the encoding we choose for the current block will affect the rate of future blocks. In LZ77 encoding this comes mainly through the ability to match the components of blocks; when choosing a current block you want it to be low "rate" in the sense that it is a match against something in the past, but also that it is useful to match against in the future. We use a mix of techniques to try to estimate how different choices for the current block will affect the final compressed size.

When choosing the indices for the BCN encoding (the four interpolants along the line segment that each texel chooses), the non-RDO encoder just took the closest one, giving the minimum color error. The RDO encoder also considers taking interpolants that are not the closest if it allows you to make index bytes that occur elsewhere in the image, thus reducing rate. Often a given color is nearly the same distance from two interpolants, but they might have very different rate. Also, some choice of endpoints might not give you any endpoint reuse, but it might change the way you map the colors to indices that gives you reuse there. Considering all these possibilities quickly is challenging.

Oodle Texture measures these rate and distortion scores for lots of possible block encodings, and makes a combined score

J = D + lambda * R
that lets us optimize for a certain tradeoff of rate and distortion, depending on the lambda parameter. You can't minimize distortion and rate at the same time, but you can minimize J, which reaches the ideal mix of rate and distortion at that tradeoff. The client specifies lambda to control if they want maximum quality, or lower quality for more rate reduction. Lambda is a smooth continuous parameter that gives fine control, so there are no big jumps in quality. Oodle Texture RDO can encode to the same quality as the non-RDO encoders at low lambda, and gradually decreases rate as lambda goes up.

This optimization automatically finds the rate savings in the best possible places. It takes rate away where it makes the smallest distortion gain (measured with our perceptual metric, so the distortion goes where it is least visible). This means that not all textures get the same rate savings, particularly difficult ones will get less rate reduction because they need the bits to maintain quality. That's a feature that gives you the best quality for your bits across your set of textures. Oodle Texture is a bit like a market trader going around to all your textures, asking who can offer a bit of rate savings for the lowest distortion cost and automatically taking the best price.

Textures encoded with Oodle Texture RDO and then Kraken act a bit more like a traditional lossy encoding like JPEG. Non-RDO BCN without followup compression encodes every 4x4 block to the same number of output bits (either 64 or 128). With Oodle Texture RDO + Kraken, the size of output blocks is now variable depending on their content and how we choose to encode them. Easier to compress blocks will take fewer bits. By allocating bits differently, we can reduce the number of bits a given block takes, and perhaps lower its quality. One way to think about Oodle Texture RDO is as a bit allocation process. It's looking at the number of bits taken by each block (after compression) and deciding where those bits are best spent to maximize visual quality.

Rate-distortion optimization is standard in modern lossy codecs such as H264 & H265. They do similar bit allocation decisions in the encoder, usually by explicitly changing quantizers (a quantizer is like the JPEG quality parameter, but modern codecs can vary quantizer around the image rather than having a single value for the whole image) or thresholding small values to zero. What's different here is that Oodle Texture still outputs fixed size blocks, we don't have direct control of the final compression stage, we can only estimate what it will do. We don't have anything as simple as a quantizer to control block rate, we make the lower rate block encodings by finding ways to pack the RGB to BCN that are likely to compress more.

BC7 textures offer higher quality than BC1 at double the size (before compression). Without RDO, BC7 textures have been particularly large in game packages because they naturally compress very poorly. BC7 has many different modes, and packs its fields off byte alignment, which confuses traditional compressors like Kraken and zlib and makes it hard for them to find any compression. It's quite common for non-RDO BC7 textures to compress by less than 10%.

Oodle Texture RDO can make BC7 encodings that are much more compressible. For example :

"mysoup1024"

non-RDO BC7 :
Kraken          :  1,048,724 ->   990,347 =  7.555 bpb =  1.059 to 1

RDO lambda=40 BC7 :
Kraken          :  1,048,724 ->   509,639 =  3.888 bpb =  2.058 to 1

Modern games are using more and more BC7 textures because they provide much higher quality than BC1 (which suffers from chunky artifacts even at max quality). This means lots of game packages don't benefit as much from compression as we'd like. Oodle Texture RDO on BC7 fixes this.

Oodle Texture also has a lossless transform for BC7 called "bc7prep" that rearranges the fields of BC7 to make it more compressible. This gives a 5-15% compression gain on existing BC7 encodings. It works great stacked with RDO in the high quality levels as well.

We think that Oodle Texture is just a better way to encode BCN textures, and it should be used on games on all platforms. Oodle Texture has the potential to dramatically shrink compressed game sizes.

You can read more about Oodle Texture at the RAD Game Tools web site, along with the rest of the Oodle family of data compression solutions.

6/16/2020

Oodle Texture bc7prep data flow

We mostly talk about Oodle Texture encoding to BCN from source art, usually with RDO (rate-distortion optimization).

In the previous post about Oodle Texture we looked at the data flow of texture content in games, for the case of Oodle Texture RDO.

There is a different technology in Oodle Texture called "bc7prep" that can be used differently, or in addition to RDO.

BC7prep is a lossless transform specifically for BC7 (not BC1-6) that rearranges the bits to improve the compression ratio. Unlike Oodle Texture RDO, BC7Prep requires a reverse transform at runtime to unpack it back to BC7. This is a super fast operation, and can also be done with a GPU compute shader so the CPU doesn't have to touch the bits at all.

BC7prep can be used in combination with Oodle Texture RDO encoding, or on BC7 encodings made from any source. It typically improves the compression of BC7 by 5-15%.

BC7 is particularly important because it makes up a lot of the size of modern games. It's a commonly used texture format, and without RDO or bc7prep it often doesn't compress much at all.

The similar data flow chart for BC7 textures that use bc7prep is :


RGB uncompressed source art content like BMP or PNG

|     <- this step is where Oodle Texture RDO goes
V

BC7 compressed texture  <- you can also start here for bc7prep

|     <- bc7prep transforms BC7 to "prepped" data
V

BC7Prep transformed texture

|
V

Kraken or zlib compression on the game package containing the BCN textures

|                                                                                   TOOLS ^
V
sent over network, stored on disk

|                                                                                   RUNTIME v
V

decompress Kraken or zlib to load (sometimes with hardware decompressor)

|
V

BC7Prep transformed texture

|     <- bc7prep unpacking, can be done on GPU
V

BC7 compressed texture in memory

|
V

rendered on GPU

Oodle Texture gives you several options for how you use it depending on what best fits your game. You can use only Oodle Texture RDO, in which case no runtime decoding is needed. You can use just bc7prep on existing BC7 encoded data, in which case you don't need to use Oodle Texture's BCN encoding from source art at all. Or you can use both together.

BC7Prep combined with Oodle Texture RDO at "near lossless" levels provides size savings for BC7 with almost no visual quality difference from a non-RDO BC7 encoding.

On varied BC7 textures :

total :
BC7 + Kraken                        : 1.387 to 1
BC7Prep + Kraken                    : 1.530 to 1
NearLossless RDO + BC7Prep + Kraken : 1.650 to 1

The size savings with BC7Prep are not huge and dramatic the way they are with RDO, because the BC7Prep transform is lossless, but they are very large compared to the normal differences between lossless compression options.

For example the compression ratio on that BC7 set with Oodle Leviathan is 1.409 to 1, not much better than Kraken, and a far smaller gain than BC7Prep gives. Oodle Leviathan is a very strong compressor that usually finds bigger gains over Kraken than that, but BC7 data is hard for compressors to parse. BC7Prep and Oodle Texture RDO put the data into a form that increases the benefit of strong compressors like Oodle over weaker ones like zlib. (on data that's incompressible, all compressors are the same).

The runtime BC7Prep untransform is extremely fast. If you're using Oodle Data compression in software, it's best to do the BC7Prep untransform in software right after decompression. If you're on a platform with hardware decompression, you may want to use BC7Prep untransform compute shaders so the texture data never has to be touched by the CPU.

Visit the RAD Game Tools website to read all about Oodle Texture.

Oodle Texture sample run

This is a sample run of Oodle Texture with files you can download to verify our results for yourself.

The example here is my photograph "mysoup" which I make CC0. While most game textures are not like photographs, this is typical of the results we see. To try Oodle Texture on your own images contact RAD Game Tools for an evaluation. You can also see more sample encodings on real game textures at the Oodle Texture web site.

I will be showing "mysoup" encoded to BC7 here, which is the format commonly used by games for high quality textures. It is 8 bits per texel (vs 24 for source RGB). The BC7 images I provide here are in DDS format; they are not viewable by normal image viewers, this is intended for game developers and technical artists to see real game data.

I have made these BC7 DDS with Oodle Texture in "BC7 RGBA" mode, which attempts to preserve the opaque alpha channel in the encoding. I would prefer to use our "BC7 RGB" which ignores alpha; this would get slightly higher RGB quality but can output undefined in alpha. Because many third party viewers don't handle this mode, I've not used it here.

I will also show the encoding of a few different maps from a physically based texture : coral_mud_01 from texturehaven.com (CC0), to show a sample of a full texture with diffuse, normal, and attribute maps.

I show RMSE here for reference, you should be able to reproduce the same numbers on the sample data. The RMSE I show is per texel (not per channel), and I compute only RGB RMSE (no A). Note that while I show RMSE, Oodle Texture has been run for Perceptual quality, and visual quality is what we hope to maximize (which sacrifices RMSE performance).

Oodle Texture is not a compressor itself, it works with the back end lossless compressor of your choice. Here I will show sizes with software Kraken at level 8 and zlib at level 9. You can download the data and try different back end compressors yourself. Oodle Texture works with lots of different lossless back end compressors to greatly increase their compression ratio.


The "mysoup" images are 1024x1024 but I'm showing them shrunk to 512x512 for the web site; you should always inspect images for visual quality without minification; click any image for the full size.

Download all the files for mysoup1024 here : (BC7 in DDS) :
mysoup1024_all.7z

Download all the files for coral_mud_01 here : (BC7,BC4,BC5 in DDS) :
coral_mud_01.7z


mysoup1024.png :
original uncompressed RGB :

click image for full resolution.


baseline BC7 :
(no RDO, max quality BC7)
RMSE per texel: 2.7190

Kraken          :  1,048,724 ->   990,347 =  7.555 bpb =  1.059 to 1
Kraken+bc7prep  :  1,080,696 ->   861,173 =  6.375 bpb =  1.255 to 1
zlib9           :  1,048,724 -> 1,021,869 =  7.795 bpb =  1.026 to 1

BC7 data is difficult for traditional compressors to handle. We can see here that neither Kraken nor zlib can get much compression on the baseline BC7, sending the data near the uncompressed 8 bits per texel size of BC7.

Oodle Texture provides "bc7prep", which is a lossless transform that makes the BC7 data more compressible. "bc7prep" can be used with Kraken, zlib, or any other back end compressor. "bc7prep" does require a runtime pass to transform the data back to BC7 that the GPU can read, but this can be done with a GPU compute shader so no CPU involvement is needed. Here bc7prep helps quite a bit in changing the data into a form that can be compressed by Kraken.


rdo lambda=5 :
RDO in near lossless mode
RMSE per texel: 2.8473

Kraken          :  1,048,724 ->   849,991 =  6.484 bpb =  1.234 to 1
Kraken+bc7prep  :  1,080,468 ->   767,149 =  5.680 bpb =  1.408 to 1
zlib9           :  1,048,724 ->   895,421 =  6.831 bpb =  1.171 to 1

In near lossless mode, Oodle Texture can make encodings that are visually indistinguishable from baseline, but compress much better. At this level, Oodle Texture RDO is finding blocks that are lower rate (eg. compress better with subsequent Kraken compression), but are no worse than the baseline choice. It's simply a smarter encoding that considers rate as well as distortion when considering the many possible ways the block can be encoded in BCN.

Note that when we say "near lossless RDO" we mean nearly the same quality as the baseline encoding. The baseline encoding to BC7 forces some quality loss from the original, and this RDO setting does not increase that. The RMSE difference to baseline is very small, but the visual quality difference is even smaller.

We believe that legacy non-RDO encodings of most texture types should never be used. Oodle Texture RDO provides huge wins in size with no compromise; if you need max quality just run in near lossless mode. It simply makes a better encoding to BC7 which is much more compressible. Many common BC7 encoders produce worse quality than Oodle Texture does in near-lossless mode.


rdo lambda=40 :
RDO in medium quality mode
RMSE per texel: 4.2264

Kraken          :  1,048,724 ->   509,639 =  3.888 bpb =  2.058 to 1
Kraken+bc7prep  :  1,079,916 ->   455,747 =  3.376 bpb =  2.370 to 1
zlib9           :  1,048,724 ->   576,918 =  4.401 bpb =  1.818 to 1

At lambda=40 we are now trading off some quality for larger rate savings. At this level, visual differences from the original may start to appear, but are still very small, and usually acceptable. (for example the errors here are far far smaller than if you encoded to BC1, or even if you encoded with a poor BC7 encoder that reduces choices in a hacky/heuristic way).

At this level, Kraken is now able to compress the image nearly 2 to 1 , to 3.888 bits per texel, starting from baseline which got almost no compression at all. We've shrunk the Kraken compressed size nearly by half. This also means the content can load twice as fast, giving us an effective 2X multiplier on the disk speed. This is a HUGE real world impact on game content sizes with very little down side.

zlib has also benefitted from RDO, going from 1,021,869 to 576,918 bytes after compression. Kraken does a bit better because it's a bit stronger compressor than zlib. The difference is not so much because Oodle Texture is specifically tuned for Kraken (it's in fact quite generic), but because more compressible data will tend to show the difference between the back end compressors more. On the baseline BC7 data, it's nearly incompressible, so the difference between Kraken and zlib looks smaller there.


Download all the "mysoup" files here : (BC7 in DDS) :
mysoup1024_all.7z

Summary of all the compression results :

baseline BC7 :

Kraken          :  1,048,724 ->   990,347 =  7.555 bpb =  1.059 to 1
Kraken+bc7prep  :  1,080,696 ->   861,173 =  6.375 bpb =  1.255 to 1
zlib9           :  1,048,724 -> 1,021,869 =  7.795 bpb =  1.026 to 1

RDO lambda=5 :

Kraken          :  1,048,724 ->   849,991 =  6.484 bpb =  1.234 to 1
Kraken+bc7prep  :  1,080,468 ->   767,149 =  5.680 bpb =  1.408 to 1
zlib9           :  1,048,724 ->   895,421 =  6.831 bpb =  1.171 to 1

RDO lambda=40 :

Kraken          :  1,048,724 ->   509,639 =  3.888 bpb =  2.058 to 1
Kraken+bc7prep  :  1,079,916 ->   455,747 =  3.376 bpb =  2.370 to 1
zlib9           :  1,048,724 ->   576,918 =  4.401 bpb =  1.818 to 1


coral_mud_01 :

You can get the source art for coral_mud_01 at texturehaven.com. I used the 1k PNG option. On the web site here I am showing a 256x256 crop of the images so they can be seen without minification. Download the archive for the full res images.

coral_mud_01_diff_1k :
diffuse (albedo) color in BC7 (RGBA)

BC7 non-RDO   : RMSE 3.5537 , 7.954 bpb
BC7 lambda=30 : RMSE 4.9021 , 5.339 bpb
BC7 lambda=50 : RMSE 5.7194 , 4.683 bpb

BC7Prep could also be used for additional compression, not shown here.

coral_mud_01_Nor_1k.png:
normal XY in RG channels only in BC5
BC5 decodes to 16 bit
RMSE is RG only

BC5 non-RDO   : RMSE 3.4594 , 8.000 bpb
BC5 lambda=30 : RMSE 5.8147 , 5.808 bpb
BC5 lambda=50 : RMSE 7.3816 , 5.083 bpb

coral_mud_01_bump_1k.png:
bump map, single scalar channel
BC4 decodes to 16 bit

BC4 non-RDO   : RMSE 3.2536 , 7.839 bpb
BC4 lambda=30 : RMSE 4.1185 , 6.871 bpb
BC4 lambda=50 : RMSE 5.2258 , 6.181 bpb

A single scalar channel in BC4 is an unusual usage for games. Typically several scalar channels would be combined in a BC7 texture.

Compressed sizes are with software Kraken at level 8, no additional filters. "bpb" means "bits per byte" : the compressed size in bits, divided by the uncompressed size in bytes. The non-RDO textures in coral_mud all get almost no compression at all with Kraken. With RDO that improves to around 5 bpb, which is an 8:5 ratio or 1.6 to 1.

With BC7 and BC5, the "bpb" size is also the number of bits per texel, because they start at 8 bits per texel when uncompressed. If RDO can improve BC7 to 4 bits per texel, that means it's now the same size on disk as uncompressed BC1, but at far higher visual quality. (2:1 on BC7 is a bit more than we typically expect; 8:5 or 8:6 is more common)

Download all the files for coral_mud_01 here : (BC7,BC4,BC5 in DDS) :
coral_mud_01.7z


Read more about Oodle Texture on the RAD Game Tools web site

6/09/2020

Followup tidbits on RGBE

As noted previously, RGBE 8888 is not a very good encoding for HDR in 32 bits. I haven't personally evaluated the other options, but from reading about them, 16-8-8 LogLUV looks okay. You want more bits of precision for luminance, and the only way to do that is to go into some kind of luma-chroma space.

In any case, we'll look at a couple RGBE followup topics because I think they may be educational. Do NOT use these. This is for our education only, don't copy paste these and put them in production! If you want an RGBE conversion you can use, see the previous post!

In the previous post I wrote that I generally prefer centered quantization that does bias on encode. This is different than what is standard for Radiance HDR RGBE files. (DO NOT USE THIS). But say you wanted to do that, what would it look like exactly?


// float RGB -> U8 RGBE quantization
void float_to_rgbe_centered(unsigned char * rgbe,const float * rgbf)
{
    // NOT HDR Radiance RGBE conversion! don't use me!
        
    float maxf = rgbf[0] > rgbf[1] ? rgbf[0] : rgbf[1];
    maxf = maxf > rgbf[2] ? maxf : rgbf[2];

    if ( maxf <= 1e-32f )
    {
        // Exponent byte = 0 is a special encoding that makes RGB output = 0
        rgbe[0] = rgbe[1] = rgbe[2] = rgbe[3] = 0;
    }
    else
    {
        int exponent;
        frexpf(maxf, &exponent);
        float scale = ldexpf(1.f, -exponent + 8);
    
        // bias might push us up to 256
        // instead increase the exponent and send 128
        if ( maxf*scale >= 255.5f )
        {
            exponent++;
            scale *= 0.5f;
        }
    
        // NOT HDR Radiance RGBE conversion! don't use me!
        rgbe[0] = (unsigned char)( rgbf[0] * scale + 0.5f );
        rgbe[1] = (unsigned char)( rgbf[1] * scale + 0.5f );
        rgbe[2] = (unsigned char)( rgbf[2] * scale + 0.5f );
        rgbe[3] = (unsigned char)( exponent + 128 );
    }
}

// U8 RGBE -> float RGB dequantization
void rgbe_to_float_centered(float * rgbf,const unsigned char * rgbe)
{
    // NOT HDR Radiance RGBE conversion! don't use me!

    if ( rgbe[3] == 0 )
    {
        rgbf[0] = rgbf[1] = rgbf[2] = 0.f;
    }
    else
    {
        // NOT HDR Radiance RGBE conversion! don't use me!

        float fexp = ldexpf(1.f, (int)rgbe[3] - (128 + 8));
        // centered restoration, no bias :
        rgbf[0] = rgbe[0] * fexp;
        rgbf[1] = rgbe[1] * fexp;
        rgbf[2] = rgbe[2] * fexp;
    }
}

what's the difference in practice ?

On random floats, there is no difference. This has the same 0.39% max round trip error as the reference implementation that does bias on decode.

The difference is that on integer colors, centered quantization restores them exactly. Specifically : for all the 24-bit LDR (low dynamic range) RGB colors, the "centered" version here has zero error, perfect restoration.

That sounds pretty sweet, but it's not actually helpful in practice, because the way we in games use HDR data typically has the LDR range scaled to [0,1.0], not [0,255]. The "centered" way does preserve 0 and 1 exactly.

The other thing I thought might be fun to look at is :

The Radiance RGBE conversion has 0.39% max round trip error. That's exactly the same as a flat quantizer from the unit interval to 7 bits. (the bad conversion that did floor-floor had max error of 0.78% - just the same as a flat quantizer to 6 bits).

But our RGBE all have 8 bits. We should be able to get 8 bits of precision. How would you do that?

Well one obvious issue is that we are sending the max component with the top bit on. It's in [128,255], we always have the top bit set and then only get 7 bits of precision. We could send that more like a real floating point encoding with an implicit top bit, and use all 8 bits.

If we do that, then the decoder needs to know which component was the max to put the implicit top bit back on. So we need to signal it. Well, fortunately we have 8 bits for the exponent which is way more dynamic range than we need for HDR imaging, so we can take 2 bits from there to send the max component index and leave 6 bits for exponent.

Then we also want to make sure we use the full 8 bits for the non-maximal components. To do that we can scale their fractional size relative to max up to 255.

Go through the work and we get what I call "rgbeplus" :


/**

! NOT Radiance HDR RGBE ! DONT USE ME !

"rgbeplus" packing

still doing 8888 RGBE, one field in each 8 bits, not the best possible general 32 bit packing

how to get a full 8 bits of precision for each component
(eg. maximum error 0.196% instead of 0.39% like RGBE)

for the max component, we store an 8-bit mantissa without the implicit top bit
  (like a real floating point encoding, unlike RGBE which stores the on bit)
  (normal RGBE has the max component in 128-255 so only 7 bits of precision)

because we aren't storing the top bit we need to know which component was the max
  so the decoder can find it

we put the max component index in the E field, so we only get 6 bits for exponent
  (6 is plenty of orders of magnitude for HDR images)
  
then for the non-max fields, we need to get a full 8 bits for them too  
  in normal RGBE they waste the bit space above max, because we know they are <= max
  eg. if max component was 150 , then the other components can only be in [0,150]
    and all the values above that are wasted precision
  therefore worst case in RGBE the off-max components also only have 7 bits of precision.
  To get a full 8, we convert them to fractions of max :
  frac = not_max / max
  which we know is in [0,1]
  and then scale that up by 255 so it uses all 8 bits

this all sounds a bit complicated but it's very simple to decode

I do centered quantization (bias on encode, not on decode)

**/

// float RGB -> U8 RGBE quantization
void float_to_rgbeplus(unsigned char * rgbe,const float * rgbf)
{
    // rgbf[] should all be >= 0 , RGBE does not support signed values
    
    // ! NOT Radiance HDR RGBE ! DONT USE ME !

    // find max component :
    int maxi = 0;
    if ( rgbf[1] > rgbf[0] ) maxi = 1;
    if ( rgbf[2] > rgbf[maxi] ) maxi = 2;
    float maxf = rgbf[maxi];

    // 0x1.p-32 ?
    if ( maxf <= 1e-10 ) // power of 10! that's around 2^-32
    {
        // Exponent byte = 0 is a special encoding that makes RGB output = 0
        rgbe[0] = rgbe[1] = rgbe[2] = rgbe[3] = 0;
    }
    else
    {
        int exponent;
        frexpf(maxf, &exponent);
        float scale = ldexpf(1.f, -exponent + 9);
        // "scale" is just a power of 2 to put maxf in [256,512)
        
        // 6 bits of exponent :
        if ( exponent < -32 )
        {
            // Exponent byte = 0 is a special encoding that makes RGB output = 0
            rgbe[0] = rgbe[1] = rgbe[2] = rgbe[3] = 0;
            return;
        }
        myassert( exponent < 32 );
        
        // bias quantizer in encoder (centered restoration quantization)
        int max_scaled = (int)( maxf * scale + 0.5f );
        if ( max_scaled == 512 )
        {
            // slipped up because of the round in the quantizer
            // instead do ++ on the exp
            scale *= 0.5f;
            exponent++;
            //max_scaled = (int)( maxf * scale + 0.5f );
            //myassert( max_scaled == 256 );
            max_scaled = 256;
        }
        myassert( max_scaled >= 256 && max_scaled < 512 );
        
        // grab the 8 bits below the top bit :
        rgbe[0] = (unsigned char) max_scaled;
        
        // to scale the other two components
        //  we need to use the maxf the *decoder* will see
        float maxf_dec = max_scaled / scale;
        myassert( fabsf(maxf - maxf_dec) <= (0.5/scale) );
        
        // scale lower components to use full 255 for their fractional magnitude :
        int i1 = (maxi+1)%3;
        int i2 = (maxi+2)%3;
        rgbe[1] = u8_check( rgbf[i1] * 255.f / maxf_dec + 0.4999f );
        rgbe[2] = u8_check( rgbf[i2] * 255.f / maxf_dec + 0.4999f );
        
        // rgbf[i1] <= maxf
        // so ( rgbf[i1] * 255.f / maxf ) <= 255
        // BUT
        // warning : maxf_dec can be lower than maxf
        // maxf_dec is lower by a maximum of (0.5/scale)
        // worst case is 
        // (rgbf[i1] * 255.f / maxf_dec ) <= 255.5
        // so you can't add + 0.5 or you will go to 256
        // therefore we use the fudged bias 0.4999f
        
        rgbe[3] = (unsigned char)( ( (exponent + 32) << 2 ) + maxi );
    }
}

// U8 RGBE -> float RGB dequantization
void rgbeplus_to_float(float * rgbf,const unsigned char * rgbe)
{
    // ! NOT Radiance HDR RGBE ! DONT USE ME !

    if ( rgbe[3] == 0 )
    {
        rgbf[0] = rgbf[1] = rgbf[2] = 0.f;
    }
    else
    {
        int maxi = rgbe[3]&3;
        int exp = (rgbe[3]>>2) - 32;
        float fexp = ldexpf(1.f, exp - 9);
        float maxf = (rgbe[0] + 256) * fexp;
        float f1 = rgbe[1] * maxf / 255.f;
        float f2 = rgbe[2] * maxf / 255.f;
        int i1 = (maxi+1)%3;
        int i2 = (maxi+2)%3;
        rgbf[maxi] = maxf;
        rgbf[i1] = f1;
        rgbf[i2] = f2;
    }
}

and this in fact gets a full 8 bits of precision. The max round trip error is 0.196% , the same as a flat quantizer to 8 bits.

(max error is always measured as a percent of the max component, not of the component that has the error; any shared exponent format has 100% max error if you measure as a percentage of the component)

Again repeating myself : this is a maximum precision encoding assuming you need to stick to the "RGBE" style of using RGB color space and putting each component in its own byte. That is not the best possible way to send HDR images in 32 bits, and there's no particular reason to use that constraint.

So I don't recommend using this in practice. But I think it's educational because these kind of considerations should always be studied when designing a conversion. The errors from getting these trivial things wrong are very large compared to the errors that we spend years of research trying to save, so it's quite frustrating when they're done wrong.

6/07/2020

Widespread error in Radiance HDR RGBE conversions

It has come to my attention that broken RGBE image conversions used in the loading of Radiance HDR files are being spread widely. The broken routines are causing significant unnecessary errors in float reconstruction from RGBE 8888.

Note that RGBE 8888 is a very lossy encoding of floating point HDR images to begin with. It is really not appropriate as a working HDR image format if further processing will be applied that may magnify the errors. A half-float format is probably a better choice. If you do use RGBE HDR, at least use the correct conversions which I will provide here. (See Greg Ward's article on HDR encodings and also a nice table of the dynamic range and accuracy of various HDR file formats).

The Radiance HDR RGBE encoding stores 3-channel floating point RGB using four 8-bit components. It takes the RGB floats and finds the exponent of the largest component, sends that exponent in E, then shifts the components to put the largest component's top bit at 128 and sends them as linear 8 bits.

In making the corrected version, I have been guided by 3 principles in order of priority : 1. ensure existing RGBE HDR images are correct encodings, 2. treat the original Radiance implementation as defining the file format, 3. minimize the error of round trip encoding.

With no further ado, let's have the correct RGBE conversion (the full code is in Appendix 1 below), then we'll talk about the details.

RGBE encoding puts the largest component in the range [128,255]. It actually only has 7 bits of mantissa. The smaller components are put on a shared exponent with the largest component, which means if you ever care about tiny values in the non-maximal component this encoding is terrible (as are any shared-exponent encodings). There are much better packings of RGB floats to 32 bits that store more useful bits of precision.

Let's now look at the broken version that's being widely used. It uses the same encoder as above, but in the decoder the bad version does :

// RGBE 8888 -> float RGB
// BAD decode restoration, don't copy me

        rgbf[0] = rgbe[0] * fexp;
        rgbf[1] = rgbe[1] * fexp;
        rgbf[2] = rgbe[2] * fexp;
missing the 0.5 biases on restoration.

Essentially this is just a quantization problem. We are taking floats and quantizing them down to 8 bits. Each 8 bit index refers to a "bucket" or range of float values. When you restore after quantizing, you should restore to the middle of the bucket. (with non-uniform error metrics or underlying probability distributions, the ideal restoration point might not be the middle; more generally restore to the value that minimizes error over the expected distribution of input values).

The broken version is essentially doing a "floor" on both the quantization and restoration.


floor quantization :

+---------+---------+---------+---------+---------+---------+
|0        |1        |2        |3        |4        |5        |
+---------+---------+---------+---------+---------+---------+

->
floor restoration :

*0        *1        *2        *3        *4        *5

->
bias 0.5 restoration :

     *0.5      *1.5      *2.5      *3.5      *4.5      *5.5

The rule of thumb is that you need a 0.5 bias on either the quantization side or the restoration side. If you do floor quantization, do +0.5 in restoration. If you do centered quantization (+0.5 on encode), you can do plain integer restoration.

The broken version is just a bad quantizer that restores value ranges to one edge of the bucket. That creates a net shift of values downward, adding error that doesn't need to be there. On random RGB floats :

Maximum relative error :
Correct RGBE encoding : 0.3891%
Bad floor-floor RGBE encoding : 0.7812%

(percentage error relative to largest component).

Note this is a LOT of error. If you just took a real number in [0,1) and quantized it to 8 bits, the maximum error is 0.195% , so even the correct encoding at around 0.39% error is double that (reflecting that we only have 7 bits of precision in the RGBE encoding), and the bad encoding at around 0.78% is double that again (it's equal to the maximum error of uniform quantization if we only had 6 bits of precision).

Reference test code that will print these errors : test_rgbe_error.cpp

I have where possible tried to make this corrected RGBE quantizer match the original HDR file IO from Greg Ward's Radiance. I believe we should treat the Radiance version as canonical and make HDR files that match it. I have adopted a small difference which I note in Appendix 4; the change that I propose here actually makes the encoder match the description of its behavior better than it did before (see Appendix 4). The original Radiance code does floor quantization and midpoint restoration, like here. The broken version was introduced later.

The broken floor-floor code has been widely copied into many tools used in games (such as STBI and NVTT), hopefully those will be fixed soon. Fortunately, the encoder does not need to be changed, only the decode code is changed, so existing HDR images are okay. They can be loaded with the corrected restoration function and should see an improvement in quality for free.

That's it! We'll followup some details in appendices for those who are interested.


Appendix 1 :

Correct conversion in raw text if the pastebin isn't working :

// float RGB -> U8 RGBE quantization
void float_to_rgbe(unsigned char * rgbe,const float * rgbf)
{
    // rgbf[] should all be >= 0 , RGBE does not support signed values
        
    float maxf = rgbf[0] > rgbf[1] ? rgbf[0] : rgbf[1];
    maxf = maxf > rgbf[2] ? maxf : rgbf[2];

    if ( maxf <= 1e-32f )
    {
        // Exponent byte = 0 is a special encoding that makes RGB output = 0
        rgbe[0] = rgbe[1] = rgbe[2] = rgbe[3] = 0;
    }
    else
    {
        int exponent;
        float scale;
        frexpf(maxf, &exponent);
        scale = ldexpf(1.f, -exponent + 8);
                
        rgbe[0] = (unsigned char)( rgbf[0] * scale );
        rgbe[1] = (unsigned char)( rgbf[1] * scale );
        rgbe[2] = (unsigned char)( rgbf[2] * scale );
        rgbe[3] = (unsigned char)( exponent + 128 );
    }
}

// U8 RGBE -> float RGB dequantization
void rgbe_to_float(float * rgbf,const unsigned char * rgbe)
{
    if ( rgbe[3] == 0 )
    {
        rgbf[0] = rgbf[1] = rgbf[2] = 0.f;
    }
    else
    {
        // the extra 8 here does the /256
        float fexp = ldexpf(1.f, (int)rgbe[3] - (128 + 8));
        rgbf[0] = (rgbe[0] + 0.5f) * fexp;
        rgbf[1] = (rgbe[1] + 0.5f) * fexp;
        rgbf[2] = (rgbe[2] + 0.5f) * fexp;
    }
}


Appendix 2 :

Why I prefer centered quantization.

Radiance HDR does floor quantization & centered restoration. I think it should have been the other way around (centered quantization & int restoration), but I don't propose changing it here because we should stick to the reference Greg Ward implementation, and match how existing .hdr files have been encoded.

The reason I prefer centered quantization is that it exactly preserves integers and powers of two. (it has a small bonus of not needing a bias at decode time).

If your input real numbers are evenly distributed with no special values, then there's no reason to prefer either style, they're equal. But usually our inputs are not evenly distributed and random, we often deal with inputs where values like 0.0 and 1.0 or 256.0 are special and we'd like to preserve them exactly.

If you do centered quantization, these restore exactly. If you do floor quantization and then have an 0.5 bias on dequantization to do centered restoration, values like 0 shift to be in the center of their bucket.

In the correct RGBE encoding above, a float input of 1.0 is restored as 1.0039063 (= 1 + 1/256), because 1.0 corresponds to the bottom edge of a quantization bucket, and we restore to the middle of the bucket.

For example for non-HDR images the quantization I like is :

// U8 to float 
// map 1.0 to 255
// centered quantization

unsigned char unit_float_to_u8(float f)
{
    // f in [0,1.0]
    // scale up to [0,255] , then do centered quantization :
    //  eg. values after *255 in [0.5,1.5] -> 1
    // clamp before casting if you aren't sure your floats are bounded!
    return (unsigned char) (f * 255.f + 0.5f);
}

float u8_to_unit_float(unsigned char u8)
{
    // u8 in [0,255]
    // scale to [0,1.0]
    // do straight mapped dequantization :
    return u8 * (1.f/255);
}

It seems that perhaps the desire to preserve 1.0 exactly is what got us into this whole mess. A widely referenced extraction from Radiance was posted here : bjw rgbe, but with a crucial flaw. The reconstruction was changed to remove the bias :

/* standard conversion from rgbe to float pixels */
/* note: Ward uses ldexp(col+0.5,exp-(128+8)).  However we wanted pixels */
/*       in the range [0,1] to map back into the range [0,1].            */
Ward had a valid quantizer (floor encode, bias decode), but it disappeared in this version.

The correct way to get 1.0 preserved would have been to do a centered quantization (bias on the encode). As noted previously I don't recommend changing that now as it would mean existing hdr files were encoded wrong, and it deviates from Ward's original Radiance implementation, which should be taken as defining the format. We should consider the BJW implementation to simply have an incorrect decoder. (the BJW encoder is okay, but it does also suffer from the small flaw discussed in Appendix 4)


Appendix 3 :

RGBE encoder using IEEE floating point bit manipulations.

I don't post this as a suggested optimization, but rather because I think it illuminates what's actually going on in the RGBE encoding.

The IEEE floating points that we are encoding are sign-exponent-mantissa :

SEEEEEEEEMMMMMMMMMMMMMMMMMMMMMMM

To send the largest component with RGBE, what we are doing is taking the top 7 bits of the mantissa, adding the implicit 1 bit ahead of those, and putting those in 8 bits :

000000001MMMMMMM0000000000000000

Dropping the bits below the top 7 is the floor quantization: whatever those bits were, they're gone. The +0.5 bias on restoration is equivalent to saying we expect all of those dropped bits to be equally likely, hence their average value is 10000 (the highest dropped bit is turned on, the rest are left off) :

000000001MMMMMMM1000000000000000

The components that aren't the highest are forced onto the same exponent as the highest; this shifts in zeros from the left, their mantissa bits are shifted out the bottom of the word, and then we grab the top 8 bits of them.

The "ldexpf" we do in the correct implementation is just making a pure power of two, which is just an exponent in IEEE floats (all mantissa bits zero). When you multiply that on another float, you're just adding to the exponent part. Unfortunately there aren't standard C ways to do simple float operations like getting an exponent or adding to an exponent so we'll have to dig into the bits.

The full operation is :

union float_and_bits
{
    float f;
    unsigned int bits;  
};

// float RGB -> U8 RGBE quantization
void float_to_rgbe_bits(unsigned char * rgbe,const float * rgbf)
{
    float_and_bits fab[3];
    fab[0].f = rgbf[0];
    fab[1].f = rgbf[1];
    fab[2].f = rgbf[2];
    
    unsigned int max_bits = fab[0].bits > fab[1].bits ?  fab[0].bits : fab[1].bits;
    max_bits = max_bits > fab[2].bits ? max_bits : fab[2].bits;
    int max_exp = (max_bits >> 23) - 127;
    
    //max_bits == 0 is exact float zero
    //(int)max_bits < 0 means the sign bit is on (illegal negative input)
    //max_exp == 128 is NaN
    
    if ( (int)max_bits <= 0 || max_exp <= -100 || max_exp == 128 )
    {
        // Exponent byte = 0 is a special encoding that makes RGB output = 0
        rgbe[0] = rgbe[1] = rgbe[2] = rgbe[3] = 0;
    }
    else
    {
        // IEEE exponent gives a value in [1,2)*2^exp ; we want the [0.5,1) convention (like frexp) so do ++
        max_exp++;
        
        unsigned int mantissa = 1 << 23;
        int exp0 = (fab[0].bits>>23) - 127;
        int exp1 = (fab[1].bits>>23) - 127;
        int exp2 = (fab[2].bits>>23) - 127;
        int man0 = (fab[0].bits & (mantissa-1)) | mantissa;
        int man1 = (fab[1].bits & (mantissa-1)) | mantissa;
        int man2 = (fab[2].bits & (mantissa-1)) | mantissa;
        man0 >>= min(max_exp - exp0 - 8 + 23,31);
        man1 >>= min(max_exp - exp1 - 8 + 23,31);
        man2 >>= min(max_exp - exp2 - 8 + 23,31);
            
        rgbe[0] = (unsigned char)( man0 );
        rgbe[1] = (unsigned char)( man1 );
        rgbe[2] = (unsigned char)( man2 );
        rgbe[3] = (unsigned char)( max_exp + 128 );
    }
}

// U8 RGBE -> float RGB dequantization
void rgbe_to_float_bits(float * rgbf,const unsigned char * rgbe)
{
    if ( rgbe[3] == 0 )
    {
        rgbf[0] = rgbf[1] = rgbf[2] = 0.f;
    }
    else
    {
        float_and_bits fab;
        int exp = (int)rgbe[3] - 128 - 8;
        fab.bits = (exp+127)<<23;
        float fexp = fab.f;
        rgbf[0] = (rgbe[0] + 0.5f) * fexp;
        rgbf[1] = (rgbe[1] + 0.5f) * fexp;
        rgbf[2] = (rgbe[2] + 0.5f) * fexp;
    }
}
this version using bit manipulation produces exactly identical encodings to the correct version I have posted before.


Appendix 4 :

Another common oddity in the RGBE encoders.

I have tried to stick to the original Radiance version where reasonable, but I have changed one part that is widely done strangely.

The major bias error we have talked about before is in the decoder. This error is in the encode side, but unlike the bias this error is not a large one, it's a small correction and doesn't change the values intended to be stored in the format.

In my correct version, this part :

        int exponent;
        float scale;
        frexpf(maxf, &exponent);
        scale = ldexpf(1.f, -exponent + 8);
is just trying to get the exponent of the largest component, and then form a pure power of 2 float that multiplies by 256 / 2^exponent.

This has been widely done strangely in other implementations. The majority of other implementations do this :

divide by self version :

    scale = frexp(maxf,&exponent) * 256.0/maxf;
frexp returns the mantissa part of the float, scaled to [0.5,1), so when you divide that by maxf what you're doing is cancelling out the mantissa part and leaving just the exponent part of maxf (2^-exponent).

This is not only a bit strange, it is in fact worse. scale made this way does not come out as an exact power of 2, because floating point math is not exact (a reciprocal then multiply does not give back exact 1.0). This divide-by-yourself method does not produce exactly the same encoding as the bit extraction reference version above.

Aside from small drifts of 'scale' causing small error in reconstruction, if it ever got too big, you could have another problem. Our value is supposed to be in [128,256) (inclusive on the bottom of the range, exclusive on the top). That means you can just cast to unsigned char. But if the computed scale ever came out slightly too big, you could get up to 256 exactly, and then the cast to unsigned char would wrap to 0, which would be a huge error.

This appears to have motivated Greg Ward in the original Radiance code to add a safety margin :

original Radiance code version :
/ray/src/common/color.c line 272

    d = frexp(d, &e) * 255.9999 / d;
The 255.9999 fudge ensures that imprecise math can't make us incorrectly get to 256. It's an ugly way to handle it, and it does cause a small loss of quality, so despite my goal to stick to the original Radiance implementation I am not copying this.

(Radiance was started in 1985 when IEEE compliant floating point rounding was by no means standard, so we can certainly forgive it some fudgy bias factors)

The way I have shown creates a pure power of two scale factor, so it can't suffer from any drift of the scale due to floating point imprecision. This actually matches the described behavior of the RGBE encoding better than previous versions.

For example from Wikipedia :

Radiance calculates light values as floating point triplets, one each for red, green and blue. But storing a full double precision float for each channel (8 bytes × 3 = 24 bytes) is a burden even for modern systems. Two stages are used to compress the image data. The first scales the three floating point values to share a common 8-bit exponent, taken from the brightest of the three. Each value is then truncated to an 8-bit mantissa (fractional part). The result is four bytes, 32 bits, for each pixel. This results in a 6:1 compression, at the expense of reduced colour fidelity.
