11-21-08 - More Texture Compression Nonsense

I guess the DX11 Technical Preview was just publicly released a few weeks ago. Unfortunately it's semi-crippled and still doesn't have information about BC7. From what I gather though BC7 does seem to be pretty high quality.

There are several separate issues here. One is providing data to the card in a way that's good for texture cache usage (DXT1 is a big winner here, especially on the RSX apparently). Another is keeping data small in memory so you can hold more. Obviously that's a bigger issue on the consoles than the PC, but you always want things smaller in memory if you can. Another issue is paging off disk quickly. Smaller data loads faster, though that's not a huge issue off hard disk, and even off DVD if you're doing something like SVT you are so dominated by seek times that file size may not matter that much. Another issue is the size of your content for transmission, either on DVD or over the net; it's nice to make your game downloadable and reasonably small.

I guess that the Xenon guys sort of encourage you to use PCT to pack your textures tiny on disk or for transmission. PCT seems to be a variant of HD-photo. MCT is I guess a DXT-recompressor, but I can't find details on it. I'm not spilling any beans that you can't find at MS Research or that end users aren't figuring out or PS3 developers are hinting at.

The "Id way" I guess is storing data on disk in JPEG, paging that, decompressing, then recompressing to DXT5-YCoCg. That has the advantage of being reasonably small on disk, which is important if you have a huge unique textured world so you have N gigs of textures. But I wonder what kind of quality they really get from that. They're using two different lossy schemes, and when you compress through two different lossy schemes the errors usually add up. I would guess that the total error from running through both compressors puts them in the ballpark of DXT1. They're using 8 bits per pixel in memory, and presumably something like 1-2 bits per pixel on disk.

Instead you could just use DXT1, at 4 bits per pixel in memory, and do a DXT1-recompressor, which I would guess could get around 2 bits per pixel on disk. DXT1-recompressed is lower quality than JPEG, but I wonder how it compares to JPEG-then-DXT5?

If I ignore the specifics of the Id method or the XBox360 for the moment, the general options are :

1. Compress textures to the desired hardware-ready format (DXT1 or DXT5 or whatever) with a high quality offline compressor. Store them in memory in this format. Recompress with a shuffler/delta algorithm to make them smaller for transmission or disk storage, but don't take them out of hardware-ready format. One disadvantage of this is that if you have to support multiple hardware-ready texture formats on the PC you have to transmit them all or convert.

2. Compress textures on disk with a lossy scheme that's very tight and has good RMSE/size performance. Decompress on load and then recompress to hardware-ready format (or not, if you have enough video memory). One advantage is you can look at the user's machine and decide what hardware-ready format to use. A disadvantage is that the realtime DXT compressors are much lower quality, and even though they are very fast, the decompress-recompress is still a pretty big CPU load for paging.

3. Hybrid of the two : use some very small format for internet transmission or DVD storage, but when you install to HD do the decompress and recompress to hardware-ready format then. Still has the problem that you're running two different lossy compressors which is evil, but reduces CPU load during game runs.

I don't think that having smaller data on disk is much of a win for paging performance. If you're paging any kind of reasonable fine-grained unit you're just so dominated by seek time, and hard disk throughput is really very fast (and it's all async anyway so the time doesn't hurt). For example a 256 x 256 texture at 4 bits per pixel is 32k bytes which loads in 0.32 millis at 100 MB/sec. (BTW that's a handy rule of thumb - 100k takes 1 milli). So the real interesting goals are : A) be small in memory so that you can have a lot of goodies on screen, and B) be small for transmission to reduce install time, download time, number of DVDs, etc.
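To make that arithmetic concrete, here is a rough sketch in Python. The 100 MB/sec throughput is the figure used above; the 10 millisecond seek time is an assumed ballpark for a hard disk, not a number measured for this post, and the function names are just for illustration.

```python
def texture_bytes(width, height, bits_per_pixel):
    """Size of a single mip level in bytes."""
    return width * height * bits_per_pixel // 8

def transfer_millis(num_bytes, mb_per_sec=100.0):
    """Pure transfer time, ignoring seeks; 1 MB is taken as 10**6 bytes."""
    return num_bytes / (mb_per_sec * 1e6) * 1000.0

# Assumption: a typical hard disk seek, not a measured value.
ASSUMED_SEEK_MILLIS = 10.0

size = texture_bytes(256, 256, 4)   # the 256 x 256, 4 bpp example
xfer = transfer_millis(size)
print("bytes:", size)
print("transfer millis:", round(xfer, 3))
print("seek / transfer ratio:", round(ASSUMED_SEEK_MILLIS / xfer, 1))
```

Under that assumed seek time the transfer is a small fraction of the total cost of fetching one page, which is the point: halving the bytes on disk barely moves the paging time.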

One idea I've been tossing around is the idea of using lossy compressed textures in memory as a texture cache to avoid hitting disk. For example, a 256 x 256 texture at 1 bit per pixel is only 8k bytes. If you wanted to page textures in that unit size you would be ridiculously dominated by seek time. Instead you could just hold a bunch of them in memory and decompress to "page" them into usable formats. The disk throughput is comparable to the decompressor throughput, but not having seek times means you can decompress them on demand. I'm not convinced that this is actually a useful thing though.

BTW I also found this old Epic page about the NVidia DXT1 problem, which happily is fixed, but I guess there are still loads of those old chips around; this might still make using DXT1 exclusively on the PC impossible. The page also has some good sample textures to kill your compressor with.


Sean said...

Poking around online searching for public information on BC7 I see that the basic theory is that they use 2 or 3 lines instead of 1, thus allowing you to better handle regions with complex transitions.
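For reference, the "1 line" case is what DXT1 already does: each 4x4 block stores two RGB565 endpoints and a 2-bit index per pixel that picks a point on the line between them. A minimal Python sketch of that decode follows. The 565-to-888 expansion by bit replication is an assumption; real hardware may round the expansion and the interpolated colours slightly differently.

```python
import struct

def expand565(c):
    """Expand a packed RGB565 value to 8-bit-per-channel (r, g, b)."""
    r, g, b = (c >> 11) & 31, (c >> 5) & 63, c & 31
    # Assumption: bit replication; hardware rounding may differ slightly.
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))

def dxt1_palette(c0, c1):
    """The four colours a DXT1 block can use, all on the line c0 to c1."""
    p0, p1 = expand565(c0), expand565(c1)
    if c0 > c1:
        # Four-colour mode: two interpolated points at 1/3 and 2/3.
        p2 = tuple((2 * a + b) // 3 for a, b in zip(p0, p1))
        p3 = tuple((a + 2 * b) // 3 for a, b in zip(p0, p1))
    else:
        # Three-colour mode: midpoint; index 3 means transparent black.
        p2 = tuple((a + b) // 2 for a, b in zip(p0, p1))
        p3 = (0, 0, 0)
    return [p0, p1, p2, p3]

def decode_dxt1_block(block):
    """Decode one 8-byte block to 16 (r, g, b) pixels in row-major order."""
    c0, c1, indices = struct.unpack("<HHI", block)
    palette = dxt1_palette(c0, c1)
    return [palette[(indices >> (2 * i)) & 3] for i in range(16)]

# Usage: white-to-black endpoints, every pixel using index 2.
pixels = decode_dxt1_block(struct.pack("<HHI", 0xFFFF, 0x0000, 0xAAAAAAAA))
print(pixels[0])
```

That is 32 bits of endpoints plus 32 bits of indices for 16 pixels, which is where the 4 bits per pixel comes from.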

If you get 8 bpp instead of 4bpp, then one thing you can do is just store two DXT1 blocks, so you can encode two lines and two indices along those lines. You don't need to store two indices, just a choice of which line, so that gives you back 1bpp, so you end up with 16 more bits per block to spend on upgrading the quality of the endpoints, or to allow twice as many points along each line.

I'd guess allowing twice as many points is probably better, but maybe more expensive in hardware (and makes it harder to do optimal encoding) so you'd be better off improving the endpoint precision.

After poking around with the numbers, I guess I'd do something like store one end point as 777, the other as a signed 666 relative to it, and you have one bit left over that you use to indicate that you want to right-shift both of them by 1, to give better precision in dark areas. Or maybe 676 and 2 shift bits for one end, and 565 and 3 shift bits for the other end (so you can better express small steps even in non-dark areas). Either one adds up to 40 bits for a pair of endpoints, 80 bits for two pairs plus 3 bits per pixel to choose a line and a point along it, giving 128 bits.
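The bit budget for those two layouts can be checked with a few lines of Python. This only counts bits for the imaginary format described here; the names are made up for the sketch and nothing below reflects the real BC7 layout.

```python
PIXELS_PER_BLOCK = 16  # 4x4 block, as in DXT1

def endpoint_bits(channel_bits, extra_bits=0):
    """Bits for one endpoint: its colour channels plus any shift bits."""
    return sum(channel_bits) + extra_bits

# Layout A: 777 base endpoint, signed 666 delta, one shared shift bit.
layout_a = endpoint_bits((7, 7, 7)) + endpoint_bits((6, 6, 6)) + 1

# Layout B: 676 with 2 shift bits, 565 with 3 shift bits.
layout_b = endpoint_bits((6, 7, 6), 2) + endpoint_bits((5, 6, 5), 3)

def block_bits(pair_bits, lines=2, line_select_bits=1, index_bits=2):
    """Total block size: endpoint pairs plus the per-pixel selectors."""
    per_pixel = line_select_bits + index_bits
    return lines * pair_bits + PIXELS_PER_BLOCK * per_pixel

print("bits per endpoint pair:", layout_a, layout_b)
print("bits per block:", block_bits(layout_a))
print("bits per pixel:", block_bits(layout_a) / PIXELS_PER_BLOCK)
```

Both layouts come to 40 bits per endpoint pair, and 2 x 40 + 16 x 3 = 128 bits per block, which is 8 bits per pixel.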

Actually, to go back to the original, you could just do two independent lines, pick an independent point on each one from each of them, and average or sum those. That gives you more degrees of freedom per pixel; if you do something like sum them, it lets you better handle data where one channel is totally special (e.g. like YCoCg, or if somebody's encoded something independent in one channel), etc. Probably super-painful to try to optimally generate.

cbloom said...

Hey this imaginary compression format game is fun ;)

You could have just one base color and then two deltas to define edges, so that you have a rhombus in color space, and then send 2 indexes as U & V in that polygon. Sort of like the "tight frame" thing.

Or send 4 colors to define two edges, and then send a "t" index to interpolate along each edge and an "s" index to interpolate between the two edges. This lets you do curved paths in color space, like the way you make a curve by putting strings on the coordinate axes in elementary school. Even if your end points are just 565 you should be able to hit colors very exactly because you have so many options of how to place your end points.
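Both of these imaginary formats can be sketched as the same bilinear patch in color space. The Python below is only an illustration: the function names, the floating point math and the 4 levels per index are all assumptions, not part of any real format.

```python
def lerp(a, b, t):
    """Linear interpolation between two colours given as tuples."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def patch_color(c00, c01, c10, c11, s, t):
    """Interpolate by t along edges c00->c01 and c10->c11, then by s between them."""
    return lerp(lerp(c00, c01, t), lerp(c10, c11, t), s)

def patch_palette(c00, c01, c10, c11, levels=4):
    """All colours reachable with 'levels' steps for each of s and t."""
    steps = [i / (levels - 1) for i in range(levels)]
    return [[patch_color(c00, c01, c10, c11, s, t) for t in steps] for s in steps]

def rhombus_corners(base, d1, d2):
    """One base colour plus two deltas, expressed as the four patch corners."""
    def add(a, b):
        return tuple(x + y for x, y in zip(a, b))
    return base, add(base, d1), add(base, d2), add(add(base, d1), d2)

# Usage: the base-plus-two-deltas variant, with U = t and V = s.
corners = rhombus_corners((10, 20, 30), (4, 0, 0), (0, 8, 0))
print(patch_color(*corners, 0.5, 0.25))
```

With the rhombus corners the result is just base + t * d1 + s * d2, so the patch is flat. With four free corners whose edges are not parallel, walking s and t together traces a curve, which is the string-art effect.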

Ivan-Assen said...

Our texture management experience has shown that if you have enough memory, textures go into the Windows disk cache anyway, and the reads are not a problem; however, the uploads to the surfaces the GPU can render from are very slow, and remain a significant problem (with managed pool textures being a bit better than default pool). So for our particular case, the highly compressed 1-bpp version as a replacement of the disk cache will not improve the user experience until the number of textures "on standby" exceeds the available physical memory.

castano said...

Hmm... in my experience PCIe bandwidth is generally not the bottleneck these days. You have much more chances of being limited by bandwidth of the permanent storage device. Besides that, for some developers it's not just matter of bandwidth, but of cutting costs by reducing the number of DVDs required to distribute the game.

cbloom said...

"Hmm... in my experience PCIe bandwidth is generally not the bottleneck these days."

Yeah, I was talking about the speed of actually drawing from the texture. Last time I looked it was a lot faster to read filtered texels from DXTC because of the better cache usage. Is that not true any more? Can I use uncompressed textures at full speed?

"You have much more chances of being limited by bandwidth of the permanent storage device."

Obviously HD is slow, but my argument is that for *paging* the seek time dominates throughput. That's not true for *streaming* (eg. Bink3d) or for the initial level load of course.

"Besides that, for some developers it's not just matter of bandwidth, but of cutting costs by reducing the number of DVDs required to distribute the game."

Yeah absolutely. Though that might only be an issue for one developer ;)
