cbloom rants: 11-20-08

So we know DXT1 is bad compared to a variable bitrate coder, but it is a small block, fixed rate coder, and those are pretty hard constraints to work under.

Small block inherently gives up a lot of coding capability, because you aren't allowed to use any cross-block information. H264 and HDPhoto are both small block (4x4) but both make very heavy use of cross-block information for either context coding, DC prediction, or lapping. Even JPEG is not a true small block coder because it has side information in the Huffman table that captures whole-image statistics.

Fixed bitrate blocks inherently give up even more. They kill your ability to do any rate-distortion type of optimization. You can't allocate bits where they're needed. You might have images with big flat sections where you are actually wasting bits (you don't need all 64 bits for a 4x4 block), and then you have other areas that desperately need a few more bits, but you can't give them any.

So, what if we keep ourselves constrained to the idea of a fixed size block and try to use a better coder? What is the limit on how well you can do with those constraints? I thought I'd see if I could answer that reasonably quickly.

What I made is an 8x8 pixel fixed rate coder. It has zero side information, e.g. no per-image tables (it does have about 16 constants that are used for all images). Each block is coded to a fixed bit rate. Here I'm coding to 4 bits per pixel (the same as DXT1), which is a 32 byte block for 8x8 pixels, so that I can compare RMSE directly. It also works pretty well at 24 byte blocks (3 bits per pixel, which is 1 bit per byte of 24-bit RGB input), or 64 byte blocks for high quality, etc.
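
The block sizes above are easy to mix up, so here is a small sketch of the arithmetic. The function names are made up for illustration, and the "per source byte" figure assumes 3 bytes (24-bit RGB) per input pixel.

```python
def bits_per_pixel(block_bytes, block_w=8, block_h=8):
    """Coded bits spent on each pixel of a fixed-size block."""
    return block_bytes * 8 / (block_w * block_h)


def bits_per_source_byte(block_bytes, block_w=8, block_h=8, channels=3):
    """Coded bits per byte of uncompressed input (assumes 1 byte per channel)."""
    return block_bytes * 8 / (block_w * block_h * channels)


# The three 8x8 block sizes mentioned in the text.
for size in (24, 32, 64):
    print(size, "bytes:", bits_per_pixel(size), "bpp,",
          bits_per_source_byte(size), "bits per source byte")

# DXT1 for comparison: 8 bytes per 4x4 block.
print("DXT1:", bits_per_pixel(8, 4, 4), "bpp")
```

A 32 byte 8x8 block and an 8 byte 4x4 DXT1 block both come out at 4 bits per pixel, which is why the RMSE numbers can be compared directly.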

This 8x8 coder does a lossless YCoCg transform and a lossy DCT. Unlike JPEG, there is no quantization, no subsampling of chroma, no Huffman table, etc. Coding is via an embedded bitplane coder with zerotree-style context prediction. I haven't spent much time on this, so the coding schemes are very rough. CodeTree and CodeLinear are two different coding techniques, and neither one is ideal.
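
For reference, one common lossless form of YCoCg is the lifting version usually called YCoCg-R. The post doesn't say which variant the coder uses, so treat this as a sketch of the general idea rather than the actual transform; the function names are just for illustration.

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R lifting transform on integer RGB."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg


def ycocg_r_to_rgb(y, co, cg):
    """Inverse transform: undoes the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Round trip a few pixels; every one should come back bit-exact.
for pixel in [(255, 0, 0), (12, 200, 99), (128, 128, 128)]:
    coded = rgb_to_ycocg_r(*pixel)
    print(pixel, "->", coded, "->", ycocg_r_to_rgb(*coded))
```

Because each lifting step is undone exactly by the inverse, the round trip is lossless in integer math. The price is that Co and Cg need one more bit of range than the 8-bit inputs.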

Obviously going to 8x8 instead of 4x4 is a big advantage, but it seems like a more reasonable size for future hardware. To really improve the coding significantly on 4x4 blocks you would have to start using something like VQ with a codebook which hardware people don't like.

In the table below you'll see that CodeTree and CodeLinear generally provide a nice improvement on the natural images, about 20%. In general they're pretty close to halfway between DXTC and the full image coder "cbwave". They have a different kind of perceptual artifact when they have errors: unlike DXTC, which just makes things really blocky, these get the halo ringing artifacts like JPEG (it's inherent in truncating DCTs).
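
To see where the ringing comes from, here is a toy sketch using an 8-point 1D orthonormal DCT. It just zeroes the high-frequency coefficients, which is a crude stand-in for an embedded coder running out of bits, not the actual coder from this post. All the names here are made up for illustration.

```python
import math

N = 8


def scale(k):
    """Orthonormal DCT-II scale factor for coefficient k."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct(x):
    """Forward 8-point orthonormal DCT-II."""
    return [scale(k) * sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N)
                           for n in range(N))
            for k in range(N)]


def idct(X):
    """Inverse of dct() (orthonormal DCT-III)."""
    return [sum(scale(k) * X[k] * math.cos(math.pi * (n + 0.5) * k / N)
                for k in range(N))
            for n in range(N)]


def truncate(x, keep):
    """Keep only the `keep` lowest-frequency coefficients, then reconstruct."""
    X = dct(x)
    return idct([c if k < keep else 0.0 for k, c in enumerate(X)])


step = [0, 0, 0, 0, 255, 255, 255, 255]   # hard edge
ramp = [32 * i for i in range(N)]         # smooth gradient

print("step:", [round(v, 1) for v in truncate(step, 4)])
print("ramp:", [round(v, 1) for v in truncate(ramp, 4)])
```

With half the coefficients dropped, the step edge comes back with values below 0 and above 255 on either side of the edge, which is the halo. The ramp comes back within a few levels of the original, because a smooth gradient has very little energy in the high frequencies to lose.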

The new coders do really badly on the weird synthetic images from bragzone, like clegg, frymire and serrano. I'd have to fix that if I really cared about these things.

One thing that is encouraging is that this coder does *very* well on the simple synthetic images, like the "linear_ramp" and the "pink_green" and "orange_purple". I think these synthetic images are a lot like what game lightmaps are like, and the new schemes are near lossless on them.

BTW image compression for paging is sort of a whole other issue. For one thing, a per-image table is perfectly reasonable to have, and you could work on something like 32x32 blocks. But more important is that in the short term you still need to provide the image in DXT1 to the graphics hardware. So you either just have to page the data in DXT1 already, or you have to recompress it, and as we've seen here the "real-time" DXT1 recompressors are not high enough quality for ubiquitous use.

ADDENDUM: I forgot to mention it, but you may have noticed that the "ryg" in this table is also not the same as the previous "ryg". I fixed a few of the little bugs and you can see the improvement here. It's still not competitive. I think there may be more errors in the best fit optimization portion of the code, but he's got that code so optimized and obfuscated that I can't see what's going on.

Oh, BTW the "CB" in this table is different from the one in the previous table; the one here uses 4-means instead of 2-means, seeded from the PCA direction, and then I try using each of the 4 means as endpoints. It's still not quite as good as Squish, but it's closer. It does beat Squish on some of the more degenerate images at the bottom, such as "linear_ramp". It also beats Squish on artificial tests like images that contain only 2 colors. For example, on linear_ramp2 without optimization, 4-means gets 1.5617 while Squish gets 2.3049; most of that difference goes away after annealing though.
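
Here is a rough sketch of what "4-means seeded from the PCA direction" could look like. It is not the actual "CB" code. It finds the principal axis by power iteration, spreads four seeds along it, runs a few k-means passes, and then tries each pair of means as the two endpoints, which is only one reading of the description above. It also ignores 565 endpoint quantization. All function names are hypothetical.

```python
import math
from itertools import combinations


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    return [sum(p[i] for p in points) / len(points) for i in range(3)]


def principal_axis(points, iters=32):
    """Mean and dominant covariance eigenvector via power iteration."""
    m = mean(points)
    cov = [[sum((p[i] - m[i]) * (p[j] - m[j]) for p in points)
            for j in range(3)] for i in range(3)]
    # Start from the covariance row with the largest variance.
    v = cov[max(range(3), key=lambda i: cov[i][i])][:]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all pixels identical, no direction to find
        v = [x / norm for x in w]
    return m, v


def four_means(points, iters=8):
    """k-means with k=4, seeded evenly along the principal axis."""
    m, v = principal_axis(points)
    ts = [sum((p[i] - m[i]) * v[i] for i in range(3)) for p in points]
    lo, hi = min(ts), max(ts)
    means = [[m[i] + v[i] * (lo + (hi - lo) * k / 3.0) for i in range(3)]
             for k in range(4)]
    for _ in range(iters):
        buckets = [[] for _ in range(4)]
        for p in points:
            nearest = min(range(4), key=lambda c: dist2(p, means[c]))
            buckets[nearest].append(p)
        # An empty cluster keeps its previous mean.
        means = [mean(b) if b else means[k] for k, b in enumerate(buckets)]
    return means


def palette_error(points, e0, e1):
    """Squared error using endpoints plus the 1/3 and 2/3 interpolants."""
    palette = [[e0[i] + (e1[i] - e0[i]) * k / 3.0 for i in range(3)]
               for k in range(4)]
    return sum(min(dist2(p, c) for c in palette) for p in points)


def best_endpoints(points):
    """Try every pair of the four means as endpoints, keep the best pair."""
    return min(combinations(four_means(points), 2),
               key=lambda pair: palette_error(points, *pair))


# A 4x4 block containing only two colors.
block = [(10, 20, 30)] * 8 + [(200, 100, 50)] * 8
e0, e1 = best_endpoints(block)
print(e0, e1, palette_error(block, e0, e1))
```

On the two-color block, the outer two means land on the two colors and the chosen endpoint pair reproduces the block with essentially zero error.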

RMSE per pixel:

file	Squish opt	Squish	CB opt	CB	ryg	D3DX8	FastDXT	cbwave KLTY	CodeTree	CodeLinear
kodim01.bmp	8.2808	8.3553	8.3035	8.516	8.9185	9.8466	9.9565	2.6068	5.757835	5.659023
kodim02.bmp	6.1086	6.2876	6.1159	6.25	6.8011	7.4308	8.456	1.6973	4.131007	4.144241
kodim03.bmp	4.7804	4.9181	4.7953	4.9309	5.398	6.094	6.4839	1.3405	3.369018	3.50115
kodim04.bmp	5.6913	5.8116	5.7201	5.8837	6.3424	7.1032	7.3189	1.8076	4.254454	4.174228
kodim05.bmp	9.6472	9.7223	9.6766	9.947	10.2522	11.273	12.0156	2.9739	6.556041	6.637885
kodim06.bmp	7.1472	7.2171	7.1596	7.3224	7.6423	8.5195	8.6202	2.0132	5.013081	4.858232
kodim07.bmp	5.7804	5.8834	5.7925	5.9546	6.3181	7.2182	7.372	1.4645	3.76087	3.79437
kodim08.bmp	10.2391	10.3212	10.2865	10.5499	10.8534	11.8703	12.2668	3.2936	6.861067	6.927792
kodim09.bmp	5.2871	5.3659	5.3026	5.4236	5.7315	6.5332	6.6716	1.6269	3.473094	3.479715
kodim10.bmp	5.2415	5.3366	5.2538	5.3737	5.7089	6.4601	6.4592	1.7459	3.545115	3.593297
kodim11.bmp	6.7261	6.8206	6.7409	6.9128	7.3099	8.1056	8.2492	1.8411	4.906141	4.744971
kodim12.bmp	4.7911	4.8718	4.799	4.9013	5.342	6.005	6.0748	1.5161	3.210518	3.231271
kodim13.bmp	10.8676	10.9428	10.9023	11.2169	11.6049	12.7139	12.9978	4.1355	9.044009	8.513297
kodim14.bmp	8.3034	8.3883	8.3199	8.5754	8.8656	9.896	10.8481	2.4191	6.212482	6.222196
kodim15.bmp	5.8233	5.9525	5.8432	6.0189	6.3297	7.3085	7.4932	1.6236	4.3074	4.441998
kodim16.bmp	5.0593	5.1629	5.0595	5.1637	5.5526	6.3361	6.1592	1.546	3.476671	3.333637
kodim17.bmp	5.5019	5.6127	5.51	5.6362	6.0357	6.7395	6.8989	1.7166	4.125859	4.007367
kodim18.bmp	7.9879	8.0897	8.0034	8.225	8.6925	9.5357	9.7857	2.9802	6.743892	6.376692
kodim19.bmp	6.5715	6.652	6.5961	6.7445	7.2684	7.9229	8.0096	2.0518	4.45822	4.353687
kodim20.bmp	5.4533	5.5303	5.47	5.5998	5.9087	6.4878	6.8629	1.5359	4.190565	4.154571
kodim21.bmp	7.1318	7.2045	7.1493	7.3203	7.6764	8.4703	8.6508	2.0659	5.269787	5.05321
kodim22.bmp	6.43	6.5127	6.4444	6.6185	7.0705	8.0046	7.9488	2.2574	5.217884	5.142252
kodim23.bmp	4.8995	5.0098	4.906	5.0156	5.3789	6.3057	6.888	1.3954	3.20464	3.378545
kodim24.bmp	8.4271	8.5274	8.442	8.7224	8.9206	9.9389	10.5156	2.4977	7.618436	7.389021
clegg.bmp	14.9733	15.2566	15.1516	16.0477	15.7163	21.471	32.7192	10.5426	21.797655	25.199576
FRYMIRE.bmp	10.7184	12.541	11.9631	12.9719	12.681	16.7308	28.9283	6.2394	21.543401	24.225852
LENA.bmp	7.138	7.2346	7.1691	7.3897	7.6053	8.742	9.5143	4.288	7.936599	8.465576
MONARCH.bmp	6.5526	6.6292	6.5809	6.7556	7.0313	8.1053	8.6993	1.6911	5.880189	5.915117
PEPPERS.bmp	6.3966	6.5208	6.436	6.6482	6.9006	8.1855	8.8893	2.3022	6.15367	6.228315
SAIL.bmp	8.3233	8.3903	8.3417	8.5561	8.9823	9.7838	10.5673	2.9003	6.642762	6.564393
SERRANO.bmp	6.3508	6.757	6.5572	6.991	7.0722	9.0549	18.3631	4.6489	13.516339	16.036401
TULIPS.bmp	7.5768	7.656	7.5959	7.8172	8.0101	9.3817	10.5873	2.2228	5.963537	6.384049
lena512ggg.bmp	4.8352	4.915	4.8261	4.877	5.1986	6.0059	5.5247		2.054319	2.276361
lena512pink.bmp	4.5786	4.6726	4.581	4.6863	5.0987	5.8064	5.838		3.653436	3.815336
lena512pink0g.bmp	3.7476	3.8058	3.7489	3.8034	4.2756	5.0732	4.8933		4.091045	5.587278
linear_ramp1.BMP	1.4045	2.1243	1.3741	1.6169	2.0939	2.6317	3.981		0.985808	0.984156
linear_ramp2.BMP	1.3377	2.3049	1.3021	1.5617	1.9306	2.5396	4.0756		0.628664	0.629358
orange_purple.BMP	2.9032	3.0685	2.9026	2.9653	3.2684	4.4123	7.937		1.471407	2.585087
pink_green.BMP	3.2058	3.3679	3.2	3.2569	3.7949	4.5127	7.3481		1.247967	1.726312
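
For anyone wanting to produce comparable numbers, here is a minimal sketch of an RMSE-per-pixel measurement. The normalization is an assumption: it sums the squared error over all three channels of a pixel and divides by the pixel count, not by the channel count. The post doesn't spell out the exact definition, so the numbers in the table may be computed differently. The function name is made up.

```python
import math


def rmse_per_pixel(original, decoded):
    """Root mean squared error, with the mean taken over pixels.

    Each image is a flat list of (r, g, b) tuples. The squared error of a
    pixel is the sum of its squared per-channel differences (assumption).
    """
    if len(original) != len(decoded):
        raise ValueError("images must have the same number of pixels")
    total = sum((a - b) ** 2
                for p, q in zip(original, decoded)
                for a, b in zip(p, q))
    return math.sqrt(total / len(original))


original = [(0, 0, 0), (10, 10, 10)]
decoded = [(3, 4, 0), (10, 10, 10)]
print(rmse_per_pixel(original, decoded))
```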


11-20-08 - DXTC Part 3
