cbloom rants: 06-20-10 - Some notes on PNG

6/20/2010

06-20-10 - Some notes on PNG

Before we go further, lets have a look at PNG for reference.

Base PNG by default does :


Filter 5 = "adaptive" - choose from [0..4] per row using minimum sum of abs

Zlib strategy "FILTER" , which just means minMatchLength = 4

So the pngcrush guys did some clever things. Basically the idea is to try all possible ways to write a PNG and see which is smallest. The things you can play with are :


Filter 0..5

Zlib Strategy ; they have a few hard-coded but really it comes down to min match length

Zlib compression level (oddly highest level is not always best)

PNG scan options (progressive, interleaves, bottom up vs top down, etc.)

It's well known that on weird synthetic images "no filter" beats any filter. The only way you can detect that is actually by trying an LZ compress, you cannot tell from statistics.

The clever thing in pngcrush is that they don't search that whole space, but still usually find the optimal (or close to optimal) settings. The way they do it is with a heuristic guided search; they identify things that they have to always test (the 5 default strategies they try) with LZ, then depending on which of those is best they try a few others, and then maybe a few more, then you're done. It's like based on which branch of the search space you walk off initially they know from testing where the optimum likely is.

"loco" here is pngcrush with the LOCO color space conversion (RGB -> (R-G),G,(B-G) ) from JPEG-LS. This is the only lossless color conversion you can do that is not range expanding (eg. stays in bytes) (* correction : not quite true, see comments, of course any lifting style transform can be non-range-expanding using modulo arithmetic; it does appear to be the only *useful* byte-to-byte color conversion though). (BTW LOCO is not allowed in compliant PNG, but it's such a big win that it's unfair to them not to pretend that PNG can do LOCO for purposes of this comparison).

name	png	pngcrush	loco	advpng	crush+adv	best
ryg_t.yello.01.bmp	421321	412303	386437	373573	373573	373573
ryg_t.train.03.bmp	47438	37900	36003	34260	34260	34260
ryg_t.sewers.01.bmp	452540	451880	429031	452540	451880	429031
ryg_t.font.01.bmp	44955	37857	29001	22514	22514	22514
ryg_t.envi.colored03.bmp	113368	97203	102343	113368	97203	97203
ryg_t.envi.colored02.bmp	63241	55036	65334	63241	55036	55036
ryg_t.concrete.cracked.01.bmp	378383	377831	309126	378383	377831	309126
ryg_t.bricks.05.bmp	506528	486679	375964	478709	478709	375964
ryg_t.bricks.02.bmp	554511	553719	465099	554511	553719	465099
ryg_t.aircondition.01.bmp	29960	29960	23398	20320	20320	20320
ryg_t.2d.pn02.bmp	29443	26025	27156	24750	24750	24750
kodak_24.bmp	705730	704710	572591	705730	704710	572591
kodak_23.bmp	557596	556804	483865	557596	556804	483865
kodak_22.bmp	701584	700576	580566	701584	700576	580566
kodak_21.bmp	680262	650956	547829	646806	646806	547829
kodak_20.bmp	505528	504796	439993	500885	500885	439993
kodak_19.bmp	671356	670396	545636	671356	670396	545636
kodak_18.bmp	780454	779326	631000	780454	779326	631000
kodak_17.bmp	624331	623431	510131	615723	615723	510131
kodak_16.bmp	573671	541748	481190	517978	517978	481190
kodak_15.bmp	612134	611258	516741	612134	611258	516741
kodak_14.bmp	739487	703036	590108	739487	703036	590108
kodak_13.bmp	890577	859429	688072	866745	859429	688072
kodak_12.bmp	569219	533864	477151	535591	533864	477151
kodak_11.bmp	635794	634882	519918	635794	634882	519918
kodak_10.bmp	593590	592738	500082	593590	592738	500082
kodak_09.bmp	583329	582489	493958	558418	558418	493958
kodak_08.bmp	787619	786491	611451	787619	786491	611451
kodak_07.bmp	566085	565281	486421	566085	565281	486421
kodak_06.bmp	667888	631478	540442	644928	631478	540442
kodak_05.bmp	807702	806538	638875	807702	806538	638875
kodak_04.bmp	637768	636856	532209	637768	636856	532209
kodak_03.bmp	540788	506321	464434	514336	506321	464434
kodak_02.bmp	617879	616991	508297	596342	596342	508297
kodak_01.bmp	779475	760251	588034	706982	706982	588034
bragzone_TULIPS.bmp	680124	679152	591881	680124	679152	591881
bragzone_SERRANO.bmp	153129	107167	107759	96932	96932	96932
bragzone_SAIL.bmp	807933	806769	623437	807933	806769	623437
bragzone_PEPPERS.bmp	426419	424748	376799	426419	424748	376799
bragzone_MONARCH.bmp	614974	614098	526754	614974	614098	526754
bragzone_LENA.bmp	474968	474251	476524	474968	474251	474251
bragzone_FRYMIRE.bmp	378228	252423	253967	230055	230055	230055
bragzone_clegg.bmp	483752	483056	495956	483752	483056	483056

There's no adv+loco because advpng and advmng both refuse to work on the "loco" bastardized semi-PNG.

BTW I should note that I should eat my hat a little bit over my own "PNG sucks" post. The thing is, yes basic PNG is easy to beat and it has a lot of mistakes in the design, but the basic idea is fine, they did a good job on the standard pretty quickly, but the thing that really seals the deal is that once you make a flexible open standard, people will step in and find ways to play with it, and while base PNG is pretty bag, PNG after optimization is not bad at all.

14 comments:

Anonymous said...: This is the only lossless color conversion you can do that is not range expanding (eg. stays in bytes).

Didn't you write up previously that there's a YCoCg that's reversible (you just have to do some kind of odd/even analysis to undo the rounding)?; June 21, 2010 at 3:52 AM
cbloom said...: " Didn't you write up previously that there's a YCoCg that's reversible (you just have to do some kind of odd/even analysis to undo the rounding)? "

Yeah but it goes to 9 bits. You can deal with that either by being lossy in chroma (dividing the chroma by 2) , or by escaping. That is, YCoCg followed by a DPCM predictor will have a range of [-256,256] but will almost always fit in [-128,128] , so you could just save one value to indicate "one more byte" and write the extra rare byte when needed.

BTW both of these modes would make good sense for PNG2. (lossy YCoCg in particular would be valuable).; June 21, 2010 at 9:18 AM
ryg said...: "Didn't you write up previously that there's a YCoCg that's reversible (you just have to do some kind of odd/even analysis to undo the rounding)?"
YCoCg or any other YUV-like transform (i.e. transforms that are basically rotations of RGB + rounding) have to increase the dynamic range if they're to be reversible. Your YUV code points are a uniformly spaced cubic lattice. The RGB color space you're trying to cover is a rotated cube relative to that lattice, and the AABB for any such rotated cube (which defines the range of output values) is always at least as large as the original cube. The S-transform trick can only get you so far - as long as you code the three channels independently, you need the extra range.; June 22, 2010 at 1:12 AM
cbloom said...: "Your YUV code points are a uniformly spaced cubic lattice. The RGB color space you're trying to cover is a rotated cube relative to that lattice, and the AABB for any such rotated cube (which defines the range of output values) is always at least as large as the original cube."

Yeah, well said. There is actually a way around this. I think it's in the paper on Piecewise Linear Haar that I link here :

http://cbloomrants.blogspot.com/2008/09/09-08-08-1.html

http://cbloomrants.blogspot.com/2008/09/09-11-08-2.html

I also note that YCoCg is just two Haars : Haar[R,B] then Haar[G,B]

So you could do a non-expanding YCoCg by doing two PLHaars.

Not useful in practice.; June 22, 2010 at 10:36 AM
Anonymous said...: Oh yeah, that Haar transform was the thing I was thinking of that squeezed in the same range.

Unless I'm misunderstanding ryg's response, I think his explanation is off the mark. (Although I don't know what "the S transform trick" is.)

Technically LOCO is also range expanding (that is, it's range expanding under his argument); obviously R-G requires 9 bits. However you can do everything mod 256 without losing data. So it requires a deeper analysis than just "a rotated box is larger". Which is why I asked. :); June 22, 2010 at 2:03 PM
cbloom said...: " Oh yeah, that Haar transform was the thing I was thinking of that squeezed in the same range."

Well to be clear Haar does *not* make output values in the same range. There is a funny thing like Haar called PLHaar which nobody uses which does go into the same range.

" Unless I'm misunderstanding ryg's response, I think his explanation is off the mark."

Ryg is right he's just being not completely thorough. A Haar is a 45 degree rotation, which means the AABB gets bigger. That is true as he said. Of course you can take your funny diamond shape and pack it back into a square, but no *linear* transform will do it. In that sense if you pack it back into a square it's not really YCoCg any more.

" Technically LOCO is also range expanding (that is, it's range expanding under his argument); "

Yeah, I suppose so. The problem is not that YCoCg is range expanding - it's that it's range expanding and there's no simple operation to get back into a cube.

In particular, geometrically LOCO is not a rotation, it's a shear. Imagine a square and shear up one side and shear down the other side. Obviously you can take the sheared square and cut it along a horizontal line and put the top under bottom. Now you have a square again. This is what the modulo does.; June 22, 2010 at 4:12 PM
cbloom said...: I should say a Haar is not actually a 45 degree rotation. A Hadamard is a 45 degree rotation which is volume expanding. A Haar is a Hadamard with a non-uniform scale (one axis gets /2). The Haar is not volume expanding, but is "range" expanding. Haar is the same as S Transform.

The "PL Haar" and "Modulo transform" papers are both quite good, I encourage interested readers to follow up there.; June 22, 2010 at 4:33 PM
Anonymous said...: Ok, but there are rotational transformations that pack a square perfectly, although they involve an additional scale and reshuffling (normally you see this as a rotated grid mapped to a square). Since the area of the two shapes is identical by definition it's probably equivalent to scaling one axis shorter and one axis longer. So yeah, I don't see how to get from A to B.

To ryg's argument, I think it's that the AABB is a distraction. The "height" of any given slice of the rotated solid is longer than the original square's height without even looking at the bounds. (And the bounds are where the shear looks equally bad.)

Or to make it concretely obvious, take some coordinate axis of tre transformed data after rotation, and transform that back through the inverse rotation, so it comes out to some diagonal in the RGB cube. If the pre-transform one is long enough to cover the whole line segment through the cube, it must have been longer than a side of the cube.

The other question is whether LOCO is the best "simple" shear. For example, you could find a shear that maps the Y to a primary axis. LOCO is mapping the grays onto the G axis--a perfect gray comes out as pure G (IIRC). So you could have the R and B computation involve a YCoCg-like shifted version of G to reduce the dominance of G's effect on R and B.

However, since you're lossless, you don't care about perceptual effects at all, you just care about modelling the data. So the 'shades of grey is most important'/'most neutral' might well be the better model, rather than something that optimizes for colors where Co=Cg=0 .; June 24, 2010 at 12:03 AM
Anonymous said...: "Oh yeah, that Haar transform was the thing I was thinking of that squeezed in the same range."

Sorry, I just realized I miswrote that, which is why it sounded dumb. That Haar transform was the thing I was thinking of that I *thought* at the time you meant squeezed them in the same range. I was assuming that the reason you were going to so much effort to recover the extra bit from (a+b)/2 instead of just storing (a+b) in the first place was because the whole thing fit in 16 bits as is, not because you were trying to keep it in 17 bits instead of 18. But if you can't actually store (a-b) mod 256, then yeah.; June 24, 2010 at 12:11 AM
Anonymous said...: So, actually, it looks to me like the the description for YCoCg on the multimedia.cx wiki is written as a series of reversible liftings, although maybe I'm misunderstanding something...

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

and then

t = Y - (Cg >> 1)
G = Cg + t
B = t - (Co >> 1)
R = Co + B

So isn't that not-range-expanding? I guess it's basically four shears.

Also, since you allow per-line filtering, and as you said you can do the colorspace "conversion" in the filter, then you could have multiple ones and decide them in a fine-grained way.; June 24, 2010 at 3:04 AM
cbloom said...: "So isn't that not-range-expanding? I guess it's basically four shears."

Of course that is range expanding.

That is a standard "lifting" way to write lossless YCoCg. Constructing a transform through lifting ensures a transform is lossless, but tells you nothing about volume expansion or range expansion.

"To ryg's argument, I think it's that the AABB is a distraction."

No, the AABB is exactly the right thing to look at, because that tells you the range of each coordinate needed.

"The other question is whether LOCO is the best simple shear."

I'm pretty sure it's the *only* simple shear (other that obvious permutations of RGB).; June 24, 2010 at 8:00 AM
cbloom said...: Eh .. actually, this :

Co = R - B
t = B + (Co >> 1)

is a Haar step in the lifting form. This is {t,Co} <- Haar(R,B)

Haar written in this way actually can be made non-range-expanding of course, it's just

Co = [ R - B ] mod 256
t = [ B + (Co >> 1) ] mod 256

t is supposed to be the "low" part of the Haar, it should be close to (R+B)/2 , but when the modulos kick in it can wind up being completely different than that which is annoying.

I dunno maybe this might be a win once in a while.; June 24, 2010 at 9:20 AM
cbloom said...: I guess the crucial thing about LOCO is that the lowpass part (G) is not affected by the modulo in weird ways.

The terrible thing about Modulo-YCoCg is that it can do weird things to the lowpass part.

(BTW Modulo-YCoCg is known as "CFH" and is discussed in the literature); June 24, 2010 at 2:40 PM
Anonymous said...: Haar written in this way actually can be made non-range-expanding of course

Right, that's what I meant, you just have to interpret the variables as implicitly 8-bit, sorry, I thought that was obvious.

I'm pretty sure it's the *only* simple shear (other that obvious permutations of RGB).

LOCO is something like:

out[0] = R-G
out[1] = G
out[2] = B-G

so there's all sorts of other simple reversible shears:

out[0] = R-(G>>1)
out[1] = G
out[2] = B-(G>>2)

etc. etc. I have no idea if there's any point to them, though.; June 25, 2010 at 4:01 AM

cbloom rants

6/20/2010

06-20-10 - Some notes on PNG

14 comments:

old rants