Base PNG by default does :

Filter 5 = "adaptive" - choose from [0..4] per row using minimum sum of abs Zlib strategy "FILTER" , which just means minMatchLength = 4

So the pngcrush guys did some clever things. Basically the idea is to try all possible ways to write a PNG and see which is smallest. The things you can play with are :

Filter 0..5 Zlib Strategy ; they have a few hard-coded but really it comes down to min match length Zlib compression level (oddly highest level is not always best) PNG scan options (progressive, interleaves, bottom up vs top down, etc.)It's well known that on weird synthetic images "no filter" beats any filter. The only way you can detect that is actually by trying an LZ compress, you cannot tell from statistics.

The clever thing in pngcrush is that they don't search that whole space, but still usually find the optimal (or close to optimal) settings. The way they do it is with a heuristic guided search; they identify things that they have to always test (the 5 default strategies they try) with LZ, then depending on which of those is best they try a few others, and then maybe a few more, then you're done. It's like based on which branch of the search space you walk off initially they know from testing where the optimum likely is.

"loco" here is pngcrush with the LOCO color space conversion (RGB -> (R-G),G,(B-G) ) from JPEG-LS. This is the only lossless color conversion you can do that is not range expanding (eg. stays in bytes) (* correction : not quite true, see comments, of course any lifting style transform can be non-range-expanding using modulo arithmetic; it does appear to be the only *useful* byte-to-byte color conversion though). (BTW LOCO is not allowed in compliant PNG, but it's such a big win that it's unfair to them not to pretend that PNG can do LOCO for purposes of this comparison).

name | png | pngcrush | loco | advpng | crush+adv | best |

ryg_t.yello.01.bmp | 421321 | 412303 | 386437 | 373573 | 373573 | 373573 |

ryg_t.train.03.bmp | 47438 | 37900 | 36003 | 34260 | 34260 | 34260 |

ryg_t.sewers.01.bmp | 452540 | 451880 | 429031 | 452540 | 451880 | 429031 |

ryg_t.font.01.bmp | 44955 | 37857 | 29001 | 22514 | 22514 | 22514 |

ryg_t.envi.colored03.bmp | 113368 | 97203 | 102343 | 113368 | 97203 | 97203 |

ryg_t.envi.colored02.bmp | 63241 | 55036 | 65334 | 63241 | 55036 | 55036 |

ryg_t.concrete.cracked.01.bmp | 378383 | 377831 | 309126 | 378383 | 377831 | 309126 |

ryg_t.bricks.05.bmp | 506528 | 486679 | 375964 | 478709 | 478709 | 375964 |

ryg_t.bricks.02.bmp | 554511 | 553719 | 465099 | 554511 | 553719 | 465099 |

ryg_t.aircondition.01.bmp | 29960 | 29960 | 23398 | 20320 | 20320 | 20320 |

ryg_t.2d.pn02.bmp | 29443 | 26025 | 27156 | 24750 | 24750 | 24750 |

kodak_24.bmp | 705730 | 704710 | 572591 | 705730 | 704710 | 572591 |

kodak_23.bmp | 557596 | 556804 | 483865 | 557596 | 556804 | 483865 |

kodak_22.bmp | 701584 | 700576 | 580566 | 701584 | 700576 | 580566 |

kodak_21.bmp | 680262 | 650956 | 547829 | 646806 | 646806 | 547829 |

kodak_20.bmp | 505528 | 504796 | 439993 | 500885 | 500885 | 439993 |

kodak_19.bmp | 671356 | 670396 | 545636 | 671356 | 670396 | 545636 |

kodak_18.bmp | 780454 | 779326 | 631000 | 780454 | 779326 | 631000 |

kodak_17.bmp | 624331 | 623431 | 510131 | 615723 | 615723 | 510131 |

kodak_16.bmp | 573671 | 541748 | 481190 | 517978 | 517978 | 481190 |

kodak_15.bmp | 612134 | 611258 | 516741 | 612134 | 611258 | 516741 |

kodak_14.bmp | 739487 | 703036 | 590108 | 739487 | 703036 | 590108 |

kodak_13.bmp | 890577 | 859429 | 688072 | 866745 | 859429 | 688072 |

kodak_12.bmp | 569219 | 533864 | 477151 | 535591 | 533864 | 477151 |

kodak_11.bmp | 635794 | 634882 | 519918 | 635794 | 634882 | 519918 |

kodak_10.bmp | 593590 | 592738 | 500082 | 593590 | 592738 | 500082 |

kodak_09.bmp | 583329 | 582489 | 493958 | 558418 | 558418 | 493958 |

kodak_08.bmp | 787619 | 786491 | 611451 | 787619 | 786491 | 611451 |

kodak_07.bmp | 566085 | 565281 | 486421 | 566085 | 565281 | 486421 |

kodak_06.bmp | 667888 | 631478 | 540442 | 644928 | 631478 | 540442 |

kodak_05.bmp | 807702 | 806538 | 638875 | 807702 | 806538 | 638875 |

kodak_04.bmp | 637768 | 636856 | 532209 | 637768 | 636856 | 532209 |

kodak_03.bmp | 540788 | 506321 | 464434 | 514336 | 506321 | 464434 |

kodak_02.bmp | 617879 | 616991 | 508297 | 596342 | 596342 | 508297 |

kodak_01.bmp | 779475 | 760251 | 588034 | 706982 | 706982 | 588034 |

bragzone_TULIPS.bmp | 680124 | 679152 | 591881 | 680124 | 679152 | 591881 |

bragzone_SERRANO.bmp | 153129 | 107167 | 107759 | 96932 | 96932 | 96932 |

bragzone_SAIL.bmp | 807933 | 806769 | 623437 | 807933 | 806769 | 623437 |

bragzone_PEPPERS.bmp | 426419 | 424748 | 376799 | 426419 | 424748 | 376799 |

bragzone_MONARCH.bmp | 614974 | 614098 | 526754 | 614974 | 614098 | 526754 |

bragzone_LENA.bmp | 474968 | 474251 | 476524 | 474968 | 474251 | 474251 |

bragzone_FRYMIRE.bmp | 378228 | 252423 | 253967 | 230055 | 230055 | 230055 |

bragzone_clegg.bmp | 483752 | 483056 | 495956 | 483752 | 483056 | 483056 |

There's no adv+loco because advpng and advmng both refuse to work on the "loco" bastardized semi-PNG.

BTW I should note that I should eat my hat a little bit over my own "PNG sucks" post. The thing is, yes basic PNG is easy to beat and it has a lot of mistakes in the design, but the basic idea is fine, they did a good job on the standard pretty quickly, but the thing that really seals the deal is that once you make a flexible open standard, people will step in and find ways to play with it, and while base PNG is pretty bag, PNG after optimization is not bad at all.

## 14 comments:

This is the only lossless color conversion you can do that is not range expanding (eg. stays in bytes).Didn't you write up previously that there's a YCoCg that's reversible (you just have to do some kind of odd/even analysis to undo the rounding)?

" Didn't you write up previously that there's a YCoCg that's reversible (you just have to do some kind of odd/even analysis to undo the rounding)? "

Yeah but it goes to 9 bits. You can deal with that either by being lossy in chroma (dividing the chroma by 2) , or by escaping. That is, YCoCg followed by a DPCM predictor will have a range of [-256,256] but will almost always fit in [-128,128] , so you could just save one value to indicate "one more byte" and write the extra rare byte when needed.

BTW both of these modes would make good sense for PNG2. (lossy YCoCg in particular would be valuable).

"Didn't you write up previously that there's a YCoCg that's reversible (you just have to do some kind of odd/even analysis to undo the rounding)?"

YCoCg or any other YUV-like transform (i.e. transforms that are basically rotations of RGB + rounding) have to increase the dynamic range if they're to be reversible. Your YUV code points are a uniformly spaced cubic lattice. The RGB color space you're trying to cover is a rotated cube relative to that lattice, and the AABB for any such rotated cube (which defines the range of output values) is always at least as large as the original cube. The S-transform trick can only get you so far - as long as you code the three channels independently, you need the extra range.

"Your YUV code points are a uniformly spaced cubic lattice. The RGB color space you're trying to cover is a rotated cube relative to that lattice, and the AABB for any such rotated cube (which defines the range of output values) is always at least as large as the original cube."

Yeah, well said. There is actually a way around this. I think it's in the paper on Piecewise Linear Haar that I link here :

http://cbloomrants.blogspot.com/2008/09/09-08-08-1.html

http://cbloomrants.blogspot.com/2008/09/09-11-08-2.html

I also note that YCoCg is just two Haars : Haar[R,B] then Haar[G,B]

So you could do a non-expanding YCoCg by doing two PLHaars.

Not useful in practice.

Oh yeah, that Haar transform was the thing I was thinking of that squeezed in the same range.

Unless I'm misunderstanding ryg's response, I think his explanation is off the mark. (Although I don't know what "the S transform trick" is.)

TechnicallyLOCO is also range expanding (that is, it's range expanding under his argument); obviously R-G requires 9 bits. However you can do everything mod 256 without losing data. So it requires a deeper analysis than just "a rotated box is larger". Which is why I asked. :)" Oh yeah, that Haar transform was the thing I was thinking of that squeezed in the same range."

Well to be clear Haar does *not* make output values in the same range. There is a funny thing like Haar called PLHaar which nobody uses which does go into the same range.

" Unless I'm misunderstanding ryg's response, I think his explanation is off the mark."

Ryg is right he's just being not completely thorough. A Haar is a 45 degree rotation, which means the AABB gets bigger. That is true as he said. Of course you can take your funny diamond shape and pack it back into a square, but no *linear* transform will do it. In that sense if you pack it back into a square it's not really YCoCg any more.

" Technically LOCO is also range expanding (that is, it's range expanding under his argument); "

Yeah, I suppose so. The problem is not that YCoCg is range expanding - it's that it's range expanding and there's no simple operation to get back into a cube.

In particular, geometrically LOCO is not a rotation, it's a shear. Imagine a square and shear up one side and shear down the other side. Obviously you can take the sheared square and cut it along a horizontal line and put the top under bottom. Now you have a square again. This is what the modulo does.

I should say a Haar is not actually a 45 degree rotation. A Hadamard is a 45 degree rotation which is volume expanding. A Haar is a Hadamard with a non-uniform scale (one axis gets /2). The Haar is not volume expanding, but is "range" expanding. Haar is the same as S Transform.

The "PL Haar" and "Modulo transform" papers are both quite good, I encourage interested readers to follow up there.

Ok, but there are rotational transformations that pack a square perfectly, although they involve an additional scale and reshuffling (normally you see this as a rotated grid mapped to a square). Since the area of the two shapes is identical by definition it's probably equivalent to scaling one axis shorter and one axis longer. So yeah, I don't see how to get from A to B.

To ryg's argument, I think it's that the AABB is a distraction. The "height" of any given slice of the rotated solid is longer than the original square's height without even looking at the bounds. (And the bounds are where the shear looks equally bad.)

Or to make it concretely obvious, take some coordinate axis of tre transformed data after rotation, and transform that back through the inverse rotation, so it comes out to some diagonal in the RGB cube. If the pre-transform one is long enough to cover the whole line segment through the cube, it must have been longer than a side of the cube.

The other question is whether LOCO is the best "simple" shear. For example, you could find a shear that maps the Y to a primary axis. LOCO is mapping the grays onto the G axis--a perfect gray comes out as pure G (IIRC). So you could have the R and B computation involve a YCoCg-like shifted version of G to reduce the dominance of G's effect on R and B.

However, since you're lossless, you don't care about perceptual effects at all, you just care about modelling the data. So the 'shades of grey is most important'/'most neutral' might well be the better model, rather than something that optimizes for colors where Co=Cg=0 .

"Oh yeah, that Haar transform was the thing I was thinking of that squeezed in the same range."

Sorry, I just realized I miswrote that, which is why it sounded dumb. That Haar transform was the thing I was thinking of that I *thought* at the time you meant squeezed them in the same range. I was assuming that the reason you were going to so much effort to recover the extra bit from (a+b)/2 instead of just storing (a+b) in the first place was because the whole thing fit in 16 bits as is, not because you were trying to keep it in 17 bits instead of 18. But if you can't actually store (a-b) mod 256, then yeah.

So, actually, it looks to me like the the description for YCoCg on the multimedia.cx wiki is written as a series of reversible liftings, although maybe I'm misunderstanding something...

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

and then

t = Y - (Cg >> 1)

G = Cg + t

B = t - (Co >> 1)

R = Co + B

So isn't that not-range-expanding? I guess it's basically four shears.

Also, since you allow per-line filtering, and as you said you can do the colorspace "conversion" in the filter, then you could have multiple ones and decide them in a fine-grained way.

"So isn't that not-range-expanding? I guess it's basically four shears."

Of course that is range expanding.

That is a standard "lifting" way to write lossless YCoCg. Constructing a transform through lifting ensures a transform is lossless, but tells you nothing about volume expansion or range expansion.

"To ryg's argument, I think it's that the AABB is a distraction."

No, the AABB is exactly the right thing to look at, because that tells you the range of each coordinate needed.

"The other question is whether LOCO is the best simple shear."

I'm pretty sure it's the *only* simple shear (other that obvious permutations of RGB).

Eh .. actually, this :

Co = R - B

t = B + (Co >> 1)

is a Haar step in the lifting form. This is {t,Co} <- Haar(R,B)

Haar written in this way actually can be made non-range-expanding of course, it's just

Co = [ R - B ] mod 256

t = [ B + (Co >> 1) ] mod 256

t is supposed to be the "low" part of the Haar, it should be close to (R+B)/2 , but when the modulos kick in it can wind up being completely different than that which is annoying.

I dunno maybe this might be a win once in a while.

I guess the crucial thing about LOCO is that the lowpass part (G) is not affected by the modulo in weird ways.

The terrible thing about Modulo-YCoCg is that it can do weird things to the lowpass part.

(BTW Modulo-YCoCg is known as "CFH" and is discussed in the literature)

Haar written in this way actually can be made non-range-expanding of courseRight, that's what I meant, you just have to interpret the variables as implicitly 8-bit, sorry, I thought that was obvious.

I'm pretty sure it's the *only* simple shear (other that obvious permutations of RGB).LOCO is something like:

out[0] = R-G

out[1] = G

out[2] = B-G

so there's all sorts of other simple reversible shears:

out[0] = R-(G>>1)

out[1] = G

out[2] = B-(G>>2)

etc. etc. I have no idea if there's any point to them, though.

Post a Comment