6/20/2010

06-20-10 - Some notes on PNG

Before we go further, lets have a look at PNG for reference.

Base PNG by default does :


Filter 5 = "adaptive" - choose from [0..4] per row using minimum sum of abs

Zlib strategy "FILTER" , which just means minMatchLength = 4

So the pngcrush guys did some clever things. Basically the idea is to try all possible ways to write a PNG and see which is smallest. The things you can play with are :


Filter 0..5

Zlib Strategy ; they have a few hard-coded but really it comes down to min match length

Zlib compression level (oddly highest level is not always best)

PNG scan options (progressive, interleaves, bottom up vs top down, etc.)

It's well known that on weird synthetic images "no filter" beats any filter. The only way you can detect that is actually by trying an LZ compress, you cannot tell from statistics.

The clever thing in pngcrush is that they don't search that whole space, but still usually find the optimal (or close to optimal) settings. The way they do it is with a heuristic guided search; they identify things that they have to always test (the 5 default strategies they try) with LZ, then depending on which of those is best they try a few others, and then maybe a few more, then you're done. It's like based on which branch of the search space you walk off initially they know from testing where the optimum likely is.

"loco" here is pngcrush with the LOCO color space conversion (RGB -> (R-G),G,(B-G) ) from JPEG-LS. This is the only lossless color conversion you can do that is not range expanding (eg. stays in bytes) (* correction : not quite true, see comments, of course any lifting style transform can be non-range-expanding using modulo arithmetic; it does appear to be the only *useful* byte-to-byte color conversion though). (BTW LOCO is not allowed in compliant PNG, but it's such a big win that it's unfair to them not to pretend that PNG can do LOCO for purposes of this comparison).

name png pngcrush loco advpng crush+adv best
ryg_t.yello.01.bmp 421321 412303 386437 373573 373573 373573
ryg_t.train.03.bmp 47438 37900 36003 34260 34260 34260
ryg_t.sewers.01.bmp 452540 451880 429031 452540 451880 429031
ryg_t.font.01.bmp 44955 37857 29001 22514 22514 22514
ryg_t.envi.colored03.bmp 113368 97203 102343 113368 97203 97203
ryg_t.envi.colored02.bmp 63241 55036 65334 63241 55036 55036
ryg_t.concrete.cracked.01.bmp 378383 377831 309126 378383 377831 309126
ryg_t.bricks.05.bmp 506528 486679 375964 478709 478709 375964
ryg_t.bricks.02.bmp 554511 553719 465099 554511 553719 465099
ryg_t.aircondition.01.bmp 29960 29960 23398 20320 20320 20320
ryg_t.2d.pn02.bmp 29443 26025 27156 24750 24750 24750
kodak_24.bmp 705730 704710 572591 705730 704710 572591
kodak_23.bmp 557596 556804 483865 557596 556804 483865
kodak_22.bmp 701584 700576 580566 701584 700576 580566
kodak_21.bmp 680262 650956 547829 646806 646806 547829
kodak_20.bmp 505528 504796 439993 500885 500885 439993
kodak_19.bmp 671356 670396 545636 671356 670396 545636
kodak_18.bmp 780454 779326 631000 780454 779326 631000
kodak_17.bmp 624331 623431 510131 615723 615723 510131
kodak_16.bmp 573671 541748 481190 517978 517978 481190
kodak_15.bmp 612134 611258 516741 612134 611258 516741
kodak_14.bmp 739487 703036 590108 739487 703036 590108
kodak_13.bmp 890577 859429 688072 866745 859429 688072
kodak_12.bmp 569219 533864 477151 535591 533864 477151
kodak_11.bmp 635794 634882 519918 635794 634882 519918
kodak_10.bmp 593590 592738 500082 593590 592738 500082
kodak_09.bmp 583329 582489 493958 558418 558418 493958
kodak_08.bmp 787619 786491 611451 787619 786491 611451
kodak_07.bmp 566085 565281 486421 566085 565281 486421
kodak_06.bmp 667888 631478 540442 644928 631478 540442
kodak_05.bmp 807702 806538 638875 807702 806538 638875
kodak_04.bmp 637768 636856 532209 637768 636856 532209
kodak_03.bmp 540788 506321 464434 514336 506321 464434
kodak_02.bmp 617879 616991 508297 596342 596342 508297
kodak_01.bmp 779475 760251 588034 706982 706982 588034
bragzone_TULIPS.bmp 680124 679152 591881 680124 679152 591881
bragzone_SERRANO.bmp 153129 107167 107759 96932 96932 96932
bragzone_SAIL.bmp 807933 806769 623437 807933 806769 623437
bragzone_PEPPERS.bmp 426419 424748 376799 426419 424748 376799
bragzone_MONARCH.bmp 614974 614098 526754 614974 614098 526754
bragzone_LENA.bmp 474968 474251 476524 474968 474251 474251
bragzone_FRYMIRE.bmp 378228 252423 253967 230055 230055 230055
bragzone_clegg.bmp 483752 483056 495956 483752 483056 483056

There's no adv+loco because advpng and advmng both refuse to work on the "loco" bastardized semi-PNG.

BTW I should note that I should eat my hat a little bit over my own "PNG sucks" post. The thing is, yes basic PNG is easy to beat and it has a lot of mistakes in the design, but the basic idea is fine, they did a good job on the standard pretty quickly, but the thing that really seals the deal is that once you make a flexible open standard, people will step in and find ways to play with it, and while base PNG is pretty bag, PNG after optimization is not bad at all.

14 comments:

  1. This is the only lossless color conversion you can do that is not range expanding (eg. stays in bytes).

    Didn't you write up previously that there's a YCoCg that's reversible (you just have to do some kind of odd/even analysis to undo the rounding)?

    ReplyDelete
  2. " Didn't you write up previously that there's a YCoCg that's reversible (you just have to do some kind of odd/even analysis to undo the rounding)? "

    Yeah but it goes to 9 bits. You can deal with that either by being lossy in chroma (dividing the chroma by 2) , or by escaping. That is, YCoCg followed by a DPCM predictor will have a range of [-256,256] but will almost always fit in [-128,128] , so you could just save one value to indicate "one more byte" and write the extra rare byte when needed.

    BTW both of these modes would make good sense for PNG2. (lossy YCoCg in particular would be valuable).

    ReplyDelete
  3. "Didn't you write up previously that there's a YCoCg that's reversible (you just have to do some kind of odd/even analysis to undo the rounding)?"
    YCoCg or any other YUV-like transform (i.e. transforms that are basically rotations of RGB + rounding) have to increase the dynamic range if they're to be reversible. Your YUV code points are a uniformly spaced cubic lattice. The RGB color space you're trying to cover is a rotated cube relative to that lattice, and the AABB for any such rotated cube (which defines the range of output values) is always at least as large as the original cube. The S-transform trick can only get you so far - as long as you code the three channels independently, you need the extra range.

    ReplyDelete
  4. "Your YUV code points are a uniformly spaced cubic lattice. The RGB color space you're trying to cover is a rotated cube relative to that lattice, and the AABB for any such rotated cube (which defines the range of output values) is always at least as large as the original cube."

    Yeah, well said. There is actually a way around this. I think it's in the paper on Piecewise Linear Haar that I link here :

    http://cbloomrants.blogspot.com/2008/09/09-08-08-1.html

    http://cbloomrants.blogspot.com/2008/09/09-11-08-2.html

    I also note that YCoCg is just two Haars : Haar[R,B] then Haar[G,B]

    So you could do a non-expanding YCoCg by doing two PLHaars.

    Not useful in practice.

    ReplyDelete
  5. Oh yeah, that Haar transform was the thing I was thinking of that squeezed in the same range.

    Unless I'm misunderstanding ryg's response, I think his explanation is off the mark. (Although I don't know what "the S transform trick" is.)

    Technically LOCO is also range expanding (that is, it's range expanding under his argument); obviously R-G requires 9 bits. However you can do everything mod 256 without losing data. So it requires a deeper analysis than just "a rotated box is larger". Which is why I asked. :)

    ReplyDelete
  6. " Oh yeah, that Haar transform was the thing I was thinking of that squeezed in the same range."

    Well to be clear Haar does *not* make output values in the same range. There is a funny thing like Haar called PLHaar which nobody uses which does go into the same range.

    " Unless I'm misunderstanding ryg's response, I think his explanation is off the mark."

    Ryg is right he's just being not completely thorough. A Haar is a 45 degree rotation, which means the AABB gets bigger. That is true as he said. Of course you can take your funny diamond shape and pack it back into a square, but no *linear* transform will do it. In that sense if you pack it back into a square it's not really YCoCg any more.

    " Technically LOCO is also range expanding (that is, it's range expanding under his argument); "

    Yeah, I suppose so. The problem is not that YCoCg is range expanding - it's that it's range expanding and there's no simple operation to get back into a cube.

    In particular, geometrically LOCO is not a rotation, it's a shear. Imagine a square and shear up one side and shear down the other side. Obviously you can take the sheared square and cut it along a horizontal line and put the top under bottom. Now you have a square again. This is what the modulo does.

    ReplyDelete
  7. I should say a Haar is not actually a 45 degree rotation. A Hadamard is a 45 degree rotation which is volume expanding. A Haar is a Hadamard with a non-uniform scale (one axis gets /2). The Haar is not volume expanding, but is "range" expanding. Haar is the same as S Transform.

    The "PL Haar" and "Modulo transform" papers are both quite good, I encourage interested readers to follow up there.

    ReplyDelete
  8. Ok, but there are rotational transformations that pack a square perfectly, although they involve an additional scale and reshuffling (normally you see this as a rotated grid mapped to a square). Since the area of the two shapes is identical by definition it's probably equivalent to scaling one axis shorter and one axis longer. So yeah, I don't see how to get from A to B.

    To ryg's argument, I think it's that the AABB is a distraction. The "height" of any given slice of the rotated solid is longer than the original square's height without even looking at the bounds. (And the bounds are where the shear looks equally bad.)

    Or to make it concretely obvious, take some coordinate axis of tre transformed data after rotation, and transform that back through the inverse rotation, so it comes out to some diagonal in the RGB cube. If the pre-transform one is long enough to cover the whole line segment through the cube, it must have been longer than a side of the cube.

    The other question is whether LOCO is the best "simple" shear. For example, you could find a shear that maps the Y to a primary axis. LOCO is mapping the grays onto the G axis--a perfect gray comes out as pure G (IIRC). So you could have the R and B computation involve a YCoCg-like shifted version of G to reduce the dominance of G's effect on R and B.

    However, since you're lossless, you don't care about perceptual effects at all, you just care about modelling the data. So the 'shades of grey is most important'/'most neutral' might well be the better model, rather than something that optimizes for colors where Co=Cg=0 .

    ReplyDelete
  9. "Oh yeah, that Haar transform was the thing I was thinking of that squeezed in the same range."

    Sorry, I just realized I miswrote that, which is why it sounded dumb. That Haar transform was the thing I was thinking of that I *thought* at the time you meant squeezed them in the same range. I was assuming that the reason you were going to so much effort to recover the extra bit from (a+b)/2 instead of just storing (a+b) in the first place was because the whole thing fit in 16 bits as is, not because you were trying to keep it in 17 bits instead of 18. But if you can't actually store (a-b) mod 256, then yeah.

    ReplyDelete
  10. So, actually, it looks to me like the the description for YCoCg on the multimedia.cx wiki is written as a series of reversible liftings, although maybe I'm misunderstanding something...


    Co = R - B
    t = B + (Co >> 1)
    Cg = G - t
    Y = t + (Cg >> 1)


    and then


    t = Y - (Cg >> 1)
    G = Cg + t
    B = t - (Co >> 1)
    R = Co + B


    So isn't that not-range-expanding? I guess it's basically four shears.

    Also, since you allow per-line filtering, and as you said you can do the colorspace "conversion" in the filter, then you could have multiple ones and decide them in a fine-grained way.

    ReplyDelete
  11. "So isn't that not-range-expanding? I guess it's basically four shears."

    Of course that is range expanding.

    That is a standard "lifting" way to write lossless YCoCg. Constructing a transform through lifting ensures a transform is lossless, but tells you nothing about volume expansion or range expansion.

    "To ryg's argument, I think it's that the AABB is a distraction."

    No, the AABB is exactly the right thing to look at, because that tells you the range of each coordinate needed.

    "The other question is whether LOCO is the best simple shear."

    I'm pretty sure it's the *only* simple shear (other that obvious permutations of RGB).

    ReplyDelete
  12. Eh .. actually, this :

    Co = R - B
    t = B + (Co >> 1)

    is a Haar step in the lifting form. This is {t,Co} <- Haar(R,B)

    Haar written in this way actually can be made non-range-expanding of course, it's just

    Co = [ R - B ] mod 256
    t = [ B + (Co >> 1) ] mod 256

    t is supposed to be the "low" part of the Haar, it should be close to (R+B)/2 , but when the modulos kick in it can wind up being completely different than that which is annoying.

    I dunno maybe this might be a win once in a while.

    ReplyDelete
  13. I guess the crucial thing about LOCO is that the lowpass part (G) is not affected by the modulo in weird ways.

    The terrible thing about Modulo-YCoCg is that it can do weird things to the lowpass part.

    (BTW Modulo-YCoCg is known as "CFH" and is discussed in the literature)

    ReplyDelete
  14. Haar written in this way actually can be made non-range-expanding of course

    Right, that's what I meant, you just have to interpret the variables as implicitly 8-bit, sorry, I thought that was obvious.

    I'm pretty sure it's the *only* simple shear (other that obvious permutations of RGB).

    LOCO is something like:

    out[0] = R-G
    out[1] = G
    out[2] = B-G

    so there's all sorts of other simple reversible shears:

    out[0] = R-(G>>1)
    out[1] = G
    out[2] = B-(G>>2)

    etc. etc. I have no idea if there's any point to them, though.

    ReplyDelete