06-28-08 - 6

It's a damn shame that J2K didn't catch on. People still don't fucking get it. I'm reading about HD Photo and found this page where the guy rants about J2K and HD Photo and completely misses the point. (Of course, by missing the point, he makes the point: people didn't understand J2K.)

The awesome thing about J2K wasn't that it gave you slightly better quality at normal bit rates; obviously nobody gets that excited about having 40 dB PSNR instead of 38 at 4 bits per pixel. It was the embedded prefix property of wavelet bitstreams that made J2K a big improvement. That is, you could just output the lossless encoding. Want a smaller file? Just truncate it. This is so fucking rad for many applications, but people never got it, and it was never supported right.
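To make the "just truncate it" idea concrete, here's a toy sketch. It is not J2K: there's no wavelet transform and no entropy coder, just raw bit planes written most significant first. The function names and sample pixels are made up for illustration. The point is only that any prefix of the stream decodes to a coarser version of the same data.

```python
def encode_bitplanes(values, bits=8):
    """Emit bits most-significant plane first, so any prefix is a coarser version."""
    stream = []
    for plane in range(bits - 1, -1, -1):
        for v in values:
            stream.append((v >> plane) & 1)
    return stream


def decode_prefix(stream, count, bits=8):
    """Rebuild `count` values from however many bits survived truncation."""
    values = [0] * count
    for i, bit in enumerate(stream):
        plane = bits - 1 - i // count  # which bit plane this bit belongs to
        values[i % count] |= bit << plane
    return values


pixels = [200, 13, 97, 255, 128, 64]  # hypothetical 8-bit samples
full = encode_bitplanes(pixels)

# The whole stream is lossless.
lossless = decode_prefix(full, len(pixels))

# Chop the stream in half (top 4 planes only): same decoder, coarser pixels.
coarse = decode_prefix(full[: 4 * len(pixels)], len(pixels))
```

A real embedded coder orders the bits much more cleverly than this, but the truncation behavior is the same: the decoder never needs to know the file was cut short.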

One of the classic examples was digital cameras. There's no longer such a thing as a "capacity" of # of photos for cameras. You just keep taking photos. The first 100 or so are lossless. Then you take another and all your images lose a tiny bit of quality to make room. Take another, quality goes down another microscopic bit. If you want you could take 1000 crappy photos.
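Here's a rough sketch of how the camera could rebalance its storage each time you take a shot. The equal-share policy and the name `fit_to_card` are my assumptions for illustration; a real camera would more likely equalize quality across photos than bytes.

```python
def fit_to_card(lossless_sizes, capacity):
    """Pick a prefix length per photo so the total fits in `capacity` bytes.

    Photos smaller than their fair share stay lossless; the rest split
    whatever space is left evenly.
    """
    kept = [0] * len(lossless_sizes)
    remaining = capacity
    # Visit the smallest photos first so their unused share goes to bigger ones.
    order = sorted(range(len(lossless_sizes)), key=lambda i: lossless_sizes[i])
    for rank, i in enumerate(order):
        share = remaining // (len(order) - rank)
        kept[i] = min(lossless_sizes[i], share)
        remaining -= kept[i]
    return kept


# Three photos fit losslessly; a fourth makes every photo give up a little.
before = fit_to_card([100, 100, 100], 300)
after = fit_to_card([100, 100, 100, 100], 300)
```

Since every file is an embedded stream, "giving up a little" is just chopping bytes off the end of each file. No re-encoding.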

In terms of daily use, I would store nothing but lossless J2K on my machine. I never have to worry about picking a quality setting again. I never have to worry about saving a JPEG and then loading it to do more edits and compounding errors. I never have to make internet versions of files. When I want to upload something I just upload the J2K. Smart servers could just terminate the upload when they decide they have enough bits. Or they could accept the whole thing and only serve up a prefix depending on bandwidth. It's so fucking superior.

J2K could've been the in-camera format and it would've provided a lot of benefits even if you typically converted to JPEG when you pulled it onto your machine. There's no need for taking RAW photos if you take lossless J2K in camera. But consumers didn't really drive any demand for it, and camera makers had long pipelines making JPEG-based chips and had no need to devote all the extra engineering to implementing the more complicated encoder. The J2K encoder is a bit complex, and while it was intended for simple camera hardware it's not nearly as easy to encode as JPEG.

Another obvious one is automatic bandwidth customization of web pages. The web server can have a desired max load or something and when it's under stress it just sends smaller prefixes of the J2K files. You could also do really nice quick previews to make the web pages load very fast then pull in the rest of the bits.
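The server side of this could be about as dumb as the sketch below. The linear load-to-fraction mapping, the `min_fraction` floor, and the function name are all assumptions picked for illustration, not anything from a real web server.

```python
def prefix_length(file_size, load, min_fraction=0.1):
    """Bytes of an embedded stream to send when server load is `load` in [0, 1]."""
    load = min(max(load, 0.0), 1.0)  # clamp bad inputs
    # Idle server sends everything; a fully loaded one sends only min_fraction.
    fraction = min_fraction + (1.0 - load) * (1.0 - min_fraction)
    return min(file_size, max(1, round(file_size * fraction)))


idle = prefix_length(1000, 0.0)
busy = prefix_length(1000, 0.5)
slammed = prefix_length(1000, 1.0)
```

The same function covers the quick-preview case: send a small prefix first, then the rest of the bytes once the page is up.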

A few things killed J2K. Perhaps the biggest was the patent fuckups by all the retards who crammed too much in the standard; this also made the standard unnecessarily complex as everyone in the group tried to get their favorite technology piece in the standard. The other was lack of a good free library for encode & decode being made available quickly. The last was lack of consumer education.

BTW I'm not really a fan of the actual J2K standard; it was overly complex and contained too many different modes and options, like different transforms, lossy and lossless modes, etc. It should have just been one good single lifting reversible truncatable stream.

Bill Crow's HD Photo Blog has lots of goodies in it; it's got good intro material just about HD photography in general, not just on the format.

HD Photo could be a cool thing in various ways. The slightly better compression is pretty meh. The cool thing would be if it actually standardized the encoding of gamma, exposure, and HDR images. While the general public is not going to get this, if it's supported by the OS and the hardware (cameras) it can kind of happen automatically. The camera can tag the exposure information into the file, then when you work with it the OS knows your gamma, and all that junk can be tagged in automatically so that when somebody who actually knows their shit gets the file they can figure out the true meaning of the pixels.

Now, Bill Crow says some things on his blog that are just wrong and quite naive propaganda. He says that the way wavelets concentrate the error in high frequency areas is bad, and that somehow HD Photo's action of spreading error uniformly is better. That's way off. The whole idea of perceptual coding is that you want to put error where it's not noticed. One of the things that makes wavelets great is that they put error in the right place, and the error they introduce tends to be a smoothing, which is visually not annoying. In contrast, HD Photo seems to make blocky errors, and the tests I've seen indicate that HD Photo's human visual error is much worse at the same PSNR.

These guys at some Science news group seem to know WTF is going on.

There are some general things in HD Photo that are interesting to mention.

It does a "lapped" transform as a preprocess. If you search for "lapped image" there are tons of papers on this now. Lapped transforms are a reversible convolution that you can apply to any other transform, but they go best with block-based schemes like the standard JPEG DCT. Basically the convolution takes the lowest bit rate DC signal and changes it from being a bunch of blocks into being smooth bumps. Like instead of the 0th coefficient being an 8-pixel hard step, it's a 16-pixel wide smooth bump. There are a bunch of papers about sticking lapped pre & post processes on standard JPEG, and it improves perceptual error a lot without increasing computation time much at all. Any modern block-based coder should have something like a lapped transform in it.

(ADD : this is not true. In the ensuing years it is now well known that lapped transforms are shit and should not be used. Traditional block transforms + deblock filter are just better. The problem is that lapped transforms screw up your data in ways that you can never get back.)

HD Photo also uses one of the newer lossless "Y-Chroma" color transforms that's based on lifting operations. I think I wrote about this in Rants before. There are a lot of papers on this topic as well. More generally, you could make it so the encoder could write the lifting color transform out to the stream. This would only take a few bits and not cost anything really in the decoder. You can improve compression performance by optimizing the color transform for the given image; it takes a lot of computation so it wouldn't usually be done, maybe never.
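For reference, here's what a lifting-based lossless color transform looks like. This is YCoCg-R, a well-known member of that family; I'm not claiming it is bit-for-bit the transform HD Photo uses. Each step adds or subtracts a function of values the decoder already has, so the inverse just undoes the steps in reverse order and the integer round trip is exact.

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R: three lifting steps, integers in, integers out."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg


def ycocg_r_to_rgb(y, co, cg):
    """Inverse: undo each lifting step in the opposite order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Round trip is exact even though the shifts throw away bits on the way in.
sample = (255, 0, 0)
restored = ycocg_r_to_rgb(*rgb_to_ycocg_r(*sample))
```

The shifts round the same way in both directions, which is why the discarded bits don't matter. An encoder-chosen transform, as suggested above, would amount to writing a different set of lifting steps into the stream.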
