JPEG2 proposal / rough principles :

1. Simple simple simple. The decoder should be implementable in ~5000 lines as a single file stb.h style header. Keep it simple!

2. It should be losslessly transcodable from JPEG, a la packJPG/Lepton. That is, JPEG1 should be contained as a subset (this just means having an 8x8 DCT mode and quantization matrices). You could have other block modes in JPEG2 that simply aren't used when you transcode JPEG. You replace the entropy coded back-end with the JPEG2 one and should get about 20% file size reduction.

IMO this is crucial for rolling out a new format; nobody should ever be transcoding existing JPEGs and thereby introducing new error.

3. Reasonably fast to decode. Slower than JPEG1 by maybe 2X is okay, but not by 10X. eg. JPEG-ANS is okay, JPEG-Ari is probably not okay. Also think about parallelism and GPU decoding for huge images (100 MP). Keeping decoding local is important (eg. each 32x32 block or so should be independently decodable).

4. Decent quality encoding without crazy optimizing encoders. The straightforward encode without big R-D optimizing searches should still beat JPEG.

5. Support for per-block Q , so that sophisticated encoders can do bit rate allocation.

6. Support alpha, HDR. Make a clean definition of color space and gamma. But *don't* go crazy with supporting ICC profiles and lots of bit depths and so on. This needs to be the smallest possible set of features. You don't want to get into the situation that's so common where the format is too complex and nobody actually supports it right in practice, so you end up with a "spec standard" and a "de-facto standard" that doesn't parse lots of the optional modes correctly.

7. Support larger blocks & non-square blocks; certainly 16x16 , maybe 32x32 ? Things like 16x8 , etc. This is important for increasingly large images.
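The locality requirement in point 3 can be sketched as a container where every tile is coded on its own and a small length table lets a decoder jump straight to any tile. This is only an illustration under assumptions: the header layout, the 32-pixel `TILE` size, and the function names are all hypothetical, and `zlib` is a stand-in for whatever entropy coder the format would really use.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor

TILE = 32  # hypothetical tile size, from the "32x32 block or so" in point 3


def split_tiles(pixels, width, height, tile=TILE):
    """Yield the raw bytes of each tile of a row-major 8-bit grayscale image."""
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            x_end = min(tx + tile, width)
            rows = [pixels[y * width + tx : y * width + x_end]
                    for y in range(ty, min(ty + tile, height))]
            yield b"".join(rows)


def encode(pixels, width, height, tile=TILE):
    """Compress every tile independently and prepend a table of payload lengths."""
    payloads = [zlib.compress(t) for t in split_tiles(pixels, width, height, tile)]
    header = struct.pack("<III", width, height, len(payloads))
    index = struct.pack(f"<{len(payloads)}I", *(len(p) for p in payloads))
    return header + index + b"".join(payloads)


def tile_spans(blob):
    """Read the header and length table only; no tile data is decoded here."""
    width, height, count = struct.unpack_from("<III", blob, 0)
    lengths = struct.unpack_from(f"<{count}I", blob, 12)
    start = 12 + 4 * count
    spans = []
    for n in lengths:
        spans.append((start, start + n))
        start += n
    return width, height, spans


def decode_tile(blob, span):
    """Decode one tile without touching any other tile's bytes."""
    start, end = span
    return zlib.decompress(blob[start:end])


def decode(blob, tile=TILE):
    """Decode all tiles concurrently, then copy them back into one image."""
    width, height, spans = tile_spans(blob)
    with ThreadPoolExecutor() as pool:
        tiles = list(pool.map(lambda s: decode_tile(blob, s), spans))
    out = bytearray(width * height)
    i = 0
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            w = min(tx + tile, width) - tx
            for r in range(min(ty + tile, height) - ty):
                row = (ty + r) * width + tx
                out[row : row + w] = tiles[i][r * w : (r + 1) * w]
            i += 1
    return bytes(out)
```

Because `decode_tile` needs only the length table and its own span, a viewer could decode just the tiles on screen, and the same layout would map onto one GPU workgroup per tile.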

Most of all keep it simple, keep it close to JPEG, because JPEG actually works and basically everything else in lossy image compression doesn't.

Anything that's not just DCT + quantize + entropy is IMO a big mistake, very suspicious and likely to be vaporware in the sense that you can make it look good on paper but it won't work well in reality.
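For reference, the DCT + quantize part of that pipeline is small enough to sketch in a few lines, including the per-block Q from point 5. This is a rough illustration, not a proposed design: `QMATRIX` is a made-up ramp (not the JPEG standard table), `block_scale` is a hypothetical name for the per-block Q multiplier, and the transform is a plain floating-point orthonormal DCT rather than a fast integer one.

```python
import math

N = 8

# Orthonormal DCT-II basis: C[k][n] = a(k) * cos((2n + 1) * k * pi / (2N))
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N))
      for n in range(N)]
     for k in range(N)]

# Hypothetical quantization matrix: coarser steps at higher frequencies.
QMATRIX = [[8 + 2 * (i + j) for j in range(N)] for i in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct2(block):
    """Separable 2D DCT: C * X * C^T."""
    return matmul(matmul(C, block), transpose(C))


def idct2(coef):
    """Inverse of dct2; C is orthonormal, so its inverse is its transpose."""
    return matmul(matmul(transpose(C), coef), C)


def quantize(coef, qmatrix, block_scale=1.0):
    """Divide by the step size (matrix entry times per-block scale) and round."""
    return [[round(coef[i][j] / (qmatrix[i][j] * block_scale)) for j in range(N)]
            for i in range(N)]


def dequantize(levels, qmatrix, block_scale=1.0):
    """Multiply the integer levels back up by the same step size."""
    return [[levels[i][j] * qmatrix[i][j] * block_scale for j in range(N)]
            for i in range(N)]
```

The integer `levels` are what the entropy coder sees. Under the assumptions here, a lossless transcode from JPEG would leave `block_scale` at 1 everywhere and carry the existing levels across untouched, while a smarter encoder could raise `block_scale` on busy blocks to spend fewer bits there.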


I have in the past posted many times about how plain old baseline JPEG + decent back-end entropy (eg. packJPG/Lepton) is surprisingly competitive with almost every modern image codec.

That's actually quite surprising.

The issue is that baseline JPEG is doing *zero* R-D optimization. Even if you use something like mozjpeg, which does a bit of R-D optimization, it's doing it for the *wrong* rate model (it assumes baseline JPEG coding, not the packJPG back-end I then actually use).

It's well known that doing R-D optimization correctly (with the right rate model) provides absolutely enormous wins in lossy compression, so the fact that baseline JPEG + packJPG without any R-D at all can perform so well is really an indictment of everything it beats. This tells us there is a lot of room for easy improvement.
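To make the "right rate model" point concrete, here is a toy R-D decision for a single coefficient: pick the quantized level that minimizes distortion plus lambda times rate. Both rate functions are invented for illustration (they are not measurements of baseline JPEG or packJPG), and `rd_quantize` is a hypothetical helper, not anything mozjpeg actually exposes.

```python
import math


def rate_model_a(level):
    """Hypothetical bit cost where a zero is only somewhat cheaper than a one."""
    return 1.0 if level == 0 else 3.0 + 2.0 * math.log2(abs(level))


def rate_model_b(level):
    """Hypothetical bit cost where zeros are nearly free, as with a strong
    context-modeling back-end."""
    return 0.2 if level == 0 else 5.0 + 2.0 * math.log2(abs(level))


def rd_quantize(coef, q, lam, rate):
    """Choose among the nearest level, one step toward zero, and zero itself,
    minimizing squared error + lam * rate(level)."""
    nearest = round(coef / q)
    step = 1 if nearest > 0 else -1 if nearest < 0 else 0
    candidates = sorted({nearest, nearest - step, 0}, key=abs)

    def cost(level):
        return (coef - level * q) ** 2 + lam * rate(level)

    # min() keeps the first minimum, so ties go to the smaller magnitude.
    return min(candidates, key=cost)


# Same coefficient, same lambda, different rate models, different decisions.
kept = rd_quantize(9.0, 16.0, 10.0, rate_model_a)     # cost 79 vs 91: keeps 1
dropped = rd_quantize(9.0, 16.0, 10.0, rate_model_b)  # cost 99 vs 83: picks 0
```

An encoder tuned against the first model makes choices that are simply wrong for a back-end that behaves like the second, which is the mismatch described above.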

1 comment:

senior noise said...

Absolutely that's the way to go.

I'd add: support progressive rendering or hierarchical encoding, so that the full image doesn't have to be decoded to display just a thumbnail.
