In practice it just doesn't work. I've tried lots of different lapping methods, and in all of them, if I make a parameterized lap amount based on a Kaiser-Bessel-derived window and then tweak the lap amount to maximize SSIM, it tunes to no lapping at all. Basically what's happening is that the extra bit rate cost of the forward lap scrambling things up is too great for the win of smoother basis functions on decompress to make up for. Obviously in a few contrived cases it does help, such as on very smooth images at very high compression. (Of course the large lap basis functions are a form of modeling - they help any time the image is smooth over the larger area, and hurt when it is not.)
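For reference, building the window is the easy part; it's roughly like this sketch, where the overlap length L stands in for the tunable lap amount (my exact parameterization isn't important) and beta is the usual Kaiser shape parameter:

    #include <cmath>
    #include <vector>

    // Modified Bessel function of the first kind, order zero (power series).
    static double bessel_i0(double x)
    {
        double sum = 1.0, term = 1.0;
        for (int k = 1; k < 64; ++k)
        {
            term *= (x * x) / (4.0 * double(k) * double(k));
            sum  += term;
            if (term < 1e-14 * sum) break;
        }
        return sum;
    }

    // Rising half of a Kaiser-Bessel-derived window; the falling half is the
    // mirror image. L is the lap amount : how many samples on each side of a
    // block boundary take part in the overlap (L = 0 means no lapping at all).
    std::vector<double> kbd_rise(int L, double beta)
    {
        if (L <= 0) return std::vector<double>();  // no lapping

        std::vector<double> kaiser(L + 1);
        for (int j = 0; j <= L; ++j)
        {
            double r = 2.0 * j / L - 1.0;  // in [-1,+1]
            kaiser[j] = bessel_i0(beta * std::sqrt(1.0 - r * r));
        }

        double total = 0.0;
        for (double w : kaiser) total += w;

        std::vector<double> rise(L);
        double cum = 0.0;
        for (int n = 0; n < L; ++n)
        {
            cum += kaiser[n];
            // Princen-Bradley : rise[n]^2 + rise[L-1-n]^2 == 1 , so the
            // overlapped halves sum to unit energy across the boundary.
            rise[n] = std::sqrt(cum / total);
        }
        return rise;
    }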
The really maddening thing about this is that areas where the image is very smooth over a large area are the cases we already handle very well!! Yeah, sure, naive JPEG looks awful there, but even a simple deblocking filter after decompress can fix that case very easily. In areas that aren't smooth, lapping actually makes artifacts like ringing worse.
The other issue is that I'm having a little trouble with Lagrange bitstream optimization. Basically my DCT block coder does a form of "trellis quantization" (which I wrote about before) where it can selectively zero coefficients when it decides that doing so is an R/D win. Obviously this gives you a nice RMSE win at a given rate (by design - any time it finds a coefficient to zero, it steps up the R/D slope). But what does this actually do?
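Concretely, the core of the kill decision looks something like this greedy sketch; real trellis quantization also tracks how zeroing one coefficient changes the run-length coding cost of its neighbors (which is where the actual trellis/dynamic programming comes in), and bits_for() here is just a stand-in for the entropy coder's rate model:

    // Greedy per-coefficient version of the kill decision : zero a quantized
    // coefficient whenever the Lagrange cost J = D + lambda * R goes down,
    // that is, whenever the rate saved pays for the distortion added.
    double bits_for(int qval);   // assumed hook : bits to code quantized value qval

    void kill_coefficients(int * qcoef, const float * dct, int n,
                           float quantizer, double lambda)
    {
        for (int i = 1; i < n; ++i)   // leave DC alone
        {
            if (qcoef[i] == 0) continue;

            float  recon  = qcoef[i] * quantizer;
            double d_keep = double(dct[i] - recon) * double(dct[i] - recon);
            double d_zero = double(dct[i]) * double(dct[i]);
            double r_save = bits_for(qcoef[i]) - bits_for(0);

            // rate saved outweighs distortion added -> step up the R/D slope
            if (d_zero - d_keep < lambda * r_save)
                qcoef[i] = 0;
        }
    }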
Think about trying to make the best bit stream for a given rate - say two bits per pixel. If we don't do any Lagrange optimization at all, we might pick some quantizer, say Q = 16. Now we turn on Lagrange optimization: it finds some coefficients to zero, which reduces the bit rate, so to get back to the target bit rate we can use a lower quantizer. It searches for the right Lagrange lambda by iterating a few times, and we wind up with something like Q = 12, some values zeroed, and a better RMSE. What's happened is we got to use a lower quantizer, so we made more, larger, nonzero coefficients, and then we selectively zeroed the few that cost the most rate for the least distortion.
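The outer iteration is just a 1-D search on lambda; schematically something like this bisection, where encode_at() is a stand-in for running the whole quantize-and-kill pass (including re-picking Q) at a given lambda and reporting the rate it produced:

    // Assumed hook : full optimized encode at this lambda, returns bits/pixel.
    double encode_at(double lambda);

    // Bisect for the lambda that lands on the target rate. Rate is
    // monotonically decreasing in lambda (bigger lambda = rate is worth
    // more = fewer bits), which is what makes simple bisection valid.
    double find_lambda(double target_bpp)
    {
        double lo = 0.0, hi = 4096.0;   // assumed bracketing range
        for (int iter = 0; iter < 16; ++iter)
        {
            double mid = 0.5 * (lo + hi);
            if (encode_at(mid) > target_bpp)
                lo = mid;   // still too many bits -> be more aggressive
            else
                hi = mid;   // under budget -> back off
        }
        return 0.5 * (lo + hi);
    }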
But what does this actually do to the image qualitatively? It increases the quality everywhere (Q = 16 goes to Q = 12), but then it stomps on the quality in a few isolated spots (where the trellis quantization zeroed coefficients). If you compare the two images, the Lagrange-optimized one looks better everywhere, but is very smooth and blurred out in a few spots. Normally this is not a big deal and it's just a win, but sometimes I've found it actually looks really awful.
Even if you optimize for some perceptual metric like SSIM, it doesn't detect how bad this is, because SSIM is still a local measurement and this is a nonlocal artifact. Your eyes very quickly pick out that one part of the image has been blurred way more than the rest of it. (In other cases it does the same thing but it's actually good; it acts sort of like a bilateral filter - it gives bits to the high contrast edges and kills coefficients in the texture, so for images of skin it does a nice job of keeping the edges sharp and just smoothing out the interior, as opposed to non-Lagrange-optimized JPEG, which allocates bits equally and will preserve the skin pore detail but make the edges all ringy and chopped up.)
I guess the fix to this is some hacky/heuristic way to force the Lagrange optimization not to be too aggressive - something like the sketch below.
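For example, keep the normal Lagrange test but cap how much total distortion the coefficient killer is allowed to add in any one block, so no single spot gets smoothed way more than its surroundings. Something like this, where max_extra is a made-up per-block knob:

    // Damped kill test : same R/D criterion as before, but refuse to kill
    // once this block has already absorbed its distortion budget.
    // block_extra accumulates the added distortion over one block.
    bool allow_kill(double d_zero, double d_keep, double r_save,
                    double lambda, double & block_extra, double max_extra)
    {
        double extra = d_zero - d_keep;
        if (extra >= lambda * r_save)         return false; // no R/D win anyway
        if (block_extra + extra > max_extra)  return false; // block smoothed enough
        block_extra += extra;
        return true;
    }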
I guess this is also an example of a general problem that I've observed many times in various forms: when you let a very aggressive optimizer run wild seeking a path that maximizes some metric, it will do so, and if your metric does not perfectly measure exactly the thing you actually want to optimize, you can get some very strange/bad results.