First of all, the R/D trellis quantization issue. Very roughly, what we're doing here is coding to a certain bit rate. The "RDO" lets us use a smaller quantization bucket size, which initially lowers distortion and increases our rate; then we hammer on some of the values (mainly we just force them to zero), which adds some distortion and decreases rate. We choose to hammer the values that save us the most rate per unit of distortion. (99% of the time all you're doing is turning 1's into 0's, so it's a matter of picking the 1 whose squashing to 0 saves you the most rate.)
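A minimal sketch of the greedy version of this idea. Everything here is an assumption for illustration: `rate_bits` is a made-up rate model (not a real entropy coder), distortion is squared error in the coefficient domain, and `greedy_zeroing` is a hypothetical helper. A real trellis quantizer also accounts for context-dependent rates, which this ignores.

```python
import math

def rate_bits(level):
    # Hypothetical toy rate model: zeros are cheap, bigger magnitudes cost more.
    if level == 0:
        return 0.5
    return 2.0 + 2.0 * math.log2(abs(level) + 1)

def quantize(coeffs, step):
    # Plain uniform quantizer with a small-ish step (the "RDO" starting point).
    return [int(round(c / step)) for c in coeffs]

def greedy_zeroing(coeffs, step, target_bits):
    """Force nonzero levels to zero, best rate-saved per distortion-added first,
    until the (toy) rate fits target_bits."""
    levels = quantize(coeffs, step)
    total = sum(rate_bits(l) for l in levels)
    while total > target_bits:
        best, best_ratio = None, -1.0
        for i, (c, l) in enumerate(zip(coeffs, levels)):
            if l == 0:
                continue
            saved = rate_bits(l) - rate_bits(0)
            added = c * c - (c - l * step) ** 2  # squared-error increase from zeroing
            ratio = saved / max(added, 1e-9)
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break  # nothing left to zero
        total -= rate_bits(levels[best]) - rate_bits(0)
        levels[best] = 0
    return levels, total
```

With `greedy_zeroing([9.0, 4.2, 3.9, -4.4, 0.3], 4.0, 15.0)`, the three level-1 coefficients all save the same rate, so the one costing the least distortion (3.9, closest to the rounding boundary) is squashed first.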
Here are the results on "Moses" at 0.5 bits per pixel :
No R/D : RMSE = 9.9381 :
Unconstrained R/D : RMSE = 9.7843 :
You should be able to see in the R/D image that some of the image looks better, but other parts look much worse. The RDO has stolen rate from places where it was expensive in terms of rate to encode a certain distortion, and moved those bits to parts of the image where you can get more distortion win at a cheaper rate. This is awesome if your goal is to minimize RMSE, but it's unclear to me whether this is *ever* good perceptually.
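The "stealing" falls out of the usual Lagrangian formulation of rate allocation. This is the textbook form, assumed here rather than taken from the coder above: minimizing total distortion under a rate budget is relaxed to minimizing J, and (for convex per-block curves) at the optimum every block sits at the same distortion-vs-rate slope λ.

```latex
\min_{\{R_i\}} \sum_i D_i(R_i) \;\; \text{s.t.} \;\; \sum_i R_i \le R_{\text{target}}
\qquad \longrightarrow \qquad
\min_{\{R_i\}} \; J = \sum_i \bigl( D_i(R_i) + \lambda R_i \bigr)

\text{at the optimum (convex } D_i\text{):} \qquad -\frac{\partial D_i}{\partial R_i} = \lambda \quad \text{for every block } i
```

A block where extra bits buy little distortion reduction (shallow slope) gives bits up until its slope reaches λ, and a block where bits buy a lot takes them. Nothing in J knows which of those errors the eye actually cares about.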
In this particular case, the RDO Moses image actually has a worse SSIM than the No-RD image; this type of mistake is actually something that SSIM is okay at detecting.
In practice I use some hacks to limit how much the RDO can do to any one block. With those hacks I almost always get an SSIM improvement from RDO, but it's still unclear to me whether it's actually a perceptual improvement on many images. In some cases it's a very clear win: on images like kodim09 or kodim20, which have big flat patches in some spots and a lot of edge detail in others, the RDO does a good job of stealing bits from the flats to give to the edges. The eye likes that, because we don't mind if an almost perfectly smooth area becomes perfectly smooth.
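The hacks themselves aren't spelled out here. One possible hack of this kind, sketched purely as an assumption: cap the number of coefficients the RDO may zero in any single block. `max_kills_per_block`, `capped_greedy_zeroing`, and the toy `rate_bits` model are all hypothetical.

```python
import math

def rate_bits(level):
    # Hypothetical toy rate model, not a real entropy coder.
    return 0.5 if level == 0 else 2.0 + 2.0 * math.log2(abs(level) + 1)

def capped_greedy_zeroing(blocks, step, target_bits, max_kills_per_block):
    """Greedy RDO zeroing across all blocks (best rate saved per squared-error
    added first), but no block may lose more than max_kills_per_block levels."""
    levels = [[int(round(c / step)) for c in blk] for blk in blocks]
    kills = [0] * len(blocks)
    total = sum(rate_bits(l) for lv in levels for l in lv)
    while total > target_bits:
        best, best_ratio = None, -1.0
        for b, (blk, lv) in enumerate(zip(blocks, levels)):
            if kills[b] >= max_kills_per_block:
                continue  # this block has already been hammered enough
            for i, (c, l) in enumerate(zip(blk, lv)):
                if l == 0:
                    continue
                saved = rate_bits(l) - rate_bits(0)
                added = c * c - (c - l * step) ** 2
                ratio = saved / max(added, 1e-9)
                if ratio > best_ratio:
                    best, best_ratio = (b, i), ratio
        if best is None:
            break  # every block hit its cap; the rate target may be missed
        b, i = best
        total -= rate_bits(levels[b][i]) - rate_bits(0)
        levels[b][i] = 0
        kills[b] += 1
    return levels, total
```

The trade-off of a cap like this is that the rate target can be missed, so a real coder would have to fall back to a coarser quantizer instead of letting the RDO wipe out a block.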
Now for the hacky perceptual smooth DC issue.
This is "kodim04" at 0.25 bpp ; no RDO ; no unblock , no perceptual DC quantization ; basically a naive DCT coder :
Now we turn on the hacky perceptual quantization that gives more precision to smooth DC's : (unblock still off) :
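The exact perceptual DC hack isn't described. A minimal sketch of one way to do it, assuming "smooth" is measured by the block's AC energy; `perceptual_dc_step`, `min_scale`, and `activity_ref` are all hypothetical. One constraint does hold for any such scheme: the decoder must be able to recompute the DC step, so the activity should be measured on the quantized AC values the decoder actually sees, not on the original coefficients.

```python
def perceptual_dc_step(ac_values, base_dc_step, min_scale=0.5, activity_ref=64.0):
    """Hypothetical: shrink the DC quantizer (more DC precision) for smooth blocks.

    ac_values should be the dequantized AC coefficients, so the decoder can
    reproduce exactly the same step."""
    ac_energy = sum(c * c for c in ac_values)
    # Smooth block (low AC energy) -> scale toward min_scale; busy block -> 1.0.
    scale = min(1.0, max(min_scale, ac_energy / activity_ref))
    return base_dc_step * scale

def quantize_dc(dc, ac_values, base_dc_step):
    step = perceptual_dc_step(ac_values, base_dc_step)
    return int(round(dc / step)), step
```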
Note that the perceptual DC quant means we are spending more of our bit rate on the DC band, so we give fewer bits to AC, which means using a larger AC quantizer to meet the bit rate constraint.
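Meeting the rate constraint can be sketched as a bisection on the AC quantizer step: whatever the DC band spends comes out of the budget, and the AC step grows until the AC rate fits what's left. `rate_bits`, `ac_bits`, and `find_ac_step` are hypothetical and the rate model is a toy. The bisection is valid because with rounding quantization each |level| is non-increasing in the step, so the total rate is too.

```python
import math

def rate_bits(level):
    # Hypothetical toy rate model, not a real entropy coder.
    return 0.5 if level == 0 else 2.0 + 2.0 * math.log2(abs(level) + 1)

def ac_bits(ac_coeffs, step):
    return sum(rate_bits(int(round(c / step))) for c in ac_coeffs)

def find_ac_step(ac_coeffs, ac_budget_bits, lo=0.5, hi=1024.0, iters=40):
    """Bisect for (approximately) the smallest AC step whose rate fits the budget."""
    if ac_bits(ac_coeffs, hi) > ac_budget_bits:
        return hi  # budget can't be met even at the coarsest step
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ac_bits(ac_coeffs, mid) <= ac_budget_bits:
            hi = mid  # fits: try a finer step
        else:
            lo = mid  # too many bits: go coarser
    return hi
```

If the DC band takes more of a fixed total budget, `ac_budget_bits` shrinks and the returned AC step can only stay the same or grow, which is the effect described above.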
Now with unblocking , no perceptual DC quant : (RMSE = 12.8565 , SSIM = 58.62%)
With unblocking and perceptual DC quant : (RMSE = 12.9666, SSIM = 57.88%)
I think the improvement is clearest on the unblocked images: the perceptual DC quant one actually looks okay, and the parts that are supposed to be smooth still look smooth. The one with uniform DC quant looks disgustingly bumpy. Note that the SSIM of the better image is actually quite a bit worse (57.88% vs. 58.62%). Of course RMSE gets worse any time you make a perceptual improvement like this. You should also be able to see that the detail in the hat thatching is better in the nonperceptual version, but that doesn't bother the eye nearly as much as broken smoothness.
ADDENDUM : some close-up pictures of Moses' wattle area showing the R/D artifacts better. You should zoom these to full screen with a box filter and toggle between them to see the difference most clearly. You should see the RDO killing blocks in the collar area very clearly. All you really need to do is look at the last of these four pictures and you should be able to see what I'm talking about with the RDO :
Portion of Moses at 0.75 bpp : No Lagrange optimization :
With Lagrange RDO :
Crop of No-L :
Crop of RDO :