You make some lossy coder which has various modes and decisions and quantization options. These affect both R and D on a micro scale. You want to run an automated RDO that can in theory try everything.
In practice this creates problems. The RDO winds up doing funny things. It is technically finding the best D for the desired R, but that winds up looking bad.
One issue is that to make RDO efficient we use a simplified D which is not the whole perceptual error.
Often we're measuring D locally (eg. on one block at a time) but there are non-local effects. Either far apart in the frame, or over time.
For example, RDO will often take away too many bits from part of the frame to put them on another, which creates an ugly variation of quality. RDO can create alternating smooth and detailed blocks when they should be uniform. In video, temporal variation can be even worse, with blocks flickering as RDO makes big changes. Even though it was doing the right thing in terms of D on each individual block, it winds up looking terrible.
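Here is a toy sketch of the alternating smooth/detailed failure. It assumes the usual Lagrangian form of RDO, where each block independently picks the mode that minimizes D + lambda*R. The helper `pick_mode`, the mode names, and all the rate and distortion numbers are made up for illustration, not taken from any real coder.

```python
def pick_mode(modes, lam):
    """Return the mode name minimizing J = D + lam * R for one block.

    modes: dict mapping mode name -> (rate_bits, distortion).
    Assumption: plain per-block Lagrangian cost, no neighbor awareness.
    """
    return min(modes, key=lambda m: modes[m][1] + lam * modes[m][0])

# Four neighboring blocks with nearly identical content (hypothetical numbers).
# "smooth" is cheap, "detailed" costs 8 more bits for roughly 8 less distortion.
row = [
    {"smooth": (2, 20.0), "detailed": (10, 11.9)},
    {"smooth": (2, 20.0), "detailed": (10, 12.1)},
    {"smooth": (2, 20.0), "detailed": (10, 11.8)},
    {"smooth": (2, 20.0), "detailed": (10, 12.2)},
]

choices = [pick_mode(block, lam=1.0) for block in row]
print(choices)
```

Each block's choice is correct in terms of its own J, and the J values differ by only about 0.1 to 0.2, but the decisions flip from block to block because there is nothing in between the two modes.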
Some principles that prevent this from happening:
1. There should be a lot of fine steps in R (and D). There should be coding opportunities where you can add a bit or half a bit and get some more quality. Big steps are bad, they cause blocks (either with spatial or temporal shifts) to jump a lot in R & D.
A common problem case is in video residuals, the jumps from "zero movec, zero residual" to "some movec, zero residual" to "some movec, some residual" can easily be very discontinuous if you're not careful (big steps in R) which leads to nasty chunkiness under RDO.
2. Big steps are also bad for local optimization. Assuming you aren't doing a full-search RDO, you want to make a nicely searchable coding space that has smooth and small steps as you vary coding choices.
3. The R(D) (or D(R)) curve should be as smooth as possible, and of course monotonic. Globally, the RDO that you do should result in that. But it should also be true locally (eg. per block) as much as possible.
4. Similar D's should have similar R's. (and vice versa, but it's harder to intuit the other way around). If there are two quite different ways of making a similar distortion number (but different looking) - those should have a similar R. If not, then your coder is visually biased.
eg. if horizontal noise is cheaper to code than vertical noise (at the same D), the RD optimization will kill one but not the other, and the image will visually change. It will appear to smooth in one direction.
Of course this has to be balanced against entropy - if the types of distortion are not equally probable, they must have different bit rates. But it should not be any more than necessary, which is a common flaw. Often rare distortion gets a codeword that is too long, but people don't care much because it's rare, the result being that it just gets killed by RDO.
Part of the problem here is that most coders (such as DCT and z-scan) are based on an implicit model of the image which is built on a global average ideal image. By the time you are coding a given block, you often have enough information to know that this particular image does not match the ideal, but we don't compensate correctly.
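A small sketch of the horizontal vs. vertical bias from point 4. It assumes a Lagrangian keep/kill decision (keep a detail if its distortion gain exceeds lambda times its bit cost) and uses the entropy ideal of -log2(p) bits for an event of probability p. The probabilities, codeword lengths and distortion gain are invented for illustration; `ideal_bits` and `keeps_detail` are hypothetical helpers.

```python
import math

def ideal_bits(p):
    """Entropy-ideal code length in bits for an event of probability p."""
    return -math.log2(p)

def keeps_detail(d_gain, bits, lam):
    """Lagrangian keep/kill: keep the detail if it lowers J = D + lam * R."""
    return d_gain > lam * bits

lam = 1.0
d_gain = 5.0  # both kinds of detail remove the same amount of D (assumed)

horizontal_bits = ideal_bits(0.25)       # common case, coded at its ideal length
vertical_ideal = ideal_bits(0.0625)      # rarer case, ideal length
vertical_actual = 7                      # hypothetical too-long codeword

print(keeps_detail(d_gain, horizontal_bits, lam))  # horizontal detail
print(keeps_detail(d_gain, vertical_ideal, lam))   # vertical, ideal code
print(keeps_detail(d_gain, vertical_actual, lam))  # vertical, actual code
```

With the ideal code lengths both details survive. With the over-long codeword the vertical detail gets killed, even though the extra 3 bits are the only thing that changed.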
You can think about it this way - you start at 0 bits. You gradually increase the rate and give more bits out. Under RDO, you give each bit out to the place where it gives you the biggest decrease in D. At each step as you do this, there should be lots of places where you can get that good D. It should be available all over the frame with just slightly different D's. If there are big steps in your coding scheme, it will only be in a few places.
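The hand-out-bits picture can be sketched as a greedy allocator. This is a minimal illustration under stated assumptions, not how any particular coder does it: each block is given as a list of (rate, distortion) operating points with strictly increasing rate, and `greedy_allocate` is a hypothetical helper that always takes the step with the best D decrease per bit.

```python
import heapq

def greedy_allocate(blocks, budget):
    """Greedily spend bits on the step with the largest D drop per bit.

    blocks: list of lists of (rate, distortion), rate strictly increasing.
    Returns (chosen point index per block, total bits spent).
    """
    choice = [0] * len(blocks)
    spent = sum(points[0][0] for points in blocks)
    heap = []

    def push_next_step(i):
        k = choice[i]
        if k + 1 < len(blocks[i]):
            r0, d0 = blocks[i][k]
            r1, d1 = blocks[i][k + 1]
            slope = (d0 - d1) / (r1 - r0)  # distortion removed per bit
            heapq.heappush(heap, (-slope, i))

    for i in range(len(blocks)):
        push_next_step(i)

    while heap:
        _, i = heapq.heappop(heap)
        step = blocks[i][choice[i] + 1][0] - blocks[i][choice[i]][0]
        if spent + step > budget:
            continue  # step does not fit; this block gets no more bits
        spent += step
        choice[i] += 1
        push_next_step(i)
    return choice, spent

blocks = [
    [(0, 100), (1, 60), (2, 40), (3, 30)],  # fine steps (made-up numbers)
    [(0, 100), (4, 20)],                    # one big step (made-up numbers)
]
choice, spent = greedy_allocate(blocks, budget=4)
print(choice, spent)
```

In this toy data the block with a single 4-bit step never gets served at a budget of 4: the greedy pass ends with total D of 30 + 100 = 130, while spending all 4 bits on the big-step block would have given 100 + 20 = 120. With fine steps everywhere, the greedy choice and the best choice stay close together.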