Comments on cbloom rants: 04-07-10 - Video

---

cbloom (2010-04-12 07:36):

Yeah, I agree with that mostly.

(Also, BTW, the gradient-of-D term is most interesting if it also includes a temporal gradient.)

One problem is that the improved target function you want to optimize is itself just a heuristic. Even in the grad-D case you get into "what should I tweak lambda2 to be so it looks good?". The most extreme case is x264, where the SATD target function is a very strange heuristic that sort of happens to work out well due to the structure of the coder.

I also think that the quality goal is just too complex to express and use in a practical way. But in that case, even if you can't actually use it in your R/D optimizer, you could use it as a verification pass to make sure you actually are optimizing something concrete.

---

ryg (2010-04-12 01:02):

"Don't squared distortion metrics already achieve this, encouraging a relatively level amount of error throughout the image, and without introducing a second parameter to optimize on?"

They reduce overall error variance, but they don't care at all about spatial variation.
For a squared distortion metric, a block with error 100 is a block with error 100 no matter where it appears in the image; if you include the gradient term, it's suddenly a lot cheaper to place that block in a neighborhood with average error around 90 than in a neighborhood with blocks of average error around 20.

I agree that introducing an extra parameter is problematic; in practice you'll want to choose lambda2 as some function of lambda and optimize for only one parameter.

However, that's not my main point. The idea is that you can view the whole process of "do RD optimization, then apply some heuristics" as "optimize a simplified version of the target function to get an initial solution, then iteratively perform local changes to optimize the full target function". If you can write a heuristic as a term in the target function (even if a somewhat unwieldy one), you're in good shape: doing so makes it much clearer what the model is, and there might even be a way to optimize it directly. The other way round is just as interesting: any term that is difficult to incorporate in a full optimization round but easy to evaluate "incrementally" (i.e. "does this change reduce overall error?") translates to a local post-optimization test in the code. But unlike arbitrary heuristics, in this case you know it's actually improving some well-defined target function. That's much better than just knowing that "this video looks a bit better when I turn this on".

---

Anonymous (2010-04-11 15:06):

"minimize R + lambda*D + lambda2*||grad(D)|| instead (using the spatial "gradient", in this case distortion differences between neighboring blocks). The higher lambda2 gets, the less attractive isolated blocks with much higher distortion than their surroundings become."

Don't squared distortion metrics already achieve this, encouraging a relatively level amount of error throughout the image, and without introducing a second parameter to optimize on? (The problem being that sums of absolute differences often seem to produce better results than sums of squares. Maybe we should be raising errors to an intermediate power?)

---

ryg (2010-04-11 10:19):

One thing to consider is that not only do the optimization processes only improve locally; the metrics are actually unable to measure anything else.

Common distortion metrics for images are basically sums of error measurements over small windows. For all SSD-based metrics, the windows are as small as they can be: single pixels. SSIM uses larger windows with some overlap, but it's still the case that very visible global consistency violations (e.g. an edge being interrupted in a single block) are only detected (and hence only influence measured distortion) in a small number of those windows.

Even without a real understanding of the HVS, it's possible to get out of that "local distortion ghetto" simply by aggregating the local measures by some means other than plain summation.

Case in point: the skip/no-skip decision you mentioned. Instead of minimizing R + lambda*D, minimize R + lambda*D + lambda2*||grad(D)|| (using the spatial "gradient", in this case distortion differences between neighboring blocks). The higher lambda2 gets, the less attractive isolated blocks with much higher distortion than their surroundings become.
Of course you can also get fancy and allow arbitrary distortion along boundaries (which are themselves determined during the minimization); then you need to add a penalty term for the length of the boundary as well (effectively clamping the gradient term at some maximum value per block). There are tons of ways to do this kind of stuff.

Adding this kind of term makes global optimization hairy, but it's quite easy to look at a small local change and see whether it improves the modified metric or not. In short, an actual implementation boils down to a couple of rules that "look" heuristic (and in fact might be identical to heuristics you're already using) but actually work towards locally improving a more complete distortion metric.

I think this is the way forward: imperfectly minimizing a more complete cost function is a far better theoretical model to work from than perfectly minimizing a simple function and then doing local changes to "make it look good" without any real justification from your model.
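[Editor's note: the modified metric and the local-change test described above can be sketched in a few lines. This is a toy illustration, not code from any actual coder; the block grid, the 4-neighbor absolute-difference "gradient", and the example rate/distortion numbers are all made-up assumptions.]

```python
import numpy as np

def modified_cost(R, D, lam, lam2):
    """R + lambda*D + lambda2*||grad(D)||, with grad(D) taken as absolute
    distortion differences between horizontally/vertically adjacent blocks.
    R and D hold per-block rate and distortion on a 2D block grid."""
    grad = np.abs(np.diff(D, axis=0)).sum() + np.abs(np.diff(D, axis=1)).sum()
    return R.sum() + lam * D.sum() + lam2 * grad

def improves(R, D, lam, lam2, pos, new_r, new_d):
    """Local post-optimization test: does re-deciding one block
    (e.g. flipping a skip decision) reduce the modified cost?"""
    R2, D2 = R.copy(), D.copy()
    R2[pos], D2[pos] = new_r, new_d
    return modified_cost(R2, D2, lam, lam2) < modified_cost(R, D, lam, lam2)

# A skipped block with distortion 100 sitting in a flat distortion-20
# neighborhood; re-coding it would cost 8 bits and drop its distortion to 25.
R = np.zeros((3, 3))
D = np.full((3, 3), 20.0)
D[1, 1] = 100.0

lam = 0.05
print(improves(R, D, lam, 0.0,  (1, 1), new_r=8.0, new_d=25.0))  # False: plain R + lambda*D keeps the skip
print(improves(R, D, lam, 0.05, (1, 1), new_r=8.0, new_d=25.0))  # True: the gradient term flips the decision
```

With lambda2 = 0 this reduces to the usual Lagrangian decision; the gradient term only changes the outcome for blocks whose error sticks out from their neighborhood, which is exactly the "isolated bad block" case the thread is about.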