01-12-10 - Lagrange Rate Control Part 2

Okay, so we've talked a bit about lagrange coding decisions. What I haven't mentioned yet is that we're implicitly talking about rate control for video coders. "Rate control" just means deciding how many bits to put in each frame. That's just a coding decision. If the frames were independent (eg. as in Motion JPEG - no mocomp) then we know that lagrange multiplier R/D decisions would in fact be the optimal way to allocate bits to frames. Thus, if we ignore the frame-dependence issue, and if we pretend that all distortions are equal - lagrange rate control should be optimal.

What does lagrange rate control mean for video? It means you pick a single global lambda for the whole video. This lambda tells you how much a bit of rate should be worth in terms of distortion gain. Any bit which doesn't give you at least lambda worth of distortion gain will not be coded. (we assume bits are of monotonically decreasing value - the first bit is the most important, the more bits you send the less value they have). On each frame of video you want to code the frame to maximize J. The biggest single control parameter to do this is the quantizer. The quantizer will largely control the size of the frame. So you dial Q to maximize J on that frame, this sets a frame size. (maximizing J will also affect movec choices, macroblock mode choices, etc).
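To make the per-frame step concrete, here's a minimal sketch, assuming you've already trial-encoded the frame at a few quantizers and recorded the rate and distortion of each. It takes J as the usual Lagrangian cost D + lambda*R, which you minimize (flip the sign if you prefer to think of it as a gain to maximize). The function name and the trial numbers are made up for illustration.

```python
def pick_quantizer(trials, lam):
    """trials: list of (q, rate_bits, distortion) from trial encodes of one frame.
    Returns the trial with the lowest Lagrangian cost J = D + lam * R."""
    return min(trials, key=lambda t: t[2] + lam * t[1])

# Hypothetical trial encodes of a single frame: (Q, bits, distortion)
trials = [
    (10, 9000, 100.0),
    (20, 5000, 300.0),
    (30, 3000, 700.0),
    (40, 2000, 1500.0),
]

cheap_bits = pick_quantizer(trials, lam=0.1)   # bits are cheap -> finer quantizer
costly_bits = pick_quantizer(trials, lam=0.5)  # bits are expensive -> coarser quantizer
print(cheap_bits[0], costly_bits[0])
```

The same lambda is used on every frame; only the trial table changes from frame to frame, which is why the chosen Q and the frame size vary with content.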

Frames of different content will wind up getting different Q's and very different sizes. So this is very far from constant bit rate or constant quantizer. What it is is "constant bit value". That is, all frames are of the size where adding one more bit does not help by lambda or more. Harder to code (noisy, fast motion) frames will thus be coded with more error, because it takes more bits to get the same amount of gain in that type of frame. Easy to code (smooth, flat) frames will be coded with much less error. Whether or not this is good perceptually is unclear, but it's how you get optimal D for a given R assuming our D choice is what we want.
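A toy illustration of "constant bit value", under an assumed rate-distortion model that is not from any real codec: D(R) = c * 2^(-R/s), where c is the distortion at zero bits and s is how many bits it takes to halve the distortion (bigger s = harder frame). Setting the slope -dD/dR equal to lambda gives the operating point for each frame. All names and numbers are hypothetical.

```python
import math

def allocate(c, s, lam):
    """Toy model D(R) = c * 2**(-R/s).
    Returns (R, D) at the point where one more bit buys exactly lam of distortion."""
    d = lam * s / math.log(2)   # from -dD/dR = (ln 2 / s) * D = lam
    d = min(d, c)               # never worse than spending zero bits
    r = s * math.log2(c / d)
    return r, d

lam = 0.01
easy = allocate(c=1000.0, s=500.0, lam=lam)    # smooth, flat frame
hard = allocate(c=1000.0, s=4000.0, lam=lam)   # noisy, fast-motion frame
print("easy: bits=%.0f error=%.1f" % easy)
print("hard: bits=%.0f error=%.1f" % hard)
```

In this model the hard frame ends up with both more bits and more error than the easy one, even though the last bit spent on each is worth the same lambda.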

Ideally you want to use a single global lambda for your whole video. In practice that might not be possible. Usually the user wants to specify a certain bit rate, either because they actually need to meet a bit rate maximum (eg. for DVD streaming), or because they want to specify a maximum total size (eg. for fitting on your game's ship DVD), or simply because that's an intuitive way of specifying "quality" that people are familiar with from MP3 audio and such. So your goal is to hit a rate. To do that with a single global lambda, you would have to try various lambdas, search them up and down, re-encode the whole video each time. You could use binary search (or maybe interpolation search), but this is still a lot of re-encodings of the whole video to try to hit rate. (* more on this later)
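Here's a rough sketch of that lambda search as a bisection in log space. It assumes total size never increases as lambda goes up. The encode_total_bits function is a hypothetical stand-in for "re-encode the whole video at this lambda and report the size"; here it's faked with a toy model D(R) = c * 2^(-R/s) per frame so the example runs on its own.

```python
import math

# Hypothetical frames as (c, s) pairs for the toy model D(R) = c * 2**(-R/s).
FRAMES = [(1000.0, 500.0), (1000.0, 4000.0), (500.0, 1500.0)]

def encode_total_bits(lam):
    """Stand-in for a full re-encode of the video at a given lambda."""
    total = 0.0
    for c, s in FRAMES:
        d = min(c, lam * s / math.log(2))   # per-frame point where slope == lam
        total += s * math.log2(c / d)
    return total

def search_lambda(total_bits_at, target_bits, lo=1e-6, hi=1e6, iters=40):
    """Bisect lambda geometrically; returns a lambda whose size is <= target.
    Assumes total_bits_at(lo) > target_bits >= total_bits_at(hi)."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if total_bits_at(mid) > target_bits:
            lo = mid    # too big: bits must be worth more
        else:
            hi = mid    # fits: try spending more
    return hi

lam = search_lambda(encode_total_bits, target_bits=20000.0)
print(lam, encode_total_bits(lam))
```

Every call to total_bits_at is a complete encode in the real setting, so even a well-behaved bisection costs many passes over the video; in practice you'd stop as soon as the size is within tolerance rather than run a fixed iteration count.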

Aside : specifying lambda is really how people should encode videos for distribution as downloads via torrents or whatever. When I'm storing a bunch of videos on my hard disk, the limiting factor is my total disk size and the total download time - I don't need to limit how big each individual movie is. What I want is for the bits to go where they help me most. That's exactly what lambda does for you. It makes no sense that I have some 700 MB half hour show that would look just fine in 400 MB , while I have some other 700 MB show that looks like shit and could really use some more bits. Lambda is the right way to allocate hard drive bytes for maximum total viewing quality.

Okay. The funny thing is that I can't find anyone else on the web or in papers talking about lagrange video rate control. It's such an obvious thing that I expected it to be the standard way, but it's not.

What do other people do? The de-facto standard seems to be what x264 and FFMPEG do, which I'll try to roughly outline (though I can't say I get all the details since the only documentation is the very messy source code). Their good mode is two pass, so I'll only talk about that.

The primary thing they do is pick a size for each frame somehow, and then try to hit that size. To hit that frame size, they search QP (the quantization parameter) a bit. They specifically only search QP in the local neighborhood of the previous QP because they want to limit QP variation between frames (the range of search is a command line parameter - in fact almost everything in this is a command line parameter so I'll stop saying that). When they choose a QP, there's a heuristic formula for H264 which specifies a lambda for lagrange decisions that corresponds to that QP. Note that this lambda is only used for inside-the-frame coding decisions, not for choosing QP or global rate allocation. Also note that the lambda-QP relationship is not necessarily optimal; it's a formula (there are a bunch of H264 papers about making good lambda-QP functional fits and searches). They also do additional funny things like run a blurring pass on QP to smooth out variation; presumably this is a heuristic for perceptual purposes.
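A hedged sketch of the two pieces just described: a QP search restricted to a window around the previous QP, and a QP-to-lambda formula. This is not taken from the x264 or FFmpeg source. The lambda fit shown is the one commonly quoted in the H.264 reference-model literature, 0.85 * 2^((QP-12)/3), and the frame size model (bits halving every 6 QP) is a made-up stand-in for actually encoding the frame.

```python
def lambda_from_qp(qp):
    """Commonly cited H.264 fit for the mode-decision lambda (an assumption here,
    not necessarily what any particular encoder uses)."""
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)

def pick_qp(frame_bits_at, prev_qp, target_bits, max_step=4, qp_min=0, qp_max=51):
    """Try only QPs within max_step of prev_qp and return the one whose
    frame size lands closest to target_bits."""
    lo = max(qp_min, prev_qp - max_step)
    hi = min(qp_max, prev_qp + max_step)
    return min(range(lo, hi + 1), key=lambda qp: abs(frame_bits_at(qp) - target_bits))

def toy_frame_bits(qp):
    """Hypothetical size model: bits halve every 6 QP steps."""
    return 40000.0 * 2.0 ** (-qp / 6.0)

qp_reachable = pick_qp(toy_frame_bits, prev_qp=26, target_bits=2000.0)
qp_clamped = pick_qp(toy_frame_bits, prev_qp=26, target_bits=1000.0)
print(qp_reachable, lambda_from_qp(qp_reachable))
print(qp_clamped, lambda_from_qp(qp_clamped))
```

In the second call the target would want a QP of about 32, but the window stops at 30, so the frame comes out bigger than asked. That's the cost of limiting QP variation between frames.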

So the main issue is how do they pick this frame size to target? So far as I can tell it's a mess of heuristics. For each frame they have a "complexity" measure C. On the first pass C is computed from entropy of the delta or some such thing, raised to the 0.8 power (purely heuristic I believe). The C's are then munged by some fudge factors (that can be set on the command line) - I frame sizes are multiplied by a factor > 1 that makes them bigger, B frame sizes are multiplied by a factor < 1 that makes them smaller. Once all the "complexities" are chosen, they are all scaled by (target total size) / (total complexity) to make them add up to the total target size. This sets the desired size for each frame.
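The allocation described above can be sketched roughly like this. It's my reading of the heuristic, not a transcription of the x264/FFmpeg code; the function name and the I/B factors are hypothetical placeholders, and only the 0.8 exponent comes from the description.

```python
def allocate_frame_sizes(raw_complexity, frame_types, target_total_bits,
                         exponent=0.8, type_factor=None):
    """Turn per-frame complexity into target sizes that sum to target_total_bits."""
    if type_factor is None:
        # Placeholder fudge factors: I frames boosted, B frames reduced.
        type_factor = {"I": 1.4, "P": 1.0, "B": 1.0 / 1.3}
    # Compress the complexity range, then apply the frame-type factor.
    weights = [(c ** exponent) * type_factor[t]
               for c, t in zip(raw_complexity, frame_types)]
    # Rescale so the targets add up to the requested total.
    scale = target_total_bits / sum(weights)
    return [w * scale for w in weights]

sizes = allocate_frame_sizes([1000.0, 2000.0, 2000.0], ["P", "P", "B"], 30000.0)
print([round(s) for s in sizes])
```

Because of the 0.8 exponent, a P frame with twice the complexity gets about 1.74 times the bits (2^0.8) rather than twice as many.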

Note that this heuristic method has many of the same qualitative properties as full lagrangian allocation - that is, more complex frames will get more bytes than simpler frames, but not *enough* more bytes to give them the same error, so more complex frames will have larger errors than simpler frames. However, quantitatively there's no guarantee that it's doing the same thing.

So far as I can tell the lagrange method is just better (I'm a little concerned about this because it just seems so vastly, obviously better that it disturbs me that not everyone is doing it). Ignoring real world issues we've glossed over, the next big problem is the fact that we have to do this search for lambda, so we'll talk about that next time.

ADDENDUM : x264/ffmpeg rate control references :

ratecontrol.txt - Loren is the main dev but this is very cursory
FFmpeg RateControlContext Struct Reference
FFmpeg libavcodec/ratecontrol.c Source File
[Ffmpeg-user] changing bitrate on the fly - detailed rate control document, but not written by one of the main devs so beware
x264 Macroblock Tree Ratecontrol testing (committed) - Doom9's Forum - this is about the dependency issue that we haven't discussed


ryg said...

There's a rough overview on the FFMPEG/x264 rate control available, but it's kinda old and doesn't have more information than what you already found out:


cbloom said...

Oh yeah, I forgot to add my reference list. I'll amend the post rather than put them in a comment ...

Anonymous said...

An important detail from that mblock dependency Doom9 thread:

Daiz: at the same bitrate or even at higher bitrates, fades ended up worse-looking with mbtree on than with mbtree off. It's a shame since the mbtree encodes look better everywhere else.

DS: This seems to be inherent in the algorithm and I'm not entirely sure how to resolve it... it was a problem in RDRC as well, in the exact same way. ... MB-tree lowers the quality on things that aren't referenced much in the future. Without weighted prediction, fades are nearly all intra blocks...

This seems almost like a bug... if you view it as "steal bits from other things to increase the quality of source blocks" it makes sense, but presumably putting extra bits into the source blocks actually reduces the number of bits you need later to hit the target quality.

I guess at really low bit rates where you just copy blocks without making ANY fixup at all then you're just stealing bits.

cbloom said...

"Yes, qualitatively, other frames are improved with mb-tree, but the trend seems to be fades are significantly worse. It's almost as if bitrate is redistributed from fades/low luminance frames to other frames"

"mbtree can definitely be beneficial in some scenarios, but it almost always comes with significantly worse fades and dark scenes. Perhaps the default --qcomp value isn't optimal, but increasing it will lower the mbtree effect, and basically we are back to square one. What i am seeing is a sort of "tradeoff." Some frames are improved at the expense of others. But the "expense" is quite severe in my opinion, at least with default qcomp. I'm looking for more balance."

This isn't really surprising, it's one of those consequences of using a "D" measure that doesn't match the HVS ; mb-tree and lagrange rate control and all that will move bits around to minimize D , which usually means not giving many bits to low luma stuff. That's normally correct, but I guess is occasionally terrible.

I'll write more about this in part 4 some day.

"but presumably putting extra bits into the source blocks actually reduces the number of bits you need later to hit the target quality."

Not necessarily, you *hope* that putting more bits in source blocks helps overall, but you can't afford to test every bit movement, so you just use some heuristics that usually help.

Thatcher Ulrich said...

This is probably either obvious or useless, but -- for rate control, would it make sense to encode a few frames ahead, while keeping intermediate data structures, and then if a rate adjustment has to be made, re-use the partial work to more cheaply re-encode the same few frames with an adjusted quality?

cbloom said...

Yeah, Thatch, that's basically the "classical" non-one-pass-lagrange method. For doing something like movec back-propagation you have to do something like that.
