5/19/2010

05-19-10 - Some quick notes on VP8

The VP8 release is exciting for what it might be in two years.

If it in fact becomes a clean open-source video standard with no major patent encumbrances, it might be well integrated in Firefox, Windows Media, etc. etc. - eg. we might actually have a video format that actually just WORKS! I don't even care if the quality/size is really competitive. How sweet would it be if there was a format that I knew I could download and it would just play back correctly and not give me any headaches. Right now that does not exist at all. (it's a sad fact that animated GIF is probably the most portable video format of the moment).

Now, you might well ask - why VP8 ? To that I have no good answer. VP8 seems like a messy cock-assed standard which has nothing in particular going for it. The entropy encoder in particular (much like H264) seems badly designed and inefficient. The basics are completely vanilla, in that it is block based, block modes, movecs, transforms, residual coding. In that sense it is just like MPEG1 or H265. That is a perfectly fine thing to do, and in fact it's what I've wound up doing, but you could pull a video standard like that out of your ass in about five minutes, there's no need to license code for that. If in fact VP8 does dodge all the existing patents then that would be a reason that it has value.

The VP8 code stream is probably pretty weak (I really don't know enough of the details to say for sure). However, what I have learned of late is that there is massive room for the encoder to make good output video even through a weak code stream. In fact I think a very good encoder could make good output from an MPEG2 level of code stream. Monty at Xiph has a nice page about work on Theora. There's nothing really cutting edge in there but it's nicely written and it's a good demonstration of the improvement you can get on a fixed standard code stream just with encoder improvements (and really their encoder is only up to "good but still basic" and not really into the realm of wicked-aggressive).

The only question we need to ask about the VP8 code stream is : is it flexible enough that it's possible to write a good encoder for it over the next few years? And it seems the answer is yes. (contrast this to VP3/Theora which has a fundamentally broken code stream which has made it very hard to write a good encoder).

ADDENDUM : this post by Greg Maxwell is pretty right on.

ADDENDUM 2 : Something major that's been missing from the web discussions and from the literature about video for a long time is the separation of code stream from encoder. The code stream basically gives the encoder a language and framework to work in. The things that Jason / Dark Shikary thinks are so great about x264 are almost entirely encoder-side things that could apply to almost any code stream (eg. "psy rdo" , "AQ", "mbtree", etc.). The literature doesn't discuss this much because they are trapped in the pit of PSNR comparisons, in which encoder side work is not that interesting. Encoder work for PSNR is not interesting because we generally know directly how to optimizing for MSE/SSD/L2 error - very simple ways like flat quantizers and DCT-space trellis quant, etc. What's more interesting is perceptual quality optimization in the encoder. In order to acheive good perceptual optimization, what you need is a good way to measure percpetual error (which we don't have), and the ability to try things in the code stream and see if they improve perceptual error (hard due to non-local effects), and a code stream that is flexible enough for the encoder to make choices that create different kinds of errors in the output. For example adding more block modes to your video coder with different types of coding is usually/often bad in a PSNR sense because all they do is create redundancy and take away code space from the normal modes, but it can be very good in a perceptual sense because it gives the encoder more choice.

ADDENDUM 3 : Case in point , I finally have noticed some x264 encoded videos showing up on the torrent sites. Well, about 90% of them don't play back on my media PC right. There's some glitching problem, or the audio & video get out of sync, or the framerate is off a tiny bit, or some shit and it's fucking annoying.

ADDENDUM 4 : I should be more clear - the most exciting thing about VP8 is that it (hopefully) provides an open patent-free standard that can then be played with and discussed openly by the development community. Hopefully encoders and decoder will also be open source and we will be able to talk about the techniques that go into them, and a whole new

6 comments:

won3d said...

Some more rah-rah from a Flash engineer:

http://multimedia.cx/eggs/looking-at-todays-vp8-open-sourcing/

Anonymous said...

Actually Dark Shikiri posted a long analysis of the VP8 codestream and the degree to which it has room for encoder improvement with things like psy-rdo.

(E.g. he says there isn't a good way to do adaptive quantization (though there is a crappy way) so it's going to suffer there.)

cbloom said...

Yeah the problem is he's not really an expert on it either and not very objective or unbiased in his analysis. It's too early to say whether the weird transform or weird entropy coder in VP8 are a major hindrance or not.

In particular Re: AQ I think that's wrong. VP8 can on each block select between a few quantizers which is really all you need to do good enough AQ. Maybe it's slightly less than ideal but not a major hindrance by any means.

Anonymous said...

I'm not sure he's claiming you really need more than 4 quantizers, I think he was just saying that the way to encode it (with the "segment map") was woefully inefficient for the task.

But I'm not positive that's what he meant and I've never looked at the spec myself.

He did say it was *better* entropy coding that H.264 baseline, at least.

cbloom said...

Some vaguely related amusement :

http://planet.xiph.org/

Also an Ogg / anti-Ogg flame war :

http://people.xiph.org/~xiphmont/lj-pseudocut/o-response-1.html

Anonymous said...

I see someone says that the x264 devs are looking to try to find a way to license and make money from it, which would give them a more significant motive towards bias...

old rants