Yesterday I finally adapted my Y4M converter (which does AVI <-> Yuv4MPEG with RGB <-> YUV color conversion and up/down sample, and uses various good methods of YUV, such as out of gamut chroma spill, lsqr optimized conversion, etc.). I added support for the "rec601" (JPEG) and "bt709" (HDTV) versions of YUV (and by "YUV" I mean YCbCr in gamma-encoded space), with both 0-255 and 16-235 range support. I figured I would stress test it by trying to use it in place of ffmpeg in my h264 pipeline for the Y4M conversion. And I found the old brightness problem.
It turns out that when I make an x264 encode and then play it back through DirectShow (with ffdshow), the player is using the "BT 709" yuv matrix (in 16-235 range) (*). When I use MPlayer to play it back and write out frames, it's using the "rec 601" yuv matrix (in 16-235 range).
This appears to be because there's nothing specified in the stream, so ffdshow picks the matrix based on the resolution of the video. That will super fuck you: depending on the size of the video, you need to pick a different matrix (it's trying to do the right thing for HDTV vs SDTV standard video). Their heuristic is :
width > 1024 or height >= 600 : BT.709
width <= 1024 and height < 600 : BT.601
(*)
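For reference, here is that resolution rule written out as a tiny Python sketch. The function name and the returned strings are made up for illustration; this is not ffdshow's actual code, only the rule as stated above.

```python
def guess_matrix_from_resolution(width, height):
    """Hypothetical helper mirroring the heuristic described above.

    Returns "bt709" for HD-sized frames and "bt601" otherwise.
    """
    if width > 1024 or height >= 600:
        return "bt709"
    return "bt601"


if __name__ == "__main__":
    for w, h in [(720, 480), (720, 576), (1280, 720), (1920, 1080)]:
        print(w, h, guess_matrix_from_resolution(w, h))
```

Note that under this rule a 720x576 PAL frame still counts as 601, since its height is below 600.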
(in theory x264 doesn't do anything to the YUV planes - I provide it y4m, and it just works on yuv as bytes that it doesn't know anything about; the problem is the decoders which are doing their own thing).
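To make the mismatch concrete, here is a rough Python sketch of 16-235 YCbCr conversion parameterized on the matrix. The function names are my own for illustration, and it works in floats with no rounding or chroma subsampling, so it is not what any real decoder does internally. The luma weights are the standard ones (601: Kr=0.299, Kb=0.114; 709: Kr=0.2126, Kb=0.0722).

```python
# Luma weights (Kr, Kb) for the two matrices; Kg = 1 - Kr - Kb.
COEFFS = {"601": (0.299, 0.114), "709": (0.2126, 0.0722)}


def rgb_to_ycbcr(r, g, b, matrix="601"):
    """Gamma-encoded 0-255 RGB -> 16-235 luma / 16-240 chroma (floats)."""
    kr, kb = COEFFS[matrix]
    kg = 1.0 - kr - kb
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    y = kr * r + kg * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # in [-0.5, 0.5]
    cr = (r - y) / (2.0 * (1.0 - kr))   # in [-0.5, 0.5]
    return 16.0 + 219.0 * y, 128.0 + 224.0 * cb, 128.0 + 224.0 * cr


def ycbcr_to_rgb(y, cb, cr, matrix="601"):
    """Inverse of rgb_to_ycbcr for the given matrix, clamped to 0-255."""
    kr, kb = COEFFS[matrix]
    kg = 1.0 - kr - kb
    yn = (y - 16.0) / 219.0
    cbn = (cb - 128.0) / 224.0
    crn = (cr - 128.0) / 224.0
    r = yn + 2.0 * (1.0 - kr) * crn
    b = yn + 2.0 * (1.0 - kb) * cbn
    g = (yn - kr * r - kb * b) / kg
    return tuple(min(255.0, max(0.0, 255.0 * c)) for c in (r, g, b))


if __name__ == "__main__":
    ycc = rgb_to_ycbcr(200, 50, 50, "601")
    print("decoded as 601:", ycbcr_to_rgb(*ycc, matrix="601"))
    print("decoded as 709:", ycbcr_to_rgb(*ycc, matrix="709"))
```

Encoding with one matrix and decoding with the other leaves pure greys alone (their chroma is zero) but shifts the brightness and saturation of anything colored, which matches what the bad playback looks like.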
The way I'm doing it now is I make the Y4M myself in rec601 space, let x264 encode it, then extract frames with mplayer (which seems to always use 601 regardless of resolution). If there was a way to get the Y4M directly out of x264 that would make it much easier because I could just do my own yuv->rgb (the only way I've found to do this is to use ffmpeg raw output).
Unfortunately Y4M itself doesn't seem to have any standardized tag to indicate what kind of yuv data is in the container. I've made up my own ; I write an xtag that contains :
yuv=pc.601
yuv=pc.709
yuv=bt.601
yuv=bt.709
where "bt" implies 16-235 luma (16-240 chroma) and "pc" implies 0-255 (fullrange).
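A reader for that tag could look something like the Python sketch below. The function names and the returned dict layout are invented for illustration. It also assumes the tag appears in the YUV4MPEG2 header line as a space-separated token with an "X" prefix (e.g. "Xyuv=bt.709"), which is a guess at the on-disk form, not something the format standardizes.

```python
def parse_yuv_tag(tag):
    """Parse a made-up tag like 'yuv=bt.709' into matrix and range info."""
    key, _, value = tag.partition("=")
    rng, _, matrix = value.partition(".")
    if key != "yuv" or rng not in ("pc", "bt") or matrix not in ("601", "709"):
        raise ValueError("unrecognized yuv tag: %r" % tag)
    full = (rng == "pc")
    return {
        "matrix": matrix,
        "full_range": full,
        "luma_range": (0, 255) if full else (16, 235),
        "chroma_range": (0, 255) if full else (16, 240),
    }


def find_yuv_tag(header_line):
    """Look for an X-prefixed yuv tag in a Y4M header line (assumed layout).

    Returns the parsed tag, or None if the header carries no such tag.
    """
    for token in header_line.split():
        if token.startswith("Xyuv="):
            return parse_yuv_tag(token[1:])
    return None


if __name__ == "__main__":
    print(find_yuv_tag("YUV4MPEG2 W720 H480 F30000:1001 Ip C420jpeg Xyuv=bt.601"))
```

Returning None for an untagged file leaves the caller to fall back on some default, which is exactly the guessing game the tag is meant to avoid.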
x264 has a bunch of --colormatrix options to tag the color space in the H264 stream, but apparently many players don't respect it, so the recommended practice is to use the color space that matches your resolution (eg. 709 for HD and 601 for SD). (the --colormatrix options you want are bt709 and bt470bg , I believe).
Some notes by other people :
TV capture "SD" mpeg2 720x576i -> same res in mpeg4, so use --colormatrix bt601 --fullrange ?
TV capture "HD" mpeg2 1440x1080i -> same res in mpeg4, so use --colormatrix bt709 --fullrange ?

look at table E-3 (Colour Primaries) in the H.264 spec:
bt470bg = bt601 625 = bt1358 625 = bt1700 625 (PAL/SECAM)
smpte170m = bt601 525 = bt1358 525 = bt1700 NTSC
(yes, PAL and NTSC have different bt601 matrices here)

yup there's only:
--colormatrix <string> Specify color matrix setting ["undef"]
    - undef, bt709, fcc, bt470bg, smpte170m, smpte240m, GBR, YCgCo
ADDENDUM : god damn the color matrix change in bt.709 is so retarded. While in theory the phosphors of HDTVs match 709 better than 601, that is actually pretty irrelevant, since YCbCr is run in gamma-corrected space, and we do the chroma sub-sample, and so on ( see Mag of nonconst luminance error - Charles Poynton ). The actual practical effect of the 709 new matrix is that we're watching lots of videos with badly shifted brightness and saturation. In reality, it just made video quality much much worse.
(I also don't understand the 16-235 range that was used in MPEG. Yeah yeah, NTSC needs the top and bottom of the signal for special codes, fine, but why does that have to be hard-coded into the digital signal? The special region at top and bottom is an *analog* thing. The video could have been full range 0-255 in the digital encoding, and then in the DAC output you just squish it into the middle 7/8 of the signal band. Maybe there's something going on that I don't understand, but it just seems like terrible software engineering design to take the weird quirk of one system (NTSC analog output) and push that quirk back up the pipeline to affect something (digital encoding format) that it doesn't need to).