The old ITU-R Rec. 601 luma formula is:
Y = 0.299 R + 0.587 G + 0.114 B (as used in JPEG, MPEG, etc.)
The newer ITU-R Rec. 709 standard for Y is:
Y = 0.2126 R + 0.7152 G + 0.0722 B
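For concreteness, here are the two weightings as plain C (a trivial sketch; the function names are mine):

static double luma_601(double r, double g, double b)
{
    /* Rec. 601 weights, as used in JPEG/MPEG */
    return 0.299 * r + 0.587 * g + 0.114 * b;
}

static double luma_709(double r, double g, double b)
{
    /* Rec. 709 weights */
    return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}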
Modern HDTVs and such are supposed to be built to the 709 standard, which means that the perceived brightness (I'm playing a little loose with terminology here; I should say the "output luminance" or something like that) should be that linear combo of RGB, e.g. new TVs should have brighter greens and darker blues than old TVs. I have no idea if that's actually true in practice; when I go into Best Buy all the TVs look different, so clearly they are not well standardized.
Those formulas are for *linear* light combination. In codec software we usually apply them to *gamma-corrected* values, which makes them semi-bogus. E.g. there's not much reason to prefer the 709 coefficients for your video encode even if you know you will be displaying on a 709-spec monitor (if you are sending it RGB), because the monitor should be 709-spec on linear RGB, but you are using that matrix on gamma-corrected RGB. I suppose if you hope to decode directly to a hardware-supported YUV format (e.g. for Overlays on Windows), you would want to use the same color space the overlay surface wants, but that's a flaky hopeless hell not worth dealing with.
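To make the linear-vs-gamma distinction concrete, here's a sketch (assuming an sRGB-style transfer curve, which is close to, but not exactly, what a 709 display uses). The display spec's luminance applies the weights to *linearized* RGB; codec luma applies the same weights straight to the gamma-encoded values:

#include <math.h>

/* sRGB-style decode: gamma-encoded value in [0,1] -> linear light */
static double srgb_to_linear(double v)
{
    return (v <= 0.04045) ? v / 12.92 : pow((v + 0.055) / 1.055, 2.4);
}

/* luminance as the display spec defines it: 709 weights on linear RGB.
   Codec "luma" would instead apply the weights directly to r, g, b. */
static double luminance_709(double r, double g, double b)
{
    return 0.2126 * srgb_to_linear(r)
         + 0.7152 * srgb_to_linear(g)
         + 0.0722 * srgb_to_linear(b);
}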
Two points:
1. This is *NOT* a formula for how important R, G, and B are. For example, if you have an error metric which you compute on each color channel, do not combine them like
Err_tot = 0.2126 Err_R + 0.7152 Err_G + 0.0722 Err_B
That linear combo is just the contribution to *brightness*, but the human eye sees more than just brightness. It's like modeling the eye as only rods and no cones.
(BTW while blue has a very small contribution to brightness, and furthermore only 2% of the cones perceive blue, those are however the most sensitive cones, and their signal is amplified in the brain, so our perception of blue intensity levels (and therefore the blue quantizer step size) is in fact very good; the way you can get away with killing blue is by giving it much lower spatial resolution.)
So, for example, if you are using rec709 Y and then measuring only Y-PSNR, you are severely undercounting the contribution of the non-green channels to the error.
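As a concrete contrast (my own sketch; the struct and function names are made up), here is an equal-weight RGB error next to the luma-weighted combination being warned against:

typedef struct { double r, g, b; } Pixel;

static double sq(double x) { return x * x; }

/* plain RGB error: all channels count equally */
static double err_rgb(Pixel a, Pixel b)
{
    return sq(a.r - b.r) + sq(a.g - b.g) + sq(a.b - b.b);
}

/* the mistake: weighting channel errors by brightness contribution.
   A large blue error barely registers here, even though the eye
   would see it as an obvious chroma error. */
static double err_luma_weighted(Pixel a, Pixel b)
{
    return 0.2126 * sq(a.r - b.r)
         + 0.7152 * sq(a.g - b.g)
         + 0.0722 * sq(a.b - b.b);
}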
2. Now, for what we actually do (data compression), there's a question of how the difference between 601 and 709 affects us. I'll assume you are using the color conversion in something like JPEG or H.264, which will subsample the chroma and probably has some fixed way to quantize the channels differently.
Choosing 709 or 601 for the conversion matrix is just a special case of using an arbitrary color matrix (such as a KLT); obviously which one is best will depend on the content. It will also depend on the error metric you use to define "best": in particular, if you measure error as Y-PSNR using the 709 definition of Y, then the 709-spec color conversion will appear best. If you measure error in RGB, then the 601 spec will (usually) appear best.
My random guess is that 601 will be better for coding most of the time, because it doesn't have such a severe preference for green in the high-fidelity non-chroma channel. Basically it's better hedged: when you are given a synthetic image that's all red-blue with no green at all, the 709 matrix is very unfortunate for coding; the 601 matrix is not quite so bad.
Note that using the matrix which matches your reproduction device has basically nothing to do with which matrix is best for coding. The definition of "luminance" for your viewing device refers to *linear* RGB anyway, and in coding we are working on gamma-corrected RGB, so it's best just to think of the color matrix as part of the compressor.
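In code, that framing just means the encoder owns an arbitrary 3x3 matrix, of which 601, 709, or a per-image KLT are particular choices (a sketch; the type and function names are mine, while the 601/JPEG YCbCr values are the standard ones):

typedef struct { double m[3][3]; } ColorMatrix;

/* the JPEG (601) YCbCr matrix, one particular choice of M */
static const ColorMatrix M_601 = {{
    {  0.299000,  0.587000,  0.114000 },   /* Y  */
    { -0.168736, -0.331264,  0.500000 },   /* Cb */
    {  0.500000, -0.418688, -0.081312 },   /* Cr */
}};

/* apply M to one gamma-encoded R'G'B' pixel */
static void apply_color_matrix(const ColorMatrix * M,
                               double r, double g, double b,
                               double out[3])
{
    for (int i = 0; i < 3; i++)
        out[i] = M->m[i][0] * r + M->m[i][1] * g + M->m[i][2] * b;
}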
This article from compuphase has some interesting notes about our perception of blue and proposes a cheap color metric that takes them into account.
Eh, there's some sort of okay stuff in there, but I think the main point about blue is that *any* per-pixel metric is just way off the mark for blue; scaling it by 2 or 3 or whatever is not getting what you want. The important thing is the spatial filter.
Well, you are probably using a per-pixel metric at some point, in which case it probably makes sense to use something other than the CIE or NTSC luminance factors.
At least in the context of palettizers, vector quantizers, and plain DXT compressors, I don't see how you can take spatial resolution into account.
"At least in the context of palletizers, vector quantizers and plain DXT compressors I don't see how can you take spatial resolution into account."
Of course you can.
For example, in a DXTC compressor, on each 4x4 block you need to match the luma at each pixel. However, the blue channel only needs to match well as an average over the 4x4 block. You want the blue *level* to be quite accurate overall, but you can afford more spatial error in the blue over the block (see the sketch below).
If you limit yourself to only per-pixel error metrics you are losing almost all the benefit of perceptual metrics.
(you are right that in those cases you don't have the ability to do R/D optimization and move bits around, so you can't take full advantage of a perceptual metric, but of course you can still use it)
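Here's a minimal sketch of that kind of block metric (my own illustration, with made-up weights): luma error is charged per pixel, while blue error is only charged on the block average, so the blue *level* stays accurate but blue *spatial* detail is cheap to get wrong:

typedef struct { double r, g, b; } Pixel;

/* error over a 4x4 block: per-pixel luma, block-average blue */
static double block_error(const Pixel orig[16], const Pixel made[16])
{
    double err = 0.0;
    double avg_b_orig = 0.0, avg_b_made = 0.0;

    for (int i = 0; i < 16; i++) {
        /* per-pixel luma error (601 weights, just as an example) */
        double y0 = 0.299*orig[i].r + 0.587*orig[i].g + 0.114*orig[i].b;
        double y1 = 0.299*made[i].r + 0.587*made[i].g + 0.114*made[i].b;
        err += (y0 - y1) * (y0 - y1);

        avg_b_orig += orig[i].b;
        avg_b_made += made[i].b;
    }

    /* blue is only charged on its average (DC) over the block */
    double db = (avg_b_orig - avg_b_made) / 16.0;
    err += 16.0 * db * db;   /* illustrative weight on the blue level */

    return err;
}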
I see. Yep, that makes sense.