In general there are two major approaches to perceptual metrics : HVS masking/filtering methods , and structural/statistical methods.
The HVS methods are mainly still using something like a per pixel delta, but they are weighting or masking or remapping it to account for various perceptual effects. Largely they are considering the capabilities of the eye, as opposed to what the brain processes and takes from the image. This includes stuff like rod/cone spatial resolution, perceptually linear response to intensity differences, contrast masking, luma masking, relative contrast sensitivity, etc.
The original major HVS work was the JPEG standard itself with its quantization matrices that are based on CSF (contrast sensitivity function) measurements. After that Watson's DCTune and related work added masking and other effects. This has been extended over the years. Probably the strongest modern work is Lin's JND (just noticeable differences) stuff.
The JND papers work out complicated models of visual masking to figure out what kind of deltas between images are imperceptible.
Obviously color spaces like CIELAB etc are part of the HVS image difference metric, as is the SCIELAB filter. (I guess CIEDE2000 is the newer version)
A lot of the HVS stuff doesn't really deal well with color - most of the old data is for luma only. The Fairchild/iCAM group is doing a lot of work on perception of color and mainly HVS based metrics :
Garrett Johnson's Thesis is the best reference I have found for lots of data on the HVS with actual useful formulas and numbers.
Many of the newer works use a whole-image FFT in opponent color space so that they can apply a CSF (ala JPEG quantization matrices) and a rod/cone spatial resolution filter (ala SCIELAB).
There are lots of metrics made over the years that incorporate some amount of these HVS factors, for example PSNR-HVS-M :
For video there are more HVS factors to consider such as temporal contrast sensitivity and motion masking.
There are also a lot of papers on contrast visibility and color appearance models from the HDR Tone Mapping research guys; in fact there are some good Siggraph courses such as Image Quality Metrics from Siggraph 2000 and Survey of Color for Computer Graphics . There were also a bunch of papers on selective raytracing that gave good details on perceptual metrics.
However, in general the HVS-centric methods are a bit out of favor and "structural methods" appear to be better. Again most of the structural methods work on luma only and ignore color.
The most well-known structure method is SSIM , and there have been many extensions of SSIM that purport to improve it, such as gradient-SSIM and region-pooled-SSIM or feature-type (edge,smooth,texture) pooled SSIM.
The LIVE team has a ton of good papers, with the best ones being on "structural" type metrics :
In particular, they have VIF (Visual Information Fidelity) which is a measure of human-perceptual mutual information. There are a number of modern works using metrics like this for image quality which appears to be a very promising line.
There are various other structural-type metrics, such as doing SVD to find singular vectors and values and so on. So far this work feels to early to be useful.
One interesting work on color I found is SHAME : (Spatial Hue Angle Metric) :
It's basically SCIELAB with the addition of a hue histogram. The hue histogram weights colors that occur more often more. eg. if your image has a big patch of red and you get the color wrong on it, that's visible, but if it only has a few scattered red pixels, then their exact color doesn't matter because relative to their surrounding they will seem "red" even if they are pretty far off.
Some other random links I thought were interesting :
JPEG Post Processing by reapplication of JPEG
NASA ADS Predicting image quality using a modular image difference model
Color FAQ - Frequently Asked Questions Color
[x264-devel] commit New AQ algorithm option (Jason Garrett-Glaser )
MetriX MuX Home Page