11-18-09 - Raw Conversion

CR2 format is a big mess of complication. My god I hate TIFF. The main thing is the sensor data. It appears to be stored as "lossless JPEG", which is not JPEG-LS but the old lossless mode of the original JPEG standard: it predicts each sample from its neighbors and then just codes the residual with normal JPEG Huffman coding. The sensor data is RGGB, which they store either as 4 channels per pixel [RGGB per pixel] or as 2 channels [GR or GB]. Either way is clearly not optimal. One interesting thing I could do if I cracked the CR2 format is store all these raws smaller with a better compressor. The RAWs from the S90 are around 11M on average, and it uses the 2-channel mode; the RAWs are 1872x2784 two-channel pixels = 3744x2784 samples at 12 bits per sample. That means the JPEG is getting to 8.85 bits per sample. Not very good.
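The bits-per-sample figure can be checked with a few lines. This is a minimal sketch that assumes "11M" means 11 MiB and that the whole file is sensor data (it ignores the TIFF headers and embedded previews); the function name is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Average coded bits per sensor sample for a compressed raw file.
// fileBytes     : size of the compressed file in bytes
// width, height : image dimensions counted in samples, not 2-channel pixels
double BitsPerSample(std::uint64_t fileBytes, std::uint64_t width, std::uint64_t height)
{
    assert(width > 0 && height > 0);
    return (fileBytes * 8.0) / (double)(width * height);
}
```

With the numbers above, `BitsPerSample(11 * 1024 * 1024, 3744, 2784)` comes out at roughly 8.85, so the lossless JPEG only gets to about 74% of the raw 12 bits.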

Of course I probably have to use dcraw to read it for me, but dcraw is just about the worst piece of code I've ever seen in my life. It's a miracle to me that people are able to write functioning software from code like that.

Paul Lee has a modified dcraw and some nice sample pictures of how demosaicing can go wrong (click the Moire or Aliasing links).

My idea for high quality RAW processing :

First of all, abandon your idea of an "image" as a matrix (grid) of colors (aka a bitmap).

The S90 sensor has barrel distortion that's corrected in software.

It also samples colors in an RGGB Bayer mosaic pattern (like most cameras).

The two of those things combined mean that you really just have a collection of independent R's, G's, and B's at
irregular positions (not on a grid due to barrel distortion).
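One way to picture "a collection of independent samples" is as a flat list of (position, channel, value) records. The sketch below is only an illustration: the one-term radial model and its coefficient `k1` are hypothetical stand-ins for whatever correction the S90 really needs, and the Bayer phase (R at even column, even row) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum Channel { CH_R = 0, CH_G = 1, CH_B = 2 };

struct Sample
{
    float x, y;        // position after distortion correction, in sensor pixel units
    Channel channel;   // which color this photosite measured
    float value;       // raw sensor value
};

// RGGB mosaic, assuming R sits at even column / even row.
Channel BayerChannel(int col, int row)
{
    if ((row & 1) == 0) return (col & 1) == 0 ? CH_R : CH_G;
    return (col & 1) == 0 ? CH_G : CH_B;
}

// Turn the raw grid into a list of positioned single-color samples.
// Hypothetical one-term radial model: each photosite is moved along the line
// from the image center by a factor (1 + k1 * r^2), with r normalized so the
// corners are at r = 1. A real lens profile would replace this.
std::vector<Sample> BuildSamples(const std::vector<std::uint16_t> & raw,
                                 int width, int height, float k1)
{
    assert(width >= 2 && height >= 2);
    assert(raw.size() == (std::size_t)width * height);
    const float cx = 0.5f * (width - 1), cy = 0.5f * (height - 1);
    const float cornerR2 = cx * cx + cy * cy;
    std::vector<Sample> samples;
    samples.reserve(raw.size());
    for (int row = 0; row < height; row++)
    {
        for (int col = 0; col < width; col++)
        {
            const float dx = col - cx, dy = row - cy;
            const float scale = 1.0f + k1 * (dx * dx + dy * dy) / cornerR2;
            const Sample s = { cx + dx * scale, cy + dy * scale, BayerChannel(col, row),
                               (float)raw[(std::size_t)row * width + col] };
            samples.push_back(s);
        }
    }
    return samples;
}
```

With `k1 = 0` the samples sit exactly on the sensor grid; any non-zero `k1` moves them off it, which is the point: after that there is no grid left to treat as a bitmap.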

Now, you should also know that you need to do things like denoising on these original samples, NOT on
the grid of colors after conversion to a bitmap.

So I want to denoise directly on the source data of irregular color samples.
Denoising R & B should make use of the higher quality G data.

Denoising should of course use edge detection and other models of the image prior to make a Bayesian
maximum likelihood estimate of the sample without noise.

To output a bitmap you need to sample from this irregular lattice of samples (mosaic'ed and distorted).

Resampling creates aliasing and loss of information, so you only want to do it once ever on an image.

There's absolutely no a-priori reason why we should be resampling to the same resolution as the sensor
here.  You should resample at this point directly to the final resolution that you want your image.
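A minimal sketch of resampling scattered samples straight to the final resolution, using a normalized Gaussian splat. The Gaussian is just a simple stand-in filter, not a recommendation, and `ColorSample`, `ResampleScattered`, and `sigma` (kernel width in output pixels) are names invented for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One sensor sample at a (possibly non-grid) position in source coordinates.
struct ColorSample
{
    float x, y;   // position, in source pixel units
    int channel;  // 0 = R, 1 = G, 2 = B
    float value;
};

// Splat every sample into the output grid with a Gaussian weight, then
// normalize each output channel by its accumulated weight.
// Returns outW*outH*3 floats, interleaved RGB.
std::vector<float> ResampleScattered(const std::vector<ColorSample> & samples,
                                     float srcW, float srcH,
                                     int outW, int outH, float sigma)
{
    assert(srcW > 0 && srcH > 0 && outW > 0 && outH > 0 && sigma > 0);
    const std::size_t count = (std::size_t)outW * outH * 3;
    std::vector<float> sum(count, 0.0f), weight(count, 0.0f);
    const int radius = (int)std::ceil(3.0f * sigma);   // kernel support, in output pixels
    const float scaleX = outW / srcW, scaleY = outH / srcH;

    for (const ColorSample & s : samples)
    {
        assert(s.channel >= 0 && s.channel < 3);
        // map the sample center into output pixel coordinates
        const float ox = (s.x + 0.5f) * scaleX - 0.5f;
        const float oy = (s.y + 0.5f) * scaleY - 0.5f;
        const int cx = (int)std::floor(ox), cy = (int)std::floor(oy);
        for (int y = cy - radius; y <= cy + radius + 1; y++)
        {
            if (y < 0 || y >= outH) continue;
            for (int x = cx - radius; x <= cx + radius + 1; x++)
            {
                if (x < 0 || x >= outW) continue;
                const float dx = x - ox, dy = y - oy;
                const float w = std::exp(-(dx * dx + dy * dy) / (2.0f * sigma * sigma));
                const std::size_t i = ((std::size_t)y * outW + x) * 3 + s.channel;
                sum[i] += w * s.value;
                weight[i] += w;
            }
        }
    }
    for (std::size_t i = 0; i < count; i++)
        if (weight[i] > 0.0f) sum[i] /= weight[i];   // pixels no sample reached stay 0
    return sum;
}
```

Nothing here cares whether the samples came from a grid, so the output size is a free choice, and demosaicing, distortion correction, and resizing all happen in the one resampling pass.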

For example with the S90 rather than outputting the stupid resolution 3648x2736, I would just output 3200x2400
which would let me view images at 1600x1200 on monitors with a box down-filter which will make them appear
much higher quality in practice (vs 3648x2736 viewed at 1600x1200 which involves a nasty blurring down-filter).
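The reason 3200x2400 is convenient is that 3200/1600 = 2 exactly, so the viewer's down-filter can be a plain 2x2 box, whereas 3648/1600 = 2.28 forces a fractional resample. A minimal single-channel sketch (the function name is made up, and it assumes the values are linear light so that averaging is meaningful):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Halve a single-channel float image with a 2x2 box filter.
// Each output pixel is the mean of one non-overlapping 2x2 block of the source.
std::vector<float> BoxDown2x(const std::vector<float> & src, int w, int h)
{
    assert(w > 0 && h > 0 && (w % 2) == 0 && (h % 2) == 0);
    assert(src.size() == (std::size_t)w * h);
    const int ow = w / 2, oh = h / 2;
    std::vector<float> dst((std::size_t)ow * oh);
    for (int y = 0; y < oh; y++)
    {
        for (int x = 0; x < ow; x++)
        {
            const float * p = &src[(std::size_t)(2 * y) * w + 2 * x];
            dst[(std::size_t)y * ow + x] = 0.25f * (p[0] + p[1] + p[w] + p[w + 1]);
        }
    }
    return dst;
}
```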

The output from this should be a floating point bitmap so that we don't throw away any color resolution.

Exposure correction can then be done on the floating point bitmap without worrying about the irregular
lattice or any further resampling issues.
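On a floating point bitmap, exposure correction is just a per-value gain. A minimal sketch, assuming the values are linear light and the adjustment is given in photographic stops (both assumptions, as is the function name):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Scale every value by 2^stops. No clamping: the float bitmap can hold
// values above 1.0, so highlights are not thrown away at this stage.
void ApplyExposure(std::vector<float> & pixels, float stops)
{
    assert(std::isfinite(stops));
    const float gain = std::exp2(stops);
    for (float & p : pixels)
        p *= gain;
}
```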


11-06-09 - IsSameFile

I found myself wanting to know if two file names were the same file on disk. It's hard to check that just by looking at the name. Obviously you have issues like one might be absolute, one might be relative. Even if you fix that, they could be different A-code-pageizations of unicode names. And something I hit often is one of them might be on a "subst" or even a hard link. I want to know if they are actually the same file.

This appears to work :

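#include <windows.h>
#include <string.h>

bool IsSameFile(const char * Name1,const char * Name2)
{
    // access 0 = query only ; these CreateFile flags are an assumption
    const DWORD share = FILE_SHARE_READ|FILE_SHARE_WRITE|FILE_SHARE_DELETE;
    HANDLE f1 = CreateFileA(Name1,0,share,NULL,OPEN_EXISTING,FILE_FLAG_BACKUP_SEMANTICS,NULL);
    if ( f1 == INVALID_HANDLE_VALUE )
        return false;
    HANDLE f2 = CreateFileA(Name2,0,share,NULL,OPEN_EXISTING,FILE_FLAG_BACKUP_SEMANTICS,NULL);
    if ( f2 == INVALID_HANDLE_VALUE )
        { CloseHandle(f1); return false; }
    BY_HANDLE_FILE_INFORMATION info1,info2;
    bool ok = GetFileInformationByHandle(f1,&info1) && GetFileInformationByHandle(f2,&info2);
    CloseHandle(f1); CloseHandle(f2);
    // BY_HANDLE_FILE_INFORMATION has a unique file ID and Volume Serial Number in it
    //  check those are the same
    //  heh fuck it just check they are all the same
    // confirmed : this does work across substs
    return ok && memcmp(&info1,&info2,sizeof(info1)) == 0;
}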
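For comparison, C++17 has a portable call for the same question, `std::filesystem::equivalent`. A minimal sketch follows; the wrapper name is made up, and whether it sees through "subst" drives the way the handle-based check does has not been verified here.

```cpp
#include <filesystem>
#include <system_error>

// True if both paths resolve to the same file system object.
// Any error (for example, neither path exists) is reported as "not the same".
bool IsSameFilePortable(const std::filesystem::path & a, const std::filesystem::path & b)
{
    std::error_code ec;
    const bool same = std::filesystem::equivalent(a, b, ec);
    return !ec && same;
}
```

For example, `IsSameFilePortable(".", std::filesystem::current_path())` should be true, since a relative and an absolute name for the same directory resolve to the same object.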


11-04-09 - Video is not for Windows

Holy crap video is a flaming mess.

First, if you don't know, there are two main things : packages & streams. The packages are AVI, MKV, MP4, MOV ; they put together image data and audio data in some way (the layer that unpacks these is often called a "splitter" or "demuxer"). The packages send their streams to codecs which convert them to some format for display. On Windows this is all supposed to go through DirectShow which is supposed to use the 4CC codes and some priority information to automatically find the right handler for the various streams and packages. In theory.

The first problem you hit on Windows is that AVI packages are handled pretty well, but AVI can't hold H264 video because AVI can't handle the flexible B-frame ordering that H264 can generate. (Limited profiles of H264 can be put in AVI, and there are hacks around this problem, but you're getting into a world of hurt.) So you need MKV or MP4 boxes, and those are handled poorly; some apps handle them okay, some don't. Some apps "cheat" and don't trust DirectShow like they're supposed to (the cheating apps often work better).

Things I've installed lately :

MP4Box : MP4 stream boxer/unboxer ; pretty decent app, recommended, but help is poor

YAMB : GUI for MP4Box.  Useful to help figure out command lines for MP4Box because the help is bad.
    YAMB has bad bugs though and will fail to launch MP4Box, so you have to copy out the command line
    and run it yourself

MKVVerify : MKV stream checker.  Useful because MKV support is so fucking borked.

MediaInfo : Media info reporter.  Questionable usefulness because I don't trust it and don't know where
    it's getting its info from.

Graphedit : DirectShow graph visualizer and tester from MS

GSpot : AVI info tool.  Useless.

MSU VMT : Moscow State University Video Quality Measurement Tool.  This is pretty neat when it works,
    but far too often it fails to get the frames correctly, so you get totally bogus results.

MSU LS Codec : Moscow State University Lossless Codec.  Best compressing lossless codec, seems nice
    but crashes some tools when you try to use videos compressed with this.  Thus useless.

Lagarith Codec : This appears to be the one good working lossless codec.  Recommended.

HuffYUV Codec : Videos made with this crash me on read.  Jeff says it works great for him.  Avoid.

MeGUI : GUI for "mencoder" which can drive AviSynth and x264 ; like all of these big GUIs that try to
    run a bunch of other products, this mysteriously fails for me.  It seems to set everything up right
    and then it launches ten other programs and they fail to hook up in the way MeGUI expected them to.

Handbrake : see MeGUI

FFDShow : hooks up the Linux video decoders (ffmpeg , libavcodec, etc.) to DirectShow.  This thing is
    pretty evil and fails to report frame rate and media info sometimes, but is also the only real

Haali Media Splitter : MKV unboxer, works with FFDShow.  Difficult to install correctly by hand.
    Even when installed correctly, does some weird shit with framerate; doesn't seem to report it
    correctly through DirectShow.  Probably best to get a codec pack like :

K-Lite Codec Pack : works for me but generally is considered malware
Matroska Codec Pack : didn't work for me
CCCP Codec Pack : not tried

MPlayer : Linux media player, now ported to Windows ; very flexible command line control of everything,
    alternate audio/video in/out.  Highly recommended.

MEncoder : video encode/decode partner to MPlayer.  I've had more success running mplayer and x264 manually
    than using this.  Still I can't complain about MEncoder from the command line.

MPUI : GUI for MPlayer.  This is horrific malware.  When you install it, it takes over your system without
    asking.  They do provide some tools for you to change this after the fact, but still should be avoided.
    Use Media Player Classic or VLC.

AviSynth : script thing to pipe video to other programs that read AVS scripts.  Dear lord.

Basically I've found that all the GUIs are broken, and all the video containers (AVI,MP4,MKV) are broken. The thing I've finally discovered that actually works is using MPlayer and x264 from the command line, and only working with split frames. Trying to work with video containers caused me all kinds of hurt because so many of these apps fail to unbox the containers right and screw up the frame rate or drop frames or make other mistakes. Instead now if I want to work on a video I use MPlayer to convert it to raw frames.

mplayer -benchmark -ao null -vo png:z=5 video.avi

to dump frames to PNG

mplayer -benchmark -ao null -vo yuv4mpeg:file=test.y4m video.avi

to dump the video to YUV4MPEG format in "test.y4m" for input to x264

x264.exe --bitrate 10000 --output "out.mp4" test.y4m

to compress with x264 to "out.mp4"

Then use mp4box to put the audio back if wanted.

The cool thing about mplayer is that its audio/video decoders are the same ones used to view the video. So you can watch it, and if it plays right in the viewer, then it will extract correctly. I've found lots of videos that I can watch in MPC or VLC, but then fail to load the same way in whatever encoder/decoder when I try to process something.

The sucky thing about this method is you make ginormous temp files on your disk, which also slows things down a lot. But avoiding the fuckups of going through the borked DShow codecs and splitters is worth it.

Most of these tools now are originally Linux tools that are getting moved back to Windows. One very promising development is that many of them have the option to directly load libs for the codecs Linux-style (eg. just load libavcodec to play video) and avoid DirectShow completely. I haven't really tried that yet but it seems like it's almost possible to work with video just by manually picking a few of these libs and then you avoid the whole Windows borked media layer.

ADDENDUM : one of the difficulties I've seen in a lot of tools is reading the frame rate wrong. This is presumably due to the demuxers not reporting things back totally right. But there are also two related fundamental problems that make everything harder :

1. Most of these formats don't have real/useful headers. (if they do have header info, it's just added as "comment" information). This was done originally because theoretically if your AVI is being broadcast on TV and you change the channel into it, you will just start getting bytes in the middle and never see the header, thus they didn't put good headers on at all.

2. It's almost impossible to really reliably just get the frames out of video. DirectShow doesn't have a reliable call that's just "give me the next frame". Instead you have to ask for "when is the next frame" and then "give me an image at this time". The problem is that the "when" can get fucked up in various ways, and then when you say "give me an image at this time" you can either skip frames or get duplicate frames. (this is what fucks up the MSU VMT tool for me so badly, they are getting the time sampling all wrong quite often).

Even if it's not way off, this still causes subtle bugs because people don't agree exactly on how to represent the frame rate. Some people treat broadcast as exactly = 30000/1001 fps and use rational arithmetic for all timing. Some people use floats for frame rate and use 29.97003 and then wind up with floating point precision problems at high frame numbers. Many of the containers store the frame rate as a number of microseconds between frames, eg. 33367 ; so if they store "33367" in the header, should I use that as my frame time increment exactly, or should I use 33366.666666 ?

I'm guessing that tons of people get duplicate and/or dropped frames because of this and just don't notice it.
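To put a number on the frame-time question, here is a small integer-only sketch. It compares exact rational timing at 30000/1001 fps with stepping by the rounded 33367 microsecond increment, then asks which frame a time-based lookup would return. The function names are invented for illustration, and it assumes frame 0 starts at time 0.

```cpp
#include <cassert>
#include <cstdint>

// Start time of frame n in microseconds for a rate of fpsNum/fpsDen frames
// per second, rounded up to the next whole microsecond so the returned time
// is never before the true frame start.
std::int64_t FrameTimeExactUs(std::int64_t n, std::int64_t fpsNum, std::int64_t fpsDen)
{
    assert(n >= 0 && fpsNum > 0 && fpsDen > 0);
    const std::int64_t num = n * fpsDen * 1000000;
    return (num + fpsNum - 1) / fpsNum;
}

// Start time of frame n if you just step by a rounded integer frame duration.
std::int64_t FrameTimeRoundedUs(std::int64_t n, std::int64_t frameDurationUs)
{
    assert(n >= 0 && frameDurationUs > 0);
    return n * frameDurationUs;
}

// Index of the frame that is showing at time tUs, using exact rational math.
std::int64_t FrameAtTime(std::int64_t tUs, std::int64_t fpsNum, std::int64_t fpsDen)
{
    assert(tUs >= 0 && fpsNum > 0 && fpsDen > 0);
    return (tUs * fpsNum) / (fpsDen * 1000000);
}
```

The rounded increment is a third of a microsecond too long per frame, which sounds harmless, but by frame 100100 (about 55.7 minutes in) `FrameTimeRoundedUs(100100, 33367)` is a full frame ahead of `FrameTimeExactUs(100100, 30000, 1001)`, and `FrameAtTime` on the rounded time returns 100101. One frame has been silently skipped.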
