9/29/2009

09-29-09 - Aliasing is Pretty

I made some images of aliasing/moiré patterns. (BTW these images only look right at full zoom, since their structure comes entirely from the pixel-grid sampling aliasing, so click through.) You may be surprised to learn what these are images of.

[ seven images of the aliasing / moiré patterns, hosted on Photobucket ]

....

Answer : these are pictures of sin( x^2 + y^2 ) , that is, sin( r^2 ) , a very simple radial trig function. You don't see the function at all; the images are 100% aliasing, since the radial sine is scaled to very high frequency.
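
If you want to reproduce these, here's a minimal sketch (not the code I actually used; the image size and frequency scale are arbitrary made-up values) that samples sin( k*(x^2+y^2) ) once per pixel and writes a grayscale PGM. All the visible structure comes from the sampling grid, not from the function :

 #include <stdio.h>
 #include <math.h>

 int main()
 {
     const int W = 512, H = 512;
     const double k = 50.0; // frequency scale ; arbitrary, crank it up for more aliasing

     FILE * f = fopen("alias.pgm","wb");
     if ( ! f ) return 1;
     fprintf(f,"P5\n%d %d\n255\n",W,H);

     for(int y=0;y<H;y++)
     for(int x=0;x<W;x++)
     {
         // one sample per pixel, centered on the image ;
         // no supersampling on purpose, the aliasing is the whole point
         double dx = x - W*0.5;
         double dy = y - H*0.5;
         double v = sin( k * (dx*dx + dy*dy) );
         unsigned char p = (unsigned char)( 255.0 * 0.5 * (v + 1.0) );
         fputc(p,f);
     }

     fclose(f);
     return 0;
 }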

9/26/2009

09-26-09 - Habits - Code Inlining

There's been an off and on discussion with some peers in email about the issue of putting small code functionality directly inline in a bigger function vs. splitting it out. I don't think it's actually an interesting topic so I don't want to repeat it here, but it made me think of something I want to write about.

Basically the contention is that even if you have a logically separate piece of work, sometimes it's better to go ahead and write it inline in the larger function, because it makes the code flow more linear and imperative and thus easier to read and debug. There are also obvious disadvantages to writing code inline - less strong separation of functional bits, temptation to copy-paste code, less ability to debug and test functions independently, etc. etc. All the pros and cons are obvious and thus not interesting.

The key point to me is that while writing code inline may be a win sometimes, it's a big loss at other times, and it requires you to make a careful, smart, logical decision about which way to go.

That's very bad. I think most smart people as they get older come to the realization that you shouldn't put yourself in positions where you have to make careful logical decisions over and over. Instead you should craft habits and rules for yourself that help you do the right thing without having to make careful decisions.

One obvious case that most smart people figure out is diet and exercise, or booze or other indulgences. Yes, it's perfectly fine to have some dessert once in a while, but if you open that door for yourself you're putting yourself in a situation where you are consciously making a decision ("is it okay for me to have cake today?") and you will inevitably get lazy and make the wrong choice sometimes.

As usual the best analogy is poker, and it's how this point was really made real for me. Smart people often start out playing poker trying to logically reason out every single action and think they don't need to be constrained by simple rules or habits. That's great if you really are correctly thinking through every situation, but you inevitably get tired or lazy or make mistakes, and if you're not constraining yourself with rules, you can make huge mistakes.

For example, there might be cases where the best play is to limp aces up front, or to not reraise with aces, but correctly making that decision requires a lot of careful thought, and the upside to making that decision is pretty small, while the downside to doing it in the wrong situation is very big. It's best to just make a rule for yourself that you always raise or reraise. It simplifies things, it removes decision points and lets you focus on more important issues. It might be +EV to call raises sometimes with hands like 68o, but it's best to just give yourself a rule that you never do that.

To be clear - these rules are specifically non-optimal. By making rules for yourself you are intentionally choosing to not try to be 100% optimal, so of course someone can say "you could have done something better there". That's not the point. The point is that if you try to make perfect decisions all the time you will occasionally fail very badly.

Winning poker play (or good coding, or good life living) is largely about making it easy for yourself to do the right thing.

9/25/2009

09-25-09 - Motion Search

I just read the paper on Patch Match and it makes me angry so I figure I'll write about the motion search method I'm developing for possible future use in the new RAD video stuff. PatchMatch is just so incredibly trivial and obvious, it's one of those things that never should have been a paper and never should have been accepted in a journal. It's a great thing for someone to write on their blog because you can describe it in about one sentence, and most experts in the field already know the idea and are probably doing it already. (I will say the good thing about the paper is they do a good job of gathering references to other papers that are related, such as stuff in texture synth and hole filling and so on which I find interesting).

Here's the one sentence version of PatchMatch : Seed your match field with random guesses or shitty initial matches; improve by incrementally propagating match offsets to neighbors and trying small random deltas to find improvements. (It's an absolute classic spin network magnetic moment relaxation kind of problem.)
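
To make that concrete, here's a rough sketch of that loop (toy code, not the paper's); 8x8-style blocks, plain SAD cost, a few sweeps; all of the sizes, radii and iteration counts here are arbitrary choices :

 #include <cstdlib>
 #include <climits>
 #include <vector>

 struct MV { int dx, dy; };

 // SAD between the block at (bx,by) in cur and the block at (bx+dx,by+dy) in prev
 static int blockSAD(const unsigned char * cur, const unsigned char * prev,
                     int w, int h, int bx, int by, int dx, int dy, int B)
 {
     int sx = bx + dx, sy = by + dy;
     if ( sx < 0 || sy < 0 || sx + B > w || sy + B > h ) return INT_MAX;
     int sad = 0;
     for(int y=0;y<B;y++)
     for(int x=0;x<B;x++)
         sad += abs( (int)cur[(by+y)*w + bx+x] - (int)prev[(sy+y)*w + sx+x] );
     return sad;
 }

 void patchMatchSketch(const unsigned char * cur, const unsigned char * prev,
                       int w, int h, int B, std::vector<MV> & mv)
 {
     int bw = w/B, bh = h/B;
     mv.assign(bw*bh, MV{0,0});
     std::vector<int> cost(bw*bh);

     // seed the match field with random guesses
     for(int i=0;i<bw*bh;i++)
     {
         mv[i].dx = (rand()%65) - 32;
         mv[i].dy = (rand()%65) - 32;
     }
     for(int by=0;by<bh;by++)
     for(int bx=0;bx<bw;bx++)
         cost[by*bw+bx] = blockSAD(cur,prev,w,h,bx*B,by*B,mv[by*bw+bx].dx,mv[by*bw+bx].dy,B);

     // keep a candidate vector if it beats the current best for this block
     auto tryMV = [&](int bx, int by, MV m)
     {
         int c = blockSAD(cur,prev,w,h,bx*B,by*B,m.dx,m.dy,B);
         if ( c < cost[by*bw+bx] ) { cost[by*bw+bx] = c; mv[by*bw+bx] = m; }
     };

     for(int iter=0;iter<4;iter++)
     {
         for(int by=0;by<bh;by++)
         for(int bx=0;bx<bw;bx++)
         {
             // propagation : try the offsets my left and up neighbors settled on
             if ( bx > 0 ) tryMV(bx,by, mv[by*bw + bx-1]);
             if ( by > 0 ) tryMV(bx,by, mv[(by-1)*bw + bx]);

             // random search : small random deltas around the current best, shrinking radius
             for(int r=16;r>=1;r>>=1)
             {
                 MV m = mv[by*bw+bx];
                 m.dx += (rand()%(2*r+1)) - r;
                 m.dy += (rand()%(2*r+1)) - r;
                 tryMV(bx,by,m);
             }
         }
         // (a real implementation alternates the scan direction each sweep)
     }
 }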

Here's what I've been doing : start with a match field set to all nulls (no match found yet). Then incrementally fill it in with matches and propagate them to neighbors. It proceeds in a few steps like this :

Step 1. Use computer vision methods to find feature points in a frame. Match feature points to the previous frame. This part is a bit tricky and tweaky; you only want to make matches that you're pretty confident in. Note that this matching is done based on a "characteristic" of the feature point, so it has no distance limit and is also somewhat immune to rotation and scaling and such. Sometimes this step finds some very good correspondences between the frames, but it's sparse - it only has high confidence at a few places in the frame, so you can't use it to find all the block matches (and you wouldn't want to even if you could). Generally this finds around 100 vectors.
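
One simple way to keep only the confident matches (just an illustrative sketch, not exactly what I do; the descriptor and the 0.7 ratio are placeholder choices) is nearest-neighbor descriptor matching with a distance-ratio test - accept a match only if the best candidate is clearly better than the second best :

 #include <vector>
 #include <cfloat>

 // a feature point : position plus a descriptor vector (the "characteristic")
 struct Feature { float x, y; std::vector<float> desc; };

 static float descDistSq(const std::vector<float> & a, const std::vector<float> & b)
 {
     // assumes both descriptors have the same length
     float d = 0;
     for(size_t i=0;i<a.size();i++) { float t = a[i]-b[i]; d += t*t; }
     return d;
 }

 // returns the index of the match in prevFeatures, or -1 if not confident enough
 int matchFeature(const Feature & f, const std::vector<Feature> & prevFeatures,
                  float ratio = 0.7f)
 {
     int best = -1;
     float bestD = FLT_MAX, secondD = FLT_MAX;
     for(size_t i=0;i<prevFeatures.size();i++)
     {
         float d = descDistSq(f.desc, prevFeatures[i].desc);
         if ( d < bestD ) { secondD = bestD; bestD = d; best = (int)i; }
         else if ( d < secondD ) secondD = d;
     }
     // distances are squared, so the ratio threshold is squared too
     if ( best >= 0 && bestD < ratio*ratio*secondD ) return best;
     return -1;
 }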

Step 2. Find "distinctive" spots in the frame. The goal is to find spots that are not degenerate - eg. not flat patches, not straight edges. The idea is that these are places where we can likely find a good motion vector with high confidence, unlike degenerate areas where there are lots of equally good match vectors. I use two mechanisms to find distinctive spots : one is the computer vision feature points that were not already used in the first matching step. The second is to take the "cornerness" map of the image using a Harris or Hessian operator on the derivative of Gaussians (this is a lot like an edge map, but it kills straight edges). Find the top 5% highest cornerness values that are local maxima, and use those as distinctive spots. All of the distinctive spots then do a long-distance brute-force block match (something like radius = 16 or 32) to try to find a good motion vector.
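
The brute force match at the distinctive spots is just a straight SAD search over a +-R window; sketch below, reusing blockSAD and MV from the PatchMatch snippet above (block size and radius are whatever you like) :

 // (bx,by) is the top-left pixel of the block being matched
 MV bruteForceSearch(const unsigned char * cur, const unsigned char * prev,
                     int w, int h, int bx, int by, int B, int R)
 {
     MV best = { 0, 0 };
     int bestCost = blockSAD(cur,prev,w,h,bx,by,0,0,B);
     for(int dy=-R;dy<=R;dy++)
     for(int dx=-R;dx<=R;dx++)
     {
         int c = blockSAD(cur,prev,w,h,bx,by,dx,dy,B);
         if ( c < bestCost ) { bestCost = c; best = MV{ dx, dy }; }
     }
     return best;
 }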

Step 3 : Flood fill to fill in the gaps. We now have presumably good motion vectors at a few key points in the frame. Go to their neighbors and search for match vectors that are close to the neighboring one we already found. Put that in the blank entry and push its neighbors onto the queue to continue the flood fill.

Step 4 : Relaxation pass. (This is not critical.) We now have a motion vector everywhere in the frame. For each match vector in the frame, look at its 4 neighbors. Examine match vectors that are near my 4 neighboring vectors. If one is better, replace self. Continue to next. Theoretically you should do this pass a few times, but I find 1 or 2 passes gets you very close to the result of doing it infinitely many times.
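
Here's a rough sketch of steps 3 and 4, again reusing blockSAD and MV from the PatchMatch snippet. "found" marks blocks that already got a vector in steps 1 and 2; the small per-neighbor search radius r and the 2 relaxation passes are example values, not gospel :

 #include <vector>
 #include <deque>
 #include <utility>
 #include <climits>

 void floodFillAndRelax(const unsigned char * cur, const unsigned char * prev,
                        int w, int h, int B,
                        std::vector<MV> & mv, std::vector<bool> & found, int r)
 {
     int bw = w/B, bh = h/B;
     const int NX[4] = { 1,-1, 0, 0 };
     const int NY[4] = { 0, 0, 1,-1 };

     // start the flood fill from every block that already has a confident vector
     std::deque< std::pair<int,int> > q;
     for(int by=0;by<bh;by++)
     for(int bx=0;bx<bw;bx++)
         if ( found[by*bw+bx] ) q.push_back( std::make_pair(bx,by) );

     // Step 3 : blank neighbors search a small +-r window around the filled neighbor's vector
     while( ! q.empty() )
     {
         int bx = q.front().first, by = q.front().second; q.pop_front();
         MV seed = mv[by*bw+bx];
         for(int n=0;n<4;n++)
         {
             int nx = bx+NX[n], ny = by+NY[n];
             if ( nx<0 || ny<0 || nx>=bw || ny>=bh || found[ny*bw+nx] ) continue;
             MV best = seed; int bestC = INT_MAX;
             for(int dy=-r;dy<=r;dy++)
             for(int dx=-r;dx<=r;dx++)
             {
                 int c = blockSAD(cur,prev,w,h,nx*B,ny*B,seed.dx+dx,seed.dy+dy,B);
                 if ( c < bestC ) { bestC = c; best = MV{ seed.dx+dx, seed.dy+dy }; }
             }
             mv[ny*bw+nx] = best;
             found[ny*bw+nx] = true;
             q.push_back( std::make_pair(nx,ny) );  // keep propagating outward
         }
     }

     // Step 4 : relaxation - each block tries its 4 neighbors' vectors and keeps any improvement
     // (a fuller version would also try small deltas around each neighbor's vector)
     for(int pass=0;pass<2;pass++)
     for(int by=0;by<bh;by++)
     for(int bx=0;bx<bw;bx++)
     {
         int myC = blockSAD(cur,prev,w,h,bx*B,by*B,mv[by*bw+bx].dx,mv[by*bw+bx].dy,B);
         for(int n=0;n<4;n++)
         {
             int nx = bx+NX[n], ny = by+NY[n];
             if ( nx<0 || ny<0 || nx>=bw || ny>=bh ) continue;
             MV m = mv[ny*bw+nx];
             int c = blockSAD(cur,prev,w,h,bx*B,by*B,m.dx,m.dy,B);
             if ( c < myC ) { myC = c; mv[by*bw+bx] = m; }
         }
     }
 }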

The key thing is that motion is usually semi-coherent (but not fully coherent, because we are not really trying to find true motion here, but rather just the best matching block, which is a lot more random than true motion is). By finding very good motion vectors in seed spots where we have high confidence, we can propagate that good information out to places where we don't have as much confidence. This lets us avoid doing large brute-force searches.

BTW I really do not understand the point of all the "diamond search" type shit in the video compression literature. It seems to just find really shitty motion vectors and is not making good use of the possibilities in the bit stream. Especially with GPU video encoding in this modern age, doing plain old big chunks of brute force motion search is preferable. (Yes, I know it's for speed, but it's a poor way to optimize, and the high quality encoders are still non-realtime anyway; if you're not realtime you may as well take some more time and do better. Plus the vast majority of use of non-realtime video encoders is in an encode-once decode-many type of scenario, which means you should spend a lot of cpu and encode as well as possible.)

With this method I find motion vectors using local searches of radius 8-16 that are the same quality as brute force searches of radius 50-100, which makes it about two orders of magnitude faster (and higher quality, since nobody does brute force searches that far).


ADDENDUM : To give this post a bit more weight, here are some numbers on quality from my video coder vs. brute force search radius :


 -s16  : rmse : 9.3725 , psnr : 28.7277
 -s26  : rmse : 9.2404 , psnr : 28.8510
 -s48  : rmse : 9.0279 , psnr : 29.0531
 -s64  : rmse : 8.9171 , psnr : 29.1603
 -s100 : rmse : 8.7842 , psnr : 29.2907
 -s9999: rmse : 8.5294 , psnr : 29.5465

(-s16 means it searches a 33x33 grid for motion vectors; -s9999 means it searches the full frame.)

The iterative feature-point propagation method described above gets :


 -sfast: rmse : 8.8154 , psnr : 29.2600

BTW for doing full-frame brute force search you obviously should use a block-space acceleration structure for high dimensional nearest neighbor search, like a kd-tree, a bd-tree (box decomposition) or vp-tree (vantage point). High dimensional spaces are nasty though; the typical idea of "find a cell then walk to its immediate neighbors" is not fast in high D because you have O(D) neighbors.

9/08/2009

09-08-09 - DXTC Addendum

Ryg pointed out that there are a few very important little details that I took for granted and didn't mention in my original DXTC postings , or was just not clear about :

One is that when I try all ways of hitting two given endpoints, I try both the 4-color and 3-color versions. That is, given two endpoint colors C0 and C1, I quantize them to 16 bits, then try the DXT1 palette that you get from {C0,C1} and also the one from {C1,C0} (the order of the endpoints in DXT1 determines whether the block is in 3-color or 4-color mode).

The second and related crucial thing is that in 3-color mode, the extra color is transparent black. If the texture has no alpha at all, I assume the user will not be using it as an alpha source, so I treat the transparent black as just black. That is, I do color palette selection with alpha just ignored.
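
For concreteness, here's the DXT1 rule in question as a little sketch : the stored order of the two 565 endpoints picks the mode, and index 3 in 3-color mode is transparent black. (The exact interpolation rounding varies between decoders; this just uses integer truncation.)

 #include <cstdint>

 struct Color { int r, g, b, a; };

 static Color from565(uint16_t c)
 {
     Color o;
     o.r = ((c >> 11) & 31) * 255 / 31;
     o.g = ((c >>  5) & 63) * 255 / 63;
     o.b = ( c        & 31) * 255 / 31;
     o.a = 255;
     return o;
 }

 // palette for a DXT1 block whose endpoints are stored as (end0,end1) in that order
 void dxt1Palette(uint16_t end0, uint16_t end1, Color palette[4])
 {
     Color c0 = from565(end0);
     Color c1 = from565(end1);
     palette[0] = c0;
     palette[1] = c1;

     if ( end0 > end1 )
     {
         // 4-color mode : two interpolated colors at 1/3 and 2/3
         palette[2] = Color{ (2*c0.r+c1.r)/3, (2*c0.g+c1.g)/3, (2*c0.b+c1.b)/3, 255 };
         palette[3] = Color{ (c0.r+2*c1.r)/3, (c0.g+2*c1.g)/3, (c0.b+2*c1.b)/3, 255 };
     }
     else
     {
         // 3-color mode : one midpoint, and index 3 is transparent black
         // (when the texture has no alpha I just treat this as plain black)
         palette[2] = Color{ (c0.r+c1.r)/2, (c0.g+c1.g)/2, (c0.b+c1.b)/2, 255 };
         palette[3] = Color{ 0, 0, 0, 0 };
     }
 }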

Apparently this is pretty important. I suspect this especially helps with the "4 means" method; if a bunch of the colors are near black, you want them to be classed together and then just ignored for the endpoint selection, so that they will go to the hard-coded black in 3-color mode and your interp end points will be chosen from the remaining colors.