Comments on cbloom rants: 05-27-09 - Optimal Local Bases

ryg (2009-05-27 17:15):

On 1D 4-point transforms: Yeah, the orthonormality forces the particular symmetry+sign structure; there aren't a lot of DOFs left.

I'm not sure that going for all-out full NxN KLTs is the best next step. One very obvious type of correlation that is everywhere in natural images is features that aren't aligned with the horizontal or vertical axes; by the nature of the tensor-product constructions used for both 2D block and wavelet transforms, such features get smeared over several different frequency bands (which then end up being correlated). Images with 45° angles are a very obvious example.

There's some work on using what are basically (spatially) rotated 1D wavelets instead: Bandelets (http://www.cmap.polytechnique.fr/~mallat/papiers/PublBandIEEE.pdf). In there, they construct an (R-D) optimal basis out of conventional wavelet packets (basically block transforms built out of wavelet filters) and Bandelets, with varying spatial sizes and Bandelet orientations. Very interesting. The most interesting aspect is that it captures a type of feature that's quite badly represented by ordinary transform types, using a very modest amount of extra side information, which means there's a reasonable chance of predicting and adapting it locally.

A big problem with any kind of high-dimensional "just throw it all in" model is that in practice they go smoothly from underdetermined to overfitted without any point in between where they're really doing a good job of modeling the data. They can even be both at the same time: there's not enough information to build a good prediction model from, so instead your model spends half its basis vectors on modeling the noise, jitter and quantization errors in your data. Not good.

Anonymous (2009-05-30 12:31):

You could also pick more than one "optimal" block transform, and then have each block tag itself with which transform to use.

However, I bet it's a super pain in the ass to try to work out an optimal set of N "optimal" transforms.

Also, obviously, having to transfer all the transforms is kind of sucky. So then the next step is to just have N pre-chosen block transforms that each block can choose from, those N having been chosen to be optimal across a large set of test images.

cbloom (2009-05-30 13:59):

"However, I bet it's a super pain in the ass to try to work out an optimal set of N "optimal" transforms."

It's not actually hard at all to get close. Basically you do a kind of k-means thing. You first categorize your blocks using some heuristic characterization method, then find the optimal transform for each category. Then, for each block, go back through and assign it to the category whose transform is optimal for that block. Iterate until it converges. This is basically what TMW does for linear predictors, and it's also a standard approach in machine learning for building local regressions.

"So then the next step is to just have N pre-chosen block transforms that each block can choose from, those N having been chosen to be optimal across a large set of test images."

Yeah, the big problem is that the choice of transform is totally redundant side information - that is, every one of the transform types can code every block, just in different ways, so you are intentionally introducing redundancy into the code stream.

Sometimes that's okay if you get enough win from it, but it's never ideal. You'd want to try to predict the transform type and entropy code it as well.
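A minimal sketch of the k-means-style fitting loop cbloom describes above, assuming numpy. The random initial categorization, the KLT fit, and the truncated-energy coding-cost proxy are all illustrative assumptions (a real codec would use an R-D estimate, and a smarter initial heuristic), and every name here is hypothetical rather than anything taken from TMW or the post:

```python
import numpy as np

def klt_basis(blocks):
    """Fit a KLT (PCA basis) to a set of flattened blocks."""
    centered = blocks - blocks.mean(axis=0)
    cov = centered.T @ centered / max(len(blocks), 1)
    _, vecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    return vecs[:, ::-1]                 # columns: largest-variance first

def coding_cost(block, basis, keep=4):
    """Proxy for coding cost: signal energy NOT captured by the first
    'keep' coefficients. A real codec would use an R-D estimate."""
    coeffs = basis.T @ block
    return float(np.sum(coeffs[keep:] ** 2))

def fit_transform_set(blocks, n_transforms=8, iters=20, seed=0):
    """blocks: (num_blocks, block_size) array of flattened pixel blocks.
    Returns n_transforms KLT bases and a per-block assignment."""
    rng = np.random.default_rng(seed)
    dim = blocks.shape[1]
    # Initial heuristic categorization; random here, but something like
    # dominant gradient direction per block would converge better.
    labels = rng.integers(0, n_transforms, size=len(blocks))
    for _ in range(iters):
        # 1. Find the optimal transform for each category.
        bases = [klt_basis(blocks[labels == k]) if np.any(labels == k)
                 else np.eye(dim) for k in range(n_transforms)]
        # 2. Reassign each block to the category whose transform
        #    codes it best.
        costs = np.array([[coding_cost(b, basis) for basis in bases]
                          for b in blocks])    # (num_blocks, n_transforms)
        new_labels = costs.argmin(axis=1)
        if np.array_equal(new_labels, labels):  # converged
            break
        labels = new_labels
    return bases, labels
```

Usage would be something like `bases, labels = fit_transform_set(blocks)` on 4x4 blocks flattened to 16-dimensional rows. As with plain k-means, this only converges to a local optimum, so the initial categorization heuristic matters.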