You have some testset {T} of many items, and you wish to fit some heuristic model M, which has some free parameters, over T. There may be multiple forms of the model and you aren't sure which is best, so you wish to compare models against each other.
For concreteness, you might imagine that T is a bunch of images, and you are trying to make a perceptual DXTC coder; you measure block error in the encoder as something like (SSD + a * SATD ^ b + c * SSIM_8x8 ) , and the goal is to minimize the total image error in the outer loop, measured using something complex like IW-MS-SSIM or "MyDCTDelta" or whatever. So you are trying to fit the parameters {a,b,c} to minimize an error.
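To make that concrete, here's a minimal sketch in code, with loud assumptions: each item in T has been reduced to precomputed block scores plus the outer-loop metric value (real code would re-run the encoder per parameter setting), the testset here is just random numbers, and model_error() is a made-up name :

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical stand-in for the testset: each row is one item, reduced
    # to precomputed block scores (ssd, satd, ssim) plus the "true" error
    # from the fancy outer-loop metric (IW-MS-SSIM or whatever).
    T = rng.random((200, 4))  # columns: ssd, satd, ssim, true_error

    def model_error(params, items):
        # heuristic block error : SSD + a * SATD ^ b + c * SSIM_8x8
        a, b, c = params
        ssd, satd, ssim, true_err = items.T
        pred = ssd + a * satd ** b + c * ssim
        # fit error = squared deviation from the outer-loop metric
        return np.sum((pred - true_err) ** 2)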
For reference, the naive training method is : run the model on all data in {T}, optimize parameters to minimize error over {T}.
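In code, naive training is a single fit over everything; a sketch, assuming T and model_error() from above :

    from scipy.optimize import minimize

    # naive training : optimize {a,b,c} against every item in {T} at once
    fit = minimize(model_error, x0=[1.0, 1.0, 0.0], args=(T,),
                   method="Nelder-Mead")
    naive_params = fit.x
    naive_error = fit.fun / len(T)  # average error over all of {T}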
The method of random holdouts goes like this (a code sketch follows the steps) :
Run many trials
On each trial, take the testset T and randomly separate it into a training set and a verification set.
Typically the training set is something like 75% of the data and the verification set is the other 25%.
Optimize the model parameters on the {training set} to minimize the error measure over {training set}.
Now run the optimized model on the {verification set} and measure the error there.
This is the error that will be used to rate the model.
When you compute the average error, compensate for the size of the model thusly :
average_error = sum_error / ( [num in {verification set}] - [dof in model] )
Record the optimal parameters and the error for that trial
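Here's one way the trial loop might look; a sketch assuming T and model_error() from the earlier snippets, with run_holdout_trials() as a made-up helper name :

    import numpy as np
    from scipy.optimize import minimize

    def run_holdout_trials(error_fn, x0, T, num_trials=50, train_frac=0.75,
                           seed=1):
        rng = np.random.default_rng(seed)
        dof = len(x0)  # degrees of freedom = number of fitted parameters
        params_per_trial, error_per_trial = [], []
        for _ in range(num_trials):
            # randomly split T into ~75% training / ~25% verification
            perm = rng.permutation(len(T))
            split = int(train_frac * len(T))
            train, verify = T[perm[:split]], T[perm[split:]]
            # optimize the parameters on the training set only
            fit = minimize(error_fn, x0, args=(train,), method="Nelder-Mead")
            # rate the model on the held-out verification set,
            # compensating for the size of the model
            sum_error = error_fn(fit.x, verify)
            avg_error = sum_error / (len(verify) - dof)
            params_per_trial.append(fit.x)
            error_per_trial.append(avg_error)
        return np.array(params_per_trial), np.array(error_per_trial)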
Now you have optimized parameters for each trial, and an error for each trial. You can take the average over all trials, but you can also take the sdev. The sdev of the error tells you how well your model is really working - if it's not close to zero then you are missing something important in your model. Likewise the sdev of each fitted parameter tells you whether that term is real - a term whose value has a large sdev across trials might just be a random factor that's not useful in the model, and you should try again without it.
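Continuing the sketch, the per-trial statistics are just a mean and sdev over the recorded trials :

    params, errors = run_holdout_trials(model_error, [1.0, 1.0, 0.0], T)
    print("error   : mean %.4f  sdev %.4f" % (errors.mean(), errors.std()))
    for name, col in zip("abc", params.T):
        print("param %s : mean %+.4f  sdev %.4f" % (name, col.mean(), col.std()))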
The method of random holdouts reduces over-training risk, because in each run you are measuring error only on data samples that were not used in training.
The method of random holdouts gives you a decent way to compare models which may have different numbers of DOF. If you just use the naive method of training, then models with more DOF will always appear better, because they are just over-fitting your particular data set.
That is, in our example say that (SSD + a * SATD ^ b) is actually the ideal model and the extra term ( + c * SSIM_8x8 ) is not useful. As long as that term is not just a linear combo of the other terms, naive training will find a "c" that uses it to compensate for random variations in your particular testset. And in fact that incorrect "c" can be quite a large value (along with a compensating negative "a").
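To compare the two candidate models, run the 2-parameter form through the same holdout machinery and compare the dof-compensated verification errors (continuing the sketch; a naive full-set fit would always favor the extra term instead) :

    def model_error_2(params, items):
        # candidate model without the SSIM term : SSD + a * SATD ^ b
        a, b = params
        ssd, satd, ssim, true_err = items.T
        pred = ssd + a * satd ** b
        return np.sum((pred - true_err) ** 2)

    params2, errors2 = run_holdout_trials(model_error_2, [1.0, 1.0], T)
    print("3-param holdout error %.4f  vs  2-param %.4f"
          % (errors.mean(), errors2.mean()))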
This kind of method can also be used for fancier stuff like building complex models from ensembles of simple models, "boosting" models, etc. But it's useful even in this case where we wind up just using a simple linear model, because you can see how the fit varies over the random holdouts.