10-27-11 - Metrics

The best thing you can ever have in software development is a good metric that you are trying to optimize for. A repeatable test case that produces a score and all you have to do is maximize that score.

However, a performance metric that doesn't exactly match what you want to optimize can be very harmful. You have to be very careful about how you set your metric, and about over-training for it.

If you make a speed test where you run the same bit of code over and over a thousand times, you wind up creating code that overlaps well with itself and runs fast when it's hot in cache, and maybe code that factors out to a precompute then repeat - not necessarily things that you wanted.
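A minimal sketch of the "precompute then repeat" trap, in Python. Everything here is made up for illustration: `expensive` is a stand-in workload, and the benchmark counts how often the real work executes instead of measuring wall-clock time, so the result is deterministic. A cached version looks a thousand times better on the repeat-the-same-call test and no better at all on varied input.

```python
from functools import lru_cache

counter = {"calls": 0}  # how often the expensive body really executes


def expensive(n):
    """Stand-in for the code under test (hypothetical workload)."""
    counter["calls"] += 1
    return sum(i * i for i in range(n))


# "Optimized" version: cache the answer, i.e. precompute then repeat.
fast = lru_cache(maxsize=None)(expensive)


def run_benchmark(func, inputs):
    """Run func over inputs; return how many times real work was done."""
    counter["calls"] = 0
    for n in inputs:
        func(n)
    return counter["calls"]


repeated = [500] * 1000       # the same call a thousand times
varied = list(range(1000))    # a thousand different calls

repeat_work = run_benchmark(fast, repeated)
fast.cache_clear()
varied_work = run_benchmark(fast, varied)
print(repeat_work, varied_work)  # prints: 1 1000
```

The cache is a perfectly good answer to the benchmark as posed; it just isn't an answer to "make this code faster in general".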

If you set a compression challenge based on the Calgary Corpus, you want to get just great compressors, but instead you get compressors specifically tuned for those files (mainly English text).
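A rough sketch of corpus tuning, using Python's `zlib` with a preset dictionary as a stand-in for "a compressor tuned for the test files". The tiny `corpus` and the sample inputs are invented for illustration; this is not how any actual Calgary Corpus entrant works. Priming the compressor with the corpus gives a big win on data that looks like the corpus and nothing on data that doesn't.

```python
import random
import zlib

# Hypothetical "benchmark corpus": a couple of English sentences.
corpus = (
    b"the quick brown fox jumps over the lazy dog. "
    b"a good metric is a repeatable test case that produces a score. "
)


def plain_size(data):
    """Compressed size with the general-purpose compressor."""
    return len(zlib.compress(data, 9))


def tuned_size(data):
    """Compressed size when the compressor is primed with the corpus."""
    comp = zlib.compressobj(level=9, zdict=corpus)
    return len(comp.compress(data) + comp.flush())


# Input that resembles the benchmark corpus.
in_corpus = b"a good metric is a repeatable test case that produces a score. "

# Non-text input: byte values 128..255 never occur in the ASCII corpus.
rng = random.Random(0)
other = bytes(rng.randrange(128, 256) for _ in range(len(in_corpus)))

text_gain = plain_size(in_corpus) - tuned_size(in_corpus)
other_gain = plain_size(other) - tuned_size(other)
print("bytes saved on corpus-like text:", text_gain)
print("bytes saved on other data:      ", other_gain)
```

Scored only on corpus-like text, the tuned compressor is the clear winner, which is exactly the problem.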

An example that has misled many people is automated financial trading software. It might seem that that is the ideal case - you get a score, how much money it makes - and you just optimize it to make more money and let it run. But that's not right, because there are other factors, such as risk. If you just train the software to maximize EV it can wind up learning to do very strange things (like circular arbitrage trades that require huge leverage to squeeze tiny margins; this is part of what killed LTCM for example).
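To make the EV-versus-risk point concrete, here is a toy comparison in Python. The payoff tables are made-up numbers, and the `risk_adjusted` score (EV minus a volatility penalty, with an arbitrary `risk_aversion` knob) is just one simple assumed way to fold risk into the metric, not a recommendation for how real trading systems should be scored.

```python
import math

# Hypothetical payoff tables: (probability, profit) pairs. Made-up numbers.
leveraged = [(0.99, 2.0), (0.01, -100.0)]  # tiny edge, rare blow-up
steady = [(1.0, 0.9)]                      # smaller but certain profit


def expected_value(outcomes):
    """Probability-weighted mean profit."""
    return sum(p * x for p, x in outcomes)


def std_dev(outcomes):
    """Standard deviation of profit, used here as a crude risk measure."""
    mean = expected_value(outcomes)
    return math.sqrt(sum(p * (x - mean) ** 2 for p, x in outcomes))


def risk_adjusted(outcomes, risk_aversion=0.1):
    """EV minus a volatility penalty; risk_aversion is an assumed knob."""
    return expected_value(outcomes) - risk_aversion * std_dev(outcomes)


for name, strategy in [("leveraged", leveraged), ("steady", steady)]:
    print(name, expected_value(strategy), risk_adjusted(strategy))
```

On pure EV the leveraged strategy wins (about 0.98 against 0.9); once volatility is penalized at all, the ranking flips. An optimizer that only ever sees the first number will happily walk toward the blow-up.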

The only time you can really go nuts optimizing for a metric is when the metric is the real final target of your application. Otherwise, you have to be really careful and take the metric with a grain of salt; you optimize for the metric, but also keep an eye on your code and use your human judgement to decide if the changes you're making are good general changes or are just over-specific to the metric.

Anyone in software should know this.

Which is what makes it particularly disturbing that the Gates Foundation supports moronic metric-based education.

When you set simple performance metrics for big bureaucracies, you don't make things better. You make the bureaucracies better at optimizing those metrics. And since they have limited resources and limited amounts of time and energy, that typically makes everything else worse.

Granted, Gates is not so moronic as to advocate "teaching to the test", but even a more complicated cocktail of metrics (which they have yet to define, instead pouring money into metrics research) will not be any different. If you're going to pay and hire and fire people based on metrics you create a horrible situation where any creative thought is punished.

(I think Gates' opposition to small class sizes reflects an over-attention to test results, which have been shown to not correlate strongly with class size, and a lack of common sense about what actually happens in a classroom.)

The irony is that it's just like the way that horrible teachers grade their students. It's like those essay questions on AP exams where the grader doesn't actually read your essay and appreciate what you're saying at all; they just look for the key words that you're supposed to have used if your answer is correct. You could write something that doesn't make sense at all, and as long as it has the correct "thesis/evidence/summary" structure you get full points.

It gets me personally hot and bothered. My experience of American public schools was that they were generally absurdly soul-crushing in a bureaucratic Kafka-esque way; like you would be tested for your creativity and independent thought process, and the way that was done was you had to recite back the specific problem solving steps that you had been told were the method. In that general depressing stupidity, I was lucky enough to have a few teachers that really touched me and helped me through life because they just engaged me as a human being and were flexible about how I was allowed to go about things. In terms of objective evaluations I'm sure many of those special teachers would have done very poorly. They often spent entire class sessions rambling on about their personal lives or about politics - and that was great! That was so much more valuable to a child than taking turns reading out of the textbook or following the official lesson plan.


jfb said...

Metrics in education are a vain attempt to counteract the lack of market pricing. Being arbitrary instead of being an expression of what people actually want in education, they aren't going to work.

I think if vouchers aren't politically acceptable, then what they ought to do is make schooling entirely optional, and only fund per student that is voluntarily attending. There's a 'metric' for you -- is it worthwhile enough *to the student* to even bother to go. I wasted many years of my young life where the answer would have been 'no'. And that doesn't mean a loss of learning, it just wouldn't have been bullshit done slowly in, yeah, a soul-crushing environment.

By the way, have you read http://chrishecker.com/Metrics_Fetishism ? It's pertaining to games but otherwise about the same thing. The gradient ascent bit struck me as spot on.

Thatcher Ulrich said...

This is the blog of my friend Mike G who is trying to improve education at a detailed level. I think it's a balanced, pragmatic and informed take on metrics etc although mainly he is focused on the problems of relatively bad schools.


Here's an essay he wrote about metrics & research in education:


cbloom said...

Reminds me of this :


I find the whole idea of researching human productivity a bit distasteful. For one thing the research is almost always just plain wrong. They operate over much too short of a term. They judge the results in too narrow of a way. They also don't control for constant effort (eg. in the teacher phone call case, the result is useless; you have to compare against an alternative way for the teacher to use that amount of time & effort). They are severely biased by the fact of having an observer. All the research on charters is severely biased by the population selection of charters. etc.

All of that is sort of benign as long as the research just goes in a journal and you are free to use it or not.

eg. as programmers we see this stuff all the time - newsflash : Java coders are 50% more efficient than C coders! Using SQL instead of raw files halves coding time! Pair programming FTW!

And those of us who are not foolish just go "LOL whatever, I'll make my own decision about how to do things".

The problem with things like schools is that these methods will be forced on the teachers. And that I find inhumane.

cbloom said...

Kitchen Stories is the benign, more modest face of this. The thing I really think of is Stalin's or Mao's programs of "efficiency" ; the idea of a state bureaucracy deciding how the workers should do their work, and then cooking up metrics for success which lead to bizarre distortions.
