However, having a performance metric that doesn't exactly match what you want to optimize can be very harmful. You have to be very careful about how you set your metric, and wary of over-training for it.
If you make a speed test where you run the same bit of code over and over a thousand times, you wind up creating code that runs fast when it's hot in cache, or code that factors work into a one-time precompute followed by cheap repeats - not necessarily the things that you wanted.
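To make the pitfall concrete, here is a minimal benchmarking sketch in Python (the harness shape is the point; the cache effect itself is far more dramatic in native code, and the buffer sizes here are just illustrative assumptions):

```python
import time

def benchmark(fn, inputs):
    """Total wall-clock time to call fn on each input, in seconds."""
    start = time.perf_counter()
    for x in inputs:
        fn(x)
    return time.perf_counter() - start

data = list(range(10_000))

# "Hot" test: the same input two hundred times. After the first call
# the data is warm in cache, so this mostly measures the best case.
hot = benchmark(sum, [data] * 200)

# Varied inputs: each call touches a different buffer, which is
# closer to how the code behaves in real use.
buffers = [list(range(i, i + 10_000)) for i in range(200)]
varied = benchmark(sum, buffers)

print(f"hot: {hot:.4f}s  varied: {varied:.4f}s")
```

Code tuned to win the "hot" test can lose on the "varied" test, which is usually the one that resembles production.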
If you set a compression challenge based on the Calgary Corpus, you want great general-purpose compressors, but instead you get compressors specifically tuned for those files (mainly English text).
An example that has misled many people is automated financial trading software. It might seem like the ideal case - you get a score, how much money it makes - so you just optimize it to make more money and let it run. But that's not right, because there are other factors, such as risk. If you just train the software to maximize EV it can wind up learning to do very strange things (like circular arbitrage trades that require huge leverage to squeeze tiny margins; this is part of what killed LTCM, for example).
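A toy sketch of the EV-vs-risk gap, with entirely made-up return numbers (the strategies and figures are hypothetical, chosen only to show how the two metrics can disagree):

```python
import statistics

# Illustrative daily returns over 250 trading days (hypothetical data).
# "steady": a small, consistent edge.
steady = [0.0008, 0.0012] * 125

# "levered": a tiny margin squeezed with huge leverage -- it wins
# almost every day, then occasionally blows up catastrophically.
levered = [0.02] * 245 + [-0.9] * 5

def ev(returns):
    """The naive metric: mean daily return."""
    return statistics.mean(returns)

def risk_adjusted(returns):
    """A Sharpe-like ratio: mean return per unit of volatility."""
    return statistics.mean(returns) / statistics.stdev(returns)

# The blow-up-prone strategy scores HIGHER on the naive metric...
print(ev(levered), ev(steady))
# ...and far worse once risk enters the picture.
print(risk_adjusted(levered), risk_adjusted(steady))
```

An optimizer pointed only at EV happily picks the strategy that eventually ruins you; that's the metric doing exactly what you asked, not what you wanted.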
The only time you can really go nuts optimizing for a metric is when the metric is the real final target of your application. Otherwise, you have to be really careful and take the metric with a grain of salt; you optimize for the metric, but also keep an eye on your code and use your human judgement to decide whether the changes you're making are good general improvements or just over-fit to the metric.
Anyone in software should know this.
Which is what makes it particularly disturbing that the Gates Foundation supports moronic metric-based education.
When you set simple performance metrics for big bureaucracies, you don't make things better. You make the bureaucracies better at optimizing those metrics. And since they have limited resources and limited amounts of time and energy, that typically makes everything else worse.
Granted, Gates is not so moronic as to advocate "teaching to the test", but even a more complicated cocktail of metrics (which they have yet to define, instead pouring money into metrics research) will not be any different. If you're going to pay and hire and fire people based on metrics, you create a horrible situation where any creative thought is punished.
(I think Gates' opposition to small class sizes reflects an over-attention to test results (which have been shown to not correlate strongly with class size) and a lack of common sense about what actually happens in a classroom.)
The irony is that it's just like the way that horrible teachers grade their students. It's like those essay questions on AP exams where the grader doesn't actually read your essay or appreciate what you're saying at all; they just look for the key words that you're supposed to have used if your answer is correct. You could actually write something that doesn't make sense at all, and as long as it has the correct "thesis/evidence/summary" structure you get full points.
It gets me personally hot and bothered. My experience of American public schools was that they were generally absurdly soul-crushing in a bureaucratic Kafka-esque way; you would be tested on your creativity and independent thought process, and the way that was done was that you had to recite back the specific problem-solving steps you had been told were the method. In that general depressing stupidity, I was lucky enough to have a few teachers who really touched me and helped me through life, because they just engaged me as a human being and were flexible about how I was allowed to go about things. In terms of objective evaluations I'm sure many of those special teachers would have done very poorly. They often spent entire class sessions rambling on about their personal lives or about politics - and that was great! That was so much more valuable to a child than taking turns reading out of the textbook or following the official lesson plan.