12-17-09 - Spam Stupidity

A friend sent me this email a few days ago, and Gmail wisely decided it was Spam :

"You work on a 64 bit PC at Rad, right? Did you have any weird issues with any dev software? Was it a relatively painless switch?"

This is one of the most obviously wrong spam calls I've gotten yet; it's just completely retarded.

I've written before about how they should clearly have an exclusion for people who I have extensively emailed with in the past. WTF. (one of the most retarded examples that I've mentioned previously was randomly picking a few mails out of an ongoing thread and calling them spam, but not the others in the same thread)

Clearly another one should be : if there are no links in the email, then greatly raise the bar for calling it spam (ie. call fewer things spam). Only spam with links is dangerous, and it's very rare to get spam without links these days anyway.

Also if there's no mention of banks, money, credit, penises, or viagra, it's probably not spam.

Obviously their spam filter is just broken. But even aside from being broken, I'm sure it's missing the concept of cost/benefit. That is, for a given mail, you need to decide how bad it would be to misclassify it as spam when it's not spam, and vice-versa. The thing is, that cost is not a constant. It should be dependent on the content and the sender. There are some simple cases, like if the content is "harmless" - no links, no attachments, no mention of Nigeria - then the cost of letting through spam is not very high. Once you guess the cost of each outcome, then you can have your Bayesian spam system give you a guess of what % chance this is spam, and you evaluate the EV of each classification and make the maximum EV decision.
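
To make the EV idea concrete, here's a rough sketch in Python. This is not how Gmail works; the Mail fields, the costs() helper and every cost number are made up for illustration, and p_spam stands in for whatever probability the Bayesian filter spits out. The point is just that the costs depend on the mail, and the decision is whichever action has the lower expected cost.

```python
from dataclasses import dataclass


@dataclass
class Mail:
    p_spam: float          # filter's estimated probability of spam, 0..1 (assumed input)
    has_links: bool
    has_attachments: bool
    known_sender: bool     # someone I've emailed with extensively


def costs(mail):
    """Return (cost of junking a real mail, cost of delivering a spam).

    The numbers are arbitrary placeholders, not tuned values.
    """
    # Losing a real mail from a known correspondent is much worse.
    cost_lost_mail = 100.0 if mail.known_sender else 20.0
    # "Harmless" content (no links, no attachments) is cheap to let through.
    harmless = not (mail.has_links or mail.has_attachments)
    cost_spam_delivered = 1.0 if harmless else 5.0
    return cost_lost_mail, cost_spam_delivered


def classify(mail):
    """Pick the action with the lower expected cost (ie. the maximum EV)."""
    cost_lost_mail, cost_spam_delivered = costs(mail)
    # If we call it spam, we only pay when it was actually a real mail.
    expected_cost_mark_spam = (1.0 - mail.p_spam) * cost_lost_mail
    # If we deliver it, we only pay when it was actually spam.
    expected_cost_deliver = mail.p_spam * cost_spam_delivered
    return "spam" if expected_cost_mark_spam < expected_cost_deliver else "inbox"


# The mail from my friend: no links, known sender, filter is 90% sure it's spam.
friend = Mail(p_spam=0.9, has_links=False, has_attachments=False, known_sender=True)
# A stranger's mail with links, same 90% score.
stranger = Mail(p_spam=0.9, has_links=True, has_attachments=False, known_sender=False)
```

With these made-up costs, classify(friend) comes out "inbox" and classify(stranger) comes out "spam", even though the filter gave them the same score. Equivalently, the mail gets called spam only when p_spam is above cost_lost_mail / (cost_lost_mail + cost_spam_delivered), so a harmless mail from a known sender needs a much higher score to be junked.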
