10-20-06 - 1

So, for basic Collaborative Filtering, the strongest thing from the literature that I've found is "Slope One". If you think about it for a few seconds you'll realize various ways to improve their algorithm. In the end the whole collaborative filtering problem comes down to choosing a set of supports which you guess are well correlated with your desired prediction, and then choosing weights for each of those supports. Finally, you might apply corrections if you can make estimates of the expected deviation.
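To make the "supports plus weights" framing concrete, here's a minimal sketch of Weighted Slope One in Python. The ratings data is made up for illustration; the supports are the user's own ratings of other items, and the weights are just the co-rating counts:

```python
from collections import defaultdict

# Toy ratings: user -> {item: rating}. Hypothetical data for illustration.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 2},
    "bob":   {"A": 3, "B": 4},
    "carol": {"B": 2, "C": 5},
}

def slope_one_deviations(ratings):
    """dev[(j, i)] = average of (r_j - r_i) over users who rated both j and i."""
    diffs = defaultdict(float)
    counts = defaultdict(int)
    for user_ratings in ratings.values():
        for j, rj in user_ratings.items():
            for i, ri in user_ratings.items():
                if i != j:
                    diffs[(j, i)] += rj - ri
                    counts[(j, i)] += 1
    return {k: diffs[k] / counts[k] for k in diffs}, counts

def predict(user, item, ratings, dev, counts):
    """Weighted Slope One: each support (an item the user rated) is shifted
    by the average deviation and weighted by its co-rating count."""
    num = den = 0.0
    for i, ri in ratings[user].items():
        if (item, i) in dev:
            c = counts[(item, i)]
            num += (ri + dev[(item, i)]) * c
            den += c
    return num / den if den else None
```

The obvious improvements fall out of staring at this: better weights than raw counts, and correcting the deviations for per-user or per-item biases.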

I now seem to be getting around 0.920 and there are still a million little areas for improvement. Every time I check off one "todo" I add five more as more questions open up (the vast majority of which are dead ends but need to be explored).

One thing's been troubling me lately - I've tried various schemes and thrown out the ones that weren't as good. Perhaps that's wrong though? They aren't as good over all the data, but maybe they're better on portions of the data, and if I could identify those portions, they would help overall. This was part of the surprise of Volf's compression work - adding in predictors that are much worse overall can still be a huge win if they are selected on the portions they do well on. With CF it's intuitively obvious that you have these similar-item based predictors and similar-user based predictors. In general similar-item does better. However, on certain queries you might be able to guess that they will be more user-correlated than item-correlated, so you can use your similar-user predictor in that case and win.
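A crude sketch of that per-query selection idea: blend the two predictors with a weight driven by how much support each one has for this particular (user, item) query, rather than by their global accuracy. The gate here is hypothetical, just to show the shape; a real one would be learned:

```python
def gated_predict(item_pred, user_pred, item_support, user_support):
    """Combine a similar-item and a similar-user prediction per query.

    item_pred / user_pred: predictions (or None) from each model.
    item_support / user_support: e.g. counts of co-ratings backing each
    prediction, used as a hypothetical proxy for per-query reliability.
    """
    if item_pred is None:
        return user_pred
    if user_pred is None:
        return item_pred
    den = item_support + 0.5 * user_support
    if den == 0.0:
        return 0.5 * (item_pred + user_pred)
    # Leans item-based (the 0.5 discount), since that usually wins overall,
    # but lets a well-supported user-based prediction take over.
    w = item_support / den
    return w * item_pred + (1.0 - w) * user_pred
```

The point is the same as in Volf's switching: the globally-worse predictor only needs to be trusted on the queries where its supports are strong.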
