10-28-06 - 1

I urge you all to boycott the ieee and acm. research papers online should be free.


10-25-06 - 1

I'm seeking reading material since I can't type. Help on :

"supervised learning" , via nnets or whatever, perhaps CART decision trees. find f() to minimize |y - f(X)|.

categorizing/clustering. related to DT problem. how to find planes or rules to split a huge set of N-d points into related groups that minimize some overall cost function.



10-24-06 - 1

I crashed my bike and have a separated right shoulder. I'm typing lefty so will probably not reply to your email.


10-20-06 - 2

Anyone know about "expert advice" (learning), combining predictors, etc. ? I know a bit about this field but it's a huge field and sifting through the literature is proving difficult. In particular it seems most of the problems are posed as incremental "games" where you get a time sequence of data and get to see if you were right or not after each guess. Also I'm trying to find Paul Volf's "Switching" paper from DCC98 , but it seems to be only available for pay in the IEEE library. That's especially annoying cuz I think I own the DCC98 book but I have it in my storage locker :(

10-20-06 - 1

So, for basic Collaborative Filtering, the strongest thing from the literature that I've found is "Slope One" . If you think about it for a few seconds you'll realize various ways to improve their algorithm. In the end the whole collaborative filtering problem comes down to choosing a set of supports which you guess are well correlated to your desired prediction, and then choosing weights for each of the supports. Finally you might apply corrections if you can make estimates of the expected deviation.

I now seem to be getting around 0.920 and there still a million little areas for improvement. Every time I check off one "todo" I add five more as more questions are opened up (the vast majority of which are dead ends but need to be explored).

One thing's been troubling me lately - I've tried various schemes and thrown out the ones that weren't as good. Perhaps that's wrong though? They aren't as good over all the data, but maybe they're better on portions of the data, and if I could choose that portion of the data, they would help overall. This was part of the surprise of Volf's compression work - adding in predictors that are much worse overall can still be a huge win if they are selected on the portions they do well on. With CF it's intuitively obvious that you have these similar-item based predictors and similar-user based predictors. In general similar-item does better. However, on certain queries you might be able to guess that they will be more user-correlated than item-correlated, so you can user your similar-user predictor on that case and win.


10-19-06 - 4

It's interesting running this Netflix stuff. I've got a 3 GHz P4 desktop and a 1.7 GHz PM laptop, both with 1 GB RAM and fast disks. Some of the operations are slightly faster on the desktop, but others are *much* faster (like 2X) on the laptop. A lot of the ops are very memory bound, running over lists of 50k elements and doing indirection from one big list to another big list.

10-19-06 - 3

I just realized that cosine similarity is very closely related to MSE similarity.

Consider two vectors A and B. The "MSE" of their difference is just |A-B|^2, the "rmse" is |A-B|. You can define a scale-invariant version of MSE as (|A-B|^2)/ab (where a = |A|, b = |B|). I'm not sure this scale invariance is a good thing or not, in practice is does weight things to the weightings, but people seem to like it because it's "elegant". Anyway, the Cosine law tells us that :

Cos(theta) = (a/b + b/a)/2 - (|A-B|^2)/2ab

The second term is familiar, the first term is this weird sort of ratio average. In the special case of unit vectors a = b = 1 this reduces to

Cos(theta) = 1 - (|A-B|^2)/2

So for unit vectors a "cosine" similarity and an MSE similarity will produce identical orderings. For non-unit vectors, we have this ratio term. The ratio term is minimized when a=b, so it gets bigger for length differences. In a sense it cancels out the length difference from |A-B|. Consider two parallel vectors of different length. The MSE metric considers these very different, but cosine says they are the same.

In practice with movie ratings there are some very weird things that happen here. The "A" vector is the vector of user A's ratings over N movies *subtracted by his average rating*. This subtraction is important, but it does a weird thing. It makes the vector of random direction (by definition of average), and generally makes it close to zero. If you look at a subset of movies that he rates close to average, his vector will be very close to zero and very random. The "B" vector is the same thing. If you now consider two vectors that are close to {0}, the MSE error between them is of course tiny, but the cosine between them is completely random. Obviously that's not a very sensible weighting. Similarly, consider the "scale invariant" version of MSE. You're dividing by the vector lengths. What that does in practice is make the error much larger for vectors near zero. (dividing by length does sort of work for Cosine because larger cosine = better, the opposite for mse)

Despite all this the "Pearson" correlation for user-user similarity does in fact seem to perform well.

10-19-06 - 2

So out of curiosity I just tossed together a trial run of my rebuilt Netflixxer and got an 0.925 , so that's encouraging.

I was thinking that to have any chance of winning I need to get some team-mates / helpers & use the power of the internet. I don't want to deal with signing people up & working out contracts where we'd have to negotiate shares & so on. I was thinking I'd do something like just give away my app, but make people agree that they won't do their own submission or something. Then I at least might attract some hobbyists who want to play with the problem and they might find some improvements or try some ideas that are beneficial.

10-19-06 - 1

Old Betty Boop Cartoons are like super cool. They're not funny, but they're surreal, the perspective is warping and things are changing shape all the time, and most of them are like music videos, set to a song with everything in the video sort of pulsing and dancing to the tune. A bunch of them have music & voices by Cab Calloway which is the fucking bees knees.


10-18-06 - 1

I just randomly stumbled upon Matt Mahoney's PAQ data compressor, which perhaps is the state of the art these days. I would say it follows in the tradition of PPMii-PPMZ/Rkive, though he doesn't really cite those as he apparently started from a neural net background which is weird. He's doing a lot of the things Malcolm Taylor invented in Rkive, such as using funny non-linearly-preceding contexts (eg. not just ABCD, BCD, CD, D, but also "xBxD" and "AxCx" and so on). The PAQ implementations that win challenges seem to be tweaked out to all hell, which is sort of lame, but that's what you have to do to win in practice. There's some very elegant stuff he's done. There's a short paper about it on his page but I can't really find a thorough description anywhere where he talks about different things he tried & why he's doing them like he is. Alas. He's doing bit-by-bit coding instead of byte-by-byte which I always thought was much more elegant (you don't have charactes and escapes, just 0's and 1's) but I always got worse performance with bits rather than bytes; I guess I was leaking some efficiency in each coding op which adds up when you do 8X more coding ops.


10-17-06 - 6

I've noticed two really odd things in the CF research papers so far.

1. They all use this "cosine" similarity measure. The cosine similarity is basically the dot product of the normalized vectors of ratings. What that means is that two movies with ratings like {2,2,2} and {4,4,4} are considered to be identical by this measure. Now, that's an okay thing assuming you're compensating for movie average, since if you subtract the average off both they are both {0,0,0}. However, the standard way of making an item-based prediction does NOT subtract off the average! It's reported in the literature as

Pred = Sum{other items} Weight(this,other) * Rating(other item) / sum of weights;

If you were to correct for averages it would be :

Pred = Average(this item) + Sum{other items} Weight(this,other) * (Rating(other item) - average(other item) / sum of weights;

But that's not what they report.

2. The exception to this (not correcting for average) is the "Similarity Fusion" paper (sigir06_similarityfuson.pdf) which DOES correct for averages, however, they specifically claim that their method in the case of item-only weights reduces to the standard item-based predictor which it in fact does NOT. It reduces to the 2nd term above which is the average-corrected item based predictor, which is quite different.

It seems to me the cosine weighting is very bad if you don't correct for averages, and yet the standard in the literature is to use the cosine weighting and not correct for averages. WTF. The standard "Item-Based Collaborative Filtering" paper (www10_sarwar.pdf) does try using a linear slope+bias to shift ratings between items, but they find that it hurts performance with lots of neighbors or dense data. Weird.

10-17-06 - 5

I had some garlic that was starting to sprout. You can still use it when the sprouts are small, and actually the sprouts are tender and mild, quite delicious, there's no need to cut them out. Anyway, I had too much garlic so I stuck this head in some soil and it's growing like gang-busters; it's shot up big sprouts in just a few days, it's probably the most fun thing I've ever seen growing, because you can almost see it happening in real time. It's next to my computer and I look back over every few hours and it's grown another centimeter !!

10-17-06 - 4

Okay, is it just me or is the "Similarity Fusion" paper completely BS ? (sigir06_similarityfuson.pdf). They initially claim that they're going to combine the prediction from similar users & similar movies based on their confidence in each (which sounds good & reasonable), but then they introduce hard-coded constants and basically just lerp between the two using these fixed constants.

10-17-06 - 3

I don't get why the poker sites give you these freaking rewards for play that you have to spend in their stupid store. I know it makes it seem like you're getting more value, eg. they can give you something with a "value" of $200 when it only costs them $50 or whatever, but I don't get why they don't just give you the option to get the cash value, or even less. Even freaking Wheel of Fortune in the bad old days had the cash option. You'd get your $5000 to spend on "prizes" and you could get all these garbage over-priced things or you could get a much lower "value" choice in cash. Gimme the cash!

10-17-06 - 2

Okay, so I tore apart my NetFlix app and redid some of the basics with more thorough investigations. I can now definitely confirm that most of what's done in the literature is just really wrong and dumb. It reminds me a lot of the way the Compression field was back in the 80's. Somebody came up with some decent ideas way back when, and everyone just keeps using parts of that algorithm even though it was just a rough/poor initial guess. People keep introducing fancier ideas but they keep the broken base. I don't mean to rag on the researchers and make it sound like I'm better than them. They come up with beautiful algorithms that I never would've come up with. But then they just don't try very hard. Like the classic example with PPM was the escape frequency. They tried 1. They tried 1/2. But what about 0.67 ? What about a number that varies based on the context depth & occupation? anyhoo...

Some of the current leaders have finally spoken up in the Netflix forum. They seem to all be using some sort of standard academic bayesian/neural net learning system. This could be hard to beat if they're good & have powerful server clusters to do the training, which they presumably do at universities.

10-17-06 - 1

I'm fascinated recently by the fantasy of super-hot muslim women who have to wear a full burka so no one ever sees anything except their eyes. I think it's replaced my pregger fetish.


10-16-06 - 2

I just found this set of Good Collaborative Filtering Papers (haven't read any yet). I'll release my Netflix code at some point if/when I decide I have no hope of winning.

So, I've read through a bunch of the literature. It seems like I independently invented most of the same techniques. Some interesting tidbits : I'm currently using 2 similar movies & 2 similar users, while the accepted good # in the academic community seems to be 30 (!!). (using more doesn't increase my run time at all since I'm looking at all of them anyway to pick the best 2). Also, the way they're comparing vectors is generally a dot product, while I've been using like a Euclidean distance. I'm not sure if that will make much of a difference. They also do fudging on the similarity to penalize small intersections and such; I have similar fudging and I've found that the details of those fudge factors are enormously important in practice. I can't tell in the literature if people have really investigated different similarity functions, or everyone is just using the one that was published first and has become standard.

One cool approach is the "Context-Boosted Collaborative Filtering" paper, though it seems completely unusable on large data sets because the first step is turning the sparse matrix into a full matrix by filling in the gaps with content-based predictions. That is, it would fill in the full 17k x 480k matrix of ratings, which is 8 billion ratings, which aside from taking about 8 GB of storage (not bad), would take around 1,000,000 hours to compute.

10-16-06 - 1

So, fucking FoxIt PDF Reader is a lot faster than Adobe's, but it's FUCKING MODAL when it's drawing the page, you can't scroll & zoom while drawing, which makes Adobe more usable in practice because when it starts drawing some ridiculous diagram of the inside of an engine or something I can page-down to get to a page that doesn't take seven years to show.


10-14-06 - 1

Parallel parking on a hill is so damn hard with a stick-shift. I did it today and I totally burned up my clutch, trying to inch back and forth on the hill without slamming into the cars around me. I've seen people here who do it like pros in spots that look way too small for them to fit. On that note, fucking people who park in the middle of the curb are asses and morons. Also, if you're parking at the end of the curb, so many fucking retards will come up as close to the next car as possible. Dumb fuck, you're at the end of the curb, you should park as close to the end as you're allowed - you're setting the end justifier, don't waste space at the end and just pack the line tighter. Goofs.


10-13-06 - 4

The car right next to mine got "booted" this morning and it freaked the hell out of me. It's right in front of my window so I saw them drive up in the DPT van and pull out the boot and walk over towards my car. I was thinking "wtf wtf" my heart was racing, and then I saw some other poor sucker get the boot. Phew. Parking is such a nightmare.

10-13-06 - 3

My Netflix contestant is close to Cinematch now, and my program is still mighty dumb. The predictor is quite sensitive to little things and you do have to be careful not to cheat on the probe data, because the probe queries are already in the conditioning set and they can help you a lot in subtle ways. For example, if you let them be included when computing the average rating on each movie, it actually helps a lot. I was doing that and got 0.91 rmse and thought I was the winner, but knew it was too good to be true.

Having the probe data contained in the training set is actually really annoying, because it means I can't fairly precompute very much based on the training set. For example, if you precompute the "most similar" other movie for each movie, you will be using the probe data as part of that, and that helps a *TON* which totally throws off testing on the probe.

There's one case maybe y'all can give me ideas on. It's surprisingly common to have a movie for which I can't find any good "similar" movies. These "red herring" movies wind up contributing a lot to the RMSE score because they tend to be predicted badly. Right now I'm just using the average rating of them as their prediction, but there must be something better.

10-13-06 - 2

In algorithms like this Netflix dealy, something I found back in the compression days is that there's two steps really to algorithm design - first you create a good general structure based on theory of what you think should happen. Then you find lots of little tweaky cases and start doing lots of heuristic "clever" things, and often those sort of finishing touches wind up being huge improvements. This can perplex the algorithm developer in two ways. First of all, you can sometimes have a pretty poor basic algorithm, but you tweak it and do all sort of cleverness to it, and it actually winds up performing really well. The problem is if you try other algorithms that might be much better, they will appear worse because you haven't put all the time of tweaking into them. You can dig yourself into a local minimum where you have all this fancy heuristic junk which really helps your poor algorithm, but doesn't help the better algorithm. The other funny thing that can happen is that some algorithms that are "better" are really hard to do these clever tweaks with. In data compression you have algorithms like CTW which are really beautiful theoretical structures and certainly base CTW beats base PPM, but PPM is intuitive and easy to play with and you can build all sorts of other ideas into it and you can wind up beating CTW because you can isolate the inefficiencies and attack them with various schemes.

One thing I'm surprised at. 25% of movies fall into a category of "super predictable" cases. In these cases, I gave the same rating to the two most similar movies, and the most similar user gave the same rating to the query movie. The RMSE of these queries is still 0.85 ; I would have thought these cases would be more predictable.

Part of the problem with making a really amazing predictor is that the ratings are so discretized. If somebody felt like a "3.5" about a movie, they will sort of randomly shove it to a "3" or a "4" depending on how they feel, but the difference is a whole 1.0 which is big.

10-13-06 - 1

I've done everything reasonably obvious with a linear predictor. There's a nasty trap in tweak factors, in that you are somewhat hard-coding a best fit to the data into your app with the tweak constants. That's fine as long as the data extrapolates well - eg. the fit on the "probe" is also close to the best fit for the "qualifying", but that will only be true if the data's not really strange and your clusters that you fit for are big enough. Any time you fit tweaks that only apply to a handful of elements you are in serious danger of fitting them to the training set too much and not really fitting the general case.

BTW I find the whole "Prize" aspect sort of annoying because it means no one is talking about their approaches publicly. Of course, even in fields where there is no prize, people are still super-secretive, because they always think they have some brilliant idea which is going to make them famous in the academic world, or get them patents and make them rich, blah blah blah.

Oh, I still could use some volunteers to run the heavy process if you have a fast machines that are idle.


10-12-06 - 1

I just realize that I say "k" for thousands of dollars, when I guess most people say "g", as in "I made 2k today beeyotches!". Party Poker is cutting off US players tomorrow and it was going nuts today. Wildest day of poker in my life. I got super unlucky for a long while there, and was at -1k for the day at the worst point, then went on a +3k heater.


10-10-06 - 2

Cracked pepper really is way better than ground pepper on steaks and it's very easy to do.

10-10-06 - 1

Anybody know how to make FireFox not do the thing where it incrementally loads & formats as it goes? I hate the way some web pages open and then reformat over and over as more elements load in. I just want it to wait until the necessary info for formatting is loaded and then be done, which IE mostly gets right.


10-09-06 - 1

So, I have a Netflix contestant that gets 0.978279 on the probe set. That would be on the leaderboard in like 10th place or so at the moment, but it's behind CineMatch and well behind the leaders who are now beating Cinematch (but still way off the 10% improvement needed to win). I'm guessing that the people who have gotten close to Cinematch know something of the standard algorithms in this field and have implemented one of them. LOL I guess that's a good approach. My app is really slow despite fitting all the data in RAM. I haven't run it on the full qualifier set yet, just the probe which is like 5X smaller, and it takes me a full day to run on the probe.


10-06-06 - 3

I got my first non-trivial predictor working on the Netflix data. It's a simple single support predictor using the one most similar movie that user has previously rated.

RMSE for guessing each movie gets its average rating : 1.051938
RMSE for Netflix' Cinematch : 0.9474
RMSE for simple one-movie predictor : 1.037155

So, that's not very encouraging. Obviously I knew it wasn't going to be close to Cinematch, but the improvement over just using the average for each movie is microscopic, and cinematch is like light years ahead.

I'm starting to think that beating Cinematch by 10% is all but impossible. It's a lot like data compression where the closer you get to the true entropy the harder it is to make advances. The Netflix Prize FAQ justifies their 10% criterion by saying the difference between the "average" predictor and Cinematch is about 10%, so they want 10% more. That fails to address two major factors : "average" is actually not an awful predictor at all, and the next 10% is way way harder than the first 10%.

10-06-06 - 2

Damn, I just found out about Sufjan Stevens playing in Berkeley. He's a corn-fed god-loving hipster music magician!!! The shows are sold out and people are scalping them for a fortune :(

10-06-06 - 1

I allocate two 300k buffers and Windoze says my app is using 480k of memory !? WTF !? I have plenty of RAM free, is it putting part of my allocations in the page file? When I pull the files in with Memory Mapped Files, I'm not sure what's happening. It all seems to work cool, but I'm very suspicious about what's happening. I wish it would just use up all my damn free memory on the app I'm running !!???!!!!

Anyone know anything about Memory mapping? Is there a reason I should memory map instead of just reading the whole file into big buffers?

Also, does anyone have a powerful machine that I can run a heavy process on for a long time? I estimate it will take about 200 hours of CPU time, 2 Gigs of RAM is necessary.


10-04-06 - 2

Random job ideas :

Postman. Apparently very hard to get. I like walking around
Parks worker. Similar situation, I like planting plants and such, being outside
Apprentice Carpenter. I've seen some adds for this; I have no skills but would like to learn. I could offer to work the first few weeks with no pay
Bike shop retail or repair. Again my bike repair skills are only passable not expert, but I'd like to learn.
Cook. I'd love to be a chef but that's a long hard road and being an entry-level prep cook or something is pretty brutal
Other ?

10-04-06 - 1

I'm assuming it would be a good thing to load the Netflix Prize data into an SQL DB like Postgre. The thing you want is to cross-index; for each movie you want the list of everyone who's rated it & their ratings; for each user you want all the ratings they've assigned to movies. You could do this easily in C just by having two linked lists on a {user,movie,rating} data item. The only reason for me to not just do it in C is that it's a ton of data and might not fit in memory. Plus I guess learning SQL might be useful some day.

Anybody who knows Windows - what's the best way to hold an 800 MB data structure on a machine with 1 GB of RAM ? I'm assuming if I just hold it in RAM I'm going to page like crazy and get killed, cuz Windoze is Stoopid. I guess I could write my program in basic C and boot to command line Linux or something ridiculous like that but that's not fun.


10-03-06 - 2

The Netflix Prize is pretty interesting. Basically they're trying to get someone to improve their recommendation algorithm and they pay you if you succeed. It seems they set the payment goal to be inaccessible, and their T&C means even if you don't reach the goal they own your algorithm. Anyway, I'm gonna check out the data set and see if I can do some mining and see what happens.

At the moment the site seems to be a mess. The documentation sucks. The forums are full of people who don't know how to program asking nonsense questions.

10-03-06 - 1

Dick Fucking Cheney is hillarious. They got the timing with the body language really well sync'ed. Also Pacino's Cuban accent is so lolololo.


10-02-06 - 3

Okay, i'll summarize the online poker legislation and what's going down for anyone who cares.

The new bill was tacked on the "SAFE ports" bill by Bill Frist primarily. It will soon be signed and go into law. Basically what it does is makes it illegal to provide online gambling or transactions to online gambling. It does not make it illegal for an individual/consumer to partake of said online gambling. Offshore sites will continue to operate. It will be illegal for US banks or other funding instruments to provide transactions with them, so the hard part will be getting your money in & out of them, but of course there will be ways. Another problem is that major publicily traded gambling companies will be shutting off their US customers because of this. This is a bit of a funny thing, because they're foreign companies they aren't directly affected by this law, but they have to comply with it if they want their stock traded in US markets, and so on and complicated stuff I don't really understand. Anyway, it seems all the major publicly trading companies will soon be shutting off US customers. That includes PartyPoker/PartyGaming, PokerStars, 888 (BetonSports), etc. There are other major private companies which will continue to allow US customers, the biggest being Full Tilt, Ultimate Bet, etc. While the law will go into affect immediately, the bank transfer prohibition won't go into affect into the bank regulatory body draws up the detailed regulations, which will take a while.

10-02-06 - 2

My sister's NYC pilates (plus) studio is now open; linky . She's been teaching pilates and studying the body/wellness for a long time. NYC is one of those unique ecosystems where you have all these super-rich people and then you have a whole class of semi-rich people that feed off of them by providing them services/treatments. It was a huge investment to get the business all set up, I'm impressed & glad.

10-02-06 - 1

"The City of No Limits" was really good. It almost felt like it could've gone more surreal and turned into a Borges story. There's a wonderful tradition of surreal movies from Spain and Latin America, things like "Sex and Lucia" , "Abre los Ojos", "The Devil's Backbone", etc. - movies where the narrative gets lost, we're not sure if the characters are dreaming, crazy, or if it's really happening and it's the rest of the world that's not seeing it.


10-01-06 - 2

I just realized my domain should've been "rles.com" Do you see why?

10-01-06 - 1

I'm having a bad downstreak in poker and I'm probably gonna have to sell some stock. In the mean time I've been pondering what it's like to be broke. There are so many brutal expenses that I just sort of roll my eyes at, but if I was truly broke working a minimum wage job they'd be cripling. That extra $500 the hospital charged me even though I have insurance, the $100 in traffic tickets for bullshit non-violations, what do you do if you just don't have that money? I guess you don't pay it, and then it adds up with fees, the creditors come after you, and your life goes to shit.

old rants