02-29-08 - 2

"proselytize" is mis-spelled so often that Google won't even correct it. Some common ones you can find are "prosletize" and "prosletyze", "proseltize" and "proseltyze". The last one actually has a Yahoo Answers page where people define it and fail to point out the horrible misspelling.

02-29-08 - 1

I'm totally going gray now. It's alright, I think I'll be able to rock the sexy salt & pepper look. I think my dad actually looked better once he went salt & pepper, he lost the nerdiness and became distinguished. The downside is it will be harder to fit in with the youngsters, not that I really do anymore.

On the one hand, spending all your time worrying about getting girls is pretty retarded, it seems so shallow and unproductive. On the other hand, finding the right mate is probably the most important single thing in life, so not spending most of your energy on doing that is pretty retarded.


02-27-08 - 5

Things I wanted to do while on "sabbatical" :

1. Drive around the US and camp and hike. Did it, but did it much faster than I dreamed. Got bored and came home.

2. Travel, to South America and South East Asia. Didn't do it. Couldn't find anyone to go with and didn't want to go alone.

3. Get back in shape, in particular work on my back/neck/posture/etc which was in serious trouble when Oddworld closed. Sort of did it, though wound up spending most of my time and energy on shoulder rehab which is still fucked. My back/neck was way better for a while when I was off the computer, but I've wound up spending a ton of time sitting at the computer again and those problems come right back.

4. Play team sports. Didn't do it.

5. Be more social / work on getting out of my shell. Made some serious effort at the beginning but failed and gave up. Regressed completely with Dan.

6. Make an indie game or some kind of other fun software project for profit. Didn't do it. Made a few little aborted attempts but never thought of a game that I actually wanted to make.

7. Get better at poker and try to beat the highest levels. Mmmm.. never really did this the way I wanted, partly because I was spending a lot of energy on Dan and for me anyway it's impossible to get to the highest levels and have a relationship at the same time. I did play and beat 2000 NL online, which is pretty good, so we'll call this "sort of done".

8. Try some other careers to see how I would like them - like being a chef, a farmer, a journalist. Didn't do it. I did cook more but that counts for nothing. I also wrote some articles to submit to editors to try to get a writing job, but never actually submitted any of them.

9. Have lots of sex with different girls. I never really had a partying / screwing around phase, since I was working on software pretty much full time all through college, in addition to lots of extracurricular physics work. This has never been anything that I really wanted to do, but what annoys me is that I always have the thought somewhere in the back of my head that I didn't do it, so I wanted to do it just so that thought would go away and stop taking up brain space. Didn't do it.

10. Learn to play guitar. Made a few half hearted attempts. Failed.

11. Learn to make music on the computer with some kind of tracker thingy. Spent a few hours on this a few times and was flummoxed by how unusable these apps are. I actually started this again in the last few days, but it's still a "Didn't Do" at the moment.

12. Figure out what's going to make me happy and how I should spend the rest of my life. Didn't do it.

13. Be a really great boyfriend for Dan and succeed in having a healthy open relationship. Didn't do it.

Addendum : I know potential employers are reading this garbage brain dump of mine at the moment so I should note the thing I actually did accomplish was writing a bunch of code. Some stuff I did :

	Netflix Prize - a machine learning application on a huge dataset
	Guitar tuner - developed an unusually accurate and stable pitch detector
	Image doubler - in progress work on maximum likelihood super-resolution inference
	Sports bettor - solve sparse matrices and use machine learning to automate bets
	Limit poker bot - developed a Bayesian limit bot, as well as an anti-TTH perfect simulator
	Poker site helper app - developed for potential sale, learned win32 hooking
	Short stack poker bot - developed optimal game theory solution for simplified NLHE

02-27-08 - 4

My blog has always been a totally inappropriate combination of technical and personal. I guess that's what made it somewhat interesting back in the day, though this style is totally ordinary now. Actually writing interesting personal things is becoming more and more difficult. When I started, the internet was a much smaller place. Pretty much only highly technical people were on the net then, eg. none of my family or girlfriends or their families. I could write about my personal life without it coming back to hurt the people around me. My own inclination is to be completely honest all the time, I'm not really embarrassed by anything in my life, I would rather just write about everything, but people around me wouldn't understand and would take things the wrong way, and unfortunately I have to worry a bit about people's feelings. Despite how it may seem I actually think about other people's feelings a lot; I think of tons of stuff I would love to write about, because it's interesting and amusing, but stop myself or delete it for the sake of others.

02-27-08 - 3

Some notes on bike balancing :

I wrote before about how the crucial thing is to hold the brakes and keep tension in the pedals so that you can transfer power through your legs to balance the bike. (search the archive for this). Some other little notes :

You don't want to come in too hot. Do most of your stopping before you try to go into a balance and get yourself really slow, then prep for the balance, then do it. Coming in hot to a full stop balance is a very advanced skill.

The balance is much easier if your two feet are level with each other. Once you're slowed down and getting ready to balance, get your feet in this ready position. There will be some slack in your freewheel when you come to a stop, so to compensate prepare your front foot slightly higher, then as you pull the brakes to go into the balance push forward with your front foot to take up the slack and get your feet into the level position.

It's important that your whole body is tensed but not stiff. You need to be supple, sprung, coiled with muscle but still loose. You are connected to the bike at your hands and your feet, so the energy is transferring through your entire chain, so you want elbows bent, knees bent, abs and back tight, shoulders slightly forward.

You obviously should be standing up to balance (standing and getting your body loose and flexed is part of the "ready position"), but you don't want to go too far forward. You don't want all your weight to go over the front wheel. For one thing you need weight in your legs because that's where your power is to control the balance, but also you need the front wheel to still be easy to maneuver. You may want to do some slight twitching of the front wheel, and that only works if your weight is centered back a bit.

Don't steer too much. You want to stay active and loose with the steerer, don't try to keep the front wheel perfectly straight, but at the same time, don't try to steer to balance. The balance is best achieved through lots of little movements, shifting your hips slightly, leaning your head, and steering ever so slightly. A good correction involves all these things, and not one big movement with any one of them.

Addendum : it helps a lot to look down when you balance. I like to focus on the spot where the front wheel touches the pavement.

02-27-08 - 2

Almost every single bar in America is retarded. Small tables and chairs do not belong in bars. The proper furniture for bars is picnic tables. (one long bench seat along the wall is okay too). Picnic tables accommodate big groups, and also encourage mingling if couples or a few small groups sit at them. Most other countries in the world get this right.

Also, Karaoke is fucking disgusting unless you are a Japanese business man in a suit with your coworkers. The proper way to sing in a bar is for everyone to sing folk songs together in a regular bar.

02-27-08 - 1

New phone etiquette : when you answer a cellphone and you have the caller Id so you know who it is, you should not just say "Hello?" you should say "Hi Chris" , which tells the caller you know who they are and they don't have to identify themselves.


02-26-08 - 1

Holy SHIT the Google Maps "Take Public Transit" thing is UNBELIEVABLE. My god it's like people with half a brain are actually writing software and making it do the things that it obviously should do. It seems to only have BART in the system, but I'm sure that better data coverage of other systems will be forthcoming.


02-25-08 - 4

I realize now that I've been extremely depressed for a long time. I guess it was obvious, but you don't really see it when you're in it. The thing is it comes on so gradually. One day you're happy, then life just beats you down and you make less and less effort, and day by day very gradually you give up and lose hope, and you start sleeping more and not wanting to go out, and just not looking forward to anything and thinking everything is shit and everything is pointless. But it doesn't feel like depression, you just think that's the way life is, because it came on so gradually that you can hardly remember when it was different, and the happy times in the past seem like fleeting moments in a sea of gray. Then something happens to shake it up and all of a sudden the birds are singing and the world is beautiful and you can't wait to get started on all these new projects and you're looking around for new fun things to do and trying to pack your days full of new experiences and new people - and only then do you realize that wow, do normal people actually feel like this all the time?

02-25-08 - 3

Anybody out there work for Pixar or ILM ? Any idea if that would actually be a fun job, working on tools/algorithms/rendering stuff?

02-25-08 - 2

A "Chicken Burger" is a Chicken that lives in a city. A chicken served in the style of a Hamburger should be called a "Chicken Hamburger". The 'ham' does not mean that it's made of ham, and "Hamburg" is not separable.

Addendum : Ah crap, it appears this rant was a case of Cryptomnesia, perhaps from Demetri Martin, but probably the original source is even older.

02-25-08 - 1

Whoah, I just realized why house music resonates so much with life in the city. It's always in my head when I walk around and I automatically start beat-boxing. It's because of the footsteps! If you walk fast like a city person, you're walking left-right-left-right , dum-dum-dum-dum , about 120 bpm. The beat of your steps automatically sinks into your brain and sets up a rhythm base with each step and you walk quick bum-bum-bum-bum , and then your brain automatically starts adding the off beats, bum-chik-bum-chik-bum-chik-bum-chik .

ps. no I was not on drugs when I wrote this. I wish I was.


02-22-08 - 2

Yes, yes, I really am looking for a full time job ; I keep telling people this but they don't believe me. Yes I really want to go to work and write code. I realized a little while ago that I was basically working full time writing code anyway, so I may as well get paid for it. Plus I miss having other people to work with.

No, head-hunters, that does not mean you should send me all the shit jobs you're trying to staff. I'm still not looking for anything awful. Game company jobs in general don't appeal to me that much; the management is too shitty, I don't want to get involved in another badly run disaster where I have to work my butt off and don't make any money. I want to ideally work on interesting things where I can write fun code, either at a big safe corporation or a tiny tiny place with friends. I don't want a management job or anything with a lot of friction and non-technical headaches. I'd love to be able to just write interesting code and work with smart people.

I'm gonna be talking to Google and NVidia ; if you have any other good suggestions that fit the bill let me know.

02-22-08 - 1

Bleck I'm being such a retard about working out and the worst thing is I know it. I'm lifting pretty hard, but I'm just not eating enough, and I can't add muscle unless I eat more. My bodyfat% is already super low and it's just not possible to put on more muscle at my bodyweight. I really want some more shoulder muscle to protect my injuries but it's not gonna happen unless I add 500 cals a day or so. The problem is my stomach tells me to stop eating and I listen to it; partly it's all the years of trying to eat right and stay lean, it goes against all my instincts to force myself to overeat. I mean, I have no problem overeating once in a while for special occasions, but that isn't what I need, I need to slightly overeat every day, and it's really hard. In the mean time all this working out is pretty much completely pointless because I'm not providing the fuel to create new muscle.


02-20-08 - 2

It's annoying that one of the most common things you want to do with floats is also one of the least precise : summing or taking an average of an array. Say you have a bunch of values that are all similar, and you want the average and sdev. To get a good sdev you need a very accurate average. If you have say a million or more numbers, each one becomes tiny compared to the sum, and if you just do a normal sequential sum you make total shit. Of course if your numbers are floats you can do your sum in doubles and mostly not have a problem, but if you really need fine accuracy you need a better solution.

Of course the solution is both easy and obvious. In the common/standard case that the numbers are all very close to the average, a simple recursively dividing sum works perfectly :

template < typename _t >
_t recursive_sum(const _t * begin,const _t * end)
{
	int count = (int)(end - begin);
	switch(count)
	{
	case 0: return _t(0);
	case 1: return begin[0];
	case 2: return begin[0] + begin[1];
	case 3: return begin[0] + begin[1] + begin[2];
	case 4: return (begin[0] + begin[1]) + (begin[2] + begin[3]);
	default:
	{
		// split in half so both partial sums have similar magnitude
		const _t * mid = begin + (count/2);
		return recursive_sum(begin,mid) + recursive_sum(mid,end);
	}
	}
}

ADDENDUM : Okay this is a pretty dumb way to do it. It actually works fine and is plenty fast but the better way is this Kahan Summation thing. BTW the PDF linked at the bottom of that page is an awesome awesome doc on floating point.

(UPDATED 3-04-08) : Here's a version that roughly matches the signature of std::accumulate ; the type inference from accum is a little bit of a subtle thing that can bite you in code if you're not careful. BTW I flipped the sign compared to Wikipedia cuz it just makes it really obvious what's happening IMO.

#include <iterator> // std::iterator_traits

template <typename Iter, typename T>
static inline T kahan_accumulate(Iter begin, Iter end, T accum)
{
	T err = 0; // low-order bits lost by the previous add
	for(;begin != end;++begin)
	{
		T compensated = *begin + err;
		T tmp = accum + compensated;
		err = accum - tmp;
		err += compensated;
		accum = tmp;
	}
	return accum;
}

template <typename Iter>
static inline typename std::iterator_traits<Iter>::value_type
kahan_accumulate(Iter begin, Iter end)
{
	return kahan_accumulate(begin, end,
                         typename std::iterator_traits<Iter>::value_type());
}

There are a bunch of ACM papers on this stuff; if you're serious you should go there. For our rough graphics kind of use this Kahan thing is going to be fine. The "Cascading Accumulators" method seems to be the awesome way to go but it's a bit complex to implement. Apparently Kahan also has less error than the recursive way, though in practice they're both tiny.
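A rough way to see the problem for yourself, as a sketch rather than a benchmark: sum the same float array three ways and compare against a double accumulator. The names here (naive_sum, kahan_sum, reference_sum, relative_error) are made up for illustration, and kahan_sum just repeats the same compensated loop as kahan_accumulate, specialized to float.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Plain left-to-right sum in the same precision as the data.
float naive_sum(const std::vector<float> & v)
{
	float sum = 0.0f;
	for (float x : v) sum += x;
	return sum;
}

// Kahan-compensated sum, same sign convention as kahan_accumulate.
float kahan_sum(const std::vector<float> & v)
{
	float sum = 0.0f;
	float err = 0.0f; // low-order bits lost by the previous add
	for (float x : v)
	{
		float compensated = x + err;
		float tmp = sum + compensated;
		err = (sum - tmp) + compensated;
		sum = tmp;
	}
	return sum;
}

// Reference value: same data, accumulated in double.
double reference_sum(const std::vector<float> & v)
{
	double sum = 0.0;
	for (float x : v) sum += x;
	return sum;
}

// Relative error of an approximate sum against the reference.
double relative_error(double approx, double reference)
{
	assert(reference != 0.0);
	return std::fabs(approx - reference) / std::fabs(reference);
}
```

Feeding these a vector of ten million copies of 0.1f is a reasonable test: the naive float sum drifts visibly, while the compensated sum should stay within about a float rounding step of the double reference. Note this assumes the compiler is not allowed to reassociate float math (no fast-math style flags), otherwise the compensation can get optimized away.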

02-20-08 - 1

People at the GDC ask me what I'm up to these days. I tell them I'm learning to do the Cabbage Patch, from this amazing teacher .


02-18-08 - 1

I discovered there are all these amazing videos of Foundries on YouTube made by amateurs, many of them made in the last days of US and UK steel production. You can start here : Youtube Iron Pour and then browse to similar videos and find tons of amazing stuff.

I also found this performance art group : Iron Guild which looks totally awesome, and some stuff one of them made : Iron Art


02-17-08 - 3

I found a bunch of craigslist ads for free massages for "fit men". Hmm.. I'd like a free massage, I wonder if they'll agree to use latex gloves.

So I guess I should play soccer. I've been looking for some kind of fun team sport, and there are just tons of pickup soccer leagues, and that's a relatively low-injury sport aside from twisted ankles and such.

02-17-08 - 2

Poker and coding both have this really sucky nonlinear return property. I guess physics did too and probably most really complicated mental endeavors. Basically if you are thinking about it 100% of the time, then you are in the zone and your efficiency per unit time is E. So like if you work for 8 hours you accomplish 8*E. If you try to do other things and balance your life more, you don't just spend less time at E, you spend less time at way way below E. The problem is it takes a huge amount of time to get back in the zone when you're out of it, to remember all those many connected things that you can't really write down and all the hunches and intuitive senses of the problem that you had in the zone. If you spend 6 hours, maybe you lose 3 to getting back in the zone and then only have 3 hours at full efficiency. It makes me feel like there's no point to doing it unless you go all out.

Coding is bad this way (and worse in a big complex code base or a really hard problem), but Poker is worse. One of the things that makes poker so bad for this is the lack of good feedback due to the high variance, which makes it very tricky to get yourself back in the zone because you'll get both false positive feedback and false negative feedback. As you work back into the zone you may have multiple false plateaus where you think you've made it, only to realize later that you were wrong. It literally takes weeks to get back in the zone, which makes dipping in and out of play pretty impractical.

02-17-08 - 1

I can't figure out a trick to remember how to spell "guarantee". I always want to write "garauntee" which IMHO makes much more sense and you can remember it because it has "aunt" in it making the same sound.


02-16-08 - 1

XLR8R has a shit ton of good free music to download. Stop playing those god awful party mixes you make yourself that have no flow and jump from Conchords to Britney Spears to Gnarls Barkley. Download these pre-mixed podcasts and become a party music super hero.


02-13-08 - 3

It's so hard to find good people to play board games with. They can't be too geeky, you have to be able to talk about things outside of games, cuz the actual game isn't that fun it's just a venue for socializing. On the other hand, they have to be smart enough to play well or it just feels pointless. They also have to be smart enough and sharp enough to move really quickly. Most board games are just excruciatingly boring if people take too long, or aren't even paying attention so they don't realize it's their turn. Sometimes I wind up turning into the "Dealer", going "okay it's your turn now, okay that's what you do? you're done? okay next person", which is not really a good spot to be in, it makes you an asshole and means you can't chitchat at all yourself, but god damn people when it's your turn you just FUCKING GO, and then you can go back to chit-chatting when you've made your move. Most board games are super high variance and also quite shallow. They're much more fun if you treat them as a quick semi-random match, and play over and over quickly.

02-13-08 - 2

I put a bookcase in my kitchen out of necessity and have all my pantry goods on it; at first I thought it was pretty ghetto but now I realize it's TOTALLY AWESOME. Everything is easy to grab and you can see what you have. It's even better than those restaurant style racks cuz those racks are too big, the nice little cubbies of the bookshelf are perfect. It made me realize the whole obsession with "kitchen cabinets" is totally retarded; cabinets suck, it's way more functional to just have no doors and have lots of shelves, and IMO it looks much better too if you have a cool functional kitchen like that. The kitchen cabinet obsession is like the 50's gauche bourgeois manicured lawns and plastic on the sofa.

02-13-08 - 1

Watching the Nature episode "Crash" about the Red Knot seabird that lost 90% of its population in almost one generation due to human disruption of their food source. It reminds me of something I wrote in "Fitness" about how the human anatomy has evolved to not build unnecessary muscle. You see with this occurrence of the Red Knot how catastrophic events can cause massive rapid genetic selection. Up until the 1970's or so, everyone thought of genetics as gradually evolving over the aeons, lots of tiny changes adding up. We now know (and it seems quite obvious in hindsight) that in fact the overall genetic makeup of the population makes very rapid and massive changes in response to cataclysmic events. Lots of minor differences evolved into the population over the years, and they weren't strongly selected. Then suddenly something happens and every individual without a certain gene is dead, either from famine, or a disease, or a change in a food source or a predator. With humans, there have been countless famines (and a recent ice age and plague) which have wiped out the individuals that built muscle more easily and required more food to survive.


02-12-08 - 1

Hi-Def San Francisco has some amazing videos; check out the "Best Of" section to watch the fog roll across the city.

Refocus Imaging and Helicon Soft are two different ways to do 3d-photography where you can set the focal plane (or arbitrary focus surface) after the fact.


02-09-08 - 3

Writing the date in my rant entries is just about my only connection to the calendar.

02-09-08 - 2

When skiing I thought of this invention. Ski Goggles should have a nose cover. Not a sealed nose cover but just a piece of plastic or maybe cloth attached over the nose. It would protect the nose from cold wind and also from the sun, since the nose is the most likely thing to get sun burned. I drew a picture . On the right you see normal goggles, on the left the deluxe goggles with Nose Guard. Patent Pending!

It's such an obvious and seemingly awesome thing to have there must be some reason why it's not a good idea or it would've already been done.

02-09-08 - 1

I bought a Samson C01U USB microphone. I have to say I'm quite disappointed and would not recommend it. Most of the reviews I read were very positive, and in theory you get better quality and save money by buying a USB microphone so you don't have to have a separate A2D and Mic PreAmp and all that stuff. The problem is it's really really quiet, it's made for you to put your lips almost right on the mic. It works fine when you're that close, but anything further is inaudible, and if you pump up the volume after recording it's just noise. Like even one foot away and the sound goes to shit. God damn it.

Sweet page on how mics work


02-07-08 - 1

Went skiing the last few days. WTF is up with Truckee? First of all, TRAFFIC CIRCLES !? I actually think traffic circles are generally superb and good for traffic, but Americans are retarded around them, and to have a tourist town that's a winter destination with icy roads covered with traffic circles is so insane. Second, when did the downtown area get all fancy? I remember Truckee as a total trashy mountain town but now there's this strip of Aveda spas and all that kind of yuppie shit. Bah humbug. The skiing was mostly pretty great.

This image doubling thing I'm thinking about is mostly called "Super Resolution" in the literature (all 2 papers). I hope that name doesn't stick. There is a rather separate thing which is also called "Super Resolution", which is creating hi res images from crappy video sequences. It seems to be further along in development. Sina Farsiu has good papers on the video kind of super resolution. There's even a free research application called MDSP that does video super-resolution.

It's funny that this kind of video resolution enhancement was being done in the movies way before it was actually being done on computers in real life. I mean, it's no surprise that movies made up semi-sciency mumbojumbo for plot purposes, what's funny is they got it almost exactly right, and I'm sure most computer people at the time knew it was a totally reasonable thing to do, it just hadn't really been fully worked out yet. Unfortunately for all the replicant hunters out there, we still haven't perfected the zooming around the corner technology.


02-04-08 - 4

I updated the guitar tuner app with slightly better harmonic/fundamental tracking. It's still not ideal, that's sort of a messy heuristic problem, I could do better than I am doing but whatever it's not hurting the app much so there it is. There are a lot of advantages to all the noise tolerance work; I can tune my guitar while cars are driving by outside, which totally freaks out my handheld crystal tuner thingy; also I can just use the super shitty mic that's built in to my laptop and actually tune pretty well. One ugly thing is that the guitar's low E is very close to the kill noise frequency threshold I'm using (it's 82 Hz and I kill everything below 77 Hz) which can cause some inaccuracy on the low E if you aren't careful, cuz part of the spectrum tail gets cut off.

For my own reference in case I come back to it : there are a few issues with the whole fundamental frequency thing. I mainly modulate out octaves, so F - 2F - 4F harmonics aren't a problem, it gets really screwy when you switch from 2F to 3F , and actually even 5F can be the highest peak. If you get a spectrum where the 3F peak is biggest, but there are still solid 1F and 2F peaks, that's pretty easy to detect. Another case is sometimes the 1F peak is completely missing, but if there's a 2F peak you can still deal with that by looking at 2/3 or by seeing that the spacing between peaks is F. The really really evil case comes from time evolution. Sometimes you can strike a note and it starts up with strong peaks at 1F,2F, and 3F. Over time each of their amplitudes changes and you can get cases where the 1F and 2F peaks almost completely disappear and you're left with only a 3F peak. To handle that correctly you have to use temporal continuity and just assume that that sound is still acting like a 1F pitch (sometimes the 1F and 2F peaks come back and the 3F peak dies out as time keeps evolving). The spacing between the peaks in the frequency domain is a pretty good way to get the base pitch (you imagine an additional peak at zero frequency).

I hate relying too much on temporal continuity because it means that short-time errors get persisted. It's much cooler if I can do everything without relying on the previous frame. You do get a tiny short-time error from Fourier analysis of transients, but that's negligible. The real problem comes from real world short-time sounds. With tuning a guitar, when you first strike a string there are lots of funny sounds from your finger rubbing the string and perhaps the string slapping the body and all that stuff that only lasts a second or so. If you start building in too much continuity, you could pick up weird pitches in that mess and then try to persist them. Really once the clean note starts sounding you want all that junk to be forgotten.
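For reference, the peak-spacing idea can be sketched roughly like this. This is not the tuner's actual code, just a minimal illustration: the function name and the "smallest gap, then refit" approach are my own assumptions, and it takes for granted that peak detection has already produced a list of positive peak frequencies in Hz.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper, not the tuner's real implementation.
// peaks_hz: positive frequencies of the detected spectral peaks, any order.
// Returns an estimate of the fundamental F, assuming every peak sits near
// an integer multiple of F.
double fundamental_from_peak_spacing(std::vector<double> peaks_hz)
{
	assert(!peaks_hz.empty());
	std::sort(peaks_hz.begin(), peaks_hz.end());

	// Imagine an extra peak at zero frequency, then take the smallest gap
	// between neighbouring peaks as a first guess at F.
	double prev = 0.0;
	double guess = peaks_hz.back();
	for (double p : peaks_hz)
	{
		double gap = p - prev;
		if (gap > 0.0) guess = std::min(guess, gap);
		prev = p;
	}

	// Refine: give each peak its nearest harmonic number under that guess
	// and fit a single F to all the peaks together.
	double sum_freq = 0.0;
	double sum_harmonic = 0.0;
	for (double p : peaks_hz)
	{
		sum_freq += p;
		sum_harmonic += std::round(p / guess);
	}
	return sum_freq / sum_harmonic;
}
```

With only 2F and 3F peaks present the gap between them is F, so the missing 1F case comes out right. With a lone 3F peak the zero-frequency trick just returns 3F, which is exactly the evil case that needs temporal continuity; a single spurious peak close to a real one would also fool the smallest-gap guess.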

02-04-08 - 3

My next project is to get back on the Image Doubler and see if I can actually make the predictive/learning doubler do something worthwhile. I went looking for a big repository of hi-res research/reference images a while ago and couldn't find a damn thing that was decent, it's all super low res or super small collections, like 16 pictures or less. Yesterday I had a "duh" moment. Torrents! Go to the torrent sites and filter for pictures. Of course there's a lot of stupid pictures of ugly girls, but there's also awesome stuff like a package of 800 photos of nature at 1920x1200. Each pixel in each color plane is a training sample, so that's 5.5 billion training samples right there which should hold me for a while.
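The sample count is just images times pixels times color planes; a quick sanity check, assuming 3 planes (RGB) per image, with a made-up helper name:

```cpp
#include <cstdint>

// One training sample per pixel per color plane.
constexpr std::uint64_t training_samples(std::uint64_t images,
                                         std::uint64_t width,
                                         std::uint64_t height,
                                         std::uint64_t planes)
{
	return images * width * height * planes;
}

// 800 photos at 1920x1200, 3 color planes assumed.
static_assert(training_samples(800, 1920, 1200, 3) == 5529600000ULL,
              "roughly 5.5 billion samples");
```

Note the count overflows 32 bits, so anything indexing these samples wants a 64-bit type.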

Ideally I'd get the uncompressed so I don't have spurious JPEG artifacts in my images gunking things up, but it's hella hard to find a good uncompressed image data set.

Ideally I would like an image training set which statistically exactly mirrored the overall statistics of all digital images in existence (weighted by the probability of a user interacting with that image). That is, if 32% of the neighborhoods in all the images in the universe were "smooth" , then in my ideal training set 32% of neighborhoods would be smooth. The average entropy under various predictors would be the same, etc. Basically it would be an expectation-equivalent characteristic sample. Some poor graduate student needs to make that for all of us.

02-04-08 - 2

I've got two computer problems maybe someone can help me with.

1. "Shortcut Keys" in Windows. I'm talking about the thing where from an app's right-click Properties dialog you can set a "Shortcut Key" and set a key combo. I use this a lot as a way to key-chord to apps. It works great except once in a while it stalls out like crazy. You hit the key combo and nothing happens, and in fact the whole taskbar becomes nonresponsive; all the other apps still respond fine, and the process monitor doesn't show any CPU spike. A minute later suddenly windows goes ahead and executes the shortcut key. Note that just switching to a macro program doesn't fix it, because the shortcut key thing is smart enough to switch you to an existing instance rather than start a new one.

2. Getting files from Perforce that don't exist. This is some problem with the VC-Perforce integration. As usual I can't get any help from either of them. This happens when you have a VC project which is under source control, but some file in the project does not exist either on your disk or in the depot. You start up VC and load the project, and VC goes into this mode "Getting files from source control" where it tries to get the file from the depot. If the file is in the depot it gets it fine, but if it's not in the depot, VC appears to hang. I've never let it sit long enough to get past this, but I've watched the disk activity while it's hung in this mode and it appears that VC is doing a recursive scan of the entire depot and touching every single file; I have no idea why it does this and I can't stop it. It's pretty fucking annoying. There is a workaround which is to open the vcproj and figure out what file is missing and make an empty file with that name or remove it from the project, but that's a pain and I'd like to just not have this stupid hang.

02-04-08 - 1

LOL internetaments. For some reason I keep trying to spell "schizophrenic" like "pschyzophrenic". I was writing it today and it looked wrong so I typed it into Google to spell check as I often do. Google tells me the right spelling, but I also notice a few results down is a link to my fucking rants where I used the wrong spelling. If I keep spelling things wrong I can be the definitive site for words like "seperate" and "beaurocracy".

In the future everyone will have a completely unique name so that they're easily searchable.


02-02-08 - 3

If recommender systems like Netflix were augmented with a Network of Trust, one thing you would want is to remember where you got the recommendation from. That way when I watch a movie that was highly recommended for me, and I hate it, I can go back and see the link that recommended it for me and mark it as "don't trust" (for movies). In theory a simple collaborative filtering system will eventually learn your similarity with everyone else in the system, but in practice you have to have a very large # of movies in common before that becomes accurate, far more than normal people watch. If you allow the user to provide extra information you can converge on their tastes much faster.
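A minimal sketch of the bookkeeping this would need, assuming one remembered source per recommended movie and a simple per-source trust weight. Everything here (TrustNetwork, record, trust, dont_trust_source_of) is a hypothetical name for illustration, not any real recommender API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch: remembers which user each recommendation came from,
// so a bad recommendation can be traced back and its source distrusted.
class TrustNetwork
{
public:
	// Record that `source` recommended `movie` to me.
	void record(const std::string & movie, const std::string & source)
	{
		assert(!source.empty());
		source_of_[movie] = source;
	}

	// Weight applied to this source's recommendations; sources that were
	// never marked are fully trusted (1.0) by default.
	double trust(const std::string & source) const
	{
		std::map<std::string, double>::const_iterator it = trust_.find(source);
		return (it == trust_.end()) ? 1.0 : it->second;
	}

	// I watched `movie` and hated it: mark whoever recommended it as
	// "don't trust". Returns false if no source was recorded for it.
	bool dont_trust_source_of(const std::string & movie)
	{
		std::map<std::string, std::string>::const_iterator it = source_of_.find(movie);
		if (it == source_of_.end()) return false;
		trust_[it->second] = 0.0;
		return true;
	}

private:
	std::map<std::string, std::string> source_of_; // movie -> recommender
	std::map<std::string, double> trust_;          // recommender -> weight
};
```

A real system would presumably keep trust per category (movies vs. books) and scale weights down gradually instead of zeroing them, but the point is only that the source link has to be stored at recommendation time or there is nothing to go back to.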

02-02-08 - 2

Musharraf has certainly done a lot of things we should be unhappy about, but we have to remember he's in a very difficult situation. He has to contend with four very separate and powerful forces in Pakistan. 1) The middle class and the lawyers, who want democracy, rule of law, and stability; they would mostly vote against him if there was a good alternative. 2) The devout Muslims and the tribes, who want Sharia law and independence from the government; this faction could easily become very violent if upset, and is dangerously close to a majority, which means if there were true open elections they might win. 3) The military. This is Musharraf's base (remember he became president in a coup and his power is still backed by the military) - but the military is quite independent, and portions have strong ties to the tribes or the ISI; if anyone tried to curtail the power of the military they could face a coup. 4) The ISI (the intelligence service), which has strong ties with the tribes and the Taliban, and is very independent and certainly responsible for many political assassinations in Pakistan; again moving against them could easily lead to disaster.

The US official policy is that we want real democracy in Pakistan, but behind closed doors the CIA and State aren't so sure about that. An election could easily destabilize Pakistan if someone is elected that is disliked or moves too fast against the Islamic extremists, the military or the ISI. On the other hand if anyone comes to power that works too closely with any of those factions that could also be bad. Musharraf is doing a half decent job at the moment of keeping all the factions reasonably pacified.

Much like Iraq, Lebanon, and Palestine, pushing for elections too soon could be a disaster. First you need some level of stability and rule of law, protection against assassination, a fair election system, the confidence of the populace so that no big groups boycott the vote, etc.

02-02-08 - 1

There's something about Jamie Oliver that I really hate; maybe it's his big pouty lips, or maybe all his cute little Britishisms, or the way he says "yeah" all the time as punctuation. Anyway, his show "Jamie at Home" is probably the best cooking show on TV at the moment. (apparently Jacques Pépin has some new shows but the PBS here doesn't carry them). Jamie's sous-chef Gennaro is absolutely amazing, such a weird character.


02-01-08 - 3

The fully Bayesian approach to modeling says you should consider all possible models, and weight each model by the probability that it is the right one.

You've seen past data D. You know for a given model the chance that it would produce data D is P(D|M) , so the probability of M being right is P(M|D) = P(D|M)*P(M)/P(D)

Now you want to make some new prediction given your full model. Your full model is actually an ensemble of simple models all weighted by their likelihood. The probability of some new event X based on past observations D is thus P(X|D) = Sum on M { P(X|M) * P(M|D) }

P(X|D) = Sum on M { P(X|M) * P(D|M) * P(M) } / P(D)
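
To make that concrete, here's a minimal sketch with a toy setup that I'm assuming purely for illustration: the "models" are five candidate coins with different heads probabilities, the prior is flat, and D is a short run of flips (1 = heads). The function names are made up.

```python
def posterior(models, prior, data):
    """P(M|D) = P(D|M) * P(M) / P(D) for each candidate heads-probability in `models`."""
    heads = sum(data)
    tails = len(data) - heads
    # P(D|M) * P(M) for each model
    unnorm = [(p ** heads) * ((1 - p) ** tails) * pm for p, pm in zip(models, prior)]
    p_data = sum(unnorm)  # P(D), the normalizer
    return [u / p_data for u in unnorm]

def predict_heads(models, post):
    """P(X|D) = Sum on M { P(X|M) * P(M|D) }, where X is 'next flip is heads'."""
    return sum(p * w for p, w in zip(models, post))

models = [0.1, 0.3, 0.5, 0.7, 0.9]  # each model is a coin with this P(heads)
prior = [0.2] * 5                   # flat P(M)
data = [1, 1, 0, 1]                 # three heads, one tail

post = posterior(models, prior, data)
p_heads = predict_heads(models, post)
```

The 0.7 coin gets the biggest posterior weight, but the ensemble prediction lands a bit below 0.7 because the 0.5 coin still carries real weight after only four flips.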

Note that the P(M) term here is actually important and often left out by people like me who do this all heuristically. It's a normalizing term in your integration over all models, and it's pretty important to get right depending on how you formulate your model. Basically it serves to make the discretization of model space into cells of equal importance so that you aren't artificially over-weighting models that are multiple covers. You can also use P(M) to control what models are preferred; for example you might make smoother or simpler models more likely. eg. if you're modeling with a sine function you might make lower frequency waves more likely, so you make P(M) like e^(- freq^2) or something. This makes your model less prone to overfit noisy data. This whole step is referred to as "estimating the priors" ; that is the a-priori probabilities of the parameters of the model.
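
Here's a rough sketch of the sine example, with everything beyond the e^(- freq^2) prior being my own assumption for illustration: the data is sampled at integer x, so a sine at freq and one at freq + 2*pi fit it identically, and the likelihood (an assumed Gaussian noise model with sigma = 0.1) can't tell them apart. The prior is what breaks the tie toward the smoother wave.

```python
import math

def smoothness_prior(freqs):
    """P(M) proportional to e^(-freq^2), normalized over the candidate frequencies."""
    w = [math.exp(-f * f) for f in freqs]
    total = sum(w)
    return [x / total for x in w]

def likelihood(freq, xs, ys, sigma=0.1):
    """P(D|M) up to a constant, assuming Gaussian noise around sin(freq * x)."""
    sq_err = sum((y - math.sin(freq * x)) ** 2 for x, y in zip(xs, ys))
    return math.exp(-sq_err / (2 * sigma ** 2))

xs = [0, 1, 2, 3, 4, 5]
ys = [math.sin(1.0 * x) for x in xs]  # data from a freq = 1 wave
freqs = [1.0, 1.0 + 2 * math.pi]      # two models that agree at every integer x

like = [likelihood(f, xs, ys) for f in freqs]
prior = smoothness_prior(freqs)
unnorm = [l * p for l, p in zip(like, prior)]
post = [u / sum(unnorm) for u in unnorm]
```

With a flat prior the two waves would split the posterior 50/50; with the smoothness prior essentially all the weight goes to the low frequency one.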

A common shortcut is just to use the most likely model ; this is aka "maximum likelihood" (strictly speaking, picking the highest P(M|D) is "maximum a posteriori"; it's the same thing as maximum likelihood when P(M) is flat). This just means you pick the one model with the highest P(M|D) and use that one to make predictions. This is potentially giving away a lot of accuracy. People often do a lot of hard work and math to find the model that is maximum likelihood, but we should remember that's far less desirable than weighting over the whole ensemble.

In simple situations you can often write down P(D|M) analytically, and then actually explicitly integrate over all possible models. It's common to use models with Gaussian probability spreads because they are normalizable and integrable (over the infinite domain), eg P(D|M) of the form e^(-(D-M)^2).

An alternative to Maximum Likelihood is to use just a few different models and estimate the probability of each one. Again this is sort of a strange discretization of the model space but works well in practice if the chosen models span the state space of reasonable models well. This is usually called weighting "Experts" , also called "Aggregating" ; in Machine Learning they do a sort of related thing and call it "Boosting". There are a lot of papers on exactly how to weight your experts, but in practice there's a lot of heuristics involved in that, because there are factors like how "local" you want your weighting to be (should all prior data weight equally, or should more recent data matter more?).

In all the weighting stuff you'll often see the "log loss". This is a good way of measuring modeling error, and it's just the compression loss. If you think of each of the models as generating a probability for data compression purposes, the log loss is the # of bits needed to compress the actual data using that model as opposed to the best model. You're trying to minimize the log loss, which is to say that you're trying to find the best compressor. Working with log loss (aka compression inefficiency) is nice because it's additive and you don't have to worry about normalization; eg. after each new data event you can update the log loss of each model by just adding on the error from the new data event.
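
A small sketch of the additive update, assuming two made-up models that each assign a fixed probability to the next binary symbol being 1. This tracks the raw cost in bits per model; subtract the best model's total if you want the loss relative to the best one.

```python
import math

def update_log_loss(losses, probs_for_actual):
    """Add -log2(p) bits to each model's running loss, where p is the
    probability that model assigned to the symbol that actually occurred."""
    return [loss - math.log2(p) for loss, p in zip(losses, probs_for_actual)]

stream = [1, 1, 0, 1]   # the data events, one at a time
p_one = [0.5, 0.75]     # each hypothetical model's P(symbol == 1)
losses = [0.0, 0.0]     # running cost in bits, one per model

for sym in stream:
    probs = [p if sym == 1 else 1 - p for p in p_one]
    losses = update_log_loss(losses, probs)
```

The 0.5 model costs exactly one bit per symbol. The 0.75 model pays 2 bits for the lone zero but comes out ahead overall because the ones are cheap.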

As I said there are various weighting schemes depending on your application, but a general and sort of okay one to start with is the previous probability weight. That is, each model M is weighted by the probability that that model would've generated the previously seen data D, eg. P(D|M). If you're working with a loss value on each expert, this is the exponential loss, that is e^(-loss). Obviously divide by the sum of all weights to normalize. Roughly this means you can use simple weights like e to the minus (# of errors) or e to the minus (distance squared).
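
A minimal sketch of that weighting, assuming each expert carries an accumulated loss measured in nats (so e^(-loss) is literally P(D|M) when the loss is a natural-log loss). The function names and the example numbers are made up.

```python
import math

def expert_weights(losses):
    """weight_i = e^(-loss_i), divided by the sum of all weights."""
    best = min(losses)
    # Subtracting the smallest loss first doesn't change the normalized
    # weights, it just keeps exp() from underflowing on big losses.
    raw = [math.exp(-(loss - best)) for loss in losses]
    total = sum(raw)
    return [r / total for r in raw]

def mix(predictions, weights):
    """Blend the experts' predictions using the normalized weights."""
    return sum(p * w for p, w in zip(predictions, weights))

weights = expert_weights([2.0, 3.0, 10.0])  # three experts' losses so far
blended = mix([0.9, 0.5, 0.1], weights)     # their predictions for the next event
```

An expert that is one unit of loss worse gets a factor of e less weight, and the expert at loss 10 is effectively ignored.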

My favorite tutorial is : Bayesian Methods for Machine Learning from Radford Neal

If you like to come at it from a data compression viewpoint like me, the canonical reference is : Weighting Techniques in Data Compression , Paul Volf's thesis. This work really shook up the data compression world back in 2002, but you can also look at it from a general modeling point of view and I think it's an interesting analysis in general of weighting vs. selecting vs. "switching" models. Volf proved that weighting even just two models together can be a huge win. When choosing your two models you don't really want the best two; you want two good models that don't contain any inherent inefficiencies, but you want them to be as different as possible. That is you want your two models to sort of be a decent span of "model space", whereas the single best model might be in the middle of model space, you don't want that guy, you want two guys that average to the middle.

A Tutorial on Learning With Bayesian Networks by David Heckerman is more mathematical and thorough but still half way readable.

More modern research mainly focuses on not just weighting the experts, but learning the topology of experts. That is, optimizing the number of experts and what parts of the past data each relies on and what their models are, etc. A common approach is to use a "Simple Bayesian Estimator" as a building block (that's just a simple model that generates random numbers around some mean, like a Gaussian), and then figure out a topology for combining these simple bayes guys using product rule of probabilities and so on.

Here's sort of a cool example that game-tech graphics people should be able to relate to : Bayesian Piecewise Linear Regression Using Multivariate Linear Splines . The details are pretty complex, but basically they have a simple model of a bunch of data which is just a piecewise linear (planar) fit, which is just C0 (value continuous but not derivative continuous). They form an ensemble model in a different way using a monte carlo approach. Rather than trying to weight all possible models, they draw models randomly using the correct probability weighting of the models for the data set. Then you just average the results of the drawn models. The resulting ensemble prediction is much better than the best found linear model, and is roughly C1.
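
The drawing idea on its own can be sketched like this. To be clear this is not the paper's method (they sample spline models, which takes a lot more machinery); it's a toy where I assume three line models y = slope * x whose posterior weights are already known, just to show that averaging randomly drawn models converges to the weighted sum.

```python
import random

def monte_carlo_predict(models, post, predict, n_draws=20000, seed=0):
    """Draw models in proportion to their posterior weight and average
    the predictions of the drawn models."""
    rng = random.Random(seed)
    drawn = rng.choices(models, weights=post, k=n_draws)
    return sum(predict(m) for m in drawn) / n_draws

slopes = [0.5, 1.0, 1.5]  # three hypothetical line models y = slope * x
post = [0.2, 0.5, 0.3]    # assumed P(M|D) for each
x = 2.0

approx = monte_carlo_predict(slopes, post, lambda s: s * x)
exact = sum(w * s * x for s, w in zip(slopes, post))  # the full weighted sum
```

With only three models the exact sum is trivial, obviously; drawing is what you fall back on when the model space is too big to enumerate.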

This last paper shows a general thing about these Bayesian ensembles - even when the individual models are really shitty, the ensemble can be very good. In the linear regression models, each of the linear splines is not even close to optimal, they don't search for the best possible planes, they're just randomly picked, but then they're combined and weighted and the results are very good. This was seen with AdaBoost for classification too. With boosting you can take super shitty classifiers like just a single plane classifier (everything on the front is class A, everything on back is class B), but if you average over an ensemble of these planes you can get great fits to all kinds of shapes of data.
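
Here's a sketch of that effect using the textbook AdaBoost update on a toy 1-D problem that I made up: class +1 is an interval in the middle, which no single threshold ("plane" in 1-D) can carve out. The helper names and the data are assumptions for illustration only.

```python
import math

def stump(threshold, sign):
    # 1-D "plane" classifier: predicts sign to the right of the threshold, -sign to the left.
    return lambda x: sign if x > threshold else -sign

def adaboost(xs, ys, candidates, rounds):
    n = len(xs)
    w = [1.0 / n] * n  # per-sample weights, start uniform
    ensemble = []      # list of (alpha, classifier)
    for _ in range(rounds):
        def weighted_error(h):
            return sum(wi for wi, x, y in zip(w, xs, ys) if h(x) != y)
        best = min(candidates, key=weighted_error)
        err = max(weighted_error(best), 1e-12)
        if err >= 0.5:
            break  # nothing better than chance is left
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, best))
        # Bump the weight of the samples this stump got wrong, then renormalize.
        w = [wi * math.exp(-alpha * y * best(x)) for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def classify(ensemble, x):
    return 1 if sum(a * h(x) for a, h in ensemble) > 0 else -1

xs = list(range(10))
ys = [-1, -1, -1, 1, 1, 1, 1, -1, -1, -1]  # +1 only on the middle interval
candidates = [stump(t + 0.5, s) for t in range(-1, 10) for s in (1, -1)]

best_single = max(sum(h(x) == y for x, y in zip(xs, ys)) for h in candidates)
ensemble = adaboost(xs, ys, candidates, rounds=3)
boosted_correct = sum(classify(ensemble, x) == y for x, y in zip(xs, ys))
```

The best single stump gets 7 of the 10 points right; three weighted stumps together get all 10.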

I wrote about a way to do simple weighting back on 12-16-06 using very simple variance weighting.

02-01-08 - 2

I'm coining the use of the word "sitcom" as an adjective. Sitcom means basically "formulaic and an exaggeration of a stereotype, similar to the lowbrow repetitive comedy on bad sitcoms". An example would be like if your wife is mad at you for leaving the toilet seat up, you could say "god honey that is so sitcom". Another would be like if a girl goes black and she never goes back, that would be sitcom. It can also refer to just banal typical daily living stuff, like "yeah we went over to Glen and Margie's house, we talked about their kitchen renovation, it was totally sitcom".

02-01-08 - 1

What should our next president actually do?

1. Get rid of the G.W.B. tax cuts. Politically it would be best to leave the tax cuts for everyone making $100k or less and phase them out above that. Restore the estate tax and dividend taxes. Our government desperately needs money to fix this country and putting back those taxes on the rich will hurt the economy the least. It's a damn shame those tax cuts ever passed.

2. Figure out a way to get out of Iraq and restore our military's morale and fighting capability. This is a mess so I won't go into big details.

3. Get more troops in Afghanistan. Hopefully get an administration with more leverage internationally that can get some more support from NATO. Then we have to do something about Pakistan which is a huge problem. Getting the Pakistani government to really go after the tribal areas is impossible, but it might be possible to get more of the Pakistani military on the border to try to reduce the amount of border crossings. Make it clear to the tribal leaders that they are not being invaded and they are free in their domains but they aren't to cross the border. That certainly won't work without a much bigger force on the Afghan side of the border.

4. Actually do something to improve education in the US. "No Child Left Behind" is such a retarded worthless law, it creates standards and holds schools to them, but A. the tests are horrible measures of learning and B. it doesn't provide funding. We need a lot more federal money for schools. We need legal maximum class sizes, maybe 20 kids per class. We need to stop cramming together kids of different aptitudes; maybe require that the slower kids get even smaller class sizes. Offer more after school programs, particularly for the kids that don't have good home environments. Then we need to start thinking about doing something with the colleges. College prices are skyrocketing and at the same time the quality of education is going way down with lots of kids getting rubber stamp degrees.

5. Spend on infrastructure. This is just something we should be doing all the time that we cut at some point to save money in budgets. Infrastructure spending is great for the overall economy; no it's not a great way to get out of recession because it's not fast enough, but it provides lots of steady jobs, and the infrastructure that results makes many other businesses work better. We need roads, bridges, flood control, commuter trains, etc. etc.

6. Get a handle on health care. This is another huge mess so I don't want to get distracted by it. I will just point out how big of a problem it is. We already spend around 15% of our GDP on health care, which is far far more than any other country (many Western countries are around 5%). Furthermore, health care costs are still rising much faster than inflation. This is a huge drain on our economy that needs to get fixed. I think the basics of Hillary's plan are the right way to go, but reducing all the unnecessary expenditure and fraudulent profit taking is going to be politically very difficult in our pro-shyster culture.

7. Do something serious for the environment and energy independence. I don't want to oversell the importance of global warming, but it's high time we started doing something serious to reduce fossil fuel usage; if you like you can pretend that the reason is for strategic energy independence. It could also be very good for the economy. There are lots of obvious steps that make sense. IMO the best thing would be huge taxes on water, electricity and gas, which would allow the market to figure out the best ways to reduce use and make it all financially driven. I'm sure that's a political non-starter, so instead you could start penalizing companies that are extremely inefficient with their energy use or waste disposal. Put a bunch of money into researching alternative energy. The amount of money needed for this is microscopic compared to the benefits. A few billion dollars is a HUGE amount to fund research, but is a drop in the bucket compared to our oil imports. A few cents of gas tax could easily fund alternative energy research. Tax subsidies for the shitty alternative energy solutions we currently have do nothing but give free money to businesses.

What are our current political parties actually about? They talk about this and that, but what have the parties consistently actually accomplished in office :

Republicans are :
tax cutters
deficit spenders
grow the defense/military budget immensely
anti-environmental controls, anti-parks

Democrats are :

It's hard to remember anything that the Democrats have really done in recent history. I mean if we list things done under Clinton it was mainly free trade, cutting welfare, etc. I don't know if we can really draw a distinction in terms of foreign policy, both parties tend to fuck around all over the globe in rather retarded ways that generally have negative long term consequences. Both Clinton and Bush allowed genocides to occur on their watch. I'm not really sure what the Democrats stand for other than being "not Republicans".
