12-04-04 - 1


I was just away at the Game Tech conference, giving a talk on OSW. It was really interesting to see the talks on Halo 2 and Half-Life 2, mainly to see that they really aren't doing anything we aren't. I look up to those games immensely, and I always imagine in the back of my head that maybe they have some amazing technology or process or something that is helping them be so superior. In fact, their tech/engine process is very similar to ours and the goals are mostly pretty similar, though there are some differences in philosophy. Mainly, we at OW just can't trust anyone on the team to do things right if left to their own devices - we have to enforce a lot of structure and rules through the code, whereas Valve and Bungie can be more flexible, because they have the process and responsibility in other departments that make that possible. The biggest differences between OW and those two H2's are - A) their teams are huge, and B) they have good process outside of code. We at OW have a big company - almost 50 people - but we only have about 20 in game production (!!), whereas Valve and Bungie both have over 50 people in actual production (and a fraction of the admin staff!). Also, it's not like the H2's are really these ivory towers - their tools still have a lot of problems. In Halo 2's case, they were way behind schedule and the game suffered badly, mainly due to game design; in HL2's case, they just took a really long time. I think both of them managed to make great games because the core direction was good - e.g. the base mechanics are identical to the previous games, so everyone on the team knows how to do that and agrees on it, and the core philosophy of game design was good - for HL2, it's fully interactive and immersive; for Halo 2, it's systems-based gameplay, etc. Compare that to us, where our direction didn't really crystallize until a few months before shipping.

I had to come back from the talk a bit early because we're trying to go gold here (maybe today!). We had four crash bugs when I got back (!!). Only one of them was found by the lovely EA test; the other three were found by us internally, despite the fact that we have basically no internal test department at all. Lovely. The four crashes were - 1) XACT seems to have a bug with auto release cues and a bad linked list walk, 2) we had a resource registration bug in our code that caused a null deref in a very rare case (only on the DVD build), 3) one crash in the Granny "UUU" department, 4) another crash in Granny, again UUU/paging related. Amazingly, we fixed them all in about an hour. The UUU bugs in particular have been with us for a long time; we never had a repro, and I thought maybe they were fixed, but suddenly they came back. In fact, the previous fix I thought I'd done was a total red herring. When I did that fix, I also put in lots of checks to detect if the error happened again. It turns out that my check/catch code was being hit, and the damn content team was seeing those errors reported and skipping past them without telling us or sending us the logs. This has been going on for months, so we could have easily fixed this bug long ago. In general, our whole company and the content departments are extremely unaware of how their behavior affects us. In code, we have to start first and crunch hard to get the engine and tools going; then we all work hard, and the content guys slip and slip and finish way late, and then we in code have to continue crunching to finish up. We try to make the code super robust so they can work, and we try to put in error checking to help them and to help us find errors and fix them early, so we don't wind up with crash bugs at the last minute. It's extremely bad to be making these fixes so late - we've had weeks of test, and we're making these fixes now, so all that testing has not tested these fixes.
Also, our producers at OW and EA have become totally irresponsible. They're obviously sick of working on the game and with each other; they just want to kick the game to manufacturing, and they don't really care if it's tested right. We make these fixes, and they just want to play through it once, call it good and send it out. So, anyhoo, we found these UUU crashes, which were tricky little buggers - we've known there was a problem for months and could never find it. I think it was just having those couple of days away at Game Tech that made it possible. When you work 6 or 7 days a week for months, your brain just gets fried and you can't think straight. Game Tech was intense, but it was still a break from coding, so I came back with a bit of a clear mind and was able to see the problem. Getting that time away is so valuable for intellectual labor; it's hard to quantify what a productivity boost you get.
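
For anyone wondering what I mean by check/catch code, here's a minimal sketch of the general idea. This is not the actual engine code; SoftErrorLog, SOFT_CHECK, PagedResource and sumResource are all made-up names for illustration. The check records the failure and bails out of the bad path instead of crashing, so content work can continue, but the failure stays on record.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical soft-error recorder (illustration only, not the real engine code).
struct SoftErrorLog {
    std::vector<std::string> entries;

    // Record a failed check and keep running, so content work is not blocked.
    void report(const char* file, int line, const char* what) {
        entries.push_back(std::string(file) + ":" + std::to_string(line) + ": " + what);
        std::fprintf(stderr, "SOFT ERROR %s\n", entries.back().c_str());
    }

    // True when no check has failed in this session.
    bool clean() const { return entries.empty(); }
};

// Evaluates cond; on failure, records it and yields false so the caller can bail out.
#define SOFT_CHECK(log, cond) \
    ((cond) ? true : ((log).report(__FILE__, __LINE__, #cond), false))

// Hypothetical paged resource: data may be null while it is paged out.
struct PagedResource {
    const int* data;
    int count;
};

// Sum the resource, but refuse to walk it if the paging state looks wrong.
int sumResource(SoftErrorLog& log, const PagedResource& r) {
    if (!SOFT_CHECK(log, r.data != nullptr)) return 0;
    if (!SOFT_CHECK(log, r.count >= 0)) return 0;
    int total = 0;
    for (int i = 0; i < r.count; ++i) total += r.data[i];
    return total;
}
```

The point of keeping the entries around, rather than just printing and moving on, is that a tool could check clean() at exit and ship the log to code automatically, instead of relying on whoever saw the error to tell someone.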
