11-20-04

The Stranger's Wrath code is possibly the most stable and robust game code base ever; certainly it's by far the most stable I've ever seen. On the entire project, we've only had two persistent nasty crash bugs - one was from Granny (a licensed library), and the other is from XACT (the XBox audio library). Both of those libraries fail to follow our guidelines of being self-checking, input-validating, etc. For example, XACT will simply crash any time you play a sound with an index out of bounds, which can happen easily when people have different revisions of the content. It's been extremely important for us to have this level of stability. We are able to take the development build and deliver it as a fully stable milestone on about one day's notice. Generally the bad bugs we have to fix are "stop playthrough" bugs, not crash bugs, and quite often the "stop playthrough" is just a design issue, e.g. this door doesn't open so you can't move on. We had to deliver numerous demos to publishers, to MS, to EA, to press, and it was all reasonably easy because we could go from full development to a stable build so quickly. In fact, it encouraged some practices I wasn't too fond of - the design team would frequently work on demos until the day before delivery, so we only got one day of content lock to test & fix code. In the past I've always wanted closer to a week of content lock to make sure we get a stable build.
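To make "input-validating" concrete, here is a minimal sketch of the kind of check we'd want in front of an index-based audio call. It is not XACT's API and not our actual code; `SoundBank` and `PlaySoundChecked` are hypothetical names made up for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical sound bank; stands in for whatever table the audio library indexes into.
struct SoundBank {
    std::vector<std::string> cues;  // one cue name per sound index
};

// Validate the index before it ever reaches the audio library.
// Returns true if the request was accepted, false if it was rejected and logged.
bool PlaySoundChecked(const SoundBank& bank, std::size_t index) {
    if (index >= bank.cues.size()) {
        std::fprintf(stderr,
                     "PlaySoundChecked: index %zu out of range (bank has %zu cues); "
                     "content revision mismatch?\n",
                     index, bank.cues.size());
        return false;  // drop the sound instead of crashing
    }
    // A real implementation would hand the index to the audio library here.
    return true;
}
```

With a check like this, a stale content revision costs you a missing sound and a log line rather than a crash.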
I would discourage all programmers from relying on repro for bugs. I don't use repros unless it's a strange bug or needs a specific case to test, etc. Generally I try to fix bugs just by looking at the code. When someone describes a bug to me, I think of where in the code that problem could be caused, then I simply go and look at that part of the code. You can look at that code and see how it might break, and 99% of the time you can spot the bug just by looking and making sure it's robust. Even if you don't spot the bug, if you have a good idea where it is, you just add some more asserts and self-checks and logs in that part of the code, and hopefully those will trip in the future and give you more information, and they can stay there to make sure the code seems strong. Another thing that I emphasize is to look at how the bug happened and try to prevent that from happening in the future. Look at the buggy code - how did that get in? Was it reviewed? Did both people involved actually understand that bit of code? Did they talk to the original author? Did they test it? Did they have good asserts checking the code? I try to not only fix the bug, but also fix the behavior that made the bug. Sure, sometimes you just have mistakes that make bugs and everyone was doing the right thing, but that's actually rare compared to bugs made from someone trying to be too fast or sloppy (often me). Repro is sort of a crutch for bugs. I don't like to fix the code for one specific repro case - I like to make the code bulletproof for all possible break cases.
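As a rough illustration of "add some more asserts and self-checks and logs", here is a small sketch of a non-fatal check that logs where it tripped and lets the game keep running. The `SELF_CHECK` macro, the `Door` struct, and `CountBrokenDoors` are all hypothetical, invented for this example rather than taken from our code base.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical non-fatal check: evaluates to true if the condition holds;
// otherwise logs file, line, and the failed expression, and evaluates to false.
#define SELF_CHECK(cond)                                                  \
    ((cond) ? true                                                        \
            : (std::fprintf(stderr, "%s(%d): self-check failed: %s\n",    \
                            __FILE__, __LINE__, #cond),                   \
               false))

// Hypothetical level data: a locked door with no key would stop playthrough.
struct Door {
    int  id;
    bool locked;
    bool hasKeyInLevel;
};

// Walk the doors and count the ones that violate the invariant.
// The check stays in the code so the same mistake trips again in the future.
int CountBrokenDoors(const std::vector<Door>& doors) {
    int broken = 0;
    for (const Door& d : doors) {
        if (!SELF_CHECK(!d.locked || d.hasKeyInLevel)) {
            ++broken;
        }
    }
    return broken;
}
```

The point of the design is that the check costs almost nothing to leave in, and the log line tells you which invariant broke without needing a repro.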
It's funny to me that we (Oddworld) and Bungie use the absolute opposite coding styles, and yet the result is extremely similar. We use heavy C++, dynamic allocation, multiple inheritance, STL. They use almost straight C, macros, function pointers instead of inheritance, no dynamic allocation at all, etc. In the end, we both push the XBox very hard, running the GPU to the limit, using all the 64 megs of RAM to the fullest. I know they don't believe that we use the hardware or memory optimally (in reality, we do stall on L2 cache misses because of our polymorphism, but that's maybe 3% of CPU time lost), and we don't believe they can possibly develop rapidly and robustly, but probably we're both wrong. However, our situations are very different, and I can't imagine their style would work here. We've had to hack in crazy new features on a daily basis; if I want to make some types of objects support some new interface, I can just define that interface and dynamic_cast to it at the query point; I don't have to push my crazy feature up to the base class.
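The "define that interface and dynamic_cast to it at the query point" pattern looks roughly like the sketch below. The class names (`GameObject`, `IFlammable`, `Crate`, `Rock`) and `TryIgnite` are hypothetical, chosen only to show the shape of the idea.

```cpp
// Base class every game object derives from; it knows nothing about the new feature.
struct GameObject {
    virtual ~GameObject() = default;
};

// Hypothetical new interface, added without touching GameObject.
struct IFlammable {
    virtual ~IFlammable() = default;
    virtual void Ignite() = 0;
};

// Only the object types that care opt in, via multiple inheritance.
struct Crate : GameObject, IFlammable {
    bool burning = false;
    void Ignite() override { burning = true; }
};

// Unrelated object type; does not implement the interface.
struct Rock : GameObject {};

// Query point: cross-cast from the base pointer to the interface.
// dynamic_cast yields nullptr if the object does not implement it (or obj is null).
bool TryIgnite(GameObject* obj) {
    if (IFlammable* flammable = dynamic_cast<IFlammable*>(obj)) {
        flammable->Ignite();
        return true;
    }
    return false;
}
```

The cost is an RTTI lookup at the query point; the benefit is that the base class never grows a virtual function for every one-off feature.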