Pierre wrote a pretty good blog post a while ago (which I can't find now grrr) about how the standard maxim to "optimize late & optimize only hot spots" can be wrong. In particular, the general bloat of inefficiency everywhere can be a huge drag and not provide any easy targets. I agree with that to some extent (certainly at OddWorld we had the problem of some nasty generalized speed hit from various small inefficiencies). However, the opposite is also true - optimizing early or micro-optimizing can really make you do stupid things. I've written before about how it can get you into local minima traps where you don't see a better algorithm because your current bad one is so tweaked. One of the worst things you can do is super-optimize a bad algorithm. And on the flip side, it can actually be a huge boon to your coding if your low level stuff is really slow.
I was reminded of this recently when I got this introduction to x64 ASM with an example Huffman decoder. Despite lots of work in ASM, this Huffman decoder is pathetically bad, it actually walks through node pointers to decode, which is just about the worst possible way to do it. I don't mean to pick on that guy, it's example code and it's actually really nice example code, it was super useful to me in my x64 learnings, but I've seen so many bad Huffman ASM implementations, it has to be one of the all-time great examples of foolish premature optimization of bad algorithms. If you had a really slow bit input routine, you might be motivated to work on the *algorithm* to avoid bit inputs as much as possible. Then you would come up with what all the smart people do, which is to read ahead big chunks of bits and use a table to decode symbols directly. Usually the best way to optimize a slow function is to not call it. (I don't actually have a good reference on fast Huffman, but I did write a tiny bit earlier).
Of course there is also the flip side. Sometimes a slow underlying function can cause you to waste lots of time on optimization that's not really necessary. Probably the worst culprit I see of this is allocations. I see people doing masses of work to remove allocations and often it just doesn't make any sense. A small block alloc these days is around 20-50 clocks, often less than a divide. There are good reasons to remove allocs (mainly if you are on a low memory system and want very predictable memory use patterns), but speed is not one of them, and people who are using a very bad malloc back end, where a malloc is hundreds or thousands of clocks, are just giving themselves a problem that isn't real.
Browsing around the web another one struck me. I tend to be *very* careful in the way I write code (compared to my game developer compadres anyway, I suppose I am wild and reckless compared to many of the systems devs). I used to be somewhat of a specialist in saving bad code bases, and in that spot I really hate to even try to work with the code I'm given. The first thing I do is start tracing through runs and see what's really happening, and then I start adding comments and asserts everywhere. Then I start wrapping up common functionality into classes that encapsulate something and enforce a rule so that I can be *sure* that what everyone thinks is happening really is happening. When I see code that does not *prove* to me that it's working right, I assume it's not working right.
This carefulness is multiplied many fold when it comes to threading. I am hyper careful and don't trust myself for the most basic things. I see a lot of people write simple threading code and do it without much in the way of comments or helper functions because they are "smart" and they can tell what's happening and don't need helpers. They just do some interlocked ops to coordinate the threads and then maybe do some non-protected accesses when they "know" they can't have races, etc. That's awesome until you have a mysterious problem like this one. You could blame the min() for doing something weird, or blame the compiler for not being nice enough with its volatile handling, but IMO this is the type of thing that would never happen if the code was written more carefully.
Even when I'm at my most lazy, I would write this :
    // pull available work with a limit of 5 items per iteration
    LONG work = min(gAvailable, 5);

as
    LONG avail = gAvailable;
    LONG work = min(avail, 5);

but this is also an example where the hated "volatile" is showing its ugliness. I hate that the "volatile" is far away on the variable decl and not right here where I need it. Sometimes I would write :
    LONG avail = *((volatile LONG *)&gAvailable);
    LONG work = min(avail, 5);

but even better if I had my threading helpers I would do something like :
    LONG avail = LoadShared_Relaxed(&gAvailable);
    LONG work = min(avail, 5);

where I explicitly load the shared variable using "Relaxed" memory ordering semantics (actually I don't know the usage case, but this should probably have been Acquire memory semantics). LoadShared_Relaxed is nothing but *ptr, so many people don't see the point of having a function call there at all, but it makes it absolutely clear what is happening. It also just makes it more verbose to touch the shared variable, which encourages you to load it out into a local temporary, which is good.
Another option which I often use is to make a Shared<> template, so gAvailable would be declared Shared<LONG> gAvailable. Then you have to access it with members like LoadRelaxed() or StoreRelease(), etc.
I treat threading code like a loaded gun. I don't point it at anyone's face, I store it without bullets, etc. You take these precautions not because they are absolutely necessary, but because they ensure you don't have bad surprises.
A lot of the time, in rebuttal, people will show me their unsafely written code and say "well this works fine, tell me what's wrong with it". Umm, no, you don't understand the issue at all my friend. The point is that with unsafely written code, I have to use my brain to figure out whether it is working or not, and as we have seen, that is *very* very hard. Especially with threading and races. In fact the only way I could do it with any level of confidence is to instrument it and run it through Relacy or one of the other automatic race detectors.
Maybe I miss managing people and getting to pick on their code, so now I'm picking on random code from around the internet. Actually that might be fun, do a weekly code review of some random code I grab from the web ;)