6/07/2010

06-07-10 - Coding Style

I sometimes feel like I live in my own coding world, where I have these strange ideas that nobody else agrees with.

Pierre wrote a pretty good blog post a while ago (which I can't find now grrr) about how the standard maxim to "optimize late & optimize only hot spots" can be wrong. In particular, the general bloat of inefficiency everywhere can be a huge drag and not provide any easy targets. I agree with that to some extent (certainly at OddWorld we had the problem of some nasty generalized speed hit from various small inefficiencies). However, the opposite is also true - optimizing early or micro-optimizing can really make you do stupid things. I've written before about how it can get you into local minima traps where you don't see a better algorithm because your current bad one is so tweaked. One of the worst things you can do is super-optimize a bad algorithm. And on the flip side, it can actually be a huge boon to your coding if your low level stuff is really slow.

I was reminded of this recently when I got this introduction to x64 ASM with an example Huffman decoder. Despite lots of work in ASM, the Huffman decoder is pathetically bad: it actually walks through node pointers to decode, which is just about the worst possible way to do it. I don't mean to pick on that guy - it's example code, and actually really nice example code that was super useful to me in my x64 learnings - but I've seen so many bad Huffman ASM implementations that it has to be one of the all-time great examples of foolish premature optimization of bad algorithms. If you had a really slow bit input routine, you might be motivated to work on the *algorithm* to avoid bit inputs as much as possible. Then you would come up with what all the smart people do, which is to read ahead big chunks of bits and use a table to decode symbols directly. Usually the best way to optimize a slow function is to not call it. (I don't actually have a good reference on fast Huffman, but I did write a tiny bit about it earlier).
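
For contrast, here is a minimal sketch of the table-driven approach (everything here - the names, the table size, the bit order - is hypothetical, just to show the shape of it) :

    // Minimal sketch of table-driven Huffman decoding. Assumes an LSB-first
    // bitstream on a little-endian machine, max code length <= kTableBits,
    // and at least 8 bytes of padding at the end of "bytes".

    #include <stdint.h>
    #include <string.h>

    enum { kTableBits = 12 };

    struct DecodeEntry
    {
        uint16_t symbol; // decoded symbol
        uint8_t  length; // number of bits this code actually uses
    };

    // "table" has (1 << kTableBits) entries; every index whose low "length"
    // bits match a code maps to that code's symbol, so one lookup replaces
    // the per-bit tree walk.
    static int DecodeSymbol(const DecodeEntry * table,
                            const uint8_t * bytes, uint64_t & bitPos)
    {
        uint64_t bits;
        memcpy(&bits, bytes + (bitPos >> 3), sizeof(bits)); // refill from the stream
        bits >>= (bitPos & 7);                              // drop already-consumed bits

        uint32_t peek = (uint32_t)bits & ((1u << kTableBits) - 1);
        DecodeEntry e = table[peek]; // one table lookup decodes a whole symbol
        bitPos += e.length;          // advance by the true code length
        return e.symbol;
    }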

Of course there is also the flip side. Sometimes a slow underlying function can cause you to waste lots of time on optimization that's not really necessary. Probably the worst culprit I see for this is allocations. I see people doing masses of work to remove allocations, and often it just doesn't make any sense. A small block alloc these days is around 20-50 clocks, often less than a divide. There are good reasons to remove allocs (mainly if you are on a low memory system and want very predictable memory use patterns), but speed is not one of them, and people who are using a very bad malloc back end, where a malloc costs hundreds or thousands of clocks, are just giving themselves a problem that isn't real.

Browsing around the web, another one struck me. I tend to be *very* careful in the way I write code (compared to my game developer compadres anyway, I suppose I am wild and reckless compared to many of the systems devs). I used to be somewhat of a specialist in saving bad code bases, and in that role I hate to even try to work with the code as I'm given it. The first thing I do is start tracing through runs and see what's really happening, and then I start adding comments and asserts everywhere. Then I start wrapping up common functionality into classes that encapsulate something and enforce a rule so that I can be *sure* that what everyone thinks is happening really is happening. When I see code that does not *prove* to me that it's working right, I assume it's not working right.

This carefulness is multiplied many-fold when it comes to threading. I am hyper careful and don't trust myself for the most basic things. I see a lot of people write simple threading code and do it without much in the way of comments or helper functions because they are "smart" and they can tell what's happening and don't need helpers. They just do some interlocked ops to coordinate the threads and then maybe do some non-protected accesses when they "know" they can't have races, etc. That's awesome until you have a mysterious problem like this. You could blame the min() for doing something weird, or blame the compiler for not being nice enough with its volatile handling, but IMO this is the type of thing that would never happen if the code was written more carefully.

Even when I'm at my most lazy, I would write this :


    // pull available work with a limit of 5 items per iteration
    LONG work = min(gAvailable, 5);

as

    LONG avail = gAvailable;
    LONG work = min(avail, 5);

but this is also an example where the hated "volatile" is showing its ugliness. I hate that the "volatile" is far away on the variable decl and not right here where I need it. Sometimes I would write :

    LONG avail = *((volatile LONG *)&gAvailable);
    LONG work = min(avail, 5);

but even better if I had my threading helpers I would do something like :

    LONG avail = LoadShared_Relaxed(&gAvailable);
    LONG work = min(avail, 5);

where I explicitly load the shared variable using "Relaxed" memory ordering semantics (actually I don't know the use case, but this should probably have been Acquire memory semantics). LoadShared_Relaxed is nothing but *ptr, so many people don't see the point of having a function call there at all, but it makes it absolutely clear what is happening. It also just makes it more verbose to touch the shared variable, which encourages you to load it out into a local temporary, which is good.
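
For concreteness, here is one possible sketch of such helpers (the names are from the text above; the bodies are an assumption - a plausible MSVC/x86 implementation rather than any particular library's - and modern code would build them on std::atomic instead) :

    #include <intrin.h>  // _ReadWriteBarrier (MSVC compiler barrier)

    typedef long LONG;   // as in <windows.h>

    inline LONG LoadShared_Relaxed(const LONG * ptr)
    {
        // nothing but *ptr, but the volatile cast forces an actual reload
        return *(const volatile LONG *)ptr;
    }

    inline LONG LoadShared_Acquire(const LONG * ptr)
    {
        LONG value = *(const volatile LONG *)ptr;
        _ReadWriteBarrier(); // keep later reads after this load (sufficient on x86)
        return value;
    }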

Another option which I often use is to make a Shared<> template, so gAvailable would be Shared< LONG > gAvailable. Then you have to access it with members like LoadRelaxed() or StoreRelease(), etc.
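
One possible sketch of such a wrapper (the member names follow the text; the rest is an assumption, and in modern C++ it can just delegate to std::atomic) :

    #include <atomic>

    template <typename T>
    class Shared
    {
    public:
        T    LoadRelaxed() const { return m_value.load(std::memory_order_relaxed); }
        T    LoadAcquire() const { return m_value.load(std::memory_order_acquire); }
        void StoreRelease(T v)   { m_value.store(v, std::memory_order_release); }

    private:
        std::atomic<T> m_value;
    };

    // usage : the raw value can no longer be touched by accident
    //  Shared<LONG> gAvailable;
    //  LONG avail = gAvailable.LoadAcquire();
    //  LONG work = min(avail, 5);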

I treat threading code like a loaded gun. I don't point it at anyone's face, I store it without bullets, etc. You take these precautions not because they are absolutely necessary, but because they ensure you don't have bad surprises.

A lot of the time, in rebuttal, people will show me their unsafely written code and say "well this works fine, tell me what's wrong with it". Umm, no, you don't understand the issue at all, my friend. The point is that with unsafely written code, I have to use my brain to figure out whether it is working or not, and as we have seen, that is *very* very hard. Especially with threading and races. In fact the only way I could do it with any level of confidence is to instrument it and run it through Relacy or one of the other automatic race detectors.

Maybe I miss managing people and getting to pick on their code, so now I'm picking on random code from around the internet. Actually that might be fun, do a weekly code review of some random code I grab from the web ;)

8 comments:

sly-id said...

Pierre's post: http://www.codercorner.com/blog/?p=33

Autodidactic Asphyxiation said...

This is separate from your annotating memory locations v. code ranges, but still related to the "volatile" keyword.

So, it appears to be a common pattern that "volatile" is used to express that a single location is potentially being modified by multiple CPU threads. I've read things like this:

http://www.safercode.com/blog/2009/02/24/volatile-c-keyword-myths-dispelled.html

That seems to contraindicate using volatile for such purposes, and suggests that you should rely on explicit barriers rather than on compilers generating correct code in these cases.

It could be that this is not an issue pragmatically, since there is so much code out there that depends on this behavior that it would be disastrous to change. However, even this would only be the case if you were focusing on similar architectures, but I don't think you're in that business.

cbloom said...

That article on volatile is not well informed.

The exact meaning of volatile depends on the compiler and platform. It is definitely possible to write good threaded code using volatile on MSVC 2005+. However, due to the lack of clarity involved in doing so, I highly recommend against it.

See this post and mainly the links at the bottom of it :

http://cbloomrants.blogspot.com/2009/01/01-25-09-low-level-threading-junk-part.html

cbloom said...

In particular :

http://msdn.microsoft.com/en-us/library/12a04hfd%28VS.80%29.aspx

Microsoft Specific

[...]

* A write to a volatile object (volatile write) has Release semantics; a reference to a global or static object that occurs before a write to a volatile object in the instruction sequence will occur before that volatile write in the compiled binary.

* A read of a volatile object (volatile read) has Acquire semantics; a reference to a global or static object that occurs after a read of volatile memory in the instruction sequence will occur after that volatile read in the compiled binary.

This allows volatile objects to be used for memory locks and releases in multithreaded applications.
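
To illustrate what those guarantees buy (a hypothetical sketch, MSVC-specific; portable code should use std::atomic or explicit barriers instead) :

    int           gPayload = 0; // plain data
    volatile long gReady   = 0; // flag; volatile gives it Acquire/Release on MSVC 2005+

    void Publish()   // producer thread
    {
        gPayload = 42; // ordinary write
        gReady   = 1;  // volatile write = Release : the gPayload write stays above it
    }

    void Consume()   // consumer thread
    {
        if (gReady)           // volatile read = Acquire : later reads stay below it
        {
            int x = gPayload; // sees 42 once gReady is seen as 1
            (void)x;
        }
    }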

Autodidactic Asphyxiation said...

Well, yeah, I guess my point was that the volatile pattern applies to a narrow enough segment that, if you were working on middleware, relying on it would be quite ill-advised. I mean, I think there are other weird, related bits like the thread-safety of function-static object initialization (safe on recent GCC, not sure otherwise). It is better not to rely on such behavior, IMO.

I think everyone has a line between what is technically conformant and what they know (or believe) works for the cases that matter, but using volatile this way seems like asking for trouble, and it seems that you basically agree.

Autodidactic Asphyxiation said...

Oh, yeah: code reviews are pretty damn essential. At work, some peers are code-reviewing a tagged-pointer string thing, and I've gotten very useful feedback, even though I was very careful in the first go. I mean, some comments are stupid and nitty (can you explicitly check for NULL? can you add parens?), but those are easy to just do instead of argue about.

At the very least, code review spreads institutional wisdom.

mmack said...

Just wanted to state for the record that I didn't write the offending code that my post refers to :)

I probably should have seen the problem sooner though, casting away volatile to appease the compiler is a pretty good sign something's wrong.

ryg said...

"I've written before about how it can get you into local minima traps where you don't see a better algorithm because your current bad one is so tweaked. One of the worst things you can do is super-optimize a bad algorithm. And on the flip side, it can actually be a huge boon to your coding if your low level stuff is really slow."
Even when it gets all the algorithms right, code that's too focused on the micro level can be very unpleasant to use because it doesn't provide any real guidance or vision. BLAS/LAPACK etc. are a prime example. They use good algorithms, they're very efficient, the actual functions are usually exactly as general as they should be, and they get all the little things right (like always having a stride separate from width for 2D arrays and so on). They also tend to make every program that uses them look like a FORTRAN 77 program.

"The first thing I do is start tracing through runs and see what's really happening, and then I start adding comments and asserts everywhere. Then I start wrapping up common functionality into classes that encapsulate something and enforce a rule so that I can be *sure* that what everyone thinks is happening really is happening. When I see code that does not *prove* to me that it's working right, I assume it's not working right."
I had to "deal with" (=fix bugs in) a bunch of async and multithreaded code over the past year, and it's gotten me into the habit of explicitly stating invariants and writing a "CheckInvariant()" function that gets called before and after every operation in debug builds (or, if they're expensive, putting them inside a "#if PARANOIA" block so I can quickly turn them all on whenever I think something's fishy, without making everything slower for everyone). I hit them very rarely (once or twice during that year), but they're excellent documentation, and just having them written down as code helped me find bugs as I added them.
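
A minimal sketch of that pattern (the queue and its invariants are made up for illustration) :

    #include <assert.h>

    #define PARANOIA 1

    struct WorkQueue
    {
        int head, tail, count, capacity; // ring buffer bookkeeping

        void CheckInvariant() const
        {
        #if PARANOIA
            assert(capacity > 0);
            assert(count >= 0 && count <= capacity);
            assert(head >= 0 && head < capacity);
            assert(tail >= 0 && tail < capacity);
            assert((head + count) % capacity == tail); // count agrees with the ring positions
        #endif
        }

        void Push(int item)
        {
            CheckInvariant();  // state must be sane coming in
            (void)item;        // ... the actual push of "item" would go here ...
            CheckInvariant();  // and still sane going out
        }
    };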

Another great way to reason about this kind of code is to just draw a state diagram. First rule of thumb: If it's got more than 4 states, there's probably at least one serious bug in there. If you can't fit it onto a single page, throw the code away, it's never going to work. Start with the intended (valid) states and the transitions between them. Then go through states one by one and check that the transitions you've drawn are really all transitions that can happen. I find it easier to spot problems at that level, with only the essential actions visible and all other code effectively removed.

One problem with async/threaded code is that the inherent level of unpredictability means that all paths through a piece of code tend to get exercised. It's the equivalent of fuzz testing - take *any* piece of code that usually gets good data (file loading, whatever) and feed it something that starts out properly and then switches to random junk at some point. Watch your code crash and burn, even when you're being careful. Most code doesn't survive actual adversarial testing.

"I suppose I am wild and reckless compared to many of the systems devs"
Then again, maybe not. The Therac-25 (link) is an interesting case in point. I wouldn't want the responsibility of working on medical applications (and I'd never work on military applications on principle), and a lot of my friends who are fairly conscientious about programming have said the same thing. I'd like to think that all people who do work on such systems are very careful, but realistically, a lot of them probably just don't worry about this kind of thing much.
