cbloom rants: 06-19-12 - Two Learnings

Two things that I got wrong in the past and have only recently come around to what I believe is the right way.

1. Use pointer-sized ints for memory buffer sizes.

When I made the transition to building 32-64-bit cross platform I was really annoyed with the fact that I couldn't just use "int" everywhere any more. To make it easier for myself, I mostly just used int64 for memory buffers, and made a bunch of helpers for myself that returned int instead of size_t and intptr_t , eg. for things like vector.size() and strlen() and so on I used int-returning variants.

The nice thing about using int64 for memory buffer sizes is that your type doesn't change when you build different targets; that makes coding a lot simpler (and removes the possibility of bugs due to struct sizes changing and such).

Only very recently have I come around, and it's probably not for the reason you think. It's because using a separate type for "pointer sized thing" is a good way of documenting functions with types.

In the previous API post I talked about how I like the function arg types to be as self-documenting as possible. It's even better if you can make it a compile error or warning when you mis-use it. So I was looking at API's like :


Oodle_Read( OodleFile * f, void * buffer, S64 size, U64 filePos );

in particular at the call site where you have something like :


S64 i1,i2;

Oodle_Read(f,ptr,i1,i2);

you can't tell what the last two args are just by looking at the call site, and what's worse, you can switch them in the call order and get no warning. That sucks. It's much better to do :


Oodle_Read( OodleFile * f, void * buffer, SINTa size, U64 filePos );

(SINTa is the RAD pointer-sized signed int). It's now clearly documented in the variable types - this is the size of a memory buffer.

If you have an old code base, it's very annoying at first to do this, because you'll have a lot of conversions to do. It only becomes clean once you have changed your whole code base to follow the rules :


SINTA/UINTa - all memory buffer sizes and array sizes

S64/U64 - file positions and sizes

S32/U32 - really not used for much, just enums and modes and flags, not array sizes

Not only does this put more documentation on the variable, it also makes it more clear when you are doing dangerous cast-downs.

eg. when you try to read a whole file into a memory buffer. You get the file size as an S64 and then you need to pass something to malloc, which takes an SINTa. That makes a clear single point where you are doing a questionable cast. Once you've done that one cast, the rest of the code is cast-free which is nice.

Furthermore, I think it's augmented by having helpers for the cast-downs :


U32 oo64to32(U64 x);  // used super rarely, very weird thing to do
U32 ooAto32(UINTa x); // used super rarely, very weird thing to do

UINTa oo64toA(U64 x); // used for file sizes -> memory buffers, somewhat common

There are only three cast-downs needed, and really the first two should almost never be used; if you are using them a lot it's a sign of bad code. The last one is common, and it should do a check to ensure the cast is okay (eg. if using a > 2GB file size on a 32 bit system).

2. Avoid recursive mutexes.

Like many people, I read the wisdom of the ancients (old school threading programmers) who said that "recursive mutexes are of questionable value and lead to dangerous code; prefer non-recursive mutexes" ; I read that and I went "psshaw" and thought they were crotchety dinosaurs. Well, I've come around.

The thing about recursive mutexes is that like much code which is attractive to the novice, they make the trivial case simpler and look cleaner, but they make the hard case much worse, and the hard case if what actually matters.

The trivial case is : object only locks itself, never locks other objects, never can be freed while locked, etc.

But inevitably in real world threaded code you have to deal with the harder case for mutexes, which is : object might have to lock other objects (and they might have to lock it); lock order may be hard to establish; object may want to free itself while locked, etc.

The way that the novice normally makes a thread-safe object with a recursive mutex is to put a lock scoper in every function on that object :


Object::Func(...)
{
  m_mutex.Lock();

  do stuff;

  m_mutex.Unlock();
}

or really :

Object::Func(...)
{
  MUTEX_SCOPER(m_mutex);

  do stuff;
}

and crucially, when you call other member functions on yourself, that takes the mutex again recursively. (we're going to ignore the efficiency cost of taking the mutex many times). At first that way seems nice, it's easy, but then you encounter one of the real world problems and it falls apart.

What's better is to separate the mutex-taking from the actions, so instead you do :


Object::Func_Locked(...)
{
  do stuff;
}

Object::Func_Unlocked(...)
{
  MUTEX_SCOPER(m_mutex);

  Func_Locked();
}

Then all the functionality is in the _Locked functions and they can call each other.

So now you need to do something like call A->Func1 and B->Func2 , and it has to be done "atomically" , eg. with both locked. Then you run into the issue of mutex order of acquisition and possible deadlocks. If you have used the first style where objects lock themselves, then it is impossible for objects to call each other safely. That is, object A can never call object B, because object B might call object A and there's a deadlock. But with the _Locked functions, any _Locked function can call to another _Locked function, and they don't have to worry about deadlock; instead the lock taking is all pushed up and can be done in a standardized order (or even atomically if you have multi-mutex lock acquisiton).

The other thing that non-recursive locks cleans up, is that you know whenever you see a call to Unlock(), that's the *last* call to unlock and it definitely makes the object publicly visible. That is, there is a crucial transition point from "I own this object" to "others can have it", and with the recursive lock method, that transition point is muddied.

For example, consider this simple and common case : something in the object's internal state tells you it should be deleted. You must do that atomically, because if you unlock then delete, the internal state might change, eg. :


Object::Func()
{
  m_mutex.lock();

  if ( m_x > m_y )
     deleteMe = true;

  m_mutex.unlock();

  // *!*

  if ( deleteMe )
    delete this;
}

That code is no good, at the ! point, some other thread might touch object and make the check m_x > m_y no longer be true, and then we would be deleting this object incorrectly, when the invariant is not set. In order to make this work you need a combined "unlock_and_delete" operation. But to do that you need to know that your unlock is the *last* unlock - you can't do that in the recursive mutex style.

Basically the recursive mutex is sort of like optimizing the code that's already fast; it makes the simple case a bit simpler, but it makes the real world hard cases much worse. Better to just start from the beginning with the more robust long term solution.

Once you realize that the advantage of a recursive mutex is an illusion, then the advantages of the non- recursive mutex become appealing. You can implement a non-recursive mutex far more efficiently. It can take only 1 bit of memory. Every object can easily have its own non-recursive mutex and they don't even need to be allocated or initialized at all.

cbloom rants

6/19/2012

06-19-12 - Two Learnings

2 comments:

old rants