7/06/2011

07-06-11 - Who ordered Condition Variables -

I'm getting back into some low level threading stuff for a week or two and I'll try to write about it because it's very confusing and I always forget the basics.

Condition Variables are explained very strangely around the net. You'll find sites that say they "let you wait on a variable being set to certain value" (not true), "let you avoid polling" (not true), or "the mutex is a left-over from pre-pthreads implementations" (not true).

What a "Condition Variable" really is is a way to receive a Signal and enter a Mutex at the same time ("atomically" if you like).

Why do we want this?

The typical case is we are waiting on some state. When the state is not true, we want our thread to go into a real sleep. So we use an OS Wait() on the thread that's waiting for that state, and an OS Signal() when the state is set to wake the thread. (Wait and Signal might be "Event" on win32, or a semaphore in pthreads, or a futex, etc). Basically :


Waiting thread :

if ( state I want is not set )
{
  Wait(handle);
  // state I want should be set now
  // **


Signalling thread :

if ( I changed state to the one wanted )
  Signal(handle)

simple enough. The problem is, the invariant at (**) is not true. The state you want is NOT necessarily set there, because there's a race. Immediately after you receive the signal, you could be swapped out, and someone else could change the state, and then it would not be what you wanted.

So the obvious thing is to put a mutex around "state". To be less abstract I'll talk about a one item queue (aka a mailbox).


Waiting thread :

Lock mutex;
if ( mailbox empty )
{
  atomically { Unlock(mutex);   Wait(handle); (**) Lock(mutex) }
  // mailbox must be full now
}

Signaling thread :

Lock mutex
if ( mailbox empty )
{
  mailbox = some stuff;
  Signal(handle); Unlock(mutex);
}

now when the mailbox filler signals the handle, the waiter immediately wakes up and tries to lock the mutex and can't; the filler then unlocks and the waiter can run, and it is gauranteed to have an item.

Note that the line marked {atomically} *must* be atomic, otherwise you have a race at (**) just like before.

And this :

  atomically { Unlock(mutex);   Wait(handle); (**) Lock(mutex) }
is exactly what pthread_cond_wait() is.

Personally rather than introducing a new synchronization data type I would have preferred to just get a function that does unlock_wait_lock(). The "cond_var" in pthreads has nothing to do with a "condition"', it's just a waitable handle (and associated mutex and other wiring) in Windows lingo.

There's one more point worth talking about. In the thread that filled the mailbox, we did :


lock mutex
set state
signal
unlock

That's good because it gaurantees that the receiver gets the state it wants and is race free. It's bad because it causes some unnecessary "thread thrashing".

(thread thrashing is a lot like input latency that I mentioned a while ago; it's something you just want to constantly watch and be vigilant about tracking and removing. Any time a thread wakes up and does nothing and goes back to sleep, you are "thrashing" and just wasting massive amounts of CPU. You want to minimize the number of useless thread wakeups)

The alternative is :


lock mutex
set state
unlock
(**)
signal

now, there's a race a ** where the state can get changed before you signal, so the invariant in the receiver is no longer true.

In most cases, however, this is not actually bad, and this form is actually preferred because of its increased efficiency. You have to change the receiver to a "double-check" type of pattern. Something like :


Waiting thread :

Lock mutex;
if ( mailbox empty )
{
  retry:
  cond_wait(mutex,handle); //  atomically { Unlock(mutex); Wait(handle); Lock(mutex) }
  // mailbox may or may not be full now
  if ( mailbox empty ) goto retry;
  // now do work
}

Signaling thread :

Lock mutex
if ( mailbox empty )
{
  mailbox = some stuff;
  Unlock(mutex);
  // intentional race here
  Signal(handle);
}

In general I believe it's a safer design to treat the signal as meaning "wake up and check this condition" instead of "wake up and this condition is definitely set". Then you engineer to minimize the number of wakeups when the condition is not set.

BTW a better design of the primitive would have allowed the signalling thread to do


atomically { Unlock(mutex); Signal(handle); }

which would be the ideal thing. Unfortunately a normal cond_var is not expressed this way. However, apparently on some modern UNIXes cond_var actually *acts* this way. What you do is signal inside the mutex, but the other thread isn't actually woken up until you unlock the mutex. Unfortunately this is a hidden optimization (they should have just provided unlock_and_signal() as one call) and you can't rely on it if you're cross-platform.

Some links on this topic :
Usenet - Condition variables signal with or without mutex locked
condvars signal with mutex locked or not Lo�c OnStage
A word of caution when juggling pthread_cond_signalpthread_mutex_unlock - comp.programming.threads Google Groups

No comments:

old rants