8/01/2011

08-01-11 - Double checked wait

Something that we have touched on a few times is the "double checked wait" pattern. It goes like this :
consumer :

if ( not available )
{
    prepare_wait();

    if ( not available )
    {
        wait();
    }
    else
    {
        cancel_wait();
    }
}

producer :

make available
signal_waiters();
now, why do we do this? Well, if you did just a naive check like this :

consumer :

if ( not available )
{
    // (*1)
    wait();
}

producer :

make available
signal_waiters();

you have a race. What happens is you check available and see none, so you step in to *1 ; then the producer runs, publishes whatever and signals - but there are no waiters yet so the signal is lost. Then you go into the wait() and deadlock. This is the "lost wakeup" problem.

So, the double check avoids this race. What must the semantics of prepare_wait & wait be for it to work? It's something like this :

Any signal that happens between "prepare_wait" and "wait" must cause "wait" to not block (either because the waitable handle is signalled, or through some other mechanism).

Some implementations of a prepare_wait/wait mechanism may have spurious signals; eg. wait might not block even though you shouldn't really have gotten a signal; because of that you will usually loop in the consumer.

Now let's look at a few specific solutions to this problem :

condition variables

This is the locking solution to the race. It doesn't use double-checked wait, instead it uses a mutex to protect the race; the naive producer/consumer is replaced with :


consumer :

mutex.lock();
if ( not available )
{
    unlock_wait_lock();
}

producer :

mutex.lock();
make available
signal_waiters();
mutex.unlock();

which prevents the race because you hold the mutex in the consumer across the condition check and the decision to go into the wait.

waitset

Any simple waitset can be used in this scenario with a double-checked wait. For example a trivial waitset based on Event is like this :


waitset.prepare_wait :
    add current thread's Event to list of waiters

waitset.wait :
    WaitForSingleObject(my Event)

waitset.signal_waiters :
    signal all events in list of waiters

for instance, "waitset" could be a vector of handles with a mutex protecting access to that vector. This would be a race without the prepare_wait and double checking.

In this case we ensure the double-checked semantics works because the current thread is actually added to the waitset in prepare_wait. So any signal that happens before we get into wait() will set our Event, and our wait() will not actually block us, because the event is already set.

eventcount

Thomasson's eventcount accomplishes the same thing but in a different way. A simplified version of it works like this :


eventcount.prepare_wait :
    return key = m_count

eventcount.wait :
    if ( key == m_count )
        Wait(event)

eventcount.signal_waiters :
    m_count++;
    signal event;

(note : event is a single shared broadcast event here)

in this case, prepare_wait doesn't actually add you to the waitset, so signals don't go to you, but it still works, because if signal was called in the gap, the count will increase and no longer match your key, so you will not do the wait.

That is, it specifically detects the race - it sees "was there a signal between when I did prepare_wait and wait?" , and if so, it doesn't go into the wait. The consumer should loop, so you keep trying to enter the wait until you get to check your condition without a signal firing.

futex

It just occurred to me yesterday that futex is actually another solution to this exact same problem. You may recall - futex does an internal check of your pointer against a value, and only goes into the wait if the value matches.

producer/consumer with futex is like this :


consumer :

if ( value = not_available )
{
    futex_wait(&value,not_available);
}

producer :

value = available
futex_signal(&value);

this may look like just a single wait at a glance, but if we blow out what futex_wait is doing :

consumer :

if ( value == not_available )
{
    //futex_wait(&value,not_available);

    futex_prepare_wait(&value);
    if ( value == not_available )
        futex_commit_wait(&value);
    else
        futex_cancel_wait(&value);
}

producer :

value = available
futex_signal(&value);

we see can clearly see that futex is just double-checked-wait in disguise.

That is, futex is really our beloved prepare_wait -> wait pattern , but only for the case that the wait condition is of the form *ptr == something.


Do we like the futex API? Not really. I mean it's nice that the OS provides it, but if you are designing your own waitset you would never make the API like that. It confines you to only working on single ints, and your condition has to be int == value. A two-call API like "prepare_wait / wait" is much more flexible, it lets you check conditions like "is this lockfree queue empty" which are impossible to do with futex (what you wind up doing is just doing the double-check yourself and use futex just as an "Event", either that or duplicating the condition into an int for futex's benefit (but that is risky, it can race if not done right, so not recommended)).

BTW some of the later extensions of futex are very cool, like bitset waiting and requeue.

No comments:

old rants