2/23/2013

02-23-13 - Threading APIs that would be ideal

If you were writing an OS from scratch right now, what low level threading primitives should you provide? I contend they are rather different than the norm.

1. A low-level keyed event with double-checked wait.

Futex and NT's keyed event are both pretty great, but the ideal low level wait should be double-checked. I believe it should be something like :


HANDLE Waitset;

Waitset CreateWaitset();
DestroyWaitset(Waitset ws);

HANDLE wait_handle = Waitset_PrepareWait( Waitset ws , U64 key );

Waitset_CancelWait( Waitset ws , wait_handle h );
Waitset_Wait( Waitset ws , wait_handle h );

Waitset_Signal( Waitset ws, U64 key );

**Now, key of course could be a pointer, but there's no reason for it to be particularly. This is easily a superset of futex; if you want you could just have one global Waitset object, and key could be an int pointer, and you could check *ptr in between PrepareWait and Wait, that would give you futex. But you can do much more with this.

I prefer having a "waitset" object to put the waits on (like KeyedEvent), not just making it global/static (like futex). The advantage is 1. efficiency and 2. multiple meanings for a single "key". It's more efficient because you can have different waitsets for different uses, which makes each one cover fewer waits, which makes all the lookups faster. (that is, rather than 100 global waits pending, maybe you have 10 on 10 different waitsets). The other advantage is that you can reuse the same value for key without it confusing the system. You could have one Waitset where key is a pointer, and another where key is an internal handle number, etc.

2. A proper cond_var with waker-side condition checking.

First of all, a decent cond_var API combines a lot of the disjoint junk in the posix API. It should include the mutex, because that allows for vastly more efficient implementation :


    class condition_var
    {
    public:
        void lock();
        void unlock();
    
        // the below are always called with lock held :

        void unlock_wait_lock();
        
        void signal_unlock();
        void broadcast_unlock();

    private:
        ...
    };

The basic usage of this cv is like :

    cv.lock();

    while( ! condition )
    {
        cv.unlock_wait_lock();
    }

    .. do stuff with condition true ..

    cv.unlock();

A good implementation should do the compound ops (signal_unlock, etc) atomically. But I wouldn't require that because it's too hard.

But that's just background. What you really want is to put the condition check in the API. It should be :


        void wait_lock( [] { wake condition } );

The spec of the API is that "wake condition" is some code that will be run with the mutex locked, and when the function exits you will own the mutex and the condition is true. Then client usage is like :

    cv.wait_lock( condition );

    .. do stuff with condition true ..

    cv.unlock();

which allows for much more efficient implementation. The wake condition of the waiter list can be evaluated easily inside signal_unlock(), because that's always called with the mutex held.

5 comments:

Unknown said...

You might consider http://msdn.microsoft.com/en-us/library/windows/desktop/Hh706898(v=vs.85).aspx (wait on address) somewhere in your taxonomy.

Wake side condition testing is pretty cute, but it might be too hard to implement efficiently. You basically need to have a locked wait list under which you evaluate the condition, stretching out your lock hold time on the wake side. An alternate approach would be to just have more CVs, each of which corresponds to a condition that someone might wait for.

You generally should do the actual waking outside the lock, because thread scheduling is pretty expensive. Windows has the wait morphing optimization for those who signal a cv under an SRW lock, so the scheduling actually occurs when the lock is released.

Unknown said...

You might consider http://msdn.microsoft.com/en-us/library/windows/desktop/Hh706898(v=vs.85).aspx (wait on address) somewhere in your taxonomy.

Wake side condition testing is pretty cute, but it might be too hard to implement efficiently. You basically need to have a locked wait list under which you evaluate the condition, stretching out your lock hold time on the wake side. An alternate approach would be to just have more CVs, each of which corresponds to a condition that someone might wait for.

You generally should do the actual waking outside the lock, because thread scheduling is pretty expensive. Windows has the wait morphing optimization for those who signal a cv under an SRW lock, so the scheduling actually occurs when the lock is released.

cbloom said...

"Wake side condition testing is pretty cute, but it might be too hard to implement efficiently. You basically need to have a locked wait list under which you evaluate the condition, stretching out your lock hold time on the wake side."

Part of the point is that if the OS provides an API like this (rather than emulating it in user space), it could be super efficient.

The wait list is just the list of threads that the OS already has sleeping on that signal, no additional list. The API always has the CV mutex held when that list is checked, so no additional mutex is needed. The only addition is that OS has to test the condition before waking a thread.

But that can only be a performance *win*. It's of course much better to test the condition on the waker-side, rather than just blindly waking it and having it checked by the wakee. You can construct pathological cases, like some thread has a very expensive wait condition and it is at the head of the list and never true - but that same case would be even *worse* with the alternative (wakee-side checking, or out-of-cv checking).

And of course you still wouldn't want to just stick every thread on the same CV, that would be an abuse of the system. It's mainly for cases where you have multiple wait conditions that are sort of interrelated by slightly different; like several threads that might consume the same item but have slightly different conditions to do so.

jfb said...

Hi Charles,

Isn't this essentially the C# Monitor API?

It lets you do loops like:

lock (_instance) {
while (whatever)
{
while (nothing to do) { Monitor.Wait(_instance); }
do.processing;
}
}

cbloom said...

That looks like a standard condvar, doing wake-side condition checking.

old rants