7/31/2011

07-31-11 - An example that needs seq_cst -

No, not really. I thought I found the great white whale an algorithm that actually needs sequential consistency , but it turned out to be our old friend the StoreLoad problem.

It's worth having a quick look at because some of the issues are ones that pop up often.

I was rolling a user-space futex emulator. To test it I wrote a little mutex. A very simple mutex based on a very simplified futex might look like this :


struct futex_mutex2
{
    std::atomic<int> m_state;

    futex_mutex2() : m_state(0)
    {
    }
    ~futex_mutex2()
    {
    }

    void lock(futex_system * system)
    {
        if ( m_state($).exchange(1,rl::mo_acq_rel) )
        {
            void * h = system->prepare_wait();
            while ( m_state($).exchange(1,rl::mo_acq_rel) )
            {
                system->wait(h);
            }
            system->retire_wait();
        }
    }

    void unlock(futex_system * system)
    {
        m_state($).store(0,rl::mo_release);

        system->notify_one();
    }
};

(note that the actually "futexiness" of it is removed now for simplicity of this test ; also of course you should exchange state to a contended flag and all that, but that hides the problem, so that's removed here).

Then the super-simplified futex system (with all the actual futexiness removed, so that it's just a very simple waitset) is :


//#define MO    mo_seq_cst
#define MO  mo_acq_rel

struct futex_system
{
    HANDLE          m_handle;
    atomic<int>     m_count;

    /*************/
    
    futex_system()
    {
        m_handle = CreateEvent(NULL,0,0,NULL);
    
        m_count($).store(0);
    }
    ~futex_system()
    {
        CloseHandle(m_handle);  
    }
        
    void * prepare_wait( )
    {
        m_count($).fetch_add(1,MO);
        
        return (void *) m_handle;
    }

    void wait(void * h)
    {
        WaitForSingleObject((HANDLE)h, INFINITE);
    }

    void retire_wait( )
    {
        m_count($).fetch_add(-1,MO);
    }
    
    void notify_one( )
    {
        if ( m_count($).load(mo_acquire) == 0 ) // mo_seq_cst
            return;
        
        SetEvent(m_handle);
    }
};

So I was finding that it didn't work unless MO was seq_cst (and the load too).

The first point of note is that when I had the full futex system in there which had some internal std::mutexes - there was no bug, the ops on count($) didn't need to be seq cst. That's a common and nasty problem - if you have some ops internally that are seq_cst (such as mutex lock unlock), it can hide the fact that your other atomics are not memory ordered correctly. It was only when I removed the mutexes that the problem revealed itself, but it was actually there all along.

We've discussed this before when we asked "do mutexes need to be seq cst" ; the answer is NO if you just want them to provide mutual exclusion. But if you want them to act like an OS mutex, then the answer is YES. And the issue is that people can write code that is basically relying on the OS mutex being a barrier that provides more than just mutual exclusion.

The next point is that when I reduced the test down to just 2 threads, I still found that I needed seq_cst. That should be a tipoff that the problem does not actually arise from a need for total order. A true seq_cst problem should only show up when you go over 2 threads.

The real problem of course was here :


    void unlock(futex_system * system)
    {
        m_state($).store(0,rl::mo_release);

        //system->notify_one();

        #StoreLoad

        if ( system->m_count($).load(mo_acquire) == 0 ) // mo_seq_cst
            return;
        
        SetEvent(system->m_handle);
    }
};

we just need a StoreLoad barrier there. It should be obvious why we need a StoreLoad there but I'll be very explicit :

same as :

    void unlock(futex_system * system)
    {
        m_state($).store(0,rl::mo_release);

        int count = system->m_count($).load(mo_acquire);

        if ( count == 0 )
            return;
        
        SetEvent(system->m_handle);
    }

same as :

    void unlock(futex_system * system)
    {
        int count = system->m_count($).load(mo_acquire);

        // (*1)

        m_state($).store(0,rl::mo_release);

        if ( count == 0 )
            return;
        
        SetEvent(system->m_handle);
    }

so now at (*1) we have already loaded count and got a 0 (no wiaters); then the other thread trying to lock the mutex sees state == 1, locked, so it incs count and goes to sleep, and we return, and we have a deadlock.

As noted in the first post on this topic, there's no way to express only #StoreLoad in C++0x , so you wind up needing seq cst. Note that the case we cooked up here is almost identical to Thomasson's problem with "event count" so you can read about that :

Synchronization Algorithm Verificator for C++0x - Page 2
really simple portable eventcount... - comp.programming.threads Google Groups
C++0x sequentially consistent atomic operations - comp.programming.threads Google Groups
C++0x memory_order_acq_rel vs memory_order_seq_cst
appcoreac_queue_spsc - comp.programming.threads Google Groups

No comments:

old rants