11/11/2014

11-11-14 - x64 movdqa atomic test

How to do an atomic load/store of a 128 bit value on x64 is an ugly question.

The only guaranteed way is via cmpxchg16b. But that's an awfully slow way to do a load or store.

movdqa appears to be an atomic way to move 128 bits - on most chips. Not all. And Intel/AMD don't want to clearly identify the cases where it is atomic or not. (they specifically don't guarantee it)

At the moment, shipping code needs to use cmpxchg16b ("cmpx16" for short) to be safe. (my tests indicate that the x64 chips in the modern game consoles *do* have atomic movdqa, so it seems safe to use there)
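As a rough illustration of the movdqa approach (not code from the test itself), an aligned 128 bit load/store can be written with the SSE2 intrinsics _mm_load_si128 / _mm_store_si128, which compilers normally turn into movdqa or an equivalent aligned 16-byte move. The helper names MovdqaLoad128 / MovdqaStore128 are made up for this sketch, and the std::atomic_signal_fence calls are only an assumption about how to keep the compiler from moving or merging the access. Nothing here makes the move atomic on a chip where the hardware splits it.

```cpp
#include <emmintrin.h>  // SSE2: _mm_load_si128 / _mm_store_si128
#include <atomic>       // std::atomic_signal_fence (compiler-only barrier)
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; not from the original test.
// p must be 16-byte aligned : movdqa faults on an unaligned address.

inline __m128i MovdqaLoad128(const void * p)
{
    assert((reinterpret_cast<std::uintptr_t>(p) & 15) == 0);
    // one aligned 16-byte load (normally movdqa)
    __m128i v = _mm_load_si128(static_cast<const __m128i *>(p));
    // compiler barrier only; emits no instruction
    std::atomic_signal_fence(std::memory_order_seq_cst);
    return v;
}

inline void MovdqaStore128(void * p, __m128i v)
{
    assert((reinterpret_cast<std::uintptr_t>(p) & 15) == 0);
    // compiler barrier only; emits no instruction
    std::atomic_signal_fence(std::memory_order_seq_cst);
    // one aligned 16-byte store (normally movdqa)
    _mm_store_si128(static_cast<__m128i *>(p), v);
}
```

Whether the 16 bytes actually move as a single unit is up to the chip, which is exactly what the test is trying to find out.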

My main curiosity is whether there exist any modern ("Core 2" or newer) x64 chips that *do not* provide an atomic movdqa.

Anyhoo, here's a test to check if movdqa is atomic on your machine. If you like, run it and send me the results : (Windows, x64 only)

cb_x64_atomics_test.7z

The atomic test will just run for a while. If it has a failure it will break.

(you will get some errors about not having a v: or r: drive; you can just ignore them.)

Copy the output, or you should be able to get the log file here :

"c:\oodlelogs\oodle_cb_test_x64_atomics.log"
"c:\oodlelogs\oodle_cb_test_x64_atomics.prev"

email me at cb at my domain.
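
For anyone curious what a test of this kind looks like, here is a minimal sketch. It is not the code in cb_x64_atomics_test, just one plausible way to do it: a writer thread keeps storing 128 bit values whose two 64 bit halves are equal, and a reader counts any load where the halves differ (a torn read). The names IsTorn and CountTornMovdqaReads are hypothetical, and using std::atomic_signal_fence as a compiler barrier is an assumption of the sketch.

```cpp
#include <emmintrin.h>  // SSE2 aligned 128 bit load/store
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// True if the two 64 bit halves of v differ, i.e. v cannot have come
// from a single {n, n} store.
inline bool IsTorn(__m128i v)
{
    alignas(16) std::uint64_t halves[2];
    _mm_store_si128(reinterpret_cast<__m128i *>(halves), v);
    return halves[0] != halves[1];
}

// Hypothetical test loop : returns the number of torn reads observed.
// The unsynchronized access to 'shared' is deliberate; it is the thing
// being tested (and is formally a data race as far as C++ is concerned).
inline std::uint64_t CountTornMovdqaReads(std::uint64_t iterations)
{
    alignas(16) static std::uint64_t shared[2] = { 0, 0 };
    assert((reinterpret_cast<std::uintptr_t>(shared) & 15) == 0);
    std::atomic<bool> stop(false);

    // writer : every value it stores has both halves equal to n
    std::thread writer([&stop] {
        for (long long n = 1; !stop.load(std::memory_order_relaxed); ++n)
        {
            _mm_store_si128(reinterpret_cast<__m128i *>(shared), _mm_set1_epi64x(n));
            std::atomic_signal_fence(std::memory_order_seq_cst); // compiler barrier
        }
    });

    // reader : one aligned 16-byte load per iteration, then check the halves
    std::uint64_t torn = 0;
    for (std::uint64_t i = 0; i < iterations; ++i)
    {
        __m128i v = _mm_load_si128(reinterpret_cast<const __m128i *>(shared));
        std::atomic_signal_fence(std::memory_order_seq_cst); // compiler barrier
        if (IsTorn(v)) ++torn;
    }

    stop.store(true, std::memory_order_relaxed);
    writer.join();
    return torn;
}
```

A real run would go for minutes rather than a fixed iteration count. A result of zero only means no tearing was observed on that machine, not that the chip guarantees atomicity.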


For the record, an atomic 128 bit load/store for x64 Windows using cmpx16 :


#include <intrin.h>

void AtomicLoad128(__int64 * out_Result, __int64 volatile * in_LoadFrom)
{
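    // compare-exchange *in_LoadFrom against whatever is currently in out_Result :
    //  if it matches, the same value is written back, so memory is unchanged
    //  if it doesn't match, out_Result is overwritten with the current value
    // either way out_Result ends up holding the value of *in_LoadFrom
    // (in_LoadFrom must be 16-byte aligned and in writable memory)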
    _InterlockedCompareExchange128(in_LoadFrom,out_Result[1],out_Result[0],out_Result);
}

void AtomicStore128(__int64 volatile * out_StoreTo,const __int64 * in_Value)
{
    // do an initial non-atomic load of StoreTo :
    __int64 check_StoreTo[2];
    check_StoreTo[0] = out_StoreTo[0];
    check_StoreTo[1] = out_StoreTo[1];
    // store with cmpx16 :
    while( ! _InterlockedCompareExchange128(out_StoreTo,in_Value[1],in_Value[0],check_StoreTo) )
    {
        // check_StoreTo was reloaded with the value in out_StoreTo
        _mm_pause();
    }
}

5 comments:

cbloom said...

Failure on AMD Athlon II

Anonymous said...

Should also fail on AMD Bobcat (2011/2012).

cbloom said...

Of note, the AMD Jaguar that's in the consoles is a modified AMD Bobcat. One of the modifications is a 128-bit data path for FPU/SSE.

Sander van Rossen said...

Okay, tried it on my
"Intel Core i7-3930K CPU @ 3.20GHz"
and it seems to work fine, I kept letting it run for about 30 min and no errors showed up.

cbloom said...

Thanks. Yeah I have many reports of success on Core 2 / Core i5 / Core i7, and no failures on those chips.

I believe that all modern Intel chips do have atomic movdqa (as they have 128-bit FPU/SSE data paths). To get a failure you have to go back to an old "Core" chip (just Core, not "Core 2" not Core i5, etc. (btw awesome freaking naming Intel)).
