07-13-10 - Tech Blurg

How do I atomically store or load 128 bit data on x64 ?

One option is just to use cmpxchg16b to do loads and stores. That's atomic, but seems a bit excessive. I dunno, maybe it's fine. For loads that's simple enough: you do a cmpxchg16b with 0 as both the compare value and the exchange value, and it gives you the value that was there. For stores it's a bit uglier because you have to loop and do at least two cmpxchg16b's (one to load, then one to store, which will only succeed if nobody else stored since the load).
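Here's a rough sketch of that approach, assuming GCC or Clang inline asm on x64. The names (u128_t, cas128, load_128_cas, store_128_cas) are made up for illustration and don't come from any library. Note that the "load" still does a locked write cycle, so it only works on writable memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-byte type; cmpxchg16b faults on unaligned operands. */
typedef struct { uint64_t lo, hi; } __attribute__((aligned(16))) u128_t;

/* If *p == *expected, write desired to *p and return 1.
   Otherwise copy the value actually found into *expected and return 0. */
static int cas128(volatile u128_t *p, u128_t *expected, u128_t desired)
{
    unsigned char ok;
    assert(((uintptr_t)p & 15) == 0);
    __asm__ __volatile__(
        "lock cmpxchg16b %1\n\t"
        "sete %0"
        : "=q"(ok), "+m"(*p), "+a"(expected->lo), "+d"(expected->hi)
        : "b"(desired.lo), "c"(desired.hi)
        : "cc", "memory");
    return ok;
}

/* Load: compare against 0 and exchange with 0. If *p was 0 it gets
   rewritten with 0; otherwise the compare fails and we get the old value. */
static u128_t load_128_cas(volatile u128_t *p)
{
    u128_t seen = { 0, 0 };
    cas128(p, &seen, seen);
    return seen;
}

/* Store: the first attempt usually fails and acts as the load; keep
   retrying until nobody else has stored in between. */
static void store_128_cas(volatile u128_t *p, u128_t value)
{
    u128_t seen = { 0, 0 };
    while (!cas128(p, &seen, value)) {
        /* seen now holds the current contents; try again */
    }
}
```

Usage would be something like store_128_cas(&shared, v) on one thread and v = load_128_cas(&shared) on another, with shared declared as a u128_t.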

The other option is to use the SSE 128 bit load/store. I *believe* that it is atomic (assuming no cache lines are straddled). However, it is important to note that SSE memory ops on x86/x64 are weakly ordered, unlike normal memory ops, which are all strongly ordered (every x86 load is #LoadLoad and every store is #StoreStore). So, to make a strongly ordered 128 bit load/store out of the SSE load/store you have to do something like

load :
    sse load 128
    lfence

store :
    sfence
    sse store 128

or such. I'm not completely sure that's right though and I'm having trouble finding any information on this. What I need is load_acquire_128 and store_release_128. (yes I know MSVC has intrinsics for LoadAcq_128 and StoreRel_128, but those are only for Itanium). (BTW a lot of people mistakenly think they need to use lfence or sfence with normal code; no no, those are only for SSE and write combined memory).

(ADDENDUM : yes, I think this is correct; movdqa (a = aligned) appears to be the correct atomic way to load/store 128 bits on x86; I'm a little worried that getting the weaker SSE memory model involved will break some of the assumptions about the x86 behavior of access ordering).
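A sketch of what the SSE version might look like with the SSE2 intrinsics from <emmintrin.h>. Big assumptions here: that the right fix is an lfence after the load and an sfence before the store, and that an aligned movdqa really is atomic, which this post only believes rather than knows. load_acquire_128 and store_release_128 are hypothetical names, not existing intrinsics.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_load_si128, _mm_store_si128, _mm_lfence */
#include <stdint.h>

/* Assumed acquire pattern: aligned 128 bit load (movdqa), then lfence
   so later loads are not ordered ahead of it. */
static __m128i load_acquire_128(const __m128i *p)
{
    __m128i v;
    assert(((uintptr_t)p & 15) == 0);  /* movdqa needs 16-byte alignment */
    v = _mm_load_si128(p);
    _mm_lfence();
    return v;
}

/* Assumed release pattern: sfence so earlier stores are visible first,
   then the aligned 128 bit store (movdqa). */
static void store_release_128(__m128i *p, __m128i v)
{
    assert(((uintptr_t)p & 15) == 0);
    _mm_sfence();
    _mm_store_si128(p, v);
}

/* Caveat: this only addresses CPU ordering. Real code would also want a
   compiler barrier so the compiler itself does not move accesses around. */
```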

In other news, the random differences between GCC and MSVC are fucking annoying. Basically it's the GCC guys being annoying cocks; you know MS is not going to copy your syntax, but you could copy theirs. If you would get your heads out of your asses and stop trying to be holier than Redmond, you would realize it's better for the world if you provide compatible declarations. Shit like making me do __attribute__((always_inline)) instead of just supporting __forceinline is just annoying and pointless. Also, you all need to fix up your damn stdlib to be more like MS. Extensions like vsnprintf should be named _vsnprintf (MS style, correct) (* okay maybe not).

You also can't just use #defines to map the MSVC stuff to GCC, because often the qualifiers have to go in different places, so it's a real mess. BTW not having pragma warning disable is pretty horrendous. And no, putting it on the command line is nowhere near equivalent; you want to be able to turn warnings on and off for specific bits of code where you know the warning is bogus or innocuous.
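For what it's worth, function-like macros that take the whole declaration as an argument get around some of the placement problem, and newer GCC (4.6 and later) and Clang accept "#pragma GCC diagnostic" push/ignored/pop for local warning control. A sketch only; MY_FORCEINLINE, MY_ALIGNED16, rotl32, scratch and ignore_second are all made-up names.

```c
#include <assert.h>
#include <stdint.h>

#if defined(_MSC_VER)
  #define MY_FORCEINLINE static __forceinline
  /* MSVC wants the qualifier in front of the declaration */
  #define MY_ALIGNED16(decl) __declspec(align(16)) decl
#else
  #define MY_FORCEINLINE static inline __attribute__((always_inline))
  /* GCC accepts the attribute after the declarator */
  #define MY_ALIGNED16(decl) decl __attribute__((aligned(16)))
#endif

/* Rotate left; k must be in 0..31. */
MY_FORCEINLINE uint32_t rotl32(uint32_t x, unsigned k)
{
    assert(k < 32);
    return (x << k) | (x >> ((32 - k) & 31));
}

MY_ALIGNED16(static uint8_t scratch[64]);

/* Silence "unused parameter" for just this one function. */
#if defined(_MSC_VER)
  #pragma warning(push)
  #pragma warning(disable : 4100)
#else
  #pragma GCC diagnostic push
  #pragma GCC diagnostic ignored "-Wunused-parameter"
#endif
static int ignore_second(int a, int b)
{
    return a;
}
#if defined(_MSC_VER)
  #pragma warning(pop)
#else
  #pragma GCC diagnostic pop
#endif
```

It's still two spellings of everything, but at least the mess lives in one header.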

The other random problem I have is the printf format for 64 bit int (I64d) appears to be MS only. God damn it.
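One way to paper over this is the C99 <inttypes.h> format macros, falling back to I64d where that header doesn't exist. A sketch, assuming MSVC before Visual Studio 2013 (_MSC_VER < 1800) is the only compiler without <inttypes.h>; i64, FMT_I64 and format_num are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

#if defined(_MSC_VER) && _MSC_VER < 1800
  typedef __int64 i64;
  #define FMT_I64 "I64d"     /* MS-only length modifier */
#else
  #include <inttypes.h>
  typedef int64_t i64;
  #define FMT_I64 PRId64     /* C99 macro, expands to the right modifier */
#endif

/* Formats "what : <num>\n" into buf and returns the character count.
   buf must hold at least 32 bytes: 7 for the prefix, up to 20 for the
   sign and digits, 1 for the newline, 1 for the terminator. */
static int format_num(char *buf, i64 num)
{
    int n = sprintf(buf, "what : %" FMT_I64 "\n", num);
    assert(n > 0 && n < 32);
    return n;
}
```

The macro goes after the % sign, so the format string still has to be glued together from pieces.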


Mojo said...

"%lli", "%llu".

The gcc/msvc syntax thing is retarded but c++ sucks more for not having defined where extension decorators or attributes should go.

cbloom said...

"%lli", "%llu".

Bleck. So to use this you have to do something like

#define I64D "%lli"

then to printf you do

printf("what : " I64D "\n",num);

so gross.

Of course in cblib I would just use something like

printf("what : %s\n",StrNum(num).CStr());

but I can't do that in RAD non-C++ land.

Julien Koenen said...

We invested a couple hours some time ago to implement our own versions of sprintf/vsnprintf and use those exclusively on all platforms. That helped a lot with other issues as well (like replacing ',' with '.' in German localization and other fun...)

won3d said...

Yeah, the whole format string thing is a disaster.


Ian Romanick said...

vsnprintf is C99. _vsnprintf is MS rubbish. MS needs to pull their heads out of their asses and support the TEN YEAR OLD C standard. Most of the extensions in GCC *predate* the similar extensions in MSVC. So, to be fair, it's actually just MS being dicks. Big f'ing surprise.

cbloom said...

"MS needs to pull their heads out of their asses and support the TEN YEAR OLD C standard"

Hmm, yeah okay, you have a point.

But the stdlib names are not really the worst part because you can just #define them to match, it's all the extra stuff outside the standard (pragma pack, align, etc) that are the worst problems.

Chris Green said...

%lld and %llu work on both compilers afaik.
