One option is just to use cmpxch16b to do loads and stores. That's atomic, but seems a bit excessive. I dunno, maybe it's fine. For loads that's simple enough, you just do a cmpxch16b with 0 and it gives you the value that was there. For stores it's a bit uglier because you have to do a loop and do at least two cmps (one to load, then one to store, which will only succeed if nobody else stored since the load).
The other option is to use the SSE 128 bit load/store. I *believe* that it is atomic (assuming no cache lines are straddled), however it is important to note that SSE memory ops on x86/x64 are weakly ordered, unlike normal memory ops which are all strongly ordered (every x86 load is #LoadLoad and every store is #StoreStore). So, to make strongly ordered 128 bit load/store from the SSE load store you have to do something like
load : sse load 128 lfence store : sfence sse store 128or such. I'm not completely sure that's right though and I'm having trouble finding any information on this. What I need is load_acquire_128 and store_release_128. (yes I know MSVC has intrinsics for LoadAcq_128 and StoreRel_128, but those are only for Itanium). (BTW a lot of people mistakenly think they need to use lfence or sfence with normal code; no no, those are only for SSE and write combined memory).
(ADDENDUM : yes, I think this is correct; movdqa (a = aligned) appears to be the correct atomic way to load/store 128 bits on x86; I'm a little worried that getting the weaker SSE memory model involved will break some of the assumptions about the x86 behavior of access ordering).
In other news, the random differences between GCC and MSVC are fucking annoying. Basically it's the GCC guys being annoying cocks; you know MS is not going to copy your syntax, but you could copy theirs. If you would get your heads out of your asses and stop trying to be holier than Redmond, you would realize it's better for the world if you provide compatible declarations. Shit like making me do __attribute__((always_inline)) instead of just supporting __forceinline is just annoying and pointless. Also, you all need to fix up your damn stdlib to be more like MS. Extensions like vsnprintf should be named _vsnprintf (MS style, correct) (* okay maybe not).
You also can't just use #defines to map the MSVC stuff to GCC, because often the qualifiers have to go in different places, so it's a real mess. BTW not having pragma warning disable is pretty horrendous. And no putting it on the command line is nowhere near equivalent, you want to be able to turn them on and off for specific bits of code where you know the warning is bogus or innocuous.
The other random problem I have is the printf format for 64 bit int (I64d) appears to be MS only. God damn it.