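#include <windows.h>
#include <intrin.h>

#if defined(_M_IX86)
// Same result as TlsGetValue(index) for a valid index, but reads the TEB
// directly: no range check, and the thread's last error is not cleared.
extern "C" DWORD __cdecl FastTlsGetValue_x86(DWORD index)
{
  __asm
  {
    mov     eax,dword ptr fs:[00000018h] // eax = Teb (NtTib.Self)
    mov     ecx,index
    cmp     ecx,40h // 40h = 64
    jae     over64  // Jump if above or equal (unsigned)
    // return Teb->TlsSlots[ index ]
    // +0xe10 TlsSlots         : [64] Ptr32 Void
    mov     eax,dword ptr [eax+ecx*4+0E10h]
    jmp     done
  over64:
    // +0xf94 TlsExpansionSlots : Ptr32 Ptr32 Void
    mov     eax,dword ptr [eax+0F94h]
    test    eax,eax // allocated lazily; NULL means nothing stored yet
    jz      done    // so return 0, as TlsGetValue does
    // return Teb->TlsExpansionSlots[ index - 64 ]  (-100h = -64*4)
    mov     eax,dword ptr [eax+ecx*4-100h]
  done:
  }
  // the result is left in eax (MSVC may warn C4035: no return value)
}
#elif defined(_M_X64)
// x64: TlsSlots is at gs:[1480h], TlsExpansionSlots at gs:[1780h].
DWORD64 FastTlsGetValue_x64(DWORD index)
{
    if ( index < 64 )
    {
        // return Teb->TlsSlots[ index ]
        return __readgsqword( 0x1480 + index*8 );
    }
    else
    {
        // Teb->TlsExpansionSlots is allocated lazily and may still be NULL
        DWORD64 * table = (DWORD64 *) __readgsqword( 0x1780 );
        return table ? table[ index - 64 ] : 0;
    }
}
#endif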
The asm version is originally from Nynaeve (1, 2).
I'd rather rewrite it in C using __readfsdword, but I haven't bothered.
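For what it's worth, that C rewrite might look roughly like this. It's a sketch assuming MSVC targeting 32-bit x86 (the only target where __readfsdword exists; elsewhere the block compiles to nothing), it reads the same TEB offsets as the asm (+0xE10 for TlsSlots, +0xF94 for TlsExpansionSlots), and the name FastTlsGetValue_x86_c is hypothetical.

```cpp
// Hypothetical name: FastTlsGetValue_x86_c. MSVC targeting 32-bit x86 only;
// on any other target this block compiles to nothing.
#if defined(_MSC_VER) && defined(_M_IX86)
#include <windows.h>
#include <intrin.h>

// Reads the same TEB fields as the asm version, via fs-relative intrinsics.
DWORD FastTlsGetValue_x86_c(DWORD index)
{
    if (index < 64)
        return __readfsdword(0xE10 + index * 4);   // Teb->TlsSlots[index]

    // Teb->TlsExpansionSlots (+0xF94) is allocated lazily and may be NULL;
    // TlsGetValue returns 0 in that case, so this does too.
    DWORD *expansion = (DWORD *)__readfsdword(0xF94);
    return expansion ? expansion[index - 64] : 0;
}
#endif
```

Usage is the same as TlsGetValue: allocate an index with TlsAlloc, store with TlsSetValue, then read it back through the fast path.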
Note that these may cause a bogus failure in MS App Verifier.
Also, as noted many times in the past, you should just use the compiler's __declspec(thread) on Windows when that's possible for you (e.g. when you're not in a DLL that may be loaded with LoadLibrary on pre-Vista Windows).
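A minimal sketch of the compiler-managed route. THREAD_LOCAL, g_perThreadCounter, BumpCounter and CountOnNewThread are hypothetical names; on compilers other than MSVC the macro falls back to standard C++11 thread_local, which gives the same one-copy-per-thread behavior.

```cpp
#include <thread>

// Compiler-managed TLS: __declspec(thread) on MSVC, standard thread_local elsewhere.
#if defined(_MSC_VER)
#define THREAD_LOCAL __declspec(thread)
#else
#define THREAD_LOCAL thread_local
#endif

// One copy per thread; every new thread starts with its own copy at 0.
THREAD_LOCAL int g_perThreadCounter = 0;

// Increments and returns the calling thread's copy only.
int BumpCounter()
{
    return ++g_perThreadCounter;
}

// Calls BumpCounter n times on a fresh thread and returns that thread's count.
int CountOnNewThread(int n)
{
    int result = 0;
    std::thread worker([&result, n] {
        for (int i = 0; i < n; ++i)
            result = BumpCounter();
    });
    worker.join();
    return result;
}
```

Each call to CountOnNewThread sees a fresh counter, and the calling thread's own copy is never touched; no index allocation or lookup function is involved.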
I'm confused. Is this entire post telling us what not to do, without telling us what to do? I understand that __declspec(thread) is preferred when not prohibited, but what about when it is prohibited?
The post implicitly assumes that the reader is aware of or can find TlsAlloc/TlsGetValue (or FlsAlloc/FlsGetValue).
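A minimal sketch of that TlsAlloc/TlsGetValue route, assuming a Windows build (the block compiles to nothing elsewhere). ThreadContext, g_tlsIndex, InitTls, SetPerThreadContext, GetPerThreadContext and ShutdownTls are hypothetical names. The fiber-local version has the same shape with FlsAlloc/FlsSetValue/FlsGetValue/FlsFree, except that FlsAlloc also takes an optional cleanup callback.

```cpp
#ifdef _WIN32
#include <windows.h>

// Hypothetical per-thread data stored in the TLS slot.
struct ThreadContext
{
    int id;
};

static DWORD g_tlsIndex = TLS_OUT_OF_INDEXES;

// Allocate one process-wide slot index (e.g. at startup or DLL_PROCESS_ATTACH).
bool InitTls()
{
    g_tlsIndex = TlsAlloc();
    return g_tlsIndex != TLS_OUT_OF_INDEXES;
}

// Store a pointer in the calling thread's copy of the slot.
bool SetPerThreadContext(ThreadContext *p)
{
    return TlsSetValue(g_tlsIndex, p) != FALSE;
}

// Read the calling thread's copy; slots start zeroed, so unset means nullptr.
ThreadContext *GetPerThreadContext()
{
    return static_cast<ThreadContext *>(TlsGetValue(g_tlsIndex));
}

// Release the index once no thread needs it any more.
void ShutdownTls()
{
    if (g_tlsIndex != TLS_OUT_OF_INDEXES)
        TlsFree(g_tlsIndex);
    g_tlsIndex = TLS_OUT_OF_INDEXES;
}
#endif
```

Every GetPerThreadContext call goes through TlsGetValue, which is the lookup the fast functions in the post are meant to shortcut.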
The other option is to have your own "State" struct that you pass through every function in your code. The State can then be thread-local, or fiber-local, or job-local, etc.
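A minimal sketch of that explicit-State alternative, with hypothetical names (State, SumSquares, RunTwoWorkers). Functions take the State by reference, so no TLS lookup happens at all, and where each State lives (one per thread here) is entirely up to the caller.

```cpp
#include <thread>
#include <vector>

// Everything a worker would otherwise keep in thread-local globals.
struct State
{
    std::vector<int> scratch;  // scratch buffer owned by this State, reused across calls
    long long processed = 0;   // how many items this State has handled
};

// Leaf function: works on the caller's State instead of a thread-local global.
long long SumSquares(State &state, const std::vector<int> &items)
{
    state.scratch.clear();
    for (int v : items)
        state.scratch.push_back(v * v);
    long long sum = 0;
    for (int sq : state.scratch)
        sum += sq;
    state.processed += static_cast<long long>(items.size());
    return sum;
}

// Two threads, each handed its own State; nothing is shared between them.
void RunTwoWorkers(const std::vector<int> &a, const std::vector<int> &b,
                   State &stateA, State &stateB,
                   long long &sumA, long long &sumB)
{
    std::thread ta([&] { sumA = SumSquares(stateA, a); });
    std::thread tb([&] { sumB = SumSquares(stateB, b); });
    ta.join();
    tb.join();
}
```

The same State could just as well be owned by a fiber or a job object and passed down the same way; the leaf code doesn't change.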