9/18/2013

09-18-13 - Fast TLS on Windows

For the record; don't use this blah blah unsafe unnecessary blah blah.


extern "C" DWORD __cdecl FastTlsGetValue_x86(int index)
{
  __asm
  {
    mov     eax,dword ptr fs:[00000018h]
    mov     ecx,index

    cmp     ecx,40h // 40h = 64
    jae     over64  // Jump if above or equal 

    // return Teb->TlsSlots[ dwTlsIndex ]
    // +0xe10 TlsSlots         : [64] Ptr32 Void
    mov     eax,dword ptr [eax+ecx*4+0E10h]
    jmp     done

  over64:   
    mov     eax,dword ptr [eax+0F94h]
    mov     eax,dword ptr [eax+ecx*4-100h]

  done:
  }
}

DWORD64 FastTlsGetValue_x64(int index)
{
    if ( index < 64 )
    {
        return __readgsqword( 0x1480 + index*8 );
    }
    else
    {
        DWORD64 * table = (DWORD64 *)  __readgsqword( 0x1780 );
        return table[ index - 64 ];
    }
}

the ASM one is from nynaeve originally. ( 1 2 ). I'd rather rewrite it in C using __readfsdword but haven't bothered.

Note that these may cause a bogus failure in MS App Verifier.

Also, as noted many times in the past, you should just use the compiler __declspec thread under Windows when that's possible for you. (eg. you're not in a DLL pre-Vista).

2 comments:

  1. I'm confused. Is this entire post telling us what not to do, without telling us what to do? I understand that __declspec(thread) is preferred when not prohibited, but what about when it is prohibited?

    ReplyDelete
  2. The post implicitly assumes that the reader is aware of or can find TlsAlloc/TlsGetValue (or FlsAlloc/FlsGetValue).

    The other option is to have your own "State" struct that you pass through every function in your code. The State can then be thread-local, or fiber-local, or job-local, etc.

    ReplyDelete