02-16-09 - Low Level Threading Junk Part 2.5

Back to threading. See the old posts to catch up. We talked a bit about memory models and singletons and such in the last section. One thing we haven't really talked about is the value of TLS (Thread Local Storage). In our singleton discussion we mentioned that the C++ "static" is a huge dangerous trap that should be avoided. In fact, all statics and globals are dangerous and must either be eliminated or thread-protected.

One way to fix statics/globals nicely is to make them thread-local. In fact, I would argue that statics/globals should be thread-local by default, and only with great care should you make them not thread-local. Basically, access to TLS variables is totally fast and safe, and you don't need to worry about threading issues as long as you only work on TLS values. It's a good pattern to have minimal thread communication areas, and then copy what you need into your TLS. (Of course there are other ways to do "thread local" data than TLS, such as by convention: you know that a certain global is only touched by one thread. But that requires care and commenting.)
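Here is a minimal sketch of the "small communication area, then copy into TLS" pattern. It assumes GCC or Clang (the __thread spelling) plus C++11 std::thread and std::mutex; the names Settings, g_sharedSettings, PullSettings, DoWork and RunWorkers are hypothetical, made up for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared "communication area": the only data touched by more
// than one thread, and only while holding g_sharedLock.
struct Settings { int quality; int maxItems; };
static std::mutex g_sharedLock;
static Settings g_sharedSettings = { 1, 100 };

// Each thread's private copy (GCC/Clang __thread; plain C data only).
static __thread Settings t_settings;

// Copy what this thread needs out of the shared area, under the lock.
static void PullSettings() {
    std::lock_guard<std::mutex> lock(g_sharedLock);
    t_settings = g_sharedSettings;
}

// Works purely on the thread-local copy, so no locking is needed here.
static int DoWork() {
    int total = 0;
    for (int i = 0; i < t_settings.maxItems; ++i)
        total += t_settings.quality;
    return total;
}

// Runs n worker threads; each pulls a snapshot, then works on it alone.
std::vector<int> RunWorkers(int n) {
    assert(n > 0);
    std::vector<int> results(n, 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&results, i] {
            PullSettings();
            results[i] = DoWork();  // each thread writes only its own slot
        });
    for (std::thread& t : threads) t.join();
    return results;
}
```

The lock is held only for the copy; everything after that touches nothing but the thread's own data.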

The TLS is a block of memory associated with each thread that you can index into to get at variables; it's analogous to the statics section of a normal exe. There are functions to manually get access to the TLS, but there's basically no reason to use those, and I don't recommend it. Every major compiler on every major platform now provides something like "__declspec(thread)" (MSVC) or "__thread" (GCC) which declares a variable to go in TLS. You just define it like :

__thread int thisVarIsInTLS;
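Since MSVC and GCC spell the keyword differently, a common trick is to hide it behind a macro. This is a sketch under that assumption; THREAD_LOCAL, CountOnThisThread and CompareThreads are hypothetical names, and std::thread is used only to show that each thread sees its own copy.

```cpp
#include <cassert>
#include <thread>

// Hypothetical portability macro: MSVC spells it __declspec(thread),
// GCC/Clang spell it __thread.
#if defined(_MSC_VER)
  #define THREAD_LOCAL __declspec(thread)
#else
  #define THREAD_LOCAL __thread
#endif

// Every thread gets its own zero-initialized copy of this int.
static THREAD_LOCAL int thisVarIsInTLS;

// Bumps this thread's copy n times and reports what this thread sees.
static int CountOnThisThread(int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) ++thisVarIsInTLS;
    return thisVarIsInTLS;
}

// Main thread and a worker each count; neither sees the other's increments.
void CompareThreads(int* mainSaw, int* workerSaw) {
    std::thread worker([workerSaw] { *workerSaw = CountOnThisThread(5); });
    worker.join();
    *mainSaw = CountOnThisThread(3);
}
```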

If you want to know a lot about the details of how TLS works, I found a great set of articles by Nynaeve all about how TLS is implemented in Windows :

Nynaeve » Blog Archive » Thread Local Storage, part 1 Overview
Nynaeve » Blog Archive » Thread Local Storage, part 2 Explicit TLS
Nynaeve » Blog Archive » Thread Local Storage, part 3 Compiler and linker support for implicit TLS
Nynaeve » Blog Archive » Thread Local Storage, part 4 Accessing __declspec(thread) data
Nynaeve » Blog Archive » Thread Local Storage, part 5 Loader support for __declspec(thread) variables (process initializatio
Nynaeve » Blog Archive » Thread Local Storage, part 6 Design problems with the Windows Server 2003 (and earlier) approach to
Nynaeve » Blog Archive » Thread Local Storage, part 7 Windows Vista support for __declspec(thread) in demand loaded DLLs
Nynaeve » Blog Archive » Thread Local Storage, part 8 Wrap-up

One important thing to note : MSVC supports CINIT for TLS. That means you can use real C++ classes in TLS, and what happens is each time a thread is created, it runs the "cinit" list for those objects. This is *not* widely supported by other compilers, so if you want to write code that is widely portable you should use only basic C types in TLS. It's a bit of a shame because having the cinit and shutdown for TLS would be handy. Instead you have to make your own "shim" or "thunk" routine around your thread to init and shutdown objects.
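A minimal sketch of such a shim, assuming a compiler that only accepts plain data in TLS (GCC/Clang __thread here), so the TLS slot holds just a pointer and the wrapper does the per-thread construct and destroy. ScratchBuffer, ThreadObjectsScope, RunWithThreadObjects, StartThread and AppendHello are all hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <thread>

// Hypothetical per-thread object with a real constructor/destructor, which a
// compiler without TLS cinit support will not let us put in TLS directly.
struct ScratchBuffer {
    std::string text;
    ScratchBuffer() { text.reserve(256); }
};

// TLS holds only a plain pointer, which __thread accepts.
static __thread ScratchBuffer* t_scratch = nullptr;

// RAII guard: per-thread "cinit" on entry, per-thread shutdown on exit
// (also runs if the body throws).
struct ThreadObjectsScope {
    ThreadObjectsScope() { t_scratch = new ScratchBuffer(); }
    ~ThreadObjectsScope() { delete t_scratch; t_scratch = nullptr; }
    ThreadObjectsScope(const ThreadObjectsScope&) = delete;
    ThreadObjectsScope& operator=(const ThreadObjectsScope&) = delete;
};

// The shim: set up this thread's objects, run the real body, tear down.
template <typename Body>
void RunWithThreadObjects(Body body) {
    ThreadObjectsScope scope;
    body();
}

// Every thread must be started through the shim, never with the raw body.
template <typename Body>
std::thread StartThread(Body body) {
    return std::thread([body] { RunWithThreadObjects(body); });
}

// Example body that relies on its thread's scratch object existing.
static void AppendHello(int* outLength) {
    assert(t_scratch != nullptr);
    t_scratch->text += "hello";
    *outLength = static_cast<int>(t_scratch->text.size());
}
```

The cost of this approach is discipline: any thread started without going through StartThread will find a null pointer in its TLS slot.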

So far as I can tell, every major compiler on every major platform supports something like "__declspec(thread)" or "__thread". There are some nice advantages to doing this over using manual TLS allocation and indexing. For one thing, the compiler can write much better code to access it, since the offsets are constants rather than variables, and it can make use of the direct pointer to the TLS struct. More importantly, you don't need to worry about the initial setup of the TLS indices in a thread-safe, single-instantiation way (which we saw in earlier articles is a pain).
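To make the contrast concrete, here is a sketch of the same per-thread counter done both ways, assuming POSIX threads for the manual version (pthread_once guards the one-time pthread_key_create) and GCC/Clang __thread for the compiler version. The counter functions are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <pthread.h>
#include <thread>

// --- Manual TLS: the key must be created exactly once, thread-safely. ---
static pthread_key_t  g_counterKey;
static pthread_once_t g_counterKeyOnce = PTHREAD_ONCE_INIT;

static void CreateCounterKey() {
    pthread_key_create(&g_counterKey, nullptr);  // no destructor needed
}

static int BumpManual() {
    pthread_once(&g_counterKeyOnce, CreateCounterKey);  // one-time setup
    // Store the count directly in the slot's void* to avoid allocation;
    // a fresh thread's slot starts out as null, i.e. zero.
    std::intptr_t v =
        reinterpret_cast<std::intptr_t>(pthread_getspecific(g_counterKey)) + 1;
    pthread_setspecific(g_counterKey, reinterpret_cast<void*>(v));
    return static_cast<int>(v);
}

// --- Compiler TLS: no key and no once-flag to manage. ---
static __thread int t_counter;

static int BumpCompiler() {
    return ++t_counter;
}

// On a brand-new thread both counters start over from zero.
static int BumpBothOnNewThread() {
    int result = 0;
    std::thread t([&result] { result = BumpManual() * 10 + BumpCompiler(); });
    t.join();
    return result;
}
```

All of the once-flag and key plumbing in the first half is exactly the thread-safe single-instantiation setup that the compiler-supported version lets you skip.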

Next post will be on some details of lock-free algorithms.

BTW, while I'm posting big blog series, I gathered Larry Osterman's series on Concurrency. This series is not great (there are some errors), but it is pretty comprehensive, so :

Concurrency, Part 1
Concurrency, Part 2 - Avoiding the problem
Concurrency, Part 3 - What if you can't avoid the issue
Concurrency, Part 4 - Deadlocks
Concurrency, part 5 - What happens when you can't avoid lock misordering
Concurrency, part 6 - Reference counting is hard
Concurrency, part 7 - Why would you ever want to use concurrency in your application
Concurrency, Part 8 - Concurrency for scalability
Concurrency, Part 9 - APIs that enable scalable programming
Concurrency, Part 10 - How do you know if you've got a scalability issue
Concurrency, Part 11 - Hidden scalability issues
Concurrency, part 12 - Hidden scalability issues, part 2
Concurrency, Part 13 - Concurrency and the CLR
Concurrency, Part 14 - Odds and Ends
Concurrency, Part 15 - Wrapping it all up.
Concurrency, part way too many1 - concurrency and the C runtime library.
Concurrency, part way too many2 - Concurrency and the Win32 API set
Concurrency, part way too many3 - Concurrency and Windows


castano said...

Note that the __declspec(thread) mechanism is not supported in explicitly loaded DLLs, which makes it rather useless.


castano said...

Ah, I see that's also discussed in Nynaeve's part 6 article.

Something that I think he doesn't mention is that it fails not only with dynamically loaded libraries, but also with delay-loaded libraries that are linked implicitly.

cbloom said...

DLL shmee-LL

Anonymous said...

Unfortunately, we have to ship DLLs at RAD. It's really annoying to have to take the function-call hit to get a thread-local value inside the goddamned memory allocator... but we do.

cbloom said...

Ah the hard life of library providers.

I'm starting to think that *all* variables should be thread local by default in the future. Or maybe that there should just be no more simple global variable syntax, like if you put

int i;

outside of a function, it would be a compile error. You would have to say :

shared_global int i = 0;

or

thread_local int i = 0;

Accidentally using globals on a thread is such a super common threading error.
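You can roughly approximate that rule in today's C++ by never declaring a bare global and instead choosing one of two explicit forms. This is only a sketch of the idea: SharedGlobal and RunCounters are hypothetical names, and the compiler will still happily accept "int i;", so the rule is enforced by convention, not by the language.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for "shared_global": the value can only be reached
// through With(), which always takes the lock.
template <typename T>
class SharedGlobal {
public:
    explicit SharedGlobal(T initial) : value_(initial) {}
    template <typename Fn>
    void With(Fn fn) {
        std::lock_guard<std::mutex> lock(mutex_);
        fn(value_);
    }
private:
    std::mutex mutex_;
    T value_;
};

// The two explicit kinds of global (C++ itself still allows a bare "int i;").
static SharedGlobal<int> g_sharedCount(0);   // shared_global int i = 0;
static __thread int t_localCount = 0;        // thread_local int i = 0;

// Each thread bumps both; only the shared one accumulates across threads.
int RunCounters(int threadCount, int bumpsPerThread, int* lastLocal) {
    std::vector<std::thread> threads;
    std::mutex lastLock;
    for (int t = 0; t < threadCount; ++t)
        threads.emplace_back([&] {
            for (int i = 0; i < bumpsPerThread; ++i) {
                g_sharedCount.With([](int& v) { ++v; });
                ++t_localCount;
            }
            std::lock_guard<std::mutex> lock(lastLock);
            *lastLocal = t_localCount;
        });
    for (std::thread& th : threads) th.join();
    int total = 0;
    g_sharedCount.With([&total](int& v) { total = v; });
    return total;
}
```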
