11-23-12 - Global State Considered Harmful

In code design, a frequent pattern is that of singleton state machines. eg. a module like "the log" or "memory allocation" which has various attributes you set up that affect its operation, and then subsequent calls are affected by those attributes. eg. things like :

Log_SetOutputFile( FILE * f );


Log_Printf( const char * fmt .... );

or :

malloc_setminimumalignment( 16 );


malloc( size_t size );

The goal of this kind of design is to make the common use API minimal, and have a place to store the settings (in the singleton) so they don't have to be passed in all the time. So, eg. Log_Printf() doesn't have to pass in all the options associated with logging, they are stored in global state.

I propose that global state like this is the classic mistake of improving the easy case. For small code bases with only one programmer, they are mostly okay. But in large code bases, with multi-threading, with chunks of code written independently and then combined, they are a disaster.

Let's look at the problems :

1. Multi-threading.

This is an obvious disaster and pretty much a nail in the coffin for global state. Say you have some code like :

pcb * previous_callback = malloc_setfailcallback( my_malloc_fail_callback );

void * ptr = malloc( big_size ); 

malloc_setfailcallback( previous_callback );

this is okay single threaded, but if other threads are using malloc, you just set the "failcallback" for them as well during that span. You've created a nasty race. And of course you have no idea whether the failcallback that you wanted is actually set when you call malloc because someone else might change it on another thread.

Now, an obvious solution is to make the state thread-local. That fixed the above snippet, but some times you want to change the state so that other threads are affected. So now you have to have thread-local versions and global versions of everything. This is a viable, but messy, solution. The full solution is :

There's a global version of all state variables. There are also thread-local copies of all the global state. The thread-local copies have a special value that means "inherit from global state". The initial value of all the thread-local state should be "inherit". All state-setting APIs must have a flag for whether they should set the global state or the thread-local state. Scoped thread-local state changes (such as the above example) need to restore the thread-local state to "inherit".

This can be made to work (I'm using for the Log system in Oodle at the moment) but it really is a very large conceptual burden on the client code and I don't recommend it.

There's another way that these global-state singletons are horrible for multi-threading, and that's that they create dependencies between threads that are not obvious or intentional. A little utility function that just calls some simple functions picks up these ties to shared variables and needs synchronization protection with the global state. This is related to :

2. Non-local effects.

The global state makes the functions that use it non-"pure" in a very hidden way. It means that innocuous functions can break code that's very far away from it in hidden ways.

One of the classic disasters of global state is the x87 (FPU) control word. Say you have a function like :

void func1()

    set x87 CW

    do a bunch of math that relies on that CW


    do more math that relies on CW

    restore CW

Even without threading problems (the x87 CW is thread-local under any normal OS), this code has nasty non-local effects.

Some branch of code way out in func2() might rely on the CW being in a certain state, or it might change the CW and that breaks func1().

You don't want to be able to break code very far away from you in a hidden way, which is what all global state does. Particularly in the multi-threaded world, you want to be able to detect pure functions at a glance, or if a function is not pure, you need to be able to see what it depends on.

3. Undocumented and un-asserted requirements.

Any code base with global state is just full of bugs waiting to happen.

Any 3d graphics programmer knows about the nightmare of the GPU state machine. To actually write robust GPU code, you have to check every single render state at the start of the function to ensure that it is set up the way you expect. Good code always expresses (and checks) its requirements, and global state makes that very hard.

This is a big problem even in a single-source code base, but even worse with multiple programmers, and a total disaster when trying to copy-paste code between different products.

Even something like taking a function that's called in one spot in the code and calling it in another spot can be a hidden bug if it relied on some global state that was set up in just the right way in that original spot. That's terrible, as much as possible functions should be self-contained and work the same no matter where they are called. It's sort of like "movement of call site invariance symmetry" ; the action of a function should be determined only by its arguments (as much as possible) and any memory locations that it reads should be as clearly documented as possible.

4. Code sharing.

I believe that global state is part of what makes C code so hard to share.

If you take a code snippet that relies on some specific global state out of its content and paste it somewhere else, it no longer works. Part of the problem is that nobody documents or checks that the global state they need is set. But a bigger issue is :

If you take two chunks of code that work independently and just link them together, they might no longer work. If they share some global state, either intentionally or accidentally, and set it up differently, suddenly they are stomping on each other and breaking each other.

Obviously this occurs with anything in stdlib, or on the processor, or in the OS (for example there are lots of per-Process settings in Windows; eg. if you take some libraries that want a different time period, or process priority class, or priviledge level, etc. etc. you can break them just by putting them together).

Ideally this really should not be so. You should be able to link together separate libs and they should not break each other. Global state is very bad.

Okay, so we hate global state and want to avoid it. What can we do? I don't really have the answer to this because I've only recently come to this conclusion and don't have years of experience, which is what it takes to really make a good decision.

One option is the thread-local global state with inheritance and overrides as sketched above. There are some nice things about the thread-local-inherits-global method. One is that you do still have global state, so you can change the options somewhere and it affects all users. (eg. if you hit 'L' to toggle logging that can change the global state, and any thread or scope that hasn't explicitly sets it picks up the global option immediately).

Other solutions :

1. Pass in everything :

When it's reasonable to do so, try to pass in the options rather than setting them on a singleton. This may make the client code uglier and longer to type at first, but is better down the road.

eg. rather than

malloc_set_alignment( 16 );

malloc( size );

you would do :

malloc_aligned( size , 16 );

One change I've made to Oodle is taking state out of the async systems and putting in the args for each launch. It used to be like :

OodleWork_SetKickImmediate( OodleKickImmediate_No );
OodleWork_SetPriority( OodlePriority_High );
OodleWork_Run( job );

and now it's :

OodleWork_Run( job , OodleKickImmediate_No, OodlePriority_High );

2. An options struct rather than lots of args.

I distinguish this from #3 because it's sort of a bridge between the two. In particular I think of an "options struct" as just plain values - it doesn't have to be cleaned up, it could be const or made with an initializer list. You just use this when the number of options is too large and if you frequently set up the options once and then use it many times.

So eg. the above would be :

OodleWorkOptions wopts = { OodleKickImmediate_No, OodlePriority_High  };
OodleWork_Run( job , &wopts );

Now I should emphasize that we already have given ourselves great power and clarity. The options struct could just be global, and then you have the standard mess with that. You could have it in the TLS so you have per-thread options. And then you could locally override even the thread-local options in some scope. Subroutines should take OodleWorkOptions as a parameter so the caller can control how things inside are run, otherwise you lose the ability to affect child code which a global state system has.

Note also that options structs are dangerous for maintenance because of the C default initializer value of 0 and the fact that there's no warning for partially assigned structs. You can fix this by either making 0 mean "default" for every value, or making 0 mean "invalid" (and assert) - do not have 0 be a valid value which is anything but default. Another option is to require a magic number in the last value of the struct; unfortunately this is only caught at runtime, not compile time, which makes it ugly for a library. Because of that it may be best to only expose Set() functions for the struct and make the initializer list inaccessible.

The options struct can inherit values when its created; eg. it might fill any non-explicitly given values (eg. the 0 default) by inheriting from global options. As long as you never store options (you just make them on the stack), and each frame tick you get back to a root for all threads that has no options on the stack, then global options percolate out at least once a frame. (so for example the 'L' key to toggle logging will affect all threads on the next frame).

3. An initialized state object that you pass around.

Rather than a global singleton for things like The Log or The Allocator, this idea is to completely remove the concept that there is only one of those.

Instead, Log or Allocator is a struct that is passed in, and must be used to do those options. eg. like :

void FunctionThatMightLogOrAllocate( Log * l, Allocator * a , int x , int y )
    if ( x )
        Log_Printf( l , "some stuff" );

    if ( y )
        void * p = malloc( a , 32 );

        free( a , p );

now you can set options on your object, which may be a per-thread object or it might be global, or it might even be unique to the scope.

This is very powerful, it lets you do things like make an "arena" allocator in a scope ; the arena is allocated from the parent allocator and passed to the child functions. eg :

void MakeSuffixTrie( Allocator * a , U8 * buf, int bufSize )

    Allocator_Arena arena( a , bufSize * 4 );

    MakeSuffixTrie_Sub( &arena, buf, bufSize );

The idea is there's no global state, everything is passed down.

At first the fact that you have to pass down a state pointer to use malloc seems like an excessive pain in the ass, but it has advantages. It makes it super clear in the signature of a function which subsystems it might use. You get no more surprises because you forgot that your Mat3::Invert function logs about degeneracy.

It's unclear to me whether this would be too much of a burden in real world large code bases like games.


Per Vognsen said...

As an API user, I usually want both the dynamically scoped option (your "global with inherits and overrides") and the explicit parameter passing option. Unfortunately, most APIs will give you one or the other, based on some misplaced sense of code aesthetics that values orthogonality over usability.

Unknown said...

Options structs in C++ can be made to serve both of these needs reasonably well with chaining (setters that return *this). All code examples here: https://gist.github.com/4153316

First two examples should be obvious. Doing this like this also gives named parameters (which is one context where bools are OK).

For C APIs, there's a variant below the C++ code. The OodleWorkOpts* are just the same as OodleWorkOpts_Data*, but that's not exposed anywhere. So each of these functions just returns the pointer you pass in, but Defaults() and FromData() (FromData takes a struct that's supposed to be fully initialized) are the only functions in the API that actually give you an OodleWorkOpts* from something you can declare yourself.

That what, you can do the chaining equivalent, and you can also enforce that something is either explicitly initialized to Defaults or explicitly declared by the user to be fully initialized. (That said, I'd normally not even provide the _FromData variant unless it's *really* important).

Calling more than 2 or 3 such C-ish setter calls inside in a function call results in LISP-tastic nesting depth - the C++ version with references is slicker there. Then again, if you override more than 2 parameters, it's more readable to set up the options separately anyway, so that might be a good thing.

nothings said...

On the last point, what I've done in the past is made it so if you pass in NULL for (say) the Allocator, you get the default/global one; this keeps you from needing to expand the API excessively, and then internally the code just does "if (a==NULL) a = global;" at the top of each function.

One thing you have to be careful about in creating reusable code is balancing the API design between usability in big complex programs and in simple one-off programs. A big complex program can afford to write wrappers around your API to make it do what it wants--e.g. you could have no support for a global-ish API, leaving writing that to the app. For little programs, it should work immediately.

Thus, stb_image some global state (for HDR-to-LDR gamma conversion), which will suck for threads but isn't a common path and wasn't worth forcing extra complexity on simple apps. stb_malloc has no global state; if you want to use it for a global allocator you have to wrap it and define one explicitly and provide your own API for accessing it, because stb_malloc isn't intended for little one-off apps.

Brian Balamut said...

Regarding #3, I've worked with a codebase that had that design - with a global 'manager' passed as an argument all over the gameplay code. This bloats things up pretty bad, all your function calls end up needing it if it really is one of those core global systems that's often used. What's worse, if you end up in some leaf function that needs it but doesn't have it, you have to refactor all access to it until you make it up the chain to some higher level that still has the manager argument - you end up changing the signature for multiple methods (and all usage) just to add one line of code.

The day I refactored this to just be a global object there was much rejoicing. It's not as pure, but the overall code quality is better. I still agree with your points, I just think this was a case where someone took the concept 'globals == bad' but didn't try to architect a viable alternative.

Yann Collet said...

Yeah, i'm using the proposition n°3 almost all the time, although i tend to name it "void* context".
I like the idea that it is merely an (undescribed) void*, so that its content can change anytime later, without impacting existing user code.

In the end, it tends to look almost like an object. It's just that the internal variables are stored into a structure pointed at by the void* pointer, which must be explicitly passed along at each function call. So it adds one argument which could be avoided in a C++ object. But that's basically identical. and that's C :)

c0re said...

Hmm.. Aren't you essentially saying just make those chunks of code re-entrant?

cbloom said...

"Hmm.. Aren't you essentially saying just make those chunks of code re-entrant?"

No, it's not exactly the same thing.

cbloom said...

@Balamut -

yeah, I certainly am afraid of that.

I've worked in plenty of game codebases where you had to pass around a "World *" or an "Engine *" even though there only ever was one of them, and it just made the code generally more verbose and annoying than it needed to be.

So certainly it's not a no-brainer.

For low level helper libs (ala cblib or stb.h) my idea is that you would write the low level core code with no globals. eg. functions take Allocator * and Log * as arguments. But then you could/should also provide a helper that has global singleton instances of those objects, and gives you APIs that use the global version.

Or something.

old rants