7/29/2011

07-29-11 - Spinning

It's important to distinguish the two different reasons for spinning in threaded code; it's unfortunate that we don't have different names for them.

Spinning due to a contended RMW operation is not generally horrible, though this can be a bit subtle.

If you know that the spin can only happen when some other thread made progress, that's the good case. (though you can still livelock if the "progress" that the other thread made is in a loop)

One source of problems is on LL/SC architectures, the lock is usually for the whole cache line. That means somebody else could be fiddling things on your cache line and you might abort your SC without ever making progress. eg. something like :


atomic<int> lock;
int spins; // on same cache line

many threads do :

for(;;)
{
  if ( CAS(x,0,1) ) break;

  spins++;
}

this looks like just a spinlock that counts its spins, but in fact it could be a livelock because the write to "spins" invalidates the cacheline of someone trying to do the CAS, and none of the CAS's ever make progress.

There are also plenty of "lockfree" code snippets posted around the net that are actually more like spin-locks, which we shall discuss next.

The other type of spinning is waiting for some other thread to set some condition.

I believe that this type of spinning basically has no place in games. It really only be used in kernel environments where you can ensure that the thread you are waiting on is current running. If you can't do that, you might be spinning on a thread which is swapped out, which could be a very long spin indeed.

These can come up in slightly subtle places. One example we've seen is this MPMC queue ; the spin-backoff there is actually a wait on a specific, and is thus very bad.

Any time you are actually spinning waiting on a specific thread to do something, you need to actually use an OS Wait on that thread to make sure it gets CPU time.

Relacy's context-bound test will tell you any time you have a spin without a backoff.yield call , so that's nice. Then you have to look at any place you do a backoff.yield and try to see if it's waiting for a thread to set a condition, or only spinning if another thread made progress during that time.

Interestingly a simple spinlock illustrates both types of spins :


for(;;)
{
    if ( lock == 1 )
    {
        // spin wait, bad
        continue;
    }

    if ( ! cas(lock,0,1) )
    {
        // spin contention, okay
        continue;
    }

    break;
}

Sometimes "spin waits" can be well hidden in lockfree code. The question you should ask yourself is :

If one of the threads is swapped out (stops running), is my progress blocked?

If that is true, it's a spin wait, and that's bad.

To be clear, whenever what you are really doing is a "wait" - that is, I need this specific other thread to make progress before I can - you really want to explicitly enforce that; either by using an OS wait, or by knowing what thread it is you're waiting on, etc.

A related topic is that "exponential backoff" type stuff in spins is over-rated. The idea of backoff is that you spin and something like :


first 10 times - mm_pause()
next 5 times - SwitchToThread()
then Sleep(1), then Sleep(2) , etc..

this is sort of okay, because if the other thread is running you act like a spin and make progress quickly, while if the other thread is asleep you will eventually go to sleep and give it some CPU time.

But in reality this is shit. The problem is that if you *ever* actually hit the Sleep(1) , that's a disaster. And of course you could have to sleep much longer. Say for example you have 10 threads and you are waiting on a specific one (without using a proper OS wait). When you Sleep, you might not get the thread you want to run; you might have to sleep all your other threads too to finally get it to go.

It's just sloppy threading programming and the performance is too unpredictable for games.

9 comments:

won3d said...

Off thread, I asked you about _mm_pause. This post brings up another related question (although only academic interest to me, since I'm not doing windev anymore).

SwitchToThread has a return value. Is there anything sane you could do with it? The spinlock I used to have used to do a Sleep(0) if SwitchToThread returned false, but I wasn't very rigorous in figuring out if it helped or not.

cbloom said...

"SwitchToThread has a return value. Is there anything sane you could do with it?"

That's a good question; I think the answer is no. It's nice that they return that though.

A Sleep(0) is not going to do anything good there because SwitchToThread does strictly more yielding than Sleep(0) ; you could do a Sleep(1) though.

I guess if you're in a spin and you know you must have another thread running and you SwitchToThread and it returns false, you could assert there.
Or if you knew the other thread holding the lock could be in some kind of wait, you could then break it out.

Another issue is that when you SwitchToThread in something like a spin lock you really would prefer to switch to a thread that's in your process, but that's not guaranteed.

won3d said...

Since it's been forever, I looked up what SwitchToThread was doing:

"The yield of execution is in effect for up to one thread-scheduling time slice on the processor of the calling thread. The operating system will not switch execution to another processor, even if that processor is idle or is running a thread of lower priority."

And Sleep:

"A value of zero causes the thread to relinquish the remainder of its time slice to any other thread that is ready to run. If there are no other threads ready to run, the function returns immediately, and the thread continues execution.

Windows XP/2000: A value of zero causes the thread to relinquish the remainder of its time slice to any other thread of equal priority that is ready to run. If there are no other threads of equal priority ready to run, the function returns immediately, and the thread continues execution. This behavior changed starting with Windows Server 2003."

So maybe:

if (!SwitchToThread()) Sleep(0);

is reasonable. SwitchToThread apparently only affects the current processor, not the entire system, so spinning on SwitchToThread might not help very much, although Sleep(0) might cause a thread to migrate from one processor to another. And I don't know if processor means an actual core, or a hyperthread.

cbloom said...

I don't think so, but it's very hard to tell from the docs. Multi-proc scheduling on Windows is a bit hard to follow and not well documented.

"The operating system will not switch execution to another processor, even if that processor is idle or is running a thread of lower priority"

This is very confusing. I guess what it's saying is that after another thread swaps in and takes your place, you will always go into WAITING state, you won't get immediately moved to another processor, even if that processor is idle.

However, that's pretty standard for the Windows scheduler. Presumably if that other processor goes into its "I'm idle, get a thread" it would in fact take your thread.

It's unclear to me from that whether SwitchToThread will grab a thread from another processor to swap in for you or not. (?)

"A value of zero causes the thread to relinquish the remainder of its time slice to any other thread of equal priority that is ready to run."

Does this mean on any processor? Is that different from STT or not?

The biggest difference between them that I know of is that Sleep(0) will not yield to lower priority threads but STT will. On single proc that means STT always does more yielding that Sleep 0.

Anyway, I don't think it can hurt. It makes some sense to do it the opposite way though, something like

Sleep(0)
if I wasn't swapped out , STT

won3d said...

http://www.bluebytesoftware.com/blog/2006/08/23/PriorityinducedStarvationWhySleep1IsBetterThanSleep0AndTheWindowsBalanceSetManager.aspx

This link has some interesting info (although, it describes the older behavior for Sleep(0), which apparently cause starvation). It seems to suggest that SwitchToThread is better. The most interesting part is the "balance set manager" that is responsible for the priority and quantum boost heuristics. It was the first time I heard this name.

cbloom said...

The fact that Sleep(0) isn't a reliable yield should be well known by now; that doesn't mean you should never use it, or you should use Sleep(1) (you shouldn't). In fact even with STT you might not make progress, which is why people wind up using exponential backoffs that go Sleep(0), then STT, then Sleep(1), etc.

Obviously a good software engineer doesn't just randomly stick calls to STT or whatever in their spin loops, they make a function like MySpinYield() and call their function so that they can encapsulate the tricky platform-specific knowledge in one place. It just staggers me how rarely I see anyone do things like this.

But really you shouldn't EVER do a spin-wait on Windows. There's no reason for it. Any place you are tempted to use a spin-wait, use an "eventcount" instead and do a proper wait. (perhaps after a few spins, but only if you have very good reason to believe those spins actually help)

BTW I've written about the balanced set manager in detail here :

http://cbloomrants.blogspot.com/2010/09/09-12-10-defficiency-of-windows-multi.html


Also BTW don't think "all my threads are the same priority, Sleep(0) is fine for me" - no they aren't. You have no idea what priority your threads are in Windows, they are changing constantly!! Even in a case where you tried to make threads of equal priority, Sleep(0) might just spin until some dynamic priority boost wears off.

won3d said...

http://cbloomrants.blogspot.com/2010/09/09-12-10-defficiency-of-windows-multi.html

Wow, my memory is much worse than it used to be.

Unknown said...

On modern versions of Windows (2K3 and later), SwitchToThread and Sleep(0) have similar behavior and they will only look at the current runqueue.

I think your overall thinking is good, though. Sleeps and SwitchToThread have no place in well-written multi-threaded code. Events and waitqueues are better.

cbloom said...

"On modern versions of Windows (2K3 and later), SwitchToThread and Sleep(0) have similar behavior and they will only look at the current runqueue. "

Are you claiming that the well-documented difference WRST thread priorities is not true?

old rants