cbloom rants: The Wait on Workers Problem

I'd like to open source my Oodle threading stuff. There's some cool stuff. Some day. Sigh.
This is an internal email I sent on 05-13-2015 :
Cliff notes : there's a good reason why OS'es use thread pools and fibers to solve this problem.
There's this problem that I call the "wait on workers problem".

You have some worker threads.  Worker threads pop pending work from a queue, do it, then post a completion event.  You can't ever call Wait (Wait checks a condition, and if not set, puts the thread to sleep pending that condition) on them, because it could deadlock you (no progress possible): the workers could all go to sleep in waits, with work still pending and no one to do it.

The most obvious example is just to imagine you only have 1 worker thread.  Your worker thread does something like :

{
stuff
spawn work2
Wait(work2);
more stuff
}

Oh crap, work2 never runs because the Wait put me to sleep and there's no worker to do it.
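
To make that concrete, here's a minimal Python sketch (not Oodle code) using the standard concurrent.futures pool with a single worker.  The names demo_single_worker_deadlock and timeout_s are made up for illustration, and the timeout exists only so the demo returns instead of hanging the way the real thing would.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def demo_single_worker_deadlock(timeout_s=0.2):
    """Return True if waiting on child work from the only worker stalls."""
    with ThreadPoolExecutor(max_workers=1) as pool:

        def work2():
            return "work2 done"

        def work1():
            # stuff
            child = pool.submit(work2)  # spawn work2
            try:
                # Wait(work2): the only worker is busy running work1,
                # so nobody is available to run work2.
                child.result(timeout=timeout_s)
                return False
            except TimeoutError:
                # Without the timeout this thread would sleep forever.
                return True

        return pool.submit(work1).result()
```

work2 only gets to run after work1 gives up and returns, which is exactly the point: the Wait was never going to be satisfied while work1 held the worker.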

In Oodle the solution I use is that you should never do a real Wait on a worker, instead you have to "Yield".

What Yield does is change your current work item back to Pending, but with the specified handle as a condition to being run.  Then it returns back to the work dispatcher loop.  So the above example becomes :

[worker thread dispatch loop pops Work1]
Work1:
{
stuff
spawn work2
Yield(work2);
}
[Work1 is put back on the pending list, with work2 as a condition]
[worker thread dispatch loop pops Work2]
Work2
Work2 posts completion
[worker thread dispatch loop pops Work1]
{
more stuff
}
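
Here's a rough single-threaded Python sketch of that kind of shallow Yield, under my own assumptions about the shape of the thing (Handle, WorkItem, run_dispatcher and the step functions are all hypothetical names, not the Oodle API).  Each work item is split by hand into top-level segments, and returning a handle from a segment plays the role of Yield(handle).

```python
class Handle:
    """Completion flag for one work item."""

    def __init__(self):
        self.done = False


class WorkItem:
    def __init__(self, steps):
        self.steps = steps    # top-level segments, split at each Yield
        self.next_step = 0    # manually saved resume point
        self.wait_on = None   # handle that must complete before resuming
        self.handle = Handle()


def run_dispatcher(first_item):
    pending = [first_item]
    log = []

    def spawn(item):
        pending.append(item)
        return item.handle

    while pending:
        # Pop the first item whose condition (if any) is satisfied.
        index = next((i for i, it in enumerate(pending)
                      if it.wait_on is None or it.wait_on.done), None)
        if index is None:
            raise RuntimeError("no runnable work")
        item = pending.pop(index)
        step = item.steps[item.next_step]
        item.next_step += 1
        item.wait_on = step(spawn, log)  # a returned handle means Yield(handle)
        if item.next_step == len(item.steps):
            item.handle.done = True      # post completion
        else:
            pending.append(item)         # back to Pending, with a condition
    return log


def work2_body(spawn, log):
    log.append("work2")


def work1_part1(spawn, log):
    log.append("work1: stuff")
    return spawn(WorkItem([work2_body]))  # spawn work2, then Yield(work2)


def work1_part2(spawn, log):
    log.append("work1: more stuff")


def demo_yield():
    return run_dispatcher(WorkItem([work1_part1, work1_part2]))
```

Note that Work1 had to be cut into work1_part1 and work1_part2 by hand, and anything it wants to keep across the Yield has to live on the work item rather than in local variables.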

So.  The Yield solution works to an extent, but it runs into problems.

1. I only have "shallow yield" (non-stack-saving yield), so the worker must manually save its state or
stack variables to be able to resume.  I don't have "deep yield" that can yield from deep within a series
of calls, that would save the execution location and stack.

This can be a major problem in practice.  It means you can only yield from the top level, you can't ever be down inside some function calls and logic and decide you need to yield.  It means all your threading branching has to be very linear and mapped out at the top level of the work function.

It works great for simple linear processing like do an IO then yield on it, then process the results of the IO.  It doesn't work great for more complicated general parallelism.

2. Because Yield is different from Wait, you can't share code, and you can still easily accidentally break the system by calling Wait.

For example if you have a function like DoStuffInParallel , if you run that on a non-worker thread, it can launch some work items then Wait on them.  You can't do that from a worker.  You must rewrite it for being run from a worker to launch items then return a handle to yield on them (don't yield internally).

It creates an ugly and difficult heterogeneity between worker threads and non-worker threads.

So, we'd like to fix this.

What we'd like is essentially "deep yield" and we want it to just be like an OS Wait, so that functions can be used on worker threads or non-worker threads without changing them.

So my first naive idea was :

"Wait on Workers" can be solved by making Wait a dispatch.

Any time you call Wait, the system checks - am I a worker thread, and if so, instead of actually going into an OS wait, it pops and runs any runnable work.  After completing each work item, it rechecks the wait condition and if it's set, stops dispatching and returns to the Wait-caller.  

If there is no runnable work, you go into an OS wait on either the original wait condition OR runnable work available.

So the original example becomes :

{
stuff
spawn work2
Wait(work2);
[Wait sees we're a worker and runs the work dispatcher]
[work dispatcher pops work2]
{
Work2
}
[work dispatcher rechecks the wait condition, sees work2 complete, and returns]
more stuff
}

Essentially this is using the actual stack to do stack-saving.  Rather than trying to save the stack and instruction pointer, you just use the fact that they are saved by a normal function call & return.
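
A minimal Python sketch of Dispatch-on-Wait, simulated on one thread.  The Worker class and its spawn/wait methods are hypothetical names of mine, and where a real worker would go into an OS wait when nothing is runnable, this sketch just raises.

```python
class Handle:
    def __init__(self):
        self.done = False


class Worker:
    """One simulated worker thread; Wait dispatches instead of sleeping."""

    def __init__(self):
        self.queue = []
        self.log = []

    def spawn(self, fn):
        handle = Handle()
        self.queue.append((fn, handle))
        return handle

    def wait(self, handle):
        # Recheck the wait condition after each completed work item.
        while not handle.done:
            if not self.queue:
                raise RuntimeError("nothing runnable; a real worker would OS-wait here")
            fn, fn_handle = self.queue.pop(0)
            fn(self)  # runs on top of the waiter's stack frame
            fn_handle.done = True


def work2(worker):
    worker.log.append("work2")


def work1(worker):
    worker.log.append("work1: stuff")
    worker.wait(worker.spawn(work2))  # spawn work2, Wait(work2)
    worker.log.append("work1: more stuff")


def demo_dispatch_on_wait():
    worker = Worker()
    worker.wait(worker.spawn(work1))  # stand-in for the worker's dispatch loop
    return worker.log
```

Work1 keeps its locals and its place in the code for free, because work2 is just a nested function call made from inside wait.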

This method has minor disadvantages in that it can require a very large amount of stack if you go very deep.

But the real problem is it can easily deadlock.

It only works for tree-structured work, and Waits that are only on work items.  If you have non-tree wait cycles, or waits on non-work-items, it can deadlock.

Here's one example :

Work1 :
{
stuff1
Wait on IO
stuff2
}

Work2 :
{
stuff1
Wait on Work1
stuff2
}

With the current Oodle system, you can make work like this, and it will complete. (*)

In any system, if Work1 and Work2 get separate threads, they will complete.

But in a Dispatch-on-Wait system, if the Wait on IO in Work1 runs Work2, it will deadlock.

(* = the Oodle system ensures completability by only giving you a waitable handle to a work item when that work is enqueued to run.  So it's impossible to make loops.  But you can make something like the above by doing

h1 = Run(Work1)
Work2.handle = h1;
Run(Work2);

*)


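Once you've started Work2 on your thread, you're hosed.  You can't recover from that, because you already have Work1 in progress: Work1's stack frame sits underneath Work2's on the same thread, so Work1 can't resume until Work2 returns, and Work2 is waiting on Work1.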
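
The same failure can be reproduced in a small single-threaded Python simulation.  This is a sketch under my own assumptions: Worker, Handle, the external flag (marking a non-work handle like IO) and the Deadlock exception are all hypothetical, and the exception stands in for a thread that would really just sleep forever.

```python
class Deadlock(Exception):
    pass


class Handle:
    def __init__(self, external=False):
        self.done = False
        self.external = external  # True for non-work handles such as IO


class Worker:
    def __init__(self):
        self.queue = []
        self.log = []

    def spawn(self, fn):
        handle = Handle()
        self.queue.append((fn, handle))
        return handle

    def wait(self, handle):
        while not handle.done:
            if self.queue:
                fn, fn_handle = self.queue.pop(0)
                fn(self)              # Dispatch-on-Wait: run other work
                fn_handle.done = True
            elif handle.external:
                handle.done = True    # pretend the IO finished while we slept
            else:
                # The work we need is buried below us on this same stack.
                raise Deadlock("would sleep forever")


def demo_dispatch_on_wait_deadlock():
    worker = Worker()
    io = Handle(external=True)

    def work1(w):
        w.log.append("work1: stuff1")
        w.wait(io)                    # Wait on IO, which dispatches Work2
        w.log.append("work1: stuff2")

    h1 = worker.spawn(work1)

    def work2(w):
        w.log.append("work2: stuff1")
        w.wait(h1)                    # Wait on Work1, which is stuck below us
        w.log.append("work2: stuff2")

    h2 = worker.spawn(work2)
    try:
        worker.wait(h2)
    except Deadlock:
        return worker.log, True
    return worker.log, False
```

Neither stuff2 ever runs: the IO could complete at any moment, but Work1 can't notice until Work2 returns.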

Dispatch-on-Wait really only works for a very limited work pattern :

You only Wait on work that you made yourself.
None of the work you make yourself can Wait on anything but work it makes itself.

Really it only allows you to run tree-structured child work, not general threading.

So, one option is to use Dispatch-on-Wait but with a rule that if you're on a worker you can only use it for tree-structured child work.  If you need to do more general waits, you still do the coroutine Yield.

Or you can try to solve the general problem.  In hindsight the solution is obvious, since it's what the serious OS people do : thread pools.

You want to have 4 workers running on a 4 core system.

You actually have a thread pool of 32 worker threads (or whatever) and try to keep at least 4 running at all times.

Any time you Wait on a worker, you first Wake a thread from the pool, then put your thread to sleep.

Any time a worker completes a work item it checks how many worker threads are awake, and if it's too many it goes to sleep.

This is just a way of using the thread system to do the stack-saving and instruction-pointer saving that you need for "deep yield".  The Wait() is essentially doing that deep return back up to the Worker dispatch loop, but it does it by sleeping the current thread and waking another that can start from the dispatch loop.

This just magically fixes all the problems.  You can wait on arbitrary things, you can deep-wait anywhere, you don't get deadlocks.
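
Here's a small Python sketch of the idea, with loud caveats.  TokenPool and everything in it are hypothetical names, and I'm using a semaphore of "run tokens" as a stand-in for "keep N threads awake".  It's also a simplification of the scheme described above: a thread coming out of a Wait re-acquires a token before continuing, rather than running immediately and trimming the awake count when a work item completes.

```python
import queue
import threading


class TokenPool:
    def __init__(self, threads=4, running=1):
        self.work = queue.Queue()
        self.tokens = threading.Semaphore(running)  # how many may run at once
        self.local = threading.local()
        self.threads = [threading.Thread(target=self._loop, daemon=True)
                        for _ in range(threads)]
        for t in self.threads:
            t.start()

    def spawn(self, fn):
        done = threading.Event()
        self.work.put((fn, done))
        return done

    def wait(self, done):
        """Same call from worker or non-worker threads."""
        if getattr(self.local, "is_worker", False):
            self.tokens.release()  # let another pool thread take our core
            done.wait()            # real OS wait; our stack is saved for free
            self.tokens.acquire()  # get a core back before continuing
        else:
            done.wait()

    def _loop(self):
        self.local.is_worker = True
        while True:
            item = self.work.get()
            if item is None:       # shutdown marker
                return
            fn, done = item
            with self.tokens:
                fn(self)
            done.set()             # post completion

    def close(self):
        for _ in self.threads:
            self.work.put(None)
        for t in self.threads:
            t.join()


def demo_thread_pool_wait():
    pool = TokenPool(threads=4, running=1)
    log = []

    def work2(p):
        log.append("work2")

    def work1(p):
        log.append("work1: stuff")
        p.wait(p.spawn(work2))     # a plain Wait, from inside a worker
        log.append("work1: more stuff")

    pool.wait(pool.spawn(work1))   # the same Wait, from a non-worker thread
    pool.close()
    return log
```

Even with only one thread allowed to run at a time, the original example completes, and work1 didn't have to be restructured at all.  A finite pool can still run out of threads if enough of them are parked in waits, which is why the pool is sized well above the core count.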

The only disadvantage is the overhead of the thread switch.

If you really want the micro-efficiency, you could still provide a "WaitOnChildWork" that runs the work dispatch loop, which is to be used only for the tree-structured work case.  This lets you avoid the thread pool work and is a reasonably common case.

7/26/2015
