3/08/2012

03-08-12 - Oodle Coroutines

I wrote a year ago ( 03-11-11 | Worklets , IO , and Coroutines ) about coroutines for doing tasks with external delays. (a year ago! wtf)

This is a small followup to say that yes, in fact coroutines are awesome for this. I never bothered to try to preserve local variables, it's not worth it. You can just store data that you want to save across a "yield" into a struct. In Oodle whenever you are doing a threaded task, you always have a Work context, so it's easy to just stuff your data in there.

I've always been a big fan of coroutines for game scripting languages. You can do things like :


Walk to TV
Turn on TV
if exists Couch
  Walk to Couch

etc. You just write it like linear imperative code, but in fact some of those things take time and yield out of the coroutine.

So obviously the same thing is great for IO or SPU tasks or whatever. You can write :


Vec3 * array = malloc(...);

io_read( array , ... );  //! yields

Mat3 m = camera.view * ...;

spu_transform(array, m); //! yields

object->m_array = array;

and it just looks like nice linear code, but actually you lose execution there at the ! marks and you will only proceed after that operation finishes.

To actually implement the coroutines I have to use macros, which is a bit ugly, but not intolerable. I use the C switch method as previously described; normally I auto-generate the labels for the switch with __COUNTER__ so you can just write :


 .. code ..

YIELD

 .. code ..

and the YIELD macro does something like :

  N = __COUNTER__;
  work->next = N;
  return eCoroutine_Refire;
  }
case N:
  {

(note the braces which mean that variables in one yield chunk are not visible to the next; this means that the failure of the coroutine to maintain local variables is a compile error and thus not surprising).

The exception is if you want to jump around to different blocks of the coroutine, then you need to manually specify a label and you can jump to that label.

Note that yielding without a dependancy is kind of pointless; these coroutines are not yielding to share the CPU, they are yielding because they need to wait on some async handle to finish. So generally when you yield it's because you have some handle (or handles) to async tasks.

The way a yielding call like "spu_transform(array, m);" in the previous example has to be implemented is by starting an async spu task, and then setting the handle as a dependency. It would be something like :


#define spu_transform(args) \
  Handle h = start_spu_transform(args); \
  Work_SetDeps(work,h); \
  YIELD

The coroutine yield will then stop running the work, and the work now has a dependency, so it won't resume until the dependency is done. eg. it waits for the spu task to complete.

I use coroutines basically every time I have to do some processing on a file. For one thing, to minimize memory use I need to stream the file through a double-buffer. For another, you often need to open the file before you start processing, and that needs to be part of the async operation chain as well. So a typical processing coroutine looks something like :


int WorkFunc( Work * work )
{
COROUTINE_START

  Handle h = ioq_open_file( work->filename );

COROUTINE_YIELD_TO(h)

  if open failed -> abort

  get file size

  h = start read chunk 0
  chunkI = 1; // the next chunk is 1

YIELD(h)

  // read of chunkI^1 just finished

  // start the next read :  
  h = ioq_read( chunk[chunkI] );
  chunkI ^= 1;

  // process the chunk we just received :
  process( chunk[chunkI] );

  if done
  {
    ioq_close_file();
    return
  }

YIELD_REPEAT
}

where "YIELD_REPEAT" means resume at the same label so you repeat the current block.

The last block of the coroutine runs over and over, ping-ponging on the double buffer, and yields if the next IO is not done yet when the processing of each block is done.

5 comments:

jfb said...

Funny thing about coroutines is they are already in common use in games. Badly done, hard to follow, and labeled "Finite State Machines". :)

The switch() hack really does make them usable. I recently implemented USB mass storage firmware -- far easier when the interaction with the PC is procedural, especially to see visibily that it performs as the spec shows.

Here's a trick I came up with recently for the struct coroutine approach, you may find it useful:

Before your switch you have a #define IRQFILTER() or whatnot, and you reserve numbers under some value for your non-IRQ part, and above for your IRQ part. On the event (interrupt in my case), you call the function regardless.

If you make the return() have a version that adds the threshold value to what it would normally return, you now have a way to transport your coroutine up into interrupts and back down. In this case it sets the state variable, pulses and returns. The other handler receives the pulse, state variable is no longer filtering it out from taking action (even if it can be activated some other way), and runs.

On the PC I suppose what you could do with this is move your function between threads, if say one part has to execute in a particular thread, and then move it back out again when it's done.

nereus said...

I love coroutines. In c++11 you can use a (tiny) lambda with a closure to make the macro ick go away. Then the state machine stuff is right there in the reader's face. It's pleasingly elegant. I got the idea from protothreads but I'm not surprised to see you bring them up.

castano said...

> (note the braces which mean that variables in one yield chunk are not visible to the next; this means that the failure of the coroutine to maintain local variables is a compile error and thus not surprising).

This has the effect of not letting being able to yield within a loop. I guess that's what motivates the "YIELD_REPEAT".

Wouldn't be it be more clear to remove the braces from yields and actually express that as a loop?

COROUTINE_START
// open file
YIELD;
// read chunk
YIELD;
do {
// read chunk
// process
YIELD;
} while (!done)
// close file
COROUTINE_END

Visual C++ produces warnings or errors when variables are declared inside a switch, so misuse should be fairly obvious:

error C2360: initialization of 'i' is skipped by 'case' label
warning C4700: uninitialized local variable 'i' used

cbloom said...

Hmm, well, you are right that that works and it looks prettier in some ways/cases.

But in practice you almost always have local variables, and then you get into trouble.

The problem is this bit :

YIELD;
do {
// read chunk
// process
YIELD;
} while (!done)
// close file

which is actually :

return 5; case 5:
// part 1
do {
// part 2
return 6; case 6:
// part 3
} while (!done)

so to do stuff with locals you would have to make it :

YIELD; //return 5; case 5:
{
// part 1
}
do
{
{
// part 2
work->m_done = done;
}
YIELD; // return 6; case 6:
} while ( ! work->m_done )


So I'm not sure if that's actually nicer in practice; maybe I'll try writing some real code like that and see how it goes.

cbloom said...

The really ugly thing about YIELD_REPEAT is it creates a looping scope without that scope being visible at all. eg. there are no braces you can scope-match. It's basically a goto of course.

old rants