1. The graph-forwarding automated parallelism system.
The idea here is that you make all your async operations look like flow-chart widgets: they have "in" and "out" channels, and then you can draw links and hook up ins to outs.
This creates dependencies automatically; each op depends on everything that feeds its input channels.
So for example you might do :
"c:\junk" :-> OpenFile :-> fh
(that is, OpenFile takes some string as input and then puts out a file handle named "fh")
fh :-> ReadFile :-> buf[0-32768]
(that is, make a ReadFile op that takes the output of the OpenFile, and outputs a valid buffer range)
buf[0-32768] :-> SPU_LZCompress :-> lzSize , compbuf[0-lzSize]
(do an LZ compress on the valid buf range and output to compressed buf)
lzSize , compbuf[0-lzSize] :-> WriteFile
etc..
This sets up a chain of operations with dependencies; you can fire it off and then wait on it all to complete.
So, in a sense it's nice, because you don't have to write code that waits on the completion of each op and fires the next op and so on.
There are various ways you could set up the async chains, obviously GUI tools where you drag lines around are popular for this sort of thing, but I think that's a terrible way to go. You could just have a text markup, or some creation functions that you call to build up the graph.
Another interesting option is to use the "just run the code" method. That is, you make proxy classes for all the variable types and do-nothing functions with the names of the ops (ReadFile, etc.); then you just run this fake imperative code, and all it does is record the calls and the arguments, and uses that to build the graph. That's easy enough to do for code without branches, but that's sort of a trivial case and I'm not sure how to make it work with branches. In fact in general this type of thing sucks bad for code with loops or branches.
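As a sketch of that record-the-calls trick for the branchless case : the "variables" are proxies that just carry a node id, the "ops" are do-nothing functions that append a graph node. All the names here are made up for illustration, not any real Oodle API :

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <vector>

// a recorded graph node : which op, and which earlier nodes feed it
struct GraphNode { std::string op; std::vector<int> inputs; };
static std::vector<GraphNode> g_graph;

// proxy class that stands in for fh, buf, etc. - just a node id
struct Proxy { int node; };

static Proxy RecordOp(const char * op, std::initializer_list<Proxy> ins)
{
    GraphNode n; n.op = op;
    for (Proxy p : ins) n.inputs.push_back(p.node);
    g_graph.push_back(n);
    return Proxy{ (int)g_graph.size() - 1 };
}

// do-nothing stand-ins with the names of the real async ops :
static Proxy OpenFile(const char * name) { (void)name; return RecordOp("OpenFile", {}); }
static Proxy ReadFile(Proxy fh)          { return RecordOp("ReadFile", {fh}); }
static Proxy LZCompress(Proxy buf)       { return RecordOp("LZCompress", {buf}); }
static void  WriteFile(Proxy buf)        { RecordOp("WriteFile", {buf}); }

// "just run the code" : looks imperative, but only records the graph
static void BuildGraph()
{
    Proxy fh  = OpenFile("c:\\junk");
    Proxy buf = ReadFile(fh);
    Proxy lz  = LZCompress(buf);
    WriteFile(lz);
}
```

Running BuildGraph() executes no IO at all; it just leaves the four-node dependency chain in g_graph, which you can then hand to a scheduler.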
Anyway, I think that this method is basically a terrible idea, except for one thing : creating the graph of the entire async operation before doing it can be a huge performance win. It allows you to "see the future" in terms of what the client wants to do, and thus make better scheduling decisions to maximize utilization of your available computation resources and disk at all times.
In the simplest case, if the client calls a huge Read and then a huge LZCompress after it, that's a dumb non-parallel way to do things, but in the normal imperative Oodle I can't do anything about it, because at the time I get the Read I don't know what's coming after it. If you gave me the graph, I could go : oh hey, during the Read I'm not using the CPU at all, and during the big LZCompress I'm not using the disk, so let me break those into smaller pieces and overlap them. Obviously you can also schedule IOs around the timeline so that they are issued early enough for their results to be back when needed. (though even with the full async graph you can't really schedule right unless you know how long the cpu operations are going to take).
There are even more subtle low level ways that not knowing the future gets me. In the whole worker thread system, there are crucial decisions like "should I wake up a new worker thread to take this work item, or wait for one of the existing worker threads to take it?" or "should I signal the worker threads as I make each item, or wait until I make them all?" ; or even just deciding which worker thread should take a task to maximize cache coherence. You cannot possibly get these decisions right without knowing the future.
Anyhoo, I don't think the advantage outweighs the horribleness of writing code this way, so on to the next one :
2. The coroutine auto-yielding system.
What if your file IO was always done from a coroutine, and instead of blocking when it didn't have the bytes needed, it just yielded the coroutine? That would give you fully async file IO (in the sense that you never block a thread just waiting on IO), and you could write code just like plain sync IO.
eg. you would just write :
// in a coroutine :
FILE * fp = fopen("c:\\junk","rb"); // yields the coroutine for the async open
int c = fgetc(fp); // might yield for a buffer fill
etc..
This is sort of appealing; it certainly makes it easy to write async IO code. I personally don't really love the fact that
the thread yielding is totally hidden; I like major functional operations to be clear in the imperative code.
The real practical problem is that you just can't make this nice in the hacky switch-based C coroutine method that I use. You really need a language that supports coroutines natively.
You can cook up a clunky way of doing this with the C coroutines, something like :
#define cofread(fp,buf,size) \
    for(;;) \
    { \
        { \
            /* Oodle_ReadOrReturnAsyncHandle returns 0 if the Read could complete without waiting */ \
            OodleAsyncHandle h = Oodle_ReadOrReturnAsyncHandle(fp,buf,size); \
            if ( ! h ) \
                break; \
            Coroutine_AddDependency(h); \
        } \
        COROUTINE_YIELD(); \
        Coroutine_FlushDoneDependencies(); \
    }
where COROUTINE_YIELD is my macro that does something like :
self->state = 7;
return;
case 7:
So now you can call cofread() from an Oodle coroutine and it kind of does what we want.
But, because of the fact that we use switches and returns, we can't use any stack variables in the arguments to cofread ; eg :
{
int size = 16384;
cofread(fp,buf,size);
}
is no good; if you resume at the YIELD point inside cofread, "size" is gone. (you'd get a "case statement skips variable initialization"
error or something like that).
Basically with the hacky C coroutine method you just can't do funny business like this where you hide the control flow; you have to make the yield points very explicit because they are points where you lose all your stack variables and must recreate them.
Perhaps a larger issue is that if you really were going to go with the full coroutine auto-yielding system, you'd want to be able to yield from inside function calls, not just at the root level of the coroutine. eg. you'd like to call functions that might do file IO or fire off worker tasks, and you want them to be able to yield too. That's not possible unless you have full stack-saving coroutines.
ADDENDUM for clarity :
It's totally trivial to fix the lack of stack saving in a limited way. All I have to do is reserve a few slots in the coroutine struct
that cofread can use to store its variables. So cofread becomes :
#define cofread(fp,buf,size) \
    co->m_fp = fp; co->m_buf = buf; co->m_size = size; \
    for(;;) \
    { \
        { \
            /* Oodle_ReadOrReturnAsyncHandle returns 0 if the Read could complete without waiting */ \
            OodleAsyncHandle h = Oodle_ReadOrReturnAsyncHandle(co->m_fp,co->m_buf,co->m_size); \
            if ( ! h ) \
                break; \
            Coroutine_AddDependency(h); \
        } \
        COROUTINE_YIELD(); \
        Coroutine_FlushDoneDependencies(); \
    }
and now you really can use cofread within a coroutine, and you can use local variables as arguments to it, and it yields if it can't
complete immediately, and that's all nice.
But it's *not* what I really want for this proposal, which is a full transparent system that a client can build their IO on. The problem is that cofread can only be called at the "root" level of a coroutine. That is, because the "yield" is not a true language yield that preserves function call stack, it must be in the base coroutine function.
eg. you can do :
void MyCoroutine1( coroutine * co )
{
COROUTINE_START()
g_fp = cofopen("blah");
cofread(g_fp,g_buf,1024);
COROUTINE_DONE()
}
That's easy. But you cannot do :
void MyHelper()
{
g_fp = cofopen("blah");
cofread(g_fp,g_buf,1024);
}
void MyCoroutine2( coroutine * co )
{
COROUTINE_START()
MyHelper();
COROUTINE_DONE()
}
and that lack of composability makes it unusable as a general purpose way to do IO.
To be super clear and redundant again - Oodle of course does support and extensively uses coroutine IO, but it is for small tasks that I want to have maximum performance (like, eg. read and decompress a Package), where the limitation of having to write all your yielding code within one function is okay. The idea of proposal #2 is to make a system that is visible to the client and totally transparent, which they could use to write all their game IO.
(ASIDE : there is a way to do this in C++ in theory (but not in practice). What you do is do all your yielding at the coroutine root level still, either using the switch method or the lambda method (doesn't really matter). To do yields inside function calls, what you do is have your IO routines throw a DataNotReady exception. Then at the coroutine root level you catch that exception and yield. When you resume from the yield, you retry the function call and should make it further this time (but might throw again). To do this, all your functions must be fully rewindable, that is they should be exception safe, and should use classes that back out any uncommitted changes on destruction. I believe this makes the idea technically possible, but unusable in reality).
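A toy version of that throw-to-yield idea, with hypothetical names and a fake "data arrives after two polls" counter standing in for real async IO. The point is that MyHelper can "yield" from arbitrarily deep in a call chain by throwing, and the root just retries the whole rewindable call :

```cpp
#include <cassert>

// the exception that stands in for "yield" from deep in a call chain
struct DataNotReady {};

static int g_pollsUntilReady = 2;   // fake : data "arrives" after 2 attempts

static int ReadOrThrow()
{
    if (g_pollsUntilReady > 0) { --g_pollsUntilReady; throw DataNotReady(); }
    return 42;                      // the "data"
}

static int g_result = 0;

// helper can do IO and "yield" even though it's not the coroutine root :
static void MyHelper() { g_result = ReadOrThrow(); }

// coroutine root : catch means yield-and-retry-later; for this to be
// correct, everything MyHelper did before throwing must be backed out
static bool RunCoroutineStep()
{
    try { MyHelper(); return true; }
    catch (DataNotReady) { return false; }   // real code would yield here
}
```

Each RunCoroutineStep is one resume; it reports false (yielded) twice, then true once the fake data is ready. The catch : MyHelper rewinds from scratch every retry, which is why all the work has to be exception safe and uncommitted.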