It occurs to me that this could massively simplify the giant API.
What you do is treat "array data" as a special type of object that can be linearly broken up. (I noted previously the idea of having RW locks in every object, and special-casing arrays by letting them be RW-locked in portions instead of always locking the whole buffer.)
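As a toy sketch of that region-locking idea: lock the array in fixed-size chunks rather than as a whole, so writers to disjoint regions never contend. Everything here (`RegionLock`, the chunk size) is invented for illustration, and it uses plain mutexes per chunk where a real implementation would use reader/writer locks:

```python
import threading

class RegionLock:
    # Hypothetical sketch: an array lockable in portions.
    # Plain mutexes stand in for proper RW locks to keep it short.
    def __init__(self, size, chunk=4096):
        self.chunk = chunk
        nchunks = (size + chunk - 1) // chunk
        self.locks = [threading.Lock() for _ in range(nchunks)]

    def _chunks(self, start, end):
        # indices of the chunk locks covering the byte range [start, end)
        return range(start // self.chunk, (end - 1) // self.chunk + 1)

    def acquire(self, start, end):
        for i in self._chunks(start, end):
            self.locks[i].acquire()

    def release(self, start, end):
        for i in self._chunks(start, end):
            self.locks[i].release()
```

Two writers touching [0,4096) and [4096,8192) take different chunk locks and never block each other, which is the whole point of not locking the buffer as a unit.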
Then arrays could have two special ways of running async:
1. Stream. A straightforward futures sequence to do something like read-compress-write would wait for the whole file read to be done before starting the compress. What you could do instead is have the read op immediately return a "stream future" which can dole out portions of the read as it completes. Any call that processes data linearly can be a streamer, so "compress" could also return a stream future, and "write" would then be able to write out compressed bits as they are made, rather than waiting on the whole op.
2. Branch-merge. This is less of an architectural thing than just a helper (you can easily write it client-side with normal futures); it takes an array and runs the future on portions of it, rather than running on the whole thing. But having this helper in from the beginning means you don't have to write lots of special case branch-merges to do things like compress large files in several pieces.
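The branch-merge helper really is just a few lines over ordinary futures, which is the point. A sketch (the name `branch_merge` and its signature are made up for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def branch_merge(fn, data, pieces, pool):
    # Split `data` into `pieces` roughly equal slices, run `fn` on each
    # slice in parallel, and concatenate the results in order.
    n = len(data)
    step = (n + pieces - 1) // pieces
    futs = [pool.submit(fn, data[i:i + step]) for i in range(0, n, step)]
    return b"".join(f.result() for f in futs)
```

Note that for something like compressing a large file in pieces you'd also need per-piece framing (concatenated compressed pieces aren't one decompressible stream), but the split/run/join skeleton is the same for any per-slice transform.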
So you basically just have a bunch of simple APIs that don't look particularly Async. Read just returns a buffer (future). ReadStream returns a buffer stream future. They look like simple buffer->buffer APIs and you don't have to write special cases for all the various async chains, because it's easy for the client to chain things together as they please.
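The chaining can be sketched with Python generators standing in for stream futures. This is single-threaded, so it only shows the dataflow shape, not the actual parallelism; `zlib.compressobj` is the incremental-compression piece, the rest of the names are invented:

```python
import io
import zlib

def read_stream(f, chunk=1 << 16):
    # "ReadStream": doles out pieces of the file as they are read,
    # standing in for a stream future.
    while True:
        buf = f.read(chunk)
        if not buf:
            return
        yield buf

def compress_stream(chunks):
    # A streaming Compress: consumes input pieces as they arrive and
    # emits compressed pieces, never holding the whole buffer.
    c = zlib.compressobj()
    for buf in chunks:
        out = c.compress(buf)
        if out:
            yield out
    yield c.flush()

def write_stream(chunks, out):
    # "write" drains the stream, writing compressed bits as they are made.
    for buf in chunks:
        out.write(buf)
```

The client chains them as it pleases: `write_stream(compress_stream(read_stream(src)), dst)` looks like three simple buffer-to-buffer calls, but no stage waits on the whole output of the previous one.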
To be redundant, the win is that you can write a function like Compress() just as you would a synchronous buffer-to-buffer function, but its arguments can be futures and its return value can be a future.
Compress() should actually be a stackful coroutine, so that if the input buffer is a Stream buffer, then when you try to access bytes that aren't yet available in that buffer, you Yield the coroutine (pending on the stream filling).
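The yield-when-bytes-aren't-ready shape can be sketched with a Python generator (stackless, unlike the stackful coroutines described here, but the control flow is the same). The sentinel, the driver, and all names are invented for illustration:

```python
import zlib
from collections import deque

WAIT = object()  # sentinel: coroutine is pending on the stream filling

def compress_coro(inbox, output):
    # Written like straight-line code over a growing input buffer:
    # when the next piece isn't available yet, yield WAIT instead of
    # blocking a thread.
    c = zlib.compressobj()
    while True:
        while not inbox:           # bytes not yet available in the stream
            got_more = yield WAIT  # park until the scheduler refills us
            if not got_more:       # stream finished: emit the tail
                output.append(c.flush())
                return
        output.append(c.compress(inbox.popleft()))

def drive(chunks, inbox, coro):
    # Toy scheduler: feeds stream pieces in and resumes the coroutine.
    coro.send(None)                # run until the first WAIT
    for ch in chunks:
        inbox.append(ch)
        coro.send(True)            # new bytes arrived, resume
    try:
        coro.send(False)           # end of stream
    except StopIteration:
        pass
```

In a real system the scheduler would be the worker threadpool and `WAIT` would park the coroutine on the upstream stream future, but the body of Compress stays synchronous-looking either way.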
Functions take futures as arguments and return futures.
Every function is actually run as a stackful coroutine on the worker threadpool.
Functions just look like synchronous code, but things like file IO cause a coroutine Yield rather than a thread Wait.
All objects are ref-counted and create automatic dependency chains.
All objects have built-in RW locks, arrays have RW locks on regions.
Parallelism is achieved through generic Stream and Branch/Merge facilities.
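The "functions take futures and return futures" rule can be sketched as a wrapper over ordinary `concurrent.futures`. The `futurize` name is made up; note this toy version blocks a pool thread on its input futures, which is exactly the thing the coroutine-Yield design above would avoid (and why a bounded pool can deadlock doing it this way):

```python
import zlib
from concurrent.futures import Future, ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=8)

def futurize(fn):
    # Wrap a plain synchronous function so callers may pass Futures as
    # arguments; it runs on the pool and returns a Future. Dependency
    # chains fall out automatically from waiting on the inputs.
    def wrapper(*args):
        def run():
            vals = [a.result() if isinstance(a, Future) else a
                    for a in args]
            return fn(*vals)
        return pool.submit(run)
    return wrapper

@futurize
def compress(buf):
    # written like a plain buffer -> buffer function
    return zlib.compress(buf)

@futurize
def concat(a, b):
    return a + b
```

A client can then chain `concat(compress(x), compress(y))` with nothing Async-looking anywhere in sight.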
While this all sounds very nice in theory, I'm sure in practice it wouldn't work. What I've found is that every parallel routine I write requires new hacky special-casing to make it really run at full efficiency.