Comments on cbloom rants: 12-21-12 - Coroutine-centric Architecture

boost.context is part of boost-libraries since 1.5...

2013-01-29T23:26:20.924-08:00

boost.context is part of boost-libraries since 1.51 (boost.org).
boost.coroutine will be released with boost-1.53 (next week).
boost.fiber is available at github.com/olk/boost-fiber (but it is not ready yet; it needs some fixes and documentation).

Cool. Where can I download boost.context and boos...

2013-01-29T11:21:37.509-08:00

Cool. Where can I download boost.context and boost.fiber?

"boost.fiber does not require something like a GC."

I didn't mean that implementing coroutines requires GC. I meant that I believe a realistic/usable larger application that is coroutine-centric should use GC, otherwise the lifetime management in micro-coroutines is too heinous.

Interestingly I'm currently working on such a ...

2013-01-29T10:51:43.158-08:00

Interestingly I'm currently working on such a library - it's called boost.fiber.

I would describe coroutines as a language feature and fibers as resource related (similar to threads).

boost.fiber uses boost.context (context swapping/switching) and provides an interface similar to boost.thread, e.g. the library contains classes like mutex, condition, future<>, unique_lock etc. (but does not use pthread or Windows-thread related stuff).

boost.fiber does not require something like a GC.

Synchronizing fibers running in different threads is possible too.

Work-stealing (usually used in thread-pools) can be done via fiber-stealing, e.g. one thread can steal (migrate) fibers from another thread.

If I get split-stacks working then we have a counterpart to Go in C++ too.

Most co-routine implementations I've seen lack...

2012-12-22T11:54:03.196-08:00

Most co-routine implementations I've seen lack the ability to carefully set up and share data between the "scheduler" and the coroutine itself.

What I mean is - you can always make coroutines function by saving the entire execution context of the coroutine every time you switch away, and restoring it every time you switch back. But if you're switching between coroutines that share a lot of state, it's not necessarily very efficient.

Ideally you'd have some notion of "sibling" coroutines - and the notion that when you yield you're only going to yield to a sibling, not to just any coroutine anywhere in the entire program. In 99% of cases the sibling is exactly the same code, just in a different instance. That allows the compiler to make certain assumptions about which registers are live, which will be dirtied, etc.

It also allows the compiler to schedule coroutines into each other. For example - you know that whenever you start a coroutine of this family, the first thing it does is run a prologue that loads the top four stack entries into registers r0-r3 - this happens on every entry point to the coroutine. OK, so if you're somewhere that you know is heading for a switch to this type of coroutine, you can do this there, rather than wait until you're actually in the coroutine wanting to use those registers as addresses or whatever.

In functions there's already some of the above functionality in function signatures (i.e. a particular signature makes a family of functions - you don't know which one you're going to call, but it's one of only a few) - but it seems like the same concept would be useful in coroutines.

The above was all learned doing pixel shaders as coroutines of course - you get a substantial perf increase.

Talk on Concurrency in Go that shows off the primi...

2012-12-22T09:55:58.587-08:00

Talk on Concurrency in Go that shows off the primitives here.

On the non-functional side, there's a few new ...

2012-12-22T04:27:42.604-08:00

On the non-functional side, there's a few new imperative languages with that kind of design too, most prominently Go (www.golang.org) and Rust (www.rust-lang.org). Both check pretty much every point on your check list, with one distinction: neither automatically promotes return values to futures. They do both have built-in communication primitives though: Go has channels and Rust has pipes, and the idiomatic way to handle this in both languages is to pass a channel/pipe end point in to the coroutine and have it send a value once it's done. (Both also have lambdas, so it's easy to wrap a function that doesn't send its return value that way.)
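To illustrate the "wrap it in a lambda and send the result on a channel" idiom in Go, here is a minimal sketch. The names `square` and `async` are made up for this example and are not part of any library; `async` is just one way to get a future-like handle out of a plain function.

```go
package main

import "fmt"

// square is an ordinary function that knows nothing about channels.
func square(x int) int {
	return x * x
}

// async is a hypothetical helper: it runs f(x) in a goroutine and
// returns a channel that will carry the single result.
func async(f func(int) int, x int) <-chan int {
	out := make(chan int, 1) // buffered, so the goroutine never blocks on send
	go func() {
		out <- f(x) // the lambda forwards the return value to the channel
	}()
	return out
}

func main() {
	result := async(square, 7)
	// ... other work could happen here while square runs ...
	fmt.Println(<-result) // receiving blocks until the value has been sent
}
```

The returned channel plays the role of the future: nothing blocks until somebody actually receives from it.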

Neither has batch-starting (that I've seen) or any explicit dependency tracking between coroutines; again they instead phrase this in terms of the communication primitives: if you're waiting for something to finish before you start a dependent event, make it send you a dummy value on completion. Both do have async waits for multiple events at once (usually wait-any, not wait-all).
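A rough Go sketch of both patterns follows: a dependency expressed as a dummy value sent on completion, and a wait-any over two channels using `select`. The helper names `runInOrder` and `firstOf` are invented for this example.

```go
package main

import "fmt"

// runInOrder starts first in a goroutine and runs second only after
// first has signalled completion with a dummy value.
func runInOrder(first, second func()) {
	done := make(chan struct{})
	go func() {
		first()
		done <- struct{}{} // dummy value meaning "I'm finished"
	}()
	<-done // block until the dependency has completed
	second()
}

// firstOf waits for whichever of the two channels delivers a value
// first (wait-any) and returns that value.
func firstOf(a, b <-chan string) string {
	select {
	case v := <-a:
		return v
	case v := <-b:
		return v
	}
}

func main() {
	var order []string
	runInOrder(
		func() { order = append(order, "load") },
		func() { order = append(order, "process") },
	)
	fmt.Println(order)

	fast := make(chan string, 1)
	slow := make(chan string) // nothing is ever sent here in this sketch
	fast <- "fast"
	fmt.Println(firstOf(fast, slow))
}
```

A wait-all is not built in, but it can be had by simply receiving from each completion channel in turn.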

Both have proper coroutines with stack-saving and small default stack size (~4k). Both use segmented stacks that grow dynamically and aren't necessarily contiguous (so there's no need to reserve large amounts of address space per coroutine).

Neither will expose any of the OS threading primitives by default, and instead use proper async versions that yield instead of blocking.

Both are GC'ed; Rust has a peculiar but interesting model where everything defaults to stack allocated (with compiler-verified memory safety, i.e. pointers to stack variables may not escape their assigned lifetime) and there's two separate heaps: the "managed heap", which is per-coroutine and fully GC'ed (the compiler enforces that such values don't become visible to other coroutines), and the "exchange heap", which is shared but may only contain objects that have exactly one pointer to them (similar to std::unique_ptr). Because the managed heaps are all separate, they can be GCed on a per-coroutine basis without problems. Interesting model, but I have no idea how well it works in practice. Go is just "GC everything that can't be on the stack", which is less efficient but also a lot simpler to reason about :)

Neither has per-object locks; both instead emphasize using the built-in comm primitives over shared memory. Seems like the right call for normal usage. Go now also comes with a built-in race checker courtesy of Dmitry Vyukov for the parts that do use shared memory.

Both express dependencies by arguments and/or explicit passing of comm channels (which are value types).

Haven't done anything with Rust yet. I've done some simple stuff with Go and really like it so far.

Also, look at Racket (for example) which has a nic...

2012-12-22T03:27:19.757-08:00

Also, look at Racket (for example) which has a nice (but unfortunately not usually efficient, due to GC issues) and consistent way to do multi-threaded computations along with continuations: it basically mixes future/touch to off-load expressions to other threads and call/cc to emulate coroutines (and exceptions, return...).

At least, it is a good inspiration :-) and syntactically beautiful.

CPS code transformation is basically what you want...

2012-12-22T03:16:39.323-08:00

CPS code transformation is basically what you want here.

CPS is basically the way code is transformed (by the compiler, not a human :-)) in most Scheme compilers. Each function is given the next function to execute. This makes call/cc free, and therefore one-shot continuations (coroutines) are also free.

Note that this requires proper tail-call recursion optimization but you do not need to save the stack.
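As a small illustration of what the transform does, here is factorial in direct style and after a hand-applied CPS conversion, sketched in Go. The function names are made up for the example. Note that Go does not guarantee tail-call elimination, so unlike in a Scheme compiler the stack still grows here; the sketch only shows the shape of the transformed code.

```go
package main

import "fmt"

// factDirect is the ordinary, direct-style version: it returns its result.
func factDirect(n int) int {
	if n == 0 {
		return 1
	}
	return n * factDirect(n-1)
}

// factCPS is the same function in continuation-passing style: instead of
// returning, it hands its result to the continuation k ("what to do next").
// Every call is in tail position, so nothing is left to do after it.
func factCPS(n int, k func(int)) {
	if n == 0 {
		k(1)
		return
	}
	factCPS(n-1, func(r int) {
		k(n * r) // the pending multiplication now lives in the closure
	})
}

func main() {
	fmt.Println(factDirect(5))
	factCPS(5, func(r int) { fmt.Println(r) })
}
```

Because the "rest of the computation" is an explicit value `k`, suspending amounts to storing `k` somewhere and calling it later, which is why continuations come for free in this style.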

However, while it is pretty straightforward in Scheme, where you only have expressions and only very few primary constructs (lambda, if, set!, define mostly), I am not sure a CPS transform is doable in C, which has statements and expressions.