Any time you are in a work item, if you decide that you can get some more parallelism by doing a branch-merge inside that item, you need deep yield.
Remember that you should never do an OS wait on a coroutine thread (with normal threads, anyway; on a WinRT threadpool thread you can). The OS wait puts that worker thread to sleep, so you have one fewer worker running. In the worst case it leads to deadlock: every worker thread can be asleep waiting on work items, and with no worker threads left to run them, those items never get done.
Anyway, I've cooked up a temporary work-around. It looks like this :
I'm in some function and I want to branch-merge

If I'm not on a worker thread :
    -> just do a normal branch-merge : send the work off and use a Wait for completion

If I am on a worker thread :
    inc the target worker thread count
    if # of currently live worker threads is < target count
        start a new worker thread (either create or wake from pool)
    now do the branch-merge and use OS Wait
    dec the target worker thread count

on each worker thread, after completing a work item and before popping more work :
    if target worker thread count < currently live count
        stop self (go back into a sleeping state in the pool)
This is basically using OS threads to implement stack-saving deep yield. It's not awesome, but it is okay if deep yield is rare.
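For concreteness, here is a minimal C++ sketch of the scheme above. It is not the real implementation. All the names (`Pool`, `branch_merge`, `parallel_sum`, `t_on_worker`) are made up for illustration, and it takes two shortcuts: a replacement worker is always created fresh rather than woken from a pool, and a surplus worker "stops self" by exiting rather than going back to sleep.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

static thread_local bool t_on_worker = false;  // true only on pool worker threads

class Pool {  // hypothetical name; a sketch, not a production thread pool
public:
    using Job = std::function<void()>;

    explicit Pool(int n) : target_(n) {
        std::lock_guard<std::mutex> g(m_);
        for (int i = 0; i < n; ++i) spawn_locked();
    }
    // Assumes every branch_merge has already returned, so nothing spawns threads now.
    ~Pool() {
        { std::lock_guard<std::mutex> g(m_); quit_ = true; }
        cv_.notify_all();
        for (auto& t : threads_) t.join();
        assert(q_.empty());
    }
    // Run all jobs on the pool and block until every one has finished.
    void branch_merge(const std::vector<Job>& jobs) {
        struct Join { std::mutex m; std::condition_variable cv; std::size_t left; };
        auto join = std::make_shared<Join>();
        join->left = jobs.size();
        const bool deep = t_on_worker;
        if (deep) {  // on a worker: inc target, start a replacement if needed
            std::lock_guard<std::mutex> g(m_);
            ++target_;
            if (live_ < target_) spawn_locked();
        }
        for (const Job& j : jobs) {
            push([join, j] {
                j();
                std::lock_guard<std::mutex> g(join->m);
                if (--join->left == 0) join->cv.notify_all();
            });
        }
        {  // the OS wait
            std::unique_lock<std::mutex> lk(join->m);
            join->cv.wait(lk, [&] { return join->left == 0; });
        }
        if (deep) {  // dec target; some worker will retire at its next check
            std::lock_guard<std::mutex> g(m_);
            --target_;
        }
    }

private:
    void push(Job j) {
        { std::lock_guard<std::mutex> g(m_); q_.push_back(std::move(j)); }
        cv_.notify_one();
    }
    void spawn_locked() {  // caller must hold m_
        ++live_;
        threads_.emplace_back([this] { worker_loop(); });
    }
    void worker_loop() {
        t_on_worker = true;
        std::unique_lock<std::mutex> lk(m_);
        for (;;) {
            if (live_ > target_) break;  // surplus worker: stop self
            cv_.wait(lk, [&] { return quit_ || !q_.empty(); });
            if (q_.empty()) break;       // quit requested and nothing left to run
            Job job = std::move(q_.front());
            q_.pop_front();
            lk.unlock();
            job();
            lk.lock();
        }
        --live_;
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::deque<Job> q_;
    std::vector<std::thread> threads_;
    int target_;
    int live_ = 0;
    bool quit_ = false;
};

// Recursive branch-merge: every level above the leaves waits inside a work item.
long parallel_sum(Pool& pool, const std::vector<int>& v, std::size_t lo, std::size_t hi) {
    if (hi - lo <= 4) return std::accumulate(v.begin() + lo, v.begin() + hi, 0L);
    const std::size_t mid = lo + (hi - lo) / 2;
    long a = 0, b = 0;
    pool.branch_merge({[&] { a = parallel_sum(pool, v, lo, mid); },
                       [&] { b = parallel_sum(pool, v, mid, hi); }});
    return a + b;
}
```

With `Pool pool(1)`, calling `parallel_sum` from the main thread would deadlock without the target-count trick, because the single worker would block on its own children. With it, each blocked worker lends the pool a replacement thread first. The cost is visible too: the number of OS threads grows with the number of simultaneously blocked work items, which is why this is only okay when deep yield is rare.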