I already knew about the evils of sleep-loops and the goodness of events for doing thread waiting; I just got lazy in this particular loop and did it the sleep way cuz it's easier. Perfectionist bosses can be annoying when you feel like you did something well enough, but they're also damn helpful, and they push you to be honest about the quality of your work. It's the role I used to play at OW, and I missed having it when I was working on my own. Anyway, I fixed it, the missing time gap disappeared, and now I am happy. (This could also be a blog about the awesomeness of good tools - I never would have known about the time gap without the threadprofiler and viewer.)
BTW about Windows events : something that has been hammered home for me recently is that Windows manual-reset events are just a race condition waiting to happen. !! Always use auto-reset events !!
For example, say the worker thread signals you once in a while to let you know certain things have happened. If you have code like this in the main thread with a manual-reset event :

WaitForSingleObject( hEvent, INFINITE );
// (*) race here
ResetEvent( hEvent );
That's a race. If your thread is swapped out after the wait but before the reset, other threads can run along and signal the event. Then execution switches back to you, you reset the event, and you lose any signals that were set in the meantime.
I have often used manual-reset events successfully in the past, because I use them to signal that a shared data item has changed, and the item itself is protected by a critical section. As long as you take the critical section and use the shared data to re-evaluate whether to wait, that pattern is safe from races even with a manual-reset event (it doesn't matter that you may miss some event signals, because you will see the change when you check the data). But now that I am doing more lock-free stuff, it's more important to get the event right, and hell, there's no reason not to just use auto-reset always, so do!! (When in doubt, be safer.)
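That safe pattern - the event only means "something may have changed," and the receiver re-checks the shared data under the lock - maps directly onto a condition variable with a predicate. Here's a sketch in portable C++ rather than Win32 (the names `gPending`, `producer`, `consumeAll` are made up for illustration):

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// The event only says "something may have changed"; the receiver takes the
// lock and re-evaluates the shared data itself, so a missed or spurious
// wakeup is harmless - exactly the manual-reset-plus-critsec pattern above.
std::mutex gLock;                 // plays the role of the critical section
std::condition_variable gChanged; // plays the role of the event
int gPending = 0;                 // the shared data item being protected

void producer(int n)
{
    for (int i = 0; i < n; ++i)
    {
        { std::lock_guard<std::mutex> lk(gLock); ++gPending; }
        gChanged.notify_one();    // like SetEvent
    }
}

int consumeAll(int expected)
{
    int seen = 0;
    std::unique_lock<std::mutex> lk(gLock);
    while (seen < expected)
    {
        // re-evaluate the shared data before deciding to wait:
        gChanged.wait(lk, []{ return gPending > 0; });
        seen += gPending;         // consume everything that accumulated
        gPending = 0;
    }
    return seen;
}
```

Note that even if several `notify_one` calls collapse into one wakeup, nothing is lost: the consumer sees all the accumulated changes when it reads `gPending`.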
And now a teaser/brag. This is the performance of the work dispatcher :
linear    : 0.770 seconds, 2312158485 clocks, 100.000% of linear time
1 thread  : 0.770 seconds, 2313504188 clocks, 100.058%
2 threads : 0.386 seconds, 1160138775 clocks,  50.175% , speedup = 1.993 = 0.99650 per core
3 threads : 0.258 seconds,  774830107 clocks,  33.511% , speedup = 2.984 = 0.99467
4 threads : 0.194 seconds,  582231143 clocks,  25.181% , speedup = 3.971 = 0.99275
5 threads : 0.194 seconds,  582824062 clocks,  25.207%
6 threads : 0.194 seconds,  583144837 clocks,  25.221%
7 threads : 0.194 seconds,  583397962 clocks,  25.231%
8 threads : 0.194 seconds,  582415282 clocks,  25.189%

Making a work dispatcher with almost perfect scaling is not really that impressive (there are plenty of existing ones around the net - and actually if you are Windows-only there are some good ones built directly into Windows), but there are some things that I am happy about :
The loss from going from single-thread linear to dispatched in packets is very close to zero.
The loss from running lots of small packets as opposed to a few big ones is very close to zero. (I tried packets of between 4 and 64 items and it made almost no difference).
There's no slowdown when you have too many threads (obviously this is a 4 core machine), which is a common flaw in broken work dispatchers (they start getting slower and slower with too many threads because they have contention issues and they fight each other).
(I should say this testing is on a quad-core chip, which doesn't show scalability problems nearly as much as a true 4-proc machine with separate caches, so there may still be scalability flaws I don't know about)
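To make the packet idea concrete, here is a minimal sketch of packet-based dispatch - NOT the dispatcher measured above, just an illustration under assumed names (`dispatch`, `packetSize`, and the squaring "work" are all made up). Workers pull packets off a shared atomic cursor, so the only contended operation is one fetch_add per packet rather than per item, which is why small packets can stay cheap:

```cpp
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

// Worker threads grab packets of [begin,end) index ranges from a shared
// atomic cursor until the work runs out. Contention is one fetch_add per
// packet; extra threads beyond the item count just exit immediately.
void dispatch(int numThreads, int numItems, int packetSize,
              std::vector<long long>& out)
{
    std::atomic<int> cursor{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < numThreads; ++t)
    {
        pool.emplace_back([&]{
            for (;;)
            {
                int begin = cursor.fetch_add(packetSize);
                if (begin >= numItems) return;   // no work left
                int end = std::min(begin + packetSize, numItems);
                for (int i = begin; i < end; ++i)
                    out[i] = (long long)i * i;   // stand-in for real per-item work
            }
        });
    }
    for (auto& th : pool) th.join();
}
```

A real dispatcher adds a lot on top of this (persistent worker threads, events to sleep idle workers, per-packet completion signaling), but the atomic-cursor core is what keeps the per-packet overhead near zero.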