10-02-12 - Small note on Buffered IO Timing

On Windows, Oodle by default uses OS buffering for reads and does not use OS buffering for writes. I believe this is the right way to go 99% of the time (for games).

See previous notes on how Windows buffering works and why this is fastest:
cbloom rants 10-06-08 - 2
cbloom rants 10-07-08 - 2
cbloom rants 10-09-08 - 2

Not buffering writes also has advantages beyond raw speed, such as not polluting the file cache. If you buffer writes, first some existing cache page is evicted, then the page is zeroed, then your bytes are copied in, and finally it goes out to disk. Particularly if you are streaming out large amounts of data, there's no need to evict a bunch of read-cached data to make room for your write pages (which is what Windows will do, because its page allocation strategy is very greedy).

(the major exception to unbuffered writes being best is if you will read the data soon after writing; eg. if you're writing out a file so that some other component can read it in again immediately; that usage is relatively rare, but important to keep in mind)

Anyhoo, this post is a small note to remind myself of a caveat :

If you are benchmarking apps by their time to run (eg. as an exe on a command line), buffered writes can appear to be much much faster. The reason is that the writes are not actually done when the app exits. When you do a WriteFile to a buffered file, it synchronously reserves the page and zeroes it and copies your data in. But the actual writing out to disk is deferred and is done by the Windows cache maintenance thread at some later time. Your app is even allowed to exit completely with those pages unwritten, and they will trickle out to disk eventually.

For a little command line app, this is a better experience for the user - the app runs much faster as far as they are concerned. So you should probably use buffered writes in this case.

For a long-running app (more than a few seconds) that doesn't care much about the edge conditions around shutdown, you care more about speed while your app is running (and also CPU consumption) - you should probably use unbuffered writes.

(the benefit for write throughput is not the only compelling factor; unbuffered writes also consume less CPU because they avoid a memset and a memcpy)
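One way to keep a benchmark honest about the caveat above is to time the write twice: once up to the point where the write calls return, and once up to the point where the OS says the data has been flushed. The following is only a minimal sketch, not what Oodle does. `WriteTiming` and `time_buffered_write` are made-up names, and it goes through C stdio with `fsync` (or `_commit` on the Windows CRT) as a portable stand-in; with raw Win32 handles the analogous call would be `FlushFileBuffers`.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

#ifdef _WIN32
#include <io.h>      // _commit
#else
#include <unistd.h>  // fsync
#endif

// Hypothetical result type for this sketch.
struct WriteTiming {
    double write_seconds;  // until the write calls returned (data is in the OS cache)
    double total_seconds;  // until the OS reported the data flushed
    bool ok;               // true if every byte was written and the flush succeeded
};

// Hypothetical helper: write `size` bytes through the OS cache and time both phases.
WriteTiming time_buffered_write(const char* path, std::size_t size) {
    using clock = std::chrono::steady_clock;
    std::vector<char> data(size, 'x');
    WriteTiming t{0.0, 0.0, false};

    std::FILE* f = std::fopen(path, "wb");
    if (!f) return t;

    auto start = clock::now();
    std::size_t written = std::fwrite(data.data(), 1, data.size(), f);
    std::fflush(f);  // push stdio's own buffer down into the OS cache
    auto after_write = clock::now();

    // Ask the OS to actually write the cached pages out before stopping the clock.
#ifdef _WIN32
    int rc = _commit(_fileno(f));
#else
    int rc = fsync(fileno(f));
#endif
    auto after_flush = clock::now();
    std::fclose(f);

    t.write_seconds = std::chrono::duration<double>(after_write - start).count();
    t.total_seconds = std::chrono::duration<double>(after_flush - start).count();
    t.ok = (written == size) && (rc == 0);
    return t;
}
```

If you only time the process from the command line, you are roughly measuring `write_seconds`; the gap between the two numbers is the work that a buffered write leaves for later.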


won3d said...

Write buffering is in the class of "try to improve average throughput at the cost of worst-case latency" strategies, like JIT compilation, garbage collection, or even "smart" mallocs.

cbloom said...

I think it's one of those things where if you actually think about your usage, you can decide whether to do it or not for each case. Unfortunately almost no one does that, they just either turn it on all the time or off all the time.

Of course the same is true for things like mallocs; you shouldn't just have a malloc & free, you should have several that are designed for different use cases, and think about which one to use. But nobody wants that, they just want the magic solution they don't have to think about.

(eg. with something like tcmalloc/hoard you should have at least 2 variants of free :

free to the current thread's pool
free to the allocating thread's pool

because there's no way for the allocator to know the right answer, it needs the client app to tell it.

And with a flag in both cases for whether a free that makes a whole block free leaves that block reserved or returns it to the system)
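To make the two variants of free concrete, here is a toy sketch. It is not how tcmalloc or hoard work internally. `Pool`, `pool_alloc`, `free_to_current_pool` and `free_to_allocating_pool` are all hypothetical names, blocks are an assumed fixed size, each pool stands in for one thread's cache, and the reserve-or-return flag is left out to keep it short.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>

// Toy stand-in for one thread's pool: a locked free list of fixed-size blocks.
struct Pool {
    struct Node { Node* next; Pool* owner; };  // header stored in front of each payload
    std::mutex lock;
    Node* free_list = nullptr;
    std::size_t free_count = 0;

    void push(Node* n) {
        std::lock_guard<std::mutex> g(lock);
        n->next = free_list;
        free_list = n;
        ++free_count;
    }
    Node* pop() {
        std::lock_guard<std::mutex> g(lock);
        Node* n = free_list;
        if (n) { free_list = n->next; --free_count; }
        return n;
    }
};

constexpr std::size_t kPayloadSize = 64;  // assumed fixed block size for this sketch

// Allocate from `pool`, recording it as the block's owner.
void* pool_alloc(Pool& pool) {
    Pool::Node* n = pool.pop();
    if (!n) n = static_cast<Pool::Node*>(std::malloc(sizeof(Pool::Node) + kPayloadSize));
    if (!n) return nullptr;
    n->owner = &pool;
    return n + 1;  // payload starts right after the header
}

static Pool::Node* header_of(void* p) { return static_cast<Pool::Node*>(p) - 1; }

// Variant 1: the block stays with whichever thread is doing the free.
void free_to_current_pool(Pool& current, void* p) { current.push(header_of(p)); }

// Variant 2: the block goes back to the pool it was allocated from.
void free_to_allocating_pool(void* p) {
    Pool::Node* n = header_of(p);
    n->owner->push(n);
}
```

In a producer/consumer setup, for example, the first variant lets memory pile up in the consumer's pool while the producer keeps asking the system for more, and the second variant hands it back to the producer. The allocator can't tell which pattern you are in, so the caller picks.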
