01-30-09 - SetFileValidData and async writing

Extending files and writing them asynchronously on Windows is a bad situation. See for example "Asynchronous Disk I/O Appears as Synchronous on Windows". Basically you should just not even try, but if you really want to give it a go :

1. Open your file with FILE_FLAG_OVERLAPPED and also FILE_FLAG_NO_BUFFERING. Buffering helps reads, but it severely hurts writes. It takes my write speeds from 80 MB/sec down to about 30 MB/sec. I suspect that the reason is that buffering causes the pages to be read in before they are written out. (It's sort of like using cached memory: it will fetch in the pages even though you're doing nothing but stomping all over them.)

2. Use the undocumented NtSetInformationFile to resize the file to its full size before you write anything. SetEndOfFile will only work for page size granularity; NtSetInformationFile can do arbitrary sizes. BTW this is also better for fragmentation than just writing lots of data onto the end of the file, which can cause NTFS to give you lots of allocation chains.

3. Use SetFileValidData to tell Windows that whatever is in the sectors on disk is okay. If you don't use SetFileValidData, Windows will first zero out each sector before you get to touch it. This is a security thing, but obviously it's pretty bad for perf to basically write the whole file twice. SetFileValidData will fail unless you first ask for your process to get the right privilege (SE_MANAGE_VOLUME_NAME), which of course will only work for processes running as administrator. Okay, I did all that. (This post is okay, but dear god don't read the thread.)
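
A rough C++ sketch of the three steps above, under some loudly stated assumptions. The helper names (round_up, enable_manage_volume_privilege, open_presized, start_write) are made up for illustration. The resize uses the documented SetFileInformationByHandle (it comes up in the comments; Vista and later) in place of NtSetInformationFile. Error handling is minimal, and the Win32 part only compiles on Windows.

```cpp
#include <cassert>
#include <cstdint>

// Round n up to a multiple of align. With FILE_FLAG_NO_BUFFERING, write
// offsets and lengths have to be multiples of the volume sector size.
inline std::uint64_t round_up(std::uint64_t n, std::uint64_t align) {
    assert(align != 0);
    return (n + align - 1) / align * align;
}

#ifdef _WIN32
#include <windows.h>

// Prerequisite for step 3: enable SE_MANAGE_VOLUME_NAME on this process.
// This normally only succeeds when running elevated (as administrator).
inline bool enable_manage_volume_privilege() {
    HANDLE token = NULL;
    if (!OpenProcessToken(GetCurrentProcess(),
                          TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token))
        return false;
    TOKEN_PRIVILEGES tp = {};
    tp.PrivilegeCount = 1;
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    // AdjustTokenPrivileges can return TRUE without granting anything;
    // GetLastError() is ERROR_NOT_ALL_ASSIGNED in that case.
    bool ok = LookupPrivilegeValue(NULL, SE_MANAGE_VOLUME_NAME,
                                   &tp.Privileges[0].Luid)
           && AdjustTokenPrivileges(token, FALSE, &tp, 0, NULL, NULL)
           && GetLastError() == ERROR_SUCCESS;
    CloseHandle(token);
    return ok;
}

// Steps 1-3: open overlapped + unbuffered, set the final size up front,
// then mark the whole range valid so Windows skips the zero fill.
inline HANDLE open_presized(const char* path, std::uint64_t final_size) {
    HANDLE h = CreateFileA(path, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                           FILE_FLAG_OVERLAPPED | FILE_FLAG_NO_BUFFERING, NULL);
    if (h == INVALID_HANDLE_VALUE) return h;

    FILE_END_OF_FILE_INFO eof;
    eof.EndOfFile.QuadPart = (LONGLONG)final_size;
    if (!SetFileInformationByHandle(h, FileEndOfFileInfo, &eof, sizeof(eof))) {
        CloseHandle(h);
        return INVALID_HANDLE_VALUE;
    }
    // Not treated as fatal: without the privilege this fails and writes
    // simply fall back to the slower zero-filling path.
    SetFileValidData(h, (LONGLONG)final_size);
    return h;
}

// Issue one overlapped write. buf, offset and len must all be sector
// aligned (VirtualAlloc memory is page aligned, which is enough).
// *ov must be zero-initialized by the caller and stay alive until the
// write completes. Returns true if the write finished or is pending.
inline bool start_write(HANDLE h, const void* buf, DWORD len,
                        std::uint64_t offset, OVERLAPPED* ov) {
    ov->Offset     = (DWORD)(offset & 0xFFFFFFFFu);
    ov->OffsetHigh = (DWORD)(offset >> 32);
    if (WriteFile(h, buf, len, NULL, ov)) return true; // completed inline
    return GetLastError() == ERROR_IO_PENDING;         // actually async
}
#endif
```

The intended order is enable_manage_volume_privilege() once at startup, then open_presized(), then start_write() for each chunk with the length padded via round_up(), and GetOverlappedResult() to wait for each write. If the last chunk had to be padded past the real size, the file needs to be cut back to its true length after the final write.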

If you do all those things right, your WriteFile() calls will actually be asynchronous. The actual speed win is not huge. Part of the problem is the next issue :

When I do all that, I start hitting some kind of weird OS buffer filling issue. I haven't tried to track down exactly what's happening because I don't really care that much, but what I see is that the writes are totally async and very fast (> 100 MB/sec) for some random amount of time (usually up to about 10 MB of writing or so) and then suddenly randomly start having huge delays. The write speed then goes down to 40 MB/sec or so.

ADDENDUM : when I say "you should not even try" I mean you should just live with WriteFile() being synchronous. It's plenty fast. Just run it from a thread and it still looks async to your main thread (you need a thread anyway because CreateFile and CloseHandle are very slow and synchronous; in fact the only thing you can rely on actually being fast and async is ReadFile). Also just live with the fact that Windows is zeroing the disk before you write it; everyone else does.
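
The "just run it from a thread" approach might look something like this minimal, portable C++ sketch. It uses plain blocking stdio calls rather than WriteFile, and the names write_file_blocking and write_file_async are hypothetical; the point is only that open, write and close all happen off the calling thread.

```cpp
#include <cstddef>
#include <cstdio>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Open, write and close with ordinary blocking calls.
// Returns the number of bytes written, or 0 on failure.
inline std::size_t write_file_blocking(const std::string& path,
                                       const std::vector<char>& data) {
    std::FILE* f = std::fopen(path.c_str(), "wb");  // the open may be slow
    if (!f) return 0;
    std::size_t n = data.empty()
                  ? 0
                  : std::fwrite(data.data(), 1, data.size(), f);
    if (std::fclose(f) != 0) return 0;              // so may the close
    return n;
}

// Run the whole blocking sequence on another thread. The caller gets a
// future back immediately and can keep working until it needs the result.
inline std::future<std::size_t> write_file_async(std::string path,
                                                 std::vector<char> data) {
    return std::async(std::launch::async,
                      [p = std::move(path), d = std::move(data)] {
                          return write_file_blocking(p, d);
                      });
}
```

From the caller's side, write_file_async(path, std::move(buffer)) returns right away and .get() on the future blocks only when the result is actually needed. The buffer is moved into the task so the caller can't stomp on it while the write is in flight.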


nothings said...

Are you sure the hard drive can actually do more than 40MB/s?

msnopenid said...

SetFileInformationByHandle should do the same thing as NtSetInformationFile, unless I'm wrong or high.

Also, fragmentation might still be an issue even when you set both the total size and the allocated size.


cbloom said...

Yeah, SetFileInformationByHandle looks promising, but it's >= Vista only.

Fragmentation on NTFS is definitely like playing the lottery, but I have found that extending a file over and over with appends almost always gives you bad fragmentation. NTFS seems to be pretty greedy about putting the initial file chains in small portions of the disk free space, which means it can't do the append contiguously.

On the other hand if you do a big reserve up front, you at least have a hope of getting an unfragmented block (though it's not guaranteed).

In fact, I've started to just always reserve at least 1 MB for files I write, and then if I wind up writing them smaller, I just truncate them down. I haven't yet found a disadvantage to this.
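
A small portable sketch of the reserve-then-truncate idea, using C++17 std::filesystem rather than the Win32 calls discussed above. The names kMinReserve, reserve_file and finish_file are hypothetical. Note that resize_file only sets the logical file size; whether the filesystem actually hands out a contiguous block (or on some systems just makes a sparse file) is up to the OS.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// 1 MB, the minimum reservation mentioned in the text.
constexpr std::uintmax_t kMinReserve = 1u << 20;

// Create (or truncate) the file, then size it up front to at least
// kMinReserve bytes, or to expected_size if that is larger.
inline void reserve_file(const fs::path& p, std::uintmax_t expected_size) {
    { std::ofstream create(p, std::ios::binary | std::ios::trunc); }
    fs::resize_file(p, expected_size > kMinReserve ? expected_size
                                                   : kMinReserve);
}

// Once writing is done, cut the file back to the bytes actually used.
inline void finish_file(const fs::path& p, std::uintmax_t bytes_written) {
    fs::resize_file(p, bytes_written);
}
```

Between the two calls the file has to be opened for update (for example std::fstream with in | out | binary), since opening it with a truncating mode would throw the reservation away.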

msnopenid said...

It actually works on XP too, but you need to download http://www.microsoft.com/downloads/details.aspx?FamilyId=1DECC547-AB00-4963-A360-E4130EC079B8&displaylang=en



cbloom said...

Ah cool, that looks like just some wrappers to expose the undocumented internals.
