06-03-07 - 3

NTFS really gets destroyed by lots of small flushes. If you take a file, write a byte to the end of it, close it, write a byte, close it, over and over, you will really fuck up the NTFS records for that file, and all future accesses to that file (reads and writes alike) will be incredibly slow.

The way NTFS tracks files is that each file is a linked list of contiguous runs. Each link in the list is a sector address and a count of sectors defining one contiguous chunk. If the file is in one contiguous spot, it's just one link. As the file gets fragmented, NTFS adds links pointing to other spots on the disk.
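A toy model of that run list makes the cost concrete. This is an illustrative sketch, not the actual NTFS on-disk format; the class and method names are made up.

```python
# Toy model of a per-file run list: each run is one contiguous chunk,
# stored as (start_sector, num_sectors). Purely illustrative.

class File:
    def __init__(self):
        self.runs = []  # list of (start_sector, num_sectors)

    def append_run(self, start, count):
        last = self.runs[-1] if self.runs else None
        # if the new chunk directly follows the last run, extend in place
        if last and last[0] + last[1] == start:
            self.runs[-1] = (last[0], last[1] + count)
        else:
            self.runs.append((start, count))

    def fragment_count(self):
        return len(self.runs)

f = File()
f.append_run(100, 8)   # one contiguous chunk: 1 fragment
f.append_run(108, 4)   # adjacent, coalesces: still 1 fragment
f.append_run(500, 1)   # lands elsewhere on disk: now 2 fragments
```

The key point is the last line: a contiguous file stays one link no matter how big it gets, but every non-adjacent append adds a link forever.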

Basically what I imagine is happening is that NTFS behaves like a non-relocating, greedy, hole-filling linked-list allocator. First, when you flush a small file, NTFS seems to stick it in perhaps the smallest space it finds that can hold the file, or perhaps just the first space it finds, but definitely not the biggest space. Then when it needs to add a sector to the file, it doesn't move the whole file to a spot that can hold it contiguously; it just adds a link at the end of the file pointing at the next block. If you are only doing small appends, your file becomes a huge linked list of single sectors chained one after another, so just reading the file in makes the head seek all over the disk and takes forever.
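You can reproduce the pathology with a few lines of simulation. This is a toy first-fit allocator, not NTFS's actual algorithm: two files appended one sector at a time end up interleaved, so neither file ever gets two adjacent sectors.

```python
# Toy first-fit allocator: two files appended alternately, one sector at
# a time, interleave on disk so neither ever coalesces into a run.
free = list(range(1000))          # free sector numbers, in address order

def alloc_first_fit(n=1):
    s = free[:n]                  # first-fit: take the lowest free sectors
    del free[:n]
    return s

a_sectors, b_sectors = [], []
for _ in range(16):               # append one sector to A, then one to B
    a_sectors += alloc_first_fit()
    b_sectors += alloc_first_fit()

def count_fragments(sectors):
    frags = 1
    for prev, cur in zip(sectors, sectors[1:]):
        if cur != prev + 1:       # non-adjacent sector starts a new run
            frags += 1
    return frags

print(count_fragments(a_sectors))   # 16 fragments for 16 sectors
```

Every single append became its own fragment, which is exactly the one-link-per-sector chain described above.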

This sort of non-moving link-appending works great in many cases, eg. if you have some 10 MB file and you add 1 byte to it, you obviously don't want to find a space that fits 10 MB + 1 and move the whole thing there just to keep it contiguous; the link is perfect. Also, sticking files in the smallest space to fill holes works great if you never append to files, eg. say I just atomically make a file of size N and will never grow it - then the ideal spot to stick it is the smallest space that fits it (well, this is not actually quite true if you have some model of what you expect future file sizes to be).

They could have very easily added an auto-defrag mechanism to handle this common appending case, just by seeing how many frags are in a file when you close it, and if it's super fragged, moving the whole thing out to an open space (presuming there is one). I mean, eventually if you're doing lots of evil disk ops you're going to have too many little holes and no big spaces, and you'll have to do more of a real defrag, but even that could be done sort of incrementally in the background from time to time. In fact this is all so easy and would improve the average computer's performance and hard disk lifetime dramatically that I'm not sure why they didn't do it; I think perhaps I'm missing something.
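The defrag-on-close heuristic is a few lines. This is a hypothetical sketch of the idea in the paragraph above, with an arbitrary threshold; none of these names correspond to real NTFS machinery:

```python
# Hypothetical defrag-on-close: if a file's run list is badly fragmented,
# relocate the whole file into one contiguous free extent, if one exists.

FRAG_THRESHOLD = 8   # arbitrary: below this, fragmentation is tolerable

def maybe_defrag_on_close(runs, holes):
    """runs: [(start, count)] for the file; holes: [(start, length)] free."""
    if len(runs) < FRAG_THRESHOLD:
        return runs                       # not fragmented enough to bother
    total = sum(count for _, count in runs)
    fits = [h for h in holes if h[1] >= total]
    if not fits:
        return runs                       # no single hole big enough
    dest = min(fits, key=lambda h: h[1])  # smallest hole that fits
    # (a real FS would copy the data here and free the old runs)
    return [(dest[0], total)]

runs = [(i * 10, 1) for i in range(20)]            # 20 single-sector frags
print(maybe_defrag_on_close(runs, [(1000, 64)]))   # [(1000, 20)]
```

The cost is one sequential copy of the file on close, paid once, versus a seek per fragment on every future read.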

BTW if you actually just had a little incremental background defragger running all the time, then the append-often case doesn't even come up. You treat your disk as having two regions, the "mostly condensed" and "mostly empty". New allocations are initially put somewhere in the mostly empty space where they have lots of room to grow. The background defragger then just takes idle files from the mostly empty region and stuffs them somewhere they fit well in the mostly condensed space.
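A sketch of that two-region scheme, under made-up numbers (disk size, region split, and slack are all assumptions, and both regions are simplified to bump allocators):

```python
# Two-region allocation sketch: new files land in the "mostly empty" half
# with room to grow; an idle-time pass packs cold files tightly into the
# "mostly condensed" half. All constants here are illustrative.

DISK_SECTORS = 1 << 20
CONDENSED_END = DISK_SECTORS // 2   # [0, CONDENSED_END) is the packed region

empty_cursor = CONDENSED_END        # bump allocator in the empty region
condensed_cursor = 0                # bump allocator in the packed region

def alloc_new(size, slack=4):
    """Place a new file in the empty region, with slack so appends
    can extend it in place instead of fragmenting."""
    global empty_cursor
    start = empty_cursor
    empty_cursor += size + slack
    return (start, size)

def background_pack(run):
    """Idle-time move of a cold file into the condensed region,
    packed tightly: it's not expected to grow anymore."""
    global condensed_cursor
    start = condensed_cursor
    condensed_cursor += run[1]
    return (start, run[1])

f = alloc_new(16)        # lands in the empty region, with growing room
f = background_pack(f)   # later moved down into the condensed region
```

With this split, the append-heavy case never fragments (appends land in the slack), and the packer only ever touches files that have gone quiet.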

Meh, I guess this is an amateur ranting about a hard problem that lots of smart people have surely worked on extensively; I'm just bothered that it's so easy to horribly break NTFS with a pretty standard access pattern. In the real world, file systems (just like memory allocators) are not used randomly, so you don't need to make them optimal for totally arbitrary usage; you should make them work really well for the very common usage, and just make them work okay for bizarro rare usage.
