04-17-09 - Oodle File Page Cache

I'm trying to figure something out, maybe someone out there has a good idea.

This is about the Oodle File Page Cache that I mentioned previously. I'm not doing the fully general page cache thing yet (I might not ever because it's one of those things you have to buy into my philosophy which is what I'm trying to avoid).

Anyway, the issue is about how to prioritize pages for reclamation. Assume you're in a strictly limitted memory use scenario, you have a fixed size pool of say 50 pages or so.

Now obviously, pages that are actually current locked get memory. And the next highest priority is probably sequentially prefetched pages (the pages that immediately follow the currently locked pages) (assuming the file is flagged for sequential prefetching, which it would be by default).

But after that you have to decide how to use the remaining pages. The main spot where you run into a question is : I need to grab a new page to put data into, but all the pages are taken - which one do I drop and recycle? (Or, equivalently : I'm thinking about prefetching page X, the least important page currently in the pool is page Y - should I reclaim page Y to prefetch page X, or should I just wait and not do the prefetch right now).

The main ambiguity comes from "past" pages vs. "prefetched" pages (and there's also the issue of old prefetched pages that were never used).

A "past" page is one that the client has unlocked. It was paged in, the client locked it, did whatever, then unlocked it. There's one simple case, if the client tells me this file is strictly forward-scan streaming, then the page can be dropped immediately. If not, then the past page is kept around for some amount of time. (there's another sequence point when the file containing the page is closed - again optionally you can say "just drop everything when I close the file" or the pages can be kept around for a while after the close to make sure you were serious about closing it).

A "prefetched" page obviously can be prefetched by sequential scan in an open file. It could also be from a file in the prefetch-ahead file list that was generated by watching previous runs.

Prefetches pages create two issues : one is how far ahead do I prefetch. Basically you prefetch ahead until you run out of free pages, but when you have no free pages, the question is do I reclaim past pages to do new prefetches?

The other issue with prefetches is what do you do with prefetched pages that were never actually used. Like I prefetched some pages but then the client never locked them, so they are still sitting around - at what point do I reclaim those to do new prefetches?

To make it more clear here's a sort of example :

Client gives me a prefetch file list - {file A, file B, file C, file D}

Client opens file A and touches a bunch of pages.  So I pull in {A:0,A:1,A:2} (those are the page numbers in the file).

I also start prefetching file B and file C , then I run out of free pages, so I get {B:0,B:1,C:0}.

Client unlocks the pages in file A but doesn't close file A or tell me I can drop the pages.

Client now starts touching pages in file C.  I give him C:0 that I already prefetched.

Now I want to prefetch C:1 for sequential scan, I need to reclaim a page.

Do I reclaim a page from B (prefetched but not yet used) or a page from A (past pages) ?  Or not prefetch at all?

When client actually asks for C:1 to lock then I must reclaim something.

Should I now start prefetching {D:0} ?  I could drop a page from {A} to get it.

Anyway, this issue just seems like a big mess so I'm hoping someone has a clever idea about how to make it not so awful.

There's also very different paradigms for low-page-count vs high-page-count caches. On something like the PS2 or XBox 1 where you are super memory limitted, you might in fact run with only 4 pages or something tiny like that. In that case, I really want to make sure that I am using each page for the best purpose at all times. In that scenario, each time I need to reclaim a page, I should reevaluate all the priorities so they are fresh and make the best decision I can.

On something like Windows you might run with 1024 pages. (64k page * 1024 pages = 64 MB page cache). In that case I really don't want to be walking every single page to try to pick the best one all the time. I can't just use a heap or something, because page priorities are not static - they can change just based on time passing (if I put any time-based prioritization in the cache). Currently I'm using a sort of cascaded priority queue where I have different pools of priority groups, and I only reevaluate the current lowest priority group. But that's rather complicated.


Brian said...

You are doing this for games right? Can you just build a statistical model from test runs of playing the game. Then just use that to predict what pages are least likely to be used?

cbloom said...

You're quite mad, you know.

Anonymous said...

Or use that to predict what to prefetch...

cbloom said...

(I was kinda joking with that last comment if it wasn't obvious)

So I am using game runs to generate the file access list to automatically schedule prefetches. But I really don't want to rely on that too much because game IO pattersn are not totally deterministic. Even level loads might pull slightly different stuff depending on user quality settings or whatever.

So for example if you recorded a run that went


and then you actually execute a run that goes


when you load C & D, the prefetched version of B should stick around waiting in memory because it will be used.

but if you execute a run that goes


and never loads B, then it needs to get dropped at some point.

Ben Garney said...

Couldn't you implement some reasonable default behavior, then give the user a few hints to deal with the weird cases? For instance, assume that they will always lock things once, unless they hint in the unlock that they will want it again. The glory of the game scenario is you can assume intelligent hints - whereas the OS doesn't have that luxury. You could even have some dumb logic that would say "hey I noticed you're making me thrash these pages, pass this hint and it will save a lot of pain."

Brian said...

If you test the games enough, your statistical models should be pretty decent.

Why not focus on getting good performance from the model, and then make some sort of default behavior that clears something if absolutely necessary (or it becomes obviously very old)?

ryg said...

Hmm, the statistical model thing should work reasonably well if you do it on a per-"object" (file, asset, whatever) basis, but I'm skeptical about doing it on a page level. Page-granular statistics are brittle in the sense that they depend not only on which asset gets used when, but also on the exact version of that asset. This increases the turnaround time for what would otherwise be small content tweaks by a considerable amount since now you have to do test runs again. Changing a few pixels in a texture isn't a big deal no matter what, but e.g. changing the format for a HUD texture from DXT5 to ARGB does make a big difference in size and hence page layout, and so does changing animation compression settings for a chracter.

Blowing up a simple memory test run (which consists of loading all levels and checking whether there are any out of memory conditions, takes our QA maybe 2-3 hours) to a full representative playtest run (don't have exact numbers, but I'd assume a week at least to get reasonable values) is some serious friction for last-minute content tweaks.

cbloom said...

Hmm.. so it seems that a lot of actual OS's just use this simple scheme :

keep pages in an LRU (or pseudo-LRU of some kind)

add prefetched pages to the top of the LRU (the MRU spot)

Then the only decision point is : if I want to do prefetch X, should I pop the LRU page or wait.

cbloom said...

One annoying thing about doing my own page cache is the weirdness of the interaction with Windows cache.

On PS3 or Xenon or whatever this isn't a big issue, I can just do my thing.

On Windows, if I use buffered IO, then all the pages will go through Windows cache AND my cache. Kind of dumb. However, if I *don't* use buffered IO, then I lose the ability to keep pages in cache across program runs, and stdio will appear to beat me when you run the same test over and over because the pages will stay in Windows cache for it.

Pages staying in cache between program runs is of dubious use for games. However it is kind of handy for game developers, since we do run the same test over and over.

I've noticed some developers running 64 bit OS'es with 16 GB + , which gives them 10 GB of disk cache, which means that almost all of their file IO is coming from cache all the time, which sort of screws up all my assumptions. Obviously the average consumer is still nowhere near this, but they will be eventually I suppose.

old rants