You want to hit 60 fps (or whatever) pretty reliably. Each frame there is some amount of work you *must* do (e.g. rendering, responding to input), and some amount of work you would like to do (e.g. streaming in data, decompressing). You want to do as much of the optional work as possible each frame without using too much time.
You would also like to behave well with other apps. E.g. you don't want to just spin and eat 100% of the CPU when you're idle (this is especially important for casual games, where the user is likely browsing the web while playing). To do this you want to Sleep() or something with your idle time.
So, ideally your frame loop looks something like :
-- page flip --
start = timer();
[ do necessary work ]
remaining = start + frame_duration - timer()
[ do optional work as remaining time allows ]
remaining = start + frame_duration - timer()
[ if remaining is large, sleep a while ]
-- page flip --
You want the durations of these three things to add up to :
(necessary work) + (optional work) + (sleep time) <= frame_duration
but as close to frame_duration as possible without going over.
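Here is a minimal, portable sketch of that loop in C++. It assumes std::chrono::steady_clock stands in for timer() and std::this_thread::sleep_for stands in for the OS sleep. OptionalTask, FrameReport, run_frame and the sleep margin are hypothetical names for illustration. An optional task only starts if its estimated cost still fits in the remaining budget. The final sleep is shortened by a margin because sleeps tend to overshoot.

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <functional>
#include <thread>

using Clock = std::chrono::steady_clock;
using Millis = std::chrono::duration<double, std::milli>;

// One chunk of optional work plus a guess at how long it takes (hypothetical).
struct OptionalTask {
    Millis estimated_cost;
    std::function<void()> run;
};

struct FrameReport {
    int optional_done = 0;  // optional tasks completed this frame
    Millis slept{0};        // time actually spent sleeping
};

// One frame of the loop: necessary work, then optional work while the next
// task's estimate still fits in the remaining budget, then sleep what is left
// minus sleep_margin. The page flip itself is not modeled here.
FrameReport run_frame(const std::function<void()>& necessary,
                      std::deque<OptionalTask>& optional,
                      Millis frame_duration, Millis sleep_margin) {
    FrameReport report;
    const auto start = Clock::now();
    auto remaining = [&] { return frame_duration - Millis(Clock::now() - start); };

    necessary();

    while (!optional.empty() && optional.front().estimated_cost < remaining()) {
        optional.front().run();
        optional.pop_front();
        ++report.optional_done;
    }

    const Millis left = remaining() - sleep_margin;
    if (left > Millis(0)) {
        const auto before = Clock::now();
        std::this_thread::sleep_for(left);
        report.slept = Clock::now() - before;
    }
    return report;
}
```

Tasks that don't fit stay in the deque for the next frame. In a real loop the page flip would come right after run_frame returns.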
Okay. There is one specific case where this is pretty easy : if you are in full screen, using V-sync page flips, *AND* your graphics driver is actually polite and sleeps for V-sync instead of just spinning. I don't know what the state of modern drivers is, but in the past when I looked into this I found that many major vendors' drivers don't actually sleep to vsync on Windows; they busy spin. (This is not entirely their fault, as there is not a very good vsync interrupt system on Windows.)
Sleep-to-vsync is important not just because you want to give time to other apps, but because the main thread wants to give time to the background threads. E.g. if you want to put streaming on a lower priority background thread, then you write your loop just like :
-- page flip --
[ do necessary work ]
-- request page flip --
[ main thread sleeps ]
[ optional work runs until vsync interrupt switches me back ]
-- page flip --
and you rely on the page flip to sleep you until vsync, which gives the background loader time to run.
Even if your driver is nice and gives you a sleep-to-vsync, you can't use it in windowed mode.
A lot of games have horrible tearing in windowed mode. It seems to me that you could fix this by using the Dx9+ GetRasterStatus calls. You simply don't do a flip while the raster scan line is inside your window (or right near the top of your window). You wait for it to be past the bottom or in the vblank area or far enough above you. That should eliminate tearing and also make you hit the frame time more exactly because you sync to the raster.
(If you just use a timer without looking at the raster, you can get the worst possible tearing: a tear line that scans up and down across your image because you're just barely missing sync each frame.)
But we still have the problem that we can't sleep to vsync, we basically have to spin and poll the raster.
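Here is a sketch of the "don't flip while the raster is inside your window" test plus the spin-and-poll wait. It assumes the caller already knows the window's top and bottom in screen scan lines. RasterSample, safe_to_flip, spin_until_safe and guard_lines are hypothetical names. On Dx9 the two raster values would come from the InVBlank and ScanLine members of the D3DRASTER_STATUS that GetRasterStatus fills in. Here that call is abstracted as a poll callable so the logic can run anywhere.

```cpp
#include <cassert>

// Snapshot of the display raster (on Dx9: D3DRASTER_STATUS InVBlank / ScanLine).
struct RasterSample {
    bool in_vblank;
    int scan_line;
};

// True if flipping now should not tear inside a window covering screen scan
// lines [window_top, window_bottom]. guard_lines is a hypothetical safety
// margin so we also avoid flipping when the beam is right near the window top.
bool safe_to_flip(const RasterSample& r, int window_top, int window_bottom, int guard_lines) {
    if (r.in_vblank) return true;                             // between frames
    if (r.scan_line > window_bottom) return true;             // beam already past the window
    if (r.scan_line < window_top - guard_lines) return true;  // beam well above the window
    return false;                                             // beam inside or just above it
}

// Busy-polls the raster until a flip is safe; poll is any callable returning
// a RasterSample. Returns how many polls it took.
template <class Poll>
int spin_until_safe(Poll poll, int window_top, int window_bottom, int guard_lines) {
    int polls = 0;
    for (;;) {
        ++polls;
        if (safe_to_flip(poll(), window_top, window_bottom, guard_lines)) return polls;
    }
}
```

This loop never yields the CPU, which is exactly the cost the paragraph above complains about.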
Now you might think you could look at the remaining time and try to sleep that amount. Ha! Sleep(millis) is a huge disaster. Here's a sampling of how long you actually sleep when you call Sleep(0) or Sleep(1) :
sleepMillis : 0 , actualMillis : 28.335571
sleepMillis : 0 , actualMillis : 22.029877
sleepMillis : 0 , actualMillis : 3.383636
sleepMillis : 0 , actualMillis : 18.398285
sleepMillis : 0 , actualMillis : 22.335052
sleepMillis : 0 , actualMillis : 6.214142
sleepMillis : 0 , actualMillis : 3.513336
sleepMillis : 0 , actualMillis : 15.213013
sleepMillis : 0 , actualMillis : 16.242981
sleepMillis : 0 , actualMillis : 8.148193
sleepMillis : 1 , actualMillis : 5.401611
sleepMillis : 0 , actualMillis : 15.296936
sleepMillis : 0 , actualMillis : 16.204834
sleepMillis : 1 , actualMillis : 7.736206
sleepMillis : 0 , actualMillis : 3.112793
sleepMillis : 0 , actualMillis : 16.194344
sleepMillis : 0 , actualMillis : 6.464005
sleepMillis : 0 , actualMillis : 3.378868
sleepMillis : 0 , actualMillis : 22.548676
sleepMillis : 0 , actualMillis : 4.644394
sleepMillis : 0 , actualMillis : 28.266907
sleepMillis : 0 , actualMillis : 51.839828
sleepMillis : 0 , actualMillis : 7.650375
sleepMillis : 0 , actualMillis : 19.336700
sleepMillis : 0 , actualMillis : 5.380630
sleepMillis : 0 , actualMillis : 6.172180
sleepMillis : 0 , actualMillis : 30.097961
sleepMillis : 0 , actualMillis : 17.053604
sleepMillis : 0 , actualMillis : 3.190994
sleepMillis : 0 , actualMillis : 27.353287
sleepMillis : 0 , actualMillis : 23.557663
sleepMillis : 0 , actualMillis : 27.498245
sleepMillis : 0 , actualMillis : 26.178360
sleepMillis : 0 , actualMillis : 29.123306
sleepMillis : 0 , actualMillis : 23.521423
sleepMillis : 0 , actualMillis : 6.811142
You basically cannot call Sleep() at all EVER in a Windows game if you want to reliably hit your frame rate.
And yes, yes, this is with timeBeginPeriod(1), and even with my thread priority set to THREAD_PRIORITY_TIME_CRITICAL. And no, nothing else is running on my machine (I'm sure the average user machine that has fucking Windows Crippleware grinding away in the background all the time is far worse).
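Here is a portable sketch of the kind of harness that produces numbers like the ones above. The original harness isn't shown, so its shape here is an assumption. It uses std::this_thread::sleep_for rather than Win32 Sleep(), and std::this_thread::yield() as a rough analogue of Sleep(0). measure_sleep, SleepSample and worst_overshoot_ms are hypothetical names. To reproduce the conditions described above on Windows, you would call timeBeginPeriod(1) before measuring.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

struct SleepSample {
    int requested_ms;   // what we asked for
    double actual_ms;   // what steady_clock says we got
};

// Requests the same sleep `count` times and records how long each call
// actually took. requested_ms == 0 uses yield() as a rough stand-in for
// Win32 Sleep(0); the two are not guaranteed to behave identically.
std::vector<SleepSample> measure_sleep(int requested_ms, int count) {
    using Clock = std::chrono::steady_clock;
    std::vector<SleepSample> samples;
    samples.reserve(count);
    for (int i = 0; i < count; ++i) {
        const auto before = Clock::now();
        if (requested_ms == 0)
            std::this_thread::yield();
        else
            std::this_thread::sleep_for(std::chrono::milliseconds(requested_ms));
        const std::chrono::duration<double, std::milli> took = Clock::now() - before;
        samples.push_back({requested_ms, took.count()});
    }
    return samples;
}

// Largest amount by which any sample overshot its request, in ms.
double worst_overshoot_ms(const std::vector<SleepSample>& samples) {
    double worst = 0.0;
    for (const auto& s : samples)
        worst = std::max(worst, s.actual_ms - s.requested_ms);
    return worst;
}
```

Printing every sample, not just the average, is what exposes the occasional 20 to 50 ms outliers.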
So I have no solution and I am unhappy.
There definitely seems to be some weird stuff going on in the scheduler that triggers this. For example, if I just sit in my main thread and do render-flip, with no thread switches and no file IO, then my sleep times are pretty reliable. If I kick off some streaming work that activates the other threads and does file IO, then my sleep times suddenly go nuts, and they stay nuts for about a second after all the streaming work is done. It's as if the scheduler increases my time slice quantum and then lets it come back down later.
My links on this topic :
Using Waitable Timers with an Asynchronous Procedure Call (Windows)
Timing in Win32 - Geiss
Priority Boosts (Windows)
Why Win32 is a bad platform for games - part 1 - Molly Rocket
IDirectDraw::GetScanLine
How to fight tearing - virtualdub.org
Examining and Adjusting Thread Priority
Detecting Vertical Retrace with crazy vxd
Tearing-free drawing with mmtimer - CodeProject
Creating a High-Precision, High-Resolution, and Highly Reliable Timer, Utilising Minimal CPU Resources - CodeGuru
I am the linkenator. I am the linkosaurus rex. Linkotron 5000. I have strong Link-Fu. I see your Links are as big as mine. My name is Inlinko Montoya, you linked my father, prepare to be linked!
ADDENDUM : The situation seems to be rather better in Dx9+. You can use vsync in windowed mode, and the vsync does in fact seem to sleep your app. (Older versions of Dx could only vsync in full screen. I didn't expect this to actually work, but it does.)
However, I am still seeing some really weird stuff with it. For one thing, it seems to frequently miss VSync for no good reason. If you take one of the trivial Dx9 sample apps from the SDK, in their framework you will find that they don't display the frame rate when Vsync is On. Sneaky little bastards, it hides their flaw.
In any sample app using the DXUT framework you can find a line like :
txtHelper.DrawTextLine( DXUTGetFrameStats( DXUTIsVsyncEnabled() ) );
This line shows the frame rate only if Vsync is Off. Go ahead and try turning off VSync; you should see a framerate like 400 - 800 fps. Now change it to :
txtHelper.DrawTextLine( DXUTGetFrameStats( true ) );
so that you show framerate even if Vsync is on. You will see a framerate of 40 !!!
40 fps is what you get when you alternate between a 30 fps frame and a 60 fps frame: the average frame time is ((1/30) + (1/60))/2 = 1/40 of a second. That means they are missing vsync on alternating frames.
If you go and stick a timer around just the Present() call to measure the duration spent in Present, you should see something like :
no vsync :
duration : lo : 1.20 millis , hi : 1.40 millis
vsync :
duration : lo : 15.3 millis , hi : 31.03 millis
Yuck. Fortunately, the samples are really easy to fix. Just put this line anywhere in the sample app :
timeBeginPeriod(1);
to raise the timer resolution (the scheduler granularity). This says to me that Present() is actually just calling the OS Sleep(), and is not using a nice interrupt or anything like that, so if your scheduler granularity is too coarse you will sleep too long and miss vsync a lot. In fact I would not be surprised if Present just had a loop like :
for(;;)
{
if ( IsRasterReady() ) break;
Sleep(1);
}
which sucks donkey balls for various reasons. (* addendum : yes they are doing this, see below).
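Below is a portable rendering of that suspected loop. Any readiness callable replaces the hypothetical IsRasterReady(), and std::this_thread::sleep_for of 1 ms replaces Sleep(1). The second function runs the same pattern against a fixed deadline, standing in for the start of vblank. It reports how late the loop noticed that the deadline had passed. Each 1 ms request can overrun by roughly the timer granularity, or worse under load, and that is how a loop like this misses vsync. Both function names are made up for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

using Clock = std::chrono::steady_clock;

// Poll a readiness check and sleep ~1 ms between polls. is_ready plays the
// role of the hypothetical IsRasterReady(). Returns how many times it slept.
int poll_and_sleep(const std::function<bool()>& is_ready) {
    int sleeps = 0;
    for (;;) {
        if (is_ready()) break;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        ++sleeps;
    }
    return sleeps;
}

// Same poll + sleep pattern against a fixed deadline. Returns how many ms
// after the deadline the loop actually noticed it; a coarse timer makes
// each sleep overrun, so this number grows with the timer granularity.
double wake_lateness_ms(Clock::time_point deadline) {
    while (Clock::now() < deadline)
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    const std::chrono::duration<double, std::milli> late = Clock::now() - deadline;
    return late.count();
}
```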
If in fact they are doing what I think they are doing, then you will hit vsync
more reliably if you pump up your thread priority before calling Present() :
HANDLE thread = GetCurrentThread();
int oldp = GetThreadPriority( thread );
SetThreadPriority( thread , THREAD_PRIORITY_TIME_CRITICAL ); // as high as it goes
device->Present( NULL , NULL , NULL , NULL ); // device : your IDirect3DDevice9*
SetThreadPriority( thread , oldp );
This tries to ensure that the scheduler will actually switch back to you for your vsync the way you want. (This has no effect in
the samples because they have no other threads, but it may be necessary if you're actually stressing the system.)
Note the samples also freak out if they don't have focus. That's because they actually call Sleep(50) when they don't have focus. That's a cool thing and not a bug ;)
To wrap up, here's the approach that I believe works :
Use Dx9+.
Use vsync and 1 back buffer.
Bump up thread priority and timer resolution (timeBeginPeriod(1)) to make sure Present() doesn't miss vsync.
Make the main thread higher priority than the worker threads so that the workers run when main is sleeping (you must also check that the workers are not starving).
On machines with SpeedStep, *and* if the app has focus : run a very low priority thread that just spins to keep the CPU clocked up.
And cross your fingers and pray.
BTW whenever you show something like framerate you should show four numbers :
Instant current value this frame.
Average over the last N frames.
Low over the last N frames.
High over the last N frames.
With N like 10 or something. Obviously you can do this trivially with a little circular buffer. You can see a lot of things that way. For example, when you're doing something like missing vsync on alternating frames, you will see that the average is perfectly stable, a rock solid 40.0, but the instantaneous value is jumping up and down like mad.
BTW cb::circular_array does this pretty well.
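Here is a standalone sketch of the four-number display built on a fixed std::array ring buffer. This is not cb::circular_array, and FrameStats and its methods are hypothetical names. It stores frame times in seconds. Average fps is the number of frames divided by the total time in the window. Low and high fps come from the longest and shortest frame respectively.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Ring buffer of the last N frame times (in seconds) that reports the four
// numbers: this frame, plus average / low / high over the window.
template <std::size_t N>
class FrameStats {
public:
    void add(double frame_seconds) {
        samples_[next_] = frame_seconds;   // overwrite the oldest slot
        next_ = (next_ + 1) % N;
        count_ = std::min(count_ + 1, N);
        last_ = frame_seconds;
    }

    double current_fps() const {
        assert(count_ > 0 && "add() at least one frame first");
        return 1.0 / last_;
    }

    // Frames divided by total time, i.e. 1 / (mean frame time).
    double average_fps() const {
        assert(count_ > 0 && "add() at least one frame first");
        double total = 0.0;
        for (std::size_t i = 0; i < count_; ++i) total += samples_[i];
        return static_cast<double>(count_) / total;
    }

    // The lowest fps comes from the longest frame, and vice versa.
    double low_fps() const {
        assert(count_ > 0 && "add() at least one frame first");
        return 1.0 / *std::max_element(samples_.data(), samples_.data() + count_);
    }

    double high_fps() const {
        assert(count_ > 0 && "add() at least one frame first");
        return 1.0 / *std::min_element(samples_.data(), samples_.data() + count_);
    }

private:
    std::array<double, N> samples_{};
    std::size_t next_ = 0;   // slot the next sample goes into
    std::size_t count_ = 0;  // number of valid samples (<= N)
    double last_ = 0.0;      // most recent frame time
};
```

Suppose you feed a FrameStats<10> alternating 1/30 s and 1/60 s frames. It reports an average of 40 fps, a low of 30 and a high of 60, while current_fps() flips between 30 and 60 every frame. That is the missed-vsync signature described above.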
ADDENDUM : I found a note in the Dx9 docs that is quite illuminating. They mention that if you use D3DPRESENT_INTERVAL_DEFAULT, the action is to sync to vblank. If you use D3DPRESENT_INTERVAL_ONE, the action is the same, but with one difference: it automatically does a timeBeginPeriod(1) for you to help you hit vsync more precisely. This tells me that they are in fact doing the Sleep()/GetRasterStatus loop that I suspected.