Ok, first let me describe the issue :
You want to hit 60 fps (or whatever) pretty reliably. Each frame there is some amount of work you *must* do (eg. rendering, responding to
input), and there is some amount of work that you would like to do (eg. streaming in data, decompressing, etc.). You want to do as
much of the optional work as possible each frame, without using too much time.
You would also like to behave well with other apps. eg. you don't want to just spin and eat 100% of the CPU when you're idle.
(this is especially important for casual games where the user is likely browsing the web while playing). To do this you want to Sleep()
or something with your idle time.
So, ideally your frame loop looks something like :
-- page flip --
start = timer();
[ do necessary work ]
remaining = start + frame_duration - timer()
[ do optional work as remaining time allows ]
remaining = start + frame_duration - timer()
[ if remaining is large , sleep a while ]
-- page flip --
You want the duration of these three things to add up :
(necessary work) + (optional work) + (sleep time) <= frame_duration
but as close to frame_duration as possible without going over
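A minimal sketch of that budget logic, assuming a portable std::chrono clock instead of whatever timer() is on your platform. The names remaining, run_optional_work, reserve and step are all made up for illustration; the idea is just that optional work gets chopped into small units and you re-check the clock between units.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

using Clock  = std::chrono::steady_clock;
using Millis = std::chrono::duration<double, std::milli>;

// Time left in the frame that began at `start`, as seen at `now`.
// Negative when the frame is already over budget.
inline Millis remaining(Clock::time_point start, Millis frame_duration,
                        Clock::time_point now) {
    return frame_duration - Millis(now - start);
}

// Runs units of optional work while more than `reserve` is left in the frame.
// `step` (hypothetical) does one small unit of work and returns false when
// there is nothing left to do. Returns the number of units completed.
inline int run_optional_work(Clock::time_point start, Millis frame_duration,
                             Millis reserve,
                             const std::function<bool()>& step) {
    assert(step && "step must be callable");
    int units = 0;
    while (remaining(start, frame_duration, Clock::now()) > reserve) {
        if (!step()) break;   // no more optional work queued
        ++units;
    }
    return units;
}
```

The reserve is there because a unit of work can overrun; it should be at least as long as your slowest unit. Whatever is still left after run_optional_work returns is the time you would like to sleep away.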
Okay. There is one specific case where this is pretty easy : if you are in full screen, using V-sync page flips, *AND* your graphics
driver is actually polite and sleeps for V-sync instead of just spinning. I don't know what the state of modern drivers is, but in the past
when I looked into this I found that the drivers from many major companies don't actually sleep-to-vsync on Windows, they busy spin. (this is not entirely
their fault, as there is not a very good vsync interrupt system on Windows).
Sleep-to-vsync is important not just because you want to give time to other apps, but because the main thread wants to give time to the
background threads. eg. if you want to put streaming on a lower priority background thread, then you write your loop just like :
-- page flip --
[ do necessary work ]
-- request page flip --
puts main thread to sleep
[ optional work runs until vsync interrupt switches me back ]
-- page flip --
and you assume the page flip will sleep you to vsync which will give the background loader time to run.
Even if your driver is nice and gives you a sleep-to-vsync, you can't use it in windowed mode.
A lot of games have horrible tearing in windowed mode. It seems to me that you could fix this by using
the Dx9+ GetRasterStatus calls. You simply don't do a flip while the raster scan line is inside your
window (or right near the top of your window). You wait for it to be past the bottom or in the vblank
area or far enough above you. That should eliminate tearing and also make you hit the frame time more
exactly because you sync to the raster.
(if you just use a timer without using the raster you can get the worst possible tearing, where a flip line
scans up and down on your image because you're just barely missing sync each frame).
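Here is a rough sketch of the "is it safe to flip right now" test. It is only the decision logic; the raster position is assumed to come from IDirect3DDevice9::GetRasterStatus, whose D3DRASTER_STATUS has InVBlank and ScanLine fields. RasterPos, WindowSpan, safe_to_flip and the lead margin are names I made up for the example.

```cpp
#include <cassert>

// Snapshot of the raster, mirroring InVBlank / ScanLine in D3DRASTER_STATUS.
struct RasterPos {
    bool     in_vblank;
    unsigned scan_line;
};

// Vertical extent of the window in screen scanlines: [top, bottom).
struct WindowSpan {
    unsigned top;
    unsigned bottom;
};

// Returns true if a flip issued now should not tear inside the window.
// `lead` (hypothetical tuning value) is how many scanlines above the window
// top we also refuse to flip in, because the raster would reach the window
// before the flip actually lands.
inline bool safe_to_flip(RasterPos r, WindowSpan w, unsigned lead) {
    assert(w.top <= w.bottom);
    if (r.in_vblank) return true;                    // nothing is being scanned out
    const unsigned guard_top = (w.top > lead) ? (w.top - lead) : 0;
    return r.scan_line < guard_top                   // far enough above the window
        || r.scan_line >= w.bottom;                  // already past the bottom
}
```

You would poll the raster and flip on the first poll where this returns true. The right value for lead depends on how long your flip takes to land, so treat it as something to measure, not a constant.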
But we still have the problem that we can't sleep to vsync, we basically have to spin and poll the raster.
Now you might think you could look at the remaining time and try to sleep that amount. Ha! Sleep(millis)
is a huge disaster. Here's a sampling of how long you actually sleep when you call Sleep(0) or Sleep(1) :
sleepMillis : 0 , actualMillis : 28.335571
sleepMillis : 0 , actualMillis : 22.029877
sleepMillis : 0 , actualMillis : 3.383636
sleepMillis : 0 , actualMillis : 18.398285
sleepMillis : 0 , actualMillis : 22.335052
sleepMillis : 0 , actualMillis : 6.214142
sleepMillis : 0 , actualMillis : 3.513336
sleepMillis : 0 , actualMillis : 15.213013
sleepMillis : 0 , actualMillis : 16.242981
sleepMillis : 0 , actualMillis : 8.148193
sleepMillis : 1 , actualMillis : 5.401611
sleepMillis : 0 , actualMillis : 15.296936
sleepMillis : 0 , actualMillis : 16.204834
sleepMillis : 1 , actualMillis : 7.736206
sleepMillis : 0 , actualMillis : 3.112793
sleepMillis : 0 , actualMillis : 16.194344
sleepMillis : 0 , actualMillis : 6.464005
sleepMillis : 0 , actualMillis : 3.378868
sleepMillis : 0 , actualMillis : 22.548676
sleepMillis : 0 , actualMillis : 4.644394
sleepMillis : 0 , actualMillis : 28.266907
sleepMillis : 0 , actualMillis : 51.839828
sleepMillis : 0 , actualMillis : 7.650375
sleepMillis : 0 , actualMillis : 19.336700
sleepMillis : 0 , actualMillis : 5.380630
sleepMillis : 0 , actualMillis : 6.172180
sleepMillis : 0 , actualMillis : 30.097961
sleepMillis : 0 , actualMillis : 17.053604
sleepMillis : 0 , actualMillis : 3.190994
sleepMillis : 0 , actualMillis : 27.353287
sleepMillis : 0 , actualMillis : 23.557663
sleepMillis : 0 , actualMillis : 27.498245
sleepMillis : 0 , actualMillis : 26.178360
sleepMillis : 0 , actualMillis : 29.123306
sleepMillis : 0 , actualMillis : 23.521423
sleepMillis : 0 , actualMillis : 6.811142
You basically cannot call Sleep() at all EVER in a Windows game if you want to reliably hit your frame rate.
And yes yes this is with timeBeginPeriod(1) and this is even with setting my priority to THREAD_PRIORITY_TIME_CRITICAL.
And no nothing at all is running on my machine (I'm sure the average user machine that has fucking Windows Crippleware grinding
away in the background all the time is far worse).
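If you want to run the same kind of measurement yourself, something like this works. It is a portable stand-in, not the code that produced the numbers above : it uses std::this_thread::sleep_for and steady_clock where the Windows version would use Sleep() and a high resolution timer, and measure_sleep_ms is a made-up name. Note that sleep_for with a zero duration is not necessarily the same thing as Sleep(0), which gives up the rest of your time slice.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Asks for a sleep of `requested_ms` milliseconds and returns how long the
// call actually took, in milliseconds, measured with a monotonic clock.
inline double measure_sleep_ms(int requested_ms) {
    assert(requested_ms >= 0);
    using Clock = std::chrono::steady_clock;
    const auto t0 = Clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(requested_ms));
    const auto t1 = Clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Call it in a loop while the rest of your app is doing its normal thing (threads running, file IO in flight) and log requested vs. actual. Measuring it in an idle test app tells you very little.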
So I have no solution and I am unhappy.
There definitely seems to be some weird stuff going on in the scheduler triggering this. Like if I just sit in my main thread
and do render-flip , with no thread switches and no file IO, then my sleep times seem to be pretty reliable. If I kick some
streaming work that activates the other threads and does file IO, then my sleep times suddenly go nuts - and they stay nuts
for about a second after all the streaming work is done. It's like the scheduler increases my time slice quantum and then lets
it come back down later.
My links on this topic :
Using Waitable Timers with an Asynchronous Procedure Call (Windows)
Timing in Win32 - Geiss
Priority Boosts (Windows)
Molly Rocket View topic - Why Win32 is a bad platform for games - part 1
IDirectDrawGetScanLine
How to fight tearing - virtualdub.org
Examining and Adjusting Thread Priority
Detecting Vertical Retrace with crazy vxd
CodeProject Tearing-free drawing with mmtimer
CodeGuru Creating a High-Precision, High-Resolution, and Highly Reliable Timer, Utilising Minimal CPU Resources
I am the linkenator. I am the linkosaurus rex. Linkotron 5000. I have strong Link-Fu. I see your Links are as big as mine.
My name is Inlinko Montoya, you linked my father, prepare to be linked!
ADDENDUM : The situation seems to be rather better in Dx9+. You can use VSync in Windowed Mode, and the VSync does in fact seem
to sleep your app. (older versions of Dx could only Vsync in full screen. I didn't expect this to actually work, but it does).
However, I am still seeing some really weird stuff with it. For one thing, it seems to frequently miss VSync for no good reason.
If you take one of the trivial Dx9 sample apps from the SDK, in their framework you will find that they don't display the
frame rate when Vsync is On. Sneaky little bastards, it hides their flaw.
In any sample app using the DXUT framework you can find a line like :
txtHelper.DrawTextLine( DXUTGetFrameStats( DXUTIsVsyncEnabled() ) );
This line shows the frame rate only if Vsync is Off. Go ahead and try turning off VSync - you should see a framerate like 400 - 800
fps. Now change it to :
txtHelper.DrawTextLine( DXUTGetFrameStats( true ) );
so that you show framerate even if Vsync is on. You will see a framerate of 40 !!!
40 fps is what you get when you alternate between a 30 fps and a 60 fps frame. ((1/30) + (1/60))/2 = 1/40 .
That means they are missing vsync on alternating frames.
If you go and stick a timer around just the Present() call to measure the duration spent in Present, you should
see something like :
no vsync :
duration : lo : 1.20 millis , hi : 1.40 millis
vsync :
duration : lo : 15.3 millis , hi : 31.03 millis
Yuck. Fortunately, the samples are really easy to fix. Just put this line anywhere in the sample app :
timeBeginPeriod(1);
to change the scheduler granularity. This says to me that Present() is actually just calling the OS Sleep() , and is not using a nice
interrupt or anything like that, so if your scheduler granularity is too coarse you will sleep too long and miss vsync a lot. In fact I
would not be surprised if Present just had a loop like :
for(;;)
{
if ( IsRasterReady() ) break;
Sleep(1);
}
which sucks donkey balls for various reasons. (* addendum : yes they are doing this, see below).
If in fact they are doing what I think they are doing, then you will hit vsync
more reliably if you pump up your thread priority before calling Present :
int oldp = GetThreadPriority( GetCurrentThread() );
SetThreadPriority( GetCurrentThread() , THREAD_PRIORITY_SOMETHING_REALLY_FUCKING_HIGH );
Present( ... );
SetThreadPriority( GetCurrentThread() , oldp );
this tries to ensure that the scheduler will actually switch back to you for your vsync the way you want. (this has no effect in
the samples cuz they have no other threads, but may be necessary if you're actually stressing the system).
Note the samples also freak out if they don't have focus. That's because they actually call Sleep(50) when they don't have focus.
That's a cool thing and not a bug ;)
To wrap up, here's the approach that I believe works :
Use Dx9+
Use Vsync and 1 back buffer
Bump up thread priority & scheduler interval to make sure Present() doesn't miss
Make main thread higher priority than worker threads so that workers run when main is sleeping
    (must also check that workers are not starving)
On machines with speed-step , *and* if the app has focus :
    Run a very low priority thread that just spins to keep the cpu clocked up
and cross your fingers and pray.
BTW whenever you show something like framerate you should show four numbers :
Instant current value this frame.
Average over last N frames.
Low over last N frames.
High over last N frames.
with N like 10 or something.
obviously you can do this trivially with a little circular buffer.
You can see a lot of things that way. For example when you're doing something like missing vsync on alternating frames, you will see
the average is perfectly stable, a rock solid 40.0 , but the instantaneous value is jumping up and down like mad.
BTW cb::circular_array does this pretty well.
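If you don't have a circular array class lying around, here is a minimal sketch of the four-number tracker. It is not cb::circular_array, just a fixed-size ring I wrote for the example; RollingStats and its method names are made up. I feed it frame times in milliseconds and convert to fps at display time, because averaging frame times is what gives you the 40 in the alternating 30/60 case (averaging the fps values themselves would give 45).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Rolling instant / average / low / high over the last N samples.
template <std::size_t N>
class RollingStats {
public:
    // Adds a sample, overwriting the oldest one once N samples are stored.
    void push(double v) {
        samples_[next_] = v;
        next_ = (next_ + 1) % N;
        if (count_ < N) ++count_;
    }

    // Most recently pushed sample.
    double instant() const {
        assert(count_ > 0);
        return samples_[(next_ + N - 1) % N];
    }

    // Mean of the stored samples.
    double average() const {
        assert(count_ > 0);
        double sum = 0.0;
        for (std::size_t i = 0; i < count_; ++i) sum += samples_[i];
        return sum / static_cast<double>(count_);
    }

    // Smallest stored sample.
    double low() const {
        assert(count_ > 0);
        return *std::min_element(samples_.begin(), samples_.begin() + count_);
    }

    // Largest stored sample.
    double high() const {
        assert(count_ > 0);
        return *std::max_element(samples_.begin(), samples_.begin() + count_);
    }

private:
    std::array<double, N> samples_{};  // slots 0..count_-1 hold valid samples
    std::size_t next_  = 0;            // slot the next push will write
    std::size_t count_ = 0;            // number of valid samples, at most N
};
```

Push the measured frame time once per frame into a RollingStats<10> and print 1000.0 / average() next to the instant, low and high values. With frames alternating between 33.3 and 16.7 millis, the average sits at 40 fps while instant flips back and forth between 30 and 60.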
ADDENDUM : I found a note in the Dx9 docs that is quite illuminating. They mention that if you use D3DPRESENT_INTERVAL_DEFAULT, the
action is to sync to vblank. If you use D3DPRESENT_INTERVAL_ONE, the action is the same - but there is a difference, it automatically
does a timeBeginPeriod(1) for you to help you hit vsync more precisely. This tells me that they are in fact doing the Sleep()/GetRasterStatus
loop that I suspected.