10-01-11 - More Reliable Timing on Windows

When profiling little code chunks on Windows, one of the constant annoyances is the unreliability of times due to multithreading.

Historically, the way to address this is to run lots of trials (like 100) and take the MIN time of any trial.
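As a rough illustration of the min-of-trials idea, here is a minimal portable sketch. The name `MinTimeNanos` and its parameters are made up for this example, and it uses `std::chrono::steady_clock` rather than the cycle counter, so it is a stand-in for the idea and not the routine shown later in this post.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical helper (not from the post): run func `trials` times and
// return the fastest single run, in nanoseconds.
template <typename t_func>
std::uint64_t MinTimeNanos( t_func && func, int trials = 100 )
{
    using clock = std::chrono::steady_clock;
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (int i = 0; i < trials; i++)
    {
        auto t1 = clock::now();
        func();
        auto t2 = clock::now();
        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count();
        // keep the minimum: interruptions only ever make a run slower
        best = std::min(best, static_cast<std::uint64_t>(ns));
    }
    return best;
}
```

Taking the minimum rather than the mean is the point: a context switch or interrupt can only add time to a trial, so the fastest trial is the one least disturbed.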

(* important note : if you aren't trying to time "hot cache" performance, you need to wipe the cache between each run. I dunno if there's an easy instruction or system call that would invalidate all cache pages; what I usually do is have a routine that goes and munges over some big array).
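A cache-wiping routine of the kind described might look something like the sketch below. The name `WipeCache`, the 64 MB default and the 64-byte line size are all assumptions for illustration; the buffer needs to be comfortably larger than the last-level cache on the machine being measured, and walking it does not strictly guarantee every line is evicted.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the post): read and write one byte in every
// 64-byte line of a big buffer so that previously cached data gets pushed out.
inline std::uint64_t WipeCache( std::size_t bytes = 64u << 20 )
{
    static std::vector<std::uint8_t> buffer;
    if (buffer.size() != bytes)
        buffer.assign(bytes, 1);

    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < buffer.size(); i += 64)
    {
        buffer[i] = static_cast<std::uint8_t>(buffer[i] + 1); // dirty the line
        sum += buffer[i];
    }
    return sum; // returned so the compiler cannot discard the loop
}
```

Call it between trials, outside the timed region, when you want "cold cache" numbers.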

It's a bit better these days because of many cores. Now you can quite often find a core which is unmolested by annoying services popping up and stealing CPU time and messing up your profile. But sometimes you get unlucky, and your process runs on an IdealProc that has some other shite.

So a simple loop helps :

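// needs <windows.h> and <intrin.h>; uint64 and MIN are the author's own typedef / macro
template <typename t_func>
uint64 GetFuncTime( t_func * pfunc )
{
    HANDLE proc = GetCurrentProcess();
    HANDLE thread = GetCurrentThread();

    // (reconstructed) find out which cores this process is allowed to run on
    DWORD_PTR affProc,affSys;
    GetProcessAffinityMask(proc,&affProc,&affSys);

    uint64 tick_range = 1ULL << 62;

    for(int rep=0;rep<24;rep++)
    {
        // (reconstructed) pin the thread to core 'rep' if the process may use it
        DWORD_PTR mask = (DWORD_PTR)1<<rep;
        if ( mask & affProc )
            SetThreadAffinityMask(thread,mask);

        uint64 t1 = __rdtsc();
        (*pfunc)(); // (reconstructed) the call being timed
        uint64 t2 = __rdtsc();

        uint64 cur_tick_range = t2 - t1;
        tick_range = MIN(tick_range,cur_tick_range);
    }

    // (reconstructed) put the thread's affinity back the way it was
    SetThreadAffinityMask(thread,affProc);

    return tick_range;
}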

which makes it reasonably probable that you get a clean run on some core. For published results you will still want to repeat the whole thing N times.


Branimir Karadžić said...

I'm just curious, is there benefit of using "template <typename t_func>" vs uint64 GetFuncTime( void (*pfunc)() ) or just being cplusplusey?

cbloom said...

Works with other func types. In particular you don't have to get the __cdecl or whatever nonsense right.

In practice I use a version that also takes templated args and passes them through.

You're welcome to change it to void *pfunc if that works for you.
