Historically the way you address this is to run lots of trials (like 100) and take the MIN time of any trial.
(* important note : if you aren't trying to time "hot cache" performance, you need to wipe the cache between each run. I dunno if there's an easy instruction or system call that would invalidate all cache pages; what I usually do is have a routine that goes and munges over some big array).
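A minimal sketch of the min-of-N idea with an optional cache wipe between runs. The names `MinTimeNs` and `WipeCache` are hypothetical, not from any library. It uses `std::chrono::steady_clock` instead of `__rdtsc` so that it is portable. The 64 MB buffer and 64-byte line size are assumptions; pick a buffer comfortably larger than your last-level cache.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: walk a big array, touching one byte per (assumed)
// 64-byte cache line, so that whatever was cached before is probably evicted.
inline void WipeCache()
{
    static std::vector<unsigned char> junk(64u << 20); // 64 MB, assumed > LLC size
    static volatile std::uint64_t sink;                // keeps the loop from being optimized away
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < junk.size(); i += 64)
    {
        junk[i] = static_cast<unsigned char>(junk[i] + 1); // write, so the line is dirtied
        sum += junk[i];
    }
    sink = sum;
}

// Hypothetical helper: run func `trials` times and return the minimum
// elapsed time in nanoseconds. With cold_cache = true the cache is wiped
// before every trial (outside the timed region).
template <typename Func>
std::uint64_t MinTimeNs(Func&& func, int trials = 100, bool cold_cache = false)
{
    using clock = std::chrono::steady_clock;
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (int t = 0; t < trials; ++t)
    {
        if (cold_cache) WipeCache();
        const auto t1 = clock::now();
        func();
        const auto t2 = clock::now();
        const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count();
        best = std::min(best, static_cast<std::uint64_t>(ns));
    }
    return best; // stays at the max sentinel if trials <= 0
}
```

Call it as `MinTimeNs(my_func)` for hot cache numbers, or `MinTimeNs(my_func, 100, true)` for cold cache numbers. Taking the min rather than the mean is the point: interference only ever adds time, so the smallest sample is the closest to a clean run.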
It's a bit better these days because of many cores. Now you can quite often find a core which is unmolested by annoying services popping up and stealing CPU time and messing up your profile. But sometimes you get unlucky, and your process runs on an IdealProc that has some other shite.
So a simple loop over the cores helps; it makes it reasonably probable that you get a clean run on some core. For published results you will still want to repeat the whole thing N times. The loop looks like this :
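uint64 GetFuncTime( t_func * pfunc )
{
    HANDLE proc = GetCurrentProcess();
    HANDLE thread = GetCurrentThread();
    DWORD_PTR affProc = 0, affSys = 0; // affinity query, loop and braces below are reconstructed; they were lost from the listing
    GetProcessAffinityMask(proc,&affProc,&affSys);
    uint64 tick_range = 1ULL << 62;
    for(int i=0;i<32;i++) // 32 = bits in a DWORD mask (assumed bound)
    {
        DWORD mask = 1UL<<i;
        if ( mask & affProc ) SetThreadAffinityMask(thread,mask); else continue; // pin to core i, skip cores we can't use
        uint64 t1 = __rdtsc();
        (*pfunc)(); // the function being timed
        uint64 t2 = __rdtsc();
        uint64 cur_tick_range = t2 - t1;
        tick_range = MIN(tick_range,cur_tick_range);
    }
    SetThreadAffinityMask(thread,affProc); // restore the original affinity
    return tick_range;
}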