8/06/2010

08-06-10 - Forceinline

Forceinline is another thing like restrict that is really in the wrong place. What I want is for it to be at the *call*, or at least overridable at the call.

For example say you write something like memcpy - (not as an intrinsic but as a function). Most of the time you're okay with it just being a function, but in some little routine that is a super hot spot you want to be able to say :


for ( .. a million .. )
{
  .. important stuff
  __forceinline memcpy(to,from,size);
}

(and the opposite __notinline). More generally Sean once mentioned just the idea of being able to mark up "yeah really optimize this" or "this is done often" parts of the code, so that could suffice as well.

At the moment the only way I know how to do this is some ugly shit like :


__forceinline void inl_myfunc()
{
 .. write code here ..
}

void call_myfunc()
{
  inl_myfunc();
}

then clients can choose to call inl_myfunc or call_myfunc. Ugly. C99 cleaned up the inline/extern spec so that the out-of-line copy of an "inline" function gets compiled in only one place, but it failed to let the client specify whether a given call should be inlined.
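For concreteness, here is a minimal sketch of that two-entry-point workaround in C, assuming GCC/Clang or MSVC. The FORCEINLINE / NOINLINE macros and the inl_copy_bytes / call_copy_bytes names are made up for illustration, and the byte-copy loop just stands in for whatever the real body is:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical portability macros; not from any standard header. */
#if defined(_MSC_VER)
  #define FORCEINLINE static __forceinline
  #define NOINLINE    __declspec(noinline)
#else
  #define FORCEINLINE static inline __attribute__((always_inline))
  #define NOINLINE    __attribute__((noinline))
#endif

/* The one real implementation; inlined into whoever calls it. */
FORCEINLINE void inl_copy_bytes(void *to, const void *from, size_t size)
{
    unsigned char *d = (unsigned char *)to;
    const unsigned char *s = (const unsigned char *)from;
    assert(to != NULL && from != NULL);
    while (size--)
        *d++ = *s++;
}

/* Out-of-line entry point for call sites that are not hot. */
NOINLINE void call_copy_bytes(void *to, const void *from, size_t size)
{
    inl_copy_bytes(to, from, size);
}
```

The hot loop calls inl_copy_bytes, everything else calls call_copy_bytes. The choice is still baked into the function name rather than the call site, which is exactly the ugliness.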

BTW it should be evident that the standard compiler inlining heuristic of using complexity is totally wrong. Little shitlet functions that happen to call memcpy should *not* get it inlined, and my big complex LZ decoder function *should*. In fact there's just no way for the compiler to know whether it's a good idea to inline, because it doesn't have information about how often a spot in the code is hit.
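The closest existing thing to a "this is done often" markup is probably GCC/Clang's hot and cold function attributes plus __builtin_expect for branches. A rough sketch, assuming a GCC-compatible compiler; the macro names and sum_bytes are made up for illustration. Note these are hints attached to the function or branch, not the call site, and they don't force inlining:

```c
#include <stddef.h>

/* Hypothetical wrapper macros around GCC/Clang extensions. */
#if defined(__GNUC__)
  #define HOT_FUNC    __attribute__((hot))
  #define COLD_FUNC   __attribute__((cold))
  #define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
  #define HOT_FUNC
  #define COLD_FUNC
  #define UNLIKELY(x) (x)
#endif

/* Rarely taken error path: tell the compiler not to favor it. */
COLD_FUNC static long report_bad_input(void)
{
    return -1;
}

/* Hypothetical hot routine: sums n bytes. */
HOT_FUNC static long sum_bytes(const unsigned char *p, size_t n)
{
    long total = 0;
    size_t i;
    if (UNLIKELY(p == NULL))
        return report_bad_input();
    for (i = 0; i < n; i++)
        total += p[i];
    return total;
}
```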

Restrict continues to cause me no end of annoyance; I'm working on some chunk of code that I know is all alias-free, but I look at the disasm and I see it's doing pointless loads and stores. Okay, WTF, I forgot to put restrict on something. Now I have to randomly browse around my code and type definitions and try to find the one that's missing restrict. That's fucking retarded for workflow. I should just be able to say __restrict { } over my chunk of code.
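One partial workaround: C99 allows restrict on pointers declared at block scope, so you can re-declare restrict-qualified local copies at the top of the hot block instead of hunting through every declaration. A sketch, using the __restrict spelling that GCC, Clang and MSVC accept; scale_add and its parameters are hypothetical, and whether the optimizer actually takes advantage is up to the compiler:

```c
#include <stddef.h>

/* Hypothetical routine: dst[i] += (*scale) * src[i]. */
void scale_add(float *dst, const float *src, const float *scale, size_t n)
{
    /* Block-scope restrict copies: a promise that, within this block,
       anything written through d is not also accessed through s or k. */
    {
        float * __restrict d = dst;
        const float * __restrict s = src;
        const float * __restrict k = scale;
        size_t i;
        for (i = 0; i < n; i++)
            d[i] += (*k) * s[i];   /* *k no longer needs reloading per store */
    }
}
```

It only covers pointers you copy by hand, so it is still a long way from a blanket __restrict { } over the chunk.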

3 comments:

Autodidactic Asphyxiation said...

Yeah, being able to get the compiler to inline a call site would be nice. These are also the sort of things that the compiler tends to do a much better job of with profile-guided optimization, but that is a big pain, especially since it makes iterations much longer.

At work, the compiler team says things like: don't do these low-level hand-optimizations, file a compiler optimization bug. This is not really acceptable for most people, even those with access to a compiler team. For one, you don't want to add the compiler release cycle to your iterations. Second, GCC is getting better at auto-vectorizing, but humans are still better.

Another close-but-no-cigar thing is GCC's "flatten" function attribute, which will attempt to inline all the transitive calls under a function. Newer GCC also supports function-specific #pragmas to change optimization settings.
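A minimal sketch of flatten, assuming GCC or Clang; the FLATTEN macro and the function names are made up for illustration. The attribute goes on the caller and asks for every call in its body to be inlined where possible:

```c
/* Hypothetical wrapper macro; expands to nothing on other compilers. */
#if defined(__GNUC__)
  #define FLATTEN __attribute__((flatten))
#else
  #define FLATTEN
#endif

static int square(int x)        { return x * x; }
static int sum_sq(int a, int b) { return square(a) + square(b); }

/* Calls to sum_sq, and transitively square, get inlined here if possible. */
FLATTEN int hot_loop(const int *v, int n)
{
    int i, total = 0;
    for (i = 0; i + 1 < n; i += 2)
        total += sum_sq(v[i], v[i + 1]);
    return total;
}
```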

wiewior said...

I was actually thinking about creating such a construct as a weekend project, to learn either the gcc codebase or plugins, if that's possible (I doubt it).

But although I can understand what you want in:
__restrict { ...code.... }

Which of the below would you see as the proper approach to the volatile problem? (You mentioned it in some older post, which I can't find.)

int a,b,c;

volatile a =b*4 + pow(c,2);
volatile { a =b*4 + pow(c,2); }
volatile (a,b) { a =b*4 + pow(c,2); }

Only the third seems to address the issue of multi-volatile-variables-in-single-statement (MVVISS; don't need this acronym, but looks good!) without making everything volatile'd.

cbloom said...

What do you mean by "volatile problem" ?

Are you talking about compiler ordering of memory accesses, or " __mayalias" ?

I guess __mayalias is the same as making every memory access inside the __mayalias block into a volatile access. That is :

__mayalias {
*p1 = x;
y = *p2;
}

tells it p1 and p2 might point at the same thing, so please actually generate the store and load.
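With today's compilers the usual way to approximate that is to cast each access to a volatile-qualified pointer. A sketch only; the macro and function names are hypothetical, and this relies on how compilers treat volatile lvalues in practice (it does not make type-punning through incompatible pointer types legal):

```c
/* Hypothetical helpers: force a real store/load through an int pointer. */
#define FORCE_STORE_INT(p, v) (*(volatile int *)(p) = (v))
#define FORCE_LOAD_INT(p)     (*(volatile int *)(p))

int store_then_load(int *p1, int *p2, int x)
{
    FORCE_STORE_INT(p1, x);      /* the store is actually emitted */
    return FORCE_LOAD_INT(p2);   /* the load is emitted, after the store */
}
```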
