5/29/2010

05-29-10 - x64 so far

x64 linkage that's been useful so far :

__asm__ cmpxchg8b/cmpxchg16b - comp.programming.threads Google Groups
_InterlockedCompareExchange Intrinsic Functions
x86-64 Tour of Intel Manuals
x64 Starting Out in 64-Bit Windows Systems with Visual C++
Writing 64-bit programs
Windows Data Alignment on IPF, x86, and x86-64
Use of __m128i as two 64 bits integers
Tricks for Porting Applications to 64-Bit Windows on AMD64
The history of calling conventions, part 5 amd64 - The Old New Thing - Site Home - MSDN Blogs
Snippets lifo.h
Predefined Macros (C/C++)
Physical Address Extension - PAE Memory and Windows
nolowmem (Windows Driver Kit)
New Intrinsic Support in Visual Studio 2008 - Visual C++ Team Blog - Site Home - MSDN Blogs
Moving to Windows Vista x64
Moving to Windows Vista x64 - CodeProject
Mark Williams Blog jmp'ing around Win64 with ml64.exe and Assembly Language
Kernel Exports Added for Version 6.0
Is there a portable equivalent to DebugBreak() / __debugbreak - Stack Overflow
How to Log Stack Frames with Windows x64 - Stack Overflow
BCDEdit Command-Line Options
Available Switch Options for Windows NT Boot.ini File
AMD64 Subpage
AMD64 (EM64T) architecture - CodeProject
20 issues of porting C++ code on the 64-bit platform

One unexpected annoyance has been that a lot of the Win32 function signatures have changed. For example LRESULT is now pointer-sized (a LONG_PTR), not a LONG. This is a particular problem because Win32 has always made heavy use of cramming the wrong type into various places, eg. for GetWindowLong and stuffing pointers in LPARAM's and all that kind of shit. So you wind up having tons of C-style casts when you write Windows code. I have made good use of these guys :



// same_size_value_cast just does a value cast
//  (and compile-time asserts that the sizes match)
template < typename t_to, typename t_fm >
t_to same_size_value_cast( t_fm & from )
{
    COMPILER_ASSERT( sizeof(t_to) == sizeof(t_fm) );
    // just value cast :
    return (t_to) from;
}

// same_size_bit_cast casts the bits in memory
//  eg. it's not a value cast
template < typename t_to, typename t_fm >
t_to & same_size_bit_cast_p( t_fm & from )
{
    COMPILER_ASSERT( sizeof(t_to) == sizeof(t_fm) );
    // cast through char * to make aliasing work ?
    char * ptr = (char *) &from;
    return *( (t_to *) ptr );
}

// same_size_bit_cast casts the bits in memory
//  eg. it's not a value cast
// cast with union is better for gcc / Xenon :
// (returns by value, since the union is a local)
template < typename t_to, typename t_fm >
t_to same_size_bit_cast_u( t_fm & from )
{
    COMPILER_ASSERT( sizeof(t_to) == sizeof(t_fm) );
    union _bit_cast_union
    {
        t_fm fm;
        t_to to;        
    };
    _bit_cast_union converter = { from };
    return converter.to;
}

// check_value_cast just does a static_cast and makes sure you didn't wreck the value
template < typename t_to, typename t_fm >
t_to check_value_cast( const t_fm & from )
{
    t_to to = static_cast<t_to>(from);
    ASSERT( static_cast<t_fm>(to) == from );
    return to;
}

inline int ptr_diff_32( ptrdiff_t diff )
{
    return check_value_cast<int>(diff);
}
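
For example, call sites wind up looking something like this (just a usage sketch, not from my real code ; MyData and these little wrappers are made up for illustration) :

// usage sketch only ; assumes <windows.h> and the helpers above
#include <windows.h>

struct MyData { int stuff; };

void StashPointer( HWND hwnd, MyData * data )
{
    // pointer <-> LONG_PTR is the same size on both x86 and x64 :
    SetWindowLongPtr( hwnd, GWLP_USERDATA, same_size_value_cast<LONG_PTR>( data ) );
}

MyData * FetchPointer( HWND hwnd )
{
    LONG_PTR stored = GetWindowLongPtr( hwnd, GWLP_USERDATA );
    return same_size_value_cast<MyData *>( stored );
}

int LParamToInt( LPARAM lParam )
{
    // LPARAM is 64 bits on x64 ; assert that the value actually fits in 32 :
    return check_value_cast<int>( lParam );
}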

BTW this all has made me realize that the recent x86-32 monotony on PC's has been a delightful stable period for development. I had almost forgotten that it used to always be like this. Now to do simple shit in my code, I have to detect if it's x86 or x64 ; if it is x64, do I have an MSC version that has the intrinsics I need? If not, I have to write a god damn MASM file. Oh, and I often have to check for Vista vs. XP to tell if I have various kernel calls. For example :


#if _MSC_VER > 1400

// have intrinsic
_InterlockedExchange64()

#elif _X86_NOT_X64_

// I can use inline asm
__asm { cmpxchg8b ... }

#elif OS_IS_VISTA_NO_XP

// kernel library call available
InterlockedExchange64()

#else

// X64 , not Vista (or want to be XP compatible) , older compiler without intrinsic,
//  FUCK !

#error just use a newer MSVC version for X64 because I don't want to fucking write MASM routines

#endif

Even ignoring the pain of the last FUCK branch which requires making a .ASM file, the fact that I had to do a bunch of version/target checks to get the right code for the other paths is a new and evil pain.
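
For the record, the x86-32 asm path is the usual lock cmpxchg8b retry loop - this is only a sketch from memory, assuming MSVC inline asm, and my_InterlockedExchange64 is a made-up name, not a real API :

// sketch : emulate InterlockedExchange64 on x86-32 with lock cmpxchg8b
// (assumes MSVC inline asm ; untested, illustrative only)
__int64 my_InterlockedExchange64( volatile __int64 * dest, __int64 newVal )
{
    __int64 oldVal;
    __asm
    {
        mov     esi, dest
        mov     ebx, dword ptr newVal        // low 32 bits of the new value
        mov     ecx, dword ptr newVal + 4    // high 32 bits
        mov     eax, dword ptr [esi]         // current value guess -> edx:eax
        mov     edx, dword ptr [esi + 4]
    retry:
        lock cmpxchg8b qword ptr [esi]       // if [esi] == edx:eax then [esi] = ecx:ebx
        jnz     retry                        // else edx:eax = [esi] , try again
        mov     dword ptr oldVal, eax        // success : edx:eax holds the previous value
        mov     dword ptr oldVal + 4, edx
    }
    return oldVal;
}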

Oh, while I'm ranting, fucking MSDN is now showing all the VS 2010 documentation by default, and they don't fucking tell you what version things became available in.

This actually reminds me of the bad old days when I got started, when processors and instruction sets were changing rapidly. You actually had to make different executables for 386/486 and then Pentium, and then PPro/P3/etc (not to mention the AMD chips that had their own special shiznit). Once we got to the PPro it really settled down and we had a wonderful monotony of well developed x86 on out-of-order machines that continued up to the new Core/Nehalem chips (only broken by the anomalous blip of Itanium that we all ignored as it went down in flames like the Hindenburg). Obviously we've had consoles and Mac and other platforms to deal with, but that was only a concern for real products that want portability; I could write my own Wintel code for home and not think about any of that. Well, Wintel is monoflavor no more.

The period of CISC and chips with fancy register renaming and so on was pretty fucking awesome for software developers, because you see the same interface for all those chips, and then behind the scenes they do magic mumbo jumbo to turn your instructions into fucking gene sequences that multiply and create bacteria that actually execute the instructions, but it doesn't matter because the architecture interface still just looks the same to the software developer.

11 comments:

  1. > I could write my own Wintel code for home and not think about any of that

    It's still monoflavour, it's just Win7+, x64, VS08+ now.

    x86? vs05? xp/vista? Let it go. There are so many other platforms (OSX, Linux, phones*N) vying for attention that that's all the attention Windows gets these days.

  2. Yeah, I'm almost with you on that.

    Unfortunately MS didn't get the message about making all Vista & Win7 versions 64 bit. Grrr.

    Also if you want to actually sell something, XP-32 is still king :

    http://store.steampowered.com/hwsurvey/

    on the plus side XP-64 is negligible, so I can just ignore that case.

    I guess the simplified case is to support XP-32 and Vista-64 , and for the other weirdo cases (Vista-32,XP-64) they can just run the XP-32 version.

    Also the Vista-64 version only supports the newer AMD64 chips with all the instructions, and older chips get the XP-32 version.

    Mmm that's not too bad.

  3. "// cast through char * to make aliasing work ?"

    That doesn't make aliasing work (in theory or in practice). (I think the union doesn't work in theory but does in practice). Casts are irrelevant - what matters is that you can't legally access a value in memory as two incompatible types. T is compatible with const T, and with a member of type T in a struct/union, etc, and with char, but it's not compatible with some unrelated type S. If you access the value as T then you can never access the same value as S, regardless of what you do with the pointer types. If the compiler sees a T* and an S* at the same time, it can always assume changes to one won't affect the other. So you have to copy the value into a value of the new type (use memcpy and the compiler will often inline it and optimise it into nothingness).
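
    A minimal sketch of what that memcpy version could look like, in the style of the helpers above (bit_cast_via_memcpy is just an illustrative name) :

    #include <string.h>

    template < typename t_to, typename t_fm >
    t_to bit_cast_via_memcpy( const t_fm & from )
    {
        COMPILER_ASSERT( sizeof(t_to) == sizeof(t_fm) );  // same macro as the helpers above
        t_to to;
        // only ever accessed as t_fm, t_to, and (inside memcpy) char, so no illegal aliasing :
        memcpy( &to, &from, sizeof(to) );
        return to;
    }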

  4. "That doesn't make aliasing work (in theory or in practice). (I think the union doesn't work in theory but does in practice)."

    Hmm I'm pretty sure char * does in fact work in practice on some compilers (gcc 32 if I recall correctly), but you are right that it's not a general solution.

    "Casts are irrelevant"

    This is not true, casting to char * is special cased in the standard.

    "is that you can't legally access a value in memory as two incompatible types."

    Yeah this is bollocks and I have to find a real solution to get around it.

    Fucking C99 should have defined a standardized "bit_cast". Either that or just an "allow aliasing here" bracket.

  5. Yeah this is fucking annoying and pointless.

    How am I supposed to do this? I have some type I know is 32 bits and aligned and blah blah, like

    __declspec(align(4))
    struct { uint16 a; uint8 b[2]; }

    and I want to call

    Swap32( uint32 * a, uint32 * b);

    and pass in my struct. WTF.

  6. There is one solution for this :

    Swap32( uint32 * a, uint32 * b);

    which is to instead define it like

    Swap32( void* a, void* b);

    because void* and char* are allowed to alias anything. Now, if I actually implemented Swap32 in C, I would have to cast back from void* to uint32* which would be forbidden, so I'll just fucking implement it in assembly. Take that, compiler!

  7. It doesn't work in (at least) 32/64-bit GCC 4.1 and 4.4 on Linux - simple test case:

    #include <stdio.h>
    int main() {
        float f = 0;
        same_size_bit_cast_p<int, float>(f) = 0x3f800000;
        printf("%f\n", f);
    }

    prints 1.000000 with -O0, and 0.000000 with -O2. (It would work in old GCCs that don't have strict aliasing optimisations, but that's not very interesting.)

    The relevant special case in the standard (3.10.15) is about accessing values of any type through an lvalue of char type, not about casting. Casting might give you an lvalue of char type, but all that matters is the type of the lvalues you use for dereferencing and the casts are a distraction.

    The compiler doesn't care what same_size_bit_cast_p does internally (whether it's doing lots of casts or going through a union or is written in assembly or whatever), it simply knows that writes to an int lvalue can never legally affect reads of a float. A standardised bit_cast wouldn't be any different - the compiler would have to do some whole-program data-flow analysis to work out which variables you might leak aliased pointers into, otherwise it could never be sure if the optimisation is safe.

    You can implement Swap32 in C like

    void Swap32(char* a, char* b) {
        char t[4];
        memcpy(t, a, 4);
        memcpy(a, b, 4);
        memcpy(b, t, 4);
    }

    which gets optimised into mov instructions, and it's legal since values are only being accessed as their original type and as char (since memcpy is defined as (conceptually) copying chars). Not pretty, but at least it's possible.

  8. "The relevant special case in the standard (3.10.15) is about accessing values of any type through an lvalue of char type, not about casting."

    Mmm maybe if I just cast both sides to char * and poke them as char's that will trip things up.

    "The compiler doesn't care what same_size_bit_cast_p does internally"

    I don't think that's true. If you have a memcpy internally it must turn off the no-aliasing assumption. I assume most compilers turn it off when they see asm blocks too.

    "A standardised bit_cast wouldn't be any different - the compiler would have to do some whole-program data-flow analysis"

    That's not true. All I need is a local __assume_aliasing { } block.

    The whole way assume-no-aliasing is associated with types is bonkers IMO. Same thing with putting restrict on variables. It should be portions of *code*. I should be able to take chunks and say "here there be aliasing" or "here there be no aliasing".

    "You can implement Swap32 in C like"

    I don't see how that's valid unless memcpy and casts to char* (accesses as char) get special treatment (eg. triggering reloads from memory afterward). So let me have a way to achieve the same thing.

    And of course to implement memcpy I'd have to do lots of type punning, so how in the world do I implement memcpy?

    I think I could get everything I want if I just had a _CompilerMemFence() directive. CompilerMemFence is not a real memory fence, it just forces the compiler to act as if all writes are flushed before the fence, and all future reads will see any changes from earlier writes.
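
    Something like that does exist as compiler-specific barriers, fwiw. A sketch of the non-portable versions (compiler-specific, not a standard fix) :

    // compiler-only barrier : no fence instruction, it just stops the compiler
    // from caching memory contents across it
    #ifdef _MSC_VER
    #include <intrin.h>
    #define CompilerMemFence()  _ReadWriteBarrier()
    #else
    #define CompilerMemFence()  __asm__ __volatile__ ( "" : : : "memory" )
    #endif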

  9. If you implement a bit_cast with memcpy then it's not really a cast, it's a copy - you're never asking it to read/write the same piece of memory as two different non-char types so there's no unsafe aliasing. It solves the problem by returning a fresh value with the original bytes copied into it rather than returning a pointer that's aliased with the original.

    To implement memcpy in C you just do it like "T src; S dst; for (i...) ((char*)&dst)[i] = ((char*)&src)[i]" - that's where the special case of accessing values through char types comes in, which makes it fine for src to be accessed as both T and char. Nothing is ever used as both T and S so there's no problem, and there's no need to extend the language and compilers with new constructs to add to the already-unpleasantly-complex aliasing mess.

    (Then you rely heavily on the optimiser to make all your char copying not completely awful, or you rely on compiler-specific knowledge and write technically undefined code instead.)

  10. "To implement memcpy in C you just do it like "T src; S dst; for (i...) ((char*)&dst)[i] = ((char*)&src)[i]" - that's where the special case of accessing values through char types comes in, which makes it fine for src"

    That's not how you implement memcpy. You roll up to U32 copies, then U128 copies and maybe do SSE non-temporal streams. How am I supposed to do that without a way to tell the compiler "hey I am aliasing some shit here, respect it".

    I know the reality is memcpy is not really a library function any more, it's basically a keyword in the language with special meaning, but I need to be able to do stuff similar to memcpy often, so I think it's a salient example.

  11. That's how you implement memcpy portably, which is what C cares about. (Maybe your ints have sizeof(int)==4 but 30-bit range and a 2-bit garbage collection tag, and if you read 4 arbitrary 8-bit chars into them then you'll crash randomly). If you only care about real life and not about obscure historical or hypothetical or future architectures then you're going outside the standard's scope, so you have to rely on compiler-specific features (and then you can use GCC's __attribute__((__may_alias__)) or compile in a separate source file with -fno-strict-aliasing, or use a compiler that doesn't do strict aliasing optimisations at all).
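
    A tiny sketch of the GCC attribute route (GCC-specific, and the names here are just illustrative) :

    // a typedef that is allowed to alias anything (GCC extension) :
    typedef unsigned int aliasing_u32 __attribute__((__may_alias__));

    void Swap32_aliased( void * a, void * b )
    {
        // legal under -fstrict-aliasing because aliasing_u32 may alias any type
        // (assumes both pointers are 4-byte aligned) :
        aliasing_u32 t = *(aliasing_u32 *) a;
        *(aliasing_u32 *) a = *(aliasing_u32 *) b;
        *(aliasing_u32 *) b = t;
    }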
