6/07/2010

06-07-10 - Unicode CMD Code Page Checkup

I wrote a while ago about the horrible fuckedness of Unicode support on Windows :

cbloom rants 06-21-08 - 3
cbloom rants 06-15-08 - 2
cbloom rants 06-14-08 - 3

Part of the nastiness was that in Win32 , command line apps get args in OEM code page, but most Windows APIs expect files in ANSI code page. (see my pages above - simply doing OEM to ANSI conversions is not the correct way to fix that) (also SetFileAPIsToOEM is a very bad idea, don't do that).

Here is what I have figured out on Win64 so far :

1. CMD is still using 8 bit characters. Technically they will tell you that CMD is a "true unicode console". eh, sort of. It uses whatever the current code page is set to. Many of those code pages are destructive - they do not preserve the full unicode name. This is what causes the problem I have talked about before of the possibility of having many unicode names which show up as the same file in CMD.

2. "chcp 65001" changes you code page to UTF-8 which is a non-destructive code page. This only works with some fonts that support it (I believe Lucida Concole works if you like that). The raster fonts do not support it.

3. printf() with unicode in the MSVC clib appears to just be written wrong; it does not do the correct code page translation. Even wprintf() passed in correct unicode file names does not do the code page translation correctly. It appears to me they are always converting to ANSI code page. On the other hand, WriteConsoleW does appear to be doing the code page translation correctly. (as usual the pedantic rule lawyers will spout a bunch of bullshit about the fact that printf is just fine the way it is and it doesn't do translations and just passes through binary; not true! if I give it 16 bit chars and it outputs 8 bit chars, clearly it is doing translation and it should let me control how!)


expected : printf with wide strings (unicode) would do translation to the console code page 
(as selected with chcp) so that  characters show up right.
(this should probably be done in the stdout -> console pipe)

observed : does not translate to CMD code page, appears to always use A code page even with
the console is set to another code page

4. CMD has a /U switch to enable unicode. This is not what you think, all it does is make the output of built-in commands unicode. Note that your command line apps might be piped to a unicode text file. To handle this correctly in your own console app, you need to detect that you are being piped to unicode and do unicode output instead of converting to console CP. Ugly ugly.

5. CMD display is still OEM code page by default. In the old days that was almost never changed, but nowadays more people are in fact changing it. To be polite, your app should use GetConsoleOutputCP() , you should *NOT* use SetConsoleOutputCP from a normal command line app because the user's font choice might not support the code page you want.

6. CMD argc/argv argument encoding is still in the console code page (not unicode). That is, if you run a command line app from CMD and auto-complete to select a file with unicode name, you are passed the code page encoding of that unicode name. (eg. it will be bytewise identical to if you did FindFirstW and then did UnicodeToOem). This means GetCommandLineW() is still useless for command line apps - you cannot get back to the original unicode version of the command line string. It is possible for you to get started with unicode args (eg. if somebody many you from CreateProcessW), in which case GetCommandLineW actually is useful, but that is so rare it's not really worth worrying about.


expected : GetCommandLineW (or some other method) would give you the original full unicode arguments
(in all cases)

observed : arguments are only available in CMD code page

7. If I then call system() from my app with the CMD code page name, it fails. If I find the Unicode original and convert it to Ansi, it is found. It appears that system() uses the ANSI code page (like other 8-bit file apis). ( system() just winds up being CreateProcess ). This means that if you just take your own command line that called you and do the same thing again with system() - it might fail. There appears to be no way to take a command line that works in CMD and run it from your app. _wsystem() seems to behave well, so that might be the cleanest way to proceed (presuming you are already doing the work of promoting your CMD code page arguments to proper unicode).


repro : write an app that takes your own full argc/argv array, and spawn a process with those same args 
(use an exe name that involved troublesome characters)

expected : same app runs again

observed : if A and Console code pages differ, your own app may not be found

8. Copy-pasting from CMD consoles seems to be hopeless. You cannot copy a chunk of unicode text from Word or something and paste it into a console and have it work (you would expect it to translate the unicode into the console's code page, but no). eg. you can't copy a unicode file name in explorer and paste it into a console. My oh my.


repro : copy-paste a file name from (unicode) Explorer into CMD

expected : unicode is translated into current CMD code page and thus usable for command line arguments

observed : does not work

9. "dir" seems to cheat. It displays chars that aren't in the OEM code page; I think they must be changing the code page to something else to list the files then changing it back (their output seems to be the same as mine in UTF8 code page). This is sort of okay, but also sort of fucked when you consider problem #8 : because of this dir can show file names which will then not be found if you copy-paste them to your command line!


repro : dir a file with strange characters in it.  copy-paste the text output from dir and type 
"dir <paste>" on the command line

expected : file is found by dir of copy-paste

observed : depending on code page, the file is not be found

So far as I can tell there is no API tell you the code page that your argc/argv was in. That's a pretty insane ommission. (hmm, it might be GetConsoleCP , though I'm not sure about that). (I'm a little unclear about when exactly GetConsoleCP and GetConsoleOutputCP can not be the same; I think the answer is they are only different if output is piped to a file).

I haven't tracked down all the quirks yet, but at the moment my recommendation for best practices for command line apps goes like this :

1. Use GetConsoleCP() to find the input CP. Take your argc/argv and match any file arguments using FindFirstW to get the unicode original names. (I strongly advising using cblib/FileUtil for this as it's the only code I've ever seen that's even close to being correct). For arguments that aren't files, convert from the console code page to wchar.

2. Work internally with wchar. Use the W versions of the Win32 File APIs (not A versions). Use the _w versions of clib FILE APIs.

3. To printf, either just write your own printf that uses WriteConsoleW internally, or convert wide char strings to GetConsoleOutputCP() before calling into printf.

For more information :

Console unicode output - Junfeng Zhang's Windows Programming Notes - Site Home - MSDN Blogs
windows cmd pipe not unicode even with U switch - Stack Overflow
Unicode output on Windows command line - Stack Overflow
INFO SetConsoleOutputCP Only Effective with Unicode Fonts
GetConsoleOutputCP Function (Windows)
BdP Win32ConsoleANSI

Addendum : I've updated cblib and chsh with new versions of everything that should now do all this at least semi-correctly.


BTW a semi-related rant :

WTF are you people who define these APIs not actually programmers? Why the fuck is it called "wcslen" and not "wstrlen" ? And how about just fucking calling it strlen and using the C++ overload capability? Here are some sane ways to do things :


typedef wchar_t wchar; // it's not fucking char_t

// yes int, not size_t mutha fucka
int wstrlen(const wchar * str) { return wcslen(str); }
int strlen(const wchar * str) { return wstrlen(str); }

// wsizeof replaces sizeof for wchar arrays
#define wsizeof(str)    (sizeof(str)/sizeof(wchar))
// best to just always use countof for strings instead of sizeof
#define countof(str)	(sizeof(str)/sizeof(str[0]))

fucking be reasonable to your users. You ask too much and make me do stupid busy work with the pointless difference in names between the non-w and the w string function names.

Also the fact that wprintf exists and yet is horribly fucked is pretty awful. It's one of those cases where I'm tempted to do a


#define wprintf wprintf_does_not_do_what_you_want

kind of thing.

06-07-10 - Exceptions

Small note about exceptions for those who are not aware :

x64 totally changes the way exceptions on the PC are implemented. There are two major paradigms for exceptions :

1. Push/pop tracking and unwinding. Each time you enter a function some unwind info is pushed, and when you leave it is popped. When an exception fires it walks the current unwind stack, calling destructors and looking for handlers. This makes executables a lot larger because it adds lots of push/pop instructions, and it also adds a speed hit everywhere even when exceptions are never thrown becuase it still has to push/pop all the unwind info. MSVC/Win32 has done it this way. ("code driven")

2. Table lookup unwinding. A table is made at compile time of how to unwind each function. When an exception is thrown, the instruction pointer is used to find the unwind info in the table. This is obviously much slower at throw time, but much faster when not throwing, and also doesn't bloat the code size (if you don't count the table, which doesn't normally get loaded at all). I know Gcc has done this for a while (at least the SN Systems version that I used a while ago). ("data driven")

All x64 code uses method #2 now. This is facilitated by the new calling convention. Part of that is that the "frame pointer omission" optimization is forbidden - you must always push the original stack and return address, because that is what is walked to unwind the call sequence when a throw occurs. If you write your own ASM you have to provide prolog/epilog unwind info, which goes into the unwind table to tell it how to unwind your stack.

This has been rough and perhaps slightly wrong. Read here for more :

X64 Unwind Information - FreiK's WebLog - Site Home - MSDN Blogs
Uninformed - vol 4 article 1
Introduction to x64 debugging, part 3 � Nynaeve
Introduction to x64 debugging, part 2 � Nynaeve
Exception Handling (x64)

BTW many people (Pierre, Jeff, etc.) think the overhead of Win32's "code driven" style of exception handling is excessive. I agree to some extent, however, I don't think it's terribly inefficient. That is, it's close to the necessary amount of code and time that you have to commit if you actually want to catch all errors and be able to handle them. The way we write more efficient code is simply by not catching all errors (or not being able to continue or shut down cleanly from them). That is we decide that certain types of errors will just crash the app. What I'm saying is if you actually wrote error checkers and recovery for every place you could have a failure, it would be comparable to the overhead of exception handling. (ADDENDUM : nah this is just nonsense)

6/02/2010

06-02-10 - Some random Win64 junk

To build x64 on the command line, use

vcvarsall.bat amd64

Oddly the x64 compiler is still in "c:\program files (x86)" , not "c:\program files" where it's supposed to be. Also there's this "x86_amd64" thing, I was like "WTF is that?" , it's the cross-compiler to build x64 apps from x86 machines.

amd64 is a synonym for x64. IA64 is Itanium.

The actual "CL" command line to build x64 vs x86 can stay unchanged I think (not sure about this yet). For example you still link shell32.lib , and you still define "WIN32".

Another nasty case where 64 bit bytes your ass is in the old printf non-typed varargs problem. If you use cblib/safeprintf this shouldn't be too bad. Fucking printf formatting for 64 bit ints is not standardized. One nice thing on MSVC is you can use "%Id" for size_t sized things, so it switches between 32 and 64. For real 64 bit use "%I64d" (on MSVC anyway).

size_t really fucking annoys me because it's unsigned. When I do a strlen() I almost always want that to go into a signed int because I'm moving counters around that might go negative. So I either have to cast or I get tons of warnings. Really fucking annoying. So I use my own strlen32 and anyone who thinks I will call strlen on a string greater than 2 GB is smoking some crack.

Here are the MSC vers :


MSVC++ 9.0  _MSC_VER = 1500
MSVC++ 8.0  _MSC_VER = 1400
MSVC++ 7.1  _MSC_VER = 1310
MSVC++ 7.0  _MSC_VER = 1300
MSVC++ 6.0  _MSC_VER = 1200
MSVC++ 5.0  _MSC_VER = 1100

Some important revs : "volatile" changes meaning in ver 1400 ; most of the intrinsics show up in 1300 but more show up in 1400. Template standard compliance changes dramatically in 1400.

06-02-10 - Directives in the right place

I'm annoyed with "volatile" and "restrict" and all that shit because I believe they put the directive in the wrong place - on the variable declaration - when really what you are trying to do is modify certain *actions* , so the qualifier should be on the *actions* not the variables.

eg. if I write some matrix multiply routine and I know I have made it memory-alias safe, I want to say "restrict" on the *code* chunk, not on certain variables.

Same thing with "volatile" in the Java/C++0x/MSVC>=2005 sense, it's really a command for the load or the store, eg. "this store must actually be written to memory and must be ordered against other stores" , not a descriptor of a variable.

06-02-10 - 64 bit transition

The broke ass 64/32 transition is pretty annoying. Some things that MS could/should have done to make it easier :

Don't sell Win7-32 for desktops. If you go to most PC makers (Dell, Lenovo, etc.) and spec a machine, the default OS will be Win7 - 32 bit. I'm sure lots of people don't think about that and just hit okay. That fucking sucks, they should be encouraging everyone to standardize to 64 bit.

Give me a 32+64 combined exe. This is trivial and obvious, I don't get why I don't have it. For most apps there's no need for a whole separate Program Files (x86) / Program Files install dir nonsense. Just let me make a .EXE that has the x86 and the x64 exes in it, and execute the right one. That way I can just distribute my app as a single EXE and clients get the right thing.

Let me call 32 bit DLL's from 64 bit code. As usual people write a bunch of nonsense about why you can't do it that just isn't true. I can't see any real good reason why I can't call 32 bit DLL's from 64 bit code. There are a variety of solutions to the memory addressing range problem. The key thing is that the memory addresses we are passing around are *virtual* addresses, not physical, and each app can have different virtual address page tables. You could easily make the 32-bit to 64-bit thunk act like a process switch, which swaps the page tables. It's perfectly possible for an app with 32 bit of address space to access 64 bits of physical memory (of course this is exactly what 32 bit apps running on 64 bit windows do). Okay, that would be pretty ugly, but there's a very simple solution - reserve the lower 4G of virtual address space in 64 bit apps for communication with 32 bit apps. That is, make all the default calls to HeapAlloc go to > 4G , but give me a flag to alloc in the lower 4 G. Then if I want to call to a 32 bit DLL I have to pass it memory from the lower 4G space. Yes obviously you can't just build your old app and attach to a 32 bit DLL and have it just work, but it would still give me a way to access it when I need it (and there are plenty of other reasons why you wouldn't be able to just link in the 32 bit dll without thinking, eg. the sizes of many of your types have changed).

Right now if I have some old 32 bit DLL that I can't recompile for whatever reason and I need to get to it from my 64 bit app, the recommended solution is to make a little 32 bit server app that calls the DLL for me and use interprocess communication with named memory maps to give its data to the 64 bit app. That's okay and all but they easily could have packaged that into a much easier thing to do.

This code project article is actually one of the best I've seen on the transition. It reminds me of one of the weird broke-ass things I've seen : some of my favorite command line apps are still old 32 bit builds. They still work ... mostly. The problem is that they go through the fucking WOW64 directory remapping, so if you try to do anything with them in the Windows system directories you can get *seriously* fucked up. I really don't like the WOW64 directory remap thing. It only exists to fix broke-ass apps that manually look for stuff in "c:\program files" or whatever. I feel it could have been off by default and then turned on only for exe's that need it. I understand their way makes most 32 bit apps work out of the box which is what most people want, so that all makes sense and is fine, but it is nasty that my perfectly good 32 bit apps don't actually get to see the true state of the machine for no good reason.

To give an example of the most insane case : I use my own "dir" app. Before I rebuilt it, I was using the old 32 bit exe version. It literally lies to me about what files are in a given dir because it gets the WOW64 remap. That's whack. The 32 bit exes work just fine on Win64 and the only thing forcing me to rebuild a lot of these things is just to make them avoid the WOW64.

5/29/2010

05-29-10 - Some more x64

Okay , MASM/MSDev support for x64 is a bit fucked. MSDev has built-in support for .ASM starting in VC 2005 which does everything for you, sets up custom build rule, etc. The problem is, it hard-codes to ML.EXE - not ML64. Apparently they have fixed this for VC 2010 but it is basically impossible to back-fix. (in VC 2008 the custom build rule for .ASM is in an XML file, so you can fix it yourself thusly )

The workaround goes like this :

Go to "c:\program files (x86)\microsoft visual studio 8\vc\bin". Find the occurance of ML64.exe ; copy them to ML.exe . Now you can add .ASM files to your project. Go to the Win32 platform config and exclude them from build in Win32.

You now have .ASM files for ML64. For x86/32 - just use inline assembly. For x64, you extern from your ASM file.

Calling to x64 ASM is actually very easy, even easier than x86, and there are more volatile registers and the convention is that caller has to do all the saving. All of this means that you as a little writer of ASM helper routines can get away with doing very little. Usually your args are right there in {rcx,rdx,r8,r9} , and then you can use {rax,r10,r11} as temp space, so you don't even have to bother with saving space on the stack or any of that. See list of volatile registers

BTW the best docs are just the full AMD64 manuals .

For example here's a full working .ASM file :


public my_cmpxchg64

.CODE

align 8
my_cmpxchg64 PROC 

  mov rax, [rdx]
  lock cmpxchg [rcx], r8
  jne my_cmpxchg64_fail
  mov rax, 1
  ret

align 8
my_cmpxchg64_fail:
  mov [rdx], rax
  mov rax, 0
  ret
align 8
my_cmpxchg64 ENDP

END

And how to get to it from C :


extern "C"  extern int my_cmpxchg64( uint64 * val, uint64 * oldV, const uint64 newV );

BTW one of the great things about posting things on the net is just that it makes me check myself. That cmpxchg64 has a stupid branch, I think this version is better :


align 8
my_cmpxchg64 PROC
  mov rax, [rdx]
  lock cmpxchg [rcx], r8
  sete cl
  mov [rdx], rax
  movzx rax,cl
  ret
my_cmpxchg64 ENDP

and you can probably do better. (for example it's probably better to just define your function as returning unsigned char and then you can avoid the movzx and let the caller worry about that)

ADDENDUM : I just found a new evil secret way I'm fucked. Unions with size mismatches appears not to even be a warning of any kind. So for example you can silently have this in your code :


union Fucked
{
    struct
    {
        void * p1;
        int t;
    } s;
    uint64  i;
};

build in 64 bit and it's just hose city. BTW I think using unions as a datatype in general is probably bad practice. If you need to be doing that for some fucked reason, you should just store the member as raw bits, and then same_size_bit_cast() to convert it to the various types. In other words, the dual identity of that memory should be a part of the imperative code, not a part of the data declaration.

05-29-10 - Lock Free in x64

I mentioned long ago in the low level threading articles that some of the algorithms are a bit problematic on with 64 bit pointers because we don't have large enough atomic operations.

The basic problem is that for many of the lock-free algorithms we need to be able to do a DCAS , that is a CAS of two pointer-sized values, or a pointer and a counter. When our pointer was 32 bits, we could use a 64 bit CAS to implement DCAS. If our pointer is 64 bits then we need a 128 bit CAS to implement DCAS the same way. There are various solutions to this :

1. Use 128 bit CAS. x64 has cmpxchg16b now which is exactly what you need. This is obviously simple and nice. There are a few minor problems :

1.A. There are not other 128 bit atomics, eg. Exchange and Add and such are missing. These can be implemented in terms of loops of CAS, but that is a very minor suckitude.

1.B. Early AMD64 chips do not have cmpxchg16b. You have to check for its presence with a CPUID call. If it doesn't exist you are seriously fucked. Fortunately these chips are pretty rare, so you can just use a really evil fallback to keep working on them : either disable threading completely on them, or simply run the 32 bit version of your app. The easiest way to do that is to have your installer check the CPUID flag and install the 32 bit x86 version of your app instead of the 64 bit version.

1.C. All your lock-free nodes become 16 bytes instead of 8 bytes. This does things like make your minimum alloc size 16 bytes instead of 8 bytes. This is part of the general bloating of 64 bit structs and mildly sucks. (BTW you can see this in winnt.h as MEMORY_ALLOCATION_ALIGMENT is 16 on Win64 and 8 on Win32).

1.D. _InterlockedCompareExchange128 only exists on newer versions of MSVC so you have to write it yourself in ASM for older versions. Urg.

So #1 is an okay solution, but what are the alternative ?

2. Pack {Pointer,Count} into 64 bits. This is of course what Windows does for SLIST, so doing this is actually very safe. Currently pointers on Windows are only 44 bits because of this. They will move to 48 and then 52. You can easily store a 52 bit pointer + a 16 bit count in 64 bits (the 52 bit pointer has the bottom four bits zero so you actually have 16 bits to work with). Then you can just keep using 64 bit CAS. This has no disadvantage that I know of other than the fact that twenty years from now you'll have to touch your code again.

3. You can implement arbitrary-sized CAS in terms of pointer CAS. The powerful standard paradigm for this type of thing is to use pointers to data instead of data by value, so you are just swapping pointers instead of swapping values. It's very simple, when you want to change a value, you malloc a copy of it and change the copy, and then swap in the pointer to the new version. You CAS on the pointer swap. The "malloc" can just be taking data from a recycled buffer which uses hazard pointers to keep threads from using the same temp item at the same time. This is a somewhat more complex way to do things conceptually, but it is very powerful and general, and for anyone doing really serious lockfree work, a hazard pointer system is a good thing to have. See for example "Practical Lock-Free and Wait-Free LL/SC/VL Implementations Using 64-Bit CAS".

You could also of course use a hybrid of 2 & 3. You could use a packed 64 bit {pointer,count} until your pointer becomes more than 52 bits, and then switch to a pointer to extra data.

05-29-10 - x64 so far

x64 linkage that's been useful so far :

__asm__ cmpxchg8bcmpxchg16b - comp.programming.threads Google Groups
_InterlockedCompareExchange Intrinsic Functions
x86-64 Tour of Intel Manuals
x64 Starting Out in 64-Bit Windows Systems with Visual C++
Writing 64-bit programs
Windows Data Alignment on IPF, x86, and x86-64
Use of __m128i as two 64 bits integers
Tricks for Porting Applications to 64-Bit Windows on AMD64
The history of calling conventions, part 5 amd64 - The Old New Thing - Site Home - MSDN Blogs
Snippets lifo.h
Predefined Macros (CC++)
Physical Address Extension - PAE Memory and Windows
nolowmem (Windows Driver Kit)
New Intrinsic Support in Visual Studio 2008 - Visual C++ Team Blog - Site Home - MSDN Blogs
Moving to Windows Vista x64
Moving to Windows Vista x64 - CodeProject
Mark Williams Blog jmp'ing around Win64 with ml64.exe and Assembly Language
Kernel Exports Added for Version 6.0
Is there a portable equivalent to DebugBreak()__debugbreak - Stack Overflow
How to Log Stack Frames with Windows x64 - Stack Overflow
BCDEdit Command-Line Options
Available Switch Options for Windows NT Boot.ini File
AMD64 Subpage
AMD64 (EM64T) architecture - CodeProject
20 issues of porting C++ code on the 64-bit platform

One unexpected annoyance has been that a lot of the Win32 function signatures have changed. For example LRESULT is now a pointer not a LONG. This is a particular problem because Win32 has always made heavy use of cramming the wrong type into various places, eg. for GetWindowLong and stuffing pointers in LPARAM's and all that kind of shit. So you wind up having tons of C-style casts when you write Windows code. I have made good use of these guys :



// same_size_bit_cast casts the bits in memory
//  eg. it's not a value cast
template < typename t_to, typename t_fm >
t_to & same_size_value_cast( t_fm & from )
{
    COMPILER_ASSERT( sizeof(t_to) == sizeof(t_fm) );
    // just value cast :
    return (t_to) from;
}

// same_size_bit_cast casts the bits in memory
//  eg. it's not a value cast
template < typename t_to, typename t_fm >
t_to & same_size_bit_cast_p( t_fm & from )
{
    COMPILER_ASSERT( sizeof(t_to) == sizeof(t_fm) );
    // cast through char * to make aliasing work ?
    char * ptr = (char *) &from;
    return *( (t_to *) ptr );
}

// same_size_bit_cast casts the bits in memory
//  eg. it's not a value cast
// cast with union is better for gcc / Xenon :
template < typename t_to, typename t_fm >
t_to & same_size_bit_cast_u( t_fm & from )
{
    COMPILER_ASSERT( sizeof(t_to) == sizeof(t_fm) );
    union _bit_cast_union
    {
        t_fm fm;
        t_to to;        
    };
    _bit_cast_union converter = { from };
    return converter.to;
}

// check_value_cast just does a static_cast and makes sure you didn't wreck the value
template < typename t_to, typename t_fm >
t_to check_value_cast( const t_fm & from )
{
    t_to to = static_cast<t_to>(from);
    ASSERT( static_cast<t_fm>(to) == from );
    return to;
}

inline int ptr_diff_32( ptrdiff_t diff )
{
    return check_value_cast<int>(diff);
}

BTW this all has made me realize that the recent x86-32 monotony on PC's has been a delightful stable period for development. I had almost forgotten that it used to be always like this. Now to do simple shit in my code, I have to detect if it's x86 or x64 , if it is x64, do I have an MSC version that has the intrinsics I need? if not I have to write a got damn MASM file. Oh and I often have the check for Vista vs. XP to tell if I have various kernel calls. For example :


#if _MSC_VER > 1400

// have intrinsic
_InterlockedExchange64()

#elif _X86_NOT_X64_

// I can use inline asm
__asm { cmpxchg8b ... }

#elif OS_IS_VISTA_NO_XP

// kernel library call available
InterlockedExchange64()

#else

// X64 , not Vista (or want to be XP compatible) , older compiler without intrinsic,
//  FUCK !

#error just use a new newer MSVC version for X64 because I don't want to fucking write MASM rouintes

#endif

Even ignoring the pain of the last FUCK branch which requires making a .ASM file, the fact that I had to do a bunch of version/target checks to get the right code for the other paths is a new and evil pain.

Oh, while I'm ranting, fucking MSDN is now showing all the VS 2010 documentation by default, and they don't fucking tell you what version things became available in.

This actually reminds me of the bad old days when I got started, when processors and instruction sets were changing rapidly. You actually had to make different executables for 386/486 and then Pentium, and then PPro/P3/etc (not to mention the AMD chips that had their own special shiznit). Once we got to the PPro it really settled down and we had a wonderful monotony of well developed x86 on out-of-order machines that continued up to the new Core/Nehalem chips (only broken by the anomalous blip of Itanium that we all ignored as it went down in flames like the Hindenburg). Obviously we've had consoles and Mac and other platforms to deal with, but that was for real products that want portability to deal with, I could write my own Wintel code for home and not think about any of that. Well Wintel is monoflavor no more.

The period of CISC and chips with fancy register renaming and so-on was pretty fucking awesome for software developers, because you see the same interface for all those chips, and then behind the scenes they do magic mumbo jumbo to turn your instructions into fucking gene sequences that multiply and create bacterium that actually execute the instructions, but it doesn't matter because the architecture interface still just looks the same to the software developer.

5/27/2010

05-27-10 - Weird Compiler Error

Blurg just fought one of the weirder problems I've ever seen.

Here's the simple test case I cooked up :


void fuck()
{
#ifdef RR_ASSERT
#pragma RR_PRAGMA_MESSAGE("yes")
#else
#pragma RR_PRAGMA_MESSAGE("no")
#endif

    RR_ASSERT(true);
}

And here is the compiler error :

1>.\rrSurfaceSTBI.cpp(43) : message: yes
1>.\rrSurfaceSTBI.cpp(48) : error C3861: 'RR_ASSERT': identifier not found

Eh !? Serious WTF !? I know RR_ASSERT is defined, and then it says it's not found !? WTF !?

Well a few lines above that is the key. There was this :


#undef  assert
#define assert  RR_ASSERT

which seems like it couldn't possibly cause this, right? It's just aliasing the standard C assert() to mine. Not possible related, right? But when I commented out that bit the problem went away. So of course my first thought is clean-rebuild all, did I have precompiled headers on by mistake? etc. I assume the compiler has gone mad.

Well, it turns out that somewhere way back in RR_ASSERT I was in a branch that caused me to have this definition for RR_ASSERT :


#define RR_ASSERT(exp)  assert(exp)

This creates a strange state for the preprocessor. RR_ASSERT is now a recursive macro. When you actually try to use it in code, the preprocessor apparently just bails and doesn't do any text substitution. But, the name of the preprocessor symbol is still defined, so my ifdef check still saw RR_ASSERT existing. Evil.

BTW the thing that kicked this off is that fucking VC x64 doesn't support inline assembly. ARGH YOU COCK ASS. Because of that we had long ago written something like


#ifdef _X86
#define RR_ASSERT_BREAK()  __asm int 3
#else
#define RR_ASSERT_BREAK()  assert(0)
#endif

which is what caused the difficulty.

05-27-10 - Loop Branch Inversion

A major optimization paradigm I'm really missing from C++ is something I will call "loop branch inversion". The problem is for code sharing and cleanliness you often wind up with cases where you have a lot of logic in some outer loops that find all the things you should work on, and then in the inner loop you have to do a conditional to pick what operation to do. eg :

LoopAndDoWork(query,worktype)
{
    Make bounding area
    Do Kd-tree descent .. 
    loop ( tree nodes )
    {
        bounding intersection, etc.
        found an object
        DoPerObjectWork(object);
    }
}

The problem is that DoPerObjectWork then is some conditional, maybe something like :

DoPerObjectWork(object)
{
    switch(workType)
    {
    ...
    }
}

or even worse - it's a function pointer that you call back.

Instead you would like the switch on workType to be on the outside. WorkType is a constant all the way through the code, so I can just propagate that branch up through the loops, but there's way to express it neatly in C.

The only real option is with templates. You make DoPerObjectWork a functor and you make LoopAndDoWork a template. The other option is to make an outer loop dispatcher to constants. That is, make workType a template parameter instead of an integer :


template < int workType >
void t_LoopAndDoWork(query)
{
    ...
}

and then provide a dispatcher which does the branch outside :

LoopAndDoWork(query,worktype)
{
    switch(workType)
    {
    case 0 : t_LoopAndDoWork<0>(query); break;
    case 1 : t_LoopAndDoWork<1>(query); break;
    ...
    }
}

this is an okay solution, but it means you have to reproduce the branch on workType in the outer loop and inner loop. This is not a speed penalty becaus the inner loop is a branch on constant which goes away, it's just ugly for code maintenance purposes because they have to be kept in sync and can be far apart in the code.

This is a general pattern - use templates to turn a variable parameter into a constant and then use an outer dispatcher to turn a variable into the right template call. But it's ugly.

BTW when doing this kind of thing you are often wind up with loops on constants. The compiler often can't figure out that a loop on a constant can be unrolled. It's better to rearrange the loop on constant into branches. For example I'm often doing all this on pixels where the pixel can have between 1 and 4 channels. Instead of this :


for(int c=0;c<channels;c++)
{
    DoStuff(c);
}

where channels is a constant (template parameter), it's better to do :

DoStuff(0);
if ( channels > 1 ) DoStuff(1);
if ( channels > 2 ) DoStuff(2);
if ( channels > 3 ) DoStuff(3);

because those ifs reliably go away.

old rants