05-25-09 - Undefined C

There's a lot of random little shit in C that's technically "undefined" (meaning the compiler/hardware are allowed to do anything they want) or "implementation-defined" (meaning each compiler gets to pick its own answer). Basic stuff like signed addition that overflows (undefined), or right shifting negative values and casting to a signed type that can't hold the value (both implementation-defined).

It's fucking retarded. It's a dangerous and pointless cop out. C is supposed to be the low-level systems language, it should have ways of clearly exposing the actual function of the system.

First of all, there should just be a "Normal C" machine spec that clearly specifies the behavior that 99% of current hardware has (like right shifting signed values shifts in sign bits). Then we could just say "this platform is a Normal C compliant platform".

But even aside from that, just saying "it's undefined" is a horrible way to support varying platforms. What you should do is have requirements and contracts as meta-code in the programs. A program might start out as only min-spec C. In that case, any use of undefined behavior should be a compile error! You tried to do something the platform does not offer, you fail compile!

Then, if the program needs certain things, it can add them to its requirements list, like "I need signed right shift to behave like this". Now that program will only compile on systems that provide that. The needs of the code are clearly listed and the contract is enforced. It should never be possible to access undefined behavior. The bad behavior should either be forbidden, or it should be enforced to do something specific.
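One rough way to approximate such a requirements list is with compile-time assertions. This is only a sketch: it assumes a C11 compiler (`static_assert` postdates this post), and the macro name `REQUIRE_PLATFORM` is made up for illustration. It can also only pin down implementation-defined behavior; truly undefined behavior like signed overflow can't be checked this way.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT, INT_MIN, INT_MAX */
#include <stdint.h>   /* int64_t */

/* Hypothetical helper: one line per requirement this program has.
   If the platform doesn't meet it, the build fails. */
#define REQUIRE_PLATFORM(cond) \
    static_assert(cond, "platform requirement failed: " #cond)

/* "I need signed right shift to shift in sign bits." */
REQUIRE_PLATFORM( (-1 >> 1) == -1 );
REQUIRE_PLATFORM( ((int64_t)-1 >> 16) == -1 );

/* "I need two's complement ints." */
REQUIRE_PLATFORM( INT_MIN == -INT_MAX - 1 );
REQUIRE_PLATFORM( (-1 & 3) == 3 );

/* "I need 8-bit bytes." */
REQUIRE_PLATFORM( CHAR_BIT == 8 );
```

The list sits at the top of the program (or in a shared header), so the needs of the code are at least written down in one place and enforced at build time.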

All good robust software should be written this way. It's unforgivable that our basic language tool isn't.

Instead we have a situation where someone can write code like :

    int64 a = ...;
    uint32 b = (uint32) ( a >> 16 );

WTF does that do? Does it work on this new platform I'm trying to compile on ?
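To spell out the pieces: the cast to uint32 is actually well defined (the value wraps modulo 2^32), but `a >> 16` on a negative `a` is up to the compiler. A minimal sketch of a version that only uses fully defined operations, assuming `int64`/`uint32` are typedefs for `int64_t`/`uint32_t`; the helper names `asr64` and `high_bits` are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: arithmetic (sign-extending) shift right, i.e.
   floor(x / 2^n), using only operations the standard fully defines.
   Requires 0 <= n <= 63. */
static int64_t asr64(int64_t x, unsigned n)
{
    if (x >= 0)
        return x >> n;              /* non-negative value: well defined */

    /* For negative x, (-1 - x) lies in [0, INT64_MAX], so the
       subtraction can't overflow and the shift is well defined. */
    return -1 - ((-1 - x) >> n);
}

/* Same computation as the snippet above. Converting to uint32_t is
   defined for any value: it wraps modulo 2^32. */
static uint32_t high_bits(int64_t a)
{
    return (uint32_t) asr64(a, 16);
}
```

On a platform where `>>` already shifts in sign bits, a decent optimizer will likely turn `asr64` back into a single shift, but that part is an expectation, not a guarantee.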

Of course you/me as a user programmer can help the situation by putting in lots of in-line unit tests, like :

    int64 a = ...;
    uint32 b = (uint32) ( a >> 16 );

    UNIT_TEST( (uint32) ( ((int64)-1) >> 16 ) == 0xFFFFFFFF );
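`UNIT_TEST` isn't a standard macro, so here is one possible definition, purely as a sketch. It assumes the condition is an integer constant expression and uses the old negative-array-size trick so a failed check breaks the build rather than waiting for run time. The typedefs for `int64`/`uint32` and the `UNIT_TEST_CAT` helpers are assumptions for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t  int64;    /* assumed typedefs */
typedef uint32_t uint32;

/* Paste __LINE__ into the typedef name so each check gets its own name. */
#define UNIT_TEST_CAT2(a, b) a##b
#define UNIT_TEST_CAT(a, b)  UNIT_TEST_CAT2(a, b)

/* Hypothetical compile-time UNIT_TEST: an array type of size -1 is
   illegal, so a false condition is a compile error. */
#define UNIT_TEST(cond) \
    typedef char UNIT_TEST_CAT(unit_test_failed_line_, __LINE__)[(cond) ? 1 : -1]

/* Fails to compile on a platform where >> doesn't shift in sign bits. */
UNIT_TEST( (uint32) ( ((int64)-1) >> 16 ) == 0xFFFFFFFF );
```

This only documents and enforces what the code assumes; it doesn't make the code work on a platform that fails the check.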


Autodidactic Asphyxiation said...

And yet 3["abcd"]; is defined.

ryg said...

One of the most annoying underspecified things is the signedness of plain "char": it depends on the platform and the compiler. (In fact the C spec mandates that "char" is neither "signed char" nor "unsigned char" but is always to be treated as a distinct type from the other two, but that's a different topic.)

Most compilers actually default to signed, which means that "character traits" table accesses and the like are very easy to get wrong (yielding negative array indices, which are... undefined!).
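A minimal sketch of that trap, assuming a hypothetical per-byte traits table (the names `char_traits`, `TRAIT_ALPHA` and `is_alpha` are made up for illustration):

```c
#include <assert.h>
#include <limits.h>

enum { TRAIT_ALPHA = 1 };

/* Hypothetical traits table, one entry per possible byte value. */
static unsigned char char_traits[UCHAR_MAX + 1];

/* The easy-to-write buggy version would be:
 *
 *     return char_traits[c] & TRAIT_ALPHA;
 *
 * Where plain char is signed, any byte >= 0x80 (e.g. part of a UTF-8
 * sequence) becomes a negative index and reads outside the table. */

/* Going through unsigned char keeps the index in [0, UCHAR_MAX]
   no matter which signedness plain char has. */
static int is_alpha(char c)
{
    return char_traits[(unsigned char)c] & TRAIT_ALPHA;
}
```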

The undefinedness of e.g. signed >> is about supporting one's complement machines (which generally speaking shouldn't have been a priority from the 98 C++ and 99 C standards onwards, primarily since the latter also specifies IEEE-compliant FP, albeit in an optional annex, which seems to me to be a much stronger requirement), but I've never seen any rationale whatsoever for the signed/unsigned char mess. Just guessing here, I think that non-ASCII characters were a non-issue when C was invented, and by the time they standardized it some compilers went one way and some the other and nobody really cared. Well, with UTF-8 being the most popular 8-bit character encoding, it does make a difference, and unsigned is just the right choice nowadays.
