6/17/2011

06-17-11 - C casting is the devil

C-style casting is super dangerous, as you know, but how can you do better?

There are various situations where I need to cast just to make the compiler happy that aren't actually operational casts. That is, if I was writing ASM there would be no cast there. For example something like :

U16 * p;
p[0] = 1;
U8 * p2 = (U8 *)p;
p2[1] = 7;
is a cast that changes the behavior of the pointer (eg. "operational"). But, something like :
U16 * p;
*p = 1;
U8 * p2 = (U8 *)p;
p2 += step;
p = (U16 *) p2;
*p = 2;
is not really a functional cast, but I have to do it because I want to increment the pointer by some step in bytes, and there's no way to express that in C without a cast.

Any time I see a C-style cast in code I think "that's a bug waiting to happen" and I want to avoid it. So let's look at some ways to do that.

1. Well, since we did this as an example already, we can hide those casts with something like ByteStepPointer :


template<typename T>
T * ByteStepPointer(T * ptr, ptrdiff_t step)
{
    return (T *)( ((intptr_t)ptr) + step );
}

our goal here is to hide the nasty dangerous casts from the code we write every day, and bundle it into little utility functions where it's clear what the purpose of the cast is. So now we can write out example as :
U16 * p;
*p = 1;
p = ByteStepPointer(p,step);
*p = 2;
which is much prettier and also much safer.

2. The fact that "void *" in C++ doesn't cast to arbitrary pointers the way it does in C is really fucking annoying. It means there is no "generic memory location" type. I've been experimenting with making the casts in and out of void explicit :


template<typename T>
T * CastVoid(void * ptr)
{
    return (T *)( ptr );
}

template<typename T>
void * VoidCast(T * ptr)
{
    return (void *)( ptr );
}

but it sucks that it's so verbose. In C++0x you can do this neater because you can template specialize based on the left-hand-side. So in current C++ you have to write
Actor * a = CastVoid<Actor>( memory );
but in 0x you will be able to write just
Actor * a = CastVoid( memory );

There are a few cases where you need this, one is to call basic utils like malloc or memset - it's not useful to make the cast clear in this case because the fact that I'm calling memset is clear enough that I'm treating this pointer as untyped memory; another is if you have some generic "void *" payload in a node or message.

Again you don't want just a play C-style cast here, for example something like :

Actor * a = (Actor *) node->data;
is a bug waiting to happen if you change "data" to an int (among other things).

3. A common annoying case is having to cast signed/unsigned. It should be obvious that when I write :

U32 set = blah;
U32 mask = set & (-set);
that I want the "-" operator to act as (~set + 1) on the bits and I don't care that it's unsigned, but C won't let you do that. (see previous rants about how what I really want in this scenario is a "#pragma requires(twos_complement)" ; warning me about the sign is fucking useless for portability because it just makes me cast, if you want to make a real portable language you have to be able to express capabilities of the platform and constraints of the algorithm).

So, usually what you want is a cast that gives you the signed type of the same register size, and that doesn't exist. So I made my own :


static inline S8  Signed(U8 x)  { return (S8) x; }
static inline S16 Signed(U16 x) { return (S16) x; }
static inline S32 Signed(U32 x) { return (S32) x; }
static inline S64 Signed(U64 x) { return (S64) x; }

static inline U8  Unsigned(S8 x)  { return (U8) x; }
static inline U16 Unsigned(S16 x) { return (U16) x; }
static inline U32 Unsigned(S32 x) { return (U32) x; }
static inline U64 Unsigned(S64 x) { return (U64) x; }

So for example, this code :
mask = set & (-(S32)set);
is a bug waiting to happen if you switch to 64-bit sets. But this :
mask = set & (-Signed(set));
is robust. (well, robust if you include a compiler assert that you're 2's complement)

4. Probably the most common case is where you "know" a value is small and need to put it in a smaller type. eg.

int x = 7;
U8 small = (U8) x;
But all integer-size-change casts are super unsafe, because you can later change the code such that x doesn't fit in "small" anymore.

(often you were just wrong or lazy about "knowing" that the value fit in the smaller type. One of the most common cases for this right now is putting file sizes and memory sizes into 32-bit ints. Lots of people get annoying compiler warnings about that and think "oh, I know this is less than 2 GB so I'll just C-style cast". Oh no, that is a huge maintenance nightmare. In two years you try to run on a larger file and suddenly you have bugs all over and you can't find them because you used C-style casts. Start checking your casts!).

You can do this with a template thusly :


// check_value_cast just does a static_cast and makes sure you didn't wreck the value
template <typename t_to, typename t_fm>
t_to check_cast( const t_fm & from )
{
    t_to to = static_cast<t_to>(from);
    ASSERT( static_cast<t_fm>(to) == from );
    return to;
}

but it is so common that I find the template a bit excessively verbose (again C++0x with LHS specialization would help, you could then write just :
small = check( x );

small = clamp( x );
which is much nicer).

To do clamp casts with a template is difficult. You can use std::numeric_limits to get the ranges of the dest type :

template <typename t_to, typename t_fm>
t_to clamp_cast( const t_fm & from )
{
    t_to lo = std::numeric_limits<t_to>::min();
    t_to hi = std::numeric_limits<t_to>::max();
    if ( from < lo ) return lo; // !
    if ( from > hi ) return hi; // !
    t_to to = static_cast<t_to>(from);
    RR_ASSERT( static_cast<t_fm>(to) == from ); 
    return to;
}
however, the compares inherent (at !) in clamping are problematic, for example if you're trying to clamp_cast from signed to unsigned you may get warnings there (you can also get the unsigned compare against zero warning when lo is 0). (? is there a nice solution to this ? you want to cast to the larger ranger of the two types for the purpose of the compare, so you could make some template helpers that do the compare in the wider of the two types, but that seems a right mess).

Rather than try to fix all that I just use non-template versions for our basic types :


static inline U8 S32ToU8Clamp(S32 i)    { return (U8) CLAMP(i,0,0xFF); }
static inline U8 S32ToU8Check(S32 i)    { ASSERT( i == (S32)S32ToU8Clamp(i) ); return (U8)i; }

static inline U16 S32ToU16Clamp(S32 i)  { return (U16) CLAMP(i,0,0xFFFF); }
static inline U16 S32ToU16Check(S32 i)  { ASSERT( i == (S32)S32ToU16Clamp(i) ); return (U16)i; }

static inline U32 S64ToU32Clamp(S64 i)  { return (U32) CLAMP(i,0,0xFFFFFFFFUL); }
static inline U32 S64ToU32Check(S64 i)  { ASSERT( i == (S64)S64ToU32Clamp(i) ); return (U32)i; }

static inline U8 U32ToU8Clamp(U32 i)    { return (U8) CLAMP(i,0,0xFF); }
static inline U8 U32ToU8Check(U32 i)    { ASSERT( i == (U32)U32ToU8Clamp(i) ); return (U8)i; }

static inline U16 U32ToU16Clamp(U32 i)  { return (U16) CLAMP(i,0,0xFFFF); }
static inline U16 U32ToU16Check(U32 i)  { ASSERT( i == (U32)U32ToU16Clamp(i) ); return (U16)i; }

static inline U32 U64ToU32Clamp(U64 i)  { return (U32) CLAMP(i,0,0xFFFFFFFFUL); }
static inline U32 U64ToU32Check(U64 i)  { ASSERT( i == (U64)U64ToU32Clamp(i) ); return (U32)i; }

static inline S32 U64ToS32Check(U64 i)  { S32 ret = (S32)i; ASSERT( (U64)ret == i ); return ret; }
static inline S32 S64ToS32Check(S64 i)  { S32 ret = (S32)i; ASSERT( (S64)ret == i ); return ret; }

which is sort of marginally okay. Maybe it would be nicer if I left off the type it was casting from in the name.

12 comments:

jfb said...

Hmm.

Maybe something along the lines of casting to unsigned T first, then instead of A < min, A - min > A. That'll shut any warnings right up.

Be sure to cast A and min to unsigned before doing that, however. Othrewise, whole program optimization on some compilers (Clang, for one) may take that as an opportunity to optimize that to 'false' if it can prove min >= 0 and eliminate it (signed integer overflow behavior is "undefined", so...).

ejulien said...

Sorry, I did not read the whole post but how about using an union?

union MultiPtr
{
char *pc;
short *ps;
};

mptr.ps[0] = 16;
mptr.pc++;
mptr.pc[0] = 8;

Autodidactic Asphyxiation said...

"Maybe it would be nicer if I left off the type it was casting from in the name."

The real problem is that you're not getting away from implicit casting on the argument. If you use the incorrect version, you can get screwed.

Also, ryg and others suggest that "From" names seem to be better than "To" names:

U8 byte = U8FromS32(some_int)

cbloom said...

"The real problem is that you're not getting away from implicit casting on the argument. If you use the incorrect version, you can get screwed."

How's that? If it implicit casts without a warning then you should be fine (?)


On further thought I'm pretty sure that the overloaded version that leaves off the from is The Win , eg .:

U8 byte = clampU8(some_int);
S16 word = checkS16(some_long);

Autodidactic Asphyxiation said...

I wasn't being clear because I was talking about two different things. But yeah, it is definitely better to templatize the argument, and probably not the return value. Without that you could have the problem:

U8 byte = U16toU8(some_int)

Where the int to U16 conversion screws you before your clamping happens.

The other thing I was talking about is just what you call it. Like clampU8() is a weird name since it sounds like you're clamping a U8, not TO a U8. You're probably thinking about it in terms of where the template type would go. So something like:

U8 byte = U8Clamp(some_int);

Looks better to me.

cbloom said...

"U8 byte = U16toU8(some_int)
Where the int to U16 conversion screws you before your clamping happens."

Yeah, but no C compiler should silently do that implicit cast ? You should get a compile warning there.

Autodidactic Asphyxiation said...

I tried it with gcc 4.2 -Wall (old, I know) and clang through the web demo compiler (which is using default warnings level) and got no warning, even with optimizations. They both complain if I try to pass in a too-large constant. Maybe you should try?

char foo(short s) {
return (char)s;
}

int main() {
int i = 100000;
return foo(i);
}

cbloom said...

main.cpp(236) : warning C4244: 'argument' : conversion from 'int' to 'short', possible loss of data

Autodidactic Asphyxiation said...

What warning level is that?

Maybe my compilers will start warning narrowing conversions one day. But I guess if you're going cross-platform, you probably shouldn't depend on it.

extrememoderate said...

SafeInt solves out-of-range errors on casting.

cbloom said...

I'm not a tiny code fetishist, but holy bejeezus SafeInt might be the most over-engineered over-sized thing I've ever seen.

castano said...

and it's "used extensively throughout Microsoft" LOL!

old rants