C-style casting is super dangerous, as you know, but how can you do better?
There are various situations where I need to cast just to make the compiler happy that aren't actually operational casts. That is, if I was
writing ASM there would be no cast there. For example something like :
U16 * p;
p[0] = 1;
U8 * p2 = (U8 *)p;
p2[1] = 7;
is a cast that changes the behavior of the pointer (eg. "operational"). But, something like :
U16 * p;
*p = 1;
U8 * p2 = (U8 *)p;
p2 += step;
p = (U16 *) p2;
*p = 2;
is not really a functional cast, but I have to do it because I want to increment the pointer by some step in bytes, and there's no way to express that in C
without a cast.
Any time I see a C-style cast in code I think "that's a bug waiting to happen" and I want to avoid it. So let's look at some ways to
do that.
1. Well, since we did this as an example already, we can hide those casts with something like ByteStepPointer :
template<
typename T>
T * ByteStepPointer(T * ptr, ptrdiff_t step)
{
return (T *)( ((intptr_t)ptr) + step );
}
our goal here is to hide the nasty dangerous casts from the code we write every day, and bundle it into little
utility functions where it's clear what the purpose of the cast is. So now we can write out example as :
U16 * p;
*p = 1;
p = ByteStepPointer(p,step);
*p = 2;
which is much prettier and also much safer.
2. The fact that "void *" in C++ doesn't cast to arbitrary pointers the way it does in C is really fucking annoying. It means there is no
"generic memory location" type. I've been experimenting with making the casts in and out of void explicit :
template<
typename T>
T * CastVoid(void * ptr)
{
return (T *)( ptr );
}
template<
typename T>
void * VoidCast(T * ptr)
{
return (void *)( ptr );
}
but it sucks that it's so verbose. In C++0x you can do this neater because you can template specialize based on the left-hand-side. So in
current C++ you have to write
Actor * a = CastVoid<
Actor>( memory );
but in 0x you will be able to write just
Actor * a = CastVoid( memory );
There are a few cases where you need this, one is to call basic utils like malloc or memset - it's not useful to make the cast clear in this
case because the fact that I'm calling memset is clear enough that I'm treating this pointer as untyped memory; another is if you have some
generic "void *" payload in a node or message.
Again you don't want just a play C-style cast here, for example something like :
Actor * a = (Actor *) node->data;
is a bug waiting to happen if you change "data" to an int (among other things).
3. A common annoying case is having to cast signed/unsigned. It should be obvious that when I write :
U32 set = blah;
U32 mask = set & (-set);
that I want the "-" operator to act as (~set + 1) on the bits and I don't care that it's unsigned, but C won't let you do that.
(see previous rants about how what I
really want in this scenario is a "#pragma requires(twos_complement)" ; warning me about the sign is fucking useless for portability because it
just makes me cast, if you want to make a real portable language you have to be able to express capabilities of the platform and constraints of
the algorithm).
So, usually what you want is a cast that gives you the signed type of the same register size, and that doesn't exist. So I made my own :
static inline S8 Signed(U8 x) { return (S8) x; }
static inline S16 Signed(U16 x) { return (S16) x; }
static inline S32 Signed(U32 x) { return (S32) x; }
static inline S64 Signed(U64 x) { return (S64) x; }
static inline U8 Unsigned(S8 x) { return (U8) x; }
static inline U16 Unsigned(S16 x) { return (U16) x; }
static inline U32 Unsigned(S32 x) { return (U32) x; }
static inline U64 Unsigned(S64 x) { return (U64) x; }
So for example, this code :
mask = set & (-(S32)set);
is a bug waiting to happen if you switch to 64-bit sets. But this :
mask = set & (-Signed(set));
is robust. (well, robust if you include a compiler assert that you're 2's complement)
4. Probably the most common case is where you "know" a value is small and need to put it in a smaller type. eg.
int x = 7;
U8 small = (U8) x;
But all integer-size-change casts are super unsafe, because you can later change the code such that x doesn't fit in "small" anymore.
(often you were just wrong or lazy about "knowing" that the value fit in the smaller type. One of the most common cases for this right
now is putting file sizes and memory sizes into 32-bit ints. Lots of people get annoying compiler warnings about that and think
"oh, I know this is less than 2 GB so I'll just C-style cast". Oh no, that is a huge maintenance nightmare. In two years you try to run
on a larger file and suddenly you have bugs all over and you can't find them because you used C-style casts. Start checking your casts!).
You can do this with a template thusly :
// check_value_cast just does a static_cast and makes sure you didn't wreck the value
template <
typename t_to, typename t_fm>
t_to check_cast( const t_fm & from )
{
t_to to = static_cast<
t_to>(from);
ASSERT( static_cast<
t_fm>(to) == from );
return to;
}
but it is so common that I find the template a bit excessively verbose (again C++0x with LHS specialization would help, you could
then write just :
small = check( x );
small = clamp( x );
which is much nicer).
To do clamp casts with a template is difficult. You can use std::numeric_limits to get the ranges of the dest type :
template <
typename t_to, typename t_fm>
t_to clamp_cast( const t_fm & from )
{
t_to lo = std::numeric_limits<
t_to>::min();
t_to hi = std::numeric_limits<
t_to>::max();
if ( from < lo ) return lo; // !
if ( from > hi ) return hi; // !
t_to to = static_cast<
t_to>(from);
RR_ASSERT( static_cast<
t_fm>(to) == from );
return to;
}
however, the compares inherent (at !) in clamping are problematic, for example if you're trying to clamp_cast from signed to unsigned you may
get warnings there (you can also get the unsigned compare against zero warning when lo is 0). (? is there a nice solution to this ? you want to
cast to the larger ranger of the two types for the purpose of the compare, so you could make some template helpers that do the compare in the
wider of the two types, but that seems a right mess).
Rather than try to fix all that I just use non-template versions for our basic types :
static inline U8 S32ToU8Clamp(S32 i) { return (U8) CLAMP(i,0,0xFF); }
static inline U8 S32ToU8Check(S32 i) { ASSERT( i == (S32)S32ToU8Clamp(i) ); return (U8)i; }
static inline U16 S32ToU16Clamp(S32 i) { return (U16) CLAMP(i,0,0xFFFF); }
static inline U16 S32ToU16Check(S32 i) { ASSERT( i == (S32)S32ToU16Clamp(i) ); return (U16)i; }
static inline U32 S64ToU32Clamp(S64 i) { return (U32) CLAMP(i,0,0xFFFFFFFFUL); }
static inline U32 S64ToU32Check(S64 i) { ASSERT( i == (S64)S64ToU32Clamp(i) ); return (U32)i; }
static inline U8 U32ToU8Clamp(U32 i) { return (U8) CLAMP(i,0,0xFF); }
static inline U8 U32ToU8Check(U32 i) { ASSERT( i == (U32)U32ToU8Clamp(i) ); return (U8)i; }
static inline U16 U32ToU16Clamp(U32 i) { return (U16) CLAMP(i,0,0xFFFF); }
static inline U16 U32ToU16Check(U32 i) { ASSERT( i == (U32)U32ToU16Clamp(i) ); return (U16)i; }
static inline U32 U64ToU32Clamp(U64 i) { return (U32) CLAMP(i,0,0xFFFFFFFFUL); }
static inline U32 U64ToU32Check(U64 i) { ASSERT( i == (U64)U64ToU32Clamp(i) ); return (U32)i; }
static inline S32 U64ToS32Check(U64 i) { S32 ret = (S32)i; ASSERT( (U64)ret == i ); return ret; }
static inline S32 S64ToS32Check(S64 i) { S32 ret = (S32)i; ASSERT( (S64)ret == i ); return ret; }
which is sort of marginally okay. Maybe it would be nicer if I left off the type it was casting from in the name.