2/19/2009

02-19-09 - Two Code Gems

First one is from Mischeif/Mayhem (RDE) :

I, like many, have long used this array count :


#define ARRAY_SIZE(data)    (sizeof(data)/sizeof(data[0]))

However, it's very error prone. You can get errors just by accidentally using it wrong, but you also get "silent refactor errors" which is one of the things I really really hate. When I go and change a variable in a way that changes its function, I want all the code that relies on it to scream at me. Basically I want to be able to touch parts of the code without scanning every single line that uses that variable. For example you can use it like this :

    DWORD test1[4];
    char * test2[4];
    char * test3;
        
    COMPILER_ASSERT( ARRAY_SIZE(test1) == 4); // okay
        
    COMPILER_ASSERT( ARRAY_SIZE(&test1) == 4); // wtf
    COMPILER_ASSERT( ARRAY_SIZE(*test2) == 4); // wtf
    COMPILER_ASSERT( ARRAY_SIZE(test3) == 4); // wtf

Maciej has a cool template trick to make the ARRAY_SIZE macro more constrained so that it only works when you want it to work :


template < typename T, size_t N > char (&ArrayCountObj(const T (&)[N]))[N];
#define RDE_ARRAY_COUNT(arr)    (sizeof(rde::ArrayCountObj(arr)))

This is a very common template metaprogramming trick : the use of *types* as *values*. In order to basically make a function that returns N, you create a type that contains the value N (in this case a char array of size N). Then to turn that into a number, you use sizeof(), which is a device that can turn types into ints at compile time. Nice! C++ is a magic box that lets you make C the way it should be. (sometimes).

BTW on a semi-related coding style issue, say you have some initialized array and a count, like :


// expose in header :
extern const int c_numStrings;
extern const char * c_myStrings[];

Don't do this :

const int c_numStrings = 4;
const char * c_myStrings[4] = { "a", "b" "c" };

Instead do this :

const char * c_myStrings[] = { "a", "b" "c" };
const int c_numStrings = ARRAY_SIZE(c_myStrings);

or :

const int c_numStrings = 4;
const char * c_myStrings[] = { "a", "b" "c" };
COMPILER_ASSERT( numStrings == ARRAY_SIZE(c_myStrings) );

To ensure that you actually had the right number of initializers. This ensures that you set up the array right, and also that you keep the array in sync with whatever count it is supposed to correspond to.


(addendum : you can skip reading this whole thing and just go to Wikipedia which has made me her bitch)

Next one is random permutation. The right way to generate a permutation with all possibilities equally likely is :


for(int i=0;i < num;i++) array[i] = i;

for(int i=0;i < num;i++)
{
    int j = i + rand_count(num - i);
    swap( array[i], array[j] );
}

It's pretty easy to see this is correct because at each i you are drawing a number at random from the remaining possibilities, just like you would if you were drawing a number from a hat. (the last iteration is always a NOP so you could iterate up to just num-1).

That's important but well known (thanks Sean), but lets look at rand_count. rand_count(size) should return a random number in [0,size-1].

I'm going to assume you have a decent base random number generator called rand() , but it should not be the CLIB rand (may I suggest KISS99). Many people write rand_count like :


int rand_count(int size)
{
    int r = rand();
    return r % size;
}

Which sort of works, but there are some problems with this. First of all, with a lot of random number generators, the low bits are the worst. You could shuffle the bits to try to fix that. Second of all, modulo is really really slow (it's a divide). But perhaps worst of all - this is only an even distribution in [0,size-1] when size is very small !!

When size is close to the range of rand (15 bits in clib), it becomes very biased towards low numbers. For example if size is 20,000 , rand() can go up to 32768, and only 12768 values will wrap around, so the first 12768 are twice as likely as the remaining 7232 !!

Of course you could also do it by multiplication. Say your rand() made a 15-bit rand like clib does, then you can do :


int rand_count(int size)
{
    int r = rand();
    return (r * size) >> 15;
}

And that works, but has a nasty problem in that the multiplication can overflow if size is too big. (note that in that case the previous rand_count would be equally bad and just fail to ever return a big value). In many applications it would suck to have to check that size is too big, like say I'm testing something and want to grab a random byte from a file, I want to handle up to 2 GB files and not have to worry about my rand overflowing.

So the obvious answer is to go 64-bit and that also leads to a sweet optimization. rand32() here returns a full 32-bit random unsigned int :


unsigned int rand_count(int size)
{
    unsigned int r = rand32();
    return (unsigned int) ( ( (uint64) r * size) >> 32 );
}

And MSVC actually does the right thing here and actually generates just a plain "mul" and "ret edx". That is, it grabs the high 32 bit part of a 32-bit multiply directly and returns it, there's no 64-bit multiplies and no shift at all. (even if it didn't optimize perfectly, I would still use this because it just always works and I rarely need rand to be super duper fast, it's more important that it's correct)

BTW the multiplication method is using the top bits of rand32() , so you need a rand32() that makes good top bits (or good *all* bits). In fact if size = 2 this is literally returning the top bit. (this is actually a good thing for most simple linear PRNG's because the top bits are very good while the bottom bits are very bad).

BTW I use rand_count defined like this a lot because it's what you want to draw from an array, but I've never found a good name for this function, so help me out. It's always a little confusing that it returns in [0,size-1], not [0,size].

ADDENDUM : Sketch of randMod using retries :


// randMod returns in [0, range-1]
uint32 randMod( uint32 range )
{
    ASSERT( range > 0 ); // infinite loop

	// make a mask which the pow2 above range, minus one
    uint32 mask = range | (range>>1) | (range>>2) | (range>>3);
    mask = mask | (mask>>4);
    mask = mask | (mask>>8);
    mask = mask | (mask>>16);
    // bsr would be faster

    for(;;)
    {
        uint32 rand32 = myrand32();
        uint32 randInMask = rand32 & mask;
        ASSERT( randInMask < range*2 );
        // > 50% chance of passing the test so iterations should be rare
        if ( randInMask < range )
            return randInMask;
    }
}

5 comments:

won3d said...

rand_mod?

ryg said...

There's some additional gotchas if you really need (or want) your distribution to be as close to actually uniform as possible (online gambling would be an example). The first thing is that you're likely to use a PRNG that outputs a fixed number of random bits (typically 16 or 32). If your "count" is not a power of 2, it's important to note that any function from {0,...,2^n-1} to {0, 1, ..., count-1} will hit some numbers more often that others (this is just the pigeonhole principle). So if you really need the distribution to be perfectly flat, your best bet is to first generate n-bit random numbers where n=ceil(log2(count)) "the usual way", and just reject the ones that are >=count.

The other point is that the number of permutations is huge, even for relatively small n. For example, if you're shuffling a deck of 52 cards, your PRNG needs to have a state of at least log2(52!)=225.581... bits to even be theoretically able to produce every possible permutation (you actually want some more bits, again to avoid noticeable bias). A generator with 32-bit state will only ever generate a comperatively tiny subset of all possible permutations.

ryg said...

And now I notice this is all on Wikipedia, too: http://en.wikipedia.org/wiki/Fisher-Yates_shuffle - damn, could've saved some typing :)

cbloom said...

Ah, wow, that's a good article.

In general I have found that trial-reject-retry is a very easy method to generate correctly distributed randoms with arbitrary probabilities. Hmm I wrote about that before.

cacapee said...

A thing that I noticed while reading was that you are swapping from the 0th element onwards. Whereas the actual shuffle does it from the last element downwards. A small point but I thought I'd point it out regardless.

old rants