# cbloom rants

## 6/21/2009

### 06-21-09 - Fast Exp & Log

So in an earlier post I wrote about approximation of log2 and Ryg commented with links to Robin Green's great GDC 2003 talk : part1 (pdf) and part2 (pdf) ( main page here ).

It's mostly solid, but in part 2 around page 40 he talks about "fastexp" and "bitlog" and my spidey senses got tingling. Either I don't understand, or he was just smoking crack through that section.

Let's look at "bitlog" first. Robin writes it very strangely. He writes :

```
A Mathematical Oddity: Bitlog
A real mathematical oddity
The integer log2 of a 16 bit integer
Given an N-bit value, locate the leftmost nonzero bit.
b = the bitwise position of this bit, where 0 = LSB.
n = the NEXT three bits (ignoring the highest 1)

bitlog(x) = 8x(b-1) + n

Bitlog is exactly 8 times larger than log2(x)-1

Bitlog Example
For example take the number 88
88 = 1011000
b = 6th bit
n = 011 = 3
bitlog(88) = 8*(6-1)+3
= 43
(43/8)+1 = 6.375
Log2(88) = 6.4594
This relationship holds down to bitlog(8)

```
Okay, I just don't follow. He says it's "exact" but then shows an example where it's not exact. He also subtracts off 1 and then just adds it back on again. Why would you do this :
```
bitlog(x) = 8x(b-1) + n

Bitlog is exactly 8 times larger than log2(x)-1

```
When you could just say :
```
bitlog(x) = 8xb + n

Bitlog is exactly 8 times larger than log2(x)

```
??? Weird.

Furthermore this seems neither "exact" nor an "oddity". Obviously the position of the MSB is the integer part of the log2 of a number. As for the fractional part of the log2, this is not a particularly good way to get it. Basically what's happening here is he takes the next 3 bits and uses them for linear interpolation to the next integer.

Written out verbosely :

```
x = int to get log2 of
b = the bitwise position of top bit, where 0 = LSB.

x >= (1 << b) && x < (2 << b)

fractional part :
f = (x - (1 << b)) / (1 << b)

f >= 0 && f < 1

x = 2^b * (1 + f)

correct log2(x) = b + log2(1+f)

approximate with b + f

note that "f" and "log2(1+f)" both go from 0 to 1, so it's exact at the endpoints
but wrong in the middle

```
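To make that concrete, here is a minimal C sketch of the `b + f` approximation with no bit tricks at all. The name `log2_linear` is made up for illustration, and the MSB search is a plain loop rather than anything from Robin's slides.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not from the talk: approximate log2(x) as b + f,
   where b is the MSB position and f = (x - 2^b) / 2^b. */
static double log2_linear(uint32_t x)
{
    assert(x != 0);                  /* log2(0) is undefined */

    uint32_t b = 0;
    while ((x >> b) > 1)             /* find the position of the top set bit */
        b++;

    double pow2b = (double)((uint32_t)1 << b);
    double f = ((double)x - pow2b) / pow2b;   /* 0 <= f < 1 */

    return (double)b + f;
}
```

For x = 88 this gives b = 6 and f = 24/64, so 6.375, the same number as Robin's example, against the true log2(88) = 6.4594.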
So far as I can tell, Robin's method is actually like this :
```
uint32 bitlog_x8(uint32 val)
{
    if ( val <= 8 )
    {
        // small values have fewer than 3 bits below the MSB; entry 0 is a sentinel
        static const uint32 c_table[9] = { (uint32)-1 , 0, 8, 13, 16, 19, 21, 22, 24 };
        return c_table[val];
    }
    else
    {
        unsigned long index;

        // index = position of the highest set bit (MSVC intrinsic)
        _BitScanReverse(&index,(unsigned long)val);

        ASSERT( index >= 3 );

        uint32 bottom = (val >> (index - 3)) & 0x7; // the 3 bits below the MSB
        uint32 blog = (index << 3) | bottom;

        return blog;
    }
}

```
where I've removed the weird offsets of 1 and this just returns log2 times 8. You need the check for val <= 8 because shifting by negative amounts is fucked.
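`_BitScanReverse` is an MSVC intrinsic, so here is a rough portable sketch of the same function for anyone who wants to try it elsewhere. The helper `top_bit_index` and the name `bitlog_x8_portable` are mine, not Robin's, and the MSB search is a slow loop just to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for _BitScanReverse: index of the highest set bit.
   (Hypothetical helper; requires val != 0.) */
static uint32_t top_bit_index(uint32_t val)
{
    assert(val != 0);
    uint32_t index = 0;
    while ((val >>= 1) != 0)
        index++;
    return index;
}

/* Same idea as bitlog_x8 above: returns roughly 8 * log2(val). */
static uint32_t bitlog_x8_portable(uint32_t val)
{
    /* Small values have fewer than 3 bits below the MSB, so use a table.
       Entry 0 is a sentinel, since log2(0) is undefined. */
    static const uint32_t c_table[8] = { (uint32_t)-1, 0, 8, 13, 16, 19, 21, 22 };
    if (val < 8)
        return c_table[val];

    uint32_t index  = top_bit_index(val);            /* >= 3 here */
    uint32_t bottom = (val >> (index - 3)) & 0x7;    /* the next 3 bits */
    return (index << 3) | bottom;
}
```

This sketch only goes to the table for val < 8, since at val == 8 the shift is by zero and the general path already gives 24. On GCC or Clang the loop could presumably be replaced with `31 - __builtin_clz(val)`.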

But you might well ask - why only use 3 bits ? And in fact you're right, I see no reason to use only 3 bits. In fact we can do a fixed point up to 27 bits : (we need to save 5 bits at the top to store the max possible integer part of the log2)

```
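float bitlogf(uint32 val)
{
    // requires val != 0 and val < 2^28 ; otherwise (27 - index) goes negative
    unsigned long index;

    _BitScanReverse(&index,(unsigned long)val);

    // MSB moves to bit 27 ; (index-1) because the MSB itself adds 1 at bit 27
    uint32 vv = (val << (27 - index)) + ((index-1) << 27);

    return vv * (1.f/134217728); // 134217728 = 2^27
}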

```
What we've done here is find the position of the MSB, shift val up so the MSB is at bit 27, then add the index of the MSB (we subtract one because the MSB itself already counts as one in the 27th bit position). This makes a fixed point value with 27 bits of fractional part; the bits below the MSB act as the fractional bits. We scale to return a float, but you could of course do this with any number of fixed point bits and return a fixed point int.
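Here is a sketch of the fixed point variant that returns the raw integer, under a couple of assumptions of my own: it masks off the MSB instead of subtracting one from the index (which should be equivalent), and it shifts right when the MSB sits above bit 27 so that the whole 32-bit range is covered. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: log2 of a nonzero integer as 5.27 fixed point (integer part in
   the top 5 bits, linearly interpolated fraction in the low 27 bits). */
static uint32_t bitlog_fixed27(uint32_t val)
{
    assert(val != 0);

    uint32_t index = 0;                          /* position of the MSB */
    for (uint32_t v = val; (v >>= 1) != 0; )
        index++;

    /* Move the MSB to bit 27. Values with the MSB above bit 27 shift
       right instead (dropping low bits); a negative shift is undefined. */
    uint32_t aligned = (index <= 27) ? (val << (27 - index))
                                     : (val >> (index - 27));

    /* Drop the MSB itself and put the index in the top 5 bits. */
    return (aligned & 0x07FFFFFFu) | (index << 27);
}

/* Scale the fixed point result back to a float. */
static float bitlog_fixed27_to_float(uint32_t fixed)
{
    return (float)fixed * (1.0f / 134217728.0f);   /* divide by 2^27 */
}
```

Keeping the result as an integer is handy if the caller wants fewer fractional bits: shifting the return value right by 24 leaves the 3-bit bitlog from before.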

But of course this is exactly the same kind of thing done in an int-to-float so we could use that too :

```
float bitlogf2(float fval)
{
    FloatAnd32 fi;

    fi.f = fval;

    float vv = (float) (fi.i - (127 << 23));

    return vv * (1.f/8388608); // 8388608 = 2^23
}

```
which is a lot like what I wrote about before. The int-to-float does the exact same thing we did manually above, finding the MSB and making the log2 and fractional part.
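`FloatAnd32` isn't defined in this post; I'm assuming it is a union of a float and a 32-bit int. A sketch of the same trick that avoids the union by using `memcpy`, and assumes IEEE-754 single precision and a positive, normal input, might look like this (the function name is made up):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the float-bits log2 with memcpy instead of a union.
   Assumes IEEE-754 single precision and a positive, normal input. */
static float bitlog_from_float_bits(float x)
{
    assert(x > 0.0f);

    int32_t bits;
    memcpy(&bits, &x, sizeof bits);      /* reinterpret the float's bits */

    /* Remove the exponent bias; what is left is (exponent + mantissa)
       as a fixed point number with 23 fractional bits. */
    bits -= (int32_t)127 << 23;

    return (float)bits * (1.0f / 8388608.0f);   /* divide by 2^23 */
}
```

Unlike the integer versions this also handles inputs below 1: for 0.75 the exponent is -1 and the mantissa is 0.5, so it returns -0.5 where the true log2 is about -0.415.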

One note - all of these versions are exact for the true powers of 2, and they err consistently low for all other values. If you want to minimize the maximum error, you should bias them.

The maximum error of ( log2( 1 + f) - f ) occurs at f = ( 1/ln(2) - 1 ) = 0.442695 ; that error is 0.08607132 , so the correct bias is half that error : 0.04303566

Backing up in Robin's talk we can now talk about "fastexp". "fastexp" is doing "e^x" by using the floating point format again, basically he's just sticking x into the exponent part to get the int-to-float to do the 2^x. To make it e^x instead of 2^x you just scale x by 1/ln(2) , and again we use the same trick as with bitlog : we can do exact integer powers of two, to get the values in between we use the fractional bits for linear interpolation. Robin's method seems sound, it is :

```
float fastexp(float x)
{
    int i = ftoi( x * 8.f );

    FloatAnd32 f;
    f.i = i * 1512775 + (127 << 23) - 524288;

    // 1512775 = (2^20)/ln(2)
    // 524288 = 0.5*(2^20)

    return f.f;
}

```
for 3 bits of fractional precision. (Note that Robin says to bias with 0.7*(2^20); I don't know where he got that. I get minimum relative error with 0.5.)

Anyway, that's all fine, but once again we can ask - why just 3 bits? Why not use all the bits of x as fractional bits? And if we put the multiply by 1/ln(2) in the float math before we convert to ints, it would be more accurate.

What we get is :

```
float fastexp2(float x)
{
    // 12102203.16156f = (2^23)/ln(2)
    int i = ftoi( x * 12102203.16156f );

    FloatAnd32 f;
    f.i = i + (127 << 23) - 361007;

    // 361007 = (0.08607133/2)*(2^23)

    return f.f;
}

```
and indeed this is much much more accurate. (max_rel_err = 0.030280 instead of 0.153897 - about 5X better).

I guess Robin's fastexp is preferable if you already have your "x" in a fixed point format with very few fractional bits (3 bits in that particular case, but it's good for <= 8 bits). The new method is preferred if you have "x" in floating point or if "x" is in fixed point with a lot of fractional bits (>= 16).

ADDENDUM :

I found the Google Book where bitlog apparently comes from; it's Math Toolkit for Real-Time Programming by Jack W. Crenshaw ; so far as I can tell this book is absolute garbage and that section is full of nonsense and crack smoking.

ADDENDUM 2 :

it's obvious that log2 is something like :

```
x = 2^I * (1+f)

(I is an int, f is the mantissa)

log2(x) = I + log2(1+f)

log2(1+f) = f + f * (1-f) * C

We've been using log2(1+f) ~= f , but we know that's exact at the ends and wrong in the middle
so obviously we should add a term that humps in the middle.

If we solve for C we get :

C = ( log2(1+x) - x ) / ( x*(1-x) )

Integrating on [0,1] (which is the same as averaging over the interval) gives C = 0.346573583

```
hence we can obviously do a better bitlog something like :
```
float bitlogf3(float fval)
{
    FloatAnd32 fi;

    fi.f = fval;

    float vv = (float) (fi.i - (127<<23));

    vv *= (1.f/8388608);

    //float frac = vv - ftoi(vv);

    fi.i = (fi.i & 0x7FFFFF) | (127<<23);

    float frac = fi.f - 1.f;

    const float C = 0.346573583f;

    return vv + C * frac * (1.f - frac);
}

```

#### 1 comment:

Robin said...

Jack Crenshaw is a highly opinionated, self-taught, ornery bugger of an engineer. The book is full of inspiration to do it yourself, but some of his mathematical background lacks, shall we say, depth. I, shamefully, never fully grokked the algorithms he talked about, so thank you for the exposition.

I got a lot more good ideas from Ping Tak Peter Tang's work with the Itanium chips and semi-table-based approximations.