
11-12-10 - Some notes on function approximation by iteration

People often blindly use Newton-Raphson method to iteratively improve function approximations. It's worth taking a moment to think about where that comes from and how we can do better.

First of all, iterative function approximation almost always comes from some kind of epsilon expansion (often Taylor series). You assume you are near the desired answer, you set the discrepancy to epsilon, then expand and drop higher powers of epsilon.

For example, Newton-Raphson iteration is this :


You wish to find a root r of f , i.e. f(r) = 0

Assume we have x_n which is close to r, i.e. r = x_n + e

0 = f(r) = f(x_n + e) = f(x_n) + e * f'(x_n) + O(e^2)

so

e = -f(x_n)/f'(x_n) + O(e^2)

r = x_n - f(x_n)/f'(x_n) + O(e^2)

therefore

x_n+1 = x_n - f(x_n)/f'(x_n)

is closer to r than x_n
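
In code the resulting update is just this (a minimal sketch ; the function names and the fixed step count are made up) :

/* generic Newton-Raphson refinement : repeatedly apply x -= f(x)/f'(x) */
typedef double (*real_func)(double x);

static double newton_refine(real_func f, real_func fprime, double x0, int steps)
{
    double x = x0;
    for (int i = 0; i < steps; i++)
        x -= f(x) / fprime(x);
    return x;
}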

The way we use this for function approximation is to set up a function f() whose root is the value we wish to compute.

For example, to find sqrt(A) you would set f(x) = x^2 - A ; this f has roots where x^2 = A , e.g. x = sqrt(A).
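
Plugging that f into the Newton step gives the familiar iteration (a rough sketch ; the name is made up) :

x_n+1 = x_n - (x_n^2 - A)/(2*x_n) = (x_n + A/x_n)/2

static double sqrt_newton(double A, double x, int steps)
{
    /* x should start somewhere near sqrt(A) ; each step roughly doubles the correct digits */
    for (int i = 0; i < steps; i++)
        x = 0.5 * (x + A / x);
    return x;
}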

Note that this method is only really useful when you have a function like sqrt where the function itself is expensive but its inverse is cheap. This is where the "art" comes in - there are any number of functions you could write which have a root at sqrt(A), but you need one that doesn't itself involve a sqrt; the whole point is to find an iteration that is much cheaper than the full-quality function. For example, to approximate reciprocal sqrt you could use f(x) = x^2 - 1/A or f(x) = 1/x^2 - A ; both are valid but produce iterations of different complexities.
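
For instance, rough sketches of the refinement steps those two choices produce (names made up) :

/* from f(x) = x^2 - 1/A : x <- (x + 1/(A*x))/2 , still needs divides, which defeats the purpose */
static double rsqrt_step_div(double A, double x)
{
    return 0.5 * (x + 1.0 / (A * x));
}

/* from f(x) = 1/x^2 - A : x <- x*(3 - A*x^2)/2 , multiplies and adds only */
static double rsqrt_step_nodiv(double A, double x)
{
    return 0.5 * x * (3.0 - A * x * x);
}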

So, now that we sort of see where Newton-Raphson iteration comes from, we can throw it away. What we really want is a more general idea - start with a point close to the solution, do an epsilon expansion, solve for epsilon, that gives us an iteration. In particular, Newton-Raphson is doing an expansion of the form r = x_n + e , but we could also do r = x_n * (1+e) or r = x_n/(1+e) or whatever. Sometimes these give better iterations (in terms of complexity vs. accuracy).
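
For example, a quick sketch of the multiplicative form applied to reciprocal sqrt :

You want r = 1/sqrt(A) , i.e. A * r^2 = 1 ; write r = x_n * (1+e)

A * x_n^2 * (1+e)^2 = 1

A * x_n^2 * (1 + 2e) = 1 + O(e^2)

e = (1 - A*x_n^2) / (2*A*x_n^2) + O(e^2) = (1 - A*x_n^2)/2 + O(e^2)

(the last step uses A*x_n^2 = 1 + O(e))

x_n+1 = x_n * (3 - A*x_n^2)/2

In this case that lands on exactly the same division-free step as Newton on f(x) = 1/x^2 - A ; any first-order expansion leaves an O(e^2) error, what differs between them is how the arithmetic ends up arranged, which is what you actually care about.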

The next major general point to remember is that all of these methods are based on something like Taylor expansion, and while Taylor expansion is great for infinite series, we should all know that it is *not* optimal for finite series. That is, the first N terms of the Taylor expansion of a function are *not* the best N-term polynomial expansion of that function (except in the limit as epsilon goes to zero).

Over a finite interval, you could use Legendre Polynomials to find the actual best N-term approximation (or any other orthogonal polynomial basis). Since we are on computers here it's often easier to just do a brute force solve for the best coefficients.

This is well known to people who do function approximation (e.g. any sane person doing a polynomial approximation of sin(x) over the range [0,pi] is not using a Taylor expansion) (see the previous discussions of this, 06-13-09 and 06-21-09). However, when people do iterative approximation this idea goes out the window for some reason.
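
As a concrete (and completely unoptimized) sketch of that : brute-force fit the odd polynomial x + c3*x^3 + c5*x^5 to sin(x) over [0,pi/2] (a reduced range, just to keep the example small) and compare its worst-case error against the Taylor coefficients. The coefficient ranges and grid steps below are arbitrary assumptions, not tuned values :

#include <stdio.h>
#include <math.h>

static const double kHalfPi = 1.57079632679489661923;

/* worst-case absolute error of x + c3*x^3 + c5*x^5 vs sin(x) over [0, pi/2] */
static double max_abs_err(double c3, double c5)
{
    double worst = 0.0;
    for (int i = 0; i <= 2000; i++)
    {
        double x = kHalfPi * i / 2000.0;
        double e = fabs((x + c3*x*x*x + c5*x*x*x*x*x) - sin(x));
        if (e > worst) worst = e;
    }
    return worst;
}

int main(void)
{
    /* Taylor : sin(x) ~= x - x^3/6 + x^5/120 */
    printf("taylor : max err = %g\n", max_abs_err(-1.0/6.0, 1.0/120.0));

    /* brute force over a grid near the Taylor values */
    double bestErr = 1e9, bestC3 = 0.0, bestC5 = 0.0;
    for (int i3 = 0; i3 <= 200; i3++)
    for (int i5 = 0; i5 <= 100; i5++)
    {
        double c3 = -0.175 + 0.0001  * i3;   /* -0.175 .. -0.155 */
        double c5 =  0.005 + 0.00005 * i5;   /*  0.005 ..  0.010 */
        double e = max_abs_err(c3, c5);
        if (e < bestErr) { bestErr = e; bestC3 = c3; bestC5 = c5; }
    }
    printf("fitted : max err = %g at c3 = %g , c5 = %g\n", bestErr, bestC3, bestC5);
    return 0;
}

The grid winner comes in well below the Taylor coefficients on worst-case error, which is the whole point.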

In particular, after your initial approximation you usually know something about the error : whether it is positive, and what its bound is. You can then expand in the error and find the optimal coefficients over that range. This gives you an iterative refinement step which is superior to Newton.
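
As a rough sketch of that idea (the error model and search ranges here are pure assumptions, not the worked example promised below) : suppose the initial guess x0 for 1/sqrt(A) is known to have relative error within +-5% , i.e. x0 = y/sqrt(A) with y in [0.95,1.05]. A refinement of the form x1 = x0*(c0 - c1*A*x0^2) then has relative error y*(c0 - c1*y^2) - 1 , independent of A , so you can brute force (c0,c1) over the known error range and compare against Newton's (3/2, 1/2) :

#include <stdio.h>
#include <math.h>

/* worst-case relative error of the step x1 = x0*(c0 - c1*A*x0^2),
   assuming the initial relative error of x0 is in [-d, +d] */
static double max_rel_err(double c0, double c1, double d)
{
    double worst = 0.0;
    for (int i = 0; i <= 2000; i++)
    {
        double y = (1.0 - d) + (2.0 * d) * i / 2000.0;
        double e = fabs(y * (c0 - c1 * y * y) - 1.0);
        if (e > worst) worst = e;
    }
    return worst;
}

int main(void)
{
    const double d = 0.05;  /* assumed bound on the initial relative error */

    printf("newton    : max rel err = %g\n", max_rel_err(1.5, 0.5, d));

    double bestErr = 1e9, bestC0 = 0.0, bestC1 = 0.0;
    for (int i0 = 0; i0 <= 80; i0++)
    for (int i1 = 0; i1 <= 80; i1++)
    {
        double c0 = 1.49 + 0.0005 * i0;   /* 1.49 .. 1.53 */
        double c1 = 0.49 + 0.0005 * i1;   /* 0.49 .. 0.53 */
        double e = max_rel_err(c0, c1, d);
        if (e < bestErr) { bestErr = e; bestC0 = c0; bestC1 = c1; }
    }
    printf("optimized : max rel err = %g at c0 = %g , c1 = %g\n", bestErr, bestC0, bestC1);
    return 0;
}

Over this error range the plain Newton step always undershoots (its relative error is -1.5*t^2 + O(t^3) for an initial error of t), so just re-centering the constants already buys back a good chunk of the worst-case error.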

I'll do a followup post and work a specific example.

2 comments:

ryg said...

"we could also do r = x_n * (1+e) or r = x_n/(1+e) or whatever. Sometimes these give better iterations (in terms of complexity vs. accuracy)."
For the specific case of reciprocal, square root and reciprocal square root computations, this is called "Goldschmidt's algorithm". Both Newton-Raphson and Goldschmidt actually evaluate the same series expansion in this case, but Goldschmidt is less serial which makes it popular for FP dividers. See this paper for example.

There's some papers on HW implementations and math libraries for newer architectures and they're worth reading, e.g. this one (also Google for the paper and look at the citations etc). Lots of good stuff in there.

"You could use Legendre Polynomials to find the actual best N-term approximation (or any other orthogonal polynomial basis)"
Orthogonal polynomials give you an optimal approximation wrt. weighted L2 norms (int_a^b (f_approx(x) - f_real(x))^2 w(x) dx, where w(x) is a weighting function corresponding to your choice of basis). So it minimizes average error, but to minimize maximum absolute/relative error (which is usually what you want) you need a different approach (minimax approximation). Still, they're usually significantly better than Taylor polynomials, and the coefficients are easy to determine even without a computer algebra system (unlike minimax polynomials).

cbloom said...

Yeah, you're right that it should be minimax/Remez polynomials, though if you bring x into a standard power-of-two range like [1,2) then just using the L2 norm instead of the ratio isn't that far off.

(BTW I just noticed Boost now has a Remez solver :

http://www.cppprog.com/boost_doc/libs/math/doc/sf_and_dist/html/math_toolkit/backgrounders/remez.html

)
