12-21-06 - 3

There's a hacky way of solving coupled non-linear systems which I've never really seen documented. The numerical people will puke, but it works incredibly well in almost all cases and is very simple to implement. I dunno, maybe this is well known but I've never seen it really described.

Say you have some function F(A,B) where A and B are vectors. You wish to solve F(A,B) = 0 (or perhaps minimize an error E(A,B), in which case take derivatives to get the F's). Do some algebra and substitutions until you have two separate equations, each giving one variable in terms of the other :

A = G(B)
B = H(A)

Where G and H may be arbitrary nonlinear functions, so you can't just plug one into the other and solve. If B were the right answer, then A = G(B) would give you the exact solution for A, and vice-versa.

The hacky solution is basically just to set A from G, then B from H, and repeat. There are some simple hacks to make it better.
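In code, the basic version is just an alternating loop. Here's a minimal Python sketch; the functions G and H, the initial guesses, and the iteration count are all placeholders for whatever your problem gives you :

def solve_alternating(G, H, A0, B0, num_iters=20):
    # basic alternating iteration : evaluate each equation at the
    # current estimate of the other variable, then repeat
    A, B = A0, B0
    for _ in range(num_iters):
        A_next = G(B)   # A at the next step, from the current B
        B_next = H(A)   # B at the next step, from the current A
        A, B = A_next, B_next
    return A, B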

Imagine the solutions converging as a time series, so we have A_t and B_t. First of all, it's important to have a pretty good initial guess for A_0 and B_0 (actually just for one or the other), because this method doesn't work if you're very far from the right answer.

The next hacky trick is to take less than the full step for the first few steps. A full step would be :

A_t_1 = G(B_t)
B_t_1 = H(A_t)
( A_t_1 should be read as A_(t+1) , it's A at the next time step)

Instead take a fractional step of size f (f in 0 - 1) :

A_t_1 = (1-f) * A_t + f * G(B_t)
B_t_1 = (1-f) * B_t + f * H(A_t)

This avoids oscillations and helps you settle in. You can start with f = 0.5 and quickly step it up to 0.6, 0.7, and so on. For some problems you might want to have "f" max out at 0.99 or something rather than 1.0
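As a sketch, the fractional-step loop looks something like this (the ramp schedule for f here is just an illustration, not a tuned choice) :

def solve_relaxed(G, H, A0, B0, num_iters=20, f_max=0.99):
    # alternating iteration with fractional steps : blend the new
    # estimate with the old one, ramping the blend factor up over time
    A, B = A0, B0
    f = 0.5
    for _ in range(num_iters):
        A_next = (1 - f) * A + f * G(B)
        B_next = (1 - f) * B + f * H(A)
        A, B = A_next, B_next
        f = min(f + 0.1, f_max)   # step f up toward its max
    return A, B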

The next hack which helps a lot is to use A_t_1 instead of A_t when solving for B_t_1 :

A_t_1 = G(B_t)
B_t_1 = H(A_t_1)

This sort of lets you take a more exact step. If you want you can swap the order of A & B on every other step, or do some sort of fancier thing like :

A_int = 0.5 * A_t + 0.5 * G(B_t)
B_t_1 = H(A_int)
A_t_1 = G(B_t_1)

which is sort of like a centered-time integrator or something though I have no idea if that's actually better. Actually this whole thing sort of reminds me of implicit integrators.
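For what it's worth, that last variant would look something like this as a sketch (again G, H, and the initial guesses are placeholders; dropping the half-step blend on A_int gives the plain sequential version) :

def solve_sequential(G, H, A0, B0, num_iters=20):
    # sequential update : take a half step on A, use it to update B,
    # then update A again from the fresh B
    A, B = A0, B0
    for _ in range(num_iters):
        A_mid = 0.5 * A + 0.5 * G(B)   # intermediate A, half a step
        B = H(A_mid)                   # B from the intermediate A
        A = G(B)                       # A from the new B
    return A, B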

Anyway, in practice this type of iteration converges incredibly fast. In problems where I've used Newton's method or Conjugate Gradient or something and it's taken like 500 iterations to converge, this iteration took 20 steps. Obviously it only works on certain types of problems and they have to have well behaved local minima, but when it works it works great.

To be a bit more concrete I'll give you a specific example. I've been examining the "weighted SVD", and one step of that is the approximation of a matrix with a rank-1 tensor (that's just an outer product of two vectors). The problem is :

Given matrix A_ij
and weights w_ij
Find vectors U_i and V_j such that the error
E = Sum_ij { w_ij * (A_ij - U_i*V_j)^2 }
is minimized

If you take derivatives of E with respect to U and V and set them to zero, you can solve for two equations :

U_i = Sum_j { A_ij * w_ij * V_j } / Sum_j { w_ij * (V_j^2) } = G(V)
V_j = Sum_i { A_ij * w_ij * U_i } / Sum_i { w_ij * (U_i^2) } = H(U)

Directly solving these is impossible (I think), but they're exactly in the form I need for my hacky iteration, and in fact it works super well.
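For concreteness, here's what that iteration looks like in numpy (the initialization of V and the fixed iteration count are arbitrary choices, not part of the method; in practice you'd iterate until the error stops dropping) :

import numpy as np

def weighted_rank1(A, w, num_iters=30):
    # find U, V minimizing Sum_ij { w_ij * (A_ij - U_i*V_j)^2 }
    # by alternating the two closed-form updates above
    U = np.ones(A.shape[0])
    V = A.mean(axis=0)   # crude initial guess for V
    for _ in range(num_iters):
        # U_i = Sum_j { A_ij w_ij V_j } / Sum_j { w_ij V_j^2 }
        U = (A * w) @ V / (w @ (V * V))
        # V_j = Sum_i { A_ij w_ij U_i } / Sum_i { w_ij U_i^2 }
        V = (A * w).T @ U / (w.T @ (U * U))
    return U, V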

I guess the reason no one likes this approach is that on general problems it can very badly fail to converge.

Addendum : this approach is also great for equations in one variable by substitution. For example say you have some evil equation to solve like :

3 + 7*x + sin(x) + sin(x)^2 = 0

Just set y = sin(x) and pretend you have equations in two variables. Now :

x = (- 3 - y - y*y)/7
y = sin(x)

Are the two equations and you just use the alternate stepping approach. The half-step iteration I described here converges to within FLT_EPSILON in 5 steps !!
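Here's the plain sequential version of that stepping as a sketch (the half-step variant would just blend each update with the previous value, as earlier; the initial guesses and iteration count are arbitrary) :

import math

def solve_example(num_iters=10):
    # solve 3 + 7*x + sin(x) + sin(x)^2 = 0 via the substitution y = sin(x)
    x, y = 0.0, 0.0
    for _ in range(num_iters):
        x = (-3.0 - y - y * y) / 7.0   # x from the current y
        y = math.sin(x)                # y from the new x
    return x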
