11/29/2007

11-29-07 - 2

X and Y are vectors (or series of numbers) of the same length. You want to do a regular least-squares linear best fit, Y = m * X + b. If we use the notation that < > is the average over the series, then :

m = ( < X * Y > - < Y > * < X > ) / ( < X * X > - < X > * < X > )

b = < Y > - m * < X >
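As a minimal sketch of the two formulas above, assuming plain Python lists of floats of equal length (the names `mean` and `linear_fit` are made up for illustration):

```python
def mean(values):
    # < V > : the average over the series
    return sum(values) / len(values)

def linear_fit(xs, ys):
    # Least-squares fit of ys = m * xs + b using only averages.
    # Assumes xs is not constant, otherwise the denominator is zero.
    mean_x = mean(xs)
    mean_y = mean(ys)
    mean_xy = mean([x * y for x, y in zip(xs, ys)])
    mean_xx = mean([x * x for x in xs])
    m = (mean_xy - mean_y * mean_x) / (mean_xx - mean_x * mean_x)
    b = mean_y - m * mean_x
    return m, b

# Points that lie exactly on y = 2 * x + 1
m, b = linear_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

For that sample data the fit should come out at m = 2 and b = 1, up to floating point rounding.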

This is super standard, but it's nice and concise, which makes it a handy thing to have written down. "m" is very nearly the "correlation": the two share the same numerator and differ only in the denominator. If we use the formulas

variance(X) = ( < X * X > - < X > * < X > )

rmse(X) = sqrt( variance(X) )

then :

correlation = ( < X * Y > - < X > * < Y > ) / ( rmse(X) * rmse(Y) )
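A rough sketch of the correlation formula in the same style, again assuming plain Python lists of floats. The helper names are hypothetical: `variance` computes < X * X > - < X > * < X >, and `rmse` is its square root.

```python
import math

def mean(values):
    # < V > : the average over the series
    return sum(values) / len(values)

def variance(values):
    # < V * V > - < V > * < V >
    avg = mean(values)
    return mean([v * v for v in values]) - avg * avg

def rmse(values):
    # Root-mean-square deviation from the average
    return math.sqrt(variance(values))

def correlation(xs, ys):
    # Assumes neither series is constant, otherwise rmse is zero.
    covariance = mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)
    return covariance / (rmse(xs) * rmse(ys))

# A perfectly linear relation, rising and then falling
rising = correlation([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
falling = correlation([0.0, 1.0, 2.0, 3.0], [7.0, 5.0, 3.0, 1.0])
```

Here `rising` should be close to 1 and `falling` close to -1, since both series are exact lines.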

Note that if you put the variables in "unbiased form" (what is usually called standardizing) by subtracting off the average and dividing by the rmse, so that each has an average of zero and an rmse of 1.0, then the correlation is just < X * Y >, which is the same as the "m" in the linear best fit of those standardized variables.
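To check that claim numerically, here is a small sketch under the same assumptions (plain Python lists, non-constant series). The data and the helper name `standardize` are invented for illustration.

```python
import math

def mean(values):
    return sum(values) / len(values)

def standardize(values):
    # Subtract the average and divide by the rmse, so the result
    # has an average of zero and an rmse of 1.0.
    avg = mean(values)
    spread = math.sqrt(mean([v * v for v in values]) - avg * avg)
    return [(v - avg) / spread for v in values]

xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 1.0, 5.0, 6.0]

# Correlation of the raw series, from the full formula
cov = mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)
rmse_x = math.sqrt(mean([x * x for x in xs]) - mean(xs) ** 2)
rmse_y = math.sqrt(mean([y * y for y in ys]) - mean(ys) ** 2)
corr_raw = cov / (rmse_x * rmse_y)

# After standardizing, the correlation collapses to < X * Y >
sx = standardize(xs)
sy = standardize(ys)
corr_std = mean([a * b for a, b in zip(sx, sy)])

# Slope of the best fit on the standardized series
m_std = (corr_std - mean(sy) * mean(sx)) / (mean([a * a for a in sx]) - mean(sx) ** 2)
```

All three of `corr_raw`, `corr_std` and `m_std` should agree to within rounding error.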
