5.09.2006

Ortho 2

The usage comes from the stats literature. Imagine a distribution of datapoints with some variance, like this:

....*........... *
..*......... *
* .....*...... *
....*..... *
..* ....*
.......*
.....*

Whence the variance in the data? Well, start by taking the mean; that'll be a point somewhere in the middle. Now draw a line (technically, an eigenvector) through the mean that best captures the variance in the distribution... that'll be a vertical line right down the middle, as the data is distributed more 'vertically' than 'horizontally' as I've drawn the points. But there's still some horizontal scatter unaccounted for by the vertical line, so draw another line through the mean that best captures the remaining scatter. That'll be a horizontal line, orthogonal to the vertical one. Moreover, the horizontal scatter is uncorrelated with the vertical scatter. As a general heuristic, in a two-dimensional dataset like this one, there are two orthogonal eigenvectors that best capture/explain the variance (more dimensions give you more eigenvectors, one per dimension). Hope that's clear... hard to tell you about it like this in a blog. Thus 'orthogonal' vectors describe statistical independence, but that reading only works, or at least works cleanest, when the data is normally distributed; in general, orthogonal eigenvectors guarantee only that the directions are uncorrelated.
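For the code-inclined, here's a minimal sketch of that procedure in NumPy, on made-up 2-D data (the dataset, seed, and scales are illustrative assumptions, not anything measured): center at the mean, build the covariance matrix, and read the two orthogonal directions off its eigenvectors.

```python
import numpy as np

# A made-up 2-D dataset, scattered more 'vertically' than 'horizontally',
# like the drawing above. (The seed and scales are arbitrary choices.)
rng = np.random.default_rng(0)
points = rng.normal(loc=0.0, scale=[0.5, 2.0], size=(200, 2))

# Step 1: take the mean -- a point somewhere in the middle.
mean = points.mean(axis=0)

# Step 2: the eigenvectors of the covariance matrix are the lines through
# the mean that best capture the variance. (np.cov centers the data itself;
# the subtraction is just to mirror the prose.)
cov = np.cov(points - mean, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # columns are the eigenvectors

# The two directions are orthogonal: their dot product is ~0.
print(eigenvectors[:, 0] @ eigenvectors[:, 1])

# Each eigenvalue says how much variance its direction explains; the larger
# one belongs to the 'vertical' direction here (eigh sorts them ascending).
print(eigenvalues)
```

Running it, the dot product prints something on the order of 1e-16, i.e., the two directions are orthogonal up to floating point, and the eigenvalues show the vertical direction carrying most of the variance.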

Determining eigenvectors is standard operating procedure in a few of our subfields; if you want to know more, I'm happy to provide refs that treat it more intuitively or mathematically.
