It is often thought that linear independence and statistical independence are unrelated concepts from different branches of mathematics. In this short note, I beg to differ. Both the concepts of linear independence and statistical independence are encountered in the world of statistics, often in close proximity (e.g., one frequently sees “covariance is a measure of linear dependence”). Furthermore, linear independence has clear mathematical implications vis-à-vis statistical independence. The links and differences between them are frequent sources of confusion in statistics, so I think it is worthwhile clarifying them.
Consider a simple scenario in which you have two non-zero, non-constant, $n$-dimensional data vectors $\mathbf{x}$ and $\mathbf{y}$.
They are linearly independent if there is no non-zero scalar $a$ such that
$$\mathbf{y} = a\mathbf{x}.$$
In other words, there is no non-zero multiplicative constant that will transform $\mathbf{x}$ into $\mathbf{y}$. Geometrically, this means that the vectors $\mathbf{x}$ and $\mathbf{y}$ do not lie on the same line through the origin.
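A quick numerical sketch of this check (the vectors here are hypothetical examples, not taken from the note; NumPy is assumed):

```python
import numpy as np

# Hypothetical example vectors.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])   # y = 2x, so y is linearly dependent on x
z = np.array([1.0, 0.0, -1.0])  # not a scalar multiple of x

# Two non-zero vectors are linearly dependent exactly when the matrix
# formed by stacking them has rank 1 (they lie on the same line).
print(np.linalg.matrix_rank(np.vstack([x, y])))  # 1: dependent
print(np.linalg.matrix_rank(np.vstack([x, z])))  # 2: independent
```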
The two vectors $\mathbf{x}$ and $\mathbf{y}$ are statistically independent if and only if their joint probability density is the product of their marginal probability densities, i.e.,
$$p(\mathbf{x}, \mathbf{y}) = p(\mathbf{x})\,p(\mathbf{y}).$$
Statistical independence implies zero covariance, $\operatorname{cov}(\mathbf{x}, \mathbf{y}) = 0$ (though the reverse implication is not true generally).
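One way to see the implication numerically: samples drawn independently should have a sample covariance near zero. A minimal sketch, assuming NumPy and arbitrarily chosen distributions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independently drawn samples (illustrative choice of distribution).
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)

# Independence gives E[xy] = E[x]E[y], so cov(x, y) = 0 in theory;
# the sample covariance is close to zero up to sampling noise.
print(np.cov(x, y)[0, 1])
```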
The two concepts are linked insofar as if the two vectors are not linearly independent then they can also not be statistically independent. For example, if $\mathbf{y} = a\mathbf{x}$ for some non-zero scalar $a$, then knowing $\mathbf{x}$ determines $\mathbf{y}$ exactly, so the joint density cannot factorize into the product of the marginals; indeed, $\operatorname{cov}(\mathbf{x}, \mathbf{y}) = a\operatorname{var}(\mathbf{x}) \neq 0$ for non-constant $\mathbf{x}$.
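The covariance identity for linearly dependent vectors can be verified directly (a sketch with hypothetical data; the scalar $a = 2$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
a = 2.0
y = a * x  # y is linearly dependent on x

# cov(x, y) = a * var(x), which is non-zero whenever x is non-constant,
# so x and y cannot be statistically independent.
# (ddof=1 matches np.cov's default unbiased estimator.)
print(np.cov(x, y)[0, 1], a * np.var(x, ddof=1))
```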
However, linear independence of $\mathbf{x}$ and $\mathbf{y}$ does not guarantee statistical independence. It is possible to have $\operatorname{cov}(\mathbf{x}, \mathbf{y}) \neq 0$ even if $\mathbf{x}$ and $\mathbf{y}$ are linearly independent. It is only when the linear independence takes a particular form, namely the two (mean-centred) vectors being orthogonal, that the covariance between them will also be zero. Therefore one could say that covariance is a measure of “non-orthogonality” (rather than a measure of linear dependence).
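Both halves of this point can be checked with small mean-zero vectors (hypothetical examples, not from the note): a vector orthogonal to $\mathbf{x}$ has zero covariance with it, while a linearly independent but non-orthogonal vector does not.

```python
import numpy as np

# Mean-zero example vectors.
x = np.array([1.0, -1.0, 2.0, -2.0])
y = np.array([1.0, 1.0, -1.0, -1.0])   # orthogonal to x (x @ y == 0)
z = np.array([2.0, -1.0, 1.0, -2.0])   # linearly independent of x, not orthogonal

# For mean-zero vectors, cov(x, y) = (x @ y) / (n - 1),
# so zero covariance is exactly orthogonality.
print(x @ y, np.cov(x, y)[0, 1])  # orthogonal: covariance is zero
print(x @ z, np.cov(x, z)[0, 1])  # non-zero covariance despite linear independence
```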