This may be a stupid question but, apart from the normal covariance matrix made up of cov(x1,x2), cov(x2,x3), and cov(x1,x3), is there a way of calculating cov(x1,x2,x3)?
The natural generalization would be Cov(X1,X2,X3) = E[(X1−E[X1])(X2−E[X2])(X3−E[X3])], but you cannot interpret the result unambiguously. For instance, if Cov(X1,X2,X3) were large and positive, you wouldn't know what is happening, because different combinations of signs (such as +·+·+ and +·−·−) contribute to the same result. Thus you cannot distinguish whether X2 and X3 are both large when X1 is large, or both small when X1 is large, or either both large or both small when X1 is large.
Another argument against using that generalization for more than two variables is the following. For two variables you have Cov(X,X) = Var(X), so it is natural to interpret covariance as being related to variability. But for more variables, Cov(X,X,X) and so on are related to higher moments of X, which are interpreted not as variability but as skewness (third moment), kurtosis (fourth moment), and so on.
(Where 'and so on' means 'I have no idea what the interpretation of the fifth moment is.')
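To make this concrete, here is a minimal numpy sketch (the random sample arrays and the helper cov3 are purely illustrative) of that natural three-way product moment, which also shows that Cov(X,X,X) reduces to the third central moment rather than the variance:

```python
import numpy as np

rng = np.random.default_rng(0)
x1, x2, x3 = rng.normal(size=(3, 10_000))

# "Natural" three-way generalization: E[(X1-m1)(X2-m2)(X3-m3)]
def cov3(a, b, c):
    return np.mean((a - a.mean()) * (b - b.mean()) * (c - c.mean()))

print(cov3(x1, x2, x3))  # one number whose sign mixes the +.+.+, +.-.-, ... patterns
print(cov3(x1, x1, x1))  # third central moment of x1 (skewness-related), not Var(x1)
print(np.var(x1))        # compare: Cov(X,X) = Var(X) only holds in the two-variable case
```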
Yes, there is a direct generalization of the covariance concept to any dimension, formulated as an (at least) positive semidefinite matrix: the covariance matrix. An easy-to-understand explanation is given in the above response by A. Bazzi.
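As a small illustration of that matrix generalization (assuming numpy; the data here are just random draws), numpy.cov returns the matrix of pairwise covariances with the variances on the diagonal, and its eigenvalues are non-negative, reflecting positive semidefiniteness:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(3, 500))      # three variables, 500 observations (rows = variables)

C = np.cov(data)                      # 3x3 matrix of pairwise covariances, Var on the diagonal
eigenvalues = np.linalg.eigvalsh(C)   # all >= 0 (up to rounding): positive semidefinite
print(C)
print(eigenvalues)
```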
Thank you very much for the answer, but I still wonder.
The covariance matrix is really just a presentation of the pair-wise covariances, not of all the variables jointly. For time series data we have co-integration analysis, but I wonder whether something similar would be possible for panel data?
I do not think it is reasonable to calculate the covariance of more than two variables, because the relationship among many variables cannot be defined clearly. With only two variables, however, it is easy to think about the relationship between them. Hope this helps.
Yes, you can use principal component analysis techniques on the matrix generated by numpy.cov to find how the variables are related to each other. For example, singular value decomposition can be used to decompose the matrix into its eigenvalues and eigenvectors, and analysis of these eigenvalues and eigenvectors can reveal the dependency among the analysed variables. I use this technique in the analysis of seismic data recorded by a geophone with three orthogonal sensors.
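A rough sketch of that workflow, assuming numpy and entirely synthetic three-component data (np.linalg.eigh is used here in place of a full SVD, which is equivalent for a symmetric covariance matrix):

```python
import numpy as np

# Hypothetical three-component recording (one array per orthogonal sensor)
rng = np.random.default_rng(2)
common = rng.normal(size=2000)                 # shared signal correlates the first two components
components = np.vstack([
    common + 0.3 * rng.normal(size=2000),
    0.8 * common + 0.3 * rng.normal(size=2000),
    rng.normal(size=2000),                     # third component mostly independent
])

C = np.cov(components)                         # 3x3 covariance matrix of the three sensors
eigenvalues, eigenvectors = np.linalg.eigh(C)  # symmetric matrix: eigh gives real eigenpairs

# Large eigenvalues correspond to directions (linear combinations of the three
# variables) that carry most of the joint variability; the associated eigenvectors
# show how strongly each original variable loads on that direction.
order = np.argsort(eigenvalues)[::-1]
print(eigenvalues[order])
print(eigenvectors[:, order])
```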
This might give a simple demonstration: http://www.itl.nist.gov/div898/handbook/pmc/section5/pmc541.htm