In the component matrix, where the variables are grouped within components, some of the loadings have negative values, so I would really like to know the meaning of the sign in this case.
Each PC is a single dimension, with a mid-point at 0. The sign (positive or negative) tells you the direction in which a given variable moves along that single-dimension vector.
For example, if you have 5 variables, the first PC has an eigenvalue of 0.8, and the loadings of the variables on this PC are -0.8, -0.5, 0, 0.2, and 0.5, you can conclude that:
1) Variable 3 plays no role in explaining the variation on PC1 (its loading is 0).
2) Var4 plays a small role, whereas the others play sizable roles in explaining the variation due to that PC.
3) Var1 has a greater impact than Var2 and Var5.
4) There is a perfect contrast between Var2 and Var5 (equal magnitude, opposite sign).
5) Finally, the PC scores derived from this PC (a linear combination of the loadings and the observed values of the variables) will show that individuals with negative PC scores tend to have greater values of Var1 and Var2 and lower values of the remaining variables, whereas individuals with PC scores greater than 0 tend to have greater values of Var4 and Var5 and lower values of the remaining ones.
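Point 5 can be sketched numerically. This is a toy illustration using the hypothetical loadings from the example above and two made-up, standardized observations (nothing here comes from a real data set):

```python
import numpy as np

# Hypothetical loadings of Var1..Var5 on the PC, as in the example above.
loadings = np.array([-0.8, -0.5, 0.0, 0.2, 0.5])

# Two made-up standardized observations:
# one high on Var1/Var2, one high on Var4/Var5.
high_var12 = np.array([1.5, 1.2, 0.0, -0.3, -0.4])
high_var45 = np.array([-0.4, -0.3, 0.0, 1.2, 1.5])

# A PC score is the dot product of the loadings with the observation.
score_12 = loadings @ high_var12
score_45 = loadings @ high_var45

print(score_12)  # negative: high Var1/Var2 pulls the score below 0
print(score_45)  # positive: high Var4/Var5 pushes the score above 0
```

The individual high on the negatively loaded variables gets a negative score, and vice versa, which is exactly the contrast described in point 5.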
Sir, I did a PCA on the C-alpha atoms of a protein with 1314 residues. The eigenvalues obtained after diagonalizing the covariance matrix decrease gradually, but the last few (2 or 3) eigenvalues are slightly negative, around -1.7e-07 to -4.4e-07. Can I ignore this, or does it mean the protein is showing random motion?
I did the PCA using a GROMACS tool. The first eigenvalue contributes only 35% of the total fluctuation, and the first 10 eigenvalues contribute 70%. Is there some cut-off or criterion to check whether this is acceptable?
Given the eigenvalues and eigenvectors from diagonalizing the coordinate covariance matrix, how do we calculate how much each PC contributes to the fluctuation?
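For the question above, each PC's share of the total fluctuation is its eigenvalue divided by the sum of all eigenvalues. A minimal sketch, using randomly generated toy coordinate data rather than a real trajectory:

```python
import numpy as np

# Toy coordinate data: rows are frames/observations, columns are coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) * np.array([3.0, 2.0, 1.0, 0.5, 0.3, 0.1])

# Diagonalize the covariance matrix (eigh is for symmetric matrices and
# returns eigenvalues in ascending order).
cov = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]  # sort descending

# Each PC's contribution to the total fluctuation is its eigenvalue
# divided by the sum of all eigenvalues (the total variance).
fractions = eigvals / eigvals.sum()
print(fractions[0])          # share of PC1
print(fractions[:3].sum())   # cumulative share of the first 3 PCs
```

The cumulative sums of `fractions` are what statements like "the first 10 eigenvalues contribute 70%" refer to.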
I am trying to answer the question by Nivedita Rai. To me, it looks like you haven't standardized your data sets; there is a large variation in units, and that is why you are getting negative eigenvalues. One more thing: generally, you can accept PCs with a corresponding eigenvalue greater than 1. Also, look at your scree plot: consider the eigenvalues on the steep part of the curve and ignore those after it bends into a flat, straight tail. I have used SAS, Minitab, and a few other packages; you can use any statistics package, but I found Minitab better for understanding and exploring your data for PCA. I hope this helps you understand your data set better.
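The standardization point above can be sketched. After z-scoring each column, PCA operates on the correlation matrix, whose eigenvalues sum to (approximately) the number of variables, which is what makes the "eigenvalue greater than 1" rule meaningful. A toy illustration with invented, wildly different scales:

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up variables on very different scales (e.g. different units).
X = np.column_stack([
    rng.normal(0, 1000, 500),   # e.g. millimetres
    rng.normal(0, 1, 500),      # e.g. kilograms
    rng.normal(0, 0.01, 500),   # e.g. ratios
])

# Standardize each column to mean 0, standard deviation 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA on standardized data = eigendecomposition of the correlation matrix.
eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]

# The eigenvalues now sum to roughly the number of variables (3 here),
# so "keep PCs with eigenvalue > 1" compares each PC against the
# variance of a single standardized variable.
print(eigvals.sum())
```

Without standardization, the first eigenvalue would simply mirror whichever variable has the largest units, not the structure in the data.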
This hands-on tutorial, with code included, is the easiest way to understand PCA (it covers data normalization too): http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
As far as I can understand, you are asking whether you should retain the negative values and consider them when doing the PCA. Technically, yes. A negative value only indicates the direction of the correlation between the component and the variable; in a linear relationship, the correlation can be either positive or negative. Furthermore, when shortlisting components, one should decide on a cut-off criterion based on the magnitude of the loading, irrespective of its sign. Generally, the reliability of a factor also depends on the relationship between the individual rotated factor loading and the sample size: the larger the sample size, the smaller the factor loading that can be considered significant (Stevens, 2002). As a rule of thumb, with an alpha level of .01 (two-tailed), a rotated factor loading of at least .32 for a sample size of 300 would be considered statistically meaningful (Tabachnick & Fidell, 2007). The choice of cut-off also depends on the complexity of the variables being handled.
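The "magnitude, not sign" cut-off described above is a one-liner in code. This sketch uses hypothetical rotated loadings and the .32 rule of thumb quoted from Tabachnick & Fidell:

```python
import numpy as np

# Hypothetical rotated loadings of 5 variables on one factor.
loadings = np.array([-0.45, 0.28, 0.70, -0.31, 0.33])

# Rule of thumb cited above: treat |loading| >= .32 as meaningful
# for a sample of ~300. The sign is ignored; it only encodes the
# direction of the variable's correlation with the factor.
threshold = 0.32
meaningful = np.abs(loadings) >= threshold
print(meaningful)  # [ True False  True False  True]
```

Note that -0.45 passes the cut-off while 0.28 does not: only the absolute value matters for retention, while the sign is kept for interpretation.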
I was just thinking: the values of a unit eigenvector may be (0.707, 0.707) or (-0.707, -0.707). Both have norm 1, so how do we choose the right vector for the PCA? And again, if the eigenvector is (0.707, -0.707) or (-0.707, 0.707), how do we choose between them? Both are valid eigenvectors for the same eigenvalue.
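The answer to the question above is that the overall sign of an eigenvector is arbitrary: if v is an eigenvector, so is -v, and either choice gives an equally valid PCA (the scores just flip sign together). A minimal check with a made-up symmetric matrix whose eigenvectors are exactly the (0.707, ±0.707) pair mentioned:

```python
import numpy as np

# Symmetric 2x2 matrix with eigenvectors proportional to
# (0.707, 0.707) and (0.707, -0.707).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(A)
v = eigvecs[:, 1]   # eigenvector for the larger eigenvalue (3.0)

# Both v and -v satisfy A v = lambda v: the sign is arbitrary,
# and which one the library returns is an implementation detail.
print(np.allclose(A @ v, eigvals[1] * v))        # True
print(np.allclose(A @ (-v), eigvals[1] * (-v)))  # True
```

Everything that matters in PCA (projections, reconstructed data, explained variance) is unchanged by a global sign flip of an eigenvector, so there is no "right" choice to make.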
Correlation can go in two directions, positive or negative. Interpretation of the principal components is based on finding which variables are most strongly correlated with each component, i.e., which of these numbers are large in magnitude, the farthest from zero in either the positive or the negative direction.
When different variables are subjected to PCA, not all of them are positively correlated with each other; some variables are negatively correlated. That is why, theoretically, the loadings (eigenvector entries) in PCA may be negative or positive.
In PCA, a negative loading of a variable on a component indicates an inverse correlation between that component and the variable.