Relation of loadings and statistical significance

A p-value (statistical significance) reflects the size of a difference relative to its variability. The loadings are the coefficients of the linear combination of variables that defines a principal component, i.e. a direction of largest multivariate variability. The loadings are therefore influenced by the other variables in the data set (their correlation structure). So while the two are related in some way, they cannot be compared directly, except perhaps in the special case of having just one variable (gene).
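To make this concrete, here is a minimal sketch on simulated data (the variable names `gene_a` and `gene_b`, the group sizes and the effect sizes are all made up for illustration). It uses a plain Welch t-statistic as a stand-in for the p-value and takes the PC1 loadings from an SVD of the centered data. The variable that dominates the loading is not the one that differs between the groups.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50  # samples per group (hypothetical)

# Simulated data: gene_a varies a lot but not between groups;
# gene_b varies little overall but is shifted in group 1.
group = np.repeat([0, 1], n)
gene_a = rng.normal(0.0, 5.0, size=2 * n)
gene_b = rng.normal(0.0, 0.5, size=2 * n) + 1.0 * group
X = np.column_stack([gene_a, gene_b])

# PCA via SVD of the centered data; the group labels are not used here.
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
pc1_loadings = vt[0]  # coefficients of the first principal component


def welch_t(x, y):
    """Welch t-statistic: mean difference relative to its standard error."""
    se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    return (x.mean() - y.mean()) / se


# One t-statistic per variable; this part does use the group labels.
t_stats = np.array(
    [welch_t(X[group == 1, j], X[group == 0, j]) for j in range(X.shape[1])]
)

print("PC1 loadings (gene_a, gene_b):", np.round(pc1_loadings, 3))
print("t-statistics (gene_a, gene_b):", np.round(t_stats, 2))
```

In this setup the loading is driven by the overall spread of `gene_a`, while the t-statistic is driven by the group shift in `gene_b`, so ranking the genes by one says little about their ranking by the other.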

You could check robustness by removing some of the variables that load on the same principal component axis (a highly correlated one, for example) and seeing how the loadings change.
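A rough sketch of such a check, again on simulated data (the variables `x1`, `x2`, `x3` and the helper `pc1_loadings` are hypothetical, and the PCA here is done on the unscaled covariance structure). Two variables share a common factor and a third is independent; dropping one of the correlated pair changes which variable dominates PC1.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000  # number of samples (hypothetical)

# x1 and x2 share a common factor f, so they are highly correlated.
# x3 is independent of both, with a somewhat larger variance than x1 alone.
f = rng.normal(0.0, 1.0, n)
x1 = f + rng.normal(0.0, 0.1, n)
x2 = f + rng.normal(0.0, 0.1, n)
x3 = rng.normal(0.0, 1.2, n)


def pc1_loadings(X):
    """Loadings of the first principal component (SVD of centered data)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]


full = pc1_loadings(np.column_stack([x1, x2, x3]))  # all three variables
reduced = pc1_loadings(np.column_stack([x1, x3]))   # x2 removed

print("PC1 loadings with x2    (x1, x2, x3):", np.round(full, 3))
print("PC1 loadings without x2 (x1, x3):    ", np.round(reduced, 3))
```

With all three variables, the shared variation of `x1` and `x2` defines PC1 and `x3` gets a loading near zero. Once `x2` is removed, `x3` has the largest variance and takes over PC1, even though nothing about `x3` itself has changed.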

Robert F Balshaw

Further to Andreas Krause's comments, don't forget that PCA is usually a technique performed to explore/describe/summarize the variation observed in your data -- but without any notion of group membership being accounted for. That is, PCA is an unsupervised method.

In comparison, your p-values and fold-changes are describing the differences between the groups. This would definitely count as a supervised analysis (though the term is not usually used when looking at one predictor at a time).

The idea might be more familiar if you look at a classification method like discriminant analysis rather than just one-predictor-at-a-time methods.

PCA looks at the patterns of variation in a set of variables X1, ..., Xp.

Discriminant analysis uses very similar methods to look at the pattern of variation *between* two groups.

Crudely speaking,

PCA: ~ X1 + X2 + ... + Xp

Disc Analysis: Group ~ X1 + X2 + ... + Xp

PCA is unsupervised; DA is supervised. These address different questions, so the importance of the X variables can change substantially.
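As an illustration of how much the importance of the X variables can change, here is a small sketch on simulated two-group data. The data, the names `x1` and `x2`, and the use of Fisher's two-group linear discriminant as the particular form of discriminant analysis are all assumptions made for the example. It compares the first PCA direction with the discriminant direction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200  # samples per group (hypothetical)

group = np.repeat([0, 1], n)
x1 = rng.normal(0.0, 4.0, 2 * n)                # large variance, unrelated to group
x2 = rng.normal(0.0, 1.0, 2 * n) + 2.0 * group  # smaller variance, separates groups
X = np.column_stack([x1, x2])

# PCA:  ~ X1 + X2   (labels ignored; direction of largest total variance)
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
pca_dir = vt[0]

# Discriminant analysis:  Group ~ X1 + X2
# Fisher's direction is w = Sw^{-1} (m1 - m0), with Sw the pooled
# within-group scatter matrix and m0, m1 the group mean vectors.
X0, X1 = X[group == 0], X[group == 1]
Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
      + np.cov(X1, rowvar=False) * (len(X1) - 1))
w = np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))
lda_dir = w / np.linalg.norm(w)  # scaled to unit length for comparison

print("PCA direction (x1, x2):         ", np.round(pca_dir, 3))
print("Discriminant direction (x1, x2):", np.round(lda_dir, 3))
```

Here PCA puts nearly all its weight on `x1`, because that is where most of the variation is, while the discriminant direction puts nearly all its weight on `x2`, because that is what separates the groups.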
