I am doing PCA on two variables (one independent and one dependent) that have completely different units and scales. More precisely, I am doing a multivariate EOF analysis in which the columns of the data matrix are 12 grid points of each of the two variables. The first 12 columns of the data matrix hold variable A at the 12 grid cells of a climate dataset; the last 12 columns hold variable B at the same 12 grid cells. The rows are 365 days of daily observations (one year) for both variables, so the matrix is 365 × 24.

I reconstructed A and B from the three leading principal components in order to reduce the noise in the data. I ran the PCA both with standardization (on the correlation matrix) and without it (on the covariance matrix), and in both cases the first three principal components explain over 90% of the variance.

I then calculated the correlation between the reconstructed A and B, and I was surprised that the non-standardized method gave a significantly higher correlation than the standardized one. This is desirable for my research, because the two variables are in fact physically correlated and I want to use the reconstructed data to build a multiple regression model (applying PCA in the same way to other independent variables as well). The statistics community generally agrees that the correlation matrix is preferable to the covariance matrix, but this result pushes me toward using the covariance matrix instead. Could anyone comment on this?
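
To make the comparison concrete, here is a minimal Python/NumPy sketch of the two variants described above, run on hypothetical synthetic data (not the dataset in the question). It stacks A and B into a 365 × 24 matrix, reconstructs it from the three leading PCs with and without standardization, and compares the reconstructed A and B. The helper names (`pca_reconstruct`, `retained_fraction`, `mean_gridpoint_corr`) are illustrative only, and averaging the per-grid-point correlation is an assumption, since the post does not say exactly how the correlation was computed. The sketch also reports how much of each variable's own variance the three modes keep, because under the covariance matrix the total "variance explained" is weighted toward whichever variable has the larger numerical variance.

```python
import numpy as np


def pca_reconstruct(X, k, standardize):
    """Rebuild X (n_samples x n_features) from its k leading principal components.

    standardize=False: PCA on the covariance matrix (columns centered only).
    standardize=True:  PCA on the correlation matrix (columns centered and scaled
    to unit variance). The result is mapped back to the original units either way.
    """
    mean = X.mean(axis=0)
    scale = X.std(axis=0, ddof=1) if standardize else np.ones(X.shape[1])
    Z = (X - mean) / scale
    # SVD of the anomaly matrix; the rows of Vt are the (multivariate) EOF patterns.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    Z_k = (U[:, :k] * s[:k]) @ Vt[:k, :]  # rank-k approximation of Z
    return Z_k * scale + mean, explained[:k].sum()


def retained_fraction(orig, rec):
    """Share of one variable's own variance kept by the reconstruction."""
    resid = np.sum((orig - rec) ** 2)
    total = np.sum((orig - orig.mean(axis=0)) ** 2)
    return 1.0 - resid / total


def mean_gridpoint_corr(A, B):
    """Average Pearson correlation of A[:, j] with B[:, j] over the grid points j."""
    return np.mean([np.corrcoef(A[:, j], B[:, j])[0, 1] for j in range(A.shape[1])])


rng = np.random.default_rng(0)
n_days, n_grid, k = 365, 12, 3

# Hypothetical synthetic data: one shared signal plus noise, on very different scales.
pattern = rng.standard_normal((1, n_grid))
signal = rng.standard_normal((n_days, 1)) @ pattern
A = 100.0 * (signal + 0.5 * rng.standard_normal((n_days, n_grid)))
B = 0.1 * (signal + 0.5 * rng.standard_normal((n_days, n_grid)))
X = np.hstack([A, B])  # 365 x 24: columns 0-11 are A, columns 12-23 are B

results = {}
for label, standardize in [("covariance", False), ("correlation", True)]:
    X_rec, frac = pca_reconstruct(X, k, standardize)
    A_rec, B_rec = X_rec[:, :n_grid], X_rec[:, n_grid:]
    results[label] = (
        frac,
        retained_fraction(A, A_rec),
        retained_fraction(B, B_rec),
        mean_gridpoint_corr(A_rec, B_rec),
    )
    print(f"{label}: total explained = {results[label][0]:.2f}, "
          f"A kept = {results[label][1]:.2f}, B kept = {results[label][2]:.2f}, "
          f"mean corr(A_rec, B_rec) = {results[label][3]:.2f}")
```

Substituting the real 365 × 24 matrix for `X` would show whether the "over 90%" under the covariance matrix is shared by both variables or comes mostly from one of them.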
