Hello everyone, I applied PCA in Python to lagged climate indices and used the first two principal components (those explaining the most variance) as predictors in a multiple linear regression, with GLDAS satellite precipitation data as the target, to predict monthly precipitation. The data covered 38 years of monthly values. The correlation between GLDAS precipitation and the model's predicted precipitation was 0.50 on the training set and 0.49 on the test set.

To check the model's performance further, I took 22 years of monthly TRMM data and ran a new PCA on the lagged climate indices for those 22 years. I entered the two resulting PC values into the previously developed multiple linear regression equation to obtain the predicted monthly precipitation. The correlation between TRMM precipitation and the model's predicted precipitation was negative: -0.45.

Why did I get a negative correlation, and what does it indicate?
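
For reference, here is a minimal sketch of the validation step described above, using NumPy only and purely synthetic data (the array sizes, the random "indices", and the helper names `fit_pca`, `project`, and `predict` are all illustrative assumptions, not my actual data or script). It compares projecting the validation-period indices with the PCA fitted on the training period against using components whose sign is flipped. The sign of a principal component is arbitrary, so a PCA refitted on a different period may come out flipped or reordered, and that could be one possible contributor to a negative correlation. This is a sketch of that single effect, not a diagnosis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): 38 and 22 years of monthly data, 6 lagged indices.
n_train, n_val, n_idx = 38 * 12, 22 * 12, 6
loadings = rng.normal(size=(2, n_idx))
latent_train = rng.normal(size=(n_train, 2))
latent_val = rng.normal(size=(n_val, 2))
X_train = latent_train @ loadings + 0.1 * rng.normal(size=(n_train, n_idx))
X_val = latent_val @ loadings + 0.1 * rng.normal(size=(n_val, n_idx))
# Synthetic "precipitation" driven by the first latent factor plus noise.
y_train = 2.0 * latent_train[:, 0] + rng.normal(size=n_train)
y_val = 2.0 * latent_val[:, 0] + rng.normal(size=n_val)


def fit_pca(X, n_components=2):
    """Return the column means and the leading principal axes of X (via SVD)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(X, mean, components):
    """Project X onto previously fitted principal axes."""
    return (X - mean) @ components.T


# Fit the PCA and the regression (with intercept) on the training period only.
mean_tr, comps_tr = fit_pca(X_train)
pcs_train = project(X_train, mean_tr, comps_tr)
design = np.column_stack([np.ones(n_train), pcs_train])
coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)


def predict(pcs):
    """Apply the fitted regression equation to a matrix of PC scores."""
    return np.column_stack([np.ones(len(pcs)), pcs]) @ coef


# (a) Reuse the training-period PCA on the validation-period indices.
r_reuse = np.corrcoef(y_val, predict(project(X_val, mean_tr, comps_tr)))[0, 1]

# (b) Same axes with flipped sign, as an independently refitted PCA may return.
r_flipped = np.corrcoef(y_val, predict(project(X_val, mean_tr, -comps_tr)))[0, 1]

print(f"reused PCA: r = {r_reuse:.2f}, sign-flipped components: r = {r_flipped:.2f}")
```

In this sketch the two correlations have the same magnitude and opposite sign, because the regression coefficients are tied to the orientation of the components they were fitted on. Reusing the training-period mean and components for new data keeps the PC scores consistent with the regression equation.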
