Dear everyone,

I want to construct a sustainable development index using PCA. I have divided the inputs into three groups (economic, social, and environmental) and log-transformed some skewed variables such as GDP and energy intensity. After that, the KMO test is acceptable (> 0.7).

Q1: Is it OK to log-transform GDP like this, and also variables expressed as a percentage of GDP, such as FDI, gross debt, and industrial value added (all > 0)?

Q2: Since Stata standardizes the variables when running PCA, does that mean I no longer need to standardize the data myself, or should I still standardize beforehand?
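To make Q2 concrete, here is a minimal sketch in Python (not Stata) on synthetic data. It assumes the PCA is run on the correlation matrix, which is my understanding of Stata's default, and only checks that this gives the same eigenvalues as standardizing first and then using the covariance matrix. The data and variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in data: 200 observations, 3 variables on very different scales
X = rng.normal(size=(200, 3)) * np.array([1.0, 50.0, 1000.0])
X[:, 1] += 0.02 * X[:, 2]  # induce some correlation between two columns

# Route A: eigenvalues of the correlation matrix of the raw data
eig_corr = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))

# Route B: standardize first (z-scores), then use the covariance matrix
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eig_std = np.linalg.eigvalsh(np.cov(Z, rowvar=False))

# True when both routes give the same eigenvalues
same = np.allclose(eig_corr, eig_std)
print(same)
```

If the two routes agree, standardizing by hand before a correlation-based PCA changes nothing; it would only matter for a covariance-based PCA.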

In my PCA, PC1 explains only about 57% of the variance when I run it on all variables together, without grouping them into the three dimensions (social, economic, environmental). I am considering running a separate PCA for each group. Q3: Is it OK to run PCA for each group and then build the index as a weighted sum of the group scores? If so, could you recommend some papers that describe this method carefully?
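The group-wise approach in Q3 could be sketched as follows. This is only an illustration in Python on synthetic data: the helper `first_pc`, the group sizes, the equal weights, and the sign convention are all assumptions, not an established method from a specific paper.

```python
import numpy as np

def first_pc(X):
    """Return (PC1 scores, share of variance explained by PC1) for one group."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # z-scores
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    v = eigvecs[:, -1]            # eigh sorts ascending, so the last column is PC1
    if v.sum() < 0:               # arbitrary sign convention (assumption)
        v = -v
    return Z @ v, eigvals[-1] / eigvals.sum()

rng = np.random.default_rng(1)
n = 100
# Synthetic stand-ins for the three groups (not real indicators)
groups = {
    "economic": rng.normal(size=(n, 4)),
    "social": rng.normal(size=(n, 3)),
    "environment": rng.normal(size=(n, 3)),
}
# Equal weights are an assumption; the choice of weights is the open question
weights = {"economic": 1 / 3, "social": 1 / 3, "environment": 1 / 3}

sub_indices, shares = {}, {}
for name, X in groups.items():
    sub_indices[name], shares[name] = first_pc(X)

# Composite index: weighted sum of the three group sub-indices
composite = sum(weights[g] * sub_indices[g] for g in groups)
```

Here `shares` holds the PC1 explained-variance share per group, which is the per-group counterpart of the 57% figure for the pooled PCA.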

Q4: To create a final index from PCA, many papers seem to use a weighted sum of the components, with weights based on the eigenvalue or importance of each component, as the final score. But when I get the eigenvectors, some loadings have different signs, which is simply mathematical. How should I deal with this, or do I need to deal with it at all?
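The sign issue in Q4 can be shown with a small sketch in Python on synthetic data: if v is an eigenvector, so is -v, so software may return either. The helper `orient` and the choice of reference variable are hypothetical, just one possible convention for fixing the sign.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 3))
X[:, 1] += X[:, 0]          # correlate two columns so PC1 is well defined
C = np.corrcoef(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(C)
lam, v = eigvals[-1], eigvecs[:, -1]     # largest eigenvalue and its eigenvector

# Both v and -v satisfy C v = lam v, so the overall sign is arbitrary
ok_plus = np.allclose(C @ v, lam * v)
ok_minus = np.allclose(C @ (-v), lam * (-v))

def orient(vec, ref=0):
    """Flip vec so that the loading of the reference variable is positive."""
    return vec if vec[ref] > 0 else -vec

v_fixed = orient(v, ref=0)
```

Flipping a whole eigenvector only reverses the direction of that component's score; it does not change the eigenvalue or the explained variance.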

Q5: When I asked ChatGPT, it recommended flipping the signs of some variables such as CO2, because a higher value is bad, so that they align with variables like GDP where a higher value is good. However, I have not found any papers that discuss this in depth. Is it correct?
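As a check on what flipping a variable does mechanically, here is a minimal sketch in Python on synthetic data. It shows only the algebraic effect (the eigenvalues stay the same and only the loading of the flipped variable changes sign); it does not settle whether flipping is the right methodological choice. The variable names are made-up stand-ins.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
common = rng.normal(size=n)
# Synthetic stand-ins: two "higher is better" variables, one "higher is worse"
good_a = common + 0.5 * rng.normal(size=n)
good_b = common + 0.5 * rng.normal(size=n)
bad = -common + 0.5 * rng.normal(size=n)

X = np.column_stack([good_a, good_b, bad])
X_rev = X.copy()
X_rev[:, 2] *= -1     # reverse the "higher is worse" variable

def pc1(data):
    """Eigenvalues and PC1 loadings, oriented so the first loading is positive."""
    vals, vecs = np.linalg.eigh(np.corrcoef(data, rowvar=False))
    v = vecs[:, -1]
    return vals, (v if v[0] > 0 else -v)

vals, load = pc1(X)
vals_rev, load_rev = pc1(X_rev)
```

Comparing `load` with `load_rev` shows the first two loadings unchanged and the third with the opposite sign, while `vals` and `vals_rev` are identical.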

Many thanks to all.
