12 December 2018

Background: We are studying a rural population of 375 Malawian households and are trying to build an asset index from a list of over 40 binary (and a few continuous) variables copied from the DHS questionnaires for Malawi. When I ran a PCA on this data in Stata, the first principal component accounted for only about 10-13% of the total variation. When I tried the Kaiser-Meyer-Olkin measure of sampling adequacy (KMO), I got an error telling me that my matrix is singular.
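For anyone hitting the same error: KMO needs the inverse of the correlation matrix, so it fails when one variable is constant or is an exact linear combination of others (for example, "has electricity" and "no electricity" both coded as dummies). Below is a minimal sketch of that check in Python with NumPy rather than Stata. The function name `correlation_rank_report` and the toy data are hypothetical and only illustrate the idea.

```python
import numpy as np

def correlation_rank_report(X):
    """Report constant columns and the rank of the correlation matrix."""
    X = np.asarray(X, dtype=float)
    # Constant columns have zero variance, so their correlations are undefined.
    constant = np.where(X.std(axis=0) == 0)[0]
    keep = np.setdiff1d(np.arange(X.shape[1]), constant)
    R = np.corrcoef(X[:, keep], rowvar=False)
    rank = int(np.linalg.matrix_rank(R))
    return {
        "constant_columns": constant.tolist(),
        "n_variables": int(len(keep)),
        "rank": rank,
        # Rank below the number of variables means R cannot be inverted.
        "singular": bool(rank < len(keep)),
    }

# Toy data (made up): 375 households, 5 random binary assets,
# plus a sixth dummy that is the exact complement of the first.
rng = np.random.default_rng(0)
base = rng.integers(0, 2, size=(375, 5))
complement = 1 - base[:, [0]]
X_toy = np.hstack([base, complement])

report = correlation_rank_report(X_toy)
print(report)
```

If the report flags a singular matrix, dropping one variable from each perfectly dependent set is usually enough to let KMO run.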

To clean and reduce the data for a better analysis, I excluded variables based on the following criteria:
1. Their frequency was above 95% or below 5% of the population.
2. They were multicollinear (correlation > 0.90 with another variable), as they probably explain the same variance.
3. They were significantly correlated (chi-square test) with fewer than 6 other variables.
4. They had a correlation below 0.099 with the majority of the other variables.

After this process I am left with 15 variables, and running the PCA now gives the following output:
- PrincipalComponent nr1: 3.42364
- Eigenvalue: 1.93568
- Proportion of explained variance: 0.2282 (22.8%)
- KMO: Overall | 0.7385 (according to the VAM guide, the minimum acceptable value is 0.6)
- Bartlett test of sphericity: Chi-square = 983.032, Degrees of freedom = 105, p-value = 0.000, H0: variables are not intercorrelated
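The first two exclusion rules and the explained-variance figure can be sketched as follows. This is a Python/NumPy illustration, not the Stata commands I actually ran. The function names, thresholds as default arguments, and toy data are all assumptions made for the example. Because a PCA on the correlation matrix has eigenvalues that sum to the number of variables, the proportion for the first component is simply its eigenvalue divided by that number. With 15 variables, 3.42364 / 15 ≈ 0.2282, which matches the reported proportion.

```python
import numpy as np

def screen_binary_assets(X, low=0.05, high=0.95, max_abs_corr=0.90):
    """Return indices of columns that pass the frequency and collinearity rules."""
    X = np.asarray(X, dtype=float)
    freq = X.mean(axis=0)  # share of households owning each asset
    candidates = [j for j in range(X.shape[1]) if low <= freq[j] <= high]
    selected = []
    for j in candidates:
        # Keep a column only if it is not near-duplicating one already kept.
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= max_abs_corr
               for k in selected):
            selected.append(j)
    return selected

def first_component_share(X):
    """Largest eigenvalue of the correlation matrix and its share of variance."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals = np.linalg.eigvalsh(R)[::-1]  # sorted largest first
    return eigvals[0], eigvals[0] / eigvals.sum()

# Toy data (made up): 6 random binary assets, one very rare asset,
# and one exact duplicate of the first asset.
rng = np.random.default_rng(1)
common = rng.integers(0, 2, size=(375, 6))
rare = np.zeros((375, 1))
rare[:3] = 1                      # owned by 3 of 375 households (< 5%)
duplicate = common[:, [0]]
X_demo = np.hstack([common, rare, duplicate])

selected = screen_binary_assets(X_demo)
eig1, share1 = first_component_share(X_demo[:, selected])
print(selected, round(float(eig1), 3), round(float(share1), 3))
```

Rules 3 and 4 (counting significant or non-trivial correlations per variable) are not covered by this sketch.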

My questions then are:
1. How can I tell whether the proportion of explained variance (22.8%) and the KMO (0.7385) are high enough? How important are these measures for the validity of the analysis?
2. Do the criteria I use for excluding variables sound reasonable, and is there anything else I should consider when choosing variables? For example, should I look at the strength of the significance level of the correlations instead of the number of significant values in the correlation matrix?
3. I didn't rotate my data, as rotation doesn't raise my first principal component score. Is this acceptable? Other instructions I've seen do rotate.
4. Bearing in mind that PCA is an iterative process, how do I know when I have the best possible result?
5. Are there still advantages to doing both a PCA and a factor analysis and comparing the results?

Mehma
