I am conducting an investigation using SEM (Structural Equation Modelling). I have 4 latent variables, each of which has 7 observed variables. How does this affect the parsimony of the model? And what is the appropriate sample size in this case?
The rule of thumb in this regard is that you should have 10 to 20 respondents for each item (question) in your questionnaire. For example, if you have 28 items, your minimum sample should be 280 (at 10 respondents per item) or 560 (at 20 per item).
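As a quick check of that arithmetic, here is a minimal sketch. The function name and its defaults are only illustrative, and the 10-to-20 ratio is the rule of thumb stated above, not a guarantee of adequate power.

```python
def sample_size_range(n_items, per_item_low=10, per_item_high=20):
    """Return the (low, high) sample sizes implied by a respondents-per-item rule of thumb."""
    return n_items * per_item_low, n_items * per_item_high


# 4 latent variables with 7 items each, as in the question
low, high = sample_size_range(4 * 7)
print(f"Suggested sample size: {low} to {high}")
```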
Regarding the parsimony of the model, I think it is quite difficult to judge at the moment. However, I would suggest that you run a pilot study with about 100 respondents.
This can give you a first look at whether there are poorly loading items or multicollinearity issues. You can then solve them and proceed with the next iteration of the questionnaire, which will often have fewer items, since some will have been dropped for the reasons just mentioned.
I cannot say much about parsimony, as I feel it is a term that stems from the area of predictive modeling, where the sole goal is data fit. When your goal is to specify and test a model structure that expresses your beliefs about some truly existing data *generating* model (which is judged by the model fit), the term "restrictiveness" comes close and more accurately reflects the testable implications of a model. In a nutshell, a fitting model lends more support to the proposed structure the more restrictive it is (i.e., unconditional independencies). From this point of view, your model is highly restrictive (too much so, I would guess).
That is, your model claims that
a) all seven indicators of a latent are highly correlated
b) that these correlations ALL disappear once you control for the latent variable
c) that all correlations between the indicators of latent X and the indicators of latent Y can be fully explained by the chain linking the indicators (the loading of x1, the effect of latent X on latent Y, and the loading of y1) and finally
d) that these correlations (between the xs and the ys) again disappear once you control for latent X and/or latent Y
If you listed all these implications, you would certainly fill 3 pages or so. Hence, your model is highly restrictive and, because of this, rather unrealistic. The danger is that you suffer the same fate as others whose models have many indicators: the model shows substantial misfit and, for lack of alternatives, is kept and accepted anyway.
My approach would be to select the 3 most essential items for each latent variable. This reduces the number of estimated parameters, puts more weight on the testable restrictions of the latent structure, and has a higher chance of fitting (= corresponding with reality).
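To make the difference in size concrete, here is a rough sketch that counts free parameters and degrees of freedom for the 7-item and the 3-item version. It rests on assumptions that are not in the question: a plain confirmatory factor model with all latent variables freely correlated, one loading per latent fixed to 1 for identification, and no cross-loadings or correlated residuals. A structural model with fewer paths between the latents would have somewhat more degrees of freedom.

```python
def cfa_degrees_of_freedom(n_latents, items_per_latent):
    """Count moments, free parameters and df for a simple-structure CFA.

    Assumes marker-variable identification (one loading per latent fixed to 1),
    freely correlated latents, no cross-loadings, no correlated residuals.
    """
    p = n_latents * items_per_latent                 # observed variables
    moments = p * (p + 1) // 2                       # unique variances and covariances
    free_loadings = p - n_latents                    # one loading per latent is fixed
    residual_variances = p
    latent_variances = n_latents
    latent_covariances = n_latents * (n_latents - 1) // 2
    free_parameters = (free_loadings + residual_variances
                       + latent_variances + latent_covariances)
    return moments, free_parameters, moments - free_parameters


for items in (7, 3):
    moments, params, df = cfa_degrees_of_freedom(4, items)
    print(f"{items} items per latent: {moments} moments, "
          f"{params} free parameters, df = {df}")
```

Under these assumptions the 7-item model has 62 free parameters and 344 degrees of freedom, while the 3-item model has 30 free parameters and 48 degrees of freedom. Most of the restrictions in the large model therefore come from the measurement part rather than from the latent structure.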
With regard to the N issue, you can go with Belal's advice or, better, conduct a Monte Carlo simulation in which you experiment with different N's to find the appropriate one. Of course, like any other power analysis, this is only useful if the model makes sense in the first place (see above).
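The logic of such a simulation can be sketched as follows. This is a deliberately simplified stand-in, not a full SEM simulation: instead of fitting the model in each replication, it tests the correlation between two sum scores with a Fisher z test. The latent correlation of 0.3, the loadings of 0.7 and 3 items per latent are assumed values chosen only for illustration. For a real study you would generate data from your own hypothesized model and fit it with SEM software in every replication.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_power(n, rho=0.3, loading=0.7, items=3, reps=200, seed=1):
    """Share of replications in which the sum-score correlation is significant (5%, two-sided)."""
    rng = random.Random(seed)
    resid_sd = math.sqrt(1 - loading ** 2)  # gives each indicator unit variance
    hits = 0
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            fx = rng.gauss(0, 1)  # latent X
            fy = rho * fx + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)  # latent Y
            xs.append(sum(loading * fx + resid_sd * rng.gauss(0, 1) for _ in range(items)))
            ys.append(sum(loading * fy + resid_sd * rng.gauss(0, 1) for _ in range(items)))
        z = math.atanh(pearson(xs, ys)) * math.sqrt(n - 3)  # Fisher z test statistic
        if abs(z) > 1.96:
            hits += 1
    return hits / reps


for n in (50, 100, 200, 400):
    print(f"N = {n}: estimated power = {simulate_power(n):.2f}")
```

You would then pick the smallest N at which the estimated power reaches your target (0.80 is a common choice).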
In case you have trouble understanding the gibberish above :) feel free to ask. Just yesterday, a scholar experienced in a method in which I have little experience criticized my approach, and I did not understand a single word :)
Thank you for the valuable and detailed information you provided.
- The sample size issue is clear.
- Regarding latent variables and indicators, I understand that Holger suggested reducing the number of indicators to 3 per latent variable, while Philippe suggested reducing the latent variables even further, to a one-factor model.
I am a little confused. Could you please explain more on this issue, Holger and Philippe?