Lei and Wu (2007) give a nice summary of common fit indices. In their example analyses, they use the standardized root mean square residual (SRMR), the root mean square error of approximation (RMSEA), the likelihood ratio chi-square goodness-of-fit statistic, and sometimes the comparative fit index (CFI). Many alternatives are very similar to these.
Article Introduction to Structural Equation Modeling: Issues and Pra...
One thing worth noting is that while you technically want the chi-square to be non-significant in model testing, this is very hard to achieve because SEM usually requires large samples, and with large samples the test flags even trivial misfit as significant. So a significant chi-square isn't necessarily a problem so long as the RMSEA, CFI, and other indicators of fit are good.
The commonly used indices include chi-square/df, CFI, GFI, NFI, RFI, and RMSEA. As Dr. Bernstein noted, it's difficult to get a non-significant chi-square because of sample size; hence chi-square/df is often used instead.
In the book "Confirmatory Factor Analysis for Applied Research" (2006), Timothy A. Brown suggests on page 145 (Table 4.6) what information should be reported. Hopefully this is of some help.
Building on Professor Bernstein's point, χ² tests can produce more interesting results when used for comparing nested models other than your model vs. the saturated model, which freely estimates all covariances. Consider also that one need not adopt the Neyman–Pearson framework for interpreting the p values that result from χ² tests. I.e., don't think you necessarily have to reject the null completely just because p < .05, unless that's the way you planned to test your hypothesis in advance.
Some other fit indices mentioned here are somewhat redundant. I wouldn't want to say one is better than the other within these classes, but I don't know that it's worthwhile to report more than one or two indices from the same class. I paraphrase Lei and Wu (2007) in the following, mostly to borrow their class terminology and citations. Also see the linked discussion among developers of the lavaan (portmanteau of "latent variable analysis") package for R. It offers a sense of how much space you could take up by just reporting everything available (though I'm not sure how many of these are supported in Amos), as well as a sense of how results may vary among different fit indices of the same classes (they classify indices somewhat differently).
"Incremental" fit indices include the aforementioned CFI, the Tucker–Lewis Index (TLI; Tucker & Lewis, 1973), the normed fit index (NFI) and its non-normed counterpart (NNFI; Bentler & Bonett, 1980), the relative noncentrality index (RNI; McDonald & Marsh, 1990), and Bollen's relative and incremental fit indices (RFI & IFI). These all express the improvement of the tested model over the independence model, which fixes all covariances among the observed variables to zero and estimates only their variances.
"Absolute" fit indices compare the tested model to the saturated model. These include the goodness-of-fit index (GFI; Jöreskog & Sörbom, 1986), its adjusted form (AGFI), and the aforementioned SRMR and RMSEA. Only the latter two indicate better model fit when results are closer to zero. With the rest, you would hope to see results closer to one. (Unless you want the model to fail!)
Several authors offer varied guidelines, as sample size and model design seem to be crucial factors to consider. The researcher can report the indices that best suit the model. The attachment also provides insight and is a good read.
One recommended article that will answer your question is Hooper, D., Coughlan, J. and Mullen, M. R. “Structural Equation Modelling: Guidelines for Determining Model Fit.” The Electronic Journal of Business Research Methods Volume 6 Issue 1 2008, pp. 53 - 60. Please find the attached article.
A list of fit indices: NFI: Bentler-Bonett normed fit index; IFI: Bollen's incremental fit index; TLI: Tucker-Lewis index; CFI: comparative fit index; GFI: goodness of fit index; and RMSEA: root mean square error of approximation. In reporting, people often select two fit indices (e.g., CFI and TLI, with values > 0.9 indicating good fit), plus RMSEA (
Rules of Thumb
Ratio of Sample Size to the Number of Free Parameters
Tanaka (1987): 20 to 1 (Most analysts now think that is unrealistically high.)
Goal: Bentler & Chou (1987): 5 to 1
Several published studies do not meet this goal.
Sample Size
200 is seen as a goal for SEM research.
Lower sample sizes can be used for:
Models with no latent variables
Models where all loadings are fixed (usually to one)
Models with strong correlations
Simpler models
For models for which there is an upper limit on N (e.g., countries or years as the unit), 200 might be an unrealistic standard.
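The ratio rules above are easy to check mechanically. Here is a minimal Python sketch, assuming you already know your sample size and the number of free parameters in your model. The function names and dictionary keys are made up for illustration and do not come from Amos, lavaan, or any other SEM package.

```python
def sample_size_ratio(n_cases: int, n_free_params: int) -> float:
    """Return the ratio of sample size to the number of free parameters."""
    if n_free_params <= 0:
        raise ValueError("n_free_params must be positive")
    return n_cases / n_free_params


def check_rules_of_thumb(n_cases: int, n_free_params: int) -> dict:
    """Compare a design against the rules of thumb quoted in the text."""
    ratio = sample_size_ratio(n_cases, n_free_params)
    return {
        "ratio": ratio,
        "meets_tanaka_20_to_1": ratio >= 20,       # Tanaka (1987)
        "meets_bentler_chou_5_to_1": ratio >= 5,   # Bentler & Chou (1987)
        "meets_n_200_goal": n_cases >= 200,        # N = 200 goal
    }


# Hypothetical design: 250 cases, 30 free parameters.
result = check_rules_of_thumb(n_cases=250, n_free_params=30)
print(result)
```

These are only rules of thumb; the check says nothing about power, which is the better basis for deciding on a sample size.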
Power Analysis
The best way to determine whether you have a large enough sample is to conduct a power analysis. Either use the Satorra and Saris (1985) method or conduct a simulation. To test your power to detect a poor fitting model, you can use Preacher and Coffman's web calculator.
The Chi Square Test: χ²
For models with about 75 to 200 cases, the chi square test is generally a reasonable measure of fit. But for models with more cases (400 or more), the chi square is almost always statistically significant. Chi square is also affected by the size of the correlations in the model: the larger the correlations, the poorer the fit. For these reasons alternative measures of fit have been developed. (Go to a website for computing p values for a given chi square value and df.)
Sometimes chi square is more interpretable if it is transformed into a Z value. The following approximation can be used:
Z = √(2χ²) − √(2df − 1)
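As a quick illustration of the approximation above, here is a small Python sketch. The function name and the example values are hypothetical; only the formula itself comes from the text.

```python
import math


def chi_square_to_z(chi_square: float, df: int) -> float:
    """Approximate Z for a chi-square value: sqrt(2*chi2) - sqrt(2*df - 1)."""
    if chi_square < 0:
        raise ValueError("chi_square must be non-negative")
    if df < 1:
        raise ValueError("df must be at least 1")
    return math.sqrt(2.0 * chi_square) - math.sqrt(2.0 * df - 1.0)


# Hypothetical example: chi-square of 100 on 50 degrees of freedom.
z = chi_square_to_z(100.0, 50)
print(round(z, 2))
```

For this made-up example Z is about 4.19, which would be read against the standard normal distribution.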
An old measure of fit is the chi square to df ratio or χ²/df. A problem with this fit index is that there is no universally agreed upon standard as to what is a good and a bad fitting model. Note, however, that two currently very popular fit indices, TLI and RMSEA, are largely based on this old-fashioned ratio.
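To make the link between the ratio and RMSEA concrete, here is a hedged Python sketch. The RMSEA formula is not given in the text above; the sketch assumes the commonly cited form, RMSEA = √(max(χ² − df, 0) / (df(N − 1))), which can be rewritten in terms of χ²/df. Some programs divide by N rather than N − 1, so treat the result as illustrative. Function names and example values are hypothetical.

```python
import math


def chi_square_df_ratio(chi_square: float, df: int) -> float:
    """Return the chi-square to degrees-of-freedom ratio."""
    if df <= 0:
        raise ValueError("df must be positive")
    return chi_square / df


def rmsea_from_ratio(ratio: float, n_cases: int) -> float:
    """RMSEA written in terms of chi2/df (assumed form, using N - 1).

    (chi2 - df) / (df * (N - 1)) equals (chi2/df - 1) / (N - 1),
    floored at zero when the ratio is below one.
    """
    if n_cases < 2:
        raise ValueError("n_cases must be at least 2")
    return math.sqrt(max(ratio - 1.0, 0.0) / (n_cases - 1))


# Hypothetical example: chi-square of 100 on 50 df with 201 cases.
ratio = chi_square_df_ratio(100.0, 50)
rmsea = rmsea_from_ratio(ratio, 201)
print(ratio, round(rmsea, 4))
```

Written this way, it is easy to see that the same ratio gives a smaller RMSEA as the sample size grows.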
The chi square test is too liberal (i.e., it produces too many Type 1 errors) when variables have non-normal distributions, especially distributions with kurtosis. Moreover, with small sample sizes, there are too many Type 1 errors.
Introduction to Fit Indices
The terms in the literature used to describe fit indices are confusing, and I think confused. I prefer the following terms (but they are unconventional): incremental, absolute, and comparative which are used on the pages that follow.
Incremental Fit Index
An incremental (sometimes called in the literature relative) fit index is analogous to R² and so a value of zero indicates having the worst possible model and a value of one indicates having the best possible. So the researcher's model is placed on a continuum. In terms of a formula, it is
(Worst Possible Model – My Model) / (Worst Possible Model – Fit of the Best Possible Model)
The worst possible model is called the null or independence model, and the usual convention is to allow all the variables in the model to have variation but no correlation. The degrees of freedom of the null model are k(k – 1)/2, where k is the number of variables in the model. Amos refers to the null model as the independence model. Note that a different null model needs to be fitted if the means are part of the model. In this case, the usual null model is to allow the means to equal their actual values, and thus the degrees of freedom do not change. However, for a growth-curve model, the null model should set the means as equal, i.e., no growth.
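The continuum formula and the null-model degrees of freedom described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a package implementation. It takes chi-square as the measure of fit and zero as the fit of the best possible model, which corresponds to an NFI-type index; other incremental indices use different fit measures. All names and numbers are hypothetical.

```python
def null_model_df(k: int) -> int:
    """Degrees of freedom of the usual null model: k(k - 1)/2."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * (k - 1) // 2


def incremental_fit(worst_fit: float, model_fit: float, best_fit: float = 0.0) -> float:
    """(worst - model) / (worst - best): 0 = worst model, 1 = best model."""
    denominator = worst_fit - best_fit
    if denominator == 0:
        raise ValueError("worst_fit and best_fit must differ")
    return (worst_fit - model_fit) / denominator


# Hypothetical example: 6 observed variables, null chi-square of 500,
# and a model chi-square of 50.
df_null = null_model_df(6)
index = incremental_fit(worst_fit=500.0, model_fit=50.0)
print(df_null, index)
```

A poorly fitting null model raises the index for any given model chi-square, which is one reason the choice of null model matters.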
Alternative null models might be considered (but this is almost never done). One alternative null model is that all latent variable correlations are zero, and another is that all exogenous variables are correlated but the endogenous variables are uncorrelated with each other and with the exogenous variables. O’Boyle and Williams (2011) suggest two different null models for the measurement and structural models.
Absolute Fit Index
An absolute measure of fit presumes that the best fitting model has a fit of zero. The measure of fit then determines how far the model is from perfect fit. These measures are typically “badness” measures of fit in that the bigger the index, the worse the fit is.
Comparative Fit Index
A comparative measure of fit is only interpretable when comparing two different models. This term is unique to this website in that these measures are more commonly called absolute fit indices. However, it is helpful to distinguish absolute indices that do not require a comparison between two models. One advantage of a comparative fit index is that it can be computed for the saturated model, and so the saturated model can be compared to non-saturated models.