Comment on model fit indices

It would be highly desirable if there were clear, standard, widely agreed-upon cutoff values for the fit indices used in structural equation modeling (SEM), but that is not the case. As a simple illustration of the differences that can occur at the applied level, one can compare the paper by Ezpeleta and colleagues (Ezpeleta, Granero, de la Osa, Penelo, & Domenech, 2012) with the Krieger et al. (2013) paper discussed in the introduction to this report. The Ezpeleta et al. paper used only two fit indices, the CFI and the RMSEA, while Krieger et al. used three fit indices, and the cutoff values for the two fit indices the papers shared in common also differed. Thus, these two reports differed both in the number of fit indices employed and in their cutoff scores.

In discussing this problem, Brown (2006) notes that the issues involved in goodness-of-fit indices are “hotly debated” with regard to which indices to use and which criteria to apply when determining good or poor model fit. We will not discuss which fit indices to use here, but we will note some of the differences that arise in choosing criteria for some fit indices.

Along with deciding which value of a fit index should be deemed the criterion for good versus poor fit, there is also the issue of how rigidly criteria should be applied. It is tempting to decide that the most rigorous criteria should be adopted. This would increase the likelihood that a model whose fit is deemed good or acceptable would indeed be good or acceptable. Of course, setting criteria too high also increases the likelihood that a model that should be considered acceptable will be rejected. See Marsh, Hau, and Wen (2004) for further discussion of the role that sample size may play in balancing these factors. In setting these criteria, researchers must balance the desirability of identifying models that are clearly “good” against the importance of not prematurely rejecting potentially useful models.

Various experts have suggested different criteria for some of the commonly used fit indices such as RMSEA, NFI, and CFI. With regard to RMSEA, Brown (2006) notes that Hu and Bentler (1999) consider a “reasonably good fit” to be “close to .06 or below,” while Browne and Cudeck (1993) describe < .08 as “adequate model fit.” MacCallum, Browne, and Sugawara (1996) concluded that RMSEA values in the range of .08 to .10 represent a “mediocre fit,” and that the model should be rejected only if the value is greater than .10. Marsh et al. (2004) note that “experience” led researchers to suggest that an RMSEA of < .05 is indicative of a close fit, and that values up to .08 represent “reasonable” errors of approximation.

With regard to NNFI and CFI, Bentler and Bonett (1980) originally proposed that relative fit indices (e.g., NNFI and CFI) larger than 0.90 indicate an acceptable model. More recently, Hu and Bentler (1999) suggested that relative fit indices above 0.95 indicate an acceptable model. However, Marsh, Hau, and Wen (2004) have strongly cautioned researchers against accepting Hu and Bentler’s (1999) more stringent criterion, and have provided a strong conceptual and statistical rationale for retaining Bentler and Bonett’s (1980) long-standing criterion of judging goodness-of-fit indices as acceptable if they exceed 0.90.

With regard to SRMR, there appears to be less disagreement. A value of < .08 has often been recommended (Brown, 2006).
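For readers who wish to see these conventions side by side, the more lenient of the cutoffs discussed above can be sketched as a small, hypothetical helper function (the function name and the decision to combine the indices with a simple conjunction are ours for illustration only; they do not appear in any of the cited sources, and the stricter Hu and Bentler, 1999, values would substitute CFI/NNFI ≥ .95 and RMSEA ≤ .06):

```python
def fit_is_acceptable(rmsea: float, cfi: float, nnfi: float, srmr: float) -> bool:
    """Illustrative check of four fit indices against the lenient conventional cutoffs."""
    return (
        rmsea < 0.08      # Browne & Cudeck (1993): "adequate model fit"
        and cfi > 0.90    # Bentler & Bonett (1980)
        and nnfi > 0.90   # Bentler & Bonett (1980)
        and srmr < 0.08   # commonly recommended (Brown, 2006)
    )

# Hypothetical example values: RMSEA = .06, CFI = .93, NNFI = .91, SRMR = .05
print(fit_is_acceptable(0.06, 0.93, 0.91, 0.05))  # True
```

As the discussion below emphasizes, such a mechanical all-or-nothing rule is exactly what Marsh et al. (2004) caution against; the sketch is meant only to summarize the numbers, not to endorse rigid application of them.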

There is also the issue of just how rigidly the adopted cutoff score should be applied. Is, for example, an RMSEA of .04999 acceptable but .05 not acceptable, or .07999 acceptable but .08 not? In that context, Marsh et al.’s (2004) comment about golden rules is relevant: they remind us to “avoid the temptation to treat currently available ‘rules of thumb’ as if they were golden rules” (p. 321). It is also clear that Browne and Cudeck (1993) did not intend that an RMSEA value of .05 be used with such precision that a .050 value is “good” and a .051 value is not, and they note that “the choice of 0.05…is a subjective one….” In defending their choice of .05 as “good,” they wisely note that another widely used and accepted cutoff for determining statistical significance is also somewhat arbitrary: “It [referring to 0.05 for RMSEA] is, however, no less subjective than the choice of 5% as a significance level.” Similarly, the APA style manual reminds us, in discussing how many decimal places to report, that rounding should be done with “prospective use and statistical precision in mind” (p. 113).

These various issues were taken into account as decisions were made about adopting cutoff scores for this report.

With regard to choosing which fit indices to consider, we initially chose RMSEA, CFI, and NNFI. At the urging of a reviewer, we also included SRMR. These four fit indices are also the ones recommended by Brown (2006).


References

Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606.

Brown, T. A. (2006). Confirmatory Factor Analysis for Applied Research. New York: Guilford.

Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural models (pp. 136-162). Newbury Park, CA: Sage.

Ezpeleta, L., Granero, R., de la Osa, N., Penelo, E., & Domenech, J. M. (2012). Dimensions of oppositional defiant disorder in 3-year-old preschoolers. Journal of Child Psychology and Psychiatry, 53, 1128-1138. doi: 10.1111/j.1469-7610.2012.02545.x

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55.

Krieger, F. V., Polanczyk, G. V., Goodman, R., Rohde, L. A., Graeff-Martins, A. S., Salum, G., . . . Stringaris, A. (2013). Dimensions of oppositionality in a Brazilian community sample: Testing the DSM-5 proposal and etiological links. Journal of the American Academy of Child and Adolescent Psychiatry, 52, 389-400.

MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130-149.

Marsh, H. W., Hau, K. T., & Wen, Z. (2004). In search of golden rules: Comment on hypothesis-testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing Hu and Bentler's (1999) findings. Structural Equation Modeling, 11, 320-341.