v.2 N.K.Bowen, 2014
TESTING FOR DIFFERENCES IN MEASUREMENT (CFA) MODELS USING MPLUS’S INVARIANCE SHORTCUT CODE (WLSMV)
Measurement invariance occurs when the parameters of a measurement model are statistically equivalent across two or more groups. The parameters of interest in measurement invariance testing are the factor loadings and indicator intercepts (when ML estimation is used) or indicator thresholds (when WLSMV is used with categorical variables). Invariance in residual variances or scale factors can also be tested, but there is consensus that it is not necessary to demonstrate invariance across groups on these parameters. Researchers typically hope to demonstrate that their measures are invariant across groups. When loadings and intercepts or thresholds are invariant across groups, scores on latent variables can be validly compared across the groups and the latent variables can be used in structural models hypothesizing relationships among latent variables. A recent update to Mplus (7.11) provides a convenient shortcut for conducting a sequence of increasingly restrictive invariance tests. This handout gives some non-Mplus-specific information on invariance testing, information on Mplus non-shortcut invariance testing, and then invariance testing using the Mplus shortcut. We focus on tests of invariance with measurement models based on ordinal variables—the most common type of scale variables in social science research.
When scales are based on ordinal measures, it is appropriate to use WLSMV estimation and a polychoric correlation matrix for the analysis. Specifying that observed indicator variables are ordinal automatically leads to the use of WLSMV and the polychoric correlation matrix.
Note that Mplus allows two different approaches to the specification of multiple group CFAs: delta and theta parameterization. Delta is the default, but we recommend using Theta. Theta parameterization lets you obtain information on residual variances (unexplained variance in the observed indicators of factors). Social work researchers are typically interested in error variances, not scale factors. Add the line “Parameterization is Theta” to the analysis section of your input file to request Theta parameterization.
Step 1: Test your hypothesized model separately with each group.
Some invariance scholars recommend obtaining the best model for each group separately before beginning invariance testing. To run separate analyses based on groups represented by values on a variable in your data set, use the following syntax in the VARIABLE section of the code, substituting your variable’s name and value as appropriate. For example, the code below specifies that the analysis should use only cases that have a value of 2 on the race/ethnicity variable in the dataset. After running the model with this group, the value would be changed to 1 or some other value to run the model with another group.
USEOBSERVATION ARE (raceth EQ 2); (Note: there is no space after USE)
If data for your groups are in separate data files, specify the data file for one group at a time in the usual way:
DATA: FILE IS "C:\MyMplus\datafilewhite.dat”;
Seek a good-fitting model for each group, but it may be acceptable to have marginal fit at this stage (Raykov, Marcoulides, & Li, 2012). The model for one group may include correlated errors that do not appear in other groups, but the pattern of factors and loadings should be the same across groups in order to proceed to multiple group modeling. However, see Byrne, Shavelson, Muthén, 1989 for a more relaxed approach.
Step 2: Specify a multiple group model. It may include slight variations for each group if they were identified in the individual group tests.
Multiple group models can be run from multiple datasets or based on values for a variable in the dataset. To specify a multiple group model based on values on a variable in the dataset, use code like the following. The numbers in parentheses must match values on the variable gender in the dataset. This code goes in the section under VARIABLE, because the variable “raceth” here has a special function.
GROUPING IS raceth (1 = WH 2 = AA);
According to the online Mplus user’s guide, the way to specify that data are in two files is:
FILE (white) IS white.dat;
FILE (AA) IS AA.dat;
When data are all in one file, the “GROUPING IS” code is all it takes to let Mplus know you want to do a multiple group analysis. If the models for your groups are the same, simply enter your model code as if you were doing a single group model. For example, the code below specifies that a factor called SSFR (social support from friends) is measured by 4 observed indicators in both groups:
MODEL:
SSFR BY c24 c26 c27 c28;
If the factor model is the same for all groups, no further specification is required. However, if, for example, a correlated error was found between c24 and c25 when the African American group was tested individually, that difference from the overall model could be specified in the multiple group model with code referring to just the African American group as follows:
MODEL:
SSFR BY c24 c26 c27 c28;
MODEL AA: (Note that this line does not turn blue because
c24 WITH c25; it is a subcommand under MODEL:)
By Mplus default (without the shortcut), the code above will test a highly constrained model—one in which factor loadings and thresholds are constrained across groups, with residual variances (theta parameterization) or scales (delta parameterization) fixed at one in the first group and free in the second, and with latent factor means fixed at 0 for the first group and free in the second group. The AA group model would also have its freely estimated correlated error. To compare this constrained model to a less constrained model, we would have to first specify in many lines of syntax a model in which the default constraints on thresholds and/or loadings were all freed (except for those that need to be fixed or constrained for identification purposes). We would need to run this less restrictive model first in order to use the WLSMV difftest mechanism. The following difftest code would be included in the syntax for the less restrictive model. The code requests that a small file be saved with information on the current model to be used later when running the more restrictive model.
SAVEDATA:
difftest=freemodel.dat;
Then we would run the more constrained model, this time including the “difftest= freemodel.dat” code under the ANALYSIS command. These steps give us the chi square difference test results we need to determine if model fit is significantly worse with the equality constraints.
BUT, there is a new and easier way to test measurement invariance! Version 7.11 has a shortcut that allows us to simultaneously run and compare chi squares for a configural model, metric model, and scalar model all in one analysis. You specify the grouping variable or two data files as before, AND, in the ANALYSIS section of the code type:
****** MODEL IS CONFIGURAL METRIC SCALAR; ******
From Dimitrov (2010) we have the following definitions of these levels of invariance:
Configural Invariance—same pattern of factors and loadings across groups, established before conducting measurement invariance tests
Measurement Invariance
• weak or Metric Invariance—invariant factor loadings
• strong or Scalar Invariance—invariant factor loadings and intercepts (ML) or thresholds (WLSMV)
• invariance of uniquenesses or strict invariance—invariant residual variances (not necessary for demonstrating measurement invariance)
With the shortcut, Mplus specifies the Configural model as having loadings and thresholds free in both groups except the loading for the referent indicator, which is fixed at 1.0 in both groups. The means of factors in both groups are fixed at 0 while their variances are free to vary. Residual variances (with theta parameterization) or scale factors (delta parameterization) are fixed at 1.0 in all groups.
With the shortcut, Mplus specifies the Metric model as having loadings constrained across groups except the loading for the first indicator of a factor, which is fixed at 1.0 in both groups. Thresholds are generally allowed to vary across groups, but certain thresholds have to be constrained in order for the model to be identified. Therefore, the first two thresholds of the referent indicator are constrained to be equal across groups, and the first threshold of each other indicator on a factor is constrained to be equal. Note that you can’t run a Metric model with binary indicator variables because it can’t be identified—there is only one threshold for a binary variable. With the shortcut metric model, the mean of the first factor is fixed at 0 and other factor means are free to vary and factor variances are free to vary.
Some researchers, and the online Mplus User’s Guide, suggest evaluating only the Configural/Scalar comparison. They contend that lambdas and thresholds for indicators should only be freed or constrained in tandem. In this case, the code “MODEL IS CONFIGURAL SCALAR” can be used, and searches for sources of invariance will involve freeing lambdas and their thresholds at the same time. Note too, that any one of the three invariance models can be run by itself with the “MODEL IS” line under ANALYSIS.
Step 3: Run the Model: With the example from above (MODEL: and MODEL AA:) we will get output comparing the fit of the three models, which will guide us to our conclusions or next steps. Specifically, before you see detailed fit information in the output for each model, you’ll see:
MODEL FIT INFORMATION
Invariance Testing
Number of Degrees of
Model Parameters Chi-square Freedom P-value
Configural 32 4.919 4 0.2957
Metric 29 7.763 7 0.3540
Scalar 22 26.491 14 0.0224
Degrees of
Models Compared Chi-square Freedom P-value
Metric against Configural 3.571 3 0.3117
Scalar against Configural 21.589 10 0.0173
Scalar against Metric 20.653 7 0.0043
Step 4: Interpreting the Model Comparison Output
A limited number of scenarios are possible in the Invariance Testing output.
Scenario 1: All models have good fit and invariance is supported at each step. J
Scenario 2: One of the models does not have good fit. We need to demonstrate good fit of the model at each step--an issue separate from the invariance tests. Not only would we examine the chi square statistic for the models above, but we would evaluate any other fit indices we have pre-specified. Fit indices pertaining to the multiple group configural model, for example, can be found in the output section called MODEL FIT INFORMATION FOR THE CONFIGURAL MODEL. If the fit statistics are not adequate for any CFA model in our sequence of increasingly constrained model tests, it makes no sense to proceed to the next model: first because any less constrained model should have better fit than the next more constrained model, and second because if our measurement model at any step is unacceptable, it makes no sense to examine further whether parameters differ across groups.
Scenario 3: Fit worsens from the configural to the metric level. If there is non-invariance at the metric level compared to the configural level (meaning constraining lambdas leads to worse fit), the researcher can choose to search for the source(s) of worse fit by testing the loadings of one factor at a time as a set and/or testing one loading at a time. If only a relatively small number of loadings are non-invariant, it is possible to proceed to test the effects of constraining thresholds in this new model. This sequence would be similar to (but simpler than) the search for non-invariant thresholds described in Step 5 below.
Scenario 4: Fit does not significantly worsen when lambdas are constrained, but it does get worse when thresholds are constrained. In this situation, we can claim metric, or weak, invariance of our measure and stop testing, but weak invariance is not desirable. Alternatively, we can do additional tests to identify the source(s) of non-invariance. If only a small number of thresholds are non-invariant, we could claim that our measure is “partially invariant” (Byrne, 1989; Dimitrov, 2010—says invariance in up to 20% of parameters is okay).
Scenario 3 is what we have in our example. The “Metric against Configural” line above indicates that constraining factor loadings did not significantly worsen fit (p of c2 change > .05). However, the “Scalar against Metric” lines indicate that constraining thresholds across groups worsened fit (p of c2 change < .05). We cannot conclude that our measure has scalar, or strong, invariance.
We can, however, determine the extent of the non-invariance in the thresholds.
Step 5: Searching for Sources of Non-Invariance, Approach 1
If we pre-specified that we would be testing thresholds and lambdas separately (as suggested by Millsap & Yun-Tein, 2004), we could take advantage of the shortcut code in the following way, even though we are going beyond the shortcut to look for individual non-invariant parameters.
5a. Mplus’s WLSMV difftest requires that the less restrictive model be run first. The “nested” model is run second. Therefore, we would start with the less constrained model—the SCALAR model plus thresholds freed for one indicator.
ANALYSIS: . . . .
MODEL IS SCALAR; Lambdas and thresholds constrained . . .
MODEL:
SSFR BY c24 c26 c27 c28;
MODEL AA:
[C24$2*] (threshold 1 for c24 remains constrained
[C24$3*] for identification. The other 2 are freed)
SAVEDATA:
DIFFTEST=Ts24free.DAT; Save a difftest file to use in the next analysis
No need to even look at the output, we just wanted to get that difftest file saved.
5b. Then run the more constrained SCALAR model and use difftest to compare its fit to the prior model
ANALYSIS: . . . .
MODEL IS SCALAR; The model with lambdas and thresholds constrained
DIFFTEST = Ts24free.DAT Compares fit of this model with prior model’s fit.
MODEL:
SSFR BY c24 c26 c27 c28; No thresholds freed
SAVEDATA:
DIFFTEST=SCALAR.DAT; Save fit information for next comparison.
The results:
Chi-Square Test for Difference Testing
Value 4.545
Degrees of Freedom 3
P-Value 0.2083
Constraining the thresholds of C24 did not worsen fit, so we’ll move on to another indicator. We know the source of bad fit is out there somewhere!
Repeat Step 5b until the problem indicator is identified.
The test for invariance in the thresholds of C27 and C26 had the same result as the test with C28. Each time we compared the more restrictive SCALAR model (with all lambda and threshold constraints) to a less restrictive model—one with the thresholds free for one indicator. Each time the chidiff test was non-significant, we re-constrained the thresholds before continuing our search.