Ronald H. Heck / January 11, 2016
The University of Hawai‘i at Mānoa / Conducting Multi-Parameter Tests

Conducting Multi-Parameter Tests Between Nested Models in SPSS

Often we wish to compare a series of competing models during the course of model building. The overall goal is to develop a model that summarizes proposed relationships in a parsimonious manner (i.e., with few extraneous parameters) compared with one or more alternative models. Two situations typically arise. In the first situation, we may simply wish to examine which of two or more proposed models provides a better fit to the data, given the number of parameters each has. The models may have some variables and parameters in common, but they differ in that one cannot be completely constructed from the others. In this first situation, we can compare models using what are called “information indices”; for example, Akaike’s Information Criterion (AIC), the Consistent Akaike’s Information Criterion (CAIC), or the Bayesian Information Criterion (BIC). The goal is to find the model that produces the lowest index value, given its number of parameters. Each index imposes a “penalty” for the number of parameters in the model. The penalties of the BIC and the CAIC grow with the sample size, so in all but very small samples they penalize additional parameters more heavily than the AIC and, hence, favor models with fewer parameters.
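As a rough sketch of how these indices trade fit against complexity, the following SPSS syntax computes them from their common textbook definitions (AIC = -2LL + 2d, BIC = -2LL + d ln n, CAIC = -2LL + d[ln n + 1]). The two models, their deviances, parameter counts, and the sample size are hypothetical values chosen only for illustration, and the variable names are our own.

```spss
* Hypothetical deviances, parameter counts, and sample size for two models.
DATA LIST FREE / mdl (A1) dev npar nobs.
BEGIN DATA
A 5120.4 6 400
B 5112.9 9 400
END DATA.
* Each index adds a penalty for the number of parameters to the deviance.
COMPUTE aic = dev + 2*npar.
COMPUTE bic = dev + npar*LN(nobs).
COMPUTE caic = dev + npar*(LN(nobs) + 1).
FORMATS aic bic caic (F10.3).
LIST.
```

In practice these values do not need to be computed by hand, since MIXED prints them in its Information Criteria table. The exact values of d and n that SPSS uses depend on the estimation method, so hand calculations of this kind should be treated as approximate.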

In the other situation, one model may be directly developed from the other model by adding or removing parameters. Such models are referred to as “nested” models; more specifically, a model that estimates fewer parameters is nested within a model that estimates more parameters if fixing one or more parameters to zero in the larger model results in the smaller model (e.g., Hox, 2010; Marcoulides & Hershberger, 1997; Peugh & Heck, in press). The goal is generally to see whether the restricted (or nested) model, with fewer parameters estimated, fits the data as well as the alternative model with more estimated parameters. Nested models are typically examined using likelihood ratio tests, whose test statistic is distributed as chi-square with degrees of freedom equal to the difference in the number of parameters estimated. The difference between the models is examined with respect to the change in log likelihood between the restricted model (the nested model with fewer parameters estimated) and the alternative model.

Maximum likelihood (ML) estimation summarizes the fit of a proposed model with respect to a discrepancy function between the sample covariance (or correlation) matrix and the model-implied covariance matrix. A model that fits the data perfectly would provide no discrepancy between the two covariance matrices. In order to evaluate the fit of a proposed model against the data, ML estimation produces a model deviance statistic, defined as –2*log likelihood (–2LL), where likelihood is the value of the likelihood function at convergence and log is the natural logarithm. The deviance is an indicator of how well the model fits the data. Models with lower deviance (i.e., a smaller discrepancy function) fit better than models with higher deviance. Nested models (i.e., where a more specific model is formed from a more general one) can be compared by examining differences in these deviance coefficients under specified conditions (e.g., changes in deviance between models per differences in degrees of freedom).

IBM SPSS MIXED currently offers two estimation choices: full information maximum likelihood estimation (often abbreviated as ML) and restricted maximum likelihood (REML) estimation, which is the default setting. It is important to note that there are differences between ML and REML parameter estimation when comparing models in multilevel modeling situations (for more detailed discussion, see Goldstein, 2011, pp. 57-59; Hox, 2010, pp. 40-42; Singer & Willett, 2003, pp. 88-92; Snijders & Bosker, 2012, pp. 60-61). In ML estimation, both regression coefficients and variance components are included in the likelihood function, while in REML estimation, only the variance components are included in the likelihood function, with the regression coefficients estimated in a second step (Hox, 2010). REML, therefore, is referred to as a restricted solution. One of the shortcomings of ML estimation for comparing nested multilevel models is that the estimation process does not take into account the loss in degrees of freedom due to the estimation of the P + 1 regression coefficients in the proposed model (Hox, 2010). This failure to allocate degrees of freedom properly results in negatively biased random-effect parameter estimates due to positively biased degrees of freedom for parameter estimation. With more parameters in the model and smaller sample sizes, therefore, the variance estimates obtained through ML may be too small, which leads to overly liberal hypothesis tests (Raudenbush & Bryk, 2002).

In contrast, because REML considers fixed-effect parameters separately, unbiased random-effect estimates can be obtained after the fixed effects and their degrees of freedom are removed from the likelihood function. In other words, REML takes into account the loss in degrees of freedom due to the estimation of the P + 1 regression coefficients in the model in order to obtain unbiased estimation of the variance components (Snijders & Bosker, 2012). This correction in the denominators used to calculate the variance will be greatest when the sample size is small. Where sample sizes are balanced in multilevel data, REML will produce estimates consistent with estimates produced by ANOVA, which are optimal (Searle, Casella, & McCulloch, 1992).
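The size of this denominator correction can be illustrated with a single-level regression analogy, which we use here only as a simplification of the multilevel case. In ordinary regression the ML estimate of the residual variance divides the sum of squared residuals by N, whereas the unbiased (REML-type) estimate divides by N – (P + 1), so the ratio of the two estimates is (N – [P + 1]) / N. The sketch below assumes P + 1 = 3 coefficients and three hypothetical sample sizes; the variable names are our own.

```spss
* Hypothetical sample sizes with P + 1 = 3 regression coefficients.
DATA LIST FREE / nobs ncoef.
BEGIN DATA
10 3
100 3
1000 3
END DATA.
* Ratio of the ML variance estimate to the unbiased estimate.
COMPUTE ml_ratio = (nobs - ncoef) / nobs.
FORMATS ml_ratio (F6.3).
LIST.
```

The ratio moves toward 1 as the sample size increases, which is why the difference between ML and REML variance estimates matters most in small samples.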

ML is nevertheless widely used in model comparison, since the computations are easier and generally efficient (with sufficient sample sizes). Moreover, because both regression coefficients and variance components are included in the likelihood function, a chi-square test can easily be constructed to compare nested models (Hox, 2010). Where competing models differ only in their random effects, REML estimation can be used. Nested models can be compared using a likelihood ratio test, which involves first computing both the difference in the -2LogL (deviance) model fit statistics (i.e., Δ -2LogL = [-2LogL_smaller] – [-2LogL_larger]) and the difference in the number of estimated parameters (Δp = p_larger – p_smaller) between the two nested models, where “smaller” and “larger” denote the models with fewer and more estimated parameters, respectively. An alternative means of comparing nested models, which can be used with REML estimation and therefore accommodates models containing both regression coefficients and random effects, is the multi-parameter Wald test (Schafer, 1997). In the following sections, we illustrate each approach.

A Short Illustration Using a Likelihood Ratio Test

For example, let’s suppose we are comparing two nested three-level models. The goal is to determine whether allowing the slope coefficient of a level-2 predictor (in this case, gender) to vary randomly at level 3 improves the fit of the model relative to the model in which the gender slope is fixed (i.e., does not vary across level-3 units). In this first case, we can use a likelihood ratio test, since we are examining whether adding a random effect improves the model fit relative to the restricted model (with the slope variance fixed to 0). Hence, there are no fixed effects involved in the comparison between models. The first model, with the effect of gender fixed at level 3, is nested with respect to the alternative model allowing the slope of the level-2 predictor to vary randomly at level 3. This is because fixing the random slope effect associated with gender to 0 in the alternative model results in the restricted model (i.e., the model with a greater number of degrees of freedom). A likelihood ratio test comparing the two models can be conducted by first using the -2LL (or deviance) information in the SPSS output to compute the difference value [Δ -2LogL = (-2LogL_smaller) – (-2LogL_larger)]. The actual nested model “test” involves referring the -2LogL difference value (Δ -2LogL) to a chi-square sampling distribution with degrees of freedom equal to the difference in the number of estimated parameters. In this case, the difference in model degrees of freedom is 3, owing to the presence of other random effects in the level-3 model: with an unstructured covariance matrix, adding the gender slope adds its variance plus its covariances with the random intercept and the random age slope. The alternative model (with 14 parameters) is specified by adding a random effect for gender (GROUP_GENDER) in the last line of the following syntax:

MIXED MATH_DV WITH C_AGE GROUP_GENDER GMC_PCT_FRLUNCH
  /PRINT SOLUTION TESTCOV
  /METHOD = REML
  /FIXED = INTERCEPT C_AGE GROUP_GENDER C_AGE*GROUP_GENDER
  /RANDOM = INTERCEPT C_AGE | SUBJECT(PARTICIPANT_ID*SCHOOL_ID) COVTYPE (UN)
  /RANDOM = INTERCEPT C_AGE GROUP_GENDER | SUBJECT(SCHOOL_ID) COVTYPE (UN).
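The syntax for the restricted comparison model is not reproduced in this handout. A minimal sketch of it, assuming it differs from the syntax above only in dropping GROUP_GENDER from the level-3 RANDOM subcommand, would be:

```spss
* Restricted model: the gender slope does not vary across schools.
* Assumes the same variables and options as the alternative model.
MIXED MATH_DV WITH C_AGE GROUP_GENDER GMC_PCT_FRLUNCH
  /PRINT SOLUTION TESTCOV
  /METHOD = REML
  /FIXED = INTERCEPT C_AGE GROUP_GENDER C_AGE*GROUP_GENDER
  /RANDOM = INTERCEPT C_AGE | SUBJECT(PARTICIPANT_ID*SCHOOL_ID) COVTYPE (UN)
  /RANDOM = INTERCEPT C_AGE | SUBJECT(SCHOOL_ID) COVTYPE (UN).
```

Under this specification the model estimates 4 fixed effects, 3 level-2 covariance parameters, 3 level-3 covariance parameters, and 1 residual variance, for 11 parameters in total. The alternative model has 6 level-3 covariance parameters instead of 3, which accounts for its 14 parameters.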

The model with the random slope at level 3 has 14 estimated parameters and a -2LL (deviance) of 109293.033. The nested model (without the level-3 random gender effect) has 11 estimated parameters and a -2LL of 109302.040. The chi-square difference is 9.007 (i.e., 109302.040 – 109293.033), which is significant at p < .05; the critical chi-square value (for 3 degrees of freedom, at p = .05) is 7.82. The results of the likelihood ratio test therefore show that allowing the effect of gender to vary randomly at level 3 results in a significant improvement in the fit of the analysis model to the data.
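The arithmetic of this test can also be carried out in SPSS rather than with a printed chi-square table. The following sketch enters the two deviances and parameter counts reported above into a one-case data file and uses the SIG.CHISQ and IDF.CHISQ functions to obtain the tail probability and the critical value. The variable names are our own, chosen for illustration.

```spss
* Deviances and parameter counts reported for the two nested models.
DATA LIST FREE / dev_small p_small dev_large p_large.
BEGIN DATA
109302.040 11 109293.033 14
END DATA.
* Difference in deviance and in number of estimated parameters.
COMPUTE chi_diff = dev_small - dev_large.
COMPUTE df_diff = p_large - p_small.
* Upper-tail probability of the chi-square difference.
COMPUTE p_value = SIG.CHISQ(chi_diff, df_diff).
* Critical chi-square value at the .05 level.
COMPUTE crit_05 = IDF.CHISQ(0.95, df_diff).
FORMATS chi_diff crit_05 (F8.3) df_diff (F3.0) p_value (F6.4).
LIST VARIABLES = chi_diff df_diff p_value crit_05.
```

The listed difference and critical value should agree, within rounding, with the values given in the text, and the tail probability should fall below .05.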

Comparing Nested Models Using a Multi-parameter Test

In similar fashion, the model that allows gender to vary randomly at level 3 is nested within a more complex alternative model that adds a level-3 predictor and three interactions involving that predictor, because fixing these four fixed effects to 0 in the level-3 predictor model results in the more restricted model with the level-2 predictor varying randomly at level 3. As noted earlier, models that contain both fixed-effect and random-effect parameters can be compared using REML estimation, instead of ML estimation, and a multi-parameter Wald test (Enders, 2010, pp. 233-239; Li, Raghunathan, & Rubin, 1991; Peugh & Heck, in press; Schafer, 1997) instead of a likelihood ratio test. The multi-parameter test is analogous to the F-ratio test of the coefficient of determination (R²) in a multiple regression analysis, which tests all predictors simultaneously for whether they explain a significantly non-zero proportion of the variance in the response variable.

The SPSS syntax that can be used to compare the random gender effects model to the free/reduced lunch percentages model (both estimated using REML) via a multi-parameter Wald test (with a difference of 4 degrees of freedom) is

MIXED MATH_DV WITH C_AGE GROUP_GENDER GMC_PCT_FRLUNCH
  /PRINT SOLUTION TESTCOV
  /METHOD = REML
  /FIXED = INTERCEPT C_AGE GROUP_GENDER GMC_PCT_FRLUNCH
    C_AGE*GROUP_GENDER C_AGE*GMC_PCT_FRLUNCH
    GROUP_GENDER*GMC_PCT_FRLUNCH
    C_AGE*GROUP_GENDER*GMC_PCT_FRLUNCH
  /RANDOM = INTERCEPT C_AGE |
    SUBJECT(PARTICIPANT_ID*SCHOOL_ID) COVTYPE (UN)
  /RANDOM = INTERCEPT C_AGE GROUP_GENDER |
    SUBJECT(SCHOOL_ID) COVTYPE (UN)
  /TEST = "MPW" GMC_PCT_FRLUNCH 1; C_AGE*GMC_PCT_FRLUNCH 1;
    GROUP_GENDER*GMC_PCT_FRLUNCH 1;
    C_AGE*GROUP_GENDER*GMC_PCT_FRLUNCH 1.

where the /TEST subcommand is used in SPSS for multi-parameter Wald testing. A text title (“MPW” for multi-parameter Wald) is included, and each of the four fixed-effect parameters is listed to indicate its inclusion in the test. Each parameter is followed by a “1” that indicates each fixed effect is weighted equally, and the parameters are separated by semicolons to allow the four effects to be tested as a unified set. Results for the multi-parameter Wald test in SPSS (F[4, 55.064] = 5.83, p < .001) showed that the inclusion of the four fixed effects in the level-3 predictor model (with 18 parameters estimated versus 14) resulted in a significant improvement in model fit.

Test of Contrasts (a)
Source / Numerator df / Denominator df / F / Sig.
MPW / 4 / 55.064 / 5.830 / .001
a. Dependent Variable: MATH_DV.
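Because the SPSS output rounds the significance level to three decimal places, it can be useful to recover the tail probability directly from the F statistic and its degrees of freedom. The following sketch does so with the SIG.F function, using the values from the table above. The variable names are our own.

```spss
* F statistic and degrees of freedom from the Test of Contrasts table.
DATA LIST FREE / f_value df_num df_den.
BEGIN DATA
5.830 4 55.064
END DATA.
* Upper-tail probability of the multi-parameter Wald F test.
COMPUTE p_value = SIG.F(f_value, df_num, df_den).
FORMATS p_value (F8.5).
LIST.
```

The result should round to the .001 shown in the Sig. column of the table.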

References

Enders, C.K. (2010). Applied missing data analysis. New York: Guilford.

Goldstein, H. (2011). Multilevel statistical models (4th ed.). West Sussex, UK: Wiley.

Hox, J.J. (2010). Multilevel analysis: Techniques and applications (2nd ed.). New York: Routledge.

Li, K.H., Raghunathan, T.E., & Rubin, D.B. (1991). Large sample significance levels from multiply imputed data using moment-based statistics and an F reference distribution. Journal of the American Statistical Association, 86, 1065-1073.

Peugh, J.L., & Heck, R.H. (in press). Conducting three-level longitudinal analyses. Journal of Adolescent Development.

Schafer, J.L. (1997). Analysis of incomplete multivariate data. Boca Raton, FL: Chapman & Hall.

Searle, S.R., Casella, G., & McCulloch, C.E. (1992). Variance components. New York: Wiley.

Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. New York: Oxford University Press.

Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling. London: Sage.