Heck et al. Chapter 3, p. 61 ff
This chapter introduces basic multilevel modeling using the dataset ch3multilevel.sav and MIXED.
The data editor
The Level 1 variables are
Female: whether the respondent is female (=1) or not (=0)
ses: measure of the socioeconomic status of the respondent. The main independent variable.
Femses: female*ses, for investigation of the interaction of gender and ses
math: score of the respondent on a math achievement test. The dependent variable.
The Level 2 variables are
Schcode: ID of the school attended by the respondent. This is the grouping ID variable.
ses_mean: mean ses of all respondents within a school. (Not the mean of all kids in the school.)
This is a Level 2 variable that is a composite of Level 1 values.
per4yrc: proportion of students in the school going on to a 4-year college or university
This is a Level 2 variable that is a composite of what could be Level 1 values, although whether each individual student went to a 4-year college or university was not used as such here.
public: whether the school is public (=1) or private (=0).
This is a pure Level 2 characteristic, not an aggregate of any Level 1 values.
The plan for this lecture is
1. Consider treating the data as a single huge group. We’ll perform regular Ordinary Least Squares (OLS) analyses. The text did just a sample. I’ll do both a sample and the whole data set. I’ll illustrate doing OLS using REGRESSION, GLM (which we’ve done before) and MIXED (which we have not used before).
2. We’ll then investigate the simplest possible model that acknowledges the grouping in schools – a model in which each math score is viewed simply as a score drawn from a population of scores whose mean depends on the school, without considering ses (p. 73 ff).
3. Next we’ll examine a model which acknowledges the grouping and in which math scores are related to ses at the individual level. (p. 80 ff).
4. And so on.
Single Group analysis using REGRESSION, GLM and MIXED.
File is ch3multilevel.sav
Yi = B0 + B1(sesi) + ei
Model is a basic OLS model with no acknowledgement of the grouping of respondents into schools.
Using the REGRESSION procedure: Analyze -> Regression -> Linear. . .
Output
Descriptive Statistics
Variable / Mean / Std. Deviation / N
math / 57.7339 / 8.78399 / 6871
ses / .0319 / .78078 / 6871
Model Summary
Model / R / R Square / Adjusted R Square / Std. Error of the Estimate
1 / .378 (a) / .143 / .143 / 8.13220
a. Predictors: (Constant), ses
Coefficients (a)
Model 1 / Unstandardized B / Std. Error / Standardized Beta / t / Sig.
(Constant) / 57.598 / .098 / n/a / 586.608 / .000
ses / 4.255 / .126 / .378 / 33.858 / .000
a. Dependent Variable: math
So Y-hat = 57.598 + 4.255(SES) across the whole sample, treated as a single group.
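As a quick check on this output, the fitted equation can be evaluated directly, and in a one-predictor regression the standardized Beta is just B rescaled by the two standard deviations, Beta = B × SD(ses) / SD(math), with R = |Beta|. This small Python sketch uses only the numbers printed above; the function name predict_math is my own, hypothetical label.

```python
# Single-group OLS results copied from the REGRESSION output above.
B0, B1 = 57.598, 4.255            # (Constant) and ses slope
SD_SES, SD_MATH = 0.78078, 8.78399

def predict_math(ses):
    """Y-hat = B0 + B1 * ses for the single-group model."""
    return B0 + B1 * ses

# In simple regression, Beta = B * SD(x) / SD(y), and R = |Beta|.
beta = B1 * SD_SES / SD_MATH
print(f"Y-hat at ses = 1: {predict_math(1.0):.3f}")
print(f"Beta = {beta:.3f}, R^2 = {beta ** 2:.3f}")  # compare .378 and .143
```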
Using the GLM procedure for the same analysis
Analyze -> General Linear Model -> Univariate
Descriptive Statistics
Dependent Variable: math
Mean / Std. Deviation / N
57.7339 / 8.78399 / 6871
Tests of Between-Subjects Effects
Dependent Variable: math
Source / Type III Sum of Squares / df / Mean Square / F / Sig. / Partial Eta Squared / Noncent. Parameter / Observed Power (b)
Corrected Model / 75813.316 (a) / 1 / 75813.316 / 1146.382 / .000 / .143 / 1146.382 / 1.000
Intercept / 2.276E7 / 1 / 2.276E7 / 344109.335 / .000 / .980 / 344109.335 / 1.000
ses / 75813.316 / 1 / 75813.316 / 1146.382 / .000 / .143 / 1146.382 / 1.000
Error / 454265.500 / 6869 / 66.133
Total / 2.343E7 / 6871
Corrected Total / 530078.816 / 6870
a. R Squared = .143 (Adjusted R Squared = .143)
b. Computed using alpha = .05
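Every statistic in the GLM table above comes from the sums of squares. Here is a minimal Python check (numbers copied from the table; nothing else assumed) showing that F = MS(ses) / MS(Error), that R Squared = SS(ses) / SS(Corrected Total), and that, with only one predictor, Partial Eta Squared for ses equals R Squared because SS(ses) + SS(Error) is the corrected total.

```python
# Sums of squares and df copied from "Tests of Between-Subjects Effects" above.
SS_SES, DF_SES = 75813.316, 1
SS_ERROR, DF_ERROR = 454265.500, 6869
SS_CORRECTED_TOTAL = 530078.816

ms_error = SS_ERROR / DF_ERROR                  # Mean Square for Error
f_ses = (SS_SES / DF_SES) / ms_error            # F = MS_effect / MS_error
r_squared = SS_SES / SS_CORRECTED_TOTAL         # footnote a: R Squared
partial_eta_sq = SS_SES / (SS_SES + SS_ERROR)   # Partial Eta Squared for ses

print(f"MS error = {ms_error:.3f}, F = {f_ses:.3f}")
print(f"R^2 = {r_squared:.3f}, partial eta^2 = {partial_eta_sq:.3f}")
```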
Parameter Estimates
Dependent Variable: math
Parameter / B / Std. Error / t / Sig. / 95% CI Lower Bound / 95% CI Upper Bound / Partial Eta Squared / Noncent. Parameter / Observed Power (a)
Intercept / 57.598 / .098 / 586.608 / .000 / 57.406 / 57.791 / .980 / 586.608 / 1.000
ses / 4.255 / .126 / 33.858 / .000 / 4.008 / 4.501 / .143 / 33.858 / 1.000
a. Computed using alpha = .05
So, again, Y-hat = 57.598 + 4.255(ses) treating the data as a single sample.
OLS Regression using the MIXED Procedure
Model: Yi = B0 + B1(sesi) + ei
Basic specification: Analyze -> Mixed Models -> Linear . . .
I next clicked on the Fixed... button
I highlighted ses and clicked on the Add button to move ses into the Model: field.
I then clicked on the Continue button.
Next, I clicked on the Statistics... button.
MIXED Output
Model Dimension (a)
Effect / Number of Levels / Number of Parameters
Fixed Effects / Intercept / 1 / 1
ses / 1 / 1
Residual / 1
Total / 2 / 3
a. Dependent Variable: math.
Information Criteria (a)
-2 Restricted Log Likelihood / 48303.088
Akaike's Information Criterion (AIC) / 48305.088
Hurvich and Tsai's Criterion (AICC) / 48305.088
Bozdogan's Criterion (CAIC) / 48312.923
Schwarz's Bayesian Criterion (BIC) / 48311.923
The information criteria are displayed in smaller-is-better forms.
a. Dependent Variable: math.
Fixed Effects
Type III Tests of Fixed Effects (a)
Source / Numerator df / Denominator df / F / Sig.
Intercept / 1 / 6869 / 344109.335 / .000
ses / 1 / 6869.000 / 1146.382 / .000
a. Dependent Variable: math.
Estimates of Fixed Effects (a)
Parameter / Estimate / Std. Error / df / t / Sig. / 95% CI Lower Bound / 95% CI Upper Bound
Intercept / 57.598169 / .098188 / 6869 / 586.608 / .000 / 57.405690 / 57.790649
ses / 4.254675 / .125661 / 6869.000 / 33.858 / .000 / 4.008340 / 4.501011
a. Dependent Variable: math.
Covariance Parameters
Estimates of Covariance Parameters (a)
Parameter / Estimate / Std. Error / Wald Z / Sig. / 95% CI Lower Bound / 95% CI Upper Bound
Residual / 66.132698 / 1.128456 / 58.605 / .000 / 63.957541 / 68.381830
a. Dependent Variable: math.
So, MIXED has told us that Y-hat = 57.598169 + 4.254675(ses) when the data are treated as a single sample.
Note that MIXED gave us something that REGRESSION and GLM did not – a test of the null hypothesis that in the population, the variance of the residuals is 0. We rejected that hypothesis. This would spur us on to look for factors to explain the variability in math scores that was not accounted for by ses.
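The Residual estimate from MIXED, 66.132698, is the same number as the GLM Mean Square for Error (66.133), and its square root is the REGRESSION "Std. Error of the Estimate" (8.13220). With no random effects, the REML residual variance reduces to SS(Error) / (N − 2), the usual unbiased OLS estimate. A minimal Python check using the GLM sums of squares:

```python
import math

SS_ERROR = 454265.500   # GLM Error sum of squares
N, P = 6871, 2          # cases, and fixed parameters (intercept and ses)

# REML residual variance when the model has no random effects.
sigma2_reml = SS_ERROR / (N - P)
print(f"{sigma2_reml:.6f}")              # compare MIXED Residual estimate
print(f"{math.sqrt(sigma2_reml):.5f}")   # compare Std. Error of the Estimate
```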
Considering only a subset of the 6871 students in the full sample – those 93 in the first 6 schools.
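* Flag the students in schools 1 through 6.
compute firstsix = 0.
if (schcode >= 1 and schcode <= 6) firstsix = 1.
* Plot only the flagged cases, then restore the full file.
filter by firstsix.
graph /scatterplot=ses with math.
filter off.
In Tables 3.1 and 3.2, the text reports the results of a regression on the TOTAL sample, not just the respondents in the figure shown above those tables.
* Same subset again, now with markers identifying each school.
filter by firstsix.
graph /scatterplot = ses with math by schcode.
filter off.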
The following graph provides an impetus to consider the grouping of respondents into schools – the relationship of math to ses varies considerably across the 6 schools in this subset. That suggests it probably varies across ALL the schools as well.
Note that this graph is not quite identical to that on p. 70 in the text because I included 93 cases in my subset. I did that in order to include all respondents in schcode = 6.
Reasons for the impetus –
1) The relationships of math to ses are not identical across the 6 schools in this sample. This suggests that there might be systematic or random variation in the intercepts and in the slopes from school to school. This variation might be related to some characteristic of the schools.
2) The potential differences in slopes and intercepts might mean that the overall relationship (57.598169 + 4.254675*ses) that we found above when we treated the data as coming from a single group might not be precisely correct. Taking the slope and intercept differences into account might yield a different overall relationship. Moreover, taking the slope and intercept differences into account might change the p-values associated with either the intercept or the slope.
This is the essence of multilevel modeling – taking into account possible differences between subgroups and the effects of grouping on overall relationships.
Step 1. No Level 1 relationship. A model with only means. (p. 73)
The Level 1 “model” (quotes since it’s not much of a model)
Yij = B0j + eij
The Level 1 model says that each score is simply equal to the mean of the school from which the score was gotten plus a random deviation.
The Level 2 Intercept Model
B0j = g00 + u0j
The Level 2 Slope Model.
There is no Level 1 slope, so there can be no model of it.
Since there is no Level 1 slope, the Level 2 model can only be a model of the Level 1 intercept.
In this case, the Level 2 model is also a trivial one: that the Level 1 intercept, that is, the mean of a school’s math scores, is simply equal to the mean of all the math scores plus a random deviation.
The Combined model
Yij = g00 + u0j + eij
Each Yij is treated as the sum of a fixed overall mean (g00) + a random deviation of the group mean from that overall mean (u0j) and a random deviation of the individual score from the group mean (eij).
What can we estimate from this?
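1. g00 – the overall mean. (Under REML this is a weighted average of the school means, which is why the estimate below, 57.674234, differs slightly from the raw sample mean of 57.7339.)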
2. The variance of the group mean deviations - Var(u0j).
3. The pooled within-school variance of the individual scores around their school means – Var(eij).
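To make these three pieces concrete, here is a small Python sketch that simulates scores from Yij = g00 + u0j + eij, pretending the Step 1 estimates reported below are the true population values, and then recovers the between-school share of variance with the classic one-way ANOVA (method-of-moments) estimator. The balanced design (400 schools of 16 students), the normal deviations, and the ANOVA estimator are all my assumptions for illustration; MIXED itself uses REML on the real, unbalanced data.

```python
import random
import statistics

def simulate_null_model(n_groups, n_per_group, g00, var_u, var_e, seed=1):
    """Simulate Yij = g00 + u0j + eij with normally distributed u0j and eij."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_groups):
        u0j = rng.gauss(0.0, var_u ** 0.5)  # this school's deviation from g00
        scores = [g00 + u0j + rng.gauss(0.0, var_e ** 0.5)  # plus each student's eij
                  for _ in range(n_per_group)]
        groups.append(scores)
    return groups

def anova_icc(groups):
    """Method-of-moments (one-way ANOVA) ICC; assumes equal group sizes."""
    k, n = len(groups), len(groups[0])
    means = [statistics.fmean(g) for g in groups]
    grand = statistics.fmean(means)
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (k * (n - 1))
    var_between = max((msb - msw) / n, 0.0)  # estimate of Var(u0j)
    return var_between / (var_between + msw)

groups = simulate_null_model(400, 16, g00=57.674234, var_u=10.642209, var_e=66.550655)
print(f"estimated ICC = {anova_icc(groups):.3f}")
```

Rerunning with different seeds gives estimates that scatter around the ICC implied by the two variances, which is the sense in which Var(u0j) and Var(eij) describe how scores are generated.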
Graphically
The red arrows represent the u0j's. The black arrow represents the eij's.
Invoking the MIXED procedure.
Mixed Model Analysis – Model is Yij = g00 + u0j + eij
Model Dimension (b)
Effect / Number of Levels / Covariance Structure / Number of Parameters / Subject Variables
Fixed Effects / Intercept / 1 / 1
Random Effects / Intercept (a) / 1 / Variance Components / 1 / schcode
Residual / 1
Total / 2 / 3
a. As of version 11.5, the syntax rules for the RANDOM subcommand have changed. Your command syntax may yield results that differ from those produced by prior versions. If you are using version 11 syntax, please consult the current syntax reference guide for more information.
b. Dependent Variable: math.
Information Criteria (a)
-2 Restricted Log Likelihood / 48877.256
Akaike's Information Criterion (AIC) / 48881.256
Hurvich and Tsai's Criterion (AICC) / 48881.257
Bozdogan's Criterion (CAIC) / 48896.925
Schwarz's Bayesian Criterion (BIC) / 48894.925
The information criteria are displayed in smaller-is-better forms.
a. Dependent Variable: math.
MIXED Output continued
Fixed Effects – Model is Yij = g00 + u0j + eij
Type III Tests of Fixed Effects (a)
Source / Numerator df / Denominator df / F / Sig.
Intercept / 1 / 416.066 / 93846.456 / .000
a. Dependent Variable: math.
Estimates of Fixed Effects (a)
Parameter / Estimate / Std. Error / df / t / Sig. / 95% CI Lower Bound / 95% CI Upper Bound
Intercept / 57.674234 / .188266 / 416.066 / 306.344 / .000 / 57.304162 / 58.044306
a. Dependent Variable: math.
Covariance Parameters
Estimates of Covariance Parameters (a)
Parameter / Estimate / Std. Error / Wald Z / Sig. / 95% CI Lower Bound / 95% CI Upper Bound
Residual / 66.550655 / 1.171618 / 56.802 / .000 / 64.293492 / 68.887062
Intercept [subject = schcode] / Variance / 10.642209 / 1.028666 / 10.346 / .000 / 8.805529 / 12.861989
a. Dependent Variable: math.
Random Effect Covariance Structure (G) (a)
Intercept | schcode
Intercept | schcode / 10.642209
Variance Components
a. Dependent Variable: math.
The last piece of output was the result of having checked the “Covariances of Random Effects” box earlier. As I mentioned then, it’ll be useful when we estimate both random intercepts and random slopes. Here it is redundant.
Both variances are significantly larger than 0, indicating that there is variation to be accounted for, both by looking for relationships of the intercepts to other variables and by looking for relationships of individual scores to other variables.
Intraclass Correlation Coefficient (p.74 and p. 78-79)
The intraclass correlation coefficient measures the proportion of variance of the scores that is associated with overall level differences between groups.
In this case, it’s a measure of the extent to which variation in the overall values of scores is related to group differences.
It’s analogous to R², although it is a different measure.
From p. 74,
$$\mathrm{ICC} = \frac{\sigma^2_B}{\sigma^2_B + \sigma^2_W}$$
If ICC = 0, this means that there is NO variability between groups – not likely.
If ICC is close to 0, this means that the variability between scores in different groups is not much larger than the variability between scores from the same group.
If ICC > 0, variability between scores from different groups would be expected to be larger than variability between scores from the same group.
For these data,
σ²B = Var(u0j) = 10.642209
σ²W = Var(eij) = 66.550655
$$\mathrm{ICC} = \frac{10.642209}{10.642209 + 66.550655} = \frac{10.642209}{77.192864} \approx .138 \quad (13.8\%,\ \text{p. 79})$$
So about 14% of the variance is between schools. This is enough to warrant a search for some explanation of that variation in intercepts.
The text (p. 74) suggests .05 as a cutoff. If the ICC is larger than .05, that suggests the need for an explanation of the variation in intercepts.
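The ICC arithmetic and the .05 rule of thumb fit in a few lines of Python. This is just a sketch with the Step 1 covariance parameter estimates typed in by hand; the function name icc is my own.

```python
def icc(var_between, var_within):
    """Intraclass correlation: share of total variance lying between groups."""
    return var_between / (var_between + var_within)

# Step 1 (means-only) covariance parameter estimates from MIXED.
var_u0 = 10.642209   # Intercept [subject = schcode] variance, sigma^2_B
var_e = 66.550655    # Residual variance, sigma^2_W

rho = icc(var_u0, var_e)
print(f"ICC = {rho:.3f}")
print("exceeds the .05 cutoff:", rho > 0.05)
```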
Step 2. Level 1 Relationship with random intercepts and fixed Level 1 slope. (p. 80)
Level 1 Model
Yij = B0j + B1j(ses)ij + eij    (The text, p. 80, has just B1, a typo.)
Level 2 Intercept Model
B0j = g00 + u0j    (Eq. 3.8)
This is simply a random intercepts model.
Level 2 Slope Model
B1j = g10    (Eq. 3.10)
Subscript rules for g
1st subscript designates which Level 1 coeff is being modeled – 0 for intercept; 1, 2, etc for slopes
2nd subscript designates which coeff this is – 0 for intercept of the model; 1, 2, etc for a slope
Ignore Equation 3.9 for now. Although the text introduces the concept of randomly varying slopes, i.e., Eq. 3.9, B1j = g10 + u1j, they don’t pursue that for this 2nd model. We’ll also not pursue it now. They justify this saying, “Often, in building models, we may treat the within-group slopes as fixed in preliminary analyses . . .”
Combined Model
Yij = g00 + u0j + g10(ses)ij + eij
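Substituting the two Level 2 equations (Eq. 3.8 and Eq. 3.10) into the Level 1 equation shows where each term of the combined model comes from, and which terms end up in the Fixed... dialog versus the Random... dialog:

```latex
\begin{aligned}
Y_{ij} &= B_{0j} + B_{1j}\,\mathrm{ses}_{ij} + e_{ij} \\
       &= (g_{00} + u_{0j}) + g_{10}\,\mathrm{ses}_{ij} + e_{ij} \\
       &= \underbrace{g_{00} + g_{10}\,\mathrm{ses}_{ij}}_{\text{fixed part}}
          + \underbrace{u_{0j} + e_{ij}}_{\text{random part}}
\end{aligned}
```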
I clicked on the Fixed... button above.
We used the Fixed... dialog box to tell the program to estimate those coefficients that do not vary from group to group. In this case, the overall intercept, g00, is a fixed coefficient (the school-to-school variation around it, u0j, is handled in the Random... dialog), so the Include intercept box is checked. In addition, the slope of ses, g10, does not vary from group to group, so we put ses in the Model: box to tell the program to estimate a fixed slope for ses.
After specifying which fixed parameters to estimate, I clicked on the Continue button to take me back to the main MIXED dialog box.
I then clicked on the Random... button in the main dialog box.
The Random... dialog box is used to tell the program which random parameters to estimate.
Checking the Include intercept box tells it to estimate Var(u0j), the variance of intercepts across groups.
If we were going to estimate the variance in slopes across groups we would put ses in the Model: field.
We also have to tell the program that the groups are identified by schcode.
Step 2 MIXED Output: Mixed Model Analysis – Model is Yij = g00 + u0j + g10(SES) + eij
Model Dimension (b)
Effect / Number of Levels / Covariance Structure / Number of Parameters / Subject Variables
Fixed Effects / Intercept / 1 / 1
ses / 1 / 1
Random Effects / Intercept (a) / 1 / Variance Components / 1 / schcode
Residual / 1
Total / 3 / 4
a. As of version 11.5, the syntax rules for the RANDOM subcommand have changed. Your command syntax may yield results that differ from those produced by prior versions. If you are using version 11 syntax, please consult the current syntax reference guide for more information.
b. Dependent Variable: math.
Information Criteria (a)
-2 Restricted Log Likelihood / 48215.440
Akaike's Information Criterion (AIC) / 48219.440
Hurvich and Tsai's Criterion (AICC) / 48219.441
Bozdogan's Criterion (CAIC) / 48235.109
Schwarz's Bayesian Criterion (BIC) / 48233.109
The information criteria are displayed in smaller-is-better forms.
a. Dependent Variable: math.
Fixed Effects – Model is Yij = g00 + u0j + g10(SES) + eij
Type III Tests of Fixed Effects (a)
Source / Numerator df / Denominator df / F / Sig.
Intercept / 1 / 375.699 / 187802.817 / .000
ses / 1 / 3914.638 / 803.954 / .000
a. Dependent Variable: math.
Note that F = 187,802.817 ≈ t² = 433.362². Two tests of the same hypothesis with identical results (up to rounding of the printed t).
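The same identity holds for ses. With 1 numerator df, the Type III F is the square of the t in the Estimates of Fixed Effects table, so the two tables always agree. A quick Python check using the printed values:

```python
# Step 2 Type III F values and the t values from "Estimates of Fixed Effects".
results = {
    "Intercept": {"F": 187802.817, "t": 433.362},
    "ses": {"F": 803.954, "t": 28.354},
}

for name, r in results.items():
    # With 1 numerator df, F = t^2 (small differences come from rounding t).
    print(f"{name}: t^2 = {r['t'] ** 2:.3f}, F = {r['F']:.3f}")
```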
Estimates of Fixed Effects (a)
Parameter / Estimate / Std. Error / df / t / Sig. / 95% CI Lower Bound / 95% CI Upper Bound
Intercept / 57.595965 / .132905 / 375.699 / 433.362 / .000 / 57.334634 / 57.857296
ses / 3.873861 / .136624 / 3914.638 / 28.354 / .000 / 3.605999 / 4.141722
a. Dependent Variable: math.
Covariance Parameters
Estimates of Covariance Parameters (a)
Parameter / Estimate / Std. Error / Wald Z / Sig. / 95% CI Lower Bound / 95% CI Upper Bound
Residual / 62.807187 / 1.108877 / 56.640 / .000 / 60.671000 / 65.018587
Intercept [subject = schcode] / Variance / 3.469256 / .538821 / 6.439 / .000 / 2.558783 / 4.703696
a. Dependent Variable: math.
Random Effect Covariance Structure (G) (a)
Intercept | schcode
Intercept | schcode / 3.469256
Variance Components
a. Dependent Variable: math.
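Comparing these covariance parameters with Step 1 shows how much each variance component shrank when ses was added. A common descriptive for this, often called the proportional reduction in variance, is (before − after) / before. Neither these notes nor the MIXED output compute it, so treat this Python sketch as an add-on; such ratios are descriptive only and can even come out negative in some models.

```python
def proportional_reduction(var_before, var_after):
    """Share of a variance component removed by adding predictors."""
    return (var_before - var_after) / var_before

# Covariance parameter estimates: Step 1 (means only) vs Step 2 (ses added).
residual = {"step1": 66.550655, "step2": 62.807187}
intercept = {"step1": 10.642209, "step2": 3.469256}

r2_level1 = proportional_reduction(residual["step1"], residual["step2"])
r2_level2 = proportional_reduction(intercept["step1"], intercept["step2"])
print(f"Level 1 (residual) variance reduced by {r2_level1:.1%}")
print(f"Level 2 (intercept) variance reduced by {r2_level2:.1%}")
```

The much larger Level 2 reduction says that a good part of the school-to-school difference in mean math is associated with differences in the students' ses.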
Beautiful graph
Step 3. Level 1 Relationship with intercept related to Level 2 characteristics and fixed Level 1 slope. (p. 86)
Level 1 Model
Yij = B0j +B1j(sesij) + eij
Level 2 Intercept Model    (Eq. 3.12)
B0j = g00 + g01(ses_mean)j + g02(per4yrc)j + g03(public)j + u0j
Level 2 Slope Model
B1j = g10
Combined Model
Yij = g00 + g01(ses_mean)j + g02(per4yrc)j + g03(public)j + u0j + g10(sesij) + eij
Model is Yij = g00 + g01(ses_mean)j + g02(per4yrc)j + g03(public)j + u0j + g10(sesij) + eij
Whew, isn’t this fun?!!
Note that this example illustrates that ALL predictors – Level 1 or Level 2 – are entered as fixed effects (in the Fixed... dialog, i.e., on the /FIXED subcommand).
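* Step 3: random intercepts, fixed ses slope, and three Level 2 predictors of the intercept.
* public follows BY, so it is treated as a factor whose last category (public = 1) is the reference.
MIXED math BY public WITH ses_mean per4yrc ses
/FIXED=public ses_mean per4yrc ses | SSTYPE(3)
/METHOD=REML
/PRINT=G SOLUTION TESTCOV
/RANDOM=INTERCEPT | SUBJECT(schcode) COVTYPE(VC).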
Mixed Model Analysis.
Model is Yij = g00 + g01(ses_mean)j + g02(per4yrc)j + g03(public)j + u0j + g10(sesij) + eij
Model Dimension (b)
Effect / Number of Levels / Covariance Structure / Number of Parameters / Subject Variables
Fixed Effects / Intercept / 1 / 1
public / 2 / 1
ses_mean / 1 / 1
per4yrc / 1 / 1
ses / 1 / 1
Random Effects / Intercept (a) / 1 / Variance Components / 1 / schcode
Residual / 1
Total / 7 / 7
a. As of version 11.5, the syntax rules for the RANDOM subcommand have changed. Your command syntax may yield results that differ from those produced by prior versions. If you are using version 11 syntax, please consult the current syntax reference guide for more information.
b. Dependent Variable: math.
Information Criteria (a)
-2 Restricted Log Likelihood / 48128.899
Akaike's Information Criterion (AIC) / 48132.899
Hurvich and Tsai's Criterion (AICC) / 48132.901
Bozdogan's Criterion (CAIC) / 48148.568
Schwarz's Bayesian Criterion (BIC) / 48146.568
The information criteria are displayed in smaller-is-better forms.
a. Dependent Variable: math.
Fixed Effects. Model is Yij = g00 + g01(ses_mean)j + g02(per4yrc)j + g03(public)j + u0j + g10(sesij) + eij
Type III Tests of Fixed Effects (a)
Source / Numerator df / Denominator df / F / Sig.
Intercept / 1 / 417.302 / 17093.820 / .000
public / 1 / 409.345 / .354 / .552
ses_mean / 1 / 709.247 / 64.945 / .000
per4yrc / 1 / 413.879 / 9.072 / .003
ses / 1 / 6448.937 / 408.856 / .000
a. Dependent Variable: math.
Estimates of Fixed Effects (b)
Parameter / Estimate / Std. Error / df / t / Sig. / 95% CI Lower Bound / 95% CI Upper Bound
Intercept / 56.277288 / .429669 / 411.160 / 130.978 / .000 / 55.432665 / 57.121911
[public=0] / .164264 / .275903 / 409.345 / .595 / .552 / -.378098 / .706627
[public=1] / 0 (a) / 0 / . / . / . / . / .
ses_mean / 2.473244 / .306897 / 709.247 / 8.059 / .000 / 1.870709 / 3.075779
per4yrc / 1.419812 / .471391 / 413.879 / 3.012 / .003 / .493192 / 2.346432
ses / 3.190801 / .157803 / 6448.937 / 20.220 / .000 / 2.881455 / 3.500147
a. This parameter is set to zero because it is redundant.
b. Dependent Variable: math.
Note that g10 is the coefficient of a Level 1 predictor, ses, so its significance says that individual math scores increase as individual ses increases, holding the school-level predictors constant.
g00 is the overall intercept of the combined equation – expected math score when all predictors = 0.
g01 is a Level 2 predictor of the Level 1 intercept. So its significance says that among schools equal on the other predictors the overall average of the math scores of a school, i.e., the height of the equation relating math to ses for that school, increases as the mean ses of that school increases.
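And g02 is also the coefficient of a Level 2 predictor of the Level 1 intercept, so its significance says that, among schools equal on the other predictors, the overall average of the math scores of a school (i.e., the height of the equation relating math to ses for that school) increases as the proportion of students in that school going to 4-year colleges or universities increases. By contrast, g03 (public) is not significant (p = .552): among schools equal on ses_mean and per4yrc, public and private schools do not differ reliably in average math.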
Note that all of these carry the usual multiple regression interpretation that the relationships hold while controlling for (holding constant) the other predictors.
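To see how the pieces combine, here is a Python sketch of the fixed part of the Step 3 equation (u0j and eij set to 0). Because public was listed after BY, SPSS estimated a coefficient only for [public=0], so that estimate is the private-minus-public difference; entered with WITH instead (1 = public), the public coefficient would be the same size with the opposite sign. The function name and the example school values (ses_mean = 0, per4yrc = 0.5) are hypothetical, chosen only for illustration.

```python
# Step 3 fixed-effect estimates from "Estimates of Fixed Effects" above.
G00 = 56.277288        # Intercept (applies to public = 1, the reference category)
G_PRIVATE = 0.164264   # [public=0]: private minus public, other predictors held constant
G01 = 2.473244         # ses_mean
G02 = 1.419812         # per4yrc
G10 = 3.190801         # ses (Level 1)

def predict_step3(ses, ses_mean, per4yrc, public):
    """Fixed-part prediction (u0j = 0) for a student with these values."""
    y = G00 + G01 * ses_mean + G02 * per4yrc + G10 * ses
    if public == 0:    # only the non-reference category gets its own coefficient
        y += G_PRIVATE
    return y

public_school = predict_step3(ses=0.0, ses_mean=0.0, per4yrc=0.5, public=1)
private_school = predict_step3(ses=0.0, ses_mean=0.0, per4yrc=0.5, public=0)
print(f"public: {public_school:.6f}, private - public: {private_school - public_school:.6f}")
```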
Covariance Parameters
Estimates of Covariance Parameters (a)
Parameter / Estimate / Std. Error / Wald Z / Sig. / 95% CI Lower Bound / 95% CI Upper Bound
Residual / 62.630370 / 1.102966 / 56.784 / .000 / 60.505479 / 64.829885
Intercept [subject = schcode] / Variance / 2.395178 / .443654 / 5.399 / .000 / 1.665987 / 3.443531
a. Dependent Variable: math.
Both variances are significantly larger than 0, indicating that there may be other variables that will account for variation in the intercepts and also variation in the individual scores.
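Where do the Wald Z and the confidence interval for a variance come from? Wald Z is simply Estimate / Std. Error. The interval appears to be built on the log scale and then exponentiated, which keeps both limits positive; that construction is my assumption, but it reproduces the printed limits for the Step 3 intercept variance. Keep in mind that Wald tests of variances are approximate, especially when a variance is near 0, so many analysts prefer likelihood-ratio comparisons for these.

```python
import math

est, se = 2.395178, 0.443654   # Step 3 intercept variance and its Std. Error
z_crit = 1.959964              # two-sided 95% standard normal critical value

wald_z = est / se
# Log-scale (delta-method) interval: est * exp(+/- z * se / est), always > 0.
half_width = z_crit * se / est
lower, upper = est * math.exp(-half_width), est * math.exp(half_width)
print(f"Wald Z = {wald_z:.3f}")                # compare 5.399
print(f"95% CI = ({lower:.6f}, {upper:.6f})")  # compare 1.665987 and 3.443531
```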