ALDA Chapter 41
Applied Longitudinal Data Analysis (ALDA) Chapter 4 Models
A two-level longitudinal model of data from alcohol1_pp.save.
The authors used a program called MLwiN. We’ll compare SPSS MIXED results with theirs.
Sample is 82 adolescents observed over 3 waves of data collection. Initial age was 14.
Data were obtained at 3 time periods, denoted age_14 = 0, 1, and 2.
Dependent variable based on a 4-item instrument assessing alchohol use, with each response on an 8-point scale, rating frequency with which they 1) drank beer or wine, 2) drank hard liquor, 3) had five or more drinks in a row, and 4) got drunk.
The square root of the mean of the 4 responses at a time period is what is in the data file.
Distribution of the dependent variable at each time period is
On the right is a binned scatterplot, showing the same information.
Here’s a graph created by GLM that shows the change in mean alcuse across time periods nicely.
As you can see from the above histograms and binned scatterplot, the dependent variable is quite positively skewed at each time period. The largest possible mean value is 8, the square root of which is 2.4. I don’t see how values > 3 were obtained. I’ll trust the authors.
Two potential time-invariant Level 2 predictors
COA: Child Of an Alcoholic = 0 if no; 1 if yes.
PEER: Proportion of respondent’s peers who use alcohol gotten at the beginning of data collection on a 6 pt. scale. Again, each entry in the data file is the square root of the response to the questionnaire item.
Below is the relationship of alcuse to PEER at each time period, age_14=0, 1, and 2. It appears that the relationship changes across time periods.
Respondents’ alcohol use seems to become less highly related to peer use after two years.
Here’s a graph of alcuse vs. COA collapsed across time periods.
The text shows graphs from 8 respondents in Figure 4.1. Here is the syntax that was used to create those graphs, along with the graphs.
compute filt=0.
if any (id, 4,14,23,32,41,56,65,82) filt=1.
execute.
filter by filt.
IGRAPH /viewname = 'Scatterplot'
/x1=var(age_14) type = scale
/y = var(alcuse) type=scale
/panel = var(id)
/fitline method=regression linear line= total MEFFECT
/catorder var(id) (ascending values omitempty)
/SCATTER coincident=none.
I believe that the IGRAPH procedure has been replaced by a procedure names GGRAPH. My PASW 18.0 said that IGRAPH would soon not be supported. But SPSS 19.0 and 20.0 ran the above syntax with no warning. So, evidently, IBM decided to continue supporting the IGRAPH syntax. Don’t mess with your user base.
The basic Level 1 model:
Yij = p0i + p1i*TIMEij + eijWhew! No quadtratic term!!
Here, subscript i = individual and j = time period, with values 0, 1, and 2.
This is the same notation we’ve used for longitudinal models from Heck et al. although note that the subscripts ij are used rather than ti in Heck.
The Level 2 models will relate the intercept and/or slope of the Level 1 models to other variables, in this case, COA and PEER.
Here are the Level 1 fitted lines for the two values of COA . . .
Syntax:
value labels coa 0 "Not COA" 1 "COA".
igraph
/x1=var(age) type=scale
/y=var(alcuse) type=scale
/color=var(id) type=categorical
/panel=var(coa)
/fitline method=regression linear line=total meffect spike=off
/scalerange=var(alcuse) min=-1 max=4.
Graphs:
The major difference noted in the text is the difference in intercepts. The COA kids seemed to be using more alcohol at the beginning of the study than were the Not COA kids.
The Level 2 models relate Level 1 intercept and slope to the Level 2 COA variable. (ALDA, Eq. 4.2, p 78)
Alas, ALDA uses the g notation for Level 2. Sorry!!
Model of the Level 1 intercept:p0i = g00 + g01*COA + z0i
Model of the Level 1 slope:p1i = g10 + g11*COAi + z1i
(The ζ symbol used in the text is the Greek lower case zeta.)
Note that in both cases, there is a random component – random from person (i) to person. So these Level 2 models assume random variation in intercepts beyond the variation associated with COA and random variation in slopes beyond the variation associated with COA.
Creating a composite model
This is something we’ve done several times. It won’t hurt to overlearn it.
The Level 1 model:Yij = p0i + p1i*TIMEij + eij
Substituting the Level 2 model for the intercept:Yij = g00 + g01*COA + z0i + p1i*TIMEij + ei
Now substituting the Level 2 model for the slope:
Yij = g00 + g01*COA + z0i+ (g10 + g11*COAi + z1i )*TIMEij + ei
This is the model presented at the bottom of page 80. It’s comforting to see different texts presenting the same procedures.
We can expand it by multiplying through by TIME
Yij = g00 + g01*COA + z0i+ g10*TIMEij + g11*COAi *TIMEij + z1i*TIMEij + ei
We can then rewrite the model, putting the fixed part first, followed by the random part
Yij = g00 + g01*COA + g10*TIMEij + g11*COAi *TIMEij + z0i + z1i *TIMEij + ei
The authors remind us that there are two separate representations of the whole model (p 81).
The level-1/level-2 specification is
The Level 1 model:Yij = p0i + p1i*TIMEij + eij
The Level 2 model of the Level 1 intercept:p0i = g00 + g01*COA + z0i
The Level 2 model of the Level 1 slope:p1i = g10 + g11*COAi + z1i
The composite model specification is
Yij = g00 + g01*COA + g10*TIMEij + g11*COAi *TIMEij + z0i + z1i *TIMEij + ei
Fixed / Structural Component p 82
Note that ALDA refers to the part of the model that we’ve been referring to as the Fixed part as the Structural part.
The text distinguishes the interpretation of the parameters from the two conceptualizations
The fixed part of the composite specification is
Yij = g00 + g01*COA + g10*TIMEij + g11*COAi *TIMEij
The interpretation from the level-1/level-2 conceptualization focuses on specific aspects of the trajectories, i.e., the intercept or the slope
The initial level of alcuse depends on whether parents used alcohol
The rate of change in alcuse depends on whether parents used alcohol
The interpretations from the composite conceptualization focus onthe effect of all the variables on alcohol at a specific time . . .
Alcohol use at a specific time depends on
1) TIME, 2) whether parents used alcohol and 3) the product of time and whether parents used.
Note that the product indicates that the relationship of alcuse to TIME may depend on the level of COA.
Random/Stochastic Component
Note that ALDA refers to what we’ve been calling the random component as the stochastic component.
(Wouldn’t life be dull if everyone used the same words for the same concepts?)
The stochastic component is
z0i + z1i *TIMEij + eij
The composite residual
The text discusses the sum of the random components as the composite residual
z0i + z1i *TIMEij + eij
The composite residual is still a residual - the difference between observed Y and predicted Y.
Note that the residual depends on TIME. It may become larger as TIME increases if z1i is positive, or more negative if z1i is negative. This, of course, is not the assumption made in OLS regression, where residuals are assumed to be equally variable across the whole range of each independent variable. So the residuals are potentially heteroscedastic. This, of course, is part of what distinguishes multilevel from OLS modeling.
The fact that each composite residual has components that are the same across time periods – each z is the same for each value of i across values of j – means that the residuals at different j values will likely be correlated with each other. That is, the residual at time j=0 is z0i + z1i*0 + ei0. The residual at time j=1 is z0i + z1i + ei1. The residual at time j=2 is z0i + z1i*2 + ei2. The common components, z0i and z1i in two or more residuals means that the values at j=0 and j=1 will be correlated, the values at j=0 and j=2 will be correlated and the values at j=1 and j=2 will be correlated. This autocorrelation is discussed in the text on p. 84-85.
For example, suppose persons had the following values of z0i, the random intercept component.
Person(i)z0ij=0j=1j=2
155+…5+…5+…
2-2-2+…-2+…-2+…
333+…3+…3+…
400+…0+…0+…
522+…2+…2+…
6
Scanning any pair of j= columns suggests that the pairs will be correlated, since they share the z0i component.
Since all the residuals for a given person have shared components, e.g., z0i, there will be positive correlations across persons between pairs of residuals. This is part of what caused statisticians to create multilevel models.
Estimation of parameters . . .
1. The most commonly used method is called the method of maximum likelihood.
In this technique, the probability of the specific sample outcome is computed under the assumptions of the model. Values of the parameters of the model – the fixed effects and the variances of random effects – are found that yield the largest possible probability – the maximum likelihood of the data under the assumptions of the model. Those values are the values reported.
Advantages: Asymptotically unbiased and most efficient – least variable from sample to sample from the same population.
Disadvantages – Don’t work well for small samples; Requires that the residuals be normally distributed.
2. Ordinatory Least Squares (OLS). Predicted values of the observations are obtained under the model’s assumptions and values of the parameters that minimize the squares of the differences between the predicted and observed outcomes are reported. Requires that residuals be independent and identically distributed across observations.
Advantages: Estimates are unbiased
Disadvantages: Residuals may be heteroscedastic and / or autcorrelated. Such characteristics affect OLS estimates.
3. Generalized Least Squares (GLS).
Minimizes the differences between predicted and observed values while allowing the residuals to be heteroscedastic and correlated. It does this by estimating the variances and covariances of the residuals in an initial OLS, and then using those estimates when minimizing the differences between predicted and actual.
Advantages: Unbiased estimates; Residuals do not have to be normally distributed.
4. Iterative GLS.
Obtains multiple estimates of the variances and covariances, alternately estimating them, then using the estimates to form new parameter estimates, then estimating the variances and covariances again using the new parameter estimates, continuing this process until a stable result is achieved.
5. Restricted Maximum Likelihood (RML).
This technique is a maximum likelihood procedure that adjusts the degrees of freedom for estimating the random components taking into account the number of fixed parameters estimated.
Advantages: Estimates are unbiased.
Disadvantages: Can’t use chi-square difference tests to test hypotheses about the fixed parameters.
SPSS MIXED’s default is RML, specified as REML. Full maximum likelihood is specified as ML.
Alas, GLS and IGLS are not available in MIXED.
ALDA’s Sequence of model tests . . .
The Unconditional Means model – ALDA’s Model A
Level 1 model: Yij = p0i + eij
Level 2 intercept model: p0i = g00 + z0i
Level 2 slope model:There is no slope model because there is no Level 1 slope
Composite: Yij = g00 + z0i + eij
z0i is the difference between person i’s mean and the grand mean.
eij is the difference between person i’s jth score and person i’s mean.
ALDA states that “The primary reason we fit the unconditional means model is to estimate these variance components, which assess the amount of outcome variation that exists at each level. Associated hypothesis tests help determine whether there is sufficient variation at that level to warrant further analyses.
The pull down menus for the unconditional means model are
The syntax for the unconditional means model is
title "Model A".
mixed alcuse /print=solution /method=ml
/fixed=intercept
/random intercept | subject(id).
Fixed Effects
Type III Tests of Fixed EffectsaSource / Numerator df / Denominator df / F / Sig.
Intercept / 1 / 82 / 92.796 / .000
a. Dependent Variable: alcuse.
Estimates of Fixed Effectsa
Parameter / Estimate / Std. Error / df / t / Sig. / 95% Confidence Interval
Lower Bound / Upper Bound
Intercept / .921955 / .095707 / 82 / 9.633 / .000 / .731563 / 1.112347
a. Dependent Variable: alcuse.
Covariance Parameters
Estimates of Covariance ParametersaParameter / Estimate / Std. Error / Wald Z / Sig. / 95% Confidence Interval
Lower Bound / Upper Bound
Residual / .561747 / .062035 / 9.055 / .000 / .452419 / .697494
Intercept [subject = id] / Variance / .563862 / .119112 / 4.734 / .000 / .372702 / .853069
a. Dependent Variable: alcuse.
These are the same values as presented in Table 4.1 in the text. Note that the Wald Z test indicates that there is unaccounted for variance n the residuals and in the intercepts across persons.
The intraclass correlation coefficient.
The intraclass correlation coefficient can be computed any time the total variance in a sample can be partitioned into two pieces – one comprising variance between units (people here) and the other comprising variance within units.
The value of the intraclass correlation coefficient, symbolized as ρ is interpreted as the proportion of variance that is between units.
The formula is ρ = σbet2 / (σbet2 + σwith2)
In this problem σbet2 is the intercept variance from the above output, .563862.
σwith2 is the residual variance from the above output, .561747.
This means that the proportion of variance between people is .563862 / (.563862+.561747) = .501 (big)
The error autocorrelation coefficient
As stated in ALDA on p. 97, for Model A, the unconditional means model, the intraclass correlation coefficient is also the average correlation between pairs of residuals. We would expect pairs of residuals to be correlated .5 for these data.
The unconditional Growth Model – Model B (p 97) Should be called the “Random intercept and growth” model.
(Self test – what’s an unconditional means model? What’s an unconditional growth model?)
Level 1 model: Yij = p0i + p1i*TIMEij + eij
Level 2 intercept model: p0i = g00 + z0i
Level 2 slope modelp1j = g10 + z1i
Composite model: Yij = g00 + z0i + (g10 + z1i)*TIMEij + eij
Changes in interpretation of the variance components
σe2 is now not variance about the mean, but instead is variance about a linear trajectory, but that variance could vary from time period to time period.
σ0i2 is now not variance of a person’s average scores about a grand mean, but is now the variance of the intercept of the linear trajectory – variation in initial status, rather than average status.
σ1i2 did not exist in the unconditional means model. It is now variance in the tilt of the linear trajectory.
We now should also be prepared to consider the covariance of the intercept and slope.
The dialog boxes for the Unconditional growth model
Yij = g00 + z0i + (g10 + z1i)*TIMEij + eij (Rep=Scaled Identity; Sub=UN)
The syntax for application of Model B:
Yij = g00 + z0i + (g10 + z1i)*TIMEij + eij (Rep=Scaled Identity; Sub=UN)
title "Model B".
mixed alcuse with age_14
/print=solution testcov
/method=ml
/fixed = age_14
/random intercept age_14 | subject(id) covtype(un).
The output of Model B:
(Maximum Likelihood Estimates)
Fixed Effects
Type III Tests of Fixed EffectsaSource / Numerator df / Denominator df / F / Sig.
Intercept / 1 / 82.000 / 38.417 / .000
age_14 / 1 / 82.000 / 18.780 / .000
a. Dependent Variable: alcuse.
Estimates of Fixed Effectsa
Parameter / Estimate / Std. Error / df / t / Sig. / 95% Confidence Interval
Lower Bound / Upper Bound
Intercept / .651303 / .105080 / 82.000 / 6.198 / .000 / .442265 / .860342
age_14 / .270651 / .062455 / 82.000 / 4.334 / .000 / .146409 / .394894
a. Dependent Variable: alcuse.
Covariance Parameters
Estimates of Covariance ParametersaParameter / Estimate / Std. Error / Wald Z / Sig. / 95% Confidence Interval
Lower Bound / Upper Bound
Residual / .337292 / .052676 / 6.403 / .000 / .248354 / .458080
Intercept + age_14 [subject = id] / UN (1,1) / .624357 / .148062 / 4.217 / .000 / .392262 / .993778
UN (2,1) / -.068440 / .070078 / -.977 / .329 / -.205790 / .068911
UN (2,2) / .151203 / .056470 / 2.678 / .007 / .072721 / .314385
a. Dependent Variable: alcuse.
The prediction equation is Predicted alcuse = .651303 + .270651*Age_14.
Note that it appears that in the population, there is variability about the linear trajectory. So, in the words of the text, “. . . some important within-person variation still remains at level-1 (p < .001). This suggests that it might be profitable to introduce substantive predictors into the level-1 submodel.”
Note that residual variability decreased to .337292 from .561747 in Model A, a 40% reduction in uncertainty of prediction.
The Covariance Parameters output also suggests, in the words of the text, that “. . . there is non-zero variability in both true initial status and true rate of change. This suggests that it is worth trying to use level-2 predictors to explain heterogeneity in each parameter.”
The covariance of slope and intercept is not significantly different from zero. This suggests that there is no true relationship between initial status and rate of change in alcuse.
The relationship is best appreciated when expressed as a correlation.
The correlation is Covariance(slope,intercept)/(sd(slope)*sd(intercept)) = -.068/(.790*.389)=-.22.
Pseudo R2 statistics
RY,Y-hat2: The squared correlation between observed and predicted Ys.
Here’s syntax to get the predicted Ys. It’s the same as above with the addition of /save=fixpred
(Don’t use pred, use fixpred. This saves predicted values based on only the fixed parameters in the model.)
title "Model B".
mixed alcuse with age_14
/print=solution testcov
/method=ml
/fixed = age_14
/random intercept age_14 | subject(id) covtype(un)/save=fixpred.
Here’s a screen shot of the first few predicted Ys
The correlation between alcuse and FIXPRED_1 is .208. So R2 = .2082 = .043.
R2 as the proportionate reduction in residual variance
This value requires a baseline. It then assesses the proportionate reduction in residual variance from that baseline. Often, the unconditional means model, Model A in this chapter, is the baseline model.
The Pseudo R2 = (oe2(Baseline Model) – oe2(Improved Model)) / oe2(Baseline Model)
For Model B, the value is (.562 - .337) / .562 = .400.
The interpretation would be that .40 of the within-person variance in alcuse is explained by the linear relationship to TIME.
As we’ll see, Model B will serve as the Baseline Model for subsequent Pseudo R2 values.