S-Plus script file for analyzing data according to the model proposed in
M.A.Croon & M.J.P.M. van Veldhoven (2006) Predicting group level outcome variables from variables measured at the individual level: A latent variable multilevel model. Psychological Methods.
Introductory comments:
1. The data are collected in two different S-Plus data frames:
· data frame data1 contains the group level explanatory variables z1, z2,… and the group level dependent variable y in that order;
· data frame data2 contains the group identification variable g and the individual level explanatory variables x1, x2,…
· Important: The script file presented here assumes that at least one group level explanatory variable is included in the analysis. If no such variables are available, the present script file needs modification.
2. Notes on the group identification conventions:
· Groups should be identified by the consecutive integers1 to ng = number of different groups
· The individual level variable g specifies to which group each individual belong
· The data in data frame data1 should be stacked in the order of the group identification numbers.
3. The script file creates a data frame daf which contains for each group the following variables in that order:
ü The group level explanatory variables labelled z1, z2, …;
ü The unadjusted group means for the individual level explanatory variables. These variables are labelled xmean1, xmean2,….;
ü The adjusted group means for the individual level explanatory variables. These variables are labelled xtilde1, xtilde2, …;
ü The group level dependent variable y.
4. The regression analysis of y on the group level explanatory variables z and the adjusted group means for the individual level explanatory variables xtilde is carried out by calling the S-Plus function lm. The model to be used should be specified in the statement that defines the formula object fmt in the first executable statement of the script file.
#
#
# Script file Croon & van Beldhoven: revised 8/14/2006
#
# Analysis based on the multilevel latent variable model
# with group and individual explanatory variables
#
# Data frame data1 contains group variables z + y
#
# Data frame data2 contains individual variables g + x
#
#
#
# The regression model that is fitted has to be specified by
# the format fmt that
# specifies the names of the variables in the corrected model
#
# fmt is a formula specifying the regression model for y on the
# group explanatory variables x and the corrected group means xtilde
# Example: fmt <- as.formula (y~z1+z2+z3+xtilde1+xtilde2)
fmt <- as.formula(y~z1+z2+xtilde1+xtilde2)
#
# Script file calls a user defined function called: varnames
#
varnames <- function(name, m)
{
test <- is.character(name)
if(!test)
stop("Variable names should be alphanumeric!")
v <- paste(name, 1:m, sep = "")
v
}
ng <- dim(data1)[1]
nt <- dim(data2)[1]
g <- data2[,1]
ns <- tapply(rep(1,nt),g,sum)
mgroup <- dim(data1)[2]-1
mind <- dim(data2)[2]-1
x <- as.matrix(data2[,1+(1:mind)])
z <- as.matrix(data1[,1:mgroup])
y <- data1[,1+mgroup]
mux <- apply(x,2,mean)
muz <- apply(z,2,mean)
dz <- z - matrix(rep(muz,ng),ncol=mgroup,byrow=T)
xmean <- matrix(0,ng,mind)
for (i in 1:mind){
xmean[,i]<- tapply(x[,i],g,mean)
}
vv <- var(cbind(z,xmean))
ind1 <- 1:mgroup
ind2 <- mgroup + (1:mind)
vzz <- vv[ind1,ind1,drop=F]
vzxi <- vv[ind1,ind2,drop=F]
mm <- matrix(rep(c(xmean),rep(ns,mind)),ncol=mind)
d <- x -mm
mse <- t(d) %*% d/(nt-ng)
vu <- mse
d <- mm - matrix(rep(mux,nt),ncol=mind,byrow=T)
msa <- t(d) %*% d /(ng-1)
cc <- (nt - sum(ns^2)/nt)/(ng-1)
vxi <- (msa-mse)/cc
xtilde <- matrix(0,ng,mind)
r2 <- solve(vzz,vzxi)
r1 <- vxi - t(vzxi) %*% r2
id <- diag(mind)
for (i in 1:ng){
p <- solve(r1 +vu/ns[i],r1)
q <- r2 %*% (id-p)
xtilde[i,] <- xmean[i,] %*% p +mux %*% (id-p) +dz[i,] %*% q
}
daf <- data.frame(z,xmean,xtilde,y)
dimnames(daf)[[2]]<- c(varnames("z",mgroup),varnames("xmean",mind),varnames("xtilde",mind),"y")
#
#
#
# Regression analysis according to model specified by fmt
#
res <- lm(fmt,daf)
#
# White-Davidson-MacKinnon correction for heteroscedasticity
#
tm <- unlist(fmt[[3]])
lab <- tm[tm != "+"]
u <- cbind(rep(1,ng),as.matrix(daf[lab]))
e <- res$residuals
p <- solve(t(u) %*% u)
h <- diag(u %*% p %*% t(u))
d <- e^2/(1-h)
v <- p %*% t(u) %*% diag(d) %*% u %*% p
se <- sqrt(diag(v))
estim <- summary(res)$coefficients[,1:2]
estim <- cbind(estim,se)
dimnames(estim)[[2]] <- c("b","uncorrected se","corrected se")
estim