Appendix B
Annotated SAS code for running the 12 different models in the empirical example
*Note: code in ALL CAPS designates information that will certainly change for different data sets. Code in lowercase will not necessarily change.*
*** Fixed Effect Model ***;
procglmdata=DATA;
class CLUSTER;
model Y= ELL|TREAT PRETEST|TREAT CLUSTER /nointsolutionclparm;
* Predictor variables appear in the model statement. Vertical bars are SAS code for including an interaction *;
estimate"Intercept" CLUSTER 000000111111/ divisor=6;
* Data are coded and sorted such that treatment group cluster are first and control group clusters are second. Divisor is a shortcut for dividing weights by the number of cluster in each group if there are equal numbers of control and treatment clusters*;
estimate"Treat" CLUSTER 111111 -1 -1 -1 -1 -1 -1/ divisor=6;run;quit;
*** ML ***;
procmixeddata=DATA method=ml;
* REML is the SAS default so method=ml instructs SAS to use ML *;
model Y= ELL|TREAT PRETEST|TREAT /solutioncl;
random INT /sub=CLUSTER;
* Random effects appear in the random statement. Sub= tells SAS what your cluster variable is *;run;
*** REML ***;
procmixeddata=DATA;
model Y= ELL|TREAT PRETEST|TREAT /solutioncl;
random INT /sub=CLUSTER;
* Random effects appear in the random statement. Sub= tells SAS what your cluster variable is *;run;
*** Kenward-Roger ***;
procmixeddata=DATA;
model Y= ELL|TREAT PRETEST|TREAT /solutionddfm=krcl;
* Ddfm= KR invokes the Kenward-Roger correction.
random INT /sub=CLUSTER;
* Random effects appear in the random statement. Sub= tells SAS what your cluster variable is *;run;
*** GEE ***;
procglimmixdata=DATA empirical;
* The empirical option tells SAS to use a sandwich estimator for the standard errors;
model Y= ELL|TREAT PRETEST|TREAT /solutioncl;
random _residual_ /sub=CLUSTER type=cs;
*_residual_ tells SAS not to use any random effects but to calculate the clustered residuals based on the variable in the subject option. Type = is where the working structure goes, CS specifies a compound symmetric working structure*;run;
*** MBN ***;
procglimmixdata=hlmempirical=mbn;
* empirical=MBN tells SAS to use a sandwich estimator with the Morel-Bokossa-Neerchal correction*;
model Y= ELL|TREAT PRETEST|TREAT /solutioncl;
random _residual_ /sub= CLUSTER type=cs;run;
*** Mancl-DeRouen ***;
procglimmixdata=DATA empirical=firores;
* empirical=firores tells SAS to use a sandwich estimator with the Mancl-Derouen correction*;
model Y= ELL|TREAT PRETEST|TREAT /solutioncl;
random _residual_ /sub=CLUSTER type=cs;run;
*** Kauermann-Carroll ***;
procglimmixdata=DATA empirical=root;
* empirical=ROOT tells SAS to use a sandwich estimator with the Kauermann-Carroll correction*;
model Y= ELL|TREAT PRETEST|TREAT /solutioncl;
random _residual_ /sub=CLUSTER type=cs;run;
*** Fay-Grabaurd ***;
procglimmixdata=DATA empirical=firoeeq;
* empirical=MBN tells SAS to use a sandwich estimator with the Fay-Grabuard correction*;
modelY= ELL|TREAT PRETEST|TREAT /solutioncl;
random _residual_ /sub=CLUSTER type=cs;run;
** MCMC, Inverse Gamma prior***;
procmcmcdata=DATA nbi=10000nmc=50000 thin=50seed=1091940
monitor=(b0 b1 b2 b3 b4 b5 s2g s2)
statistics (percentage= (2.55097.5));
* nbi is the number of burn-in interation. Nmc is the number of Monte Carlo interations. Percentage 2.5 and 97.5 gives the 95% credible interval*;
array gamma[NUMBER OF CLUSTERS];
parms b0 0 b1 0 b2 0 b3 0 b4 0 b5 0 ;
parmsu: 0;
parms s2 1 ;
parms s2g 1;
* parms give starting values for the parameters in the model *;
prior b: ~ normal(0, var = 10000);
* Non informative prior for the regression coefficients*;
prioru: ~ normal(0, var = s2g);
* var = s2g places the prior on the variance *;
prior s2: ~ igamma(0.01, scale = 0.01);
* the inverse gamma has hyperparameters 0.01 and 0.01. The “:” is a wildcard character meaning “any variable that starts with s2” *;
mu = b0 + b1 *TREAT+b2*ELL+b3*(TREAT*ELL)+ b4*(PRETEST)+ b5*(TREAT*PRETEST)+ u[CLUSTER];
model Y ~ normal(mu, var = s2);run;
*** MCMC Uniform Prior***;
procmcmcdata=DATA nbi=10000nmc=50000 thin=50seed=1091940
monitor=(b0 b1 b2 b3 b4 b5 s2g s2)
statistics (percentage= (2.55097.5));
* nbi is the number of burn-in interation. Nmc is the number of Monte Carlo interations. Percentage 2.5 and 97.5 gives the 95% credible interval*;
array gamma[NUMBER OF CLUSTERS];
parms b0 0 b1 0 b2 0 b3 0 b4 0 b5 0 ;
parmsu: 0;
parms s2 1 ;
parms s2g 1;
* parms give starting values for the parameters in the model *;
prior b: ~ normal(0, var = 10000);
* Non informative prior for the regression coefficients*;
prioru: ~ normal(0, sd = s2g);
* sd = s2g places the prior on the standard deviation instead of the variance as recommended by Gelman (2006)*;
prior s2: ~ uniform(0,10);
* the uniform range is 0 to 10. The “:” is a wildcard character meaning “any variable that starts with s2”;
mu = b0 + b1 *TREAT+b2*ELL+b3*(TREAT*ELL)+ b4*(PRETEST)+ b5*(TREAT*PRETEST)+ u[CLUSTER];
model Y ~ normal(mu, sd = s2);run;
*** MCMC Half Cauchy Prior***;
procmcmcdata=DATA nbi=10000nmc=50000 thin=50seed=1091940
monitor=(b0 b1 b2 b3 b4 b5 s2g s2)
statistics (percentage= (2.55097.5));
* nbi is the number of burn-in interation. Nmc is the number of Monte Carlo interations. Percentage 2.5 and 97.5 gives the 95% credible interval*;
arrayu[NUMBER OF CLUSTERS];
parms b0 0 b1 0 b2 0 b3 0 b4 0 b5 0;
parmsu: 0;
parms s2 1 ;
parms s2g 1;
* parms give starting values for the parameters in the model *;
half=pdf("Cauchy", s2, 0, 4) + pdf("Cauchy", -s2, 0, 4);
halfg=pdf("Cauchy", s2g, 0, 4) + pdf("Cauchy", -s2g, 0, 4);
* Use SAS programming statement to specify a half-Cauchy distribtion*;
prior b: ~ normal(0, var = 10000);
* Non informative prior for the regression coefficients*;
prioru: ~ normal(0, sd = s2g);
* sd = s2g places the prior on the standard deviation instead of the variance as recommended by Gelman (2006)*;
prior s2 ~ general(half, lower=1e-10);
* Use general statement to tell SAS that the user is defining their own distribution: the halft defined above. Lower = 1 e-10 is required because the half-Cauchy distribution only has support for positive values.
prior s2g~general(halfg,lower=1e-10);
mu = b0 + b1 *TREAT+b2*ELL+b3*(TREAT*ELL)+ b4*(PRETEST)+ b5*(TREAT*PRETEST)+ u[CLUSTER];
model Y ~ normal(mu, sd = s2);run;