Accounting for Covariance Heterogeneity in Estimates of the Standardized Mean Change

Scott B. Morris

Illinois Institute of Technology

Paper Presented at the 24th Annual Conference of the Society for Industrial and Organizational Psychology, New Orleans, LA, April 2009.



Accurate meta-analysis requires a thorough understanding of the statistical properties of effect size estimates. In particular, estimates of the sampling variance of individual effect sizes are used in most meta-analysis procedures, such as calculating the weighted mean effect size, testing for homogeneity of effect size, and estimating between-study random variance components. Given their central role in meta-analytic calculations, it is important to understand the conditions under which these sampling variance estimates will be accurate.

In experimental designs and program evaluation research, the effect size is typically defined as the standardized mean difference between groups. Methods for meta-analyzing the standardized mean difference (e.g., Hedges & Olkin, 1985) have been developed under the common assumptions of independence, normality and homogeneity of variance. Research has shown that violating these assumptions can bias meta-analytic results (Grissom & Kim, 2001; Harwell, 1997; Morris, 2008). The current study examines the assumption that the variance and covariance of outcome measures are equal across groups.

Effect Size for Pretest-Posttest-Control Designs

The standardized mean difference can be estimated from a variety of experimental designs. In research on organizational interventions or training programs, it is common to measure effectiveness in terms of the mean change in an outcome variable over time. Optimally, a study will include a control group that does not receive the intervention under study, so that the change in the treated group can be compared to the change without treatment.

The pretest-posttest-control (PPC) design is widely regarded as an effective research design for program evaluation research (Cook & Campbell, 1979). Morris (2008) described an effect size for the PPC design, where the standardized effect of the treatment is defined as the difference between groups in the mean pre-post change, divided by the common standard deviation,

\delta_{PPC} = \frac{(\mu_{T2} - \mu_{T1}) - (\mu_{C2} - \mu_{C1})}{\sigma} , ( 1 )

where μ_gt is the mean of group g at time t, and σ is the standard deviation of the untreated population.

This definition of effect size differs from the one most commonly used in meta-analytic work, which is based on the difference in posttest means (Hedges & Olkin, 1985),

\delta = \frac{\mu_{T2} - \mu_{C2}}{\sigma} . ( 2 )

When participants are randomly assigned to groups, the population pretest means can be assumed to be equal across groups, and the two definitions of effect size are equivalent. However, in quasi-experimental research, where the groups cannot be assumed to be equal at baseline, the PPC effect size provides a better index of the treatment effect (Morris & DeShon, 2002).
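The equivalence is immediate from rearranging the numerator of Equation 1:

```latex
\delta_{PPC}
  = \frac{(\mu_{T2}-\mu_{T1}) - (\mu_{C2}-\mu_{C1})}{\sigma}
  = \frac{(\mu_{T2}-\mu_{C2}) - (\mu_{T1}-\mu_{C1})}{\sigma} ,
```

so when the pretest means are equal (μ_T1 = μ_C1), the second term in the numerator vanishes and the expression reduces to Equation 2.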

An individual PPC study will provide estimates of the mean and standard deviation of scores in the treatment and control groups at both pretest and posttest. Let the sample mean and standard deviation of group g at time t be indicated by M_gt and SD_gt, respectively. A sample estimate of the effect size can be computed from the sample means and the pooled pretest standard deviation (Carlson & Schmidt, 1999; Morris, 2008),

d_{PPC} = c_P \left[ \frac{(M_{T2} - M_{T1}) - (M_{C2} - M_{C1})}{SD_{pre}} \right] , ( 3 )

where the pooled standard deviation is defined as

SD_{pre} = \sqrt{\frac{(n_T - 1)SD_{T1}^2 + (n_C - 1)SD_{C1}^2}{n_T + n_C - 2}} . ( 4 )

The effect size estimate also incorporates a correction for bias suggested by Hedges & Olkin (1985). The correction factor is a function of the degrees of freedom associated with the effect size estimate (df = n_T + n_C − 2),

c_P = 1 - \frac{3}{4\,df - 1} . ( 5 )
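As an illustration, Equations 3 through 5 combine into a single computation from reported summary statistics. The sketch below (function and argument names are mine, not from the paper) assumes only the four cell means, the two pretest standard deviations, and the group sizes are available:

```python
import math

def d_ppc(m_t_pre, m_t_post, m_c_pre, m_c_post,
          sd_t_pre, sd_c_pre, n_t, n_c):
    # Degrees of freedom associated with the pooled pretest SD
    df = n_t + n_c - 2
    # Pooled pretest standard deviation (Eq. 4)
    sd_pre = math.sqrt(((n_t - 1) * sd_t_pre ** 2
                        + (n_c - 1) * sd_c_pre ** 2) / df)
    # Hedges & Olkin small-sample bias correction (Eq. 5)
    c_p = 1 - 3 / (4 * df - 1)
    # Standardized difference in mean pre-post change (Eq. 3)
    return c_p * ((m_t_post - m_t_pre) - (m_c_post - m_c_pre)) / sd_pre
```

For example, with a 5-point mean gain in the treatment group against a 1-point gain in the control group, a pooled pretest SD of 2, and 20 participants per group, the uncorrected estimate of 2.0 shrinks slightly to about 1.96 after the bias correction.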

Heterogeneity of Variance and Covariance

In many situations, it is reasonable to expect individual differences in the treatment effect. Inconsistencies in program delivery can result in differences in the training received by each individual. Further, the same training program may not be equally effective for all participants. Consequently, the amount of change due to training will be greater for some individuals than for others. For those in the training group, the post-treatment variance will reflect both the initial individual differences as well as differences in the effectiveness of training. This will result in post-training scores that have greater variance than pre-training scores (Carlson & Schmidt, 1999; Cook & Campbell, 1979). Alternatively, if individuals with low initial status on the outcome variable show greater change than those with high initial status, one would expect the posttest variances to be smaller than at pretest (Alliger & Katzman, 1997).

The same phenomenon that can lead to inflated or deflated posttest variances will also tend to produce covariance heterogeneity. The correlation between pre- and posttest scores will depend on several factors, one of which is the degree to which stable individual differences are correlated with the amount an individual changes over time. Individuals who are initially higher on the outcome may show greater improvement than those with lower initial status. Conversely, it may be those with the lowest initial performance that show the greatest change. In either case, the dependence of change on initial status will impact the size of the pre-post correlation, and consequently, this correlation will likely differ between treatment and control groups.

The following sections develop a method to account for variance and covariance heterogeneity when conducting a meta-analysis. This will be followed by analyses of the magnitude of covariance heterogeneity to be expected in program evaluation research, and its impact on meta-analytic results.

Statistical Theory for Meta-Analytic Estimates of Effect Size

The statistical theory for meta-analysis was developed by Hedges (1981) for the comparison of independent groups at a single point in time, but can be readily generalized to studies using the PPC design. We first develop the statistical theory using a general definition of the effect size, and then apply the theory to effect size estimates for the PPC design.

A general form of the standardized mean difference for a variety of research designs can be written as an estimate ψ̂ of a contrast among means divided by an estimate SD of the common standard deviation,

d = \frac{\hat{\psi}}{SD} . ( 6 )

The form of ψ̂ will depend on the particular research design. For the PPC design, ψ̂ is the mean pre-post change in the treatment group minus the mean pre-post change in the control group.

Let A be defined as the ratio of the sampling variance of the mean contrast to the variance of individual scores,

A = \frac{VAR(\hat{\psi})}{\sigma^2} . ( 7 )

It can be shown that the effect size estimate is distributed as \sqrt{A} times a non-central t distribution (Huynh, 1989). From this, the sampling variance of the effect size estimate is defined as

VAR(d) = c_P^2 \left( \frac{df}{df - 2} \right) A \left( 1 + \frac{\delta^2}{A} \right) - \delta^2 . ( 8 )

PPC design with homogeneous pre-post correlation. As noted above, the sampling distribution of the effect size depends on the ratio, A, of the variance of the numerator of the effect size (i.e., the mean contrast) to the variance of raw scores in the untreated population. For d_PPC, this ratio is given by

A = \frac{VAR\left[ (M_{T2} - M_{T1}) - (M_{C2} - M_{C1}) \right]}{\sigma^2} . ( 9 )

If the variance and pre-post correlation are assumed to be equal across control and treatment groups, the variance of the mean contrast is

VAR(\hat{\psi}) = \frac{2\sigma^2 (1 - \rho)}{\tilde{n}} , ( 10 )

where

\tilde{n} = \frac{n_T\, n_C}{n_T + n_C} . ( 11 )

The ratio of the variance of the mean contrast to the variance of individual scores is

A = \frac{2(1 - \rho)}{\tilde{n}} = \frac{2(1 - \rho)(n_T + n_C)}{n_T\, n_C} . ( 12 )

Under this assumption, the variance of d_PPC is

VAR(d_{PPC}) = c_P^2 \left( \frac{df}{df - 2} \right) \frac{2(1 - \rho)}{\tilde{n}} \left( 1 + \frac{\tilde{n}\,\delta^2}{2(1 - \rho)} \right) - \delta^2 , ( 13 )

which is equivalent to the formula presented in Morris (2008).
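Equations 11 through 13 translate directly into code. In the sketch below (function and variable names are mine), delta and rho stand for the population effect size and common pre-post correlation, which in practice would be replaced by sample estimates:

```python
def var_d_ppc_hom(delta, rho, n_t, n_c):
    df = n_t + n_c - 2
    c_p = 1 - 3 / (4 * df - 1)           # bias correction (Eq. 5)
    n_tilde = n_t * n_c / (n_t + n_c)    # effective n (Eq. 11)
    a = 2 * (1 - rho) / n_tilde          # variance ratio (Eq. 12)
    # Sampling variance of the bias-corrected effect size (Eq. 13)
    return c_p ** 2 * (df / (df - 2)) * a * (1 + delta ** 2 / a) - delta ** 2
```

Note that the sampling variance shrinks as the pre-post correlation increases, which is why the PPC design can be substantially more precise than a posttest-only comparison.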

PPC design with heterogeneous variance and covariance. Morris (2005) presented a method for computing the sampling variance of effect sizes that does not assume homogeneity of variance. However, this work was limited in that it assumed equal pre-post correlations across groups. In the following, this assumption is relaxed.

Different assumptions about the pattern of heterogeneity of variance and covariance will lead to different forms for the variance of the mean contrast, which can then be used to derive the appropriate sampling variance. To the extent that there are individual differences in treatment effectiveness, it is likely that the posttest variance in the treatment group will differ from the variance of pretest scores. This same phenomenon may cause the pre-post correlation to differ between the treatment and control groups.

Because conditions that do not receive the treatment will be unaffected by variability in treatment effectiveness, it is assumed that pretest scores in both groups and posttest scores in the control group all have variance equal to that of the untreated population, σ². However, the posttest variance in the treatment group, σ²_T2, may differ from the other cells. From these assumptions, the variance of the mean contrast is,

VAR(\hat{\psi}) = \frac{\sigma^2 + \sigma_{T2}^2 - 2\rho_T\, \sigma\, \sigma_{T2}}{n_T} + \frac{2\sigma^2 (1 - \rho_C)}{n_C} , ( 14 )

where rT and rC are the pre-post correlation in the treatment and control groups, respectively.

Replacing the treatment group posttest variance with a multiple of the untreated population variance, where θ = σ_T2 / σ,

\sigma_{T2}^2 = \theta^2 \sigma^2 , ( 15 )

the variance of the mean contrast can be written,

VAR(\hat{\psi}) = \sigma^2 \left[ \frac{1 + \theta^2 - 2\rho_T \theta}{n_T} + \frac{2(1 - \rho_C)}{n_C} \right] . ( 16 )

Consequently, the ratio of the variance of the mean contrast to the variance of the untreated population is

A = \frac{1 + \theta^2 - 2\rho_T \theta}{n_T} + \frac{2(1 - \rho_C)}{n_C} , ( 17 )

and the sampling variance of d_PPC would be

VAR(d_{PPC}) = c_P^2 \left( \frac{df}{df - 2} \right) A \left( 1 + \frac{\delta^2}{A} \right) - \delta^2 . ( 18 )
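A sketch of the heterogeneity-adjusted variance (Equations 17 and 18), assuming estimates of the variance ratio θ = σ_T2/σ and the group-specific pre-post correlations are available; the function and parameter names are mine, not from the paper:

```python
def var_d_ppc_het(delta, rho_t, rho_c, theta, n_t, n_c):
    df = n_t + n_c - 2
    c_p = 1 - 3 / (4 * df - 1)          # bias correction (Eq. 5)
    # Eq. 17: theta is the ratio of the treatment-group posttest SD
    # to the untreated-population SD; rho_t and rho_c are the
    # group-specific pre-post correlations
    a = ((1 + theta ** 2 - 2 * rho_t * theta) / n_t
         + 2 * (1 - rho_c) / n_c)
    # Eq. 18
    return c_p ** 2 * (df / (df - 2)) * a * (1 + delta ** 2 / a) - delta ** 2
```

When θ = 1 and ρ_T = ρ_C, Equation 17 reduces to Equation 12 and this function returns the same value as the homogeneous-case formula; inflating the treatment posttest variance (θ > 1) increases the sampling variance, which is the bias documented by Morris (2005).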

Accounting for Covariance Heterogeneity in Meta-Analysis

Meta-analysis procedures can be easily adapted to allow for group differences in pre-post correlations. All that is needed is to obtain estimates of sampling variance for individual effect sizes that are appropriate for the data, and then to use these variance estimates when calculating meta-analytic statistics. Variance estimates are used in the calculation of the precision-weighted mean effect size, as well as homogeneity statistics and estimates of random variance components (Hedges & Olkin, 1985). By substituting the appropriate variance formula (i.e., Equation 18), existing meta-analysis procedures will appropriately account for within-study covariance heterogeneity.

The impact of the corrected formulas is likely to be seen primarily when evaluating the variance of effect sizes across studies. When conducting a meta-analysis, it is common to test for homogeneity of effect size or to estimate random between-study variance components in order to determine whether the magnitude of the treatment effect differs across studies. The commonly used Q-test (Hedges & Olkin, 1985) compares the observed variance to the variance expected due to sampling error. This latter term is basically the average of the individual sampling variance estimates from each study. Therefore, to the extent that the sampling variance estimates for the individual studies are inaccurate, the Q-test will be biased.
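To make the role of the per-study variances concrete, here is a minimal sketch of the precision-weighted mean effect size and the Q homogeneity statistic (standard Hedges & Olkin formulas; the function name is mine). The variances passed in are exactly where a corrected formula such as Equation 18 would enter:

```python
def weighted_mean_and_q(effects, variances):
    # Precision weights: inverse of each study's sampling variance
    w = [1.0 / v for v in variances]
    # Precision-weighted mean effect size
    d_bar = sum(wi * di for wi, di in zip(w, effects)) / sum(w)
    # Q statistic: weighted squared deviations from the mean
    q = sum(wi * (di - d_bar) ** 2 for wi, di in zip(w, effects))
    return d_bar, q
```

Underestimated sampling variances inflate the weights and therefore inflate Q, producing the excess Type I error rates described below.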

In a related line of research, Morris (2005) examined violations of the assumption that scores have equal variance across time and group. This research demonstrated that the standard variance formulas used in meta-analysis, which assume homogeneity, will tend to underestimate the actual sampling variance when posttest variances are inflated. This inaccuracy can be quite large under realistic conditions, and failure to account for heterogeneity of variance also resulted in greatly inflated Type I error rates in the Q-test for homogeneity of effect size.

The formula in Equation 18, which allows for different pre-post correlations, should provide the most accurate estimate of sampling variance, and therefore the most accurate meta-analytic results. However, a potential problem with this approach is that separate estimates of the correlations may not be available. Many studies do not report the pre-post correlations at all, let alone separately by group. Therefore, the researcher may have no choice but to use a pooled estimate that does not differentiate between groups.

Given the difficulty of obtaining the necessary estimates, it is important to examine how much impact covariance heterogeneity will have on variance estimates, and to what extent assuming equal correlations will bias meta-analytic results. In order to do this, it is first necessary to determine the extent to which pre-post correlations are expected to differ across groups. The following section presents a structural model of individual differences in treatment effectiveness, and uses this model to explore the magnitude of covariance heterogeneity to be expected in program evaluation research.

Model of Individual Treatment Effectiveness

Models of individual change have become common in growth curve analysis (Bryk & Raudenbush, 1988), and the following model is consistent with those approaches. An experiment is conceptualized as a random sample of N participants from a common population, who are assigned to either a control or treatment group (groups C and T, respectively). To keep the discussion general, I will not specify the method of assigning individuals to groups, which could be either through random assignment as in a traditional experiment, or through some other process as is common in quasi-experimental designs (Cook & Campbell, 1979). Each individual is potentially measured on an outcome variable at two time periods. Pretest measures are taken before the treatment is administered, and posttest measures are taken at some point in time after the treatment group has received the intervention (indicated by time 1 and 2, respectively). Let Ytij represent the score at time t of person i in group j.

The pretest score will be influenced by both stable individual differences on the outcome (π), as well as random error representing temporal instability in the outcome variable (ε). The pretest score of individual i in treatment condition j can be represented by

Y_{1ij} = \mu + \pi_{ij} + \varepsilon_{1ij} . ( 19 )

The parameters of this model reflect the common population from which participants were sampled, which we will refer to as the untreated population. That is, μ is the mean of the untreated population, and σ² = VAR(π + ε) is the variance of the untreated population.

After the treatment has been administered, scores for those in the treatment group will be influenced by their initial individual differences, as well as the effect of treatment. Let D_ij represent the change produced by the treatment for person i in group j. The posttest scores would then be represented by

Y_{2ij} = \mu + \pi_{ij} + D_{ij} + \varepsilon_{2ij} . ( 20 )

If we assume that there is no treatment effect in the control group (i.e., D_iC = 0 for all i), the posttest scores in the control group have the same form as Equation 19, and will only differ from the pretest scores due to temporal instability (ε).

All ε and π in both groups are assumed to have means of 0. The mean treatment effect in the treatment group can be nonzero and will be indicated by μ_D. In the treatment group, D_iT and π_iT are assumed to have a bivariate normal distribution with variances σ²_D and σ²_π, respectively, and covariance σ_Dπ. This covariance term allows for the possibility that the effectiveness of treatment depends on the initial status. All other terms are assumed to be independent of each other. In the control group, D_iC has variance 0, and π_iC has variance σ²_π equal to that in the treatment group. In both groups, ε_1ij and ε_2ij are assumed to have constant variance σ²_ε.