
Early Interventions with Soldiers Returning from Iraq: Randomization by Platoon

Supplementary Material: Imputation Analyses and d-effects for Small Battlemind Versus Battlemind Debriefing

Correspondence concerning this supplemental material should be addressed to Paul Bliese, Division of Psychiatry and Neuroscience, Walter Reed Army Institute of Research, 503 Robert Grant Ave., Silver Spring, MD 20910-7500; email:

Imputation Analyses

A series of analyses was conducted to determine whether there were detectable levels of bias in the sample of 1,060 who completed surveys at both times relative to the sample of 2,297 who were assigned to the four treatment arms. In these analyses, main-effect results from those who completed the survey at both times (n = 1,060) were compared to results based on the complete sample of 2,297, for whom missing time 2 data were imputed. In the complete sample of 2,297, a minimum of 1,237 time 2 values (2,297 - 1,060 = 1,237, or 53.9% of the total) were estimated for each outcome; the number of estimated values was higher for outcomes on which respondents in the matched sample of 1,060 failed to complete the relevant survey items.

In the comparison, we interpret differences in results between the sample of 1,060 and the complete sample of 2,297 as evidence of bias. Likewise, we interpret congruence in results between the two samples as evidence that those who provided data at both times reflect the properties of the broader sample assigned to the treatment arms. Recall that the study design was based on following units rather than individuals; therefore, the loss of data at time 2 was expected.

Missing data were imputed in two different ways. Both approaches used information about the person for whom the missing value was generated, an approach typically found to provide more accurate estimates (Engels & Diehr, 2003). Last observation carried forward (LOCF) was the first imputation method used. In LOCF, time 1 values were carried forward to replace missing time 2 values. LOCF is intuitively appealing as a method for detecting bias in this case because any non-random loss of information should be reflected in the LOCF results. For instance, if a large proportion of individuals who were highly symptomatic at time 1 in one condition failed to show up at time 2, the LOCF results would differ from the results based on completers only.
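
To make the LOCF step concrete, the sketch below copies each respondent's time 1 score into the missing time 2 slot. The data frame and column names (dat, pcl_t1, pcl_t2) are hypothetical placeholders used only for illustration; this is a minimal sketch, not the study's analysis code.

# Minimal LOCF sketch; 'dat', 'pcl_t1', and 'pcl_t2' are illustrative names.
locf_impute <- function(dat, t1_var, t2_var) {
  missing_t2 <- is.na(dat[[t2_var]])
  # Carry the time 1 observation forward into the empty time 2 slot
  dat[[t2_var]][missing_t2] <- dat[[t1_var]][missing_t2]
  dat
}

# Toy example: two respondents are missing the time 2 score
dat <- data.frame(pcl_t1 = c(28, 45, 33), pcl_t2 = c(30, NA, NA))
dat_locf <- locf_impute(dat, "pcl_t1", "pcl_t2")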

Multiple imputation (MI) was the second technique used to estimate missing values (Schafer, 1997). MI uses maximum likelihood to generate plausible estimates (along with random error) for missing values based on information from the non-missing variables in the dataset. MI has been shown to produce unbiased parameter and variance estimates even in cases with considerable missing data (Schafer & Graham, 2002). In the current implementation of MI, the proportion of missing data was high; therefore, following Schafer and Graham (2002), a total of 20 imputed datasets were created, and the results from the 20 datasets were summarized using the methods described by Rubin (1987). These methods provide point estimates and 95% confidence intervals. The MI routines were implemented using the mix package in R (Schafer, 1997).
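
The MI workflow can be sketched as follows with the mix package plus a hand-rolled implementation of Rubin's (1987) pooling rules. The toy data matrix, the variable layout (categorical variables coded as consecutive positive integers in the leading columns), the seed, and the number of data-augmentation steps are all illustrative assumptions; this is a minimal sketch under those assumptions, not the study's actual code.

# Sketch of multiple imputation with the mix package (Schafer, 1997).
library(mix)

# Toy data: categorical variables (arm, gender) occupy the leading columns
# and are coded as consecutive positive integers; all values are synthetic.
set.seed(1)
n   <- 200
dat <- cbind(arm    = sample(1:4, n, replace = TRUE),
             gender = sample(1:2, n, replace = TRUE),
             combat = rnorm(n, 10, 3),
             pcl_t1 = rnorm(n, 30, 10),
             pcl_t2 = rnorm(n, 28, 10))
dat[sample(n, 80), "pcl_t2"] <- NA        # induce missing time 2 values

m         <- 20                           # imputed datasets (Schafer & Graham, 2002)
s         <- prelim.mix(dat, 2)           # 2 categorical columns
theta_hat <- em.mix(s)                    # ML estimate used as a starting value
rngseed(1234)                             # must be set before da.mix()/imp.mix()

completed <- vector("list", m)
for (i in seq_len(m)) {
  theta          <- da.mix(s, theta_hat, steps = 100)  # data augmentation draw
  completed[[i]] <- imp.mix(s, theta, dat)             # one completed dataset
}

# Rubin's (1987) rules for pooling m estimates and their standard errors
# (e.g., one treatment-arm coefficient per completed dataset).
pool_rubin <- function(est, se, conf = 0.95) {
  m     <- length(est)
  q_bar <- mean(est)                      # pooled point estimate
  w     <- mean(se^2)                     # within-imputation variance
  b     <- var(est)                       # between-imputation variance
  t_var <- w + (1 + 1 / m) * b            # total variance
  df    <- (m - 1) * (1 + w / ((1 + 1 / m) * b))^2
  half  <- qt(1 - (1 - conf) / 2, df) * sqrt(t_var)
  c(estimate = q_bar, lower = q_bar - half, upper = q_bar + half)
}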

In the analyses, results were contrasted for (a) the completer-only sample of 1,060, (b) the LOCF sample in which only time 2 outcomes were imputed, and (c) the MI sample in which all missing data were imputed. The specific analyses focus on the parameter estimates for each treatment arm based on models that control for preexisting differences in rank, gender, unit type (combat arms or other), and combat exposure. Note that the results are not directly comparable to the mixed-effects results in the main manuscript because, in the main manuscript, models were estimated using time 1 responses as a covariate. Time 1 values could not be used as a covariate in the missing data comparison because LOCF would produce a situation in which over 50% of the sample had exactly the same value at time 1 and time 2.
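
As a sketch, the per-arm parameter estimates can be obtained from a regression of the time 2 outcome on a treatment-arm factor plus the four covariates, with Stress Education set as the referent. The variable names and factor labels below are placeholders (including a generic label for the fourth arm), and the platoon-level random effect used in the main manuscript is deliberately omitted; this illustrates the general form of the model, not the exact code used in the study.

# Illustrative data frame; names and labels are placeholders, not the study's.
set.seed(2)
n   <- 300
dat <- data.frame(
  arm        = factor(sample(c("StressEd", "BattlemindDebrief",
                               "SmallGroupBattlemind", "FourthArm"),
                             n, replace = TRUE)),
  rank       = factor(sample(c("Junior", "NCO", "Officer"), n, replace = TRUE)),
  gender     = factor(sample(c("M", "F"), n, replace = TRUE)),
  unit_type  = factor(sample(c("CombatArms", "Other"), n, replace = TRUE)),
  combat_exp = rnorm(n, 10, 3),
  pcl_t2     = rnorm(n, 28, 10)
)

# Stress Education as the referent, mirroring the main analyses
dat$arm <- relevel(dat$arm, ref = "StressEd")

fit <- lm(pcl_t2 ~ arm + rank + gender + unit_type + combat_exp, data = dat)
summary(fit)$coefficients    # one row per treatment-arm contrast and covariate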

Results

Tables 1 through 4 provide the results of the analyses for each outcome (PCL, PHQ-D, Sleep, and Stigma, respectively). Note that LOCF could not be used for sleep because the sleep scale was not assessed at time 1. MI-based estimates of the time 2 sleep values, however, could be generated using the other variables in the dataset.

Table 1 lists the results for the PCL and provides no evidence of differences in the parameter estimates among (a) completers only, (b) LOCF, and (c) MI. For instance, the estimate for Battlemind Debriefing versus Stress Education is -0.13 when based on the sample of 1,060; for LOCF the value is -0.11, and for MI it is -0.10. Importantly, the MI-based 95% confidence interval for this parameter ranges from -0.23 to 0.03, suggesting that the observed value of -0.13 for completers only and the -0.11 value for LOCF are well within the expected range of normal variability around -0.10. Indeed, the values for all predictors in the completers-only sample (and the LOCF sample) fall within the 95% MI-based confidence interval estimates.
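
The comparison described above amounts to a simple interval check; the snippet below reproduces it using the values quoted for the Battlemind Debriefing versus Stress Education estimate on the PCL.

# MI-based 95% CI and the two point estimates quoted above for the
# Battlemind Debriefing vs. Stress Education contrast on the PCL.
ci_mi <- c(lower = -0.23, upper = 0.03)
est   <- c(completers_only = -0.13, locf = -0.11)
est >= ci_mi["lower"] & est <= ci_mi["upper"]   # both TRUE: estimates fall in the CI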

Details for PHQ-D (Table 2), Sleep problems (Table 3), and Stigma (Table 4) consistently show that the parameter estimates based on (a) completers only and (b) LOCF are within the 95% confidence interval estimates from the MI samples. It is worth highlighting that in no case do the parameter estimates of any variable from the completer-only or LOCF analyses fall outside the 95% MI-based confidence interval estimates. Overall, these results show a high degree of consistency and provide no evidence that analyses based only on the 1,060 who completed the survey would be biased relative to the larger sample who were administered the intervention. Based on these results, we believe it is more conservative to restrict the analysis to the smaller sample of observed data and rely less on imputed data, given the large number of missing values that would otherwise need to be imputed.

Effect sizes for Small Battlemind versus Battlemind Debriefing

Supplemental analyses contrasted small group Battlemind Training with Battlemind Debriefing by using the Debriefing condition as the referent (recall that the analyses in the manuscript used Stress Education as the referent). With the exception of changing the referent, the analyses were identical to those presented in Table 3 of the main manuscript. Results of these analyses revealed no significant differences between the small group Battlemind Training and Battlemind Debriefing conditions. Effect sizes were estimated based on the values in Table 4 of the main manuscript and are presented here as Table 5. Although some of the d-effects are of moderate magnitude, it is important to keep in mind that the corresponding effects were not statistically significant.
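
For reference, a d-type effect size for this contrast can be computed as an adjusted mean difference divided by a pooled standard deviation. The helper function and all numbers below are hypothetical placeholders, not values taken from Table 5 or the main manuscript.

# Cohen's d: mean difference divided by the pooled within-group SD.
# All inputs below are hypothetical placeholders.
cohens_d <- function(mean_diff, sd1, sd2, n1, n2) {
  sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
  mean_diff / sd_pooled
}

# Hypothetical small group Battlemind Training vs. Battlemind Debriefing contrast
cohens_d(mean_diff = -1.8, sd1 = 9.5, sd2 = 10.2, n1 = 270, n2 = 260)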

References

Engels, J. M., & Diehr, P. (2003). Imputation of missing longitudinal data: A comparison of methods. Journal of Clinical Epidemiology, 56, 968-976.

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

Schafer, J. L. (1997). Analysis of incomplete multivariate data. London: Chapman & Hall.

Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177.


Note to Tables 1-4: LOCF = Last Observation Carried Forward; MI = Multiple Imputation. LOCF could not be calculated for Sleep Problems (Table 3) because the sleep measure at Time 1 was not the same as at Time 2.
