Supplementary Appendix 2: Treatment and assessment of missing data

Contents

Statistical methods and notes

Two types of incomplete data

Correction for selection bias of longitudinal cohort

Correction for bias due to loss to follow-up among longitudinal cohort

Sensitivity analysis

Impact of missing data on logistic regression models

Results

Patient flow

Inverse probability weighting so longitudinal cohort reflects unselected hospital cohort

Multiple Imputation

Sensitivity analysis of primary outcomes

Conclusion

Figures

Figure 1a—Patient Flow Diagram (Chart abstraction and patient recruitment)

Figure 1b—Patient Flow Diagram (Patient follow up)

Figure 2: Non-linear but monotonic association between days in hospital and probably of being in longitudinal cohort (knots placed at 2, 7 and 14 days)

Figure 3: Calibration plot of model used to calculate inverse probability weights

Figure 4: Kaplan-Meier curves with and without inverse probability weighting

Fgiure 5: Sensitivity anlaysis of physical recovery

Tables

Table 1: Baseline Characteristics of Study Patients (Completers vs. LTFU)

Table 2: Clinical outcomes (Completers vs. LTFU)

Table 3: Logistic model used to derive probability of patients being in longitudinal cohort.

Table 4: Logistic regression models predicting 12-month Physical Recovery after applying inverse probability weighting

Table 5: Logistic regression models predicting 12-month survival after applying inverse probability weighting

Table 6: Linear regression models predicting 12-month SF-36 physical function scores among survivors after applying inverse probability weighting

Table 7: Pearson correlation matrix:

Table 8: Estimates of 12 month PF recovery rate with 95% confidence intervals.

References

Statistical methods and notes

Two types of incomplete data

There are two sources of incomplete data in this study that could potentially bias estimates of the 12-month mortality and physical functioning (PF) recovery rates. Firstly, the longitudinal cohort required consent and active participation of the family members and was thus a selected group and potentially not representative of the target population of all patients at least 80 years old admitted to an ICU for at least 24 hours. For example, patients actively dying, patients in the ICU for a short period, and patients who did not have caregivers present early in their stay may have been less likely to be enrolled in the longitudinal cohort. Secondly, 17% of patients in the longitudinal cohort were lost to follow-up before the primary 12-month assessment. The survival status and PF recovery status are unknown in these patients and it is conceivable that these data are not missing at random (NMAR). We had a few mechanisms to determine mortality and once a patient was known to have died their 12-month mortality and PF recovery status were known for all future time points. For this reason, one might argue that the survival rate and PF recovery rate may be higher in the patients with missing outcome. On the other hand, among patients discharged alive from the hospital, follow-up required us to contact the patient or caregiver and it is conceivable that it was more difficult to make contact with a caregiver when the patient had died. As a result, it is uncertain as to which direction any bias, if present, is most likely to take.

The dataset is virtually complete (<0.5% missing) for baseline and hospital data on all patients in the longitudinal cohort. We have baseline and hospital information (but not follow-up outcomes) on a hospital cohort which is an unselected representative sample of our target population at the 22 participating ICUs. The hospital cohort data were collected by chart review for all consecutive patients over 80 admitted to the participating centres until enrolment was stopped at individual centres due to budgetary constraints. Enrolment in the hospital cohort was unrelated to any patient or hospital characteristics other than admission date. Therefore, we believe that the hospital cohort is a random sample of the target population of patients at least 80 years old and in the ICU for > 24 hours.

Correction for selection bias of longitudinal cohort

We employed inverse probability weighting (IPW) so we could use the selected longitudinal cohort sample to obtain population estimates for the unselected hospital cohort. We applied this method to all of our 12-month outcomes including all of the regression modelling. With this method we perform the usual analyses while weighting the observations inversely to the estimated probability of being in the longitudinal cohort compared to the hospital cohort. The estimation of these weights is based on logistic regression where the outcome variable is an indicator of the patient belonging to the longitudinal or hospital cohort and the predictors are carefully selected baseline or hospital outcome variables available in both cohorts that appear to most strongly predict enrolment in the longitudinal cohort. Study centre-level characteristics were not included in estimating weights as these were predictive of differences in enrolment between centres, but not of patients within a given centre. The probability of a patient belonging to the longitudinal cohort compared to the hospital cohort estimates the proportion of patients with similar characteristics who were enrolled in the longitudinal cohort. So for example, if patient A in the longitudinal cohort had half the probability of being selected as patient B, then patients with characteristics similar to patient A are underrepresented in the longitudinal cohort compared to patients with characteristics similar to patient B by a factor of ½. By weighting the analysis inversely to this proportion, we can use the longitudinal cohort to estimate population parameters from the unselected hospital cohort. This IPW approach is frequently used for dealing with missing data or non-representative samples surveys1.

In an unadjusted analysis, we compared all common baseline characteristics and hospital outcomes between the hospital and longitudinal cohort (table 1 and 2 main manuscript). Variables different between the cohorts at p<0.25 were treated as candidate variable in a backward stepwise logistic regression with entry and exit criteria of 0.15. Variables that were redundant (for example hospital mortality was considered instead of ICU mortality) or which reflected data collection artefacts rather than patient characteristics were not considered in the model. For example, centre was a strong predictor of being in the longitudinal cohort but we did not include this in our model because centre reflects differences in study management between centres rather than differences in the characteristics of patients in the longitudinal and hospital cohorts. Strong interactions were considered and the linearity of continuous variables was assessed. When necessary, continuous variables were represented by a restricted cubic spline with knots selected based on prior understanding of the data2.

Odds ratios of the final selected model are presented. The model’s goodness of fit (calibration) is depicted by a calibration plot and tested by the Hosmer-Lemeshow goodness of fit test. The ability to discriminate between the two cohorts is summarized by the c-statistic (i.e. area under the receiver operating curve, a.k.a. the concordance index). The more the patient characteristics differ between the two cohorts, the higher the c-statistics will be and the stronger the effect of the IPW will be. On the other hand, if the model does not discriminate at all (c-statistic = 0.5) then this implies the two cohorts are similar on the observed patient characteristics and IPW is unnecessary and will not alter the results.

Correction for bias due to loss to follow-up among longitudinal cohort

Of the 610 patients participating in the longitudinal cohort 17% have unknown 12 month mortality or return to baseline PF status. We have the primary outcome collected at baseline, 3, 6 and 9 months. Some baseline characteristics, and especially the earlier measures of outcome, were strongly correlated with the final 12 month primary outcome making this dataset a good candidate for multiple imputation (MI)3.

We used fully conditional multiple imputation with a logistic model to multiply impute the primary outcome of 12-month PF recovery while allowing for arbitrary missing pattern among the other variables4. In addition to PF recovery, the imputation model included: age, APACHE 2, primary admission diagnosis, Charlson comorbidity index, IQ code, and the SF-36 PF domain at months 3, 6, 9 and 12. The model was implemented by the FCS statement of PROC MI in SAS Version 9.45. One hundred randomly imputed datasets were created and analyzed and results were combined according to the method of Rubin6.

By assuming that the multivariate correlation structure of the missing data is similar to that among the observed data, MI randomly imputes values of the missing data from their expected distribution conditional on the observed data. Thus MI assumes that the outcome is missing at random after conditioning on all observed data. This assumption, known as missing at random (MAR), is weaker and much more realistic than the missing completely at random assumption (MCAR) required by complete case analysis. For example, if patients with higher PF recovery rates were more likely to be lost to follow-up then the data would not be MCAR, but would be MAR if those same patients had observed information that was predictive of their unobserved PF recovery rate.

Sensitivity analysis

The IPW and MI methods should reduce bias, and could entirely eliminate bias if the IPW weighting model and the MI model are correct. Both methods make the MAR assumption. Unfortunately, there is no way to test the MAR assumption and biases could remain if the data is not missing at random (NMAR). Therefore, we have used a pattern mixture approach to perform a sensitivity analysis to assess the possible range of 12 month survival and PF recovery given the amount of missing data7. The pattern mixture approach is simple, intuitive, transparent and allows us to model the outcome over the entire possible range of missing data assumptions.

With the pattern mixture approach, we model the unknown true sample value as a weighed mixture of the observed and missing data. For example, in the simple case of a binary proportion, the true sample proportion Ptrue=m*Pmiss+(1-m)*Pobs where m is the proportion of missing data, Pobs is the proportion among observed data and Pmiss is the unobserved proportion among the missing data. By plotting Ptrue(y-axis) versus the possible range of Pmiss(x-axis), we can see what the true value would be under the entire range of possible missing values, and we can assess we how much Pmisswould have to differ from Pobs to induce meaningful changes in Ptrue.

Since our primary outcome is a composite of survival and PF recovery among survivors, we have created a contour plot which shows the true PF recovery rate across the possible range of survival and PF recovery among survivors. In this contour plot, the x-axis and y-axis represent the survival rate and the PF recovery rate of survivors respectively among the patients lost to follow-up. Contours on the plot depict the resulting overall true PF recovery rate.

Impact of missing data on logistic regression models

We used inverse weighting in our logistic regression models so that our longitudinal cohort could be used to make inference to the unselected hospital cohort. This was necessary because when both predictors and outcomes are missing, as is the case when an entire patient is not represented in the sample, then logistic regression can provide biased estimates.

However, with logistic regression we do not need to worry about bias due loss to follow-up among the longitudinal cohort. This is because if no predictors are missing, as is essentially our case, the odds ratio estimates from logistic regression remain unbiased even when the outcome is NMAR. 8 For this reason, we did not perform multiple imputation or sensitivity analysis on the logistic regression models. The estimates of odds and risks derived from the logistic regression would be biased if the outcome was NMAR, but we only report odds ratios which remain unbiased.

Results

Patient flow

Figure 1a depicts the patient selection to the longitudinal and hospital cohorts. Figure1b shows the follow-up among the longitudinal cohort. 610 patients were enrolled into the longitudinal cohort. 103 patients were lost to follow-up by one year and 2 patients missed their assessment of baseline physical functioning leaving 505 of 610 (83%) patients evaluable for the primary outcome of recovery to baseline physical functioning.Most of the baseline characteristics of the 505 patients with known primary outcome were similar to the 105 with unknown primary outcome,except patients missing their primary outcome were on average 1 year younger and had a SOFA score one point lower. Although a similar proportion of patients missing their primary outcome lived at home before hospital admission, they were more likely to have lived with a family member then alone compared to patients with known 12-month PPS change. Table 2 shows that some hospital outcomes differed between patients with known and unknown primary outcome. This is expected since the primary outcome is known for any patient who died in the hospital. Thus the ICU and hospital mortality rate is zero among patient lost to follow-up, because they survived at least till hospital discharge, compared to 17% and 31% among completers.

Inverse probability weighting so longitudinal cohort reflects unselected hospital cohort

Table 3 provides the model selected to estimate the probability of belonging to the longitudinal cohort compared to the unselected hospital cohort. In particular, higher APACHE II and increased mortality were independently associated with a lower chance of being enrolled in the longitudinal cohort, but use of vasoactive drugs, longer hospital stay and live discharge from the hospital were associated with a greater chance of being enrolled in the longitudinal cohort.

Site was a strong predictor of enrolment, but we decided to leave it out of the model because it reflected differences in study management between sites rather than differences in the characteristics of patients in the two cohorts. Furthermore, since centre enrolment rates varied from 72% to 0%, including centre in the model resulted in some patients being weighted over a hundred times more than others. Finally, we noted that when centre was included in the prediction model, the corresponding weighting scheme resulted in estimates of 12 month mortality and PF recovery that were essentially unchanged from the unweighted estimates confirming our concern that the prediction model including centre was representing something other than differences between the patient populations of the hospital and longitudinal cohorts.

Figure 2 shows the non-linear association between hospital length of stay and probability of being enrolled in the longitudinal cohort. The knots were selected a-priori at 2, 7 and 14 days; there was a known threshold at 7 days because the recruitment focused on patients in the ICU for less than 7 days.

Figure 3 shows the calibration plot of the actual cohort membership (y-axis) versus the predicted membership (x-axis). The 610 points near the top of the figure represent patients in the longitudinal cohort, while the 894 points near the bottom of the figure represent the patients in the hospital cohort. The points have been randomly jittered slightly so that they are easier to distinguish. The non-parametric LOESS smoother shows the local proportion of actual membership in the longitudinal cohort against the model prediction. The dotted 45 degree line represents perfect calibration. It may be seen that the calibration of this model in this sample is excellent. Also it may be noted that the cohorts overlap slightly over the entire range of predicted probabilities which is required for the longitudinal cohort to be able to estimate the hospital cohort. The points in the top left of the figure represent the relatively few subjects in the longitudinal cohort with low probability of enrolment. These patients will be weighted more heavily than the relatively over-represented patients depicted at the top right of the figure. For example, a patient with an estimated probability of 0.1 will be weighted 8 times higher than a patient with a probability of 0.8. The inverse probability weights ranged from 1.3 to 11.2.

Figure 4 shows the Kaplan-Meier estimates of 12-month survival using the original unweighted data and inverse probability weighting. As expected the survival is slightly lower in the weighed estimate reflecting the lower survival in the hospital cohort compared to the observed longitudinal cohort. In particular, the unweighted and weighed 12-month survival estimates were 56% and 50% respectively. Both the weighted and unweighted Kaplan-Meier estimates still assume that censoring is uninformative; that is that censoring time is unrelated to the mortality time. This assumption is addressed in the sensitivity analysis.

Return to baseline physical functioning at 12 months (the primary outcome of this study) is defined as alive with a SF-26 physical functioning (PF) domain least 10 points and not 10 points or more below baseline. 103 of the 610 patients in the longitudinal cohort were lost to follow-up by 12 months and two additional patient missed their baseline physical functioning assessment leaving a total of 505/610 (83%) of patients evaluable for this outcome. Theunweighted and weighed estimates of this outcome were 24% (95% CI, 18% to 30%) and 22% (95% CI, 15% to 29%) reflecting the longitudinal and overall hospital cohorts respectively. Similar to the Kaplan-Meier survival estimates, the estimates of PF recovery assume that the probability of being missing is unrelated to the rate of PF recovery. This may or may not be true and leads us to our next section which may reduce potential bias by assuming only that data is missing at random after controlling for other observed data (MAR) rather than missing completely at random (MCAR). Our final section will then perform a sensitivity allowing for the data to be not missing at random even after controlling for all observed data (NMAR).