SW 983 – SURVIVAL ANALYSIS AND COX REGRESSION
Context
Last week we saw how logistic regression and discriminant analysis could be used to predict group membership. Group membership could be based on a demographic characteristic (e.g., gender, race), an intervention (e.g., treatment vs. control) or an event (e.g., recidivism, graduation, maintenance in the community, exit from welfare). When group membership is based on the occurrence of an event, we are frequently interested not just in whether the event occurred but also in the time to the event from some starting point (e.g., assignment to treatment). Survival analysis (also known as life table analysis or event history analysis) provides an appropriate analytic tool for this situation. Because we are interested in the time between two events, survival analysis is applied to longitudinal or follow-up studies. A group of people who enter a study during a fixed time period and are followed over time is referred to as a cohort, and survival analysis is sometimes referred to as cohort analysis.
As in MR, LR and DA, the predictor (independent) variables can be continuous or categorical. In the case of the latter, dummy variable coding is again used to incorporate the categorical variable into the right-hand side of the equation (the variate). The variate will again yield useful information about the direction and magnitude of the relationship between individual predictor variables and the likelihood of occurrence of the event while controlling for other predictor variables in the equation. Tests of significance are provided for each predictor, along with hazard ratios (reported as Exp(B) in SPSS). These are read much like odds ratios in LR, but they describe the rate at which the event occurs over time rather than the odds of the event.
Why not just calculate the average time to the event for each group and perform a t-test?
The problem with comparing group means is that not all of the subjects in your study may be observed for the same period of time, and for some the event will not have occurred during the observation period. These observations are referred to as censored cases. For these cases you only know that the event did not occur during the observation period. You do not know if or when the event might occur after you stopped observing. Survival analysis provides a method for dealing with censored data that does not bias your analysis, provided the censored cases do not differ systematically from those that remain under observation.
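To see why censoring undermines a simple comparison of means, here is a small Python sketch. The six event times and the 12-month follow-up window are invented for illustration, not taken from any study. It shows that both obvious workarounds (treating censored times as event times, or dropping censored cases) understate the true average time to the event.

```python
# Hypothetical data: the true number of months until the event for six cases.
true_times = [3, 5, 8, 14, 20, 30]
follow_up = 12  # assumed length of the observation period, in months

# What the researcher actually sees: time is capped at the end of follow-up,
# and a flag records whether the event was observed (1) or censored (0).
observed = [min(t, follow_up) for t in true_times]
event = [1 if t <= follow_up else 0 for t in true_times]

# The mean we would like to know but cannot compute from the observed data.
true_mean = sum(true_times) / len(true_times)

# Naive approach 1: treat censored times as if they were event times.
naive_all = sum(observed) / len(observed)

# Naive approach 2: drop the censored cases and average the rest.
event_times = [t for t, e in zip(observed, event) if e == 1]
naive_events_only = sum(event_times) / len(event_times)

print("true mean:", round(true_mean, 2))
print("mean treating censored times as events:", round(naive_all, 2))
print("mean of uncensored cases only:", round(naive_events_only, 2))
```

Both naive means fall below the true mean because the longest times are exactly the ones cut off by the end of observation.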
SPSS and Survival Analysis
We will use two modules in SPSS – Life Tables and Cox Regression.
Life Tables
The basic idea of the life table is to subdivide the period of observation after a starting point, such as the beginning of a placement in foster care, into smaller time intervals – say, months. For each interval (month), all children who have been observed at least that long are used to calculate the probability of a “terminal event” occurring in that interval. Despite the name, the event can be a positive one, such as reunification or adoption. The probabilities estimated from each of the intervals are then used to estimate the overall probability of the event occurring at different time points.
Output includes the life table (see below), charts and, if a factor (i.e., a grouping variable) is provided, statistical tests comparing its levels.
Reading a Life Table
Interval Start Time. The beginning value for each interval. Each interval extends from its start time up to the start time of the next interval.
Number Entering This Interval. The number of cases that have survived to the beginning of the current interval.
Number Withdrawn during This Interval. The number of cases entering the interval for which follow-up ends somewhere in the interval. These are censored cases; that is, these are cases for which the event of interest has not occurred at the time of last contact.
Number Exposed to Risk. This is calculated as the number of cases entering the interval minus one half of those withdrawn during the interval. The adjustment assumes that withdrawn cases were, on average, at risk for half of the interval.
Number of Terminal Events. The number of cases for which the event of interest occurs within the interval.
Proportion of Terminal Events. An estimate of the probability of the event of interest occurring in an interval for a case that has made it to the beginning of that interval. It is computed as the number of terminal events divided by the number exposed to risk.
Proportion Surviving. The proportion surviving is 1 minus the proportion of terminal events.
Cumulative Proportion Surviving at End. This is an estimate of the probability of surviving to the end of an interval. It is computed as the product of the proportion surviving this interval and the proportion surviving all previous intervals.
Probability Density. The probability density is an estimate of the probability per unit time of experiencing an event in the interval.
Hazard Rate. The hazard rate is an estimate of the probability per unit time that a case that has survived to the beginning of an interval will experience an event in that interval.
Standard Error of the Cumulative Proportion Surviving. This is an estimate of the variability of the estimate of the cumulative proportion surviving.
Standard Error of the Probability Density. This is an estimate of the variability of the estimated probability density.
Median Survival Time. The time point at which the cumulative survival function equals 0.5. That is, it is the time point by which half of the cases are expected to have experienced the event.
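The column definitions above can be sketched in a few lines of Python. This is an illustration only: the function name `life_table`, the ten cases and the two-month interval width are all made up, and the density and hazard formulas are the standard actuarial ones, which are assumed (not confirmed) to match what SPSS reports.

```python
def life_table(times, events, width):
    """Build an actuarial life table.

    times  -- follow-up time for each case
    events -- 1 if the terminal event occurred at that time, 0 if censored
    width  -- length of each interval, in the same units as times
    """
    rows = []
    n_intervals = int(max(times) // width) + 1
    entering = len(times)
    cum_surv = 1.0
    for i in range(n_intervals):
        start = i * width
        end = start + width
        # Cases whose follow-up ends somewhere in this interval.
        here = [e for t, e in zip(times, events) if start <= t < end]
        withdrawn = sum(1 for e in here if e == 0)
        terminal = sum(1 for e in here if e == 1)
        # Withdrawn cases count as being at risk for half the interval.
        exposed = entering - withdrawn / 2
        prop_terminal = terminal / exposed if exposed > 0 else 0.0
        prop_surviving = 1 - prop_terminal
        cum_prev = cum_surv
        cum_surv = cum_prev * prop_surviving
        # Standard actuarial estimates (assumed to correspond to SPSS output).
        density = (cum_prev - cum_surv) / width
        hazard = 2 * prop_terminal / (width * (1 + prop_surviving))
        rows.append({
            "start": start, "entering": entering, "withdrawn": withdrawn,
            "exposed": exposed, "terminal": terminal,
            "prop_terminal": prop_terminal, "prop_surviving": prop_surviving,
            "cum_surv": cum_surv, "density": density, "hazard": hazard,
        })
        entering -= withdrawn + terminal
    return rows


# Hypothetical follow-up data for ten cases (months; 1 = event, 0 = censored).
times = [1, 1, 2, 3, 3, 4, 5, 5, 6, 7]
events = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
table = life_table(times, events, width=2)
for row in table:
    print(row["start"], row["entering"], row["withdrawn"], row["exposed"],
          row["terminal"], round(row["cum_surv"], 3), round(row["hazard"], 3))
```

Each printed row corresponds to one line of the life table. The median survival time would fall in the first interval whose cumulative proportion surviving drops to 0.5 or below.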
Graphic Displays
Survival Chart – Plot of the cumulative survival values by time interval.
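Hazard Chart – Plot of the hazard rate by time interval.
One Minus Survival Chart – Plot of one minus the cumulative survival values by time interval, i.e., the cumulative proportion of cases that have experienced the event.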
Statistical Tests
Survival trends for different levels of a factor can be compared using the Wilcoxon (Gehan) test. Overall and pairwise comparisons can be produced.
Cox Regression
Like Life Tables and Kaplan-Meier survival analysis, Cox Regression is a method for modeling time-to-event data in the presence of censored cases. However, Cox Regression allows you to include predictor variables (covariates) in your models. For example, you could construct a model of time to adoption based on characteristics of the child and services provided. Cox Regression will handle the censored cases correctly, and it will provide estimated coefficients for each of the covariates, allowing you to assess the impact of multiple covariates in the same model. You can also use Cox Regression to examine the effect of continuous covariates.
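In the Cox model, the hazard for a case is a baseline hazard multiplied by exp(B1*X1 + B2*X2 + ...), so each coefficient B translates into a hazard ratio, exp(B). The Python sketch below shows how such coefficients could be read. The covariate names and coefficient values are invented for illustration and do not come from any fitted model.

```python
import math

# Hypothetical coefficients (B) from a Cox model of time to adoption.
# The covariate names and values are made up for this example.
coefs = {"age_at_entry": -0.10, "received_services": 0.50}


def hazard_ratio(b, change=1.0):
    """Hazard ratio for a given change in one covariate, others held fixed."""
    return math.exp(b * change)


def relative_hazard(case_a, case_b, coefs):
    """Hazard of case A relative to case B.

    The baseline hazard is the same for both cases, so it cancels and only
    the difference between the two linear predictors matters.
    """
    lp_a = sum(coefs[k] * case_a[k] for k in coefs)
    lp_b = sum(coefs[k] * case_b[k] for k in coefs)
    return math.exp(lp_a - lp_b)


# Two hypothetical children of the same age, one of whom received services.
child_a = {"age_at_entry": 5, "received_services": 1}
child_b = {"age_at_entry": 5, "received_services": 0}

for name, b in coefs.items():
    print(name, "hazard ratio per one-unit increase:", round(hazard_ratio(b), 3))
print("child A relative to child B:", round(relative_hazard(child_a, child_b, coefs), 3))
```

A hazard ratio above 1 means the event tends to occur sooner as the covariate increases, and a ratio below 1 means it tends to occur later.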
Choice of Life Tables, Kaplan-Meier and Cox Regression
The Life Tables procedure uses an actuarial approach to this kind of analysis (known generally as Survival Analysis). The Kaplan-Meier Survival Analysis procedure uses a slightly different method of calculating life tables that does not rely on partitioning the observation period into smaller time intervals. The Kaplan-Meier method is recommended when you have a small number of observations, so that each survival time interval would contain only a few cases. If you have variables that you suspect are related to survival time or variables that you want to control for (covariates), use the Cox Regression procedure. If your covariates can have different values at different points in time for the same case, use Cox Regression with Time-Dependent Covariates.
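For comparison with the actuarial approach, here is a minimal Python sketch of the Kaplan-Meier (product-limit) idea, in which the survival estimate is updated at each observed event time rather than once per fixed interval. The function name `kaplan_meier` and the data are hypothetical. The sketch follows the usual convention that a case censored at a given time is still counted as at risk for events at that same time.

```python
def kaplan_meier(times, events):
    """Return a list of (time, estimated survival) at each distinct event time.

    times  -- follow-up time for each case
    events -- 1 if the event occurred at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        died = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        censored = sum(1 for ti, e in zip(times, events) if ti == t and e == 0)
        if died > 0:
            # Multiply by the proportion of at-risk cases surviving time t.
            survival *= 1 - died / at_risk
            curve.append((t, survival))
        # Cases with an event or censored at t leave the risk set afterwards.
        at_risk -= died + censored
    return curve


# Hypothetical follow-up data for ten cases (months; 1 = event, 0 = censored).
times = [1, 1, 2, 3, 3, 4, 5, 5, 6, 7]
events = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```

Because no interval width has to be chosen, the estimate uses the exact event times, which is why the method suits small samples.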