Statistical Analysis of Survival Data

Department of Biostatistics, University of Aarhus

Michael Væth, May 10 2003

STATISTICAL ANALYSIS OF SURVIVAL DATA IN CLINICAL RESEARCH 3

INTERACTIONS – EFFECT MODIFICATION

Consider again the effects of type of infusion and anti-coagulation (AK) treatment on the rate of occurrence of it. In the analysis on day 2, pages 34-37, the effect of AK-treatment was assumed to be the same in the two infusion groups (described by a single multiplier). Is this a reasonable assumption?

Statistical Model: Two factors with interaction

Reference group: Na-lactate without AK-treatment

Event rate for reference group: $\lambda_0(t)$

Assume:

* If Na-lactate is replaced by Glucose, the rate is changed by a factor $\theta_1$.

* If the patient receives AK-treatment, the rate is changed by a factor $\theta_2$.

* If the patient receives both AK-treatment and Glucose, the rate is further changed by a factor $\theta_3$.

Then the event rates in the 4 groups become

Infusion / - anti-coag. / + anti-coag.
Na-lactate / $\lambda_0(t)$ / $\theta_2\,\lambda_0(t)$
Glucose / $\theta_1\,\lambda_0(t)$ / $\theta_1\theta_2\theta_3\,\lambda_0(t)$

Effect of AK-treatment in Na-lactate group: $\theta_2$

Effect of AK-treatment in Glucose group: $\theta_2\theta_3$

No interaction corresponds to $\theta_3 = 1$

Using dummy variables to represent the model.

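Define $x_1 = 1$ for Glucose and 0 for Na-lactate, $x_2 = 1$ for AK-treatment and 0 otherwise, and $x_3 = x_1 x_2$.

Infusion / Anti-coag. / $x_1$ / $x_2$ / $x_3$
Na-lactate / no / 0 / 0 / 0
Na-lactate / yes / 0 / 1 / 0
Glucose / no / 1 / 0 / 0
Glucose / yes / 1 / 1 / 1

The model may then be written as

$$\lambda(t) = \lambda_0(t)\,\exp(\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3)$$

where $\lambda_0(t)$ is the event rate for the reference group and $\beta_1$, $\beta_2$, $\beta_3$ are the logarithms of the three multiplicative factors introduced above.

No interaction corresponds to $\beta_3 = 0$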

NOTE: Interaction means that the effect of one variable on the occurrence of events depends on the value of another variable (i.e. the effect is modified by the other variable).

STATA commands (the two versions produce the same output)

xi:stcox i.group i.ak i.group*i.ak

xi: stcox i.group*i.ak

Output (selected parts only):

i.group _Igroup_1-2 (naturally coded; _Igroup_1 omitted)

i.ak _Iak_0-1 (naturally coded; _Iak_0 omitted)

i.group*i.ak _IgroXak_#_# (coded as above)

************ output omitted here **************

No. of subjects = 85 Number of obs = 85

No. of failures = 27

Time at risk = 9532

LR chi2(3) = 6.64

Log likelihood = -98.231844 Prob > chi2 = 0.0845

------

_t |Haz. Ratio Std. Err. z P>|z| [95% Conf. Int]

------+------

_Igroup_2 | 2.2942 1.052591 1.81 0.070 .9334595 5.63855

_Iak_1 |.3840848 .4105318 -0.90 0.371 .0472731 3.12061

_IgroXak_2_1 |2.904121 3.630981 0.85 0.394 .2504779 33.6713

------

Comments to output

1. Note that the interpretation of the parameters is now different: the hazard ratio for _Igroup_2 describes the difference between Na-lactate and Glucose among patients without AK. For patients with AK the corresponding hazard ratio is the product of the estimates for _Igroup_2 and _IgroXak_2_1.
A similar change in interpretation applies to _Iak_1.

2. The relative rate ratio for the interaction term is not significantly different from 1 (p = 0.39). The likelihood ratio test statistic is 0.81 (computed as 2[98.6349 - 98.2318], see also day 2, page 36), which gives the same conclusion. The assumption of no interaction can therefore not be rejected.

3. Note, however, the extremely wide confidence interval for the interaction term (0.25 to 33.7). Based on 27 events among 85 patients, very little can be said about interactions.
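The figures in comments 1 and 2 can be reproduced by hand. The sketch below simply retypes the hazard ratios and log likelihoods from the output above into scalars (the scalar names are made up for this illustration), so it needs no data in memory.

```stata
* Hazard ratios retyped from the stcox output above
scalar hr_group = 2.2942       // _Igroup_2: Glucose vs Na-lactate, no AK
scalar hr_ak    = 0.3840848    // _Iak_1: AK vs no AK, Na-lactate
scalar hr_inter = 2.904121     // _IgroXak_2_1: interaction factor

* Glucose vs Na-lactate among patients with AK
scalar hr_group_ak = hr_group*hr_inter
* AK vs no AK among patients receiving Glucose
scalar hr_ak_glu = hr_ak*hr_inter
display "Glucose vs Na-lactate, with AK: " %6.2f hr_group_ak
display "AK vs no AK, Glucose group:     " %6.2f hr_ak_glu

* Likelihood ratio test of no interaction (1 degree of freedom)
scalar lr   = 2*(98.6349 - 98.2318)
scalar p_lr = chi2tail(1, lr)
display "LR chi2(1) = " %5.2f lr "   p = " %5.3f p_lr
```

With the data in memory, lincom _Igroup_2 + _IgroXak_2_1, hr issued after the stcox command should give the first of these ratios together with a confidence interval.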

CHECKING THE VALIDITY OF THE MODEL - MORE

PREVIOUSLY (day 2, page 32-33): Checking the validity of a model with a single covariate.

HERE: Checking the validity of a model with several explanatory variables.

Example

Consider a model in which the risk of developing it depends on type of infusion, AK-treatment and sex of the patient.

The variables group, ak and sex are covariates in the model.

xi: stcox i.group i.ak i.sex

This model defines 8 different subgroups of patients. In the model the 8 different event rates are assumed to be proportional.

Checking proportionality:

For each factor (group, ak and sex) consider the model in which this factor has been removed as a covariate and instead used as a stratification variable.

In a stratified Cox regression model a separate baseline hazard rate is estimated for each stratum.

A log-minus-log plot for a stratified Cox model will show if the proportionality between levels of the stratifying factor is reasonable after correction for the remaining factors.

STATA’s command stphplot has an option which allows adjustment for other factors.

STATA commands for checking proportionality in the present example. Each variable in turn is used for stratification while adjusting for the remaining variables.

stphplot , by(group) adjust(ak sex) nolntime

stphplot , by(ak) adjust(group sex) nolntime

stphplot , by(sex) adjust(group ak) nolntime
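According to the stphplot documentation, by() fits a separate Cox model within each level of the variable, whereas the alternative option strata() fits one stratified Cox model, which is the model described in the text. A sketch of the stratified variant, assuming the data are stset and the variables are named as in the commands above:

```stata
* One stratified Cox model per plot; strata() must be combined with adjust()
stphplot , strata(group) adjust(ak sex) nolntime
stphplot , strata(ak) adjust(group sex) nolntime
stphplot , strata(sex) adjust(group ak) nolntime
```

The two variants need not give identical plots, since with by() the coefficients of the adjustment variables are estimated separately in each group.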

Output: three log-minus-log plots (figures not reproduced here), one for each stratification:

Stratified on treatment groups

Stratified on anti-coagulation treatment

Stratified on patient’s sex

Testing the proportional hazards assumption with STATA.

STATA can perform a formal test of the proportional hazards assumption based on so-called Schoenfeld residuals (overall test) and scaled Schoenfeld residuals (separate test for each variable in the model). These residuals must be saved in the stcox command which fits the model to be validated.

Example

stcox group ak sex , ///
    sch(res*) sca(sres*) nolog noshow

stphtest , detail

Options nolog noshow minimize output from stcox. Alternative: To omit output from stcox completely add quietly in front.

OUTPUT from stphtest

Test of proportional hazards assumption

Time: Time

------

| rho chi2 df Prob>chi2

------+------

group | -0.16714 0.66 1 0.4176

ak | -0.20866 1.29 1 0.2562

sex | 0.03580 0.03 1 0.8553

------+------

global test | 1.84 3 0.6057

------
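In later STATA releases stphtest has, according to the current manuals, been replaced by the postestimation command estat phtest, which computes the residuals it needs itself. A sketch of the equivalent call, assuming the data are stset and the variables are named as above:

```stata
* Fit the model to be validated; no residual options are needed
quietly stcox group ak sex

* Separate test for each covariate plus the global test
estat phtest , detail
```

The table produced should have the same layout as the stphtest output shown above.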

Alternative procedures:

Use time-dependent covariates to obtain a formal test of the proportional hazards assumption.

Note: The stcoxkm , by() command (day 2, page 33) allows only one variable at a time, so this command is less useful for models with several variables.

STRATIFIED COX REGRESSION MODELS

Use of stratified Cox models

If the effect of an important factor on the survival time is inadequately described by proportional hazards we may instead consider a model in which this factor enters as a stratifying factor with separate baseline hazards and the remaining factors are usual covariates.

The output contains no regression coefficient (or hazard ratio) for a stratifying factor, but the separate baseline hazards may be plotted and compared.

In STATA this model is specified with the option strata(varnames). Up to 5 stratifying factors are allowed.

Example

xi: stcox i.group , strata(sex)

COX REGRESSION WITH MANY COVARIATES

Typical Dataset:

For each patient: a waiting time t, a status d and socio-demographic, clinical and other variables, some of which may have impact on the prognosis.

Some problems:

1. How should the collected information be represented as covariates? (i.e. choice of categories, coding of variables, transformations of data etc.).

2. How many and which variables should be included?

3. Which variables should be investigated in detail, including scoring of information and interactions?

4. How should the results be presented?

Missing values

For some patients relevant information is missing. Only patients with complete information on the variables in the model are included in the analysis.

Missing values may therefore complicate the analysis and the interpretation of the results considerably.
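Before fitting a model with many covariates it is worth counting how many patients will actually be used. A minimal sketch on five made-up records (the values are invented for illustration and are not from the study); nmiss is a hypothetical helper variable and rowmiss() is the egen function of recent STATA releases. Note that clear removes any data currently in memory.

```stata
* Five invented records with some missing covariate values
clear
input id group ak sex age
1 1 0 1 64
2 2 1 . 58
3 1 . 2 71
4 2 0 1 .
5 2 1 2 49
end

* Number of missing values among the candidate covariates
egen nmiss = rowmiss(group ak sex age)
tabulate nmiss

* Patients with complete information: only these enter the model
count if nmiss == 0
```

After a model has been fitted, count if e(sample) gives the number of observations that were actually used.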

SELECTION OF VARIABLES

Two main strategies:

1. Forward selection

2. Backward elimination

Forward selection

1. Start out with no variables in the model

2. Check each variable separately and include the variable that is most statistically significant.

3. Check the remaining variables one at a time together with the variables selected so far. Add the most significant to the model.

4. Continue this procedure until no new variable is statistically significant.

Backward elimination

1. Start out with all variables in the model

2. Check each variable and exclude the most non-significant.

3. Re-estimate the effects of the remaining variables and remove the variable which is now most non-significant.

4. Continue until all remaining variables are statistically significant.

Hybrid methods exist – stepwise procedures – that allow both inclusion and removal of variables in each step.

STATA

In STATA the command sw can be placed in front of any regression command, including stcox. Options are used to specify details of the variable selection procedure.

Examples

//backward elimination, sig. level 0.1, Wald’s test

sw stcox group ak sex age , pr(0.1)

//forward selection, sig. level 0.05, Wald’s test, sex and age evaluated together

sw stcox group ak (sex age) , pe(0.05)

//backward stepwise, based on likelihood ratio test

sw stcox group ak sex age , pr(0.1) pe(0.05) lr

//forward stepwise, hierarchical (=specified order)

sw stcox group ak sex age , pr(0.1) pe(0.05) forward hier

//backward elimination, group and ak forced in model

sw stcox (group ak) sex age , pr(0.1) lock
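In more recent STATA releases the same procedures are, according to the manuals, requested with the prefix command stepwise, with the selection options placed before the colon. A sketch of the first and the last example in this form, assuming the data are stset; lockterm1 is the newer name of the lock option:

```stata
* backward elimination, sig. level 0.1, Wald's test
stepwise , pr(0.1): stcox group ak sex age

* backward elimination, group and ak forced in model
stepwise , pr(0.1) lockterm1: stcox (group ak) sex age
```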

Forward selection or Backward elimination?

Both approaches have drawbacks:

* In forward selection all conclusions are based on comparisons of “wrong” models.

* Backward elimination may often be infeasible because of too many missing values.

* Both methods are automatic and treat all variables in the same way.

Conclusion?

No nice and easy solution exists. An order of priority of the variables based on knowledge and insight is a must.

Recommendation

Never rely (completely) on automatic variable selection procedures.

PRESENTATION OF RESULTS

A table with only variable names and p-values does not say anything about the size or direction of the effects and is therefore clearly inadequate.

Regression coefficients (with standard errors and p-values) are rather uninteresting if you forget to explain how the variables are coded in the analysis.

The rate ratios, $e^{\beta}$ or exp($\beta$), are usually easier to understand than the regression coefficients.

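If the “final” model includes covariates $x_1, \dots, x_k$ with estimated regression coefficients $\hat\beta_1, \dots, \hat\beta_k$, one may compute an individual prognosis for a future patient with covariate values $x_1^*, \dots, x_k^*$.

For instance, the 5-year survival probability for such a patient can be estimated as

$$\hat S(5) = \hat S_0(5)^{\exp(PI)}$$

where PI is the value of the prognostic index for the patient,

$$PI = \hat\beta_1 x_1^* + \dots + \hat\beta_k x_k^*$$

and $\hat S_0(t)$ is the estimated survival function for the reference group.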
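As a worked illustration of the calculation, suppose the reference group has an estimated 5-year survival of 0.80 and the patient has prognostic index 0.5. Both numbers are invented for the example, as are the scalar names. The estimate is the reference survival raised to the power exp(PI):

```stata
scalar s0_5 = 0.80     // hypothetical 5-year survival, reference group
scalar pi_p = 0.5      // hypothetical prognostic index of the patient

* S(5) for the patient = S0(5)^exp(PI)
scalar s_5 = s0_5^exp(pi_p)
display "Estimated 5-year survival: " %5.3f s_5
```

A positive PI thus lowers the survival probability relative to the reference group, and a negative PI raises it.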

A table giving estimates of the 5-year survival or the median survival time for patients with certain characteristics may also be a useful way to illustrate the implications of the model.

Formulas for standard errors of these quantities are available, but the calculations are not included in the standard statistical software packages.

Other possibilities

A plot of the estimated survival function for patients with particular covariate values.

A plot of the estimated 5-year survival probability or the median survival time against the prognostic index.

STATA

To obtain plots of survival function, integrated hazard or (smoothed) hazard rates for particular values of the covariates use the command stcurve after stcox. The corresponding baseline function must be specified and saved with stcox. Example:

quietly stcox group ak age , basesurv(surv0)

stcurve , survival at(group=1 ak=1 age=50)

After each fit stcox saves a large number of results in system variables that may be accessed and used for further calculations. Example:

mat define rc=e(b)

gen pi=rc[1,1]*(group==2)+rc[1,2]*(ak==1)+rc[1,3]*age
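The linear predictor can also be obtained with predict. A sketch, assuming the stcox fit of group, ak and age above is the most recent estimation and that group is coded 1/2 as in the expression above; pi_xb and pi_ref are hypothetical variable names:

```stata
* Linear predictor b1*group + b2*ak + b3*age from the last stcox fit
predict double pi_xb , xb

* Subtract b1 so that the index is 0 for group 1, ak 0, age 0
generate double pi_ref = pi_xb - _b[group]
```

pi_ref should agree with pi computed from e(b). pi_xb is the version that matches a baseline function saved by stcox, since that baseline refers to all covariates equal to 0.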

COX REGRESSION
TIME-DEPENDENT COVARIATES

In a standard Cox regression the user has two options:

1. The effect of a covariate is described by a single number, a rate ratio, giving the change of the rate if the covariate is increased by one unit. For a dichotomous covariate this is just the ratio of the rates in the two categories.

2. A covariate is used as a stratifying factor. The ratio between the rates in different strata then becomes an unspecified function of follow-up time.

With time-dependent covariates the rate ratio may depend on follow-up time in a specified way. This may e.g. be used to obtain a statistical test of the proportional hazards assumption.

Example:

Occurrence of it in heart infarct patients (continued).

Based on a log-minus-log plot (see day 2, page 33, and the stphplot section above) we concluded that the rates in the two infusion groups could be assumed to be proportional.

STATA also has a command, stphtest, giving a statistical test of the hypothesis of proportional hazard rates (see the section on testing the proportional hazards assumption above).

An alternative test of the proportional hazards assumption can be constructed using a time-dependent covariate.

The idea is to fit a model in which the log(rate ratio) depends linearly on follow-up time and in this model test if the trend with follow-up time is significantly different from 0. This is accomplished using the option tvc.

Example

The following STATA command fits a model in which the regression coefficient (i.e. the log(hazard ratio)) is a linear function of follow-up time. The nolog option is included to omit output from the iterative estimation process.

xi: stcox i.group , tvc(i.group) nolog

Output

Cox regression -- Breslow method for ties

No. of subjects = 85 Number of obs = 85

No. of failures = 27

Time at risk = 9532

LR chi2(2) = 6.76

Log likelihood = -98.167651 Prob > chi2 = 0.0340

------

_t |Haz. Ratio Std. Err. z P>|z| [95% Conf. Int.]

------+------

rh |

_Igroup_2 | 8.243608 9.967054 1.74 0.081 .770832 88.1607

------+------

t |

_Igroup_2 | .9865033 .0134885 -0.99 0.320 .9604174 1.01300

------

Note: Second equation contains variables that continuously vary with respect to time; variables are interacted with current values of _t.

Comments:

1. The results show that the difference between the groups decreases with increasing follow-up time (the hazard ratio is less than 1 for _Igroup_2 in the section with time-varying estimates). This tendency is however not statistically significant (p = 0.32), so the data are consistent with the hypothesis of proportional rates in the two groups.

2. The hazard ratio estimated for group (in the first section) is considerably larger than before, since the parameter now gives the hazard ratio at the start of follow-up (the intercept at time = 0); the negative trend with time explains this change. Note also that the standard error of this estimate is large, since the shortest follow-up time is 28.5 hours, so time 0 lies outside the observed range.

3. Since this analysis does not reject proportional hazards one would then go on to fit the simpler model without time-dependence, i.e. the analysis shown on day 2, page 29-31.
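The fitted model implies a hazard ratio between the groups that changes with follow-up time t as HR(t) = 8.24 × 0.9865^t. The sketch below retypes the two estimates from the output and evaluates this expression at a few times, assuming that analysis time is measured in hours (as the 28.5 hours in comment 2 suggests). The scalar names are invented for the illustration.

```stata
scalar hr0  = 8.243608     // rh section: hazard ratio at t = 0
scalar hr_t = 0.9865033    // t section: factor per unit of time

* Hazard ratio at the shortest follow-up time and at t = 100
scalar hr_28  = hr0*hr_t^28.5
scalar hr_100 = hr0*hr_t^100
display "HR at t = 28.5: " %5.2f hr_28
display "HR at t = 100:  " %5.2f hr_100

* Time at which the fitted hazard ratio equals 1
scalar t_one = -ln(hr0)/ln(hr_t)
display "HR = 1 at t = " %5.0f t_one
```

The extrapolated value at t = 0 is thus considerably larger than the ratios within the observed range of follow-up, which is the point of comment 2.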

Occasionally, one may want to consider other forms of dependence on time, e.g. log(time). STATA has an option texp which allows specification of this.

Example

xi: stcox i.group , tvc(i.group) texp(ln(_t))