STAT 405 - BIOSTATISTICS

Handout 21 – Survival Analysis: Cox Proportional Hazards Regression

In this section, we will examine regression models for the hazard function h(t). Before examining the form of these models, we discuss the role of the hazard function in a general sense.

As noted earlier, the log rank test carried out in PROC LIFETEST is useful when time to event is important (i.e., not just the occurrence of an event). Though not discussed here, this test can be extended to control for other variables by adding covariates to the model.

However, another approach to analyzing survival data is much more convenient: proportional hazards regression. As with other regression models, the identification of significant covariates and the interpretation of the estimated model coefficients are of primary concern.

Cox Proportional Hazards Model
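Under a proportional hazards model, the hazard function h(t) is modeled as

h(t) = h0(t)exp(β1u1 + β2u2 + … + βkuk),

where β1, β2, …, βk are the regression coefficients to be estimated.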

Comments:

  1. The terms u1, u2, …, uk represent a collection of independent variables, created from the covariates x1, x2, …, xp.
  2. The function h0(t) represents the baseline hazard at time t; this is the hazard for a person with u1 = u2 = … = uk = 0. Note that this function cannot be negative.
  3. The exponential function is used so the hazard function is positive for all t.
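The comments above can be illustrated with a short Python sketch (Python is used here only for arithmetic; it is not part of the SAS or R workflow in this handout). The baseline hazard h0, the coefficients, and the covariate values are all hypothetical, chosen only to show that the hazard stays positive and that the ratio of two hazards does not depend on t.

```python
import math

def hazard(t, baseline_hazard, betas, u):
    """Return h(t) = h0(t) * exp(beta_1*u_1 + ... + beta_k*u_k)."""
    linear_predictor = sum(b * x for b, x in zip(betas, u))
    return baseline_hazard(t) * math.exp(linear_predictor)

def h0(t):
    # Hypothetical baseline hazard, chosen only for illustration.
    return 0.05 + 0.01 * t

betas = [0.7, -0.2]      # hypothetical coefficients
person_a = [1, 3.0]      # hypothetical values of u1 and u2
person_b = [0, 3.0]      # same u2, but u1 = 0

for t in (1, 5, 10):
    ratio = hazard(t, h0, betas, person_a) / hazard(t, h0, betas, person_b)
    # The baseline hazard cancels, so the ratio is exp(0.7) at every t.
    print(t, round(ratio, 4))
```

Because h0(t) cancels in the ratio, the two hazards are proportional over time, which is where the model gets its name.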

Cox proportional hazards regression can be carried out in SAS using PROC PHREG. Consider our Leukemia data:

proc phreg data=leukemia;

model duration*censor(0)=group / ties=efron;

run;

Interpretation of the Hazard Ratio

For this example with a single predictor variable, we have h(t) = h0(t)exp(β1u1).

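Let u1 = 1 for a patient in treatment group 2 and u1 = 0 for a patient in treatment group 1.

Then the hazard ratio (HR) is given as follows:

HR = h(t | u1 = 1) / h(t | u1 = 0) = [h0(t)exp(β1)] / [h0(t)exp(0)] = exp(β1)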

The estimated HR is 4.817 for this example. This means that at any given time, a patient who is still alive in treatment group 2 is 4.82 times more likely to die in the next very small time interval than a patient who is still alive in treatment group 1. That is, the instantaneous relative risk of death for a person in treatment group 2 compared to a person in treatment group 1 is 4.82. Note that this does NOT imply that the probability of surviving for a given time t is 4.82 times larger for group 1!
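The warning above can be checked numerically. Under proportional hazards, the survival functions of the two groups are related by S2(t) = S1(t)^HR, because the survival function is exp(-cumulative hazard) and the group 2 hazard is HR times the group 1 hazard. The Python sketch below uses the estimated HR from this handout; the group 1 survival probabilities are hypothetical values chosen for illustration.

```python
hr = 4.817  # estimated hazard ratio for group 2 vs. group 1 (from this handout)

def group2_survival(s1, hazard_ratio):
    """Under proportional hazards, S2(t) = S1(t) ** HR."""
    return s1 ** hazard_ratio

for s1 in (0.9, 0.7, 0.5):  # hypothetical group 1 survival probabilities
    s2 = group2_survival(s1, hr)
    # The ratio S1/S2 changes with S1; it is not fixed at 4.82.
    print(f"S1 = {s1:.2f}  S2 = {s2:.3f}  S1/S2 = {s1 / s2:.2f}")
```

The ratio of survival probabilities varies with t, while the ratio of hazards stays at 4.82.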

Discussion of Hypothesis Tests

PROC PHREG automatically tests the following hypotheses:

Ho: j = 0

Ha: j≠ 0

SAS provides three different test statistics, two of which are discussed below:

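Wald test statistic: z = β̂j / se(β̂j), which is compared to the standard normal distribution. Equivalently, z² is compared to a chi-square distribution with 1 degree of freedom.

Likelihood ratio test statistic: G = [−2 log L(model without the predictor)] − [−2 log L(model with the predictor)], which is compared to a chi-square distribution with 1 degree of freedom.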
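As a quick check of the Wald statistic, the following Python sketch uses the coefficient and standard error reported for Group2 in the R output later in this handout. The likelihood ratio statistic is not recomputed here because the log-likelihood values are not shown in the handout.

```python
import math

coef = 1.5721  # estimated coefficient for Group2 (from the R output in this handout)
se = 0.4124    # its standard error (same output)

z = coef / se                                # Wald z statistic
wald_chi_square = z ** 2                     # chi-square form with 1 df
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided standard normal p-value

print(f"z = {z:.3f}, chi-square = {wald_chi_square:.2f}, p = {p_value:.6f}")
```

These values agree with the z, Wald test, and Pr(>|z|) entries in the R output, up to rounding.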

Confidence Limits for the Hazard Ratio

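For a binary independent variable, a 100(1-α)% confidence interval for the hazard ratio is given by

exp(β̂j ± z(1-α/2) × se(β̂j)),

where z(1-α/2) is the 100(1-α/2)th percentile of the standard normal distribution (1.96 for a 95% interval).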

Use this formula to find the 95% confidence interval for the hazard ratio:
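One way to carry out this calculation is sketched below in Python, assuming the usual interval exp(estimate ± z × standard error). The estimate and standard error are the Group2 values from the R output in this handout.

```python
import math
from statistics import NormalDist

coef = 1.5721  # estimated coefficient for Group2 (from this handout)
se = 0.4124    # its standard error
alpha = 0.05

z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a 95% interval
lower = math.exp(coef - z_crit * se)
upper = math.exp(coef + z_crit * se)

print(f"HR = {math.exp(coef):.3f}, 95% CI = ({lower:.3f}, {upper:.2f})")
```

The endpoints should match the lower .95 and upper .95 values in the R output, up to rounding of the inputs.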

You can also request these endpoints by using the ‘risk limits’ option in PROC PHREG:

proc phreg data=leukemia;

model duration*censor(0)=group / ties=efron rl;

run;

Cox Proportional Hazards Model in R

To fit the Cox proportional hazards model in R, you will need to load the pre-installed library survival. This library contains the survfit and survdiff functions used in the previous handout.

library(survival)

leuk.cph = coxph(Surv(Time,Censor)~Group,data=leukemia)

summary(leuk.cph)

Call:

coxph(formula = Surv(Time, Censor) ~ Group, data = leukemia)

n= 42, number of events= 30

coef exp(coef) se(coef) z Pr(>|z|)

Group2 1.5721 4.8169 0.4124 3.812 0.000138 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

exp(coef) exp(-coef) lower .95 upper .95

Group2 4.817 0.2076 2.147 10.81

Concordance= 0.69 (se = 0.053 )

Rsquare= 0.322 (max possible= 0.988 )

Likelihood ratio test= 16.35 on 1 df, p=5.261e-05

Wald test = 14.53 on 1 df, p=0.0001378

Score (logrank) test = 17.25 on 1 df, p=3.283e-05
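The three p-values at the bottom of this output each come from a chi-square distribution with 1 degree of freedom. As a sketch, the upper tail of that distribution can be computed in plain Python with the identity P(X > x) = erfc(sqrt(x/2)); the helper name chi2_sf_1df is introduced here only for illustration.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))

# Test statistics copied from the R output above.
tests = {
    "Likelihood ratio": 16.35,
    "Wald": 14.53,
    "Score (logrank)": 17.25,
}

for name, statistic in tests.items():
    print(f"{name}: chi-square = {statistic}, p = {chi2_sf_1df(statistic):.4g}")
```

All three tests lead to the same conclusion here, although the statistics themselves differ slightly.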

Another Example: Adding a Continuous Predictor

Suppose that we also have information regarding the white blood cell count of the leukemia patients. So, we consider adding the continuous predictor (on the log base 2 scale) of the patient white blood cell count to our model. The data can be found in the file Leukemia2.sas.

proc phreg data=leukemia;

model duration*censor(0)=group logwbc / ties=efron rl;

run;

Interpretation of the Hazard Ratios

To interpret the hazard ratio for a continuous predictor, we must choose an increment much like we did in logistic regression. Here, the only logical choice is an increment of 1 in the log base 2 scale, which corresponds to doubling the actual white blood cell count.

Thus, after removing the effects of treatment group, a patient with twice as many white blood cells is 5.42 times more likely to die during the next very small interval in time. This implies that an increased white blood cell count has a negative effect on the survival function.

Also, after removing the effects of white blood cell count, at any given time a patient who is still alive in treatment group 2 is 4 times more likely to die in the next very small time interval than a patient who is still alive in treatment group 1.

Confidence Limits for the Hazard Ratio for a Continuous Predictor

In general, holding all other factors constant, the multiplicative risk effect associated with an increment of c units for a continuous predictor xj is exp(cβ̂j). A 100(1-α)% confidence interval for the hazard ratio is given by exp(cβ̂j ± z(1-α/2) × c × se(β̂j)).
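The Python sketch below shows how an increment of c units enters both the hazard ratio and its confidence interval. The function name hazard_ratio_ci is introduced only for illustration. The coefficient is the one implied by the hazard ratio of 5.42 reported above; the standard error is a hypothetical placeholder, since the PROC PHREG output is not reproduced in this handout.

```python
import math
from statistics import NormalDist

def hazard_ratio_ci(beta_hat, se, c=1.0, alpha=0.05):
    """Hazard ratio and confidence limits for an increment of c > 0 units."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hr = math.exp(c * beta_hat)
    lower = math.exp(c * beta_hat - z_crit * c * se)
    upper = math.exp(c * beta_hat + z_crit * c * se)
    return hr, lower, upper

beta_logwbc = math.log(5.42)  # coefficient implied by the reported HR of 5.42
se_logwbc = 0.33              # HYPOTHETICAL standard error; use the value from your own output

for c in (1, 2):              # one-unit and two-unit increments in logwbc
    hr, lower, upper = hazard_ratio_ci(beta_logwbc, se_logwbc, c=c)
    print(f"c = {c}: HR = {hr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Note that an increment of 2 units squares the one-unit hazard ratio rather than doubling it.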

Discussion of Hypothesis Tests

Note that SAS first tests the overall usefulness of the model:

Ho: β1 = β2 = 0

Ha: at least one βj ≠ 0

PROC PHREG also tests the following for each predictor variable:

Ho: j = 0

Ha: j≠ 0

Including Interaction Terms in the Model

It is possible that the effect of a continuous predictor may not be the same for levels of a binary predictor; for example, the effect of white blood cell count on survival may differ between patients in treatment groups 1 and 2. To allow for this possible interaction, we should include an interaction term in the model:

proc phreg data=leukemia2;

model duration*censor(0)= group logwbc interaction / ties=efron rl;

interaction = logwbc*group;

run;
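The model fitted by this code can be written out as follows. This is a sketch in the notation of this handout, with β3 denoting the coefficient of the interaction term; the symbols w and g stand for particular values of logwbc and group and are introduced only for illustration.

```latex
h(t) = h_0(t)\exp\bigl(\beta_1\,\text{group} + \beta_2\,\text{logwbc}
       + \beta_3\,(\text{logwbc}\times\text{group})\bigr)

\frac{h(t \mid \text{logwbc} = w + 1,\ \text{group} = g)}
     {h(t \mid \text{logwbc} = w,\ \text{group} = g)}
  = \exp(\beta_2 + \beta_3\, g)
```

The hazard ratio for a one-unit increase in logwbc therefore depends on the treatment group unless β3 = 0, so the test of Ho: β3 = 0 is the test for interaction.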

What are the conclusions of this test?
