Table of Contents

Mustapha Setta

April-October 2005


0. Summary

1. Introduction

1.1 Binomial distribution

1.2 Hypothesis Testing

1.3 Bayesian statistics

2. One-Armed Trials

2.1 One-stage design

2.1.1 Normal approximation

2.1.2 Binomial distribution

2.2 Two-stage designs

2.2.1 General introduction

2.2.2 Frequentist designs

2.2.3 Bayesian design

2.3 Three-stage designs

2.3.1 General introduction

2.3.2 Frequentist designs

2.3.3 Bayesian design

2.4 Multi-stage designs

2.4.1. Bayesian Design

3. A case study: investigating a dose of Hirulog

3.1. Case description

3.2. Possible designs

3.2.1 Ginsberg et al

3.2.2 Fleming

3.2.3 Thall & Simon (Continuous)

3.2.4 Thall & Simon (Discrete)

3.3 Reversing Hypotheses

3.4 Summary and recommendations

3.4.1 Summary

3.4.2 Recommendations

4. Two-Armed Trials

4.1. One-Stage design

4.2. Two-stage Design

4.2.1. Decomposition of the Z-test

4.2.2. The B-value

4.2.3. The conditional power

4.2.4. Futility stopping rule

References

Appendix. SAS Programs


0. Summary

In order to develop new medicines, clinical trials are needed. A clinical trial is a research study designed to answer specific questions about vaccines, new therapies or new ways of using known treatments. It is also used to determine whether new drugs or treatments are both safe and effective.

In most clinical trials one group of participants is given an experimental drug, while another group is given either a standard treatment for the disease or a placebo (such trials are called 'two-armed'). However, in earlier stages of development, clinical trials may also have only one 'arm', which means that all participants get the same experimental drug. Clinical trials can be divided into three categories: phase I, II and III.

Phase I studies are primarily concerned with the drug's safety, and are the first time the drug is tested in humans. These studies are typically done in a small number of healthy volunteers (20-100), usually in a hospital setting where they can be closely monitored and treated if there are any side effects. The purpose of these studies is to determine how the experimental drug is absorbed, metabolized, and excreted in humans. Additionally, they seek to determine what types of side effects occur as the dosage of the drug is increased. Any beneficial effects of the drug are also noted.

Phase II: once an experimental drug has been proven to be safe and well tolerated in healthy volunteers, it must be tested in patients that have the disease or condition that the experimental drug is expected to improve or cure. Phase II studies are designed to evaluate both the safety and the effectiveness of the drug in the patient population of interest. The second phase of testing may last from several months to a few years and may involve up to several hundred patients.

Phase III: a study where an experimental drug is tested in several hundred to several thousand patients with the disease/condition of interest. Most Phase III studies are well controlled, randomized trials. That is, one group of patients (subjects) receives the experimental drug, while a second "control" group receives a standard treatment or placebo. Placement of a subject into the drug treatment or placebo group is random (as if by the flip of a coin). Often these studies are "double-blinded", that is, neither the patient nor the researchers know who is getting the experimental drug. The large-scale testing provides the pharmaceutical company, as well as the FDA, with a more thorough understanding of the drug's effectiveness, benefits/risks, and range/severity of possible adverse side effects.

In this research the focus will be on designs for Phase II studies: different designs will be inventoried, evaluated and programmed (in SAS) to make them available. A 'real-life' example of a design that may be applied to an actual Organon study is presented.

The designs that are evaluated can be divided into two groups: one-armed and two-armed. In both cases, data is used to make a decision about whether to reject or 'accept' (not reject) a statistical hypothesis, usually stated as a null hypothesis H0, in favor of an alternative hypothesis HA. One then has the following setting:

H0: p <= p0, where the true response probability p of the treatment is at most some uninteresting level p0.

HA: p >= p1, where the true response probability is at least some desirable target level p1.

For the one-armed trials, we have first evaluated the one-stage design, where a pre-specified number of patients is enrolled. The treatment is then tested only once, namely at the end of the trial. Two methods are evaluated: the normal approximation, which is based on the normal distribution, and the exact binomial method, where the binomial distribution is used.

However, in order to reduce the number of patients needed for a trial, two- and three-stage designs introduced by Herson (1979) and later by Fleming (1982) and Simon (1989) are explored. Further, a multi-stage design of Thall and Simon is reviewed.

In Fleming’s design the treatment may be declared ‘promising’ as well as ‘unpromising’ at the end of the trial. In order to reduce the number of patients in case of early evidence of (in)efficacy, Fleming also introduces interim analyses. After every interim analysis the treatment may likewise be declared ‘promising’ or ‘unpromising’.

Simon’s design is slightly different from Fleming’s. In this design the trial can be stopped early only because of lack of effect (i.e. the treatment may be declared ‘unpromising’ not only at the end of the trial but also after every interim analysis).

Further, the characteristics of both designs are derived assuming a fixed value of the response rate of the treatment p. Obvious choices are then p=p0 (the response under the null hypothesis) and p=p1 (the response rate under the alternative hypothesis).

Similar to Simon’s design is Herson’s design (1979): again, after each interim analysis the treatment may only be declared ‘unpromising’. The only difference is that the characteristics of this design are derived in a different way than in Fleming’s. Here the response rate of the treatment is assumed to follow a certain distribution, the ‘prior’; usually the Beta distribution is taken.

Thall and Simon’s design (1994) includes both possibilities, ‘promising’ as well as ‘unpromising’. Further, this is a multi-stage design, which means that the data is monitored continuously until the total number of patients is reached. The design’s properties are derived using a fixed value of the response rate of the treatment.

For ‘two-armed’ trials, an approach based on conditional power is developed, i.e. the probability, given the interim data, of rejecting the null hypothesis of ‘no difference between treatments’ at the end of the trial. If the conditional power is ‘low’, the trial is stopped and the new treatment is declared ‘unpromising’ as compared to the control.

For both one-armed and two-armed trials, we restricted attention to binary outcomes, i.e. design properties were derived using the binomial distribution. For continuous outcomes the designs are similar; the only difference is that the properties are derived under a continuous distribution. Furthermore, clinicians often prefer a binary outcome like response or bleeding.

For ‘two-armed’ trials we have not considered stopping after an interim analysis followed by declaring the new treatment ‘promising’. This topic has, however, been discussed extensively in the statistical literature. See, for example, O’Brien and Fleming (1979).

Finally, a case study was reviewed. Deep venous thrombosis (DVT) is a condition where there is a blood clot in a deep vein (a vein that accompanies an artery). For years the standard treatment has been an anticoagulant medication called heparin, which is given through the vein. This results in relatively immediate anticoagulation and treatment of the clot. Along with heparin an oral medication called warfarin is given. The main side effect of heparin and warfarin is bleeding.

Some time ago a new treatment was introduced for the prevention of DVT: Hirulog. To explore the potential of Hirulog in the prevention of DVT, a phase II dose ranging study was performed in patients undergoing major knee or hip surgery (Ginsberg et al, 1994).

The study objective was to identify a Hirulog dose associated with:

  • an overall DVT rate < 15%
  • a bleeding rate < 5%

These values represent the approximate rates of the standard treatment heparin. Five dosage regimens were investigated in a sequential fashion using the designs presented above, where each dose was evaluated independently. For each dose it was planned to monitor the rates of bleeding and thrombosis after every 10 patients, up to a maximum of 50. Hence, this study may be considered as a sequence of five one-armed trials.

A dose-finding study was not fully covered. If doses are investigated in a sequential fashion, then the methods of the case study can be applied. If only two doses are investigated in parallel, then the design for two-armed trials can be applied. However, for more than two doses, other methods should be developed; see, for example, Whitehead et al for a recent reference.

1. Introduction

1.1 Binomial distribution

The binomial distribution is the discrete probability distribution of the number of successes in a sequence of N independent success/failure experiments, each of which yields success with probability p. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial. Hence, the probability of k successes in a sequence of N independent yes/no experiments with success probability p is equal to

$$P(X = k) = \binom{N}{k}\, p^k (1-p)^{N-k}, \qquad k = 0, 1, 2, \ldots, N; \quad 0 < p < 1,$$

and the probability of k successes or less (cumulative probability) is then equal to

$$P(X \le k) = \sum_{i=0}^{k} \binom{N}{i}\, p^i (1-p)^{N-i}.$$

The binomial distribution is also the basis for the popular binomial test which we will define in the next section.
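As a numerical illustration, the point and cumulative probabilities above can be computed directly. The following Python sketch (the function names are illustrative; the report's own programs are in SAS) uses only the standard library:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k): cumulative probability, summing the point probabilities."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Example: N = 10 Bernoulli trials with success probability p = 0.3
print(binom_pmf(3, 10, 0.3))  # P(X = 3), about 0.267
print(binom_cdf(3, 10, 0.3))  # P(X <= 3), about 0.650
```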

The binomial distribution can also be approximated in some cases by the normal distribution. For large N (say N > 20) and p not too near 0 or 1 (say 0.05 < p < 0.95) the distribution approximately follows the Normal distribution.

For instance, if X ~ binomial(N, p), then approximately X has the normal distribution with mean E(X) = Np and variance Var(X) = Np(1-p). This implies that X minus its expectation, divided by the standard deviation (i.e. the square root of the variance), is approximately normally distributed with expectation zero and unit variance, i.e. the following holds approximately:

$$Z = \frac{X - Np}{\sqrt{Np(1-p)}} \sim N(0, 1).$$

This last term is known as the Z-statistic and can be used for testing statistical hypotheses, as we will see later in this report.
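The accuracy of this approximation is easy to inspect numerically. The sketch below (illustrative, standard library only) compares the exact binomial tail probability P(X >= k) with its normal approximation:

```python
from math import comb, erf, sqrt

def binom_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def normal_tail(k, n, p):
    """Normal approximation: P(Z >= z) with z = (k - Np) / sqrt(Np(1-p))."""
    z = (k - n * p) / sqrt(n * p * (1 - p))
    return 0.5 * (1 - erf(z / sqrt(2)))  # 1 - Phi(z)

# N = 50 is comfortably above 20 and p = 0.4 is not near 0 or 1
n, p = 50, 0.4
for k in (20, 25, 30):
    print(k, round(binom_tail(k, n, p), 4), round(normal_tail(k, n, p), 4))
```

A continuity correction (using k - 0.5 in place of k) improves the agreement further, but is omitted here for simplicity.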

1.2 Hypothesis Testing

Setting up and testing hypotheses is an essential part of statistical inference. Usually a hypothesis is formulated on the basis of previous research: either because it is believed to be true, or because it is to be used as a basis for argument, but it has not been proved.

To make this statistical method more clear, we introduce some terminology. The general situation is as follows. We have a set of observations x1, ..., xN that may be viewed as realizations of independent identically distributed (i.i.d.) random variables X1, ..., XN with distribution $F_\theta$, where $\theta \in \Theta$. Here $\theta$ is called a parameter that indicates a specific distribution from a parametric family.

Suppose further that $\Theta_0 \subset \Theta$ and $\Theta_A = \Theta \setminus \Theta_0$. We now want to test $H_0: \theta \in \Theta_0$ against the alternative hypothesis $H_A: \theta \in \Theta_A$, i.e. we want to come to one of the following conclusions:

  1. Reject H0
  2. Do not reject H0

In each problem considered, the question of interest is mostly simplified into two competing hypotheses; the null hypothesis, denoted H0, and the alternative hypothesis, denoted HA. These two hypotheses are not however treated on an equal basis; special consideration is given to the null hypothesis.

We have to know which types of error we can make. First, we can reject H0 while $\theta \in \Theta_0$ and hence H0 is correct; this is known as the type I error (its probability is usually denoted $\alpha$). Second, we can accept (not reject) H0 while $\theta \in \Theta_A$ and hence H0 is incorrect; this is called the type II error (denoted $\beta$). Note further that the complement of the type II error, i.e. the probability of rejecting H0 while H0 is incorrect, is called the power of the test.

The following table gives a summary of possible results of any hypothesis test:

                      Decision
               Reject H0         Don't reject H0
Truth   H0     Type I error      Right decision
        HA     Right decision    Type II error

Now that we know which types of errors occur, we choose a significance level $\alpha$ and construct a test such that the following holds:

  1. The type I error must be smaller than or equal to $\alpha$.
  2. Minimize the type II error.

This shows that we design the test to bound the type I error; for the type II error we do not attempt to reach a prescribed value. The test can then be described by a critical (or rejection) region K. If x = (x1, ..., xN) is the vector of realizations of X = (X1, ..., XN), then we have the following:

  1. Reject H0 if $x \in K$
  2. Do not reject H0 if $x \notin K$

By the design of the test, the probability of making an error in conclusion one is bounded by $\alpha$. There is no prescribed bound on the probability of making an error in conclusion two. Hence the careful formulation is

  1. For each $\theta \in \Theta_0$: $P_\theta(X \in K) \le \alpha$.
  2. For $\theta \in \Theta_A$: minimize $P_\theta(X \notin K)$, or equivalently maximize $P_\theta(X \in K)$.

The rejection region K can be derived by solving these constraints.

Note that the function $\theta \mapsto P_\theta(X \in K)$ is called the power function of the test with critical region K, and $\sup_{\theta \in \Theta_0} P_\theta(X \in K)$ is called the size (the significance level) of the test.

Example:

Suppose we have one observation X from a binomial(10, p) distribution. Further, we have the following hypotheses to test (taking, for instance, p0 = 0.3):

$H_0: p \le 0.3$ versus $H_A: p > 0.3$.

Note that, since the size of the test is attained at the boundary, the hypotheses above can also be reformulated as

$H_0: p = 0.3$ versus $H_A: p > 0.3$.

It is now reasonable to use a critical region of the form [t, 10], for some t = 0, 1, 2, ..., 10. We choose the significance level equal to 0.05 and derive t by solving the constraints given above. Hence, we have

$$P_{0.3}(X \ge t) \le 0.05.$$

Now, X has a binomial distribution and so:

$$P_{0.3}(X \ge t) = \sum_{k=t}^{10} \binom{10}{k} (0.3)^k (0.7)^{10-k},$$

which gives $P_{0.3}(X \ge 6) \approx 0.047$ and $P_{0.3}(X \ge 5) \approx 0.150$. Note that because the binomial distribution is discrete, it is not possible to attain a significance level exactly equal to 0.05, and therefore we choose the critical region as the interval [6, 10]. After we have derived the critical region, the following conclusions can be made:

  1. Reject H0 if $X \ge 6$
  2. Do not reject H0 if $X \le 5$

In this example we have seen that for an observation X that is binomial(N, p) distributed, a given significance level and given hypotheses, we can construct a test that rejects the null hypothesis in favour of the alternative hypothesis. This was done by deriving the critical region as shown above.
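The derivation of such a critical region can be automated. The sketch below (illustrative Python; it takes p0 = 0.3 and alpha = 0.05 as example values for a binomial(10, p) observation) searches for the smallest t with P(X >= t | p = p0) <= alpha:

```python
from math import comb

def binom_tail(t, n, p):
    """P(X >= t) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t, n + 1))

def critical_value(n, p0, alpha):
    """Smallest t such that P(X >= t | p = p0) <= alpha; region is [t, n]."""
    for t in range(n + 1):
        if binom_tail(t, n, p0) <= alpha:
            return t
    return n + 1  # no critical region attains the significance level

t = critical_value(10, 0.3, 0.05)
print(t, binom_tail(t, 10, 0.3))  # critical region [6, 10], attained size about 0.047
```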

Conversely, as we will see from section 2 onwards, we are more interested in the choice of N such that, for given p0 and p1, the test has type I error at most $\alpha$ and type II error at most $\beta$.
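This reverse problem can be sketched in the same style: search for the smallest N (with rejection cutoff t) such that the exact binomial test has size at most alpha at p0 and power at least 1 - beta at p1. The values p0 = 0.2, p1 = 0.4, alpha = 0.05 and beta = 0.2 below are illustrative, not taken from this report:

```python
from math import comb

def binom_tail(t, n, p):
    """P(X >= t) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t, n + 1))

def sample_size(p0, p1, alpha, beta, n_max=500):
    """Smallest N (with rejection region [t, N]) such that the exact binomial
    test of H0: p <= p0 has size <= alpha and power >= 1 - beta at p = p1."""
    for n in range(1, n_max + 1):
        for t in range(n + 1):
            if binom_tail(t, n, p0) <= alpha:         # size constraint at p0
                if binom_tail(t, n, p1) >= 1 - beta:  # power constraint at p1
                    return n, t
                break  # raising t further only lowers the power; try next n
    return None

print(sample_size(0.2, 0.4, 0.05, 0.2))
```

Because the binomial distribution is discrete, the attained size and power move in a saw-tooth pattern as N grows, which is why the search tries each N exactly rather than relying on a closed formula.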

1.3 Bayesian statistics

Bayesian statistics is an approach to statistical inference in which observations are used to update prior belief about a probability or population parameter to posterior belief. The name "Bayesian" comes from the fact that Bayes' theorem from probability theory is frequently used in the inference process. Bayes' theorem was first derived by Thomas Bayes.

Suppose that we have a random variable X that is distributed according to a parametric family. The goal is then, given i.i.d. observations $x_1, \ldots, x_n$, to estimate the parameter. For instance, let X be a success/failure experiment (i.e. Bernoulli distributed), where X = 1 denotes success and X = 0 denotes failure. Let us define $p = P(X = 1)$; our goal is then to estimate p.

We assume a prior distribution over the probability p, which represents our prior belief. Suppose this distribution is the Beta distribution with parameters a > 0 and b > 0. The Beta distribution is concentrated on the interval [0, 1] and has the probability density given by

$$\pi(p) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, p^{a-1} (1-p)^{b-1}, \qquad 0 \le p \le 1.$$

Now we observe the sequence $x_1, \ldots, x_n$ and suppose that the number of successes is equal to k, so that n - k is the number of failures. We may calculate the posterior distribution according to Bayes' rule:

$$\pi(p \mid x_1, \ldots, x_n) = \frac{P(x_1, \ldots, x_n \mid p)\,\pi(p)}{P(x_1, \ldots, x_n)}.$$

The term $P(x_1, \ldots, x_n \mid p)$ is, as before, the likelihood function of p, and the denominator follows by integrating p out:

$$P(x_1, \ldots, x_n) = \int_0^1 P(x_1, \ldots, x_n \mid p)\,\pi(p)\, dp.$$

To make this more clear for our example, let us assume that p is Beta distributed with parameters a = 10 and b = 20; then the density function is

$$\pi(p) = \frac{\Gamma(30)}{\Gamma(10)\,\Gamma(20)}\, p^{9} (1-p)^{19}, \qquad 0 \le p \le 1.$$


Suppose further that we observe, in the new data with n = 100, a sequence of 50 successes followed by a sequence of 50 failures. The likelihood function becomes

$$P(x_1, \ldots, x_{100} \mid p) = p^{50} (1-p)^{50}.$$

Plugging this likelihood and the prior into Bayes' rule, we obtain the posterior distribution as a Beta(60, 70) distribution, with density function equal to

$$\pi(p \mid x_1, \ldots, x_{100}) = \frac{\Gamma(130)}{\Gamma(60)\,\Gamma(70)}\, p^{59} (1-p)^{69}, \qquad 0 \le p \le 1.$$

Note that the posterior and prior distributions have the same form; the Beta prior is the usual choice for the binomial likelihood of i.i.d. Bernoulli trials. Further, the outcome of the Bayesian analysis is not an estimated value of p but a posterior distribution, which summarizes all information about p. As we get more data the posterior distribution will become more sharply peaked about a single value. This of course allows us to make inference about the value of p.
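The conjugate update in this example is short enough to verify directly; a minimal sketch (the function names are illustrative):

```python
# Conjugate Beta-binomial update: a Beta(a, b) prior combined with k successes
# in n Bernoulli trials yields a Beta(a + k, b + n - k) posterior.

def update_beta(a, b, k, n):
    """Posterior parameters after observing k successes in n Bernoulli trials."""
    return a + k, b + (n - k)

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

a, b = 10, 20                      # prior Beta(10, 20), prior mean 1/3
a_post, b_post = update_beta(a, b, k=50, n=100)
print(a_post, b_post)              # Beta(60, 70) posterior, as in the text
print(beta_mean(a_post, b_post))   # posterior mean 60/130, about 0.462
```

Note that only the number of successes k matters, not their order: a sequence of 50 successes followed by 50 failures gives the same posterior as any other ordering.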

This Bayesian approach will be used further in this report (Herson and Thall & Simon) and is an alternative way to test a given hypothesis. In Herson’s design, as well as Thall & Simon’s, this approach is used to update prior beliefs about a certain parameter of interest. Further, it is used to derive a framework for a two-stage design, i.e. to derive critical regions for stopping the trial early.

2. One-Armed Trials

Much of the statistical methodology used in these trials is derived from oncology. Cancer trials usually are uncontrolled trials, i.e. one-armed; only one experimental treatment is evaluated to decide whether it has sufficient effectiveness to justify further study. These one-armed trials often use a binary outcome ‘response’, i.e. the drug “works” or it does not. In this section, we focus on these one-armed trials.

In this setting we have the following hypotheses:

H0: p <= p0, where the true response probability p is at most some uninteresting level p0.

HA: p >= p1, where the true response probability is at least some desirable target level p1.

2.1 One-stage design

In a one-stage design a pre-specified number of patients is enrolled and the hypothesis is tested only once, namely at the end of the trial.

In order to investigate the efficacy of the new drug, we will derive a formula for the sample size of this single-stage design. This formula depends on the test chosen; there are two different methods: