Adaptive Seamless Phase II / III Designs – Background, Operational Aspects, and Examples
One challenge in the development of new and innovative medicines is the amount of time it takes to discover, develop, and then demonstrate the benefits of a new drug. For this reason, much thought has been given to finding ways in which drug development could be expedited and made more efficient without compromising the integrity and validity of the development process. One intriguing possibility involves addressing within a single trial objectives which have traditionally been addressed in separate trials.
We define an adaptive design as a clinical trial design which incorporates the possibility of modifying some aspect of the trial while it is still ongoing, based upon data collected in the trial. A seamless design combines into a single trial objectives traditionally addressed in separate trials. This type of design eliminates the time that would have elapsed between the trials had they been conducted separately, and may provide additional efficiencies in terms of the total number of patients or long-term follow up. In an adaptive seamless design as we describe in this paper, we combine trials in this sense, and we further assume that the analysis is inferentially seamless, i.e., the final analysis will use data from patients enrolled before and after the adaptation.
Combining studies into a seamless design is possible within or across development phases. There are important opportunities for seamless designs in early development, but they are not the focus of this paper: we will concentrate on the seamless transition between Phase IIb (“learning”) and III (“confirming”). We envision that at an interim analysis following the learning stage, a selection decision will be made governing which patient cohorts will continue into the confirmatory stage. Selection of treatment arms which will, along with a control group, continue into the confirmatory phase may be considered the most typical example, though other types of selection are possible as well. Section 4 will illustrate two case studies involving adaptive seamless dose selection trials, and one involving patient population selection.
1.1 Opportunities and challenges in seamless development
Benefits and drawbacks of implementing an adaptive seamless Phase II/III design must be carefully weighed against each other. Not all clinical development programs may be candidates for such a design. We will outline some criteria to help in determining the feasibility of such a design for a clinical development program. None of these factors can be considered a strict rule governing when an adaptive seamless design can or cannot be used. Rather, it should be a combination of these types of considerations which leads to a final conclusion as to whether or not an adaptive seamless approach should be used. These factors should also not be considered an exhaustive list, but rather some important considerations to begin the discussion.
A main motivation behind adaptive seamless designs is the possibility of saving time in the development of a new and needed medication. However, besides the reduction in development time, there is also added efficiency in the use of data collected from patients; most efficiently using patient data should be the goal of any clinical development program. Allowing the data from the learning phase to be combined with data from the confirmation phase suggests that such a design would be able to draw stronger conclusions with the same number of exposed patients. Alternatively, fewer patients may be needed to provide the same strength of conclusions as in the standard paradigm. Also, if patients in the learning phase remain in the study and continue to be monitored, more long term safety data could be captured in such a design, and long term safety effects could thus be better understood.
1.1.1 Endpoints and enrollment
The most important feasibility consideration for an adaptive seamless design is the amount of time a patient needs to be followed to reach the endpoint which will be used for the selection decision. Using a dose selection example, just prior to the interim analysis which will be used to select the dose that will continue, there will generally be a period of the study during which some patients have been randomized but have not yet been followed long enough to have been evaluated for the endpoint being used for the selection. Thus, if the time needed to reach this endpoint is short relative to the total enrollment time of the study, then enrollment can continue uninterrupted with relatively few patients enrolled during this ‘transition’ period. Although patients enrolled during this period and randomized to doses which will not be continued can be used to better understand the dose response and safety profile, they will not be part of the final confirmatory analysis of the selected dose. If the endpoint duration is too long, then too many patients would be randomized during this period, which could result in unacceptable inefficiencies. In such a case, it could be considered whether enrollment should be temporarily paused during the transition period, at the cost of disrupting the trial and eroding some of the time savings of an adaptive seamless design. Alternatively, a surrogate marker might be used for the selection.
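The trade-off between endpoint duration and enrollment speed can be made concrete with a rough calculation. The sketch below is illustrative only: it assumes a constant enrollment rate, equal randomization across all dose arms and control, and selection of a single dose. The function name and the example numbers are hypothetical and do not come from any trial discussed in this paper.

```python
def transition_overrun(enroll_rate, endpoint_months, total_patients, n_doses):
    """Rough size of the 'transition' period before the selection decision.

    enroll_rate      -- patients randomized per month (assumed constant)
    endpoint_months  -- follow-up needed to observe the selection endpoint
    total_patients   -- planned total enrollment of the seamless trial
    n_doses          -- number of active dose arms before selection
    """
    # Patients randomized but not yet evaluable at the interim data cut-off.
    overrun = enroll_rate * endpoint_months
    # Share of the whole trial that falls into the transition period.
    fraction_of_trial = overrun / total_patients
    # With equal allocation to n_doses + 1 arms and one dose selected,
    # (n_doses - 1) of the arms are dropped.
    on_dropped_doses = overrun * (n_doses - 1) / (n_doses + 1)
    return overrun, fraction_of_trial, on_dropped_doses


# Hypothetical trial: 30 patients/month, 900 patients, 3 doses plus control.
short_endpoint = transition_overrun(30, 1, 900, 3)    # 1-month endpoint
long_endpoint = transition_overrun(30, 12, 900, 3)    # 12-month endpoint
```

Under these assumed numbers, a one-month endpoint leaves about 30 patients in the transition period (15 of them on doses that will be dropped), whereas a twelve-month endpoint leaves about 360 (180 on dropped doses), which is the kind of inefficiency that would motivate pausing enrollment or selecting on a surrogate marker.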
We recommend using well established and well understood endpoints or surrogate markers when implementing adaptive seamless designs. If a goal of a Phase II program in a novel disease area was to determine a primary endpoint to be carried forward into Phase III, an adaptive seamless design would likely not be feasible.
1.1.2 Clinical development time
As a primary rationale behind seamless development is to get effective medications to patients sooner, it must be considered whether a seamless development program would accomplish a reduction in development time. This reduction is clear if the seamless trial is the only pivotal trial that will be required for registration. However, if the seamless trial is one of two required pivotal trials, the second pivotal trial should be completed in a timeframe which shortens the overall development time. Ideally, the second, more traditionally designed, pivotal trial could begin immediately after the interim analysis and selection in the seamless study, and would be completed near the time the seamless study is completed. There will also be more time needed for the planning, development, and health authority review for such a design. This additional time must be included in the evaluation of the overall clinical development time.
Drug supply and drug packaging could be more challenging in this setting, since the number of treatment groups would change during the trial. Therefore, development programs which do not have costly or complicated drug regimens would be better suited to such designs. It is also advantageous to have the final formulation of the drug available by the start of the seamless trial, otherwise a bioequivalence study might be required prior to registration. Procedural considerations relative to decision processes and dissemination of information are discussed in Section 3.
Traditionally, selection of the most promising treatment is made in Phase II of drug development, and comparison of this treatment for superiority versus a control is done in a Phase III confirmatory trial. It is highly desirable to avoid a delay between the end of a Phase II trial and the start of the subsequent Phase III trial. One way to streamline the process is to conduct both treatment selection and confirmation of treatment efficacy over control under a single protocol where all the data are appropriately used in the analysis.
Treatment selection may be based on a short-term endpoint (e.g. surrogate or clinical endpoint measured in a short-term interval), while the confirmation stage uses a long-term (clinical) endpoint. Examples are: pain-free at two weeks versus pain-free at 12 weeks, tumor shrinkage response versus survival, fasting plasma glucose versus HbA1c, change in bone mineral density one year after the start of treatment versus occurrence of fractures within five years, etc.
Selection of the best treatment after the first stage can be a complex decision process that involves not only the primary endpoint but also the secondary endpoints, safety data, and possibly external information. Enough information should be obtained during the “learning” stage to enable adequate risk-benefit assessments in the confirmatory stage. Such decision making should be done by a group of experts with a wide range of scientific knowledge about the disease, the compound, and the therapeutic area, and should be based on well-reasoned considerations of all aspects of the data.
The second stage data and the relevant data from the first stage are combined in a way that guarantees the Type I error rate for the comparison with control and produces efficient unbiased estimates and confidence intervals with correct coverage probability.
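A small simulation can illustrate why the combination has to be done with care. The sketch below is not taken from any of the methods reviewed in this paper: it assumes normally distributed outcomes with known unit variance, equal allocation, no true treatment effect in any arm, selection of the arm with the largest first stage mean, and a final one-sided test at the 0.025 level that simply pools both stages as if no selection had taken place. All names and parameters are hypothetical.

```python
import random
from math import sqrt


def naive_pooled_type1_error(k_arms, n1, n2, reps=40000, z_crit=1.959964, seed=1):
    """Estimate the rejection rate under the global null hypothesis when the
    best of k_arms is selected at stage 1 and all data are naively pooled."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Stage 1 sample means (unit-variance outcomes, true effect zero).
        control1 = rng.gauss(0.0, 1.0 / sqrt(n1))
        stage1 = [rng.gauss(0.0, 1.0 / sqrt(n1)) for _ in range(k_arms)]
        best1 = max(stage1)  # select the arm with the largest observed mean
        # Stage 2: only the selected arm and the control continue.
        best2 = rng.gauss(0.0, 1.0 / sqrt(n2))
        control2 = rng.gauss(0.0, 1.0 / sqrt(n2))
        # Naive z statistic on the pooled data, ignoring the selection step.
        pooled_trt = (n1 * best1 + n2 * best2) / (n1 + n2)
        pooled_ctl = (n1 * control1 + n2 * control2) / (n1 + n2)
        z = (pooled_trt - pooled_ctl) / sqrt(2.0 / (n1 + n2))
        rejections += z > z_crit
    return rejections / reps


single_arm = naive_pooled_type1_error(k_arms=1, n1=100, n2=100)
three_arms = naive_pooled_type1_error(k_arms=3, n1=100, n2=100)
```

With a single arm there is nothing to select and the estimated rate should stay close to the nominal 0.025. With three arms the naive pooled test rejects noticeably more often than the nominal level, which is the inflation that the methods described in the following sections are designed to prevent.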
2.2 Descriptions of common methods
The methodology of seamless Phase II/III design represents a fusion of treatment selection techniques and hypothesis testing for integration, both operationally and inferentially, of Phase II and Phase III into a single trial.
2.2.1 Selection and testing against control
Early work on treatment selection includes that of Bechhofer et al (1968) on fully sequential identification and ranking, and that of Paulson (1964) on elimination procedures based on continuous sequential comparisons of two populations at a time. The goal is to select the “best” treatment (defined as the treatment with the largest mean, rate, etc) with a requirement on the probability of correct selection under certain sets of means (outside the “indifference zone”). These methods allow early elimination of weak treatments and use response-adaptive treatment allocation to reduce the number of patients on inferior treatments as well as the total sample size of the study.
In order to avoid continuous monitoring, Thall et al (1989) proposed a two-stage design for trials with binary outcomes, in which the first stage is used to select the “best” treatment and the second stage includes just the selected treatment together with the control. Inclusion of the control in the first stage is crucial: it allows results from that stage to be pooled with the data in the second stage. Schaid et al (1990) considered a time-to-event endpoint and also allowed for stopping early for efficacy after the first stage, or for more than one treatment to continue into the second stage.
Stallard and Todd (2003) generalized these designs by using error spending functions to allow for the possibility of more than two stages. The test statistic is based on the efficient score, so the design is applicable to general endpoints, including normal, binary, ordinal, or survival time.
These proposals can be used for seamless Phase II/III design with the first, selection stage taking the place of a Phase II trial, and the second, testing stage the ensuing Phase III trial. Various ingredients of these methods can be combined to allow sequential monitoring, response-adaptive treatment allocation and elimination of inferior treatments throughout the seamless trial.
2.2.2 Pairwise comparisons with group sequential designs
Comparison of several treatments and a control can be considered as a testing problem of the global null hypothesis that all treatments are equal in the framework of group sequential tests.
Follmann et al. (1994) (see also Hellmich (2001)) proposed such group sequential tests with Pocock and O’Brien-Fleming type boundaries. These boundaries are chosen to guarantee the Type I error rate. They note that a simple Bonferroni approximation is only slightly conservative. Treatments may be dropped in the course of the trial if they are significantly inferior to others. Step-down type procedures allow critical values for remaining comparisons to be reduced after some treatments have been eliminated. The focus of this approach is on controlling the Type I error rate rather than on power.
For treatment selection based on means of normal distributions, Bischoff and Miller (2005) developed a two-stage adaptive design with a minimal number of patients that controls the Type I error rate, achieves a required power to detect a given clinically relevant difference in means, and also controls the probability of wrong selection (i.e., an inferior treatment is chosen after the first stage). Early stopping for either efficacy or futility is available, and the information obtained about the variance is used to determine the number of patients on the selected treatment and control for the second stage.
In contrast to all methods considered above, in which the selection and confirmation are based on the same primary endpoint, Todd and Stallard (2005) considered a group sequential design that incorporates treatment selection based upon a short-term endpoint, followed by a confirmation stage comparing the selected treatment with control in terms of a longer-term primary endpoint.
2.2.3 P-value combination tests
The methodology of adaptive combination tests (see e.g. Bauer and Kohne (1994), Bauer and Kieser (1999), Brannath et al. (2002), Muller and Schafer (2001), Liu and Pledger (2005)) allows information from earlier stages to be combined with that of later stages, even though treatments (doses) are selected at adaptive interim analyses based upon all previous information from inside or outside the seamless trial. Many mid-trial design modifications (which need not necessarily be pre-specified) are possible without compromising the familywise Type I error rate. A general formulation of the adaptive p-value combination test in the context of treatment selection was recently given in Posch et al. (2005).
Assume that we plan the seamless trial as a two-stage design. For the first stage, we fix in the protocol the treatments (doses), primary outcome measure(s), the sample sizes for the two stages, the allocation rule, the test statistics, the p-value combination function, and so on. We conduct the first stage and get our first stage p-value for testing the global null hypothesis that all treatments are equal to placebo (this may be a p-value from a trend test, from a pairwise comparison with control using a closed testing principle, a Dunnett’s test, etc).
All data collected thus far in the trial or available from sources outside of the trial are used in making the interim decision. This may lead to dropping some treatments because of lack of efficacy or safety concerns, or adding new treatments because they seem to be sufficiently safe and more effective. The allocation rule for the second stage may be changed, for example, assigning a greater sample size to a particular treatment arm to get more safety information. The total sample sizes may be modified based upon observed nuisance parameters such as the variability of the primary outcome or the event rate in the control arm.
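As an illustration of a sample size modification driven by a nuisance parameter, the sketch below recomputes the per-arm sample size for a two-arm comparison of normal means once the interim estimate of the standard deviation is available. It uses the standard normal-approximation formula n = 2 σ² (z₁₋α + z₁₋β)² / δ² for a one-sided level α and power 1 − β. The function names, the planning values, and the interim estimate are assumptions made for this example only.

```python
from math import ceil
from statistics import NormalDist


def per_arm_sample_size(sd, delta, alpha=0.025, power=0.90):
    """Per-arm sample size for a one-sided two-sample z-test (normal approximation)."""
    std_normal = NormalDist()
    z_sum = std_normal.inv_cdf(1.0 - alpha) + std_normal.inv_cdf(power)
    return ceil(2.0 * (sd * z_sum / delta) ** 2)


def second_stage_size(n1_per_arm, sd_interim, delta, alpha=0.025, power=0.90):
    """Patients per arm still needed after stage 1, given the interim SD estimate."""
    required = per_arm_sample_size(sd_interim, delta, alpha, power)
    return max(0, required - n1_per_arm)


planned = per_arm_sample_size(sd=10.0, delta=5.0)    # planning assumption: 85 per arm
updated = per_arm_sample_size(sd=12.0, delta=5.0)    # interim SD larger than planned: 122 per arm
remaining = second_stage_size(n1_per_arm=40, sd_interim=12.0, delta=5.0)  # 82 per arm
```

The calculation only addresses power. Control of the Type I error rate after such a modification relies on the way the stage-wise data are combined in the final analysis.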
If after the first stage the trial is not stopped for lack of efficacy, we proceed into the second stage. After the second stage, we calculate the second stage p-value based upon the disjoint sample of the second stage only. The final analysis is conducted by combining the two p-values into a single test statistic using a predefined combination function. Note that this combination rule cannot be chosen in a data-dependent way after the first stage; it must be laid down a priori in the planning phase.
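The text does not prescribe a particular combination function. Two widely used choices are Fisher's product criterion, as in Bauer and Kohne (1994), and the weighted inverse normal function. The following minimal sketch implements both for two stages; it ignores early stopping boundaries, and the function names and the choice of weights from the planned stage sizes are assumptions for illustration.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def fisher_combination(p1, p2):
    """Combined p-value from Fisher's product criterion for two stages.

    Under the null hypothesis, -2*log(p1*p2) follows a chi-square
    distribution with 4 degrees of freedom, whose upper tail probability
    has the closed form q * (1 - log(q)) with q = p1 * p2.
    """
    q = p1 * p2
    return q * (1.0 - log(q))


def inverse_normal_combination(p1, p2, n1_planned, n2_planned):
    """Combined p-value from the weighted inverse normal function.

    The weights are fixed in advance from the PLANNED stage sizes and must
    not be updated if the second stage sample size is changed at interim.
    """
    w1 = sqrt(n1_planned / (n1_planned + n2_planned))
    w2 = sqrt(n2_planned / (n1_planned + n2_planned))
    z = w1 * STD_NORMAL.inv_cdf(1.0 - p1) + w2 * STD_NORMAL.inv_cdf(1.0 - p2)
    return 1.0 - STD_NORMAL.cdf(z)


# Stage-wise p-values of 0.05 each, neither significant at 0.025 on its own.
p_fisher = fisher_combination(0.05, 0.05)
p_inv_normal = inverse_normal_combination(0.05, 0.05, 100, 100)
```

Both functions expect p-values strictly between 0 and 1. In this example each combined p-value is about 0.01 (the two functions give slightly different values), so the null hypothesis would be rejected at the one-sided 0.025 level although neither stage alone reaches significance.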
This formulation in terms of combination of p-values provides an enormous generality with regard to hypotheses and statistical models. The adaptive combination test controls the familywise Type I error rate in the strong sense (i.e., it controls the probability that at least one treatment is erroneously declared superior to the control) under all types of adaptations which preserve the simple properties of the distribution of the stage-wise p-values under the overall no effect hypothesis. It is important to note that if the treatment continuation criterion used during the trial differs from that which was planned, then this may have an impact on the power, but will not affect the Type I error rate.
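One way to obtain strong familywise control with treatment selection is to apply a combination test to every intersection hypothesis within a closed testing procedure, in the spirit of the general formulation referenced above. The sketch below is a simplified illustration, not the procedure of any specific paper: it assumes a single selected arm, Bonferroni-adjusted first stage p-values for the intersection hypotheses, an inverse normal combination function with equal prespecified weights, and no early stopping. All names and numbers are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def inverse_normal(p1, p2, w1=sqrt(0.5)):
    """Combine two stage-wise p-values with prespecified weights (w1^2 + w2^2 = 1)."""
    w2 = sqrt(1.0 - w1 ** 2)
    z = w1 * STD_NORMAL.inv_cdf(1.0 - p1) + w2 * STD_NORMAL.inv_cdf(1.0 - p2)
    return 1.0 - STD_NORMAL.cdf(z)


def reject_selected(stage1_p, selected, stage2_p, alpha=0.025):
    """Closed test for the selected arm versus control.

    stage1_p -- dict mapping each arm to its first stage p-value versus control
    selected -- the arm carried into the second stage
    stage2_p -- second stage p-value of the selected arm versus control
    """
    others = [arm for arm in stage1_p if arm != selected]
    # Every intersection hypothesis containing the selected arm must be rejected.
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            members = (selected,) + subset
            # Bonferroni-adjusted stage 1 p-value for this intersection,
            # capped just below 1 so that inv_cdf stays defined.
            p1 = min(len(members) * min(stage1_p[a] for a in members), 1.0 - 1e-12)
            # Only the selected arm continues, so its stage 2 p-value serves
            # as the stage 2 p-value of every intersection containing it.
            if inverse_normal(p1, stage2_p) > alpha:
                return False
    return True


stage1 = {"low": 0.20, "mid": 0.04, "high": 0.01}
confirmed = reject_selected(stage1, "high", stage2_p=0.01)
not_confirmed = reject_selected(stage1, "high", stage2_p=0.50)
```

The selected arm is declared superior to control only if the combination test rejects every intersection hypothesis that contains it, which is what protects the familywise error rate regardless of the selection rule actually used at the interim analysis.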
Recently, Kelly et al. (2005) developed a very flexible design that uses adaptive group sequential methodology to monitor the largest efficient score statistic over multiple stages of a trial. The design allows the trial to begin with any number of treatments, to have any number of stages, and permits any number of treatments to continue at any stage.
2.2.4 Bayesian model-based designs
Inoue et al. (2002) envisage a Phase II trial in which a single treatment is compared with a control based on both survival time and short-term events that may be related to survival through a parametric mixture model. The main comparison is based on the survival endpoint, and the utility of short-term events must be demonstrated by the results actually observed in the trial. Initially, patients are enrolled only in one main center and are randomized between treatment and control throughout. Interim decisions of whether to stop early, continue Phase II, or proceed to confirmatory Phase III are made repeatedly during a time interval, and are based on predictive probabilities of concluding superiority of the new treatment. If the data suggest that the new treatment may have a positive impact on the short-term events and that this impact translates into a survival benefit, then the trial is seamlessly expanded to include other centers, increasing the accrual rate.