Adaptive Designs: Terminology and Classification
1. Introduction
Recent advances in the methodology of adaptive designs provide new approaches to drug development that have the potential to improve the quality, speed and efficiency of decision making. By building flexibility into the trial design, this approach saves resources by identifying failures early and increases efficiency by focusing precious patient resources on treatments that have a higher probability of success. Besides being clearly advantageous to the drug development program, this is also ethically beneficial to the patients in the trial, as it limits their exposure to ineffective treatments.
Unfortunately, as often happens with novel approaches, there has been substantial confusion over what these designs are and when they are most applicable. They are variously described as adaptive, sequential, flexible, self-designing, multi-stage, dynamic, response-driven, smart or novel designs. We propose here an integrated approach to defining and classifying adaptive designs in order to minimize confusion about their terminology and taxonomy across the pharmaceutical industry, its stakeholders and analysts, and regulatory agencies.
The primary purpose of this paper is to describe the range of adaptive designs that are available and to promote the benefits they might bring in all phases of clinical drug development. It is necessary to emphasize that these designs have much more to offer than the rigid conventional parallel group designs in clinical trials.
To maintain focus within the available space, we do not attempt an exhaustive literature review. Rather, we focus on key ideas and cite supportive literature as appropriate. Section 2 gives a general definition of adaptive designs and their structure. Section 3 provides a classification of adaptive designs.
2. Definition of adaptive designs
An adaptive design is defined as a multi-stage study design that uses accumulating data to decide how to modify aspects of the study without undermining the validity and integrity of the trial.
Maintaining study validity means providing correct statistical inference (such as adjusted p-values, unbiased estimates and adjusted confidence intervals), ensuring consistency between the different stages of the study, and minimizing operational bias.
Maintaining study integrity means providing results that are convincing to the broader scientific community, preplanning the intended adaptations as far as possible, and keeping the results of interim analyses blinded.
An adaptive design requires the trial to be conducted in multiple stages with access to the accumulated data. An adaptive design may have one or more of the following rules applied at an interim look:
- Allocation Rule: how will subjects be allocated to the different arms in the trial? This can be a fixed randomization, say 1:1, throughout the trial, or it may be adaptive, with the randomization ratio changing from stage to stage based on accruing data. This also includes the decision to drop or add treatment arms.
- Sampling Rule: how many subjects will be sampled at the next stage? This may depend on estimates of accrual so far, on estimates of nuisance parameters (e.g., the variance), or even on estimates of the treatment effect. In dose-escalation studies, this is the cohort size per stage.
- Stopping Rule: when should the trial be stopped? There are many reasons for stopping a trial: for efficacy, for harm, for futility or for safety.
- Decision Rule: the final decision and interim decisions pertaining to design changes not covered by the previous three rules, e.g., to update the model, to change the endpoint or to modify the initial design.
At any stage, the data may be analyzed and subsequent stages can be redesigned taking into account all available data.
This definition includes group sequential designs, for which the only design revision is stopping the study early when there is sufficiently strong evidence of a difference in treatment effect. Another kind of adaptive design aims to treat patients in the study as effectively as possible using response-adaptive allocation, in which patients are more likely to be assigned to the treatment that appears to be more effective according to the observed responses. Sample size re-assessment, or "internal pilot" studies, involve recalculating the sample size based on interim information about the values of nuisance parameters. While each of these three designs allows just one of the adaptation rules, the most recent class of adaptive designs, also known as flexible designs, allows for an adaptive allocation rule (changing the randomization from stage to stage), an adaptive sampling rule (the timing of the next interim analysis), a stopping rule, and other modifications made following interim analyses (an adaptive decision rule), including changing the target treatment difference for which the study is powered, changing the primary endpoint, or varying the form of the primary analysis.
Although statistical methodology has been developed to allow for these types of adaptive designs, these methods should never be used to replace the careful planning for the statistical design of a clinical trial. Before starting the trial, an efficient design must be detailed in the protocol. Adaptive design methodology then provides a valuable tool for reasonable design changes.
We describe below the four elements of an adaptive design.
Allocation Rules. At each stage, the allocation rule determines how new patients will be assigned to the available treatments. An allocation rule may be fixed (static) during the study or may be adaptive (dynamic), changing from stage to stage according to previous treatment assignments and/or patient responses.
A fixed allocation rule does not necessarily mean a deterministic rule. On the contrary, randomization (random allocation) of patients is usually used to achieve balance in all known and unknown, observed and unobserved covariates (prognostic factors) at baseline. However, a fixed allocation rule uses allocation probabilities that are determined in advance and are not changed during the trial. Complete randomization uses equal allocation probabilities to balance treatment assignments. Stratification can be used to improve the randomization, but this approach limits the number of covariates that can be handled. A permuted block design can also be used, but it has the disadvantage of high predictability at the investigator site level. Restricted randomization is used with fixed unequal allocation probabilities for unbalanced treatment allocation. Rosenberger and Lachin [1] treat this subject in more depth than is possible here.
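As an illustration of the difference between complete randomization and a permuted block design, the following minimal Python sketch generates both kinds of schedule for two arms. The function names, the arm labels and the block size of 4 are illustrative choices, not part of any particular randomization system.

```python
import random

def complete_randomization(n, arms=("A", "B"), seed=None):
    """Assign each subject independently with equal probability to each arm."""
    rng = random.Random(seed)
    return [rng.choice(arms) for _ in range(n)]

def permuted_block_randomization(n, arms=("A", "B"), block_size=4, seed=None):
    """Assign subjects in blocks, each containing every arm equally often."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # random order within the block
        schedule.extend(block)
    return schedule[:n]
```

Within every complete block of four the arms are balanced 2:2; this is also why, at a site that knows the block size, the last assignment in a block can be predicted once the first three are known.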
By contrast, an adaptive allocation rule dynamically alters the allocation probabilities to reflect the accruing data in the trial. Covariate-adaptive randomization [2-5] ensures balance between treatment arms with respect to known covariates. Rather than balancing over known covariates, the optimal design approach [6, 7] minimizes the variance of the treatment effect estimator in the presence of covariates.
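Minimization in the style of Pocock and Simon is one widely used covariate-adaptive method; the sketch below shows its core idea and is not claimed to be the specific procedure of [2-5]. It assumes two arms, categorical covariates stored in a dict per patient, the range of marginal counts as the imbalance measure, and a hypothetical probability p_best = 0.8 of choosing the arm that minimizes imbalance.

```python
import random

def minimization_assign(new_patient, history, arms=("A", "B"), p_best=0.8, rng=None):
    """Covariate-adaptive assignment by minimization (illustrative sketch).

    new_patient: dict of factor -> level, e.g. {"sex": "F", "age": "<65"}
    history: list of (patient_dict, arm) pairs for patients already assigned
    """
    rng = rng or random.Random()
    scores = {}
    for candidate in arms:
        total = 0
        for factor, level in new_patient.items():
            # Marginal counts per arm among patients sharing this factor level,
            # as if the new patient were assigned to `candidate`.
            counts = {a: 0 for a in arms}
            for patient, arm in history:
                if patient.get(factor) == level:
                    counts[arm] += 1
            counts[candidate] += 1
            total += max(counts.values()) - min(counts.values())  # imbalance = range
        scores[candidate] = total
    best = min(scores.values())
    preferred = [a for a in arms if scores[a] == best]
    if len(preferred) == len(arms):
        return rng.choice(arms)          # all arms equally good: plain randomization
    if rng.random() < p_best:
        return rng.choice(preferred)     # usually pick an imbalance-minimizing arm
    return rng.choice([a for a in arms if a not in preferred])
```

Keeping p_best below 1 leaves every assignment random, which limits predictability while still pushing the marginal covariate counts toward balance.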
Response-adaptive randomization uses interim data to unbalance the allocation probabilities in favor of the treatment arms with comparatively superior outcomes. The simplest example is the play-the-winner rule [8, 9], in which a success on one treatment results in the next patient being assigned to the same treatment, and the assignment changes to the alternative treatment only in the event of a failure; the randomized play-the-winner rule makes this assignment random rather than deterministic. More complex and flexible allocation rules can be obtained by using urn models [1], in which the allocation probabilities are changed during the course of the trial to reflect the known outcomes of patients by adding balls of an appropriate color to the urn. The doubly adaptive biased coin design [10] adapts allocations based on previous treatment group assignments as well as on the outcome information.
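The urn mechanism can be made concrete with a short simulation of the randomized play-the-winner rule in its usual urn form, RPW(u, beta): the urn starts with u balls per arm, a drawn ball is replaced, each success adds beta balls of the same color and each failure adds beta balls of the other color. The sketch assumes two arms, binary responses observed before the next patient arrives, and simulated true success rates; the function name is hypothetical.

```python
import random

def rpw_trial(p_success, n_patients, u=1, beta=1, seed=None):
    """Simulate a two-arm randomized play-the-winner urn RPW(u, beta).

    p_success: true success probabilities (arm 0, arm 1), used only to simulate outcomes.
    """
    rng = random.Random(seed)
    urn = [u, u]  # number of balls of each color (arm 0, arm 1)
    assignments, outcomes = [], []
    for _ in range(n_patients):
        # Draw a ball (with replacement): arm 0 with probability urn[0] / total.
        arm = 0 if rng.random() < urn[0] / (urn[0] + urn[1]) else 1
        success = rng.random() < p_success[arm]
        # Success: add beta balls of the same color; failure: beta of the other color.
        urn[arm if success else 1 - arm] += beta
        assignments.append(arm)
        outcomes.append(success)
    return assignments, outcomes, urn

assignments, outcomes, urn = rpw_trial((0.9, 0.1), 500, seed=0)
```

With success rates of 0.9 and 0.1, most of the 500 simulated patients end up on the first arm, illustrating how the allocation drifts toward the arm with better observed outcomes.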
Bayesian response-adaptive randomization [11, 12] alters the allocation probabilities based on the posterior probability of each treatment arm being the "best". A drop-the-loser type of allocation rule [13] removes a treatment arm completely from the subsequent randomization schedule. This gives patients a higher chance of receiving the treatment that is performing better.
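A minimal sketch of Bayesian response-adaptive randomization for binary outcomes follows. It assumes independent Beta(1, 1) priors, estimates the posterior probability that each arm is best by Monte Carlo, and sets allocation probabilities proportional to that probability raised to a power c; the choice c = 0.5 is an illustrative tuning assumption (smaller values give less aggressive adaptation), and it is not claimed to be the exact scheme of [11, 12].

```python
import random

def prob_best(successes, failures, draws=20000, seed=None):
    """Monte Carlo estimate of P(arm k has the highest response rate | data)
    under independent Beta(1, 1) priors (posterior Beta(1 + s, 1 + f))."""
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(draws):
        samples = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]

def allocation_probs(successes, failures, c=0.5, **kwargs):
    """Next-stage allocation probabilities proportional to P(best) ** c."""
    weights = [p ** c for p in prob_best(successes, failures, **kwargs)]
    total = sum(weights)
    return [w / total for w in weights]

# Example: arm 0 has 15/20 responses, arm 1 has 5/20.
probs = allocation_probs([15, 5], [5, 15], seed=1)
```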
Sampling Rules. At each stage, the sampling rule determines how many subjects will be sampled at the next stage. A sample size re-estimation (SSR) design consists of two stages and a simple sampling rule that determines the sample size for the second stage in the light of first-stage data. This may depend on estimates of nuisance parameters, such as the variance or the response rate in the control arm, but not on the treatment difference. A restricted sampling rule is one where the target sample size calculated before the trial serves as a lower bound for the recalculated sample size [14, 15]. Blinded SSR rules estimate the nuisance parameter without unmasking treatment codes [16, 17]. They are quite efficient [18-21] and less controversial than unblinded SSR rules [22, 23], which require unmasking because the pooled within-arm variance depends on the sample mean of each arm.
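For a normally distributed endpoint, a blinded SSR with a restricted sampling rule can be sketched as below. The sketch assumes 1:1 allocation, a two-sided z-test approximation for the per-arm sample size, and the simple one-sample ("lumped") variance of the masked first-stage data; the helper names are hypothetical, and published blinded procedures [16, 17] may adjust this variance estimate differently.

```python
import math
from statistics import NormalDist, variance

def n_per_arm(sigma2, delta, alpha=0.05, power=0.9):
    """Normal-approximation sample size per arm for a two-sided two-sample z-test."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 * sigma2 / delta ** 2)

def blinded_ssr(pooled_stage1, delta, n_planned, alpha=0.05, power=0.9):
    """Blinded re-estimation with a restricted rule (illustrative sketch).

    pooled_stage1: first-stage outcomes with treatment codes still masked.
    The lumped variance ignores arm labels; with 1:1 allocation it overestimates
    the within-arm variance by about delta**2 / 4, so it errs on the large side.
    """
    s2_lumped = variance(pooled_stage1)
    n_new = n_per_arm(s2_lumped, delta, alpha, power)
    return max(n_planned, n_new)  # restricted rule: planned size is a lower bound
```

For example, with sigma^2 = 1 and delta = 0.5 the planned size is 85 per arm; if the masked first-stage data show a larger lumped variance, the re-estimated size grows, but it can never fall below 85.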
A traditional group sequential design uses a simple sampling rule with fixed (usually equal) sample sizes per stage. The information-based design [24], on the other hand, uses a sampling rule that keeps the maximum information fixed but adjusts the sample size to achieve it. The error spending approach [25] allows the sample sizes for the different stages to vary, but in a way that does not depend on the observations from previous stages. Sequentially planned decision procedures [26-30] extend group sequential designs by allowing future stage sample sizes to depend on the current value of the test statistic.
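Error spending can be illustrated with the Lan-DeMets spending functions of O'Brien-Fleming and Pocock type, which are standard choices but not necessarily those of [25]. The sketch computes how much two-sided type I error is spent at each look from the information fractions actually reached; converting these increments into stopping boundaries requires numerical integration over the joint distribution of the interim statistics, which is not shown.

```python
import math
from statistics import NormalDist

def obf_spending(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type spending function (two-sided alpha)."""
    if t <= 0:
        return 0.0
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / math.sqrt(t)))

def pocock_spending(t, alpha=0.05):
    """Lan-DeMets Pocock-type spending function."""
    return alpha * math.log(1 + (math.e - 1) * t)

def alpha_increments(info_fractions, spend=obf_spending, alpha=0.05):
    """Type I error spent at each look, given the information fractions reached."""
    spent, out = 0.0, []
    for t in info_fractions:
        total = spend(min(t, 1.0), alpha)
        out.append(total - spent)
        spent = total
    return out

# The looks need not be equally spaced; only the observed information fractions matter.
increments = alpha_increments([0.25, 0.5, 0.75, 1.0])
```

The O'Brien-Fleming-type function spends almost nothing at early looks and most of alpha near the end, whereas the Pocock-type function spends it more evenly.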
The most flexible SSR rules also incorporate information on the estimated treatment difference [31, 32]. The sample size for the next stage is determined by the conditional power, defined as the probability of rejecting the null hypothesis at the end of the study, conditional on the first-stage data. This probability is usually calculated under the originally specified treatment difference and uses information not only on the nuisance parameters but on all the observed data, by conditioning on the observed test statistic. It is tempting to replace the originally specified treatment difference with its interim estimate. This option, although frequently proposed [33-36], cannot be recommended as a general strategy: the interim effect size estimate is a random variable and will lead to highly variable second-stage sample sizes, including particularly large ones [37]. A cap on the maximum sample size is recommended in such situations [38].
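The conditional power calculation and a capped re-estimation rule can be sketched for a normally distributed endpoint as follows, evaluated under the originally specified effect. The sketch assumes a one-sided weighted inverse normal test w1*Z1 + w2*Z2 with weights fixed in advance (which keeps the type I error rate controlled when the second-stage size changes), per-arm stage sizes, and a planned standardized drift theta = delta / (sigma * sqrt(2)) so that E[Z2] = theta * sqrt(n2); the function names and example numbers are hypothetical.

```python
import math
from statistics import NormalDist

N01 = NormalDist()

def conditional_power(z1, n2, theta, w1, w2, alpha=0.025):
    """P(w1*Z1 + w2*Z2 >= z_{1-alpha} | Z1 = z1) when Z2 ~ N(theta*sqrt(n2), 1)."""
    c = N01.inv_cdf(1 - alpha)
    return 1 - N01.cdf((c - w1 * z1) / w2 - theta * math.sqrt(n2))

def reestimate_n2(z1, theta, w1, w2, n2_min, n2_max, target=0.9, alpha=0.025):
    """Smallest per-arm stage-2 size reaching the target conditional power,
    restricted to [n2_min, n2_max] (n2_max is the cap)."""
    c = N01.inv_cdf(1 - alpha)
    root = ((c - w1 * z1) / w2 + N01.inv_cdf(target)) / theta
    n2 = math.ceil(root ** 2) if root > 0 else 0
    return min(max(n2, n2_min), n2_max)
```

For instance, with 50 patients per arm planned for each stage, equal weights sqrt(0.5), delta/sigma = 0.5 and a cap of 150, a weak interim result (z1 = 0.5) calls for roughly twice the planned second-stage size, while a negative one (z1 = -1) hits the cap.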
Stopping Rules. Stopping rules for clinical trials are intended to protect patients in the trial from unsafe drugs or to hasten the approval of a beneficial treatment. There is a wide range of statistical rules that can be used to determine whether to stop or continue a trial. The majority of such stopping rules are applied to a single primary endpoint and are constructed to satisfy a given power requirement in a hypothesis testing framework. Stopping rules are now available for testing superiority, equivalence, noninferiority and even safety aspects of clinical trials.
A trial may be stopped in three situations: first, if the experimental treatment is clearly better than the control (superiority); second, if it is clearly worse than the control (harm); and third, if it is clearly not going to be shown to be better than the control (futility). Many stopping rules are based on boundary-crossing methodology: at each stage of the trial, a test statistic is calculated and compared with given stopping boundaries, each corresponding to one of the three objectives above; if any of them is crossed, the trial is stopped and the appropriate conclusion is drawn; otherwise it continues to the next stage.
Bayesian stopping rules are based on posterior probabilities of hypotheses of interest and may be supplemented by making predictions of the possible consequences of continuing. Each of the three objectives may be formalized by assessing the posterior probability that the treatment benefit lies above or below some threshold. A skeptical prior can be used for early stopping for efficacy and an enthusiastic prior for early stopping for futility [39, 40].
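A minimal sketch of such a Bayesian interim decision, assuming a normally distributed estimate of the treatment effect with known standard error and conjugate normal priors, is given below. The skeptical prior is centered at zero and the enthusiastic prior at the target effect delta_alt, each with a spread that puts 2.5% prior probability beyond the other's center; the thresholds 0.975 and 0.10 are illustrative, not recommendations from [39, 40].

```python
import math
from statistics import NormalDist

def posterior(prior_mean, prior_sd, estimate, se):
    """Conjugate normal update for a treatment effect."""
    precision = 1 / prior_sd ** 2 + 1 / se ** 2
    mean = (prior_mean / prior_sd ** 2 + estimate / se ** 2) / precision
    return NormalDist(mean, math.sqrt(1 / precision))

def bayes_interim(estimate, se, delta_alt, eff_thresh=0.975, fut_thresh=0.10):
    """Return 'efficacy', 'futility' or 'continue' (illustrative thresholds)."""
    prior_sd = delta_alt / 1.96
    # Skeptical prior: centered at no effect, P(effect > delta_alt) = 2.5%.
    skeptical = posterior(0.0, prior_sd, estimate, se)
    # Enthusiastic prior: centered at delta_alt, P(effect < 0) = 2.5%.
    enthusiastic = posterior(delta_alt, prior_sd, estimate, se)
    if 1 - skeptical.cdf(0.0) > eff_thresh:
        return "efficacy"   # even a skeptic is convinced of benefit
    if 1 - enthusiastic.cdf(0.0) < fut_thresh:
        return "futility"   # even an enthusiast doubts any benefit
    return "continue"
```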
Decision Rules. At any stage, additional decision rules can be considered, such as changing the test statistic, redesigning multiple endpoints, selecting which hypothesis is to be tested (switching from superiority to non-inferiority [41, 42] or changing the hierarchical order of hypotheses [43, 44]), or changing the patient population (e.g., going forward either with the full population or with a pre-specified subpopulation).
To maximize the power of parametric trend tests in a dose-response trial, scores corresponding to the typically unknown shape of the dose-response curve have to be applied. Using an adaptive combination test, one can use the first-stage data to estimate this shape and compute appropriate scores for the second-stage test [45]. A similar idea has been used for changing the scores used to compare survival curves when deviations from the proportional hazards assumption become apparent in the interim data [46].
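The inverse normal combination test is one common adaptive combination test; the sketch below shows how it combines independent one-sided stage-wise p-values with weights fixed in advance. Because only the second-stage p-value needs to be valid given the first-stage data, the second-stage test may use scores estimated from the first stage. Equal weights are an illustrative default, and it is not claimed that [45, 46] use this exact combination function.

```python
import math
from statistics import NormalDist

N01 = NormalDist()

def inverse_normal_combination(p1, p2, w1=math.sqrt(0.5)):
    """Combine independent one-sided stage-wise p-values (0 < p < 1) with
    pre-specified weights w1**2 + w2**2 = 1; return the combined p-value."""
    w2 = math.sqrt(1 - w1 ** 2)
    z = w1 * N01.inv_cdf(1 - p1) + w2 * N01.inv_cdf(1 - p2)
    return 1 - N01.cdf(z)
```

The null hypothesis is rejected at one-sided level alpha if the combined p-value is at most alpha, regardless of how the second-stage test statistic was chosen.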
Location-scale tests are used in situations where an increase in location is accompanied by an increase in variability. A common test statistic for such a test is the sum of a location and a scale test statistic. This test can be improved by an adaptive two-stage design in which the sum is used in the first stage and a weighted sum of a location and a scale test statistic is used in the second stage, with the weights estimated from the first-stage data [47].
Another example of an adaptive choice of the test statistic is to include in the second-stage test procedure a covariate that, in the interim analysis, shows an unexpected effect in terms of variance reduction (not foreseen in the study protocol) [48].
Decision rules for redesigning multiple endpoints include changing their pre-assigned hierarchical order in multiple testing [49], updating their correlation in reverse multiplicity situations [50], excluding those that are not properly measured in terms of variability and completeness [51], and updating the parameters in modeling the relationship between the primary endpoint and auxiliary variables (biomarkers, short-term endpoints, etc.) [12, 52].
After the first stage, one can perform another two-stage test at the level given by the conditional error function [53]. This allows the number of interim analyses to be chosen adaptively based on the information collected so far. For example, if the sample size has been increased, one can add another interim analysis if the probability of an early decision is high.
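For an inverse normal combination test without early stopping, the conditional error function has a closed form, A(p1) = 1 - Phi((z_{1-alpha} - w1*z1)/w2) with z1 = Phi^{-1}(1 - p1). The sketch below computes it and checks numerically that its average over a uniformly distributed p1 (the null distribution) returns alpha, which is why running any redesigned remainder of the trial at level A(p1) preserves the overall type I error rate. The one-sided alpha = 0.025 and equal weights are illustrative assumptions, not the setting of [53].

```python
import math
from statistics import NormalDist

N01 = NormalDist()

def conditional_error(p1, alpha=0.025, w1=math.sqrt(0.5)):
    """Conditional type I error of the inverse normal test given stage-1 p-value p1."""
    w2 = math.sqrt(1 - w1 ** 2)
    z1 = N01.inv_cdf(1 - p1)
    c = N01.inv_cdf(1 - alpha)
    return 1 - N01.cdf((c - w1 * z1) / w2)

# Under H0, p1 is uniform on (0, 1): a midpoint-rule average of A(p1) should equal alpha.
m = 20000
avg = sum(conditional_error((i + 0.5) / m) for i in range(m)) / m
```

A small first-stage p-value leaves a large conditional error for the rest of the trial, and a large one leaves very little.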
3. Classification of adaptive designs
Single arm trials
Standard Phase II studies are used to screen new treatments for activity and to decide which ones should be tested further. The decisions are generally based on single-arm studies using short-term endpoints (response/no response) in a limited number of patients. The problem is formulated as a hypothesis test about some minimal acceptable probability of response, allowing early stopping for inactivity of the treatment.
An early approach [54] considered both estimation and testing. At the end of the first stage, a decision is made to abandon development of the new treatment if no responses have been observed. The sample size for the first stage is determined so as to give a specified type I error rate. Following the first stage, the sampling rule calculates the second-stage sample size from the first-stage data so as to estimate the unknown response rate with the specified precision. The design has been extended to three stages [55, 56].
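A rough sketch of this two-stage logic is given below. The first-stage size is the smallest n1 for which observing no responses has probability at most alpha when the true response rate equals a minimal rate of interest p_min; the second-stage size is chosen so that the standard error of the estimated response rate reaches a target. Using the first-stage estimate (or a user-supplied conservative guess) as the planning value is a simplification of [54], and the function names are hypothetical.

```python
import math

def stage1_size(p_min, alpha=0.05):
    """Smallest n1 with P(0 responses | rate = p_min) = (1 - p_min)**n1 <= alpha."""
    return math.ceil(math.log(alpha) / math.log(1 - p_min))

def stage2_size(n1, responses, se_target, p_guess=None):
    """Further patients so that sqrt(p(1 - p) / (n1 + n2)) <= se_target.

    Returns 0 if no responses were seen (development is abandoned).
    p_guess: optional conservative planning value; default is the stage-1 estimate.
    """
    if responses == 0:
        return 0
    p = responses / n1 if p_guess is None else p_guess
    n_total = math.ceil(p * (1 - p) / se_target ** 2)
    return max(n_total - n1, 0)
```

With p_min = 0.2 and alpha = 0.05 this gives n1 = 14; after 3 responses in 14 patients, a target standard error of 0.05 requires 54 further patients.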
Several group sequential designs with a fixed sampling rule have been proposed and evaluated in the frequentist framework [57, 58]. An adaptive two-stage design allows the sample size at the second stage to depend on the results at the first stage [59].
A Bayesian design [60] stops the trial for activity as soon as the posterior probability that the true response rate is at least that of the standard exceeds 0.9, or stops for futility if the posterior probability that the true response rate represents a considerable improvement over the standard is less than 0.1.
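The decision rule can be sketched with a conjugate beta-binomial model. For simplicity the standard response rate p_std is treated as known and the experimental rate is given a Beta(1, 1) prior; the design in [60] also places a prior on the standard rate, so this is only an approximation. With integer posterior parameters the beta CDF can be computed exactly from the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a), which keeps the sketch free of external libraries. The improvement margin delta is a hypothetical input.

```python
import math

def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a, b, via
    I_x(a, b) = P(Binomial(a + b - 1, x) >= a)."""
    m = a + b - 1
    return sum(math.comb(m, j) * x ** j * (1 - x) ** (m - j) for j in range(a, m + 1))

def monitor(responses, n, p_std, delta, a=1, b=1, hi=0.9, lo=0.1):
    """Interim decision for a single-arm trial (p_std known, Beta(a, b) prior)."""
    a_post, b_post = a + responses, b + n - responses
    # P(true rate >= standard | data)
    p_active = 1 - beta_cdf_int(p_std, a_post, b_post)
    # P(true rate >= standard + delta | data), i.e. a considerable improvement
    p_improve = 1 - beta_cdf_int(min(p_std + delta, 1.0), a_post, b_post)
    if p_active > hi:
        return "stop: active"
    if p_improve < lo:
        return "stop: futile"
    return "continue"
```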
Instead of evaluating each treatment in isolation, one after the other, an adaptive design for the entire screening program can be considered [61-63]. The number of subjects per screening trial is chosen to minimize the time needed to identify a "promising" compound, subject to given constraints on the type I and II error risks for the entire screening program.
Comparing two treatments
The main objective of large-scale Phase III clinical trials is to confirm the clinical benefit of the experimental treatment by comparing it with a control (placebo or active). The clinical benefit is expressed through a parameter, an unknown population characteristic about which a hypothesis testing problem is formulated. A test statistic measures the advantage of experimental over control apparent from the sample of data available at an interim analysis.
A sequential design uses a stopping rule that stops the trial at a given stage if a boundary is crossed. If the test statistic stays within the boundaries, there is not enough evidence to reach a conclusion and a further interim look should be taken. A fully sequential design [64] has a very simple sampling rule: look after every observation. Group sequential designs [65] have two or more stages at which the test statistic is compared with the boundaries after groups of patients have been observed. These designs have a simple allocation rule with fixed randomization and a decision rule that simply determines whether to accept or reject the null hypothesis after stopping. The precise form of the stopping rule is determined by consideration of the significance level (type I error rate) and the power at the specified alternative (the desired treatment advantage on the primary endpoint). The appropriate type of stopping rule should reflect the main objective of the trial and the desirable reasons for stopping or continuing.
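The need for adjusted boundaries can be checked by simulation. The sketch below estimates, under the null hypothesis, the probability of crossing a two-sided boundary at any of K = 5 equally spaced looks, for naive repeated testing at 1.96, for the Pocock constant 2.413, and for O'Brien-Fleming boundaries 2.040 * sqrt(K/k), the published constants for two-sided alpha = 0.05 and K = 5. The simulated crossing rates should be close to 0.14, 0.05 and 0.05 respectively; the function name is hypothetical.

```python
import math
import random

def crossing_rate(boundaries, n_sims=100_000, seed=0):
    """Monte Carlo P(|Z_k| >= boundaries[k] at some look | H0), with equal group
    sizes so that Z_k = S_k / sqrt(k) for a standard normal random walk S_k."""
    rng = random.Random(seed)
    crossed = 0
    for _ in range(n_sims):
        s = 0.0
        for k, b in enumerate(boundaries, start=1):
            s += rng.gauss(0.0, 1.0)        # contribution of the k-th group
            if abs(s / math.sqrt(k)) >= b:  # boundary crossed: stop and reject H0
                crossed += 1
                break
    return crossed / n_sims

K = 5
naive = crossing_rate([1.96] * K)                                          # nominal 5% at each look
pocock = crossing_rate([2.413] * K)                                        # Pocock constant boundary
obf = crossing_rate([2.040 * math.sqrt(K / k) for k in range(1, K + 1)])   # O'Brien-Fleming shape
```

Repeated testing at the unadjusted 1.96 inflates the type I error rate to almost three times its nominal level, whereas both adjusted boundaries hold it near 5%; they differ in how easily they allow early stopping.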