An active control is an intervention, such as a drug, a therapy, or a medical device, whose effectiveness has previously been established. Active controls have been used in trials with or without a placebo. In clinical drug trials with a placebo, sometimes called trials with a gold standard design, active controls usually play a secondary role with respect to demonstrating the effect of a new treatment. In placebo control trials involving psychotropic agents, active controls are used to verify the assay sensitivity of the trials (Leber^{[1,2]}), i.e., the ability of the trial to demonstrate an effective drug to be effective (ICH-E10^{[3]}). In this article, active control trials with a gold standard design will not be considered; an active control trial refers specifically to a trial with a control that is known to be effective and without the presence of a placebo. Active control trials often arise when the objective of a trial is to investigate the effect of a new treatment on a mortality or serious morbidity outcome in patients with certain serious diseases. For obvious ethical reasons, when known effective treatments for the disease are available, one should use one of these active treatments instead of a placebo as the control (Declaration of Helsinki^{[4]}). This article will discuss the basic issues associated with an active control trial, and some of the issues attendant to the current methods used in the design and analysis of active control trials. The concepts and issues will be discussed within the framework of a blinded, parallel, randomized trial comparing the survival experience of a new treatment, T, to that of a single active control, C, in treating patients with a certain serious disease.
The primary objective of an active control trial is to demonstrate the effectiveness of a new treatment, T. To show that a new treatment, T, is effective, traditionally it is required to demonstrate that T is superior to C, the active control. This is analogous to a placebo control trial, where the new treatment, T, has to show superiority to the placebo. It often happens that, upon failing to demonstrate superiority in such a trial, the experimenter then concludes that the new treatment is equivalent, similar, or non-inferior to the active control. This reasoning is fallacious, however, because in a superiority trial, failing to reject the null hypothesis of inferiority or no difference does not imply that the null hypothesis is true. Unless the new treatment represents a significant therapeutic advance, it would in general be difficult for the new treatment to show superiority to the active control. This naturally leads to an interest in demonstrating the effectiveness of a new treatment through an active control non-inferiority (or one-sided equivalence) trial, whereby the new treatment is shown to be equivalent or no worse than the active control by a certain equivalence or non-inferiority margin δ. The question then is how this margin δ should be determined. One may refer to Temple,^{[5]} EMEA,^{[6]} and FDA^{[7]} for a discussion of some of the problems involved in the interpretation of active control trials. In the following sections, we will discuss how the two current methods arrive at the determination of this margin and the problems associated with these methods.
The first critical assumption implicit in any active control trial is that the active control is still effective in the current setting. This assumption cannot be verified short of including a placebo in the active control trial. If one does not believe that the active control is effective in the current setting, then one should either use a different active control that is believed to be effective in the current setting, or else demonstrate that the new treatment is superior to the active control. Therefore, for this reason as well as for obvious ethical reasons, one should use the most effective active control currently available as the control. With such an active control, one can at least feel more assured that the active control is effective in the present setting.
The second critical assumption that is also implicit in any active control trial is that the trial has assay sensitivity. Assay sensitivity refers to the ability of a trial to distinguish an effective treatment from an ineffective treatment. Without assay sensitivity, a trial cannot even detect the effect of a control that is known to be effective. Thus, with the first assumption, the active control is assumed to be effective in the current trial setting, and with assay sensitivity, the trial should be able to detect the effect of this control. In addition, if the new treatment is ineffective, then this trial should be able to differentiate it from the control.
There are generally two current methods for determining a non-inferiority margin. The first is termed the fixed margin method, and the second is called the fraction retention method, or sometimes the synthesis method (Hung et al.^{[8–10]} and Ng^{[11]}). Both assumptions above are needed by both methods. However, as will be noted later, the first assumption is implicitly made in each method in a different way.
Let the true hazard ratio of T relative to C be denoted by HR(T/C), and let hr(T/C) denote its estimator. Let P denote the reference or imputed “placebo,” and HR(P/C) and hr(P/C) denote the corresponding hazard ratios.
If there is reason to believe that the new treatment, T, represents a true therapeutic advance and one wishes to demonstrate the superiority of T over C, then the null and alternative hypotheses are simply,
H _{0}: HR(T/C) ≥ 1 vs. H _{a}: HR(T/C) < 1
In terms of log hazard, these hypotheses may be stated as,
H _{0}: log HR(T/C) ≥ 0 vs. H _{a}: log HR(T/C) < 0  (1)
The null hypothesis in Eq. 1 may be tested by the following test statistic:
τ _{0} = log hr(T/C)/SE[log hr(T/C)]  (2)
where SE stands for “standard error.” The null hypothesis in Eq. 1 is rejected if τ _{0} < −1.96 (Wald’s test), or equivalently, if log hr(T/C) + 1.96SE[log hr(T/C)] < 0, at the one-sided 2.5% significance level. As in a placebo control trial, in an active control superiority trial, there is only one parameter that will be tested. In the case here, this parameter is the log-hazard ratio, log HR(T/C), of the new treatment relative to the control.
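As a concrete illustration, this Wald superiority test can be sketched in a few lines of Python; the hazard ratio and standard error below are hypothetical numbers chosen purely for illustration, not values from this entry:

```python
import math

def wald_superiority_test(log_hr_tc, se_log_hr_tc, z_crit=1.96):
    """One-sided Wald test of H0: log HR(T/C) >= 0 at the 2.5% level.

    log_hr_tc   : estimated log hazard ratio of T relative to C
    se_log_hr_tc: standard error of that estimate
    Returns (tau0, reject).
    """
    tau0 = log_hr_tc / se_log_hr_tc
    # Equivalent rule: reject if the upper 95% CI limit of log HR(T/C) is below 0.
    reject = tau0 < -z_crit
    return tau0, reject

# Hypothetical trial result: hr(T/C) = 0.80 with SE[log hr(T/C)] = 0.09
tau0, reject = wald_superiority_test(math.log(0.80), 0.09)
```

With these invented numbers, τ _{0} ≈ −2.48 < −1.96, so the superiority null hypothesis would be rejected.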
However, if the new treatment is only as effective as the control, then the null hypothesis in Eq. 1 is unlikely to be rejected. One may not conclude that the new treatment is non-inferior (or equivalent) to the control upon failing to reject this null hypothesis, since failing to reject the null hypothesis in Eq. 1 does not imply that the null hypothesis is true.
When the new treatment is thought to be as effective as the control and the objective of the trial is to demonstrate the effectiveness of this new treatment by showing that it is non-inferior to the control, then the null and alternative hypotheses may be stated as follows:
H _{0}: HR(T/C) ≥ 1 + δ vs. H _{a}: HR(T/C) < 1 + δ
where δ > 0 is a fixed non-inferiority margin.^{[12]}
Again, in terms of log hazard, these hypotheses may be stated as follows:
H _{0}: log HR(T/C) ≥ δ vs. H _{a}: log HR(T/C) < δ  (3)
where δ > 0 is a fixed non-inferiority margin (this δ is different from the δ above; see Remark 3).
The null hypothesis in Eq. 3 may be tested by the following test statistic.
τ _{0}(δ) = [log hr(T/C) − δ]/SE[log hr(T/C)]  (4)
The null hypothesis in Eq. 3 is rejected if τ _{0}(δ) < −1.96, or equivalently, if log hr(T/C) + 1.96SE[log hr(T/C)] < δ, at the one-sided 2.5% significance level. One may then conclude that the new treatment is non-inferior to the active control within this fixed margin δ. The question is how should this δ margin be determined?
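The fixed-margin test differs from the superiority test only in shifting the numerator by δ. A minimal sketch, again with invented inputs (the hazard ratio, standard error, and margin δ = log 1.25 are assumptions for illustration only):

```python
import math

def fixed_margin_ni_test(log_hr_tc, se_log_hr_tc, delta, z_crit=1.96):
    """One-sided test of H0: log HR(T/C) >= delta, with delta > 0 on the log scale.

    Rejection supports non-inferiority of T to C within the fixed margin delta.
    """
    tau0_delta = (log_hr_tc - delta) / se_log_hr_tc
    # Equivalent rule: reject if log hr(T/C) + 1.96*SE[log hr(T/C)] < delta.
    reject = tau0_delta < -z_crit
    return tau0_delta, reject

# Hypothetical inputs: hr(T/C) = 0.98, SE = 0.07, margin delta = log(1.25)
stat, noninferior = fixed_margin_ni_test(math.log(0.98), 0.07, math.log(1.25))
```

For these hypothetical numbers the test rejects, i.e., non-inferiority within δ would be concluded.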
If the margin δ is arbitrarily selected, then the rejection of the null hypothesis in Eq. 3 may actually lead to the conclusion that a new treatment is non-inferior to the control when, in reality, it is worse than a placebo. This can easily be seen from the following scenario. Assume that in the current trial, the true hazard ratio is known and is given by HR(P/C). If the margin δ is specified so that it is greater than the effect of the control, i.e., δ > log HR(P/C), then the rejection of the null hypothesis in Eq. 3 does not rule out the possibility that the new treatment can be worse than the placebo, i.e., log HR(P/C) < log HR(T/C). This is easily seen from the diagram in Fig. 1.
From Fig. 1, it is clear that, in general, in an active control trial, one should not specify the margin δ to be an arbitrarily fixed number. In particular, one should avoid specifying a δ margin that is greater than the effect of the control. How can this be accomplished?
If the true hazard ratio, HR(P/C), were known, then this could be done easily by defining the δ margin to be a fraction of the control effect, e.g., δ = (1/2) log HR(P/C). However, the true control effect is never known, even if a placebo were present in the current trial; it can only be estimated. When there is a placebo in the current trial, the true control effect may be estimated by log hr(P/C). When there is no placebo in the current trial, the control effect needs to be estimated from historical data. In either case, one can at best provide a probability statement regarding the likelihood that δ < log HR(P/C). For example, if the true control effect, log HR(P/C), is estimated from the historical data by the lower limit of the 95% confidence interval (CI) of the control effect, then the δ margin defined as a fixed number equal to half of this estimate of the control effect, i.e.,
δ = (1/2){log hr(P/C) − 1.96SE[log hr(P/C)]}  (5)
provides a high probability that δ < log HR(P/C). But here it is implicitly assumed that the historical control effect remains the same in the current trial setting. If this constancy assumption does not hold, then this probability may not be high either.
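For illustration, such a margin can be computed from a hypothetical historical estimate of the control effect; the function name and all numbers below are invented, not taken from any real trial:

```python
import math

def margin_from_history(log_hr_pc, se_log_hr_pc, fraction=0.5, z=1.96):
    """Fixed margin: a fraction of the lower 95% CI limit of the historical
    control effect log HR(P/C); fraction = 1/2 gives the margin of Eq. 5."""
    lower_limit = log_hr_pc - z * se_log_hr_pc
    if lower_limit <= 0:
        # Historical data fail to establish that the control is effective.
        raise ValueError("control effect not established by historical data")
    return fraction * lower_limit

# Hypothetical historical meta-analysis: hr(P/C) = 1.5 with SE[log hr(P/C)] = 0.10
delta = margin_from_history(math.log(1.5), 0.10)
```

With these numbers δ ≈ 0.105 on the log-hazard scale, noticeably smaller than half the point estimate (≈ 0.203), reflecting the discounting built into the 95% lower limit.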
Fig. 1 Control effect, treatment effect, and margin δ
The test of the null hypothesis in Eq. 3 by τ _{0}(δ), where δ is defined as in Eq. 5, has been called the two 95% CI-testing procedure. That is, the null hypothesis in Eq. 3 is rejected if
log hr(T/C) + 1.96SE[log hr(T/C)] < (1/2){log hr(P/C) − 1.96SE[log hr(P/C)]}  (6)
A somewhat similar two 90% CI-testing procedure has been used in thrombolytic trials (White^{[13]}).
Note that this δ in Eq. 5 is not truly a fixed number because it is defined in terms of an estimate of the control effect from the observed historical data. Thus, treating δ as a fixed number in the hypotheses in Eq. 3 is not exactly correct.
It is argued that although the two 95% CI-testing procedure offers a high probability that δ < log HR(P/C), it may be extremely conservative because the true control effect, log HR(P/C), may be very much underestimated by the lower limit of the 95% CI of the control effect. But this argument is true only if the constancy assumption holds. An example given later will show that this may not be conservative enough when the constancy condition fails to hold.
On the other hand, using the point estimate, hr(P/C), in the estimate of the true control effect may lead to a margin δ that is larger than the true control effect with relatively high probability. That is, if one defines
δ = (1/2) log hr(P/C)  (7)
This probability would be even higher when the constancy assumption does not hold. Testing the null hypothesis in Eq. 3 with the statistic τ _{0}(δ), where δ is defined by Eq. 7, is equivalent to an asymmetric two CI-testing procedure where the control effect, log HR(P/C), is estimated by the point estimate, log hr(P/C), which can be thought of as the lower limit of the 0% CI of the control effect.
In the fixed margin method, the determination of the non-inferiority margin δ is based on an estimate of the historical control effect. Depending upon how this estimate is defined, the constancy assumption may or may not be invoked. For example, if the 95% confidence lower limit is used as the estimate of the control effect, then the constancy assumption is clearly not assumed, and the historical control effect is actually discounted. If the point estimate is used, then the constancy assumption is implicitly made, and there is no discounting of the historical control effect. The non-inferiority margin is then defined as half of this (possibly discounted) historical control effect. Thus, implicitly, through the way the control effect is estimated from the historical data, the fixed margin method assumes that the active control is effective in the current trial setting, with an effect size given by a certain discount of the historical control effect. The choice of the non-inferiority margin δ in the fixed margin method can therefore be problematic: it can be too liberal or too restrictive, depending upon whether the historical control effect is assumed to hold in the current trial setting or is discounted by taking, for example, the 95% confidence lower limit of the estimate of the historical control effect.
From the above discussion, it is clear that for the margin δ to satisfy the condition δ < log HR(P/C) with high probability, one must define δ as a certain fraction, 0 < φ _{0} < 1, of the control effect, log HR(P/C), which is unknown and needs to be estimated with data from historical placebo control trials. The idea of requiring retention of a certain fraction of the control effect has been discussed in Hauck and Anderson,^{[14]} Holmgren,^{[15]} Koch and Tangen,^{[16]} and Rothmann et al.^{[17]} Formulating the non-inferiority hypothesis in terms of a fraction of the control effect to be retained provides two advantages. First, with the specification of φ _{0}, one does not need to pre-specify a δ margin. Second, the fraction of the control effect to be retained can be specified according to the trial objective. For instance, if the objective of the current trial is to demonstrate that the new treatment is non-inferior to the active control, then the specified fraction should be closer to 1; if the objective is to demonstrate simply that the new treatment is effective, then the specified fraction may be relatively smaller. Current practice often sets φ _{0} = 0.5.
The definition of the fraction, φ, of control effect retained by the new treatment is simply,
φ = [log HR(P/C) − log HR(T/C)]/log HR(P/C) = 1 − log HR(T/C)/log HR(P/C)  (8)
where the active control is assumed to be effective, i.e., log HR(P/C) > 0, in the current trial. The non-inferiority hypotheses may be stated as follows.
H _{0}: φ ≤ φ _{0} vs. H _{a}: φ > φ _{0}  (9)
where φ _{0} is the desired fraction of the active control effect to be retained by the new treatment. Upon substituting the expression for φ in Eq. 8 into Eq. 9 and after some algebra, the null and alternative hypotheses in Eq. 9 can be restated as,
H _{0}: log HR(T/C) ≥ (1 − φ _{0}) log HR(P/C) vs. H _{a}: log HR(T/C) < (1 − φ _{0}) log HR(P/C)  (10)
The alternative hypothesis of interest in Eq. 10 can be interpreted as a desire to rule out a loss of more than (1 − φ _{0}) of the control effect. The fraction φ _{0} reflects, in a sense, the level of evidence one may demand of the new treatment. For example, when φ _{0} = 1, i.e., requiring 100% retention of the control effect, the hypotheses in Eq. 10 become the superiority hypotheses in Eq. 1. When φ _{0} = 0, i.e., requiring 0% retention of the control effect, the hypotheses in Eq. 10 become an indirect comparison to the reference “placebo.”
Comparing with Eq. 3, one may view the expression (1 − φ _{0}) log HR(P/C) in Eq. 10 as the δ margin, although here (1 − φ _{0}) log HR(P/C) is not a pre-specified number; it involves the true control effect, log HR(P/C), which is unknown. In the two 95% CI-testing procedure, the expression (1 − φ _{0}) log HR(P/C) in Eq. 10 is replaced by the estimate in Eq. 5, and the whole expression is treated as a fixed δ margin. In contrast to a superiority trial or the fixed margin method, in an active control non-inferiority trial the hypotheses in Eq. 10 involve two parameters: the hazard ratios HR(T/C) and HR(P/C), as discussed in Chen et al.^{[18]}
Recognizing the fact that log HR(P/C) is an unknown parameter, the following statistic T is proposed for testing the null hypothesis in Eq. 10.
T = [log hr(T/C) − (1 − φ _{0}) log hr(P/C)]/√{SE^{2}[log hr(T/C)] + (1 − φ _{0})^{2} SE^{2}[log hr(P/C)]}  (11)
T is claimed to be asymptotically normal and to have good convergence properties.^{[6]} One then rejects H _{0} at the one-sided 2.5% significance level if T < −1.96. The fraction retention approach of Rothmann et al.^{[17]} was applied in 2001 in the review and evaluation of the effectiveness of Xeloda in two active control studies (FDA^{[19]}) and is also discussed in Rothmann et al.^{[17]} An application of this method can also be found in Wang et al.^{[20]} Hung et al.^{[8–10]} termed this approach the Synthesis Method, referring to the synthetic nature of the test statistic T in Eq. 11, whereas Rothmann et al.^{[17]} called it the fraction retention method, referring to the fraction retention hypothesis in Eq. 9, to which the linearized test statistic T is applied.
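Under the constancy assumption (θ = 1, discussed below), the synthesis statistic T can be sketched as follows; all numerical inputs are hypothetical and chosen only to illustrate the computation:

```python
import math

def synthesis_statistic(log_hr_tc, se_tc, log_hr_pc, se_pc, phi0=0.5):
    """Synthesis (fraction retention) statistic T for testing
    H0: log HR(T/C) >= (1 - phi0) * log HR(P/C).

    log_hr_tc, se_tc: estimate and SE from the current active control trial
    log_hr_pc, se_pc: estimate and SE from historical placebo control trials
    """
    numerator = log_hr_tc - (1.0 - phi0) * log_hr_pc
    denominator = math.sqrt(se_tc ** 2 + (1.0 - phi0) ** 2 * se_pc ** 2)
    return numerator / denominator

# Hypothetical: current trial hr(T/C) = 1.02, SE = 0.06;
# historical hr(P/C) = 1.5, SE = 0.10; 50% retention (phi0 = 0.5).
T = synthesis_statistic(math.log(1.02), 0.06, math.log(1.5), 0.10)
reject = T < -1.96
```

Note that the historical estimate and its standard error enter the denominator, so the uncertainty in the historical control effect is carried into the test.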
Since in an active control trial there is no concurrent placebo, one cannot estimate the control effect, log HR(P/C), from the current trial. In Eq. 11, the control effect must be estimated, by log hr(P/C), from data from historical placebo control trials. For this to be valid, one has to assume either that the control effect has not changed over time or, if it has changed, that one knows by how much it has been reduced. For example, one may assume that the current control effect is a fraction, θ, of the historical control effect and include this fraction θ in front of the estimate, log hr(P/C), in Eq. 11. In practice, however, information about such a θ is difficult to come by. The fraction retention method as proposed by Rothmann et al.^{[17]} assumes that the constancy assumption holds. Thus, in the discussion below, we will assume that θ = 1.
It should be noted that Eq. 8 assumes log HR(P/C) > 0; that is, Rothmann’s fraction retention method assumes that the active control is effective. Furthermore, in the test statistic T in Eq. 11, the fraction of control effect lost, (1 − φ _{0}) log HR(P/C), is unknown and is estimated by (1 − φ _{0}) log hr(P/C) based on historical data. Thus, the method also implicitly makes the constancy assumption without stating it. Moreover, the statistic T combines data from the current trial with historical data from other completed trials, which raises questions regarding the validity of the statistical inference drawn from testing the hypotheses in Eq. 9 or Eq. 10 by T and causes difficulty in interpreting the result.
It should be noted that because the fraction retention approach uses the historical control data to estimate the control effect in its test statistic (Eq. 11), whereas the fixed margin approach uses the historical control data in the specification of its non-inferiority margin, the two methods appear to differ at least in the manner in which the control effect is estimated: the former assumes that the constancy assumption holds, whereas the latter discounts the historical control effect. However, one can cast both methods within the same fraction retention hypothesis framework given by Eq. 10, as described below. Within this framework, the liberalism or conservatism of these methods is easily seen from the test statistics used to test the fraction retention hypotheses in Eq. 10; these statistics differ simply in the standard errors used for the common estimator, log hr(T/C) − (1 − φ _{0}) log hr(P/C). Note that all these methods consider 50% retention of the control effect. Certainly, 50% retention is arbitrary and subjective, and one can use other thresholds if appropriate.
The test of the fixed δ-margin null hypothesis in Eq. 3 using the statistic τ _{0}(δ) with δ defined by Eq. 5 is equivalent to the two 95% CI-testing procedure (Eq. 6). It can easily be shown to be equivalent to using the following statistic t to test the fraction retention null hypothesis in Eq. 10:
t = [log hr(T/C) − (1 − φ _{0}) log hr(P/C)]/{SE[log hr(T/C)] + (1 − φ _{0}) SE[log hr(P/C)]}  (12)
One rejects H _{0} at the one-sided 2.5% significance level if t < −1.96. From the triangle inequality, one can easily deduce that
√{SE^{2}[log hr(T/C)] + (1 − φ _{0})^{2} SE^{2}[log hr(P/C)]} < SE[log hr(T/C)] + (1 − φ _{0}) SE[log hr(P/C)]  (13)
and hence it follows that T < t whenever the common numerator of T and t, log hr(T/C) − (1 − φ _{0}) log hr(P/C), is negative. Therefore, the null hypothesis H _{0} in Eq. 10 will be rejected by T whenever it is rejected by t at the same critical value, say, −1.96. This implies that, relative to testing the null hypothesis in Eq. 10, the statistic t deflates the type I error.
The test of the fixed δ-margin null hypothesis in Eq. 3 using the statistic τ _{0}(δ) with δ defined by Eq. 7 based on the point estimate, log hr(P/C), is a special case of an asymmetric two CI-testing procedure. It is easily seen to be equivalent to using the following statistic t* to test the null hypothesis in Eq. 10:
t* = [log hr(T/C) − (1 − φ _{0}) log hr(P/C)]/SE[log hr(T/C)]  (14)
Once again, note that t* < T < t whenever the common numerator of t*, T, and t, log hr(T/C) − (1 − φ _{0}) log hr(P/C), is negative. Thus, in testing the null hypothesis in Eq. 10 at the same critical value of −1.96, the statistic t* inflates the type I error. The statistic T thus protects against the conservatism of t and the liberalism of t* relative to testing the fraction retention null hypothesis in Eq. 10. It has been shown that, for time-to-event endpoints, the statistic t can deflate the type I error from 0.025 to as low as approximately 0.0028, while t* can inflate the type I error to as much as 0.50.^{[6]}
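The ordering t* < T < t (for a negative numerator) is easy to verify numerically. The sketch below evaluates all three statistics on the same hypothetical inputs; with these particular invented numbers, t* and T reject at −1.96 while the more conservative t does not:

```python
import math

def ni_statistics(log_hr_tc, se_tc, log_hr_pc, se_pc, phi0=0.5):
    """Compute t* (point-estimate margin), T (synthesis), and t (two 95% CI)
    for the fraction retention null hypothesis in Eq. 10."""
    num = log_hr_tc - (1.0 - phi0) * log_hr_pc
    t_star = num / se_tc                                         # no discounting
    T = num / math.sqrt(se_tc ** 2 + (1.0 - phi0) ** 2 * se_pc ** 2)
    t = num / (se_tc + (1.0 - phi0) * se_pc)                     # 95% CI discount
    return t_star, T, t

# Same hypothetical inputs for all three statistics
t_star, T, t = ni_statistics(math.log(1.02), 0.06, math.log(1.5), 0.10)
# With a negative numerator: t_star < T < t, so t* is the most liberal
# and t the most conservative test of the same null hypothesis.
```

This illustrates numerically how T sits between the liberal t* and the conservative t.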
On the other hand, it is of interest to point out that these test statistics can be shown to correspond conditionally to asymmetric two CI-testing procedures, where the control effect is estimated by the lower limit of a 100(1 − γ)% CI, 0 < γ < 1. The test statistic t* corresponds to γ = 1, t corresponds to γ = 0.05, and T corresponds to some γ = γ _{T}, where 0.05 < γ _{T} < 1. If C(1 − γ _{T}) < 1.96 denotes the critical value corresponding to γ _{T}, then
Although in general γ _{T} depends on the unknown standard error, SE[log hr(T/C)], for time-to-event endpoints, such as survival, one may obtain an approximate estimate of γ _{T} at the design stage, as shown in Ref. [6], because one can derive an approximate estimate of the unknown SE[log hr(T/C)] at the design stage and hence C(1 − γ _{T}), from which γ _{T} can be obtained. This will be illustrated in more detail in the later section on sample size determination. From this two CI-testing procedure perspective, one obtains an intuitive explanation as to why the statistic T protects against the conservatism of t and the liberalism of t*, even though the fraction retention method does not in general fit into the two CI-testing procedure framework, as indicated above, except in the case of time-to-event endpoints.
This duality between the fraction retention hypothesis testing approach and the two CI-testing procedure approach exists because both approaches use the historical data to estimate the control effect; this estimate is then used either in the test statistic or in the determination of the non-inferiority margin. It should be pointed out that all the methods discussed above use the same fraction retention level of 50%, and the control effect is estimated from historical data that come from studies that are completed and known. If historical data are not available for the active control, then none of these methods is applicable to an active control trial.
Remark 1: As noted previously, for the same fraction retention level, the fixed margin method is considered more conservative than the fraction retention method when the constancy assumption holds. However, the constancy assumption often cannot be verified without a placebo in the current trial. Therefore, in the fixed margin approach, an ostensible reason to select a conservative estimate of the control effect size (e.g., the lower limit of a 95% confidence interval) is to minimize the potential impact of a deviation from the constancy assumption that is made under the fraction retention method. What is perhaps not readily appreciated is that, in the fixed margin approach, simply using a conservative estimate of the control effect size may not totally avoid the impact of an actual deviation from the constancy assumption, as illustrated in the following example.
Example: Consider a highly effective cancer treatment C for relapsed and refractory multiple myeloma patients. Based on two randomized historical trials, the pooled point estimate of the response rate for the control treatment C is approximately 40%, with an associated 95% confidence interval of (36%, 45%). A non-inferiority trial is proposed to demonstrate that a new treatment T is non-inferior to the control treatment C. The non-inferiority margin was selected based on the lower limit of the 95% confidence interval, a response rate of 36%. Clinically, it is believed that the treatment effect in response rate can be compromised by at most 40% in a non-inferiority trial (i.e., a 60% retention of the control effect). Based on the fixed margin method, the non-inferiority margin is determined to be δ = (1 − 0.60) × 36% = 14.4%. The non-inferiority hypothesis using this margin may be considered conservative according to current regulatory guidance. However, if the constancy assumption does not hold in this case, and the control effect in the current non-inferiority trial is actually 25%, say, due to patient population differences and/or other reasons, the non-inferiority margin of 14.4% would be quite large compared to the current control effect of 25%.
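The margin arithmetic in this example, and the retention level implied when the current control effect is only 25%, can be checked directly (all numbers are taken from the example above):

```python
def fixed_margin_response(historical_lower_limit, retention):
    """Non-inferiority margin on the response-rate scale:
    (1 - retention) times the discounted historical control effect."""
    return (1.0 - retention) * historical_lower_limit

# 95% CI lower limit of 36% with 60% retention of the control effect
delta = fixed_margin_response(0.36, 0.60)   # 14.4 percentage points

# If the control effect in the current trial is actually only 25%,
# the same fixed margin corresponds to retaining far less than 60% of it:
current_effect = 0.25
implied_retention = 1.0 - delta / current_effect   # about 42%
```

The fixed margin of 14.4 percentage points, intended to preserve 60% of the control effect, would preserve only about 42% of a true current effect of 25%.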
Remark 2: As noted earlier, the fraction retention threshold φ _{0} determines the stringency of the non-inferiority margin, and in turn affects the sample size required for the non-inferiority trial. The important question of how stringent the fraction retention threshold should be for a given non-inferiority trial has never been resolved. It has been argued that the FDA drug regulations do not require that a new treatment be shown to be non-inferior to an active control in order to be approved; all that the regulations require is that the new treatment be superior to placebo. If that is the case, then what fraction retention threshold would be acceptable as a demonstration of superiority to a placebo that is not present? From an ethical perspective, for a trial with a mortality or serious morbidity outcome, it would be unacceptable to treat patients with a potentially inferior new treatment, unless the new treatment is expected to provide some other substantive benefit that may offset such inferiority. The problem, however, is that these potential benefits are yet to be demonstrated and quantified, and hence cannot be used in the actual determination of the non-inferiority margin, although they certainly may be considered in the final benefit/risk assessment. This raises the questions of how liberal or strict the fraction retention threshold should be and how one should determine it.
Remark 3: The definition of the fraction, φ, of control effect to be retained by the new treatment can also be stated in terms of hazard ratios as follows.
φ _{h} = [HR(P/C) − HR(T/C)]/[HR(P/C) − 1]  (15)
where the active control is assumed to be effective, i.e., HR(P/C) − 1 > 0, in the current trial. The non-inferiority hypotheses may be stated as follows.
H _{0}: φ _{h} ≤ φ _{0} vs. H _{a}: φ _{h} > φ _{0}  (16)
where φ _{0} is the desired fraction of the active control effect to be retained by the new treatment. Upon substituting the expression for φ _{h} in Eq. 15 into Eq. 16, and after some algebra, the null and alternative hypotheses in Eq. 16 can be restated as
H _{0}: HR(T/C) ≥ (1 − φ _{0}) HR(P/C) + φ _{0} vs. H _{a}: HR(T/C) < (1 − φ _{0}) HR(P/C) + φ _{0}  (17)
When 0 < φ < 1, it can be shown easily that φ < φ _{h} < 1. Thus, if the non-inferiority null hypothesis in Eq. 10 is rejected, then it follows that the null hypothesis in Eq. 16 is rejected for the same value of φ _{0}. Also, a test statistic for the hypotheses in Eq. 16 analogous to Eq. 14 can be constructed. An application of such a test statistic can be found in Ref. [19].
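The relation φ < φ _{h} is easy to verify numerically; the sketch below computes both retention fractions for hypothetical hazard ratios chosen purely for illustration:

```python
import math

def retention_log_scale(hr_tc, hr_pc):
    """phi: fraction of control effect retained on the log-hazard scale (Eq. 8)."""
    return 1.0 - math.log(hr_tc) / math.log(hr_pc)

def retention_hr_scale(hr_tc, hr_pc):
    """phi_h: fraction of control effect retained on the hazard ratio scale (Eq. 15)."""
    return (hr_pc - hr_tc) / (hr_pc - 1.0)

# Hypothetical: effective control, HR(P/C) = 1.5; new treatment with HR(T/C) = 1.2
phi = retention_log_scale(1.2, 1.5)     # log-scale retention
phi_h = retention_hr_scale(1.2, 1.5)    # hazard-ratio-scale retention
# phi < phi_h < 1: rejecting the log-scale null (Eq. 10) implies rejecting
# the hazard-ratio-scale null (Eq. 16) for the same phi_0.
```

For these numbers φ ≈ 0.55 while φ _{h} = 0.60, illustrating that the hazard-ratio-scale formulation credits the new treatment with a larger retained fraction.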
To pose the non-inferiority hypothesis in Eq. 16 directly in terms of the hazard ratios is attractive in that it offers ease of interpretation. Wang et al.^{[21]} have proposed a ratio test statistic for testing the non-inferiority hypothesis in Eq. 16. However, contrary to the authors’ claim that it has greater power than the linearized version, its power is comparable to that of Rothmann’s linearized test statistic for the hypothesis in Eq. 10. The reason is that the authors assumed equal variances under the null and alternative hypotheses in their derivation; when one allows unequal variances, as is likely to be the case, the power is substantially reduced.
The following are some important issues to be considered when contemplating an active control trial. See also the articles by Hung et al.^{[9–10]}
In order to have a valid statistical inference for an active control trial, several key assumptions have to be reasonably satisfied. The first assumption is that the active control is effective in the current trial setting. Generally, this assumption is unverifiable. If it is not true, then, for a superiority trial, one may only conclude that the new treatment is effective, but not that it is superior to the control; for a non-inferiority trial, it may lead to the approval of a new treatment that is actually worse than a placebo. If possible, one should consider special design features and other concurrent trials that may provide evidence that the active control is effective. The second assumption is that the current trial has assay sensitivity, defined as the ability of a trial to show an effective treatment to be effective. The ICH-E10 document on the Choice of Control^{[3]} introduces and discusses this concept in some detail; it essentially involves the quality of a trial, including various aspects of trial design, conduct, and analysis. The third assumption is that the control effect in the current trial is either the same as, or some known fraction of, the historical control effect, and that there are placebo control historical trials on the basis of which the control effect and its variability can be reasonably assessed. In this regard, issues regarding the quality, reliability, publication, and selection biases of these historical trials are of major concern and need to be dealt with. One should strive to make the current trial as similar as possible to the historical trials. If the active control effect has diminished, then one needs to be able to specify the amount of reduction. The fourth assumption is that the trial is free of bias. An active control trial is susceptible to bias toward the null even without unblinding of the trial. Strict standard operating procedures should be in place.
As none of these assumptions can be truly verified, the validity of an active control trial is always subject to question.
The fraction retention method may have a further validity issue, as its test statistic is based on combining data from the current trial with data from completed historical trials. On the other hand, the fixed margin method tends to be conservative when the constancy assumption holds; when this assumption does not hold, it is difficult to judge whether the specified margin is liberal or conservative. For both methods, the fraction of the control effect to be retained has customarily been set at one-half. This retention level is arbitrary, and for a given situation, other levels may be appropriate and should be considered.
Therefore, in view of these issues, active control trials should be considered only when using a placebo would truly be unethical.
As there is no placebo in the current active control trial, the control effect needs to be estimated from historical trial data. The questions one should address are as follows: What historical placebo control trials are available? Which collection of historical trials will be used to provide the estimate of the control effect? What kind of analysis will be used to provide this estimate? How reliable is the estimate of the historical control effect?
Although the control effect may be assessed using data from historical randomized placebo-controlled studies, the current control effect may differ because of changes in patient population, standard of care, medical practice, etc. One should design the current trial to be as similar as possible to the historical placebo control trials used in estimating the historical control effect. Important factors that may influence the trial outcome should be identified; these may include region, gender, ethnicity, patient characteristics, and differences in standard of care. The current trial should be as similar as possible to the historical trials with respect to these factors. Some adjustment of the historical control effect may be necessary, either by replacing the control effect, log HR(P/C), in the hypotheses in Eq. 10 with θ log HR(P/C), where 0 < θ < 1, or by adjusting via a mathematical model for population and study differences.
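As an illustration of one such analysis, the historical control effect might be estimated by inverse-variance (fixed-effect) pooling of the trial-level log HR(P/C) estimates, with the discount factor θ applied to the pooled result. The sketch below is ours, not a method prescribed here; the function name and inputs are hypothetical, and a random-effects model may be preferable when the historical trials are heterogeneous.

```python
import math

def pooled_log_hr(log_hrs, ses, theta=1.0):
    """Inverse-variance (fixed-effect) pooling of historical log HR(P/C)
    estimates; theta (0 < theta <= 1) optionally discounts the pooled
    effect for suspected drift in the control effect over time."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * x for w, x in zip(weights, log_hrs)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    # Multiplying the estimate by a fixed theta scales its SE by theta too.
    return theta * pooled, theta * pooled_se
```

For two equally precise hypothetical trials with log HR(P/C) estimates 0.20 and 0.30 (SE 0.10 each), the pooled estimate is 0.25 with SE ≈ 0.071; setting θ = 0.8 would discount the estimate to 0.20.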
One should be clear about the primary objective of the active control trial: is it to demonstrate superiority of the new treatment to the control, non-inferiority to the control, or simply effectiveness of the new treatment? Each of these objectives requires a hypothesis with a different degree of tightness in the margin. For a demonstration of non-inferiority, one needs to specify a fraction, φ_{0}, of the control effect to be retained. The specification of this fraction depends on the trial objective. If the active control is highly effective, there may be no legitimate reason to consider a new treatment that is inferior to it, and one may require φ_{0} to be close to 1. However, if the new treatment offers important benefits not available from the active control, such as a better toxicity profile, then φ_{0} may be specified somewhat smaller; in that case, one may only conclude that the new treatment is effective, not that it is non-inferior to the control. The actual value of φ_{0} may depend not only on the type of therapy, disease, and endpoint, but also on the distributional properties of the estimated control effect.
The conservatism of the two 95% CI-testing procedure does not take into account the concerns regarding the validity of the active control trial, nor the confidence one has in the estimate of the control effect. At the design stage of an active control trial, such concerns and lack of confidence may be reflected in the values specified for various design parameters, including the fraction, φ_{0}, of the control effect to be retained and the discount factor, θ, applied to the historical control effect. The specification of these design parameters should also take into consideration how the control effect estimate is calculated and the resultant sample size requirement. Once φ_{0} and θ have been specified, one can calculate the sample size required for testing the non-inferiority null hypothesis in Eq. 10.
The sample size required for testing the null hypothesis in Eq. 10 at a given significance level can be calculated as follows; the calculation depends on an appropriate assessment of the control effect. As mentioned before, using the statistic T to test the null hypothesis in Eq. 10 is conditionally equivalent to an asymmetric two CI-testing procedure in which the control effect is estimated by the lower limit of a 100(1 − γ_{T})% CI. The value of γ_{T} is not known ahead of time; however, for a time-to-event endpoint, it can be calculated approximately at the design stage from the pre-specified number of events at stopping (the final analysis) and the significance level used in the test of the null hypothesis in Eq. 10.
In practice, the control effect sometimes cannot be assessed reliably, either because the historical data are insufficient or because the control effect may have changed over time. In such cases, the sample size calculation should be based on a conservatively estimated control effect, e.g., the lower limit of a 95% or even a 99% CI.
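A minimal sketch of such a conservative assessment, using the example values log hr(P/C) = 0.234 and SE[log hr(P/C)] = 0.075 that appear later in this article (the function name is ours):

```python
import math
from statistics import NormalDist

def conservative_control_effect(log_hr_pc, se_pc, conf=0.95):
    """Lower confidence limit of the historical log HR(P/C), taken as a
    conservative estimate of the control effect at the design stage."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)  # two-sided normal quantile
    return log_hr_pc - z * se_pc
```

For the example values, the 95% lower limit is 0.234 − 1.96 × 0.075 ≈ 0.087; the 99% limit is smaller still, leading to a more conservative design with a larger sample size.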
In oncology, we require, as stopping rules for time-to-event endpoints, a pre-specified fixed number of events. A lower bound and approximation for the asymptotic standard error of log hr(T/C) under 1:1 randomization is given by $2\text{/}\sqrt{n}$, where n denotes the pre-specified total number of events at stopping. The use of this lower bound for the standard error at the design stage allows us to algebraically “equate” the use of a test statistic with a pre-specified two CI procedure.

Consider the case where α = 0.025, β = 0.2, φ_{0} = 0.5, θ = 1, log hr(P/C) = 0.234, and SE[log hr(P/C)] = 0.075. Table 1 gives the number of events needed for various “alternatives” for HR(T/C) and the corresponding non-inferiority cutoffs for the upper limit of the 95% CI for HR(T/C). The cutoff is defined as δ = (1 − φ_{0}){log hr(P/C) − c_{(1−γ)}SE[log hr(P/C)]}, where c_{(1−γ)} is the solution to the equation

$$\frac{2}{\sqrt{n}}+\frac{{c}_{(1-\gamma)}(1-{\varphi}_{0})\,SE[\mathrm{log}\,hr(P\text{/}C)]}{1.96}=\sqrt{\frac{4}{n}+{(1-{\varphi}_{0})}^{2}\,S{E}^{2}[\mathrm{log}\,hr(P\text{/}C)]},$$
| HR(T/C) | Number of events | Cutoff |
|---|---|---|
| 1 | 4801 | 1.0842 |
| 0.95 | 1505 | 1.0976 |
| 0.90 | 750 | 1.1044 |
| 0.85 | 446 | 1.1085 |
| 0.80 | 291 | 1.1114 |
The number of events needed at stopping is then given by the n that solves

$$(1-{\varphi}_{0})\{\mathrm{log}\,hr(P\text{/}C)-{c}_{(1-\gamma)}\,SE[\mathrm{log}\,hr(P\text{/}C)]\}-\mathrm{log}\,HR(T\text{/}C)=({z}_{1-\alpha}+{z}_{1-\beta})\,\frac{2}{\sqrt{n}},$$

where HR(T/C) is the assumed alternative hazard ratio and z_{1−α} and z_{1−β} are the standard normal quantiles corresponding to the significance level and power.
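The defining equation for c_{(1−γ)} and the event-count requirement can be solved jointly by simple fixed-point iteration. The following Python sketch is our reconstruction of that calculation (the names and the iteration scheme are ours); with the example's design values it approximately reproduces Table 1, up to rounding.

```python
import math
from statistics import NormalDist

def ni_design(hr_alt, log_hr_pc=0.234, se_pc=0.075, phi0=0.5,
              alpha=0.025, beta=0.20, tol=0.5, max_iter=200):
    """Jointly solve for the total number of events n, the constant
    c_(1-gamma), and the non-inferiority cutoff by fixed-point iteration.
    Defaults follow the worked example in the text."""
    nd = NormalDist()
    za, zb = nd.inv_cdf(1 - alpha), nd.inv_cdf(1 - beta)
    log_hr_alt = math.log(hr_alt)  # assumed alternative for log HR(T/C)
    n = 1000.0                     # initial guess for total number of events
    for _ in range(max_iter):
        se_t = 2 / math.sqrt(n)    # lower bound for SE[log hr(T/C)], 1:1 randomization
        se_total = math.sqrt(4 / n + (1 - phi0) ** 2 * se_pc ** 2)
        # c_(1-gamma) solves: 2/sqrt(n) + c*(1-phi0)*se_pc/1.96 = se_total
        c = (se_total - se_t) * za / ((1 - phi0) * se_pc)
        delta = (1 - phi0) * (log_hr_pc - c * se_pc)  # cutoff on the log scale
        # event-count requirement: delta - log HR(T/C) = (za + zb) * 2/sqrt(n)
        n_new = ((za + zb) * 2 / (delta - log_hr_alt)) ** 2
        if abs(n_new - n) < tol:
            n = n_new
            break
        n = n_new
    return round(n), math.exp(delta)
```

For instance, ni_design(1.0) yields an event count and cutoff close to the Table 1 values of 4801 and 1.0842, and ni_design(0.85) yields values close to 446 and 1.1085; small discrepancies reflect rounding in the published table.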
In the design of a non-inferiority trial, if one believes that the historical active control effect is likely to be maintained in the current trial setting, then a hypothesis requiring retention of a certain fraction of the control effect can be a reasonable formulation, although, as noted earlier, a basic problem with this approach is that the test statistic for the non-inferiority hypothesis is partly based on data from past completed studies (if they exist). If there is reason to believe that the historical control effect has been reduced over time, then the fixed margin approach would be more appropriate, although it remains a challenge to provide a reasonable estimate of the current control effect. If the control effect in the current setting is greater than this estimate, then the margin will be conservative; if it is smaller, then the margin will be liberal, as illustrated by the earlier example. In either the fraction retention or the fixed margin approach, there is a basic lack of a reference for assessing whether a given margin is too liberal or too restrictive. Further research would be needed in order to develop a more acceptable approach to the design of a non-inferiority study.
Active control non-inferiority trials should be considered on a case-by-case basis. We need assurance that the active control effect exists in the current study population, and it is recommended that the best available treatment be used as the control. We need to assess whether the current effect size, if it exists, has diminished. We should have historical randomized placebo-controlled studies that can provide a reliable estimate of the historical control effect, and the current trial population should be as similar as possible to the populations studied in those historical studies. A “non-inferiority” trial may be prone to bias toward “no difference,” so careful attention should be paid to the conduct and analysis of such a study; it should be well monitored and well conducted. After the study is done and all efficacy and safety data are collected, the evidence as to whether the new treatment is effective should be carefully examined. One should then assess this evidence together with any additional substantive benefits that the new treatment can provide to patients that the control cannot. In view of the subjective nature of the margin determination, it is recommended that in the final efficacy assessment the pre-specified non-inferiority margin serve more as a guide than as an absolute cutpoint. The final conclusion should, of course, still follow the usual benefit/risk assessment.
The views expressed in this article are those of the authors and not necessarily those of the U.S. Food and Drug Administration nor those of the individual companies affiliated with the co-authors.