A few challenging aspects of noninferiority design deserve mention. Even if there is no placebo group, an implicit superiority comparison between the test treatment and placebo underpins the noninferiority trial. Three-group studies that include a placebo group may allow an explicit comparison, but practical or ethical reasons often preclude randomized assignment to placebo, and instead historical data must be relied on for the placebo comparison. In some cases, historical data for a placebo treatment are not available. In these cases, less effective treatments may stand in for placebo to identify the expected benefit of the active control on which to base the noninferiority margin. In studies of stroke prevention, aspirin has been the comparator for warfarin (the active control), and trials comparing warfarin with aspirin provide estimates of a treatment effect used to set noninferiority margins for novel oral anticoagulants. In the case of coronary stents, bare-metal stents have been used as the reference for the treatment effect of approved drug-eluting stents (the active control) in studies of new drug-eluting stents. Treatment strategies such as percutaneous coronary intervention (PCI) for left main coronary artery disease and transcatheter therapy for valvular heart disease have been compared with surgery, and patients receiving medical therapy have served as a reference group for the treatment effect of surgery. Antiinfective therapies are an example of an area of investigation in which no placebo comparisons are available. Finally, in setting the sample-size goal, the noninferiority margin should not be “back calculated” solely from a feasible sample size. To do so may sufficiently exclude the chosen margin but will not necessarily reflect a conclusion of noninferiority that is clinically meaningful.
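The warning against back calculation can be made concrete with a rough sketch. The functions below are illustrative and not taken from any trial discussed here. They assume a binary end point, 1:1 randomization, identical true event rates in both groups, a one-sided alpha of 0.025, and a normal approximation for the risk difference. The names `sample_size_per_group` and `implied_margin`, and all numerical inputs, are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(event_rate, margin, alpha=0.025, power=0.90):
    """Approximate patients per group for a risk-difference noninferiority test.

    Assumes a binary end point, 1:1 randomization, and the same true
    event rate in both groups (normal approximation).
    """
    z = NormalDist().inv_cdf(1 - alpha) + NormalDist().inv_cdf(power)
    variance = 2 * event_rate * (1 - event_rate)
    return ceil(z ** 2 * variance / margin ** 2)


def implied_margin(event_rate, n_per_group, alpha=0.025, power=0.90):
    """Margin that a fixed sample size can exclude (the 'back calculation')."""
    z = NormalDist().inv_cdf(1 - alpha) + NormalDist().inv_cdf(power)
    return z * sqrt(2 * event_rate * (1 - event_rate) / n_per_group)


# Hypothetical scenario: 10% event rate in both groups.
for margin in (0.02, 0.03, 0.04):
    n = sample_size_per_group(0.10, margin)
    print(f"margin {margin:.2f} -> about {n} patients per group")

# Starting from what is feasible instead of what is clinically justified:
print(f"1000 per group -> implied margin {implied_margin(0.10, 1000):.3f}")
```

Under these assumptions the required sample size falls with the square of the margin, so a modest widening of the margin makes a trial far easier to run. That is why a margin derived only from `implied_margin` may be statistically excluded and still lack clinical justification.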

A noninferiority study design is increasingly being used to evaluate the safety of new therapeutics. A particular challenge in noninferiority design for safety studies is that there are usually no reasonable data to justify the margin for safety. Instead, the study’s clinical advisors must decide what level of adverse events is acceptable. That level might vary according to the severity of the events, the absolute risk for the patient population, and the expected benefit of the treatment in question. In the PRECISION (Prospective Randomized Evaluation of Celecoxib Integrated Safety versus Ibuprofen or Naproxen) trial, which evaluated the noninferiority of celecoxib to naproxen for the treatment of arthritis, a relative margin of 1.33 was chosen on the basis of an expected annualized risk of 2% for the primary composite end point of death from cardiovascular causes (including hemorrhage), nonfatal myocardial infarction, or nonfatal stroke. Although this was a three-group trial, the third group did not receive placebo but instead received ibuprofen, as a second noninferiority comparator for celecoxib. During the 10-year study period, the rate of treatment discontinuation was nearly 70%, showing that drug trials may also be susceptible to incomplete treatment adherence. Nonetheless, in both the primary intention-to-treat analyses and secondary “on treatment” analyses, celecoxib was noninferior to naproxen and to ibuprofen.
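To illustrate how a relative margin such as 1.33 is read, the sketch below first converts it into an absolute excess risk at the 2% annualized rate quoted above. It then checks whether the upper confidence bound of a hazard ratio falls below the margin. The event counts and hazard ratios are hypothetical and are not PRECISION results. The standard error uses the common approximation sqrt(1/d1 + 1/d2), in which d1 and d2 are the numbers of events in the two groups. The function names are likewise illustrative, and the check is a simplified single-criterion version of a noninferiority test.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def absolute_margin(annual_risk, relative_margin):
    """Excess annual risk permitted by a relative margin."""
    return annual_risk * (relative_margin - 1)


def hazard_ratio_upper_bound(hazard_ratio, events_test, events_control, alpha=0.025):
    """Upper one-sided confidence bound for a hazard ratio.

    Assumes SE(log HR) is approximately sqrt(1/d1 + 1/d2).
    """
    se = sqrt(1 / events_test + 1 / events_control)
    z = NormalDist().inv_cdf(1 - alpha)
    return exp(log(hazard_ratio) + z * se)


def is_noninferior(hazard_ratio, events_test, events_control, margin=1.33):
    """Noninferiority is claimed only if the upper bound lies below the margin."""
    return hazard_ratio_upper_bound(hazard_ratio, events_test, events_control) < margin


# A relative margin of 1.33 at a 2% annual risk allows 0.66 percentage points of excess risk.
print(f"absolute margin: {absolute_margin(0.02, 1.33):.4f} per year")

# Hypothetical trials with the same point estimate but different event counts.
print(is_noninferior(1.00, events_test=200, events_control=200))
print(is_noninferior(1.00, events_test=50, events_control=50))
```

With identical point estimates, the smaller hypothetical trial fails to exclude the margin because its confidence interval is wider. The conclusion depends on the precision of the estimate as well as on the estimate itself.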

Good trials include a variety of investigators with a representative mix of experience appropriate to the intervention under study. The trial of short-term use of antiseptic-coated versus antibiotic-impregnated versus plain latex catheters made substantial efforts to include a heterogeneous group of hospitals, specialists, and surgical procedures. Despite such examples, this is a dimension on which many trials fail the pragmatism test. A pragmatic approach is easier when an intervention is implemented at a group level rather than at an individual level; this is one reason that pragmatic trials commonly incorporate cluster randomization. Even so, recruiting clusters can be difficult. In ASSIST, only 113 of 233 possible schools (48%) expressed an interest in participating in the trial. The percentage of potential clusters agreeing to take part will vary according to the trial context. A trial that is run by an overarching authority may achieve much higher participation. For example, a hospital could insist on the full involvement of all wards in a trial of approaches to infection control.

Finally, noninferiority designs raise analytic questions that may differ from those in a superiority study. In a superiority study, an intention-to-treat analysis (in which all patients who underwent randomization are analyzed in their assigned group, regardless of the treatment actually received) is used. In a noninferiority study, however, if some patients did not receive the full course of the assigned treatment, an intention-to-treat analysis may produce a bias toward a false positive conclusion of noninferiority by narrowing the difference between the treatments. In some instances, a per-protocol analysis, which excludes patients who did not meet the inclusion criteria or did not receive the randomized, per-protocol assignment, may be preferable in a noninferiority trial. However, a per-protocol analysis may include fewer participants and introduce postrandomization bias. In general, both the intention-to-treat and per-protocol data sets are important. We suggest analyzing both sets and examining the results for consistency. Furthermore, careful consideration and sensitivity analyses may be needed before drawing conclusions about noninferiority.
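The way incomplete adherence can narrow the observed difference in an intention-to-treat analysis can be shown with a deliberately simple calculation. The model below is an assumption made for illustration, not a method described in this article. It supposes that the same fraction of patients in each group stops the assigned treatment, and that those patients share a common event rate whatever their assignment. All rates and function names are hypothetical.

```python
def itt_event_rate(rate_on_treatment, rate_off_treatment, nonadherence):
    """Event rate in a randomized group when a fraction stops assigned treatment."""
    return (1 - nonadherence) * rate_on_treatment + nonadherence * rate_off_treatment


def itt_difference(rate_test, rate_control, rate_off_treatment, nonadherence):
    """Observed intention-to-treat risk difference (test minus control)."""
    test = itt_event_rate(rate_test, rate_off_treatment, nonadherence)
    control = itt_event_rate(rate_control, rate_off_treatment, nonadherence)
    return test - control


# Hypothetical true rates: 14% with the test treatment, 10% with the active
# control, and 12% among patients who stop their assigned treatment.
for nonadherence in (0.0, 0.25, 0.5, 0.7):
    diff = itt_difference(0.14, 0.10, 0.12, nonadherence)
    print(f"nonadherence {nonadherence:.0%}: observed difference {diff:.3f}")
```

Under this model the observed difference is the true difference multiplied by the adherent fraction. A truly inferior treatment can therefore appear to fall within the margin when discontinuation is high. This is one reason to compare the intention-to-treat and per-protocol results rather than relying on either alone.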

A trial with blinded interventions is not fully pragmatic. In pragmatic trials, the randomly assigned group is commonly not masked. Efforts that are made to minimize biases in open trials include focusing outcomes on major events, such as death and emergency hospital admissions. This approach has been used in the Prospective Randomized Open Blinded End-point (PROBE) trials, such as the Anglo-Scandinavian Cardiac Outcomes Trial–Blood Pressure Lowering Arm (ASCOT-BPLA) trial and the Systolic Blood Pressure Intervention Trial (SPRINT) of the effect on cardiovascular events of different strategies for lowering blood pressure. However, the reporting of nonserious adverse events, reasons for treatment discontinuation, and many patient-reported outcomes are subject to greater degrees of bias in open trials, which affects the quality of the trial. The Initial Antidepressant Choice in Primary Care trial, a policy trial of fluoxetine versus tricyclic drugs as first-line therapy for depression, assessed the consequences of the initial choice of an antidepressant agent under usual care conditions; adverse events were a main outcome, and the open nature of the trial could have compromised the integrity of this outcome. In the trial, clinical and quality-of-life outcomes and overall treatment costs provided no clear guidance regarding the initial selection of fluoxetine or tricyclic drugs. The CRASH trial involved a placebo control and blinding; nonetheless, it had many pragmatic elements. In many situations, the need to avoid reporting bias will override purist pragmatic considerations, making blinding the optimal approach. In complex intervention trials, in which blinding the intervention is often impossible, it is usually possible to blind the assessment of outcomes. In any trial, the advantages and disadvantages of blinding must be considered; blinding is particularly important when the reporting of key end points or safety events could be biased in an open trial.