2007 DesignandAnalysisofSteppedWedge

(Hussey & Hughes, 2007) ⇒ Michael A Hussey, and James P Hughes. (2007). “Design and Analysis of Stepped Wedge Cluster Randomized Trials.” Contemporary Clinical Trials, 28(2). doi:10.1016/j.cct.2006.05.007

Subject Headings: Stepped Wedge Cluster Randomized Trial, Partner Notification of Potential STI Exposure Task.

Notes

Cited By

http://scholar.google.com/scholar?q=%22Design+and+analysis+of+stepped+wedge+cluster+randomized+trials%22+2007

Quotes

Author Keywords

Cluster randomized trial; Stepped wedge design; Prevention trials.

Abstract

Cluster randomized trials (CRT) are often used to evaluate therapies or interventions in situations where individual randomization is not possible or not desirable for logistic, financial or ethical reasons. While a significant and rapidly growing body of literature exists on CRTs utilizing a “parallel” design (i.e. I clusters randomized to each treatment), only a few examples of CRTs using crossover designs have been described. In this article we discuss the design and analysis of a particular type of crossover CRT – the stepped wedge – and provide an example of its use.

1. Introduction

Cluster (or community, or group) randomized trials (CRT) are distinguished by the fact that individuals are randomized in groups rather than individually. CRTs have been used to evaluate antismoking interventions [1,2], methods of preventing human immunodeficiency virus (HIV) and other sexually transmitted diseases (STDs) [3,4], and in a number of other contexts [5,6]. Cluster designs may be chosen because the intervention can only be administered on a community-wide scale (e.g. [7]), or to minimize contamination ([8]), or for other logistic, financial or ethical reasons. From a statistical viewpoint, the key characteristic of CRTs is that the individual units within a cluster are correlated and this feature must be incorporated into power calculations and the trial analysis.

CRTs often employ a parallel design: for a two-arm study with 2I independent clusters, I clusters are randomly assigned to each intervention at a single time point. If the cluster sizes are all equal, a two-sample t-test may be used to compare cluster-level mean responses between the intervention groups. If there are more than 2 treatment arms, a one-way analysis of variance may be used. Sometimes the communities are matched and randomization is done within the matched sets. In that case, a paired analysis (e.g. paired t-test) is used. When cluster sizes vary, individual level analyses using generalized estimating equations [17] or random effects models [16] may be used. Statistical aspects of the design and analysis of parallel CRTs have been widely discussed (e.g. [9,10]).
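The cluster-level comparison described above can be sketched in a few lines of Python. This is only an illustration under the equal-cluster-size assumption: the function name `cluster_level_t` and the prevalence values are hypothetical and do not come from the paper.

```python
from math import sqrt
from statistics import mean, variance


def cluster_level_t(control_means, treated_means):
    """Pooled-variance two-sample t statistic on cluster-level means.

    Each list holds one mean response per cluster, so the cluster
    (not the individual) is the unit of analysis.
    """
    n0, n1 = len(control_means), len(treated_means)
    pooled_var = ((n0 - 1) * variance(control_means)
                  + (n1 - 1) * variance(treated_means)) / (n0 + n1 - 2)
    se = sqrt(pooled_var * (1 / n0 + 1 / n1))
    t_stat = (mean(treated_means) - mean(control_means)) / se
    return t_stat, n0 + n1 - 2  # statistic and degrees of freedom


# Hypothetical cluster-level prevalences, I = 4 clusters per arm.
control = [0.10, 0.12, 0.09, 0.11]
treated = [0.07, 0.08, 0.06, 0.09]
t_stat, df = cluster_level_t(control, treated)
print(round(t_stat, 3), df)
```

Because the test works on one summary value per cluster, the within-cluster correlation is handled implicitly: correlated individuals never enter the calculation as separate observations.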

In contrast, crossover designs are less commonly used in CRTs (three examples are [6,11,12]). A crossover CRT requires fewer clusters than a parallel design but may take twice as long (or longer) to complete (since each cluster receives both the treatment and control interventions). If the intervention requires a lengthy follow up period, then this fact alone might make a crossover design impractical. In a standard crossover design the order of the interventions is randomized for each cluster and a time period (called the “washout” period) is often included between the two interventions so that the first intervention does not affect the second. Analysis of a standard crossover design focuses on within-cluster comparisons using a paired t-test.

A stepped wedge design [13] is a type of crossover design in which different clusters cross over (switch treatments) at different time points. In addition, the clusters cross over in one direction only — typically, from control to intervention. The first time point usually corresponds to a baseline measurement where none of the clusters receive the intervention of interest. At subsequent time points, clusters initiate the intervention of interest and the response to the intervention is measured. More than one cluster may start the intervention at a time point, but the time at which a cluster begins the intervention is randomized. Fig. 1 illustrates the differences between the parallel, traditional crossover and stepped wedge designs.
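A treatment schedule of the kind just described can be generated as follows. This is a minimal sketch, assuming an equal number of clusters crosses over at each step; the function name `stepped_wedge_schedule` and its parameters are hypothetical.

```python
import random


def stepped_wedge_schedule(n_clusters, n_steps, seed=None):
    """Return an n_clusters x (n_steps + 1) matrix of 0/1 treatment flags.

    Column 0 is the baseline period (all control). At each later step an
    equal share of clusters, chosen at random, crosses over to the
    intervention and stays on it for the rest of the trial.
    """
    if n_clusters % n_steps != 0:
        raise ValueError("n_clusters must be a multiple of n_steps")
    per_step = n_clusters // n_steps
    order = list(range(n_clusters))
    random.Random(seed).shuffle(order)  # randomize the crossover times
    schedule = [[0] * (n_steps + 1) for _ in range(n_clusters)]
    for rank, cluster in enumerate(order):
        start = 1 + rank // per_step  # first period on the intervention
        for t in range(start, n_steps + 1):
            schedule[cluster][t] = 1
    return schedule


schedule = stepped_wedge_schedule(n_clusters=6, n_steps=3, seed=1)
for row in schedule:
    print(row)
```

Each row is one cluster and each column one time point. Rows never switch back from 1 to 0, which encodes the unidirectional crossover.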

Although the stepped wedge design extends the length of a randomized trial due to the presence of multiple time intervals, the nature of the design may be beneficial in certain settings. In a parallel or traditional crossover design, the intervention must be implemented in half of the total clusters simultaneously. However, limited resources or geographical constraints may make this logistically impossible (e.g. [13]). The stepped wedge design allows the researcher to implement the intervention in a smaller fraction of the clusters at each time point. Another unique feature of the stepped wedge design is that the crossover is unidirectional. All clusters eventually receive the intervention and, in particular, the intervention is never removed once it has been implemented (at least over the course of the trial) which may alleviate ethical and/or community concerns. This makes the stepped wedge design particularly useful for evaluating the population-level impact of an intervention that has been shown to be effective in an individually randomized trial. The unidirectional aspect of the crossover does, however, complicate the analysis since the treatment effect can no longer be estimated exclusively from within-cluster comparisons. More details on the analysis of such trials are provided below.

…

2. Example — partner notification

…

3. Statistical issues

In this section we examine a number of issues related to the design and analysis of stepped wedge CRTs.

…

4. Discussion

Using theoretical calculations and simulation we have investigated statistical characteristics of the stepped wedge design for cluster randomized trials. In particular, we have outlined a procedure for computing power in such trials and investigated the effect of varying intercluster correlation, number of randomization steps and treatment delay on trial power. The design is relatively insensitive to variations in the intercluster correlation. We also found that, for a fixed number of clusters, power decreases as the number of randomization steps decreases. Most of the power loss is due to a reduction in the number of measurement times rather than the reduction in randomization steps, per se. However, in practice, the optimal situation of having one cluster randomized to the intervention at each time point may be infeasible. A practical strategy is simply to maximize the number of time intervals given constraints on the number of clusters that can logistically be started at one time point and the desired length of the trial.
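The power procedure mentioned above can be sketched numerically. The code below assumes the closed-form variance of the treatment-effect estimator that is usually attributed to Section 3 of this paper, which this excerpt elides, so it should be checked against the published equations before use. The names `sw_variance` and `sw_power`, and the numeric inputs, are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def sw_variance(X, sigma2, tau2):
    """Assumed closed-form Var(theta_hat) for a schedule X (clusters x times).

    sigma2: variance of a cluster-period mean (assumed sigma_e^2 / N).
    tau2:   variance of the random cluster effect.
    """
    I, T = len(X), len(X[0])
    U = sum(sum(row) for row in X)                      # total treated cells
    W = sum(sum(X[i][j] for i in range(I)) ** 2 for j in range(T))
    V = sum(sum(row) ** 2 for row in X)
    num = I * sigma2 * (sigma2 + T * tau2)
    den = (I * U - W) * sigma2 + (U**2 + I * T * U - T * W - I * V) * tau2
    return num / den


def sw_power(X, theta, sigma2, tau2, alpha=0.05):
    """Approximate two-sided power from a normal (Wald) test."""
    z = NormalDist()
    se = sqrt(sw_variance(X, sigma2, tau2))
    return z.cdf(abs(theta) / se - z.inv_cdf(1 - alpha / 2))


# Four clusters: one crossing over per step versus two per step.
four_steps = [[0, 1, 1, 1, 1], [0, 0, 1, 1, 1], [0, 0, 0, 1, 1], [0, 0, 0, 0, 1]]
two_steps = [[0, 1, 1], [0, 1, 1], [0, 0, 1], [0, 0, 1]]
power_four = sw_power(four_steps, theta=1.0, sigma2=1.0, tau2=0.5)
power_two = sw_power(two_steps, theta=1.0, sigma2=1.0, tau2=0.5)
print(power_four > power_two)
```

As a sanity check, a single-period parallel layout with half the clusters treated reduces this expression to 4(sigma2 + tau2)/I, the usual variance of a difference in cluster-level means.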

We found that a delay in the treatment effect (i.e. where the full treatment effect is not realized until one or more time intervals after the intervention is introduced) significantly reduces power. Delays can be incorporated into the power calculations by using fractional values for the treatment covariate in the design matrix Z. Explicit modeling of the delay in this manner recovers a small portion of the power. Adding additional monitoring periods at the end of the trial results in additional power recovery. However, the loss in power due to a delay in the treatment effect generally cannot be fully recovered. Therefore, it is desirable to make each monitoring period long enough so that the effect of the treatment is fully realized before the next period begins.
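The fractional treatment covariate can be illustrated with a small helper that rewrites a 0/1 schedule. This is a sketch only: the function `apply_delay`, the `ramp` parameter and the 50% first-period effect are assumptions chosen for illustration, not values from the paper.

```python
def apply_delay(schedule, ramp):
    """Replace 0/1 treatment indicators with fractional exposure values.

    ramp[k] is the assumed fraction of the full effect realized k periods
    after crossover; periods beyond the ramp receive the full effect (1.0).
    """
    delayed = []
    for row in schedule:
        # Index of the first intervention period, or None if never treated.
        start = next((t for t, x in enumerate(row) if x), None)
        new_row = []
        for t, x in enumerate(row):
            if not x:
                new_row.append(0.0)
            else:
                k = t - start  # periods elapsed since crossover
                new_row.append(ramp[k] if k < len(ramp) else 1.0)
        delayed.append(new_row)
    return delayed


schedule = [[0, 1, 1, 1, 1], [0, 0, 1, 1, 1], [0, 0, 0, 1, 1], [0, 0, 0, 0, 1]]
delayed = apply_delay(schedule, ramp=[0.5])
for row in delayed:
    print(row)
```

In this example the last cluster only ever reaches half exposure, which is one way to see why extra monitoring periods at the end of the trial recover some power.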

Analyses that rely on within-cluster information only (e.g. paired t-test) provide a valid analysis of the stepped wedge design only if there are no time effects. Otherwise, a within-cluster analysis provides a biased estimate of the treatment effect. A formula for the bias was derived based on the treatment schedule and the true values of time effect parameters β1, …, βT−1. Within-cluster analyses should only be used if no significant temporal trends or fluctuations are expected over the course of the trial. However, if external or a priori information suggests that there are no time effects then an analysis based on model (3) without parameters for time still provides a more efficient analysis than the paired t-test.
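A noise-free numerical illustration shows how a time trend biases a within-cluster comparison. This is not the paper's bias formula. It assumes a simple estimator (mean outcome after crossover minus mean outcome before, averaged over clusters), and the function name and all numeric values are hypothetical.

```python
from statistics import mean


def within_cluster_estimate(schedule, time_effects, theta):
    """Average over clusters of (mean after crossover - mean before).

    Outcomes are noise-free: y[i][t] = time_effects[t] + theta * x[i][t],
    so any gap between the returned value and theta is pure bias.
    """
    diffs = []
    for row in schedule:
        y = [b + theta * x for b, x in zip(time_effects, row)]
        before = [v for v, x in zip(y, row) if x == 0]
        after = [v for v, x in zip(y, row) if x == 1]
        diffs.append(mean(after) - mean(before))
    return mean(diffs)


schedule = [[0, 1, 1, 1, 1], [0, 0, 1, 1, 1], [0, 0, 0, 1, 1], [0, 0, 0, 0, 1]]
theta = -0.10  # assumed true treatment effect
flat = within_cluster_estimate(schedule, [0.0] * 5, theta)
trend = within_cluster_estimate(schedule, [0.0, 0.02, 0.04, 0.06, 0.08], theta)
print(round(flat, 3), round(trend, 3))
```

With no time effects the estimator returns the assumed effect. With the rising trend, half of the effect is masked, because every cluster's intervention periods come later in calendar time than its control periods.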

An anonymous reviewer suggested modifying (1) by including time as a random effect. We felt that this approach did not reflect our interest in controlling for temporal trends and fluctuations in disease prevalence over the course of a particular trial (and a relatively complex model for the time effect might be required since – for infectious disease studies – adjacent time periods are unlikely to be independent). Nonetheless, we found this idea interesting and potentially applicable in some circumstances. Such an approach might be particularly appropriate if temporal variations in the outcome were thought to be due to factors unrelated to changes in the underlying disease prevalence (e.g. changes in personnel doing outcome surveys). Further development of this idea is warranted.

Using simulations, we compared LMM, GLMM, and GEE with respect to size and power for a trial with 24 clusters and 5 time intervals (to mimic the Washington state EPT trial). The simulation results agreed well with predictions based on asymptotics — LMM maintained the nominal test size and had power close to that predicted by Eq. (7) for the case of equal cluster sizes. GEE and GLMM showed evidence of inflated size that could be resolved using a jackknife variance estimate. This phenomenon may be due to the limited number of clusters [23]. Although LMM had a slight power advantage when cluster sizes were equal, GEE and GLMM were substantially more efficient than LMM when cluster sizes varied.

Model (3) assumes that there are no cluster by time interactions. Including such interactions would result in an overparameterized model, however. If a cluster by time interaction is expected then one possible strategy is to create strata of clusters with similar expected time trends. Then a stratum by time interaction could be included as a factor in the model.

The stepped wedge design provides an innovative choice for a cluster randomized crossover trial that is subject to constraints that limit the use of more conventional designs. The stepped wedge seems particularly suited to investigations of community level public health interventions that have been proven effective in individual level trials and so-called “phase IV” effectiveness trials.
