2008 MethodsforTestingTheoryandEvalu

(Brown, Wang, et al., 2008) ⇒ C. Hendricks Brown, Wei Wang, Sheppard G Kellam, Bengt O. Muthén, Hanno Petras, Peter Toyinbo, Jeanne Poduska, Nicholas Ialongo, Peter A Wyman, Patricia Chamberlain, and The Prevention Science and Methodology Group. (2008). “Methods for Testing Theory and Evaluating Impact in Randomized Field Trials: Intent-to-treat Analyses for Integrating the Perspectives of Person, Place, and Time.” In: Drug and Alcohol Dependence Journal, 95. doi:10.1016/j.drugalcdep.2007.11.013

Subject Headings: Group-randomized trial; Subject-level Controlled Experiment

Notes

Cited By

http://scholar.google.com/scholar?q=%22Methods+for+testing+theory+and+evaluating+impact+in+randomized+field+trials%3A+Intent-to-treat+analyses+for+integrating+the+perspectives+of+person%2C+place%2C+and+time%22+2008

Quotes

Author Keywords

Intent-to-treat analysis; Group-randomized trials; Mediation; Moderation; Multilevel models; Growth models; Mixture models; Additive models; Random effect models; Developmental epidemiology; Prevention

Abstract

Randomized field trials provide unique opportunities to examine the effectiveness of an intervention in real world settings and to test and extend both theory of etiology and theory of intervention. These trials are designed not only to test for overall intervention impact but also to examine how impact varies as a function of individual level characteristics, context, and across time. Examination of such variation in impact requires analytical methods that take into account the trial's multiple nested structure and the evolving changes in outcomes over time. The models that we describe here merge multilevel modeling with growth modeling, allowing for variation in impact to be represented through discrete mixtures — growth mixture models — and nonparametric smooth functions — generalized additive mixed models. These methods are part of an emerging class of multilevel growth mixture models, and we illustrate these with models that examine overall impact and variation in impact. In this paper, we define intent-to-treat analyses in group-randomized multilevel field trials and discuss appropriate ways to identify, examine, and test for variation in impact without inflating the Type I error rate. We describe how to make causal inferences more robust to misspecification of covariates in such analyses and how to summarize and present these interactive intervention effects clearly. Practical strategies for reducing model complexity, checking model fit, and handling missing data are discussed using six randomized field trials to show how these methods may be used across trials randomized at different levels.

1. Introduction

Randomized field trials (RFTs) provide a powerful means of testing a defined intervention under realistic conditions. Just as important as the empirical evidence of overall impact that a trial provides (Flay et al., 2005), an RFT can also refine and extend both etiologic theory and intervention theory. Etiologic theory examines the role of risk and protective factors in prevention, and an RFT formally tests whether changes in these hypothesized factors lead to the prevention of targeted outcomes. Theories of intervention characterize how changes in risk or protective factors affect immediate and distal targets and how specific theory-driven mediators produce such changes (Kellam and Rebok, 1992; Kellam et al., 1999). The elaborations in theory that can come from an RFT draw on understanding the interactive effects of individual level variation in response over time to different environmental influences. An adolescent drug abuse prevention program that addresses perceived norms, for example, may differentially affect those already using substances compared to nonusers. This intervention’s effect may also differ in schools that have norms favoring use compared to schools with norms favoring nonuse. Finally, the impact may differ in middle and high school as early benefits may wane or become stronger over time.

This paper presents a general analytic framework and a range of analytic methods that characterize intervention impact in RFTs that may vary across individuals, contexts, and time. The framework begins by distinguishing the types of research questions that RFTs address, then continues by introducing a general three-level description of RFT designs. Six different RFTs are described briefly in terms of these three levels, and illustrations are used to show how to test theoretically driven hypotheses of impact variation across persons, place, and time. In this paper, we focus on intent-to-treat (ITT) analyses that examine the influence of baseline factors on impact, and leave all post-assignment analyses, such as mediation analysis, for discussions elsewhere. This separation into two parts is for pragmatic and space considerations only, as post-assignment analyses provide valuable insights into ITT results and are generally included in major evaluations of impact. For these intent-to-treat analyses, we present standards for determining which subjects should be included in analyses, how missing data and differences in intervention exposure should be handled, and what causal interpretations can be legitimately drawn from the statistical summaries. We present the full range of different modeling strategies available for examining variation in impact, and we emphasize those statistical models that are the most flexible in addressing individual level and contextual factors across time. Two underutilized methods for examining impact, generalized additive mixed models (GAMM) and growth mixture models (GMM), are presented in detail and applied to provide new findings on the impact of the Good Behavior Game (GBG) in the First Generation Baltimore Prevention Program trial.

We first define a randomized field trial and then describe the research questions it answers. An RFT uses randomization to test two or more defined psychosocial or education intervention conditions against one another in the field or community under realistic training, supervision, program funding, implementation, and administration conditions. All these conditions are relevant to evaluating effectiveness or impact within real world settings (Flay, 1986). In contrast, there are other randomized trials that test the efficacy of preventive interventions in early phases of development. These efficacy trials are designed to examine the maximal effect under restricted, highly standardized conditions that often reduce individual or contextual variation as much as possible. Testing efficacy requires that the intervention be implemented as intended and delivered with full fidelity. The interventions in efficacy trials are delivered by intervention agents (Snyder et al., 2006) who are carefully screened and highly trained. In efficacy trials, they are generally professionals who are brought in by an external research team. By contrast, the intervention agents of RFTs are often parents, community leaders, teachers or other practitioners who come from within the indigenous community or institutional settings (Flay, 1986). The level of fidelity in RFTs is thus likely to vary considerably, and examining such variation in delivery can be important in evaluating impact (Brown and Liao, 1999). Both types of trials are part of a larger strategy to build new interventions and test their ultimate effects in target populations (Greenwald and Cullen, 1985).

As a special class of experiments, RFTs have some unique features. Most importantly, they differ from efficacy trials on the degree of control placed on implementation of the intervention. They are designed to address questions other than those of pure efficacy, and they often assess both mediator and moderator effects (Krull and MacKinnon, 1999; MacKinnon and Dwyer, 1993; MacKinnon et al., 1989; Tein et al., 2004). Also, they often differ from many traditional trials by the level at which randomization occurs as well as the choice of target population. These differences are discussed below starting with comments on program implementation first.

Program implementation is quite likely to vary in RFTs due to variation in the skills and other factors that may make some teachers or parents more able to carry out the intervention than others even when they receive the same amount of training. These trials are designed to test an intervention the way it would be implemented within its community, agency, institutional, or governmental home setting. In such settings, differences in early and continued training, support for the implementers, and differences in the aptitude of the implementers can lead to variation in implementation. The intervention implementers, who are typically not under the control of the research team the way they are in efficacy trials, are likely to deliver the program with varied fidelity, more adaptation, and less regularity than that which occurs in efficacy trials (Dane and Schneider, 1998; Domitrovich and Greenberg, 2000; Harachi et al., 1999). Traditional intent-to-treat analyses which do not adjust for potential variations in implementation, fidelity, participation, or adherence, are often supplemented with “as-treated” analyses, mediation analysis, and other post-assignment analyses described elsewhere (Brown and Liao, 1999; Jo, 2002; MacKinnon, 2006).

A second common difference between RFTs and controlled efficacy trials is that the intervention often occurs at a group rather than individual level; random assignment in an efficacy trial is frequently at the level of the individual while that for an RFT generally occurs at levels other than the individual, such as classroom, school, or community. Individuals assigned to the same intervention cluster are assessed prior to and after the intervention, and their characteristics, as well as characteristics of their intervention group, may serve in multilevel analyses of mediation or moderation (Krull and MacKinnon, 1999). In addition, levels nested above the group level where intervention assignment occurs, such as the school in a classroom randomized trial, can also be used in assessing variation in intervention impact. Examples of six recent multilevel designs are presented in Table 1; these are chosen because random assignment occurs at different levels ranging from the individual level to the classroom, school, district, and county level. This table describes the different levels in each trial as well as the individual level denominators that are used in intent-to-treat analyses, a topic we present in detail in Section 2.2. We continue to refer to these trials in this paper to illustrate the general approach to analyzing variation in impact for intent-to-treat, as-treated, and other analyses involving post-assignment outcomes.

Table 1: Design factors at the individual, group, and block level and covariates hypothesized to account for variation in intervention impact for six randomized field trials

Finally, RFTs often target heterogeneous populations, whereas controlled experiments routinely use tight inclusion/exclusion criteria to test the intervention with a homogenous group. Because they are population-based, RFTs can be used to examine variation in impact across the population, for example to understand whether a drug prevention program in middle school has a different impact on those who are already using substances at baseline compared to those who have not yet used substances. This naturally offers an opportunity to examine the impact by baseline level of risk, and thereby examine whether changes in this risk affect outcomes in accord with etiologic theory.

We are often just as interested in examining variation in impact in RFTs as we are in examining the main effect. For example, a universal, whole classroom intervention aimed proximally at reducing early aggressive, disruptive behavior and distally at preventing later drug abuse/dependence disorders may impact those children who were aggressive, disruptive at baseline but have little impact on low aggressive, disruptive children. It may work especially well in classes with high numbers of aggressive, disruptive children but show less impact in either classrooms with low numbers of aggressive, disruptive children or in classrooms that are already well managed. Incorporating these contextual factors in multilevel analyses should also increase our ability to generalize results to broader settings (Cronbach, 1972; Shadish et al., 2002). Prevention of or delay in later drug abuse/dependence disorders may also depend on continued reduction in aggressive, disruptive behavior through time. Thus our analytic modeling of intervention impact in RFTs will often require us to incorporate growth trajectories, as well as multilevel factors.

RFTs, such as that of the Baltimore Prevention Program (BPP) described in this issue of Drug and Alcohol Dependence (Kellam et al., 2008), are designed to examine the three fundamental questions of a prevention program’s impact on a defined population: (1) who benefits; (2) for how long; and (3) under what conditions or contexts? Answering these three questions allows us to draw inferences and refine theories of intervention far beyond what we could do if we only addressed whether a significant overall program impact was found. The corresponding analytical approaches we use to answer these questions require greater sophistication and model checking than would ordinarily be required of analyses limited to addressing overall program impact. In this paper, we present integrative analytic strategies for addressing these three general questions from an RFT and illustrate how they test and build theory as well as lead to increased effectiveness at a population level. Appropriate uses of these methods to address specific research questions are given and illustrated on data related to the prevention of drug abuse/dependence disorders from the First Baltimore Prevention Program trial and other ongoing RFTs.

The prevention science goal in understanding who benefits, for how long, and under what conditions or contexts draws on similar perspectives from both theories of human behavior and from methodology that characterize how behaviors change through time and context. In the developmental sciences, for example, the focus is on examining how individual behavior is shaped over time or stage of life by individual differences acting in environmental contexts (Weiss, 1949). In epidemiology, which seeks to identify the causes of a disorder in a population, we first start descriptively by identifying the person, place, and time factors that link those with the disorder to those without such a disorder (Lilienfeld and Lilienfeld, 1980).

From the perspective of prevention methodology, these same person, place, and time considerations play fundamental roles in trial design (Brown and Liao, 1999; Brown et al., 2006, 2007a,b) and analysis (Brown et al., 2008; Bryk and Raudenbush, 1987; Goldstein, 2003; Hedeker and Gibbons, 1994; Muthén, 1997; Muthén and Shedden, 1999; Muthén et al., 2002; Raudenbush, 1997; Wang et al., 2005; Xu and Hedeker, 2001). Randomized trial designs have extended beyond those with individual level randomization to those that randomize at the level of the group or place (Brown and Liao, 1999; Brown et al., 2006; Donner and Klar, 2000; Murray, 1998; Raudenbush, 1997; Raudenbush and Liu, 2000; Seltzer, 2004). Randomization also can occur simultaneously in time and place as illustrated in dynamic wait-listed designs where schools are assigned to receive an intervention at randomly determined times (Brown et al., 2006). Finally, in a number of analytic approaches used by prevention methodologists that are derived from the fields of biostatistics, psychometrics, and the newly emerging ecometrics (Raudenbush and Sampson, 1999), there now exist ways to include characteristics of person and place in examining impact through time.

There has been extensive methodologic work done to develop analytic models that focus on person, place, and time. For modeling variation across persons, we often use two broad classes of modeling. Regression modeling is used to assess the impact of observed covariates that are measured on individuals and contexts that are measured without error. Mixed effects modeling, random effects, latent variables, or latent classes are used when there is important measurement error, when there are unobserved variables or groupings, or when clustering in contexts produces intraclass correlation. For modeling the role of places or context, multilevel modeling or mixed modeling is commonly used. For models involving time, growth modeling is often used, although growth can be examined in a multilevel framework as well. While all these types of models — regression, random effects, latent variable, latent class, multilevel, mixed, and growth modeling — have been developed somewhat separately from one another, the recent trend has been to integrate many of these perspectives. There is a growing overlap in the overall models that are available from these different perspectives (Brown et al., 2008; Gibbons et al., 1988), and direct correspondences between these approaches can often be made (Wang et al., 2005). Indeed, the newest versions of many well-known software packages in multilevel modeling (HLM, MLWin), mixed or random effect modeling (SAS, Splus, R, SuperMix), and latent variable and growth modeling (Mplus, Amos), provide routines that can replicate models from several of the other packages.

Out of this new analytic integration come increased opportunities for examining complex research questions that are now being raised by our trials. In this paper, we provide a framework for carrying out such analyses with data from RFTs in the pursuit of answers to the three questions of who benefits, for how long, and under what conditions or contexts. In Section 2, we describe analytic and modeling issues to examine impact of individual and contextual effects on a single outcome measure. In this section, we deal with defining intent-to-treat analyses for multilevel trials, handling missing data, theoretical models of variation in impact, modeling and interpreting specific estimates as causal effects of the intervention, and methods for adjusting for different rates of assignment to the intervention. The first model we describe is a generalized linear mixed model (GLMM), which models a binary outcome using logistic regression and includes random effects as well. We conclude with a discussion of generalized additive mixed models, which represent the most integrative model in this class. Some of this section includes technical discussion of statistical issues; non-technical readers can skip these sections without losing the meaning by attending to the concluding sentences that describe the findings in less technical terms, as well as the examples and figures.
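As a rough illustration of the kind of GLMM described above, a random-intercept logistic model for a binary outcome could be written as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: $i$ indexes individuals, $j$ indexes the randomized group (for example, a classroom), $Z_j$ is the intervention indicator, $X_{ij}$ is a baseline covariate, and $u_j$ is a group-level random effect.

```latex
\operatorname{logit}\,\Pr(Y_{ij} = 1 \mid u_j)
  = \beta_0 + \beta_1 Z_j + \beta_2 X_{ij} + \beta_3 Z_j X_{ij} + u_j,
\qquad u_j \sim N(0, \sigma_u^2)
```

Under this sketch, $\beta_1$ would carry the intervention effect at $X_{ij} = 0$, $\beta_3$ would carry variation in impact across the baseline covariate, and $\sigma_u^2$ would absorb the clustering that produces intraclass correlation.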

In Section 3, we discuss methods to examine intervention impact on growth trajectories. We discuss representing intervention impact in our models with specific coefficients that can be tested. Because of their importance to examining the effects of prevention programs, growth mixture models are highlighted, and we provide a causal interpretation of these parameters as well as discuss a number of methods to examine model fit. Again, non-technical readers can skip the equations and attend to introductory statements that precede the technical discussions.

Section 4 returns to the use of these analyses for testing impact and building theory. We also describe newer modeling techniques, called General Growth Mixture Models (GGMM), that are beginning to integrate the models described in Sections 2 and 3.

2. Using an RFT to determine who benefits from or is harmed by an intervention on a single outcome measure

This question is centrally concerned with assessing intervention impact across a range of individual, group, and context level characteristics. We note first that population-based randomized preventive field trials have the flexibility of addressing this question much more broadly than do traditional clinic-based randomized trials where selection into the clinic makes it hard to study variation in impact. With classic pharmaceutical randomized clinical trials (P-RCTs), the most common type of controlled experiment in humans, there is a well accepted methodology for evaluating impact that began with the early pharmacotherapy trials conducted by A. B. Hill starting in the 1940s (Hill, 1962) and is now routinely used by pharmaceutical licensing agencies such as the U.S. Food and Drug Administration and similar agencies in Europe and elsewhere. The most important impact analysis for P-RCTs has been the so-called “intent-to-treat” (ITT) analysis, a set of rigid rules that determine (1) who is included in the analyses (the denominator), (2) how to classify subjects into intervention conditions, and (3) how to handle attrition. ITT is also intended to lead to a conservative estimate of intervention impact in the presence of partial adherence to a medication and partial dropout from the study during the follow-up period (Lachin, 2000; Lavori, 1992; Pocock, 1983; Tsiatis, 1990). These two sources of bias, called treatment dropout and study dropout (Kleinman et al., 1998), have direct analogues in RFTs as well (Brown and Liao, 1999). Detailed examination of how these two factors impact statistical inferences in RCTs has been done by others (Kleinman et al., 1998). In this paper, we use a minimum of technical language to examine first the accepted characteristics of ITT analyses for P-RCTs and then specify a new standard for multilevel RFTs directed at our interests in understanding variation of impact among individuals, places, and time.
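The three ITT rules named above (denominator, classification by condition, attrition) can be illustrated with a minimal Python sketch. The record fields `assigned`, `received`, and `outcome` and the toy data are hypothetical and not from the paper. The sketch only contrasts grouping by assigned condition (ITT) with grouping by received condition (as-treated), and it reports how many outcomes are observed rather than choosing a missing data method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subject:
    assigned: str           # condition assigned at randomization
    received: str           # condition actually received (hypothetical field)
    outcome: Optional[int]  # 1 = event, 0 = no event, None = lost to follow-up


def summarize(subjects, key):
    """Return {condition: (denominator, n_observed, observed_event_rate)}."""
    groups = {}
    for s in subjects:
        groups.setdefault(getattr(s, key), []).append(s)
    summary = {}
    for condition, members in groups.items():
        observed = [s.outcome for s in members if s.outcome is not None]
        rate = sum(observed) / len(observed) if observed else None
        summary[condition] = (len(members), len(observed), rate)
    return summary


subjects = [
    Subject("intervention", "intervention", 0),
    Subject("intervention", "intervention", 0),
    Subject("intervention", "control", 1),         # treatment dropout
    Subject("intervention", "intervention", None), # study dropout
    Subject("control", "control", 1),
    Subject("control", "control", 0),
    Subject("control", "control", 1),
    Subject("control", "control", None),           # study dropout
]

# ITT: everyone stays in the arm they were randomized to.
itt = summarize(subjects, "assigned")
# As-treated: subjects are regrouped by what they received.
as_treated = summarize(subjects, "received")

print("ITT:", itt)
print("As-treated:", as_treated)
```

In this toy data the ITT denominators stay at four per arm, while the as-treated grouping moves the treatment dropout into the control group and changes both denominators and observed rates.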

3. Analytical strategies for examining variation in intervention impact over time

In this section, we summarize how growth modeling can characterize the patterns of change in repeated measures over time due to an intervention compared to control. We consider many of these models as intent-to-treat analyses, and for some trials a growth model analysis may provide the primary analysis of impact, just as in P-RCTs the primary analysis can be based on the rates of change in a repeated measure for intervention versus control (Muthén, 1997, 2003, 2004, in press; Muthén and Muthén, 1998-2007; Muthén et al., 2002). These growth models are quite flexible, incorporating linear or nonlinear growth patterns, interactions with baseline variables, intervention changes that affect the variance or covariance as well as the mean pattern of growth, and varying intervention impact across different patterns of growth, rather than an effect that is homogeneous across the entire population. These methods also have flexible ways of dealing with non-normal distributions, including the use of Two-Part (Olsen and Schafer, 2001) and related censoring models (Nagin, 2005) for drug use and other data where zero use is its own special category, as well as for binary, ordinal, and time-to-event data (Muthén and Muthén, 1998-2007). Elsewhere, we have described these different types of growth models and shown their use on the First BPP trial impact analyses of the GBG (Brown et al., 2008); therefore in this paper we illustrate the range of the use of these models in RFTs.
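One simple way to picture intervention impact on rates of change is a linear growth model in which the intervention shifts the mean slope. The notation below is an illustrative assumption and not the paper's own: $Y_{it}$ is the outcome for individual $i$ at time $t$ (with $t = 0$ at baseline), $Z_i$ is the assigned condition, and $\zeta_{0i}$, $\zeta_{1i}$ are individual random effects.

```latex
\begin{aligned}
Y_{it}     &= \eta_{0i} + \eta_{1i}\, t + \varepsilon_{it}, \\
\eta_{0i}  &= \alpha_0 + \zeta_{0i}, \\
\eta_{1i}  &= \alpha_1 + \gamma Z_i + \zeta_{1i}
\end{aligned}
```

In this sketch the intercept does not depend on $Z_i$ because randomization is expected to balance the arms at baseline, so $\gamma$ is the single testable coefficient for intervention impact on growth. A growth mixture version would allow $\alpha_0$, $\alpha_1$, and $\gamma$ to differ across latent trajectory classes.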

4. Discussion

RFTs are designed to answer research questions that examine interventions delivered in real world settings. The main question we address in ITT analyses involves assessing an intervention’s effectiveness, in order to characterize conditions under which outcomes improve or worsen relative to a community standard. The methods described in this paper address standards for conducting ITT analyses, analytic tools that incorporate clustering and nonlinearity in the modeling, methods to handle incomplete data, and modeling strategies that protect our inferences of variation in impact against incomplete specification of the model.

Regarding standards for conducting ITT analyses in multilevel RFTs, we concluded that design details would dictate just which individuals should be included in the analyses. By limiting the analysis to all those individuals who were there at the beginning, we avoid selection biases by having comparable groups to compare between intervention and control at baseline. On the other hand, the handling of late entrants pits two goals of ITT analyses against one another, the goals of avoiding biases in intervention groups and avoiding complications dealing with partial exposure to an intervention. The case for their exclusion in ITT analyses is that late entrance is an event that occurs after the intervention period begins, thus potentially affecting the inferences in unknown ways. The case for inclusion is that late entrance is a natural, uncontrolled occurrence that needs to be accounted for in evaluating overall impact. If the circumstances of the trial allow one to argue convincingly that (1) late entrants are completely comparable across intervention groups and (2) these late entering subjects are not choosing to enter because an intervention is being used, then it would be permissible to include these late entrants in ITT analyses. We also recommend that a rule be established to define late entrants and that they generally not be included in ITT analyses except under certain circumstances such as continued random assignment.

Even if late entrants are excluded from formal analyses, their presence in the classroom may have some effects on the outcomes of the other participants. For example, Kellam et al. (1998) reported that higher levels of first grade classroom aggressive, disruptive behavior had a strong interaction with individual level of aggressive, disruptive behavior on middle school aggressive, disruptive behavior. If aggressive, disruptive, late entrant children are disproportionally assigned to one intervention, this could introduce bias in evaluating impact. In the First BPP trial, we saw somewhat higher rates of late entrant children being assigned to GBG classrooms, so such contextual variation in classroom aggressive, disruptive behavior by condition should attenuate the estimated impact of the GBG; nevertheless, we report a number of significant findings.

In some RFTs, there is no formal enumeration of a denominator for each community under study. RFTs that test surveillance or case identification strategies, such as testing whether a gatekeeper training program can increase the identification of suicidal youth in schools (Brown et al., 2006), directly count the numerators but often must rely on some census or indirect method for determining denominators in order to calculate the rate of identification for suicidality. In that trial, which randomized schools to when their staff would receive gatekeeper training, we do not have available detailed tracking information of youth in the schools; therefore there is no practical way of removing late entrants from both the numerator that counts suicidal youth and the denominator of that risk set. In this situation, the late entrants cannot be dropped from the analysis.

This paper recommends two types of high quality missing data procedures to be used in RFTs: full information maximum likelihood (FIML) and multiple imputation procedures. Our experience with longitudinal follow-ups of RFTs is that these procedures often do provide similar inferences to one another but often produce different inferences compared to those based on lower quality missing data procedures. It is usually worth the effort to use FIML or multiple imputation procedures in the analyses. We note, however, that there is one common situation where the standard analysis that ignores any missing data is equivalent to a full information maximum likelihood analysis, namely when there are no missing covariates and only the outcome is missing. Thus special procedures are not necessary in this case.
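For readers unfamiliar with how multiple imputation results are combined, the sketch below applies the standard combining rules usually attributed to Rubin. These rules are not spelled out in the text above and are introduced here as background. The function name `pool_rubin` and the numeric inputs are hypothetical; each input pair is assumed to be an intervention effect estimate and its squared standard error from one imputed data set.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m >= 2 imputed-data estimates; return (pooled estimate, pooled SE)."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)       # average sampling variance
    between = variance(estimates)  # sample variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical results from five imputed data sets.
estimates = [0.50, 0.60, 0.40, 0.55, 0.45]
variances = [0.04, 0.04, 0.04, 0.04, 0.04]

pooled_estimate, pooled_se = pool_rubin(estimates, variances)
print(pooled_estimate, pooled_se)
```

The pooled standard error is larger than the within-imputation standard error alone because the between-imputation term reflects the extra uncertainty due to missing data.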

One important procedure that we introduce in this paper is the assignment adjusted analysis. This procedure protects against underinclusion of covariates in an analysis of RFTs. In classroom-based trials as well as other multilevel designs, the proportion of units assigned to active intervention within a block (i.e., school) is often not constant; in the case of classroom-based trials the varying numbers of classrooms per school force this proportion to vary. Randomizing at this higher level does not automatically protect against underinclusion of covariates in the way it would if randomizing at the individual level. The assignment adjusted procedure we present above is useful whenever randomization to intervention is imbalanced across these higher levels of blocking. We suggest that it be used to compare against standard analyses; if no differences are found, the original analyses can be reported with a note that assignment adjustment did not result in any different conclusions. If there are differences in the conclusions about intervention impact in these two analyses, we recommend a closer examination of the potential effects of additional measured covariates that had not been included. If these analyses fail to resolve the differences, we believe that there should be greater reliance placed on the assignment adjusted analyses.
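The imbalance that motivates an assignment adjusted analysis is easy to check before modeling. The sketch below does not implement the assignment adjusted procedure itself, which is not reproduced in this excerpt. It only computes, for hypothetical schools with made-up classroom counts, the proportion of classrooms assigned to the active intervention within each block and flags whether that proportion is constant.

```python
from fractions import Fraction

# Hypothetical blocks: school -> (number of classrooms, number assigned to intervention)
schools = {
    "A": (2, 1),
    "B": (3, 1),
    "C": (3, 2),
    "D": (4, 2),
}

# Exact within-block assignment proportions.
proportions = {
    school: Fraction(n_intervention, n_classrooms)
    for school, (n_classrooms, n_intervention) in schools.items()
}

# Assignment is balanced across blocks only if every proportion is identical.
balanced = len(set(proportions.values())) == 1

for school, p in proportions.items():
    print(school, p)
print("balanced across blocks:", balanced)
```

With an odd number of classrooms a school cannot reach a proportion of one half, so schools B and C here necessarily differ from A and D.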

We have presented two broad classes of analytic models that are well equipped to examine variation in impact. Additive models allow for a very flexible way to examine how baseline risk may moderate intervention effect, so statements about impact at the low and high ends of risk, as well as in the middle, are generally more valid than those based on linear models (Brown, 1993a). Likewise, growth mixture models can separately examine impact across different trajectory classes. This procedure is also flexible in fitting multiple growth patterns to data. One of its strengths is that this flexibility allows us to examine whether the intervention impact is present across all classes, whether the intervention impact on trajectories is the same or different across classes, and whether the impact changes across time. Simultaneous examination of impact at each time point is also appropriate to do if one uses Bonferroni or other methods to correct for the number of comparisons (Petras et al., 2008). It is also possible to attribute causal inferences about the intervention impact to both of these models. The flexibility of these models is also a source of weakness; if either of these models is fit poorly to the data, then the resultant model coefficients can be interpreted erroneously. The methods we outlined to assess quality of fit are essential to apply before selecting a model or examining coefficients that address impact.
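The Bonferroni correction mentioned above for testing impact at several time points can be sketched in a few lines. The function name `bonferroni` and the p-values are hypothetical and serve only to show the mechanics: each p-value is compared against the overall alpha divided by the number of comparisons.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (threshold, adjusted p-values, rejection flags) for m comparisons."""
    m = len(p_values)
    threshold = alpha / m
    adjusted = [min(1.0, p * m) for p in p_values]
    rejected = [p <= threshold for p in p_values]
    return threshold, adjusted, rejected


# Hypothetical p-values for intervention impact at four follow-up time points.
p_values = [0.004, 0.020, 0.012, 0.300]
threshold, adjusted, rejected = bonferroni(p_values)

print("per-test threshold:", threshold)
print("rejected:", rejected)
```

Here the second time point would be significant at an unadjusted 0.05 level but not after correction, which is the kind of protection against inflated Type I error that simultaneous testing requires.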

In presenting these new methods, this paper also provides new evidence of the GBG impact on males. Specifically, the analyses of the GBG’s impact on DISC conduct disorder demonstrate substantial benefit on a diagnosable disorder by grade six. These early impact results on conduct disorder continue through adolescence and young adulthood on aggressive, disruptive behavior, antisocial personality disorder, and violent and criminal behavior (Petras et al., 2008), as well as on drug and alcohol abuse/dependence disorders (Kellam et al., 2008).

Questions of variation in impact are central for theory building and practical implementation of an effective intervention in community or population settings. Populations have wide variations in risk and protective factors, so we would expect that an intervention that targets a particular risk factor, such as aggressive, disruptive behavior, would have differential impact across this level of risk in the population (Brown et al., 2007c). For interventions that target multiple risk and protective factors, differential impact is also likely. Thus in population-based trials, we recommend that one planned analysis be an ITT examination of whether impact varies based on hypothesized risk factors. Even if no interactive impact with baseline individual level risk is found, individual level risk may affect outcomes as a main effect. Even when the outcome is far removed in time from the intervention period, there can be dramatic continuities of these antecedent risks over time, as we have found in our analyses of the role of aggressive, disruptive behavior in the long-term effects of the GBG (Kellam et al., 2008; Petras et al., 2008; Poduska et al., 2008; Wilcox et al., 2008).
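A planned examination of whether impact varies by baseline risk can be pictured, in its simplest form, as an interaction contrast. The sketch below assumes a binary risk grouping and made-up outcome scores. In a saturated two-by-two layout, the intervention-by-risk interaction equals the difference between the intervention effect in the high-risk group and the effect in the low-risk group. It ignores clustering and gives no standard errors, so it is only a descriptive summary and not a substitute for the multilevel models discussed in this paper.

```python
from statistics import mean


def cell_means(rows):
    """rows: iterable of (treated, high_risk, outcome); returns mean outcome per cell."""
    cells = {}
    for treated, high_risk, outcome in rows:
        cells.setdefault((treated, high_risk), []).append(outcome)
    return {cell: mean(values) for cell, values in cells.items()}


def moderation_contrast(rows):
    """Return (effect in low risk, effect in high risk, interaction contrast)."""
    m = cell_means(rows)
    effect_low = m[(1, 0)] - m[(0, 0)]
    effect_high = m[(1, 1)] - m[(0, 1)]
    return effect_low, effect_high, effect_high - effect_low


# Hypothetical data: (treated, high_risk, later problem-behavior score)
rows = [
    (0, 0, 2.0), (0, 0, 3.0),
    (1, 0, 2.0), (1, 0, 2.0),
    (0, 1, 6.0), (0, 1, 8.0),
    (1, 1, 4.0), (1, 1, 5.0),
]

effect_low, effect_high, interaction = moderation_contrast(rows)
print(effect_low, effect_high, interaction)
```

In this toy data the intervention lowers scores by 0.5 among low-risk subjects and by 2.5 among high-risk subjects, so the interaction contrast of -2.0 is the quantity a formal moderation test would target.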

When an intervention targets multiple risk and protective factors or when it targets risk processes, such as coercive interactions in the family, it may be more challenging to identify a short list of baseline measures that are best suited to examine first. Others have found, however, that risk factors tend to co-occur and that their presence is often associated with the absence of many protective factors, so it may well be possible to form one one-dimensional scale for risk and a second for protective factors. In the case of risk processes, such as coercive interaction styles between a parent and a child, there are often simple baseline measures, such as the child’s assessment of family communication, that correlate well with the more complex patterns that are themselves the targets of the intervention.

An intervention’s effect may vary across different contextual factors as well. Interventions that aim to change norms about drug use or willingness to prevent suicide, for example, need to measure these factors at baseline across the appropriate “level of intervention” (Brown and Liao, 1999), or social environments such as the classroom or the school where such interventions are expected to operate (Flay and Collins, 2005). The failure to describe the role of these baseline contextual factors can lead to large-scale implementations in communities where these interventions may not be effective.

For universal, selective, and indicated interventions, there are some differences in how we would frame or use information on variation in impact. For universal interventions, it is quite possible for an intervention aimed at a broad population to be beneficial for some and cause harm to others. This can occur, for example, in drug prevention studies when some subjects are already using substances and others are nonusers. The two goals, primary prevention that delays initiation among nonusers and secondary prevention that reduces drug use among those already using, may not both be accomplished well by a single intervention, and it may be that one group benefits while the other is harmed. Such questions of positive and negative impact have also been raised for outcomes that have received less attention than drug abuse. In suicide prevention, concerns have been raised that even discussing suicide may direct non-affected youth towards these outcomes (some evidence now refutes this; see Gould et al., 2005), but poorly conceived programs that memorialize peers who have recently committed suicide may have a contagion effect. In delinquency and drug prevention, there is clear evidence of learning that is transmitted from more deviant to less deviant youth (Dishion et al., 1996, 1999, 2001). Only by studying the impact among these different subgroups in carefully designed randomized trials will we be able to determine whether a program is having a harmful effect on a vulnerable population.

Continuing with variation in impact in universal preventive interventions, the work that our group and others (Hawkins et al., 2005; Reid et al., 1999) have done in early prevention of aggression, conduct disorder, delinquency, and other externalizing behaviors strongly suggests that prevention programs aimed at integrating and socializing children who exhibit externalizing behaviors into successful roles in the classroom, school, and family can have major impacts on this high-risk group and have beneficial, or at least no harmful, effects on those at much lower risk. Compared to programs that isolate and concentrate poorly behaving youth (Dishion et al., 1996, 1999, 2001), such approaches provide benefit by shaping behaviors within the most relevant social fields in children’s lives, thereby avoiding the problems of labeling children as different and of requiring a further intervention to adjust for reentry. These early, universal preventive interventions are likely to be cost-effective strategies for preventing life-persistent conduct disorder and antisocial behavior.

For selective preventive interventions, such as those directed at children going through a major transition in family composition due to foster parenting (Chamberlain, 2003), divorce (Forgatch and DeGarmo, in press; Wolchik et al., 2002, in press), or bereavement (Sandler et al., 2003), an examination of variation in impact can help differentiate those who may benefit from an existing intervention from those who would be better served by another intervention or none at all. As an example of a selective intervention, Multidimensional Treatment Foster Care (MTFC) provides ongoing support for foster parents in handling the needs of individual children. A parent daily report (PDR) is used as a daily tool to assess how the child is behaving, and repeated high scores on this scale are highly predictive of disruptions from foster care placements and other poor outcomes for the child (Chamberlain et al., 2006). We would predict that the MTFC intervention would have more impact for those youth who score high on the PDR soon after placement. Thus an outcome-effective as well as cost-effective way of implementing this program in a community may be to direct more resources to those foster families taking care of children with high PDR scores.

Indicated interventions, which are directed at those who are already exhibiting signs or symptoms related to a disorder, and treatments themselves can use an understanding of variation in impact to better distinguish those who are adequately served by an intervention from those who are not likely to utilize or benefit from it. In the MTA, the Multimodal Treatment Study of Children with attention deficit hyperactivity disorder (ADHD), for example, children were randomized to medication management, a behavioral intervention, their combination, or a community treatment condition with less management. The course of attention problems and social functioning varies dramatically in these children and can be represented by a growth mixture model with one group improving quickly and having good outcomes while a second group is more likely to have less favorable outcomes (MTA Cooperative Group, 1999; Swanson et al., in press). The benefit of well-managed medication over behavioral therapy alone or community controls on social skills and peer relations appears clearly for both classes of children. However, many families deviate from their originally assigned intervention condition by initiating or discontinuing medication use over time. By modeling both impact and continued use as outcomes, we can predict who is more likely to benefit from long-term medication use and who is not, thus helping inform families whether their child is likely to benefit from continued use.

Unified intervention strategies provide a population or public health approach to prevention that integrates universal, selective, and indicated interventions (Brown and Liao, 1999). These approaches begin with a broad-based intervention and then apply more intensive interventions to non-responders. Thus in a first stage of this unified intervention, a universal first-grade intervention that focuses on managing classroom behavior, improving reading, and linking families and classrooms may be applied to everyone. For those youth who continue having problems with achievement and behavior in the classroom, a more intensive intervention that involves work outside the classroom or with the parents can serve to enhance supports. Finally, for those who still need more assistance, a treatment-oriented program can be provided. At each stage in this model, one can test interventions through an additional randomization of at-risk youth. Cutoff values based on baseline risk can be assessed using additive models, described here (Petras et al., 2004, 2005), or tree-based models (Breiman et al., 1984) that specifically identify cutoffs empirically.
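To make the idea of an empirically identified cutoff concrete, the Python sketch below searches for the single baseline-risk threshold that best separates later outcomes, in the spirit of one split of a regression tree. It is a simplified illustration, not the full tree-growing and pruning procedure of Breiman et al. (1984); the function name and data are hypothetical.

```python
def best_cutoff(risk, outcome):
    """Return the risk cutoff minimizing total within-group sum of squares.

    Candidate cutoffs are midpoints between adjacent distinct risk scores.
    Returns None when no split is possible (fewer than two distinct scores).
    """
    def sse(values):
        # Sum of squared deviations from the group mean.
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    pairs = sorted(zip(risk, outcome))
    best = None  # (total within-group SSE, cutoff)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between tied risk scores
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        cutoff = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or total < best[0]:
            best = (total, cutoff)
    return None if best is None else best[1]


# Hypothetical baseline risk scores and later problem-behavior outcomes.
baseline_risk = [1, 2, 3, 4, 5, 6]
later_outcome = [0.1, 0.2, 0.1, 0.9, 1.0, 0.8]
cutoff = best_cutoff(baseline_risk, later_outcome)
```

A cutoff chosen this way is tuned to the sample at hand, so it would normally be checked on independent data before being used to assign youth to a more intensive stage.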

One implication of this perspective on examining variation in impact based on theoretically hypothesized moderators is that such findings are more difficult to describe in terms of the effect sizes, overall odds ratios, and similar summaries that are now in use (Brown et al., 2007c). These one-dimensional summaries are often used in meta-analyses to combine inferences across similar studies to examine intervention impact, and variation in impact is one reason why some programs or prevention approaches may show low effect sizes. For example, media campaigns focusing on preventing marijuana use appear to have quite limited success in the general population, yet the campaigns are directed to target audiences, such as sensation seekers, who are the only ones likely to be affected (Palmgreen et al., 2007). In these cases, there is no single summary, such as an overall effect size, that will satisfactorily capture this interactive effect. It is very valuable, however, to present analyses in scientific papers that are based on subgroups thought to be most at risk, or thought likely to benefit the most from an intervention directed at them (Pillow et al., 1991; Brown, 1991), as well as the nonlinear and linear interaction effects described in this paper. Only by doing so will one be able to examine in meta-analyses how interventions may differ across risk levels (Brown et al., 2007c). Furthermore, the power that one has to look at interaction effects in a single RFT is likely to be modest (Brown et al., 2007d). However, by reporting impact across risk subgroups in each single trial, a meta-analysis can use the accumulation of these results to examine more fully the impact as a function of risk level.
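One simple way such subgroup reports could be accumulated is inverse-variance (fixed-effect) pooling carried out separately within each risk stratum. The Python sketch below assumes this simplification; it is not the multilevel meta-analytic approach of Brown et al. (2007c), and the effect estimates and standard errors are invented for illustration.

```python
from math import sqrt


def pool_fixed_effect(estimates):
    """Pool (effect, standard_error) pairs with inverse-variance weights.

    Returns (pooled effect, pooled standard error) under a fixed-effect
    model, i.e. assuming every trial estimates the same stratum effect.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * e for w, (e, _) in zip(weights, estimates)) / total_weight
    return pooled, sqrt(1.0 / total_weight)


# Hypothetical stratum-specific effects reported by two trials.
trial_results = {
    "low": [(-0.05, 0.10), (0.00, 0.10)],
    "high": [(-0.40, 0.20), (-0.60, 0.20)],
}
pooled_by_stratum = {
    stratum: pool_fixed_effect(results)
    for stratum, results in trial_results.items()
}
```

Pooling within strata keeps a large effect among high-risk participants from being averaged away by a near-zero effect in the rest of the population.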

Finally, we caution that indiscriminate use or overuse of the methods for examining variation in impact that are described in this paper will result in spurious findings. We provided a strategy that maintains an overall Type I error rate for each analysis (Kellam et al., 2008). If one does not place limits on the number of tests or correct for multiple comparisons, it will always be possible to find a significant impact on a subset of subjects if one looks long enough. The methods described here need to be applied formally to test hypothesized variations in impact, i.e., by variation in individual level or contextual level of baseline risk. They should not be used repeatedly in purely exploratory fashion without being guided by theory. Similarly, the strength of the methods in this paper relies on maintaining the quality of the research design throughout the study. No amount of analytic sophistication can correct for severe deviations from the design protocol. If groups are randomized to intervention conditions but then significant numbers of individuals do not receive the intended intervention, or if there is assessment or attrition bias, these ITT analyses could have little relevance to the causal effect of the intervention. It is necessary for researchers to conduct RFTs so that the design integrity is maintained throughout the study.
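The warning about unlimited testing can be quantified under a simplifying assumption. If k tests are each run at level alpha and are treated as independent, which subgroup tests on the same trial data generally are not, the chance of at least one false positive is 1 - (1 - alpha)^k. The Python sketch below computes this quantity; the function name is hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n_tests tests.

    Assumes every null hypothesis is true and the tests are independent,
    so this is an idealized illustration rather than an exact rate.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


# Error rate as the number of unplanned subgroup tests grows.
rates = {k: familywise_error_rate(0.05, k) for k in (1, 5, 20, 50)}
```

Under these assumptions the rate exceeds one half well before 20 tests, which is why hypothesized moderators should be specified in advance and the number of tests limited or corrected for.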


References

Aber JL, Gephart MA, Brooks-Gunn J, Connell JP. Development in context: implications for studying neighborhood effects. In: Brooks-Gunn J, Duncan GJ, Aber JL, editors. Neighborhood Poverty. Russell Sage Foundation; New York: 1997. pp. 44–61.
Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables. J. Am. Stat. Assoc. 1996;91:444–455.
Asparouhov T, Muthén BO. Multilevel mixture models. In: Hancock GR, Samuelsen KM, editors. Advances in Latent Variable Mixture Models. Information Age Publishing, Inc.; Charlotte, NC: in press.
Baker SG, Fitzmaurice GM, Freedman LS, Kramer BS. Simple adjustments for randomized trials with nonrandomly missing or censored outcomes arising from informative covariates. Biostatistics. 2006;7:29–40. [PubMed]
Bandeen-Roche K, Miglioretti DL, Zeger SL, Rathouz PJ. Latent variable regression for multiple discrete outcomes. J. Am. Stat. Assoc. 1997;92:1375–1386.
Bauer DJ, Curran PJ. Distributional assumptions of growth mixture models: implications for overextraction of latent trajectory classes. Psychol. Methods. 2003;8:338–363. [PubMed]
Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees. Wadsworth International Group; Belmont, CA: 1984.
Breslow N, Clayton DG. Approximate inference in generalized linear mixed models. J. Am. Stat. Assoc. 1993;88:9–25.
Brooks-Gunn J, Duncan GJ, Klebanov PK, Sealand N. Do neighborhoods influence child and adolescent development? Am. J. Sociol. 1993;99:353–395.
Brown CH. Comparison of mediational selected strategies and sequential designs for preventive trials: comments on a proposal by Pillow et al. Am. J. Community Psychol. 1991;19:837–846. [PubMed]
Brown CH. Analyzing preventive trials with generalized additive models. Am. J. Community Psychol. 1993a;21:635–664. [PubMed]
Brown CH. Statistical methods for preventive trials in mental health. Stat. Med. 1993b;12:289–300. [PubMed]
Brown CH, Costigan T, Kendziora K. Data analytic frameworks: analysis of variance, latent growth, and hierarchical models. In: Nezu A, Nezu C, editors. Evidence-Based Outcome Research: A Practical Guide to Conducting Randomized Clinical Trials for Psychosocial Interventions. Oxford University Press; London: 2008. pp. 285–313.
Brown CH, Indurkhya A, Kellam SG. Power calculations for data missing by design: applications to a follow-up study of lead exposure and attention. J. Am. Stat. Assoc. 2000;95:383–395.
Brown CH, Kellam SG, Ialongo N, Poduska J, Ford C. Prevention of aggressive behavior through middle school using a first grade classroom-based intervention. In: Tsuang MT, Lyons MJ, Stone WS, editors. Towards Prevention and Early Intervention of Major Mental and Substance Abuse Disorders. American Psychiatric Publishing; Arlington, VA: 2007a. pp. 347–370.
Brown CH, Liao J. Principles for designing randomized preventive trials in mental health: an emerging developmental epidemiology paradigm. Am. J. Community Psychol. 1999;27:673–710. [PubMed]
Brown CH, Wang W, Guo J. Modeling variation in impact in randomized field trials. Technical report, Department of Epidemiology and Biostatistics, University of South Florida; Tampa, FL: 2007b.
Brown CH, Wang W, Sandler I. Examining how context changes intervention impact: the use of effect sizes in multilevel meta-analysis. Technical report, Department of Epidemiology and Biostatistics, University of South Florida; Tampa, FL: 2007c.
Brown CH, Wyman PA, Brinales JM, Gibbons RD. The role of randomized trials in testing interventions for the prevention of youth suicide. Int. Rev. Psychiatry. 2007d;18:617–631. [PubMed]
Brown CH, Wyman PA, Guo J, Peña J. Dynamic wait-listed designs for randomized trials: new designs for prevention of youth suicide. Clin. Trials. 2006;3:259–271. [PubMed]
Bryk AS, Raudenbush SW. Application of hierarchical linear models to assessing change. Psychol. Bull. 1987;101:147–158.
Carlin JB, Wolfe R, Brown CH, Gelman A. A case study on the choice, interpretation and checking of multilevel models for longitudinal, binary outcomes. Biostatistics. 2001;2:397–416. [PubMed]
Chamberlain P. Treating Chronic Juvenile Offenders: Advances Made through the Oregon Multidimensional Treatment Foster Care Model. American Psychological Association; Washington, D.C.: 2003.
Chamberlain P, Price JM, Reid JB, Landsverk J, Fisher PA, Stoolmiller M. Who disrupts from placement in foster and kinship care? Child Abuse Negl. 2006;30:409–424. [PubMed]
Collins LM, Schafer JL, Kam CM. A comparison of inclusive and restrictive strategies in modern missing-data procedures. Psychol. Methods. 2001;6:330–351. [PubMed]
Cronbach LJ. The Dependability of Behavioral Measurements: Theory of Generalizability for Scores and Profiles. Wiley; New York: 1972.
Dane AV, Schneider BH. Program integrity in primary and early secondary prevention: are implementation effects out of control? Clin. Psychol. Rev. 1998;18:23–45. [PubMed]
Dishion TJ, Poulin F, Burraston B. Peer group dynamics associated with iatrogenic effects in group interventions with high-risk young adolescents. New Dir. Child Adolesc. Dev. 2001;91:79–92. [PubMed]
Dishion TJ, McCord J, Poulin F. When interventions harm: peer groups and problem behavior. Am. Psychol. 1999;54:755–764. [PubMed]
Dishion TJ, Spracklen KM, Andrews DW, Patterson GR. Deviancy training in male adolescent friendships. Behav. Ther. 1996;27:373–390.
Domitrovich CE, Greenberg MT. The study of implementation: current findings from effective programs that prevent mental disorders in school-aged children. J. Ed. Psychol. Consult. 2000;11:193–221.
Donner A, Klar N. Design and Analysis of Cluster Randomization Trials in Health Research. Arnold; London: 2000.
Fisher P, Wicks J, Shaffer D, Piacentini J, Lapkin J. Diagnostic Interview Schedule for Children Users’ Manual. Division of Child and Adolescent Psychiatry, New York State Psychiatric Institute; New York: 1992.
Flay BR. Efficacy and effectiveness trials (and other phases of research) in the development of health promotion programs. Prev. Med. 1986;15:451–474. [PubMed]
Flay BR, Biglan A, Boruch RF, Castro FG, Gottfredson D, Kellam S, Moscicki EK, Schinke S, Valentine JC, Ji P. Standards of evidence: criteria for efficacy, effectiveness and dissemination. Prev. Sci. 2005;6:151–175. [PubMed]
Flay BR, Collins LM. Historical review of school-based randomized trials for evaluating problem behavior prevention programs. Annals Amer. Acad. Polit. Soc. Sci. 2005;599:115–146.
Forgatch MS, DeGarmo DS. Accelerating recovery from poverty: prevention effects for recently separated mothers. J. Early Intensive Behav. Interv. in press.
Frangakis CE, Rubin DB. Addressing complications of intention-to-treat analysis in the combined presence of all-or-none treatment-noncompliance and subsequent missing outcomes. Biometrika. 1999;86:365–379.
Frangakis CE, Rubin DB. Principal stratification in causal inference. Biometrics. 2002;58:21–29. [PubMed]
Friedman LM, Furberg CD, DeMets DL. Fundamentals of Clinical Trials. third ed. Springer Science+Business Media, LLC; New York: 1998.
Gibbons RD, Hedeker D. Random effects probit and logistic regression models for three-level data. Biometrics. 1997;53:1527–1537. [PubMed]
Gibbons RD, Hedeker D, Waternaux C, Davis JM. Random regression models: a comprehensive approach to the analysis of longitudinal psychiatric data. Psychopharmacol. Bull. 1988;24:438–443. [PubMed]
Goldstein H. Multilevel Statistical Models. third ed. Edward Arnold; London: 2003.
Gould MS, Marrocco FA, Kleinman M, Thomas JG, Mostkoff K, Cote J, Davies M. Evaluating iatrogenic risk of youth suicide screening programs: a randomized controlled trial. JAMA. 2005;293:1635–1643. [PubMed]
Graham JW, Olchowski AE, Gilreath TD. How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prev. Sci. 2007;8:206–213. [PubMed]
Graham JW, Taylor BJ, Olchowski AE, Cumsille PE. Planned missing data designs in psychological research. Psychol. Methods. 2006;11:323–343. [PubMed]
Greenbaum PE, Del Boca FK, Darkes J, Wang CP, Goldman MS. Variation in the drinking trajectories of freshmen college students. J. Consult. Clin. Psychol. 2005;73:229–238. [PubMed]
Greenwald P, Cullen JW. The new emphasis in cancer control. J. Natl. Cancer Inst. 1985;74:543–551. [PubMed]
Harachi TW, Abbott RD, Catalano RF. Opening the black box: using process evaluation measures to assess implementation and theory building. Am. J. Community Psychol. 1999;27:711–731. [PubMed]
Hastie T, Tibshirani R. Generalized Additive Models. Chapman and Hall; London: 1990.
Hawkins JD, Kosterman R, Catalano RF, Hill KG, Abbott RD. Promoting positive adult functioning through social development intervention in childhood: long-term effects from the Seattle Social Development Project. Arch. Pediatr. Adolesc. Med. 2005;159:25–31. [PubMed]
Hedeker D, Gibbons RD. A random-effects ordinal regression model for multilevel analysis. Biometrics. 1994;50:933–944. [PubMed]
Henson JM, Reise SP, Kim KH. Detecting mixtures from structural model differences using latent variable mixture modeling: a comparison of relative model fit statistics. Struct. Eq. Model. 2007;14:202–226.
Hill ABS. Statistical Methods in Clinical and Preventive Medicine. Livingstone; Edinburgh: 1962.
Hipp JR, Bauer DJ. Local solutions in the estimation of growth mixture models. Psychol. Methods. 2006;11:36–53. [PubMed]
Hoeksma JB, Kelderman H. On growth curves and mixture models. Infant Child Dev. 2006;15:627–634.
Holland PW. Statistics and causal inference. J. Am. Stat. Assoc. 1986;81:945–960.
Ialongo NS, Werthamer L, Kellam SG, Brown CH, Wang S, Lin Y. Proximal impact of two first-grade preventive interventions on the early risk behaviors for later substance abuse, depression, and antisocial behavior. Am. J. Community Psychol. 1999;27:599–641. [PubMed]
Jo B. Estimation of intervention effects with noncompliance: alternative model specifications. J. Educ. Behav. Stat. 2002;27:385–409.
Jo B, Muthén BO. Modeling of intervention effects with noncompliance: a latent variable approach for randomized trials. In: Marcoulides GA, Schumacker RE, editors. New Developments and Techniques in Structural Equation Modeling. Lawrence Erlbaum Associates; Hillsdale, NJ: 2001. pp. 57–87.
Kellam SG, Brown CH, Poduska JM, Ialongo N, Wang W, Toyinbo P, Petras H, Ford C, Windham A, Wilcox HC. Effects of a universal classroom behavior management program in first and second grades on young adult behavioral, psychiatric, and social outcomes. Drug Alcohol Depend. 2008;95:S5–S28. [PMC free article] [PubMed]
Kellam SG, Koretz D, Moscicki EK. Core elements of developmental epidemiologically based prevention research. Am. J. Community Psychol. 1999;27:463–482. [PubMed]
Kellam SG, Ling X, Merisca R, Brown CH, Ialongo N. The effect of the level of aggression in the first grade classroom on the course and malleability of aggressive behavior into middle school. Dev. Psychopathol. 1998;10:165–185. [PubMed]
Kellam SG, Rebok GW. Building developmental and etiological theory through epidemiologically based preventive intervention trials. In: McCord J, Tremblay RE, editors. Preventing Antisocial Behavior: Interventions from Birth Through Adolescence. Guilford Press; New York: 1992. pp. 162–195.
Kellam SG, Werthamer-Larsson L, Dolan LJ, Brown CH. Developmental epidemiologically based preventive trials: baseline modeling of early target behaviors and depressive symptoms. Am. J. Community Psychol. 1991;19:563–584. [PubMed]
Kleinman KP, Ibrahim JG, Laird NM. A Bayesian framework for intent-to-treat analyses with missing data. Biometrics. 1998;54:265–278. [PubMed]
Kraemer HC, Wilson GT, Fairburn CG, Agras WS. Mediators and moderators of treatment effects in randomized clinical trials. Arch. Gen. Psychiatry. 2002;59:877–883. [PubMed]
Krull JL, MacKinnon DP. Multilevel mediation modeling in group-based intervention studies. Eval. Rev. 1999;23:418–444. [PubMed]
Lachin JM. Statistical considerations in the intent-to-treat principle. Control. Clin. Trials. 2000;21:167–189. [PubMed]
Lavori PW. Clinical trials in psychiatry: should protocol deviation censor patient data? Neuropsychopharmacology. 1992;6:39–63. [PubMed]
Li F, Fisher KJ, Harmer P, McAuley E. Delineating the impact of Tai Chi training on physical function among the elderly. Am. J. Prev. Med. 2002;23:92–97. [PubMed]
Lilienfeld A, Lilienfeld DE. Foundations of Epidemiology. second ed. Oxford University Press; New York: 1980.
Little RJA, Rubin DB. Statistical Analysis with Missing Data. Wiley; New York: 1987.
Little R, Yau L. Intent-to-treat analysis for longitudinal studies with drop-outs. Biometrics. 1996;52:1324–1333. [PubMed]
MacKinnon DP. Introduction to Statistical Mediation Analysis. Erlbaum; Mahwah, NJ: 2006.
MacKinnon DP, Dwyer JH. Estimating mediated effects in prevention studies. Eval. Rev. 1993;17:144–158.
MacKinnon DP, Weber MD, Pentz MA. How do school-based drug prevention programs work and for whom? Drugs Soc. 1989;3:125–143.
Mazumdar S, Liu KS, Houck PR, Reynolds CF III. Intent-to-treat analysis for longitudinal clinical trials: coping with the challenge of missing values. J. Psychiatr. Res. 1999;33:87–95. [PubMed]
McCullagh P, Nelder JA. Generalized Linear Models. Chapman and Hall; London: 1989.
Moffitt TE. Adolescence-limited and life-course-persistent antisocial behavior: a developmental taxonomy. Psychol. Rev. 1993;100:674–701. [PubMed]
Moffitt TE, Caspi A. Childhood predictors differentiate life-course persistent and adolescence-limited antisocial pathways among males and females. Dev. Psychopathol. 2001;13:355–375. [PubMed]
MTA Cooperative Group. Moderators and mediators of treatment response for children with attention-deficit/hyperactivity disorder. Arch. Gen. Psychiatry. 1999;56:1088–1096. [PubMed]
Murray DM. Design and Analysis of Group-Randomized Trials. Oxford University Press; New York: 1998.
Muthén BO. Latent variable modeling with longitudinal and multilevel data. In: Raftery AE, editor. Sociological Methodology. Blackwell; Boston: 1997. pp. 453–480.
Muthén BO. Statistical and substantive checking in growth mixture modeling. Psychol. Methods. 2003;8:369–377. [PubMed]
Muthén B. Latent variable analysis: growth mixture modeling and related techniques for longitudinal data. In: Kaplan D, editor. Handbook of Quantitative Methodology for the Social Sciences. Sage; Newbury Park, CA: 2004. pp. 345–368.
Muthén BO. Latent variable hybrids: overview of old and new models. In: Hancock GR, Samuelsen KM, editors. Advances in Latent Variable Mixture Models. Information Age Publishing, Inc.; Charlotte, NC: in press.
Muthén B, Asparouhov T. Growth mixture analysis: models with non-Gaussian random effects. In: Fitzmaurice G, Davidian M, Verbeke G, Molenberghs G, editors. Advances in Longitudinal Data Analysis. Chapman & Hall/CRC Press; London: 2006.
Muthén BO, Brown CH, Masyn K, Jo B, Khoo ST, Yang CC, Wang CP, Kellam S, Carlin J, Liao J. General growth mixture modeling for randomized preventive interventions. Biostatistics. 2002;3:459–475. [PubMed]
Muthén BO, Curran PJ. General longitudinal modeling of individual differences in experimental designs: a latent variable framework for analysis and power estimation. Psychol. Methods. 1997;2:371–402.
Muthén BO, Muthén LK. The development of heavy drinking and alcohol-related problems from ages 18 to 37 in a U.S. national sample. J. Stud. Alcohol. 2000;61:290–300. [PubMed]
Muthén BO, Shedden K. Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics. 1999;55:463–469. [PubMed]
Muthén LK, Muthén BO. Mplus: Statistical Analysis with Latent Variables: User’s Guide. Muthén & Muthén; Los Angeles, CA: 1998–2007. Version 4.2.
Nagin D. Group-based modeling of development. Harvard University Press; Cambridge, MA: 2005.
Nagin DS, Land KC. Age, criminal careers, and population heterogeneity: Specification and estimation of a nonparametric, mixed Poisson model. Criminology. 1993;31:327–362.
Nagin DS, Tremblay RE. Analyzing developmental trajectories of distinct but related behaviors: a group-based method. Psychol. Methods. 2001;6:18–34. [PubMed]
Neyman J. On the application of probability theory to agricultural experiments: essay on principles, Section 9. 1923. Translated in Stat. Sci. 1990;5:465–480.
Nylund KL, Asparouhov T, Muthén B. Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study. Struct. Equat. Model. in press.
Olsen MK, Schafer JL. A two-part random-effects model for semi-continuous longitudinal data. J. Am. Stat. Assoc. 2001;96:730–745.
Palmgreen P, Lorch EP, Stephenson MT, Hoyle RH, Donohew L. Effects of the Office of National Drug Control Policy’s Marijuana Initiative Campaign on high-sensation-seeking adolescents. Am. J. Public Health. 2007;97:1644–1649. [PMC free article] [PubMed]
Pearson JD, Morrell CH, Landis PK, Carter HB, Brant LJ. Mixed-effects regression models for studying the natural history of prostate disease. Stat. Med. 1994;13:587–601. [PubMed]
Petras H, Chilcoat HD, Leaf PJ, Ialongo NS, Kellam SG. Utility of TOCA-R scores during the elementary school years in identifying later violence among adolescent males. J. Am. Acad. Child Adolesc. Psychiatry. 2004;43:88–96. [PubMed]
Petras H, Ialongo N, Lambert SF, Barrueco S, Schaeffer CM, Chilcoat H, Kellam S. The utility of elementary school TOCA-R scores in identifying later criminal court violence amongst adolescent females. J. Am. Acad. Child Adolesc. Psychiatry. 2005;44:790–797. [PubMed]
Petras H, Kellam SG, Brown CH, Muthén B, Ialongo NS, Poduska JM. Developmental epidemiological courses leading to Antisocial Personality Disorder and violent and criminal behavior: effects by young adulthood of a universal preventive intervention in first- and second-grade classrooms. Drug Alcohol Depend. 2008;95:S45–S59. [PMC free article] [PubMed]
Pillow DR, Sandler IN, Braver SL, Wolchik SA, Gersten JC. Theory-based screening for prevention: focusing on mediating processes in children of divorce. Am. J. Community Psychol. 1991;19:809–836. [PubMed]
Pickett KE, Pearl M. Multilevel analyses of neighborhood socioeconomic context and health outcomes: a critical view. J. Epidemiol. Community Health. 2001;55:111–122. [PMC free article] [PubMed]
Plybon LE, Kliewer W. Neighborhood types and externalizing behavior in urban school-age children: tests of direct, mediated, and moderated effects. J. Child Fam. Studies. 2001;10:419–437.
Pocock SJ. Clinical Trials: A Practical Approach. Wiley; New York: 1983.
Poduska J, Kellam S, Wang W, Brown CH, Ialongo N, Toyinbo P. Impact of the Good Behavior Game, a universal classroom-based behavior intervention, on young adult service use for problems with emotions, behavior, or drugs or alcohol. Drug Alcohol Depend. 2008;95:S29–S44. [PMC free article] [PubMed]
R Project. The R Project for Statistical Computing. 2007. Downloaded from http://www.r-project.org/. November 5, 2007.
Raudenbush SW. Statistical analysis and optimal design for cluster randomized trials. Psychol. Methods. 1997;2:173–185.
Raudenbush SW, Bryk AS. Hierarchical Linear Models: Applications and Data Analysis Methods. second ed. Sage Publications; Newbury Park, CA: 2002.
Raudenbush SW, Liu X. Statistical power and optimal design for multisite randomized trials. Psychol. Methods. 2000;5:199–213. [PubMed]
Raudenbush SW, Sampson RJ. Ecometrics: toward a science of assessing ecological settings, with application to the systematic social observation of neighborhoods. Sociol. Methodol. 1999;29:1–41.
Reid JB, Eddy JM, Fetrow RA, Stoolmiller M. Description and immediate impacts of a preventive intervention for conduct problems. Am. J. Community Psychol. 1999;27:483–517. [PubMed]
Rosenbaum P. Model based direct adjustment. J. Am. Stat. Assoc. 1987;82:387–394.
Roy A, Bhaumik DK, Aryal S, Gibbons RD. Sample size determination for hierarchical longitudinal designs with differential attrition rates. Biometrics. 2007;63:699–707. [PubMed]
Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 1974;66:688–701.
Rubin DB. Inference and missing data. Biometrika. 1976;63:581–592.
Rubin DB. Bayesian inference for causal effects: The role of Randomization. Ann. Stat. 1978;6:34–58.
Rubin DB. Multiple Imputation for Nonresponse in Surveys. Wiley; New York: 1987.
Rubin DB. Multiple imputation after 18+ years (with discussion). J. Am. Stat. Assoc. 1996;91:473–489.
Rush AJ, Trivedi MH, Wisniewski SR, Nierenberg A, Stewart JW, Warden D, Niederehe G, Thase ME, Lavori PW, Lebowitz BD, McGrath PJ, Rosenbaum JF, Sackeim HA, Kupfer DJ, Fava M. Acute and longer-term outcomes in depressed outpatients who required one or several treatment steps: A STAR*D report. Am. J. Psychiatry. 2006;163:1905–1917. [PubMed]
Sandler IN, Ayers TS, Wolchik SA, Tein JY, Kwok OM, Lin K, Padgett-Jones S, Weyer JL, Cole E, Kriege G, Griffin WA. Family Bereavement Program: efficacy of a theory-based preventive intervention for parentally-bereaved children and adolescents. J. Consult. Clin. Psychol. 2003;71:587–600. [PubMed]
Schaeffer CM, Petras H, Ialongo N, Masyn KE, Hubbard S, Poduska J, Kellam S. A comparison of girls’ and boys’ aggressive-disruptive behavior trajectories across elementary school: prediction to young adult antisocial outcomes. J. Consult. Clin. Psychol. 2006;74:500–510. [PubMed]
Schafer JL. Analysis of Incomplete Multivariate Data. Chapman & Hall; London: 1997.
Schafer JL. Multiple imputation: a primer. Stat. Methods Med. Res. 1999;8:3–15. [PubMed]
Schafer JL, Graham JW. Missing data: our view of the state of the art. Psychol. Methods. 2002;7:147–177. [PubMed]
Schwarz G. Estimating the dimension of a model. Ann. Stat. 1978;6:461–464.
Segawa E, Ngwe JE, Li Y, Flay BR, Aban Aya Coinvestigators. Evaluation of the effects of the Aban Aya Youth Project in reducing violence among African American adolescent males using latent class growth mixture modeling techniques. Eval. Rev. 2005;29:128–148. [PMC free article] [PubMed]
Seltzer M. The use of hierarchical models in analyzing data from field experiments and quasi-experiments. In: Kaplan D, editor. The SAGE Handbook of Quantitative Methodology for the Social Sciences. Sage; Thousand Oaks, CA: 2004.
Shadish WR, Cook TD, Campbell DT. Experimental and Quasi-Experimental Design for Generalized Causal Inference. Houghton Mifflin; Boston: 2002.
Snyder JJ, Reid J, Stoolmiller M, Howe G, Brown H, Dagne G, Cross W. The role of behavior observation in measurement systems for randomized prevention trials. Prev. Sci. 2006;7:43–56. [PubMed]
Swanson J, Hinshaw SP, Arnold LE, Gibbons RD, Marcus S, Hur K, Jensen PS, Vitiello B, Abikoff H, Greenhill LL, Hechtman L, Pelham W, Wells K, Conners CK, Elliott G, Epstein L, Hoagwood K, Hoza B, Molina BS, Newcorn JH, Severe JB, Odbert C, Wigal T. Secondary evaluations of MTA 36-month outcomes: propensity score and growth mixture model analyses. J. Am. Acad. Child Adolesc. Psychiatry. in press.
Tein JY, Sandler IN, MacKinnon DP, Wolchik SA. How did it work? Who did it work for? Mediation in the context of a moderated prevention effect for children of divorce. J. Consult. Clin. Psychol. 2004;72:617–624. [PubMed]
Tsiatis A. Methodological issues in AIDS clinical trials. Intent-to-treat analysis. J. Acquir. Immune Defic. Syndr. 1990;3:S120–S123. [PubMed]
Verbeke G, Lesaffre E. A linear mixed-effects model with heterogeneity in the random-effects population. J. Am. Stat. Assoc. 1996;91:217–221.
Wang Y. Mixed effects smoothing spline analysis of variance. J. R. Stat. Soc. Ser. B Stat. Methodol. 1998;60:159–174.
Wang C-P, Brown CH, Bandeen-Roche K. Residual diagnostics for growth mixture models: examining the impact of a preventive intervention on multiple trajectories of aggressive behavior. J. Am. Stat. Assoc. 2005;100:1054–1076.
Weiss P. The biological basis of adaptation. In: Romano J, editor. Adaptation. Cornell University Press; New York: 1949. pp. 7–14.
Wilcox HC, Kellam SG, Brown CH, Poduska J, Ialongo NS, Wang W, Anthony J. The impact of two universal randomized first- and second-grade classroom interventions on young adult suicide ideation and attempts. Drug Alcohol Depend. 2008;95:S60–S73. [PMC free article] [PubMed]
Wolchik SA, Sandler IN, Millsap RE, Plummer BA, Greene SM, Anderson ER, Dawson-McClure SR, Hipke K, Haine RA. Six-year follow-up of preventive interventions for children of divorce: a randomized controlled trial. JAMA. 2002;288:1874–1881. [PubMed]
Wolchik S, Sandler I, Weiss L, Winslow E. New beginnings: an empirically-based intervention program for divorced mothers to help children adjust to divorce. In: Briesmeister JM, Schaefer CE, editors. Handbook of Parent Training: Helping Parents Prevent and Solve Problem Behaviors. Wiley; New York: in press.
Wolfinger RD, O’Connell M. Generalized linear mixed models: a pseudo-likelihood approach. J. Stat. Comput. Simul. 1993;48:233–243.
Wood SN. Technical Report 04-12. Department of Statistics. University of Glasgow; Glasgow, UK: 2004. Low Rank Scale Invariant Tensor Product Smooths for Generalized Additive Mixed Models.
Wyman PA, Brown CH, Inman J, Cross W, Schmeelk-Cone K, Guo J, Peña J. Randomized trial of a gatekeeper training program for suicide prevention. J. Consult. Clin. Psychol. in press.
Xu W, Hedeker D. A random-effects mixture model for classifying treatment response in longitudinal clinical trials. J. Biopharmaceut. Stat. 2001;11:253–273. [PubMed]
Zeger SL, Liang KY, Albert PS. Models for longitudinal data: a generalized estimating equation approach. Biometrics. 1988;44:1049–1060. [PubMed]

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2008 MethodsforTestingTheoryandEvalu	Wei Wang; Sheppard G Kellam; Hanno Petras; Peter Toyinbo; Jeanne Poduska; Nicholas Ialongo; Peter A Wyman; Patricia Chamberlain; Bengt O. Muthén; C. Hendricks Brown			Methods for Testing Theory and Evaluating Impact in Randomized Field Trials: Intent-to-treat Analyses for Integrating the Perspectives of Person, Place, and Time				10.1016/j.drugalcdep.2007.11.013		2008