Confounding Variable


A confounding variable is a latent random variable that is in a causal relationship with both the dependent variable and the independent variable of an experiment.



  • (Wikipedia, 2011) ⇒ Wikipedia contributors. (2011). “Confounding Variable.” extracted 2011-Jun-11
    • In statistics, a confounding variable (also confounding factor, lurking variable, a confound, or confounder) is an extraneous variable in a statistical model that correlates (positively or negatively) with both the dependent variable and the independent variable. The methodologies of scientific studies therefore need to control for these factors to avoid a false positive (Type I) error: an erroneous conclusion that the dependent variable is in a causal relationship with the independent variable. Such a relation between two observed variables is termed a spurious relationship. Thus, confounding is a major threat to the validity of inferences made about cause and effect, i.e. internal validity, as the observed effects should be attributed to the independent variable rather than the confounder.

      For example, consider the statistical relationship between ice cream sales and drowning deaths. These two variables have a positive, and potentially statistically significant, correlation with each other. At first sight, an evaluator might be tempted to infer a causal relationship in one direction or the other (either that ice cream causes drowning or that drowning causes ice cream consumption).

      On one hand, the evaluator might attribute the entirety of the correlation to the causal chain "Since a) a nonzero fraction of people who eat ice cream go swimming shortly thereafter, b) swimming after eating causes cramps in a nonzero fraction of that fraction of people, and c) those cramps cause the inability to swim and the subsequent drowning of a nonzero fraction of the latter fraction, an increase in ice cream sales will cause an increase in drowning deaths."

      On the other hand, the evaluator might attribute the entirety of that correlation to the causal chain "Since a) drowning deaths cause bereavement among almost all of the deceased's loved ones and b) some nonzero fraction of grieving persons console themselves with ice cream, an increase in drowning deaths will cause an increase in ice cream consumption, purchases, and sales."

      In turn, if both of these patterns hold true, they will amplify each other, although that amplification is bounded at a horizontal asymptote: some of the people who eat ice cream and then drown will leave behind grieving loved ones who console themselves with ice cream, some of those ice-cream-eating loved ones will go swimming after eating their ice cream, and some of those ice-cream-eating-and-then-swimming loved ones will drown, etc. Even in a world where these two factors are the only ones in play, however, the small percentages at issue quickly reduce the amplification at each successive iteration to almost nil.


    • A lurking variable (confounding factor or variable, or simply a confound or confounder) is a "hidden" variable in a statistical or research model that affects the variables in question but is not known or acknowledged, and thus (potentially) distorts the resulting data. This hidden third variable causes the two measured variables to falsely appear to be in a causal relation. Such a relation between two observed variables is termed a spurious relationship. An experiment that fails to take a confounding variable into account is said to have poor internal validity.

      For example, ice cream consumption and murder rates are highly correlated. Now, does ice cream incite murder or does murder increase the demand for ice cream? Neither: they are joint effects of a common cause or lurking variable, namely, hot weather. Another look at the sample shows that it failed to account for the time of year, including the fact that both rates rise in the summertime.

      In statistical experimental design, attempts are made to remove lurking variables such as the placebo effect from the experiment. Because we can never be certain that observational data are not hiding a lurking variable that influences both x and y, it is never safe to conclude that a linear model demonstrates a causal relationship with 100% certainty, no matter how strong the linear association.
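The common-cause pattern in the ice cream and murder example can be reproduced with a small simulation. This is a hypothetical sketch, not data from any study: it assumes a linear model with Gaussian noise in which a made-up daily temperature drives both ice cream sales and a murder-rate index (arbitrary units), while neither of those two variables affects the other. The raw correlation between them comes out strong, yet the partial correlation (the correlation of what is left of each series after a least-squares fit on temperature) is close to zero. The function names and coefficients are illustrative, and partial correlation is only one way to control for a confounder, usable only when the confounder has actually been measured.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(ys, zs):
    """What is left of ys after a least-squares fit ys ~ 1 + zs."""
    n = len(ys)
    my, mz = sum(ys) / n, sum(zs) / n
    beta = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / sum((z - mz) ** 2 for z in zs)
    return [y - my - beta * (z - mz) for y, z in zip(ys, zs)]

def simulate(n=5000, seed=0):
    """Hypothetical data: temperature drives both series; they never affect each other."""
    rng = random.Random(seed)
    temperature = [rng.gauss(20, 8) for _ in range(n)]
    ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
    murder_rate = [0.5 * t + rng.gauss(0, 3) for t in temperature]
    return temperature, ice_cream, murder_rate

temperature, ice_cream, murder_rate = simulate()
raw = pearson(ice_cream, murder_rate)
partial = pearson(residuals(ice_cream, temperature), residuals(murder_rate, temperature))
print(f"raw correlation:     {raw:.2f}")
print(f"partial correlation: {partial:.2f}")
```

However strong the raw association, nothing in the simulated data lets ice cream cause murder; the association disappears once temperature is held fixed, which is the situation the quoted passage warns can hide behind a strong linear fit.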

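The bounded-amplification argument in the Wikipedia ice cream and drowning passage is a geometric series. Assume, purely for illustration, that each full cycle of the feedback loop (ice cream, swimming, drowning, grief, more ice cream) turns only a small fraction r of the previous round's extra cases into new extra cases. The running total after k rounds is then initial * (1 + r + r^2 + ... + r^(k-1)), which approaches the horizontal asymptote initial / (1 - r) for 0 <= r < 1. The function name and the numbers below are hypothetical.

```python
def cumulative_extra(initial, r, rounds):
    """Running totals of initial * (1 + r + r**2 + ...) after 1, 2, ..., rounds terms."""
    totals = []
    term, total = initial, 0.0
    for _ in range(rounds):
        total += term
        totals.append(total)
        term *= r  # each cycle passes on only the fraction r of the previous round
    return totals

initial, r = 100.0, 0.01       # hypothetical: 100 extra cases, 1% carried into each next cycle
totals = cumulative_extra(initial, r, 5)
asymptote = initial / (1 - r)  # closed-form limit of the series, valid for 0 <= r < 1
print([round(t, 4) for t in totals])
print(round(asymptote, 4))
```

Even doubling r to 0.02 only raises the asymptote to about 102.04, which is the sense in which the amplification at each successive iteration quickly falls to almost nil.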

  • (MacKinnon & Luecken, 2008) ⇒ David P. MacKinnon and Linda J. Luecken. (2008). “How and for Whom? Mediation and Moderation in Health Psychology.” In: Health Psychology, 27(2 Suppl):S99. doi:10.1037/0278-6133.27.2
    • QUOTE: There are two other types of variables that are relevant to discussion of third-variable effects, and that are often confused with mediators or moderators: confounding variables and covariates. A confounding variable is one that changes the relation between an independent and dependent variable because it is related to both variables, but is not theoretically in a causal sequence between the independent and dependent variable. When considering whether a variable is a mediator or a confound, the presumed presence or lack of a causal mediation relation should be taken into account. Confounders explain a significant relation between the independent and dependent variable by a third variable that predicts both variables, whereas a mediator explains a relation between variables because it is intermediate in a causal sequence. Age, gender, and income are often included in statistical models because of their potential to act as confounding variables. For example, due to its significant associations with both their IV and DV, Surtees, Wainwright, Luben, Khaw, and Day (2006) evaluate participant age as a potential confound of the relation between personal mastery and all-cause mortality. However, the relation did not change, suggesting that age was not a confound that would explain the observed association between mastery and mortality. … One other type of third variable is a covariate, which has a relation with one or both of the independent and dependent variables, but does not appreciably change the relation between an independent and dependent variable when included in a statistical analysis. Covariates are generally not of theoretical interest, but are often included in a model to explain additional variability in a dependent variable.
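The check described above (adding a candidate confounder to the model and seeing whether the relation between the independent and dependent variable changes) can be sketched with ordinary least squares. The example assumes simulated linear data in which x has a true effect of 0.3 on y; in one scenario z drives both x and y (a confounder), in the other z is related to y only (a covariate). The function names, coefficients, and sample size are hypothetical. The unadjusted slope overstates the effect in the confounder scenario and adjusting for z brings it back near 0.3, while adjusting for the covariate leaves the slope roughly unchanged.

```python
import random

def cross_dev(a, b):
    """Sum of cross-products of deviations from the mean."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b))

def slope_unadjusted(x, y):
    """Slope on x in the least-squares fit y ~ 1 + x."""
    return cross_dev(x, y) / cross_dev(x, x)

def slope_adjusted(x, z, y):
    """Slope on x in the least-squares fit y ~ 1 + x + z (two-predictor normal equations)."""
    sxx, szz, sxz = cross_dev(x, x), cross_dev(z, z), cross_dev(x, z)
    sxy, szy = cross_dev(x, y), cross_dev(z, y)
    return (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)

rng = random.Random(1)
n = 5000
TRUE_EFFECT = 0.3  # hypothetical causal effect of x on y in both scenarios

# Scenario A: z is a common cause of x and y (a confounder).
z_a = [rng.gauss(0, 1) for _ in range(n)]
x_a = [0.8 * z + rng.gauss(0, 1) for z in z_a]
y_a = [TRUE_EFFECT * x + z + rng.gauss(0, 1) for x, z in zip(x_a, z_a)]

# Scenario B: z affects y but is unrelated to x (a covariate).
x_b = [rng.gauss(0, 1) for _ in range(n)]
z_b = [rng.gauss(0, 1) for _ in range(n)]
y_b = [TRUE_EFFECT * x + z + rng.gauss(0, 1) for x, z in zip(x_b, z_b)]

results = {}
for label, x, z, y in [("confounder", x_a, z_a, y_a), ("covariate", x_b, z_b, y_b)]:
    results[label] = (slope_unadjusted(x, y), slope_adjusted(x, z, y))
    unadj, adj = results[label]
    print(f"{label:>10}: slope on x unadjusted = {unadj:.2f}, adjusted for z = {adj:.2f}")
```

As the quoted passage stresses, the numbers alone cannot separate a confounder from a mediator: adjusting for a variable that lies in the causal sequence between x and y would also change the slope, so the presumed causal structure has to come from theory rather than from the fit.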