Two-way ANOVA

A Two-way ANOVA is an extension of a One-way ANOVA that assesses the effects of two categorical independent variables (factors), and of their interaction, on a continuous dependent variable.



References

2016

Model
Upon observing variation among all [math]\displaystyle{ n }[/math] data points, for instance via a histogram, "probability may be used to describe such variation". Let us hence denote by [math]\displaystyle{ Y_{ijk} }[/math] the random variable whose observed value [math]\displaystyle{ y_{ijk} }[/math] is the [math]\displaystyle{ k }[/math]-th measure for treatment [math]\displaystyle{ (i,j) }[/math]. The two-way ANOVA models all these variables as varying independently and normally around a mean, [math]\displaystyle{ \mu_{ij} }[/math], with a constant variance, [math]\displaystyle{ \sigma^2 }[/math] (homoscedasticity):

[math]\displaystyle{ Y_{ijk} \, | \, \mu_{ij}, \sigma^2 \; \overset{i.i.d.}{\sim} \; \mathcal{N}(\mu_{ij}, \sigma^2) }[/math].
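As an illustration, the following minimal Python sketch simulates data from this cell-means model (the 2 × 3 design, cell means, and replicate count below are hypothetical):

<source lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 2 levels of the first factor, 3 levels of the
# second factor, K = 5 replicates per treatment cell, sigma = 1.
mu_ij = np.array([[10.0, 12.0, 11.0],
                  [ 9.0, 14.0, 10.0]])  # cell means mu_ij
sigma = 1.0
K = 5

# Draw Y_ijk ~ N(mu_ij, sigma^2), independently for each replicate.
Y = rng.normal(loc=mu_ij[..., None], scale=sigma, size=(2, 3, K))
print(Y.shape)  # (2, 3, 5): first factor, second factor, replicate
</source>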

Specifically, the mean of the response variable is modeled as a linear combination of the explanatory variables:

[math]\displaystyle{ \mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij} }[/math],

where [math]\displaystyle{ \mu }[/math] is the grand mean, [math]\displaystyle{ \alpha_i }[/math] is the additive main effect of level [math]\displaystyle{ i }[/math] of the first factor (the i-th row in the contingency table), [math]\displaystyle{ \beta_j }[/math] is the additive main effect of level [math]\displaystyle{ j }[/math] of the second factor (the j-th column in the contingency table), and [math]\displaystyle{ \gamma_{ij} }[/math] is the non-additive interaction effect of treatment [math]\displaystyle{ (i,j) }[/math] of both factors (the cell at row i and column j in the contingency table).
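Under the usual sum-to-zero identifiability constraints ([math]\displaystyle{ \sum_i \alpha_i = \sum_j \beta_j = \sum_i \gamma_{ij} = \sum_j \gamma_{ij} = 0 }[/math]), these effects can be recovered directly from the cell means; a sketch, reusing the hypothetical cell means from the simulation above:

<source lang="python">
import numpy as np

# Same hypothetical cell means as in the simulation sketch above.
mu_ij = np.array([[10.0, 12.0, 11.0],
                  [ 9.0, 14.0, 10.0]])

mu = mu_ij.mean()                           # grand mean
alpha = mu_ij.mean(axis=1) - mu             # row main effects alpha_i
beta = mu_ij.mean(axis=0) - mu              # column main effects beta_j
gamma = mu_ij - mu - alpha[:, None] - beta  # interactions gamma_ij

# The decomposition reconstructs mu_ij exactly, and the effects
# satisfy the sum-to-zero constraints.
assert np.allclose(mu + alpha[:, None] + beta + gamma, mu_ij)
assert np.isclose(alpha.sum(), 0) and np.isclose(beta.sum(), 0)
</source>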
Another, equivalent way of describing the two-way ANOVA is to note that, besides the variation explained by the factors, there remains some statistical noise. This unexplained variation is handled by introducing one random variable per data point, [math]\displaystyle{ \epsilon_{ijk} }[/math], called the error. These [math]\displaystyle{ n }[/math] random variables are seen as deviations from the means, and are assumed to be independent and normally distributed:

[math]\displaystyle{ Y_{ijk} = \mu_{ij} + \epsilon_{ijk} \text{ with } \epsilon_{ijk} \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2) }[/math].
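In practice, the model is usually fit from data laid out in long format; a minimal sketch using statsmodels (the column names A, B, y and the simulated data are hypothetical), where C(...) marks a column as categorical and * includes both main effects and the interaction:

<source lang="python">
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical long-format data: a 2 x 3 design, 5 replicates per cell.
I, J, K = 2, 3, 5
df = pd.DataFrame({
    "A": np.repeat(np.arange(I), J * K),          # first factor level i
    "B": np.tile(np.repeat(np.arange(J), K), I),  # second factor level j
})
mu_ij = np.array([[10.0, 12.0, 11.0],
                  [ 9.0, 14.0, 10.0]])
df["y"] = mu_ij[df["A"], df["B"]] + rng.normal(0.0, 1.0, size=len(df))

# Fit Y_ijk = mu + alpha_i + beta_j + gamma_ij + eps_ijk and print
# the two-way ANOVA table.
fit = smf.ols("y ~ C(A) * C(B)", data=df).fit()
print(sm.stats.anova_lm(fit, typ=2))
</source>

Each row of the resulting table reports the sum of squares, degrees of freedom, F statistic, and p-value for one source of variation: the first factor, the second factor, their interaction, and the residual error.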