# Two Active-Treatment Controlled Experiment

A Two Active-Treatment Controlled Experiment is a treatment-controlled experiment with two active treatments.

**Context:**

- It can (typically) include a Bivariate Controlled Experiment Evaluation Task.
- It can range from being an A/B Test to being an A/A Test.
- It can range from being a Randomized Two Active-Treatment Controlled Experiment to being a Non-Randomized Two Active-Treatment Controlled Experiment.

**Example(s):**

**Counter-Example(s):**

**See:** Independent Two-Sample t-Test, Contingency Table, Randomized Controlled Experiment, Statistical Hypothesis Testing, Web Design Optimization, User Experience Design, Click-Through Rate Optimization, Evidence-Based Practice.

## References

### 2018

- (MLG TensorFlow, 2018) ⇒ (2018). "A/B testing". In: Machine Learning Glossary (TensorFlow). Retrieved 2018-04-22.
- QUOTE: A statistical way of comparing two (or more) techniques, typically an incumbent against a new rival. A/B testing aims to determine not only which technique performs better but also to understand whether the difference is statistically significant. A/B testing usually considers only two techniques using one measurement, but it can be applied to any finite number of techniques and measures.

### 2016

- (Wikipedia, 2016) ⇒ https://en.wikipedia.org/wiki/A/B_testing Retrieved:2016-9-14.
- In marketing and business intelligence, **A/B testing** is a term for a randomized experiment with two variants, A and B, which are the control and variation in the controlled experiment. A/B testing is a form of statistical hypothesis testing with two variants leading to the technical term, *two-sample hypothesis testing*, used in the field of statistics. Other terms used for this method include **bucket tests** and **split-run testing**. These terms can have a wider applicability to more than two variants, but the term A/B testing is also frequently used in the context of testing more than two variants. In online settings, such as web design (especially user experience design), the goal of A/B testing is to identify changes to web pages that increase or maximize an outcome of interest (e.g., click-through rate for a banner advertisement). Formally the current web page is associated with the null hypothesis. A/B testing is a way to compare two versions of a single variable typically by testing a subject's response to variable A against variable B, and determining which of the two variables is more effective. As the name implies, two versions (A and B) are compared, which are identical except for one variation that might affect a user's behavior. Version A might be the currently used version (control), while version B is modified in some respect (treatment). For instance, on an e-commerce website the purchase funnel is typically a good candidate for A/B testing, as even marginal improvements in drop-off rates can represent a significant gain in sales. Significant improvements can sometimes be seen through testing elements like copy text, layouts, images and colors, but not always. The vastly larger group of statistics broadly referred to as multivariate testing or multinomial testing is similar to A/B testing, but may test more than two different versions at the same time and/or has more controls, etc.

  Simple A/B tests are not valid for observational, quasi-experimental or other non-experimental situations, as is common with survey data, offline data, and other, more complex phenomena. A/B testing has been marketed by some as a change in philosophy and business strategy in certain niches, though the approach is identical to a between-subjects design, which is commonly used in a variety of research traditions. A/B testing as a philosophy of web development brings the field into line with a broader movement toward evidence-based practice. The benefits of A/B testing are considered to be that it can be performed continuously on almost anything, especially since most marketing automation software now, typically, comes with the ability to run A/B tests on an on-going basis. This allows for updating websites and other tools, using current resources, to keep up with changing trends.
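The randomized split into variants A and B described in the quote can be sketched as follows. This is a minimal illustration, not a method taken from the quoted source: the function name `assign_variant`, the `experiment` label, and the hash-based bucketing scheme are all assumptions chosen for the example.

```python
import hashlib


def assign_variant(user_id: str, experiment: str = "example-experiment") -> str:
    """Assign a user to variant "A" (control) or "B" (treatment).

    Hypothetical helper: hashing the user id together with an experiment
    label gives a stable, roughly 50/50 split without storing assignments.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first byte of the hash: even -> "A", odd -> "B".
    return "A" if digest[0] % 2 == 0 else "B"


if __name__ == "__main__":
    # Count how many of 10,000 synthetic user ids land in each variant.
    counts = {"A": 0, "B": 0}
    for i in range(10_000):
        counts[assign_variant(f"user-{i}")] += 1
    print(counts)
```

Because the assignment depends only on the user id and the experiment label, the same user sees the same variant on every visit, which keeps the two samples independent of user behavior.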


### 2016b

- (Wikipedia, 2016) ⇒ https://en.wikipedia.org/wiki/A/B_testing#Common_test_statistics Retrieved:2016-9-14.
- "Two-sample hypothesis tests" are appropriate for comparing the two samples where the samples are divided by the two control cases in the experiment. Z-tests are appropriate for comparing means under stringent conditions regarding normality and a known standard deviation. Student's t-tests are appropriate for comparing means under relaxed conditions when less is assumed. Welch's t test assumes the least and is therefore the most commonly used test in a two-sample hypothesis test where the mean of a metric is to be optimized. While the mean of the variable to be optimized is the most common choice of estimator, others are regularly used.
For a comparison of two binomial distributions such as a click-through rate one would use Fisher's exact test.
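As a hedged illustration of the two tests named in the quote, the sketch below applies Welch's t-test to a mean metric and Fisher's exact test to a click-through comparison. The helper names and all sample values are hypothetical; `scipy.stats.ttest_ind` only performs Welch's test when `equal_var=False` is passed.

```python
import numpy as np
from scipy import stats


def compare_means_welch(sample_a, sample_b):
    """Welch's t-test: compares two means without assuming equal variances."""
    result = stats.ttest_ind(sample_a, sample_b, equal_var=False)
    return result.statistic, result.pvalue


def compare_click_rates(clicks_a, views_a, clicks_b, views_b):
    """Fisher's exact test on a 2x2 table of [clicked, not clicked] counts."""
    table = np.array([
        [clicks_a, views_a - clicks_a],
        [clicks_b, views_b - clicks_b],
    ])
    odds_ratio, p_value = stats.fisher_exact(table, alternative="two-sided")
    return odds_ratio, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic revenue-per-user samples (assumed values, for illustration only).
    revenue_a = rng.normal(loc=10.0, scale=2.0, size=500)
    revenue_b = rng.normal(loc=10.5, scale=3.0, size=480)
    print(compare_means_welch(revenue_a, revenue_b))
    # Synthetic click counts: 120 of 1000 views for A, 150 of 1000 for B.
    print(compare_click_rates(120, 1000, 150, 1000))
```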

Assumed Distribution | Example Case | Standard Test | Python Implementation |
---|---|---|---|
Gaussian | Average Revenue Per Paying User | Welch's t test | scipy.stats.ttest_ind |
Binomial | Click Through Rate | Fisher's exact test | scipy.stats.fisher_exact |
Poisson | Average Transactions Per Paying User | E-test | None |
Multinomial | Number of each product Purchased | Chi-squared test | scipy.stats.chisquare |
Unknown | -- | Mann–Whitney U test | scipy.stats.mannwhitneyu |
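The multinomial and unknown-distribution rows of the table can be sketched as below. This is an illustration under stated assumptions, not the quoted source's code: the helper names and counts are hypothetical, and the sketch swaps in `scipy.stats.chi2_contingency` rather than the `scipy.stats.chisquare` listed in the table, because comparing the category counts of two variants is a test of independence on a contingency table, whereas `chisquare` is a goodness-of-fit test against expected frequencies.

```python
import numpy as np
from scipy import stats


def compare_product_mix(counts_a, counts_b):
    """Chi-squared test of independence on per-product purchase counts.

    Rows are the two variants; columns are product categories.
    """
    table = np.array([counts_a, counts_b])
    chi2, p_value, dof, _expected = stats.chi2_contingency(table)
    return chi2, p_value, dof


def compare_unknown_distribution(sample_a, sample_b):
    """Mann-Whitney U test: a rank-based test with no distributional form assumed."""
    result = stats.mannwhitneyu(sample_a, sample_b, alternative="two-sided")
    return result.statistic, result.pvalue


if __name__ == "__main__":
    # Hypothetical purchase counts for three products under variants A and B.
    print(compare_product_mix([120, 90, 40], [100, 110, 45]))
    # Hypothetical per-user metric values with no assumed distribution.
    print(compare_unknown_distribution([3, 5, 8, 13, 21], [4, 9, 15, 22, 30]))
```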

### 2006

- (Maurin, 2006) ⇒ Michel Maurin. (2006). “An Original Comfort/Discomfort Quantification in a Bivariate Controlled Experiment: Application to the Discomfort Evaluation of Seated Arm Reach.” In: Proceedings of the SA-DHM Congress.