# Independent Two-Sample t-Test Task

An Independent Two-Sample t-Test Task is a statistical hypothesis testing task that requires applying an independent two-sample t-test, i.e. testing whether the means of two independent samples differ.

**AKA:** Independent/Unpaired/Unmatched Two-Sample t-Testing.

**Context:**
- Task Input:
  - Input Data: two independent sample datasets: [math]\{x_1,x_2,\cdots,x_n\}[/math], [math]\{y_1,y_2,\cdots,y_m\}[/math].
  - Input Parameters:
    - [math]\alpha_0[/math], a significance level value or a confidence level (a percentage).

- Task Output:
  - independent two-sample t-test statistic value,
  - P-value or Region of Acceptance,
  - Region of Rejection (optional),
  - Decision Errors (optional).

- Task Requirements:
  - Definition of the test variable and categorical groups. This may require a Classification System.
  - Verification of Test Requirement(s) (optional):
    - The population variances are equal. This may include a Bartlett's Test or a Levene's Test for the homogeneity of variance.
    - The sampling distribution can be approximated by a normal distribution.
  - Hypotheses Statement: a null hypothesis and an alternative hypothesis, according to a one-tailed independent two-sample t-test or a two-tailed independent two-sample t-test.
  - Test Statistic computation: this requires calculating an independent two-sample t-test statistic from the sample mean values and the sample standard deviations.
  - P-value and/or Region of Acceptance computation: these require a t-distribution calculator or a t-table.
  - Decision Rule: the null hypothesis is rejected if the P-value is less than [math]\alpha_0[/math] or if the t-test statistic value falls outside the region of acceptance.
- It can be solved by an Independent Two-Sample t-Test System (that implements an independent two-sample t-test algorithm).
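The task requirements listed above can be sketched in Python with SciPy. This is a minimal illustration, not a definitive implementation: the function name `independent_ttest_task`, the made-up measurements, and the choice to fall back to Welch's variant when Levene's test rejects equal variances are all assumptions introduced here.

```python
from scipy import stats


def independent_ttest_task(x, y, alpha=0.05):
    """Run the task steps on two independent samples (hypothetical helper)."""
    # Optional requirement check: Levene's test for homogeneity of variance.
    _, levene_p = stats.levene(x, y)
    equal_var = bool(levene_p >= alpha)

    # Test statistic and two-tailed P-value.
    # With equal_var=False, SciPy performs Welch's variant of the test
    # (an assumption of this sketch, not a step stated in the task).
    t_stat, p_value = stats.ttest_ind(x, y, equal_var=equal_var)

    # Decision rule: reject the null hypothesis when the P-value is below alpha.
    return {
        "t": float(t_stat),
        "p": float(p_value),
        "equal_var": equal_var,
        "reject_h0": bool(p_value < alpha),
    }


# Made-up measurements, for illustration only.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2, 5.5]
group_b = [6.3, 6.1, 5.9, 6.8, 6.5, 6.2, 6.6]

result = independent_ttest_task(group_a, group_b)
print(result)
```

The returned dictionary mirrors the task output: the test statistic, the P-value, and the decision against the significance level.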

**Example(s):**
- Consider the dataset (http://www.scipy-lectures.org/_downloads/brain_size.csv), which includes the following variables: Gender, FSIQ, VIQ, PIQ, Weight, Height, MRI Count. The first example in Independent Two-Sample t-Test System tests the null hypothesis "the mean VIQ values among females and males are equal".
  - First, the system solves the following data classification task by categorizing the VIQ dataset (test variable) according to the nominal variable "Gender":
    - Sample 1/Group 1: the `female_viq` dataset corresponds to the VIQ values for Gender labelled as "Female",
    - Sample 2/Group 2: the `male_viq` dataset corresponds to the VIQ values for Gender labelled as "Male".
  - Then, it uses an independent two-sample t-test algorithm, `stats.ttest_ind(female_viq, male_viq)`, to calculate the independent two-sample t-test statistic, [math]t=-0.77261617232[/math], and the p-value, [math]p=0.4445287677858[/math]. At a significance level of [math]\alpha=0.05[/math], the test fails to reject the null hypothesis, since [math]p \gt \alpha[/math].
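The classification step and the test call in this example might look like the sketch below. It does not download the real file: `SAMPLE_CSV` holds a few hypothetical rows in an assumed semicolon-separated layout with "Gender" and "VIQ" columns, and `split_viq_by_gender` is a helper name invented for illustration. The values it prints therefore differ from the statistics quoted above.

```python
import csv
import io

from scipy import stats

# Hypothetical rows in the ASSUMED layout of brain_size.csv
# (semicolon-separated, "Gender" and "VIQ" columns). Not the real data.
SAMPLE_CSV = """Gender;VIQ
Female;132
Male;150
Male;123
Female;90
Female;129
Male;114
Female;100
Male;145
Female;96
Male;101
"""


def split_viq_by_gender(csv_text):
    """Categorize the VIQ values (test variable) by the nominal variable Gender."""
    groups = {"Female": [], "Male": []}
    for row in csv.DictReader(io.StringIO(csv_text), delimiter=";"):
        groups[row["Gender"]].append(float(row["VIQ"]))
    return groups["Female"], groups["Male"]


female_viq, male_viq = split_viq_by_gender(SAMPLE_CSV)

# Independent two-sample t-test on the two groups (two-tailed by default).
t_stat, p_value = stats.ttest_ind(female_viq, male_viq)

alpha = 0.05
decision = "reject" if p_value < alpha else "fail to reject"
print(f"t = {t_stat:.4f}, p = {p_value:.4f}: {decision} the null hypothesis")
```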

**Counter-Example(s):**

**See:** Independent Two-Sample t-Test System, Statistical Significance, Sample Average, Sample Variance.

## References

### 2017a

- (Wikipedia, 2017) ⇒ http://en.wikipedia.org/wiki/Student%27s_t-test#Independent_two-sample_t-test
- Given two groups (1, 2), this test is only applicable when:
  - the two sample sizes (that is, the number, *n*, of participants of each group) are equal;
  - it can be assumed that the two distributions have the same variance;
- Violations of these assumptions are discussed below.
- The *t* statistic to test whether the means are different can be calculated as follows:
  - [math] t = \frac{\bar {X}_1 - \bar{X}_2}{s_p \sqrt{2/n}} [/math]
- where
  - [math]\ s_p = \sqrt{\frac{s_{X_1}^2+s_{X_2}^2}{2}}[/math]
- Here [math]s_p[/math] is the pooled standard deviation for [math]n=n_1=n_2[/math], and [math]s_{X_1}^2[/math] and [math]s_{X_2}^2[/math] are the unbiased estimators of the variances of the two samples. The denominator of *t* is the standard error of the difference between two means.
- For significance testing, the degrees of freedom for this test is [math]2n-2[/math], where *n* is the number of participants in each group.
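The equal-sample-size formulas quoted above can be checked with a few lines of standard-library Python. This is a sketch under the quoted assumptions (equal sizes, equal variances); the function name `equal_n_t_statistic` and the sample values are made up for illustration.

```python
import math
import statistics


def equal_n_t_statistic(x1, x2):
    """Return (t, degrees of freedom) for two samples of equal size n."""
    if len(x1) != len(x2):
        raise ValueError("this formula assumes equal sample sizes")
    n = len(x1)
    # statistics.variance is the unbiased (n - 1) estimator of the variance.
    s_p = math.sqrt((statistics.variance(x1) + statistics.variance(x2)) / 2)
    # The denominator is the standard error of the difference between the means.
    t = (statistics.mean(x1) - statistics.mean(x2)) / (s_p * math.sqrt(2 / n))
    return t, 2 * n - 2


# Made-up samples, for illustration only.
t, df = equal_n_t_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(t, df)
```

For these samples the pooled standard deviation is 2.5, the degrees of freedom are 8, and *t* is approximately -1.897.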

### 2017b

- (Stattrek, 2017) ⇒ http://stattrek.com/hypothesis-test/difference-in-means.aspx?Tutorial=AP
- This lesson explains how to conduct a hypothesis test for the difference between two means. The test procedure, called the two-sample t-test, is appropriate when the following conditions are met:
- The sampling method for each sample is simple random sampling.
- The samples are independent.
- Each population is at least 20 times larger than its respective sample.
- The sampling distribution is approximately normal, which is generally the case if any of the following conditions apply.
- The population distribution is normal.
- The population data are symmetric, unimodal, without outliers, and the sample size is 15 or less.
- The population data are slightly skewed, unimodal, without outliers, and the sample size is 16 to 40.
- The sample size is greater than 40, without outliers.


### 2017c

- (QCP Glossary, 2017) ⇒ https://www.quality-control-plan.com/StatGuide/ttest_unpaired.htm
- The two-sample unpaired t test is used to test the null hypothesis that the two population means corresponding to the two random samples are equal.

- Assumptions:
- Within each sample, the values are independent, and identically normally distributed (same mean and variance).
- The two samples are independent of each other.
- For the usual two-sample t test, the two different samples are assumed to come from populations with the same variance, allowing for a pooled estimate of the variance. However, if the two sample variances are clearly different, a variant test, the Welch-Satterthwaite t test, is used to test whether the means are different.
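For the unequal-variance case mentioned in the last assumption, the Welch statistic and the Welch-Satterthwaite degrees of freedom can be sketched as follows. The formulas are the standard ones for Welch's test and are not spelled out in the quoted glossary; the function name `welch_t` and the sample values are illustrative assumptions.

```python
import math
import statistics


def welch_t(x, y):
    """Return (t, approximate degrees of freedom) without pooling the variances."""
    n1, n2 = len(x), len(y)
    # Per-sample squared standard errors of the mean.
    v1 = statistics.variance(x) / n1
    v2 = statistics.variance(y) / n2
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Made-up samples with visibly different spreads, for illustration only.
welch_stat, welch_df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10, 12])
print(welch_stat, welch_df)
```

In SciPy, the same variant is available through `stats.ttest_ind(x, y, equal_var=False)`.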


### 2017d

- http://www.evanmiller.org/ab-testing/t-test.html Does the average value differ across two groups?

### 2014

- (McDonald, 2014) ⇒ McDonald, J.H. (2014). Handbook of Biological Statistics (3rd ed.). Sparky House Publishing, Baltimore, Maryland. Retrieved from http://www.biostathandbook.com/twosamplettest.html, which contains the handbook's content from pages 126-130.
- **Summary**
  - Use Student's t–test for two samples when you have one measurement variable and one nominal variable, and the nominal variable has only two values. It tests whether the means of the measurement variable are different in the two groups.
- **Introduction**
  - There are several statistical tests that use the t-distribution and can be called a t–test. One of the most common is Student's t–test for two samples. Other t–tests include the one-sample t–test, which compares a sample mean to a theoretical mean, and the paired t–test.
  - Student's t–test for two samples is mathematically identical to a one-way anova with two categories; because comparing the means of two samples is such a common experimental design, and because the t–test is familiar to many more people than anova, I treat the two-sample t–test separately.
- **When to use it**
  - Use the two-sample t–test when you have one nominal variable and one measurement variable, and you want to compare the mean values of the measurement variable. The nominal variable must have only two values, such as "male" and "female" or "treated" and "untreated."
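The equivalence with a one-way anova noted in the Introduction can be checked numerically: with two groups and pooled variances, the anova F statistic equals the square of the t statistic, and the two P-values coincide. The sketch below uses SciPy; the `treated` and `untreated` values are made up for illustration.

```python
from scipy import stats

# Made-up measurements for two groups, for illustration only.
treated = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
untreated = [10.8, 11.2, 10.1, 11.6, 10.9, 10.4]

# Student's (pooled-variance) two-sample t-test, two-tailed.
t_stat, t_p = stats.ttest_ind(treated, untreated)

# One-way anova with the same two groups as its two categories.
f_stat, f_p = stats.f_oneway(treated, untreated)

print(t_stat ** 2, f_stat)  # these two values should agree
print(t_p, f_p)             # and so should the P-values
```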