Independent Two-Sample t-Test System

From GM-RKB

An Independent Two-Sample t-Test System is a statistical hypothesis testing system that implements an independent two-sample t-test algorithm to solve an independent two-sample t-test task.

#importing python libraries
In[1]: import pandas
In[2]: from scipy import stats
# reading sample data
In[3]: data = pandas.read_csv('http://www.scipy-lectures.org/_downloads/brain_size.csv', sep=';', na_values=".")
# VIQ (test variable) data categorization by Gender (grouping variable, nominal variable with two categorical values)
# sample 1: Female VIQ
In[4]: female_viq = data[data['Gender'] == 'Female']['VIQ']
# sample 2: Male VIQ
In[5]: male_viq = data[data['Gender'] == 'Male']['VIQ']
# (Optional 1) Test the null hypothesis that the population variances are equal using Bartlett's Test
In[6]: stats.bartlett(female_viq,male_viq)
#Output : Bartlett test statistic value and p-value
Out[6]: (0.52142853432619718, 0.47023294528713788)
# (Optional 2) Test the null hypothesis that the population variances are equal using Levene's Test
In[7]: stats.levene(female_viq,male_viq)
#Output : Levene test statistic value and p-value
Out[7]: (0.78528266993527363, 0.38110422921600584)
#Main Task: Testing whether female and male VIQ means are equal using an independent two-sample t-test
In[8]: stats.ttest_ind(female_viq, male_viq)
#Task Output : t-statistic value and p-value
Out[8]: (-0.77261617232, 0.4445287677858)
Conclusion: considering the significance level [math]\displaystyle{ \alpha=0.05 }[/math], Bartlett's Test, Levene's Test and the independent two-sample t-test all fail to reject their null hypotheses, as each p-value is greater than this value.
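The decision rule used in this conclusion can be written out explicitly. The following is a minimal sketch: the p-values are copied from the outputs above, while the `decision` helper and the `ALPHA` constant are illustrative names that are not part of the original session.

```python
ALPHA = 0.05  # significance level used in the conclusion above

# p-values copied from Out[6], Out[7] and Out[8] above
p_values = {
    "Bartlett's Test": 0.47023294528713788,
    "Levene's Test": 0.38110422921600584,
    "Independent two-sample t-test": 0.4445287677858,
}

def decision(p_value, alpha=ALPHA):
    """Hypothetical helper: reject H0 only when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

decisions = {name: decision(p) for name, p in p_values.items()}
for name, outcome in decisions.items():
    print(f"{name}: p = {p_values[name]:.4f} -> {outcome}")
```

Failing to reject a null hypothesis is not evidence that it is true; it only means the data do not contradict it at the chosen significance level.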
  • A system to test the null hypothesis that the mean heights of females and males are equal, using the same dataset as above and the following IPython code lines:
#importing python libraries
In[1]: import pandas
In[2]: from scipy import stats
# reading sample data
In[3]: data = pandas.read_csv('http://www.scipy-lectures.org/_downloads/brain_size.csv', sep=';', na_values=".")
# Fill in the missing values (NaN values) for Height by carrying the previous valid value forward
In[4]: data['Height'] = data['Height'].ffill()
# Height (test variable) data categorization by Gender (grouping variable, nominal variable with two categorical values)
# sample 1: Female Height
In[5]: female_h = data[data['Gender'] == 'Female']['Height']
# sample 2: Male Height
In[6]: male_h = data[data['Gender'] == 'Male']['Height']
# (Optional) Test the null hypothesis that the population variances are equal using Bartlett's Test
In[7]: stats.bartlett(female_h,male_h)
#Output : Bartlett test statistic value and p-value
Out[7]: (2.0876034164547845, 0.14849886684013355)
#Main Task: Testing whether the mean heights of females and males are equal using an independent two-sample t-test
In[8]: stats.ttest_ind(female_h, male_h)
#Task Output : t-statistic value and p-value
Out[8]: (-6.3452292802666515, 1.915212359094238e-07)
Conclusion: considering the significance level [math]\displaystyle{ \alpha=0.05 }[/math], Bartlett's Test fails to reject the null hypothesis of equal variances. However, the independent two-sample t-test rejects its null hypothesis: the p-value is far smaller than the significance level, so we conclude that the mean heights of females and males are not equal.


References

2017a

Calculates the T-test for the means of two independent samples of scores. This is a two-sided test for the null hypothesis that 2 independent samples have identical average (expected) values. This test assumes that the populations have identical variances by default.
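To make the default equal-variance assumption concrete, the sketch below computes the pooled-variance t statistic by hand and compares it with `stats.ttest_ind`. The sample values and the `pooled_t_test` helper are illustrative assumptions; they do not come from the brain_size dataset or from SciPy itself.

```python
import math
from scipy import stats

# Illustrative values only; not taken from the brain_size dataset.
group_a = [101.0, 96.0, 110.0, 104.0, 99.0, 108.0]
group_b = [112.0, 105.0, 118.0, 109.0, 115.0]

def pooled_t_test(a, b):
    """Hypothetical helper: two-sided t-test with a pooled variance estimate."""
    n1, n2 = len(a), len(b)
    mean1, mean2 = sum(a) / n1, sum(b) / n2
    # Unbiased sample variances (divide by n - 1)
    var1 = sum((x - mean1) ** 2 for x in a) / (n1 - 1)
    var2 = sum((x - mean2) ** 2 for x in b) / (n2 - 1)
    df = n1 + n2 - 2
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    t_stat = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Two-sided p-value from the t distribution with df degrees of freedom
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, p_value

manual_t, manual_p = pooled_t_test(group_a, group_b)
scipy_t, scipy_p = stats.ttest_ind(group_a, group_b)  # equal variances assumed by default
print(manual_t, scipy_t)
print(manual_p, scipy_p)
```

The two pairs of values should agree up to floating-point error, which is what "assumes identical variances by default" means in practice.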

2017b

2017c

The logic and computational details of two-sample t-tests are described in Chapters 9-12 of the online text Concepts & Applications of Inferential Statistics. For the independent-samples t-test, this unit will perform both the "usual" t-test, which assumes that the two samples have equal variances, and the alternative t-test, which assumes that the two samples have unequal variances. (A good formulaic summary of the unequal-variances t-test can be found on the StatsDirect web site. A more thorough account appears in the online journal Behavioral Ecology.)
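In SciPy, the "usual" and "alternative" (unequal-variances, or Welch) tests described here can both be run through `stats.ttest_ind`, switching with its `equal_var` parameter. The sketch below uses made-up samples with different sizes and visibly different spread, chosen only for illustration.

```python
from scipy import stats

# Illustrative samples only: unequal sizes and clearly different spread.
narrow = [50.1, 49.8, 50.3, 50.0, 49.9, 50.2, 50.1, 49.7]
wide = [47.0, 55.5, 51.2, 44.8, 58.1]

# "Usual" t-test: pooled variance (equal_var=True is the default)
usual_t, usual_p = stats.ttest_ind(narrow, wide)

# "Alternative" t-test: Welch's version, no equal-variance assumption
welch_t, welch_p = stats.ttest_ind(narrow, wide, equal_var=False)

print("pooled:", usual_t, usual_p)
print("Welch: ", welch_t, welch_p)
```

When the sample sizes and variances both differ, the two statistics and p-values generally disagree, which is why a preliminary variance check such as Bartlett's or Levene's test is often run first.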

2017d

A t test compares the means of two groups. For example, compare whether systolic blood pressure differs between a control and treated group, between men and women, or any other two groups.

Don't confuse t tests with correlation and regression. The t test compares one variable (perhaps blood pressure) between two groups. Use correlation and regression to see how two variables (perhaps blood pressure and heart rate) vary together. Also don't confuse t tests with ANOVA. The t tests (and related nonparametric tests) compare exactly two groups. ANOVA (and related nonparametric tests) compare three or more groups. Finally, don't confuse a t test with analyses of a contingency table (Fishers or chi-square test). Use a t test to compare a continuous variable (e.g., blood pressure, weight or enzyme activity). Use a contingency table to compare a categorical variable (e.g., pass vs. fail, viable vs. not viable).
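As a rough guide to the distinctions drawn above, the sketch below pairs each situation with a SciPy function commonly used for it. All measurements and counts are hypothetical, and the mapping is an illustration rather than part of the quoted source.

```python
from scipy import stats

# Hypothetical systolic blood pressure readings (mmHg) for three groups
control = [118.0, 125.0, 121.0, 130.0, 127.0, 122.0]
treated = [112.0, 119.0, 115.0, 121.0, 117.0, 110.0]
placebo = [120.0, 126.0, 123.0, 128.0, 124.0, 119.0]
# Hypothetical heart rates measured on the same subjects as `control`
heart_rate = [62.0, 70.0, 66.0, 75.0, 72.0, 68.0]

# One continuous variable, exactly two groups: t-test
t_stat, t_p = stats.ttest_ind(control, treated)

# One continuous variable, three or more groups: one-way ANOVA
f_stat, f_p = stats.f_oneway(control, treated, placebo)

# Two continuous variables on the same subjects: correlation
r, r_p = stats.pearsonr(control, heart_rate)

# One categorical outcome across two groups: contingency table
# Rows: control, treated; columns: pass, fail (hypothetical counts)
table = [[12, 8], [5, 15]]
chi2, chi_p, dof, expected = stats.chi2_contingency(table)

print(t_p, f_p, r_p, chi_p)
```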

2017e

2015a

  • (Hamelg, 2015) ⇒ Retrieved on 2017-02-26 from "Python for Data Analysis Part 24: Hypothesis Testing and the T-Test", http://hamelg.blogspot.ca/2015/11/python-for-data-analysis-part-24.html
    • A two-sample t-test investigates whether the means of two independent data samples differ from one another. In a two-sample test, the null hypothesis is that the means of both groups are the same. Unlike the one sample-test where we test against a known population parameter, the two sample test only involves sample means. You can conduct a two-sample t-test by passing with the stats.ttest_ind() function.

2015b

  • (Mangiafico, 2015) ⇒ Mangiafico, S.S. 2015. An R Companion for the Handbook of Biological Statistics, version 1.3.0. Content retrieved from http://rcompanion.org/rcompanion/d_02.html
    • (...) Welch’s t-test is shown above in the “Example” section (“Two sample unpaired t-test”). It is invoked with the var.equal=FALSE option in the t.test function.