Welch's t-Test System

From GM-RKB

A Welch's t-Test System is a statistical hypothesis testing system that implements a Welch's t-test algorithm to solve a Welch's t-test task.

#importing python libraries
In[1]: import pandas
In[2]: from scipy import stats
In[3]: import numpy
#Reading online dataset
In[4]: data = pandas.read_csv('http://libguides.library.kent.edu/ld.php?content_id=11205378', sep=',', na_values=".")
#Defining mile run time for the two groups: athletes ('Athlete' == 1) and non-athletes ('Athlete' == 0)
In[5]: athelete = data[data['Athlete'] == 1]['MileMinDur']
In[6]: nonathelete = data[data['Athlete'] == 0]['MileMinDur']
# Converting run times from hh:mm:ss strings to numbers: running time in minutes
# (Series.reshape is not available in current pandas, so convert to a NumPy object array first)
In[7]: athelete=athelete.astype(str).to_numpy(dtype=object).reshape(athelete.size,1)
In[8]: nonathelete=nonathelete.astype(str).to_numpy(dtype=object).reshape(nonathelete.size,1)
# Dropping blank entries (this indexing also flattens the arrays to one dimension)
In[9]: athelete=athelete[numpy.where(athelete!=[' '])]
In[10]: nonathelete=nonathelete[numpy.where(nonathelete!=[' '])]
In[11]: for i in range(numpy.shape(athelete)[0]) :
   ...:     h,m,s=athelete[i].split(':')
   ...:     athelete[i]=int(h)*60+int(m)+(int(s)/60.)
   ...: athelete=athelete.astype(float)
In[12]: for j in range(numpy.shape(nonathelete)[0]) :
   ...:     h,m,s=nonathelete[j].split(':')
   ...:     nonathelete[j]=int(h)*60+int(m)+(int(s)/60.)
   ...: nonathelete=nonathelete.astype(float)
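The hh:mm:ss conversion used in this session can also be written as a small standalone helper. The following is only a minimal sketch: the function name `hms_to_minutes` and the sample strings are illustrative assumptions and do not come from the dataset.

```python
def hms_to_minutes(text):
    """Convert an 'h:mm:ss' string to a running time in minutes."""
    h, m, s = text.strip().split(":")
    return int(h) * 60 + int(m) + int(s) / 60.0

# Hypothetical sample values; blank entries are skipped before conversion.
raw = ["0:06:30", " ", "0:07:15", "1:00:00"]
minutes = [hms_to_minutes(t) for t in raw if t.strip()]
print(minutes)
```

Keeping the parsing in one function makes the blank-entry filter and the unit (minutes) explicit in a single place.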
#Defining significance level
In[13]: alpha=0.05
#Performing Levene's Test. This tests whether the population variances are equal
In[14]: stats.levene(athelete,nonathelete)
#Output : Levene test statistic and p-value
Out[14]: (102.563129443,1.4800514645e-21)
Conclusion: The p-value, [math]\displaystyle{ p=1.480\times10^{-21} }[/math], is far below the significance level alpha=0.05, so Levene's test rejects the null hypothesis of equal variances. The population variances are not equal.
#Performing Welch's Test.
In[15]: stats.ttest_ind(athelete,nonathelete, equal_var = False)
#Output : t-statistic value and p-value
Out[15]: (-15.0486789157, 5.82457889026e-39)
Conclusion: The p-value, [math]\displaystyle{ p=5.82\times10^{-39} }[/math], is far below the significance level alpha=0.05, so the null hypothesis of equal means is rejected. The mean running times of the two groups differ significantly. Indeed, the difference between the sample means for 'Athletes' and 'Non-athletes' is 2 minutes and 14 seconds.
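To make the Welch step less of a black box, the statistic and its approximate degrees of freedom can be computed by hand and compared with the result of stats.ttest_ind. This is a sketch under stated assumptions: the two samples below are synthetic draws from normal distributions (they are not the mile-run data), and the variable names are chosen only for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic samples (assumption): unequal sizes and unequal spreads.
rng = np.random.default_rng(0)
a = rng.normal(loc=7.0, scale=1.0, size=40)
b = rng.normal(loc=9.0, scale=2.5, size=60)

# Per-group squared standard errors of the mean.
va = a.var(ddof=1) / a.size
vb = b.var(ddof=1) / b.size

# Welch's t statistic and the Welch-Satterthwaite degrees of freedom.
t_manual = (a.mean() - b.mean()) / np.sqrt(va + vb)
df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))

# Two-sided p-value from the t distribution with df degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df)

result = stats.ttest_ind(a, b, equal_var=False)
print(t_manual, result.statistic)
print(p_manual, result.pvalue)
```

The hand-computed values should agree with the library result up to floating-point error, which is a quick way to check that equal_var=False is indeed selecting the unequal-variances test.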


References

2017a

Calculates the T-test for the means of two independent samples of scores.
This is a two-sided test for the null hypothesis that 2 independent samples have identical average (expected) values. This test assumes that the populations have identical variances by default.
Parameters: a, b : array_like
The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default).
axis : int or None, optional
Axis along which to compute test. If None, compute over the whole arrays, a, and b.
equal_var : bool, optional
If True (default), perform a standard independent 2 sample test that assumes equal population variances. If False, perform Welch’s t-test, which does not assume equal population variance.
New in version 0.11.0.
nan_policy : {‘propagate’, ‘raise’, ‘omit’}, optional
Defines how to handle when input contains nan. ‘propagate’ returns nan, ‘raise’ throws an error, ‘omit’ performs the calculations ignoring nan values. Default is ‘propagate’.
Returns: statistic : float or array
The calculated t-statistic.
pvalue : float or array
The two-tailed p-value.
Notes
We can use this test, if we observe two independent samples from the same or different population, e.g. exam scores of boys and girls or of two ethnic groups. The test measures whether the average (expected) value differs significantly across samples. If we observe a large p-value, for example larger than 0.05 or 0.1, then we cannot reject the null hypothesis of identical average scores. If the p-value is smaller than the threshold, e.g. 1%, 5% or 10%, then we reject the null hypothesis of equal averages.
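The parameters described in this excerpt can be exercised on a small example. The sketch below uses made-up numbers (an assumption, not data from any cited source) to show how equal_var and nan_policy change the result, and applies the p-value threshold rule from the notes with a significance level of 0.05.

```python
import numpy as np
from scipy import stats

# Made-up illustration data; `a` contains one missing value.
a = np.array([5.1, 4.9, 5.6, 5.8, np.nan, 5.0])
b = np.array([6.9, 7.4, 5.2, 8.8, 6.1, 9.5, 7.0])
alpha = 0.05

# Default nan_policy='propagate': the NaN in `a` makes the result NaN.
propagated = stats.ttest_ind(a, b, equal_var=False)

# nan_policy='omit' ignores the NaN when computing the test.
welch = stats.ttest_ind(a, b, equal_var=False, nan_policy="omit")
student = stats.ttest_ind(a, b, equal_var=True, nan_policy="omit")

for name, res in [("Welch", welch), ("Student", student)]:
    decision = "reject" if res.pvalue < alpha else "fail to reject"
    print(f"{name}: t={float(res.statistic):.3f}, "
          f"p={float(res.pvalue):.4f} -> {decision} H0")
```

Because the two groups here differ in both size and spread, the pooled-variance and Welch statistics are not the same, which is the situation where the choice of equal_var matters.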

2017b

The logic and computational details of two-sample t-tests are described in Chapters 9-12 of the online text Concepts & Applications of Inferential Statistics. For the independent-samples t-test, this unit will perform both the "usual" t-test, which assumes that the two samples have equal variances, and the alternative t-test, which assumes that the two samples have unequal variances. (A good formulaic summary of the unequal-variances t-test can be found on the StatsDirect web site. A more thorough account appears in the online journal Behavioral Ecology.)
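For orientation, the unequal-variances t-test mentioned in this excerpt is usually written as follows. The notation is an assumption introduced here (sample means \bar{x}_i, sample variances s_i^2, sample sizes n_i) and is not taken from the cited pages.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
```

The statistic t is referred to a t distribution with \nu degrees of freedom, where \nu is generally not an integer.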

2017c

A t test compares the means of two groups. For example, compare whether systolic blood pressure differs between a control and treated group, between men and women, or any other two groups.

Don't confuse t tests with correlation and regression. The t test compares one variable (perhaps blood pressure) between two groups. Use correlation and regression to see how two variables (perhaps blood pressure and heart rate) vary together. Also don't confuse t tests with ANOVA. The t tests (and related nonparametric tests) compare exactly two groups. ANOVA (and related nonparametric tests) compare three or more groups. Finally, don't confuse a t test with analyses of a contingency table (Fisher's or chi-square test). Use a t test to compare a continuous variable (e.g., blood pressure, weight or enzyme activity). Use a contingency table to compare a categorical variable (e.g., pass vs. fail, viable vs. not viable).

2017d

2015

  • (Mangiafico, 2015) ⇒ Mangiafico, S.S. 2015. An R Companion for the Handbook of Biological Statistics, version 1.3.0. Content retrieved from http://rcompanion.org/rcompanion/d_02.html
    • (...) Welch’s t-test is shown above in the “Example” section (“Two sample unpaired t-test”). It is invoked with the var.equal=FALSE option in the t.test function.