One-Sample t-Test System

From GM-RKB

A One-Sample t-Test System is a statistical hypothesis testing system that implements a one-sample t-test algorithm to solve a one-sample t-test task.

#importing python libraries
In[1]: import pandas
In[2]: from scipy import stats
# reading sample data
In[3]: data = pandas.read_csv('http://www.scipy-lectures.org/_downloads/brain_size.csv', sep=';', na_values=".")
# Performing One-sample t-test
# Null hypothesis VIQ population mean value is 10
In[4]: stats.ttest_1samp(data['VIQ'], 10)
#Output : t-statistic value and p-value
Out[4]: 27.4100314376, 4.29314923813e-27
Conclusion: the p-value is far below any conventional significance level, so the null hypothesis is rejected. The data are inconsistent with a VIQ population mean of 10.
#Null hypothesis VIQ population mean value is 115
In[5]: stats.ttest_1samp(data['VIQ'], 115)
#Output : t-statistic value and p-value
Out[5]: -0.709688161306, 0.482119348971
Conclusion: the p-value is greater than the significance levels 0.05, 0.025, and 0.01, so the test fails to reject the null hypothesis. The data are consistent with a VIQ population mean of 115.
#Null hypothesis VIQ population mean value is 200
In[6]: stats.ttest_1samp(data['VIQ'], 200)
#Output : t-statistic value and p-value
Out[6]: (-23.4732706938, 1.2940564282e-24)
Conclusion: the p-value is far below any conventional significance level, so the null hypothesis is rejected. The data are inconsistent with a VIQ population mean of 200.
#importing python libraries
In[1]: import numpy as np
In[2]: import pandas as pd
In[3]: import scipy.stats as stats
#creating random artificial datasets
In[4]: np.random.seed(6)
In[5]: population_ages1 = stats.poisson.rvs(loc=18, mu=35, size=150000)
In[6]: population_ages2 = stats.poisson.rvs(loc=18, mu=10, size=100000)
In[7]: minnesota_ages1 = stats.poisson.rvs(loc=18, mu=30, size=30)
In[8]: minnesota_ages2 = stats.poisson.rvs(loc=18, mu=10, size=20)
#population dataset
In[9]: population_ages = np.concatenate((population_ages1, population_ages2))
#sample dataset
In[10]: minnesota_ages = np.concatenate((minnesota_ages1, minnesota_ages2))
#calling one-sample t-test function
In[11]: stats.ttest_1samp(a=minnesota_ages,popmean=population_ages.mean())
#Output : t-statistic value and p-value
Out[11]: -2.5742714883655027, 0.013118685425061678
For a significance level [math]\displaystyle{ \alpha=0.05 }[/math], the null hypothesis is rejected because the p-value (about 0.013) is below 0.05.
# calculation of the acceptance region lower limit for significance level alpha=0.05 using the percent point function (the inverse of the cumulative distribution function) for the t-distribution (stats.t.ppf). Note that q=alpha/2 and df is the degrees of freedom (sample size 50 minus 1)
In[12]: stats.t.ppf(q=0.025, df=49)
Out[12]: -2.0095752344892093
# calculation of acceptance region upper limit q=1- (alpha/2)
In[13]: stats.t.ppf(q=0.975, df=49)
Out[13]: 2.0095752344892088
The null hypothesis is rejected because the t-statistic (about -2.574) falls outside the acceptance region.
# Alternative method of calculating p-value using cumulative distribution function for the t-distribution (stats.t.cdf). Note that x= t-statistic value.
In[14]: stats.t.cdf(x= -2.5742, df= 49) * 2
Out[14]: 0.013121066545690117
The null hypothesis is rejected because the p-value is less than the significance level.
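
The two decision rules used above (acceptance region and p-value) always agree for a two-sided test. A minimal sketch of that equivalence follows; the helper name `two_sided_decision` and its return format are illustrative assumptions, not part of SciPy, while `stats.t.ppf` and `stats.t.cdf` are the same calls used in the session above.

```python
from scipy import stats

def two_sided_decision(t_stat, df, alpha=0.05):
    """Hypothetical helper: apply both decision rules to a t-statistic."""
    # Acceptance region limits from the t-distribution quantile function.
    lower = stats.t.ppf(alpha / 2, df)
    upper = stats.t.ppf(1 - alpha / 2, df)
    # Two-sided p-value: twice the lower-tail probability of -|t|.
    p_value = 2 * stats.t.cdf(-abs(t_stat), df)
    return {
        "lower": float(lower),
        "upper": float(upper),
        "p_value": float(p_value),
        "reject_by_region": bool(t_stat < lower or t_stat > upper),
        "reject_by_p_value": bool(p_value < alpha),
    }

# t-statistic and degrees of freedom taken from Out[11] above.
result = two_sided_decision(t_stat=-2.5742714883655027, df=49)
print(result)
```

With the t-statistic from Out[11] and 49 degrees of freedom, both rules reject the null hypothesis at alpha=0.05.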


References

2017a

Calculates the T-test for the mean of ONE group of scores.
This is a two-sided test for the null hypothesis that the expected value (mean) of a sample of independent observations a is equal to the given population mean, popmean.
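
To make the description above concrete, the following sketch computes the t-statistic and two-sided p-value from first principles and compares them with `stats.ttest_1samp`. The function name `one_sample_t` and the score values are assumptions invented for illustration; they do not come from the SciPy documentation.

```python
import math
import numpy as np
from scipy import stats

def one_sample_t(sample, popmean):
    """Hypothetical helper: return (t_statistic, two_sided_p_value)."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    # Standard error of the mean, using the sample standard deviation (ddof=1).
    std_err = x.std(ddof=1) / math.sqrt(n)
    t_stat = (x.mean() - popmean) / std_err
    # Two-sided p-value: probability in both tails beyond |t|, with n - 1 degrees of freedom.
    p_value = 2.0 * stats.t.sf(abs(t_stat), df=n - 1)
    return t_stat, p_value

# Made-up scores, used only to exercise the function.
scores = [112, 98, 105, 121, 93, 110, 102, 117]
t_manual, p_manual = one_sample_t(scores, popmean=100)
t_scipy, p_scipy = stats.ttest_1samp(scores, 100)
print(t_manual, p_manual)
print(t_scipy, p_scipy)
```

The two printed pairs should agree up to floating-point rounding.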

2017b

>>> stats.ttest_1samp(data['VIQ'], 0)
(...30.088099970..., 1.32891964...e-28)
With a p-value of [math]\displaystyle{ 10^{-28} }[/math] we can claim that the population mean for the IQ (VIQ measure) is not 0.

2015

  • (Hamelg, 2015) ⇒ Retrieved on 2017-02-26 from "Python for Data Analysis Part 24: Hypothesis Testing and the T-Test", http://hamelg.blogspot.ca/2015/11/python-for-data-analysis-part-24.html
    • A one-sample t-test checks whether a sample mean differs from the population mean. Let's create some dummy age data for the population of voters in the entire country and a sample of voters in Minnesota, and test whether the average age of voters in Minnesota differs from that of the population.
  import numpy as np
  import pandas as pd
  import scipy.stats as stats
  import matplotlib.pyplot as plt
  import math
  np.random.seed(6)
  population_ages1 = stats.poisson.rvs(loc=18, mu=35, size=150000)
  population_ages2 = stats.poisson.rvs(loc=18, mu=10, size=100000)
  population_ages = np.concatenate((population_ages1, population_ages2))
  minnesota_ages1 = stats.poisson.rvs(loc=18, mu=30, size=30)
  minnesota_ages2 = stats.poisson.rvs(loc=18, mu=10, size=20)
  minnesota_ages = np.concatenate((minnesota_ages1, minnesota_ages2))
  print( population_ages.mean() )
  print( minnesota_ages.mean() )
Notice that we used a slightly different combination of distributions to generate the sample data for Minnesota, so we know that the two means are different. Let's conduct a t-test at a 95% confidence level and see if it correctly rejects the null hypothesis that the sample comes from the same distribution as the population. To conduct a one-sample t-test, we can use the stats.ttest_1samp() function:
 stats.ttest_1samp(a= minnesota_ages, popmean= population_ages.mean())
 # (Sample data, Pop mean)
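
A test "at a 95% confidence level" can also be read through a confidence interval: the two-sided test rejects at alpha=0.05 exactly when the hypothesized mean lies outside the 95% confidence interval for the sample mean. The sketch below illustrates this under stated assumptions; the helper `mean_confidence_interval`, the age values, and the population mean of 43.0 are all hypothetical and do not come from the quoted post.

```python
import numpy as np
from scipy import stats

def mean_confidence_interval(sample, confidence=0.95):
    """Hypothetical helper: t-based confidence interval for the mean."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    # Standard error of the mean with the sample standard deviation (ddof=1).
    sem = x.std(ddof=1) / np.sqrt(n)
    # First positional argument of stats.t.interval is the confidence level.
    low, high = stats.t.interval(confidence, n - 1, loc=x.mean(), scale=sem)
    return float(low), float(high)

# Made-up ages and population mean, for illustration only.
sample_ages = [22, 31, 45, 38, 27, 52, 29, 41, 36, 33]
hypothetical_popmean = 43.0

low, high = mean_confidence_interval(sample_ages)
_, p_value = stats.ttest_1samp(sample_ages, hypothetical_popmean)
outside_interval = not (low <= hypothetical_popmean <= high)
print((low, high), p_value, outside_interval)
```

Whether the hypothesized mean falls outside the interval should match whether the p-value is below 0.05.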