Pearson Correlation Test System

From GM-RKB

A Pearson Correlation Test System is a statistical hypothesis testing system that solves a Pearson correlation test task.

Example 1:
# importing Python libraries
import pandas
from scipy.stats import pearsonr
#reading data file
data = pandas.read_csv('brain_size.csv', sep=';', na_values=".")
# female dataset
female_viq = data[data['Gender'] == 'Female']['VIQ']
# male dataset
male_viq = data[data['Gender'] == 'Male']['VIQ']
# calling the pearsonr function (values are paired by position, so both series must have the same length)
pearsonr(female_viq, male_viq)
# output : Pearson’s correlation coefficient, 2-tailed p-value
(0.0082168169434572707, 0.97257333753162245)
The coefficient is close to 0 and the p-value is about 0.97, so the test gives no evidence of a linear relationship between the two datasets.
Example 2:
from scipy.stats import pearsonr
pearsonr([1,2,3,4,5,6],[2,3,4,5,6,7])
# output : Pearson’s correlation coefficient, 2-tailed p-value
(1.0, 0.0)
The coefficient is 1.0 and the p-value is 0.0, so the two datasets have a perfect positive linear relationship.


References

2017

The Pearson correlation coefficient measures the linear relationship between two datasets. Strictly speaking, Pearson’s correlation requires that each dataset be normally distributed. Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.
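The description above can be made concrete with a minimal sketch that computes the coefficient directly from its textbook definition (the sum of products of deviations from the mean, divided by the square root of the product of the summed squared deviations). The helper name `pearson_r` and the sample lists are illustrative assumptions, not part of SciPy; the result is compared against `scipy.stats.pearsonr` only as a sanity check.

```python
import numpy as np
from scipy.stats import pearsonr


def pearson_r(x, y):
    """Hypothetical helper: Pearson's r from the textbook definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))


# Illustrative data (assumed for this sketch)
x = [1, 2, 3, 4, 5, 6]
increasing = [2, 3, 4, 5, 6, 7]  # y rises as x rises
decreasing = [7, 6, 5, 4, 3, 2]  # y falls as x rises

r_up = pearson_r(x, increasing)        # exact positive linear relationship
r_down = pearson_r(x, decreasing)      # exact negative linear relationship
r_scipy = pearsonr(x, increasing)[0]   # first element is the coefficient

print(r_up, r_down, r_scipy)
```

The sign of the result follows the direction of the relationship, and its magnitude reaches 1 only when the points lie exactly on a straight line.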
The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Pearson correlation at least as extreme as the one computed from these datasets. The p-values are not entirely reliable but are probably reasonable for datasets larger than 500 or so.
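One way to see what this p-value means is a permutation test: shuffling one dataset destroys any real association, so the shuffled pairs behave like an uncorrelated system, and the p-value can be estimated as the fraction of shuffles whose correlation is at least as extreme as the observed one. The sketch below assumes this resampling approach purely for illustration; SciPy computes its p-value analytically rather than by shuffling, and `permutation_p_value`, its parameters, and the synthetic data are all hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Hypothetical helper: two-sided permutation p-value for Pearson's r."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = abs(np.corrcoef(x, y)[0, 1])
    hits = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(y)  # break any pairing between x and y
        r = abs(np.corrcoef(x, shuffled)[0, 1])
        # small tolerance guards against floating-point ties
        if r >= observed - 1e-12:
            hits += 1
    # add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_permutations + 1)


# Synthetic data (assumed for this sketch)
rng = np.random.default_rng(42)
x = rng.normal(size=30)
y_linked = 2 * x + rng.normal(scale=0.1, size=30)  # strongly linear in x
y_unrelated = rng.normal(size=30)                  # generated independently of x

p_linked = permutation_p_value(x, y_linked)
p_unrelated = permutation_p_value(x, y_unrelated)
p_scipy_linked = pearsonr(x, y_linked)[1]          # second element is the p-value

print(p_linked, p_unrelated, p_scipy_linked)
```

For the strongly linear data, almost no shuffle matches the observed correlation, so the estimate is bounded below only by the number of permutations; the analytic p-value from `pearsonr` is likewise very small.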