
A p-value is a probability measure used in frequentist statistics to quantify the strength of the evidence against a null hypothesis when testing it against an alternative hypothesis.




  • (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/p-value Retrieved:2015-12-2.
    • In statistics, the p-value is a function of the observed sample results (a statistic) that is used for testing a statistical hypothesis. More specifically, the p-value is defined as the probability of obtaining a result equal to or "more extreme" than what was actually observed, assuming that the hypothesis under consideration is true. Here, "more extreme" depends on the way the hypothesis is tested. Before the test is performed, a threshold value is chosen, called the significance level of the test, traditionally 5% or 1% and denoted as α. If the p-value is equal to or smaller than the significance level (α), it suggests that the observed data are inconsistent with the assumption that the null hypothesis is true, and thus the null hypothesis is rejected (but this does not automatically mean the alternative hypothesis can be accepted as true). When the p-value is calculated correctly, such a test is guaranteed to control the Type I error rate to be no greater than α.

      Since the p-value is used in frequentist inference (and not Bayesian inference), it does not in itself support reasoning about the probabilities of hypotheses but serves only as a tool for deciding whether to reject the null hypothesis. Statistical hypothesis tests making use of p-values are commonly used in many fields of science and social science, such as economics, psychology, biology, criminal justice and criminology, and sociology. Misuse of this tool continues to be the subject of criticism.
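      The definition above ("the probability of a result at least as extreme as the one observed, assuming the null hypothesis is true") can be illustrated with a small Monte Carlo sketch. The fair-coin scenario, the function name, and all parameter values below are hypothetical illustrations, not taken from the quoted source:

      ```python
      import random

      def simulated_p_value(observed_heads, n_flips=100, n_sims=20_000, seed=0):
          """Monte Carlo two-sided p-value for H0: the coin is fair.

          "More extreme" is taken to mean a head count at least as far
          from the null expectation (n_flips / 2) as the observed count.
          """
          rng = random.Random(seed)
          observed_dev = abs(observed_heads - n_flips / 2)
          extreme = 0
          for _ in range(n_sims):
              # Simulate one experiment under the null hypothesis.
              heads = sum(rng.random() < 0.5 for _ in range(n_flips))
              if abs(heads - n_flips / 2) >= observed_dev:
                  extreme += 1
          return extreme / n_sims

      alpha = 0.05  # significance level, chosen before the test is performed
      p = simulated_p_value(observed_heads=60)
      print(f"p ≈ {p:.3f}; reject H0 at alpha={alpha}: {p <= alpha}")
      ```

      For 60 heads in 100 flips the exact two-sided p-value is about 0.057, so under this test the null hypothesis is (narrowly) not rejected at α = 0.05, even though it would be rejected at α = 0.1.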


  • http://en.wikipedia.org/wiki/P-value
    • … The lower the p-value, the less likely the result, assuming the Null Hypothesis, so the more "significant" the result, in the sense of Statistical Significance – one often uses p-values of 0.05 or 0.01, corresponding to a 5% or a 1% chance of an outcome at least that extreme, given the null hypothesis. Note, however, that the idea of more or less significance is used here only for illustrative purposes. The result of a test of significance is either "statistically significant" or "not statistically significant"; there are no shades of gray.

      More technically, a p-value of an experiment is a random variable defined over the Sample Space of the experiment such that its distribution under the null hypothesis is uniform on the interval [0,1]. Many p-values can be defined for the same experiment.
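      The claim that a p-value is uniform on [0,1] under the null hypothesis can be checked by simulation. The sketch below is illustrative and not from the quoted source; note that for a discrete statistic such as a binomial head count, the p-value is only sub-uniform (P(p ≤ α) ≤ α, with exact equality unattainable at most α), which is why the 5% figure below comes out slightly low:

      ```python
      import math
      import random

      def binom_two_sided_p(k, n=100):
          """Exact two-sided p-value for k heads in n fair-coin flips."""
          dev = abs(k - n / 2)
          total = sum(math.comb(n, j) for j in range(n + 1)
                      if abs(j - n / 2) >= dev)
          return total / 2 ** n

      # Repeat the experiment many times with H0 true (a fair coin) and
      # look at the distribution of the resulting p-values.
      rng = random.Random(1)
      pvals = [binom_two_sided_p(sum(rng.random() < 0.5 for _ in range(100)))
               for _ in range(2000)]

      # Approximately uniform on [0, 1]: about half the p-values fall
      # below 0.5, and no more than ~5% fall below 0.05 (this is the
      # Type I error control mentioned above).
      below_half = sum(p <= 0.5 for p in pvals) / len(pvals)
      below_alpha = sum(p <= 0.05 for p in pvals) / len(pvals)
      print(f"P(p <= 0.5) ≈ {below_half:.3f}, P(p <= 0.05) ≈ {below_alpha:.3f}")
      ```

      With a continuous test statistic the two proportions would converge to exactly 0.5 and 0.05; the discreteness of the binomial count makes the second one somewhat smaller.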




  • (Goodman, 1999) ⇒ Steven N. Goodman. (1999). “Toward Evidence-based Medical Statistics. 1: The P Value Fallacy.” In: Annals of Internal Medicine, 130(12).
    • ABSTRACT: An important problem exists in the interpretation of modern medical research data: Biological understanding and previous research play little formal role in the interpretation of quantitative results. This phenomenon is manifest in the discussion sections of research articles and ultimately can affect the reliability of conclusions. The standard statistical approach has created this situation by promoting the illusion that conclusions can be produced with certain "error rates," without consideration of information from outside the experiment. This statistical approach, the key components of which are P values and hypothesis tests, is widely perceived as a mathematically coherent approach to inference. There is little appreciation in the medical community that the methodology is an amalgam of incompatible elements, whose utility for scientific inference has been the subject of intense debate among statisticians for almost 70 years. This article introduces some of the key elements of that debate and traces the appeal and adverse impact of this methodology to the P value fallacy, the mistaken idea that a single number can capture both the long-run outcomes of an experiment and the evidential meaning of a single result. This argument is made as a prelude to the suggestion that another measure of evidence should be used -- the Bayes factor, which properly separates issues of long-run behavior from evidential strength and allows the integration of background knowledge with statistical findings.