Test Statistic
A Test Statistic is a sample-derived numerical statistic function that can quantify statistical evidence against null hypotheses in statistical hypothesis testing.
- AKA: Statistical Test Statistic, Hypothesis Test Statistic, Test Statistics (plural).
- Context:
- It can typically compute Statistical Test Values from sample data to evaluate statistical hypotheses.
- It can typically standardize Sample Observations into distribution-specific values for hypothesis testing procedures.
- It can typically quantify Systematic Variance relative to random variance in experimental designs.
- It can typically provide Numerical Summaries that reduce complex datasets to single values for statistical decision making.
- It can typically enable Statistical Significance Assessment through comparison with critical values or p-value calculations.
- ...
- It can often follow Known Probability Distributions under null hypothesis assumptions.
- It can often incorporate Sample Size Information through standardization formulas.
- It can often distinguish between Null Hypothesis Behavior and alternative hypothesis behavior.
- It can often support One-Tailed Tests or two-tailed tests depending on research questions.
- ...
- It can range from being a Simple Test Statistic to being a Complex Test Statistic, depending on its test statistic computational complexity.
- It can range from being an Exact Test Statistic to being an Approximate Test Statistic, depending on its test statistic distributional assumptions.
- It can range from being a Univariate Test Statistic to being a Multivariate Test Statistic, depending on its test statistic variable count.
- ...
- It can be defined as the ratio of systematic variance to random variance, or the ratio of experimental effect to variability.
- It can support Parametric Statistical Tests through the relationship between a point estimate and a hypothesized population parameter, normalized by a standard deviation (a sketch follows this Context list):
- Generally defined as [math]\displaystyle{ t= f(\hat{\theta}(X),\sigma(X,\theta),\theta_0)=\frac{\hat{\theta}(X)-\theta_0}{\sigma (X,\theta)} }[/math]
- Where [math]\displaystyle{ \hat{\theta}(X) }[/math] is a point estimate derived from sample data of random variable [math]\displaystyle{ X }[/math]
- Where [math]\displaystyle{ \theta_0 }[/math] is a population parameter value stated under null hypothesis (i.e. [math]\displaystyle{ H_0:\; \theta=\theta_0 }[/math])
- Where [math]\displaystyle{ \sigma(X,\theta) }[/math] is a standard deviation that depends on both the sampling distribution and the population distribution
- It can support Non-Parametric Statistical Tests without depending on population parameters or a specific sampling distribution (also sketched after this Context list):
- Generally defined as a sum of observed differences or ranks: [math]\displaystyle{ t= \sum f(R_i) }[/math]
- It can integrate with Statistical Software Packages for automated hypothesis testing.
- It can determine Test Rejection Regions through critical value comparisons.
- It can facilitate Power Analysis for sample size determination.
- ...
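The parametric and rank-based forms above can be illustrated with a minimal Python sketch (assuming NumPy and SciPy are available; the sample values and the hypothesized mean are illustrative, not taken from any source): it instantiates the parametric form [math]\displaystyle{ (\hat{\theta}(X)-\theta_0)/\sigma(X,\theta) }[/math] as a one-sample t-statistic and the rank-based sum [math]\displaystyle{ \sum f(R_i) }[/math] as the Wilcoxon signed-rank statistic.

```python
import numpy as np
from scipy import stats

# Illustrative sample and hypothesized mean (hypothetical values).
x = np.array([5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7])
mu_0 = 5.0

# Parametric form: t = (theta_hat - theta_0) / sigma(X, theta),
# instantiated with theta_hat = sample mean and sigma = s / sqrt(n).
n = len(x)
t_stat = (x.mean() - mu_0) / (x.std(ddof=1) / np.sqrt(n))
print("one-sample t :", t_stat)
print("scipy check  :", stats.ttest_1samp(x, mu_0).statistic)

# Rank-based form: t = sum of f(R_i), here the Wilcoxon signed-rank
# statistic W+ = sum of the ranks of |x_i - mu_0| carrying a positive sign.
d = x - mu_0
d = d[d != 0]                        # zero differences are discarded
ranks = stats.rankdata(np.abs(d))    # rank the absolute differences
w_plus = ranks[d > 0].sum()
print("signed-rank W+:", w_plus)
print("scipy check   :",
      stats.wilcoxon(x - mu_0, alternative="greater").statistic)
```

The hand-computed values should agree with the corresponding scipy.stats results, since both apply the same standard formulas to the same data.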
- Example(s):
- Mean-Based Test Statistics, such as:
- One-Sample t-Statistic: [math]\displaystyle{ t=\frac{\overline{x}-\mu_0}{s/\sqrt{n}} }[/math], obtained from sample mean value ([math]\displaystyle{ \overline{x} }[/math]), population mean value stated by null hypothesis ([math]\displaystyle{ \mu_0 }[/math]), sample standard deviation ([math]\displaystyle{ s }[/math]) and sample size ([math]\displaystyle{ n }[/math]).
- Matched-Pair t-Statistic: [math]\displaystyle{ t = \frac{\bar{d} - D}{s_d/\sqrt{n}} }[/math], obtained from mean difference between matched pairs in sample ([math]\displaystyle{ \bar{d} }[/math]), hypothesized difference between population means (D) and standard deviation of differences ([math]\displaystyle{ s_d }[/math]).
- Independent Two-Sample t-Statistic: [math]\displaystyle{ t = \frac{(\overline{x}_1 - \overline{x}_2) - d_0}{s_p \sqrt{1/n_1+1/n_2}} }[/math], obtained from sample means ([math]\displaystyle{ \overline{x_1}, \; \overline{x_2} }[/math]) with sample sizes [math]\displaystyle{ n_1 }[/math] and [math]\displaystyle{ n_2 }[/math], hypothesized difference ([math]\displaystyle{ d_0 }[/math]), and pooled standard deviation ([math]\displaystyle{ s_p }[/math]).
- Welch's t-Statistic for unequal variance comparisons.
- Standardized Test Statistics, such as:
- Z-Statistic for large sample tests with known population variance.
- Chi-Square Statistic: [math]\displaystyle{ \chi^2=\sum^n_{i=1}\frac{(O_i-E_i)^2}{E_i} }[/math], obtained from observed frequency counts ([math]\displaystyle{ O_i }[/math]) and expected frequency counts ([math]\displaystyle{ E_i }[/math]); this formula and the two-sample t-statistic above are checked numerically in the sketch after this Example(s) list.
- F-Statistic for variance ratio tests in ANOVA procedures.
- Rank-Based Test Statistics, such as:
- Wilcoxon Signed-Rank Test Statistic: [math]\displaystyle{ W =\sum^n_{i=1} R^{(+)}_i }[/math], obtained as sum of positive ranks ([math]\displaystyle{ R^{(+)}_i }[/math]).
- Mann-Whitney U Statistic: [math]\displaystyle{ U = n_1 n_2 + \frac{n_2(n_2+1)}{2} - \sum^{n_1+n_2}_{i=n_1+1}R_i }[/math], obtained from sample sizes ([math]\displaystyle{ n_1, n_2 }[/math]) and the ranks of the second sample's observations in the combined ranking ([math]\displaystyle{ R_i }[/math]).
- Kruskal-Wallis Test Statistic for multiple group comparisons.
- Correlation Test Statistics, such as a Pearson Correlation t-Statistic for testing whether a linear correlation differs from zero.
- Goodness-of-Fit Test Statistics, such as a Kolmogorov-Smirnov Statistic for comparing an empirical distribution with a reference distribution.
- ...
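As a numerical check of two of the example formulas above, the following sketch (illustrative data; SciPy assumed installed) computes the independent two-sample t-statistic with a pooled standard deviation, taking [math]\displaystyle{ d_0 = 0 }[/math], and the chi-square statistic, and compares each with the corresponding scipy.stats routine.

```python
import numpy as np
from scipy import stats

# Independent two-sample t-statistic with pooled standard deviation (d_0 = 0).
x1 = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3])
x2 = np.array([11.2, 11.9, 11.4, 11.7, 11.1])
n1, n2 = len(x1), len(x2)
s_p = np.sqrt(((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1))
              / (n1 + n2 - 2))                     # pooled standard deviation
t = (x1.mean() - x2.mean()) / (s_p * np.sqrt(1 / n1 + 1 / n2))
print("pooled two-sample t:", t)
print("scipy check        :", stats.ttest_ind(x1, x2, equal_var=True).statistic)

# Chi-square goodness-of-fit statistic: sum of (O_i - E_i)^2 / E_i.
observed = np.array([18, 22, 27, 33])
expected = np.array([25, 25, 25, 25])
chi2 = ((observed - expected) ** 2 / expected).sum()
print("chi-square :", chi2)
print("scipy check:", stats.chisquare(observed, expected).statistic)
```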
- Counter-Example(s):
- P-Value, which represents probability rather than a test statistic value.
- Effect Size, which measures practical significance rather than statistical significance.
- Confidence Interval, which provides parameter estimate ranges rather than hypothesis test values.
- Descriptive Statistic, which summarizes data characteristics without hypothesis testing capability.
- Sufficient Statistic, which captures all sample information about parameters without necessarily being used for hypothesis testing.
- See: Statistical Hypothesis Testing, Null Hypothesis, Alternative Hypothesis, P-Value, Statistical Population, Sampling Distribution, Critical Value, Type I Error, Type II Error, Statistical Power, Significance Level.
References
2016
- (Wikipedia, 2016) ⇒ http://en.wikipedia.org/wiki/test_statistic Retrieved 2016-09-11
- QUOTE: A test statistic is a statistic (a quantity derived from the sample) used in statistical hypothesis testing. A hypothesis test is typically specified in terms of a test statistic, considered as a numerical summary of a data-set that reduces the data to one value that can be used to perform the hypothesis test. In general, a test statistic is selected or defined in such a way as to quantify, within observed data, behaviours that would distinguish the null from the alternative hypothesis, where such an alternative is prescribed, or that would characterize the null hypothesis if there is no explicitly stated alternative hypothesis (...) For example, suppose the task is to test whether a coin is fair (i.e. has equal probabilities of producing a head or a tail). If the coin is flipped 100 times and the results are recorded, the raw data can be represented as a sequence of 100 heads and tails. If there is interest in the marginal probability of obtaining a head, only the number T out of the 100 flips that produced a head needs to be recorded. But T can also be used as a test statistic in one of two ways:
- the exact sampling distribution of T under the null hypothesis is the binomial distribution with parameters 0.5 and 100.
- the value of T can be compared with its expected value under the null hypothesis of 50, and since the sample size is large a normal distribution can be used as an approximation to the sampling distribution either for T or for the revised test statistic T−50.
- Using one of these sampling distributions, it is possible to compute either a one-tailed or two-tailed p-value for the null hypothesis that the coin is fair. Note that the test statistic in this case reduces a set of 100 numbers to a single numerical summary that can be used for testing.
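A short sketch of the quoted coin example, assuming SciPy ≥ 1.7 (for scipy.stats.binomtest) and an illustrative observed count of 60 heads: it computes both the exact binomial p-value and the normal-approximation p-value for the test statistic T.

```python
from scipy import stats

n, p0 = 100, 0.5        # 100 flips; fair-coin null hypothesis
T = 60                  # illustrative observed number of heads

# Exact approach: T ~ Binomial(100, 0.5) under H0; two-tailed p-value.
p_exact = stats.binomtest(T, n, p0, alternative="two-sided").pvalue

# Normal approximation: standardize T - 50 by sqrt(n * p0 * (1 - p0)).
z = (T - n * p0) / (n * p0 * (1 - p0)) ** 0.5
p_approx = 2 * stats.norm.sf(abs(z))

print(f"exact binomial p = {p_exact:.4f}, normal-approx p = {p_approx:.4f}")
```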
2016
- (Stat Trek, 2016) ⇒ http://stattrek.com/statistics/dictionary.aspx?definition=Statistic Retrieved: 10-02-2016
- QUOTE: In hypothesis testing, the test statistic is a value computed from sample data. The test statistic is used to assess the strength of evidence in support of a null hypothesis.
- Suppose the test statistic in a hypothesis test is equal to S. If the probability of observing a test statistic as extreme as S is less than the significance level, we reject the null hypothesis.
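A minimal sketch of this decision rule, using hypothetical values for the p-value attached to the observed test statistic S and for the pre-selected significance level:

```python
# Hypothetical values: p-value of the observed test statistic S and
# a pre-selected significance level alpha.
p_value = 0.032
alpha = 0.05

# Reject H0 when the probability of a statistic as extreme as S
# falls below the significance level; otherwise fail to reject.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(decision)
```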
2016
- (Statistical Analysis Glossary, 2016) ⇒ http://www.quality-control-plan.com/StatGuide/sg_glos.htm#P_value Retrieved: 10-02-2016
- QUOTE: In a statistical hypothesis test, the P value is the probability of observing a test statistic at least as extreme as the value actually observed, assuming that the null hypothesis is true. This probability is then compared to the pre-selected significance level of the test. If the P value is smaller than the significance level, the null hypothesis is rejected, and the test result is termed significant. The P value depends on both the null hypothesis and the alternative hypothesis. In particular, a test with a one-sided alternative hypothesis will generally have a lower P value (and thus be more likely to be significant) than a test with a two-sided alternative hypothesis. However, one-sided tests require more stringent assumptions than two-sided tests. They should only be used when those assumptions apply.
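A brief sketch of the one-sided versus two-sided point in this quote, assuming SciPy and an illustrative z test statistic: for the same observed z, the one-sided p-value in the direction of the observed effect is half of the two-sided p-value.

```python
from scipy import stats

z = 1.75                                 # illustrative z test statistic
p_one_sided = stats.norm.sf(z)           # P(Z >= z) under H0
p_two_sided = 2 * stats.norm.sf(abs(z))  # P(|Z| >= |z|) under H0

print(f"one-sided p = {p_one_sided:.4f}")   # ~ 0.0401
print(f"two-sided p = {p_two_sided:.4f}")   # ~ 0.0801
```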
2016
- (Vsevolozhskaya et al., 2016) ⇒ Olga A. Vsevolozhskaya, Chia-Ling Kuo, Gabriel Ruiz, Luda Diatchenko, and Dmitri V. Zaykin. (2016). “The More You Test, the More You Find: Smallest P-values Become Increasingly Enriched with Real Findings As More Tests Are Conducted." arXiv preprint arXiv:1609.01788
- QUOTE: We consider P-values derived from commonly used test statistics, such as chi-squared, F, normal z, and Student's t statistics. ...
1978
- (Rosenthal, 1978) ⇒ Robert Rosenthal. (1978). “Combining Results of Independent Studies." Psychological Bulletin 85, no. 1 (1978): 185.
- QUOTE: ... Not simply in connection with combining ps but at any time that test statistics such as t, F, or Z are reported, estimated effect sizes should routinely be reported. The particular effect size d seems to be the most useful one to employ when two groups are being compared. ...