Pooled Standard Deviation

A Pooled Standard Deviation is a standard deviation estimate that combines the sample standard deviations of independent samples drawn from populations with unknown but equal variances.

AKA: Combined Standard Deviation, Composite Standard Deviation, Overall Standard Deviation.
Context:
- It can also be defined as the square root of a pooled variance estimator.
- It is used in the calculation of the independent two-sample t-statistic.
- It is calculated as the square root of a weighted average of the individual sample variances [math]\displaystyle{ s_1^2,s_2^2,\cdots,s_k^2 }[/math], in which each variance is weighted by its degrees of freedom [math]\displaystyle{ n_i - 1 }[/math]:

[math]\displaystyle{ s_p =\sqrt{ \frac{(n_1 − 1)s_1^2 + (n_2 − 1)s_2^2 + \cdots + (n_k−1)s_k^2}{n_1 + n_2 + \cdots + n_k − k}} }[/math]

where [math]\displaystyle{ n_1,n_2,\cdots,n_k }[/math] are the respective sample sizes.
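A minimal Python sketch of the formula above, assuming only the per-sample sizes and standard deviations are available (summary statistics rather than raw data). The function name `pooled_sd` is hypothetical and not taken from any library.

```python
import math
from typing import Sequence


def pooled_sd(sizes: Sequence[int], sds: Sequence[float]) -> float:
    """Pooled standard deviation s_p from sample sizes n_i and SDs s_i.

    Each sample variance s_i^2 is weighted by its degrees of freedom
    (n_i - 1); the total is divided by n_1 + ... + n_k - k.
    """
    if not sizes or len(sizes) != len(sds):
        raise ValueError("need one standard deviation per sample size")
    k = len(sizes)
    dof = sum(sizes) - k  # n_1 + n_2 + ... + n_k - k
    if dof <= 0:
        raise ValueError("total degrees of freedom must be positive")
    weighted = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))
    return math.sqrt(weighted / dof)
```

For example, `pooled_sd([5, 5], [3.0, 4.0])` equals the square root of 12.5 (about 3.54), not 3.5: the weights act on the variances, so s_p is not a weighted average of the standard deviations themselves.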

Counter-Example(s):
See: Standard Deviation, Sample Standard Deviation, Point Estimate, Sample Variance.

References

2017

(Wikipedia, 2017) ⇒ http://en.wikipedia.org/wiki/Pooled_variance
- In statistics, pooled variance (also known as combined, composite, or overall variance) is a method for estimating variance of several different populations when the mean of each population may be different, but one may assume that the variance of each population is the same.

Under the assumption of equal population variances, the pooled sample variance provides a higher precision estimate of variance than the individual sample variances. This higher precision can lead to increased statistical power when used in statistical tests that compare the populations, such as the t-test.

The square-root of a pooled variance estimator is known as a pooled standard deviation (also known as combined, composite, or overall standard deviation).

(...) If the populations are indexed [math]\displaystyle{ i = 1, \ldots, k }[/math], then the pooled variance [math]\displaystyle{ s^2_p }[/math] (or [math]\displaystyle{ s^2_c }[/math] ) can be estimated by the weighted average:

[math]\displaystyle{ s_p^2=\frac{\sum_{i=1}^k (n_i - 1)s_i^2}{\sum_{i=1}^k(n_i - 1)} = \frac{(n_1 - 1)s_1^2+(n_2 - 1)s_2^2+\cdots+(n_k - 1)s_k^2}{n_1+n_2+\cdots+n_k - k} }[/math],

where [math]\displaystyle{ n_i }[/math] is the sample size of population [math]\displaystyle{ i }[/math] and the sample variances are

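[math]\displaystyle{ s^2_i = \frac{1}{n_i-1} \sum_{j=1}^{n_i} \left(y_{ij} - \overline{y}_i \right)^2 }[/math],

where [math]\displaystyle{ y_{ij} }[/math] is the [math]\displaystyle{ j }[/math]-th observation in sample [math]\displaystyle{ i }[/math] and [math]\displaystyle{ \overline{y}_i }[/math] is that sample's mean.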

Use of [math]\displaystyle{ (n_i-1) }[/math] weighting factors instead of [math]\displaystyle{ n_i }[/math] comes from Bessel's correction.
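The weighted-average formula can be checked against raw data with a short Python sketch. It assumes the samples are plain lists of numbers with at least two observations each; the function names are hypothetical. It relies on the standard-library `statistics.variance`, which divides by n_i - 1 (Bessel's correction), so (n_i - 1)s_i^2 recovers each sample's sum of squared deviations.

```python
import statistics
from typing import Sequence


def pooled_variance(samples: Sequence[Sequence[float]]) -> float:
    """s_p^2 as the (n_i - 1)-weighted average of the sample variances s_i^2."""
    # statistics.variance divides by n_i - 1 and needs >= 2 observations.
    numerator = sum((len(y) - 1) * statistics.variance(y) for y in samples)
    denominator = sum(len(y) - 1 for y in samples)
    return numerator / denominator


def pooled_variance_from_deviations(samples: Sequence[Sequence[float]]) -> float:
    """Same quantity: squared deviations from each sample's own mean, over N - k."""
    total_ss = 0.0
    for y in samples:
        m = statistics.mean(y)  # each sample is centred on its own mean
        total_ss += sum((v - m) ** 2 for v in y)
    n_total = sum(len(y) for y in samples)
    return total_ss / (n_total - len(samples))
```

Because each sample is centred on its own mean, both functions stay valid when the population means differ, which is the situation the definition above describes.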

2014

(IUPAC, 2014) ⇒ Retrieved from http://goldbook.iupac.org/html/P/P04758.html published in IUPAC. Compendium of Chemical Terminology, 2nd ed. (the "Gold Book"). Compiled by A. D. McNaught and A. Wilkinson. Blackwell Scientific Publications, Oxford (1997). XML on-line corrected version: http://goldbook.iupac.org (2006-) created by M. Nic, J. Jirat, B. Kosata; updates compiled by A. Jenkins.
- A problem often arises when the combination of several series of measurements performed under similar conditions is desired to achieve an improved estimate of the imprecision of the process. If it can be assumed that all the series are of the same precision although their means may differ, the pooled standard deviation [math]\displaystyle{ s_p }[/math] from [math]\displaystyle{ k }[/math] series of measurements can be calculated as

[math]\displaystyle{ s_p =\sqrt{ \frac{(n_1 − 1)s_1^2 + (n_2 − 1)s_2^2 + \cdots + (n_k−1)s_k^2}{n_1 + n_2 + \cdots + n_k − k}} }[/math]

The suffixes [math]\displaystyle{ 1 , 2 , \cdots , k }[/math] refer to the different series of measurements. In this case it is assumed that there exists a single underlying standard deviation [math]\displaystyle{ \sigma }[/math] of which the pooled standard deviation [math]\displaystyle{ s_p }[/math] is a better estimate than the individual calculated standard deviations [math]\displaystyle{ s_1, s_2, \cdots, s_k }[/math]. For the special case where [math]\displaystyle{ k }[/math] sets of duplicate measurements are available, the above equation reduces to

[math]\displaystyle{ s_p = \sqrt{\frac{\sum_{i=1}^{k} (x_{i1} - x_{i2})^2}{2k}} }[/math]
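The duplicate-measurement shortcut follows from the general formula: with n_i = 2, each pair (a, b) has sample variance (a - b)^2 / 2 and contributes one degree of freedom, so the denominator becomes 2k - k = k. A small Python sketch, with hypothetical function names, checks that the two routes agree.

```python
import math
from typing import Sequence, Tuple


def pooled_sd_duplicates(pairs: Sequence[Tuple[float, float]]) -> float:
    """Shortcut for k duplicate pairs: sqrt(sum (x_i1 - x_i2)^2 / (2k))."""
    k = len(pairs)
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / (2 * k))


def pooled_sd_general(pairs: Sequence[Tuple[float, float]]) -> float:
    """General pooled formula applied to the same pairs (n_i = 2 for every i)."""
    k = len(pairs)
    # Variance of two values a, b with the n - 1 denominator is (a - b)^2 / 2.
    weighted = sum((2 - 1) * ((a - b) ** 2 / 2) for a, b in pairs)
    return math.sqrt(weighted / (2 * k - k))  # N - k = 2k - k
```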

Results from various series of measurements can be combined in the following way to give a pooled relative standard deviation [math]\displaystyle{ s_{r,p} }[/math]:

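[math]\displaystyle{ s_{r,p} = \sqrt{\frac{\sum_{i=1}^{k} (n_i - 1)s_{r,i}^2}{\sum_{i=1}^{k} (n_i - 1)}} = \sqrt{\frac{\sum_{i=1}^{k} (n_i - 1)s_i^2\bar{x}_i^{-2}}{\sum_{i=1}^{k} (n_i - 1)}} }[/math]

where [math]\displaystyle{ s_{r,i} = s_i/\bar{x}_i }[/math] is the relative standard deviation of series [math]\displaystyle{ i }[/math].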
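A Python sketch of the pooled relative standard deviation, assuming each series is summarized by its size, standard deviation, and mean, and that every mean is nonzero (the relative standard deviation is undefined otherwise). The function name is hypothetical.

```python
import math
from typing import Sequence


def pooled_relative_sd(
    sizes: Sequence[int], sds: Sequence[float], means: Sequence[float]
) -> float:
    """Pooled relative SD: (n_i - 1)-weighted average of s_{r,i}^2, then sqrt."""
    if any(m == 0 for m in means):
        raise ValueError("relative SD needs nonzero means")
    # (s / m) ** 2 is the same as s^2 * m^-2, the second form of the formula.
    numerator = sum((n - 1) * (s / m) ** 2 for n, s, m in zip(sizes, sds, means))
    denominator = sum(n - 1 for n in sizes)
    return math.sqrt(numerator / denominator)
```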

2007

(PSU, 2007) ⇒ http://sites.stat.psu.edu/~ajw13/stat500_su_res/notes/lesson10/lesson10_03.html
- When we have good reason to believe that the standard deviation of population 1 is about the same as that of population 2, we can estimate the common standard deviation by pooling the information from the samples drawn from population 1 and population 2.

Let [math]\displaystyle{ n_1 }[/math] be the size of the sample from population 1 and [math]\displaystyle{ s_1 }[/math] its sample standard deviation.

Let [math]\displaystyle{ n_2 }[/math] be the size of the sample from population 2 and [math]\displaystyle{ s_2 }[/math] its sample standard deviation.

Then the common standard deviation can be estimated by the pooled standard deviation:

[math]\displaystyle{ s_p=\sqrt{\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}} }[/math]

The test statistic is:

[math]\displaystyle{ t=\frac{\overline{y}_1-\overline{y}_2}{s_p\sqrt{1/n_1+1/n_2}} }[/math]

with degrees of freedom equal to [math]\displaystyle{ df = n_1 + n_2 - 2 }[/math].
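A minimal Python sketch of this pooled two-sample t statistic, using only the standard library and assuming each sample has at least two observations. The function name `pooled_t_test` is hypothetical; it returns the statistic and its degrees of freedom but does not compute a p-value.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def pooled_t_test(y1: Sequence[float], y2: Sequence[float]) -> Tuple[float, int]:
    """Two-sample t statistic using the pooled SD; returns (t, df)."""
    n1, n2 = len(y1), len(y2)
    df = n1 + n2 - 2
    # stdev uses the n - 1 denominator, matching s_1 and s_2 above.
    sp = math.sqrt(((n1 - 1) * stdev(y1) ** 2 + (n2 - 1) * stdev(y2) ** 2) / df)
    t = (mean(y1) - mean(y2)) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, df
```

If SciPy is available, `scipy.stats.ttest_ind(y1, y2, equal_var=True)` computes the same pooled-variance t statistic and can serve as a cross-check.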
