# 2004 TopicsInStatisticalDataAnalysis

(Redirected from Arsham, 2004)

## Quotes

...

• Binomial.
• Application: Gives probability of exactly x successes in n independent trials, when probability of success p on a single trial is a constant. Used frequently in quality control, reliability, survey sampling, and other industrial problems.
• Example: What is the probability of 7 or more "heads" in 10 tosses of a fair coin?
• Comments: Can sometimes be approximated by normal or by Poisson distribution.
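
The coin-toss example above can be worked directly from the binomial formula. The following is a minimal sketch using only the Python standard library; the function names `binomial_pmf` and `binomial_tail` are illustrative choices, not part of the source.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_tail(x_min: int, n: int, p: float) -> float:
    """P(X >= x_min), summed term by term from the pmf."""
    return sum(binomial_pmf(x, n, p) for x in range(x_min, n + 1))


# 7 or more heads in 10 tosses of a fair coin: (120 + 45 + 10 + 1) / 1024
print(binomial_tail(7, 10, 0.5))  # 0.171875
```
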
• Multinomial.
• Application: Gives probability of exactly $n_i$ outcomes of event $i$, for $i = 1, 2, \ldots, k$, in $n$ independent trials when the probability $p_i$ of event $i$ in a single trial is a constant. Used frequently in quality control and other industrial problems.
• Example: Four companies are bidding for each of three contracts, with specified success probabilities. What is the probability that a single company will receive all the orders?
• Comments: Generalization of binomial distribution for more than 2 outcomes.
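
The contract-bidding example can be sketched as follows. The source does not give the success probabilities, so the values in `probs` are hypothetical, and the sketch assumes each company has the same chance of winning every contract and that the three awards are independent.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """P(N_1 = n_1, ..., N_k = n_k) over n = sum(counts) independent trials."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # stays an exact integer at every step
    return coef * prod(p**c for p, c in zip(probs, counts))


# Hypothetical win probabilities for the four companies (not from the source).
probs = [0.4, 0.3, 0.2, 0.1]

# "A single company receives all three contracts": add up the four outcomes
# in which one company has count 3 and the other three have count 0.
p_sweep = sum(
    multinomial_pmf([3 if j == i else 0 for j in range(4)], probs)
    for i in range(4)
)
print(round(p_sweep, 6))  # 0.1
```
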
• Hypergeometric.
• Application: Gives probability of picking exactly x good units in a sample of n units from a population of N units when there are k bad units in the population. Used in quality control and related applications.
• Example: Given a lot with 21 good units and four defective. What is the probability that a sample of five will yield not more than one defective?
• Comments: May be approximated by binomial distribution when n is small relative to N.
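
The lot-sampling example (21 good units, four defective, sample of five) can be computed exactly with binomial coefficients. This is a sketch; `hypergeom_pmf` is an illustrative name, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(d, n_good, n_bad, sample):
    """P(exactly d defectives) when `sample` units are drawn without replacement."""
    return Fraction(
        comb(n_bad, d) * comb(n_good, sample - d),
        comb(n_good + n_bad, sample),
    )


# Not more than one defective means d = 0 or d = 1.
p_at_most_one = sum(hypergeom_pmf(d, 21, 4, 5) for d in (0, 1))
print(p_at_most_one, float(p_at_most_one))  # 44289/53130 in lowest terms, about 0.8336
```
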
• Geometric.
• Application: Gives probability of requiring exactly x binomial trials before the first success is achieved. Used in quality control, reliability, and other industrial situations.
• Example: Determination of probability of requiring exactly five test firings before first success is achieved.
• Pascal.
• Application: Gives probability of exactly x failures preceding the sth success.
• Example: What is the probability that the third success takes place on the 10th trial?
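
The Pascal example can be sketched as below. The source does not state the single-trial success probability, so `p = 0.5` is a hypothetical value. Setting `s = 1` gives the geometric case, read here as "the first success occurs on the given trial".

```python
from math import comb


def pascal_pmf(trial: int, s: int, p: float) -> float:
    """P(the s-th success occurs exactly on trial number `trial`)."""
    if trial < s:
        return 0.0
    # The first trial-1 trials contain s-1 successes; the last trial is a success.
    return comb(trial - 1, s - 1) * p**s * (1 - p) ** (trial - s)


p = 0.5  # hypothetical success probability, not from the source
print(pascal_pmf(10, 3, p))  # C(9, 2) / 2**10 = 0.03515625
print(pascal_pmf(5, 1, p))   # geometric case: 0.5**5 = 0.03125
```
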
• Negative Binomial.
• Application: Gives probability similar to Poisson distribution when events do not occur at a constant rate and occurrence rate is a random variable that follows a gamma distribution.
• Example: Distribution of number of cavities for a group of dental patients.
• Comments: Generalization of Pascal distribution when s is not an integer. Many authors do not distinguish between Pascal and negative binomial distributions.
• Poisson.
• Application: Gives probability of exactly x independent occurrences during a given period of time if events take place independently and at a constant rate. May also represent number of occurrences over constant areas or volumes. Used frequently in quality control, reliability, queuing theory, and so on.
• Example: Used to represent distribution of number of defects in a piece of material, customer arrivals, insurance claims, incoming telephone calls, alpha particles emitted, and so on.
• Comments: Frequently used as approximation to binomial distribution.
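
The Poisson approximation to the binomial can be checked numerically. The sketch below uses hypothetical values `n = 1000` and `p = 0.002` (so the Poisson mean is `n * p = 2`) and reports the largest pointwise gap between the two probability functions over x = 0, ..., 10.

```python
from math import comb, exp, factorial


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)


n, p = 1000, 0.002  # hypothetical: many trials, small success probability
lam = n * p

max_gap = max(
    abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(11)
)
print(max_gap)  # small when n is large and p is small
```
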
• Normal.
• Application: A basic distribution of statistics. Many applications arise from central limit theorem (average of values of n observations approaches normal distribution, irrespective of form of original distribution under quite general conditions). Consequently, appropriate model for many, but not all, physical phenomena.
• Example: Distribution of physical measurements on living organisms, intelligence test scores, product dimensions, average temperatures, and so on.
• Comments: Many methods of statistical analysis presume normal distribution.
• A so-called Generalized Gaussian distribution has the following pdf:
• $A \exp(-B|x|^n)$, where A, B, n are constants. For n = 1 and n = 2 it is the Laplacian and Gaussian distribution, respectively. This distribution approximates the data reasonably well in some image coding applications.
• Slash distribution is the distribution of the ratio of a normal random variable to an independent uniform random variable, see Hutchinson T., Continuous Bivariate Distributions, Rumsby Sci. Publications, 1990.
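
The central limit theorem mentioned in the Normal entry can be illustrated by simulation. This sketch averages `n_obs = 30` uniform(0, 1) observations (a clearly non-normal parent) and checks what fraction of the averages fall within one theoretical standard deviation of 0.5; for a normal distribution that fraction is about 0.683. The sample sizes and seed are arbitrary choices.

```python
import math
import random


def sample_means(n_obs: int, n_reps: int, rng: random.Random) -> list:
    """Return n_reps averages, each over n_obs uniform(0, 1) draws."""
    return [
        sum(rng.random() for _ in range(n_obs)) / n_obs for _ in range(n_reps)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
n_obs = 30
means = sample_means(n_obs, 10_000, rng)

mu = 0.5                             # mean of uniform(0, 1)
sigma = math.sqrt(1.0 / 12 / n_obs)  # standard deviation of the average

within_1sd = sum(abs(m - mu) <= sigma for m in means) / len(means)
print(within_1sd)  # close to the normal value of about 0.683
```
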
• Gamma.
• Application: A basic distribution of statistics for variables bounded at one side - for example x greater than or equal to zero. Gives distribution of time required for exactly k independent events to occur, assuming events take place at a constant rate. Used frequently in queuing theory, reliability, and other industrial applications.
• Example: Distribution of time between recalibrations of instrument that needs recalibration after k uses; time between inventory restocking, time to failure for a system with standby components.
• Comments: Erlangian, exponential, and chi-square distributions are special cases. The Dirichlet is a multidimensional extension of the Beta distribution.
• Distribution of a product of iid uniform (0, 1) random variables? Like many problems with products, this becomes a familiar problem when turned into a problem about sums. If X is uniform (for simplicity of notation make it U(0,1)), Y = -log(X) is exponentially distributed, so the negative log of the product of X1, X2, …, Xn is the sum of Y1, Y2, …, Yn, which has a gamma (scaled chi-square) distribution. Thus, the negative log of the product has a gamma density with shape parameter n and scale 1.
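
The product-of-uniforms argument can be checked by simulation: if the negative log of the product follows a gamma distribution with shape n and scale 1, its mean and variance should both be close to n. The sketch below uses arbitrary choices of `n = 5`, 50,000 replications, and a fixed seed.

```python
import math
import random


def neg_log_product(n: int, rng: random.Random) -> float:
    """-log(X1 * ... * Xn) for iid uniform X_i, computed as a sum of -log(X_i)."""
    # 1 - rng.random() lies in (0, 1], which avoids taking log(0).
    return sum(-math.log(1.0 - rng.random()) for _ in range(n))


rng = random.Random(0)
n = 5
draws = [neg_log_product(n, rng) for _ in range(50_000)]

mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(mean, var)  # both should be near n = 5 for a gamma(shape=n, scale=1)
```
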
• Exponential.
• Application: Gives distribution of time between independent events occurring at a constant rate. Equivalently, probability distribution of life, presuming constant conditional failure (or hazard) rate. Consequently, applicable in many, but not all reliability situations.
• Example: Distribution of time between arrival of particles at a counter. Also life distribution of complex nonredundant systems, and usage life of some components - in particular, when these are exposed to initial burn-in, and preventive maintenance eliminates parts before wear-out.
• Comments: Special case of both Weibull and gamma distributions.
• Beta.
• Application: A basic distribution of statistics for variables bounded at both sides - for example x between 0 and 1. Useful for both theoretical and applied problems in many areas.
• Example: Distribution of proportion of population located between lowest and highest value in sample; distribution of daily per cent yield in a manufacturing process; description of elapsed times to task completion (PERT).
• Comments: Uniform, right triangular, and parabolic distributions are special cases. To generate beta, generate two random values from a gamma, g1, g2. The ratio g1/(g1 +g2) is distributed like a beta distribution. The beta distribution can also be thought of as the distribution of X1 given (X1+X2), when X1 and X2 are independent gamma random variables.
• There is also a relationship between the Beta and Normal distributions. The conventional calculation is that, given a PERT Beta with highest value b, lowest value a, and most likely value m, the equivalent normal distribution has a mean and mode of (a + 4m + b)/6 and a standard deviation of (b - a)/6.
• See Section 4.2 of Introduction to Probability by J. Laurie Snell (New York, Random House, 1987) for a link between beta and F distributions (with the advantage that tables are easy to find).
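
Two of the recipes in the Beta entry can be sketched in a few lines: generating a beta variate as g1/(g1 + g2) from two gamma variates, and the conventional PERT mean and standard deviation. The shape values (2 and 3), the PERT inputs, and the function names are hypothetical illustrations; both gamma variates are assumed to share the same scale.

```python
import random


def beta_from_gammas(a: float, b: float, rng: random.Random) -> float:
    """Beta(a, b) variate built as g1 / (g1 + g2) from two gamma variates."""
    g1 = rng.gammavariate(a, 1.0)  # shape a, scale 1
    g2 = rng.gammavariate(b, 1.0)  # shape b, same scale
    return g1 / (g1 + g2)


def pert_normal_approx(low: float, mode: float, high: float):
    """Conventional PERT mean (a + 4m + b)/6 and standard deviation (b - a)/6."""
    return (low + 4 * mode + high) / 6, (high - low) / 6


rng = random.Random(0)
samples = [beta_from_gammas(2.0, 3.0, rng) for _ in range(50_000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean)                  # near a / (a + b) = 0.4
print(pert_normal_approx(2, 5, 14))  # (6.0, 2.0)
```
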
• Uniform.
• Application: Gives probability that observation will occur within a particular interval when probability of occurrence within that interval is directly proportional to interval length.
• Example: Used to generate random values.
• Comments: Special case of beta distribution.
• The density of geometric mean of n independent uniforms(0,1) is:
• $f(x) = \dfrac{n\, x^{n-1} \left[\log(1/x^n)\right]^{n-1}}{(n-1)!}$, for $0 < x < 1$.
• $z_\lambda = [U^\lambda - (1-U)^\lambda]/\lambda$ is said to have Tukey's symmetrical $\lambda$-distribution.
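
As a sanity check on the geometric-mean density, read here as f(x) = n x^(n-1) [log(1/x^n)]^(n-1) / (n-1)! on 0 < x < 1, the sketch below integrates it numerically. The total area should be 1, and the mean should equal (n/(n+1))^n, which follows from independence because E[U^(1/n)] = n/(n+1). The midpoint rule and `n = 3` are arbitrary choices.

```python
from math import factorial, log


def geo_mean_density(x: float, n: int) -> float:
    """Density of the geometric mean of n iid uniform(0, 1) variables."""
    # log(1 / x**n) is written as n * log(1 / x).
    return n * x ** (n - 1) * (n * log(1.0 / x)) ** (n - 1) / factorial(n - 1)


def midpoint_integral(f, a: float, b: float, steps: int) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


n = 3
area = midpoint_integral(lambda x: geo_mean_density(x, n), 0.0, 1.0, 100_000)
mean = midpoint_integral(lambda x: x * geo_mean_density(x, n), 0.0, 1.0, 100_000)
print(area, mean)  # area near 1, mean near (3/4)**3 = 0.421875
```
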
• Log-normal.
• Application: Permits representation of random variable whose logarithm follows normal distribution. Model for a process arising from many small multiplicative errors. Appropriate when the value of an observed variable is a random proportion of the previously observed value.
• In the case where the data are lognormally distributed, the geometric mean acts as a better data descriptor than the mean. The more closely the data follow a lognormal distribution, the closer the geometric mean is to the median, since the log re-expression produces a symmetrical distribution.
• Example: Distribution of sizes from a breakage process; distribution of income size, inheritances and bank deposits; distribution of various biological phenomena; life distribution of some transistor types.
• The ratio of two log-normally distributed variables is log-normal.
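
The remark that the geometric mean tracks the median for lognormal data can be illustrated by simulation. In this sketch the parameters `mu = 1` and `sigma = 1`, the sample size, and the seed are arbitrary; the theoretical median is exp(mu), while the arithmetic mean is pulled upward by the long right tail.

```python
import math
import random
from statistics import median

rng = random.Random(0)
mu, sigma = 1.0, 1.0  # hypothetical parameters of the underlying normal
data = [rng.lognormvariate(mu, sigma) for _ in range(20_000)]

arith_mean = sum(data) / len(data)
# Geometric mean = exp(average of the logs).
geo_mean = math.exp(sum(math.log(x) for x in data) / len(data))
med = median(data)

print(arith_mean, geo_mean, med)  # geo_mean and med both near exp(1), about 2.718
```
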
• Rayleigh.
• Application: Gives distribution of radial error when the errors in two mutually perpendicular axes are independent and normally distributed around zero with equal variances.
• Example: Bomb-sighting problems; amplitude of noise envelope when a linear detector is used.
• Comments: Special case of Weibull distribution.
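
The radial-error description can be simulated directly: draw two independent normal errors with equal variance and take the distance from the origin. For a Rayleigh distribution the mean distance is sigma * sqrt(pi / 2). The value `sigma = 2`, the sample size, and the seed below are arbitrary.

```python
import math
import random


def radial_error(sigma: float, rng: random.Random) -> float:
    """Distance from the origin when both axis errors are N(0, sigma^2)."""
    x = rng.gauss(0.0, sigma)
    y = rng.gauss(0.0, sigma)
    return math.hypot(x, y)


rng = random.Random(0)
sigma = 2.0  # hypothetical common standard deviation of the two axis errors
radii = [radial_error(sigma, rng) for _ in range(50_000)]

sample_mean = sum(radii) / len(radii)
rayleigh_mean = sigma * math.sqrt(math.pi / 2)
print(sample_mean, rayleigh_mean)  # the two values should be close
```
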
• Cauchy.
• Application: Gives distribution of ratio of two independent standardized normal variates.
• Example: Distribution of ratio of standardized noise readings; distribution of tan(x) when x is uniformly distributed.
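
Both constructions of the Cauchy distribution can be checked by simulation. A standard Cauchy variable has quartiles at -1 and +1, so about half of the draws should fall in (-1, 1). The sketch assumes the angle is uniform on (-pi/2, pi/2); sample size and seed are arbitrary.

```python
import math
import random

rng = random.Random(0)
n_draws = 50_000

# Ratio of two independent standard normal variates.
ratios = [rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0) for _ in range(n_draws)]

# Tangent of an angle assumed uniform on (-pi/2, pi/2).
tangents = [
    math.tan(rng.uniform(-math.pi / 2, math.pi / 2)) for _ in range(n_draws)
]

frac_ratio = sum(abs(r) < 1.0 for r in ratios) / n_draws
frac_tan = sum(abs(t) < 1.0 for t in tangents) / n_draws
print(frac_ratio, frac_tan)  # both near 0.5
```
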
• Chi-square.
• The probability density curve of a chi-square distribution is an asymmetric curve stretching over the positive side of the line and having a long right tail. The form of the curve depends on the value of the degrees of freedom.
• Applications: The most widely used applications of the Chi-square distribution are:
• Chi-square Test for Association is a (non-parametric, therefore can be used for nominal data) test of statistical significance widely used in bivariate tabular association analysis. Typically, the hypothesis is whether or not two different populations are different enough in some characteristic or aspect of their behavior based on two random samples. This test procedure is also known as the Pearson chi-square test.
• Chi-square Goodness-of-fit Test is used to test if an observed distribution conforms to any particular distribution. Calculation of this goodness of fit test is by comparison of observed data with data expected based on the particular distribution.
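
The goodness-of-fit comparison of observed with expected counts can be sketched as follows. The die-roll counts are hypothetical, invented for illustration, and 11.070 is the usual tabulated 5% critical value for 5 degrees of freedom.

```python
def chi_square_statistic(observed, expected):
    """Pearson statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts from 120 rolls of a die (not from the source).
observed = [18, 22, 16, 25, 19, 20]
expected = [sum(observed) / len(observed)] * len(observed)  # fair die: 20 each

stat = chi_square_statistic(observed, expected)
CRITICAL_5PCT_DF5 = 11.070  # tabulated upper 5% point, 6 - 1 = 5 degrees of freedom

print(stat)                      # (4 + 4 + 16 + 25 + 1 + 0) / 20 = 2.5
print(stat > CRITICAL_5PCT_DF5)  # False: no evidence against a fair die
```
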
• Weibull.
• Application: General time-to-failure distribution due to wide diversity of hazard-rate curves, and extreme-value distribution for minimum of N values from distribution bounded at left.
• The Weibull distribution is often used to model "time until failure." In this manner, it is applied in actuarial science and in engineering work.
• It is also an appropriate distribution for describing data corresponding to resonance behavior, such as the variation with energy of the cross section of a nuclear reaction or the variation with velocity of the absorption of radiation in the Mössbauer effect.
• Example: Life distribution for some capacitors, ball bearings, relays, and so on.
• Comments: Rayleigh and exponential distribution are special cases.
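
The diversity of hazard-rate curves, and the two special cases named in the comment, can be seen from the Weibull hazard function. This sketch assumes the common shape/scale parameterization with survival function exp(-(t/scale)^shape); the parameter names and numeric values are illustrative.

```python
import math


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """P(T > t) under the assumed shape/scale parameterization."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Hazard rate h(t) = (shape / scale) * (t / scale)^(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# shape = 1: constant hazard, i.e. the exponential distribution.
print([weibull_hazard(t, 1.0, 2.0) for t in (0.5, 1.0, 4.0)])  # all 0.5

# shape = 2: hazard grows linearly in t, i.e. the Rayleigh distribution
# with sigma = scale / sqrt(2).
print([weibull_hazard(t, 2.0, 2.0) for t in (0.5, 1.0, 4.0)])  # 0.25, 0.5, 2.0
```
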
• Extreme value.
• Application: Limiting model for the distribution of the maximum or minimum of N values selected from an "exponential-type" distribution, such as the normal, gamma, or exponential.
• Example: Distribution of breaking strength of some materials, capacitor breakdown voltage, gust velocities encountered by airplanes, bacteria extinction times.
• t.
• The t distributions were discovered in 1908 by William Gosset, a chemist and statistician employed by the Guinness brewing company. He considered himself a student still learning statistics, so he signed his papers with the pseudonym "Student". Or perhaps he used a pseudonym because of "trade secrets" restrictions imposed by Guinness.
• Note that there are different t distributions; they form a class of distributions. When we speak of a specific t distribution, we have to specify the degrees of freedom. The t density curves are symmetric and bell-shaped like the normal distribution and have their peak at 0. However, their spread is greater than that of the standard normal distribution. The larger the degrees of freedom, the closer the t density is to the normal density.


| | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|
| 2004 TopicsInStatisticalDataAnalysis | | | Topics in Statistical Data Analysis: Revealing Facts From Data | | | http://home.ubalt.edu/ntsbarsh/stat-data/topics.htm | | | |