2008 ADictionaryOfStatistics MC




McFadden, Daniel Little: (1937- ; b. Raleigh, NC) American econometrician. McFadden studied at U Minnesota (BS physics, 1957; PhD behavioral science, 1962). From 1963 to 1979 he was on the economics faculty at UCB, leaving to join MIT, but returning to UCB in 1990. He was elected to the AAAS in 1977 and to the NAS in 1981. Together with *Heckman, he was awarded the Nobel Prize for Economics in 2000.

McFadden's R2: See DEVIANCE.


McNemar, Quinn: (1900-86, b. West Virginia; d. Palo Alto, CA) American psychologist educated at Stanford U. The author of the influential text Psychological Statistics, he was President of the Psychometric Society in 1951 and President of the American Psychological Association in 1964.

McNemar Test: A test, introduced by *McNemar in 1947, for use with paired data when the observed variable is *dichotomous. Suppose, for example, that competing drugs (A and B) are tested on pairs of patients. The outcome is either success or failure. Information about the relative merits of the drugs is provided only by occasions where just one is successful. Let a denote the number of pairs in which only drug A succeeds, and b denote the number of pairs in which only drug B succeeds. With a *continuity correction, McNemar's test statistic is [math]\frac{(|a - b| - 1)^2}{a + b}[/math] which, if the drugs are really equally effective, may be taken to be an observation from a *chi-squared distribution with one *degree of freedom.

The generalization to more than two matched samples is provided by the Cochran Q test.
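As an illustration of the McNemar test entry, the following minimal Python sketch computes the continuity-corrected statistic from the two discordant counts a and b. The function names and the use of the complementary error function to obtain the chi-squared (one degree of freedom) tail probability are choices made for this sketch, not part of the dictionary entry.

```python
import math


def mcnemar_statistic(a, b):
    """Continuity-corrected McNemar statistic (|a - b| - 1)^2 / (a + b).

    a: pairs in which only drug A succeeds; b: pairs in which only drug B succeeds.
    """
    if a + b == 0:
        raise ValueError("at least one discordant pair is needed")
    return (abs(a - b) - 1) ** 2 / (a + b)


def chi_squared_1df_tail(x):
    """P(chi-squared with 1 d.f. > x), using P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


statistic = mcnemar_statistic(15, 5)      # hypothetical counts
p_value = chi_squared_1df_tail(statistic)
```

With the hypothetical counts a = 15 and b = 5, the statistic is 81/20 = 4.05, and the tail probability falls a little below 0.05.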

mean (data): The mean of a set of n items of *data [math]x_1, x_2, \ldots, x_n[/math] is [math]\left(\sum_{j=1}^{n} x_j\right)/n[/math], which is the arithmetic mean of the numbers [math]x_1, x_2, \ldots, x_n[/math]. The mean is usually denoted by placing a bar over the symbol for the variable being measured. If the variable is x the mean is denoted by [math]\bar{x}[/math]. If the data constitute a sample from a population, then [math]\bar{x}[/math] may be referred to as the sample mean; it is an unbiased estimate of the population mean.

For example, the numbers of eruptions of the *Old Faithful geyser during the first eight days of August 1978 were 13, 13, 13, 14, 14, 14, 13, and 13. The mean is (13 + 13 + 13 + 14 + 14 + 14 + 13 + 13)/8 = 13.375. If the data are collected in *frequency form so that values [math]x_1, x_2, \ldots, x_m[/math] are obtained with frequencies [math]f_1, f_2, \ldots, f_m[/math], the mean is [math]\frac{\sum_{j=1}^{m} f_j x_j}{\sum_{j=1}^{m} f_j}[/math]. For example, with the eruption data there are just two values, [math]x_1 = 13[/math] and [math]x_2 = 14[/math]. Their respective frequencies are [math]f_1 = 5[/math] and [math]f_2 = 3[/math], so the mean is {(5 × 13) + (3 × 14)}/(5 + 3) = 13.375. If the data are grouped into classes with mid-values [math]x_1, x_2, \ldots, x_c[/math] and corresponding class frequencies [math]f_1, f_2, \ldots, f_c[/math], an approximate value for the mean of the original data is the grouped mean [math]\frac{\sum_{j=1}^{c} f_j x_j}{\sum_{j=1}^{c} f_j}[/math]. The mean can be interpreted as the centre of gravity, or centre of mass.

(Figure) Mean. The data are the numbers of eruptions of the Old Faithful geyser during the first eight days of August 1978. The mean is seen to be the balance point of the observations.
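The two calculations in the Old Faithful example can be checked with a short Python sketch. The function names are illustrative only.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def frequency_mean(values, frequencies):
    """Mean of data in frequency form: sum(f_j * x_j) / sum(f_j)."""
    return sum(f * x for x, f in zip(values, frequencies)) / sum(frequencies)


eruptions = [13, 13, 13, 14, 14, 14, 13, 13]
raw_mean = mean(eruptions)                      # 107 / 8
freq_mean = frequency_mean([13, 14], [5, 3])    # same data in frequency form
```

Both routes give 13.375, as expected, since the frequency form is just a regrouping of the same eight observations.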

mean (population): See POPULATION MEAN.

mean (random variable): See EXPECTED VALUE.

mean absolute deviation (MAD): A measure of spread. For observations [math]x_1, x_2, \ldots, x_n[/math] with mean [math]\bar{x}[/math] and median [math]m[/math], the mean absolute deviation about the mean is [math]\frac{1}{n}\sum_{j=1}^{n}|x_j - \bar{x}|[/math] and the mean absolute deviation about the median is [math]\frac{1}{n}\sum_{j=1}^{n}|x_j - m|[/math] ...

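mean absolute error (MAE): If [math]y_1, y_2, \ldots, y_n[/math] are n observed values and [math]\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n[/math] are the corresponding values predicted (perhaps by some *model), then the MAE is

[math]\frac{1}{n}\sum_{j=1}^{n}|y_j - \hat{y}_j|.[/math]

There are several similar statistics, all of which provide information about the extent of the agreement between the observed and predicted values. The mean absolute percentage error (MAPE) is

[math]\frac{100}{n}\sum_{j=1}^{n}\left|\frac{y_j - \hat{y}_j}{y_j}\right|,[/math]

the mean squared error (MSE) is

[math]\frac{1}{n}\sum_{j=1}^{n}(y_j - \hat{y}_j)^2,[/math]

and the root mean squared error (RMSE) is [math]\sqrt{\mathrm{MSE}}[/math].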
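The four agreement statistics named in this entry can be sketched in Python as follows. This is a minimal illustration assuming the usual definitions (mean of absolute errors, mean of absolute relative errors expressed as a percentage, mean of squared errors, and its square root); the function name and the data are hypothetical.

```python
import math


def error_summaries(observed, predicted):
    """Return MAE, MAPE (in percent), MSE and RMSE for paired observed/predicted values.

    MAPE is undefined if any observed value is zero.
    """
    n = len(observed)
    errors = [y - y_hat for y, y_hat in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mape = 100 * sum(abs(e / y) for e, y in zip(errors, observed)) / n
    mse = sum(e * e for e in errors) / n
    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": math.sqrt(mse)}


summary = error_summaries([2, 4, 5], [1, 4, 7])   # hypothetical values
```

For these hypothetical values the errors are 1, 0 and -2, so the MAE is 1, the MSE is 5/3, and the MAPE is 30%.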

mean absolute percentage error: See MEAN ABSOLUTE ERROR.

mean chart: See QUALITY CONTROL.

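mean deviation: An alternative name for the mean absolute deviation about the mean. For a data set [math]x_1, x_2, \ldots, x_n[/math], with frequencies [math]f_1, f_2, \ldots, f_n[/math], and with mean [math]\bar{x}[/math], the mean deviation is [math]\frac{\sum_{j=1}^{n} f_j |x_j - \bar{x}|}{\sum_{j=1}^{n} f_j}[/math].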

mean square: See ANOVA.

mean squared error (MSE): See ESTIMATOR; MEAN ABSOLUTE ERROR.

measurement: The process of determining values for numerical or categorical *variables.

measure of agreement: A single statistic used to summarize the agreement between the rankings or classifications of objects made by two or more observers. Examples are the coefficient of concordance, Cohen’s kappa, and rank correlation coefficients.

measure of association: See ASSOCIATION.

measure of goodness-of-fit: A statistic that compares the observed data with the expected values estimated according to some proposed *model. The most common measures are the *chi-squared test statistic and the *likelihood-ratio goodness-of-fit statistic.

measure of location: For a set of *data, or a population, a single number, or data value, which is in some sense in the middle of the data, or the population. See MEAN; MEDIAN; MIDRANGE; MODE.

measure of spread: A measure of the extent to which the values of a *variable, in either a sample or a population, are spread out. The most commonly used measures of spread are *variance, *standard deviation, *mean deviation, median absolute deviation, *range, *interquartile range, and *semi-interquartile range. Of these only the variance is not measured in the same units as the observations.

median: If a set of numerical *data has n elements and is arranged in order so that either [math]x_1 \le x_2 \le \cdots \le x_n[/math] or [math]x_1 \ge x_2 \ge \cdots \ge x_n[/math], then the median is [math]x_{\frac{1}{2}(n+1)}[/math] if n is odd, and [math]\frac{1}{2}\left(x_{\frac{1}{2}n} + x_{\frac{1}{2}n+1}\right)[/math] if n is even. For example, the time intervals (in minutes, to the nearest minute) between the eruptions of *Old Faithful on 1 August 1978 were 78, 74, 68, 76, 80, 84, 50, 93, 55, 76, 58, 74, 75. Arranging these thirteen values in order, we get 50, 55, 58, 68, 74, 74, 75, 76, 76, 78, 80, 84, 93, to give a median of 75 minutes. For 4 August 1978, there were fourteen inter-eruption times, which arranged in order were 60, 66, 67, 68, 70, 72, 73, 75, 75, 75, 79, 84, 86, 86, so that the median is [math]\frac{1}{2}(73 + 75) = 74[/math]. Alternatively, an approximate value for the median can be read from a *cumulative frequency graph as the value of the variable corresponding to a *cumulative relative frequency of 50%. For a *continuous random variable X, the median m of the distribution is such that [math]P(X \le m) = \frac{1}{2}[/math]. For a discrete random variable taking values x ...

median absolute deviation (MAD): A robust estimate of variability. For observations [math]x_1, x_2, \ldots, x_n[/math] with *median m, the median absolute deviation is the median of the differences [math]|x_1 - m|, |x_2 - m|, \ldots, |x_n - m|[/math].

[...]: See ROBUST REGRESSION.

median polish: A *robust method suggested by *Tukey for fitting a model to a two-way table, with an overall parameter [math]\mu[/math], row parameters [math]r_j[/math], and column parameters [math]c_k[/math], with j = 1, 2, ..., J and k = 1, 2, ..., K. This *iterative algorithm alternates row and column operations. Considering the rows first, for each row the row median is subtracted from every element in that row. For each column, the median of the revised numbers is then subtracted from every element in that column. This continues until all medians are 0. The outcome may vary slightly depending on whether rows or columns are considered first. In the example, [math]\mu[/math] is estimated as 30, with [math]r_1 = -12[/math], [math]c_4 = -7[/math], and [math]e_{14} = 7[/math] (so that 30 - 12 - 7 + 7 = 18, the original value).

(Table) Median polish. Original table and parameter estimates.
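The alternating row and column operations of this robust fitting method (median polish) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the dictionary's own procedure: the function name is hypothetical, rows are swept first, and after each sweep the accumulated effects are re-centred on their median so that an overall value can be reported separately. The 3 × 3 table is invented for the sketch and is exactly additive, so all residuals end at zero.

```python
from statistics import median


def median_polish(table, max_iter=20):
    """Fit overall + row[j] + col[k] + residual[j][k] by alternating median sweeps."""
    resid = [[float(v) for v in row] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Row sweep: subtract each row's median from that row.
        row_meds = [median(r) for r in resid]
        for j in range(n_rows):
            row_eff[j] += row_meds[j]
            resid[j] = [v - row_meds[j] for v in resid[j]]
        shift = median(col_eff)            # re-centre column effects
        overall += shift
        col_eff = [c - shift for c in col_eff]
        # Column sweep: subtract each column's median of the revised numbers.
        col_meds = [median(resid[j][k] for j in range(n_rows)) for k in range(n_cols)]
        for k in range(n_cols):
            col_eff[k] += col_meds[k]
            for j in range(n_rows):
                resid[j][k] -= col_meds[k]
        shift = median(row_eff)            # re-centre row effects
        overall += shift
        row_eff = [r - shift for r in row_eff]
        # Stop once every row and column median is 0.
        if all(m == 0 for m in row_meds) and all(m == 0 for m in col_meds):
            break
    return overall, row_eff, col_eff, resid


# Invented additive table: overall 10, row effects -1, 0, 1, column effects -2, 0, 2.
overall, rows, cols, residuals = median_polish([[7, 9, 11], [8, 10, 12], [9, 11, 13]])
```

Each original value equals the overall value plus its row effect, column effect and residual, which is the same decomposition as the 30 - 12 - 7 + 7 = 18 check in the entry.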


Meier, Paul: (1924- ; b. New York City) American statistician. Meier was co-author with *Kaplan of the paper, published in 1958, that introduced the *Kaplan-Meier estimate of the *survivor function. Meier graduated from Oberlin College in Ohio in 1945. Moving to Princeton U, he gained an MA in logic in 1947, and a PhD (supervised by Tukey) in 1951. He joined the staff of Johns Hopkins U in 1952, moving in 1957 to U Chicago and then in 1992 to Columbia U. Meier was President of the *IMS in 1985. He was the *COPSS *Fisher Lecturer in 1992 and the *Wilks Award winner of the *ASA in 2004. He was elected a Fellow of the AAAS in 1980.

mesokurtic: See KURTOSIS.

M-estimate: M-estimates are *measures of location that are not as sensitive as the mean to *outlier values. With observations [math]x_1, x_2, \ldots, x_n[/math], the sample mean can be characterized as the value of [math]\theta[/math] that minimizes [math]\sum_{j=1}^{n} g(x_j - \theta)[/math], where [math]g(u) = u^2[/math]. The sample median can be characterized in a similar way, though now [math]g(u) = |u|[/math]. M-estimates can be characterized in this same way, but the functional forms for g are chosen to be less sensitive to *outlier values. One frequently used alternative as a *measure of location is the Huber function:

[math]g(u) = \begin{cases} \frac{1}{2}u^2, & |u| \le k, \\ k|u| - \frac{1}{2}k^2, & |u| > k, \end{cases}[/math]

where k is a tuning constant (often set equal to twice the median absolute deviation). A second alternative is the biweight function:

[math]g(u) = \begin{cases} \frac{1}{6}k^2\left[1 - \{1 - (u/k)^2\}^3\right], & |u| \le k, \\ \frac{1}{6}k^2, & |u| > k, \end{cases}[/math]

where k is again a tuning constant and is here often set equal to seven times the median absolute deviation. See also L-ESTIMATE.
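To make the idea of an M-estimate of location concrete, here is a minimal Python sketch for the Huber function. It assumes the standard Huber form (quadratic for |u| ≤ k, linear beyond) and uses iteratively reweighted averaging, which is one common way to find the minimizing value and is not something the entry itself prescribes. The function name is hypothetical; when k is not supplied it defaults to twice the median absolute deviation, following the entry's remark, and k is assumed to be positive.

```python
from statistics import median


def huber_location(xs, k=None, tol=1e-10, max_iter=100):
    """Huber M-estimate of location by iteratively reweighted averaging (assumes k > 0)."""
    m = median(xs)
    if k is None:
        # Tuning constant: twice the median absolute deviation.
        k = 2 * median(abs(x - m) for x in xs)
    theta = m                                   # start from the sample median
    for _ in range(max_iter):
        # Observations within k of theta get full weight; others are down-weighted.
        weights = [1.0 if abs(x - theta) <= k else k / abs(x - theta) for x in xs]
        new_theta = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


estimate = huber_location([1, 2, 3, 100])       # hypothetical data with one outlier
```

For the hypothetical data the sample mean is 26.5 and the median is 2.5; the Huber estimate (with k = 2 here) settles at 8/3, close to the bulk of the data.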

meta-analysis: A statistical methodology in which *data from previous tests are considered and analysed together. For example, a series of small experiments may all show only slight signs of the same effect (e.g. that one medicine is better than another), whereas the aggregation of the experiments provides overwhelming evidence. A difficulty with this approach is that experimental conditions and experimental protocols may vary, so that the aggregate outcome may not be a fair reflection of the true situation.

method of least squares: A method originated by *Legendre, which refers to the process of estimating the unknown parameters of a *model by minimizing the sum of squared differences between the observed values of a random variable and the values predicted by the model. If every observation is given equal weight then this is ordinary least squares (OLS). See also GENERALIZED LEAST SQUARES; WEIGHTED LEAST SQUARES.
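As a small illustration of ordinary least squares, the Python sketch below fits a straight line y = intercept + slope × x by minimizing the sum of squared differences between observed and predicted values. The straight-line model, the function name, and the data are assumptions made for this sketch; the entry itself covers any model.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimates (intercept, slope) for a straight-line model."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_of_squares(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


intercept, slope = fit_line([0, 1, 2], [1, 3, 5])   # hypothetical, exactly linear data
```

Because the hypothetical points lie exactly on y = 1 + 2x, the fitted line reproduces them and the minimized sum of squares is zero.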

method of maximum likelihood: A commonly used method for obtaining an estimate of an unknown parameter of an assumed population distribution. The likelihood of a data set depends upon the parameter(s) of the distribution (or probability density function) from which the observations have been taken. In cases where one or more of these parameters are unknown, a shrewd choice as an estimate would be the value that maximizes the likelihood. This is the maximum likelihood estimate (mle). Expressions for maximum likelihood estimates are frequently obtained by maximizing the natural logarithm of the likelihood rather than the likelihood itself (the result is the same).

Sir Ronald Fisher introduced the method in 1912.
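The following Python sketch illustrates maximum likelihood for one simple case chosen for this example: independent observations from an exponential distribution with unknown rate. The log-likelihood is n log(rate) - rate × (sum of the observations), and it is maximized at rate = n / sum. The distribution, function names, and data are all assumptions of the sketch.

```python
import math


def exponential_log_likelihood(rate, data):
    """Natural logarithm of the likelihood of an exponential sample with the given rate."""
    return len(data) * math.log(rate) - rate * sum(data)


def exponential_mle(data):
    """Closed-form maximum likelihood estimate of the rate: n / sum(data)."""
    return len(data) / sum(data)


data = [1.0, 2.0, 3.0]                      # hypothetical lifetimes
rate_hat = exponential_mle(data)            # 3 / 6 = 0.5
```

Evaluating the log-likelihood at values either side of the estimate, such as 0.4 and 0.6, gives smaller values than at 0.5.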

method of moments: An alternative to the *method of maximum likelihood as a method of *estimating the *parameters of a *distribution. Each moment of a distribution can be expressed as a function of the parameters of the distribution, and often this implies that the parameters can be expressed as simple functions of the moments. In such cases, replacing the moments with their sample estimates provides estimates of the population parameters.

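For example, the two-parameter *gamma distribution with probability density function proportional to [math]x^{a-1}e^{-bx}[/math] has its first moment, [math]\mu[/math] (equal to the mean), given by [math]\mu = a/b[/math] and its second central moment, [math]\sigma^2[/math], given by [math]\sigma^2 = a/b^2[/math]. Solving these equations, we get [math]a = \mu^2/\sigma^2[/math] and [math]b = \mu/\sigma^2[/math]. The method of moments replaces the unknown quantities [math]\mu[/math] and [math]\sigma^2[/math] with the corresponding sample quantities [math]\bar{x}[/math] and [math]s^2[/math] so that, for example, the estimator of a is given by [math]\hat{a} = \bar{x}^2/s^2[/math].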
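A minimal Python sketch of the gamma example follows. It assumes the density is proportional to x^(a-1) e^(-bx), so that the mean is a/b and the variance is a/b^2, and it uses the sample variance with divisor n - 1 for s^2; both are assumptions of this sketch, as are the function name and the data.

```python
from statistics import mean, variance


def gamma_method_of_moments(data):
    """Method-of-moments estimates (a_hat, b_hat) for a gamma density
    proportional to x**(a - 1) * exp(-b * x).

    Solves mean = a / b and variance = a / b**2 with the sample mean and
    sample variance (divisor n - 1) in place of the population values.
    """
    x_bar = mean(data)
    s2 = variance(data)
    return x_bar ** 2 / s2, x_bar / s2


a_hat, b_hat = gamma_method_of_moments([1, 2, 3, 4, 5])   # hypothetical data
```

For the hypothetical data the sample mean is 3 and the sample variance is 2.5, giving estimates of 3.6 for a and 1.2 for b.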

Investigations of the ratio have mostly been undertaken with respect to the standard normal distribution, for which, for x > 2, 3x+\/x2+a Jr"

minimal cut vector; minimal path vector: See RELIABILITY THEORY.

minimax; minimum: See STATIONARY POINT.

minimum chi-squared: A method of estimation in which the value chosen for the *parameter estimate is the value that minimizes the test statistic of the *chi-squared test for goodness-of-fit.

MINITAB: A statistical package particularly designed for teaching purposes.

Minkowski, Hermann: (1864-1909; b. Kaunas, Lithuania; d. Göttingen, Germany) Lithuanian-born mathematician. Minkowski moved to Germany in 1872 and was educated at U Königsberg, gaining his PhD in 1885. His academic career took him successively to posts at U Bonn (1885), U Zurich (1896), and U Göttingen (1902). Minkowski had a passion for pure mathematics and his work underpinned that of Einstein on relativity.

Minkowski distance: See DISTANCE MEASURE.

Minkowski inequality: For random variables X and Y, [math]\{E(|X + Y|^a)\}^{1/a} \le \{E(|X|^a)\}^{1/a} + \{E(|Y|^a)\}^{1/a}[/math] for any constant [math]a \ge 1[/math]. See also BERNSTEIN INEQUALITY; CHEBYSHEV INEQUALITY; HÖLDER INEQUALITY; KOLMOGOROV INEQUALITY; MARKOV INEQUALITY.

misspecified model: A *model that provides an incorrect description of the *data. To some extent all models are misspecified. The consequences of using a misspecified model are of particular concern in the analysis of time series, where *forecasting will take place and a misspecified model can lead to a highly inaccurate forecast.

Mitscherlich equation: See GROWTH CURVE.

mixed effects design: See EXPERIMENTAL DESIGN.

mixture distribution: A *distribution made up of two or more component distributions. For example, suppose that light bulbs of type A have an *exponential lifetime with *parameter a and light bulbs of type B have an exponential lifetime with parameter b. A box contains a mixture of the two types of bulb, with a proportion p being of type A. Let X be the lifetime of a randomly selected bulb. The probability density function f of X is given by [math]f(x) = p f_A(x) + (1 - p) f_B(x)[/math], where [math]f_A[/math] and [math]f_B[/math] are the exponential probability density functions with parameters a and b respectively.
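The light-bulb mixture can be sketched in Python as follows. The sketch assumes that the exponential parameters a and b are rates, so that the component densities are a e^(-ax) and b e^(-bx) for x ≥ 0; if the parameters are instead means, the component densities change accordingly. The function name and parameter values are hypothetical.

```python
import math


def mixture_pdf(x, p, a, b):
    """Density of a two-component exponential mixture.

    p is the proportion of type A bulbs; a and b are assumed to be rates.
    """
    if x < 0:
        return 0.0
    return p * a * math.exp(-a * x) + (1 - p) * b * math.exp(-b * x)


density_at_zero = mixture_pdf(0.0, p=0.5, a=1.0, b=3.0)   # 0.5 * 1 + 0.5 * 3
```

At x = 0 the density is simply the weighted average of the two rates, here 2.0.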

MLE: Abbreviation for maximum likelihood estimate. See METHOD OF MAXIMUM LIKELIHOOD.

mobility table: A square *contingency table in which the rows and columns have equivalent classifications but the columns refer to a later time point than the rows. A social mobility table usually refers to the social classes (or occupations) of successive generations. Voter mobility tables cross-classify the votes of individuals in successive elections.

modal class: See MODE.

modal frequency: See MODE.

mode: A data value, in a set of *categorical data, whose *frequency is not less than that of any other data value. For a set of numerical data, a mode is a data value whose frequency is not less than the frequency of neighbouring values. For a set of grouped numerical data, a modal class is a class whose frequency is not less than the frequency in neighbouring classes. The modal frequency is the frequency with which the mode occurs, or the frequency in the modal class.

For example, the time intervals (in minutes, to the nearest minute) between the eruptions of *Old Faithful on 1 August 1978 were 78, 74, 68, 76, 80, 84, 50, 93, 55, 76, 58, 74, 75. The values 74 and 76 both occur twice, and the remaining values occur just once. The values 74 and 76 are the two modes, both having a modal frequency of 2. Combining the data from 1-8 August, we have the following summary table:

Time interval   40-49   50-59   60-69   70-79   80-89   90-99
Frequency           5      21      10      37      30       4

The modal classes are 50-59 (with modal frequency 21) and 70-79 (with modal frequency 37).

For a discrete random variable a mode is a value whose probability is not less than that of its neighbours. For a continuous random variable a mode is a value such that the probability density function has a local maximum. If there is only one mode the distribution is unimodal. If there is more than one mode the distribution is multimodal. If there are two modes it is bimodal. The word 'mode' was coined by Karl Pearson in 1895.

(Figure) Mode. The multimodal probability density function illustrated is for a *mixture distribution of two normal distributions, each with unit variance. One is centred on x = 1 and the other on x = -2.
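The Old Faithful examples in the mode entry can be reproduced with a short Python sketch. For the ungrouped intervals it uses `statistics.multimode` (Python 3.8 or later), which returns the most frequent values; for the grouped table it applies the neighbour-based definition of a modal class. The helper name `modal_classes` is hypothetical.

```python
from statistics import multimode


def modal_classes(labels, frequencies):
    """Classes whose frequency is not less than that of either neighbouring class."""
    found = []
    for i, f in enumerate(frequencies):
        left_ok = i == 0 or f >= frequencies[i - 1]
        right_ok = i == len(frequencies) - 1 or f >= frequencies[i + 1]
        if left_ok and right_ok:
            found.append((labels[i], f))
    return found


intervals = [78, 74, 68, 76, 80, 84, 50, 93, 55, 76, 58, 74, 75]
modes = multimode(intervals)      # most frequent values, in order of first appearance

classes = ["40-49", "50-59", "60-69", "70-79", "80-89", "90-99"]
frequencies = [5, 21, 10, 37, 30, 4]
modal = modal_classes(classes, frequencies)
```

This gives the modes 74 and 76 for 1 August and the modal classes 50-59 and 70-79 for the combined data.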

model: A simple description of a probabilistic process that may have given rise to observed *data. For example, if the data consist of the numbers shown by a fair die during a game of Snakes and Ladders, then a simple model would state that for each roll, and *independent of the outcomes of other rolls, the *distribution of the number shown is a *discrete uniform distribution, on 1, 2, … , 6.

Models form the bedrock of Statistics. Specific distributions are often invoked. Many types of models are mentioned in this dictionary.

modulus: See ABSOLUTE VALUE.

moment (uncorrected moment): For a random variable X the rth moment (about the origin) is defined to be the expectation of [math]X^r[/math], where r is a non-negative integer. It is usually denoted by [math]\mu'_r[/math]. So [math]\mu'_0 = 1[/math] and [math]\mu'_1 = \mu[/math], the mean of the *distribution of X.

The rth moment about the mean (or central moment or corrected moment) is defined to be the expectation of [math](X - \mu)^r[/math] and is usually denoted by [math]\mu_r[/math]. Thus [math]\mu_1 = 0[/math] and [math]\mu_2 = \sigma^2[/math], the variance of X. The moments about the mean can be expressed as *linear combinations of the uncorrected moments, for example:

[math]\mu_2 = \mu'_2 - \mu^2,[/math]
[math]\mu_3 = \mu'_3 - 3\mu'_2\mu + 2\mu^3,[/math]
[math]\mu_4 = \mu'_4 - 4\mu'_3\mu + 6\mu'_2\mu^2 - 3\mu^4.[/math]

Either set of moments can also be expressed in terms of linear combinations of simple functions of the *cumulants. It should be noted that, for some distributions, [math]\mu'_r[/math] and [math]\mu_r[/math] may exist only for small values of r.
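The relations between central and uncorrected moments can be checked numerically. The Python sketch below does so for a fair six-sided die, an example chosen for this illustration, using exact fractions; the function names are hypothetical.

```python
from fractions import Fraction


def raw_moment(values, probs, r):
    """rth moment about the origin: E(X**r)."""
    return sum(p * x ** r for x, p in zip(values, probs))


def central_moment(values, probs, r):
    """rth moment about the mean: E((X - mu)**r)."""
    mu = raw_moment(values, probs, 1)
    return sum(p * (x - mu) ** r for x, p in zip(values, probs))


def central_from_raw(m1, m2, m3, m4):
    """Second to fourth central moments expressed through the uncorrected moments."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4


faces = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6
raw = [raw_moment(faces, probs, r) for r in (1, 2, 3, 4)]
via_raw = central_from_raw(*raw)
direct = tuple(central_moment(faces, probs, r) for r in (2, 3, 4))
```

Both routes agree: the variance of the die score is 35/12 and, by symmetry, the third central moment is 0.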

Monty Hall problem: Monty Hall was host of a TV show in which a contestant was faced by three doors. Behind two of the doors was a booby prize, and behind one was the real prize. The contestant was asked to choose a door. Another door was then opened to reveal a booby prize. The contestant was invited to change to the third door. Intuition suggests that changing would have no effect, yet actually it doubles the chance of winning the real prize.
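The claim that changing doors doubles the chance of winning can be verified by enumerating every equally likely combination of prize position and initial choice. The Python sketch below assumes, as in the usual statement of the problem, that the host always opens an unchosen door hiding a booby prize; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def monty_hall_probabilities():
    """Exact win probabilities (stay, switch) over all prize/choice combinations."""
    doors = range(3)
    stay_wins = switch_wins = total = 0
    for prize, choice in product(doors, doors):
        # The host opens a door that is neither the contestant's choice nor the prize.
        opened = next(d for d in doors if d != choice and d != prize)
        # Switching means taking the one remaining unopened door.
        switched = next(d for d in doors if d != choice and d != opened)
        stay_wins += choice == prize
        switch_wins += switched == prize
        total += 1
    return Fraction(stay_wins, total), Fraction(switch_wins, total)


p_stay, p_switch = monty_hall_probabilities()
```

Staying wins in 3 of the 9 cases and switching in the other 6, giving probabilities 1/3 and 2/3.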

Mood, Alexander McFarlane: (1913- ) American mathematical statistician. As an undergraduate at U Texas, Mood had studied physics. On moving to Princeton U, he obtained (in 1940) his doctorate in Statistics under the supervision of *Wilks. After a period on the faculty at U Texas, he rejoined Princeton U. He is now an Emeritus Professor of Management at U California at Irvine. He was co-author with *Graybill of the 1963 textbook "Introduction to the Theory of Statistics" which for many years was the recommended introductory textbook worldwide. He was President of the *IMS in 1957 and was presented with the *Wilks Award of the *ASA in 1979.

Mood dispersion test: See TEST FOR EQUALITY OF SCALE.

Mood median test: See TEST FOR EQUALITY OF LOCATION.

Moore-Penrose inverse: See MATRIX.

Moran, Patrick Alfred Pierce: (1917-88; b. Sydney, Australia; d. Canberra, Australia) Australian statistician. Moran was educated at Sydney U and Cambridge U and was a researcher at Oxford U from 1946 to 1951, before returning to Australia. He was the founding Professor of Statistics at ANU. He was elected to the AAS in 1962, and was elected FRS in 1975. He was the first President of the Australian Statistical Society (now the *SSAI) in 1963 and was awarded the Society's *Pitman Medal in 1982. He was President of the Australian Mathematical Society in 1976.

Moran Medal: A medal, awarded every two years by the AAS in commemoration of the work of *Moran, to recognize outstanding research by scientists aged 40 years and under.

Moran's I: A *statistic, introduced by *Moran in 1950, that measures spatial *autocorrelation using information only from specified pairs of spatial *observations. Suppose there are n locations, and the observed value at location j is [math]x_j[/math] and the overall mean value is [math]\bar{x}[/math]. Let [math]w_{jk} = 1[/math] if the comparison of the value at location j to the value at location k is of interest, and let [math]w_{jk} = 0[/math] otherwise. By definition, [math]w_{jj} = 0[/math], for all j. Usually, the only comparisons of interest are those between immediate neighbours. The statistic is given by [math]I = \frac{n \sum_{j}\sum_{k} w_{jk}(x_j - \bar{x})(x_k - \bar{x})}{\left(\sum_{j}\sum_{k} w_{jk}\right) \sum_{j}(x_j - \bar{x})^2}[/math].
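A minimal Python sketch of Moran's statistic is given below. It assumes the standard form: n times the weighted sum of cross-products of deviations from the mean, divided by the product of the total weight and the sum of squared deviations. The function name is hypothetical and the example (four locations in a line, with immediate neighbours compared) is invented for the sketch.

```python
def morans_i(values, weights):
    """Moran's spatial autocorrelation statistic for a 0/1 weight matrix.

    weights[j][k] is 1 if locations j and k are to be compared, else 0,
    with weights[j][j] == 0 for every j.
    """
    n = len(values)
    x_bar = sum(values) / n
    dev = [x - x_bar for x in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[j][k] * dev[j] * dev[k] for j in range(n) for k in range(n))
    return n * cross / (total_weight * sum(d * d for d in dev))


# Four locations in a line; each is compared with its immediate neighbours.
values = [1, 2, 3, 4]
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
statistic = morans_i(values, weights)
```

For this steadily increasing sequence the statistic is 1/3, a positive value reflecting the similarity of neighbouring observations.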

morbidity rate: The *incidence rate of persons in a population who become clinically ill during the period of time stated.

morphometrics: The study of the mathematical and statistical properties of shape.

mortality rate (death rate): The number of deaths occurring in a population during a given period of time, usually a year, as a proportion of the number in the population. Usually the mortality rate includes deaths from all causes and is expressed as deaths per 1 000. A disease-specific (or age-specific or sex-specific) mortality rate includes only deaths associated with one disease (or age or sex) and is reported as deaths per 1 000 people of the specified type. The mortality rate may be standardized when comparing mortality rates over time, or between countries, to take account of differences in the population. See also AGE-SPECIFIC RATE; SEX-SPECIFIC RATE.

mosaic display: A display that highlights departures from *independence in a two-variable cross-classification. The display, which is asymmetric, emphasizes the variations in the *conditional probability of the categories of one variable, given the category of the second variable. For an alternative display that treats the variables in a symmetrical fashion, see COBWEB DIAGRAM.

(Figure) Mosaic display. Category labels shown: Affective, Organic, Schizophrenia; Psychotherapy, Organic, Custodial care. The areas of the tiles are proportional to the cell frequencies. Dark shading indicates a large positive residual, pale shading a large negative residual.

Mosteller, (Charles) Frederick: (1916-2006; b. Clarksburg, WV; d. Arlington, VA) American mathematical statistician. After his ScM at Carnegie Mellon U in 1939, Mosteller was supervised by *Wilks for his 1946 PhD at Princeton U. That year he joined the faculty at Harvard U, where he spent his career. He served as President of the *Psychometric Society (1957), the *ASA (1967), and the *IMS (1974). He was awarded the *Wilks Award of the ASA in 1986 and the *COPSS *Fisher Lectureship in 1987. He was an Honorary Life Member of the *International Statistical Institute.

most powerful test: A test of a null hypothesis which has greater power than any other test for a given alternative hypothesis. See also UNIFORMLY MOST POWERFUL TEST.

mother wavelet: See WAVELET.

mover-stayer model: A model for a *square table which classifies individuals into a group whose new location is *independent of their old location and a group who never move. The 'locations' may be geographical or they may refer to social class, occupation, or views on some subject. The model can provide a good fit to the data without actually making any sense when its implications are examined more closely.

moving average: A method of smoothing a time series to reduce the effects of *random variation and reveal any underlying *trend or *seasonality. For the time series [math]x_1, x_2, \ldots, x_t[/math] the simple three-point moving average would replace the value of [math]x_k[/math], k = 2, 3, ..., t - 1, with

[math]\frac{1}{3}(x_{k-1} + x_k + x_{k+1}).[/math]

Often, different weights are used, as in this five-point moving average which could be used for k = 3, 4, ..., t - 2: [math]\frac{1}{12}(x_{k-2} + 3x_{k-1} + 4x_k + 3x_{k+1} + x_{k+2})[/math]. Another possibility is provided by Daniell weights: in the case of an average over m time points, the two end points are given weight [math]\frac{1}{2(m-1)}[/math], with the others each being given weight [math]\frac{1}{m-1}[/math]. The four-point moving averages (appropriate for quarterly data) are

[math]\frac{1}{4}(x_1 + x_2 + x_3 + x_4), \quad \frac{1}{4}(x_2 + x_3 + x_4 + x_5), \quad \ldots[/math]
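The three-point, weighted five-point, and four-point moving averages can all be computed by one weighted-window routine, sketched here in Python. The function name and the series are hypothetical; the weights 1, 3, 4, 3, 1 are scaled by their total, 12.

```python
def weighted_moving_average(series, weights):
    """Weighted averages over every full window of len(weights) consecutive values.

    Each window is weighted by `weights` and divided by the total weight.
    """
    width = len(weights)
    total = sum(weights)
    return [
        sum(w * x for w, x in zip(weights, series[i:i + width])) / total
        for i in range(len(series) - width + 1)
    ]


series = [1, 2, 3, 4, 5, 6]                                   # hypothetical series
three_point = weighted_moving_average(series, [1, 1, 1])
five_point = weighted_moving_average(series, [1, 3, 4, 3, 1])
four_point = weighted_moving_average(series, [1, 1, 1, 1])
```

For an odd window the first result corresponds to the middle time point of the first window (k = 2 for three points, k = 3 for five), whereas each four-point average falls midway between two time points.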

Twelve-point moving averages are similarly defined and are appropriate for monthly data. For a cycle with an even period, e.g. quarterly or monthly data, the centred moving averages are the arithmetic means of the successive moving averages defined above. For example, in the case of quarterly data the first centred moving average is [math]\frac{1}{8}(x_1 + 2x_2 + 2x_3 + 2x_4 + x_5)[/math]. The advantage of these centred moving averages is that the resulting values are associated with a time point rather than the midpoint of the interval between two successive time points. A graph of moving averages against time may show changes against time which are obscured by cyclical effects. A line of best fit to the moving averages is a trend line, and its slope is the trend. The trend line may be used to forecast future values (in the short term). For example, for monthly data the average deviation of the January data from the trend line can be used as an estimate of the future deviation of the January data from the trend line. The deviation can be measured as either a difference or a ratio.

Note that the use of moving averages can introduce spurious *cycles (see SLUTZKY-YULE EFFECT).

(Figure) Moving average. The graph shows annual sunspot activity (in standardized units) from 1750 to 2000. There is a strong cycle with a period of around eleven years. Also shown is the eleven-year moving average. This removes the obvious cycle but reveals longer-scale fluctuations.

... Initially all m samples are compared. If [math]H_0[/math] is accepted, then testing ceases. However, if it is rejected, then the hypotheses [math]\mu_1 = \mu_2 = \cdots = \mu_{m-1}[/math] and [math]\mu_2 = \mu_3 = \cdots = \mu_m[/math] are considered, using the Studentized range values for the comparison of m - 1 populations. If a hypothesis is rejected, then comparisons of m - 2 populations are made. Successive reductions are made until acceptable hypotheses are found. Examples of this type are Duncan's test (which uses the significance level [math]1 - (1 - \alpha)^{l-1}[/math] when l means are compared), the Newman-Keuls test (which uses [math]\alpha[/math] throughout), and the Ryan-Einot-Gabriel-Welsch (R-E-G-W) test which uses [math]1 - (1 - \alpha)^{l/m}[/math] for l < m - 1 and [math]\alpha[/math] otherwise. A compromise between the Newman-Keuls test and the HSD test is the Tukey wholly significant difference test, which is also called the WSD test or Tukey's b-test.

When one of the m populations under comparison is different to the remainder (for example, it refers to the use of a *control treatment) then interest focuses on the (m - 1) comparisons involving this population. In this case the Dunnett test is appropriate. The usual t-statistic is used, but with special tables of critical values. When the remaining m - 1 treatments are ordered (for example, they represent different concentrations of some new substance) then the successive T-values will generally also be ordered and the number of tests reduced. This is known as the Williams test; revised tables of critical values are required. In yet another approach (the Hsu MCB test) attention is restricted to comparisons involving the best treatment. If the comparisons of interest are contrasts (see ANOVA) of more than two population means then the Scheffé test, which is based on the *F-distribution, is appropriate.

In cases where the variances differ from one population to another, variants on the above tests are required. For example, the Tamhane test uses the *Welch statistic in place of T, together with the Sidak correction, while the Games-Howell test replaces the denominator of T by ... when comparing populations i and j and also modifies the number of degrees of freedom.

moving average models (moving average process): *Models for a time series with a constant mean (taken as 0). Let [math]x_1, x_2, \ldots[/math] be successive values of the random variable X, measured at regular intervals of time and let [math]\varepsilon_1, \varepsilon_2, \ldots[/math] denote the corresponding random errors. A pth-order moving average model with *parameters [math]\alpha_1, \alpha_2, \ldots, \alpha_p[/math] relates the value at time j (≥ p + 1) to the preceding p error values by [math]x_j = \alpha_p\varepsilon_{j-p} + \alpha_{p-1}\varepsilon_{j-p+1} + \cdots + \alpha_1\varepsilon_{j-1} + \varepsilon_j[/math]. Such a model is written in brief as MA(p). The errors are presumed to be *independent and to have mean 0 and hence the X-variables also have mean 0. Moving average models can also be expressed as *autoregressive models. Models combining both types of process include *ARMA models and ARIMA models.
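The defining relation of an MA(p) model can be sketched in Python by building the series directly from a given sequence of errors. The function name is hypothetical, and the errors are supplied explicitly so that the result is reproducible; in a simulation they would be independent draws with mean 0 (for example from `random.gauss`).

```python
def moving_average_process(errors, alphas):
    """Values x_j of an MA(p) model for every j with p preceding errors available.

    alphas[0] is the weight on the immediately preceding error, alphas[1] on the
    one before that, and so on; the current error enters with weight 1.
    """
    p = len(alphas)
    values = []
    for j in range(p, len(errors)):
        past = sum(alphas[i] * errors[j - 1 - i] for i in range(p))
        values.append(past + errors[j])
    return values


# MA(1) with a single unit shock: the shock affects only the following value.
ma1 = moving_average_process([1.0, 0.0, 0.0, 0.0], [0.5])
# MA(2): 3 + 0.5 * 2 + 0.25 * 1
ma2 = moving_average_process([1.0, 2.0, 3.0], [0.5, 0.25])
```

The MA(1) output shows the characteristic short memory of these models: a shock influences only the next p values and then disappears.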