2006 TheCambridgeDictionaryOfStatistics

From GM-RKB

Subject Headings: Statistics Dictionary.

Notes

Cited By

Quotes

Book Overview

If you use statistics and need easy access to simple, reliable definitions and explanations of modern statistical concepts, then look no further than this dictionary. Over 3600 terms are defined, covering medical, survey, theoretical, and applied statistics, including computational aspects. Entries are provided for standard and specialized statistical software. In addition, short biographies of over 100 important statisticians are given. Definitions provide enough mathematical detail to clarify concepts and give standard formulae when these are helpful. The majority of definitions then give a reference to a book or article where the user can seek further or more specialized information, and many are accompanied by graphical material to aid understanding.

Preface to the First Edition

The Cambridge Dictionary of Statistics aims to provide students of statistics, working statisticians and researchers in many disciplines who are users of statistics with relatively concise definitions of statistical terms. All areas of statistics are covered, theoretical, applied, medical, etc., although, as in any dictionary, the choice of which terms to include and which to exclude is likely to reflect some aspects of the compiler’s main areas of interest, and I have no illusions that this dictionary is any different. My hope is that the dictionary will provide a useful source of reference for both specialists and non-specialists alike. Many definitions necessarily contain some mathematical formulae and/or nomenclature, others contain none. But the difference in mathematical content and level among the definitions will, with luck, largely reflect the type of reader likely to turn to a particular definition. The non-specialist looking up, for example, Student’s t-tests will hopefully find the simple formulae and associated written material more than adequate to satisfy their curiosity, while the specialist seeking a quick reminder about spline functions will find the more extensive technical material just what they need.

The dictionary contains approximately 3000 headwords and short biographies of more than 100 important statisticians (fellow statisticians who regard themselves as ‘important’ but who are not included here should note the single common characteristic of those who are). Several forms of cross-referencing are used. Terms in slanted roman in an entry appear as separate headwords, although headwords defining relatively commonly occurring terms such as random variable, probability, distribution, population, sample, etc., are not referred to in this way. Some entries simply refer readers to another entry. This may indicate that the terms are synonyms or, alternatively, that the term is more conveniently discussed under another entry. In the latter case the term is printed in italics in the main entry.

Entries are in alphabetical order using the letter-by-letter rather than the word-by-word convention. In terms containing numbers or Greek letters, the numbers or corresponding English word are spelt out and alphabetized accordingly. So, for example, 2 × 2 table is found under two-by-two table, and α-trimmed mean under alpha-trimmed mean. Only headings corresponding to names are inverted, so the entry for William Gosset is found under Gosset, William, but there is an entry under Box–Müller transformation, not under Transformation, Box–Müller.

For those readers seeking more detailed information about a topic, many entries contain either a reference to one or other of the texts listed later, or a more specific reference to a relevant book or journal article. (Entries for software contain the appropriate address.) Additional material is also available in many cases in either the Encyclopedia of Statistical Sciences, edited by Kotz and Johnson, or the Encyclopedia of Biostatistics, edited by Armitage and Colton, both published by Wiley. Extended biographies of many of the people included in this dictionary can also be found in these two encyclopedias and also in Leading Personalities in Statistical Sciences by Johnson and Kotz published in 1997 again by Wiley.

Lastly and paraphrasing Oscar Wilde ‘writing one dictionary is suspect, writing two borders on the pathological’. But before readers jump to an obvious conclusion I would like to make it very clear that an anorak has never featured in my wardrobe.

Contents

(from) http://assets.cambridge.org/97805217/66999/excerpt/9780521766999_excerpt.pdf

  • Aalen-Johansen estimator: An estimator of the survival function for a set of survival times, when there are competing causes of death. Related to the Nelson–Aalen estimator. [Scandinavian Journal of Statistics, 1978, 5, 141–50.]
  • Aalen’s linear regression model: A model for the hazard function of a set of survival times
  • Abbot’s formula: A formula for the proportion of animals (usually insects) dying in a toxicity trial that recognizes that some insects may die during the experiment even when they have not been exposed to the toxin, and among those who have been so exposed, some may die of natural causes. Explicitly the formula is
  • ABC method: Abbreviation for approximate bootstrap confidence method.
  • Ability parameter: See Rasch model.
  • Absolute deviation: Synonym for average deviation.
  • Absolute risk: Synonym for incidence.
  • Absorbing barrier: See random walk.
  • Absorbing Markov chains: A state of a Markov chain is absorbing if it is impossible to leave it,
  • Absorption distributions: Probability distributions that represent the number of ‘individuals’ (e.g. particles) that fail to cross a specified region containing hazards of various kinds. For example, the region may simply be a straight line containing a number of ‘absorption’ points. When a particle travelling along the line meets such a point, there is a probability p that it will be absorbed. If it is absorbed it fails to make any further progress, and the point then becomes incapable of absorbing any more particles. When there are M active absorption
  • Abundance matrices: Matrices that occur in ecological applications. They are essentially two-dimensional tables in which the classifications correspond to site and species. The value in the ijth cell gives the number of individuals of species j found at site i. [Ecography, 2006, 29, 525–530.]
  • Accelerated failure time model: A general model for data consisting of survival times, in which explanatory variables measured on an individual are assumed to act multiplicatively on the time-scale, and so affect the rate at which an individual proceeds along the time axis. Consequently the model can be interpreted in terms of the speed of progression of a disease. In the simplest case of comparing two groups of patients, for example, those receiving treatment A and those receiving treatment B, this model assumes that the survival time of an individual on one treatment is a multiple of the survival time on the other treatment; as a result the probability that an individual on treatment A survives beyond time t is the probability that an individual on treatment B survives beyond time ψt, where ψ is an unknown positive constant (equivalently, survival times on treatment A are distributed as those on treatment B divided by ψ). When the end-point of interest is the death of a patient, values of ψ greater than one correspond to an acceleration in the time of death of an individual assigned to treatment A, and values of ψ less than one indicate the reverse. The parameter ψ is known as the acceleration factor. [Modelling Survival Data in Medical Research, 2nd edition, 2003, D. Collett, Chapman and Hall/CRC Press, London.]
  • Accelerated-life testing: A set of methods intended to ensure product reliability during design and manufacture in which stress is applied to promote failure. The applied stresses might be temperature, vibration, shock etc. In order to make a valid inference about the normal lifetime of the system from the accelerated data (accelerated in the sense that a shortened time to failure is implied), it is necessary to know the relationship between time to failure and the applied stress. Often parametric statistical models of the time to failure and of the manner in which stress accelerates aging are used. [Accelerated Testing, 2004, W. Nelson, Wiley, New York.]
  • Acceleration factor: See accelerated failure time model.
  • Acceptable quality level: See quality control procedures.
  • Acceptable risk: The risk for which the benefits of a particular medical procedure are considered to outweigh the potential hazards. [Acceptable Risk, 1984, B. Fischhoff, Cambridge University Press, Cambridge.]
  • Acceptance sampling: A type of quality control procedure in which a sample is taken from a collection or batch of items, and the decision to accept the batch as satisfactory, or reject it as unsatisfactory, is based on the proportion of defective items in the sample. [Quality Control and Industrial Statistics, 4th edition, 1974, A. J. Duncan, R. D. Irwin, Homewood, Illinois.]
  • Acceptance-rejection algorithm: An algorithm for generating random numbers from some probability distribution, f(x), by first generating a random number from some other distribution, g(x), where f and g are related by
  • Acceptance region: A term associated with statistical significance tests that refers to the set of values of a test statistic for which the null hypothesis is not rejected. Suppose, for example, a z-test is being used to test the null hypothesis that the mean blood pressures of men and women are equal against the alternative hypothesis that the two means are not equal. If the chosen significance level of the test is 0.05, then the acceptance region consists of values of the test statistic z between –1.96 and 1.96. [Encyclopedia of Statistical Sciences, 2006, eds. S. Kotz,
  • Accident proneness: A personal psychological factor that affects an individual’s probability of suffering an accident. The concept has been studied statistically under a number of different assumptions for accidents:
  • Accidentally empty cells: Synonym for sampling zeros.
  • Accrual rate: The rate at which eligible patients are entered into a clinical trial, measured as persons per unit of time. Often disappointingly low, for reasons that may be related to both physicians and patients. [Journal of Clinical Oncology, 2001, 19, 3554–61.]
  • Accuracy: The degree of conformity to some recognized standard value. See also bias.
  • ACE model: A biometrical genetic model that postulates additive genetic factors, common environmental factors, and specific environmental factors in a phenotype. The model is used to quantify the contributions of genetic and environmental influences to variation. [Encyclopedia of Behavioral Statistics, Volume 1, 2005, eds. B. S. Everitt and D. C. Howell, Wiley, Chichester.]
  • ACE: Abbreviation for alternating conditional expectation.
  • ACES: Abbreviation for active control equivalence studies.
  • ACF: Abbreviation for autocorrelation function.
  • ACORN: An acronym for ‘A Classification of Residential Neighbourhoods’. It is a system for classifying households according to the demographic, employment and housing characteristics of their immediate neighbourhood. It was derived by applying cluster analysis to 40 variables describing each neighbourhood, including age, class, tenure, dwelling type and car ownership. [Statistics in Society, 1999, eds. D. Dorling and S. Simpson, Arnold, London.]
  • Acquiescence bias: The bias produced by respondents in a survey who have the tendency to give positive responses, such as ‘true’, ‘like’, ‘often’ or ‘yes’ to a question. At its most extreme, the person responds in this way irrespective of the content of the item. Thus a person may respond ‘true’ to two items like ‘I always take my medication on time’ and ‘I often forget to take my pills’. See also end-aversion bias. [Journal of Intellectual Disability Research, 1995, 39, 331–40.]
  • Action lines: See quality control procedures.
  • Active control equivalence studies (ACES): Clinical trials in which the object is simply to show that the new treatment is at least as good as the existing treatment. Such studies are becoming more widespread because current therapies reflect previous successes in the development of new treatments. The studies rely on an implicit historical control assumption, since to conclude that a new drug is efficacious on the basis of this type of study requires a fundamental assumption that the active control drug would have performed better than a placebo, had a placebo been used in the trial. [Statistical Issues in Drug Development, 2nd edition, 2008, S. Senn, Wiley-Blackwell, Chichester.]
  • Active control trials: Clinical trials in which the trial drug is compared with some other active compound rather than a placebo. [Annals of Internal Medicine, 2000, 135, 62–4.]
  • Active life expectancy (ALE): Defined for a given age as the expected remaining years free of disability. A useful index of public health and quality of life for populations. A question of great interest is whether recent trends towards longer life expectancy have been accompanied by a comparable increase in ALE. [New England Journal of Medicine, 1983, 309, 1218–24.]
  • Actuarial estimator: An estimator of the survival function, S(t), often used when the data are in grouped form. Given explicitly by
  • Actuarial statistics: The statistics used by actuaries to evaluate risks, calculate liabilities and plan the financial course of insurance, pensions, etc. An example is life expectancy for people of various ages, occupations, etc. See also life table. [Financial and Actuarial Statistics: An Introduction, 2003, D. S. Borowiak and A. F. Shapiro, CRC Press, Boca Raton.]
  • Adaptive cluster sampling: A procedure in which an initial set of subjects is selected by some sampling procedure and, whenever the variable of interest of a selected subject satisfies a given criterion, additional subjects in the neighbourhood of that subject are added to the sample. [Biometrika, 1996, 84, 209–19.]
  • Adaptive designs: Clinical trials that are modified in some way as the data are collected within the trial. For example, the allocation of treatment may be altered as a function of the response to protect patients from ineffective or toxic doses. [Controlled Clinical Trials, 1999, 20, 172–86.]
  • Adaptive estimator: See adaptive methods.
  • Adaptive lasso: See lasso.
  • Adaptive methods of treatment assignment: Any method of treatment allocation in a clinical trial that uses accumulating outcome data to affect the treatment selection, for example, the O’Brien-Fleming method. [Biometrika, 1977, 64, 191–199.]
  • Adaptive methods: Procedures that use various aspects of the sample data to select the most appropriate type of statistical method for analysis. An adaptive estimator, T, for the centre of a distribution, for example, might be
  • Adaptive sampling design: A sampling design in which the procedure for selecting sampling units on which to make observations may depend on observed values of the variable of interest. In a survey for estimating the abundance of a natural resource, for example, additional sites (the sampling units in this case) in the vicinity of high observed abundance may be added to the sample during the survey. The main aim in such a design is to achieve gains in precision or efficiency compared to conventional designs of equivalent sample size by taking advantage of observed characteristics of the population. For this type of sampling design the probability of a given sample of units is conditioned on the set of values of the variable of interest in the population. [Adaptive Sampling, 1996, S. K. Thompson and
  • Added variable plot: A graphical procedure used in all types of regression analysis for identifying whether or not a particular explanatory variable should be included in a model, in the presence of other explanatory variables. The variable that is the candidate for inclusion in the model may be new or it may simply be a higher power of one currently included. If the candidate variable is denoted x_i, then the residuals from the regression of the response variable on all the explanatory variables, save x_i, are plotted against the residuals from the regression of x_i on the remaining explanatory variables. A strong linear relationship in the plot indicates the need for x_i in the regression equation (Fig. 1). [Regression Analysis, Volume 2, 1993, edited by M. S. Lewis-Beck, Sage Publications, London.]
  • Addition rule for probabilities: For two events, A and B, that are mutually exclusive, the probability of either event occurring is the sum of the individual probabilities, i.e. Pr(A or B) = Pr(A) + Pr(B).
  • Additive clustering model: A model for cluster analysis which attempts to find the structure of a
  • Additive effect: A term used when the effect of administering two treatments together is the sum of their separate effects. See also additive model. [Journal of Bone Mineral Research, 1995, 10, 1303–11.]
  • Additive genetic variance: The variance of a trait due to the main effects of genes. Usually obtained by a factorial analysis of variance of trait values on the genes present at one or more loci. [Statistics in Human Genetics, 1998, P. Sham, Arnold, London.]
  • Additive model: A model in which the explanatory variables have an additive effect on the response variable. So, for example, if variable A has an effect of size a on some response measure and variable B one of size b on the same response, then in an assumed additive model for A and B their combined effect would be a+b.
  • Additive outlier: A term applied to an observation in a time series which is affected by a non-repetitive intervention such as a strike, a war, etc. Only the level of the particular observation is considered affected. In contrast an innovational outlier is one which corresponds to an extraordinary shock at some time point T which also influences subsequent observations in the series. [Journal of the American Statistical Association, 1996, 91, 123–31.]
  • Additive tree: A connected, undirected graph where every pair of nodes is connected by a unique path and where the distances between the nodes are such that
  • Adelstein, Abe (1916-1993): Born in South Africa, Adelstein studied medicine at the University of the Witwatersrand. In the 1960s he emigrated to Manchester where he worked in the Department of Social Medicine. Later he was appointed Chief Medical Statistician for England and Wales. Adelstein made significant contributions to the classification of mental illness and to the epidemiology of suicide and alcoholism.
  • Adequate subset: A term used in regression analysis for a subset of the explanatory variables that is thought to contain as much information about the response variable as the complete set. See also selection methods in regression.
  • Adjacency matrix: A matrix with elements, xij, used to indicate the connections in a directed graph. If node i relates to node j, xij = 1, otherwise xij = 0. For a simple graph with no self-loops, the adjacency matrix must have zeros on the diagonal. For an undirected graph the adjacency matrix is symmetric. [Introductory Graph Theory, 1985, G. Chartrand, Dover, New York.]
  • Adjusted correlation matrix: A correlation matrix in which the diagonal elements are replaced by communalities. The basis of principal factor analysis.
  • Adjusted treatment means: Usually used for estimates of the treatment means in an analysis of covariance, after adjusting all treatments to the same mean level for the covariate(s), using the estimated relationship between the covariate(s) and the response variable. [Biostatistics: A Methodology for the Health Sciences, 2nd edn, 2004, G. Van Belle, L. D. Fisher, P. J. Heagerty and T. S. Lumley, Wiley, New York.]
  • Adjusting for baseline: The process of allowing for the effect of baseline characteristics on the response variable, usually in the context of a longitudinal study. See also Lord’s paradox.
  • Administrative databases: Databases storing information routinely collected for purposes of managing a health-care system. Used by hospitals and insurers to examine admissions, procedures and lengths of stay. [Healthcare Management Forum, 1995, 8, 5–13.]
  • Admissibility: A very general concept that is applicable to any procedure of statistical inference. The underlying notion is that a procedure is admissible if and only if there does not exist, within the class of procedures under consideration, another one which performs uniformly at least as well as the procedure in question and performs better than it in at least one case. Here ‘uniformly’ means for all values of the parameters that determine the probability distribution of the random variables under investigation. [KA2 Chapter 31.]
  • Admixture in human populations: The inter-breeding between two or more populations that were previously isolated from each other for geographical or cultural reasons. Population admixture can be a source of spurious associations between diseases and alleles that are both more common in one ancestral population than the others. However, populations that have been admixed for several generations may be useful for mapping disease genes, because spurious associations tend to be dissipated more rapidly than true associations in successive generations of random mating. [Statistics in Human Genetics, 1998, P. Sham, Arnold, London.]
  • Adoption studies: Studies of the rearing of a nonbiological child in a family. Such studies have played an important role in the assessment of genetic variation in human and animal traits. [Foundations of Behavior Genetics, 1978, J. L. Fuller and W. R. Thompson, Mosby, St. Louis.]
  • Adverse selection: A term used in insurance when the insurer cannot distinguish between members of good- and poor-risk categories for a certain hazard and the poor risks are the only purchasers of coverage, with the consequence that the insurer expects to lose money on each policy sold. [Quarterly Journal of Economics, 1976, 90, 629–650.]
  • Aetiological fraction: Synonym for attributable risk.
  • Affine invariance: A term applied to statistical procedures which give identical results after the data has been subjected to an affine transformation. An example is Hotelling’s T² test. [Canadian Journal of Statistics, 2003, 31, 437–55.]
  • Affine transformation: The transformation Y = AX + b, where A is a nonsingular matrix and b is any vector of real numbers. Important in many areas of statistics, particularly multivariate analysis.
  • Age heaping: A term applied to the collection of data on ages when these are accurate only to the nearest year, half year or month. Occurs because many people (particularly older people) tend not to give their exact age in a survey. Instead they round their age up or down to the nearest number that ends in 0 or 5. See also coarse data and Whipple index. [Population Studies, 1991, 45, 497–518.]
  • Age-period-cohort model: A model important in many observational studies when it is reasonable to suppose that age, number of years exposed to risk factor, and age when first exposed to risk factor, all contribute to disease risk. Unfortunately all three factors cannot be entered simultaneously into a model since this would result in collinearity, because ‘age first exposed to risk factor’ + ‘years exposed to risk factor’ is equal to ‘age’. Various methods have been suggested for disentangling the dependence of the factors, although most commonly one of the factors is simply not included in the modelling process. See also Lexis diagram. [Statistics in Medicine, 1984, 3, 113–30.]
  • Age-dependent birth and death process: A birth and death process where the birth rate and death rate are not constant over time, but change in a manner which is dependent on the age of the individual. [Stochastic Modelling of Scientific Data, 1995, P. Guttorp, Chapman and Hall/CRC Press, London.]
  • Age-related reference ranges: Ranges of values of a measurement that give the upper and lower limits of normality in a population according to a subject’s age. [Archives of Disease in Childhood, 2005, 90, 1117–1121.]
  • Age-specific death rates: Death rates calculated within a number of relatively narrow age bands.
  • Age-specific failure rate: A synonym for hazard function when the time scale is age. [Statistical Methods for Survival Data Analysis, 3rd edn, E. T. Lee and J. W. Wang, Wiley, New York.]
  • Age-specific incidence rate: Incidence rates calculated within a number of relatively narrow age bands. See also age-specific death rates. [Cancer Epidemiology Biomarkers and Prevention, 2004, 13, 1128–1135.]
  • Agglomerative hierarchical clustering methods: Methods of cluster analysis that begin with each individual in a separate cluster and then, in a series of steps, combine individuals and later, clusters, into new, larger clusters until a final stage is reached where all individuals are members of a single group. At each stage the individuals or clusters that are ‘closest’, according to some particular definition of distance, are joined. The whole process can be summarized by a dendrogram. Solutions corresponding to particular numbers of clusters are found by ‘cutting’ the dendrogram at the appropriate level. See also average linkage,
  • Agreement: The extent to which different observers, raters or diagnostic tests agree on a binary classification. Measures of agreement such as the kappa coefficient quantify the relative frequency of the diagonal elements in a two-by-two contingency table, taking agreement due to chance into account. It is important to note that strong agreement requires strong association whereas strong association does not require strong agreement. [Statistical Methods for Rates and Proportions, 2nd edn, 2001, J. L. Fleiss, Wiley, New York.]
  • Agresti’s α: A generalization of the odds ratio for 2×2 contingency tables to larger contingency tables arising from data where there are different degrees of severity of a disease and differing amounts of exposure. [Analysis of Ordinal Categorical Data, 1984, A. Agresti, Wiley, New York.]
  • Agronomy trials: A general term for a variety of different types of agricultural field experiments including fertilizer studies, time, rate and density of planting, tillage studies, and pest and
  • AI: Abbreviation for artificial intelligence.
  • AIC: Abbreviation for Akaike’s information criterion.
  • Aickin’s measure of agreement: A chance-corrected measure of agreement which is similar to the kappa coefficient but based on a different definition of agreement by chance. [Biometrics, 1990, 46, 293–302.]
  • AID: Abbreviation for automatic interaction detector.
  • Aitchison distributions: A broad class of distributions that includes the Dirichlet distribution and logistic normal distributions as special cases. [Journal of the Royal Statistical Society, Series B, 1985, 47, 136–46.]
  • Aitken, Alexander Craig (1895-1967): Born in Dunedin, New Zealand, Aitken first studied classical languages at Otago University, but after service during the First World War he was given a scholarship to study mathematics in Edinburgh. After being awarded a D.Sc., Aitken became a member of the Mathematics Department in Edinburgh and in 1946 was given the Chair of Mathematics which he held until his retirement in 1965. The author of many papers on least squares and the fitting of polynomials, Aitken had a legendary ability at arithmetic and was reputed to be able to dictate rapidly the first 707 digits of π. He was a Fellow of the Royal Society and of the Royal Society of Literature. Aitken died on 3 November 1967 in Edinburgh.
  • Ajne’s test: A distribution free method for testing the uniformity of a circular distribution. The test statistic An is defined as
  • Akaike’s information criterion (AIC): An index used in a number of areas as an aid to choosing between competing models. It is defined as
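The Abbot’s formula entry breaks off before the formula itself. A minimal sketch, assuming the standard correction for natural mortality, writes observed mortality as p = π + (1 − π)p*, where π is the natural (control) mortality and p* is the mortality due to the toxin, and solves for p*. The function name is illustrative, not from the dictionary.

```python
def abbott_corrected_mortality(p_observed: float, p_control: float) -> float:
    """Return p*, the proportion killed by the toxin itself.

    Assumes the model p_observed = p_control + (1 - p_control) * p_star:
    insects that escape natural death die from the toxin with probability p_star.
    """
    if not 0.0 <= p_control < 1.0:
        raise ValueError("control mortality must be in [0, 1)")
    return (p_observed - p_control) / (1.0 - p_control)


# Example with invented figures: 60% of treated insects die, 20% of untreated controls die.
print(abbott_corrected_mortality(0.60, 0.20))  # about 0.5
```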
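For the absorbing Markov chains entry, the probability of eventually being absorbed in a particular absorbing state satisfies b = r + Qb, where Q holds the transition probabilities among the transient states and r the one-step probabilities of entering that absorbing state; the usual closed form is b = (I − Q)⁻¹r, with (I − Q)⁻¹ the fundamental matrix. The sketch below avoids matrix inversion and simply iterates the fixed-point equation. The example, a symmetric random walk on {0, 1, 2, 3, 4} with absorbing barriers at 0 and 4, is an assumption chosen because the answer (i/4 from state i) is known.

```python
def absorption_probabilities(Q, r, tol=1e-12, max_iter=100_000):
    """Iterate b <- r + Q b until successive iterates agree to within tol.

    Q: transition probabilities between transient states (list of rows).
    r: one-step probabilities of moving from each transient state into the
       target absorbing state.
    Returns the probability of eventual absorption in that state.
    """
    n = len(Q)
    b = [0.0] * n
    for _ in range(max_iter):
        new_b = [r[i] + sum(Q[i][j] * b[j] for j in range(n)) for i in range(n)]
        if max(abs(x - y) for x, y in zip(new_b, b)) < tol:
            return new_b
        b = new_b
    raise RuntimeError("did not converge")


# Symmetric random walk on {0, 1, 2, 3, 4}; states 0 and 4 are absorbing.
# Transient states are 1, 2, 3; the target absorbing state is 4.
Q = [[0.0, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
r = [0.0, 0.0, 0.5]
print(absorption_probabilities(Q, r))  # close to [0.25, 0.5, 0.75]
```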
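To make the accelerated failure time relation S_A(t) = S_B(ψt) concrete, the sketch below assumes, purely for illustration, an exponential survival function for treatment B with rate 0.1. It locates median survival times by bisection and shows that the median on A is the median on B divided by ψ, so that ψ = 2 halves the median survival time on treatment A.

```python
import math


def survival_b(t: float, rate: float = 0.1) -> float:
    """Illustrative exponential survival function for treatment B."""
    return math.exp(-rate * t)


def survival_a(t: float, psi: float, rate: float = 0.1) -> float:
    """AFT relation: S_A(t) = S_B(psi * t), psi being the acceleration factor."""
    return survival_b(psi * t, rate)


def median_survival(surv, lo=0.0, hi=1e6, tol=1e-10):
    """Find t with surv(t) = 0.5 by bisection (surv assumed decreasing)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if surv(mid) > 0.5:
            lo = mid  # still above one half, so the median lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


psi = 2.0
m_b = median_survival(survival_b)
m_a = median_survival(lambda t: survival_a(t, psi))
print(m_b, m_a, m_b / m_a)  # the ratio is approximately psi
```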
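The acceptance sampling entry can be illustrated with a single sampling plan, assumed here for illustration: inspect n items and accept the batch if at most c are defective. Treating the number of defectives in the sample as binomial (a common approximation when the batch is large relative to n), the probability of accepting a batch with defective proportion p is the operating characteristic computed below; n = 50 and c = 2 are arbitrary choices.

```python
from math import comb


def prob_accept(p: float, n: int, c: int) -> float:
    """Probability that a sample of n items contains at most c defectives (binomial model)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


for p in (0.01, 0.05, 0.10):
    print(f"defective rate {p:.2f}: P(accept) = {prob_accept(p, n=50, c=2):.3f}")
```

Plotting prob_accept against p gives the operating characteristic curve of the plan; larger n or smaller c make the curve fall more steeply.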
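The acceptance-rejection entry stops before stating how f and g are related. In the usual formulation, assumed here, there is a constant M with f(x) ≤ M g(x) for all x, and a draw x from g is accepted with probability f(x)/(M g(x)). The sketch uses an illustrative target, the Beta(2, 2) density f(x) = 6x(1 − x) on [0, 1], a Uniform(0, 1) proposal g, and M = 1.5, the maximum of f.

```python
import random


def target_density(x: float) -> float:
    """Beta(2, 2) density on [0, 1]; chosen only as an example target."""
    return 6.0 * x * (1.0 - x)


def rejection_sample(n: int, m: float = 1.5, rng=None) -> list:
    """Draw n values from target_density using a Uniform(0, 1) proposal.

    Requires target_density(x) <= m * g(x) for all x, where g(x) = 1 on [0, 1].
    """
    rng = rng or random.Random()
    samples = []
    while len(samples) < n:
        x = rng.random()                 # candidate drawn from g
        u = rng.random()                 # uniform used for the accept/reject decision
        if u <= target_density(x) / m:   # accept with probability f(x) / (m g(x))
            samples.append(x)
    return samples


draws = rejection_sample(10_000, rng=random.Random(42))
print(sum(draws) / len(draws))  # should be near 0.5, the Beta(2, 2) mean
```

On average 1/M of the candidates are accepted, so a proposal g that closely matches f keeps M small and the algorithm efficient.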
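The ±1.96 limits in the acceptance region entry are the 2.5% and 97.5% points of the standard normal distribution. A short sketch using the standard library’s statistics.NormalDist recovers them for any two-sided significance level and checks whether an observed z falls in the acceptance region; the observed z values are made up.

```python
from statistics import NormalDist


def acceptance_region(alpha: float = 0.05) -> tuple:
    """Two-sided acceptance region (lower, upper) for a z-test at level alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return -z_crit, z_crit


lower, upper = acceptance_region(0.05)
print(round(lower, 2), round(upper, 2))  # -1.96 1.96

for z in (1.2, -2.5):
    decision = "do not reject H0" if lower <= z <= upper else "reject H0"
    print(f"z = {z}: {decision}")
```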
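A full ACE analysis is usually fitted as a structural equation model to twin data. A much cruder illustration, which the dictionary does not describe and which is swapped in here only to show how the three components partition phenotypic variance, is Falconer’s approach based on monozygotic (MZ) and dizygotic (DZ) twin correlations. Under the ACE model r_MZ = a² + c² and r_DZ = a²/2 + c², which gives a² = 2(r_MZ − r_DZ), c² = 2r_DZ − r_MZ and e² = 1 − r_MZ. The correlations in the example are invented.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Moment estimates of standardized A, C and E variance components.

    Assumes the ACE expectations r_mz = a2 + c2 and r_dz = a2 / 2 + c2.
    """
    a2 = 2.0 * (r_mz - r_dz)   # additive genetic
    c2 = 2.0 * r_dz - r_mz     # common (shared) environment
    e2 = 1.0 - r_mz            # specific environment
    return {"A": a2, "C": c2, "E": e2}


print(falconer_ace(r_mz=0.8, r_dz=0.5))  # A near 0.6, C near 0.2, E near 0.2
```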
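The actuarial estimator entry is cut off before its formula. The standard life-table form, assumed here, treats individuals withdrawn (censored) during an interval as at risk for half of it: with n_j entering interval j, d_j deaths and w_j withdrawals, the conditional probability of surviving the interval is 1 − d_j/(n_j − w_j/2), and S(t) is the running product of these terms. The grouped counts below are invented for illustration.

```python
def actuarial_survival(entering: int, deaths: list, withdrawals: list) -> list:
    """Life-table estimate of S(t) at the end of each interval.

    entering: number alive and under observation at the start of the first interval.
    deaths, withdrawals: counts per interval (lists of equal length).
    """
    at_start = entering
    survival = 1.0
    estimates = []
    for d, w in zip(deaths, withdrawals):
        effective_at_risk = at_start - w / 2.0   # withdrawals count as half exposed
        survival *= 1.0 - d / effective_at_risk
        estimates.append(survival)
        at_start -= d + w                        # number entering the next interval
    return estimates


print(actuarial_survival(100, deaths=[10, 8, 5], withdrawals=[4, 6, 2]))
```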
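The distinction in the additive outlier entry can be seen by simulating, purely as an assumed example, a first-order autoregressive series x_t = φx_{t−1} + e_t. An additive outlier of size ω at time T displaces only the recorded value at T. An innovational outlier adds ω to the shock e_T, so its effect ωφ^k carries into the observation k steps later. The values φ = 0.6, ω = 5 and T = 10 are arbitrary.

```python
import random


def ar1(shocks, phi):
    """Generate an AR(1) series x_t = phi * x_{t-1} + e_t, starting from x_0 = e_0."""
    series = []
    prev = 0.0
    for e in shocks:
        prev = phi * prev + e
        series.append(prev)
    return series


rng = random.Random(1)
phi, omega, T = 0.6, 5.0, 10
shocks = [rng.gauss(0.0, 1.0) for _ in range(20)]

clean = ar1(shocks, phi)

# Additive outlier: the observation at T is displaced; the dynamics are untouched.
additive = list(clean)
additive[T] += omega

# Innovational outlier: the shock at T is displaced, so the effect propagates.
io_shocks = list(shocks)
io_shocks[T] += omega
innovational = ar1(io_shocks, phi)

for t in range(T, T + 4):
    print(t, round(additive[t] - clean[t], 3), round(innovational[t] - clean[t], 3))
```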
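For the adjusted treatment means entry, one common form, assumed here for a one-way analysis of covariance with a single covariate and a common slope, is ȳ_i(adj) = ȳ_i − b(x̄_i − x̄). Here b is the pooled within-group regression slope of the response on the covariate and x̄ is the overall covariate mean. The group data are invented.

```python
def adjusted_means(groups: dict) -> dict:
    """ANCOVA-adjusted group means using a pooled within-group slope.

    groups maps a treatment label to a list of (covariate, response) pairs.
    """
    sxy = sxx = 0.0
    all_x = []
    group_means = {}
    for label, pairs in groups.items():
        xs = [x for x, _ in pairs]
        ys = [y for _, y in pairs]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        group_means[label] = (mx, my)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)  # within-group cross-products
        sxx += sum((x - mx) ** 2 for x in xs)               # within-group sum of squares
        all_x.extend(xs)
    b = sxy / sxx                       # pooled within-group slope
    grand_x = sum(all_x) / len(all_x)   # overall covariate mean
    return {label: my - b * (mx - grand_x) for label, (mx, my) in group_means.items()}


data = {
    "A": [(10, 20), (12, 23), (14, 25)],
    "B": [(14, 26), (16, 29), (18, 31)],
}
print(adjusted_means(data))
```

In this example the raw means differ by 6, but most of that gap reflects the higher covariate values in group B; after adjustment the difference shrinks to 1.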
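The Whipple index mentioned under age heaping is not defined in this excerpt. In its commonly used form, assumed here, it compares the number of people reporting ages 25, 30, 35, 40, 45, 50, 55 and 60 with one fifth of those reporting ages 23 to 62 inclusive, multiplied by 100. A value of 100 indicates no preference for ages ending in 0 or 5, and 500 indicates that only such ages were reported. The age counts in the example are synthetic.

```python
def whipple_index(counts_by_age: dict) -> float:
    """Whipple index from a mapping of single-year age -> number of people.

    Uses the conventional age range 23-62 and terminal digits 0 and 5.
    """
    in_range = sum(counts_by_age.get(age, 0) for age in range(23, 63))
    on_multiples = sum(counts_by_age.get(age, 0) for age in range(25, 61, 5))
    return 100.0 * on_multiples / (in_range / 5.0)


# No heaping: equal counts at every age give an index of 100.
flat = {age: 100 for age in range(0, 90)}
print(whipple_index(flat))

# Heaping: ages ending in 0 or 5 are reported twice as often as other ages.
heaped = {age: (200 if age % 5 == 0 else 100) for age in range(0, 90)}
print(whipple_index(heaped))
```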
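The agglomerative procedure in the entry can be sketched, under assumptions made only for illustration, with one-dimensional data and single linkage, where the distance between two clusters is the smallest distance between their members. Each merge is recorded with the distance at which it occurs, which is the information a dendrogram displays. Stopping when a chosen number of clusters remains corresponds to cutting the dendrogram. The brute-force search is for clarity, not efficiency.

```python
from itertools import combinations


def single_linkage(points, n_clusters=1):
    """Merge clusters of 1-D points until n_clusters remain.

    Returns (clusters, merges), where merges lists (cluster_a, cluster_b, distance).
    """
    clusters = [[p] for p in points]   # every individual starts in its own cluster
    merges = []

    def distance(a, b):
        return min(abs(x - y) for x in a for y in b)  # single-linkage distance

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters under the single-linkage distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], distance(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges


clusters, merges = single_linkage([1.0, 1.5, 5.0, 5.2, 9.0], n_clusters=2)
print(clusters)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 2))
```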
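The chance correction described in the agreement entry is that of Cohen’s kappa coefficient. With observed agreement p_o (the proportion of cases on the diagonal of the two-by-two table) and chance-expected agreement p_e (computed from the row and column margins), κ = (p_o − p_e)/(1 − p_e). The table of counts is invented.

```python
def cohens_kappa(table) -> float:
    """Cohen's kappa for a square contingency table of two raters' classifications."""
    n = sum(sum(row) for row in table)
    k = len(table)
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected if the two raters classified independently.
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    return (p_observed - p_expected) / (1 - p_expected)


# Rows: rater 1 (positive, negative); columns: rater 2 (positive, negative).
table = [[40, 10],
         [5, 45]]
print(round(cohens_kappa(table), 3))  # 0.7
```

Here the raters agree on 85% of cases, but 50% agreement would be expected by chance alone, so κ = 0.35/0.5 = 0.7.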
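The AIC entry stops at the definition. The usual form is AIC = −2 ln L + 2k, where L is the maximized likelihood and k the number of estimated parameters, with smaller values preferred. The sketch below is an illustrative assumption rather than the dictionary’s own example: it compares a normal model with mean fixed at 0 (one estimated parameter, the variance) against one with both mean and variance estimated, using maximum-likelihood estimates on invented data.

```python
import math


def normal_max_loglik(data, mean=None):
    """Maximized normal log-likelihood; the mean is estimated unless supplied."""
    n = len(data)
    mu = sum(data) / n if mean is None else mean
    sigma2 = sum((x - mu) ** 2 for x in data) / n   # ML estimate of the variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def aic(log_likelihood: float, k: int) -> float:
    """Akaike's information criterion: -2 ln L + 2k."""
    return -2.0 * log_likelihood + 2 * k


data = [2.1, 1.8, 2.5, 1.9, 2.3, 2.0, 2.4, 1.7]
aic_fixed_mean = aic(normal_max_loglik(data, mean=0.0), k=1)
aic_free_mean = aic(normal_max_loglik(data), k=2)
print(aic_fixed_mean, aic_free_mean)  # the free-mean model has the smaller AIC here
```

The extra parameter costs 2 units of AIC, far less than the gain in log-likelihood from letting the mean move away from 0 for these data.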


Author | Title | Year
Brian S. Everitt | The Cambridge Dictionary of Statistics. 3rd Edition | 2006