Keywords: Textbook, Statistical Model, Statistical Modeling.
Notes
- Website: http://statwww.epfl.ch/people/~davison/SM
- First and second chapter: http://statwww.epfl.ch/davison/SM/SMsample.pdf
- Table of Contents
  - 1. Introduction
  - 2. Variation
  - 3. Uncertainty
  - 4. Likelihood
  - 5. Models
  - 6. Stochastic Models
  - 7. Estimation and Hypothesis Testing
  - 8. Linear Regression Models
  - 9. Designed Experiments
  - 10. Nonlinear Regression Models
  - 11. Bayesian Models
  - 12. Conditional and Marginal Inference
- Key words and phrases: Confidence Interval, Random Variable, Maximum Likelihood Estimate, Sufficient Statistic, Poisson Process, Linear Model, Moment-Generating Function, Design Matrix, Order Statistics, Bayes Factor, Hazard Function, Marginal Likelihood, Markov Chain, Standard Error, Log-Linear Model, Likelihood Ratio, Laplace Approximation, Quantile, Weighted Least Squares, Analysis of Variance, Nonlinear Regression Model, Bayesian Model, Conditional Inference, Marginal Inference, Stochastic Model, Estimation, Hypothesis Testing, Linear Regression Model.
Quotes
Book overview
- Models and likelihood are the backbone of modern statistics and data analysis. The coverage is unrivalled, with sections on survival analysis, missing data, Markov chains, Markov random fields, point processes, graphical models, simulation and Markov chain Monte Carlo, estimating functions, asymptotic approximations, local likelihood and spline regressions as well as on more standard topics. Anthony Davison blends theory and practice to provide an integrated text for advanced undergraduate and graduate students, researchers and practitioners. Its comprehensive coverage makes this the standard text and reference in the subject.
Preface
1 Introduction
- Statistics concerns what can be learned from data. Applied statistics comprises a body of methods for data collection and analysis across the whole range of science, and in areas such as engineering, medicine, business, and law — wherever variable data must be summarized, or used to test or confirm theories, or to inform decisions. Theoretical statistics underpins this by providing a framework for understanding the properties and scope of methods used in applications.
- Statistical ideas may be expressed most precisely and economically in mathematical terms, but contact with data and with scientific reasoning has given statistics a distinctive outlook. Whereas mathematics is often judged by its elegance and generality, many statistical developments arise as a result of concrete questions posed by investigators and data that they hope will provide answers, and elegant and general solutions are not always available. The huge variety of such problems makes it hard to develop a single over-arching theory, but nevertheless common strands appear. Uniting them is the idea of a statistical model.
- The key feature of a statistical model is that variability is represented using probability distributions, which form the building-blocks from which the model is constructed. Typically it must accommodate both random and systematic variation. The randomness inherent in the probability distribution accounts for apparently haphazard scatter in the data, and systematic pattern is supposed to be generated by structure in the model. The art of modelling lies in finding a balance that enables the questions at hand to be answered or new ones posed. The complexity of the model will depend on the problem at hand and the answer required, so different models and analyses may be appropriate for a single set of data.
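As an illustration of the split between systematic and random variation described above, here is a minimal sketch in Python. The straight-line form, the parameter names `beta0`, `beta1`, `sigma`, and the function `simulate_linear_model` are assumptions chosen for this example and do not come from the book; any other systematic structure or error distribution could be substituted.

```python
import random


def simulate_linear_model(x, beta0, beta1, sigma, seed=0):
    """Simulate y_j = beta0 + beta1 * x_j + eps_j with eps_j ~ N(0, sigma^2).

    The straight line is the systematic part of the (hypothetical) model;
    the normal errors represent the apparently haphazard scatter.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    systematic = [beta0 + beta1 * xj for xj in x]
    noise = [rng.gauss(0.0, sigma) for _ in x]
    y = [m + e for m, e in zip(systematic, noise)]
    return systematic, y


x = [0.0, 1.0, 2.0, 3.0, 4.0]
systematic, y = simulate_linear_model(x, beta0=1.0, beta1=2.0, sigma=0.5)
for xj, mj, yj in zip(x, systematic, y):
    print(f"x={xj:.1f}  systematic={mj:.2f}  observed={yj:.2f}")
```

Setting `sigma=0.0` removes the random component, so the simulated observations coincide with the systematic part.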
Outline
- The idea of treating data as outcomes of random variables has implications for how they should be treated. For example, graphical and numerical summaries of the observations will show variation, and it is important to understand its consequences.
Notation
- Probability, expectation, variance, covariance, and correlation are denoted Pr(·), E(·), var(·), cov(·, ·), and corr(·, ·), while cum(·, ·, ...) is occasionally used to denote a cumulant. We use I(A) to denote the indicator random variable, which equals 1 if the event A occurs and 0 otherwise.
- We mostly reserve Z for standard normal random variables.
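The notation above can be tied together by the standard identity E{I(A)} = Pr(A): the expectation of an indicator is the probability of its event. The sketch below checks this numerically for the event A = {Z > 1.96} with Z standard normal. The threshold 1.96, the number of draws, and the helper names `indicator` and `tail_probability_mc` are assumptions made for this example, not taken from the book.

```python
import random
from statistics import NormalDist


def indicator(event_occurred):
    """I(A): 1 if the event A occurs and 0 otherwise."""
    return 1 if event_occurred else 0


def tail_probability_mc(threshold, n_draws=100_000, seed=1):
    """Estimate Pr(Z > threshold) as the average of I(Z > threshold)."""
    rng = random.Random(seed)
    hits = sum(indicator(rng.gauss(0.0, 1.0) > threshold) for _ in range(n_draws))
    return hits / n_draws


# Exact value from the standard normal distribution function.
exact = 1.0 - NormalDist().cdf(1.96)
# Monte Carlo value: the sample mean of indicator variables.
approx = tail_probability_mc(1.96)
print(f"exact={exact:.4f}  simulated={approx:.4f}")
```

The simulated value is itself a statistic with sampling variation, so it agrees with the exact probability only approximately.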
2 Variation
2.1.1 Data Summaries
- We generally deal with an ensemble of n observations, y1, ..., yn, known as a sample. Occasionally interest centres on the given sample alone, and if n is not tiny it will be useful to summarize the data in terms of a few numbers. We say that a quantity s = s(y1, ..., yn) that can be calculated from y1, ..., yn is a statistic.
- Two basic features of a sample are its typical value and a measure of how spread out the sample is, sometimes known respectively as location and scale. They can be summarized in many ways.
- The fundamental idea of statistical modelling is to treat data as the observed values of random variables. The most basic model is that the data y1, ..., yn available are the observed values of a random sample of size n, defined to be a collection of n independent identically distributed random variables, Y1, ..., Yn. We suppose that each of the Yj has the same cumulative distribution function, F, which represents the population from which the sample has been taken. If F were known, we could in principle use the rules of probability calculus to deduce any of its properties (such as its mean and variance, or the probability distribution for a future observation), and any difficulties would be purely computational. In practice, however, F is unknown, and we must try to infer its properties from the data.
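The ideas in this section can be sketched in a few lines of Python. The code computes some common statistics for location (mean, median) and scale (sample standard deviation), and builds the empirical distribution function, one standard way to estimate an unknown F from a sample. The data values and the function names `location_and_scale` and `empirical_cdf` are made up for illustration; the notes do not prescribe these particular summaries.

```python
import bisect
import statistics


def location_and_scale(sample):
    """Return a few statistics s(y1, ..., yn) summarizing location and scale."""
    return {
        "mean": statistics.fmean(sample),       # location
        "median": statistics.median(sample),    # location, robust to outliers
        "std_dev": statistics.stdev(sample),    # scale, uses divisor n - 1
    }


def empirical_cdf(sample):
    """Return F_hat, where F_hat(y) is the proportion of observations <= y."""
    ordered = sorted(sample)
    n = len(ordered)

    def F_hat(y):
        # bisect_right counts how many ordered values are <= y
        return bisect.bisect_right(ordered, y) / n

    return F_hat


sample = [2.0, 4.0, 4.0, 5.0, 10.0]  # hypothetical data
summary = location_and_scale(sample)
F_hat = empirical_cdf(sample)
print(summary)
print(F_hat(4.0))
```

For this sample the mean (5.0) exceeds the median (4.0) because of the single large value 10.0, which shows how different summaries of location can disagree.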
2.4 Moments and Cumulants