Statistical Model Family

Jump to navigation Jump to search

A statistical model family is a mathematical model family that defines a probability function set by means of a statistical model parameter vector [math]\displaystyle{ (S\theta, B\theta, P\theta, f\theta, W, P) }[/math], where ...



  • (Wikipedia, 2014) ⇒ Retrieved:2014-8-12.
    • A statistical model is a formalization of relationships between variables in the form of mathematical equations. A statistical model describes how one or more random variables are related to one or more other variables. The model is statistical as the variables are not deterministically but stochastically related. In mathematical terms, a statistical model is frequently thought of as a pair [math]\displaystyle{ (Y, P) }[/math] where [math]\displaystyle{ Y }[/math] is the set of possible observations and [math]\displaystyle{ P }[/math] the set of possible probability distributions on [math]\displaystyle{ Y }[/math]. It is assumed that there is a distinct element of [math]\displaystyle{ P }[/math] which generates the observed data. Statistical inference enables us to make statements about which element(s) of this set are likely to be the true one.

      Most statistical tests can be described in the form of a statistical model. For example, the Student's t-test for comparing the means of two groups can be formulated as seeing if an estimated parameter in the model is different from 0. Another similarity between tests and models is that there are assumptions involved. Error is assumed to be normally distributed in most models.


    • QUOTE: A statistical model is a collection of probability distribution functions or probability density functions (collectively referred to as distributions for brevity). A parametric model is a collection of distributions, each of which is indexed by a unique finite-dimensional parameter: [math]\displaystyle{ \mathcal{P}=\{\mathbb{P}_{\theta} : \theta \in \Theta\} }[/math], where [math]\displaystyle{ \theta }[/math] is a parameter and [math]\displaystyle{ \Theta \subseteq \mathbb{R}^d }[/math] is the feasible region of parameters, which is a subset of d-dimensional Euclidean space. A statistical model may be used to describe the set of distributions from which one assumes that a particular data set is sampled. For example, if one assumes that data arise from a univariate Gaussian distribution, then one has assumed a Gaussian model: [math]\displaystyle{ \mathcal{P}=\{\mathbb{P}(x; \mu, \sigma) = \frac{1}{\sqrt{2 \pi} \sigma} \exp\left\{ -\frac{1}{2\sigma^2}(x-\mu)^2\right\} : \mu \in \mathbb{R}, \sigma \gt 0\} }[/math].

      A non-parametric model is a set of probability distributions with infinite dimensional parameters, and might be written as [math]\displaystyle{ \mathcal{P}=\{\text{all distributions}\} }[/math]. A semi-parametric model also has infinite dimensional parameters, but is not dense in the space of distributions. For example, a mixture of Gaussians with one Gaussian at each data point is dense in the space of distributions. Formally, if d is the dimension of the parameter, and n is the number of samples, if [math]\displaystyle{ d \rightarrow \infty }[/math] as [math]\displaystyle{ n \rightarrow \infty }[/math] and [math]\displaystyle{ d/n \rightarrow 0 }[/math] as [math]\displaystyle{ n \rightarrow \infty }[/math], then the model is semi-parametric.

    • In probability and statistics, an 'exponential family is an important class of probability distributions sharing a certain form, specified below. ... Properly speaking, there is no such thing as "the" exponential family, but rather an exponential family, and properly speaking, it is not a "distribution" but a family of distributions that either is or is not an exponential family. The problem lies in the fact that we often say, e.g., "the normal distribution" when properly we mean something like "the family of normal distributions with unknown mean and variance". A family of distributions is defined by a set of parameters that can be varied, and what makes a family be an exponential family is a particular relationship between the domain of a family of distributions (the variable over which each distribution in the family is defined) and the parameters.




  • American Meteorology Society. (2007). “Glossary of Meteorology"
    • QUOTE: stochastic model — A model of a system that includes some sort of random forcing. In many cases, stochastic models are used to simulate deterministic systems that include smaller- scale phenomena that cannot be accurately observed or modeled. As such, these small-scale phenomena are effectively unpredictable. A good stochastic model manages to represent the average effect of unresolved phenomena on larger-scale phenomena in terms of a random forcing.


  • (Cox, 2006) ⇒ David R. Cox. (2006). “Principles of Statistical Inference." Cambridge University Press. ISBN:9780521685672
    • QUOTE: Key ideas about probability models and the objectives of statistical analysis are introduced. The differences between frequentist and Bayesian analyses are illustrated in a very special case. ... We use throughout the notation that observable random variables are represented by capital letters and observations by the corresponding lower case letters. ... A model, or strictly a family of models, specifies the density of [math]\displaystyle{ Y }[/math] to be [math]\displaystyle{ f_Y(y:z:\theta) }[/math] Where θ ⊂ Ωθ is unknown. The distribution may depend also on design features of the study that generated the data. We typically simplify the notation to fY(y://θ), although the explanatory variables [math]\displaystyle{ z }[/math] are frequently essential in specific applications. To chose the model appropriately is crucial to fruitful application.



  • (Isaev, 2004) ⇒ Alexander Isaev. (2004). “Introduction to Mathematical Methods in Bioinformatics." Springer. ISBN:3540219730
    • QUOTE: A statistical model is a family of probability spaces [math]\displaystyle{ {(S_\theta, \mathcal{B}_\theta, P_\theta)} }[/math] and a family of random variables [math]\displaystyle{ {f_\theta} }[/math] with common range [math]\displaystyle{ \mathcal{W} \subset \Re }[/math], each defined on the respective space for [math]\displaystyle{ \theta \in \mathcal{P} }[/math], where [math]\displaystyle{ \mathcal{P} }[/math] is an index set. The variable [math]\displaystyle{ \theta }[/math] denotes the parameters of the model, the set is called the range of the model, and the index set [math]\displaystyle{ \mathcal{P} }[/math] the parameter space of the model. Hence a statistical model can be thought of as a family [math]\displaystyle{ \{(S_\theta, \mathcal{B}_\theta, P_\theta, f_\theta, \mathcal{W}, \mathcal{P})\} }[/math]. When studying and applying statistical models, one is primarily interested in the distributions of [math]\displaystyle{ f_\theta }[/math]. Therefore, often statistical models are not specified in full as in Definition 8.1, but only the family [math]\displaystyle{ \{F_\theta = F_{f_\theta}\} }[/math], [math]\displaystyle{ \theta \in \mathcal{P} }[/math], of the distribution functions of [math]\displaystyle{ f_\theta }[/math] is given. Once the family [math]\displaystyle{ \{F_\theta\}, \theta \in \mathcal{P} }[/math], is specified, one can construct a statistical model in the sense of definition 8.1, for example, by setting [math]\displaystyle{ S_\theta = \Re }[/math], [math]\displaystyle{ \mathcal{B}_\theta = \mathcal{B}(\mathcal{F}_0) }[/math] (the σ-algebra of Borel sets in $\Re$), [math]\displaystyle{ P_\theta = P_{F_\theta} }[/math], [math]\displaystyle{ f_\theta(x)=x }[/math], [math]\displaystyle{ \mathcal{W} = \Re }[/math], for [math]\displaystyle{ \theta \in \mathcal{P} }[/math] (see Sect. 6.7). This model, however, is not always useful. We also remark that one can consider more general statistical models by allowing [math]\displaystyle{ f_\theta }[/math] to be vector-valued.




  • (Hogg & Ledolter, 1987) ⇒ Robert V. Hogg and Johannes Ledolter. (1987). “Engineering Statistics. Macmillan Publishing Company.
    • QUOTE: In applied mathematics we are usually concerned with either deterministic or probabilistic models, although in many instances these are intertwined. ... a deterministic model because everything is known once ... conditions are specified.



  • (Thomas & Ross, 1980) ⇒ Ewart A. C. Thomas, and Brian H. Ross. (1980). “On appropriate procedures for combining probability distributions within the same family.” In: Journal of Mathematical Psychology, 21(2). [doi>10.1016/0022-2496(80)90003-6]