Neyman-Fisher factorization theorem

The Neyman-Fisher factorization theorem is a criterion in statistical inference that provides a method for obtaining sufficient statistics.



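Let ƒθ(x) denote the density of the observations x. A statistic T is sufficient for θ if and only if there exist nonnegative functions gθ and h such that
[math] f_\theta(x)=h(x) \, g_\theta(T(x)), \,\![/math]
i.e. the density ƒ can be factored into a product such that one factor, h, does not depend on θ and the other factor, g, which does depend on θ, depends on x only through T(x).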
It is easy to see that if F(t) is a one-to-one function and T is a sufficient statistic, then F(T) is a sufficient statistic. In particular we can multiply a sufficient statistic by a nonzero constant and get another sufficient statistic.
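
The factorization above can be checked numerically on a small case. The following is a minimal sketch, assuming an i.i.d. Bernoulli(θ) sample and the statistic T(x) = Σxᵢ; the model and the function names (`likelihood`, `g`, `h`, `factorization_holds`) are illustrative choices, not part of the theorem's statement.

```python
from itertools import product
from math import isclose


def likelihood(x, theta):
    """Joint probability of an i.i.d. Bernoulli(theta) sample x (tuple of 0/1)."""
    prob = 1.0
    for xi in x:
        prob *= theta if xi == 1 else 1.0 - theta
    return prob


def T(x):
    """Candidate sufficient statistic: the number of successes."""
    return sum(x)


def g(t, theta, n):
    """Factor that depends on theta and on x only through t = T(x)."""
    return theta**t * (1.0 - theta) ** (n - t)


def h(x):
    """Factor that does not depend on theta (constant for this model)."""
    return 1.0


def factorization_holds(n, thetas):
    """Check f_theta(x) == h(x) * g_theta(T(x)) for every 0/1 sample of size n."""
    return all(
        isclose(likelihood(x, theta), h(x) * g(T(x), theta, n))
        for x in product((0, 1), repeat=n)
        for theta in thetas
    )


print(factorization_holds(4, [0.2, 0.5, 0.9]))  # True
```

Since T(x)/n is a one-to-one function of T(x), the same check goes through with the sample mean in place of the sum, provided g is rewritten in terms of the mean.
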
  • Retrieved 2017-11-08.
    • Better known as the “Neyman-Fisher Factorization Criterion”, it provides a relatively simple procedure either to obtain sufficient statistics or to check whether a specific statistic is sufficient. Fisher first established the Factorization Criterion as a sufficient condition for sufficient statistics in 1922. Neyman demonstrated its necessity under certain restrictive conditions in 1935. Finally, Halmos and Savage extended it in 1949 as follows:
Let ℘={Pθ, θ∈Ω} be a family of probability measures on a measurable space (ΘX,A), absolutely continuous with respect to a σ-finite measure μ.
Suppose that its probability densities in the Radon-Nikodym sense, pθ=dPθ/dμ, exist a.s.[μ] (almost surely with respect to μ). A necessary and sufficient condition for the sufficiency with respect to ℘ of a statistic T transforming the probability space (ΘX,A,Pθ) into (ΘT,B,PT) is the existence, ∀θ∈Ω, of a T⁻¹(B)-measurable function gθ(T(x)) and an A-measurable function h(x)≠0 a.s.[Pθ], both defined ∀x∈ΘX, nonnegative and μ-integrable, such that pθ(x)=gθ(T(x))·h(x), a.s.[μ].
  • The densities pθ can be probability density functions of absolutely continuous random variables or probability functions of discrete random variables, among other possibilities, depending on the nature and definition of μ.

    In common economic practice, this Factorization Criterion takes simpler forms. Thus, in Estimation Theory under random sampling, the criterion is usually stated as follows:

Let X be a random variable belonging to a regular family of distributions F(x;θ) which depends on a parameter θ (a mixture of absolutely continuous and discrete random variables taking values that do not depend on the parameter) representing some characteristic of a certain population. Moreover, let x=(X1, X2, …, Xn) represent a random sample of size n of X, drawn from that population.
A necessary and sufficient condition for the sufficiency of a statistic T=t(x) with respect to the family of distributions F(x;θ) is that the sample likelihood function Ln(x;θ) can be factorized as Ln(x;θ)=g(T;θ)·h(x). Here, “g” and “h” are non-negative real functions, with “g” depending on the sample observations only through the statistic and “h” not depending on the parameter.
  • When the random variable is absolutely continuous (discrete), the function g(t;θ) is closely related to the probability density function (probability function) of the statistic T. Thus, the criterion can be equivalently stated by taking g(t;θ) to be exactly that probability density function (probability function). In this case, the usually complex task of deducing the distribution x|T of the original observations conditional on the statistic T becomes easier, since it is specified by h(x).
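
The remark about h(x) specifying the conditional distribution x|T can be illustrated on a discrete case. This is a minimal sketch, assuming an i.i.d. Bernoulli(θ) sample with T = ΣXᵢ, so that T is Binomial(n, θ) and g(t;θ) is taken to be its probability function; the function names are hypothetical and chosen only for this example.

```python
from itertools import product
from math import comb, isclose


def joint_pmf(x, theta):
    """Sample likelihood L_n(x; theta) for an i.i.d. Bernoulli(theta) sample."""
    t = sum(x)
    return theta**t * (1.0 - theta) ** (len(x) - t)


def pmf_T(t, theta, n):
    """g(t; theta): probability function of T = sum(X), i.e. Binomial(n, theta)."""
    return comb(n, t) * theta**t * (1.0 - theta) ** (n - t)


def h(x):
    """Parameter-free factor left over once g is the probability function of T."""
    return 1.0 / comb(len(x), sum(x))


def conditional_pmf(x, theta):
    """P(X = x | T = sum(x)), computed directly from the definition."""
    n, t = len(x), sum(x)
    prob_t = sum(
        joint_pmf(y, theta) for y in product((0, 1), repeat=n) if sum(y) == t
    )
    return joint_pmf(x, theta) / prob_t


x = (1, 0, 0, 1, 0)
for theta in (0.1, 0.5, 0.8):
    # L_n(x; theta) = g(T; theta) * h(x)
    assert isclose(joint_pmf(x, theta), pmf_T(sum(x), theta, len(x)) * h(x))
    # h(x) coincides with the conditional probability, whatever theta is
    assert isclose(conditional_pmf(x, theta), h(x))

print(h(x))  # 0.1, i.e. 1 / C(5, 2)
```

In this example the conditional distribution is uniform over the C(n, t) arrangements with t successes, which is why it carries no information about θ.
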

    According to this Factorization Criterion, any invertible function T* = k(T) of a sufficient statistic T is also a sufficient statistic.

    Likewise, the exponential family, defined by probability density functions of the form f(x;θ)=k(x)·p(θ)·exp[c(θ)′T(x)], always admits a sufficient statistic T.
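
As an illustration of the exponential-family form, the sketch below writes the Poisson(θ) probability function as k(x)·p(θ)·exp[c(θ)·T(x)] with k(x)=1/x!, p(θ)=e^(−θ), c(θ)=log θ and T(x)=x. The choice of the Poisson model and the helper names are assumptions made for this example. For a random sample, the likelihood is a product of such terms, so it depends on θ only through ΣT(xᵢ); the code checks this by comparing two samples with the same sum.

```python
from math import exp, factorial, isclose, log


def k(x):
    return 1.0 / factorial(x)


def p(theta):
    return exp(-theta)


def c(theta):
    return log(theta)


def T(x):
    return x


def poisson_pmf(x, theta):
    """Poisson probability function in its usual form."""
    return theta**x * exp(-theta) / factorial(x)


def exp_family_pmf(x, theta):
    """Same probability function written as k(x) * p(theta) * exp[c(theta) * T(x)]."""
    return k(x) * p(theta) * exp(c(theta) * T(x))


def sample_likelihood(xs, theta):
    """Likelihood of an i.i.d. sample xs under the exponential-family form."""
    out = 1.0
    for x in xs:
        out *= exp_family_pmf(x, theta)
    return out


# The two forms agree pointwise.
assert all(
    isclose(poisson_pmf(x, th), exp_family_pmf(x, th))
    for x in range(8)
    for th in (0.5, 2.0, 7.0)
)

# Two samples with the same value of sum(T(x_i)) = 5: their likelihood ratio
# equals prod k(a_i) / prod k(b_i) and is therefore free of theta.
a, b = (0, 3, 2), (1, 1, 3)
ratios = [sample_likelihood(a, th) / sample_likelihood(b, th) for th in (0.5, 2.0, 7.0)]
print(all(isclose(r, 0.5) for r in ratios))  # True
```
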