Poisson Regression Algorithm


A Poisson Regression Algorithm is a generalized linear model regression algorithm that is restricted to the Poisson distribution family: it models a count-valued response variable whose conditional mean is linked, typically through a log link function, to a linear combination of the predictors.
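As a concrete illustration, the following is a minimal sketch of the fitting step, a Newton–Raphson (equivalently, IRLS) loop for a Poisson GLM with a log link; the function name fit_poisson_regression and the simulated data are hypothetical, not taken from any particular library:

    import numpy as np

    def fit_poisson_regression(X, y, n_iter=25, tol=1e-8):
        # Newton-Raphson / IRLS for a Poisson GLM with a log link.
        # Assumes X already contains an intercept column.
        beta = np.zeros(X.shape[1])
        for _ in range(n_iter):
            mu = np.exp(X @ beta)            # E[y | x] under the log link
            grad = X.T @ (y - mu)            # score of the Poisson log-likelihood
            hess = X.T @ (X * mu[:, None])   # Fisher information (= -Hessian)
            step = np.linalg.solve(hess, grad)
            beta += step
            if np.max(np.abs(step)) < tol:
                break
        return beta

    # Hypothetical simulated counts with rate lambda_i = exp(0.5 + 1.2 * x_i)
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=500)
    X = np.column_stack([np.ones_like(x), x])
    y = rng.poisson(np.exp(0.5 + 1.2 * x))
    print(fit_poisson_regression(X, y))      # roughly [0.5, 1.2]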



References

2015

  • (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Poisson_distribution#Maximum_likelihood Retrieved:2015-6-14.
    • Given a sample of n measured values k_i ∈ {0, 1, 2, ...}, for i = 1, ..., n, we wish to estimate the value of the parameter λ of the Poisson population from which the sample was drawn. The maximum likelihood estimate is : [math]\displaystyle{ \widehat{\lambda}_\mathrm{MLE}=\frac{1}{n}\sum_{i=1}^n k_i. \! }[/math] Since each observation has expectation λ, so does this sample mean. Therefore the maximum likelihood estimate is an unbiased estimator of λ. It is also an efficient estimator, i.e. its estimation variance achieves the Cramér–Rao lower bound (CRLB); hence it is minimum-variance unbiased. It can also be proved that the sum (and hence the sample mean, as it is a one-to-one function of the sum) is a complete and sufficient statistic for λ.
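      As a quick numerical check (a hypothetical simulation, not part of the quoted source), the MLE is just the sample mean; averaged over many replications it recovers λ, and its variance matches the CRLB of λ/n:

        import numpy as np

        rng = np.random.default_rng(1)
        lam, n, reps = 3.7, 50, 20_000

        samples = rng.poisson(lam, size=(reps, n))
        mle = samples.mean(axis=1)      # one sample-mean MLE per replication

        print(mle.mean())               # close to lam = 3.7 (unbiased)
        print(mle.var(), lam / n)       # variance attains the CRLB lam/n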

      To prove sufficiency we may use the factorization theorem. Consider partitioning the probability mass function of the joint Poisson distribution for the sample into two parts: one that depends solely on the sample [math]\displaystyle{ \mathbf{x} }[/math] (called [math]\displaystyle{ h(\mathbf{x}) }[/math]) and one that depends on the parameter [math]\displaystyle{ \lambda }[/math] and the sample [math]\displaystyle{ \mathbf{x} }[/math] only through the function [math]\displaystyle{ T(\mathbf{x}) }[/math]. Then [math]\displaystyle{ T(\mathbf{x}) }[/math] is a sufficient statistic for [math]\displaystyle{ \lambda }[/math]. : [math]\displaystyle{ P(\mathbf{x})=\prod_{i=1}^n\frac{\lambda^{x_i} e^{-\lambda}}{x_i!}=\frac{1}{\prod_{i=1}^n x_i!} \times \lambda^{\sum_{i=1}^n x_i}e^{-n\lambda} }[/math] Note that the first term, [math]\displaystyle{ h(\mathbf{x}) }[/math], depends only on [math]\displaystyle{ \mathbf{x} }[/math]. The second term, [math]\displaystyle{ g(T(\mathbf{x})|\lambda) }[/math], depends on the sample only through [math]\displaystyle{ T(\mathbf{x})=\sum_{i=1}^n x_i }[/math]. Thus, [math]\displaystyle{ T(\mathbf{x}) }[/math] is sufficient.
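      To see the factorization in action (a hypothetical check, not part of the quoted source), two samples with the same sum T(x) have likelihoods whose ratio is the same for every λ, since the λ-dependent factor g(T(x)|λ) cancels and only h(x1)/h(x2) remains:

        import numpy as np
        from scipy.stats import poisson

        x1 = np.array([1, 2, 3])        # T(x1) = 6
        x2 = np.array([0, 2, 4])        # T(x2) = 6 as well

        for lam in [0.5, 1.0, 2.5]:
            ratio = poisson.pmf(x1, lam).prod() / poisson.pmf(x2, lam).prod()
            print(lam, ratio)           # always 4.0 = h(x1)/h(x2), free of lam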

      To find the parameter λ that maximizes the probability function for the Poisson population, we can use the logarithm of the probability function: : [math]\displaystyle{ \begin{align} L(\lambda) & = \ln \prod_{i=1}^n f(k_i \mid \lambda) \\ & = \sum_{i=1}^n \ln\!\left(\frac{e^{-\lambda}\lambda^{k_i}}{k_i!}\right) \\ & = -n\lambda + \left(\sum_{i=1}^n k_i\right) \ln(\lambda) - \sum_{i=1}^n \ln(k_i!). \end{align} }[/math] We take the derivative of L with respect to λ and set it equal to zero: : [math]\displaystyle{ \frac{\mathrm{d}}{\mathrm{d}\lambda} L(\lambda) = 0 \iff -n + \left(\sum_{i=1}^n k_i\right) \frac{1}{\lambda} = 0. \! }[/math] Solving for λ gives a stationary point. : [math]\displaystyle{ \lambda = \frac{\sum_{i=1}^n k_i}{n} }[/math] So λ is the average of the k_i values. The sign of the second derivative of L at the stationary point determines what kind of extreme value λ is. : [math]\displaystyle{ \frac{\partial^2 L}{\partial \lambda^2} = -\lambda^{-2}\sum_{i=1}^n k_i }[/math] Evaluating the second derivative at the stationary point gives: : [math]\displaystyle{ \frac{\partial^2 L}{\partial \lambda^2} = - \frac{n^2}{\sum_{i=1}^n k_i} }[/math] which is the negative of n times the reciprocal of the average of the k_i. This expression is negative whenever the average is positive, in which case the stationary point maximizes the probability function.
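      The closed-form answer can be checked numerically (a hypothetical example, not part of the quoted source) by minimizing the negative log-likelihood over λ and comparing the result with the sample mean:

        import numpy as np
        from scipy.optimize import minimize_scalar
        from scipy.special import gammaln

        k = np.array([2, 0, 3, 1, 4, 2])   # hypothetical count data

        def neg_log_lik(lam):
            # -L(lam) = n*lam - (sum k_i) * ln(lam) + sum ln(k_i!)
            return len(k) * lam - k.sum() * np.log(lam) + gammaln(k + 1).sum()

        res = minimize_scalar(neg_log_lik, bounds=(1e-9, 20.0), method="bounded")
        print(res.x, k.mean())             # numerical maximizer matches the mean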

      For completeness, a family of distributions is said to be complete if and only if [math]\displaystyle{ E(g(T)) = 0 }[/math] implies that [math]\displaystyle{ P_\lambda(g(T) = 0) = 1 }[/math] for all [math]\displaystyle{ \lambda }[/math]. If the individual [math]\displaystyle{ X_i }[/math] are iid [math]\displaystyle{ \mathrm{Po}(\lambda) }[/math], then [math]\displaystyle{ T(\mathbf{x})=\sum_{i=1}^n X_i\sim \mathrm{Po}(n\lambda) }[/math]. Knowing the distribution we want to investigate, it is easy to see that the statistic is complete. : [math]\displaystyle{ E(g(T))=\sum_{t=0}^\infty g(t)\frac{(n\lambda)^te^{-n\lambda}}{t!}=0 }[/math] For this equality to hold for all [math]\displaystyle{ \lambda > 0 }[/math], [math]\displaystyle{ g(t) }[/math] must be 0 for every [math]\displaystyle{ t }[/math]: after dividing by [math]\displaystyle{ e^{-n\lambda} }[/math], the left-hand side is a power series in [math]\displaystyle{ n\lambda }[/math] with coefficients [math]\displaystyle{ g(t)/t! }[/math], and a power series that vanishes for all [math]\displaystyle{ \lambda }[/math] must have every coefficient equal to zero. Hence, [math]\displaystyle{ E(g(T)) = 0 }[/math] for all [math]\displaystyle{ \lambda }[/math] implies that [math]\displaystyle{ P_\lambda(g(T) = 0) = 1 }[/math], and the statistic has been shown to be complete.
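      The distributional fact used above, that the sum of n iid Po(λ) variables is Po(nλ), can also be checked by simulation (a hypothetical sketch, not part of the quoted source):

        import numpy as np
        from scipy.stats import poisson

        rng = np.random.default_rng(2)
        lam, n, reps = 1.3, 8, 100_000

        T = rng.poisson(lam, size=(reps, n)).sum(axis=1)   # T = sum of n iid Po(lam)

        for t in range(8, 13):
            print(t, (T == t).mean(), poisson.pmf(t, n * lam))  # empirical vs Po(n*lam)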