Hessian Matrix

From GM-RKB

A Hessian Matrix is a square matrix that represents the second-order partial derivatives of a scalar-valued function of several variables.
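
As an illustration of the definition, the following is a minimal sketch that approximates a Hessian numerically with central finite differences. The helper name `hessian_fd`, the step size `h`, and the test function f(x, y) = x^2 y + y^3 are assumptions chosen for this example, not part of the referenced sources.

```python
def hessian_fd(f, x, h=1e-4):
    """Approximate the Hessian of f at the point x (a list of floats)
    using central finite differences. Hypothetical helper for illustration."""
    n = len(x)

    def shifted(i, j, si, sj):
        # Evaluate f with x[i] moved by si*h and x[j] moved by sj*h.
        p = list(x)
        p[i] += si * h
        p[j] += sj * h
        return f(p)

    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Four-point stencil for the second partial d^2 f / (dx_i dx_j).
            H[i][j] = (
                shifted(i, j, 1, 1) - shifted(i, j, 1, -1)
                - shifted(i, j, -1, 1) + shifted(i, j, -1, -1)
            ) / (4.0 * h * h)
    return H


def f(p):
    # Example function (an assumption): f(x, y) = x^2 * y + y^3
    x, y = p
    return x**2 * y + y**3


# Analytically: f_xx = 2y, f_xy = f_yx = 2x, f_yy = 6y,
# so at (1, 2) the Hessian is [[4, 2], [2, 12]].
H = hessian_fd(f, [1.0, 2.0])
print(H)  # approximately [[4, 2], [2, 12]]
```

The result is only an approximation: too large a step increases truncation error, while too small a step amplifies floating-point round-off.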



References

2012

  • (Wikipedia - Hessian, 2012-May-17) ⇒ http://en.wikipedia.org/wiki/Hessian_matrix
    • In mathematics, the Hessian matrix (or simply the Hessian) is the square matrix of second-order partial derivatives of a function; that is, it describes the local curvature of a function of many variables. The Hessian matrix was developed in the 19th century by the German mathematician Ludwig Otto Hesse and later named after him. Hesse himself had used the term "functional determinants".

      Given the real-valued function [math]\displaystyle{ f(x_1, x_2, \dots, x_n),\,\! }[/math] if all second partial derivatives of f exist, then the Hessian matrix of f is the matrix [math]\displaystyle{ H(f)_{ij}(x) = D_i D_j f(x)\,\! }[/math] where [math]\displaystyle{ x = (x_1, x_2, \dots, x_n) }[/math] and [math]\displaystyle{ D_i }[/math] is the differentiation operator with respect to the ith argument and the Hessian becomes [math]\displaystyle{ H(f) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} \\[2.2ex] \dfrac{\partial^2 f}{\partial x_2\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2\,\partial x_n} \\[2.2ex] \vdots & \vdots & \ddots & \vdots \\[2.2ex] \dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_n\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}. }[/math]

      Because f is often clear from context, [math]\displaystyle{ H(f)(x) }[/math] is frequently shortened to simply [math]\displaystyle{ H(x) }[/math].

      The Hessian matrix is related to the Jacobian matrix by [math]\displaystyle{ H(f)(x) = J(\nabla \! f)(x) }[/math].

      Some mathematicians define the Hessian as the determinant of the above matrix.

      Hessian matrices are used in large-scale optimization problems within Newton-type methods because they are the coefficient of the quadratic term of a local Taylor expansion of a function. That is, [math]\displaystyle{ y=f(\mathbf{x}+\Delta\mathbf{x})\approx f(\mathbf{x}) + J(\mathbf{x})\Delta \mathbf{x} +\frac{1}{2} \Delta\mathbf{x}^\mathrm{T} H(\mathbf{x}) \Delta\mathbf{x} }[/math] where J is the Jacobian matrix, which is a vector (the gradient) for scalar-valued functions. The full Hessian matrix can be difficult to compute in practice; in such situations, quasi-Newton algorithms have been developed that use approximations to the Hessian. The best-known quasi-Newton algorithm is the BFGS algorithm.
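
The role of the Hessian in a Newton-type method can be sketched as follows. Minimizing the quadratic Taylor model with respect to the step gives the linear system H(x) Δx = -∇f(x). The example below is a minimal sketch for two variables only: the objective f(x, y) = x^2 + xy + 2y^2 - 3x, the function names `grad`, `hess`, and `newton_step`, and the use of Cramer's rule are all assumptions made for illustration.

```python
def grad(p):
    # Gradient of the example objective f(x, y) = x^2 + x*y + 2*y^2 - 3*x.
    x, y = p
    return [2.0 * x + y - 3.0, x + 4.0 * y]


def hess(p):
    # Hessian of the same objective; constant because f is quadratic.
    return [[2.0, 1.0], [1.0, 4.0]]


def newton_step(p):
    """Return p + dp, where dp solves H(p) dp = -grad(p) (2x2 case only)."""
    g = grad(p)
    (a, b), (c, d) = hess(p)
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("Hessian is singular; the Newton step is undefined")
    # Cramer's rule for the 2x2 system [[a, b], [c, d]] @ [dx, dy] = [-g0, -g1].
    dx = (-g[0] * d + b * g[1]) / det
    dy = (c * g[0] - a * g[1]) / det
    return [p[0] + dx, p[1] + dy]


# Because f is exactly quadratic, a single step from any start reaches
# the minimizer (12/7, -3/7), where the gradient vanishes.
p1 = newton_step([0.0, 0.0])
print(p1, grad(p1))
```

For a non-quadratic objective the step would be repeated until the gradient is sufficiently small, and a quasi-Newton method such as BFGS would replace `hess` with an approximation built up from successive gradients.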


2011

  • (Weisstein - Hessian, 2011-Jul-17) ⇒ Eric W. Weisstein. (2011). “Hessian.” From MathWorld -- A Wolfram Web Resource.
    • QUOTE: The Jacobian matrix of the derivatives [math]\displaystyle{ \partial{f}/\partial{x_1}, \partial{f}/\partial{x_2}, \dots, \partial{f}/\partial{x_n} }[/math] of a function [math]\displaystyle{ f(x_1,x_2,\dots,x_n) }[/math] with respect to [math]\displaystyle{ x_1, x_2, \dots, x_n }[/math] is called the Hessian [math]\displaystyle{ H }[/math] of [math]\displaystyle{ f }[/math], i.e., [math]\displaystyle{ Hf(x_1,x_2,\dots,x_n) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\,\partial x_2} & \dfrac{\partial^2 f}{\partial x_1\,\partial x_3} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} \\[2.2ex] \dfrac{\partial^2 f}{\partial x_2\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \dfrac{\partial^2 f}{\partial x_2\,\partial x_3} & \cdots & \dfrac{\partial^2 f}{\partial x_2\,\partial x_n} \\[2.2ex] \vdots & \vdots & \vdots & \ddots & \vdots \\[2.2ex] \dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_n\,\partial x_2} & \dfrac{\partial^2 f}{\partial x_n\,\partial x_3} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}. }[/math] As in the case of the Jacobian, the term "Hessian" unfortunately appears to be used both to refer to this matrix and to the determinant of this matrix (Gradshteyn and Ryzhik 2000, p. 1069). In the second derivative test for determining extrema of a function [math]\displaystyle{ f(x,y) }[/math], the discriminant [math]\displaystyle{ D }[/math] is given by [math]\displaystyle{ Hf(x,y) = \begin{vmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y} \\[2.2ex] \dfrac{\partial^2 f}{\partial y\,\partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{vmatrix} }[/math].
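
The second derivative test mentioned in the quote can be sketched in code. The function name `classify_critical_point` and the two example functions are assumptions for illustration. The sketch also assumes that the point being tested is already known to be a critical point and that the mixed partials are equal, so a single value `fxy` suffices.

```python
def classify_critical_point(fxx, fxy, fyy):
    """Apply the two-variable second derivative test at a critical point.

    fxx, fxy, fyy are the second partial derivatives evaluated at that point.
    """
    # Discriminant D: the determinant of the 2x2 Hessian.
    D = fxx * fyy - fxy * fxy
    if D > 0 and fxx > 0:
        return "local minimum"
    if D > 0 and fxx < 0:
        return "local maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive"  # D == 0: the test gives no information


# f(x, y) = x^2 + y^2 at (0, 0): fxx = 2, fxy = 0, fyy = 2
print(classify_critical_point(2.0, 0.0, 2.0))   # local minimum
# f(x, y) = x^2 - y^2 at (0, 0): fxx = 2, fxy = 0, fyy = -2
print(classify_critical_point(2.0, 0.0, -2.0))  # saddle point
```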

2000