Conjugate Gradient Optimization Algorithm
A Conjugate Gradient Optimization Algorithm is a batch function optimization algorithm that ...
- AKA: CG.
- …
- Counter-Example(s):
  - L-BFGS Algorithm.
  - Generalized Iterative Scaling.
- See: Nonlinear Conjugate Gradient Algorithm, Biconjugate Gradient Algorithm.
References
2012
- http://en.wikipedia.org/wiki/Conjugate_gradient_method
- QUOTE: In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is symmetric and positive-definite. The conjugate gradient method is an iterative method, so it can be applied to sparse systems that are too large to be handled by direct methods such as the Cholesky decomposition. Such systems often arise when numerically solving partial differential equations.
The conjugate gradient method can also be used to solve unconstrained optimization problems such as energy minimization. It was developed by Magnus Hestenes and Eduard Stiefel.[1]
The biconjugate gradient method provides a generalization to non-symmetric matrices. Various nonlinear conjugate gradient methods seek minima of nonlinear equations.
- ↑ Straeter, T. A. "On the Extension of the Davidon-Broyden Class of Rank One, Quasi-Newton Minimization Methods to an Infinite Dimensional Hilbert Space with Applications to Optimal Control Problems". NASA Technical Reports Server. NASA. http://hdl.handle.net/2060/19710026200. Retrieved 10 October 2011.
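The quoted passage above describes the linear conjugate gradient method for a symmetric positive-definite system A x = b. Below is a minimal sketch of that iteration; the function name, tolerance, iteration cap, and small test system are illustrative assumptions, not taken from the cited sources.

```python
# Minimal sketch of the (linear) conjugate gradient method for A x = b,
# with A symmetric positive-definite. Illustrative only.
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A by conjugate gradients."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    max_iter = n if max_iter is None else max_iter
    r = b - A @ x          # residual = negative gradient of 1/2 x'Ax - b'x
    p = r.copy()           # first search direction: steepest descent
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)       # exact step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:       # residual small enough: converged
            break
        p = r + (rs_new / rs_old) * p   # next direction, A-conjugate to previous ones
        rs_old = rs_new
    return x

# Small symmetric positive-definite test system (illustrative values).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
print(x, np.allclose(A @ x, b))
```

Each iteration needs only one matrix-vector product `A @ p`, which is what makes the method attractive for the large sparse systems mentioned in the quote.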
2006
- (Vishwanathan et al., 2006) ⇒ S. V. N. Vishwanathan, Nicol N. Schraudolph, Mark W. Schmidt, and Kevin P. Murphy. (2006). “Accelerated Training of Conditional Random Fields with Stochastic Gradient Methods.” In: Proceedings of the 23rd International Conference on Machine Learning (ICML-2006). doi:10.1145/1143844.1143966
- QUOTE: … Current training methods for CRFs (In this paper, “training” specifically means penalized maximum likelihood parameter estimation) include generalized iterative scaling (GIS), conjugate gradient (CG), and limited-memory BFGS. These are all batch-only algorithms that do not work well in an online setting, and require many passes through the training data to converge. … Stochastic gradient methods, on the other hand, are online and scale sub-linearly with the amount of training data, making them very attractive for large data sets;
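As a rough illustration of the "batch-only" behaviour noted in the quote above, the sketch below minimizes an L2-penalized logistic loss with SciPy's nonlinear conjugate gradient routine; every objective/gradient evaluation sweeps the whole training set. The synthetic data, the logistic loss, and the regularization constant are illustrative assumptions standing in for the CRF objective discussed by Vishwanathan et al.

```python
# Illustrative sketch: nonlinear CG as a batch optimizer. Each call to the
# objective/gradient function touches the full training set, so many passes
# over the data are needed before convergence.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                  # full (synthetic) training set
y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(size=1000) > 0).astype(float)

def loss_and_grad(w, lam=1.0):
    """L2-penalized negative log-likelihood and its gradient over ALL examples."""
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)) + 0.5 * lam * (w @ w)
    grad = X.T @ (p - y) + lam * w              # one full pass over the data per call
    return loss, grad

res = minimize(loss_and_grad, x0=np.zeros(5), jac=True, method='CG')
print(res.x, res.nit)
```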
1994
- (Hagan & Menhaj, 1994) ⇒ Martin T. Hagan, and Mohammad B. Menhaj. (1994). “Training Feedforward Networks with the Marquardt Algorithm.” In: IEEE Transactions on Neural Networks, 5(6). doi:10.1109/72.329697
- QUOTE: The Marquardt algorithm for nonlinear least squares is presented and is incorporated into the backpropagation algorithm for training feedforward neural networks. The algorithm is tested on several function approximation problems, and is compared with a conjugate gradient algorithm and a variable learning rate algorithm.