Hyperparameter Optimization Algorithm


A [[Hyperparameter Optimization Algorithm]] is an [[optimization algorithm]] that attempts to solve a [[hyperparameter tuning task]] (to select optimal [[hyperparameter]]s for a [[trained machine learning model]]).
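
The task can be illustrated with a minimal sketch, assuming scikit-learn, a synthetic dataset, and an SVM classifier as stand-ins for a real model-selection problem (none of these choices are implied by the definition above):

<syntaxhighlight lang="python">
# Minimal hyperparameter-tuning sketch: exhaustive grid search over two SVM
# hyperparameters with cross-validation. scikit-learn and the synthetic
# dataset are illustrative assumptions, not part of the definition above.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

search = GridSearchCV(
    estimator=SVC(),
    param_grid={"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1]},
    cv=3,
)
search.fit(X, y)

print("best hyperparameters:", search.best_params_)
print("best cross-validated accuracy:", round(search.best_score_, 3))
</syntaxhighlight>

Grid search is only one of several search strategies; the references below also cover random, Bayesian, gradient-based, evolutionary, and population-based alternatives.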



== References ==

=== 2020 ===
* Approaches: Grid search, Random search, Bayesian optimization, Gradient-based optimization, Evolutionary optimization, Population-based, Others.

=== 2015 ===

=== 2013 ===
* ([[Bardenet et al., 2013]]) ⇒ [[Rémi Bardenet]], [[Mátyás Brendel]], [[Balázs Kégl]], and [[Michele Sebag]]. ([[2013]]). “Collaborative Hyperparameter Tuning.” In: International Conference on Machine Learning, pp. 199-207.
** ABSTRACT: Hyperparameter learning has traditionally been a manual task because of the limited number of trials. Today's computing infrastructures allow bigger evaluation budgets, thus opening the way for algorithmic approaches. Recently, surrogate-based optimization was successfully applied to hyperparameter learning for deep belief networks and to WEKA classifiers. The methods combined brute force computational power with model building about the behavior of the error function in the hyperparameter space, and they could significantly improve on manual hyperparameter tuning. What may make experienced practitioners even better at hyperparameter optimization is their ability to generalize across similar learning problems. In this paper, we propose a generic method to incorporate knowledge from previous experiments when simultaneously tuning a learning algorithm on new problems at hand. To this end, we combine surrogate-based ranking and optimization techniques for surrogate-based collaborative tuning (SCoT). We demonstrate SCoT in two experiments where it outperforms standard tuning techniques and single-problem surrogate-based optimization.
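
The surrogate-based optimization that the abstract builds on can be sketched generically as follows; this is not the authors' SCoT method, and the Gaussian-process surrogate, the one-dimensional search space, and the synthetic validation_error function are illustrative assumptions only. The idea is to fit a regression model to the (hyperparameter, error) pairs observed so far and evaluate next the candidate the surrogate predicts to be best.

<syntaxhighlight lang="python">
# Generic surrogate-based hyperparameter tuning sketch (not SCoT itself):
# model observed (hyperparameter, error) pairs with a Gaussian process and
# evaluate the candidate the surrogate predicts to have the lowest error.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor


def validation_error(log_c):
    """Stand-in for training a model and measuring its validation error."""
    return (log_c - 1.0) ** 2 + 0.1 * np.sin(5 * log_c)


rng = np.random.default_rng(0)
observed_x = list(rng.uniform(-3, 3, size=3))   # a few random initial trials
observed_y = [validation_error(x) for x in observed_x]

for _ in range(10):
    # Fit the surrogate to all (hyperparameter, error) pairs seen so far.
    surrogate = GaussianProcessRegressor(alpha=1e-6, normalize_y=True).fit(
        np.array(observed_x).reshape(-1, 1), observed_y
    )
    # Greedily evaluate the candidate with the lowest predicted error.
    # (Practical methods use an acquisition function that also rewards exploration.)
    candidates = np.linspace(-3, 3, 200).reshape(-1, 1)
    next_x = float(candidates[np.argmin(surrogate.predict(candidates)), 0])
    observed_x.append(next_x)
    observed_y.append(validation_error(next_x))

best_index = int(np.argmin(observed_y))
print(f"best log10(C): {observed_x[best_index]:.3f}, error: {observed_y[best_index]:.4f}")
</syntaxhighlight>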

=== 2012a ===
* ([[Bergstra & Bengio, 2012]]) ⇒ [[James Bergstra]], and [[Yoshua Bengio]]. ([[2012]]). [http://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf "Random Search for Hyper-Parameter Optimization"]. In: Journal of Machine Learning Research, 13: 281–305.
** ABSTRACT: [[Grid search]] and [[manual search]] are the most widely used strategies for [[Parameter Tuning Algorithm|hyper-parameter optimization]]. This paper shows empirically and theoretically that [[randomly chosen trial]]s are more efficient for [[Parameter Tuning Algorithm|hyper-parameter optimization]] than trials on a grid. [[Empirical evidence]] comes from a comparison with a large previous study that used [[grid search]] and [[manual search]] to configure [[neural network]]s and [[deep belief network]]s. Compared with [[neural network]]s configured by a pure [[grid search]], we find that [[random search]] over the same domain is able to find models that are as good or better within a small fraction of the [[computation time]]. Granting [[random search]] the same computational budget, [[random search]] finds better models by effectively searching a larger, less promising [[configuration space]]. Compared with [[deep belief network]]s configured by a thoughtful combination of [[manual search]] and [[grid search]], purely [[random search]] over the same 32-[[dimensional configuration space]] found [[statistical]]ly equal performance on four of seven [[data set]]s, and superior [[performance]] on one of seven. A [[Gaussian process analysis]] of the [[function]] from [[hyper-parameter]]s to [[validation set]] [[performance]] reveals that for most [[data set]]s only a few of the [[hyper-parameter]]s really matter, but that different [[hyper-parameter]]s are important on different [[data set]]s. This phenomenon makes [[grid search]] a poor choice for [[configuring algorithm]]s for new [[data set]]s. Our [[analysis]] casts some light on why recent “High Throughput” methods achieve surprising success — they appear to search through a large number of [[hyper-parameter]]s because most [[hyper-parameter]]s do not matter much. We anticipate that growing interest in large [[hierarchical model]]s will place an increasing burden on techniques for [[Parameter Tuning Algorithm|hyper-parameter optimization]]; this work shows that [[random search]] is a natural [[baseline]] against which to judge progress in the development of [[adaptive (sequential) hyper-parameter optimization algorithm]]s.
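
The abstract's core argument, that a fixed trial budget spread over a grid covers far fewer distinct values of each individual hyperparameter than the same budget drawn at random, can be seen in a small sketch; the two-hyperparameter toy objective and the budget of nine trials below are illustrative assumptions, not taken from the paper:

<syntaxhighlight lang="python">
# Grid search vs. random search with the same budget of 9 trials over two
# hyperparameters. In this toy setup only the learning rate matters, so the
# 3x3 grid probes just 3 distinct learning-rate values while random search
# probes 9 of them.
import itertools
import random


def loss(learning_rate, momentum):
    """Toy objective where only the learning rate matters (momentum is unused)."""
    return (learning_rate - 0.37) ** 2


grid_axis = [0.1, 0.5, 0.9]
grid_trials = list(itertools.product(grid_axis, grid_axis))   # 9 points, 3 distinct learning rates

rng = random.Random(0)
random_trials = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(9)]   # 9 distinct learning rates

best_grid = min(grid_trials, key=lambda t: loss(*t))
best_random = min(random_trials, key=lambda t: loss(*t))

print("grid   best loss:", round(loss(*best_grid), 4))
print("random best loss:", round(loss(*best_random), 4))
</syntaxhighlight>

Because the toy objective depends only on the learning rate, random search's denser per-dimension coverage typically wins here, which mirrors the abstract's Gaussian-process finding that on most data sets only a few hyper-parameters really matter.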

=== 2012b ===

=== 2012c ===

=== 2005 ===

=== 2004 ===

=== 1998 ===

