Hyperparameter Optimization Algorithm


A [[Hyperparameter Optimization Algorithm]] is an [[optimization algorithm]] that attempts to solve a [[hyperparameter tuning task]] (to select optimal [[hyperparameter]]s for a [[trained machine learning model]]).
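
The task can be illustrated with a minimal sketch, assuming scikit-learn, a synthetic dataset, and an SVM classifier as stand-ins for a real model-selection problem (none of these choices are implied by the definition above):

<syntaxhighlight lang="python">
# Minimal hyperparameter-tuning sketch: exhaustive grid search over two SVM
# hyperparameters with cross-validation. scikit-learn and the synthetic
# dataset are illustrative assumptions, not part of the definition above.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

search = GridSearchCV(
    estimator=SVC(),
    param_grid={"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1]},
    cv=3,
)
search.fit(X, y)

print("best hyperparameters:", search.best_params_)
print("best cross-validated accuracy:", round(search.best_score_, 3))
</syntaxhighlight>

Grid search is only one of several search strategies; the references below also cover random, Bayesian, gradient-based, evolutionary, and population-based alternatives.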



== References ==

=== 2020 ===
* Approaches: Grid search, Random search, Bayesian optimization, Gradient-based optimization, Evolutionary optimization, Population-based, Others.

=== 2015 ===

=== 2013 ===
* ([[Bardenet et al., 2013]]) ⇒ [[Rémi Bardenet]], [[Mátyás Brendel]], [[Balázs Kégl]], and [[Michele Sebag]]. ([[2013]]). “Collaborative Hyperparameter Tuning.” In: International Conference on Machine Learning, pp. 199-207.
** ABSTRACT: Hyperparameter learning has traditionally been a manual task because of the limited number of trials. Today's computing infrastructures allow bigger evaluation budgets, thus opening the way for algorithmic approaches. Recently, surrogate-based optimization was successfully applied to hyperparameter learning for deep belief networks and to WEKA classifiers. The methods combined brute force computational power with model building about the behavior of the error function in the hyperparameter space, and they could significantly improve on manual hyperparameter tuning. What may make experienced practitioners even better at hyperparameter optimization is their ability to generalize across similar learning problems. In this paper, we propose a generic method to incorporate knowledge from previous experiments when simultaneously tuning a learning algorithm on new problems at hand. To this end, we combine surrogate-based ranking and optimization techniques for surrogate-based collaborative tuning (SCoT). We demonstrate SCoT in two experiments where it outperforms standard tuning techniques and single-problem surrogate-based optimization.
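
The surrogate-based optimization that the abstract builds on can be sketched generically as follows; this is not the authors' SCoT method, and the Gaussian-process surrogate, the one-dimensional search space, and the synthetic validation_error function are illustrative assumptions only. The idea is to fit a regression model to the (hyperparameter, error) pairs observed so far and evaluate next the candidate the surrogate predicts to be best.

<syntaxhighlight lang="python">
# Generic surrogate-based hyperparameter tuning sketch (not SCoT itself):
# model observed (hyperparameter, error) pairs with a Gaussian process and
# evaluate the candidate the surrogate predicts to have the lowest error.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor


def validation_error(log_c):
    """Stand-in for training a model and measuring its validation error."""
    return (log_c - 1.0) ** 2 + 0.1 * np.sin(5 * log_c)


rng = np.random.default_rng(0)
observed_x = list(rng.uniform(-3, 3, size=3))   # a few random initial trials
observed_y = [validation_error(x) for x in observed_x]

for _ in range(10):
    # Fit the surrogate to all (hyperparameter, error) pairs seen so far.
    surrogate = GaussianProcessRegressor(alpha=1e-6, normalize_y=True).fit(
        np.array(observed_x).reshape(-1, 1), observed_y
    )
    # Greedily evaluate the candidate with the lowest predicted error.
    # (Practical methods use an acquisition function that also rewards exploration.)
    candidates = np.linspace(-3, 3, 200).reshape(-1, 1)
    next_x = float(candidates[np.argmin(surrogate.predict(candidates)), 0])
    observed_x.append(next_x)
    observed_y.append(validation_error(next_x))

best_index = int(np.argmin(observed_y))
print(f"best log10(C): {observed_x[best_index]:.3f}, error: {observed_y[best_index]:.4f}")
</syntaxhighlight>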

=== 2012a ===
* ([[Bergstra & Bengio, 2012]]) ⇒ [[James Bergstra]], and [[Yoshua Bengio]]. ([[2012]]). [http://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf "Random Search for Hyper-Parameter Optimization"]. In: Journal of Machine Learning Research, 13: 281–305.
** ABSTRACT: [[Grid search]] and [[manual search]] are the most widely used strategies for [[Parameter Tuning Algorithm|hyper-parameter optimization]]. This paper shows empirically and theoretically that [[randomly chosen trial]]s are more efficient for [[Parameter Tuning Algorithm|hyper-parameter optimization]] than trials on a grid. [[Empirical evidence]] comes from a comparison with a large previous study that used [[grid search]] and [[manual search]] to configure [[neural network]]s and [[deep belief network]]s. Compared with [[neural network]]s configured by a pure [[grid search]], we find that [[random search]] over the same domain is able to find models that are as good or better within a small fraction of the [[computation time]]. Granting [[random search]] the same computational budget, [[random search]] finds better models by effectively searching a larger, less promising [[configuration space]]. Compared with [[deep belief network]]s configured by a thoughtful combination of [[manual search]] and [[grid search]], purely [[random search]] over the same 32-[[dimensional configuration space]] found [[statistical]]ly equal performance on four of seven [[data set]]s, and superior [[performance]] on one of seven. A [[Gaussian process analysis]] of the [[function]] from [[hyper-parameter]]s to [[validation set]] [[performance]] reveals that for most [[data set]]s only a few of the [[hyper-parameter]]s really matter, but that different [[hyper-parameter]]s are important on different [[data set]]s. This phenomenon makes [[grid search]] a poor choice for [[configuring algorithm]]s for new [[data set]]s. Our [[analysis]] casts some light on why recent “High Throughput” methods achieve surprising success — they appear to search through a large number of [[hyper-parameter]]s because most [[hyper-parameter]]s do not matter much. We anticipate that growing interest in large [[hierarchical model]]s will place an increasing burden on techniques for [[Parameter Tuning Algorithm|hyper-parameter optimization]]; this work shows that [[random search]] is a natural [[baseline]] against which to judge progress in the development of [[adaptive (sequential) hyper-parameter optimization algorithm]]s.
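
The abstract's core argument, that a fixed trial budget spread over a grid covers far fewer distinct values of each individual hyperparameter than the same budget drawn at random, can be seen in a small sketch; the two-hyperparameter toy objective and the budget of nine trials below are illustrative assumptions, not taken from the paper:

<syntaxhighlight lang="python">
# Grid search vs. random search with the same budget of 9 trials over two
# hyperparameters. In this toy setup only the learning rate matters, so the
# 3x3 grid probes just 3 distinct learning-rate values while random search
# probes 9 of them.
import itertools
import random


def loss(learning_rate, momentum):
    """Toy objective where only the learning rate matters (momentum is unused)."""
    return (learning_rate - 0.37) ** 2


grid_axis = [0.1, 0.5, 0.9]
grid_trials = list(itertools.product(grid_axis, grid_axis))   # 9 points, 3 distinct learning rates

rng = random.Random(0)
random_trials = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(9)]   # 9 distinct learning rates

best_grid = min(grid_trials, key=lambda t: loss(*t))
best_random = min(random_trials, key=lambda t: loss(*t))

print("grid   best loss:", round(loss(*best_grid), 4))
print("random best loss:", round(loss(*best_random), 4))
</syntaxhighlight>

Because the toy objective depends only on the learning rate, random search's denser per-dimension coverage typically wins here, which mirrors the abstract's Gaussian-process finding that on most data sets only a few hyper-parameters really matter.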

=== 2012b ===

=== 2012c ===

=== 2005 ===

=== 2004 ===

=== 1998 ===

