Easy Hyperparameter Search Using Optunity
Marc Claesen, Jaak Simm, Dusan Popovic, Yves Moreau, Bart De Moor
Marc Claesen [email protected]
Jaak Simm [email protected]
Dusan Popovic [email protected]
Yves Moreau [email protected]
Bart De Moor [email protected]
KU Leuven, Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics; iMinds, Department of Medical Information Technologies; Kasteelpark Arenberg 10, box 2446, 3001 Leuven, Belgium
Abstract
Optunity is a free software package dedicated to hyperparameter optimization. It contains various types of solvers, ranging from undirected methods to direct search, particle swarm and evolutionary optimization. The design focuses on ease of use, flexibility, code clarity and interoperability with existing software in all machine learning environments. Optunity is written in Python and contains interfaces to environments such as R and MATLAB. Optunity uses a BSD license and is freely available online.

Keywords: hyperparameter search, black-box optimization, algorithm tuning, Python
1. Introduction
Many machine learning tasks aim to train a model $\mathcal{M}$ which minimizes some loss function $\mathcal{L}(\mathcal{M} \mid X^{(te)})$ on given test data $X^{(te)}$. A model is obtained via a learning algorithm $\mathcal{A}$ which uses a training set $X^{(tr)}$ and solves some optimization problem. The learning algorithm $\mathcal{A}$ may itself be parameterized by a set of hyperparameters $\lambda$, e.g. $\mathcal{M} = \mathcal{A}(X^{(tr)} \mid \lambda)$. Hyperparameter search, also known as tuning, aims to find a set of hyperparameters $\lambda^*$ such that the learning algorithm yields an optimal model $\mathcal{M}^*$ that minimizes $\mathcal{L}(\mathcal{M} \mid X^{(te)})$:

$$\lambda^* = \arg\min_{\lambda} \mathcal{L}\big(\mathcal{A}(X^{(tr)} \mid \lambda) \mid X^{(te)}\big) = \arg\min_{\lambda} \mathcal{F}(\lambda \mid \mathcal{A}, X^{(tr)}, X^{(te)}, \mathcal{L}) \quad (1)$$

In the context of tuning, $\mathcal{F}$ is the objective function and $\lambda$ is a tuple of hyperparameters (optimization variables). The learning algorithm $\mathcal{A}$ and data sets $X^{(tr)}$ and $X^{(te)}$ are known. Depending on the learning task, $X^{(tr)}$ and $X^{(te)}$ may be labeled and/or equal to each other. The objective function often has a constrained domain (for example, regularization terms must be positive) and is assumed to be expensive to evaluate, black-box and non-smooth.

Tuning hyperparameters is a recurrent task in many machine learning approaches. Some common hyperparameters that must be tuned are related to kernels, regularization, learning rates and network architecture. Tuning can be necessary in both supervised and unsupervised settings and may significantly impact the resulting model's performance.

© Marc Claesen, Jaak Simm, Dusan Popovic, Yves Moreau and Bart De Moor.

General machine learning packages typically provide only basic tuning methods like grid search. The most common tuning approaches are grid search and manual tuning (Hsu et al., 2003; Hinton, 2012). Grid search suffers from the curse of dimensionality when the number of hyperparameters grows large, while manual tuning requires considerable expertise, which leads to poor reproducibility, particularly when many hyperparameters are involved.
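To make Equation (1) and the curse of dimensionality concrete, the following toy sketch (all names and the objective are hypothetical, standing in for $\mathcal{F}$) tunes two hyperparameters by exhaustive grid search; with k candidate values per hyperparameter and d hyperparameters, it needs k^d evaluations:

```python
import itertools

def tune_grid(objective, grid):
    """Exhaustively evaluate objective on the Cartesian product of
    per-hyperparameter candidate lists and return the best tuple."""
    names = sorted(grid)
    best_pars, best_loss = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        pars = dict(zip(names, values))
        loss = objective(**pars)
        if loss < best_loss:
            best_pars, best_loss = pars, loss
    return best_pars, best_loss

# Toy stand-in for F(lambda | A, X_tr, X_te, L): minimal at C=1, gamma=0.1.
toy_loss = lambda C, gamma: (C - 1.0) ** 2 + (gamma - 0.1) ** 2

grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}
best, loss = tune_grid(toy_loss, grid)
# 3 candidates x 2 hyperparameters -> 3**2 = 9 evaluations; each extra
# hyperparameter multiplies the cost by another factor of 3.
```

Adding a third hyperparameter with three candidates already triples the work, which is why grid search becomes impractical as the number of hyperparameters grows.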
2. Optunity
Our software is a Swiss army knife for hyperparameter search. Optunity offers a series of configurable optimization methods and utility functions that enable efficient hyperparameter optimization. Only a handful of lines of code are necessary to perform tuning. Optunity should be used in tandem with existing machine learning packages that implement learning algorithms. The package uses a BSD license and is simple to deploy in any environment. Optunity has been tested in Python, R and MATLAB on Linux, OSX and Windows.
Optunity provides both simple routines for lay users and expert routines that enable fine-grained control of various aspects of the solving process. Basic tuning can be performed with minimal configuration, requiring only an objective function, an upper limit on the number of evaluations and box constraints on the hyperparameters to be optimized.

The objective function must be defined by the user. It takes a hyperparameter tuple $\lambda$ and typically involves three steps: (i) training a model $\mathcal{M}$ with $\lambda$, (ii) using $\mathcal{M}$ to predict a test set and (iii) computing some score or loss based on the predictions. In unsupervised tasks, the separation between (i) and (ii) need not exist, for example when clustering a data set. Tuning involves a series of function evaluations until convergence or until a predefined maximum number of evaluations is reached. Optunity is capable of vectorizing evaluations in the working environment to speed up the process at the end user's volition.

Optunity additionally provides k-fold cross-validation to estimate the generalization performance of supervised modeling approaches. The cross-validation implementation can account for strata and clusters.¹ Finally, a variety of common quality metrics is available. The code example below illustrates tuning an SVM with scikit-learn and Optunity:²

```python
@optunity.cross_validated(x=data, y=labels, num_folds=10, num_iter=2)
def svm_auc(x_train, y_train, x_test, y_test, C, gamma):
    model = sklearn.svm.SVC(C=C, gamma=gamma).fit(x_train, y_train)
    decision_values = model.decision_function(x_test)
    return optunity.metrics.roc_auc(y_test, decision_values)

optimal_pars, _, _ = optunity.maximize(svm_auc, num_evals=100, C=[0, 10], gamma=[0, 1])
optimal_model = sklearn.svm.SVC(**optimal_pars).fit(data, labels)
```

The objective function as per Equation (1) is defined on lines 1 to 5, where $\lambda = (C, \gamma)$, $\mathcal{A}$ is the SVM training algorithm and $\mathcal{L}$ is area under the ROC curve. We use 2× iterated 10-fold cross-validation to estimate area under the ROC curve.
Up to 100 hyperparameter tuples are tested within the box constraints $0 < C < 10$ and $0 < \gamma < 1$.

¹ Instances in a stratum should be spread across folds; clustered instances must remain in a single fold.
² We assume the correct imports are made and that data and labels contain appropriate content.

Optunity provides a wide variety of solvers, ranging from basic, undirected methods like grid search and random search (Bergstra and Bengio, 2012) to evolutionary methods such as particle swarm optimization (Kennedy, 2010) and the covariance matrix adaptation evolutionary strategy (CMA-ES) (Hansen and Ostermeier, 2001). Finally, we provide the Nelder-Mead simplex (Nelder and Mead, 1965), which is useful for local search after a good region has been determined. Optunity's current default solver is particle swarm optimization, as our experiments have shown it to perform well for a large variety of tuning tasks involving various learning algorithms. Additional solvers will be incorporated in the future.
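All of these solvers consume the same black-box interface: a function of the hyperparameters and box constraints on each of them. As a toy sketch of the simplest directed-free method, the following random search (hypothetical code, not Optunity's internal implementation) samples tuples uniformly within box constraints and keeps the best:

```python
import random

def random_search(objective, box, num_evals, seed=0):
    """Sample hyperparameter tuples uniformly inside box constraints
    and keep the best one (the strategy of Bergstra and Bengio, 2012)."""
    rng = random.Random(seed)
    best_pars, best_val = None, float("-inf")
    for _ in range(num_evals):
        pars = {name: rng.uniform(lo, hi) for name, (lo, hi) in box.items()}
        val = objective(**pars)
        if val > best_val:
            best_pars, best_val = pars, val
    return best_pars, best_val

# Maximize a toy score peaked at C=5, gamma=0.5 inside 0 < C < 10, 0 < gamma < 1.
score = lambda C, gamma: -((C - 5.0) ** 2 + (gamma - 0.5) ** 2)
pars, val = random_search(score, {"C": (0.0, 10.0), "gamma": (0.0, 1.0)}, num_evals=100)
```

Unlike grid search, the evaluation budget here is fixed up front and does not grow with the number of hyperparameters, although more evaluations are typically needed in higher dimensions.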
The design philosophy of Optunity prioritizes code clarity over performance. This is justified by the fact that objective function evaluations constitute the real performance bottleneck. In contrast to typical Python packages, we avoid dependencies on big packages like NumPy/SciPy and scikit-learn to facilitate users working in non-Python environments (sometimes at the cost of performance). To prevent issues for users that are unfamiliar with Python, care is taken to ensure all code in Optunity works out of the box on any Python version above 2.7, without requiring explicit conversion tools. Optunity has a single dependency on DEAP (Fortin et al., 2012) for the CMA-ES solver.

A key aspect of Optunity's design is interoperability with external environments. This requires bidirectional communication between Optunity's Python back-end (O) and the external environment (E) and roughly involves three steps: (i) E → O solver configuration, (ii) O ↔ E objective function evaluations and (iii) O → E solution and solver summary. To this end, Optunity can do straightforward communication with any environment via sockets using JSON messages, as shown in Figure 1. Only limited information must be communicated; big objects like data sets are never exchanged. To port Optunity to a new environment, a thin wrapper must be implemented to handle communication.
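The three communication steps can be mimicked with a few lines of standard-library code. The sketch below (message field names are illustrative only, not Optunity's actual wire format) exchanges newline-delimited JSON over a connected socket pair standing in for the back-end (O) and an external environment (E):

```python
import json
import socket

# A connected socket pair stands in for Optunity's back-end (O)
# and an external environment (E).
env_sock, opt_sock = socket.socketpair()

def send(sock, message):
    # Newline-delimited JSON keeps message framing trivial.
    sock.sendall((json.dumps(message) + "\n").encode("utf-8"))

def recv(sock):
    buf = b""
    while not buf.endswith(b"\n"):
        buf += sock.recv(4096)
    return json.loads(buf.decode("utf-8"))

# (i) E -> O: solver configuration.
send(env_sock, {"solver": "particle swarm", "num_evals": 100,
                "box": {"C": [0, 10], "gamma": [0, 1]}})
config = recv(opt_sock)

# (ii) O <-> E: a callback request and its result.
send(opt_sock, {"eval": {"C": 1.5, "gamma": 0.25}})
request = recv(env_sock)
send(env_sock, {"value": 0.93})   # E evaluates the objective and replies
result = recv(opt_sock)

# (iii) O -> E: final solution and solver summary.
send(opt_sock, {"solution": {"C": 1.2, "gamma": 0.3}, "optimum": 0.95})
solution = recv(env_sock)
```

Because only small JSON messages cross the boundary, a wrapper in a new environment only needs a socket client and a JSON codec, which nearly every language provides.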
[Figure 1 shows Optunity's generic solvers (grid search, random search, Nelder-Mead, particle swarm, CMA-ES, ...) behind a Python API that exchanges JSON messages with a thin wrapper in the working environment (R, MATLAB, Java, ...): configuration and callback results flow in; callback requests and the final solution flow out.]

Figure 1: Integrating Optunity in non-Python environments.
Code is documented using Sphinx and contains many doctests that can serve as both unit tests and examples of the associated functions. Our website contains API documentation, user documentation and a wide range of examples to illustrate all aspects of the software.³ The examples involve various packages, including scikit-learn (Pedregosa et al., 2011), OpenCV (Bradski, 2000) and Spark's MLlib (Zaharia et al., 2010).

Collaborative development is organized via GitHub. The project's master branch is kept stable and is subjected to continuous integration tests using Travis CI. We recommend prospective users to clone the master branch for the most up-to-date stable version of the software. Bug reports and feature requests can be filed via issues on GitHub.

Future development efforts will focus on wrappers for Java, Julia and C/C++. This will make Optunity readily available in all main environments related to machine learning. We additionally plan to incorporate Bayesian optimization strategies (Jones et al., 1998).
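Doctests of the kind described above can be exercised directly with Python's standard library. A minimal sketch, with a hypothetical function that is not part of Optunity:

```python
import doctest

def triangular_number(n):
    """Sum of the first n positive integers.

    >>> triangular_number(4)
    10
    >>> triangular_number(1)
    1
    """
    return n * (n + 1) // 2

# Parse the examples out of the docstring and run them as unit tests.
parser = doctest.DocTestParser()
test = parser.get_doctest(triangular_number.__doc__,
                          {"triangular_number": triangular_number},
                          "triangular_number", None, 0)
runner = doctest.DocTestRunner()
results = runner.run(test)
# results.attempted counts examples run; results.failed counts mismatches.
```

This dual role, documentation that doubles as a regression test, is what makes doctests attractive for a small library.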
3. Related work
A number of software solutions exist for hyperparameter search. HyperOpt offers random search and sequential model-based optimization (Bergstra et al., 2013). Some packages dedicated to Bayesian approaches include Spearmint (Snoek et al., 2012), DiceKriging (Roustant et al., 2012) and BayesOpt (Martinez-Cantin, 2014). Finally, ParamILS is a command-line-only tuning framework providing iterated local search (Hutter et al., 2009).

Optunity distinguishes itself from existing packages by exposing a variety of fundamentally different solvers. This matters because the no free lunch theorem suggests that no single approach is best in all settings (Wolpert and Macready, 1997). Additionally, Optunity is easy to integrate in various environments and features a very simple API.
Acknowledgments
This research was funded via the following channels:

• Research Council KU Leuven: GOA/10/09 MaNet, CoE PFV/10/016 SymBioSys;
• Flemish Government: FWO: projects: G.0871.12N (Neural circuits); IWT: TBM Logic Insulin (100793), TBM Rectal Cancer (100783), TBM IETA (130256), O&O ExaScience Life Pharma, ChemBioBridge, PhD grants (specifically 111065); Industrial Research Fund (IOF): IOF/HB/13/027 Logic Insulin; iMinds Medical Information Technologies SBO 2014; VLK Stichting E. van der Schueren: rectal cancer;
• Federal Government: FOD: Cancer Plan 2012-2015 KPC-29-023 (prostate);
• COST: Action BM1104: Mass Spectrometry Imaging.
³ We maintain the following subdomains for convenience: http://{builds,docs,git,issues}.optunity.net.

References

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(1):281–305, 2012.

James Bergstra, Dan Yamins, and David D. Cox. Hyperopt: A Python library for optimizing the hyperparameters of machine learning algorithms. In Proceedings of the 12th Python in Science Conference, pages 13–20. SciPy, 2013.

G. Bradski. The OpenCV library. Dr. Dobb's Journal of Software Tools, 2000.

Félix-Antoine Fortin, De Rainville, Marc-André Gardner, Marc Parizeau, Christian Gagné, et al. DEAP: Evolutionary algorithms made easy. Journal of Machine Learning Research, 13(1):2171–2175, 2012.

Nikolaus Hansen and Andreas Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159–195, 2001.

Geoffrey E. Hinton. A practical guide to training restricted Boltzmann machines. In Neural Networks: Tricks of the Trade, pages 599–619. Springer, 2012.

Chih-Wei Hsu, Chih-Chung Chang, Chih-Jen Lin, et al. A practical guide to support vector classification, 2003.

Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown, and Thomas Stützle. ParamILS: an automatic algorithm configuration framework. Journal of Artificial Intelligence Research, 36(1):267–306, 2009.

Donald R. Jones, Matthias Schonlau, and William J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455–492, 1998.

James Kennedy. Particle swarm optimization. In Encyclopedia of Machine Learning, pages 760–766. Springer, 2010.

Ruben Martinez-Cantin. BayesOpt: A Bayesian optimization library for nonlinear optimization, experimental design and bandits. arXiv preprint arXiv:1405.7430, 2014.

John A. Nelder and Roger Mead. A simplex method for function minimization. The Computer Journal, 7(4):308–313, 1965.

Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

Olivier Roustant, David Ginsbourger, Yves Deville, et al. DiceKriging, DiceOptim: Two R packages for the analysis of computer experiments by kriging-based metamodeling and optimization. 2012.

Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pages 2951–2959, 2012.

David H. Wolpert and William G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.

Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. Spark: cluster computing with working sets. In Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, pages 1–7, 2010.