HyperTendril: Visual Analytics for User-Driven Hyperparameter Optimization of Deep Neural Networks
To appear in an IEEE VGTC sponsored conference proceedings
Heungseok Park, Yoonsoo Nam, Ji-Hoon Kim*, and Jaegul Choo*

Fig. 1. Overview of HyperTendril, which supports user-driven AutoML processes. This example involves three hyperparameters, i.e., the number of layers, the learning rate, and the weight decay in the ResNet architecture, using the Bayesian Optimization and HyperBand (BOHB) search method. The (C) Search space overview, (D) Model analysis view, and (E1) Exploration overview components show the details of the models selected in the (B2) Selected experiments panel. The weight decay hyperparameter is activated in the (C) Search space overview, and its effective range is highlighted in the parallel coordinates.
Abstract—To mitigate the pain of manually tuning hyperparameters of deep neural networks, automated machine learning (AutoML) methods have been developed to search for an optimal set of hyperparameters in large combinatorial search spaces. However, the search results of AutoML methods significantly depend on initial configurations, making it a non-trivial task to find a proper configuration. Therefore, human intervention via a visual analytics approach bears huge potential in this task. In response, we propose HyperTendril, a web-based visual analytics system that supports user-driven hyperparameter tuning processes in a model-agnostic environment. HyperTendril takes a novel approach to effectively steering hyperparameter optimization through an iterative, interactive tuning procedure that allows users to refine the search spaces and the configuration of the AutoML method based on their own insights from given results. Using HyperTendril, users can obtain insights into the complex behaviors of various hyperparameter search algorithms and diagnose their configurations. In addition, HyperTendril supports variable importance analysis to help users refine their search spaces based on the analysis of the relative importance of different hyperparameters and their interaction effects. We present an evaluation demonstrating how HyperTendril helps users steer their tuning processes via a longitudinal user study based on the analysis of interaction logs and in-depth interviews conducted while our system was deployed in a professional industrial environment.
Index Terms—Visual analytics, deep learning, machine learning, automated machine learning, human-centered computing
INTRODUCTION
As deep neural networks evolve with highly modular architectures and advanced optimization methods, an increasing number of hyperparameters are involved. Typically, these hyperparameters need to be optimized either entirely by hand or in a semi-automated manner, requiring significant human effort and computational resources [2, 11]. This issue often hinders researchers and practitioners from finding their optimal settings, leaving them with sub-optimal model performance. Therefore, methods and user interfaces for automated hyperparameter optimization (HyperOpt) have emerged as a critical need.

• Heungseok Park, Yoonsoo Nam, and Ji-Hoon Kim are with Clova AI Research, NAVER Corporation. E-mail: heungseok.park, yoonsoo.nam, [email protected].
• Jaegul Choo is with KAIST. E-mail: [email protected]. *Corresponding author.

To handle such a task, various optimization methods have been proposed in the machine learning community, e.g., using a sequential model [3, 33], a genetic algorithm [17, 46], or a bandit algorithm [8, 23]. These studies contributed to developing efficient search methods by sampling hyperparameters based on prior observations, allocating computing resources to potentially promising models (i.e., excluding poor models at an early stage), or combining these approaches. However, these methods still require considerable time and effort to run, so automated machine learning (AutoML) systems [10, 25, 28, 39] have been developed to help practitioners conveniently optimize their models by providing interfaces for various HyperOpt methods. These systems have numerous advantages, including parallelism, early stopping, and ease of use, which can significantly improve efficiency in terms of resource utilization and user experience.

Although these approaches help practitioners effectively optimize their models, they often require delicate configuration settings before satisfactory results are obtained.
For example, evolutionary optimization algorithms require the population size, which determines the number of individuals to generate in each generation, to be carefully set before starting an optimization process, since the convergence behaviors and final results can vary significantly. In addition, proper settings can differ depending on the task at hand, so practitioners still need to spend considerable time to maximize the potential of the applied method. Due to the absence of approaches for arranging a HyperOpt configuration, it is common practice to go through numerous manual trials with different configurations, with no clear guidance.

To alleviate this tedious process, human intuition about AutoML results, such as the behavior of search algorithms, the effect of optimization algorithm settings, and the characteristics of hyperparameters, should be incorporated. Thus, effective and efficient human intervention is critical during the HyperOpt process, which necessitates a visual analytics system that can leverage human insights to steer the optimization process in a user-driven manner. To this end, we propose HyperTendril (Fig. 1), a web-based visual analytics system that supports HyperOpt tasks, where users can effectively perform HyperOpt through an iterative and interactive tuning procedure, allowing them to fine-tune the optimal hyperparameters based on their domain knowledge and insights obtained from previous results. In detail, HyperTendril helps users progressively refine their search spaces by explicitly highlighting relevant hyperparameters and the promising ranges to explore further, based on a quantitative analysis of the objective model performance (e.g., a test accuracy) (Fig. 1(C)). In addition, HyperTendril visualizes the exploration history of search algorithms for each search space (Fig. 1(E1)) so that users can visually understand the complex behavior of the used algorithm and compare the differences between algorithm configurations, enabling them to diagnose and adjust the algorithm to their own tasks.

To demonstrate the utility of the proposed system, we deploy it in an industrial-scale environment and conduct an evaluation focusing on how the visual analytics assists users in steering their tuning processes, via a longitudinal user study with interaction log analysis and in-depth interviews with professional users.

The main contributions of HyperTendril are as follows:
• A novel visual representation of the exploration history of HyperOpt algorithms, which facilitates understanding the complex behavior of the search algorithms and diagnosing the algorithm configurations in an algorithm-agnostic manner.
• A novel approach to effectively steering users' HyperOpt processes by providing guidance on refining search spaces based on a quantitative analysis of hyperparameter importance.
• A user study demonstrating how our visualization and approach work in an AI research company at an industrial scale.
RELATED WORK
This section discusses recent hyperparameter optimization systems with their visualization modules and visual analytics studies related to refining deep neural networks.
Various data exploration systems have been developed to visualize high-dimensional search spaces with parallel coordinates [16] for the analysis of HyperOpt results, showing the relationships between hyperparameters and model performance. Golovin et al. [10] developed Google Vizier, an interactive visualization for HyperOpt used internally at Google. They designed dashboard-style interfaces, enabling users to manage and monitor the optimization process. Vizier supports a parallel coordinates plot for analyzing the hyperparameters influencing model performance. Other studies and projects [1, 6, 25, 26, 27, 39] for HyperOpt similarly utilize parallel coordinates, which can be considered a standard visualization technique for the HyperOpt task. Even though these existing studies contribute to the analysis and monitoring of optimization results by integrating with their AutoML systems, they lack consideration of a visual analytics system that leverages the human insights required to effectively steer the HyperOpt process in a user-driven manner.

Meanwhile, visualization systems that consider the human-in-the-loop environment have been devised in relation to HyperOpt tasks. Li et al. [24] studied an empirical hyperparameter search process with practitioners in a software company and described a practical workflow for the task. They developed HyperTuner, allowing users to initialize their HyperOpt processes and analyze a batch of experiments in small multiples of scatter plots showing the effects of each hyperparameter on model performance. AutoAIViz [45] extends visualization for the HyperOpt task to the entire model building pipeline, utilizing a conditional parallel coordinates design. AutoAIViz is tightly connected with its backend platform [43] to allow users to directly steer the pipeline optimization. Wang et al. [44] derived a workflow with key decisions during the use of AutoML and developed an open-source visual analytics system called ATMSeer. ATMSeer provides a multi-granularity visualization of model selection and hyperparameter tuning, enabling users to monitor the process and intervene in the middle of it to adjust their search spaces in real time, by tightly integrating with its backend framework called ATM [37].

These visualization systems work as powerful data exploration tools for the HyperOpt task and also increase the transparency of the process by leveraging human insights. However, they did not consider two important aspects. First, they did not explicitly guide which hyperparameters are important nor which hyperparameter ranges are promising for further exploration. Consequently, the refinement of search spaces solely depends on the users' intuition. Explicit guidance on effective hyperparameters can assist users in discovering the sweet spot of the search spaces and increase search efficiency. Second, they did not consider the importance of configuring and diagnosing the various HyperOpt algorithms, which can have a significant impact on search results. Specifically, ATMSeer demonstrated that the system can reveal the bias of the used search algorithm by comparing the histograms of the search results. However, from the perspective of non-expert users of AutoML, it is difficult to grasp the reasons for this phenomenon, since the visualization was not designed to present the inner workings of the AutoML algorithm. Consequently, the transparency and controllability of the AutoML method still remain low, since it is difficult to figure out how to configure it to fit one's tasks when AutoML produces unreliable results.

Our approach assists users in refining the search space via a quantitative analysis method for measuring hyperparameter importance and a visual representation of the analysis results, which provide users with (1) the relative importance of different hyperparameters along with guidance on promising ranges worth exploring further. In addition, we provide a visual representation of the exploration history of hyperparameter search algorithms, which facilitates (2) understanding the nature of search algorithms and diagnosing the given configuration.
Similar to the internal parameters of deep learning models, hyperparameters can have a significant impact on model performance and robustness, but their optimization is difficult to achieve using differential techniques. Therefore, a separate fine-tuning process is still required. In response, recent research has focused on the interactive workflow for the process provided by visual analytics systems [7]. These needs for targeted analysis and design of visual analytics for deep learning have been summarized as an exploratory workflow [32].

Numerous visual analytics systems mainly focus on supporting the workflow phases that correspond to model training preparation and model evaluation, allowing practitioners to understand the effects of hyperparameters and iteratively improve their models. TensorFlow Playground [34], ShapeShop [14], and ReVACNN [5] provide straightforward analysis systems for neural network improvement by allowing the user to directly select the hyperparameters corresponding to the model training preparation stage.

Fig. 2. Illustration of practical analytic needs and the hyperparameter optimization workflow, identified from interviews with ten practitioners. There are three common analytic needs from the practitioners, and the top left shows tasks in which a visual analytics approach can assist them.

ML-o-scope [4], explAIner [35], and DeepEyes [30] introduced a time-lapse engine to investigate the model's learning dynamics, a pipeline framework for comprehensive model analysis, and incremental model development through activation heatmaps, respectively. These systems suggested that the provided analysis tools can help refine deep learning models by reinforcing the model evaluation stage.

From an ontological perspective [32], these conventional visual analysis systems are similar to our system, which supports hyperparameter analysis and comparative analysis of deep learning models. However, for the quality and result analysis phase, while existing studies only support comparative analysis of model performance, our proposed system additionally provides configuration diagnosis and behavior analysis of hyperparameter search algorithms so that users can identify effective hyperparameters through a better-configured search algorithm. Therefore, even if the workflow has similar ontology pathway components, the user behavior and model search performance can be more dynamic and efficient than in previous work.
IDENTIFICATION OF ANALYTIC NEEDS IN HYPERPARAMETER OPTIMIZATION TASK
To learn how practitioners typically work on the hyperparameter optimization process and deal with the resulting problematic issues, and how visual analytics can assist the tasks and reduce the issues, we first conducted semi-structured interviews with ten machine learning practitioners at NAVER. (NAVER Co., Ltd. is the largest web search engine company in South Korea and a global ICT brand that provides services such as the LINE messenger and webtoons.) Based on the user interviews, we established a common hyperparameter optimization workflow. Following the analysis, we present three key analytic needs that a visual analytics approach can support. These findings guide our discovery of the design goals that we aim to address.

Hyperparameter optimization (HyperOpt) refers to finding a set of optimal values for the hyperparameters that have to be set before training a machine learning model. The optimal hyperparameter values are those that maximize the performance (i.e., minimize the error rate) of the trained model. For example, the learning rate, training batch size, batch normalization, and weight decay are considered typical hyperparameters when training a deep learning model. In addition, factors that determine the structure of the deep learning model, such as the number of layers and the size of the convolution filters, can also be considered hyperparameters.

Meanwhile, HyperOpt is performed through an optimization loop, as shown at the bottom of Fig. 2. If we let λ be a hyperparameter setting of a deep neural network A, the optimization loop iteratively samples different λ (λ ∈ Λ) to maximize the validation performance f(λ). Contrary to the approach used in optimizing neural networks, the gradient cannot be computed in this loop, and the loop has no access to any other information about f. Therefore, λ is sampled randomly or through an educated guess from the prior results of the given search space Λ.
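The black-box loop above can be sketched in a few lines of code. The following is a minimal random-search instance: the evaluation f(λ) is abstracted as a callback, and the search space names and ranges are illustrative assumptions, not HyperTendril's actual configuration.

```python
import math
import random

# Hypothetical search space Λ; names and ranges are illustrative.
SEARCH_SPACE = {
    "learning_rate": (1e-5, 1e-1),  # sampled on a log scale
    "weight_decay": (1e-6, 1e-2),   # sampled on a log scale
    "num_layers": (2, 50),          # sampled as an integer
}

def sample_lambda(space, rng):
    """Sample one hyperparameter setting λ from the search space Λ."""
    log_u = lambda lo, hi: math.exp(rng.uniform(math.log(lo), math.log(hi)))
    return {
        "learning_rate": log_u(*space["learning_rate"]),
        "weight_decay": log_u(*space["weight_decay"]),
        "num_layers": rng.randint(*space["num_layers"]),
    }

def optimize(f, space, budget, seed=0):
    """Black-box loop: only evaluations f(λ) are available, no gradients."""
    rng = random.Random(seed)
    best_lam, best_score = None, float("-inf")
    for _ in range(budget):
        lam = sample_lambda(space, rng)  # λ ∈ Λ, sampled with no prior
        score = f(lam)                   # validation performance f(λ)
        if score > best_score:
            best_lam, best_score = lam, score
    return best_lam, best_score
```

Sequential model-based methods differ only in the sampling step: instead of drawing λ uniformly, they make an educated guess from the prior (λ, f(λ)) observations.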
In order to sample λ efficiently, various optimization methods have been proposed in the machine learning community with different approaches, such as sequential model-based [3, 33], genetic-algorithm-based [17], and bandit-based [8, 23] optimization.

Most AutoML frameworks [1, 6, 10, 20, 25, 26, 27, 37] are designed to generate deep learning models from multiple hyperparameter combinations based on given search spaces and return the models with high performance. These systems allow users to configure various settings to incorporate human knowledge and improve search efficiency. In order to find the optimal hyperparameters through an AutoML framework, three key configurations generally need to be established by users at the initial stage: (1) the hyperparameter candidates to search and the ranges to explore, (2) the computational budget to use, and (3) the HyperOpt algorithm to perform the search and its configuration.

In order to identify general workflows of the HyperOpt process and the analytic needs of real users, we conducted interviews with those who have used our AutoML system, launched as an internal beta service in January 2019. The participants were asked to describe their practices when using the AutoML system, such as how they configure the HyperOpt settings and interpret the results. The interviews showed us how they optimize the hyperparameters of their models in general, leading us to identify three analytic needs (N1-N3) for visual analytics during the HyperOpt process. Following the analysis, we summarize and illustrate the overall workflow of the HyperOpt process with the identified needs that can be supported by visual analytics, as shown in Fig. 2.
N1. Identify effective hyperparameters.
As described before, AutoML frameworks usually allow users to configure hyperparameter search spaces as a preset before performing the process. Interviewees stated that they usually check the hyperparameter values of the best k (usually k < 10) models from the prior results and then narrow down the search spaces by exploring the neighborhood of the best hyperparameters with a larger computational budget. However, they are not confident about whether the refined search spaces will yield better models. In addition, they want to understand the impact of each hyperparameter on the model performance so that they can increase search efficiency by dropping ineffective hyperparameters from the candidates or narrowing them down to a subspace where a better model can be found.
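The manual practice the interviewees describe, keeping the best k trials and re-centering each numeric range on their neighborhood, can be sketched as follows. The trial format and the padding factor are assumptions for illustration, not the system's implementation.

```python
def refine_space(trials, k=10, pad=0.2):
    """Shrink each numeric hyperparameter range to the neighborhood of the
    top-k trials. `trials` is a list of (hyperparameter_dict, score) pairs,
    where a higher score is better."""
    top = sorted(trials, key=lambda t: t[1], reverse=True)[:k]
    refined = {}
    for name in top[0][0]:
        values = [params[name] for params, _ in top]
        lo, hi = min(values), max(values)
        margin = (hi - lo) * pad  # widen slightly so the optimum is not clipped
        refined[name] = (lo - margin, hi + margin)
    return refined
```

This heuristic is exactly what the interviewees are unsure about: the top-k neighborhood may exclude a better region, which motivates the quantitative importance analysis described later.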
N2. Understand and diagnose search algorithms.
Those interviewees who have used various search algorithms to tune the hyperparameters of their models are concerned about the configuration of HyperOpt algorithms (e.g., how large the population size or the survivor rate should be set when using an evolutionary optimization algorithm [3, 17]), since the results can vary depending on the values of the algorithm configurations. In addition, numerous machine learning engineers and experts, even though they had experience with deep learning models, were unfamiliar with the various HyperOpt algorithms. They said that it was difficult to understand how each search algorithm works and to obtain configuration values suited to their tasks. They cared about the detailed values of the configuration and expressed the necessity of visual analytics for the diagnosis of the various algorithms and their configurations.
N3. Select models with user requirements.
AutoML returns the model with the best performance score (e.g., a test accuracy) but does not consider other aspects of the model. However, in practice, model features other than the main performance have to be considered for model selection. For example, some users developing edge-device-related services wanted to ensure that the best-performing model is lightweight enough to deploy on small devices and to understand which hyperparameters affect the model size. Other users wanted to validate the training process of the model by checking whether the loss function value is adequately low and saturated. In addition, if the output model did not satisfy the requirements, they carried out subsequent HyperOpt processes until a good trade-off point between requirements and performance was found.

A major finding from the interviews is that the HyperOpt process does not end in a single trial. Due to the main needs (N1-N3) and other auxiliary reasons, such as the lack of computing resources, users iteratively performed the HyperOpt process multiple times until obtaining a satisfactory model. Therefore, even if users take advantage of a HyperOpt framework, they are always in the middle of the optimization loop and have to decide on subsequent actions for better results based on the insights gained from prior results, as illustrated in Fig. 2.

In summary, the analytic needs and workflow we acquired through the interviews were in line with those presented by Li et al. [24] and Wang et al. [44]. However, previous studies did not consider users' lack of experience with HyperOpt algorithms, which limited users' understanding and their means of applying prior knowledge. This motivated us to develop a more advanced design that not only improves model performance but also aids the diagnosis of the algorithm's process.
DESIGN GOALS
In this section, we formalize the primary analytic needs discussed in Section 3 into the key design goals that HyperTendril aims to support. We label the four goals G1-G4.
G1. Qualitative and quantitative analysis of effective hyperparameters.
We aim to provide explicit guidance on effective hyperparameters in both a qualitative and a quantitative manner, by measuring the importance of hyperparameters and visualizing the results effectively (N1). In addition, we aim to provide an overview interface of the configured search spaces for refining them based on the quantitative analysis results.
G2. Effective visual representation for understanding and diagnosing various search algorithms.
The behavior and configuration of the various search algorithms are complex and diverse, and the hyperparameter search space of deep learning models is usually high-dimensional. Therefore, designing a visual representation for understanding the exploration process of the algorithms can be challenging, and none of the previous studies attempted it. We aim to visualize the exploration history of search algorithms in an algorithm-agnostic manner by using multiple coordinated search history views, one for each hyperparameter. This lets users understand the algorithm's behavior on their tasks and choose proper configurations for the algorithms (N2). In addition, we aim to support a monitoring view for the performance of search algorithms so that users can quickly diagnose whether the algorithm is consistently improving their model performance over time.
G3. Interface for filtering and analyzing models from various perspectives.
In order to support model selection and analysis from the vast number of trained models, it is desirable to filter particular models by user requirements along with detailed information about them (N3). In general, automated HyperOpt produces a large number of deep learning models with different hyperparameter combinations. Visualizing every detail of the optimization results can overwhelm users when they select their desired models. Therefore, we aim to present an overview of the optimization results, allowing users to filter particular models by their desired attributes and drill down to a detailed analysis on demand, by tightly integrating with the overview component.
G4. Interactive and effective interface for the iterative optimization process.
As described in Section 3.2, the hyperparameter optimization process is typically not completed in a single run for various reasons, such as limited computing resources, incorrectly configured search algorithms, or large search spaces. Therefore, it is desirable for a system to track users' successive optimization processes and support comprehensive analytic reasoning across them, eventually leading users to optimal results. We aim to design a flexible environment for representing various hyperparameter sets in a scalable manner and tracking multiple optimization processes. Besides, we also aim to design a convenient environment for running a new process by making a tight connection with the backend framework.
HYPERTENDRIL: VISUAL ANALYTICS FOR USER-DRIVEN HYPERPARAMETER OPTIMIZATION
Based on the design goals identified in Section 4, we developed HyperTendril, a visual analytics system for user-driven hyperparameter optimization of deep neural networks. The interface of HyperTendril consists of a Control panel, an Optimization overview module, and an AutoML analytics module, as described below in detail.

In order to support our design goal (G4), HyperTendril is designed to track and analyze multiple HyperOpt processes in a scalable manner. The Control panel (Fig. 1(A)) allows users to control the HyperOpt processes, adjust their configurations (e.g., the search space, the search algorithm, and the computational budget), and run a new process with a revised configuration based on the analysis of previous results.
The Optimization overview (Fig. 3) is designed to provide an overview of the optimization results of AutoML processes with high-level statistics, such as the number of experiments conducted in the processes and the highest model performance so far, as well as the distributions of the model performance along with additional metrics (e.g., the model size and the computing time for model inference). In addition, HyperTendril supports a trade-off plot that helps users choose the best models satisfying their required metrics, to support G3. With these views, users can select the suitable models and include them in the 'Selected experiments' table to expand them into detailed analyses, such as checking detailed hyperparameter settings or examining the training process of the models.
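A trade-off view of this kind typically rests on Pareto dominance over two competing metrics. A minimal sketch, assuming hypothetical 'accuracy' and 'size_mb' fields rather than HyperTendril's actual data model:

```python
def pareto_front(models):
    """Keep models not dominated on (higher accuracy, smaller size).
    `models` is a list of dicts with 'accuracy' and 'size_mb' keys."""
    front = []
    for m in models:
        dominated = any(
            o["accuracy"] >= m["accuracy"] and o["size_mb"] <= m["size_mb"]
            and (o["accuracy"] > m["accuracy"] or o["size_mb"] < m["size_mb"])
            for o in models
        )
        if not dominated:
            front.append(m)  # no other model beats m on both criteria
    return front
```

Every model on this front represents a defensible trade-off, which is why a scatter plot of the two metrics, as in the Optimization overview, lets a user pick the point matching their deployment constraints.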
The AutoML analytics module summarizes the results of HyperOpt processes at three levels of granularity (the search-method level, the hyperparameter level, and the model level) to support the essential tasks in optimizing the hyperparameters of deep learning models.
When performing and analyzing a HyperOpt process, it is importantto first determine whether the used black-box optimization algorithmworked properly on the target task (N2) before performing detailedanalysis on effective hyperparameters and individual models. While The name of the system comes from a vine plant stem called Tendril, whichimplies the design goals of our system. o appear in an IEEE VGTC sponsored conference proceedings Selecting models that meet users’ requirements ( high-performance, lightweight )Showing detail configuration and meta-data of the selected experiment.
Expand to detail analysis of selected deep learning models
Selecting desired models which meets the requirements of services or tasks - Histogram shows distribution of metrics - Scatter plot shows the trade-off between two metrics Fig. 3. The Optimization overview allows users to explore both the high-level statistics of optimization results and the distributions of deep learn-ing models by two important criteria of the model size and prediction ac-curacy. When a user interacts with the histogram plots, it lists the targetmodels and expand them to the detail analysis. many visualization systems have so far devised effective interfaces tosupport HyperOpt tasks [10, 25, 29, 39, 44], we have not found previ-ous approaches which attempt visualizing the behaviors and patternsof various search algorithms that explore the given search spaces. Toincrease the transparency of AutoML, users should be able to under-stand and interpret the process of the black-box optimization. To thisend, HyperTendril provides interfaces from both macro and micro per-spectives. The Exploration overview is designed to support the diagno-sis of HyperOpt algorithms and understanding of their behavior so thatusers can figure out whether the algorithm is properly configured (G2).The user interface is composed of two coordinated views (Fig. 4) for(a) monitoring of performance improvement and (b) the explorationhistory of each hyperparameter search space, respectively.First, the performance monitoring view (Fig. 4(a)) visualizes thepeak performance history of the created models in sequential order asthe HyperOpt process progresses. It allows users to check whether theprocess keeps making progress in improving the model performance.In addition, HyperTendril uses a color encoding for the area plot witha scale of the model performance score. By default, the color map cor-responds to the range between the minimum and the maximum values,which users can interactively modify as well.The exploration history view (Fig. 
4(b)) is composed of multipleplots for each hyperparameter search space. A single plot for a searchspace visualizes models with their hyperparameters. Each model is vi-sualized as a point, the x-axis presents the iteration index of the searchalgorithm, and the y-axis presents the value of its hyperparameter. Inaddition, each represented model is encoded in darker colors with thehigher performance so that users can confirm that the HyperOpt algo-rithms are exploring promising search space regions. The color scale isupdated by the user interactions on the performance monitoring view,as described before.Furthermore, the exploration history view is designed to assist usersin understanding various types of search algorithms, by visualizingthe exploration process. We first categorized well-known search algo-rithms into four types and summarize each characteristic. • Random search: Sample hyperparameters randomly. • Sequential model-based [3, 33]: Sample hyperparameters based onprior results. • Bandit-based [8, 23]: Sample hyperparameters of configured sam-ple size R and evaluate after certain iterations (depending on the R h1 (a)(b) h2 hovering highlights a target model with its performance & mutation history mutation process adjust color scale ofmodel performance performance history Fig. 4. Visual encoding for search method-level analysis in Hyper-Tendril: (a) monitoring model performance improvement and (b) explor-ing the history of each hyperparameter space. and eta ). From the current set of models, successively discard theunderperforming half at every evaluation step. Perform several suc-cessive halvings with the budget given by R . • Population-based [17]: Sample hyperparameters of configured pop-ulation size P and evaluate after certain iterations T . Discard the un-derperformers and keep the k best performers given by survivor rate S and population size P . 
Maintain the population size by copying the parameters of the surviving models and perturbing their hyperparameters. Repeat the process for G generation steps.

Reviewing these algorithms, one may notice that bandit- and population-based methods also need to evaluate and compare models, in addition to sampling hyperparameters. The difference between the two types is that a bandit-based algorithm simply discards the lower performers, whereas a population-based algorithm discards them and creates new models based on the surviving ones. In order to visualize these characteristics, HyperTendril first visualizes the history of model performance, representing the evaluation step in each algorithm's iterations with small points. The survivors have several points along the x-axis, and the small points of the performance history are connected with a single dashed line. The last point represents the final performance of the model and is drawn larger to distinguish it from the preceding history. Next, HyperTendril visualizes the mutation process of the population-based algorithm to represent how the algorithm creates new models from the promising candidates of each generation. Since the mutation process modifies the hyperparameter values of a parent model, the new model is connected to its parent with a curved, solid line to show that the point is a newly created model. Lastly, since these algorithms initially sample hyperparameters in parallel, the generated models are aligned vertically. For categorical hyperparameters (e.g., the type of activation function), points can overlap in the visualization. We address this issue by applying a repulsive force around the target coordinate so that the points do not overlap. For the evaluation and creation process of a single model, its history is highlighted when the mouse cursor is placed on a point (bottom of Fig. 4(b)).
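The two exploration-heavy strategies in this categorization can be summarized in a few lines of Python. This is an illustrative sketch only, not the implementation used by the AutoML backend; `train` and `evaluate` are placeholder callbacks, and the 0.8/1.2 perturbation factors follow the convention described later in the paper.

```python
import random

def successive_halving(configs, train, evaluate, budget_R, eta=3):
    """Bandit-style search: train all configs a little, keep the best
    1/eta fraction, and repeat with a larger per-config budget."""
    alive = list(configs)
    budget = 1
    while len(alive) > 1 and budget <= budget_R:
        scores = {i: evaluate(train(cfg, budget)) for i, cfg in enumerate(alive)}
        keep = max(1, len(alive) // eta)
        ranked = sorted(scores, key=scores.get, reverse=True)[:keep]
        alive = [alive[i] for i in ranked]
        budget *= eta
    return alive[0]

def pbt(sample_config, train, evaluate, P=8, S=0.25, G=5, T=100):
    """Population-based: keep the top-S fraction each generation, clone
    survivors, and perturb each clone's (numeric) hyperparameters."""
    population = [{"config": sample_config(), "state": None} for _ in range(P)]
    for _ in range(G):
        for m in population:
            m["state"] = train(m, steps=T)
        population.sort(key=evaluate, reverse=True)
        k = max(1, int(S * P))
        survivors = population[:k]
        children = []
        for _ in range(P - k):
            parent = random.choice(survivors)
            # Perturb numeric values by a multiplicative factor; categorical
            # values would instead be resampled from their distribution.
            cfg = {h: v * random.choice((0.8, 1.2))
                   for h, v in parent["config"].items()}
            children.append({"config": cfg, "state": parent["state"]})
        population = survivors + children
    return max(population, key=evaluate)
```

In the Exploration overview, the successive discards of `successive_halving` appear as points that stop at an evaluation step, while the clone-and-perturb step of `pbt` appears as a solid curved line from parent to child.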
In addition, by displaying the iteration index required by specific algorithms as text and as a gradation (at the top and in the interior of the plot box in Fig. 4, respectively), users can intuitively and quickly understand how an algorithm searches the given search space and diagnose its behavior.

We tested the Exploration overview on five search algorithms with different configuration settings, as shown in Fig. 5. Random search and sequential model-based algorithms (left), which have no complex exploration process, reveal few search patterns in the given search spaces. Bandit-based algorithms (middle) show that the promising models survived while the worse models were discarded. Interestingly, although 'Bayesian optimization' does not reveal an apparent search pattern, 'BOHB', which combines 'Bayesian optimization' and 'HyperBand', reveals a distinct search pattern. On the other hand, 'PBT' reveals distinct search patterns depending on the configuration values.

Fig. 5. The exploration results of the five different HyperOpt methods with different configurations are visualized in the Exploration overview: Random search (top-left), Bayesian optimization (bottom-left), HyperBand (top-middle), BOHB (bottom-middle), and PBT (right). It shows that each method explores the same hyperparameter search spaces in a different manner, and the results vary with the configuration values of each method.

One of the most important analytic needs found in the preliminary study (Section 3) was to identify the hyperparameters with a significant impact on model performance (N1). In order to support such needs, HyperTendril provides both qualitative and quantitative analysis interfaces to identify effective hyperparameters (G1).
Search space overview: The Search space overview (Fig. 6(A)) utilizes a parallel coordinates plot [16] to present the overall search spaces and exploration results, supporting qualitative analysis. In this view, different combinations of hyperparameters are effectively visualized as high-dimensional vectors, together with a particular objective metric (e.g., test accuracy) chosen by users. Its interaction capability also serves our design goals of aiding users in analyzing the effective hyperparameters (G1), filtering the desired models (G3), and refining the search space (G1) via brushing interactions. Hyperparameters are arranged in parallel along their corresponding axes, and the objective metric is placed on the last axis. When multiple HyperOpt processes with different hyperparameter search spaces are loaded (G4), the system flexibly adjusts the range of each axis using the minimum and maximum values of the corresponding hyperparameter and objective metric.

In addition, to support quantitative analysis, HyperTendril utilizes the functional ANOVA (fANOVA) method [15], which measures the importance of hyperparameters based on the tested machine learning models. In the machine learning research community, there have been numerous studies on assessing and quantifying the importance of hyperparameters [9, 15, 18, 31, 40, 41]. Most of them quantify the importance of hyperparameters by building a performance estimation model that predicts the dependent variable (i.e., model performance) from the independent variables (i.e., hyperparameter configurations). Among the various approaches, we chose fANOVA since it ensures linear-time performance in computing the importance and covers both single-hyperparameter and interaction (i.e., joint) effects. The top of the parallel coordinates in the Search space overview (Fig. 6(A)) summarizes the results of the importance analysis.
The visualization is composed of two layers of bar plots: one for the importance of individual hyperparameters and one for the selected hyperparameter. The width of each bar represents the relative importance among hyperparameters (Fig. 6(a)), and the bars are ordered by impact from right to left. Each vertical curved line connects a bar to the corresponding hyperparameter axis of the parallel coordinates. When users interact with an individual importance bar (b), the target hyperparameter is visualized in the second layer along with its related interaction effects, and the estimated performance values over the search space region are visualized in the parallel coordinates as a bar chart. Through this explicit guidance toward effective hyperparameters and their promising regions, HyperTendril can assist users in identifying the promising hyperparameters and in refining the search spaces for a subsequent HyperOpt process by narrowing or widening them. Also, hyperparameters with low performance expectations can be removed from the search spaces, effectively steering the search process.

The Search space overview also provides an interface for investigating the interactions between hyperparameters so that users can consider their relationships when modifying the search space (G1). If users interact with another hyperparameter in the second layer while a single hyperparameter is activated (Fig. 6(c)), the Search space overview summarizes the interaction effect between the two selected hyperparameters on the parallel coordinates, and users can investigate the effect of a brushed range on the other hyperparameter. Fig. 6(c) shows how a user can find a promising sub-space of the weight decay hyperparameter, considering its relationship with the learning rate hyperparameter.
When the user brushes the range between 10− and 10− on the 'weight decay' axis, the bar plot on the 'learning rate' axis is updated to a low height, showing that the selected range could not produce a higher performance score. When the user brushes the range between 10− and 10−, the bar plot is updated to full height, showing that the range is good for model performance (Fig. 6(d)).

Hyperparameter importance view: While the Search space overview supports identifying the effective hyperparameters, visualizing the details of the importance estimation in that view can overwhelm users. To provide the details, we designed the Hyperparameter importance view, as shown in Fig. 6(B). The Hyperparameter importance view utilizes a matrix-based visualization, originally designed to visualize set intersections [22], to efficiently represent the importance of both individual hyperparameters and interactions between them. Each hyperparameter is placed in a column of the matrix, and the measured importances are listed by value so that users can quickly recognize and focus on the most important hyperparameters. The corresponding hyperparameter for each row is visualized with a filled circle, and circles are connected via a line if the row represents an interaction effect between hyperparameters. The value of each importance and its confidence are visualized next to the circles, and if a user interacts with a particular row, the details of the performance estimation result are visualized. Fig. 6(B) shows examples of these details. The bottom-middle of the figure presents the detail of the weight decay hyperparameter: a line and interval area of the estimated model performance (y-axis) over the hyperparameter value (x-axis) are visualized, where the green-colored interval shows the standard deviation of the estimation. The bottom-right presents the detail of the interaction effect between the weight decay and learning rate hyperparameters. A heatmap visualization shows the estimated model performance over the hyperparameter search spaces to provide an overall view of the dependency between them.

Fig. 6. Illustration of visual exploration of (A) Search space overview and (B) Hyperparameter importance view to (a and b) identify effective hyperparameters, (c) analyze interaction effects between two selected hyperparameters, and (d) refine the search space to explore further.
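The importance scores and interaction heatmaps above come from fANOVA, which fits a random-forest surrogate and decomposes the variance of predicted performance into per-hyperparameter marginals. As a rough intuition only (not HyperTendril's implementation), the same variance-decomposition idea can be approximated directly on the observed configurations by binning:

```python
import numpy as np

def marginal_importance(X, y, bins=8):
    """Crude fANOVA-style importance: for each hyperparameter (column of X),
    bin its values, average y within each bin, and report the variance of the
    bin means as a fraction of the total variance of y. Real fANOVA computes
    these marginals analytically from a random-forest surrogate."""
    y = np.asarray(y, dtype=float)
    total_var = y.var()
    importance = []
    for j in range(X.shape[1]):
        edges = np.quantile(X[:, j], np.linspace(0, 1, bins + 1)[1:-1])
        idx = np.digitize(X[:, j], edges)
        labels = np.unique(idx)
        means = np.array([y[idx == b].mean() for b in labels])
        counts = np.array([(idx == b).sum() for b in labels])
        importance.append(np.average((means - y.mean()) ** 2,
                                     weights=counts) / total_var)
    return importance

def marginal_heatmap(X, y, j, k, bins=5):
    """2-D analogue behind the interaction heatmap: mean performance over a
    grid spanned by two hyperparameters j and k."""
    qj = np.quantile(X[:, j], np.linspace(0, 1, bins + 1)[1:-1])
    qk = np.quantile(X[:, k], np.linspace(0, 1, bins + 1)[1:-1])
    bj, bk = np.digitize(X[:, j], qj), np.digitize(X[:, k], qk)
    grid = np.full((bins, bins), np.nan)
    for a in range(bins):
        for b in range(bins):
            sel = (bj == a) & (bk == b)
            if sel.any():
                grid[a, b] = y[sel].mean()
    return grid
```

A hyperparameter whose marginal explains a large share of the variance gets a wide bar in the Search space overview; a structured `marginal_heatmap` grid corresponds to a visible interaction effect.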
HyperTendril allows users to switch from the result overview to in-depth analysis on demand so that they can evaluate the training process of deep learning models (e.g., reviewing a learning curve to diagnose under-fitting or over-fitting), supporting G3 of our design goals. When models are filtered or selected from the inter-linked components ((B), (C), and (E1) in Fig. 1), the Model analysis view visualizes a line plot of each model over training iterations with the metrics users selected, as shown in Fig. 1(D). It supports various interactions for model analysis, such as aggregating the moving average of a metric, area zooming, and others.
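Rendering every logged point of an industry-scale run in these line plots is expensive; as described in the deployment section, HyperTendril bounds the cost by reservoir sampling [42] a fixed number of points per model. A standard Algorithm R sketch (not the system's exact code):

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Algorithm R: keep a uniform random sample of k items from a stream
    of unknown length in a single pass and O(k) memory."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randrange(i + 1)     # uniform index in [0, i]
            if j < k:
                sample[j] = item         # replace with decreasing probability
    return sample
```

Capping each model's log at, say, `k = 1000` points keeps the Model analysis view responsive regardless of how long training ran, at the cost of plotting a uniform subsample rather than every iteration.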
6 Deployment to Industrial-Scale Environment
We have deployed HyperTendril on NSML [19, 36], a cloud-based machine learning platform by NAVER. NSML supports a HyperOpt framework called CHOPT [20], which provides various optimization methods, from simple random search to advanced algorithms [3, 8, 17, 23, 33]. In addition, it provides clients with various APIs that configure and run AutoML processes and retrieve the recorded data. Users who want to use the HyperOpt framework for their models can easily do so by adding only a few lines of code and a configuration file, which contains the metric to be optimized, the hyperparameter spaces to be searched, the methods for exploring the search spaces, and other settings. Once an AutoML process is submitted to the AutoML agent, users can utilize HyperTendril to explore the summarized results. We note that although HyperTendril runs on top of this internal system, it is designed to work with various AutoML frameworks.

Regarding implementation details, each component was implemented with React.js (https://reactjs.org/), and we used D3.js v5 (https://d3js.org/) to build the visualization components. The fANOVA method for estimating hyperparameter importance was implemented based on the official source code written in Python (https://github.com/automl/fanova). The model data generated by AutoML processes and the scalar data of the models are passed from the backend to the interface in JSON format. Meanwhile, the volume of scalar data generated from an industry-scale model is generally large, so rendering performance degrades significantly when users compare and analyze multiple models in the Model analysis view. To mitigate this issue, we utilize a reservoir sampling [42] technique, which fixes the number of samples drawn from the log data of each selected model.

7 Case Studies
To demonstrate how HyperTendril can help users optimize their models and achieve our design goals, we conducted case studies in collaboration with two ML experts with expertise in computer vision and natural language processing, respectively (denoted as P1 and P2).
In this case, we illustrate how HyperTendril helps users understand and diagnose an executed HyperOpt algorithm and its configuration, supporting G2, G3, and G4 of our design goals.

P1, a machine learning engineer developing image classification models, optimized the hyperparameters of the ResNet [12] model for the CIFAR100 dataset. He first set the hyperparameter search spaces to optimize: layer depth, weight decay, and learning rate. With little prior knowledge of AutoML methods, he created a single AutoML process using the Population-Based Training (PBT) [17] method, as he had heard that the method could effectively optimize deep learning models in a short amount of time. Because he had no understanding of the method, he used the default configuration provided by the example code. After the AutoML process was completed, he checked the results through HyperTendril and found that the recorded best performance score was 55.23%, much lower than the architecture's known performance. Through the Exploration overview, he confirmed that the model performance continually improved after a few iterations of the PBT algorithm, but he could not resolve why the performances were generally low (Fig. 7(a)).

To find the cause of this phenomenon, he analyzed the detailed behavior of the PBT method through the Exploration overview. He observed that the method focused only on training networks with 20 layers and discarded the rest of the networks with deeper layers, as shown in Fig. 7(a). Following this observation, he analyzed the training status of networks with different depths, which had been randomly sampled at the initial stage of the PBT method, through the Model analysis view by interacting with them on the Exploration overview (supporting G3). The Model analysis view (Fig. 7(c)) revealed that the AutoML method had compared the networks' performances (T1 in Fig. 7(c)) before the training of each network had saturated its loss function values.
Based on this finding, he learned that the performance evaluation time should be carefully configured to prevent the method from discarding models at too early a stage of training. Utilizing this insight, he refined the configuration of the PBT method by delaying the evaluation timing (to the 150th epoch) and created a new AutoML process (supporting G2 and G4). He obtained an accurate model with a 78.07% test accuracy score, about 23% higher than that of the initial process.
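P1's lesson generalizes: comparing models before their losses saturate favors fast-starting shallow networks over slow-starting deep ones. This can be illustrated with two made-up exponential learning curves (toy numbers for illustration, not P1's actual data):

```python
import math

# Hypothetical learning curves: a shallow net converges quickly but
# plateaus low, a deep net starts slowly but reaches higher accuracy.
def acc_shallow(epoch):
    return 0.60 * (1 - math.exp(-epoch / 5))

def acc_deep(epoch):
    return 0.80 * (1 - math.exp(-epoch / 50))

# Comparing at epoch 5 (an early default) favors the shallow net;
# comparing at epoch 150 (a delayed evaluation) ranks the deep net first.
assert acc_shallow(5) > acc_deep(5)
assert acc_deep(150) > acc_shallow(150)
```

Under an early evaluation time, a PBT-style method would discard the deep network even though it would eventually win, which matches the behavior P1 observed.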
Next, we illustrate how HyperTendril helps users identify effective hyperparameters and refine their search spaces, supporting G1 and G4.
Fig. 7. The Exploration overview shows the results of two HyperOpt processes using the PBT method with different evaluation times T, configured by the user: (a) the initial configuration of PBT (T1: 5th epoch), (b) the refined configuration of PBT (T2: 150th epoch), and (c) the training process of the initial population. The Model analysis view shows the training status of six models with different layer depths, which were sampled by the method at the initial stage. It shows that the method evaluates the models' performance for each population at different times depending on the value of T; at T1, the method considers the shallow networks as promising.

P2, who works on sentiment analysis of customer reviews, optimized the hyperparameters of an LSTM [13]-based model that classifies positive and negative movie reviews at a major portal company. She first set the hyperparameter search spaces to optimize: learning rate, weight decay, dropout, learning rate scheduler, and the two coefficients (i.e., betas, used for computing running averages of the gradient) of the Adam [21] optimizer. She started an initial AutoML process with the Bayesian optimization method, set to explore 100 hyperparameter configurations. After the initial process finished, she first checked the best recorded performance (89.9%) and the overall performance distribution in the Optimization overview.

Next, she wanted to identify the hyperparameters that have significant impacts on the validation accuracy, following the visual exploration illustrated in Fig. 6. She first brushed the top of the last axis of the parallel coordinates in the Search space overview to examine which regions of each hyperparameter are effective for the validation accuracy (Fig. 6(a)). She noticed that lower values of the weight decay hyperparameter affect the performance score, observing a number of polylines at the bottom of the 'weight decay' axis.
In addition, she found that weight decay had the greatest influence on the model performance, by observing the list of estimated importance boxes and noting that its axis is placed nearest to the last axis of the parallel coordinates. To identify which regions of the search space are effective, she interacted with the weight decay hyperparameter box (Fig. 6(b)). She could then recognize that the effective values of the hyperparameter lie under about 2 × −, by looking at the bar plot on the 'weight decay' axis and the detailed estimation results in the Hyperparameter importance view. Following this, she interacted with the 'learning rate' box located to the left of the 'weight decay' box (Fig. 6(c)) to examine the interaction effect with this highly correlated hyperparameter. She then checked the overall dependency through the heatmap visualization in the Hyperparameter importance view and tried to find the area that maximizes the expected performance by brushing on the 'weight decay' axis (Fig. 6(c and d)). After finding the effective area of the weight decay (between 10− and 10−), she decided to perform a subsequent optimization process with more computational resources on that area. In a similar manner, she analyzed the effective regions of the other hyperparameters and refined their search spaces.

After analyzing all the hyperparameters, she used the Control panel to start a new AutoML process with the refined search spaces. When the second process was complete, she noticed that the best performance score was 90.38%, 0.48% better than in the previous process (supporting G1 and G4). She also observed that the performance distribution was more stable than in the initial process (µ = 0.871 (from 0.728), σ = 0.059 (from 0.175)), implying that the refined search spaces yield generally good performance on her task. Satisfied with the results, she decided to stop the optimization process.

Fig. 8. Interface usage distribution of HyperTendril, by analysis level (method-, hyperparameter-, and model-level) and interaction frequency, with 32 users and 223 hyperparameter optimization processes.
8 User Study
In this section, we evaluate the utility of HyperTendril. In Section 8.1, we summarize how HyperTendril was used, based on interaction log data collected after deployment. In Section 8.2, we describe various use cases of HyperTendril by representative users and discuss its utility through in-depth interviews.
In order to understand how users generally use HyperTendril, we first collected and analyzed interaction logs for each interface. We released HyperTendril as an internal service on October 1, 2019, and collected interaction logs from November 1, 2019, to March 1, 2020. There were 32 users in total during this period, and interaction logs for 223 AutoML processes were collected. We categorized the collected logs according to each analysis level of granularity and summarized the interface usage by interaction frequency, as shown in Fig. 8. Based on this analysis, we found that the interaction frequency is evenly distributed across the analysis levels. This result suggests that users have various types of analytic needs in performing HyperOpt tasks, as identified in Section 3, and that HyperTendril properly supports these needs.
To better understand how users performed HyperOpt tasks on their own problems with HyperTendril, we conducted in-depth interviews with engineers and scientists who actively used our tool, selected based on the log analysis described above. We summarize key findings and feedback from these interviews to highlight HyperTendril's benefits.
We selected and recruited the three most active users of HyperTendril, each from a different team and domain and with different usage behavior:
User A is a machine learning engineer with expertise in natural language processing. He works with the news team to develop classification models for malicious comments on news articles.
User B is a research engineer with expertise in computer vision. He is developing a model that retrieves similar images or blog postings on the web given a query image. He is interested not only in effectively tuning model performance but also in optimizing the model size so that it can be mounted on edge devices.
User C is a research scientist with expertise in machine learning and AutoML. Unlike A and B, he studies and develops AutoML frameworks, including hyperparameter optimization, neural architecture search, and others. He is interested in the behavioral differences among various optimization algorithms.

We had a 60-minute session with each of the three participants. For the first 20 minutes, we asked them a few questions about their tasks, the typical workflows of their HyperOpt processes, and their main analysis intent when using HyperTendril. We then asked them to revisit a process they had performed with HyperTendril and to describe the sequence of interactions when analyzing the results while thinking aloud.
We summarize the main findings and feedback from the interviews under the following criteria, highlighting how HyperTendril helps users perform their optimization processes and achieves our design goals.
Diagnose AutoML and algorithm configuration.
Users A and C intensively used the PBT algorithm to perform HyperOpt processes. Both commented that HyperTendril helped them understand how the algorithm works on their tasks and diagnose their configurations, satisfying N2 and G2. User A said, "Although we had never used the PBT method before, the Exploration overview helped to build a mental model of how the method explores the search spaces. We initially set the algorithm configuration with a default value but observed that the process converged too quickly to a particular point without performance improvement, due to the low survivor rate of each generation. We then expanded the diversity of the population by increasing the survivor rate and could find better models. The visualization can help to analyze the trade-offs between exploration and exploitation and to determine which configuration is appropriate for our task."

Fig. 9. The use case of the Exploration overview to diagnose an AutoML method. It shows that the PBT method can have a bias toward low-value regions (here, of L2 regularization) when exploring a search space if there is no proper scaling of the perturbation in its implementation.

User C commented that HyperTendril could assist AutoML algorithm developers in identifying bugs in the implementation of AutoML methods. Fig. 9 shows the results of the PBT optimization method. The Exploration overview reveals that the method tends to explore the low-value regions of the search space, and that the perturbation scale in the high-value region (Fig. 9(A)) is greater than that in the low-value region (B). Through the visualization, User C was able to quickly recognize the strange behavior of the perturbation and identify the bug in the current implementation of the AutoML algorithm, incorporating his domain knowledge into the analysis [38]. In the AutoML framework, the PBT method copies a promising model's parameters and randomly perturbs its hyperparameters with noise (usually by a factor between 0.8 and 1.2 of the original value) in the population. However, the current implementation performed the perturbation from the promising hyperparameter value without considering the scaling factor. In this case, hyperparameters with smaller values tend to have a smaller perturbation range than hyperparameters with larger values, causing the method to be biased toward the low-value regions when exploring the search space. User C said, "It was difficult to identify problems in an algorithm's implementation that did not produce an error using only the standard console output, but the Exploration overview helped to identify the implemented algorithm's abnormal behavior by displaying the search history and its tendency."
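One way to see why such unscaled perturbation drifts toward low-value regions: if each generation multiplies the current (promising) value by a factor of 0.8 or 1.2 with equal probability, the expected log-step is log(0.8 × 1.2) / 2 = log(0.96) / 2 < 0, so the compounding walk trends downward even though no single step prefers small values. A toy demonstration of this effect (an illustration of the general phenomenon, not the framework's actual code):

```python
import random
import statistics

def median_after_perturbations(generations=200, trials=500, seed=0):
    """Simulate repeatedly perturbing a hyperparameter from its *current*
    value by a multiplicative factor of 0.8 or 1.2, and return the median
    final value across independent trials (starting value is 1.0)."""
    rng = random.Random(seed)
    finals = []
    for _ in range(trials):
        v = 1.0
        for _ in range(generations):
            v *= rng.choice((0.8, 1.2))  # compounding, unscaled perturbation
        finals.append(v)
    return statistics.median(finals)
```

With no perturbations the value stays at 1.0, but after many compounding steps the median collapses well below the starting point, reproducing the low-value bias visible in Fig. 9.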
Identify effective hyperparameters.
All the participants appreciated the interactive and iterative capabilities of the HyperOpt process based on the guidance toward effective hyperparameters (N1, G1, and G4). They believed such interaction could improve an AutoML process, commenting that "Human involvement with prior knowledge and observation can sometimes be more efficient than optimization algorithms, especially when there is a large search space and a limited computational budget." User A said, "Even if there is enough time and computational budget, it takes a long time to try all the cases and check the results. So the optimization process can be more effective if we can focus on a few effective hyperparameters. The hyperparameter-level analysis view helps to refine search spaces by suggesting hyperparameters that should be dropped or explored more in a subsequent optimization process, and it helps engagement with the optimization task by providing the importance quantitatively." In addition, they valued the complementary interactions between the Hyperparameter importance view and the Search space overview. User B said, "Effective and ineffective hyperparameters can be observed in the Search space overview, but they can be further verified in the Hyperparameter importance view, enabling more accurate reasoning for search space refinement."
Select and validate target models.
All users appreciated that visual exploration provided the ability to quickly select interesting models out of a large number of models (N3 and G3). User B, who works closely with real services, actively used and appreciated the Optimization overview and Model analysis view to validate his models for service deployment. He said, "The trade-off plot in the Optimization overview helps to filter out models that do not satisfy the required network size for our service. Also, the exploration flow from the overview to the in-depth analysis of filtered models is intuitive, and it was an essential part of the final validation since we usually report a dozen validation-related metrics in each iteration during model training." Meanwhile, Users A and C commented that the Model analysis view helps to validate and determine the proper evaluation time in each exploration loop of HyperOpt algorithms. User C said, "Because the slope of the loss function can vary depending on the hyperparameters of a deep learning model, setting the proper validation time is important in using AutoML. Therefore, it is necessary to check the loss function values through the Model analysis view and set an appropriate validation time so that the early stopping method operates without biases toward specific hyperparameters."
9 Discussions and Future Work
Scalability of visualization.
Although parallel coordinates in HyperTendril may raise scalability concerns, the average number of hyperparameters used in the real industrial environment was found to be around 5, with a maximum of 16, partly due to users' limited cognitive load as well as limited computational budget and time. We therefore believe that the scalability of parallel coordinates in terms of the number of dimensions is acceptable for most real-world HyperOpt tasks on a cloud-based machine learning platform. Another scalability issue is caused by the increasing number of tested hyperparameter settings, each of which is drawn as a polyline in the parallel coordinates. A technique for bundling polylines based on hyperparameter importance will be studied in future work.
Restricted hyperparameter importance estimation.
Finding effective hyperparameters is one of the most important needs in HyperOpt tasks. However, with the PBT method, critical limitations exist for the fANOVA approach we employed, because the final model can utilize multiple sets of hyperparameters during the HyperOpt process, leading to incorrect results in the variable importance analysis. A further study on the methodology would be helpful in handling such cases.
10 Conclusion
We presented HyperTendril, a visual analytics system that supports the analysis and steering of the hyperparameter optimization process for deep neural networks. We conducted preliminary interviews with ten machine learning practitioners across various domains to identify their analytic needs. Based on the interviews, we distilled four main design goals: (1) guidance toward effective hyperparameters; (2) understanding and diagnosing various AutoML methods; (3) reasoning about and filtering target models; and (4) supporting iterative optimization processes. We then proposed a visual analytics system allowing users to examine multiple AutoML processes and analyze the results at three levels of granularity: search method-, hyperparameter-, and model-level analysis. HyperTendril has been deployed on top of a machine learning platform at a major software company. We presented a user study with real-world users of the platform and use cases of how HyperTendril can be utilized in different applications. Our results showed that users appreciated the utility of HyperTendril in refining the search space and the method configuration of AutoML.

Acknowledgments
This work was partly supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (2019R1A2C4070420) and by Korea Electric Power Corporation (grant number R18XA05).

References

[1] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama. Optuna: A next-generation hyperparameter optimization framework. In Proc. the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 2623–2631, 2019.
[2] J. Bergstra and Y. Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research (JMLR), 13(Feb):281–305, 2012.
[3] J. S. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl. Algorithms for hyper-parameter optimization. In Proc. the Advances in Neural Information Processing Systems (NeurIPS), pages 2546–2554, 2011.
[4] D. Bruckner. ML-o-scope: A diagnostic visualization system for deep machine learning pipelines. Master's thesis, EECS Department, University of California, Berkeley, May 2014.
[5] S. Chung, C. Park, S. Suh, K. Kang, J. Choo, and B. C. Kwon. ReVACNN: Steering convolutional neural network via real-time visual analytics. In Future of Interactive Learning Machines Workshop at the 30th Annual Conference on Neural Information Processing Systems (NIPS).
Computer Graphics Forum, volume 36, pages 458–486. Wiley Online Library, 2017.
[8] S. Falkner, A. Klein, and F. Hutter. BOHB: Robust and efficient hyperparameter optimization at scale. arXiv preprint arXiv:1807.01774, 2018.
[9] C. Fawcett and H. H. Hoos. Analysing differences between algorithm configurations through ablation. Journal of Heuristics, 22(4):431–458, 2016.
[10] D. Golovin, B. Solnik, S. Moitra, G. Kochanski, J. Karro, and D. Sculley. Google Vizier: A service for black-box optimization. In Proc. the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 1487–1495, 2017.
[11] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. 2016.
[12] K. He et al. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
[13] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[14] F. Hohman, N. Hodas, and D. H. Chau. ShapeShop: Towards understanding deep learning representations via interactive experimentation. In Proc. the ACM SIGCHI International Conference on Human Factors in Computing Systems (CHI). ACM, 2017.
[15] F. Hutter, H. Hoos, and K. Leyton-Brown. An efficient approach for assessing hyperparameter importance. In Proc. the International Conference on Machine Learning (ICML), pages 754–762, 2014.
[16] A. Inselberg and B. Dimsdale. Parallel coordinates for visualizing multi-dimensional geometry. In Computer Graphics 1987, pages 25–44, 1987.
[17] M. Jaderberg, V. Dalibard, S. Osindero, W. M. Czarnecki, J. Donahue, A. Razavi, et al. Population based training of neural networks. arXiv preprint arXiv:1711.09846, 2017.
[18] D. Jia, R. Wang, C. Xu, and Z. Yu. QIM: Quantifying hyperparameter importance for deep learning. In Proc. the IFIP International Conference on Network and Parallel Computing, pages 180–188, 2016.
[19] H. Kim, M. Kim, D. Seo, J. Kim, H. Park, S. Park, H. Jo, K. Kim, Y. Yang, Y. Kim, et al. NSML: Meet the MLaaS platform with a real-world case study. arXiv preprint arXiv:1810.09957, 2018.
[20] J. Kim, M. Kim, H. Park, E. Kusdavletov, A. Kim, J.-H. Kim, J.-W. Ha, and N. Sung. CHOPT: Automated hyperparameter optimization framework for cloud-based machine learning platforms. arXiv preprint arXiv:1810.03527, 2018.
[21] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[22] A. Lex, N. Gehlenborg, H. Strobelt, R. Vuillemot, and H. Pfister. UpSet: Visualization of intersecting sets.
IEEE Transactions on Visualization andComputer Graphics (TVCG) , 20(12):1983–1992, 2014.[23] L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, and A. Talwalkar. Hy-perband: A novel bandit-based approach to hyperparameter optimization.
Journal of Machine Learning Research (JMLR) , 18(1):6765–6816, 2017.[24] T. Li, G. Convertino, W. Wang, H. Most, T. Zajonc, and Y.-H. Tsai. Hy-perTuner: Visual analytics for hyperparameter tuning by professionals.[25] R. Liaw, E. Liang, R. Nishihara, P. Moritz, J. E. Gonzalez, and I. Stoica. Tune: A research platform for distributed model selection and training. arXiv preprint arXiv:1807.05118 , 2018.[26] J. Liu, S. Tripathi, U. Kurup, and M. Shah. Auptimizer–an extensi-ble, open-source framework for hyperparameter tuning. arXiv preprintarXiv:1911.02522 , 2019.[27] Microsoft. NNI open-source https://github.com/microsoft/nni. Last ac-cessed 3 September 2020.[28] P. Moritz, R. Nishihara, S. Wang, A. Tumanov, R. Liaw, E. Liang,W. Paul, M. I. Jordan, and I. Stoica. Ray: A distributed framework foremerging ai applications. arXiv preprint arXiv:1712.05889 , 2017.[29] H. Park, J. Kim, M. Kim, J.-H. Kim, J. Choo, J.-W. Ha, and N. Sung. Vi-sualHyperTuner: Visual analytics for user-driven hyperparameter tuningof deep neural networks. In
Demo in the International Conference onMachine Learning and Systems , 2019.[30] N. Pezzotti, T. H¨ollt, J. Van Gemert, B. P. Lelieveldt, E. Eisemann, andA. Vilanova. DeepEyes: Progressive visual analytics for designing deepneural networks.
IEEE Transactions on Visualization and ComputerGraphics (TVCG) , 24(1):98–108, 2018.[31] P. Probst, B. Bischl, and A.-L. Boulesteix. Tunability: Importanceof hyperparameters of machine learning algorithms. arXiv preprintarXiv:1802.09596 , 2018.[32] D. Sacha, M. Kraus, D. A. Keim, and M. Chen. Vis4ml: An ontology forvisual analytics assisted machine learning.
IEEE Transactions on Visual-ization and Computer Graphics (TVCG) , 25(1):385–395, 2018.[33] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. De Freitas. Tak-ing the human out of the loop: A review of bayesian optimization.
Proc.the IEEE , 104(1):148–175, 2015.[34] D. Smilkov, S. Carter, D. Sculley, F. B. Vi´egas, and M. Wattenberg.Direct-manipulation visualization of deep networks. arXiv preprintarXiv:1708.03788 , 2017.[35] T. Spinner, U. Schlegel, H. Sch¨afer, and M. El-Assady. explAIner:A visual analytics framework for interactive and explainable machinelearning.
IEEE Transactions on Visualization and Computer Graphics(TVCG) , 26(1):1064–1074, 2019.[36] N. Sung, M. Kim, H. Jo, Y. Yang, J. Kim, L. Lausen, Y. Kim, G. Lee,D. Kwak, and J.-W. Ha. NSML: A machine learning platform that enablesyou to focus on your models. arXiv preprint arXiv:1712.05902 , 2017.[37] T. Swearingen, W. Drevo, B. Cyphers, A. Cuesta-Infante, A. Ross, andK. Veeramachaneni. ATM: A distributed, collaborative, scalable systemfor automated machine learning. In
Proc. the IEEE International Confer-ence on Big Data (Big Data) , pages 151–162, 2017.[38] G. K. Tam, V. Kothari, and M. Chen. An analysis of machine-and human-analytics in classification.
IEEE Transactions on Visualization and Com-puter Graphics (TVCG) , 23(1):71–80, 2016.[39] C. Tsirigotis, X. Bouthillier, F. Corneau-Tremblay, P. Henderson,R. Askari, S. Lavoie-Marchildon, T. Deleu, D. Suhubdy, M. Noukhovitch,F. Bastien, et al. Or´ıon: Experiment version control for efficient hyperpa-rameter optimization. 2018.[40] J. N. van Rijn and F. Hutter. An empirical study of hyperparameter impor-tance across datasets. In
AutoML@ PKDD/ECML , pages 91–98, 2017.[41] J. N. van Rijn and F. Hutter. Hyperparameter importance across datasets.In
Proc. the ACM SIGKDD International Conference on Knowledge Dis-covery and Data Mining (KDD) , pages 2367–2376, 2018.[42] J. S. Vitter. Random sampling with a reservoir.
Journal of ACM Transac-tions on Mathematical Software (TOMS) , 11(1):37–57, 1985.[43] D. Wang, P. Ram, D. K. I. Weidele, S. Liu, M. Muller, J. D. Weisz, A. Va-lente, A. Chaudhary, D. Torres, H. Samulowitz, et al. Autoai: Automat-ing the end-to-end ai lifecycle with humans-in-the-loop. In
Proceedingsof the 25th International Conference on Intelligent User Interfaces Com-panion , pages 77–78, 2020.[44] Q. Wang, Y. Ming, Z. Jin, Q. Shen, D. Liu, M. J. Smith, K. Veeramacha-neni, and H. Qu. ATMSeer: Increasing transparency and controllabilityin automated machine learning. In
Proc. the ACM SIGCHI InternationalConference on Human Factors in Computing Systems (CHI) , page 681,2019.[45] D. K. I. Weidele, J. D. Weisz, E. Oduor, M. Muller, J. Andres, A. Gray,and D. Wang. AutoAIViz: opening the blackbox of automated artificialintelligence with conditional parallel coordinates. In
Proc. the ACM Con-ference on Intelligent User Interfaces (IUI) , pages 308–312, 2020.[46] S. R. Young, D. C. Rose, T. P. Karnowski, S.-H. Lim, and R. M. Pat-ton. Optimizing deep learning hyper-parameters through an evolution-ary algorithm. In
Proceedings of the Workshop on Machine Learning inHigh-Performance Computing Environments , pages 1–5, 2015., pages 1–5, 2015.