Automated Machine Learning Service Composition
Felix Mohr, Marcel Wever, Eyke Hüllermeier
Paderborn University
Abstract.
Automated service composition, the process of creating new software in an automated fashion, has been studied in many different ways over the last decade. However, the impact of automated service composition has been rather small, as its utility in real-world applications has not been demonstrated so far. This paper presents MLS-Plan, an algorithm for automated service composition applied to the area of machine learning. Empirically, we show that MLS-Plan is competitive with, and sometimes beats, algorithms that solve the same task but do not benefit from the advantages of a service model. Thereby, we present a real-world example that demonstrates the utility of automated service composition in contrast to non-service-oriented solutions in the same area.
Automated service composition as the process of creating new software in an automated fashion has been studied in many different ways over the last decade [1]. The most commonly addressed problem is the composition or configuration of a single process, either by instantiating or refining an abstract workflow [2,3,4] or by creating such a process from scratch given some behavior description in terms of preconditions and effects [5,6,7].

In the last years, much of the euphoria about automated composition has disappeared. First, services in the real world did not appear as nicely described as expected, which rules out many approaches relying on such descriptions. Even though much functionality is available as services, semantic descriptions, e.g., in OWL-S, are rare. Second, even for approaches not relying on such assumptions, automated service composition has been resolved mostly on toy examples and has not been shown to be relevant in real-world scenarios. In fact, there is only a handful of approaches that leave the description level at all to work with actually implemented services [8,9,10]. However, even these are rather artificial and not real-world services.

The main contribution of this paper is to demonstrate that automated service composition can outperform manual or automated non-service-oriented software composition in real-world applications. The domain of the considered service composition problem is automated machine learning. More precisely, given some sample data, the task is to compose a machine learning pipeline (consisting of machine learning services) that maximizes the classification accuracy over new data from the same source. Up to now, this problem has only been tackled by framework-specific tools such as
Auto-WEKA [11,12] and auto-sklearn [13]. However, virtually all of the algorithms are available not only in the libraries of those frameworks but also as services offered by commercial platforms.

This paper augments our approach sketched in [14] by a detailed technical description and an empirical evaluation that shows the benefits of using services for the purpose of automated machine learning. Similar to existing composition approaches [4,15,16], our approach builds on top of hierarchical planning. The main difference to existing approaches is that the search process is guided by performance measures that are acquired from the execution of composition candidates. Since classical planners do not support such a guidance, we implemented a new planner, MLS-Plan, which is publicly available. We support our claim with an empirical evaluation in which we compare our approach against traditional tools and a non-service-oriented version of our own tool.

Automated service composition is often reduced to AI planning [1], and hierarchical automated service composition conducts such a reduction to hierarchical planning [17]. The core idea of hierarchical task network (HTN) planning is to iteratively break down an initially given complex task into new sub-tasks, which may again be complex or simple (in no need of further refinement). The complex tasks are recursively decomposed until only simple tasks remain. This is comparable, for example, to deriving a sentence from a context-free grammar, where complex tasks correspond to non-terminals and simple tasks to terminal symbols.

There have been several approaches to hierarchical automated service composition. All these approaches are based on the composite process model in OWL-S. Roughly speaking, a composite process is just an abstract process consisting of a control flow that contains invocations of other service operations.
The service composition problem is represented by a description of such a composite process, which is equivalent to the initial complex task of the HTN problem with only one possible refinement corresponding to the service process. Existing services are either also composite processes, which are translated to complex tasks, or atomic services, which are translated to simple tasks. The very initial works on this topic [18] were concerned with deriving any valid composition, which is usually a trivial undertaking. Paik et al. slightly extended this setup by considering additional logic constraints on plans [19]. Follow-up work by Sohrabi et al. considered the fact that clients may order plans based on preferences that can be expressed in terms of logic conditions achieved by a process [15,16,20].

Curiously, there has been almost no work on optimizing the composition quality in terms of Quality of Service (QoS). While QoS optimization has been studied a lot in automated service composition in general, there is almost no such work on hierarchical composition approaches. Indeed, the preferences over plans in the work of Sohrabi are also induced by an implicit quality. However, the only work that optimizes the QoS measures typically considered in automated service composition, such as runtime and cost, has been presented in [21].

In this paper, we consider the problem of finding a composition with the best (numeric) quality, with the limitation that this quality measure cannot be aggregated from the services contained in the composition.

[Footnote: For the reviewer: algorithmia.com is such a provider, but we do not want to advertise specific providers, so this will not be contained in the paper.]
[Footnote: https://github.com/fmohr/ML-Plan]
So our setup is similar to the one of [21] in that we assume a numeric quality to be optimized, but the difference is that there is no (known) way to statically assign qualities to existing services and to compute the quality of a composition from these qualities. Instead, the quality of a composition can only be assessed by a benchmark that executes the composition and observes the desired quality.

A highly relevant real-world example for such a problem is automated machine learning pipeline construction. In a nutshell, the classification problem in machine learning is to learn a (non-deterministic) relation between instances X and class labels Y. Instances are described in terms of numeric or categorical attributes called features, and a set of such instances is given, each of which is additionally associated with a class label. The goal is to establish a new function h : X → Y, called the hypothesis, such that h maximizes the ratio of correctly predicted class labels for new instances. The hypothesis may be a single classification algorithm such as a decision tree, a neural network, or a support vector machine, or it may be a whole pipeline consisting of (possibly complex) pre-processing algorithms followed by such a classifier. Finding such a pipeline automatically is a hot topic in machine learning, but all existing solutions focus on platform-dependent frameworks [11,13]. However, service implementations whose communication is based on HTTP exist for all of these algorithms, so it is also possible to solve the Auto-ML problem as a service composition problem.

Note that the true error rate cannot even be computed exactly but only estimated. The true error of a pipeline is the average error produced over all data points that exist, but only a finite sample of such points is available.
Hence, one estimates the out-of-sample error by keeping back a validation set of the initial data and using it for estimating that error.

We claim that solving the Auto-ML problem as a service composition can be significantly better than sticking to a single algorithm framework such as WEKA or scikit-learn. The main reason for this is that the portfolio of implementations from which one can select algorithms is much broader. WEKA implements the algorithms in Java, and scikit-learn implements them in Python. Without the encapsulation into services, it is not easily possible to use algorithms of both during optimization.

In this section, we describe the hierarchical planning formalism we use to create the constructor of the composed service. Like most other approaches, our composition algorithm does not directly compose an entire service but a process.
Fig. 1. Visualization of the hierarchical structure of a machine learning pipeline
We assume that the target service has already been defined in the form of a template with one missing process (the constructor). For example, we have already defined how the machine learning pipeline service will work, but not the atomic services on which it will rely. The solution of the composition problem derived with the approach in this section will be injected into that template in order to obtain a ready-to-use service; Fig. 1 sketches this hierarchical template structure. This section explains how this construction process is created, and the following section will explain how the executable service is obtained from it.
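To make the template idea concrete, the following sketch shows a pipeline service template whose constructor is the slot to be filled by the composition. This is our own illustrative Python; the class and method names are not taken from the actual implementation.

```python
class PipelineServiceTemplate:
    """Sketch of a service template: the train/predict operations are fixed,
    while the constructor is the 'missing process' that the composition
    derived by the planner fills in.  All names here are illustrative."""

    def __init__(self, preprocessor_service, classifier_service):
        # Constructor slot: the solution composition decides which concrete
        # basic services are instantiated and bound here.
        self.preprocessor = preprocessor_service
        self.classifier = classifier_service

    def train(self, features, labels):
        transformed = self.preprocessor.fit_transform(features)
        self.classifier.fit(transformed, labels)

    def predict(self, features):
        return self.classifier.predict(self.preprocessor.transform(features))
```

Once the constructor is filled in, the template behaves like any other service: its fixed operations delegate to whichever basic services the planner selected.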
As for any planning formalism, our basis is a logic language L and planning operators defined in terms of L. The language L has first-order logic capacities, i.e., it defines an infinite set of variable names, constant names, predicate names, function names, and the quantifiers and connectors to build formulas. A state is a set of ground literals; i.e., it does not contain unquantified variable symbols. We do not adopt the closed-world assumption.

Constants, functions, and predicates of L may stem from a theory. A theory T defines constants, functions, and predicates and how these are to be interpreted. Predicates not contained in T behave like normal predicates in classical planning. That is, L consists of the elements of T together with uninterpreted predicates and constants. In the formalism, we use T as a formula itself.

An operator is a tuple ⟨name_o, I_o, O_o, P_o, E+_o, E−_o⟩ where name_o is a name, I_o and O_o are parameter names describing inputs and outputs, P_o is a formula from L constituting its preconditions, and E+_o and E−_o are sets of conditional statements α → β where α is a formula over L conditioning the actual effect β, which is a set of literals from L to be added or removed. Free variables in P_o must be in I_o, and free variables in E+_o and E−_o must be in I_o ∪ O_o.

The semantics of the planning domain are as follows. An action is an operator whose input and output variables have been replaced by constants; we denote by P_a, E+_a, and E−_a the respectively replaced preconditions and effects. An action a is applicable in a state s under theory T iff s, T ⊨ P_a and if none of the output parameters of a is contained in s. Applying action a to state s changes the state in that, for all α → β ∈ E+_a, β is added to s if s, T ⊨ α; analogously, β is removed if such a rule is contained in E−_a.
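For illustration, the operator tuple and the effect semantics just defined can be transcribed into code roughly as follows. This is a sketch with our own naming; formulas and the entailment relation s, T ⊨ α are abstracted into a user-supplied holds function.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    """Transcription of the operator tuple <name_o, I_o, O_o, P_o, E+_o, E-_o>.
    Formulas are represented abstractly; names here are our own."""
    name: str
    inputs: tuple        # I_o
    outputs: tuple       # O_o
    precondition: object # P_o
    add_effects: tuple   # E+_o: conditional statements (alpha, beta)
    del_effects: tuple   # E-_o

def apply(add_effects, del_effects, state, holds):
    """Apply a ground action to a state: for each conditional effect
    (alpha, beta), add/remove the literals in beta if alpha holds in the
    state; 'holds(state, alpha)' plays the role of s, T |= alpha."""
    new_state = set(state)
    for alpha, beta in add_effects:
        if holds(state, alpha):
            new_state |= set(beta)
    for alpha, beta in del_effects:
        if holds(state, alpha):
            new_state -= set(beta)
    return new_state
```

Note that both condition checks are evaluated against the original state, matching the definition above, where all effects of one action refer to the same pre-application state.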
A plan for a state s_1 is a list of actions ⟨a_1, .., a_n⟩ where a_i is applicable in and applied to s_i; here, s_{i+1} is obtained by applying a_i to s_i.

On top of this formalism, we build a hierarchical model [22]. A task network is a partially ordered set T of tasks. A task t(v_1, .., v_n) is a name with a list of parameters, which are variables or constants from L. For example, setPreprocessor(pl) could be the task of choosing and setting the preprocessing algorithm for the pipeline object pl. A task named after an operator is called primitive, and complex otherwise. A task whose parameters are constants is ground. While primitive tasks are realized canonically by an operation, complex tasks need to be decomposed by methods. A method m = ⟨name_m, t_m, I_m, O_m, P_m, T_m⟩ consists of its name, the (non-primitive) task t_m it refines, the input and output parameters I_m and O_m, a logic formula P_m ∈ L that constitutes the method's precondition, and a task network T_m that realizes the decomposition. The preconditions may, just as in the case of operations, contain interpreted predicates and functional symbols from the theory T.

A method instantiation m is a method where inputs and outputs have been replaced by planning constants. m is applicable in a state s under theory T iff s, T ⊨ P_m and if none of the output parameters of m is contained in s.

An HTN planning problem is a tuple P = ⟨O, M, s_0, N⟩ where O is a set of operators as above, M is a set of methods, s_0 is the initial state, and N is a task network. An HTN optimization problem is an HTN planning problem together with an objective function. Formally, P* is an HTN optimization problem iff it is a tuple P* = ⟨O, M, s_0, N, φ⟩ where P := ⟨O, M, s_0, N⟩ is an HTN planning problem and φ is a real-valued function that assigns a score to any solution of P. A plan π* is a solution to P* if it is a solution to P and there is no other plan π̂ such that φ(π̂) > φ(π*).

We assume that the composition domain is described in terms of available services and macros that encode abstract processes. Services are described by a name and a set of offered operations. That is, the services are a set {s_1, .., s_n} where each service s_i is a tuple ⟨name_i, {op_i1, .., op_im_i}⟩, and each operation op_ij is described like the planning operators of the HTN problem by a name, inputs, outputs, preconditions, and effects. Macros are generic process templates describing reasonable process abstractions in the domain. Every macro consists of a name, conditions under which it can be applied, and its actual process. The elements of the process are calls to service operations or other macros, i.e., names of service operations or macros together with bindings for the data objects used in the inputs and outputs. In this paper, we only consider sequential macros, i.e., sequential processes, but note that if-else statements can be easily encoded by having a separate macro for each case.

Operations can be called either directly on a service or on a service instance. As in object-oriented programming, we assume a class-instance model; every service constitutes the class of all its instantiations. For example, a neural network service is the class for all its concrete instantiations (each of which will represent a different network). Services may have a constructor that creates a new instance and returns the resource for that instance to the invoker. The service operation calls in the macros are then either calls to static operations of a service (operations that do not depend on a particular state) or calls to the operations of a service instance.

Compositions are sequences of service operation invocations.
That is, a composition is a sequence of pairs (o_i, b_i) where o_i is a service (instance) operation, and b_i is a function that maps each input of o_i to outputs of earlier invocations or to inputs of the overall process. Like in other approaches, non-sequential compositions are not considered.

A service query consists of three parts. First, it contains a task network as described above. Second, it defines initial information about the objects that will be passed to the network. Tasks in that network correspond either to calls to service operations (primitive tasks) or to calls to a macro, which needs to be configured (complex tasks). Third, it defines an objective function that assigns a score to each possible solution candidate. This function is not given in a closed-form representation but as a reference to an invocable routine.

The considered service composition problem is then described by a triplet of services, macros, and a service query. Intuitively, a solution candidate to this problem is a composition obtained by recursive replacements of macros by operation calls or other macros such that the precondition of each operation is satisfied in the moment of execution. A composition is an optimal solution if it is a solution candidate and if no other solution candidate receives a higher score from the objective function.

The translation of a service composition problem to an HTN planning problem is analogous to the one in [18], except for two modifications. First, the fact that we distinguish services from service instances requires a small modification. There is a clear correspondence between macros and methods on the one hand, and service operations and HTN operators on the other hand, so the translation seems to be canonical. However, allowing to create new service instances during planning also means allowing new operations to be created during planning. It is not immediately clear how one should treat this situation.
Second, we also need to translate the objective function, which does not exist in previous approaches.

The first point can be handled with a simple trick in that we treat all operations as if they were static and add the service instance reference as an additional input. The different "versions" of an operation o of a service s for different instances of s are not really different in their functionality but just deviate in the service instance on which they are invoked. Hence, instead of adding new operations, we just assume that instance-specific operations have an additional and distinguished input that represents a handle for the service instance on which it should be invoked. Of course, the handle used for a service instance is exactly the reference returned from the constructor of the respective service.

[Footnote: The SHOP2 encoding of Sirin et al. allows for non-sequential composite processes during the composition, but the eventually returned composition is also sequential.]

Fig. 2. Excerpt of the search graph of the HTN planner

Given the correspondence between service operations and HTN operators, translating the objective function comes down to a simple wrapper. The only thing this wrapper has to do is to map planning action syntax to service operation invocation syntax. If this functionality is available, a solution of the HTN problem can be converted into a composition, which can then be executed by the objective function.

Note that we trade the assumption that no service is both information-gathering and world-altering [18] for the assumption that the execution of world-altering services does not affect the execution of other services or compositions. Sirin et al. make the first of these two assumptions because they want to execute some (the information-gathering) services during planning. In order to avoid side effects during planning, they forbid that such services alter the world. However, our setup precisely requires the execution of entire compositions, so this assumption is no longer needed and relieves us from the tedious distinction of knowledge effects and physical effects introduced in OWL-S. On the contrary, we need the assumption that the compositions do not alter the world in such a way that the execution of other compositions is affected.
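Given this correspondence, the objective-function wrapper can be as small as the following sketch. Here, benchmark and to_invocation are assumed interfaces of our own naming: the former executes a composition and observes its quality, the latter maps one planning action to a service operation invocation.

```python
def make_objective(benchmark, to_invocation):
    """Sketch of the objective-function wrapper: map planning-action syntax
    to service-operation-invocation syntax, then benchmark the resulting
    composition.  'benchmark' and 'to_invocation' are assumed interfaces."""
    def phi(plan):
        composition = [to_invocation(action) for action in plan]
        # executing the composition yields the observed quality of the plan
        return benchmark(composition)
    return phi
```

The returned function phi then plays the role of the objective function of the HTN optimization problem.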
Like all planning problems, HTN problems are solved using graph search algorithms. The (hierarchical) planning problem induces a (possibly infinite) search graph, which is represented by a distinguished root node, a successor generator function, and a goal-test function. The successor generator creates the successor nodes for any node of the graph, and the goal-test decides whether a node is a goal. Most HTN planners perform a forward decomposition, which means that they create one successor for each possible decomposition of the first unsolved task in the list of remaining tasks. In every child node, the list of remaining tasks is the previous list of tasks where the decomposed task is replaced by the list that represents the respective decomposition. The resulting search graph is sketched in Fig. 2, where every box shows a list of tasks (green ones are simple, the yellow one is the next complex task to be decomposed, and the red ones are complex tasks to be resolved later). A node is a goal node if all remaining tasks are simple. A standard graph search algorithm can then be used to find a path from the root to a goal node.

To overcome the limitation of standard solvers to additive cost measures, we developed HTN-SPlan, an HTN planner based on arbitrary node evaluation functions. HTN-SPlan realizes a best-first search where each node is labeled with elements of an ordered set (usually real values or vectors with tie breakers). HTN-SPlan makes no assumptions (like monotonicity) about the node evaluations or how they are acquired. Instead, it simply requires that the node evaluation function is provided by the user. It is then possible to conduct complex computations in order to obtain node evaluations, a property that is missing in classical planners.

Besides classical node evaluation functions, HTN-SPlan offers another default node evaluation function based on random path completion as also used in Monte Carlo Tree Search [23]. To obtain the evaluation of a node, this strategy draws a fixed number of path completions and evaluates each completed plan using a given plan evaluation function. The score assigned to the node is the best score observed over these evaluations, in order to estimate the best solution that can be obtained when following paths under the node.

Using this evaluation function, HTN-SPlan is an appropriate tool to solve the service composition problem. The plan evaluation function is the wrapper of the objective function.

It is important to be aware that, in contrast to classical heuristic approaches, MLS-Plan does not give any guarantees about the optimality of returned solutions. This is precisely because it makes no assumptions about the node evaluation function, so no such guarantee is actually possible. Strictly speaking, the algorithm does not even return solutions in the narrower sense at all but only solution candidates, because optimality is a solution criterion.

However, this is not a particular weakness of HTN-SPlan since, without further assumptions, it is not even possible to prove the optimality of a solution without enumerating all candidates. Unless all solutions have been observed, every algorithm is prone to miss the true optimum. While it is probably possible to make some assertions about optimality (usually in the form of bounds), we do not provide such proofs here. In fact, for the concrete evaluation function based on random completions, some guarantees appear provable, since the algorithm is similar to UCT, for which bounds have already been shown. However, proofs for such bounds are way beyond the scope of this work. We rather focus on an experimental analysis and will show that the solutions produced by MLS-Plan, even though usually sub-optimal, still significantly outperform solutions produced by other algorithms.
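The search scheme described in this section can be sketched as follows. This is our own simplified transcription: nodes, successor generation, and the plan benchmark are abstracted into callables, and smaller scores are assumed to be better (as with a 0-1 loss); flip the sign for maximization.

```python
import heapq
import random

def best_first_search(root, successors, is_goal, evaluate):
    """Best-first search over the decomposition graph.  'evaluate' may run
    arbitrarily complex computations (e.g. benchmarking composition
    candidates); no monotonicity of the evaluations is assumed."""
    counter = 0  # tie breaker so the heap never compares nodes directly
    frontier = [(evaluate(root), counter, root)]
    while frontier:
        score, _, node = heapq.heappop(frontier)
        if is_goal(node):
            return node, score
        for child in successors(node):
            counter += 1
            heapq.heappush(frontier, (evaluate(child), counter, child))
    return None, None

def random_completion_evaluation(node, successors, is_goal, evaluate_plan,
                                 samples=3, seed=0):
    """Node evaluation via random path completions: draw a fixed number of
    random completions below the node, benchmark each completed plan, and
    return the best (here: smallest) observed score."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(samples):
        current = node
        while not is_goal(current):
            children = list(successors(current))
            if not children:
                break  # dead end: no completion along this path
            current = rng.choice(children)
        if is_goal(current):
            best = min(best, evaluate_plan(current))
    return best
```

Plugging random_completion_evaluation in as the evaluate argument of best_first_search yields the default configuration described above, with evaluate_plan being the wrapper around the objective function.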
In our case study, we consider the domain of automated machine learning (Auto-ML). Auto-ML aims at automatically selecting and configuring the algorithms of a so-called machine learning pipeline. Usually, such a pipeline consists of one or more preprocessing algorithms (principal components, imputation, etc.) and a classification algorithm (decision trees, logistic regression, etc.). The state-of-the-art approaches auto-sklearn and Auto-WEKA reduce the combined algorithm selection and hyperparameter optimization problem to a mere hyperparameter optimization problem, considering the selection of an algorithm for feature preprocessing and a classifier model as additional parameters for a hyperparameter optimization tool. Moreover, these tools are committed to a certain library (e.g., scikit-learn or WEKA) as well as to a specific programming language (e.g., Python or Java). However, these libraries are neither equal nor does one subsume the other. Moreover, implementations of certain machine learning algorithms usually differ significantly, so that even for a particular algorithm there might be differences in terms of non-functional requirements, e.g., runtime or even predictive accuracy.

In the following, we describe MLS-Plan, the application of HTN-SPlan to the Auto-ML problem. We first describe how we created services out of the existing algorithms and how the execution of compositions works. We then present an experimental evaluation, which shows that the service-based approach combining WEKA and scikit-learn services is often better than using the same search technique with algorithms of just one library, and it is even mostly competitive with expert approaches such as Auto-WEKA and auto-sklearn.

The idea of what we call servicification is to make existing software accessible as a service. Our contribution is not about this process in general, so we only describe how we convert the learning algorithms relevant for the case study into services.

We enable servicification by so-called Generic Service Managers (GSMs), which are web servers that route HTTP requests to invocations of functions in an object-oriented programming language. Every algorithm in the considered machine learning frameworks is encoded in its own (Java or Python) class file, so we only consider these two languages here. A GSM processes requests of the form http://host/classname/method with which the client can trigger the invocation of a given method of a (generally arbitrary) class. The parameters are transmitted by GET or POST; in our case, we only have POST requests. The GSM is generic as it does not have to be tied to a specific library and may create objects via reflection in Java or via importlib and getattr in Python.
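The routing core of a Python GSM can be sketched as follows. This is heavily simplified and uses an illustrative path scheme with fully qualified class names; the real GSM additionally handles POST bodies, instance IDs, and persistence of created instances.

```python
import importlib

def dispatch(path, kwargs):
    """Sketch of the GSM routing core: resolve '/module.ClassName/method'
    to a reflective call via importlib and getattr."""
    classname, method = path.strip("/").split("/")
    module_name, _, class_name = classname.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    if method == "new":                    # constructor request: instantiate
        return cls(**kwargs)               # real GSM: respond with instance URL
    return getattr(cls, method)(**kwargs)  # static/class operation call
```

For example, a request to /builtins.int/new would create a fresh int instance, and any class enabled in the GSM can be serviced the same way without library-specific code.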
Fig. 3. Communication via HTTP
For simplicity, we use one GSM for Java classes and one for Python classes (Figure 3). In other words, the set of available services is the set of classes enabled in the GSM, and the service operations correspond to the (enabled) methods of those classes.

In our framework, services are generally stateful. We are aware that there are paradigms suggesting that not only the communication between services but also the services themselves should be stateless. However, we argue that many services, including those related to machine learning, are more reasonably realized in a stateful manner. For example, machine learning services should save the model they learned locally instead of exchanging it with the client in order to keep themselves stateless. Not only are such models sometimes very large, but the model can also be seen as a part of the implementation of the service and, hence, the service provider might not want to share the model with the client.

Stateful services are realized by a distinction between services and their instances. This is analogous to classes and objects in programming languages: services correspond to classes, and service instances to objects of those classes. A service is then instantiated by simply creating an object of the respective class. The GSM interprets requests of the form http://host/classname/new as commands to create new service instances. The response to such a request is the URL by which the new service instance is reachable, which, by convention, is the same URL as for the service itself plus some ID. Operations of a service instance can then be accessed via http://host/classname/id/method.

To make the service instances persistent, the GSM serializes the respective objects to its disk. When the Java or Python objects are created on an invocation of http://host/classname/new, their lifetime is limited by the lifetime of the web server worker thread.
Hence, the GSM serializes the created objects on the server disk in order to give them an unlimited lifetime; this storage is called the Service State Storage (S3). Of course, this requires that the class to be servicified actually is serializable. However, all of the considered algorithm classes are serializable, so this is no limitation in the given domain.
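The serialization step can be sketched in Python as follows. The class and method names are ours, and pickle stands in for whatever serialization mechanism the respective language offers.

```python
import os
import pickle
import uuid

class ServiceStateStorage:
    """Sketch of the Service State Storage (S3): service instances are
    serialized to disk so they outlive the web server worker thread."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def store(self, instance):
        instance_id = uuid.uuid4().hex  # by convention, part of the instance URL
        with open(os.path.join(self.directory, instance_id), "wb") as f:
            pickle.dump(instance, f)    # requires a serializable class
        return instance_id

    def load(self, instance_id):
        with open(os.path.join(self.directory, instance_id), "rb") as f:
            return pickle.load(f)
```

On each later request to http://host/classname/id/method, the GSM would load the instance by its ID, invoke the method, and store the (possibly updated) instance again.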
The solutions to the composition problem defined in Section 3 are processes, but not services. For example, we can compose a process that describes how a machine learning pipeline is configured, but the process does not correspond to such a pipeline itself.

To obtain composed services from solutions of the composition problem, we use service templates. The template contains the code for the different operations of the target service except its constructor, which will be used to configure the (basic) services on which the (composed) service will rely. For example, the template may contain an operation to train the machine learning pipeline and one to predict the labels of new instances based on preprocessing and classification services. The solution composition is injected into the constructor of that template in order to obtain an executable program. Using the GSM, the composed service can be accessed as a service in turn.

In the following, we refer to compositions as the processes of any operation of the composed service. For example, in the machine learning case, there is one composition for the constructor, one for training the pipeline, and one for predicting new labels. Syntactically, the processes of the training and prediction compositions are fixed and do not depend on the solution of the composition problem. However, since these operations will rely on external basic services configured in the constructor, their behavior does depend on the concrete composition injected into the constructor.

An efficient execution of compositions requires that the participating services communicate via a choreography protocol. In our case study, composed processes include, for example, the application of a preprocessor followed by the training of the actually used classification algorithm.
The data passed to these services can easily reach several hundred MB or even some GB, which makes a zig-zag communication with the client unacceptably slow. To this end, every service operation receives, in addition to its usual arguments, the entire composition. The service then only sends its result back to the client if it is the last operation in the composition; otherwise, it directly sends its data to the next service operation in the choreography. Our current implementation only supports sequential compositions where the input of one service is provided by the preceding service without hops in the data flow, which is sufficient for the case study and keeps the implementation simple.

The execution logic for compositions is also contained in the GSMs. That is, GSMs can not only process single service invocations but also entire compositions. To this end, they receive a single service operation call together with a composition. They identify the position of the invoked service within the composition, execute it, and send the result either to the client or to the subsequent service if one exists.
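In simplified form, the forwarding logic reads as follows. Here, services maps operation names to local callables as a stand-in; in the real system, each forward is an HTTP call to the GSM hosting the next service.

```python
def execute_step(services, composition, position, payload):
    """Sketch of the choreography logic: execute the operation at 'position'
    and either return the result to the client (last step) or forward it
    directly to the next service, avoiding the client zig-zag."""
    result = services[composition[position]](payload)
    if position + 1 == len(composition):
        return result  # last operation: answer the client
    return execute_step(services, composition, position + 1, result)
```

For a composition such as ["preprocess", "train"], the preprocessor's output thus travels directly to the training service, and only the final result returns to the client.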
To assess the benefit of using services, we evaluate MLS-Plan incorporating both scikit-learn (Python) and WEKA (Java) in the form of HTTP services, comparing it to itself limited to using only scikit-learn or only WEKA, respectively. Additionally, we compare MLS-Plan to other Auto-ML tools, namely Auto-WEKA (WEKA, Java), auto-sklearn, and TPOT (scikit-learn, Python) [24], to put our results into the context of state-of-the-art solutions.
Table 1. Means and standard deviations of the 0-1 loss. Each entry represents the mean and standard deviation over 20 runs with different random seeds. Columns: Dataset, MLS-Plan, MLS-Plan (W), MLS-Plan (S), Auto-WEKA, TPOT, auto-sklearn. Datasets: abalone, amazon, car, convex, credit-g, dexter, dorothea, gisette, glass, ionosphere, iris, letter, madelon, page-blocks, secom, segment, semeion, vowel, waveform, winequality, yeast. A • marks a statistically significant improvement of MLS-Plan over the respective approach, a ◦ a significant degradation; dashes indicate runs without a result. [Numerical table values not recoverable from the source.]
Results were obtained by carrying out 20 runs on 21 datasets with a timeout of 1h. All of the used datasets can be found in the OpenML dataset repository. The significance of an improvement or degradation was determined using the t-test with a threshold for the t-score of 2.086. The timeout for the internal evaluation of a single solution was set to 5 minutes (if allowed by the respective tool). Runs that did not adhere to the given limitations (plus a tolerance threshold) were canceled without considering their results. That is, the algorithms were canceled if they did not terminate within 110% of the predefined timeout. Likewise, the algorithms were killed if they consumed more resources (memory or CPU) than allowed, which happens as overall CPU and memory consumption is hard to control.

In each run, we used 70% of a randomized, stratified split of the data for learning (search) and 30% for testing. We used the same splits for all candidates, i.e., for each split and each timeout, we ran each candidate.

The experiments were conducted on (up to) 200 Linux machines in parallel, each with a resource limitation of 8 cores (Intel Xeon E5-2670, 2.6 GHz) and 16 GB of memory. Thus, the execution of the experimental evaluation took 20,160 CPU hours (840 CPU days) in total. To ensure a fair comparison, especially with respect to hardware resources, all the components required by MLS-Plan, in particular the HTTP servers providing access to the respective libraries, run on the same node.
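The significance criterion with the t-score threshold of 2.086 can be sketched as follows. The paper does not state which t-test variant was used; the pooled two-sample version below is one plausible reading, and the function names are our own:

```python
import math

def t_score(a, b):
    """Two-sample t-score (pooled variance) between two result lists,
    e.g. the 0-1 losses of two approaches over 20 seeded runs."""
    n, m = len(a), len(b)
    mean_a, mean_b = sum(a) / n, sum(b) / m
    # Sample variances with Bessel's correction.
    var_a = sum((x - mean_a) ** 2 for x in a) / (n - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (m - 1)
    # Pool the variances across both samples.
    pooled = ((n - 1) * var_a + (m - 1) * var_b) / (n + m - 2)
    return (mean_a - mean_b) / math.sqrt(pooled * (1 / n + 1 / m))

def significantly_different(a, b, threshold=2.086):
    # A difference counts as significant if |t| exceeds the threshold
    # used in the evaluation.
    return abs(t_score(a, b)) > threshold
```

Applied per dataset to the loss values of MLS-Plan and a competitor, this yields the • and ◦ markers reported in Table 1.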
Table 1 summarizes the error rates of the different approaches per dataset. Bold entries indicate that the respective approach achieved the best performance on average within a dataset among the variants of MLS-Plan. To compare MLS-Plan in its service variant against the other approaches, we indicate statistically significant improvements of MLS-Plan over another approach by • and degradations by ◦.

The results show that the performance of the approaches varies strongly across the datasets. In fact, there is neither a single approach that is best on most datasets, nor is there one approach that is not best on at least some datasets. TPOT seems to dominate on small datasets but often fails to produce any result on larger ones.

The focus of our evaluation is the comparison of MLS-Plan with services from both frameworks, WEKA and scikit-learn, on the one hand, and the same composition strategy using algorithms from only one of these libraries on the other hand. This way, we learn something about the benefit one gains from the service-oriented implementation, which enables combining algorithms from both libraries. The other three approaches are meant to give reference values of recognized tools in the respective area, but their performance is not relevant for the question of whether services are advantageous, as their entirely different search behavior is a significant confounding factor. To isolate the service vs. non-service question from the search strategy, we compare MLS-Plan using both frameworks against MLS-Plan using only WEKA or only scikit-learn, respectively.

Note that, a priori, it is completely unclear which of the combinations would be better. First, applying MLS-Plan with both WEKA services and scikit-learn services does not mean considering the joint solution space, as we consider algorithms occurring in both libraries only once. That is, many algorithms such as Nearest Neighbors, Random Forests, Naive Bayes, etc. are implemented in both libraries, but in MLS-Plan we chose to consider only one of these implementations respectively. Due to space limitations, we refer to the documentation of our implementation for details about the chosen algorithms. Consequently, the search space of MLS-Plan with algorithms of both libraries is not a superset of the search space of MLS-Plan using only WEKA or only scikit-learn algorithms. Second, the search space is still much larger compared to using only one of the libraries, which makes it more likely that good solutions need more time to be found. While the more powerful search space suggests better solutions due to the coverage of many more pipelines, only a much smaller part of the search space can be examined in the same time bound.

In fact, the above results show that combining the libraries is often significantly better than one or even both of the single-library versions, but sometimes also worse. However, the overall impression is that the service-based variant yields significant improvements, not only over the other MLS-Plan variants but even globally.

We conclude from these results that automated service composition is a relevant approach for solving the Auto-ML problem. It does not dominate other approaches, but it is the best option in quite a few cases and should, hence, be in the portfolio of solution techniques. We have thus demonstrated the utility of automated service composition on the real-world problem of Auto-ML.

In this paper, we have presented MLS-Plan, an approach to automated service composition in the area of machine learning, and shown that it can significantly improve performance in comparison to non-service-based approaches. MLS-Plan is based on a reduction of the service composition problem to hierarchical planning with a black-box objective function. The main benefit of using services exploited in this approach is that the service architecture allows combining algorithms of different frameworks (WEKA and scikit-learn) instead of using only algorithms of one of them, which is the natural limitation one has without the abstraction to the service layer. The experimental evaluation shows that MLS-Plan does bring significant improvements in many cases but also loses against other approaches in some cases, which we trace back to the increased search space size coming with the increased flexibility. In essence, we interpret our results as evidence for the utility of automated service composition for the real-world problem of Auto-ML. However, we also see that the enlarged search space can be a problem, which suggests increasing timeouts or improving the search itself.
References
1. F. Mohr, Automated Software and Service Composition - A Survey and Evaluating Review, ser. Springer Briefs in Computer Science. Springer, 2016. [Online]. Available: https://doi.org/10.1007/978-3-319-34168-2
2. D. Berardi, D. Calvanese, G. De Giacomo, M. Lenzerini, and M. Mecella, "Automatic composition of e-services that export their behavior," in Proceedings of the Int. Conf. on Service-Oriented Computing. Springer, 2003, pp. 43-58.
3. L. Zeng, B. Benatallah, M. Dumas, J. Kalagnanam, and Q. Z. Sheng, "Quality driven web services composition," in Proceedings of the 12th International Conference on World Wide Web. ACM, 2003, pp. 411-421.
4. D. Wu, B. Parsia, E. Sirin, J. A. Hendler, and D. S. Nau, "Automating DAML-S web services composition using SHOP2," in The Semantic Web - ISWC 2003, Proceedings, 2003, pp. 195-210. [Online]. Available: https://doi.org/10.1007/978-3-540-39718-2_13
5. M. Klusch, A. Gerber, and M. Schmidt, "Semantic web service composition planning with OWLS-XPlan," in Proceedings of the 1st Int. AAAI Fall Symposium on Agents and the Semantic Web, 2005, pp. 55-62.
6. J. Hoffmann, P. Bertoli, M. Helmert, and M. Pistore, "Message-based web service composition, integrity constraints, and planning under uncertainty: A new connection," Journal of Artificial Intelligence Research, pp. 49-117, 2009.
7. F. Mohr, A. Jungmann, and H. Kleine Büning, "Automated online service composition," in Proceedings of the International Conference on Services Computing. IEEE, 2015, pp. 57-64.
8. S. McIlraith and T. C. Son, "Adapting Golog for composition of semantic web services," KR, vol. 2, pp. 482-493, 2002.
9. R. Waldinger, "Web agents cooperating deductively," in Formal Approaches to Agent-Based Systems. Springer, 2001, pp. 250-262.
10. S. Narayanan and S. A. McIlraith, "Simulation, verification and automated composition of web services," in Proceedings of the 11th International Conference on World Wide Web. ACM, 2002, pp. 77-88.
11. C. Thornton, F. Hutter, H. H. Hoos, and K. Leyton-Brown, "Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms," in The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, 2013, pp. 847-855. [Online]. Available: http://doi.acm.org/10.1145/2487575.2487629
12. L. Kotthoff, C. Thornton, H. H. Hoos, F. Hutter, and K. Leyton-Brown, "Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA," Journal of Machine Learning Research, vol. 18, pp. 25:1-25:5, 2017. [Online]. Available: http://jmlr.org/papers/v18/papers/v18/16-261.html
13. M. Feurer, A. Klein, K. Eggensperger, J. T. Springenberg, M. Blum, and F. Hutter, "Efficient and robust automated machine learning," in Advances in Neural Information Processing Systems 28, Montreal, Quebec, Canada, 2015, pp. 2962-2970. [Online]. Available: http://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning
14. F. Mohr, M. Wever, and E. Hüllermeier, "Toward," in Proceedings of the IEEE International Conference on Services Computing, SCC, 2018. [Online]. Available: https://doi.org/10.1109/ICSC.2008.12
15. S. Sohrabi, N. Prokoshyna, and S. A. McIlraith, "Web service composition via generic procedures and customizing user preferences," in The Semantic Web - ISWC 2006. Springer, 2006, pp. 597-611.
16. ——, "Web service composition via the customization of Golog programs with user preferences," in Conceptual Modeling: Foundations and Applications. Springer, 2009, pp. 319-334.
17. M. Ghallab, D. S. Nau, and P. Traverso, Automated Planning - Theory and Practice. Elsevier, 2004.
18. E. Sirin, B. Parsia, D. Wu, J. Hendler, and D. Nau, "HTN planning for web service composition using SHOP2," Web Semantics: Science, Services and Agents on the World Wide Web, vol. 1, no. 4, pp. 377-396, 2004.
19. I. Paik and D. Maruyama, "Automatic web services composition using combining HTN and CSP," in 7th IEEE International Conference on Computer and Information Technology (CIT 2007). IEEE, 2007, pp. 206-211.
20. S. Liaskos, S. A. McIlraith, S. Sohrabi, and J. Mylopoulos, "Representing and reasoning about preferences in requirements engineering," Requir. Eng., vol. 16, no. 3, pp. 227-249, 2011. [Online]. Available: https://doi.org/10.1007/s00766-011-0129-9
21. K. Chen, J. Xu, and S. Reiff-Marganiec, "Markov-HTN planning approach to enhance flexibility of automatic web service composition," in Proceedings of the International Conference on Web Services. IEEE, 2009, pp. 9-16.
22. A. J. Coles, A. Coles, S. Edelkamp, D. Magazzeni, and S. Sanner, Eds., ICAPS 2016, London, UK.
23. IEEE Trans. Comput. Intellig. and AI in Games, vol. 4, no. 1, pp. 1-43, 2012. [Online]. Available: https://doi.org/10.1109/TCIAIG.2012.2186810
24. R. S. Olson, N. Bartley, R. J. Urbanowicz, and J. H. Moore, "Evaluation of a tree-based pipeline optimization tool for automating data science," in