Machine learning for improving performance in an evolutionary algorithm for minimum path with uncertain costs given by massively simulated scenarios

Abstract

In this work we introduce an implementation in which machine learning techniques helped improve the overall performance of an evolutionary algorithm for an optimization problem, namely a variation of the robust minimum-cost path problem in graphs. In this big data optimization problem, a path achieving a good cost in most scenarios from an available set of scenarios (generated by a simulation process) must be obtained. The most expensive task of our evolutionary algorithm, in terms of computational resources, is the evaluation of candidate paths: the fitness function must calculate the cost of the candidate path in every generated scenario. Given the large number of scenarios, this task must be implemented in a distributed environment. We implemented gradient boosting decision trees to classify candidate paths in order to identify good candidates. The cost of the not-so-good candidates is simply forecast. We studied the training process, performance gain, accuracy, and other variables. Our computational experiments show that the computational performance was significantly improved at the expense of a limited loss of accuracy.


Ricardo Di Pasquale, Javier Marenco
Facultad de Ingeniería y Ciencias Agrarias, Pontificia Universidad Católica Argentina, Argentina. Instituto de Ciencias, Universidad Nacional de General Sarmiento.


In this work we report a case in which machine learning (ML) can successfully boost the performance (in terms of running time) of a metaheuristic search in a big data environment. We are particularly focused on instances of big data optimization (BDO) problems, namely optimization problems where massive or complex data arises as an insurmountable problem for traditional approaches.

In this work we deal with an evolutionary algorithm searching for minimum-cost paths in a large number of graphs, as many as generated scenarios, simultaneously. The use of ML has resulted in a reduction of computing resources, as well as an improvement in the algorithm performance in terms of overall running time. The remainder of the paper is organized as follows. The studied problem will be stated in Section 2. In Section 3 we introduce the scenario-generating procedure. Section 4 describes our evolutionary algorithm, whereas the evolutionary workflow implementation details are described in Section 5. In Section 6 we analyze considerations coming from problem sizing. Results are presented in Section 7. Finally, in Section 8, machine learning algorithms are presented in order to search for performance improvements. Conclusions are described in Section 9.

We consider in this work the sailing regatta route optimization problem introduced in [1]. We are given an acyclic directed graph $G = (V, E)$ representing possible paths between the start and goal points in the regatta. The vertex set $V$ is defined by a two-dimensional discretization of the regatta's geographic map (called the court in this context). The edge set $E$ represents possible navigation maneuvers between neighboring vertices (including keeping course with no maneuvers). The costs associated with the edges model the expected navigation times including maneuvering costs.

The graph $G$ is based on the court's map. The court is divided into fixed square cells (in our case, we use 50 m cells). Fig. 1 shows the basic graph, where navigable cells are colored in blue, and non-navigable cells are colored in pink. Fig. 2 shows possible state transitions (i.e., edges) within one cell.

Figure 1: Basic graph model for a simple court.

Figure 2: Possible state transitions within one cell.

Sailboat maneuvers include tack, gybe, bow-up and bow-down. Not all maneuvers are possible all the time, as available maneuvers depend on the wind angle. In [1], this simple model is extended in order to take into account possible navigation maneuvers. This is accomplished by adding a new dimension to the basic graph model of Fig. 1, which represents the sailing mode, so only edges with available maneuvers are included in the obtained graph. Fig. 3 shows the three-dimensional graph for a simple example, representing the geographic map as well as all possible maneuvers. It is important to notice that the effects of surface ocean currents on the sailboats are considered negligible in this approach, given that they may affect all regatta sailboats in the same way within short periods of time.

Figure 3: Complete court graph representation.

Every edge $e \in E$ has an associated cost $C_e = NC_e + MC_e$, where $NC_e$ is the navigation cost and $MC_e$ is the maneuver cost for the edge $e$. The details of the cost calculations are out of scope for this paper, and we refer to [1] for details.

Within this context and a static wind scenario $W$ known a priori, the authors in [1] provide two approaches: (1) an exact algorithm to find minimum-cost routes (i.e., an optimal solution $s_w$ for $W$) and (2) a real-time heuristic using sailboat navigation tools as input.

The accuracy of approach (1) in [1] depends on (a) the regatta duration, (b) the court area, and (c) weather stability. Light fluctuations of these factors during the regatta can render an optimal solution $s_w$ obsolete. Furthermore, in a time-dependent approach, the bigger the fluctuations of these factors, the worse $s_w$ can finally be. Even if the real-time heuristic (2) is used to complement the optimal solution $s_w$, it is impossible to discard a decision after it has been taken, i.e., no rollback for maneuvers exists in this context.

In consequence, we propose in this work a scenario-based approach based on the observation that winds at ground, river, and sea level have a higher variation rate than winds in higher atmosphere layers. We place the court in a specific location in order to provide accurate models, and we have chosen the "Rio de la Plata" estuary near Buenos Aires, Argentina (especially because we can easily access wind observations within this area). Fig. 4 shows our proposed location for a 6.25 km² court ( × cells).

Figure 4: Court location in Google Maps © capture.
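As an illustration of the cost model $C_e = NC_e + MC_e$ and of a minimum-cost route for a single static wind scenario, the following Python sketch runs a dynamic program over a topological order of a small acyclic graph. It is not the exact algorithm of [1]: the function name `min_cost_path`, the edge tuple layout `(u, v, navigation_cost, maneuver_cost)`, and the vertex names are assumptions made for this example only.

```python
from collections import defaultdict, deque


def min_cost_path(edges, start, goal):
    """Minimum-cost path in a DAG whose edge cost is nav_cost + man_cost.

    edges: iterable of (u, v, nav_cost, man_cost) tuples (hypothetical layout).
    Returns (total_cost, path); (inf, []) if goal is unreachable.
    """
    adj = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v, nav, man in edges:
        adj[u].append((v, nav + man))  # C_e = NC_e + MC_e
        indeg[v] += 1
        nodes.update((u, v))

    # Kahn's algorithm: topological order of the acyclic graph.
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    # Relax edges in topological order, starting from the start vertex.
    best = {start: 0.0}
    prev = {}
    for u in order:
        if u not in best:
            continue
        for v, cost in adj[u]:
            if best[u] + cost < best.get(v, float("inf")):
                best[v] = best[u] + cost
                prev[v] = u

    if goal not in best:
        return float("inf"), []
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return best[goal], path[::-1]
```

In the setting of the paper the vertices would carry the cell coordinates and the sailing mode, and the costs would come from the wind state; here they are plain labels and numbers.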

Knowing historical weather conditions, it is possible to implement a model to generate initial wind conditions based on an intermediate weather stability range. Fig. 5 shows a representation of the initial state for a 40000 m² court ( × cells). Arrow length is proportional to the wind speed defined for the cell (we assume the same speed and angle within each cell), and arrow direction represents wind direction.

Figure 5: Initial state for a × court.

In the considered location near Buenos Aires, we have that (a) the wind direction is generally E-W, (b) the wind speed tends to be lower at W than at E, and (c) the wind direction tends to be more perpendicular to the coast at W than at E. We implemented a procedure generating valid initial states according to these premises and certain random variations. The parameters of the procedure are the mean and standard deviation of the incidence angle of winds, and the mean and standard deviation of wind speed. We use Gaussian random number generators for this task.

Once an initial state $w$ is defined by the procedure described in Section 3.1, we perform a simulation process starting from $w$. We define a scenario to be a time-indexed sequence of wind-state sets for each cell in the court. We take $t_i = 10i$ seconds, for $i = 0, \dots, n$, to be the time steps for the simulation, hence 360 states are needed in order to simulate an hour of wind states.

The simulation proceeds as follows. We introduce changes into some state in order to produce a new state, by slightly modifying randomly-selected cell states by adding Gaussian random numbers (using $\sigma$ as a parameter of the process). This is performed for both wind speed and angle. Once some cells are altered, we implement a spreading change procedure to alter neighboring cells until all cells are processed. After these slight changes are performed, we add gusts of wind [2], a crucial element in regattas. Gusts do not fall with uniform probability over the court, so we assign a gust-fall probability to each cell as a new parameter of the court. Gusts of wind have their own parameters: (a) mean time between two consecutive gusts, (b) mean length, (c) standard deviation of the gust length, (d) mean wind angle variation, (e) standard deviation of the angle variation, (f) mean wind speed variation, and (g) standard deviation of wind speed variation. If the random variable determines that a gust should appear in the current time slot (parameter (a)), then the cell in which it occurs is randomly selected (according to the gust-fall probability of each cell). Gaussian random numbers are generated in order to set the particular gust length (parameters (b) and (c)) and the speed and angle variation (remaining parameters). Wind variations are spread using the same procedure logic.

Once a set of wind scenarios is produced, we apply a metaheuristic search to find reasonably good solutions. The main objective of this metaheuristic search is to find good paths among all scenarios, so we are looking neither for an optimal route in one state nor for an optimal route in one scenario (time-dependent solution), but for robust routes that are good in most scenarios.
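The scenario generation described above can be sketched, in a much simplified form, as follows. The sketch draws an initial state from Gaussian generators and then produces one state per 10-second step by adding Gaussian noise to every cell. It deliberately omits the selection of a subset of cells, the spreading procedure, and the gusts; all function names and numeric parameter values are assumptions chosen for illustration, not values from our experiments.

```python
import random


def initial_state(rows, cols, speed_mu, speed_sigma, angle_mu, angle_sigma, rng):
    """Draw one (speed, angle) pair per cell from Gaussian generators."""
    return [
        [
            (max(0.0, rng.gauss(speed_mu, speed_sigma)),
             rng.gauss(angle_mu, angle_sigma) % 360.0)
            for _ in range(cols)
        ]
        for _ in range(rows)
    ]


def next_state(state, sigma, rng):
    """Perturb speed and angle of every cell with Gaussian noise.

    Simplification: the procedure in the text alters some cells and then
    spreads the change to neighbors; here all cells are perturbed directly.
    """
    return [
        [
            (max(0.0, speed + rng.gauss(0.0, sigma)),
             (angle + rng.gauss(0.0, sigma)) % 360.0)
            for speed, angle in row
        ]
        for row in state
    ]


def simulate_scenario(w0, n_states, sigma, rng):
    """Return a time-indexed list of states; state i corresponds to t_i = 10 * i s."""
    states = [w0]
    while len(states) < n_states:
        states.append(next_state(states[-1], sigma, rng))
    return states


rng = random.Random(42)  # fixed seed so that the sketch is reproducible
w0 = initial_state(4, 4, speed_mu=6.0, speed_sigma=1.0,
                   angle_mu=90.0, angle_sigma=10.0, rng=rng)
scenario = simulate_scenario(w0, n_states=360, sigma=0.2, rng=rng)  # one hour
```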

We discarded a classical genetic algorithm representation in terms of bit strings since (1) no structure could be found by a genetic workflow in few generations (we tried fewer than 1000 generations) and (2) no simple crossover operator seems to be possible. For this reason, we chose to model individuals (chromosomes) as lists of ordered integer pairs, where each pair represents a navigation vertex (in terms of Fig. 1). For convenience, this representation excludes the initial and goal vertices. The vertices in these lists do not include the maneuvering component, in order to let the evolutionary algorithm detect important points where maneuvers should be made.

Each individual does not represent a single route, but a set of routes instead. Within evolutionary algorithms this approach is known as the Pittsburgh approach [3]. An individual is modeled with a unique chromosome containing only one non-fixed length gene. Maximum and minimum gene length are parameters of the algorithm that can affect performance. Each locus inside a gene is occupied by a vertex (represented as an ordered pair).

A random initialization schema is not appropriate in our case, since feasible individuals are not easily generated and random paths are usually worse than manually-generated routes. Nevertheless, it is important to introduce some level of randomness in the initialization process, so our population initialization procedure includes $s_w$ as an individual and constructs individuals based on paths containing some random vertex.

Our framework defines an interface "Morphogenesis Agent" whose mission is to transform genotypes into phenotypes. In our case, this amounts to converting lists of ordered pairs into a family of valid paths. This process must generate and evaluate the family of possible valid paths from the individual. For convenience, in this implementation it is useful to evaluate paths and determine individual fitness in the same process in order to take advantage of distributed computing resources.

The fitness function is simple: each path in the family developed by the morphogenesis agent has an associated cost (evaluated during the development process). Let $c_{ij}$ denote the cost of path number $j$ within individual $i$. In order to provide a positive fitness value, the fitness function applied to an individual $k$ is defined as $FF(k) = M - \sum_j \min(c_{kj})$, with $M \in \mathbb{N}$ an arbitrarily big number.

Classical one-point or two-point crossover operators were not able to find good paths in our experiments, so we adapted the ideas suggested in [4] to our logic. Our crossover mechanism contemplates two possibilities: (1) the parent individuals have a common vertex or (2) they do not. If they have a common vertex, then a path recombination is made with a randomly selected pivot, chosen among the common vertices. Two auxiliary procedures are performed after such a recombination, namely cycle elimination and chromosome reparation. In case (2), a variation of one-point crossover was implemented. Experiments showed that the latter can introduce some randomness when needed. In our experience, regular to good paths have vertices in common, so most crossovers apply case (1). Fig. 6 shows an example of our crossover operator combining two parents (I1 and I2) and generating descendants (Desc1 and Desc2).

Figure 6: Crossover mechanism.
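A minimal sketch of the two crossover cases is given below, assuming chromosomes are Python lists of `(x, y)` tuples. The function names and the particular cycle elimination rule (cutting out the loop between two visits of the same vertex) are assumptions for illustration; the chromosome reparation step is not specified here and is therefore left out.

```python
import random


def eliminate_cycles(path):
    """Remove loops: when a vertex reappears, drop everything since its first visit."""
    out, pos = [], {}
    for v in path:
        if v in pos:
            # Forget the vertices visited inside the loop, then cut the loop.
            for dropped in out[pos[v] + 1:]:
                del pos[dropped]
            del out[pos[v] + 1:]
        else:
            pos[v] = len(out)
            out.append(v)
    return out


def crossover(p1, p2, rng):
    """Return two descendants of parents p1 and p2 (lists of vertices)."""
    in_p2 = set(p2)
    common = [v for v in p1 if v in in_p2]
    if common:
        # Case (1): recombine around a randomly selected common pivot.
        pivot = rng.choice(common)
        i, j = p1.index(pivot), p2.index(pivot)
        c1, c2 = p1[:i] + p2[j:], p2[:j] + p1[i:]
    elif min(len(p1), len(p2)) >= 2:
        # Case (2): a plain one-point crossover as a stand-in for the variation.
        cut = rng.randint(1, min(len(p1), len(p2)) - 1)
        c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
    else:
        c1, c2 = list(p1), list(p2)
    return eliminate_cycles(c1), eliminate_cycles(c2)
```

With parents that share a single vertex the result is deterministic, which makes the behaviour of case (1) easy to check by hand.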

Our mutation operator randomly chooses a vertex $v$ of the chromosome, and swaps $v$ with a randomly-chosen vertex $w$. If $v$ is located within the first 20% of the path, then $w$ is selected close to the starting vertex of the path. If $v$ is located within the last 80% of the path, then $w$ is chosen closer to the goal vertex of the path.

In order to develop an individual and to compute fitness, the evolutionary workflow needs to evaluate every path generated for every individual of the population for every wind scenario. Due to the number of scenarios, this must be processed in a distributed fashion. The rest of the workflow (namely, population initialization, probabilistic roulette selection, crossover execution, mutation execution, and descendant acceptance) is not expensive in terms of computational resources.

Our framework is built on top of the Apache Spark framework, is implemented in the Scala language, and is prepared to run on (private or public) Kubernetes-based clouds. In this particular workflow, we distribute an RDD [5] list with wind scenarios all over the cluster. The process computes a map operation in which the whole population is processed in each executor (each one holds a part of the wind scenario set).
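The map operation described above can be emulated on a single machine to show the data flow. The following Python sketch is not our Scala/Spark implementation: it splits the scenarios into partitions, lets each partition process the whole population, and merges the partial costs. The cost model in `path_cost` (a scenario as a dictionary from vertex to traversal cost) and the treatment of an individual as a single path are hypothetical simplifications.

```python
from functools import reduce


def path_cost(path, scenario):
    """Hypothetical cost model: a scenario maps each vertex to a traversal cost."""
    return sum(scenario.get(vertex, 1.0) for vertex in path)


def evaluate_partition(population, scenarios):
    """Work done by one executor: the whole population on its share of scenarios."""
    return [sum(path_cost(ind, s) for s in scenarios) for ind in population]


def merge(partial_a, partial_b):
    """Combine the partial cost vectors of two partitions."""
    return [a + b for a, b in zip(partial_a, partial_b)]


def evaluate(population, scenarios, n_partitions):
    """Map the population over scenario partitions, then reduce the partial costs."""
    partitions = [scenarios[i::n_partitions] for i in range(n_partitions)]
    partials = map(lambda part: evaluate_partition(population, part), partitions)
    return reduce(merge, partials)
```

The design point is that the (small) population is sent to every partition while the (large) scenario set is the distributed collection, so only one cost vector per partition travels back to the driver.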

We considered instances with 6.25 km² courts. Our cluster has 96 vCPUs and 224 GB of total RAM. We planned to assign one instance per vCPU, so we have $w_{spark} = 96$ instances. We consider individuals in the population, wind scenarios, and run the evolutionary algorithm for iterations. In this case, the cloud will perform . × path evaluations. With no boosting of any type, a generation takes around 1 hour to be processed in the cluster, keeping CPU usage at 100% for most of the time.

Our evaluation process involves taking a holdout set of $H = 10$ scenarios not used in the evolutionary algorithm. After the algorithm runs, we take the top 10 solutions found by the evolutionary algorithm ($S$), and compare $s_w$ with $S$ in the $H$ scenarios.

Exact optimal solutions like $s_w$ proved to be good solutions in small court instances. For example, in a 40000 m² court, it took 15 generations on average to find solutions similar to $s_w$. Slightly better solutions were also found.

In the case of larger courts, like a 6.25 km² court, $s_w$ turned out not to be a good solution. Every single solution taken from $S$ is better, and it took about 10 generations to find solutions with similar costs.

The main problem of our implementation is performance. These experiments show that our algorithm can provide robust solutions for real environments, but is expensive in terms of computational resources and running time. This observation motivates the rest of this work.

The first approach to improve performance was to maintain a cache. The idea of this cache is to avoid evaluating the same individual twice. With this simple measure, we found that, on average, 17% of individual evaluations could be avoided and performance improved accordingly. Given the nature of RDD distribution, the time reduction amounts to 12% of overall time.

In order to avoid a greater proportion of evaluations we decided to incorporate ML techniques.
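The evaluation cache can be sketched as a memoization layer keyed by the chromosome contents. The class name `FitnessCache` and the hit and miss counters are assumptions introduced for this example; the wrapped `evaluate` callable stands for the expensive distributed evaluation.

```python
class FitnessCache:
    """Memoize an expensive evaluation so that no individual is evaluated twice."""

    def __init__(self, evaluate):
        self._evaluate = evaluate  # expensive callable: individual -> cost
        self._store = {}
        self.hits = 0
        self.misses = 0

    def __call__(self, individual):
        # Lists are not hashable, so the chromosome is frozen into a tuple key.
        key = tuple(individual)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._evaluate(individual)
        return self._store[key]
```

The ratio `hits / (hits + misses)` is the proportion of avoided evaluations that the text reports.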
The assumption is that with enough data, a path may be classified as being "good" or "bad", so we can evaluate only good paths (hence this is a binary classification problem). Such an approach has a problem associated with evolutionary algorithms, namely that it is important to have a fitness value for each individual. To tackle this issue, we also resort to a mechanism for estimating the cost of a path.

In order to evaluate this approach we experimented with small instances given by medium-sized courts (0.16 km²), populations of 100 individuals, 200 scenarios, and 100 generations. Based on 10000 evaluations of individuals, we collected 8590 individuals with their costs (14.1% cached in this run). We split this set into a training set and a hold-out validation set with 20% of the individuals.

After evaluating alternatives, we decided to implement gradient boosting decision trees (GBDT). In order to keep our framework in Scala-Spark environments, we chose Spark MLlib as the main framework for our ML pipelines. We used LightGBM [6], one of the most referenced implementations of decision trees, which is also compatible with Spark MLlib.

In order to determine whether an individual is good or not, we use a threshold around the cost of $s_w$. A solution with cost greater than $s_w + \epsilon$ is considered to be a bad solution. We transformed inputs into data frames with a binary label (0/1) and a list of features (vertices labeled with ordered natural numbers, from left to right and bottom up).

After training and fitting the binary classification GBDT model, we found that the accuracy measurement was always around . , the precision around . , the recall around . , and the sensitivity around . . After these measurements were confirmed, we performed additional tests in order to rule out the possibility of overfitting.

The confusion matrix in Fig. 7 shows that, on average, we get true positives and true negatives, false negatives ( . of total "bad" solutions) and false positives ( . of total "good solutions"). Even though these seem to be very good indicators, false negatives may be of concern. Fortunately, a regression algorithm can recover part of this loss, as the following section shows.

Figure 7: Confusion matrix for GBDT binary classification.

In order to forecast costs of not-so-good solutions, we implemented a regression model with GBDT. The feature engineering was very similar to the description in Section 8.1, but the solution cost (instead of binary values) was used as target. After training and fitting this model, for a test instance for which we used a threshold of cost units to divide good solutions from bad solutions, we obtained MSE = 1173, RMSE = 34. , and MAE = 16. A histogram of results shows that (1409 cases) of forecasted (held-out) solutions were under . cost units. Only outlier cases ( . ) were over 150 cost units.

We ran several experiments in order to determine how many false negatives were corrected by the regression algorithm, and found that about of the cases are forecast with less than cost units of error.

We analyzed final populations in generation processes (for a population of individuals), and found that "good" individuals were less than . Taking this proportion into account, on average, our classification algorithm should tag false negatives for the whole run. Of those false negatives, should be recovered with our regression algorithm. So, the actual loss for this run is "good" solutions.

After successfully showing that this approach can boost the performance of the evolutionary algorithm, we decided to implement an in-line version in order to train models after each generation is completed. We observed that the accuracy measurement always reached the . threshold in 3 or 4 generations, which suggests that it is possible to use this approach in an in-line way.

For a running instance of 100 generations, this technique reduces the overall running time by , with no significant loss of accuracy in terms of solution quality compared to results without ML boosting. Loss of accuracy was measured to be in the range of . − . of solution cost.

Although decision trees and gradient boosting have a good tunability level [7], we found that default values worked very well with a good number of individuals (8590). There was no practical difference between default and tuned hyperparameter values for the test proposed in this section, with the exception of the in-line version (Section 8.3), in which the data set can be poorly populated in the very early generations.
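The classify-then-forecast scheme described in this section can be sketched independently of any ML library. In the sketch below the classifier and the regressor are passed in as plain callables, standing in for the trained GBDT models; the function names `label_good` and `assess` and the numbers in the usage lines are assumptions for illustration only.

```python
def label_good(cost, reference_cost, eps):
    """Training label: 1 ("good") unless the cost exceeds reference_cost + eps."""
    return 1 if cost <= reference_cost + eps else 0


def assess(population, classify_good, forecast_cost, evaluate_cost):
    """Evaluate only the individuals predicted as good; forecast the others.

    classify_good, forecast_cost and evaluate_cost are callables taking one
    individual. Returns the list of costs and the number of full evaluations.
    """
    costs, n_evaluated = [], 0
    for individual in population:
        if classify_good(individual):
            costs.append(evaluate_cost(individual))  # expensive, exact
            n_evaluated += 1
        else:
            costs.append(forecast_cost(individual))  # cheap, approximate
    return costs, n_evaluated


# Usage with trivial stand-in models (not trained GBDT models).
population = [[1, 2], [1, 2, 3], [4]]
costs, n_evaluated = assess(
    population,
    classify_good=lambda ind: len(ind) <= 2,
    forecast_cost=lambda ind: 100.0,
    evaluate_cost=lambda ind: float(len(ind)),
)
```

Every individual still receives a cost, which is what the evolutionary workflow needs for selection, while the expensive evaluation is spent only on the candidates predicted as good.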
For the in-line version we apply a $k$-fold cross-validation pipeline ($k = 5$) in order to tune the hyperparameters. A simple grid search approach was enough to tune the hyperparameters in early steps of in-line executions, running the tuning process after each generation. Special care was taken in order to avoid overfitting at very early stages.

The search space was reduced by removing hyperparameters for which tunability proved to be poor (considering the in-line running). As a result, the hyperparameters for our GBDT are the number of leaves, the minimal number of data in one leaf, λ , and λ [8].

Conclusions

In this work we have presented an application of an evolutionary algorithm for finding robust solutions, with a distributed component in order to evaluate solutions in a large number of scenarios. The implementation of such a procedure relies on a distributed framework, and makes it possible to find competitive and robust solutions for real-sized instances. We have also described the incorporation of ML techniques into the basic implementation, in order to boost performance by not evaluating all individuals. Our experiments show that this can be achieved with a reasonable effort and a small impact on the obtained solutions. It would be interesting to explore whether these techniques can be applied in other big data optimization environments.

References

[1] F.E. Martínez, G. Sainz-Trápaga, Modelos y algoritmos de optimización combinatoria para planificación de rutas en regatas de barcos de vela, Undergraduate thesis, Computer Science Dept., School of Sciences, University of Buenos Aires, Argentina, 2010.

[2] J.D.W. Kahl, Forecasting Peak Wind Gusts Using Meteorologically Stratified Gust Factors and MOS Guidance, Atmospheric Science Group, Department of Mathematical Sciences, University of Wisconsin–Milwaukee, Milwaukee, Wisconsin, 2020. https://doi.org/10.1175/WAF-D-20-0045.1

[3] D.L. Alves de Araujo, H.S. Lopes, A.A. Freitas, A parallel genetic algorithm for rule discovery in large databases, IEEE SMC'99 Conference Proceedings. 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.99CH37028), Tokyo, Japan, 1999. ISBN: 0-7803-5731-0.

[4] Fangguo He, Huan Qi Qiong and Fan Qiong, An Evolutionary Algorithm for the Multi-objective shortest path problem, International Journal of Computational Intelligence Systems. 10-2007. DOI: 10.2991/iske.2007.217

[5] M. Zaharia, M. Chowdhury, T. Das et al., Resilient distributed datasets. A fault-tolerant abstraction for in-memory cluster computing, in Conference Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation. 2012.

[6] Guolin Ke, Qi Meng, T. Finley et al., LightGBM: A Highly Efficient Gradient Boosting Decision Tree, in Proceedings: NIPS 2017 Computer Science.

[7] P. Probst, B. Bischl and A.L. Boulesteix, Tunability: Importance of Hyperparameters of Machine Learning Algorithms, Journal of Machine Learning Research, Nr. 20, pp. 1-32, 2019.

[8] Microsoft Corporation, Chapter 7 "Parameters tuning" in