A Tailored NSGA-III Instantiation for Flexible Job Shop Scheduling
Yali Wang, Bas van Stein, Michael T.M. Emmerich, Thomas Bäck
Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands [email protected]
Abstract.
A customized multi-objective evolutionary algorithm (MOEA) is proposed for the multi-objective flexible job shop scheduling problem (FJSP). It uses smart initialization approaches to enrich the first generated population, and proposes various crossover operators to create a better diversity of offspring. In particular, the MIP-EGO configurator, which can tune algorithm parameters, is adopted to automatically tune operator probabilities. Furthermore, different local search strategies are employed to explore the neighborhood for better solutions. In general, the algorithm enhancement strategy can be integrated with any standard EMO algorithm. In this paper, it has been combined with NSGA-III to solve benchmark multi-objective FJSPs, whereas an off-the-shelf implementation of NSGA-III is not capable of solving the FJSP. The experimental results show excellent performance with a reduced computing budget.
Keywords:
Flexible job shop scheduling · Multi-objective optimization · Evolutionary algorithm.
1 Introduction

The job shop scheduling problem (JSP) is an important branch of production planning problems. The classical JSP consists of a set of independent jobs to be processed on multiple machines, and each job contains a number of operations with a predetermined order. It is assumed that each operation must be processed on a specific machine with a specified processing time. The JSP is to determine a schedule of jobs, meaning to sequence the operations on the machines. The flexible job shop scheduling problem (FJSP) is an important extension of the classical JSP due to the wide employment of multi-purpose machines in real-world job shops. The FJSP extends the JSP by assuming that each operation is allowed to be processed on a machine out of a set of alternatives, rather than on one specified machine. Therefore, the FJSP is not only to find the best sequence of operations on each machine, but also to assign each operation to a machine out of a set of qualified machines. The JSP is well known to be strongly NP-hard [1]. The FJSP is an even more complex version of the JSP, so the FJSP is clearly also strongly NP-hard.

A typical objective of the FJSP is the makespan, which is defined as the maximum time for completion of all jobs, in other words, the total length of the schedule. However, to achieve a practical schedule for the FJSP, various conflicting objectives should be considered. In this paper, evolutionary algorithms (EA) are applied to a multi-objective flexible job shop scheduling problem (MOFJSP) with three objectives, namely: the makespan, the total workload, and the critical workload. We propose and adopt multiple initialization approaches to enrich the first generated population based on our definition of the chromosome representation; at the same time, diverse genetic operators are applied to guide the search towards offspring with a wide diversity; in particular, we use an algorithm configurator to tune the parameter configuration; furthermore, two levels of local search are employed, leading to better solutions. Our proposed FJSP multi-objective evolutionary algorithm (FJSP-MOEA) can be combined with almost any MOEA to solve the MOFJSP. The experimental results show that FJSP-MOEA can achieve state-of-the-art results with less computational effort when merged with NSGA-III [2].

The paper is organized as follows. The next section formulates the MOFJSP, which is the problem we are about to solve. Section 3 gives the necessary background knowledge. Section 4 introduces the proposed algorithm and Section 5 reports the experimental results. Finally, Section 6 concludes the work and suggests future work directions.

⋆ This work is part of the research programme Smart Industry SI2016 with project name CIMPLO and project number 15465, which is (partly) financed by the Netherlands Organisation for Scientific Research (NWO).
2 Problem Formulation

The MOFJSP addressed in this paper is described as follows:
1. There are n jobs J = { J_1, J_2, ..., J_n } and m machines M = { M_1, M_2, ..., M_m }.
2. Each job J_i comprises l_i operations for i = 1, ..., n; the j-th operation of job J_i is represented by O_ij, and the operation sequence of job J_i runs from O_i1 to O_il_i.
3. For each operation O_ij, there is a set of machines capable of performing it, represented by M_ij, which is a subset of M.
4. The processing time of operation O_ij on machine M_k is predefined and denoted by t_ijk.
At the same time, the following assumptions are made:
1. All machines are available at time 0 and assumed to be continuously available.
2. All jobs are released at time 0 and are independent from each other.
3. Setup times of machines and transportation times between operations are negligible.
4. Environmental changes (such as machine breakdowns) are neglected.
5. A machine can only work on one operation at a time.
6. There are no precedence constraints among the operations of different jobs, and the order of operations within each job cannot be modified.
7. An operation, once started, must run to completion.
8. No operation of a job can be started until the previous operation of that job is completed.

The makespan, total workload and critical workload, which are commonly considered in the literature on the FJSP (e.g., [3], [4]), are minimized and used as the three objectives in our algorithm. Minimizing the makespan can facilitate a rapid response to market demand. The total workload represents the total working time of all machines, and the critical workload is the maximum workload among all machines. Minimizing the total workload can reduce the use of machines; minimizing the critical workload can balance the workload between machines. Let C_i denote the completion time of job J_i, and W_k the sum of the processing times of all operations processed on machine M_k. The three objectives can be defined as follows:

makespan (C_max):          f_1 = max { C_i | i = 1, 2, ..., n }    (1)
total workload (W_t):      f_2 = Σ_{k=1}^{m} W_k                   (2)
critical workload (W_max): f_3 = max { W_k | k = 1, 2, ..., m }    (3)

An example of a MOFJSP is shown in Table 1 as an illustration, where rows correspond to operations and columns correspond to machines. In this example, there are three machines: M_1, M_2 and M_3. Each entry of the table denotes the processing time of that operation on the corresponding machine, and the tag "−" means that a machine cannot execute the corresponding operation.

Table 1: Processing time of a FJSP instance (rows: the operations of jobs J_1, J_2 and J_3, with J_1 comprising three operations and J_2 and J_3 two operations each; columns: machines M_1, M_2, M_3; the numeric entries were lost in extraction)

3 Background

The FJSP has been investigated extensively in the last three decades. According to [5], the EA is the most popular non-hybrid technique to solve the FJSP. Among all EAs for the FJSP, some are developed for its more challenging variant: the MOFJSP, which we formulated in Section 2.
The approaches of [6], [3] and [4] are very successful MOFJSP algorithms and have obtained high-quality solutions. [6] proposed a multi-objective genetic algorithm (MOGA) based on the immune and entropy principle. In this MOGA, the fitness was determined by the Pareto dominance relation, and diversity was maintained by the immune and entropy principle. In [3], a simple EA (SEA) was proposed, which used domain heuristics to generate the initial population and balanced exploration and exploitation by refining duplicate individuals with mutation operators. A memetic algorithm (MA) was proposed in [4]; it incorporated a local search into NSGA-II [7]. A hierarchical strategy was adopted in the local search to handle the three objectives: makespan, total workload and maximum workload. In Section 5, these algorithms are compared with our algorithm on the MOFJSP.
An EA involves multiple parameters, such as the crossover probability, the mutation probability, the computational budget, and so on. The preset values of these parameters affect the performance of the algorithm in different situations. The parameters are usually set to values which are assumed to be good. For example, the mutation probability is normally kept very low, since otherwise convergence is expected to be delayed unnecessarily. However, a more principled way to identify the probability would be a sensitivity analysis: carrying out multiple runs of the algorithm with different mutation probabilities and comparing the outcomes. Although there are some self-tuning techniques for adjusting these parameters on the fly, the hyper-parameters of an EA can also be optimized using techniques from machine learning.

The optimization of hyper-parameters and neural network architectures is an important topic in the field of machine learning due to the large number of design choices for a network architecture and its parameters. Recently, algorithms have been developed to accomplish this automatically, since it is intractable to do it by hand. MIP-EGO [8] is one of these configurators; it can automatically configure convolutional neural network architectures, and the resulting optimized neural networks have proven competitive with state-of-the-art manually designed ones on some popular classification tasks. In particular, MIP-EGO allows multiple candidate points to be selected and evaluated in parallel, which speeds up the automatic tuning procedure. In this paper, we tune several parameters with MIP-EGO to find the best settings for them.
NSGA-III is a decomposition-based MOEA; it is an extension of the well-known NSGA-II and eliminates drawbacks of NSGA-II such as the lack of uniform diversity among a set of non-dominated solutions. The basic framework of NSGA-III is similar to the original NSGA-II, but it replaces the crowding distance operator with a clustering operator based on a set of reference points. A widely distributed set of reference points can efficiently promote population diversity during the search, and NSGA-III defines its set of reference points by Das and Dennis's method [9].

In each iteration t, an offspring population Q_t of size N_pop is created from the parent population P_t of size N_pop using the usual selection, crossover and mutation. Then a combined population R_t = P_t ∪ Q_t is formed and classified into different layers (F_1, F_2, and so on), where each layer consists of mutually non-dominated solutions. Thereafter, starting from the first layer, points are put into a new population S_t, until the first time the size of S_t equals or exceeds N_pop. Suppose the last layer included in S_t is the l-th layer; the members of S_t \ F_l have then been chosen for P_{t+1}, and the next step is to choose the remaining points from F_l to complete P_{t+1}. In general (when the size of S_t does not equal N_pop), N_pop − |S_t \ F_l| solutions from F_l need to be selected for P_{t+1}.

When selecting individuals from F_l, first, each member of S_t is associated with a reference point by searching for the shortest perpendicular distance from the member to the reference lines created by joining the ideal point with the reference points. Next, a niching strategy is employed to choose points from F_l that are associated with the least crowded reference points. The niche count of each reference point, defined as the number of members in S_t \ F_l associated with it, is computed. The member of F_l associated with the reference point having the minimum niche count is included in P_{t+1}. The niche count of that reference point is then increased by one, and the procedure is repeated to fill the remaining population slots of P_{t+1}.

NSGA-III is powerful in handling problems with non-linear characteristics as well as many objectives. Therefore, we decided to enhance NSGA-III in our algorithm for the MOFJSP.

4 The Proposed Algorithm

The proposed algorithm,
the Flexible Job Shop Problem Multi-objective Evolutionary Algorithm (FJSP-MOEA), can in principle be combined with any MOEA and help MOEAs solve the MOFJSP, whereas standard MOEAs cannot solve the MOFJSP on their own. The algorithm follows the flow of a typical EA and generates improved solutions by using local search. Details of the following components are given in the next subsections.
– Initialization: encode the individual and generate the initial population.
– Genetic operators: generate offspring by crossover and mutation operators.
– Local search: decode the individual and improve the solution with local search.
The MOFJSP is a combination of assigning each operation to a machine and ordering the operations on the machines. In the algorithm, each chromosome (individual) represents a solution in the search space, and the chromosome consists of two parts: the operation sequence vector and the machine assignment vector. Let N denote the total number of operations of all jobs. The length of both vectors is equal to N. The operation sequence vector decides the sequence of the operations assigned to each machine: for any two operations processed by the same machine, the one located in front is processed earlier than the other. The machine assignment vector assigns the operations to machines; in other words, it determines which operation is processed by which machine, where the machine must be one capable of processing the operation.

The format of the individual representation not only influences the implementation of the crossover and mutation operators; a proper representation can also avoid the production of infeasible schedules and reduce the computational time. In our algorithm, the chromosome representation proposed by Zhang et al. in [10] is adopted, and an example is given in Table 2.

Table 2: An example of a chromosome representation
Operation sequence:   1    2    3    2    1    1    3
(operations:        O_11 O_21 O_31 O_22 O_12 O_13 O_32)
Machine assignment:   2    1    1    3    2    2    1
(operations:        O_11 O_12 O_13 O_21 O_22 O_31 O_32)
(assigned machines: entries lost in extraction)

In Table 2, the first row shows the operation sequence vector, which consists only of job indexes. For each job, the first appearance of its index represents the first operation of that job, the second appearance of the same index represents the second operation of that job, and so on. The number of occurrences of an index is equal to the number of operations of the corresponding job. The second row explains the first row by giving the actual operations. The third row is the machine assignment vector, which presents the selected machines for all operations. The operation sequence of the machine assignment vector is fixed: from the first job to the last job, and from the first operation to the last operation of each job. The fourth row indicates this fixed operation sequence of the machine assignment vector, and the fifth row shows the actual machines of the operations. Each integer value in the machine assignment vector is the
index of the machine in the set of alternative machines of that operation. In this example (Table 1), an operation whose alternative machine set contains only a single machine receives the value 1 and is assigned to that machine, while for an operation whose alternative machine set contains two machines, the value 2 selects the second machine of that set.

Our algorithm starts by creating the initial population. The machine assignment and operation sequence vectors are generated separately for each individual. In the literature, a few approaches have been proposed for producing individuals, such as global minimal workload in [11], and AssignmentRule1 and AssignmentRule2 in [12]. In our algorithm, several new methods are proposed, namely the Processing Time Roulette Wheel (PRW) and the Workload Roulette Wheel (WRW) for initialising the machine assignment, and the Most Remaining Machine Operations (MRMO) and Most Remaining Machine Workload (MRMW) for initialising the operation sequence. These new approaches are used together with some commonly used dispatching rules when initializing individuals, for the purpose of enriching the initial population. When generating a new individual in our algorithm, two initialization methods are randomly picked from the following two lists: one for the machine assignment vector and one for the operation sequence vector.
Initialization methods for machine assignment
1. Random assignment (Random): an operation is assigned to an eligible machine randomly.
2. Processing Time Roulette Wheel (PRW): for each operation, roulette wheel selection is used to select a machine from its machine set based on the processing times of the capable machines. A machine with a shorter processing time is more likely to be selected.
3. Workload Roulette Wheel (WRW): for each operation, roulette wheel selection is used to select a machine from its machine set based on the current workloads plus the processing times of the capable machines. A machine with a lower sum of workload and processing time is more likely to be selected.

We propose PRW and WRW to assign each operation to a machine with less processing time or accumulated workload, while maintaining the freedom of exploring the entire search space.
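The PRW rule above can be sketched roughly as follows. This is an illustrative sketch, not the paper's code: the inverse-processing-time weighting is an assumption (the paper only states that shorter times are more likely to be selected), and `prw_select` is a hypothetical helper name. WRW would be identical with `machine_times` replaced by workload-plus-processing-time values.

```python
import random

def prw_select(machine_times, rng=random):
    """Roulette-wheel pick of a machine for one operation: each
    machine is weighted by the inverse of its processing time, so
    faster machines are proportionally more likely to be chosen.
    `machine_times` maps machine id -> processing time (> 0)."""
    weights = {m: 1.0 / t for m, t in machine_times.items()}
    total = sum(weights.values())
    r = rng.uniform(0.0, total)
    acc = 0.0
    for m, w in weights.items():
        acc += w
        if r <= acc:
            return m
    return m  # guard against floating-point round-off
```

With processing times 1 and 100, the faster machine wins roughly 99% of the draws, yet the slower one still has a non-zero chance, which preserves exploration of the whole search space.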
Initialization methods for operation sequence
1. Random permutation (Random): starting from a fixed sequence: all job in-dexes of J (the number of J job indexes is the number of operations of J ),followed by all job indexes of J , and so on. Then the array with the fixedsequence is permuted and a random order is generated. Yali Wang et al.
2. Most Work Remaining (MWR): operations are placed one by one into theoperation sequence vector. Before selecting an operation, the remaining pro-cessing times of all jobs are calculated respectively, the first optional operationof the job with the longest remaining processing time is placed into the chro-mosome.3. Most number of Operations Remaining (MOR): operations are placed oneby one into the operation sequence vector. Before selecting an operation, thenumber of succeeding operations of all jobs is counted respectively, the firstoptional operation of the job with the most remaining operations is placedinto the chromosome.4. Long Processing Time (LPT)[13]: operations are placed one by one into theoperation sequence vector, each time, the operation with maximal processingtime is selected without breaking the order of jobs.5. Most Remaining Machine Operations (MRMO): operations are placed intothe operation sequence vector according to both the number of subsequentoperations on machines and the number of subsequent operations of jobs.MRMO is a hierarchical method and takes the machine assignment into con-sideration. First, the machine with the most subsequent operations is se-lected. After that, the optional operations in the subsequent operations onthat machine are found based on the already placed operations. For example,if O → O → O are placed operations, the current optional operation canonly be chosen from O , O , and O . In these optional operations, thosewhich are assigned to the selected machine are picked and the one that belongsto the job with the most subsequent operations is placed into the chromosome.In this example, O will be chosen if it is assigned to the selected machinebecause there are two subsequent operations for J and only one subsequentoperation for J and J . 
Note that it is possible that no operation is available on that machine; in that case, the machine with the second largest number of subsequent operations is selected, and so forth.
6. Most Remaining Machine Workload (MRMW): operations are placed into the operation sequence vector according to both the remaining processing times of the machines and the remaining processing times of the jobs. MRMW is a hierarchical method similar to MRMO. After finding the machine with the longest remaining processing time and the optional operations on that machine, the operation which belongs to the job with the longest remaining processing time is placed into the chromosome. Again, if no operation is available on that machine, the machine with the second longest remaining processing time is selected, and so forth.

We propose MRMO and MRMW to give priority to both the machine and the job with the largest number of remaining operations (MRMO) or the longest remaining processing time (MRMW).

Crossover replaces some of the genes in one parent with the corresponding genes of the other (Glover and Kochenberger [14]). Since our representation of chromosomes has two parts, the crossover operators applied to these two parts are implemented separately as well. We propose two new crossover operators,
Precedence Preserving Two Points Crossover (PPTP) and Uniform Preservative crossover (UPX), and use them together with several commonly adopted crossover operators. When executing the crossover operation in the proposed algorithm, one crossover operator for the machine assignment and one for the operation sequence are randomly chosen from the following two lists to generate the offspring.
Crossover operators for machine assignment
1. No crossover.
2. One point crossover: a cutting point is picked randomly and the genes after the cutting point are swapped between the two parents.
3. Two points crossover: two cutting points are picked randomly and the genes between the two points are swapped between the two parents.
4. Job-based crossover (JX): it generates two children from two parents by the following procedure:
   a. A vector with the size of the number of jobs is generated, consisting of random values 0 and 1.
   b. For a job corresponding to value 0, the assigned machines of its operations are preserved.
   c. For a job corresponding to value 1, the machines of its operations are swapped between the two parents.
5. Multi-point preservative crossover (MPX) [15]: MPX generates two children from two parents by the following procedure:
   a. A vector with the size of the number of operations is generated, consisting of random values 0 and 1.
   b. For the operations corresponding to value 0, their machines (genes) are preserved.
   c. For the operations corresponding to value 1, their machines (genes) are swapped between the two parents.
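The three steps of MPX translate almost directly into code. A minimal sketch (the function name `mpx` is ours; machine assignment vectors are plain lists of machine indexes):

```python
import random

def mpx(parent1, parent2, rng=random):
    """Multi-point preservative crossover (MPX) on machine
    assignment vectors: a random 0/1 mask of the same length
    decides, per position, whether the genes are kept (0) or
    swapped between the two parents (1)."""
    mask = [rng.randint(0, 1) for _ in parent1]
    child1 = [q if bit else p for p, q, bit in zip(parent1, parent2, mask)]
    child2 = [p if bit else q for p, q, bit in zip(parent1, parent2, mask)]
    return child1, child2
```

Because a gene is only ever kept or exchanged with the gene at the same position, every child gene still indexes into the alternative machine set of the same operation, so feasibility of the machine assignment is preserved by construction.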
Crossover operators for operation sequence
1. No crossover.
2. Precedence preserving one point crossover (PPOP) [17]: PPOP generates two children from two parents by the following procedure:
   a. A cutting point is picked randomly; the genes to the left are preserved and copied from parent1 to child1 and from parent2 to child2.
   b. The remaining operations of parent1 are reallocated in the order they appear in parent2.
   c. The remaining operations of parent2 are reallocated in the order they appear in parent1.
An example of PPOP is shown in Figure 1, where the cutting point is between the third and the fourth operation. The red numbers in parent2 are the genes on the right side of the cutting point in parent1; they are copied to child1 in their own sequence, following the genes on the left side of the cutting point in parent1, and vice versa.

Fig. 1: The process of PPOP

3. Precedence Preserving Two Points Crossover (PPTP): PPTP generates two children from two parents by the following procedure:
   a. Two cutting points are picked randomly; the genes except those between the two points are preserved and copied from parent1 to child1 and from parent2 to child2.
   b. The operations between the two cutting points in parent1 are reallocated in the order they appear in parent2.
   c. The operations between the two cutting points in parent2 are reallocated in the order they appear in parent1.
4. Improved precedence operation crossover (IPOX) [16]: IPOX randomly divides the job set into two complementary, non-empty subsets. The operations of one job subset are preserved, while the operations of the other job subset are copied from the other parent.
5. Uniform Preservative crossover (UPX): UPX generates two children from two parents by the following procedure:
   a. A vector with the size of the number of operations is generated, consisting of random values 0 and 1.
   b. For the operations corresponding to value 0, the genes are preserved and copied from parent1 to child1 and from parent2 to child2.
   c. For the operations corresponding to value 1, the genes of parent1 are located in parent2 and copied from parent2 in the sequence of parent2, and vice versa.
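The PPOP procedure can be sketched as below (an illustration under our reading of the operator, with the hypothetical helper name `ppop`). Since the operation sequence vector holds only job indexes, and the k-th occurrence of an index always denotes the k-th operation of that job, filling the tail in the other parent's order preserves all job precedences automatically:

```python
def ppop(parent1, parent2, cut):
    """Precedence preserving one point crossover (PPOP) on operation
    sequence vectors (lists of job indexes). Genes left of `cut` are
    copied from one parent; the remaining operations are appended in
    the order they occur in the other parent."""
    def fill(head, donor):
        pending = list(head)          # occurrences already placed
        tail = []
        for gene in donor:
            if gene in pending:
                pending.remove(gene)  # this occurrence lies before the cut
            else:
                tail.append(gene)
        return head + tail
    return fill(parent1[:cut], parent2), fill(parent2[:cut], parent1)
```

For example, with parents [1, 2, 3, 2, 1, 1, 3] and [3, 1, 2, 1, 2, 3, 1] and the cut after position 3, child1 keeps [1, 2, 3] and appends the four remaining operations in parent2's order, giving [1, 2, 3, 1, 2, 3, 1].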
The mutation operator changes the gene values at selected locations. By forcing the algorithm to search areas other than the current one, the mutation operator helps maintain genetic diversity from one generation of the population to the next. In our algorithm, an insertion mutation and a swap mutation (including one point swap and two points swap) are proposed and used.
The Insertion Mutation Operator generates a new individual by the following procedure:
– Two random numbers i and j (1 ≤ i ≤ N, 1 ≤ j ≤ N) are selected.
– For the operation sequence vector, the operation at position j is inserted in front of the operation at position i.
– For the machine assignment vector, a machine is randomly selected for each of the operations at positions i and j. If the processing time on the newly selected machine is lower than that on the current machine, the current machine is replaced by the new machine. If the processing time on the new machine is longer than that on the old machine, there is only a 20% probability that the new machine replaces the old machine.

The Swap Mutation Operator generates a new individual by the following procedure:
– One random number i (1 ≤ i ≤ N) is selected, or two random numbers i and j (1 ≤ i ≤ N, 1 ≤ j ≤ N) are selected.
– For the operation sequence vector, with only one swap point i, the operation at the swap point is swapped with its neighbour; with two swap points, the operations at positions i and j are swapped.
– For the machine assignment vector, the machine at position i (and j) is replaced with a new machine by the same rule used in the insertion mutation operator.

Decoding a chromosome converts an individual into a feasible schedule in order to calculate the objective values, which represent the relative superiority of a solution. In this process, the operations are picked one by one from the operation sequence vector and placed on the machines given by the machine assignment vector to form the schedule. When placing each operation on its machine, local search (in the sense of heuristic rules that improve the solution) is applied to refine the individual in order to obtain an improved schedule. Two levels of local search are applied to allocate each operation to a time slot on its machine.
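The machine-replacement rule shared by the insertion and swap mutations above can be sketched as follows (a sketch, not the authors' code; `replace_machine` is a hypothetical helper, and we treat the equal-time case like the "longer" case, which the paper leaves unspecified):

```python
import random

def replace_machine(eligible, current, time_on, rng=random):
    """Mutation rule for one position of the machine assignment
    vector: draw a random eligible machine; always accept it when it
    is faster than the current machine, otherwise accept it with only
    a 20% probability. `time_on[m]` is the processing time of the
    operation on machine m."""
    candidate = rng.choice(eligible)
    if time_on[candidate] < time_on[current]:
        return candidate
    return candidate if rng.random() < 0.2 else current
```

The 20% acceptance of a slower machine keeps the mutation mildly greedy while still allowing occasional uphill moves, which helps the total-workload objective without freezing the machine assignment.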
Idle times may exist between the operations on each machine due to the precedence constraints among the operations of each job; the two levels of local search utilize these idle times to different degrees.
First level local search. Let S_ij be the starting time of O_ij and C_ij its completion time; an example of the first level local search is shown in Figure 2. Because O_mn needs to be processed after the completion of O_{m,n-1}, an idle time interval between the completion of O_ab and the start of O_mn appears on machine M_k. O_ij is assigned to M_k, and we assume that O_mn is the last operation on M_k before handling O_ij; therefore the starting time of O_ij is max{ C_mn, C_{i,j-1} }, which in this example is C_mn, later than C_{i,j-1}. Thus, there is an opportunity for O_ij to be processed earlier. When checking the idle times on M_k, the idle time interval [C_ab, S_mn] is found to be available for O_ij, because the idle time span [C_{i,j-1}, S_mn], which is part of [C_ab, S_mn], is long enough to process O_ij, i.e., not shorter than t_ijk.

Fig. 2: First level local search    Fig. 3: Second level local search

Let S_dk be the starting time of the d-th idle time interval on M_k and C_dk its completion time. O_ij can be transferred to the earliest possible idle time interval of its machine which satisfies the following condition:

max{ S_dk, C_{i,j-1} } + t_ijk ≤ C_dk,   (C_{i,j-1} = 0 if j = 1)   (4)

After using the idle time interval, the starting time of O_ij is max{ S_dk, C_{i,j-1} }, and the idle interval is updated based on the starting and completion time of O_ij: (1) the idle time interval is removed; (2) the starting or completion time of the idle time interval is modified; or (3) the idle time interval is replaced by two new, shorter idle time intervals, as in the example of Figure 2.

After decoding a chromosome, the operation sequence vector of the chromosome is updated according to the new starting times of the operations, and the three objective values are calculated. The first level local search only finds, for each operation, an available idle time interval on its assigned machine.
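The feasibility test of Equation 4 can be sketched as a first-fit scan over the machine's idle intervals (an illustration with the hypothetical helper name `first_fit_idle_interval`; intervals are assumed sorted by starting time):

```python
def first_fit_idle_interval(idle_intervals, prev_completion, t_ijk):
    """First level local search: scan the idle intervals [S_dk, C_dk]
    of the assigned machine in time order and return the start time
    that O_ij would get in the first interval satisfying Eq. (4):
        max(S_dk, C_{i,j-1}) + t_ijk <= C_dk.
    `prev_completion` is C_{i,j-1} (0 for the first operation of a
    job). Returns None when no idle interval can host the operation."""
    for s_dk, c_dk in idle_intervals:
        start = max(s_dk, prev_completion)
        if start + t_ijk <= c_dk:
            return start
    return None
```

For instance, with idle intervals [(0, 2), (5, 10)], a predecessor finishing at time 6 and a processing time of 3, the first interval is too short, and the operation is started at time 6 inside the second interval.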
After generating the corresponding schedule with the first level search method, it is possible that there are still operations that can be allocated to available idle time intervals to improve the fitness values. To achieve this, the chromosome which has been updated with the first level local search is decoded again with the second level local search, and again operations are moved to available idle time intervals.

Second level local search. The second level local search not only checks the idle time intervals on the assigned machine, but also the idle time intervals on alternative machines. An example of making use of an idle time interval on
another machine is shown in Figure 3. Let S_ijk be the starting time and C_ijk the completion time of O_ij on M_k. In this example, O_ij is assigned to M_k in the initial chromosome, and we assume that O_ij can also be performed by M_e. Under the condition that the starting time of O_ij on M_k is later than the completion time of O_{i,j-1}, the idle time intervals on all alternative machines which can process O_ij are checked. An idle time interval on M_e could be a choice, and O_ij can be reallocated to M_e. In this example, the processing time of O_ij on M_e is even shorter than the processing time on M_k; therefore, this reallocation benefits at least the total workload.

In the second level local search, all available idle time intervals of an operation are checked one by one until the first "really" available idle time interval is found, and the operation is then moved to that idle time interval. Any idle time interval on an alternative machine which satisfies Equation 4 is an available idle time interval, while it must meet at least one of the following conditions to become a "really" available idle time interval:
1. The processing time of the operation on the new machine is shorter than on the initially assigned machine, if the available idle time interval is on a different machine;
2. The operation can be moved from the machine with the maximal makespan to another machine;
3. The operation can be moved from the machine with the maximal workload to another machine.
The total workload can be improved directly by the first condition; the motive of the second condition is to decrease the maximal makespan, and the third condition can benefit the critical workload.

After the reallocation of the operations with the second level local search, the corresponding schedule is obtained and the objective values are calculated.
However, instead of updating the chromosome immediately, the new objective values are first compared with the old objective values; the chromosome is updated only when at least one objective is better than its old value. This makes sure that the new schedule is at least not worse than the old schedule (the new solution is not dominated by the old solution). Another difference between the first- and second-level local search is that the first-level local search is performed on every evaluation, while the second-level local search is performed only with a 30% probability for each chromosome, to avoid local optima. Although these two local searches could be applied repeatedly to improve a solution, they are employed only once per evaluation to avoid that the algorithm gets stuck in a local optimum.

The algorithm is evaluated on Kacem instances (including ka10x7, ka10x10 and ka15x10) and the 10 BRdata instances (Mk01-Mk10). Table 3 gives the scale of these instances. The first column is the name of each instance; the second column shows the size of the instance, in which n stands for the number of jobs and m for the number of machines; the third column gives the number of operations; the fourth column lists the flexibility of each instance, i.e., the average number of alternative machines per operation.

Table 3: The scale of benchmark instances

Instance  n x m    #Operations  Flexibility
ka10x7    10 x 7   29           7
ka10x10   10 x 10  30           10
ka15x10   15 x 10  56           10
Mk01      10 x 6   55           2
Mk02      10 x 6   58           3.5
Mk03      15 x 8   150          3
Mk04      15 x 8   90           2
Mk05      15 x 4   106          1.5
Mk06      10 x 15  150          3
Mk07      20 x 5   100          3
Mk08      20 x 10  225          1.5
Mk09      20 x 10  240          3
Mk10      20 x 15  240          3
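The flexibility column of Table 3 is simply the mean number of machine alternatives per operation. A minimal sketch, assuming a hypothetical instance encoding (a list of jobs, each job a list of operations, each operation the set of machines that can process it):

```python
def flexibility(instance):
    """Average number of alternative machines per operation.

    instance: list of jobs; each job is a list of operations;
    each operation is the set of machines that can process it.
    """
    ops = [op for job in instance for op in job]
    return sum(len(machines) for machines in ops) / len(ops)

# Tiny 2-job example: four operations with 2, 1, 3 and 2 alternatives.
toy = [
    [{0, 1}, {2}],          # job 1: two operations
    [{0, 1, 2}, {1, 2}],    # job 2: two operations
]
```

For the toy instance above, `flexibility(toy)` yields (2+1+3+2)/4 = 2.0.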
All experiments are performed with a population size of 100; each run of the algorithm stops after a predefined number of evaluations, which is 10,000 for the Kacem instances and 150,000 for the BRdata instances. For each problem instance, the proposed algorithm is run independently 30 times. The resulting solution set of an instance is formed by merging all the non-dominated solutions from its 30 runs.

The crossover probability is set to 1, and two random crossover operators are chosen each time (one for the operation sequence and one for the machine assignment). For the Kacem instances, the mutation probabilities are set to 0.6. For the BRdata instances, which include larger-scale and more complex problems, the MIP-EGO configurator [8] is adopted to tune the insertion and swap mutation probabilities (one-point swap mutation and two-point swap mutation) and find the best parameter values for each problem. The hypervolume of the solution set is used in MIP-EGO as the objective value to tune these three mutation probabilities. Although the true Pareto fronts (PF) for the test instances are unknown, [4] provides a reference set for the Kacem and BRdata FJSP instances, which is formed by gathering all non-dominated solutions found by all the algorithms implemented in [4], together with non-dominated solutions from other state-of-the-art
MOFJSP algorithms. We define the reference point for calculating the hypervolume based on the largest values in this reference set: each objective function value of the reference point is a constant factor slightly larger than 1 times the largest objective function value of the respective dimension in the reference set. The origin is used as the ideal point. Other basic parameter settings of MIP-EGO are listed in Table 4. Each mutation probability is discretized to one digit after the decimal point; the search space is therefore an ordinal (integer) space, which MIP-EGO handles in a uniform way.

Table 4: Settings for MIP-EGO

Parameter                       Value
maximal number of evaluations   200
surrogate model                 random forest
optimizer for infill criterion  MIES
search space                    ordinal space
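For three minimization objectives, the hypervolume with respect to such a reference point can be computed exactly by sweeping along the first objective and accumulating 2-D slice areas. The following is a generic sketch of that computation, not the implementation used in the experiments:

```python
def hv2d(points, ref):
    """Area dominated by 2-D minimization points within the [point, ref] boxes."""
    pts = sorted(points)               # sweep by first coordinate, ascending
    area, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y < prev_y:                 # point is non-dominated in this sweep
            area += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return area

def hv3d(points, ref):
    """Hypervolume of 3-objective minimization points w.r.t. a reference point."""
    # keep only points that strictly dominate the reference point
    pts = sorted(p for p in points if all(pi < ri for pi, ri in zip(p, ref)))
    volume = 0.0
    for i, (x, y, z) in enumerate(pts):
        next_x = pts[i + 1][0] if i + 1 < len(pts) else ref[0]
        if next_x > x:
            # 2-D cross-section of the dominated region for f1 in [x, next_x)
            slab = hv2d([(py, pz) for px, py, pz in pts[:i + 1]], ref[1:])
            volume += (next_x - x) * slab
    return volume
```

For example, the two points (1, 2, 2) and (2, 1, 1) with reference point (3, 3, 3) dominate boxes of volume 2 and 4 with an overlap of 1, giving a hypervolume of 5.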
With a budget of 200 evaluations, Table 5 shows the percentage of evaluations that achieve the largest hypervolume value (i.e., the best PF) in MIP-EGO. For Mk05 and Mk08, all evaluations obtained the largest hypervolume value, which means that every parameter setting of the mutation probabilities tried by MIP-EGO achieves the best PF for these two problems. Table 3 also shows that both problems have a low flexibility value. On the contrary, Mk06, Mk09 and Mk10 have a large number of operations and high flexibility. They appear to be difficult to solve, because there is only one best parameter setting for the mutation probabilities. This also means that it is highly likely that better solution sets can be found with a higher budget.

Table 5: Probability of finding the best configuration
Instance     Mk01  Mk02  Mk03  Mk04  Mk05  Mk06  Mk07  Mk08  Mk09  Mk10
Probability  73%   60%   95%   1%    100%  0.5%  4.5%  100%  0.5%  0.5%

With the best parameter settings of the mutation probabilities for the BRdata instances, we compared our experimental results with the reference set in [4]. Our algorithm achieves the same Pareto optimal solutions as the reference set for all BRdata instances except Mk06, Mk09 and Mk10. At the same time, for Mk06 and Mk10, our algorithm finds new non-dominated solutions. Table 6 lists the new non-dominated solutions obtained by our algorithm; each row of an instance is a solution with three objectives: makespan, total workload, and critical workload.

Table 6: Newly achieved non-dominated solutions
Mk06            Mk10
61  427  53     218  1973  195
63  428  52     218  1991  194
63  435  51     219  1965  195
65  453  49     220  1984  191
66  451  49     225  1979  194
66  457  48     226  1954  196
                226  1974  194
                226  1979  192
                228  1973  194
                235  1938  199
                236  1978  193
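That these solutions are mutually non-dominated is easy to verify. A minimal sketch for the three minimization objectives (makespan, total workload, critical workload):

```python
def dominates(a, b):
    """a dominates b (minimization): no worse in all objectives, better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(solutions):
    """Filter a list of objective vectors down to the non-dominated ones."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t != s)]

# The Mk06 solutions from Table 6 survive the filter unchanged:
mk06 = [(61, 427, 53), (63, 428, 52), (63, 435, 51),
        (65, 453, 49), (66, 451, 49), (66, 457, 48)]
```

Applying `non_dominated` to the Mk06 triples returns all six of them, confirming that no listed solution dominates another.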
Another comparison is between our algorithm (FJSP-MOEA) and MOGA [6], SEA [3], and MA1, MA2 [4]. In [4], there are several variants of the proposed algorithm with different local search strategies. We pick MA1 and MA2 as comparison algorithms because they perform equally well as, or superior to, the other variants on almost all problems. Table 7 displays the hypervolume value of the PF approximation from all algorithms and from the new reference set, which is formed by combining the solutions of the PFs of all algorithms. The highest hypervolume value on each problem is highlighted in bold. We observe that FJSP-MOEA, MA1 and MA2 show the best and similar performance, and that MOGA behaves best on three of the BRdata instances. The good performance of MOGA on these three problems is interesting: MOGA has an entropy-based mechanism to maintain decision space diversity, which might be beneficial for solving these problem instances. When using the one best parameter setting, we also give the average hypervolume and standard deviation over 30 runs for each problem in Table 8; the standard deviation of each problem shows the stable behaviour across runs.

For the Kacem instances and with fixed mutation probabilities, our obtained non-dominated solutions are the same as the PF in the reference set. MA1 and MA2 also achieved the best PF for all Kacem instances, but our algorithm uses far fewer computational resources. The proposed FJSP-MOEA uses only a population size of 100, whereas the population size of the MA algorithms is 300. FJSP-MOEA uses only 10,000 objective function evaluations, whereas MA used 150,000 evaluations. In terms of computational resources, the proposed FJSP-MOEA can therefore be used on smaller computer systems, entailing broader applicability, possibly also in real-time settings such as dynamic optimization.
Table 7: Hypervolume from MOGA, SEA, MA1, MA2, FJSP-MOEA and the reference set

Problem  MOGA     SEA      MA1  MA2  FJSP-MOEA  Ref
Mk01     0.00426  0.00508  …
Table 8: Average hypervolume and standard deviation with the best parameter setting

Problem     Mk01  Mk02  Mk03  Mk04  Mk05  Mk06  Mk07  Mk08  Mk09  Mk10
Average HV  …
Std         …

A novel multi-objective evolutionary algorithm for the MOFJSP is proposed. It uses multiple initialization approaches to enrich the first generated population, and various crossover operators to create better diversity among offspring. Moreover, to determine suitable mutation probabilities, the MIP-EGO configurator is adopted to tune them automatically. In addition, straightforward local search strategies are employed at different levels to aid more accurate convergence to the PF. The proposed customization approach can in principle be combined with almost any MOEA. In this paper, we incorporate it into one of the state-of-the-art MOEAs, namely NSGA-III, to solve the MOFJSP; the new algorithm finds all Pareto optimal solutions known from the literature for most problems, and even new Pareto optimal solutions for the large-scale instances.

In this paper, we show the ability of MIP-EGO to find good mutation probabilities. However, there is more potential in the automated parameter configuration domain that can benefit EAs. For example, to learn the effects of different initialization approaches and crossover operators, we could optimize the initialization and crossover configuration. Furthermore, other parameters of the proposed algorithm, such as the population size and the number of evaluations, can also be tuned automatically. However, so far the efficiency of existing tuning frameworks is limited when it comes to a larger number of parameters; it would therefore be a good topic of future research to find more efficient implementations. Finally, based on the good performance of MOGA on some of the problems, it seems interesting for future research to integrate the entropy-based selection mechanism into the MOEA scheme as well, to achieve an even better performance.