A Tailored NSGA-III Instantiation for Flexible Job Shop Scheduling
Yali Wang, Bas van Stein, Michael T.M. Emmerich, Thomas Bäck
Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands [email protected]
Abstract.
A customized multi-objective evolutionary algorithm (MOEA) is proposed for the multi-objective flexible job shop scheduling problem (FJSP). It uses smart initialization approaches to enrich the first generated population, and proposes various crossover operators to create a better diversity of offspring. In particular, the MIP-EGO configurator, which can tune algorithm parameters, is adopted to automatically tune operator probabilities. Furthermore, different local search strategies are employed to explore the neighborhood for better solutions. In general, the algorithm enhancement strategy can be integrated with any standard EMO algorithm. In this paper, it has been combined with NSGA-III to solve benchmark multi-objective FJSPs, whereas an off-the-shelf implementation of NSGA-III is not capable of solving the FJSP. The experimental results show excellent performance with a reduced computing budget.
Keywords:
Flexible job shop scheduling · Multi-objective optimization · Evolutionary algorithm.
1 Introduction

The job shop scheduling problem (JSP) is an important branch of production planning problems. The classical JSP consists of a set of independent jobs to be processed on multiple machines, and each job contains a number of operations with a predetermined order. It is assumed that each operation must be processed on a specific machine with a specified processing time. The JSP is to determine a schedule of jobs, meaning to sequence the operations on the machines. The flexible job shop scheduling problem (FJSP) is an important extension of the classical JSP due to the wide employment of multi-purpose machines in real-world job shops. The FJSP extends the JSP by assuming that each operation is allowed to be processed on a machine out of a set of alternatives, rather than on one specified machine. Therefore, the FJSP is not only to find the best sequence of operations on each machine, but also to assign each operation to a machine out of a set of qualified machines. The JSP is well known to be strongly NP-hard [1]. The FJSP is an even more complex version of the JSP, so the FJSP is clearly also strongly NP-hard.

A typical objective of the FJSP is the makespan, which is defined as the maximum time for completion of all jobs, in other words, the total length of the schedule. However, to achieve a practical schedule for the FJSP, various conflicting objectives should be considered. In this paper, evolutionary algorithms (EA) are applied to a multi-objective flexible job shop scheduling problem (MOFJSP) with three objectives, namely: the makespan, the total workload, and the critical workload. We propose and adopt multiple initialization approaches to enrich the first generated population based on our definition of the chromosome representation; at the same time, diverse genetic operators are applied to guide the search towards offspring with a wide diversity; in particular, we use an algorithm configurator to tune the parameter configuration; furthermore, two levels of local search are employed, leading to better solutions. Our proposed FJSP multi-objective evolutionary algorithm (FJSP-MOEA) can be combined with almost any MOEA to solve the MOFJSP. The experimental results show that FJSP-MOEA can achieve state-of-the-art results with less computational effort when merged with NSGA-III [2].

The paper is organized as follows. The next section formulates the MOFJSP, which is the problem we are about to solve. Section 3 gives the necessary background knowledge. Section 4 introduces the proposed algorithm and Section 5 reports the experimental results. Finally, Section 6 concludes the work and suggests future work directions.

⋆ This work is part of the research programme Smart Industry SI2016 with project name CIMPLO and project number 15465, which is (partly) financed by the Netherlands Organisation for Scientific Research (NWO).
2 Problem Formulation

The MOFJSP addressed in this paper is described as follows:
1. There are n jobs J = { J_1, J_2, ..., J_n } and m machines M = { M_1, M_2, ..., M_m }.
2. Each job J_i comprises l_i operations for i = 1, ..., n; the j-th operation of job J_i is represented by O_ij, and the operation sequence of job J_i runs from O_i1 to O_il_i.
3. For each operation O_ij, there is a set of machines capable of performing it, represented by M_ij, which is a subset of M.
4. The processing time of operation O_ij on machine M_k is predefined and denoted by t_ijk.
At the same time, the following assumptions are made:
1. All machines are available at time 0 and assumed to be continuously available.
2. All jobs are released at time 0 and are independent from each other.
3. Setup times of machines and transportation times between operations are negligible.
4. Environmental changes (such as machine breakdowns) are neglected.
5. A machine can only work on one operation at a time.
6. There are no precedence constraints among the operations of different jobs, and the order of operations within each job cannot be modified.
7. An operation, once started, must run to completion.
8. No operation of a job can be started until the previous operation of that job is completed.

The makespan, total workload and critical workload, which are commonly considered in the literature on the FJSP (e.g., [3], [4]), are minimized and used as the three objectives in our algorithm. Minimizing the makespan can facilitate a rapid response to market demand. The total workload represents the total working time of all machines, and the critical workload is the maximum workload among all machines. Minimizing the total workload can reduce the use of machines; minimizing the critical workload can balance the workload between machines. Let C_i denote the completion time of job J_i, and W_k the sum of the processing times of all operations processed on machine M_k. The three objectives can be defined as follows:

makespan (C_max):          f_1 = max { C_i | i = 1, 2, ..., n }    (1)
total workload (W_t):      f_2 = Σ_{k=1}^{m} W_k                   (2)
critical workload (W_max): f_3 = max { W_k | k = 1, 2, ..., m }    (3)

An example of a MOFJSP is shown in Table 1 as an illustration, where rows correspond to operations and columns correspond to machines. In this example, there are three machines: M_1, M_2 and M_3. Each entry of the table denotes the processing time of that operation on the corresponding machine, and the tag "−" means that a machine cannot execute the corresponding operation.

Table 1: Processing time of a FJSP instance (rows: the operations of jobs J_1, J_2 and J_3, with J_1 comprising three operations and J_2 and J_3 two operations each; columns: machines M_1, M_2, M_3; the numeric entries were lost in extraction)

3 Background

The FJSP has been investigated extensively in the last three decades. According to [5], the EA is the most popular non-hybrid technique to solve the FJSP. Among all EAs for the FJSP, some are developed for its more challenging variant: the MOFJSP, which we formulated in Section 2.
The approaches of [6], [3] and [4] are very successful MOFJSP algorithms and have obtained high-quality solutions. [6] proposed a multi-objective genetic algorithm (MOGA) based on the immune and entropy principle. In this MOGA, the fitness was determined by the Pareto dominance relation, and diversity was maintained by the immune and entropy principle. In [3], a simple EA (SEA) was proposed, which used domain heuristics to generate the initial population and balanced exploration and exploitation by refining duplicate individuals with mutation operators. A memetic algorithm (MA) was proposed in [4]; it incorporated a local search into NSGA-II [7]. A hierarchical strategy was adopted in the local search to handle the three objectives: makespan, total workload and maximum workload. In Section 5, these algorithms are compared with our algorithm on the MOFJSP.
An EA involves multiple parameters, such as the crossover probability, the mutation probability, the computational budget, and so on. The preset values of these parameters affect the performance of the algorithm in different situations. The parameters are usually set to values which are assumed to be good. For example, the mutation probability is normally kept very low, since otherwise convergence is expected to be delayed unnecessarily. However, a more principled way to identify the probability would be a sensitivity analysis: carrying out multiple runs of the algorithm with different mutation probabilities and comparing the outcomes. Although there are some self-tuning techniques for adjusting these parameters on the fly, the hyper-parameters of an EA can also be optimized using techniques from machine learning.

The optimization of hyper-parameters and neural network architectures is an important topic in the field of machine learning due to the large number of design choices for a network architecture and its parameters. Recently, algorithms have been developed to accomplish this automatically, since it is intractable to do it by hand. MIP-EGO [8] is one of these configurators; it can automatically configure convolutional neural network architectures, and the resulting optimized neural networks have proven competitive with state-of-the-art manually designed ones on some popular classification tasks. In particular, MIP-EGO allows multiple candidate points to be selected and evaluated in parallel, which speeds up the automatic tuning procedure. In this paper, we tune several parameters with MIP-EGO to find the best settings for them.
NSGA-III is a decomposition-based MOEA; it is an extension of the well-known NSGA-II and eliminates drawbacks of NSGA-II such as the lack of uniform diversity among a set of non-dominated solutions. The basic framework of NSGA-III is similar to the original NSGA-II, but it replaces the crowding distance operator with a clustering operator based on a set of reference points. A widely distributed set of reference points can efficiently promote population diversity during the search, and NSGA-III defines its set of reference points by Das and Dennis's method [9].

In each iteration t, an offspring population Q_t of size N_pop is created from the parent population P_t of size N_pop using the usual selection, crossover and mutation. Then a combined population R_t = P_t ∪ Q_t is formed and classified into different layers (F_1, F_2, and so on), where each layer consists of mutually non-dominated solutions. Thereafter, starting from the first layer, points are put into a new population S_t, until the first time the size of S_t equals or exceeds N_pop. Suppose the last layer included in S_t is the l-th layer; the members of S_t \ F_l have then been chosen for P_{t+1}, and the next step is to choose the remaining points from F_l to complete P_{t+1}. In general (when the size of S_t does not equal N_pop), N_pop − |S_t \ F_l| solutions from F_l need to be selected for P_{t+1}.

When selecting individuals from F_l, first, each member of S_t is associated with a reference point by searching for the shortest perpendicular distance from the member to the reference lines created by joining the ideal point with the reference points. Next, a niching strategy is employed to choose points from F_l that are associated with the least crowded reference points. The niche count of each reference point, defined as the number of members in S_t \ F_l associated with it, is computed. The member of F_l associated with the reference point having the minimum niche count is included in P_{t+1}. The niche count of that reference point is then increased by one, and the procedure is repeated to fill the remaining population slots of P_{t+1}.

NSGA-III is powerful in handling problems with non-linear characteristics as well as many objectives. Therefore, we decided to enhance NSGA-III in our algorithm for the MOFJSP.

4 The Proposed Algorithm

The proposed algorithm,
the Flexible Job Shop Problem Multi-objective Evolutionary Algorithm (FJSP-MOEA), can in principle be combined with any MOEA and help MOEAs solve the MOFJSP, whereas standard MOEAs cannot solve the MOFJSP on their own. The algorithm follows the flow of a typical EA and generates improved solutions by using local search. Details of the following components are given in the next subsections.
– Initialization: encode the individual and generate the initial population.
– Genetic operators: generate offspring by crossover and mutation operators.
– Local search: decode the individual and improve the solution with local search.
The MOFJSP is a combination of assigning each operation to a machine and ordering the operations on the machines. In the algorithm, each chromosome (individual) represents a solution in the search space, and the chromosome consists of two parts: the operation sequence vector and the machine assignment vector. Let N denote the total number of operations of all jobs. The length of both vectors is equal to N. The operation sequence vector decides the sequence of the operations assigned to each machine: for any two operations processed by the same machine, the one located in front is processed earlier than the other. The machine assignment vector assigns the operations to machines; in other words, it determines which operation is processed by which machine, where the machine must be one capable of processing the operation.

The format of the individual representation not only influences the implementation of the crossover and mutation operators; a proper representation can also avoid the production of infeasible schedules and reduce the computational time. In our algorithm, the chromosome representation proposed by Zhang et al. in [10] is adopted, and an example is given in Table 2.

Table 2: An example of a chromosome representation
Operation sequence:   1    2    3    2    1    1    3
(operations:        O_11 O_21 O_31 O_22 O_12 O_13 O_32)
Machine assignment:   2    1    1    3    2    2    1
(operations:        O_11 O_12 O_13 O_21 O_22 O_31 O_32)
(assigned machines: entries lost in extraction)

In Table 2, the first row shows the operation sequence vector, which consists only of job indexes. For each job, the first appearance of its index represents the first operation of that job, the second appearance of the same index represents the second operation of that job, and so on. The number of occurrences of an index is equal to the number of operations of the corresponding job. The second row explains the first row by giving the actual operations. The third row is the machine assignment vector, which presents the selected machines for all operations. The operation sequence of the machine assignment vector is fixed: from the first job to the last job, and from the first operation to the last operation of each job. The fourth row indicates this fixed operation sequence of the machine assignment vector, and the fifth row shows the actual machines of the operations. Each integer value in the machine assignment vector is the
index of the machine in the set of alternative machines of that operation. In this example (Table 1), an operation whose alternative machine set contains only a single machine receives the value 1 and is assigned to that machine, while for an operation whose alternative machine set contains two machines, the value 2 selects the second machine of that set.

Our algorithm starts by creating the initial population. The machine assignment and operation sequence vectors are generated separately for each individual. In the literature, a few approaches have been proposed for producing individuals, such as global minimal workload in [11], and AssignmentRule1 and AssignmentRule2 in [12]. In our algorithm, several new methods are proposed, namely the Processing Time Roulette Wheel (PRW) and the Workload Roulette Wheel (WRW) for initialising the machine assignment, and the Most Remaining Machine Operations (MRMO) and Most Remaining Machine Workload (MRMW) for initialising the operation sequence. These new approaches are used together with some commonly used dispatching rules when initializing individuals, for the purpose of enriching the initial population. When generating a new individual in our algorithm, two initialization methods are randomly picked from the following two lists: one for the machine assignment vector and one for the operation sequence vector.
Initialization methods for machine assignment
1. Random assignment (Random): an operation is assigned to an eligible machine randomly.
2. Processing Time Roulette Wheel (PRW): for each operation, roulette wheel selection is used to select a machine from its machine set based on the processing times of the capable machines. A machine with a shorter processing time is more likely to be selected.
3. Workload Roulette Wheel (WRW): for each operation, roulette wheel selection is used to select a machine from its machine set based on the current workloads plus the processing times of the capable machines. A machine with a lower sum of workload and processing time is more likely to be selected.

We propose PRW and WRW to assign each operation to a machine with less processing time or accumulated workload, while maintaining the freedom of exploring the entire search space.
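The PRW rule above can be sketched roughly as follows. This is an illustrative sketch, not the paper's code: the inverse-processing-time weighting is an assumption (the paper only states that shorter times are more likely to be selected), and `prw_select` is a hypothetical helper name. WRW would be identical with `machine_times` replaced by workload-plus-processing-time values.

```python
import random

def prw_select(machine_times, rng=random):
    """Roulette-wheel pick of a machine for one operation: each
    machine is weighted by the inverse of its processing time, so
    faster machines are proportionally more likely to be chosen.
    `machine_times` maps machine id -> processing time (> 0)."""
    weights = {m: 1.0 / t for m, t in machine_times.items()}
    total = sum(weights.values())
    r = rng.uniform(0.0, total)
    acc = 0.0
    for m, w in weights.items():
        acc += w
        if r <= acc:
            return m
    return m  # guard against floating-point round-off
```

With processing times 1 and 100, the faster machine wins roughly 99% of the draws, yet the slower one still has a non-zero chance, which preserves exploration of the whole search space.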
Initialization methods for operation sequence
1. Random permutation (Random): starting from a fixed sequence: all job in-dexes of J (the number of J job indexes is the number of operations of J ),followed by all job indexes of J , and so on. Then the array with the fixedsequence is permuted and a random order is generated. Yali Wang et al.
2. Most Work Remaining (MWR): operations are placed one by one into theoperation sequence vector. Before selecting an operation, the remaining pro-cessing times of all jobs are calculated respectively, the first optional operationof the job with the longest remaining processing time is placed into the chro-mosome.3. Most number of Operations Remaining (MOR): operations are placed oneby one into the operation sequence vector. Before selecting an operation, thenumber of succeeding operations of all jobs is counted respectively, the firstoptional operation of the job with the most remaining operations is placedinto the chromosome.4. Long Processing Time (LPT)[13]: operations are placed one by one into theoperation sequence vector, each time, the operation with maximal processingtime is selected without breaking the order of jobs.5. Most Remaining Machine Operations (MRMO): operations are placed intothe operation sequence vector according to both the number of subsequentoperations on machines and the number of subsequent operations of jobs.MRMO is a hierarchical method and takes the machine assignment into con-sideration. First, the machine with the most subsequent operations is se-lected. After that, the optional operations in the subsequent operations onthat machine are found based on the already placed operations. For example,if O → O → O are placed operations, the current optional operation canonly be chosen from O , O , and O . In these optional operations, thosewhich are assigned to the selected machine are picked and the one that belongsto the job with the most subsequent operations is placed into the chromosome.In this example, O will be chosen if it is assigned to the selected machinebecause there are two subsequent operations for J and only one subsequentoperation for J and J . 
Note that it is possible that no operation is available on that machine; in that case, the machine with the second largest number of subsequent operations is selected, and so forth.
6. Most Remaining Machine Workload (MRMW): operations are placed into the operation sequence vector according to both the remaining processing times of the machines and the remaining processing times of the jobs. MRMW is a hierarchical method similar to MRMO. After finding the machine with the longest remaining processing time and the optional operations on that machine, the operation which belongs to the job with the longest remaining processing time is placed into the chromosome. Again, if no operation is available on that machine, the machine with the second longest remaining processing time is selected, and so forth.

We propose MRMO and MRMW to give priority to both the machine and the job with the largest number of remaining operations (MRMO) or the longest remaining processing time (MRMW).

Crossover replaces some of the genes in one parent with the corresponding genes of the other (Glover and Kochenberger [14]). Since our representation of chromosomes has two parts, the crossover operators applied to these two parts are implemented separately as well. We propose two new crossover operators,
Precedence Preserving Two Points Crossover (PPTP) and Uniform Preservative crossover (UPX), and use them together with several commonly adopted crossover operators. When executing the crossover operation in the proposed algorithm, one crossover operator for the machine assignment and one for the operation sequence are randomly chosen from the following two lists to generate the offspring.
Crossover operators for machine assignment
1. No crossover.
2. One point crossover: a cutting point is picked randomly and the genes after the cutting point are swapped between the two parents.
3. Two points crossover: two cutting points are picked randomly and the genes between the two points are swapped between the two parents.
4. Job-based crossover (JX): it generates two children from two parents by the following procedure:
   a. A vector with the size of the number of jobs is generated, consisting of random values 0 and 1.
   b. For a job corresponding to value 0, the assigned machines of its operations are preserved.
   c. For a job corresponding to value 1, the machines of its operations are swapped between the two parents.
5. Multi-point preservative crossover (MPX) [15]: MPX generates two children from two parents by the following procedure:
   a. A vector with the size of the number of operations is generated, consisting of random values 0 and 1.
   b. For the operations corresponding to value 0, their machines (genes) are preserved.
   c. For the operations corresponding to value 1, their machines (genes) are swapped between the two parents.
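The three steps of MPX translate almost directly into code. A minimal sketch (the function name `mpx` is ours; machine assignment vectors are plain lists of machine indexes):

```python
import random

def mpx(parent1, parent2, rng=random):
    """Multi-point preservative crossover (MPX) on machine
    assignment vectors: a random 0/1 mask of the same length
    decides, per position, whether the genes are kept (0) or
    swapped between the two parents (1)."""
    mask = [rng.randint(0, 1) for _ in parent1]
    child1 = [q if bit else p for p, q, bit in zip(parent1, parent2, mask)]
    child2 = [p if bit else q for p, q, bit in zip(parent1, parent2, mask)]
    return child1, child2
```

Because a gene is only ever kept or exchanged with the gene at the same position, every child gene still indexes into the alternative machine set of the same operation, so feasibility of the machine assignment is preserved by construction.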
Crossover operators for operation sequence
1. No crossover.
2. Precedence preserving one point crossover (PPOP) [17]: PPOP generates two children from two parents by the following procedure:
   a. A cutting point is picked randomly; the genes to the left are preserved and copied from parent1 to child1 and from parent2 to child2.
   b. The remaining operations of parent1 are reallocated in the order they appear in parent2.
   c. The remaining operations of parent2 are reallocated in the order they appear in parent1.
An example of PPOP is shown in Figure 1, where the cutting point is between the third and the fourth operation. The red numbers in parent2 are the genes on the right side of the cutting point in parent1; they are copied to child1 in their own sequence, following the genes on the left side of the cutting point in parent1, and vice versa.

Fig. 1: The process of PPOP

3. Precedence Preserving Two Points Crossover (PPTP): PPTP generates two children from two parents by the following procedure:
   a. Two cutting points are picked randomly; the genes except those between the two points are preserved and copied from parent1 to child1 and from parent2 to child2.
   b. The operations between the two cutting points in parent1 are reallocated in the order they appear in parent2.
   c. The operations between the two cutting points in parent2 are reallocated in the order they appear in parent1.
4. Improved precedence operation crossover (IPOX) [16]: IPOX randomly divides the job set into two complementary, non-empty subsets. The operations of one job subset are preserved, while the operations of the other job subset are copied from the other parent.
5. Uniform Preservative crossover (UPX): UPX generates two children from two parents by the following procedure:
   a. A vector with the size of the number of operations is generated, consisting of random values 0 and 1.
   b. For the operations corresponding to value 0, the genes are preserved and copied from parent1 to child1 and from parent2 to child2.
   c. For the operations corresponding to value 1, the genes of parent1 are located in parent2 and copied from parent2 in the sequence of parent2, and vice versa.
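The PPOP procedure can be sketched as below (an illustration under our reading of the operator, with the hypothetical helper name `ppop`). Since the operation sequence vector holds only job indexes, and the k-th occurrence of an index always denotes the k-th operation of that job, filling the tail in the other parent's order preserves all job precedences automatically:

```python
def ppop(parent1, parent2, cut):
    """Precedence preserving one point crossover (PPOP) on operation
    sequence vectors (lists of job indexes). Genes left of `cut` are
    copied from one parent; the remaining operations are appended in
    the order they occur in the other parent."""
    def fill(head, donor):
        pending = list(head)          # occurrences already placed
        tail = []
        for gene in donor:
            if gene in pending:
                pending.remove(gene)  # this occurrence lies before the cut
            else:
                tail.append(gene)
        return head + tail
    return fill(parent1[:cut], parent2), fill(parent2[:cut], parent1)
```

For example, with parents [1, 2, 3, 2, 1, 1, 3] and [3, 1, 2, 1, 2, 3, 1] and the cut after position 3, child1 keeps [1, 2, 3] and appends the four remaining operations in parent2's order, giving [1, 2, 3, 1, 2, 3, 1].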
The mutation operator changes the gene values at selected locations. By forcing the algorithm to search areas other than the current one, the mutation operator helps maintain genetic diversity from one generation of the population to the next. In our algorithm, an insertion mutation and a swap mutation (including one point swap and two points swap) are proposed and used.
The Insertion Mutation Operator generates a new individual by the following procedure:
– Two random numbers i and j (1 ≤ i ≤ N, 1 ≤ j ≤ N) are selected.
– For the operation sequence vector, the operation at position j is inserted in front of the operation at position i.
– For the machine assignment vector, a machine is randomly selected for each of the operations at positions i and j. If the processing time on the newly selected machine is lower than that on the current machine, the current machine is replaced by the new machine. If the processing time on the new machine is longer than that on the old machine, there is only a 20% probability that the new machine replaces the old machine.

The Swap Mutation Operator generates a new individual by the following procedure:
– One random number i (1 ≤ i ≤ N) is selected, or two random numbers i and j (1 ≤ i ≤ N, 1 ≤ j ≤ N) are selected.
– For the operation sequence vector, with only one swap point i, the operation at the swap point is swapped with its neighbour; with two swap points, the operations at positions i and j are swapped.
– For the machine assignment vector, the machine at position i (and j) is replaced with a new machine by the same rule used in the insertion mutation operator.

Decoding a chromosome converts an individual into a feasible schedule in order to calculate the objective values, which represent the relative superiority of a solution. In this process, the operations are picked one by one from the operation sequence vector and placed on the machines given by the machine assignment vector to form the schedule. When placing each operation on its machine, local search (in the sense of heuristic rules that improve the solution) is applied to refine the individual in order to obtain an improved schedule. Two levels of local search are applied to allocate each operation to a time slot on its machine.
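The machine-replacement rule shared by the insertion and swap mutations above can be sketched as follows (a sketch, not the authors' code; `replace_machine` is a hypothetical helper, and we treat the equal-time case like the "longer" case, which the paper leaves unspecified):

```python
import random

def replace_machine(eligible, current, time_on, rng=random):
    """Mutation rule for one position of the machine assignment
    vector: draw a random eligible machine; always accept it when it
    is faster than the current machine, otherwise accept it with only
    a 20% probability. `time_on[m]` is the processing time of the
    operation on machine m."""
    candidate = rng.choice(eligible)
    if time_on[candidate] < time_on[current]:
        return candidate
    return candidate if rng.random() < 0.2 else current
```

The 20% acceptance of a slower machine keeps the mutation mildly greedy while still allowing occasional uphill moves, which helps the total-workload objective without freezing the machine assignment.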
Idle times may exist between the operations on each machine due to the precedence constraints among the operations of each job; the two levels of local search utilize these idle times to different degrees.
First level local search. Let S_ij be the starting time of O_ij and C_ij its completion time; an example of the first level local search is shown in Figure 2. Because O_mn needs to be processed after the completion of O_{m,n-1}, an idle time interval between the completion of O_ab and the start of O_mn appears on machine M_k. O_ij is assigned to M_k, and we assume that O_mn is the last operation on M_k before handling O_ij; therefore the starting time of O_ij is max{ C_mn, C_{i,j-1} }, which in this example is C_mn, later than C_{i,j-1}. Thus, there is an opportunity for O_ij to be processed earlier. When checking the idle times on M_k, the idle time interval [C_ab, S_mn] is found to be available for O_ij, because the idle time span [C_{i,j-1}, S_mn], which is part of [C_ab, S_mn], is long enough to process O_ij, i.e., not shorter than t_ijk.

Fig. 2: First level local search    Fig. 3: Second level local search

Let S_dk be the starting time of the d-th idle time interval on M_k and C_dk its completion time. O_ij can be transferred to the earliest possible idle time interval of its machine which satisfies the following condition:

max{ S_dk, C_{i,j-1} } + t_ijk ≤ C_dk,   (C_{i,j-1} = 0 if j = 1)   (4)

After using the idle time interval, the starting time of O_ij is max{ S_dk, C_{i,j-1} }, and the idle interval is updated based on the starting and completion time of O_ij: (1) the idle time interval is removed; (2) the starting or completion time of the idle time interval is modified; or (3) the idle time interval is replaced by two new, shorter idle time intervals, as in the example of Figure 2.

After decoding a chromosome, the operation sequence vector of the chromosome is updated according to the new starting times of the operations, and the three objective values are calculated. The first level local search only finds, for each operation, an available idle time interval on its assigned machine.
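The feasibility test of Equation 4 can be sketched as a first-fit scan over the machine's idle intervals (an illustration with the hypothetical helper name `first_fit_idle_interval`; intervals are assumed sorted by starting time):

```python
def first_fit_idle_interval(idle_intervals, prev_completion, t_ijk):
    """First level local search: scan the idle intervals [S_dk, C_dk]
    of the assigned machine in time order and return the start time
    that O_ij would get in the first interval satisfying Eq. (4):
        max(S_dk, C_{i,j-1}) + t_ijk <= C_dk.
    `prev_completion` is C_{i,j-1} (0 for the first operation of a
    job). Returns None when no idle interval can host the operation."""
    for s_dk, c_dk in idle_intervals:
        start = max(s_dk, prev_completion)
        if start + t_ijk <= c_dk:
            return start
    return None
```

For instance, with idle intervals [(0, 2), (5, 10)], a predecessor finishing at time 6 and a processing time of 3, the first interval is too short, and the operation is started at time 6 inside the second interval.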
After generating the corresponding schedule with the first level search method, it is possible that there are still operations that can be allocated to available idle time intervals to improve the fitness values. To achieve this, the chromosome which has been updated with the first level local search is decoded again with the second level local search, and again operations are moved to available idle time intervals.

Second level local search. The second level local search not only checks the idle time intervals on the assigned machine, but also the idle time intervals on alternative machines. An example of making use of an idle time interval on
another machine is shown in Figure 3. Let S_ijk be the starting time and C_ijk the completion time of O_ij on M_k. In this example, O_ij is assigned to M_k in the initial chromosome, and we assume that O_ij can also be performed by M_e. Under the condition that the starting time of O_ij on M_k is later than the completion time of O_{i,j-1}, the idle time intervals on all alternative machines which can process O_ij are checked. An idle time interval on M_e could be a choice, and O_ij can be reallocated to M_e. In this example, the processing time of O_ij on M_e is even shorter than the processing time on M_k; therefore, this reallocation benefits at least the total workload.

In the second level local search, all available idle time intervals of an operation are checked one by one until the first "really" available idle time interval is found, and the operation is then moved to that idle time interval. Any idle time interval on an alternative machine which satisfies Equation 4 is an available idle time interval, while it must meet at least one of the following conditions to become a "really" available idle time interval:
1. The processing time of the operation on the new machine is shorter than on the initially assigned machine, if the available idle time interval is on a different machine;
2. The operation can be moved from the machine with the maximal makespan to another machine;
3. The operation can be moved from the machine with the maximal workload to another machine.
The total workload can be improved directly by the first condition; the motive of the second condition is to decrease the maximal makespan, and the third condition can benefit the critical workload.

After the reallocation of the operations with the second level local search, the corresponding schedule is obtained and the objective values are calculated.
However, instead of updating the chromosome immediately, the new objective values are first compared with the old objective values; the chromosome is updated only when at least one objective is better than its old value. This makes sure that the new schedule is at least not worse than the old schedule (the new solution is not dominated by the old solution). Another difference between the first- and second-level local search is that the first-level local search is performed on every evaluation, while the second-level local search is performed only with a 30% probability for each chromosome, to avoid local optima. Although these two local searches could be applied repeatedly to improve a solution, they are employed only once per evaluation to avoid that the algorithm gets stuck in a local optimum.

The algorithm is evaluated on Kacem instances (including ka10x7, ka10x10 and ka15x10) and the 10 BRdata instances (Mk01-Mk10). Table 3 gives the scale of these instances. The first column is the name of each instance; the second column shows the size of the instance, in which n stands for the number of jobs and m for the number of machines; the third column gives the number of operations; the fourth column lists the flexibility of each instance, i.e., the average number of alternative machines per operation.

Table 3: The scale of benchmark instances

Instance  n x m    #Operations  Flexibility
ka10x7    10 x 7   29           7
ka10x10   10 x 10  30           10
ka15x10   15 x 10  56           10
Mk01      10 x 6   55           2
Mk02      10 x 6   58           3.5
Mk03      15 x 8   150          3
Mk04      15 x 8   90           2
Mk05      15 x 4   106          1.5
Mk06      10 x 15  150          3
Mk07      20 x 5   100          3
Mk08      20 x 10  225          1.5
Mk09      20 x 10  240          3
Mk10      20 x 15  240          3
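The flexibility column of Table 3 is simply the mean number of machine alternatives per operation. A minimal sketch, assuming a hypothetical instance encoding (a list of jobs, each job a list of operations, each operation the set of machines that can process it):

```python
def flexibility(instance):
    """Average number of alternative machines per operation.

    instance: list of jobs; each job is a list of operations;
    each operation is the set of machines that can process it.
    """
    ops = [op for job in instance for op in job]
    return sum(len(machines) for machines in ops) / len(ops)

# Tiny 2-job example: four operations with 2, 1, 3 and 2 alternatives.
toy = [
    [{0, 1}, {2}],          # job 1: two operations
    [{0, 1, 2}, {1, 2}],    # job 2: two operations
]
```

For the toy instance above, `flexibility(toy)` yields (2+1+3+2)/4 = 2.0.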
All experiments are performed with a population size of 100; each run of the algorithm stops after a predefined number of evaluations, which is 10,000 for the Kacem instances and 150,000 for the BRdata instances. For each problem instance, the proposed algorithm is run independently 30 times. The resulting solution set of an instance is formed by merging all the non-dominated solutions from its 30 runs.

The crossover probability is set to 1, and two random crossover operators are chosen each time (one for the operation sequence and one for the machine assignment). For the Kacem instances, the mutation probabilities are set to 0.6. For the BRdata instances, which include larger-scale and more complex problems, the MIP-EGO configurator [8] is adopted to tune the insertion and swap mutation probabilities (one-point swap mutation and two-point swap mutation) and find the best parameter values for each problem. The hypervolume of the solution set is used in MIP-EGO as the objective value to tune these three mutation probabilities. Although the true Pareto fronts (PF) for the test instances are unknown, [4] provides a reference set for the Kacem and BRdata FJSP instances, which is formed by gathering all non-dominated solutions found by all the algorithms implemented in [4], together with non-dominated solutions from other state-of-the-art
MOFJSP algorithms. We define the reference point for calculating the hypervolume based on the largest values in this reference set: each objective function value of the reference point is a constant factor slightly larger than 1 times the largest objective function value of the respective dimension in the reference set. The origin is used as the ideal point. Other basic parameter settings of MIP-EGO are listed in Table 4. Each mutation probability is discretized to one digit after the decimal point; the search space is therefore an ordinal (integer) space, which MIP-EGO handles in a uniform way.

Table 4: Settings for MIP-EGO

Parameter                       Value
maximal number of evaluations   200
surrogate model                 random forest
optimizer for infill criterion  MIES
search space                    ordinal space
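For three minimization objectives, the hypervolume with respect to such a reference point can be computed exactly by sweeping along the first objective and accumulating 2-D slice areas. The following is a generic sketch of that computation, not the implementation used in the experiments:

```python
def hv2d(points, ref):
    """Area dominated by 2-D minimization points within the [point, ref] boxes."""
    pts = sorted(points)               # sweep by first coordinate, ascending
    area, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y < prev_y:                 # point is non-dominated in this sweep
            area += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return area

def hv3d(points, ref):
    """Hypervolume of 3-objective minimization points w.r.t. a reference point."""
    # keep only points that strictly dominate the reference point
    pts = sorted(p for p in points if all(pi < ri for pi, ri in zip(p, ref)))
    volume = 0.0
    for i, (x, y, z) in enumerate(pts):
        next_x = pts[i + 1][0] if i + 1 < len(pts) else ref[0]
        if next_x > x:
            # 2-D cross-section of the dominated region for f1 in [x, next_x)
            slab = hv2d([(py, pz) for px, py, pz in pts[:i + 1]], ref[1:])
            volume += (next_x - x) * slab
    return volume
```

For example, the two points (1, 2, 2) and (2, 1, 1) with reference point (3, 3, 3) dominate boxes of volume 2 and 4 with an overlap of 1, giving a hypervolume of 5.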
With a budget of 200 evaluations, Table 5 shows the percentage of evaluations that achieve the largest hypervolume value (i.e., the best PF) in MIP-EGO. For Mk05 and Mk08, all evaluations obtained the largest hypervolume value, which means that every parameter setting of the mutation probabilities tried by MIP-EGO achieves the best PF for these two problems. Table 3 also shows that both problems have a low flexibility value. On the contrary, Mk06, Mk09 and Mk10 have a large number of operations and high flexibility. They appear to be difficult to solve, because there is only one best parameter setting for the mutation probabilities. This also means that it is highly likely that better solution sets can be found with a higher budget.

Table 5: Probability of finding the best configuration
Instance     Mk01  Mk02  Mk03  Mk04  Mk05  Mk06  Mk07  Mk08  Mk09  Mk10
Probability  73%   60%   95%   1%    100%  0.5%  4.5%  100%  0.5%  0.5%

With the best parameter settings of the mutation probabilities for the BRdata instances, we compared our experimental results with the reference set in [4]. Our algorithm achieves the same Pareto optimal solutions as the reference set for all BRdata instances except Mk06, Mk09 and Mk10. At the same time, for Mk06 and Mk10, our algorithm finds new non-dominated solutions. Table 6 lists the new non-dominated solutions obtained by our algorithm; each row of an instance is a solution with three objectives: makespan, total workload, and critical workload.

Table 6: Newly achieved non-dominated solutions
Mk06            Mk10
61  427  53     218  1973  195
63  428  52     218  1991  194
63  435  51     219  1965  195
65  453  49     220  1984  191
66  451  49     225  1979  194
66  457  48     226  1954  196
                226  1974  194
                226  1979  192
                228  1973  194
                235  1938  199
                236  1978  193
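That these solutions are mutually non-dominated is easy to verify. A minimal sketch for the three minimization objectives (makespan, total workload, critical workload):

```python
def dominates(a, b):
    """a dominates b (minimization): no worse in all objectives, better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(solutions):
    """Filter a list of objective vectors down to the non-dominated ones."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t != s)]

# The Mk06 solutions from Table 6 survive the filter unchanged:
mk06 = [(61, 427, 53), (63, 428, 52), (63, 435, 51),
        (65, 453, 49), (66, 451, 49), (66, 457, 48)]
```

Applying `non_dominated` to the Mk06 triples returns all six of them, confirming that no listed solution dominates another.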
Another comparison is between our algorithm (FJSP-MOEA) and MOGA [6], SEA [3], and MA1, MA2 [4]. In [4], there are several variants of the proposed algorithm with different local search strategies. We pick MA1 and MA2 as comparison algorithms because they perform equally well as, or superior to, the other variants on almost all problems. Table 7 displays the hypervolume value of the PF approximation from all algorithms and from the new reference set, which is formed by combining the solutions of the PFs of all algorithms. The highest hypervolume value on each problem is highlighted in bold. We observe that FJSP-MOEA, MA1 and MA2 show the best and similar performance, and that MOGA behaves best on three of the BRdata instances. The good performance of MOGA on these three problems is interesting: MOGA has an entropy-based mechanism to maintain decision space diversity, which might be beneficial for solving these problem instances. When using the one best parameter setting, we also give the average hypervolume and standard deviation over 30 runs for each problem in Table 8; the standard deviation of each problem shows the stable behaviour across runs.

For the Kacem instances and with fixed mutation probabilities, our obtained non-dominated solutions are the same as the PF in the reference set. MA1 and MA2 also achieved the best PF for all Kacem instances, but our algorithm uses far fewer computational resources. The proposed FJSP-MOEA uses only a population size of 100, whereas the population size of the MA algorithms is 300. FJSP-MOEA uses only 10,000 objective function evaluations, whereas MA used 150,000 evaluations. In terms of computational resources, the proposed FJSP-MOEA can therefore be used on smaller computer systems, entailing broader applicability, possibly also in real-time settings such as dynamic optimization.
Table 7: Hypervolume from MOGA, SEA, MA1, MA2, FJSP-MOEA and the reference set

Problem  MOGA     SEA      MA1  MA2  FJSP-MOEA  Ref
Mk01     0.00426  0.00508  …
Table 8: Average hypervolume and standard deviation with the best parameter setting

Problem     Mk01  Mk02  Mk03  Mk04  Mk05  Mk06  Mk07  Mk08  Mk09  Mk10
Average HV  …
Std         …

A novel multi-objective evolutionary algorithm for the MOFJSP is proposed. It uses multiple initialization approaches to enrich the first generated population, and various crossover operators to create better diversity among offspring. Moreover, to determine suitable mutation probabilities, the MIP-EGO configurator is adopted to tune them automatically. In addition, straightforward local search strategies are employed at different levels to aid more accurate convergence to the PF. The proposed customization approach can in principle be combined with almost any MOEA. In this paper, we incorporate it into one of the state-of-the-art MOEAs, namely NSGA-III, to solve the MOFJSP; the new algorithm finds all Pareto optimal solutions known from the literature for most problems, and even new Pareto optimal solutions for the large-scale instances.

In this paper, we show the ability of MIP-EGO to find good mutation probabilities. However, there is more potential in the automated parameter configuration domain that can benefit EAs. For example, to learn the effects of different initialization approaches and crossover operators, we could optimize the initialization and crossover configuration. Furthermore, other parameters of the proposed algorithm, such as the population size and the number of evaluations, can also be tuned automatically. However, so far the efficiency of existing tuning frameworks is limited when it comes to a larger number of parameters; it would therefore be a good topic of future research to find more efficient implementations. Finally, based on the good performance of MOGA on some of the problems, it seems interesting for future research to integrate the entropy-based selection mechanism into the MOEA scheme as well, to achieve an even better performance.