Airline Crew Pairing Optimization Framework for Large Networks with Multiple Crew Bases and Hub-and-Spoke Subnetworks
Divyam Aggarwal, Dhish Kumar Saxena, Thomas Bäck, Michael Emmerich
Divyam Aggarwal (a), Dhish Kumar Saxena (a, ∗), Thomas Bäck (b) and Michael Emmerich (b)
(a) Department of Mechanical & Industrial Engineering (MIED), Indian Institute of Technology Roorkee, Roorkee, Uttarakhand-247667, India
(b) Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Niels Bohrweg 1, 2333 CA Leiden, the Netherlands
ARTICLE INFO
Keywords: Airline Crew Scheduling; Crew Pairing; Combinatorial Optimization; Column Generation; Mathematical Programming; Heuristics
ABSTRACT
Crew Pairing Optimization (CPO) aims at generating a set of flight sequences (crew pairings), covering all flights in an airline's flight schedule, at minimum cost, while satisfying several legality constraints. CPO is critically important for airlines' business viability considering that the crew operating cost is second only to the fuel cost. It poses an NP-hard combinatorial optimization problem, to tackle which, the state-of-the-art relies on relaxing the underlying Integer Programming Problem (IPP) into a Linear Programming Problem (LPP), solving the latter through the Column Generation (CG) technique, and integerization of the resulting LPP solution. However, with the growing scale and complexity of the airlines' networks (those with a large number of flights, multiple crew bases and/or multiple hub-and-spoke subnetworks), the efficacy of the conventionally used exact CG-implementations is severely marred, and their utility has become questionable. This paper proposes an Airline Crew Pairing Optimization Framework, 𝐴𝑖𝑟𝐶𝑅𝑂𝑃, whose constitutive modules include the Legal Crew Pairing Generator, Initial Feasible Solution Generator, and an Optimization Engine built on a heuristic-based CG-implementation. 𝐴𝑖𝑟𝐶𝑅𝑂𝑃's novelty lies in not just the design of its constitutive modules but also in how these modules interact. In that, insights into several important questions, which the literature is otherwise silent on, have been shared. These relate to the sensitivity analysis of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃's performance in terms of the final solutions' cost quality and run-time, with respect to: sources of variability over multiple runs for a given problem; cost quality of the initial solution and the run-time spent to obtain it; and termination parameters for LPP-solutioning and IPP-solutioning. In addition, the efficacy of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 has been: (a) demonstrated on real-world airline flight networks with an unprecedented conjunct scale-and-complexity, marked by over 4200 flights, 15 crew bases, and billion-plus pairings, and (b) validated by the research consortium's industrial sponsor. It is hoped that with the emergent trend of conjunct scale and complexity of airline networks, this paper shall serve as an important milestone for affiliated research and applications.
1. Introduction
Airline scheduling poses some of the most challenging optimization problems encountered in the entire Operations Research (OR) domain. For a large-scale airline, the crew operating cost constitutes the second-largest cost component, next to the fuel cost, and even marginal improvements in it may translate to annual savings worth millions of dollars. Given the potential for huge cost-savings, Airline Crew Scheduling is recognized as a critical planning activity. It has received unprecedented attention from the researchers of the OR community over the last three decades. Conventionally, it is tackled by solving two problems, namely, the Crew Pairing Optimization Problem (CPOP) and the Crew Assignment Problem, in a sequential manner. The former problem is aimed at generating a set of flight sequences (each called a crew pairing) that covers all flights from an airline's flight schedule, at minimum cost, while satisfying several legality constraints linked to federations' rules, labor laws, airline-specific regulations, etc. These optimally-derived crew pairings are then fed as input to the latter problem, which aims to generate a set of pairing sequences (each sequence is a schedule for an individual crew member), while satisfying the corresponding crew requirements. Being the foremost step of airline crew scheduling, CPOP is the main focus of this paper, and interested readers are referred to Barnhart et al. (2003) for a comprehensive review of airline crew scheduling.

∗ Corresponding author; Postal Address: Room No.-231, East Block, MIED, IIT Roorkee, Roorkee, Uttarakhand-247667, India; Phone: +91-8218612326
D. Aggarwal et al.:
Preprint submitted to Elsevier
arXiv [cs.MS]

CPOP is an NP-hard combinatorial optimization problem (Garey & Johnson, 1979). It is modeled as either a set partitioning problem (SPP), in which each flight is allowed to be covered by only one pairing, or a set covering problem (SCP), in which each flight is allowed to be covered by more than one pairing. In CPOP, a crew pairing has to satisfy hundreds of legality constraints (Section 2.2) to be classified as legal, and it is imperative to generate legal pairings in a time-efficient manner to assist the optimization search. Several legal pairing generation approaches, based on either a flight- or a duty-network, have been proposed in the literature (Aggarwal et al., 2018). Depending upon how the legal pairing generation module is invoked, two CPOP solution-architectures are possible. In the first architecture, all possible legal pairings are enumerated prior to the CPOP-solutioning. However, this is computationally tractable only for small-scale CPOPs (with ≈ <1000 flights). Alternatively, legal pairings are generated during each iteration of the CPOP-solutioning, but only for a subset of flights, so the CPOP solution could be partially improved before triggering the next iteration. Such an architecture mostly suits medium- to large-scale CPOPs (with ≈ ≥ [...] heuristic-based optimization techniques and mathematical programming techniques are commonly employed (Section 2.3). In the former category, Genetic Algorithms (GAs), which are population-based randomized-search heuristics (Goldberg, 2006), are most commonly used. However, they are found to be efficient only for tackling very small-scale CPOPs (Ozdemir & Mohan, 2001). Alternatively, several mathematical programming based approaches exist to solve CPOPs of varying scales. CPOP is inherently an Integer Programming Problem (IPP), and some approaches have used standard Integer Programming (IP) techniques to find a best-cost pairing subset from a pre-enumerated set of pairings (Hoffman & Padberg, 1993).
However, these approaches have proven effective only with small-scale CPOPs with up to a million pairings. This perhaps explains the prevalence of an altogether different strategy, in which the original CPOP/IPP is relaxed into a Linear Programming Problem (LPP); the LPP is solved iteratively by invoking an LP solver and relying on the Column Generation (CG) technique to generate new pairings as part of the pricing sub-problem; and finally, the resulting LPP solution is integerized using IP techniques and/or some special connection-fixing heuristics. The challenge associated with this strategy is that even though the LP solver may lead to a near-optimal LPP solution, the scope of finding a good-cost IPP solution is limited to the pairings available in the LPP solution. To counter this challenge, heuristic implementations of the branch-and-price framework (Barnhart et al., 1998), in which CG is utilized during the integerization phase too, have been employed to generate new legal pairings at nodes of the IP-search tree. However, the efficacy of such heuristic implementations depends on a significant number of algorithmic-design choices (say, which branching scheme to adopt, or how many CG-iterations to perform at the nodes). Furthermore, it is noteworthy that the scale and complexity of flight networks have grown alarmingly over the past decades. As a result, an inestimably large number of new pairings are possible under the pricing sub-problem, given which most existing solution methodologies are rendered computationally inefficient. Recognition of such challenges has paved the way towards domain-knowledge driven CG strategies to generate a manageable, yet crucial part of the overall pairings' space under the pricing sub-problem (Zeren & Özkol, 2016). Though rich in promise, the efficacy of this approach is yet to be explored vis-à-vis the emergent large-scale and complex flight networks characterized by multiple crew bases and/or multiple hub-and-spoke subnetworks, where billions of legal pairings are possible.

In an endeavor to address airline networks with conjunct scale and complexity, this paper proposes an Airline Crew Pairing Optimization Framework (𝐴𝑖𝑟𝐶𝑅𝑂𝑃) based on domain-knowledge driven CG strategies, and:
• presents not just the design of its constitutive modules (including the Legal Crew Pairing Generator, Initial Feasible Solution Generator, and Optimization Engine powered by CG-driven LPP-solutioning and IPP-solutioning), but also how these modules interact.
• discusses how sensitive its performance is to: sources of variability over multiple runs for a given problem; cost quality of the initial solution and the run-time spent to obtain it; and termination parameters for LPP-solutioning and IPP-solutioning. Such an investigation promises important insights for researchers and practitioners on critical issues which are otherwise not discussed in the existing literature.
• presents empirical results for a real-world, large-scale (over 4200 flights), complex flight network (over 15 crew bases and multiple hub-and-spoke subnetworks) for a US-based airline, the data for which has been provided by the research consortium's industrial partner.

Footnote: For NP-hard (NP-complete) problems, no polynomial time algorithms on sequential computers are known up to now. However, verification of a solution might be (can be) accomplished efficiently, i.e., in polynomial time.
The outline of the remaining paper is as follows. Section 2 discusses the underlying concepts, related work, and problem formulation; Section 3 entails the details of the proposed 𝐴𝑖𝑟𝐶𝑅𝑂𝑃; Section 4 presents the results of the computational experiments along with the corresponding observations; and Section 5 concludes the paper as well as briefly describes the potential future directions.
2. Crew Pairing Optimization: Preliminaries, Related Work and Problem Formulation
This section first describes the preliminaries, including the associated terminology, pairings' legality constraints, and pairings' costing criterion. Subsequently, the related work is presented, in which the existing CPOP solution approaches are discussed. Lastly, the airline CPOP formulation is presented.
In airline crew operations, each crew member is assigned a fixed (home) airport, called a crew base. A crew pairing (or a pairing) is a flight sequence operated by a crew that begins and ends at the same crew base, and satisfies the given pairing legality constraints (detailed in Section 2.2). An example of a crew pairing with the Dallas (DAL) airport as the crew base is illustrated in Figure 1. In a crew pairing, the legal sequence of flights operated by a crew in a single working day (not necessarily equivalent to a calendar day) is called a crew duty or a duty. A sit-time or a connection-time is a small rest-period, provided between any two consecutive flights within a duty for facilitating operational requirements such as aircraft changes by the crew, turn-around operation for the aircraft, etc. An overnight-rest is a longer rest-period, provided between any two consecutive duties within a pairing. Moreover, two short periods, provided at the beginning and end of any duty within a pairing, are called the briefing and de-briefing time, respectively. The total time elapsed in a crew pairing, i.e., the time for which a crew is away from its crew base, is called the time away from base (TAFB). Sometimes, a crew needs to be transported to an airport to fly their next flight. For this, the crew travels as a passenger in another flight, flown by another crew, to arrive at the required airport. Such a flight is called a deadhead flight or a deadhead for the crew traveling as a passenger. An airline desires to minimize the number of deadheads (ideally to zero), as they affect the airline's profit in two ways. Firstly, the airline suffers a loss of the revenue on the passenger seat being occupied by the deadheading crew, and secondly, the airline has to pay the hourly wages to the deadheading crew even when it is not operating the flight.
Figure 1: An example of a crew pairing starting from the Dallas (DAL) crew base. The schematic shows flight legs connecting DAL, BOI, LAS and ONT, arranged into two crew duties separated by an overnight-rest, and marks the sit-times, the briefing and de-briefing times, the duty elapsed time, and the time away from base (TAFB).
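The terminology above can be made concrete with a small data-structure sketch. It is only an illustration under stated assumptions: the `Flight` class, the helper names, the briefing and de-briefing durations, and all clock times are hypothetical (they are not read off Figure 1), and the overnight-rest is assumed to run from the end of de-briefing to the start of the next briefing.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Flight:
    origin: str
    dest: str
    dep: int  # departure, in minutes from midnight of the pairing's first day
    arr: int  # arrival, on the same clock


Duty = List[Flight]    # flights operated in one working day
Pairing = List[Duty]   # duties separated by overnight-rests

BRIEFING = 60     # hypothetical briefing time (minutes)
DEBRIEFING = 30   # hypothetical de-briefing time (minutes)


def sit_times(duty: Duty) -> List[int]:
    """Rest between consecutive flights within one duty."""
    return [nxt.dep - cur.arr for cur, nxt in zip(duty, duty[1:])]


def duty_start(duty: Duty) -> int:
    return duty[0].dep - BRIEFING


def duty_end(duty: Duty) -> int:
    return duty[-1].arr + DEBRIEFING


def overnight_rests(pairing: Pairing) -> List[int]:
    """Rest between consecutive duties (assumed: de-briefing end to briefing start)."""
    return [duty_start(nxt) - duty_end(cur) for cur, nxt in zip(pairing, pairing[1:])]


def tafb(pairing: Pairing) -> int:
    """Time away from base: total time elapsed over the whole pairing."""
    return duty_end(pairing[-1]) - duty_start(pairing[0])


DAY = 24 * 60
# Hypothetical two-duty pairing that starts and ends at DAL.
pairing: Pairing = [
    [Flight("DAL", "BOI", 540, 660), Flight("BOI", "LAS", 720, 810)],
    [Flight("LAS", "ONT", DAY + 600, DAY + 660), Flight("ONT", "DAL", DAY + 750, DAY + 930)],
]

print(sit_times(pairing[0]), overnight_rests(pairing), tafb(pairing))
```

Keeping times on a single minute clock avoids calendar arithmetic, which is convenient here because a working day need not coincide with a calendar day.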
To govern the safety of crew members, airline federations such as the European Aviation Safety Agency, the Federal Aviation Administration, and others have laid down several rules and regulations, which, in addition to the airline-specific regulations, labor laws, etc., are required to be satisfied by a pairing for it to be "legal". These legality constraints could be broadly categorized as follows:
• Connection-city constraint ($\mathcal{C}_{connect}$): this constraint requires the arrival airport of a flight (or the last flight of a duty) within a pairing to be the same as the departure airport of its next flight (or the first flight of its next duty).
• Sit-time ($\mathcal{C}_{sit}$) and Overnight-rest ($\mathcal{C}_{night}$) constraints: these constraints impose the respective maximum and minimum limits on the duration of sit-times and overnight-rests, where these limits are governed by airlines' and federations' regulations.
• Duty constraints ($\mathcal{C}_{duty}$): these constraints govern the regulations linked to the crew duties. For instance, they impose maximum limits on the number of flights allowed in a duty of a pairing; the duty elapsed-time and the corresponding flying-time; the number of duties allowed in a pairing, etc.
• Start- and end-city constraint ($\mathcal{C}_{base}$): this constraint requires the beginning airport (departure airport of the first flight) and ending airport (arrival airport of the last flight) of a pairing to be the same crew base.
• Other constraints ($\mathcal{C}_{other}$): Airlines formulate some specific constraints, according to their operational requirements, so as to maximize their crew utilization. For example, a pairing is refrained from involving overnight-rests at the airports that belong to the same city as the crew base from which the pairing started, etc.

Considering the multiplicity of the above constraints, it is critical to develop a time-efficient legal crew pairing generation approach, enabling the prompt availability of legal pairings when their requirement arises during the optimization.

In general, a pairing's cost could be split into the flying cost and the non-flying (variable) cost. The flying cost is the cost incurred in actually flying all the given flights, and is computed on an hourly basis. The variable cost is the cost incurred during the non-flying hours of the pairing, and is made up of two sub-components, namely, the hard cost and the soft cost. The hard cost involves the pairing's hotel cost, meal cost, and excess pay, i.e., the cost associated with the difference between the guaranteed hours of pay and the actual flying hours.
Here, the pairing's hotel cost is the lodging cost incurred during its overnight-rests, and its meal cost is computed as a fraction of its TAFB. The soft cost is the undesirable cost associated with the number of aircraft changes (during flight-connections) in the pairing, etc.

As mentioned in Section 1, the existing CPOP solution approaches are based on either heuristic or mathematical programming techniques. Among the heuristic-based approaches, GA is the most widely adopted technique, and Beasley & Chu (1996) is the first instance to customize a GA (using guided GA-operators) for solving a general class of SCPs. In that, the authors validated their proposed approach on small-scale synthetic test cases (with over 1,000 rows and just 10,000 columns). The important details of the GA-based CPOP solution approaches available in the literature are reported in Table 1. Notably, the utility of the studies reported in the table has been demonstrated on CPOPs with a reasonably small number of flights, leading to a relatively smaller number of pairings. Though CPOPs with 2,100 and 710 flights have been tackled by Kornilakis & Stamatopoulos (2002) and Zeren & Özkol (2012) respectively, only a subset of all possible legal pairings has been considered by them for finding the reported solutions. Zeren & Özkol (2012) proposed a GA with highly-customized operators, which efficiently solved small-scale CPOPs but failed to solve large-scale CPOPs with the same search-efficiency. Furthermore, Aggarwal, Saxena, et al. (2020b) tackled a small-scale CPOP (with 839 flights and multiple hub-and-spoke sub-networks) using a customized-GA (with guided operators) as well as mathematical programming techniques. The authors concluded that customized-GAs are inefficient in solving complex versions of even small-scale flight networks, compared to a mathematical programming-based solution approach.

Table 1
Key facts around the GA-based CPOP solution approaches, available in the literature

Literature Studies                   Modeling           Timetable   Airline Test Cases*      Airlines
Levine (1996)                        Set Partitioning   -           40R; 823F; 43,749P       -
Ozdemir & Mohan (2001)               Set Covering       Daily       28R; 380F; 21,308P       Multiple Airlines
Kornilakis & Stamatopoulos (2002)    Set Covering       Monthly     1R; 2,100F; 11,981P      Olympic Airways
Zeren & Özkol (2012)                 Set Covering       Monthly     1R; 710F; 3,308P         Turkish Airlines
Deveci & Demirel (2018a)             Set Covering       -           12R; 714F; 43,091P       Turkish Airlines

* R represents the number of real-world test cases considered; F and P represent the maximum number of flights and pairings covered, therein.
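Whichever solution approach is used, heuristic or otherwise, every candidate pairing has to be screened against the legality constraints categorized earlier in this section. The following is a minimal sketch of such a screen, not the paper's implementation: the `Limits` values are hypothetical placeholders rather than actual federation or airline limits, the function name `violations` is illustrative, and rests are measured from arrival to departure, ignoring briefing and de-briefing for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A flight is (origin, destination, departure, arrival); times are minutes on one clock.
Flight = Tuple[str, str, int, int]


@dataclass(frozen=True)
class Limits:
    """Hypothetical limits, for illustration only."""
    min_sit: int = 30
    max_sit: int = 240
    min_night: int = 600
    max_night: int = 1800
    max_flights_per_duty: int = 4
    max_duties: int = 4


def violations(pairing: List[List[Flight]], base: str, lim: Limits = Limits()) -> List[str]:
    """Return the sorted names of the constraint categories that the pairing violates."""
    found = set()
    # Start- and end-city constraint.
    if pairing[0][0][0] != base or pairing[-1][-1][1] != base:
        found.add("base")
    # Duty constraints (only two of the possible limits are sketched).
    if len(pairing) > lim.max_duties:
        found.add("duty")
    for duty in pairing:
        if len(duty) > lim.max_flights_per_duty:
            found.add("duty")
        # Connection-city and sit-time constraints within a duty.
        for cur, nxt in zip(duty, duty[1:]):
            if cur[1] != nxt[0]:
                found.add("connect")
            if not lim.min_sit <= nxt[2] - cur[3] <= lim.max_sit:
                found.add("sit")
    # Connection-city and overnight-rest constraints between duties.
    for d1, d2 in zip(pairing, pairing[1:]):
        if d1[-1][1] != d2[0][0]:
            found.add("connect")
        if not lim.min_night <= d2[0][2] - d1[-1][3] <= lim.max_night:
            found.add("night")
    return sorted(found)


legal = [
    [("DAL", "BOI", 540, 660), ("BOI", "LAS", 720, 810)],
    [("LAS", "ONT", 2040, 2100), ("ONT", "DAL", 2190, 2370)],
]
broken = [[("DAL", "BOI", 540, 660), ("LAS", "ONT", 670, 800)]]

print(violations(legal, "DAL"))
print(violations(broken, "DAL"))
```

Returning the violated categories, rather than a bare boolean, makes it easier to see which rule rejected a candidate pairing.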
Several mathematical programming-based CPOP solution approaches have been proposed in the literature over the past few decades, and based on the size and characteristics of the flight network being tackled, these approaches can be categorized into three general classes. In the first class of approaches, all legal pairings or a subset of good pairings are enumerated prior to the CPOP-solutioning, and the corresponding CPOP/IPP model is solved using standard IP techniques (such as the branch-and-bound algorithm (Land & Doig, 1960)). Gershkoff (1989) proposed an iterative solution approach, which is initialized using a set of artificial pairings (each covering a single flight at a high pseudo-cost). In that, each iteration involves selection of very few pairings (5 to 10); enumeration of all legal pairings using the flights covered in the selected pairings; optimization of the resulting SPP to find the optimal pairings; and lastly, replacement of the originally selected pairings with the optimal pairings, only if the latter offer a better cost. The search-efficiency of such an approach is highly dependent on the sub-problem-size (handled up to 100 flights and 5,000 pairings), as the length and breadth of the branching tree increase drastically with an increase in sub-problem-size. Hoffman & Padberg (1993) proposed an alternative approach to tackle SPPs with up to 825 flights and 1.05 million pairings, in which all possible pairings are enumerated a priori, and the resulting SPP is solved to optimality using a branch-and-cut algorithm. Such approaches are efficient only in tackling small-scale CPOPs, that too with up to a million pairings. However, even small-scale CPOPs may involve a large number of pairings (an instance reported in Vance et al. (1997) had 250 flights and over five million pairings), rendering it computationally intractable to use such approaches.

The second class of approaches relies on relaxing the integer constraints in the original CPOP/IPP to form an LPP, which is then solved iteratively by invoking an LP solver and generating new pairings using CG; the resulting LPP solution is then integerized. In any iteration of the LPP-solutioning (referred to as an LPP iteration), an LP solver (based on either a simplex method or an interior-point method) is invoked on the input pairing set to find the LPP solution and its corresponding dual information (the shadow price corresponding to each flight-coverage constraint), which are then utilized to generate new pairings as part of the pricing sub-problem, promising the corresponding cost-improvements. For the first LPP iteration, any set of pairings covering all the flights becomes the input to the LP solver, and for any subsequent LPP iteration, the current LPP solution and the set of new pairings (from the pricing sub-problem) constitute the new input. For more details on how new pairings are generated under the pricing sub-problem in the CG technique, interested readers are referred to Vance et al. (1997); Lübbecke & Desrosiers (2005). As cited in Zeren & Özkol (2016), the CG technique has several limitations, out of which the prominent ones are: the heading-in effect (poor dual information in the initial LPP iterations leads to the generation of irrelevant columns), the bang-bang effect (dual variables oscillate from one extreme point to another, leading to poor or slower convergence), and the tailing-off effect (the cost-improvements in the later LPP iterations taper off). While different stabilization techniques are available for CG in the literature (Du Merle et al., 1999; Lübbecke, 2010), the use of interior-point methods is gaining prominence. Anbil et al. (1991) presented the advancements at American Airlines, and enhanced the approach proposed by Gershkoff (1989) (discussed above) by leveraging the knowledge of dual variables to screen-out/price-out the pairings from the enumerated set at each iteration, enabling it to solve larger sub-problems (up to 25 flights and 100,000 pairings). As an outcome of a collaboration between IBM and American Airlines, Anbil et al. (1992) proposed an iterative global solution approach (though falling short of global optimization) in which an exhaustive set of pairings (≈ [...]

Footnote: The branch-and-cut algorithm was first proposed by Padberg & Rinaldi (1991) to solve Mixed Integer Programs (MIP), by integrating the standard branch-and-bound and cutting-plane algorithms. For comprehensive details of the MIP solvers, interested readers are referred to Lodi (2009); Linderoth & Lodi (2011); Achterberg & Wunderling (2013).

Footnote: The class of interior-point methods was first introduced by Karmarkar (1984). In that, a polynomial-time algorithm, called Karmarkar's algorithm, was proposed, which, in contrast to the simplex method, searches for the best solution by traversing the interior of the feasible region of the search space.
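The role of the dual information in an LPP iteration, as described for the second class of approaches, can be sketched as follows. This is a generic illustration of pricing by reduced cost, assuming the standard expression (pairing cost minus the sum of the shadow prices of the flights it covers); the function names, the tolerance, and the data are hypothetical, `cost` is taken to be the pairing's full objective coefficient, and the sketch does not reproduce any specific pricing method from the cited works.

```python
from typing import Dict, FrozenSet, List, Tuple

# A candidate pairing is (cost, set of flight ids it covers).
Pairing = Tuple[float, FrozenSet[str]]


def reduced_cost(cost: float, flights: FrozenSet[str], duals: Dict[str, float]) -> float:
    """Pairing cost minus the shadow prices of the flights it covers."""
    return cost - sum(duals[f] for f in flights)


def price_out(candidates: List[Pairing], duals: Dict[str, float], tol: float = 1e-9) -> List[Pairing]:
    """Keep only the candidates with negative reduced cost, most promising first."""
    scored = [(reduced_cost(c, fl, duals), (c, fl)) for c, fl in candidates]
    improving = [item for item in scored if item[0] < -tol]
    improving.sort(key=lambda item: item[0])
    return [pairing for _, pairing in improving]


# Hypothetical shadow prices returned by an LP solver for three flight-coverage constraints.
duals = {"f1": 100.0, "f2": 80.0, "f3": 60.0}
candidates = [
    (150.0, frozenset({"f1", "f2"})),  # reduced cost 150 - 180 = -30
    (200.0, frozenset({"f2", "f3"})),  # reduced cost 200 - 140 = +60
    (90.0, frozenset({"f3"})),         # reduced cost  90 -  60 = +30
    (120.0, frozenset({"f1", "f3"})),  # reduced cost 120 - 160 = -40
]

new_columns = price_out(candidates, duals)
print(new_columns)
```

Only the pairings returned by `price_out` would be added to the input of the next LPP iteration; candidates with non-negative reduced cost cannot improve the current LPP solution.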
[...] pairings available in it may not fit well together to constitute a good-cost IPP solution.

The third class of approaches shares a similar solution-architecture with the preceding class, but differs in terms of the integerization of the LPP solution. In that, a heuristic branch-and-price framework is adopted, wherein CG is utilized during the integerization phase too, to generate new legal pairings at nodes of the MIP-search tree. Desrosiers et al. (1991) is the first instance that solved CPOP using a branch-and-price framework. However, given the inestimable number of legal pairings possible for even medium-scale CPOPs, numerous branch-and-price based heuristic approaches have been proposed over the last three decades (Desaulniers et al., 1997; Vance et al., 1997; Anbil et al., 1998; Desaulniers & Soumis, 2010). Notably, the development of these approaches, being heuristic in nature, requires a significant number of algorithmic-design choices to be taken empirically, which may vary with the characteristics of the flight networks being solved for. To name a few such decisions: which branching scheme should be employed (branching on linear variables, branching on flight-connections, or others); should CG be performed on each node of the MIP-search tree; how many CG iterations are to be performed each time, etc. Furthermore, the commercial LP and MIP solvers are not much open to modifications, making it difficult for new researchers to implement a computationally- and time-efficient branch-and-price framework from scratch. For further details of the existing CPOP solution approaches, interested readers are referred to the recent survey articles: Kasirzadeh et al. (2017); Deveci & Demirel (2018b).

In addition to the above classification of solution approaches, the literature differs on the notion of how the pricing sub-problem is modeled and solved to generate new legal pairings during the LPP iterations. However, the focus of this paper is not on the solution to the pricing sub-problem step, but on the interactions between different modules of a CG-based CPOP solution approach. Hence, for details on the existing work related to the pricing sub-problem step, interested readers are referred to Vance et al. (1997); Aggarwal, Saxena, et al. (2020a).

As mentioned earlier, a CPOP is intrinsically an IPP, modeled either as an SCP or an SPP. Notably, the SCP formulation provides higher flexibility during its solutioning compared to the SPP formulation by accommodating deadhead flights in the model, possibly resulting in faster convergence (Gustafsson, 1999). For a given flight set $\mathcal{F}$ (including $F$ flights) that could be covered in numerous ways by a set of legal pairings $\mathcal{P}$ (including $P$ pairings), the set covering problem is aimed to find a subset of pairings ($\in \mathcal{P}$), say $\mathcal{P}^*_{IP}$, which not only covers each flight ($\in \mathcal{F}$) at least once, but does it at a cost lower than any alternative subset of pairings in $\mathcal{P}$. In that, while finding $\mathcal{P}^*_{IP}$ ($\subseteq \mathcal{P}$), each pairing $p_j \in \mathcal{P}$ corresponds to a binary variable $x_j$, which represents whether the pairing $p_j$ is included in $\mathcal{P}^*_{IP}$ (marked by $x_j = 1$) or not ($x_j = 0$). Here, $p_j$ is an $F$-dimensional vector, each element of which, say $a_{ij}$, represents whether the flight $f_i$ is covered by pairing $p_j$ (marked by $a_{ij} = 1$) or not ($a_{ij} = 0$). In this background, the IPP formulation, as used in this paper, is as follows.

\[ \text{Minimize } Z_{IP} = \sum_{j=1}^{P} c_j x_j + \psi_D \cdot \left( \sum_{i=1}^{F} \left( \sum_{j=1}^{P} a_{ij} x_j - 1 \right) \right), \tag{1} \]

subject to

\[ \sum_{j=1}^{P} a_{ij} x_j \geq 1, \quad \forall i \in \{1, 2, ..., F\} \tag{2} \]

\[ x_j \in \mathbb{Z} = \{0, 1\}, \quad \forall j \in \{1, 2, ..., P\} \tag{3} \]

where,
$c_j$: the cost of a legal pairing $p_j$,
$\psi_D$: an airline-defined penalty cost against each deadhead in the solution,
$a_{ij} = 1$, if flight $f_i$ is covered in pairing $p_j$; else 0,
$x_j = 1$, if pairing $p_j$ contributes to Minimum $Z$; else 0.

In the objective function (Equation 1), the first component gives the sum of the individual costs of the pairings selected in the solution, while the other component gives the penalty cost for the deadheads incurred in the solution (note, $(\sum_{j=1}^{P} a_{ij} x_j - 1)$ gives the number of deadheads corresponding to the flight $f_i$). Notably, in the above formulation, it is assumed that the set of all possible legal pairings, namely $\mathcal{P}$, is available a priori, and the task is to determine $\mathcal{P}^*_{IP}$. However, the generation of $\mathcal{P}$ a priori is computationally intractable for large-scale CPOPs, as mentioned in Section 2.3. Hence, the solution to the CPOP/IPP is pursued in conjunction with the corresponding LPP (formulation deferred till Section 3.3.1) assisted by the CG technique.

Footnote: The branch-and-price algorithm was originally proposed by Barnhart et al. (1998) as an exact algorithm to solve then-known large-scale IPPs, and has been utilized to solve a variety of combinatorial optimization problems in transportation, such as Desrosiers et al. (1984); Desrochers & Soumis (1989); Desrochers et al. (1992).
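As a small worked instance of Equations 1 to 3, the sketch below evaluates the set covering objective for a given selection of pairings, counting every extra cover of a flight as a deadhead. The instance (four flights, four pairings, their costs, and the penalty value) and the function names are hypothetical and chosen only so that the arithmetic can be followed by hand.

```python
from typing import Dict, List, Sequence, Set


def coverage(flights: Sequence[str], cover: List[Set[str]], selected: Sequence[int]) -> Dict[str, int]:
    """Number of selected pairings covering each flight (left-hand side of Equation 2)."""
    return {f: sum(1 for j in selected if f in cover[j]) for f in flights}


def is_feasible(flights: Sequence[str], cover: List[Set[str]], selected: Sequence[int]) -> bool:
    """Equation 2: every flight must be covered at least once."""
    return all(n >= 1 for n in coverage(flights, cover, selected).values())


def objective(costs: List[float], flights: Sequence[str], cover: List[Set[str]],
              selected: Sequence[int], psi_d: float) -> float:
    """Equation 1: total pairing cost plus the deadhead penalty."""
    counts = coverage(flights, cover, selected)
    if any(n < 1 for n in counts.values()):
        raise ValueError("selection leaves at least one flight uncovered")
    pairing_cost = sum(costs[j] for j in selected)
    deadheads = sum(n - 1 for n in counts.values())
    return pairing_cost + psi_d * deadheads


# Hypothetical instance: cover[j] is the set of flights in pairing j, costs[j] its cost.
flights = ["f1", "f2", "f3", "f4"]
cover = [{"f1", "f2"}, {"f2", "f3"}, {"f3", "f4"}, {"f1", "f4"}]
costs = [100.0, 120.0, 110.0, 90.0]
PSI_D = 50.0  # hypothetical deadhead penalty

print(objective(costs, flights, cover, [0, 2], PSI_D))     # each flight covered exactly once
print(objective(costs, flights, cover, [0, 1, 2], PSI_D))  # f2 and f3 are each covered twice
print(is_feasible(flights, cover, [0, 1]))                 # f4 is left uncovered
```

In this instance the second selection is feasible but costlier, since the two over-covered flights each incur one deadhead penalty on top of the extra pairing cost.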
3. Proposed Airline Crew Pairing Optimization Framework (AirCROP)
This section presents the constitutive modules of the proposed optimization framework, 𝐴𝑖𝑟𝐶𝑅𝑂𝑃, their working, and their interactions. As per the schematic in Figure 2, 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 accepts a set of given flights $\mathcal{F}$ along with the pairings' legality constraints and costing criterion as input, and outputs a minimal-cost set of legal pairings $\mathcal{P}^{\star}_{IP}$ that covers all given flights. This transition from the input to the output is enabled by the constitutive modules, namely, the Legal Crew Pairing Generator, the Initial Feasible Solution Generator, and an Optimization Engine, in turn enabled by the CG-driven LPP-solutioning and IPP-solutioning submodules and their intermittent interactions. While parts of these modules have been presented elsewhere (Aggarwal et al., 2018; Aggarwal, Saxena, et al., 2020a) in isolation, these are being detailed below towards a holistic view on the experimental results presented later.
This module enables generation of the legal pairings in a time-efficient manner, so they could feed real-time intothe other modules - Initial Feasible Solution Generator and the optimization engine. For time-efficiency, it employs aparallel, duty-network based legal pairing generation approach, whose distinctive contributions are two-folds. Firstly,a crew base centric parallel architecture is adopted considering that several duty- and pairing- constitutive constraintsdo vary with crew bases. In that, for an input flight set, the legal pairing generation process is decomposed intoindependent sub-processes (one for each crew base), running in parallel on idle-cores of the central processing unit(CPU). This leads to a significant reduction in the pairing generation time ( ≈
10 folds for a CPOP with 15 crew bases, as demonstrated in Aggarwal et al. (2018)). Secondly, the set of all possible legal duties and the corresponding duty overnight-connection graph with respect to each crew base are enumerated and stored before the CPOP-solutioning begins. In a duty overnight-connection graph, a node represents a legal duty, and an edge between any two nodes represents a legal overnight-rest connection between the respective duties. Such preprocessing ensures that all the connection-city, sit-time, duty, and overnight-rest constraints are naturally satisfied, eliminating the need for their re-evaluation during the generation of legal pairings, and leading to a significant reduction in the legal pairing generation time.

The implementation of this module, formalized in Algorithms 1 & 2, is elaborated below. For solving any CPOP, the foremost step of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 is to preprocess the entire duty-connection network: the set of legal duties $\mathcal{D}_b$ and the duty overnight-connection graph $\mathcal{G}^d_b$ ($\equiv (\mathcal{D}_b, \mathcal{E}^d_b)$) for each crew base $b$ in the given set of crew bases $\mathcal{B}$, where $\mathcal{E}^d_b$ is the set of legal overnight-rest connections between duty-pairs $\in \mathcal{D}_b$. The procedure for this preprocessing is presented in Algorithm 1. In that, the first step is the generation of a flight-connection graph (denoted by $\mathcal{G}^f$) by evaluating the legality of connection-city ($\mathcal{C}_{connect}$) and sit-time ($\mathcal{C}_{sit}$) constraints between every flight-pair in the given flight schedule $\mathcal{F}$ (line 1). Here, in $\mathcal{G}^f$ ($\equiv (\mathcal{F}, \mathcal{E}^f)$), $\mathcal{F}$ is the set of nodes (flights) and $\mathcal{E}^f$ is the set of edges (legal flight connections). Subsequently, $\mathcal{G}^f$ is used for legal duty enumeration, by decomposing the process into independent sub-processes, one for each crew base $b \in \mathcal{B}$, and executing them in parallel (lines 2-12). In each of these sub-processes, enumeration of legal duties, starting from each flight $f \in \mathcal{F}$, is explored. In that:
• flight $f$ is added to an empty candidate duty stack, given by $duty$ (line 4).
• the flight-sequence in $duty$ is checked for satisfaction of duty constraints $\mathcal{C}_{duty}$, and if satisfied, $duty$ is added to the desired legal duty set $\mathcal{D}_b$ (lines 5-6). Notably, if $f$ has at least one connection with another flight in $\mathcal{G}^f$, and if the duty constraints permit, then more flights could be accommodated in $duty$, leading to enumeration of other legal duties (lines 7-9).
• a Depth-first Search (DFS) algorithm (Tarjan, 1972) is adapted, which is called recursively to enumerate legal duties, starting from a parent flight node ($parent$), by exploring all its successive paths in $\mathcal{G}^f$ in a depth-first manner (lines 16-25). In each recursion, a child flight node ($child$) is pushed into $duty$, the updated flight-sequence is checked for satisfaction of $\mathcal{C}_{duty}$, and if satisfied, $duty$ is yielded to $\mathcal{D}_b$, followed by another recursion of DFS() with $child$ as the new $parent$.
In this way, all legal duties, starting from flight $f$, are enumerated.
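The duty enumeration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `enumerate_duties`, `connections` and `is_legal_duty` are hypothetical, the flight-connection graph is assumed to be given as an adjacency dictionary, the duty constraints are collapsed into a single callable, and the per-crew-base parallel decomposition is omitted.

```python
from typing import Callable, Dict, List, Sequence

Flight = str


def enumerate_duties(
    flights: Sequence[Flight],
    connections: Dict[Flight, List[Flight]],
    is_legal_duty: Callable[[List[Flight]], bool],
) -> List[List[Flight]]:
    """Enumerate legal duties by depth-first search over a flight-connection graph.

    `connections` (hypothetical name) maps a flight to the flights it can legally
    connect to; `is_legal_duty` (hypothetical) stands in for the duty constraints.
    """
    legal_duties: List[List[Flight]] = []

    def dfs(duty: List[Flight], parent: Flight) -> None:
        for child in connections.get(parent, []):
            duty.append(child)  # push child onto the candidate duty stack
            if is_legal_duty(duty):
                legal_duties.append(list(duty))  # store a copy of the stack
                if connections.get(child):
                    dfs(duty, child)  # child becomes the new parent
            duty.pop()  # pop child before trying its siblings

    for flight in flights:
        duty = [flight]  # push the starting flight onto an empty stack
        if is_legal_duty(duty):
            legal_duties.append(list(duty))
            if connections.get(flight):
                dfs(duty, flight)
        duty.pop()  # pop the starting flight
    return legal_duties


# Toy data: F1 -> F2 -> F3, with a made-up rule of at most two flights per duty.
demo_connections = {"F1": ["F2"], "F2": ["F3"], "F3": []}
demo_duties = enumerate_duties(
    ["F1", "F2", "F3"], demo_connections, lambda duty: len(duty) <= 2
)
print(demo_duties)
```

Because the recursion stops as soon as a partial flight-sequence violates the duty constraints, the search is pruned early, which is what keeps exhaustive enumeration affordable in this kind of scheme.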
Subsequently, $f$ is popped out from $duty$, and duty enumeration using other flights in $\mathcal{F}$ is explored (lines 3 & 11). The resulting set $\mathcal{D}_b$ is then used to generate the duty overnight-connection graph $\mathcal{G}^d_b$, by evaluating the legality of connection-city ($\mathcal{C}_{connect}$) and overnight-rest ($\mathcal{C}_{night}$) constraints between every duty-pair $\in \mathcal{D}_b$ (line 13). Here, in $\mathcal{G}^d_b$ ($\equiv (\mathcal{D}_b, \mathcal{E}^d_b)$), $\mathcal{D}_b$ is the set of nodes (legal duties), and $\mathcal{E}^d_b$ is the set of edges (legal overnight-rest connections).

Figure 2: A schematic of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 illustrating the interactions between its constitutive modules: Legal Crew Pairing Generator, Initial Feasible Solution Generator, and Optimization Engine (CG-driven LPP-solutioning interacting with IPP-solutioning). The CG heuristic in LPP-solutioning generates a set of fresh pairings $\mathcal{P}^t_{CG}$ at any LPP iteration $t$ using the following CG strategies: Deadhead reduction ($CGD$, generating $\mathcal{P}^t_{CGD}$), Crew Utilization enhancement ($CGU$, generating $\mathcal{P}^t_{CGU}$), Archiving ($CGA$, generating $\mathcal{P}^t_{CGA}$), and Random exploration ($CGR$, generating $\mathcal{P}^t_{CGR}$). The interactions between LPP-solutioning and IPP-solutioning are tracked by the counter $T$.
[Flowchart: the input (set of given flights, pairings' legality constraints and costing criteria) feeds the Legal Crew Pairing Generator (parallel architecture) and the Initial Feasible Solution Generator (IPDCH); the Optimization Engine then alternates between CG-driven LPP-solutioning (LP solver, dual solved using the interior-point method, CG heuristic, termination check) and IPP-solutioning (MIP solver, termination check), and outputs the final crew pairing solution.]

Algorithm 1: Procedure for enumeration of legal duties and duty overnight-connection graphs
Input: $\mathcal{F}$; $\mathcal{B}$; and constraints: $\mathcal{C}_{connect}$, $\mathcal{C}_{sit}$, $\mathcal{C}_{duty}$ & $\mathcal{C}_{night}$
Output: $\mathcal{D}_b$ & $\mathcal{G}^d_b$ $\forall b \in \mathcal{B}$
 1  $\mathcal{G}^f$ ← Generate the flight-connection graph by evaluating $\mathcal{C}_{connect}$ & $\mathcal{C}_{sit}$ between each pair of flights $\in \mathcal{F}$    ⊳ $\mathcal{G}^f \equiv (\mathcal{F}, \mathcal{E}^f)$
 2  for each crew base $b \in \mathcal{B}$ in parallel do
 3      for each flight $f \in \mathcal{F}$ do
 4          Push $f$ into an empty $duty$
 5          if updated flight-sequence in $duty$ satisfies constraints in $\mathcal{C}_{duty}$ then
 6              Add $duty$ to $\mathcal{D}_b$
 7              if $f$ has at least one flight-connection in $\mathcal{G}^f$ then
 8                  DFS($duty$, $f$, $\mathcal{G}^f$, $\mathcal{C}_{duty}$), and add the enumerated duties to $\mathcal{D}_b$
 9              end
10          end
11          Pop out $f$ from $duty$
12      end
13      $\mathcal{G}^d_b$ ← Generate the duty overnight-connection graph by evaluating $\mathcal{C}_{night}$ between each pair of duties $\in \mathcal{D}_b$
14  end
15  return $\mathcal{D}_b$ & $\mathcal{G}^d_b$ $\forall b \in \mathcal{B}$
    ⊳ DFS($duty$, $parent$, $\mathcal{G}^f$, $\mathcal{C}_{duty}$)
16  for each $child$ of $parent$ in $\mathcal{G}^f$ do
17      Push $child$ into $duty$
18      if updated flight-sequence in $duty$ satisfies $\mathcal{C}_{duty}$ then
19          yield $duty$ to $\mathcal{D}_b$
20          if $child$ has at least one connection in $\mathcal{G}^f$ then
21              DFS($duty$, $child$, $\mathcal{G}^f$, $\mathcal{C}_{duty}$)
22          end
23      end
24      Pop out $child$ from $duty$
25  end

The preprocessed sets of legal duties and the corresponding duty overnight-connection graphs are utilized to enumerate legal pairings for any input flight set (say $\mathcal{F}^*$) or duty set (say $\mathcal{D}^*$), when required in real-time in other modules of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃. This procedure, formalized in Algorithm 2, is elaborated below. For legal pairing enumeration, the same crew-base-driven parallel architecture is utilized, in which the process is decomposed into independent sub-processes, one for each crew base $b \in \mathcal{B}$, running in parallel on idle cores of the CPU (line 1). In each of these sub-processes, the first step is to update $\mathcal{D}_b$ and $\mathcal{G}^d_b$, by removing duties $\notin \mathcal{D}^*$ if $\mathcal{D}^*$ is input, or those duties that cover flights $\notin \mathcal{F}^*$ if $\mathcal{F}^*$ is input (line 2). Subsequently, the enumeration of legal pairings, starting from each duty ($duty$) $\in \mathcal{D}_b$, is explored (line 3). In that:
• the $duty$ is pushed into an empty candidate pairing stack, given by $pairing$, only if the departure airport of $duty$ is the same as the crew base $b$ (lines 4-5).
• the $pairing$ is checked for satisfaction of pairing constraints $\mathcal{C}_{other}$, and if satisfied, $pairing$ is further checked for satisfaction of the end-city constraint $\mathcal{C}_{base}$, which ensures that the arrival airport of the last duty in $pairing$ is the same as the crew base $b$.
  – If $pairing$ satisfies $\mathcal{C}_{base}$, it is classified as legal, and is added to the desired pairing set $\mathcal{P}^*$ (lines 7-8).
  – If $pairing$ does not satisfy $\mathcal{C}_{base}$, it is not complete, and more duties are required to be covered in it to complete the legal duty-sequence. This is only possible if $duty$ has at least one overnight-rest connection in $\mathcal{G}^d_b$. If it does, the DFS() sub-routine, similar to the one used in legal duty enumeration, is called recursively to enumerate legal pairings, starting from a parent duty node ($parent$), by exploring all its successive paths in $\mathcal{G}^d_b$ in a depth-first manner (lines 18-28). In each recursion:
    ◦ a child duty node ($child$) is pushed into the $pairing$ (line 19).
    ◦ the updated duty-sequence in $pairing$ is checked for satisfaction of first $\mathcal{C}_{other}$ and then $\mathcal{C}_{base}$ (lines 20-21).
    ◦ if it satisfies both constraints, then $pairing$ is complete (legal), and is yielded to the desired pairing set $\mathcal{P}^*$ (line 22).
    ◦ if it satisfies $\mathcal{C}_{other}$ but not $\mathcal{C}_{base}$, then another recursion of DFS() with $child$ as the new $parent$ is called, only if $child$ has at least one duty overnight-rest connection in $\mathcal{G}^d_b$ (lines 23-25).
In the above way, all legal pairings, starting from $duty$, are enumerated using the DFS() sub-routine. Subsequently, $duty$ is popped out of $pairing$ (line 13), and the legal pairing enumeration using other duties $\in \mathcal{D}_b$ is explored (line 3). Once all the sub-processes are complete, the desired pairing set $\mathcal{P}^*$ is returned (line 17).

Algorithm 2: Procedure for enumeration of legal pairings from an input flight set $\mathcal{F}^*$ or a duty set $\mathcal{D}^*$
Input: $\mathcal{F}^*$ or $\mathcal{D}^*$; $\mathcal{B}$; $\mathcal{D}_b$ & $\mathcal{G}^d_b$ $\forall b \in \mathcal{B}$; and constraints: $\mathcal{C}_{base}$ & $\mathcal{C}_{other}$
Output: $\mathcal{P}^*$
 1  for each crew base $b \in \mathcal{B}$ in parallel do
 2      Update $\mathcal{D}_b$ & $\mathcal{G}^d_b$ by removing duties $\notin \mathcal{D}^*$ if $\mathcal{D}^*$ is input, or by removing those duties which cover flights $\notin \mathcal{F}^*$ if $\mathcal{F}^*$ is input
 3      for each $duty \in \mathcal{D}_b$ do
 4          if departure airport of $duty$ is $b$ then
 5              Push $duty$ into an empty $pairing$
 6              if updated duty-sequence in $pairing$ satisfies $\mathcal{C}_{other}$ then
 7                  if updated duty-sequence in $pairing$ satisfies $\mathcal{C}_{base}$ then
 8                      Add $pairing$ to $\mathcal{P}^*$
 9                  else if $duty$ has at least one duty overnight-connection in $\mathcal{G}^d_b$ then
10                      DFS($pairing$, $duty$, $\mathcal{G}^d_b$, $\mathcal{C}_{base} \cup \mathcal{C}_{other}$), and add enumerated pairings to $\mathcal{P}^*$
11                  end
12              end
13              Pop out $duty$ from $pairing$
14          end
15      end
16  end
17  return $\mathcal{P}^*$
    ⊳ DFS($pairing$, $parent$, $\mathcal{G}^d_b$, $\mathcal{C}_{base} \cup \mathcal{C}_{other}$)
18  for each $child$ of $parent$ in $\mathcal{G}^d_b$ do
19      Push $child$ into $pairing$
20      if updated duty-sequence in $pairing$ satisfies $\mathcal{C}_{other}$ then
21          if updated duty-sequence in $pairing$ satisfies $\mathcal{C}_{base}$ then
22              yield $pairing$ to $\mathcal{P}^*$
23          else if $child$ has at least one duty overnight-connection in $\mathcal{G}^d_b$ then
24              DFS($pairing$, $child$, $\mathcal{G}^d_b$, $\mathcal{C}_{base} \cup \mathcal{C}_{other}$)
25          end
26      end
27      Pop out $child$ from $pairing$
28  end

An initial feasible solution (IFS) is any set of pairings, covering all flights in the given flight schedule, which is used to initialize a CPOP solution approach. For large-scale CPOPs, generating an IFS is by itself a computationally challenging task. This module is designed to generate a reasonably-sized IFS in a time-efficient manner for large and complex flight networks, which is then used to initialize the Optimization Engine of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃. For this, it employs a novel Integer Programming based Divide-and-cover Heuristic (IPDCH), which relies on: (a) a divide-and-cover strategy to decompose the input flight schedule into sufficiently-small flight subsets, and (b) integer programming to find a lowest-cost pairing set, covering the maximum possible flights for each of the decomposed flight subsets.

The procedure of the proposed IPDCH, formalized in Algorithm 3, is elaborated below. Being an iterative heuristic, IPDCH terminates when all flights in the input set are covered by pairings in the desired IFS, notated as $\mathcal{P}_{IFS}$ (line 1). The input to the heuristic involves the given flight schedule $\mathcal{F}$ (with $F$ number of flights), the pairing generation sub-routine Pairing_Gen() (presented in Section 3.1), and a pre-defined decomposition parameter $K$, which regulates the number of flights to be selected from $\mathcal{F}$ in each IPDCH-iteration. The setting of $K$ largely depends upon the available computational resources, and the characteristics of the input flight dataset (as highlighted in Section 4.3.3). In each IPDCH-iteration, first a flight subset, say $\mathcal{F}_K$ ($K < F$), is formed by randomly selecting $K$ flights from $\mathcal{F}$ without replacement (line 2). Subsequently, $\mathcal{F}_K$ is fed as input to the Pairing_Gen() sub-routine to enumerate the set of all possible legal pairings, say $\mathcal{P}_K$ (line 3). Notably, all flights in $\mathcal{F}_K$ may not get covered by pairings in $\mathcal{P}_K$, as random selection of flights does not guarantee legal connections for all selected flights. Let $\mathcal{F}_{K'}$ ($K' \leq K$) be the set of flights covered in $\mathcal{P}_K$ (line 4). The remaining flights, given by $\mathcal{F}_K \setminus \mathcal{F}_{K'}$, are added back to $\mathcal{F}$ (line 5). Subsequently, $\mathcal{F}_{K'}$ and $\mathcal{P}_K$ are used to formulate the corresponding IPP (line 6), which is then solved using a commercial off-the-shelf MIP solver to find the optimal IPP solution, say $\mathcal{P}_{IP}$, constituted by pairings corresponding to only non-zero variables (line 7). The pairings in $\mathcal{P}_{IP}$ are then added to the desired set $\mathcal{P}_{IFS}$ (line 8). Lastly, the flights in $\mathcal{F}$ are replaced if it becomes empty (line 9). As soon as $\mathcal{P}_{IFS}$ covers all the required flights, IPDCH is terminated, and $\mathcal{P}_{IFS}$ is passed over to the Optimization Engine for its initialization.

Algorithm 3: Procedure for IFS generation using the proposed IPDCH
Input: $\mathcal{F}$, $K$, Pairing_Gen()
Output: $\mathcal{P}_{IFS}$
 1  while all flights $\in \mathcal{F}$ are not covered in $\mathcal{P}_{IFS}$ do
 2      $\mathcal{F}_K$ ← Select $K$ random flights from $\mathcal{F}$ without replacement    ⊳ $K < F$
 3      $\mathcal{P}_K$ ← Pairing_Gen($\mathcal{F}_K$)
 4      $\mathcal{F}_{K'}$ ← Flights covered in $\mathcal{P}_K$    ⊳ $K' \leq K$
 5      Add remaining flights ($\mathcal{F}_K \setminus \mathcal{F}_{K'}$) back to $\mathcal{F}$
 6      Formulate the IPP using flights in $\mathcal{F}_{K'}$ and pairings in $\mathcal{P}_K$
 7      $\mathcal{P}_{IP}$ ← Solve the IPP using an MIP solver, and select pairings corresponding to non-zero variables
 8      Add pairings from $\mathcal{P}_{IP}$ to $\mathcal{P}_{IFS}$
 9      Replace flights in $\mathcal{F}$ if it becomes empty
10  end
11  return $\mathcal{P}_{IFS}$
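To make the divide-and-cover loop concrete, the following is a rough Python sketch of the IPDCH control flow on toy data. Several parts are assumptions introduced only for illustration: the MIP solver is replaced by an exhaustive search (`cheapest_cover`) that is viable only for a handful of pairings, `toy_pairing_gen` is a made-up stand-in for Pairing_Gen(), a pairing is reduced to a (flights covered, cost) tuple, the flight pool is refilled with the full flight set when it empties, and `max_iterations` is a safety guard that is not part of the described procedure.

```python
import itertools
import random
from typing import Callable, FrozenSet, List, Sequence, Set, Tuple

Pairing = Tuple[FrozenSet[str], float]  # (flights covered, cost); toy representation


def cheapest_cover(flights: Set[str], pairings: List[Pairing]) -> List[Pairing]:
    """Exhaustive stand-in for the MIP solver: cheapest pairing subset covering `flights`."""
    best: List[Pairing] = []
    best_cost = float("inf")
    for size in range(1, len(pairings) + 1):
        for combo in itertools.combinations(pairings, size):
            covered = set().union(*(p[0] for p in combo))
            cost = sum(p[1] for p in combo)
            if flights <= covered and cost < best_cost:
                best, best_cost = list(combo), cost
    return best


def ipdch(
    all_flights: Sequence[str],
    k: int,
    pairing_gen: Callable[[Set[str]], List[Pairing]],
    rng: random.Random,
    max_iterations: int = 1000,  # assumption: guard against non-terminating inputs
) -> List[Pairing]:
    pool = list(all_flights)  # flights still available for random selection
    target = set(all_flights)
    covered: Set[str] = set()
    ifs: List[Pairing] = []
    for _ in range(max_iterations):
        subset = rng.sample(pool, min(k, len(pool)))  # select without replacement
        pool = [f for f in pool if f not in subset]
        pairings_k = pairing_gen(set(subset))
        covered_k = set().union(*(p[0] for p in pairings_k))
        pool.extend(f for f in subset if f not in covered_k)  # add uncovered flights back
        if covered_k:
            ifs.extend(cheapest_cover(covered_k, pairings_k))
            covered |= covered_k
        if covered >= target:
            return ifs
        if not pool:
            pool = list(all_flights)  # assumption: refill with the full flight set
    raise RuntimeError("IPDCH sketch did not cover all flights within max_iterations")


def toy_pairing_gen(subset: Set[str]) -> List[Pairing]:
    """Made-up generator: every flight alone (cost 3) or with its sorted neighbour (cost 4)."""
    ordered = sorted(subset)
    singles = [(frozenset([f]), 3.0) for f in ordered]
    doubles = [(frozenset([a, b]), 4.0) for a, b in zip(ordered, ordered[1:])]
    return singles + doubles


demo_flights = ["F1", "F2", "F3", "F4", "F5", "F6"]
demo_ifs = ipdch(demo_flights, k=4, pairing_gen=toy_pairing_gen, rng=random.Random(0))
demo_cost = sum(cost for _, cost in demo_ifs)
```

In a real setting the exhaustive search would be replaced by a set-covering IPP handed to a MIP solver, and the choice of $K$ controls how large each such IPP becomes.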
The search for a minimal-cost, full flight-coverage CPOP solution is enabled by an Optimization Engine. It tackles the underlying LPP and IPP through intermittent interactions of two submodules, namely, CG-driven LPP-solutioning and IPP-solutioning, tracked by a counter $T$. These submodules are presented below.

As illustrated in Figure 2, this submodule entails several iterations (each referred to as an LPP iteration, and tracked by $t$), in each of which: (a) an LP solver is invoked on the input pairing set, leading to the current LPP solution $\mathcal{P}^t_{LP}$, (b) the corresponding dual of the LPP is formulated using $\mathcal{P}^t_{LP}$, which is then solved to fetch dual variables (given by vector $Y^t$), and (c) a fresh set of pairings $\mathcal{P}^t_{CG}$, that promises associated cost-improvement, is generated using a domain-knowledge driven CG heuristic. For the first LPP iteration ($t = 1$), the input to the LP solver is either $\mathcal{P}_{IFS}$ if $T = 1$, or $\mathcal{P}^{T-1}_{IP}$ if $T > 1$. For any subsequent LPP iteration ($t > 1$), the input comprises $\mathcal{P}^{t-1}_{CG}$ and $\mathcal{P}^{t-1}_{LP}$ from the preceding LPP iteration. In this background, each of these LPP iterations is implemented in the following three phases. (For ease of reference, the notations introduced in these phases are kept independent of the LPP iteration counter $t$. However, these notations are super-scripted by $t$ in the corresponding discussions and pseudocodes with reference to a particular LPP iteration.)
• In the first phase, a primal of the LPP (Equations 4 to 6) is formulated from the input pairing set, and is solved using an interior-point method based commercial off-the-shelf LP solver (Gurobi Optimization, 2019). In the resulting LPP solution, a primal variable $x_j$, varying from 0 to 1, is assigned to each pairing $p_j$ in the input pairing set. These $x_j$s together constitute the primal vector, notated as $X$ ($= [x_1\ x_2\ x_3\ ...\ x_P]^{\mathsf{T}}$). The set of $x_j$s with non-zero values ($x_j \neq 0$) and the set of corresponding pairings are notated as $X_{LP}$ and $\mathcal{P}_{LP}$, respectively.

$$\text{Minimize } Z^p_{LP} = \sum_{j=1}^{P} c_j x_j + \psi_D \cdot \left( \sum_{i=1}^{F} \left( \sum_{j=1}^{P} a_{ij} x_j - 1 \right) \right) = \sum_{j=1}^{P} \left( c_j + \psi_D \cdot \sum_{i=1}^{F} a_{ij} \right) x_j - F \cdot \psi_D, \tag{4}$$

$$\text{subject to } \sum_{j=1}^{P} a_{ij} x_j \geq 1, \quad \forall i \in \{1, 2, ..., F\} \tag{5}$$

$$x_j \in [0, 1] \subset \mathbb{R}, \quad \forall j \in \{1, 2, ..., P\} \tag{6}$$

It is to be noted that the minimization of $Z^p_{LP}$ will always lead to a solution with all primal variables $x_j \leq 1$, even without explicitly involving the corresponding constraint, Equation 6 (Vazirani, 2003). Hence, the contribution of each pairing in the LPP solution, given by its $x_j$, could be effectively treated as $x_j \in \mathbb{R}_{\geq 0}$ instead of Equation 6.
• In the second phase, dual variables are extracted from the current LPP solution. For this, the dual of the LPP (Equations 7 to 9) is formulated using the pairing set $\mathcal{P}_{LP}$, and is solved using an interior-point method (Andersen & Andersen, 2000) based non-commercial LP solver (Virtanen et al., 2020), to fetch the optimal dual solution. In that, a dual variable $y_i$ represents a shadow price corresponding to the $i^{th}$ flight-coverage constraint in the primal. The optimal dual vector, constituted by all $y_i$s in the optimal dual solution, is notated as $Y$ ($= [y_1\ y_2\ y_3\ ...\ y_F]^{\mathsf{T}}$), whose dimension is equal to $F$.

$$\text{Maximize } Z^d_{LP} = \sum_{i=1}^{F} y_i - F \cdot \psi_D, \tag{7}$$

$$\text{subject to } \sum_{i=1}^{F} a_{ij} y_i \leq \left( c_j + \psi_D \cdot \sum_{i=1}^{F} a_{ij} \right), \quad \forall j \in \{1, 2, ..., P_{LP}\} \tag{8}$$

$$y_i \in \mathbb{R}_{\geq 0}, \quad \forall i \in \{1, 2, ..., F\} \tag{9}$$

where $P_{LP}$ is the number of pairings in the set $\mathcal{P}_{LP}$, and $y_i$ is the dual variable corresponding to the $i^{th}$ flight-coverage constraint.
Notably, in a conventional approach, the optimal $Y$ is directly computed from the optimal basis of the primal solution (obtained in the first phase), using the principles of duality theory, particularly the theorem of complementary slackness (Bertsimas & Tsitsiklis, 1997), without explicitly solving the corresponding dual.
However, in the second phase, solving the dual explicitly using the interior-point method (Andersen & Andersen, 2000), in a sense, helps in stabilizing the oscillating behavior of dual variables over the successive LPP iterations (bang-bang effect, as discussed in Section 2.3). Moreover, this interior-point method is available via only a non-commercial LP solver (Virtanen et al., 2020), and to ensure a time-efficient search, the above dual is formulated using the pairings $\in \mathcal{P}_{LP}$, instead of pairings from the large-sized input pairing set.
• In the last phase, the availability of dual variables from the second phase paves the way for solution to the pricing sub-problem. It is aimed to generate those legal pairings (non-basic), which if included as part of the input to the next LPP iteration, promise a better-cost (at least a similar-cost) LPP solution compared to the current solution. Such non-basic pairings are identified using a reduced cost metric, given by $\mu_j$ (Equation 10), which if negative (as CPOP is a minimization problem) indicates the potential in the pairing to further reduce the cost of the current LPP solution $Z^p_{LP}$, when included in the current basis (Bertsimas & Tsitsiklis, 1997). Moreover, the potential of such a pairing to further reduce the current $Z^p_{LP}$ is in proportion to the magnitude of its $\mu_j$ value.

$$\mu_j = c_j - \mu d_j, \quad \text{where } \mu d_j = \sum_{i=1}^{F} \left( a_{ij} \cdot y_i \right) = \sum_{f_i \in p_j} y_i \quad (\text{represents the dual cost component of } \mu_j) \tag{10}$$

As mentioned in Section 2.3, the standard CG practices generate a complete pricing network and solve it as a resource-constrained shortest-path optimization problem, to identify only the pairing(s) with negative reduced cost(s). However, generation of a complete pricing network for CPOPs with large-scale and complex flight networks is computationally intractable.
To overcome this challenge, a domain-knowledge driven CG heuristic (Aggarwal, Saxena, et al., 2020a) is employed here to generate a set of promising pairings (of pre-defined size, the criterion for which is discussed in Section 4.2). Notably, the merit of this CG heuristic lies in the fact that from within the larger pool of pairings with negative $\mu_j$, besides selecting pairings randomly, it also selects pairings in a guided manner. In that, the selection of such pairings is guided by optimal solution features at a set level and an individual pairing level, and re-utilization of the past computational efforts. These optimal solution features are related to the minimization of deadheads and maximization of the crew utilization, respectively. In essence, while the standard CG practices present equal opportunity for any pairing with a negative $\mu_j$ to qualify as an input for the next LPP iteration, this CG heuristic, besides ensuring that the pairings have negative $\mu_j$, prioritizes some pairings over the others via its two-pronged strategy: exploration of the new pairings' space and re-utilization of pairings from the past LPP iterations. In that:
  – the exploration of the new pairings' space is guided by three CG strategies, which are elaborated below.
    ◦ Deadhead Reduction strategy ($CGD$): this strategy prioritizes a set of legal pairings that is characterized by low deadheads, a feature which domain knowledge recommends for optimality at a set level. To exploit this optimality feature, $CGD$ generates a new pairing set $\mathcal{P}_{CGD}$, which not only provides an alternative way to cover the flights involved in a subset of the current $\mathcal{P}_{LP}$, but also ensures that some of these flights get covered with zero deadheads. It promises propagation of the zero deadhead feature over successive LPP iterations, as: (a) $\mathcal{P}_{CGD}$ alongside the current $\mathcal{P}_{LP}$ forms a part of the input for the next LPP iteration; (b) $\mathcal{P}_{CGD}$ provides a scope for better coverage (zero deadhead) of some flights, compared to the current $\mathcal{P}_{LP}$; and (c) $\mathcal{P}_{CGD}$ may focus on zero deadhead coverage for different flights in different LPP iterations.
    ◦ Crew Utilization enhancement strategy ($CGU$): this strategy prioritizes a set of legal pairings each member of which is characterized by high crew utilization, a feature which domain knowledge recommends for optimality at an individual pairing level. To exploit this optimality feature, $CGU$: (a) introduces a new measure, namely, crew utilization ratio, given by $\gamma_j$ (Equation 11), to quantify the degree of crew utilization in a pairing $p_j$ at any instant; (b) identifies pairings from the current $\mathcal{P}_{LP}$, which are characterized by a high dual cost component ($\mu d_j$, Equation 10), reflecting in turn on those constitutive flights that have high values of dual variables $y_i$, and hence, on the potential of these flights to generate new pairings with more negative $\mu_j$; and (c) utilizes these flights to generate promising pairings from which only the ones with high $\gamma_j$ are picked to constitute the new pairing set $\mathcal{P}_{CGU}$.

$$\gamma_j = \frac{1}{\text{Number of duties in } p_j} \cdot \sum_{d \in p_j} \frac{\text{Working hours in duty } d}{\text{Permissible hours of duty } d} \tag{11}$$

In doing so, $CGU$ promises propagation of the higher crew utilization ratio over successive LPP iterations, given that in each LPP iteration, $\mathcal{P}_{CGU}$ alongside the current $\mathcal{P}_{LP}$ forms a part of the input for the next LPP iteration.
    ◦ Random exploration strategy ($CGR$): this strategy, unlike $CGU$ and $CGD$ which are guided by optimal solution features, pursues random and unbiased exploration of the new pairings' space, independent of the current LPP solution. It involves generation of new pairings for a randomly selected set of legal duties, from which only the pairings with negative reduced cost are selected to constitute the new pairing set $\mathcal{P}_{CGR}$. Here, a random set of legal duties is used instead of a random set of flights, as the former has a higher probability of generating legal pairings, given that a majority of pairing legality constraints get satisfied with the preprocessing of legal duties.
  – the re-utilization of pairings from the past LPP iterations is guided by an Archiving strategy ($CGA$), which prioritizes a set of legal pairings comprising those flight-pairs which, as per the existing LPP solution, bear better potential for improvement in the objective function. Such a pairing set, originating from the flight-pair level information, is extracted from an archive (denoted by $\mathcal{A}$) of the previously generated pairings. In doing so, $CGA$ facilitates re-utilization of the past computational efforts, by providing an opportunity for a previously generated pairing to be re-inducted in the current pairing pool. For this, $CGA$:
    ◦ updates the archive $\mathcal{A}$ in each LPP iteration such that any pairing is stored/retrieved with reference to a unique index ($f_m$, $f_n$) reserved for any legal flight-pair in that pairing.
    ◦ introduces a new measure, namely, reduced cost estimator, given by $\eta_{mn}$ (Equation 12), for a flight-pair ($f_m$, $f_n$) in $\mathcal{A}$. In each LPP iteration, this estimator is computed for all the flight-pairs present in $\mathcal{A}$, by fetching $f_m$, $f_n$, $y_m$ and $y_n$.

$$\eta_{mn} = \text{flying\_cost}(f_m) + \text{flying\_cost}(f_n) - y_m - y_n = \sum_{i \in \{m, n\}} \left( \text{flying\_cost}(f_i) - y_i \right) \tag{12}$$

Notably, this formulation is analogous to Equation 10, except that instead of the complete cost of a pairing, only the flying costs corresponding to the flights in a legal flight-pair are accounted for. Given this, $\eta_{mn}$ may be seen as an indicator of $\mu_j$ at the flight-pair level.
    ◦ recognizes that towards further improvement in the current LPP solution, it may be prudent to include, as a part of the input for the next LPP iteration, the new pairing set $\mathcal{P}_{CGA}$, constituted by preferentially picking pairings from $\mathcal{A}$ that cover flight-pairs with lower $\eta_{mn}$ value.
In doing so, $CGA$ pursues the goal of continual improvement in the objective function, while relying on the flight-pair level information embedded in the LPP solution of the current LPP iteration, and re-utilizing the computational efforts spent till that LPP iteration.
For further details of the above domain-knowledge driven CG heuristic, interested readers are referred to the authors' previous work, Aggarwal, Saxena, et al. (2020a). Once this CG heuristic generates a set of promising pairings $\mathcal{P}_{CG}$ of pre-defined size, it is merged with the current $\mathcal{P}_{LP}$, and fed as the input to the next LPP iteration ($t$ += 1).

These LPP iterations are repeated until the termination criterion of this submodule is met. In that, the cost-improvement over LPP iterations is observed, and if it falls below a pre-specified cost-threshold, say $Th_{cost}$, over a pre-specified number of successive LPP iterations, say $Th_t$, then the submodule is terminated. The settings of these pre-specified limits, $Th_{cost}$ and $Th_t$, are highlighted in Section 4.2. After termination, the final LPP solution $\mathcal{P}^T_{LP}$ is passed over to the IPP-solutioning submodule for its integerization.
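The three measures used by the CG heuristic (Equations 10 to 12) are simple to compute once the dual variables are available. The sketch below is illustrative only: the function names, the dictionary-based representation of duals and flying costs, and all numbers are assumptions chosen for the example, and none of it reflects how the authors store pairings or duties.

```python
from typing import Dict, Iterable, Sequence, Tuple


def reduced_cost(pairing_cost: float, flights: Iterable[str], duals: Dict[str, float]) -> float:
    """Equation 10: pairing cost minus the sum of duals of the flights it covers."""
    return pairing_cost - sum(duals[f] for f in flights)


def crew_utilization_ratio(duties: Sequence[Tuple[float, float]]) -> float:
    """Equation 11: mean of working hours / permissible hours over the duties of a pairing.

    Each duty is given as a (working_hours, permissible_hours) tuple (assumed format).
    """
    return sum(worked / allowed for worked, allowed in duties) / len(duties)


def reduced_cost_estimator(
    f_m: str, f_n: str, flying_cost: Dict[str, float], duals: Dict[str, float]
) -> float:
    """Equation 12: flight-pair level indicator built from flying costs and duals."""
    return sum(flying_cost[f] - duals[f] for f in (f_m, f_n))


# Made-up numbers, purely for illustration.
demo_duals = {"F1": 4.0, "F2": 3.0, "F3": 1.0}
demo_mu = reduced_cost(6.0, ["F1", "F2"], demo_duals)            # 6 - (4 + 3)
demo_gamma = crew_utilization_ratio([(6.0, 8.0), (4.0, 8.0)])    # (0.75 + 0.5) / 2
demo_eta = reduced_cost_estimator("F1", "F2", {"F1": 2.0, "F2": 2.5}, demo_duals)
print(demo_mu, demo_gamma, demo_eta)
```

A pairing with negative `demo_mu` would be a candidate for the next LPP iteration; among such candidates, a strategy in the spirit of $CGU$ would prefer those with higher `demo_gamma`, and one in the spirit of $CGA$ would prefer archived pairings covering flight-pairs with lower `demo_eta`.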
This submodule receives as input the LPP solution $\mathcal{P}^T_{LP}$, and aims to find therein a full-coverage integer solution, notated as $\mathcal{P}^T_{IP}$. Towards it, an IPP (Equations 1 to 3) is formulated using $\mathcal{P}^T_{LP}$ and $\mathcal{F}$, and solved using a branch-and-cut algorithm based off-the-shelf commercial MIP solver (Gurobi Optimization, 2019). At each node of the MIP-search tree, this solver maintains a valid lower bound (cost of the LPP solution) and a best upper bound (cost of the IPP solution), and it self-terminates if the gap between these two bounds becomes zero, or all branches in the MIP-search tree have been explored. Considering that the MIP-search for large-scale CPOPs is extremely time-consuming, a pre-defined time limit, notated as $Th_{ipt}$ (setting highlighted in Section 4.2), is used to terminate this MIP solver, if it has not self-terminated before then. Once $\mathcal{P}^T_{IP}$ is obtained, it is passed back to the previous submodule for the next LPP-IPP interaction ($T$ += 1), only if the termination criterion of the Optimization Engine is not satisfied.

Overarching Optimization Engine
In the wake of the above, the procedure of the overarching Optimization Engine, formalized in Algorithm 4, is elaborated below. Its input involves the given flight set $\mathcal{F}$; the generated IFS $\mathcal{P}_{IFS}$; the pre-defined termination parameters, $Th_{cost}$ & $Th_t$ (for CG-driven LPP-solutioning) and $Th_{ipt}$ (for IPP-solutioning); and the sub-routines for the Legal Crew Pairing Generator (Pairing_Gen()) and the four CG strategies (CGD(), CGU(), CGR() and CGA()) in the proposed CG heuristic. In each LPP-IPP interaction of the Optimization Engine, first, the CG-driven LPP-solutioning is executed (lines 3-25). It entails several LPP iterations (tracked by $t$), in each of which the first step is to formulate the primal using $\mathcal{F}$ and the respective input pairing set. This input pairing set is:
• $\mathcal{P}_{IFS}$, if the first LPP iteration ($t = 1$) of the first LPP-IPP interaction ($T = 1$) is being executed (lines 5-6).
• $\mathcal{P}^{T-1}_{IP}$, if the first LPP iteration ($t = 1$) of any subsequent LPP-IPP interaction ($T > 1$) is being executed (lines 7-8).
• $\mathcal{P}^{t-1}_{CG} \cup \mathcal{P}^{t-1}_{LP}$, if any subsequent LPP iteration ($t > 1$) of any LPP-IPP interaction ($T \geq 1$) is being executed (lines 9-11).

Algorithm 4: Procedure for the Optimization Engine
Input: $\mathcal{F}$, $\mathcal{P}_{IFS}$, $Th_{cost}$, $Th_t$, $Th_{ipt}$, Pairing_Gen(), CGD(), CGU(), CGR(), CGA()
Output: $\mathcal{P}^{\star}_{IP}$
 1  $T$ ← 1
 2  while termination criterion of Optimization Engine is not met do
        ⊳ CG-driven LPP-solutioning:
 3      $t$ ← 1
 4      while termination criterion of CG-driven LPP-solutioning is not met do
 5          if $t = 1$ and $T = 1$ then
 6              Formulate the primal of the LPP using $\mathcal{P}_{IFS}$ and $\mathcal{F}$
 7          else if $t = 1$ and $T > 1$ then
 8              Formulate the primal of the LPP using $\mathcal{P}^{T-1}_{IP}$ and $\mathcal{F}$
 9          else
10              Formulate the primal of the LPP using $\mathcal{P}^{t-1}_{CG} \cup \mathcal{P}^{t-1}_{LP}$ and $\mathcal{F}$
11          end
12          $\mathcal{P}^t_{LP}$, $X^t_{LP}$ ← Solve the primal using the interior-point method based LP solver
            ⊳ Termination of the CG-driven LPP-solutioning:
13          if cost-improvements ≤ $Th_{cost}$ over last $Th_t$ number of successive LPP iterations then
14              $\mathcal{P}^T_{LP}$ ← $\mathcal{P}^t_{LP}$
15              Break
16          end
17          Formulate the dual of the LPP using $\mathcal{F}$ and $\mathcal{P}^t_{LP}$
18          $Y^t$ ← Solve the dual using the interior-point method based LP solver
            ⊳ Solution to pricing sub-problem using the CG heuristic:
19          $\mathcal{P}^t_{CGD}$ ← CGD($\mathcal{P}^t_{LP}$, $X^t_{LP}$, $Y^t$, …)
20          $\mathcal{P}^t_{CGU}$ ← CGU($\mathcal{P}^t_{LP}$, $X^t_{LP}$, $Y^t$, …)
21          $\mathcal{P}^t_{CGR}$ ← CGR($Y^t$, …)
22          $\mathcal{P}^t_{CGA}$ ← CGA($\mathcal{P}^t_{LP}$, $X^t_{LP}$, $Y^t$, …)
23          $\mathcal{P}^t_{CG}$ ← $\mathcal{P}^t_{CGD} \cup \mathcal{P}^t_{CGU} \cup \mathcal{P}^t_{CGR} \cup \mathcal{P}^t_{CGA}$
24          $t$ += 1
25      end
        ⊳ IPP-solutioning:
26      Formulate the IPP using $\mathcal{P}^T_{LP}$ and $\mathcal{F}$
27      $\mathcal{P}^T_{IP}$ ← Solve the IPP using a branch-and-cut algorithm based MIP solver until its run-time becomes ≥ $Th_{ipt}$
        ⊳ Termination of the Optimization Engine:
28      if $Z^T_{IP}$ (cost of $\mathcal{P}^T_{IP}$) = $Z^T_{LP}$ (cost of $\mathcal{P}^T_{LP}$) then
29          $\mathcal{P}^{\star}_{IP}$ ← $\mathcal{P}^T_{IP}$
30          Break
31      end
32      $T$ += 1
33  end
34  return $\mathcal{P}^{\star}_{IP}$

Once the primal is formulated, it is solved using the corresponding LP solver to obtain the current optimal LPP solution, constituted by $\mathcal{P}^t_{LP}$ and $X^t_{LP}$ (line 12). Subsequently, the termination criterion of CG-driven LPP-solutioning is checked (lines 13-16). If it is met, then the current LPP solution $\mathcal{P}^t_{LP}$ is fetched as the final LPP solution $\mathcal{P}^T_{LP}$ of this LPP-IPP interaction. If not, then a dual is formulated using $\mathcal{P}^t_{LP}$ and $\mathcal{F}$ (line 17), which is then solved using the corresponding LP solver to obtain the current optimal dual vector $Y^t$ (line 18). Using the current $\mathcal{P}^t_{LP}$, $X^t_{LP}$ and $Y^t$, a fresh set of pairings $\mathcal{P}^t_{CG}$ is obtained using the CG heuristic, which is constituted by the new pairing sets from the four underlying CG strategies (lines 19-23). At the end of the LPP iteration $t$, the fresh set of pairings $\mathcal{P}^t_{CG}$ is combined with the current $\mathcal{P}^t_{LP}$ to serve as the input pairing set for the subsequent LPP iteration ($t$ += 1). Once this submodule is terminated, the resulting $\mathcal{P}^T_{LP}$ is passed over to the IPP-solutioning for its integerization, wherein the MIP solver is used to obtain the IPP solution $\mathcal{P}^T_{IP}$ (lines 26 and 27). In that, the pre-defined $Th_{ipt}$ time-limit is used to terminate the MIP-search, if it has not self-terminated before then. Subsequently, the resulting $\mathcal{P}^T_{IP}$ is passed back to the CG-driven LPP-solutioning for the next LPP-IPP interaction ($T$ += 1), or returned as the final integer solution $\mathcal{P}^{\star}_{IP}$, depending upon the termination condition of the Optimization Engine (lines 28-32). In that, if the cost of $\mathcal{P}^T_{IP}$ ($Z^T_{IP}$) matches the cost of $\mathcal{P}^T_{LP}$ ($Z^{p,T}_{LP}$), then the Optimization Engine is terminated.
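The nested control flow of the Optimization Engine can be summarized by the following Python skeleton. It is a sketch under stated assumptions rather than the actual implementation: the LP, dual, CG and MIP steps are injected as callables (`solve_lp`, `solve_dual`, `cg_heuristic`, `solve_ip`, all hypothetical names), pairings are reduced to (label, cost) tuples, cost equality is tested with a small tolerance `tol`, and the `max_interactions` cap mirrors the practical limit of 30 LPP-IPP interactions mentioned in the computational setup. The `demo_*` stubs are toy stand-ins with no optimization content.

```python
from typing import Callable, Dict, List, Tuple

Pairing = Tuple[str, float]  # (label, cost); a deliberately minimal stand-in


def lpp_stalled(costs: List[float], th_cost: float, th_t: int) -> bool:
    """True if each of the last `th_t` cost-improvements is at most `th_cost`."""
    if len(costs) <= th_t:
        return False
    recent = costs[-(th_t + 1):]
    return all(prev - cur <= th_cost for prev, cur in zip(recent, recent[1:]))


def optimization_engine(
    ifs: List[Pairing],
    solve_lp: Callable[[List[Pairing]], Tuple[List[Pairing], float]],
    solve_dual: Callable[[List[Pairing]], Dict[str, float]],
    cg_heuristic: Callable[[List[Pairing], Dict[str, float]], List[Pairing]],
    solve_ip: Callable[[List[Pairing]], Tuple[List[Pairing], float]],
    th_cost: float,
    th_t: int,
    max_interactions: int = 30,
    tol: float = 1e-9,  # assumption: tolerance for comparing IPP and LPP costs
) -> Tuple[List[Pairing], float, int]:
    pairings_ip, cost_ip = list(ifs), float("inf")
    for interaction in range(1, max_interactions + 1):  # counter T
        pool = list(pairings_ip)  # IFS when T = 1, previous IPP solution otherwise
        costs: List[float] = []
        while True:  # LPP iterations, counter t
            pairings_lp, cost_lp = solve_lp(pool)
            costs.append(cost_lp)
            if lpp_stalled(costs, th_cost, th_t):
                break
            duals = solve_dual(pairings_lp)
            pool = cg_heuristic(pairings_lp, duals) + pairings_lp  # fresh pairings + LPP solution
        pairings_ip, cost_ip = solve_ip(pairings_lp)
        if abs(cost_ip - cost_lp) <= tol:
            break
    return pairings_ip, cost_ip, interaction


# Toy stubs: the "LP" keeps the two cheapest pairings, the "CG" proposes a cheaper one.
def demo_solve_lp(pool):
    kept = sorted(pool, key=lambda p: p[1])[:2]
    return kept, sum(cost for _, cost in kept)


def demo_solve_dual(pairings_lp):
    return {}  # duals are unused by the toy CG stub


def demo_cg(pairings_lp, duals):
    return [("new", max(min(cost for _, cost in pairings_lp) - 1.0, 1.0))]


def demo_solve_ip(pairings_lp):
    return pairings_lp, sum(cost for _, cost in pairings_lp)


demo_solution, demo_cost, demo_interactions = optimization_engine(
    [("a", 5.0), ("b", 4.0)], demo_solve_lp, demo_solve_dual, demo_cg, demo_solve_ip,
    th_cost=0.5, th_t=2,
)
```

Passing the solvers in as callables keeps the skeleton independent of any particular LP or MIP library, so the same loop structure could wrap commercial or open-source solvers.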
4. Computational Experiments
This section first presents the test cases and the computational setup, used to investigate the utility of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃, its modules, and their interactions. Subsequently, the settings of parameters involved in different modules of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 are presented. Lastly, the experimental results are discussed.
The real-world airline test cases, used for experimentation, are detailed in Table 2. Each of these test cases involvesa weekly flight schedule, and have been provided by the research consortium’s industrial sponsor (from the networksof US-based airlines). The columns in Table 2, in order of their occurrence, highlight the notations for the different
Table 2
Real-world airline test cases used in this research work
Test Cases
Flights
Crew Bases
Airports
Legal Duties
TC-1 3202 15 88 454205TC-2 3228 15 88 464092TC-3 3229 15 88 506272TC-4 3265 15 90 446937TC-5 4212 15 88 737184 (a) (b)
Figure 3: (a) Geographical representation of TC-5 flight network, where the red nodes, green edges and yellow nodesrepresent the airports, scheduled flights and crew bases, respectively, and (b) legal flight-connections, each represented bya point in the plot, where for a flight marked on the y-axis, the connecting flight is marked on the x-axis. test cases; the number of its constituent flights; the number of constituent crew bases; and the total number of legalduties involved, respectively. It is critical to recognize that the challenge associated with solutioning of these testcases, depends not just on the number of flights involved but also to the fact that these flights are part of complex flightnetworks, characterized by a multiplicity of hubs as opposed to a single hub, and multiplicity of crew bases as opposed
D. Aggarwal et al.:
Preprint submitted to Elsevier
to a single crew base. In that, the number of legal pairings possible grows exponentially with the number of hubs and crew bases. As a sample instance, the geographical representation of the flight network associated with TC-5, and the legal flight connections involved in it, are portrayed in Figure 3. Notably, in Figure 3a, the presence of multiple hub-and-spoke subnetworks and multiple crew bases (highlighted in yellow color) is evident. Furthermore, the pattern visible in Figure 3b could be attributed to the (minimum and maximum) limits on the sit-time and overnight-rest constraints. For instance, a flight, say 𝑓, has legal connections only with those flights that depart from the arrival airport of 𝑓, and whose departure-time gap (difference between the connecting flight's departure time and the arrival time of 𝑓) lies within the minimum and maximum allowable limits of the sit-time or the overnight-rest.

All the experiments in this research have been performed on an HP Z640 Workstation, which is powered by two Intel® Xeon® E5-2630v3 processors, each with 16 cores at 2.40 GHz, and 96 GB of RAM. All codes related to 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 have been developed using the Python scripting language in alignment with the industrial sponsor's larger vision and preference. Furthermore:
• the interior-point method from Gurobi Optimizer 8.1.1 (Gurobi Optimization, 2019) is used to solve the primal in the CG-driven LPP-solutioning submodule.
• the interior-point method (Andersen & Andersen, 2000) from SciPy's linprog library (Virtanen et al., 2020) is used to solve the dual in the CG-driven LPP-solutioning submodule.
• the branch-and-cut algorithm based MIP solver from Gurobi Optimizer 8.1.1 is used to solve the IPP in the Initial Feasible Solution Generator and the IPP-solutioning submodule.
• an 𝐴𝑖𝑟𝐶𝑅𝑂𝑃-run, in principle, terminates when the cost of the IPP solution matches the cost of its input LPP solution in a particular LPP-IPP interaction. However, for practical considerations on the time-limit, an 𝐴𝑖𝑟𝐶𝑅𝑂𝑃-run is allowed to terminate if the IPP and LPP costs do not conform with each other even after 30 LPP-IPP interactions are over, or 30 hours of total run-time have elapsed.
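The run-level stopping rule in the last bullet above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Interaction` and `should_terminate`, and the tolerance `tol` used to decide that two costs "match", are hypothetical. Only the limits of 30 LPP-IPP interactions and 30 hours are taken from the text.

```python
from dataclasses import dataclass

MAX_INTERACTIONS = 30      # limit on LPP-IPP interactions (from the text)
MAX_RUNTIME_HOURS = 30.0   # limit on total run-time in hours (from the text)


@dataclass
class Interaction:
    """One LPP-IPP interaction: cost of the LP solution and of its integerization."""
    lp_cost: float
    ip_cost: float


def should_terminate(history, elapsed_hours, tol=0.5):
    """Return (stop, reason) given the interactions completed so far.

    `tol` is an assumed tolerance (in USD) for treating the IPP cost as
    matching the cost of its input LPP solution.
    """
    if history and abs(history[-1].ip_cost - history[-1].lp_cost) <= tol:
        return True, "ip_matches_lp"
    if len(history) >= MAX_INTERACTIONS:
        return True, "interaction_limit"
    if elapsed_hours >= MAX_RUNTIME_HOURS:
        return True, "time_limit"
    return False, "continue"
```

In such a sketch the check would be evaluated once after every IPP-solutioning phase, so that a run stopped by either practical limit still returns the best integer solution found so far.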
The settings of the parameters associated with different modules and submodules of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 are as highlighted below.
• Initial Feasible Solution Generator: here, the proposed IPDCH involves the decomposition parameter 𝐾, which regulates the size of flight subsets formed in each IPDCH-iteration. As mentioned before, the setting of 𝐾 is dependent on the characteristics of the input flight dataset and the configuration of available computational resources. Here, the aim is to cover all given flights in a time-efficient manner. Hence, it is important to understand the effect of the setting of 𝐾 on the time-performance of IPDCH, which is highlighted below.
  – For a relatively lower value of 𝐾, smaller flight subsets with fewer legal flight-connections would be formed in each IPDCH-iteration, leading to coverage of relatively fewer unique flights in each of them. This by itself is not a challenge, but it would necessitate a significant number of additional IPDCH-iterations (and the respective run-time), since the number of unique flights covered per IPDCH-iteration, which by construct reduces with the iterations, would get further reduced with relatively smaller flight subsets.
  – On the flip side, for a relatively higher value of 𝐾, bigger flight subsets would be formed that would lead to coverage of a higher number of unique flights per IPDCH-iteration. Though this may reduce the total number of IPDCH-iterations required to generate the desired IFS, the overall run-time of the IPDCH may increase drastically. The rationale is that with a bigger flight subset in each IPDCH-iteration, the number of possible legal pairings would increase drastically, leading to huge run-time for their generation as well as for the subsequent MIP-search.
  The above considerations suggest that 𝐾 should be reasonably-sized. Considering the given computational resources and the results of initial exploration around the possible number of pairings for differently-sized flight sets, the value of 𝐾 in each IPDCH-iteration is guided by a random integer between one-eighth and one-fourth of the size of the input flight set. It may be noted that this setting of 𝐾 has been selected considering the scale and complexity of the given test cases, and it needs to be re-visited if the scale and complexity of the flight network changes drastically.
• CG-driven LPP-solutioning: The parameters involved in the termination criterion for this submodule, $Th_{cost}$ and $Th_t$, are set as 100 USD and 10 iterations respectively, to achieve an LPP solution with a sufficiently good cost in a reasonably good time. The sensitivity of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃's performance to these parameters is discussed in Section 4.3.4. Moreover, the effect of the size of 𝑡𝐶𝐺 on the performance of this submodule (the final LPP solution's cost and required run-time), and on the demand on the computational resources (dominantly, RAM), is highlighted below.
  – For a relatively small-sized 𝑡𝐶𝐺, the alternative pairings available to foster further cost improvement shall be quite limited, amounting to smaller cost benefits in each phase of the CG-driven LPP-solutioning. This would necessitate far more LPP-IPP interactions to reach the near-optimal cost. This per se is not a challenge; however, a significant amount of additional run-time may be required, since: (a) each call for CG-driven LPP-solutioning demands a minimum of 10 LPP iterations before it could be terminated, and (b) such calls, when invoked repeatedly, may consume significant run-time, yet without reasonable cost benefit.
  – On the other hand, for a very large-sized 𝑡𝐶𝐺, though the potential for significant cost benefits may exist, the demand on the RAM may become overwhelming for any CG-driven LPP-solutioning phase to proceed.
  The above considerations suggest that the size of 𝑡𝐶𝐺 may neither be too small nor too large. Factoring these, the experiments here aim at 𝑡𝐶𝐺 sized at approximately a million pairings (significant size, yet not overwhelming for 96 GB RAM). Furthermore, for a search that is not biased in favor of any particular CG strategy, the number of pairings contributed by each CG strategy towards the overall CG heuristic is kept equal.
• IPP-solutioning: As mentioned before, the MIP-search on a large-scale IPP is time-intensive. Hence, the termination parameter $Th_{ipt}$, which restricts the run-time of any IPP-solutioning phase if not self-terminated a priori, is reasonably set as 20 minutes, and its effect on 𝐴𝑖𝑟𝐶𝑅𝑂𝑃's performance is discussed in Section 4.3.4.
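Two of the settings above lend themselves to a short sketch: the random choice of 𝐾 and the termination test of the CG-driven LPP-solutioning. The code below is an illustration under assumptions, not the authors' implementation. The function names `sample_k` and `cg_should_stop` are hypothetical, and the termination rule is read here as "each of the last $Th_t$ LPP iterations improved the cost by less than $Th_{cost}$", which is one plausible interpretation of the criterion. The list `lp_costs` is assumed to start with the cost of the input solution, followed by the cost after each LPP iteration.

```python
import random


def sample_k(num_flights, rng):
    """Draw K as a random integer between one-eighth and one-fourth
    of the size of the input flight set (bounds inclusive)."""
    low = max(1, num_flights // 8)
    high = max(low, num_flights // 4)
    return rng.randint(low, high)


def cg_should_stop(lp_costs, th_cost=100.0, th_t=10):
    """Assumed reading of the CG termination rule.

    Stops once each of the last `th_t` LPP iterations reduced the cost by
    less than `th_cost` (USD). With th_t = 10 this needs at least 10
    completed iterations, in line with the minimum noted in the text.
    """
    if len(lp_costs) < th_t + 1:
        return False
    recent = lp_costs[-(th_t + 1):]
    improvements = [prev - cur for prev, cur in zip(recent, recent[1:])]
    return all(delta < th_cost for delta in improvements)
```

Passing an explicitly seeded `random.Random` instance to `sample_k` keeps the draw of 𝐾 repeatable across runs, which matters for the variability study discussed later in this section.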
This section presents the experimental results and associated inferences, in the order highlighted below.
1. The performance of the proposed 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 on the given test cases with the aforementioned parameter settings is discussed.
2. The phenomenon referred to as performance variability (Lodi & Tramontani, 2013) is discussed in the context of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃. This aspect is pertinent since some variability in performance (even for the same random seed) is inevitable owing to 𝐴𝑖𝑟𝐶𝑅𝑂𝑃's reliance on the mathematical programming solvers, which over the different runs may pick different permutations of the rows (flight-coverage) or columns (pairings).
3. The impact of the initialization methods: (a) the proposed IPDCH, (b) an Enhanced-DFS heuristic, earlier proposed by the authors (Aggarwal et al., 2018), and (c) a commonly adopted Artificial Pairings method (Hoffman & Padberg, 1993; Vance et al., 1997), on the final performance of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 is investigated.
4. The sensitivity of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃's performance to the termination parameters in the Optimization Engine's submodules (CG-driven LPP-solutioning and IPP-solutioning) is discussed.
The results of the 𝐴𝑖𝑟𝐶𝑅𝑂𝑃-runs on the given test cases (TC-1 to TC-5) with the aforementioned parameter settings are reported in Table 3. In that, for each test case:
• the first row marked by “𝐼𝐹𝑆” highlights the cost associated with the IFS that initializes the 𝐴𝑖𝑟𝐶𝑅𝑂𝑃-run and the run-time consumed in its generation.
• the subsequent rows present the results of the LPP-IPP interactions (marked by the counter 𝑇). In that, for a particular 𝑇, the cost of the LP-solution passed on for its integerization and the associated time are highlighted. Also, the cost of the IP-solution returned and the associated time are highlighted. Here, the unit of cost is USD, and the time corresponds to the HH:MM format.
• the final crew pairing solution (⋆𝐼𝑃) is highlighted in the last row (emboldened) marked by “Final Solution”.
Table 3
𝐴𝑖𝑟𝐶𝑅𝑂𝑃's performance∗ on the given test cases

LPP-IPP Interactions (𝑇; 𝑇𝐿𝑃 ∕ 𝑇𝐼𝑃)   TC-1              TC-2              TC-3              TC-4              TC-5
                                      Cost      Time    Cost      Time    Cost      Time    Cost      Time    Cost      Time
[𝐼𝐹𝑆 row and the 𝐿𝑃 and 𝐼𝑃 rows for each 𝑇: values missing]
Final Solution                        3473238   10:05   3497106   08:46   3490420   12:55   3604753   09:24   4595613   22:52

∗ All values in the “Cost” columns are in USD, and all corresponding real values are rounded-off to the next integer values. All values in the “Time” columns are in HH:MM format, and all corresponding seconds' values are rounded-off to the next minute values.
It may be noted that the experimental results in the subsequent sections are presented in the same format, unless any digression is specifically highlighted.

The above results have been tested by the research consortium's industrial sponsor, and verified to be highly competitive compared to the best practice solutions known, for different test cases. In general, the obtained solutions have been found to be superior by about 1.5 to 3.0% in terms of the hard cost, which reportedly is one of the most important solution quality indicators. For reference, a comparison of the obtained solution vis-à-vis the best known solution has been drawn for TC-5, in Table 4, where a significant difference in terms of the size of pairings can be observed. Notably, the key features contributing to lower hard cost relate to the presence of pairings with relatively lower TAFB, overnight rests and meal cost. However, the obtained solution also entails more crew changes, some of which (involving aircraft change) negatively impact the soft cost. Hence, there appears to be a trade-off between the hard cost and the soft cost.

This section investigates the sensitivity of
𝐴𝑖𝑟𝐶𝑅𝑂𝑃 with respect to the sources of variability over multiple runs, even for the same problem. This study assumes importance, considering that performance variability is rather inevitable when mathematical programming based solution approaches are employed (Koch et al., 2011). As cited by Lodi & Tramontani (2013), variability in the performance of LP & MIP solvers may be observed on changing the computing platform (which may change the floating-point arithmetic), permuting the constraints/variables of the respective mathematical models, or changing the pseudo-random numbers' seed. These changes/permutations may lead to an entirely different outcome of the respective search algorithms (LP & MIP), as highlighted below.
• The root source for the performance variability in MIP is imperfect tie-breaking. A majority of the decisions to be taken during an MIP-search are dependent on the ordering of the candidates according to an interim score as well as the selection of the best candidate (one with the best score value). A perfect score that could fully distinguish between the candidates is mostly not known due to the lack of theoretical knowledge, and even if it is known, it may be too expensive to compute. (For instance, in a strong branching scheme, the best variable to branch at each node is decided after simulating one level of branching for each fractional variable; however, it is performed heuristically to make it a computationally affordable task for MIP solvers (Linderoth & Lodi, 2011).) Furthermore, additional ties or tiebreaks could be induced by changing the floating-point operations, which inherently may change when the computing platform is changed. Amidst such imperfect tie-breaking, the permutation of the variables/constraints changes the path within the MIP-search tree, leading to a completely different evolution of the algorithm with rather severe consequences.
• Depending upon the floating-point arithmetic or the sequence of variables loaded in an LPP, the performance of the simplex and interior-point methods may vary.
• The performance of the LP and MIP solvers is also affected by the choice of pseudo-random numbers' seed, wherever the decisions are made heuristically. For instance, an interior-point method in the LP solvers performs a (random) crossover to one of the vertices of the optimal face when the search reaches its (unique) center.

Table 4
Salient features of ⋆𝐼𝑃 for TC-5: 𝐴𝑖𝑟𝐶𝑅𝑂𝑃's solution vis-à-vis the best practice solution

Features                            𝐴𝑖𝑟𝐶𝑅𝑂𝑃's solution    Best practice solution
Number of pairings                  926                    783
Number of unique flights covered    4,212                  4,212
Number of deadhead flights          3                      3
Number of overnight-rests           1,203                  1,279
Number of crew changes              1,002                  825
Average crew changes per pairing    1.082                  1.054
Total TAFB (HH:MM)                  37444:54               38189:39
Pairings covering 2 flights         303                    205
Pairings covering 3 flights         17                     31
Pairings covering 4 flights         170                    95
Pairings covering 5 flights         63                     37
Pairings covering 6 flights         202                    153
Pairings covering 7 flights         59                     62
Pairings covering 8 flights         83                     90
Pairings covering 9 flights         19                     49
Pairings covering 10 flights        8                      45
Pairings covering 11 flights        1                      10
Pairings covering 12 flights        1                      5
Pairings covering 13 flights        0                      0
Pairings covering 14 flights        0                      1
Hotel cost (USD)                    166,240                176,170
Meal cost (USD)                     157,269                160,397
Hard cost (USD)                     340,671                350,818
Soft cost (USD)                     51,600                 42,750
Actual flying cost (USD)            4,203,342              4,203,342
Total cost (USD)                    4,595,613              4,596,910

Table 5
Performance variability assessment for 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 on two test instances∗ (TC-2 and TC-5)

Test Case   LPP-IPP Interactions   Runs with performance variability    Runs without performance variability
            (𝑇; 𝑇𝐿𝑃 ∕ 𝑇𝐼𝑃)         Run-1             Run-2              Run (Seed-𝛼)      Run (Seed-𝛽)      Run (Seed-𝛾)
                                   Cost      Time    Cost      Time     Cost      Time    Cost      Time    Cost      Time
TC-2        [𝐼𝐹𝑆 row and the 𝐿𝑃 and 𝐼𝑃 rows for each 𝑇: values missing]
            Final Solution         3497106   08:46   3498588   12:05    3502118   11:05   3500504   09:38   3499063   12:27
TC-5        [𝐼𝐹𝑆 row and the 𝐿𝑃 and 𝐼𝑃 rows for each 𝑇: values missing]
            Final Solution         4595613   22:52   4598412   23:37    4594146   22:27   4591176   22:59   4591065   26:32

∗ All values in the “Cost” columns are in USD, and all the corresponding real values are rounded-off to the next integer values. All values in the “Time” columns are in HH:MM, and all the corresponding seconds' values are rounded-off to the next minute values.

In the above background, the plausible reasons for variability in 𝐴𝑖𝑟𝐶𝑅𝑂𝑃's performance are elaborated below.
• Generation of new legal pairings using a parallel architecture: in any LPP iteration 𝑡, new legal pairings are generated in parallel, by allocating the sub-processes to the idle cores of the CPU. These sub-processes return their respective pairing sets as soon as they terminate. This by itself is not a challenge; however, when 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 is re-run, the order in which these sub-processes terminate may not be the same as before (as it depends on the state of the CPU), permuting the pairings in the cumulative pairing set 𝑡𝐶𝐺. This permuted pairing set, when fed as part of the input to the LP solver in the next LPP iteration, may lead to a different LPP solution, leading to a different outcome of the subsequent 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 search. To curb this, the pairings in the set that trigger the LP solver are sorted in lexicographical order of their representative strings. These strings are constructed from the indices of the flights covered in the corresponding pairings. For instance, the string corresponding to a pairing that covers flights $f_a$, $f_b$, $f_c$ & $f_d$ is a_b_c_d. Given that the pairings are distinct, the resulting strings are distinct too, allowing for a crisp sorting criterion and ensuring a fixed pairing sequence in each 𝐴𝑖𝑟𝐶𝑅𝑂𝑃-run.
• Numerical seed for generation of pseudo-random numbers: variability may also be introduced if the numerical seed employed to generate pseudo-random numbers for use in the proposed modules or the utilized LP & MIP solvers varies. For instance, use of the default seed method of Python (i.e., the current time of the computing system) across different 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 runs may lead to different pseudo-random numbers each time. This in turn
would trigger variability in the IFS generated by IPDCH (since the random selection of flights in each of its iterations is impacted), and the pairing set resulting from the CG heuristic (since each of the underlying CG strategies is impacted). Such variability could be negated by use of a fixed numerical seed, instead of a time-dependent one.

The intriguing questions for researchers could relate to the impact that the presence or absence of causes of variability may have on the quality of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃's solutions, in terms of both cost and run-time. Table 5 attempts to shed light on these questions through empirical evidence for two test cases involving 3228 flights (TC-2) and 4212 flights (TC-5), respectively. In each of these test cases, the effect of variability is revealed through:
• two independent runs (Run-1 and Run-2), in each of which the causes of variability exist, that is: (a) the permutation of pairings generated using the parallel architecture is possible, and (b) the default seed method of Python, based on the time of the computing system, applies.
• three independent runs, in each of which the causes of variability have been eliminated, that is: (a) the lexicographical order of the pairings is imposed, and (b) a fixed numerical seed has been fed for random number generation. For these runs, the numerical seeds are given by 𝛼 = 0, 𝛽 = 1, and 𝛾 = 2, respectively.

The key observations and inferences that could be drawn from each test case in Table 5 are highlighted below.
• understandably, Run-1 and Run-2 (corresponding to the same numerical seed) yield solutions with different costs over different run-times. Importantly, the variation in cost (despite the presence of causes of variability) is not alarming, though significantly different run-times may be required.
• each run (corresponding to Seed-𝛼, Seed-𝛽, and Seed-𝛾, respectively) where the causes of variability have been negated, if repeated, yields the same cost solution in the same run-time, though this has not been shown in the table for paucity of space.
• the runs corresponding to the numerical seeds given by 𝛼, 𝛽, and 𝛾, respectively, differ solely due to the difference in the corresponding random numbers generated, and subsequently utilized. It can be observed that the change in numerical seed does not significantly affect the cost-quality of the final 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 solution, though the associated run-time may vary significantly.

The fact that 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 can offer final solutions with comparable cost quality, regardless of the presence or absence of causes of variability, endorses the robustness of the constitutive modules of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃. Also, the variation in run-time could be attributed to different search trajectories corresponding to different permutations of variables or different random numbers. It may be noted that for the subsequent runs the lexicographical order of the pairings and a fixed numerical seed (Seed-𝛼 = 0) have been utilized.

This section investigates the sensitivity of
𝐴𝑖𝑟𝐶𝑅𝑂𝑃 with respect to the cost quality of the initial solution and the run-time spent to obtain it. Towards it, the initial solution is obtained using three different methods (offering three input alternatives with varying cost and run-time) and the cost quality of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃's final solution alongside the necessary run-time is noted.

Notably, in an initial attempt to generate IFS for large-scale CPOPs, the authors proposed a DFS algorithm based heuristic, namely, Enhanced-DFS heuristic (Aggarwal et al., 2018). Its performance across the five test cases has been highlighted in Table 6. In that, TC-1 emerges as an outlier owing to alarmingly high run-time, when compared to all other test cases. A plausible explanation behind this aberration is that TC-1 involves some flights with very few legal flight connections, and a DFS based algorithm may have to exhaustively explore several flight connections, to be able to generate an IFS with full flight coverage. The need to do away with reliance on DFS so as to have equable run-time across different data sets explains the motivation for:
• proposition of IPDCH in this paper, which as highlighted in Section 3.2, relies on: (a) a divide-and-cover strategy to decompose the input flight schedule into sufficiently-small flight subsets, and (b) IP to find a lowest-cost pairing set that covers the maximum-possible flights for each of the decomposed flight subsets.
• consideration of a commonly adopted Artificial Pairings method (Vance et al., 1997), that constructs a pairing set which covers all the flights, though some/all the pairings may not be legal. Hence, for this method the initial solution would be referred to as 𝐼𝑆 instead of 𝐼𝐹𝑆.

Table 6
Performance of Enhanced-DFS heuristic (Aggarwal et al., 2018) for IFS generation. Here, the real valued “Cost” is rounded-off to the next integer value, and the seconds' in the “Time” column are rounded-off to the next minute values.

Test Cases   Time (HH:MM)   Cost (USD)
TC-2         [missing]      [missing]
TC-3         [missing]      [missing]
TC-4         [missing]      [missing]
TC-5         [missing]      [missing]

Table 7
Performance assessment of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 on TC-1 and TC-5 when initialized using the proposed IPDCH, the Artificial Pairings method, and the Enhanced-DFS heuristic.

LPP-IPP Interactions   TC-1                                                      TC-5
(𝑇; 𝑇𝐿𝑃 ∕ 𝑇𝐼𝑃)         Enhanced-DFS      IPDCH             Artificial Pairings   Enhanced-DFS      IPDCH             Artificial Pairings
                       Cost      Time    Cost      Time    Cost      Time        Cost      Time    Cost      Time    Cost      Time
[𝐼𝐹𝑆 ∕ 𝐼𝑆 row and the 𝐿𝑃 and 𝐼𝑃 rows for each 𝑇: values missing]
Final Solution         3469276   13:22   3469950   12:18   3470355   12:00       4592860   23:13   4594146   22:27   4597929   23:57

∗ All values in the “Cost” columns are in USD, where the real values are rounded-off to the next integer values. All values in the “Time” columns are in HH:MM, where the seconds' values are rounded-off to the next minute values.
A comparison of the above three methods has been drawn in Table 7, for TC-1 (posing a challenge to Enhanced-DFS) and TC-5 (largest flight set). In that, besides the cost and run-time of the initial solution for each test case, the results of all the iterations of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 leading up to the final solution have been presented. The latter is done to shed light on whether the cost quality of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃's final solution strongly depends on the cost of the initial solution. The prominent observations from Table 7 include:
• In terms of run-time: IPDCH could outperform the Enhanced-DFS, as its run-time happened to be less than ten minutes in both the test cases. The Artificial Pairings method even outperforms IPDCH, since its run-time happened to be in milliseconds (formatted to 0 minutes in the table).
• In terms of initial cost: IPDCH could again outperform the Enhanced-DFS. This could be attributed to the use of IP to find a lowest-cost pairing set that covers the maximum-possible flights for each of the decomposed flight subsets. In contrast, the cost associated with the Artificial Pairings method is the worst. This is owing to a very high pseudo-cost attached to the pairings to offset their non-legality.

Critically, regardless of the significantly varying run-time and the initial cost associated with the three methods, the variation in the cost of the final solution offered by 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 is not significant. This endorses the robustness of its constitutive modules.
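To make the Artificial Pairings initialization discussed above more concrete, a minimal sketch is given below. It assumes the simplest variant, in which one artificial single-flight pairing is created per flight. The source only states that the pairing set covers all flights, may be illegal, and carries a very high pseudo-cost, so this construction, the names `Pairing`, `artificial_initial_solution` and `covers_all`, and the pseudo-cost value of 1.0e6 are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pairing:
    flights: tuple            # indices of the flights covered
    cost: float               # cost in USD (pseudo-cost if artificial)
    artificial: bool = False  # True if the pairing need not be legal


def artificial_initial_solution(flight_ids, pseudo_cost=1.0e6):
    """Build an initial solution IS with one artificial pairing per flight.

    Each artificial pairing covers exactly one flight and carries a large
    pseudo-cost, so that it is discouraged once legal pairings enter.
    """
    return [Pairing((f,), pseudo_cost, artificial=True) for f in flight_ids]


def covers_all(pairings, flight_ids):
    """Check that every flight is covered by at least one pairing."""
    covered = {f for p in pairings for f in p.flights}
    return covered >= set(flight_ids)
```

Because nothing has to be searched or checked for legality, such a construction takes negligible time, which is consistent with the millisecond-level run-time reported for this method, while its total cost is by design far above that of any legal solution.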
This section investigates the sensitivity of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 to the termination parameter settings of the Optimization Engine's submodules, namely, LPP-solutioning and IPP-solutioning. The parameters involved in LPP-solutioning are $Th_{cost}$ and $Th_t$, while $Th_{ipt}$ is involved in IPP-solutioning. To assess their impact on 𝐴𝑖𝑟𝐶𝑅𝑂𝑃's performance, experiments are performed with three different sets of parameter settings each, for both the submodules.

Impact of Termination Settings of CG-driven LPP-solutioning:
As mentioned earlier, the CG-driven LPP-solutioning is terminated if the cost-improvement per LPP iteration falls below the pre-specified threshold $Th_{cost}$ (in USD) over $Th_t$ number of successive LPP iterations. To achieve a reasonable balance between 𝐴𝑖𝑟𝐶𝑅𝑂𝑃's run time on the one hand and the cost reduction of the crew pairing solution on the other hand, three different sets of parameter settings are chosen, and experimented with. These settings of {$Th_{cost}$, $Th_t$}, namely {500, …}, {100, 10}, and {50, …}, symbolize relaxed, moderate and strict settings, respectively, since the criterion for 𝐴𝑖𝑟𝐶𝑅𝑂𝑃's termination gets more and more difficult to meet as the settings change from {500, …} to {50, …}. The results of the 𝐴𝑖𝑟𝐶𝑅𝑂𝑃-runs corresponding to these termination settings are reported in Table 8, and the key observations are as highlighted below.
• As the termination settings transition through relaxed, moderate and strict settings, the run-time to obtain the final solution increases, while the cost of the final solution decreases. An apparent exception to this trend is observed in TC-5 with the strict setting, but this could be explained by the fact that the upper limit of 30 hours set for 𝐴𝑖𝑟𝐶𝑅𝑂𝑃's run time under practical considerations was exceeded during the fourth LPP-IPP interaction (𝑇 = 4). It implies that due to the enforced termination in this particular case, 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 could not fully utilize the potential for cost reduction.
• Despite the variation in the termination settings, the cost quality of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃's final solution does not vary as drastically as its run time. For instance, as the settings switched from relaxed to moderate: an additional saving of 6384 USD could be achieved at the expense of an additional 5:20 of run time in the case of TC-2, while these indicators stand at 13388 USD and 10:25, respectively, in the case of TC-5. It can also be inferred that {$Th_{cost}$, $Th_t$} set as {100, 10} possibly offers a fair balance between the solution's cost quality and run time, and this explains why these settings have been used as the base settings for the experimental results presented in this paper, beginning with Table 3 and ending with Table 9.

It is important to recognize that as the termination settings for LPP-solutioning are made stricter, its run time is bound to increase. It is also fair to expect that the cost quality of the final solution may be better, though it cannot be guaranteed. Any such departures from the expected trend may be due to the dependence of the quality of the final solution on the quality of the IPP-solution for each 𝑇. In that, if an IPP-solution for a particular 𝑇 largely fails to approach the lower bound set by the corresponding LPP-solution, it may negatively influence the cost quality obtained in subsequent LPP- and IPP-solutioning phases. While such a possibility remains, it did not surface in the experiments above.

Impact of Termination Settings of IPP-solutioning:
As mentioned before, integerization of an LPP solution using an MIP solver is extremely time-consuming, particularly for large-scale CPOPs, and more so those involving complex flight networks. Hence, from a practical perspective, the 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 framework imposes a threshold on the upper time limit for IPP-solutioning (for any given 𝑇), namely $Th_{ipt}$, in case it does not self-terminate a priori. To investigate the impact of $Th_{ipt}$ on 𝐴𝑖𝑟𝐶𝑅𝑂𝑃's performance, experiments are performed with three different settings, including 00:20 (one-third of an hour), 00:40 (two-thirds of an hour), and 01:00 (an hour). The results are presented in Table 9, and the key observations are as follows. In the case of TC-2, as $Th_{ipt}$ is raised, the run-time to obtain the final solution increases, while the cost of the final solution
decreases. However, there are exceptions to this trend in the case of TC-5. Notably, the cost quality of the final solution corresponding to $Th_{ipt}$ = […] $Th_{ipt}$ = […] 𝑇 = 8 turned worse compared to the case of $Th_{ipt}$ = […] 𝑇 = 9). The worsening of LPP-solution could be attributed to the fact that LPP-solutioning relies on random number based heuristics, and the resulting pairing combinations may not necessarily offer lower cost within the pre-specified termination settings.

Based on the above, it may be inferred that despite the changes in the termination parameter settings, 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 is able to offer solutions with reasonably close cost quality, though significant variations in run time may be observed. It is also evident that even the lowest setting (desired from a practical perspective) for $Th_{ipt}$ = […]

Table 8
Performance assessment of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 on TC-2 and TC-5, against three different termination settings (Relaxed, Moderate and Strict Settings) of the CG-driven LPP-solutioning∗

LPP-IPP Interactions   TC-2                                                        TC-5
(𝑇; 𝑇𝐿𝑃 ∕ 𝑇𝐼𝑃)         Relaxed Setting   Moderate Setting   Strict Setting         Relaxed Setting   Moderate Setting   Strict Setting
                       Cost      Time    Cost      Time     Cost      Time         Cost      Time    Cost      Time     Cost      Time
[$Th_{cost}$ and $Th_t$ header values, 𝐼𝐹𝑆 row and the 𝐿𝑃 and 𝐼𝑃 rows for each 𝑇: values missing]
Final Solution         3508502   05:45   3502118   11:05    3496498   23:28        4607534   12:02   4594146   22:27    4624747   30:17

∗ All values in the “Cost” columns are in USD, and all the corresponding real values are rounded-off to the next integer values. All values in the “Time” columns are in HH:MM, and all the corresponding seconds' values are rounded-off to the next minute values.
Table 9
Performance assessment of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 on TC-2 and TC-5, against three different termination settings ($Th_{ipt}$ = 00:20, 00:40 and 01:00) of the IPP-solutioning∗

LPP-IPP Interactions   TC-2                                                            TC-5
(𝑇; 𝑇𝐿𝑃 ∕ 𝑇𝐼𝑃)         $Th_{ipt}$ = 00:20   $Th_{ipt}$ = 00:40   $Th_{ipt}$ = 01:00    $Th_{ipt}$ = 00:20   $Th_{ipt}$ = 00:40   $Th_{ipt}$ = 01:00
                       Cost      Time       Cost      Time       Cost      Time        Cost      Time       Cost      Time       Cost      Time
[𝐼𝐹𝑆 row and the 𝐿𝑃 and 𝐼𝑃 rows for each 𝑇: values missing]
Final Solution         3502118   11:05      3501809   12:24      3499609   19:22       4594146   22:27      4595703   27:55      4596929   30:35

∗ All values in the “Cost” columns are in USD, and all the corresponding real values are rounded-off to the next integer values. All values in the “Time” columns are in HH:MM, and all the corresponding seconds' values are rounded-off to the next minute values.

5. Conclusion and Future Research
For an airline, crew operating cost is the second largest expense, after the fuel cost, making crew pairing optimization critical for business viability. Over the last three decades, CPOP has received unprecedented attention from the OR community, as a result of which numerous CPOP solution approaches have been proposed. Yet, the emergent flight networks with conjunct scale and complexity largely remain unaddressed in the available literature. Such a scenario is all the more alarming, considering that the air traffic is expected to double over the next 20 years, wherein most airlines may need to cater to multiple crew bases and multiple hub-and-spoke subnetworks. This research has proposed an Airline Crew Pairing Optimization Framework (𝐴𝑖𝑟𝐶𝑅𝑂𝑃) based on domain-knowledge driven CG strategies for efficiently tackling real-world, large-scale and complex flight networks. This paper has presented not just the design of the
𝐴𝑖𝑟𝐶𝑅𝑂𝑃 ’s constitutive modules , but has also shared insights on how these modulesinteract and how sensitive the
𝐴𝑖𝑟𝐶𝑅𝑂𝑃 ′ 𝑠 performance is to the sources of variability, choice of different methodsand parameter settings .Given a CPOP, 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 first preprocesses the entire duty overnight-connection network via its Legal Crew Pair-ing Generator Subsequently,
𝐴𝑖𝑟𝐶𝑅𝑂𝑃 is initialized using an IFS generated by the proposed method (IPDCH). Next,the
𝐴𝑖𝑟𝐶𝑅𝑂𝑃 ’s Optimization Engine attempts to find a good-quality CPOP solution via intermittent interactions of itssubmodules, namely,
CG-driven LPP-solutioning and
IPP-solutioning . The efficacy of
𝐴𝑖𝑟𝐶𝑅𝑂𝑃 has been demon-strated on real-world airline flight network characterized by an unprecedented (in reference to available literature)conjunct scale-and-complexity, marked by over 4200 flights, 15 crew bases, multiple hub-and-spoke subnetworks,and billion-plus pairings. The distinctive contribution of this paper is also embedded in its empirical investigation ofcritically important questions relating to variability and sensitivity, which the literature is otherwise silent on. In that:• first, the sensitivity analysis of
𝐴𝑖𝑟𝐶𝑅𝑂𝑃 is performed in the presence and absence of sources of variability. Itis empirically highlighted that
𝐴𝑖𝑟𝐶𝑅𝑂𝑃 is capable of offering comparable cost solutions, both in the presenceor absence of the sources of variability. This endorses the robustness of its constitutive modules.• second, the sensitivity of
𝐴𝑖𝑟𝐶𝑅𝑂𝑃 with respect to the cost quality of the initial solution and the associatedrun-time is investigated vis- ̀𝑎 -vis three different initialization methods. Again, the robustness of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃 is This module is utilized again to facilitate legal crew pairings when required in real-time in other modules of
𝐴𝑖𝑟𝐶𝑅𝑂𝑃
D. Aggarwal et al.:
Preprint submitted to Elsevier
Page 26 of 28rew pairing optimization framework for tackling large-scale & complex flight networks endorsed, considering that it is found to be capable of offering similar cost solutions, despite the significantlyvarying cost and run-time of the initial solutions.• last, the sensitivity of
𝐴𝑖𝑟𝐶𝑅𝑂𝑃 to the termination parameter settings associated with the Optimization Engine’ssubmodules, is investigated. The fact that with the variation in termination settings of both LPP-solutioning andIPP-solutioning (independent of each other)- the
𝐴𝑖𝑟𝐶𝑅𝑂𝑃 ’s performance strongly aligns with the logicallyexpected trends, is a testimony to the robustness of its constitutive modules.Notably,
𝐴𝑖𝑟𝐶𝑅𝑂𝑃 has been implemented in the Python scripting language, in line with the industrial sponsor's preferences. However, a significant reduction in run-time could be achieved by using compiled programming languages such as C++ or Java. Moreover, employing the domain-knowledge driven CG strategies during the IPP-solutioning phase as well may improve the overall cost- and time-efficiency of 𝐴𝑖𝑟𝐶𝑅𝑂𝑃. Furthermore, the emerging trend of utilizing Machine Learning capabilities to assist combinatorial optimization tasks may also hold promise for airline crew pairing optimization, towards which an exploratory attempt has been made by the authors (Aggarwal, Singh, & Saxena, 2020). Despite the scope for improvement, the authors hope that, with the emergent trend of evolving scale and complexity of airline flight networks, this paper shall serve as an important milestone for the affiliated research and applications.
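The workflow recapped in this section (enumerate legal pairings, build an initial feasible solution, then improve it) can be viewed as a set-covering task: select pairings of minimum total cost such that every flight is covered. The following is a minimal sketch of that structure on a toy instance, and not the authors' implementation. The flight labels, pairings, and costs are invented for illustration; a greedy cover stands in for the IPDCH initialization, and an exhaustive search stands in for the CG-driven LPP-solutioning and IPP-solutioning, which is viable here only because the toy pool holds five pairings rather than a billion-plus.

```python
from itertools import combinations

# Hypothetical toy instance: flight labels, pairings, and costs are made up.
FLIGHTS = frozenset({"F1", "F2", "F3", "F4"})
PAIRINGS = {  # name -> (cost in arbitrary units, flights covered)
    "P1": (12, frozenset({"F1", "F2", "F3"})),
    "P2": (9, frozenset({"F1", "F2"})),
    "P3": (9, frozenset({"F3", "F4"})),
    "P4": (8, frozenset({"F4"})),
    "P5": (10, frozenset({"F2", "F3"})),
}


def total_cost(selection):
    """Sum of the costs of the selected pairings."""
    return sum(PAIRINGS[name][0] for name in selection)


def covers_all(selection):
    """True if every flight appears in at least one selected pairing."""
    covered = set()
    for name in selection:
        covered |= PAIRINGS[name][1]
    return covered >= FLIGHTS


def greedy_initial_solution():
    """Greedy cover used as a stand-in initial solution (NOT IPDCH)."""
    uncovered, chosen = set(FLIGHTS), []
    while uncovered:
        # Pick the pairing with the lowest cost per newly covered flight.
        best = min(
            (n for n in PAIRINGS if PAIRINGS[n][1] & uncovered),
            key=lambda n: PAIRINGS[n][0] / len(PAIRINGS[n][1] & uncovered),
        )
        chosen.append(best)
        uncovered -= PAIRINGS[best][1]
    return chosen


def exhaustive_optimum():
    """Minimum-cost cover by brute force (a stand-in for CG plus IPP)."""
    best = None
    for size in range(1, len(PAIRINGS) + 1):
        for subset in combinations(sorted(PAIRINGS), size):
            if covers_all(subset) and (
                best is None or total_cost(subset) < total_cost(best)
            ):
                best = subset
    return list(best)
```

On this instance the greedy cover selects P1 and P4 at a cost of 20, whereas the minimum-cost cover is P2 and P3 at a cost of 18. The gap between the two is the kind of improvement an optimization engine is expected to close after initialization, here on a deliberately tiny scale.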
Acknowledgment
This research work is a part of an Indo-Dutch joint research project, supported by the Ministry of Electronics and Information Technology (MEITY), India [grant number 13(4)/2015-CC&BT]; Netherlands Organization for Scientific Research (NWO), the Netherlands; and General Electric (GE) Aviation, India. The authors thank GE Aviation, particularly Saaju Paulose (Senior Manager), Arioli Arumugam (Senior Director - Data & Analytics), and Alla Rajesh (Senior Staff Data & Analytics Scientist), for providing real-world test cases and for sharing their domain knowledge, which has helped the authors significantly in successfully completing this research work.
References
Achterberg, T., & Wunderling, R. (2013). Mixed integer programming: Analyzing 12 years of progress. In Facets of combinatorial optimization (pp. 449–481). Springer.
Aggarwal, D., Saxena, D. K., Bäck, T., & Emmerich, M. (2020a). A Novel Column Generation Heuristic for Airline Crew Pairing Optimization with Large-scale Complex Flight Networks. arXiv preprint arXiv:2005.08636. Retrieved from https://arxiv.org/abs/2005.08636v3
Aggarwal, D., Saxena, D. K., Bäck, T., & Emmerich, M. (2020b). Real-World Airline Crew Pairing Optimization: Customized Genetic Algorithm versus Column Generation Method. arXiv preprint arXiv:2003.03792. Retrieved from http://arxiv.org/abs/2003.03792
Aggarwal, D., Saxena, D. K., Emmerich, M., & Paulose, S. (2018, November). On large-scale airline crew pairing generation. In (pp. 593–600).
Aggarwal, D., Singh, Y. K., & Saxena, D. K. (2020). On Learning Combinatorial Patterns to Assist Large-Scale Airline Crew Pairing Optimization. arXiv preprint arXiv:2004.13714. Retrieved from https://arxiv.org/abs/2004.13714v3
Anbil, R., Forrest, J. J., & Pulleyblank, W. R. (1998). Column generation and the airline crew pairing problem. Documenta Mathematica, (1), 677.
Anbil, R., Gelman, E., Patty, B., & Tanga, R. (1991). Recent advances in crew-pairing optimization at American Airlines. Interfaces, (1), 62–74.
Anbil, R., Tanga, R., & Johnson, E. L. (1992). A global approach to crew-pairing optimization. IBM Systems Journal, (1), 71–78.
Andersen, E. D., & Andersen, K. D. (2000). The MOSEK interior point optimizer for linear programming: An implementation of the homogeneous algorithm. In High performance optimization (pp. 197–232). Springer.
Barnhart, C., Cohn, A. M., Johnson, E. L., Klabjan, D., Nemhauser, G. L., & Vance, P. H. (2003). Airline crew scheduling. In Handbook of transportation science (pp. 517–560). Springer.
Barnhart, C., Johnson, E. L., Nemhauser, G. L., Savelsbergh, M. W., & Vance, P. H. (1998). Branch-and-price: Column generation for solving huge integer programs. Operations Research, (3), 316–329.
Beasley, J. E., & Chu, P. C. (1996). A genetic algorithm for the set covering problem. European Journal of Operational Research, (2), 392–404.
Bertsimas, D., & Tsitsiklis, J. N. (1997). Introduction to linear optimization (Vol. 6). Athena Scientific, Belmont, MA.
Desaulniers, G., Desrosiers, J., Dumas, Y., Marc, S., Rioux, B., Solomon, M. M., & Soumis, F. (1997). Crew pairing at Air France. European Journal of Operational Research, (2), 245–259.
Desaulniers, G., & Soumis, F. (2010). Airline crew scheduling by column generation. CIRRELT Spring School, Montréal, Canada.
Desrochers, M., Desrosiers, J., & Solomon, M. (1992). A new optimization algorithm for the vehicle routing problem with time windows. Operations Research, (2), 342–354.
Desrochers, M., & Soumis, F. (1989). A column generation approach to the urban transit crew scheduling problem. Transportation Science, (1), 1–13.
Desrosiers, J., Dumas, Y., Desrochers, M., Soumis, F., Sanso, B., & Trudeau, P. (1991). A breakthrough in airline crew scheduling (Tech. Rep. No. G-91-11). Montreal: Cahiers du GERAD.
Desrosiers, J., Soumis, F., & Desrochers, M. (1984). Routing with time windows by column generation. Networks, (4), 545–565.
Deveci, M., & Demirel, N. Ç. (2018a). Evolutionary algorithms for solving the airline crew pairing problem. Computers & Industrial Engineering, 389–406.
Deveci, M., & Demirel, N. Ç. (2018b). A survey of the literature on airline crew scheduling. Engineering Applications of Artificial Intelligence, 54–69.
Du Merle, O., Villeneuve, D., Desrosiers, J., & Hansen, P. (1999). Stabilized column generation. Discrete Mathematics, (1-3), 229–237.
Garey, M. R., & Johnson, D. S. (1979). Computers and intractability: A guide to the theory of NP-completeness (Vol. 44). New York: W. H. Freeman & Company.
Gershkoff, I. (1989). Optimizing flight crew schedules. Interfaces, (4), 29–43.
Goldberg, D. E. (2006). Genetic algorithms. Pearson Education India.
Gurobi Optimization, LLC. (2019). Gurobi optimizer reference manual. Retrieved from
Gustafsson, T. (1999). A heuristic approach to column generation for airline crew scheduling. Department of Mathematics, Chalmers University of Technology.
Hoffman, K. L., & Padberg, M. (1993). Solving airline crew scheduling problems by branch-and-cut. Management Science, (6), 657–682.
Karmarkar, N. (1984). A new polynomial-time algorithm for linear programming. In Proceedings of the sixteenth annual ACM symposium on theory of computing (pp. 302–311).
Kasirzadeh, A., Saddoune, M., & Soumis, F. (2017). Airline crew scheduling: Models, algorithms, and data sets. EURO Journal on Transportation and Logistics, (2), 111–137.
Koch, T., Achterberg, T., Andersen, E., Bastert, O., Berthold, T., Bixby, R. E., ... others (2011). MIPLIB 2010. Mathematical Programming Computation, (2), 103.
Kornilakis, H., & Stamatopoulos, P. (2002). Crew pairing optimization with genetic algorithms. In Hellenic conference on artificial intelligence (pp. 109–120).
Land, A. H., & Doig, A. G. (1960). An automatic method of solving discrete programming problems. Econometrica, (3), 497–520.
Levine, D. (1996). Application of a hybrid genetic algorithm to airline crew scheduling. Computers & Operations Research, (6), 547–558.
Linderoth, J. T., & Lodi, A. (2011). MILP software (J. J. Cochran, Ed.). John Wiley & Sons.
Lodi, A. (2009). Mixed integer programming computation (M. Jünger et al., Eds.). Springer-Verlag.
Lodi, A., & Tramontani, A. (2013). Performance variability in mixed-integer programming. In Theory driven by influential applications (pp. 1–12). INFORMS.
Lübbecke, M. E. (2010). Column generation. Wiley encyclopedia of operations research and management science.
Lübbecke, M. E., & Desrosiers, J. (2005). Selected topics in column generation. Operations Research, (6), 1007–1023.
Marsten, R. (1994). Crew planning at Delta Airlines. Presentation at XV Mathematical Programming Symposium, Ann Arbor, MI, USA.
Ozdemir, H. T., & Mohan, C. K. (2001). Flight graph based genetic algorithm for crew scheduling in airlines. Information Sciences, (3-4), 165–173.
Padberg, M., & Rinaldi, G. (1991). A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems. SIAM Review, (1), 60–100.
Tarjan, R. (1972). Depth-first search and linear graph algorithms. SIAM Journal on Computing, (2), 146–160.
Vance, P. H., Barnhart, C., Gelman, E., Johnson, E. L., Krishna, A., Mahidhara, D., ... Rebello, R. (1997). A heuristic branch-and-price approach for the airline crew pairing problem (Tech. Rep. No. LEC-97-06). Atlanta: Georgia Institute of Technology.
Vazirani, V. V. (2003). Approximation algorithms. Springer, Berlin, Heidelberg (Chapter 13).
Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., ... SciPy 1.0 Contributors (2020). SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 261–272. doi: https://doi.org/10.1038/s41592-019-0686-2
Zeren, B., & Özkol, İ. (2012). An improved genetic algorithm for crew pairing optimization. Journal of Intelligent Learning Systems and Applications, (01), 70.
Zeren, B., & Özkol, İ. (2016). A novel column generation strategy for large scale airline crew pairing problems. Expert Systems with Applications, 133–144.