Airline Crew Pairing Optimization Framework for Large Networks with Multiple Crew Bases and Hub-and-Spoke Subnetworks

Abstract

Crew Pairing Optimization (CPO) aims at generating a set of flight sequences (crew pairings), covering all flights in an airline's flight schedule, at minimum cost, while satisfying several legality constraints. CPO is critically important for airlines' business viability, considering that the crew operating cost is their second-largest expense. It poses an NP-hard combinatorial optimization problem, which the state-of-the-art tackles by relaxing the underlying Integer Programming Problem (IPP) into a Linear Programming Problem (LPP), solving the latter through the Column Generation (CG) technique, and integerizing the resulting LPP solution. However, with the growing scale and complexity of flight networks (those with a large number of flights, multiple crew bases and/or multiple hub-and-spoke subnetworks), the utility of the conventional CG practices has become questionable. This paper proposes an Airline Crew Pairing Optimization Framework, AirCROP, whose constitutive modules include the Legal Crew Pairing Generator, the Initial Feasible Solution Generator, and an Optimization Engine built on a heuristic-based CG implementation. Besides the design of AirCROP's modules, the paper shares insights into important questions on how these modules interact, on which the literature is otherwise silent. These relate to the sensitivity of AirCROP's performance towards: sources of variability over multiple runs for a given problem, the initialization method, and the termination parameters for LPP-solutioning and IPP-solutioning. The efficacy of AirCROP has been demonstrated on real-world large-scale and complex flight networks (with over 4200 flights, 15 crew bases, and billion-plus pairings). It is hoped that with the emergence of such complex flight networks, this paper shall serve as an important milestone for affiliated research and applications.


Airline Crew Pairing Optimization Framework for Large Networks with Multiple Crew Bases and Hub-and-Spoke Subnetworks

Divyam Aggarwal (a), Dhish Kumar Saxena (a,∗), Thomas Bäck (b) and Michael Emmerich (b)
(a) Department of Mechanical & Industrial Engineering (MIED), Indian Institute of Technology Roorkee, Roorkee, Uttarakhand-247667, India
(b) Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Niels Bohrweg 1, 2333 CA Leiden, the Netherlands

ARTICLE INFO

Keywords: Airline Crew Scheduling; Crew Pairing; Combinatorial Optimization; Column Generation; Mathematical Programming; Heuristics

ABSTRACT

Crew Pairing Optimization (CPO) aims at generating a set of flight sequences (crew pairings), covering all flights in an airline's flight schedule, at minimum cost, while satisfying several legality constraints. CPO is critically important for airlines' business viability, considering that the crew operating cost is second only to the fuel cost. It poses an NP-hard combinatorial optimization problem, which the state-of-the-art tackles by relaxing the underlying Integer Programming Problem (IPP) into a Linear Programming Problem (LPP), solving the latter through the Column Generation (CG) technique, and integerizing the resulting LPP solution. However, with the growing scale and complexity of airline networks (those with a large number of flights, multiple crew bases and/or multiple hub-and-spoke subnetworks), the efficacy of the conventionally used exact CG-implementations is severely marred, and their utility has become questionable. This paper proposes an Airline Crew Pairing Optimization Framework, AirCROP, whose constitutive modules include the Legal Crew Pairing Generator, the Initial Feasible Solution Generator, and an Optimization Engine built on a heuristic-based CG-implementation. AirCROP's novelty lies not just in the design of its constitutive modules but also in how these modules interact. In that, insights into several important questions, on which the literature is otherwise silent, have been shared. These relate to the sensitivity analysis of AirCROP's performance, in terms of the final solutions' cost quality and run-time, with respect to: sources of variability over multiple runs for a given problem; cost quality of the initial solution and the run-time spent to obtain it; and termination parameters for LPP-solutioning and IPP-solutioning. In addition, the efficacy of AirCROP has been: (a) demonstrated on real-world airline flight networks with an unprecedented conjunct scale-and-complexity, marked by over 4200 flights, 15 crew bases, and billion-plus pairings, and (b) validated by the research consortium's industrial sponsor. It is hoped that with the emergent trend of conjunct scale and complexity of airline networks, this paper shall serve as an important milestone for affiliated research and applications.

1. Introduction

Airline scheduling poses some of the most challenging optimization problems encountered in the entire Operations Research (OR) domain. For a large-scale airline, the crew operating cost constitutes the second-largest cost component, next to the fuel cost, and even marginal improvements in it may translate to annual savings worth millions of dollars. Given the potential for huge cost-savings, Airline Crew Scheduling is recognized as a critical planning activity. It has received unprecedented attention from researchers in the OR community over the last three decades. Conventionally, it is tackled by solving two problems, namely, the Crew Pairing Optimization Problem (CPOP) and the Crew Assignment Problem, in a sequential manner. The former problem is aimed at generating a set of flight sequences (each called a crew pairing) that covers all flights from an airline's flight schedule, at minimum cost, while satisfying several legality constraints linked to federations' rules, labor laws, airline-specific regulations, etc. These optimally-derived crew pairings are then fed as input to the latter problem, which is aimed at generating a set of pairing sequences (each sequence is a schedule for an individual crew member), while satisfying the corresponding crew requirements. Being the foremost step of airline crew scheduling, CPOP is the main focus of this paper, and interested readers are referred to Barnhart et al. (2003) for a comprehensive review of airline crew scheduling.

∗ Corresponding author. Postal address: Room No.-231, East Block, MIED, IIT Roorkee, Roorkee, Uttarakhand-247667, India; Phone: +91-8218612326.


D. Aggarwal et al.:

Preprint submitted to Elsevier

CPOP is an NP-hard combinatorial optimization problem (Garey & Johnson, 1979). It is modeled as either a set partitioning problem (SPP), in which each flight is allowed to be covered by only one pairing, or a set covering problem (SCP), in which each flight is allowed to be covered by more than one pairing. In CPOP, a crew pairing has to satisfy hundreds of legality constraints (Section 2.2) to be classified as legal, and it is imperative to generate legal pairings in a time-efficient manner to assist the optimization search. Several legal pairing generation approaches, based on either a flight- or a duty-network, have been proposed in the literature (Aggarwal et al., 2018). Depending upon how the legal pairing generation module is invoked, two CPOP solution-architectures are possible. In the first architecture, all possible legal pairings are enumerated prior to the CPOP-solutioning. However, this is computationally tractable only for small-scale CPOPs (with ≈ <1000 flights). Alternatively, legal pairings are generated during each iteration of the CPOP-solutioning, but only for a subset of flights, so that the CPOP solution can be partially improved before triggering the next iteration. Such an architecture mostly suits medium- to large-scale CPOPs (with ≈ ≥ [...]

[...] heuristic-based optimization techniques and mathematical programming techniques, are commonly employed (Section 2.3). In the former category, Genetic Algorithms (GAs), which are population-based randomized-search heuristics (Goldberg, 2006), are most commonly used. However, they are found to be efficient only for tackling very small-scale CPOPs (Ozdemir & Mohan, 2001). Alternatively, several mathematical programming based approaches exist to solve CPOPs of varying scales. CPOP is inherently an Integer Programming Problem (IPP), and some approaches have used standard Integer Programming (IP) techniques to find a best-cost pairing subset from a pre-enumerated pairings' set (Hoffman & Padberg, 1993).
However, these approaches have proven effective only with small-scale CPOPs with up to a million pairings. This perhaps explains the prevalence of an altogether different strategy, in which the original CPOP/IPP is relaxed into a Linear Programming Problem (LPP); the LPP is solved iteratively by invoking an LP solver and relying on the Column Generation (CG) technique to generate new pairings as part of the pricing sub-problem; and finally, the resulting LPP solution is integerized using IP techniques and/or some special connection-fixing heuristics. The challenge associated with this strategy is that even though the LP solver may lead to a near-optimal LPP solution, the scope of finding a good-cost IPP solution is limited to the pairings available in the LPP solution. To counter this challenge, heuristic implementations of the branch-and-price framework (Barnhart et al., 1998), in which CG is utilized during the integerization phase too, have been employed to generate new legal pairings at nodes of the IP-search tree. However, the efficacy of such heuristic implementations depends on a significant number of algorithmic-design choices (say, which branching scheme to adopt, or how many CG-iterations to perform at the nodes). Furthermore, it is noteworthy that the scale and complexity of flight networks have grown alarmingly over the past decades. As a result, an inestimably large number of new pairings are possible under the pricing sub-problem, given which most existing solution methodologies are rendered computationally inefficient. Recognition of such challenges has paved the way towards domain-knowledge driven CG strategies to generate a manageable, yet crucial part of the overall pairings' space under the pricing sub-problem (Zeren & Özkol, 2016).
Though rich in promise, the efficacy of this approach is yet to be explored vis-à-vis the emergent large-scale and complex flight networks characterized by multiple crew bases and/or multiple hub-and-spoke subnetworks, where billions of legal pairings are possible.

In an endeavor to address airline networks with conjunct scale and complexity, this paper proposes an Airline Crew Pairing Optimization Framework (AirCROP) based on domain-knowledge driven CG strategies, and:
• presents not just the design of its constitutive modules (including the Legal Crew Pairing Generator, the Initial Feasible Solution Generator, and the Optimization Engine powered by CG-driven LPP-solutioning and IPP-solutioning), but also how these modules interact.
• discusses how sensitive its performance is to: sources of variability over multiple runs for a given problem; cost quality of the initial solution and the run-time spent to obtain it; and termination parameters for LPP-solutioning and IPP-solutioning. Such an investigation promises important insights for researchers and practitioners on critical issues which are otherwise not discussed in the existing literature.
• presents empirical results for a real-world, large-scale (over 4200 flights), complex flight network (over 15 crew bases and multiple hub-and-spoke subnetworks) of a US-based airline, the data for which has been provided by the research consortium's industrial partner.

Footnote (NP-hardness): For NP-hard (NP-complete) problems, no polynomial-time algorithms on sequential computers are known up to now. However, verification of a solution might be (can be) accomplished efficiently, i.e., in polynomial time.


The outline of the remaining paper is as follows. Section 2 discusses the underlying concepts, related work, and problem formulation; Section 3 entails the details of the proposed AirCROP; Section 4 presents the results of the computational experiments along with the corresponding observations; and Section 5 concludes the paper and briefly describes the potential future directions.

2. Crew Pairing Optimization: Preliminaries, Related Work and Problem Formulation

This section first describes the preliminaries, including the associated terminology, pairings' legality constraints, and pairings' costing criterion. Subsequently, the related work is presented, in which the existing CPOP solution approaches are discussed. Lastly, the airline CPOP formulation is presented.

In airline crew operations, each crew member is assigned a fixed (home) airport, called a crew base. A crew pairing (or a pairing) is a flight sequence operated by a crew, that begins and ends at the same crew base, and satisfies the given pairing legality constraints (detailed in Section 2.2). An example of a crew pairing with the Dallas (DAL) airport as the crew base is illustrated in Figure 1. In a crew pairing, the legal sequence of flights operated by a crew in a single working day (not necessarily equivalent to a calendar day) is called a crew duty or a duty. A sit-time or a connection-time is a small rest-period, provided between any two consecutive flights within a duty to facilitate operational requirements such as aircraft changes by the crew, turn-around operation for the aircraft, etc. An overnight-rest is a longer rest-period, provided between any two consecutive duties within a pairing. Moreover, two short periods, provided at the beginning and end of any duty within a pairing, are called the briefing time and de-briefing time, respectively. The total time elapsed in a crew pairing, i.e., the time for which a crew is away from its crew base, is called the time away from base (TAFB). Sometimes, a crew needs to be transported to an airport to fly their next flight. For this, the crew travels as passengers on another flight, flown by another crew, to arrive at the required airport. Such a flight is called a deadhead flight or a deadhead for the crew traveling as passengers. An airline seeks to minimize the number of deadheads (ideally to zero), as they affect the airline's profit in two ways. Firstly, the airline loses the revenue on the passenger seat occupied by the deadheading crew, and secondly, the airline has to pay the hourly wages to the deadheading crew even when it is not operating the flight.
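The terminology above maps naturally onto a small data model. The following is a minimal sketch, not the paper's implementation: the class names, the fixed briefing and de-briefing durations, and the flight times are all assumptions chosen for illustration (they are not read off Figure 1).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Flight:
    origin: str
    destination: str
    departure: datetime
    arrival: datetime


@dataclass
class Duty:
    flights: List[Flight]
    briefing: timedelta = timedelta(minutes=60)    # assumed duration
    debriefing: timedelta = timedelta(minutes=30)  # assumed duration

    def start(self) -> datetime:
        # A duty starts with the briefing before its first flight.
        return self.flights[0].departure - self.briefing

    def end(self) -> datetime:
        # A duty ends with the de-briefing after its last flight.
        return self.flights[-1].arrival + self.debriefing

    def sit_times(self) -> List[timedelta]:
        # Rest between consecutive flights within the duty.
        return [b.departure - a.arrival
                for a, b in zip(self.flights, self.flights[1:])]


@dataclass
class Pairing:
    crew_base: str
    duties: List[Duty]

    def tafb(self) -> timedelta:
        # Time away from base: total time elapsed in the pairing.
        return self.duties[-1].end() - self.duties[0].start()

    def overnight_rests(self) -> List[timedelta]:
        # Rest between consecutive duties within the pairing.
        return [b.start() - a.end()
                for a, b in zip(self.duties, self.duties[1:])]

    def starts_and_ends_at_base(self) -> bool:
        first, last = self.duties[0].flights[0], self.duties[-1].flights[-1]
        return first.origin == self.crew_base == last.destination


# Hypothetical two-duty pairing from the DAL crew base.
day1, day2 = datetime(2024, 1, 1), datetime(2024, 1, 2)
duty1 = Duty([
    Flight("DAL", "BOI", day1 + timedelta(hours=9), day1 + timedelta(hours=11)),
    Flight("BOI", "LAS", day1 + timedelta(hours=12), day1 + timedelta(hours=14)),
])
duty2 = Duty([
    Flight("LAS", "ONT", day2 + timedelta(hours=10), day2 + timedelta(hours=11)),
    Flight("ONT", "DAL", day2 + timedelta(hours=12, minutes=30),
           day2 + timedelta(hours=16, minutes=30)),
])
pairing = Pairing("DAL", [duty1, duty2])
```

In this hypothetical pairing, the TAFB runs from the first briefing (08:00 on day 1) to the last de-briefing (17:00 on day 2), i.e., 33 hours.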

Figure 1: An example of a crew pairing starting from the Dallas (DAL) crew base. [Figure labels: flight legs between DAL, BOI, LAS and ONT; briefing-time and debriefing-time; sit-times; overnight rest; Crew Duty 1 and Crew Duty 2 (elapsed time); Time Away From Base (TAFB).]

To govern the safety of crew members, airline federations such as the European Aviation Safety Agency, the Federal Aviation Administration, and others, have laid down several rules and regulations, which, in addition to the airline-specific regulations, labor laws, etc., are required to be satisfied by a pairing for it to be "legal". These legality constraints can be broadly categorized as follows:
• Connection-city constraint ($\mathcal{C}_{connect}$): this constraint requires the arrival airport of a flight (or the last flight of a duty) within a pairing to be the same as the departure airport of its next flight (or the first flight of its next duty).
• Sit-time ($\mathcal{C}_{sit}$) and Overnight-rest ($\mathcal{C}_{night}$) constraints: these constraints impose the respective maximum and minimum limits on the duration of sit-times and overnight-rests, where these limits are governed by airlines' and federations' regulations.
• Duty constraints ($\mathcal{C}_{duty}$): these constraints govern the regulations linked to the crew duties. For instance, they impose maximum limits on: the number of flights allowed in a duty of a pairing; the duty elapsed-time and the corresponding flying-time; the number of duties allowed in a pairing, etc.
• Start- and end-city constraint ($\mathcal{C}_{base}$): this constraint requires the beginning airport (departure airport of the first flight) and the ending airport (arrival airport of the last flight) of a pairing to be the same crew base.
• Other constraints ($\mathcal{C}_{other}$): airlines formulate some specific constraints, according to their operational requirements, so as to maximize their crew utilization. For example, a pairing is refrained from involving overnight-rests at the airports that belong to the same city as the crew base from which the pairing started, etc.

Considering the multiplicity of the above constraints, it is critical to develop a time-efficient legal crew pairing generation approach, so that legal pairings are promptly available whenever they are required during the optimization.

In general, a pairing's cost can be split into the flying cost and the non-flying (variable) cost. The flying cost is the cost incurred in actually flying all the given flights, and is computed on an hourly basis. The variable cost is the cost incurred during the non-flying hours of the pairing, and is made up of two sub-components, namely, the hard cost and the soft cost. The hard cost involves the pairing's hotel cost, meal cost, and excess pay, i.e., the cost associated with the difference between the guaranteed hours of pay and the actual flying hours.
Here, the pairing's hotel cost is the lodging cost incurred during its overnight-rests, and its meal cost is computed as a fraction of its TAFB. The soft cost is the undesirable cost associated with the number of aircraft changes (during flight-connections) in the pairing, etc.

As mentioned in Section 1, the existing CPOP solution approaches are based on either heuristic or mathematical programming techniques. Among the heuristic-based approaches, GA is the most widely adopted technique, and Beasley & Chu (1996) is the first instance of customizing a GA (using guided GA-operators) for solving a general class of SCPs. In that, the authors validated their proposed approach on small-scale synthetic test cases (with over 1,000 rows and just 10,000 columns). The important details of the GA-based CPOP solution approaches available in the literature are reported in Table 1. Notably, the utility of the studies reported in the table has been demonstrated on CPOPs with a reasonably small number of flights, leading to a relatively small number of pairings. Though CPOPs with 2,100 and 710 flights have been tackled by Kornilakis & Stamatopoulos (2002) and Zeren & Özkol (2012) respectively, only a subset of all possible legal pairings has been considered by them for finding the reported solutions. Zeren & Özkol (2012) proposed a GA with highly-customized operators, which efficiently solved small-scale CPOPs but failed to solve large-scale CPOPs with the same search-efficiency. Furthermore, Aggarwal, Saxena, et al. (2020b) tackled a small-scale CPOP (with 839 flights and multiple hub-and-spoke sub-networks) using a customized-GA (with guided operators) as well as mathematical programming techniques. The authors concluded that customized-GAs are inefficient in solving complex versions of even small-scale flight networks, compared to a mathematical programming-based solution approach.

Table 1: Key facts around the GA-based CPOP solution approaches available in the literature

| Literature Studies | Modeling | Timetable | Airline Test Cases* | Airlines |
| Levine (1996) | Set Partitioning | - | 40R; 823F; 43,749P | - |
| Ozdemir & Mohan (2001) | Set Covering | Daily | 28R; 380F; 21,308P | Multiple Airlines |
| Kornilakis & Stamatopoulos (2002) | Set Covering | Monthly | 1R; 2,100F; 11,981P | Olympic Airways |
| Zeren & Özkol (2012) | Set Covering | Monthly | 1R; 710F; 3,308P | Turkish Airlines |
| Deveci & Demirel (2018a) | Set Covering | - | 12R; 714F; 43,091P | Turkish Airlines |

* R represents the number of real-world test cases considered; F and P represent the maximum number of flights and pairings covered therein.


Several mathematical programming-based CPOP solution approaches have been proposed in the literature over the past few decades, and based on the size and characteristics of the flight network being tackled, these approaches can be categorized into one of three general classes. In the first class of approaches, all legal pairings or a subset of good pairings are enumerated prior to the CPOP-solutioning, and the corresponding CPOP/IPP model is solved using standard IP techniques (such as the branch-and-bound algorithm (Land & Doig, 1960)). Gershkoff (1989) proposed an iterative solution approach, which is initialized using a set of artificial pairings (each covering a single flight at a high pseudo-cost). In that, each iteration involves selection of very few pairings (5 to 10); enumeration of all legal pairings using the flights covered in the selected pairings; optimization of the resulting SPP to find the optimal pairings; and lastly, replacement of the originally selected pairings with the optimal pairings, only if the latter offer a better cost. The search-efficiency of such an approach is highly dependent on the sub-problem-size (handled up to 100 flights and 5,000 pairings), as the length and breadth of the branching tree increase drastically with an increase in sub-problem-size. Hoffman & Padberg (1993) proposed an alternative approach to tackle SPPs with up to 825 flights and 1.05 million pairings, in which all possible pairings are enumerated a priori, and the resulting SPP is solved to optimality using a branch-and-cut algorithm. Such approaches are efficient only in tackling small-scale CPOPs, that too with up to a million pairings. However, even small-scale CPOPs may involve a large number of pairings (an instance reported in Vance et al. (1997) had 250 flights and over five million pairings), rendering it computationally intractable to use such approaches.

The second class of approaches relies on relaxing the integer constraints in the original CPOP/IPP to form an LPP, which is then solved iteratively by invoking an LP solver and generating new pairings using CG, followed by integerization of the resulting LPP solution. In any iteration of the LPP-solutioning (referred to as an LPP iteration), an LP solver (based on either a simplex method or an interior-point method) is invoked on the input pairing set to find the LPP solution and its corresponding dual information (the shadow price corresponding to each flight-coverage constraint), which are then utilized to generate new pairings as part of the pricing sub-problem, promising the corresponding cost-improvements. For the first LPP iteration, any set of pairings covering all the flights becomes the input to the LP solver, and for any subsequent LPP iteration, the current LPP solution and the set of new pairings (from the pricing sub-problem) constitute the new input. For more details on how new pairings are generated under the pricing sub-problem in the CG technique, interested readers are referred to Vance et al. (1997); Lübbecke & Desrosiers (2005). As cited in Zeren & Özkol (2016), the CG technique has several limitations, of which the prominent ones are: the heading-in effect (poor dual information in the initial LPP iterations leads to generation of irrelevant columns), the bang-bang effect (dual variables oscillate from one extreme point to another, leading to poor or slower convergence), and the tailing-off effect (the cost-improvements in the later LPP iterations taper off). While different stabilization techniques are available for CG in the literature (Du Merle et al., 1999; Lübbecke, 2010), the use of interior-point methods is gaining prominence. Anbil et al. (1991) presented the advancements at American Airlines, and enhanced the approach proposed by Gershkoff (1989) (discussed above), by leveraging the knowledge of dual variables to screen out (price out) the pairings from the enumerated set at each iteration, enabling it to solve larger sub-problems (up to 25 flights and 100,000 pairings). As an outcome of a collaboration between IBM and American Airlines, Anbil et al. (1992) proposed an iterative global solution approach (though falling short of global optimization) in which an exhaustive set of pairings (≈ [...]

Footnote: The branch-and-cut algorithm was first proposed by Padberg & Rinaldi (1991) to solve Mixed Integer Programs (MIP), by integrating the standard branch-and-bound and cutting-plane algorithms. For comprehensive details of the MIP solvers, interested readers are referred to Lodi (2009); Linderoth & Lodi (2011); Achterberg & Wunderling (2013).

Footnote: The class of interior-point methods was first introduced by Karmarkar (1984). In that, a polynomial-time algorithm, called Karmarkar's algorithm, was proposed, which, in contrast to the simplex method, searches for the best solution by traversing the interior of the feasible region of the search space.
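The role of the dual information in the pricing step of the second class of approaches can be illustrated with a minimal sketch. It assumes a plain set-covering LPP in which the reduced cost of a candidate pairing is its cost minus the sum of the shadow prices of the flights it covers; the function names, the candidate pairings and the dual values are hypothetical, and the duals would in practice come from an LP solver rather than being given by hand.

```python
def reduced_cost(cost, covered_flights, duals):
    """Reduced cost of a pairing: its cost minus the shadow prices
    (duals) of the flight-coverage constraints it would satisfy."""
    return cost - sum(duals[f] for f in covered_flights)


def price_out(candidates, duals, tol=1e-9):
    """Return the names of candidate pairings with negative reduced
    cost (most negative first); only these can improve the current
    LPP solution in a minimization problem."""
    priced = [(reduced_cost(cost, flights, duals), name)
              for name, (cost, flights) in candidates.items()]
    return [name for rc, name in sorted(priced) if rc < -tol]


# Hypothetical shadow prices from the current LPP iteration.
duals = {"f1": 4.0, "f2": 3.0, "f3": 5.0}

# Hypothetical candidate pairings: name -> (cost, flights covered).
candidates = {
    "p_a": (6.0, ["f1", "f2"]),         # reduced cost 6 - 7  = -1
    "p_b": (9.0, ["f2", "f3"]),         # reduced cost 9 - 8  = +1
    "p_c": (10.0, ["f1", "f2", "f3"]),  # reduced cost 10 - 12 = -2
}

new_columns = price_out(candidates, duals)  # ["p_c", "p_a"]
```

Enumerating and pricing every candidate, as done here, is exactly what becomes intractable at scale; the approaches surveyed above differ mainly in how they avoid that enumeration.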


[...] pairings available in it may not fit well together to constitute a good-cost IPP solution.

The third class of approaches shares a similar solution-architecture with the preceding class; however, it differs in terms of the integerization of the LPP solution. In that, a heuristic branch-and-price framework is adopted, wherein CG is utilized during the integerization phase too, to generate new legal pairings at nodes of the MIP-search tree. Desrosiers et al. (1991) is the first instance that solved CPOP using a branch-and-price framework. However, given the inestimable number of legal pairings possible for even medium-scale CPOPs, numerous branch-and-price based heuristic-approaches have been proposed over the last three decades (Desaulniers et al., 1997; Vance et al., 1997; Anbil et al., 1998; Desaulniers & Soumis, 2010). Notably, the development of these approaches, being heuristic in nature, requires a significant number of algorithmic-design choices to be made empirically, which may vary with the characteristics of the flight networks being solved for. To name a few such decisions: which branching scheme should be employed (branching on linear variables, branching on flight-connections, or others); should CG be performed on each node of the MIP-search tree; how many CG iterations should be performed each time, etc. Furthermore, the commercial LP and MIP solvers are not much open to modifications, making it difficult for new researchers to implement a computationally- and time-efficient branch-and-price framework from scratch. For further details of the existing CPOP solution approaches, interested readers are referred to the recent survey articles: Kasirzadeh et al. (2017); Deveci & Demirel (2018b).

In addition to the above classification of solution approaches, the literature differs on how the pricing sub-problem is modeled and solved to generate new legal pairings during the LPP iterations. However, the focus of this paper is not on the solution to the pricing sub-problem step, but on the interactions between different modules of a CG-based CPOP solution approach. Hence, for details on the existing work related to the pricing sub-problem step, interested readers are referred to Vance et al. (1997); Aggarwal, Saxena, et al. (2020a).

As mentioned earlier, a CPOP is intrinsically an IPP, modeled either as an SCP or an SPP. Notably, the SCP formulation provides higher flexibility during its solutioning compared to the SPP formulation by accommodating deadhead flights in the model, possibly resulting in faster convergence (Gustafsson, 1999). For a given flight set $\mathcal{F}$ (including $F$ flights) that could be covered in numerous ways by a set of legal pairings $\mathcal{P}$ (including $P$ pairings), the set covering problem is aimed at finding a subset of pairings ($\subseteq \mathcal{P}$), say $\mathcal{P}^{*}_{IP}$, which not only covers each flight ($\in \mathcal{F}$) at least once, but does so at a cost lower than any alternative subset of pairings in $\mathcal{P}$. In that, while finding $\mathcal{P}^{*}_{IP}$ ($\subseteq \mathcal{P}$), each pairing $p_j \in \mathcal{P}$ corresponds to a binary variable $x_j$, which represents whether the pairing $p_j$ is included in $\mathcal{P}^{*}_{IP}$ (marked by $x_j = 1$) or not ($x_j = 0$). Here, $p_j$ is an $F$-dimensional vector, each element of which, say $a_{ij}$, represents whether the flight $f_i$ is covered by pairing $p_j$ (marked by $a_{ij} = 1$) or not ($a_{ij} = 0$). In this background, the IPP formulation, as used in this paper, is as follows.

Minimize $Z_{IP} = \sum_{j=1}^{P} c_j x_j + \psi_D \cdot \left( \sum_{i=1}^{F} \left( \sum_{j=1}^{P} a_{ij} x_j - 1 \right) \right)$, (1)

subject to $\sum_{j=1}^{P} a_{ij} x_j \geq 1, \quad \forall i \in \{1, 2, ..., F\}$ (2)

$x_j \in \mathbb{Z} = \{0, 1\}, \quad \forall j \in \{1, 2, ..., P\}$ (3)

where,
$c_j$: the cost of a legal pairing $p_j$,
$\psi_D$: an airline-defined penalty cost against each deadhead in the solution,
$a_{ij} = 1$, if flight $f_i$ is covered in pairing $p_j$; else $0$,
$x_j = 1$, if pairing $p_j$ contributes to Minimum $Z$; else $0$.

In the objective function (Equation 1), the first component gives the sum of the individual costs of the pairings selected in the solution, while the other component gives the penalty cost for the deadheads incurred in the solution (note, $\left( \sum_{j=1}^{P} a_{ij} x_j - 1 \right)$ gives the number of deadheads corresponding to the flight $f_i$). Notably, in the above formulation, it is assumed that the set of all possible legal pairings, namely, $\mathcal{P}$, is available a priori, and the task is to determine $\mathcal{P}^{*}_{IP}$. However, the generation of $\mathcal{P}$ a priori is computationally intractable for large-scale CPOPs, as mentioned in Section 2.3. Hence, the solution to the CPOP/IPP is pursued in conjunction with the corresponding LPP (formulation deferred till Section 3.3.1) assisted by the CG technique.

Footnote: The branch-and-price algorithm was originally proposed by Barnhart et al. (1998) as an exact algorithm to solve then-known large-scale IPPs, and has been utilized to solve a variety of combinatorial optimization problems in transportation such as Desrosiers et al. (1984); Desrochers & Soumis (1989); Desrochers et al. (1992).
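To make the set-covering objective with the deadhead penalty concrete, the sketch below evaluates it for a toy instance and finds the best subset by brute force. The instance (flights, pairings, costs, penalty value) and the function names are hypothetical, and exhaustive enumeration is used only because the instance is tiny; it is not how a realistic CPOP would be solved.

```python
from itertools import combinations


def objective(selected, costs, cover, flights, psi_d):
    """Evaluate the set-covering objective for the selected pairings.

    Returns None if some flight is left uncovered (coverage
    constraint violated); otherwise the pairing costs plus psi_d
    times the number of deadheads (extra coverages of a flight).
    """
    times_covered = {f: sum(f in cover[p] for p in selected)
                     for f in flights}
    if any(n < 1 for n in times_covered.values()):
        return None
    deadheads = sum(n - 1 for n in times_covered.values())
    return sum(costs[p] for p in selected) + psi_d * deadheads


def brute_force(costs, cover, flights, psi_d):
    """Enumerate every non-empty subset of pairings and return
    (best objective value, best subset); toy instances only."""
    best_value, best_subset = None, None
    pairings = sorted(costs)
    for k in range(1, len(pairings) + 1):
        for subset in combinations(pairings, k):
            value = objective(subset, costs, cover, flights, psi_d)
            if value is not None and (best_value is None
                                      or value < best_value):
                best_value, best_subset = value, subset
    return best_value, best_subset


# Hypothetical instance: 3 flights, 4 legal pairings.
flights = ["f1", "f2", "f3"]
cover = {"p1": {"f1", "f2"}, "p2": {"f2", "f3"},
         "p3": {"f3"}, "p4": {"f1", "f2", "f3"}}
costs = {"p1": 5, "p2": 4, "p3": 3, "p4": 9}
psi_d = 2  # assumed penalty per deadhead

# {p1, p2} covers f2 twice: cost 9 plus one deadhead penalty = 11.
z_overlap = objective(("p1", "p2"), costs, cover, flights, psi_d)
# {p1, p3} covers every flight exactly once: cost 8, no deadheads.
best = brute_force(costs, cover, flights, psi_d)  # (8, ("p1", "p3"))
```

With $2^P - 1$ candidate subsets, this enumeration is hopeless beyond a handful of pairings, which is the practical motivation for the LPP relaxation and CG pursued in the paper.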

3. Proposed Airline Crew Pairing Optimization Framework (AirCROP)

This section presents the constitutive modules of the proposed optimization framework, AirCROP, their working, and their interactions. As per the schematic in Figure 2, AirCROP accepts a set of given flights $\mathcal{F}$ along with the pairings' legality constraints and costing criterion as input, and outputs a minimal-cost set of legal pairings $\mathcal{P}^{*}_{IP}$ that covers all given flights. This transition from the input to the output is enabled by the constitutive modules, namely, the Legal Crew Pairing Generator, the Initial Feasible Solution Generator, and an Optimization Engine, in turn enabled by CG-driven LPP-solutioning and IPP-solutioning submodules and their intermittent interactions. While parts of these modules have been presented elsewhere (Aggarwal et al., 2018; Aggarwal, Saxena, et al., 2020a) in isolation, these are detailed below towards a holistic view on the experimental results presented later.

3.1. Legal Crew Pairing Generator

This module enables the generation of legal pairings in a time-efficient manner, so that they can be fed in real time to the other modules, namely the Initial Feasible Solution Generator and the Optimization Engine. For time-efficiency, it employs a parallel, duty-network based legal pairing generation approach, whose distinctive contributions are twofold. Firstly, a crew-base-centric parallel architecture is adopted, considering that several duty- and pairing-constitutive constraints vary with crew bases. In that, for an input flight set, the legal pairing generation process is decomposed into independent sub-processes (one for each crew base), running in parallel on idle cores of the central processing unit (CPU). This leads to a significant reduction in the pairing generation time (≈ 10-fold for a CPOP with 15 crew bases, as demonstrated in Aggarwal et al. (2018)). Secondly, the set of all possible legal duties and the corresponding duty overnight-connection graph with respect to each crew base are enumerated and stored prior to the CPOP-solutioning. In a duty overnight-connection graph, a node represents a legal duty, and an edge between any two nodes represents a legal overnight-rest connection between the respective duties. Such preprocessing ensures that all the connection-city, sit-time, duty, and overnight-rest constraints get naturally satisfied, eliminating the need for their re-evaluation during the generation of legal pairings, and leading to a significant reduction in the legal pairing generation time.

The implementation of this module, formalized in Algorithms 1 & 2, is elaborated below. For solving any CPOP, the foremost step of AirCROP is to preprocess the entire duty-connection network, that is, the set of legal duties $\mathcal{D}_b$ and the duty overnight-connection graph $\mathcal{G}^d_b$ ($\equiv (\mathcal{D}_b, \mathcal{E}^d_b)$) for each crew base $b$ in the given set of crew bases $\mathcal{B}$, where $\mathcal{E}^d_b$ is the set of legal overnight-rest connections between duty-pairs $\in \mathcal{D}_b$. The procedure for this preprocessing is presented in Algorithm 1. In that, the first step is the generation of a flight-connection graph (denoted by $\mathcal{G}^f$) by evaluating the legality of connection-city ($\mathcal{C}_{connect}$) and sit-time ($\mathcal{C}_{sit}$) constraints between every flight-pair in the given flight schedule $\mathcal{F}$ (line 1). Here, in $\mathcal{G}^f$ ($\equiv (\mathcal{F}, \mathcal{E}^f)$), $\mathcal{F}$ is the set of nodes (flights) and $\mathcal{E}^f$ is the set of edges (legal flight connections). Subsequently, $\mathcal{G}^f$ is used for legal duty enumeration, by decomposing the process into independent sub-processes, one for each crew base $b \in \mathcal{B}$, and executing them in parallel (lines 2-12). In each of these sub-processes, the enumeration of legal duties, starting from each flight $f \in \mathcal{F}$, is explored. In that:
• flight $f$ is added to an empty candidate duty stack, given by $duty$ (line 4).
• the flight-sequence in $duty$ is checked for satisfaction of the duty constraints $\mathcal{C}_{duty}$, and if satisfied, $duty$ is added to the desired legal duty set $\mathcal{D}_b$ (lines 5-6). Notably, if $f$ has at least one connection with another flight in $\mathcal{G}^f$, and if the duty constraints permit, then more flights could be accommodated in $duty$, leading to the enumeration of other legal duties (lines 7-9).
• a Depth-first Search (DFS) algorithm (Tarjan, 1972) is adapted, which is called recursively to enumerate legal duties, starting from a parent flight node ($parent$), by exploring all its successive paths in $\mathcal{G}^f$ in a depth-first manner (lines 16-25). In each recursion, a child flight node ($child$) is pushed into $duty$, the updated flight-sequence is checked for satisfaction of $\mathcal{C}_{duty}$, and if satisfied, $duty$ is yielded to $\mathcal{D}_b$, followed by another recursion of DFS() with $child$ as the new $parent$.
In this way, all legal duties, starting from flight $f$, are enumerated.
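The depth-first duty enumeration described above can be sketched as follows for a single crew base. This is a minimal illustration, not the authors' implementation: the names `enumerate_duties`, `connections` and `is_legal_duty` are hypothetical, the flight-connection graph is assumed to be given as an adjacency mapping, and the duty constraints are abstracted into a caller-supplied predicate.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def enumerate_duties(
    flights: Sequence[str],
    connections: Dict[str, List[str]],
    is_legal_duty: Callable[[List[str]], bool],
) -> Iterator[Tuple[str, ...]]:
    """Yield every legal duty (a tuple of flights) by depth-first search.

    `connections[f]` lists the flights that may legally follow flight f,
    and `is_legal_duty` stands in for the duty constraints.
    """

    def dfs(duty: List[str], parent: str) -> Iterator[Tuple[str, ...]]:
        for child in connections.get(parent, []):
            duty.append(child)                 # push child into the duty stack
            if is_legal_duty(duty):
                yield tuple(duty)              # record a snapshot of the legal duty
                yield from dfs(duty, child)    # extend it further, depth first
            duty.pop()                         # pop child, then try the next one

    for f in flights:
        duty = [f]                             # push f into an empty duty
        if is_legal_duty(duty):
            yield tuple(duty)
            yield from dfs(duty, f)


# Hypothetical toy network: A connects to B and C, B connects to C.
toy_connections = {"A": ["B", "C"], "B": ["C"]}
# Hypothetical duty rule: at most two flights per duty.
toy_duties = list(enumerate_duties(["A", "B", "C"], toy_connections,
                                   lambda d: len(d) <= 2))
```

As in Algorithm 1, a branch is extended only while the partial flight-sequence remains legal, so an illegal prefix prunes all of its extensions. Running one such enumeration per crew base in separate processes would correspond to the parallel architecture described above.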

[Figure 2: schematic of AirCROP. Blocks shown: Input (set of given flights $\mathcal{F}$, and pairings' legality constraints and costing criteria); Legal Crew Pairing Generator using parallel architecture; Initial Feasible Solution Generator (generation of a set of legal pairings covering all flights using IPDCH); Optimization Engine, comprising CG-driven LPP-solutioning (LP solver on the input pairing set, solving the dual using an interior-point method, CG heuristic, column generation through generation/re-induction of pairings with negative reduced cost, termination check) and IPP-solutioning (integerization of the LPP solution using an MIP solver, termination check); Output (final crew pairing solution).]

Figure 2: A schematic of AirCROP illustrating the interactions between its constitutive modules: the Legal Crew Pairing Generator, the Initial Feasible Solution Generator, and the Optimization Engine (CG-driven LPP-solutioning interacting with IPP-solutioning). The CG heuristic in LPP-solutioning generates a set of fresh pairings $\mathcal{P}^t_{CG}$ at any LPP iteration $t$ using the following CG strategies: Deadhead reduction ($CGD$, generating $\mathcal{P}^t_{CGD}$), Crew Utilization enhancement ($CGU$, generating $\mathcal{P}^t_{CGU}$), Archiving ($CGA$, generating $\mathcal{P}^t_{CGA}$), and Random exploration ($CGR$, generating $\mathcal{P}^t_{CGR}$). The interactions between LPP-solutioning and IPP-solutioning are tracked by the counter $T$.

Subsequently, $f$ is popped out from $duty$, and duty enumeration using the other flights in $\mathcal{F}$ is explored (lines 3 & 11). The resulting set $\mathcal{D}_b$ is then used to generate the duty overnight-connection graph $\mathcal{G}^d_b$, by evaluating the legality of connection-city ($\mathcal{C}_{connect}$) and overnight-rest ($\mathcal{C}_{night}$) constraints between every duty-pair $\in \mathcal{D}_b$ (line 13). Here, in $\mathcal{G}^d_b$ ($\equiv (\mathcal{D}_b, \mathcal{E}^d_b)$), $\mathcal{D}_b$ is the set of nodes (legal duties), and $\mathcal{E}^d_b$ is the set of edges (legal overnight-rest connections).

Algorithm 1: Procedure for enumeration of legal duties and duty overnight-connection graphs
    Input: $\mathcal{F}$; $\mathcal{B}$; and constraints: $\mathcal{C}_{connect}$, $\mathcal{C}_{sit}$, $\mathcal{C}_{duty}$ & $\mathcal{C}_{night}$
    Output: $\mathcal{D}_b$ & $\mathcal{G}^d_b$ $\forall b \in \mathcal{B}$
     1  $\mathcal{G}^f$ ← Generate the flight-connection graph by evaluating $\mathcal{C}_{connect}$ & $\mathcal{C}_{sit}$ between each pair of flights $\in \mathcal{F}$    ⊳ $\mathcal{G}^f \equiv (\mathcal{F}, \mathcal{E}^f)$
     2  for each crew base $b \in \mathcal{B}$ in parallel do
     3      for each flight $f \in \mathcal{F}$ do
     4          Push $f$ into an empty $duty$
     5          if updated flight-sequence in $duty$ satisfies constraints in $\mathcal{C}_{duty}$ then
     6              Add $duty$ to $\mathcal{D}_b$
     7              if $f$ has at least one flight-connection in $\mathcal{G}^f$ then
     8                  DFS($duty$, $f$, $\mathcal{G}^f$, $\mathcal{C}_{duty}$), and add the enumerated duties to $\mathcal{D}_b$
     9              end
    10          end
    11          Pop out $f$ from $duty$
    12      end
    13      $\mathcal{G}^d_b$ ← Generate the duty overnight-connection graph by evaluating $\mathcal{C}_{night}$ between each pair of duties $\in \mathcal{D}_b$
    14  end
    15  return $\mathcal{D}_b$ & $\mathcal{G}^d_b$ $\forall b \in \mathcal{B}$
    ⊳ DFS($duty$, $parent$, $\mathcal{G}^f$, $\mathcal{C}_{duty}$)
    16  for each $child$ of $parent$ in $\mathcal{G}^f$ do
    17      Push $child$ into $duty$
    18      if updated flight-sequence in $duty$ satisfies $\mathcal{C}_{duty}$ then
    19          yield $duty$ to $\mathcal{D}_b$
    20          if $child$ has at least one connection in $\mathcal{G}^f$ then
    21              DFS($duty$, $child$, $\mathcal{G}^f$, $\mathcal{C}_{duty}$)
    22          end
    23      end
    24      Pop out $child$ from $duty$
    25  end

The preprocessed sets of legal duties and the corresponding duty overnight-connection graphs are utilized to enumerate legal pairings for any input flight set (say $\mathcal{F}^*$) or duty set (say $\mathcal{D}^*$), when required in real time in the other modules of AirCROP. Its procedure, formalized in Algorithm 2, is elaborated below. For legal pairing enumeration, the same crew-base-driven parallel architecture is utilized, in which the process is decomposed into independent sub-processes, one for each crew base $b \in \mathcal{B}$, running in parallel on idle cores of the CPU (line 1). In each of these sub-processes, the first step is to update $\mathcal{D}_b$ and $\mathcal{G}^d_b$, by removing duties $\notin \mathcal{D}^*$ if $\mathcal{D}^*$ is input, or those duties that cover flights $\notin \mathcal{F}^*$ if $\mathcal{F}^*$ is input (line 2). Subsequently, the enumeration of legal pairings, starting from each duty ($duty$) $\in \mathcal{D}_b$, is explored (line 3). In that:
• the $duty$ is pushed into an empty candidate pairing stack, given by $pairing$, only if the departure airport of $duty$ is the same as the crew base $b$ (lines 4-5).
• the $pairing$ is checked for satisfaction of the pairing constraints $\mathcal{C}_{other}$, and if satisfied, $pairing$ is further checked for satisfaction of the end-city constraint $\mathcal{C}_{base}$, which ensures that the arrival airport of the $pairing$'s last duty is the same as the crew base $b$.
    – If $pairing$ satisfies $\mathcal{C}_{base}$, it is classified as legal, and is added to the desired pairing set $\mathcal{P}^*$ (lines 7-8).
    – If $pairing$ does not satisfy $\mathcal{C}_{base}$, it is not complete, and more duties are required to be covered in it to complete the legal duty-sequence. This is only possible if $duty$ has at least one overnight-rest connection in $\mathcal{G}^d_b$. If it does, the DFS() sub-routine, similar to the one used in legal duty enumeration, is called recursively to enumerate legal pairings, starting from a parent duty node ($parent$), by exploring all its successive paths in $\mathcal{G}^d_b$ in a depth-first manner (lines 18-28).
      In each recursion:
        ◦ a child duty node ($child$) is pushed into the $pairing$ (line 19).
        ◦ the updated duty-sequence in $pairing$ is checked for satisfaction of first $\mathcal{C}_{other}$ and then $\mathcal{C}_{base}$ (lines 20-21).
        ◦ if it satisfies both constraints, then $pairing$ is complete (legal), and is yielded to the desired pairing set $\mathcal{P}^*$ (line 22).
        ◦ if it satisfies $\mathcal{C}_{other}$ but not $\mathcal{C}_{base}$, then another recursion of DFS() with $child$ as the new $parent$ is called, only if $child$ has at least one duty overnight-rest connection in $\mathcal{G}^d_b$ (lines 23-25).
In the above way, all legal pairings, starting from $duty$, are enumerated using the DFS() sub-routine. Subsequently, $duty$ is popped out of $pairing$ (line 13), and the legal pairing enumeration using the other duties $\in \mathcal{D}_b$ is explored (line 3). Once all the sub-processes are complete, the desired pairing set $\mathcal{P}^*$ is returned (line 17).

Algorithm 2: Procedure for enumeration of legal pairings from an input flight set $\mathcal{F}^*$ or a duty set $\mathcal{D}^*$
    Input: $\mathcal{F}^*$ or $\mathcal{D}^*$; $\mathcal{B}$; $\mathcal{D}_b$ & $\mathcal{G}^d_b$ $\forall b \in \mathcal{B}$; and constraints: $\mathcal{C}_{base}$ & $\mathcal{C}_{other}$
    Output: $\mathcal{P}^*$
     1  for each crew base $b \in \mathcal{B}$ in parallel do
     2      Update $\mathcal{D}_b$ & $\mathcal{G}^d_b$ by removing duties $\notin \mathcal{D}^*$ if $\mathcal{D}^*$ is input, or by removing those duties which cover flights $\notin \mathcal{F}^*$ if $\mathcal{F}^*$ is input
     3      for each $duty \in \mathcal{D}_b$ do
     4          if departure airport of $duty$ is $b$ then
     5              Push $duty$ into an empty $pairing$
     6              if updated duty-sequence in $pairing$ satisfies $\mathcal{C}_{other}$ then
     7                  if updated duty-sequence in $pairing$ satisfies $\mathcal{C}_{base}$ then
     8                      Add $pairing$ to $\mathcal{P}^*$
     9                  else if $duty$ has at least one duty overnight-connection in $\mathcal{G}^d_b$ then
    10                      DFS($pairing$, $duty$, $\mathcal{G}^d_b$, $\mathcal{C}_{base} \cup \mathcal{C}_{other}$), and add enumerated pairings to $\mathcal{P}^*$
    11                  end
    12              end
    13              Pop out $duty$ from $pairing$
    14          end
    15      end
    16  end
    17  return $\mathcal{P}^*$
    ⊳ DFS($pairing$, $parent$, $\mathcal{G}^d_b$, $\mathcal{C}_{base} \cup \mathcal{C}_{other}$)
    18  for each $child$ of $parent$ in $\mathcal{G}^d_b$ do
    19      Push $child$ into $pairing$
    20      if updated duty-sequence in $pairing$ satisfies $\mathcal{C}_{other}$ then
    21          if updated duty-sequence in $pairing$ satisfies $\mathcal{C}_{base}$ then
    22              yield $pairing$ to $\mathcal{P}^*$
    23          else if $child$ has at least one duty overnight-connection in $\mathcal{G}^d_b$ then
    24              DFS($pairing$, $child$, $\mathcal{G}^d_b$, $\mathcal{C}_{base} \cup \mathcal{C}_{other}$)
    25          end
    26      end
    27      Pop out $child$ from $pairing$
    28  end

3.2. Initial Feasible Solution Generator

An initial feasible solution (IFS) is any set of pairings, covering all flights in the given flight schedule, which is used to initialize a CPOP solution approach. For large-scale CPOPs, the standalone generation of an IFS is a computationally challenging task. This module is designed to generate a reasonably-sized IFS in a time-efficient manner for large and complex flight networks, which is then used to initialize the Optimization Engine of AirCROP. For this, it employs a novel Integer Programming based Divide-and-cover Heuristic (IPDCH), which relies on: (a) a divide-and-cover strategy to decompose the input flight schedule into sufficiently small flight subsets, and (b) integer programming to find a lowest-cost pairing set, covering the maximum possible flights for each of the decomposed flight subsets.

The procedure of the proposed IPDCH, formalized in Algorithm 3, is elaborated below. Being an iterative heuristic, IPDCH terminates when all flights in the input set are covered by pairings in the desired IFS, notated as $\mathcal{P}_{IFS}$ (line 1). The input to the heuristic involves the given flight schedule $\mathcal{F}$ (with $F$ number of flights), the pairing generation sub-routine Pairing_Gen() (presented in Section 3.1), and a pre-defined decomposition parameter $K$, which regulates the number of flights to be selected from $\mathcal{F}$ in each IPDCH-iteration. The setting of $K$ largely depends upon the available computational resources and the characteristics of the input flight dataset (as highlighted in Section 4.3.3). In each IPDCH-iteration, first a flight subset, say $\mathcal{F}_K$ ($K < F$), is formed by randomly selecting $K$ flights from $\mathcal{F}$ without replacement (line 2). Subsequently, $\mathcal{F}_K$ is fed as input to the Pairing_Gen() sub-routine to enumerate the set of all possible legal pairings, say $\mathcal{P}_K$ (line 3). Notably, all flights in $\mathcal{F}_K$ may not get covered by pairings in $\mathcal{P}_K$, as random selection of flights does not guarantee legal connections for all selected flights. Let $\mathcal{F}_{K'}$ ($K' \leq K$) be the set of flights covered in $\mathcal{P}_K$ (line 4). The remaining flights, given by $\mathcal{F}_K \setminus \mathcal{F}_{K'}$, are added back to $\mathcal{F}$ (line 5). Subsequently, $\mathcal{F}_{K'}$ and $\mathcal{P}_K$ are used to formulate the corresponding IPP (line 6), which is then solved using a commercial off-the-shelf MIP solver to find the optimal IPP solution, say $\mathcal{P}_{IP}$, constituted by pairings corresponding to only non-zero variables (line 7). The pairings in $\mathcal{P}_{IP}$ are then added to the desired set $\mathcal{P}_{IFS}$ (line 8). Lastly, the flights in $\mathcal{F}$ are replaced if it becomes empty (line 9). As soon as $\mathcal{P}_{IFS}$ covers all the required flights, IPDCH is terminated, and $\mathcal{P}_{IFS}$ is passed over to the Optimization Engine for its initialization.

Algorithm 3: Procedure for IFS generation using the proposed IPDCH
    Input: $\mathcal{F}$, $K$, Pairing_Gen()
    Output: $\mathcal{P}_{IFS}$
     1  while all flights $\in \mathcal{F}$ are not covered in $\mathcal{P}_{IFS}$ do
     2      $\mathcal{F}_K$ ← Select $K$ random flights from $\mathcal{F}$ without replacement    ⊳ $K < F$
     3      $\mathcal{P}_K$ ← Pairing_Gen($\mathcal{F}_K$)
     4      $\mathcal{F}_{K'}$ ← Flights covered in $\mathcal{P}_K$    ⊳ $K' \leq K$
     5      Add remaining flights ($\mathcal{F}_K \setminus \mathcal{F}_{K'}$) back to $\mathcal{F}$
     6      Formulate the IPP using flights in $\mathcal{F}_{K'}$ and pairings in $\mathcal{P}_K$
     7      $\mathcal{P}_{IP}$ ← Solve the IPP using an MIP solver, and select pairings corresponding to non-zero variables
     8      Add pairings from $\mathcal{P}_{IP}$ to $\mathcal{P}_{IFS}$
     9      Replace flights in $\mathcal{F}$ if it becomes empty
    10  end
    11  return $\mathcal{P}_{IFS}$
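The divide-and-cover loop of IPDCH can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' implementation: an exhaustive search (`brute_force_cover`, a hypothetical helper that is only viable for tiny inputs) stands in for the commercial MIP solver, `toy_pairing_gen` and its round-trip data stand in for Pairing_Gen(), and the pool of still-uncovered flights is kept in a list so that uncovered flights simply remain available for later iterations (which makes the replacement step of line 9 unnecessary in this sketch).

```python
import itertools
import random


def brute_force_cover(flights, pairings):
    """Return the cheapest subset of (flight_set, cost) pairings covering `flights`.

    Exhaustive stand-in for the MIP solver of Algorithm 3 (tiny inputs only).
    """
    best, best_cost = [], float("inf")
    for r in range(1, len(pairings) + 1):
        for combo in itertools.combinations(pairings, r):
            covered = set().union(*(p for p, _ in combo))
            cost = sum(c for _, c in combo)
            if covered >= flights and cost < best_cost:
                best, best_cost = list(combo), cost
    return best


def ipdch(flights, k, pairing_gen, rng):
    """Build an initial feasible solution by repeatedly covering random subsets."""
    pool = list(flights)                  # flights not yet covered
    ifs = []
    while pool:
        subset = rng.sample(pool, min(k, len(pool)))       # K random flights
        pairings = pairing_gen(subset)                      # legal pairings on the subset
        covered = set().union(*(p for p, _ in pairings))    # flights that can be covered
        if covered:
            ifs.extend(brute_force_cover(covered, pairings))
        pool = [f for f in pool if f not in covered]        # the rest stay in the pool
    return ifs


# Hypothetical data: three round trips, each a legal pairing with a cost.
ROUND_TRIPS = [
    (frozenset({"A1", "A2"}), 10.0),
    (frozenset({"B1", "B2"}), 12.0),
    (frozenset({"C1", "C2"}), 9.0),
]


def toy_pairing_gen(subset):
    """Toy stand-in for Pairing_Gen(): round trips fully contained in the subset."""
    chosen = set(subset)
    return [(p, c) for p, c in ROUND_TRIPS if p <= chosen]


toy_flights = ["A1", "A2", "B1", "B2", "C1", "C2"]
toy_ifs = ipdch(toy_flights, 4, toy_pairing_gen, random.Random(0))
```

With six flights in three round trips and $K = 4$, every random subset contains at least one complete round trip, so the toy run terminates after at most two iterations. On real data the loop terminates only if every flight can eventually be covered by some generated pairing.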

3.3. Optimization Engine

The search for a minimal-cost, full flight-coverage CPOP solution is enabled by an optimization engine. It tackles the underlying LPP and IPP through intermittent interactions of two submodules, namely CG-driven LPP-solutioning and IPP-solutioning, tracked by a counter $T$. These submodules are presented below.

3.3.1. CG-driven LPP-solutioning

As illustrated in Figure 2, this submodule entails several iterations (each referred to as an LPP iteration, and tracked by $t$), in each of which: (a) an LP solver is invoked on the input pairing set, leading to the current LPP solution $\mathcal{P}^t_{LP}$; (b) the corresponding dual of the LPP is formulated using $\mathcal{P}^t_{LP}$, which is then solved to fetch the dual variables (given by the vector $Y^t$); and (c) a fresh set of pairings $\mathcal{P}^t_{CG}$, which promises associated cost-improvement, is generated using a domain-knowledge driven CG heuristic. For the first LPP iteration ($t = 1$), the input to the LP solver is either $\mathcal{P}_{IFS}$ if $T = 1$, or $\mathcal{P}^{T-1}_{IP}$ if $T > 1$. For any subsequent LPP iteration ($t > 1$), the input comprises $\mathcal{P}^{t-1}_{CG}$ and $\mathcal{P}^{t-1}_{LP}$ from the preceding LPP iteration. In this background, each of these LPP iterations is implemented in the following three phases. (For ease of reference, the notations introduced in these phases are kept independent of the LPP iteration counter $t$. However, these notations are superscripted by $t$ in the corresponding discussions and pseudocodes that refer to a particular LPP iteration.)

• In the first phase, a primal of the LPP (Equations 4 to 6) is formulated from the input pairing set, and is solved using an interior-point method based commercial off-the-shelf LP solver (Gurobi Optimization, 2019). In the resulting LPP solution, a primal variable $x_j$, varying from 0 to 1, is assigned to each pairing $p_j$ in the input pairing set. These $x_j$s together constitute the primal vector, notated as $X$ ($= [x_1\ x_2\ x_3\ \ldots\ x_P]^{\mathsf{T}}$). The set of $x_j$s with non-zero values ($x_j \neq 0$) and the set of corresponding pairings are notated as $X_{LP}$ and $\mathcal{P}_{LP}$, respectively.

\[ \text{Minimize } Z^p_{LP} = \sum_{j=1}^{P} c_j x_j + \psi_D \cdot \left( \sum_{i=1}^{F} \left( \sum_{j=1}^{P} a_{ij} x_j - 1 \right) \right) = \sum_{j=1}^{P} \left( c_j + \psi_D \cdot \sum_{i=1}^{F} a_{ij} \right) x_j - F \cdot \psi_D, \tag{4} \]
\[ \text{subject to } \sum_{j=1}^{P} a_{ij} x_j \geq 1, \quad \forall i \in \{1, 2, \ldots, F\} \tag{5} \]
\[ x_j \in \mathbb{R} = [0, 1], \quad \forall j \in \{1, 2, \ldots, P\} \tag{6} \]

  It is to be noted that the minimization of $Z^p_{LP}$ will always lead to a solution with all primal variables $x_j \leq 1$, even without explicitly involving the corresponding constraint, Equation 6 (Vazirani, 2003). Hence, the contribution of each pairing in the LPP solution, given by its $x_j$, could be effectively treated as $x_j \in \mathbb{R}_{\geq 0}$ instead of Equation 6.

• In the second phase, dual variables are extracted from the current LPP solution. For this, the dual of the LPP (Equations 7 to 9) is formulated using the pairing set $\mathcal{P}_{LP}$, and is solved using an interior-point method (Andersen & Andersen, 2000) based non-commercial LP solver (Virtanen et al., 2020), to fetch the optimal dual solution. In that, a dual variable $y_i$ represents a shadow price corresponding to the $i$-th flight-coverage constraint in the primal. The optimal dual vector, constituted by all $y_i$s in the optimal dual solution, is notated as $Y$ ($= [y_1\ y_2\ y_3\ \ldots\ y_F]^{\mathsf{T}}$), whose dimension is equal to $F$.

\[ \text{Maximize } Z^d_{LP} = \sum_{i=1}^{F} y_i - F \cdot \psi_D, \tag{7} \]
\[ \text{subject to } \sum_{i=1}^{F} a_{ij} y_i \leq \left( c_j + \psi_D \cdot \sum_{i=1}^{F} a_{ij} \right), \quad \forall j \in \{1, 2, \ldots, P_{LP}\} \tag{8} \]
\[ y_i \in \mathbb{R}_{\geq 0}, \quad \forall i \in \{1, 2, \ldots, F\} \tag{9} \]

  where $P_{LP}$ is the number of pairings in the set $\mathcal{P}_{LP}$, and $y_i$ is the dual variable corresponding to the $i$-th flight-coverage constraint.

  Notably, in a conventional approach, the optimal $Y$ is directly computed from the optimal basis of the primal solution (obtained in the first phase), using the principles of duality theory, particularly the theorem of complementary slackness (Bertsimas & Tsitsiklis, 1997), without explicitly solving the corresponding dual.
  However, in the second phase, solving the dual explicitly using the interior-point method (Andersen & Andersen, 2000), in a sense, helps in stabilizing the oscillating behavior of dual variables over the successive LPP iterations (bang-bang effect, as discussed in Section 2.3). Moreover, this interior-point method is available only via a non-commercial LP solver (Virtanen et al., 2020), and to ensure a time-efficient search, the above dual is formulated using the pairings $\in \mathcal{P}_{LP}$, instead of pairings from the large-sized input pairing set.

• In the last phase, the availability of dual variables from the second phase paves the way for the solution to the pricing sub-problem. It is aimed at generating those legal pairings (non-basic) which, if included as part of the input to the next LPP iteration, promise a better-cost (or at least a similar-cost) LPP solution compared to the current solution. Such non-basic pairings are identified using a reduced cost metric, given by $\mu_j$ (Equation 10), which if negative (as CPOP is a minimization problem) indicates the potential in the pairing to further reduce the cost of the current LPP solution $Z^p_{LP}$, when included in the current basis (Bertsimas & Tsitsiklis, 1997). Moreover, the potential of such a pairing to further reduce the current $Z^p_{LP}$ is in proportion to the magnitude of its $\mu_j$ value.

\[ \mu_j = c_j - \mu d_j, \quad \text{where } \mu d_j = \sum_{i=1}^{F} (a_{ij} \cdot y_i) = \sum_{f_i \in p_j} y_i \ \text{ represents the dual cost component of } \mu_j \tag{10} \]

  As mentioned in Section 2.3, the standard CG practices generate a complete pricing network and solve it as a resource-constrained shortest-path optimization problem, to identify only the pairing(s) with negative reduced cost(s). However, the generation of a complete pricing network for CPOPs with large-scale and complex flight networks is computationally intractable.
  To overcome this challenge, a domain-knowledge driven CG heuristic (Aggarwal, Saxena, et al., 2020a) is employed here to generate a set of promising pairings (of pre-defined size, the criterion for which is discussed in Section 4.2). Notably, the merit of this CG heuristic lies in the fact that, from within the larger pool of pairings with negative $\mu_j$, besides selecting pairings randomly, it also selects pairings in a guided manner. In that, the selection of such pairings is guided by optimal solution features at a set level and an individual pairing level, and by re-utilization of the past computational efforts. These optimal solution features are related to the minimization of deadheads and the maximization of the crew utilization, respectively. In essence, while the standard CG practices present equal opportunity for any pairing with a negative $\mu_j$ to qualify as an input for the next LPP iteration, this CG heuristic, besides ensuring that the pairings have negative $\mu_j$, prioritizes some pairings over the others via its two-pronged strategy: exploration of the new pairings' space and re-utilization of pairings from the past LPP iterations. In that:
    – the exploration of the new pairings' space is guided by three CG strategies, which are elaborated below.
        ◦ Deadhead Reduction strategy ($CGD$): this strategy prioritizes a set of legal pairings that is characterized by low deadheads, a feature which domain knowledge recommends for optimality at a set level. To exploit this optimality feature, $CGD$ generates a new pairing set $\mathcal{P}_{CGD}$, which not only provides an alternative way to cover the flights involved in a subset of the current $\mathcal{P}_{LP}$, but also ensures that some of these flights get covered with zero deadheads. It promises propagation of the zero deadhead feature over successive LPP iterations, as: (a) $\mathcal{P}_{CGD}$ alongside the current $\mathcal{P}_{LP}$ forms a part of the input for the next LPP iteration; (b) $\mathcal{P}_{CGD}$ provides a scope for better coverage (zero deadhead) of some flights, compared to the current $\mathcal{P}_{LP}$; and (c) $\mathcal{P}_{CGD}$ may focus on zero deadhead coverage for different flights in different LPP iterations.
        ◦ Crew Utilization enhancement strategy ($CGU$): this strategy prioritizes a set of legal pairings, each member of which is characterized by high crew utilization, a feature which domain knowledge recommends for optimality at an individual pairing level. To exploit this optimality feature, $CGU$: (a) introduces a new measure, namely, the crew utilization ratio, given by $\gamma_j$ (Equation 11), to quantify the degree of crew utilization in a pairing $p_j$ at any instant; (b) identifies pairings from the current $\mathcal{P}_{LP}$ which are characterized by a high dual cost component ($\mu d_j$, Equation 10), reflecting in turn on those constitutive flights that have high values of the dual variables $y_i$, and hence, on the potential of these flights to generate new pairings with more negative $\mu_j$; and (c) utilizes these flights to generate promising pairings, from which only the ones with high $\gamma_j$ are picked to constitute the new pairing set $\mathcal{P}_{CGU}$.

\[ \gamma_j = \frac{1}{\text{Number of duties in } p_j} \cdot \sum_{d \in p_j} \frac{\text{Working hours in duty } d}{\text{Permissible hours of duty } d} \tag{11} \]

          In doing so, $CGU$ promises propagation of the higher crew utilization ratio over successive LPP iterations, given that in each LPP iteration, $\mathcal{P}_{CGU}$ alongside the current $\mathcal{P}_{LP}$ forms a part of the input for the next LPP iteration.
        ◦ Random exploration strategy ($CGR$): this strategy, unlike $CGU$ and $CGD$, which are guided by optimal solution features, pursues random and unbiased exploration of the new pairings' space, independent of the current LPP solution. It involves the generation of new pairings for a randomly selected set of legal duties, from which only the pairings with negative reduced cost are selected to constitute the new pairing set $\mathcal{P}_{CGR}$. Here, a random set of legal duties is used instead of a random set of flights, as the former has a higher probability of generating legal pairings, given that a majority of the pairing legality constraints get satisfied with the preprocessing of legal duties.
    – the re-utilization of pairings from the past LPP iterations is guided by an Archiving strategy ($CGA$), which prioritizes a set of legal pairings comprising those flight-pairs which, as per the existing LPP solution, bear better potential for improvement in the objective function. Such a pairing set, originating from the flight-pair level information, is extracted from an archive (denoted by $\mathcal{A}$) of the previously generated pairings. In doing so, $CGA$ facilitates re-utilization of the past computational efforts, by providing an opportunity for a previously generated pairing to be re-inducted in the current pairing pool. For this, $CGA$:
        ◦ updates the archive $\mathcal{A}$ in each LPP iteration such that any pairing is stored/retrieved with reference to a unique index $(f_m, f_n)$ reserved for any legal flight-pair in that pairing.
        ◦ introduces a new measure, namely, the reduced cost estimator, given by $\eta_{mn}$ (Equation 12), for a flight-pair $(f_m, f_n)$ in $\mathcal{A}$. In each LPP iteration, this estimator is computed for all the flight-pairs present in $\mathcal{A}$, by fetching $f_m$, $f_n$, $y_m$ and $y_n$.

\[ \eta_{mn} = \text{flying\_cost}(f_m) + \text{flying\_cost}(f_n) - y_m - y_n = \sum_{i \in \{m, n\}} \left( \text{flying\_cost}(f_i) - y_i \right) \tag{12} \]

          Notably, this formulation is analogous to Equation 10, except that instead of the complete cost of a pairing, only the flying costs corresponding to the flights in a legal flight-pair are accounted for. Given this, $\eta_{mn}$ may be seen as an indicator of $\mu_j$ at the flight-pair level.
        ◦ recognizes that, towards further improvement in the current LPP solution, it may be prudent to include as a part of the input for the next LPP iteration the new pairing set $\mathcal{P}_{CGA}$, constituted by preferentially picking pairings from $\mathcal{A}$ that cover flight-pairs with lower $\eta_{mn}$ values.
      In doing so, $CGA$ pursues the goal of continual improvement in the objective function, while relying on the flight-pair level information embedded in the LPP solution of the current LPP iteration, and re-utilizing the computational efforts spent till that LPP iteration.
  For further details of the above domain-knowledge driven CG heuristic, interested readers are referred to the authors' previous work (Aggarwal, Saxena, et al., 2020a). Once this CG heuristic generates a set of promising pairings $\mathcal{P}_{CG}$ of pre-defined size, it is merged with the current $\mathcal{P}_{LP}$, and fed as the input to the next LPP iteration ($t$ += 1).

In this submodule, these LPP iterations are repeated until its termination criterion is met. In that, the cost-improvement over LPP iterations is observed, and if it falls below a pre-specified cost-threshold, say $Th_{cost}$, over a pre-specified number of successive LPP iterations, say $Th_t$, then the submodule is terminated. The settings of these pre-specified limits, $Th_{cost}$ and $Th_t$, are highlighted in Section 4.2. After termination, the final LPP solution $\mathcal{P}^T_{LP}$ is passed over to the IPP-solutioning submodule for its integerization.
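The three pairing-level and flight-pair-level measures used by the CG heuristic (Equations 10 to 12) are simple enough to sketch directly. The function names, argument layouts and numbers below are hypothetical and chosen only for illustration: a pairing is represented by its cost and the flights it covers, and a duty by a (working hours, permissible hours) tuple.

```python
def reduced_cost(cost, flights, y):
    """Equation 10: mu_j = c_j minus the sum of duals y_i of the flights in pairing j."""
    return cost - sum(y[f] for f in flights)


def crew_utilization_ratio(duties):
    """Equation 11: mean over duties of (working hours / permissible hours)."""
    return sum(worked / allowed for worked, allowed in duties) / len(duties)


def reduced_cost_estimator(f_m, f_n, flying_cost, y):
    """Equation 12: eta_mn for the flight-pair (f_m, f_n)."""
    return sum(flying_cost[f] - y[f] for f in (f_m, f_n))


# Hypothetical dual values and costs.
y = {"f1": 4.0, "f2": 3.0, "f3": 1.0}
flying_cost = {"f1": 2.0, "f2": 2.5, "f3": 1.5}

mu = reduced_cost(6.0, ["f1", "f2"], y)                      # 6.0 - (4.0 + 3.0)
gamma = crew_utilization_ratio([(6.0, 8.0), (4.0, 8.0)])     # (0.75 + 0.5) / 2
eta = reduced_cost_estimator("f1", "f2", flying_cost, y)     # (2.0 - 4.0) + (2.5 - 3.0)
```

In this toy instance the pairing has a negative reduced cost, so it would qualify as a candidate column, and the flight-pair estimator is also negative, which is the kind of flight-pair that the Archiving strategy would favor.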

3.3.2. IPP-solutioning

This submodule receives as input the LPP solution $\mathcal{P}^T_{LP}$, and aims to find therein a full-coverage integer solution, notated as $\mathcal{P}^T_{IP}$. Towards it, an IPP (Equations 1 to 3) is formulated using $\mathcal{P}^T_{LP}$ and $\mathcal{F}$, and solved using a branch-and-cut algorithm based off-the-shelf commercial MIP solver (Gurobi Optimization, 2019). At each node of the MIP-search tree, this solver maintains a valid lower bound (cost of the LPP solution) and a best upper bound (cost of the IPP solution), and it self-terminates if the gap between these two bounds becomes zero, or if all branches in the MIP-search tree have been explored. Considering that the MIP-search for large-scale CPOPs is extremely time-consuming, a pre-defined time limit, notated as $Th_{ipt}$ (setting highlighted in Section 4.2), is used to terminate this MIP solver if it does not self-terminate before then. Once $\mathcal{P}^T_{IP}$ is obtained, it is passed back to the previous submodule for the next LPP-IPP interaction ($T$ += 1), only if the termination criterion of the Optimization Engine is not satisfied.

3.3.3. Overarching Optimization Engine

In the wake of the above, the procedure of the overarching Optimization Engine, formalized in Algorithm 4, is elabo-rated below. Its input involves the given ﬂight set  ; the generated IFS  𝐼𝐹 𝑆 ; the pre-deﬁned termination parameters–

𝑇 ℎ 𝑐𝑜𝑠𝑡 & 𝑇 ℎ 𝑡 (for CG-driven LPP-solutioning) and 𝑇 ℎ 𝑖𝑝𝑡 (for IPP-solutioning); and the sub-routines for Legal CrewPairing Generator (

Pairing_Gen() ) and the four CG strategies (

CGD() , CGU() , CGR() and

CGA() ) in the proposedCG heuristic. In each LPP-IPP interaction of the Optimization Engine, ﬁrst, the CG-driven LPP-solutioning is exe-cuted (lines 3-25). It entails several LPP iterations (tracked by 𝑡 ), in each of which the ﬁrst step is to formulate the primal using  and the respective input pairing set. This input pairing set is:•  𝐼𝐹 𝑆 , if the ﬁrst LPP iteration ( 𝑡 = 1 ) of the ﬁrst LPP-IPP interaction ( 𝑇 = 1 ) is being executed (lines 5-6).•  𝑇 −1 𝐼𝑃 , if the ﬁrst LPP iteration ( 𝑡 = 1 ) of any subsequent LPP-IPP interaction ( 𝑇 > ) is being executed (lines7-8).•  𝑡 −1 𝐶𝐺 ∪  𝑡 −1 𝐿𝑃 , if any subsequent LPP iteration ( 𝑡 > ) of any LPP-IPP interaction ( 𝑇 ≥ ) is being executed (lines9-11).Once the primal is formulated, it is solved using the corresponding LP solver to obtain the current optimal LPP solu-tion, constituted by  𝑡𝐿𝑃 and 𝑋 𝑡𝐿𝑃 (line 12). Subsequently, the termination criterion of CG-driven LPP-solutioning ischecked (lines 13-16). If it is terminated, then the current LPP solution  𝑡𝐿𝑃 is fetched as the ﬁnal LPP solution  𝑇𝐿𝑃

D. Aggarwal et al.:

Preprint submitted to Elsevier

Page 14 of 28rew pairing optimization framework for tackling large-scale & complex ﬂight networks

Algorithm 4:

Procedure for the Optimization Engine

Input:  ,  𝐼𝐹𝑆 , 𝑇 ℎ 𝑐𝑜𝑠𝑡 , 𝑇 ℎ 𝑡 , 𝑇 ℎ 𝑖𝑝𝑡 , Pairing_Gen() , CGD() , CGU() , CGR() , CGA()

Output: $\mathcal{P}_{IP}^{\star}$
 1  $T \leftarrow 1$
 2  while termination criterion of Optimization Engine is not met do
        ⊳ CG-driven LPP-solutioning:
 3      $t \leftarrow 1$
 4      while termination criterion of CG-driven LPP-solutioning is not met do
 5          if $t = 1$ and $T = 1$ then
 6              Formulate the primal of the LPP using $\mathcal{P}_{IFS}$ and $\mathcal{F}$
 7          else if $t = 1$ and $T > 1$ then
 8              Formulate the primal of the LPP using $\mathcal{P}_{IP}^{T-1}$ and $\mathcal{F}$
 9          else
10              Formulate the primal of the LPP using $\mathcal{P}_{CG}^{t-1} \cup \mathcal{P}_{LP}^{t-1}$ and $\mathcal{F}$
11          end
12          $\mathcal{P}_{LP}^{t}, X_{LP}^{t} \leftarrow$ Solve the primal using the interior-point method based LP solver
            ⊳ Termination of the CG-driven LPP-solutioning:
13          if cost-improvements $\leq Th_{cost}$ over last $Th_{t}$ number of successive LPP iterations then
14              $\mathcal{P}_{LP}^{T} \leftarrow \mathcal{P}_{LP}^{t}$
15              Break
16          end
17          Formulate the dual of the LPP using $\mathcal{P}_{LP}^{t}$ and $\mathcal{F}$
18          $Y^{t} \leftarrow$ Solve the dual using the interior-point method based LP solver
            ⊳ Solution to pricing sub-problem using the CG heuristic:
19          $\mathcal{P}_{CGD}^{t} \leftarrow$ CGD($\mathcal{P}_{LP}^{t}, X_{LP}^{t}, Y^{t}, \ldots$)
20          $\mathcal{P}_{CGU}^{t} \leftarrow$ CGU($\mathcal{P}_{LP}^{t}, X_{LP}^{t}, Y^{t}, \ldots$)
21          $\mathcal{P}_{CGR}^{t} \leftarrow$ CGR($Y^{t}, \ldots$)
22          $\mathcal{P}_{CGA}^{t} \leftarrow$ CGA($\mathcal{P}_{LP}^{t}, X_{LP}^{t}, Y^{t}, \ldots$)
23          $\mathcal{P}_{CG}^{t} \leftarrow \mathcal{P}_{CGD}^{t} \cup \mathcal{P}_{CGU}^{t} \cup \mathcal{P}_{CGR}^{t} \cup \mathcal{P}_{CGA}^{t}$
24          $t$ += 1
25      end
        ⊳ IPP-solutioning:
26      Formulate the IPP using $\mathcal{P}_{LP}^{T}$ and $\mathcal{F}$
27      $\mathcal{P}_{IP}^{T} \leftarrow$ Solve the IPP using a branch-and-cut algorithm based MIP solver until its run-time becomes $\geq Th_{ipt}$
        ⊳ Termination of the Optimization Engine:
28      if $Z_{IP}^{T}$ (cost of $\mathcal{P}_{IP}^{T}$) = $Z_{LP}^{T}$ (cost of $\mathcal{P}_{LP}^{T}$) then
29          $\mathcal{P}_{IP}^{\star} \leftarrow \mathcal{P}_{IP}^{T}$
30          Break
31      end
32      $T$ += 1
33  end
34  return $\mathcal{P}_{IP}^{\star}$

of this LPP-IPP interaction. If not, then a dual is formulated using $\mathcal{P}_{LP}^{t}$ and $\mathcal{F}$ (line 17), which is then solved using the corresponding LP solver to obtain the current optimal dual vector $Y^{t}$ (line 18). Using the current $\mathcal{P}_{LP}^{t}$, $X_{LP}^{t}$ and $Y^{t}$, a fresh set of pairings $\mathcal{P}_{CG}^{t}$ is obtained using the CG heuristic, which is constituted by the new pairing sets from the four underlying CG strategies (lines 19-23). At the end of the LPP iteration $t$, the fresh set of pairings $\mathcal{P}_{CG}^{t}$ is combined with the current $\mathcal{P}_{LP}^{t}$ to serve as the input pairing set for the subsequent LPP iteration ($t$ += 1). Once this submodule is terminated, the resulting $\mathcal{P}_{LP}^{T}$ is passed over to the IPP-solutioning for its integerization, wherein the MIP solver is used to obtain the IPP solution $\mathcal{P}_{IP}^{T}$ (lines 26 and 27). In that, the pre-defined $Th_{ipt}$ time-limit is used to terminate the MIP-search, if it does not self-terminate a priori. Subsequently, the resulting $\mathcal{P}_{IP}^{T}$ is passed back to the CG-driven LPP-solutioning for the next LPP-IPP interaction ($T$ += 1), or returned as the final integer solution $\mathcal{P}_{IP}^{\star}$, depending upon the termination condition of the Optimization Engine (lines 28-32). In that, if the cost of $\mathcal{P}_{IP}^{T}$ ($Z_{IP}^{T}$) matches the cost of $\mathcal{P}_{LP}^{T}$ ($Z_{LP}^{T}$), then the Optimization Engine is terminated.

D. Aggarwal et al.: Preprint submitted to Elsevier
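The control flow of the Optimization Engine described above can be sketched in Python as follows. This is only an illustrative skeleton under stated assumptions, not the authors' implementation: the callables `solve_lp`, `generate_pairings` and `solve_ip` are hypothetical stand-ins for the LP solver, the CG heuristic and the MIP solver; the inner termination test is one possible reading of "cost-improvements $\leq Th_{cost}$ over the last $Th_{t}$ successive LPP iterations"; and `max_interactions` and `tol` are added safeguards that do not appear in the algorithm.

```python
from typing import Callable, List, Sequence, Tuple

Pairing = Tuple[int, ...]  # assumption: a pairing is the tuple of flight indices it covers


def cg_lpp_solutioning(pool: List[Pairing], flights: Sequence[int],
                       solve_lp: Callable, generate_pairings: Callable,
                       th_cost: float, th_t: int):
    """Inner loop: iterate LP solves and column generation until the cost stalls."""
    prev_cost, stalled = None, 0
    while True:
        # Hypothetical solver: returns the LP pairing set, its cost and the dual vector.
        lp_pairings, lp_cost, duals = solve_lp(pool, flights)
        if prev_cost is not None:
            # Count successive iterations whose improvement is at most th_cost.
            stalled = stalled + 1 if prev_cost - lp_cost <= th_cost else 0
            if stalled >= th_t:
                return lp_pairings, lp_cost
        prev_cost = lp_cost
        # The fresh pairings join the current LP pairings for the next iteration.
        pool = list(lp_pairings) + list(generate_pairings(lp_pairings, duals))


def optimization_engine(flights: Sequence[int], initial_pairings: List[Pairing],
                        solve_lp: Callable, generate_pairings: Callable,
                        solve_ip: Callable, th_cost: float = 100.0, th_t: int = 10,
                        th_ipt_s: float = 20 * 60, max_interactions: int = 30,
                        tol: float = 1e-6):
    """Outer loop: alternate LPP-solutioning and IPP-solutioning (counter T)."""
    pool = list(initial_pairings)
    for T in range(1, max_interactions + 1):
        lp_pairings, lp_cost = cg_lpp_solutioning(
            pool, flights, solve_lp, generate_pairings, th_cost, th_t)
        # Hypothetical MIP call with a run-time limit in seconds.
        ip_pairings, ip_cost = solve_ip(lp_pairings, flights, th_ipt_s)
        if abs(ip_cost - lp_cost) <= tol:  # integer cost matches the LP cost
            break
        pool = list(ip_pairings)  # the IP solution seeds the next interaction
    return ip_pairings, ip_cost, T
```

Passing the solvers in as callables keeps the sketch independent of any particular LP or MIP library.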

4. Computational Experiments

This section first presents the test cases and the computational setup used to investigate the utility of AirCROP, its modules, and their interactions. Subsequently, the settings of parameters involved in different modules of AirCROP are presented. Lastly, the experimental results are discussed.

The real-world airline test cases used for experimentation are detailed in Table 2. Each of these test cases involves a weekly flight schedule, and has been provided by the research consortium's industrial sponsor (from the networks of US-based airlines). The columns in Table 2, in order of their occurrence, highlight the notations for the different test cases; the number of constituent flights; the number of constituent crew bases; the number of airports; and the total number of legal duties involved, respectively. It is critical to recognize that the challenge associated with solutioning of these test cases depends not just on the number of flights involved but also on the fact that these flights are part of complex flight networks, characterized by a multiplicity of hubs as opposed to a single hub, and a multiplicity of crew bases as opposed to a single crew base. In that, the number of legal pairings possible grows exponentially with the number of hubs and crew bases. As a sample instance, the geographical representation of the flight network associated with TC-5, and the legal flight connections involved in it, are portrayed in Figure 3. Notably, in Figure 3a, the presence of multiple hub-and-spoke subnetworks and multiple crew bases (highlighted in yellow color) is evident. Furthermore, the pattern visible in Figure 3b could be attributed to the (minimum and maximum) limits on the sit-time and overnight-rest constraints. For instance, a flight, say $f$, has legal connections only with those flights that depart from the arrival airport of $f$, and whose departure-time gap (difference between its departure-time and the arrival time of $f$) lies within the minimum and maximum allowable limits of the sit-time or the overnight-rest.

Table 2
Real-world airline test cases used in this research work

    Test Cases   Flights   Crew Bases   Airports   Legal Duties
    TC-1         3202      15           88         454205
    TC-2         3228      15           88         464092
    TC-3         3229      15           88         506272
    TC-4         3265      15           90         446937
    TC-5         4212      15           88         737184

Figure 3: (a) Geographical representation of TC-5 flight network, where the red nodes, green edges and yellow nodes represent the airports, scheduled flights and crew bases, respectively, and (b) legal flight-connections, each represented by a point in the plot, where for a flight marked on the y-axis, the connecting flight is marked on the x-axis.

All the experiments in this research have been performed on an HP Z640 Workstation, which is powered by two Intel® Xeon® E5-2630v3 processors, each with 16 cores at 2.40 GHz, and 96 GB of RAM. All codes related to AirCROP have been developed using the Python scripting language in alignment with the industrial sponsor's larger vision and preference. Furthermore:
• the interior-point method from Gurobi Optimizer 8.1.1 (Gurobi Optimization, 2019) is used to solve the primal in the CG-driven LPP-solutioning submodule.
• the interior-point method (Andersen & Andersen, 2000) from SciPy's linprog routine (Virtanen et al., 2020) is used to solve the dual in the CG-driven LPP-solutioning submodule.
• the branch-and-cut algorithm based MIP solver from Gurobi Optimizer 8.1.1 is used to solve the IPP in the Initial Feasible Solution Generator and the IPP-solutioning submodule.
• an AirCROP-run, in principle, terminates when the cost of the IPP solution matches the cost of its input LPP solution in a particular LPP-IPP interaction. However, for practical considerations on the time-limit, an AirCROP-run is allowed to terminate if the IPP and LPP costs do not conform with each other even after 30 LPP-IPP interactions are over, or 30 hours of total run-time has elapsed.
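Returning to the legal flight-connection rule described above for Figure 3b, a minimal Python sketch of such a check is given below. It is illustrative only: the `Flight` record, the airport codes, and the numeric sit-time and overnight-rest windows are hypothetical placeholders (the actual limits are not stated in this section), and wrap-around at the end of the weekly schedule is ignored.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Flight:
    flight_id: int
    origin: str
    destination: str
    dep_min: int  # departure time, in minutes from the start of the week
    arr_min: int  # arrival time, in minutes from the start of the week


def is_legal_connection(f: Flight, g: Flight,
                        sit_window: Tuple[int, int] = (30, 240),      # placeholder limits
                        rest_window: Tuple[int, int] = (600, 1200)):  # placeholder limits
    """Return True if flight g may legally follow flight f."""
    # The connecting flight must depart from the arrival airport of f.
    if g.origin != f.destination:
        return False
    # Departure-time gap: departure of g minus arrival of f.
    gap = g.dep_min - f.arr_min
    within_sit = sit_window[0] <= gap <= sit_window[1]
    within_rest = rest_window[0] <= gap <= rest_window[1]
    return within_sit or within_rest
```

Because only gaps inside one of the two windows are admissible, plotting all admissible (flight, connecting flight) pairs would be expected to give a banded pattern of the kind attributed to Figure 3b.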

The settings of the parameters associated with different modules and submodules of AirCROP are as highlighted below.
• Initial Feasible Solution Generator: here, the proposed IPDCH involves the decomposition parameter $K$, which regulates the size of flight subsets formed in each IPDCH-iteration. As mentioned before, the setting of $K$ is dependent on the characteristics of the input flight dataset and the configuration of available computational resources. Here, the aim is to cover all given flights in a time-efficient manner. Hence, it is important to understand the effect of the setting of $K$ on the time-performance of IPDCH, which is highlighted below.
  – For a relatively lower value of $K$, smaller flight subsets with fewer legal flight-connections would be formed in each IPDCH-iteration, leading to coverage of relatively fewer unique flights in each of them. Though this by itself is not a challenge, it would necessitate a significant number of additional IPDCH-iterations (and the respective run-time), since the number of unique flights covered per IPDCH-iteration, which by construct reduces with the iterations, would get further reduced with relatively smaller flight subsets.
  – On the flip side, for a relatively higher value of $K$, bigger flight subsets would be formed that would lead to coverage of a higher number of unique flights per IPDCH-iteration. Though this may reduce the total number of IPDCH-iterations required to generate the desired IFS, the overall run-time of the IPDCH may increase drastically. The rationale is that with a bigger flight subset in each IPDCH-iteration, the number of possible legal pairings would increase drastically, leading to huge run-time for their generation as well as for the subsequent MIP-search.
  The above considerations suggest that $K$ should be reasonably-sized. Considering the given computational resources and the results of initial exploration around the possible number of pairings for differently-sized flight sets, the value of $K$ in each IPDCH-iteration is guided by a random integer between one-eighth and one-fourth of the size of the input flight set $\mathcal{F}$. It may be noted that this setting of $K$ has been selected considering the scale and complexity of the given test cases, and it needs to be re-visited if the scale and complexity of the flight network changes drastically.


• CG-driven LPP-solutioning: The parameters involved in the termination criterion for this submodule, $Th_{cost}$ & $Th_{t}$, are set as 100 USD & 10 iterations respectively, to achieve an LPP solution with a sufficiently good cost in a reasonably good time. The sensitivity of AirCROP's performance to these parameters is discussed in Section 4.3.4. Furthermore, the effect of another parameter, the size of $\mathcal{P}_{CG}^{t}$, on the performance of this submodule (the final LPP solution's cost and required run-time), and the demand on the computational resources (dominantly, RAM) is highlighted below.
  – For a relatively small-sized $\mathcal{P}_{CG}^{t}$, the alternative pairings available to foster further cost improvement shall be quite limited, amounting to smaller cost benefits in each phase of the CG-driven LPP-solutioning. This would necessitate far more LPP-IPP interactions to reach the near-optimal cost. This per se is not a challenge; however, a significant amount of additional run-time may be required, since: (a) each call for CG-driven LPP-solutioning demands a minimum of 10 LPP iterations before it could be terminated, and (b) such calls, when invoked repeatedly, may consume significant run-time, yet without reasonable cost benefit.
  – On the other hand, for a very large-sized $\mathcal{P}_{CG}^{t}$, though the potential for significant cost benefits may exist, the demand on the RAM may become overwhelming for any CG-driven LPP-solutioning phase to proceed.
  The above considerations suggest that the size of $\mathcal{P}_{CG}^{t}$ should be neither too small nor too large. Factoring these, the experiments here aim at a $\mathcal{P}_{CG}^{t}$ sized at approximately a million pairings (significant size, yet not overwhelming for 96 GB RAM). Furthermore, for a search that is not biased in favor of any particular CG strategy, the number of pairings contributed by each CG strategy towards the overall CG heuristic is kept equal.
• IPP-solutioning: As mentioned before, the MIP-search on a large-scale IPP is time-intensive. Hence, the termination parameter $Th_{ipt}$, which restricts the run-time of any IPP-solutioning phase if it has not self-terminated a priori, is reasonably set as 20 minutes, and the sensitivity of AirCROP's performance to it is discussed in Section 4.3.4.
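The parameter settings listed above can be collected in a small Python sketch such as the one below. The names `EngineSettings`, `sample_k` and `pairings_per_strategy` are hypothetical, introduced only for illustration; the floor-rounding of the one-eighth and one-fourth bounds and the exact equal split of the million-pairing target across the four CG strategies are likewise assumptions, since the text only states these settings approximately.

```python
import random
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class EngineSettings:
    th_cost_usd: float = 100.0      # Th_cost: cost-improvement threshold
    th_t_iterations: int = 10       # Th_t: number of successive LPP iterations
    th_ipt_minutes: int = 20        # Th_ipt: time limit of an IPP-solutioning phase
    cg_pool_size: int = 1_000_000   # approximate target size of the fresh pairing set
    cg_strategies: Tuple[str, ...] = ("CGD", "CGU", "CGR", "CGA")


def sample_k(num_flights: int, rng: random.Random) -> int:
    """Draw K as a random integer between 1/8 and 1/4 of the flight-set size."""
    lower = max(1, num_flights // 8)        # assumption: bounds are rounded down
    upper = max(lower, num_flights // 4)
    return rng.randint(lower, upper)        # both end points are inclusive


def pairings_per_strategy(settings: EngineSettings) -> Dict[str, int]:
    """Split the pairing budget equally so that no CG strategy is favored."""
    share = settings.cg_pool_size // len(settings.cg_strategies)
    return {name: share for name in settings.cg_strategies}
```

For a flight set of 4212 flights (the size of TC-5), `sample_k` would draw $K$ from the range 526 to 1053 under these rounding assumptions.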

This section presents the experimental results and associated inferences, in the order highlighted below.
1. The performance of the proposed AirCROP on the given test cases with the aforementioned parameter settings is discussed.
2. The phenomenon referred to as performance variability (Lodi & Tramontani, 2013) is discussed in the context of AirCROP. This aspect is pertinent since some variability in performance (even for the same random seed) is inevitable owing to AirCROP's reliance on the mathematical programming solvers, which over the different runs may pick different permutations of the rows (flight-coverage) or columns (pairings).
3. The impact of the initialization methods: (a) the proposed IPDCH, (b) an Enhanced-DFS heuristic, earlier proposed by the authors (Aggarwal et al., 2018), and (c) a commonly adopted Artificial Pairings method (Hoffman & Padberg, 1993; Vance et al., 1997), on the final performance of AirCROP is investigated.
4. The sensitivity of AirCROP's performance to the termination parameters in the Optimization Engine's submodules (CG-driven LPP-solutioning and IPP-solutioning) is discussed.

The results of the AirCROP-runs on the given test cases (TC-1 to TC-5) with the aforementioned parameter settings are reported in Table 3. In that, for each test case:
• the first row marked by "$\mathcal{P}_{IFS}$" highlights the cost associated with the IFS that initializes the AirCROP-run and the run-time consumed in its generation.
• the subsequent rows present the results of the LPP-IPP interactions (marked by the counter $T$). In that, for a particular $T$, the cost of the LP-solution passed on for its integerization and the associated time are highlighted. Also, the cost of the IP-solution returned and the associated time are highlighted. Here, the unit of cost is USD, and the time corresponds to the HH:MM format.
• the final crew pairing solution ($\mathcal{P}_{IP}^{\star}$) is highlighted in the last row (emboldened) marked by "Final Solution".


Table 3
AirCROP's performance* on the given test cases

    LPP-IPP interaction $T$,                  TC-1             TC-2             TC-3             TC-4             TC-5
    $\mathcal{P}_{LP}^{T}$ / $\mathcal{P}_{IP}^{T}$    Cost     Time    Cost     Time    Cost     Time    Cost     Time    Cost     Time
    [entries of the $\mathcal{P}_{IFS}$ row and of the per-interaction $\mathcal{P}_{LP}$ / $\mathcal{P}_{IP}$ rows are missing]
    Final Solution                            3473238  10:05   3497106  08:46   3490420  12:55   3604753  09:24   4595613  22:52

* All values in the "Cost" columns are in USD, and all corresponding real values are rounded-off to the next integer values. All values in the "Time" columns are in HH:MM format, and all corresponding seconds' values are rounded-off to the next minute values.

It may be noted that the experimental results in the subsequent sections are presented in the same format, unless any digression is specifically highlighted.

The above results have been tested by the research consortium's industrial sponsor, and verified to be highly competitive compared to the best practice solutions known, for different test cases. In general, the obtained solutions have been found to be superior by about 1.5 to 3.0% in terms of the hard cost, which reportedly is one of the most important solution quality indicators. For reference, a comparison of the obtained solution vis-à-vis the best known solution has been drawn for TC-5, in Table 4, where a significant difference in terms of the size of pairings can be observed. Notably, the key features contributing to lower hard cost relate to the presence of pairings with relatively lower TAFB, overnight rests and meal cost. However, the obtained solution also entails more crew changes, some of which (involving aircraft change) negatively impact the soft cost. Hence, there appears to be a trade-off between the hard cost and the soft cost.

Table 4
Salient features of $\mathcal{P}_{IP}^{\star}$ for TC-5: AirCROP's solution vis-à-vis the best practice solution

    Features                               AirCROP's solution   Best practice solution
    Number of pairings                     926                  783
    Number of unique flights covered       4,212                4,212
    Number of deadhead flights             3                    3
    Number of overnight-rests              1,203                1,279
    Number of crew changes                 1,002                825
    Average crew changes per pairing       1.082                1.054
    Total TAFB (HH:MM)                     37444:54             38189:39
    Number of pairings covering 2 flights  303                  205
    Number of pairings covering 3 flights  17                   31
    Number of pairings covering 4 flights  170                  95
    Number of pairings covering 5 flights  63                   37
    Number of pairings covering 6 flights  202                  153
    Number of pairings covering 7 flights  59                   62
    Number of pairings covering 8 flights  83                   90
    Number of pairings covering 9 flights  19                   49
    Number of pairings covering 10 flights 8                    45
    Number of pairings covering 11 flights 1                    10
    Number of pairings covering 12 flights 1                    5
    Number of pairings covering 13 flights 0                    0
    Number of pairings covering 14 flights 0                    1
    Hotel cost (USD)                       166,240              176,170
    Meal cost (USD)                        157,269              160,397
    Hard cost (USD)                        340,671              350,818
    Soft cost (USD)                        51,600               42,750
    Actual flying cost (USD)               4,203,342            4,203,342
    Total cost (USD)                       4,595,613            4,596,910

This section investigates the sensitivity of AirCROP with respect to the sources of variability over multiple runs, even for the same problem. This study assumes importance, considering that performance variability is rather inevitable when mathematical programming based solution approaches are employed (Koch et al., 2011). As cited by Lodi & Tramontani (2013), variability in the performance of LP & MIP solvers may be observed on changing the computing platform (which may change the floating-point arithmetic), permuting the constraints/variables of the respective mathematical models, or changing the pseudo-random numbers' seed. These changes/permutations may lead to an entirely different outcome of the respective search algorithms (LP & MIP), as highlighted below.
• The root source for the performance variability in MIP is the imperfect tie-breaking. A majority of the decisions to be taken during an MIP-search are dependent on the ordering of the candidates according to an interim score as well as the selection of the best candidate (one with the best score value). A perfect score that could fully distinguish between the candidates is not known, mostly due to the lack of theoretical knowledge, and even if it is known, it may be too expensive to compute. Furthermore, additional ties or tiebreaks could be induced by changing the floating-point operations, which inherently may change when the computing platform is changed. Amidst such imperfect tie-breaking, the permutation of the variables/constraints changes the path within the MIP-search tree, leading to a completely different evolution of the algorithm with rather severe consequences.
• Depending upon the floating-point arithmetic or the sequence of variables loaded in an LPP, the performance of the simplex and interior-point methods may vary.
• The performance of the LP and MIP solvers is also affected by the choice of pseudo-random numbers' seed, wherever the decisions are made heuristically. For instance, an interior-point method in the LP solvers performs a (random) crossover to one of the vertices of the optimal face when the search reaches its (unique) center.
In the above background, the plausible reasons for variability in AirCROP's performance are elaborated below.

• Generation of new legal pairings using a parallel architecture: in any LPP iteration $t$, new legal pairings are generated in parallel, by allocating the sub-processes to the idle cores of the CPU. These sub-processes return their respective pairing sets as soon as they are terminated. This by itself is not a challenge; however, when AirCROP is re-run, the order in which these sub-processes terminate may not be the same as before (as it depends on the state of the CPU), permuting the pairings in the cumulative pairing set $\mathcal{P}_{CG}^{t}$. This permuted pairing set, when fed as part of the input to the LP solver in the next LPP iteration, may lead to a different LPP solution, leading to a different outcome of the subsequent AirCROP search. To curb this, the pairings in the set passed to the LP solver are sorted in lexicographical order of their representative strings. These strings are constructed from the indices of the flights covered in the corresponding pairings. For instance, the string corresponding to a pairing that covers four flights is formed from the four flight indices joined by underscores. Given that the pairings are distinct, the resulting strings are distinct too, allowing for a crisp sorting criterion and ensuring a fixed pairing sequence in each AirCROP-run.

Footnote (on the cost of computing a perfect score): For instance, in a strong branching scheme, the best variable to branch at each node is decided after simulating one level of branching for each fractional variable; however, it is performed heuristically to make it a computationally affordable task for MIP solvers (Linderoth & Lodi, 2011).

Table 5
Performance variability assessment for AirCROP on two test instances* (TC-2 and TC-5)

                                  Runs with performance variability    Runs without performance variability
    Test Case   Row               Run-1            Run-2               Run (Seed-$\alpha$)   Run (Seed-$\beta$)   Run (Seed-$\gamma$)
                                  Cost     Time    Cost     Time       Cost     Time         Cost     Time        Cost     Time
    TC-2        [entries of the $\mathcal{P}_{IFS}$ row and of the per-interaction $\mathcal{P}_{LP}$ / $\mathcal{P}_{IP}$ rows are missing]
    TC-2        Final Solution    3497106  08:46   3498588  12:05      3502118  11:05        3500504  09:38       3499063  12:27
    TC-5        [entries of the $\mathcal{P}_{IFS}$ row and of the per-interaction $\mathcal{P}_{LP}$ / $\mathcal{P}_{IP}$ rows are missing]
    TC-5        Final Solution    4595613  22:52   4598412  23:37      4594146  22:27        4591176  22:59       4591065  26:32

* All values in the "Cost" columns are in USD, and all the corresponding real values are rounded-off to the next integer values. All values in the "Time" columns are in HH:MM, and all the corresponding seconds' values are rounded-off to the next minute values.
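The lexicographic ordering used above to fix the pairing sequence can be sketched in Python as follows. This is a minimal illustration, assuming that a pairing is represented as a tuple of flight indices and that its representative string joins those indices with underscores; the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Pairing = Tuple[int, ...]  # assumption: a pairing is the tuple of flight indices it covers


def pairing_key(pairing: Pairing) -> str:
    """Build the representative string of a pairing from its flight indices."""
    return "_".join(str(index) for index in pairing)


def canonical_order(pairings: Iterable[Pairing]) -> List[Pairing]:
    """Sort pairings by their representative strings (plain string comparison).

    Distinct pairings give distinct strings, so the result does not depend on
    the order in which the parallel sub-processes returned their pairings.
    """
    return sorted(pairings, key=pairing_key)
```

Note that this is a string ordering rather than a numeric one, which is sufficient here because the aim is only a reproducible sequence, not a meaningful ranking.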

• Numerical seed for generation of pseudo-random numbers: variability may also be introduced if the numerical seed employed to generate pseudo-random numbers for use in the proposed modules or the utilized LP & MIP solvers varies. For instance, use of the default seed method of Python (i.e., the current time of the computing system, or an OS-provided randomness source where available) across different AirCROP runs may lead to different pseudo-random numbers each time. This in turn would trigger variability in the IFS generated by IPDCH (since the random selection of flights in each of its iterations is impacted), and the pairing set resulting from the CG heuristic (since each of the underlying CG strategies is impacted). Such variability could be negated by use of a fixed numerical seed, instead of a time-dependent one.

The intriguing questions for researchers could relate to the impact that the presence or absence of causes of variability may have on the quality of AirCROP's solutions, in terms of both cost and run-time. Table 5 attempts to shed light on these questions through empirical evidence for two test cases involving 3228 flights (TC-2) and 4212 flights (TC-5), respectively. In each of these test cases, the effect of variability is revealed through:
• two independent runs (Run-1 and Run-2), in each of which the causes of variability exist, that is: (a) the permutation of pairings generated using the parallel architecture is possible, and (b) the default seed method of Python, based on the time of the computing system, applies.
• three independent runs, in each of which the causes of variability have been eliminated, that is: (a) the lexicographical order of the pairings is imposed, and (b) a fixed numerical seed has been fed for random number generation. For these runs, the numerical seeds are given by $\alpha = 0$, $\beta = 1$, and $\gamma = 2$, respectively.
The key observations and inferences that could be drawn from each test case in Table 5 are highlighted below.
• understandably, Run-1 and Run-2 (corresponding to the same numerical seed) yield different cost solutions over different run-times. Importantly, the variation in cost (despite the presence of causes of variability) is not alarming, though significantly different run-times may be required.
• each run (corresponding to Seed-$\alpha$, Seed-$\beta$, and Seed-$\gamma$, respectively) where the causes of variability have been negated, if repeated, yields the same cost solution in the same run-time, though this has not been shown in the table for paucity of space.
• the runs corresponding to the numerical seeds given by $\alpha$, $\beta$, and $\gamma$, respectively, differ solely due to the difference in the corresponding random numbers generated, and subsequently utilized. It can be observed that the change in numerical seed does not significantly affect the cost quality of the final AirCROP solution, though the associated run-time may vary significantly.
The fact that AirCROP can offer final solutions with comparable cost quality, regardless of the presence or absence of causes of variability, endorses the robustness of the constitutive modules of AirCROP. Also, the variation in run-time could be attributed to different search trajectories corresponding to different permutations of variables or different random numbers. It may be noted that for the subsequent runs the lexicographical order of the pairings and a fixed numerical seed (Seed-$\alpha = 0$) have been utilized.

This section investigates the sensitivity of AirCROP with respect to the cost quality of the initial solution and the run-time spent to obtain it. Towards it, the initial solution is obtained using three different methods (offering three input alternatives with varying cost and run-time), and the cost quality of AirCROP's final solution alongside the necessary run-time is noted.

Notably, in an initial attempt to generate IFS for large-scale CPOPs, the authors proposed a DFS algorithm based heuristic, namely, the Enhanced-DFS heuristic (Aggarwal et al., 2018). Its performance across the five test cases has been highlighted in Table 6. In that, TC-1 emerges as an outlier owing to an alarmingly high run-time, when compared to all other test cases. A plausible explanation behind this aberration is that TC-1 involves some flights with very few legal flight connections, and a DFS based algorithm may have to exhaustively explore several flight connections to be able to generate an IFS with full flight coverage. The need to do away with reliance on DFS so as to have equable run-time across different data sets explains the motivation for:
• proposition of IPDCH in this paper, which as highlighted in Section 3.2, relies on: (a) a divide-and-cover strategy to decompose the input flight schedule into sufficiently-small flight subsets, and (b) IP to find a lowest-cost pairing set that covers the maximum-possible flights for each of the decomposed flight subsets.

• consideration of a commonly adopted Artificial Pairings method (Vance et al., 1997), that constructs a pairing set which covers all the flights, though some/all the pairings may not be legal. Hence, for this method the initial solution would be referred to as $\mathcal{P}_{IS}$ instead of $\mathcal{P}_{IFS}$.

Table 6
Performance of Enhanced-DFS heuristic (Aggarwal et al., 2018) for IFS generation. Here, the real valued "Cost" is rounded-off to the next integer value, and the seconds' in the "Time" column are rounded-off to the next minute values.

    Test Cases   Time (HH:MM)   Cost (USD)
    [time and cost entries for the test cases are missing]

Table 7
Performance assessment of AirCROP on TC-1 and TC-5 when initialized using the proposed IPDCH, the Artificial Pairings method, and the Enhanced-DFS heuristic.

    LPP-IPP interaction $T$,                  TC-1                                                     TC-5
    $\mathcal{P}_{LP}^{T}$ / $\mathcal{P}_{IP}^{T}$    Enhanced-DFS     IPDCH            Artificial Pairings    Enhanced-DFS     IPDCH            Artificial Pairings
                                              Cost     Time    Cost     Time    Cost     Time          Cost     Time    Cost     Time    Cost     Time
    [entries of the $\mathcal{P}_{IFS}$ / $\mathcal{P}_{IS}$ row and of the per-interaction $\mathcal{P}_{LP}$ / $\mathcal{P}_{IP}$ rows are missing]
    Final Solution                            3469276  13:22   3469950  12:18   3470355  12:00         4592860  23:13   4594146  22:27   4597929  23:57

* All values in the "Cost" columns are in USD, where the real values are rounded-off to the next integer values. All values in the "Time" columns are in HH:MM, where the seconds' values are rounded-off to the next minute values.

A comparison of the above three methods has been drawn in Table 7, for TC-1 (posing a challenge to Enhanced-DFS) and TC-5 (largest flight set). In that, besides the cost and run-time of the initial solution for each test case, the results of all the iterations of AirCROP leading up to the final solution have been presented. The latter is done to shed light on whether the cost quality of AirCROP's final solution strongly depends on the cost of the initial solution. The prominent observations from Table 7 include:
• In terms of run-time: IPDCH could outperform the Enhanced-DFS, as its run-time happened to be less than ten minutes in both the test cases. The Artificial Pairings method even outperforms IPDCH, since its run-time happened to be in milliseconds (formatted to 0 minutes in the table).
• In terms of initial cost: IPDCH could again outperform the Enhanced-DFS. This could be attributed to the use of IP to find a lowest-cost pairing set that covers the maximum-possible flights for each of the decomposed flight subsets. In contrast, the cost associated with the Artificial Pairings method is the worst. This is owing to a very high pseudo-cost attached to the pairings to offset their non-legality.
Critically, regardless of the significantly varying run-time and the initial cost associated with the three methods, the variation in the cost of the final solution offered by AirCROP is not significant. This endorses the robustness of its constitutive modules.
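For orientation, the idea behind an Artificial Pairings initialization can be sketched as below. This is a hedged illustration rather than the construction used in the experiments: it assumes one single-flight artificial pairing per flight, and the pseudo-cost value is an arbitrary placeholder; the text only states that the artificial pairings cover all flights and carry a very high pseudo-cost.

```python
from typing import Iterable, List, Tuple

Pairing = Tuple[int, ...]  # assumption: a pairing is the tuple of flight indices it covers


def artificial_initial_solution(flight_ids: Iterable[int],
                                pseudo_cost: float = 1e6) -> List[Tuple[Pairing, float]]:
    """Cover every flight with its own artificial (possibly illegal) pairing.

    The large pseudo-cost (placeholder value) discourages the optimizer from
    keeping these pairings once legal, cheaper alternatives are generated.
    """
    return [((flight_id,), pseudo_cost) for flight_id in flight_ids]
```

Such a construction needs no search at all, which is consistent with the near-zero run-time and the very high initial cost reported for this method.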

This section investigates the sensitivity of AirCROP to the termination parameter settings of the Optimization Engine's submodules, namely, LPP-solutioning and IPP-solutioning. The parameters involved in LPP-solutioning are $Th_{cost}$ and $Th_{t}$, while $Th_{ipt}$ is involved in IPP-solutioning. To assess their impact on AirCROP's performance, experiments are performed with three different sets of parameter settings each, for both the submodules.

Impact of Termination Settings of CG-driven LPP-solutioning:

As mentioned earlier, the CG-driven LPP-solutioning is terminated if the cost-improvement per LPP iteration falls below the pre-specified threshold $Th_{cost}$ (in USD) over $Th_{t}$ number of successive LPP iterations. To achieve a reasonable balance between AirCROP's run time on the one hand and the cost reduction of the crew pairing solution on the other hand, three different sets of parameter settings are chosen, and experimented with. These settings of $\{Th_{cost}, Th_{t}\}$, with $Th_{cost}$ equal to 500, 100, and 50 USD (the middle one being the base setting $\{100, 10\}$), symbolize relaxed, moderate and strict settings, respectively, since the termination criterion gets more and more difficult to meet as the settings change from relaxed to strict. The results of the AirCROP-runs corresponding to these termination settings are reported in Table 8, and the key observations are as highlighted below.
• As the termination settings transition through relaxed, moderate and strict settings, the run-time to obtain the final solution increases, while the cost of the final solution decreases. An apparent exception to this trend is observed in TC-5 with the strict setting, but this could be explained by the fact that the upper limit of 30 hours set for AirCROP's run time under practical considerations was exceeded during the fourth LPP-IPP interaction ($T = 4$). It implies that due to the enforced termination in this particular case, AirCROP could not fully utilize the potential for cost reduction.
• Despite the variation in the termination settings, the cost quality of AirCROP's final solution does not vary as drastically as its run time. For instance, as the settings switched from relaxed to moderate, an additional saving of 6384 USD could be achieved at the expense of an additional 5:20 of run time in the case of TC-2, while these indicators stand at 13388 USD and 10:25, respectively, in the case of TC-5. It can also be inferred that $\{Th_{cost}, Th_{t}\}$ set as $\{100, 10\}$ possibly offers a fair balance between the solution's cost quality and run time, and this explains why these settings have been used as the base settings for the experimental results presented in this paper, beginning with Table 3 and ending with Table 9.
It is important to recognize that as the termination settings for LPP-solutioning are made stricter, its run time is bound to increase. It is also fair to expect that the cost quality of the final solution may be better, though it cannot be guaranteed. Any such departures from the expected trend may be due to the dependence of the quality of the final solution on the quality of the IPP-solution for each $T$. In that, if an IPP-solution for a particular $T$ largely fails to approach the lower bound set by the corresponding LPP-solution, it may negatively influence the cost quality obtained in subsequent LPP- and IPP-solutioning phases. While such a possibility remains, it did not surface in the experiments above.

Impact of Termination Settings of IPP-solutioning:

Table 8
Performance assessment of AirCROP on TC-2 and TC-5, against three different termination settings (Relaxed, Moderate and Strict Settings, each specified by $Th_{cost}$ and $Th_t$) of the CG-driven LPP-solutioning*

| Test case | Setting | Final solution cost (USD) | Final solution time (HH:MM) |
|---|---|---|---|
| TC-2 | Relaxed | 3508502 | 05:45 |
| TC-2 | Moderate | 3502118 | 11:05 |
| TC-2 | Strict | 3496498 | 23:28 |
| TC-5 | Relaxed | 4607534 | 12:02 |
| TC-5 | Moderate | 4594146 | 22:27 |
| TC-5 | Strict | 4624747 | 30:17 |

* All values in the "Cost" columns are in USD, and all the corresponding real values are rounded off to the next integer values. All values in the "Time" columns are in HH:MM, and all the corresponding seconds' values are rounded off to the next minute values.

As mentioned before, integerization of an LPP solution using an MIP solver is extremely time-consuming, particularly for large-scale CPOPs, and more so for those involving complex flight networks. Hence, from a practical perspective, the AirCROP framework imposes a threshold on the upper time limit for IPP-solutioning (for any given $T$), namely $Th_{ipt}$, in case it does not self-terminate earlier. To investigate the impact of $Th_{ipt}$ on AirCROP's performance, experiments are performed with three different settings: 00:20 (one-third of an hour), 00:40 (two-thirds of an hour), and 01:00 (an hour). The results are presented in Table 9, and the key observations are as follows. In the case of TC-2, as $Th_{ipt}$ is raised, the run-time to obtain the final solution increases, while the cost of the final solution decreases. However, there are exceptions to this trend in the case of TC-5. Notably, the cost quality of the final solution corresponding to $Th_{ipt}$ = […] $Th_{ipt}$ = […] $T$ = 8 turned worse compared to the case of $Th_{ipt}$ = […] $T$ = 9). The worsening of the LPP-solution could be attributed to the fact that LPP-solutioning relies on random-number-based heuristics, and the resulting pairing combinations may not necessarily offer lower cost within the pre-specified termination settings.

Based on the above, it may be inferred that despite the changes in the termination parameter settings, AirCROP is able to offer solutions with reasonably close cost quality, though significant variations in run-time may be observed. It is also evident that even the lowest setting (desired from a practical perspective) for $Th_{ipt}$ = […]
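The effect of capping IPP-solutioning at $Th_{ipt}$ can be illustrated with a minimal sketch. It is not AirCROP's implementation: the framework relies on an MIP solver, whereas the toy depth-first branch-and-bound below is swapped in so that the example is self-contained. It also assumes a set-covering view of the CPOP (every flight covered by at least one selected pairing). The function name `solve_cover_with_time_cap`, the data layout, and the four-flight instance are hypothetical. The only point shown is the control flow: the search keeps the best integer solution found so far and returns it when either the search space is exhausted (self-termination) or the time cap expires.

```python
import time
from typing import FrozenSet, List, Optional, Sequence, Tuple

Pairing = Tuple[float, FrozenSet[str]]  # (cost, flights covered); hypothetical layout


def solve_cover_with_time_cap(
    flights: FrozenSet[str],
    pairings: Sequence[Pairing],
    time_cap_s: float,
) -> Tuple[Optional[List[int]], float, bool]:
    """Toy branch-and-bound for set covering with an upper time limit.

    Returns (indices of selected pairings or None, their total cost, timed_out).
    """
    deadline = time.monotonic() + time_cap_s
    best_cost = float("inf")
    best_selection: Optional[List[int]] = None
    timed_out = False

    def branch(uncovered: FrozenSet[str], chosen: List[int], cost: float) -> None:
        nonlocal best_cost, best_selection, timed_out
        if timed_out or cost >= best_cost:
            return  # stop after a timeout, or prune a branch that cannot improve
        if time.monotonic() >= deadline:
            timed_out = True  # cap reached: keep the incumbent found so far
            return
        if not uncovered:
            best_cost, best_selection = cost, list(chosen)
            return
        # Branch on the uncovered flight that has the fewest covering pairings
        # (sorted first so that ties are broken deterministically).
        target = min(
            sorted(uncovered),
            key=lambda f: sum(f in legs for _, legs in pairings),
        )
        for j, (pairing_cost, legs) in enumerate(pairings):
            if target in legs:
                chosen.append(j)
                branch(uncovered - legs, chosen, cost + pairing_cost)
                chosen.pop()

    branch(frozenset(flights), [], 0.0)
    return best_selection, best_cost, timed_out


# Hypothetical four-flight instance, for illustration only.
FLIGHTS = frozenset({"F1", "F2", "F3", "F4"})
PAIRINGS = [
    (10.0, frozenset({"F1", "F2"})),
    (10.0, frozenset({"F3", "F4"})),
    (15.0, frozenset({"F1", "F2", "F3"})),
    (9.0, frozenset({"F4"})),
    (30.0, frozenset({"F1", "F2", "F3", "F4"})),
]

# A 20-minute cap, expressed in seconds; this tiny instance self-terminates first.
selection, cost, timed_out = solve_cover_with_time_cap(FLIGHTS, PAIRINGS, 20 * 60)
print(selection, cost, timed_out)  # [0, 1] 20.0 False
```

With a cap of 0 seconds the same call returns no solution and `timed_out = True`. This mirrors the trade-off discussed above: a tighter cap shortens the run, but the incumbent returned at the cap may be costlier than the one a longer search would find.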

Table 9
Performance assessment of AirCROP on TC-2 and TC-5, against three different termination settings ($Th_{ipt}$ = 00:20, 00:40 and 01:00) of the IPP-solutioning*

| Test case | $Th_{ipt}$ (HH:MM) | Final solution cost (USD) | Final solution time (HH:MM) |
|---|---|---|---|
| TC-2 | 00:20 | 3502118 | 11:05 |
| TC-2 | 00:40 | 3501809 | 12:24 |
| TC-2 | 01:00 | 3499609 | 19:22 |
| TC-5 | 00:20 | 4594146 | 22:27 |
| TC-5 | 00:40 | 4595703 | 27:55 |
| TC-5 | 01:00 | 4596929 | 30:35 |

* All values in the "Cost" columns are in USD, and all the corresponding real values are rounded off to the next integer values. All values in the "Time" columns are in HH:MM, and all the corresponding seconds' values are rounded off to the next minute values.

5. Conclusion and Future Research

For an airline, crew operating cost is the second-largest expense, after the fuel cost, making crew pairing optimization critical for business viability. Over the last three decades, CPOP has received unprecedented attention from the OR community, as a result of which numerous CPOP solution approaches have been proposed. Yet, the emergent flight networks with conjunct scale and complexity largely remain unaddressed in the available literature. Such a scenario is all the more alarming, considering that air traffic is expected to double over the next 20 years, wherein most airlines may need to cater to multiple crew bases and multiple hub-and-spoke subnetworks. This research has proposed an Airline Crew Pairing Optimization Framework (AirCROP) based on domain-knowledge driven CG strategies for efficiently tackling real-world, large-scale and complex flight networks. This paper has presented not just the design of AirCROP's constitutive modules, but has also shared insights on how these modules interact and how sensitive AirCROP's performance is to the sources of variability, the choice of different methods, and the parameter settings.

Given a CPOP, AirCROP first preprocesses the entire duty overnight-connection network via its Legal Crew Pairing Generator (this module is utilized again to facilitate legal crew pairings when required in real-time in other modules of AirCROP). Subsequently, AirCROP is initialized using an IFS generated by the proposed method (IPDCH). Next, AirCROP's Optimization Engine attempts to find a good-quality CPOP solution via intermittent interactions of its submodules, namely, CG-driven LPP-solutioning and IPP-solutioning. The efficacy of AirCROP has been demonstrated on a real-world airline flight network characterized by an unprecedented (in reference to available literature) conjunct scale-and-complexity, marked by over 4200 flights, 15 crew bases, multiple hub-and-spoke subnetworks, and billion-plus pairings. The distinctive contribution of this paper is also embedded in its empirical investigation of critically important questions relating to variability and sensitivity, which the literature is otherwise silent on. In that:

• first, the sensitivity analysis of AirCROP is performed in the presence and absence of sources of variability. It is empirically highlighted that AirCROP is capable of offering comparable cost solutions, both in the presence and in the absence of the sources of variability. This endorses the robustness of its constitutive modules.

• second, the sensitivity of AirCROP with respect to the cost quality of the initial solution and the associated run-time is investigated vis-à-vis three different initialization methods. Again, the robustness of AirCROP is endorsed, considering that it is found to be capable of offering similar cost solutions, despite the significantly varying cost and run-time of the initial solutions.

• last, the sensitivity of AirCROP to the termination parameter settings associated with the Optimization Engine's submodules is investigated. The fact that, with the variation in termination settings of both LPP-solutioning and IPP-solutioning (independent of each other), AirCROP's performance strongly aligns with the logically expected trends is a testimony to the robustness of its constitutive modules.

Notably, AirCROP has been implemented using the Python scripting language, aligned with the industrial sponsor's preferences. However, a significant reduction in run-time could be achieved by the use of compiled programming languages such as C++, Java, etc. Moreover, employing the domain-knowledge driven CG strategies during the IPP-solutioning phase too may augment the overall cost- and time-efficiency of AirCROP. Furthermore, the emerging trend of utilizing Machine Learning capabilities for assisting combinatorial optimization tasks may also hold promise for airline crew pairing optimization, towards which an exploratory attempt has been made by the authors (Aggarwal, Singh, & Saxena, 2020). Despite the scope for improvement, the authors hope that with the emergent trend of evolving scale and complexity of airline flight networks, this paper shall serve as an important milestone for the affiliated research and applications.
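As a rough numerical check on the sensitivity findings summarized above, the following sketch recomputes, from the final-solution rows of Tables 8 and 9, how far the cost and the run-time move across the three termination settings of each experiment. The helper names (`to_minutes`, `spread_pct`) are illustrative and not part of AirCROP; the numbers are copied from the two tables.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, str]  # (final-solution cost in USD, run-time as "HH:MM")

# Final-solution rows, in the column order of the tables.
TABLE_8: Dict[str, List[Row]] = {  # LPP-solutioning: Relaxed, Moderate, Strict
    "TC-2": [(3508502, "05:45"), (3502118, "11:05"), (3496498, "23:28")],
    "TC-5": [(4607534, "12:02"), (4594146, "22:27"), (4624747, "30:17")],
}
TABLE_9: Dict[str, List[Row]] = {  # IPP-solutioning: three Th_ipt settings
    "TC-2": [(3502118, "11:05"), (3501809, "12:24"), (3499609, "19:22")],
    "TC-5": [(4594146, "22:27"), (4595703, "27:55"), (4596929, "30:35")],
}


def to_minutes(hhmm: str) -> int:
    """Convert an 'HH:MM' run-time into minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def spread_pct(values: Sequence[float]) -> float:
    """Gap between the largest and smallest value, in percent of the smallest."""
    low, high = min(values), max(values)
    return 100.0 * (high - low) / low


for name, table in (("Table 8", TABLE_8), ("Table 9", TABLE_9)):
    for test_case, rows in table.items():
        cost_spread = spread_pct([cost for cost, _ in rows])
        time_spread = spread_pct([to_minutes(t) for _, t in rows])
        print(f"{name} {test_case}: cost {cost_spread:.2f}%, run-time {time_spread:.1f}%")
```

Under these figures, the cost of the final solution varies by less than 1% across settings in every experiment, while the run-time varies by roughly 36% to 308%.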

Acknowledgment

This research work is a part of an Indo-Dutch joint research project, supported by the Ministry of Electronics and Information Technology (MEITY), India [grant number 13(4)/2015-CC&BT]; Netherlands Organization for Scientific Research (NWO), the Netherlands; and General Electric (GE) Aviation, India. The authors thank GE Aviation, particularly Saaju Paulose (Senior Manager), Arioli Arumugam (Senior Director - Data & Analytics), and Alla Rajesh (Senior Staff Data & Analytics Scientist), for providing real-world test cases and sharing their domain knowledge, which has helped the authors significantly in successfully completing this research work.

References

Achterberg, T., & Wunderling, R. (2013). Mixed integer programming: Analyzing 12 years of progress. In Facets of combinatorial optimization (pp. 449–481). Springer.

Aggarwal, D., Saxena, D. K., Bäck, T., & Emmerich, M. (2020a). A Novel Column Generation Heuristic for Airline Crew Pairing Optimization with Large-scale Complex Flight Networks. arXiv preprint arXiv:2005.08636. Retrieved from https://arxiv.org/abs/2005.08636v3

Aggarwal, D., Saxena, D. K., Bäck, T., & Emmerich, M. (2020b). Real-World Airline Crew Pairing Optimization: Customized Genetic Algorithm versus Column Generation Method. arXiv preprint arXiv:2003.03792. Retrieved from http://arxiv.org/abs/2003.03792

Aggarwal, D., Saxena, D. K., Emmerich, M., & Paulose, S. (2018, November). On large-scale airline crew pairing generation. In (pp. 593–600).

Aggarwal, D., Singh, Y. K., & Saxena, D. K. (2020). On Learning Combinatorial Patterns to Assist Large-Scale Airline Crew Pairing Optimization. arXiv preprint arXiv:2004.13714. Retrieved from https://arxiv.org/abs/2004.13714v3

Anbil, R., Forrest, J. J., & Pulleyblank, W. R. (1998). Column generation and the airline crew pairing problem. Documenta Mathematica, (1), 677.

Anbil, R., Gelman, E., Patty, B., & Tanga, R. (1991). Recent advances in crew-pairing optimization at American Airlines. Interfaces, (1), 62–74.

Anbil, R., Tanga, R., & Johnson, E. L. (1992). A global approach to crew-pairing optimization. IBM Systems Journal, (1), 71–78.

Andersen, E. D., & Andersen, K. D. (2000). The MOSEK interior point optimizer for linear programming: An implementation of the homogeneous algorithm. In High performance optimization (pp. 197–232). Springer.

Barnhart, C., Cohn, A. M., Johnson, E. L., Klabjan, D., Nemhauser, G. L., & Vance, P. H. (2003). Airline crew scheduling. In Handbook of transportation science (pp. 517–560). Springer.

Barnhart, C., Johnson, E. L., Nemhauser, G. L., Savelsbergh, M. W., & Vance, P. H. (1998). Branch-and-price: Column generation for solving huge integer programs. Operations Research, (3), 316–329.

Beasley, J. E., & Chu, P. C. (1996). A genetic algorithm for the set covering problem. European Journal of Operational Research, (2), 392–404.

Bertsimas, D., & Tsitsiklis, J. N. (1997). Introduction to linear optimization (Vol. 6). Athena Scientific, Belmont, MA.

Desaulniers, G., Desrosiers, J., Dumas, Y., Marc, S., Rioux, B., Solomon, M. M., & Soumis, F. (1997). Crew pairing at Air France. European Journal of Operational Research, (2), 245–259.

Desaulniers, G., & Soumis, F. (2010). Airline crew scheduling by column generation. CIRRELT Spring School, Montréal, Canada.

Desrochers, M., Desrosiers, J., & Solomon, M. (1992). A new optimization algorithm for the vehicle routing problem with time windows. Operations Research, (2), 342–354.

Desrochers, M., & Soumis, F. (1989). A column generation approach to the urban transit crew scheduling problem. Transportation Science, (1), 1–13.

Desrosiers, J., Dumas, Y., Desrochers, M., Soumis, F., Sanso, B., & Trudeau, P. (1991). A breakthrough in airline crew scheduling (Tech. Rep. No. G-91-11). Montreal: Cahiers du GERAD.

Desrosiers, J., Soumis, F., & Desrochers, M. (1984). Routing with time windows by column generation. Networks, (4), 545–565.

Deveci, M., & Demirel, N. Ç. (2018a). Evolutionary algorithms for solving the airline crew pairing problem. Computers & Industrial Engineering, 389–406.

Deveci, M., & Demirel, N. Ç. (2018b). A survey of the literature on airline crew scheduling. Engineering Applications of Artificial Intelligence, 54–69.

Du Merle, O., Villeneuve, D., Desrosiers, J., & Hansen, P. (1999). Stabilized column generation. Discrete Mathematics, (1-3), 229–237.

Garey, M. R., & Johnson, D. S. (1979). Computers and intractability: A guide to the theory of NP-completeness (Vol. 44). New York: W. H. Freeman & Company.

Gershkoff, I. (1989). Optimizing flight crew schedules. Interfaces, (4), 29–43.

Goldberg, D. E. (2006). Genetic algorithms. Pearson Education India.

Gurobi Optimization, L. (2019). Gurobi optimizer reference manual. Retrieved from

Gustafsson, T. (1999). A heuristic approach to column generation for airline crew scheduling. Department of Mathematics, Chalmers University of Technology.

Hoffman, K. L., & Padberg, M. (1993). Solving airline crew scheduling problems by branch-and-cut. Management Science, (6), 657–682.

Karmarkar, N. (1984). A new polynomial-time algorithm for linear programming. In Proceedings of the sixteenth annual ACM symposium on theory of computing (pp. 302–311).

Kasirzadeh, A., Saddoune, M., & Soumis, F. (2017). Airline crew scheduling: Models, algorithms, and data sets. EURO Journal on Transportation and Logistics, (2), 111–137.

Koch, T., Achterberg, T., Andersen, E., Bastert, O., Berthold, T., Bixby, R. E., ... others (2011). MIPLIB 2010. Mathematical Programming Computation, (2), 103.

Kornilakis, H., & Stamatopoulos, P. (2002). Crew pairing optimization with genetic algorithms. In Hellenic conference on artificial intelligence (pp. 109–120).

Land, A. H., & Doig, A. G. (1960). An automatic method of solving discrete programming problems. Econometrica, (3), 497–520.

Levine, D. (1996). Application of a hybrid genetic algorithm to airline crew scheduling. Computers & Operations Research, (6), 547–558.

Linderoth, J. T., & Lodi, A. (2011). MILP software (J. J. Cochran, Ed.). John Wiley & Sons.

Lodi, A. (2009). Mixed integer programming computation (M. Jünger et al., Eds.). Springer-Verlag.

Lodi, A., & Tramontani, A. (2013). Performance variability in mixed-integer programming. In Theory driven by influential applications (pp. 1–12). INFORMS.

Lübbecke, M. E. (2010). Column generation. Wiley encyclopedia of operations research and management science.

Lübbecke, M. E., & Desrosiers, J. (2005). Selected topics in column generation. Operations Research, (6), 1007–1023.

Marsten, R. (1994). Crew planning at Delta Airlines. Presentation at XV Mathematical Programming Symposium, Ann Arbor, MI, USA.

Ozdemir, H. T., & Mohan, C. K. (2001). Flight graph based genetic algorithm for crew scheduling in airlines. Information Sciences, (3-4), 165–173.

Padberg, M., & Rinaldi, G. (1991). A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems. SIAM Review, (1), 60–100.

Tarjan, R. (1972). Depth-first search and linear graph algorithms. SIAM Journal on Computing, (2), 146–160.

Vance, P. H., Barnhart, C., Gelman, E., Johnson, E. L., Krishna, A., Mahidhara, D., ... Rebello, R. (1997). A heuristic branch-and-price approach for the airline crew pairing problem (Tech. Rep. No. LEC-97-06). Atlanta: Georgia Institute of Technology.

Vazirani, V. V. (2003). Approximation algorithms (Chapter 13). Springer, Berlin, Heidelberg.

Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., ... SciPy 1.0 Contributors (2020). SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 261–272. doi: https://doi.org/10.1038/s41592-019-0686-2

Zeren, B., & Özkol, İ. (2012). An improved genetic algorithm for crew pairing optimization. Journal of Intelligent Learning Systems and Applications, (01), 70.

Zeren, B., & Özkol, İ. (2016). A novel column generation strategy for large scale airline crew pairing problems. Expert Systems with Applications, 133–144.