[PDF] Statistical Characterization of Airplane Delays

Abstract

The aviation industry is of great importance for a globally connected economy. Customer satisfaction with airlines and airport performance is considerably influenced by how much flights are delayed. But how should the delay be quantified with thousands of flights for each airport and airline? Here, we present a statistical analysis of arrival delays at several UK airports between 2018 and 2020. We establish a procedure to compare both mean delay and extreme events among airlines and airports, identifying a power-law decay of large delays. Furthermore, we note drastic changes in plane delay statistics during the COVID-19 pandemic. Finally, we find that delays are described by a superposition of simple distributions, leading to a superstatistics.

Full PDF

SStatistical Characterization of Airplane Delays

Evangelos Mitsokapas, ∗ Benjamin Sch¨afer, ∗ Rosemary Harris, and Christian Beck School of Mathematical Sciences, Queen Mary University of London, London E1 4NS, United Kingdom

The aviation industry is of great importance for a globally connected economy. Customer sat-isfaction with airlines and airport performance is considerably inﬂuenced by how much ﬂights aredelayed. But how should the delay be quantiﬁed with thousands of ﬂights for each airport andairline? Here, we present a statistical analysis of arrival delays at several UK airports between 2018and 2020. We establish a procedure to compare both mean delay and extreme events among airlinesand airports, identifying a power-law decay of large delays. Furthermore, we note drastic changesin plane delay statistics during the COVID-19 pandemic. Finally, we ﬁnd that delays are describedby a superposition of simple distributions, leading to a superstatistics.

I. INTRODUCTION

The aviation industry was a rapidly growing sector until recently, prior to the current COVID-19pandemic. Economic growth led to higher average yearly distances travelled, as well as higher airtraﬃc volumes, robustly observed among several regions worldwide until 2019 [1, 2]. But both theongoing pandemic [3] and also the push towards more renewable options in aviation [4] may inducea considerable change in the industry in the future. This makes the industry a very interestingobject to study as it transforms.As a passenger, an important benchmark for evaluating travel options, e.g. in terms of airports,airlines or even modes of transportation (train vs plane) is the punctuality of each option. In par-ticular, ﬂight delays severely decrease customer satisfaction and might lead to customers choosinga diﬀerent airport or airline, especially if a timely arrival is urgently required [5]. Generally, itis important to quantitatively understand delay-risks both in terms of the expectation values butalso in terms of the extreme events, i.e. quantifying how likely a very early or very late arrival is.The study of delays in aviation is already an active ﬁeld of research. Previous, simple, investiga-tion frameworks to classify and categorize delays have been proposed [6] but mostly relied on meanvalues. In other cases, stochastic models of plane delays [7] were developed without consideringthe corresponding probability distributions or assuming simple Normal or Poisson distributions [8].More recent work also includes the application of machine learning techniques to aviation data, e.g.via recurrent neural networks [9]. One problem of any data-driven approach is that many articleson aviation research solely rely on proprietary data: In a recent review investigating 200 researcharticles, 68% were based on proprietary data [10]. Hence, to enable the broader applicability ofmachine learning applications, more publicly available data are still required.To quantify delay statistics, we will go beyond the oft-used averages of delays [6] and insteadinvestigate the entire probability density function, as it encodes the full range of delay information,from highly negative delays (i.e. ﬂights arriving signiﬁcantly earlier than their scheduled arrivaltime) to severely positively delayed ﬂights. To explain the emergence of heavy tails, i.e. extremedeviations from the mean, we will utilize superstatistical modelling [11]. Such an approach has beensuccessfully applied in transport before, for modelling train delays [12]; it has also attracted recentinterest when describing ﬂuctuations in the energy system [13] and air pollutant concentrations[14] and it has been extended to the general framework of diﬀusing diﬀusivities in nonequilibriumstatistical physics and biologically inspired physics [15].In this article, we present new data collected from 2018 to 2020 at several UK airports, with aparticular focus on Heathrow, being the most important international hub in the UK. The data ∗ contributed equally a r X i v : . [ phy s i c s . s o c - ph ] D ec were publicly available from the arrival information of each airport, given out on their websites eachday but had to be collected and processed for further usage. We analyse the full probability densityof delay distributions and introduce certain performance indices to describe these distributions,such as the mean delay, the exponential decay rate of negative delays, and the power-law exponentof large delays. These indices are then compared for the diﬀerent UK airports and the diﬀerentairlines operating at these airports, to understand the main features of the delay statistics in amore systematic way. Finally, we deal with a theoretical model to explain features of the delaystatistics. We show that the power law of large positive delays can be linked to a superpositionof exponential delays with a varying decay parameter, in a superstatistical approach. On thecontrary, negative delays (early arrivals) do not exhibit any power laws but simply behave in anexponential way, with extremely early arrivals exponentially unlikely. Throughout this article, weassume that passengers prefer to arrive as early as possible, i.e. with as little positive and as muchnegative delay as possible. II. NEW DATA

We collected ﬂight details from a number of diﬀerent airports. For the purposes of this article,we have taken into consideration the top ﬁve UK airports, in order of passenger traﬃc [16], namely:London Heathrow Airport (LHR), London Gatwick Airport (LGW), London Luton Airport (LTN),London Stansted Airport (STN) and Manchester Airport (MAN). For a period of time lastingbetween Autumn 2018 and Spring 2019, we collected a combined total of approximately two-hundred and twenty thousand (2 . × ) ﬂight-arrivals from all ﬁve airports mentioned above.Furthermore, we continued collecting ﬂight-information from London Heathrow during the 2020COVID-19 pandemic, to illustrate the eﬀect the lockdown had on the delay distribution. For eachﬂight, we recorded the airline company operating the ﬂight along with the corresponding ﬂightnumber, departure and arrival airports, as well as scheduled and actual landing times. The delayis then computed simply as the diﬀerence between an aircraft’s scheduled arrival time and its actualarrival time. We made all collected data publicly available. For details of the data processing andavailability, see Methods.The main body of our data (about 85%) is sourced from London Heathrow, making it the chieffocus of our analysis simply due to its size. London Heathrow is an international airport operatingﬂights of 80 diﬀerent airlines in total, which ﬂy to 84 diﬀerent countries around the world, as of 2019[16]. Of course, in addition there are domestic ﬂights within the UK. The passenger nationalitiesare 48% European and UK and 52% from the rest of the world. It is the busiest airport in Europeby passenger traﬃc [16].The empirical probability density function (PDF) of all delays is a key characteristic to monitor,see Fig. 1 for all Heathrow delays. There, we compare the data collected from 2018 to 2019 withmore recent data collected during the 2020 COVID-19 pandemic (during the ﬁrst lockdown inSpring to Summer 2020), which led to a drastic reduction in air transport [17, 18]. There aretwo interesting observations: Firstly, the delay statistics under COVID-19 are shifted to the left,indicating overall smaller delays (including more negative delays); secondly, the general shape ofthe distribution does not change drastically. In particular, we observe a fast decay of the PDF ofnegative delays on the left side and a much slower decay of the PDF on the right side for positivedelays. In the following sections, we will analyse this behaviour in much more detail. III. QUANTIFYING DELAY STATISTICS

Starting from a histogram of the ﬂight delays, we derive three indices/measures to quantify ﬂightdelay distributions: Mean delay, exponent of left exponential and power-law exponent of right q -exponential, as explained below in detail. We will use the LHR data previous to any COVID-19inﬂuence as our main example. − −

50 0 50 100 150 200Arrival Delays [min]10 − − − − P D F LHR prior to COVID-19LHR during COVID-19

FIG. 1. Flight delays follow a broad distribution with large negative and positive delays. We display LHRdelay histograms prior to and during the COVID-19 pandemic, both normalized. As the COVID-19 LHRdata set is signiﬁcantly smaller in size, compared to the regular LHR data set, it contains many gaps,where no data were recorded. The COVID-19 data set is signiﬁcantly shifted towards the left (smallerdelays) as compared to the pre-pandemic time.

As a ﬁrst step, we split the full histogram at its peak value into two histograms, a left ﬂank ofdominantly negative delays and a right ﬂank of dominantly positive delays, see Fig. 2. Based on theshape of the empirical distributions, we use exponentials and q -exponentials as ﬁtting functions,see also Methods for details.The left ﬂank is observed to be well approximated by an exponential function of the form p ( t L ; λ ) = λe − λt L , λ > , (1)where t L are the normalized arrival delays on the left ﬂank, see Methods for details. The exponent λ here quantiﬁes the exponential decay of the probability of early arrivals. Therefore, a large λ implies that few ﬂights arrive very early while a small λ indicates that very large negative delaysare observed. Hence, typically, we assume a small λ to be desirable.The right ﬂank of the delay distribution obeys a power law, i.e. a slow decay of p ∼ t ν , with ν negative. To quantitatively describe the right ﬂank, we use a q -exponential function [19] of theform p ( t R ; q, λ q ) = (2 − q ) λ q [1 + ( q − λ q t R ] − q , (2)where t R are the normalized arrival delays on the right ﬂank, see Methods for details. The power-law exponent, i.e., the rate at which the probability density decays for high (positive) delay values,is given by ν := 1 / (1 − q ) , < q <

2. Note that the scale parameter λ q > ν should be as large as possible. Large (absolute) values of ν imply a rapid decay ofpositive delays, i.e. fewer extreme events of very delayed arrivals.Finally, we note that the two ﬂanks describe the tails of the distribution well but overestimatethe peak, i.e. the most likely value, see Fig. 3. Hence, we complement the two previous ﬁts byusing the mean delay µ as a third index. Since we assume that passengers wish to arrive as early as − − − − − − − − P D F − − − FIG. 2. Splitting the full distribution at the peak leads to two easier-to-ﬁt ﬂanks. Left: Negative delaysdecay approximately linearly in the log-scale and thereby suggest an exponential ﬁt (1) Right: Positivedelays display substantial heavy tails and thereby suggest the usage of a q -exponential function (2). − −

50 0 50 100 150 200Arrival Delays [min]10 − − − − − − P D F λ = ν = − FIG. 3. Exponential (green) and q -exponential (blue) theoretical distributions capture the empirical dis-tribution. The ﬁts are obtained via the MLE method, see Methods for ﬁtting details. To complement theover-estimated “peak” (tent-like shape) we introduce the mean delay µ index. possible, the mean µ should be as small (or as negative) as possible. In the case of LHR, the threedelay indices that we introduced are λ = 0 . µ = − .

06 and ν = − . µ can be easily manipulated by airline companies by scheduling ﬂightarrival times later then actually needed, hence always causing a negative mean delay, which mayartiﬁcially improve their performance. On the contrary, the tail behavior truthfully represents theextreme event statistics for both positive and negative delays and cannot be easily manipulatedby the operators. I b e r i a B r i t i s h A i r w a y s A e r L i n g u s F i nn a i r A m e r i c a n A i r li n e s A i r C a n a d a U n i t e d A i r li n e s J a p a n A i r li n e s Q a t a r A i r w a y s L e f t E x p o n e n t s λ [ m i n − ] Desirable: ↓ I b e r i a B r i t i s h A i r w a y s A e r L i n g u s F i nn a i r A m e r i c a n A i r li n e s A i r C a n a d a U n i t e d A i r li n e s J a p a n A i r li n e s Q a t a r A i r w a y s − − − − − − − M e a n D e l a y s µ [ m i n ] Desirable: ↑ I b e r i a B r i t i s h A i r w a y s A e r L i n g u s F i nn a i r A m e r i c a n A i r li n e s A i r C a n a d a U n i t e d A i r li n e s J a p a n A i r li n e s Q a t a r A i r w a y s − − − − − − − R i g h t E x p o n e n t s ν Desirable: ↑ Airlines covering long-distance ﬂights

FIG. 4. International airlines appear to diﬀer substantially in their three delay indices. We plot the left-side (negative) delay exponential decay, right-side (positive) power-law delay decay and the mean delay.Arrows indicate whether a small or large value is desirable.

IV. COMPARISON OF AIRPORTS AND AIRLINES

We here use the previously developed framework to quantify and compare delay statistics fordiﬀerent airlines and airports. Intuitively, we expect that long-distance ﬂights would, on average,yield more extreme early or late arrivals, compared to the corresponding short-distance ones. Thus,we distinguish between short-distance airlines, covering mostly domestic and European destina-tions, and airlines that include long-distance, international destinations, as well as destinationswithin Europe. We ﬁrst compute the three indices λ, µ, ν for each of those airline groups and thencompare full airport statistics, aggregating all airlines.There are several factors impacting the delay distribution for each airport or airline: Airlinepolicies, ﬂight routes, technical defects or issues with documentation contribute to 27% of all delays[20]. Speciﬁcally, overseas ﬂights are more sensitive to wind (head wind or tail wind), as well asunstable weather conditions (storms, fog) and military exercises. Airlines operating internationalﬂights, as illustrated in Fig. 4, exhibit considerable variations in their ﬂight delay indices. Notethat while a low left exponent λ may be regarded as a desirable property (ﬂights often arrivevery early), the mean µ and the right exponent ν should deﬁnitely be as small as possible (lowmean delay and few very late arrivals). Since both quantities tend to be negative, their absolutevalues should be large. Comparing the airlines, we observe a “grouping” behaviour for some of thecarriers. On the one hand, airlines having a blend between short-distance (e.g. domestic or EU)and overseas destinations, such as Iberia, British Airways (BA), Aer Lingus and Finnair, appearto follow a similar trend for each index. On the other hand, airlines that do not possess such aspread of destinations tend to perform well only in some of the indices. As an illustrative examplewe choose Air Canada and United Airlines: Although both their left and right exponents are in asimilar range to the other airlines, their mean delays are substantially less negative than those oftheir competitors.Characterization of short-distance ﬂights shows a strong grouping of the delay behavior forsome airlines. As seen in Fig. 5, comparison of ﬁve of the largest low-cost domestic and Europeanproviders, reveals a systematic similarity between Wizz Air, easyJet and Ryanair. All three airlinesmanage to perform well in the left exponent metric, minimizing early arrivals, while they maintainan acceptable negative average delay (with easyJet performing worst here). Again, they follow a W i zz A i r e a s y J e t R y a n a i r V u e li n g J e t L e f t E x p o n e n t s λ [ m i n − ] Desirable: ↓ W i zz A i r e a s y J e t R y a n a i r V u e li n g J e t − − − − − − − − M e a n D e l a y s µ [ m i n ] Desirable: ↑ W i zz A i r e a s y J e t R y a n a i r V u e li n g J e t -14.00-13.00-12.00-11.00-10.00-9.00-8.00-7.00-6.00-5.00 R i g h t E x p o n e n t s ν Desirable: ↑ Airlines not covering long-distance ﬂights

FIG. 5. Delay indices for low-cost airlines not covering long-distance ﬂights. Wizz Air, easyjet, Ryanairand Vueling share the largest λ index (early arrivals). Jet2 has the smallest mean delay µ and Vueling ischaracterized by the smallest ν index (late arrivals). similar right-exponent trend, translated to a certain number of overall late arrivals. Furthermore,Jet2 outperforms all other short-distance airlines in λ left-exponents and mean delays. Finally,Vueling resembles Wizz Air and Ryanair values in the λ and µ metrics but seems to have less latearrivals as per its high right exponent ν .Comparing the long distance airlines with the short-distance ones, we notice some diﬀerences:Airlines covering long distances tend to display lower (more desirable) left exponents as well ashigher negative mean delays. Meanwhile, the right exponent behavior is similar between the twogroups with Vueling and Qatar Airlines as the ”outliers” in their respective category. Whetherthis behavior is due to company policies or ﬂight distance remains a question for future research.Studying the indices for individual airports yields interesting insights as well. Airports populatedby airlines ﬂying mainly to domestic and EU destinations, such as LTN and STN, score fairly wellin both early and late arrivals, with an approximately net zero mean delay, see Fig. 6. On the onehand, STN is characterized by the minimum λ value, showing the best performance in early arrivalsin the group of airports, while LTN attains an acceptable value. On the other hand, it can beseen that LTN scores the second best ν value while STN lies very slightly above the median group ν . Interestingly, mean delays at MAN airport are net positive, contrary to LHR and GAT wherearrivals are scheduled in such a way that the mean delay is negative. Furthermore, MAN seems tonot perform that well in the early arrivals index, having the worst score, but does outperform allthe rest of the airports when compared under the extreme positive delays. International airportssuch as LHR and GAT (with the exception of LHR COVID-19) tend to cluster around similarvalues for all delay indices.LHR during the COVID-19 pandemic outperforms all airports on the mean delay index by a largemargin. The reason behind this is that the dramatic reduction of ﬂight traﬃc worldwide saw manyﬂights arriving too early. Interestingly, the left exponent, i.e. the decay of early arrivals, did notchange substantially, compared to business-as-usual since the shape of the delay distribution on theleft did not change much but was only shifted to more negative values. The right ﬂank behaves quitediﬀerently: Both business-as-usual and LHR during the COVID-19 pandemic, recorded heavilydelayed ﬂights, which arrived more than 3 hours late (see also Fig. 1). The right index revealsthe likelihood of these extreme events. In the case of LHR under COVID-19, the low mean delaysuggests early arrival but the extreme events are still present and hence the right exponent revealsthis poor performance. L H R L H R C O V I D G A T M A N L T N S T N L e f t E x p o n e n t s λ [ m i n − ] Desirable: ↓ L H R L H R C O V I D G A T M A N L T N S T N − − − − − M e a n D e l a y s µ [ m i n ] Desirable: ↑ L H R L H R C O V I D G A T M A N L T N S T N − − − − − − − R i g h t E x p o n e n t s ν Desirable: ↑ UK Airports Three Measures

FIG. 6. Airports appear to diﬀer substantially in the three delay metrics. Airports that serve mostlydomestic and European destinations, such as LTN and STN, behave diﬀerently from international airportssuch as LHR, GAT and MAN.

V. SUPERSTATISTICAL MODELLING OF DELAYS

As we have seen previously, the right ﬂank of the delay statistics exhibits heavy tails and iswell-described by a q -exponential. Let us now explore a potential explanation for this particulardistribution by employing the framework of superstatistics [11, 21, 22]. Superstatistics is relevantwhen an aggregated system (e.g. a long time series) displays heavy tails, but the system may thenbe disentangled into many smaller sub-parts (e.g. short time periods of the trajectory). Thesesub-parts then are no longer heavy-tailed but follow a simple local distribution, for example anexponential or a Gaussian. This idea has been successfully applied, for example, to train delays[12], electric power systems [13] and intermittent wind statistics [23].Assuming for now that the right-ﬂank delays are indeed q -exponentially distributed and follow asuperstatistics, we should be able to observe “local” exponential densities, with a decay parameter λ . Superimposing all these λ , we get a q -exponential if the λ themselves follow a χ -distribution: f ( λ ) = 1Γ (cid:0) n (cid:1) (cid:18) n λ (cid:19) n λ n − e − nλ λ . (3)Here n denotes the number of degrees of freedom characterizing the ﬂuctuations in λ and λ isthe sample mean of λ . Indeed, choosing an adequate time scale to separate the trajectory (seenext paragraph), the heavy tails of delay distributions vanish and instead the distributions are welldescribed by simple exponential functions, see Fig. 7.Let us explain how to extract the relevant time scale T on which we locally observe exponentialdistributions. Since we know that an exponential distribution has a kurtosis of κ exponential = 9, wetest time windows of diﬀerent size ∆ τ and compute the local average kurtosis [11] as¯ κ (∆ τ ) = 1 τ max − ∆ τ (cid:90) τ max − ∆ τ dτ (cid:104) ( u − ¯ u ) (cid:105) τ , ∆ τ (cid:104) ( u − ¯ u ) (cid:105) τ , ∆ τ , (4)where τ max is the length of the time series u and ¯ u is the mean of the time series. We denote by (cid:104) . . . (cid:105) τ , ∆ τ the expectation formed for a time slice of length ∆ τ starting at τ . For the LHR data,we show how to determine the long time scale T in Fig. 8.Next, let us carry out an important consistency check: As explained above, the mixing ofnumerous local exponential distributions with exponents following a χ -distribution leads to a

10 5 0Delay [min]10 P D F High exponent, e =-0.24 Exp fitData

25 0 25 50 75Delay [min]10 P D F Low exponent, e =-0.04 Exp fitData

FIG. 7. We analyse the full time series of plane delays and extract a time window during which we observelocally exponential distributions. These local distributions can decay slowly or fast, i.e. the rate λ isﬂuctuating. q -exponential. Now, we can make a histogram of the λ -distribution and ﬁt it with a χ - andan inverse χ -distribution. Next, we derive the q -exponential from the ﬁtted χ -distribution andcompare it with the direct ﬁt of the q -exponential and the original data. This is illustrated inFig. 9.We note that the empirical λ -distribution is slightly better ﬁtted by an inverse χ - than a χ -distribution, as also observed in other application areas [14, 24]. Overall, the superstatisticaldescription seems consistent, given the short time series of ﬂight delays under consideration. The q -exponential derived from the χ tends overestimate the PDF at low values, which is understandableas we also exclude them for the ﬁtting of the q -exponential via MLE (see methods). Still, the tailbehavior of the q -exponential based on the χ matches the real data and the MLE ﬁt nicely.This means the observed power laws of the right ﬂank are essentially explained by a suitablesuperstatistics which describes changes in the microvariables on a time scale of T ≈ . VI. CONNECTING THE FLANKS

So far, we focused our attention on describing and ﬁtting the tail aspects of the distribution,namely the left, approximately exponential, ﬂank and the right, approximately q -exponential,ﬂank. Both these functions combined overestimate the peak of the distribution and hence, we alsoincluded the mean delay as the ﬁnal metric in our framework. Now, let us consider how the twotail distributions could be merged in one smooth-ﬁtting function.First, we note that the so far mostly ignored central part of the delay distribution might beapproximated by a Gaussian distribution, based on the parabola shape in the log-scale plots. Weuse this insight to propose the following continuous ﬁtting function p ( t ) =  A e exp (cid:16) − λ (cid:112) C + ( t − t peak ) (cid:17) , t < t peak A q exp q (cid:16) − λ q (cid:112) C + ( t − t peak ) (cid:17) , t ≥ t peak (5) A v e r a g e k u r t o s i s DataExponential

FIG. 8. The average kurtosis ¯ κ of the data set is plotted as a function of the time window ∆ τ in hours(blue). The intersection between the horizontal line at ¯ κ = 9 (the kurtosis of an exponential distribution)and the ¯ κ vs ∆ t curve gives the optimal value for ∆ t ; we ﬁnd T ≈ .

55 hours. λ f ( λ ) inv- χ , n = χ , n = − − − P D F ν χ = − FIG. 9. Applying superstatistics leads to consistent results. Left: We extract the distribution of localexponents and compare them to a χ and inverse χ ﬁt (based on the method of least squares). Right: Usingthe previously derived χ distribution, we again derive a q -exponential with right exponent ν χ ≈ − . ν MLE ≈ − . χ -based q -exponential, the tail behavior is decently captured. with exp q ( t ) = (2 − q ) λ q [1 + ( q − λ q t ] − q being the q -exponential function. Here, A e and A q areamplitudes, C is a curvature parameter, describing the approximately Gaussian part in the center, t peak is the delay at the peak of the delay distribution, where we split into left and right ﬂank and t are the delay values, see Methods for ﬁtting details.The resulting ﬁt is a smooth function, covering the full delay range, see Fig. 10. Since the newcurvature parameter C also inﬂuences the general shape, the new values for q and λ , now named˜ q and ˜ λ , are slightly diﬀerent from the ones solely focusing on the tails (empirically we tend toobserve a slight reduction in λ and increase in q ). Still, the general observations using the delayindices and comparing airlines, such as in Figs. 4-6, remain mostly unchanged. Equation (5)provides an alternative approach to the three delay indices introduced so far. If one is interestedin describing the full distribution as accurately as possible, we recommend using equation(5).Meanwhile, to compare performance of individual airlines or to obtain a general impression of thedelay distribution, the three delay indices are a simpliﬁed framework, allowing easy and robustestimation and comparison. Finally, note that the full curve is not strictly a probability densityfunction as we did not enforce that its integral equals one. While theoretically making it easier byreducing the number of parameters, that would make the ﬁtting more diﬃcult in practice as theintegrals cannot be evaluated analytically by hand but only impose additional constraints during0the ﬁtting. Also note that our observed ﬂight delays live on the ﬁnite interval [-100,210], whereasthe ﬁtting function is deﬁned on [ −∞ , ∞ ], which makes the normalization outside the intervalambiguous.

100 50 0 50 100 150 200Delay [min]0.0000.0050.0100.0150.0200.025 P D F LHR, =0.123, q =1.306

100 50 0 50 100 150 200Delay [min]10 P D F Combined fitData

FIG. 10. Using the approximately Gaussian shape in the center, we smoothly combine left and right ﬂankﬁts into one coherent ﬁt of the full delay data set. To emphasize the quality of the ﬁt, we display both alinear (left) and logarithmic (right) scale of the PDF for LHR.

VII. DISCUSSION AND CONCLUSIONS

In summary, we have analysed a newly obtained data set of plane delays for various Britishairports, which contains tens of thousands of ﬂights, aggregated over multiple months. This is asubstantial improvement to some earlier studies which only investigated a few days of measure-ments and a couple of thousand ﬂights, thereby greatly underestimating the contribution of thetails to the probability distribution [25]. Interestingly, we ﬁnd that all investigated airports andeven individual airlines at each airport follow a qualitatively similar distribution, namely an ap-proximately exponential decay on the left ﬂank (of negative delays) and a slowly decaying powerlaw on the right ﬂank (of positive delays). To characterize these distributions and systematicallycompare airlines and airports, we have developed a framework to quantify delay performance.Critically, we do not only use the mean delay but also consider extreme events of both positiveand negative delays via their respective ﬂanks in the empirical probability distribution. Applyingthis newly developed framework, we ﬁnd substantial diﬀerences between airlines serving short andlong-distance routes.We oﬀer an explanation for the emerging power law on the right ﬂank via superstatistics: Thelocal q -exponential distribution with its heavy tails seems to arise from many superimposed expo-nential distributions. In particular, we identify the long time scale T as approximately 1.5 hours,during which delays fall oﬀ exponentially. Comparing to other superstatistical results [21, 22], wenote the relevance of both χ -distributions and inverse- χ -distributions for the scale parameter,similar to the ones observed in air pollution or cancer [14, 24], stressing again the universalityof superstatistics. Finally, we propose a continuous function to capture the full delay statistics.While this introduces additional parameters and the superstatistical theory mentioned previouslyis no longer applicable, this ﬁt does describe the full distribution with high accuracy.Our framework of three delay indices to characterize ﬂight delay distributions can be appliedquite generally to measure the punctuality of ﬂights, going beyond an analysis based on just themean. Crucially, while airlines or airports might be able to “game” the system of mean delays,this is not possible with the left and right exponents. Companies could shift their ﬂight schedule,i.e. announcing intentionally that ﬂights will take longer than they do in practice, and thereby1systematically record early arrivals so pushing their mean delay to negative values. However, sucha procedure would still leave the remaining two indices (left and right exponent) untouched so thatthey provide a stable way of measuring performance.One remarkable result is the impact of the global pandemic of COVID-19 on the delay statis-tics. Heathrow (LHR) under COVID-19 conditions (travel restrictions, quarantine upon arrivaletc) displays an impressively low mean delay, while the left ﬂank decay was mostly unchanged.Interestingly, LHR still experienced some heavily delayed ﬂights during the COVID-19 pandemic,which leads to pronounced heavy tails towards the right and thereby a poor performance in theright exponent. These observations demonstrate that given fewer ﬂights, the existing infrastructureis able to perform much better in some aspects (mean) than under “business-as-usual” conditions,while others (extreme delays) still can be improved. Aside from the upsides of COVID-19-relatedlockdown measures on air quality [26, 27] or CO emissions [28], we ﬁnd that having fewer ﬂightsalso improves delay statistics.We have assumed throughout this article that negative delays are preferred by all passengers.However, some passengers might value arrival at exactly the predicted time more highly thanarriving early. This would change the interpretation of the left index slightly: Instead of desiringlow exponents, airlines and airports should aim for high exponents. Similarly, the absolute valueof the delay should be zero, i.e. the arrival on time should be the default. Regardless of preference,the indices, as introduced, provide a suﬃcient framework to measure the delay performance.In the future, we would like to apply our framework to delay statistics at other airports indiﬀerent countries, and investigate how delays are related to geographical distance of the ﬂights.In particular it would be interesting to see how our three indices diﬀer between years, countriesand so on. From a more fundamental perspective, we aim to further understand correlationsin the ﬂight delays. Preliminary indications from the British data are that on “typical” dayscorrelations decay quickly but on some “exceptional” days (perhaps those where external factorsaﬀect many ﬂights) the autocorrelation function can settle on a non-zero value for some time andmany ﬂights have long delays which contribute to the tail of the probability density function.Long-range temporal correlations and memory eﬀects have been studied in many other physicaland non-physical systems [29, 30]; modelling such eﬀects here is challenging, since the build-up ofdelays at one airport may be inﬂuenced by earlier ﬂights to and from completely diﬀerent airports,but practically important since controlling the “cascading” of delays would lead to a signiﬁcantlyimproved passenger experience. Acknowledgments

This project has received funding from the European Union’s Horizon 2020 research and inno-vation programme under the Marie Sklodowska-Curie grant agreement No 840825.

Author contributions

E.M., B.S., contributed equally. E.M., B.S., and C.B. conceived and designed the research. E.M.collected the data, E.M. and B.S. analysed the data and produced the ﬁgures. R.H. and all otherauthors contributed to discussing and interpreting the results and writing the manuscript.

Competing interests

The authors declare no competing interests.2

METHODSData processing

As we mentioned in the main text, for each ﬂight, we recorded the airline company operatingthe ﬂight, the ﬂight number, the departure and arrival airports as well as the scheduled and actuallanding times. The data was cleaned and organized according to the delay, computed as thediﬀerence between scheduled arrival time and actual arrival time for each ﬂight. We kept datafor each arrival airport as well as a summary of the overall delays, independent of the arrivalairport. A “negative” delay occurs when the actual aircraft arrival is earlier than the expectedone, according to the scheduled timetable. After examining the data it became evident that areasonable cut-oﬀ point as to how early or late an aircraft can arrive at the designated airportshould be implemented. This prevents over-representation of individual extreme events in theresulting probability distributions. We decided that the delays (in minutes) would have to becontained in the interval [ − , Theoretical distribution ﬁtting

Here we explain the ﬁtting procedure in more detail. We approximate the empirical distributionof the left ﬂank, where negative delays are dominant, with an exponential distribution of the form p ( t L ; λ ) = λe − λt L , λ > . (6)As we have seen earlier, the observed distribution curves towards a Gaussian distribution aroundthe peak value and thereby deviates from an exponential distribution. Hence, we restrict ourﬁtting to values deviating from the central area as follows. Let t peak be the delay at which thedistribution reaches its highest PDF value and t min the smallest delay we observe. Then, werestrict our exponential ﬁt to any delay falling in the interval [ t min , t peak − . | t min − t peak | ], where | ... | indicates the absolute value. Following this restriction, we deﬁne the left ﬂank delay values as t L = − t + t peak − . | t min − t peak | , t ∈ [ t min , t peak − . | t min − t peak | ] . (7)We now turn to the right ﬂank of the empirical distribution, i.e. the data set that constitutesthe majority of the positive delays. The q -exponential is much better at incorporating parts of theGaussian central distribution on the right-hand side than the exponential distribution is on theleft ﬂank. Hence, we only exclude the smallest 10% of the data, i.e. we consider delays t in theinterval interval [ t peak + 0 . | t max − t peak | , t max ], where t max is the highest delay observed. Hencethe right-ﬂank delays to be ﬁtted are deﬁned as t R = t − t peak − . | t max − t peak | , t ∈ [[ t peak + 0 . | t max − t peak | , t max ] . (8)Our theoretical distribution choice is now a q -exponential p ( t R ; q, λ q ) = (2 − q ) λ q [1 + ( q − λ q t R ] − q , (9)with parameters λ q and q .Note that both t L and t R are deﬁned such that they start at 0 and continue towards positivevalues to keep the ﬁtting functions easier.These two functions (exponential and q -exponential) are ﬁtted to the data using a maximumlikelihood estimate (MLE), i.e. maximizing the Likelihood L ( θ, x ). Here, x indicates the data wewish to ﬁt and θ the set of parameters that are being optimized. The likelihood of a parametersetting θ on a given one-dimensional data set x = x , x , ..., x N is computed as L ( θ, x ) = N (cid:89) i =1 p ( x i , θ ) , (10)3with probability density function p ( x i , θ ), dependent on the parameters θ . Technically, we carryout the MLE using the scipy.stats module in python with custom PDFs, see also Code availability(below) for a link to the code. Fitting the smooth combined function

To obtain a smooth ﬁt, combining both ﬂanks, we employ the following procedure. We ﬁrstestimate the exponential decay rate λ based on the lowest 70% of negative delays, then estimate q and the q -exponential decay rate λ q based on almost the full right-hand side of the histogram.This is identical to the procedure for the individual ﬂanking ﬁts. Next, we estimate the centralcurvature C , which we assume to be identical for both intervals, and the amplitudes A e and A q ,as well as λ q using least squares ﬁtting. While carrying out this least-square ﬁt, we also allowthe parameters q and λ to vary up to 10% from the MLE-optimal value determined earlier. Thereason to allow any variance is to ensure a continuous ﬁt and we want to keep the change from theoptimal MLE parameters small, hence, we restrict it to 10%. Technically, we use the scipy.stats module to perform the MLE ﬁts and the least square ﬁt; continuity is ensured using constraints inthe symﬁt package. Airline data

In Figs. 4 and 5 we compared several airlines. Let us brieﬂy list how many ﬂights we analysedto derive our delay indices: For the short-distance airlines ’Wizz Air’: 2428, ’easyJet’: 15449,’Ryanair’: 13488, ’Vueling’: 1034, ’Jet2’: 1215 and for the other airlines we have ’Iberia’: 12892,’British Airways’: 38257, ’Aer Lingus’: 7331, ’Finnair’: 8560, ’American Airlines’: 23119, ’AirCanada’: 7247, ’United Airlines’: 6797, ’Japan Airlines’: 5966, ’Qatar Airways’: 5935. For allairlines we have at least 1000 ﬂights and often several thousand ﬂights.

Data availability

The original data of airport arrivals has been uploaded to an open repository: https://osf.io/snav9/?view_only=74c3a41eab9345aa9bae2b8bc322775c . All data that support the resultspresented in the ﬁgures of this study are available from the authors upon reasonable request.

Code availability

Python code to reproduce ﬁgures, perform the ﬁts and extract the delay indices, is also uploadedhere: https://osf.io/snav9/?view_only=74c3a41eab9345aa9bae2b8bc322775c . [1] M. M. Hakim and R. Merkert, Journal of Transport Geography , 120 (2016).[2] J. G. Brida, P. D. Monterubbianesi, and S. Zapata-Aguirre, World Review of Intermodal Transporta-tion Research , 310 (2018).[3] P. Suau-Sanchez, A. Voltes-Dorta, and N. Cuguer´o-Escofet, Journal of Transport Geography (2020).[4] H. Kuhn, C. Falter, and A. Sizmann, in Proceedings of the 3rd CEAS Air&Space Conference and 21stAIDAA Congress, Venice, Italy (2011) pp. 1249–1259.[5] M. Efthymiou, E. T. Njoya, P. L. Lo, A. Papatheodorou, and D. Randall, Journal of AerospaceTechnology and Management (2019). [6] J. J. Rebollo and H. Balakrishnan, Transportation Research Part C: Emerging Technologies , 231(2014).[7] J. M. Rosenberger, A. J. Schaefer, D. Goldsman, E. L. Johnson, A. J. Kleywegt, and G. L. Nemhauser,Transportation Science , 357 (2002).[8] E. Mueller and G. Chatterji, in AIAA’s Aircraft Technology, Integration, and Operations (ATIO) 2002Technical Forum (2002) p. 5866.[9] G. Gui, F. Liu, J. Sun, J. Yang, Z. Zhou, and D. Zhao, IEEE Transactions on Vehicular Technology , 140 (2019).[10] M. Z. Li and M. S. Ryerson, Journal of Air Transport Management , 111 (2019).[11] C. Beck, E. G. D. Cohen, and H. L. Swinney, Physical Review E , 056133 (2005).[12] K. Briggs and C. Beck, Physica A: Statistical Mechanics and its Applications , 498 (2007).[13] B. Sch¨afer, C. Beck, K. Aihara, D. Witthaut, and M. Timme, Nature Energy , 119 (2018).[14] G. Williams, B. Sch¨afer, and C. Beck, Physical Review Research , 013019 (2020).[15] R. Metzler, The European Physical Journal Special Topics , 711 (2020).[16] UK Civil Aviation Authority, “UK Airports - Annual Statements of Movements, Passengers and Cargo-Table 09,” (2019), [Online; accessed 10-September-2020].[17] Kalyeena Makortoﬀ , “Heathrow cargo ﬂights rise 500% as airport restyles itself as ‘vital airbridge’,”(2020), [Online; accessed 19-August-2020].[18] S. Niˇzeti´c, International Journal of Energy Research , 10953 (2020).[19] C. Tsallis, Journal of Statistical Physics , 479 (1988).[20] EUROCONTROL, “Delays – three questions and many answers,” (2018), [Online; accessed 10-September-2020].[21] C. Beck, Physical Review Letters , 180601 (2001).[22] C. Beck and E. G. D. Cohen, Physica A: Statistical Mechanics and its Applications , 267 (2003).[23] J. Weber, M. Reyers, C. Beck, M. Timme, J. G. Pinto, D. Witthaut, and B. Sch¨afer, Scientiﬁc Reports , 1 (2019).[24] L. L. Chen and C. Beck, Physica A: Statistical Mechanics and its Applications , 3162 (2008).[25] M. V. Caccavale, A. Iovanella, C. Lancia, G. Lulli, and B. Scoppola, Journal of Air TransportManagement , 116 (2014).[26] A. M. Shrestha, U. B. Shrestha, R. Sharma, S. Bhattarai, H. N. T. Tran, and M. Rupakheti, Earth-Arxiv (2020).[27] B. Sch¨afer, R. Verma, A. Giri, H. He, S. Nagendra, M. Khare, and C. Beck, arXiv preprintarXiv:2007.00755 (2020).[28] C. Le Qu´er´e, R. B. Jackson, M. W. Jones, A. J. Smith, S. Abernethy, R. M. Andrew, A. J. De-Gol,D. R. Willis, Y. Shan, J. G. Canadell, et al. , Nature Climate Change , 1 (2020).[29] G. Rangarajan and M. Ding, eds., Processes with Long-Range Correlations: Theory and Applications ,Lecture Notes in Physics, Vol. 621 (Springer-Verlag, Berlin Heidelberg, 2003).[30] J. Beran, Y. Feng, S. Ghosh, and R. Kulik,