Asymmetric excitation of left- and right-tail extreme events probed using a Hawkes model: application to financial returns

Abstract

We construct a two-tailed peak-over-threshold Hawkes model that captures asymmetric self- and cross-excitation in and between left- and right-tail extreme values within a time series. We demonstrate its applicability by investigating extreme gains and losses within the daily log-returns of the S&P 500 equity index. We find that the arrivals of extreme losses and gains are described by a common conditional intensity to which losses contribute twice as much as gains. However, the contribution of the former decays almost five times more quickly than that of the latter. We attribute these asymmetries to the different reactions of market traders to extreme upward and downward movements of asset prices: an example of negativity bias, wherein trauma is more salient than euphoria.


Matthew F. Tomlinson (1, 2), David Greenwood (3), and Marcin Mucha-Kruczyński (1, 4)

1 Department of Physics, University of Bath, Bath BA2 7AY, United Kingdom
2 Centre for Networks and Collective Behaviour, University of Bath, Bath BA2 7AY, United Kingdom
3 CheckRisk LLP, 4 Miles's Buildings, George Street, Bath BA1 2QS, United Kingdom
4 Centre for Nanoscience and Nanotechnology, University of Bath, Bath BA2 7AY, United Kingdom

(Dated: November 26, 2020)

I. INTRODUCTION

Heuristics such as imitation and herding are significant drivers of human agents within social systems. These reflexive behaviours of individuals lead to self-exciting dynamics at the group level that often feature time-clustering of extreme events at the macro scale [1, 2]. Such extreme events often have profound consequences, which motivates a strong interest in their accurate forecasting. This problem is often approached through extreme value analysis (EVA), where asymptotic tail behaviour is modelled independently from bulk behaviour, with the justification that the two are often generated by distinct mechanisms and, therefore, that the bulk provides little information about the tail and vice versa [3–5]. Nonstationary EVA methods that account for the time-clustering of extremes promise both improved forecasting accuracy and potential insight into the underlying mechanisms that generate extreme events.

Peaks-over-threshold (POT) Hawkes models provide a parsimonious framework to describe macroscopic self-excitement of extreme values [6, 7]. In these models, the arrivals of threshold-exceeding values within a time series become the discrete events of an inhomogeneous point process in which past events cause a time-decaying increase in the arrival rate of future events [2, 8]. Having first emerged as a stochastic model for the self-reflexive pattern of foreshocks and aftershocks that decorate major seismic activity [6, 7, 9, 10], Hawkes-type models have since found application to broader classes of systems that exhibit similar activity bursts, including neural networks [11, 12], inter-group conflict [13, 14], social media [15], and financial markets [8, 16–29].

Here, we present a novel two-tailed peaks-over-threshold (2T-POT) Hawkes model that captures asymmetric self- and cross-excitation in and between left- and right-tail extreme values within the same univariate time series. This model assumes a conditional arrival intensity common to extreme events from both tails and is conceived as a stochastic model for the time clustering of extreme fluctuations within drift-diffusion-like processes. Previous work has seldom investigated the possibility of asymmetric interactions between the two sets of extremes; however, such interactions are a distinct possibility, especially, for example, in socially driven processes, where the guiding heuristics of human agents may include different responses to the two sets of tail events [30, 31]. To illustrate this point, we apply our model to the extreme gains and losses within the historic daily log-returns of the S&P 500 equity index; we use likelihood-based inference methods to compare its performance against the limiting case of symmetric interactions between tails, as well as to a bivariate Hawkes model in which left- and right-tail extremes are treated as the events of two distinct point processes, both with and without cross-excitement between them.

Financial asset price time series as a class represent an ideal case study for our model. Prices are often modelled as geometric random walks, in which the log-returns (i.e. changes in the log-price) are independent and identically distributed (i.i.d.) white noise [32, 33]. However, contrary to this description, log-returns are characterized by heavy-tailed marginal distributions and positively autocorrelated conditional heteroscedasticity [33–36]. Accordingly, extreme price fluctuations, measured as large magnitude log-returns, cluster in time, especially within periods of sustained overall negative price growth. These bursts of extremes are evident in Fig. 1 for the S&P 500 daily log-returns: in the left panel, they manifest as step-like increases in the count of extreme returns as a function of time; in the right panels, we observe that short interarrival times between such extremes are much more frequent than expected under the null hypothesis of i.i.d. returns. A 2T-POT Hawkes approach is of particular interest here, because extreme gains and losses, while highly correlated, tend to be described by asymmetric distributions [33–36]. Moreover, heuristics such as negativity bias and loss aversion (i.e. the tendency for human agents to prefer avoiding losses to acquiring equivalent gains) are well established within behavioural economics [37, 38], and these could be expected to have effects at the group level, including asymmetric excitation of extremes. Indeed, our model suggests that losses contribute significantly more (by a factor of two) than gains to the conditional intensity. However, their importance as a function of time decays more rapidly.

FIG. 1. Arrival processes of extreme S&P 500 daily log-returns. Left-tail (right-tail) extremes are defined as daily log-returns less (greater) than the 2.5% (97.5%) sample quantile in the training period, 1959-10-02 to 2008-09-01. Left panel: left- (orange) and right-tail (blue) exceedance count against time; the non-vertical grey and black lines show the 95% and 99% confidence intervals for the null hypothesis of i.i.d. log-returns; the vertical black line marks the end of the training period. Right panel: histogram of the interarrival times for exceedances from the left-tail (top, orange), right-tail (middle, blue), and both tails (green, bottom); the solid lines show the expected exponential distribution under the assumption of i.i.d. log-returns (td = trading days).

We construct our Hawkes models in Section II. In Section III, we apply them to the daily log-returns of the S&P 500 index between 1959-10-02 and 2020-11-20; their performance is then evaluated through likelihood-based inference and residual analysis.
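As a rough illustration of the null hypothesis behind Fig. 1, the sketch below extracts two-tailed threshold exceedances from a synthetic i.i.d. series and compares the mean interarrival time with the 40 trading days implied by an exceedance probability of 0.025 per day. The function names, the linear-interpolation quantile rule, and the Gaussian stand-in data are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def sample_quantile(values, q):
    """Sample quantile by linear interpolation (an assumed convention)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def exceedance_times(returns, u_left, u_right):
    """Indices (in trading days) of left- and right-tail threshold exceedances."""
    left = [k for k, x in enumerate(returns) if x < u_left]
    right = [k for k, x in enumerate(returns) if x > u_right]
    return left, right


def interarrivals(times):
    """Gaps between consecutive event times."""
    return [b - a for a, b in zip(times, times[1:])]


random.seed(0)
# Synthetic i.i.d. log-returns: a stand-in for real data, not the S&P 500 series.
returns = [random.gauss(0.0, 0.01) for _ in range(20000)]

u_left = sample_quantile(returns, 0.025)
u_right = sample_quantile(returns, 0.975)
left, right = exceedance_times(returns, u_left, u_right)

# Under i.i.d. returns each tail is hit with probability 0.025 per day,
# so the expected interarrival time is 1 / 0.025 = 40 trading days.
expected_gap = 1.0 / 0.025
gaps_left = interarrivals(left)
mean_gap_left = sum(gaps_left) / len(gaps_left)
print(len(left), len(right), round(mean_gap_left, 2), expected_gap)
```

For clustered data such as real log-returns, the same calculation would instead show an excess of short gaps relative to the exponential expectation, which is the feature the Hawkes models are built to capture.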

II. HAWKES MODELS FOR TWO-TAILED THRESHOLD EXCEEDANCES

Starting from the discrete time series $\{x_k\}$, where $k$ indexes the data points, we extract two sets of extreme events, $\{m_{k_\leftarrow}\} = \{x_k - u_\leftarrow < 0\}$ and $\{m_{k_\rightarrow}\} = \{x_k - u_\rightarrow > 0\}$, where $u_\leftarrow$ and $u_\rightarrow$ are the thresholds for the left- and right-tails of the data distribution, respectively. Note that here we use the subscripts $\leftarrow$ and $\rightarrow$ to denote the left- and right-tail, respectively, and the subscript $\leftrightarrows$ is used to represent either tail (i.e. either $\leftarrow$ or $\rightarrow$) in generic expressions. We thus extract two point processes $N_{\leftrightarrows}(t)$, wherein events are fully described by their arrival time $t_{k_\leftrightarrows}$ and excess magnitude $m_{k_\leftrightarrows}$, such that

$$dN_{\leftrightarrows}(t) = \sum_{k_\leftrightarrows} \delta(t - t_{k_\leftrightarrows}), \tag{1}$$

where $\delta(t')$ is the Dirac delta function. The arrival rate of events within either point process is the conditional intensity for that process,

$$\lambda_{\leftrightarrows}(t \mid \mathcal{I}_t) = \mathbb{E}\left[ \frac{dN_{\leftrightarrows}(t)}{dt} \,\middle|\, \mathcal{I}_t \right], \tag{2}$$

where $\mathbb{E}[\cdot]$ is the expectation operator. The explicit time-dependence of $\lambda_{\leftrightarrows}(t \mid \mathcal{I}_t)$ specifies $N_{\leftrightarrows}(t)$ to be inhomogeneous point processes; Hawkes-type behaviour is specified by the conditional dependence on the event history up to the present time $t$, $\mathcal{I}_t = \{(t_{k_\leftrightarrows}, m_{k_\leftrightarrows}) : t_{k_\leftrightarrows} < t\}$.

A. Bivariate 2T-POT Hawkes model

We start our discussion from the most general description: a bivariate Hawkes model in which the left- and right-tail exceedances are treated as distinct point processes. Note that this is incompatible with the fact that the arrivals of left- and right-tail events within the original time series are mutually exclusive. We remedy this in Section II B and here only point out that the bivariate description becomes an increasingly valid approximation as $\lambda_{\leftrightarrows} \to 0$. With this in mind, we write the bivariate model,

$$\begin{pmatrix} \lambda_\leftarrow \\ \lambda_\rightarrow \end{pmatrix} = \begin{pmatrix} \mu_\leftarrow \\ \mu_\rightarrow \end{pmatrix} + \begin{pmatrix} \gamma_{\leftarrow\leftarrow} & \gamma_{\leftarrow\rightarrow} \\ \gamma_{\rightarrow\leftarrow} & \gamma_{\rightarrow\rightarrow} \end{pmatrix} \begin{pmatrix} \chi_\leftarrow \\ \chi_\rightarrow \end{pmatrix}, \tag{3}$$

or, in vector notation,

$$\boldsymbol{\lambda}(t \mid \boldsymbol{\theta}_{\mathrm{bi}}; \mathcal{I}_t) = \boldsymbol{\mu} + \boldsymbol{\Gamma} \boldsymbol{\chi}(t \mid \boldsymbol{\theta}_{\mathrm{bi}}; \mathcal{I}_t), \tag{4}$$

where $\boldsymbol{\theta}_{\mathrm{bi}}$ is the parameter vector for the bivariate model, $\boldsymbol{\mu} \equiv (\mu_\leftarrow, \mu_\rightarrow)^T$ are the stationary exogenous background intensities for each process, $\boldsymbol{\Gamma}$ is the $2 \times 2$ excitation matrix, and $\boldsymbol{\chi}$ are the endogenous excitements generated by the arrivals of events in each respective process. The response of this model, as parametrized below, to a sample activity cluster is shown in Fig. 2.

FIG. 2. Endogenous excitements $\chi_{\leftrightarrows}$ (top panel) and conditional intensities $\lambda_{\leftrightarrows}$ (bottom panel) for left- (orange) and right- (blue) tail threshold exceedances under the bivariate 2T-POT Hawkes model $\boldsymbol{\theta}_{\mathrm{bi}}$ during a sample activity cluster within the historic S&P 500 daily log-returns (middle panel, with threshold excesses marked in bolder shading). (td = trading days).

The endogenous excitements $\boldsymbol{\chi}$ are the sums of contributions from all past events within each process,

$$\chi_{\leftrightarrows}(t \mid \boldsymbol{\theta}; \mathcal{I}_t) = \sum_{k_\leftrightarrows :\, t_{k_\leftrightarrows} < t} [\ldots]$$

[...] $\kappa_{\leftrightarrows}(m_{k_\leftrightarrows}) \to$ [...] $|m_{k_\leftrightarrows}| \to$ [...] $\alpha_{\leftrightarrows} = 0$, $\kappa_{\leftrightarrows}$ becomes unity and we recover an unmarked Hawkes process in which $\chi_{\leftrightarrows}$ is independent of the magnitudes of past events. Also, note that $\mathbb{E}[\kappa_{\leftrightarrows}(m)] \equiv$ [...] $\alpha_{\leftrightarrows}$.

The excess magnitudes are assumed to be described by a conditional generalized Pareto distribution (GPD). This choice is motivated by the Pickands-Balkema-de Haan theorem [39, 40], which states that the GPD is the limiting distribution for linearly rescaled threshold excesses within a series of i.i.d. random variables.
(The GPD has become the classical asymptotically motivated distribution for threshold excesses within extreme value analysis for this reason [5].) Moreover, since the GPD is specified with a shape parameter $\xi_{\leftrightarrows}$, it can describe a range of tail heaviness from exponential decay ($\xi_{\leftrightarrows} = 0$) to increasingly leptokurtic power-law decay ($\xi_{\leftrightarrows} > 0$). The distribution function of the excess magnitudes is

$$F_{\leftrightarrows}(m \mid t) = \begin{cases} 1 - \left[ 1 \mp \dfrac{\xi_{\leftrightarrows} m}{\sigma_{\leftrightarrows}(t)} \right]^{-1/\xi_{\leftrightarrows}} & \xi_{\leftrightarrows} > 0 \\[2ex] 1 - \exp\left[ \pm m / \sigma_{\leftrightarrows}(t) \right] & \xi_{\leftrightarrows} = 0, \end{cases} \tag{9}$$

where conditional dependence on the excess (i.e. non-background) intensity of the Hawkes process is introduced via the conditional scale parameter

$$\sigma_{\leftrightarrows}(t) = \phi_{\leftrightarrows} + \eta_{\leftrightarrows} \left[ \lambda_{\leftrightarrows}(t) - \mu_{\leftrightarrows} \right]. \tag{10}$$

Thus, when $\eta_{\leftrightarrows} > 0$, larger magnitude events become more likely in high activity clusters, as is generally observed in price data [34]. Conversely, when $\eta_{\leftrightarrows} = 0$ the excess magnitudes are drawn from an unconditional GPD with scale parameter $\phi_{\leftrightarrows}$.

The self-exciting dynamics of the Hawkes process can be understood as a branching process, in which daughter events are triggered by the additional endogenous intensity produced by the arrival of prior mother events. The elements of the excitation matrix $\boldsymbol{\Gamma}$ may be interpreted as branching ratios, meaning that $\gamma_{ij}$ is the mean number of daughter events in the process $N_i$ that are triggered by a mother event in the process $N_j$. The process is subcritical (i.e. non-explosive) provided the spectral radius of the excitation matrix, $\rho(\boldsymbol{\Gamma})$, is less than one [41].

Overall, the bivariate model is characterized by a set of parameters, $\boldsymbol{\theta}_{\mathrm{bi}} = \{\boldsymbol{\mu}, \boldsymbol{\Gamma}, \boldsymbol{\beta}, \boldsymbol{\xi}, \boldsymbol{\phi}, \boldsymbol{\eta}, \boldsymbol{\alpha}\}$, where vector quantities are of the form $\boldsymbol{\mu} \equiv (\mu_\leftarrow, \mu_\rightarrow)^T$. Note that the two distinct Hawkes processes describing each tail can be decoupled by applying the constraints $\gamma_{\leftarrow\rightarrow} = 0 = \gamma_{\rightarrow\leftarrow}$, i.e. we recover two independent univariate Hawkes processes between which there is no cross-excitation. This decoupled bivariate 2T-POT Hawkes model is denoted by the parameter vector $\boldsymbol{\theta}^{\mathrm{d}}_{\mathrm{bi}}$.

B. Common intensity 2T-POT Hawkes model
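A minimal sketch of the bivariate intensity of Eq. (4) and of the subcriticality condition on the excitation matrix from Section II A is given below. It assumes an unmarked process (the limit in which the impact function is unity) and a normalised exponential kernel, which is an illustrative choice rather than a form quoted from the text; all parameter values are hypothetical and are not the estimates of Table I.

```python
import math


def excitement(t, event_times, beta):
    """chi(t): sum of normalised exponential kernels over past events.

    Assumes an unmarked process (impact function equal to one) and the
    kernel beta * exp(-beta * dt), both illustrative assumptions.
    """
    return sum(beta * math.exp(-beta * (t - tk)) for tk in event_times if tk < t)


def bivariate_intensity(t, mu, gamma, beta, events):
    """lambda(t) = mu + Gamma chi(t) for the (left, right) pair of processes."""
    chi = [excitement(t, events[j], beta[j]) for j in range(2)]
    return [mu[i] + sum(gamma[i][j] * chi[j] for j in range(2)) for i in range(2)]


def spectral_radius_2x2(g):
    """Largest absolute eigenvalue of a real 2x2 matrix."""
    tr = g[0][0] + g[1][1]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    disc = tr * tr - 4.0 * det
    if disc >= 0.0:
        root = math.sqrt(disc)
        return max(abs((tr + root) / 2.0), abs((tr - root) / 2.0))
    # Complex-conjugate eigenvalues: their common modulus is sqrt(det).
    return math.sqrt(det)


# Hypothetical parameters: index 0 is the left tail, index 1 the right tail.
mu = [0.01, 0.01]
gamma = [[0.5, 0.2], [0.5, 0.2]]   # gamma[i][j]: daughters in i per mother in j
beta = [0.07, 0.015]               # decay constants of left- and right-tail events
events = [[0.0], []]               # one left-tail event at t = 0, no right-tail events

lam = bivariate_intensity(1.0, mu, gamma, beta, events)
rho = spectral_radius_2x2(gamma)
is_subcritical = rho < 1.0
print(lam, rho, is_subcritical)
```

With identical rows in the excitation matrix, as chosen here, both tails receive the same intensity, which is the situation the common intensity model of Section II B makes explicit.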

As noted in the beginning of Section II A, the bivariate model assumes that the arrivals of left- and right-tail exceedances form two distinct point processes. This, however, does not guarantee that these two types of events are mutually exclusive. In order to enforce this requirement, we assume that both sets of exceedance arrivals constitute the events of a single, common point process, $N_\leftrightarrow$, whose arrival rate is given by the one-dimensional common conditional intensity, $\lambda_\leftrightarrow$.

To develop a general common intensity model that allows for asymmetric cross-excitation between asymmetric tails, we modify Eq. (4) by reducing the excitation matrix $\boldsymbol{\Gamma}$ to the excitation vector $\boldsymbol{\gamma}^T_\leftrightarrow \equiv \langle 1 | \boldsymbol{\Gamma} \equiv (\gamma_{\leftrightarrow\leftarrow}, \gamma_{\leftrightarrow\rightarrow})$ and by reducing the background intensity to a scalar, $\mu_\leftrightarrow \equiv \langle 1 | \boldsymbol{\mu} \rangle \equiv \mu_\leftarrow + \mu_\rightarrow$. Thus,

$$\lambda_\leftrightarrow(t \mid \boldsymbol{\theta}_{\mathrm{ci}}; \mathcal{I}_t) = \mu_\leftrightarrow + \boldsymbol{\gamma}^T_\leftrightarrow \boldsymbol{\chi}(t \mid \boldsymbol{\theta}_{\mathrm{ci}}; \mathcal{I}_t), \tag{11}$$

where now $\boldsymbol{\theta}_{\mathrm{ci}} = \{\mu_\leftrightarrow, \boldsymbol{\gamma}_\leftrightarrow, \boldsymbol{\beta}, \boldsymbol{\xi}, \boldsymbol{\phi}, \boldsymbol{\eta}, \boldsymbol{\alpha}, w_\leftrightarrow\}$ is the parameter vector for the common intensity model.

Each event is then stochastically drawn from either tail upon arrival just as the excess magnitude is also randomly sampled. This can be realized as the excess magnitude being drawn from a probability distribution that is a weighted piecewise union of the left- and right-tail distributions, i.e. from a probability density function of the form

$$f_\leftrightarrow(m) = \begin{cases} S(-w_\leftrightarrow) \cdot f_\leftarrow(m) & m < 0 \\ S(+w_\leftrightarrow) \cdot f_\rightarrow(m) & m > 0, \end{cases} \tag{12}$$

where $f_{\leftrightarrows}$ are the probability density functions for the left- and right-tail excess magnitude distributions and the weighting of probability between the two tails is determined by the logistic function,

$$S(w_\leftrightarrow) = 1 / (1 + e^{-w_\leftrightarrow}), \tag{13}$$

with the tail-weight asymmetry parameter $w_\leftrightarrow$, such that the relative frequency of left- to right-tail events is $\mathbb{E}[N_\leftarrow / N_\rightarrow] = \exp(-w_\leftrightarrow)$.

Note that, if $w_\leftrightarrow = 0$ and all parameters in $\boldsymbol{\theta}_{\mathrm{ci}}$ are constrained to be symmetric (i.e. so that the left- and right-tail components of all vector parameters are equal), then the common intensity 2T-POT model is equivalent to a single-tail POT Hawkes model applied to the absolute values of a copy of the original time series that is centred on the mid-point between the thresholds, $x^*_k = x_k - (u_\rightarrow + u_\leftarrow)/2$. That is, the set of absolute exceedances $\{|m_{k_\leftrightarrow}|\} = \{|x^*_k| - u_\leftrightarrow > 0\}$, where $u_\leftrightarrow = (u_\rightarrow - u_\leftarrow)/2$, is a union of $\{|m_{k_\leftarrow}|\}$ and $\{|m_{k_\rightarrow}|\}$, and a univariate Hawkes model applied to this exceedance series describes equal self- and cross-excitation between left- and right-tails that are symmetric in all properties. This symmetric common intensity 2T-POT model is denoted by the parameter vector $\boldsymbol{\theta}^{\mathrm{s}}_{\mathrm{ci}}$.

III. APPLICATION TO S&P 500 DAILY LOG-RETURNS
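The tail-weighting of Eqs. (12) and (13) in Section II B can be checked numerically with a short sketch. The helper names below are hypothetical; the code only verifies that the two logistic weights sum to one and that their ratio equals exp(−w), and then draws tail labels with those probabilities.

```python
import math
import random


def logistic(w):
    """S(w) = 1 / (1 + exp(-w)), as in Eq. (13)."""
    return 1.0 / (1.0 + math.exp(-w))


def tail_probabilities(w):
    """Probabilities (left, right) = (S(-w), S(+w)) that an event falls in each tail."""
    return logistic(-w), logistic(w)


def draw_tails(w, n, rng):
    """Assign n arrivals of the common process to a tail at random."""
    p_left, _ = tail_probabilities(w)
    return ["left" if rng.random() < p_left else "right" for _ in range(n)]


w = 0.5  # hypothetical asymmetry; the paper's thresholds give w = 0 in training
p_left, p_right = tail_probabilities(w)
labels = draw_tails(w, 20000, random.Random(1))
left_fraction = labels.count("left") / len(labels)

print(p_left + p_right)                  # weights form a probability distribution
print(p_left / p_right, math.exp(-w))    # relative frequency of left to right events
print(left_fraction)
```

Setting `w = 0.0` gives equal weights of one half, which is the case enforced in the application by choosing a symmetric pair of sample quantiles as thresholds.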

To demonstrate the utility of the two-tailed extension to the classic POT Hawkes model, we apply the 2T-POT Hawkes models developed in Section II to the daily log-returns of the S&P 500 equity index between 1959-10-02 and 2020-11-20. The data is partitioned into a training period and an out-of-sample forecast period, with the former ending (and the latter beginning) on 2008-09-01. The data was sourced from Yahoo Finance [42].

The thresholds $u_\leftarrow$ and $u_\rightarrow$ are set respectively as the 2.5% and 97.5% sample quantiles in the training period (values listed in Table I). By defining them as a symmetric pair of sample quantiles, we guarantee an equal number (308) of left- and right-tail exceedances within the training period, and hence $w_\leftrightarrow = 0$ for the common intensity models. Hereafter, we refer to exceedances of these two thresholds as extreme losses and extreme gains.

The parameters of each model are estimated from the training period data through the maximum likelihood (ML) procedure detailed in Appendix A. The parameter estimates are listed in Table I for the bivariate models and in Table II for the common intensity models.

TABLE I. L-BFGS-B parameter estimates (± standard errors) for the (de)coupled bivariate 2T-POT Hawkes model trained on the extreme losses and gains of the S&P 500 daily log-returns from 1959-10-02 to 2008-09-01 (td = trading days).
Columns: $\hat{\boldsymbol{\theta}}_{\mathrm{bi}}$ ($\leftarrow$ a, $\rightarrow$ b) and $\hat{\boldsymbol{\theta}}^{\mathrm{d}}_{\mathrm{bi}}$ ($\leftarrow$, $\rightarrow$).
Rows: $\mu$ / td$^{-1}$, $\gamma_\leftarrow$, $\gamma_\rightarrow$, $\beta$ / td$^{-1}$, $\xi$, $\phi$, $\eta$, $\alpha$.
a Left-tail: $x_k < u_\leftarrow = -$[...]. b Right-tail: $x_k > u_\rightarrow = +$[...].

TABLE II. L-BFGS-B parameter estimates (± standard errors) for the (a)symmetric common intensity 2T-POT Hawkes model trained on the extreme losses and gains of the S&P 500 daily log-returns from 1959-10-02 to 2008-09-01 (td = trading days).
Columns: $\hat{\boldsymbol{\theta}}_{\mathrm{ci}}$ ($\leftarrow$, $\rightarrow$) and $\hat{\boldsymbol{\theta}}^{\mathrm{s}}_{\mathrm{ci}}$ ($\leftrightarrow$ c).
Rows: $\mu_\leftrightarrow$ / td$^{-1}$, $\gamma_\leftrightarrow$, $\beta$ / td$^{-1}$, $\xi$, $\phi$, $\eta$, $\alpha$.
c Common-tail: $|x_k - (u_\rightarrow + u_\leftarrow)/2| > u_\leftrightarrow = (u_\rightarrow - u_\leftarrow)/2 =$ [...].

A. Likelihood-based inference

The general bivariate and common intensity 2T-POT Hawkes models ($\boldsymbol{\theta}_{\mathrm{bi}}$ and $\boldsymbol{\theta}_{\mathrm{ci}}$) are novel descriptions of asymmetric cross-excitement between asymmetric left- and right-tail extreme events; their constrained forms ($\boldsymbol{\theta}^{\mathrm{d}}_{\mathrm{bi}}$ and $\boldsymbol{\theta}^{\mathrm{s}}_{\mathrm{ci}}$) are equivalent to single-tail models that have been applied to financial returns in previous literature [24, 25]. We use likelihood-based inference to measure and compare the goodness of fit of each model to the data sample $\mathcal{I}_t$, and so determine which best describes the underlying data generating process of $\mathcal{I}_t$.

The goodness of fit of the model $\boldsymbol{\theta}$ to $\mathcal{I}_t$ is measured by the log-likelihood function $\ell(\boldsymbol{\theta} \mid \mathcal{I}_t)$ (defined as Eq. (A1) in Appendix A), with higher values of $\ell$ indicating a better fit. The log-likelihood is often quoted as the deviance $-2\ell$, for which lower values are optimal. While the deviance can itself be used to compare the fitness, it is more common to use Akaike's information criterion

$$\mathrm{AIC}(\boldsymbol{\theta} \mid \mathcal{I}_t) = 2 \dim(\boldsymbol{\theta}) - 2\ell(\boldsymbol{\theta} \mid \mathcal{I}_t), \tag{14}$$

which approximates the expected deviance of a hypothetical new sample that is independent of $\mathcal{I}_t$, and, in doing so, penalizes redundant complexity [43]. An alternative penalized deviance is Bayes's information criterion [43]

$$\mathrm{BIC}(\boldsymbol{\theta} \mid \mathcal{I}_t) = \dim(\boldsymbol{\theta}) \log[\dim(\mathcal{I}_t)] - 2\ell(\boldsymbol{\theta} \mid \mathcal{I}_t). \tag{15}$$

Table III lists the penalized deviance scores for all models in both the training and forecasting periods. A clear hierarchy of fitness emerges from these scores. The decoupled model with no cross-excitation, $\boldsymbol{\theta}^{\mathrm{d}}_{\mathrm{bi}}$, yields the worst fit, followed by the symmetric common intensity model $\boldsymbol{\theta}^{\mathrm{s}}_{\mathrm{ci}}$. The novel 2T-POT models with asymmetric interactions provide the best fit, with comparatively little difference between the two: $\boldsymbol{\theta}_{\mathrm{ci}}$ is preferred to $\boldsymbol{\theta}_{\mathrm{bi}}$ by AIC and BIC in the training period, but the opposite is true in the forecasting period.

The relative fitness of pairs of models is compared directly through the likelihood ratio test [33].
Specifically, for a given pair of models, $\boldsymbol{\theta}_0$ and $\boldsymbol{\theta}_1$, where $\dim(\boldsymbol{\theta}_0) < \dim(\boldsymbol{\theta}_1)$, the null hypothesis, $H_0$: $\ell(\boldsymbol{\theta}_0) = \ell(\boldsymbol{\theta}_1)$, is tested against the alternative, $H_1$: $\ell(\boldsymbol{\theta}_0) < \ell(\boldsymbol{\theta}_1)$. $H_0$ is rejected when the higher-dimensional (i.e. more complex) model yields a significantly better fit to the data sample $\mathcal{I}_t$. Since the contributions to the log-likelihood $\ell$ from left- and right-tail events are independent, the likelihood ratio test can be used to compare the relative fitness to each process, $N_\leftarrow$, $N_\rightarrow$, and $N_\leftrightarrow$, separately. Table IV lists the $p$-values for the likelihood ratio test applied to model pairs with respect to all three processes.

TABLE III. Deviance and penalized deviance scores of all 2T-POT Hawkes models against the extreme losses and gains of the S&P 500 daily log-returns in both the training and forecasting periods.

Train (1959-10-02 to 2008-09-01)
Deviance score                  θ̂_bi     θ̂^d_bi    θ̂_ci     θ̂^s_ci
−2ℓ(θ̂)                          46.83    250.30    48.76    138.85
AIC(θ̂)                          78.83    278.30    74.76    152.85
BIC(θ̂)                          160.69   349.93    141.27   188.66

Forecast (2008-09-01 to 2020-11-20)
Deviance score                  θ̂_bi     θ̂^d_bi    θ̂_ci     θ̂^s_ci
−2ℓ(θ̂)                          -0.87    116.51    1.53     31.13
AIC(θ̂)                          -0.87    116.51    1.53     31.13
BIC(θ̂)                          5.41     122.00    6.63     33.88

TABLE IV. Likelihood ratio test $p$-values for the 2T-POT Hawkes models during the training and forecasting periods. $H_0$: $\ell(\boldsymbol{\theta}_0) = \ell(\boldsymbol{\theta}_1)$. $H_1$: $\ell(\boldsymbol{\theta}_0) < \ell(\boldsymbol{\theta}_1)$. Rejections of $H_0$ at the 95% confidence level are highlighted in bold.
Columns: process, $\boldsymbol{\theta}_0$, $\boldsymbol{\theta}_1$, $p_{\mathrm{LR}}$ (Train, Forecast).
Rows, for each of the processes $N_\leftarrow$, $N_\rightarrow$, and $N_\leftrightarrow$: ($\hat{\boldsymbol{\theta}}^{\mathrm{s}}_{\mathrm{ci}}$, $\hat{\boldsymbol{\theta}}_{\mathrm{ci}}$), ($\hat{\boldsymbol{\theta}}^{\mathrm{d}}_{\mathrm{bi}}$, $\hat{\boldsymbol{\theta}}_{\mathrm{ci}}$), ($\hat{\boldsymbol{\theta}}_{\mathrm{ci}}$, $\hat{\boldsymbol{\theta}}_{\mathrm{bi}}$).
The results for $N_\leftrightarrow$, which correspond to the scores quoted in Table III, confirm that the classical POT Hawkes models are rejected in favour of the 2T-POT models with asymmetric interactions at the 95% confidence level. The results for $N_\leftarrow$ and $N_\rightarrow$ show that this is primarily because the latter provide a significantly better fit to the right-tail exceedance events, supporting the finding that the excitement of $\lambda_\rightarrow$ is mostly influenced by the history of left-tail events. Notably, $\boldsymbol{\theta}_{\mathrm{ci}}$ is never rejected in favour of $\boldsymbol{\theta}_{\mathrm{bi}}$. By directly examining the parameter estimates in Tables I and II, we find that the estimated parameters for $\boldsymbol{\theta}_{\mathrm{bi}}$ are effectively equivalent to those for $\boldsymbol{\theta}_{\mathrm{ci}}$ (i.e. $\gamma_{\leftarrow\leftarrow} \approx \gamma_{\rightarrow\leftarrow} \approx \gamma_{\leftrightarrow\leftarrow}/2$ and $\gamma_{\leftarrow\rightarrow} \approx \gamma_{\rightarrow\rightarrow} \approx \gamma_{\leftrightarrow\rightarrow}/2$), so that $\lambda_\leftarrow \approx \lambda_\rightarrow$. We therefore infer that the arrival of extreme losses and gains is governed by a common conditional intensity, and that this intensity is best approximated by $\lambda_\leftrightarrow(t \mid \boldsymbol{\theta}_{\mathrm{ci}}; \mathcal{I}_t)$.

Having concluded that the common intensity model best describes the data $\mathcal{I}_t$, we examine the values of its estimated parameters $\hat{\boldsymbol{\theta}}_{\mathrm{ci}}$ when fitted to this data, as listed in Table II. We observe significant asymmetries in the values estimated for the two tails. Firstly, there is an asymmetry in the excitation vector $\boldsymbol{\gamma}_\leftrightarrow$, such that $\gamma_{\leftrightarrow\leftarrow} / \gamma_{\leftrightarrow\rightarrow} = 2.$[...] $\pm$ [...]$.5$. This means that, on average, extreme losses trigger more than twice as many daughter events (from either tail) as extreme gains. At the same time, the ratio between the decay constants, $\beta_\leftarrow / \beta_\rightarrow = 4.$[...] $\pm$ [...]$.2$, means that the excitation from losses decays significantly faster, and, therefore, that this excitement is more concentrated in time to the immediate aftermath of the mother event's arrival. We speculate that these features arise due to the characteristic structure of high activity clusters within the training data. These regimes, which correspond to acknowledged bear markets, begin with a burst of activity in which extreme losses are especially frequent; extreme gains then become more frequent towards the end of the cluster. Consequently, in Hawkes models without other non-constant intensity sources, the two types of exceedances occupy different roles: losses drive up the intensity at the start of clusters (hence high excitation with short memory), while gains then prolong the excited state (low excitation with long memory).
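The penalized deviances and likelihood ratio comparisons discussed in this subsection can be reproduced from the training-period deviances in Table III with a few lines of code. In the sketch below, the parameter counts (16, 14, 13, 7) are inferred by counting the entries of the parameter vectors in Section II with the tail-weight parameter fixed at zero, and the p-value uses the standard chi-squared approximation for nested models; both are assumptions made here rather than statements from the paper.

```python
import math


def aic(deviance, n_params):
    """Eq. (14) written in terms of the deviance -2*loglik."""
    return 2 * n_params + deviance


def chi2_sf(x, df):
    """Survival function of a chi-squared variable with integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_p(dev_restricted, dev_full, df):
    """p-value of the deviance drop, assuming a chi-squared reference distribution."""
    stat = max(dev_restricted - dev_full, 0.0)
    return chi2_sf(stat, df)


# Training-period deviances from Table III.
train_deviance = {"bi": 46.83, "d_bi": 250.30, "ci": 48.76, "s_ci": 138.85}
# Parameter counts inferred from the parameter vectors (an assumption).
n_params = {"bi": 16, "d_bi": 14, "ci": 13, "s_ci": 7}

aic_scores = {m: aic(train_deviance[m], n_params[m]) for m in train_deviance}
p_sym_vs_ci = likelihood_ratio_p(
    train_deviance["s_ci"], train_deviance["ci"], n_params["ci"] - n_params["s_ci"]
)
p_ci_vs_bi = likelihood_ratio_p(
    train_deviance["ci"], train_deviance["bi"], n_params["bi"] - n_params["ci"]
)
print(aic_scores, p_sym_vs_ci, p_ci_vs_bi)
```

Under these assumptions the computed AIC values agree with the training-period AIC row of Table III, the symmetric model is strongly rejected in favour of the asymmetric common intensity model, and the common intensity model is not rejected in favour of the bivariate one.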

B. Residual analysis

We further assess the performance of the 2T-POT Hawkes models at describing the arrival process through the residual analysis technique developed by Ogata [44]. If the continuous time arrivals of the point process $N_i(t)$ are described by the conditional intensity $\lambda_i(t)$, then, in the residual time

$$\tau_i(t) = \int_0^t \lambda_i(t') \, dt', \tag{16}$$

the residual process $N_i(\tau_i)$ is a homogeneous unit Poisson process and the residual interarrivals $\Delta\tau_{i,k_i} = \tau_i(t_{k_i}) - \tau_i(t_{k_i - 1})$ are therefore i.i.d. unit exponential random variables. If event arrivals instead occur in discrete time with a minimum time-step $\delta t$, then these expected distributions are asymptotic in the limit $\lambda_i \delta t \to 0$.

[...] $\lambda_\leftrightarrow \equiv \langle 1 | \boldsymbol{\lambda} \rangle \equiv \lambda_\leftarrow + \lambda_\rightarrow$, and so a residual time for exceedances from both tails, $\tau_\leftrightarrow$, is trivial to derive from $\boldsymbol{\lambda}$. It is less trivial to derive $\tau_\leftarrow$ and $\tau_\rightarrow$ from the common intensity model, since there is no inverse function to calculate $\lambda_\leftarrow$ and $\lambda_\rightarrow$ from $\lambda_\leftrightarrow$.

FIG. 3. Residual arrival processes of extreme S&P 500 daily log-returns under $\boldsymbol{\theta}_{\mathrm{ci}}$. Left panel: left- (orange) and right-tail (blue) exceedance count against residual time; the non-vertical grey and black lines show the 95% and 99% confidence intervals for the null hypothesis, $H_0$: $\lambda_{\leftrightarrows}(t) = \lambda_{\leftrightarrows}(t \mid \boldsymbol{\theta}_{\mathrm{ci}}; \mathcal{I}_t)$; the vertical black line marks the end of the training period. Right panel: histogram of the residual interarrivals for exceedances from the left-tail (top, orange), right-tail (middle, blue), and both tails (bottom, green); the solid lines show the unit exponential distribution expected under $H_0$.
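The time-rescaling of Eq. (16) can be illustrated on a toy example. The sketch below assumes a constant intensity, so that the integral is simply the rate multiplied by the elapsed time, and checks the residual interarrivals against the unit exponential distribution with a hand-written Kolmogorov-Smirnov distance. The function names and the simulated data are hypothetical; for a fitted Hawkes model, the integrated conditional intensity would replace the constant-rate integral.

```python
import math
import random


def residual_interarrivals(event_times, cumulative_intensity):
    """Differences of the residual time tau(t) evaluated at successive events."""
    taus = [cumulative_intensity(t) for t in event_times]
    return [b - a for a, b in zip(taus, taus[1:])]


def ks_distance_unit_exponential(sample):
    """Kolmogorov-Smirnov distance between a sample and the unit exponential CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = 1.0 - math.exp(-x)
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


rng = random.Random(0)
rate = 0.05  # hypothetical constant intensity, in events per trading day

# Simulate a homogeneous Poisson process as a stand-in for real arrivals.
t, event_times = 0.0, []
for _ in range(2001):
    t += rng.expovariate(rate)
    event_times.append(t)

# For a constant intensity the residual time is tau(t) = rate * t.
delta_tau = residual_interarrivals(event_times, lambda s: rate * s)
mean_delta_tau = sum(delta_tau) / len(delta_tau)
ks_distance = ks_distance_unit_exponential(delta_tau)
print(len(delta_tau), mean_delta_tau, ks_distance)
```

When the assumed intensity matches the process that generated the events, the residual interarrivals have mean close to one and a small Kolmogorov-Smirnov distance; a misspecified intensity would show up as a systematic departure from the unit exponential.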
Instead, the residual interarrivals of these processes arederived from the probability of an event occurring in N ↔ and then being stochastically drawn from either tail, withrelative frequency E [ N ← /N → ] = exp ( − w ↔ ):∆ τ (cid:11) ,k (cid:11) = ∆ τ ↔ ,k (cid:11) ∓ w ↔

2+ log (cid:20) ± sinh w ↔ (cid:114) sinh w ↔ e − ∆ τ ↔ ,k (cid:11) (cid:21) . (17)If w ↔ = 0, Eq. (17) reduces to ∆ τ (cid:11) ,k (cid:11) = ∆ τ ↔ ,k (cid:11) / θ ci are shown in Fig. 3; this is also representativeof the equivalent residual processes under θ bi , due to theequivalences of the estimated parameters, as discussed inSection III A. In contrast to Fig. 1, we observe that theresidual processes are approximately unit Poisson, and,therefore, that the true conditional intensities are wellapproximated by λ ( t | θ ci ; I t ). A minor, but notable ex-ception is seen in the bottom right panel of Fig. 3, wherethere is a decline in the observed frequency relative to ex-pectation in the limit ∆ τ ↔ →

0, i.e. there are fewer thanexpected instances of exceedance events arriving almostsimultaneously in N ↔ ( τ ↔ ). This is a ﬁnite-size eﬀectthat arises because the arrivals occur in discrete time:since E [ λ ↔ /λ ← ] = E [ λ ↔ /λ → ] = 2, λ ↔ is always furtherfrom the asymptotic limit in which the approximation ofcontinuous time arrivals is valid; this suppresses severeover-forecasting, because the interarrival times cannot beless than discrete time step δt .We note that there are persistent inequalities in thearrival frequencies of losses versus gains in the residualtime. This is especially pronounced in the forecasting pe-riod, i.e. for data outside of the sample whose symmetric TABLE V. Kolmogorov-Smirnov test p -values for the residualprocesses of exceedance arrivals from either and both tails un-der the 2T-POT Hawkes models during the training and fore-casting periods. H : λ i ( t ) = λ i ( t | θ ). H : λ i ( t ) (cid:54) = λ i ( t | θ ).Rejections of H at the 95% conﬁdence level are highlightedin bold.Model p KS Train Forecast λ ← λ → λ ↔ λ ← λ → λ ↔ θ bi .

121 0 .

955 0 .

440 0 .

204 0 . . .

016 0 . θ d bi .

192 0 .

193 0 .

076 0 .

442 0 .

379 0 . θ ci .

218 0 .

853 0 .

452 0 .

274 0 . . .

032 0 . θ s ci .

050 0 .

307 0 .

076 0 . . .

044 0 .

180 0 . quantiles were used to deﬁne the thresholds. While ourmethod of threshold deﬁnition guarantees an equal num-ber of left- and right-tail exceedances within the trainingperiod, we ﬁnd a 168 to 124 split within the forecastingperiod. This could be mitigated in future work throughdynamic threshold values set by the symmetric samplequantiles within a ﬁnite rolling window.We test the null hypothesis that the true conditionalintensity and the conditional intensity approximated bythe Hawkes model are the same, i.e. H : λ i ( t ) = λ i ( t | θ ; I t ), by performing the Kolmogorov-Smirnov (KS)test on the null hypothesis that residual process derivedfrom λ i ( t | θ ; I t ) is unit Poisson. Table V shows the p -values of this test performed for each tail intensity – λ ← , λ → , and λ ↔ – within both the training and forecastingperiods. There are few rejections at the 95% signiﬁcancelevel and there are no such rejections for tests performedon λ ↔ . We note that, with one exception, the KS p -values under θ ci are higher than under θ bi , supporting E m p i r i c a l  ( m ← ) Theoretical (m ⇋ ) E m p i r i c a l  ( m ⇋ ) FIG. 4. Quantile-quantile plots of left- (top, orange) andright- (bottom, blue) tail residual excess magnitudes for theS&P 500 daily log-returns under θ ci . the conclusion in Section III A that the common intensitymodel is the optimal choice for the S&P 500 data set.We complement the residual analysis of the arrivalsprocesses with a residual analysis of the excess magni-tudes. If the excesses { m k (cid:11) } are distributed accordingto the conditional GPD speciﬁed in Eq. (9), then theresidual excess magnitudes, E ( m k (cid:11) ) =  ξ − (cid:11) log (cid:20) ξ (cid:11) m k (cid:11) σ (cid:11) ( t k (cid:11) ) (cid:21) ξ (cid:11) > m k (cid:11) /σ (cid:11) ( t k (cid:11) ) ξ (cid:11) = 0 , (18)are approximately i.i.d. unit exponential random vari-ables. In Fig. 
4, quantile-quantile plots show that, under θ ci , these residuals do not perceptibly deviate from theunit exponential distribution, even at the far extremes.We investigate serial dependence within the resid-ual interarrivals as a signal of systemic under- or over-forecasting. This has implications for the forecastingability of the model in practice, but it could also her-ald additional dynamics within the true underlying ar-rival process that are not captured by the model. Forthis analysis, we transform the residual intervals from anexpected unit Poisson distribution to an expected unitnormal distribution via the operation N (∆ τ i,k i ) = F − F Expon (∆ τ i,k i )= √ − (1 − − ∆ τ i,k i ]) , (19)where erf − is the inverse error function. Thus, if theexceedance event at time t k i is over-forecast (i.e. arrivedsooner than expected) by the model, then N (∆ τ i,k i ) < −0.20.00.2 A C F h −0.20.00.2 A C F h Serial lag, h −0.20.00.2 A C F h  [ Δ τ ← Δ  [ Δ τ → Δ  [ Δ τ ↔ Δ FIG. 5. Correlograms (sample lag- h autocorrelation ACF h )for the transformed residual interarrivals of S&P 500 dailylog-returns under θ ci ; derived from N ← (top, orange), N → (middle, blue), and N ↔ (bottom, green). The horizontal greyand black lines show the 99% and 95% conﬁdence intervalsfor ACF h = 0, respectively. ual intervals under θ ci (and, therefore, under θ bi ). Con-versely, Fig. 6 shows peaks of statistically signiﬁcant lo-calized autocorrelation. For the residual interarrivals of N ↔ , three notable peaks of positive lag-1 autocorrela-tion are observed: these follow the high-activity clusterscorresponding to the 1973-4 stock market crash, BlackMonday (1987), and the Global Financial Crisis (2007-9).We infer this to be a signal of systemic overestimation ofthe conditional intensity (i.e. systemic over-forecasting)in the latter stages of high activity regimes. 
We speculate that this is a consequence of neglecting significant additional sources of non-constant exogenous intensity: without these sources, all excess intensity must be attributed to endogenous self-excitement alone; this leads to an overestimation of the excitation coefficients, which then works against the relaxation of the conditional intensity at the end of high activity clusters. Rather than being the mechanism by which the system reaches the excited state, self-excitement may be better understood as the mechanism by which the excited state persists, having been initially instigated by a sudden increase in exogenous intensity corresponding to either impactful news or other complex dynamics within the market.

IV. SUMMARY

To summarize, we have developed a two-tailed peaks-over-threshold Hawkes model that captures asymmetric self- and cross-excitation between the left- and right-tail extremes based upon a common conditional intensity. Such a model provides a way to measure and describe self-exciting processes with more than one mutually exclusive but interacting type of extreme behaviour. When compared to its symmetric version as well as a bivariate model in which each tail contributes to either coupled or decoupled distinct point processes, our model, applied to daily log-returns of the S&P 500 index, was found to provide the most parsimonious fit to the data as measured by penalized deviance.

FIG. 6. Rolling-window (length 50) lag-1 autocorrelation for transformed residual interarrivals of extreme S&P 500 daily log-returns under $\theta_{ci}$; derived from $N_\leftarrow$ (top, orange), $N_\rightarrow$ (top, blue) and $N_\leftrightarrow$ (bottom, green). The horizontal grey and black lines show the 99% and 95% confidence intervals for $\mathrm{ACF}_1 = 0$, respectively; the vertical black line marks the end of the training period on 2008-09-01.

By accounting for asymmetric interactions between the tails, our model finds that, for the S&P 500 daily log-returns, extreme losses trigger on average more than twice as many daughter events as do extreme gains. The excitation from losses is also found to decay more than four times as quickly as that from gains. The greater, more immediate impact of losses is consistent with the greater psychological weight assigned to them by human agents, i.e. this result reflects a negativity bias wherein negative events generally provoke a stronger response than equivalent positive events [30, 31].
We speculate that these asymmetries reflect an underlying structure of the high activity regimes that correspond to financial shocks: losses are more frequent in the initial burst of activity that drives up the conditional intensity at the start of the regime, whereas gains become more frequent towards the end of the regime as the intensity decays towards the background.

Beyond the demonstrated application to financial data, we anticipate possible extensions of our model to other drift-diffusion-like processes in which a clustering of extreme fluctuations is observed.

ACKNOWLEDGMENTS

M.F.T. acknowledges support from EPSRC (UK) Grant No. EP/R513155/1 and CheckRisk LLP.

Appendix A: Maximum likelihood (ML) estimation

The parameters of the 2T-POT Hawkes models are found through maximum likelihood (ML) estimation. The log-likelihood under the parameters $\theta$ over the data $\mathcal{I}_t$ is $\ell(\theta \mid \mathcal{I}_t) = \sum_i -\int^{t} \lambda_i(t' \mid \theta; \mathcal{I}_t)\,dt' + \sum_{k_i : t_{k_i}}$

Extreme Value Theory, Springer Series in Operations Research and Financial Engineering (Springer New York, New York, NY, 2006).
[5] C. Scarrott and A. MacDonald, A review of extreme value threshold estimation and uncertainty quantification, Revstat Stat. J. , 33 (2012).
[6] A. G. Hawkes, Spectra of some self-exciting and mutually exciting point processes, Biometrika , 83 (1971).
[7] A. G. Hawkes, Point Spectra of Some Mutually Exciting Point Processes, J. R. Stat. Soc. Ser. B , 438 (1971).
[8] A. G. Hawkes, Hawkes processes and their applications to finance: a review, Quant. Financ. , 193 (2018).
[9] L. Adamopoulos, Cluster models for earthquakes: Regional comparisons, J. Int. Assoc. Math. Geol. , 463 (1976).
[10] R. Shcherbakov, J. Zhuang, G. Zöller, and Y. Ogata, Forecasting the magnitude of the largest expected earthquake, Nat. Commun. , 4051 (2019).
[11] V. Pernice, B. Staude, S. Cardanobile, and S. Rotter, Recurrent interactions in spiking networks with arbitrary topology, Phys. Rev. E , 31916 (2012).
[12] N. R. Tannenbaum and Y. Burak, Theory of nonstationary Hawkes processes, Phys. Rev. E , 62314 (2017).
[13] M. B. Short, G. O. Mohler, P. J. Brantingham, and G. E. Tita, Gang rivalry dynamics via coupled point process networks, Discret. Contin. Dyn. Syst. - Ser. B , 1459 (2014).
[14] N. Johnson, A. Hitchman, D. Phan, and L. Smith, Self-exciting point process models for political conflict forecasting, Eur. J. Appl. Math. , 685 (2018).
[15] K. Fujita, A. Medvedev, S. Koyama, R. Lambiotte, and S. Shinomoto, Identifying exogenous and endogenous activity in social media, Phys. Rev. E , 52304 (2018).
[16] E. Bacry, I. Mastromatteo, and J.-F. Muzy, Hawkes processes in finance, Mark. Microstruct. Liq. , 1550005 (2015).
[17] A. G. Hawkes, Hawkes jump-diffusions and finance: a brief history and review, Eur. J. Financ. online , 1 (2020).
[18] C. G. Bowsher, Modelling security market events in continuous time: Intensity based, multivariate point process models, J. Econom. , 876 (2007).
[19] E. Bacry, K. Dayri, and J. F. Muzy, Non-parametric kernel estimation for symmetric Hawkes processes. Application to high frequency financial data, Eur. Phys. J. B , 157 (2012).
[20] V. Filimonov and D. Sornette, Quantifying reflexivity in financial markets: Toward a prediction of flash crashes, Phys. Rev. E , 056108 (2012).
[21] S. J. Hardiman, N. Bercot, and J.-P. Bouchaud, Critical reflexivity in financial markets: a Hawkes process analysis, Eur. Phys. J. B (2013).
[22] S. J. Hardiman and J.-P. Bouchaud, Branching-ratio approximation for the self-exciting Hawkes process, Phys. Rev. E , 62807 (2014).
[23] M. Rambaldi, P. Pennesi, and F. Lillo, Modeling foreign exchange market activity around macroeconomic news: Hawkes-process approach, Phys. Rev. E , 012819 (2015).
[24] O. Grothe, V. Korniichuk, and H. Manner, Modeling multivariate extreme events using self-exciting point processes, J. Econom. , 269 (2014).
[25] F. Gresnigt, E. Kole, and P. H. Franses, Interpreting financial market crashes as earthquakes: A new Early Warning System for medium term crashes, J. Bank. Financ. , 123 (2015).
[26] F. Gresnigt, E. Kole, and P. H. Franses, Specification Testing in Hawkes Models, J. Financ. Econom. , 139 (2016).
[27] F. Gresnigt, E. Kole, and P. H. Franses, Exploiting Spillovers to Forecast Crashes, J. Forecast. , 936 (2017).
[28] V. Chavez-Demoulin, A. C. Davison, and A. J. McNeil, Estimating value-at-risk: a point process approach, Quant. Financ. , 227 (2005).
[29] V. Chavez-Demoulin and J. A. McGill, High-frequency financial data modeling using Hawkes processes, J. Bank. Financ. , 3415 (2012).
[30] R. F. Baumeister, E. Bratslavsky, C. Finkenauer, and K. D. Vohs, Bad is Stronger than Good, Rev. Gen. Psychol. , 323 (2001).
[31] P. Rozin and E. B. Royzman, Negativity Bias, Negativity Dominance, and Contagion, Personal. Soc. Psychol. Rev. , 296 (2001).
[32] L. Bachelier, Théorie de la spéculation, Ann. Sci. l’École Norm. supérieure , 21 (1900).
[33] D. Ruppert and D. S. Matteson, Statistics and Data Analysis for Financial Engineering, 2nd ed., Springer Texts in Statistics (Springer New York, New York, NY, 2015).
[34] R. Cont, Empirical properties of asset returns: Stylized facts and statistical issues, Quant. Financ. , 223 (2001).
[35] L. Davies and W. Krämer, Stylized Facts and Simulating Long Range Financial Data 10.17877/DE290R-16489 (2016), arXiv:1612.05229.
[36] R. S. Tsay, Analysis of Financial Time Series, 3rd ed. (Wiley, 2010).
[37] M. Rabin, Psychology and Economics, J. Econ. Lit. , 11 (1998).
[38] D. Kahneman, Maps of bounded rationality: Psychology for behavioral economics, Am. Econ. Rev. , 1449 (2003).
[39] J. Pickands, Statistical Inference Using Extreme Order Statistics, Ann. Stat. , 119 (1975).
[40] A. A. Balkema and L. de Haan, Residual Life Time at Great Age, Ann. Probab. , 792 (1974).
[41] S. Wheatley, A. Wehrli, and D. Sornette, The endo–exo problem in high frequency financial price fluctuations and rejecting criticality, Quant. Financ. , 1165 (2019).
[42] Yahoo Finance (2020).
[43] E. Wit, E. van den Heuvel, and J.-W. Romeijn, ‘All models are wrong...’: an introduction to model uncertainty, Stat. Neerl. , 217 (2012).
[44] Y. Ogata, Statistical Models for Earthquake Occurrences and Residual Analysis for Point Processes, J. Am. Stat. Assoc. , 9 (1988).
[45] R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu, A Limited Memory Algorithm for Bound Constrained Optimization, SIAM J. Sci. Comput. , 1190 (1995).
[46] C. Zhu, R. H. Byrd, P. Lu, and J. Nocedal, Algorithm 778: L-BFGS-B: Fortran Subroutines for Large-Scale Bound-Constrained Optimization, ACM Trans. Math. Softw. , 550 (1997).
[47] M. D. Hoffman and A. Gelman, The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo, J. Mach. Learn. Res. 15