A Bivariate Compound Dynamic Contagion Process for Cyber Insurance

Abstract

As corporates and governments become more digital, they become vulnerable to various forms of cyber attack. Cyber insurance products have been used as risk management tools, yet their pricing does not reflect actual risk, including that of multiple, catastrophic and contagious losses. For the modelling of aggregate losses from cyber events, in this paper we introduce a bivariate compound dynamic contagion process, where the bivariate dynamic contagion process is a point process that includes both externally excited joint jumps, which are distributed according to a shot noise Cox process and two separate self-excited jumps, which are distributed according to the branching structure of a Hawkes process with an exponential fertility rate, respectively. We analyse the theoretical distributional properties for these processes systematically, based on the piecewise deterministic Markov process developed by Davis (1984) and the univariate dynamic contagion process theory developed by Dassios and Zhao (2011). The analytic expression of the Laplace transform of the compound process and its moments are presented, which have the potential to be applicable to a variety of problems in credit, insurance, market and other operational risks. As an application of this process, we provide insurance premium calculations based on its moments. Numerical examples show that this compound process can be used for the modelling of aggregate losses from cyber events. We also provide the simulation algorithm for statistical analysis, further business applications and research.


arXiv [q-fin.RM], Jun

Jiwook Jang

Department of Actuarial Studies & Business Analytics, Macquarie Business School, Macquarie University, Sydney NSW 2109, Australia, E-mail: [email protected]

Rosy Oh

Institute of Mathematical Sciences, Ewha Womans University, Seoul, 03760, Korea, E-mail: [email protected]


Keywords: Aggregate losses from cyber events; Contagion risk; Bivariate compound dynamic contagion process; Hawkes process; Piecewise deterministic Markov process; Martingale methodology; Insurance premium

Due to the digitalisation of business and economic activities via the Internet of Things (IoT), cloud computing, mobile and other innovative technologies, cyber risk is inherent and extreme. Cyber risks refer to any risk of financial loss, disruption to operations, or damage to the reputation of an organisation due to failure of its information technology (IT) systems, as defined by the Institute of Risk Management (IRM). Financial losses from malicious cyber activities result from IT security/data/digital assets recovery, liability in respect of identity theft and data breaches, reputation/brand damage, legal liability, cyber extortion, regulatory defence and penalties coverage and business interruption.

The frequency of malicious cyber activities is rapidly increasing, with the scope and nature dependent on an organisation's industry, size and location. According to a 2016 Allianz survey, cyber risk is the top long-term risk to business and currently a top-three global business risk. It is therefore critical that corporations and governments focus on IT and network security enhancement. Unless public and private sector organisations have effective cyber security plans and strategies in place, and tools to manage and mitigate losses from cyber risks, cyber events have the potential to affect their business significantly, possibly damaging hard-earned reputations irreparably.

Insurance has served to mitigate liability since the 17th century, after the Great Fire of London in 1666. As part of a cyber risk mitigation strategy, cyber insurance can be purchased by organisations to cover economic and financial losses arising from cyber incidents. Since the widespread Y2K concerns raised the profile of the possible security vulnerabilities of digitalisation, the cyber insurance industry has grown to a total annual premium of $2.5 billion, and the market is expected to reach $20 billion by 2025 globally. However, due to the complexity of cyber incidents, i.e. multiple, catastrophic and contagious losses, it is difficult for insurers to price cyber insurance products accurately. Inaccurate pricing could have severe market effects in the event of a significant claim.

To date, however, there has been little theoretical work on developing acceptable cyber insurance pricing models. Owing to the complexity of cyber risks, the previous studies (Mukhopadhyay et al. 2006; Herath and Herath 2011; Xu and Hua 2017) do not provide a suitable framework to measure cyber risks, as they have not accounted for future cyber attacks dynamically. Traditionally, insurance claim modelling has used homogeneous/non-homogeneous Poisson processes as the claim arrival process. For cyber events, however, the assumption that the resulting claims arrive according to a Poisson process is inadequate due to its deterministic intensity. Therefore, an alternative point process needs to be used to predict claim arrivals from cyber incidents.

To this effect, we introduce a bivariate compound dynamic contagion process (BCDCP) for the modelling of aggregate losses from cyber events, where the bivariate dynamic contagion process (BDCP) is a point process which has both externally excited joint jumps, which are distributed according to a shot noise Cox process, and two separate self-excited jumps, which are Hawkes processes. Since Hawkes (1971a, 1971b) and Hawkes and Oakes (1974) introduced a self-exciting point process, applications and modelling of Hawkes processes in finance and insurance can be found in Chavez-Demoulin et al. (2005), McNeil et al. (2005), Bauwens and Hautsch (2009), Bowsher (2007), Errais et al. (2010), Stabile and Torrisi (2010), Embrechts et al. (2011), Giesecke and Kim (2011) and Aït-Sahalia et al. (2014, 2015).

Dassios and Zhao (2011) introduced a dynamic contagion process, which is a generalisation of the externally excited Cox process with shot noise intensity and the self-excited Hawkes process, with an application to credit risk.
Dassios and Zhao (2012) also examined the infinite horizon ruin probability, with its Monte Carlo simulation, using this process as the claim arrival process. Dassios and Zhao (2017a) extended this process with a diffusion component to calculate the default probability and to price defaultable zero-coupon bonds. We have found dynamic contagion processes to be flexible and realistic in modelling claims with contagion. None of the aforementioned papers considers bivariate dynamic contagion models or compound models. In contrast, we extend the process further to quantify aggregate losses from cyber events using a bivariate compound dynamic contagion process, as these are multiple, catastrophic and contagious losses. Biener et al. (2015) emphasised that one of the characteristics of cyber risk is highly interrelated losses, and that models of cyber risk hold a great deal of promise for testing once enough cyber loss data become available.

Bivariate modelling with self-exciting Hawkes processes can be found in Jang and Dassios (2013), who introduced a bivariate shot noise self-exciting process that can be used for the modelling of catastrophic losses. Dong (2014) examined the stationarity of bivariate dynamic contagion processes, including the cross-exciting contagion effect, in his doctoral thesis. Applications and modelling of multivariate Hawkes processes in high-frequency limit order book data can be found in Rombaldi et al. (2017) and Lu and Abergel (2018). Yang et al. (2018) investigated the interactions between market return events and investor sentiment using a multivariate Hawkes process.

Compound modelling with univariate self-exciting Hawkes processes can be found in Dassios and Zhao (2017b), who developed algorithms for a generalised self-exciting point process with CIR-type intensities. Gao et al. (2018) applied the joint Laplace transform of the classical Hawkes process and its compound process to dark pool trading, where bid and ask quotes are not displayed to the public.

This project develops a new model for pricing cyber risk using a BCDCP, which accommodates the interdependence dynamics of IT systems and the frequency and impact of cyber events. Our research offers a new framework to enable insurance companies to price cyber insurance policies while accommodating clustering of losses.

This paper is structured as follows. In Section 2, we provide mathematical definitions of the BCDCP and the BDCP via the stochastic intensity representation adopted from Dassios and Zhao (2011); the algorithm for simulating these processes is given in Section 5. In Section 3, we analyse these processes systematically for their theoretical distributional properties, based on the piecewise deterministic Markov process theory developed by Davis (1984) and the martingale methodology used by Dassios and Jang (2003). The joint moment of the two processes, their covariance and linear correlation are derived in Section 4, where for simplicity we use the stationary distribution of the intensity processes. As an application of this process, we provide cyber insurance premium calculations based on these quantities in Section 5. Section 6 concludes the paper.

In this section, we give a mathematical definition of the BCDCP in Definition 2.2. Before that, we give a mathematical definition of the BDCP in Definition 2.1 via the stochastic intensity representation adopted from Dassios and Zhao (2017). For an alternative definition of this process, we refer the reader to Dassios and Zhao (2011), Jang and Dassios (2013) and Dong (2014), who gave a cluster process representation for the univariate dynamic contagion process, the bivariate shot noise self-exciting process and the bivariate dynamic contagion process, respectively.

Definition 2.1 (Bivariate dynamic contagion process).

The bivariate dynamic contagion process is a point process

$$
\begin{pmatrix} N_t^{(1)} \\ N_t^{(2)} \end{pmatrix}_{t \ge 0}
= \begin{pmatrix} \sum_{j \ge 1} I(T_{2,j} \le t) \\ \sum_{k \ge 1} I(T_{3,k} \le t) \end{pmatrix}, \qquad j = 1, 2, \cdots, \quad k = 1, 2, \cdots,
$$

with the non-negative $\Im_t$-stochastic bivariate intensity process $\begin{pmatrix} \lambda_t^{(1)} \\ \lambda_t^{(2)} \end{pmatrix}$, i.e.

$$
\begin{aligned}
\lambda_t^{(1)} &= a^{(1)} + \left(\lambda_0^{(1)} - a^{(1)}\right) e^{-\delta^{(1)} t}
+ \sum_{i \ge 1} X_i^{(1)} e^{-\delta^{(1)}(t - T_{1,i})} I(T_{1,i} \le t)
+ \sum_{j \ge 1} Y_j e^{-\delta^{(1)}(t - T_{2,j})} I(T_{2,j} \le t), \\
\lambda_t^{(2)} &= a^{(2)} + \left(\lambda_0^{(2)} - a^{(2)}\right) e^{-\delta^{(2)} t}
+ \sum_{i \ge 1} X_i^{(2)} e^{-\delta^{(2)}(t - T_{1,i})} I(T_{1,i} \le t)
+ \sum_{k \ge 1} Z_k e^{-\delta^{(2)}(t - T_{3,k})} I(T_{3,k} \le t),
\end{aligned} \tag{2.1}
$$

where

• $\{\Im_t\}_{t \ge 0}$ is a history of the joint process $\begin{pmatrix} N_t^{(1)} \\ N_t^{(2)} \end{pmatrix}$, with respect to which $\begin{pmatrix} \lambda_t^{(1)} \\ \lambda_t^{(2)} \end{pmatrix}_{t \ge 0}$ is adapted;

• $\lambda_0^{(d)} > 0$ is the initial value of the intensity $\lambda_t^{(d)}$ at time $t = 0$, where $d = 1, 2$;

• $a^{(d)} \ge 0$ is the constant level to which the intensity $\lambda_t^{(d)}$ reverts;

• $\delta^{(d)} > 0$ is the constant rate of exponential decay;

• $\{X_i^{(1)}, X_i^{(2)}\}_{i=1,2,\cdots}$ is a sequence of i.i.d. positive externally-excited joint jumps with distribution $F(x^{(1)}, x^{(2)})$, $x^{(1)} > 0$, $x^{(2)} > 0$, where the margins are $F_{X^{(1)}}$ and $F_{X^{(2)}}$, at the corresponding random times $\{T_{1,i}\}_{i=1,2,\cdots}$ following a Poisson process $M_t$ with constant rate $\rho > 0$, and $I$ is the indicator function;

• $\{Y_j\}_{j=1,2,\cdots}$ is a sequence of i.i.d. positive self-excited jumps with distribution function $G(y)$, $y > 0$, at the corresponding random times $\{T_{2,j}\}_{j=1,2,\cdots}$;

• $\{Z_k\}_{k=1,2,\cdots}$ is another sequence of i.i.d. positive self-excited jumps with distribution function $H(z)$, $z > 0$, at the corresponding random times $\{T_{3,k}\}_{k=1,2,\cdots}$;

• $\{X_i^{(1)}, X_i^{(2)}\}_{i=1,2,\cdots}$, $\{Y_j\}_{j=1,2,\cdots}$, $\{Z_k\}_{k=1,2,\cdots}$, $\{T_{1,i}\}_{i=1,2,\cdots}$, $\{T_{2,j}\}_{j=1,2,\cdots}$ and $\{T_{3,k}\}_{k=1,2,\cdots}$ are assumed to be independent of each other.

The bivariate compound model we consider has the following structure:

$$
L_t^{(1)} = \sum_{j \ge 1} \Xi_j^{(1)} I(T_{2,j} \le t), \qquad
L_t^{(2)} = \sum_{k \ge 1} \Xi_k^{(2)} I(T_{3,k} \le t), \tag{2.2}
$$

where $L_t^{(d)}$ is the total amount of claims/losses arising from risk type $d = 1, 2$ and $N_t^{(d)}$ is the number of points (i.e. claims/losses) up to time $t$. The random variables $\Xi_j^{(1)}$ and $\Xi_k^{(2)}$ denote the individual claim/loss amounts, which we assume to be independent and identically distributed with distribution functions $J$ and $K$, respectively. The intensity processes for $N_t^{(1)}$ and $N_t^{(2)}$ are modelled by jump processes of the form (2.1).

Definition 2.2 (Bivariate compound dynamic contagion process).

The bivariate compound dynamic contagion process is a compound point process

$$
\begin{pmatrix} L_t^{(1)} \\ L_t^{(2)} \end{pmatrix}_{t \ge 0}
= \begin{pmatrix} \sum_{j \ge 1} \Xi_j^{(1)} I(T_{2,j} \le t) \\ \sum_{k \ge 1} \Xi_k^{(2)} I(T_{3,k} \le t) \end{pmatrix}, \qquad j = 1, 2, \cdots, \quad k = 1, 2, \cdots,
$$

with the non-negative $\Im_t$-stochastic bivariate intensity process $\begin{pmatrix} \lambda_t^{(1)} \\ \lambda_t^{(2)} \end{pmatrix}$ of the form (2.1), where

• $\{\Xi_j^{(1)}\}_{j=1,2,\cdots}$ is a sequence of i.i.d. positive individual claim/loss amounts from risk type $d = 1$ with distribution function $J(\xi^{(1)})$, $\xi^{(1)} > 0$, at the corresponding random times $\{T_{2,j}\}_{j=1,2,\cdots}$;

• $\{\Xi_k^{(2)}\}_{k=1,2,\cdots}$ is another sequence of i.i.d. positive individual claim/loss amounts from risk type $d = 2$ with distribution function $K(\xi^{(2)})$, $\xi^{(2)} > 0$, at the corresponding random times $\{T_{3,k}\}_{k=1,2,\cdots}$;

• $\{X_i^{(1)}, X_i^{(2)}\}_{i=1,2,\cdots}$, $\{Y_j\}_{j=1,2,\cdots}$, $\{Z_k\}_{k=1,2,\cdots}$, $\{\Xi_j^{(1)}\}_{j=1,2,\cdots}$, $\{\Xi_k^{(2)}\}_{k=1,2,\cdots}$, $\{T_{1,i}\}_{i=1,2,\cdots}$, $\{T_{2,j}\}_{j=1,2,\cdots}$ and $\{T_{3,k}\}_{k=1,2,\cdots}$ are assumed to be independent of each other.

The joint process $\left\{ \begin{pmatrix} \lambda_t^{(1)} \\ \lambda_t^{(2)} \end{pmatrix}, \begin{pmatrix} N_t^{(1)} \\ N_t^{(2)} \end{pmatrix}, \begin{pmatrix} L_t^{(1)} \\ L_t^{(2)} \end{pmatrix} \right\}_{t \ge 0}$ is a Markov process in the state space $\mathbb{R}_{+}^{2} \times \mathbb{N}_{0}^{2} \times \mathbb{R}_{+0}^{2}$. With the aid of piecewise deterministic Markov process theory and using the results in Davis (1984), the infinitesimal generator of the bivariate compound dynamic contagion process $\left(\lambda_t^{(1)}, N_t^{(1)}, L_t^{(1)}, \lambda_t^{(2)}, N_t^{(2)}, L_t^{(2)}, t\right)$ acting on a function $f\left(\lambda^{(1)}, n^{(1)}, l^{(1)}, \lambda^{(2)}, n^{(2)}, l^{(2)}, t\right)$ within its domain $D(\mathcal{A})$ is given by

$$
\begin{aligned}
\mathcal{A} f\left(\lambda^{(1)}, n^{(1)}, l^{(1)}, \lambda^{(2)}, n^{(2)}, l^{(2)}, t\right)
&= \frac{\partial f}{\partial t}
+ \delta^{(1)}\left(a^{(1)} - \lambda^{(1)}\right) \frac{\partial f}{\partial \lambda^{(1)}}
+ \delta^{(2)}\left(a^{(2)} - \lambda^{(2)}\right) \frac{\partial f}{\partial \lambda^{(2)}} \\
&\quad + \lambda^{(1)} \left[ \int_0^\infty \!\! \int_0^\infty f\left(\lambda^{(1)} + y, n^{(1)} + 1, l^{(1)} + \xi^{(1)}, \lambda^{(2)}, n^{(2)}, l^{(2)}, t\right) dG(y)\, dJ(\xi^{(1)}) - f\left(\lambda^{(1)}, n^{(1)}, l^{(1)}, \lambda^{(2)}, n^{(2)}, l^{(2)}, t\right) \right] \\
&\quad + \lambda^{(2)} \left[ \int_0^\infty \!\! \int_0^\infty f\left(\lambda^{(1)}, n^{(1)}, l^{(1)}, \lambda^{(2)} + z, n^{(2)} + 1, l^{(2)} + \xi^{(2)}, t\right) dH(z)\, dK(\xi^{(2)}) - f\left(\lambda^{(1)}, n^{(1)}, l^{(1)}, \lambda^{(2)}, n^{(2)}, l^{(2)}, t\right) \right] \\
&\quad + \rho \left[ \int_0^\infty \!\! \int_0^\infty f\left(\lambda^{(1)} + x^{(1)}, n^{(1)}, l^{(1)}, \lambda^{(2)} + x^{(2)}, n^{(2)}, l^{(2)}, t\right) dF\left(x^{(1)}, x^{(2)}\right) - f\left(\lambda^{(1)}, n^{(1)}, l^{(1)}, \lambda^{(2)}, n^{(2)}, l^{(2)}, t\right) \right],
\end{aligned} \tag{2.3}
$$

where $D(\mathcal{A})$ is the domain of the generator $\mathcal{A}$ such that $f\left(\lambda^{(1)}, n^{(1)}, l^{(1)}, \lambda^{(2)}, n^{(2)}, l^{(2)}, t\right)$ is differentiable with respect to $\lambda^{(1)}$, $\lambda^{(2)}$ and $t$ for all $\lambda^{(1)}$, $\lambda^{(2)}$ and $t$, and

$$
\begin{aligned}
\left| \int_0^\infty \!\! \int_0^\infty f\left(\cdot, \lambda^{(1)} + y, n^{(1)} + 1, l^{(1)} + \xi^{(1)}, \cdot\right) dG(y)\, dJ(\xi^{(1)}) - f\left(\cdot, \lambda^{(1)}, n^{(1)}, l^{(1)}, \cdot\right) \right| &< \infty, \\
\left| \int_0^\infty \!\! \int_0^\infty f\left(\cdot, \lambda^{(2)} + z, n^{(2)} + 1, l^{(2)} + \xi^{(2)}, \cdot\right) dH(z)\, dK(\xi^{(2)}) - f\left(\cdot, \lambda^{(2)}, n^{(2)}, l^{(2)}, \cdot\right) \right| &< \infty, \\
\left| \int_0^\infty \!\! \int_0^\infty f\left(\cdot, \lambda^{(1)} + x^{(1)}, \lambda^{(2)} + x^{(2)}, \cdot\right) dF\left(x^{(1)}, x^{(2)}\right) - f\left(\cdot, \lambda^{(1)}, \lambda^{(2)}, \cdot\right) \right| &< \infty.
\end{aligned}
$$
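The definitions above translate into a simple event-by-event simulation. The following is a minimal sketch by thinning and is not the simulation algorithm the paper presents in Section 5. It assumes, purely for illustration, that all jump sizes and loss sizes are exponentially distributed and that the two components of the external joint jump are drawn independently (one special case of $F$). The function name `simulate_bcdcp` and its parameters are hypothetical.

```python
import math
import random


def simulate_bcdcp(T, a, delta, lam0, rho, mean_x, mean_y, mean_z, mean_loss, seed=0):
    """Simulate (N_T, L_T, lambda_T) of the bivariate compound process on [0, T].

    Illustrative assumptions (not from the paper): every jump size and loss
    size is exponential, and X^(1), X^(2) are drawn independently.
    """
    rng = random.Random(seed)
    lam = list(lam0)                    # current intensities (lambda^(1), lambda^(2))
    counts = [0, 0]                     # N^(1), N^(2)
    losses = [0.0, 0.0]                 # L^(1), L^(2)
    self_jump_mean = (mean_y, mean_z)   # means of Y_j and Z_k
    t = 0.0
    while True:
        # Between events each intensity moves monotonically towards a^(d),
        # so max(lambda, a) bounds it until the next event.
        bound = rho + max(lam[0], a[0]) + max(lam[1], a[1])
        wait = rng.expovariate(bound)
        reached_horizon = t + wait >= T
        step = (T - t) if reached_horizon else wait
        # Deterministic exponential decay of both intensities, as in (2.1).
        lam = [a[d] + (lam[d] - a[d]) * math.exp(-delta[d] * step) for d in range(2)]
        if reached_horizon:
            break
        t += step
        u = rng.random() * bound
        if u < rho:
            # Externally excited joint jump: both intensities jump, no claim.
            lam[0] += rng.expovariate(1.0 / mean_x[0])
            lam[1] += rng.expovariate(1.0 / mean_x[1])
        elif u < rho + lam[0] + lam[1]:
            # Claim of risk type d: one loss plus a self-excited jump.
            d = 0 if u < rho + lam[0] else 1
            counts[d] += 1
            losses[d] += rng.expovariate(1.0 / mean_loss[d])
            lam[d] += rng.expovariate(1.0 / self_jump_mean[d])
        # Otherwise the candidate point is rejected and nothing changes.
    return counts, losses, lam
```

Repeated calls with different seeds give Monte Carlo samples of $(N_T^{(d)}, L_T^{(d)})$, which could be compared with the transforms derived in Section 3.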

3. Bivariate Compound Dynamic Contagion Process

In this section, we derive the joint Laplace transform of the process $(L_T^{(1)}, L_T^{(2)})$ in Theorem 3.4, for which we start with Theorem 3.1. Theorem 3.1 leads to the key results of the paper, as we also derive the joint probability generating function of the process $(N_T^{(1)}, N_T^{(2)})$ in Theorem 3.2. The joint Laplace transform of the process $(\lambda_T^{(1)}, \lambda_T^{(2)})$ can also be derived using this theorem, as presented in Jang and Dassios (2013).

$(\lambda_t^{(1)}, \lambda_t^{(2)}, N_t^{(1)}, N_t^{(2)}, L_t^{(1)}, L_t^{(2)})$

Theorem 3.1. Considering the constants $0 \le \theta \le 1$, $0 \le \eta \le 1$, $\nu \ge 0$, $\zeta \ge 0$, $\upsilon \ge 0$, $\gamma \ge 0$ and time $0 \le t \le T$, the conditional joint Laplace transform and probability generating function of the process $(\lambda_T^{(1)}, \lambda_T^{(2)})$, the point process $(N_T^{(1)}, N_T^{(2)})$ and the compound point process $(L_T^{(1)}, L_T^{(2)})$ is given by

$$
\begin{aligned}
E\left[ \theta^{N_T^{(1)} - N_t^{(1)}} \eta^{N_T^{(2)} - N_t^{(2)}} e^{-\nu\left(L_T^{(1)} - L_t^{(1)}\right)} e^{-\zeta\left(L_T^{(2)} - L_t^{(2)}\right)} e^{-\upsilon \lambda_T^{(1)}} e^{-\gamma \lambda_T^{(2)}} \,\Big|\, \Im_t \right]
= e^{-B_1(t) \lambda_t^{(1)}} e^{-B_2(t) \lambda_t^{(2)}} e^{-\{C(T) - C(t)\}},
\end{aligned} \tag{3.1}
$$

where $B_1(t)$ and $B_2(t)$ are determined by two non-linear ordinary differential equations (ODEs)

$$
-B_1'(t) + \delta^{(1)} B_1(t) + \theta\, \hat{g}\{B_1(t)\}\, \hat{j}(\nu) - 1 = 0, \tag{3.2}
$$

$$
-B_2'(t) + \delta^{(2)} B_2(t) + \eta\, \hat{h}\{B_2(t)\}\, \hat{k}(\zeta) - 1 = 0, \tag{3.3}
$$

with the boundary conditions $B_1(T) = \upsilon$ and $B_2(T) = \gamma$, respectively, where

$$
\hat{g}(\varepsilon) = \int_0^\infty e^{-\varepsilon y}\, dG(y), \quad
\hat{h}(\varepsilon) = \int_0^\infty e^{-\varepsilon z}\, dH(z), \quad
\hat{j}(\kappa) = \int_0^\infty e^{-\kappa \xi^{(1)}}\, dJ(\xi^{(1)}) \quad \text{and} \quad
\hat{k}(\kappa) = \int_0^\infty e^{-\kappa \xi^{(2)}}\, dK(\xi^{(2)}). \tag{3.4}
$$

$C(t)$ is determined by

$$
C(t) = \rho \int_0^t \left[ 1 - \hat{f}\{B_1(s), B_2(s)\} \right] ds
+ a^{(1)} \delta^{(1)} \int_0^t B_1(s)\, ds
+ a^{(2)} \delta^{(2)} \int_0^t B_2(s)\, ds, \tag{3.5}
$$

where

$$
\hat{f}(\varepsilon, \kappa) = \int_0^\infty \!\! \int_0^\infty e^{-\varepsilon x^{(1)}} e^{-\kappa x^{(2)}}\, dF\left(x^{(1)}, x^{(2)}\right). \tag{3.6}
$$

It is assumed that the Laplace transforms above, i.e. $\hat{g}(\varepsilon)$, $\hat{h}(\varepsilon)$, $\hat{j}(\kappa)$, $\hat{k}(\kappa)$, and the joint Laplace transform $\hat{f}(\varepsilon, \kappa)$ are finite.

Proof.

Consider a function $f\left(\lambda^{(1)}, n^{(1)}, l^{(1)}, \lambda^{(2)}, n^{(2)}, l^{(2)}, t\right)$ with the exponential affine form

$$
f\left(\lambda^{(1)}, n^{(1)}, l^{(1)}, \lambda^{(2)}, n^{(2)}, l^{(2)}, t\right)
= \theta^{n^{(1)}} \eta^{n^{(2)}} e^{-\nu l^{(1)}} e^{-\zeta l^{(2)}} e^{-B_1(t) \lambda^{(1)}} e^{-B_2(t) \lambda^{(2)}} e^{C(t)}.
$$

Setting $\mathcal{A} f = 0$ in (2.3), we have

$$
\begin{aligned}
&-\lambda^{(1)} B_1'(t) - \lambda^{(2)} B_2'(t) + C'(t)
+ \lambda^{(1)} \left[ \theta\, \hat{g}\{B_1(t)\}\, \hat{j}(\nu) - 1 \right]
+ \lambda^{(2)} \left[ \eta\, \hat{h}\{B_2(t)\}\, \hat{k}(\zeta) - 1 \right] \\
&\quad + \delta^{(1)} \left(a^{(1)} - \lambda^{(1)}\right) \{-B_1(t)\}
+ \delta^{(2)} \left(a^{(2)} - \lambda^{(2)}\right) \{-B_2(t)\}
+ \rho \left[ \hat{f}\{B_1(t), B_2(t)\} - 1 \right] = 0.
\end{aligned}
$$

Collecting the terms in $\lambda^{(1)}$ and $\lambda^{(2)}$ gives

$$
\begin{aligned}
&\left[ -B_1'(t) + \delta^{(1)} B_1(t) + \theta\, \hat{g}\{B_1(t)\}\, \hat{j}(\nu) - 1 \right] \lambda^{(1)}
+ \left[ -B_2'(t) + \delta^{(2)} B_2(t) + \eta\, \hat{h}\{B_2(t)\}\, \hat{k}(\zeta) - 1 \right] \lambda^{(2)} \\
&\quad + \left[ C'(t) + \rho \hat{f}\{B_1(t), B_2(t)\} - \rho - \delta^{(1)} a^{(1)} B_1(t) - \delta^{(2)} a^{(2)} B_2(t) \right] = 0,
\end{aligned} \tag{3.7}
$$

where

$$
\begin{aligned}
\hat{g}(\varepsilon)\, \hat{j}(\kappa) &= \int_0^\infty \!\! \int_0^\infty e^{-\varepsilon y} e^{-\kappa \xi^{(1)}}\, dG(y)\, dJ(\xi^{(1)}), \\
\hat{h}(\varepsilon)\, \hat{k}(\kappa) &= \int_0^\infty \!\! \int_0^\infty e^{-\varepsilon z} e^{-\kappa \xi^{(2)}}\, dH(z)\, dK(\xi^{(2)}), \\
\hat{f}(\varepsilon, \kappa) &= \int_0^\infty \!\! \int_0^\infty e^{-\varepsilon x^{(1)}} e^{-\kappa x^{(2)}}\, dF\left(x^{(1)}, x^{(2)}\right).
\end{aligned}
$$

Since this equation holds for any $l^{(1)}$, $l^{(2)}$, $n^{(1)}$, $n^{(2)}$, $\lambda^{(1)}$ and $\lambda^{(2)}$, it is equivalent to solving three separate equations, i.e.

$$
-B_1'(t) + \delta^{(1)} B_1(t) + \theta\, \hat{g}\{B_1(t)\}\, \hat{j}(\nu) - 1 = 0, \tag{3.8.1}
$$

$$
-B_2'(t) + \delta^{(2)} B_2(t) + \eta\, \hat{h}\{B_2(t)\}\, \hat{k}(\zeta) - 1 = 0, \tag{3.8.2}
$$

$$
C'(t) + \rho \hat{f}\{B_1(t), B_2(t)\} - \rho - \delta^{(1)} a^{(1)} B_1(t) - \delta^{(2)} a^{(2)} B_2(t) = 0. \tag{3.8.3}
$$

We have the two ODEs (3.8.1) and (3.8.2) with the boundary conditions $B_1(T) = \upsilon$ and $B_2(T) = \gamma$, respectively. Integrating (3.8.3) with the boundary condition $C(0) = 0$, (3.5) follows.
Since $\theta^{N_t^{(1)}} \eta^{N_t^{(2)}} e^{-\nu L_t^{(1)}} e^{-\zeta L_t^{(2)}} e^{-B_1(t) \lambda_t^{(1)}} e^{-B_2(t) \lambda_t^{(2)}} e^{C(t)}$ is an $\Im_t$-martingale by the property of the infinitesimal generator, we have

$$
\begin{aligned}
&E\left[ \theta^{N_T^{(1)}} \eta^{N_T^{(2)}} e^{-\nu L_T^{(1)}} e^{-\zeta L_T^{(2)}} e^{-B_1(T) \lambda_T^{(1)}} e^{-B_2(T) \lambda_T^{(2)}} e^{C(T)} \,\Big|\, \Im_t \right] \\
&\quad = \theta^{N_t^{(1)}} \eta^{N_t^{(2)}} e^{-\nu L_t^{(1)}} e^{-\zeta L_t^{(2)}} e^{-B_1(t) \lambda_t^{(1)}} e^{-B_2(t) \lambda_t^{(2)}} e^{C(t)}.
\end{aligned} \tag{3.9}
$$

Then, by the boundary conditions $B_1(T) = \upsilon$ and $B_2(T) = \gamma$, (3.1) follows.

$(\lambda_T^{(1)}, \lambda_T^{(2)})$

Based on (3.1), we can easily derive the joint Laplace transform for the process $(\lambda_T^{(1)}, \lambda_T^{(2)})$ by setting $\theta = 1$, $\eta = 1$, $\nu = 0$, $\zeta = 0$. As it has already been presented in Jang and Dassios (2013), we state two propositions adopted from that paper in this section. The functions $\mathcal{G}_{\upsilon,1}^{-1}(T)$ and $\mathcal{H}_{\gamma,1}^{-1}(T)$ in the propositions will become apparent in Theorem 3.3.

Proposition 3.1.

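The conditional joint Laplace transform for the process $\left(\lambda_T^{(1)}, \lambda_T^{(2)}\right)$ given $\lambda_0^{(1)}$ and $\lambda_0^{(2)}$ at time $t = 0$ is given by

$$
\begin{aligned}
E\left[ e^{-\upsilon \lambda_T^{(1)}} e^{-\gamma \lambda_T^{(2)}} \,\Big|\, \lambda_0^{(1)}, \lambda_0^{(2)} \right]
&= \exp\left\{ -\mathcal{G}_{\upsilon,1}^{-1}(T)\, \lambda_0^{(1)} \right\} \exp\left\{ -\mathcal{H}_{\gamma,1}^{-1}(T)\, \lambda_0^{(2)} \right\} \\
&\quad \times \exp\left[ -\rho \int_0^T \left[ 1 - \hat{f}\left\{ \mathcal{G}_{\upsilon,1}^{-1}(\tau), \mathcal{H}_{\gamma,1}^{-1}(\tau) \right\} \right] d\tau \right] \\
&\quad \times \exp\left[ -\int_{\mathcal{G}_{\upsilon,1}^{-1}(T)}^{\upsilon} \frac{a^{(1)} \delta^{(1)} u}{\delta^{(1)} u + \hat{g}(u) - 1}\, du \right] \\
&\quad \times \exp\left[ -\int_{\mathcal{H}_{\gamma,1}^{-1}(T)}^{\gamma} \frac{a^{(2)} \delta^{(2)} u}{\delta^{(2)} u + \hat{h}(u) - 1}\, du \right],
\end{aligned} \tag{3.10}
$$

where

$$
\mu_G = \int_0^\infty y\, dG(y), \quad
\mathcal{G}_{\upsilon,1}(\Psi_1) =: \int_{\Psi_1}^{\upsilon} \frac{du}{\delta^{(1)} u + \hat{g}(u) - 1}, \quad
\mu_H = \int_0^\infty z\, dH(z), \quad
\mathcal{H}_{\gamma,1}(\Psi_2) =: \int_{\Psi_2}^{\gamma} \frac{du}{\delta^{(2)} u + \hat{h}(u) - 1},
$$

$\delta^{(1)} > \mu_G$ and $\delta^{(2)} > \mu_H$.

Remark 1. (3.10) is the conditional joint Laplace transform of the process $\left(\lambda_T^{(1)}, \lambda_T^{(2)}\right)$ given $\lambda_0^{(1)}$ and $\lambda_0^{(2)}$ at time $t = 0$, where the jumps $X^{(1)}$ and $X^{(2)}$, with distribution function $F\left(x^{(1)}, x^{(2)}\right)$, occur simultaneously/collaterally with constant intensity $\rho$. Because of the dependence this induces between the two intensity processes, the conditional joint Laplace transform is not the product of the conditional Laplace transform of $\lambda_T^{(1)}$ given $\lambda_0^{(1)}$ and the conditional Laplace transform of $\lambda_T^{(2)}$ given $\lambda_0^{(2)}$, i.e.

$$
E\left[ e^{-\upsilon \lambda_T^{(1)}} e^{-\gamma \lambda_T^{(2)}} \,\Big|\, \lambda_0^{(1)}, \lambda_0^{(2)} \right]
\ne E\left[ e^{-\upsilon \lambda_T^{(1)}} \,\Big|\, \lambda_0^{(1)} \right] E\left[ e^{-\gamma \lambda_T^{(2)}} \,\Big|\, \lambda_0^{(2)} \right]. \tag{3.11}
$$

Proposition 3.2.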

The joint Laplace transform of the asymptotic distribution of $\left(\lambda_T^{(1)}, \lambda_T^{(2)}\right)$ is given by

$$
\begin{aligned}
\lim_{T \to \infty} E\left[ e^{-\upsilon \lambda_T^{(1)}} e^{-\gamma \lambda_T^{(2)}} \,\Big|\, \lambda_0^{(1)}, \lambda_0^{(2)} \right]
&= \exp\left[ -\rho \int_0^\infty \left[ 1 - \hat{f}\left\{ \mathcal{G}_{\upsilon,1}^{-1}(\tau), \mathcal{H}_{\gamma,1}^{-1}(\tau) \right\} \right] d\tau \right] \\
&\quad \times \exp\left[ -\int_0^{\upsilon} \frac{a^{(1)} \delta^{(1)} u}{\delta^{(1)} u + \hat{g}(u) - 1}\, du \right] \\
&\quad \times \exp\left[ -\int_0^{\gamma} \frac{a^{(2)} \delta^{(2)} u}{\delta^{(2)} u + \hat{h}(u) - 1}\, du \right],
\end{aligned} \tag{3.12}
$$

where $\delta^{(1)} > \mu_G$ and $\delta^{(2)} > \mu_H$.

Remark 2. We can easily derive the Laplace transforms of $\lambda_T^{(1)}$ and $\lambda_T^{(2)}$ for a fixed time $T$, respectively, using (3.10). This can also be found in Theorem 3.2 in Dassios and Zhao (2011). Setting $\rho = 0$, we obtain the conditional Laplace transform of $\lambda_T^{(d)}$ ($d = 1, 2$) given $\lambda_0^{(d)}$ at time $t = 0$ for the self-exciting process with exponential decay. These processes can be used to model the bivariate intensity process when only self-excited jumps are involved, eliminating the effect of the externally excited jumps; in the cyber insurance context, they isolate the contribution of "after-cyber attacks" to the intensity from the contribution of "initial-cyber attacks".

$(N_T^{(1)}, N_T^{(2)})$

We derive the joint probability generating function for the process $(N_T^{(1)}, N_T^{(2)})$ for a fixed time $T$ in Theorem 3.2 using the result in Theorem 3.1.

Theorem 3.2.

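The conditional joint probability generating function for the process $(N_T^{(1)}, N_T^{(2)})$ given $\lambda_0^{(1)}$ and $\lambda_0^{(2)}$, and $N_0^{(1)} = 0$ and $N_0^{(2)} = 0$ at time $t = 0$ is given by

$$
\begin{aligned}
E\left[ \theta^{N_T^{(1)}} \eta^{N_T^{(2)}} \,\Big|\, \lambda_0^{(1)}, \lambda_0^{(2)} \right]
&= \exp\left\{ -\mathcal{G}_{0,\theta}^{-1}(T)\, \lambda_0^{(1)} \right\} \exp\left\{ -\mathcal{H}_{0,\eta}^{-1}(T)\, \lambda_0^{(2)} \right\} \\
&\quad \times \exp\left[ -\rho \int_0^T \left[ 1 - \hat{f}\left\{ \mathcal{G}_{0,\theta}^{-1}(\tau), \mathcal{H}_{0,\eta}^{-1}(\tau) \right\} \right] d\tau \right] \\
&\quad \times \exp\left[ -\int_0^{\mathcal{G}_{0,\theta}^{-1}(T)} \frac{a^{(1)} \delta^{(1)} u}{1 - \delta^{(1)} u - \theta \hat{g}(u)}\, du \right] \\
&\quad \times \exp\left[ -\int_0^{\mathcal{H}_{0,\eta}^{-1}(T)} \frac{a^{(2)} \delta^{(2)} u}{1 - \delta^{(2)} u - \eta \hat{h}(u)}\, du \right].
\end{aligned} \tag{3.13}
$$

Proof.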

By setting $t = 0$, $\nu = 0$, $\zeta = 0$, $\upsilon = 0$ and $\gamma = 0$ in (3.1) with the assumption that $N_0^{(1)} = 0$ and $N_0^{(2)} = 0$, we have

$$
E\left[ \theta^{N_T^{(1)}} \eta^{N_T^{(2)}} \,\Big|\, \Im_0 \right] = e^{-B_1(0) \lambda_0^{(1)}} e^{-B_2(0) \lambda_0^{(2)}} e^{-C(T)}, \tag{3.14}
$$

where $B_1(0)$ is uniquely determined by the non-linear ordinary differential equation (ODE)

$$
-B_1'(t) + \delta^{(1)} B_1(t) + \theta\, \hat{g}\{B_1(t)\} - 1 = 0 \tag{3.15}
$$

with boundary condition $B_1(T) = 0$ and, similarly, $B_2(0)$ is uniquely determined by the non-linear ODE

$$
-B_2'(t) + \delta^{(2)} B_2(t) + \eta\, \hat{h}\{B_2(t)\} - 1 = 0 \tag{3.16}
$$

with boundary condition $B_2(T) = 0$. (3.15) can be solved, under the condition $\delta^{(1)} > \mu_G$, by the following steps (1)-(7).

(1) Set $B_1(t) = \Psi_1(T - t) = \Psi_1(\tau)$. Then it becomes

$$
\frac{d\Psi_1(\tau)}{d\tau} = 1 - \delta^{(1)} B_1(t) - \theta\, \hat{g}\{B_1(t)\} = 1 - \delta^{(1)} \Psi_1(\tau) - \theta\, \hat{g}\{\Psi_1(\tau)\} =: f_1(\Psi_1), \qquad 0 \le \theta \le 1, \tag{3.17}
$$

with initial condition $\Psi_1(0) = 0$; we define the right-hand side as the function $f_1(\Psi_1)$.

(2) There is only one positive singular point, denoted by $\upsilon^* > 0$, which can be obtained by solving the equation

$$
1 - \delta^{(1)} u - \theta \hat{g}(u) = 0, \tag{3.18}
$$

at which the uniqueness of the solution of equation (3.17) is violated. This is because, for the case $0 < \theta < 1$, $f_1(\Psi_1) = 0$ is equivalent to

$$
\hat{g}(u) = \frac{1}{\theta} \left( 1 - \delta^{(1)} u \right), \qquad 0 < \theta < 1. \tag{3.19}
$$

Note that the left-hand side of (3.19) is a convex function, hence it is clear that there is only one positive solution to $f_1(\Psi_1) = 0$. For the case $\theta = 0$, there is only one singular point $\upsilon^* = \frac{1}{\delta^{(1)}} > 0$. For both cases, we have

$$
\upsilon^* = \frac{1 - \theta \hat{g}(\upsilon^*)}{\delta^{(1)}} \ge \frac{1 - \theta}{\delta^{(1)}} > 0,
$$

hence we have $f_1(\Psi_1) > 0$ for $0 \le \Psi_1 < \upsilon^*$ and $f_1(\Psi_1) < 0$ for $\Psi_1 > \upsilon^*$.

(3) (3.17) can be written as

$$
\frac{d\Psi_1(\tau)}{1 - \delta^{(1)} \Psi_1(\tau) - \theta\, \hat{g}\{\Psi_1(\tau)\}} = d\tau.
$$

Integrating both sides from time $0$ to $\tau$ with initial condition $\Psi_1(0) = 0$, we have

$$
\int_0^{\Psi_1(\tau)} \frac{du}{1 - \delta^{(1)} u - \theta \hat{g}(u)} = \tau,
$$

where $0 \le \Psi_1(\tau) < \upsilon^*$. Now we define the left-hand side as the function

$$
\mathcal{G}_{0,\theta}(\Psi_1) =: \int_0^{\Psi_1(\tau)} \frac{du}{1 - \delta^{(1)} u - \theta \hat{g}(u)}.
$$

Then we have $\mathcal{G}_{0,\theta}(\Psi_1) = \tau\ (= T - t)$, which is the time difference between $T$ and $t$, and it is obvious that $\Psi_1(\tau) \to 0$ when $\tau \to 0$ and $\Psi_1(\tau) \to \upsilon^*$ when $\tau \to \infty$. The integrand is positive in the domain $u \in [0, \upsilon^*)$, so for $\Psi_1(\tau) \ge 0$, $\mathcal{G}_{0,\theta}(\Psi_1)$ is a strictly increasing function. Therefore

$$
\mathcal{G}_{0,\theta}(\Psi_1) = \tau : [0, \upsilon^*) \to [0, \infty)
$$

is a well defined function and its inverse function

$$
\mathcal{G}_{0,\theta}^{-1}(\tau) = \Psi_1 : [0, \infty) \to [0, \upsilon^*)
$$

exists.

(4) The unique solution is found by

$$
\Psi_1(\tau) = \Psi_1(T - t) = B_1(t) = \mathcal{G}_{0,\theta}^{-1}(\tau) = \mathcal{G}_{0,\theta}^{-1}(T - t)
$$

and hence $B_1(0)$ is obtained,

$$
B_1(0) = \Psi_1(T) = \mathcal{G}_{0,\theta}^{-1}(T).
$$
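(5) Similar to solving (3.15), under the condition $\delta^{(2)} > \mu_H$, the unique solution for (3.16) is given by

$$
\Psi_2(\tau) = \Psi_2(T - t) = B_2(t) = \mathcal{H}_{0,\eta}^{-1}(\tau) = \mathcal{H}_{0,\eta}^{-1}(T - t)
$$

and hence $B_2(0)$ is obtained,

$$
B_2(0) = \Psi_2(T) = \mathcal{H}_{0,\eta}^{-1}(T),
$$

where

$$
\mathcal{H}_{0,\eta}(\Psi_2) = \int_0^{\Psi_2(\tau)} \frac{du}{1 - \delta^{(2)} u - \eta \hat{h}(u)}
$$

is also a strictly increasing function: with $\gamma^* > 0$ denoting the corresponding singular point, i.e. the positive solution of $1 - \delta^{(2)} u - \eta \hat{h}(u) = 0$, the integrand is positive in the domain $u \in [0, \gamma^*)$ and for $\Psi_2(\tau) \ge 0$. Therefore

$$
\mathcal{H}_{0,\eta}(\Psi_2) = \tau : [0, \gamma^*) \to [0, \infty)
$$

is a well defined function and its inverse function

$$
\mathcal{H}_{0,\eta}^{-1}(\tau) = \Psi_2 : [0, \infty) \to [0, \gamma^*)
$$

exists.

(6) $C(T)$ is determined by

$$
C(T) = \rho \int_0^T \left[ 1 - \hat{f}\left\{ \mathcal{G}_{0,\theta}^{-1}(\tau), \mathcal{H}_{0,\eta}^{-1}(\tau) \right\} \right] d\tau
+ \delta^{(1)} a^{(1)} \int_0^T \mathcal{G}_{0,\theta}^{-1}(\tau)\, d\tau
+ \delta^{(2)} a^{(2)} \int_0^T \mathcal{H}_{0,\eta}^{-1}(\tau)\, d\tau,
$$

and by the change of variable $\mathcal{G}_{0,\theta}^{-1}(\tau) = u$, we have $\tau = \mathcal{G}_{0,\theta}(u)$ (so that $d\tau = \frac{\partial \mathcal{G}_{0,\theta}(u)}{\partial u}\, du$), and

$$
\int_0^T \mathcal{G}_{0,\theta}^{-1}(\tau)\, d\tau = \int_0^{\mathcal{G}_{0,\theta}^{-1}(T)} \frac{u}{1 - \delta^{(1)} u - \theta \hat{g}(u)}\, du.
$$

Similarly, with $\mathcal{H}_{0,\eta}^{-1}(\tau) = u$, we have $\tau = \mathcal{H}_{0,\eta}(u)$ (so that $d\tau = \frac{\partial \mathcal{H}_{0,\eta}(u)}{\partial u}\, du$), and

$$
\int_0^T \mathcal{H}_{0,\eta}^{-1}(\tau)\, d\tau = \int_0^{\mathcal{H}_{0,\eta}^{-1}(T)} \frac{u}{1 - \delta^{(2)} u - \eta \hat{h}(u)}\, du.
$$

(7) Finally, substitute $B_1(0)$, $B_2(0)$ and $C(T)$ into (3.14) and the result follows.

Remark 3. We can easily derive the probability generating functions of $N_T^{(1)}$ and $N_T^{(2)}$ for a fixed time $T$, respectively, using (3.13). This can also be found in Theorem 3.4 in Dassios and Zhao (2011). Setting $\rho = 0$, we obtain the conditional probability generating function of $N_T^{(d)}$ ($d = 1, 2$) given $\lambda_0^{(d)}$ at time $t = 0$ for the self-exciting process with exponential decay. These processes can be used to model the bivariate point process when only self-excited jumps are involved in the bivariate intensity process, eliminating the effect of the externally excited jumps; in the cyber insurance context, they give the number of losses arising from the contribution of "after-cyber attacks" to the intensity, without the contribution of "initial-cyber attacks".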
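Steps (1)-(4) can be checked numerically. The sketch below is an illustration only: it assumes exponentially distributed self-excited jumps with rate `beta`, so that $\hat{g}(u) = \beta / (\beta + u)$, and uses a classical fourth-order Runge-Kutta scheme (a choice made here, not taken from the paper) to integrate (3.17). The function names are hypothetical. For this choice of $G$, equation (3.18) reduces to the quadratic $\delta^{(1)} u^2 + (\delta^{(1)} \beta - 1) u - \beta (1 - \theta) = 0$, whose positive root is $\upsilon^*$. The last function evaluates (3.14) in the special case $\rho = 0$, $\eta = 1$, where $B_2 \equiv 0$ and $C(T) = a^{(1)} \delta^{(1)} \int_0^T \Psi_1(\tau)\, d\tau$.

```python
import math


def g_hat(u, beta):
    """Laplace transform of an Exp(beta) jump size (illustrative assumption)."""
    return beta / (beta + u)


def solve_psi1(T, delta, theta, beta, steps=4000):
    """Integrate dPsi/dtau = 1 - delta*Psi - theta*g_hat(Psi), Psi(0) = 0, by RK4.

    Returns Psi_1(T), i.e. the inverse of G_{0,theta} at T, and the
    trapezoidal estimate of the integral of Psi_1 over [0, T].
    """
    def f(p):
        return 1.0 - delta * p - theta * g_hat(p, beta)

    h = T / steps
    psi, integral = 0.0, 0.0
    for _ in range(steps):
        k1 = f(psi)
        k2 = f(psi + 0.5 * h * k1)
        k3 = f(psi + 0.5 * h * k2)
        k4 = f(psi + h * k3)
        new_psi = psi + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        integral += 0.5 * h * (psi + new_psi)
        psi = new_psi
    return psi, integral


def singular_point(delta, theta, beta):
    """Positive root of 1 - delta*u - theta*beta/(beta + u) = 0, for theta < 1."""
    b = delta * beta - 1.0
    return (-b + math.sqrt(b * b + 4.0 * delta * beta * (1.0 - theta))) / (2.0 * delta)


def pgf_n1_no_external(theta, T, lam0, a, delta, beta):
    """E[theta^{N_T} | lambda_0] from (3.14) in the special case rho = 0, eta = 1."""
    psi, integral = solve_psi1(T, delta, theta, beta)
    return math.exp(-psi * lam0 - a * delta * integral)
```

For a large horizon the numerical solution should approach `singular_point(...)` from below, in line with step (3), and at $\theta = 0$ the function `pgf_n1_no_external` reduces to $\exp\{-\int_0^T \lambda_s\, ds\}$ for the deterministic decaying intensity, which gives a simple closed-form check.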
$(L_T^{(1)}, L_T^{(2)})$

To derive the joint Laplace transform of the process $(L_T^{(1)}, L_T^{(2)})$ for a fixed time $T$, we start by deriving the conditional joint Laplace transform and probability generating function of the process $(\lambda_T^{(1)}, \lambda_T^{(2)})$ and the compound point process $(L_T^{(1)}, L_T^{(2)})$ in Theorem 3.3.

Theorem 3.3.

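The conditional joint Laplace transform and probability generating function of the process $(\lambda_T^{(1)}, \lambda_T^{(2)})$ and the compound point process $(L_T^{(1)}, L_T^{(2)})$ given $\lambda_0^{(1)}$ and $\lambda_0^{(2)}$, and $L_0^{(1)} = 0$ and $L_0^{(2)} = 0$ at time $t = 0$ is given by

$$
\begin{aligned}
E\left[ e^{-\nu L_T^{(1)}} e^{-\zeta L_T^{(2)}} e^{-\upsilon \lambda_T^{(1)}} e^{-\gamma \lambda_T^{(2)}} \,\Big|\, \lambda_0^{(1)}, \lambda_0^{(2)} \right]
&= \exp\left\{ -\mathcal{G}_{\upsilon,\nu}^{-1}(T)\, \lambda_0^{(1)} \right\} \exp\left\{ -\mathcal{H}_{\gamma,\zeta}^{-1}(T)\, \lambda_0^{(2)} \right\} \\
&\quad \times \exp\left[ -\rho \int_0^T \left[ 1 - \hat{f}\left\{ \mathcal{G}_{\upsilon,\nu}^{-1}(\tau), \mathcal{H}_{\gamma,\zeta}^{-1}(\tau) \right\} \right] d\tau \right] \\
&\quad \times \exp\left[ -\int_{\mathcal{G}_{\upsilon,\nu}^{-1}(T)}^{\upsilon} \frac{a^{(1)} \delta^{(1)} u}{\delta^{(1)} u + \hat{j}(\nu) \hat{g}(u) - 1}\, du \right] \\
&\quad \times \exp\left[ -\int_{\mathcal{H}_{\gamma,\zeta}^{-1}(T)}^{\gamma} \frac{a^{(2)} \delta^{(2)} u}{\delta^{(2)} u + \hat{k}(\zeta) \hat{h}(u) - 1}\, du \right],
\end{aligned} \tag{3.20}
$$

where

$$
\mu_G = \int_0^\infty y\, dG(y), \quad
\mathcal{G}_{\upsilon,\nu}(\Psi_1) = \int_{\Psi_1}^{\upsilon} \frac{du}{\delta^{(1)} u + \hat{j}(\nu) \hat{g}(u) - 1}, \quad
\mu_H = \int_0^\infty z\, dH(z), \quad
\mathcal{H}_{\gamma,\zeta}(\Psi_2) = \int_{\Psi_2}^{\gamma} \frac{du}{\delta^{(2)} u + \hat{k}(\zeta) \hat{h}(u) - 1},
$$

$\delta^{(1)} > \hat{j}(\nu) \mu_G$ and $\delta^{(2)} > \hat{k}(\zeta) \mu_H$.

Proof.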

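By setting $t = 0$, $\theta = 1$ and $\eta = 1$ in (3.1), we have

$$
E\left[ e^{-\nu L_T^{(1)}} e^{-\zeta L_T^{(2)}} e^{-\upsilon \lambda_T^{(1)}} e^{-\gamma \lambda_T^{(2)}} \,\Big|\, \Im_0 \right] = e^{-B_1(0) \lambda_0^{(1)}} e^{-B_2(0) \lambda_0^{(2)}} e^{-C(T)}, \tag{3.21}
$$

where $B_1(0)$ is uniquely determined by the non-linear ordinary differential equation (ODE)

$$
-B_1'(t) + \delta^{(1)} B_1(t) + \hat{g}\{B_1(t)\}\, \hat{j}(\nu) - 1 = 0 \tag{3.22}
$$

with boundary condition $B_1(T) = \upsilon$, and similarly $B_2(0)$ is uniquely determined by the non-linear ODE

$$
-B_2'(t) + \delta^{(2)} B_2(t) + \hat{h}\{B_2(t)\}\, \hat{k}(\zeta) - 1 = 0 \tag{3.23}
$$

with boundary condition $B_2(T) = \gamma$. (3.22) can be solved, under the condition $\delta^{(1)} > \hat{j}(\nu) \mu_G$, by the following steps (1)-(8):

(1) Let us set $B_1(t) = \Psi_1(T - t) = \Psi_1(\tau)$. Then it becomes

$$
\frac{d\Psi_1(\tau)}{d\tau} = 1 - \delta^{(1)} B_1(t) - \hat{g}\{B_1(t)\}\, \hat{j}(\nu) = 1 - \delta^{(1)} \Psi_1(\tau) - \hat{g}\{\Psi_1(\tau)\}\, \hat{j}(\nu) =: f_1(\Psi_1) \tag{3.24}
$$

with initial condition $\Psi_1(0) = \upsilon$; we define the right-hand side as the function $f_1(\Psi_1)$.

(2) For $\nu = 0$, we have

$$
f_1(\Psi_1) = 1 - \delta^{(1)} \Psi_1(\tau) - \hat{g}\{\Psi_1(\tau)\}
$$

and its unique solution is found by $\Psi_1(\tau) = \mathcal{G}_{\upsilon,1}^{-1}(\tau)$, as shown in Proposition 3.1. Under the condition $\delta^{(1)} > \hat{j}(\nu) \mu_G$, we have

$$
\frac{\partial f_1(\Psi_1)}{\partial \Psi_1} = \hat{j}(\nu) \int_0^\infty y e^{-\Psi_1 y}\, dG(y) - \delta^{(1)} \le \hat{j}(\nu) \int_0^\infty y\, dG(y) - \delta^{(1)} = \hat{j}(\nu) \mu_G - \delta^{(1)} < 0, \qquad \text{for } \Psi_1 \ge 0,
$$

then $f_1(\Psi_1) < 0$ for $\Psi_1 > 0$.

(3) (3.24) can be written as

$$
\frac{d\Psi_1(\tau)}{\delta^{(1)} \Psi_1(\tau) + \hat{j}(\nu)\, \hat{g}\{\Psi_1(\tau)\} - 1} = -d\tau.
$$

Integrating both sides from time $0$ to $\tau$ with initial condition $\Psi_1(0) = \upsilon > 0$, we have

$$
\int_{\Psi_1}^{\upsilon} \frac{du}{\delta^{(1)} u + \hat{j}(\nu) \hat{g}(u) - 1} = \tau,
$$

where $\Psi_1 \ge 0$. Now we define the left-hand side as the function

$$
\mathcal{G}_{\upsilon,\nu}(\Psi_1) =: \int_{\Psi_1}^{\upsilon} \frac{du}{\delta^{(1)} u + \hat{j}(\nu) \hat{g}(u) - 1}.
$$

Then we have $\mathcal{G}_{\upsilon,\nu}(\Psi_1) = \tau\ (= T - t)$, which is the time difference between $T$ and $t$, and it is obvious that $\Psi_1 \to \upsilon$ when $\tau\ (= T - t) \to 0$.

(4) As $\delta^{(1)} - \hat{j}(\nu) \mu_G > 0$,

$$
\int_0^{\upsilon} \frac{du}{\delta^{(1)} u + \hat{j}(\nu) \hat{g}(u) - 1} = \infty,
$$

so $\Psi_1 \to 0$ when $\tau \to \infty$. The integrand is positive in the domain $u \in (0, \upsilon]$ and, for $\Psi_1 \le \upsilon$, $\mathcal{G}_{\upsilon,\nu}(\Psi_1)$ is a strictly decreasing function.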
Therefore G υ,ν (Ψ ) = τ : (0 , υ ] → [0 , ∞ )is a well deﬁned (monotone) function and its inverse function G − υ,ν ( τ ) = Ψ : [0 , ∞ ) → (0 , υ ]exists.(5) The unique solution is found byΨ ( τ ) = Ψ ( T − t ) = B ( t ) = G − υ,ν ( τ ) = G − υ,ν ( T − t )and hence B (0) is obtained, B (0) = Ψ ( T ) = G − υ,ν ( T ) . (6) Similar to solving (3.22), under the condition δ (2) > ∧ k ( ζ ) µ H , the unique solutionfor (3.23) is found byΨ ( τ ) = Ψ ( T − t ) = B ( t ) = H − γ,ζ ( τ ) = H − γ,ζ ( T − t )and hence B (0) is obtained, B (0) = Ψ ( T ) = H − γ,ζ ( T ) . Hence H γ,ζ (Ψ ) =: γ Z Ψ  δ (2) u + ∧ k ( ζ ) ∧ h ( u ) −  du is a strictly decreasing function, where the integrand is positive in the domain u ∈ (0 , γ ]and for Ψ ≤ γ , H γ,ζ (Ψ ) is a strictly decreasing function. Therefore H γ,ζ (Ψ ) = τ : (0 , γ ] → [0 , ∞ )16s a well deﬁned (monotone) function and its inverse function H − γ,ζ ( τ ) = Ψ : [0 , ∞ ) → (0 , γ ]exists.(7) Now C ( T ) is determined by C ( T ) = ρ T Z (cid:20) − ∧ f (cid:8) G − υ,ν ( τ ) , H − γ,ζ ( τ ) (cid:9)(cid:21) dτ + a (1) δ (1) T Z G − υ,ν ( τ ) dτ + a (2) δ (2) T Z H − γ,ζ ( τ ) dτ . By the change of variable G − υ,ν ( τ ) = u, we have τ = G − υ,ν ( u ), and T Z G − υ,ν ( τ ) dτ = G − υ,ν ( T ) Z G − υ,ν (0) u ∂τ∂u du = υ Z G − υ,ν ( T )  uδ (1) u + ∧ j ( ν ) ∧ g ( u ) −  du. Similarly, we have T Z H − γ,ζ ( τ ) dτ = H − γ,ζ ( T ) Z H − γ,ζ (0) u ∂τ∂u du = γ Z H − γ,ζ ( τ )( T )  uδ (2) u + ∧ k ( ξ ) ∧ h ( u ) −  du. (8) Finally, substitute B (0) , B (0) and C ( T ) into (3.21) and the result follows.Now let us derive the joint Laplace transform of the process ( L (1) T , L (2) T ) for a ﬁxed time T in Theorem 3.4. Theorem 3.4.

The conditional joint Laplace transform of the process $(L^{(1)}_T, L^{(2)}_T)$, given $\lambda^{(1)}_0$ and $\lambda^{(2)}_0$, and $L^{(1)}_0 = 0$ and $L^{(2)}_0 = 0$ at time $t = 0$, is given by

\[
\begin{aligned}
E\left[ e^{-\nu L^{(1)}_T} e^{-\zeta L^{(2)}_T} \,\middle|\, \lambda^{(1)}_0, \lambda^{(2)}_0 \right]
={}& \exp\left\{ -\mathcal{G}^{-1}_{0,\nu}(T)\, \lambda^{(1)}_0 \right\} \exp\left\{ -\mathcal{H}^{-1}_{0,\zeta}(T)\, \lambda^{(2)}_0 \right\} \\
&\times \exp\left[ -\rho \int_0^T \left[ 1 - \hat{f}\left\{ \mathcal{G}^{-1}_{0,\nu}(\tau), \mathcal{H}^{-1}_{0,\zeta}(\tau) \right\} \right] d\tau \right] \\
&\times \exp\left[ -\int_{\mathcal{G}^{-1}_{0,\nu}(T)}^{0} \frac{a^{(1)}\delta^{(1)} u}{\delta^{(1)} u + \hat{j}(\nu)\hat{g}(u) - 1}\, du \right] \\
&\times \exp\left[ -\int_{\mathcal{H}^{-1}_{0,\zeta}(T)}^{0} \frac{a^{(2)}\delta^{(2)} u}{\delta^{(2)} u + \hat{k}(\zeta)\hat{h}(u) - 1}\, du \right].
\end{aligned}
\tag{3.25}
\]

Proof. Set $\upsilon = 0$ and $\gamma = 0$ in (3.20); then the result follows immediately.

Remark 4. We can easily derive the Laplace transform of $L^{(1)}_T$ and $L^{(2)}_T$ for a fixed time $T$, respectively, using (3.25). Setting $\rho = 0$, we can obtain the conditional Laplace transform of $L^{(d)}_T$ ($d = 1, 2$) given $\lambda^{(d)}_0$ at time $t = 0$ for the self-exciting process with exponential decay. These processes can be considered in modelling the bivariate compound point process when only self-excited jumps are involved in the bivariate intensity process, eliminating the effect of the externally excited jumps, or to see the aggregate losses from the contribution of "after-cyber attacks" to the intensity, eliminating the contribution of "initial-cyber attacks" to the intensity, in the cyber insurance context.
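The inversion used in steps (1)-(5) of the proof of Theorem 3.3 can be checked numerically. The following is only a minimal sketch under stated assumptions: the self-excited jump sizes are taken to be exponential with rate `b` (so that $\hat{g}(u) = b/(b+u)$), $\nu = 0$ (so that $\hat{j}(\nu) = 1$), and all numerical values and function names are hypothetical choices for illustration, not taken from the paper. It integrates the ODE (3.24) forward from $\Psi_1(0) = \upsilon$ and then verifies that $\mathcal{G}_{\upsilon,\nu}$ evaluated at the result returns the elapsed time.

```python
def g_hat(u, b=1.0):
    """Laplace transform of an Exp(b) jump size (illustrative assumption)."""
    return b / (b + u)

def f1(psi, delta, j_hat):
    """Right-hand side of (3.24): 1 - delta*Psi - j_hat*g_hat(Psi)."""
    return 1.0 - delta * psi - j_hat * g_hat(psi)

def solve_psi(upsilon, T, delta, j_hat, steps=2000):
    """Classical RK4 for Psi(tau) on [0, T] with Psi(0) = upsilon."""
    h = T / steps
    psi = upsilon
    for _ in range(steps):
        k1 = f1(psi, delta, j_hat)
        k2 = f1(psi + 0.5 * h * k1, delta, j_hat)
        k3 = f1(psi + 0.5 * h * k2, delta, j_hat)
        k4 = f1(psi + h * k3, delta, j_hat)
        psi += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return psi

def G(psi, upsilon, delta, j_hat, n=2000):
    """Composite Simpson rule for the integral defining G_{upsilon,nu}(psi); n must be even."""
    h = (upsilon - psi) / n
    total = 0.0
    for i in range(n + 1):
        u = psi + i * h
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight / (delta * u + j_hat * g_hat(u) - 1.0)
    return total * h / 3.0

# Hypothetical values: delta = 3 exceeds the mean jump size 1/b = 1.
delta, j_hat, upsilon, T = 3.0, 1.0, 1.0, 1.0
psi_T = solve_psi(upsilon, T, delta, j_hat)     # numerical value of G^{-1}_{upsilon,0}(T)
tau_back = G(psi_T, upsilon, delta, j_hat)      # should recover T
```

Under these assumptions `psi_T` lies in $(0, \upsilon)$ and `tau_back` agrees with `T` up to discretisation error, which is the statement $\mathcal{G}_{\upsilon,\nu}(\Psi_1(\tau)) = \tau$ in step (3).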

4. Moments, covariance and linear correlation

In this section, we derive the expectation of $L^{(i)}_t$ ($i = 1, 2$) and the joint expectation of $L^{(1)}_t$ and $L^{(2)}_t$, which is another key result of this paper, for which we need the expectations of $\lambda^{(1)}_t$ and $\lambda^{(2)}_t$, respectively, and the joint expectation of $\lambda^{(1)}_t$ and $\lambda^{(2)}_t$. So let us start by stating three propositions adopted from Dassios and Zhao (2011) and Jang and Dassios (2013).

Proposition 4.1.

The conditional expectation of the process $\lambda^{(1)}_t$ given $\lambda^{(1)}_0$ at time $t = 0$ is given by

\[
E\left( \lambda^{(1)}_t \mid \lambda^{(1)}_0 \right) = \lambda^{(1)}_0 e^{-(\delta^{(1)} - \mu_{1_G})t} + \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{\delta^{(1)} - \mu_{1_G}} \left( 1 - e^{-(\delta^{(1)} - \mu_{1_G})t} \right), \quad \text{for } \delta^{(1)} \neq \mu_{1_G},
\tag{4.1}
\]

\[
E\left( \lambda^{(1)}_t \mid \lambda^{(1)}_0 \right) = \lambda^{(1)}_0 + \left( \mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)} \right) t, \quad \text{for } \delta^{(1)} = \mu_{1_G},
\tag{4.2}
\]

where $\mu_{1_{F_1}} = \int_0^\infty x^{(1)}\, dF(x^{(1)})$ and $F(x^{(1)})$ is the marginal distribution function for $\{X^{(1)}_i\}_{i=1,2,\cdots}$.

The conditional expectation of the process $\lambda^{(2)}_t$ given $\lambda^{(2)}_0$ at time $t = 0$ is given by

\[
E\left( \lambda^{(2)}_t \mid \lambda^{(2)}_0 \right) = \lambda^{(2)}_0 e^{-(\delta^{(2)} - \mu_{1_H})t} + \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{\delta^{(2)} - \mu_{1_H}} \left( 1 - e^{-(\delta^{(2)} - \mu_{1_H})t} \right), \quad \text{for } \delta^{(2)} \neq \mu_{1_H},
\tag{4.3}
\]

\[
E\left( \lambda^{(2)}_t \mid \lambda^{(2)}_0 \right) = \lambda^{(2)}_0 + \left( \mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)} \right) t, \quad \text{for } \delta^{(2)} = \mu_{1_H},
\tag{4.4}
\]

where $\mu_{1_{F_2}} = \int_0^\infty x^{(2)}\, dF(x^{(2)})$ and $F(x^{(2)})$ is the marginal distribution function for $\{X^{(2)}_i\}_{i=1,2,\cdots}$.

Assuming that $\delta^{(1)} > \mu_{1_G}$ and $\delta^{(2)} > \mu_{1_H}$, and setting time $t \to \infty$ in (4.1) and (4.3) respectively, the expectations of the stationary distribution of the process $\lambda^{(i)}_t$ ($i = 1, 2$) are given by

\[
E\left( \lambda^{(1)}_t \right) = \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{\delta^{(1)} - \mu_{1_G}}
\tag{4.5}
\]

and

\[
E\left( \lambda^{(2)}_t \right) = \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{\delta^{(2)} - \mu_{1_H}}.
\tag{4.6}
\]

Proposition 4.2.

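The conditional joint expectation of $\lambda^{(1)}_t$ and $\lambda^{(2)}_t$ given $\lambda^{(1)}_0$ and $\lambda^{(2)}_0$ at time $t = 0$ is given by

\[
\begin{aligned}
E\left( \lambda^{(1)}_t \lambda^{(2)}_t \mid \lambda^{(1)}_0, \lambda^{(2)}_0 \right)
={}& \lambda^{(1)}_0 \lambda^{(2)}_0 e^{-\{(\delta^{(1)} - \mu_{1_G}) + (\delta^{(2)} - \mu_{1_H})\}t} \\
&+ \left( a^{(2)}\delta^{(2)} + \mu_{1_{F_2}}\rho \right) \Bigg[ \left( \lambda^{(1)}_0 - \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{\delta^{(1)} - \mu_{1_G}} \right) \left\{ \frac{e^{-(\delta^{(1)} - \mu_{1_G})t} - e^{-\{(\delta^{(1)} - \mu_{1_G}) + (\delta^{(2)} - \mu_{1_H})\}t}}{\delta^{(2)} - \mu_{1_H}} \right\} \\
&\qquad + \left( \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{\delta^{(1)} - \mu_{1_G}} \right) \left\{ \frac{1 - e^{-\{(\delta^{(1)} - \mu_{1_G}) + (\delta^{(2)} - \mu_{1_H})\}t}}{(\delta^{(1)} - \mu_{1_G}) + (\delta^{(2)} - \mu_{1_H})} \right\} \Bigg] \\
&+ \left( a^{(1)}\delta^{(1)} + \mu_{1_{F_1}}\rho \right) \Bigg[ \left( \lambda^{(2)}_0 - \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{\delta^{(2)} - \mu_{1_H}} \right) \left\{ \frac{e^{-(\delta^{(2)} - \mu_{1_H})t} - e^{-\{(\delta^{(1)} - \mu_{1_G}) + (\delta^{(2)} - \mu_{1_H})\}t}}{\delta^{(1)} - \mu_{1_G}} \right\} \\
&\qquad + \left( \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{\delta^{(2)} - \mu_{1_H}} \right) \left\{ \frac{1 - e^{-\{(\delta^{(1)} - \mu_{1_G}) + (\delta^{(2)} - \mu_{1_H})\}t}}{(\delta^{(1)} - \mu_{1_G}) + (\delta^{(2)} - \mu_{1_H})} \right\} \Bigg] \\
&+ \mu_{1,1_F}\,\rho \left[ \frac{1 - e^{-\{(\delta^{(1)} - \mu_{1_G}) + (\delta^{(2)} - \mu_{1_H})\}t}}{(\delta^{(1)} - \mu_{1_G}) + (\delta^{(2)} - \mu_{1_H})} \right],
\quad \text{for } \delta^{(1)} \neq \mu_{1_G} \text{ and } \delta^{(2)} \neq \mu_{1_H},
\end{aligned}
\tag{4.7}
\]

\[
\begin{aligned}
E\left( \lambda^{(1)}_t \lambda^{(2)}_t \mid \lambda^{(1)}_0, \lambda^{(2)}_0 \right)
={}& \lambda^{(1)}_0 \lambda^{(2)}_0 + \left( a^{(2)}\delta^{(2)} + \mu_{1_{F_2}}\rho \right) \left[ \lambda^{(1)}_0 t + \left( \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{2} \right) t^2 \right] \\
&+ \left( a^{(1)}\delta^{(1)} + \mu_{1_{F_1}}\rho \right) \left[ \lambda^{(2)}_0 t + \left( \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{2} \right) t^2 \right] + \mu_{1,1_F}\,\rho t,
\quad \text{for } \delta^{(1)} = \mu_{1_G} \text{ and } \delta^{(2)} = \mu_{1_H},
\end{aligned}
\tag{4.8}
\]

where $\mu_{1,1_F} = \int_0^\infty \int_0^\infty x^{(1)} x^{(2)}\, dF(x^{(1)}, x^{(2)})$.

Assuming that $\delta^{(1)} > \mu_{1_G}$ and $\delta^{(2)} > \mu_{1_H}$, and setting time $t \to \infty$ in (4.7), the joint expectation of the stationary distribution of the process $\lambda^{(i)}_t$ ($i = 1, 2$) is given by

\[
E\left( \lambda^{(1)}_t \lambda^{(2)}_t \right) = \left[ \frac{1}{(\delta^{(1)} - \mu_{1_G}) + (\delta^{(2)} - \mu_{1_H})} \right] \times \left[ \left( a^{(2)}\delta^{(2)} + \mu_{1_{F_2}}\rho \right) \left( \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{\delta^{(1)} - \mu_{1_G}} \right) + \left( a^{(1)}\delta^{(1)} + \mu_{1_{F_1}}\rho \right) \left( \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{\delta^{(2)} - \mu_{1_H}} \right) + \mu_{1,1_F}\,\rho \right].
\tag{4.9}
\]

Proposition 4.3.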

The second moment of the process $\lambda^{(1)}_t$ given $\lambda^{(1)}_0$ at time $t = 0$ is given by

\[
\begin{aligned}
E\left[ \left\{ \lambda^{(1)}_t \right\}^2 \mid \lambda^{(1)}_0 \right]
={}& \left( \lambda^{(1)}_0 \right)^2 e^{-2(\delta^{(1)} - \mu_{1_G})t} + \frac{2\left( \mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)} \right) + \mu_{2_G}}{\delta^{(1)} - \mu_{1_G}} \\
&\times \left( \lambda^{(1)}_0 - \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{\delta^{(1)} - \mu_{1_G}} \right) \left( e^{-(\delta^{(1)} - \mu_{1_G})t} - e^{-2(\delta^{(1)} - \mu_{1_G})t} \right) \\
&+ \left[ \frac{\left\{ 2\left( \mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)} \right) + \mu_{2_G} \right\} \left( \mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)} \right)}{2\left( \delta^{(1)} - \mu_{1_G} \right)^2} + \frac{\mu_{2_{F_1}}\rho}{2\left( \delta^{(1)} - \mu_{1_G} \right)} \right] \left( 1 - e^{-2(\delta^{(1)} - \mu_{1_G})t} \right),
\quad \text{for } \delta^{(1)} \neq \mu_{1_G},
\end{aligned}
\tag{4.10}
\]

\[
E\left[ \left\{ \lambda^{(1)}_t \right\}^2 \mid \lambda^{(1)}_0 \right] = \left( \lambda^{(1)}_0 \right)^2 + \left\{ 2\left( \mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)} \right) + \mu_{2_G} \right\} \left\{ \lambda^{(1)}_0 t + \frac{1}{2}\left( \mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)} \right) t^2 \right\} + \mu_{2_{F_1}}\rho t, \quad \text{for } \delta^{(1)} = \mu_{1_G},
\tag{4.11}
\]

where $\mu_{1_{F_1}} = \int_0^\infty x^{(1)}\, dF(x^{(1)})$ and $\mu_{2_{F_1}} = \int_0^\infty \{x^{(1)}\}^2\, dF(x^{(1)})$, and $F(x^{(1)})$ is the marginal distribution function for $\{X^{(1)}_i\}_{i=1,2,\cdots}$.

The second moment of the process $\lambda^{(2)}_t$ given $\lambda^{(2)}_0$ at time $t = 0$ is given by

\[
\begin{aligned}
E\left[ \left\{ \lambda^{(2)}_t \right\}^2 \mid \lambda^{(2)}_0 \right]
={}& \left( \lambda^{(2)}_0 \right)^2 e^{-2(\delta^{(2)} - \mu_{1_H})t} + \frac{2\left( \mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)} \right) + \mu_{2_H}}{\delta^{(2)} - \mu_{1_H}} \\
&\times \left( \lambda^{(2)}_0 - \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{\delta^{(2)} - \mu_{1_H}} \right) \left( e^{-(\delta^{(2)} - \mu_{1_H})t} - e^{-2(\delta^{(2)} - \mu_{1_H})t} \right) \\
&+ \left[ \frac{\left\{ 2\left( \mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)} \right) + \mu_{2_H} \right\} \left( \mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)} \right)}{2\left( \delta^{(2)} - \mu_{1_H} \right)^2} + \frac{\mu_{2_{F_2}}\rho}{2\left( \delta^{(2)} - \mu_{1_H} \right)} \right] \left( 1 - e^{-2(\delta^{(2)} - \mu_{1_H})t} \right),
\quad \text{for } \delta^{(2)} \neq \mu_{1_H},
\end{aligned}
\tag{4.12}
\]

\[
E\left[ \left\{ \lambda^{(2)}_t \right\}^2 \mid \lambda^{(2)}_0 \right] = \left( \lambda^{(2)}_0 \right)^2 + \left\{ 2\left( \mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)} \right) + \mu_{2_H} \right\} \left\{ \lambda^{(2)}_0 t + \frac{1}{2}\left( \mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)} \right) t^2 \right\} + \mu_{2_{F_2}}\rho t, \quad \text{for } \delta^{(2)} = \mu_{1_H},
\tag{4.13}
\]

where $\mu_{1_{F_2}} = \int_0^\infty x^{(2)}\, dF(x^{(2)})$ and $\mu_{2_{F_2}} = \int_0^\infty \{x^{(2)}\}^2\, dF(x^{(2)})$, and $F(x^{(2)})$ is the marginal distribution function for $\{X^{(2)}_i\}_{i=1,2,\cdots}$.

Assuming that $\delta^{(1)} > \mu_{1_G}$ and $\delta^{(2)} > \mu_{1_H}$, and setting time $t \to \infty$ in (4.10) and (4.12) respectively, the second moments of the stationary distribution of the process $\lambda^{(i)}_t$ ($i = 1, 2$) are given by

\[
E\left[ \left\{ \lambda^{(1)}_t \right\}^2 \right] = \frac{\left\{ 2\left( \mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)} \right) + \mu_{2_G} \right\} \left( \mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)} \right)}{2\left( \delta^{(1)} - \mu_{1_G} \right)^2} + \frac{\mu_{2_{F_1}}\rho}{2\left( \delta^{(1)} - \mu_{1_G} \right)}
\tag{4.14}
\]

and

\[
E\left[ \left\{ \lambda^{(2)}_t \right\}^2 \right] = \frac{\left\{ 2\left( \mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)} \right) + \mu_{2_H} \right\} \left( \mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)} \right)}{2\left( \delta^{(2)} - \mu_{1_H} \right)^2} + \frac{\mu_{2_{F_2}}\rho}{2\left( \delta^{(2)} - \mu_{1_H} \right)}.
\tag{4.15}
\]

Using Proposition 4.1, we now derive the expectation of $L^{(i)}_T$ ($i = 1, 2$) by directly solving an ODE in Theorem 4.1. We can also derive them by differentiating the Laplace transform of $L^{(i)}_T$ ($i = 1, 2$) with respect to $\nu$ and $\zeta$, and then setting $\nu = 0$ and $\zeta = 0$, respectively. However, solving the ODE directly is easier to generalise to derive higher moments beyond the conditions $\delta^{(1)} > \mu_{1_G}$ and $\delta^{(2)} > \mu_{1_H}$, if necessary.

The moments of $N_t$ can also be derived by directly solving the relevant ODEs, for which we refer the reader to Dassios and Zhao (2011, 2017).

Theorem 4.1.

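The conditional expectation of the process $L^{(1)}_t$ given $\lambda^{(1)}_0$ at time $t = 0$ is given by

\[
E\left( L^{(1)}_t \mid \lambda^{(1)}_0 \right) = L^{(1)}_0 + \mu_{1_J} \left[ \left( \lambda^{(1)}_0 - \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{\delta^{(1)} - \mu_{1_G}} \right) \left( \frac{1 - e^{-(\delta^{(1)} - \mu_{1_G})t}}{\delta^{(1)} - \mu_{1_G}} \right) + \left( \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{\delta^{(1)} - \mu_{1_G}} \right) t \right], \quad \text{for } \delta^{(1)} \neq \mu_{1_G},
\tag{4.16}
\]

\[
E\left( L^{(1)}_t \mid \lambda^{(1)}_0 \right) = L^{(1)}_0 + \mu_{1_J} \left\{ \lambda^{(1)}_0 t + \left( \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{2} \right) t^2 \right\}, \quad \text{for } \delta^{(1)} = \mu_{1_G},
\tag{4.17}
\]

where $\mu_{1_J} = \int_0^\infty \xi^{(1)}\, dJ(\xi^{(1)})$.

The conditional expectation of the process $L^{(2)}_t$ given $\lambda^{(2)}_0$ at time $t = 0$ is given by

\[
E\left( L^{(2)}_t \mid \lambda^{(2)}_0 \right) = L^{(2)}_0 + \mu_{1_K} \left[ \left( \lambda^{(2)}_0 - \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{\delta^{(2)} - \mu_{1_H}} \right) \left( \frac{1 - e^{-(\delta^{(2)} - \mu_{1_H})t}}{\delta^{(2)} - \mu_{1_H}} \right) + \left( \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{\delta^{(2)} - \mu_{1_H}} \right) t \right], \quad \text{for } \delta^{(2)} \neq \mu_{1_H},
\tag{4.18}
\]

\[
E\left( L^{(2)}_t \mid \lambda^{(2)}_0 \right) = L^{(2)}_0 + \mu_{1_K} \left\{ \lambda^{(2)}_0 t + \left( \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{2} \right) t^2 \right\}, \quad \text{for } \delta^{(2)} = \mu_{1_H},
\tag{4.19}
\]

where $\mu_{1_K} = \int_0^\infty \xi^{(2)}\, dK(\xi^{(2)})$.

Proof.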

See Appendix A.
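As a sanity check on Proposition 4.1 and Theorem 4.1, the closed forms can be compared with a direct numerical solution of first-moment ODEs. This is a sketch only: the ODE system $m'(t) = c - \kappa\, m(t)$, $\ell'(t) = \mu_{1_J} m(t)$, with $c = \mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}$ and $\kappa = \delta^{(1)} - \mu_{1_G}$, is assumed here as the standard form consistent with (4.1) and (4.16), since Appendix A is not reproduced in this section. The function names and parameter values are hypothetical.

```python
import math

def closed_form(t, lam0, a, delta, rho, mu1_F, mu1_G, mu1_J, L0=0.0):
    """Return (E[lambda_t | lambda_0], E[L_t | lambda_0]) from (4.1)-(4.2) and (4.16)-(4.17)."""
    c = mu1_F * rho + a * delta
    kappa = delta - mu1_G
    if kappa == 0.0:
        return lam0 + c * t, L0 + mu1_J * (lam0 * t + 0.5 * c * t * t)
    m_inf = c / kappa                      # stationary mean intensity, cf. (4.5)
    decay = math.exp(-kappa * t)
    mean_lam = lam0 * decay + m_inf * (1.0 - decay)
    mean_loss = L0 + mu1_J * ((lam0 - m_inf) * (1.0 - decay) / kappa + m_inf * t)
    return mean_lam, mean_loss

def by_ode(t, lam0, a, delta, rho, mu1_F, mu1_G, mu1_J, L0=0.0, steps=2000):
    """RK4 for the assumed system m' = c - kappa*m, l' = mu1_J*m, m(0) = lam0, l(0) = L0."""
    c = mu1_F * rho + a * delta
    kappa = delta - mu1_G
    h = t / steps
    m, l = lam0, L0
    for _ in range(steps):
        k1 = c - kappa * m
        k2 = c - kappa * (m + 0.5 * h * k1)
        k3 = c - kappa * (m + 0.5 * h * k2)
        k4 = c - kappa * (m + h * k3)
        # l' depends on m only, so its RK4 stages reuse the intermediate m values.
        l += h * mu1_J * (m + 2.0 * (m + 0.5 * h * k1) + 2.0 * (m + 0.5 * h * k2) + (m + h * k3)) / 6.0
        m += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return m, l

# Hypothetical inputs; the second set hits the boundary case delta = mu1_G.
params = dict(lam0=5.0, a=1.0, delta=3.0, rho=3.0, mu1_F=10.0, mu1_G=2.0, mu1_J=12.0)
boundary = dict(params, mu1_G=3.0)
cf, ode = closed_form(1.0, **params), by_ode(1.0, **params)
cf_b, ode_b = closed_form(1.0, **boundary), by_ode(1.0, **boundary)
```

Both parameter sets should give agreement between the two routes, which also illustrates why the case $\delta^{(1)} = \mu_{1_G}$ needs the separate polynomial expressions (4.2) and (4.17).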

Corollary 4.1.

For the stationary distribution of the process $\lambda^{(1)}_t$, given $L^{(1)}_0 = 0$, the expectation of the process $L^{(1)}_t$ is given by

\[
E\left( L^{(1)}_t \right) = \mu_{1_J} \left( \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{\delta^{(1)} - \mu_{1_G}} \right) t, \quad \delta^{(1)} > \mu_{1_G},
\tag{4.20}
\]

and for the stationary distribution of the process $\lambda^{(2)}_t$, given $L^{(2)}_0 = 0$, the expectation of the process $L^{(2)}_t$ is given by

\[
E\left( L^{(2)}_t \right) = \mu_{1_K} \left( \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{\delta^{(2)} - \mu_{1_H}} \right) t, \quad \delta^{(2)} > \mu_{1_H}.
\tag{4.21}
\]

Proof.

See Appendix B.

We now derive the joint expectation of $L^{(1)}_t$ and $L^{(2)}_t$ in Theorem 4.2, for which we start with a lemma giving the joint expectation of $\lambda^{(1)}_t L^{(2)}_t$ and the joint expectation of $\lambda^{(2)}_t L^{(1)}_t$, respectively. For simplicity, we use the case of the stationary distribution of the process $\lambda^{(i)}_t$ ($i = 1, 2$) for $L^{(1)}_t$ and $L^{(2)}_t$, provided that the process has been running for a relatively long period and is close to the stationary (asymptotic) state.

Lemma 4.1.

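For the stationary distribution of the process $\lambda^{(1)}_t$ and $\lambda^{(2)}_t$, given $L^{(2)}_0 = 0$, the joint expectation of $\lambda^{(1)}_t$ and $L^{(2)}_t$ is given by

\[
\begin{aligned}
E\left( \lambda^{(1)}_t L^{(2)}_t \right)
={}& \mu_{1_K} \left( a^{(1)}\delta^{(1)} + \mu_{1_{F_1}}\rho \right) \left[ \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{(\delta^{(2)} - \mu_{1_H})(\delta^{(1)} - \mu_{1_G})} \right] t + \mu_{1_K} \left( \frac{1 - e^{-(\delta^{(1)} - \mu_{1_G})t}}{\delta^{(1)} - \mu_{1_G}} \right) \\
&\times \Bigg[ \left( \frac{1}{(\delta^{(1)} - \mu_{1_G}) + (\delta^{(2)} - \mu_{1_H})} \right) \bigg[ \left( a^{(2)}\delta^{(2)} + \mu_{1_{F_2}}\rho \right) \left( \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{\delta^{(1)} - \mu_{1_G}} \right) \\
&\qquad + \left( a^{(1)}\delta^{(1)} + \mu_{1_{F_1}}\rho \right) \left( \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{\delta^{(2)} - \mu_{1_H}} \right) + \mu_{1,1_F}\,\rho \bigg] \\
&\qquad - \left( a^{(1)}\delta^{(1)} + \mu_{1_{F_1}}\rho \right) \left\{ \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{(\delta^{(2)} - \mu_{1_H})(\delta^{(1)} - \mu_{1_G})} \right\} \Bigg],
\quad \text{for } \delta^{(1)} > \mu_{1_G} \text{ and } \delta^{(2)} > \mu_{1_H},
\end{aligned}
\tag{4.22}
\]

and given $L^{(1)}_0 = 0$, the joint expectation of $\lambda^{(2)}_t$ and $L^{(1)}_t$ is given by

\[
\begin{aligned}
E\left( \lambda^{(2)}_t L^{(1)}_t \right)
={}& \mu_{1_J} \left( a^{(2)}\delta^{(2)} + \mu_{1_{F_2}}\rho \right) \left[ \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{(\delta^{(2)} - \mu_{1_H})(\delta^{(1)} - \mu_{1_G})} \right] t + \mu_{1_J} \left( \frac{1 - e^{-(\delta^{(2)} - \mu_{1_H})t}}{\delta^{(2)} - \mu_{1_H}} \right) \\
&\times \Bigg[ \left( \frac{1}{(\delta^{(1)} - \mu_{1_G}) + (\delta^{(2)} - \mu_{1_H})} \right) \bigg[ \left( a^{(2)}\delta^{(2)} + \mu_{1_{F_2}}\rho \right) \left( \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{\delta^{(1)} - \mu_{1_G}} \right) \\
&\qquad + \left( a^{(1)}\delta^{(1)} + \mu_{1_{F_1}}\rho \right) \left( \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{\delta^{(2)} - \mu_{1_H}} \right) + \mu_{1,1_F}\,\rho \bigg] \\
&\qquad - \left( a^{(2)}\delta^{(2)} + \mu_{1_{F_2}}\rho \right) \left\{ \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{(\delta^{(2)} - \mu_{1_H})(\delta^{(1)} - \mu_{1_G})} \right\} \Bigg],
\quad \text{for } \delta^{(1)} > \mu_{1_G} \text{ and } \delta^{(2)} > \mu_{1_H}.
\end{aligned}
\tag{4.23}
\]

Proof.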

See Appendix C.

Theorem 4.2.

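For the stationary distribution of the process $\lambda^{(1)}_t$ and $\lambda^{(2)}_t$, given $L^{(1)}_0 = L^{(2)}_0 = 0$, the joint expectation of $L^{(1)}_t$ and $L^{(2)}_t$ is given by

\[
\begin{aligned}
E\left( L^{(1)}_t L^{(2)}_t \right)
={}& \mu_{1_J}\mu_{1_K} \left( \frac{a^{(1)}\delta^{(1)} + \mu_{1_{F_1}}\rho}{2} \right) \left[ \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{(\delta^{(2)} - \mu_{1_H})(\delta^{(1)} - \mu_{1_G})} \right] t^2 \\
&+ \mu_{1_J}\mu_{1_K} \left( \frac{1}{\delta^{(1)} - \mu_{1_G}} \right) \left\{ t - \left( \frac{1 - e^{-(\delta^{(1)} - \mu_{1_G})t}}{\delta^{(1)} - \mu_{1_G}} \right) \right\} \\
&\quad\times \Bigg[ \left( \frac{1}{(\delta^{(1)} - \mu_{1_G}) + (\delta^{(2)} - \mu_{1_H})} \right) \bigg[ \left( a^{(2)}\delta^{(2)} + \mu_{1_{F_2}}\rho \right) \left( \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{\delta^{(1)} - \mu_{1_G}} \right) \\
&\qquad + \left( a^{(1)}\delta^{(1)} + \mu_{1_{F_1}}\rho \right) \left( \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{\delta^{(2)} - \mu_{1_H}} \right) + \mu_{1,1_F}\,\rho \bigg] - \left( a^{(1)}\delta^{(1)} + \mu_{1_{F_1}}\rho \right) \left\{ \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{(\delta^{(2)} - \mu_{1_H})(\delta^{(1)} - \mu_{1_G})} \right\} \Bigg] \\
&+ \mu_{1_K}\mu_{1_J} \left( \frac{a^{(2)}\delta^{(2)} + \mu_{1_{F_2}}\rho}{2} \right) \left[ \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{(\delta^{(2)} - \mu_{1_H})(\delta^{(1)} - \mu_{1_G})} \right] t^2 \\
&+ \mu_{1_K}\mu_{1_J} \left( \frac{1}{\delta^{(2)} - \mu_{1_H}} \right) \left\{ t - \left( \frac{1 - e^{-(\delta^{(2)} - \mu_{1_H})t}}{\delta^{(2)} - \mu_{1_H}} \right) \right\} \\
&\quad\times \Bigg[ \left( \frac{1}{(\delta^{(1)} - \mu_{1_G}) + (\delta^{(2)} - \mu_{1_H})} \right) \bigg[ \left( a^{(2)}\delta^{(2)} + \mu_{1_{F_2}}\rho \right) \left( \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{\delta^{(1)} - \mu_{1_G}} \right) \\
&\qquad + \left( a^{(1)}\delta^{(1)} + \mu_{1_{F_1}}\rho \right) \left( \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{\delta^{(2)} - \mu_{1_H}} \right) + \mu_{1,1_F}\,\rho \bigg] - \left( a^{(2)}\delta^{(2)} + \mu_{1_{F_2}}\rho \right) \left\{ \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{(\delta^{(2)} - \mu_{1_H})(\delta^{(1)} - \mu_{1_G})} \right\} \Bigg],
\end{aligned}
\tag{4.24}
\]

for $\delta^{(1)} > \mu_{1_G}$ and $\delta^{(2)} > \mu_{1_H}$.

Proof.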

See Appendix D.

Based on Theorem 4.2 and Corollary 4.1, we can easily obtain the covariance between $L^{(1)}_t$ and $L^{(2)}_t$, i.e.

\[
\mathrm{Cov}\left( L^{(1)}_t, L^{(2)}_t \right) = E\left( L^{(1)}_t L^{(2)}_t \right) - E\left( L^{(1)}_t \right) E\left( L^{(2)}_t \right)
\tag{4.25}
\]

and the linear correlation coefficient between $L^{(1)}_t$ and $L^{(2)}_t$, i.e.

\[
\mathrm{Corr}\left( L^{(1)}_t, L^{(2)}_t \right) = \frac{\mathrm{Cov}\left( L^{(1)}_t, L^{(2)}_t \right)}{\sqrt{\mathrm{Var}\left( L^{(1)}_t \right)}\, \sqrt{\mathrm{Var}\left( L^{(2)}_t \right)}},
\tag{4.26}
\]

and hence we omit their corresponding expressions. We show their numerical values in the cyber insurance context in Section 5.

For the correlation coefficient calculation, we need the variances of $L^{(1)}_t$ and $L^{(2)}_t$, respectively, for which we start with a lemma giving the joint expectation of $\lambda^{(1)}_t L^{(1)}_t$ and the joint expectation of $\lambda^{(2)}_t L^{(2)}_t$, respectively.

Lemma 4.2.

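For the stationary distribution of the process $\lambda^{(1)}_t$ and $\lambda^{(2)}_t$, given $L^{(1)}_0 = 0$, the joint expectation of $\lambda^{(1)}_t$ and $L^{(1)}_t$ is given by

\[
\begin{aligned}
E\left( \lambda^{(1)}_t L^{(1)}_t \right)
={}& \left( a^{(1)}\delta^{(1)} + \mu_{1_{F_1}}\rho \right) \mu_{1_J} \left[ \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{\left( \delta^{(1)} - \mu_{1_G} \right)^2} \right] \left\{ t - \left( \frac{1 - e^{-(\delta^{(1)} - \mu_{1_G})t}}{\delta^{(1)} - \mu_{1_G}} \right) \right\} \\
&+ \mu_{1_J} \left( \frac{1 - e^{-(\delta^{(1)} - \mu_{1_G})t}}{\delta^{(1)} - \mu_{1_G}} \right) \times \left[ \frac{\left\{ 2\left( \mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)} \right) + \mu_{2_G} \right\} \left( \mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)} \right)}{2\left( \delta^{(1)} - \mu_{1_G} \right)^2} + \frac{\mu_{2_{F_1}}\rho}{2\left( \delta^{(1)} - \mu_{1_G} \right)} \right] \\
&+ \mu_{1_G}\mu_{1_J} \left( \frac{1 - e^{-(\delta^{(1)} - \mu_{1_G})t}}{\delta^{(1)} - \mu_{1_G}} \right) \left( \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{\delta^{(1)} - \mu_{1_G}} \right),
\quad \text{for } \delta^{(1)} > \mu_{1_G},
\end{aligned}
\tag{4.27}
\]

and given $L^{(2)}_0 = 0$, the joint expectation of $\lambda^{(2)}_t$ and $L^{(2)}_t$ is given by

\[
\begin{aligned}
E\left( \lambda^{(2)}_t L^{(2)}_t \right)
={}& \left( a^{(2)}\delta^{(2)} + \mu_{1_{F_2}}\rho \right) \mu_{1_K} \left[ \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{\left( \delta^{(2)} - \mu_{1_H} \right)^2} \right] \left\{ t - \left( \frac{1 - e^{-(\delta^{(2)} - \mu_{1_H})t}}{\delta^{(2)} - \mu_{1_H}} \right) \right\} \\
&+ \mu_{1_K} \left( \frac{1 - e^{-(\delta^{(2)} - \mu_{1_H})t}}{\delta^{(2)} - \mu_{1_H}} \right) \times \left[ \frac{\left\{ 2\left( \mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)} \right) + \mu_{2_H} \right\} \left( \mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)} \right)}{2\left( \delta^{(2)} - \mu_{1_H} \right)^2} + \frac{\mu_{2_{F_2}}\rho}{2\left( \delta^{(2)} - \mu_{1_H} \right)} \right] \\
&+ \mu_{1_H}\mu_{1_K} \left( \frac{1 - e^{-(\delta^{(2)} - \mu_{1_H})t}}{\delta^{(2)} - \mu_{1_H}} \right) \left( \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{\delta^{(2)} - \mu_{1_H}} \right),
\quad \text{for } \delta^{(2)} > \mu_{1_H}.
\end{aligned}
\tag{4.28}
\]

Proof.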

See Appendix E.
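To make the covariance (4.25) concrete, the sketch below evaluates it from the stationary formulas (4.5), (4.6), (4.9), (4.20), (4.21) and (4.24). It is an illustration under assumptions, not the authors' code: the function names are hypothetical, the external jump marks are taken to be independent exponentials with mean 10 (so $\mu_{1,1_F} = 100$), the mean losses are set to 12 and 8, and the self-excited means 2.88 and 2.71 in the second call are arbitrary illustrative values.

```python
import math

def time_factor(kappa, t):
    """(1/kappa) * { t - (1 - exp(-kappa*t)) / kappa }, the time factor appearing in (4.24)."""
    return (t - (1.0 - math.exp(-kappa * t)) / kappa) / kappa

def stationary_cov(t, rho, a1, a2, d1, d2, mu_F1, mu_F2, mu_F11, mu_G, mu_H, mu_J, mu_K):
    """Cov(L1_t, L2_t) via (4.25); requires d1 > mu_G and d2 > mu_H."""
    k1, k2 = d1 - mu_G, d2 - mu_H
    c1, c2 = a1 * d1 + mu_F1 * rho, a2 * d2 + mu_F2 * rho
    m1, m2 = c1 / k1, c2 / k2                                # (4.5), (4.6)
    m12 = (c2 * m1 + c1 * m2 + mu_F11 * rho) / (k1 + k2)     # (4.9)
    joint = mu_J * mu_K * (
        0.5 * c1 * (m2 / k1) * t * t + time_factor(k1, t) * (m12 - c1 * m2 / k1)
        + 0.5 * c2 * (m1 / k2) * t * t + time_factor(k2, t) * (m12 - c2 * m1 / k2)
    )                                                        # (4.24)
    return joint - (mu_J * m1 * t) * (mu_K * m2 * t)         # (4.25) with (4.20), (4.21)

# Assumed inputs: a1 = a2 = 0, common decay 3, external arrival rate 3, t = 1.
common = dict(t=1.0, rho=3.0, a1=0.0, a2=0.0, d1=3.0, d2=3.0,
              mu_F1=10.0, mu_F2=10.0, mu_F11=100.0, mu_J=12.0, mu_K=8.0)
cov_shot_noise = stationary_cov(mu_G=0.0, mu_H=0.0, **common)   # no self-excited jumps
cov_contagion = stationary_cov(mu_G=2.88, mu_H=2.71, **common)  # illustrative contagion
```

With these assumed inputs the shot-noise covariance evaluates to about 2186.44, and switching on self-excitation makes the covariance considerably larger, because the decay margins $\delta^{(i)} - \mu$ shrink towards zero.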

Theorem 4.3.

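For the stationary distribution of the process $\lambda^{(1)}_t$ and $\lambda^{(2)}_t$, given $L^{(1)}_0 = L^{(2)}_0 = 0$, the second moment of the process $L^{(1)}_t$ is given by

\[
\begin{aligned}
E\left\{ \left( L^{(1)}_t \right)^2 \right\}
={}& 2\mu_{1_J} \Bigg[ \left( \frac{a^{(1)}\delta^{(1)} + \mu_{1_{F_1}}\rho}{2} \right) \mu_{1_J} \left\{ \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{(\delta^{(1)} - \mu_{1_G})^2} \right\} t^2 \\
&\quad - \left( a^{(1)}\delta^{(1)} + \mu_{1_{F_1}}\rho \right) \mu_{1_J} \left\{ \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{(\delta^{(1)} - \mu_{1_G})^3} \right\} \left\{ t - \left( \frac{1 - e^{-(\delta^{(1)} - \mu_{1_G})t}}{\delta^{(1)} - \mu_{1_G}} \right) \right\} \\
&\quad + \mu_{1_J} \left[ \frac{\left\{ 2\left( \mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)} \right) + \mu_{2_G} \right\} \left( \mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)} \right)}{2(\delta^{(1)} - \mu_{1_G})^3} + \frac{\mu_{2_{F_1}}\rho}{2(\delta^{(1)} - \mu_{1_G})^2} \right] \left\{ t - \left( \frac{1 - e^{-(\delta^{(1)} - \mu_{1_G})t}}{\delta^{(1)} - \mu_{1_G}} \right) \right\} \\
&\quad + \mu_{1_G}\mu_{1_J} \left\{ \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{(\delta^{(1)} - \mu_{1_G})^2} \right\} \left\{ t - \left( \frac{1 - e^{-(\delta^{(1)} - \mu_{1_G})t}}{\delta^{(1)} - \mu_{1_G}} \right) \right\} \Bigg] + \mu_{2_J} \left( \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{\delta^{(1)} - \mu_{1_G}} \right) t
\end{aligned}
\tag{4.29}
\]

and the second moment of the process $L^{(2)}_t$ is given by

\[
\begin{aligned}
E\left\{ \left( L^{(2)}_t \right)^2 \right\}
={}& 2\mu_{1_K} \Bigg[ \left( \frac{a^{(2)}\delta^{(2)} + \mu_{1_{F_2}}\rho}{2} \right) \mu_{1_K} \left\{ \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{(\delta^{(2)} - \mu_{1_H})^2} \right\} t^2 \\
&\quad - \left( a^{(2)}\delta^{(2)} + \mu_{1_{F_2}}\rho \right) \mu_{1_K} \left\{ \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{(\delta^{(2)} - \mu_{1_H})^3} \right\} \left\{ t - \left( \frac{1 - e^{-(\delta^{(2)} - \mu_{1_H})t}}{\delta^{(2)} - \mu_{1_H}} \right) \right\} \\
&\quad + \mu_{1_K} \left[ \frac{\left\{ 2\left( \mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)} \right) + \mu_{2_H} \right\} \left( \mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)} \right)}{2(\delta^{(2)} - \mu_{1_H})^3} + \frac{\mu_{2_{F_2}}\rho}{2(\delta^{(2)} - \mu_{1_H})^2} \right] \left\{ t - \left( \frac{1 - e^{-(\delta^{(2)} - \mu_{1_H})t}}{\delta^{(2)} - \mu_{1_H}} \right) \right\} \\
&\quad + \mu_{1_H}\mu_{1_K} \left\{ \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{(\delta^{(2)} - \mu_{1_H})^2} \right\} \left\{ t - \left( \frac{1 - e^{-(\delta^{(2)} - \mu_{1_H})t}}{\delta^{(2)} - \mu_{1_H}} \right) \right\} \Bigg] + \mu_{2_K} \left( \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{\delta^{(2)} - \mu_{1_H}} \right) t.
\end{aligned}
\tag{4.30}
\]

Proof.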

See Appendix F.

Corollary 4.2.

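For the stationary distribution of the process $\lambda^{(1)}_t$ and $\lambda^{(2)}_t$, given $L^{(1)}_0 = L^{(2)}_0 = 0$, the variance of the process $L^{(1)}_t$ is given by

\[
\begin{aligned}
\mathrm{Var}\left( L^{(1)}_t \right)
={}& 2\mu_{1_J} \Bigg[ \mu_{1_J} \left[ \frac{\left\{ 2\left( \mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)} \right) + \mu_{2_G} \right\} \left( \mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)} \right)}{2(\delta^{(1)} - \mu_{1_G})^3} + \frac{\mu_{2_{F_1}}\rho}{2(\delta^{(1)} - \mu_{1_G})^2} \right] \left\{ t - \left( \frac{1 - e^{-(\delta^{(1)} - \mu_{1_G})t}}{\delta^{(1)} - \mu_{1_G}} \right) \right\} \\
&\quad + \mu_{1_G}\mu_{1_J} \left\{ \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{(\delta^{(1)} - \mu_{1_G})^2} \right\} \left\{ t - \left( \frac{1 - e^{-(\delta^{(1)} - \mu_{1_G})t}}{\delta^{(1)} - \mu_{1_G}} \right) \right\} \\
&\quad - \left( a^{(1)}\delta^{(1)} + \mu_{1_{F_1}}\rho \right) \mu_{1_J} \left\{ \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{(\delta^{(1)} - \mu_{1_G})^3} \right\} \left\{ t - \left( \frac{1 - e^{-(\delta^{(1)} - \mu_{1_G})t}}{\delta^{(1)} - \mu_{1_G}} \right) \right\} \Bigg] + \mu_{2_J} \left( \frac{\mu_{1_{F_1}}\rho + a^{(1)}\delta^{(1)}}{\delta^{(1)} - \mu_{1_G}} \right) t
\end{aligned}
\tag{4.31}
\]

and the variance of the process $L^{(2)}_t$ is given by

\[
\begin{aligned}
\mathrm{Var}\left( L^{(2)}_t \right)
={}& 2\mu_{1_K} \Bigg[ \mu_{1_K} \left[ \frac{\left\{ 2\left( \mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)} \right) + \mu_{2_H} \right\} \left( \mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)} \right)}{2(\delta^{(2)} - \mu_{1_H})^3} + \frac{\mu_{2_{F_2}}\rho}{2(\delta^{(2)} - \mu_{1_H})^2} \right] \left\{ t - \left( \frac{1 - e^{-(\delta^{(2)} - \mu_{1_H})t}}{\delta^{(2)} - \mu_{1_H}} \right) \right\} \\
&\quad + \mu_{1_H}\mu_{1_K} \left\{ \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{(\delta^{(2)} - \mu_{1_H})^2} \right\} \left\{ t - \left( \frac{1 - e^{-(\delta^{(2)} - \mu_{1_H})t}}{\delta^{(2)} - \mu_{1_H}} \right) \right\} \\
&\quad - \left( a^{(2)}\delta^{(2)} + \mu_{1_{F_2}}\rho \right) \mu_{1_K} \left\{ \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{(\delta^{(2)} - \mu_{1_H})^3} \right\} \left\{ t - \left( \frac{1 - e^{-(\delta^{(2)} - \mu_{1_H})t}}{\delta^{(2)} - \mu_{1_H}} \right) \right\} \Bigg] + \mu_{2_K} \left( \frac{\mu_{1_{F_2}}\rho + a^{(2)}\delta^{(2)}}{\delta^{(2)} - \mu_{1_H}} \right) t.
\end{aligned}
\tag{4.32}
\]

Proof. See Appendix G.

The corresponding results for Lemmas 4.1-4.2, Theorems 4.2-4.3 and Corollary 4.2 can be obtained without using the case of the stationary distribution of the process $\lambda^{(i)}_t$ ($i = 1, 2$).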
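The stationary mean (4.20) and variance (4.31) are straightforward to evaluate once the first two moments of the jump and loss distributions are available. The sketch below is illustrative only. It assumes exponential external jumps with rate `alpha = 0.1`, log-gamma self-excited jumps (for which $Y = \psi(e^{W} - 1)$ with $W$ gamma distributed with shape $c$ and rate $\varsigma$) with `varsigma = 2.75`, and a generalised Pareto loss size with the moment formulas shown in the comments. These two parameter values and all function names are assumptions made for this example.

```python
import math

def stationary_mean(t, c, kappa, mu1_J):
    """(4.20): mu1_J * c / kappa * t, with c = mu1_F*rho + a*delta and kappa = delta - mu1_G."""
    return mu1_J * c / kappa * t

def stationary_var(t, c, kappa, rho, mu2_F, mu1_G, mu2_G, mu1_J, mu2_J):
    """(4.31), written term by term in the order of the displayed formula."""
    g = t - (1.0 - math.exp(-kappa * t)) / kappa
    bracket = (2.0 * c + mu2_G) * c / (2.0 * kappa ** 3) + mu2_F * rho / (2.0 * kappa ** 2)
    inner = (mu1_J * bracket * g
             + mu1_G * mu1_J * (c / kappa ** 2) * g
             - c * mu1_J * (c / kappa ** 3) * g)
    return 2.0 * mu1_J * inner + mu2_J * (c / kappa) * t

# Assumed illustrative inputs (a = 0, unit time horizon).
delta, rho, t, a = 3.0, 3.0, 1.0, 0.0
alpha = 0.1                                   # exponential external jumps: mean 1/alpha
mu1_F, mu2_F = 1.0 / alpha, 2.0 / alpha ** 2
psi, varsigma, shape = 1.0, 2.75, 3.0         # log-gamma self-excited jumps
r1 = (varsigma / (varsigma - 1.0)) ** shape   # E[exp(W)] for W ~ Gamma(shape, rate varsigma)
r2 = (varsigma / (varsigma - 2.0)) ** shape   # E[exp(2W)], finite because varsigma > 2
mu1_G = psi * (r1 - 1.0)
mu2_G = psi ** 2 * (r2 - 2.0 * r1 + 1.0)
omega, zeta, k = 3.0, 4.0, 6.0                # generalised Pareto loss size
mu1_J = zeta * k / (omega - 1.0)
mu2_J = zeta ** 2 * k * (k + 1.0) / ((omega - 1.0) * (omega - 2.0))

c = mu1_F * rho + a * delta
mean_dcp = stationary_mean(t, c, delta - mu1_G, mu1_J)
var_dcp = stationary_var(t, c, delta - mu1_G, rho, mu2_F, mu1_G, mu2_G, mu1_J, mu2_J)
mean_sn = stationary_mean(t, c, delta, mu1_J)                     # no self-excited jumps
var_sn = stationary_var(t, c, delta, rho, mu2_F, 0.0, 0.0, mu1_J, mu2_J)
premium_dcp = mean_dcp + math.sqrt(var_dcp)   # mean-standard deviation principle, loading 1
premium_sn = mean_sn + math.sqrt(var_sn)
```

Under these assumptions `mean_dcp` is about 3011.71 and `mean_sn` is 120, so the self-excited component multiplies the expected aggregate loss roughly twenty-five-fold here, driven by the small margin $\delta^{(1)} - \mu_{1_G}$.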

5. Insurance application

The proposed bivariate compound dynamic contagion process may be interpreted in the context of cyber insurance. An initial cyber attack/incident/shock (e.g. a computer virus) may be the magnitude of the joint contribution to the intensities for two different business risks/lines at the same time. In the bivariate compound dynamic contagion process, these are positive externally-excited joint jumps with distribution $F(x^{(1)}, x^{(2)})$, $x^{(1)} > 0$, $x^{(2)} > 0$, where the margins are $F_{X^{(1)}}$ and $F_{X^{(2)}}$, at the corresponding random times $\{T_{1,i}\}_{i=1,2,\cdots}$ following a Poisson process $M_t$ with constant rate $\rho > 0$. After-cyber attacks/incidents/shocks are positive self-excited jumps with distribution function $G(y)$, $y > 0$, at the corresponding random times $\{T_{2,j}\}_{j=1,2,\cdots}$ and further positive self-excited jumps with distribution function $H(z)$, $z > 0$, at the corresponding random times $\{T_{3,k}\}_{k=1,2,\cdots}$. The impact of each attack/incident/shock decays exponentially with constant rate $\delta$.

The number of losses/claims released from the first business risk/line, $N^{(1)}_t$, is driven by a series of after-cyber attacks/incidents/shocks $\{Y_j\}_{j=1,2,\cdots}$ and initial cyber attacks/incidents/shocks $\{X^{(1)}_i\}_{i=1,2,\cdots}$ via its intensity $\lambda^{(1)}_t$, and the number of losses/claims released from the second business risk/line, $N^{(2)}_t$, is driven by a series of after-cyber attacks/incidents/shocks $\{Z_k\}_{k=1,2,\cdots}$ and initial cyber attacks/incidents/shocks $\{X^{(2)}_i\}_{i=1,2,\cdots}$ via its intensity $\lambda^{(2)}_t$, where the initial cyber attacks/incidents/shocks $\{X^{(1)}_i, X^{(2)}_i\}_{i=1,2,\cdots}$ occur to two different business risks/lines simultaneously/collaterally with constant intensity $\rho$.

$L^{(1)}_t$ is the aggregate loss from the first business risk/line, where the loss/claim distribution function is given by $J(\xi^{(1)})$, $\xi^{(1)} > 0$, and $L^{(2)}_t$ is the aggregate loss from the second business risk/line, where the loss/claim distribution function is given by $K(\xi^{(2)})$, $\xi^{(2)} > 0$.

Set $a^{(1)} = 0$ in (2.1); then from (4.20) the expectation of the process $L^{(1)}_t$ is given by

\[
E\left( L^{(1)}_t \right) = \mu_{1_J} \left( \frac{\mu_{1_{F_1}}\rho}{\delta^{(1)} - \mu_{1_G}} \right) t, \quad \delta^{(1)} > \mu_{1_G},
\tag{5.1}
\]

and from (4.31) its variance is given by

\[
\begin{aligned}
\mathrm{Var}\left( L^{(1)}_t \right)
={}& 2\mu_{1_J} \Bigg[ \mu_{1_J} \left\{ \left( 2\mu_{1_{F_1}}\rho + \mu_{2_G} \right) \times \frac{\mu_{1_{F_1}}\rho}{2(\delta^{(1)} - \mu_{1_G})^3} + \frac{\mu_{2_{F_1}}\rho}{2(\delta^{(1)} - \mu_{1_G})^2} \right\} \left\{ t - \left( \frac{1 - e^{-(\delta^{(1)} - \mu_{1_G})t}}{\delta^{(1)} - \mu_{1_G}} \right) \right\} \\
&\quad + \mu_{1_G}\mu_{1_J} \left\{ \frac{\mu_{1_{F_1}}\rho}{(\delta^{(1)} - \mu_{1_G})^2} \right\} \left\{ t - \left( \frac{1 - e^{-(\delta^{(1)} - \mu_{1_G})t}}{\delta^{(1)} - \mu_{1_G}} \right) \right\} \\
&\quad - \mu_{1_J}\mu_{1_{F_1}}\rho \left\{ \frac{\mu_{1_{F_1}}\rho}{(\delta^{(1)} - \mu_{1_G})^3} \right\} \left\{ t - \left( \frac{1 - e^{-(\delta^{(1)} - \mu_{1_G})t}}{\delta^{(1)} - \mu_{1_G}} \right) \right\} \Bigg] + \mu_{2_J} \left( \frac{\mu_{1_{F_1}}\rho}{\delta^{(1)} - \mu_{1_G}} \right) t.
\end{aligned}
\tag{5.2}
\]

If there are no self-excited jumps, from (5.1) we have

\[
\mu_{1_J} \left( \frac{\mu_{1_{F_1}}\rho}{\delta^{(1)}} \right) t,
\tag{5.3}
\]

which is the expectation of the compound shot-noise Cox process, and can also be found in Dassios and Jang (2003) and Jang and Fu (2012). From (5.2), the corresponding variance is given by

\[
\left( \mu_{1_J} \right)^2 \frac{\mu_{2_{F_1}}\rho}{\left( \delta^{(1)} \right)^2} \left\{ t - \left( \frac{1 - e^{-\delta^{(1)} t}}{\delta^{(1)}} \right) \right\} + \mu_{2_J} \left( \frac{\mu_{1_{F_1}}\rho}{\delta^{(1)}} \right) t.
\tag{5.4}
\]

Let us now illustrate the calculation of the above expectations as cyber insurance premiums. For $F(x^{(1)})$, we use an exponential distribution, i.e.

\[
1 - e^{-\alpha x^{(1)}}, \quad \alpha > 0.
\]

For $G(y)$, we use a Loggamma distribution with probability density

\[
\frac{\varsigma^{c}}{\psi\, \Gamma(c)} \left\{ \ln\left( \frac{y}{\psi} + 1 \right) \right\}^{c-1} \left( \frac{y}{\psi} + 1 \right)^{-\varsigma - 1}, \quad \psi > 0,\ \varsigma > 0,\ c > 0.
\]

For $J(\xi^{(1)})$, we use a Pareto distribution with probability density

\[
\frac{\Gamma\left( \omega^{(1)} + k^{(1)} \right) \left\{ \zeta^{(1)} \right\}^{\omega^{(1)}} \left\{ \xi^{(1)} \right\}^{k^{(1)} - 1}}{\Gamma\left( \omega^{(1)} \right) \Gamma\left( k^{(1)} \right) \left( \zeta^{(1)} + \xi^{(1)} \right)^{\omega^{(1)} + k^{(1)}}}, \quad \omega^{(1)} > 0,\ \zeta^{(1)} > 0,\ k^{(1)} > 0.
\]

Example 5.1

We assume that the frequency of initial cyber attack/incident/shock (e.g. a computervirus) to single business risk/line is 3 per unit time period (say, per year) with the average29f contribution to intensity, 10. Once the virus is executed, it replicates itself by modifyingother computer programs causing a series of infection to this business risk/line IT system.The mean of contribution to intensity by after-cyber attacks/incidents/shocks (e.g. infec-tions), which are unknown at the arrival times of initial cyber attacks/incidents/shocks, isassumed to be 2 . δ (1) = 3 , ρ = 3 , α = 0 . , ψ = 1 , ς = 2 . c = 3 ,ω (1) = 3 , ζ (1) = 4 , k (1) = 6 and t = 1 . and from (5.1)-(5.4), their calculations are shown in Table 5.1. Table 5.1

Univariate compounddynamic contagion process Univariate compoundshot-noise Cox processMean 3 , .

71 120Variance 6 , , . , , . . Remark 5 : Table 5.1 shows that mean-standard deviation principle premium, 5 , . .

59 calculatedbased on (5.3)-(5.4). It is because after-cyber attacks/incidents/shocks (e.g. infections)driven by initial cyber attacks/incidents/shocks (e.g. a computer virus). In other words, µ G , which is the mean of after-cyber attacks/incidents/shocks, is the main driver to raisethe premium extremely higher than its counterpart. Hence the signiﬁcance of after-cyber at-tacks/incidents/shocks driven from an initial attack/incident/shock depends on its measure G ( y ).Due to the digitalisation of business and economic activities, all types of risk are touchedby cyber nowadays. To deal with new challenge insurers face - risks arising from cyberspace, they need new tools to measure these risks. The mean-standard deviation principlepremium value calculated based on (5.1)-(5.2) clearly justiﬁes that the univariate compounddynamic contagion process can be used for modelling aggregate losses/claims from cyberattacks/incidents. Set a (1) = 0 , a (2) = 0 and δ = δ (1) = δ (2) , then from (4.20) and (4.21), the expectation ofthe process L (1) t is given by E (cid:16) L (1) t (cid:17) = µ J (cid:18) µ F ρδ − µ G (cid:19) t, δ > µ G (5.5)and the expectation of the process L (2) t is given by30 (cid:16) L (2) t (cid:17) = µ K (cid:18) µ F ρδ − µ H (cid:19) t, δ > µ H . 
(5.6)Let us assume that an insurance company charges cyber insurance premium as follows: E (cid:16) L (1) t + L (2) t (cid:17) + φ r V ar (cid:16) L (1) t + L (2) t (cid:17) = E (cid:16) L (1) t (cid:17) + E (cid:16) L (2) t (cid:17) + φ r V ar (cid:16) L (1) t (cid:17) + V ar (cid:16) L (2) t (cid:17) + 2 Cov (cid:16) L (1) t , L (2) t (cid:17) , (5.7)where 0 ≤ φ ≤ φ r V ar (cid:16) L (1) t + L (2) t (cid:17) can be considered as a security loading.To calculate the covariance, we need to specify externally-excited joint jump distribution F ( x (1) , x (2) ), for which we oﬀer four choices of copulas: (1) the Farlie-Gumbel-Morgenstern(FGM) copula, (2) the Gaussian copula, (3) the t copula and (4) the Gumbel copula. TheFarlie-Gumbel-Morgenstern (FGM) family copula is given by C θ ( u , u ) = u u + θu u (1 − u )(1 − u ) , (5.8)where u ∈ [0 , u ∈ [0 ,

1] and θ ∈ [ − , C θ ( u , u ) = Φ Σ (cid:0) Φ − ( u ) , Φ − ( u ) (cid:1) , (5.9)where Φ − is the inverse cumulative distribution function (c.d.f.) of a standard univariatenormal, Φ Σ denotes the c.d.f. for a bivariate normal distribution with mean vector zeroand covariance matrix Σ, where Σ the 2 × θ otherwise, u ∈ [0 , u ∈ [0 ,

1] and θ ∈ [ − , t copula is given by C θ ( u , u ) = t ε, Σ (cid:0) t − ν ( u ) , t − ν ( u ) (cid:1) , (5.10)where t − ν is the inverse cumulative distribution function (c.d.f.) of a standard univariate t , t ε, Σ denotes the c.d.f. for a bivariate t distribution with mean vector zero and covariancematrix Σ, where Σ the 2 × θ otherwise, ε is the degrees of freedom, u ∈ [0 , u ∈ [0 ,

1] and θ ∈ [ − , C θ ( u , u ) = exp (cid:20)n ( − ln ( u )) − θ + ( − ln ( u )) − θ − o − θ (cid:21) , (5.11)where u ∈ [0 , u ∈ [0 ,

1] and θ ∈ [1 , ∞ ).For F (cid:0) x (2) (cid:1) , we also use an exponential distribution, i.e. F (cid:0) x (2) (cid:1) = 1 − e − βx (2) ( β > H ( z ), we use a Fr´echet distribution with probability density, i.e. ǫϕ (cid:18) zϕ (cid:19) − ǫ − e − ( zϕ ) − ǫ , ϕ > ǫ > K ( ξ (2) ) , we use another Pareto distri-bution with probability density, i.e.Γ (cid:0) ω (2) + k (2) (cid:1) n ζ (2) o ω (2) n ξ (2) o k (2) − Γ ( ω (2) ) Γ ( k (2) ) (cid:16) ζ (2) + ξ (2) (cid:17) ω (2) + k (2) , ω (2) > , ζ (2) > k (2) > joint cyber at-tack/incident/shock (e.g. a computer virus) to two business risks/lines is 3 per unit timeperiod (say, per year) with the same average of contributions to both intensities, 10. Oncethe virus is executed, it replicates itself by modifying other computer programs causing a se-ries of infection to two business risks/lines IT systems, separately. The mean of contributionto the ﬁrst & second business risk/line intensity by after-cyber attacks/incidents/shocks (e.g.infections), which are unknown at the arrival times of initial cyber attacks/incidents/shocks,is assumed to be 2 . . β = 0 . , ϕ = 2 , ǫ = 3, φ = 1 ,ω (2) = 4 , ζ (2) = 4 and k (2) = 6and using the parameter values in Example 5.1, let us now illustrate the calculations of cyberloss insurance premiums at diﬀerent value of θ, comparing their counterparts when there areno after cyber attacks/incidents/shocks. Example 5.2 (FGM copula)Due to the Farlie-Gumbel-Morgenstern (FGM) copulas simplicity and analytical tractabil-ity, we have µ F , = ∞ Z ∞ Z x (1) x (2) dF (cid:0) x (1) , x (2) (cid:1) = 1 αβ (cid:18) θ (cid:19) (5.12)to calculate E (cid:16) L (1) t L (2) t (cid:17) in Cov (cid:16) L (1) t , L (2) t (cid:17) . Cyber loss insurance premium calculationsare shown in Table 5.2, Table 5.2

Cyber loss insurance premium θ Bivariate compounddynamic contagion process Bivariate compoundshot-noise Cox process − .

74 331 . − . .

83 333 .

340 6487 .

92 335 . . .

01 337 .

381 6494 .

09 339 . E (cid:16) L (1) t (cid:17) = 3011 .

71 and

V ar (cid:16) L (1) t (cid:17) = 6 , , E (cid:16) L (2) t (cid:17) = 822 .

582 and

V ar (cid:16) L (2) t (cid:17) = 197 , E (cid:16) L (1) t (cid:17) = 120 and V ar (cid:16) L (1) t (cid:17) = 9 , . E (cid:16) L (2) t (cid:17) = 80 and V ar (cid:16) L (2) t (cid:17) = 4 , . . The covariances between L (1) t and L (2) t and their corresponding linear correlation coeﬃ-cients at diﬀerent value of θ , compared to their counterparts when there are no self-excitedjumps are shown in Table 5.3 and Table 5.4, respectively. Table 5.3

Cov (cid:16) L (1) t , L (2) t (cid:17) θ Bivariate compounddynamic contagion process Bivariate compoundshot-noise Cox process − .

16 1639 . − . .

35 1913 .

130 65497 .

54 2186 . . .

73 2459 .

741 81871 .

93 2733 . Table 5.4

Corr (cid:16) L (1) t , L (2) t (cid:17) θ Bivariate compounddynamic contagion process Bivariate compoundshot-noise Cox process − . . − . . . . . . . . . . Example 5.2 (Gaussian copula)For the Gaussian copulas, using the programming language R cyber loss insurance pre-mium calculations are shown in Table 5.5, 33 able 5.5

Cyber loss insurance premium θ Bivariate compounddynamic contagion process Bivariate compoundshot-noise Cox process − .

99 6 , .

08 324 . − . , .

91 329 .

360 6 , .

92 335 . . , .

08 342 . .

99 6 , .

20 350 . L (1) t and L (2) t and their corresponding linear correlation coeﬃ-cients at diﬀerent value of θ , compared to their counterparts when there are no self-excitedjumps are shown in Table 5.6 and Table 5.7, respectively. Table 5.6

Cov (cid:16) L (1) t , L (2) t (cid:17) θ Bivariate compounddynamic contagion process Bivariate compoundshot-noise Cox process − .

99 23 , .

72 786 . − . , .

69 1 , .

780 65 , .

54 2 , . . , .

84 3 , . .

99 130 , .

13 4 , . Table 5.7

Corr (cid:16) L (1) t , L (2) t (cid:17) θ Bivariate compounddynamic contagion process Bivariate compoundshot-noise Cox process − .

99 0 . . − . . . . . . . . .

99 0 . .

Example 5.3 ($t$ copula with $\varepsilon = 5$)

For the $t$ copula, cyber loss insurance premium calculations, carried out in the programming language R, are shown in Table 5.8.

Table 5.8

Cyber loss insurance premium

$\theta$    Bivariate compound dynamic contagion process    Bivariate compound shot-noise Cox process

− .

99 6 , .

08 324 . − . , .

53 329 .

780 6 , .

87 336 . . , .

76 342 . .

99 6 , .

21 350 .

The covariances between $L_t^{(1)}$ and $L_t^{(2)}$ and their corresponding linear correlation coefficients at different values of $\theta$, compared with their counterparts when there are no self-excited jumps, are shown in Table 5.9 and Table 5.10, respectively.

Table 5.9

$\mathrm{Cov}\left(L_t^{(1)}, L_t^{(2)}\right)$

$\theta$    Bivariate compound dynamic contagion process    Bivariate compound shot-noise Cox process

− .

99 23 , .

21 787 . − . , .

73 1 , .

400 68 , .

96 2 , . . , .

52 3 , . .

99 130 , .

18 4 , . Table 5.10

$\mathrm{Corr}\left(L_t^{(1)}, L_t^{(2)}\right)$

$\theta$    Bivariate compound dynamic contagion process    Bivariate compound shot-noise Cox process

− .

99 0 . . − . . . . . . . . .

99 0 . .

Example 5.4 (Gumbel copula)

For the Gumbel copula, cyber loss insurance premium calculations, carried out in the programming language R, are shown in Table 5.11.

Table 5.11

Cyber loss insurance premium

$\theta$    Bivariate compound dynamic contagion process    Bivariate compound shot-noise Cox process

1 .

001 6 , .

97 335 .

412 6 , .

88 347 .

305 6 , .

66 350 . , .

29 350 . , .

49 350 .

The covariances between $L_t^{(1)}$ and $L_t^{(2)}$ and their corresponding linear correlation coefficients at different values of $\theta$, compared with their counterparts when there are no self-excited jumps, are shown in Table 5.12 and Table 5.13, respectively.

Table 5.12

$\mathrm{Cov}\left(L_t^{(1)}, L_t^{(2)}\right)$

$\theta$    Bivariate compound dynamic contagion process    Bivariate compound shot-noise Cox process

1 .

001 65 , .

60 2 , .

122 115 , .

56 3 , .

865 128 , .

15 4 , . , .

78 4 , . , .

50 4 , . Table 5.13

$\mathrm{Corr}\left(L_t^{(1)}, L_t^{(2)}\right)$

$\theta$    Bivariate compound dynamic contagion process    Bivariate compound shot-noise Cox process

1 .

001 0 . . . . . . . . . .

Remark 6: Tables 5.2, 5.5, 5.8 and 5.11 show that, at each value of $\theta$, the cyber loss insurance premiums calculated using the bivariate compound dynamic contagion process are significantly higher than their counterparts calculated using the bivariate compound shot-noise Cox process. The covariances in Tables 5.3, 5.6, 5.9 and 5.12 also support this. This is because the two means for after-cyber attacks/incidents/shocks, i.e. $\mu_G$ and $\mu_H$, together with $\mu_F$, are involved in calculating the cyber loss insurance premiums using (5.7). Hence the significance of the impact of the two separate after-cyber attacks/incidents/shocks, driven by the initial joint cyber attack/incident/shock, depends on the two measures $G(y)$ and $H(z)$. It will be of interest to examine cyber loss insurance premiums using other joint measures for the initial cyber attack/incident/shock as well as other measures for the after-cyber attacks/incidents/shocks.

Remark 7: Tables 5.4, 5.7, 5.10 and 5.13 show the linear correlations between $L_t^{(1)}$ and $L_t^{(2)}$ calculated using the bivariate compound dynamic contagion process and the bivariate compound shot-noise Cox process at different values of $\theta$. The former are significantly lower than the latter, because the two separate after-cyber attacks/incidents/shocks weaken the linearity between $L_t^{(1)}$ and $L_t^{(2)}$. It will therefore also be of interest to compare the bivariate distribution for the compound dynamic contagion case with its counterpart, in particular in their two tail corners, by inverting the bivariate Laplace transform of the process $\left(L_t^{(1)}, L_t^{(2)}\right)$ shown in Section 3 with a bivariate fast Fourier transform.

To facilitate statistical analysis, further business applications and research, we close this section by providing the simulation algorithm for one sample path of the bivariate compound dynamic contagion process
$$\left( \begin{pmatrix} L_t^{(1)} \\ L_t^{(2)} \end{pmatrix}, \begin{pmatrix} N_t^{(1)} \\ N_t^{(2)} \end{pmatrix}, \begin{pmatrix} \lambda_t^{(1)} \\ \lambda_t^{(2)} \end{pmatrix} \right),$$
with $m$ jump times $\left\{T_1^{*(1)}, T_2^{*(1)}, \cdots, T_m^{*(1)}\right\}$ and $\left\{T_1^{*(2)}, T_2^{*(2)}, \cdots, T_m^{*(2)}\right\}$ in the process $\begin{pmatrix} \lambda_t^{(1)} \\ \lambda_t^{(2)} \end{pmatrix}$ (see Figure 1). This algorithm extends the algorithm in Section 5 of Dassios and Zhao (2011), which shows how to simulate the univariate dynamic contagion process.

Algorithm 5.1. (The bivariate compound dynamic contagion process simulation algorithm)

1. Set the initial conditions $T_0^{*(1)} = T_0^{*(2)} = 0$, $\lambda^{(1)}_{T_0^{*(1)+}} = \lambda_0^{(1)} > a^{(1)}$, $\lambda^{(2)}_{T_0^{*(2)+}} = \lambda_0^{(2)} > a^{(2)}$ and $i \in \{0, 1, 2, \ldots, m-1\}$.

2. Simulate the $(i+1)$th externally excited joint jump waiting time $E^*_{i+1}$ by
$$E^*_{i+1} = -\frac{1}{\rho} \ln U, \quad U \sim \mathrm{U}[0,1].$$

3. (i) Simulate the $(i+1)$th self-excited jump waiting time $S^{*(1)}_{i+1}$ by
$$S^{*(1)}_{i+1} = \begin{cases} S^{*(1)}_{1,i+1} \wedge S^{*(1)}_{2,i+1} & \left(d^{(1)}_{i+1} > 0\right) \\ S^{*(1)}_{2,i+1} & \left(d^{(1)}_{i+1} < 0\right), \end{cases}$$
where
$$d^{(1)}_{i+1} = 1 + \frac{\delta^{(1)} \ln U}{\lambda^{(1)}_{T_i^{*(1)+}} - a^{(1)}}, \quad U \sim \mathrm{U}[0,1],$$
$$S^{*(1)}_{1,i+1} = -\frac{1}{\delta^{(1)}} \ln d^{(1)}_{i+1}; \quad S^{*(1)}_{2,i+1} = -\frac{1}{a^{(1)}} \ln U, \quad U \sim \mathrm{U}[0,1].$$

(ii) Similarly, simulate the $(i+1)$th self-excited jump waiting time $S^{*(2)}_{i+1}$ by
$$S^{*(2)}_{i+1} = \begin{cases} S^{*(2)}_{1,i+1} \wedge S^{*(2)}_{2,i+1} & \left(d^{(2)}_{i+1} > 0\right) \\ S^{*(2)}_{2,i+1} & \left(d^{(2)}_{i+1} < 0\right), \end{cases}$$
where
$$d^{(2)}_{i+1} = 1 + \frac{\delta^{(2)} \ln U}{\lambda^{(2)}_{T_i^{*(2)+}} - a^{(2)}}, \quad U \sim \mathrm{U}[0,1],$$
$$S^{*(2)}_{1,i+1} = -\frac{1}{\delta^{(2)}} \ln d^{(2)}_{i+1}; \quad S^{*(2)}_{2,i+1} = -\frac{1}{a^{(2)}} \ln U, \quad U \sim \mathrm{U}[0,1].$$

4. Simulate the $(i+1)$th overall jump time $T^*_{i+1}$ by
$$T^*_{i+1} = T^*_i + S^{*(1)}_{i+1} \wedge S^{*(2)}_{i+1} \wedge E^*_{i+1},$$
where $T_0^* = T_0^{*(1)} = T_0^{*(2)} = 0$.

5. (i) The $(i+1)$th jump time for the process $\lambda_t^{(1)}$ is given by the overall jump time $T^*_{i+1}$ in Step 4, i.e.
$$T^{*(1)}_{i+1} = T^*_{i+1} = \begin{cases} T^*_i + S^{*(1)}_{i+1} & \left(S^{*(1)}_{i+1} \wedge S^{*(2)}_{i+1} \wedge E^*_{i+1} = S^{*(1)}_{i+1}\right) \\ T^*_i + E^*_{i+1} & \left(S^{*(1)}_{i+1} \wedge S^{*(2)}_{i+1} \wedge E^*_{i+1} = E^*_{i+1}\right), \end{cases}$$
where the case $S^{*(1)}_{i+1} \wedge S^{*(2)}_{i+1} \wedge E^*_{i+1} = S^{*(2)}_{i+1}$ is irrelevant to the $(i+1)$th jump time for the process $\lambda_t^{(1)}$.

(ii) Similarly, the $(i+1)$th jump time for the process $\lambda_t^{(2)}$ is given by the overall jump time $T^*_{i+1}$ in Step 4, i.e.
$$T^{*(2)}_{i+1} = T^*_{i+1} = \begin{cases} T^*_i + S^{*(2)}_{i+1} & \left(S^{*(1)}_{i+1} \wedge S^{*(2)}_{i+1} \wedge E^*_{i+1} = S^{*(2)}_{i+1}\right) \\ T^*_i + E^*_{i+1} & \left(S^{*(1)}_{i+1} \wedge S^{*(2)}_{i+1} \wedge E^*_{i+1} = E^*_{i+1}\right), \end{cases}$$
where the case $S^{*(1)}_{i+1} \wedge S^{*(2)}_{i+1} \wedge E^*_{i+1} = S^{*(1)}_{i+1}$ is irrelevant to the $(i+1)$th jump time for the process $\lambda_t^{(2)}$.

6. The change at jump time $T^{*(1)}_{i+1}$ in the intensity process $\lambda_t^{(1)}$ is given by
$$\lambda^{(1)}_{T^{*(1)+}_{i+1}} = \begin{cases} \lambda^{(1)}_{T^{*(1)-}_{i+1}} + Y_{i+1}, \quad Y_{i+1} \sim G(y) & \left(S^{*(1)}_{i+1} \wedge S^{*(2)}_{i+1} \wedge E^*_{i+1} = S^{*(1)}_{i+1}\right) \\ \lambda^{(1)}_{T^{*(1)-}_{i+1}} + X^{(1)}_{i+1} & \left(S^{*(1)}_{i+1} \wedge S^{*(2)}_{i+1} \wedge E^*_{i+1} = E^*_{i+1}\right) \end{cases}$$
and the change at jump time $T^{*(2)}_{i+1}$ in the intensity process $\lambda_t^{(2)}$ is given by
$$\lambda^{(2)}_{T^{*(2)+}_{i+1}} = \begin{cases} \lambda^{(2)}_{T^{*(2)-}_{i+1}} + Z_{i+1}, \quad Z_{i+1} \sim H(z) & \left(S^{*(1)}_{i+1} \wedge S^{*(2)}_{i+1} \wedge E^*_{i+1} = S^{*(2)}_{i+1}\right) \\ \lambda^{(2)}_{T^{*(2)-}_{i+1}} + X^{(2)}_{i+1} & \left(S^{*(1)}_{i+1} \wedge S^{*(2)}_{i+1} \wedge E^*_{i+1} = E^*_{i+1}\right), \end{cases}$$
where
$$\lambda^{(1)}_{T^{*(1)-}_{i+1}} = \left(\lambda^{(1)}_{T^{*(1)+}_{i}} - a^{(1)}\right) e^{-\delta^{(1)}\left(T^{*(1)}_{i+1} - T^{*(1)}_{i}\right)} + a^{(1)},$$
$$\lambda^{(2)}_{T^{*(2)-}_{i+1}} = \left(\lambda^{(2)}_{T^{*(2)+}_{i}} - a^{(2)}\right) e^{-\delta^{(2)}\left(T^{*(2)}_{i+1} - T^{*(2)}_{i}\right)} + a^{(2)},$$
$$\left(X^{(1)}_{i+1}, X^{(2)}_{i+1}\right) \sim F\left(x^{(1)}, x^{(2)}\right),$$
where the joint distribution of the vector $\left(X^{(1)}_{i+1}, X^{(2)}_{i+1}\right)$ is assumed to be of the form $C_\theta\left(F\left(x^{(1)}\right), F\left(x^{(2)}\right)\right)$ with $C_\theta$ being a given copula.

7. The change at jump time $T^{*(1)}_{i+1}$ in the point process $N_t^{(1)}$ is given by
$$N^{(1)}_{T^{*(1)+}_{i+1}} = \begin{cases} N^{(1)}_{T^{*(1)-}_{i+1}} + 1 & \left(S^{*(1)}_{i+1} \wedge S^{*(2)}_{i+1} \wedge E^*_{i+1} = S^{*(1)}_{i+1}\right) \\ N^{(1)}_{T^{*(1)-}_{i+1}} & \left(S^{*(1)}_{i+1} \wedge S^{*(2)}_{i+1} \wedge E^*_{i+1} = E^*_{i+1}\right) \end{cases}$$
and the change at jump time $T^{*(2)}_{i+1}$ in the point process $N_t^{(2)}$ is given by
$$N^{(2)}_{T^{*(2)+}_{i+1}} = \begin{cases} N^{(2)}_{T^{*(2)-}_{i+1}} + 1 & \left(S^{*(1)}_{i+1} \wedge S^{*(2)}_{i+1} \wedge E^*_{i+1} = S^{*(2)}_{i+1}\right) \\ N^{(2)}_{T^{*(2)-}_{i+1}} & \left(S^{*(1)}_{i+1} \wedge S^{*(2)}_{i+1} \wedge E^*_{i+1} = E^*_{i+1}\right). \end{cases}$$

8. The change at jump time $T^{*(1)}_{i+1}$ in the compound point process $L_t^{(1)}$ is given by
$$L^{(1)}_{T^{*(1)+}_{i+1}} = \begin{cases} L^{(1)}_{T^{*(1)-}_{i+1}} + \xi^{(1)}_{i+1}, \quad \xi^{(1)}_{i+1} \sim J\left(\xi^{(1)}\right) & \left(S^{*(1)}_{i+1} \wedge S^{*(2)}_{i+1} \wedge E^*_{i+1} = S^{*(1)}_{i+1}\right) \\ L^{(1)}_{T^{*(1)-}_{i+1}} & \left(S^{*(1)}_{i+1} \wedge S^{*(2)}_{i+1} \wedge E^*_{i+1} = E^*_{i+1}\right) \end{cases}$$
and the change at jump time $T^{*(2)}_{i+1}$ in the compound point process $L_t^{(2)}$ is given by
$$L^{(2)}_{T^{*(2)+}_{i+1}} = \begin{cases} L^{(2)}_{T^{*(2)-}_{i+1}} + \xi^{(2)}_{i+1}, \quad \xi^{(2)}_{i+1} \sim K\left(\xi^{(2)}\right) & \left(S^{*(1)}_{i+1} \wedge S^{*(2)}_{i+1} \wedge E^*_{i+1} = S^{*(2)}_{i+1}\right) \\ L^{(2)}_{T^{*(2)-}_{i+1}} & \left(S^{*(1)}_{i+1} \wedge S^{*(2)}_{i+1} \wedge E^*_{i+1} = E^*_{i+1}\right). \end{cases}$$
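The steps of Algorithm 5.1 can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions and not the authors' implementation: all jump sizes and losses are taken to be exponential (the numerical examples in the paper use other distributions), the joint external jumps are coupled through an FGM copula sampled by conditional inversion, and the waiting times are drawn from the current intensity levels after decay, which relies on the Markov property instead of tracking each process's own last jump time. The function and argument names (`fgm_uniform_pair`, `self_excited_wait`, `simulate_path`, `mean_ext`, `mean_self`, `mean_loss`) and all default parameter values are hypothetical.

```python
import math
import random


def fgm_uniform_pair(theta, rng):
    """Draw (u, v) from the FGM copula C(u, v) = u v [1 + theta (1 - u)(1 - v)]."""
    u, w = rng.random(), rng.random()
    a = theta * (1.0 - 2.0 * u)
    # Conditional inversion: solve a v^2 - (1 + a) v + w = 0 for the root in [0, 1].
    disc = max((1.0 + a) ** 2 - 4.0 * a * w, 0.0)
    v = 2.0 * w / ((1.0 + a) + math.sqrt(disc))
    return u, v


def self_excited_wait(lam, a, delta, rng):
    """Step 3: waiting time to the next self-excited jump from intensity level lam."""
    # S2 comes from the constant part a of the intensity (infinite if a = 0).
    s2 = -math.log(1.0 - rng.random()) / a if a > 0 else math.inf
    if lam <= a:
        return s2
    # S1 comes from the decaying part; it is defective, with no jump when d <= 0.
    d = 1.0 + delta * math.log(1.0 - rng.random()) / (lam - a)
    if d > 0:
        return min(-math.log(d) / delta, s2)
    return s2


def simulate_path(m, a=(0.5, 0.5), delta=(2.0, 2.0), rho=1.0, lam0=(1.0, 1.0),
                  theta=0.5, mean_ext=(1.0, 1.0), mean_self=(0.5, 0.5),
                  mean_loss=(1.0, 1.0), seed=0):
    """Return a list of (time, intensities, counts, losses) after each of m jumps."""
    rng = random.Random(seed)
    t, lam, n, loss = 0.0, list(lam0), [0, 0], [0.0, 0.0]
    path = [(t, tuple(lam), tuple(n), tuple(loss))]
    for _ in range(m):
        e = rng.expovariate(rho)                                   # Step 2
        s = [self_excited_wait(lam[k], a[k], delta[k], rng) for k in (0, 1)]
        w = min(e, s[0], s[1])                                     # Step 4
        t += w
        for k in (0, 1):
            # Both intensities decay exponentially towards a[k] until the jump time.
            lam[k] = (lam[k] - a[k]) * math.exp(-delta[k] * w) + a[k]
        if w == e:
            # Joint externally excited jump: both intensities move, no loss arrives.
            u, v = fgm_uniform_pair(theta, rng)
            lam[0] += -mean_ext[0] * math.log(max(1.0 - u, 1e-300))
            lam[1] += -mean_ext[1] * math.log(max(1.0 - v, 1e-300))
        else:
            # Self-excited jump in process k: intensity, count and loss all move.
            k = 0 if w == s[0] else 1
            lam[k] += rng.expovariate(1.0 / mean_self[k])
            n[k] += 1
            loss[k] += rng.expovariate(1.0 / mean_loss[k])
        path.append((t, tuple(lam), tuple(n), tuple(loss)))
    return path
```

Calling `simulate_path(200, seed=1)` returns 201 records, one for the initial state and one after each overall jump. The default parameters are chosen so that the mean self-excited jump size stays below the decay rate, which keeps the intensities from exploding in this sketch.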

6. Conclusion

Digitalisation of business and economic activities has shifted the risk landscape to cyber space. A cyber attack can trigger multiple, catastrophic and contagious losses to corporates and governments due to IT system interdependence. It is a real threat to all organisations, as both the number of cyber attacks and their complexity are rising. Cyber insurance can be purchased to cover economic and financial losses arising from cyber incidents. However, due to the complexity of cyber risks, i.e. multiple, catastrophic ... compound dynamic contagion process, which accommodates the interdependence of IT systems and the frequency and impact of cyber events.

[Figure 1 panels, plotted against time $T$: compound point processes $L_t^{(1)}$ and $L_t^{(2)}$; point processes $N_t^{(1)}$ and $N_t^{(2)}$; intensity processes $\lambda_t^{(1)}$ and $\lambda_t^{(2)}$.]

Figure 1: Simulated sample path of the bivariate compound dynamic contagion process: intensity processes $\left(\lambda_t^{(1)}, \lambda_t^{(2)}\right)$, point processes $\left(N_t^{(1)}, N_t^{(2)}\right)$, and compound point processes $\left(L_t^{(1)}, L_t^{(2)}\right)$. The FGM copula is considered with parameter $\theta$. The parameters for process 1 and process 2 are $\left(a^{(1)}, a^{(2)}, \rho, \delta^{(1)}, \delta^{(2)}; \alpha, \psi, \varsigma, c; \beta, \varphi, \epsilon; \theta; \omega^{(1)}, \omega^{(2)}, \zeta^{(1)}, \zeta^{(2)}, k^{(1)}, k^{(2)}; \lambda_0^{(1)}, \lambda_0^{(2)}\right)$ = (0 , , , , 3; 0 . , , , 3; 1 , . , 15; 0 . 5; 3 , , , , , 6; 0 . , .

Our numerical results confirm that cyber loss insurance premiums calculated using the bivariate compound dynamic contagion process are significantly higher than their counterparts calculated using a bivariate compound shot-noise Cox process. For that purpose, we provided moment-based insurance premium calculations using a log gamma distribution and a Fréchet distribution for the two separate self-excited jumps (i.e. after-cyber attacks/incidents/shocks), and two different exponential distributions and four different copulas (i.e. the Farlie-Gumbel-Morgenstern (FGM) family copula, the Gaussian family copula, the $t$ copula and the Gumbel copula) for the externally excited joint jumps (i.e. initial cyber attacks/incidents/shocks). Two Pareto distributions were used to represent catastrophic cyber losses from contagious cyber attacks. This suggests that the bivariate compound dynamic contagion process can be considered for modelling two aggregate cyber losses in order to calculate cyber loss insurance premiums that accommodate waves of events and take into account the critical aspects of IT system interdependence and the impact of cyber events.
For further research, we may consider the extension of dimension, other copulas, other measures for initial and after-cyber attacks/incidents/shocks, and other measures for cyber losses. As loss/claim sizes and after-cyber attacks/incidents/shocks (self-exciting jumps) could be correlated, considering the dependency between $\{Y_j\}_{j=1,2,\cdots}$ and $\left\{\Xi_j^{(1)}\right\}_{j=1,2,\cdots}$, and between $\{Z_k\}_{k=1,2,\cdots}$ and $\left\{\Xi_k^{(2)}\right\}_{k=1,2,\cdots}$, respectively, could be another object of further research.

Cyber attacks are likely to occur more often as all types of risk are touched by cyber space due to the digitalisation of business and economic activities, so the proposed bivariate compound dynamic contagion process can be an improved model for insurance companies to quantify cyber losses. The bivariate compound dynamic contagion process is also applicable to credit, insurance, market and other operational risks. We hope that what we have presented in this paper provides practitioners with feasible models to quantify cyber losses and to deal with a variety of problems in economics, finance and insurance.

Acknowledgements

Rosy Oh's research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (Grant No. 2019R1A6A1A11051177 and 2020R1I1A1A01067376).

References

Allianz (2016): Allianz Risk Barometer Top Business Risks.

Aït-Sahalia, Y., Cacho-Diaz, J. A. and Laeven, R. J. (2015): Modeling financial contagion using mutually exciting jump processes, Journal of Financial Economics, 117(3), 585-606.

Aït-Sahalia, Y., Laeven, R. J. and Pelizzon, L. (2014): Mutual excitation in Eurozone sovereign CDS, Journal of Econometrics, 183(2), 151-167.

Bauwens, L. and Hautsch, N. (2009): Modelling Financial High Frequency Data Using Point Processes, in: Handbook of Financial Time Series, T.G. Andersen, R.A. Davis, J.-P. Kreiss and T. Mikosch (eds), Springer.

Biener, C., Eling, M. and Wirfs, J. H. (2015): Insurability of Cyber Risk: An Empirical Analysis, The Geneva Papers on Risk and Insurance - Issues and Practice, 40(1), 131-158.

Bowsher, C. G. (2007): Modelling security market events in continuous time: Intensity based, multivariate point process models, Journal of Econometrics, 141(2), 876-912.

Chavez-Demoulin, V., Davison, A. C. and McNeil, A. J. (2005): Estimating Value-at-Risk: A point process approach, Quantitative Finance, 5(2), 227-234.

Dassios, A. and Jang, J. (2003): Pricing of catastrophe reinsurance & derivatives using the Cox process with shot noise intensity, Finance & Stochastics, 7(1), 73-95.

Dassios, A. and Zhao, H. (2011): A dynamic contagion process, Advances in Applied Probability, 43, 814-846.

Dassios, A. and Zhao, H. (2012): Ruin by Dynamic Contagion Claims, Insurance: Mathematics & Economics, 51(1), 93-106.

Dassios, A. and Zhao, H. (2017a): A generalized contagion process with an application to credit risk, International Journal of Theoretical and Applied Finance, 20(1), 1750003 (33 pages).

Dassios, A. and Zhao, H. (2017b): Efficient simulation of clustering jumps with CIR intensity, Operations Research, 65(6), 1494-1515.

Davis, M. H. A. (1984): Piecewise deterministic Markov processes: A general class of non diffusion stochastic models, J. R. Stat. Soc. B, 46, 353-388.

Dong, X. (2014): Compensators and diffusion approximation of point processes and applications, Ph.D Thesis, Imperial College London.

Embrechts, P., Liniger, T. and Lin, L. (2011): Multivariate Hawkes Processes: an Application to Financial Data, Journal of Applied Probability, Special Volume 48(A), 367-378.

Errais, E., Giesecke, K. and Goldberg, L. R. (2010): Affine Point Processes and Portfolio Credit Risk, SIAM Journal on Financial Mathematics, 1, 642-665.

Gao, X., Zhou, X. and Zhu, L. (2018): Transform analysis for Hawkes processes with applications in dark pool trading, Quantitative Finance, 18(2), 265-282.

Giesecke, K. and Kim, B. (2011): Risk Analysis of Collateralized Debt Obligations, Operations Research, 59(1), 32-49.

Hawkes, A. G. (1971a): Point spectra of some mutually exciting point processes, Journal of the Royal Statistical Society, Series B (Methodological), 33(3), 438-443.

Hawkes, A. G. (1971b): Spectra of some self-exciting and mutually exciting point processes, Biometrika, 58(1), 83-90.

Hawkes, A. G. and Oakes, D. (1974): A cluster process representation of a self-exciting process, Journal of Applied Probability, 11, 493-503.

Herath, H. S. B. and Herath, T. S. (2011): Copula Based Actuarial Model for Pricing Cyber-Insurance Policies, Insurance Markets and Companies: Analyses and Actuarial Computations, 2(1), 7-20.

Jang, J. and Dassios, A. (2013): A Bivariate Shot Noise Self-Exciting Process for Insurance, Insurance: Mathematics & Economics, 53(3), 524-532.

Lu, X. and Abergel, F. (2018): High-dimensional Hawkes processes for limit order books: modelling, empirical analysis and numerical calibration, Quantitative Finance, 18(20), 249-264.

McNeil, A. J., Frey, R. and Embrechts, P. (2005): Quantitative Risk Management: Concepts, Techniques and Tools, Princeton University Press, USA.

Mukhopadhyay, A., Chatterjee, S., Saha, D., Mahanti, A. and Sadhukhan, S. K. (2006): e-Risk management with insurance: A framework using copula aided Bayesian belief networks, in Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS'06), vol. 6, 126.1-126.6. Hoboken, NJ: IEEE.

Rambaldi, M., Bacry, E. and Lillo, F. (2017): The role of volume in order book dynamics: a multivariate Hawkes process analysis, Quantitative Finance, 17(7), 999-1020.

Stabile, G. and Torrisi, G. L. (2010): Risk processes with non-stationary Hawkes claims arrivals, Methodology and Computing in Applied Probability, 12(3), 415-429.

Xu, M. and Hua, L. (2017): Cybersecurity Insurance: Modeling and Pricing, Society of Actuaries, Schaumburg, Illinois.

Yang, S. Y., Liu, A., Chen, J. and Hawkes, A. (2018): Applications of a multivariate Hawkes process to joint modeling of sentiment and market return events, Quantitative Finance, 18(2), 295-310.

Appendix

A Proof of Theorem 4.1

Setting $\mathcal{A} f\left(\lambda^{(1)}, n^{(1)}, l^{(1)}, \lambda^{(2)}, n^{(2)}, l^{(2)}, t\right) = l^{(1)}$ in (2.3), we have
$$\mathcal{A}\, l^{(1)} = \mu_J \lambda^{(1)}.$$
As $L_t^{(1)} - L_0^{(1)} - \int_0^t \mathcal{A}\, l_s^{(1)}\, ds$ is a $\Im$-martingale, we have
$$E\left[ L_t^{(1)} - \int_0^t \mathcal{A}\, l_s^{(1)}\, ds \,\Big|\, \lambda_0^{(1)} \right] = L_0^{(1)}.$$
Hence
$$E\left(L_t^{(1)} \mid \lambda_0^{(1)}\right) = L_0^{(1)} + E\left[ \int_0^t \mathcal{A}\, l_s^{(1)}\, ds \,\Big|\, \lambda_0^{(1)} \right] = L_0^{(1)} + \mu_J \int_0^t E\left(\lambda_s^{(1)} \mid \lambda_0^{(1)}\right) ds$$
and (4.16) and (4.17) follow using (4.1) and (4.2) in Proposition 4.1. Similarly, (4.18) and (4.19) can be obtained.

B Proof of Corollary 4.1
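The closed form that this proof arrives at, $E\left(L_t^{(1)}\right) = \mu_J \frac{\mu_F \rho + a^{(1)}\delta^{(1)}}{\delta^{(1)} - \mu_G}\, t$, is straightforward to evaluate numerically. The following is a minimal Python sketch; the function and argument names are illustrative assumptions, and the guard $\delta > \mu_G$ is our assumption that the expression is only meaningful when its denominator is positive.

```python
def expected_aggregate_loss(mu_loss, mu_ext, rho, a, delta, mu_self, t):
    """Evaluate mu_loss * (mu_ext * rho + a * delta) / (delta - mu_self) * t.

    mu_loss : mean loss size (mu_J or mu_K)
    mu_ext  : mean externally excited jump size (mu_F)
    mu_self : mean self-excited jump size (mu_G or mu_H)
    """
    if delta <= mu_self:
        # Assumed guard: the formula needs a positive denominator.
        raise ValueError("requires delta > mu_self")
    mean_intensity = (mu_ext * rho + a * delta) / (delta - mu_self)
    return mu_loss * mean_intensity * t
```

For example, with hypothetical inputs `mu_loss=2.0, mu_ext=1.0, rho=0.5, a=0.5, delta=2.0, mu_self=1.0, t=10.0`, the mean intensity is $(0.5 + 1.0)/1.0 = 1.5$ and the expected aggregate loss is $2.0 \times 1.5 \times 10 = 30$. The same function gives $E\left(L_t^{(2)}\right)$ when called with $\mu_K$, $a^{(2)}$, $\delta^{(2)}$ and $\mu_H$.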

From the proof of Theorem 4.1, we have
$$E\left(L_t^{(1)} - L_0^{(1)} \mid \lambda_0^{(1)}\right) = \mu_J \int_0^t E\left(\lambda_s^{(1)} \mid \lambda_0^{(1)}\right) ds$$
and we also know $E\left(\lambda_t^{(1)}\right)$ from (4.5). Then, by assuming that $L_0^{(1)} = 0$, we have
$$E\left(L_t^{(1)}\right) = E\left(L_t^{(1)} - L_0^{(1)}\right) = \mu_J \int_0^t E\left(\lambda_s^{(1)}\right) ds = \mu_J \left( \frac{\mu_F \rho + a^{(1)} \delta^{(1)}}{\delta^{(1)} - \mu_G} \right) t.$$
Similarly, we have
$$E\left(L_t^{(2)}\right) = E\left(L_t^{(2)} - L_0^{(2)}\right) = \mu_K \int_0^t E\left(\lambda_s^{(2)}\right) ds = \mu_K \left( \frac{\mu_F \rho + a^{(2)} \delta^{(2)}}{\delta^{(2)} - \mu_H} \right) t.$$

C Proof of Lemma 4.1
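The generator identity that opens this proof can be checked term by term. The following is a sketch under the assumption that (2.3) has the usual piecewise deterministic form: each intensity $\lambda^{(k)}$ decays exponentially towards $a^{(k)}$ at rate $\delta^{(k)}$, joint external jumps arrive at rate $\rho$, and self-excited jumps in process $k$ arrive at rate $\lambda^{(k)}$. Applying this to $f = \lambda^{(1)} l^{(2)}$ gives

$$
\begin{aligned}
\mathcal{A}\left(\lambda^{(1)} l^{(2)}\right)
&= -\delta^{(1)}\left(\lambda^{(1)} - a^{(1)}\right) l^{(2)} \\
&\quad + \rho \int\!\!\int \left[\left(\lambda^{(1)} + x^{(1)}\right) l^{(2)} - \lambda^{(1)} l^{(2)}\right] dF\left(x^{(1)}, x^{(2)}\right) \\
&\quad + \lambda^{(1)} \int \left[\left(\lambda^{(1)} + y\right) l^{(2)} - \lambda^{(1)} l^{(2)}\right] dG(y) \\
&\quad + \lambda^{(2)} \int \left[\lambda^{(1)}\left(l^{(2)} + \xi\right) - \lambda^{(1)} l^{(2)}\right] dK(\xi) \\
&= -\delta^{(1)} \lambda^{(1)} l^{(2)} + a^{(1)}\delta^{(1)} l^{(2)} + \mu_F \rho\, l^{(2)} + \mu_G \lambda^{(1)} l^{(2)} + \mu_K \lambda^{(1)} \lambda^{(2)}.
\end{aligned}
$$

The four lines correspond to the decay of $\lambda^{(1)}$, the joint external jump, the self-excited jump in process 1 and the self-excited jump in process 2. Collecting the terms in $\lambda^{(1)} l^{(2)}$ and in $l^{(2)}$ reproduces the expression used in the proof.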

Setting $\mathcal{A} f\left(\Lambda^{(1)}, \lambda^{(1)}, n^{(1)}, l^{(1)}, \Lambda^{(2)}, \lambda^{(2)}, n^{(2)}, l^{(2)}, t\right) = \lambda^{(1)} l^{(2)}$ in (2.3), we have
$$\mathcal{A}\left(\lambda^{(1)} l^{(2)}\right) = -\left(\delta^{(1)} - \mu_G\right) \lambda^{(1)} l^{(2)} + \left(a^{(1)} \delta^{(1)} + \mu_F \rho\right) l^{(2)} + \mu_K \lambda^{(1)} \lambda^{(2)}.$$
As $\lambda_t^{(1)} L_t^{(2)} - \lambda_0^{(1)} L_0^{(2)} - \int_0^t \mathcal{A}\left(\lambda_s^{(1)} L_s^{(2)}\right) ds$ is a $\Im$-martingale, given $L_0^{(2)} = 0$ we have the ODE
$$\frac{dE\left(\lambda_t^{(1)} L_t^{(2)}\right)}{dt} = -\left(\delta^{(1)} - \mu_G\right) E\left(\lambda_t^{(1)} L_t^{(2)}\right) + \left(a^{(1)} \delta^{(1)} + \mu_F \rho\right) E\left(L_t^{(2)}\right) + \mu_K E\left(\lambda_t^{(1)} \lambda_t^{(2)}\right)$$
with the initial condition $E\left(\lambda_0^{(1)} L_0^{(2)}\right) = 0$. The solution of this ODE using (4.21) and (4.9) is given by (4.22). Similarly, we have (4.23).

D Proof of Theorem 4.2

Setting $\mathcal{A} f\left(\Lambda^{(1)}, \lambda^{(1)}, n^{(1)}, l^{(1)}, \Lambda^{(2)}, \lambda^{(2)}, n^{(2)}, l^{(2)}, t\right) = l^{(1)} l^{(2)}$ in (2.3), we have
$$\mathcal{A}\left(l^{(1)} l^{(2)}\right) = \mu_J \lambda^{(1)} l^{(2)} + \mu_K \lambda^{(2)} l^{(1)}.$$
As $L_t^{(1)} L_t^{(2)} - L_0^{(1)} L_0^{(2)} - \int_0^t \mathcal{A}\left(L_s^{(1)} L_s^{(2)}\right) ds$ is a $\Im$-martingale, we have
$$E\left[ L_t^{(1)} L_t^{(2)} - \int_0^t \mathcal{A}\left(L_s^{(1)} L_s^{(2)}\right) ds \,\Big|\, \lambda_0^{(1)}, \lambda_0^{(2)} \right] = L_0^{(1)} L_0^{(2)}.$$
Hence
$$
\begin{aligned}
E\left(L_t^{(1)} L_t^{(2)} \mid \lambda_0^{(1)}, \lambda_0^{(2)}\right)
&= L_0^{(1)} L_0^{(2)} + E\left[ \int_0^t \mathcal{A}\left(L_s^{(1)} L_s^{(2)}\right) ds \,\Big|\, \lambda_0^{(1)}, \lambda_0^{(2)} \right] \\
&= L_0^{(1)} L_0^{(2)} + \mu_J \int_0^t E\left(\lambda_s^{(1)} L_s^{(2)} \mid \lambda_0^{(1)}, \lambda_0^{(2)}\right) ds + \mu_K \int_0^t E\left(\lambda_s^{(2)} L_s^{(1)} \mid \lambda_0^{(1)}, \lambda_0^{(2)}\right) ds.
\end{aligned}
$$
Using (4.22) and (4.23), with $L_0^{(1)} = L_0^{(2)} = 0$, we have
$$E\left(L_t^{(1)} L_t^{(2)}\right) = \mu_J \int_0^t E\left(\lambda_s^{(1)} L_s^{(2)}\right) ds + \mu_K \int_0^t E\left(\lambda_s^{(2)} L_s^{(1)}\right) ds$$
and the result follows.

E Proof of Lemma 4.2
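The generator expression in this proof differs from the cross-process case because a self-excited jump in process 1 moves $\lambda^{(1)}$ and $l^{(1)}$ at the same time. A sketch of that single term, assuming the self-excited jump size $Y \sim G$ and the loss size $\xi \sim J$ are independent and that (2.3) assigns rate $\lambda^{(1)}$ to this jump:

$$
\begin{aligned}
\lambda^{(1)} \int\!\!\int \left[\left(\lambda^{(1)} + y\right)\left(l^{(1)} + \xi\right) - \lambda^{(1)} l^{(1)}\right] dG(y)\, dJ(\xi)
&= \lambda^{(1)} \left[ \mu_J \lambda^{(1)} + \mu_G l^{(1)} + \mu_G \mu_J \right] \\
&= \mu_J \left\{\lambda^{(1)}\right\}^2 + \mu_G \lambda^{(1)} l^{(1)} + \mu_G \mu_J \lambda^{(1)}.
\end{aligned}
$$

Adding the decay term $-\delta^{(1)}\left(\lambda^{(1)} - a^{(1)}\right) l^{(1)}$ and the external jump term $\mu_F \rho\, l^{(1)}$ then gives the four terms of $\mathcal{A}\left(\lambda^{(1)} l^{(1)}\right)$ stated in the proof.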

Setting $\mathcal{A} f\left(\Lambda^{(1)}, \lambda^{(1)}, n^{(1)}, l^{(1)}, \Lambda^{(2)}, \lambda^{(2)}, n^{(2)}, l^{(2)}, t\right) = \lambda^{(1)} l^{(1)}$ in (2.3), we have
$$\mathcal{A}\left(\lambda^{(1)} l^{(1)}\right) = -\left(\delta^{(1)} - \mu_G\right) \lambda^{(1)} l^{(1)} + \left(a^{(1)} \delta^{(1)} + \mu_F \rho\right) l^{(1)} + \mu_J \left\{\lambda^{(1)}\right\}^2 + \mu_G \mu_J \lambda^{(1)}.$$
As $\lambda_t^{(1)} L_t^{(1)} - \lambda_0^{(1)} L_0^{(1)} - \int_0^t \mathcal{A}\left(\lambda_s^{(1)} L_s^{(1)}\right) ds$ is a $\Im$-martingale, given $L_0^{(1)} = 0$ we have the ODE
$$\frac{dE\left(\lambda_t^{(1)} L_t^{(1)}\right)}{dt} = -\left(\delta^{(1)} - \mu_G\right) E\left(\lambda_t^{(1)} L_t^{(1)}\right) + \left(a^{(1)} \delta^{(1)} + \mu_F \rho\right) E\left(L_t^{(1)}\right) + \mu_J E\left[\left\{\lambda_t^{(1)}\right\}^2\right] + \mu_G \mu_J E\left(\lambda_t^{(1)}\right)$$
with the initial condition $E\left(\lambda_0^{(1)} L_0^{(1)}\right) = 0$. The solution of this ODE using (4.20), (4.14) and (4.5) is given by (4.27). Similarly, we have (4.28).

F Proof of Theorem 4.3

Setting $\mathcal{A} f\left(\Lambda^{(1)}, \lambda^{(1)}, n^{(1)}, l^{(1)}, \Lambda^{(2)}, \lambda^{(2)}, n^{(2)}, l^{(2)}, t\right) = \left\{l^{(1)}\right\}^2$ in (2.3), we have
$$\mathcal{A}\left\{l^{(1)}\right\}^2 = 2 \mu_J \lambda^{(1)} l^{(1)} + \mu_{2_J} \lambda^{(1)}.$$
As $\left\{L_t^{(1)}\right\}^2 - \left\{L_0^{(1)}\right\}^2 - \int_0^t \mathcal{A}\left\{L_s^{(1)}\right\}^2 ds$ is a $\Im$-martingale, given $L_0^{(1)} = 0$ we have
$$E\left[\left\{L_t^{(1)}\right\}^2\right] = 2 \mu_J \int_0^t E\left(\lambda_s^{(1)} L_s^{(1)}\right) ds + \mu_{2_J} \int_0^t E\left(\lambda_s^{(1)}\right) ds$$
and (4.29) follows using (4.27) and (4.5). Similarly, we have (4.30).

G Proof of Corollary 4.2

By $\mathrm{Var}\left\{L_t^{(1)}\right\} = E\left[\left\{L_t^{(1)}\right\}^2\right] - \left\{E\left(L_t^{(1)}\right)\right\}^2$