Uncovering the mesoscale structure of the credit default swap market to improve portfolio risk modelling
Ioannis Anagnostou, Tiziano Squartini, Drona Kandhai, Diego Garlaschelli
I. ANAGNOSTOU∗†‡, T. SQUARTINI§, D. GARLASCHELLI§¶ and D. KANDHAI†‡

† Computational Science Lab, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
‡ Quantitative Analytics, ING Bank, Foppingadreef 7, 1102 BD Amsterdam, The Netherlands
§ IMT School for Advanced Studies Lucca, Piazza San Francesco 19, 55100 Lucca, Italy
¶ Lorentz Institute for Theoretical Physics, Leiden University, Niels Bohrweg 2, 2333 CA Leiden, The Netherlands

(Received June 5, 2020)

One of the most challenging aspects in the analysis and modelling of financial markets, including Credit Default Swap (CDS) markets, is the presence of an emergent, intermediate level of structure standing in between the microscopic dynamics of individual financial entities and the macroscopic dynamics of the market as a whole. This elusive, mesoscopic level of organisation is often sought for via factor models that ultimately decompose the market according to geographic regions and economic industries. However, at a more general level the presence of mesoscopic structure might be revealed in an entirely data-driven approach, looking for a modular and possibly hierarchical organisation of the empirical correlation matrix between financial time series. The crucial ingredient in such an approach is the definition of an appropriate null model for the correlation matrix. Recent research showed that community detection techniques developed for networks become intrinsically biased when applied to correlation matrices. For this reason, a method based on Random Matrix Theory has been developed, which identifies the optimal hierarchical decomposition of the system into internally correlated and mutually anti-correlated communities. Building upon this technique, here we resolve the mesoscopic structure of the CDS market and identify groups of issuers that cannot be traced back to standard industry/region taxonomies, thereby being inaccessible to standard factor models. We use this decomposition to introduce a novel default risk model that is shown to outperform more traditional alternatives.
Keywords: Financial Time Series; Correlation Modelling; Correlation Matrices; Community Detection; Credit Default Swaps; Applications to Default Risk; Multi-factor Models
JEL Classification : C15; C53; D85; G11; L14
1. Introduction
The financial crisis of 2007-08 laid bare the downside of a highly interconnected financial system by evidencing that each link constitutes a channel through which shocks can propagate rapidly across markets and asset classes. This has been popularised by the too-interconnected-to-fail motto, stressing the impact that the failure of a highly central node would have on the rest of the network. Capturing financial complexity within models has been a major challenge since, for financial institutions and regulators alike.

∗ Corresponding author. Email: [email protected]

arXiv: [q-fin.RM]

Complexity-inspired models rest upon the evidence that economic and financial systems have many of the key properties characterising natural complex systems: they are composed of many heterogeneous units that interact with each other in a non-linear fashion, usually in the presence of feedback (Amaral and Ottino 2004, Mantegna et al.). A natural way of representing such interacting systems is via networks. Techniques from network theory have been used to study a variety of financial assets, including equities (Mantegna 1999, Mantegna et al.). One study (2016) proposed to use principal components analysis (PCA) to identify the common systematic factors that drive issuer returns. While the factors obtained using this approach are explanatory in a statistical sense, they usually lack an obvious economic meaning. Moreover, the stability of such factors over time might prove insufficient. A more common approach among practitioners is employing observable economic factors, representing the overall state of the economy or effects related to particular geographic regions or industry sectors. An example of this approach is described in Wilkens and Predescu (2017). Motivated by the absence of models encoding complexity-based 'inputs', our paper takes up the challenge by focusing on one of the key aspects of complex systems, i.e. the presence of a community structure.
Detecting the presence of communities means identifying clusters of units sharing some kind of similarity: when considering financial systems, this usually boils down to identifying sets of stocks sharing similar price dynamics. We focus on credit default swap (CDS) spread time series with the aim of identifying clusters whose similarities cannot be traced back to the standard, region- and sector-wise ones. To achieve this goal, we employ a recently-proposed community detection method that takes as input the empirical correlation matrix induced by a given set of time series (MacMahon and Garlaschelli 2015, Almog et al.).
2. Preliminary definitions

2.1. Credit default swaps
A credit default swap (CDS) is a financial contract in which a protection buyer, A, buys insurance from a protection seller, B, against the default of a reference entity, C. More specifically, regular coupon payments with respect to a contractual notional and a fixed rate, the CDS spread, are swapped with a payment of (1 − RR) times the notional in the event of default of C, where RR, known as the recovery rate, is a contract parameter representing the fraction of the investment which is assumed to be recovered in the event of default of C.

CDS spreads reflect the market participants' view on the probability of default. Thus, practitioners often rely on them to obtain market-implied parameters which are key inputs to their models. For instance, in the case of credit valuation adjustment (CVA), the default probabilities are obtained from CDS spreads. Apart from derivatives valuation, CDS spreads are used extensively for estimating correlations in market risk and capital models (Basel Committee on Banking Supervision 2016).

2.2. Description of the data-set
The raw CDS data-set is provided by Markit and consists of daily CDS spreads for a range of maturities covering the period between 1 January 2007 and 31 December 2016. Markit maintains a network of market makers who contribute quotes from their official books and records for thousands of entities on a daily basis. Using the contributed quotes, the daily CDS spreads for each entity, as well as the daily recovery rates used to price the contracts, are calculated. In addition, the data-set contains information on the names of the underlying reference entities, recovery rates, seniority of the debt on which the contract is priced, restructuring type, number of quote contributors, region, sector, average of the ratings from Standard & Poor's, Moody's and Fitch Group of each entity, and currency of the quote.

CVA is the difference between the risk-free portfolio value and the actual portfolio market value that takes into account the risk of a counterparty's default. Its magnitude depends on the probability of default of the counterparty, the future exposures of the underlying derivative or portfolio, and the loss given default.
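As an illustration of how practitioners back out default probabilities from quoted spreads, the sketch below uses the standard "credit triangle" approximation, which relates a flat CDS spread s to a constant risk-neutral hazard rate λ ≈ s/(1 − RR). This simplified bootstrap and the numbers in it are our own illustration, not the paper's calibration.

```python
import math

# "Credit triangle" approximation: lambda ~ s / (1 - RR), with s the CDS spread
# (as a decimal rate) and RR the recovery rate. Illustrative values only.

def implied_hazard_rate(spread_bps: float, recovery_rate: float) -> float:
    """Flat hazard rate implied by a CDS spread quoted in basis points."""
    return (spread_bps / 10_000.0) / (1.0 - recovery_rate)

def default_probability(spread_bps: float, recovery_rate: float,
                        horizon_years: float) -> float:
    """Risk-neutral default probability before the horizon, assuming a
    constant hazard rate (exponential survival)."""
    lam = implied_hazard_rate(spread_bps, recovery_rate)
    return 1.0 - math.exp(-lam * horizon_years)

# A 100 bp five-year CDS with the conventional 40% recovery assumption
pd_5y = default_probability(100.0, 0.40, 5.0)
```

A full term-structure bootstrap would instead fit piecewise-constant hazard rates to the quoted spread curve, but the flat-rate version above conveys how spread, recovery and default probability are tied together.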
Table 1.: Distribution of issuers across regions and sectors.

Region          N    | Sector                      N
Africa          5    | Basic materials             51
Asia            132  | Consumer goods              104
Eastern Europe  14   | Consumer services           94
Europe          239  | Energy                      52
India           7    | Financials                  164
Latin America   13   | Government                  64
Middle East     9    | Health Care                 33
North America   342  | Industrials                 86
Oceania         25   | Technology                  31
                     | Telecommunication services  42
                     | Utilities                   69
Since Markit data are characterised by a number of attributes, it is possible to have multiple CDS data series for the same issuer. In order to obtain unique, representative time series, we apply a set of selection criteria. First, we select the CDS spreads of entities for the five-year tenor for our analysis; it is observed that Markit's raw data is more complete for this tenor, since five-year CDS contracts are the most liquid. For the same reason, we select senior unsecured debt for corporates and foreign debt for sovereigns. Finally, we set up a hierarchy for the document clause/restructuring type which defines what constitutes a credit event, and select the series denominated in euro for the European issuers and in U.S. dollars for issuers from the rest of the world. Besides the above steps, we apply a couple of additional filtering steps to the CDS data to retain the most liquid quotes. After applying these pre-processing steps, we are left with a total of 786 entities and 2608 trading days. The distribution of issuers across regions and sectors is shown in Table 1.
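The selection criteria above can be sketched as a simple filter over a quote table. The field names and values below ("tenor", "tier", "ccy", "region") are hypothetical stand-ins; the actual Markit schema and the additional liquidity filters differ.

```python
# Toy Markit-like quote records; one issuer may appear with several attributes.
quotes = [
    {"ticker": "AAA", "tenor": "5Y", "tier": "SNRFOR", "ccy": "EUR", "region": "Europe"},
    {"ticker": "AAA", "tenor": "3Y", "tier": "SNRFOR", "ccy": "EUR", "region": "Europe"},
    {"ticker": "BBB", "tenor": "5Y", "tier": "SNRFOR", "ccy": "USD", "region": "N.Amer"},
    {"ticker": "BBB", "tenor": "5Y", "tier": "SUBLT2", "ccy": "USD", "region": "N.Amer"},
]

def keep(q: dict) -> bool:
    """Five-year tenor, senior unsecured debt, EUR quotes for European
    issuers and USD quotes for issuers from the rest of the world."""
    right_ccy = q["ccy"] == ("EUR" if q["region"] == "Europe" else "USD")
    return q["tenor"] == "5Y" and q["tier"] == "SNRFOR" and right_ccy

selected = [q for q in quotes if keep(q)]  # one representative series per issuer
```

In this toy table the filter keeps exactly one series per issuer, mirroring the goal of the paper's pre-processing step.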
3. Community detection on CDS correlation matrices

3.1. Methods

Basic notation.
In this subsection we introduce the basic notation and describe the community-detection method for time series introduced in MacMahon and Garlaschelli (2015). Financial markets are represented as a set of $N$ time series $X_1 \dots X_N$, each one encoding the temporally ordered activity of the $i$-th unit of the system over, say, $T$ time-steps, i.e.

$$X_i \equiv \{x_i(1) \dots x_i(T)\}, \quad \forall\, i. \qquad (1)$$

In our case $i$ is a credit issuer. The mutual interactions between the series considered above are summed up by an $N \times N$ correlation matrix, i.e. a matrix $C$ whose generic entry $C_{ij}$ reads

$$C_{ij} = \frac{\text{Cov}[X_i, X_j]}{\sqrt{\text{Var}[X_i] \cdot \text{Var}[X_j]}}, \quad \forall\, i,j \qquad (2)$$

i.e. the Pearson coefficient between series $i$ and $j$, with

$$\text{Cov}[X_i, X_j] = \overline{X_i \cdot X_j} - \overline{X_i} \cdot \overline{X_j}, \quad \forall\, i,j \qquad (3)$$

and

$$\text{Var}[X_i] = \overline{X_i^2} - \overline{X_i}^{\,2}, \quad \forall\, i; \qquad (4)$$

in the above equations, the bar is assumed to denote a temporal average, i.e.

$$\overline{X_i} = \frac{\sum_{t=1}^{T} x_i(t)}{T}, \quad \forall\, i, \qquad (5)$$

$$\overline{X_i^2} = \frac{\sum_{t=1}^{T} x_i^2(t)}{T}, \quad \forall\, i, \qquad (6)$$

$$\overline{X_i \cdot X_j} = \frac{\sum_{t=1}^{T} x_i(t) \cdot x_j(t)}{T}, \quad \forall\, i,j. \qquad (7)$$

As frequently done in order to filter out the inherent heterogeneity of time series, each series $X_i$ has been standardised by subtracting the temporal average $\overline{X_i}$ and dividing the result by the standard deviation $\sigma_i = \sqrt{\text{Var}[X_i]}$; in other words, $X_i$ has been redefined as $(X_i - \overline{X_i})/\sigma_i$ in such a way to ensure that $\overline{X_i} = 0$, $\text{Var}[X_i] = 1$ and $C_{ij} = \text{Cov}[X_i, X_j] = \overline{X_i \cdot X_j}$.

One of the most challenging problems in the field of complex systems is that of extracting information from the matrix $C$. As we are considering financial markets, we are interested in identifying sets of issuers sharing a similar CDS spread dynamics. More formally, this amounts to inspecting the community structure of the considered set of issuers, i.e.
the presence of (internally cohesive) modules of issuers.

Over the past years, several techniques to retrieve information regarding the modularity of multiple time series have been proposed (see MacMahon and Garlaschelli (2015) and references therein). One of the most promising approaches is that of applying community detection techniques to empirical correlation matrices: however, as existing methods are tailored on graphs, they suffer from statistical bias whenever applied 'as they are' to correlation matrices (see MacMahon and Garlaschelli (2015) and references therein). Recently, a method based on Random Matrix Theory (RMT) (Potters et al.) has been proposed, which identifies communities of time series that are correlated internally but anti-correlated with each other. In what follows, we will provide a brief explanation of the method. More details can be found in the original reference (MacMahon and Garlaschelli 2015).

Spectral analysis of random correlation matrices.
Let us start by inspecting the properties of random correlation matrices. The latter are constructed by considering $N$ completely random time series of length $T$ (more precisely, time series whose entries are independent, identically distributed random variables with zero mean and finite variance): the matrix encoding the correlations of this set of series is an $N \times N$ Wishart matrix, whose eigenvalues follow (in the limits $N \to +\infty$ and $T \to +\infty$ with $1 < T/N < +\infty$) the so-called Marcenko-Pastur distribution (Laloux et al.):

$$\rho(\lambda) = \frac{T}{N}\, \frac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{2\pi\lambda} \quad \text{if } \lambda_- \le \lambda \le \lambda_+ \qquad (8)$$

and $\rho(\lambda) = 0$ otherwise, with $\lambda_+$ and $\lambda_-$ being, respectively, the maximum and the minimum eigenvalue:

$$\lambda_\pm = \left[1 \pm \sqrt{\frac{N}{T}}\,\right]^2. \qquad (9)$$

Figure 1.: Left panel: smoothed eigenvalue density of the correlation matrix extracted from the CDS spreads of N = 786 issuers during the period 2007-2016 (T = 2608); for comparison we have plotted the Marcenko-Pastur density function (8) coming from N uncorrelated series of duration T. Inset: same plot, but including the highest eigenvalue corresponding to the 'market'. Right panel: eigenvalue density distribution of randomised CDS data. The figure shows the agreement between empirical (in blue) and random (in red) distributions once the original data are shuffled.

The method we are going to describe implements the idea that, while the eigenvalues of an empirical correlation matrix falling within these boundaries can be attributed to random noise, any eigenvalue smaller than $\lambda_-$ or larger than $\lambda_+$ is to be considered as representing some meaningful structure in the data. The result above leads to the possibility of expressing any empirical correlation matrix as the sum of two components, i.e.

$$C = C^{(r)} + C^{(s)} \qquad (10)$$

where $C^{(r)}$ is the random component, induced by the eigenvalues in the random bulk (i.e. all $i$ such that $\lambda_i \in [\lambda_-, \lambda_+]$) and reading

$$C^{(r)} \equiv \sum_{i\,:\,\lambda_- \le \lambda_i \le \lambda_+} \lambda_i\, |v_i\rangle\langle v_i| \qquad (11)$$

and $C^{(s)}$ is the structural component, aggregated from the remaining eigenvalues. The structural component $C^{(s)}$ can be further subdivided, upon considering the existence of the so-called market mode. As it has been shown in a number of previous studies (MacMahon and Garlaschelli 2015, Almog et al.), empirical correlation matrices of financial time series typically display a leading eigenvalue $\lambda_m$ which is orders of magnitude larger than the remaining ones; in case financial stocks are considered, such a leading eigenvalue embodies the common factor driving all the constituents of a given market. As the effect of $\lambda_m$ is that of pushing all nodes into the same community, we need to properly discount it, by focusing on the 'reduced' portion of the structured spectrum defined by the condition $\lambda_i \in (\lambda_+, \lambda_m)$. This leads to a further decomposition of the correlation matrix, i.e.

$$C = C^{(r)} + C^{(g)} + C^{(m)} \qquad (12)$$

where

$$C^{(m)} \equiv \lambda_m\, |v_m\rangle\langle v_m| \qquad (13)$$

represents the portion induced by the market mode and

$$C^{(g)} \equiv \sum_{i\,:\,\lambda_+ < \lambda_i < \lambda_m} \lambda_i\, |v_i\rangle\langle v_i| \qquad (14)$$

represents the portion filtered from both the random noise and the common factor. As a consequence, the correlations encoded in $C^{(g)}$ are neither at the individual level nor at the level of the entire market but at the level of groups of stocks (i.e. at the mesoscale in the network jargon). Remarkably, the eigenvectors contributing to $C^{(g)}$ have alternating signs, allowing for the detection of groups affected in a similar manner by some (other) common factors (MacMahon and Garlaschelli 2015, Potters et al.).

Figure 1 shows the eigenvalue density of our N = 786 CDS spreads (corresponding to T = 2608 daily log-returns for the period 2007-2016) together with the Marcenko-Pastur distribution coming from N totally uncorrelated series of duration T (shown in red). The maximum expected eigenvalue amounts to approximately λ+ = 1.27. The inset is the fully zoomed-out version of the plot, illustrating that the empirical correlation matrix has a maximum eigenvalue of about λm = 216 (i.e. the market mode), in addition to a few other eigenvalues lying between λ+ and λm. The eigenvector |vm⟩ corresponding to λm has all positive signs. We also inspect whether the system follows the Marcenko-Pastur distribution once the original data are shuffled. To this aim, we first randomly permute the entries of each time series separately, thus destroying the daily correlations; then, we check if the eigenvalues of the correlation matrix of the shuffled set of series follow the Marcenko-Pastur distribution: as fig. 1 shows, this is indeed the case.

Community detection on filtered correlation matrices.
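The spectral filtering of equations (8)-(14) can be sketched numerically. The block below is our own illustration on synthetic data (one global "market" factor plus two group factors); the data generation and parameter choices are assumptions, not the paper's data set or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 60, 1500
market = rng.standard_normal(T)              # common "market" factor
groups = rng.standard_normal((2, T))         # two mesoscale group factors
X = 1.0 * market + 0.8 * groups[np.arange(N) // 30] + rng.standard_normal((N, T))
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

C = (X @ X.T) / T                            # empirical correlation matrix

# Marcenko-Pastur bounds for the random bulk, eq. (9)
lam_plus = (1 + np.sqrt(N / T)) ** 2
lam_minus = (1 - np.sqrt(N / T)) ** 2

lam, V = np.linalg.eigh(C)                   # eigenvalues in ascending order
lam_m = lam[-1]                              # market mode: largest eigenvalue

def spectral_part(mask):
    """Sum of lambda_i |v_i><v_i| over the selected eigenvalues."""
    return (V[:, mask] * lam[mask]) @ V[:, mask].T

C_r = spectral_part((lam >= lam_minus) & (lam <= lam_plus))  # random bulk, eq. (11)
C_m = spectral_part(lam == lam_m)                            # market mode, eq. (13)
C_g = spectral_part((lam > lam_plus) & (lam < lam_m))        # mesoscale part, eq. (14)
```

In the paper's setting, X would hold the standardised daily log-returns of the 786 CDS series; note that modularity with the null model of equation (17) effectively compares C against C(r) + C(m), i.e. it clusters on C(g).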
Community detection is an active field of research within network theory. Among the many proposed approaches, the most popular one is based on the maximisation of the quantity known as modularity, a score function measuring the optimality of a given partition by comparing the empirical pattern of interconnections with the one predicted by a properly-defined benchmark model. It is defined as

$$Q(\sigma) = \frac{1}{\|A\|} \sum_{i,j=1}^{N} \left[a_{ij} - \langle a_{ij}\rangle\right] \delta(\sigma_i, \sigma_j) \qquad (15)$$

where $a_{ij}$ is the generic entry of the network adjacency matrix $A$, $\langle a_{ij}\rangle$ is the probability that nodes $i$ and $j$ establish a connection according to the chosen benchmark (i.e. the expectation of whether a link exists or not under some suitable null hypothesis), $\sigma$ is the $N$-dimensional vector encoding the information carried by a given partition (the $i$-th component, $\sigma_i$, denotes the module to which node $i$ is assigned) and the Kronecker delta $\delta(\sigma_i, \sigma_j)$ ensures that only the nodes within the same modules provide a positive contribution to the sum. The normalisation factor $\|A\| = \sum_{i,j=1}^{N} a_{ij}$ guarantees that $-1 \le Q(\sigma) \le 1$.

A naive approach to detecting communities of time series is that of treating $C$ as a weighted network and searching for communities by using the weighted extension of the modularity, defined by posing $a_{ij} \equiv C_{ij}$ and

$$\langle a_{ij}\rangle = \langle C_{ij}\rangle = \frac{s_i s_j}{W}, \quad \forall\, i,j \qquad (16)$$

with $s_i = \sum_{l=1}^{N} C_{il} = \text{Cov}[X_i, X_{\text{tot}}]$ and $W = \sum_{i,j=1}^{N} C_{ij} = \text{Var}[X_{\text{tot}}]$ (having defined $x_{\text{tot}}(t) = \sum_{i=1}^{N} x_i(t)$). According to MacMahon and Garlaschelli (2015), however, this approach may lead to biased results. As a consequence, a null model encoding the spectral properties of correlation matrices has been proposed, i.e.
$$\langle a_{ij}\rangle = \langle C_{ij}\rangle = C^{(r)}_{ij} + C^{(m)}_{ij}, \quad \forall\, i,j \qquad (17)$$

in turn leading to the following redefinition of modularity

$$Q(\sigma) = \frac{1}{\|C\|} \sum_{i,j=1}^{N} \left[C_{ij} - \left(C^{(r)}_{ij} + C^{(m)}_{ij}\right)\right] \delta(\sigma_i, \sigma_j) \qquad (18)$$

(with $\|C\| = \sum_{i,j=1}^{N} C_{ij}$), a 'novel' quantity whose maximisation outputs a partition of a given set of time series upon filtering out the random noise and the market component.

3.2. Results

Community detection.
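To make equation (18) concrete, here is a toy evaluation of the filtered modularity for two candidate partitions. The 4×4 matrix and the zero null-model matrices are illustrative stand-ins of our own; a real run would plug in the C(r) and C(m) computed from the spectrum and maximise Q with a Louvain-style algorithm, as the paper does.

```python
import numpy as np

def filtered_modularity(C, C_r, C_m, sigma):
    """Modularity of partition `sigma` under the null model <C_ij> = C(r)_ij + C(m)_ij."""
    sigma = np.asarray(sigma)
    delta = sigma[:, None] == sigma[None, :]       # Kronecker delta on module labels
    return float(((C - C_r - C_m) * delta).sum() / C.sum())

# Toy matrix with two internally correlated, mutually anti-correlated blocks;
# we pretend it is already filtered, so the null-model matrices are zero here.
C = np.array([[ 1.0,  0.5, -0.3, -0.3],
              [ 0.5,  1.0, -0.3, -0.3],
              [-0.3, -0.3,  1.0,  0.5],
              [-0.3, -0.3,  0.5,  1.0]])
Z = np.zeros_like(C)

q_good = filtered_modularity(C, Z, Z, [0, 0, 1, 1])  # matches the block structure
q_bad = filtered_modularity(C, Z, Z, [0, 1, 0, 1])   # mixes the blocks
```

The partition aligned with the anti-correlated blocks scores strictly higher, which is exactly the signal the maximisation exploits.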
We now proceed with the application of the methodology described in Section 3.1. Figure 2 shows the output of the Louvain algorithm when applied to the daily CDS spread data of 786 issuers, covering the period between 1 January 2007 and 31 December 2016. The algorithm makes use of the modularity defined in Equation (18), which is able to discount random as well as market-wide effects. With the CDS data, we obtain three mesoscopic communities, labelled A, B and C, characterised (as explained in the previous sections) by positive correlations within them and negative correlations between them. The pie charts represent the composition of each community according to the industry and region of the constituent issuers for Figure 2a and Figure 2b respectively. The colour legends can be found in Tables 2 and 3.

From the data in Figure 2a, it is apparent that every community contains a range of issuers from all industry sectors: no pattern of association between sector and community structure is immediately evident. For Community A, it can be seen that over a quarter of issuers are classified as Financials ( ), while another quarter of the community comprises issuers from the Consumer Goods ( ) and Consumer Services ( ) sectors; the rest of the industry sectors are represented with lower percentages, ranging from slightly less than 10% for Utilities ( ) to approximately 3% for Technology ( ). Issuers from Financials ( ) are slightly less frequent in Community B, accounting for approximately 15% of the total issuers, while the Consumer Goods ( ), Consumer Services ( ), and Industrials ( ) sectors follow with similar percentages; the rest of the community consists of issuers from all sectors with lower percentages, such as Energy ( ) and Utilities ( ) with slightly less than 10% each.
Almost 40% of Community C consists of Financials ( ) and Government ( ) issuers: interestingly, the proportion of Government ( ) issuers in Community C is significantly higher than the corresponding proportion in Communities A and B and almost twice as high as the one in the full sample of 786 issuers; issuers from the Consumer Goods ( ) and Industrials ( ) sectors are also quite frequent, constituting approximately a quarter of Community C.

We now turn to the relative breakdown of the issuers of each community according to region. As Figure 2b demonstrates, the identified communities display a high degree of overlap with region classification. Communities A and B are dominated by the regions Europe ( ) and North America ( ) respectively. On the other hand, Community C contains the bulk of issuers from Asia ( ).
Figure 2.: CDS market community structure. The figure shows the communities detected on CDS spread data for 786 issuers during the period 1 January 2007 - 31 December 2016. The communities were generated by the modified Louvain algorithm (MacMahon and Garlaschelli 2015) using daily log-returns. Individual communities are labelled A, B, C and the pie charts represent the relative composition of each community with respect to the sector (left panel) and region (right panel) of the component entities.

Table 2.: The 11 industry sectors with the colour representation used to highlight the sectors in the following figures.
Basic materials · Consumer goods · Consumer services · Energy · Financials · Government · Health care · Industrials · Technology · Telecommunication services · Utilities
(colour swatches not reproduced in this text version)
Table 3.: The 9 regions with the colour representation used to highlight the regions in the following figures.
Africa · Asia · Eastern Europe · Europe · India · Latin America · Middle East · North America · Oceania
(colour swatches not reproduced in this text version)
Hierarchical community structure.
The methodology described in Section 3.1 can be applied iteratively, to identify smaller subcommunities which may be nested inside larger communities. As the leading eigenvalue of the correlation matrix represents something that all issuers in the market have in common, the leading eigenvalue of the correlation matrix restricted to a specific individual community can be interpreted as something that all issuers in that community have in common: by washing out the effects of this 'community mode', one can detect the 'residual' modular structure (internal to each community). Naturally, subcommunities within each parent community are anticorrelated with each other while being positively correlated internally.

The results of a single iteration over the communities A, B, and C (summarised in Figure 2) are illustrated in Figures 3 and 4. It can be seen that the degree of overlap with the sector-based classification is higher for subcommunities than for communities. Moreover, the overlap with the region-based classification is even more pronounced.

Community A is divided into four subcommunities labelled A1 through A4. A1 and A3 contain a range of issuers from all sectors, while A2 is dominated by Financials ( ) and three quarters of A4 is constituted by Energy ( ) and Utilities ( ). In all four subcommunities, Europe ( ) is the leading region and there is a small percentage of issuers from North America ( ); moreover, Middle East ( ), Eastern Europe ( ) and Africa ( ) are represented exclusively in A1, while issuers from Asia ( ) can be found exclusively in A3.

Community B is split into five subcommunities. B1 is quite heterogeneous, including issuers from all different industry sectors.
B2 includes issuers from the Energy ( ) and Utilities ( ) sectors. Approximately half of B3 and B4 are constituted by Consumer Goods ( ) and Consumer Services ( ) issuers, with Industrials ( ) and Health Care ( ) accounting for another quarter of B4. Finally, B5 is dominated by Financials ( ) and Government ( ). From a geographic perspective, the subcommunities of B are all dominated by North America ( ), with some issuers from Europe ( ) being present in B1 and to a lesser extent in B3 and B5. B2 contains issuers only from North America ( ) and, with very few exceptions, the same holds for B3 and B4 as well. Most of the issuers from Latin America ( ) can be found in B5, with almost a third of this subcommunity being from that region.

Moving to the four subcommunities of Community C, almost half of the issuers in subcommunity C1 are equally split among Consumer Goods ( ) and Industrials ( ). Issuers from the Government ( ) and Financials ( ) sectors constitute more than half of subcommunities C2 and C4. Financials ( ) are frequent in subcommunity C3 as well, followed by Consumer Services ( ), i.e. the second most frequent industry sector. In terms of regional classification, although Community C is more heterogeneous than A and B, its subcommunities reveal that issuers from Asia ( ) are concentrated in subcommunities C1 and C4, with C4 containing almost all the issuers from India ( ). C3 consists of issuers from Oceania ( ), with a small number of issuers from Europe ( ). The bulk of the European issuers in community C is concentrated in C2, making up almost 50% of the subcommunity. The rest of C2 consists of issuers from Eastern Europe ( ) and North America ( ) with about 15% each, and Middle East ( ) with slightly over 10%.

In summary, these results provide important insights into the structure of the CDS market.
It is suggested that the three communities initially detected are quite heterogeneous as far as industry sectors are concerned, while they overlap to a greater extent when considering a region-based classification of issuers. Interestingly, although Communities A and B are dominated by Europe ( ) and North America ( ), some European and North American issuers are clustered with issuers from Asia ( ) and Oceania ( ) in Community C. At the second iteration of the method, some
Figure 3.: Subcommunity structure of the three communities of the CDS market by sector. Although at first glance the subcommunities seem quite heterogeneous with respect to sector, after close inspection it can be seen that some of them are dominated by certain sectors; for example, A2 and B5 are dominated by Financials ( ) and Government ( ), while A4 and B2 are dominated by Energy ( ) and Utilities ( ).

of the obtained subcommunities are dominated by certain sectors: for instance, A2 and B5 are dominated by Financials ( ) and Government ( ), while A4 and B2 are dominated by Energy ( ) and Utilities ( ). The overlap between regions and subcommunities is even better than the one between sectors and subcommunities. These results have implications for the management of portfolios of credit risky instruments, demonstrating that after global effects have been filtered out, default risk dependence is related to regional effects to a larger extent than to sectoral effects. This seems to be in line with previous evidence from equity markets (Heston and Rouwenhorst 1994), indicating that industry-specific effects are less significant than region effects. In addition,
Figure 4.: Subcommunity structure of the three communities of the CDS market by region. The overlap between region and subcommunities is better than the one between sectors and subcommunities.

as far as default risk is concerned, it appears that neither diversification over regions alone nor diversification over industries alone can achieve the optimal diversification benefits, a result that is aligned with Aretz and Pope (2013).
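One iteration of the 'community mode' removal described above can be sketched as follows. The synthetic four-issuer community and the helper name `remove_leading_mode` are our own illustration; the paper's full analysis re-runs the detection of Section 3.1 on the residual matrix.

```python
import numpy as np

def remove_leading_mode(C_sub):
    """Return a community-restricted correlation matrix with its leading
    eigenmode (the 'community mode') subtracted out."""
    lam, V = np.linalg.eigh(C_sub)       # ascending eigenvalues
    v = V[:, -1:]                        # leading eigenvector
    return C_sub - lam[-1] * (v @ v.T)

# Toy community of 4 issuers sharing one strong common factor
rng = np.random.default_rng(1)
X = 0.9 * rng.standard_normal((1, 200)) + 0.4 * rng.standard_normal((4, 200))
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
C_sub = (X @ X.T) / X.shape[1]

residual = remove_leading_mode(C_sub)
top_before = np.linalg.eigvalsh(C_sub)[-1]
top_after = np.linalg.eigvalsh(residual)[-1]
```

After the subtraction, the dominant shared factor is gone and any remaining structure in `residual` reflects nested subcommunities rather than the community-wide mode.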
Community detection: a temporal multiresolution analysis.
Once the mesoscopic organisation of the CDS market has been detected, one may wonder how stable (i.e. 'robust over time') this organisation is. This amounts to investigating the presence of discrepancies from the original community structure once the log-returns constituting the time series are sampled at different frequencies, e.g. weekly or monthly. For consistency with the results presented so far, the same period of ten years is considered, but our analysis is now focused on the sets of time series induced by the following array of five time resolutions, ranging from one day up to one month: Δt ∈ {1 day, …, 1 month}. (19)

Figure 5.: Left panel (5a): Divergence from original community structure for the five data sets of different sampling frequencies. It can be seen that the VI increases steadily when moving from the finest to the coarsest resolution but does not exceed 10%, indicating a high level of similarity between partitions. Right panel (5b): Heat map illustrating the value of VI between each pair of the five data sets of different sampling frequencies. From the heat map, it is apparent that there is a considerable degree of consistency across sampling frequencies, but the similarity degrades steadily when moving from the finest to the coarsest resolution.

To measure the effects of the temporal resolution, we employ the index known as variation of information (VI), an information-theoretic measure quantifying the difference between any two partitions. Two different partitions can be represented by two $N$-dimensional vectors $\vec{\sigma}_1$ and $\vec{\sigma}_2$ whose $i$-th component denotes the module to which node $i$ belongs. The (normalised) variation of information is defined as

$$VI(\vec{\sigma}_1 : \vec{\sigma}_2) = 1 - \frac{I(\vec{\sigma}_1 : \vec{\sigma}_2)}{H(\vec{\sigma}_1 : \vec{\sigma}_2)} \qquad (20)$$

where $I(\vec{\sigma}_1 : \vec{\sigma}_2)$ is the mutual information, which is defined as follows

$$I(\vec{\sigma}_1 : \vec{\sigma}_2) = \sum_{i=1}^{N} \sum_{j=1}^{N} p(\sigma_i, \sigma_j) \log\left[\frac{p(\sigma_i, \sigma_j)}{p(\sigma_i)\, p(\sigma_j)}\right] \qquad (21)$$

(with $p(\sigma_i)$ being the probability for a node to belong to $\sigma_i$ in partition 1, $p(\sigma_j)$ being the probability for a node to belong to $\sigma_j$ in partition 2, and $p(\sigma_i, \sigma_j)$ being the joint probability for a node to belong to $\sigma_i$ in partition 1 and to $\sigma_j$ in partition 2) and $H(\vec{\sigma}_1 : \vec{\sigma}_2)$ is the joint entropy, which is defined as follows

$$H(\vec{\sigma}_1 : \vec{\sigma}_2) = -\sum_{i=1}^{N} \sum_{j=1}^{N} p(\sigma_i, \sigma_j) \log\left[p(\sigma_i, \sigma_j)\right]. \qquad (22)$$

Notice that, unlike mutual information, the variation of information is a true metric since it satisfies the triangle inequality. The divergence from the community structure presented in Section 3.2, for each additional set of time series, is depicted in Figure 5a. From the chart, it is apparent that VI increases steadily when moving from the finest to the coarsest resolution but does not exceed 10% in any case, indicating a high level of similarity between partitions. The value of the VI between each pair of data-sets is, instead, shown in Figure 5b. This result provides some support for the conceptual premise that community structure does not (strongly) depend on the level of temporal resolution at which our data-set is considered.

Figure 6.: Multifrequency heat map showing the normalised co-occurrence of different pairs of issuers within the same community, for the same time period but over various temporal resolutions of the original time series. The issuers have been ordered using hierarchical clustering with average linkage to position issuers with a high degree of co-occurrence next to each other.

The VI-based analysis contributes to our understanding of the robustness of the community structure with respect to different temporal resolutions; however, we would like to extend this kind of analysis to the issuer level, by assessing how robust the assignment of issuers to communities is, measuring the frequency with which any two issuers are assigned to the same community over different temporal partitions. The results of this analysis are presented in Figure 6. The heat map shows how frequently issuers co-occur within the same communities across the time resolutions considered here. In case two issuers are found within the same community for each time resolution, the entry corresponding to the considered pair of issuers in the heat map is drawn in white, while if they are never found within the same community the entry is drawn in black. As the heat map shows, three 'hard cores' of issuers appear, indicating that the issuers belonging to them are assigned to the same community for the vast majority of the temporal resolutions; in addition, there are also a few 'soft issuers' who move across communities for different time resolutions, offering an explanation for the small variation observed in Figure 5.
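The normalised variation of information of equations (20)-(22) can be computed directly from two partition vectors. The sketch below is our own NumPy implementation on toy partitions, not the paper's code.

```python
import numpy as np

def variation_of_information(sigma1, sigma2):
    """Normalised VI = 1 - I/H between two partition label vectors."""
    sigma1, sigma2 = np.asarray(sigma1), np.asarray(sigma2)
    n = len(sigma1)
    _, inv1 = np.unique(sigma1, return_inverse=True)
    _, inv2 = np.unique(sigma2, return_inverse=True)
    # Joint probability p(a, b) that a node lies in module a of partition 1
    # and module b of partition 2
    joint = np.zeros((inv1.max() + 1, inv2.max() + 1))
    np.add.at(joint, (inv1, inv2), 1.0 / n)
    p1, p2 = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0                               # restrict sums to nonzero cells
    I = np.sum(joint[nz] * np.log(joint[nz] / np.outer(p1, p2)[nz]))
    H = -np.sum(joint[nz] * np.log(joint[nz]))   # joint entropy
    return 1.0 - I / H

vi_same = variation_of_information([0, 0, 1, 1], [0, 0, 1, 1])  # identical partitions
vi_diff = variation_of_information([0, 0, 1, 1], [0, 1, 0, 1])  # independent partitions
```

Identical partitions yield VI = 0, while statistically independent ones yield VI = 1, matching the normalisation used in Figures 5 and 7.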
Community detection: temporal stability.
One of the major challenges that managers measuring portfolio risk face is that of determining the appropriate period of history to use when estimating the correlations employed in their models. According to Loretan and English (2000), using a relatively short period of data might be dangerous: if the employed period is uncharacteristically stable, the estimated correlations may be lower than average, leading to excessive risk taking; on the other hand, if the interval used is relatively volatile, the resulting correlations can be unrealistically high, leading to excessive risk aversion. However, choosing a longer time series is not guaranteed to produce more reliable estimates either: the ever-evolving nature of financial markets makes it undesirable to rely on data from the distant past. In order to be confident that the results presented in Section 3.2 can provide useful insights for risk managers, it is necessary to study the time dynamics of the detected communities.

Figure 7.: Left panel (7a): Divergence from the initial community structure for the ensuing six-month windows. The values of VI do not exceed 10% for any of the six-month windows, and no particular trend can be observed. Right panel (7b): The heat map illustrates the value of VI between every pair of six-month time windows, as well as the VI between each window and the total ten-year period (rightmost column and bottom row). It can be seen that there is a high degree of community coherence over time.

To analyse the stability of the communities over the course of time, we employ a non-overlapping sliding window of six months. Figure 7a illustrates the divergence from the community structure detected using the first six-month window throughout the years. It can be seen that there are no significant fluctuations, with VI not exceeding 10% for any of the six-month periods. To further improve our understanding of the community coherence, in Figure 7b we provide a heat map showing the mutual VI between every pair of six-month windows. Each square in the matrix represents the value of VI between the i-th and j-th six-month periods, while the last row and column represent the value of VI between each six-month period and the partition obtained using the full ten-year sample. The results indicate that there was little movement of issuers between communities during the ten-year period. In addition, it can be seen that there is little difference between these periods and the community structure obtained when using the entire ten-year period.

Finally, having determined that the communities do not exhibit significant fluctuations over time, we use the same sliding window to examine the stability of the communities at the issuer level. In a similar fashion to the temporal multiresolution analysis, we plot a heat map containing the frequency with which each pair of issuers is found within the same community over the course of the ten-year time frame. As Figure 8 shows, pairs of issuers appearing in the same community for all the six-month windows have white entries in the heat map, while the entries corresponding to pairs of issuers never appearing within the same community are drawn in black. Upon closer inspection, it can be seen that the three communities detected using the full ten-year history appear to be tight-knit and unwavering, thus maintaining a high degree of coherence over the course of time. A small number of issuers moving fluidly from one community to another can still be observed.

Figure 8.: Coherence of communities over time. The heat map shows the frequency of co-occurrence of different pairs of issuers within the same community over time. The communities appear to be tight-knit and unwavering, maintaining coherence over the course of the ten-year time frame. The issuers have been ordered using hierarchical clustering with average linkage to position issuers with a high degree of co-occurrence next to each other.
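The co-occurrence frequencies behind the heat maps of Figures 6 and 8 amount to averaging an indicator of shared community membership over a list of partitions (one per time resolution, or one per six-month window). The helper below is a hypothetical sketch, not the authors' code:

```python
import numpy as np

def cooccurrence_matrix(partitions):
    """Fraction of partitions in which each pair of issuers falls within the
    same community; `partitions` is a list of equal-length label arrays."""
    n = len(partitions[0])
    freq = np.zeros((n, n))
    for labels in partitions:
        labels = np.asarray(labels)
        # outer comparison marks every pair assigned to the same community
        freq += (labels[:, None] == labels[None, :]).astype(float)
    return freq / len(partitions)

# three issuers over two partitions: issuers 0 and 1 are always together
C = cooccurrence_matrix([[0, 0, 1], [2, 2, 2]])
print(C[0, 1], C[0, 2])  # 1.0 0.5
```

Entries equal to 1 correspond to the white cells ('hard cores') and entries equal to 0 to the black ones; the row/column ordering used for display can then be obtained with average-linkage hierarchical clustering.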
4. Default risk charge model

4.1. Model specification
Consider a portfolio of $m$ issuers, indexed by $i = 1, \ldots, m$, and a fixed time horizon of $T = 1$ year. The overall portfolio loss is modelled by a random variable $L$, defined as the sum of the individual losses on issuers' defaults, i.e. $L = \sum_{i=1}^{m} L_i$, with

$$L_i = q_i e_i Y_i, \qquad (23)$$

where $L_i$ denotes the loss on issuer $i$, with $e_i$ and $q_i$ being, respectively, the exposure at default and the loss given default of issuer $i$, and $Y_i$ being the random default indicator taking the value 1 if issuer $i$ defaults before time $T$ and 0 otherwise. In order to define the probability distributions of the $L_i$'s, as well as their dependence structure, we rely on a factor model approach, descending from the structural model of Merton (Merton 1974), which is widely used for portfolio default risk modelling by regulators and financial institutions alike. Notable examples of this approach include the Asymptotic Single Risk Factor (ASRF) model (Gordy 2003), which is at the heart of the Basel II credit risk capital charge, as well as industrial adaptations of the Merton model such as the CreditMetrics (JP Morgan 1997) and KMV models (Bohn and Kealhofer 2001, Crosbie and Bohn 2002).

Let us introduce a random variable $X_i$ representing issuer $i$'s creditworthiness. In the same spirit as Merton's structural model, we specify that default occurs before time $T$ if the value of $X_i$ lies below a threshold $d_i$, or equivalently:

$$Y_i := \mathbb{1}_{[-\infty, d_i]}(X_i), \qquad (24)$$

where $\mathbb{1}_A(\cdot)$ is the indicator function of set $A$. Hence, modelling the $Y_i$'s boils down to modelling the creditworthiness indices $X_i$ with $i = 1, \ldots, m$, which are linearly dependent on a vector $F$ of $p < m$ systematic factors satisfying $F \sim N_p(0, \Omega)$. Issuer $i$'s creditworthiness index is assumed to be driven by an issuer-specific combination $\tilde{F}_i = \alpha_i^{\top} F$ of the systematic factors:

$$X_i = \sqrt{\beta_i}\, \tilde{F}_i + \sqrt{1 - \beta_i}\, \epsilon_i, \qquad (25)$$

where $\tilde{F}_i$ and $\epsilon_1, \ldots, \epsilon_m$ are independent, standard normal variables (i.e.
$\tilde{F}_i \sim N(0, 1)\ \forall i$ and $\epsilon_i \sim N(0, 1)\ \forall i$), with the latter modelling the idiosyncratic risk. The coefficient $\beta_i$ can be seen as a measure of the sensitivity of $X_i$ to systematic risk, as it represents the proportion of the variation of $X_i$ that is explained by the systematic factors. The assumption that $\mathrm{Var}[\tilde{F}_i] = 1$ implies that $\alpha_i^{\top} \Omega \alpha_i = 1$ for all $i$. The correlations between asset returns are given by

$$\rho(X_i, X_j) = \mathrm{Cov}[X_i, X_j] = (1 - \beta_i)\,\mathbb{1}_{\{i=j\}} + \sqrt{\beta_i \beta_j}\, \mathrm{Cov}[\tilde{F}_i, \tilde{F}_j] = (1 - \beta_i)\,\mathbb{1}_{\{i=j\}} + \sqrt{\beta_i \beta_j}\, \alpha_i^{\top} \Omega \alpha_j \qquad (26)$$

(since $\tilde{F}_i$ and $\epsilon_1, \ldots, \epsilon_m$ are independent, standard normal variables and $\mathrm{Var}[X_i] = 1$). In order to set up the model we need to determine $\alpha_i$ and $\beta_i$ for each issuer, as well as $\Omega$ (while ensuring that $\alpha_i^{\top} \Omega \alpha_i = 1$).

We choose $d_i$ such that $P(Y_i = 1) = p_i$, where $p_i$ is the marginal probability of default of issuer $i$. As a result, $d_i = F_{X_i}^{-1}(p_i)$, with $F_{X_i}(\cdot)$ being the cumulative distribution function of $X_i$. Given the normality of $X_i$, it follows that $d_i = \Phi^{-1}(p_i)$, with $\Phi^{-1}(\cdot)$ denoting the inverse of the standard normal cumulative distribution function. The portfolio loss can then be written as follows:

$$L = \sum_{i=1}^{m} q_i e_i\, \mathbb{1}_{[-\infty, \Phi^{-1}(p_i)]}\!\left(\sqrt{\beta_i}\, \tilde{F}_i + \sqrt{1 - \beta_i}\, \epsilon_i\right). \qquad (27)$$

For $p = 1$, the specification above is equivalent to the ASRF model. In this model, the single systematic factor affecting all issuers is usually interpreted as the state of the economy and the correlation coefficients are prescribed by the regulator. In multi-factor models ($p \geq 2$), […]

4.2. Model calibration
Models used by banks for DRC calculations are required to account for systematic risk via multiple systematic factors of two different types (Basel Committee on Banking Supervision 2016, Paragraph 186(b)). For the first systematic factor, we consider a global factor that is common to all issuers, reflecting the overall state of the economy. We adopt this approach due to relevant literature suggesting a strong dependence of changes in default risk on global effects (Aretz and Pope 2013). For the second systematic factor we consider factors representing industry and region effects, as well as community and subcommunity effects. Even though a model with three types of systematic factors would not be in line with the regulatory requirements for the calculation of DRC, for comparison purposes we also consider a model with global, industry, and region systematic factors, an approach commonly adopted in the industry for the calculation of IRC.

In addition to the types of systematic factors, the regulatory rule-set specifies that correlations must be calibrated using credit spreads or listed equity prices over a period of at least ten years that includes a period of stress. We calibrate the model presented in Section 4.1 using CDS spreads covering the period 1 January 2007 – 31 December 2016, which includes the 'stressed' period between 2007 and 2009. In our model setting, the liquidity horizon is set to one year and, as a result, the correlations should be measured over the same horizon. However, non-overlapping

Table 4.: The table provides the statistics for the $R^2$ between the individual issuers and the systematic factor(s). The model estimation is based on the following settings: global factor only (Model 1); global and industry factors (Model 2); global and region factors (Model 3); global, region, and industry factors (Model 4); global and community factors (Model 5); and global and subcommunity factors (Model 6).
Industry, region, community, and subcommunity factors are defined as cross-sectional averages at each time point and taken as already decomposed into a global factor and residuals. The estimation is based on non-overlapping monthly log-returns and conducted from January 2007 to December 2016, a recent ten-year period which includes the 'stressed' period between 2007 and 2009.

Individual R^2          Model 1:      Model 2:      Model 3:      Model 4:         Model 5:      Model 6:
versus                  Global        Global and    Global and    Global, region,  Global and    Global and
systematic factor(s)    factor only   industry      region        and industry     community     subcommunity
                                      factors       factors       factors          factors       factors
Average                 48.1%         54.5%         54.2%         58.1%            55.4%         58.2%
SD                      15.8%         16.4%         17.6%         17.5%            16.7%         18.0%
Minimum                 0.0%          0.6%          0.0%          1.4%             1.4%          0.8%
Maximum                 82.0%         84.1%         96.3%         96.5%            88.4%         92.6%

annual log-returns from a ten-year history contain only nine points, which is not sufficient to yield a reliable correlation estimate. If overlapping log-returns are considered instead, the sample size can be sufficiently large, but the linear relation between the data series can be distorted. Moreover, a certain bias can be introduced into the correlation estimate via auto-correlation, which usually leads to over-estimation. Instead of using overlapping annual returns, we chose to use monthly non-overlapping returns over the ten-year period. This approach leads to a sufficient number of data points for correlation estimation and can be seen as a reasonable compromise. The implied hypothesis is that correlations measured over monthly and annual horizons are interchangeable and can be used as a predictor for future one-year correlations. According to Wilkens and Predescu (2017), this assumption can be questioned, but it is hard to reject from a statistical point of view if one takes into account the uncertainty of the correlation measurement itself.

We start by scaling each individual time series to have zero mean and unit variance.
At each time point, global ($X_{G,t}$), industry ($X_{I(j),t}$), region ($X_{R(k),t}$), community ($X_{C(l),t}$) and subcommunity ($X_{S(n),t}$) returns are derived from the corresponding cross-section of the issuer returns. All the resulting factor time series have zero mean. The dependence of the region, industry, community and subcommunity factors on the global returns is explored by running the following linear regression models:

$$X_{I(j),t} = \gamma_{I(j)} X_{G,t} + \varepsilon_{I(j),t},$$
$$X_{R(k),t} = \gamma_{R(k)} X_{G,t} + \varepsilon_{R(k),t},$$
$$X_{C(l),t} = \gamma_{C(l)} X_{G,t} + \varepsilon_{C(l),t},$$
$$X_{S(n),t} = \gamma_{S(n)} X_{G,t} + \varepsilon_{S(n),t}, \qquad (28)$$

where $\gamma_{I(j)}$, $\gamma_{R(k)}$, $\gamma_{C(l)}$ and $\gamma_{S(n)}$ are coefficients weighing the global factor and $\varepsilon_{I(j),t}$, $\varepsilon_{R(k),t}$, $\varepsilon_{C(l),t}$ and $\varepsilon_{S(n),t}$ are the industry-, region-, community- and subcommunity-specific residuals, respectively. The full regression results on the basis of Equation (28) for the period between January 2007 and December 2016 can be found in Table A1 in Appendix A. The majority of the factor returns move in line with the global returns, with coefficients not significantly different from one. In addition, the proportion of variance explained by the global returns is high (as indicated by the values of $R^2$, between 63% and 96%), highlighting the leading role of the global factor.

Turning now to the case of a single issuer, it is important to note that by regressing the industry ($X_{I(j)}$), region ($X_{R(k)}$), community ($X_{C(l)}$) and subcommunity ($X_{S(n)}$) returns against the global returns ($X_G$), we have essentially orthogonalised the rest of the factors relative to the global factor. Hence, in addition to the global returns ($X_G$), we use the residuals $\varepsilon_{I(j)}$, $\varepsilon_{R(k)}$, $\varepsilon_{C(l)}$, and $\varepsilon_{S(n)}$ from Equation (28) as explanatory factors for the returns of a single issuer, representing industry, region, community, and subcommunity effects respectively.
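As an illustration of this construction (with invented data; the toy industry labels and sizes are not those of the paper), the factors can be built as cross-sectional averages and orthogonalised against the global factor via the regression of Equation (28):

```python
import numpy as np

rng = np.random.default_rng(2)
T, n_issuers, n_industries = 120, 50, 5
returns = rng.standard_normal((T, n_issuers))      # standardised issuer returns
industry = np.arange(n_issuers) % n_industries     # toy industry labels

x_g = returns.mean(axis=1)                         # global factor X_{G,t}
residuals = {}
for j in range(n_industries):
    x_ij = returns[:, industry == j].mean(axis=1)  # industry factor X_{I(j),t}
    gamma = (x_ij @ x_g) / (x_g @ x_g)             # OLS slope of eq. (28)
    residuals[j] = x_ij - gamma * x_g              # industry-specific residual
    # by construction the residual is orthogonal to the global factor
    assert abs(residuals[j] @ x_g) < 1e-9
```

These residuals play the role of the $\varepsilon_{I(j),t}$ series that are subsequently used, together with $X_G$, as explanatory factors for individual issuers.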
In the following we discuss the calibration of a model with a global and a subcommunity factor; the calibration of the other model variants is then straightforward. Recall that issuer $i$'s creditworthiness index $X_i$ follows the dynamics presented in Section 4.1. In our case, $\tilde{F}_i := \alpha_{G(i)} X_G + \alpha_{S(i)} \epsilon_{S(i)}$, where the coefficients $\alpha_{G(i)}$ and $\alpha_{S(i)}$ have been rescaled so that $\tilde{F}_i \sim N(0, 1)$:

$$\alpha_{G(i)} := \frac{\hat{\alpha}_{G(i)}}{\Psi_i}, \qquad \alpha_{S(i)} := \frac{\hat{\alpha}_{S(i)}}{\Psi_i}, \qquad (29)$$

with $\hat{\alpha}_{G(i)}$ and $\hat{\alpha}_{S(i)}$ being the factor loadings of the regression model

$$X_{i,t} = \hat{\alpha}_{G(i)} X_{G,t} + \hat{\alpha}_{S(i)} \epsilon_{S(i),t} + \varepsilon_{i,t} \qquad (30)$$

and $\Psi_i = \sigma[\hat{\alpha}_{G(i)} X_G + \hat{\alpha}_{S(i)} \epsilon_{S(i)}]$. Thus, we calibrate the factor loadings $\hat{\alpha}_{G(i)}$ and $\hat{\alpha}_{S(i)}$ by running the above regression. If we collect $X_G$ and $\epsilon_{S(i)}$ into a matrix $F_i$, then the least squares estimator $\hat{\alpha}_i$ brings the two sides of the following equation as close as possible:

$$X_i = F_i \hat{\alpha}_i, \qquad (31)$$

or, in other words, the $\hat{\alpha}_i$, $\forall i$, represent estimates of the coefficients appearing in Equation (30):

$$\hat{\alpha}_i = (F_i^{\top} F_i)^{-1} (F_i^{\top} X_i). \qquad (32)$$

Finally, the coefficient $\beta_i$ from Section 4.1 is the $R^2$ of this regression, representing the proportion of the variance of $X_i$ explained by the systematic factors.

We estimate the parameters of six model variants on the basis of the calibration process described previously: global factor only (Model 1); global and industry factors (Model 2); global and region factors (Model 3); global, region, and industry factors (Model 4); global and community factors (Model 5); and global and subcommunity factors (Model 6). The statistics for the individual $R^2$ versus the systematic factors are compared in Table 4. As can be seen from the table, not surprisingly, the model based only on the global factor provides the worst fit to the data, with an average $R^2$ of 48.2%. After the introduction of the industry factor the average $R^2$ increases to 54.5%.
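For a single issuer, Equations (29)–(32) reduce to an ordinary least-squares fit. The sketch below (with synthetic inputs invented for illustration) recovers the loadings via the normal equations, obtains $\beta_i$ as the share of variation explained, and rescales the coefficients as in Equation (29):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 120                                  # monthly observations
x_g = rng.standard_normal(T)             # global factor returns X_G
eps_s = rng.standard_normal(T)           # subcommunity residual eps_S(i)
# synthetic issuer returns with known loadings plus idiosyncratic noise
x_i = 0.7 * x_g + 0.4 * eps_s + 0.3 * rng.standard_normal(T)

F = np.column_stack([x_g, eps_s])                # matrix F_i of eq. (31)
alpha_hat = np.linalg.solve(F.T @ F, F.T @ x_i)  # normal equations, eq. (32)
fitted = F @ alpha_hat
# beta_i: proportion of the variation of X_i explained by the factors
beta_i = 1.0 - ((x_i - fitted) ** 2).sum() / (x_i ** 2).sum()
# eq. (29): rescale the loadings so that F~_i has unit variance
psi = fitted.std()
alpha_g, alpha_s = alpha_hat / psi
```

With the chosen inputs the estimated loadings land close to the true values 0.7 and 0.4, and $\beta_i$ close to the true systematic share of roughly 0.88.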
A comparable value (54.2%) is obtained if, instead of the industry factor, we introduce a region factor. The model based on global and community factors (Model 5) provides a slightly better fit than the other two-factor models, with an average $R^2$ of 55.4%. The best fit is achieved by the model based on global and subcommunity factors (Model 6), with an average $R^2$ of 58.2%, outperforming even the three-factor model based on global, region, and industry factors. This result is particularly interesting since the difference in the average $R^2$ between Model 6 and the other two-factor models is comparable to the difference between the other two-factor models and the model based only on the global factor.

Having estimated the factor loadings and covariance matrices for the six model variants, we are able to obtain the distribution of the differences between the model-implied and empirical (pairwise) correlations (shown in Figure 9): after close inspection, it becomes evident that the distributions of the correlation errors based on Model 4 and Model 6 are heavier around zero and
Figure 9.: The figure shows the distribution of the differences between the model-implied and empirical (pairwise) correlations. As for the model, the six variants (global factor only, global/industry, global/region, global/industry/region, global/community, global/subcommunity factors) are explored.

have thinner tails. The distribution of the actual empirical correlations is shown in Figure B1 in Appendix B.
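For reference, the model-implied correlations entering this comparison follow directly from Equation (26). A small sketch with invented loadings (toy values, not the calibrated ones):

```python
import numpy as np

# toy inputs (invented for illustration): factor covariance Omega,
# issuer loadings alpha_i (rows) and systematic shares beta_i
Omega = np.array([[1.0, 0.2],
                  [0.2, 1.0]])
alpha = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.6, 0.5]])
beta = np.array([0.5, 0.4, 0.6])

# normalise each row so that alpha_i' Omega alpha_i = 1 (Var[F~_i] = 1)
norms = np.sqrt(np.einsum("ik,kl,il->i", alpha, Omega, alpha))
alpha = alpha / norms[:, None]

# eq. (26): rho_ij = (1 - beta_i) 1{i=j} + sqrt(beta_i beta_j) alpha_i' Omega alpha_j
rho = np.sqrt(np.outer(beta, beta)) * (alpha @ Omega @ alpha.T)
rho += np.diag(1.0 - beta)   # diagonal ~ 1 by construction
```

The correlation errors plotted in Figure 9 are then simply the entry-wise differences between such a model-implied matrix and the empirical correlation matrix.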
5. Numerical experiments
In order to study the properties of the framework presented in Section 4, we set up four synthetic test portfolios:

• Portfolio A: Long-only portfolio consisting of 36 sovereign issuers from the iTraxx SovX index family.
• Portfolio B: Long-only portfolio consisting of 89 corporate issuers (Financials and Non-Financials) from the iTraxx Europe index.
• Portfolio C: Long-only portfolio consisting of the 125 issuers from Portfolio A and Portfolio B combined.
• Portfolio D: Long-short portfolio consisting of 22 long positions on Financials issuers and 22 short positions on Non-Financials issuers from the iTraxx Europe index, selected such that the average default probability of the two groups is the same.

For the purposes of our numerical experiments, we use historical default rates per rating from S&P Global Ratings (2019b) and S&P Global Ratings (2019a) as probabilities of default. The historical default rates per rating are shown in Table 5. In accordance with regulatory requirements (Basel Committee on Banking Supervision 2016, Paragraph 186(b)), probabilities of default are subject to a floor of 3 bps. The detailed composition of the synthetic portfolios in terms of rating, as well as the mean and standard deviation of the corresponding probabilities of default, are shown in Table 6.

As far as the exposure at default is concerned, for the long-only portfolios we consider a constant and equally weighted exposure for each issuer, such that $e_i = 1/m$ for all $i = 1, \ldots, m$ and $\sum_{i=1}^{m} e_i = 1$. For the long-short portfolio we consider $e_{i \in \mathrm{Financials}} = 1/22$ and $e_{i \notin \mathrm{Financials}} = -1/22$, and as a result $\sum_{i=1}^{m} e_i = 0$. Finally, for the sake of simplicity, the loss given default parameter is set to 1 for all issuers, i.e. $q_i = 1$, $i = 1, \ldots, m$.

Table 5.: The table provides the historical default rates (%) per rating. Source: S&P Global Ratings (2019b,a).

Rating   Corporates (1981-2018)   Sovereigns (1975-2018)
AAA      0.00                     0.00
AA       0.02                     0.00
A        0.06                     0.00
BBB      0.17                     0.00
BB       0.65                     0.49
B        3.44                     2.82
CCC/C    26.63                    41.56
Table 6.: The table provides the composition of the synthetic test portfolios in terms of rating, as well as the mean and standard deviation of the corresponding probabilities of default.
             AAA   AA   A    BBB   BB   Average PD   SD PD
Portfolio A  4     7    6    14    5    0.09%        0.16%
Portfolio B  -     6    32   51    -    0.12%        0.05%
Portfolio C  4     13   38   65    5    0.11%        0.10%
Portfolio D  -     6    30   8     -    0.08%        0.05%
Table 7.: The table provides the quantiles of the loss distribution for the corresponding portfolios and for each of the six model configurations.

                     α      Model 1:   Model 2:    Model 3:   Model 4:         Model 5:    Model 6:
                            Global     Global and  Global and Global, region,  Global and  Global and
                            factor     industry    region     and industry     community   subcommunity
                            only       factors     factors    factors          factors     factors
Portfolio A:         0.99   2.8%       2.8%        2.8%       2.8%             2.8%        2.8%
Sovereign bonds,     0.995  2.8%       5.6%        5.6%       5.6%             2.8%        2.8%
long position        0.999  8.3%       11.1%       11.1%      11.1%            8.3%        11.1%
Portfolio B:         0.99   3.4%       3.4%        2.2%       2.2%             3.4%        3.4%
Corporate bonds,     0.995  5.6%       5.6%        5.6%       5.6%             5.6%        5.6%
long position        0.999  14.6%      14.6%       16.9%      16.9%            15.7%       16.9%
Portfolio C:         0.99   2.4%       2.4%        2.4%       2.4%             2.4%        2.4%
Combination of       0.995  4.8%       4.8%        4.8%       4.8%             4.8%        4.8%
portfolios A and B   0.999  12.8%      12.8%       13.6%      13.6%            12.8%       13.6%
Portfolio D:         0.99   0.0%       0.0%        0.0%       0.0%             0.0%        0.0%
Corporate bonds,     0.995  4.5%       4.5%        4.5%       0.0%             4.5%        4.5%
long/short position  0.999  9.1%       13.6%       9.1%       9.1%             13.6%       13.6%
We then generate portfolio loss distributions and derive the associated risk measures by means of Monte Carlo simulations. This process entails generating joint realisations of the systematic and idiosyncratic risk factors and comparing the resulting critical variables with the corresponding default thresholds. From this comparison, we obtain the default indicator $Y_i$ for each issuer, which enables us to calculate the overall portfolio loss for the trial. A liquidity horizon of 1 year is assumed throughout, and the results are based on calibrations according to Section 4.2 and simulations with ten million sample paths each. For a given confidence level $\alpha \in [0, 1]$, $\mathrm{VaR}_\alpha$ is defined as the $\alpha$-quantile of the loss distribution:

$$\mathrm{VaR}_\alpha(L) = \inf\{\, l \in \mathbb{R} : P(L \leq l) \geq \alpha \,\}. \qquad (33)$$
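A miniature version of this Monte Carlo procedure (toy parameters, invented for illustration; not the calibrated model or the paper's portfolios) is sketched below for the single-factor case, with the empirical quantile playing the role of Equation (33):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
m, n_sims = 4, 200_000
p = np.array([0.01, 0.02, 0.01, 0.03])    # marginal default probabilities p_i
e = np.full(m, 1.0 / m)                   # exposures at default e_i (sum to 1)
q = np.ones(m)                            # loss given default q_i = 1
beta = np.array([0.4, 0.5, 0.4, 0.6])     # systematic shares beta_i
d = np.array([NormalDist().inv_cdf(pi) for pi in p])  # thresholds Phi^-1(p_i)

# joint realisations of the systematic and idiosyncratic factors (p = 1 here)
f = rng.standard_normal(n_sims)
eps = rng.standard_normal((n_sims, m))
X = np.sqrt(beta) * f[:, None] + np.sqrt(1.0 - beta) * eps  # eq. (25)
losses = ((X <= d) * q * e).sum(axis=1)                     # eqs (23), (24), (27)

# empirical VaR_alpha: smallest l with P(L <= l) >= alpha, as in eq. (33)
for alpha in (0.99, 0.995, 0.999):
    var_alpha = np.quantile(losses, alpha, method="inverted_cdf")
    print(f"VaR_{alpha} = {var_alpha:.2%}")
```

The `inverted_cdf` quantile method matches the infimum definition of Equation (33); the simulated mean loss converges to $\sum_i q_i e_i p_i$, here 1.75%.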
Figure 10.: Quantiles of the loss distributions obtained from the six model variants (global factor only, global/industry, global/region, global/industry/region, global/community, global/subcommunity factors) for the four synthetic test portfolios.

Since both IRC in Basel 2.5 and DRC in FRTB are calculated based on a 99.9% VaR over a capital horizon of one year, we rely on this risk measure in order to compare the impact of different correlation model configurations on portfolio risk. We calculate $\mathrm{VaR}_\alpha(L)$ for selected confidence levels $\alpha \in \{0.99, 0.995, 0.999\}$, including the 0.999 level which corresponds to the DRC, for each of the six model configurations and the synthetic test portfolios. The results are illustrated in Table 7 and Figure 10.

For Portfolio A, consisting of long positions on sovereign issuers, Model 1 and Model 5 yield a DRC figure of 8.3%, while the other model variants yield a slightly more conservative figure of 11.1%. For Portfolio B, consisting of long positions on corporates, the values are higher and more variable, with Model 1 and Model 2 producing a DRC of 14.6%, Model 5 of 15.7%, and the rest of the models of 16.9%. For the more diversified Portfolio C, Models 1, 2, and 5 produce a DRC of 12.8%, while Models 3, 4, and 6 produce slightly higher figures of 13.6%. Finally, for the long/short Portfolio D, Models 1, 3, and 4 yield a DRC of 9.1%, while Models 2, 5, and 6 yield a more conservative 13.6%.

In general, the different model variants produce less variable losses at lower quantiles. Furthermore, although it is not straightforward to draw solid conclusions on one model being consistently more conservative, it seems that the model based on global and subcommunity factors (Model 6) is among the most conservative models for all four portfolios, while the one-factor model (Model 1) is consistently among the least conservative.
In addition, the DRC values produced by Models 3 and 4 are in agreement for all the portfolios.

Another noteworthy observation has to do with the tails of the generated loss distributions. By analysing the relationship between the quantiles presented in Table 7, it can be seen that all model variants produce distributions with heavier tails than the standard normal distribution. For instance, the ratio between the 0.999 and the 0.99 quantile for Portfolio A is 2.96 for Model 1 and Model 5, and 3.96 for all the other model variants, compared to 1.33 for the standard normal distribution. Similar tail behaviour can be observed for all the portfolios.

6. Concluding remarks

One of the most challenging problems in the study of complex systems is that of identifying the mesoscopic organisation of the constituting units. This amounts to detecting groups of units which are more densely connected internally than with the rest of the system. When the units are represented by time series, a common approach is to regard correlation matrices as weighted networks and to employ standard network community detection methods. Since such an approach can introduce biases, in this paper we adopt a principled approach based on Random Matrix Theory, leading to an algorithm that is able to identify internally correlated and mutually anti-correlated communities in a multiresolution fashion.

Our methods are applied to the analysis of CDS time series with the aim of identifying mesoscopic groups of issuers whose similarities cannot be traced back to industry sectors or geographical regions. We use ten years of data, including the stressed period between 2007 and 2009. The analysis reveals several interesting results with regard to the community structure.
In addition, our results show that different time resolutions yield similar community structures and that these structures are stable over time; this renders the obtained communities useful for risk models.

Based on the detected communities, we derive factors and build a model for portfolio credit risk that is in line with the regulatory requirements for the calculation of DRC. This model is then compared with industry-standard models based on global, industry, and region factors. The models based on global, community, and subcommunity factors provide a better fit to the data and a lower error between the model-implied and the empirical pairwise correlations compared to the other two-factor models.

To further explore the properties of the obtained factor models, we set up four synthetic portfolios and generate loss distributions via Monte Carlo simulations. The results show that the model based on global and subcommunity factors is consistently among the most conservative. As a more general observation, all models produce distributions with heavy tails, and the variability is higher in the higher quantiles, including the 0.999 quantile which is of particular interest for DRC.

The present work represents a first step towards incorporating community detection in risk models that can be used in a realistic setup. Further research could usefully explore applications in portfolio optimisation and the hedging of credit-sensitive instruments. An interesting avenue for further study that would establish the value of our findings for portfolio construction could be a rigorous analysis of the diversification potential for portfolios of risky instruments across communities and subcommunities, similar to the one conducted by Aretz and Pope (2013) for country and industry factors.
Disclosure
The opinions expressed in this work are solely those of the authors and do not represent in any way those of their current and past employers. No potential conflict of interest was reported by the authors.
Funding
This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement no. 675044 (http://bigdatafinance.eu/), Training for Big Data in Financial Research and Risk Management. DG acknowledges support from the Dutch Econophysics Foundation (Stichting Econophysics, Leiden, the Netherlands) and the Netherlands Organization for Scientific Research (NWO/OCW).

References
Almog, A., Besamusca, F., MacMahon, M. and Garlaschelli, D., Mesoscopic community structure of financialmarkets revealed by price and sign fluctuations.
PloS one , 2015, , e0133679.Amaral, L.A. and Ottino, J.M., Complex networks. The European Physical Journal B , 2004, , 147–162.Anagnostou, I., S´anchez Rivero, J., Sourabh, S. and Kandhai, D., Contagious Defaults in a Credit Portfolio:A Bayesian Network Approach. Available at SSRN , 2019.Anagnostou, I., Sourabh, S. and Kandhai, D., Incorporating contagion in portfolio credit risk models usingnetwork theory.
Complexity , 2018, .Aretz, K. and Pope, P.F., Common factors in default risk across countries and industries.
European FinancialManagement , 2013, , 108–152.Basel Committee on Banking Supervision, Amendment to the capital accord to incorporate market risks.Technical report, Bank for International Settlements, 1996.Basel Committee on Banking Supervision, Guidelines for computing capital for incremental risk in thetrading book. Technical report, Bank for International Settlements, 2009.Basel Committee on Banking Supervision, Revisions to the Basel II market risk framework. Technical report,Bank for International Settlements, 2011.Basel Committee on Banking Supervision, Minimum capital requirements for Market Risk. Technical report,Bank for International Settlements, 2016.Bernaschi, M., Grilli, L. and Vergni, D., Statistical analysis of fixed income market. Physica A: StatisticalMechanics and its Applications , 2002, , 381–390.Bohn, J.R. and Kealhofer, S., Portfolio management of default risk.
KMV working paper , 2001.Crosbie, P.J. and Bohn, J.R., Modeling default risk.
KMV working paper , 2002.Di Matteo, T., Aste, T. and Mantegna, R.N., An interest rates cluster analysis.
Physica A: StatisticalMechanics and its Applications , 2004, , 181–188.Gordy, M., A risk-factor model foundation for ratings-based bank capital rules.
Journal of Financial Inter-mediation , 2003, , 199–232.Heston, S.L. and Rouwenhorst, K.G., Does industrial structure explain the benefits of international diversi-fication?. Journal of Financial Economics , 1994, , 3–27.JP Morgan, CreditMetrics–Technical Document. JP Morgan, New York , 1997.Laloux, L., Cizeau, P., Bouchaud, J.P. and Potters, M., Noise dressing of financial correlation matrices.
Appendix A: Full regression results on the basis of Equation (28)

This appendix presents the full regression results for the time-series regression models presented in Equation (28) of the main article.

Table A1: Results of the industry, region, community, and subcommunity factor analysis derived from CDS spread returns. The analysis is based on non-overlapping monthly log-returns and is conducted over the period January 2007 through December 2016. After standardisation, the CDS returns are used to derive global, industry, region, community, and subcommunity factors as cross-sectional averages at each point in time. The industry, region, community, and subcommunity factors are regressed onto the global factor; the coefficients (γ_I, γ_R, γ_C, and γ_SC, respectively) as well as R² are provided in the table, complemented by the standard deviation of the residual returns. t-tests are conducted to evaluate whether the coefficients differ from one.

Industry                     γ_I      t-statistic   p-value   R²      σ_I     σ_G
Basic Materials              1.027     0.993        32.3%     92.3%   20.1%   67.9%
Consumer goods               0.988    -0.586        55.9%     95.2%   15.1%
Consumer services            0.962    -1.676         9.6%     93.7%   16.9%
Energy                       1.021     0.491        62.5%     82.7%   31.7%
Financials                   1.019     0.603        54.8%     90.1%   23.0%
Government                   1.042     1.035        30.3%     84.6%   30.2%
Health care                  0.880    -3.306         0.1%     83.2%   26.9%
Industrials                  1.028     1.493        13.8%     96.3%   13.6%
Technology                   0.890    -4.178         0.0%     90.7%   19.4%
Telecommunication services   1.023     0.982        32.8%     94.0%   17.5%
Utilities                    1.009     0.350        72.7%     93.2%   18.5%

Region           γ_R      t-statistic   p-value   R²      σ_R     σ_G
Africa           1.056     0.936        35.1%     72.8%   43.8%   67.9%
Asia             1.115     2.883         0.5%     86.9%   29.3%
Eastern Europe   1.065     1.056        29.3%     71.8%   45.3%
Europe           1.021     0.928        35.5%     94.5%   16.7%
India            1.131     1.792         7.6%     67.1%   53.8%
Latin America    1.072     1.221        22.4%     73.6%   43.6%
Middle East      0.928    -1.282        20.2%     69.7%   41.6%
North America    0.926    -3.702         0.0%     94.9%   14.6%
Oceania          1.104     2.688         0.8%     87.3%   28.7%

Community   γ_C      t-statistic   p-value   R²      σ_C     σ_G
A           0.967    -1.196        23.4%     91.5%   20.1%   67.9%
B           0.987    -0.526        60.0%     92.6%   18.9%
C           1.061     1.852         6.6%     89.9%   24.2%
D           0.989    -0.220        82.6%     76.8%   36.9%

Subcommunity   γ_SC     t-statistic   p-value   R²      σ_SC    σ_G
A1             0.943    -1.665         9.8%     86.5%   25.3%   67.9%
A2             0.934    -2.052         4.2%     87.6%   23.9%
A3             1.093     2.260         2.6%     85.6%   30.4%
A4             0.999    -0.015        98.8%     76.5%   37.5%
B1             0.952    -1.516        13.2%     88.5%   23.2%
B2             1.067     2.267         2.5%     91.8%   21.7%
C1             1.022     0.474        63.6%     80.4%   34.3%
C2             0.995    -0.141        88.8%     88.4%   24.5%
C3             1.154     3.665         0.0%     86.5%   31.0%
D1             1.026     0.379        70.5%     65.5%   50.5%
D2             0.895    -3.202         0.2%     86.4%   24.1%
D3             0.980    -0.295        76.8%     62.8%   51.2%
D4             1.123     1.664         9.9%     66.2%   54.6%
D5             1.081     1.122        26.4%     65.7%   53.0%
D6             1.011     0.168        86.7%     65.0%   50.4%

Appendix B: Empirical pairwise correlations

[Figure: histogram of the empirical pairwise correlations; horizontal axis: empirical pairwise correlation, vertical axis: observed frequency.]
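The factor-analysis procedure behind Table A1 — standardise the returns, build the global and group factors as cross-sectional averages, regress each group factor on the global factor, and test the slope against one — can be sketched as follows. This is a minimal illustration on simulated data, not the paper's implementation: the sample sizes, the random common driver, and the choice of the first 10 series as a "group" are all hypothetical.

```python
import numpy as np

def factor_regression(group_factor, global_factor):
    """OLS of a group factor on the global factor (with intercept).
    Returns (gamma, t-statistic for H0: gamma = 1, R^2)."""
    X = np.column_stack([np.ones_like(global_factor), global_factor])
    coef, *_ = np.linalg.lstsq(X, group_factor, rcond=None)
    resid = group_factor - X @ coef
    n, k = X.shape
    s2 = resid @ resid / (n - k)                    # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)               # coefficient covariance
    gamma, se = coef[1], np.sqrt(cov[1, 1])
    r2 = 1 - resid @ resid / np.sum((group_factor - group_factor.mean()) ** 2)
    return gamma, (gamma - 1.0) / se, r2            # t-test against one, not zero

# Hypothetical data: 120 monthly observations for 50 issuers sharing a common driver.
rng = np.random.default_rng(0)
T, N = 120, 50
returns = rng.standard_normal((T, N)) + rng.standard_normal((T, 1))
returns = (returns - returns.mean(axis=0)) / returns.std(axis=0)  # standardise per issuer

global_factor = returns.mean(axis=1)        # cross-sectional average over all issuers
group_factor = returns[:, :10].mean(axis=1)  # e.g. one illustrative industry group

gamma, t_stat, r2 = factor_regression(group_factor, global_factor)
```

With a strong common driver, the estimated γ should be close to one and the t-statistic small, mirroring the pattern seen for most rows of Table A1.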