[PDF] Credit migration: Generating generators

Abstract

Markovian credit migration models are a reasonably standard tool nowadays, but there are fundamental difficulties with calibrating them. We show how these are resolved using a simplified form of matrix generator and explain why risk-neutral calibration cannot be done without volatility information. We also show how to use elementary ideas from differential geometry to make general inferences about calibration stability. This the longer version of an article published by RISK (Feb 2021).

Full PDF

CCredit migration: Generating generators

Richard J. Martin ∗ June 22, 2020

Abstract

Markovian credit migration models are a reasonably standard tool nowadays, but thereare fundamental diﬃculties with calibrating them. We show how these are resolved using asimpliﬁed form of matrix generator and explain why risk-neutral calibration cannot be donewithout volatility information. We also show how to use elementary ideas from diﬀerentialgeometry to make general inferences about calibration stability.

Introduction

Credit markets face uncertain times, and the current upset caused by the coronavirus pandemic islikely to result in deterioration of the credit fundamentals for many issuers. Although there hasbeen a huge rally since the end of the Global Financial Crisis, which now seems a distant memory,there have been upsets in the Eurozone (2011), and in commodity and energy markets (2014–16),while in emerging markets there have been downgrades in the sovereigns of Brazil, Russia andTurkey, and even during the benign year of 2019, the great majority of LatAm corporate nameson rating-watch were potential downgrades rather than upgrades . It is worthwhile, we think, torevisit the subject of credit migration.A standard modelling framework, introduced by Jarrow et al. [11] over twenty years ago, uses aMarkov chain, parametrised by a generator matrix, to represent the evolution of the credit ratingof an issuer. A good overview of subsequent developments, including its extension to multivariatesettings, is given by [2]; in terms of how it relates to the credit markets and default history, anexcellent series of annual reports is produced by Standard & Poors, of which the most recent is[22]. In the taxonomy of credit models, the ratings-based Markov chain model is best understoodas a sort of reduction of the credit barrier models, in the sense that the ﬁrm value, or distanceto default, is replaced by the credit rating. This reduces the dimensionality from a continuum ofpossible default distances to a small number of rating states, a natural idea as credit rating is aconvenient way of thinking about a performing issuer.The Markov chain model can be used in a number of ways, making it of general interest: • Econometrically, to tie potential credit losses to econometric indicators and thence to bankcapital requirements [7]; • Risk management of vanilla and exotic credit portfolios; ∗ Department of Mathematics, Imperial College London, Exhibition Road, London SW7 2AZ, U.K. Email [email protected] See e.g. Credit Suisse Latin America Chartbook. a r X i v : . [ q -f i n . R M ] J un Valuation of structured credit products. It is worth bearing in mind that although thesynthetic CDO market is a fraction of its former self, the CLO market is still thriving, withnew CLOs being issued frequently. Further, the regulatory capital sector is growing, andrequires the pricing and risk management of contingent credit products; • Stripping the credit curve from bond prices, as a way of constructing sensibly-shaped yieldcurves for diﬀerent ratings and tenors.The model can, therefore, be used in both subjective (historical) and market-implied (risk-neutral) modelling, and we shall consider both here. Nevertheless there are several diﬃculties thathave not been discussed in the literature, detracting from the utility of the model and limiting itsapplication hitherto:(I) The number of free parameters is uncomfortably high (with n states plus default it is n );(II) Computation of transition probabilities requires the generator matrix to be exponentiated,but this is a trappy subject [18];(III) There are algebraic hurdles associated with trying to make a generator matrix time-varying.Additionally the following fundamental matters are insuﬃciently understood:(IV) What is the best way to calibrate to an empirical (historical) transition matrix, and whatlevel of accuracy may reasonably be asked for?(V) Model calibration is one of the most important disciplines in quantitative ﬁnance, and yetno serious attempt has been made to answer the following: Can we uniquely identify therisk-neutral generator from the term structure of credit spreads for each rating? If not, whatextra information in needed?We address all of these here, solving most of the above problems using a new parametrisation inwhich a tridiagonal generator matrix is coupled with a L´evy stochastic time change (we call thisTDST). We consider an n -states-plus-default continuous-time Markov chain and write G ∈ R n +1 ,n +1 forthe generator matrix with the ( n + 1)th state denoting default (we will write this state as (cid:122) ). Theoﬀ-diagonal elements of G must be nonnegative and the rows sum to zero; the last row is all zerobecause default is an absorbing state. The probability of default in time τ , conditional on startingat state i , is the i th element of the last column of the matrix exp( Gτ ): in other words [exp( Gτ )] i (cid:122) .Let us begin by writing the generator matrix G ∈ R n +1 ,n +1 in the form G = (cid:20) H v (cid:21) (1)where H ∈ R n,n , and v ∈ R n is such that the rows of G sum to zero; the oﬀdiagonal elements of H , and those of v , must of course be nonnegative. The τ -period transition matrix is given byexp( Gτ ) = (cid:20) exp( Hτ ) E ( Hτ ) · vτ (cid:21) E is deﬁned by E ( x ) = ( e x − /x . As is apparent, we have used a matrixexponential, and also the function E , and it is worth making some general remarks about analyticfunctions of matrices.Let f be a function analytic on a domain D ⊂ C . Let B ( z , ρ ) denote the open disc of centre z and radius ρ , and assume that this lies inside D . Then f possesses a Taylor series (cid:80) ∞ k =0 f k ( z − z ) k convergent for z ∈ B ( z , ρ ). It is immediate, proven for example by reduction to Jordan normalform [4], that the expansion ∞ (cid:88) k =0 f k ( A − z I ) k enjoys the same convergence provided that all the eigenvalues of A lie in B ( z , ρ ). We can thereforewrite f ( A ) without ambiguity, and will use this notation henceforth. This construction establishesthe existence of the matrix exponential ( f k = 1 /k ! and ρ = ∞ ), though it does not at all make it agood computational method, because roundoﬀ errors can be substantial . Taylor series expansionis one of the many bad ways of calculating the matrix exponential [9, 18]—a point that is routinelyignored. A better idea is exp( A ) = lim n →∞ ( I + A/n ) n = lim n →∞ ( I − A/n ) − n (2)and in each case computation is easiest when n is a power of two because it can be accomplishedby repeated squaring. The second expression is slightly harder to calculate as it requires a matrixinverse, but it has an advantage: for n ∈ N , ( I − A/n ) − n is a valid transition matrix whereas( I + A/n ) n may not be.Having discussed the matrix exponential, we also have something to say about its inverse. Itis in principle possible to write down a τ -period (commonly one year) transition matrix M andask if it comes from a generator G , which amounts to ﬁnding the matrix logarithm of M . If sucha generator exists then M can be raised to any real positive power, and is sometimes said to be embeddable . Not all valid transition matrices are embeddable. By identifying a generator at theoutset we avoid this problem.Another important thread is the use of diagonalisation, implicit in the above discussion becausewe have already mentioned eigenvalues. If A = P EP − for some invertible P and diagonal E then f ( A ) = P f ( E ) P − . However, while a real matrix is generically diagonalisable, it may be that P is almost singular. There is also the minor irritation that P, E may not be real. This methodis another popular way of exponentiating matrices, but again one that is only reliable in certaincontexts, for example symmetric matrices: more of this presently.A simpliﬁcation of the model is to assume that H is tridiagonal: that is to say the only nonzeroelements are those immediately above, below, or on the leading diagonal. We deﬁne a transitive generator to be one in which all the super- and sub-diagonal elements of H are strictly positive. Anintransitive generator does not permit certain transitions to neighbouring states to occur: informallythis causes the rating to ‘get stuck’ and can be ruled out on fundamental grounds, though it doesretain some interest as a limiting case. Also, we deﬁne a restricted generator to be one in which allthe rows of H , except for the n th, sum to zero: equivalently, v is zero except for its bottom element.Both these simpliﬁed models have a clear connection to credit modelling as discretisations of thestructural (Merton) model, in which the ﬁrm value is replaced by the credit rating as a distance-to-default measure. A restricted tridiagonal generator corresponds to a Brownian motion hitting a For example in computing e − z at z = 20: the roundoﬀ error is far in excess of the correct answer. ‘Embedding’: there exists a smooth map m from R ≥ into the space of valid transition matrices, obeying m ( s + t ) = m ( s ) m ( t ). It is, of course, m ( t ) = M t . This means that if it isn’t, it is arbitrarily close to one that is. n − n −

2. These are substantially lower than the n needed to deﬁne a generalmodel. Also, tridiagonality is useful not just for dimensionality reduction. A transitive generatorcan always be diagonalised, and has real eigenvalues. This is because H = D (cid:101) HD − with D apositive diagonal matrix and (cid:101) H a symmetric tridiagonal matrix; then (cid:101) H can be diagonalised via anorthogonal change of basis as (cid:101) H = QEQ (cid:48) with (cid:48) denoting transpose. (The method is particularlyfast for tridiagonal matrices: see e.g. [20, § Gτ ) = (cid:20) DQ exp( Eτ ) Q (cid:48) D − ∗ (cid:21) . The right-hand column of the matrix, written as “ ∗ ”, need not be computed directly becauseit can be obtained from the rows summing to unity. Note that the eigenvalues of H , i.e. the(diagonal) elements of E , are real and negative. This makes computation of the matrix exponentialstraightforward .The problem with tridiagonal generators is that they do not permit multiple downgrades overan inﬁnitesimal time period, and so the probability of such events over short (non-inﬁnitesimal)periods, while positive, is not high enough. In this respect, the presence of a jump-to-defaulttransition achieves little, because that is not the main route to default: far more likely is theoccurrence of multiple downgrades, often by several notches at a time. So how do we make thesesudden multiple downgrades occur?To retain the structure above we employ a stochastic time change, replacing τ in the aboveequation by τ t with t denoting ‘real’ time and τ t ‘business’ time. We are going to make τ t a purejump process, with the idea that business time occasionally takes a big leap, causing multipletransitions, and we call this model TDST (tridiagonal with stochastic time change). Formally theprocess ( τ t ) is to be a monotone-increasing L´evy process described by the generator ϕ , i.e.: E [ e u ( τ t − τ ) ] ≡ e ϕ ( u ) t . The transition matrix is obtained by integrating over all paths { τ t } Tt =0 . As the dependence on τ is through an exponential, this is straightforward, and the eﬀect is to replace H by ϕ ( H ) in (1),adjusting the last column as necessary. (In context ϕ will be regular in the left half-plane, so we canlegitimately write the matrix function ϕ ( H ).) Accordingly the generator and T -period transitionmatrix are G ϕ = (cid:20) ϕ ( H ) ∗ (cid:21) ; M ϕ ( T ) = (cid:20) exp( ϕ ( H ) T ) ∗ (cid:21) = (cid:20) DQ · exp( ϕ ( E ) T ) · Q (cid:48) D − ∗ (cid:21) . (3)We can impose that τ t /t have unit mean (as otherwise the model is over-speciﬁed because H and τ can be scaled in opposite directions), so ϕ (cid:48) (0) = 1. An obvious choice is the CMY process ϕ ( u ) = βγ (cid:0) − (1 − u/β ) γ (cid:1) , β > , γ < Technically, the limit of an intransitive generator will cause D to become singular, and so the construction thenfails, but this seems to be a theoretical issue only. See e.g. [21] for a general introduction. γ = , ϕ ( u ) = 2 β (1 − (cid:112) − u/β ) , ϕ ( u ) = − β log(1 − u/β ) . This retains the connection with structural (Merton) modelling, in that we have a discretisationof a L´evy process hitting a barrier—a standard idea in credit risk modelling [13, 14, 15, 6]. Anincidental remark about the Gamma process is that it allows the matrix exponential ϕ ( H ) to becalculated in an alternative simple way, as noted in the Appendix.It is perhaps worth emphasising that despite our using the term ‘stochastic time change’ thismodel speciﬁcation still has a static generator. The stochastic time change serves only as a mech-anism for generating multiple downgrades in an inﬁnitesimal time period using a tridiagonal gen-erator matrix. The number of parameters for this model is 2 n − n ϕ , where n ϕ (= 2 here) is thenumber needed to specify ϕ . Israel et al. [10] spend some time showing that empirical transition matrices are not necessarilyembeddable (q.v.). In fact it is only necessary to test whether the empirical matrix is statisticallylikely to be so. Taking S&P’s data in [22, Tables 21,23], we ask if we can ﬁnd a TDST model thatﬁts well enough.There are a number of issues to consider. The ﬁrst is the right measure of closeness of ﬁt. Fromthe perspective of statistical theory the most appropriate is the Kullback–Leibler divergence : d M = n (cid:88) i =1 n +1 (cid:88) j =1 p ij ln p ij q ij (5)where p ij are the historical probabilities of transition from state i to j (with n + 1 denoting default)and q ij the model ones. This is essentially the log likelihood ratio statistic, i.e. the logarithm ofthe ratio of the likelihood of the model compared with that of the maximum likelihood estimator[5, Ch.4]. Note that d M is not symmetrical in p, q : nor should it be. To see why, consider ﬁrst theeﬀect of letting q ij → p ij >

0: this means that the event has been observed but that themodel probability is zero, an error that should be heavily penalised. On the other hand, p ij → q ij > d M with respect to the elements of H and the parameters β, γ in (4). Op-timisation of the 7-state model takes a fraction of a second and the results are shown in Figure 1.It is seen that the ﬁt is good, and furthermore the ﬁtting errors are small by comparison with A Taylor series expansion around p = q gives the well-known chi-squared measure of “( O − E ) /E , summed” ( O observed, E expected)—this can also be used to deﬁne the ﬁtting error, and it gives similar results. .If we pass to an 18-state model (AAA, AA+, AA, . . . , CCC+, CCC) we still have a largenumber of parameters even after the reduction to a TDST model (324 to around 36). It is possibleto ﬁt these, but some further simpliﬁcation proves advantageous. Suppose that the free parametersin the matrix H are the superdiagonal elements corresponding to instantaneous downgrades forthe following states only: AA, A, BBB, BB, B, CCC (in the last of these the downgrade is simplythe default rate), and the subdiagonal elements corresponding to the upgrade of these states. Thevalues for intermediate states (AA − , A+, etc.) are found by logarithmic interpolation. A minorcomplication is that the S&P data group CCC+, CCC and lower ratings into one. We have ignoredany rating lower than CCC because in practice transition through these states is very rapid, anddefault almost always ensues [22, Chart 10]. Again the results are good: see Figure 2. One has to be careful in interpreting these, for a reason that we have already touched on. Certain transitions havenever been observed, and for these S&P show the probability and standard error as zero. Now if the true frequencyis zero, then the mean and standard deviation of the observed frequency will be zero. But what we really want is anestimate of the true frequency and the standard deviation of that estimate . Clearly if Bayesian inference is followedthen the latter is not zero, because the true frequency may be > i) (%) AAA AA A BBB BB B CCC (cid:122) NRAAA 86 .

99 9 .

12 0 .

53 0 .

05 0 .

08 0 .

03 0 .

05 0 .

00 3 . .

50 87 .

06 7 .

85 0 .

49 0 .

05 0 .

06 0 .

02 0 .

02 3 .

94A 0 .

03 1 .

69 88 .

17 5 .

16 0 .

29 0 .

12 0 .

02 0 .

06 4 . .

01 0 .

09 3 .

42 86 .

04 3 .

62 0 .

46 0 .

11 0 .

17 6 . .

01 0 .

03 0 .

11 4 .

83 77 .

50 6 .

65 0 .

55 0 .

65 9 .

67B 0 .

00 0 .

02 0 .

08 0 .

17 4 .

93 74 .

53 4 .

42 3 .

44 12 . .

00 0 .

11 0 .

20 0 .

59 13 .

21 43 .

51 26 .

89 15 . (cid:122) AAA 89 .

82 9 .

42 0 .

55 0 .

05 0 .

08 0 .

03 0 .

05 0 . .

52 90 .

64 8 .

17 0 .

51 0 .

05 0 .

06 0 .

02 0 .

02A 0 .

03 1 .

77 92 .

29 5 .

40 0 .

30 0 .

13 0 .

02 0 . .

01 0 .

10 3 .

64 91 .

62 3 .

85 0 .

49 0 .

12 0 . .

01 0 .

03 0 .

12 5 .

35 85 .

86 7 .

37 0 .

61 0 .

65B 0 .

00 0 .

02 0 .

09 0 .

20 5 .

66 85 .

52 5 .

07 3 . .

00 0 .

14 0 .

25 0 .

75 16 .

76 55 .

21 26 . (cid:122) AAA 89 .

69 8 .

90 1 .

08 0 .

23 0 .

05 0 .

02 0 .

00 0 . .

56 91 .

12 7 .

49 0 .

66 0 .

09 0 .

04 0 .

00 0 .

04A 0 .

02 1 .

83 92 .

44 5 .

21 0 .

33 0 .

09 0 .

01 0 . .

00 0 .

11 3 .

64 91 .

55 4 .

03 0 .

47 0 .

04 0 . .

00 0 .

02 0 .

30 5 .

21 85 .

68 7 .

69 0 .

46 0 .

63B 0 .

00 0 .

01 0 .

06 0 .

43 5 .

43 85 .

43 5 .

69 2 . .

00 0 .

02 0 .

12 0 .

96 16 .

72 55 .

06 27 . (cid:122) AAA − .

91 9 .

84 0 .

78 0 .

19 0 .

04 0 .

02 0 .

00 0 . . − .

42 8 .

16 0 .

49 0 .

08 0 .

03 0 .

00 0 .

04A 0 .

01 2 . − .

05 5 .

66 0 .

24 0 .

08 0 .

01 0 . .

00 0 .

08 3 . − .

08 4 .

54 0 .

33 0 .

03 0 . .

00 0 .

02 0 .

22 5 . − .

88 8 .

99 0 .

27 0 .

51B 0 .

00 0 .

01 0 .

05 0 .

30 6 . − .

91 8 .

28 1 . .

00 0 .

02 0 .

09 0 .

56 24 . − .

85 35 . ↑ ↓ AAA − . . . − . . . − . . . − . . . − . . . − . . . − . . γ . β . Figure 1:

S&P 7-state historical transition matrix: (i) raw [22, Table 21], (ii) adjusted for transition tounrated (NR); (iii) TDST ﬁtted one-year transition matrix M ϕ (1), and its corresponding generator G ϕ , asper eq.(3); (iv) parameters, i.e. the subdiagonal (upgrade, ‘ ↑ ’), leading diagonal (rating unchanged, ‘0’), andsuperdiagonal (downgrade, ‘ ↓ ’) in eq.(1), and the parameters β, γ specifying ϕ in eq.(4). A c t u a l] ( % ) AAAAA + AAAA – A + AA – BBB + BBBBBB – BB + BBBB – B + BB – CCC (cid:122)

AAA . . . . . . . . . . . . . . . . . . AA + . . . . . . . . . . . . . . . . . . AA . . . . . . . . . . . . . . . . . . AA –0 . . . . . . . . . . . . . . . . . . A + . . . . . . . . . . . . . . . . . . A . . . . . . . . . . . . . . . . . . A –0 . . . . . . . . . . . . . . . . . . BBB + . . . . . . . . . . . . . . . . . . BBB . . . . . . . . . . . . . . . . . . BBB –0 . . . . . . . . . . . . . . . . . . BB + . . . . . . . . . . . . . . . . . . BB . . . . . . . . . . . . . . . . . . BB –0 . . . . . . . . . . . . . . . . . . B + . . . . . . . . . . . . . . . . . . B . . . . . . . . . . . . . . . . . . B –0 . . . . . . . . . . . . . . . . . . CCC . . . . . . . . . . . . . . . . . . [ F i tt e d ] ( % ) AAAAA + AAAA – A + AA – BBB + BBBBBB – BB + BBBB – B + BB – CCC + CCC (cid:122)

AAA . . . . . . . . . . . . . . . . . . . AA + . . . . . . . . . . . . . . . . . . . AA . . . . . . . . . . . . . . . . . . . AA –0 . . . . . . . . . . . . . . . . . . . A + . . . . . . . . . . . . . . . . . . . A . . . . . . . . . . . . . . . . . . . A –0 . . . . . . . . . . . . . . . . . . . BBB + . . . . . . . . . . . . . . . . . . . BBB . . . . . . . . . . . . . . . . . . . BBB –0 . . . . . . . . . . . . . . . . . . . BB + . . . . . . . . . . . . . . . . . . . BB . . . . . . . . . . . . . . . . . . . BB –0 . . . . . . . . . . . . . . . . . . . B + . . . . . . . . . . . . . . . . . . . B . . . . . . . . . . . . . . . . . . . B –0 . . . . . . . . . . . . . . . . . . . CCC + . . . . . . . . . . . . . . . . . . . CCC . . . . . . . . . . . . . . . . . . . ↓ AAA − . . . − . . . − . . . − . . . − . . . − . . . − . . . − . . . − . . . − . . . − . . . − . . . − . . . − . . . − . . . − . . . − . . . − . . γ . β . As above but with the 18-state model. One-year transition matrix [22, Table 23], TDST ﬁt,and parameters. Making the generator time-varying

We start by pointing out a trap that is very easy to fall into. When the generator varies with time,the transition matrix is not given by the following expression (or its expectation conditional oninformation known at time 0, for a stochastically varying G ): M ( T ) ?? = exp (cid:18)(cid:90) T G t dt (cid:19) (Wrong in general) . (6)Instead, the so-called ordered exponential must be used . This can be thought of aslim N →∞ (cid:104) ( I + G δt )( I + G δt δt )( I + G δt δt ) · · · ( I + G ( N − δt δt ) (cid:105) , δt = T /N

What is beguiling about (6) is its simplicity, its ‘obvious’ connection to standard ideas on ratesmodelling, and that its RHS is a valid transition matrix, albeit the wrong one.We can invoke the Feynman-Kac representation, which is that if X t obeys the SDE dX t = µ ( X t ) dt + σ ( X t ) dW t and G is a function of x only then the matrix M obeys the backward equation ∂M∂t = G ( x ) M + µ ( x ) ∂M∂x + σ ( x )2 ∂ M∂x , M (0) = I ;but this equation is unlikely to have a closed-form solution, so has to be solved numerically e.g. ona trinomial tree.Another route that is likely to cause diﬃculties is an over-reliance on diagonalisation methods.We have said that if we can write G = P EP − then it may be easy to exponentiate G , dependingon how well-conditioned P is. To make G time-varying it is superﬁcially attractive to allow E tovary while ﬁxing P . The problem with that is that there is no guarantee that a valid transitionmatrix is thereby produced.There is one case which, interestingly, links to both these ideas. That is where the eﬀect of timevariation is simply to scale G by a factor, i.e. G t = X t G with X t some scalar positive process whichmay as well have unit mean. The commutativity problem disappears, so that (6) is now correct,and only the eigenvalues vary over time. This model may or may not be appropriate, dependingon the context, as we discuss later.An obvious choice of dynamics is the CIR process, dX t = κ ( X − X t ) dt + σ (cid:112) X t dW t . (7)Importantly, the joint distribution of X t and exp (cid:0) (cid:82) t X s ds (cid:1) is known via the double Laplace trans-form [12, Prop. 2.5]: E (cid:104) e − λX t − µ (cid:82) t X s ds (cid:105) = A ( λ, µ, t ) e −B ( λ,µ,t ) X (8) To see where the problem is, suppose that G t equals G i for t ∈ [0 , T /

2) and G ii for t ∈ [ T / , T ], both constantmatrices. Then the period- T transition matrix is exp( G i T /

2) exp( G ii T / (cid:54) = exp( G i T / G ii T /

2) because for thelast two expressions to coincide one needs commutativity. A ( λ, µ, t ) = (cid:32) (cid:98) κ µ e ( (cid:98) κ µ + κ ) t/ σ λ ( e (cid:98) κ µ t −

1) + (cid:98) κ µ − κ + e (cid:98) κ µ t ( (cid:98) κ µ + κ ) (cid:33) κX/σ B ( λ, µ, t ) = λ (cid:0)(cid:98) κ µ + κ + e (cid:98) κ µ t ( (cid:98) κ µ − κ ) (cid:1) + 2 µ ( e (cid:98) κ µ t − σ λ ( e (cid:98) κ µ t −

1) + (cid:98) κ µ − κ + e (cid:98) κ µ t ( (cid:98) κ µ + κ )and (cid:98) κ µ = (cid:112) κ + 2 σ µ . Accordingly the T -period transition matrix conditional on X is E (cid:104) e G (cid:82) T X t dt (cid:105) = A (0 , − G, T ) e −B (0 , − G,T ) X ; (9)the unconditional one is obtained by integrating X out, which is immediate because its uncondi-tional distribution is Gamma of mean X and shape parameter 2 κX/σ . Default rates ﬂuctuate over time, and we wish to capture this eﬀect in a model that can thenbe linked to suitable econometric indicators [7]. Thus we need a stochastically-driven model, asopposed to a deterministic but time-inhomogenous one as considered in [3].As just mentioned, a simple scaling of the generator matrix can be done easily. However, it isunrealistic, because in benign years there are more upgrades and fewer downgrades, and the reversein bad ones: it is not correct simply to scale both by the same factor. Returning to (3), write G ϕ = G + ϕ + G − ϕ in which the terms on the RHS are valid generator matrices that are, respectively, lower- andupper-triangular. These are generators for an upgrade-only and a downgrade-only Markov chain,and so this decomposition is unique. Except in trivial cases we will always have G + G − (cid:54) = G − G + .Now scale these two terms by diﬀerent positive variables X + t and X − t representing the contributionfrom upgrades and downgrades respectively, so that G ϕ,t = X + t G + ϕ + X − t G − ϕ . (10)It is easiest to do this in a discrete-time setting, because our natural source of data for calibrationis annual. Then the one-year transition matrix is the exponential of G ϕ,t , but we no longer have aconvenient closed-form calculation for it, and have to resort to (2).We do not have yearly transition matrices to calibrate to: even if we did, they would bequite ‘noisy’, so that for example we might have an instance of a transition A to BB, but noneto BB+. However we do have two pieces of annual information are available [22, Table 6]: thedowngrade:upgrade ratio and the default rate, both of which are averaged across rating states.Roughly speaking the former gives information about X − t /X + t , and the latter about X − t . Fittingboth over time gives the picture shown in Figure 3. It is hard to be precise about what stochasticprocess X ± t follow, as we have only 38 data points. The sample statistics of X + and X − are: mean0.95, 0.95; standard deviation 0.42, 0.65; correlation 0.55. This shows, as anticipated, that it isnecessary to have two factors rather than just a single one. In fact, this is a similar conclusion to thatin [1, § The functions z (cid:55)→ A (0 , z, T ) and z (cid:55)→ B (0 , z, T ) are regular for Re z ≥

0, so can be evaluated ‘at’ z = − G . X + X − Figure 3:

Evolution of coeﬃcients X ± t in time-varying model. The green trace X + t controls upgrades andthe red trace X − t controls downgrades. Risk-neutral calibration is the identiﬁcation of a generator matrix that ﬁts a given set of CDSspreads or bond prices, via the survival curve . This is an entirely diﬀerent problem from ﬁttingto a prescribed historical transition matrix.The ﬁrst point we make is that in reality we should not expect anywhere near a perfect ﬁt indoing this: there is no reason why the market should trade a credit in line with its public rating, asthe market provides a forward-looking view of ﬁrm’s solvency prospects, and the rating may not beup to date. Thus it is quite possible to have two diﬀerent BBB − ﬁrms trading at 80bp and 140bpat the 5y tenor; once we get down to CCC+ we can have names trading anywhere between 500bpand 5000bp. Also there may be no bonds of a particular rating: this will almost certainly truewhen rating modiﬁers are used. All these examples can be found in the dataset we used. Thereforewe must be prepared for very ‘noisy’ data, and the objective of ﬁtting must be to come up withsensibly-shaped curves that give a reasonable ﬁt.It is superﬁcially attractive to describe this as a ‘bond pricing’ model. However, bonds arequoted on price or spread-to-Treasury, so there is no need for a pricing model as such. Arguably itis suitable for ‘matrix pricing’ of illiquid bonds, but an illiquid bond is likely to trade at a signiﬁcantyield premium to a liquid one of the same rating and tenor, so this would have to be borne in mind.In reality, the model is most suited to relative value analysis or for a parsimonious representation ofa certain sector, e.g. “Where do US industrials trade for diﬀerent rating and maturity?” Anotherapplication is as an input to the evaluation of EM corporate bonds, coupling the developed marketspread to the country spread [17].Our next point is diﬀerent, but no less fundamental. Even in a hypothetical world of bond/CDSspreads exactly lining up with their public ratings, and trading with great liquidity and low bid-oﬀer, does the term structure, for each rating, allow us to uniquely identify the generator? Thereis a clear intuition about why the term structure on its own does not suﬃce. Knowing the fulltransition matrix allows us to value any contingent claim, including claims that require us to know The valuation of CDS and bonds in terms of the survival curve is well-understood, e.g. [19]. volatility . But such information cannot be gleaned from bond pricesalone.The term structure provides (in principle) 2 n clearly-interpretable pieces of information, in theform of the short-term spread and the gradient of the spread curve at the short end, but in practiceobservations of these quantities will be ‘noisy’. Denoting by Q i ( T ) the survival probability for time T , conditional on being in state i at time zero, we have for small T , Q i ( T ) = 1 − G i (cid:122) T − ( G ) i (cid:122) T O ( T ) . If we convert to a par CDS spread then the PV of the payout leg and the coupon leg are, for smallmaturity T , respectively(1 − (cid:60) ) (cid:0) − Q i ( T ) (cid:1) = (1 − (cid:60) ) (cid:18) G i (cid:122) T + ( G ) i (cid:122) T · · · (cid:19) and T Q i ( T )2 = T (cid:18) − G i (cid:122) T · · · (cid:19) and the CDS spread is the ratio of the two: s i ( T )1 − (cid:60) = G i (cid:122) + (cid:2) ( G ) i (cid:122) + ( G i (cid:122) ) (cid:3) T · · · Alternatively we can use the continuously-compounded zero-coupon Z-spread s i ( T ) = − T − ln Q i ( T ),and the result is the same. Writing the matrix square G as an explicit sum, we obtain after someelementary algebra: s i (0) = (1 − (cid:60) ) G i (cid:122) s (cid:48) i (0) = 1 − (cid:60) n (cid:88) j =1 G ij ( G j (cid:122) − G i (cid:122) ) (11)The ﬁrst of these is obvious: the short-term spread is explained by the instantaneous probabilityof transition to default. The second expresses extra loss arising from transition to riskier states( i → j with G j (cid:122) > G i (cid:122) ) balanced oﬀ against transition to less-risky states ( G j (cid:122) < G i (cid:122) ): if theformer exceeds the latter, the spread curve is upward-sloping, and vice versa. In particular, thehighest rating necessarily has an upward-sloping curve and the lowest a downward-sloping one.Now suppose we have two term structures that agree on s i (0) and s (cid:48) i (0) for each i . They mightlook diﬀerent at the long end ( T → ∞ ), but how many extra parameters would such a diﬀerenceentail? So it seems plausible that the term structure provides a little over 2 n pieces of information,and certainly nowhere near n . A ‘toy’ example shows this well (Figure 4 ). The term structuresare not identical, but are so close that there is no realistic way of telling the generators apart—wechose, as is easily done, matrices for which the information in (11) is identical. Rating B clearlyhas higher volatility in the second matrix. It is possible to use some basic diﬀerential geometry to provide some quantitative precision aboutinvertibility. There exists a smooth pricing map f ∈ C ∞ from the space of generator matrices The letter codes A,B,C do not correspond to S&P ratings: they are just labels of convenience, with A the bestand C the worst. Recovery (cid:60) = 0 . to the space of term structures S , that is, a set of curves of spread vs maturity, one for eachrating. At any point G ∈ G it has a derivative ∂f that relates, linearly, inﬁnitesimal changes ingenerator matrix to inﬁnitesimal changes in term structure, and represented by the jacobian J f .To construct it, take G ∈ G and perform, one by one, n bumps on it, in which each bump consistsin increasing the element G ij , for i (cid:54) = j , by a small amount (cid:15) while adding − (cid:15) to G ii so as tokeep the i th row sum zero. Compute the new term structure f ( G + δG ), written as a set of N n points (maturities 1y, 2y, . . . N y and n ratings). The diﬀerence (cid:0) f ( G + δG ) − f ( G ) (cid:1) /(cid:15) estimatesthe directional derivative, and repeating this for each possible bump gives the jacobian, which is ofdimension N n × n , containing the sensitivity of all points on the term structure to all n bumps.The degree to which f is locally invertible can be understood by examining the singular valuedecomposition (SVD, [20, § J f = U ∆ V (cid:48) , in which U, V are orthogonaland ∆ = diag( δ , δ , . . . , ) with δ ≥ δ ≥ · · · ≥

0: these elements are the roots of the secularequation det( δ I − J f (cid:48) J f ) = 0, and are the singular values . In attempting to locally invert f wehave to divide by ∆, so small singular values cause a problem. As an overall scaling of the matrixdoes not aﬀect its conditioning, standard procedure is to plot ˆ δ i = δ i /δ , for i = 1 , , . . . , on alogarithmic scale, and we call this the normalised spectrum of ∂f . In well-conditioned problems theˆ δ i are all roughly equal; in ill-conditioned problems some are very small, and in these coordinatedirections inversion will be impossible without grossly amplifying any observation noise.Taking one further step we also want to examine whether a particular parametrised subspaceof G permits stable inversion: in context P is, of course, the TDST parameters. Let P be the spaceof parameters, with a map π : P → G taking a particular parameter set to its associated generator.We can construct J π and J f ◦ π by bumping the parameters p and seeing the eﬀect on the generator π ( p ) ∈ G and the term structure f ( π ( p )) ∈ S . Notionally the singular values of the restricted ∂f are simply those of J f ◦ π ◦ J − π , but this is not meaningful as J π is a rectangular matrix: instead,the values we seek are the roots ( δ i ) of the more general secular equationdet (cid:0) δ J π (cid:48) J π − J f ◦ π (cid:48) J f ◦ π (cid:1) = 0 . (12)This construction is invariant under local reparametrisation: if we have two diﬀerent parametrisa-tions of the same subspace of G , i.e. π : P → G and π : P → G , with π − ◦ π locally invertible,then the above equation does not depend on the choice ( π , P ) or ( π , P ): in other words it justdepends on f and on the subspace of G . We ﬁtted an 18-state matrix to US bond data from the manufacturing sector on 05-Feb-20. Toreduce the dimensionality of the problem further we used the same trick as in §

2, specifying sub-and super-diagonal elements for AA,A,BBB,BB,B,CCC and dealing with rating modiﬁers by inter-polation: this gives 14 parameters (= 2 × n ϕ ). The most obvious thing that we noticed was thatquite a ﬂat performance surface: diﬀerent parameter sets gave almost the same quality of ﬁt. Wehave more to discuss, but the reader may wish to glance at Figure 5, which shows the end result. Toavoid cluttering the plot, rating modiﬁers are not shown, so that for example BBB+,BBB,BBB − The jacobian of a map f : X → Y at x ∈ X is the matrix J f ( x ) whose ( i, j )th element is ∂f i /∂x j evaluated at x . The singular values of J f only tell us about local invertibility: it is easy to construct an f that causes the domainto be folded on itself in such a way that J f is nonsingular but f is not one-to-one. A simple example is f : R → R given by f ( x, y ) = e x (cos y, sin y ): the singular values of J f are identical, being e x , but f is periodic in y , and soobviously not one-to-one. In practice, therefore, establishing that a function f : R d → R d is invertible is hard if d >

1. In context this means that we might have two totally diﬀerent generator matrices giving precisely the sameterm structure, though this is perhaps unlikely. i.e. the restiction of ∂f to the tangent space of π ( P ) ⊂ G . n = 324 parameters, as the singular values roll oﬀ so rapidly. This ﬁrstgraph can be constructed for any generator so it has nothing to do with TDST. Turning next toFigure 6(b), which pertains to TDST, we see from the green trace that some instability is likelyas the smallest normalised singular values are quite small. This points to the conclusion that fur-ther constraining is required. We have already anticipated that volatility assumptions need to beincorporated, and so we address that next. If we wish to capture spread volatility in a realistic way, we should augment the model to allowfor the generator to be stochastic, thereby capturing day-to-day ﬂuctuations in spread that arenot associated with rating transition. Let us therefore return to § X t and exp( G (cid:82) t X s ds ) allows a widerange of contingent claims to be valued. However, single-name CDS options scarcely trade sothere would be nothing to calibrate to.A more practical idea is to compute the instantaneous spread volatility and attempt to matchit to historical data. As the eﬀect of X is approximately to scale the credit spreads proportionally,the instantaneous volatility of the period- T spread for rating i is σ si ( T ) ≈  n (cid:88) j =1 G ij (cid:0) ln s j ( T ) − ln s i ( T ) (cid:1) + σ  / (13)where σ is the volatility parameter in the CIR process.Turning now to empirical matters, we make some simple observations, based on time seriesanalysis of some 1500 bond and CDS spreads going back 10–20 years. The ﬁrst is that creditspread volatility is roughly 40%, and that this does not depend strongly on rating, tenor, or spreadlevel. The second is that there is little evidence of mean reversion, and so κ in (7) needs to be quitelow: we have ﬁxed it at 0.1(/year), though the model calibration is not sensitive to this choice.Thus, at the expense of adding one more parameter ( σ ), we can penalise the deviation of each σ si ( T ) from our assumed value of 0.4. This was found to stabilise the calibration.We can now redo the spectral analysis of Figure 6, augmenting the pricing map f so that ittakes a given generator G ∈ G to the term structure of spread and also of instantaneous volatility,notated f : G → S × V . The singular values, as expected, roll oﬀ more slowly (blue trace), showingthat new information is being supplied by the volatility term structure. By Figure 6(a) this is,unsurprisingly, still not enough to allow 324 parameters to be ﬁtted, but Figure 6(b) suggests thatenough stability may be conferred within the TDST parametrisation, which is what we found inpractice. See [16] for a review and discussion of this subject. As X can be set to unity.

15 B C (cid:122) A − .

14 0.10 0.03 0.01B 0.00 − .

05 0.00 0.05C 0.00 0.05 − .

15 0.10 s p r e a d ( b p ) CBA

A B C (cid:122) A − .

14 0.10 0.03 0.01B 0.20 − .

41 0.16 0.05C 0.00 0.05 − .

15 0.10 s p r e a d ( b p ) CBA

Figure 4:

Two diﬀerent hypothetical generators (last row omitted) and corresponding term structures,which are seen to be virtually identical.

Figure 5:

Implied par spreads from ﬁtting to US bond data (manufacturing sector, 05-Feb-20). (b)

Figure 6:

Normalised spectrum ˆ δ i vs i of ∂f (the derivative of the pricing map), using the model calibrationfor Figure 5. In each case “spreads” means f : G → S , and “spreads+vols” means f : G → S × V . Plot (a)refers to the map f , and (b) to its restriction to π ( P ) ⊂ G , using (12) to compute the singular values. We have presented a new model (TDST) based on a tridiagonal generator matrix coupled with astochastic time change to allow multiple downgrades to occur with a probability that is not fartoo low. The model may be calibrated historically or to bond/CDS spreads, and the relevantconclusions are given below. An incidental advantage is that the matrix exponential is generallyeasy to calculate.With regard to historical calibration, the presence of a ‘WR/NR’ (unrated or withdrawn rating),and the fact that some transitions are very rare, gives rise to considerable uncertainty as to the trueunderlying transition probabilities. The TDST model as described here is suﬃciently ﬂexible tocalibrate within these uncertainties, without having excessive number of parameters. If the modelis made time-varying, two factors are needed to capture the volatility of upgrades and downgrades.With regard to risk-neutral calibration, it is impossible to unambiguously identify a generatormatrix from term structure, and two ingredients are essential: (i) dimensionality reduction e.g.through TDST and (ii) information about volatility. Once this is provided the calibration workswell even in the presence of real data which must be expected to be very ‘noisy’.Using the theory of § Acknowledgements

I thank Di Wang and Huong Vu for their work in earlier stages of this project, and Roland Ordov`asfor helpful discussions.

A Appendix

A.1 Comment on the Gamma process

In the limit of one or more super- or sub-diagonal elements of H being zero, i.e. an intransitive gen-erator, the H = D (cid:101) HD − construction will no longer work, because D becomes singular. However,17n the case of the Gamma model the transition matrix is simply a matrix power, as it isexp( ϕ ( H ) t ) = ( I − H/β ) − βt . Going back to (2), we can now see why the second expression is a valid transition matrix. It can becomputed directly, as follows. First, compute A = ( I − H/β ) − , which is most easily done by LUfactorisation of I − H/β . Next, raise this to the power ν = βt . As raising a matrix to an integer ν is easy (by successive squaring and using the binary representation of ν ), we only have to dealwith the case where ν ∈ (0 , A ν = ( I + ( A − I )) ν = m (cid:88) r =0 Γ( ν + 1)( A − I ) r Γ( ν + 1 − r ) r ! + (cid:15) m = m (cid:88) r =0 Γ( ν + 1)Γ( ν + 1 − r ) r ! r (cid:88) k =0 ( − ) r − k (cid:18) rk (cid:19) A k + (cid:15) m = m (cid:88) k =0 A k c k ( ν ) + (cid:15) m c k ( ν ) = 1 k ! m (cid:88) r = k ( − ) r − k Γ( ν + 1)( r − k )! Γ( ν + 1 − r )where (cid:15) m denotes the truncation error. It is easy to see that c k ( ν ) is a polynomial of degree m that vanishes at ν = 0 , , . . . , k − , k + 1 , . . . , m , while c k ( k ) = 1. (See [8, § A ν as the Lagrange interpolant between I, A , . . . , A m ; the simplest formula is m = 1 which gives a linear interpolation between I and A . The expansion is convergent becausein context the eigenvalues of A lie in (0 , References [1] A. Andersson and P. Vanini. Credit migration risk modeling.

J. Credit Risk , 6(1):3–30, 2010.[2] T. R. Bielecki, S. Cr´epey, and A. Hebertsson. Markov Chain Models of Portfolio Credit Risk.In A. Lipton and A. Rennie, editors,

The Oxford Handbook of Credit Derivatives , chapter 10.Oxford University Press, 2011.[3] C. Bluhm and L. Overbeck. Calibration of PD term structures: To be Markov or not to be?

RISK , 20(11), 2007.[4] P. M. Cohn.

Algebra I . Wiley, 1984.[5] D. R. Cox and D. V. Hinkley.

Theoretical Statistics . Chapman & Hall, 1974.[6] A. Dalessandro. Credit migration risk and point in time credit dynamics: A new perspectivefor credit risk management. Technical report, UBS, 2011.[7] F. Fei, A.-M. Fuertes, and E. Kalotychou. Credit rating migration risk and business cycles.

Journal of Business Finance & Accounting , 39:229–263, 2013.188] I. S. Gradshteyn and I. M. Ryzhik.

Table of Integrals, Series and Products . Academic, ﬁfthedition, 1994.[9] N. J. Higham.

Functions of Matrices: Theory and Computation . SIAM, 2008.[10] R. Israel, J. S. Rosenthal, and J. Z. Wei. Finding generators for Markov chains via empiricaltransition matrices, with applications to credit ratings. Technical report, Univ. Toronto, 2000.[11] R. A. Jarrow, D. Lando, and S. M. Turnbull. A Markov model for the term structure of creditrisk spreads.

Rev. Fin. Studies , 10(2):481–523, 1997.[12] D. Lamberton and B. Lapeyre.

Introduction au Calcul Stochastique Appliqu´e `a la Finance .Ellipses, Paris, 2012.[13] A. Lipton. Assets with jumps.

RISK , 15(9):149–153, 2002.[14] R. J. Martin. Credit spread shocks: How big, and how often?

RISK , 22(8):84–89, 2009.[15] R. J. Martin. Smiling Jumps.

RISK , 23(9):108–113, 2010.[16] R. J. Martin. A CDS Option Miscellany. arXiv:1201.0111 , 2019. Vsn 3.[17] R. J. Martin, T. Uzuner, and Y. Ma. Emerging market corporate bonds as ﬁrst-to-defaultbaskets.

RISK , 31(7), 2018. arXiv:1804.09056 .[18] L. Moler and C. van Loan. Nineteen dubious ways to compute the exponential of a matrix, 25years later.

SIAM Rev. , 45(1):3–49, 2003. Originally 20(4):801–836, 1978.[19] D. O’Kane.

Valuing single-name and multi-name credit derivatives . Wiley, 2008.[20] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling.

Numerical Recipes inC++ . Cambridge University Press, 2002.[21] W. Schoutens.