Nonparametric estimation of the tree structure of a nested Archimedean copula
Johan Segers a,1, Nathan Uyttendaele a,1,∗
a Université catholique de Louvain, Institut de Statistique, Biostatistique et Sciences Actuarielles, Voie du Roman Pays 20, B-1348 Louvain-la-Neuve, Belgium
Abstract
One of the features inherent in nested Archimedean copulas, also called hierarchical Archimedean copulas, is their rooted tree structure. A nonparametric, rank-based method to estimate this structure is presented. The idea is to represent the target structure as a set of trivariate structures, each of which can be estimated individually with ease. Indeed, for any three variables there are only four possible rooted tree structures and, based on a sample, a choice can be made by performing comparisons between the three bivariate margins of the empirical distribution of the three variables. The set of estimated trivariate structures can then be used to build an estimate of the target structure. The advantage of this estimation method is that it does not require any parametric assumptions concerning the generator functions at the nodes of the tree.
Keywords:
Archimedean copula, dependence, nested Archimedean copula, hierarchical Archimedean copula, rooted tree, subtree, Kendall distribution, fan, triple, nonparametric inference
1. Introduction
Archimedean copulas have become a popular tool for modeling or simulating bivariate data. They are however not suited to every application: in high dimensions, for instance, they fail to model the data properly if the dependencies are not symmetric. Nested Archimedean copulas (NACs), or hierarchical Archimedean copulas, are an interesting attempt to overcome this problem. They were first introduced by Joe (1997, pp. 87–89) and have since been studied extensively; see for instance McNeil (2008), Hofert (2008), Hofert (2010) or Hofert (2011) for sampling algorithms; Hofert and Maechler (2011), who released the first R package devoted to NACs; Hering, Hofert, Mai and Scherer

∗ Corresponding author
Email addresses: [email protected] (Johan Segers), [email protected] (Nathan Uyttendaele)
1 Order of contributions not necessarily reflected by alphabetical order.
Preprint submitted to Elsevier, October 15, 2018

... $(X_i, X_j, X_k)$ can be estimated nonparametrically. The idea is to estimate the Kendall distribution associated with each pair of variables within $(X_i, X_j, X_k)$; these estimates then allow us to decide whether all pairs of variables actually have the same underlying bivariate distribution. If so, then the tree structure of $(X_i, X_j, X_k)$ is the trivial trivariate structure, that is, a structure with one internal vertex and three leaves, also called a 3-fan. If not, determining which pair has a different underlying bivariate distribution allows one to assign the correct tree structure to $(X_i, X_j, X_k)$.

Section 6 introduces a key point, namely that a given tree structure $\lambda$ can always be represented as a set of trivariate structures. That is, for a random vector of continuous random variables $(X_1, \ldots, X_d)$ with a nested Archimedean copula, it is possible to obtain the tree structure of this nested Archimedean copula provided the tree structure of the nested Archimedean copula associated with any three variables $(X_i, X_j, X_k)$ with distinct $i, j, k \in \{1, \ldots, d\}$ is known. A very similar result was obtained by Ng and Wormald (1996), who showed that a given structure $\lambda$ can always be represented as a set of triples and fans, triples and fans being formally defined in Section 6. Another interesting result is offered by Okhrin, Okhrin and Schmid (2013b), who showed that the structure can be retrieved from the bivariate margins of the nested Archimedean copula.

Our suggestion to estimate the structure of $(X_1, \ldots, X_d)$ is first to estimate the tree structure of the nested Archimedean copula associated with any three variables $(X_i, X_j, X_k)$ with distinct $i, j, k \in \{1, \ldots, d\}$, and second to use this set of estimated trivariate structures to build an estimate of the structure of $(X_1, \ldots, X_d)$. This suggestion and one important related difficulty make up Section 7.

The performance of our approach is then assessed by means of a simulation study involving target structures in several dimensions (Section 8). As part of this simulation study, the performance of the approach used by Okhrin, Okhrin and Schmid (2013a) is also investigated.

Finally, Section 9 illustrates how our method could be used to highlight hierarchical interactions in the stock market. Some remaining challenges are outlined in Section 10.
2. Archimedean copulas
Let $(X_1, \ldots, X_d)$ be a vector of continuous random variables. The copula of this vector is defined as

$$C(u_1, \ldots, u_d) = P(U_1 \leq u_1, \ldots, U_d \leq u_d),$$

where $(U_1, \ldots, U_d) = (F_{X_1}(X_1), \ldots, F_{X_d}(X_d))$, and where $F_{X_1}, \ldots, F_{X_d}$ are the marginal cumulative distribution functions (CDFs) of $X_1, \ldots, X_d$, respectively.

Archimedean copulas are a class of copulas that admit the representation

$$C(u_1, \ldots, u_d) = \psi\big(\psi^{-1}(u_1) + \cdots + \psi^{-1}(u_d)\big),$$

where $\psi$ is called the generator and $\psi^{-1}$ is its generalized inverse, with $\psi : [0, \infty) \to [0, 1]$, $\psi(0) = 1$ and $\psi(\infty) = 0$. In order for $C$ to be a $d$-dimensional copula, the generator is required to be $d$-monotone on $[0, \infty)$; see McNeil and Nešlehová (2009) for details.

The generators in Table 1 are among the most popular ones. All of them are completely monotone, that is, $d$-monotone for all integers $d \geq 2$. For the Frank family, $D_1(\theta) = \theta^{-1} \int_0^{\theta} t / (\exp(t) - 1) \, dt$.

Table 1: Some popular generators of Archimedean copulas.

| Name | Generator $\psi(x)$ | $\theta$ | Kendall's $\tau$ |
|---|---|---|---|
| AMH | $(1 - \theta) / (e^{x} - \theta)$ | $\theta \in [0, 1)$ | $1 - 2\big(\theta + (1 - \theta)^2 \log(1 - \theta)\big) / (3\theta^2)$ |
| Clayton | $(1 + x)^{-1/\theta}$ | $\theta \in (0, \infty)$ | $\theta / (\theta + 2)$ |
| Frank | $-\log\big(1 - (1 - e^{-\theta}) e^{-x}\big) / \theta$ | $\theta \in (0, \infty)$ | $1 + 4(D_1(\theta) - 1) / \theta$ |
| Gumbel | $\exp(-x^{1/\theta})$ | $\theta \in [1, \infty)$ | $(\theta - 1) / \theta$ |
| Joe | $1 - (1 - e^{-x})^{1/\theta}$ | $\theta \in [1, \infty)$ | $1 - 4 \sum_{k=1}^{\infty} 1 / \big(k (\theta k + 2)(\theta (k - 1) + 2)\big)$ |
The parameter $\theta$ in Table 1 allows one to control the strength of the dependence between any two variables of the related Archimedean copula. This is best understood by expressing Kendall's $\tau$ coefficient between any two variables of the related Archimedean copula in terms of $\theta$ (Hofert and Maechler, 2011), as done in the last column of Table 1.

All margins of the same dimension of an Archimedean copula are equal, that is, for all $m \in \{1, \ldots, d\}$ and for every subset $\{i_1, \ldots, i_m\}$ of $\{1, \ldots, d\}$ having $m$ elements, the two vectors $(U_{i_1}, \ldots, U_{i_m})$ and $(U_1, \ldots, U_m)$ have the same distribution. This result stems from the fact that for Archimedean copulas, $C(u_1, \ldots, u_d)$ is a symmetric function in its arguments, and this is why Archimedean copulas are sometimes also called exchangeable. For modeling purposes, this exchangeability becomes an increasingly strong assumption as the dimension grows.
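As a small illustration of the representation above, the following sketch evaluates an Archimedean copula with the Clayton generator of Table 1 and checks its exchangeability numerically. The function names (`clayton_psi`, `archimedean_cdf`, and so on) and the chosen parameter value are illustrative assumptions, not part of the paper.

```python
def clayton_psi(x, theta):
    """Clayton generator from Table 1: psi(x) = (1 + x)^(-1/theta), theta > 0."""
    return (1.0 + x) ** (-1.0 / theta)


def clayton_psi_inv(u, theta):
    """Inverse of the Clayton generator: psi^{-1}(u) = u^(-theta) - 1, u in (0, 1]."""
    return u ** (-theta) - 1.0


def archimedean_cdf(u, psi, psi_inv):
    """C(u_1, ..., u_d) = psi(psi^{-1}(u_1) + ... + psi^{-1}(u_d))."""
    return psi(sum(psi_inv(v) for v in u))


def clayton_tau(theta):
    """Kendall's tau of the Clayton family (last column of Table 1)."""
    return theta / (theta + 2.0)


theta = 2.0  # hypothetical parameter value, chosen for illustration only
psi = lambda x: clayton_psi(x, theta)
psi_inv = lambda v: clayton_psi_inv(v, theta)

# Exchangeability: permuting the arguments does not change the value of C.
c_a = archimedean_cdf([0.2, 0.5, 0.9], psi, psi_inv)
c_b = archimedean_cdf([0.9, 0.2, 0.5], psi, psi_inv)
```

Because the sum inside the generator does not depend on the order of its terms, `c_a` and `c_b` agree up to rounding error, which is exactly the symmetry discussed above.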
3. Nested Archimedean copulas
Asymmetries, allowing for more realistic dependencies, are obtained by plugging Archimedean copulas into each other (Joe, 1997, pp. 87–89). For instance, in the two-dimensional Archimedean copula

$$C_{D_{123}}(u_1, \bullet) = \psi_{D_{123}}\big(\psi_{D_{123}}^{-1}(u_1) + \psi_{D_{123}}^{-1}(\bullet)\big),$$

the argument $\bullet$ can be replaced by another Archimedean copula, such as

$$C_{D_{23}}(u_2, u_3) = \psi_{D_{23}}\big(\psi_{D_{23}}^{-1}(u_2) + \psi_{D_{23}}^{-1}(u_3)\big),$$

in order to get a copula of the form

$$C_{D_{123}}\big(u_1, C_{D_{23}}(u_2, u_3)\big) = \psi_{D_{123}}\Big(\psi_{D_{123}}^{-1}(u_1) + \psi_{D_{123}}^{-1}\big(\psi_{D_{23}}(\psi_{D_{23}}^{-1}(u_2) + \psi_{D_{23}}^{-1}(u_3))\big)\Big). \tag{3.1}$$

This last equation describes a copula where the bivariate marginal distribution of $(U_2, U_3)$ is not the same as the bivariate marginal distribution of $(U_1, U_2)$ or $(U_1, U_3)$, provided the generators $\psi_{D_{123}}$ and $\psi_{D_{23}}$ are different. If the joint CDF of $(U_1, U_2, U_3)$ were a simple Archimedean copula, all the bivariate marginal distributions would be identical. This shows how the symmetry inherent in Archimedean copulas can be broken, although some partial symmetry always remains, as the bivariate marginal distributions of $(U_1, U_2)$ and $(U_1, U_3)$ still coincide.

The way Archimedean copulas are nested corresponds to a rooted tree structure, which will be referred to as the NAC tree structure or sometimes simply as the structure later on. Nested Archimedean copulas, such as the one in (3.1), are defined through that rooted tree structure and through a collection of generators, one for each node in the tree that is not a leaf. If the only nodes in the tree are the root and the leaves, then the copula is an Archimedean copula, that is, a nested Archimedean copula with trivial structure and only one generator.
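The nesting in Equation (3.1) can be checked numerically. The sketch below uses Gumbel generators for both nodes; the function names and the parameter values 1.5 (root) and 3.0 (inner node) are assumptions made for illustration, chosen so that the inner parameter is the larger one. Setting one argument to 1 yields a bivariate margin, which makes the partial symmetry visible.

```python
import math


def gumbel_psi(x, theta):
    """Gumbel generator: psi(x) = exp(-x^(1/theta)), theta >= 1."""
    return math.exp(-x ** (1.0 / theta))


def gumbel_psi_inv(u, theta):
    """Inverse Gumbel generator: psi^{-1}(u) = (-log u)^theta, u in (0, 1]."""
    return (-math.log(u)) ** theta


def gumbel2(u, v, theta):
    """Bivariate Gumbel copula psi(psi^{-1}(u) + psi^{-1}(v))."""
    return gumbel_psi(gumbel_psi_inv(u, theta) + gumbel_psi_inv(v, theta), theta)


def nested_cdf(u1, u2, u3, theta_root, theta_child):
    """Equation (3.1): the copula of (U2, U3) is plugged into the root copula."""
    inner = gumbel2(u2, u3, theta_child)
    return gumbel2(u1, inner, theta_root)


THETA_ROOT, THETA_CHILD = 1.5, 3.0  # hypothetical parameter values

margin_23 = nested_cdf(1.0, 0.4, 0.7, THETA_ROOT, THETA_CHILD)  # (U2, U3)
margin_12 = nested_cdf(0.4, 0.7, 1.0, THETA_ROOT, THETA_CHILD)  # (U1, U2)
margin_13 = nested_cdf(0.4, 1.0, 0.7, THETA_ROOT, THETA_CHILD)  # (U1, U3)
```

Here `margin_12` and `margin_13` coincide (both are governed by the root generator), while `margin_23` is governed by the inner generator and differs from them.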
Definition 3.1. Let $D$ be a nonempty, finite set with $|D| = d$ elements. For concreteness, let $D = \{U_1, \ldots, U_d\}$. Formally, a rooted tree structure $\lambda$ on $D$ is a collection of nonempty subsets of $D$ such that
(i) $D \in \lambda$;
(ii) $\{U_j\} \in \lambda$ for every $U_j \in D$;
(iii) if $A, B \in \lambda$, then either $A \subset B$, $B \subset A$, or $A \cap B = \emptyset$.
The elements of $\lambda$ are called the nodes of the structure. The element $D$ of $\lambda$ is called the root node, or root for short; the singleton elements $\{U_j\}$ of $\lambda$ are called the leaves. The nodes of $\lambda$ that are not leaves are called the branching nodes. If $A, B \in \lambda$ are such that $A \subset B$, $A \neq B$, and there is no $C \in \lambda$ such that $A \subset C \subset B$ and $C \neq A$ and $C \neq B$, then $A$ is called a child of $B$ and conversely $B$ is called the parent of $A$. The set of children of $B$ in $\lambda$ is denoted by $\mathcal{C}(B, \lambda)$.

For instance, the structure $\lambda$ implied by Equation (3.1) is

$$\big\{ \{U_1, U_2, U_3\}, \{U_2, U_3\}, \{U_1\}, \{U_2\}, \{U_3\} \big\},$$

and it can be graphically represented as shown in the left-hand panel of Figure 1, where $D_{123}$ is a convenient label for the subset $\{U_1, U_2, U_3\}$ and $D_{23}$ for the subset $\{U_2, U_3\}$.

[Figure 1: two rooted trees, one on three leaves (left) and one on four leaves (right).]
Figure 1: On the left, the tree structure implied by Equation (3.1). To ease the notation, the singletons $\{U_1\}$, $\{U_2\}$ and $\{U_3\}$ are denoted by $U_1$, $U_2$ and $U_3$. On the right, $D_{12}$ is a convenient label for $\{U_1, U_2\}$, as well as $D_{34}$ for the subset $\{U_3, U_4\}$ and $D_{1234}$ for the set $\{U_1, U_2, U_3, U_4\}$. Again, we ease the notation by writing $U_1, \ldots, U_d$ instead of $\{U_1\}, \ldots, \{U_d\}$ for the singletons.

In the structure on the left in Figure 1, $\{U_2\}$ and $\{U_3\}$ are the children of $D_{23}$ while $\{U_1\}$ and $D_{23}$ are the children of $D_{123}$, the root node.

Let $\lambda$ be a rooted tree on $D = \{U_1, \ldots, U_d\}$. Define the related set of indices as $d_D = \{1, \ldots, d\}$. Suppose that for each $B \in \lambda$ with $|B| \geq 2$ we are given a generator $\psi_B$, that is, one generator for each branching node in the structure.
Further let the set of indices related to $B$ be denoted as $d_B$.

Next, recursively define the functions $C_B : [0, 1]^{|B|} \to [0, 1]$, $B \in \lambda$, $|B| \geq 1$, by

$$C_B(u_b : b \in d_B) = \begin{cases} u_b & \text{if } B = \{U_b\}, \\ \psi_B\Big( \sum_{A \in \mathcal{C}(B, \lambda)} \psi_B^{-1}\big( C_A(u_a : a \in d_A) \big) \Big) & \text{if } |B| \geq 2. \end{cases} \tag{3.2}$$

Definition 3.2. A $d$-variate copula $C_D$ is a nested Archimedean copula (NAC) if it is of the form $C_B$ in (3.2), with $B = D$.

For any $A \subset D$ with $|A| \geq 2$, the copula $C_A$ on the variables $(u_a : a \in d_A)$ turns out to be a nested Archimedean copula, too. To describe its structure and its generators, we need a few more definitions.

Let $\lambda$ be a NAC structure on $D$ and let $A$ be a nonempty subset of $D$. The set $A$ need not be a node of $\lambda$. The NAC structure $\lambda$ induces a NAC structure on $A$ by the following operation:

$$\lambda \sqcap A = \{ A \cap B : B \in \lambda \} \setminus \{ \emptyset \}.$$

That is, $\lambda \sqcap A$ is obtained by intersecting every node $B$ of $\lambda$ with $A$. Some of these intersections will be empty, and they are removed. Different nodes $B_1$ and $B_2$ of $\lambda$ may have identical intersections $B_1 \cap A$ and $B_2 \cap A$ with $A$; since $\lambda \sqcap A$ is the collection of all intersections, identical intersections are counted only once. It is easy to verify that this construction produces a tree structure on $A$: verification of (i), (ii), and (iii) in Definition 3.1 is immediate.

Let $T$ be a subset of $D$ containing at least two elements, that is, $|T| \geq 2$. The set $T$ does not need to be a node of $\lambda$. The lowest common ancestor (lca) in $\lambda$ of the elements of $T$ is given by the intersection of all the nodes $B$ in $\lambda$ that contain $T$, that is,

$$\operatorname{lca}(T, \lambda) = \bigcap_{B \in \lambda : T \subseteq B} B, \tag{3.3}$$

and it provides the lowest branching node (read: farthest from the root) through which the elements of $T$ are linked up in $\lambda$. For instance, looking back at the left-hand panel of Figure 1, one can see that the lowest common ancestor of $U_2$ and $U_3$ is $D_{23}$, while $\operatorname{lca}(\{U_1, U_2\}, \lambda) = D_{123}$ and $\operatorname{lca}(\{U_1, U_3\}, \lambda) = D_{123}$.

Let $C_D$ be a $d$-variate nested Archimedean copula and let $A$ be a nonempty subset of $D$, not necessarily a node in the tree $\lambda$. The marginal copula $C_A$ on the variables in $A$ is a nested Archimedean copula, too. Its NAC structure is given by $\lambda \sqcap A$, and the generator function associated to a branching node $T$ in $\lambda \sqcap A$ is given by $\psi_{\operatorname{lca}(T, \lambda)}$.

As appealing as it is, Definition 3.2 is unfortunately not sufficient to guarantee that $C_D$ and its margins are copulas.
A sufficient but not necessary condition was developed by Joe (1997, pp. 87–89) and McNeil (2008): the derivative of $\psi_I^{-1} \circ \psi_J$ is required to be completely monotone for every pair of branching nodes $I$ and $J$ in the NAC structure such that $J$ is a child of $I$. As an example, a sufficient condition for $C_{D_{123}}$ in Equation (3.1) to be a proper copula is that the derivative of $\psi_{D_{123}}^{-1} \circ \psi_{D_{23}}$ is completely monotone. Although this sufficient nesting condition was originally formulated only in the context of fully nested Archimedean copula structures, that is, structures where each branching node has either two leaves as children, or one leaf and another branching node, we assume this sufficient nesting condition to hold for any NAC structure. Also note that this sufficient nesting condition can be weakened at least on the lowest nesting level of the structure, as briefly discussed in Hofert (2012).

The sufficient nesting condition is often easily verified if all generators appearing in the nested structure come from the same parametric family. For each family of Table 1, two generators $\psi_I$ and $\psi_J$ of the same family with corresponding parameters $\theta_I$ and $\theta_J$ will fulfill the sufficient nesting condition if $\theta_I \leq \theta_J$, assuming $J$ is a child of $I$. Verifying the sufficient nesting condition if $\psi_I$ and $\psi_J$ do not belong to the same Archimedean family is usually harder; see for instance Hofert (2010).
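The set-based definitions of this section translate almost literally into code. The following sketch represents a structure as a Python set of `frozenset` nodes whose leaves are labeled by integers, and implements the children of a node, the lowest common ancestor (3.3), the induced structure $\lambda \sqcap A$ and the recursion (3.2). The function names, the integer labels and the use of Gumbel generators with parameters 1.5 and 3.0 are illustrative assumptions.

```python
import math


def children(node, structure):
    """Children of `node`: maximal nodes of the structure strictly inside it."""
    below = [a for a in structure if a < node]
    return {a for a in below if not any(a < c for c in below)}


def lca(subset, structure):
    """Equation (3.3): intersection of all nodes that contain `subset`."""
    subset = frozenset(subset)
    return frozenset.intersection(*[n for n in structure if subset <= n])


def restrict(structure, subset):
    """Induced structure: all nonempty intersections of nodes with `subset`."""
    subset = frozenset(subset)
    return {node & subset for node in structure if node & subset}


def nac_cdf(node, structure, theta, u):
    """Recursion (3.2) with Gumbel generators (an assumption of this sketch).

    `theta` maps branching nodes to parameters, `u` maps leaf labels to
    values in (0, 1].
    """
    if len(node) == 1:
        (leaf,) = node
        return u[leaf]
    t = theta[node]
    total = sum((-math.log(nac_cdf(a, structure, theta, u))) ** t
                for a in children(node, structure))
    return math.exp(-total ** (1.0 / t))


# Structure implied by Equation (3.1), with leaves 1, 2, 3.
root3, d23 = frozenset({1, 2, 3}), frozenset({2, 3})
lam3 = {root3, d23, frozenset({1}), frozenset({2}), frozenset({3})}
theta3 = {root3: 1.5, d23: 3.0}  # hypothetical parameters

value = nac_cdf(root3, lam3, theta3, {1: 1.0, 2: 0.4, 3: 0.7})
```

With $u_1 = 1$, `value` is the bivariate margin of $(U_2, U_3)$, whose generator is the one attached to `lca({2, 3}, lam3)`, in line with the statement about marginal copulas above.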
4. Identifiability
Recall that a parameter $\theta$ (possibly infinite-dimensional) in a statistical model $(P_\theta : \theta \in \Theta)$, with $P_\theta$ a probability measure on a fixed space, is identifiable if $\theta_1 \neq \theta_2$ implies that $P_{\theta_1} \neq P_{\theta_2}$, that is, different parameters yield different distributions of the observable. For $d$-variate nested Archimedean copulas, the parameter $\theta$ consists of the pair

$$\big( \lambda, \{ \psi_B : B \in \lambda, |B| \geq 2 \} \big).$$

In this parametrization, the parameter $\theta$ is not identifiable, since replacing a generator function $\psi_B(x)$ by the function $\psi_B(ax)$, with $0 < a < \infty$, yields the same copula; that is, the generator functions are identifiable up to scaling only. This issue can be solved easily in different ways, for instance by requiring that $\psi_B(1) = 1/2$. This problem has however no impact on the structure $\lambda$ itself and is therefore of little interest for this paper.

A more fundamental identifiability issue arises if some generator functions are the same. Consider for instance the tree $\lambda$ implied by Equation (3.1), shown on the left in Figure 1. If the generators $\psi_{D_{123}}$ and $\psi_{D_{23}}$ are the same, say $\psi$, then the nested Archimedean copula with parameter $(\lambda; \psi_{D_{123}}, \psi_{D_{23}})$ is

$$C_{D_{123}}\big(u_1, C_{D_{23}}(u_2, u_3)\big) = \psi_{D_{123}}\Big(\psi_{D_{123}}^{-1}(u_1) + \psi_{D_{123}}^{-1}\big(\psi_{D_{23}}(\psi_{D_{23}}^{-1}(u_2) + \psi_{D_{23}}^{-1}(u_3))\big)\Big) = \psi\big(\psi^{-1}(u_1) + \psi^{-1}(u_2) + \psi^{-1}(u_3)\big),$$

and actually describes an exchangeable Archimedean copula with generator $\psi$, that is, a nested Archimedean copula with trivial tree structure and single generator $\psi$.

To ensure identifiability of the structure, we must require that for any two nodes $A$ and $B$ such that $A \subset B$ and $A \neq B$, meaning $A$ is a descendant of $B$, the bivariate Archimedean copulas generated by the generator functions $\psi_A$ and $\psi_B$ are different, preventing the tree structure from collapsing at some level. If this condition holds, then the structure $\lambda$ can be identified. This weak restriction on the generators will be assumed to hold throughout this paper.

Note that some generator functions can still be identical. Consider for instance the structure on the right in Figure 1. The generators associated to the nodes $D_{12}$ and $D_{34}$ can be identical, without simplification of the tree being possible.

Also note the implication of this identifiability condition on the sufficient nesting condition if all generators appearing in the nested structure come from the same parametric family. For each family of Table 1, two generators $\psi_I$ and $\psi_J$ of the same family with corresponding parameters $\theta_I$ and $\theta_J$ will fulfill the sufficient nesting condition and the identifiability condition if $\theta_I$ is strictly less than $\theta_J$, assuming $J$ is a child of $I$.
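The collapse of the tree when two nested generators coincide can be verified numerically. The sketch below uses the Clayton generator of Table 1; the function names and parameter values are assumptions chosen for illustration.

```python
def clayton_psi(x, theta):
    """Clayton generator: psi(x) = (1 + x)^(-1/theta)."""
    return (1.0 + x) ** (-1.0 / theta)


def clayton_psi_inv(u, theta):
    """Inverse Clayton generator: psi^{-1}(u) = u^(-theta) - 1."""
    return u ** (-theta) - 1.0


def nested(u1, u2, u3, theta_root, theta_child):
    """Equation (3.1) with Clayton generators at both branching nodes."""
    inner = clayton_psi(
        clayton_psi_inv(u2, theta_child) + clayton_psi_inv(u3, theta_child),
        theta_child,
    )
    return clayton_psi(
        clayton_psi_inv(u1, theta_root) + clayton_psi_inv(inner, theta_root),
        theta_root,
    )


def exchangeable(u1, u2, u3, theta):
    """Trivariate Archimedean (trivial structure) Clayton copula."""
    return clayton_psi(sum(clayton_psi_inv(v, theta) for v in (u1, u2, u3)), theta)


point = (0.2, 0.5, 0.9)
same_generators = nested(*point, 2.0, 2.0)       # collapses to the trivial tree
different_generators = nested(*point, 1.0, 2.0)  # structure remains visible
```

With identical parameters the nested copula agrees with the exchangeable one at every point, so no procedure could tell the two structures apart; with a strictly larger inner parameter the two values differ.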
5. Nonparametric estimation of a trivariate NAC structure
Let $(X_1, X_2, X_3)$ be a vector of continuous random variables such that the joint distribution of $(U_1, U_2, U_3) = (F_{X_1}(X_1), F_{X_2}(X_2), F_{X_3}(X_3))$ is a nested Archimedean copula. We are interested in estimating the NAC structure based on $n$ observations $(x_{l1}, x_{l2}, x_{l3})$ from $(X_1, X_2, X_3)$, $l = 1, \ldots, n$.

There are only four possible structures fulfilling Definition 3.1 for the trivariate case:

$$\big\{ \{U_1, U_2, U_3\}, \{U_1\}, \{U_2\}, \{U_3\} \big\} = \Lambda;$$
$$\big\{ \{U_1, U_2, U_3\}, \{U_1, U_2\}, \{U_1\}, \{U_2\}, \{U_3\} \big\} = \lambda_{12};$$
$$\big\{ \{U_1, U_2, U_3\}, \{U_1, U_3\}, \{U_1\}, \{U_2\}, \{U_3\} \big\} = \lambda_{13};$$
$$\big\{ \{U_1, U_2, U_3\}, \{U_2, U_3\}, \{U_1\}, \{U_2\}, \{U_3\} \big\} = \lambda_{23}.$$

In the trivial structure (tree $\Lambda$), all bivariate marginal distributions of the nested Archimedean copula are the same, while in structures $\lambda_{12}$, $\lambda_{13}$ and $\lambda_{23}$, two bivariate marginal distributions are the same and one is different. Moreover, if the bivariate marginal distributions are not all the same, being able to determine the one that is different from the two others is enough to select the proper nested Archimedean copula structure $\lambda_{12}$, $\lambda_{13}$ or $\lambda_{23}$.

It is known from Genest and Rivest (1993) that the Kendall distribution of a pair of variables $(X_j, X_k)$ fully determines the copula of that pair provided the copula is Archimedean. Thus, rather than working directly with bivariate distributions, let us work with the related Kendall distributions, which are univariate and therefore easier to handle. The Kendall distribution of the pair $(X_j, X_k)$ is defined as the distribution of the variable

$$W_{jk} = C_{jk}(U_j, U_k) = H_{jk}(X_j, X_k),$$

where $C_{jk}(u_j, u_k) = P(U_j \leq u_j, U_k \leq u_k)$ is the joint CDF of $(U_j, U_k)$, and where $H_{jk}(x_j, x_k) = P(X_j \leq x_j, X_k \leq x_k)$ is the joint CDF of $(X_j, X_k)$. The map defined for all $w \in [0, 1]$ by

$$K_{jk}(w) = P(W_{jk} \leq w)$$

is the Kendall distribution function (Barbe et al. 1996; Nelsen et al. 2003; Genest and Rivest 2001).

The Kendall distribution function of a pair of variables $(X_j, X_k)$ can be estimated (Genest, Nešlehová and Ziegel, 2011) by first computing the pseudo-observations $w_{1,jk}, \ldots, w_{n,jk}$ and then the empirical distribution function of these pseudo-observations:

$$w_{m,jk} = \frac{1}{n + 1} \sum_{l=1}^{n} \mathbb{1}(x_{lj} < x_{mj}, \, x_{lk} < x_{mk}); \qquad K_{n,jk}(w) = \frac{1}{n} \sum_{m=1}^{n} \mathbb{1}(w_{m,jk} \leq w),$$

with $0 < w < 1$.

Since there are three possible pairs in our case, namely $(X_1, X_2)$, $(X_1, X_3)$ and $(X_2, X_3)$, three empirical Kendall distribution functions need to be estimated. The distance between the empirical Kendall distribution functions of $(X_i, X_j)$ and $(X_i, X_k)$ is defined as

$$\int_0^1 | K_{n,ij}(x) - K_{n,ik}(x) | \, dx = \frac{1}{n} \sum_{m=1}^{n} | w_{(m),ij} - w_{(m),ik} | = \delta_{ij,ik},$$

where $w_{(1),ij}, \ldots, w_{(n),ij}$ are the ordered pseudo-observations related to the variables $(X_i, X_j)$ and $w_{(1),ik}, \ldots, w_{(n),ik}$ are those related to the variables $(X_i, X_k)$.

Typically, a trivial structure will result in three distances that are all about the same, while trees such as $\lambda_{12}$, $\lambda_{13}$ or $\lambda_{23}$ will result in one small distance relative to two other distances that are bigger and about the same. Thus for any three variables $(X_i, X_j, X_k)$, if, for instance, $\delta_{ij,ik}$ is the minimum among the three distances, it seems reasonable to assume that the tree spanned on $(X_i, X_j, X_k)$ is either the trivial structure or the structure $\lambda_{jk}$ where $(X_i, X_j)$ and $(X_i, X_k)$ have the same Kendall distribution.

The problem of determining the structure of $(X_1, X_2, X_3)$ can thus be rewritten as a hypothesis test:
$H_0$: the true structure is the trivial structure.
$H_1$: the true structure is structure $\lambda_{12}$ or $\lambda_{13}$ or $\lambda_{23}$, depending on which pair of Kendall functions were the closest.
As a test statistic, the absolute difference between the minimum distance and the average of the two remaining distances is used. The null hypothesis is rejected when the test statistic is observed in the upper tail of its $H_0$ distribution.

As the $H_0$ distribution of the test statistic is unknown, we rely on the bootstrap to calculate p-values. Under $H_0$ the original sample is assumed to come from an unknown trivariate Archimedean copula. Using the work of Genest et al. (2011), it is possible to estimate that Archimedean copula nonparametrically and to resample from that estimated Archimedean copula. For each new sample one obtains the three empirical Kendall distributions, the three distances, and the related test statistic. The p-value of the observed test statistic from the original sample is then estimated by the proportion of test statistics obtained from the new samples that are greater than or equal to the value of the observed test statistic from the original sample. Should this estimated p-value be lower than a significance level $\alpha$, for instance 10%, the null hypothesis is rejected.

Since the estimator for the Kendall distribution depends on the data only through the ranks and since our test statistic only depends on this estimator, our NAC structure estimator is rank-based, too.

There are two key points in the test presented above:
• First, determine which should be the alternative hypothesis. Should it be structure $\lambda_{12}$, $\lambda_{13}$ or $\lambda_{23}$?
• Second, choose between a trivial structure ($= H_0$) and $H_1$.

Possible errors are:
• If the true structure is the trivial structure, rejecting it and therefore committing a type I error;
• If the true structure is structure $\lambda_{12}$, $\lambda_{13}$ or $\lambda_{23}$, failing to reject $H_0$ (type II error);
• If the true structure is for instance structure $\lambda_{12}$, getting a wrong $H_1$ and then picking $H_1$ (we will call this a type III error).

The main difficulty with the test developed in this section is encountered when the true structure is the trivial trivariate structure, that is, the structure one gets when the nested Archimedean copula is actually an exchangeable Archimedean copula. Indeed, if the probability of committing a type I error is fixed to $\alpha = 0.10$, the trivial structure will be rejected 10% of the time regardless of the input sample size $n$. Our estimator is therefore not a consistent estimator for the trivial trivariate structure, unless we let $\alpha$ tend to 0 as $n$ increases, so that type I errors are asymptotically impossible. Practically speaking however, this rule has little significance as it does not help to select $\alpha$ given a value of $n$. In the simulation section of this paper, $\alpha$ will be fixed to 10% for all $n$, yielding satisfactory performance.
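The rank-based quantities of this section can be sketched in a few lines. The code below computes the pseudo-observations, the distance between two empirical Kendall distribution functions, the test statistic, and the candidate alternative structure; the bootstrap step is deliberately left out, since it requires resampling from a nonparametrically estimated Archimedean copula. All function names and the toy data are illustrative assumptions.

```python
from itertools import combinations


def pseudo_obs(x, y):
    """w_m = #{l : x_l < x_m and y_l < y_m} / (n + 1), for m = 1, ..., n."""
    n = len(x)
    return [sum(1 for l in range(n) if x[l] < x[m] and y[l] < y[m]) / (n + 1)
            for m in range(n)]


def kendall_distance(w_a, w_b):
    """Mean absolute difference between ordered pseudo-observations."""
    a, b = sorted(w_a), sorted(w_b)
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def trivariate_statistic(sample):
    """Return (candidate nested pair, test statistic) for rows (x1, x2, x3).

    The candidate pair (j, k), in 1-based variable indices, identifies the
    alternative structure lambda_jk: it is the pair NOT involved in the
    smallest of the three distances.
    """
    cols = list(zip(*sample))
    w = {p: pseudo_obs(cols[p[0]], cols[p[1]])
         for p in combinations(range(3), 2)}
    delta = {(p, q): kendall_distance(w[p], w[q])
             for p, q in combinations(w, 2)}
    closest = min(delta, key=delta.get)
    others = [v for key, v in delta.items() if key != closest]
    stat = abs(delta[closest] - sum(others) / 2.0)
    nested = next(p for p in w if p not in closest)
    return tuple(i + 1 for i in nested), stat


# Toy data: X3 is an increasing transform of X2, X1 is a fixed permutation.
x1 = [(7 * i) % 11 for i in range(11)]
x2 = list(range(11))
x3 = [v ** 3 for v in x2]
candidate, stat = trivariate_statistic(list(zip(x1, x2, x3)))
```

In this toy example the pairs $(X_1, X_2)$ and $(X_1, X_3)$ have identical ranks, so their distance is exactly zero and the candidate alternative is $\lambda_{23}$. Whether the trivial structure should actually be rejected would still have to be decided by the bootstrap p-value described above.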
6. Recovering a target structure from trivariate structures
In Section 5, we showed how to infer the tree structure for three variables at a time. Next, we need to assemble these trivariate structures into a single $d$-variate structure. For this to be possible, we need to ensure that the full tree is indeed determined by the tree structures it induces on the collection of subsets of three variables.

Let $\lambda$ be a NAC structure on $D = \{U_1, \ldots, U_d\}$, $d \geq 3$. Let $B$ be a branching node of $\lambda$. The set of all children of $B$ forms a partition of $B$, that is, taking the union of all children of a branching node $B$ allows one to reconstruct that branching node. As a consequence, every branching node has at least two children.

Since the children of a branching node $B$ form a partition of $B$ and since each branching node has at least two children, it follows that each branching node can be reconstructed from the pairs of which it is the lowest common ancestor, that is, for every branching node $B$, we have

$$B = \bigcup \big\{ \{U_i, U_j\} \subset D : U_i \neq U_j, \ \operatorname{lca}(\{U_i, U_j\}, \lambda) = B \big\}. \tag{6.1}$$

The relation “. . . has the same lowest common ancestor as . . . ” is an equivalence relation on the set of pairs $\{U_i, U_j\}$ of $D$. This relation induces a partition of the set of pairs into equivalence classes: two pairs $\{U_i, U_j\}$ and $\{U_k, U_l\}$ belong to the same equivalence class if and only if they have the same lowest common ancestor in $\lambda$.

By Equation (6.1), the nested Archimedean copula structure $\lambda$ can be reconstructed from the equivalence relation it induces on the set of pairs: every equivalence class of pairs corresponds to a branching node, the branching node being given by the union of the pairs in that equivalence class. Put differently, the union of all pairs within an equivalence class yields the branching node that is the lowest common ancestor for each pair in that equivalence class. Hence, every NAC structure $\lambda$ on $D$ can be represented as a partition on the set of pairs of $D$.

Let $d \geq 4$. Suppose that for any set $K_{ijk} = \{U_i, U_j, U_k\}$ with distinct $i, j, k \in \{1, \ldots, d\}$, the tree spanned on $\{U_i, U_j, U_k\}$, $\lambda \sqcap K_{ijk}$, is known. Define $(\lambda)$ as the set of these $\binom{d}{3}$ trees.

In Proposition 6.1, it is shown that the nested Archimedean copula structure $\lambda$ can be recovered from $(\lambda)$. Lemmas 1 and 2 contain some auxiliary results, with proofs in the Appendix.

Lemma 1. Let $\lambda$ be a tree on $D$. For any nonempty subsets $T_1$, $T_2$, $C$ of $D$ such that $T_1 \cup T_2 \subset C$, we have

$$\operatorname{lca}(T_1, \lambda) = \operatorname{lca}(T_2, \lambda) \iff \operatorname{lca}(T_1, \lambda \sqcap C) = \operatorname{lca}(T_2, \lambda \sqcap C).$$

Essentially, this lemma states that if two subsets of $D$ have the same lowest common ancestor in $\lambda$, then they also have the same lowest common ancestor in any subtree of $\lambda$, provided the two subsets are included in the set of leaves of that subtree. For instance, consider the structure in the left-hand panel of Figure 8, where D = { U , . . . , U }. The set { U , U , U } has the same lowest common ancestor as the set { U , U } in λ, this lowest common ancestor being the node D . This holds true even if you consider only the subtree spanned on { U , U , U , U }, that is, even if you only consider the right branch of the structure in Figure 8.

Lemma 2. Let $\lambda$ be a tree on $D$ and let $A \in \lambda$. Let $B$ be a nonempty subset of $D$ with at least two elements. The lowest common ancestor of $B$ is equal to $A$ if and only if $B \subset A$ and there exist distinct children $B_1$ and $B_2$ of $A$ such that $B \cap B_1 \neq \emptyset$ and $B \cap B_2 \neq \emptyset$.

The meaning of this lemma is less straightforward. It states that if $B$ is a subset of $A$, $A$ being a node of $\lambda$, the only way $A$ is going to be the lowest common ancestor of $B$ is if $B$ has a nonempty intersection with two distinct children of $A$. Consider Figure 8 again. If A = { U , U , U , U }, then the lca of B = { U , U } is A because in that case, B has a nonempty intersection with the only two children of A, these two children being { U } and D = { U , U , U }.

Proposition 6.1. The NAC structure $\lambda$ can be recovered from the set $(\lambda)$, that is, it is possible to retrieve the partition of the set of pairs $\{U_i, U_j\}$ of $D$ into equivalence classes from the set $(\lambda)$.

Proof. First, let $\{U_i, U_j\}$ and $\{U_i, U_k\}$ be two pairs with exactly one element, $U_i$, in common. To see whether they have the same lowest common ancestor in $\lambda$, it is sufficient to consider the tree induced by $\lambda$ on the set $\{U_i, U_j, U_k\}$: it is known from Lemma 1 that the pairs $\{U_i, U_j\}$ and $\{U_i, U_k\}$ have the same lowest common ancestor in $\lambda$ if and only if they have the same lowest common ancestor in $\lambda \sqcap \{U_i, U_j, U_k\}$.

On the other hand, if two pairs are disjoint, there exists no set with only three elements containing both pairs. Still, considering $\{K_{ijk}\}$ with distinct $i, j, k \in \{1, \ldots, d\}$ turns out to be sufficient to verify the equivalence of the two disjoint pairs: the two pairs can only be equivalent if there is a third pair equivalent to both of them and having a nonempty intersection with each of them.

Indeed, suppose first there exists a pair $\{U_i, U_j\}$ having the same lowest common ancestor as the pair $\{U_i, U_k\}$. Also suppose $\{U_i, U_k\}$ has the same lowest common ancestor as $\{U_k, U_l\}$. Then by transitivity $\{U_i, U_j\}$ has the same lowest common ancestor as $\{U_k, U_l\}$.

Conversely, suppose that $\{U_k, U_l\}$ and $\{U_i, U_j\}$ have the same lowest common ancestor, $A$. Recall Lemma 2. Let $B_i$, $B_j$, $B_k$, $B_l$ be the children of $A$ to which $U_i$, $U_j$, $U_k$, $U_l$ belong, respectively. Since the lca of $U_i$ and $U_j$ is $A$, the pair $\{U_i, U_j\}$ must have a nonempty intersection with two different children of $A$ (Lemma 2). Hence, $B_i$ and $B_j$ must be disjoint, $B_i \cap B_j = \emptyset$. Similarly $B_k \cap B_l = \emptyset$. Then $B_k$ and $B_l$ cannot both be equal to $B_i$.
• If B_k is different from B_i, then U_i and U_k belong to two different children of A, and the lowest common ancestor of {U_i, U_k} is A, too;
• If B_l is different from B_i, then, similarly, the lowest common ancestor of {U_i, U_l} is A, too.

In both cases, we have found a pair that is equivalent to {U_i, U_j} and {U_k, U_l} and that has a nonempty intersection with each of them.

Given a nested Archimedean copula structure λ, it is thus always possible to break it down into a set of trivariate structures, one trivariate structure for each combination of the elements of D, taken three at a time without repetition. Proposition 6.1 states that this set of trivariate structures is sufficient to recover λ. The idea of decomposing a structure into smaller pieces that uniquely determine it is not new, however. Ng and Wormald (1996) show that a given structure can be broken down into a set of triples and fans. A triple is a tree with three leaves and two internal vertices. A fan is a tree with only one internal vertex and at least three leaves (that is, a fan is a trivial tree with at least three leaves). A fan with d leaves is called a d-fan.

Hereafter is a practical example of how to retrieve λ from its set of trivariate structures ( λ ) when d = 4. Suppose indeed that the $\binom{4}{3} = 4$ elements of ( λ ) are as shown in Figure 2.

Figure 2: A set of trivariate structures that uniquely determines the four-variate structure in Figure 1. Note that the labels of the internal nodes are irrelevant: the lowest common ancestor of a given pair of leaves could have been labeled P instead of D in the first structure and G in the second structure.

From Figure 2, we read off, for each of the six pairs {U_i, U_j} with i < j, its set of lowest common ancestors. Each such set contains two nodes, one from each of the two trivariate structures in which the pair appears.

It appears therefore that four of the six pairs belong to the same equivalence class, while each of the two remaining pairs is in a class by itself. The branching nodes of λ in this case are therefore {U_1, U_2, U_3, U_4} and these two remaining pairs. The rooted tree structure λ is thus as shown in Figure 1.

The general procedure for arbitrary d is given in Algorithm 1.

Algorithm 1: How to retrieve λ from ( λ )

for all pairs {U_i, U_j} such that i < j do
    Get from ( λ ) the set of lowest common ancestors. There should be d − 2 of them, one for each trivariate structure containing the pair;
end for
for all pairs {U_i, U_j} such that i < j do
    Intersect the set of lowest common ancestors of the working pair with the other sets (one set for each other pair of variables). Any nonempty intersection means the two pairs are related, that is, belong to the same equivalence class. This also determines the number of equivalence classes;
end for
for all equivalence classes do
    Take the union of all pairs within each equivalence class to get the branching nodes of the structure. There are as many branching nodes as there are equivalence classes;
end for
Add the leaves to the branching nodes to get λ.
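The steps of Algorithm 1 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `rebuild_structure`, the encoding of a trivariate structure (the nested pair of a triple, or `None` for a 3-fan) and the integer leaf labels are all hypothetical. The intersection step of the algorithm is replaced here by an equivalent union-find merge: within one trivariate structure, the pairs whose lowest common ancestor is the root share that ancestor and are therefore put in the same class.

```python
from itertools import combinations


def rebuild_structure(leaves, nested_pair):
    """Rebuild a tree, as a set of nodes, from its trivariate structures.

    leaves      : list of hashable leaf labels (hypothetical encoding).
    nested_pair : dict mapping frozenset(triple) to the pair of leaves that is
                  nested in that triple, or to None when the triple is a 3-fan.
    Each returned node is the frozenset of the leaves below it.
    """
    pairs = [frozenset(p) for p in combinations(leaves, 2)]
    parent = {p: p for p in pairs}  # union-find forest over the pairs

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for triple in combinations(leaves, 3):
        inner = nested_pair[frozenset(triple)]
        # Pairs whose lowest common ancestor is the root of this trivariate
        # structure share that ancestor, hence are equivalent.
        at_root = [frozenset(p) for p in combinations(triple, 2)
                   if frozenset(p) != inner]
        for p in at_root[1:]:
            parent[find(p)] = find(at_root[0])

    # One branching node per equivalence class: the union of its pairs.
    classes = {}
    for p in pairs:
        classes.setdefault(find(p), set()).update(p)
    nodes = {frozenset(members) for members in classes.values()}
    # Finally, add the leaves themselves.
    nodes |= {frozenset([leaf]) for leaf in leaves}
    return nodes


# Illustrative input: leaves 1..4 with the pairs {1, 2} and {3, 4} nested.
example = {
    frozenset({1, 2, 3}): frozenset({1, 2}),
    frozenset({1, 2, 4}): frozenset({1, 2}),
    frozenset({1, 3, 4}): frozenset({3, 4}),
    frozenset({2, 3, 4}): frozenset({3, 4}),
}
tree = rebuild_structure([1, 2, 3, 4], example)
```

With this illustrative input, four pairs end up in one class and the pairs {1, 2} and {3, 4} are each alone in their class, which mirrors the four-variate example discussed above.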
7. Reconstruction of a NAC structure based on a set of estimated trivariate structures
Let λ be a NAC structure on a finite set D = {U_1, …, U_d}, d ≥ 4. It is known from Section 6 that if the tree spanned on any three distinct elements of D is known (that is, each element of ( λ ) is known), then λ can be uniquely recovered, for instance using the algorithm at the end of the same section. Our suggestion for estimating λ is therefore to estimate each element of ( λ ) using the procedure developed in Section 5. This yields an estimated set of trivariate structures, which can then be used to build $\hat{\lambda}$.

However, if each element of ( λ ) is estimated, the problem of reconstructing a tree from that set of estimated trivariate trees is somewhat different from the one considered in Section 6. Indeed, it is not guaranteed that a proper nested Archimedean copula structure can be recovered from a given set of estimated trivariate structures, that is, the estimated set does not necessarily lead to a proper tree $\hat{\lambda}$. When the $\hat{\lambda}$ retrieved from the estimated set is not a proper nested Archimedean copula structure, meaning it does not fulfill Definition 3.1, we call the estimated set a faulty set.

With a value of α equal to 0.00 for all tests required to estimate ( λ ), we fail to reject the null hypothesis everywhere and we therefore get a set of estimated trivariate structures each describing a 3-fan. Such a set is never a faulty set, and $\hat{\lambda}$, the estimated NAC structure retrieved from it, will always be a trivial structure of dimension d, a d-fan. Of course, if the true structure is not a d-fan, a value of α equal to 0.00 means type II errors are certain to be committed.

With a value of α equal to 1.00 for all tests, all null hypotheses are rejected and we end up with a set where each estimated trivariate structure describes a triple. Such a set can be a faulty set and usually is.

Assuming the copula of the vector (X_1, …, X_d) is a nested Archimedean copula, a faulty set of estimated trivariate structures means at least one error (type I, type II or type III) has been committed. Notice the converse is not true: even when at least one type I, type II or type III error has been committed, the estimated set might lead to a structure $\hat{\lambda}$ that meets Definition 3.1 but is not equal to λ, the target structure. Suppose indeed the target structure is the structure on the right in Figure 1. We know from Section 6 that this structure is uniquely determined by the four triples shown in Figure 2. Suppose however that the first two triples in Figure 2 are replaced by two 3-fans, that is, two type II errors have been committed during the estimation process. This still leads to a proper four-variate structure, shown on the left in Figure 5, but one that differs from the target structure.

A faulty set is essentially a red flag that should be viewed as an opportunity for correction. However, what kind of corrective measure should be applied to such a set is not straightforward. As done in the simulation study, we simply suggest decreasing the value of α for all tests until the resulting set of estimated trivariate structures is no longer a faulty set. At worst, α is decreased down to 0.00, we end up with a set of 3-fans, and $\hat{\lambda}$ is then a d-fan. This strategy is certainly not the best one can imagine, but it is very convenient to apply and ensures that $\hat{\lambda}$ will always be a proper tree.

Since the estimator developed in Section 5 is unable to consistently estimate a 3-fan if we keep the same value of α > 0 for all n, we will also be unable to consistently estimate any λ that has at least one trivial trivariate component; see for instance the simulation results from Figure 5.
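As a complement, the sketch below checks whether a candidate tree reproduces a given set of estimated trivariate structures. It is an illustrative consistency check under our own assumptions and should not be read as the faulty-set criterion of this paper, since Definition 3.1 is not restated in this section. A tree is encoded as a set of nodes, each node being the frozenset of the leaves below it; the check first requires any two nodes to be nested or disjoint, then restricts the tree to each triple C by intersecting every node with C, as in the definition of the restricted structure used in the Appendix. All function names and leaf labels are hypothetical.

```python
from itertools import combinations


def is_laminar(nodes):
    """True if any two nodes are either nested or disjoint (tree-like)."""
    return all(a <= b or b <= a or not (a & b)
               for a, b in combinations(nodes, 2))


def nested_pair_in(nodes, triple):
    """Nested pair of the structure restricted to `triple`, None for a 3-fan."""
    triple = frozenset(triple)
    pairs = {node & triple for node in nodes if len(node & triple) == 2}
    return next(iter(pairs), None)  # a laminar family yields at most one pair


def reproduces(nodes, leaves, nested_pair):
    """Check that `nodes` is tree-like and induces every given trivariate structure."""
    if not is_laminar(nodes):
        return False
    return all(nested_pair_in(nodes, t) == nested_pair[frozenset(t)]
               for t in combinations(leaves, 3))


leaves = [1, 2, 3, 4]
singletons = {frozenset([u]) for u in leaves}

# Two 3-fans and two triples nesting {3, 4}: compatible with the tree (1, 2, (3, 4)).
two_fans = {
    frozenset({1, 2, 3}): None,
    frozenset({1, 2, 4}): None,
    frozenset({1, 3, 4}): frozenset({3, 4}),
    frozenset({2, 3, 4}): frozenset({3, 4}),
}
candidate = singletons | {frozenset(leaves), frozenset({3, 4})}
ok = reproduces(candidate, leaves, two_fans)

# One nested triple among three 3-fans: the 4-fan does not reproduce it.
one_triple = dict.fromkeys(two_fans, None)
one_triple[frozenset({1, 2, 3})] = frozenset({1, 2})
four_fan = singletons | {frozenset(leaves)}
bad = reproduces(four_fan, leaves, one_triple)
```

The first case corresponds, up to our arbitrary labeling, to the situation described above in which two type II errors still produce a proper four-variate structure. The second case shows a set of trivariate structures that no single tree can reproduce, which is the kind of contradiction a corrective rule has to resolve.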
8. Simulation study
Let (X_1, X_2, X_3) be a vector of random variables, the copula of which is Archimedean. We generate 500 samples of size n from (X_1, X_2, X_3) with the help of the R package nacopula (Hofert and Maechler, 2011). Please note that this package has since been merged with the copula package. With α arbitrarily set to 0.10, how many times among the 500 samples are we able to retrieve the 3-fan? Figure 3 shows the percentage of correct estimates for various values of n, various generator families and two different values of the related parameter θ, expressed as Kendall's τ coefficient for convenience according to Table 1.

[Figure 3: two panels, one per value of τ, showing the percentage of correct estimates against n for the Clayton, Gumbel and Joe families.]

Figure 3: Percentage of correct estimates when the true structure is the trivial trivariate structure.

As expected, the percentage of correct estimates does not converge towards 100% but oscillates around 90%, that is, (1 − α) × 100%. With an α of 2.5% or 0.1% for all n, the percentage of correct estimates would likewise oscillate around 97.5% or 99.9% respectively.

If we generate samples from a 5-fan structure (left-hand side of Figure 4), with τ arbitrarily set to 0.5 for all tested generator families and the same arbitrary value of α as before, the same lack of consistency can be observed, as shown on the right-hand side of Figure 4. Notice that the percentage of correct estimates in this case is near 100%, even though α = 0.1. This excellent performance can be explained by the way faulty structures are handled: recall that the strategy is to decrease α until a valid structure emerges. At worst, α is decreased to 0% and the estimated structure is then the trivial five-variate structure, which happens to be the target structure in this case. This rule of lowering α therefore not only ensures that $\hat{\lambda}$ is always a proper tree but also improves the performance of our estimator for this particular case.

[Figure 4: on the left, the 5-fan structure; on the right, the percentage of correct estimates against n for the Gumbel, Frank and Joe families.]
Figure 4: Percentage of correct estimates for the trivial five-variate case.

Finally, if we generate samples from the structure on the left-hand side of Figure 5, the estimator again fails to be consistent (right-hand side of Figure 5). Since two of the trivariate structures contained in this four-variate structure are 3-fans, this lack of consistency was, again, expected.

[Figure 5: on the left, the four-variate structure; on the right, the percentage of correct estimates against n for the Clayton, Gumbel and Frank families.]
Figure 5: Percentage of correct estimates for a four-variate case.

In all these cases, it seems consistency can be achieved only if we let α tend to 0 as n increases, in order to ensure type I errors are asymptotically impossible.

8.2. Testing the method from Okhrin et al. (2013a) with samples from a 3-fan, a 5-fan and a four-variate structure containing two 3-fans

In order to estimate a NAC structure, Okhrin et al. (2013a) advise using what they call the binary aggregated grouping with recursive estimation method, or RML method in short. Essentially, this approach consists in building a fully nested tree from bottom to top and then aggregating some of the nodes of the resulting tree according to some criterion, so that the final estimated structure can possibly be something other than a fully nested tree.

To apply the RML method throughout the simulation section of this paper, we used the function estimate.copula of the R package HAC (Okhrin and Ristig, 2012), this package being related to the work of Okhrin et al. (2013a). Since only the Clayton and Gumbel generator families are currently implemented in the HAC package, assessment of the RML method performance for other generator families is not possible at the time of writing.

For the aggregation step in their approach, Okhrin et al. (2013a) suggest several criteria. We used the only criterion currently implemented in the HAC package, namely that for any two successive nodes with estimated parameters $\hat{\theta}_I$ and $\hat{\theta}_J$ in the structure, the nodes have to be aggregated if $|\hat{\theta}_I - \hat{\theta}_J| < \epsilon$, where ε has to be chosen by the user.

With a value for ε arbitrarily set to 0.30 for all n, Figure 6 displays the performance of their method for the estimation of a 3-fan.

[Figure 6: two panels, one per value of τ, showing the percentage of correct estimates against n for the Clayton and Gumbel families.]

Figure 6: Performance of the RML method by Okhrin et al. (2013a) for a 3-fan, with ε = 0.30 as threshold for aggregation.

Increasing the value of ε for all n typically improves the performance of their estimator in the case of a 3-fan, as it increases the chances of aggregation. Lowering the value of ε typically deteriorates the performance of their estimator for this case. At the limit, with ε set to 0.00, no aggregation is done at all, and their estimator is unable to estimate correctly the trivial trivariate structure studied here. These remarks also hold if the samples are generated from a 5-fan.

The case of the structure on the left-hand side of Figure 5 is a little more complex to investigate. Their estimator is indeed able to consistently estimate this structure for ε set to 0.15, 0.30 or 0.60, but not for ε set to 5.00, for instance.

Given 500 samples of size n from a non-trivial trivariate structure (a triple) such as the one on the left of Figure 1 and α = 0.10, how many times among the 500 samples are we able to retrieve this triple with our method? Figure 7 shows the percentage of correct estimates for various values of n and various generator families (again, note that the same generator family is always used across all nodes of a given structure in the simulation section of this paper). The parameters of the two branching nodes (the root node and the other branching node) are expressed as Kendall's τ coefficients for convenience.

[Figure 7: two panels, each with its own pair of τ values, showing the percentage of correct estimates against n for the Clayton, Gumbel and Joe families.]
Figure 7: Percentage of correct estimates when d = 3 and true structure is λ.

As the sample size increases, there is a clear convergence towards 100% of correct estimates. The further apart the two values of τ, the faster the convergence towards 100% of correct estimates (compare the two horizontal axes in Figure 7). These results strongly suggest our estimator, at least when α = 10%, is a consistent estimator for any non-trivial trivariate NAC structure and thus for any larger NAC structure made up only of triples. Indeed, if the samples are generated from a seven-variate structure such as the one on the left-hand side of Figure 8, with a value of Kendall's τ specified at each of its six branching nodes, the same convergence is observed as n increases (right-hand side of Figure 8).

[Figure 8: on the left, the seven-variate structure with six branching nodes; on the right, the percentage of correct estimates against n, for n up to 1200, for the Clayton, Joe and Frank families.]
Figure 8: Percentage of correct estimates (right) for a seven-variate structure (shown on the left).

Increasing the value of α for all n actually further improves the performance of our estimator for both structures, the best performance possible being delivered when α is set to its upper limit, that is, α = 100%. If α is set to its lower limit, that is, α = 0%, our estimator becomes unable to estimate correctly either of the two structures studied here.

When the target structure is a triple, we found that the percentage of correct estimates also converges towards 100% when using the RML method from Okhrin et al. (2013a) as we did in Subsection 8.2, provided the value of ε is small enough. In fact, as any aggregation should be avoided when the target structure is a triple, the performance of their estimator typically improves by lowering ε for all n, the best performance being delivered when ε = 0, that is, when the aggregation step is completely skipped. Should the value of ε be too large, their estimator will fail to be a consistent estimator for the non-trivial trivariate structure.

When the target structure is a triple, Figure 9 allows for a direct comparison between our approach and their approach when both are pushed to their respective favorable limit, thus with α = 100% for our method and with ε = 0 for the method of Okhrin et al. (2013a).

[Figure 9: two panels showing the percentage of correct estimates against n, for n up to 50; left panel for the Clayton and Gumbel families, right panel for the Clayton, Gumbel and Frank families.]
Figure 9: Percentage of correct estimates when d = 3 and true structure is λ. Left is the RML method for Clayton and Gumbel generators, right is the method described in this paper for Clayton, Gumbel and Frank generators, the latter generator being an arbitrary choice. Both methods were pushed to their respective limit in order to deliver the best performance possible for structure λ.

There is a performance gap between the two methods. Recall however that the RML method was applied with the prior knowledge that the generators were Clayton and Gumbel generators, while our method does not require such prior knowledge.
9. Application
Daily log returns from January 2010 to December 2012 of the following stocks were gathered with the help of Yahoo! Finance:

• Abercrombie & Fitch Co. (ANF), traded in New York;
• Amazon.com Inc. (AMZN), traded in New York;
• China Mobile Limited (ChM), traded in Hong Kong;
• PetroChina (PCh), traded in Hong Kong;
• Groupe Bruxelles Lambert (GBLB), traded in Brussels;
• and KBC Group (KBC), traded in Brussels.

For each of these six time series, we fitted a GARCH(1,1) model with generalized error distribution and extracted the residuals, that is, we divided each of the six original time series by the related estimated standard deviations, leading to a table of n = 740 observations and d = 6 columns. We will call these six new time series the GARCH(1,1)-standardized log returns. We performed a Ljung-Box test (lag 20) on each of the six GARCH(1,1)-standardized log returns as well as on each of the six squared GARCH(1,1)-standardized log returns and failed to reject the null hypothesis of zero serial correlation each time. A chi-squared test was also performed on each of the six GARCH(1,1)-standardized log returns to check whether the generalized error distribution assumed for the residuals in the GARCH(1,1) model is warranted. Again, we failed to reject the null hypothesis each time.

Figure 10 shows the estimated structure for the GARCH(1,1)-standardized log returns of ANF, AMZN, ChM and PCh, the estimated structure for ANF, AMZN, GBLB and KBC, and the estimated structure for ChM, PCh, GBLB and KBC.

[Figure 10: three four-variate tree diagrams, on (ANF, AMZN, ChM, PCh), on (ANF, AMZN, GBLB, KBC) and on (ChM, PCh, GBLB, KBC), each with three branching nodes.]

Figure 10: Given two series of GARCH(1,1)-standardized log returns from one geographical area and two from another area, a natural clustering by area arises.

The above structures are all strongly supported by the data, as the 12 related p-values are less than 10e-04.

In order to build a six-variate structure, we need to estimate eight extra trivariate structures. The left-hand side of Figure 11 shows a reasonable guess for the six-variate structure in which the eight extra trivariate structures all are 3-fans.

[Figure 11: two six-variate tree diagrams on ANF, AMZN, ChM, PCh, GBLB and KBC, with four branching nodes on the left and five on the right.]

Figure 11: Possible six-variate structures for the data.

However, the 3-fan in four of the eight extra trivariate structures is strongly rejected by the data, which rather suggest the structure on the right-hand side of Figure 11. Unfortunately, this last structure implies we must reject the 3-fan for all eight extra trivariate structures and not only for half of them, making the estimation of a six-variate structure quite uncertain. Since both PetroChina and China Mobile are traded not only in Hong Kong but also in New York, we could expect their log returns in Hong Kong to be more related to the log returns of some companies in New York (for instance ANF and AMZN) than to the log returns of two companies in Belgium. The structure on the right-hand side of Figure 11 therefore seems more appropriate.
10. Discussion
In this paper, we have paved the way for a nonparametric rank-based approach to estimating a NAC structure, without the need to make any assumptions about the generators of the nested Archimedean copula prior to estimation of its structure, apart from a natural identifiability condition. A number of challenges remain, however:

• Difficulties can appear when the method is applied to real data for which the true copula is not necessarily a NAC. For instance, one can end up with a subset of estimated triples each strongly supported by the data (that is, very small p-values, meaning type I or type III errors are unlikely) and yet these triples contradict each other in the sense that no global structure can be retrieved unless α is set to 0.00 and the global estimated structure is a fan, i.e., an Archimedean copula.
• The whole method is computationally intensive, unlike the method from Okhrin et al. (2013a). This is best understood by calculating the number of trivariate structures for which a test is necessary: to get an estimate for a five-variate structure for instance (d = 5), we need to estimate 10 trivariate structures. With d = 10, we have to estimate 120 trivariate structures. Regarding the estimation of a single trivariate structure, the required time depends mainly on the sample size and on the number of bootstrap replications. With 200 bootstrap replications (the value we used throughout the simulation section of this paper), a few seconds are needed at worst to get a trivariate estimated structure. Optimized R code is available from the authors.
• Once a genuine NAC structure has been estimated with our nonparametric approach, the problem of estimating the generators remains. These generators cannot be estimated marginally, as doing so does not guarantee that the resulting function will be a proper copula.
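The counts quoted in the second item are binomial coefficients: one trivariate structure has to be estimated for each of the d-choose-3 triples of variables. A minimal check in Python, where the function name is ours and purely illustrative:

```python
from math import comb


def n_trivariate_structures(d):
    """Number of triples of variables, hence of trivariate structures to estimate."""
    if d < 3:
        raise ValueError("at least three variables are needed")
    return comb(d, 3)


counts = {d: n_trivariate_structures(d) for d in (5, 6, 10)}
```

The count grows cubically in d, which is why the number of tests, rather than the cost of a single test, dominates the computing time in higher dimensions.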
Acknowledgements
This research is supported by contract "Projet d'Actions de Recherche Concertées" No. 12/17-045 of the "Communauté française de Belgique" and by IAP research network grant nr. P7/06 of the Belgian government (Belgian Science Policy).

We are also grateful to Alexander McNeil (Heriot-Watt University) for careful reading of parts of our manuscript and for constructive, detailed feedback.
References
Barbe, P., Genest, C., Ghoudi, K., Rémillard, B., 1996. On Kendall's process. Journal of Multivariate Analysis 58, 197–229.
Genest, C., Nešlehová, J., Ziegel, J., 2011. Inference in multivariate Archimedean copula models. Test 20, 223–256.
Genest, C., Rivest, L., 2001. On the multivariate probability integral transformation. Statistics & Probability Letters 53, 391–399.
Genest, C., Rivest, L.P., 1993. Statistical inference procedures for bivariate Archimedean copulas. Journal of the American Statistical Association 88, 1034–1043.
Hering, C., Hofert, M., Mai, J., Scherer, M., 2010. Constructing hierarchical Archimedean copulas with Lévy subordinators. Journal of Multivariate Analysis 101, 1428–1433.
Hofert, J., 2010. Sampling Nested Archimedean Copulas: With Applications to CDO Pricing. Ph.D. Thesis, Ulm University.
Hofert, M., 2008. Sampling Archimedean copulas. Computational Statistics & Data Analysis 52, 5163–5174.
Hofert, M., 2011. Efficiently sampling nested Archimedean copulas. Computational Statistics & Data Analysis 55, 57–70.
Hofert, M., 2012. A stochastic representation and sampling algorithm for nested Archimedean copulas. Journal of Statistical Computation and Simulation 82, 1239–1255.
Hofert, M., Maechler, M., 2011. Nested Archimedean Copulas Meet R: The nacopula Package. Journal of Statistical Software 39, 1–20. Please note the package nacopula has been merged with the package copula.
Hofert, M., Pham, D., 2012. Densities of nested Archimedean copulas. arXiv preprint arXiv:1204.2410.
Joe, H., 1997. Multivariate Models and Dependence Concepts. Chapman and Hall, London.
McNeil, A.J., 2008. Sampling nested Archimedean copulas. Journal of Statistical Computation and Simulation 78, 567–581.
McNeil, A.J., Nešlehová, J., 2009. Multivariate Archimedean copulas, d-monotone functions and ℓ1-norm symmetric distributions. The Annals of Statistics 37, 3059–3097.
Nelsen, R., Quesada-Molina, J., Rodríguez-Lallena, J., Úbeda-Flores, M., 2003. Kendall distribution functions. Statistics & Probability Letters 65, 263–268.
Ng, M.P., Wormald, N.C., 1996. Reconstruction of rooted trees from subtrees. Discrete Applied Mathematics 69, 19–31.
Okhrin, O., Okhrin, Y., Schmid, W., 2013a. On the structure and estimation of hierarchical Archimedean copulas. Journal of Econometrics 173, 189–204.
Okhrin, O., Okhrin, Y., Schmid, W., 2013b. Properties of hierarchical Archimedean copulas. Statistics & Risk Modeling 30, 21–54.
Okhrin, O., Ristig, A., 2012. Hierarchical Archimedean Copulae: The HAC Package. SFB 649 Discussion Papers SFB649DP2012-036. Sonderforschungsbereich 649, Humboldt University, Berlin, Germany.
Puzanova, N., 2011. A hierarchical Archimedean copula for portfolio credit risk modelling. Deutsche Bundesbank Discussion Paper, Series 2.

Appendix

Proof of Lemma 1
The proof is built in two steps. First we need to prove that, for $\emptyset \neq A \subset C \subset D$, we have
\[ \mathrm{lca}(A, \lambda \sqcap C) = \mathrm{lca}(A, \lambda) \cap C. \]
By definition, we have
\[ \mathrm{lca}(A, \lambda) \cap C = \Bigl( \bigcap_{B \in \lambda :\, A \subseteq B} B \Bigr) \cap C = \bigcap_{B \in \lambda :\, A \subseteq B} (B \cap C). \]
Since A is a subset of C and since A must be a subset of B, notice that requiring $A \subseteq B$ is equivalent to requiring $A \subseteq B \cap C$. Thus we can write
\[ \mathrm{lca}(A, \lambda) \cap C = \bigcap_{B \in \lambda :\, A \subseteq B \cap C} (B \cap C). \]
On the other hand,
\[ \mathrm{lca}(A, \lambda \sqcap C) = \bigcap_{B' \in \lambda \sqcap C :\, A \subseteq B'} B'. \]
Since $\lambda \sqcap C = \{ B \cap C : B \in \lambda \} \setminus \{ \emptyset \}$ by definition, we can rewrite the above expression as
\[ \mathrm{lca}(A, \lambda \sqcap C) = \bigcap_{B \in \lambda :\, A \subseteq B \cap C,\; B \cap C \neq \emptyset} (B \cap C). \]
And because $A \subseteq B \cap C$ and $A \neq \emptyset$, the requirement $B \cap C \neq \emptyset$ can be dropped, thus
\[ \mathrm{lca}(A, \lambda \sqcap C) = \bigcap_{B \in \lambda :\, A \subseteq B \cap C} (B \cap C) = \mathrm{lca}(A, \lambda) \cap C. \]

The second step of the proof of Lemma 1 begins by making use of the result from the first step. Indeed, we can now write
\[ \mathrm{lca}(T_j, \lambda \sqcap C) = \mathrm{lca}(T_j, \lambda) \cap C \quad \text{with } j = 1, 2. \]
Suppose to begin that $\mathrm{lca}(T_1, \lambda) = \mathrm{lca}(T_2, \lambda)$. We therefore have
\[ \mathrm{lca}(T_1, \lambda \sqcap C) = \mathrm{lca}(T_1, \lambda) \cap C = \mathrm{lca}(T_2, \lambda) \cap C = \mathrm{lca}(T_2, \lambda \sqcap C). \]

On the other hand, suppose that $\mathrm{lca}(T_1, \lambda \sqcap C) = \mathrm{lca}(T_2, \lambda \sqcap C)$. Obviously,
\[ \mathrm{lca}(T_2, \lambda) \supset \mathrm{lca}(T_2, \lambda) \cap C, \]
and since $T_1$ is both a subset of $\mathrm{lca}(T_1, \lambda)$ and of C, we also have
\[ \mathrm{lca}(T_1, \lambda) \cap C \supset T_1. \]
Because $\mathrm{lca}(T_1, \lambda \sqcap C) = \mathrm{lca}(T_2, \lambda \sqcap C)$ implies that $\mathrm{lca}(T_1, \lambda) \cap C = \mathrm{lca}(T_2, \lambda) \cap C$, we have
\[ \mathrm{lca}(T_2, \lambda) \supset T_1, \]
which means that $\mathrm{lca}(T_2, \lambda)$ is an ancestor of $T_1$, but not necessarily the lowest. Therefore $\mathrm{lca}(T_2, \lambda) \supset \mathrm{lca}(T_1, \lambda)$. The converse inclusion holds as well, by symmetry of the argument. We conclude that the two sets $\mathrm{lca}(T_1, \lambda)$ and $\mathrm{lca}(T_2, \lambda)$ are in fact equal.

Proof of Lemma 2