Cutting down p-trees and inhomogeneous continuum random trees

Nicolas Broutin ∗   Minmin Wang †

Abstract
We study a fragmentation of the p-trees of Camarri and Pitman [Elect. J. Probab., vol. 5, pp. 1–18, 2000]. We give exact correspondences between the p-trees and trees which encode the fragmentation. We then use these results to study the fragmentation of the ICRTs (scaling limits of p-trees) and give distributional correspondences between the ICRT and the tree encoding the fragmentation. The theorems for the ICRT extend the ones by Bertoin and Miermont [Ann. Appl. Probab., vol. 23(4), pp. 1469–1493, 2013] about the cut tree of the Brownian continuum random tree.
The study of random cutting of trees was initiated by Meir and Moon [40] in the following form: given a (graph-theoretic) tree, one can proceed to chop the tree into pieces by iterating the following process: choose a uniformly random edge; removing it disconnects the tree into two pieces; discard the part which does not contain the root and keep chopping the portion containing the root until it is reduced to a single node. In the present document, we consider the related version where the vertices are chosen at random and removed (until one is left with an empty tree); each such pick is referred to as a cut. We will see that this version is actually much better adapted than the edge-cutting procedure to the problems we consider here.

The main focus in [40] and in most of the subsequent papers has been put on the study of some parameters of this cutting down process, and in particular on how many cuts are necessary for the process to finish. This has been studied for a number of different models of deterministic and random trees such as complete binary trees of a given height, random trees arising from the divide-and-conquer paradigm [24, 32–34] and the family trees of finite-variance critical Galton–Watson processes conditioned on the total progeny [29, 36, 42]. The latter model of random trees turns out to be far more interesting, and it provides an a posteriori motivation for the cutting down process. As we will see shortly, the cutting down process provides an interesting way to investigate some of the structural properties of random trees by partial destruction and re-combination, or equivalently as partially resampling the tree.

Let us now be more specific: if L_n denotes the number of cuts required to completely cut down a uniformly labelled rooted tree (random Cayley tree, or equivalently a conditioned Galton–Watson tree with Poisson offspring distribution) on n nodes, then n^{−1/2} L_n converges in distribution to a Rayleigh distribution, which has density x e^{−x²/2} on R_+.
Janson [36] proved that a similar result holds for any Galton–Watson tree with a finite-variance offspring distribution conditioned on the total progeny to be n. This is the parameter point of view. Addario-Berry, Broutin, and Holmgren [3] have shown that for random Cayley trees, L_n actually has the same distribution as the number of nodes on the path between two uniformly random nodes. Their method relies on an “objective” argument based on a coupling that associates with the cutting procedure a partial resampling of the Cayley tree of the kind mentioned earlier: if one considers the (ordered) sequence of subtrees which are discarded as the cutting process goes on, and adds a path linking their roots, then the resulting tree is a uniformly random Cayley tree, and the two extremities of the path are independent uniform random nodes. So the properties of the parameter L_n follow from a stronger correspondence between the combinatorial objects themselves.

This strong connection between the discrete objects can be carried to the level of their scaling limit, namely Aldous’ Brownian continuum random tree (CRT) [5]. Without being too precise for now, the natural cutting procedure on the Brownian CRT involves a Poisson rain of cuts sampled according to the length measure. However, not all the cuts contribute to the isolation of the root. As in the partial resampling of the discrete setting, we glue the sequence of discarded subtrees along an interval, thereby obtaining a new CRT. If the length of the interval is well chosen (as a function of the cutting process), the tree obtained is distributed like the Brownian CRT and the two ends of the interval are independent random leaves.

∗ Inria Paris–Rocquencourt, Domaine de Voluceau, 78153 Le Chesnay, France. Email: [email protected]
† Université Pierre et Marie Curie, 4 place Jussieu, 75005 Paris, France. Email: [email protected]
This identifies the distribution of the discarded subtrees from the cutting procedure as the distribution of the forest one obtains from a spinal decomposition of the Brownian CRT. The distribution of the latter is intimately related to Bismut’s [18] decomposition of a Brownian excursion. See also [25] for the generalization to the Lévy case. Note that a similar identity has been proved by Abraham and Delmas [2] for general Lévy trees without using a discrete approximation. A related example is that of the subtree prune and re-graft dynamics of Evans et al. [28] [see also 26], which is even closer to the cutting procedure and truly resamples the object rather than giving a “recursive” decomposition.

The aim of this paper is two-fold. First we prove exact identities and give reversible transformations of p-trees similar to the ones for Cayley trees in [3]. The model of p-trees introduced by Camarri and Pitman [21] generalizes Cayley trees by allowing “weights” on the vertices. In particular, this additional structure of weights introduces some inhomogeneity. We then lift these results to the scaling limits, the inhomogeneous continuum random trees (ICRT) of Aldous and Pitman [8], which are closely related to the general additive coalescent [8, 13, 14]. Unlike the Brownian CRT or the stable trees (special cases of Lévy trees), a general ICRT is not self-similar. Nor does it enjoy a “branching property” as the Lévy trees do [37]. This lack of “recursivity” ruins natural approaches such as the one used in [1, 2] or the ones which would argue by comparing two fragmentations with the same dislocation measure but different indices of self-similarity [15]. This is one of the reasons why we believe these path transformations at the level of the ICRT are interesting. Furthermore, a conjecture of Aldous, Miermont, and Pitman [10, p.
185] suggests that the path transformations for ICRTs actually explain the result of Abraham and Delmas [2] for Lévy trees by providing a result “conditional on the degree distribution”.

Second, rather than only focusing on the isolation of the root, we also consider the genealogy of the entire fragmentation, as in the recent work of Bertoin and Miermont [16] and Dieuleveut [23] (who examine the case of Galton–Watson trees). In some sense, this consists in obtaining transformations corresponding to tracking the effect of the cutting down procedure on the isolation of all the points simultaneously. Tracking finitely many points is a simple generalization of the one-point results, but the “complete” result requires additional insight. The results of the present document are used in a companion paper [20] to prove that the “complete” cutting procedure in which one tries to isolate every point yields a construction of the genealogy of the fragmentation on ICRTs which is reversible in the case of the Brownian CRT. More precisely, the genealogy of Aldous–Pitman’s fragmentation of a Brownian CRT is another Brownian CRT, say G, and there exists a random transformation of G into a real tree T such that in the pair (T, G) the tree G is indeed distributed as the genealogy of the fragmentation on T, conditional on T. The proof there relies crucially on the “bijective” approach that we develop here.

Plan of the paper.
In the next section, we introduce the necessary notation and relevant background. We then present more formally the discrete and continuous models we are considering, and in which sense the inhomogeneous continuum random trees are the scaling limits of p-trees. In Section 3 we introduce the cutting down procedures and state our main results. The study of the cutting down procedure for p-trees is the topic of Section 4. The results are lifted to the level of the scaling limits in Section 5.

Although we would like to introduce our results earlier, a fair bit of notation and background is in order before we can do so properly. This section may safely be skipped by the impatient reader and referred to later on.

p-trees

Let A be a finite set and p = (p_u, u ∈ A) be a probability measure on A such that min_{u∈A} p_u > 0; this ensures that A is indeed the support of p. Let T_A denote the set of rooted trees labelled with (all the) elements of A (connected acyclic graphs on A, with a distinguished vertex). For t ∈ T_A, we let r = r(t) denote its root vertex. For u, v ∈ A, we write {u, v} to mean that u and v are adjacent in t. We sometimes write ⟨u, v⟩ to mean that {u, v} is an edge of t, and that u is on the path between r and v (we think of the edges as pointing towards the root). For a tree t ∈ T_A (rooted at r, say) and a node v ∈ A, we let t_v denote the tree re-rooted at v.

We usually abuse notation, but we believe it does not affect the clarity or precision of our statements. For instance, we refer to a node u in the vertex set v(t) of a tree t using u ∈ t. Depending on the context, we sometimes write t \ {u} to denote the forest induced by t on the vertex set v(t) \ {u}. The (in-)degree C_u(t) of a vertex u ∈ A is the number of edges of the form ⟨u, v⟩ with v ∈ A. For a rooted tree t, and a node u of t, we write Sub(t, u) for the subtree of t rooted at u (above u).
For t ∈ T_A and V ⊆ A, we write Span(t; V) for the subtree of t spanning V and the root r(t). So Span(t; V) is the subtree induced by t on the set ∪_{u∈V} ⟦r(t), u⟧, where ⟦u, v⟧ denotes the collection of nodes on the (unique) path between u and v in t. When V = {v_1, v_2, . . . , v_k}, we usually write Span(t; v_1, . . . , v_k) instead of Span(t; {v_1, . . . , v_k}). We also write Span*(t; V) := Span(t; V) \ {r(t)}.

As noticed by Aldous [12] and Broder [19], one can generate random trees on A by extracting a tree from the trace of a random walk on A, where the sequence of steps is given by a sequence of i.i.d. vertices distributed according to p.

Algorithm 2.1 (Weighted version of the Aldous–Broder algorithm). Let Y = (Y_j, j ≥ 0) be a sequence of independent variables with common distribution p; further on, we say that the Y_j are i.i.d. p-nodes. Let T(Y) be the graph rooted at Y_0 with the set of edges

{⟨Y_{j−1}, Y_j⟩ : Y_j ∉ {Y_0, · · · , Y_{j−1}}, j ≥ 1}.   (2.1)

The sequence Y defines a random walk on A, which eventually visits every element of A with probability one, since A is the support of p. So the trace {⟨Y_{j−1}, Y_j⟩ : j ≥ 1} of the random walk on A is a connected graph on A, rooted at Y_0. Algorithm 2.1 extracts the tree T(Y) from the trace of the random walk. To see that T(Y) is a tree, observe that the edge ⟨Y_{j−1}, Y_j⟩ is added only if Y_j has never appeared before in the sequence. It follows easily that T(Y) is a connected graph without cycles, hence a tree on A. Let π denote the distribution of T(Y).

Lemma 2.2 ([12, 19, 27]). For t ∈ T_A, we have

π(t) := π^(p)(t) = ∏_{u∈A} p_u^{C_u(t)}.
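Algorithm 2.1 is straightforward to simulate. The following Python sketch (the function and variable names are ours, not from the paper) runs the i.i.d. p-walk and keeps an edge exactly when its endpoint is visited for the first time, as in (2.1):

```python
import random

def p_tree_aldous_broder(nodes, probs, seed=0):
    """Sample a tree from the weighted Aldous-Broder walk (Algorithm 2.1):
    run an i.i.d. sequence Y_0, Y_1, ... with distribution p and keep the
    edge <Y_{j-1}, Y_j> whenever Y_j has not been seen before."""
    rng = random.Random(seed)
    y0 = rng.choices(nodes, weights=probs)[0]
    parent = {}          # parent[v] = u encodes the edge <u, v>
    seen = {y0}
    prev = y0
    while len(seen) < len(nodes):
        y = rng.choices(nodes, weights=probs)[0]
        if y not in seen:
            parent[y] = prev
            seen.add(y)
        prev = y
    return y0, parent    # root Y_0 and the tree as a parent map

root, parent = p_tree_aldous_broder([1, 2, 3, 4], [0.4, 0.3, 0.2, 0.1])
```

Since every node has positive weight, the walk visits all of A almost surely, so the loop terminates; the returned parent map has exactly |A| − 1 edges, all pointing towards the root.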
(2.2)

Note that π is indeed a probability distribution on T_A, since by Cayley’s multinomial formula ([22, 43]), we have

∑_{t∈T_A} π(t) = ∑_{t∈T_A} ∏_{u∈A} p_u^{C_u(t)} = (∑_{u∈A} p_u)^{|A|−1} = 1.   (2.3)

A random tree on A distributed according to π as specified by (2.2) is called a p-tree. It is also called the birthday tree in the literature, for its connection with the general birthday problem (see [21]). Observe that when p is the uniform distribution on [n] := {1, 2, . . . , n}, a p-tree is a uniformly random rooted tree on [n] (a Cayley tree). So the results we are about to present generalize the exact distributional results in [3]. However, we believe that the point of view we adopt here is a little cleaner, since it permits to make the transformation exactly reversible without any extra anchoring nodes (which prevent any kind of duality at the discrete level).

From now on, we consider n ≥ 1 and let [n] denote the set {1, 2, · · · , n}. We write T_n as a shorthand for T_[n], the set of the rooted trees on [n]. Let also p = (p_i, 1 ≤ i ≤ n) be a probability measure on [n] satisfying min_{i∈[n]} p_i > 0. For a subset A ⊆ [n] such that p(A) > 0, we let p|_A(·) = p(· ∩ A)/p(A) denote the restriction of p to A, and write π|_A := π^(p|_A). The following lemma says that the distribution of p-trees is invariant under re-rooting at an independent p-node and “recursive” in a certain sense. These two properties are one of the keys to our results on the discrete objects. (For a probability distribution µ, we write X ∼ µ to mean that µ is the distribution of the random variable X.)

Lemma 2.3.
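For small n, Cayley’s multinomial formula (2.3) can be checked by brute force. The sketch below (the helper names are ours) enumerates all rooted labelled trees on {1, 2, 3} as parent maps and sums π(t) = ∏_u p_u^{C_u(t)} exactly, using rational arithmetic:

```python
import itertools
from fractions import Fraction

def rooted_trees(n):
    """Enumerate rooted labelled trees on {1,...,n} as (root, parent map)."""
    nodes = list(range(1, n + 1))
    for root in nodes:
        others = [v for v in nodes if v != root]
        for choice in itertools.product(nodes, repeat=len(others)):
            parent = dict(zip(others, choice))
            # keep only acyclic parent maps whose pointers reach the root
            if all(reaches(v, parent, root, n) for v in others):
                yield root, parent

def reaches(v, parent, root, n):
    """Does following parent pointers from v hit the root (no cycle)?"""
    for _ in range(n):
        if v == root:
            return True
        v = parent[v]
    return v == root

def total_pi_mass(p):
    """Sum of pi(t) = prod_u p_u^{C_u(t)} over all rooted trees, cf. (2.2):
    each edge <u, v> contributes one factor p_u for the parent u."""
    total = Fraction(0)
    for root, parent in rooted_trees(len(p)):
        weight = Fraction(1)
        for v, u in parent.items():
            weight *= p[u - 1]
        total += weight
    return total

p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
mass = total_pi_mass(p)   # Cayley's formula (2.3) gives (sum p_u)^{n-1} = 1
```

There are n^{n−1} = 9 rooted trees on three labelled vertices, and the total mass is exactly 1, as (2.3) predicts.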
Let T be a p-tree on [n].

i) If V is an independent p-node, then T_V ∼ π.

ii) Let N be the set of neighbors of the root in T. Then, for u ∈ N, conditional on v(Sub(T, u)) = V, Sub(T, u) ∼ π|_V, independently of {Sub(T, w) : w ∈ N, w ≠ u}.

The first claim can be verified from (2.2); the second is clear from the product form of π.

If (X, d) is a metric space endowed with the Borel σ-algebra, we denote by M_f(X) the set of finite measures on X and by M_1(X) the subset of probability measures on X. If m ∈ M_f(X), we denote by supp(m) the support of m on X, that is the smallest closed set A such that m(A^c) = 0. If f : X → Y is a measurable map between two metric spaces, and if m ∈ M_f(X), then the push-forward of m is an element of M_f(Y), denoted by f_∗ m ∈ M_f(Y), and is defined by (f_∗ m)(A) = m(f^{−1}(A)) for each Borel set A of Y. If m ∈ M_f(X) and A ⊆ X, we denote by m↾_A the restriction of m to A: m↾_A(B) = m(A ∩ B) for any Borel set B. This should not be confused with the restriction of a probability measure, which remains a probability measure and is denoted by m|_A.

We say a triple (X, d, µ) is a measured metric space (or sometimes a metric measure space) if (X, d) is a Polish space (separable and complete) and µ ∈ M_1(X). Two measured metric spaces (X, d, µ) and (X′, d′, µ′) are said to be weakly isometric if there exists an isometry φ between the supports of µ on X and of µ′ on X′ such that φ_∗ µ = µ′. This defines an equivalence relation between the measured metric spaces, and we denote by M the set of equivalence classes. Note that if (X, d, µ) and (X′, d′, µ′) are weakly isometric, the metric spaces (X, d) and (X′, d′) may not be isometric.

We can define a metric on M by adapting Prokhorov’s distance. Consider a metric space (X, d) and for ε > 0, let A^ε := {x ∈ X : d(x, A) < ε}.
Then, given two (Borel) probability measures µ, ν ∈ M_1(X), the Prokhorov distance d_P between µ and ν is defined by

d_P(µ, ν) := inf{ε > 0 : µ(A) ≤ ν(A^ε) + ε and ν(A) ≤ µ(A^ε) + ε, for all Borel sets A}.   (2.4)

Note that the definition of the Prokhorov distance (2.4) can be easily extended to a pair of finite (Borel) measures on X. Then, for two measured metric spaces (X, d, µ) and (X′, d′, µ′) the Gromov–Prokhorov (GP) distance between them is defined to be

d_GP((X, d, µ), (X′, d′, µ′)) = inf_{Z,φ,ψ} d_P(φ_∗ µ, ψ_∗ µ′),

where the infimum is taken over all metric spaces Z and isometric embeddings φ : supp(µ) → Z and ψ : supp(µ′) → Z. It is clear that d_GP depends only on the equivalence classes containing (X, d, µ) and (X′, d′, µ′). Moreover, the Gromov–Prokhorov distance turns M into a Polish space.

There is another more convenient characterization of the GP topology (the topology induced by d_GP) that relies on the convergence of distance matrices between random points. Let X = (X, d, µ) be a measured metric space and let (ξ_i, i ≥ 1) be a sequence of i.i.d. points of common distribution µ. In the following, we will often refer to such a sequence (ξ_i, i ≥ 1) as an i.i.d. µ-sequence. We write ρ_X = (d(ξ_i, ξ_j), 1 ≤ i, j < ∞) for the distance matrix associated with this sequence. One easily verifies that the distribution of ρ_X does not depend on the particular element of an equivalence class of M. Moreover, by Gromov’s reconstruction theorem [30], the distribution of ρ_X characterizes X as an element of M.

Proposition 2.4 (Corollary 8 of [38]). If X is some random element taking values in M and, for each n ≥ 1, X_n is a random element taking values in M, then X_n converges to X in distribution as n → ∞ if and only if ρ_{X_n} converges to ρ_X in the sense of finite-dimensional distributions.
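On a finite metric space, definition (2.4) can be evaluated directly: both constraints relax as ε grows, so one can bisect on ε while checking every subset A. A small Python sketch (names are ours; this is exponential in the number of points and only meant to illustrate the definition):

```python
def prokhorov_distance(d, mu, nu, tol=1e-9):
    """Brute-force Prokhorov distance (2.4) between probability vectors
    mu, nu on a finite metric space with distance matrix d, by bisection
    on eps; A^eps is the open eps-fattening of A."""
    n = len(d)
    subsets = [[i for i in range(n) if mask >> i & 1] for mask in range(1 << n)]

    def ok(eps):
        for A in subsets:
            a_eps = [j for j in range(n) if any(d[i][j] < eps for i in A)]
            mu_a, nu_a = sum(mu[i] for i in A), sum(nu[i] for i in A)
            mu_e, nu_e = sum(mu[j] for j in a_eps), sum(nu[j] for j in a_eps)
            if mu_a > nu_e + eps or nu_a > mu_e + eps:
                return False
        return True

    lo, hi = 0.0, max(max(row) for row in d) + 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (lo, mid) if ok(mid) else (mid, hi)
    return hi

# two unit point masses at distance 1: the constraints force eps >= 1
dp = prokhorov_distance([[0.0, 1.0], [1.0, 0.0]], [1.0, 0.0], [0.0, 1.0])
```

For two Dirac masses at distance 1 the bisection converges to 1, in line with the general identity d_P(δ_x, δ_y) = min(1, d(x, y)).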
Pointed Gromov–Prokhorov topology.
The above characterization by matrices of distances turns out to be quite handy when we want to keep track of marked points. Let k ∈ N. If (X, d, µ) is a measured metric space and x = (x_1, x_2, · · · , x_k) ∈ X^k is a k-tuple, then we say (X, d, µ, x) is a k-pointed measured metric space, or simply a pointed measured metric space. Two pointed metric measure spaces (X, d, µ, x) and (X′, d′, µ′, x′) are said to be weakly isometric if there exists an isometric bijection

φ : supp(µ) ∪ {x_1, x_2, · · · , x_k} → supp(µ′) ∪ {x′_1, x′_2, · · · , x′_k}

such that φ_∗ µ = µ′ and φ(x_i) = x′_i, 1 ≤ i ≤ k, where x = (x_1, x_2, · · · , x_k) and x′ = (x′_1, x′_2, · · · , x′_k). We denote by M*_k the space of weak isometry-equivalence classes of k-pointed measured metric spaces. Again, we emphasize the fact that the underlying metric spaces (X, d) and (X′, d′) do not have to be isometric. The space M*_k equipped with the following pointed Gromov–Prokhorov topology is a Polish space.

A sequence (X_n, d_n, µ_n, x_n)_{n≥1} of k-pointed measured metric spaces is said to converge to some pointed measured metric space (X, d, µ, x) in the k-pointed Gromov–Prokhorov topology if for any m ≥ 1,

(d_n(ξ*_{n,i}, ξ*_{n,j}), 1 ≤ i, j ≤ m) →_d (d(ξ*_i, ξ*_j), 1 ≤ i, j ≤ m) as n → ∞,

where for each n ≥ 1 and 1 ≤ i ≤ k, ξ*_{n,i} = x_{n,i} if x_n = (x_{n,1}, x_{n,2}, · · · , x_{n,k}), and (ξ*_{n,i}, i ≥ k + 1) is a sequence of i.i.d. µ_n-points in X_n. Similarly, ξ*_i = x_i for 1 ≤ i ≤ k and (ξ*_i, i ≥ k + 1) is a sequence of i.i.d. µ-points in X. This induces the k-pointed Gromov–Prokhorov topology on M*_k.

Gromov–Hausdorff metric.
Two compact subsets A and B of a given metric space (X, d) are compared using the Hausdorff distance d_H:

d_H(A, B) := inf{ε > 0 : A ⊆ B^ε and B ⊆ A^ε}.
To compare two compact metric spaces (X, d) and (X′, d′), we first embed them into a single metric space (Z, δ) via isometries φ : X → Z and ψ : X′ → Z, and then compare the images φ(X) and ψ(X′) using the Hausdorff distance on Z. One then defines the Gromov–Hausdorff (GH) distance d_GH by

d_GH((X, d), (X′, d′)) := inf_{Z,φ,ψ} d_H(φ(X), ψ(X′)),

where the infimum ranges over all choices of metric spaces Z and isometric embeddings φ : X → Z and ψ : X′ → Z. Note that, as opposed to the case of the GP topology, two compact metric spaces that are at GH distance zero are isometric.

Gromov–Hausdorff–Prokhorov metric.
Now if (X, d) and (X′, d′) are two compact metric spaces and if µ ∈ M_f(X) and µ′ ∈ M_f(X′), one way to compare simultaneously the metric spaces and the measures is to define

d_GHP((X, d, µ), (X′, d′, µ′)) := inf_{Z,φ,ψ} { d_H(φ(X), ψ(X′)) ∨ d_P(φ_∗ µ, ψ_∗ µ′) },

where the infimum ranges over all choices of metric spaces Z and isometric embeddings φ : X → Z and ψ : X′ → Z. If we denote by M_c the set of equivalence classes of compact measured metric spaces under measure-preserving isometries, then M_c is Polish when endowed with d_GHP.

Pointed Gromov–Hausdorff metric.
We fix some k ∈ N. Given two compact metric spaces (X, d_X) and (Y, d_Y), let x = (x_1, x_2, · · · , x_k) ∈ X^k and y = (y_1, y_2, · · · , y_k) ∈ Y^k. Then the pointed Gromov–Hausdorff metric between (X, d_X, x) and (Y, d_Y, y) is defined to be

d_pGH((X, d_X, x), (Y, d_Y, y)) := inf_{Z,φ,ψ} { d_H(φ(X), ψ(Y)) ∨ max_{1≤i≤k} d_Z(φ(x_i), ψ(y_i)) },

where the infimum ranges over all choices of metric spaces Z and isometric embeddings φ : X → Z and ψ : Y → Z. Let M^k_c denote the isometry-equivalence classes of those compact metric spaces with k marked points. It is a Polish space when endowed with d_pGH.

A real tree is a geodesic metric space without loops. More precisely, a metric space (X, d, r) is called a (rooted) real tree if r ∈ X and

• for any two points x, y ∈ X, there exists a continuous injective map φ_xy : [0, d(x, y)] → X such that φ_xy(0) = x and φ_xy(d(x, y)) = y; the image of φ_xy is denoted by ⟦x, y⟧;

• if q : [0, 1] → X is a continuous injective map such that q(0) = x and q(1) = y, then q([0, 1]) = ⟦x, y⟧.

As for discrete trees, when it is clear from the context which metric we are talking about, we refer to metric spaces by their sets. For instance (T, d) is often referred to as T.

A measured (rooted) real tree is a real tree (X, d, r) equipped with a finite (Borel) measure µ ∈ M_f(X). We always assume that the metric space (X, d) is complete and separable. We denote by T_w the set of the weak isometry equivalence classes of measured rooted real trees, equipped with the pointed Gromov–Prokhorov topology. Also, let T_cw be the set of the measure-preserving isometry equivalence classes of those measured rooted real trees (X, d, r, µ) such that (X, d) is compact. We endow T_cw with the pointed Gromov–Hausdorff–Prokhorov distance. Then both T_w and T_cw are Polish spaces.
However, in our proofs we do not always distinguish an equivalence class from the elements in it.

Let (T, d, r) be a rooted real tree. For u ∈ T, the degree of u in T, denoted by deg(u, T), is the number of connected components of T \ {u}. We also denote by

Lf(T) = {u ∈ T : deg(u, T) = 1} and Br(T) = {u ∈ T : deg(u, T) ≥ 3}

the set of the leaves and the set of branch points of T, respectively. The skeleton of T is the complementary set of Lf(T) in T, denoted by Sk(T). For two points u, v ∈ T, we denote by u ∧ v the closest common ancestor of u and v, that is, the unique point w of ⟦r, u⟧ ∩ ⟦r, v⟧ such that d(u, v) = d(u, w) + d(w, v). For a rooted real tree (T, r), if x ∈ T then the subtree of T above x, denoted by Sub(T, x), is defined to be

Sub(T, x) := {u ∈ T : x ∈ ⟦r, u⟧}.

Spanning subtree.
Let (T, d, r) be a rooted real tree and let x = (x_1, · · · , x_k) be k points of T for some k ≥ 1. We denote by Span(T; x) the smallest connected subset of T which contains the root r and x, that is, Span(T; x) = ∪_{1≤i≤k} ⟦r, x_i⟧. We consider Span(T; x) as a real tree rooted at r and refer to it as a spanning subtree or a reduced tree of T.

If (T, d, r) is a real tree and there exists some x = (x_1, x_2, · · · , x_k) ∈ T^k for some k ≥ 1 such that T = Span(T; x), then the metric aspect of T is rather simple to visualize. More precisely, if we write x_0 = r and let ρ_x = (d(x_i, x_j), 0 ≤ i, j ≤ k), then ρ_x determines (T, d, r) up to isometry.

Gluing.

If (T_i, d_i), i = 1, 2, are two real trees with some distinguished points x_i ∈ T_i, i = 1, 2, the result of the gluing of T_1 and T_2 at (x_1, x_2) is the metric space (T_1 ∪ T_2, δ), where the distance δ is defined by

δ(u, v) = d_i(u, v) if u, v ∈ T_i, i = 1, 2;  δ(u, v) = d_1(u, x_1) + d_2(v, x_2) if u ∈ T_1, v ∈ T_2.

It is easy to verify that (T_1 ∪ T_2, δ) is a real tree with x_1 and x_2 identified as one point, which we denote by T_1 ⊛_{x_1=x_2} T_2 in the following. Moreover, if T_1 is rooted at some point r_1, we make the convention that T_1 ⊛_{x_1=x_2} T_2 is also rooted at r_1.

The inhomogeneous continuum random tree (abbreviated as ICRT in the following) has been introduced in [21] and [8]. See also [7, 10, 11] for studies of the ICRT and related problems.

Let Θ (the parameter space) be the set of sequences θ = (θ_0, θ_1, θ_2, · · · ) ∈ R^∞_+ such that θ_1 ≥ θ_2 ≥ θ_3 ≥ · · · ≥ 0, θ_0 ≥ 0, ∑_{i≥0} θ_i² = 1, and either θ_0 > 0 or ∑_{i≥1} θ_i = ∞.

Poisson point process construction.
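For finite point configurations, the gluing metric δ above can be written down directly from the two distance matrices. A sketch (the matrix representation and names are ours):

```python
def glue(d1, d2, x1, x2):
    """Distance matrix of the gluing T1 glued to T2 at (x1, x2): points of
    T1 keep indices 0..n1-1, points of T2 get indices n1..n1+n2-1, and
    cross distances are d1(u, x1) + d2(x2, v), as in the definition of delta."""
    n1, n2 = len(d1), len(d2)
    d = [[0.0] * (n1 + n2) for _ in range(n1 + n2)]
    for i in range(n1):
        for j in range(n1):
            d[i][j] = d1[i][j]
    for i in range(n2):
        for j in range(n2):
            d[n1 + i][n1 + j] = d2[i][j]
    for i in range(n1):
        for j in range(n2):
            d[i][n1 + j] = d[n1 + j][i] = d1[i][x1] + d2[x2][j]
    return d

# glue a segment of length 2 and a segment of length 3 at their left ends:
# the far endpoints are then at distance 2 + 3 = 5, while the two glued
# points are at mutual distance 0 (they are identified in the glued tree)
d = glue([[0.0, 2.0], [2.0, 0.0]], [[0.0, 3.0], [3.0, 0.0]], 0, 0)
```

The zero-distance pair realizes the identification of x_1 and x_2; taking the metric quotient of such a matrix recovers the glued tree itself.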
For each θ ∈ Θ, we can define a real tree T in the following way.

• If θ_0 > 0, let P_0 = {(u_j, v_j), j ≥ 1} be a Poisson point process on the first octant {(x, y) : 0 ≤ y ≤ x} of intensity measure θ_0² dx dy, ordered in such a way that u_1 < u_2 < u_3 < · · · .

• For every i ≥ 1 such that θ_i > 0, let P_i = {ξ_{i,j}, j ≥ 1} be a homogeneous Poisson process on R_+ of intensity θ_i, such that ξ_{i,1} < ξ_{i,2} < ξ_{i,3} < · · · .

All these Poisson processes are supposed to be mutually independent and defined on some common probability space (Ω, F, P). We consider the points of all these processes as marks on the half line R_+, among which we distinguish two kinds: the cutpoints and the joinpoints. A cutpoint is either u_j for some j ≥ 1 or ξ_{i,j} for some i ≥ 1 and j ≥ 2. For each cutpoint x, we associate a joinpoint x* as follows: x* = v_j if x = u_j for some j ≥ 1, and x* = ξ_{i,1} if x = ξ_{i,j} for some i ≥ 1 and j ≥ 2. One easily verifies that the hypotheses on θ imply that the set of cutpoints is a.s. finite on each compact set of R_+, while the joinpoints are a.s. dense everywhere. (See for example [8] for a proof.) In particular, we can arrange the cutpoints in increasing order as 0 < η_1 < η_2 < η_3 < · · · . This splits R_+ into countably many intervals that we now reassemble into a tree. We write η*_k for the joinpoint associated to the k-th cutpoint η_k. We define R_1 to be the metric space [0, η_1] rooted at 0. For k ≥ 1, we let

R_{k+1} := R_k ⊛_{η*_k = η_k} [η_k, η_{k+1}].

In words, we graft the intervals [η_k, η_{k+1}] by gluing the left end at the joinpoint η*_k. Note that we have η*_k < η_k a.s., thus η*_k ∈ R_k and the above grafting operation is well defined almost surely. It follows from this Poisson construction that (R_k)_{k≥1} is a consistent family of “discrete” trees which also verifies the “leaf-tight” condition in Aldous [5].
Therefore, by [5, Theorem 3], the complete metric space T := closure of ∪_{k≥1} R_k is a real tree, and almost surely there exists a probability measure µ, called the mass measure, which is concentrated on the leaf set of T. Moreover, if, conditional on T, (V_k, k ≥ 1) is a sequence of i.i.d. points sampled according to µ, then for each k ≥ 1, the spanning tree Span(T; V_1, V_2, · · · , V_k) has the same unconditional distribution as R_k. The distribution of the weak isometry equivalence class of (T, µ) is said to be the distribution of an ICRT of parameter θ, which is a probability distribution on T_w. The push-forward of the Lebesgue measure on R_+ defines a σ-finite measure ℓ on T, which is concentrated on Sk(T) and called the length measure of T. Furthermore, it is not difficult to deduce the distribution of ℓ(R_1) from the above construction of T:

P(ℓ(R_1) > r) = P(η_1 > r) = e^{−θ_0² r²/2} ∏_{i≥1} (1 + θ_i r) e^{−θ_i r},  r > 0.   (2.5)

In the important special case when θ = (1, 0, 0, · · · ), the above construction coincides with the line-breaking construction of the Brownian CRT in [4, Algorithm 3], that is, T is the Brownian CRT. This case will be referred to as the Brownian case in the sequel. We notice that whenever there is an index i ≥ 1 such that θ_i > 0, the point, denoted by β_i, which corresponds to the joinpoint ξ_{i,1}, is a branch point of infinite degree. According to [10, Theorem 2], θ_i is a measurable function of (T, β_i), and we refer to it as the local time of β_i in what follows.

ICRTs as scaling limits of p-trees.

Let p_n = (p_{n1}, p_{n2}, · · · , p_{nn}) be a probability measure on [n] such that p_{n1} ≥ p_{n2} ≥ · · · ≥ p_{nn} > 0, n ≥ 1. Define σ_n > 0 by σ_n² = ∑_{i=1}^n p_{ni}², and denote by T_n the corresponding p_n-tree, which we view as a metric space on [n] with the graph distance d_{T_n}.
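In the Brownian case θ = (1, 0, 0, · · · ), only the octant process P_0 contributes: the cutpoints η_1 < η_2 < · · · form an inhomogeneous Poisson process on R_+ of rate t (so η_k = √(2Γ_k) with Γ_k a unit-rate Poisson arrival time), and each joinpoint η*_k is uniform on [0, η_k]. A simulation sketch of the cutpoint/joinpoint pairs (the names are ours):

```python
import math
import random

def brownian_line_breaking(k, seed=1):
    """Sample the first k (cutpoint, joinpoint) pairs in the Brownian case
    theta = (1, 0, 0, ...): cutpoints are an inhomogeneous Poisson process
    of rate t on R_+, and joinpoints are uniform on [0, cutpoint]."""
    rng = random.Random(seed)
    gamma = 0.0
    pairs = []
    for _ in range(k):
        gamma += rng.expovariate(1.0)   # unit-rate Poisson arrival Gamma_j
        eta = math.sqrt(2.0 * gamma)    # time change Lambda(t) = t^2 / 2
        pairs.append((eta, rng.uniform(0.0, eta)))
    return pairs

# R_{k+1} is built by grafting [eta_k, eta_{k+1}] at the joinpoint eta*_k;
# the length of R_1 is eta_1, whose tail e^{-r^2/2} matches (2.5)
pairs = brownian_line_breaking(5)
```

The grafting itself can then be carried out with any gluing routine, for instance on distance matrices as in Section 2.5.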
Suppose that the sequence (p_n, n ≥ 1) verifies the following hypothesis: there exists some parameter θ = (θ_i, i ≥ 0) such that

lim_{n→∞} σ_n = 0, and lim_{n→∞} p_{ni}/σ_n = θ_i, for every i ≥ 1.   (H)

Then, writing σ_n T_n for the rescaled metric space ([n], σ_n d_{T_n}), Camarri and Pitman [21] have shown that

(σ_n T_n, p_n) →_{d, GP} (T, µ) as n → ∞,   (2.6)

where →_{d, GP} denotes convergence in distribution with respect to the Gromov–Prokhorov topology.

p-trees and ICRT

Consider a p-tree T. We perform a cutting procedure on T by picking each time a vertex according to the restriction of p to the remaining part; however, it is more convenient for us to retain the portion of the tree that contains a random node V sampled according to p, rather than the root. We denote by L(T) the number of cuts necessary until V is finally picked, and let X_i, 1 ≤ i ≤ L(T), be the sequence of nodes chosen. The following identity in distribution has already been shown in [3] in the special case of the uniform Cayley tree:

L(T) =_d Card{vertices on the path from the root to V}.   (3.1)

In fact, (3.1) is an immediate consequence of the following result. In the above cutting procedure, we connect the rejected parts, which are the subtrees above the X_i just before the cutting, by drawing an edge between X_i and X_{i+1}, i = 1, 2, · · · , L(T) − 1 (see Figure 1 in Section 4). We obtain another tree on the same vertex set, which contains a path from the first cut X_1 to the random node V that we were trying to isolate. We denote by cut(T, V) this tree, which (partially) encodes the isolating process of V. We prove in Section 4 that

(cut(T, V), V) =_d (T, V).   (3.2)

This identity between the pairs of trees contains a lot of information about the distributional structure of the p-trees, and our aim is to obtain results similar to (3.2) for ICRTs.
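The one-vertex cutting procedure is easy to simulate on a given tree. The sketch below (the tree representation and names are ours) repeatedly draws a node according to p restricted to the component of V, discards everything the cut separates from V, and returns L(T); on a p-tree, by (3.1), this count is distributed like the number of vertices on the root-to-V path.

```python
import random
from collections import deque

def cut_down(adj, p, V, seed=2):
    """Number of cuts L(T) needed to isolate V: draw nodes according to p
    restricted to the component of V, keep only V's component, and stop
    when V itself is drawn (the cutting procedure of Section 3)."""
    rng = random.Random(seed)
    alive = set(adj)
    cuts = 0
    while True:
        nodes = sorted(alive)
        x = rng.choices(nodes, weights=[p[v] for v in nodes])[0]
        cuts += 1
        if x == V:
            return cuts
        alive.discard(x)
        comp, queue = {V}, deque([V])    # component of V in alive \ {x}
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w in alive and w not in comp:
                    comp.add(w)
                    queue.append(w)
        alive = comp

# a path 1-2-3-4 with uniform weights, isolating V = 4
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
L = cut_down(adj, {v: 0.25 for v in adj}, 4)
```

Each cut removes at least one vertex while V always survives, so 1 ≤ L(T) ≤ n deterministically; averaging over many p-trees and seeds gives a Monte Carlo check of (3.1).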
The method we use relies on the discrete approximation of the ICRT by p-trees, and a first step consists in defining the appropriate cutting procedure for the ICRT.

In the case of p-trees, one may pick the nodes of T in the order in which they appear in a Poisson random measure. We do not develop it here, but one should keep in mind that the cutting procedure may be obtained using a Poisson point process on R_+ × T with intensity measure dt ⊗ p. In particular, this measure has a natural counterpart in the case of ICRTs, and it is according to this measure that the points should be sampled in the continuous case.

So consider now an ICRT T. Recall that for θ ≠ (1, 0, . . . ), for each θ_i > 0 with i ≥ 1, there exists a unique point, denoted by β_i, which has infinite degree. Let L be the measure on T defined by

L(dx) := θ_0² ℓ(dx) + ∑_{i≥1} θ_i δ_{β_i}(dx),   (3.3)

which is almost surely σ-finite (Lemma 5.1). Proving that L is indeed the relevant cutting measure (in a sense made precise in Proposition 5.2) is the topic of Section 7. Conditional on T, let P be a Poisson point process on R_+ × T of intensity measure dt ⊗ L(dx) and let V be a µ-point of T. We consider the elements of P as the successive cuts on T which try to isolate the random point V. For each t ≥ 0, define

P_t = {x ∈ T : ∃ s ≤ t such that (s, x) ∈ P},

and let T_t be the part of T still connected to V at time t, that is, the collection of points u ∈ T for which the unique path in T from V to u does not contain any element of P_t. Clearly, T_{t′} ⊂ T_t if t′ ≥ t. We set

C := {t > 0 : µ(T_{t−}) > µ(T_t)}.

Those are the cuts which contribute to the isolation of V. We construct a tree which encodes this cutting process in a similar way that the tree H = cut(T, V) encodes the cutting procedure for discrete trees. First we construct the “backbone”, which is the equivalent of the path we add in the discrete case.
For $t \ge 0$, we define
\[ L_t := \int_0^t \mu(\mathcal{T}_s) \, ds, \]
and $L_\infty$ its limit as $t \to \infty$ (which might be infinite). Now consider the interval $[0, L_\infty]$, together with its Euclidean metric, which we think of as rooted at $0$. Then, for each $t \in \mathcal{C}$, we graft $\mathcal{T}_{t-} \setminus \mathcal{T}_t$, the portion of the tree discarded at time $t$, at the point $L_t \in [0, L_\infty]$ (in the sense of the gluing introduced in Section 2.5). This creates a rooted real tree, and we denote by $\mathrm{cut}(\mathcal{T}, V)$ its completion. Moreover, we can endow $\mathrm{cut}(\mathcal{T}, V)$ with a (possibly defective) probability measure $\hat\mu$ by taking the push-forward of $\mu$ under the canonical injection $\phi$ from $\cup_{t \in \mathcal{C}} (\mathcal{T}_{t-} \setminus \mathcal{T}_t)$ to $\mathrm{cut}(\mathcal{T}, V)$. We denote by $U$ the endpoint $L_\infty$ of the interval $[0, L_\infty]$. We show in Section 5 that

Theorem 3.1.
We have $L_\infty < \infty$ almost surely. Moreover, under (H) we have
\[ (\sigma_n \mathrm{cut}(T_n, V_n), p_n, V_n) \xrightarrow[n\to\infty]{d,\,\mathrm{GP}} (\mathrm{cut}(\mathcal{T}, V), \hat\mu, U), \]
jointly with the convergence in (2.6).

Combining this with (3.2), we show in Section 5 that
Theorem 3.2.
Conditional on $\mathcal{T}$, $U$ has distribution $\hat\mu$, and the unconditional distribution of $(\mathrm{cut}(\mathcal{T}, V), \hat\mu)$ is the same as that of $(\mathcal{T}, \mu)$.

Theorems 3.1 and 3.2 immediately entail that
Corollary 3.3.
Suppose that (H) holds. Then
\[ \sigma_n L(T_n) \xrightarrow[n\to\infty]{d} L_\infty, \]
jointly with the convergence in (2.6). Moreover, the unconditional distribution of $L_\infty$ is the same as that of the distance in $\mathcal{T}$ between the root and a random point $V$ chosen according to $\mu$, given in (2.5).

In the procedure of the previous section, the fragmentation only takes place on the portions of the tree which contain the random point $V$. Following Bertoin and Miermont [16], we consider a more general cutting procedure which keeps splitting all the connected components. The aim here is to describe the genealogy of the fragmentation that this cutting procedure produces. For each $t \ge 0$, $\mathcal{P}_t$ induces an equivalence relation $\sim_t$ on $\mathcal{T}$: for $x, y \in \mathcal{T}$ we write $x \sim_t y$ if $⟦x, y⟧ \cap \mathcal{P}_t = \emptyset$. We denote by $\mathcal{T}_x(t)$ the equivalence class containing $x$. In particular, we have $\mathcal{T}_V(t) = \mathcal{T}_t$. Let $(V_i)_{i \ge 1}$ be a sequence of i.i.d. $\mu$-points in $\mathcal{T}$. For each $t \ge 0$, define $\mu_i(t) = \mu(\mathcal{T}_{V_i}(t))$. We write $\mu^\downarrow(t)$ for the sequence $(\mu_i(t), i \ge 1)$ rearranged in decreasing order. In the case where $\mathcal{T}$ is the Brownian CRT, the process $(\mu^\downarrow(t))_{t \ge 0}$ is the fragmentation dual to the standard additive coalescent [8]. In the other cases, however, it is not even Markov, because of the presence of the branch points $\beta_i$ with fixed local times $\theta_i$.

As in [16], we can define a genealogical tree for this fragmentation process. For each $i \ge 1$ and $t \ge 0$, let
\[ L^i_t := \int_0^t \mu_i(s) \, ds, \]
and let $L^i_\infty \in [0, \infty]$ be its limit as $t \to \infty$. For each pair $(i, j) \in \mathbb{N}^2$, let $\tau(i, j) = \tau(j, i)$ be the first moment when $⟦V_i, V_j⟧$ contains an element of $\mathcal{P}$ (or, more precisely, of its projection onto $\mathcal{T}$), which is almost surely finite by the properties of $\mathcal{T}$ and $\mathcal{P}$.
It is not difficult to construct an increasing sequence of real trees $S_1 \subset S_2 \subset \cdots$ such that $S_k$ has the shape of a discrete tree rooted at a point denoted $\rho^*$, with exactly $k$ leaves $\{U_1, U_2, \dots, U_k\}$ satisfying
\[ d(\rho^*, U_i) = L^i_\infty, \qquad d(U_i, U_j) = L^i_\infty + L^j_\infty - 2 L^i_{\tau(i,j)}, \qquad 1 \le i < j \le k, \tag{3.4} \]
where $d$ denotes the distance of $S_k$, for each $k \ge 1$.

Then we define $\mathrm{cut}(\mathcal{T})$ as the completion of $(\cup_{k \ge 1} S_k, d)$, which is still a real tree. In the case where $\mathcal{T}$ is the Brownian CRT, the above definition of $\mathrm{cut}(\mathcal{T})$ coincides with the tree defined by Bertoin and Miermont [16].

Similarly, for each $p_n$-tree $T_n$, we can define a complete cutting procedure on $T_n$ by first generating a random permutation $(X^n_1, X^n_2, \dots, X^n_n)$ of the vertex set $[n]$ and then removing the $X^n_i$ one by one. Here the permutation $(X^n_1, X^n_2, \dots, X^n_n)$ is constructed by sampling, for $i \ge 1$, $X^n_i$ according to the restriction of $p_n$ to $[n] \setminus \{X^n_j, j < i\}$. We define a new genealogy on $[n]$ by making $X^n_i$ an ancestor of $X^n_j$ if $i < j$ and $X^n_j$ and $X^n_i$ are in the same connected component when $X^n_i$ is removed. If we denote by $\mathrm{cut}(T_n)$ the corresponding genealogical tree, then the number of vertices on the path of $\mathrm{cut}(T_n)$ between the root $X^n_1$ and an arbitrary vertex $v$ is precisely equal to the number of cuts necessary to isolate this vertex $v$. We have

Theorem 3.4.
Suppose that (H) holds. Then we have
\[ (\sigma_n \mathrm{cut}(T_n), p_n) \xrightarrow[n\to\infty]{d,\,\mathrm{GP}} (\mathrm{cut}(\mathcal{T}), \nu), \]
jointly with the convergence in (2.6). Here, $\nu$ is the weak limit of the empirical measures $k^{-1} \sum_{i=1}^{k} \delta_{U_i}$, which exists almost surely conditional on $\mathcal{T}$.

From this, we show that
Theorem 3.5.
Conditionally on $\mathcal{T}$, $(U_i, i \ge 1)$ has the distribution of a sequence of i.i.d. points with common law $\nu$. Furthermore, the unconditional distribution of the pair $(\mathrm{cut}(\mathcal{T}), \nu)$ is the same as that of $(\mathcal{T}, \mu)$.

In general, the convergence of the $p_n$-trees to the ICRT in (2.6) cannot be improved to the Gromov–Hausdorff (GH) topology; see for instance [9, Example 28]. However, when the sequence $(p_n)_{n \ge 1}$ is suitably well-behaved, one does have this stronger convergence. (This is the case for example with $p_n$ the uniform distribution on $[n]$, which gives rise to the Brownian CRT; see also [10, Section 4.2].) In such cases, we can reinforce accordingly the above convergences of the cut trees in the Gromov–Hausdorff topology. Note however that a "reasonable" condition on $p$ ensuring the Gromov–Hausdorff convergence seems hard to find. Let us mention a related open question in [10, Section 7], which is to determine a practical criterion for the compactness of a general ICRT. Writing $\to_{d,\mathrm{GHP}}$ for convergence in distribution with respect to the Gromov–Hausdorff–Prokhorov topology (see Section 2), we have
Theorem 3.6.
Suppose that $\mathcal{T}$ is almost surely compact, and suppose also that, as $n \to \infty$,
\[ (\sigma_n T_n, p_n) \xrightarrow[n\to\infty]{d,\,\mathrm{GHP}} (\mathcal{T}, \mu). \tag{3.5} \]
Then, jointly with the convergence in (3.5), we have
\[ (\sigma_n \mathrm{cut}(T_n, V_n), p_n) \xrightarrow[n\to\infty]{d,\,\mathrm{GHP}} (\mathrm{cut}(\mathcal{T}, V), \hat\mu), \qquad (\sigma_n \mathrm{cut}(T_n), p_n) \xrightarrow[n\to\infty]{d,\,\mathrm{GHP}} (\mathrm{cut}(\mathcal{T}), \nu). \]

We also consider the transformation that "reverses" the construction of the tree $\mathrm{cut}(\mathcal{T}, V)$ defined above. Here, by reversing we mean obtaining a tree distributed as the primal tree $\mathcal{T}$, conditioned on the cut tree being the one we need to transform. So for an ICRT $(\mathcal{H}, d_{\mathcal{H}}, \hat\mu)$ and a random point $U$ sampled according to its mass measure $\hat\mu$, we should construct a tree $\mathrm{shuff}(\mathcal{H}, U)$ such that
\[ (\mathcal{T}, \mathrm{cut}(\mathcal{T}, V)) \overset{d}{=} (\mathrm{shuff}(\mathcal{H}, U), \mathcal{H}). \tag{3.6} \]
This reverse transformation is the one described in [3] for the Brownian CRT. For $\mathcal{H}$ rooted at $r(\mathcal{H})$, the path $⟦r(\mathcal{H}), U⟧$ that joins $r(\mathcal{H})$ to $U$ in $\mathcal{H}$ decomposes the tree into countably many subtrees of positive mass
\[ F_x = \{y \in \mathcal{H} : U \wedge y = x\}, \]
where $U \wedge y$ denotes the closest common ancestor of $U$ and $y$, that is, the unique point $a$ such that $⟦r(\mathcal{H}), U⟧ \cap ⟦r(\mathcal{H}), y⟧ = ⟦r(\mathcal{H}), a⟧$. Informally, the tree $\mathrm{shuff}(\mathcal{H}, U)$ is the metric space one obtains from $\mathcal{H}$ by attaching each $F_x$ of positive mass at a random point $A_x$, which is sampled proportionally to $\hat\mu$ in the union of the $F_y$ for which $d_{\mathcal{H}}(U, y) < d_{\mathcal{H}}(U, x)$. We postpone the precise definition of $\mathrm{shuff}(\mathcal{H}, U)$ until Section 6.1.

The question of reversing the complete cut tree $\mathrm{cut}(\mathcal{T})$ is more delicate and is the subject of the companion paper [20]. There we restrict ourselves to the case of the Brownian CRT: for $\mathcal{T}$ and $\mathcal{G}$ Brownian CRTs, we construct a tree $\mathrm{shuff}(\mathcal{G})$ such that
\[ (\mathcal{T}, \mathrm{cut}(\mathcal{T})) \overset{d}{=} (\mathrm{shuff}(\mathcal{G}), \mathcal{G}). \]
We believe that the construction there is also valid for more general ICRTs, but the arguments we use there rely strongly on the self-similarity of the Brownian CRT.
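At the discrete level, the reshuffling that inverts $\mathrm{cut}(T, V)$ has a very concrete form, described in Section 4: the root-to-$V$ path of the cut tree is cut open, and each path vertex $x_i$ is reattached at a point chosen in the subtree hanging below the next path vertex $x_{i+1}$. The sketch below (our own naming; a fixed small tree for illustration, not the paper's notation) implements this rearrangement $\tau(h, v; u)$ and checks exhaustively that every admissible reattachment vector again yields a tree on the same vertex set.

```python
from itertools import product

def subtree(parent, w):
    """Vertex set of the subtree hanging at w (w and its descendants)."""
    children = {}
    for c, q in parent.items():
        children.setdefault(q, []).append(c)
    sub, stack = {w}, [w]
    while stack:
        u = stack.pop()
        for c in children.get(u, []):
            sub.add(c)
            stack.append(c)
    return sub

def tau(root, parent, v, u):
    """Cut the root-to-v path open and reattach x_i at u[x_{i+1}]."""
    path = [v]
    while path[-1] != root:
        path.append(parent[path[-1]])
    path.reverse()                      # root = x_1, ..., x_l = v
    edges = {frozenset((c, q)) for c, q in parent.items()}
    for xi, xj in zip(path, path[1:]):
        edges.remove(frozenset((xi, xj)))
        edges.add(frozenset((xi, u[xj])))
    return edges

# h: root 0 with edges 0-1, 1-2, 1-3, 2-4; we "unknit" towards v = 4
root, parent, v = 0, {1: 0, 2: 1, 3: 1, 4: 2}, 4
path_nodes = [1, 2, 4]                  # Span*(h, v)
choices = [sorted(subtree(parent, w)) for w in path_nodes]
for pick in product(*choices):          # every admissible vector u
    u = dict(zip(path_nodes, pick))
    edges = tau(root, parent, v, u)
    # the result must again be a tree on the same 5 vertices
    assert len(edges) == 4
    comp, stack = {0}, [0]
    while stack:
        a = stack.pop()
        for e in edges:
            if a in e:
                (b,) = e - {a}
                if b not in comp:
                    comp.add(b)
                    stack.append(b)
    assert comp == {0, 1, 2, 3, 4}
```

The exhaustive loop is the finite-tree counterpart of the tree-validity statement proved for the rearrangement in Section 4; sampling each $u_w$ according to $p$ restricted to the corresponding subtree (and re-rooting at a $p$-node) would turn this deterministic map into the reshuffling itself.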
Remarks. i.
Theorem 3.2 generalizes Theorem 1.5 in [3], which concerns the Brownian CRT. The special case of Theorem 3.1 concerning the convergence of uniform Cayley trees to the Brownian CRT is also found there.

ii.
When $\mathcal{T}$ is the Brownian CRT, Theorem 3.5 has been proven by Bertoin and Miermont [16]. Their proof relies on the self-similarity of the Aldous–Pitman fragmentation. They also proved a convergence similar to the one in Theorem 3.4 for conditioned Galton–Watson trees with finite-variance offspring distributions. Let us point out that their definition of the discrete cut trees is distinct from ours, and there is no "duality" at the discrete level for their definitions. Very recently, a result related to Theorem 3.4 has been proved in the case of stable trees [23] (with a different notion of discrete cut tree). Note also that the convergence of the cut trees proved in [16] and [23] is with respect to the Gromov–Prokhorov topology, so it is weaker than the convergence of the corresponding conditioned Galton–Watson trees, which holds in the Gromov–Hausdorff–Prokhorov sense. In our case, the identities imply that the convergence of the cut trees is as strong as that of the $p_n$-trees (Theorem 3.6).

iii. Abraham and Delmas [2] have shown an analogue of Theorem 3.2 for the Lévy tree, introduced in [37]. In passing, Aldous et al. [10] have conjectured that a Lévy tree is a mixture of ICRTs where the parameters $\theta$ are chosen according to the distribution of the jumps in the bridge process of the associated Lévy process. The similarity between Theorem 3.2 and the result of Abraham and Delmas may then be seen as a piece of evidence supporting this conjecture.

p-tree

As we have mentioned in the introduction, our approach to the theorems about continuum random trees involves taking limits in the discrete world. In this section, we prove the discrete results about the decomposition and rearrangement of $p$-trees that will enable us to obtain similar decomposition and rearrangement procedures for inhomogeneous continuum random trees.
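The complete cutting genealogy $\mathrm{cut}(T_n)$ recalled in Section 3 also admits a one-line description: the parent of a removed vertex is the most recent earlier cut that fell in its then-current component. A minimal Python sketch (our naming; the tree and removal order are fixed for illustration, not sampled according to $p_n$):

```python
def component(edges, nodes, v):
    """Vertex set of the component of v in the graph induced on `nodes`."""
    comp, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for e in edges:
            if u in e:
                (w,) = e - {u}
                if w in nodes and w not in comp:
                    comp.add(w)
                    stack.append(w)
    return comp

def complete_cut_tree(edges, order):
    """Genealogy of the complete cutting: the parent of a cut vertex is
    the most recent earlier cut made in its then-current component."""
    alive = set(order)
    parent, last = {}, {}
    for x in order:
        comp = component(edges, alive, x)
        if x in last:                 # the last cut seen by x's component
            parent[x] = last[x]
        for v in comp:
            last[v] = x
        alive.discard(x)
    return order[0], parent           # the first cut is the root

# path 0-1-2-3, vertices removed in the order 1, 3, 0, 2
edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3)]}
r, par = complete_cut_tree(edges, [1, 3, 0, 2])
assert r == 1
assert par == {3: 1, 0: 1, 2: 3}
# e.g. vertex 2: the cuts at 1, 3 and 2 all fall in 2's current component,
# and the path 1 -> 3 -> 2 in the cut tree indeed has three vertices.
```

This matches the property stated in Section 3: the number of vertices on the path of the cut tree between the root and $v$ equals the number of cuts needed to isolate $v$.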
As a warm-up, and in order to present many of the important ideas, we start by isolating a single node. Let $T$ be a $p$-tree and let $V$ be an independent $p$-node. We isolate the vertex $V$ by removing each time a random vertex of $T$ and preserving only the component containing $V$, until the time when $V$ is picked.

THE CUTTING PROCEDURE AND THE 1-CUT TREE. Initially, we have $T_0 = T$ and an independent vertex $V$. Then, for $i \ge 1$, we choose a node $X_i$ according to the restriction of $p$ to the vertex set $v(T_{i-1})$ of $T_{i-1}$. We define $T_i$ to be the connected component of the forest induced by $T_{i-1}$ on $v(T_{i-1}) \setminus \{X_i\}$ which contains $V$. If $T_i = \emptyset$, or equivalently $X_i = V$, the process stops and we set $L = L(T) = i$. Since at least one vertex is removed at every step, the process stops in time $L \le n$.

As we destroy the tree $T$ to isolate $V$ by iteratively pruning random nodes, we construct a tree which records the history of the destruction, which we call the 1-cut tree. This 1-cut tree will, in particular, give some information about the number of cuts which were needed to isolate $V$. However, we remind the reader that this number of cuts is not our main objective: we are after a more detailed correspondence between the initial tree and its 1-cut tree. We will prove that these two trees are dual in a sense that we will make precise shortly.

By construction, $(T_i, 0 \le i < L)$ is a decreasing sequence of nonempty trees which all contain $V$, and $(X_i, 1 \le i \le L)$ is a sequence of distinct vertices of $T_0 = T$. Removing $X_i$ from $T_{i-1}$ disconnects it into a number of connected components. Then $T_i$ is the one of these connected components which contains $V$, or $T_i = \emptyset$ if $X_i = V$. If $X_i = V$, we set $F_i = T_{i-1}$, which we see as a tree rooted at $X_i = V$. Otherwise $X_i \ne V$ and there is a neighbour $U_i$ of $X_i$ on the path between $X_i$ and $V$ in $T_{i-1}$.
Then $U_i \in T_i$, and we see $T_i$ as rooted at $U_i$; furthermore, we let $F_i$ be the subtree of $T_{i-1} \setminus \{U_i\}$ which contains $X_i$, seen as rooted at $X_i$. In other words, $T_i$ is the subtree with vertex set $\{u \in T_{i-1} : X_i \notin ⟦u, V⟧\}$ rooted at $U_i$, and $F_i$ is the subtree with vertex set $\{u \in T_{i-1} : X_i \in ⟦u, V⟧\}$ rooted at $X_i$.

When the procedure stops, we have a vector $(F_i, 1 \le i \le L)$ of subtrees of $T$ which together span all of $[n]$. We may rearrange them into a new tree, the 1-cut tree corresponding to the isolation of $V$ in $T$. We do this by connecting their roots $X_1, X_2, \dots, X_L$ into a path (in this order). The resulting tree, denoted by $H$, is seen as rooted at $X_1$, and carries a distinguished path, or backbone, $⟦X_1, V⟧$, which we denote by $S$, and distinguished points $U_1, \dots, U_{L-1}$.

Note that for $i = 1, \dots, L-1$, we have $U_i \in T_i$. Equivalently, $U_i$ lies in the subtree of $H$ rooted at $X_{i+1}$. In general, for a tree $t \in \mathbb{T}_n$ and $v \in [n]$, let $x_1, \dots, x_\ell = v$ be the nodes of $\mathrm{Span}(t; v)$. We define $\mathcal{U}(t, v)$ as the collection of vectors $(u_1, \dots, u_{\ell-1})$ of nodes of $[n]$ such that $u_i \in \mathrm{Sub}(t, x_{i+1})$, for $1 \le i < \ell$. Then by construction, for $h \in \mathbb{T}_n$, conditional on $H = h$ and $V = v$, we have $L$ equal to the number of nodes in $\mathrm{Span}(h; v)$, and $(U_1, \dots, U_{L-1}) \in \mathcal{U}(h, v)$ with probability one. For $A \subseteq [n]$, we write $p(A) := \sum_{i \in A} p_i$.

Lemma 4.1.
Let $T$ be a $p$-tree on $[n]$, and let $V$ be an independent $p$-node. Let $h \in \mathbb{T}_n$ and $v \in [n]$ for which $\mathrm{Span}(h; v)$ is the path made of the nodes $x_1, x_2, \dots, x_{\ell-1}, x_\ell = v$. Let $(u_1, \dots, u_{\ell-1}) \in \mathcal{U}(h, v)$ and $w \in [n]$. Then we have
\[ \mathbb{P}(H = h;\, V = v;\, r(T) = w;\, U_i = u_i, 1 \le i < \ell) = \pi(h) \cdot \prod_{1 \le i < \ell} \frac{p_{u_i}}{p(\mathrm{Sub}(h, x_{i+1}))} \cdot p_v \cdot p_w. \]
In particular, $(H, V) \sim \pi \otimes p$.

As a direct consequence of our construction of $H$, $L$ is the number of nodes of the subtree $\mathrm{Span}(H, V)$, which we write $\#\mathrm{Span}(H, V)$. So Lemma 4.1 entails immediately that

Proposition 4.2. Let $T$ be a $p$-tree and $V$ an independent $p$-node. Then $L \overset{d}{=} \#\mathrm{Span}(T, V)$.

Proof of Lemma 4.1.
By construction, we have
\[ \{H = h;\, V = v\} \subset \{X_1 = x_1, \dots, X_{\ell-1} = x_{\ell-1}, X_\ell = v;\, L = \ell\}, \]
and the sequence $(F_i, 1 \le i \le \ell)$ is precisely the sequence of subtrees $f_i$ of $h$ rooted at $x_i$, $1 \le i \le \ell$, that are obtained when one removes the edges $\{x_i, x_{i+1}\}$, $1 \le i < \ell$ (the edges of the subgraph $\mathrm{Span}(h; v)$).

Figure 1: The re-organization of the tree in the one-cutting procedure: on the left, the initial tree $T$; on the right, $H$ and the marked nodes $U_1, U_2, \dots$ indicating where to reattach $X_1, X_2, \dots$ in order to recover $T$.

Furthermore, given that $L = \ell$ and the sequence of cut vertices $X_i = x_i$, $1 \le i < \ell$, in order to recover the initial tree $T$ it suffices to identify the vertices $U_i$, $1 \le i < \ell$, for which there used to be an edge $\{X_i, U_i\}$ (which yields the correct adjacencies), and the root of $T$. Note that $U_i$ is a node of $T_i$, $1 \le i < \ell$. However, by construction, given that $H = h$ and $V = v$, the set of nodes of $T_i$ is precisely the set of nodes of $\mathrm{Sub}(h, x_{i+1})$, the subtree of $h$ rooted at $x_{i+1}$.

For $u = (u_1, \dots, u_{\ell-1}) \in \mathcal{U}(h, v)$, define $\tau(h, v; u)$ as the tree obtained from $h$ by removing the edges of $\mathrm{Span}(h; v)$ and reconnecting the pieces by adding the edges $\{x_i, u_i\}$, for all the edges $\langle x_i, x_{i+1} \rangle$ in $\mathrm{Span}(h; v)$. (In particular, the number of edges is unchanged.) We regard $\tau(h, v; u)$ as a tree rooted at $r = x_1$, the root of $h$. The tree $T$ may then be recovered by characterizing $T^r$, the tree $T$ re-rooted at $r$, and the initial root $r(T)$. We have:
\[ \{H = h;\, V = v;\, r(T) = w;\, U_i = u_i, 1 \le i < \ell\} = \{T^r = \tau(h, v; u);\, r(T) = w;\, X_i = x_i, 1 \le i \le \ell\}. \]
It follows that, for any nodes $u_1, u_2, \dots, u_{\ell-1}$ as above, we have
\[ \mathbb{P}(H = h;\, V = v;\, r(T) = w;\, U_i = u_i, 1 \le i < \ell) = \mathbb{P}(T = \tau(h, v; u)^w;\, V = v;\, X_i = x_i, 1 \le i \le \ell) = \pi(\tau(h, v; u)^w) \cdot p_v \cdot \prod_{1 \le i \le \ell} \frac{p_{x_i}}{p(\mathrm{Sub}(h, x_i))}. \]
Now, by definition, the only nodes whose in-degree is modified in the transformation from $h$ to $\tau(h, v; u)$ are $u_i, x_{i+1}$, $1 \le i < \ell$: every such $x_{i+1}$ loses one in-edge, while $u_i$ gains one. The re-rooting at $w$ then only modifies the in-degrees of the extremities of the path that is reversed, namely $x_1 = r$ and $w$. It follows that
\[ \pi(\tau(h, v; u)^w) = \pi(h) \cdot \prod_{1 \le i < \ell} \frac{p_{u_i}}{p_{x_{i+1}}} \cdot \frac{p_w}{p_{x_1}}. \]
Since $p(\mathrm{Sub}(h, x_1)) = 1$, we have
\[ \mathbb{P}(H = h;\, V = v;\, r(T) = w;\, U_i = u_i, 1 \le i < \ell) = \pi(h) \cdot \prod_{1 \le i < \ell} \frac{p_{u_i}}{p(\mathrm{Sub}(h, x_{i+1}))} \cdot p_v \cdot p_w. \]
Summing over $u = (u_1, u_2, \dots, u_{\ell-1}) \in \mathcal{U}(h, v)$ and $w \in [n]$, we obtain
\[ \mathbb{P}(H = h;\, V = v) = \sum_{w \in [n]} \sum_{u \in \mathcal{U}(h,v)} \pi(h) \cdot \prod_{1 \le i < \ell} \frac{p_{u_i}}{p(\mathrm{Sub}(h, x_{i+1}))} \cdot p_v \cdot p_w = \pi(h) \cdot p_v, \]
which completes the proof.

THE REVERSE CUTTING PROCEDURE. We have transformed the tree $T$ into the tree $H$ by somewhat "knitting" a path between the first picked random $p$-node $X_1$ and the distinguished node $V$. This transformation is reversible. Indeed, it is possible to "unknit" the path between $V$ and the root of $H$, and to reshuffle the subtrees thereby created in order to obtain a new tree $\tilde T$, distributed as $T$, in which $V$ is an independent $p$-node. Knowing the $U_i$, one could do this exactly and recover the adjacencies of $T$ (recovering $T$ itself also requires the information about the root $r(T)$, which has been lost). Defining a reverse transformation thus reduces to finding the joint distribution of $(U_i)$ and $r(T)$, which is precisely the statement of Lemma 4.1, so that the following reverse construction is now straightforward.

Let $h \in \mathbb{T}_n$ be rooted at $r$, and let $v$ be a node in $[n]$.
We think of $h$ as the tree that was obtained by the 1-cutting procedure, $\mathrm{cut}(T, v)$, for some initial tree $T$. Suppose that $\mathrm{Span}(h, v)$ consists of the vertices $r = x_1, x_2, \dots, x_\ell = v$. Removing the edges of $\mathrm{Span}(h, v)$ from $h$ disconnects it into $\ell$ connected components, which we see as rooted at $x_i$, $1 \le i \le \ell$. For $w \in \mathrm{Span}^*(h, v) = \mathrm{Span}(h, v) \setminus \{r\}$, sample a node $U_w$ according to the restriction of $p$ to $\mathrm{Sub}(h, w)$. Let $U = (U_w, w \in \mathrm{Span}^*(h, v))$ be the vector so obtained. Then $U \in \mathcal{U}(h, v)$. We then define $\mathrm{shuff}(h, v)$ to be the rooted tree which has the adjacencies of $\tau(h, v; U)$, but which is re-rooted at an independent $p$-node.

It should now be clear that the 1-cutting procedure and the reshuffling operation we have just defined are dual in the following sense.

Proposition 4.3 (1-cutting duality). Let $T$ be a $p$-tree on $[n]$ and $V$ an independent $p$-node. Then,
\[ (\mathrm{shuff}(T, V), T, V) \overset{d}{=} (T, \mathrm{cut}(T, V), V). \]
In particular, $(\mathrm{shuff}(T, V), V) \sim \pi \otimes p$.

Note that for the joint distribution in Proposition 4.3, it is necessary to re-root at another independent $p$-node in order to have the claimed equality. Indeed, $T$ and $\tau(T, V; U)$ have the same root almost surely, while $T$ and $\mathrm{cut}(T, V)$ do not (they only have the same root with probability $\sum_{i \ge 1} p_i^2 < 1$).

Proof of Proposition 4.3.
Let $H = \mathrm{cut}(T, V)$ be the tree resulting from the cutting procedure. Let $L = \#\mathrm{Span}(H; V)$. For $1 \le i < L$, we defined nodes $U_i$, which used to be the neighbours of the $X_i$ in $T$. For $w \in \mathrm{Span}^*(H; V)$, we let $U_w = U_i$ if $w = X_{i+1}$, and let $U$ be the corresponding vector. Then, writing $\hat r = r(T)$, with probability one we have
\[ T = \tau(H, V; U)^{\hat r}. \]
By Lemma 4.1, $U \in \mathcal{U}(H, V)$ and, conditional on $H$ and $V$, the $U_w$, $w \in \mathrm{Span}^*(H, V)$, and $\hat r = r(T)$ are independent and distributed according to the restriction of $p$ to $\mathrm{Sub}(H, w)$ and to $p$, respectively. So this coupling indeed gives that $T = \tau(H, V; U)^{\hat r}$ is distributed as $\mathrm{shuff}(H, V)$, conditional on $H$. Since in this coupling $(\mathrm{shuff}(H, V), T, V)$ is almost surely equal to $(T, H, V)$, the proof is complete.

Figure 2: The decomposition of the tree when removing the point $X_i$ from the connected component of $\Gamma_{i-1}$ which contains $V_1$, $V_2$ and $V_3$.

Remark.
Note that the shuffle procedure would permit recovering the original tree $T$ exactly if we were to use some information that might be gathered as the cutting procedure goes on. In this discrete case, it is rather clear that one could do this, since the shuffle construction only consists in replacing some edges with others, while the vertex set remains the same. This observation will be used in Section 6 to prove a similar statement for the ICRT. There it is much less clear, and the result is slightly weaker: it is possible to couple the shuffle in such a way that the tree obtained is measure-isometric to the original one.

We now define a cutting procedure analogous to the one described in Section 4.1, but which continues until multiple nodes have been isolated. Again, we let $T$ be a $p$-tree and, for some $k \ge 1$, let $V_1, V_2, \dots, V_k$ be $k$ independent vertices chosen according to $p$ (so not necessarily distinct).

THE $k$-CUTTING PROCEDURE AND THE $k$-CUT TREE. We start with $\Gamma_0 = T$. Later on, $\Gamma_i$ is meant to be the forest induced by $T$ on the nodes that are left. For each time $i \ge 1$, we pick a random vertex $X_i$ according to $p$ restricted to $v(\Gamma_{i-1})$, the set of the remaining vertices, and remove it. Then, among the connected components of $T \setminus \{X_1, \dots, X_i\}$, we only keep those containing at least one of $V_1, \dots, V_k$. We stop at the first time when all $k$ vertices $V_1, \dots, V_k$ have been chosen, that is, at time
\[ L_k := \inf\{i \ge 1 : \{V_1, \dots, V_k\} \subseteq \{X_1, \dots, X_i\}\}. \]
For $1 \le \ell \le k$ and $i \ge 0$, we denote by $T^\ell_i$ the connected component of $T \setminus \{X_1, X_2, \dots, X_i\}$ containing $V_\ell$ at time $i$, or $T^\ell_i = \emptyset$ if $V_\ell \in \{X_1, \dots, X_i\}$. Then $\Gamma_i$ is the graph consisting of the connected components $T^\ell_i$, $\ell = 1, \dots, k$.

Fix some $\ell \in \{1, 2, \dots, k\}$, and suppose that at time $i \ge 1$ we have $X_i \in T^\ell_{i-1}$.
If $X_i = V_\ell$, then $T^\ell_i = \emptyset$ and we define $F_i = T^\ell_{i-1}$, re-rooted at $X_i = V_\ell$. Otherwise, $X_i \ne V_\ell$ and there is a first node $U^\ell_i$ on the path between $X_i$ and $V_\ell$ in $T^\ell_{i-1}$. Then $U^\ell_i \in T^\ell_i$, and we see $T^\ell_i$ as rooted at $U^\ell_i$. Note that it is possible that $T^j_{i-1} = T^\ell_{i-1}$ for $j \ne \ell$, and that removing $X_i$ may separate $V_\ell$ from $V_j$. Removing from $\Gamma_{i-1}$ the edges $\{X_i, U^\ell_i\}$, for the $1 \le \ell \le k$ such that $X_i \in T^\ell_{i-1}$, isolates $X_i$ from the nodes $V_1, \dots, V_k$, and we define $F_i$ as the subtree of $T$ induced on the nodes of $\Gamma_{i-1} \setminus \Gamma_i$, so that $F_i$ is the portion of the forest $\Gamma_{i-1}$ which gets discarded at time $i$, seen as rooted at $X_i$.

Consider the set of effective cuts which affect the size of $T^\ell_i$:
\[ E^k_\ell = \{x \in [n] : \text{there exists } i \ge 1 \text{ such that } X_i = x \in T^\ell_{i-1}\}, \]
and note that $E^k_1 \cup E^k_2 \cup \cdots \cup E^k_k = \{X_i : 1 \le i \le L_k\}$. Let $S_k$, the $k$-cutting skeleton, be a tree on $E^k_1 \cup \cdots \cup E^k_k$ that is rooted at $X_1$, and such that the vertices on the path from $X_1$ to $V_\ell$ in $S_k$ are exactly those of $E^k_\ell$, in the order given by the indices of the cuts. So if we view $S_k$ as a genealogical tree, then in particular, for $1 \le j, \ell \le k$, the common ancestors of $V_j$ and $V_\ell$ are exactly the ones in $E^k_j \cap E^k_\ell$. The tree $S_k$ constitutes the backbone of a tree on $[n]$ which we now define. For every $x \in S_k$, there is a unique $i = i(x) \ge 1$ such that $x = X_i$. For that integer $i$ we have defined a subtree $F_i$ which contains $X_i = x$. We append $F_i$ to $S_k$ at $x$. Formally, we consider the tree on $[n]$ whose edge set consists of the edges of $S_k$ together with the edges of all the $F_i$, $1 \le i \le L_k$. Furthermore, this tree is considered as rooted at $X_1$. Then this tree is completely determined by $T$, $V_1, \dots, V_k$, and the sequence $X := (X_i, i = 1, \dots, L_k)$, and we denote this tree by $\kappa(T; V_1, \dots, V_k; X)$ when we want to emphasize the dependence on $X$, or more simply $\mathrm{cut}(T, V_1, \dots, V_k)$ (in which it is implicit that the cutting sequence used in the transformation is such that for every $i \ge 1$, $X_i$ is a $p$-node in $\Gamma_{i-1}$). Clearly, if $H_k = \mathrm{cut}(T, V_1, \dots, V_k)$ then $S_k = \mathrm{Span}(H_k; V_1, \dots, V_k)$.

It is convenient to define a canonical (total) order $\preceq$ on the vertices of $S_k$. It will be needed later on in order to define the reverse procedure. For two nodes $u, v$ in $S_k$, we say that $u \preceq v$ if either $u \in ⟦X_1, v⟧$, or if there exists $\ell \in \{1, \dots, k\}$ such that $u \in \mathrm{Span}(S_k; V_1, \dots, V_\ell)$ but $v \notin \mathrm{Span}(S_k; V_1, \dots, V_\ell)$.

A USEFUL COUPLING. It is useful to see all the trees $\mathrm{cut}(T; V_1, \dots, V_k)$ on the same probability space, via a natural but crucial coupling for which the sequence $(S_k)$ is increasing in $k$. Let $Y_i$, $i \ge 1$, be a sequence of i.i.d. $p$-nodes. For $k \ge 1$, we define an increasing sequence $\sigma_k$ as follows. Let $\sigma_k(1) = 1$. Suppose that we have already defined $X^k_1, \dots, X^k_{i-1}$. Let $\Gamma^k_{i-1}$ be the collection of connected components of $T \setminus \{X^k_1, \dots, X^k_{i-1}\}$ which contain at least one of $V_1, \dots, V_k$. Let
\[ \sigma_k(i) = \inf\{j > \sigma_k(i-1) : Y_j \in \Gamma^k_{i-1}\}, \]
and define $X^k_i = Y_{\sigma_k(i)}$. Then, for every $k$, $X^k_i$, $i \ge 1$, is a sequence of nodes sampled according to the restriction of $p$ to $\Gamma^k_{i-1}$, so that $X^k := (X^k_i, i \ge 1)$ can be used to define $\mathrm{cut}(T, V_1, \dots, V_k)$, $k \ge 1$, in a consistent way, by setting
\[ \mathrm{cut}(T, V_1, \dots, V_k) = \kappa(T, V_1, \dots, V_k; X^k). \]
Suppose that the trees $H_k := \mathrm{cut}(T; V_1, \dots, V_k)$, $k \ge 1$, are constructed using the coupling we have just described. By convention, let $H_0 = T$ and $\mathrm{Span}(T; \emptyset) = \emptyset$.

Lemma 4.4.
Let $S_k = \mathrm{Span}(H_k; V_1, \dots, V_k)$. Then $S_k \subseteq S_{k+1}$ and $S_k = \mathrm{Span}(S_{k+1}; V_1, \dots, V_k)$.

Proof.
Let $T^\ell_i$ be the connected component of $\Gamma^k_i$ which contains $V_\ell$. Let $\hat T^\ell_j$ be the connected component of $T \setminus \{Y_1, Y_2, \dots, Y_j\}$ which contains $V_\ell$. Then, for $\ell \le k$, we have
\[ E^k_\ell = \{x : \exists i \ge 1,\; x = X^k_i \in T^\ell_{i-1}\} = \{y : \exists j \ge 1,\; y = Y_j \in \hat T^\ell_{j-1}\}, \]
so that $E^k_\ell$ does not depend on $k$. Now $S_k$ is the tree on $E^k_1 \cup \cdots \cup E^k_k$ such that the nodes on the path $\mathrm{Span}(S_k; V_\ell)$ are precisely the nodes of $E^k_\ell$, in the order given by the cut sequence $X^k$. It follows that $S_k \subseteq S_{k+1}$ and, more precisely, that $S_k = \mathrm{Span}(S_{k+1}; V_1, \dots, V_k)$.

Remark.
The coupling we have just defined justifies an ordered cutting procedure which is very similar to the one defined in [3]. Suppose that, for some $j, \ell \in \{1, \dots, k\}$, we have $x \in E^k_j \setminus E^k_\ell$ and $y \in E^k_\ell \setminus E^k_j$. Write $(\tilde X_i, i \ge 1)$ for the sequence in which we have exchanged the positions of $x$ and $y$. Then the trees $T^k_i$, $i \ge \max\{m : X_m = x \text{ or } y\}$, are unaffected if we replace $(X_i, i \ge 1)$ by $(\tilde X_i, i \ge 1)$ in the cutting procedure. In particular, if we are only interested in the final tree $H_k$, we can always suppose that there exist numbers $m_1 < m_2 < \cdots < m_k \le n$ such that, for $1 \le \ell \le k$, and if $V_\ell \notin \{V_1, \dots, V_{\ell-1}\}$, we have
\[ E^k_\ell \setminus \bigcup_{1 \le j < \ell} E^k_j = \{X_i : m_{\ell-1} < i \le m_\ell\}. \]
However, we prefer the coupling over the reordering of the sequence, since it does not involve any modification of the distribution of the cutting sequences.

Let $\tilde T_k$ be the subtree of $H_{k-1} \setminus \mathrm{Span}(H_{k-1}; V_1, \dots, V_{k-1}) = H_{k-1} \setminus S_{k-1}$ which contains $V_k$; we agree that $\tilde T_k = \emptyset$ if $V_k \in \mathrm{Span}(H_{k-1}; V_1, \dots, V_{k-1})$.

Lemma 4.5.
Let $T$ be a $p$-tree and let $V_k$, $k \ge 1$, be a sequence of i.i.d. $p$-nodes. Then, for each $k \ge 1$:

i. Let $\mathcal{V} \subseteq [n]$ with $\mathcal{V} \ne \emptyset$; then, conditional on $v(\tilde T_k) = \mathcal{V}$, the pair $(\tilde T_k, V_k)$ is distributed as $\pi|_{\mathcal{V}} \otimes p|_{\mathcal{V}}$, and is independent of $(H_{k-1} \setminus \mathcal{V}, V_1, \dots, V_{k-1})$.

ii. The joint distribution of $(H_k, V_1, \dots, V_k)$ is given by $\pi \otimes p^{\otimes k}$.

Proof. We proceed by induction on $k \ge 1$. Let $\tilde R_k$ denote the tree induced by $H_k$ on the vertex set $[n] \setminus v(\tilde T_k)$. For the base case $k = 1$, the first claim is trivial since $\tilde T_1 = T$, and the second is exactly the statement of Lemma 4.1.

Given the two subtrees $\tilde T_k$ and $\tilde R_k$, it suffices to identify where the tree $\tilde T_k$ is grafted on $\tilde R_k$ in order to recover the tree $H_{k-1}$. By construction, the edge connecting $\tilde T_k$ and $\tilde R_k$ in $H_{k-1}$ binds the root of $\tilde T_k$ to a node of $\mathrm{Span}(\tilde R_k; V_1, \dots, V_{k-1})$. Let $t \in \mathbb{T}_{\mathcal{V}}$, $r \in \mathbb{T}_{[n] \setminus \mathcal{V}}$, $v_k \in \mathcal{V}$ and $v_i \in [n] \setminus \mathcal{V}$ for $1 \le i < k$. Write $\mathbf{v}_{k-1} = \{v_1, \dots, v_{k-1}\}$. For a given node $x \in \mathrm{Span}(r; \mathbf{v}_{k-1})$, let $j_x(r, t)$ (the join of $r$ and $t$ at $x$) be the tree obtained from $t$ and $r$ by adding an edge between $x$ and the root of $t$. By the induction hypothesis, $(H_{k-1}, V_1, \dots, V_{k-1})$ is distributed like a $p$-tree together with $k-1$ independent $p$-nodes. Furthermore, $V_k$ is independent of $(H_{k-1}, V_1, \dots, V_{k-1})$. It follows that
\[ \mathbb{P}(\tilde T_k = t;\, \tilde R_k = r;\, V_i = v_i, 1 \le i \le k) = \sum_{x \in \mathrm{Span}(r; \mathbf{v}_{k-1})} \mathbb{P}(H_{k-1} = j_x(r, t);\, V_i = v_i, 1 \le i \le k) = \sum_{x \in \mathrm{Span}(r; \mathbf{v}_{k-1})} \prod_{i \in \mathcal{V}} p_i^{C_i(t)} \cdot \prod_{j \in [n] \setminus \mathcal{V}} p_j^{C_j(r)} \cdot p_x \cdot \prod_{1 \le i \le k} p_{v_i} = \prod_{i \in \mathcal{V}} p_i^{C_i(t)} \cdot p_{v_k} \cdot \prod_{j \in [n] \setminus \mathcal{V}} p_j^{C_j(r)} \cdot p(\mathrm{Span}(r; \mathbf{v}_{k-1})) \cdot \prod_{1 \le i < k} p_{v_i}. \]
In order to obtain $\mathrm{cut}(T, V_1, \dots, V_k)$ from $\mathrm{cut}(T, V_1, \dots, V_{k-1})$, it suffices to transform the subtree $\tilde T_k$ of $\mathrm{cut}(T, V_1, \dots, V_{k-1}) \setminus S_{k-1}$ which contains $V_k$.
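The consistency behind Lemma 4.4 can be made concrete: in the coupled construction, the ordered set $E^k_\ell$ of effective cuts for $V_\ell$ is a function of the tree, of $V_\ell$ and of the single sequence $(Y_j)$ alone, so it does not change when further targets are tracked. A small Python sketch (our naming; a fixed tree and cut sequence for illustration):

```python
def component(edges, nodes, v):
    """Vertex set of the component of v in the graph induced on `nodes`."""
    comp, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for e in edges:
            if u in e:
                (w,) = e - {u}
                if w in nodes and w not in comp:
                    comp.add(w)
                    stack.append(w)
    return comp

def effective_cuts(edges, n, targets, ys):
    """E_l for each target l: the cuts Y_j falling in V_l's current
    component; a Y_j hitting no retained component is skipped, as in
    the coupling via the indices sigma_k."""
    alive = set(range(n))
    E = {v: [] for v in targets}
    remaining = set(targets)
    for y in ys:
        if not remaining:
            break
        hit = [v for v in remaining if y in component(edges, alive, v)]
        if not hit:
            continue                  # y lies in a discarded part
        for v in hit:
            E[v].append(y)
            if y == v:                # target isolated
                remaining.discard(v)
        alive.discard(y)
    return E

edges = {frozenset(e) for e in [(0, 1), (1, 2), (1, 3), (3, 4), (0, 5)]}
ys = [3, 1, 5, 2, 0, 4]               # a fixed realization of (Y_j)
E3 = effective_cuts(edges, 6, [2, 4, 5], ys)
assert E3 == {2: [3, 1, 2], 4: [3, 4], 5: [3, 1, 5]}
# Lemma 4.4 in action: tracking fewer targets leaves each E_l unchanged.
for k in (1, 2):
    Ek = effective_cuts(edges, 6, [2, 4, 5][:k], ys)
    assert all(Ek[v] == E3[v] for v in Ek)
```

The final loop is the computational content of the proof of Lemma 4.4: removals made outside the component of $V_\ell$ never alter that component, so $E^k_\ell$ is the same whatever $k \ge \ell$ is.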
Figure 4: The $k$-cut tree and the marked points $U_1, U_2$ corresponding to the cut node $X$. The backbone is represented by the subtree in thick blue.

Corollary 4.6. Suppose that $T$ is a $p$-tree and that $V_1, \dots, V_k$ are $k \ge 1$ independent $p$-nodes, also independent of $T$. Then,
\[ S_k \overset{d}{=} \mathrm{Span}(T; V_1, \dots, V_k). \]
In particular, the total number of cuts needed to isolate $V_1, \dots, V_k$ in $T$ is distributed as the number of vertices of $\mathrm{Span}(T; V_1, \dots, V_k)$.

REVERSE $k$-CUTTING AND DUALITY. As when we were isolating a single node $V$ in Section 4.1, the transformation that yields $H_k = \mathrm{cut}(T, V_1, \dots, V_k)$ is reversible. To reverse the 1-cutting procedure, we "unknitted" the path between $X_1$ and $V$. Similarly, to reverse the $k$-cutting procedure, we "unknit" the backbone $S_k$ and, by doing this, obtain a collection of subtrees; we then re-attach these pendant subtrees at random nodes, chosen in suitable subtrees, in order to obtain a tree distributed like the initial tree $T$.

For every $i$, the subtree $F_i$, rooted at $X_i$, was initially attached to the set of nodes
\[ \mathcal{U}_i := \{U^j_i : 1 \le j \le k \text{ such that } X_i \in T^j_{i-1}\}. \]
The corresponding edges have been replaced by some edges which now lie in the backbone $S_k$. So, to reverse the cutting procedure knowing the sets $\mathcal{U}_i$, it suffices to remove all the edges of $S_k$ and to re-attach $X_i$ to every node in $\mathcal{U}_i$. In other words, defining a reverse $k$-cutting transformation knowing only the tree $H_k$ and the distinguished nodes $V_1, \dots, V_k$ reduces to characterizing the distribution of the sets $\mathcal{U}_i$.

Consider a tree $h \in \mathbb{T}_n$, and $k$ nodes $v_1, v_2, \dots, v_k$, not necessarily distinct. Removing the edges of $\mathrm{Span}(h; v_1, \dots, v_k)$ from $h$ disconnects it into connected components $f_x$, each containing a single vertex $x$ of $\mathrm{Span}(h; v_1, \dots, v_k)$. For a given edge $\langle x, w \rangle$ of $\mathrm{Span}(h; v_1, \dots, v_k)$, let $u_w$ be a node in $\mathrm{Sub}(h, w)$. Let $u$ be the vector of the $u_w$, sorted according to the canonical order of $w$ on $\mathrm{Span}(h; v_1, \dots, v_k)$ (see p. 17). For a given tree $h$ and $v_1, \dots, v_k$, we let $\mathcal{U}(h, v_1, \dots, v_k)$ be the set of such vectors $u$. For $u \in \mathcal{U}(h, v_1, \dots, v_k)$, define $\tau(h, v_1, \dots, v_k; u)$ as the graph obtained from $h$ by removing every edge $\langle x, w \rangle$ of $\mathrm{Span}(h; v_1, \dots, v_k)$ and replacing it by $\{x, u_w\}$. We regard $\tau(h, v_1, \dots, v_k; u)$ as rooted at the root of $h$.

Lemma 4.7. Suppose that $h \in \mathbb{T}_n$, and that $v_1, v_2, \dots, v_k$ are $k$ nodes of $[n]$, not necessarily distinct. Then, for every $u \in \mathcal{U}(h, v_1, \dots, v_k)$, $\tau(h, v_1, \dots, v_k; u)$ is a tree on $[n]$.

Proof. Write $t := \tau(h, v_1, \dots, v_k; u)$. We proceed by induction on $n \ge 1$. For $n = 1$, $t = h$ is reduced to a single node, so $t$ is a tree.

Suppose now that for any tree $t'$ of size at most $n - 1$, any $k \ge 1$, any nodes $v_1, v_2, \dots, v_k \in v(t')$, and any $u' \in \mathcal{U}(t', v_1, \dots, v_k)$, the graph $\tau(t', v_1, \dots, v_k; u')$ is a tree. Let $N$ be the set of neighbours of the root $x_0$ of $h$. For $y \in N$, define $\mathbf{v}_y$ as the subset of $\{v_1, \dots, v_k\}$ containing the vertices which lie in $\mathrm{Sub}(h, y)$. If $\mathbf{v}_y \ne \emptyset$, let also $u_y \in \mathcal{U}(\mathrm{Sub}(h, y), \mathbf{v}_y)$ be obtained from $u$ by keeping only the vertices $u_w$ for $w \in \mathrm{Span}^*(\mathrm{Sub}(h, y), \mathbf{v}_y)$, still in the canonical order. Then, by construction, the subtrees $\mathrm{Sub}(h, y)$, with $y \in N$ such that $\mathbf{v}_y \ne \emptyset$, are transformed regardless of one another, and the others, for which $\mathbf{v}_y = \emptyset$, are left untouched. So the graph $\tau(h, v_1, \dots, v_k; u)$ induced on $[n] \setminus \{x_0\}$ consists precisely of the $\tau(\mathrm{Sub}(h, y), \mathbf{v}_y; u_y)$, $y \in N$. By the induction hypothesis, these subgraphs are actually trees. Then $\tau(h, v_1, \dots, v_k; u)$ is simply obtained by adding the node $x_0$ together with the edges $\{x_0, u_y\}$, for $y \in N$, where $u_y \in \mathrm{Sub}(h, y)$. In other words, each such edge connects $x_0$ to a different tree $\tau(\mathrm{Sub}(h, y), \mathbf{v}_y; u_y)$, so that the resulting graph is also a tree.

For a given tree $h$ and $v_1, \dots, v_k \in [n]$, let $U \in \mathcal{U}(h, v_1, \dots, v_k)$ be obtained by sampling $U_w$ according to the restriction of $p$ to $\mathrm{Sub}(h, w)$, for every $w \in \mathrm{Span}^*(h, v_1, \dots, v_k)$. Finally, we define the $k$-shuffled tree $\mathrm{shuff}(h; v_1, \dots, v_k)$ to be the tree $\tau(h, v_1, \dots, v_k; U)$ re-rooted at an independent $p$-node.

We have the following result, which expresses the fact that the $k$-cutting and $k$-shuffling procedures are truly the reverse of one another.

Proposition 4.8 ($k$-cutting duality). Let $T$ be a $p$-tree and let $V_1, \dots, V_k$ be $k$ independent $p$-nodes, also independent of $T$. Then, we have the following duality:
\[ (\mathrm{shuff}(T, V_1, \dots, V_k), T, V_1, \dots, V_k) \overset{d}{=} (T, \mathrm{cut}(T, V_1, \dots, V_k), V_1, \dots, V_k). \]
In particular, $(\mathrm{shuff}(T, V_1, \dots, V_k), V_1, \dots, V_k) \sim \pi \otimes p^{\otimes k}$.

Proof. We consider the coupling we have defined on page 17: we let $H_k = \mathrm{cut}(T, V_1, \dots, V_k)$ for a $p$-tree $T$ rooted at $\hat r = r(T)$, and for every edge $\langle x, w \rangle$ of $\mathrm{Span}(H_k; V_1, \dots, V_k)$ we let $U_w$ be the unique node of $\mathrm{Sub}(H_k, w)$ which used to be connected to $x$ in the initial tree $T$. This defines the vector $U = (U_w, w \in \mathrm{Span}^*(H_k; V_1, \dots, V_k))$. We show by induction on $k \ge 1$ that $\tau(H_k, V_1, \dots, V_k; U)^{\hat r} = T$ and that the joint distribution of $(H_k, \hat r, V_1, \dots, V_k, U)$ is the one required by the construction above, so that
\[ (\tau(H_k, V_1, \dots, V_k; U)^{\hat r}, H_k, V_1, \dots, V_k) \overset{d}{=} (\mathrm{shuff}(H_k, V_1, \dots, V_k), H_k, V_1, \dots, V_k). \]
Since $H_k \overset{d}{=} T$, this would complete the proof.

For $k = 1$, the statement corresponds precisely to the construction in the proof of Proposition 4.3.
As before, for $\ell\le k$, we let $S_\ell=\mathrm{Span}(H_k;V_1,\dots,V_\ell)$. If $k\ge 2$, let $\tilde R_k$ be the connected component of $H_k\setminus S_{k-1}$ which contains $V_k$, or $\tilde R_k=\emptyset$ if $V_k\in S_{k-1}$. In the latter case, $T=\tau(H_k,V_1,\dots,V_{k-1},U)_{\hat r}$, and the joint distribution of $(H_k,\hat r,V_1,\dots,V_{k-1},U)$ is correct by the induction hypothesis. Otherwise, let $U^k$ denote the sub-vector of $U$ consisting of the components $U_w$ for $w\in\mathrm{Span}^*(\tilde R_k,V_k)$, and let $U^{1,k-1}=(U_w,\ w\in\mathrm{Span}^*(H_k;V_1,\dots,V_{k-1}))$. If $\theta\in S_k$ is the unique point such that $\tilde R_k=\mathrm{Sub}(H_k,\theta)$ (that is, $\theta$ is the root of $\tilde R_k$), then removing $\tilde R_k$ from $H_k$ and replacing it by $\tau(\tilde R_k,V_k;U^k)_{U_\theta}$ yields precisely the tree $H_{k-1}:=\mathrm{cut}(T;V_1,\dots,V_{k-1})$. Also, the distribution of $(\tilde R_k,U_\theta,V_k,U^k)$ is correct, since, conditionally on its vertex set, $\tilde R_k$ is distributed as $\pi_{|v(\tilde R_k)}$ (Lemma 4.5 i). Note that this transformation does not modify the distribution of $U^{1,k-1}$. By the induction hypothesis, $T=\tau(H_{k-1},V_1,\dots,V_{k-1};U^{1,k-1})_{\hat r}$. Since, conditionally on $S_{k-1}=\mathrm{Span}(H_k;V_1,\dots,V_{k-1})$, we have $V_k\in S_{k-1}$ with probability $p(S_{k-1})$, the proof is complete.

For $n$ a natural number, we may also easily apply the previous procedure until all $n$ nodes have been chosen. In this case, the cutting procedure continues recursively in all the connected components. The number of cuts is now completely irrelevant (it is a.s. equal to $n$), and we define the forward transform as follows. Let $T$ be a $p$-tree and let $(X_i,\ i\ge 1)$ be a sequence of elements of $[n]$ such that $X_i$ is sampled according to the restriction of $p$ to $[n]\setminus\{X_1,\dots,X_{i-1}\}$. Let $\Gamma_i=T\setminus\{X_1,\dots,X_i\}$; we stop precisely at time $n$, when $\{X_1,\dots,X_n\}=[n]$ and $\Gamma_n=\emptyset$. For every $k\in[n]$, define $T_i^{\langle k\rangle}$ as the connected component of $\Gamma_i$ which contains the vertex $k$, or $T_i^{\langle k\rangle}=\emptyset$ if $k\in\{X_1,\dots,X_i\}$. For each $i=1,\dots,n$, let $\mathcal{U}_i$ denote the set of neighbors of $X_i$ in $\Gamma_{i-1}$. Then we can write
\[ \mathcal{U}_i = \{U_i^{\langle k\rangle} : 1\le k\le n \text{ such that } T_{i-1}^{\langle k\rangle}\ni X_i\}, \]
where $U_i^{\langle k\rangle}$ is the unique element of $\mathcal{U}_i$ which lies in $T_i^{\langle k\rangle}$. The cuts which affect the connected component containing $k$ are
\[ E^{\langle k\rangle} := \{x\in[n] : \exists\, i\ge 1,\ X_i=x\in T_{i-1}^{\langle k\rangle}\}. \]
We claim that there exists a tree $G$ such that, for every $k\in[n]$, the path $\llbracket X_1,k\rrbracket$ in $G$ is precisely made of the nodes in $E^{\langle k\rangle}$, in the order in which they appear in the permutation $(X_1,X_2,\dots,X_n)$. In the following, we write $\mathrm{cut}(T):=G$. The following proposition justifies the claim.

Proposition 4.9. Let $T$ be a $p$-tree, and let $V_k$, $k\ge 1$, be i.i.d. $p$-nodes, independent of $T$. Then, as $k\to\infty$,
\[ \mathrm{cut}(T,V_1,\dots,V_k) \xrightarrow{d} \mathrm{cut}(T). \]

Proof. We rely on the coupling we introduced in Section 4.2. Since, for $k\ge 1$, we have $V_1,\dots,V_k\in S_k$ and $S_k\subseteq S_{k+1}$, the tree $S_k$ converges almost surely to a tree on $[n]$, so that $\lim_{k\to\infty}\mathrm{cut}(T;V_1,\dots,V_k)$ indeed exists with probability one. In particular, although $\mathrm{cut}(T;V_1,\dots,V_k)$ certainly depends on $V_1,\dots,V_k$, the limit only depends on the sequence $(X_i,\ i\ge 1)$. Indeed, $K:=\inf\{k\ge 1 : [n]=\{V_1,\dots,V_k\}\}$ is a.s. finite, and for every $k\ge K$, one has $\mathrm{cut}(T;V_1,\dots,V_k)=\mathrm{cut}(T;X_1,\dots,X_n)$. We then write $\mathrm{cut}(T):=\mathrm{cut}(T;X_1,\dots,X_n)$.

Theorem 4.10 (Cut tree). Let $T$ be a $p$-tree on $[n]$. Then, we have $\mathrm{cut}(T)\sim\pi$.

Proof. In the coupling defined in Section 4.2, we have $S_k=\mathrm{Span}(\mathrm{cut}(T);V_1,V_2,\dots,V_k)\to\mathrm{cut}(T)$ almost surely as $k\to\infty$. However, by Corollary 4.6, $S_k$ is distributed like $\mathrm{Span}(T;V_1,\dots,V_k)$, so that $S_k\to T$ in distribution, as $k\to\infty$, which completes the proof.

SHUFFLING TREES AND THE REVERSE TRANSFORMATION.
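The construction of $G=\mathrm{cut}(T)$ above is easy to experiment with numerically: the parent of each cut $X_i$ in $G$ is simply the previous cut that fell in $X_i$'s component. The sketch below is purely illustrative and uses our own ad-hoc names (nothing here comes from the paper's own code); it samples the permutation $(X_1,\dots,X_n)$ with the correct conditional $p$-weights, records $E^{\langle k\rangle}$ for every vertex $k$, and checks the claim that $E^{\langle k\rangle}$, in cut order, is the path from $X_1$ to $k$ in $G$.

```python
import random

def sample_without_replacement(p, rng):
    """X_1, X_2, ...: X_i distributed as p restricted to the unchosen vertices."""
    rest, order = dict(p), []
    while rest:
        r = rng.random() * sum(rest.values())
        for v, w in rest.items():
            r -= w
            if r <= 0:
                break
        order.append(v)
        del rest[v]
    return order

def cut_tree(n, edges, p, rng):
    """Build G = cut(t): parent[X_i] is the previous cut in X_i's component
    (None for the root X_1); hist[k] is E^<k>, the cuts affecting the
    component of k, in cut order."""
    adj = {v: set() for v in range(1, n + 1)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    alive = set(range(1, n + 1))
    last = {v: None for v in alive}   # last cut affecting v's component
    hist = {v: [] for v in alive}
    parent = {}
    for x in sample_without_replacement(p, rng):
        comp, stack = {x}, [x]        # component of x in Gamma_{i-1}
        while stack:
            for y in adj[stack.pop()]:
                if y in alive and y not in comp:
                    comp.add(y)
                    stack.append(y)
        parent[x] = last[x]
        for v in comp:
            last[v] = x
            hist[v].append(x)
        alive.remove(x)
    return parent, hist

# Check the claim: the path from X_1 to k in G is E^<k> in cut order.
rng = random.Random(7)
n, edges = 6, [(1, 2), (1, 3), (3, 4), (3, 5), (5, 6)]
parent, hist = cut_tree(n, edges, {v: 1 / n for v in range(1, n + 1)}, rng)
for k in range(1, n + 1):
    path = [k]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    assert path[::-1] == hist[k]
```

The check works because components only split over time, so two vertices lying in the same component have seen exactly the same cuts; in particular the parent chain of $k$ in $G$ retraces $E^{\langle k\rangle}$ backwards.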
Given a tree $g\in\mathbb{T}_n$ that we know is $\mathrm{cut}(t)$ for some tree $t\in\mathbb{T}_n$, together with the collection of sets $\mathcal{U}_x$, $x\in[n]$, we cannot recover the initial tree $t$ exactly, for the information about the root has been lost. However, the structure of $t$ as an unrooted tree is easily (in this case, trivially) recovered by connecting every node $x$ to all the nodes in $\mathcal{U}_x$. We now define the reverse operation, which samples the sets $\mathcal{U}_x$ with the correct distribution conditional on $g$, and produces a tree $\tilde T$ distributed as $T$ conditionally on $\mathrm{cut}(T)=g$.

Consider a tree $g\in\mathbb{T}_n$, rooted at $r\in[n]$. For each edge $\langle x,w\rangle$ of the tree $g$, let $U_w$ be a random element sampled according to the restriction of $p$ to $\mathrm{Sub}(g,w)$. Let $U\in\mathcal{U}(g):=\mathcal{U}(g,1,2,\dots,n)$ be the vector of the $U_w$, sorted using the canonical order on $g$ with distinguished nodes $1,2,\dots,n$. Let $\tau(g,[n];U)$ denote the graph on $[n]$ whose edges are $\{x,U_w\}$, for $\langle x,w\rangle$ edges of $g$. Then $\tau(g,[n];U)$ is a tree (Lemma 4.7), and we write $\mathrm{shuff}(g)$ for the random re-rooting of $\tau(g,[n];U)$ at an independent $p$-node.

Proposition 4.11. Let $G$ be a $p$-tree, and $(V_k,\ k\ge 1)$ a sequence of i.i.d. $p$-nodes. Then, as $k\to\infty$,
\[ \mathrm{shuff}(G;V_1,\dots,V_k) \xrightarrow{d} \mathrm{shuff}(G). \]

Proof. We prove the claim using a coupling which we build using the random variables $U_w$, $w\ne r$. For $k\ge 1$, we let $U_k$ be the subset of $U$ containing the $U_w$ for which $w\in\mathrm{Span}^*(G;V_1,\dots,V_k)$, in the canonical order on $\mathrm{Span}^*(G;V_1,\dots,V_k)$. Then, for $k\ge 1$, $U_k\in\mathcal{U}(G,V_1,\dots,V_k)$, and since $\mathrm{Span}(G;V_1,\dots,V_k)$ increases to $G$, the number of edges of $\tau(G;V_1,\dots,V_k;U_k)$ which are constrained by the choices in $U_k$ increases until they are all constrained. It follows that $\tau(G;V_1,\dots,V_k;U_k)\to\tau(G;1,2,\dots,n;U)$ almost surely, as $k\to\infty$.
Re-rooting all the trees at the same random $p$-node proves the claim.

We can now state the duality for the complete cutting procedure. It follows readily from the distributional identity in Proposition 4.8,
\[ (T,\ \mathrm{cut}(T,V_1,\dots,V_k)) \overset{d}{=} (\mathrm{shuff}(T,V_1,\dots,V_k),\ T), \]
and the fact that $\mathrm{cut}(T;V_1,\dots,V_k)\to\mathrm{cut}(T)$ and $\mathrm{shuff}(T;V_1,\dots,V_k)\to\mathrm{shuff}(T)$ in distribution as $k\to\infty$ (Propositions 4.9 and 4.11).

Proposition 4.12 (Cutting duality). Let $T$ be a $p$-tree. Then, we have the following duality in distribution:
\[ (T,\ \mathrm{cut}(T)) \overset{d}{=} (\mathrm{shuff}(T),\ T). \]
In particular, $\mathrm{shuff}(T)\sim\pi$.

From now on, we fix some $\theta=(\theta_0,\theta_1,\theta_2,\dots)\in\Theta$. We denote by $I=\{i\ge 1 : \theta_i>0\}$ the index set of those $\theta_i$ with nonzero values. Let $\mathcal{T}$ be the real tree obtained from the Poisson point process construction in Section 2.5. We denote by $\mu$ and $\ell$ its respective mass and length measures. Recall the measure $\mathcal{L}$ defined by
\[ \mathcal{L}(dx) = \theta_0\,\ell(dx) + \sum_{i\in I}\theta_i\,\delta_{\beta_i}(dx), \]
where $\beta_i$ is the branch point of local time $\theta_i$, for $i\in I$. The hypotheses on $\theta$ entail that $\mathcal{L}$ has infinite total mass. On the other hand, we have:

Lemma 5.1. Almost surely, $\mathcal{L}$ is a $\sigma$-finite measure concentrated on the skeleton of $\mathcal{T}$. More precisely, if $(V_i,\ i\ge 1)$ is a sequence of independent points sampled according to $\mu$, then for each $k\ge 1$ we have, $P$-almost surely,
\[ \mathcal{L}(\mathrm{Span}(\mathcal{T};V_1,V_2,\dots,V_k)) < \infty. \]

Proof. We consider first the case $k=1$. Recall the Poisson processes $(\mathcal{P}_j,\ j\ge 1)$ of Section 2.5 and the notation there. We have seen that $\mathrm{Span}(\mathcal{T};V_1)$ and $R_1$ have the same distribution. Then we have
\[ \mathcal{L}(\mathrm{Span}(\mathcal{T};V_1)) \overset{d}{=} \theta_0\eta + \sum_{i\ge 1}\theta_i\,\delta_{\xi_{i,1}}([0,\eta]). \]
By construction, $\eta$ is either $\xi_{j,2}$ for some $j\ge 1$, or $u_1$. This entails that on the event $\{\eta\in\mathcal{P}_j\}$, we have $\eta<\xi_{i,2}$ for all $i\in\mathbb{N}\setminus\{j\}$. Then,
\[ E\Big[\sum_{i\ge 1}\theta_i\,\delta_{\xi_{i,1}}([0,\eta])\Big] = \sum_{j\ge 1} E\Big[\sum_{i\ge 1}\theta_i\,\mathbf{1}_{\{\xi_{i,1}\le\eta\}}\mathbf{1}_{\{\eta=\xi_{j,2}\}}\Big] + E\Big[\sum_{i\ge 1}\theta_i\,\mathbf{1}_{\{\xi_{i,1}<\eta\}}\mathbf{1}_{\{\eta=u_1\}}\Big]. \]
Note that the event $\{\xi_{j,1}\le\eta\}\cap\{\eta=\xi_{j,2}\}$ always occurs. By breaking the first sum on $i$ into $\theta_j+\sum_{i\ne j}\theta_i\mathbf{1}_{\{\xi_{i,1}<\eta<\xi_{i,2}\}}$ and re-summing over $j$, we obtain
\begin{align*}
E\Big[\sum_{i\ge 1}\theta_i\,\delta_{\xi_{i,1}}([0,\eta])\Big]
&= \sum_{j}\theta_j P(\eta\in\mathcal{P}_j) + \sum_{j} E\Big[\sum_{i\ge 1,\ i\ne j}\theta_i\,\mathbf{1}_{\{\xi_{i,1}<\eta<\xi_{i,2}\}}\mathbf{1}_{\{\eta\in\mathcal{P}_j\}}\Big]\\
&= \sum_{j}\theta_j P(\eta\in\mathcal{P}_j) + \sum_{j}\sum_{i\ne j} E\big[\theta_i^2\,\eta\,e^{-\theta_i\eta}\,\mathbf{1}_{\{\eta\in\mathcal{P}_j\}}\big]
\ \le\ \theta_1 + \sum_{i\ge 1}\theta_i^2\cdot E[\eta],
\end{align*}
where we have used the independence of $(\mathcal{P}_j,\ j\ge 1)$ in the second equality, and, for the last bound, the fact that the events $\{\eta\in\mathcal{P}_j\}$, $j\ge 1$, are disjoint. The distribution of $\eta$ is given by (2.5). If $\theta_0>0$, we have $P(\eta>r)\le\exp(-\theta_0^2 r^2/2)$; otherwise, we have $P(\eta>r)\le(1+\theta_1 r)e^{-\theta_1 r}$. In either case, we are able to show that $E[\eta]<\infty$. Therefore,
\[ E[\mathcal{L}(\mathrm{Span}(\mathcal{T};V_1))] = \theta_0 E[\eta] + E\Big[\sum_{i\ge 1}\theta_i\,\delta_{\xi_{i,1}}([0,\eta])\Big] < \infty. \]
In general, the variables $V_1,V_2,\dots,V_k$ are exchangeable; therefore,
\[ E[\mathcal{L}(\mathrm{Span}(\mathcal{T};V_1,V_2,\dots,V_k))] \le k\,E[\mathcal{L}(\mathrm{Span}(\mathcal{T};V_1))] < \infty, \]
which proves that $\mathcal{L}$ is almost surely finite on the trees spanning finitely many random leaves. Finally, with probability one, $(V_i,\ i\ge 1)$ is dense in $\mathcal{T}$, thus $\mathrm{Sk}(\mathcal{T})=\cup_{i\ge 1}\llbracket r(\mathcal{T}),V_i\llbracket$ (see for example [5, Lemma 5]). This concludes the proof.

We recall the Poisson point process $\mathcal{P}$ of intensity measure $dt\otimes\mathcal{L}(dx)$, whose points we have used to define both the one-node-isolation procedure and the complete cutting procedure. As a direct consequence of Lemma 5.1, $\mathcal{P}$ almost surely has finitely many atoms in $[0,t]\times\mathrm{Span}(\mathcal{T};V_1,V_2,\dots,V_k)$, for all $t>0$ and $k\ge 1$.
This fact will be implicitly used in the sequel.

5.1 An overview of the proof

Recall the hypothesis (H) on the sequence of probability measures $(p_n,\ n\ge 1)$:
\[ \sigma_n = \Big(\sum_{i=1}^n p_{ni}^2\Big)^{1/2} \xrightarrow{n\to\infty} 0, \quad\text{and}\quad \lim_{n\to\infty}\frac{p_{ni}}{\sigma_n}=\theta_i, \text{ for every } i\ge 1. \tag{H} \]
Recall the notation $T_n$ for a $p_n$-tree, which, from now on, we consider as a measured metric space, equipped with the graph distance and the probability measure $p_n$. Camarri and Pitman [21] have proved that, under hypothesis (H),
\[ (\sigma_n T_n,\ p_n) \xrightarrow[n\to\infty]{d,\,\mathrm{GP}} (\mathcal{T},\mu). \tag{5.1} \]
This is equivalent to the convergence of the reduced subtrees: for each $n\ge 1$ and $k\ge 1$, write $R_k^n=\mathrm{Span}(T_n;\xi_1^n,\dots,\xi_k^n)$ for the subtree of $T_n$ spanning the points $\{\xi_1^n,\dots,\xi_k^n\}$, which are $k$ random points sampled independently with distribution $p_n$. Similarly, let $R_k=\mathrm{Span}(\mathcal{T};\xi_1,\dots,\xi_k)$ be the subtree of $\mathcal{T}$ spanning the points $\{\xi_1,\dots,\xi_k\}$, where $(\xi_i,\ i\ge 1)$ is an i.i.d. sequence of common law $\mu$. Then (5.1) holds if and only if, for each $k\ge 1$,
\[ \sigma_n R_k^n \xrightarrow[n\to\infty]{d,\,\mathrm{GH}} R_k. \tag{5.2} \]
However, even if the trees converge, one expects that, for the cut trees to converge, one at least needs the measures which are used to sample the cuts to converge in a reasonable sense. Observe that $\mathcal{L}$ has an atomic part, which, as we shall see, is the scaling limit of the large $p_n$-weights. Recall that $p_n$ is sorted: $p_{n1}\ge p_{n2}\ge\cdots\ge p_{nn}$. For each $m\ge 1$, we denote by $B_m^n=(1,2,\dots,m)$ the vector of the $m$ $p_n$-heaviest points of $T_n$, which is well defined at least for $n\ge m$. Recall that, for $i\ge 1$, $\beta_i$ denotes the branch point in $\mathcal{T}$ of local time $\theta_i$, and write $B_m=(\beta_1,\beta_2,\dots,\beta_m)$. Then Camarri and Pitman [21] also proved that
\[ (\sigma_n T_n,\ p_n,\ B_m^n) \xrightarrow[n\to\infty]{d} (\mathcal{T},\ \mu,\ B_m) \tag{5.3} \]
with respect to the $m$-pointed Gromov–Prokhorov topology, which will allow us to prove the following convergence of the cut-measures. Let
\[ \mathcal{L}_n = \sum_{i\in[n]}\frac{p_{ni}}{\sigma_n}\,\delta_i = \sigma_n^{-1}p_n. \tag{5.4} \]
Recall the notation $m\!\restriction\!A$ for the (non-rescaled) restriction of a measure $m$ to a subset $A$.

Proposition 5.2. Under hypothesis (H), we have
\[ (\sigma_n R_k^n,\ \mathcal{L}_n\!\restriction\!R_k^n) \xrightarrow[n\to\infty]{d} (R_k,\ \mathcal{L}\!\restriction\!R_k), \quad\forall\, k\ge 1, \tag{5.5} \]
with respect to the Gromov–Hausdorff–Prokhorov topology.

The proof uses the techniques developed in [10, 21] and is postponed until Section 7. We prove in the following subsections that the convergence in Proposition 5.2 is sufficient to entail convergence of the cut trees. To be more precise, we denote by $V^n$ a $p_n$-node independent of the $p_n$-tree $T_n$, and recall that, in the construction of $H_n:=\mathrm{cut}(T_n,V^n)$, the node $V^n$ ends up at the extremity of the path upon which we graft the discarded subtrees. Recall from the construction of $\mathcal{H}:=\mathrm{cut}(\mathcal{T},V)$ in Section 3 that there is a point $U$, which is at distance $L_\infty$ from the root. In Section 5.2, we prove Theorem 3.1, that is: if (H) holds, then
\[ (\sigma_n H_n,\ p_n,\ V^n) \xrightarrow[n\to\infty]{d,\,\mathrm{GP}} (\mathcal{H},\ \hat\mu,\ U), \tag{5.6} \]
jointly with the convergence in (5.5). From there, the proof of Theorem 3.2 is relatively short, and we provide it immediately (taking Theorem 3.1, or equivalently (5.6), for granted).

Proof of Theorem 3.2. For each $n\ge 1$, let $(\xi_i^n)_{i\ge 1}$ be a sequence of i.i.d. points of common law $p_n$, and let $\xi_0^n=V^n$. Let $(\xi_i)_{i\ge 1}$ be a sequence of i.i.d. points of common law $\hat\mu$, and let $\xi_0=U$. We let
\[ \rho_n=\big(\sigma_n d_{H_n}(\xi_i^n,\xi_j^n)\big)_{i,j\ge 0} \quad\text{and}\quad \rho_n^*=\big(\sigma_n d_{H_n}(\xi_i^n,\xi_j^n)\big)_{i,j\ge 1} \]
be the distance matrices in $\sigma_n H_n=\sigma_n\,\mathrm{cut}(T_n,V^n)$ induced by the sequences $(\xi_i^n)_{i\ge 0}$ and $(\xi_i^n)_{i\ge 1}$, respectively. According to Lemma 4.1, the distribution of $\xi_0^n=V^n$ is $p_n$; therefore $\rho_n$ is distributed as $\rho_n^*$. Writing similarly $\rho=(d_{\mathcal{H}}(\xi_i,\xi_j))_{i,j\ge 0}$ and $\rho^*=(d_{\mathcal{H}}(\xi_i,\xi_j))_{i,j\ge 1}$, where $d_{\mathcal{H}}$ denotes the distance of $\mathcal{H}=\mathrm{cut}(\mathcal{T},V)$, (5.6) entails that $\rho_n\to\rho$ in the sense of finite-dimensional distributions.
Combined with the previous argument, we deduce that $\rho$ and $\rho^*$ have the same distribution. However, $\rho^*$ is the distance matrix of an i.i.d. sequence of law $\hat\mu$ on $\mathcal{H}$, and the distribution of $\rho$ determines that of $U$. As a consequence, the law of $U$ is $\hat\mu$. For the unconditional distribution of $(\mathcal{H},\hat\mu)$, it suffices to apply the second part of Lemma 4.1, which says that $(H_n,p_n)$ is distributed like $(T_n,p_n)$. Then comparing (5.6) with (5.1) shows that the unconditional distribution of $(\mathcal{H},\hat\mu)$ is that of $(\mathcal{T},\mu)$.

In order to prove the similar statement for the sequence of complete cut trees $G_n=\mathrm{cut}(T_n)$, that is Theorem 3.4, the construction of the limit metric space $\mathcal{G}=\mathrm{cut}(\mathcal{T})$ first needs to be justified by resorting to Aldous' theory of continuum random trees [5]. The first step consists in proving that the backbones of $\mathrm{cut}(T_n)$ converge. For each $n\ge 1$, let $(V_i^n,\ i\ge 1)$ be a sequence of i.i.d. points of law $p_n$. Recall that we defined $\mathrm{cut}(\mathcal{T})$ using an increasing family $(S_k)_{k\ge 1}$, defined in (3.4). We show in Section 5.3 that:

Lemma 5.3. Suppose that (H) holds. Then, for each $k\ge 1$, we have
\[ \sigma_n\,\mathrm{Span}(\mathrm{cut}(T_n);V_1^n,\dots,V_k^n) \xrightarrow[n\to\infty]{d,\,\mathrm{GH}} S_k, \tag{5.7} \]
jointly with the convergence in (5.5).

Combining this with the identities for the discrete trees in Section 4, we can now prove Theorems 3.4 and 3.5.

Proof of Theorem 3.4. By Theorem 4.10, $(\mathrm{cut}(T_n),p_n)$ and $(T_n,p_n)$ have the same distribution for each $n\ge 1$. Recall the notation $R_k^n$ for the subtree of $T_n$ spanning $k$ i.i.d. $p_n$-points. Then, for each $k\ge 1$, we have
\[ S_k^n := \mathrm{Span}(\mathrm{cut}(T_n),V_1^n,\dots,V_k^n) \overset{d}{=} R_k^n. \]
Now comparing (5.7) with (5.2), we deduce immediately that, for each $k\ge 1$, $S_k\overset{d}{=}R_k$. In particular, the family $(S_k)_{k\ge 1}$ is consistent and leaf-tight in the sense of Aldous [5]. This even holds true almost surely conditionally on $\mathcal{T}$. According to Theorem 3 and Lemma 9 of [5], this entails that, conditionally on $\mathrm{cut}(\mathcal{T})$, the empirical measure $\frac{1}{k}\sum_{i=1}^k\delta_{U_i}$ converges weakly to some probability measure $\nu$ on $\mathrm{cut}(\mathcal{T})$ such that $(U_i,\ i\ge 1)$ has the distribution of a sequence of i.i.d. $\nu$-points. This proves the existence of $\nu$. Moreover,
\[ S_k \overset{d}{=} \mathrm{Span}(\mathrm{cut}(\mathcal{T}),\xi_1,\dots,\xi_k), \]
where $(\xi_i,\ i\ge 1)$ is an i.i.d. $\nu$-sequence. Therefore, (5.7) entails that $(\sigma_n\,\mathrm{cut}(T_n),\ p_n)\to(\mathrm{cut}(\mathcal{T}),\ \nu)$ in distribution with respect to the Gromov–Prokhorov topology.

Proof of Theorem 3.5. According to Theorem 3 of [5], the distribution of $(\mathrm{cut}(\mathcal{T}),\nu)$ is characterized by the family $(S_k)_{k\ge 1}$. Since $S_k$ and $R_k$ have the same distribution for $k\ge 1$, it follows that $(\mathrm{cut}(\mathcal{T}),\nu)$ is distributed like $(\mathcal{T},\mu)$.

5.2 Convergence of the cut-trees $\mathrm{cut}(T_n,V^n)$: Proof of Theorem 3.1

In this part we prove Theorem 3.1, taking Proposition 5.2 for granted. Let us first reformulate (5.6) in terms of distance matrices, which is what we actually show in the following. For each $n\in\mathbb{N}$, let $(\xi_i^n,\ i\ge 1)$ be a sequence of random points of $T_n$ sampled independently according to the mass measure $p_n$. We set $\xi_0^n=V^n$ and let $\xi_{-1}^n$ be the root of $H_n=\mathrm{cut}(T_n,V^n)$. Similarly, let $(\xi_i,\ i\ge 1)$ be a sequence of i.i.d. $\mu$-points and let $\xi_0=V$. Recall that the mass measure $\hat\mu$ of $\mathcal{H}=\mathrm{cut}(\mathcal{T},V)$ is defined to be the push-forward of $\mu$ by the canonical injection $\phi$. We set $\hat\xi_i=\phi(\xi_i)$ for $i\ge 1$, $\hat\xi_0=U$, and $\hat\xi_{-1}$ to be the root of $\mathcal{H}$. Then the convergence in (5.6) is equivalent to the following:
\[ \big(\sigma_n d_{H_n}(\xi_i^n,\xi_j^n),\ -1\le i<j<\infty\big) \xrightarrow[n\to\infty]{d} \big(d_{\mathcal{H}}(\hat\xi_i,\hat\xi_j),\ -1\le i<j<\infty\big), \tag{5.8} \]
jointly with
\[ \big(\sigma_n d_{T_n}(\xi_i^n,\xi_j^n),\ 0\le i<j<\infty\big) \xrightarrow[n\to\infty]{d} \big(d_{\mathcal{T}}(\xi_i,\xi_j),\ 0\le i<j<\infty\big), \tag{5.9} \]
in the sense of finite-dimensional distributions. Notice that (5.9) is a direct consequence of (5.1).
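All the rescalings in (5.8) and (5.9) are by the factor $\sigma_n$ from hypothesis (H). As a quick numerical illustration of the regimes that (H) allows (a sketch of our own, not taken from the paper), the following computes $\sigma_n$ for a uniform weight sequence (all $\theta_i=0$, the Brownian regime), for a sequence with one weight of order $n^{-1/2}$ (which produces $\theta_1>0$, hence an atom of $\mathcal{L}$), and for a sequence with a macroscopic weight, for which (H) fails.

```python
import math

def sigma(p):
    """sigma_n = (sum_i p_ni^2)^(1/2), the scaling factor of hypothesis (H)."""
    return math.sqrt(sum(x * x for x in p))

n = 10_000

# Uniform weights: sigma_n = n^(-1/2) -> 0 and p_ni / sigma_n -> 0,
# i.e. theta_i = 0 for all i >= 1 (the Brownian CRT regime).
uniform = [1.0 / n] * n
assert abs(sigma(uniform) - n ** -0.5) < 1e-12

# One weight of order n^(-1/2): sigma_n -> 0 but p_n1 / sigma_n -> 1/sqrt(2),
# so theta_1 = 1/sqrt(2) > 0 in the limit.
mixed = [n ** -0.5] + [(1 - n ** -0.5) / (n - 1)] * (n - 1)
assert sigma(mixed) < 0.02
assert abs(mixed[0] / sigma(mixed) - 1 / math.sqrt(2)) < 1e-2

# A macroscopic weight: sigma_n -> 1/2 does not vanish, so (H) fails.
heavy = [0.5] + [0.5 / (n - 1)] * (n - 1)
assert abs(sigma(heavy) - 0.5) < 1e-3
```

The three cases mirror the role of $\mathcal{L}$ in (5.4): only weights of the same order as $\sigma_n$ survive in the limit as atoms $\theta_i\delta_{\beta_i}$.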
In order to express the terms in (5.8) with functionals of the cutting process, we introduce the following notation. For $n\in\mathbb{N}$, let $\mathcal{P}_n$ be a Poisson point process on $\mathbb{R}_+\times T_n$ with intensity measure $dt\otimes\mathcal{L}_n$, where $\mathcal{L}_n=p_n/\sigma_n$. For $u,v\in T_n$, recall that $\llbracket u,v\rrbracket$ denotes the path between $u$ and $v$. For $t\ge 0$, we denote by $T_t^n$ the set of nodes still connected to $V^n$ at time $t$:
\[ T_t^n := \{x\in T_n : [0,t]\times\llbracket V^n,x\rrbracket\cap\mathcal{P}_n=\emptyset\}. \]
Recall that the remaining part of $\mathcal{T}$ at time $t$ is $\mathcal{T}_t=\{x\in\mathcal{T} : [0,t]\times\llbracket V,x\rrbracket\cap\mathcal{P}=\emptyset\}$. We then define
\[ L_t^n := \mathrm{Card}\big\{s\le t : p_n(T_s^n)<p_n(T_{s-}^n)\big\} \overset{\mathrm{a.s.}}{=} \mathrm{Card}\big\{(s,x)\in\mathcal{P}_n : s\le t,\ x\in T_{s-}^n\big\}. \tag{5.10} \]
This is the number of cuts that affect the connected component containing $V^n$ before time $t$. In particular, $L_\infty^n:=\lim_{t\to\infty}L_t^n$ has the same distribution as $L(T_n)$ in the notation of Section 4. Indeed, this follows from the coupling on page 17 and the fact that, if $\mathcal{P}_n=\{(t_i,x_i) : i\ge 1\}$ with $t_1\le t_2\le\cdots$, then $(x_i)_{i\ge 1}$ is an i.i.d. $p_n$-sequence. Let us recall that $L_t$, the continuous analogue of $L_t^n$, is defined by $L_t=\int_0^t\mu(\mathcal{T}_s)\,ds$ in Section 3. For $n\in\mathbb{N}$ and $x\in T_n$, we define the pair $(\tau_n(x),\varsigma_n(x))$ to be the element of $\mathcal{P}_n$ separating $x$ from $V^n$:
\[ \tau_n(x) := \inf\{t>0 : [0,t]\times\llbracket V^n,x\rrbracket\cap\mathcal{P}_n\ne\emptyset\}, \]
with the convention that $\inf\emptyset=\infty$. In words, $\varsigma_n(x)$ is the first cut that appeared on $\llbracket V^n,x\rrbracket$. For $x\in\mathcal{T}$, $(\tau(x),\varsigma(x))$ is defined similarly. We notice that almost surely $\tau(\xi_j)<\infty$ for each $j\ge 1$, since $\tau(\xi_j)$ is exponential with rate $\mathcal{L}(\llbracket V,\xi_j\rrbracket)$, which is positive almost surely.

Furthermore, it follows from our construction of $H_n=\mathrm{cut}(T_n,V^n)$ that, for $n\in\mathbb{N}$ and $j\ge 1$,
\begin{align*}
d_{H_n}(\xi_{-1}^n,\xi_0^n) &= L_\infty^n-1,\\
d_{H_n}(\xi_{-1}^n,\xi_j^n) &= L_{\tau_n(\xi_j^n)}^n-1+d_{T_n}\big(\xi_j^n,\varsigma_n(\xi_j^n)\big),\\
d_{H_n}(\xi_0^n,\xi_j^n) &= L_\infty^n-L_{\tau_n(\xi_j^n)}^n+d_{T_n}\big(\xi_j^n,\varsigma_n(\xi_j^n)\big),
\end{align*}
while, for $j\ge 1$,
\begin{align*}
d_{\mathcal{H}}(\hat\xi_{-1},\hat\xi_0) &= L_\infty,\\
d_{\mathcal{H}}(\hat\xi_{-1},\hat\xi_j) &= L_{\tau(\xi_j)}+d_{\mathcal{T}}\big(\xi_j,\varsigma(\xi_j)\big),\\
d_{\mathcal{H}}(\hat\xi_0,\hat\xi_j) &= L_\infty-L_{\tau(\xi_j)}+d_{\mathcal{T}}\big(\xi_j,\varsigma(\xi_j)\big).
\end{align*}
For $n\in\mathbb{N}$ and $i,j\ge 1$, if we define the event
\[ A_n(i,j) := \{\tau_n(\xi_i^n)=\tau_n(\xi_j^n)\} \overset{\mathrm{a.s.}}{=} \{\varsigma_n(\xi_i^n)=\varsigma_n(\xi_j^n)\}, \tag{5.11} \]
and $A_n^c(i,j)$ its complement, then on the event $A_n(i,j)$ we have $d_{H_n}(\xi_i^n,\xi_j^n)=d_{T_n}(\xi_i^n,\xi_j^n)$. Similarly, we define $A(i,j):=\{\tau(\xi_i)=\tau(\xi_j)\}$, and note that $A(i,j)=\{\varsigma(\xi_i)=\varsigma(\xi_j)\}$ almost surely. Recall that (5.1) implies that $\sigma_n d_{T_n}(\xi_i^n,\xi_j^n)\to d_{\mathcal{T}}(\xi_i,\xi_j)$. Now, on the event $A_n^c(i,j)$, we have
\[ d_{H_n}(\xi_i^n,\xi_j^n) = \big|L_{\tau_n(\xi_j^n)}^n-L_{\tau_n(\xi_i^n)}^n\big| + d_{T_n}\big(\xi_j^n,\varsigma_n(\xi_j^n)\big) + d_{T_n}\big(\xi_i^n,\varsigma_n(\xi_i^n)\big), \]
for $n\in\mathbb{N}$, and
\[ d_{\mathcal{H}}(\hat\xi_i,\hat\xi_j) = \big|L_{\tau(\xi_j)}-L_{\tau(\xi_i)}\big| + d_{\mathcal{T}}\big(\xi_j,\varsigma(\xi_j)\big) + d_{\mathcal{T}}\big(\xi_i,\varsigma(\xi_i)\big), \]
for the limit case. Therefore, in order to prove (5.8), it suffices to show the joint convergence of the vector
\[ \Big(\mathbf{1}_{A_n(i,j)},\ \tau_n(\xi_i^n),\ \sigma_n d_{T_n}\big(\xi_j^n,\varsigma_n(\xi_j^n)\big),\ \big(\sigma_n L_t^n,\ t\in\mathbb{R}_+\cup\{\infty\}\big)\Big) \]
to the corresponding quantities for $\mathcal{T}$, for each $i,j\ge 1$. We begin with a lemma.

Lemma 5.4.
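The discrete objects just introduced can be simulated directly: the atoms of $\mathcal{P}_n$ arrive at rate $1/\sigma_n$ with i.i.d. $p_n$-marks, and $L_t^n$ counts those marks that fall in the component still containing $V^n$. The following sketch is illustrative only (names and interfaces are ours, assumed for the example):

```python
import math
import random

def simulate_cut_clock(n, edges, p, V, rng):
    """Return the jump times of L^n_t: cuts of the Poisson process P_n that
    fall in the component still containing V. `p` lists the weights of the
    vertices 1..n; atoms arrive at rate 1/sigma_n with i.i.d. p-marks."""
    sigma = math.sqrt(sum(w * w for w in p))
    adj = {v: set() for v in range(1, n + 1)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    verts = list(range(1, n + 1))
    alive = set(verts)
    t, jumps = 0.0, []
    while V in alive:
        t += rng.expovariate(1.0 / sigma)       # next atom of P_n
        x = rng.choices(verts, weights=p)[0]    # its mark, of law p_n
        if x not in alive:
            continue                            # lands in a discarded part
        comp, stack = {V}, [V]                  # component of V at time t-
        while stack:
            for y in adj[stack.pop()]:
                if y in alive and y not in comp:
                    comp.add(y)
                    stack.append(y)
        if x in comp:                           # a cut counted by L^n_t
            jumps.append(t)
            alive = comp - {x}                  # keep V's side, drop x
        else:
            alive.discard(x)
    return jumps

rng = random.Random(2024)
jumps = simulate_cut_clock(5, [(1, 2), (2, 3), (3, 4), (4, 5)],
                           [0.2] * 5, V=3, rng=rng)
assert jumps == sorted(jumps) and 1 <= len(jumps) <= 5
```

In this encoding, $L^n_\infty$ is the total number of recorded jumps, and the analogue of $\tau_n(x)$ for a vertex $x$ would be the first jump whose cut disconnects $x$ from $V$.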
Under (H), we have the following joint convergences as $n\to\infty$:
\[ \big(p_n(T_t^n)\big)_{t\ge 0} \xrightarrow{d} \big(\mu(\mathcal{T}_t)\big)_{t\ge 0}, \tag{5.12} \]
in the Skorokhod $J_1$-topology, along with
\[ \big(\mathbf{1}_{A_n(i,j)},\ 1\le i,j\le k\big) \xrightarrow{d} \big(\mathbf{1}_{A(i,j)},\ 1\le i,j\le k\big), \tag{5.13} \]
\[ \big(\tau_n(\xi_j^n),\ 1\le j\le k\big) \xrightarrow{d} \big(\tau(\xi_j),\ 1\le j\le k\big), \quad\text{and} \tag{5.14} \]
\[ \big(\sigma_n d_{T_n}(\xi_j^n,\varsigma_n(\xi_j^n)),\ 1\le j\le k\big) \xrightarrow{d} \big(d_{\mathcal{T}}(\xi_j,\varsigma(\xi_j)),\ 1\le j\le k\big), \tag{5.15} \]
for each $k\ge 1$, and jointly with the convergence in (5.5).

Proof. Recall Proposition 5.2, which says that, for each $k\ge 1$,
\[ (\sigma_n R_k^n,\ \mathcal{L}_n\!\restriction\!R_k^n) \xrightarrow[n\to\infty]{d} (R_k,\ \mathcal{L}\!\restriction\!R_k), \]
in the Gromov–Hausdorff–Prokhorov topology. By the properties of the Poisson point process, this entails that, for $t\ge 0$,
\[ \big(\sigma_n R_k^n,\ \mathcal{P}_n\!\restriction\![0,t]\times R_k^n\big) \xrightarrow{d} \big(R_k,\ \mathcal{P}\!\restriction\![0,t]\times R_k\big), \tag{5.16} \]
in the Gromov–Hausdorff–Prokhorov topology, jointly with the convergence in (5.5). For each $n\in\mathbb{N}$, the pair $(\tau_n(\xi_i^n),\varsigma_n(\xi_i^n))$ corresponds to the first jump of the point process $\mathcal{P}_n$ restricted to $\llbracket V^n,\xi_i^n\rrbracket$. We notice that, for each pair $(i,j)$ such that $1\le i,j\le k$, the event $A_n(i,j)$ occurs if and only if $\tau_n(\xi_i^n\wedge\xi_j^n)\le\min\{\tau_n(\xi_i^n),\tau_n(\xi_j^n)\}$. Similarly, $(\tau(\xi_i),\varsigma(\xi_i))$ is the first point of $\mathcal{P}$ on $\mathbb{R}_+\times\llbracket V,\xi_i\rrbracket$, and $A(i,j)$ occurs if and only if $\tau(\xi_i\wedge\xi_j)\le\min\{\tau(\xi_i),\tau(\xi_j)\}$. Therefore, the joint convergences in (5.13), (5.14) and (5.15) follow from (5.16). On the other hand, we have $\{\xi_i^n\in T_t^n\}=\{t<\tau_n(\xi_i^n)\}$, for $t\ge 0$ and $n\ge 1$. For each fixed $t\ge 0$, this sequence of random variables converges to $\mathbf{1}_{\{t<\tau(\xi_i)\}}=\mathbf{1}_{\{\xi_i\in\mathcal{T}_t\}}$ by (5.16). By the law of large numbers, $k^{-1}\sum_{1\le i\le k}\mathbf{1}_{\{t<\tau_n(\xi_i^n)\}}\to p_n(T_t^n)$ almost surely as $k\to\infty$. Then we can find a sequence $k_n\to\infty$ slowly enough such that (see also [6, Section 2.3])
\[ \frac{1}{k_n}\sum_{i=1}^{k_n}\mathbf{1}_{\{t<\tau_n(\xi_i^n)\}} \xrightarrow{d} \mu(\mathcal{T}_t). \]
As $n\to\infty$,
\[ p_n(T_t^n) \xrightarrow{d} \mu(\mathcal{T}_t). \tag{5.17} \]
Using (5.17) for a sequence of times $(t_m,\ m\ge 1)$ dense in $\mathbb{R}_+$, and combining with the fact that $t\mapsto\mu(\mathcal{T}_t)$ is decreasing, we obtain the convergence in (5.12), jointly with (5.13), (5.14), (5.15) and (5.5).

Proposition 5.5. Under (H), we have
\[ \big(\sigma_n L_t^n,\ t\ge 0\big) \xrightarrow[n\to\infty]{d} \big(L_t,\ t\ge 0\big) \tag{5.18} \]
with respect to the uniform topology, and jointly with the convergences in (5.13), (5.14) and (5.15). In particular, this entails that $L_\infty<\infty$ almost surely. Moreover, we have
\[ L_\infty \overset{d}{=} d_{\mathcal{T}}(r(\mathcal{T}),V), \tag{5.19} \]
where $V$ is a random point of distribution $\mu$. The distribution of $d_{\mathcal{T}}(r(\mathcal{T}),V)$ is given in (2.5).

The above proposition is a consequence of the following lemmas.

Lemma 5.6. Jointly with (5.13), (5.14) and (5.15), we have, for any $m\ge 1$ and $(t_i,\ 1\le i\le m)\in\mathbb{R}_+^m$,
\[ \Big(\int_0^{t_i}p_n(T_s^n)\,ds,\ 1\le i\le m\Big) \xrightarrow[n\to\infty]{d} \Big(\int_0^{t_i}\mu(\mathcal{T}_s)\,ds,\ 1\le i\le m\Big). \]

Proof. This is a direct consequence of Lemma 5.4.

Lemma 5.7. If we let
\[ M_t^n := \sigma_n L_t^n - \int_0^t p_n(T_s^n)\,ds, \quad n\ge 1, \]
then, under the hypothesis that $\sigma_n\to 0$ as $n\to\infty$, the sequence of variables $(M_t^n,\ n\ge 1)$ converges to $0$ in $L^2$ as $n\to\infty$. Moreover, this convergence is uniform on compacts.

In particular, Lemmas 5.6 and 5.7 combined entail that, for any fixed $t\ge 0$, $\sigma_n L_t^n\to L_t$ in distribution. However, to obtain the convergence of $\sigma_n L_\infty^n$ to $L_\infty$ in distribution, we need the following tightness condition.

Lemma 5.8. Under (H), for every $\delta>0$,
\[ \lim_{t\to\infty}\limsup_{n\to\infty} P\big(\sigma_n(L_\infty^n-L_t^n)\ge\delta\big) = 0. \tag{5.20} \]

Proof of Lemma 5.7. Let $\mathrm{N}_t^n=\mathrm{Card}\{(s,x)\in\mathcal{P}_n : s\le t\}$ be the counting process of $\mathcal{P}_n$. Then $(\mathrm{N}_t^n,\ t\ge 0)$ is a Poisson process of rate $1/\sigma_n$. We write $d\mathrm{N}^n$ for the Stieltjes measure associated with $t\mapsto\mathrm{N}_t^n$. For $t\ge 0$, let
\[ \bar M_t^n := L_t^n - \int_{[0,t]}p_n(T_{s-}^n)\,d\mathrm{N}_s^n, \quad\text{and}\quad \bar N_t^n := \sigma_n\int_{[0,t]}p_n(T_{s-}^n)\,d\mathrm{N}_s^n - \int_0^t p_n(T_s^n)\,ds. \]
We notice that, by the definition of $L_t^n$,
\[ \bar M_t^n = \sum_{(s,x)\in\mathcal{P}_n:\ s\le t}\Big(\mathbf{1}_{\{x\in T_{s-}^n\}} - p_n(T_{s-}^n)\Big). \]
Since $\sigma_n^{-1}p_n=\mathcal{L}_n$, conditionally on $T_{s-}^n$, $\mathbf{1}_{\{x\in T_{s-}^n\}}$ is a Bernoulli random variable of mean $p_n(T_{s-}^n)$. Therefore, we have
\[ E\big[\bar M_t^n \mid (\mathrm{N}_s^n)_{s\le t}\big] = 0. \tag{5.21} \]
From this, we can readily show that $\bar M^n$ is a martingale. On the other hand, classical results on the Poisson process entail that $\bar N^n$ is also a martingale. Once combined, we see that $M^n=\sigma_n\bar M^n+\bar N^n$ is itself a martingale. Therefore, by Doob's maximal inequality for the $L^2$-norms of martingales, we obtain, for any $t\ge 0$,
\[ E\Big[\sup_{s\le t}(M_s^n)^2\Big] \le 4E\big[(M_t^n)^2\big] = 4E\big[(\sigma_n\bar M_t^n)^2\big] + 4E\big[(\bar N_t^n)^2\big], \]
as a result of (5.21). Direct computation shows that
\[ E\big[(\sigma_n\bar M_t^n)^2\big] = E\Big[\sigma_n\int_0^t\big(p_n(T_s^n)-p_n(T_s^n)^2\big)\,ds\Big], \quad\text{and}\quad E\big[(\bar N_t^n)^2\big] = E\Big[\sigma_n\int_0^t p_n(T_s^n)^2\,ds\Big]. \]
As a consequence, for any fixed $t$,
\[ E\Big[\sup_{s\le t}(M_s^n)^2\Big] \le 4\sigma_n\,E\Big[\int_0^t p_n(T_s^n)\,ds\Big] \le 4\sigma_n t \to 0, \quad\text{as } n\to\infty. \]

We need an additional argument to prove Lemma 5.8. For each $n\in\mathbb{N}$ and $s\ge 0$, let $\zeta_n(s):=\inf\{t>0 : L_t^n\ge\lfloor s\rfloor\}$ be the right-continuous inverse of $L_t^n$. Recall from the construction of $H_n=\mathrm{cut}(T_n,V^n)$ that there is a correspondence between the vertex set of the remaining tree at step $\ell-1$ and the subtree of $H_n$ at $X_\ell$. It then follows from Lemma 4.1 that
\[ \big(v(T_{\zeta_n(s)}^n),\ 0\le s<L_\infty^n\big) \overset{d}{=} \big(v(\mathrm{Sub}(T_n,x_s^n)),\ 0\le s<d_{T_n}(r(T_n),V^n)\big), \]
where $x_s^n$ is the point on the path $\llbracket r(T_n),V^n\rrbracket$ at distance $\lfloor s\rfloor$ from $r(T_n)$. In particular, this entails
\[ \big(p_n(T_{\zeta_n(s)}^n),\ 0\le s<L_\infty^n\big) \overset{d}{=} \big(p_n(\mathrm{Sub}(T_n,x_s^n)),\ 0\le s<d_{T_n}(r(T_n),V^n)\big). \tag{5.22} \]
The limit of the right-hand side is easily identified using the convergence of $p$-trees in (5.1). Combined with (5.22), this will allow us to prove Lemma 5.8 by a time-change argument.

Let $V$ be a random point of $\mathcal{T}$ of distribution $\mu$. For $0\le s\le d_{\mathcal{T}}(r(\mathcal{T}),V)$, let $x_s$ be the point in $\llbracket r(\mathcal{T}),V\rrbracket$ at distance $s$ from $r(\mathcal{T})$, or $x_s=V$ if $s>d_{\mathcal{T}}(r(\mathcal{T}),V)$. Similarly, we set $x_s^n=V^n$ if $s\ge d_{T_n}(r(T_n),V^n)$.

Lemma 5.9. Under (H), we have
\[ \Big(\sigma_n L_\infty^n,\ \big(p_n(T_{\zeta_n(s/\sigma_n)}^n)\big)_{s\ge 0}\Big) \xrightarrow[n\to\infty]{d} \Big(d_{\mathcal{T}}(r(\mathcal{T}),V),\ \big(\mu(\mathrm{Sub}(\mathcal{T},x_s))\big)_{s\ge 0}\Big), \]
where the convergence of the second coordinates is with respect to the Skorokhod $J_1$-topology.

Proof. Because of (5.22) and the fact that $\sigma_n\to 0$, it suffices to prove that
\[ \big(p_n(\mathrm{Sub}(T_n,x_{s/\sigma_n}^n)),\ s\ge 0\big) \xrightarrow[n\to\infty]{d} \big(\mu(\mathrm{Sub}(\mathcal{T},x_s)),\ s\ge 0\big), \]
with respect to the Skorokhod $J_1$-topology, jointly with $\sigma_n d_{T_n}(r(T_n),V^n)\to d_{\mathcal{T}}(r(\mathcal{T}),V)$ in distribution. Recall that $(\xi_i^n,\ i\ge 1)$ is a sequence of i.i.d. points of common law $p_n$, and set $\xi_0^n=V^n$ and $\xi_{-1}^n=r(T_n)$, for $n\in\mathbb{N}$. Note that $(\xi_i^n,\ i\ge 0)$ is still an i.i.d. sequence. Then it follows from (5.1) that
\[ \big(\sigma_n d_{T_n}(\xi_i^n,\xi_j^n),\ i,j\ge -1\big) \xrightarrow{d} \big(d_{\mathcal{T}}(\xi_i,\xi_j),\ i,j\ge -1\big) \]
in the sense of finite-dimensional distributions. Taking $i=-1$ and $j=0$, we get the convergence $\sigma_n d_{T_n}(V^n,r(T_n))\xrightarrow{d} d_{\mathcal{T}}(V,r(\mathcal{T}))$. On the other hand, for $i\ge 1$, $\xi_i^n\in\mathrm{Sub}(T_n,x_s^n)$ if and only if $d_{T_n}(\xi_i^n\wedge V^n,\ r(T_n))\ge s$.
Since, for any rooted tree $(T,d,r)$ and $u,v\in T$, we have $d(r,u\wedge v)=\tfrac{1}{2}\big(d(r,u)+d(r,v)-d(u,v)\big)$, we deduce that, for any $k,m\ge 1$ and $(s_j,\ 1\le j\le m)\in\mathbb{R}_+^m$,
\[ \Big(\mathbf{1}_{\{\xi_i^n\in\mathrm{Sub}(T_n,x_{s_j/\sigma_n}^n)\}},\ 1\le i\le k,\ 1\le j\le m\Big) \xrightarrow{d} \Big(\mathbf{1}_{\{\xi_i\in\mathrm{Sub}(\mathcal{T},x_{s_j})\}},\ 1\le i\le k,\ 1\le j\le m\Big), \]
jointly with $\sigma_n d_{T_n}(V^n,r(T_n))\xrightarrow{d} d_{\mathcal{T}}(V,r(\mathcal{T}))$. Then the argument used to establish (5.17) shows the convergence of $\big(p_n(\mathrm{Sub}(T_n,x_{s/\sigma_n}^n)),\ n\ge 1\big)$ in the sense of finite-dimensional distributions. The convergence in the Skorokhod topology follows from the monotonicity of the function $s\mapsto p_n(\mathrm{Sub}(T_n,x_s^n))$.

Proof of Lemma 5.8. Let us begin with a simple observation on the Skorokhod $J_1$-topology. Let $D^\uparrow$ be the set of those functions $x:\mathbb{R}_+\to[0,1]$ which are nondecreasing and càdlàg. We endow $D^\uparrow$ with the Skorokhod $J_1$-topology. Taking $\epsilon>0$ and $x\in D^\uparrow$, we denote by $\kappa_\epsilon(x)=\inf\{t>0 : x(t)>\epsilon\}$. The following is a well-known fact; a proof can be found in [35, Ch. VI, p. 304, Lemma 2.10].

FACT. If $x_n\to x$ in $D^\uparrow$ as $n\to\infty$ and $t\mapsto x(t)$ is strictly increasing, then $\kappa_\epsilon(x_n)\to\kappa_\epsilon(x)$ as $n\to\infty$.

If $x=(x(t),\ t\ge 0)$ is a process with càdlàg paths and $t_0\in\mathbb{R}_+$, we denote by $R_{t_0}[x]$ the reversed process of $x$ at $t_0$:
\[ R_{t_0}[x](t)=x\big((t_0-t)-\big) \text{ if } t<t_0, \quad\text{and}\quad R_{t_0}[x](t)=x(0) \text{ otherwise}. \]
For each $n\ge 1$, let $x_n(t)=p_n(T_{\zeta_n(t)}^n)$, $t\ge 0$, and denote by $\Lambda_n=R_{L_\infty^n}[x_n]$ the reversed process at $L_\infty^n$. Similarly, let $y(t)=\mu(\mathrm{Sub}(\mathcal{T},x_t))$, $t\ge 0$, and denote by $\Lambda=R_D[y]$ for $D=d_{\mathcal{T}}(V,r(\mathcal{T}))$. Then, almost surely, $\Lambda_n\in D^\uparrow$ for $n\in\mathbb{N}$, and $\Lambda\in D^\uparrow$. Moreover, Lemma 5.9 says that
\[ \big(\Lambda_n(t/\sigma_n),\ t\ge 0\big) \xrightarrow[n\to\infty]{d} \big(\Lambda(t),\ t\ge 0\big) \tag{5.23} \]
in $D^\uparrow$. From the construction of the ICRT in Section 2.5, it is not difficult to show that $t\mapsto\Lambda(t)$ is strictly increasing.
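For intuition, the first-passage functional $\kappa_\epsilon$ and the continuity property in the FACT can be checked on explicit step functions. The snippet below is a toy illustration with our own encoding of elements of $D^\uparrow$ as jump lists; it is not taken from the paper.

```python
def kappa(eps, jumps):
    """kappa_eps(x) = inf{t > 0 : x(t) > eps} for a nondecreasing cadlag
    step function x given as a list of (time, value) pairs, value being
    x(t) on [time, next time)."""
    for t, v in jumps:
        if v > eps:
            return t
    return float("inf")

# A strictly increasing limit x(t) ~ t sampled on a grid, and a small
# uniform perturbation x_n of it: the passage times stay close, as the
# FACT predicts for strictly increasing limits.
x = [(t / 100.0, t / 100.0) for t in range(101)]
xn = [(t / 100.0, 0.99 * t / 100.0) for t in range(101)]
assert abs(kappa(0.5, xn) - kappa(0.5, x)) <= 0.02
assert kappa(2.0, x) == float("inf")  # the level is never exceeded
```

Note that strict monotonicity of the limit matters: if $x$ were flat at level $\epsilon$, arbitrarily small perturbations could shift $\kappa_\epsilon$ by a macroscopic amount, which is exactly why the proof establishes that $\Lambda$ is strictly increasing.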
Then, by the above FACT, we have $\sigma_n\kappa_\varepsilon(\Lambda_n)\to\kappa_\varepsilon(\Lambda)$ in distribution, for each $\varepsilon>0$. In particular, we have, for any fixed $\delta>0$,
$$\lim_{\varepsilon\to 0}\limsup_{n\to\infty}\mathbb P\big(\sigma_n\kappa_\varepsilon(\Lambda_n)\ge\delta\big)\ \le\ \lim_{\varepsilon\to 0}\mathbb P\big(\kappa_\varepsilon(\Lambda)\ge\delta\big)=0,\qquad(5.24)$$
since almost surely $\Lambda(t)>0$ for any $t>0$.

By Lemma 5.4, the sequence $((p_n(T^n_t))_{t\ge 0},\,n\ge 1)$ is tight in the Skorokhod topology. Combined with the fact that, for each fixed $n$, $p_n(T^n_t)\searrow 0$ as $t\to\infty$ almost surely, this entails that for any fixed $\varepsilon>0$,
$$\lim_{t_0\to\infty}\limsup_{n\to\infty}\mathbb P\Big(\sup_{t\ge t_0}p_n(T^n_t)\ge\varepsilon\Big)=0.\qquad(5.25)$$
Now note that if $L^n_t=k\in\mathbb N$, then $T^n_t=T^n_{\zeta_n(k)}$ a.s., since no change occurs until the time of the next cut; in particular, we have $p_n(T^n_t)=p_n\big(T^n_{\zeta_n(L^n_t)}\big)$ a.s., from which we deduce that
$$\big\{p_n(T^n_t)<\varepsilon\big\}\subseteq\big\{\kappa_\varepsilon(\Lambda_n)\ge L^n_\infty-L^n_t\big\}\quad\text{a.s.},$$
$$\big\{\sigma_n\big(L^n_\infty-L^n_{t_0}\big)\ge\delta\big\}\cap\Big\{\sup_{t\ge t_0}p_n(T^n_t)<\varepsilon\Big\}\subseteq\big\{\sigma_n\kappa_\varepsilon(\Lambda_n)\ge\delta\big\}\quad\text{a.s.}$$
Therefore,
$$\limsup_{n\to\infty}\mathbb P\big(\sigma_n(L^n_\infty-L^n_{t_0})\ge\delta\big)\ \le\ \limsup_{n\to\infty}\mathbb P\Big(\sup_{t\ge t_0}p_n(T^n_t)\ge\varepsilon\Big)+\limsup_{n\to\infty}\mathbb P\Big(\sigma_n\big(L^n_\infty-L^n_{t_0}\big)\ge\delta\ \text{and}\ \sup_{t\ge t_0}p_n(T^n_t)<\varepsilon\Big)$$
$$\le\ \limsup_{n\to\infty}\mathbb P\Big(\sup_{t\ge t_0}p_n(T^n_t)\ge\varepsilon\Big)+\limsup_{n\to\infty}\mathbb P\big(\sigma_n\kappa_\varepsilon(\Lambda_n)\ge\delta\big).$$
In the above, if we first let $t_0\to\infty$ and then $\varepsilon\to 0$, we obtain (5.20) as a combined consequence of (5.24) and (5.25).

Proof of Proposition 5.5. We fix a sequence $(t_m,m\ge 1)$ which is dense in $\mathbb R_+$. Combining Lemmas 5.6 and 5.7, we obtain, for all $k\ge 1$,
$$\big(\sigma_n L^n_{t_m},\ 1\le m\le k\big)\ \xrightarrow[n\to\infty]{d}\ \big(L_{t_m},\ 1\le m\le k\big),\qquad(5.26)$$
jointly with the convergences in (5.13), (5.14), (5.15) and (5.5). We deduce from this and Lemma 5.8 that $L_\infty<\infty$ a.s.
and
$$\sigma_n L^n_\infty\ \xrightarrow[n\to\infty]{d}\ L_\infty,\qquad(5.27)$$
jointly with (5.13), (5.14), (5.15) and (5.5), by Theorem 4.2 of [17, Chapter 1]. Combined with the fact that $t\mapsto L_t$ is continuous and increasing, this entails the uniform convergence in (5.18). Finally, the distributional identity (5.19) is a direct consequence of Lemma 5.9.

Proof of Theorem 3.1. We have seen that $L_\infty<\infty$ almost surely. Therefore the cut tree $(\mathrm{cut}(\mathcal T,V),\hat\mu)$ is well defined almost surely. Comparing the expressions of $d_{H^n}(\xi^n_i,\xi^n_j)$ given at the beginning of this subsection with those of $d_{\mathcal H}(\xi_i,\xi_j)$, we obtain from Lemma 5.4 and Proposition 5.5 the convergence in (5.8). This concludes the proof.

Remark. Before concluding this section, let us say a few more words on the proof of Proposition 5.5. The convergence of $(\sigma_n L^n_t,t\ge 0)$ to $(L_t,t\ge 0)$ on any finite interval follows mainly from the convergence in Proposition 5.2. The proof here can easily be adapted to other models of random trees; see [16, 39]. On the other hand, our proof of the tightness condition (5.20) depends on the specific cuttings of the birthday trees, which has allowed us to deduce the distributional identity (5.22). In general, the convergence of $L^n_\infty$ may indeed fail. An obvious example is the classical record problem (see Example 1.4 in [36]), where we have $L^n_t\to L_t$ for any fixed $t$, while $L^n_\infty\sim\ln n$, which is therefore not tight in $\mathbb R$.

cut($T^n$): Proof of Lemma 5.3

Let us recall the setting of the complete cutting down procedure for $\mathcal T$: $(V_i,i\ge 1)$ is an i.i.d. sequence of common law $\mu$; $\mathcal T_{V_i}(t)$ is the equivalence class of $\sim_t$ containing $V_i$, whose mass is denoted by $\mu_i(t)$; and $L^i_t=\int_0^t\mu_i(s)\,ds$. The complete cut-tree $\mathrm{cut}(\mathcal T)$ is defined as the complete separable metric space $\overline{\cup_k S_k}$. We introduce some corresponding notation for the discrete cuttings on $T^n$. For each $n\ge 1$, we sample a sequence of i.i.d. points $(V^n_i,i\ge 1)$ on $T^n$ of distribution $p_n$.
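The quantity tracked in this procedure, the number of cuts that hit the component of a marked vertex before it is isolated, is easy to simulate. The sketch below (hypothetical names, a vertex version of the Meir–Moon procedure, not code from the paper) runs it on a path rooted at one end; in that special case the expected number of cuts is exactly the harmonic number $H_n$, the $\ln n$ behaviour of the record problem mentioned in the remark above:

```python
import random

def component(adj, allowed, start):
    # connected component of `start` inside the vertex set `allowed`
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in allowed and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def cuts_to_isolate(adj, target, rng):
    """Repeatedly pick a uniform vertex of the current component of `target`
    and remove it, keeping the part containing `target`; count the cuts
    until `target` itself is picked (the cut that isolates it)."""
    comp = set(adj)
    cuts = 0
    while True:
        v = rng.choice(sorted(comp))
        cuts += 1
        if v == target:
            return cuts
        comp.discard(v)
        comp = component(adj, comp, target)

def path_adj(n):
    adj = {i: set() for i in range(n)}
    for i in range(n - 1):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    return adj

rng = random.Random(2024)
trials = [cuts_to_isolate(path_adj(50), 0, rng) for _ in range(2000)]
print(sum(trials) / len(trials))  # close to H_50 = 4.499...
```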
Recall that $\mathcal P_n$ is the Poisson point process on $\mathbb R_+\times T^n$ of intensity $dt\otimes\mathcal L^n$. We define
$$\mu_{n,i}(t):=p_n\big(\{u\in T^n:[0,t]\times[\![u,V^n_i]\!]\cap\mathcal P_n=\varnothing\}\big),$$
$$L^{n,i}_t:=\mathrm{Card}\{s\le t:\mu_{n,i}(s)<\mu_{n,i}(s-)\},\qquad t\ge 0,\ i\ge 1,$$
$$\tau_n(i,j):=\inf\{t\ge 0:[0,t]\times[\![V^n_i,V^n_j]\!]\cap\mathcal P_n\ne\varnothing\},\qquad 1\le i,j<\infty.$$
By the construction of $G^n=\mathrm{cut}(T^n)$, we have
$$d_{G^n}(V^n_i,r(G^n))=L^{n,i}_\infty-1,\qquad(5.28)$$
$$d_{G^n}(V^n_i,V^n_j)=L^{n,i}_\infty+L^{n,j}_\infty-2L^{n,i}_{\tau_n(i,j)},\qquad 1\le i,j<\infty,$$
where $L^{n,i}_\infty:=\lim_{t\to\infty}L^{n,i}_t$ is the number of cuts necessary to isolate $V^n_i$. The proof of Lemma 5.3 is quite similar to that of Theorem 3.1. We outline the main steps but leave out the details.

Sketch of proof of Lemma 5.3. First, we can show, with essentially the same proof as for Lemma 5.4, that we have the following joint convergences: for each $k\ge 1$,
$$\big(\big(\mu_{n,i}(t),1\le i\le k\big),\,t\ge 0\big)\ \xrightarrow[n\to\infty]{d}\ \big(\big(\mu_i(t),1\le i\le k\big),\,t\ge 0\big),\qquad(5.29)$$
with respect to the Skorokhod $J_1$-topology, jointly with
$$\big(\tau_n(i,j),\,1\le i,j\le k\big)\ \xrightarrow[n\to\infty]{d}\ \big(\tau(i,j),\,1\le i,j\le k\big),\qquad(5.30)$$
jointly with the convergence in (5.5). Then we can proceed, with the same argument as in the proof of Lemma 5.7, to showing that for any $k,m\ge 1$ and $(t_j,1\le j\le m)\in\mathbb R_+^m$,
$$\Big(\int_0^{t_j}\mu_{n,i}(s)\,ds,\ 1\le j\le m,\ 1\le i\le k\Big)\ \xrightarrow[n\to\infty]{d}\ \Big(\int_0^{t_j}\mu_i(s)\,ds,\ 1\le j\le m,\ 1\le i\le k\Big).$$
Since the $V^n_i$, $i\ge 1$, are i.i.d. $p_n$-nodes of $T^n$, each process $(L^{n,i}_t)_{t\ge 0}$ has the same distribution as $(L^n_t)_{t\ge 0}$ defined in (5.10). Then Lemmas 5.7 and 5.8 hold true for each $L^{n,i}$, $i\ge 1$. We are able to show that
$$\big(\sigma_n L^{n,i}_t,\ 1\le i\le k\big)_{t\ge 0}\ \xrightarrow[n\to\infty]{d}\ \big(L^i_t,\ 1\le i\le k\big)_{t\ge 0},\qquad(5.31)$$
with respect to the uniform topology, jointly with the convergences (5.30) and (5.5). Comparing (5.28) with (3.4), we can easily conclude.

In general, the convergence in (5.1) does not hold in the Gromov–Hausdorff topology.
However, in the case where $\mathcal T$ is a.s. compact and the convergence (5.1) does hold in the Gromov–Hausdorff sense, we are able to show that one indeed has GHP convergence, as claimed in Theorem 3.6. In the following proof, we only deal with the case of the convergence of $\mathrm{cut}(T^n)$. The result for $\mathrm{cut}(T^n,V^n)$ can be obtained using similar arguments, and we omit the details.

Proof of Theorem 3.6. We have already shown in Lemma 5.3 the joint convergence of the spanning subtrees: for each $k\ge 1$,
$$\big(\sigma_n R^n_k,\ \sigma_n S^n_k\big)\ \xrightarrow[n\to\infty]{d,\,\mathrm{GH}}\ \big(R_k,\ S_k\big).\qquad(5.32)$$
We now show that for each $\varepsilon>0$,
$$\lim_{k\to\infty}\lim_{n\to\infty}\mathbb P\big(\max\big\{d_{\mathrm{GH}}(R^n_k,T^n),\,d_{\mathrm{GH}}(S^n_k,\mathrm{cut}(T^n))\big\}\ge\varepsilon/\sigma_n\big)=0.\qquad(5.33)$$
Since the couples $(S^n_k,\mathrm{cut}(T^n))$ and $(R^n_k,T^n)$ have the same distribution, it is enough to prove that for each $\varepsilon>0$,
$$\lim_{k\to\infty}\lim_{n\to\infty}\mathbb P\big(\sigma_n d_{\mathrm{GH}}(R^n_k,T^n)\ge\varepsilon\big)=0.\qquad(5.34)$$
Let us explain why this is true when $(\sigma_n T^n,p_n)\to(\mathcal T,\mu)$ in distribution in the sense of GHP. Recall the space $\mathbb M^k_c$ of equivalence classes of $k$-pointed compact metric spaces, equipped with the $k$-pointed Gromov–Hausdorff metric. For each $k\ge 1$ and $\varepsilon>0$, we set
$$A(k,\varepsilon):=\big\{(T,d,\mathbf x)\in\mathbb M^k_c:\ d_{\mathrm{GH}}\big(T,\mathrm{Span}(T;\mathbf x)\big)\ge\varepsilon\big\}.$$
It is not difficult to check that $A(k,\varepsilon)$ is a closed set of $\mathbb M^k_c$. Now, according to the proof of Lemma 13 of [41], the mapping $(T,\mu)\mapsto m_k(T,A(k,\varepsilon))$ is upper semicontinuous on $\mathbb M_c$, where $\mathbb M_c$ is the set of equivalence classes of compact measured metric spaces, equipped with the Gromov–Hausdorff–Prokhorov metric, and $m_k$ is defined by
$$m_k(T,A(k,\varepsilon)):=\int_{T^k}\mu^{\otimes k}(d\mathbf x)\,\mathbf 1_{\{[T,\mathbf x]\in A(k,\varepsilon)\}}.$$
Applying the Portmanteau theorem for upper semicontinuous mappings [17, p.
17, Problem 7], we obtain
$$\limsup_{n\to\infty}\mathbb E\big[m_k\big((\sigma_n T^n,p_n),A(k,\varepsilon)\big)\big]\ \le\ \mathbb E\big[m_k\big((\mathcal T,\mu),A(k,\varepsilon)\big)\big],$$
or, in other words,
$$\limsup_{n\to\infty}\mathbb P\big(\sigma_n d_{\mathrm{GH}}\big(T^n,R^n_k\big)\ge\varepsilon\big)\ \le\ \mathbb P\big(d_{\mathrm{GH}}\big(\mathcal T,R_k\big)\ge\varepsilon\big)\ \xrightarrow[k\to\infty]{}\ 0,$$
since $d_{\mathrm{GH}}(R_k,\mathcal T)\to 0$ almost surely, $\mathcal T$ being compact [5]. This proves (5.34), and thus (5.33). By [17, Ch. 1, Theorem 4.5], (5.32) combined with (5.33) entails the joint convergence in distribution of $(\sigma_n T^n,\sigma_n\mathrm{cut}(T^n))$ to $(\mathcal T,\mathrm{cut}(\mathcal T))$ in the Gromov–Hausdorff topology. To strengthen this to Gromov–Hausdorff–Prokhorov convergence, one can adapt the arguments in Section 4.4 of [31]; we omit the details.

In this section, we justify the heuristic construction of $\mathrm{shuff}(\mathcal H,U)$ given in Section 3 for an ICRT $\mathcal H$ and a uniform leaf $U$. The objective is to define formally the shuffle operation in such a way that the identity (3.6) holds. In Section 6.1, we rely on weak convergence arguments to justify the construction of $\mathrm{shuff}(\mathcal H,U)$ by showing that it is the limit of the discrete construction in Section 4.1. In Section 6.2, we then determine from this result the distribution of the cuts in the cut-tree $\mathrm{cut}(\mathcal T,V)$ and prove that, with the right coupling, the shuffle can yield the initial tree back (or, more precisely, a tree that is in the same GHP equivalence class, which is as good as it gets).

shuff($\mathcal H$, $U$)

Let $(\mathcal H,d_{\mathcal H},\mu_{\mathcal H})$ be an ICRT rooted at $r(\mathcal H)$, and let $U$ be a random point of $\mathcal H$ of distribution $\mu_{\mathcal H}$. Then $\mathcal H$ is the disjoint union of the following subsets:
$$\mathcal H=\bigcup_{x\in[\![r(\mathcal H),U]\!]}F_x\qquad\text{where}\qquad F_x:=\big\{u\in\mathcal H:[\![r(\mathcal H),u]\!]\cap[\![r(\mathcal H),U]\!]=[\![r(\mathcal H),x]\!]\big\}.$$
It is easy to see that $F_x$ is a subtree of $\mathcal H$. It is nonempty ($x\in F_x$), but possibly trivial ($F_x=\{x\}$).
Let $B:=\{x\in[\![r(\mathcal H),U]\!]:\mu_{\mathcal H}(F_x)>0\}\cup\{U\}$, and for $x\in B$, let $S_x:=\mathrm{Sub}(\mathcal H,x)\setminus F_x$, which is the union of those $F_y$ such that $y\in B$ and $d_{\mathcal H}(U,y)<d_{\mathcal H}(U,x)$. Then, to each $x\in B\setminus\{U\}$, we associate an attaching point $A_x$, which is independent and sampled according to $\mu_{\mathcal H}|_{S_x}$, the restriction of $\mu_{\mathcal H}$ to $S_x$. We also set $A_U=U$.

Now let $(\xi_i,i\ge 1)$ be a sequence of i.i.d. points of common law $\mu_{\mathcal H}$. The set $F:=\cup_{x\in B}F_x$ has full mass with probability one. Thus almost surely $\xi_i\in F$ for each $i\ge 1$. We will use $(\xi_i)_{i\ge 1}$ to span the tree $\mathrm{shuff}(\mathcal H,U)$, and the point $\xi_1$ is the future root of $\mathrm{shuff}(\mathcal H,U)$. For each $\xi_i$, we define inductively two sequences $x_i:=(x_i(0),x_i(1),\cdots)$, taking values in $B$, and $a_i:=(a_i(0),a_i(1),\cdots)$: we set $a_i(0)=\xi_i$ and, for $j\ge 0$,
$$x_i(j)=a_i(j)\wedge U,\qquad\text{and}\qquad a_i(j+1)=A_{x_i(j)}.$$
By the definition of $(A_x,x\in B)$, the distance $d_{\mathcal H}(r(\mathcal H),x_i(k))$ is increasing in $k\ge 0$. For each $i,j\ge 1$, we define the merging time
$$\mathrm{mg}(i,j):=\inf\{k\ge 0:\exists\,l\le k\ \text{such that}\ x_i(l)=x_j(k-l)\},$$
with the convention $\inf\varnothing=\infty$. Another way to present $\mathrm{mg}(i,j)$ is to consider the graph on $B$ with the edges $\{x,A_x\wedge U\}$, $x\in B$; then $\mathrm{mg}(i,j)$ is the graph distance between $\xi_i\wedge U$ and $\xi_j\wedge U$. On the event $\{\mathrm{mg}(i,j)<\infty\}$, there is a path in this graph that has only finitely many edges, and the two walks $x_i$ and $x_j$ first meet at a point $y(i,j)\in B$ (where by "first" we mean with minimum distance to the root $r(\mathcal H)$). In particular, if we set $I(i,j)$, $I(j,i)$ to be the respective indices of the element $y(i,j)$ appearing in $x_i$ and $x_j$, that is,
$$I(i,j)=\inf\{k\ge 0:x_i(k)=y(i,j)\}\qquad\text{and}\qquad I(j,i)=\inf\{k\ge 0:x_j(k)=y(i,j)\},$$
with the convention that $I(i,j)=I(j,i)=\infty$ if $\mathrm{mg}(i,j)=\infty$, then $\mathrm{mg}(i,j)=I(i,j)+I(j,i)$. Write $\mathrm{Ht}(u)=d_{\mathcal H}(u,u\wedge U)$ for the height of $u$ in the one of the $(F_x,x\in B)$ containing it.
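The graph description of $\mathrm{mg}(i,j)$ has a concrete combinatorial content: iterate the map $x\mapsto A_x\wedge U$ from the two starting points $\xi_i\wedge U$ and $\xi_j\wedge U$, and $\mathrm{mg}(i,j)$ is the smallest total number of steps after which the two orbits share a point. A toy sketch (hypothetical `step` map on a six-point set, not code from the paper):

```python
def merge_indices(step, u, v, limit=100):
    """Walks x(k+1) = step(x(k)) started at u and v; return the pair
    (I_uv, I_vu) minimising I_uv + I_vu (= the merging time mg) among
    all meeting points of the two orbits, or None if they never meet."""
    orbit_u = [u]
    orbit_v = [v]
    for _ in range(limit):
        orbit_u.append(step(orbit_u[-1]))
        orbit_v.append(step(orbit_v[-1]))
    best = None
    for l, a in enumerate(orbit_u):
        for m, b in enumerate(orbit_v):
            if a == b and (best is None or l + m < best[0] + best[1]):
                best = (l, m)
    return best

# toy map on B = {0, ..., 5} with U = 0: every step moves closer to U
succ = {5: 3, 4: 1, 3: 1, 2: 1, 1: 0, 0: 0}
step = succ.__getitem__
print(merge_indices(step, 5, 4))  # orbits 5,3,1,0,... and 4,1,0,... meet at 1
```

Here the orbits from 5 and 4 first share the point 1 after two and one steps respectively, so the merging time is 3.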
On the event $\{\mathrm{mg}(i,j)<\infty\}$, we define $\gamma(i,j)$, which is meant to be the new distance between $\xi_i$ and $\xi_j$:
$$\gamma(i,j):=\sum_{k=0}^{I(i,j)-1}\mathrm{Ht}(a_i(k))+\sum_{k=0}^{I(j,i)-1}\mathrm{Ht}(a_j(k))+d_{\mathcal H}\big(a_i(I(i,j)),\,a_j(I(j,i))\big),$$
with the convention that if $k$ ranges from $0$ to $-1$, the sum equals zero.

The justification of the definition relies on weak convergence arguments. Let $p_n$, $n\ge 1$, be a sequence of probability measures such that (H) holds with $\theta$ the parameter of $\mathcal H$. Let $H^n$ be a $p_n$-tree and $U^n$ a $p_n$-node. Let $(\xi^n_i)_{i\ge 1}$ be a sequence of i.i.d. $p_n$-points. Then the quantities $S^n_x$, $B^n$, $x^n_i$, $a^n_i$ and $\mathrm{mg}_n(i,j)$ are defined for $H^n$ in the same way as $S_x$, $B$, $x_i$, $a_i$ and $\mathrm{mg}(i,j)$ have been defined for $\mathcal H$. Let $d_{H^n}$ denote the graph distance on $H^n$. There is only a slight difference in the definition of the distances:
$$\gamma_n(i,j):=\sum_{k=0}^{I_n(i,j)-1}\big(\mathrm{Ht}(a^n_i(k))+1\big)+\sum_{k=0}^{I_n(j,i)-1}\big(\mathrm{Ht}(a^n_j(k))+1\big)+d_{H^n}\big(a^n_i(I_n(i,j)),\,a^n_j(I_n(j,i))\big),$$
to take into account the length of the edges $\{x,A^n_x\}$, for $x\in B^n$. In that case, the sequence $x^n_i$ (resp. $a^n_i$) is eventually constant and equal to $U^n$, so that $\mathrm{mg}_n(i,j)<\infty$ with probability one. Furthermore, the unique tree defined by the distance matrix $(\gamma_n(i,j):i,j\ge 1)$ is easily seen to have the same distribution as the one defined in Section 4.1, since the attaching points are sampled with the same distributions and $(\gamma_n(i,j):i,j\ge 1)$ coincides with the tree distance after attaching.

Figure 5: An example with $I(1,2)=3$, $I(2,1)=1$ and $\mathrm{mg}(1,2)=4$. The dashed lines indicate the identifications where the roots of the relevant subtrees are sent. The blue lines represent the location of the path between $\xi_1$ and $\xi_2$ before the transformation.
Recall that we have re-rooted $\mathrm{shuff}(H^n,U^n)$ at a random point of law $p_n$; we may suppose that this point is $\xi^n_1$. Therefore we have (Proposition 4.3)
$$(\mathrm{shuff}(H^n,U^n),\,H^n)\ \overset{d}{=}\ (H^n,\,\mathrm{cut}(H^n,U^n)),\qquad(6.1)$$
by Lemma 4.1. In the case of the ICRT $\mathcal H$, it is a priori not clear that $\mathbb P(\mathrm{mg}(i,j)<\infty)=1$. We prove:

Theorem 6.1. For any ICRT $(\mathcal H,\mu_{\mathcal H})$ and a $\mu_{\mathcal H}$-point $U$, we have the following assertions:
a) almost surely, for each $i,j\ge 1$, we have $\mathrm{mg}(i,j)<\infty$;
b) almost surely, the distance matrix $(\gamma(i,j),1\le i,j<\infty)$ defines a CRT, denoted by $\mathrm{shuff}(\mathcal H,U)$;
c) $(\mathrm{shuff}(\mathcal H,U),\mathcal H)$ and $(\mathcal H,\mathrm{cut}(\mathcal H,U))$ have the same distribution.

The main ingredient in the proof of Theorem 6.1 is the following lemma.

Lemma 6.2. Under (H), for each $k\ge 1$, we have the following convergences:
$$\big(\sigma_n d_{H^n}(r(H^n),x^n_i(j)),\ 1\le i\le k,\ 0\le j\le k\big)\ \xrightarrow[n\to\infty]{d}\ \big(d_{\mathcal H}(r(\mathcal H),x_i(j)),\ 1\le i\le k,\ 0\le j\le k\big),\qquad(6.2)$$
$$\big(p_n(S^n_{x^n_i(j)}),\ 1\le i\le k,\ 0\le j\le k\big)\ \xrightarrow[n\to\infty]{d}\ \big(\mu_{\mathcal H}(S_{x_i(j)}),\ 1\le i\le k,\ 0\le j\le k\big),\qquad(6.3)$$
and
$$\big(\sigma_n H^n,\ (a^n_i(j),\ 1\le i\le k,\ 0\le j\le k)\big)\ \xrightarrow[n\to\infty]{d}\ \big(\mathcal H,\ (a_i(j),\ 1\le i\le k,\ 0\le j\le k)\big),\qquad(6.4)$$
in the sense of weak convergence of the pointed Gromov–Prokhorov topology.

Proof. Fix some $k\ge 1$. We argue by induction on $j$. For $j=0$, we note that $a^n_i(0)=\xi^n_i$ and $x^n_i(0)=\xi^n_i\wedge U^n$. Then the convergences in (6.4) and (6.2) for $j=0$ follow easily from (5.1). On the other hand, we can prove (6.3) with the same proof as in Lemma 5.9. Suppose now that (6.2), (6.3) and (6.4) hold true for some $j\ge 0$. Noticing that $a^n_i(j+1)$ is independently sampled according to $p_n$ restricted to $S^n_{x^n_i(j)}$, we deduce (6.4) for $j+1$ from (5.1). Then the convergence in (6.2) also follows for $j+1$, since $x^n_i(j+1)=a^n_i(j+1)\wedge U^n$. Finally, the very same arguments as used in the proof of Lemma 5.9 show that (6.3) holds for $j+1$.

Proof of Theorem 6.1.
Proof of a). By construction, $\mathrm{shuff}(H^n,U^n)$ is the reverse transformation of the one from $H^n$ to $\mathrm{cut}(H^n,U^n)$, in the sense that each attaching "undoes" a cut. In consequence, since $\mathrm{mg}_n(i,j)$ is the number of cuts to undo in order to get $\xi^n_i$ and $\xi^n_j$ into the same connected component, $\mathrm{mg}_n(i,j)$ has the same distribution as the number of cuts that fell on the path $[\![\xi^n_i,\xi^n_j]\!]$. But the latter is stochastically bounded by a Poisson variable $N_n(i,j)$ of (random) mean $d_{H^n}(\xi^n_i,\xi^n_j)\cdot E_n(i,j)$, where $E_n(i,j)$ is an independent exponential variable of rate $d_{H^n}(U^n,\xi^n_i\wedge\xi^n_j)$. Indeed, each cut is a point of the Poisson point process $\mathcal P_n$, and no more cuts fall on $[\![\xi^n_i,\xi^n_j]\!]$ after the time of the first cut on $[\![U^n,\xi^n_i\wedge\xi^n_j]\!]$. But the time of the first cut on $[\![U^n,\xi^n_i\wedge\xi^n_j]\!]$ has the same distribution as $E_n(i,j)$ and is independent of $\mathcal P_n$ restricted to $[\![\xi^n_i,\xi^n_j]\!]$. The above argument shows that
$$\mathrm{mg}_n(i,j)=I_n(i,j)+I_n(j,i)\ \le_{\mathrm{st}}\ N_n(i,j),\qquad i,j\ge 1,\ n\ge 1,\qquad(6.5)$$
where $\le_{\mathrm{st}}$ denotes the stochastic domination order. It follows from (5.1) that, jointly with the convergence in (5.1), we have $N_n(i,j)\to N(i,j)$ in distribution as $n\to\infty$, where $N(i,j)$ is a Poisson variable with parameter $d_{\mathcal H}(\xi_i,\xi_j)\cdot E(i,j)$, with $E(i,j)$ an independent exponential variable of rate $d_{\mathcal H}(U,\xi_i\wedge\xi_j)$, which is positive with probability one. Thus the sequence $(\mathrm{mg}_n(i,j),n\ge 1)$ is tight in $\mathbb R_+$.

On the other hand, observe that for $x\in B$, $\mathbb P(A_x\in F_y)=\mu_{\mathcal H}(F_y)/\mu_{\mathcal H}(S_x)$ if $y\in B$ and $d_{\mathcal H}(U,y)<d_{\mathcal H}(U,x)$. In particular, for two distinct points $x,x'\in B$,
$$\mathbb P\big(\exists\,y\in B\ \text{such that}\ A_x\in F_y,\,A_{x'}\in F_y\big)=\sum_y\frac{\mu_{\mathcal H}(F_y)^2}{\mu_{\mathcal H}(S_x)\,\mu_{\mathcal H}(S_{x'})},$$
where the sum is over those $y\in B$ such that $d_{\mathcal H}(U,y)<\min\{d_{\mathcal H}(U,x),d_{\mathcal H}(U,x')\}$.
Similarly, for $n\ge 1$,
$$\mathbb P\big(\exists\,y\in B^n\ \text{such that}\ A^n_x\in F^n_y,\,A^n_{x'}\in F^n_y\big)=\sum_y\frac{p_n(F^n_y)^2}{p_n(S^n_x)\,p_n(S^n_{x'})}.$$
Then it follows from (6.2) and the convergence of the masses in Lemma 5.9 that
$$\mathbb P\big(I_n(i,j)=1;\,I_n(j,i)=1\big)=\mathbb P\big(\exists\,y\in B^n\ \text{such that}\ A^n_{x_i(0)}\in F^n_y,\,A^n_{x_j(0)}\in F^n_y\big)\ \xrightarrow[n\to\infty]{}\ \mathbb P\big(I(i,j)=1;\,I(j,i)=1\big).$$
By induction and Lemma 6.2, this can be extended to the following: for any natural numbers $k_1,k_2\ge 0$, we have
$$\mathbb P\big(I_n(i,j)=k_1;\,I_n(j,i)=k_2\big)\ \xrightarrow[n\to\infty]{}\ \mathbb P\big(I(i,j)=k_1;\,I(j,i)=k_2\big).$$
Combined with the tightness of $(\mathrm{mg}_n(i,j),n\ge 1)=(I_n(i,j)+I_n(j,i),n\ge 1)$, this entails that
$$\big(I_n(i,j),\,I_n(j,i)\big)\ \xrightarrow[n\to\infty]{d}\ \big(I(i,j),\,I(j,i)\big),\qquad i,j\ge 1,\qquad(6.6)$$
jointly with (6.2) and (6.4), using the usual subsequence arguments. In particular, $I(i,j)+I(j,i)\le_{\mathrm{st}}N(i,j)<\infty$ almost surely, which entails that $\mathrm{mg}(i,j)<\infty$ almost surely, for each pair $(i,j)\in\mathbb N\times\mathbb N$.

Proof of b). It follows from (6.4), (6.6) and the expression of $\gamma(i,j)$ that
$$\big(\sigma_n\gamma_n(i,j),\ i,j\ge 1\big)\ \xrightarrow[n\to\infty]{d}\ \big(\gamma(i,j),\ i,j\ge 1\big),\qquad(6.7)$$
in the sense of finite-dimensional distributions, jointly with the Gromov–Prokhorov convergence of $\sigma_n H^n$ to $\mathcal H$ in (5.1). However, by (6.1), the distribution of $\mathrm{shuff}(H^n,U^n)$ is identical to that of $H^n$. Hence the unconditional distribution of $(\gamma(i,j),1\le i,j<\infty)$ is that of the distance matrix of the ICRT $\mathcal H$. We can apply Aldous' CRT theory [5] to conclude that for a.e. $\mathcal H$, the distance matrix $(\gamma(i,j),i,j\ge 1)$ defines a CRT, denoted by $\mathrm{shuff}(\mathcal H,U)$. Moreover, there exists a mass measure $\tilde\mu$ such that if $(\tilde\xi_i)_{i\ge 1}$ is an i.i.d. sequence of law $\tilde\mu$, then
$$\big(d_{\mathrm{shuff}(\mathcal H,U)}(\tilde\xi_i,\tilde\xi_j),\ 1\le i,j<\infty\big)\ \overset{d}{=}\ \big(\gamma(i,j),\ 1\le i,j<\infty\big).$$
Therefore, we can rewrite (6.7) as
$$\big(\sigma_n\,\mathrm{shuff}(H^n,U^n),\ \sigma_n H^n\big)\ \xrightarrow[n\to\infty]{d}\ \big(\mathrm{shuff}(\mathcal H,U),\ \mathcal H\big),\qquad(6.8)$$
with respect to the Gromov–Prokhorov topology.

Proof of c). This is an easy consequence of (6.1) and (6.8). Let $f,g$ be two arbitrary bounded functions continuous in the Gromov–Prokhorov topology. Then (6.8) and the continuity of $f,g$ entail that
$$\mathbb E[f(\mathrm{shuff}(\mathcal H,U))\cdot g(\mathcal H)]=\lim_{n\to\infty}\mathbb E[f(\sigma_n\,\mathrm{shuff}(H^n,U^n))\cdot g(\sigma_n H^n)]=\lim_{n\to\infty}\mathbb E[f(\sigma_n H^n)\cdot g(\sigma_n\,\mathrm{cut}(H^n,U^n))]=\mathbb E[f(\mathcal H)\cdot g(\mathrm{cut}(\mathcal H,U))],$$
where we have used (6.1) in the second equality. Thus we obtain the identity in distribution in c).

According to Proposition 4.3 and (6.1), the attaching points $a^n_i(j)$ have the same distribution as the points where the cuts used to be connected in the $p_n$-tree $H^n$. Then Theorem 6.1 suggests that the weak limits $a_i(j)$ should play a similar role for the continuous tree. Indeed, in this section we show that the $a_i(j)$ represent the "holes" left by the cutting on $(\mathcal T_t)_{t\ge 0}$.

Let $(\mathcal T,d_{\mathcal T},\mu)$ be the ICRT of Section 5. The $\mu$-point $V$ is isolated by successive cuts, which are elements of the Poisson point process $\mathcal P$. Now let $\xi'_1,\xi'_2$ be two independent points sampled according to $\mu$. We plan to give a description of the image of the path $[\![\xi'_1,\xi'_2]\!]$ in the cut tree $\mathrm{cut}(\mathcal T,V)$, which turns out to be dual to the construction of one path in $\mathrm{shuff}(\mathcal H,U)$.

During the cutting procedure which isolates $V$, the path $[\![\xi'_1,\xi'_2]\!]$ is pruned from the two ends into segments; see Figure 6. Each segment is contained in a distinct portion $\Delta\mathcal T_t:=\mathcal T_{t-}\setminus\mathcal T_t$, which is discarded at time $t$. Also recall that $\Delta\mathcal T_t$ is grafted on the interval $[0,L_\infty]$ to construct $\mathrm{cut}(\mathcal T,V)$. The following is just a formal reformulation.

Lemma 6.3.
Let $(t_{1,1},y_{1,1}),(t_{1,2},y_{1,2}),\cdots,(t_{1,M_1},y_{1,M_1})$ and $(t_{2,1},y_{2,1}),(t_{2,2},y_{2,2}),\cdots,(t_{2,M_2},y_{2,M_2})$ be the respective (finite) sequences of cuts on $[\![\xi'_1,V]\!]\cap[\![\xi'_1,\xi'_2]\!]$ and $[\![\xi'_2,V]\!]\cap[\![\xi'_1,\xi'_2]\!]$ such that

Figure 6: An example with $M_1=3$ and $M_2=1$. Above, the cuts partition the path between $\xi'_1$ and $\xi'_2$ into segments. The cross represents the first cut on $[\![\xi'_1,V]\!]\cap[\![\xi'_2,V]\!]$. Below, the image of these segments in $\mathrm{cut}(\mathcal T,V)$.

two leaves contained in the closure of $\phi(\Delta\mathcal T_{t_{e,M_e}})$. Comparing this with Theorem 6.1, one may suspect that $\{O(t_{1,j}):1\le j\le M_1\}$, $\{O(t_{2,j}):1\le j\le M_2\}$ should have the same distribution as $\{a_1(j):1\le j\le I(1,2)\}$, $\{a_2(j):1\le j\le I(2,1)\}$. This is indeed true. In the following, we show a slightly more general result about all the points. For each $t\in\mathcal C:=\{t>0:\mu(\Delta\mathcal T_t)>0\}$, let $x(t)\in\mathcal T$ be the point such that $(t,x(t))\in\mathcal P$. Then we can define $O(t)$ to be the point of $\mathrm{cut}(\mathcal T,V)$ which marks the "hole" left by the cutting at $x(t)$. More precisely, let $(t',x')$ be the first element of $\mathcal P$ after time $t$ on $[\![r(\mathcal T),x(t)[\![$. Then there exists some point $O(t)$ such that the closure of $\phi(\,]\!]x(t),x']\!]\,)$ in $\mathrm{cut}(\mathcal T,V)$ is $[\![O(t),\phi(x')]\!]$.

Proposition 6.4. Conditionally on $\mathrm{cut}(\mathcal T,V)$, the collection $\{O(t),t\in\mathcal C\}$ is independent, and each $O(t)$ has distribution $\hat\mu$ restricted to $\cup_{s>t}\phi(\mathcal T_{s-}\setminus\mathcal T_s)$.

Proof. It suffices to show that $\{O(t),t\in\mathcal C\}$ has the same distribution as the collection of attaching points $\{A_x,x\in B\}$ introduced in the previous section.
Observe that if we take $(\mathcal H,U)=(\mathrm{cut}(\mathcal T,V),L_\infty)$ and replace $\{A_x,x\in B\}$ with $\{O(t),t\in\mathcal C\}$, then it follows that $\mathrm{shuff}(\mathcal H,U)$ is isometric to $\mathcal T$, since the two trees are metric completions of the same distance matrix with probability one. In particular, we have
$$(\mathrm{shuff}(\mathcal H,U),\,\mathcal H)\ \overset{d}{=}\ (\mathcal T,\,\mathrm{cut}(\mathcal T,V)).\qquad(6.9)$$
Therefore, to determine the distribution of $\{O(t),t\in\mathcal C\}$, we only need to argue that the distribution of $\{A_x,x\in B\}$ is the unique distribution for which (6.9) holds. To see this, we notice that (6.9) implies that the distribution of $(\gamma(i,j))_{i,j\ge 1}$ is unique. But from the distance matrix $(\gamma(i,j))_{i,j\ge 1}$ (also given $\mathcal H$ and $(\xi_i,i\ge 1)$), we can recover $(a_i(1),i\ge 1)$, which is a size-biased resampling of $(A_x,x\in B)$. Indeed, the sequence $(\xi_k)_{k\ge 1}$ is everywhere dense in $\mathcal H$. For $x\in B$, let $(\xi_{m_k},k\ge 1)$ be the subsequence consisting of the $\xi_i$ contained in $F_x$. Then $a_i(1)\in F_x$ if and only if
$$\liminf_{k\to\infty}\ \gamma(i,m_k)-\mathrm{Ht}(\xi_i)=0,$$
where $\mathrm{Ht}(\xi_i)=d_{\mathcal H}(\xi_i,\xi_i\wedge U)$. Moreover, if the latter holds, we also have $d_{\mathcal H}(a_i(1),\xi_{m_k})=\gamma(i,m_k)-\mathrm{Ht}(\xi_i)$. By Gromov's reconstruction theorem [30], we can determine $a_i(1)$ for each $i\ge 1$. By the previous arguments, this concludes the proof.

The above proof also shows that if we use $(O(t),t\in\mathcal C)$ to define the points $(A_x,x\in B)$, then the shuffle operation yields a tree that is indistinguishable from the original ICRT $\mathcal T$.

Recall the setting at the beginning of Section 5.1. Proving Proposition 5.2 amounts to showing that for each $k\ge 1$, we have
$$\big(\sigma_n R^n_k,\ \mathcal L^n|_{R^n_k}\big)\ \xrightarrow[n\to\infty]{d}\ \big(R_k(\mathcal T),\ \mathcal L|_{R_k}\big)\qquad(7.1)$$
in the Gromov–Hausdorff–Prokhorov topology. Observe that the Gromov–Hausdorff convergence is clear from (5.1), so that it only remains to prove the convergence of the measures.

Case 1. We first prove the claim assuming that $\theta_i>0$ for every $i\ge 1$.
In this case, define
$$m_n:=\min\Big\{j:\ \sum_{i=1}^{j}\Big(\frac{p_{ni}}{\sigma_n}\Big)^2\ \ge\ \sum_{i\ge 1}\theta_i^2\Big\},$$
and observe that $m_n<\infty$ since $\sum_{i\le n}(p_{ni}/\sigma_n)^2=1\ge\sum_{i\ge 1}\theta_i^2$. Note also that $m_n\to\infty$. Indeed, for every integer $k\ge 1$, since $p_{ni}/\sigma_n\to\theta_i$ for $i\ge 1$, and $\theta_{k+1}>0$, we have, for all $n$ large enough,
$$\sum_{i=1}^{k}\Big(\frac{p_{ni}}{\sigma_n}\Big)^2<\sum_{i\ge 1}\theta_i^2,$$
so that $m_n>k$ for all $n$ large enough. Furthermore, $\lim_{j\to\infty}\theta_j=0$, and (H) implies that
$$\lim_{n\to\infty}\frac{p_{n m_n}}{\sigma_n}=0.\qquad(7.2)$$
Combining this with the definition of $m_n$, it follows that, as $n\to\infty$,
$$\sum_{i\le m_n}\Big(\frac{p_{ni}}{\sigma_n}\Big)^2\ \to\ \sum_{i\ge 1}\theta_i^2.\qquad(7.3)$$
If $n,k,M\ge 1$, we set $L^*_n=\sum_{i>m_n}\cdots$
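The truncation index $m_n$ just defined is a plain cumulative-sum rule. The sketch below (hypothetical names and toy data, not from the paper) computes it for a small probability vector, with $\sigma_n^2=\sum_i p_{ni}^2$ as in the setting of (H):

```python
import math

def truncation_index(p, theta):
    """m_n = min{ j : sum_{i<=j} (p_i/sigma)^2 >= sum_i theta_i^2 },
    where sigma^2 = sum_i p_i^2 and p is sorted in decreasing order."""
    sigma2 = sum(x * x for x in p)
    target = sum(t * t for t in theta)
    acc = 0.0
    for j, x in enumerate(p, start=1):
        acc += x * x / sigma2
        if acc >= target:
            return j
    return len(p)  # guard against rounding when target is close to 1

p = [x / 10 for x in (4, 2, 1, 1, 1, 1)]  # a toy p_n
theta = [math.sqrt(0.5), math.sqrt(0.2)]  # sum of theta_i^2 = 0.7
print(truncation_index(p, theta))  # -> 2
```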
Lemma 7.1. Suppose that (H) holds. Then, for each $k\ge 1$, we have the following assertions:
a) as $n\to\infty$, in probability,
$$d_{\mathrm P}\big(L^*_n|_{R^n_k},\ \theta_0^2\sigma_n\ell_n|_{R^n_k}\big)\ \to\ 0;\qquad(7.4)$$
b) for each $\varepsilon>0$, there exists $M=M(k,\varepsilon)\in\mathbb N$ such that
$$\limsup_{n\to\infty}\mathbb P\big(\Sigma(n,k,M)\ge\varepsilon\big)\le\varepsilon.\qquad(7.5)$$

Before proving Lemma 7.1, let us first explain why it entails Proposition 5.2.

Proof of Proposition 5.2 in Case 1. By the Skorokhod representation theorem and a diagonal argument, we can assume that the convergence $(\sigma_n T^n,\mu_n,B^n_m)\to(\mathcal T,\mu,B_m)$ holds almost surely in the $m$-pointed Gromov–Prokhorov topology for all $m\ge 1$. Since the length measure $\ell_n$ (resp. $\ell$) depends continuously on the metric of $T^n$ (resp. the metric of $\mathcal T$), according to Proposition 2.23 of [39] this implies that, for each $k\ge 1$,
$$\big(\sigma_n R^n_k,\ \theta_0^2\sigma_n\ell_n|_{R^n_k}\big)\ \to\ \big(R_k,\ \theta_0^2\ell|_{R_k}\big),\qquad(7.6)$$
almost surely in the Gromov–Hausdorff–Prokhorov topology. On the other hand, we easily deduce from the convergence of the vector $B^n_m$ and (H) that, for each fixed $m\ge 1$,
$$\Big(\sigma_n R^n_k,\ \sum_{i=1}^{m}\frac{p_{ni}}{\sigma_n}\delta_i\Big|_{R^n_k}\Big)\ \to\ \Big(R_k,\ \sum_{i=1}^{m}\theta_i\delta_{B_i}\Big|_{R_k}\Big),\qquad(7.7)$$
almost surely in the Gromov–Hausdorff–Prokhorov topology. In the following, we write $d^{n,k}_{\mathrm P}$ (resp. $d^k_{\mathrm P}$) for the Prokhorov distance on the finite measures on the set $R^n_k$ (resp. $R_k$). In particular, since the measures below are all restricted to either $R^n_k$ or $R_k$, we omit the notation $|_{R^n_k}$, $|_{R_k}$ when the meaning is clear from the context. We write
$$\mathrm{Kt}_m(\mathcal L):=\theta_0^2\ell+\sum_{i=1}^{m}\theta_i\delta_{B_i}$$
for the cut-off measure of $\mathcal L$ at level $m$. By Lemma 5.1, the restriction of $\mathcal L$ to $R_k$ is a finite measure. Therefore, $\mathrm{Kt}_m(\mathcal L)\to\mathcal L$ almost surely in $d^k_{\mathrm P}$ as $m\to\infty$.

Now fix some $\varepsilon>0$.
By Lemma 7.1, we can choose some $M=M(k,\varepsilon)$ such that (7.5) holds, as well as
$$\mathbb P\big(d^k_{\mathrm P}(\mathrm{Kt}_M(\mathcal L),\mathcal L)\ge\varepsilon\big)\le\varepsilon.\qquad(7.8)$$
Define now the approximation
$$\vartheta_{n,M}:=\theta_0^2\sigma_n\ell_n+\sum_{i\le M}\frac{p_{ni}}{\sigma_n}\delta_i.$$
Then, recalling the definition of $\mathcal L^n$ in (5.4), and using (7.5) and (7.4), we obtain
$$\limsup_{n\to\infty}\mathbb P\big(d^{n,k}_{\mathrm P}(\vartheta_{n,M},\mathcal L^n)\ge\varepsilon\big)\le\varepsilon.\qquad(7.9)$$
We notice that
$$\big(\sigma_n R^n_k,\ \vartheta_{n,M}\big)\ \to\ \big(R_k,\ \mathrm{Kt}_M(\mathcal L)\big)\qquad(7.10)$$
almost surely in the Gromov–Hausdorff–Prokhorov topology, as a combined consequence of (7.6) and (7.7). Finally, by the triangle inequality, we deduce from (7.8), (7.9) and (7.10) that
$$\limsup_{n\to\infty}\mathbb P\big(d_{\mathrm{GHP}}\big((\sigma_n R^n_k,\mathcal L^n),\,(R_k,\mathcal L)\big)\ge\varepsilon\big)\le\varepsilon,$$
for any $\varepsilon>0$, which concludes the proof.

Proof of Lemma 7.1. We first consider the case $k=1$. Define
$$D_n:=d_{T^n}\big(r(T^n),V^n_1\big),\qquad F^{\mathcal L}_n(l):=L^*_n\big(B(r(T^n),l)\cap R^n_1\big),\qquad(7.11)$$
where $B(x,l)$ denotes the ball in $T^n$ centered at $x$ and with radius $l$. Then the function $F^{\mathcal L}_n$ determines the measure $L^*_n|_{R^n_1}$ in the same way a distribution function determines a finite measure on $\mathbb R_+$. Let $(X^n_j,j\ge 0)$ be a sequence of i.i.d. random variables of distribution $p_n$. We define $R^n_0=0$ and, for $m\ge 1$,
$$R^n_m=\inf\big\{j>R^n_{m-1}:\ X^n_j\in\{X^n_0,X^n_1,\cdots,X^n_{j-1}\}\big\},$$
the $m$-th repeat time of the sequence. For $l\ge 0$, we set
$$F_n(l):=\sum_{j=0}^{l\wedge(R^n_1-1)}\sum_{i>m_n}\frac{p_{ni}}{\sigma_n}\mathbf 1_{\{X^n_j=i\}}.$$
According to the construction of the birthday tree in [21] and Corollary 3 there, we have
$$\big(D_n,\,F^{\mathcal L}_n(\cdot)\big)\ \overset{d}{=}\ \big(R^n_1-1,\,F_n(\cdot)\big).\qquad(7.12)$$
Let $q_n\ge 0$ be defined by $q_n=\sum_{i>m_n}p_{ni}^2$. Then (7.3) entails $\lim_{n\to\infty}q_n/\sigma_n^2=\theta_0^2$. For $l\ge 0$, we set
$$Z_n(l):=\Big|F_n(l)-\frac{q_n}{\sigma_n}\big((l+1)\wedge R^n_1\big)\Big|.$$
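Corollary 3 of [21] reduces distances in the birthday tree to repeat times of the driving i.i.d. sequence. The sketch below (hypothetical names, not code from the paper) computes repeat times and, in the uniform case $p_{ni}=1/n$ (so that $\sigma_n=n^{-1/2}$), checks numerically that $R^n_1$ is of order $1/\sigma_n$, with $\mathbb E[R^n_1]\approx\sqrt{\pi n/2}$ by the classical birthday-problem asymptotics:

```python
import math
import random

def repeat_times(seq, how_many):
    """R_1 < R_2 < ...: the indices j at which seq[j] has already
    appeared among seq[0], ..., seq[j-1]."""
    seen = set()
    times = []
    for j, x in enumerate(seq):
        if x in seen:
            times.append(j)
            if len(times) == how_many:
                break
        seen.add(x)
    return times

rng = random.Random(42)
n = 10_000
trials = [repeat_times([rng.randrange(n) for _ in range(n)], 1)[0]
          for _ in range(200)]
print(sum(trials) / len(trials), math.sqrt(math.pi * n / 2))  # both near 125
```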
We claim that $\sup_{l\ge 0}Z_n(l)\to 0$ in probability as $n\to\infty$. To see this, observe first that
$$Z_n(l)=\Bigg|\sum_{j=0}^{l\wedge(R^n_1-1)}\bigg(\sum_{i>m_n}\frac{p_{ni}}{\sigma_n}\mathbf 1_{\{X^n_j=i\}}-\frac{q_n}{\sigma_n}\bigg)\Bigg|,$$
where the terms in the parentheses are independent, centered, and of variance
$$\chi_n:=\sigma_n^{-2}\sum_{i>m_n}p_{ni}^3-\sigma_n^{-2}q_n^2.$$
Therefore, Doob's maximal inequality entails that, for any fixed number $N>0$,
$$\mathbb E\bigg[\Big(\sup_{l\ge 0}Z_n(l)\,\mathbf 1_{\{R^n_1\le N/\sigma_n\}}\Big)^2\bigg]\ \le\ \mathbb E\Bigg[\sup_{l<\lfloor N/\sigma_n\rfloor}\bigg(\sum_{j=0}^{l}\Big(\sum_{i>m_n}\frac{p_{ni}}{\sigma_n}\mathbf 1_{\{X^n_j=i\}}-\frac{q_n}{\sigma_n}\Big)\bigg)^2\Bigg]\ \le\ 4N\sigma_n^{-1}\chi_n\ \le\ 4N\,\frac{q_n}{\sigma_n^2}\cdot\frac{p_{n m_n}}{\sigma_n}\ \to\ 0,$$
by (7.2) and the fact that $q_n/\sigma_n^2\to\theta_0^2$. In particular, it follows that
$$\sup_{l\ge 0}Z_n(l)\,\mathbf 1_{\{R^n_1\le N/\sigma_n\}}\ \to\ 0\qquad(7.13)$$
in probability as $n\to\infty$. On the other hand, the convergence of the $p_n$-trees in (5.1) implies that the family of distributions of $(\sigma_n D_n,n\ge 1)$ is tight. By (7.12), this entails that
$$\lim_{N\to\infty}\limsup_{n\to\infty}\mathbb P\big(R^n_1>N/\sigma_n\big)=0.\qquad(7.14)$$
Combining this with (7.13) proves the claim.

The generalized distribution function, as in (7.11), of the discrete length measure $\ell_n$ is $l\mapsto l\wedge D_n$. Thus, since $\sup_l Z_n(l)\to 0$ in probability, the identity in (7.12) and $q_n/\sigma_n^2\to\theta_0^2$ imply that
$$d_{\mathrm P}\big(L^*_n|_{R^n_1},\ \theta_0^2\sigma_n\ell_n|_{R^n_1}\big)\ \to\ 0$$
in probability as $n\to\infty$. This is exactly (7.4) for $k=1$.

In the general case where $k\ge 2$, we set $D_{n,1}:=D_n$ and
$$D_{n,m}:=d_{T^n}\big(b_n(m),V^n_m\big),\qquad m\ge 2,$$
where $b_n(m)$ denotes the branch point of $T^n$ between $V^n_m$ and $R^n_{m-1}$, i.e., $b_n(m)\in R^n_{m-1}$ is such that $[\![r(T^n),V^n_m]\!]\cap R^n_{m-1}=[\![r(T^n),b_n(m)]\!]$. We also define $F^{\mathcal L}_{n,1}(l):=F^{\mathcal L}_n(l)$ and
$$F^{\mathcal L}_{n,m}(l):=L^*_n\big(B(b_n(m),l)\cap\,]\!]b_n(m),V^n_m]\!]\big),\qquad m\ge 2.$$
Then, conditionally on $R^n_k$, the vector $(F^{\mathcal L}_{n,1}(\cdot),\cdots,F^{\mathcal L}_{n,k}(\cdot))$ determines the measure $L^*_n|_{R^n_k}$ for the same reason as before. If we set $F_{n,1}(l):=F_n(l)$ and
$$F_{n,m}(l):=\sum_{j=R^n_{m-1}+1}^{l\wedge(R^n_m-1)}\ \sum_{i>m_n}\frac{p_{ni}}{\sigma_n}\mathbf 1_{\{X^n_j=i\}},\qquad m\ge 2,$$
then Corollary 3 of [21] entails the equality in distribution
$$\big(\big(D_{n,m},\,F^{\mathcal L}_{n,m}(\cdot)\big),\ 1\le m\le k\big)\ \overset{d}{=}\ \big(\big(R^n_m-R^n_{m-1}-1,\,F_{n,m}(\cdot)\big),\ 1\le m\le k\big).$$
Then, by the same arguments as before, we can show that
$$\max_{1\le m\le k}\ \sup_{l\ge 0}\ \Big|F_{n,m}(l)-\frac{q_n}{\sigma_n}\big(l\wedge(R^n_m-R^n_{m-1}-1)\big)\Big|\ \to\ 0$$
in probability as $n\to\infty$. This then implies (7.4) by the same type of argument as before.

Now let us consider (7.5). The idea is quite similar. For each $M\ge 1$, we set
$$\tilde Z_{n,M}:=\sum_{j=0}^{R^n_1-1}\ \sum_{M<i\le m_n}\frac{p_{ni}}{\sigma_n}\mathbf 1_{\{X^n_j=i\}},$$
and one can check that, for every fixed $N>0$,
$$\lim_{M\to\infty}\limsup_{n\to\infty}\mathbb E\big[\tilde Z_{n,M}\,\mathbf 1_{\{R^n_1\le N/\sigma_n\}}\big]=0.\qquad(7.15)$$
By Markov's inequality,
$$\mathbb P\big(\tilde Z_{n,M}>\varepsilon\big)\ \le\ \varepsilon^{-1}\,\mathbb E\big[\tilde Z_{n,M}\,\mathbf 1_{\{R^n_1\le N/\sigma_n\}}\big]+\mathbb P\big(R^n_1>N/\sigma_n\big).$$
According to (7.14) and (7.15), we can first choose some $N=N(\varepsilon)$ and then some $M=M(N(\varepsilon),\varepsilon)=M(\varepsilon)$ such that $\limsup_n\mathbb P(\tilde Z_{n,M}>\varepsilon)<\varepsilon$. On the other hand, Corollary 3 of [21] says that $\Sigma(n,1,M)$ is distributed like $\tilde Z_{n,M}$. Then we have shown (7.5) for $k=1$. The general case can be treated in the same way, and we omit the details.

So far, we have completed the proof of Proposition 5.2 in the case where $\theta$ has all strictly positive entries. The other cases are even simpler.

Case 2. Suppose that $\theta_i=0$ for all $i\ge 1$; then we take $m_n=n$, and the same argument applies.

Case 3. Suppose that $\theta$ has a finite length $I$; then it suffices to take $m_n=I$, and we can proceed as before.

References

[1] R. Abraham and J.-F. Delmas. Record process on the continuum random tree. ALEA, 10:225–251, 2013.
[2] R. Abraham and J.-F. Delmas. The forest associated with the record process on a Lévy tree. Stochastic Processes and their Applications, 123:3497–3517, 2013.
[3] L. Addario-Berry, N.
Broutin, and C. Holmgren. Cutting down trees with a Markov chainsaw. The Annals of Applied Probability, 2013. To appear, arXiv:1110.6455 [math.PR].
[4] D. Aldous. The continuum random tree. I. Ann. Probab., 19(1):1–28, 1991.
[5] D. Aldous. The continuum random tree. III. Ann. Probab., 21(1):248–289, 1993.
[6] D. Aldous and J. Pitman. The standard additive coalescent. Ann. Probab., 26(4):1703–1726, 1998.
[7] D. Aldous and J. Pitman. A family of random trees with random edge lengths. Random Structures Algorithms, 15(2):176–195, 1999.
[8] D. Aldous and J. Pitman. Inhomogeneous continuum random trees and the entrance boundary of the additive coalescent. Probab. Theory Related Fields, 118(4):455–482, 2000.
[9] D. Aldous and J. Pitman. Invariance principles for non-uniform random mappings and trees. In Asymptotic combinatorics with application to mathematical physics (St. Petersburg, 2001), volume 77 of NATO Sci. Ser. II Math. Phys. Chem., pages 113–147. Kluwer Acad. Publ., Dordrecht, 2002.
[10] D. Aldous, G. Miermont, and J. Pitman. The exploration process of inhomogeneous continuum random trees, and an extension of Jeulin's local time identity. Probab. Theory Related Fields, 129(2):182–218, 2004.
[11] D. Aldous, G. Miermont, and J. Pitman. Weak convergence of random p-mappings and the exploration process of inhomogeneous continuum random trees. Probab. Theory Related Fields, 133(1):1–17, 2005.
[12] D. J. Aldous. The random walk construction of uniform spanning trees and uniform labelled trees. SIAM J. Discrete Math., 3(4):450–465, 1990.
[13] J. Bertoin. A fragmentation process connected to Brownian motion. Probability Theory and Related Fields, 117:289–301, 2000.
[14] J. Bertoin. Eternal additive coalescents and certain bridges with exchangeable increments. Annals of Probability, 29:344–360, 2001.
[15] J. Bertoin. Self-similar fragmentations. Annales de l'Institut Henri Poincaré: Probabilités et Statistiques, 38(3):319–340, 2002.
[16] J. Bertoin and G. Miermont.
The cut-tree of large Galton-Watson trees and the Brownian CRT. The Annals of Applied Probability, 23:1469–1493, 2013.
[17] P. Billingsley. Convergence of probability measures. John Wiley & Sons Inc., New York, 1968.
[18] J.-M. Bismut. Last exit decompositions and regularity at the boundary of transition probabilities. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 69:65–98, 1985.
[19] A. Broder. Generating random spanning trees. In Proc. 30th IEEE Symp. Found. Comp. Sci., pages 442–447, 1989.
[20] N. Broutin and M. Wang. Reversing the cut tree of the Brownian CRT. arXiv:1408.2924 [math.PR], 2014.
[21] M. Camarri and J. Pitman. Limit distributions and random trees derived from the birthday problem with unequal probabilities. Electron. J. Probab., 5:no. 2, 18 pp. (electronic), 2000.
[22] A. Cayley. A theorem on trees. Quarterly Journal of Pure and Applied Mathematics, 23:376–378, 1889.
[23] D. Dieuleveut. The vertex-cut-tree of Galton-Watson trees converging to a stable tree. arXiv:1312.5525 [math.PR], 2013.
[24] M. Drmota, A. Iksanov, M. Moehle, and U. Roesler. A limiting distribution for the number of cuts needed to isolate the root of a random recursive tree. Random Structures and Algorithms, 34:319–336, 2009.
[25] T. Duquesne and J.-F. Le Gall. Probabilistic and fractal aspects of Lévy trees. Probab. Theory Related Fields, 131(4):553–603, 2005.
[26] S. N. Evans. Probability and real trees, volume 1920 of Lecture Notes in Mathematics. Springer, Berlin, 2008. Lectures from the 35th Summer School on Probability Theory held in Saint-Flour, July 6–23, 2005.
[27] S. N. Evans and J. Pitman. Construction of Markovian coalescents. Ann. Inst. H. Poincaré Probab. Statist., 34(3):339–383, 1998.
[28] S. N. Evans, J. Pitman, and A. Winter. Rayleigh processes, real trees, and root growth with re-grafting. Probab. Theory Related Fields, 134(1):81–126, 2006.
[29] J. Fill, N. Kapur, and A. Panholzer. Destruction of very simple trees.
Algorithmica, 46:345–366, 2006.
[30] M. Gromov. Metric structures for Riemannian and non-Riemannian spaces. Modern Birkhäuser Classics. Birkhäuser Boston Inc., Boston, MA, english edition, 2007.
[31] B. Haas and G. Miermont. Scaling limits of Markov branching trees with applications to Galton-Watson and random unordered trees. Ann. Probab., 40(6):2589–2666, 2012.
[32] C. Holmgren. Random records and cuttings in binary search trees. Combinatorics, Probability and Computing, 19:391–424, 2010.
[33] C. Holmgren. A weakly 1-stable limiting distribution for the number of random records and cuttings in split trees. Advances in Applied Probability, 43:151–177, 2011.
[34] A. Iksanov and M. Möhle. A probabilistic proof of a weak limit law for the number of cuts needed to isolate the root of a random recursive tree. Electronic Communications in Probability, 12:28–35, 2007.
[35] J. Jacod and A. N. Shiryaev. Limit theorems for stochastic processes, volume 288 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, Berlin, 1987. ISBN 3-540-17882-1.
[36] S. Janson. Random cutting and records in deterministic and random trees. Random Structures Algorithms, 29(2):139–179, 2006. ISSN 1042-9832.
[37] J.-F. Le Gall and Y. Le Jan. Branching processes in Lévy processes: the exploration process. Ann. Probab., 26(1):213–252, 1998.
[38] W. Löhr. Equivalence of Gromov-Prohorov- and Gromov's $\square_\lambda$-metric on the space of metric measure spaces. Electron. Commun. Probab., 18:no. 17, 10, 2013.
[39] W. Löhr, G. Voisin, and A. Winter. Convergence of bi-measure R-trees and the pruning process. arXiv:1304.6035 [math.PR], 2013.
[40] A. Meir and J. Moon. Cutting down random trees. Journal of the Australian Mathematical Society, 11:313–324, 1970.
[41] G. Miermont. Tessellations of random maps of arbitrary genus. Ann. Sci. Éc. Norm. Supér. (4), 42(5):725–781, 2009.
[42] A. Panholzer. Cutting down very simple trees. Quaestiones Mathematicae, 29:211–228, 2006.
[43] A. Rényi.
On the enumeration of trees. In Combinatorial Structures and their Applications (Proc. Calgary Internat. Conf., Calgary, Alta., 1969), pages 355–360. Gordon and Breach, New York, 1970.