Asymptotic Enumeration and Limit Laws for Multisets: the Subexponential Case
Konstantinos Panagiotou∗ and Leon Ramzews†

Abstract
For a given combinatorial class C we study the class G = Mset(C) satisfying the multiset construction, that is, any object in G is uniquely determined by a set of C-objects paired with their multiplicities. For example, Mset(N) is just the class of number partitions of positive integers, a prominent and well-studied case. The multiset construction appears naturally in the study of unlabelled objects, for example graphs or various objects related to number partitions. Our main result establishes the asymptotic size of the set G_{n,N} that contains all multisets in G having size n and being comprised of N objects from C, as n and N tend to infinity and when the counting sequence of C is governed by subexponential growth; this is a particularly important setting in combinatorial applications. Moreover, we study the component distribution of typical objects from G_{n,N} and we discover a unique phenomenon that we baptise extreme condensation: taking away the largest component as well as all the components of the smallest possible size, we are left with an object which converges in distribution as n, N → ∞. The exact distribution of the limiting object is also retrieved. Moreover and rather surprisingly, in stark contrast to analogous results for labelled objects, the results here hold uniformly in N.

1 Introduction

Let C be a combinatorial class, that is, a countable set endowed with a size function |·| : C → N such that C_n := {C ∈ C : |C| = n} contains only finitely many objects for all n ∈ N. Assume that C ≠ ∅. Then the class of C-multisets G = Mset(C) consists of all objects of the form

{(C_1, d_1), ..., (C_k, d_k)},  C_i ∈ C, d_i ∈ N, k ∈ N, 1 ≤ i ≤ k,

where (C_i)_{1≤i≤k} are pairwise distinct and d_i describes the multiplicity of the object C_i in the multiset. In simple words, a C-multiset is a finite unordered collection of elements from C such that multiple occurrences of each element are admissible.
For example, if C = N, then Mset(C) contains all partitions of natural numbers, a prominent object. The multiset construction is omnipresent in combinatorial settings, for example when C is some class of connected unlabelled graphs; this makes G the class of unlabelled graphs having connected components in C. For many historical references and examples we refer the reader to the excellent books [30, 19]. An alternative and instructive way to describe multisets of size n ∈ N is to make the connection to number partitions explicit as follows. First, choose a number partition of n. Then, assign to each of the parts an element of that size from C. Hence, multisets are also called weighted integer partitions, frequently encountered in the context of the statistical physics of an ideal gas. There, c_k := |{C ∈ C : |C| = k}| describes the different possible states of a particle at energy level k ∈ N, see [47] for a thorough overview.

Given G = {(C_1, d_1), ..., (C_k, d_k)} ∈ G we denote by |G| := Σ_{1≤i≤k} d_i |C_i| the size and by κ(G) := Σ_{1≤i≤k} d_i the number of components of G. We further set

G_n := {G ∈ G : |G| = n} and G_{n,N} := {G ∈ G_n : κ(G) = N},  n, N ∈ N.

Additionally, we define G_n and G_{n,N} to be the multisets drawn uniformly at random from G_n and G_{n,N}, respectively. A vast amount of literature is dedicated to the enumerative problem of determining g_n := |G_n|, and sometimes also g_{n,N} := |G_{n,N}|, asymptotically under various general assumptions or for specific examples of multisets such as integer partitions, plane partitions or unlabelled (un-)rooted forests, see e.g. [25, 32, 38, 29, 26, 23, 24].

∗ Department of Mathematics, Ludwig-Maximilians-Universität München. E-mail: [email protected].
† Department of Mathematics, Ludwig-Maximilians-Universität München. E-mail: [email protected]. Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), Project PA 2080/3-1.
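The enumerative problem just described is also easy to experiment with: g_{n,N} can be computed exactly for small parameters from the counting sequence (c_k) alone, using the classical product form G(x, y) = Π_{k≥1} (1 − y x^k)^{−c_k} of the bivariate generating function discussed below. The following dynamic-programming sketch is our own illustration (all function names are ours, not the paper's); it is sanity-checked against integer partitions, where c_k = 1 for all k.

```python
from math import comb

def multiset_counts(c, n_max):
    """g[n][N] = number of multisets of total size n with N components,
    where c[k] is the number of available objects of size k.
    Based on the product form G(x, y) = prod_{k>=1} (1 - y x^k)^(-c_k)."""
    g = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    g[0][0] = 1
    for k in range(1, n_max + 1):
        ck = c.get(k, 0)
        if ck == 0:
            continue
        new = [[0] * (n_max + 1) for _ in range(n_max + 1)]
        for n in range(n_max + 1):
            for N in range(n_max + 1):
                if g[n][N]:
                    d = 0
                    while n + k * d <= n_max and N + d <= n_max:
                        # comb(ck + d - 1, d) multisets of d objects of size k
                        new[n + k * d][N + d] += g[n][N] * comb(ck + d - 1, d)
                        d += 1
        g = new
    return g

# Sanity check on integer partitions (c_k = 1 for all k): g[n][N] is the
# number of partitions of n into exactly N parts.
g = multiset_counts({k: 1 for k in range(1, 11)}, 10)
print(g[5][2], g[6][3], sum(g[10]))  # prints: 2 3 42
```

For instance, g[5][2] = 2 recovers the two partitions 4+1 and 3+2 of 5 into two parts, and summing over N returns p(10) = 42.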
Note that this is directly linked to limiting distributions and local limit theorems for the number of components in G_n, for example investigated in [18, 5, 27, 35]. Another interesting research topic which has received a lot of attention is devoted to finding the asymptotic behaviour of the global shape of G_n and G_{n,N} in terms of phenomena like condensation or gelation, cf. [18, 34, 3, 4, 36, 46]. Section 1.1 highlights some of these results in more detail.

We associate to C and G the (ordinary) generating series in two formal variables x and y,

C(x) := Σ_{k∈N} |C_k| x^k and G(x, y) := Σ_{k,ℓ∈N} |G_{k,ℓ}| x^k y^ℓ,

and we use the standard notation g_{n,N} = |G_{n,N}| = [x^n y^N] G(x, y) for all n, N ∈ N. These two power series are known to fulfil the fundamental relation, see for example [30, 19],

G(x, y) = exp( Σ_{j≥1} (y^j / j) C(x^j) ).   (1.1)

In this paper we will consider only a specific case, namely when the counting sequence (c_n)_{n∈N} is subexponential, and our aim is to study the class G_{n,N} – what is g_{n,N}, what do typical objects look like? – as n → ∞ and for all 1 ≤ N ≤ n. Such subexponential sequences appear naturally in combinatorial contexts, the main reason being the appearance of square-root singularities in the analysis of the corresponding generating functions. Let us give an example that is prototypical for an application in this context.

Example.
Let T be the class of unlabelled trees, that is, isomorphism classes of connected and acyclic graphs. Then F = Mset(T) is the class of unlabelled forests. Moreover, see [37], the number of unlabelled trees satisfies |T_n| ∼ c · n^{−5/2} · ρ^{−n} for some c > 0 and 0 < ρ < 1. What can we say about F_{n,N}?

Similar counting sequences, in particular with a polynomial term that is n^{−α} for some α >
1, appear in a variety of contexts in graph enumeration; so-called subcritical graph classes ([8, 13]) that include outerplanar and series-parallel graphs are prominent examples. All these counting sequences – and many more – are subexponential. Let us proceed with a formal definition. We say that C(x) = Σ_{k≥1} c_k x^k, or (c_k)_{k≥1} respectively, is subexponential with radius of convergence ρ > 0, if

c_{n−1}/c_n ∼ ρ and c_n^{−1} Σ_{0≤k≤n} c_k c_{n−k} ∼ 2C(ρ) < ∞,  n → ∞.

Central examples of subexponential sequences are of the form c_n ∼ λ(n) · n^{−α} · ρ^{−n} for α > 1 and λ(n) any slowly varying function, as in the previously mentioned examples, see [17].

In the rest of the section we will assume that C is subexponential. Further, let m ≡ m(C) ∈ N be such that c_m > 0 and c_k = 0 for all k < m. That is, the "smallest" element in C has size m. In our first result we determine [x^n y^N] G(x, y) for large n, N such that the difference n − mN also diverges; so, the only case we do not treat is when n = mN + O(1), that is, all components in the multiset are of smallest size m, with a bounded number of bounded exceptions.

Theorem 1.1.
Suppose that C(x) is subexponential with radius of convergence 0 < ρ < 1. Then, as n, N, n − mN → ∞,

[x^n y^N] G(x, y) ∼ A · N^{c_m − 1} · c_{n − m(N−1)},   (1.2)

where

A = (1/Γ(c_m)) · exp( Σ_{j≥1} (C(ρ^j) − c_m ρ^{jm}) / (j ρ^{jm}) ).

Let us make two remarks. First, the restriction to 0 < ρ < 1 is natural in a combinatorial setting: if we had ρ ≥ 1, then the counting sequence would satisfy c_n → 0, which would not be feasible. However, note that actually the validity of the theorem is not restricted solely to combinatorial applications, as we have never required that the c_n's are natural numbers. Second and more importantly, note the right hand side of (1.2): this formula establishes an explicit connection between g_{n,N} and c_{n−m(N−1)}, that is, we do not need the actual counting sequence of C to make statements about g_{n,N}. Moreover, a closer look at this formula reveals an unexpected fact. In a combinatorial setting, note that the number of possible ways to choose a multiset of N objects from C_m is given by binom(c_m + N − 1, N) ∼ N^{c_m − 1}/Γ(c_m) (this is just a number partition of N in c_m parts). Hence the right hand side of (1.2) is proportional to the number of possibilities to choose N objects from C_m and one object from C_{n−m(N−1)}; that is, a "typical" object from G_{n,N} should essentially consist of a big component with more or less n − mN vertices and N − 1 components of the smallest size m. This is rather extreme, as the largest possible size of a component of an object in G_{n,N} is n − m(N−1).

Our next result formalizes this intuition. For G = {(C_1, d_1), ..., (C_k, d_k)} ∈ G denote by L(G) := max_{1≤i≤k} |C_i| the size of one of its largest components. We show that except for a quantity O_p(1), that is, something that is bounded in probability, the largest component in a uniformly drawn object G_{n,N} from G_{n,N} has indeed size n − mN.

Theorem 1.2.
Suppose that C(x) is subexponential with radius of convergence 0 < ρ < 1. Then, as n, N, n − mN → ∞,

L(G_{n,N}) = n − mN + O_p(1).

The proof can be found in Section 3.3. We call the phenomenon established in Theorem 1.2 extreme condensation: we observe that typically our objects have a giant component that is essentially as large as possible, that is, its size is close to the largest possible size n − m(N−1). In particular, virtually all other components are of the smallest possible size m. We are not aware of any other object with a comparable behaviour. Moreover, this behaviour is surprising for one more reason: if we consider the labelled counterparts of our unlabelled objects, in our running example trees, then the typical structure is well known ([28, 39]) to undergo various phase transitions (from subcritical to condensation) depending on the number of components, but it never becomes as extreme as observed here. See Section 1.2 for a more detailed discussion.

Our final main result addresses the last remaining bit and describes the shape of a typical object from G_{n,N} when we remove a component of largest size and all components of the smallest possible size m. We show that the remainder is a multiset of stochastically bounded size and number of components, for which we determine the exact limiting distribution in the next theorem. To formulate this statement, we need to introduce the class C_{>m} = ∪_{k>m} C_k equipped with the modified size function |C|_{>m} := |C| − m for C ∈ C_{>m}. The resulting generating function C_{>m}(x) is hence given by (C(x) − c_m x^m)/x^m; here subtracting c_m x^m accounts for the fact that we remove objects of size m, and by dividing through x^m all objects in C_k are now counted as objects of size k − m for k > m. Similar to the formula in (1.1) (setting y = 1) the class of all multisets G_{>m} := Mset(C_{>m}) therefore has generating series

G_{>m}(x) := exp( Σ_{j≥1} (C(x^j) − c_m x^{jm}) / (j x^{jm}) ).
Further, the size of an object G in G_{>m} is given by |G|_{>m} := |G| − mκ(G). One readily verifies that C_{>m}(x) is also subexponential with radius of convergence ρ and that G_{>m}(ρ) < ∞. Define a random variable Γ_{G_{>m}}(ρ) on G_{>m} specified by

Pr[Γ_{G_{>m}}(ρ) = G] = ρ^{|G|_{>m}} / G_{>m}(ρ) = exp( − Σ_{j≥1} (C(ρ^j) − c_m ρ^{jm}) / (j ρ^{jm}) ) · ρ^{|G| − mκ(G)},  G ∈ G_{>m}.   (1.3)

For G ∈ G let the remainder R(G) be the multiset obtained after removing all tuples (C, d) ∈ G with C ∈ C_m and one object of largest size from G (this choice can be done in a canonical way by numbering all elements in C). That means, if the object of largest size has multiplicity d > 1, replace d by d − 1; otherwise remove the object and its multiplicity completely from the set. Then the distribution in (1.3) is the limit of the remainder R(G_{n,N}), see Section 3.4 for the proof.

Theorem 1.3. Suppose that C(x) is subexponential with radius of convergence 0 < ρ < 1. Then, as n, N, n − mN → ∞, in distribution,

R(G_{n,N}) → Γ_{G_{>m}}(ρ).

We close this introduction and the presentation of the main results by catching up with our previous example regarding the class T of unlabelled trees and F = Mset(T), the unlabelled forests.

Example (continued).
Theorem 1.1 is directly applicable to the class of unlabelled trees, where m = 1 and c_1 = 1. We readily obtain that the number of unlabelled forests of size n with N components satisfies

f_{n,N} ∼ A · |T_{n−N+1}| ∼ A′ · (n − N)^{−5/2} ρ^{−(n−N)},  n, N, n − N → ∞,

for some constants A, A′ > 0. Moreover, for this range of N, we obtain that with high probability a random unlabelled forest contains a huge tree with n − N + O_p(1) vertices, and N + O_p(1) "trivial" trees that consist of a single vertex.

Proof Strategy
The main idea in the proof is to consider a randomized algorithm/stochastic process that generates C-multisets. As it turns out, such an algorithm that outputs elements from G (with a priori no control on the size or the number of components!) can be designed by defining the so-called Boltzmann distribution on G, see Section 3.1 for all details. The crucial property of this algorithm is that all choices it makes are independent. Our first contribution here is to establish explicitly the connection between the choices of the algorithm and its output; hence the probability that the output is in G_{n,N} can be linked to an event regarding the actual choices of the algorithm. Our second and main contribution is then to actually compute the probability that this event occurs; as we will see, this is not at all an easy task, since the involved random variables are not identically distributed and interfere in a complex way with the parameters of the generated object.

Plan of the Paper
Subsequently, we embed our results in the corpus of existing literature in Section 1.1. In Section 1.2 we compare the labelled and the unlabelled setting in light of our results, followed by an application to Benjamini–Schramm convergence in Section 1.3. Then we collect and prove some results about subexponential power series tailored to our needs in Section 2. In Section 3 all proofs are presented, where each of the main results is treated in a separate subsection: Theorem 1.1 is proven in Section 3.2, Theorem 1.2 in Section 3.3 and Theorem 1.3 in Section 3.4.
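Before fixing notation, the mechanism behind extreme condensation can be previewed computationally. It rests on the "single big jump" principle for subexponential random variables, made precise in Lemma 2.2 (iii) below: conditioned on a large total, essentially one summand carries everything. The following toy computation is ours, not taken from the paper (the heavy-tailed weights w_k = k^{−3} and all parameter choices are illustrative assumptions); it evaluates the conditional law of the maximum of three weighted summands exactly.

```python
def big_jump_demo(n=300, K=20):
    # Heavy-tailed toy weights w_k = k^(-3) (our arbitrary choice).
    w = [0.0] + [k ** -3.0 for k in range(1, n + 1)]
    # s2[j]: total weight of ordered pairs (a, b) with a + b = j.
    s2 = [sum(w[a] * w[j - a] for a in range(1, j)) for j in range(n + 1)]
    # Total weight of ordered triples (a, b, c) with a + b + c = n.
    total = sum(w[a] * s2[n - a] for a in range(1, n - 1))
    # Weight of triples whose maximum is at least n - K: the two small
    # summands then total j <= K < n - K, so exactly one summand is
    # "big"; the factor 3 chooses its position.
    big = 3 * sum(w[n - j] * s2[j] for j in range(2, K + 1))
    # Conditional probability Pr[max >= n - K | sum = n].
    return big / total

print(big_jump_demo())  # close to 1: one summand takes nearly all of n
```

With these weights the conditional probability that one of the three summands already exceeds n − 20 is well above 0.9, illustrating why the largest component in Theorem 1.2 misses the maximal size only by an O_p(1) amount.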
Notation
We shall use the following (standard) notation. Given two real-valued sequences (a_k)_{k∈N} and (b_k)_{k∈N} with b_k ≠ 0 for all k ≥ k_0 for some k_0 ∈ N, we write, as n → ∞,

i) a_n ∼ b_n ("a_n is asymptotically equal to b_n") if lim_{n→∞} a_n/b_n = 1,

ii) a_n ∝ b_n ("a_n is asymptotically proportional to b_n") if there exist constants 0 < A_1 ≤ A_2 such that A_1 ≤ liminf_{n→∞} |a_n/b_n| ≤ limsup_{n→∞} |a_n/b_n| ≤ A_2,

iii) a_n = o(b_n) if lim_{n→∞} a_n/b_n = 0.

For a sequence of real-valued random variables (X_k)_{k∈N} and a non-negative sequence (a_k)_{k∈N} we write X_n = O_p(a_n) ("X_n is stochastically bounded by a_n") if for all ε > 0 there is K > 0 such that limsup_{n→∞} Pr[|X_n| ≥ K a_n] ≤ ε. In the case a_k ≡ 1 we simply say "X_n is stochastically bounded".

We will use the following notation with respect to formal power series. For a k-dimensional vector of formal variables x = (x_1, ..., x_k) and d = (d_1, ..., d_k) ∈ N^k we write x^d for the monomial x_1^{d_1} · · · x_k^{d_k}. A multivariate power series with real-valued coefficients is given by A(x) = Σ_{d∈N^k} a_d x^d, where the a_d's are real numbers. For d ∈ N^k we write [x^d] A(x) = a_d for the coefficient of x^d.

1.1 Other Related Work

In this section we put our results in the broader context of the (asymptotic) enumeration of multisets. The most prominent assumption is that the counting sequence of C fulfils c_n ∼ λ(n) · n^α · ρ^{−n} as n → ∞ for some slowly varying function λ(·) and parameters α ∈ R, 0 < ρ ≤ 1. Then there emerge three cases depending on the parameter α determining the behaviour of C(x) = Σ_{k≥1} c_k x^k at or near its radius of convergence ρ, each giving rise to a fundamentally different picture.

In the expansive case (α > −1) the quantity g_n is well-understood [23]; however, to our knowledge general results about G_n and G_{n,N} without any extra conditions are not known. For example, under the assumption of Meinardus' scheme of conditions [32], implying that ρ = 1, the number of components of G_n fulfils various local limit theorems, see [35]. The authors of [24] show that sequences of the form c_k ≥ C k^α for a constant C > 0 and all k ≥ k_0 for some k_0 ∈ N satisfy Meinardus' conditions. Hence, they call this case quasi-expansive, as many expansive structures such as integer partitions (c_k = 1) and plane partitions (c_k = k) are encapsulated by this approach. For quasi-expansive sequences the size of the largest component of G_n is with high probability of order (1 + o(1)) n^{1/(α+2)} log n, as established by [36]. The broader picture here is that the number of components in G_n is typically unbounded and the size of the largest component sublinear in n. In general, it is reasonable to conjecture that the shape of G_{n,N} depends on the asymptotic regime of N, which can be justified by analysing the case of integer partitions: depending on whether N is O(n^{1/2}) or ω(n^{1/2}), the asymptotic behaviour of the number of partitions of n into N parts is given by different formulas, see [29]. For N ≥ (1 + ε)(√6/(2π)) √n log n and any 0 < ε < 1 it is known that g_{n,N} ∼ g_{n−N} ([26]). Note the differences to our main results for subexponential C(x): no matter the asymptotic regime of N, we have that g_{n,N} is proportional to c_{n−N} if c_m = m = 1.

The logarithmic case (α = −1 and λ ≡ λ(·) constant) is concomitant with similar effects. The number of components in G_n is typically of order λ log n [3, Theorem 8.21] and, denoting by (L_1, ..., L_N) the N largest component sizes of G_n, the vector n^{−1}(L_1, ..., L_N) has a limiting distribution that is Poisson–Dirichlet [3, Theorem 6.8], implying that G_n is composed of several "large" objects.
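Returning to the integer partitions discussed above: the relation g_{n,N} ∼ g_{n−N} from [26] even has an exact finite counterpart once N ≥ n/2. Subtracting 1 from each part maps partitions of n into exactly N parts bijectively onto partitions of n − N into at most N parts, and the constraint "at most N parts" is vacuous when N ≥ n − N. The following check is our own aside (function names are ours):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def p_exact(n, N):
    # Partitions of n into exactly N parts: either some part equals 1
    # (remove it), or all parts are >= 2 (subtract 1 from each of them).
    if n == 0 and N == 0:
        return 1
    if n <= 0 or N <= 0:
        return 0
    return p_exact(n - 1, N - 1) + p_exact(n - N, N)

def p(n):
    # Total number of partitions of n.
    return sum(p_exact(n, N) for N in range(n + 1))

# N = 25 >= 40/2, so the count equals p(40 - 25) = p(15) exactly.
print(p_exact(40, 25), p(15))  # both are 176
```

For smaller N the two quantities differ, which is exactly the regime where the formulas of [29] take over.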
The same holds true for G_{n,N} with N ∈ N fixed, where the limit is given by the ordered spacings of N − 1 iid uniformly distributed random variables (U_i)_{1≤i≤N−1} [3, Theorem 6.9]. For instance, mapping patterns are logarithmic with λ = 1/2, cf. [33].

On the contrary, condensation is observed in the convergent case (α < −1): the largest component of G_n is of size n − O_p(1) and its number of components converges in distribution, see [4]. As mentioned before, unlabelled forests are contained in this case and were studied in [34].

In accordance with the results observed for the convergent case, the number of components in G_n in the subexponential setting has a limiting distribution given by a weighted sum of independent Poisson random variables [5]. Equivalently, this means that g_{n,N} is known for fixed values of N ∈ N. As for the global shape of G_n, the results in [46] imply that almost all the size is concentrated in the largest component, whereas the remainder obtained after removing the largest component converges in distribution to a limit given by the Pólya–Boltzmann distribution discussed in [9].

Basically, most of the proofs in the referenced literature are either conducted from a purely analytical generating function perspective, or they involve in some way the conditioning relation representing the (heavily dependent) component frequencies in G_n of a particular size by independent random variables with negative binomial distributions.

As opposed to analysing the component spectrum as in [3, 4, 23], our proofs are in the style of [43, 46]: we use the Pólya–Boltzmann model representing G_n as random C-objects attached to cycles of a random permutation, which is helpful to get rid of cumbersome appearances of symmetries. Then we show that the size associated to fixpoints is dominant, and the subexponentiality feature often referred to as "single big jump" guarantees that in fact only one object receives the entire possible size.

1.2 Comparison of the Labelled and the Unlabelled Setting

In what follows, we have a closer look at the resemblances and surprising disparities between multisets, which are typically associated to unlabelled structures, and sets of labelled combinatorial structures.
For the sake of brevity, we refer the reader to the books [30, 19] to recall the concept of labelled and unlabelled classes. Another vast source of references and examples is the tour-de-force paper [28] entailing many results about the balls-in-boxes model (Section 11), which by choosing the weight sequence (c_k/k!)_{k∈N} implies the labelled set-construction.

Given a labelled class C^ι we may form the analogon to the multiset construction discussed in this work. Initially we pick C_1, ..., C_k from C^ι and let n be the total size. Then we partition {1, ..., n} into sets L_1, ..., L_k such that |L_i| = |C_i| for all i. Subsequently, we canonically assign the labels in L_i to C_i for all i. The outcome of this procedure is a labelled set, where each of the labels in {1, ..., n} appears exactly one time. The notion of size and number of components carries over from the multiset construction. Let us call the collection of all such labelled sets G^ι = Set(C^ι) and introduce the sets G^ι_n and G^ι_{n,N} of objects in G^ι of size n and of size n having N components, respectively. Further, let c^ι_k := |{C ∈ C^ι : |C| = k}| for k ∈ N. Similarly to (1.1), the bivariate (exponential) generating series related to this case is known [30, 19] to be

G^ι(x, y) = Σ_{k,ℓ∈N} |G^ι_{k,ℓ}| (x^k / k!) y^ℓ = exp( y C^ι(x) ),  where C^ι(x) = Σ_{k≥1} c^ι_k x^k / k!.

The Number of Components
Let G^ι_n be drawn uniformly at random from G^ι_n. If C^{(ι)}(x) is subexponential with radius of convergence ρ > 0, in [5] it was shown that the limit distribution of the number of components κ(G_n) (multiset) and κ(G^ι_n) (set) is in both cases given by

Pr[κ(G^{(ι)}_n) = N] ∼ [y^{N−1}] G^{(ι)}(ρ, y) / G^{(ι)}(ρ, 1),  N ∈ N, n → ∞.

The situation seems to be comparable, as in both cases we infer that |G^{(ι)}_{n,N}| is asymptotically equal to the product of some function depending only on N with |G^{(ι)}_n|. However, this works only in the limited domain of fixed N; as we shall see, the interesting behaviour of κ(G^ι_n) evolves in the tails. The works [28, 39] treat this topic extensively: under the condition that c^ι_n ∼ b n^{−(1+α)} ρ^{−n} n! for b > 0 and α > 1 as n → ∞, there emerges a "trichotomy" (1 < α ≤ 2) and in some cases a "dichotomy" (α > 2) depending on the asymptotic regime of N. Similar results under slightly varying conditions are presented in [19]. To illustrate the nature of these results, let us consider the class of labelled unrooted trees T^ι, such that F^ι = Set(T^ι) is the class of labelled forests. The well-known formula by Cayley states that t^ι_n := |{T ∈ T^ι : |T| = n}| = n^{n−2} ∼ (2π)^{−1/2} n^{−(1+3/2)} ρ^{−n} n!, where ρ = e^{−1}, so that α = 3/2. Abbreviating by f^ι_{n,N} the number of forests on n nodes with N trees, the following detailed result exposing two phase transitions is known. Let N := ⌊λn⌋; then, as n → ∞,

f^ι_{n,N} ∼ c_−(λ) n^{−3/2} e^{n−N},  λ ∈ (0, 1/2),
f^ι_{n,N} ∼ c n^{−2/3} e^{n−N},  λ = 1/2,
f^ι_{n,N} ∼ c_+(λ) n^{−1/2} f(λ)^{−n} T^ι(f(λ))^N,  λ ∈ (1/2, 1),

for positive real-valued continuous functions c_{−/+}(λ), f(λ) and a constant c. For more general ranges of parameters (i.e. arbitrary N such that N → ∞ as n → ∞) comparable results were already discovered in 1988 by [10].

Denoting by T the class of unlabelled trees, Theorem 1.1 reveals substantial differences between the labelled and the unlabelled case: the entire asymptotic behaviour of f_{n,N}, the number of forests in F = Mset(T) with n nodes and N trees, is determined by n − N. In fact, in the unlabelled case, we obtain for some constant A > 0 that

f_{n,⌊λn⌋} ∼ A · (1 − λ)^{−5/2} n^{−5/2} ρ^{−(n−⌊λn⌋)},

no matter which 0 < λ < 1 we take.

The Global Shape
The same polarity emerges in the investigation of the objects F^ι_{n,N} and F_{n,N} drawn uniformly at random from the sets of labelled and unlabelled forests, respectively, of size n composed of N trees. Let us first point out that for fixed N both G_n and G^ι_n are unified in having one giant component as n tends to infinity, such that the remainder converges in distribution to a limit given by the Pólya–Boltzmann model. At first sight this suggests that the two models should behave comparably. However, while we understand from Theorems 1.2 and 1.3 that the remainder of F_{n,N} after removing the largest tree and all singletons converges in distribution as n, N, n − N → ∞, the situation in the labelled case is disparate: in [28, Theorem 19.49] it is precisely stated how, denoting again N := ⌊λn⌋, the parameter λ influences the universal shape of F^ι_{n,N}. Namely, three different cases emerge as n approaches infinity:

1. In the case where there are "few" components (0 < λ < 1/2) the largest tree contains asymptotically a proportion 1 − 2λ of all nodes, and the remaining N − 1 trees have size O_p(n^{2/3}).

2. In the case where the ratio between components and total size is "balanced" (λ = 1/2) all trees have size O_p(n^{2/3}).

3. Whenever there are "many" components with respect to the total number of nodes (1/2 < λ < 1) all trees have size o(n).

For a detailed discussion of what happens near the critical point λ = 1/2 we refer to [28]; in particular, we want to stress [28, Theorems 18.12, 18.14, 19.34, 19.49].

1.3 Benjamini–Schramm Convergence

In this section we investigate what implications our results have on Benjamini–Schramm convergence of unlabelled graphs with many connected components. Informally, the Benjamini–Schramm limit of a sequence of graphs describes what a uniformly at random chosen vertex typically sees in its neighbourhood and is a special instance of local weak convergence, see also [6, 2]. Given a graph G = (V, E) from the set of all simple connected locally finite graphs we form the rooted graph (G, o) by distinguishing a vertex o ∈ V. Let B be the collection of all these rooted graphs. Then two graphs (G, o) and (G′, o′) in B are called isomorphic, (G, o) ≃ (G′, o′), if there exists an edge-preserving bijection Φ on the vertex sets of G and G′ such that Φ(o) = o′. Hence, the space B∗ = B/≃ of equivalence classes in B under the relation ≃ contains all unlabelled simple connected locally finite graphs. On this space we define a metric by setting B_k(G, o) to be the induced subgraph of (G, o) ∈ B∗ containing all vertices within graph distance k from the root o and defining

d_loc((G, o), (G′, o′)) := 2^{−sup{k ∈ N : B_k(G,o) ≃ B_k(G′,o′)}},  (G, o), (G′, o′) ∈ B∗.

Note that we view an element picked from B∗ as one labelled graph from B representing its entire equivalence class, which is why the relation ≃ makes sense in the above definition. It is known that (B∗, d_loc) is a Polish space.

In this space we then say that a sequence of (labelled or unlabelled) simple connected locally finite graphs (G_n)_{n≥1} (possibly random) converges in the Benjamini–Schramm (BS) sense to a limiting object (G_∞, o_∞) ∈ B∗ if for a vertex o_n selected uniformly at random from G_n it holds for every bounded continuous function f : B∗ → R that

lim_{n→∞} E[f(G_n, o_n)] = E[f(G_∞, o_∞)],

or, equivalently,

lim_{n→∞} Pr[B_k(G_n, o_n) ≃ (G, o)] = Pr[B_k(G_∞, o_∞) ≃ (G, o)],  k ∈ N, (G, o) ∈ B∗.

Back to our setting, we consider C to be a class of unlabelled finite connected graphs (with subexponential counting sequence, and m ∈ N denotes the size of the smallest possible graph in C) such that G = Mset(C) is the class of graphs with connected components in C. Let G_n be drawn uniformly at random from G_{n,N} containing all graphs in G with n vertices and having N ≡ N(n) connected components. In order to adapt to the setting above we let (G_n, o_n) denote the connected component around a uniformly at random chosen root o_n in G_n. Let C_n be drawn uniformly at random from C_n := {C ∈ C : |C| = n}. With this at hand, the extension of BS convergence to non-connected graphs is evident and we obtain the following result.

Proposition 1.4. Assume that mN(n)/n → λ ∈ [0, 1) and N(n) → ∞ as n → ∞.
If the sequence (C_n)_{n≥1} converges to a limit object (C_∞, o_∞) in the BS sense, then G_n converges in the BS sense to a limit object (G_∞, o_∞) given by the law

(1 − λ) δ_{(C_∞, o_∞)} + λ δ_{(C_m, o_m)},

where C_m is drawn uniformly at random from C_m and o_m is a vertex chosen uniformly at random among the m vertices of C_m. In particular, if N(n) = o(n) we have that (G_∞, o_∞) = (C_∞, o_∞).

The proof is found in Section 3.5. The authors of [22] show that any subcritical class C of connected unlabelled graphs fulfils the conditions of Proposition 1.4. In particular, the class of unlabelled trees T is subcritical, and Stufler [44, 45] makes the BS limit (T_∞, o_∞) explicit in this case. Then we obtain with Proposition 1.4 that the BS limit (F_∞, o_∞) of F_n, drawn uniformly at random from all unlabelled forests of size n composed of N(n) ≡ N trees, has law

(1 − λ) δ_{(T_∞, o_∞)} + λ δ_X,

assuming that mN(n)/n → λ ∈ [0, 1) and N(n) → ∞ as n → ∞, where X is a single rooted vertex. In other words, with probability 1 − λ the neighbourhood of a uniformly at random chosen vertex from F_n looks like the infinite tree T_∞, and with probability λ the neighbourhood is empty.

2 Subexponential Power Series

In this section we collect (and prove) some properties of subexponential power series that will be quite handy in the rest of the paper. Many of the definitions and statements shown here are taken from Embrechts and Omey [17] or Foss, Korshunov, and Zachary [21] and adapted to the discrete case, see also Stufler [46].
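As a preview of the conditions in Definition 2.1 below, subexponentiality can be probed numerically on a concrete sequence. We take c_n = 2^n/n^3 as an illustrative toy choice (ours, not from the paper); then ρ = 1/2 and C(ρ) = Σ_{k≥1} k^{−3} = ζ(3), so the ratio c_{n−1}/c_n should approach ρ = 1/2 and the normalized convolution should approach 2C(ρ) = 2ζ(3).

```python
# Numerical probe of the two subexponentiality conditions for the toy
# sequence c_n = 2^n / n^3 (our illustrative choice).

def c_ratio(n):
    # c_{n-1} / c_n; the powers of two cancel, leaving (n/(n-1))^3 / 2.
    return 0.5 * (n / (n - 1)) ** 3

def conv_ratio(n):
    # (1/c_n) * sum_{k=1}^{n-1} c_k c_{n-k}; again the 2^n cancels,
    # leaving n^3 / (k^3 (n-k)^3).
    return sum(n ** 3 / (k ** 3 * (n - k) ** 3) for k in range(1, n))

zeta3 = sum(1 / k ** 3 for k in range(1, 200001))  # zeta(3) = 1.2020569...
print(c_ratio(2000))                 # approaches rho = 0.5
print(conv_ratio(2000), 2 * zeta3)   # convolution ratio approaches 2*C(rho)
```

At n = 2000 the ratio is already within a fraction of a percent of 1/2, and the convolution sum within about half a percent of 2ζ(3), in line with the 1/n-type corrections one expects for such sequences.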
Definition 2.1.
A power series C(x) = Σ_{k≥0} c_k x^k with non-negative coefficients and radius of convergence 0 < ρ < ∞ is called subexponential if

c_n^{−1} Σ_{0≤k≤n} c_{n−k} c_k ∼ 2C(ρ) < ∞   (S1)

and

c_{n−1}/c_n ∼ ρ   (S2)

as n → ∞.

Note that the radius of convergence of a power series C(x) satisfying (S2) (in particular of any subexponential power series) is ρ and that eventually [x^n] C(x) > 0, where as usual [x^n] C(x) = c_n denotes the coefficient of x^n in C(x). Any subexponential power series C(x) with radius of convergence ρ induces the probability generating series of an N-valued random variable by setting

d_k := c_k ρ^k / C(ρ),  k ∈ N.

Then D(x) = Σ_{k≥0} d_k x^k is subexponential with ρ_D = 1 and D(1) = 1. There are several results about the asymptotic behaviour of sums of random variables with such a subexponential generating series. Here we will need Lemma 2.2 (i) below, taken from [21, Theorem 4.30], which corresponds to determining the probability that a randomly stopped sum of random variables with distribution (d_k)_{k≥0} attains a large value. Moreover, Lemma 2.2 (ii), see [21, Theorem 4.11], will be particularly useful, since it provides bounds holding uniformly in the given parameters. In Lemma 2.2 (iii) we present and prove a statement often referred to – with various interpretations – as the "principle of a single big jump": the dominant contribution to a large sum of subexponential random variables typically stems from one single summand.

Lemma 2.2.
Let (D_i)_{i∈N} be iid N-valued random variables with probability generating function D(x). Assume that D(x) is subexponential with radius of convergence 1. For p ∈ N let S_p := Σ_{1≤i≤p} D_i and M_p := max{D_1, ..., D_p}. Then the following statements are true.

(i) If τ is an N-valued random variable with probability generating function H(x) analytic at 1, then Pr[S_τ = n] ∼ E[τ] Pr[D_1 = n], n → ∞.

(ii) For every δ > 0 there exist n_0 ∈ N and C > 0 such that Pr[S_p = n] ≤ C (1 + δ)^p Pr[D_1 = n] for all n ≥ n_0, p ∈ N.

(iii) For any p ≥ 1, (M_p | S_p = n) = n + O_p(1), n → ∞.

Proof of Lemma 2.2 (iii).
Let ε > 0. We need to find K ∈ N such that

limsup_{n→∞} Pr[|M_p − n| ≥ K | S_p = n] < ε.

Clearly, under the condition S_p = n we have M_p ≤ n. Thus

Pr[|M_p − n| ≥ K | S_p = n] = Σ_{k≥K} Pr[M_p = n − k | S_p = n].   (2.1)

Since D_1, ..., D_p are iid we obtain for any k ≥ K

Pr[M_p = n − k | S_p = n] ≤ Pr[∪_{1≤i≤p} {D_i = n − k} | S_p = n] = p · Pr[D_1 = n − k] Pr[S_{p−1} = k] / Pr[S_p = n].

Together with Lemma 2.2 (ii) we find some constant C > 0 such that for all k ≥ K sufficiently large

Pr[S_{p−1} = k] ≤ C (1 + ε)^{p−1} Pr[D_1 = k].

Part (i) justifies for n sufficiently large that Pr[S_p = n] ≥ (1 − ε) p Pr[D_1 = n]. All in all, for a suitably chosen constant C(p) the expression in (2.1) can be estimated by

Pr[|M_p − n| ≥ K | S_p = n] ≤ C(p) Σ_{k≥K} Pr[D_1 = n − k] Pr[D_1 = k] / Pr[D_1 = n].

Due to property (S1) we conclude that this is smaller than ε by choosing K large enough, and the proof is finished.

The following lemma establishes asymptotics for the coefficients of the product of two power series and can be found in [11, Theorem 3.42] or [42, Exercise 178, p. 32].

Lemma 2.3.
Let $G(x), H(x)$ be power series such that $G$ has property $(\mathcal{S}1)$. Assume that the radii of convergence $\rho_G$ and $\rho_H$ of $G$ and $H$, respectively, satisfy $0 < \rho_G < \rho_H$ and $H(\rho_G) \neq 0$. Then
$$[x^n]\, G(x) H(x) \sim H(\rho_G) \cdot [x^n]\, G(x), \qquad n \to \infty.$$

Note that Lemma 2.3 does not require $G$ to be subexponential, nor does it require that $H$ has only non-negative coefficients. We will later apply the lemma with (powers of)
$$G(x) := \frac{1}{1 - ax} = \sum_{k \ge 0} (ax)^k$$
for some $a > 0$. Then $G(x)$ has $(\mathcal{S}1)$ with $\rho_G = a^{-1}$, but $G(a^{-1}) = \infty$; in particular, property $(\mathcal{S}2)$ is not satisfied and $G$ is not subexponential.

As a final remark in this section we make the following observation, which we shall use mostly without further reference. Suppose that for two sequences $(a_n)_{n \in \mathbb{N}}, (b_n)_{n \in \mathbb{N}}$ in $\mathbb{R}$ we know that $a_n \sim b_n$ as $n \to \infty$. Then the ratio $a_n / b_n$ is bounded whenever $b_n \neq 0$, that is,
$$a_n \sim b_n \implies \text{there is } A > 0 \text{ such that } |a_n| \le A\, |b_n| \text{ for all } n \in \mathbb{N} \text{ with } b_n \neq 0. \quad (2.2)$$

3 Proofs
We briefly (re-)collect all assumptions and fix the notation needed in this section. Let $C(x) = \sum_{k \ge 0} c_k x^k$ denote a power series with non-negative real-valued coefficients and radius of convergence $0 < \rho < 1$ such that $C(\rho) < \infty$. Further, let
$$m = m_C := \min\{k \in \mathbb{N} : c_k > 0\} \quad (3.1)$$
be the index of the first coefficient that does not equal zero. We also assume that $C(x)$ is subexponential, although this is only needed in the very last step of the proof, cf. Lemma 3.10; all other statements preceding this lemma are valid even without this assumption. We begin with two auxiliary statements. The first one is about the radius of convergence of $G(x)$.

Lemma 3.1.
Assume that $C(x)$ is a power series with non-negative real-valued coefficients and radius of convergence $0 < \rho < 1$ such that $C(0) = 0$ and $C(\rho) < \infty$. Then $G(x)$ has radius of convergence $\rho$ and $G(\rho) < \infty$.

Proof. From the definition of $G$ we obtain that $G(x) = e^{C(x)} H(x)$, where $\log H(x) = \sum_{j \ge 2} C(x^j)/j$. Since $\rho \in (0, 1)$, we obtain for any $\varepsilon > 0$ with $(1+\varepsilon)^2 \rho^2 < \rho$ and any $j \ge 2$ that
$$C\big((1+\varepsilon)^j \rho^j\big) = \sum_{k \ge 1} c_k (1+\varepsilon)^{jk} \rho^{jk} = (1+\varepsilon)^j \rho^j \sum_{k \ge 1} c_k \big((1+\varepsilon)^j \rho^j\big)^{k-1} \le (1+\varepsilon)^j \rho^{j-1}\, C(\rho).$$
In particular, $H((1+\varepsilon)\rho) < \infty$, and the radius of convergence of $H$ is larger than $\rho$. Thus, the radius of convergence of $G$ is $\rho$, and $G(\rho) = e^{C(\rho)} H(\rho) < \infty$.

The second statement is a purely technical result that will be handy, see e.g. [19, Theorem VI.1].

Lemma 3.2.
Let $\alpha, \beta \in \mathbb{R}_{>0}$. Then
$$[x^n]\, (1 - \beta x)^{-\alpha} \sim \frac{n^{\alpha - 1}}{\Gamma(\alpha)}\, \beta^n, \qquad n \to \infty.$$

In the remaining part of this section we introduce the Boltzmann model as a helpful tool that reduces our problems to the investigation of iid random variables, see Section 3.1. Subsequently, we present the proofs of our three main theorems in Sections 3.2-3.4. At last we prove Proposition 1.4 in Section 3.5.
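As a quick sanity check of Lemma 3.2, the exact coefficient $[x^n](1 - \beta x)^{-\alpha} = \binom{n + \alpha - 1}{n} \beta^n = \frac{\Gamma(n + \alpha)}{\Gamma(\alpha)\, n!}\, \beta^n$ can be compared numerically with the stated asymptotic; the parameter values in the sketch below are arbitrary choices for illustration, not taken from the paper.

```python
from math import exp, lgamma, log

# Compare the exact coefficient of (1 - beta*x)^(-alpha) with the asymptotic
# n^(alpha - 1) * beta^n / Gamma(alpha) from Lemma 3.2 (log scale avoids overflow).
alpha, beta, n = 2.5, 0.8, 500
log_exact = lgamma(n + alpha) - lgamma(alpha) - lgamma(n + 1) + n * log(beta)
log_asymptotic = (alpha - 1) * log(n) - lgamma(alpha) + n * log(beta)
ratio = exp(log_exact - log_asymptotic)
print(round(ratio, 3))  # tends to 1 as n grows
```

The discrepancy is of order $1/n$, consistent with the standard expansion of the ratio $\Gamma(n+\alpha)/\Gamma(n+1)$.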
In this section we will introduce the Boltzmann model from the pioneering paper [16], which has found various applications in the study of the typical shape of combinatorial structures, see for example [40, 7, 12, 14, 41, 43, 1, 15]. With the help of this model we translate the initial problem of extracting coefficients of the multiset ogf of unlabelled classes into a probabilistic question. This gives us the proper idea for the general approach for arbitrary functions of the form (1.1), i.e. when the coefficients are not necessarily integers. Further, the formalisation via this model will allow us to prove the extreme condensation phenomenon.

Let for the moment $C(x) = \sum_{k \ge 0} c_k x^k$ denote the (ordinary) generating series of an unlabelled class $\mathcal{C}$, that is, $c_k \in \mathbb{N}$ for all $k \in \mathbb{N}$. Assume that $z \in \mathbb{R}_{>0}$ is chosen such that $0 < C(z) < \infty$. Define the Boltzmann-distributed random variable $\Gamma C(z)$ taking values in the entire space $\mathcal{C}$ through
$$\Pr[\Gamma C(z) = C] = \frac{z^{|C|}}{C(z)}, \qquad C \in \mathcal{C}.$$
In complete analogy the random variable $\Gamma G(z)$ is defined on the class $\mathcal{G} = \mathrm{Mset}(\mathcal{C})$ of multisets containing $\mathcal{C}$-objects, where in this case the parameter $z > 0$ must satisfy $G(z) := G(z, 1) < \infty$ in (1.1). In the rest of this section we assume that $C$ is subexponential and we fix $z = \rho$, where $\rho \in (0, 1)$ is the radius of convergence of $C$. Then, in virtue of Lemma 3.1, $G$ has radius of convergence $\rho$ and $G(\rho) < \infty$, so that both $\Gamma C(\rho), \Gamma G(\rho)$ are well-defined, and we just write $\Gamma C, \Gamma G$.

Let $g_n$ be the number of objects of size $n$ in $\mathcal{G}$ and $g_{n,N}$ those of size $n$ comprised of $N$ components. By using Bayes' Theorem and the fact that the Boltzmann model induces a uniform distribution on objects of the same size, we immediately obtain
$$\frac{g_{n,N}}{g_n} = \Pr[\kappa(\Gamma G) = N \mid |\Gamma G| = n] = \frac{\Pr[|\Gamma G| = n \mid \kappa(\Gamma G) = N]\, \Pr[\kappa(\Gamma G) = N]}{\Pr[|\Gamma G| = n]}, \qquad n, N \in \mathbb{N}. \quad (3.2)$$
To get a handle on this expression we exploit a powerful description of the distribution of $\Gamma G(z)$ in terms of $\Gamma C(\cdot)$, derived in [20]. In the next steps, the notation $\biguplus_{j \in J} A_j$ is used to denote a multiset of elements $A_j$ from a set $\mathcal{A}$, $j \in J$ being indices in some countable set $J$. That is, multiple occurrences of identical elements are allowed, and $\biguplus_{j \in J} A_j$ is completely determined by the different elements it contains and their multiplicities.

(1) Let $(P_j)_{j \ge 1}$ be independent random variables, where $P_j \sim \mathrm{Po}\big(C(\rho^j)/j\big)$.

(2) Let $(\gamma_{j,i})_{j,i \ge 1}$ be independent random variables with $\gamma_{j,i} \sim \Gamma C(\rho^j)$ for $j, i \ge 1$.

(3) For $j, i \ge 1$ and $1 \le k \le j$ set $\gamma^{(k)}_{j,i} = \gamma_{j,i}$, that is, make $j$ copies of $\gamma_{j,i}$. Let $\Lambda G := \biguplus_{j \ge 1} \biguplus_{1 \le i \le P_j} \biguplus_{1 \le k \le j} \gamma^{(k)}_{j,i}$.

Intuitively, we interpret $P_j$ as the number of $j$-cycles in some not further specified permutation, and to each cycle of length $j$ we attach $j$ identical copies of a $\Gamma C(\rho^j)$-distributed $\mathcal{C}$-object. Afterwards we discard the permutation and the cycles and keep the multiset of the generated $\mathcal{C}$-objects. This construction is also made explicit in [9, Prop. 37].

Lemma 3.3. [20, Prop. 2.1] The distributions of $\Gamma G$ and $\Lambda G$ are identical.

This statement paves the way to study $\Gamma G$.
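For concreteness, the construction (1)-(3) can be simulated. The sketch below is our own illustration (not code from the paper) for the class $\mathcal{C}$ of positive integers, i.e. $c_k = 1$ for $k \ge 1$, so that $\mathrm{Mset}(\mathcal{C})$ is the class of number partitions. Since here $C(x) = x/(1-x)$ has $C(1) = \infty$, we sample at a parameter $z < 1$ rather than at the radius of convergence; the construction itself is unchanged, and Lemma 3.3 then predicts $\Pr[|\Lambda G| = n] = p(n)\, z^n / G(z)$, with $p(n)$ the number of partitions of $n$.

```python
import random
from math import exp

def poisson(lam, rng):
    # Inverse-transform sampling of a Poisson(lam) random variable.
    u, k, p = rng.random(), 0, exp(-lam)
    s = p
    while u > s:
        k += 1
        p *= lam / k
        s += p
    return k

def C(z):
    return z / (1 - z)  # C(z) = sum_{k >= 1} z^k for c_k = 1

def gamma_C(z, rng):
    # Boltzmann sampler for C: Pr[size = k] = z^k / C(z), a shifted geometric law.
    k = 1
    while rng.random() < z:
        k += 1
    return k

def sample_Lambda_G(z, rng, jmax=40):
    # Steps (1)-(3): draw P_j ~ Po(C(z^j)/j) and attach j identical copies of a
    # Boltzmann C-object at parameter z^j; j is truncated at jmax since the
    # rates C(z^j)/j decay geometrically.
    out = []
    for j in range(1, jmax + 1):
        for _ in range(poisson(C(z ** j) / j, rng)):
            out.extend([gamma_C(z ** j, rng)] * j)
    return sorted(out)

rng = random.Random(1)
samples = [sample_Lambda_G(0.5, rng) for _ in range(20000)]
freq = sum(1 for s in samples if sum(s) == 2) / len(samples)
# G(0.5) = prod_k (1 - 0.5^k)^(-1) ~ 3.4627 and p(2) = 2, so the predicted
# value of Pr[|Lambda_G| = 2] is 2 * 0.25 / 3.4627 ~ 0.144.
print(round(freq, 2))
```

The empirical frequency matches the Boltzmann prediction up to sampling error, in line with Lemma 3.3.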
In particular, if we write $C_{j,i} = |\gamma_{j,i}|$, the definition of $\Lambda G$ guarantees that, in distribution,
$$\kappa(\Gamma G) = \sum_{j \ge 1} j P_j \quad \text{and} \quad |\Gamma G| = \sum_{j \ge 1} j \sum_{1 \le i \le P_j} C_{j,i}.$$
So, let us for $n, N \in \mathbb{N}$ define the events
$$\mathcal{P}_N := \Big\{ \sum_{j \ge 1} j P_j = N \Big\} \quad \text{and} \quad \mathcal{E}_n := \Big\{ \sum_{j \ge 1} j \sum_{1 \le i \le P_j} C_{j,i} = n \Big\}. \quad (3.3)$$
With $\Pr[\mathcal{E}_n] = \Pr[|\Lambda G| = n] = g_n \rho^n / G(\rho)$ at hand, Lemma 3.3 and (3.2) then guarantee that
$$g_{n,N} = G(\rho)\, \rho^{-n}\, \Pr[\mathcal{E}_n \mid \mathcal{P}_N]\, \Pr[\mathcal{P}_N]. \quad (3.4)$$
Note that for all $1 \le i \le P_j$ and $j \in \mathbb{N}$, we have
$$\Pr[C_{j,i} = k] = \frac{c_k\, \rho^{jk}}{C(\rho^j)}, \qquad k \in \mathbb{N}. \quad (3.5)$$
Equation (3.4) enables us to reduce the problem of determining $g_{n,N} = [x^n y^N]\, G(x, y)$ to the problem of determining the probability of the event $\mathcal{P}_N$ and of $\mathcal{E}_n$ conditioned on $\mathcal{P}_N$.

Before we treat the aforementioned probabilities, let us first turn to the general setting considered in this paper, namely the case where $(c_k)_{k \in \mathbb{N}}$ is a real-valued non-negative sequence. In complete analogy to the previous discussion let $P_j \sim \mathrm{Po}\big(C(\rho^j)/j\big)$ for $j \in \mathbb{N}$, let $(C_{j,1}, \dots, C_{j,P_j})_{j \in \mathbb{N}}$ be as in (3.5), and assume that all these variables are independent. As a matter of fact, also in this (general) case we obtain exactly the same representation of $[x^n y^N]\, G(x, y)$ in terms of $\mathcal{E}_n$ and $\mathcal{P}_N$ defined in (3.3).

Lemma 3.4.
Let $C(x)$ be a power series with non-negative real-valued coefficients and radius of convergence $0 < \rho < 1$ at which $C(\rho) < \infty$. Then
$$[x^n y^N]\, G(x, y) = G(\rho)\, \rho^{-n}\, \Pr[\mathcal{E}_n \mid \mathcal{P}_N]\, \Pr[\mathcal{P}_N], \qquad n, N \in \mathbb{N}.$$

Proof. We begin with the simple observation
$$\Pr[\mathcal{P}_N, \mathcal{E}_n] = [x^n y^N] \sum_{k \ge 0} \sum_{\ell \ge 0} \Pr[\mathcal{P}_k, \mathcal{E}_\ell]\, x^\ell y^k = [x^n y^N] \sum_{k \ge 0} y^k \sum_{\substack{p_1, p_2, \dots \ge 0 \\ \sum_j j p_j = k}} \prod_{j \ge 1} \Pr[P_j = p_j] \sum_{\ell \ge 0} \Pr\Big[\sum_{j \ge 1} j \sum_{1 \le i \le p_j} C_{j,i} = \ell\Big]\, x^\ell. \quad (3.6)$$
We will study this expression by first simplifying the sum over $\ell$, then the sum over all $p_j$'s, and eventually the sum over $k$. We begin with the sum over $\ell$. For an $\mathbb{N}$-valued random variable $A$ let $A(x) := \sum_{\ell \ge 0} \Pr[A = \ell]\, x^\ell$ denote its probability generating series. Then, if $(A_j)_{j \in \mathbb{N}}$ is a sequence of independent $\mathbb{N}$-valued random variables,
$$(A_1 + \dots + A_m)(x) = \prod_{1 \le j \le m} A_j(x), \qquad m \in \mathbb{N}. \quad (3.7)$$
Let us write $C_j(x)$ for the probability generating series of $j C_{j,i}$; note that the actual value of $i$ is not important, since the $(C_{j,i})_{i \in \mathbb{N}}$ are iid. Then, whenever $\sum_{j \ge 1} p_j$ is finite, (3.7) implies
$$\sum_{\ell \ge 0} \Pr\Big[\sum_{j \ge 1} j \sum_{1 \le i \le p_j} C_{j,i} = \ell\Big]\, x^\ell = \prod_{j \ge 1} C_j(x)^{p_j}.$$
Noting that $j C_{j,1}$ takes only values in the lattice $j\mathbb{N}$, we obtain
$$C_j(x) = \sum_{\ell \ge 0} \Pr[j C_{j,1} = \ell]\, x^\ell = \sum_{\ell \ge 0} \Pr[C_{j,1} = \ell]\, x^{j\ell} = \frac{1}{C(\rho^j)} \sum_{\ell \ge 0} c_\ell\, (\rho x)^{j\ell} = \frac{C((\rho x)^j)}{C(\rho^j)}.$$
We deduce
$$\sum_{\ell \ge 0} \Pr\Big[\sum_{j \ge 1} j \sum_{1 \le i \le p_j} C_{j,i} = \ell\Big]\, x^\ell = \prod_{j \ge 1} \left(\frac{C((\rho x)^j)}{C(\rho^j)}\right)^{p_j}.$$
This puts the sum over $\ell$ in (3.6) into compact form. To simplify the sum over the $p_j$'s in (3.6), define independent random variables $(H_j)_{j \ge 1}$ with $H_j \sim \mathrm{Po}\big(C((\rho x)^j)/j\big)$. Then
$$\sum_{\substack{p_1, p_2, \dots \ge 0 \\ \sum_j j p_j = k}} \prod_{j \ge 1} \Pr[P_j = p_j] \left(\frac{C((\rho x)^j)}{C(\rho^j)}\right)^{p_j} = \frac{G(\rho x, 1)}{G(\rho, 1)}\, \Pr\Big[\sum_{j \ge 1} j H_j = k\Big].$$
By similar reasoning as before, the probability generating function of $j H_j$ is given by
$$\sum_{\ell \ge 0} \Pr[H_j = \ell]\, y^{j\ell} = \exp\big(-C((\rho x)^j)/j\big) \sum_{\ell \ge 0} \frac{\big(C((\rho x)^j)\, y^j / j\big)^\ell}{\ell!} = \frac{\exp\big(C((\rho x)^j)\, y^j / j\big)}{\exp\big(C((\rho x)^j)/j\big)}.$$
Applying (3.7), where we set $A_j := j H_j$, in combination with this identity yields
$$\sum_{k \ge 0} \Pr\Big[\sum_{j \ge 1} j H_j = k\Big]\, y^k = \frac{G(\rho x, y)}{G(\rho x, 1)},$$
and plugging everything into (3.6) shows that $\Pr[\mathcal{P}_N, \mathcal{E}_n] = G(\rho)^{-1}\, [x^n y^N]\, G(\rho x, y)$. With $[x^n] F(ax) = a^n [x^n] F(x)$ for any power series $F$ and $a \in \mathbb{R}$ we finish the proof.

3.2 Proof of Theorem 1.1

Let $\mathcal{P}_N, \mathcal{E}_n$ be as in the previous section, see (3.3), where $P_j \sim \mathrm{Po}\big(C(\rho^j)/j\big)$, $j \in \mathbb{N}$, and $C_{j,1}, \dots, C_{j,P_j}$, $j \in \mathbb{N}$, have the distribution specified in (3.5). Moreover, we assume that all these random variables are independent. Equipped with Lemma 3.4 from the previous section, the proof of Theorem 1.1 boils down to estimating $\Pr[\mathcal{E}_n \mid \mathcal{P}_N]$ and $\Pr[\mathcal{P}_N]$. Before we actually do so, let us introduce some more auxiliary quantities. Set
$$P := \sum_{j \ge 1} j P_j \quad \text{and} \quad P^{(\ell)} := \sum_{j > \ell} j P_j, \qquad \ell \in \mathbb{N}.$$
With this notation, $\mathcal{P}_N$ is the same as $\{P = N\}$ and as $\{P^{(0)} = N\}$. Moreover, recall (3.1) and set
$$L := \sum_{1 \le i \le P_1} (C_{1,i} - m) \quad \text{and} \quad R := \sum_{j \ge 2} j \sum_{1 \le i \le P_j} (C_{j,i} - m). \quad (3.8)$$
With this notation,
$$\Pr[\mathcal{E}_n \mid \mathcal{P}_N] = \Pr[L + R = n - mN \mid \mathcal{P}_N]. \quad (3.9)$$
The driving idea behind these definitions is that the random variables $C_{j,i} - m$, for $j \ge 2$, have exponential tails, and these tails get thinner as we increase $j$; in particular, the probability that $C_{j,i} - m = 0$ approaches one exponentially fast as we increase $j$. However, things are not so easy, since we always condition on $\mathcal{P}_N$, and in this space some of the $P_j$'s might be large. This brings us to our general proof strategy. First of all, we will study our probability space conditioned on $\mathcal{P}_N$; in particular, in Corollary 3.7 and Lemma 3.8 below we describe the joint distribution of $P_1, \dots, P_N$ given $\mathcal{P}_N$. More specifically, these results show that the $P_j$'s are (more or less) distributed like Poisson random variables with bounded expectations. This will allow us then in Lemma 3.9 to show that $L$ dominates the sum $L + R$, in the sense that $\Pr[L + R = n - mN \mid \mathcal{P}_N] \sim \Pr[L = n - mN \mid \mathcal{P}_N]$ as $n, N, n - mN \to \infty$. Subsequently, in Lemma 3.10 we exploit the subexponentiality and establish that this last probability is essentially a multiple of $\Pr[C_{1,1} = n - mN]$. Just as a side remark and so as to make the notation more accessible: it is instructive to think of the random variable $L$ as something (that will turn out to be) large, and of $R$ as some remainder (that will turn out to be small with exponential tails).

Our first aim is to study the distribution, in particular the tails, of $P$ and $P^{(\ell)}$; that is, we want to estimate the probability of $\mathcal{P}_N$. To this end, consider the probability generating series $F(x)$ and $F^{(\ell)}(x)$ of $P$ and $P^{(\ell)}$, respectively, that is,
$$F^{(\ell)}(x) = \frac{1}{G^{(\ell)}(\rho)} \cdot \exp\Big(\sum_{j > \ell} \frac{C(\rho^j)}{j}\, x^j\Big), \quad \text{where} \quad G^{(\ell)}(\rho) := \exp\Big(\sum_{j > \ell} \frac{C(\rho^j)}{j}\Big),$$
and $F(x) = F^{(0)}(x)$, $G^{(0)}(\rho) = G(\rho)$. Hence, the distribution of $P^{(\ell)}$ (and of $P$) is given by $(\Pr[P^{(\ell)} = N])_{N \ge 0} = ([x^N] F^{(\ell)}(x))_{N \ge 0}$. In Lemma 3.6 we determine the precise asymptotic behaviour of these probabilities. But first, we need a simple auxiliary statement.

Proposition 3.5.
There exists $A > 0$ such that, for all $0 < z \le \rho$ and $j \in \mathbb{N}$,
$$1 \le \frac{C(z^j)}{c_m z^{jm}} \le 1 + A z^j.$$

Proof. The first inequality follows directly from the definitions of $C$ and $m$. For the second, note that
$$\frac{C(z^j)}{c_m z^{jm}} \le 1 + \frac{z^j}{c_m} \sum_{k > m} c_k\, \rho^{jk - j(m+1)} = 1 + \frac{z^j \rho^{-2m}}{c_m} \sum_{k > m} c_k\, \rho^{jk - j(m+1) + 2m}. \quad (3.10)$$
Since $m \ge 1$, $j \ge 1$ and $k \ge m + 1$, we obtain that
$$jk - j(m+1) + 2m = j(k - m - 1) + 2m \ge (k - m - 1) + 2m = k + m - 1 \ge k.$$
Thus, as $\rho < 1$, we obtain from (3.10) the claimed bound with $A = C(\rho)\, \rho^{-2m}/c_m$.

Lemma 3.6. There exist constants $(B^{(\ell)})_{\ell \in \mathbb{N}} > 0$ such that, as $N \to \infty$,
$$[x^N] F(x) \sim B^{(0)} \cdot N^{c_m - 1} \rho^{mN} \quad \text{and} \quad [x^N] F^{(\ell)}(x) \sim B^{(\ell)} \cdot N^{c_m - 1} \rho^{mN}, \qquad \ell \in \mathbb{N},$$
where
$$\frac{B^{(\ell)}}{B^{(0)}} = \exp\Big(\sum_{1 \le j \le \ell} \frac{C(\rho^j)}{j}\Big) \exp\Big(-\sum_{1 \le j \le \ell} \frac{C(\rho^j)}{j}\, \rho^{-jm}\Big).$$
Proof. We split up
$$F(x) = \frac{1}{G(\rho)} \exp\Big(\sum_{j \ge 1} \frac{c_m \rho^{jm}}{j}\, x^j\Big) \exp\Big(\sum_{j \ge 1} \frac{C(\rho^j) - c_m \rho^{jm}}{j}\, x^j\Big) =: \frac{1}{G(\rho)}\, G_1(x)\, H(x).$$
Proposition 3.5 asserts that $H(x)$ has radius of convergence $\rho_H \ge \rho^{-(m+1)}$. Further, $G_1(x) = (1 - \rho^m x)^{-c_m}$, and the radius of convergence of $G_1(x)$ is $\rho_{G_1} = \rho^{-m} < \rho^{-(m+1)} \le \rho_H$ (since $\rho < 1$). By Lemma 3.2, $[x^N] G_1(x) \sim N^{c_m - 1} \rho^{mN}/\Gamma(c_m)$, and thus $G_1(x)$ has property $(\mathcal{S}1)$. From Lemma 2.3 we then obtain that
$$\Pr[P = N] = [x^N] F(x) \sim \frac{H(\rho_{G_1})}{G(\rho)}\, [x^N] G_1(x) = \frac{H(\rho^{-m})}{G(\rho)\, \Gamma(c_m)} \cdot N^{c_m - 1} \rho^{mN}, \qquad N \to \infty. \quad (3.11)$$
Similarly, for $\ell \in \mathbb{N}$,
$$F^{(\ell)}(x) = \frac{G_1(x)}{G^{(\ell)}(\rho)} \exp\Big(\sum_{j \ge 1} \frac{C(\rho^j) - c_m \rho^{jm}}{j}\, x^j\Big) \exp\Big(-\sum_{1 \le j \le \ell} \frac{C(\rho^j)}{j}\, x^j\Big) =: \frac{G_1(x)}{G^{(\ell)}(\rho)}\, H^{(\ell)}(x).$$
Since the radius of convergence of $H^{(\ell)}(x)$ is again (at least) $\rho^{-(m+1)}$,
$$[x^N] F^{(\ell)}(x) \sim \frac{H^{(\ell)}(\rho^{-m})}{G^{(\ell)}(\rho)\, \Gamma(c_m)} \cdot N^{c_m - 1} \rho^{mN}.$$

As an immediate consequence of Lemma 3.6 we establish the asymptotic distribution of the random vector $(P_1, \dots, P_\ell)$ conditioned on the event $\mathcal{P}_N$ for fixed $\ell \in \mathbb{N}$; this will be useful later when we consider the distribution of $L$, cf. (3.8). Clearly, the conditioning on $\mathcal{P}_N$ makes $P_1, \dots, P_\ell$ dependent, but the corollary says that this effect vanishes for large $N$. Moreover, we study the moments of $P_1$ given $\mathcal{P}_N$.

Corollary 3.7.
Let $\ell \in \mathbb{N}$ and $(p_1, \dots, p_\ell) \in \mathbb{N}^\ell$. Then
$$\Pr\Big[\bigcap_{1 \le j \le \ell} \{P_j = p_j\} \mid \mathcal{P}_N\Big] \to \prod_{1 \le j \le \ell} \Pr\Big[\mathrm{Po}\Big(\frac{C(\rho^j)}{j \rho^{jm}}\Big) = p_j\Big], \qquad N \to \infty. \quad (3.12)$$
Moreover, for any $z \in \mathbb{R}$, as $N \to \infty$,
$$\mathbb{E}\big[z^{P_1} \mid \mathcal{P}_N\big] \to \mathbb{E}\Big[z^{\mathrm{Po}(C(\rho)\rho^{-m})}\Big] = e^{C(\rho)\rho^{-m}(z - 1)} \quad \text{and} \quad \mathbb{E}[P_1 \mid \mathcal{P}_N] \to \mathbb{E}\Big[\mathrm{Po}\Big(\frac{C(\rho)}{\rho^m}\Big)\Big] = C(\rho)\, \rho^{-m}.$$

Proof.
Let $s = \sum_{1 \le j \le \ell} j p_j$. Using the definition of conditional probability we obtain readily
$$\Pr\Big[\bigcap_{1 \le j \le \ell} \{P_j = p_j\} \mid \mathcal{P}_N\Big] = \frac{\Pr\big[\bigcap_{1 \le j \le \ell} \{P_j = p_j\} \cap \{P^{(\ell)} = N - s\}\big]}{\Pr[P = N]}.$$
Since $P_1, \dots, P_\ell, P^{(\ell)}$ are independent, the right-hand side equals
$$\prod_{1 \le j \le \ell} \Pr[P_j = p_j] \cdot \frac{[x^{N-s}]\, F^{(\ell)}(x)}{[x^N]\, F(x)}, \quad (3.13)$$
and (3.12) follows by applying Lemma 3.6. We will next show that $P_1$ given $\mathcal{P}_N$ has exponential moments. Abbreviate $B := C(\rho)\rho^{-m}$. Note that (3.12) (where we use $\ell = 1$) yields for any fixed $K \in \mathbb{N}$
$$\sum_{0 \le k \le K} z^k \Pr[P_1 = k \mid \mathcal{P}_N] \sim \sum_{0 \le k \le K} z^k \Pr[\mathrm{Po}(B) = k], \qquad N \to \infty.$$
Let $\varepsilon > 0$. Note that we can choose $K$ large enough such that the right-hand side differs by at most $\varepsilon$ from $\mathbb{E}[z^{\mathrm{Po}(B)}] = e^{B(z-1)}$. In order to finish the proof we will argue that if $K$ and $N$ are large enough, then $\sum_{K \le k \le N} z^k \Pr[P_1 = k \mid \mathcal{P}_N] < \varepsilon$ as well. First, by Lemma 3.6,
$$z^N \Pr[P_1 = N \mid \mathcal{P}_N] \le \frac{z^N \Pr[P_1 = N]}{\Pr[\mathcal{P}_N]} \le A_1\, N^{1 - c_m}\, \frac{(zB)^N}{N!} \to 0, \qquad N \to \infty.$$
Moreover, according to Lemma 3.6 there exists $A_2 > 0$ such that $[x^{N-k}] F^{(1)}(x) / [x^N] F(x) \le A_2 \cdot (1 - k/N)^{c_m - 1} \rho^{-mk}$ for all $0 \le k \le N - 1$. Then with (3.13) we obtain
$$\sum_{K \le k \le N-1} z^k \Pr[P_1 = k \mid \mathcal{P}_N] \le A_3 \sum_{K \le k \le N-1} t_k, \quad \text{where} \quad t_k := (1 - k/N)^{c_m - 1}\, \frac{(zB)^k}{k!}. \quad (3.14)$$
Note that we can choose $K$ large enough such that, say, $t_{k+1} \le t_k/2$ for $K \le k < N - 1$. Then the sum is bounded by $2 t_K$, and choosing $K$ once more large enough gives $2 t_K < \varepsilon$.

Note that Corollary 3.7 (only) holds for a fixed $\ell \in \mathbb{N}$; it does not tell us anything about $(P_1, \dots, P_\ell)$ in the case where $\ell$ is not fixed, or, more importantly, when $\ell = N$ (note that $P_{N'} = 0$ for all $N' > N$ if we condition on $\mathcal{P}_N$). Regarding this general case, the following statement gives an upper bound for the probability of the event $\bigcap_{1 \le j \le N} \{P_j = p_j\}$ that is not too far from the right-hand side in Corollary 3.7. For the remainder of this section it is convenient to define
$$\Omega_N := \Big\{ (p_1, \dots, p_N) \in \mathbb{N}^N : \sum_{1 \le j \le N} j p_j = N \Big\}, \qquad N \ge 1.$$
In what follows we derive a stochastic upper bound for the distribution of $(P_1, \dots, P_N)$ conditioned on $\mathcal{P}_N$.

Lemma 3.8.
There exists an $A > 0$ such that for all $N$ and all $(p_1, \dots, p_N) \in \Omega_N$,
$$\Pr\Big[\bigcap_{1 \le j \le N} \{P_j = p_j\} \mid \mathcal{P}_N\Big] \le A \cdot N \cdot \prod_{1 \le j \le N} \Pr\Big[\mathrm{Po}\Big(\frac{C(\rho^j)}{j \rho^{jm}}\Big) = p_j\Big].$$

Proof. Using the definition of conditional probability, recalling that the $P_j$'s are independent with $P_j \sim \mathrm{Po}\big(C(\rho^j)/j\big)$, and using $\sum_{1 \le j \le N} j p_j = N$, we obtain
$$\Pr\Big[\bigcap_{1 \le j \le N} \{P_j = p_j\} \mid \mathcal{P}_N\Big] \le \frac{1}{\Pr[\mathcal{P}_N]} \prod_{1 \le j \le N} \frac{(C(\rho^j)/j)^{p_j}}{p_j!} = \frac{\rho^{mN}}{\Pr[\mathcal{P}_N]} \exp\Big(\sum_{1 \le j \le N} \frac{C(\rho^j)}{j \rho^{jm}}\Big) \prod_{1 \le j \le N} \Pr\Big[\mathrm{Po}\Big(\frac{C(\rho^j)}{j \rho^{jm}}\Big) = p_j\Big].$$
With Lemma 3.6 we obtain the existence of $B_1 > 0$ such that for $N$ large enough
$$\Pr[\mathcal{P}_N]^{-1} \le B_1\, \rho^{-mN} N^{1 - c_m}.$$
By Proposition 3.5 there exists a constant $B_2 > 0$ such that $C(\rho^j)/\rho^{jm} \le c_m + B_2\, c_m \rho^j$. Consequently, since $\rho \in (0, 1)$, there exists $B_3 > 0$ such that
$$\exp\Big(\sum_{1 \le j \le N} \frac{C(\rho^j)}{j \rho^{jm}}\Big) \le B_3\, N^{c_m},$$
which concludes the proof.

With this result at hand we are ready to study the distribution of $R$, cf. (3.8). As it will be necessary later, we show uniform tail bounds that hold for the joint distribution of $P_1$ and $R$ conditioned on $\mathcal{P}_N$.

Lemma 3.9.
There exist $A > 0$ and $0 < a < 1$ such that
$$\Pr[P_1 = p, R = r \mid \mathcal{P}_N] \le A \cdot a^{p+r}, \qquad p, r, N \in \mathbb{N}.$$

Proof. We will prove the claimed bound by showing appropriate bounds for the moment generating function $\mathbb{E}[e^{\lambda R} \mid \mathcal{P}_N]$. Let us fix any $0 < \lambda < -\log(\rho)/2$, so that $\rho e^{\lambda} < 1$ and $\rho^j e^{\lambda j} < \rho$ for all $j \ge 2$. Recall that $\Pr[C_{j,i} = k] = c_k \rho^{jk}/C(\rho^j)$, $k \in \mathbb{N}$, $j \ge 1$, $i \ge 1$, see (3.5). We obtain that
$$\mathbb{E}\big[e^{\lambda j (C_{j,i} - m)}\big] = \sum_{s \ge 0} \Pr[C_{j,i} = s + m]\, e^{\lambda j s} = e^{-\lambda j m}\, \frac{C(\rho^j e^{\lambda j})}{C(\rho^j)}, \qquad i \ge 1,\ j \ge 1.$$
Let $\Omega_{N,p}$ be the set of all $\mathbf{p} = (p_2, \dots, p_N) \in \mathbb{N}^{N-1}$ such that $(p, p_2, \dots, p_N) \in \Omega_N$, i.e. $p = N - \sum_{2 \le j \le N} j p_j$, and let $E_{\mathbf{p}}$ be the event
$$E_{\mathbf{p}} := \{P_1 = p\} \cap \bigcap_{2 \le j \le N} \{P_j = p_j\}.$$
Then by Markov's inequality and the independence of the $C_{j,i}$'s and the $P_j$'s, for any $\mathbf{p} \in \Omega_{N,p}$,
$$\Pr[R \ge r \mid E_{\mathbf{p}}] = \Pr\big[e^{\lambda R} \ge e^{\lambda r} \mid E_{\mathbf{p}}\big] \le e^{-\lambda r}\, \mathbb{E}\big[e^{\lambda R} \mid E_{\mathbf{p}}\big] = e^{-\lambda r} \prod_{j=2}^{N} \left(\frac{C(\rho^j e^{\lambda j})}{C(\rho^j)\, e^{\lambda j m}}\right)^{p_j}.$$
Abbreviate $\tau_j := C\big((\rho e^{\lambda})^j\big)/(\rho e^{\lambda})^{jm}$ for $j \in \mathbb{N}$. By Lemma 3.8 there exists $A_1 > 0$ such that
$$\Pr[P_1 = p, R \ge r \mid \mathcal{P}_N] = \sum_{\mathbf{p} \in \Omega_{N,p}} \Pr[R \ge r \mid E_{\mathbf{p}}]\, \Pr[E_{\mathbf{p}} \mid \mathcal{P}_N] \le A_1\, e^{-\lambda r}\, N \exp\Big(-\sum_{1 \le j \le N} \frac{C(\rho^j)}{j \rho^{jm}}\Big) \frac{(C(\rho)/\rho^m)^p}{p!} \sum_{\mathbf{p} \in \Omega_{N,p}} \prod_{2 \le j \le N} \frac{(\tau_j/j)^{p_j}}{p_j!}. \quad (3.15)$$
With Proposition 3.5 we find $A_2 > 0$ such that
$$\exp\Big(-\sum_{1 \le j \le N} \frac{C(\rho^j)}{j \rho^{jm}}\Big) \le \exp\Big(-c_m \sum_{1 \le j \le N} \frac{1}{j}\Big) \le A_2\, N^{-c_m}.$$
Let $H_j \sim \mathrm{Po}(\tau_j/j)$ be independent for $j = 2, \dots, N$ and set $\tau := \exp\big(\sum_{2 \le j \le N} \tau_j/j\big)$. Moreover, abbreviate $B := C(\rho)/\rho^m$. From (3.15) we obtain that there is an $A_3 > 0$ such that
$$\Pr[P_1 = p, R \ge r \mid \mathcal{P}_N] \le A_3\, e^{-\lambda r}\, N^{1 - c_m} \cdot \tau \cdot \frac{B^p}{p!} \sum_{\mathbf{p} \in \Omega_{N,p}} \prod_{j=2}^{N} \Pr[H_j = p_j]. \quad (3.16)$$
Note that
$$\sum_{\mathbf{p} \in \Omega_{N,p}} \prod_{j=2}^{N} \Pr[H_j = p_j] = \Pr\Big[\sum_{j=2}^{N} j H_j = N - p\Big] = \tau^{-1} \cdot [x^{N-p}] \exp\Big(\sum_{2 \le j \le N} \frac{\tau_j}{j}\, x^j\Big).$$
The $\tau_j$'s enter only for $2 \le j \le N$; however, $[x^M] \exp\big(\sum_{j \ge 2} \tau_j x^j/j\big) = [x^M] \exp\big(\sum_{2 \le j \le M} \tau_j x^j/j\big)$ for all $M \in \mathbb{N}$. Then
$$\exp\Big(\sum_{j \ge 2} \frac{\tau_j}{j}\, x^j\Big) = \exp\Big(c_m \sum_{j \ge 1} \frac{x^j}{j}\Big) \cdot \exp\Big(-c_m x + \sum_{j \ge 2} \frac{x^j}{j}(\tau_j - c_m)\Big) =: G_2(x) \cdot H_2(x).$$
By Proposition 3.5 there exists a constant $A_4 > 0$ such that $\tau_j \le c_m\big(1 + A_4 (\rho e^{\lambda})^j\big)$. With this at hand we deduce that $H_2(x)$ has radius of convergence (at least) $(\rho e^{\lambda})^{-1}$, which by our choice of $\lambda$ is $> 1$. Note that $G_2(x) = (1 - x)^{-c_m}$, which shows together with Lemma 3.2 that $G_2$ has property $(\mathcal{S}1)$ with radius of convergence $1$. As $G_2(x)$ has only non-negative coefficients, by Lemma 2.3 and the remark in (2.2) there is an $A_5 > 0$ such that
$$[x^{N-p}]\, G_2(x) H_2(x) \le A_5\, (N - p)^{c_m - 1}, \qquad p = 0, \dots, N - 1.$$
All in all,
$$\Pr\Big[\sum_{2 \le j \le N} j H_j = N - p\Big] \le A_5\, \tau^{-1} (N - p)^{c_m - 1}, \qquad p = 0, \dots, N - 1. \quad (3.17)$$
For the case $p = N$ note that the probability that $\sum_{2 \le j \le N} j H_j = 0$ equals $\tau^{-1}$. Putting the pieces together, we get from (3.16) that there is an $A_6 > 0$ such that
$$\Pr[P_1 = p, R \ge r \mid \mathcal{P}_N] \le A_6\, e^{-\lambda r}\, N^{1 - c_m} \left(\frac{B^N}{N!} + \frac{B^p}{p!}\, (N - p)^{c_m - 1} \cdot \mathbf{1}[p < N]\right). \quad (3.18)$$
Observe that $N^{1 - c_m}\, B^N/N! \le e^{-\lambda N}$ for $N$ large enough. Additionally, if $N/2 \le p < N$, then for $N$ large enough
$$N^{1 - c_m}\, \frac{(e^{\lambda} B)^p}{p!}\, (N - p)^{c_m - 1} = (1 - p/N)^{c_m - 1}\, \frac{(e^{\lambda} B)^p}{p!} \le \max\{1, N^{1 - c_m}\} \cdot \frac{(e^{\lambda} B)^p}{p!} \le 1,$$
while for $0 \le p \le N/2$,
$$(1 - p/N)^{c_m - 1}\, \frac{(e^{\lambda} B)^p}{p!} \le \max\{2^{1 - c_m}, 1\} \cdot e^{e^{\lambda} B}\, \Pr\big[\mathrm{Po}(e^{\lambda} B) = p\big] \le \max\{2^{1 - c_m}, 1\} \cdot e^{e^{\lambda} B}$$
is also bounded. Plugging these bounds into (3.18) and extracting the factor $e^{-\lambda p}$ that was inserted above yields $\Pr[P_1 = p, R \ge r \mid \mathcal{P}_N] \le A\, e^{-\lambda(p + r)}$, which completes the proof with $a = e^{-\lambda}$.

We have just proven that $P_1, R$ have (joint) exponential tails when conditioned on $\mathcal{P}_N$. The next lemma is the last essential step towards the proof of Theorem 1.1; in it we estimate $\Pr[\mathcal{E}_n \mid \mathcal{P}_N]$. Recall from (3.9) that
$$\Pr[\mathcal{E}_n \mid \mathcal{P}_N] = \Pr[L + R = n - mN \mid \mathcal{P}_N], \quad \text{where} \quad L = \sum_{1 \le i \le P_1} (C_{1,i} - m).$$

Lemma 3.10.
Let $C(x)$ be subexponential. Then
$$\Pr[\mathcal{E}_n \mid \mathcal{P}_N] \sim c_{n - m(N-1)}\, \rho^{n - mN}, \qquad n, N, n - mN \to \infty.$$

Proof.
For the entire proof we abbreviate $\widetilde{N} := n - mN$. Then
$$\Pr[\mathcal{E}_n \mid \mathcal{P}_N] = \sum_{p \ge 0} \sum_{r \ge 0} \Pr\big[L = \widetilde{N} - r \mid \mathcal{P}_N, P_1 = p, R = r\big]\, \Pr[P_1 = p, R = r \mid \mathcal{P}_N]. \quad (3.19)$$
For brevity, let us write in the remainder $D_{N,p,r} = \mathcal{P}_N \cap \{P_1 = p\} \cap \{R = r\}$ and
$$Q_{\widetilde{N}} := \Pr\big[C_{1,1} = \widetilde{N} + m\big] = \frac{c_{n - m(N-1)}\, \rho^{n - m(N-1)}}{C(\rho)}.$$
We will show that
$$\Pr\big[L = \widetilde{N} - r \mid D_{N,p,r}\big] \sim p \cdot Q_{\widetilde{N}} \quad \text{for } p, r \in \mathbb{N}, \text{ as } \widetilde{N} \to \infty. \quad (3.20)$$
Let $a \in (0, 1)$ be the constant guaranteed to exist by Lemma 3.9, and choose $\delta > 0$ such that $(1 + \delta) a < 1$. We will also show that there are $C > 0$ and $N_0 \in \mathbb{N}$ such that
$$\Pr\big[L = \widetilde{N} - r \mid D_{N,p,r}\big] \le C (1 + \delta)^{p + r} \cdot Q_{\widetilde{N}} \quad \text{for all } p, r \in \mathbb{N},\ \widetilde{N} \ge N_0. \quad (3.21)$$
From the two facts (3.20) and (3.21) the statement in the lemma can be obtained as follows. We will assume throughout that $\delta$ is fixed as described above, say for concreteness $\delta = (a^{-1} - 1)/2$, and choose an $0 < \varepsilon < 1$. We will choose $K \in \mathbb{N}$ in dependence of $\varepsilon$ only, and we will split the double sum in (3.19) into three parts with $(p, r)$ in the sets
$$B_{\le} = \{(p, r) : 0 \le p, r \le K\}, \quad B_{>,\cdot} = \{(p, r) : p > K,\ r \in \mathbb{N}\}, \quad B_{\cdot,>} = \{(p, r) : p \in \mathbb{N},\ r > K\}.$$
We will show that the main contribution to $\Pr[\mathcal{E}_n \mid \mathcal{P}_N]$ stems from $B_{\le}$, while the other two parts contribute rather insignificantly. Let us begin with treating the latter parts. Observe that using Lemma 3.9 and (3.21) we obtain that there is a constant $C' > 0$ such that for all $r \in \mathbb{N}$ and $K$ large enough in dependence of $\varepsilon$,
$$\sum_{p > K} \Pr\big[L = \widetilde{N} - r \mid D_{N,p,r}\big]\, \Pr[P_1 = p, R = r \mid \mathcal{P}_N] \le C' \sum_{p > K} (1 + \delta)^{p+r}\, a^{p+r} \cdot Q_{\widetilde{N}} \le \varepsilon \cdot \big((1 + \delta) a\big)^r \cdot Q_{\widetilde{N}}.$$
Since $(1 + \delta) a < 1$, summing this over all $r$ readily yields for $c = (1 - (1 + \delta) a)^{-1}$ that
$$\sum_{(p,r) \in B_{>,\cdot}} \Pr\big[L = \widetilde{N} - r \mid D_{N,p,r}\big]\, \Pr[P_1 = p, R = r \mid \mathcal{P}_N] \le c\, \varepsilon \cdot Q_{\widetilde{N}}. \quad (3.22)$$
Completely analogously, with the roles of $p, r$ interchanged, we obtain that also
$$\sum_{(p,r) \in B_{\cdot,>}} \Pr\big[L = \widetilde{N} - r \mid D_{N,p,r}\big]\, \Pr[P_1 = p, R = r \mid \mathcal{P}_N] \le c\, \varepsilon \cdot Q_{\widetilde{N}}. \quad (3.23)$$
It remains to handle the part of the sum in (3.19) with $(p, r) \in B_{\le}$. Using (3.20) we infer that
$$\sum_{(p,r) \in B_{\le}} \Pr\big[L = \widetilde{N} - r \mid D_{N,p,r}\big]\, \Pr[P_1 = p, R = r \mid \mathcal{P}_N] \sim \sum_{(p,r) \in B_{\le}} p\, \Pr[P_1 = p, R = r \mid \mathcal{P}_N] \cdot Q_{\widetilde{N}}.$$
Using Lemma 3.9 once again, note that we can choose $K$ large enough such that
$$\sum_{0 \le p \le K} \sum_{r > K} p\, \Pr[P_1 = p, R = r \mid \mathcal{P}_N] \le A \sum_{0 \le p \le K} \sum_{r > K} p\, a^{p+r} \le \varepsilon$$
and that
$$\Big| \sum_{p \ge 0} p\, \Pr[P_1 = p \mid \mathcal{P}_N] - \sum_{0 \le p \le K} p\, \Pr[P_1 = p \mid \mathcal{P}_N] \Big| = \sum_{p > K} p\, \Pr[P_1 = p \mid \mathcal{P}_N] \le \varepsilon.$$
Altogether this establishes that
$$\Big| \sum_{(p,r) \in B_{\le}} \Pr\big[L = \widetilde{N} - r \mid D_{N,p,r}\big]\, \Pr[P_1 = p, R = r \mid \mathcal{P}_N] - \mathbb{E}[P_1 \mid \mathcal{P}_N]\, Q_{\widetilde{N}} \Big| \le 3 \varepsilon\, Q_{\widetilde{N}}.$$
Corollary 3.7 asserts that $\mathbb{E}[P_1 \mid \mathcal{P}_N] \to C(\rho)\rho^{-m}$. Since $\varepsilon > 0$ was arbitrary, we conclude that $\Pr[\mathcal{E}_n \mid \mathcal{P}_N] \sim C(\rho)\rho^{-m} \cdot Q_{\widetilde{N}}$, which is the claim of the lemma.

In order to complete the proof it remains to show the two claims (3.20) and (3.21). We begin with (3.20). Note that for $p, r \in \mathbb{N}$,
$$\Pr\big[L = \widetilde{N} - r \mid \mathcal{P}_N, P_1 = p, R = r\big] = \Pr\Big[\sum_{1 \le i \le p} C_{1,i} = \widetilde{N} - r + pm\Big]. \quad (3.24)$$
Recall that $\Pr[C_{1,1} = k] = c_k \rho^k / C(\rho)$, where $\rho$ is the radius of convergence of $C$. Since $C$ is subexponential, $c_{k-1} \sim \rho\, c_k$, and thus the distribution of the $C_{1,i}$'s is also subexponential with $\Pr[C_{1,1} = k - 1] \sim \Pr[C_{1,1} = k]$. We obtain with Lemma 2.2 (i) that the latter probability is $\sim p \Pr[C_{1,1} = \widetilde{N} - r + pm]$, as $\widetilde{N} \to \infty$.
Moreover, as $\widetilde{N} \to \infty$, $\Pr[C_{1,1} = \widetilde{N} - r + pm] \sim Q_{\widetilde{N}}$, and (3.20) is established.

We finally show (3.21). Our starting point is again (3.24). Note that with Lemma 2.2 (ii) there are $C > 0$ and $N_1 \in \mathbb{N}$ such that the sought probability is at most $C (1 + \delta)^p \Pr[C_{1,1} = \widetilde{N} - r + pm]$ for all $\widetilde{N} - r + pm \ge N_1$. Moreover, as we have argued in the previous paragraph, the distribution of $C_{1,1}$ is subexponential with $\Pr[C_{1,1} = k - 1] \sim \Pr[C_{1,1} = k]$; we thus may choose $C$ and $N_1$ large enough such that in addition $\Pr[C_{1,1} = \widetilde{N} - r + pm] \le C (1 + \delta)^r Q_{\widetilde{N}}$. This establishes (3.21) if $\widetilde{N} - r + pm \ge N_1$. To treat the remaining cases, note that in this situation we have $r > \widetilde{N} - N_1$. Since the probability generating series of $C_{1,1}$ is subexponential with radius of convergence $1$, the probability $Q_{\widetilde{N}}$ decays subexponentially, so that $C (1 + \delta)^r Q_{\widetilde{N}} > 1$ for $\widetilde{N}$ large enough; thus (3.21) is trivially true in this case.

With all these facts at hand the proof of Theorem 1.1 is straightforward. With Lemmas 3.4 and 3.6 (in particular, Equation (3.11)) we obtain, as $n, N, n - mN \to \infty$,
$$[x^n y^N]\, G(x, y) = G(\rho)\, \rho^{-n}\, \Pr[\mathcal{E}_n \mid \mathcal{P}_N]\, \Pr[\mathcal{P}_N] \sim \frac{1}{\Gamma(c_m)} \exp\Big(\sum_{j \ge 1} \frac{C(\rho^j) - c_m \rho^{jm}}{j}\, \rho^{-jm}\Big)\, N^{c_m - 1}\, c_{n - m(N-1)}.$$

3.3 Proof of Theorem 1.2

Let us begin with (re-)collecting all basic definitions that will be needed in the proof. Suppose that $C(x)$ is subexponential with radius of convergence $0 < \rho < 1$ and let $m := \min\{k \in \mathbb{N} : c_k > 0\}$, see also (3.1). Moreover, let $P_j \sim \mathrm{Po}\big(C(\rho^j)/j\big)$, $j \in \mathbb{N}$, and let $C_{j,1}, \dots, C_{j,P_j}$, $j \in \mathbb{N}$, have the distribution specified in (3.5), that is, $\Pr[C_{j,i} = k] = c_k \rho^{jk}/C(\rho^j)$, $k, i, j \in \mathbb{N}$. We assume that all these random variables are independent. Let $\mathcal{P}_N, \mathcal{E}_n$ be as in (3.3), that is, with
$$P = \sum_{j \ge 1} j P_j, \qquad L = \sum_{1 \le i \le P_1} (C_{1,i} - m), \qquad R = \sum_{j \ge 2} j \sum_{1 \le i \le P_j} (C_{j,i} - m)$$
we have that $\mathcal{P}_N = \{P = N\}$ and $\mathcal{E}_n = \{L + R = n - mP\}$.

With this notation at hand, let $G_{n,N}$ be a uniformly drawn random object from $\mathcal{G}_{n,N}$, meaning that the number of atoms is $n$ and the number of components is $N$.
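The conditional description of $G_{n,N}$ developed next also suggests a simple (if inefficient) exact sampler: run the $\Lambda G$ construction repeatedly and keep the first outcome with size $n$ and $N$ components. The sketch below is our own illustration for the partition case ($\mathcal{C}$ the positive integers, sampled at a tilt $z = 0.5 < 1$ instead of the radius of convergence, as in the earlier illustration); every partition of $n$ into $N$ parts then appears with equal probability.

```python
import random
from math import exp

def poisson(lam, rng):
    # Inverse-transform sampling of a Poisson(lam) random variable.
    u, k, p = rng.random(), 0, exp(-lam)
    s = p
    while u > s:
        k += 1
        p *= lam / k
        s += p
    return k

def sample_Lambda_G(z, rng, jmax=40):
    # Lambda_G for C = positive integers (C(z) = z/(1-z)): a Boltzmann-tilted
    # random partition; each entry of `out` is one component (= one part).
    out = []
    for j in range(1, jmax + 1):
        lam = (z ** j / (1 - z ** j)) / j
        for _ in range(poisson(lam, rng)):
            part = 1
            while rng.random() < z ** j:  # Boltzmann sampler for C at z^j
                part += 1
            out.extend([part] * j)
    return sorted(out)

def sample_G_nN(n, N, z, rng):
    # Rejection: conditioned on size n and N components, Lambda_G is uniform
    # on G_{n,N} (here: on the partitions of n into exactly N parts).
    while True:
        s = sample_Lambda_G(z, rng)
        if sum(s) == n and len(s) == N:
            return tuple(s)

rng = random.Random(2)
counts = {}
for _ in range(300):
    g = sample_G_nN(6, 3, 0.5, rng)
    counts[g] = counts.get(g, 0) + 1
# The three partitions of 6 into 3 parts, roughly equally frequent:
print(sorted(counts))  # [(1, 1, 4), (1, 2, 3), (2, 2, 2)]
```

The acceptance probability decays with $n$, which is exactly why the paper analyses the conditional law directly instead of sampling it.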
According to Lemma 3.3, and using that the Boltzmann model induces the uniform distribution on objects of the same size, we infer that
$$\Pr[G_{n,N} = G] = \frac{1}{|\mathcal{G}_{n,N}|} = \frac{\rho^n / G(\rho)}{|\mathcal{G}_{n,N}|\, \rho^n / G(\rho)} = \frac{\Pr[\Lambda G = G]}{\Pr[\mathcal{P}_N, \mathcal{E}_n]} = \Pr[\Lambda G = G \mid \mathcal{P}_N, \mathcal{E}_n], \qquad G \in \mathcal{G}_{n,N},$$
that is, studying the distribution of $G_{n,N}$ boils down to considering the distribution of $\Lambda G$ conditional on both $\mathcal{P}_N$ and $\mathcal{E}_n$. This is the starting point of our investigations. In particular, $G_{n,N}$ has $N$ components with sizes given by the vector $(C_{j,i} : 1 \le j \le N,\ 1 \le i \le P_j)$. Our aim here is to study the properties of that vector in the conditional space given by $\mathcal{P}_N, \mathcal{E}_n$. To this end, set
$$M^* := \max_{j \ge 1,\ 1 \le i \le P_j} C_{j,i} \quad \text{and} \quad C^*_p := \max\{C_{1,1}, \dots, C_{1,p}\} \quad \text{for } p \in \mathbb{N}. \quad (3.25)$$
Then the statement of the theorem is that, conditional on $\mathcal{P}_N, \mathcal{E}_n$, we have $M^* = n - mN + O_p(1)$; since the total number of atoms is $n$, the number of components is $N$, and the smallest possible component contains $m$ atoms, this immediately implies that there are $N + O_p(1)$ components with exactly $m$ atoms, and all remaining components have a total size of $O_p(1)$ as well.

The general proof strategy in the remaining section is as follows. We first show in Lemma 3.11 that both $P_1, R$ are "small" in the conditioned space; this makes sure that only a bounded number of entries in the vector $(C_{j,i})_{j \ge 2,\ 1 \le i \le P_j}$ are larger than $m$, and that this total excess is bounded. Hence, the remaining number of $n - (N - P_1)m + O_p(1)$ atoms is to be found in the components with sizes in $(C_{1,i})_{1 \le i \le P_1}$. In Lemma 3.11 we also exclude that $P_1$ grows too large conditioned on $\mathcal{E}_n, \mathcal{P}_N$; indeed, we show that it is stochastically bounded. Then the property of subexponentiality guarantees that the maximum of the $C_{1,i}$'s alone dominates the entire sum, cf. Lemma 2.2 (iii), and Theorem 1.2 follows.

Let us now fill this overview with details. Recall Lemma 3.9, which says that $P_1, R$ have (joint) exponential tails given $\mathcal{P}_N$. We show that conditioning in addition on $\mathcal{E}_n$ does not change the behaviour qualitatively. The proof can be found at the end of the section.

Lemma 3.11.
There exist constants $A > 0$ and $0 < a < 1$ such that
$$\Pr[P_1 = p, R = r \mid \mathcal{E}_n, \mathcal{P}_N] \le A \cdot a^{p + r}, \qquad p, r, n, N \in \mathbb{N}.$$

With this lemma the proof of the theorem can be completed as follows. Let $\varepsilon > 0$ and $\widetilde{N} = n - mN$. With $M^*$ as in (3.25) we will show that there is $K \in \mathbb{N}$ such that
$$\Pr\big[|M^* - \widetilde{N}| \ge K \mid \mathcal{E}_n, \mathcal{P}_N\big] < \varepsilon$$
for $n, N, \widetilde{N}$ sufficiently large, which is the statement of the theorem. According to Lemma 3.11 there exist constants $C_R, C_P \in \mathbb{N}$ such that
$$\Pr[R \ge C_R \text{ or } P_1 \ge C_P \mid \mathcal{E}_n, \mathcal{P}_N] < \varepsilon/2, \qquad n, N, \widetilde{N} \in \mathbb{N}.$$
We deduce
$$\Pr\big[|M^* - \widetilde{N}| \ge K \mid \mathcal{E}_n, \mathcal{P}_N\big] \le \frac{\varepsilon}{2} + \sum_{0 \le r < C_R} \sum_{0 \le p < C_P} \Pr\big[|M^* - \widetilde{N}| \ge K \mid \mathcal{E}_n, \mathcal{P}_N, R = r, P_1 = p\big]. \quad (3.26)$$
Note that we only need to consider values of $p$ that are at least $1$, as $p = 0$ would entail $R = \widetilde{N}$, which contradicts $R = r \le C_R < \widetilde{N}$ for $\widetilde{N}$ large enough. The event "$\mathcal{E}_n, \mathcal{P}_N, R = r, P_1 = p$" implies that $C_{j,i} \le m + r$ for all $j \ge 2$, $1 \le i \le P_j$, and that $S_p := \sum_{1 \le i \le p} C_{1,i} = \widetilde{N} - r + pm$. Recall the definition of $C^*_p$ from (3.25). Assume that $C^*_p \le m + r$; then we get the contradiction $\widetilde{N} - r + pm = S_p \le p(m + r) < \widetilde{N} - r + pm$ for $\widetilde{N}$ large enough. It follows that $C^*_p > m + r$, and hence we are allowed to interchange $C^*_p$ and $M^*$ in this conditioned space. That yields
$$\Pr\big[|M^* - \widetilde{N}| \ge K \mid \mathcal{E}_n, \mathcal{P}_N, R = r, P_1 = p\big] = \Pr\big[|C^*_p - \widetilde{N}| \ge K \mid S_p = \widetilde{N} - r + pm\big]$$
for $1 \le p < C_P$, $0 \le r < C_R$. As $C^*_p$ is at most $\widetilde{N} - r + pm$ under this condition, we obtain in particular that $\{C^*_p \ge \widetilde{N} + K\} = \emptyset$ for $K \ge m C_P$ as long as $0 \le p < C_P$ and $r \ge 0$. Consequently, for $1 \le p < C_P$, $0 \le r < C_R$,
$$\Pr\big[|C^*_p - \widetilde{N}| \ge K \mid S_p = \widetilde{N} - r + pm\big] = \Pr\big[C^*_p \le \widetilde{N} - K \mid S_p = \widetilde{N} - r + pm\big].$$
Now Lemma 2.2 (iii) is applicable, as $C_{1,i}$ has a subexponential distribution for $1 \le i \le p$, and hence for $1 \le p < C_P$, $0 \le r < C_R$ we have $\big(C^*_p \mid S_p = \widetilde{N} - r + pm\big) = \widetilde{N} - r + pm + O_p(1)$ as $\widetilde{N} \to \infty$. Consequently, choosing $K$ large enough,
$$\Pr\big[C^*_p \le \widetilde{N} - K \mid S_p = \widetilde{N} - r + pm\big] < \frac{\varepsilon}{2 C_R C_P}, \qquad 1 \le p < C_P,\ 0 \le r < C_R.$$
We conclude from (3.26) that
$$\Pr\big[|M^* - \widetilde{N}| \ge K \mid \mathcal{E}_n, \mathcal{P}_N\big] \le \frac{\varepsilon}{2} + \sum_{0 \le r < C_R} \sum_{1 \le p < C_P} \Pr\big[C^*_p \le \widetilde{N} - K \mid S_p = \widetilde{N} - r + pm\big] < \varepsilon.$$
Since $\varepsilon > 0$ was arbitrary, it follows that $\big(M^* \mid \mathcal{E}_n, \mathcal{P}_N\big) = \widetilde{N} + O_p(1)$, and the proof is completed.

Proof of Lemma 3.11.
We start with the observation
$$\Pr[P_1 = p, R = r \mid \mathcal{E}_n, \mathcal{P}_N] = \Pr[\mathcal{E}_n \mid P_1 = p, R = r, \mathcal{P}_N]\, \Pr[P_1 = p, R = r \mid \mathcal{P}_N]\, \Pr[\mathcal{E}_n \mid \mathcal{P}_N]^{-1}. \quad (3.27)$$
Set $\widetilde{N} := n - mN$ and $L_p := \sum_{1 \le i \le p} (C_{1,i} - m)$ for $p \in \mathbb{N}$, as well as $Q_{\widetilde{N}} = \Pr[C_{1,1} - m = \widetilde{N}]$. Let $0 < a < 1$ be the constant from Lemma 3.9 and choose $\delta > 0$ such that $(1 + \delta) a < 1$. With (3.21) we obtain that there exists $A_1 > 0$ such that
$$\Pr[\mathcal{E}_n \mid P_1 = p, R = r, \mathcal{P}_N] = \Pr\big[L_p = \widetilde{N} - r \mid \mathcal{P}_N\big] \le A_1 (1 + \delta)^{p + r}\, Q_{\widetilde{N}}, \qquad p, r, n, N \in \mathbb{N}.$$
Lemma 3.9 tells us that we find $A_2 > 0$ such that
$$\Pr[P_1 = p, R = r \mid \mathcal{P}_N] \le A_2\, a^{p + r}, \qquad p, r, N \in \mathbb{N}.$$
Finally, according to Lemma 3.10 there is a constant $A_3 > 0$ such that
$$\Pr[\mathcal{E}_n \mid \mathcal{P}_N] \ge A_3\, Q_{\widetilde{N}}, \qquad n, N \in \mathbb{N},$$
and the claim follows with $a$ replaced by $(1 + \delta) a < 1$.

3.4 Proof of Theorem 1.3

For the proof of this theorem we use the equivalent definition of multisets in which all objects not occurring in $G \in \mathcal{G}$ are counted with multiplicity $d = 0$. Let $G = \{(C, d_C) : C \in \mathcal{C}_{>m}\} \cup \{(C,$
0) : C ∈ C_m} ∈ G and assume that N(n) ≡ N is such that N(n), n − mN(n) → ∞ as n → ∞. Let us write R_{n,N} for the object obtained after removing (i.e. setting the multiplicity to 0) all objects of size m and a largest component (i.e. decreasing its multiplicity by one) from G_{n,N}. The statement of the theorem is equivalent to showing that Pr[R_{n,N} = G] → exp(−Σ_{j ≥ 1} (C(ρ^j) − c_m ρ^{jm}) / (jρ^{jm})) · ρ^{|G| − mκ(G)} as n → ∞, see also (1.3). Defining the family of multiplicity counting functions (d_C(·))_{C ∈ C} by (d_C(G))_{C ∈ C} = (d_C)_{C ∈ C} for G = {(C, d_C) : C ∈ C} ∈ G, we immediately obtain that Pr[R_{n,N} = G] = Pr[∀ C ∈ C_{>m} : d_C(R_{n,N}) = d_C]. Let
S > max{m, |G|} be some arbitrary integer to be specified later. We infer that Pr[R_{n,N} = G] ≤ Pr[∀ C ∈ C_{m+1,S} : d_C(R_{n,N}) = d_C]. To obtain a lower bound, since
S > |G|, we observe that {∀ C ∈ C_{>m} : d_C(R_{n,N}) = d_C} is the same as {∀ C ∈ C_{m+1,S} : d_C(R_{n,N}) = d_C} ∩ {∀ C ∈ C_{>S} : d_C(R_{n,N}) = 0}. Moreover, note that |R_{n,N}| ≤ S implies d_C(R_{n,N}) = 0 for all C ∈ C_{>S}. Thus Pr[R_{n,N} = G] ≥ Pr[∀ C ∈ C_{m+1,S} : d_C(R_{n,N}) = d_C, |R_{n,N}| ≤ S] ≥ Pr[∀ C ∈ C_{m+1,S} : d_C(R_{n,N}) = d_C] − Pr[|R_{n,N}| > S]. Let ε >
0. According to Theorem 1.2 there is S_0 > max{m, |G|} so that Pr[|R_{n,N}| > S_0] < ε. Hence Pr[R_{n,N} = G] differs by at most ε from Pr[∀ C ∈ C_{m+1,S} : d_C(R_{n,N}) = d_C] for all S > S_0. Let us write L_{n,N} for a largest component of G_{n,N}. Theorem 1.2 guarantees that |L_{n,N}| is unbounded whp, and so we obtain for any S ∈ ℕ that Pr[∀ C ∈ C_{m+1,S} : d_C(R_{n,N}) = d_C] = Pr[∀ C ∈ C_{m+1,S} : d_C(R_{n,N}) = d_C, |L_{n,N}| > S] + o(1). However, the event {∀ C ∈ C_{m+1,S} : d_C(R_{n,N}) = d_C, |L_{n,N}| > S} is equivalent to the event {∀ C ∈ C_{m+1,S} : d_C(G_{n,N}) = d_C, |L_{n,N}| > S}, since we obtain R_{n,N} by removing all components of size m and a largest component (of size > S) from G_{n,N}. Now we add and subtract Pr[∀ C ∈ C_{m+1,S} : d_C(G_{n,N}) = d_C, |L_{n,N}| ≤ S] = o(1) in order to get rid of the event |L_{n,N}| > S and arrive at the fact that Pr[∀ C ∈ C_{m+1,S} : d_C(R_{n,N}) = d_C] = Pr[∀ C ∈ C_{m+1,S} : d_C(G_{n,N}) = d_C] + o(1). Combining all previous facts yields that for n sufficiently large |Pr[R_{n,N} = G] − Pr[∀ C ∈ C_{m+1,S} : d_C(G_{n,N}) = d_C]| ≤ 2ε, (3.28) and thus we are left with estimating Pr[∀ C ∈ C_{m+1,S} : d_C(G_{n,N}) = d_C]. For v_S := (v_C)_{C ∈ C_{m+1,S}} denote by G(x, y, v_S) the generating series of G in which x marks the size, y the number of components and v_S = (v_C)_{C ∈ C_{m+1,S}} the multiplicities of (C)_{C ∈ C_{m+1,S}}; in other words, for ℓ, k ∈ ℕ and t_S := (t_C)_{C ∈ C_{m+1,S}} ∈ ℕ^{|C_{m+1,S}|} the coefficients are given by g_{ℓ,k,t_S} = |{G ∈ G : |G| = ℓ, κ(G) = k, ∀ C ∈ C_{m+1,S} : d_C(G) = t_C}|. Setting v_C = 1 for all C ∈ C_{m+1,S} we obtain the generating series G(x, y) counting only size and number of components by x and y, respectively.
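To make these refined coefficients concrete, the following brute-force Python sketch checks, by direct enumeration, the exact-multiplicity identity that the product representation (3.30) below encodes. It uses a hypothetical toy class with four objects of sizes 1, 2, 2, 3 (an invented example, not one of the classes treated in the paper) and tracks a single object C_0 of size 2; the identity then reads: the number of multisets of size n with N components containing C_0 exactly d times equals [x^{n−2d} y^{N−d}] G(x, y)(1 − x²y).

```python
from itertools import product

# Hypothetical toy class: four objects of sizes 1, 2, 2, 3; object "B"
# (size 2) plays the role of the tracked object C_0. Not from the paper.
SIZES = {"A": 1, "B": 2, "C": 2, "D": 3}
NMAX = 12  # enumerate all multisets of total size at most NMAX

g = {}  # g[(n, N)]    = number of multisets of size n with N components
e = {}  # e[(n, N, d)] = ... containing exactly d copies of the object B
bounds = [NMAX // s + 1 for s in SIZES.values()]
for ds in product(*(range(b) for b in bounds)):
    size = sum(d * s for d, s in zip(ds, SIZES.values()))
    if size <= NMAX:
        comps = sum(ds)
        g[(size, comps)] = g.get((size, comps), 0) + 1
        e[(size, comps, ds[1])] = e.get((size, comps, ds[1]), 0) + 1

# Identity extracted from (3.30) with a single tracked object of size 2:
# e(n, N, d) = [x^(n-2d) y^(N-d)] G(x, y) (1 - x^2 y)
#            = g(n-2d, N-d) - g(n-2d-2, N-d-1).
for n in range(NMAX + 1):
    for N in range(n + 1):
        for d in range(n // 2 + 1):
            lhs = e.get((n, N, d), 0)
            rhs = g.get((n - 2 * d, N - d), 0) - g.get((n - 2 * d - 2, N - d - 1), 0)
            assert lhs == rhs, (n, N, d)
print("coefficient identity verified up to size", NMAX)
```

The check is exhaustive up to the chosen size bound; combinatorially it is the bijection that removes d fixed copies of C_0, which is exactly what the factor (1 − x^{|C_0|}y)/(1 − x^{|C_0|}y v_{C_0}) encodes.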
As G_{n,N} is drawn uniformly at random from G_{n,N}, the proof reduces to determining Pr[∀ C ∈ C_{m+1,S} : d_C(G_{n,N}) = d_C] = [x^n y^N v_S^d] G(x, y, v_S) / [x^n y^N] G(x, y). The following lemma, whose proof is deferred to the end of this section, accomplishes this task.
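Before stating it, here is a quick numerical sanity check (in Python) of the product-to-exponential conversion Π_{m<ℓ≤S} (1 − ρ^{ℓ−m})^{c_ℓ} = exp(−Σ_{j≥1} C_{m+1,S}(ρ^j)/(jρ^{jm})) that is used below to identify the limit; the data m = 2, ρ = 0.4 and counts c_ℓ = ℓ² are hypothetical placeholders, not taken from the paper.

```python
import math

# Check: prod_{m < l <= S} (1 - rho^(l-m))^{c_l}
#      = exp(-sum_{j >= 1} C_{m+1,S}(rho^j) / (j * rho^(j*m)))
# with C_{m+1,S}(x) = sum_{m < l <= S} c_l x^l. Hypothetical toy data below.
m, rho, S = 2, 0.4, 30
c = {l: l * l for l in range(m + 1, S + 1)}  # toy counting sequence c_l = l^2

product_form = math.prod((1 - rho ** (l - m)) ** c[l] for l in c)

def C_poly(x):
    """The polynomial C_{m+1,S}(x) for the toy counts."""
    return sum(c[l] * x ** l for l in c)

# Truncating the j-sum at 200 terms is far beyond double precision here.
exp_form = math.exp(-sum(C_poly(rho ** j) / (j * rho ** (j * m))
                         for j in range(1, 200)))
assert abs(product_form - exp_form) < 1e-9 * product_form
```

Expanding log(1 − ρ^{ℓ−m}) = −Σ_{j≥1} ρ^{j(ℓ−m)}/j and exchanging the two sums gives the identity; the numerical agreement is just a guard against sign or index slips.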
Lemma 3.12.
Let d = (d_C)_{C ∈ C_{m+1,S}} with D := Σ_{C ∈ C_{m+1,S}} |C| d_C and D′ := Σ_{C ∈ C_{m+1,S}} d_C. Then [x^n y^N v_S^d] G(x, y, v_S) / [x^n y^N] G(x, y) → ρ^{D − mD′} Π_{C ∈ C_{m+1,S}} (1 − ρ^{|C| − m}) as n → ∞. Lemma 3.12 directly yields for sufficiently large n that |Pr[∀ C ∈ C_{m+1,S} : d_C(G_{n,N}) = d_C] − ρ^{|G| − mκ(G)} Π_{C ∈ C_{m+1,S}} (1 − ρ^{|C| − m})| < ε. Now observe that, defining C_{m+1,S}(x) := Σ_{m < ℓ ≤ S} |C_ℓ| x^ℓ, we obtain lim_{S → ∞} Π_{C ∈ C_{m+1,S}} (1 − ρ^{|C| − m}) = lim_{S → ∞} Π_{m < ℓ ≤ S} exp(|C_ℓ| log(1 − ρ^{ℓ − m})) = lim_{S → ∞} exp(−Σ_{j ≥ 1} C_{m+1,S}(ρ^j) / (jρ^{jm})). By the continuity of exp(·) and monotone convergence this equals G_{>m}(ρ)^{−1}. Choose S_1 > max{m, |G|} large enough such that Π_{C ∈ C_{m+1,S}} (1 − ρ^{|C| − m}) differs by at most ε from G_{>m}(ρ)^{−1} for all S > S_1. Summarizing, fixing S ≥ max{S_0, S_1} we obtain for sufficiently large n that |Pr[∀ C ∈ C_{m+1,S} : d_C(G_{n,N}) = d_C] − ρ^{|G| − mκ(G)} G_{>m}(ρ)^{−1}| ≤ 2ε. Since ε > 0 was arbitrary, this finishes the proof. Proof of Lemma 3.12.
First we determine G(x, y, v_S) explicitly. Define the multivariate generating series C(x, y, v_S) = y C(x) + y Σ_{C ∈ C_{m+1,S}} (v_C − 1) x^{|C|}, where as usual x marks the size, y the number of components (which by convention is always 1 for C ∈ C) and v_S the objects in C_{m+1,S}. Note that these parameters are clearly additive when forming multisets. Hence, according to [19, Theorem III.1] the formula (1.1) extends to the multivariate version G(x, y, v_S) = exp(Σ_{j ≥ 1} C(x^j, y^j, v_S^j) / j), (3.29) where v_S^j = (v_C^j)_{C ∈ C_{m+1,S}}. Setting v_C = 1 for all C ∈ C_{m+1,S} we see that G(x, y, 1) ≡ G(x, y), such that [x^n y^N] G(x, y) = |G_{n,N}|. By elementary algebraic manipulations we reformulate (3.29) to G(x, y, v_S) = G(x, y) exp(Σ_{C ∈ C_{m+1,S}} [Σ_{j ≥ 1} (x^{|C|} y v_C)^j / j − Σ_{j ≥ 1} (x^{|C|} y)^j / j]) = G(x, y) Π_{C ∈ C_{m+1,S}} (1 − x^{|C|} y) / (1 − x^{|C|} y v_C). (3.30) Let us now turn to the initial claim in Lemma 3.12. We obtain that [x^n y^N v_S^d] G(x, y, v_S) = [x^n y^N] G(x, y) Π_{C ∈ C_{m+1,S}} [v_C^{d_C}] (1 − x^{|C|} y) / (1 − x^{|C|} v_C y) = [x^{n − D} y^{N − D′}] G(x, y) Π_{C ∈ C_{m+1,S}} (1 − x^{|C|} y). Since C_{m+1,S} only has finitely many elements, there exist L, K ∈ ℕ such that [x^ℓ y^k] Π_{C ∈ C_{m+1,S}} (1 − x^{|C|} y) = 0 for all ℓ ≥ L, k ≥ K. Recall that, using Theorem 1.1, [x^n y^N] G(x, y) ∼ exp(Σ_{j ≥ 1} (C(ρ^j) − c_m ρ^{jm}) / (jρ^{jm})) · N^{c_m − 1} / Γ(c_m) · |C_{n − m(N − 1)}| as n → ∞, and so [x^{n − a} y^{N − b}] G(x, y) ∼ [x^n y^N] G(x, y) · ρ^{a − mb} for fixed a, b ∈ ℕ, as C is subexponential. Hence, as n → ∞, [x^n y^N v_S^d] G(x, y, v_S) = Σ_{0 ≤ ℓ ≤ L, 0 ≤ k ≤ K} [x^{n − D − ℓ} y^{N − D′ − k}] G(x, y) · [x^ℓ y^k] Π_{C ∈ C_{m+1,S}} (1 − x^{|C|} y) ∼ [x^n y^N] G(x, y) · ρ^{D − mD′} Σ_{0 ≤ ℓ ≤ L, 0 ≤ k ≤ K} ρ^{ℓ − mk} [x^ℓ y^k] Π_{C ∈ C_{m+1,S}} (1 − x^{|C|} y) = [x^n y^N] G(x, y) · ρ^{D − mD′} Π_{C ∈ C_{m+1,S}} (1 − ρ^{|C| − m}), which finishes the proof. Proof of Proposition 1.4.
Let f : B* → ℝ be a bounded continuous function and for any finite graph G denote by o_G a vertex chosen uniformly at random from its vertex set. Recall that L(G_n) denotes one largest component of G_n and R(G_n) the remainder after removing all objects of size m and L(G_n). Then E[f(G_n, o_n)] = E[f(L(G_n), o_{L(G_n)})] Pr[o_n ∈ L(G_n)] + E[f(R(G_n), o_{R(G_n)})] Pr[o_n ∈ R(G_n)] + E[f(C_m, o_m)] Pr[o_n ∉ R(G_n) ∪ L(G_n)]. We have |L(G_n)| = n − mN + O_p(1), implying Pr[o_n ∈ L(G_n)] ∼ (n − mN(n))/n → 1 − λ. As the size of L(G_n) ∈ C tends to infinity and (C_n)_{n ≥ 1} converges in the BS sense to (C, o), we have that E[f(L(G_n), o_{L(G_n)})] Pr[o_n ∈ L(G_n)] → (1 − λ) E[f(C, o)] as n → ∞. Theorem 1.3 entails that R(G_n) has a limiting distribution and hence Pr[o_n ∈ R(G_n)] →
0. As f is bounded, it follows that E[f(R(G_n), o_{R(G_n)})] Pr[o_n ∈ R(G_n)] → 0 as n → ∞. Finally, we obtain by a combination of Theorems 1.2 and 1.3 that n − |R(G_n) ∪ L(G_n)| = mN(n) + O_p(1), and hence Pr[o_n ∉ R(G_n) ∪ L(G_n)] ∼ mN(n)/n → λ. We conclude the proof by stating lim_{n → ∞} E[f(G_n, o_n)] = (1 − λ) E[f(C, o)] + λ E[f(C_m, o_m)]. Acknowledgements
The authors thank Benedikt Stufler for fruitful discussions and valuable input to the proof of Theorem 1.2.
References [1] L. Addario-Berry. A Probabilistic Approach to Block Sizes in Random Maps.
ALEA Lat. Am. J. Probab. Math. Stat., 16(1):1–13, 2019. doi: 10.30757/alea.v16-01. URL https://doi.org/10.30757/alea.v16-01. [2] D. Aldous and J. M. Steele. The Objective Method: Probabilistic Combinatorial Optimization and Local Weak Convergence. In
Probability on Discrete Structures , volume 110 of
Encyclopaedia Math. Sci., pages 1–72. Springer, Berlin, 2004. doi: 10.1007/978-3-662-09444-0_1. URL https://doi.org/10.1007/978-3-662-09444-0_1. [3] R. Arratia, A. Barbour, and S. Tavaré.
Logarithmic Combinatorial Structures: A Probabilistic Approach. EMS Monographs in Mathematics. European Mathematical Society, 2003. ISBN 9783037190005. URL https://books.google.de/books?id=oBPvAAAAMAAJ. [4] A. Barbour and B. L. Granovsky. Random Combinatorial Structures: the Convergent Case.
Journal of Combinatorial Theory, Series A, 109(2):203–220, 2005. ISSN 0097-3165. doi: 10.1016/j.jcta.2004.09.001. [5] J. P. Bell, E. A. Bender, P. J. Cameron, and L. B. Richmond. Asymptotics for the Probability of Connectedness and the Distribution of Number of Components.
The Electronic Journal of Combinatorics ,7:R33, 2000.[6] I. Benjamini and O. Schramm. Recurrence of Distributional Limits of Finite Planar Graphs.
Electron. J. Probab., 6: no. 23, 13 pp., 2001. ISSN 1083-6489. doi: 10.1214/EJP.v6-96. URL https://doi.org/10.1214/EJP.v6-96. [7] N. Bernasconi, K. Panagiotou, and A. Steger. On Properties of Random Dissections and Triangulations.
Combinatorica, 30(6):627–654, 2010. ISSN 1439-6912. doi: 10.1007/s00493-010-2464-8. URL https://doi.org/10.1007/s00493-010-2464-8. [8] M. Bodirsky, É. Fusy, M. Kang, and S. Vigerske. Enumeration and Asymptotic Properties of Unlabeled Outerplanar Graphs.
The Electronic Journal of Combinatorics, 24:R66, 2007. [9] M. Bodirsky, É. Fusy, M. Kang, and S. Vigerske. Boltzmann Samplers, Pólya Theory, and Cycle Pointing.
SIAM Journal on Computing, 40(3):721–769, 2011. ISSN 0097-5397. doi: 10.1137/100790082. URL http://dx.doi.org/10.1137/100790082. [10] V. E. Britikov. Asymptotic Number of Forests from Unrooted Trees.
Mat. Zametki, 43(5):672–684, 1988. ISSN 0025-567X. URL https://doi.org/10.1007/BF01158847. [11] S. N. Burris.
Number Theoretic Density and Logical Limit Laws , volume 86 of
Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI, 2001. ISBN 0-8218-2666-2. [12] N. Curien and I. Kortchemski. Random Non-Crossing Plane Configurations: A Conditioned Galton-Watson Tree Approach.
Random Structures & Algorithms, 45(2):236–260, 2014. doi: 10.1002/rsa.20481. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/rsa.20481. [13] M. Drmota, É. Fusy, M. Kang, V. Kraus, and J. Rué. Asymptotic Study of Subcritical Graph Classes.
SIAM J. Discrete Math., 25(4):1615–1651, 2011. doi: 10.1137/100790161. [14] M. Drmota, O. Giménez, M. Noy, K. Panagiotou, and A. Steger. The Maximum Degree of Random Planar Graphs.
Proceedings of the London Mathematical Society, 109(4):892–920, 2014. ISSN 1460-244X. doi: 10.1112/plms/pdu024. URL http://dx.doi.org/10.1112/plms/pdu024. [15] M. Drmota, E. Y. Jin, and B. Stufler. Graph Limits of Random Graphs From a Subset of Connected k-Trees. Random Structures & Algorithms, 55(1):125–152, 2019. ISSN 1042-9832. doi: 10.1002/rsa.20802. URL https://doi.org/10.1002/rsa.20802. [16] P. Duchon, P. Flajolet, G. Louchard, and G. Schaeffer. Boltzmann Samplers for the Random Generation of Combinatorial Structures.
Combinatorics, Probability and Computing , 13:2004, 2004.[17] P. Embrechts and E. Omey. Functions of Power Series.
Yokohama Mathematical Journal, 32, 1984. [18] P. Erdős and J. Lehner. The Distribution of the Number of Summands in the Partitions of a Positive Integer.
Duke Math. J. , 8:335–345, 1941. ISSN 0012-7094. URL http://projecteuclid.org/euclid.dmj/1077492649 .[19] P. Flajolet and R. Sedgewick.
Analytic Combinatorics. Cambridge University Press, New York, NY, USA, 1st edition, 2009. ISBN 0521898064, 9780521898065. [20] P. Flajolet, É. Fusy, and C. Pivoteau. Boltzmann Sampling of Unlabelled Structures. In Proceedings of the Workshop on Analytic Algorithmics and Combinatorics (ANALCO), pages 201–211. SIAM, 2007. doi: 10.1137/1.9781611972979.5. URL http://epubs.siam.org/doi/abs/10.1137/1.9781611972979.5. [21] S. Foss, D. Korshunov, and S. Zachary.
An Introduction to Heavy-tailed and Subexponential Distribu-tions . Springer New York Dordrecht Heidelberg London, 2009. ISBN 978-1-4419-9472-1.[22] A. Georgakopoulos and S. Wagner. Limits of Subcritical Random Graphs and Random Graphs withExcluded Minors.
ArXiv e-prints, 2016. [23] B. L. Granovsky and D. Stark. Asymptotic Enumeration and Logical Limit Laws for Expansive Multisets and Selections.
J. London Math. Soc. (2) , 73(1):252–272, 2006. ISSN 0024-6107. doi:10.1112/S0024610705022477. URL https://doi.org/10.1112/S0024610705022477 .[24] B. L. Granovsky, D. Stark, and M. Erlihson. Meinardus’ Theorem on Weighted Partitions: Extensionsand a Probabilistic Proof.
Adv. in Appl. Math. , 41(3):307–328, 2008. ISSN 0196-8858. doi: 10.1016/j.aam.2007.11.001. URL https://doi.org/10.1016/j.aam.2007.11.001 .[25] G. H. Hardy and S. Ramanujan. Asymptotic Formulae in Combinatory Analysis.
Proc. London Math. Soc. (2), 17:75–115, 1918. ISSN 0024-6115. doi: 10.1112/plms/s2-17.1.75. URL https://doi.org/10.1112/plms/s2-17.1.75. [26] H.-K. Hwang. Distribution of Integer Partitions with Large Number of Summands.
Acta Arith., 78(4):351–365, 1997. ISSN 0065-1036. doi: 10.4064/aa-78-4-351-365. URL https://doi.org/10.4064/aa-78-4-351-365. [27] H.-K. Hwang. Limit Theorems for the Number of Summands in Integer Partitions.
J. Combin. Theory Ser. A, 96(1):89–126, 2001. ISSN 0097-3165. doi: 10.1006/jcta.2000.3170. URL https://doi.org/10.1006/jcta.2000.3170. [28] S. Janson. Simply Generated Trees, Conditioned Galton–Watson Trees, Random Allocations and Condensation.
Probab. Surveys , 9:103–252, 2012. doi: 10.1214/11-PS188.[29] C. Knessl and J. B. Keller. Partition Asymptotics from Recursion Equations.
SIAM J. Appl. Math. , 50(2):323–338, 1990. ISSN 0036-1399. doi: 10.1137/0150020. URL https://doi.org/10.1137/0150020 .[30] P. Leroux, F. Bergeron, and G. Labelle.
Combinatorial Species and Tree-Like Structures , volume 67 of
Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 1997. [31] T. Luczak and B. Pittel. Components of Random Forests.
Combin. Probab. Comput., 1(1):35–52, 1992. ISSN 0963-5483. doi: 10.1017/S0963548300000067. URL https://doi.org/10.1017/S0963548300000067. [32] G. Meinardus. Asymptotische Aussagen über Partitionen.
Math. Z. , 59:388–398, 1954. ISSN 0025-5874.doi: 10.1007/BF01180268. URL https://doi.org/10.1007/BF01180268 .[33] L. Mutafchiev. Large Components and Cycles in a Random Mapping Pattern. In
Random graphs ’87 (Poznań, 1987), pages 189–202. Wiley, Chichester, 1990. [34] L. Mutafchiev. The Largest Tree in Certain Models of Random Forests.
Random Structures & Algorithms, 13(3-4):211–228, 1998. doi: 10.1002/(SICI)1098-2418(199810/12)13:3/4<211::AID-RSA2>3.0.CO;2-Y. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/%28SICI%291098-2418%28199810/12%2913%3A3/4%3C211%3A%3AAID-RSA2%3E3.0.CO%3B2-Y. [35] L. Mutafchiev. Limit Theorems for the Number of Parts in a Random Weighted Partition. Electron. J. Combin., 18(1):Paper 206, 27, 2011. ISSN 1077-8926. [36] L. Mutafchiev. The Size of the Largest Part of Random Weighted Partitions of Large Integers.
Combin. Probab. Comput., 22(3):433–454, 2013. ISSN 0963-5483. doi: 10.1017/S0963548313000047. URL https://doi.org/10.1017/S0963548313000047. [37] R. Otter. The Number of Trees.
Ann. of Math. (2) , 49:583–599, 1948. ISSN 0003-486X. doi: 10.2307/1969046. URL https://doi.org/10.2307/1969046 .[38] E. M. Palmer and A. J. Schwenk. On the Number of Trees in a Random Forest.
J. Combin. Theory Ser. B, 27(2):109–121, 1979. ISSN 0095-8956. doi: 10.1016/0095-8956(79)90073-X. URL https://doi.org/10.1016/0095-8956(79)90073-X. [39] K. Panagiotou and L. Ramzews. Asymptotic Enumeration of Graph Classes with Many Components. In Proceedings of the Workshop on Analytic Algorithmics and Combinatorics (ANALCO), pages 133–142. SIAM, Philadelphia, PA, 2018. doi: 10.1137/1.9781611975062.12. URL https://doi.org/10.1137/1.9781611975062.12. [40] K. Panagiotou and A. Steger. Maximal Biconnected Subgraphs of Random Planar Graphs.
ACM Trans. Algorithms, 6(2):31:1–31:21, 2010. ISSN 1549-6325. doi: 10.1145/1721837.1721847. URL http://doi.acm.org/10.1145/1721837.1721847. [41] K. Panagiotou, B. Stufler, and K. Weller. Scaling Limits of Random Graphs from Subcritical Classes.
The Annals of Probability , 44(5):3291–3334, 2016. doi: 10.1214/15-AOP1048. URL http://dx.doi.org/10.1214/15-AOP1048 .[42] G. P´olya and G. Szeg˝o.
Aufgaben und Lehrsätze aus der Analysis. Band I: Reihen, Integralrechnung, Funktionentheorie. Vierte Auflage. Heidelberger Taschenbücher, Band 73. Springer-Verlag, Berlin-New York, 1970. [43] B. Stufler. Gibbs Partitions: the Convergent Case.
Random Structures & Algorithms, 53(3):537–558, 2018. ISSN 1042-9832. doi: 10.1002/rsa.20771. URL https://doi.org/10.1002/rsa.20771. [44] B. Stufler. Random Enriched Trees with Applications to Random Graphs.
Electron. J. Combin. , 25(3):Paper 3.11, 81, 2018. ISSN 1077-8926.[45] B. Stufler. The Continuum Random Tree is the Scaling Limit of Unlabeled Unrooted Trees.
Random Structures & Algorithms, 55(2):496–528, 2019. ISSN 1042-9832. doi: 10.1002/rsa.20833. URL https://doi.org/10.1002/rsa.20833. [46] B. Stufler. Unlabelled Gibbs Partitions.
Combin. Probab. Comput. , 29(2):293–309, 2020. ISSN 0963-5483. doi: 10.1017/s0963548319000336. URL https://doi.org/10.1017/s0963548319000336 .[47] A. M. Vershik. Statistical Mechanics of Combinatorial Partitions, and their Limit Configurations.
Funktsional. Anal. i Prilozhen., 30(2):19–39, 96, 1996. ISSN 0374-1990. doi: 10.1007/BF02509449. URL https://doi.org/10.1007/BF02509449.