Several topological indices of random caterpillars
Panpan Zhang Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, U.S.A.
Xiaojing Wang Department of Statistics, University of Connecticut, Storrs, CT 06269, U.S.A.
February 26, 2021
Abstract.
In chemical graph theory, caterpillar trees have been an appealing model for representing the molecular structures of benzenoid hydrocarbons. Meanwhile, topological indices have been regarded as a powerful tool for modeling quantitative structure-property relationships and quantitative structure-activity relationships between molecules in chemical compounds. In this article, we consider a class of caterpillar trees that incorporate randomness, called random caterpillars, and investigate several popular topological indices of this random class, including the Zagreb index, the Randić index and the Wiener index, among others. In particular, a central limit theorem is developed for the asymptotic distribution of the Zagreb index of random caterpillars.
AMS subject classifications.
Primary: 05C80, 92E10; Secondary: 50C10, 60F05
Key words.
Caterpillar tree, chemical graph theory, central limit theorem, martingale, random caterpillars, topological index

1. INTRODUCTION
In this article, we investigate several topological indices of a class of random graphs. The topological index of a graph is a graph-invariant descriptor that quantifies its structure or some of its features. Topological indices have found a plethora of applications in chemical graph theory, mathematical chemistry and chemoinformatics. In practice, various topological indices, such as the Zagreb index [22] and the Randić index [45], are used to compare molecular graphs of chemical compounds [49], and to model quantitative structure-property relationships (QSPR) and quantitative structure-activity relationships (QSAR) between molecules [9]. We refer interested readers to [49] for a textbook exposition of the use of topological indices in chemistry.

Specifically, we look into a class of random graphs that incorporate randomness into caterpillar graphs, i.e., random caterpillars. In mathematical chemistry and chemoinformatics, a caterpillar graph (or simply caterpillar) is an acyclic graph with the property that a path (called the spine) remains after all leaves are pruned; caterpillars are best known for modeling the structure and intrinsic properties of benzenoid hydrocarbon molecules [1, 12, 13, 24].

Our motivation for combining randomness with caterpillar graphs comes from a recent article [33], in which a random graph model was used to model the chemistry of a discrete polymerization process. More precisely, we consider random caterpillars that grow in a uniform manner. At time 0, there is a spine consisting of $m \ge 2$ (fixed) nodes. At each subsequent point of time, a leaf is connected by an edge to one of the spine nodes, all spine nodes being equally likely to be selected. We denote the random caterpillar at time $n$ by $C_n$.

At first, we present some known results on $C_n$ as preliminaries, and introduce notation that will be used throughout the manuscript.
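The growth rule just described is easy to simulate. Before the preliminaries, here is a minimal Python sketch, assuming an initial spine of $m \ge 2$ nodes; the function names `grow_caterpillar` and `spine_degrees` are our own (hypothetical), not part of any library. Because every step picks a spine node uniformly and independently, the returned leaf counts are one draw from the multinomial distribution stated in the preliminaries below, and the degrees follow the deterministic relation between $D_{i,n}$ and $X_{i,n}$ given there.

```python
import random
from typing import List


def grow_caterpillar(m: int, n: int, rng: random.Random) -> List[int]:
    """Leaf counts X_1, ..., X_m of a random caterpillar after n steps.

    The process starts from a spine (path) of m >= 2 nodes; at each step a
    new leaf is attached to a spine node chosen uniformly at random.
    """
    if m < 2:
        raise ValueError("the spine must contain at least two nodes")
    leaves = [0] * m
    for _ in range(n):
        leaves[rng.randrange(m)] += 1  # uniform choice among spine nodes
    return leaves


def spine_degrees(leaves: List[int]) -> List[int]:
    """Degrees D_i: X_i + 1 for the two end nodes, X_i + 2 otherwise."""
    m = len(leaves)
    return [x + (1 if i in (0, m - 1) else 2) for i, x in enumerate(leaves)]


if __name__ == "__main__":
    rng = random.Random(2021)
    X = grow_caterpillar(m=5, n=20, rng=rng)
    D = spine_degrees(X)
    # The tree has n + m nodes and n + m - 1 edges, so degrees sum to 2(n + m - 1).
    print(X, D, sum(D) + sum(X) == 2 * (20 + 5 - 1))
```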
At time $n$, there is a total of $n + m$ nodes; since $C_n$ is a tree, the number of edges is fixed as well, namely $n + m - 1$. Enumerate the $m$ spine nodes in a preserved order (e.g., from left to right) with the distinct labels $1, 2, \ldots, m$. Let $X_{i,n}$ be the number of leaves attached to spine node $i$ for $i = 1, 2, \ldots, m$, and let $D_{i,n}$ be the degree of spine node $i$. By the evolution of random caterpillars, the joint distribution of the $X_{i,n}$'s is multinomial with parameters $n$ and $\mathbf{1}_{m\times 1}/m$, where $\mathbf{1}_{m\times 1}$ is a column vector of all 1's. Additionally, $X_{i,n}$ and $D_{i,n}$ are related deterministically:
$$D_{i,n} = X_{i,n} + 1 \quad \text{for } i = 1, m; \qquad D_{i,n} = X_{i,n} + 2 \quad \text{for } i = 2, 3, \ldots, m - 1.$$

In the remainder of the paper, we calculate several topological indices of $C_n$: the Gini index, the Hoover index, the Zagreb index, the Randić index and the Wiener index, presented in Sections 2 to 6, respectively. For each index, we give its definition and a brief introduction to its applications, which motivate our analysis. In Section 7, we carry out numerical experiments to verify the theoretical results developed in the preceding sections. Some concluding remarks are given at the end of the article.

2. GINI INDEX
The Gini index (also called the Gini coefficient) is a widely used measure in economics [19], mainly known for assessing the statistical dispersion of the income or wealth of a population. More precisely, the Gini index of a target population is a number measuring the degree of inequality in its (income or wealth) distribution. Given a population of size $n$, the Gini index is
$$G = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} |w_i - w_j|}{2n\sum_{i=1}^{n} w_i},$$
where $w_i$ refers to the wealth of person $i$.

Recently, different types of Gini indices that are well defined for graphs were proposed. In this section, we present the results only, without proof, since random caterpillars were used as examples in all the relevant sources. A distance-based Gini index for rooted trees was given in [3], where it was used as a measure of the disparity of trees within random tree classes. Let $B_n$ be the distance-based Gini index of $C_n$. The exact and the asymptotic mean of $B_n$ were calculated in [3], and revisited by [53].

Proposition 1 ([3]). The mean of the distance-based Gini index for a random caterpillar is E [ B n ] = (2 m − n + ( m + 4 m − m + 2) n + 2 m − m (6 m + 6 m ) n + 12 m n + 6 m − m . As n → ∞ and m → ∞ , the expectation of this index converges to / .

In an independent work, a degree-based Gini index for general graphs was proposed by [10]. This particular Gini index is, in general, used to assess the regularity of classes of random graphs. A follow-up study that uncovers a duality theory was conducted in [11]. To avoid ambiguity, we denote the degree-based Gini index in [10] by $\tilde{B}_n$.

Proposition 2 ([10]). For the class of random caterpillars, the degree-based Gini index satisfies ˜ B n L −→ , as n → ∞ .

A slightly different degree-based Gini index for random caterpillars was discussed in an independent source [53], where a different target population was considered.
It is evident that this type of Gini index degenerates as $n$ goes to infinity, as shown in [53] via numerical experiments and in [11] via a probabilistic approach.
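For concreteness, the population Gini index $G$ defined at the start of this section can be computed exactly from a list of non-negative values, for instance the degree sequence of a simulated caterpillar. The Python sketch below (the name `gini_index` is ours) avoids the double loop by using the standard identity $\sum_{i<j}(w_{(j)} - w_{(i)}) = \sum_{k}(2k - n + 1)\,w_{(k)}$ over the sorted values with 0-based ranks $k$. It implements only the population formula, not the graph-class indices of [3] or [10].

```python
from fractions import Fraction
from typing import Sequence


def gini_index(w: Sequence[int]) -> Fraction:
    """Population Gini index G = sum_i sum_j |w_i - w_j| / (2 n sum_i w_i)."""
    n, total = len(w), sum(w)
    if n == 0 or total <= 0:
        raise ValueError("need a non-empty population with positive total")
    s = sorted(w)
    # Sum over unordered pairs i < j of (s_j - s_i); k is the 0-based rank.
    unordered = sum((2 * k - n + 1) * x for k, x in enumerate(s))
    # The double sum over ordered pairs counts each unordered pair twice.
    return Fraction(2 * unordered, 2 * n * total)


if __name__ == "__main__":
    print(gini_index([1, 1, 1]))  # perfectly even population: prints 0
    print(gini_index([0, 0, 1]))  # one person holds everything: prints 2/3
```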
3. HOOVER INDEX
Another important measure with applications in economics is the Hoover index [30], also known in the literature as the Robin Hood index or the Schutz index. Like the Gini index, the Hoover index is an inequality metric, measuring the deviation of the current (income or wealth) distribution from a perfectly even distribution. An alternative interpretation of this index is the portion of the total income that would have to be taken from the richer half and given to the poorer half for the whole community to be perfectly equal. Mathematically, the Hoover index of a population of size $n$ is
$$H = \frac{1}{2n} \times \frac{\sum_{i=1}^{n} |w_i - \bar{w}|}{\bar{w}}, \tag{1}$$
where $\bar{w} = \frac{1}{n}\sum_{i=1}^{n} w_i$ is the average wealth of the entire population. Graphically, the Hoover index is the longest vertical distance between the Lorenz curve and the 45-degree line of the unit square. Thus, we immediately have $0 \le H < 1$.

A graph-friendly Hoover index is obtained by replacing all $w_i$'s in Equation (1) with node degrees. Our intent is to consider the Hoover index for a class of graphs. Therefore, analogous to the degree-based Gini index introduced in [10], we propose a degree-based Hoover index for graphs as a competing measure for assessing graph regularity. Given an arbitrary graph $G = (V, E) \in \mathcal{G}$, where $V$ and $E$ respectively denote the vertex set and the edge set of $G$, and $\mathcal{G}$ is the class to which $G$ belongs, the degree-based Hoover index is defined as
$$H(G) = \frac{1}{2} \times \frac{\sum_{v \in V} \bigl|\deg(v) - \sum_{u \in V}\deg(u)/|V|\bigr|}{\mathbb{E}\bigl[|\mathcal{V}|\bigr] \times \mathbb{E}\bigl[\deg(U)\bigr]},$$
where $\deg(v)$ is the degree of $v$ (i.e., the number of edges incident to $v$), $|V|$ is the cardinality of $V$, $\mathbb{E}[|\mathcal{V}|]$ is the expected order (i.e., number of nodes) of a randomly chosen graph in $\mathcal{G}$, and $\mathbb{E}[\deg(U)]$ is the expected degree of a randomly chosen node in a randomly chosen graph in $\mathcal{G}$. The Hoover index of the class $\mathcal{G}$, denoted by $H(\mathcal{G})$, is the average of $H(G)$ over all $G$ in $\mathcal{G}$. An argument similar to that of [10, Theorem 4.1] shows that the Hoover index of a graph class asymptotically takes values between 0 and 1, and a value closer to 0 suggests that the graphs in the class tend to be more regular.

The order of $C_n$ is fixed, namely $n + m$. Let $U_{C_n}$ be a node chosen uniformly at random from a random caterpillar $C_n$. Since every leaf has degree 1 and $\sum_{i=1}^{m} X_{i,n} = n$, we have
$$\mathbb{E}\bigl[\deg(U_{C_n})\bigr] = \frac{\sum_{i=1}^{m} D_{i,n} + \sum_{i=1}^{m} X_{i,n}}{n + m} = \frac{2n + 2m - 2}{n + m} = 2 - \frac{2}{n + m}.$$
Next, we compute the numerator. Note that all leaves have degree 1, which is less than the average degree. Every interior spine node has degree at least 2, and the probability that the spine nodes at the two ends of the spine are never selected in the long run is negligible. Hence, with high probability, the degree of each spine node is larger than the average. Thus, the expectation of the numerator is asymptotically equivalent to
$$\sum_{i=1}^{m}\Bigl(\mathbb{E}[D_{i,n}] - \frac{2n + 2m - 2}{n + m}\Bigr) + n\Bigl(\frac{2n + 2m - 2}{n + m} - 1\Bigr) = \frac{2n(n + m - 2)}{n + m}.$$
In the next proposition, we obtain the asymptotic mean of the Hoover index of the class of random caterpillars at time $n$, denoted by $H_n$.

Proposition 3.
The mean of the Hoover index of the class of random caterpillars satisfies $H_n \to 1/2$, as $n \to \infty$.

Proof.
By the definition of the Hoover index, we have
$$H_n = \mathbb{E}\bigl[H(C_n)\bigr] = \frac{2n(n + m - 2)/(n + m)}{2(n + m)\bigl(2 - 2/(n + m)\bigr)} \to \frac{1}{2}, \qquad \text{as } n \to \infty. \qquad \square$$
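As a sanity check on the computation above, the following Python sketch evaluates the degree-based Hoover index of a single caterpillar from its leaf counts and compares it with the closed form $n(n + m - 2)/\bigl(2(n + m)(n + m - 1)\bigr)$, to which the expression in the proof simplifies. For a single graph we take $\mathbb{E}[|\mathcal{V}|] = n + m$ and $\mathbb{E}[\deg(U)]$ equal to the average degree, both of which are deterministic for $C_n$. The closed form is exact whenever both end nodes of the spine carry at least one leaf, the high-probability event used above. The function names are hypothetical.

```python
from fractions import Fraction
from typing import List


def hoover_single_caterpillar(leaves: List[int]) -> Fraction:
    """Degree-based Hoover index of one caterpillar with leaf counts X_1..X_m.

    For a single graph, |V| = n + m and E[deg(U)] is the average degree, so
    H = (1/2) * sum_v |deg(v) - avg| / (|V| * avg).
    """
    m, n = len(leaves), sum(leaves)
    order = n + m
    spine = [x + (1 if i in (0, m - 1) else 2) for i, x in enumerate(leaves)]
    degrees = spine + [1] * n               # every leaf has degree 1
    avg = Fraction(sum(degrees), order)     # equals 2 - 2/(n + m)
    deviation = sum(abs(d - avg) for d in degrees)
    return deviation / (2 * order * avg)


def hoover_closed_form(m: int, n: int) -> Fraction:
    """n(n + m - 2) / (2(n + m)(n + m - 1)); exact when X_1, X_m >= 1."""
    return Fraction(n * (n + m - 2), 2 * (n + m) * (n + m - 1))


if __name__ == "__main__":
    print(hoover_single_caterpillar([1, 2, 0, 3]), hoover_closed_form(4, 6))  # prints 4/15 4/15
    print(float(hoover_closed_form(200, 5000)))  # about 0.4807
```

For $m = 200$ and $n = 5000$, the setting of Section 7, the closed form evaluates to about $0.4807$. If an end node has no leaves, the two functions differ; for example, leaf counts `[0, 2, 0]` give $3/10$ versus $3/20$.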
4. ZAGREB INDEX
In this section, we calculate the Zagreb index of a random caterpillar at time $n$, denoted by $Z_n = \mathrm{Zagreb}(C_n)$. The Zagreb index of a graph is defined as the sum of the squared degrees of its nodes. Applications of the Zagreb index mostly appear in mathematical chemistry, where it is used to study molecular complexity [41], chirality [20], ZE-isomerism [21] and heterosystems [36]. As it is impossible to list them all, we refer interested readers to the survey article [42], in which the authors emphasized the potential applicability of the Zagreb index for deriving multilinear regression models. In the literature of random graphs and algorithms, the Zagreb indices of random recursive trees, random $b$-ary trees, plane-oriented recursive trees and preferential attachment caterpillars were studied in [15, 16, 52, 54], respectively.

By definition, and since each leaf contributes $1^2 = 1$, the Zagreb index of a random caterpillar is given by
$$Z_n = \sum_{i=1}^{m} D_{i,n}^2 + \sum_{i=1}^{m} X_{i,n}.$$
First, we introduce some additional notation. Let $\mathcal{E}_n(i)$ denote the event that the spine node labeled $i$ is selected at time $n$, and let $\mathcal{F}_n$ denote the $\sigma$-field generated by the history of the growth of the caterpillar in the first $n$ stages. We present the mean of $Z_n$ as well as a weak law in the next proposition.

Proposition 4.
For $n \ge 0$, we have
$$\mathbb{E}[Z_n] = \frac{n^2}{m} + \frac{(6m - 5)n}{m} + 4m - 6.$$
As $n \to \infty$, we have
$$\frac{Z_n}{n^2} \xrightarrow{L_1} \frac{1}{m}.$$
This convergence takes place in probability as well.

Proof.
We consider the following almost-sure recursive relation between $Z_{n-1}$ and $Z_n$, conditional on $\mathcal{E}_n(i)$ and $\mathcal{F}_{n-1}$: the degree of spine node $i$ increases by one and a new leaf of degree 1 appears, so
$$Z_n = Z_{n-1} + (D_{i,n-1} + 1)^2 - D_{i,n-1}^2 + 1 = Z_{n-1} + 2D_{i,n-1} + 2. \tag{2}$$
Averaging over $i$ (each spine node is selected with probability $1/m$) and using $\sum_{i=1}^{m} D_{i,n-1} = n + 2m - 3$, we get
$$\mathbb{E}[Z_n \mid \mathcal{F}_{n-1}] = Z_{n-1} + \frac{2}{m}\sum_{i=1}^{m} D_{i,n-1} + 2 = Z_{n-1} + \frac{2(n + 2m - 3)}{m} + 2.$$
Taking another expectation, we obtain a recurrence for $\mathbb{E}[Z_n]$. We solve it with the initial condition $\mathbb{E}[Z_0] = Z_0 = 4m - 6$ and get the result stated in the proposition.

As $n \to \infty$, $Z_n/n^2$ converges to $1/m$ in $L_1$, which implies that $Z_n/n^2$ converges to $1/m$ in probability as well. $\square$

We continue by calculating the second moment of $Z_n$, and accordingly obtain the variance of $Z_n$.

Proposition 5.
For $n \ge 0$, we have
$$\mathbb{E}\bigl[Z_n^2\bigr] = \frac{n^4}{m^2} + \frac{(12m - 10)n^3}{m^2} + \frac{(44m^2 - 70m + 23)n^2}{m^2} + \frac{(48m^3 - 112m^2 + 66m - 14)n}{m^2} + 16\Bigl(m - \frac{3}{2}\Bigr)^2,$$
and
$$\mathrm{Var}[Z_n] = \frac{2n\bigl((m - 1)n + 3m - 7\bigr)}{m^2}.$$

Proof.
Recall the almost-sure recursive relation between $Z_{n-1}$ and $Z_n$, conditional on $\mathcal{E}_n(i)$ and $\mathcal{F}_{n-1}$, established in Proposition 4 (cf. Equation (2)). Squaring both sides gives
$$Z_n^2 = (Z_{n-1} + 2D_{i,n-1} + 2)^2 = Z_{n-1}^2 + 4D_{i,n-1}^2 + 4 + 4Z_{n-1} + 4Z_{n-1}D_{i,n-1} + 8D_{i,n-1}.$$
Again, we average over $i$ and use $\sum_{i=1}^{m} D_{i,n-1}^2 = Z_{n-1} - (n - 1)$ to obtain
$$\begin{aligned}
\mathbb{E}\bigl[Z_n^2 \mid \mathcal{F}_{n-1}\bigr] &= Z_{n-1}^2 + \frac{4}{m}\sum_{i=1}^{m} D_{i,n-1}^2 + 4 + 4Z_{n-1} + \frac{4Z_{n-1}}{m}\sum_{i=1}^{m} D_{i,n-1} + \frac{8}{m}\sum_{i=1}^{m} D_{i,n-1}\\
&= Z_{n-1}^2 + \frac{4}{m}\bigl(Z_{n-1} - (n - 1)\bigr) + 4 + 4Z_{n-1} + \frac{4(n + 2m - 3)}{m}Z_{n-1} + \frac{8(n + 2m - 3)}{m}.
\end{aligned}$$
We then obtain a recurrence for the second moment of $Z_n$ by taking expectations and plugging in the first moment of $Z_{n-1}$. Solving the recurrence with the initial condition $\mathbb{E}[Z_0^2] = Z_0^2 = (4m - 6)^2$ yields the result stated in the proposition. Finally, we obtain the exact variance of $Z_n$ by subtracting the square of the mean of $Z_n$ from its second moment. $\square$

From the expression of $\mathbb{E}[Z_n^2]$, we conclude that
$$\frac{Z_n^2}{n^4} \xrightarrow{L_1} \frac{1}{m^2},$$
in the same manner as the proof of the $L_1$ convergence of $Z_n/n^2$. Thus, we also obtain a weak law for $Z_n^2$. Besides, we find a stronger, $L_2$ convergence of $Z_n/n^2$, presented in the next corollary.

Corollary 1. As $n \to \infty$, we have
$$\frac{Z_n}{n^2} \xrightarrow{L_2} \frac{1}{m}.$$

Proof.
By the $L_1$ convergence results for $Z_n/n^2$ and $Z_n^2/n^4$, we have
$$\lim_{n\to\infty}\mathbb{E}\biggl[\Bigl(\frac{Z_n}{n^2} - \frac{1}{m}\Bigr)^2\biggr] = \lim_{n\to\infty}\mathbb{E}\biggl[\frac{Z_n^2}{n^4} - 2 \times \frac{Z_n}{n^2} \times \frac{1}{m} + \frac{1}{m^2}\biggr] = \frac{1}{m^2} - 2 \times \frac{1}{m} \times \frac{1}{m} + \frac{1}{m^2} = 0. \qquad \square$$
According to Proposition 5, the leading term of the variance of $Z_n$ is of order $n^2$, so the standard deviation of $Z_n$ is of order $n$, much smaller than its mean, which is of order $n^2$. This concentration suggests studying $Z_n$ after centering and scaling by $n$. Our strategy is to construct a martingale based on a transformation of $Z_n$, and to appeal to a martingale central limit theorem (MCLT) to develop a Gaussian law for the scaled $Z_n$ as $n$ goes to infinity.

In the proof of Proposition 4, we found that
$$\mathbb{E}[Z_n \mid \mathcal{F}_{n-1}] = Z_{n-1} + \frac{2(n + 2m - 3)}{m} + 2,$$
so $\{Z_n\}_n$ is not a martingale. In the next lemma, we apply a transformation to $Z_n$ such that the new sequence is a martingale.

Lemma 1.
The sequence $\{M_n\}_n$ defined by
$$M_n = Z_n - \frac{n(n + 6m - 5)}{m}$$
is a martingale.

Proof. Let us consider $M_n = Z_n + \beta_n$, with a deterministic sequence $\beta_n$ chosen such that the fundamental property of martingales holds, namely,
$$\mathbb{E}[Z_n + \beta_n \mid \mathcal{F}_{n-1}] = Z_{n-1} + \frac{2(n + 2m - 3)}{m} + 2 + \beta_n = Z_{n-1} + \beta_{n-1}.$$
This produces the recurrence
$$\beta_n = \beta_{n-1} - \frac{2n + 6m - 6}{m}.$$
Its solution is $\beta_n = -n(n + 6m - 5)/m$, obtained by an arbitrary choice of the initial condition; we choose $\beta_0 = 0$. $\square$

The MCLT that we use to show asymptotic normality is based on martingale differences, expressed in terms of the difference operator $\nabla M_j := M_j - M_{j-1}$. There are different versions of the MCLT listed in [29], requiring different sets of conditions. We use [29, Corollary 3.2], which requires two conditions, known as the conditional Lindeberg condition and the conditional variance condition. In the next two lemmas, we verify these two conditions one after the other.

Lemma 2.
The conditional Lindeberg condition holds, i.e.,
$$U_n := \sum_{j=1}^{n} \mathbb{E}\biggl[\Bigl(\frac{\nabla M_j}{n}\Bigr)^2 \mathbb{1}\bigl(|\nabla M_j/n| > \varepsilon\bigr) \Bigm| \mathcal{F}_{j-1}\biggr] \xrightarrow{P} 0,$$
for arbitrary $\varepsilon > 0$.

Proof.
We establish a bound that holds uniformly in $j \le n$. By the construction of the martingale, and since $D_{i,j-1} \le j + 1$ for every $i$, we have
$$\begin{aligned}
|\nabla M_j| = |M_j - M_{j-1}| &\le |Z_j - Z_{j-1}| + \frac{2j + 6m - 6}{m}\\
&\le 2\max_{i} D_{i,j-1} + 2 + \frac{2j + 6m - 6}{m}\\
&\le 2(j + 1) + 2 + \frac{2j + 6m - 6}{m}\\
&= \Bigl(\frac{2m + 2}{m}\Bigr) j + \frac{10m - 6}{m},
\end{aligned}$$
which is increasing in $j$ for any fixed $m$. Thus, we conclude that $|\nabla M_j|$ is $O(n)$ uniformly in $j \le n$. Hence, for any $\varepsilon > 0$, there exists $n_0(\varepsilon) > 0$ such that the sets $\{|\nabla M_j/n| > \varepsilon\}$ are all empty for $n > n_0(\varepsilon)$. In what follows, we conclude that $U_n$ converges to 0 almost surely, which is stronger than the in-probability convergence required for the Lindeberg condition. $\square$

Lemma 3.
The conditional variance condition holds, i.e.,
$$V_n := \sum_{j=1}^{n} \mathbb{E}\biggl[\Bigl(\frac{\nabla M_j}{n}\Bigr)^2 \Bigm| \mathcal{F}_{j-1}\biggr] \xrightarrow{P} \frac{2m - 2}{m^2}.$$

Proof.
We rewrite $V_n$ as follows:
$$\begin{aligned}
V_n &= \frac{1}{n^2}\sum_{j=1}^{n} \mathbb{E}\Bigl[\bigl(Z_j + \beta_j - (Z_{j-1} + \beta_{j-1})\bigr)^2 \Bigm| \mathcal{F}_{j-1}\Bigr]\\
&= \frac{1}{n^2}\sum_{j=1}^{n} \mathbb{E}\Bigl[(Z_j - Z_{j-1})^2 + 2(Z_j - Z_{j-1})(\beta_j - \beta_{j-1}) + (\beta_j - \beta_{j-1})^2 \Bigm| \mathcal{F}_{j-1}\Bigr].
\end{aligned}$$
We evaluate the three parts of the summand one after another, replacing $Z_{j-1}$ and $Z_{j-1}^2$ by their asymptotic equivalents.
(1) The first part is
$$\mathbb{E}\bigl[(Z_j - Z_{j-1})^2 \mid \mathcal{F}_{j-1}\bigr] = \mathbb{E}\bigl[Z_j^2 \mid \mathcal{F}_{j-1}\bigr] - 2Z_{j-1}\mathbb{E}[Z_j \mid \mathcal{F}_{j-1}] + Z_{j-1}^2 \approx \frac{4j^2 + 28(m - 1)j + O(1)}{m^2}.$$
(2) The second part is
$$\mathbb{E}\bigl[2(Z_j - Z_{j-1})(\beta_j - \beta_{j-1}) \mid \mathcal{F}_{j-1}\bigr] = 2(\beta_j - \beta_{j-1})\bigl(\mathbb{E}[Z_j \mid \mathcal{F}_{j-1}] - Z_{j-1}\bigr) = -\frac{8(j + 3m - 3)^2}{m^2}.$$
(3) The third part is
$$\mathbb{E}\bigl[(\beta_j - \beta_{j-1})^2 \mid \mathcal{F}_{j-1}\bigr] = (\beta_j - \beta_{j-1})^2 = \frac{(2j + 6m - 6)^2}{m^2}.$$
Putting the three parts together, we obtain the summand of $V_n$ for each $j$; the terms of order $j^2$ cancel. Summing over $j = 1, 2, \ldots, n$ and letting $n$ go to infinity, we obtain
$$V_n = \frac{1}{n^2}\Bigl(\frac{2m - 2}{m^2}n^2 + O(n)\Bigr) \xrightarrow{L_1} \frac{2m - 2}{m^2}.$$
The $L_1$ convergence is stronger than the required in-probability convergence. $\square$

Theorem 1. As $n \to \infty$, $Z_n$, centered as in Lemma 1 and scaled by $n$, follows a Gaussian law, namely,
$$\frac{Z_n - n(n + 6m - 5)/m}{n} \xrightarrow{D} \mathcal{N}\Bigl(0, \frac{2m - 2}{m^2}\Bigr).$$
The proof follows from the MCLT [29, Corollary 3.2] together with Lemmas 1 to 3.

5. RANDIĆ INDEX
In this section, we calculate the Randić index of a random caterpillar, denoted by $R_n = \mathrm{Randić}(C_n)$. The Randić index of a graph $G = (V, E)$ (with parameter $\alpha$) is the sum, over all pairs of adjacent nodes, of the product of their degrees raised to the power $\alpha$. Mathematically, it is
$$R(G) = \sum_{\{u, v\} \in E} \bigl(\deg(u)\deg(v)\bigr)^{\alpha}.$$
The classical choice of $\alpha$ is $-1/2$ [45]; under this choice, the Randić index is also called the connectivity index. The general definition given above was later proposed by [5]. Similar to the Zagreb index, the Randić index is used in chemoinformatics for modeling the QSAR and QSPR of chemical compounds (e.g., alkanes [25], saturated hydrocarbons [26] and benzenoid systems [44]). We refer interested readers to [47] for a concise survey, to [48] for a complete historical review and to [34] for a summary of mathematical properties. To the best of our knowledge, little work has been done on the Randić index of random graph models; the only source that we find in the literature concerns the Randić index of random binary trees [14].

In $C_n$, no pair of leaves is connected; each leaf is connected only to its parent spine node. Since a leaf has degree 1, the contribution of each edge connecting a leaf and its spine node to $R_n$ is the degree of the spine node raised to the power $\alpha$. There are $m - 1$ edges on the spine, and the contribution of each of them to $R_n$ is $(D_{i-1,n}D_{i,n})^{\alpha}$ for $i = 2, 3, \ldots, m$. Hence, we arrive at
$$R_n = \sum_{i=2}^{m} (D_{i-1,n}D_{i,n})^{\alpha} + \sum_{i=1}^{m} X_{i,n}D_{i,n}^{\alpha}. \tag{3}$$
Specifically, we consider the Randić index with parameter $\alpha = 1$. This particular type of Randić index is also popular in mathematical chemistry as a molecular structure descriptor [2]. Among other uses, this index is best known for measuring the branching of the molecular carbon-atom skeleton [23, 27].
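Equation (3) can be checked against the definition directly. The Python sketch below (helper names are hypothetical) builds the caterpillar's edge list from its leaf counts, evaluates $\sum_{\{u,v\}\in E}(\deg(u)\deg(v))^{\alpha}$ edge by edge, and compares the result with the two sums in Equation (3) for a general $\alpha$; it assumes $m \ge 2$.

```python
import math
from collections import Counter
from typing import List, Tuple


def caterpillar_edges(leaves: List[int]) -> List[Tuple[int, int]]:
    """Edges of a caterpillar: spine nodes 0..m-1 form a path, leaves follow."""
    m = len(leaves)
    edges = [(i - 1, i) for i in range(1, m)]
    next_label = m
    for i, x in enumerate(leaves):
        for _ in range(x):
            edges.append((i, next_label))
            next_label += 1
    return edges


def randic_from_edges(edges: List[Tuple[int, int]], alpha: float) -> float:
    """Definition: sum over edges {u, v} of (deg(u) * deg(v)) ** alpha."""
    deg: Counter = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sum((deg[u] * deg[v]) ** alpha for u, v in edges)


def randic_equation_3(leaves: List[int], alpha: float) -> float:
    """Equation (3): spine edges plus X_i * D_i ** alpha for the leaf edges."""
    m = len(leaves)
    d = [x + (1 if i in (0, m - 1) else 2) for i, x in enumerate(leaves)]
    spine = sum((d[i - 1] * d[i]) ** alpha for i in range(1, m))
    return spine + sum(x * di ** alpha for x, di in zip(leaves, d))


if __name__ == "__main__":
    X = [1, 0, 3, 2]
    for a in (-0.5, 1.0):
        print(a, math.isclose(randic_from_edges(caterpillar_edges(X), a),
                              randic_equation_3(X, a)))
```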
In the rest of this section, unless specified otherwise, the Randić index refers to the Randić index with parameter $\alpha = 1$. Accordingly, we redefine $R_n$ in Equation (3) as
$$R_n = \sum_{i=2}^{m} D_{i-1,n}D_{i,n} + \sum_{i=1}^{m} X_{i,n}D_{i,n}.$$
Many mathematical properties of this specific class of Randić indices ($\alpha = 1$) were recently investigated by graph theorists and combinatorialists [28, 32, 35]. In the graph theory community, this index is more often known by a different name: the second Zagreb index. In [6], the Randić index of extremal graphs was studied under yet another name, “extreme 1-weight”. In the next proposition, we present the mean of $R_n$ as well as a weak law.

Proposition 6.
For $n \ge 0$ and $m \ge 3$, the mean of the Randić index of a caterpillar is
$$\mathbb{E}[R_n] = \frac{(2m - 1)n^2 + (7m^2 - 10m + 1)n + 4m^2(m - 2)}{m^2}.$$
As $n \to \infty$, we have
$$\frac{R_n}{n^2} \xrightarrow{L_1} \frac{2m - 1}{m^2}.$$
This convergence takes place in probability as well.

Proof.
Recall that the distribution of the $X_{i,n}$'s is multinomial with parameters $n$ and $\mathbf{1}_{m\times 1}/m$. Hence, $\mathbb{E}[X_{i,n}] = n/m$, $\mathrm{Var}[X_{i,n}] = (m - 1)n/m^2$ and $\mathbb{E}\bigl[X_{i,n}^2\bigr] = (n + m - 1)n/m^2$ for each $i$, and $\mathbb{E}[X_{i,n}X_{j,n}] = -n/m^2 + n^2/m^2 = n(n - 1)/m^2$ for all $i \ne j$. Splitting the spine edges into the two end edges and the $m - 3$ interior edges, we obtain
$$\begin{aligned}
\mathbb{E}[R_n] &= \sum_{i=2}^{m}\mathbb{E}[D_{i-1,n}D_{i,n}] + \sum_{i=1}^{m}\mathbb{E}[X_{i,n}D_{i,n}]\\
&= 2\mathbb{E}\bigl[(X_{1,n} + 1)(X_{2,n} + 2)\bigr] + (m - 3)\mathbb{E}\bigl[(X_{2,n} + 2)(X_{3,n} + 2)\bigr]\\
&\quad + 2\mathbb{E}\bigl[X_{1,n}(X_{1,n} + 1)\bigr] + (m - 2)\mathbb{E}\bigl[X_{2,n}(X_{2,n} + 2)\bigr]\\
&= (m - 1)\mathbb{E}[X_{1,n}X_{2,n}] + m\mathbb{E}\bigl[X_{1,n}^2\bigr] + (6m - 8)\mathbb{E}[X_{1,n}] + 4m - 8\\
&= \frac{(m - 1)n(n - 1)}{m^2} + \frac{(n + m - 1)n}{m} + \frac{(6m - 8)n}{m} + 4m - 8\\
&= \frac{(2m - 1)n^2 + (7m^2 - 10m + 1)n + 4m^2(m - 2)}{m^2}.
\end{aligned}$$
It follows that $R_n/n^2$ converges to $(2m - 1)/m^2$ in $L_1$, and this convergence is stronger than the in-probability convergence stated in the weak law. $\square$

Another approach to calculating the mean of $R_n$ is to establish an almost-sure relation for $R_n$ analogous to Equation (2), to obtain a recurrence for $\mathbb{E}[R_n]$ by taking expectations twice, and lastly to solve the recurrence. We omit the details of the derivation, but present an intermediate step that we need for the follow-up study: since $D_{1,j-1} \le j$ and $D_{m,j-1} \le j$,
$$\mathbb{E}[R_j \mid \mathcal{F}_{j-1}] = R_{j-1} + \frac{1}{m}\bigl(2(2j + 3m - 5) - D_{1,j-1} - D_{m,j-1}\bigr) + 1 \ge R_{j-1} + \frac{2j + 7m - 10}{m},$$
which shows that $\{R_j\}_{0 \le j \le n}$ is a submartingale. Note that we use $j$ as the subscript instead of $n$ to avoid potential confusion of notation. Consider a generic scale, $\xi_n = n^2$, free of $j$. Clearly, $\{R_j/\xi_n\}_{0 \le j \le n}$ remains a submartingale.

Theorem 2.
There exists a random variable $\tilde{R}$ with finite mean such that
$$\frac{R_n}{n^2} \xrightarrow{a.s.} \tilde{R}, \qquad \text{as } n \to \infty.$$

Proof. Notice that $R_j$ is increasing in $j$, and so is $R_j/\xi_n$. We thus have
$$\sup_{j \le n}\mathbb{E}\biggl[\Bigl|\frac{R_j}{n^2}\Bigr|\biggr] = \mathbb{E}\Bigl[\frac{R_n}{n^2}\Bigr] = \frac{2m - 1}{m^2} + O\Bigl(\frac{1}{n}\Bigr) < +\infty.$$
We thus arrive at the conclusion by Doob's Convergence Theorem. $\square$
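The closed form in Proposition 6 can be cross-checked by exact enumeration for small $m$ and $n$. The sketch below sums $R_n$ (with $\alpha = 1$) over all outcomes of the multinomial leaf counts, weighted by their probabilities, in exact rational arithmetic. It assumes $m \ge 3$, matching the split of the spine into two end edges and $m - 3$ interior edges in the proof; for $m = 2$ the single spine edge is $(X_{1,n} + 1)(X_{2,n} + 1)$ and the formula does not apply. All function names are our own.

```python
from fractions import Fraction
from math import factorial
from typing import Iterator, Tuple


def compositions(n: int, m: int) -> Iterator[Tuple[int, ...]]:
    """All (x_1, ..., x_m) of non-negative integers with x_1 + ... + x_m = n."""
    if m == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, m - 1):
            yield (first,) + rest


def multinomial_prob(x: Tuple[int, ...]) -> Fraction:
    """P(X = x) under Multinomial(n, (1/m, ..., 1/m)), with n = sum(x)."""
    n, m = sum(x), len(x)
    coef = factorial(n)
    for xi in x:
        coef //= factorial(xi)  # stays an integer at every step
    return Fraction(coef, m ** n)


def randic_alpha1(x: Tuple[int, ...]) -> int:
    """R_n with alpha = 1: spine products plus X_i * D_i for leaf edges."""
    m = len(x)
    d = [xi + (1 if i in (0, m - 1) else 2) for i, xi in enumerate(x)]
    spine = sum(d[i - 1] * d[i] for i in range(1, m))
    return spine + sum(a * b for a, b in zip(x, d))


def exact_mean(m: int, n: int) -> Fraction:
    """E[R_n] by summing over every multinomial outcome."""
    return sum((multinomial_prob(x) * randic_alpha1(x) for x in compositions(n, m)),
               Fraction(0))


def proposition_6(m: int, n: int) -> Fraction:
    num = (2 * m - 1) * n * n + (7 * m * m - 10 * m + 1) * n + 4 * m * m * (m - 2)
    return Fraction(num, m * m)


if __name__ == "__main__":
    print(all(exact_mean(m, n) == proposition_6(m, n)
              for m in range(3, 6) for n in range(7)))
```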
6. WIENER INDEX
In this section, we focus on the Wiener index of a random caterpillar, i.e., $W_n = \mathrm{Wiener}(C_n)$. The Wiener index of a graph is defined as the sum of the lengths of the shortest paths (i.e., distances) between all pairs of nodes in the graph. Mathematically, given $G = (V, E)$, it is
$$W(G) = \sum_{\{u, v\} \subseteq V} \mathrm{dist}(u, v),$$
where $\mathrm{dist}(u, v)$ is the distance between $u$ and $v$, and the sum runs over unordered pairs of distinct nodes.

In chemistry, the Wiener index was first used to study carbon-carbon bonds between all pairs of carbon atoms in an alkane [51], and was later used to characterize QSPR for alkanes [43]. There is a variety of applications of the Wiener index in chemical graph theory and chemoinformatics, not limited to QSAR and QSPR modeling. For the sake of conciseness, we refer interested readers to [40] and the references therein.

On the other hand, the Wiener index is very popular in the random graph community, and has been extensively investigated for many random models, such as random binary search trees and random recursive trees [39], balanced binary trees [4], random digital trees [18], random split trees [37], random $b$-ary trees [38], conditioned Galton-Watson trees [17], more general rooted and unrooted trees [50] and simply generated random trees [31].

The Wiener index of a caterpillar consists of three classes of contributions:
(1) the total distance between pairs of spine nodes;
(2) the total distance between pairs of leaves;
(3) the total distance between spine nodes and leaves.
We calculate the three contributions to $W_n$ one after another.

The contribution purely among spine nodes is simple, as it is deterministic:
$$\sum_{i=1}^{m-1}\sum_{j=i+1}^{m}(j - i) = \frac{1}{2}\sum_{i=1}^{m-1}(m + 1 - i)(m - i) = \frac{1}{6}m(m^2 - 1).$$
To compute the contribution between leaves, we consider the following two scenarios. For all $1 \le i < j \le m$, the distance between a leaf attached to spine node $i$ and a leaf attached to spine node $j$ is $j - i + 2$, and the number of such pairs is $X_{i,n}X_{j,n}$.
On the other hand, for each $1 \le i \le m$, the distance between a leaf attached to spine node $i$ and another (different) leaf attached to the same spine node is 2, and the number of such pairs is $X_{i,n}(X_{i,n} - 1)/2$. Thus, the total contribution among leaves is
$$\sum_{i=1}^{m-1}\sum_{j=i+1}^{m}(j - i + 2)X_{i,n}X_{j,n} + \sum_{i=1}^{m} X_{i,n}(X_{i,n} - 1).$$
Lastly, we calculate the contribution between leaves and spine nodes. The computation for this class is slightly different from the previous two: there, we restricted the indices to $i < j$ to avoid double counting, but this is unnecessary here, because each (leaf, spine node) pair is counted exactly once. For all $1 \le i, j \le m$, the distance between a leaf attached to spine node $i$ and spine node $j$ is $|j - i| + 1$, and the number of leaves attached to spine node $i$ is $X_{i,n}$. Hence, the total contribution of this class is
$$\sum_{i=1}^{m}\sum_{j=1}^{m}(|j - i| + 1)X_{i,n}.$$
We are now ready to calculate the mean of the Wiener index. Likewise, we obtain a weak law.

Proposition 7.
For $n \ge 0$, the mean of the Wiener index of a caterpillar is
$$\mathbb{E}[W_n] = \frac{(m^2 + 6m - 1)n^2 + (m - 1)(2m^2 + 7m - 1)n + m^2(m^2 - 1)}{6m}.$$
As $n \to \infty$, we have
$$\frac{W_n}{n^2} \xrightarrow{L_1} \frac{m^2 + 6m - 1}{6m}.$$
The convergence takes place in probability as well.

Proof.
Putting together the three types of contributions that we have calculated, and taking expectations over all $X_{i,n}$'s, we get
$$\begin{aligned}
\mathbb{E}[W_n] &= \frac{m(m^2 - 1)}{6} + \frac{m(m + 7)(m - 1)}{6}\mathbb{E}[X_{1,n}X_{2,n}] + m\mathbb{E}\bigl[X_{1,n}^2\bigr] - m\mathbb{E}[X_{1,n}] + \Bigl(m^2 + 2 \times \binom{m + 1}{3}\Bigr)\mathbb{E}[X_{1,n}]\\
&= \frac{m(m^2 - 1)}{6} + \frac{(m + 7)(m - 1)n(n - 1)}{6m} + \frac{(n + m - 1)n}{m} + \frac{(m + 4)(m - 1)n}{3}\\
&= \frac{(m^2 + 6m - 1)n^2 + (m - 1)(2m^2 + 7m - 1)n + m^2(m^2 - 1)}{6m}.
\end{aligned}$$
The weak law stated in the proposition follows immediately. $\square$
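The three-part decomposition and Proposition 7 can be checked numerically. The following Python sketch (hypothetical helper names) computes the Wiener index of a caterpillar in two ways, by breadth-first search over the explicit tree and by the spine-spine, leaf-leaf and leaf-spine decomposition, and then compares the exact mean over all $m^n$ equally likely attachment sequences with $\mathbb{E}[W_n] = \bigl((m^2 + 6m - 1)n^2 + (m - 1)(2m^2 + 7m - 1)n + m^2(m^2 - 1)\bigr)/(6m)$. The enumeration is exponential in $n$, so it is only meant as a check on tiny cases.

```python
from collections import deque
from fractions import Fraction
from itertools import product
from typing import List


def adjacency(leaves: List[int]) -> List[List[int]]:
    """Adjacency lists: spine nodes 0..m-1 form a path, leaves get labels >= m."""
    m = len(leaves)
    adj: List[List[int]] = [[] for _ in range(m + sum(leaves))]
    for i in range(1, m):
        adj[i - 1].append(i)
        adj[i].append(i - 1)
    label = m
    for i, x in enumerate(leaves):
        for _ in range(x):
            adj[i].append(label)
            adj[label].append(i)
            label += 1
    return adj


def all_distances(adj: List[List[int]]) -> List[int]:
    """Distances over unordered pairs of distinct nodes, via one BFS per node."""
    out: List[int] = []
    for s in range(len(adj)):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        out.extend(d for t, d in dist.items() if t > s)  # keep each pair once
    return out


def wiener_decomposed(x: List[int]) -> int:
    """Spine-spine + leaf-leaf + leaf-spine contributions from leaf counts."""
    m = len(x)
    spine = m * (m * m - 1) // 6
    leaf_leaf = sum((j - i + 2) * x[i] * x[j] for i in range(m) for j in range(i + 1, m))
    leaf_leaf += sum(xi * (xi - 1) for xi in x)
    leaf_spine = sum((abs(j - i) + 1) * x[i] for i in range(m) for j in range(m))
    return spine + leaf_leaf + leaf_spine


def exact_mean_wiener(m: int, n: int) -> Fraction:
    """Average Wiener index over all m**n attachment sequences."""
    total = 0
    for seq in product(range(m), repeat=n):
        x = [0] * m
        for i in seq:
            x[i] += 1
        total += sum(all_distances(adjacency(x)))
    return Fraction(total, m ** n)


def proposition_7(m: int, n: int) -> Fraction:
    num = ((m * m + 6 * m - 1) * n * n + (m - 1) * (2 * m * m + 7 * m - 1) * n
           + m * m * (m * m - 1))
    return Fraction(num, 6 * m)


if __name__ == "__main__":
    print(proposition_7(3, 1), exact_mean_wiener(3, 1))  # prints 29/3 29/3
```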
There exists a random variable with finite mean to which $W_n$, properly scaled (by $n^2$), converges. The proof is done mutatis mutandis, with an application of Doob's Convergence Theorem. We thus omit the details, and only state the theorem.

Theorem 3.
There exists a random variable $\tilde{W}$ with finite mean such that
$$\frac{W_n}{n^2} \xrightarrow{a.s.} \tilde{W}, \qquad \text{as } n \to \infty.$$
An extension of the Wiener index is the hyper Wiener index, a relatively new topological index that characterizes the molecular structure and features of more complex chemical compounds. This index was proposed by [46] to analyze the structure of 2-methylhexane, and was later used in [8] to investigate the one-pentagonal carbon nanocone. The hyper Wiener index of $G = (V, E)$ is given by
$$W^h(G) = \sum_{\{u, v\} \subseteq V}\bigl(\mathrm{dist}(u, v) + \mathrm{dist}^2(u, v)\bigr).$$
Let $W_n^h$ be the hyper Wiener index of $C_n$. The computation of $W_n^h$ is similar to that of $W_n$, so we only list the key steps. We again decompose the index into three parts, the first of which consists of the spine-spine contributions:
$$\sum_{i=1}^{m-1}\sum_{j=i+1}^{m}\bigl((j - i) + (j - i)^2\bigr) = \frac{m(m^3 + 2m^2 - m - 2)}{12},$$
which is deterministic. The second part consists of the leaf-leaf contributions; its expectation is
$$\frac{m(m^2 + 11m + 46)(m - 1)}{12}\mathbb{E}[X_{1,n}X_{2,n}] + 3m\bigl(\mathbb{E}\bigl[X_{1,n}^2\bigr] - \mathbb{E}[X_{1,n}]\bigr) = \frac{(m^3 + 10m^2 + 35m - 10)(n^2 - n)}{12m}.$$
The last part consists of the distances between spine nodes and leaves, including the distance 1 between each leaf and its own spine node. Its expectation is
$$\Bigl(\frac{m(m^2 + 7m + 18)(m - 1)}{6} + 2m\Bigr)\mathbb{E}[X_{1,n}] = \frac{(m^3 + 6m^2 + 11m - 6)n}{6}.$$
Putting the three parts together, we get the expectation of the hyper Wiener index of $C_n$, presented in the next proposition.

Proposition 8. For $n \ge 0$, the mean of the hyper Wiener index of a caterpillar is
$$\mathbb{E}\bigl[W_n^h\bigr] = \frac{1}{12m}\Bigl((m^3 + 10m^2 + 35m - 10)n^2 + (2m^4 + 11m^3 + 12m^2 - 47m + 10)n + m^2(m + 2)(m + 1)(m - 1)\Bigr).$$
As $n \to \infty$, we have
$$\frac{W_n^h}{n^2} \xrightarrow{L_1} \frac{m^3 + 10m^2 + 35m - 10}{12m}.$$
The convergence takes place in probability as well.
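A similar brute-force check applies to the hyper Wiener index. The Python sketch below (hypothetical names) uses the convention of this section, summing $\mathrm{dist}(u,v) + \mathrm{dist}^2(u,v)$ over unordered pairs without a factor $1/2$, and compares the exact mean over all $m^n$ attachment sequences with the closed form $\frac{1}{12m}\bigl((m^3 + 10m^2 + 35m - 10)n^2 + (2m^4 + 11m^3 + 12m^2 - 47m + 10)n + m^2(m + 2)(m + 1)(m - 1)\bigr)$ for small $m$ and $n$. For instance, for $m = 2$ and $n = 1$ the caterpillar is a path on three nodes with pairwise distances 1, 1 and 2, so $W_1^h = 2 + 2 + 6 = 10$.

```python
from collections import deque
from fractions import Fraction
from itertools import product
from typing import List


def pair_distances(leaves: List[int]) -> List[int]:
    """Distances over unordered pairs of distinct nodes of the caterpillar."""
    m = len(leaves)
    adj: List[List[int]] = [[] for _ in range(m + sum(leaves))]
    for i in range(1, m):                 # spine path 0 - 1 - ... - (m - 1)
        adj[i - 1].append(i)
        adj[i].append(i - 1)
    label = m
    for i, x in enumerate(leaves):        # attach x leaves to spine node i
        for _ in range(x):
            adj[i].append(label)
            adj[label].append(i)
            label += 1
    out: List[int] = []
    for s in range(len(adj)):             # one breadth-first search per node
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        out.extend(d for t, d in dist.items() if t > s)  # each pair once
    return out


def hyper_wiener(leaves: List[int]) -> int:
    """Sum of dist + dist**2 over unordered pairs (no factor 1/2)."""
    return sum(d + d * d for d in pair_distances(leaves))


def exact_mean_hyper_wiener(m: int, n: int) -> Fraction:
    """Average over all m**n equally likely attachment sequences."""
    total = 0
    for seq in product(range(m), repeat=n):
        leaves = [0] * m
        for i in seq:
            leaves[i] += 1
        total += hyper_wiener(leaves)
    return Fraction(total, m ** n)


def closed_form_hyper_wiener(m: int, n: int) -> Fraction:
    num = ((m**3 + 10 * m**2 + 35 * m - 10) * n * n
           + (2 * m**4 + 11 * m**3 + 12 * m**2 - 47 * m + 10) * n
           + m * m * (m + 2) * (m + 1) * (m - 1))
    return Fraction(num, 12 * m)


if __name__ == "__main__":
    print(hyper_wiener([1, 0]))  # path on three nodes: prints 10
    print(all(exact_mean_hyper_wiener(m, n) == closed_form_hyper_wiener(m, n)
              for m in range(2, 5) for n in range(5)))
```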
7. NUMERICAL EXPERIMENTS
We conduct a series of numerical experiments to verify the results developed in Sections 3 to 6. Given a fixed $m = 200$, we independently generate $R = 500$ replications of random caterpillars after $n = 5000$ evolutionary steps. For each simulated caterpillar, its Hoover, Zagreb, Randić, Wiener and hyper Wiener indices are computed and recorded. We evaluate the simulation results one after another.

Our results in Section 3 indicate that the Hoover index of a random caterpillar is, with high probability, a deterministic function of $m$ and $n$, and hence lacks randomness. From the simulation, we find that the Hoover index of each generated caterpillar is around . , which is consistent with Proposition 3.

Next, we compute the Zagreb index of each generated caterpillar, and standardize the (random) sample of Zagreb indices according to Propositions 4 and 5. The resulting histogram is depicted in Figure 1. In addition, we use the kernel method to estimate the density of the standardized Zagreb indices, also presented in Figure 1. The simulation results suggest that the Zagreb index (after proper scaling) of random caterpillars follows a Gaussian law, which agrees with Theorem 1. We further confirm this conclusion via the Shapiro-Wilk normality test, which yields a $p$-value of . .

For the Randić index, the sample mean of the simulated caterpillars, after scaling by $n^2$, is . . This simulation result is reasonably close to the theoretical result . from Proposition 6.

For the Wiener index, we use the igraph package in R to compute the distances among the vertices of the generated caterpillars. The distance calculation requires a great deal of computing power, so we adjust the simulation parameters to $R' = 500$, $m' = 50$ and $n' = 2000$. The average of the Wiener indices (after proper scaling) of the simulated caterpillars is . , almost identical to the theoretical result . from Proposition 7.
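The Zagreb part of the experiment described above can be sketched in a few lines of Python. This is our own illustration, not the code used to produce Figure 1: it grows caterpillars with Python's standard `random` module, computes $Z_n = \sum_i D_{i,n}^2 + \sum_i X_{i,n}$, and standardizes with the mean $n^2/m + (6m - 5)n/m + 4m - 6$ and variance $2n\bigl((m - 1)n + 3m - 7\bigr)/m^2$ of Propositions 4 and 5. A brute-force check of these two closed forms for tiny $m$ and $n$ is included. All names are hypothetical.

```python
import math
import random
from fractions import Fraction
from itertools import product
from statistics import fmean, pstdev
from typing import List, Tuple


def zagreb(leaves: List[int]) -> int:
    """Z_n = sum_i D_i^2 + sum_i X_i (each leaf contributes 1^2 = 1)."""
    m = len(leaves)
    d = [x + (1 if i in (0, m - 1) else 2) for i, x in enumerate(leaves)]
    return sum(di * di for di in d) + sum(leaves)


def zagreb_mean(m: int, n: int) -> Fraction:
    """Proposition 4: n^2/m + (6m - 5)n/m + 4m - 6."""
    return Fraction(n * n + (6 * m - 5) * n, m) + 4 * m - 6


def zagreb_var(m: int, n: int) -> Fraction:
    """Proposition 5: 2n((m - 1)n + 3m - 7) / m^2."""
    return Fraction(2 * n * ((m - 1) * n + 3 * m - 7), m * m)


def exact_moments(m: int, n: int) -> Tuple[Fraction, Fraction]:
    """Exact mean and variance of Z_n over all m**n attachment sequences."""
    s1 = s2 = 0
    for seq in product(range(m), repeat=n):
        leaves = [0] * m
        for i in seq:
            leaves[i] += 1
        z = zagreb(leaves)
        s1, s2 = s1 + z, s2 + z * z
    mean = Fraction(s1, m ** n)
    return mean, Fraction(s2, m ** n) - mean * mean


def standardized_zagreb_sample(m: int, n: int, reps: int, seed: int = 0) -> List[float]:
    """(Z_n - E[Z_n]) / sd(Z_n) for `reps` independently grown caterpillars."""
    rng = random.Random(seed)
    mu = float(zagreb_mean(m, n))
    sd = math.sqrt(zagreb_var(m, n))
    sample = []
    for _ in range(reps):
        leaves = [0] * m
        for _ in range(n):
            leaves[rng.randrange(m)] += 1  # uniform attachment step
        sample.append((zagreb(leaves) - mu) / sd)
    return sample


if __name__ == "__main__":
    z = standardized_zagreb_sample(m=20, n=400, reps=300, seed=1)
    print(round(fmean(z), 3), round(pstdev(z), 3))
```

Calling `standardized_zagreb_sample(200, 5000, 500)` mirrors the setting $m = 200$, $n = 5000$, $R = 500$ (about 2.5 million uniform draws); the resulting sample could then be passed to a normality test such as `scipy.stats.shapiro`, if SciPy is available.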
Under the same setting of $R'$, $m'$ and $n'$, we also compute the hyper Wiener indices. The simulation and theoretical results (after proper scaling) are . and . , respectively, which completes the verification.

8. CONCLUDING REMARKS
In this section, we address some concluding remarks, and propose some potential future workas well. We investigate several popular statistical indicies for a class of random caterpillars,including Gini index, Hoover index, Zagreb index, Randi´c index and Wiener index (and itsextension, hyper Wiener index). The mean of each index is computed. Specifically, we show Histogram of Zagreb index
Standardized Zagreb index D en s i t y −3 −2 −1 0 1 2 3 . . . . . F IGURE
1. Histogram of the standardized Zagreb indices of independentlygenerated random caterpillars with m = 200 and n = 5000 ; the thick blue curveis the estimated density of the sample.that the limit distribution of the Zagreb index of random caterpillars is Gaussian. Topologicalindex of random graphs is a burgeoning research area in the applied probability community.The follow-up study of the present paper may be given to further investigations of the limitdistribution of the Wiener index and the analysis of the Randi´c index with a more generalparameter α . Brand new research in this area is three folded:(1) Propose novel indices according to practical needs; for instance, we can consider aglobal metric that captures the total weight of random caterpillars, where the weightcan be added to nodes or edges or itself can be temporal.(2) Investigate other statistical indices that are not yet covered in the present paper; forinstance, the Hosoya’s Z index that counts the number of matchings in a graph and the Balaban’s index that interprets graph connectivity via the associated distance matrix. (3) Consider other types of random trees or more complex random networks that can beused for modeling molecular structures of chemical compounds, and investigate therelevant topological indices thereof. We will report our results elsewhere.R EFERENCES [1] Andrade, E., Gomes, H., Robbiano, M. Spectra and Randi´c spectra of caterpillar graphs and applications tothe energy,
MATCH Commun. Math. Comput. Chem., , (2017), 61–75. MR 3645367
[2] Balaban, A., Motoc, I., Bonchev, D. and Mekenyan, O.: Topological indices for structure-activity correlations. In: Steric Effects in Drug Design. Eds.: Austel, V., Balaban, A., Bonchev, D., Charton, M., Fujita, T., Iwamura, H., Mekenyan, O. and Motoc, I. Topics in Current Chemistry, , 21–55. Springer, Berlin, Heidelberg, 1983.
[3] Balaji, H. and Mahmoud, H.: The Gini index of random trees with an application to caterpillars, J. Appl. Probab., , (2017), 701–709. MR 3707823
[4] Bereg, S. and Wang, H.: Wiener indices of balanced binary trees, Discrete Appl. Math., , (2007), 457–467. MR 2296868
[5] Bollobás, B. and Erdős, P.: Graphs of extremal weights, Ars Combin., , (1998), 225–233. MR 1670561
[6] Bollobás, B., Erdős, P. and Sarkar, A.: Extremal graphs for weights, Discrete Math., , (1999), 5–19. MR 1692275
[7] Das, K. and Gutman, I.: Some properties of the second Zagreb index, MATCH Commun. Math. Comput. Chem., , (2004), 103–112. MR 2104642
[8] Darafsheh, M., Khalifeh, M. and Jolany, H.: The hyper-Wiener index of one-pentagonal carbon nanocone, Curr. Nanosci., , (2013), 557–569.
[9] Devillers, J. and Balaban, A.: Topological Indices and Related Descriptors in QSAR and QSPR. Edition 1. CRC Press, Boca Raton, FL, 2000.
[10] Domicolo, C. and Mahmoud, H.: Degree-based Gini index for graphs, Probab. Eng. Inform. Sci. Available online.
[11] Domicolo, C., Zhang, P. and Mahmoud, H.: The degree Gini index of several classes of random trees and their poissonized counterparts—An evidence for the duality theory. arXiv:1903.00086 [math.PR]
[12] El-Basil, S.: Applications of caterpillar trees in chemistry and physics, J. Math. Chem., , (1987), 153–174. MR 0906155
[13] El-Basil, S.: Caterpillar (Gutman) trees in chemical graph theory. In: Advances in the Theory of Benzenoid Hydrocarbons. Eds.: Gutman, I. and Cyvin, S. Topics in Current Chemistry, , 273–289. Springer, Berlin, Heidelberg, 1990.
[14] Feng, Q., Mahmoud, H. and Panholzer, A.: Limit for the Randić index of random binary tree models, Ann. Inst. Statist. Math., , (2008), 319–343. MR 2403522
[15] Feng, Q. and Hu, Z.: On the Zagreb index of random recursive trees, J. Appl. Probab., , (2011), 1189–1196. MR 2896676
[16] Feng, Q. and Hu, Z.: Asymptotic normality of the Zagreb index of random b-ary recursive trees, Dal'nevost. Mat. Zh., , (2015), 91–101. MR 3582623
[17] Fill, J. and Janson, S.: Precise logarithmic asymptotics for the right tails of some limit random variables for random trees, Ann. Comb., , (2009), 403–416. MR 2496125
[18] Fuchs, M. and Lee, C.K.: The Wiener index of random digital trees, SIAM J. Discrete Math., , (2015), 586–614. MR 3324969
[19] Gini, C.: Measurement of inequality of incomes, Econ. J., , (1921), 124–126.
[20] Golbraikh, A., Bonchev, D. and Tropsha, A.: Novel chirality descriptors derived from molecular topology, J. Chem. Inf. Comput. Sci., , (2001), 147–158.
[21] Golbraikh, A., Bonchev, D. and Tropsha, A.: Novel ZE-isomerism descriptors derived from molecular topology and their application to QSAR analysis, J. Chem. Inf. Comput. Sci., , (2002), 769–787.
[22] Gutman, I. and Trinajstić, N.: Graph theory and molecular orbitals. Total φ-electron energy of alternant hydrocarbons, Chem. Phys. Lett., , (1972), 535–538.
[23] Gutman, I., Ruščić, B., Trinajstić, N. and Wilcox, C.: Graph theory and molecular orbitals. XII. Acyclic polyenes, J. Chem. Phys., , (1975), 3399–3405.
[24] Gutman, I. and El-Basil, S.: Topological properties of Benzenoid systems. XXXVII. Characterization of certain chemical graphs, Z. Naturforsch. A, , (1985), 923–926. MR 0813290
[25] Gutman, I., Miljković, O., Caporossi, G. and Hansen, P.: Alkanes with small and large Randić connectivity index, Chem. Phys. Lett., , (1999), 366–372.
[26] Gutman, I., Araujo, O. and Morales, D.: Estimating the connectivity index of saturated hydrocarbon, Indian J. Chem., , (2000), 381–385.
[27] Gutman, I.: Degree-based topological indices, Croat. Chem. Acta, , (2013), 351–361.
[28] Gutman, I., Furtula, B., Kovijanić Vukićević, Ž. and Popivoda, G.: On Zagreb indices and coindices, MATCH Commun. Math. Comput. Chem., , (2015), 5–16. MR 3379512
[29] Hall, P. and Heyde, C.: Martingale limit theory and its application. Academic Press, Inc., New York, NY, 1980. xii+308 pp. MR 0624435
[30] Hoover, E., Jr.: The measurement of industrial localization, Rev. Econ. Stat., , (1936), 162–171.
[31] Janson, S.: The Wiener index of simply generated random trees, Random Structures Algorithms, , (2003), 337–358. MR 1980963
[32] Khalifeh, M., Yousefi-Azari, H. and Ashrafi, A.: The first and second Zagreb indices of some graph operations, Discrete Appl. Math., , (2009), 804–811. MR 2499494
[33] Kryven, I.: Analytic results on the polymerisation random graph model, J. Math. Chem., , (2018), 140–157. MR 3742858
[34] Li, X. and Shi, Y.: A survey on the Randić index, MATCH Commun. Math. Comput. Chem., 59, (2008), 127–156. MR 2378255
[35] Eliasi, M. and Ghalavand, A.: Ordering of trees by multiplicative second Zagreb index, Trans. Comb., , (2016), 49–55. MR 3462890
[36] Miličević, A. and Nikolić, S.: On variable Zagreb indices, Croat. Chem. Acta, , (2004), 97–101.
[37] Munsonius, G.: On the asymptotic internal path length and the asymptotic Wiener index of random split trees, Electron. J. Probab., , (2011), 1020–1047. MR 2820068
[38] Munsonius, G. and Rüschendorf, L.: Limit theorems for depths and distances in weighted random b-ary recursive trees, J. Appl. Probab., , (2011), 1060–1080. MR 2896668
[39] Neininger, R.: The Wiener index of random trees, Combin. Probab. Comput., , (2002), 587–597. MR 1940122
[40] Nikolić, S. and Trinajstić, N.: The Wiener index: Development and applications, Croat. Chem. Acta, , (1995), 105–129.
[41] Nikolić, S., Tolić, I. and Trinajstić, N.: On the complexity of molecular graphs, MATCH Commun. Math. Comput. Chem., , (1999), 187–201. MR 1729484
[42] Nikolić, S., Kovačević, G., Miličević, A. and Trinajstić, N.: The Zagreb index 30 years after, Croat. Chem. Acta, , (2003), 113–124.
[43] Platt, J.: Influence of neighbor bonds on additive bond properties in paraffins, J. Chem. Phys., , (1947), 419–420.
[44] Rada, J., Araujo, O. and Gutman, I.: Randić index of benzenoid systems and phenylenes, Croat. Chem. Acta, , (2001), 225–235.
[45] Randić, M.: Characterization of molecular branching, J. Am. Chem. Soc., , (1975), 6609–6615.
[46] Randić, M.: Novel molecular descriptor for structure-property studies, Chem. Phys. Lett., , (1993), 478–483.
[47] Randić, M.: The connectivity index 25 years after, J. Mol. Graph. Model., , (2001), 19–35.
[48] Randić, M.: On history of the Randić index and emerging hostility toward chemical graph theory, MATCH Commun. Math. Comput. Chem., , (2008), 5–124. MR 2378254
[49] Todeschini, R. and Consonni, V.: Molecular Descriptors for Chemoinformatics. Wiley, Hoboken, NJ, 2009. 1257 pp.
[50] Wagner, S.: On the Wiener index of random trees, Discrete Math., , (2012), 1502–1511. MR 2899882
[51] Wiener, H.: Correlation of heats of isomerization, and differences in heats of vaporization of isomers, among the paraffin hydrocarbons, J. Am. Chem. Soc., , (1947), 2636–2638.
[52] Zhang, P.: On several properties of plain-oriented recursive trees, arXiv:1706.02441, (2018).
[53] Zhang, P. and Dey, D.: The degree profile and Gini index of random caterpillar trees, Probab. Eng. Inform. Sci., 33