Paths vs. stars in the local profile of trees
ÉVA CZABARKA, LÁSZLÓ A. SZÉKELY AND STEPHAN WAGNER
Abstract.
The aim of this paper is to provide an affirmative answer to a recent question by Bubeck and Linial on the local profile of trees. For a tree $T$, let $p^{(k)}_1(T)$ be the proportion of paths among all $k$-vertex subtrees (induced connected subgraphs) of $T$, and let $p^{(k)}_2(T)$ be the proportion of stars. Our main theorem states: if $p^{(k)}_1(T_n) \to 0$ for a sequence of trees $T_1, T_2, \ldots$ whose size tends to infinity, then $p^{(k)}_2(T_n) \to 1$. Both are also shown to be equivalent to the statement that the number of $k$-vertex subtrees grows superlinearly and the statement that the $(k-1)$-th degree moment grows superlinearly.

Date: April 11, 2018.
Key words and phrases. trees, subtrees, local profile, paths, stars.
The second author was supported in part by the NSF DMS, grant number 1300547; the third author was supported in part by the National Research Foundation of South Africa, grant number 96236.

1. Introduction

In their recent paper [2], Bubeck and Linial studied what they call the local profile of trees. For two trees $S$ and $T$, we denote the number of copies of $S$ in $T$ by $c(S,T)$ (formally, the number of vertex subsets of $T$ that induce a tree isomorphic to $S$). For an integer $k \ge 4$, let $T^k_1, T^k_2, \ldots$ be a list of all $k$-vertex trees (up to isomorphism), such that $T^k_1 = P_k$ is the path and $T^k_2 = S_k$ is the star, and set
\[ p^{(k)}_i(T) = \frac{c(T^k_i, T)}{Z_k(T)}, \qquad \text{where} \qquad Z_k(T) = \sum_j c(T^k_j, T). \]
In words, $Z_k(T)$ is the number of $k$-vertex subtrees of $T$ (the number of $k$-vertex subsets that induce a tree), and $p^{(k)}_i(T)$ is the proportion of copies of $T^k_i$ among those subtrees. In particular, $p^{(k)}_1(T)$ is the proportion of paths, and $p^{(k)}_2(T)$ is the proportion of stars. The vector $p^{(k)}(T) = (p^{(k)}_1(T), p^{(k)}_2(T), \ldots)$ is called the $k$-profile of $T$.

Bubeck and Linial study specifically the limit set $\Delta(k)$ of $k$-profiles $p^{(k)}(T)$ as the number of vertices of $T$ tends to infinity. Their main result is that $\Delta(k)$ is convex for every $k$. This contrasts with the situation for general graphs, where the analogously defined set is not convex and even determining the convex hull is computationally infeasible [3]. Even in special cases, fairly little is known about $k$-profiles (see [4] for a study of 3-profiles). We remark that there is also a notable difference in the definitions of $k$-profiles of general graphs and trees: for graphs, the proportion is taken among all vertex sets of cardinality $k$, while for trees it makes more sense to only consider those $k$-vertex sets that actually induce a tree. For general graphs, this would amount to considering only those subsets that induce a connected graph.

Furthermore, Bubeck and Linial show that the sum of the first two components (corresponding to the path and the star respectively) is strictly positive for every point in the limit set $\Delta(k)$ and in fact bounded below by an explicit constant that only depends on $k$ (see the discussion at the end of Section 2 and in particular Corollary 11 for an equivalent statement). They also obtain a somewhat stronger inequality in the special case $k = 5$.

Bubeck and Linial list a number of open problems at the end of their article, and one of them will be the main topic of this paper. It can be expressed as follows:

Question. Let $T_1, T_2, \ldots$ be a sequence of trees such that the number of vertices of $T_n$ tends to infinity as $n \to \infty$. Given that $\lim_{n\to\infty} p^{(k)}_1(T_n) = 0$, is it necessarily true that $\lim_{n\to\infty} p^{(k)}_2(T_n) = 1$?

In somewhat more informal terms, this states the following: if only few of the $k$-vertex subtrees of a large tree are paths, almost all of those subtrees have to be stars. We remark that the statement is not true if $p^{(k)}_1$ and $p^{(k)}_2$ are interchanged. For example, consider the sequence of caterpillars as shown in Figure 1.

Figure 1. A caterpillar.

Obviously, $p^{(5)}_2(T_n) = 0$ for every $n$ in this example: the maximum degree is 3, so $T_n$ does not contain any 5-vertex stars. On the other hand, simple calculations show that $\lim_{n\to\infty} p^{(5)}_1(T_n)$ is a constant strictly less than 1.

In the following, we will provide an affirmative answer to the question raised by Bubeck and Linial, and even prove a slight extension involving the total number of $k$-vertex subtrees and the degree moments. Here and in the following, we write $V(T)$ and $E(T)$ for the vertex set and edge set of a tree $T$, $|T|$ is the number of vertices of $T$, and $d(v)$ denotes the degree of a vertex $v$; whenever we speak about the degree of a vertex, we always mean the degree in the underlying tree $T$, not in a subtree.

Theorem 1.
Let $T_1, T_2, \ldots$ be a sequence of trees such that $|T_n| \to \infty$ as $n \to \infty$. For every $k \ge 4$, the following four statements are equivalent:
(M1) $\lim_{n\to\infty} p^{(k)}_1(T_n) = 0$,
(M2) $\lim_{n\to\infty} \frac{1}{|T_n|} Z_k(T_n) = \infty$,
(M3) $\lim_{n\to\infty} \frac{1}{|T_n|} \sum_{v \in V(T_n)} d(v)^{k-1} = \infty$,
(M4) $\lim_{n\to\infty} p^{(k)}_2(T_n) = 1$.

Informally, statement (M2) says that $T_n$ contains more than linearly many $k$-vertex subtrees. (M3) states that the $(k-1)$-th degree moment, i.e. the average of $d(v)^{k-1}$ over all vertices, tends to infinity. The implication (M4) ⇒ (M1) is trivial, so our main task will be to prove the implications (M1) ⇒ (M2) ⇔ (M3) ⇒ (M4).

Shortly after a first version of this paper was published online, the equivalence of (M1) and (M4) was shown independently by Bubeck, Edwards, Mania and Supko [1], who also provided an explicit (nonlinear) inequality between $p^{(k)}_1(T)$ and $p^{(k)}_2(T)$ that implies the equivalence.

2. Proof of the main theorem
Theorem 1 will follow from a sequence of lemmas. As a first step, we estimate the total number of $k$-vertex subtrees.

Lemma 2. Let $k$ be a positive integer. The total number of $k$-vertex subtrees of any tree $T$ can be bounded above as follows:
\[ Z_k(T) \le (k-1)! \sum_{v \in V(T)} d(v)^{k-1}. \]

Proof. For every vertex $v$ of $T$, we count the number of $k$-vertex subtrees that contain $v$ and in which $v$ has maximum degree (in $T$, not in the subtree!) among all vertices of the subtree. Every such subtree can be constructed by repeatedly adding a leaf, starting with the single vertex $v$. At the $j$-th such step, there are at most $j$ vertices to attach a leaf to, and at most $d(v)$ choices for the new leaf (since $v$ was assumed to have maximum degree). Therefore, there are at most $(k-1)! \cdot d(v)^{k-1}$ possible subtrees of this kind for any fixed vertex $v$. Summing over all $v$, we obtain the desired result. Clearly every subtree is counted at least once in the sum (possibly even several times, but since we are only interested in an upper bound, this is immaterial). ∎

Lemma 3.
For every integer $k \ge 3$, the total number of $k$-vertex stars contained in a tree $T$ is
\[ c(S_k, T) = \sum_{v \in V(T)} \binom{d(v)}{k-1}. \]

Proof. The number of $k$-vertex stars contained in $T$ whose center is $v$ is given by $\binom{d(v)}{k-1}$, the number of ways to choose $k-1$ neighbors of $v$. Summing over all vertices $v$ gives the formula. ∎

Note that $p^{(k)}_1(T_n) = 0$ if the diameter of $T_n$ is at most $k-2$ (so that there are no $k$-vertex paths), so this would provide us with a simple construction for which condition (M1) holds. We will treat this case separately and show that it implies (M2):

Lemma 4.
Fix an integer $k \ge 3$, and let $T_1, T_2, \ldots$ be a sequence of trees whose diameter is bounded above by some fixed constant $D$. If $|T_n| \to \infty$ as $n \to \infty$, then (M2) holds, i.e.
\[ \lim_{n\to\infty} \frac{1}{|T_n|} Z_k(T_n) = \infty. \]

Proof. We prove the slightly stronger statement that
\[ \lim_{n\to\infty} \frac{1}{|T_n|} c(S_k, T_n) = \infty, \]
i.e. the number of induced $k$-vertex stars grows faster than linearly. To this end, it will be useful to consider all trees as rooted (at an arbitrary vertex). Clearly, if the diameter is bounded by $D$, the height of any rooted version is also bounded by $D$. We prove the following by induction on $D$, from which the statement of the lemma follows immediately:

Claim. For every positive integer $D$, there exist positive constants $\alpha_D$, $\beta_D$ with $\beta_D > 1$, and $N_D$, depending only on $D$ and $k$, such that
\[ c(S_k, T) \ge \alpha_D \max(|T| - N_D, 0)^{\beta_D} \]
for any rooted tree $T$ whose height is at most $D$.

First note that the claim is trivial for $D = 1$: there is only one possible rooted tree in this case, namely a star. Thus we have
\[ c(S_k, T) = \binom{|T|-1}{k-1} \ge \Big( \frac{|T|}{k} \Big)^{k-1} \]
in this case as soon as $|T| \ge k$, which gives us the desired inequality with $\beta_1 = k-1 > 1$, $\alpha_1 = k^{-(k-1)}$ and $N_1 = k$.

Now we turn to the induction step. Let $r$ be the root degree, and let $T_1, T_2, \ldots, T_r$ be the root branches, each endowed with the natural root (the neighbor of $T$'s root). The number of copies of $S_k$ in $T$ for which the root is the centre is given by $\binom{r}{k-1}$, so
\[ c(S_k, T) \ge \binom{r}{k-1} + \sum_{j=1}^{r} c(S_k, T_j). \]
Each of the branches has height at most $D-1$, so we can apply the induction hypothesis to them. In addition, we note that $f(x) = \alpha_{D-1} \max(x - N_{D-1}, 0)^{\beta_{D-1}}$ is a convex function, so Jensen's inequality gives us
\[ c(S_k, T) \ge \binom{r}{k-1} + r \alpha_{D-1} \max\Big( \frac{|T|-1}{r} - N_{D-1}, 0 \Big)^{\beta_{D-1}}. \]
If $r \ge |T|^{2/3}$ and $|T| \ge (k-1)^{3/2}$, then the first term is
\[ \binom{r}{k-1} \ge \binom{|T|^{2/3}}{k-1} \ge \Big( \frac{|T|^{2/3}}{k-1} \Big)^{k-1}. \]
If, on the other hand, $r < |T|^{2/3}$ and $|T| \ge (N_{D-1}+2)^3$, then the second term is
\begin{align*}
r \alpha_{D-1} \max\Big( \frac{|T|-1}{r} - N_{D-1}, 0 \Big)^{\beta_{D-1}}
&\ge r \alpha_{D-1} \Big( \frac{|T|}{r} - N_{D-1} - 1 \Big)^{\beta_{D-1}} \\
&\ge r \alpha_{D-1} \Big( \frac{|T|}{r (N_{D-1}+2)} \Big)^{\beta_{D-1}} \\
&= \frac{\alpha_{D-1}}{(N_{D-1}+2)^{\beta_{D-1}}} \cdot r^{1-\beta_{D-1}} |T|^{\beta_{D-1}} \\
&\ge \frac{\alpha_{D-1}}{(N_{D-1}+2)^{\beta_{D-1}}} \cdot |T|^{(\beta_{D-1}+2)/3}.
\end{align*}
Thus we obtain the desired inequality with
\begin{align*}
\alpha_D &= \min\Big( (k-1)^{-(k-1)}, \frac{\alpha_{D-1}}{(N_{D-1}+2)^{\beta_{D-1}}} \Big), \\
\beta_D &= \min\Big( \frac{2(k-1)}{3}, \frac{\beta_{D-1}+2}{3} \Big), \\
N_D &= \max\Big( (k-1)^{3/2}, (N_{D-1}+2)^3 \Big).
\end{align*}
Since $k \ge 3$ and $\beta_{D-1} > 1$, we also have $\beta_D > 1$. This completes the induction and thus the proof of the lemma. ∎
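The counting quantities used so far ($Z_k(T)$, $c(P_k,T)$, $c(S_k,T)$) can be checked by brute force on small trees. The following Python sketch is only an illustration and not part of the paper's argument: the function names and the example tree are hypothetical, and the enumeration over all $k$-subsets is exponential, so it is meant for tiny inputs only. It compares the enumerated star count with the formula of Lemma 3 and the total count with the bound of Lemma 2.

```python
from itertools import combinations
from math import comb, factorial


def adjacency(edges):
    """Build an adjacency map {vertex: set of neighbours} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def induces_tree(subset, adj):
    """True if the vertex set induces a connected subgraph (a subtree of a tree)."""
    subset = set(subset)
    start = next(iter(subset))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] & subset:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(subset)


def subtree_counts(edges, k):
    """Return (Z_k, c(P_k, T), c(S_k, T)) by testing every k-subset (assumes k >= 4)."""
    adj = adjacency(edges)
    total = paths = stars = 0
    for subset in combinations(sorted(adj), k):
        if not induces_tree(subset, adj):
            continue
        total += 1
        chosen = set(subset)
        max_deg = max(len(adj[v] & chosen) for v in subset)
        if max_deg <= 2:
            paths += 1   # a tree with maximum degree 2 is a path
        elif max_deg == k - 1:
            stars += 1   # one vertex adjacent to all the others: a star
    return total, paths, stars


def lemma2_bound(edges, k):
    """Right-hand side of Lemma 2: (k-1)! times the sum of d(v)^(k-1)."""
    adj = adjacency(edges)
    return factorial(k - 1) * sum(len(nb) ** (k - 1) for nb in adj.values())


def lemma3_star_count(edges, k):
    """Right-hand side of Lemma 3: sum of binom(d(v), k-1)."""
    adj = adjacency(edges)
    return sum(comb(len(nb), k - 1) for nb in adj.values())


# Hypothetical example: vertex 0 has neighbours 1, 2, 3, 4, and the path 4-5-6 hangs off it.
EXAMPLE_EDGES = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5), (5, 6)]

for k in (4, 5):
    z, p, s = subtree_counts(EXAMPLE_EDGES, k)
    assert s == lemma3_star_count(EXAMPLE_EDGES, k)
    assert z <= lemma2_bound(EXAMPLE_EDGES, k)
    print(k, z, p, s)
```

For the example tree and $k = 4$, the enumeration finds 8 subtrees, of which 4 are paths and 4 are stars, so the star count agrees with $\binom{4}{3} = 4$ from Lemma 3.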
Lemma 4 shows that (M2) always holds for sequences of trees with bounded diameter, even without the assumption (M1). On the other hand, if the diameter is sufficiently large, then it turns out that there must always be at least linearly many $k$-vertex paths. In fact, we have the following simple lemma:

Lemma 5. Let $k$ be a positive integer. If the diameter of a tree $T$ is at least $2k-2$, then $c(P_k, T) \ge |T|/2$.

Proof. Since the diameter is assumed to be at least $2k-2$, the radius must be at least $k-1$. Therefore, for every vertex $v$ of $T$, there is a $k$-vertex path in $T$ starting at $v$. Since every path has only two ends, no path is counted more than twice in this argument, thus there must be at least $|T|/2$ $k$-vertex paths occurring in $T$. ∎

Corollary 6.
For every integer $k \ge 3$, the implication (M1) ⇒ (M2) holds.

Proof. Consider a sequence $T_1, T_2, \ldots$ of trees with $|T_n| \to \infty$ for which (M1) holds. For the subsequence consisting of trees whose diameter is at most $2k-3$, (M2) follows from Lemma 4, regardless of whether (M1) is true or not. For the remaining subsequence, we can simply combine Lemma 5 with the assumption (M1): since $c(P_k, T_n) \ge |T_n|/2$, we have $Z_k(T_n)/|T_n| = c(P_k, T_n)/(p^{(k)}_1(T_n) |T_n|) \ge 1/(2 p^{(k)}_1(T_n))$, which tends to infinity. ∎
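Lemma 5 is easy to test numerically, because in a tree every pair of vertices at distance $k-1$ is joined by exactly one path, and that path has $k$ vertices. The sketch below is an illustration under that observation, not the paper's method: the function names and the caterpillar used as input are hypothetical. It computes $|T|$, the diameter and $c(P_k, T)$ with breadth-first search.

```python
from collections import deque


def bfs_distances(adj, source):
    """Distances from `source` to every vertex of a connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def diameter_and_path_count(edges, k):
    """Return (|T|, diameter, c(P_k, T)) for the tree given by `edges`."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    all_dist = {v: bfs_distances(adj, v) for v in adj}
    diameter = max(max(d.values()) for d in all_dist.values())
    # Each k-vertex path is seen once from each of its two ends.
    ordered_pairs = sum(
        1 for d in all_dist.values() for x in d.values() if x == k - 1
    )
    return len(adj), diameter, ordered_pairs // 2


# Hypothetical caterpillar: spine 0-1-...-6, one extra leaf on each of the spine vertices 1..5.
SPINE = [(i, i + 1) for i in range(6)]
LEAVES = [(i, i + 6) for i in range(1, 6)]
size, diam, paths = diameter_and_path_count(SPINE + LEAVES, 4)

assert diam >= 2 * 4 - 2      # hypothesis of Lemma 5 for k = 4
assert paths >= size / 2      # conclusion of Lemma 5
print(size, diam, paths)
```

The quadratic all-pairs search is chosen for brevity; it is adequate for checking small examples by hand.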
As a next step, we show the equivalence of (M2) and (M3), which is quite straightforward:
Lemma 7.
For every integer $k \ge 3$, the two statements (M2) and (M3) are equivalent.

Proof. Condition (M2), combined with Lemma 2, implies that
\[ \lim_{n\to\infty} \frac{1}{|T_n|} \sum_{v \in V(T_n)} d(v)^{k-1} = \infty, \]
which is exactly (M3). On the other hand, since $\binom{d}{k-1} \ge \big( \frac{d}{k-1} \big)^{k-1}$ for $d \ge k-1$, Lemma 3 gives
\[ c(S_k, T_n) \ge (k-1)^{-(k-1)} \sum_{v \in V(T_n)} d(v)^{k-1} - |T_n|, \tag{1} \]
where the final term stems from vertices whose degree is less than $k-1$. Therefore, if (M3) holds, then we also have
\[ \lim_{n\to\infty} \frac{c(S_k, T_n)}{|T_n|} = \infty, \]
which implies (M2), since $Z_k(T_n) \ge c(S_k, T_n)$. ∎

Now we would like to bound the number of non-star $k$-vertex subtrees from above to obtain the implication (M2) ⇒ (M4). To this end, we first introduce the notion of edge weights. Define the weight of an edge $e = vu$ as
\[ \omega(e) = \max\Big( \frac{d(u)}{d(v)}, \frac{d(v)}{d(u)} \Big). \]
In words: take the degrees of the two endpoints of $e$ and divide the higher degree by the lower degree. For some real number $a > 1$, call a subtree $S$ of a tree $T$ an $a$-unbalanced subtree if it contains at least one edge that is not a pendant edge (incident to a leaf) of $S$ and that has a weight of at least $a$ in $T$. Denote the total number of $a$-unbalanced $k$-vertex subtrees of $T$ by $Z^{1}_k(T, a)$. Lemma 8 below is in some sense a refinement of Lemma 2.
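The definitions of edge weight and of an $a$-unbalanced subtree can be made concrete with a short Python sketch. The function names and the example tree are hypothetical; the code assumes that the caller passes a vertex set that really induces a subtree, and it uses exact fractions so that comparisons with $a$ are not affected by rounding.

```python
from fractions import Fraction


def degrees(edges):
    """Degree of every vertex in the graph given by an edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def edge_weight(edge, deg):
    """omega(e): the larger endpoint degree divided by the smaller one (degrees in T)."""
    u, v = edge
    return Fraction(max(deg[u], deg[v]), min(deg[u], deg[v]))


def is_unbalanced(subtree_vertices, edges, a):
    """True if the induced subtree has a non-pendant edge whose weight in T is at least a."""
    chosen = set(subtree_vertices)
    deg_in_tree = degrees(edges)
    inner_edges = [(u, v) for u, v in edges if u in chosen and v in chosen]
    deg_in_subtree = degrees(inner_edges)
    for u, v in inner_edges:
        # An edge is pendant in S if one of its endpoints is a leaf of S.
        non_pendant = deg_in_subtree[u] >= 2 and deg_in_subtree[v] >= 2
        if non_pendant and edge_weight((u, v), deg_in_tree) >= a:
            return True
    return False


# Hypothetical example: vertex 0 has neighbours 1, 2, 3, 4, and the path 4-5-6 hangs off it.
TREE = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5), (5, 6)]
DEG = degrees(TREE)

print(edge_weight((0, 4), DEG))             # degrees 4 and 2
print(is_unbalanced({1, 0, 4, 5}, TREE, 2))  # the only non-pendant edge is 0-4
print(is_unbalanced({0, 1, 2, 3}, TREE, 2))  # a star has no non-pendant edge
```

Note that stars are never $a$-unbalanced, since all of their edges are pendant; this is why stars have to be treated separately in what follows.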
Lemma 8.
For every integer $k \ge 4$, every real number $a > 1$, and every tree $T$, we have
\[ Z^{1}_k(T, a) \le \frac{(k-1)!}{a} \sum_{v \in V(T)} d(v)^{k-1}. \]
Proof.
We can follow the proof of Lemma 2. The only change in the argument is that at least one vertex of degree at most $d(v)/a$ has to be added to the subtree at some point so as to include an edge of weight at least $a$. Since we also require the presence of such an edge that is not a pendant edge of the subtree, at some stage a neighbor of a vertex of degree at most $d(v)/a$ has to be added to the subtree as well, for which there are only at most $d(v)/a$ possibilities. This gives us the same inequality as in Lemma 2, but with an extra factor $a$ in the denominator. ∎

It remains to bound the number of $k$-vertex subtrees that are neither stars nor $a$-unbalanced; we denote this number by $Z^{2}_k(T, a)$. Our next lemma provides a suitable bound:
Lemma 9.
For every integer $k \ge 4$, every real number $a > 1$, and every tree $T$, we have
\[ Z^{2}_k(T, a) \le 2 (k-1)! \, a^{(k-2)^2} \sum_{v \in V(T)} d(v)^{k-2}. \]

Proof. Consider any edge $e$ whose weight is at most $a$. It is not difficult to see that there exists some nonnegative integer $\ell$ such that the degrees of both its endpoints lie in the interval $[a^{\ell}, a^{\ell+2})$: simply take $\ell$ in such a way that the smaller degree of the two lies in $[a^{\ell}, a^{\ell+1})$. Now consider any subtree $S$ that is not $a$-unbalanced and contains $e$ as a non-pendant edge (it automatically follows that $S$ is not a star). Every internal vertex $v$ of $S$ can be reached from $e$ by a path of non-pendant edges whose length is at most $k-4$. Since $S$ was assumed not to be $a$-unbalanced, none of these edges can have a weight greater than $a$, so the degree of $v$ in $T$ is at most $a^{\ell+2} \cdot a^{k-4} = a^{\ell+k-2}$.

Now we count all subtrees $S$ that are not $a$-unbalanced and contain $e$ as a non-pendant edge. Every such subtree can be obtained by repeatedly adding leaves, starting from $e$. This is done $k-2$ times; at the $j$-th step, we have a choice of $j+1$ vertices to attach a leaf to, and at most $a^{\ell+k-2}$ possible choices for the leaf by the observation on degrees of internal vertices in $S$. It follows that there are no more than
\[ (k-1)! \cdot a^{(k-2)(\ell+k-2)} \]
such subtrees.

The number of edges whose ends both have degrees in $[a^{\ell}, a^{\ell+2})$ is less than the number of vertices whose degrees lie in this interval, since the edges induce a forest on the set of these vertices. Therefore, we obtain the following upper bound for the number of $k$-vertex subtrees that are neither stars nor $a$-unbalanced (note that every non-star has at least one non-pendant edge):
\begin{align*}
Z^{2}_k(T, a) &\le \sum_{\ell \ge 0} \sum_{\substack{e = vw \in E(T) \\ d(v), d(w) \in [a^{\ell}, a^{\ell+2})}} (k-1)! \cdot a^{(k-2)(\ell+k-2)} \\
&\le \sum_{\ell \ge 0} \sum_{\substack{v \in V(T) \\ d(v) \in [a^{\ell}, a^{\ell+2})}} (k-1)! \cdot a^{(k-2)(\ell+k-2)} \\
&\le \sum_{\ell \ge 0} \sum_{\substack{v \in V(T) \\ d(v) \in [a^{\ell}, a^{\ell+2})}} (k-1)! \cdot d(v)^{k-2} \cdot a^{(k-2)^2} \\
&\le 2 (k-1)! \, a^{(k-2)^2} \sum_{v \in V(T)} d(v)^{k-2}.
\end{align*}
The last inequality holds since every vertex is counted at most twice in the double sum. ∎
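The final step of the proof above uses that every degree $d \ge 1$ lies in at most two of the overlapping intervals $[a^{\ell}, a^{\ell+2})$, $\ell \ge 0$. A small Python sketch (the function name is hypothetical, and exact rational arithmetic is an implementation choice made here to avoid rounding at the interval endpoints) lists the admissible values of $\ell$ for a given degree:

```python
from fractions import Fraction


def interval_indices(d, a):
    """All integers l >= 0 with a**l <= d < a**(l + 2), for a degree d >= 1 and a > 1."""
    a = Fraction(a)
    indices = []
    l = 0
    while a ** l <= d:
        if d < a ** (l + 2):
            indices.append(l)
        l += 1
    return indices


# Every degree falls into one or two intervals, whatever the value of a > 1.
for a in (2, Fraction(3, 2), 10):
    for d in range(1, 201):
        assert 1 <= len(interval_indices(d, a)) <= 2

print(interval_indices(5, 2))  # 2 <= 5 < 8 and 4 <= 5 < 16
```

If $d$ lies in $[a^{m}, a^{m+1})$, the admissible indices are $m$ and, when $m \ge 1$, also $m-1$, which is where the factor 2 in Lemma 9 comes from.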
Now we put everything together to obtain the desired implication (M2) ⇒ (M4), completing the proof of Theorem 1. Let us formulate this explicitly:

Corollary 10. For every integer $k \ge 4$, the implication (M2) ⇒ (M4) holds.

Proof. Assume that condition (M2) is satisfied. By the proof of Lemma 7, this implies $c(S_k, T_n)/|T_n| \to \infty$. Combining this with inequality (1) from the proof of Lemma 7, we see that
\[ \frac{1}{c(S_k, T_n)} \sum_{v \in V(T_n)} d(v)^{k-1} \]
is bounded above by a positive constant (for sufficiently large $n$).

We combine Lemma 8 and Lemma 9 to find that the total number of $k$-vertex subtrees of a tree $T$ that are not stars can be bounded by
\[ Z^{1}_k(T, a) + Z^{2}_k(T, a) \le \frac{(k-1)!}{a} \sum_{v \in V(T)} d(v)^{k-1} + 2 (k-1)! \, a^{(k-2)^2} \sum_{v \in V(T)} d(v)^{k-2}. \]
Hölder's inequality gives us
\[ \sum_{v \in V(T)} d(v)^{k-2} \le |T|^{1/(k-1)} \Big( \sum_{v \in V(T)} d(v)^{k-1} \Big)^{(k-2)/(k-1)}, \]
so putting everything together, we obtain
\begin{align*}
Z_k(T_n) - c(S_k, T_n) &= Z^{1}_k(T_n, a) + Z^{2}_k(T_n, a) \\
&= O\Big( a^{-1} \sum_{v \in V(T_n)} d(v)^{k-1} + a^{(k-2)^2} \sum_{v \in V(T_n)} d(v)^{k-2} \Big) \\
&= O\Big( a^{-1} \sum_{v \in V(T_n)} d(v)^{k-1} + a^{(k-2)^2} |T_n|^{1/(k-1)} \Big( \sum_{v \in V(T_n)} d(v)^{k-1} \Big)^{(k-2)/(k-1)} \Big) \\
&= O\Big( a^{-1} c(S_k, T_n) + a^{(k-2)^2} |T_n|^{1/(k-1)} c(S_k, T_n)^{(k-2)/(k-1)} \Big).
\end{align*}
The $O$-constant depends on $k$ and the specific sequence of trees, but notably not on $a$, which we can still choose freely. Taking
\[ a = \Big( \frac{c(S_k, T_n)}{|T_n|} \Big)^{\frac{1}{(k-1)((k-2)^2+1)}}, \]
which is greater than 1 for sufficiently large $n$ in view of condition (M2), the two terms in the estimate balance, and we end up with
\[ Z_k(T_n) - c(S_k, T_n) = O\Big( \Big( \frac{|T_n|}{c(S_k, T_n)} \Big)^{\frac{1}{(k-1)((k-2)^2+1)}} c(S_k, T_n) \Big), \]
so that (M2) now implies
\[ \lim_{n\to\infty} \frac{c(S_k, T_n)}{Z_k(T_n)} = 1, \]
which is exactly (M4). ∎

As we have now shown the implications (M1) ⇒ (M2) (Corollary 6), (M2) ⇔ (M3) (Lemma 7) and (M2) ⇒ (M4) (Corollary 10), and the implication (M4) ⇒ (M1) is trivial, this also completes the proof of Theorem 1.

Our ideas can also be used to re-prove a result of Bubeck and Linial [2, Theorem 2], even with a slightly improved constant: namely, they showed that
\[ \liminf_{n\to\infty} \Big( p^{(k)}_1(T_n) + p^{(k)}_2(T_n) \Big) \]
is bounded below by an explicit positive constant, expressed in terms of $k$ and $N_k$, for any sequence $T_1, T_2, \ldots$ of trees with $|T_n| \to \infty$, where $N_k$ is the number of nonisomorphic trees with $k$ vertices.

Making use of the arguments used to prove Theorem 1, we obtain the following:

Corollary 11.
For every sequence $T_1, T_2, \ldots$ of trees with $|T_n| \to \infty$, we have
\[ \liminf_{n\to\infty} \Big( p^{(k)}_1(T_n) + p^{(k)}_2(T_n) \Big) \ge \frac{1}{2 (k-1)^{k-1} (k-1)!}. \]

Proof. Lemma 2 gives us
\[ Z_k(T_n) \le (k-1)! \sum_{v \in V(T_n)} d(v)^{k-1}. \]
Combining this inequality with (1) and Lemma 5 (we may assume that the diameter is not bounded in view of Lemma 4) yields
\[ Z_k(T_n) \le (k-1)! (k-1)^{k-1} \big( c(S_k, T_n) + |T_n| \big) \le (k-1)! (k-1)^{k-1} \big( c(S_k, T_n) + 2 c(P_k, T_n) \big). \]
Therefore,
\[ p^{(k)}_1(T_n) + p^{(k)}_2(T_n) = \frac{c(S_k, T_n) + c(P_k, T_n)}{Z_k(T_n)} \ge \frac{c(S_k, T_n) + c(P_k, T_n)}{(k-1)! (k-1)^{k-1} \big( c(S_k, T_n) + 2 c(P_k, T_n) \big)}, \]
and the desired result follows immediately. ∎

With more careful estimates, it is certainly possible to improve further on the lower bound in Corollary 11.

3. Subtrees of different sizes
So far, we were only comparing subtrees of the same fixed size $k$. However, it is natural to expect that $\lim_{n\to\infty} p^{(k)}_1(T_n) = 0$ for some $k$ (in words: the proportion of paths among $k$-vertex subtrees goes to 0) should also imply $\lim_{n\to\infty} p^{(\ell)}_2(T_n) = 1$ (the proportion of stars among $\ell$-vertex subtrees goes to 1) for some $\ell$ that is not necessarily equal to $k$. Indeed this is true if $k \le \ell$: since we trivially have
\[ \sum_{v \in V(T)} d(v)^{k-1} \le \sum_{v \in V(T)} d(v)^{\ell-1} \]
in this case, condition (M3) is satisfied for $\ell$ if it is satisfied for $k$. Therefore, we immediately obtain a slight extension of Theorem 1:

Theorem 12. Let $T_1, T_2, \ldots$ be a sequence of trees such that $|T_n| \to \infty$ as $n \to \infty$. Let $k, \ell$ be integers such that $\ell \ge k \ge 4$, and assume that one of the following equivalent statements holds:
(M1)$_k$ $\lim_{n\to\infty} p^{(k)}_1(T_n) = 0$,
(M2)$_k$ $\lim_{n\to\infty} \frac{1}{|T_n|} Z_k(T_n) = \infty$,
(M3)$_k$ $\lim_{n\to\infty} \frac{1}{|T_n|} \sum_{v \in V(T_n)} d(v)^{k-1} = \infty$,
(M4)$_k$ $\lim_{n\to\infty} p^{(k)}_2(T_n) = 1$.
In this case, the following statements hold as well:
(M1)$_\ell$ $\lim_{n\to\infty} p^{(\ell)}_1(T_n) = 0$,
(M2)$_\ell$ $\lim_{n\to\infty} \frac{1}{|T_n|} Z_\ell(T_n) = \infty$,
(M3)$_\ell$ $\lim_{n\to\infty} \frac{1}{|T_n|} \sum_{v \in V(T_n)} d(v)^{\ell-1} = \infty$,
(M4)$_\ell$ $\lim_{n\to\infty} p^{(\ell)}_2(T_n) = 1$.

In heuristic terms: if most $k$-vertex subtrees are stars, then this is also the case for $\ell$-vertex subtrees, provided $\ell \ge k$. On the other hand, if only very few of the $k$-vertex subtrees are paths, then the same applies to $\ell$-vertex subtrees for every $\ell \ge k$. It is noteworthy, however, that the converse is not true, and counterexamples are very easy to construct.

Consider for instance a family of extended stars constructed as follows (Figure 2): $T_n$ has $n$ vertices, of which the central vertex has degree (approximately) $n^{2/(2k-1)}$ for some $k \ge 4$, while all other vertices have degree 1 or 2. The actual lengths of the paths around the central vertex are irrelevant. It is easy to see in this example that (M3)$_k$ is not satisfied, and that in fact $\lim_{n\to\infty} p^{(k)}_2(T_n) = 0$, while on the other hand (M3)$_{k+1}$ is satisfied, so that $\lim_{n\to\infty} p^{(k+1)}_2(T_n) = 1$.

Figure 2. An extended star.
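The behaviour of the degree moments in the extended star example can be illustrated numerically. The sketch below is a hedged illustration with hypothetical function names: it assumes the central degree is the integer nearest to $n^{2/(2k-1)}$, and it uses the fact that such a tree has one centre of degree $r$, exactly $r$ leaves (the ends of the arms) and $n - 1 - r$ vertices of degree 2.

```python
def central_degree(n, k):
    """Assumed central degree: the integer nearest to n**(2/(2k-1)), at least 2."""
    return max(2, round(n ** (2 / (2 * k - 1))))


def normalized_degree_moment(n, k, j):
    """(1/n) * sum of d(v)**j over an extended star with n vertices.

    The tree has one centre of degree r, r leaves and n - 1 - r vertices of degree 2.
    """
    r = central_degree(n, k)
    return (r ** j + r + (n - 1 - r) * 2 ** j) / n


K = 4
for n in (10 ** 7, 10 ** 14, 10 ** 21):
    # The (k-1)-th moment stays bounded, so (M3)_k fails;
    # the k-th moment keeps growing, in line with (M3)_{k+1}.
    print(n, normalized_degree_moment(n, K, K - 1), normalized_degree_moment(n, K, K))
```

The growth of the $k$-th moment is slow (of order $n^{1/(2k-1)}$), which is why very large values of $n$ are used in the loop.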
References

[1] Sébastien Bubeck, Katherine Edwards, Horia Mania, and Cathryn Supko, On paths, stars and wyes in trees, arXiv:1601.01950, 2016.
[2] Sébastien Bubeck and Nati Linial, On the local profiles of trees, J. Graph Theory (2016), no. 2, 109–119.
[3] Hamed Hatami and Serguei Norine, Undecidability of linear inequalities in graph homomorphism densities, J. Amer. Math. Soc. (2011), no. 2, 547–565.
[4] Hao Huang, Nati Linial, Humberto Naves, Yuval Peled, and Benny Sudakov, On the 3-local profiles of graphs, J. Graph Theory (2014), no. 3, 236–248.

Éva Czabarka and László A. Székely, Department of Mathematics, University of South Carolina, Columbia, SC 29208, USA
E-mail address: {czabarka,szekely}@math.sc.edu

Stephan Wagner, Department of Mathematical Sciences, Stellenbosch University, Private Bag X1, Matieland 7602, South Africa