Estimating the Number of Connected Components in a Graph via Subgraph Sampling
Jason M. Klusowski ∗ Yihong Wu † June 18, 2019
Abstract
Learning properties of large graphs from samples has been an important problem in statistical network analysis since the early work of Goodman [27] and Frank [21]. We revisit a problem formulated by Frank [21] of estimating the number of connected components in a large graph based on the subgraph sampling model, in which we randomly sample a subset of the vertices and observe the induced subgraph. The key question is whether accurate estimation is achievable in the sublinear regime where only a vanishing fraction of the vertices are sampled. We show that it is impossible if the parent graph is allowed to contain high-degree vertices or long induced cycles. For the class of chordal graphs, where induced cycles of length four or above are forbidden, we characterize the optimal sample complexity within constant factors and construct linear-time estimators that provably achieve these bounds. This significantly expands the scope of previous results, which have focused on unbiased estimators and special classes of graphs such as forests or cliques.

Both the construction and the analysis of the proposed methodology rely on combinatorial properties of chordal graphs and identities of induced subgraph counts. They, in turn, also play a key role in proving minimax lower bounds based on the construction of random instances of graphs with matching structures of small subgraphs.
∗ Department of Statistics, Rutgers University – New Brunswick, Piscataway, NJ, email: [email protected].
† Department of Statistics and Data Science, Yale University, New Haven, CT, 06511, email: [email protected]. This research was supported in part by the NSF Grants IIS-1447879 and CCF-1527105, an NSF CAREER award CCF-1651588, and an Alfred Sloan fellowship.

Contents

Algorithms and performance guarantees
A Additional proofs
B Additional results
B.1 Extensions to uniform sampling model
B.2 Lower bound for graphs with long induced cycles
B.3 Lower bounds for forests
C Numerical experiments
C.1 Synthetic experiment
C.2 Real-data experiment
Counting the number of features in a graph – ranging from basic local structures like motifs or graphlets (e.g., edges, triangles, wedges, stars, cycles, cliques) to more global features like the number of connected components – is an important task in network analysis. For example, the global clustering coefficient of a graph (i.e., the fraction of closed triangles) is a measure of the tendency for nodes to cluster together and a key quantity used to study cohesion in various networks [42]. To learn these graph properties, applied researchers typically collect data from a random sample of nodes to construct a representation of the true network. We refer to these problems collectively as statistical inference on sampled networks, where the goal is to infer properties of the parent network (population) from a subsampled version. Below we mention a few examples that arise in various fields of study.

• Sociology: Social networks of the Hadza hunter-gatherers of Tanzania were studied in [3] by surveying 205 individuals in 17 Hadza camps (from a population of 517). Another study [12] of farmers in Ghana used network data from a survey of 180 households in three villages from a population of 550 households.

• Economics and business: Low sampling ratios have been used in applied economics (such as 30% in [18]), particularly for large-scale studies [4, 19]. A good overview of various experiments in applied economics and their corresponding sampling ratios can be found in [9, Appendix F, p. 11]. Word-of-mouth marketing in consumer referral networks was studied in [49] using 158 respondents from a potential subject pool of 238.

• Genomics: The authors of [53] use protein-protein interaction data and demonstrate that it is possible to arrive at a reliable statistical estimate for the number of interactions (edges) from a sample containing approximately 1500 vertices.
• World Wide Web and Internet: Informed random IP address probing was used in [29] in an attempt to obtain a router-level map of the Internet.

As mentioned earlier, a primary concern of these studies is how well the data represent the true network and how to reconstruct the relevant properties of the parent graphs from samples. These issues and how they are addressed broadly arise from two perspectives:

• The full network is unknown due to the lack of data, which could arise from the underlying experimental design and data collection procedure, e.g., historical or observational data. In this case, one needs to construct statistical estimators (i.e., functions of the sampled graph) to conduct sound inference. These estimators must be designed to account for the fact that the sampled network is only a partial observation of the true network, and thus subject to certain inherent biases and variability.

• The full network is either too large to scan or too expensive to store. In this case, approximation algorithms can overcome computational or storage issues that would otherwise be unwieldy. For example, for massive social networks, it is generally impossible to enumerate the whole population. Rather than reading the entire graph, query-based algorithms randomly (or deterministically) sample parts of the graph or adaptively explore the graph through a random walk [5]. Some popular instances of traversal-based procedures are snowball sampling [28] and respondent-driven sampling [52].
Indeed, sampling (based on edge and degree queries) is a commonly used primitive to speed up computation, which leads to various sublinear-time algorithms for testing or estimating graph properties such as the average degree [25], triangle and more general subgraph counts [2, 15], and expansion properties [26]; we refer the readers to the monograph [23].

Learning properties of graphs from samples has been an important problem in statistical network analysis since the early work of Goodman [27] and Frank [21]. Estimation of various properties such as graph totals [20] and connectivity [8, 21] has been studied in a variety of sampling models. However, most of the analysis has been confined to obtaining unbiased estimators for certain classes of graphs, and little is known about their optimality. The purpose of this paper is to initiate a systematic study of statistical inference on sampled networks, with the goal of determining their statistical limits in terms of minimax risks and sample complexity, achieved by computationally efficient procedures. As a first step towards this goal, in this paper we focus on a representative problem introduced in [21], namely, estimating the number of connected components in a graph from a partial sample of the population network. In fact, the techniques developed in this paper are also useful for estimating other graph statistics such as motif counts, which were studied in the companion paper [35].

Before we proceed, let us emphasize that the objective of this paper is not testing whether the graph is connected, which is a property too fragile to test on the basis of a small sampled graph; indeed, missing a single edge can destroy the connectivity. Instead, our goal is to estimate the number of connected components with an optimal additive accuracy. Thus, naturally, it is applicable to graphs with a large number of components.

We study the problem of estimating the number of connected components for two reasons.
First, it encapsulates many challenging aspects of statistical inference on sampled graphs, and we believe the mathematical framework and machinery developed in this paper will prove useful for estimating other graph properties as well. Second, the number of connected components is a useful graph property that quantifies the connectivity of a network. In addition, it finds use in data-analytic applications related to determining the number of classes in a population [27]. Another example is the recent work [11], which studies the estimation of the number of documented deaths in the Syrian Civil War from a subgraph induced by a set of vertices obtained from an adaptive sampling process (similar to subgraph sampling). There, the goal is to estimate the number of unique individuals in a population, which roughly corresponds to the number of connected components in a network of duplicate records connected by shared attributes.

Next we discuss the sampling model, which determines how reflective the data is of the population graph and therefore the quality of the estimation procedure. There are many ways to sample from a graph (see [13, 39] for a list of techniques and [30, 37, 38] for comprehensive reviews). For simplicity, this paper focuses on the simplest sampling model, namely subgraph sampling, where we randomly sample a subset of the vertices and observe their induced subgraph; in other words, only the edges between the sampled vertices are revealed. Results on estimating motif counts for the related neighborhood sampling model can be found in the companion paper [35]. One of the earliest works that adopts the subgraph sampling model is by Frank [21], which is the basis for the theory developed in this paper.
Drawing from previous work on estimating population totals using vertex sampling [20], Frank obtained unbiased estimators of the number of connected components and performance guarantees (variance calculations) for graphs whose connected components are either all trees or all cliques. Extensions to more general graphs are briefly discussed, although no unbiased estimators are proposed. This generality is desirable since it is more realistic to assume that the objects in each class (component) are in between being weakly and strongly connected to each other, corresponding to having a level of connectivity between a tree and a clique. While the results of Frank are interesting, questions of their generality and optimality remain open, and we therefore address these matters in the sequel. Specifically, the main goals of this paper are as follows:

• Characterize the sample complexity, i.e., the minimal sample size needed to achieve a given accuracy, as a function of graph parameters.

• Devise computationally efficient estimators that provably achieve the optimal sample complexity bound.

Of particular interest is the sublinear regime, where only a vanishing fraction of the vertices are sampled. In this case, it is impossible to reconstruct the entire graph, but it might still be possible to accurately estimate the desired graph property.

The problem of estimating the number of connected components in a large graph has also been studied in the computer science literature, where the goal is to design randomized algorithms with sublinear (in the size of the graph) time complexity. The celebrated work [10] proposed a randomized algorithm to estimate the number of connected components in a general graph (motivated by computing the weight of the minimum spanning tree) within an additive error of εN for graphs with N vertices and average degree d_avg, with runtime O(d_avg ε^{-2} log(d_avg/ε)).
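The idea behind these BFS-based algorithms can be illustrated with the identity cc(G) = Σ_u 1/|C_u|, where C_u is the component containing u: averaging 1/|C_u| over randomly sampled vertices, with the breadth-first search truncated at a size cap, estimates cc(G) with bias at most N/cap. The sketch below is our own simplification (function names are hypothetical); the actual algorithm of [10] uses a randomized stopping rule calibrated by d_avg:

```python
import random
from collections import deque

def truncated_component_size(adj, u, cap):
    """BFS from u, stopping once more than `cap` vertices have been seen."""
    seen = {u}
    queue = deque([u])
    while queue and len(seen) <= cap:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return min(len(seen), cap)

def estimate_cc(adj, num_samples, cap, rng=random):
    """N times the average of 1/min(|C_u|, cap) over sampled vertices u.

    Summing 1/|C_u| over all u gives exactly cc(G); truncating at `cap`
    inflates each term by at most 1/cap, so the bias is at most N/cap."""
    vertices = list(adj)
    total = sum(1.0 / truncated_component_size(adj, rng.choice(vertices), cap)
                for _ in range(num_samples))
    return len(vertices) * total / num_samples

# Toy parent graph: 30 disjoint triangles plus 20 isolated vertices, cc = 50.
adj = {}
for t in range(30):
    tri = [3 * t, 3 * t + 1, 3 * t + 2]
    for v in tri:
        adj[v] = [w for w in tri if w != v]
for v in range(90, 110):
    adj[v] = []

print(estimate_cc(adj, num_samples=2000, cap=10, rng=random.Random(0)))
```

With a few thousand samples the output concentrates around the true value 50; the truncation matters only when some component exceeds the cap.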
Their method relies on data obtained from a random sample of vertices, followed by a breadth-first search from each sampled vertex which terminates according to a random stopping criterion. The algorithm requires knowledge of the average degree d_avg, which must therefore be known or estimated a priori. The runtime was further improved to O(ε^{-2} log(1/ε)) by modifying the stopping criterion [6]. In these algorithms, the breadth-first search may visit many of the edges and explore a larger fraction of the graph at each round. From an applied perspective, such traversal-based procedures can be impractical or impossible to implement in many statistical applications due to limitations inherent in the experimental design, and it is more realistic to treat the network data as a random sample from a parent graph.

Finally, let us compare, conceptually, the framework in the present paper with the work on model-based network analysis, where networks are modeled as random graphs drawn from specific generative models, such as the stochastic block model [31], graphons [22], or exponential random graph models [32] (cf. the recent survey [38]), and performance analysis of statistical procedures for parameter estimation or clustering is carried out for these models. In contrast, in network sampling we adopt a design-based framework [30], where the graph is assumed to be deterministic and the randomness comes from the sampling process.

Organization
The paper is organized as follows. In Section 2, we formally define the estimation problem and the subgraph sampling model, and describe which classes of graphs we will be focusing on. To motivate our attention on specific classes of graphs (chordal graphs with maximum degree constraints), we show that in the absence of such structural assumptions, sublinear sample complexity is impossible, in the sense that at least a constant fraction of the vertices need to be sampled. Section 3 introduces the definition of chordal graphs and states our main results in terms of the minimax risk and sample complexity. In Section 4, after introducing the relevant combinatorial properties of chordal graphs, we define the estimator of the number of connected components and provide its statistical guarantees. We also propose a heuristic for constructing an estimator on non-chordal graphs. In Section 5, we develop a general strategy for proving minimax lower bounds for estimating graph properties and particularize it to obtain matching lower bounds for the estimator constructed in Section 4.

Some of the technical proofs, additional results for the uniform sampling model and for forests and graphs with long cycles, and a numerical study of the proposed estimators on simulated data for various graphs are deferred to Appendix A, Appendix B, and Appendix C, respectively.
Notations
We use standard big-O notation: for positive sequences {a_n} and {b_n}, a_n = O(b_n) or a_n ≲ b_n if a_n ≤ C b_n for some absolute constant C > 0, and a_n = o(b_n) or a_n ≪ b_n if lim a_n/b_n = 0. Furthermore, the subscript in a_n = O_r(b_n) means that a_n ≤ C_r b_n for some constant C_r depending on the parameter r only. For a positive integer k, let [k] = {1, . . . , k}. Let Bern(p) denote the Bernoulli distribution with mean p and Bin(N, p) the binomial distribution with N trials and success probability p.

Next we introduce some graph-theoretic notation that will be used throughout the paper. Let G = (V, E) be a simple undirected graph. Let e = e(G) = |E(G)| denote the number of edges, v = v(G) = |V(G)| the number of vertices, and cc = cc(G) the number of connected components of G. The neighborhood of a vertex u is denoted by N_G(u) = {v ∈ V(G) : {u, v} ∈ E(G)}. Two graphs G and G′ are isomorphic, denoted by G ≃ G′, if there exists a bijection between the vertex sets of G and G′ that preserves adjacency, i.e., if there exists a bijective function g : V(G) → V(G′) such that {g(u), g(v)} ∈ E(G′) if and only if {u, v} ∈ E(G). The disjoint union of two graphs G and G′, denoted G + G′, is the graph whose vertex (resp. edge) set is the disjoint union of the vertex (resp. edge) sets of G and G′. For brevity, we denote by kG the disjoint union of k copies of G.

We use the notation K_n, P_n, and C_n to denote the complete graph, path graph, and cycle graph on n vertices, respectively. Let K_{n,n′} denote the complete bipartite graph with nn′ edges and n + n′ vertices. Let S_n denote the star graph K_{1,n} on n + 1 vertices.

We need two types of subgraph counts: denote by s(H, G) (resp. n(H, G)) the number of vertex-induced (resp. edge-induced) subgraphs of G that are isomorphic to H; for example, s(H, G) = 2 and n(H, G) = 8 for a pair of small graphs H and G displayed as inline figures in the original. (The subgraph counts are directly related to the graph homomorphism numbers [41, Sec 5.2]. Denote by inj(H, G) the number of injective homomorphisms from H to G and ind(H, G) the number of injective homomorphisms that also preserve non-adjacency. Then ind(H, G) = s(H, G) aut(H) and inj(H, G) = n(H, G) aut(H), where aut(H) denotes the number of automorphisms, i.e., isomorphisms to itself, of H.) Let ω(G) denote the clique number, i.e., the size of the largest clique in G.

To fix notation, let G = (V, E) be a simple, undirected graph on N vertices. In the subgraph sampling model, we sample a set of vertices denoted by S ⊂ V and observe their induced subgraph, denoted by G[S] = (S, E[S]), where the edge set is defined as E[S] = {{i, j} ∈ E : i, j ∈ S}. See Fig. 1 for an illustration. To simplify notation, we abbreviate the sampled graph G[S] as G̃. (It is sufficient to describe the sampled graph up to isomorphism since the property cc we want to estimate is invariant under graph isomorphisms.)

(a) Parent graph G with the set of sampled vertices S shown in black. (b) Subgraph induced by the sampled vertices, G̃ = G[S]; non-sampled vertices are shown as isolated vertices.

Figure 1: Subgraph sampling.

According to how the set S of sampled vertices is generated, there are two variations of the subgraph sampling model [21]:

• Uniform sampling: Exactly n vertices are chosen uniformly at random without replacement from the vertex set V. In this case, the probability of observing a subgraph isomorphic to H with v(H) = n is equal to

P[G̃ ≃ H] = s(H, G) / \binom{N}{n}.   (1)

• Bernoulli sampling: Each vertex is sampled independently with probability p, where p is called the sampling ratio.
Thus, the sample size |S| is distributed as Bin(N, p), and the probability of observing a subgraph isomorphic to H is equal to

P[G̃ ≃ H] = s(H, G) p^{v(H)} (1 − p)^{v(G) − v(H)}.   (2)

The relation between these two models is analogous to that between sampling without replacement and sampling with replacement. In the sublinear sampling regime where n ≪ N, they are nearly equivalent. For technical simplicity, we focus on the Bernoulli sampling model and refer to n ≜ pN as the effective sample size. Extensions to the uniform sampling model will be discussed in Section B.1 of Appendix B.

A number of previous works on subgraph sampling are closely related to the theory of graph limits [7], which is motivated by the so-called property testing problems in graphs [23]. According to [7, Definition 2.11], a graph parameter f is “testable” if for any ε > 0, there exists a sample size n such that for any graph G with at least n vertices, there is an estimator f̂ = f̂(G̃) such that P[|f(G) − f̂| > ε] < ε. In other words, testable properties can be estimated with sample complexity that is independent of the size of the graph. Examples of testable properties include the edge density e(G)/\binom{v(G)}{2} and the density of maximum cuts MaxCut(G)/v(G)², where MaxCut(G) is the size of the maximum edge cut-set in G [24]; however, the number of connected components cc(G) and its normalized version cc(G)/v(G) are not testable. (To see this, recall from [7, Theorem 6.1(b)] that an equivalent characterization of f being testable is that for any ε > 0, there exists a sample size n such that for any graph G with at least n vertices, |f(G) − E f(G̃)| < ε; this is violated for star graphs G = S_N as N → ∞.) Instead, our focus is to understand the dependency of the sample complexity of estimating cc(G) on the graph size N as well as other graph parameters. It turns out that for certain classes of graphs, the sample complexity grows sublinearly in N, which is the most interesting regime.

Before introducing the classes of graphs we consider in this paper, we note that, unless further structure is assumed on the parent graph, estimating many graph properties, including the number of connected components, has very high sample complexity that scales linearly with the size of the graph. Indeed, there are two main obstacles to estimating the number of connected components in graphs, namely, high-degree vertices and long induced cycles. If either is allowed to be present, we will show that even if we sample a constant fraction of the vertices, any estimator of cc(G) has a worst-case additive error that is almost linear in the network size N. Specifically,

• For any sampling ratio p bounded away from 1, as long as the maximum degree is allowed to scale as Ω(N), even if we restrict the parent graph to be acyclic, the worst-case estimation error of any estimator is Ω(N).

• For any sampling ratio p bounded away from 1/2, as long as the length of the induced cycles is allowed to be Ω(log N), even if we restrict the parent graph to have maximum degree 2, the worst-case estimation error of any estimator is Ω(N/log N).

The effect of high-degree vertices can be seen by considering two graphs G and G′, where G is the star graph on N vertices and G′ consists of N isolated vertices. Note that as long as the center vertex of G is not sampled, the sampling distributions of G and G′ are identical. This implies that the total variation between the sampled graphs under G and G′ is at most p. Since the numbers of connected components in G and G′ differ by N − 1, this leads to a minimax lower bound of Ω(N) on the estimation error whenever p is bounded away from one.

The effect of long induced cycles is subtler. The key observation is that a cycle and a path (or one cycle versus two cycles) locally look exactly the same. Indeed, let G (resp. G′) consist of N/(2r) disjoint copies of the smaller graph H (resp. H′), where H is a cycle of length 2r and H′ consists of two disjoint cycles of length r. Both G and G′ have maximum degree 2 and contain induced cycles of length at most 2r. The local structure of G and G′ is the same (e.g., each connected subgraph on at most r − 1 vertices appears the same number of times in each graph), and the sampled versions of H and H′ are identically distributed unless at least r vertices are sampled (which, by Hoeffding's inequality, occurs with probability at most e^{−r(1−2p)²}). By a union bound, it can be shown that the total variation between the sampled graphs G̃ and G̃′ is O((N/r) e^{−r(1−2p)²}). Thus, whenever the sampling ratio p is bounded away from 1/2, choosing r = Θ(log N) leads to a near-linear lower bound Ω(N/log N).

The difficulties caused by high-degree vertices and long induced cycles motivate us to consider classes of graphs defined by two key parameters, namely, the maximum degree d and the length c of the longest induced cycle. The case of c = 2 corresponds to forests (acyclic graphs), which have been considered by Frank [21]. The case of c = 3 corresponds to chordal graphs, i.e., graphs without induced cycles of length four or above, which is the focus of this paper. It is well known that various computational tasks that are intractable in the worst case, such as maximum clique and graph coloring, are easy for chordal graphs; it turns out that the chordality structure also aids in both the design and the analysis of computationally efficient estimators which provably attain the optimal sample complexity.

This section summarizes our main results in terms of the minimax risk of estimating the number of connected components over various classes of graphs. As mentioned before, for ease of exposition, we focus on the Bernoulli sampling model, where each vertex is sampled independently with probability p. Similar conclusions can be obtained for the uniform sampling model upon identifying p = n/N, as given in Section B.1.

When p grows from 0 to 1, an increasing fraction of the graph is observed, and intuitively the estimation problem becomes easier. Indeed, all forthcoming minimax rates are inversely proportional to powers of p. Of particular interest is whether accurate estimation is possible in the sublinear sampling regime, i.e., p = o(1). The forthcoming theory will give explicit conditions on p for this to hold true.

As mentioned in the previous section, the main class of graphs we study is the so-called chordal graphs (see Fig. 2 for an example):

Definition 1.
A graph G is chordal if it does not contain an induced cycle of length four or above, i.e., s(C_k, G) = 0 for k ≥ 4.

(a) Chordal graph. (b) Non-chordal graph (containing an induced C_4).

Figure 2: Examples of chordal and non-chordal graphs, both with three connected components.

We emphasize that chordal graphs are allowed to have arbitrarily long cycles but no induced cycles longer than three. The class of chordal graphs encompasses forests and disjoint unions of cliques as special cases, the two models that were studied in Frank’s original paper [21]. In addition to constructing estimators that adapt to larger collections of graphs (of which forests and unions of cliques are special cases), we also provide theoretical analysis and optimality guarantees – elements that were not considered in past work.

Next, we characterize the rate of the minimax mean-squared error for estimating the number of connected components in a chordal graph, which turns out to depend on the number of vertices, the maximum degree, and the clique number. The upper and lower bounds differ by at most a multiplicative factor depending only on the clique number. To simplify the notation, henceforth we denote q = 1 − p.

Theorem 1 (Chordal graphs). Let G(N, d, ω) denote the collection of all chordal graphs on N vertices with maximum degree at most d and clique number at most ω, where ω ≥ 2. Then

inf_ĉc sup_{G ∈ G(N,d,ω)} E_G |ĉc − cc(G)|² = Θ_ω((N/p^ω ∨ Nd/p^{ω−1}) ∧ N²),

where the lower bound holds provided that p ≤ p_0 for some constant p_0 < 1 that only depends on ω. Furthermore, if p ≥ 1/2, then for any ω,

inf_ĉc sup_{G ∈ G(N,d,ω)} E_G |ĉc − cc(G)|² ≤ Nq(d + 1).   (3)

Specializing Theorem 1 to ω = 2 yields the minimax rates for estimating the number of trees in forests for small sampling ratio p.
The next theorem shows that the result holds verbatim even if p is arbitrarily close to 1, and, consequently, shows the minimax rate-optimality of the bound in (3). The lower bound is proved in Section B.3 of Appendix B.

Theorem 2 (Forests). Let F(N, d) ≜ G(N, d, 2) denote the collection of all forests on N vertices with maximum degree at most d. Then for all 0 ≤ p ≤ 1 and 1 ≤ d ≤ N,

inf_ĉc sup_{G ∈ F(N,d)} E_G |ĉc − cc(G)|² ≍ (Nq/p² ∨ Nqd/p) ∧ N².   (4)

The upper bounds in the previous results are achieved by unbiased estimators. As (3) shows, they work well even when the clique number ω grows with N, provided we sample more than half of the vertices; however, if the sampling ratio p is below 1/2, especially in the sublinear regime of p = o(1) that we are interested in, the variance is exponentially large. To deal with large d and ω, we must give up unbiasedness to achieve a good bias-variance tradeoff. Such biased estimators, obtained using the smoothing technique introduced in [47], lead to better performance, as quantified in the following theorem. The proofs of these bounds are given in Theorem 7 and Theorem 9.

Theorem 3 (Chordal graphs). Let G(N, d) denote the collection of all chordal graphs on N vertices with maximum degree at most d. Then, for any p < 1/2,

inf_ĉc sup_{G ∈ G(N,d)} E_G |ĉc − cc(G)|² ≲ N² (N/d)^{−p/(1−p)}.

Finally, for the special case of graphs consisting of disjoint unions of cliques, as the following theorem shows, there is enough structure that we no longer need to impose any condition on the maximum degree. Similar to Theorem 3, the achievable scheme is a biased estimator, significantly improving on the unbiased estimator in [21, 27], which has exponentially large variance.
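For disjoint unions of cliques, the tension between unbiasedness and variance is easy to see concretely: the induced subgraph of a clique is again a clique, so s(K_i, G̃) is a sum of binomial coefficients of the observed component sizes, and the alternating series can be truncated to trade an exponentially large variance for a controlled bias. The sketch below is an illustration only, with a deterministic truncation and hypothetical function names; Theorems 3 and 4 are instead obtained with the randomized smoothing technique of [47]:

```python
import math
import random

def sampled_clique_sizes(clique_sizes, p, rng):
    """Bernoulli(p) vertex sampling of a disjoint union of cliques:
    the induced subgraph of a K_m is a K_m' with m' ~ Bin(m, p)."""
    return [sum(rng.random() < p for _ in range(m)) for m in clique_sizes]

def cc_estimate(observed_sizes, p, trunc):
    """Alternating clique-count series truncated at level `trunc`.

    With trunc = max clique size this is the unbiased estimator of cc,
    since E s(K_i, sampled graph) = p^i s(K_i, G); its coefficients grow
    like p^{-i}, so for small p the variance is exponentially large.
    A smaller `trunc` introduces bias but caps the variance."""
    est = 0.0
    for i in range(1, trunc + 1):
        s_i = sum(math.comb(m, i) for m in observed_sizes)
        est += (-1) ** (i + 1) * p ** (-i) * s_i
    return est

rng = random.Random(1)
parent = [8] * 100                 # 100 disjoint copies of K_8, cc = 100
obs = sampled_clique_sizes(parent, p=0.3, rng=rng)
print(cc_estimate(obs, p=0.3, trunc=8))   # unbiased but extremely noisy
print(cc_estimate(obs, p=0.3, trunc=3))   # biased yet far more stable
```

Running this repeatedly shows the full series fluctuating over a huge range while the truncated series stays near the truth, which is the bias-variance tradeoff the theorems quantify.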
Theorem 4 (Cliques). Let C(N) denote the collection of all graphs on N vertices consisting of disjoint unions of cliques. Then, for any p < 1/2,

inf_ĉc sup_{G ∈ C(N)} E_G |ĉc − cc(G)|² ≤ N² (N/2)^{−p/(1−p)}.

Alternatively, the above results are summarized in Table 1 in terms of the sample complexity, i.e., the minimum sample size that allows an estimator of cc(G) within an additive error of εN with probability, say, at least 0.99, uniformly for all graphs in a given class. Here the sample size is understood as the average number of sampled vertices n = pN. We have the following characterization:

Graph | Sample complexity n
Chordal | Θ_ω(max{N^{(ω−2)/(ω−1)} d^{1/(ω−1)} ε^{−2/(ω−1)}, N^{(ω−1)/ω} ε^{−2/ω}})
Forest | Θ(max{d/ε², √N/ε})
Cliques | Θ((N/log N) log(1/ε)), for ε ≥ N^{−1/4} *

* The lower bound part of this statement follows from [58, Section 3], which shows the optimality of Theorem 4.
Table 1: Sample complexity for various classes of graphs.

A consequence of Theorem 2 is that if the effective sample size n scales as O(max(√N, d)), then for the class of forests F(N, d) the worst-case estimation error of any estimator is Ω(N), which is within a constant factor of the trivial error bound when no samples are available. Conversely, if n ≫ max(√N, d), which is sublinear in N as long as the maximum degree satisfies d = o(N), then it is possible to achieve a non-trivial estimation error of o(N). More generally, for chordal graphs, Theorem 1 implies that if n = O(max(N^{(ω−1)/ω}, d^{1/(ω−1)} N^{(ω−2)/(ω−1)})), the worst-case estimation error in G(N, d, ω) of any estimator is at least Ω_ω(N).

In this section we propose estimators which provably achieve the upper bounds presented in Section 3 for the Bernoulli sampling model. In Section 4.1, we highlight some combinatorial properties and characterizations of chordal graphs that underpin both the construction and the analysis of the estimators in Section 4.2. The special case of disjoint unions of cliques is treated in Section 4.3, where the estimator of Frank [21] is recovered and further improved. Analogous results for the uniform sampling model are given in Section B.1 of Appendix B. Finally, in Section 4.4, we discuss a heuristic to generalize the methodology to non-chordal graphs.
In this subsection we discuss the relevant combinatorial properties of chordal graphs which aid in the design and analysis of our estimators. We start by introducing a notion of vertex elimination ordering.
Definition 2.
A perfect elimination ordering (PEO) of a graph G on N vertices is a vertex labelling {v_1, v_2, . . . , v_N} such that, for each j, N_G(v_j) ∩ {v_1, . . . , v_{j−1}} is a clique.

Figure 3: A chordal graph G with a PEO labelled. In this example, cc(G) = 3 = 16 − 19 + 6 = s(K_1, G) − s(K_2, G) + s(K_3, G).

In other words, if one eliminates the vertices sequentially according to a PEO starting from the last vertex, then at each step the neighborhood of the vertex to be eliminated forms a clique; see Fig. 3 for an example. A classical result of Dirac asserts that the existence of a PEO is in fact the defining property of chordal graphs (cf. e.g., [56, Theorem 5.3.17]).

Theorem 5.
A graph is chordal if and only if it admits a PEO.
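Theorem 5 also yields a practical chordality test: maximum cardinality search (MCS) of Tarjan and Yannakakis produces, on a chordal graph, a visit order that is a PEO in the convention of Definition 2, so a graph is chordal if and only if its MCS order passes the PEO check. A minimal sketch (function names are ours, not from the paper):

```python
def mcs_peo(adj):
    """Maximum cardinality search: repeatedly visit the vertex with the
    most already-visited neighbors. For a chordal graph the visit order
    is a PEO in the convention of Definition 2."""
    order, visited = [], set()
    weight = {v: 0 for v in adj}
    while len(order) < len(adj):
        v = max((u for u in adj if u not in visited), key=lambda u: weight[u])
        order.append(v)
        visited.add(v)
        for w in adj[v]:
            if w not in visited:
                weight[w] += 1
    return order

def is_peo(adj, order):
    """Check Definition 2: the earlier neighbors of each vertex form a clique."""
    pos = {v: i for i, v in enumerate(order)}
    for j, v in enumerate(order):
        earlier = [w for w in adj[v] if pos[w] < j]
        if any(b not in adj[a] for a in earlier for b in earlier if a != b):
            return False
    return True

def is_chordal(adj):
    # By Theorem 5 (and Tarjan-Yannakakis), chordal <=> MCS order is a PEO.
    return is_peo(adj, mcs_peo(adj))

# C_4 (an induced 4-cycle) is not chordal; adding a chord makes it chordal.
c4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(is_chordal(c4))            # False
c4_chord = {0: [1, 3, 2], 1: [0, 2], 2: [1, 3, 0], 3: [2, 0]}
print(is_chordal(c4_chord))      # True
```

If the graph is not chordal, no PEO exists, so the check fails regardless of which ordering MCS returns.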
In general a PEO of a chordal graph is not unique; however, it turns out that the sizes of the neighborhoods in the vertex elimination process are unique up to permutation, a fact that we will exploit later on. The next lemma makes this claim precise. For brevity, we defer its proof to Appendix A.
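The uniqueness claim can be verified by brute force on small examples: enumerate all vertex orderings, keep those satisfying Definition 2, and compare the sorted profiles of the earlier-neighborhood sizes. A sketch on a 5-vertex chordal graph of our own choosing (a triangle with a pendant vertex and an isolated vertex; not an example from the paper):

```python
from itertools import permutations

def earlier_neighbor_counts(adj, order):
    """Return the sizes c_j = |N(v_j) ∩ {v_1, ..., v_{j-1}}| for `order`,
    or None if `order` is not a PEO (some earlier neighborhood of a
    vertex fails to be a clique)."""
    pos = {v: i for i, v in enumerate(order)}
    counts = []
    for j, v in enumerate(order):
        earlier = [w for w in adj[v] if pos[w] < j]
        if any(b not in adj[a] for a in earlier for b in earlier if a != b):
            return None
        counts.append(len(earlier))
    return counts

# Triangle {0,1,2}, pendant vertex 3 attached to 2, isolated vertex 4.
adj = {0: [1, 2], 1: [0, 2], 2: [1, 0, 3], 3: [2], 4: []}

profiles = set()
for order in permutations(adj):
    c = earlier_neighbor_counts(adj, order)
    if c is not None:              # `order` is a PEO
        profiles.add(tuple(sorted(c)))

print(profiles)   # {(0, 0, 1, 1, 2)}: every PEO gives the same profile
```

Note that the profile records cc(G) = 2 zeros and sums to e(G) = 4, consistent with the identities used in the proof of the lemma below.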
Lemma 1.
Let $\{v_1, \ldots, v_N\}$ and $\{v'_1, \ldots, v'_N\}$ be two PEOs of a chordal graph $G$. Let $c_j$ and $c'_j$ denote the cardinalities of $N_G(v_j) \cap \{v_1, \ldots, v_{j-1}\}$ and $N_G(v'_j) \cap \{v'_1, \ldots, v'_{j-1}\}$, respectively. Then there is a bijection $\sigma: [N] \to [N]$ such that $c_{\sigma(j)} = c'_j$ for all $j$.

Recall that $s(K_i, G)$ denotes the number of cliques of size $i$ in $G$. For any chordal graph $G$, it turns out that the number of connected components can be expressed as an alternating sum of clique counts (cf. e.g., [56, Exercise 5.3.22, p. 231]); see Fig. 3 for an example. Instead of the topological proof involving properties of the clique simplex of chordal graphs [14, 44], in the next lemma we provide a combinatorial proof together with a sandwich bound. The main purpose of this exposition is to explain how to enumerate cliques in chordal graphs using vertex elimination, which plays a key role in analyzing the statistical estimator developed in the next subsection.

Lemma 2.
For any chordal graph $G$,
$$cc(G) = \sum_{i \ge 1} (-1)^{i+1} s(K_i, G). \qquad (5)$$
Furthermore, for any $r \ge 1$,
$$\sum_{i=1}^{2r} (-1)^{i+1} s(K_i, G) \;\le\; cc(G) \;\le\; \sum_{i=1}^{2r-1} (-1)^{i+1} s(K_i, G). \qquad (6)$$

Proof.
Since $G$ is chordal, by Theorem 5 it has a PEO $\{v_1, \ldots, v_N\}$. Define
$$C_j \triangleq N_G(v_j) \cap \{v_1, \ldots, v_{j-1}\}, \qquad c_j \triangleq |C_j|. \qquad (7)$$
Since the neighbors of $v_j$ among $v_1, \ldots, v_{j-1}$ form a clique, we obtain $\binom{c_j}{i-1}$ new cliques of size $i$ when we adjoin the vertex $v_j$ to the subgraph induced by $v_1, \ldots, v_{j-1}$. Thus,
$$s(K_i, G) = \sum_{j=1}^N \binom{c_j}{i-1}. \qquad (8)$$
Moreover, note that $cc(G) = \sum_{j=1}^N \mathbf{1}\{c_j = 0\}$. Hence, it follows that
$$\sum_{i=1}^{2r-1} (-1)^{i+1} s(K_i, G) = \sum_{j=1}^N \sum_{i=0}^{2r-2} (-1)^{i} \binom{c_j}{i} = \sum_{j=1}^N \left( \binom{c_j - 1}{2r-2} \mathbf{1}\{c_j \neq 0\} + \mathbf{1}\{c_j = 0\} \right) \ge \sum_{j=1}^N \mathbf{1}\{c_j = 0\} = cc(G),$$
and, similarly,
$$\sum_{i=1}^{2r} (-1)^{i+1} s(K_i, G) = \sum_{j=1}^N \sum_{i=0}^{2r-1} (-1)^{i} \binom{c_j}{i} = \sum_{j=1}^N \left( -\binom{c_j - 1}{2r-1} \mathbf{1}\{c_j \neq 0\} + \mathbf{1}\{c_j = 0\} \right) \le \sum_{j=1}^N \mathbf{1}\{c_j = 0\} = cc(G).$$

In this subsection, we consider unbiased estimation of the number of connected components in chordal graphs. As we will see, unbiased estimators turn out to be minimax rate-optimal for chordal graphs with bounded clique size. The subgraph count identity (5) suggests the following unbiased estimator:
$$\widehat{cc} = -\sum_{i \ge 1} \left( -\frac{1}{p} \right)^i s(K_i, \widetilde{G}). \qquad (9)$$
Indeed, since the probability of observing any given clique of size $i$ is $p^i$, (9) is clearly unbiased, in the same spirit as the Horvitz-Thompson estimator [33].
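As a sanity check, the identity (5) and the sandwich bound (6) can be verified by brute force on a small chordal graph. The snippet below is our illustration (not the paper's code; the example graph is arbitrary):

```python
from itertools import combinations

def cliques_by_size(adj):
    # Brute-force s(K_i, G): count vertex subsets that induce cliques.
    counts, verts = {}, list(adj)
    for i in range(1, len(verts) + 1):
        c = sum(all(v in adj[u] for u, v in combinations(S, 2))
                for S in combinations(verts, i))
        if c:
            counts[i] = c
    return counts

def cc(adj):
    # Number of connected components, via depth-first search.
    seen, n = set(), 0
    for v in adj:
        if v not in seen:
            n += 1
            stack = [v]
            while stack:
                u = stack.pop()
                if u not in seen:
                    seen.add(u)
                    stack.extend(adj[u])
    return n

# Chordal example: a triangle with a pendant vertex, plus an isolated vertex.
G = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}, 5: set()}
s = cliques_by_size(G)                      # {1: 5, 2: 4, 3: 1}
alt = sum((-1) ** (i + 1) * c for i, c in s.items())
assert alt == cc(G) == 2                    # identity (5): 5 - 4 + 1 = 2
# Sandwich bound (6) with r = 1: two terms undercount, one term overcounts.
assert sum((-1) ** (i + 1) * s.get(i, 0) for i in (1, 2)) <= cc(G) <= s.get(1, 0)
```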
In the case where the parent graph $G$ is a forest, (9) reduces to the estimator $\widehat{cc} = v(\widetilde{G})/p - e(\widetilde{G})/p^2$, as proposed by Frank [21].

A few comments about the estimator (9) are in order. First, it is completely adaptive to the parameters $\omega$, $d$ and $N$, since the sum in (9) terminates at the clique number of the subsampled graph. Second, it can be evaluated in time linear in $v(\widetilde{G}) + e(\widetilde{G})$. Indeed, the next lemma gives a simple formula for computing (9) using a PEO. Since a PEO of a chordal graph $G$ can be found in $O(v(G) + e(G))$ time [50, 54] and any induced subgraph of a chordal graph remains chordal, the estimator (9) can be evaluated in linear time. Recall that $q = 1 - p$.

Lemma 3.
Let $\{\widetilde{v}_1, \ldots, \widetilde{v}_m\}$, $m = |S|$, be a PEO of $\widetilde{G}$. Then
$$\widehat{cc} = \frac{1}{p} \sum_{j=1}^m \left( -\frac{q}{p} \right)^{\widetilde{c}_j}, \qquad (10)$$
where $\widetilde{c}_j \triangleq |N_{\widetilde{G}}(\widetilde{v}_j) \cap \{\widetilde{v}_1, \ldots, \widetilde{v}_{j-1}\}|$ can be calculated from $\widetilde{G}$ in linear time.

Proof. Because the subsampled graph $\widetilde{G}$ is also chordal, by (8) we have $s(K_i, \widetilde{G}) = \sum_{j=1}^m \binom{\widetilde{c}_j}{i-1}$. (The algorithm in [54] is implemented in R by the max_cardinality() function in the package igraph.) Hence,
$$\widehat{cc} = -\sum_{i=1}^m \left( -\frac{1}{p} \right)^i s(K_i, \widetilde{G}) = -\sum_{j=1}^m \sum_{i=1}^m \left( -\frac{1}{p} \right)^i \binom{\widetilde{c}_j}{i-1} = \frac{1}{p} \sum_{j=1}^m \sum_{i=0}^{m-1} \left( -\frac{1}{p} \right)^i \binom{\widetilde{c}_j}{i} = \frac{1}{p} \sum_{j=1}^m \left( -\frac{q}{p} \right)^{\widetilde{c}_j}.$$

In addition to the aforementioned computational advantages of (10) over (9), let us also describe why (10) is more numerically stable. Both estimators are alternating sums of the form $\sum_i a_i (-1/p)^{b_i}$. In (10), $a_i = q^{b_i}/p$, whereas in (9), $a_i = -s(K_{b_i}, \widetilde{G})$, which can be as large as $O(N^\omega)$ in magnitude. Thus, when $\widetilde{G}$ is sufficiently dense, computing (9) involves adding and subtracting extremely large numbers, making it prone to overflow and loss of numerical precision. For example, double-precision floating-point arithmetic (e.g., as used in R) carries only 15 to 17 significant decimal digits of precision.
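The equivalence of (10) and (9) is easy to check numerically. The sketch below is our illustration (the sampled graph and helper names are ours); it computes both forms on a small chordal graph, obtaining the PEO by reverse maximum cardinality search:

```python
from itertools import combinations

def peo(adj):
    # Reverse maximum cardinality search: a valid PEO when adj is chordal.
    weight, out, left = {v: 0 for v in adj}, [], set(adj)
    while left:
        v = max(left, key=lambda u: weight[u])
        left.remove(v)
        out.append(v)
        for u in adj[v]:
            if u in left:
                weight[u] += 1
    return out[::-1]

def cc_hat_peo(adj, p):
    # Formula (10): (1/p) * sum_j (-q/p)^{c_j}, where c_j is the number of
    # earlier neighbors of v_j in a PEO of the sampled graph.
    order = peo(adj)
    pos = {v: j for j, v in enumerate(order)}
    q = 1 - p
    return sum((-q / p) ** sum(pos[u] < pos[v] for u in adj[v])
               for v in order) / p

def cc_hat_cliques(adj, p):
    # The alternating clique-count form (9), via brute-force clique counting.
    verts, total = list(adj), 0.0
    for i in range(1, len(verts) + 1):
        s_i = sum(all(y in adj[x] for x, y in combinations(S, 2))
                  for S in combinations(verts, i))
        total -= (-1 / p) ** i * s_i
    return total

sampled = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5}, 5: {4}}  # K3 + K2
assert abs(cc_hat_peo(sampled, 0.3) - cc_hat_cliques(sampled, 0.3)) < 1e-9
```

Note that (10) touches each vertex and edge once, whereas the brute-force form enumerates subsets; the two agree to floating-point precision.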
In our experience, this precision tends to be insufficient for mid-sized, real-world networks (see Appendix C), and the estimator (9) can output wildly imprecise numbers.

Using elementary enumerative combinatorics, in particular the vertex elimination structure of chordal graphs, the next theorem provides a performance guarantee for the estimator (9) in terms of a variance bound and a high-probability bound, which, in particular, settles the upper bound on the minimax mean squared error in Theorem 1 and Theorem 2.

Theorem 6.
Let $G$ be a chordal graph on $N$ vertices with maximum degree and clique number at most $d$ and $\omega \ge 2$, respectively. Suppose $\widetilde{G}$ is generated by the Bern($p$) sampling model. Then $\widehat{cc}$ defined in (9) is an unbiased estimator of $cc(G)$. Furthermore,
$$\mathrm{Var}[\widehat{cc}] \le N \left( \frac{q}{p} + d \right) \left( \left( \frac{q}{p} \right)^{\omega-1} \vee \frac{q}{p} \right) \le \frac{N}{p^{\omega}} + \frac{Nd}{p^{\omega-1}}, \qquad (11)$$
and for all $t \ge 0$,
$$\mathbb{P}[|\widehat{cc} - cc(G)| \ge t] \le 2 \exp\left\{ -\frac{p^{\omega} t^2}{2(d\omega + 1)(N + t/3)} \right\}. \qquad (12)$$

To prove Theorem 6 we start by presenting a useful lemma. Note that Lemma 3 states that $\widehat{cc}$ is a linear combination of $(-q/p)^{\widetilde{c}_j}$; here $\widetilde{c}_j$ is computed using a PEO of the sampled graph, which is itself random. The next result allows us to rewrite the same estimator as a linear combination of $(-q/p)^{\widehat{c}_j}$, where $\widehat{c}_j$ depends on the PEO of the parent graph (which is deterministic). Note that this is used only in the course of the analysis, since the population-level PEO is not observed. This representation is extremely useful in analyzing the performance of $\widehat{cc}$ and its biased variant in Section 4.2.2. More generally, we have the following result, which we prove in Appendix A.

Lemma 4.
Let $\{v_1, \ldots, v_N\}$ be a PEO of $G$ and let $\{\widetilde{v}_1, \ldots, \widetilde{v}_m\}$, $m = |S|$, be a PEO of $\widetilde{G}$. Furthermore, let $\widehat{c}_j = |N_{\widetilde{G}}(v_j) \cap \{v_1, \ldots, v_{j-1}\}|$ and $\widetilde{c}_j = |N_{\widetilde{G}}(\widetilde{v}_j) \cap \{\widetilde{v}_1, \ldots, \widetilde{v}_{j-1}\}|$. Let $\widehat{g} = \widehat{g}(\widetilde{G})$ be a linear estimator of the form
$$\widehat{g} = \sum_{j=1}^m g(\widetilde{c}_j). \qquad (13)$$
Then
$$\widehat{g} = \sum_{j=1}^N b_j\, g(\widehat{c}_j),$$
where $b_j \triangleq \mathbf{1}\{v_j \in S\}$.

We also need a couple of ancillary results, whose proofs are also given in Appendix A:
Lemma 5 (Orthogonality). Let
$$f(k) = \left( -\frac{q}{p} \right)^k, \qquad k \ge 0. \qquad (14)$$
Let $\{b_v : v \in V\}$ be independent Bern($p$) random variables. For any $S \subset V$, define $N_S = \sum_{v \in S} b_v$. Then $\mathbb{E}[f(N_S) f(N_T)] = \mathbf{1}\{S = T\} (q/p)^{|S|}$. In particular, $\mathbb{E}[f(N_S)] = 0$ for any $S \neq \emptyset$.

Lemma 6.
Let $\{v_1, \ldots, v_N\}$ be a PEO of a chordal graph $G$ on $N$ vertices with maximum degree and clique number at most $d$ and $\omega$, respectively. Let $C_j \triangleq N_G(v_j) \cap \{v_1, \ldots, v_{j-1}\}$. Then
$$|\{(i,j) : i \neq j,\ C_j = C_i \neq \emptyset\}| \le N(d-1). \qquad (15)$$
Furthermore, let
$$A_j = \{v_j\} \cup C_j. \qquad (16)$$
Then for each $j \in [N]$,
$$|\{i \in [N] : i \neq j,\ A_i \cap A_j \neq \emptyset\}| \le d\omega. \qquad (17)$$

Proof of Theorem 6.
For a chordal graph $G$ on $N$ vertices, let $\{v_1, \ldots, v_N\}$ be a PEO of $G$. Recall from (7) that $C_j$ denotes the set of neighbors of $v_j$ among $v_1, \ldots, v_{j-1}$ and that $c_j$ denotes its cardinality; that is,
$$c_j = |N_G(v_j) \cap \{v_1, \ldots, v_{j-1}\}| = \sum_{k=1}^{j-1} \mathbf{1}\{v_k \sim v_j\}.$$
As in Lemma 4, let $\widehat{c}_j$ denote the sample version, i.e.,
$$\widehat{c}_j \triangleq |N_{\widetilde{G}}(v_j) \cap \{v_1, \ldots, v_{j-1}\}| = b_j \sum_{k=1}^{j-1} b_k \mathbf{1}\{v_k \sim v_j\},$$
where $b_k \triangleq \mathbf{1}\{v_k \in S\}$ are i.i.d. Bern($p$). By Lemma 3 and Lemma 4, $\widehat{cc}$ can be written as
$$\widehat{cc} = \frac{1}{p} \sum_{j=1}^m f(\widetilde{c}_j) = \frac{1}{p} \sum_{j=1}^N b_j f(\widehat{c}_j), \qquad (18)$$
where $f$ is defined in (14). (In fact, the function $f(N_S) = (-q/p)^{N_S}$ is the unnormalized orthogonal basis for the binomial measure that is used in the analysis of Boolean functions [46, Definition 8.40].) (The bound in (15) is almost optimal, since the left-hand side is at least $N(d-2)$ when $G$ consists of $N/(d+1)$ copies of the star $S_d$.)

To show the variance bound (11), we note that
$$\mathrm{Var}[\widehat{cc}] = \frac{1}{p^2} \sum_{j=1}^N \mathrm{Var}[b_j f(\widehat{c}_j)] + \frac{1}{p^2} \sum_{j \neq i} \mathrm{Cov}[b_j f(\widehat{c}_j), b_i f(\widehat{c}_i)]. \qquad (19)$$
Note that $\widehat{c}_j \mid \{b_j = 1\} \sim \mathrm{Bin}(c_j, p)$. Using Lemma 5, it is straightforward to verify that
$$\mathrm{Var}[b_j f(\widehat{c}_j)] = \begin{cases} p\, (q/p)^{c_j} & \text{if } c_j > 0, \\ pq & \text{if } c_j = 0. \end{cases} \qquad (20)$$
Since $c_j \le \omega - 1$, it follows that
$$\mathrm{Var}[b_j f(\widehat{c}_j)] \le p \left( \left( \frac{q}{p} \right)^{\omega-1} \vee \frac{q}{p} \right). \qquad (21)$$
The covariance terms are less obvious to bound; but thanks to the orthogonality property in Lemma 5, many of them are zero or negative. Let $N_C \triangleq \sum_j b_j \mathbf{1}\{v_j \in C\}$. For any $j$, since $v_j \notin C_j$ by definition, applying Lemma 5 yields
$$\mathbb{E}[b_j f(\widehat{c}_j)] = p\, \mathbb{E}[f(N_{C_j})] = p\, \mathbf{1}\{C_j = \emptyset\}. \qquad (22)$$
Without loss of generality, assume $j < i$. By the definition of $C_j$, we have $v_i \notin C_j$. Next, we consider two cases separately.

Case I: $v_j \notin C_i$. If either $C_j$ or $C_i$ is nonempty, then (22) and Lemma 5 yield
$$\mathrm{Cov}[b_j f(\widehat{c}_j), b_i f(\widehat{c}_i)] = \mathbb{E}[b_i b_j f(\widehat{c}_j) f(\widehat{c}_i)] = p^2\, \mathbb{E}[f(N_{C_j}) f(N_{C_i})] = p^2\, \mathbf{1}\{C_j = C_i\} \left( \frac{q}{p} \right)^{c_j}.$$
If $C_j = C_i = \emptyset$, then $\mathrm{Cov}[b_j f(\widehat{c}_j), b_i f(\widehat{c}_i)] = \mathrm{Cov}[b_j, b_i] = 0$.

Case II: $v_j \in C_i$. Then $\mathbb{E}[b_i f(\widehat{c}_i)] = 0$ by (22). Using Lemma 5 again, we have
$$\mathrm{Cov}[b_j f(\widehat{c}_j), b_i f(\widehat{c}_i)] = p\, \mathbb{E}\left[ b_j \left( -\frac{q}{p} \right)^{b_j} \right] \mathbb{E}[f(N_{C_j}) f(N_{C_i \setminus \{v_j\}})] = -pq\, \mathbb{E}[f(N_{C_j}) f(N_{C_i \setminus \{v_j\}})] = -pq\, \mathbf{1}\{C_j = C_i \setminus \{v_j\}\} \left( \frac{q}{p} \right)^{c_j}.$$
To summarize, we have shown that
$$\mathrm{Cov}[b_j f(\widehat{c}_j), b_i f(\widehat{c}_i)] = \begin{cases} p^2\, (q/p)^{c_j} & \text{if } C_j = C_i \neq \emptyset, \\ -pq\, (q/p)^{c_j} & \text{if } C_j = C_i \setminus \{v_j\} \text{ and } v_j \in C_i, \\ 0 & \text{otherwise.} \end{cases}$$
Consequently,
$$\sum_{j \neq i} \mathrm{Cov}[b_j f(\widehat{c}_j), b_i f(\widehat{c}_i)] \le \sum_{j \neq i:\ C_j = C_i \neq \emptyset} p^2 \left( \frac{q}{p} \right)^{c_j} \overset{(15)}{\le} N(d-1)\, p^2 \left( \left( \frac{q}{p} \right)^{\omega-1} \vee \frac{q}{p} \right). \qquad (23)$$
Finally, combining (19), (21) and (23) yields the desired (11).

The high-probability bound (12) for $\widehat{cc}$ follows from the concentration inequality in Lemma 10 in Appendix A. To apply this result, note that $\widehat{cc}$ is a sum of dependent random variables,
$$\widehat{cc} = \sum_{j \in [N]} Y_j, \qquad (24)$$
where $Y_j = \frac{1}{p} b_j f(\widehat{c}_j)$ satisfies $\mathbb{E}[Y_j] = 0$ for $c_j > 0$ and $|Y_j| \le b \triangleq (1/p)^{\omega}$ almost surely. Also, $S \triangleq \sum_{j \in [N]} \mathrm{Var}[Y_j] \le N (1/p)^{\omega}$ by (20). To control the dependency between $\{Y_j\}_{j \in [N]}$, note that $\widehat{c}_j = b_j \sum_{k: v_k \in C_j} b_k$. Thus $Y_j$ only depends on $\{b_k : k \in A_j\}$, where $A_j = \{v_j\} \cup C_j$. Define a dependency graph $\Gamma$ with $V(\Gamma) = [N]$ and $E(\Gamma) = \{\{i,j\} : i \neq j,\ A_i \cap A_j \neq \emptyset\}$. Then $\Gamma$ has maximum degree bounded by $d\omega$, by Lemma 6.

Up to this point, we have only considered unbiased estimators of the number of connected components. If the sampling ratio $p$ is at least $1/2$, Theorem 6 implies that the variance satisfies $\mathrm{Var}[\widehat{cc}] \le N(d+1)$, regardless of the clique number $\omega$ of the parent graph. However, if the clique number $\omega$ grows with $N$, then for small sampling ratio $p$ the coefficients of the unbiased estimator (9) are as large as $p^{-\omega}$, which results in exponentially large variance. Therefore, in order to deal with graphs with large cliques, we must give up unbiasedness to achieve a better bias-variance tradeoff.
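Unbiasedness (the first claim of Theorem 6) can be verified exactly on a tiny parent graph by enumerating all $2^N$ sampling outcomes. This sketch is our illustration (helper names ours), using the clique-sum form (9) of the estimator:

```python
from itertools import combinations

def cc_hat(adj, p):
    # Unbiased estimator (9), in its alternating clique-count form.
    verts = list(adj)
    return -sum((-1 / p) ** i *
                sum(all(y in adj[x] for x, y in combinations(S, 2))
                    for S in combinations(verts, i))
                for i in range(1, len(verts) + 1))

def induced(adj, keep):
    return {v: adj[v] & keep for v in keep}

def exact_mean(adj, p):
    # E[cc_hat] computed exactly: sum over all 2^N sampling outcomes,
    # each weighted by p^|S| (1-p)^(N-|S|).
    verts, q = list(adj), 1 - p
    total = 0.0
    for k in range(len(verts) + 1):
        for S in combinations(verts, k):
            total += p ** k * q ** (len(verts) - k) * cc_hat(induced(adj, set(S)), p)
    return total

G = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5}, 5: {4}, 6: set()}  # K3 + K2 + K1
assert abs(exact_mean(G, 0.4) - 3) < 1e-8  # cc(G) = 3, for every p
```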
Using a technique known as smoothing, introduced in [47], we next modify the unbiased estimator to achieve a good bias-variance tradeoff. To this end, consider a discrete random variable $L \in \mathbb{N}$ independent of everything else. Define the following estimator by discarding those terms in (10) for which $\widetilde{c}_j$ exceeds $L$, and then averaging over the distribution of $L$:
$$\widehat{cc}_L \triangleq \mathbb{E}_L \left[ \frac{1}{p} \sum_{j=1}^m \left( -\frac{q}{p} \right)^{\widetilde{c}_j} \mathbf{1}\{\widetilde{c}_j \le L\} \right] = \frac{1}{p} \sum_{j=1}^m \left( -\frac{q}{p} \right)^{\widetilde{c}_j} \mathbb{P}[L \ge \widetilde{c}_j]. \qquad (25)$$
Effectively, smoothing acts as a soft truncation by introducing a tail probability that modulates the exponential growth of the original coefficients. The variance can then be bounded by the maximum magnitude of the coefficients in (25). Like (9), (25) can be computed in linear time.

The next theorem bounds the mean squared error of $\widehat{cc}_L$, which implies the minimax upper bound previously announced in Theorem 3. Its proof is somewhat technical and so we defer it to Appendix A.

Theorem 7. Let $L \sim$ Poisson($\lambda$) with $\lambda = \frac{p}{2-p} \log\left( \frac{Np^2}{d\omega} \right)$. If the maximum degree and clique number of $G$ are at most $d$ and $\omega$, respectively, then for $p < 1/2$,
$$\mathbb{E}_G |\widehat{cc}_L - cc(G)|^2 \le N^2 \left( \frac{N p^2}{d\omega} \right)^{-\frac{p}{2-p}}.$$

If the parent graph $G$ consists of a disjoint union of cliques, then so does the sampled graph $\widetilde{G}$. Counting the cliques in each connected component, we can rewrite the estimator (9) as
$$\widehat{cc} = \sum_{r \ge 1} \left( 1 - \left( -\frac{q}{p} \right)^r \right) \widetilde{cc}_r = cc(\widetilde{G}) - \sum_{r \ge 1} \left( -\frac{q}{p} \right)^r \widetilde{cc}_r, \qquad (26)$$
where $\widetilde{cc}_r$ is the number of components in the sampled graph $\widetilde{G}$ that have $r$ vertices. This coincides with the unbiased estimator proposed by Frank [21] for cliques, which is, in turn, based on the estimator of Goodman [27].
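The algebra behind (26) is a geometric series: within a single sampled clique on $r$ vertices, any PEO yields earlier-neighbor counts $0, 1, \ldots, r-1$, and the partial sum telescopes to $1 - (-q/p)^r$. A quick numerical check (our illustrative code):

```python
def clique_term_peo(r, p):
    # Contribution of one sampled r-clique to (10): a PEO inside the clique
    # gives earlier-neighbor counts 0, 1, ..., r-1.
    q = 1 - p
    return sum((-q / p) ** k for k in range(r)) / p

def clique_term_frank(r, p):
    # The corresponding term 1 - (-q/p)^r appearing in (26).
    q = 1 - p
    return 1 - (-q / p) ** r

assert all(abs(clique_term_peo(r, p) - clique_term_frank(r, p)) < 1e-8
           for r in range(1, 12) for p in (0.2, 0.5, 0.8))
```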
The following theorem, whose proof is given in Appendix A, provides an upper bound on its variance, recovering the previous result of [21, Corollary 11]:

Theorem 8.
Let $G$ be a disjoint union of cliques with clique number at most $\omega$. Then $\widehat{cc}$ is an unbiased estimator of $cc(G)$, and
$$\mathbb{E}_G |\widehat{cc} - cc(G)|^2 = \mathrm{Var}[\widehat{cc}] = \sum_{r=1}^N \left( \frac{q}{p} \right)^r cc_r \le N \left( \left( \frac{q}{p} \right)^{\omega} \vee \frac{q}{p} \right),$$
where $cc_r$ is the number of connected components in $G$ of size $r$.

Theorem 8 implies that as long as we sample at least half of the vertices, i.e., $p \ge 1/2$, for any $G$ consisting of disjoint cliques the unbiased estimator (26) satisfies $\mathbb{E}_G |\widehat{cc} - cc(G)|^2 \le N$, regardless of the clique size. However, if $p < 1/2$, the variance can be exponentially large in $N$.

Next, we use the smoothing technique again to obtain a biased estimator with near-optimal performance. To this end, consider a discrete random variable $L \in \mathbb{N}$ and define the following estimator by truncating (26) at the random location $L$ and averaging over its distribution:
$$\widetilde{cc}_L \triangleq cc(\widetilde{G}) - \mathbb{E}_L \left[ \sum_{r=1}^L \left( -\frac{q}{p} \right)^r \widetilde{cc}_r \right] = cc(\widetilde{G}) - \sum_{r \ge 1} \left( -\frac{q}{p} \right)^r \mathbb{P}[L \ge r]\, \widetilde{cc}_r. \qquad (27)$$
The following result, proved in Appendix A, bounds the mean squared error of $\widetilde{cc}_L$ and, consequently, bounds the minimax risk in Theorem 4. It turns out that the smoothed estimator (27) with appropriately chosen parameters is nearly optimal. In fact, Theorem 9, whose proof is given in Appendix A, gives an upper bound on the sample complexity (see Table 1), which, in view of [58, Theorem 4], is seen to be optimal.

Theorem 9.
Let $G$ be a disjoint union of cliques. Let $L \sim$ Pois($\lambda$) with $\lambda = \frac{p}{2-p} \log(N/2)$. If $p < 1/2$, then
$$\mathbb{E}_G |\widetilde{cc}_L - cc(G)|^2 \le N^2 (N/2)^{-\frac{p}{2-p}}.$$

Remark 1. Alternatively, we could specialize the estimator $\widehat{cc}_L$ in (25), which is designed for general chordal graphs, to the case when $G$ is a disjoint union of cliques; however, the analysis is less clean and the results are slightly weaker than Theorem 9.

A general graph can always be made chordal by adding edges. Such an operation is called a chordal completion or triangulation of a graph, henceforth denoted by TRI. There are many ways to triangulate a graph, and this is typically done with the goal of minimizing some objective function (e.g., the number of edges or the clique number). Without loss of generality, triangulation does not affect the number of connected components, since the operation can be applied to each component separately.

In view of the various estimators and their performance guarantees developed so far for chordal graphs, a natural question is how one might generalize them to non-chordal graphs. One heuristic is to first triangulate the subsampled graph and then apply an estimator, such as (10) or (25), that is designed for chordal graphs. If a triangulation operation commuted with subgraph sampling in distribution, then the modified estimator would inherit all the performance guarantees proved for chordal graphs; unfortunately, this does not hold in general. Thus, so far our theory does not readily extend to non-chordal graphs. Nevertheless, the empirical performance of this heuristic estimator is competitive with $\widehat{cc}$ in both accuracy (see Fig. 10) and computational efficiency. Indeed, there are polynomial-time algorithms that add at most $8k^2$ edges whenever at least $k$ edges must be added to make the graph chordal [45].
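Returning to the smoothed estimator (27): it can be sketched in a few lines (our illustrative implementation, not the paper's code; `comp_sizes` lists the component sizes of the sampled graph, and `lam` is the tuning parameter whose principled choice is prescribed in Theorem 9). The Poisson tail damps the geometrically growing coefficients $(-q/p)^r$:

```python
import math

def smoothed_cc(comp_sizes, p, lam):
    # Estimator (27): cc(G~) - sum_r (-q/p)^r * P[L >= r] * cc~_r, with
    # L ~ Poisson(lam).  The tail probability P[L >= r] softly truncates
    # the coefficients, which otherwise explode when p < 1/2 (q/p > 1).
    q = 1 - p
    est = float(len(comp_sizes))            # cc of the sampled graph
    for r in sorted(set(comp_sizes)):
        cc_r = comp_sizes.count(r)
        tail = 1 - sum(math.exp(-lam) * lam ** k / math.factorial(k)
                       for k in range(r))   # P[L >= r]
        est -= (-q / p) ** r * tail * cc_r
    return est

# As lam grows, the tails tend to 1 and (27) reduces to the unbiased (26);
# for this input, (26) evaluates to exactly 0.
assert abs(smoothed_cc([1, 2, 2], 0.4, 1000.0)) < 1e-9
```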
In view of the theoretical guarantees in Theorem 6, it is better to be conservative in adding edges, so that the maximal degree $d$ and the clique number $\omega$ are kept small.

It should be noted that blindly applying estimators designed for chordal graphs to the subsampled non-chordal graph, without triangulation, leads to nonsensical estimates. Thus, preprocessing the graph appears to be necessary for producing good results. We leave the task of rigorously establishing these heuristics for future work.

Next we give a general lower bound for estimating additive graph properties (e.g., the number of connected components or subgraph counts) under the Bernoulli sampling model. The proof uses the method of two fuzzy hypotheses [55, Theorem 2.15], which, in the context of estimating graph properties, entails constructing a pair of random graphs whose properties have different average values and whose subsampled versions are close in total variation; the latter is ensured by matching lower-order subgraph counts or by sampling certain configurations of their vertices. The utility of this result is that a pair of smaller graphs (which can be found in an ad hoc manner) can be used to construct a bigger pair of graphs on $N$ vertices and produce a lower bound that scales with $N$. The proof of Theorem 10 is furnished in Appendix A. By "commute in distribution" we mean the random graphs
$\mathrm{TRI}(\widetilde{G})$ and $\widetilde{\mathrm{TRI}(G)}$ have the same distribution; that is, the triangulated sampled graph is statistically identical to a sampled version of a triangulation of the parent graph. An implementation of graph triangulation in R is provided by the is_chordal() function in the package igraph [?].

Theorem 10. Let $f$ be a graph parameter that is invariant under isomorphisms and additive under disjoint union, i.e., $f(G + H) = f(G) + f(H)$ [41, p. 41]. Let $\mathcal{G}$ be a class of graphs with at most $N$ vertices. Let $m$ and $M = N/m$ be integers. Let $H$ and $H'$ be two graphs on $m$ vertices. Assume that any disjoint union of the form $G_1 + \cdots + G_M$ is in $\mathcal{G}$, where each $G_i$ is either $H$ or $H'$. Suppose $M \ge 2$ and $\mathrm{TV}(P, P') \le 1/2$, where $P$ (resp. $P'$) denotes the distribution of the isomorphism class of the sampled graph $\widetilde{H}$ (resp. $\widetilde{H}'$). Let $\widetilde{G}$ denote the sampled version of $G$ under the Bernoulli sampling model with probability $p$. Then
$$\inf_{\widehat{f}} \sup_{G \in \mathcal{G}} \mathbb{P}\left[ \big|\widehat{f}(\widetilde{G}) - f(G)\big| \ge \Delta \right] \ge 0.01, \qquad (28)$$
where
$$\Delta \triangleq \frac{|f(H) - f(H')|}{4} \left( \sqrt{\frac{N}{m\, \mathrm{TV}(P, P')}} \wedge \frac{N}{m} \right).$$

The application of Theorem 10 relies on the construction of a pair of small graphs $H$ and $H'$ whose sampled versions are close in total variation. To this end, we provide two schemes to bound $\mathrm{TV}(P_{\widetilde{H}}, P_{\widetilde{H}'})$ from above.

Since $cc(G)$ is invariant with respect to isomorphisms, it suffices to describe the sampled graph $\widetilde{G}$ up to isomorphism. It is well known that a graph $G$ can be determined up to isomorphism by its homomorphism numbers, which count the number of ways to embed a smaller graph in $G$. Among the various versions of graph homomorphism numbers (cf. [41, Sec. 5.2]), the one most relevant to the present paper is $s(H, G)$, which, as defined in Section 1, is the number of vertex-induced subgraphs of $G$ that are isomorphic to $H$. Specifically, the relevance of induced subgraph counts to the subgraph sampling model is two-fold:

• The list of vertex-induced subgraph counts $\{s(H, G) : v(H) \le N\}$ determines $G$ up to isomorphism and hence constitutes a sufficient statistic for $\widetilde{G}$. In fact, it is further sufficient to summarize $\widetilde{G}$ into the list of numbers $\{s(H, \widetilde{G}) : v(H) \le N,\ H \text{ is connected}\}$, since the count of any disconnected subgraph is a fixed polynomial in the connected subgraph counts. This is a well-known result in the theory of graph reconstruction [17, 36, 57]. For example, for any graph $G$, the number of induced copies of two isolated vertices satisfies
$$s(\overline{K}_2, G) = \binom{s(K_1, G)}{2} - s(K_2, G),$$
obtained by counting pairs of vertices in two ways; analogously, counting pairs of edges in two ways expresses $s(K_2 + K_2, G)$ as $\binom{s(K_2, G)}{2}$ minus a fixed linear combination of the counts $s(h, G)$ over connected graphs $h$ on at most four vertices. See [43, Section 2] for more examples.
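The first of these identities — the count of induced pairs of isolated vertices equals $\binom{v}{2} - e$ — can be checked by brute force (our illustrative code; the example graph is arbitrary):

```python
from itertools import combinations

def induced_pair_count(adj):
    # Direct count of s(2K_1, G): vertex pairs with no edge between them.
    return sum(1 for u, v in combinations(adj, 2) if v not in adj[u])

def from_connected_counts(adj):
    # The same quantity as a polynomial in connected subgraph counts:
    # C(s(K_1, G), 2) - s(K_2, G).
    v = len(adj)
    e = sum(len(nbrs) for nbrs in adj.values()) // 2
    return v * (v - 1) // 2 - e

G = {1: {2, 3}, 2: {1}, 3: {1}, 4: set()}   # a 3-vertex star plus an isolated vertex
assert induced_pair_count(G) == from_connected_counts(G) == 4
```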
(This statistic cannot be further reduced, because it is known that the connected subgraph counts do not fulfill any predetermined relations, in the sense that the closure of the range of their normalized versions — subgraph densities — has nonempty interior [17].)

• Under the Bernoulli sampling model, the probabilistic law of the isomorphism class of the sampled graph is a polynomial in the sampling ratio $p$, with coefficients given by the induced subgraph counts. Indeed, recall from (2) that
$$\mathbb{P}[\widetilde{G} \simeq H] = s(H, G)\, p^{v(H)} (1-p)^{v(G) - v(H)}.$$
Therefore two graphs with matching subgraph counts for all (connected) graphs on $n$ vertices are statistically indistinguishable unless more than $n$ vertices are sampled.

We begin with a refinement of the classical result that disconnected subgraph counts are fixed polynomials in connected subgraph counts. Below we provide a more quantitative version, showing that only those connected subgraphs with no more vertices than the disconnected subgraph are involved. The proofs of the next set of results are given in Appendix A.

Lemma 7.
Let $H$ be a disconnected graph on $v$ vertices. Then for any $G$, $s(H, G)$ can be expressed as a polynomial, independent of $G$, in $\{s(g, G) : g \text{ is connected and } v(g) \le v\}$.

Corollary 1.
Suppose $H$ and $H'$ are two graphs such that $s(h, H) = s(h, H')$ for all connected $h$ with $v(h) \le v$. Then $s(h, H) = s(h, H')$ for all $h$ with $v(h) \le v$.

Lemma 8.
Let $H$ and $H'$ be two graphs on $m$ vertices. If
$$s(h, H) = s(h, H') \qquad (29)$$
for all connected graphs $h$ with at most $k$ vertices, where $k \in [m]$, then
$$\mathrm{TV}(P_{\widetilde{H}}, P_{\widetilde{H}'}) \le \mathbb{P}[\mathrm{Bin}(m, p) \ge k+1] \le \binom{m}{k+1} p^{k+1}. \qquad (30)$$
Furthermore, if $p \le (k+1)/m$, then
$$\mathrm{TV}(P_{\widetilde{H}}, P_{\widetilde{H}'}) \le \exp\left\{ -\frac{(k+1-pm)^2}{2m} \right\}. \qquad (31)$$

In Fig. 6, we give an example of two graphs $H$ and $H'$ on 8 vertices that have matching counts of connected subgraphs with at most 4 vertices. By Corollary 1, they then also have matching counts of all subgraphs with at most 4 vertices, and if $p \le 5/8$, then Lemma 8 gives $\mathrm{TV}(P_{\widetilde{H}}, P_{\widetilde{H}'}) \le e^{-(5-8p)^2/16}$.

It is well known that for any probability distributions $P$ and $P'$, the total variation is given by $\mathrm{TV}(P, P') = \inf \mathbb{P}[X \neq X']$, where the infimum is over all couplings, i.e., joint distributions of $X$ and $X'$ that are marginally distributed as $P$ and $P'$, respectively. There is a natural coupling between the sampled graphs $\widetilde{H}$ and $\widetilde{H}'$ obtained by defining the parent graphs $H$ and $H'$ on the same set of labelled vertices. In some of the applications of Theorem 10, the constructions of $H$ and $H'$ are such that if certain configurations of the vertices are included or excluded in the sample, the resulting graphs are isomorphic. This property allows us to bound the total variation between the sampled graphs as follows.

Lemma 9.
Let $H$ and $H'$ be graphs defined on the same set of vertices $V$. Let $U$ be a subset of $V$ and suppose that for any $u \in U$ we have $H[V \setminus \{u\}] \simeq H'[V \setminus \{u\}]$. Then the total variation $\mathrm{TV}(P_{\widetilde{H}}, P_{\widetilde{H}'})$ can be bounded by the probability that every vertex in $U$ is sampled, viz.,
$$\mathrm{TV}(P_{\widetilde{H}}, P_{\widetilde{H}'}) \le 1 - \mathbb{P}\big[ \widetilde{H} \simeq \widetilde{H}' \big] \le p^{|U|}.$$
If, in addition, $H[U] \simeq H'[U]$, then the total variation can be bounded by the probability that every vertex in $U$ is sampled and at least one vertex in $V \setminus U$ is sampled, viz.,
$$\mathrm{TV}(P_{\widetilde{H}}, P_{\widetilde{H}'}) \le p^{|U|} \left( 1 - (1-p)^{|V| - |U|} \right).$$

In Fig. 4, we give an example of two graphs $H$ and $H'$ satisfying the assumptions of Lemma 9. In this example, $|U| = 2$ and $|V| = 8$. If either of the vertices in $U$ is removed along with all its incident edges, the resulting graphs are isomorphic. Also, since $H[U] \simeq H'[U]$, Lemma 9 implies that $\mathrm{TV}(P_{\widetilde{H}}, P_{\widetilde{H}'}) \le p^2 (1 - (1-p)^6)$.

Figure 4: (a) The graph $H$. (b) The graph $H'$. (c) The resulting graph when $u_1$ is sampled but not $u_2$. Example where $U = \{u_1, u_2\}$ is an edge. If either of these vertices is not sampled and all incident edges are removed, the resulting graphs are isomorphic.

In the remainder of the section, we apply Theorem 10, Lemma 8, and Lemma 9 to derive lower bounds on the minimax risk for graphs that contain cycles and for general chordal graphs, respectively. The main task is to handcraft a pair of graphs $H$ and $H'$ that either have matching counts of small subgraphs or for which certain configurations of their vertices induce isomorphic subgraphs.

Theorem 11 (Chordal graphs).
Let $\mathcal{G}(N, d, \omega)$ denote the collection of all chordal graphs on $N$ vertices with maximum degree and clique number at most $d$ and $\omega \ge 2$, respectively. Assume that $p < 2^{-\omega}$. Then
$$\inf_{\widehat{cc}} \sup_{G \in \mathcal{G}(N,d,\omega)} \mathbb{E}_G |\widehat{cc} - cc(G)|^2 = \Theta_\omega \left( \left( \frac{N}{p^{\omega}} \vee \frac{Nd}{p^{\omega-1}} \right) \wedge N^2 \right).$$

Proof.
There are two different constructions, according to whether $d \ge \omega$ or $d < \omega$.

Case I: $d \ge \omega$. For every $\omega \ge 2$ and $m \in \mathbb{N}$, we construct a pair of graphs $H$ and $H'$ such that
$$v(H) = v(H') = \omega - 1 + m 2^{\omega-2}, \qquad (32)$$
$$d_{\max}(H) = d_{\max}(H') = m 2^{\omega-3} + \omega - 2, \quad \omega \ge 3, \qquad (33)$$
$$d_{\max}(H) = 0, \quad d_{\max}(H') = m, \quad \omega = 2, \qquad (34)$$
$$cc(H) = m + 1, \quad cc(H') = 1, \qquad (35)$$
$$|s(K_\omega, H) - s(K_\omega, H')| = m. \qquad (36)$$
Fix a set $U$ of $\omega - 1$ vertices that forms a clique. We first construct $H$. For every subset $S \subset U$ such that $|S|$ is even, let $V_S$ be a set of $m$ distinct vertices such that the neighborhood of every $v \in V_S$ is given by $\partial v = S$. Let the vertex set $V(H)$ be the union of $U$ and all $V_S$ with $|S|$ even. In particular, because of the presence of $S = \emptyset$, $H$ always has exactly $m$ isolated vertices (unless $\omega = 2$, in which case $H$ consists of $m + 1$ isolated vertices). Repeat the same construction for $H'$ with $|S|$ odd. Then both $H$ and $H'$ are chordal and have the same number of vertices, as in (32), since
$$v(H) = \omega - 1 + m \sum_{0 \le i \le \omega-1,\ i \text{ even}} \binom{\omega-1}{i} = v(H') = \omega - 1 + m \sum_{0 \le i \le \omega-1,\ i \text{ odd}} \binom{\omega-1}{i},$$
which follows from the binomial summation formula. Similarly, (33)-(36) can be readily verified. We also have
$$s(K_i, H) = \binom{\omega-1}{i} + m \sum_{0 \le j \le \omega-1,\ j \text{ even}} \binom{\omega-1}{j} \binom{j}{i-1} = s(K_i, H') = \binom{\omega-1}{i} + m \sum_{0 \le j \le \omega-1,\ j \text{ odd}} \binom{\omega-1}{j} \binom{j}{i-1} = \binom{\omega-1}{i} + m \binom{\omega-1}{i-1} 2^{\omega-1-i},$$
for $i = 1, 2, \ldots, \omega - 1$. This follows from the facts that $\sum_{0 \le j \le \omega-1} (-1)^j \binom{\omega-1}{j} \binom{j}{i-1} = 0$ and $\sum_{0 \le j \le \omega-1} \binom{\omega-1}{j} \binom{j}{i-1} = \binom{\omega-1}{i-1} 2^{\omega-i}$.

To compute the total variation distance between the sampled graphs, we first assume that $H$ and $H'$ are defined on the same set of labelled vertices $V$. The key observation is the following: by construction, $H[U] \simeq H'[U]$ (since $U$ induces a clique) and, furthermore, failing to sample any vertex of $U$ results in isomorphic graphs, i.e., $H[V \setminus \{u\}] \simeq H'[V \setminus \{u\}]$ for any $u \in U$. Indeed, the structure of the induced subgraph $H[V \setminus \{u\}]$ can be described as follows. First, let $U \setminus \{u\}$ form a clique. Next, for every nonempty subset $S \subset U \setminus \{u\}$, attach a set of $m$ distinct vertices (denoted by $V_S$) so that the neighborhood of every $v \in V_S$ is given by $\partial v = S$. Finally, add $m$ isolated vertices. See Fig. 4 ($\omega = 3$) and Fig. 5 ($\omega = 4$) for illustrations of this property and of the iterative nature of the construction, in the sense that the construction of $H$ (resp. $H'$) for $\omega = k + 1$ can be obtained from the construction of $H$ (resp. $H'$) for $\omega = k$ by adding another vertex $u$ to $U$ with $\partial u = U$ and then adjoining $m$ distinct vertices to every even (resp. odd) cardinality set $S \subset U$ containing $u$.

Thus, by Lemma 9, $\mathrm{TV}(P_{\widetilde{H}}, P_{\widetilde{H}'}) \le p^{|U|} \big( 1 - (1-p)^{|V| - |U|} \big) = p^{\omega-1} \big( 1 - (1-p)^{m 2^{\omega-2}} \big)$. According to (33), we choose $m = \lfloor (d - \omega + 2) 2^{-\omega+3} \rfloor \ge (d - \omega + 2) 2^{-\omega+2}$ if $\omega \ge 3$ and $m = d$ if $\omega = 2$. Then we have
$$\mathrm{TV}(P_{\widetilde{H}}, P_{\widetilde{H}'}) \le p^{\omega-1} \left( 1 - (1-p)^{2d} \right) \le p^{\omega-1} (2pd \wedge 1).$$
The condition on $p$ ensures that $\mathrm{TV}(P_{\widetilde{H}}, P_{\widetilde{H}'}) \le 1/2$. Applying Theorem 10 together with (35) and (36) then yields
$$\inf_{\widehat{cc}} \sup_{G \in \mathcal{G}(N,d,\omega)} \mathbb{E}_G |\widehat{cc} - cc(G)|^2 = \Theta_\omega \left( \left( \frac{N}{p^{\omega}} \vee \frac{Nd}{p^{\omega-1}} \right) \wedge N^2 \right),$$
provided $d \ge \omega$.

Case II: $d \le \omega$.
In this case, the previous construction is no longer feasible and we must construct another pair of graphs with a smaller maximum degree. To this end, we consider graphs $H$ and $H'$ consisting of disjoint cliques of size at most $\omega\ge 2$, such that

  $v(H)=v(H')=\omega 2^{\omega-2},\quad d_{\max}(H)=d_{\max}(H')=\omega-1,\quad |cc(H)-cc(H')|=1$. (37)

[Figure 5: Example for $\omega=4$ and $m=3$, where $U=\{u_1,u_2,u_3\}$ forms a triangle: (a) the graph $H$; (b) the graph $H'$; (c) the resulting graph when $u_1$ and $u_2$ are sampled but not $u_3$. If any one or two (as shown in the figure) of these vertices are not sampled and all incident edges are removed, the resulting graphs are isomorphic.]

If $\omega$ is odd, we set

  $H=\binom{\omega}{\omega}K_{\omega}+\binom{\omega}{\omega-2}K_{\omega-2}+\cdots+\binom{\omega}{3}K_{3}+\binom{\omega}{1}K_{1}$,
  $H'=\binom{\omega}{\omega-1}K_{\omega-1}+\binom{\omega}{\omega-3}K_{\omega-3}+\cdots+\binom{\omega}{4}K_{4}+\binom{\omega}{2}K_{2}$. (38)

If $\omega$ is even, we set

  $H=\binom{\omega}{\omega}K_{\omega}+\binom{\omega}{\omega-2}K_{\omega-2}+\cdots+\binom{\omega}{4}K_{4}+\binom{\omega}{2}K_{2}$,
  $H'=\binom{\omega}{\omega-1}K_{\omega-1}+\binom{\omega}{\omega-3}K_{\omega-3}+\cdots+\binom{\omega}{3}K_{3}+\binom{\omega}{1}K_{1}$. (39)

For example, for $\omega=3$, (38) becomes $H=K_3+3K_1$ and $H'=3K_2$; for $\omega=4$, (39) becomes $H=K_4+6K_2$ and $H'=4K_3+4K_1$.

Next we verify that $H$ and $H'$ have matching subgraph counts. Indeed, for $i=1,2,\ldots,\omega-1$,

  $s(K_i,H)-s(K_i,H')=\sum_{k=i}^{\omega}(-1)^{\omega-k}\binom{\omega}{k}\binom{k}{i}=0$ and $s(K_i,H)+s(K_i,H')=\sum_{k=i}^{\omega}\binom{\omega}{k}\binom{k}{i}=2^{\omega-i}\binom{\omega}{i}$.

Hence $H$ and $H'$ contain matching numbers of cliques up to size $\omega-1$. Note that the only connected induced subgraphs of $H$ and $H'$ with at most $\omega-1$ vertices are cliques. Hence by Lemma 8, $\mathrm{TV}(P_{\tilde H},P_{\tilde H'})\le\binom{\omega 2^{\omega-2}}{\omega}p^{\omega}$, and together with Theorem 10 and (37), we have

  $\inf_{\widehat{cc}}\sup_{G\in\mathcal{G}(N,d,\omega)}E_G|\widehat{cc}-cc(G)|^2\ge\Omega_\omega\Big(\frac{N}{p^{\omega}}\wedge N^2\Big)=\Theta_\omega\Big(\Big(\frac{N}{p^{\omega}}\vee\frac{Nd}{p^{\omega-1}}\Big)\wedge N^2\Big)$,

where the last equality follows from the current assumption that $d\le\omega$. The condition on $p$ ensures that $\mathrm{TV}(P_{\tilde H},P_{\tilde H'})<1/300$.

Acknowledgment
The authors are grateful to Ben Rossman for fruitful discussions and Richard Stanley for helpful comments on (5).
A Additional proofs
In this appendix, we give the proofs of Lemma 1, Lemma 4, Lemma 5, Lemma 6, Theorem 7, Theorem 8, Theorem 9, Theorem 10, Lemma 7, and Lemma 8. We also state the concentration inequality (Lemma 10) that was used in the proof of Theorem 6.
Proof of Lemma 1.
By [56, Theorem 5.3.26], the chromatic polynomial of $G$ is

  $\chi(G;x)=(x-c_1)\cdots(x-c_N)=(x-c'_1)\cdots(x-c'_N)$.

The conclusion follows from the uniqueness of the chromatic polynomial (and its roots).
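The following minimal sketch (our illustration, not from the paper; graph, ordering, and helper names are ours) makes the factorization concrete. With the PEO convention $c_j=|N_G(v_j)\cap\{v_1,\ldots,v_{j-1}\}|$, the chromatic polynomial of a chordal graph factors as $\prod_{j}(x-c_j)$; since the multiplicity of the root $0$ of any chromatic polynomial equals $cc(G)$, we get $cc(G)=\#\{j: c_j=0\}$:

```python
from itertools import combinations

def is_peo(order, adj):
    """Check that the earlier neighbors of each vertex form a clique."""
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = [u for u in adj[v] if pos[u] < pos[v]]
        if any(b not in adj[a] for a, b in combinations(earlier, 2)):
            return False
    return True

def peo_deficiencies(order, adj):
    """c_j = number of neighbors of v_j that appear earlier in the ordering."""
    pos = {v: i for i, v in enumerate(order)}
    return [sum(1 for u in adj[v] if pos[u] < pos[v]) for v in order]

def num_components(adj):
    """Count connected components by depth-first search."""
    seen, cc = set(), 0
    for s in adj:
        if s not in seen:
            cc += 1
            stack = [s]
            while stack:
                v = stack.pop()
                if v not in seen:
                    seen.add(v)
                    stack.extend(adj[v])
    return cc

# chordal graph: triangle {0,1,2} with pendant vertex 3, plus isolated vertex 4
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}, 4: set()}
order = [0, 1, 2, 3, 4]          # a perfect elimination ordering
assert is_peo(order, adj)
c = peo_deficiencies(order, adj)  # deficiencies [0, 1, 2, 1, 0]
# multiplicity of the root 0 of prod_j (x - c_j) equals cc(G)
assert c.count(0) == num_components(adj)
```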
Proof of Lemma 4.
Note that $\{v_1,\ldots,v_N\}$ is also a PEO of $\tilde G$ and hence by Lemma 1, there is a bijection between $\{\tilde c_j: j\in[m]\}$ and $\{\hat c_j: j\in[N],\ b_j=1\}$. Therefore

  $\widehat g=\sum_{j=1}^{m}g(\tilde c_j)=\sum_{j=1}^{N}b_j\,g(\hat c_j)$.

Proof of Lemma 5.
Note that $N_S+N_T=N_{S\setminus T}+N_{T\setminus S}+2N_{S\cap T}$, where $N_{S\setminus T}$, $N_{T\setminus S}$, and $N_{S\cap T}$ are independent binomially distributed random variables. By independence, we have

  $E[f(N_S)f(N_T)]=E\Big[\Big(-\frac{q}{p}\Big)^{N_S+N_T}\Big]=E\Big[\Big(-\frac{q}{p}\Big)^{N_{S\setminus T}}\Big]E\Big[\Big(-\frac{q}{p}\Big)^{N_{T\setminus S}}\Big]E\Big[\Big(\frac{q}{p}\Big)^{2N_{S\cap T}}\Big]$.

Finally, note that if $S\ne T$, then at least one of $E[(-q/p)^{N_{S\setminus T}}]$ or $E[(-q/p)^{N_{T\setminus S}}]$ is zero. If $S=T$, we have

  $E[f(N_S)^2]=E\Big[\Big(\frac{q}{p}\Big)^{2N_S}\Big]=\Big(\frac{q}{p}\Big)^{|S|}$.
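The two identities can be checked exactly by summing against the binomial pmf. The snippet below is our illustration of the lemma's setting, with $N_S\sim\mathrm{Bin}(|S|,p)$ and $f(x)=(-q/p)^{x}$:

```python
from math import comb

def expect(g, n, p):
    """E[g(X)] for X ~ Bin(n, p), computed exactly from the pmf."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k) * g(k) for k in range(n + 1))

p = 0.3
q = 1 - p
f = lambda k: (-q / p) ** k

# E[f(N_S)] = 0 for |S| >= 1, since the generating function (q + p*(-q/p))^n = 0
for n in range(1, 6):
    assert abs(expect(f, n, p)) < 1e-12

# E[f(N_S)^2] = (q/p)^{|S|}
for n in range(6):
    assert abs(expect(lambda k: f(k) ** 2, n, p) - (q / p) ** n) < 1e-9
```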
Let c j = | C j | . To prove (15), we will show that for any fixed j , |{ i ∈ [ N ] : i (cid:54) = j, C j = C i (cid:54) = ∅}| ≤ d − c j ≤ d − . By definition of the PEO, | N G ( v ) | ≥ c j for all v ∈ C j . For any i ∈ [ N ] such that C j = C i (cid:54) = ∅ , v i ∈ N G ( v ) for all v ∈ C j . Also, the fact that C j = C i (cid:54) = ∅ makes it impossible for v i ∈ C j . Thisshows that c j + |{ i ∈ [ N ] : i (cid:54) = j, C j = C i (cid:54) = ∅}| ≤ | N G ( v ) | ≤ d, and hence the desired (15). When we say a PEO { v , . . . , v N } of G is also a PEO of (cid:101) G = G [ S ], it is understood in the following sense: forany v j ∈ S , N (cid:101) G ( v j ) ∩ { v i ∈ S : i < j } is a clique in G [ S ]. a j = | A j | . We will prove that for any fixed j , |{ i ∈ [ N ] : i (cid:54) = j, A i ∩ A j (cid:54) = ∅}| ≤ d a j − ( a j − . (40)This fact immediately implies (44) by noting that a j ≤ ω . To this end, note that |{ i ∈ [ N ] : i (cid:54) = j, A i ∩ A j (cid:54) = ∅}| = |{ i ∈ [ N ] : i (cid:54) = j, v i / ∈ A j , A i ∩ A j (cid:54) = ∅}| + |{ i ∈ [ N ] : i (cid:54) = j, v i ∈ A j }| , where the second term is obviously at most a j −
1. Next we prove that the first term is at most( d + 1 − a j ) a j , which, in view of ( d + 1 − a j ) a j + ( a j −
1) = d a j − ( a j − , implies the desired (40).Suppose, for the sake of contradiction, that |{ i ∈ [ N ] : i (cid:54) = j, v i / ∈ A j , A i ∩ A j (cid:54) = ∅}| ≥ ( d + 1 − a j ) a j + 1Then at least ( d + 1 − a j ) a j + 1 of the A i have nonempty intersection with A j , meaning that at least( d + 1 − a j ) a j + 1 vertices outside A j are incident to vertices in A j . By the pigeonhole principle,there is at least one vertex u ∈ A j which is incident to d + 2 − a j of those vertices outside A j .Moreover, the vertices in A j form a clique of size a j in G by definition of the PEO. This impliesthat | N G ( u ) | ≥ ( a j −
1) + ( d − a j + 2) = d + 1, contradicting the maximum degree assumption andcompleting the proof.To prove the high-probability bound (12), we used a concentration inequality for the sum ofdependent random variables due to Janson [34]. This result, stated next, can be distilled from [34,Theorem 2.3]. The two-sided version of the concentration inequality therein also holds; see theparagraph before [34, Equation (2.3)]. Lemma 10.
Let $X=\sum_{j\in[N]}Y_j$, where $|Y_j-E[Y_j]|\le b$ almost surely. Let $S=\sum_{j\in[N]}\mathrm{Var}[Y_j]$. Let $\Gamma=([N],E(\Gamma))$ be a dependency graph for $\{Y_j\}_{j\in[N]}$, in the sense that if $A\subset[N]$ and $i\in[N]\setminus A$ does not belong to the neighborhood of any vertex in $A$, then $Y_i$ is independent of $\{Y_j\}_{j\in A}$. Furthermore, suppose $\Gamma$ has maximum degree $d_{\max}$. Then, for all $t\ge 0$,

  $P[|X-E[X]|\ge t]\le 2\exp\Big\{-\frac{t^2}{2(d_{\max}+1)(S+bt/3)}\Big\}$.

Proof of Theorem 7.
Let $\{v_1,\ldots,v_N\}$ be a PEO of the parent graph $G$ and let $\{\tilde v_1,\ldots,\tilde v_m\}$, $m=|S|$, be a PEO of $\tilde G$, with $\tilde c_j=|N_{\tilde G}(\tilde v_j)\cap\{\tilde v_1,\ldots,\tilde v_{j-1}\}|$. Let $\hat c_j=|N_{\tilde G}(v_j)\cap\{v_1,\ldots,v_{j-1}\}|$ and $c_j=|N_G(v_j)\cap\{v_1,\ldots,v_{j-1}\}|$. By Lemma 4, we can rewrite $\widehat{cc}_L$ as

  $\widehat{cc}_L=\frac{1}{p}\sum_{j\ge 1}b_j\Big(-\frac{q}{p}\Big)^{\hat c_j}P[L\ge\hat c_j]$,

where $\hat c_j\sim\mathrm{Bin}(c_j,p)$ conditioned on $\{b_j=1\}$.

We compute the bias and variance of $\widehat{cc}_L$ and then optimize over $\lambda$. First,

  $E[cc(G)-\widehat{cc}_L]=\frac{1}{p}\sum_{j=1}^{N}E\Big[b_j\Big(-\frac{q}{p}\Big)^{\hat c_j}P[L<\hat c_j]\Big]=\sum_{j=1}^{N}\sum_{i=0}^{c_j}\binom{c_j}{i}p^{i}q^{c_j-i}\Big(-\frac{q}{p}\Big)^{i}P[L<i]$
  $=\sum_{j=1}^{N}q^{c_j}\sum_{i=0}^{c_j}\binom{c_j}{i}(-1)^{i}P[L<i]=\sum_{j=1}^{N}q^{c_j}\sum_{i=0}^{c_j}\binom{c_j}{i}(-1)^{i}\sum_{\ell=0}^{i-1}P[L=\ell]$
  $=\sum_{j=1}^{N}q^{c_j}\sum_{\ell=0}^{c_j-1}P[L=\ell]\sum_{i=\ell+1}^{c_j}\binom{c_j}{i}(-1)^{i}\overset{(a)}{=}\sum_{j=1}^{N}q^{c_j}E_L\Big[\binom{c_j-1}{L}(-1)^{L+1}\Big]\overset{(b)}{=}-e^{-\lambda}\sum_{j=1}^{N}q^{c_j}L_{c_j-1}(\lambda)$,

where (a) follows from the fact that $\sum_{i=\ell+1}^{k}\binom{k}{i}(-1)^{i}=\binom{k-1}{\ell}(-1)^{\ell+1}$, and (b) follows from

  $E_L\Big[\binom{k-1}{L}(-1)^{L+1}\Big]=-e^{-\lambda}L_{k-1}(\lambda)$, (41)

where $L_m$ is the Laguerre polynomial of degree $m$, which satisfies $|L_m(x)|\le e^{x/2}$ for all $m\ge 0$ and $x\ge 0$. Hence

  $|E[\widehat{cc}_L-\widehat{cc}\,]|\le Ne^{-\lambda/2}$. (42)

To bound the variance, write $\widehat{cc}_L=\frac{1}{p}\sum_{j=1}^{N}W_j$, where $W_j=b_j(-\frac{q}{p})^{\hat c_j}P[L\ge\hat c_j]$.
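Before continuing to the variance, here is a quick numerical sanity check of identity (41), our illustration only, assuming $L\sim\mathrm{Poisson}(\lambda)$ and the standard Laguerre polynomial $L_m(x)=\sum_{\ell=0}^{m}\binom{m}{\ell}\frac{(-x)^{\ell}}{\ell!}$ (note the Poisson expectation truncates at $\ell=k-1$ because the binomial coefficient vanishes beyond it):

```python
from math import comb, exp, factorial

def laguerre(m, x):
    """Laguerre polynomial L_m(x) = sum_{l=0}^{m} C(m,l) (-x)^l / l!."""
    return sum(comb(m, l) * (-x) ** l / factorial(l) for l in range(m + 1))

lam = 2.5
for k in range(1, 8):
    # E_L[ C(k-1, L) (-1)^{L+1} ] for L ~ Poisson(lam); terms with L > k-1 vanish
    lhs = sum(
        comb(k - 1, l) * (-1) ** (l + 1) * exp(-lam) * lam**l / factorial(l)
        for l in range(k)
    )
    rhs = -exp(-lam) * laguerre(k - 1, lam)
    assert abs(lhs - rhs) < 1e-12
    # the bound |L_m(x)| <= e^{x/2} used to derive (42)
    assert abs(laguerre(k - 1, lam)) <= exp(lam / 2) + 1e-12
```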
Thus

  $\mathrm{Var}[\widehat{cc}_L]=\frac{1}{p^2}\sum_{j\in[N]}\mathrm{Var}[W_j]+\frac{1}{p^2}\sum_{i\ne j}\mathrm{Cov}[W_i,W_j]$. (43)

Note that $W_j$ is a function of $\{b_\ell: v_\ell\in A_j,\ \ell\in[N]\}$, where $A_j$ is defined in (16). Using Lemma 6, we have

  $|\{(i,j)\in[N]^2: i\ne j,\ A_i\cap A_j\ne\emptyset\}|\le Nd\omega$. (44)

Thus the number of nonzero cross terms in (43) is at most $Nd\omega$ thanks to (44). Thus,

  $\mathrm{Var}[\widehat{cc}_L]\le\frac{N(1+d\omega)}{p^2}\max_{1\le j\le N}\mathrm{Var}[W_j]$. (45)

Finally, note that if $p<1/2$, then

  $\mathrm{Var}[W_j]\le p\Big(\sup_{k\ge 0}\Big\{\Big(\frac{q}{p}\Big)^{k}P[L\ge k]\Big\}\Big)^{2}\le p\Big(E_L\Big[\Big(\frac{q}{p}\Big)^{L}\Big]\Big)^{2}=p\exp\Big\{2\lambda\Big(\frac{q}{p}-1\Big)\Big\}$. (46)

Combining (42), (45), and (46), we have

  $E_G|\widehat{cc}_L-cc(G)|^2\le N^2e^{-\lambda}+\frac{N(1+d\omega)}{p}\exp\Big\{2\lambda\Big(\frac{q}{p}-1\Big)\Big\}$.

The choice of $\lambda$ yields the desired bound.

Proof of Theorem 8. The estimator (9) can also be written as $\widehat{cc}=\sum_{k=1}^{cc(G)}[1-(-\frac{q}{p})^{\tilde N_k}]$, where $\tilde N_k$ is the number of sampled vertices from the $k$th component. Then $\tilde N_k\overset{\mathrm{ind.}}{\sim}\mathrm{Bin}(N_k,p)$. Thus,

  $\mathrm{Var}[\widehat{cc}]=\sum_{k=1}^{cc(G)}\Big(\frac{q}{p}\Big)^{N_k}=\sum_{r=1}^{N}\Big(\frac{q}{p}\Big)^{r}cc_r$.

The upper bound follows from the fact that $cc_r=0$ for all $r>\omega$ and $\sum_{r=1}^{N}cc_r=cc(G)\le N$.

Proof of Theorem 9.
The bias of this estimator is seen to be

  $E[cc(G)-\widetilde{cc}_L]=\sum_{k=1}^{cc(G)}E\Big[P[L<\tilde N_k]\Big(-\frac{q}{p}\Big)^{\tilde N_k}\Big]$.

Note that

  $E\Big[P[L<\tilde N_k]\Big(-\frac{q}{p}\Big)^{\tilde N_k}\Big]=\sum_{r=1}^{N}P[L<r]\Big(-\frac{q}{p}\Big)^{r}P[\tilde N_k=r]=\sum_{i=0}^{N-1}P[L=i]\sum_{r=i+1}^{N}\Big(-\frac{q}{p}\Big)^{r}P[\tilde N_k=r]$.

Since $\tilde N_k\sim\mathrm{Bin}(N_k,p)$, it follows that

  $\sum_{r=i+1}^{N}\Big(-\frac{q}{p}\Big)^{r}P[\tilde N_k=r]=q^{N_k}\sum_{r=i+1}^{N_k}\binom{N_k}{r}(-1)^{r}=q^{N_k}(-1)^{i+1}\binom{N_k-1}{i}$.

Putting these facts together, we have

  $E[cc(G)-\widetilde{cc}_L]=\sum_{k=1}^{cc(G)}q^{N_k}E_L\Big[\binom{N_k-1}{L}(-1)^{L+1}\Big]=-e^{-\lambda}\sum_{k=1}^{cc(G)}q^{N_k}L_{N_k-1}(\lambda)$.

Analogous to (41), we have $\big|E_L\big[\binom{N_k-1}{L}(-1)^{L+1}\big]\big|\le e^{-\lambda/2}$, and hence by the Cauchy-Schwarz inequality,

  $|E[cc(G)-\widetilde{cc}_L]|\le e^{-\lambda/2}\sqrt{N\sum_{k=1}^{cc(G)}q^{N_k}}$. (47)

For the variance of $\widetilde{cc}_L$, note that $\widetilde{cc}_L=\sum_{k=1}^{cc(G)}W_k$, where $W_k\triangleq 1-P[L\ge\tilde N_k]\big(-\frac{q}{p}\big)^{\tilde N_k}$. The $W_k$ are independent random variables and hence $\mathrm{Var}[\widetilde{cc}_L]=\sum_{k=1}^{cc(G)}\mathrm{Var}[W_k]\le\sum_{k=1}^{cc(G)}E[W_k^2]$. Also, since $W_k=0$ on $\{\tilde N_k=0\}$,

  $|W_k|\le\max_{1\le r\le N}\Big\{1-P[L\ge r]\Big(-\frac{q}{p}\Big)^{r}\Big\}\,\mathbf 1\{\tilde N_k\ge 1\}$,

so that

  $\mathrm{Var}[\widetilde{cc}_L]\le\max_{1\le r\le N}\Big\{1-P[L\ge r]\Big(-\frac{q}{p}\Big)^{r}\Big\}^{2}\sum_{k=1}^{cc(G)}(1-q^{N_k})$.

Since $p<1/2$, we have

  $P[L\ge r]\Big(\frac{q}{p}\Big)^{r}=\sum_{i=r}^{\infty}P[L=i]\Big(\frac{q}{p}\Big)^{r}\le\sum_{i=r}^{\infty}P[L=i]\Big(\frac{q}{p}\Big)^{i}\le\sum_{i=0}^{\infty}P[L=i]\Big(\frac{q}{p}\Big)^{i}=E_L\Big[\Big(\frac{q}{p}\Big)^{L}\Big]=e^{\lambda(\frac{q}{p}-1)}$.

Thus, it follows that

  $\mathrm{Var}[\widetilde{cc}_L]\le 4e^{2\lambda(\frac{q}{p}-1)}\sum_{k=1}^{cc(G)}(1-q^{N_k})$. (48)

Combining (47) and (48) yields

  $E|\widetilde{cc}_L-cc(G)|^2\le 4e^{2\lambda(\frac{q}{p}-1)}\sum_{k=1}^{cc(G)}(1-q^{N_k})+Ne^{-\lambda}\sum_{k=1}^{cc(G)}q^{N_k}\le cc(G)\Big(4e^{2\lambda(\frac{q}{p}-1)}+Ne^{-\lambda}\Big)$.

Choosing $\lambda=\frac{p}{2-3p}\log(N/4)$ leads to $4e^{2\lambda(\frac{q}{p}-1)}=Ne^{-\lambda}$ and completes the proof.

Proof of Theorem 10.
Fix $\alpha\in(0,1)$. Let $M=N/m$ and $G=G_1+G_2+\cdots+G_M$, where $G_i\cong H$ or $H'$ with probability $\alpha$ and $1-\alpha$, respectively. Let $P_\alpha$ denote the law of $G$ and $E_\alpha$ the corresponding expectation. Assume without loss of generality that $f(H)>f(H')$. Note that $E_\alpha f(G)=M[\alpha f(H)+(1-\alpha)f(H')]$.

Let $\tilde G_i$ be the sampled version of $G_i$. Then $\tilde G=\tilde G_1+\cdots+\tilde G_M$. For each subgraph $h$, by (2), we have

  $P[\tilde G_i\cong h\,|\,G_i\cong H]=s(h,H)p^{v(h)}(1-p)^{m-v(h)}$, and $P[\tilde G_i\cong h\,|\,G_i\cong H']=s(h,H')p^{v(h)}(1-p)^{m-v(h)}$.

Let $P\triangleq P_{\tilde H}=\mathcal L(\tilde G_i\,|\,G_i\cong H)$ and $P'\triangleq P_{\tilde H'}=\mathcal L(\tilde G_i\,|\,G_i\cong H')$. Then the law of each $\tilde G_i$ is simply the mixture $P_\alpha\triangleq\mathcal L(\tilde G_i)=\alpha P+(1-\alpha)P'$. Furthermore, $(\tilde G_1,\tilde G_2,\ldots,\tilde G_M)\sim P_\alpha^{\otimes M}$.

To lower bound the minimax risk of estimating the functional $f(G)$, we apply the method of two fuzzy hypotheses [55, Theorem 2.15(i)]. To this end, consider a pair of priors, that is, the distribution of $G$ with $\alpha=\alpha_0=1/2$ and $\alpha=\alpha_1=1/2+\delta$, respectively, where $\delta\in[0,1/2]$ is to be determined. To ensure that the values of $f(G)$ are separated under the two priors, note that $f(G)\overset{\mathrm{law}}{=}(f(H)-f(H'))\mathrm{Bin}(M,\alpha)+f(H')M$. Define $L_0=f(H)(1/2+\delta/4)M+f(H')(1/2-\delta/4)M$ and

  $\Delta\triangleq\frac{1}{4}\big(E_{\alpha_1}f(G)-E_{\alpha_0}f(G)\big)=\frac{M\delta}{4}\big(f(H)-f(H')\big)$.

By Hoeffding's inequality, for any $\delta\ge 0$,

  $P_{\alpha_0}[f(G)\le L_0]=P[\mathrm{Bin}(M,\alpha_0)\le M\alpha_0+M\delta/4]\ge 1-e^{-\delta^2M/8}\triangleq 1-\beta_0$,
  $P_{\alpha_1}[f(G)\ge L_0+2\Delta]=P[\mathrm{Bin}(M,\alpha_1)\ge M\alpha_1-M\delta/4]\ge 1-e^{-\delta^2M/8}\triangleq 1-\beta_1$.

Invoking [55, Theorem 2.15(i)], we have

  $\inf_{\hat f}\sup_{G\in\mathcal G}P\big[|\hat f(\tilde G)-f(G)|\ge\Delta\big]\ge\frac{1-\mathrm{TV}(P_{\alpha_0}^{\otimes M},P_{\alpha_1}^{\otimes M})-\beta_0-\beta_1}{2}$. (49)

The total variation term can be bounded as follows:

  $\mathrm{TV}(P_{\alpha_0}^{\otimes M},P_{\alpha_1}^{\otimes M})\overset{(a)}{\le}1-\frac12\exp\{-\chi^2(P_{\alpha_1}^{\otimes M}\|P_{\alpha_0}^{\otimes M})\}=1-\frac12\exp\{-(1+\chi^2(P_{\alpha_1}\|P_{\alpha_0}))^{M}+1\}\overset{(b)}{\le}1-\frac12\exp\{-(1+4\delta^2\,\mathrm{TV}(P,P'))^{M}+1\}$,

where (a) follows from the inequality between the total variation and the $\chi^2$-divergence $\chi^2(P\|Q)\triangleq\int(\frac{dP}{dQ}-1)^2dQ$ [55, Eqn. (2.25)]; (b) follows from

  $\chi^2(P_{\alpha_1}\|P_{\alpha_0})=\chi^2\Big(\frac{P+P'}{2}+\delta(P-P')\,\Big\|\,\frac{P+P'}{2}\Big)=2\delta^2\int\frac{(P-P')^2}{P+P'}\le 4\delta^2\,\mathrm{TV}(P,P')$.

Choosing $\delta=\frac12\wedge\frac14\sqrt{\frac{1}{M\,\mathrm{TV}(P_{\tilde H},P_{\tilde H'})}}$, and in view of the assumptions that $M\ge 300$ and $\mathrm{TV}(P,P')\le 1/300$, the right-hand side of (49) is bounded below by $0.01$, which proves (28).

Proof of Lemma 7.
We use Kocay’s Vertex Theorem [36] which says that if H is a collection ofgraphs, then (cid:89) h ∈H s ( h, G ) = (cid:88) g a g s ( g, G ) , where the sum runs over all graphs g such that v ( g ) ≤ (cid:80) h ∈H v ( h ) and a g is the number of decom-positions of V ( g ) into ∪ h ∈H V ( h ) such that g [ V ( h )] (cid:39) h .In particular, if H consists of the connected components of H , then the only disconnected g with v ( g ) = v satisfying the above decomposition property is g (cid:39) H . Hence s ( H, G ) = 1 a H (cid:34) (cid:89) h ∈H s ( h, G ) − (cid:88) g a g s ( g, G ) (cid:35) , where the sum runs over all g that are either connected and v ( g ) ≤ v or disconnected and v ( g ) ≤ v −
1. This shows that s ( H, G ) can be expressed as a polynomial, independent of G , in s ( g, G )where either g is connected and v ( g ) ≤ v or g is disconnected and v ( g ) ≤ v − v . The base case of v = 1 is clearly true. Suppose thatfor any disconnected graph h with at most v vertices, s ( h, G ) can be expressed as a polynomial,independent of G , in s ( g, G ) where g is connected and v ( g ) ≤ v . By the first part of the proof,if H is a disconnected graph with v + 1 vertices, then s ( H, G ) can be expressed as a polynomial,independent of G , in s ( h, G ) where either h is connected and v ( h ) ≤ v + 1 or h is disconnected and v ( h ) ≤ v . By S ( v ), each s ( h, G ) with h disconnected and v ( h ) ≤ v can be expressed as a polynomial,independent of G , in s ( g, G ) where g is connected and v ( g ) ≤ v . Thus, we can express s ( H, G ) asa polynomial, independent of G , in terms of s ( g, G ) where g is connected and v ( g ) ≤ v + 1. Proof of Lemma 8.
By Corollary 1, we have

  $s(h,H)=s(h,H')$, (50)

for all $h$ (not necessarily connected) with $v(h)\le k$. Note that conditioned on the event that $\ell$ vertices are sampled, $\tilde H$ is uniformly distributed over the collection of all induced subgraphs of $H$ with $\ell$ vertices. Thus

  $P[\tilde H\cong h\,|\,v(\tilde H)=\ell]=\frac{s(h,H)}{\binom{m}{\ell}}$.

In view of (50), we conclude that the isomorphism classes of $\tilde H$ and $\tilde H'$ have the same distribution provided that no more than $k$ vertices are sampled. Hence the first inequality in (30) follows, while the last inequality therein follows from the union bound $P[\mathrm{Bin}(m,p)\ge\ell]\le\binom{m}{\ell}p^{\ell}$. The bound (31) follows directly from Hoeffding's inequality on the binomial tail probability in (30).

B Additional results
In this appendix, we provide results for the uniform sampling model in Section B.1. We also discuss additional lower bounds for graphs with long cycles in Section B.2 and for forests in Section B.3.
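For concreteness before turning to the uniform sampling model, here is a small self-contained sketch (our illustration; the graph encoding and helper names are not from the paper) of Bernoulli($p$) subgraph sampling and the clique-count estimator $\widehat{cc}=\sum_{i\ge 1}(-1)^{i+1}s(K_i,\tilde G)/p^{i}$ from (9). The exhaustive enumeration at the end verifies unbiasedness on a toy disjoint union of cliques:

```python
from itertools import combinations, product

def clique_counts(vertices, edges, max_size):
    """s(K_i, G): number of vertex subsets of size i inducing a clique."""
    eset = {frozenset(e) for e in edges}
    counts = {}
    for i in range(1, max_size + 1):
        counts[i] = sum(
            1 for T in combinations(vertices, i)
            if all(frozenset(pair) in eset for pair in combinations(T, 2))
        )
    return counts

def cc_hat(sampled, edges, p, max_size):
    """Unbiased estimator sum_i (-1)^{i+1} s(K_i, sampled subgraph) / p^i."""
    sub_edges = [e for e in edges if e[0] in sampled and e[1] in sampled]
    s = clique_counts(sampled, sub_edges, max_size)
    return sum((-1) ** (i + 1) * s[i] / p**i for i in range(1, max_size + 1))

# parent graph: triangle {0,1,2} + edge {3,4} + isolated vertex {5}; cc(G) = 3
V = [0, 1, 2, 3, 4, 5]
E = [(0, 1), (0, 2), (1, 2), (3, 4)]
p, omega = 0.3, 3

# exact expectation of cc_hat over all 2^6 Bernoulli sampling outcomes
mean = 0.0
for keep in product([0, 1], repeat=len(V)):
    S = [v for v, b in zip(V, keep) if b]
    prob = p ** sum(keep) * (1 - p) ** (len(V) - sum(keep))
    mean += prob * cc_hat(S, E, p, omega)
assert abs(mean - 3) < 1e-9  # unbiased: E[cc_hat] = cc(G)
```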
B.1 Extensions to uniform sampling model
As we mentioned in Section 2.1, the uniform sampling model, where $n$ vertices are selected uniformly at random from $G$, is similar to Bernoulli sampling with $p=n/N$. For this model, the unbiased estimator analogous to (9) is

  $\widehat{cc}_U=\sum_{i\ge 1}(-1)^{i+1}\frac{s(K_i,\tilde G)}{p_i}$, (51)

where $p_i\triangleq\binom{N-i}{n-i}/\binom{N}{n}$. Next we show that this unbiased estimator enjoys the same variance bound as in Theorem 6, up to constant factors that only depend on $\omega$. The proof of this result is given below.

Theorem 12.
Let $\tilde G$ be generated from the uniform sampling model with $n=pN$. Then

  $\mathrm{Var}[\widehat{cc}_U]=O_\omega\Big(\frac{N}{p^{\omega}}+\frac{Nd}{p^{\omega-1}}\Big)$.

Proof. Using $(a_1+\cdots+a_k)^2\le k(a_1^2+\cdots+a_k^2)$, we have

  $\mathrm{Var}[\widehat{cc}_U]\le\omega\cdot\sum_{i=1}^{\omega}\frac{\mathrm{Var}[s(K_i,\tilde G)]}{p_i^2}$. (52)

Next, each variance term can be bounded as follows. Let $b_v=\mathbf 1\{v\in S\}\sim\mathrm{Bern}(p)$. Note that

  $\mathrm{Var}[s(K_i,\tilde G)]=\mathrm{Var}\Big[\sum_{T: G[T]\cong K_i}\prod_{v\in T}b_v\Big]=\sum_{T: G[T]\cong K_i}\mathrm{Var}\Big[\prod_{v\in T}b_v\Big]+\sum_{k=0}^{i-1}\ \sum_{\substack{T\ne T': |T\cap T'|=k\\ G[T]\cong K_i,\,G[T']\cong K_i}}\mathrm{Cov}\Big[\prod_{v\in T}b_v,\prod_{v'\in T'}b_{v'}\Big]=s(K_i,G)\,p_{i,i}+2\sum_{k=0}^{i-1}n(T_{i,k},G)\,p_{i,k}$, (53)

where

  $p_{i,k}\triangleq p_{2i-k}-p_i^2=\frac{\binom{N-2i+k}{n-2i+k}}{\binom{N}{n}}-\Bigg(\frac{\binom{N-i}{n-i}}{\binom{N}{n}}\Bigg)^2$, $0\le k\le i\le n$,

$T_{i,k}$ denotes two $K_i$'s sharing $k$ vertices, and we recall that $n(H,G)$ denotes the number of embeddings of (edge-induced subgraphs isomorphic to) $H$ in $G$. It is readily seen that $\frac{p_{i,k}}{p_i^2}\le\frac{i!}{p^k}$, since

  $\frac{p_{i,k}}{p_i^2}\le\frac{p_{2i-k}}{p_i^2}=\frac{\prod_{j=i+1}^{2i-k}\frac{n-j+1}{N-j+1}}{\prod_{j=1}^{i}\frac{n-j+1}{N-j+1}}\le\frac{\prod_{j=i+1}^{2i-k}\frac{n}{N}}{\prod_{j=1}^{i}\frac{n}{jN}}=\frac{i!}{p^{k}}$,

where we used $p=n/N$ and the inequalities $\frac{n}{jN}\le\frac{n-j+1}{N-j+1}\le\frac{n}{N}$ for $1\le j\le n$. Furthermore, from the same steps, for $k=0$ we have

  $\frac{p_{2i}}{p_i^2}=\prod_{j=1}^{i}\frac{n-j+1-i}{N-j+1-i}\cdot\frac{N-j+1}{n-j+1}\le 1$,

or equivalently, $p_{i,0}\le 0$, which also follows from negative association. Substituting $p_{i,0}\le 0$ and $\frac{p_{i,k}}{p_i^2}\le\frac{i!}{p^k}$ into (53) yields

  $\frac{1}{p_i^2}\mathrm{Var}[s(K_i,\tilde G)]=s(K_i,G)\frac{p_{i,i}}{p_i^2}+2\sum_{k=0}^{i-1}n(T_{i,k},G)\frac{p_{i,k}}{p_i^2}\le s(K_i,G)\frac{p_{i,i}}{p_i^2}+2\sum_{k=1}^{i-1}n(T_{i,k},G)\frac{p_{i,k}}{p_i^2}\le i!\Bigg(\frac{s(K_i,G)}{p^{i}}+2\sum_{k=1}^{i-1}\frac{n(T_{i,k},G)}{p^{k}}\Bigg)$. (54)

To finish the proof, we establish two combinatorial facts:

  $s(K_i,G)=O_\omega(N)$, $i=1,2,\ldots,\omega$, (55)
  $n(T_{i,k},G)=O_\omega(Nd)$, $k=1,2,\ldots,i-1$. (56)

For any chordal graph $G$ with clique number bounded by $\omega$, the number of cliques of any size is at most $O_\omega(v(G))=O_\omega(N)$; this can be seen from the PEO representation in (8) since $c_j\le\omega-1$, which proves (55). To show (56), note that to enumerate $T_{i,k}$, we can first enumerate cliques of size $i$, then for each clique, choose $i-k$ other vertices in the neighborhood of $k$ vertices of the clique. Note that for each $v\in V(G)$, the neighborhood of $v$ is also a chordal graph of at most $d$ vertices and clique number at most $\omega$. Therefore, by (55), the number of $K_{i-k}$'s in the neighborhood of any given vertex is at most $O_\omega(d)$.

Finally, applying (55)-(56) to each term in (54), we have

  $\frac{1}{p_i^2}\mathrm{Var}[s(K_i,\tilde G)]=O_\omega\Bigg(\frac{N}{p^{i}}+\sum_{k=1}^{i-1}\frac{Nd}{p^{k}}\Bigg)=O_\omega\Big(\frac{N}{p^{i}}+\frac{Nd}{p^{i-1}}\Big)$,

which, in view of (52), yields the desired result.

B.2 Lower bound for graphs with long induced cycles
Theorem 13.
Let $\mathcal G(N,r)$ denote the collection of all graphs on $N$ vertices whose longest induced cycle has length at most $r$, with $r\ge 4$. Suppose $p<1/2$ and $r\ge c/(1-2p)^2$ for a sufficiently large absolute constant $c$. Then

  $\inf_{\widehat{cc}}\sup_{G\in\mathcal G(N,r)}E_G|\widehat{cc}-cc(G)|^2\gtrsim\frac{N}{r}e^{r(1-2p)^2/2}\wedge\Big(\frac{N}{r}\Big)^2$.

In particular, if $p<1/2$ and $r=\Theta(\log N)$, then

  $\inf_{\widehat{cc}}\sup_{G\in\mathcal G(N,r)}E_G|\widehat{cc}-cc(G)|^2\gtrsim\Big(\frac{N}{\log N}\Big)^2$.
Proof.
We will prove the lower bound via Theorem 10 with $m=2(r-1)$, $H=C_r+P_{r-2}$, and $H'=P_{2r-2}$. Note that $s(P_i,H)=s(P_i,H')=2r-1-i$ for $i=1,2,\ldots,r-1$. For an illustration of the construction when $r=5$, see Fig. 6. Since paths are the only connected induced subgraphs of $H$ and $H'$ with at most $r-1$ vertices, $H$ and $H'$ have matching subgraph counts up to order $r-1$. We may thus apply Lemma 8 with $k=r-1$ and $m=2(r-1)$, and note that $|cc(H)-cc(H')|=1$. By Theorem 10,

  $\inf_{\widehat{cc}}\sup_{G\in\mathcal G(N,r)}P[|\widehat{cc}-cc(G)|\ge\Delta]\ge 0.01$,

where

  $\Delta\asymp|cc(H)-cc(H')|\Bigg(\sqrt{\frac{N}{m\,\mathrm{TV}(P_{\tilde H},P_{\tilde H'})}}\wedge\frac{N}{m}\Bigg)=\sqrt{\frac{N}{m\,\mathrm{TV}(P_{\tilde H},P_{\tilde H'})}}\wedge\frac{N}{m}$.

Furthermore, by (31), the total variation between the sampled graphs $\tilde H$ and $\tilde H'$ satisfies

  $\mathrm{TV}(P_{\tilde H},P_{\tilde H'})\le e^{-(r-1)(1-2p)^2}<1/300$,

provided $p<1/2$ and $r\ge c/(1-2p)^2$. The desired lower bound on the squared error follows from Markov's inequality.

[Figure 6: The construction for $r=5$: (a) the graph $H$; (b) the graph $H'$. Each connected induced subgraph on $k\le r-1$ vertices is a path and appears $2r-1-k$ times in each graph.]

B.3 Lower bounds for forests
Particularizing Theorem 11 to $\omega=2$, we obtain a lower bound which shows that the estimator for forests $\widehat{cc}=v(\tilde G)/p-e(\tilde G)/p^2$ proposed by Frank [21] is minimax rate-optimal. As opposed to the general construction in Theorem 11, Fig. 7 illustrates a simple construction of $H$ and $H'$ for forests. However, we still require that $p$ be less than some absolute constant. Through another argument, we show that this constant can be arbitrarily close to one.

[Figure 7: (a) The graph $H$ for $\omega=2$ and $m=6$. (b) The graph $H'$ for $\omega=2$ and $m=6$. The two graphs are isomorphic if the center vertex is not sampled and all incident edges are removed. Thus, $\mathrm{TV}(P_{\tilde H},P_{\tilde H'})=p(1-q^{6})$.]

Theorem 14 (Forests). Let $\mathcal F(N,d)=\mathcal G(N,d,2)$ denote the collection of all forests on $N$ vertices with maximum degree at most $d$. Then for all $0<p<1$,

  $\inf_{\widehat{cc}}\sup_{G\in\mathcal F(N,d)}E_G|\widehat{cc}-cc(G)|^2\gtrsim\Big(\frac{Nq}{p^{2}}\vee\frac{Nqd}{p}\Big)\wedge N^2$.

In particular, if $d=\Theta(N)$ and $\omega\ge 2$, then

  $\inf_{\widehat{cc}}\sup_{G\in\mathcal G(N,d,\omega)}E_G|\widehat{cc}-cc(G)|^2\ge\inf_{\widehat{cc}}\sup_{G\in\mathcal F(N,d)}E_G|\widehat{cc}-cc(G)|^2\gtrsim N^2$.

Proof.
The strategy is to choose a one-parameter family of forests $\mathcal F_0$ and reduce the problem to estimating the total number of trials in a binomial experiment with a given success probability. To this end, define $M=N/(d+1)$ and let

  $\mathcal F_0=\{(N-m(d+1))S_0+mS_d:\ m\in\{0,1,\ldots,M\}\}$,

where $S_d$ denotes the star with $d$ leaves (and $S_0$ a single vertex). Let $G\in\mathcal F_0$. Because we do not observe the labels $\{b_v: v\in V(G)\}$, the distribution of $\tilde G$ can be described by the vector $(T_0,T_1,\ldots,T_d)$, where $T_j$ is the observed number of $S_j$'s. Since $T_0$ is determined by $v(\tilde G)$ and $\sum_{j\ge 1}(j+1)T_j$, it follows that $(T_1,\ldots,T_d)$ is sufficient for $\tilde G$. Next, we will show that $T\triangleq T_1+\cdots+T_d\sim\mathrm{Bin}(m,p')$, where $p'\triangleq p(1-q^{d})$, is sufficient for $\tilde G$. To this end, note that conditioned on $T=n$, the probability mass function of $(T_1,\ldots,T_d)$ at $(n_1,\ldots,n_d)$ is equal to

  $\frac{P[T_1=n_1,\ldots,T_d=n_d,\,T=n]}{P[T=n]}=\frac{\binom{m}{n}\binom{n}{n_1,\ldots,n_d}p_1^{n_1}\cdots p_d^{n_d}(1-p')^{m-n}}{\binom{m}{n}(p')^{n}(1-p')^{m-n}}=\binom{n}{n_1,\ldots,n_d}(p_1/p')^{n_1}\cdots(p_d/p')^{n_d}$,

where $p_j\triangleq p\binom{d}{j}p^{j}q^{d-j}$. Thus, $(T_1,\ldots,T_d)\,|\,T=n\sim\mathrm{Multinomial}(n,p_1/p',\ldots,p_d/p')$, whose distribution is independent of $m$. Thus, since $cc(G)=N-md$, we have that

  $\inf_{\widehat{cc}}\sup_{G\in\mathcal F(N,d)}E_G|\widehat{cc}-cc(G)|^2\ge\inf_{\widehat{cc}}\sup_{G\in\mathcal F_0}E_G|\widehat{cc}-cc(G)|^2=d^2\inf_{\widehat m(T)}\sup_{m\in\{0,1,\ldots,M\}}E_{T\sim\mathrm{Bin}(m,p')}|\widehat m(T)-m|^2\gtrsim\Big(\frac{Nq}{p^{2}}\vee\frac{Nqd}{p}\Big)\wedge N^2$,

which follows from applying Lemma 11 below with $\alpha=p'$ and $M=N/(d+1)$, and the fact that $p'=p(1-q^{d})\le p\wedge(p^2d)$.

The proof of Lemma 11 is given in Appendix A.

Lemma 11 (Binomial experiment). Let $X\sim\mathrm{Bin}(m,\alpha)$. For all $0\le\alpha\le 1$ and $M\in\mathbb N$ known a priori,

  $\inf_{\widehat m}\sup_{m\in\{0,1,\ldots,M\}}E|\widehat m(X)-m|^2\asymp\frac{(1-\alpha)M}{\alpha}\wedge M^2$.

Proof.
The upper bound follows from choosing $\widehat m=X/\alpha$ when $\alpha>(1-\alpha)/M$ and $\widehat m=(M+1)/2$ when $\alpha\le(1-\alpha)/M$.

For the lower bound, let $\gamma>0$. Consider the two hypotheses $H_0: m=m_0=M$ and $H_1: m=m_1=M-\big(\sqrt{\gamma M/\alpha}\wedge M\big)$. By Le Cam's two-point method [55, Theorem 2.2(i)],

  $\inf_{\widehat m}\sup_{m\in\{0,1,\ldots,M\}}E|\widehat m(X)-m|^2\ge\frac14|m_0-m_1|^2\,[1-\mathrm{TV}(\mathrm{Bin}(m_0,\alpha),\mathrm{Bin}(m_1,\alpha))]\ge\frac14\Big(\frac{\gamma M}{\alpha}\wedge M^2\Big)[1-H(\mathrm{Bin}(m_0,\alpha),\mathrm{Bin}(m_1,\alpha))]$,

where we used the inequality between the total variation and the Hellinger distance, $\mathrm{TV}\le H$ [55, Lemma 2.3]. Finally, choosing $\gamma=(1-\alpha)/16$ and using the bound in [48, Lemma 21] on the Hellinger distance between two binomials, we obtain $H(\mathrm{Bin}(m_0,\alpha),\mathrm{Bin}(m_1,\alpha))\le 1/2$.

C Numerical experiments
In this section, we study the empirical performance of the estimators proposed in Section 4 using synthetic data from various random graphs. The error bars in the following plots show the variability of the relative error $|\widehat{cc}-cc(G)|/cc(G)$ over 20 independent experiments of subgraph sampling on a fixed parent graph $G$. The solid black horizontal line shows the sample average and the whiskers show the mean $\pm$ the standard deviation.

C.1 Synthetic experiment
Chordal graphs
Both Fig. 8 and Fig. 9 focus on chordal graphs, where the parent graph is first generated from a random graph ensemble and then triangulated by calculating a fill-in of edges to make it chordal (using a maximum cardinality search algorithm from [54]). In Fig. 8a, the parent graph $G$ is a triangulated Erdős-Rényi graph $G(N,\delta)$, with $N=2000$ and $\delta$ near the connectivity threshold $\log N/N$ [16]. In Fig. 8b, we generate $G$ with $N=20000$ vertices by taking the disjoint union of 200 independent copies of $G(100, 0.2)$ and then apply triangulation. In accordance with Theorem 6, the better performance in Fig. 8b is due to moderately sized $d$ and $\omega$, and large $cc(G)$.

In Fig. 9 we perform a simulation study of the smoothed estimator $\widehat{cc}_L$ from Theorem 7. The parent graph is a triangulated realization of a $G(1000,\delta)$ graph with $d=88$, $\omega=15$, and $cc(G)=325$. The plots in Fig. 9b show that the sampling variability is significantly reduced for the smoothed estimator, particularly for small values of $p$ (to show detail, the vertical axes are plotted on different scales). This behavior is in accordance with the upper bounds furnished in Theorem 6 and Theorem 7. Large values of $\omega$ inflate the variance of $\widehat{cc}$ considerably, by an exponential factor of $1/p^{\omega}$, whereas the effect of large $\omega$ on the variance of $\widehat{cc}_L$ is only polynomial in $1/p$. We chose the smoothing parameter $\lambda$ to be $p\log N$, but other values that improve the performance can be chosen through cross-validation on various known graphs.

The non-monotone behavior of the relative error in Fig. 9a can be explained by the tradeoff between increasing $p$ (which improves the accuracy) and the increasing probability of observing a large clique (which increases the variability, particularly in this case of large $\omega$). Such behavior is apparent for moderate values of $p$, but dissipates as $p$ increases to 1, since the mean squared error tends to zero as more of the parent graph is observed. The plots also suggest that the marginal benefit (i.e., the marginal decrease in relative error) from increasing $p$ diminishes for moderate values of $p$. Future research would address the selection of $p$, if such control were available to the experimenter.

Non-chordal graphs
Finally, in Fig. 10 we experiment with sampling non-chordal graphs. As proposed in Section 4.4, one heuristic is to modify the original estimator by first triangulating the subsampled graph $\tilde G$ to $\mathrm{TRI}(\tilde G)$ and then applying the estimator $\widehat{cc}$ in (10). The plots in Fig. 10 show that this strategy works well; in fact, the performance is competitive with the same estimator in Fig. 8, where the parent graph is first triangulated and then subsampled.

[Figure 8: The relative error of $\widehat{cc}$ with moderate values of $d$ and $\omega$. (a) Parent graph equal to a triangulated realization of $G(2000,\delta)$ with $d=36$, $\omega=5$, and $cc(G)=985$. (b) Parent graph equal to a triangulated realization of 200 copies of $G(100, 0.2)$ with $d=8$, $\omega=4$, and $cc(G)=803$.]

[Figure 9: A comparison of the relative error of the unbiased estimator $\widehat{cc}$ in (10) and its smoothed version $\widehat{cc}_L$ in (25). The parent graph is a triangulated realization of $G(1000,\delta)$ with $d=88$, $\omega=15$, and $cc(G)=325$. (a) Non-smoothed $\widehat{cc}$. (b) Smoothed $\widehat{cc}_L$.]

[Figure 10: The estimator $\widehat{cc}(\mathrm{TRI}(\tilde G))$ applied to non-chordal graphs. (a) Parent graph equal to a realization of $G(2000,\delta)$ with $d=8$, $\omega=3$, and $cc(G)=756$. (b) Parent graph equal to a realization of 200 copies of $G(100, 0.2)$ with $d=7$, $\omega=4$, and $cc(G)=532$.]

C.2 Real-data experiment
The point of developing a theory for graphs with specific structure (i.e., chordal) is to (a) study how the graph parameters (such as the maximal degree) impact the estimation difficulty and (b) motivate a heuristic for real-world graphs encountered in practice. Indeed, as with all problems in a minimax framework, a certain amount of finesse is required to define a parameter space that showcases the richness of the problem, while at the same time enabling a characterization of the fundamental limits of estimation. Chordal graphs seem to fit this purpose. Importantly, they serve as a catalyst for more general strategies that apply to a wider collection of parent graphs, including those commonly encountered in practice.

In the previous subsection, we studied the estimators $\widehat{cc}$ and $\widehat{cc}_L$ using synthetic data on moderately sized, chordal parent graphs. In this section, we consider real-world instances of network datasets, where the parent graph is not chordal and the number of nodes is large. More specifically, we consider two representative examples of collaboration and biological networks. We believe these examples show the usefulness of our estimators on real-world data, despite the fact that the methodology was developed for chordal parent graphs.

The first network [40] is the collaboration network of authors of arXiv "cond-mat" (condensed matter physics) articles submitted between January 1993 and April 2003. Note that the category has been active since April 1992. An edge is placed between two researchers in the network if and only if they have co-authored a paper together.

The second network [51, Supplementary Table S2] is an initial version of a proteome-scale map of human binary protein-protein interactions (i.e., edges represent direct physical interactions between two protein molecules).
Because self-loops do not affect the connectivity of the network, we removed them from the dataset.

We use the smoothed estimator ĉc_L on both networks; the standard estimator ĉc performs poorly because of high-degree vertices and large clique numbers (cf. Fig. 9). To deal with the non-chordal parent graphs, we again use the heuristic proposed in Section 4.4 of first triangulating the subsampled graph G̃ to TRI(G̃) and then applying the smoothed estimator ĉc_L in (25). The results of this estimation scheme on both networks are displayed in Fig. 11.

Figure 11: Smoothed estimator ĉc_L(TRI(G̃)) applied to a collaboration and a biological network. (a) Collaboration network of arXiv condensed matter physics: N = 23133, e(G) = 93439, d = 279, and cc(G) = 567. (b) Human protein-protein network: N = 3133, e(G) = 6149, d = 129, and cc(G) = 210. [Plots omitted; axes: p vs. relative error.]

References

[1] Milton Abramowitz and Irene A. Stegun, editors.
Handbook of mathematical functions with formulas, graphs, and mathematical tables. Dover Publications, Inc., New York, 1992. Reprint of the 1972 edition.
[2] Maryam Aliakbarpour, Amartya Shankha Biswas, Themis Gouleakis, John Peebles, Ronitt Rubinfeld, and Anak Yodpinyanee. Sublinear-time algorithms for counting star subgraphs via edge sampling. Algorithmica, pages 1–30, 2017.
[3] Coren L. Apicella, Frank W. Marlowe, James H. Fowler, and Nicholas A. Christakis. Social networks and cooperation in hunter-gatherers. Nature, 481(7382):497–501, 2012.
[4] Oriana Bandiera and Imran Rasul. Social networks and technology adoption in northern Mozambique. The Economic Journal, 116(514):869–902, 2006.
[5] Anna Ben-Hamou, Roberto I. Oliveira, and Yuval Peres. Estimating graph parameters via random walks with restarts. arXiv preprint arXiv:1709.00869, 2017.
[6] Petra Berenbrink, Bruce Krayenhoff, and Frederik Mallmann-Trenn. Estimating the number of connected components in sublinear time. Inform. Process. Lett., 114(11):639–642, 2014.
[7] C. Borgs, J. T. Chayes, L. Lovász, V. T. Sós, and K. Vesztergombi. Convergent sequences of dense graphs. I. Subgraph frequencies, metric properties and testing. Adv. Math., 219(6):1801–1851, 2008.
[8] Michael Capobianco. Estimating the connectivity of a graph. Graph Theory and Applications, pages 65–74, 1972.
[9] Arun Chandrasekhar and Randall Lewis. Econometrics of sampled networks. Unpublished manuscript, 2011.
[10] Bernard Chazelle, Ronitt Rubinfeld, and Luca Trevisan. Approximating the minimum spanning tree weight in sublinear time. SIAM J. Comput., 34(6):1370–1379, 2005.
[11] Beidi Chen, Anshumali Shrivastava, and Rebecca C. Steorts. Unique entity estimation with application to the Syrian conflict. arXiv preprint arXiv:1710.02690, 2017.
[12] Timothy G. Conley and Christopher R. Udry. Learning about a new technology: Pineapple in Ghana. American Economic Review, 100(1):35–69, March 2010.
[13] Graham Cormode and Nick Duffield. Sampling for big data: a tutorial. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1975–1975. ACM, 2014.
[14] Klaus Dohmen. Lower bounds for the probability of a union via chordal graphs. Electronic Communications in Probability, 18, 2013.
[15] Talya Eden, Amit Levi, Dana Ron, and C. Seshadhri. Approximately counting triangles in sublinear time. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), pages 614–633. IEEE Computer Soc., Los Alamitos, CA, 2015.
[16] P. Erdős and A. Rényi. On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutató Int. Közl., 5:17–61, 1960.
[17] Paul Erdős, László Lovász, and Joel Spencer. Strong independence of graph copy functions. In Graph theory and related topics (Proc. Conf., Univ. Waterloo, Waterloo, Ont., 1977), pages 165–172. Academic Press, New York-London, 1979.
[18] Marcel Fafchamps and Susan Lund. Risk-sharing networks in rural Philippines. Journal of Development Economics, 71(2):261–287, 2003.
[19] Benjamin Feigenberg, Erica M. Field, and Rohini Pande. Building social capital through microfinance. Technical report, National Bureau of Economic Research, 2010.
[20] Ove Frank. Estimation of graph totals. Scand. J. Statist., 4(2):81–89, 1977.
[21] Ove Frank. Estimation of the number of connected components in a graph by using a sampled subgraph. Scand. J. Statist., 5(4):177–188, 1978.
[22] Chao Gao, Yu Lu, and Harrison H. Zhou. Rate-optimal graphon estimation. The Annals of Statistics, 43(6):2624–2652, 2015.
[23] Oded Goldreich. Introduction to Property Testing. Cambridge University Press, 2017.
[24] Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. Journal of the ACM (JACM), 45(4):653–750, 1998.
[25] Oded Goldreich and Dana Ron. Approximating average parameters of graphs. Random Structures Algorithms, 32(4):473–493, 2008.
[26] Oded Goldreich and Dana Ron. On testing expansion in bounded-degree graphs. In Studies in Complexity and Cryptography. Miscellanea on the Interplay between Randomness and Computation, pages 68–75. Springer, 2011.
[27] Leo A. Goodman. On the estimation of the number of classes in a population. Ann. Math. Statistics, 20:572–579, 1949.
[28] Leo A. Goodman. Snowball sampling. Ann. Math. Statist., 32:148–170, 1961.
[29] Ramesh Govindan and Hongsuda Tangmunarunkit. Heuristics for internet map discovery. In INFOCOM 2000. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings, volume 3, pages 1371–1380. IEEE, 2000.
[30] Mark S. Handcock and Krista J. Gile. Modeling social networks from sampled data. Ann. Appl. Stat., 4(1):5–25, 2010.
[31] Paul W. Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.
[32] Paul W. Holland and Samuel Leinhardt. An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, 76(373):33–50, 1981.
[33] Daniel G. Horvitz and Donovan J. Thompson. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663–685, 1952.
[34] Svante Janson. Large deviations for sums of partly dependent random variables. Random Structures Algorithms, 24(3):234–248, 2004.
[35] Jason M. Klusowski and Yihong Wu. Counting motifs with graph sampling. In Proceedings of the 31st Conference on Learning Theory, pages 1966–2011, 2018.
[36] W. L. Kocay. Some new methods in reconstruction theory. In Combinatorial mathematics, IX (Brisbane, 1981), volume 952 of Lecture Notes in Math., pages 89–114. Springer, Berlin-New York, 1982.
[37] Eric D. Kolaczyk. Statistical Analysis of Network Data: Methods and Models. Springer Science & Business Media, 2009.
[38] Eric D. Kolaczyk. Topics at the Frontier of Statistics and Network Analysis: (Re)Visiting the Foundations. SemStat Elements. Cambridge University Press, 2017.
[39] Jure Leskovec and Christos Faloutsos. Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 631–636. ACM, 2006.
[40] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data/ca-CondMat.html, June 2014.
[41] László Lovász. Large Networks and Graph Limits, volume 60. American Mathematical Society, 2012.
[42] R. Duncan Luce and Albert D. Perry. A method of matrix analysis of group structure. Psychometrika, 14(2):95–116, 1949.
[43] Brendan D. McKay and Stanisław P. Radziszowski. Subgraph counting identities and Ramsey numbers. J. Combin. Theory Ser. B, 69(2):193–209, 1997.
[44] Elizabeth W. McMahon, Beth A. Shimkus, and Jessica A. Wolfson. Chordal graphs and the characteristic polynomial. Discrete Math., 262(1-3):211–219, 2003.
[45] Assaf Natanzon, Ron Shamir, and Roded Sharan. A polynomial approximation algorithm for the minimum fill-in problem. SIAM J. Comput., 30(4):1067–1079, 2000.
[46] Ryan O'Donnell. Analysis of Boolean Functions. Cambridge University Press, New York, 2014.
[47] Alon Orlitsky, Ananda Theertha Suresh, and Yihong Wu. Optimal prediction of the number of unseen species. Proc. Natl. Acad. Sci. USA, 113(47):13283–13288, 2016.
[48] Yury Polyanskiy, Ananda Theertha Suresh, and Yihong Wu. Sample complexity of population recovery. In Proceedings of Conference on Learning Theory (COLT), Amsterdam, Netherlands, July 2017. arXiv:1702.05574.
[49] Peter H. Reingen and Jerome B. Kernan. Analysis of referral networks in marketing: Methods and illustration. Journal of Marketing Research, pages 370–378, 1986.
[50] Donald J. Rose, R. Endre Tarjan, and George S. Lueker. Algorithmic aspects of vertex elimination on graphs. SIAM J. Comput., 5(2):266–283, 1976.
[51] Jean-François Rual, Kavitha Venkatesan, Tong Hao, Tomoko Hirozane-Kishikawa, Amélie Dricot, Ning Li, Gabriel F. Berriz, Francis D. Gibbons, Matija Dreze, Nono Ayivi-Guedehoussou, et al. Towards a proteome-scale map of the human protein–protein interaction network. Nature, 437(7062):1173, 2005.
[52] Matthew J. Salganik and Douglas D. Heckathorn. Sampling and estimation in hidden populations using respondent-driven sampling. Sociological Methodology, 34(1):193–240, 2004.
[53] Michael P. H. Stumpf, Thomas Thorne, Eric de Silva, Ronald Stewart, Hyeong Jun An, Michael Lappe, and Carsten Wiuf. Estimating the size of the human interactome. Proceedings of the National Academy of Sciences, 105(19):6959–6964, 2008.
[54] Robert E. Tarjan and Mihalis Yannakakis. Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. SIAM J. Comput., 13:566–579, 1984.
[55] Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer Series in Statistics. Springer, New York, 2009. Revised and extended from the 2004 French original, translated by Vladimir Zaiats.
[56] Douglas B. West. Introduction to Graph Theory. Prentice Hall, Inc., Upper Saddle River, NJ, 1996.
[57] Hassler Whitney. The coloring of graphs. Ann. of Math. (2), 33(4):688–718, 1932.
[58] Yihong Wu and Pengkun Yang. Sample complexity of the distinct element problem. arXiv preprint arXiv:1612.03375, 2016.