Modifying a Graph's Degree Sequence and the Testablity of Degree Sequence Properties
arXiv [math.CO]
Lior Gishboliner ∗ September 29, 2020
Abstract
We show that if the degree sequence of a graph $G$ is close in $\ell_1$-distance to a given realizable degree sequence $(d_1, \dots, d_n)$, then $G$ is close in edit distance to a graph with degree sequence $(d_1, \dots, d_n)$. We then use this result to prove that every graph property defined in terms of the degree sequence is testable in the dense graph model with query complexity independent of $n$.

1 Introduction

Our main result in this paper is concerned with efficiently modifying a graph (i.e. adding/deleting edges) so as to obtain a graph with a prescribed degree sequence. Let us begin by introducing the relevant definitions. For convenience, we let the vertex-set of $n$-vertex graphs be $[n]$. The degree sequence of an $n$-vertex graph $G$ is $(d_G(1), \dots, d_G(n))$, where $d_G(i)$ is the degree of $i$ in $G$. A sequence $(d_1, \dots, d_n)$ is called realizable or graphic if there is an $n$-vertex graph $G$ such that $d_G(i) = d_i$ for every $1 \leq i \leq n$. Graphic sequences are a classical object of study, and there are several famous theorems characterizing them, such as the Erdős-Gallai theorem [2] and the Havel-Hakimi theorem [7, 9]. See [8] for more information. The normalized $\ell_1$-distance between sequences $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ is defined as $\ell_1(x, y) := \frac{1}{n^2} \cdot \sum_{i=1}^{n} |x_i - y_i|$. Let $G, G'$ be graphs on $[n]$. The (normalized) edit distance between $G$ and $G'$ is defined as $|E(G) \triangle E(G')| / n^2$.

We consider the following natural question: given that the degree sequence of a graph $G$ is close in $\ell_1$-distance to some realizable sequence $(d_1, \dots, d_n)$, how close is $G$ (in edit distance) to a graph whose degree sequence is $(d_1, \dots, d_n)$? As far as we know, this question has so far only been considered for the constant sequence $(r, \dots, r)$, which evidently corresponds to an $r$-regular graph (see [3, Claim 8.5.1]). Here we obtain a general result for all degree sequences:

Theorem 1.1.
Let $\delta > 0$ and $n \geq \delta^{-[\ldots]}$, let $G$ be a graph on $[n]$, let $(d_1, \dots, d_n)$ be a realizable sequence, and suppose that $\sum_{i \in V(G)} |d_G(i) - d_i| \leq \delta n^2$. Then there is a graph $G'$ on $V(G) = [n]$ such that $|E(G') \triangle E(G)| = O(\delta^{1/2} n^2)$ and $d_{G'}(i) = d_i$ for every $i \in [n]$.

Note that Theorem 1.1 gives a quadratic dependence: if the $\ell_1$-distance between the degree sequences is $\delta$, then the edit distance between $G$ and $G'$ is $O(\delta^{1/2})$. In the special case where $(d_1, \dots, d_n)$ is a constant sequence (i.e., the degree sequence of an $r$-regular graph for some $r$), a better, linear dependence has been proved (see Claim 8.5.1 in [3]). We wonder if one can obtain a linear dependence for all degree sequences, thus proving Theorem 1.1 with an optimal bound.

∗ ETH Zurich. Email: [email protected].

Theorem 1.1 has an immediate application to graph property testing. This is an area of theoretical computer science concerned with the design of fast, randomized algorithms which distinguish graphs satisfying a certain property from graphs which are far from the property. In the so-called dense graph model, which is the setting considered here, the measure of distance is simply the (normalized) edit distance defined above. More precisely, we say that a graph $G$ is $\varepsilon$-close to a graph property $\mathcal{P}$ if there is a graph satisfying $\mathcal{P}$ whose edit distance to $G$ is at most $\varepsilon$; otherwise, $G$ is $\varepsilon$-far from $\mathcal{P}$. A tester for $\mathcal{P}$ is an algorithm which, given an input consisting of a graph $G$ and a proximity parameter $\varepsilon > 0$, accepts with probability at least $2/3$ if $G$ satisfies $\mathcal{P}$, and rejects with probability at least $2/3$ if $G$ is $\varepsilon$-far from $\mathcal{P}$. The tester accesses the graph $G$ by making edge-queries to its adjacency matrix, and may also sample random vertices of $G$. As is customary, we assume that $n$, the number of vertices of the input graph, is given to the tester as part of the input. We say that a tester has query complexity $q(n, \varepsilon)$ if it makes at most $q(n, \varepsilon)$ edge-queries whenever invoked with an $n$-vertex input graph and with proximity parameter $\varepsilon$. In the present paper, we focus on testers whose query complexity is independent of $n$, namely, can be bounded by a function $q(\varepsilon)$ of $\varepsilon$ alone. In such a case, we will say that the query complexity of the tester is $q(\varepsilon)$. The study of graph property testers was initiated in the seminal paper of Goldreich, Goldwasser and Ron [4]. We must mention that property testing is a much wider area of research than is presented here, and that apart from the dense graph model considered in this paper, property testers have been extensively studied in numerous other settings. For an overview of the field, we refer the reader to the book of Goldreich [3].

In the present paper we are concerned with properties defined in terms of the degree sequence. The precise definitions are as follows. A degree-sequence property is a set $\mathcal{D}$ of realizable sequences that is closed under permuting the coordinates, which means that for every $(d_1, \dots, d_n) \in \mathcal{D}$ and every permutation $\pi : [n] \to [n]$, the permuted sequence $(d_{\pi(1)}, \dots, d_{\pi(n)})$ is also in $\mathcal{D}$. The graph property defined by $\mathcal{D}$, which we denote by $\mathcal{P}(\mathcal{D})$, is the set of all graphs whose degree sequence is in $\mathcal{D}$. Our main result establishes that every degree-sequence property is testable with query complexity which is independent of $n$, and moreover depends only polynomially on $\varepsilon$.

Theorem 1.2.
Every degree-sequence property is testable with query complexity $q(\varepsilon) = \mathrm{poly}(1/\varepsilon)$.

A significant caveat to Theorem 1.2 is that while the query complexity of the tester supplied by this theorem is independent of $n$, its time complexity may depend on $n$, and in some cases quite badly; it may be as large as exponential in $n$. The reason is that our tester for $\mathcal{P}(\mathcal{D})$ works by first constructing a bounded-size approximation of the degree sequence (of size depending only on $\varepsilon$), and then checking whether this approximation is close (in $\ell_1$-distance) to one of the sequences in $\mathcal{D}$. This second step is the source of (possibly) large time complexity: for general degree-sequence properties, we are not aware of a way of performing this check more efficiently than simply going over all sequences in $\mathcal{D}$. The required run-time may thus be as large as the number of $n$-term realizable sequences, which is at least exponential in $n$ (see, for example, [1]). Still, for some specific degree-sequence properties $\mathcal{D}$, this task can be done much faster, in time which depends only on $\varepsilon$. One example is the case that $\mathcal{P}(\mathcal{D})$ is the property of being $r$-regular (for some $r = r(n)$); see the explanation at the end of Section 3. We should mention that, as demonstrated in [3, Section 8.2.3], $r$-regularity was already known to be testable with query- and time-complexity depending only on $\varepsilon$ (and in fact with a better dependence than is given by our general Theorem 1.2). It may be interesting to find general families of degree-sequence properties for which the run-time is independent of $n$. On the other hand, it seems likely that dependence on $n$ is unavoidable if one wishes to handle all degree-sequence properties. It may be interesting to determine if there are cases where exponential dependence is necessary.

Paper organization. Theorem 1.1 is proved in Section 2, and Theorem 1.2 is proved in Section 3, where it is restated as Theorem 3.2.
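The basic notions from the introduction (degree sequence, normalized $\ell_1$-distance, normalized edit distance, realizability) can be illustrated by a minimal Python sketch. It is an illustration only and not part of the paper's argument: all function names are hypothetical, both distances are assumed to be normalized by $n^2$ as in the dense graph model, and realizability is checked with the Erdős-Gallai condition, one of the characterizations cited above [2].

```python
def degree_sequence(n, edges):
    """Degree sequence (d(1), ..., d(n)) of a graph on vertex set {1, ..., n}."""
    deg = [0] * n
    for u, v in edges:
        deg[u - 1] += 1
        deg[v - 1] += 1
    return deg


def normalized_l1(x, y):
    """Normalized l1-distance: sum_i |x_i - y_i|, divided by n^2 (assumed normalization)."""
    n = len(x)
    return sum(abs(a - b) for a, b in zip(x, y)) / n**2


def normalized_edit_distance(n, edges_g, edges_h):
    """Size of the symmetric difference of the two edge sets, divided by n^2."""
    eg = {frozenset(e) for e in edges_g}
    eh = {frozenset(e) for e in edges_h}
    return len(eg ^ eh) / n**2


def is_graphic(seq):
    """Erdos-Gallai test: is seq the degree sequence of some simple graph?"""
    d = sorted(seq, reverse=True)
    n = len(d)
    # Degrees must lie in [0, n-1] and have an even sum.
    if any(x < 0 or x > n - 1 for x in d) or sum(d) % 2 == 1:
        return False
    # Erdos-Gallai inequalities for every prefix length k.
    for k in range(1, n + 1):
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if lhs > rhs:
            return False
    return True


# Toy example: the path 1-2-3-4 versus the 4-cycle 1-2-3-4-1.
path = [(1, 2), (2, 3), (3, 4)]
cycle = path + [(4, 1)]
d_path = degree_sequence(4, path)
d_cycle = degree_sequence(4, cycle)
```

In this toy example the two degree sequences differ by 2 in total, and a single edge edit turns the path into a graph with the target sequence; Theorem 1.1 quantifies this kind of relation between the two distances in general.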
In the proof of Theorem 1.1 we will consider 2-edge-coloured graphs, namely graphs whose edges are coloured red and blue. For a vertex $v$ in a 2-edge-coloured graph $G$, we denote by $N^R_G(v)$ (resp. $N^B_G(v)$) the red (resp. blue) neighbourhood of $v$ in $G$; namely, $N^R_G(v) = \{u \in V(G) : \{u, v\} \in E(G) \text{ is coloured red}\}$, and similarly for blue. We will also put $d^R_G(v) = |N^R_G(v)|$ and $d^B_G(v) = |N^B_G(v)|$. When there is no superscript indicating the colour, we mean that both colours are counted; so $N_G(v) = N^R_G(v) \cup N^B_G(v)$ and $d_G(v) = d^R_G(v) + d^B_G(v)$. The subscript $G$ will be omitted when there is no risk of confusion. For a set $U \subseteq V(G)$, we will denote by $G[U]$ the (2-edge-coloured) subgraph of $G$ induced by $U$. We will use $e(U)$ to denote the number of edges inside a vertex-set $U$, and $e(U, W)$ to denote the number of edges between vertex-sets $U, W$. All logarithms are base 2.

An alternating cycle in a 2-edge-coloured graph is a cycle whose edges have alternating colours (in particular, such a cycle must be even). We will use the following result of Grossman and Häggkvist [5].

Theorem 2.1 ([5]). Let $F$ be a 2-connected 2-edge-coloured graph in which $d^R(v), d^B(v) \geq 1$ for every $v \in V(F)$. Then $F$ contains an alternating cycle.

The following is an easy corollary of Theorem 2.1.
Corollary 2.2. Let $F$ be a 2-edge-coloured graph on $n \geq 2$ vertices in which $d^R(v), d^B(v) \geq \log n$ for every $v \in V(F)$. Then $F$ contains an alternating cycle.

Proof. The proof is by induction on $n$, where the base case $n = 2$ is trivial (a graph on two vertices has at most one edge, so the degree condition cannot hold). Suppose then that $n \geq 3$. By Theorem 2.1, if $F$ has no alternating cycle then it is not 2-connected, meaning that there is a partition $V(F) = X \cup Y \cup S$ such that $|S| \leq 1$, $X, Y \neq \emptyset$, and there are no edges in $F$ between $X$ and $Y$. Suppose without loss of generality that $|X| \leq n/2$ (otherwise $|Y| \leq n/2$). As there are no edges between $X$ and $Y$ and as $|S| \leq 1$, we must have
\[ d^R_{F[X]}(v),\ d^B_{F[X]}(v) \geq \log n - 1 = \log(n/2) \geq \log(|X|) \]
for every $v \in X$. In particular we have $|X| \geq 2$, since $\log n > 1$. By the induction hypothesis, $F[X]$ contains an alternating cycle, as required.

An alternative way of deriving Corollary 2.2 is to apply a directed analogue thereof due to Gutin, Sudakov and Yeo [6], which states that every $n$-vertex 2-edge-coloured digraph with minimum out- and in-degree at least $C \log n$ in each of the colours contains an alternating directed cycle. The reduction from the undirected case to the directed one proceeds by replacing each undirected edge $\{u, v\}$ with both directed edges $(u, v), (v, u)$, coloured with the same colour as $\{u, v\}$.

The last tool we need in the proof of Theorem 1.1 is the following lemma.

Lemma 2.3.
Let $\delta > 0$ and $n \geq \delta^{-[\ldots]}$, let $F$ be a 2-edge-coloured graph on $[n]$ which contains no alternating cycle, and suppose that $\sum_{i \in V(F)} |d^R(i) - d^B(i)| \leq \delta n^2$. Then $|E(F)| \leq O(\delta^{1/2} n^2)$.

Proof. By (the contrapositive of) Corollary 2.2, there exists no non-empty subset $U \subseteq V(F)$ such that $d^R_{F[U]}(v), d^B_{F[U]}(v) \geq \log n \geq \log(|U|)$ for every $v \in U$. This implies that there is an ordering of the vertices of $F$ such that each vertex has either less than $\log n$ red edges or less than $\log n$ blue edges to the vertices succeeding it in the ordering. By possibly renaming vertices, we may assume that this ordering is $1, \dots, n$; namely, for every $1 \leq i \leq n$ there is a colour $c_i \in \{\text{red}, \text{blue}\}$ such that $i$ has less than $\log n$ neighbours in colour $c_i$ in the set $\{i + 1, \dots, n\}$. For an edge $e = \{i, j\} \in E(F)$ with $i < j$, we will call $i$ the left end of $e$ and $j$ the right end of $e$. Let $E^*$ be the set of edges of the form $\{i, j\}$, $i < j$, whose colour is $c_i$. Note that $|E^*| \leq n \log n$.

Set $k := \delta^{-1/2}$, and partition $[n]$ into $k$ consecutive intervals $X_1, \dots, X_k$ of length $\frac{n}{k} = \delta^{1/2} n$ each. Namely, for each $1 \leq j \leq k$, define $X_j := \{i \in [n] : (j - 1) \cdot \frac{n}{k} + 1 \leq i \leq j \cdot \frac{n}{k}\}$. For each $1 \leq j \leq k$, put $\Delta_j := \sum_{i \in X_j} |d^R(i) - d^B(i)|$, noting that $\Delta_1 + \dots + \Delta_k \leq \delta n^2$ by assumption. Finally, define $R_j := \{i \in X_j : c_i = \text{red}\}$ and $B_j := \{i \in X_j : c_i = \text{blue}\}$ (for $1 \leq j \leq k$).

Fixing any $1 \leq j \leq k$, we claim that
\[ \sum_{i \in R_j} d^B(i) + \sum_{i \in B_j} d^R(i) \geq \sum_{t=j+1}^{k} e(X_j, X_t) + e(X_j) - n \log n. \tag{1} \]
To see that (1) holds, recall that each $i \in R_j$ is the left end of at most $\log n$ red edges, and each $i \in B_j$ is the left end of at most $\log n$ blue edges.
Thus, apart from the $|E^*| \leq n \log n$ edges in $E^*$, all edges whose left end is in $X_j$ (i.e., all edges inside $X_j$ or between $X_j$ and $X_{j+1} \cup \dots \cup X_k$) are either blue with a left end in $R_j$ or red with a left end in $B_j$, and are hence counted by the left-hand side of (1).

Next, we use a similar consideration to argue that
\[ \sum_{i \in R_j} d^R(i) + \sum_{i \in B_j} d^B(i) \leq \sum_{s=1}^{j-1} e(X_s, X_j) + e(X_j) + 2 n \log n. \tag{2} \]
To see that (2) holds, we again use the fact that each $i \in R_j$ is the left end of at most $\log n$ red edges and each $i \in B_j$ is the left end of at most $\log n$ blue edges, which means that apart from the $|E^*| \leq n \log n$ edges in $E^*$ (which are counted at most twice), every edge counted by the left-hand side of (2) has its right end in $X_j$, and hence its left end in $X_1 \cup \dots \cup X_j$. Furthermore, if $e = \{x, y\}$ is contained in $X_j$ and $e \notin E^*$, then $e$ is counted at most once by the left-hand side of (2); indeed, assuming $x < y$, we know that either $x \in R_j$ and $e$ is blue, or $x \in B_j$ and $e$ is red, which means that the sums on the left-hand side of (2) may only count $e$ when $i = y$, but not when $i = x$. This establishes (2).

Finally, our definition of $\Delta_j$ implies that
\[ \sum_{i \in R_j} d^B(i) + \sum_{i \in B_j} d^R(i) \leq \sum_{i \in R_j} d^R(i) + \sum_{i \in B_j} d^B(i) + \Delta_j. \tag{3} \]
By combining (1), (2) and (3), we obtain
\[ \sum_{t=j+1}^{k} e(X_j, X_t) \leq \sum_{s=1}^{j-1} e(X_s, X_j) + \Delta_j + 3 n \log n. \tag{4} \]
Now fix any $1 \leq r \leq k - 1$, and sum the inequality (4) over $j = 1, \dots, r$ to obtain
\[ \sum_{j=1}^{r} \sum_{t=j+1}^{k} e(X_j, X_t) \leq \sum_{1 \leq s < j \leq r} e(X_s, X_j) + \sum_{j=1}^{r} \Delta_j + 3 r n \log n. \]
Every term $e(X_s, X_j)$ with $1 \leq s < j \leq r$ also appears on the left-hand side, so cancelling these terms gives $\sum_{j=1}^{r} \sum_{t=r+1}^{k} e(X_j, X_t) \leq \delta n^2 + 3 k n \log n$. Summing over $r = 1, \dots, k - 1$, we get
\[ \sum_{1 \leq s < t \leq k} e(X_s, X_t) \leq k \delta n^2 + 3 k^2 n \log n = \delta^{1/2} n^2 + 3 \delta^{-1} n \log n. \]
Since the number of edges inside the intervals $X_1, \dots, X_k$ is at most $k \cdot (n/k)^2 = \delta^{1/2} n^2$, we conclude that $|E(F)| \leq 2 \delta^{1/2} n^2 + 3 \delta^{-1} n \log n$, which is $O(\delta^{1/2} n^2)$ by the assumed lower bound on $n$.

Proof of Theorem 1.1. Since $(d_1, \dots, d_n)$ is realizable, there is a graph on $[n]$ in which the degree of vertex $i$ is $d_i$ (for every $1 \leq i \leq n$). Among all such graphs, fix one, $G'$, which minimizes $|E(G') \triangle E(G)|$. Our goal is to show that $|E(G') \triangle E(G)| = O(\delta^{1/2} n^2)$. Let $F$ be the graph on $[n]$ with edge-set $E(F) = E(G') \triangle E(G)$. We 2-colour the edges of $F$ by colouring $E(G') \setminus E(G)$ red and $E(G) \setminus E(G')$ blue. The key observation is that $F$ does not have an alternating cycle; indeed, if there were an alternating cycle in $F$, then by removing from $G'$ all red edges of this cycle, and adding to $G'$ all blue edges of this cycle, we would get a graph $G''$ with the same degree sequence as $G'$ and with $|E(G'') \triangle E(G)| < |E(G') \triangle E(G)|$, in contradiction to the minimality of $G'$. Next, observe that for every $i \in [n]$, we have $d_i = d_{G'}(i) = d_G(i) + d^R_F(i) - d^B_F(i)$. Hence,
\[ \sum_{i \in V(F)} |d^R_F(i) - d^B_F(i)| = \sum_{i \in V(G)} |d_G(i) - d_i| \leq \delta n^2. \]
By Lemma 2.3 we have $|E(G') \triangle E(G)| = e(F) = O(\delta^{1/2} n^2)$, as required.

3 Testing Degree Sequence Properties

In this section we show that every degree-sequence property admits a tester with query complexity independent of $n$, thus proving Theorem 1.2. Our tester works by approximating the degree sequence of the given input graph. Let us introduce the relevant definitions. A degree statistic is given by reals $\alpha_1, \dots, \alpha_k \in [0, 1]$ with $\alpha_1 + \dots + \alpha_k = 1$. For a degree statistic $\alpha = (\alpha_1, \dots, \alpha_k)$ and an integer $n \geq 1$, define $d = d(n, \alpha)$ as the sequence $d = (d_1, \dots, d_n) \in \mathbb{N}^n$ which has, for every $1 \leq \ell \leq k$, precisely $\alpha_\ell n$ coordinates equal to $\frac{\ell - 1}{k} n$ (we assume, for simplicity of presentation, that $\frac{n}{k}$ and $\alpha_1 n, \dots, \alpha_k n$ are integers). For example, the sequence $d = d(n, \alpha)$ corresponding to $\alpha = (0.2, 0.8)$ has $0.2n$ terms equal to $0$ and $0.8n$ terms equal to $\frac{1}{2} n$.

Let $\delta > 0$ and let $G$ be a graph on $[n]$. A degree statistic $\alpha = (\alpha_1, \dots, \alpha_k)$ is said to $\delta$-approximate $G$ if there is a permutation $\pi : [n] \to [n]$ such that $\sum_{i=1}^{n} |d_G(\pi(i)) - d_i| \leq \delta n^2$, where $(d_1, \dots, d_n) = d(n, \alpha)$. In other words, a degree statistic $\delta$-approximates $G$ if, after possibly relabeling the vertices of $G$, the degree sequence of $G$ and the degree sequence corresponding to the statistic differ by at most $\delta$ in (normalized) $\ell_1$-distance.

Lemma 3.1. There is an algorithm which, given $\delta > 0$ and an input graph $G$, outputs a degree statistic $(\alpha_1, \dots, \alpha_k)$ with $k = \lceil 2/\delta \rceil$ which $\delta$-approximates $G$ with probability at least $2/3$. The query complexity and run-time of the algorithm is $\tilde{O}(\delta^{-8})$.

Proof. We use a standard "approximation-by-sampling" argument. Put $n := |V(G)|$. We set $k := \lceil 2/\delta \rceil$ and $\gamma := \frac{1}{k(2k+1)}$. Our algorithm works as follows: sample $s := \frac{\log(12k)}{2\gamma^2} = \tilde{O}(\delta^{-4})$ vertices $v_1, \dots, v_s \in V(G)$, and for each $1 \leq i \leq s$, sample $t := \frac{\log(6s)}{2\gamma^2} = \tilde{O}(\delta^{-4})$ additional vertices $u_{i,j} \in V(G)$, $1 \leq j \leq t$, where all samples are made uniformly and independently. Now, query all pairs $\{v_i, u_{i,j}\}$ ($1 \leq i \leq s$ and $1 \leq j \leq t$). For each $1 \leq i \leq s$, let $\bar{d}_i := |N_G(v_i) \cap \{u_{i,1}, \dots, u_{i,t}\}|$. Set $\alpha_1 := \frac{1}{s} \cdot \left|\left\{1 \leq i \leq s : \frac{\bar{d}_i}{t} \leq \frac{1}{k}\right\}\right|$ and $\alpha_\ell := \frac{1}{s} \cdot \left|\left\{1 \leq i \leq s : \frac{\ell - 1}{k} < \frac{\bar{d}_i}{t} \leq \frac{\ell}{k}\right\}\right|$ for $2 \leq \ell \leq k$. The algorithm outputs $(\alpha_1, \dots, \alpha_k)$. Note that the total number of queries, as well as the run-time, is $O(st) = \tilde{O}(\delta^{-8})$, as required.

Let us prove the correctness of the above algorithm. Observe that $\bar{d}_i$ is distributed as a binomial random variable with parameters $t$ and $d_G(v_i)/n$. By Hoeffding's inequality, for every $1 \leq i \leq s$ we have
\[ \mathbb{P}\left[ \left| \frac{\bar{d}_i}{t} - \frac{d_G(v_i)}{n} \right| \geq \gamma \right] \leq 2 e^{-2\gamma^2 t} \leq \frac{1}{6s}. \tag{5} \]
Let $A$ be the event that $\left| \frac{\bar{d}_i}{t} - \frac{d_G(v_i)}{n} \right| \leq \gamma$ for every $1 \leq i \leq s$. By using (5) and taking the union bound over $1 \leq i \leq s$, we get that $\mathbb{P}[A] \geq 5/6$.

For a set $X \subseteq \{0, \dots, n - 1\}$, we define $c(X) := \frac{1}{n} \cdot |\{v \in V(G) : d_G(v) \in X\}|$ and $\bar{c}(X) := \frac{1}{s} \cdot |\{1 \leq i \leq s : d_G(v_i) \in X\}|$. Observe that for any given set $X$, the random variable $|\{1 \leq i \leq s : d_G(v_i) \in X\}| = s \cdot \bar{c}(X)$ has a binomial distribution with parameters $s$ and $c(X)$. Hence, by Hoeffding's inequality, we have
\[ \mathbb{P}\left[ |\bar{c}(X) - c(X)| \geq \gamma \right] \leq 2 e^{-2\gamma^2 s} \leq \frac{1}{12k}. \tag{6} \]
We will apply (6) to the following $2k - 2$ sets: $X_\ell := \{d : 0 \leq d \leq (\frac{\ell}{k} + \gamma) n\}$ for $1 \leq \ell \leq k - 1$, and $Y_\ell := \{d : 0 \leq d \leq (\frac{\ell - 1}{k} - \gamma) n\}$ for $2 \leq \ell \leq k$. Let $B$ be the event that $|\bar{c}(X) - c(X)| \leq \gamma$ for every $X \in \{X_1, \dots, X_{k-1}, Y_2, \dots, Y_k\}$. By using (6) and the union bound, we get that $\mathbb{P}[B] \geq 5/6$. Hence $\mathbb{P}[A \cap B] \geq 2/3$.

Let us assume from now on that both $A$ and $B$ occurred, and show that under this assumption, $(\alpha_1, \dots, \alpha_k)$ indeed $\delta$-approximates $G$. First, observe that for every $1 \leq \ell \leq k - 1$, we have
\[ \frac{1}{n} \cdot \left| \left\{ v \in V(G) : \frac{d_G(v)}{n} \leq \frac{\ell}{k} + \gamma \right\} \right| = c(X_\ell) \geq \bar{c}(X_\ell) - \gamma = \frac{1}{s} \cdot \left| \left\{ 1 \leq i \leq s : \frac{d_G(v_i)}{n} \leq \frac{\ell}{k} + \gamma \right\} \right| - \gamma \geq \frac{1}{s} \cdot \left| \left\{ 1 \leq i \leq s : \frac{\bar{d}_i}{t} \leq \frac{\ell}{k} \right\} \right| - \gamma = \alpha_1 + \dots + \alpha_\ell - \gamma, \tag{7} \]
where the first inequality holds because $B$ occurred, the second inequality holds because having $\frac{\bar{d}_i}{t} \leq \frac{\ell}{k}$ implies that $\frac{d_G(v_i)}{n} \leq \frac{\ell}{k} + \gamma$, as $A$ occurred, and the last equality follows from the definition of $\alpha_1, \dots, \alpha_k$. Note that (7) also trivially holds for $\ell = k$, since the left-hand side equals 1.

Similarly, for every $2 \leq \ell \leq k$ we have
\[ \frac{1}{n} \cdot \left| \left\{ v \in V(G) : \frac{d_G(v)}{n} \leq \frac{\ell - 1}{k} - \gamma \right\} \right| = c(Y_\ell) \leq \bar{c}(Y_\ell) + \gamma = \frac{1}{s} \cdot \left| \left\{ 1 \leq i \leq s : \frac{d_G(v_i)}{n} \leq \frac{\ell - 1}{k} - \gamma \right\} \right| + \gamma \leq \frac{1}{s} \cdot \left| \left\{ 1 \leq i \leq s : \frac{\bar{d}_i}{t} \leq \frac{\ell - 1}{k} \right\} \right| + \gamma = \alpha_1 + \dots + \alpha_{\ell - 1} + \gamma. \tag{8} \]
Again, (8) trivially also holds for $\ell = 1$.

We now define subsets $A_1, \dots, A_k \subseteq V(G)$ as follows. Suppose without loss of generality that $V(G) = [n]$ and $d_G(1) \leq \dots \leq d_G(n)$ (otherwise simply relabel the vertices of $G$). For each $1 \leq \ell \leq k$, define
\[ A_\ell = \left\{ x \in [n] : (\alpha_1 + \dots + \alpha_{\ell - 1} + \gamma) n + 1 \leq x \leq (\alpha_1 + \dots + \alpha_\ell - \gamma) n \right\} \]
(some of these sets might be empty). Then $|A_\ell| = \max\{0, (\alpha_\ell - 2\gamma) n\}$ for every $1 \leq \ell \leq k$, and $A_1, \dots, A_k$ are pairwise disjoint. By (7), there are at least $(\alpha_1 + \dots + \alpha_\ell - \gamma) n$ vertices $v \in V(G)$ with $d_G(v) \leq (\frac{\ell}{k} + \gamma) \cdot n$, which implies that $d_G(v) \leq (\frac{\ell}{k} + \gamma) \cdot n$ for every $v \in A_\ell$ (as the vertices of $G$ are ordered in non-decreasing order of their degrees). Similarly, by (8), there are at most $(\alpha_1 + \dots + \alpha_{\ell - 1} + \gamma) n$ vertices $v \in V(G)$ with $d_G(v) \leq (\frac{\ell - 1}{k} - \gamma) \cdot n$, which implies that $d_G(v) > (\frac{\ell - 1}{k} - \gamma) \cdot n$ for every $v \in A_\ell$. Observe also that $\sum_{\ell=1}^{k} |A_\ell| \geq \sum_{\ell=1}^{k} (\alpha_\ell - 2\gamma) n = n - 2k\gamma n$. Hence, $Z := V(G) \setminus (A_1 \cup \dots \cup A_k)$ has size at most $2k\gamma n$.

To conclude, we compare the degree sequence of $G$ with the sequence $d(n, \alpha)$, with the goal of showing that $\alpha = (\alpha_1, \dots, \alpha_k)$ indeed $\delta$-approximates $G$. Recall that for each $1 \leq \ell \leq k$, the sequence $d(n, \alpha)$ has exactly $\alpha_\ell n$ coordinates equal to $\frac{\ell - 1}{k} n$. By comparing $|A_\ell|$ of these coordinates to the vertices of $A_\ell$ (for each $1 \leq \ell \leq k$), and comparing the remaining coordinates to the vertices of $Z$, we see that the $\ell_1$-distance between $d(n, \alpha)$ and (a suitable permutation of) the degree sequence of $G$ is at most
\[ \sum_{\ell=1}^{k} \sum_{v \in A_\ell} \left| d_G(v) - \frac{\ell - 1}{k} n \right| + |Z| n \leq \sum_{\ell=1}^{k} |A_\ell| \cdot \left( \frac{1}{k} + \gamma \right) n + |Z| n \leq \left( \frac{1}{k} + \gamma \right) n^2 + |Z| n \leq \left( \frac{1}{k} + (2k + 1) \gamma \right) n^2 \leq \delta n^2. \]
Here the first inequality uses the fact that $(\frac{\ell - 1}{k} - \gamma) \cdot n < d_G(v) \leq (\frac{\ell}{k} + \gamma) \cdot n$ for every $v \in A_\ell$ and $1 \leq \ell \leq k$, and the last inequality follows from our choice of $k$ and $\gamma$ (indeed, $\frac{1}{k} + (2k + 1)\gamma = \frac{2}{k} \leq \delta$). This completes the proof.

Recall that a degree-sequence property is a set of realizable sequences which is closed under permuting the coordinates, and that the graph property $\mathcal{P}(\mathcal{D})$ corresponding to a degree-sequence property $\mathcal{D}$ is the set of all graphs whose degree sequence is in $\mathcal{D}$. Recall that the normalized $\ell_1$-distance between sequences $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ is defined as $\ell_1(x, y) := \frac{1}{n^2} \cdot \sum_{i=1}^{n} |x_i - y_i|$. The following is the precise form of Theorem 1.2 that we will prove.

Theorem 3.2.