Exploring the Subgraph Density-Size Trade-off via the Lovász Extension
Aritra Konar
University of Virginia, Charlottesville, Virginia, USA
[email protected]
Nicholas D. Sidiropoulos
University of Virginia, Charlottesville, Virginia, USA
[email protected]
ABSTRACT
Given an undirected graph, the Densest-k-Subgraph problem (DkS) seeks to find a subset of k vertices such that the sum of the edge weights in the corresponding subgraph is maximized. The problem is known to be NP-hard, and is also very difficult to approximate in the worst case. In this paper, we present a new convex relaxation for the problem. Our key idea is to reformulate DkS as minimizing a submodular function subject to a cardinality constraint. Exploiting the fact that submodular functions possess a convex, continuous extension (known as the Lovász extension), we propose to minimize the Lovász extension over the convex hull of the cardinality constraints. Although the Lovász extension of a submodular function does not admit an analytical form in general, for DkS we show that it does. We leverage this result to develop a highly scalable algorithm based on the Alternating Direction Method of Multipliers (ADMM) for solving the relaxed problem. Coupled with a pair of fortuitously simple rounding schemes, we demonstrate that our approach outperforms existing baselines on real-world graphs and can yield high-quality sub-optimal solutions which, a posteriori, are typically no worse than 65-80% of the optimal density.

CCS CONCEPTS
• Mathematics of computing → Graph algorithms; Continuous optimization; Submodular optimization and polymatroids.

KEYWORDS
Dense subgraphs; submodularity; Lovász extension; convex optimization; Alternating Direction Method of Multipliers
ACM Reference Format:
Aritra Konar and Nicholas D. Sidiropoulos. 2021. Exploring the Subgraph Density-Size Trade-off via the Lovász Extension. In Proceedings of the Fourteenth ACM International Conference on Web Search and Data Mining (WSDM '21), March 8-12, 2021, Virtual Event, Israel.
ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3437963.3441756
1 INTRODUCTION
Motivation and Overview: Dense subgraph discovery is a key primitive in graph mining that finds application in diverse disciplines, ranging from computational biology [34] and chemical informatics [32] to network science [9, 17, 41] and fraud detection [20, 40]. Given an unweighted, undirected graph on n vertices, the classical DensestSubgraph problem [18] seeks to determine the subgraph with the largest average degree. The problem can be solved exactly in polynomial time, and approximately (but quasi-optimally) via a greedy algorithm [8]. Recent work has extended these ideas to take into account higher-order structure in graphs [28, 37].

A drawback of the aforementioned approaches is that they do not feature a means of explicitly controlling the size of the desired subgraph. Hence, if one is interested in computing the densest subgraph as a function of the size k, with the aim of exploring the optimal density-size trade-off, an additional cardinality constraint on the subgraph size has to be imposed in the formulation of DensestSubgraph. Unfortunately, this simple modification renders the resulting problem, known as Densest-k-Subgraph (DkS), NP-hard. Furthermore, the problem is known to be notoriously difficult to approximate in the worst case [5, 21, 27].
Prior Art: The state-of-the-art approximation algorithm [4] for the DkS problem provides a worst-case approximation guarantee of O(n^{1/4+eps}) (for some eps > 0) in time n^{O(1/eps)} for every choice of k, which is a very pessimistic result in general. Restricted cases of the problem are known to enjoy better approximation guarantees. For dense graphs, where the number of edges m = Omega(n^2), and for linear subgraph sizes k = Omega(n), a (1 + eps)-approximation algorithm was presented in [1]. However, the result has limited implications for real-world networks, since they are sparse in edges (with m = O(n)) [38]. For general sizes k, a O(n/k)-approximation can be achieved by applying a greedy algorithm [13] or via semidefinite relaxation [12, 36]. Note that in the linear size regime k = Omega(n), this yields a constant-factor approximation. That being said, in practice, for large graphs one is more interested in the sublinear size regime k = o(n), where the bounds again become very pessimistic. Recently, a new semidefinite relaxation approach for DkS was proposed in [6] that guarantees exact recovery in planted dense subgraph models with high probability. However, real-world graphs are not known to obey such synthetic models. Additionally, the high complexity incurred in solving the semidefinite program is a limitation of the approach.

In a departure from such worst-case results, the recent work of [30] approaches the problem via the lens of low-rank matrix factorization. Specifically, it is shown that if the graph adjacency matrix has constant rank (in n), then the DkS problem is solvable in polynomial time. When the adjacency matrix is not of constant rank, solving the problem using a low-rank approximation still yields an a posteriori, graph-dependent upper bound on the optimal density for a given k. Through experiments on real-world graphs, it is shown that the approach yields high-quality solutions that can come close to attaining the upper bound in certain cases.

Approach and Contributions:
In this paper, we propose a new convex relaxation of the DkS problem for obtaining high-quality, sub-optimal solutions. Our contributions can be summarized as follows:

• We reformulate the DkS problem as minimizing a submodular function subject to a cardinality constraint. Leveraging the fact that submodular functions are endowed with a convex, continuous extension (i.e., the Lovász extension), we devise a new convex formulation for DkS that minimizes the Lovász extension over the convex hull of the cardinality constraints.

• In general, the Lovász extension of a submodular function does not admit an analytical form. In this case, however, by judiciously exploiting the structure inherent in the problem, we establish a simple closed-form expression for the Lovász extension. We utilize this result to develop an efficient and scalable algorithm for solving the convex relaxation via an inexact variant of the popular Alternating Direction Method of Multipliers (ADMM) [7, 24], which features computationally lightweight updates and guaranteed convergence.

• The solution of our relaxed problem is not guaranteed to be integral in general. Hence, we perform post-processing via two simple rounding schemes to obtain final integral solutions for DkS. While we do not possess a priori guarantees on the quality of the obtained solution at present, via experiments on real-world graphs we demonstrate that our approach can consistently outperform prominent baselines. In fact, utilizing the upper bound on the optimal edge-density developed in [30], we demonstrate that, a posteriori, our approach can discover dense subgraphs that are typically no worse than 65-80% of the optimal density.

On a final note, to put our contributions into perspective, we note that the prevailing approach to convex relaxation for combinatorial quadratic programming problems has been semidefinite relaxation [26], which is the Lagrangian bi-dual of the original problem, and hence is the closest convex problem to DkS, in a certain sense. Since the Lovász extension is the convex envelope of a submodular function [25], our convex relaxation can be viewed as an alternative which is the closest convex problem to DkS in a different sense (this notion is made precise in Section 4).

2 PRELIMINARIES
We provide an overview of basic concepts regarding submodular functions [2, 15, 25]. Given a set of n objects V = {1, ..., n}, a set function F : 2^V -> R assigns a real value to any subset S ⊆ V.

Definition 1. [Submodularity]
A set function F(.) is said to be submodular if and only if for all subsets A, B ⊆ V, it holds that

F(A ∪ B) + F(A ∩ B) ≤ F(A) + F(B). (1)

The above definition can be equivalently, and more conveniently, restated in the following form.

Definition 2. For all A ⊆ B ⊆ V \ {v}, it holds that

F(A ∪ {v}) − F(A) ≥ F(B ∪ {v}) − F(B). (2)

That is, for such functions, given subsets A ⊆ B ⊆ V \ {v}, the marginal improvement obtained by adding an element v to the larger set B never exceeds that obtained by adding v to its subset A. Simply stated, equation (2) asserts that submodular functions exhibit a diminishing returns property.

Definition 3. [The Lovász extension]
A remarkable feature of submodular functions is that they possess a continuous, convex extension known as the Lovász extension, which extends their domain from V to the unit hypercube [0, 1]^n (recall n = |V|). Formally, the Lovász extension f_L : [0, 1]^n -> R of a submodular function F(.) is defined as

f_L(x) := max_{g ∈ B_F} g^T x, (3)

where the set B_F is the base polytope associated with F(.) and is defined as

B_F := { g ∈ R^n : g^T 1_n = F(V); g^T 1_S ≤ F(S), ∀ S ⊆ V }. (4)

From equation (3), it is evident that the Lovász extension corresponds to the support function of the base polytope B_F, and is thus convex. In fact, it can be shown that f_L(.) is convex if and only if F(.) is submodular. Furthermore, when evaluated at a binary vector x ∈ {0, 1}^n, the Lovász extension equals the value of the submodular function F(.).
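To make the definition concrete, the extension can also be evaluated without touching B_F: sorting the entries of x and accumulating the marginal gains of F along the induced chain of subsets (Edmonds' greedy ordering, discussed further in Section 4) yields f_L(x) whenever F(∅) = 0. The following minimal Python sketch illustrates this on a hypothetical toy set function; it is an illustration of the concept, not the solver developed later in the paper.

```python
import numpy as np

def lovasz_extension(F, x):
    """Evaluate the Lovasz extension f_L(x) via the greedy-ordering formula:
    sort x in decreasing order and accumulate the marginal gains of F along
    the resulting chain of subsets (valid when F(empty set) = 0)."""
    order = np.argsort(-np.asarray(x, dtype=float))
    val, prev, S = 0.0, 0.0, set()
    for i in order:
        S.add(int(i))
        cur = F(frozenset(S))
        val += x[int(i)] * (cur - prev)  # marginal gain weighted by x_i
        prev = cur
    return val

# Hypothetical toy set function: F(S) = -(number of edges inside S).
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
F = lambda S: -sum(1 for (i, j) in edges if i in S and j in S)

# On binary indicator vectors, the extension coincides with F itself.
for S in [set(), {0, 1}, {0, 1, 2}]:
    x = [1.0 if i in S else 0.0 for i in range(4)]
    assert lovasz_extension(F, x) == F(frozenset(S))
```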
3 PROBLEM STATEMENT
In this section, we formally describe the Densest-k-Subgraph (DkS) problem. Consider a weighted, undirected, simple graph G := (V, E, w) on n vertices, with vertex set V := {1, ..., n} and edge set E ⊆ V × V consisting of m := |E| edges. The function w : E -> R_{++} assigns each edge a positive weight, and we collect these weights in a vector w ∈ R^m_{++}. In the special case that w is the vector of all-ones, we say that the graph G is unweighted.

Given a positive integer 1 < k < n, we consider the problem of computing the subset of vertices S ⊂ V of size k such that the sum of the edge weights in the induced subgraph G_S is as large as possible. The DkS problem can be expressed in quadratic programming form as

max_{x ∈ {0,1}^n} (1/2) x^T W x  s.to  1^T x = k, (5)

where W represents the n × n (weighted) adjacency matrix of the graph G. Note that each binary vector x ∈ {0, 1}^n corresponds to the indicator vector of a vertex subset S ⊆ V, i.e., we have

x_i = 1 if i ∈ S, and 0 otherwise. (6)

Hence, for a given subset of vertices S ⊂ V, the objective function counts the total weight of the edges in the subgraph G_S induced by S, while the constraints ensure that S contains precisely k vertices. Regarding computational complexity, problem (5) is known to be NP-hard in its general form (it contains the Maximum Clique problem as a special case [13]). Additionally, the problem also has a documented history of resistance to efficient approximation in polynomial time [5, 21, 27]. Notwithstanding such pessimistic worst-case results, in this paper we devise a new polynomial-time approximation algorithm for the DkS problem that relies on exploiting the combinatorial structure of (5) in a principled manner. Our approach is outlined in the following section.
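For intuition, the formulation (5)-(6) can be checked by brute force on a tiny instance. The sketch below (with a hypothetical 5-vertex toy graph) enumerates all subsets of size k and scores each indicator vector with the quadratic objective; it is meant only to ground the notation, since this enumeration is exponential in general.

```python
import itertools
import numpy as np

# Toy 5-vertex graph (hypothetical): a triangle {0,1,2} plus a path 2-3-4.
n, k = 5, 3
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
W = np.zeros((n, n))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0  # unweighted: all edge weights equal 1

best_val, best_S = -np.inf, None
for S in itertools.combinations(range(n), k):
    x = np.zeros(n)
    x[list(S)] = 1.0          # indicator vector of S, cf. (6)
    val = 0.5 * x @ W @ x     # objective of (5): total edge weight in G_S
    if val > best_val:
        best_val, best_S = val, S

print(best_S, best_val)  # -> (0, 1, 2) 3.0: the triangle is the densest 3-subgraph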
4 THE LOVÁSZ RELAXATION
Consider the following equivalent reformulation of problem (5) in subset selection form

min_{|S| = k} { F(S) := −(1/2) 1_S^T W 1_S }, (7)

where 1_S denotes the binary indicator vector of the subset S ⊂ V. We now make the following crucial observation regarding the cost function.
Theorem 4.1.
The cost function F(.) is submodular.

Proof. Note that for a given subset S, the cost function is linearly separable over the edge set E_S of the induced subgraph G_S, i.e., we have

F(S) = Σ_{(i,j) ∈ E_S} −w_ij, (8)

where w_ij denotes the weight of edge (i, j) ∈ E_S. Since submodularity is preserved under summation, in order to obtain the desired result, it suffices to show that each constituent function

F_ij(S) := −w_ij if i and j ∈ S, and 0 otherwise, (9)

is submodular. Defining the pair of sets A := S ∩ {i}, B := S ∩ {j} and applying Definition 1 then completes the proof. □

Although the above observation does not make the (NP-hard) DkS problem any easier to solve, it does open the door to the following approximation approach. First, we define the set

P := { x ∈ [0, 1]^n : 1^T x = k } (10)

to be the convex hull of the combinatorial sum-to-k constraints. Since submodular functions are endowed with a convex, continuous extension (the Lovász extension) which equals the value of F(.) at all binary {0, 1}^n vectors, the DkS problem can be equivalently expressed as

min f_L(x)  s.to  x ∈ {0, 1}^n ∩ P. (11)

On dropping the combinatorial constraints, we obtain the relaxed problem

min_{x ∈ P} f_L(x), (12)

which we refer to as the Lovász relaxation. Clearly, the above problem is convex, and can be solved in polynomial time to obtain a lower bound on the optimal value of (11).

We now outline our primary motivation for employing the Lovász extension. Before proceeding, we recall a few basics of convex analysis [33]. Given any function f : R^n -> R ∪ {+∞}, its Fenchel conjugate is defined as f*(y) := sup_x { y^T x − f(x) }, which is always closed and convex (even if f(.) is not). Taking the conjugate of f*(.) yields the biconjugate f**(.) of the function f(.), which is also closed and convex, and an under-estimator of f(.), i.e., f** ≤ f. As a matter of fact, the biconjugate f**(.) constitutes the convex closure of f(.), and thus is the tightest convex under-estimator of f(.) (in a certain sense). The link between the Lovász extension of a submodular function and its Fenchel biconjugate is provided by the following result, which is extracted from [2, 25].

Lemma 4.2.
Given a submodular function F(.), define the function

g(x) := F(S) if x = 1_S for some S ⊆ V, and +∞ if x ∉ {0, 1}^n. (13)

Then, the Fenchel biconjugate of g(.) is the Lovász extension of F(.).

Hence, the Lovász extension corresponds to the convex closure, or the tightest convex under-estimator (in the above sense), of the submodular function F(.) on the domain [0, 1]^n, which justifies its use as a principled, continuous relaxation of the quadratic cost function of DkS.

While the above result places the Lovász relaxation (12) on a firm theoretical footing, from an algorithmic perspective, a notable drawback of the approach is that the Lovász extension does not admit an analytical form in general. This stems from the fact that f_L(.) is the support function of the base polytope B_F of F(.) (see equation (3)), which is characterized by (potentially) an exponential number of inequalities in the problem dimension n. In his seminal work [11], Edmonds presented a simple greedy algorithm for computing a subgradient of the Lovász extension at any point x ∈ [0, 1]^n in time O(n log n + nT), where T > 0 is an upper bound on the maximum time taken to evaluate F(.) for any choice of subset S ⊆ V, without explicitly constructing B_F. While this fact can be exploited to solve the Lovász relaxation (12) via a projected subgradient algorithm, such an approach suffers from slow convergence. Indeed, the primal convergence rate (i.e., convergence to the optimal value) of subgradient methods for convex problems is O(1/√t) [29], where t is the number of iterations. Hence, adopting such an approach is limited to producing low-accuracy solutions for large-scale problems.

We now demonstrate that it is possible to solve the Lovász relaxation (hereafter referred to as the L-relaxation) in a substantially more efficient manner. Our key result is that for the DkS problem, we can explicitly characterize the base polytope of the submodular cost function F, which in turn allows us to obtain an analytical form for the Lovász extension. Finally, we apply a primal-dual algorithm that leverages the explicit structure of the problem to compute efficient solutions for the L-relaxation.

Before proceeding, we introduce the following notation: let d := W 1_n represent the (weighted) degree vector of the vertices of G, and B ∈ {−1, 0, 1}^{n×m} denote the directed vertex-edge incidence matrix of the graph G. Note that a column of B corresponds to an edge (i, j) ∈ E, and is of the form (e_i − e_j), where e_i denotes the i-th canonical basis vector in R^n. We are now ready to state our main result.

Theorem 4.3.
The base polytope of F can be expressed as

B_F = { g ∈ R^n : g = (1/2)(−d + Bf), ∀ |f| ≤ w }.

Proof.
Once again, we exploit the fact that the function F can be linearly decomposed as F(S) = Σ_{(i,j)∈E} F_ij(S), where F_ij has been previously defined in (9). An important result [35] asserts that the base polytope of a sum of submodular functions is the (Minkowski) sum of the base polytopes of the summands {F_ij}_{(i,j)∈E}, i.e., we have B_F = Σ_{(i,j)∈E} B_{F_ij}, where B_{F_ij} is the base polytope of F_ij. This suggests that if we can find a simple expression for each constituent base polytope B_{F_ij}, then we can possibly characterize the full polytope B_F.

To this end, consider a component function F_ij. Applying the definition (4), its base polytope can be expressed as

B_{F_ij} = { g ∈ R^n : g_i ≤ 0, g_j ≤ 0, g_i + g_j ≤ −w_ij, g_i + g_j = −w_ij }, (14)

which in turn can be re-expressed as

B_{F_ij} = −w_ij conv(e_i, e_j) = −w_ij [ α_ij (e_i − e_j) + e_j ], ∀ α_ij ∈ [0, 1]. (15)

Introducing the change of variable β_ij := 1 − 2α_ij, we obtain

B_{F_ij} = −(w_ij/2) [ (e_i + e_j) − β_ij (e_i − e_j) ] = (1/2) [ w_ij β_ij (e_i − e_j) − w_ij (e_i + e_j) ], ∀ β_ij ∈ [−1, 1]. (16)

This allows us to obtain the complete representation

B_F = (1/2) [ − Σ_{(i,j)∈E} w_ij (e_i + e_j) + Σ_{(i,j)∈E} w_ij β_ij (e_i − e_j) ]. (17)

Note that the first summand within the brackets is precisely −d, the negative of the (weighted) degree vector (as the contribution of each vertex i ∈ V is Σ_{j:(i,j)∈E} w_ij e_i), while the second summand can be expressed as Bf, where f ∈ R^m is a vector with entries f_ij := w_ij β_ij. Since |β_ij| ≤ 1 by construction, we have |f_ij| ≤ w_ij, and thus |f| ≤ w. Putting everything together, we finally obtain the following characterization of the base polytope:

B_F = (1/2) [ −d + Bf ], ∀ |f| ≤ w, (18)

which yields the desired result. □

As an immediate consequence of the above result, we obtain the following analytical form for the Lovász extension.
Corollary 4.4.
The Lovász extension of F is

f_L(x) = (1/2) ( −d^T x + Σ_{(i,j)∈E} w_ij |x_i − x_j| ).

Proof.
Utilizing the form of the base polytope, we can express the Lovász extension as

f_L(x) = max_{|f| ≤ w} (1/2)(−d + Bf)^T x = (1/2) ( −d^T x + max_{|f| ≤ w} (B^T x)^T f ) = (1/2) ( −d^T x + Σ_{(i,j)∈E} w_ij |x_i − x_j| ), (19)

where in going from the second to the third step we have utilized the fact that the vector B^T x generates pair-wise differences between entries of x that are connected by an edge in G. □

The above result allows us to express the L-relaxation (in maximization form, after dropping the immaterial positive scaling) as

max_{x ∈ P} { d^T x − Σ_{(i,j)∈E} w_ij |x_i − x_j| }. (20)

An intuitive explanation of the above formulation is as follows. For any binary vector x ∈ P that represents an induced subgraph G_S, the first term in the above objective function is a measure of the volume of G_S, i.e., it is the sum of the degrees of all the vertices in the induced subgraph. Meanwhile, the second term, which corresponds to the graph total variation, counts the weighted sum of all edges crossing the boundary of G_S, i.e., it measures the cut. The difference of these two terms is then (twice) the sum of all edge weights in G_S, which is precisely the objective function that the DkS problem seeks to maximize. Equivalently stated, we wish to find an induced subgraph on k vertices with high volume and small cut.

When solving the L-relaxation, we allow for non-binary vectors x ∈ P. In this case, the value of each entry of x is a soft "membership" score that reflects the "likelihood" of a vertex belonging to the k-densest subgraph. The objective function then places higher emphasis on those likelihood profiles where the membership values are largest for those vertices that have large degree and are simultaneously "smooth" (in the total-variation sense) with respect to their one-hop neighbors, which is an intuitive proxy for dense subgraphs of size k. Hence, the L-relaxation constitutes a meaningful relaxation of the DkS problem. That being said, the form of the Lovász extension reveals that problem (20) is neither differentiable nor strongly concave, which constitutes a computational impediment to solving it efficiently at scale. In the next section, we show that by exploiting the structure of the problem in an intelligent fashion, it is in fact possible to develop an efficient and scalable algorithm.
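As a quick numerical sanity check of Corollary 4.4 and the volume-minus-cut reading of (20), the sketch below (Python, with a hypothetical random toy graph) verifies that the closed form agrees with F(S) = −(1/2) 1_S^T W 1_S on random binary vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Random weighted toy graph (hypothetical)
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)]
w = rng.uniform(0.5, 2.0, size=len(edges))
W = np.zeros((n, n))
for (i, j), wij in zip(edges, w):
    W[i, j] = W[j, i] = wij
d = W.sum(axis=1)  # weighted degree vector d = W 1

def f_L(x):
    """Closed-form Lovasz extension from Corollary 4.4."""
    tv = sum(wij * abs(x[i] - x[j]) for (i, j), wij in zip(edges, w))
    return 0.5 * (-d @ x + tv)

# On binary vectors, f_L must equal F(S) = -(1/2) 1_S^T W 1_S.
for _ in range(100):
    x = (rng.random(n) < 0.5).astype(float)
    assert np.isclose(f_L(x), -0.5 * x @ W @ x)
```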
5 A SCALABLE ALGORITHM VIA ADMM
In order to motivate our algorithmic approach, we express the L-relaxation in the following manner. First, we define the functions

g(x) := −d^T x if x ∈ P, and +∞ otherwise, (21)

and

h(x) := Σ_{(i,j)∈E} w_ij |x_i − x_j| = ||D B^T x||_1, (22)

where D := diag(w) denotes a diagonal matrix containing the edge weights w. We can now express problem (12) as

min_{x ∈ R^n} g(x) + h(x), (23)

which in turn is equivalent to

min_{x, z ∈ R^n} g(x) + h(z)  s.to  x − z = 0. (24)

The above problem is now in a form suitable for the application of the Alternating Direction Method of Multipliers (ADMM) [7, 24] - a flexible framework for solving convex optimization problems that fuses the benefits of dual decomposition and augmented Lagrangian techniques into a simple primal-dual algorithm. The main utility of ADMM is that it decomposes complicated cost functions into simpler components (these can be non-smooth or even represent embedded constraints) via variable splitting and allows them to be handled separately, while featuring guaranteed convergence to the optimal solution of the problem under very mild assumptions. While being a very general framework for solving convex optimization problems, ADMM is most efficient when its subproblems admit an analytical or simple computational solution.

For the particular form of variable splitting employed in problem (24), it can be shown that the ADMM updates are given by

x^{t+1} = prox_{ρg}(z^t − u^t), (25a)
z^{t+1} = prox_{ρh}(x^{t+1} + u^t), (25b)
u^{t+1} = u^t + x^{t+1} − z^{t+1}, (25c)

where u ∈ R^n is the normalized dual variable associated with the consensus constraint, ρ > 0 is a penalty parameter, and

prox_{ρf}(v) := arg min_{x ∈ R^n} { f(x) + (ρ/2) ||x − v||_2^2 } (26)

denotes the proximal operator [31] of a closed, proper, convex function f. It has been shown [19] that the algorithm converges at a rate of O(1/t), which represents an order-of-magnitude improvement over subgradient methods. However, since ADMM accesses the functions g, h via their proximal operators, the overall efficiency of the algorithm depends on the complexity of evaluating these operators.

First, we focus on the complexity of the x-update, i.e., computing the proximal operator of the function g. Our next result shows that it admits a simple solution.

Lemma 5.1.
The optimal solution x* := prox_{ρg}(v) is characterized by the pair of conditions

x*_i = max { min ( v_i + (1/ρ)(d_i − ν*), 1 ), 0 }, ∀ i ∈ [n],
Σ_{i=1}^n x*_i = k,

where ν* ∈ R is the optimal dual variable associated with the sum-to-k constraint.

Proof. Define the function

f̃(x) := −d^T x + (ρ/2)||x − v||_2^2 if 0 ≤ x ≤ 1, and +∞ otherwise. (27)

Then, the proximal operator of g is given by

prox_{ρg}(v) = arg min_{1^T x = k} f̃(x). (28)

The Lagrangian of the above problem is

L(x, ν) := −d^T x + (ρ/2)||x − v||_2^2 + ν(1^T x − k) if 0 ≤ x ≤ 1, and +∞ otherwise, (29)

where ν ∈ R is the dual variable associated with the equality constraint. Let (x*, ν*) denote the primal-dual optimal pair of (28). The Karush-Kuhn-Tucker (KKT) conditions (which are necessary and sufficient for optimality in this case) assert that the pair (x*, ν*) satisfies

x* = arg min_{0 ≤ x ≤ 1} L(x, ν*), 1^T x* = k. (30)

Since the Lagrangian is linearly separable in x, the first condition simplifies to

x*_i = arg min_{0 ≤ x_i ≤ 1} { (ν* − d_i) x_i + (ρ/2)(x_i − v_i)^2 }, ∀ i ∈ [n]. (31)

The solution of each sub-problem can be computed in closed form as

x*_i = 0 if v_i < −(1/ρ)(d_i − ν*);
x*_i = v_i + (1/ρ)(d_i − ν*) if v_i ∈ [ −(1/ρ)(d_i − ν*), 1 − (1/ρ)(d_i − ν*) ];
x*_i = 1 if v_i > 1 − (1/ρ)(d_i − ν*), (32)

which can be compactly represented as

x*_i = max { min ( v_i + (1/ρ)(d_i − ν*), 1 ), 0 }, ∀ i ∈ [n]. (33) □

The above observation suggests a very simple approach to computing (x*, ν*). Define the non-linear equation

φ(ν) := Σ_{i=1}^n max { min ( v_i + (1/ρ)(d_i − ν), 1 ), 0 } − k, (34)

which is monotone, non-increasing in ν.
Since φ(ν*) = 0, in order to solve for ν* (and hence x*), we can resort to bisection search. We choose the lower and upper limits of the initial bisection interval to be ν_l := min_{i∈[n]} { d_i + ρ v_i } − ρ and ν_u := max_{i∈[n]} { d_i + ρ v_i }, respectively, which yields the initial value interval [φ(ν_l), φ(ν_u)] = [n − k, −k]. Pseudocode for the bisection algorithm is provided in Algorithm 1.
Algorithm 1: Bisection(v, d, k, ρ, ε)
Input: v ∈ R^n, degree vector d ∈ R^n, subgraph size k, parameter ρ > 0, exit tolerance ε > 0.
Output: The solution x* := prox_{ρg}(v).
Initialize: ν_l = min_{i∈[n]} { d_i + ρ v_i } − ρ, ν_u = max_{i∈[n]} { d_i + ρ v_i }
repeat
  ν_m = (ν_l + ν_u)/2
  if φ(ν_m) φ(ν_u) < 0 then ν_l = ν_m else ν_u = ν_m
until φ(ν_l) − φ(ν_u) ≤ ε
Return: x*_i = max { min ( v_i + (1/ρ)(d_i − ν_m), 1 ), 0 }, ∀ i ∈ [n].

Note that for a prescribed exit tolerance ε, the maximum number of bisection steps is O(log[φ(ν_l) − φ(ν_u)]), which, for our choice of initial interval {ν_l, ν_u}, is only O(log n). Hence, the maximum number of steps required by the bisection algorithm to terminate grows only logarithmically with the problem dimension n. We conclude that the above algorithm is an efficient means for evaluating the proximal operator of the function g.
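A compact Python sketch of Algorithm 1 is given below; it follows Lemma 5.1 directly, and the variable names mirror the pseudocode. The extra interval-width guard is our addition for numerical safety.

```python
import numpy as np

def prox_g_bisection(v, d, k, rho, eps=1e-9):
    """Evaluate prox_{rho*g}(v) of Lemma 5.1 by bisection on the dual
    variable nu of the sum-to-k constraint (Algorithm 1)."""
    x = lambda nu: np.clip(v + (d - nu) / rho, 0.0, 1.0)   # cf. (33)
    phi = lambda nu: x(nu).sum() - k                       # cf. (34)
    nu_l = np.min(d + rho * v) - rho    # phi(nu_l) = n - k (all entries at 1)
    nu_u = np.max(d + rho * v)          # phi(nu_u) = -k   (all entries at 0)
    while phi(nu_l) - phi(nu_u) > eps:
        nu_m = 0.5 * (nu_l + nu_u)
        if phi(nu_m) > 0:               # too many entries still "active"
            nu_l = nu_m
        else:
            nu_u = nu_m
        if nu_u - nu_l < 1e-15:         # numerical safety guard (our addition)
            break
    return x(0.5 * (nu_l + nu_u))
```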
We now turn our attention towards assessing the complexity of computing the proximal operator of the graph total-variation function h. Unfortunately, this problem does not admit a simple analytical or computational solution. While its solution can be obtained by solving a sequence of maximum-flow problems [16], this incurs complexity O(mn log(n^2/m)), which, even for sparse graphs (with m = O(n)), is O(n^2 log n). Hence, owing to the high computational complexity of the z-update, the ADMM framework applied to (24) is not scalable to large instances.

In hindsight, the above difficulty appears to stem from the fact that our choice of variable splitting was not effective in yielding simple ADMM updates. Consequently, with the aim of obtaining efficient updates, we introduce a different type of variable splitting. With some abuse of notation, we redefine the function h as

h(z) := ||Dz||_1. (35)

Then, the L-relaxation (12) can be equivalently expressed as

min_{x ∈ R^n, z ∈ R^m} g(x) + h(z)  s.to  B^T x − z = 0. (36)

The ADMM updates for this problem can be shown to be

x^{t+1} = arg min_x { g(x) + (ρ/2)||B^T x − z^t + u^t||_2^2 }, (37a)
z^{t+1} = prox_{ρh}(B^T x^{t+1} + u^t), (37b)
u^{t+1} = u^t + B^T x^{t+1} − z^{t+1}, (37c)

where u ∈ R^m is the normalized dual variable associated with the coupling constraint and ρ > 0 is the penalty parameter. The proximal operator of h admits an analytical solution given by [31, Section 6.5.2]

Shrinkage(v, w, ρ) := max(0, v − w/ρ) − max(0, −v − w/ρ). (38)
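In code, the operator (38) is a single line of vectorized soft-thresholding; a minimal sketch under the paper's prox convention (26):

```python
import numpy as np

def shrinkage(v, w, rho):
    """Entrywise soft-thresholding: evaluates prox_{rho*h}(v) for
    h(z) = ||diag(w) z||_1, cf. equation (38)."""
    return np.maximum(0.0, v - w / rho) - np.maximum(0.0, -v - w / rho)
```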
However, the downside is that the simplicity of the x-update does not carry over from the previous incarnation of ADMM (it is no longer the proximal operator of g), which again hinders the scalability of the algorithm.

The lesson to be learned is that although the functions g and h have proximal operators which can be evaluated efficiently, the matrix B is the "troublesome" component, as it complicates the primal updates in ADMM no matter how we elect to perform variable splitting. While this seems like a major drawback of ADMM for our problem, it turns out that there is an inexact version of ADMM which can provide the desired solution. To be precise, we invoke the framework of Linearized-ADMM (L-ADMM) [10]. In order to motivate the approach, we denote the augmented Lagrangian associated with problem (36) as

L_ρ(x, z, y) := g(x) + h(z) + y^T(B^T x − z) + (ρ/2)||B^T x − z||_2^2, (39)

where y ∈ R^m is the dual variable corresponding to the coupling constraint. In standard ADMM, the x-update is computed by minimizing L_ρ(x, z, y) with respect to (w.r.t.) x while keeping the other variables fixed. In L-ADMM, this update is modified by linearizing the quadratic term in the augmented Lagrangian and adding a new proximal regularization, i.e., replacing (ρ/2)||B^T x − z^t||_2^2 in (39) by

ρ (BB^T x^t − Bz^t)^T x + (1/(2μ)) ||x − x^t||_2^2,

where 0 < μ ≤ 1/(ρ||B||_2^2) is a regularization parameter. After working out the updates, the algorithm takes the following form:

x^{t+1} = prox_{g/μ}(x^t − μρ B(B^T x^t − z^t + u^t)), (40a)
z^{t+1} = prox_{ρh}(B^T x^{t+1} + u^t), (40b)
u^{t+1} = u^t + B^T x^{t+1} − z^{t+1}. (40c)

It is evident that L-ADMM accesses both of the functions g and h via their proximal operators only, in contrast to the variants of ADMM considered previously. Hence, each round of updates can be carried out efficiently, as we have already demonstrated that the proximal operators are easy to compute. We point out that although L-ADMM employs inexact updates, it is still guaranteed to converge to the optimal solution of (36). An even more remarkable feature of L-ADMM is that its convergence does not degrade compared to standard ADMM [19], i.e., it enjoys the same O(1/t) convergence rate. Hence, the L-ADMM algorithm features both lightweight updates and fast convergence. Pseudocode for the algorithm is summarized in Algorithm 2. In practice, we employ an over-relaxation technique [7, Section 3.4], i.e., we replace the term B^T x^{t+1} in the z, u updates by α B^T x^{t+1} + (1 − α) z^t, where α > 1 is the over-relaxation parameter.
Algorithm 2: L-ADMM
Input: degree vector d ∈ R^n, edge weight vector w ∈ R^m, directed vertex-edge incidence matrix B ∈ {−1, 0, 1}^{n×m}, subgraph size k, penalty parameter ρ > 0, regularization parameter μ > 0, over-relaxation parameter α ∈ [1.5, 1.8], bisection exit tolerance ε > 0.
Output: A solution of the L-relaxation.
Initialize: x^0 = supp(top-k(d)), z^0 = B^T x^0, u^0 = 0, μ = 1/(ρ||B||_2^2), t ← 0
repeat
  x^{t+1} = Bisection(x^t − μρ B(B^T x^t − z^t + u^t), d, k, ρ, ε)
  z^{t+1} = Shrinkage(α B^T x^{t+1} + (1 − α) z^t + u^t, w, ρ)
  u^{t+1} = u^t + α B^T x^{t+1} + (1 − α) z^t − z^{t+1}
  t ← t + 1
until convergence criterion is met
Return: x̄_L = (1/t) Σ_{i=1}^t x^i

Finally, since the solution x̄_L computed by L-ADMM is not guaranteed to be integral in general, we require a post-processing step in order to "round" the solution of the L-relaxation into a binary indicator vector. One such step is to simply project the solution onto the discrete sum-to-k constraints, i.e., we compute

x̂ ∈ arg min_{x ∈ {0,1}^n, 1^T x = k} ||x − x̄_L||_2^2 = supp(top-k(x̄_L)), (41)

which is tantamount to identifying the support of the k largest entries in x̄_L, and can be performed in O(nk) time.
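The projection (41) amounts to a partial top-k selection; a minimal Python sketch:

```python
import numpy as np

def round_topk(x_L, k):
    """Project onto the discrete sum-to-k set, cf. (41): keep the support
    of the k largest entries of the relaxed solution x_L."""
    x = np.zeros_like(x_L)
    x[np.argpartition(-x_L, k - 1)[:k]] = 1.0  # top-k via partial selection
    return x
```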
Additionally, we also employ an algorithmic refinement scheme where we use the solution of the L-relaxation to initialize a local-search algorithm. In this scheme, we consider the following indefinite relaxation of the DkS problem

min_{x ∈ P} { f(x) := −(1/2) x^T W x }, (42)

which is not convex, and hence cannot be optimally solved in polynomial time in general. Consequently, we employ the Frank-Wolfe (FW) algorithm [14], initialized with the solution computed by L-ADMM, in order to obtain a high-quality sub-optimal solution. This is summarized in Algorithm 3. Under the prescribed step-size rule, the algorithm is guaranteed to converge to a stationary point of problem (42) [3, p. 268].

Algorithm 3: Frank-Wolfe
Input: Adjacency matrix W ∈ R^{n×n}, subgraph size k, solution of L-ADMM x̄_L, Lipschitz constant L = ||W||_2.
Output: An approximate solution of the indefinite relaxation (42).
Initialize: x^0 = x̄_L, t ← 0
repeat
  g^t = −W x^t
  x̄^t = supp(top-k(−g^t))
  α_t = min { 1, −((x̄^t − x^t)^T g^t) / (L ||x̄^t − x^t||_2^2) }
  x^{t+1} = x^t + α_t (x̄^t − x^t)
  t ← t + 1
until convergence criterion is met
Return: x^t
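A Python sketch of Algorithm 3 follows; the linear minimization oracle over P selects the k most negative gradient entries, and the small-denominator guard is our addition.

```python
import numpy as np

def frank_wolfe(W, k, x0, iters=100):
    """Frank-Wolfe on f(x) = -(1/2) x^T W x over P (Algorithm 3)."""
    L = np.linalg.norm(W, 2)            # Lipschitz constant of the gradient
    x = x0.astype(float).copy()
    for _ in range(iters):
        g = -W @ x                      # gradient of f at x
        xbar = np.zeros_like(x)         # LMO: argmin_{y in P} g^T y
        xbar[np.argpartition(g, k - 1)[:k]] = 1.0   # k smallest entries of g
        diff = xbar - x
        denom = L * (diff @ diff)
        if denom <= 1e-12:              # guard against a zero step (our addition)
            break
        alpha = min(1.0, -(diff @ g) / denom)       # adaptive step size
        x += alpha * diff
    return x
```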
Table 1: Summary of network statistics: the number of vertices (n), the number of edges (m), and the network type.

Graph         n    m    Network Type
polBlog
Facebook
ppi-Human
loc-Gowalla
web-Google
YouTube
as-Skitter
wiki-Talk

6 EXPERIMENTAL RESULTS
In this section, we test the efficacy of the combined L-relaxation and post-processing schemes in discovering k-densest subgraphs across a diverse set of real-world graphs. We perform comparisons against a slew of state-of-the-art benchmarks to illustrate the superior performance of our approach.

Datasets: A summary of the datasets used can be found in Table 1; they were retrieved from standard repositories [22, 23]. We pre-processed the datasets (which are unweighted) by symmetrizing the arcs if the network was originally directed, removing all self-loops, and extracting the largest connected component.
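A sketch of this pre-processing pipeline in Python (the paper's experiments used Matlab; the SciPy routines here are our stand-ins):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import connected_components

def preprocess(A):
    """Symmetrize a (possibly directed) adjacency matrix, drop self-loops,
    and restrict to the largest connected component."""
    A = sp.csr_matrix(((A + A.T) > 0).astype(float))  # symmetrize arcs
    A.setdiag(0)                                      # remove self-loops
    A.eliminate_zeros()
    _, labels = connected_components(A, directed=False)
    idx = np.where(labels == np.bincount(labels).argmax())[0]
    return A[idx][:, idx]                             # largest connected component
```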
Baselines: In order to benchmark the performance of our algorithm, we employed the following baselines.

(1) Greedy: The greedy approximation algorithm proposed in [13, Procedure 2]. Given an unweighted graph G and a desired subgraph size k, the algorithm first constructs a set H containing the k/2 vertices of G with the largest degrees, and then adds the k/2 vertices in V \ H which have the largest number of one-hop neighbors in H.

(2) Truncated Power Method (TPM):
A variant of the classic power method applied to the DkS formulation (5) [39, Algorithm 2]. At each step, the algorithm performs standard power-method iterations, followed by projecting the result onto the discrete sum-to-k set to ensure iterate feasibility.

(3) Low-rank Binary Principal Component: In this approach [30], a low-rank decomposition of the adjacency matrix W is first performed, followed by solving the DkS problem with the low-rank approximation in place of W. It turns out that in the rank-1 approximation case, the resulting problem admits a simple solution in O(n) time, whereas for constant ranks (i.e., r = O(1)), instead of checking all (n choose k) possible subsets in the worst case, the problem can surprisingly be solved in polynomial time O(n^{r+1}). In practice, it is only feasible to run the algorithm for ranks r ≤ 5, owing to its high complexity. In fact, we were only able to run the algorithm with the rank-1 approximation for all the datasets considered herein, as even the rank-2 case proved too expensive for all but the two smallest datasets.

(4) Edge-density upper bound:
An important feature of the above approach is that the solution of the DkS problem with the rank-r approximation yields an a posteriori, data-dependent upper bound on the optimal value of the DkS problem. In formal terms, let σ_1 ≥ σ_2 ≥ ··· ≥ σ_d denote the d ≤ n non-zero singular values of W. If W_r denotes the rank-r approximation of W, with ||W − W_r||_2 = σ_{r+1}, and S*_r denotes the optimal solution of the rank-r approximation problem for a given k, then the quantity

min { 1, (1_{S*_r}^T W_r 1_{S*_r} + k σ_{r+1}) / (k(k−1)), σ_1 / (k−1) }

constitutes an upper bound on the edge-density of the optimal k-densest subgraph (see [30, Lemma 3]). The utility of the above result is that it provides a benchmark for assessing the sub-optimality of a solution generated by any algorithm that aims to solve the DkS problem. Although the upper bound is not attainable in general for every k, we demonstrate that the subgraphs computed by our approach can come close to attaining it on real-world graphs for a large range of k (see the sketch below).
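As noted in item (4) above, the bound is cheap to evaluate once the rank-r problem has been solved. A minimal sketch, assuming the k(k−1) edge-density normalization reconstructed above (the inputs quad_r, sigma_1, and sigma_r1 are the precomputed quantities from [30]):

```python
def density_upper_bound(quad_r, sigma_1, sigma_r1, k):
    """A posteriori edge-density upper bound in the spirit of [30, Lemma 3].
    quad_r  : 1_{S*_r}^T W_r 1_{S*_r}, the rank-r solution's objective value
    sigma_1 : largest singular value of W
    sigma_r1: (r+1)-st singular value of W
    The k(k-1) normalization assumes density = (edge weight) / C(k, 2)."""
    return min(1.0, (quad_r + k * sigma_r1) / (k * (k - 1)), sigma_1 / (k - 1))
```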
We performed all our experiments in Matlab on a Windows workstation equipped with 16GB RAM and an Intel i7 processor. We used Matlab code for the low-rank principal component approximation approach and TPM [30].

L-ADMM: Regarding the implementation of our L-ADMM algorithm for solving the L-relaxation, we set the ADMM penalty parameter ρ = 0.1, the proximal regularization parameter μ = 1/(ρ||B||_2^2), and the over-relaxation parameter α = 1.8. The exit tolerance ε for the bisection subroutine was set to a small fixed value. The termination criterion of the ADMM algorithm was based on a standard measure [7, Section 3.3]: given a pair of absolute and relative tolerances ε_abs and ε_rel, respectively, at each iteration t of ADMM we compute the primal and dual tolerances

ε_pri = √m ε_abs + ε_rel max { ||B^T x^t||_2, ||z^t||_2 }, (43a)
ε_dual = √n ε_abs + ε_rel ||B u^t||_2. (43b)

Defining the primal and dual residuals r^t := B^T x^t − z^t and s^t := B(z^t − z^{t−1}), respectively, we stop the algorithm when these residuals are small, in the sense that ||r^t||_2 ≤ ε_pri and ||s^t||_2 ≤ ε_dual, or when a maximum of 3000 iterations have been performed. In our experiments, we used the same pair of tolerances ε_abs = ε_rel for all datasets excepting web-Google and YouTube, for which a different setting was used.
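A sketch of this termination test (Python; the factor ρ on the dual residual is absorbed into the normalized variable u, matching the text):

```python
import numpy as np

def admm_converged(Bt_x, z, z_prev, B, u, eps_abs, eps_rel):
    """ADMM stopping test of (43a)-(43b) with residuals r^t and s^t."""
    m, n = Bt_x.size, B.shape[0]
    eps_pri = np.sqrt(m) * eps_abs + eps_rel * max(np.linalg.norm(Bt_x),
                                                   np.linalg.norm(z))
    eps_dual = np.sqrt(n) * eps_abs + eps_rel * np.linalg.norm(B @ u)
    r = Bt_x - z                  # primal residual
    s = B @ (z - z_prev)          # dual residual (up to the factor rho)
    return np.linalg.norm(r) <= eps_pri and np.linalg.norm(s) <= eps_dual
```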
FW and TPM: We initialized both algorithms with the solution returned by the L-ADMM algorithm. Note that for TPM, the solution of the L-relaxation is a superior initialization compared to selecting the support of the k vertices with the largest degrees (originally proposed in [39]), i.e., here we give TPM the benefit of the doubt. The algorithms are run until they attain convergence in terms of the cost function, or a maximum of 100 iterations is reached. Finally, while the solution of FW is not guaranteed to be integral in general, we observed in our experiments that the algorithm returns a solution that is integral (up to machine precision), and thus we did not perform a rounding step at the end.

Results: The outcomes of our experiments are depicted in Figures 1 and 2, which depict the edge density of the subgraphs determined by the methods and the runtimes versus the subgraph size k, respectively. Our main findings are as follows:

• The upper bound on the optimal edge-density computed from solving the low-rank approximation to the DkS problem is very useful in gauging the sub-optimality of the solutions computed by the different methods. It reveals that, in contrast to pessimistic worst-case results regarding the DkS problem, several methods (with the exception of the greedy algorithm) can yield high-quality solutions on real-world graphs.

• Our proposed approach, the L-relaxation coupled with the two rounding techniques (projection and iterative refinement via the Frank-Wolfe algorithm), performs very well. In particular, the latter scheme is consistently the best overall, outperforming TPM, the low-rank approximation, and the solution obtained by projecting the L-ADMM solution. Although TPM shares the same initialization as FW, it can exhibit non-monotone behavior with regard to density as the size is varied. We attribute this to the fact that FW is guaranteed to converge to a stationary point of the indefinite relaxation (which is empirically observed to be integral), whereas TPM simply increases the objective function of DkS. Furthermore, for small values of k ≤ 100 (the regime where one intuitively expects the densest subgraphs to be present), L-ADMM + FW can attain the upper bound in many cases, which is clearly optimal; otherwise, it attains the most significant fraction of the upper bound (typically 65-80%).

• The runtime of the rank-2 approximation algorithm scales unfavorably relative to the other methods, and hence it is omitted from the larger datasets. While ADMM comes second in terms of complexity, it is by no means unaffordable, taking an average of 15 minutes to terminate on the largest graphs. This is due to the simplicity of its subroutines, which require performing bisection search and shrinkage at each step. Additionally, compared to running ADMM, the complexity of performing iterative refinement via the Frank-Wolfe algorithm is substantially smaller.

Our investigation reveals that solving the L-relaxation via L-ADMM, followed by refining the solution via a few iterations of FW, constitutes a potent and efficient algorithmic framework for effectively mining dense subgraphs from real-world graphs.

Figure 1: Edge density vs size (panels: polBlog, Facebook, PPI-Human, loc-Gowalla, web-Google, YouTube, as-Skitter, wiki-Talk). We ran the rank-2 approximation only on the 2 smallest datasets owing to its complexity. The greedy algorithm is omitted from comparison on Facebook owing to its poor performance relative to the other baselines. For Facebook and polBlog, the upper bound is computed w.r.t. the rank-2 approximation, while it is w.r.t. the rank-1 approximation on the remaining datasets.

Figure 2: Runtime vs size on selected, representative datasets (Facebook, YouTube, as-Skitter, wiki-Talk), owing to space constraints.
7 CONCLUSIONS
We considered the Densest-k-Subgraph (DkS) problem, and reformulated it as minimizing a submodular cost function subject to a cardinality constraint. Adopting this viewpoint, we proposed a convex relaxation of DkS that minimizes the Lovász extension of the submodular cost function over the convex hull of the cardinality constraint. While the Lovász extension does not admit a closed-form expression in general, we showed that for DkS it does admit an analytical form. We exploited this form to develop an efficient algorithm, based on an inexact variant of the Alternating Direction Method of Multipliers (ADMM), that is capable of solving the relaxed problem at scale. After rounding the solution returned by ADMM via the proposed schemes, we conducted experiments on real-world graphs to showcase the effectiveness of our approach compared to prevailing baselines. Contrary to pessimistic worst-case results, our relaxation scheme is very effective at exploring the edge-density vs size curve in real-world graphs, yielding subgraphs that are no worse than 65-80% of the optimal density.

ACKNOWLEDGMENTS
Supported by the National Science Foundation and the Army Research Office under Grants No. IIS-1908070 and ARO-W911NF1910407, respectively.
REFERENCES
[1] Sanjeev Arora, David Karger, and Marek Karpinski. 1999. Polynomial time approximation schemes for dense instances of NP-hard problems. Journal of Computer and System Sciences 58, 1 (1999), 193-210.
[2] Francis Bach et al. 2013. Learning with Submodular Functions: A Convex Optimization Perspective. Foundations and Trends in Machine Learning 6, 2-3 (2013), 145-373.
[3] Dimitri P. Bertsekas. 2016. Nonlinear Programming. Athena Scientific.
[4] Aditya Bhaskara, Moses Charikar, Eden Chlamtac, Uriel Feige, and Aravindan Vijayaraghavan. 2010. Detecting high log-densities: an O(n^{1/4}) approximation for densest k-subgraph. In Proceedings of the Forty-Second ACM Symposium on Theory of Computing. 201-210.
[5] Aditya Bhaskara, Moses Charikar, Venkatesan Guruswami, Aravindan Vijayaraghavan, and Yuan Zhou. 2012. Polynomial integrality gaps for strong SDP relaxations of densest k-subgraph. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 388-405.
[6] Polina Bombina and Brendan Ames. 2020. Convex optimization for the densest subgraph and densest submatrix problems. In SN Operations Research Forum, Vol. 1. Springer, 1-24.
[7] Stephen Boyd, Neal Parikh, and Eric Chu. 2011. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Now Publishers Inc.
[8] Moses Charikar. 2000. Greedy approximation algorithms for finding dense components in a graph. In International Workshop on Approximation Algorithms for Combinatorial Optimization. Springer, 84-95.
[9] Jie Chen and Yousef Saad. 2010. Dense subgraph extraction with application to community detection. IEEE Transactions on Knowledge and Data Engineering.
[10] Journal of Optimization Theory and Applications.
[11] Jack Edmonds. 1970. Submodular functions, matroids, and certain polyhedra. Combinatorial Structures and Their Applications (1970).
[12] Uriel Feige and Michael Langberg. 2001. Approximation algorithms for maximization problems arising in graph partitioning. Journal of Algorithms 41, 2 (2001), 174-211.
[13] Uriel Feige, David Peleg, and Guy Kortsarz. 2001. The dense k-subgraph problem. Algorithmica 29, 3 (2001), 410-421.
[14] Marguerite Frank, Philip Wolfe, et al. 1956. An algorithm for quadratic programming. Naval Research Logistics Quarterly 3, 1-2 (1956), 95-110.
[15] Satoru Fujishige. 2005. Submodular Functions and Optimization. Elsevier.
[16] Giorgio Gallo, Michael D. Grigoriadis, and Robert E. Tarjan. 1989. A fast parametric maximum flow algorithm and applications. SIAM J. Comput. 18, 1 (1989), 30-55.
[17] Christos Giatsidis, Fragkiskos D. Malliaros, Dimitrios M. Thilikos, and Michalis Vazirgiannis. 2014. CoreCluster: A Degeneracy Based Graph Clustering Framework. In AAAI, Vol. 14. 44-50.
[18] Andrew V. Goldberg. 1984. Finding a Maximum Density Subgraph. Technical report, University of California, Berkeley, CA.
[19] Bingsheng He and Xiaoming Yuan. 2012. On the O(1/n) Convergence Rate of the Douglas-Rachford Alternating Direction Method. SIAM J. Numer. Anal. 50, 2 (2012), 700-709.
[20] Bryan Hooi, Hyun Ah Song, Alex Beutel, Neil Shah, Kijung Shin, and Christos Faloutsos. 2016. FRAUDAR: Bounding graph fraud in the face of camouflage. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 895-904.
[21] Subhash Khot. 2006. Ruling out PTAS for graph min-bisection, dense k-subgraph, and bipartite clique. SIAM J. Comput. 36, 4 (2006), 1025-1071.
[22] Jérôme Kunegis. 2013. KONECT - The Koblenz Network Collection. In Proc. Int. Conf. on World Wide Web Companion. 1343-1350. http://dl.acm.org/citation.cfm?id=2488173
[23] Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data.
[24] Pierre-Louis Lions and Bertrand Mercier. 1979. Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 6 (1979), 964-979.
[25] László Lovász. 1983. Submodular functions and convexity. In Mathematical Programming: The State of the Art. Springer, 235-257.
[26] Zhi-Quan Luo, Wing-Kin Ma, Anthony Man-Cho So, Yinyu Ye, and Shuzhong Zhang. 2010. Semidefinite relaxation of quadratic optimization problems. IEEE Signal Processing Magazine 27, 3 (2010), 20-34.
[27] Pasin Manurangsi. 2017. Almost-polynomial ratio ETH-hardness of approximating densest k-subgraph. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing. 954-961.
[28] Michael Mitzenmacher, Jakub Pachocki, Richard Peng, Charalampos Tsourakakis, and Shen Chen Xu. 2015. Scalable large near-clique detection in large-scale networks via sampling. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 815-824.
[29] Yurii Nesterov. 2013. Introductory Lectures on Convex Optimization: A Basic Course. Vol. 87. Springer Science & Business Media.
[30] Dimitris Papailiopoulos, Ioannis Mitliagkas, Alexandros Dimakis, and Constantine Caramanis. 2014. Finding dense subgraphs via low-rank bilinear optimization. In International Conference on Machine Learning. 1890-1898.
[31] Neal Parikh and Stephen Boyd. 2014. Proximal algorithms. Foundations and Trends in Optimization 1, 3 (2014), 127-239.
[32] Yevgeniy Podolyan and George Karypis. 2009. Common pharmacophore identification using frequent clique detection algorithm. Journal of Chemical Information and Modeling 49, 1 (2009), 13-21.
[33] R. Tyrrell Rockafellar. 1970. Convex Analysis. Number 28. Princeton University Press.
[34] Barna Saha, Allison Hoch, Samir Khuller, Louiqa Raschid, and Xiao-Ning Zhang. 2010. Dense subgraphs with restrictions and applications to gene annotation graphs. In Annual International Conference on Research in Computational Molecular Biology. Springer, 456-472.
[35] Alexander Schrijver. 2003. Combinatorial Optimization: Polyhedra and Efficiency. Vol. 24. Springer Science & Business Media.
[36] Anand Srivastav and Katja Wolf. 1998. Finding dense subgraphs with semidefinite programming. In International Workshop on Approximation Algorithms for Combinatorial Optimization. Springer, 181-191.
[37] Charalampos Tsourakakis. 2015. The k-clique densest subgraph problem. In Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1122-1132.
[38] Duncan J. Watts and Steven H. Strogatz. 1998. Collective dynamics of 'small-world' networks. Nature 393 (1998), 440-442.
[39] Xiao-Tong Yuan and Tong Zhang. 2013. Truncated power method for sparse eigenvalue problems. Journal of Machine Learning Research 14, Apr (2013), 899-925.
[40] Si Zhang, Dawei Zhou, Mehmet Yigit Yildirim, Scott Alcorn, Jingrui He, Hasan Davulcu, and Hanghang Tong. 2017. Hidden: hierarchical dense subgraph detection with application to financial fraud detection. In Proceedings of the 2017 SIAM International Conference on Data Mining. SIAM, 570-578.
[41] Yang Zhang and Srinivasan Parthasarathy. 2012. Extracting, analyzing and visualizing triangle k-core motifs within networks. In 2012 IEEE 28th International Conference on Data Engineering.