Augmented Sparsifiers for Generalized Hypergraph Cuts
Austin R. Benson (Computer Science Dept., Cornell University, [email protected]), Jon Kleinberg (Computer Science Dept., Cornell University, [email protected]), Nate Veldt (Center for Applied Math, Cornell University, [email protected])
Abstract
In recent years, hypergraph generalizations of many graph cut problems and algorithms have been introduced and analyzed as a way to better explore and understand complex systems and datasets characterized by multiway relationships. The standard cut function for a hypergraph H = (V, E) assigns the same penalty to a cut hyperedge, regardless of how its nodes are separated by a partition of V. Recent work in theoretical computer science and machine learning has made use of a generalized hypergraph cut function that can be defined by associating each hyperedge e ∈ E with a splitting function w_e, which assigns a (possibly different) penalty to each way of separating the nodes of e. When each w_e is a submodular cardinality-based splitting function, meaning that w_e(S) = g(|S|) for some concave function g, previous work has shown that a generalized hypergraph cut problem can be reduced to a directed graph cut problem on an augmented node set. However, existing reduction procedures introduce up to O(|e|^2) edges for a hyperedge e. This often results in a dense graph, even when the hypergraph is sparse, which leads to slow runtimes (in theory and practice) for algorithms that run on the reduced graph. We introduce a new framework of sparsifying hypergraph-to-graph reductions, where a hypergraph cut defined by submodular cardinality-based splitting functions is (1+ε)-approximated by a cut on a directed graph. Our techniques are based on approximating concave functions using piecewise linear curves, and we show that they are optimal within an existing strategy for hypergraph reduction. We provide bounds on the number of edges needed to model different types of splitting functions. For ε > 0, in the worst case, we need O(ε^{-1} |e| log |e|) edges to reduce any hyperedge e, which leads to faster runtimes for approximately solving generalized hypergraph s-t cut problems. For the common machine learning heuristic of a clique splitting function on a node set e, our approach requires only O(|e|) nodes and O(|e| ε^{-1/2} log log (1/ε)) edges, instead of the O(|e|^2) edges used by existing reductions. Equivalently, we can model the cut properties of a complete graph on n nodes using O(n) nodes and O(n ε^{-1/2} log log (1/ε)) directed and weighted edges. This sparsification leads to faster approximate min s-t graph cut algorithms for certain classes of co-occurrence graphs that are represented implicitly by a collection of sets modeling co-occurrences. Finally, we apply our sparsification techniques to develop the first approximation algorithms for minimizing sums of cardinality-based submodular functions, which arise in numerous machine learning and computer vision applications, producing faster algorithms in a number of settings.

∗ This research was supported by NSF Award DMS-1830274, ARO Award W911NF19-1-0057, ARO MURI, JP Morgan Chase & Co., a Simons Investigator Award, a Vannevar Bush Faculty Fellowship, and a grant from the AFOSR. The authors thank Pan Li for helpful conversations about decomposable submodular function minimization.

1 Introduction
Hypergraphs are a generalization of graphs in which nodes are organized into multiway relationships called hyperedges. Given a hypergraph H = (V, E) and a set of nodes S ⊆ V, a hyperedge e ∈ E is said to be cut by S if both S and S̄ = V \ S contain at least one node from e. Developing efficient algorithms for cut problems in hypergraphs is an active area of research in theoretical computer science [15-17, 23, 36], and has been applied to problems in VLSI layout [4, 28, 35], sparse matrix partitioning [2, 6], and machine learning [44, 46, 67].

Here, we consider recently introduced generalized hypergraph cut functions [44, 46, 66, 70], which assign different penalties to cut hyperedges based on how the nodes of a hyperedge are split into different sides of the bipartition induced by S. To define a generalized hypergraph cut function, each hyperedge e ∈ E is first associated with a splitting function w_e : 2^e → R_+ that maps each node configuration of e (defined by the subset A ⊆ e contained in S) to a nonnegative penalty. In order to mirror edge cut penalties in graphs, splitting functions are typically assumed to be symmetric (w_e(A) = w_e(e \ A)) and to only penalize cut hyperedges (i.e., w_e(∅) = 0). The generalized hypergraph cut function for a set S ⊆ V is then given by

    cut_H(S) = Σ_{e∈E} w_e(S ∩ e).    (1)

The standard hypergraph cut function is all-or-nothing, meaning it assigns the same penalty to a cut hyperedge regardless of how its nodes are separated. Using the splitting function terminology, this means that w_e(A) = 0 if A ∈ {e, ∅}, and w_e(A) = w_e otherwise, where w_e is a scalar hyperedge weight. One particularly relevant class of splitting functions is the class of submodular functions, which for all A, B ⊆ e satisfy w_e(A) + w_e(B) ≥ w_e(A ∩ B) + w_e(A ∪ B).
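A minimal Python sketch may help make Eq. (1) concrete. The hypergraph, the set S, and the two splitting functions below (the all-or-nothing penalty and the clique penalty) are hypothetical choices for illustration only:

```python
# Toy hypergraph (hypothetical example, not from the paper).
hyperedges = [frozenset({0, 1, 2}), frozenset({2, 3, 4}), frozenset({3, 4, 5})]

def all_or_nothing(e, A, weight=1.0):
    """Standard cut penalty: `weight` if e is cut by the partition, else 0."""
    return 0.0 if len(A) in (0, len(e)) else weight

def clique(e, A):
    """Quadratic (clique expansion) splitting penalty: |A| * |e \\ A|."""
    return len(A) * (len(e) - len(A))

def cut_H(S, splitting):
    """Generalized hypergraph cut of Eq. (1): each hyperedge e
    contributes the splitting penalty w_e(S ∩ e)."""
    return sum(splitting(e, S & e) for e in hyperedges)

S = {0, 1, 2, 3}
# {0,1,2} is uncut; {2,3,4} and {3,4,5} are both cut by S.
print(cut_H(S, all_or_nothing))  # -> 2.0
print(cut_H(S, clique))          # -> 4
```

Swapping in a different splitting function only changes the per-hyperedge penalty; the cut function itself is always the sum in Eq. (1).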
When all hyperedge splitting functions are submodular, solving generalized hypergraph cut problems is closely related to minimizing a decomposable submodular function [20, 21, 39, 45, 54, 64], which in turn is closely related to energy minimization problems often encountered in computer vision [24, 37, 38]. The standard graph cut function is another well-known submodular special case of (1).

One of the most common techniques for solving hypergraph cut problems is to reduce the hypergraph to a graph sharing similar (or in some cases identical) cut properties. Arguably the most widely used reduction technique is clique expansion, which replaces each hyperedge with a (possibly weighted) clique [10, 28, 44, 71, 73]. In the unweighted case this corresponds to applying a splitting function of the form w_e(A) = |A| · |e \ A|. Previous work has also explored other classes of submodular hypergraph cut functions that can be modeled as a graph cut problem on a potentially augmented node set [24, 37, 38, 41, 66]. This research primarily focuses on proving when such a reduction is possible, regardless of the number of edges and auxiliary nodes needed to realize the reduction. However, because hyperedges can be very large and splitting functions may be very general and intricate, many of these techniques lead to large and dense graphs. Therefore, the reduction strategy significantly affects the runtime and practicality of algorithms that run on the reduced graph. This leads to several natural questions. Are the graph sizes resulting from existing techniques inherently necessary for modeling hypergraph cuts? Given a class of functions that are known to be graph reducible, can one determine more efficient, or even the most efficient, reduction techniques?
Finally, is it possible to obtain more efficient reductions and faster downstream algorithms if it suffices to only approximately model cut penalties?

To answer these questions, we present a novel framework for sparsifying hypergraph-to-graph reductions with provable guarantees on preserving cut properties. Our framework brings together concepts and techniques from several different theoretical domains, including algorithms for solving generalized hypergraph cut problems [44, 46, 66, 70], standard graph sparsification techniques [8, 10, 62], and tools for approximating functions with piecewise linear curves [49, 50]. We present sparsification techniques for a large and natural class of submodular splitting functions that are cardinality-based, meaning that w_e(A) = w_e(B) whenever |A| = |B|. These are known to always be graph reducible, and are particularly natural for several downstream applications [66]. Our approach leads to graph reductions that are significantly more sparse than previous approaches, and we show that our method is in fact optimally sparse under a certain type of reduction strategy. Our sparsification framework can be directly used to develop faster algorithms for approximately solving hypergraph s-t cut problems [66], and to improve runtimes for a large class of cardinality-based decomposable submodular minimization problems [33, 37, 39, 64]. We also show how our techniques enable us to develop efficient sparsifiers for graphs constructed from co-occurrence data.

Our framework and results share numerous connections with existing work on graph sparsification, which we review here. Let G = (V, E) be a graph with a cut function cut_G, which can be viewed as a very restricted case of the generalized hypergraph cut function in Eq. (1). An ε-cut sparsifier for G is a sparse weighted and undirected graph H = (V, F) with cut function cut_H, such that

    cut_G(S) ≤ cut_H(S) ≤ (1 + ε) cut_G(S),    (2)

for every subset S ⊆ V.
This definition was introduced by Benczúr and Karger [9], who showed how to obtain a sparsifier with O(n log n / ε^2) edges for any graph in O(m log^2 n) time for an n-node, m-edge graph. The more general notion of spectral sparsification, which approximately preserves the Laplacian quadratic form of a graph rather than just the cut function, was later introduced by Spielman and Teng [61]. The best cut and spectral sparsifiers have O(n/ε^2) edges, which is known to be asymptotically optimal for both spectral and cut sparsifiers [5, 8]. Although studied much less extensively, analogous definitions of cut [17, 36] and spectral [60] sparsifiers for hypergraphs have also been developed. However, these apply exclusively to the all-or-nothing cut penalty, and do not preserve generalized cut functions of the form shown in (1). Bansal et al. [7] also considered a weaker notion of graph and hypergraph sparsification, involving additive approximation terms, but in the present work we only consider multiplicative approximations.

In this paper, we introduce an alternative notion of an augmented cut sparsifier. We present our results in the context of hypergraph-to-graph reductions, though our framework also provides a new notion of augmented sparsifiers for graphs. Let H = (V, E) be a hypergraph with a generalized cut function cut_H, and let Ĝ = (V ∪ A, Ê) be a directed graph on an augmented node set V ∪ A. The graph is equipped with an augmented cut function defined for any S ⊆ V by

    cut_Ĝ(S) = min_{T⊆A} dircut_Ĝ(S ∪ T),    (3)

where dircut_Ĝ is the standard directed cut function on Ĝ. We say that Ĝ is an ε-augmented cut sparsifier for H if it is sparse and satisfies

    cut_H(S) ≤ cut_Ĝ(S) ≤ (1 + ε) cut_H(S).    (4)

The minimization involved in (3) is especially natural when the goal is to approximate a minimum cut or minimum s-t cut in H.
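Since the minimization in (3) ranges over placements of the auxiliary nodes, the augmented cut of a small gadget can be evaluated by brute force. The following sketch is our own toy example (the star-shaped gadget and all names are hypothetical); it checks that a single auxiliary node with unit-weight directed edges in both directions realizes the linear penalty min{|A|, |e \ A|}:

```python
from itertools import chain, combinations

def dircut(X, edges):
    """Standard directed cut: total weight of edges leaving the set X."""
    return sum(wt for (u, v), wt in edges.items() if u in X and v not in X)

def augmented_cut(S, aux, edges):
    """Eq. (3): minimize the directed cut over all placements T ⊆ aux of
    the auxiliary nodes (brute force over subsets, for intuition only)."""
    subsets = chain.from_iterable(
        combinations(aux, k) for k in range(len(aux) + 1))
    return min(dircut(set(S) | set(T), edges) for T in subsets)

# Hypothetical gadget: a star on e = {0, 1, 2, 3} with one auxiliary node
# "x" and unit-weight directed edges both ways, modeling min{|A|, |e \ A|}.
e = {0, 1, 2, 3}
edges = {}
for v in e:
    edges[(v, "x")] = 1.0
    edges[("x", v)] = 1.0

for A in [set(), {0}, {0, 1}, {0, 1, 2}, e]:
    assert augmented_cut(A, {"x"}, edges) == min(len(A), len(e - A))
print("star gadget models the linear penalty")
```

Placing "x" on whichever side of the cut is cheaper is exactly the minimization over T in (3).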
If we solve the corresponding cut problem in Ĝ, nodes from the auxiliary node set A will be automatically arranged in a way that yields the minimum directed cut penalty, as required in (3). If Ŝ* is the minimum cut in Ĝ, then S* = V ∩ Ŝ* will be a (1+ε)-approximate minimum cut in H. Even when solving a minimum cut problem is not the goal, our sparsifiers will be designed in such a way that the augmented cut function (3) will be easy to evaluate.

Unlike the standard graph sparsification problem, in some cases it may in fact be impossible to find any directed graph Ĝ satisfying (4), independent of the graph's density. In recent work we showed that hypergraphs with non-submodular splitting functions are never graph reducible [66]. Živný et al. [74] showed that even in the case of four-node hyperedges, there exist submodular splitting functions (albeit asymmetric ones) that are not representable by graph cuts. Nevertheless, there are several special cases in which graph reduction is possible [24, 37, 38].

Augmented Sparsifiers for Cardinality-Based Hypergraph Cuts
We specifically consider the class of submodular splitting functions that are cardinality-based, meaning they satisfy w_e(A) = w_e(B) whenever A, B ⊆ e satisfy |A| = |B|. These are known to be graph reducible [37, 66], though existing techniques will reduce a hypergraph H = (V, E) to a graph with O(|V| + Σ_{e∈E} |e|) nodes and O(Σ_{e∈E} |e|^2) edges. We prove the following sparse reduction result.

Theorem 1.1.
Let H = (V, E) be a hypergraph where each e ∈ E is associated with a cardinality-based submodular splitting function. There exists an augmented cut sparsifier Ĝ for H with O(|V| + ε^{-1} Σ_{e∈E} log |e|) nodes and O(ε^{-1} Σ_{e∈E} |e| log |e|) edges.

For certain types of splitting functions (e.g., the one corresponding to a clique expansion), we show that our reductions are even more sparse.
Augmented Sparsifiers for Graphs
Another relevant class of augmented sparsifiers to consider is the setting where H is simply a graph. In this case, if A is empty and all edges are undirected, condition (4) reduces to the standard definition of a cut sparsifier. A natural question is whether there exist cases where allowing auxiliary nodes and directed edges leads to improved sparsifiers. We show that the answer is yes in the case of dense graphs constructed from co-occurrence data.
Just as spectral sparsifiers generalize cut sparsifiers in the standard graph setting, one can define an analogous notion of an augmented spectral sparsifier for hypergraph reductions. This can be accomplished using existing hypergraph generalizations of the Laplacian operator [14, 46, 47, 70]. However, although developing augmented spectral sparsifiers constitutes an interesting open direction for future research, it is unclear whether the techniques we develop here can be used or adapted to spectrally approximate generalized hypergraph cut functions. We include further discussion on hypergraph Laplacians and spectral sparsifiers in Section 7, and pose questions for future work. Our primary focus in this manuscript is to develop techniques for augmented cut sparsifiers.
Graph reduction techniques work by replacing a hyperedge with a small graph gadget modeling the same cut properties as the hyperedge splitting function. The simplest example of a graph reducible function is the quadratic splitting function, which we also refer to as the clique splitting function:

    w_e(A) = |A| · |e \ A|, for A ⊆ e.    (5)
Figure 1: Three gadgets, each modeling a different hyperedge splitting function: (a) a star gadget, (b) a clique gadget, and (c) a CB-gadget.

This function can be modeled by replacing a hyperedge with a clique (Figure 1b). Another function that can be modeled by a gadget is the linear penalty, which can be modeled by a star gadget [73]:

    w_e(A) = min{|A|, |e \ A|}, for A ⊆ e.    (6)

A star gadget (Figure 1a) contains an auxiliary node v_e for each e ∈ E, which is attached to each v ∈ e with an undirected edge. In order to model the broader class of submodular cardinality-based splitting functions, we previously introduced the cardinality-based gadget [66] (CB-gadget) (Figure 1c). This gadget is parameterized by positive scalars a and b, and includes two auxiliary nodes e′ and e″. For each node v ∈ e, there is a directed edge from v to e′ and a directed edge from e″ to v, both of weight a. Lastly, there is a directed edge from e′ to e″ of weight a · b. This CB-gadget corresponds to the following splitting function:

    w_{a,b}(A) = a · min{|A|, |e \ A|, b}.    (7)

Every submodular, cardinality-based (SCB) splitting function can be modeled by a combination of CB-gadgets with different edge weights [66]. A different reduction strategy for minimizing submodular energy functions with cardinality-based penalties was also previously developed by Kohli et al. [37]. Both techniques require up to O(k^2) directed edges for a k-node hyperedge.

Sparse Combinations of CB-gadgets
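The cut behavior of a single CB-gadget can be verified by enumerating the four placements of e′ and e″ and taking the cheapest directed cut, as in the minimization of Eq. (3). A small sketch with hypothetical parameters a and b:

```python
from itertools import product

def cb_gadget_cut(e, A, a, b):
    """Directed cut penalty of a single CB-gadget for hyperedge e when the
    nodes of A are on the source side, minimized over the four placements
    of the auxiliary nodes e' and e'' (the minimization in Eq. (3))."""
    best = float("inf")
    for ep, epp in product([True, False], repeat=2):  # e', e'' on source side?
        cost = 0.0
        for v in e:
            if v in A and not ep:       # edge v -> e' of weight a is cut
                cost += a
            if epp and v not in A:      # edge e'' -> v of weight a is cut
                cost += a
        if ep and not epp:              # edge e' -> e'' of weight a*b is cut
            cost += a * b
        best = min(best, cost)
    return best

# Hypothetical parameters: a 6-node hyperedge with a = 2.0, b = 2.
e, a, b = set(range(6)), 2.0, 2
for size in range(len(e) + 1):
    A = set(range(size))
    assert cb_gadget_cut(e, A, a, b) == a * min(len(A), len(e) - len(A), b)  # Eq. (7)
print("CB-gadget matches Eq. (7) for every split size")
```

The three cases of the minimum correspond to the three placements: e′ on the sink side costs a·|A|, both auxiliary nodes on the source side costs a·|e \ A|, and e′ on the source side with e″ on the sink side costs a·b.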
Our work introduces a new framework for approximately modeling submodular cardinality-based (SCB) splitting functions using small combinations of CB-gadgets. Figure 2 illustrates our sparsification strategy. We first associate an SCB splitting function with a set of points {(i, w_i)}, where i represents the number of nodes on the "small side" of a cut hyperedge, and w_i is the penalty for such a split. We show that when many of these points are collinear, they can be modeled with a smaller number of CB-gadgets. As an example, the star expansion penalties (6) can be modeled with a single CB-gadget (Figures 2a and 2d), whereas modeling the quadratic penalty with previous techniques [66] requires many more (Figures 2b and 2e). Given this observation, we design new techniques for ε-approximating the set of points {(i, w_i)} with a piecewise linear curve using a small number of linear pieces. We then show how to translate the resulting piecewise linear curve back into a smaller combination of CB-gadgets that ε-approximates the original splitting function. Our piecewise linear approximation strategy allows us to find the optimal (i.e., minimum-sized) graph reduction in terms of CB-gadgets. When ε = 0, our approach finds the best way to exactly model an SCB splitting function, and requires only half the number of gadgets needed by previous techniques [66]. More importantly, for larger ε, we prove the following sparse approximation result, which is used to prove Theorem 1.1.

Figure 2: (a) The linear splitting function (6) can be modeled by a sparse gadget (d). The quadratic splitting function (5) penalties (b) can be modeled by a dense gadget (e). A piecewise linear approximation of the quadratic splitting penalties (c) corresponds to a sparse gadget (f).
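The following sketch illustrates the idea in our own simplified form (this is not the paper's optimal algorithm, and all names are ours): greedily cover the points {(i, w_i)} with secant lines through consecutive points, extending each piece while it stays within a (1 + ε) factor. For a concave square-root penalty, a handful of pieces suffices where an exact representation needs one CB-gadget per point:

```python
import math

def greedy_pl_cover(w, r, eps):
    """Greedy sketch: cover the points {(i, w(i)) : 0 <= i <= r} with few
    lines, each passing through two consecutive points of w. By concavity,
    such a line upper-bounds w at every integer; a piece is extended while
    it stays within a (1 + eps) factor of w."""
    lines = []
    i = 0
    while i < r:
        m = w(i + 1) - w(i)          # slope of the secant through i, i+1
        d = w(i) - m * i             # its intercept (nonnegative by concavity)
        lines.append((m, d))
        j = i + 1
        while j <= r and m * j + d <= (1 + eps) * w(j):
            j += 1
        i = j - 1                    # last index this piece certifies
    lines.append((0.0, w(r)))        # the single piece of slope zero
    return lines

# Square-root splitting penalties on a k-node hyperedge, r = k // 2.
k, eps = 1000, 0.1
r = k // 2
w = lambda i: math.sqrt(i)
lines = greedy_pl_cover(w, r, eps)
f = lambda x: min(m * x + d for m, d in lines)  # lower envelope
assert all(w(i) - 1e-9 <= f(i) <= (1 + eps) * w(i) + 1e-9 for i in range(1, r + 1))
print(f"{len(lines)} linear pieces cover r = {r} points")
```

Each slope-decreasing line of the resulting envelope can then be converted back into one CB-gadget, so few pieces means a sparse gadget combination.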
Theorem 1.2.
For ε ≥ 0, any submodular cardinality-based splitting function on a k-node hyperedge can be ε-modeled by combining O(min{ε^{-1} log k, k}) CB-gadgets.
We show a nearly matching lower bound: Ω(log k / √ε) CB-gadgets are required to model a square root splitting function. Despite these worst-case bounds, we prove that only O(ε^{-1/2} log log (1/ε)) CB-gadgets are needed to approximate the quadratic splitting function, independent of hyperedge size. This is particularly relevant for approximating the widely used clique expansion technique, as well as for modeling certain types of dense co-occurrence graphs. All of our sparse reduction techniques are combinatorial, deterministic, and very simple to use in practice.

When H is just a graph, augmented sparsifiers correspond to a generalization of standard cut sparsifiers that allow directed edges and auxiliary nodes. The auxiliary nodes in this case play a role analogous to Steiner nodes in finding minimum spanning trees. Just as adding Steiner nodes makes it possible to find a smaller weight spanning tree, it is natural to ask whether including an auxiliary node set might lead to better cut sparsifiers for a graph G. We show that the answer is yes for certain classes of dense co-occurrence graphs, which are graphs constructed by inducing a clique on a set of nodes that share a certain property or participate in a certain type of group interaction (equivalently, clique expansions of hypergraphs). Steiner nodes have in fact been previously used in constructing certain types of sparsifiers called vertex and flow sparsifiers [18]. However, these are concerned with preserving certain routing properties between distinguished terminal nodes in a graph, and are therefore distinct from our goal of obtaining ε-cut sparsifiers.

Sparsifying the complete graph
Our ability to sparsify the clique splitting function (5) directly implies a new approach for sparsifying a complete graph. Cut sparsifiers for the complete graph provide a simple case study for understanding the differences in sparsification guarantees that can be obtained when we allow auxiliary nodes and directed edges. Furthermore, better sparsifiers for the complete graph can be used to design useful sparsifiers for co-occurrence graphs. We have the following result.
Theorem 1.3.
Let G = (V, E) be the complete graph on n = |V| nodes. There exists an ε-augmented sparsifier for G with O(n) nodes and O(n ε^{-1/2} log log (1/ε)) edges.

By comparison, the best standard cut and spectral sparsifiers for the complete graph have exactly n nodes and O(n/ε^2) edges. This is tight for spectral sparsifiers [8], as well as for degree-regular cut sparsifiers with uniform edge weights [3]. Thus, by adding a small number of auxiliary nodes, our sparsifiers enable us to obtain a significantly better dependence on ε when cut-sparsifying a complete graph. Our sparsifier is easily constructed deterministically in O(n ε^{-1/2} log log (1/ε)) time. Standard undirected sparsifiers for the complete graph have received significant attention, as they correspond to expander graphs [3, 8, 48, 51]. We remark that the directed augmented cut sparsifiers we produce are very different in nature and should not be viewed as expanders. In particular, unlike for expander graphs, random walks on our complete graph sparsifiers will converge to a very non-uniform distribution. We are interested in augmented sparsifiers for the complete graph simply for their ability to model cut properties in a different way, and for the implications this has for sparsifying hypergraph clique expansions and co-occurrence graphs.

Sparsifying co-occurrence graphs
Co-occurrence relationships are inherent in the construction of many types of graphs. Formally, consider a set of n = |V| nodes that are organized into a set of co-occurrence interactions C ⊆ 2^V. Each interaction c ∈ C is associated with a weight w_c > 0, and an edge between nodes i and j is created with weight w_ij = Σ_{c∈C : i,j∈c} w_c. When w_c = 1 for every c ∈ C, w_ij equals the number of interactions that i and j share. We use d_avg to denote the average number of co-occurrence interactions in which nodes in V participate. The cut value in the resulting graph G = (V, E) for a set S ⊆ V is given by the following co-occurrence cut function:

    cut_G(S) = Σ_{c∈C} w_c · |S ∩ c| · |S̄ ∩ c|.    (8)

Graphs with this co-occurrence cut function arise frequently as clique expansions of a hypergraph [10, 28, 71, 73], or as projections of a bipartite graph [42, 52, 53, 57, 63, 69, 72]. Even when the underlying dataset is not first explicitly modeled as a hypergraph or bipartite graph, many approaches implicitly use this approach to generate a graph from data. When enough group interaction sizes are large, G becomes dense, even if |C| is small. We can significantly sparsify G by applying an efficient sparsifier to each clique induced by a co-occurrence relationship. Importantly, we can do this without ever explicitly forming G. By applying Theorem 1.3 as a black box for clique sparsification, we obtain the following result.

Theorem 1.4.
Let G = (V, E) be the co-occurrence graph for some C ⊆ 2^V and let n = |V|. For ε > 0, there exists an augmented sparsifier Ĝ with O(n + |C| · f(ε)) nodes and O(n · d_avg · f(ε)) edges, where f(ε) = ε^{-1/2} log log (1/ε). In particular, if d_avg is constant and for some δ > 1 we have Σ_{c∈C} |c|^2 = Ω(n^δ), then forming G explicitly takes Ω(n^δ) time, but an augmented sparsifier for G with O(n f(ε)) nodes and O(n f(ε)) edges can be constructed in O(n f(ε)) time.

Importantly, the average co-occurrence degree d_avg is not the same as the average node degree in G, which will typically be much larger. Theorem 1.4 highlights that in regimes where d_avg is a constant, our augmented sparsifiers will have fewer edges than the number needed by standard ε-cut sparsifiers. In Section 5, we consider simple graph models that satisfy these assumptions. We also consider tradeoffs between our augmented sparsifiers and standard sparsification techniques for co-occurrence graphs.

Table 1: Runtimes for cardinality-based decomposable submodular function minimization, and bounds for special regimes. k = k_avg = μ/R is the average hyperedge size. For IBFS when k = Θ(n), we simply list lower bounds indicating why these methods are not practical in this case.

                                                             k = Θ(n)
Method               Runtime                    k = O(1)     n = Ω(R)     R = Ω(n)
Kolmogorov SF [39]   Õ(R k)                     Õ(R)         Õ(R n)       Õ(R n)
IBFS Strong [20,22]  O(n θ_max Σ_e |e|)         O(n)         Ω(n)         Ω(n R)
IBFS Weak [20,22]    Õ(n θ_max + n Σ_e |e|)     Õ(n)         Ω(n)         Ω(n Σ_e |e|)
ACDM [20,21]         Õ(n R k)                   Õ(n R)       Õ(n R)       Õ(n R)
This paper           Õ(min{(Rk/ε)^{3/2}, (Rk/ε)(n + R/ε)^{2/3}})   Õ((R/ε)^{3/2})   Õ(R (n/ε))   Õ(n (R/ε))
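Eq. (8) can be evaluated directly from the interaction set C without ever materializing the dense graph G. A small sketch (the interaction data are hypothetical) that also cross-checks against the explicitly built clique-expansion graph:

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical co-occurrence data: each interaction c ∈ C is a set of
# nodes, all with unit weight w_c = 1.
C = [{0, 1, 2, 3}, {2, 3, 4}, {1, 4, 5}]

def cooccurrence_cut(S, C, weights=None):
    """Eq. (8): sum over interactions of w_c * |S ∩ c| * |S̄ ∩ c|,
    evaluated directly from C without forming the dense graph G."""
    weights = weights or [1.0] * len(C)
    return sum(wc * len(S & c) * len(c - S) for wc, c in zip(weights, C))

def explicit_graph_cut(S, C):
    """Reference implementation: build G explicitly (w_ij = number of
    interactions shared by i and j) and evaluate the ordinary graph cut."""
    w_ij = defaultdict(float)
    for c in C:
        for i, j in combinations(sorted(c), 2):
            w_ij[(i, j)] += 1.0
    return sum(wt for (i, j), wt in w_ij.items() if (i in S) != (j in S))

S = {0, 1, 2}
assert cooccurrence_cut(S, C) == explicit_graph_cut(S, C)
print(cooccurrence_cut(S, C))  # -> 7.0
```

The direct evaluation costs O(Σ_c |c|) per query, while building G costs O(Σ_c |c|^2), which is the gap the theorem exploits.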
Independent of the black-box sparsifier used, implicitly sparsifying G in this way will often lead to significant runtime improvements over forming G explicitly.

Typically in hypergraph cut problems it is natural to assume that splitting functions are symmetric and satisfy w_e(∅) = w_e(e) = 0. However, we show that our sparse reduction techniques apply even when these assumptions do not hold. This allows us to design fast algorithms for approximately solving cardinality-based decomposable submodular minimization problems. Formally, a function f : 2^V → R_+ is a decomposable submodular function if it can be written as

    f(S) = Σ_{e∈E} f_e(S ∩ e),    (9)

where each f_e is a submodular function defined on a set e ⊆ V. Following our previous notation and terminology, we say f_e is cardinality-based if f_e(S) = g_e(|S|) for some concave function g_e. This special case has also received some attention in previous literature on decomposable submodular function minimization [33, 37, 39, 64]. Existing approaches for minimizing these functions focus largely on finding exact solutions. Using our sparse reduction techniques, we develop the first fast algorithms for approximately solving the problem. Let n = |V|, R = |E|, and μ = Σ_{e∈E} |e|. In Appendix B, we show that a result similar to Theorem 1.1 also holds for more general cardinality-based splitting functions. In Section 6, we combine that result with the s-t cut solvers of Goldberg and Rao [27] to prove the following theorem.

Theorem 1.5.
Let ε > 0. Any cardinality-based decomposable submodular function can be minimized to within a multiplicative (1 + ε) factor in Õ(min{ε^{-3/2} μ^{3/2}, ε^{-1} μ (n + ε^{-1} R)^{2/3}}) time.

We compare this runtime against the best previous techniques for exactly minimizing sums of cardinality-based submodular functions. We summarize runtimes for competing approaches in Table 1, which includes both strongly polynomial and weakly polynomial methods, the latter of which assume integer-valued functions. We again note that the runtimes for competing approaches are for finding exact minimizers, whereas our approach provides a (1 + ε) guarantee. Our techniques enable us to highlight regimes of the problem where we can obtain significantly faster algorithms in cases where it is sufficient to solve the problem approximately. For example, whenever n = Ω(R), our algorithms for finding approximate solutions provide a runtime advantage, often a significant one, over approaches for computing an exact solution.

A generalized hypergraph cut function is defined as the sum of its splitting functions. Therefore, if we can design a technique for approximately modeling a single hyperedge with a sparse graph, this in turn provides a method for constructing an augmented sparsifier for the entire hypergraph. We now formalize the problem of approximating a submodular cardinality-based (SCB) splitting function using a combination of cardinality-based (CB) gadgets. We abstract this as the task of approximating a certain class of functions with integer inputs (equivalent to SCB splitting functions), using a small number of simpler functions (equivalent to cut properties of the gadgets). Let [r] = {1, 2, . . . , r}.

Definition 2.1. An r-SCB integer function is a function w : {0} ∪ [r] → R_+ satisfying

    w(0) = 0    (10)
    2w(j) ≥ w(j − 1) + w(j + 1) for j = 1, . . . , r − 1    (11)
    0 ≤ w(1) ≤ w(2) ≤ . . . ≤ w(r)    (12)

We denote the set of r-SCB integer functions by S_r.

The value w(i) represents the splitting penalty for placing i nodes on the small side of a cut hyperedge. In previous work we showed that the inequalities given in Definition 2.1 are necessary and sufficient conditions for a cardinality-based splitting function to be submodular [66]. The r-SCB integer function for a CB-gadget with edge parameters (a, b) (see (7)) is

    w_{a,b}(i) = a · min{i, b}.    (13)

Combining J CB-gadgets produces a combined r-SCB integer function of special importance.

Definition 2.2. An r-CCB (Combined Cardinality-Based gadget) function of order J is an r-SCB integer function ŵ of the form

    ŵ(i) = Σ_{j=1}^J a_j · min{i, b_j}, for i ∈ [r],    (14)

where the J-dimensional vectors a = (a_j) and b = (b_j) parameterizing ŵ satisfy:

    b_j > 0, a_j > 0 for all j ∈ [J]    (15)
    b_j < b_{j+1} for j ∈ [J − 1]    (16)
    b_J ≤ r.    (17)

We denote the set of r-CCB functions of order J by C_r^J.

The conditions on the vectors a and b come from natural observations about combining CB-gadgets. Condition (15) ensures that we do not consider CB-gadgets where all edge weights are zero. The ordering in condition (16) is for convenience; the fact that the b_j values are all distinct implies that we cannot collapse two distinct CB-gadgets into a single CB-gadget with new weights. For condition (17), observe that for any b_J ≥ r, min{i, b_J} = i for all i ∈ [r].
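A quick numerical check of these definitions (with hypothetical gadget parameters): any r-CCB function built from valid vectors (a, b) should satisfy conditions (10)-(12), consistent with the containment C_r^J ⊆ S_r:

```python
def ccb(a, b, r):
    """The r-CCB function of Eq. (14): ŵ(i) = Σ_j a_j · min(i, b_j),
    returned as the list [ŵ(0), ŵ(1), ..., ŵ(r)]."""
    return [sum(aj * min(i, bj) for aj, bj in zip(a, b)) for i in range(r + 1)]

def is_scb(w):
    """Check the r-SCB conditions (10)-(12): w(0) = 0, concavity of the
    integer sequence, and monotonicity."""
    r = len(w) - 1
    return (w[0] == 0
            and all(2 * w[j] >= w[j - 1] + w[j + 1] for j in range(1, r))
            and all(w[j] <= w[j + 1] for j in range(r)))

# Hypothetical gadget parameters: two CB-gadgets (J = 2) on r = 8.
w_hat = ccb(a=[1.5, 0.5], b=[2, 6], r=8)
assert is_scb(w_hat)
print(w_hat)  # -> [0.0, 2.0, 4.0, 4.5, 5.0, 5.5, 6.0, 6.0, 6.0]
```

The printed sequence is concave and increasing, flattening once i passes the largest breakpoint b_2 = 6, exactly as conditions (15)-(17) require.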
For a helpful visual, note that the r-SCB function in (13) represents splitting penalties for the CB-gadget in Figure 1c. An r-CCB function corresponds to a combination of CB-gadgets, as in Figures 2c and 2f. In previous work we showed that any combination of CB-gadgets produces a submodular and cardinality-based splitting function, which is equivalent to stating that C_r^J ⊆ S_r for all J ∈ N [66]. Furthermore, C_r^r = S_r, since any r-SCB splitting function can be modeled by a combination of r CB-gadgets. Our goal here is to determine how to approximate a function w ∈ S_r with some function ŵ ∈ C_r^J where J ≪ r. This corresponds to modeling an SCB splitting function using a small combination of CB-gadgets.

Definition 2.3.
For a fixed w ∈ S_r and an approximation tolerance parameter ε ≥ 0, the Sparse Gadget Approximation Problem (Spa-GAP) is the following optimization problem:

    minimize κ
    subject to w ≤ ŵ ≤ (1 + ε) w
               ŵ ∈ C_r^κ.    (18)
Problem (18) specifically optimizes over functions ŵ that upper bound w. This restriction simplifies several aspects of our analysis without any practical consequence. For example, we could instead fix some δ ≥ 1 and consider approximating functions w̃ satisfying δ^{-1} w ≤ w̃ ≤ δ w. However, this implies that the function ŵ = δ w̃ satisfies w ≤ ŵ ≤ (1 + ε) w, with ε = δ^2 − 1. Thus, the problems are equivalent for the correct choice of δ and ε.
A natural question to ask is whether it would be better to search for a sparsest approximating gadget over a broader class of gadgets. There are several key reasons why we restrict to combinations of CB-gadgets. First of all, we already know these can model any SCB splitting function, and thus they provide a very simple building block with broad modeling capabilities. Furthermore, it is clear how to define an optimally sparse combination of CB-gadgets: since all CB-gadgets for a k-node hyperedge have the same number of auxiliary nodes and directed edges, an optimally sparse reduction is one with a minimum number of CB-gadgets. If we instead wish to optimize over all possible gadgets, it is likely that the best reduction technique will depend on the splitting function that we wish to approximate. Furthermore, the optimality of a gadget may not even be well-defined, since one must take into account both the number of auxiliary nodes as well as the number of edges that are introduced, and the tradeoff between the two is not always clear. Finally, as we shall see in the next section, by restricting to CB-gadgets, we are able to draw a useful connection between sparse gadgets and approximating piecewise linear curves with a smaller number of linear pieces.

We begin by defining the class of piecewise linear functions in which we are interested.
Definition 3.1.
For r ∈ ℕ, F_r is the class of functions f : [0, ∞) → ℝ_+ such that:

1. f(0) = 0,
2. f is constant for all x ≥ r,
3. f is increasing: x_1 ≤ x_2 ⟹ f(x_1) ≤ f(x_2),
4. f is piecewise linear,
5. f is concave (and hence, continuous).

It will be key to keep track of the number of linear pieces that make up a given function f ∈ F_r. Let 𝓛 be the set of linear functions with nonnegative slopes and intercept terms:

    𝓛 = { g(x) = mx + d | m, d ∈ ℝ_+ }.   (19)

Every function f ∈ F_r can be characterized as the lower envelope of a set of these linear functions:

    f(x) = min_{g ∈ L} g(x), where L ⊂ 𝓛.   (20)

We use |L| to denote the number of linear pieces of f. In order for (20) to properly characterize a function in F_r, it must be constant for all x ≥ r (property 2 in Definition 3.1), and thus L must contain exactly one line of slope zero. The continuous extension f̂ of an r-CCB function w parameterized by (a, b) is defined as

    f̂(x) = Σ_{j=1}^{J} a_j · min{x, b_j}  for x ∈ [0, ∞).   (21)

We prove that continuously extending any r-CCB function always produces a function in F_r. Conversely, every f ∈ F_r is the continuous extension of some r-CCB function. Appendix A provides proofs for these results. Lemma 3.1.
Let f̂ be the continuous extension for w, shown in (21). This function is in the class F_r, and has exactly J positive-sloped linear pieces and one linear piece of slope zero. Lemma 3.2.
Let f be a function in F_r with J + 1 linear pieces. Let b_i denote the i-th breakpoint of f, and m_i denote the slope of the i-th linear piece of f. Define vectors a, b ∈ ℝ^J where b(i) = b_i and a(i) = a_i = m_i − m_{i+1} for i ∈ [J]. If w is the r-CCB function parameterized by vectors (a, b), then f is the continuous extension of w.

Let w ∈ S_r be an arbitrary SCB integer function. Lemma 3.2 implies that if we can find a piecewise linear function f that approximates w and has few linear pieces, we can extract from it a CCB function ŵ with a small order J that approximates w. Equivalently, we can find a sparse gadget that approximates an SCB splitting function of interest. Our updated goal is therefore to solve the following piecewise linear approximation problem, for a given w ∈ S_r and ε ≥ 0:

    minimize_{L ⊂ 𝓛}  |L|
    subject to  w(i) ≤ f(i) ≤ (1 + ε)w(i)  for i ∈ [r],
                f ∈ F_r,  f(x) = min_{g ∈ L} g(x),
                for each g ∈ L, g(j) = w(j) for some j ∈ {0} ∪ [r].   (22)

The last constraint ensures that each linear piece g ∈ L we consider passes through at least one point (j, w(j)). We can add this constraint without loss of generality; if any linear piece g were strictly greater than w at all integers, we could obtain an improved approximation by scaling g down until it is tangent to w at some point. This constraint, together with the requirement f ∈ F_r, implies that the constant function g^(r)(x) = w(r) is contained in every set of linear functions L that is feasible for (22). Since all feasible solutions contain this constant linear piece, our focus is on determining the optimal set of positive-sloped linear pieces needed to approximate w.

Figure 3: We restrict our attention to lines in 𝓛 that coincide with w at at least one integer value. Thus, every function we consider is incident to two consecutive values of w (e.g., the solid line, g^(1)), or it touches w at exactly one point (dashed line, g).

Optimal linear covers.
Given a fixed ε ≥ 0 and an integer i ∈ {0} ∪ [r − 1], we say that L ⊂ 𝓛 is a linear cover for a function w ∈ S_r over the range R = {i, i + 1, . . . , r} if each g ∈ L upper bounds w at all points, and if for each j ∈ R there exists g ∈ L such that g(j) ≤ (1 + ε)w(j). The set L is an optimal linear cover if it contains the minimum number of positive-sloped linear pieces needed to cover R. Thus, an equivalent way of expressing (22) is that we wish to find an optimal linear cover for w over the range {0} ∪ [r]. In practice there may be many different functions f ∈ F_r which solve (22), but for our purposes it suffices to find one.

We solve problem (22) by iteratively growing a set of linear functions L ⊂ 𝓛 one function at a time, until all of w is covered. Let f be the piecewise linear function we construct from linear pieces in L. In order for f to upper bound w, every function g ∈ L in problem (22) must upper bound w at every i ∈ {0} ∪ [r]. One way to obtain such a linear function is to connect two consecutive points of w. For i ∈ {0} ∪ [r − 1], the line joining (i, w(i)) and (i + 1, w(i + 1)) is given by

    g^(i)(x) = M_i (x − i) + w(i),   (23)

where the slope of the line is M_i = w(i + 1) − w(i). In order for a line to upper bound w but only pass through a single point (i, w(i)) for some i ∈ [r − 1], it must be of the form

    g(x) = m (x − i) + w(i),   (24)

where the slope m satisfies M_i < m < M_{i−1}. The existence of such a line g is only possible when the points (i − 1, w(i − 1)), (i, w(i)), and (i + 1, w(i + 1)) are not collinear. To understand the strict bounds on m, note that if g passes through (i, w(i)) and has slope exactly M_{i−1}, then g is in fact the line g^(i−1) and also passes through (i − 1, w(i − 1)). If g has slope greater than M_{i−1}, then g(i − 1) < w(i − 1) and g does not upper bound w everywhere. We can similarly argue that the slope of g must be strictly greater than M_i so that it does not touch or cross below the point (i + 1, w(i + 1)).

We illustrate both types of functions (23) and (24) in Figure 3. The following simple observation will later help in comparing approximation properties of different functions in 𝓛. Observation 3.1.
For a fixed w ∈ S_r, let g, h ∈ 𝓛 both upper bound w at all integers i ∈ {0} ∪ [r], and assume that for some j ∈ {0} ∪ [r], g(j) = h(j) = w(j). If m_g and m_h are the slopes of g and h respectively, and m_g ≥ m_h ≥ 0, then

• for every integer i ∈ [0, j], w(i) ≤ g(i) ≤ h(i);
• for every integer i ∈ [j, r], w(i) ≤ h(i) ≤ g(i).

In other words, if g and h are both tangent to w at the same point j, but g has a larger slope than h, then g provides a better approximation for values smaller than j, while h is the better approximation for values larger than j. The first linear piece.
Every set L solving (22) must include a linear piece that goes through the origin, so that f(0) = 0. We specifically choose g^(0)(x) = (w(1) − w(0))x + w(0) = w(1)·x to be the first linear piece in the set L we construct. Given this first linear piece, we can then compute the largest integer i ∈ [r] for which g^(0) provides a (1 + ε)-approximation:

    p = max{ i ∈ [r] | g^(0)(i) ≤ (1 + ε)w(i) }.

The integer ℓ = p + 1 is therefore the smallest integer for which we do not have a (1 + ε)-approximation. If ℓ ≤ r, our task is then to find the smallest number of additional linear pieces needed to cover {ℓ, . . . , r} with (1 + ε)-approximations. By Observation 3.1, any other g ∈ 𝓛 with g(0) = 0 and g(1) > w(1) will be a worse approximation to w at all integer values: w(i) ≤ g^(0)(i) < g(i) for all i ∈ [r]. Therefore, as long as we can find a minimum set of additional linear pieces which provides a (1 + ε)-approximation for all of {ℓ, . . . , r}, our set of functions L will optimally solve objective (22). Iteratively finding the next linear piece.
Consider now a generic setting in which we are given a left integer endpoint ℓ and we wish to find linear pieces to approximate the function w from ℓ to r. We first check whether the constant function g^(r)(x) = w(r) provides the desired approximation:

    g^(r)(ℓ) ≤ (1 + ε)w(ℓ).   (25)

If so, we augment L to include g^(r) and we are done, since this implies that g^(r) also provides at least a (1 + ε)-approximation at every i ∈ {ℓ, ℓ + 1, . . . , r}. If (25) does not hold, we must add another positive-sloped linear function to L in order to get the desired approximation for all i ∈ [r]. We adopt a greedy approach that chooses the next line to be the optimizer of the following objective:

    maximize_{g ∈ 𝓛}  p′   subject to   w(j) ≤ g(j) ≤ (1 + ε)w(j)  for j = ℓ, ℓ + 1, . . . , p′.   (26)

In other words, solving problem (26) means finding a function that provides at least a (1 + ε)-approximation from ℓ to as far towards r as possible, in order to cover the widest possible contiguous interval with the same approximation guarantee. (There is always a feasible point, obtained by adding a line g tangent to w at ℓ.) The following lemma will help us prove that this greedy scheme produces an optimal cover for w.

Lemma 3.3. Let p* be the solution to (26) and g* be the function that achieves it. If L̂ ⊂ 𝓛 is an optimal cover for w over the integer range {p* + 1, p* + 2, . . . , r}, then {g*} ∪ L̂ is an optimal cover for {ℓ, ℓ + 1, . . . , r}.

Proof. Let L̃ be an arbitrary optimal linear cover for w over the range {ℓ, ℓ + 1, . . . , r}. This means that |L̂ ∪ {g*}| ≥ |L̃|. We know L̃ must contain a function g such that g(ℓ) ≤ (1 + ε)w(ℓ). Let p_g be the largest integer satisfying g(p_g) ≤ (1 + ε)w(p_g). By the optimality of p* and g*, we know p* ≥ p_g.
Therefore, the set of functions L̃ − {g} must be a cover for the set {p_g + 1, p_g + 2, . . . , r} ⊇ {p* + 1, p* + 2, . . . , r}. Since L̂ is an optimal cover for a subset of the integers covered by L̃ − {g},

    |L̂| ≤ |L̃ − {g}|  ⟹  |L̂| + 1 ≤ |L̃|  ⟹  |L̂ ∪ {g*}| ≤ |L̃|.

Therefore, |L̂ ∪ {g*}| = |L̃|, so the result follows.

We illustrate a simple procedure for solving (26) in Figure 4. The function g solving (26) must either join two consecutive points of w (the form given in (23)), or coincide with w at exactly one point (the form given in (24)). We first identify the integer j* such that

    g^(j*)(ℓ) ≤ (1 + ε)w(ℓ)   and   g^(j*+1)(ℓ) > (1 + ε)w(ℓ).

In other words, the linear piece connecting (j*, w(j*)) and (j* + 1, w(j* + 1)) provides the needed approximation at the left endpoint ℓ, but g^(i) for every i > j* does not. Therefore, the solution to (26) passes through the point (j*, w(j*)) with a slope m satisfying M_{j*+1} < m ≤ M_{j*}. By Observation 3.1, the line passing through this point with the smallest slope is guaranteed to provide the best approximation for all integers p ≥ j*. To minimize the slope of the line while still preserving the needed approximation at w(ℓ), we select the line passing through the points (ℓ, (1 + ε)w(ℓ)) and (j*, w(j*)). This is given by

    g*(x) = [(w(j*) − (1 + ε)w(ℓ)) / (j* − ℓ)] (x − ℓ) + (1 + ε)w(ℓ).   (27)

After adding this function g* to L, we find the largest integer p ≤ r such that g*(p) ≤ (1 + ε)w(p). If p < r, then we still need to find more linear pieces to approximate w, so we continue with another iteration. If p = r exactly, then we do not need any more positive-sloped linear pieces to approximate w. However, we still add the constant function g^(r) to L before terminating.
This guarantees that the function f(x) = min_{g ∈ L} g(x) we return is in fact in F_r. Furthermore, adding the constant function serves to improve the approximation without affecting the order of the CCB function we will obtain from f by applying Lemma 3.2.

Pseudocode for our procedure for constructing a set of functions L is given in Algorithm 1, which relies on Algorithm 2 for solving (26). We summarize with a theorem about the optimality of our method for solving (22). Theorem 3.4.
Algorithm 1 runs in O(r) time and returns a function f that optimizes (22).

Proof. The optimality of the algorithm follows by inductively applying Lemma 3.3 at each iteration of the algorithm. For the runtime guarantee, note first of all that we can compute and store all slopes and intercepts for linear pieces g^(i) (as given in (23)) in O(r) time and space. As the algorithm progresses, we visit each integer i ∈ [r] once, either to perform a comparison of the form g^(i)(ℓ) ≤ (1 + ε)w(ℓ) for some left endpoint ℓ, or to check whether g*(i) ≤ (1 + ε)w(i) for some linear piece g* we added to our linear cover L. Each such g* can be computed in constant time, and as a loose bound we know we compute at most O(r) such linear pieces for any ε.

Figure 4: Given a left endpoint ℓ for which we do not yet have a (1 + ε)-approximate piece, we find the next linear piece by choosing a function g* that provides the desired approximation at ℓ, while also providing a good approximation for as large an integer p > ℓ as possible.

By combining Algorithm 2 and Lemma 3.2, we are able to efficiently solve Spa-GAP. Theorem 3.5.
Let f be the solution to (22), and let ŵ be the CCB function obtained from Lemma 3.2 based on f. Then ŵ optimally solves the sparse gadget approximation problem (18).

Proof. Since f and ŵ coincide at integer values, and f approximates w at integer values, we know w(i) ≤ ŵ(i) ≤ (1 + ε)w(i) for i ∈ [r]. Thus, ŵ is feasible for objective (18). If κ* is the number of positive-sloped linear pieces of f, then the order of ŵ is κ* by Lemma 3.2, and this must be optimal for (18). If it were not optimal, this would imply that there exists some upper bounding CCB function w′ of order κ′ < κ* that approximates w to within 1 + ε. But by Lemma 3.1, this would imply that the continuous extension of w′ is some f′ ∈ F_r with exactly κ′ positive-sloped linear pieces that is feasible for objective (22), contradicting the optimality of f.

In the last section we showed an efficient strategy for finding the minimum number of linear pieces needed to approximate an SCB integer function. We now consider bounds on the number of needed linear pieces in different cases, and highlight implications for sparsifying hyperedges with SCB splitting functions. In the worst case, we show that we need O(log(k)/ε) gadgets, where k is the size of the hyperedge. Moreover, this is nearly tight for the square root splitting function. Finally, we show that we only need O(ε^(−1/2) log log(1/ε)) gadgets to approximate the clique splitting function. This result is useful for sparsifying co-occurrence graphs and clique expansions of hypergraphs.

O(log(k)/ε) Upper Bound
We begin by showing that a logarithmic number of CB-gadgets is sufficient to approximate any SCB splitting function.

Algorithm 1
FindBest-PL-Approx(w, ε)   (solves (22))
Input: w ∈ S_r, ε ≥ 0
Output: f ∈ F_r optimizing (22)
  L = {g^(0)}, where g^(0)(x) = w(1)·x
  p = max{ i ∈ [r] | g^(0)(i) ≤ (1 + ε)w(i) }
  ℓ = p + 1
  while ℓ ≤ r do
    (g*, p) = FindNext(w, ε, ℓ)
    ℓ ← p + 1
    L ← L ∪ {g*}
    if p = r then
      L ← L ∪ {g^(r)}, where g^(r)(x) = w(r)
    end if
  end while
  Return f defined by f(x) = min_{g ∈ L} g(x)

Algorithm 2
FindNext(w, ε, ℓ)   (solves (26))
Input: w ∈ S_r, ε ≥ 0, ℓ ∈ [r]
Output: g ∈ 𝓛 optimizing (26)
  if w(r) ≤ (1 + ε)w(ℓ) then
    Return (g^(r), r + 1), where g^(r)(x) = w(r)
  else
    j* = ℓ
    while g^(j*+1)(ℓ) ≤ (1 + ε)w(ℓ) do
      j* = j* + 1
    end while
    g*(x) = [(w(j*) − (1 + ε)w(ℓ)) / (j* − ℓ)] (x − ℓ) + (1 + ε)w(ℓ)
    p = max{ i ∈ [r] | g*(i) ≤ (1 + ε)w(i) }
    Return (g*, p)
  end if

Theorem 4.1. Let ε ≥ 0 and let w_e be an SCB splitting function on a k-node hyperedge. There exists a set of O(log_{1+ε} k) CB-gadgets, which can be constructed in O(k log_{1+ε} k) time, whose splitting function ŵ_e satisfies w_e(A) ≤ ŵ_e(A) ≤ (1 + ε)w_e(A) for all A ⊆ e.

Proof. Let r = ⌊k/2⌋, and let w ∈ S_r be the SCB integer function corresponding to w_e, i.e., w(i) = w_e(A) for A ⊆ e such that |A| ∈ {i, k − i}. If we join all points of the form (i, w(i)) for i ∈ [r] by a line, this results in a piecewise linear function f ∈ F_r that is concave and increasing on the interval [0, r]. We first show that there exists a set of O(log_{1+ε} r) linear pieces that approximates f on the entire interval [1, r] to within a factor (1 + ε). Our argument follows similar previous results for approximating a concave function with a logarithmic number of linear pieces [26, 50]. For any value y ∈ [1, r], not necessarily an integer, f(y) lies on a linear piece of f which we will denote by g^(y)(x) = M_y · x + B_y, where M_y ≥ 0 and B_y ≥ 0. If y = i is an integer, it may be the breakpoint between two distinct linear pieces, in which case we use the rightmost line so that g^(y) = g^(i) as in (23), so g^(i)(x) = M_i · x + B_i, where M_i = w(i + 1) − w(i) and B_i = w(i) − M_i · i.
For any z ∈ (y, r), the line g^(y) provides a z/y approximation to f(z) = g^(z)(z), since

    g^(y)(z) = M_y · z + B_y ≤ (z/y)(M_y · y + B_y) = (z/y) f(y) ≤ (z/y) f(z).

Equivalently, the line g^(y) provides a (1 + ε)-approximation for every z ∈ [y, (1 + ε)y]. Thus, it takes J linear pieces to cover the set of intervals [1, (1 + ε)], [(1 + ε), (1 + ε)^2], . . . , [(1 + ε)^(J−1), (1 + ε)^J] for a positive integer J, and overall at most 1 + ⌈log_{1+ε} r⌉ linear pieces to cover all of [0, r]. Since Algorithm 1 finds the smallest set of linear pieces to (1 + ε)-cover the splitting penalties, this smallest set must also have at most O(log_{1+ε} r) linear pieces. Given this piecewise linear approximation, we can use Lemma 3.2 to extract a CCB function ŵ of order J = O(log_{1+ε} r) satisfying w(i) ≤ ŵ(i) ≤ (1 + ε)w(i) for i ∈ {0} ∪ [r]. This ŵ in turn corresponds to a set of J CB-gadgets that (1 + ε)-approximates the splitting function w_e. Computing edge weights for the CB-gadgets using Algorithm 1 and Lemma 3.2 takes only O(r) time, so the total runtime for constructing the combined gadgets is equal to the number of individual edges that must be placed, which is O(k log_{1+ε} k).

Theorem 1.1 on augmented sparsifiers follows as a corollary of Theorem 4.1. Given a hypergraph H = (V, E) where each hyperedge has an SCB splitting function, we can use Theorem 4.1 to expand each e ∈ E into a gadget that has O(log_{1+ε} |e|) auxiliary nodes and O(|e| log_{1+ε} |e|) edges. Since log_{1+ε} n behaves as ε^(−1) log n as ε → 0, Theorem 1.1 follows.

In Appendix B, we show that using a slightly different reduction, we can prove that Theorem 4.1 holds even when we do not require splitting functions to be symmetric or satisfy w_e(∅) = w_e(e) = 0. In Section 6 we use this fact to develop approximation algorithms for cardinality-based decomposable submodular function minimization. Next we show that our upper bound is nearly tight for the square root r-SCB integer function,

    w(i) = √i  for i ∈ {0} ∪ [r].   (28)

For this result, we rely on a result previously shown by Magnanti and Stratila [50] on the number of linear pieces needed to approximate the square root function over a continuous interval.

Lemma 4.2. (Lemma 3 in [50]) Let ε > 0 and φ(x) = √x. Let ψ be a piecewise linear function whose linear pieces are all tangent lines to φ, satisfying ψ(x) ≤ (1 + ε)φ(x) for all x ∈ [l, u] for 0 < l < u. Then ψ contains at least ⌈log_{γ(ε)}(u/l)⌉ linear pieces, where γ(ε) = 1 + 2ε(2 + ε) + 2(1 + ε)√(ε(2 + ε)). There exists a piecewise linear function ψ* of this form with exactly ⌈log_{γ(ε)}(u/l)⌉ linear pieces. As ε → 0, this value behaves as Θ(ε^(−1/2) log(u/l)). (The statement about the existence of ψ* is not included explicitly in the statement of Lemma 3 in [50], but it follows directly from the proof of the lemma, which shows how to construct such an optimal function ψ*.)

Lemma 4.2 is concerned with approximating the square root function for all values on a continuous interval. Therefore, it does not immediately imply any bounds on approximating a discrete set of splitting penalties. In fact, we know that when lower bounding the number of linear pieces needed to approximate any w ∈ S_r, there is no lower bound of the form q(ε)f(r) that holds for all ε > 0, if q is a function such that q(ε) → ∞ as ε → 0. This is simply because we can approximate w by piecewise linear interpolation, leading to an upper bound of O(r) linear pieces even when ε = 0. Therefore, the best we can expect is a lower bound that holds for ε values that may still go to zero as r → ∞, but are bounded in such a way that we do not contradict the O(r) upper bound that holds for all SCB integer functions. We prove such a result for the square root splitting function, using Lemma 4.2 as a black box. When ε falls below the bound we assume in the following theorem statement, forming O(r) linear pieces will be nearly optimal. Theorem 4.3.
Let ε > 0 and let w(i) = √i be the square root r-SCB integer function. If ε ≥ r^(−δ) for some constant δ ∈ (0, 2), then any piecewise linear function providing a (1 + ε)-approximation for w contains Ω(log_{γ(ε)} r) linear pieces, which behaves as Ω(ε^(−1/2) log r) as ε → 0.

Proof. Let L* be the optimal set of linear pieces returned by running Algorithm 1. In order to show |L*| = Ω(log_{γ(ε)} r), we will construct a new set of linear pieces L that has asymptotically the same number of linear pieces as L*, but also provides a (1 + ε)-approximation for all x in an interval [r^β, r] for some constant β < 1. Invoking Lemma 4.2 will then guarantee the final result. Recall that L* includes only two types of linear pieces: either linear pieces g satisfying g(j) = √j for exactly one integer j (see (24)), or linear pieces formed by joining two points of w (see (23)). For the square root splitting function, the latter type of linear piece is of the form

    g^(t)(i) = (√(t + 1) − √t)(i − t) + √t,   (29)

for some positive integer t less than r. This is the linear interpolation of the points (t, √t) and (t + 1, √(t + 1)). Both types of linear pieces bound φ(x) = √x above at integer points, but they may cross below φ at non-integer values of x. To apply Lemma 4.2, we would like to obtain a set of linear pieces that are all tangent lines to φ. We accomplish this by replacing each linear piece in L* with two or three linear pieces that are tangent to φ at some point. For a positive integer j, let g_j denote the line tangent to φ(x) = √x at x = j, which is given by

    g_j(x) = (1/(2√j))(x − j) + √j.   (30)

We form a new set of linear pieces L made up of lines tangent to φ using the following replacements:

• If L* contains a linear piece g that satisfies g(j) = √j for exactly one integer j, add lines g_{j−1}, g_j, and g_{j+1} to L.
• If for an integer t, L* contains the line g^(t) as given by Eq. (29), add lines g_t and g_{t+1} to L.

By Observation 3.1, this replacement can only improve the approximation guarantee at integer points. Therefore, L provides a (1 + ε)-approximation at integer values, is made up strictly of lines that are tangent to φ, and contains at most three times the number of lines in L*. Due to the concavity of φ, if a single line g ∈ L provides a (1 + ε)-approximation at consecutive integers i and i + 1, then g provides the same approximation guarantee for all x ∈ [i, i + 1].
However, if two integers i and i + 1 are not both covered by the same line in L, then L does not necessarily provide a (1 + ε)-approximation for every x ∈ [i, i + 1]. There can be at most |L| intervals of this form, since each interval defines an "intersection" at which one line g ∈ L ceases to be a (1 + ε)-approximation, and another line g′ ∈ L "takes over" as the line providing the approximation. By Lemma 4.2, we can cover an entire interval [i, i + 1] for any integer i using a set of ⌈log_{γ(ε)}((i + 1)/i)⌉ linear pieces that are tangent to φ somewhere in [i, i + 1]. Since 1 + √ε ≤ γ(ε), it in fact takes only one linear piece to cover [i, i + 1] as long as 1 + 1/i ≤ 1 + √ε, i.e., i ≥ 1/√ε. Since ε ≥ r^(−δ), the interval [i, i + 1] can be covered by a single linear piece if i ≥ r^(δ/2). Therefore, for each interval [i, i + 1], with i ≥ r^(δ/2), that is not already covered by a single linear piece in L, we add one more linear piece to L to cover this interval. This at most doubles the size of L. The resulting set L will have at most 6 times as many linear pieces as L*, and is guaranteed to provide a (1 + ε)-approximation for all integers, as well as the entire continuous interval [r^(δ/2), r]. Since δ is a fixed constant strictly less than 2, applying Lemma 4.2 shows that L has at least

    ⌈log_{γ(ε)}(r / r^(δ/2))⌉ = Ω(log_{γ(ε)} r^(1−δ/2)) = Ω(log_{γ(ε)} r)

linear pieces. Therefore, |L*| = Ω(log_{γ(ε)} r) as well.

When approximating the clique expansion splitting function, Algorithm 1 will in fact find a piecewise linear curve with at most O(ε^(−1/2) log log(1/ε)) linear pieces. We prove this by highlighting a different approach for constructing a piecewise linear curve with this many linear pieces, which upper bounds the number of linear pieces in the optimal curve found by Algorithm 1.

Clique splitting penalties for a k-node hyperedge correspond to nonnegative integer values of the continuous function ζ(x) = x · (k − x). As we did in Section 3.3, we want to build a set of linear pieces L that provides an upper bounding (1 + ε)-cover of ζ at integer values in [0, r], where r = ⌊k/2⌋. We start by adding the line g^(0)(x) = (w(1) − w(0))x + w(0) = (k − 1) · x to L, which perfectly covers the first two splitting penalties w(0) = 0 and w(1) = k − 1. In the remainder of our new procedure we will find a set of linear pieces to (1 + ε)-cover ζ at every value of x ∈ [1, k/2], not just at integer values of x.

We apply a greedy procedure similar to Algorithm 1. At each iteration we consider a leftmost endpoint z_i, which is the largest value in [1, k/2] for which we already have a (1 + ε)-approximation. In the first iteration, we have z_1 = 1. We then would like to find a new linear piece that provides a (1 + ε)-approximation for all values from z_i to some z_{i+1}, where the value of z_{i+1} is maximized. We restrict to linear pieces that are tangent to ζ. The line tangent to ζ at t ∈ [1, k/2] is given by

    g_t(x) = kx − 2tx + t^2.   (31)

We find z_{i+1} in two steps:

1. Step 1: Find the maximum value t such that g_t(z_i) = (1 + ε)ζ(z_i).
2. Step 2: Given t, find the maximum z_{i+1} such that g_t(z_{i+1}) = (1 + ε)ζ(z_{i+1}).

After completing these two steps, we add the linear piece g_t to L, knowing that it covers all values in [z_i, z_{i+1}] with a (1 + ε)-approximation. At this point, we will have a cover for all values in [0, z_{i+1}], and we begin a new iteration with z_{i+1} being the largest value covered. We continue until we have covered all values up until z_{i+1} ≥ k/2. If t > k/2, we instead add the constant line tangent to ζ at x = k/2, so that we only include lines that have a nonnegative slope.
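Each of the two steps above reduces to a quadratic equation in one unknown. The following Python sketch (an illustration of ours, taking the larger root in each step as the procedure prescribes) carries out one iteration:

```python
import math

def zeta(x, k):
    """Clique splitting curve: zeta(x) = x * (k - x)."""
    return x * (k - x)

def tangent_line(t, k):
    """Line tangent to zeta at t: g_t(x) = kx - 2tx + t^2, as (slope, intercept)."""
    return (k - 2.0 * t, t * t)

def one_iteration(z, k, eps):
    """Step 1: largest t with g_t(z) = (1+eps)*zeta(z).
    Step 2: largest z_next with g_t(z_next) = (1+eps)*zeta(z_next)."""
    # Step 1 quadratic: t^2 - 2zt - eps*z*k + (1+eps)*z^2 = 0; its larger root
    # simplifies to z + sqrt(eps * z * (k - z)).
    t = z + math.sqrt(eps * z * (k - z))
    # Step 2 quadratic: (1+eps)*z'^2 - (eps*k + 2t)*z' + t^2 = 0; larger root.
    a, b, c = 1.0 + eps, -(eps * k + 2.0 * t), t * t
    z_next = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return t, z_next
```

For example, with k = 1000 and ε = 0.1, a single iteration starting from z = 1 already extends the covered interval to roughly z_next ≈ 110, illustrating how quickly the tangent pieces advance.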
Lemma 4.4.
For any z_i ∈ [1, k/2], the values of t and z_{i+1} given in Steps 1 and 2 are

    t = z_i + √(z_i (k − z_i) ε)   (32)

    z_{i+1} = (2t + kε) / (2(1 + ε)) + (1 / (2(1 + ε))) · (k^2 ε^2 + 4εt(k − t))^(1/2).   (33)

Proof.
The proof simply requires solving two different quadratic equations. For Step 1:

    g_t(z_i) = (1 + ε)ζ(z_i) ⟺ kz_i − 2tz_i + t^2 = (1 + ε)(z_i k − z_i^2) ⟺ t^2 − 2z_i t − εz_i k + (1 + ε)z_i^2 = 0.

Taking the larger solution to maximize t:

    t = (1/2)(2z_i + √(4z_i^2 − 4(1 + ε)z_i^2 + 4εkz_i)) = z_i + √(z_i(k − z_i)ε).

For Step 2:

    g_t(z_{i+1}) = (1 + ε)ζ(z_{i+1}) ⟺ kz_{i+1} − 2tz_{i+1} + t^2 = (1 + ε)(z_{i+1}k − z_{i+1}^2) ⟺ (1 + ε)z_{i+1}^2 + z_{i+1}(−εk − 2t) + t^2 = 0.

We again take the larger solution to this quadratic equation since we want to maximize z_{i+1}:

    z_{i+1} = (1/(2(1 + ε)))(εk + 2t + √(ε^2 k^2 + 4tεk + 4t^2 − 4(1 + ε)t^2)) = (1/(2(1 + ε)))(εk + 2t + √(ε^2 k^2 + 4tε(k − t))).

Algorithm 3  Find a (1 + ε)-cover L for the clique splitting function.
Input: Hyperedge size k, ε ≥ 0
Output: A (1 + ε)-cover L for the clique splitting function
  L = {g^(0)}, where g^(0)(x) = (k − 1)x
  z_1 = 1
  do
    t ← z_i + √(z_i(k − z_i)ε)
    z_{i+1} ← (2t + kε)/(2(1 + ε)) + (1/(2(1 + ε)))(k^2 ε^2 + 4εt(k − t))^(1/2)
    if t > k/2 then
      L ← L ∪ {g_{k/2}}, where g_{k/2}(x) = k^2/4
    else
      L ← L ∪ {g_t}, where g_t(x) = kx − 2tx + t^2
    end if
  while z_{i+1} < k/2
  Return f defined by f(x) = min_{g ∈ L} g(x)

Algorithm 3 summarizes the new procedure for covering the clique splitting function. Since z_1 = 1, if ε ≥ 1, then

    z_2 ≥ (1/(2(1 + ε)))(2kε) = kε/(1 + ε) ≥ k/2,

so after one step we have covered the entire interval [1, k/2]. Therefore, in what follows we assume ε < 1. Theorem 4.5.
For ε < 1, if L is the output from Algorithm 3, then |L| = O(ε^(−1/2) log log(1/ε)).

Proof. We get a loose bound for the value of t in Lemma 4.4 by noting that (k − z_i) ≥ k/2 ≥ z_i:

    t = z_i + √(z_i ε(k − z_i)) ≥ z_i + √(z_i^2 ε) = z_i(1 + √ε).   (34)

Since we assumed ε < 1, we know that

    t/(1 + ε) ≥ z_i(1 + √ε)/(1 + ε) > z_i.   (35)

Therefore, from (33) we see that

    z_{i+1} > z_i + kε/(2(1 + ε)) + (1/(2(1 + ε)))(k^2 ε^2 + 4εt(k − t))^(1/2)   (36)
          > z_i + kε/(2(1 + ε)) + (1/(2(1 + ε)))(k^2 ε^2)^(1/2) = z_i + kε/(1 + ε).   (37)

From this we see that at each iteration, we cover an additional interval of length z_{i+1} − z_i > kε/(1 + ε), and therefore we know it will take at most O(1/ε) iterations to cover all of [1, k/2]. However, the length z_{i+1} − z_i in fact increases significantly with each iteration, allowing the algorithm to cover larger and larger intervals as it progresses. Since z_1 = 1 and z_{i+1} − z_i ≥ kε/(1 + ε), we see that z_j ≥ kε for all j ≥ 3. For the remainder of the proof, we focus on bounding the number of iterations it takes to cover the interval [kε, k/2]. Round j refers to the set of iterations that the algorithm spends to cover the interval

    R_j = [ kε^((1/2)^(j−1)), kε^((1/2)^j) ].   (38)

For example, Round 1 starts with the iteration i such that z_i ≥ kε, and terminates when the algorithm reaches an iteration i′ where z_{i′} ≥ kε^(1/2). A key observation is that it takes less than 4/√ε iterations for the algorithm to finish Round j for any value of j. To see why, observe that from the bound in (36) we have

    z_{i+1} − z_i > kε/(2(1 + ε)) + (1/(2(1 + ε)))(k^2 ε^2 + 4εt(k − t))^(1/2) > (1/(2(1 + ε)))(4εt(k − t))^(1/2) ≥ (1/(2(1 + ε)))(2εz_i k)^(1/2) = (√2/(2(1 + ε))) √(kε) √z_i.

For each iteration i in Round j, we know that z_i ≥ kε^((1/2)^(j−1)), so that

    z_{i+1} − z_i > (√2/(2(1 + ε))) √(kε) √(kε^((1/2)^(j−1))) ≥ (√2/(2(1 + ε))) · k · ε^(1/2 + (1/2)^j) = C · k · ε^(1/2 + (1/2)^j),   (39)

where C = √2/(2(1 + ε)) is a constant larger than 1/4. Since each iteration of Round j covers an interval of length at least C · k · ε^(1/2 + (1/2)^j), and the right endpoint for Round j is kε^((1/2)^j), the maximum number of iterations needed to complete Round j is

    kε^((1/2)^j) / (C · k · ε^(1/2 + (1/2)^j)) = 1/(C√ε).   (40)

Therefore, after p rounds, the algorithm will have performed O(p · ε^(−1/2)) iterations, to cover the interval [1, kε^((1/2)^p)]. Since we set out to cover the interval [1, k/2], we can stop once p satisfies ε^((1/2)^p) ≥ 1/2, which holds as long as p ≥ log_2 log_2(1/ε):

    ε^((1/2)^p) ≥ 1/2 ⟺ (1/2)^p log_2 ε ≥ −1 ⟺ log_2 ε ≥ −2^p ⟺ log_2(1/ε) ≤ 2^p ⟺ log_2 log_2(1/ε) ≤ p.

This means that the number of iterations of Algorithm 3, and therefore the number of linear pieces in L, is bounded above by O(ε^(−1/2) log log(1/ε)).

We obtain a proof of Theorem 1.3 on sparsifying the complete graph as a corollary. Proof of Theorem 1.3.
A complete graph on n nodes can be viewed as a hypergraph with a single n-node hyperedge with a clique expansion splitting function. Theorem 1.3 says that the clique expansion integer function w(i) = i · (n − i) can be covered with O(ε^(−1/2) log log(1/ε)) linear pieces, which is equivalent to saying that the clique expansion splitting function can be modeled using this many CB-gadgets. Each CB-gadget has two auxiliary nodes and (2n + 1) directed edges. This results in an augmented sparsifier for the complete graph with O(nε^(−1/2) log log(1/ε)) edges. This is only meaningful if ε is small enough so that O(ε^(−1/2) log log(1/ε)) is asymptotically less than n, so our sparsifier has O(n + ε^(−1/2) log log(1/ε)) = O(n) nodes. □

Recall from the introduction that a co-occurrence graph is formally defined by a set of nodes V and a set C of subsets of V. In practice, each c ∈ C could represent some type of group interaction involving the nodes in c, or a set of nodes sharing the same attribute. We define the co-occurrence graph G = (V, E) on C to be the graph where nodes i and j share an edge with weight w_ij = Σ_{c ∈ C : i,j ∈ c} w_c, where w_c ≥ 0 is a weight associated with each c ∈ C. The case when w_c = 1 is standard and is an example of the common practice of "one-mode projections" of bipartite graphs or affiliation networks [11, 40, 42, 52, 53, 57, 72]: a graph is formed on the nodes from one side of a bipartite graph by connecting two nodes whenever they share a common neighbor on the other side, where edges are weighted based on the number of shared neighbors.

A co-occurrence graph G has the following co-occurrence cut function:

    cut_G(S) = Σ_{c ∈ C} w_c · |S ∩ c| · |S̄ ∩ c|.   (41)

In this sense, the co-occurrence graph is naturally interpreted as a weighted clique expansion of a hypergraph H = (V, C), which itself is a special case of reducing a submodular, cardinality-based hypergraph to a graph.
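The identity (41) is what makes implicit sparsification attractive: the cut value of a co-occurrence graph can be evaluated group by group, without ever building the dense projection. A minimal sketch (the function names are ours):

```python
def cooccurrence_cut(groups, weights, S):
    """Evaluate cut_G(S) = sum_c w_c * |S ∩ c| * |c \\ S|  (Eq. (41))
    directly from the co-occurrence groups, without materializing the
    dense projected graph."""
    S = set(S)
    total = 0
    for c, w_c in zip(groups, weights):
        inside = sum(1 for v in c if v in S)
        total += w_c * inside * (len(c) - inside)
    return total

def projected_cut(groups, weights, S):
    """Reference implementation: build the weighted one-mode projection
    explicitly, then evaluate the ordinary graph cut."""
    S = set(S)
    w = {}
    for c, w_c in zip(groups, weights):
        c = sorted(c)
        for idx, i in enumerate(c):
            for j in c[idx + 1:]:
                w[(i, j)] = w.get((i, j), 0) + w_c
    return sum(wij for (i, j), wij in w.items() if (i in S) != (j in S))
```

Both functions return the same value, but the first touches each group once (O(Σ_c |c|) work), while the second pays for Θ(|c|^2) edges per group, which is exactly the cost that the sparsifiers in this section avoid.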
However, this type of graph construction is by no means restricted to the literature on hypergraph clustering. In many applications, the first step in a larger experimental pipeline is to construct a graph of this type from a large dataset. The resulting graph is often quite dense, as numerous domains involve large hyperedges [56, 67]. This makes it expensive to form, store, and compute over co-occurrence graphs in practice.

Solving cut problems on these dense co-occurrence graphs arises naturally in many settings. For example, any hypergraph clustering application that relies on a clique expansion involves a graph with a co-occurrence cut function [1, 28–30, 44, 58, 65, 67, 68, 71, 73]. Clustering social networks is another use case, as online platforms have many ways to create groups of users (e.g., events, special interest groups, businesses, organizations, etc.) that can be large in practice. Furthermore, cuts in co-occurrence graphs of students on a university campus (based on, e.g., common classes, living arrangements, or physical proximity) are relevant to preventing the spread of infectious diseases such as COVID-19.

In these cases, it would be more efficient to sparsify the graph without ever forming it explicitly, by sparsifying the large cliques induced by co-occurrence relationships. Although this strategy seems intuitive, it is often ignored in practice. We therefore present several theoretical results that highlight the benefits of this implicit approach to sparsification. Our focus is on results that can be achieved using augmented sparsifiers for cliques, though many of the same benefits could also be achieved with standard sparsification techniques. Let C be a set of nonempty co-occurrence groups on a set of n nodes, V, and let G = (V, E) be the corresponding co-occurrence graph on C. For c ∈ C, let k_c = |c| be the number of nodes in c. For v ∈ V, let d_v be the co-occurrence degree of v: the number of sets c containing v.
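The co-occurrence graph and its cut function (41) can both be computed directly from the groups, which makes the equivalence between the explicit projection and the group-by-group formula easy to check. A minimal sketch (function names are illustrative):

```python
from itertools import combinations
from collections import defaultdict

def cooccurrence_graph(groups, weights=None):
    """Projection edge weights: w_ij = sum of w_c over groups c containing both i and j."""
    w = defaultdict(float)
    for idx, c in enumerate(groups):
        wc = 1.0 if weights is None else weights[idx]
        for i, j in combinations(sorted(c), 2):
            w[(i, j)] += wc
    return w

def cooccurrence_cut(groups, S, weights=None):
    """Cut value via Eq. (41): sum_c w_c * |S ∩ c| * |S-bar ∩ c|."""
    total = 0.0
    for idx, c in enumerate(groups):
        wc = 1.0 if weights is None else weights[idx]
        inside = len(S & set(c))
        total += wc * inside * (len(c) - inside)
    return total

groups = [{0, 1, 2, 3}, {2, 3, 4}]
G = cooccurrence_graph(groups)
S = {0, 2}
# The cut computed edge-by-edge on the projection equals Eq. (41) on the groups.
edge_cut = sum(w for (i, j), w in G.items() if (i in S) != (j in S))
assert edge_cut == cooccurrence_cut(groups, S)
```

Note that the explicit projection materializes Θ(k_c²) edges per group, which is exactly the cost the implicit sparsification discussed below avoids.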
Let d_avg = (1/n) Σ_{v∈V} d_v be the average co-occurrence degree. We re-state and prove Theorem 1.4, first presented in the introduction. The proof holds independent of the weight w_c we associate with each c ∈ C, since we can always scale our graph reduction techniques by an arbitrary positive weight.

Theorem.
Let ε > 0 and f(ε) = ε^{-1/2} log log ε^{-1}. There exists an augmented sparsifier for G with O(n + |C| · f(ε)) nodes and O(n · d_avg · f(ε)) edges. In particular, if d_avg is constant and for some δ > 0 we have Σ_{c∈C} |c|² = Ω(n^{1+δ}), then forming G explicitly takes Ω(n^{1+δ}) time, but an augmented sparsifier for G with O(n f(ε)) nodes and O(n f(ε)) edges can be constructed in O(n f(ε)) time.

Proof. The set c induces a clique in the co-occurrence graph with O(k_c²) edges. Therefore, the runtime for explicitly forming G = (V, E) by expanding cliques and placing all edges equals O(Σ_{c∈C} k_c²) = Ω(n^{1+δ}). By Theorem 1.3, for each c ∈ C we can produce an augmented sparsifier with O(k_c f(ε)) directed edges and O(f(ε)) new auxiliary nodes. Sparsifying each clique in this way will produce an augmented sparsifier Ĝ = (V̂, Ê) where

|Ê| = Σ_{c∈C} O(k_c f(ε)) = O(f(ε) · n · d_avg),   (42)
|V̂| = n + Σ_{c∈C} O(f(ε)) = O(n + |C| f(ε)).   (43)

Observe that n · d_avg = Σ_{v∈V} d_v = Σ_{c∈C} k_c. If d_avg is a constant, this implies that Σ_{c∈C} k_c = O(n), and furthermore that |C| = O(n), since each k_c ≥
1. Therefore |Ê| and |V̂| are both O(n f(ε)). Only O(f(ε)) edge weights need to be computed for each clique, so the overall runtime is just the time it takes to explicitly place the O(n f(ε)) edges.

The above theorem and its proof include the case where |C| = o(n), meaning that C is made up of a sublinear number of large co-occurrence interactions. In this case, our augmented sparsifier will have fewer than O(n f(ε)) nodes. When |C| = ω(n), the average degree will no longer be a constant, and it therefore becomes theoretically beneficial to sparsify each clique in C using standard undirected sparsifiers. For each c ∈ C, standard cut sparsification techniques will produce an ε-cut sparsifier of c with O(k_c ε^{-2}) undirected edges and exactly k_c nodes. If two nodes appear in multiple co-occurrence relationships, the resulting edges can be collapsed into a weighted edge between the nodes, meaning that the number of edges in the resulting sparsifier does not depend on d_avg. We discuss tradeoffs between different sparsification techniques in depth in a later subsection. Regardless of the sparsification technique we apply in practice, implicitly sparsifying a co-occurrence graph will often lead to a significant decrease in runtime compared to forming the entire graph prior to sparsifying it.

We now consider a simple model for co-occurrence graphs with a power-law group size distribution that produces graphs satisfying the conditions of Theorem 1.4 in a range of different parameter settings. Such distributions have been observed for many types of co-occurrence graphs constructed from real-world data [11, 19]. More formally, let V be a set of n nodes, and assume a co-occurrence set c is randomly generated by sampling a size K from a discrete power-law distribution where for k ∈ [1, n]: P[K = k] = C k^{−γ}.
Here, C is a normalizing constant for the distribution, and γ is a parameter of the model. Once K is drawn from this model, a co-occurrence set c is generated by choosing a set of K nodes from V uniformly at random. This procedure can be repeated an arbitrary number of times (drawing all sizes K independently) to produce a set of co-occurrence sets C. This C can then be used to generate a co-occurrence graph G = (V, E). (The end result of this procedure is a type of random intersection graph [12].) We first consider a parameter regime where set sizes are constant on average but large enough to produce a dense co-occurrence graph that is inefficient to explicitly form in practice. This regime has an exponent γ ∈ (2, 3).

Theorem 5.1.
Let C be a set of O(n) co-occurrence sets obtained from the power-law model with γ ∈ (2, 3). The expected degree of each node will be constant and E[Σ_{c∈C} |c|²] = O(n^{4−γ}).

Proof. Let K be the size of a randomly generated co-occurrence set. We compute:

E[K²] = Σ_{k=1}^n k² · P[K = k] = C · Σ_{k=1}^n k^{2−γ} ≤ C · [1 + ∫₁ⁿ x^{2−γ} dx] = C + C n^{3−γ}/(3 − γ) − C/(3 − γ) = O(n^{3−γ}).

Therefore,

E[Σ_{c∈C} |c|²] = Σ_{c∈C} E[K²] = O(n^{4−γ}).

For a node v ∈ V and a randomly generated set c, the probability that v will be selected to be in c is

P[v ∈ c] = Σ_{k=1}^n P[|c| = k] · binom(n−1, k−1)/binom(n, k) = C · Σ_{k=1}^n k^{−γ} · (k/n) = (C/n) · [1 + ∫₁ⁿ x^{1−γ} dx] = O(n^{−1}).

Since there are O(n) co-occurrence sets in C and they each are generated independently, in expectation, v will have a constant degree.

We similarly consider another regime of co-occurrence graphs where the number of co-occurrence sets is asymptotically smaller than n, but the co-occurrence sets are larger on average.

Theorem 5.2.
Let C be a set of O(n^β) co-occurrence sets, where β ∈ (0, 1), obtained from the power-law co-occurrence model with γ = 1 + β. Then the expected degree of each node will be a constant and E[Σ_{c∈C} |c|²] = O(n²).

Proof. Again let K be a random variable representing the co-occurrence set size. We have

E[K²] = C · Σ_{k=1}^n k^{2−γ} = O(n^{3−γ}) ⟹ E[Σ_{c∈C} |c|²] = O(n^{β+3−γ}) = O(n²).

For a node v ∈ V and a randomly generated set c, the probability that v will be in c is

P[v ∈ c] = Σ_{k=1}^n P[|c| = k] · binom(n−1, k−1)/binom(n, k) = (C/n) · Σ_{k=1}^n k^{1−γ} = O(n^{1−γ}) = O(n^{−β}).

Since there are O(n^β) co-occurrence sets in C, the expected degree of v is a constant.

In Theorem 5.2, the exponent of the power-law distribution is assumed to be directly related to the number of co-occurrence sets in C. This assumption is included simply to ensure that we are in fact considering co-occurrence graphs with O(n) nodes. We could alternatively consider a power-law distribution with exponent γ ∈ (1,
2) and generate O(n^β) co-occurrence sets for any β < γ − 1. We simply note that in this regime, the expected average degree will be o(1). Assuming we exclude isolated nodes, this will produce a co-occurrence graph with o(n) nodes in expectation. Our techniques still apply in this setting, and we can produce augmented sparsifiers with O(|C| · f(ε)) nodes and O(n · d_avg · f(ε)) = o(n · f(ε)) edges. When |C| = Ω(n), then d_avg = Ω(1) and the number of edges in our augmented sparsifiers will have worse than linear dependence on n. However, in this regime we can still quickly obtain sparsifiers with O(n ε^{-2}) edges via implicit sparsification by using standard undirected sparsifiers.

More sophisticated models for generating co-occurrence graphs can also be derived from existing models for projections of bipartite graphs [11–13]. These make it possible to set different distributions for node degrees in V and highlight other classes of co-occurrence graphs satisfying the assumptions of Theorem 1.4. Here we have chosen to focus on the simplest model for illustrating classes of power-law co-occurrence graphs that satisfy the assumptions of the theorem.

There are several tradeoffs to consider when using different black-box sparsifiers for implicit co-occurrence sparsification. Standard sparsification techniques involve no auxiliary nodes and have undirected edges, which is beneficial in numerous applications. Also, the number of edges they require is independent of d_avg. Therefore, in cases where the average co-occurrence degree is larger than a constant, we obtain better theoretical improvements using standard sparsifiers. On the other hand, in many settings it is natural to assume the number of co-occurrences each node belongs to is a constant, even if some co-occurrences are very large. In these regimes, our augmented sparsifiers will have fewer edges than traditional sparsifiers due to a better dependence on ε.
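The power-law model above is straightforward to simulate. The following sketch (parameters are illustrative) samples the Theorem 5.1 regime with γ ∈ (2, 3) and O(n) sets, where the average co-occurrence degree stays near a small constant while the cost of explicitly expanding all cliques grows much faster:

```python
import random

def sample_sizes(n, gamma, num_sets, rng):
    """Draw set sizes K in [1, n] with P[K = k] proportional to k**(-gamma)."""
    ks = list(range(1, n + 1))
    wts = [k ** (-gamma) for k in ks]
    return rng.choices(ks, weights=wts, k=num_sets)

rng = random.Random(0)
n, gamma = 2000, 2.5                       # Theorem 5.1 regime: gamma in (2, 3)
sizes = sample_sizes(n, gamma, n, rng)     # O(n) co-occurrence sets
groups = [rng.sample(range(n), k) for k in sizes]

avg_deg = sum(len(c) for c in groups) / n          # average co-occurrence degree
clique_cost = sum(len(c) ** 2 for c in groups)     # cost of forming the projection
# The average degree concentrates near a constant, while the clique-expansion
# cost is superlinear, matching the dense regime targeted by Theorem 1.4.
assert avg_deg < 10
assert clique_cost > 5 * n
```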
Our techniques are also deterministic, and our sparsifiers are very easy to construct in practice. Edge weights for our sparsifiers can be determined in O(f(ε)) time for each co-occurrence group using Algorithm 1 (or Algorithm 3) coupled with Lemma 3.2. The bottleneck in our construction is simply visiting each node in a set c to place edges between it and the auxiliary nodes. Even in cases where there are no asymptotic reductions in theoretical runtime, our techniques provide a simple and highly practical tool for solving cut problems on co-occurrence data.

Appendix B shows how our sparse reduction techniques can be adjusted to apply even when splitting functions are asymmetric and are not required to satisfy w_e(∅) = w_e(e) = 0 (the non-cut ignoring property). Section 3 addresses the special case of symmetric and non-cut ignoring functions, as these assumptions are more natural for hypergraph cut problems [44, 46, 66], and they provide the clearest exposition of our main techniques and results. Furthermore, applying the generalized asymmetric reduction strategy in Appendix B to a symmetric splitting function would introduce twice as many edges as applying the reduction from Section 3 designed explicitly for the symmetric case. Nevertheless, the same asymptotic upper bound of O(ε^{-1} log k) edges holds for approximately modeling the more general splitting function on a k-node hyperedge. By dropping the symmetry and non-cut ignoring assumptions, our techniques lead to the first approximation algorithms for the more general problem of minimizing cardinality-based decomposable submodular functions.

Any submodular function can be minimized in polynomial time [31, 32, 55], but the runtimes for general submodular functions are impractical in most cases. A number of recent papers have developed faster algorithms for minimizing submodular functions that are sums of simpler submodular functions [20, 21, 33, 34, 39, 45, 54, 64].
This is also known as decomposable submodular function minimization (DSFM). Many energy minimization problems from computer vision correspond to DSFM problems [24, 37, 38].

Let f : 2^V → R_+ be a submodular function such that for S ⊆ V,

f(S) = Σ_{e∈E} f_e(S ∩ e),   (44)

where for each e ∈ E, f_e is a simpler submodular function with support only on a subset e ⊆ V. We can assume without loss of generality that every f_e is a non-negative function. The goal of DSFM is to find arg min_S f(S). The terminology used for problems of this form differs depending on the context. We will continue to refer to E as a hyperedge set, V as a node set, f_e as generalized splitting functions, and f as some type of generalized hypergraph cut function. Much previous research explicitly considers the case where each function f_e is given by f_e(S) = g_e(|S|) for some concave function g_e [33, 34, 37, 39, 64]. Unlike existing work on generalized hypergraph cut functions [44, 46, 66], research on DSFM does not typically assume that the functions f_e are symmetric, and also does not assume that f_e(∅) = f_e(e) = 0.

6.2 Notation for Runtime Comparisons

Let n = |V|, R = |E|, µ = Σ_{e∈E} |e|, and let k_avg = µ/R denote the average hyperedge size. Note that

Σ_{e∈E} log |e| ≤ R log n,  Σ_{e∈E} |e| log |e| ≤ µ log n,  max{n, R} ≤ µ ≤ n · R.

We primarily focus on how our techniques enable us to obtain runtimes that are strictly better in terms of number of nodes, number of edges, and average hyperedge size, by producing an approximate solution. We use ˜O notation to hide logarithmic factors of n and R. In order to compare weakly polynomial runtimes, in some cases we restrict to the case where f_e has integer outputs. For this case, we let F_max = max_{S⊆V} f(S), and assume log F_max is small enough that it can also be absorbed by ˜O notation. We also consider strongly polynomial runtimes that can be obtained for arbitrary edge weights.
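A cardinality-based decomposable function in the sense of (44), with f_e(S) = g_e(|S ∩ e|) for concave g_e, can be evaluated directly from its parts. The sketch below (hyperedges and penalties are hypothetical) minimizes such a function by brute force purely for illustration; the point of the reduction in Appendix B is precisely to replace this exponential search with a directed s-t cut:

```python
import math
from itertools import chain, combinations

def dsfm_objective(edges, g_funcs, S):
    """f(S) = sum_e f_e(S ∩ e), with cardinality-based f_e(A) = g_e(|A|)."""
    S = set(S)
    return sum(g(len(S & set(e))) for e, g in zip(edges, g_funcs))

def powerset(V):
    return chain.from_iterable(combinations(V, r) for r in range(len(V) + 1))

# Two hyperedges with concave (hence submodular, cardinality-based) penalties.
edges = [{0, 1, 2}, {1, 2, 3, 4}]
g_funcs = [math.sqrt, math.sqrt]

# Exhaustive minimization over all subsets of V = {0, ..., 4}, illustration only.
best = min(powerset(range(5)), key=lambda S: dsfm_objective(edges, g_funcs, S))
assert dsfm_objective(edges, g_funcs, best) == 0   # empty set is optimal here
```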
Previous research on DSFM has focused largely on runtimes for finding exact solutions. Our goal is to highlight improved runtimes that can be obtained if we are content with solutions that are within a factor (1 + ε) of optimality. Appendix B shows how our reduction techniques enable us to approximately minimize a cardinality-based DSFM problem. This can be accomplished by solving a directed minimum s-t cut problem on a reduced graph with N = O(n + ε^{-1} R log n) nodes and M = O(ε^{-1} µ log n) edges. We use this to obtain the strongly polynomial runtime guarantee of ˜O(min{ε^{-3/2} µ^{3/2}, ε^{-1} µ (n + ε^{-1} R)^{2/3}}) given in Theorem 1.5, which we now prove.
Proof.
The runtime comes from applying the directed s-t cut solver of Goldberg and Rao [27]. Although Goldberg and Rao assume integer edge weights and report weakly polynomial runtimes for exact s-t cut solutions, as long as we are content with approximate solutions, slight adjustments allow us to obtain a strongly polynomial runtime for arbitrary weights.

If ε is a constant greater than one, we can decrease it to equal 1 and get a better approximation with the same asymptotic runtime. In the remainder of the proof, we therefore assume ε ≤ 1. Set ε′ = ε/7, and let Ĝ = (V̂, Ê) be the directed graph resulting from our approximate reduction techniques with parameter ε′. This graph has N = O(n + ε^{-1} R log n) nodes and M = O(ε^{-1} µ log n) edges, and distinguished source and sink nodes s and t, so that the minimum s-t cut corresponds to a (1 + ε′)-approximation for DSFM. Begin by scaling the edge weights so that the minimum s-t cut in Ĝ is at least one. This can be done by finding an augmenting flow path from s to t and scaling edge weights so that the path has a capacity of at least 1 on all edges. If the graph has irrational edge weights, we can perform a standard scaling procedure to turn it into a directed graph with integer edge weights, in a way that guarantees we do not lose much in the approximation factor. This is done by adjusting all edge weights by up to an additive term ε′/M to reach a nearby rational number, producing a graph G̃ = (V̂, Ẽ). Let cut_Ĝ and cut_G̃ denote the cut functions of Ĝ and G̃, respectively. The graphs have the same set of nodes and edges, but may have different edge weights. Let w_ij > 0 denote the weight of edge (i, j) in Ĝ.

(Footnote: In some cases, runtimes for finding solutions to within an additive error of optimality have been considered [20], but these are not directly comparable to our multiplicative approximation guarantees. Furthermore, these runtimes only improve in terms of logarithmic factors when an approximate solution is returned rather than an optimal one.)
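The weight-rounding step can be sanity-checked numerically: once the minimum s-t cut is scaled to be at least 1, rounding each of the M edge weights up by at most ε/M changes any cut value by at most an additive ε, hence by at most a (1 + ε) factor. A toy check with hypothetical weights:

```python
import random

rng = random.Random(1)
eps, M = 0.05, 200
# Weights of the edges crossing some fixed cut S (hypothetical values >= 0.1,
# so the cut value is well above 1 after the scaling step described above).
w = [rng.uniform(0.1, 2.0) for _ in range(M)]
w_rounded = [x + rng.uniform(0, eps / M) for x in w]   # perturb by <= eps/M each

cut, cut_rounded = sum(w), sum(w_rounded)
assert cut <= cut_rounded <= cut + eps    # total additive error at most eps
assert cut_rounded <= (1 + eps) * cut     # multiplicative bound, since cut >= 1
```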
For any S ⊆ V̂, let ∂S be the set of edges cut by S. We have

cut_Ĝ(S) ≤ cut_G̃(S) ≤ Σ_{(i,j)∈∂S} [w_ij + ε′/M] ≤ cut_Ĝ(S) + ε′ ≤ (1 + ε′) cut_Ĝ(S),

where the final inequality uses the fact that every s-t cut in Ĝ has value at least 1. Finally, since all edge weights in G̃ are rational, we can scale them up to be integers. Goldberg and Rao [27] provide a method for finding a (1 + ε′)-approximate minimum s-t cut in a directed graph with N nodes and M edges in time O(M · min{M^{1/2}, N^{2/3}} log(N²/M) log(M/ε′)), which does not depend on the largest edge weight. Overall we performed three levels of approximation: approximately reducing the hypergraph, approximating the cut properties of Ĝ with G̃, and approximating the s-t cut solution in G̃. Since ε′ = ε/
7, the overall approximation factor is (1 + ε′)³ ≤ (1 + ε). Plugging in the appropriate values for M and N yields the runtime guarantee.

Our approximate solution techniques will always be faster than runtimes obtained by methods that perform an exact reduction to a graph s-t cut problem [37, 66], since these introduce O(|e|²) edges for each hyperedge e. Kolmogorov [39] presented an algorithm for minimizing sums of submodular functions based on submodular flows. Although the approach provides a way to also solve more general variants of the problem, the algorithm has a runtime of O((n + µ)³ log F_max) = ˜O(µ³) specifically in the case of cardinality-based functions with integer-valued weights, which is slower than our approximate techniques by at least a factor of µ^{3/2}.

Recently, Ene et al. [20] presented improved runtime analyses for optimization techniques for solving DSFM. The runtimes depend on the time it takes to evaluate oracle functions that correspond to solving a submodular minimization problem at a single splitting function. Let θ_e be the time it takes to evaluate the oracle at e ∈ E, and define θ_max = max_{e∈E} θ_e and θ_avg = (1/R) Σ_{e∈E} θ_e. For a cardinality-based function f_e, such an oracle can be queried in O(|e| log |e|) time [33], and so θ_avg = O((1/R) Σ_{e∈E} |e| log |e|). Combining this oracle with the runtimes presented by Ene et al. [20] produces the fastest known runtimes for cardinality-based DSFM. For methods based on discrete optimization, Ene et al. [20] note that the incremental breadth first search (IBFS) algorithm of Fix et al. [22] can be implemented with a strongly polynomial runtime of O(n² θ_max Σ_{e∈E} |e|), or a weakly polynomial runtime of O(n² θ_max log F_max + n² Σ_{e∈E} |e| θ_e) = ˜O(n² θ_max + n² Σ_{e∈E} |e|²). Among continuous optimization approaches, the best runtime presented by Ene et al. [20] is achieved by the accelerated random coordinate descent method (ACDM) [21].
The method has a weakly polynomial runtime of O(n R θ_avg log(n F_max)) = ˜O(nµ).

The hyperedge e defines the support of the function f_e. Previous research has addressed both the case of functions f_e of large support and the case of small support (see discussion and experiments in [20, 39, 45, 59]). Functions of small support are common in computer vision applications, though the case of large support has also been studied [59]. In hypergraph cut problems, large hyperedges are natural for modeling large-scale multiway interactions (as discussed in Section 5 with respect to co-occurrence data). Table 1 summarizes runtimes for the methods we have considered here, with specialized runtimes highlighted for different regimes. The table shows runtimes for small support, k_avg = O(1), in which case R = Ω(n), and also runtimes for k_avg = Θ(n), which highlights the lowest possible runtime for each method in terms of nodes and hyperedges. For the latter case, Table 1 additionally distinguishes between subcases where R = Ω(n) and n = Ω(R). For subcases of each of these regimes, our sparsification techniques lead to improved runtimes when searching for approximately optimal solutions. When k_avg = O(1), our runtime is the fastest as long as R = o(n²). When k_avg = Θ(n) and R = Ω(n), the ACDM algorithm has the best performance whenever R = Ω(n^{3/2}), though we provide a faster approximate alternative below this threshold. Most importantly, our method provides a significantly faster runtime for the case where n = Ω(R), independent of k_avg. Thus, in hypergraphs with a sublinear (in n) number of large hyperedges (equivalently, a sum of sublinearly many functions of large support), obtaining an approximately optimal solution is much faster than solving the problem exactly with existing techniques.

We have introduced the notion of an augmented cut sparsifier, which approximates a generalized hypergraph cut function with a sparse directed graph on an augmented node set.
Our approach relies on a connection we highlight between graph reduction strategies and piecewise linear approximations of concave functions. Our framework leads to more efficient techniques for approximating hypergraph s-t cut problems via graph reduction, improved sparsifiers for co-occurrence graphs, and fast algorithms for approximately minimizing cardinality-based decomposable submodular functions.

As noted in Section 1.2, an interesting open question is to establish and study analogous notions of augmented spectral sparsification, given that spectral sparsifiers provide a useful generalization of cut sparsifiers in graphs [61]. One way to define such a notion is to apply existing definitions of submodular hypergraph Laplacians [46, 70] to both the original hypergraph and its sparsifier. This requires viewing our augmented sparsifier as a hypergraph with splitting functions of the form w_e(A) = a · min{|A|, |e \ A|, b}, corresponding to hyperedges with cut properties that can be modeled by a cardinality-based gadget. From this perspective, augmented spectral sparsification means approximating a generalized hypergraph cut function with another hypergraph cut function involving simplified splitting functions. While this provides one possible definition for augmented spectral sparsification, it is not clear whether the techniques we have developed can be used to satisfy this definition. Furthermore, it is not clear whether obtaining such a sparsifier would imply any immediate runtime benefits for approximating the spectra of generalized hypergraph Laplacians, or for solving generalized Laplacian systems [25, 43]. We leave these as questions for future work.

While our work provides the optimal reduction strategy in terms of cardinality-based gadgets, this is more restrictive than optimizing over all possible gadgets for approximately modeling hyperedge cut penalties.
Optimizing over a broader space of gadgets poses another interesting direction for future work, but is more challenging in several ways. First of all, it is unclear how to even define an optimal reduction when optimizing over arbitrary gadgets, since it is preferable to avoid both adding new nodes and adding new edges, but the tradeoff between these two goals is not clear. Another challenge is that the best reduction may depend heavily on the splitting function we wish to reduce, which makes developing a general approach difficult. A natural next step would be to at least better understand lower bounds on the number of edges and auxiliary nodes needed to model different cardinality-based splitting functions. While we do not have any concrete results, there are several indications that cardinality-based gadgets may be nearly optimal in many settings. For example, star expansions and clique expansions provide a more efficient way to model linear and quadratic splitting functions respectively, but modeling these functions with cardinality-based gadgets only increases the number of edges by roughly a factor of two.

Finally, we find it interesting that using auxiliary nodes and directed edges makes it possible to sparsify the complete graph using only O(n ε^{-1/2} log log ε^{-1}) edges, whereas standard sparsifiers require O(n ε^{-2}). We would like to better understand whether both directed edges and auxiliary nodes are necessary for making this possible, or whether improved approximations are possible using only one or the other.

A Proofs of Lemmas in Section 3
Proof of Lemma 3.1

Lemma.
Let f̂ be the continuous extension of a function w ∈ S_r, shown in (21). This function is in the class F_r, and has exactly J positive-sloped linear pieces and one linear piece of slope zero.

Proof. Define b_0 = 0 for notational convenience. The first three conditions in Definition 3.1 can be seen by inspection, recalling that 0 < a_j and 0 < b_j ≤ r for all j ∈ [J]. Observe that f̂ is linear over the interval [b_{i−1}, b_i) for i ∈ [J], since for x ∈ [b_{i−1}, b_i),

f̂(x) = Σ_{j=1}^J a_j · min{x, b_j} = Σ_{j=1}^{i−1} a_j b_j + x · Σ_{j=i}^J a_j.

In other words, the i-th linear piece of f̂, defined over x ∈ [b_{i−1}, b_i), is given by

f̂^{(i)}(x) = I_i + S_i x,

where the intercept and slope terms are given by I_i = Σ_{j=1}^{i−1} a_j b_j and S_i = Σ_{j=i}^J a_j. For the first J intervals of the form [b_{i−1}, b_i), the slopes are always positive but strictly decreasing. Thus, there are exactly J positive-sloped linear pieces. The final linear piece is a flat line, since f̂(x) = Σ_{j=1}^J a_j b_j for all x ≥ b_J. The concavity of f̂ follows directly from the fact that it is a continuous and piecewise linear function with decreasing slopes.

Proof of Lemma 3.2

Lemma.
Let f be a function in F_r with J + 1 linear pieces. Let b_i denote the i-th breakpoint of f, and m_i denote the slope of the i-th linear piece of f. Define vectors a, b ∈ R^J where b(i) = b_i and a(i) = a_i = m_i − m_{i+1} for i ∈ [J]. If w is the r-CCB function parameterized by the vectors (a, b), then f is the continuous extension of w.

Proof. Since f is in F_r, it has J positive-sloped linear pieces and one flat linear piece, and therefore it has exactly J breakpoints: 0 < b_1 < b_2 < · · · < b_J. Let b = (b_j) be the vector storing these breakpoints. For convenience we define b_0 = 0, though b_0 is not stored in b. By definition, f is constant for all x ≥ r, which implies that b_J ≤ r. Let f_i = f(b_i). For i ∈ [J], the positive slope of the i-th linear piece of f, which occurs in the range [b_{i−1}, b_i], is given by

m_i = (f_i − f_{i−1}) / (b_i − b_{i−1}).   (45)

The i-th linear piece of f is given by

f^{(i)}(x) = m_i (x − b_{i−1}) + f_{i−1} for x ∈ [b_{i−1}, b_i].   (46)

The last linear piece of f is a flat line over the interval x ∈ [b_J, ∞), i.e., m_{J+1} = 0. Since f has positive and strictly decreasing slopes, we can see that a_i = m_i − m_{i+1} > 0 for i ∈ [J].

Let w be the order-J CCB function constructed from the vectors (a, b), and let f̂ be its resulting continuous extension:

f̂(x) = Σ_{j=1}^J a_j · min{x, b_j}.   (47)

We must check that f̂ = f. By Lemma 3.1, we know that f̂ is in F_r and has exactly J + 1 linear pieces. The functions will be the same, therefore, if they share the same values at breakpoints. Evaluating f̂ at an arbitrary breakpoint b_i gives:

f̂(b_i) = Σ_{j=1}^{i−1} a_j · b_j + b_i · Σ_{j=i}^J a_j = Σ_{j=1}^{i−1} a_j · b_j + b_i · m_i.   (48)

We first confirm that the functions coincide at the first breakpoint:

f̂(b_1) = b_1 · m_1 = b_1 · (f_1 − f_0)/(b_1 − b_0) = b_1 · f_1/b_1 = f_1.

For any fixed i ∈ {2, 3, . . .
, J},

f̂(b_i) − f̂(b_{i−1}) = Σ_{j=1}^{i−1} a_j b_j + b_i m_i − Σ_{j=1}^{i−2} a_j b_j − b_{i−1} m_{i−1} = a_{i−1} b_{i−1} + b_i m_i − b_{i−1} m_{i−1} = (m_{i−1} − m_i) b_{i−1} + b_i m_i − b_{i−1} m_{i−1} = m_i (b_i − b_{i−1}) = f_i − f_{i−1}.

Since f(b_1) = f̂(b_1) and f(b_i) − f(b_{i−1}) = f̂(b_i) − f̂(b_{i−1}) for i ∈ {2, 3, . . . , J}, we have f(b_i) = f̂(b_i) for i ∈ [J]. Therefore, f and f̂ are the same piecewise linear function.

B Sparsification for Generalized Splitting Functions
In Sections 2 and 3 we focused on sparsification techniques for representing splitting functions that are symmetric and penalize only cut hyperedges:

w_e(S) = w_e(e \ S) for all S ⊆ e,
w_e(e) = w_e(∅) = 0.

These assumptions are standard for generalized hypergraph cut problems [44, 46, 66], and lead to the clearest exposition of our main results. In this appendix, we extend our sparse approximation techniques so that they apply even if we remove these restrictions. This will allow us to obtain improved techniques for approximately solving a certain class of decomposable submodular functions (see Section 6). Formally, our goal is to solve

minimize_{S ⊆ V} f(S) = Σ_{e∈E} w_e(S ∩ e),   (49)

where each w_e is a submodular cardinality-based function that is not necessarily symmetric and does not need to equal zero when the hyperedge e is uncut. Our proof strategy for reducing this more general problem to a graph s-t cut problem closely follows the same basic set of steps used in Section 3 for the special case.

B.1 Submodularity Constraints for Cardinality-Based Functions

We first provide a convenient characterization of general cardinality-based submodular functions. By general we mean the splitting function does not need to be symmetric, nor does it need to have a zero penalty when the hyperedge is uncut.
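Lemma B.1 below reduces submodularity of a cardinality-based splitting function to a concavity condition on its penalty sequence w_0, w_1, ..., w_k; for cardinality-based functions this concavity is also sufficient, a standard fact we use in the example. A small checker (illustrative) makes the characterization concrete:

```python
def is_cardinality_submodular(w):
    """w[i] = penalty for any A with |A| = i, for i = 0, ..., k.

    For cardinality-based splitting functions, submodularity is equivalent to
    concavity of the penalty sequence: 2*w[i] >= w[i-1] + w[i+1] for 0 < i < k.
    """
    return all(2 * w[i] >= w[i - 1] + w[i + 1] for i in range(1, len(w) - 1))

# A concave penalty sequence; note that no symmetry is required, and the
# penalty for the uncut hyperedge (w[0], w[k]) need not be zero here.
assert is_cardinality_submodular([0, 2, 3, 3.5, 3.5, 3])
# A convex dip violates the condition:
assert not is_cardinality_submodular([0, 1, 0, 1, 0])
```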
Lemma B.1.
Let w_e be a general submodular cardinality-based splitting function on a k-node hyperedge e, and let w_i denote the penalty for any A ⊆ e with |A| = i. Then for i ∈ {1, 2, . . . , k − 1},

2 w_i ≥ w_{i−1} + w_{i+1}.   (50)

Proof.
Let v_1, v_2, . . . , v_k denote the nodes in the hyperedge. Submodularity means that for all A, B ⊆ e, w(A) + w(B) ≥ w(A ∪ B) + w(A ∩ B). In order to show inequality (50), simply set A = {v_1, v_2, . . . , v_i} and B = {v_2, v_3, . . . , v_{i+1}}, and the result follows.

To simplify our analysis, as we did for the symmetric case, we will define a set of functions that is virtually identical to these splitting functions on k-node hyperedges, but are defined over integers from 0 to k rather than on subsets of a hyperedge.

Definition B.1. A k-GSCB (Generalized Submodular Cardinality-Based) integer function is a function w : {0} ∪ [k] → R_+ satisfying 2 w(i) ≥ w(i −
1) + w(i + 1) for all i ∈ [k − 1].

B.2 Combining Gadgets for Generalized SCB Functions
Our goal is to show how to approximate k-GSCB integer functions using piecewise linear functions with few linear pieces. This in turn corresponds to approximating a hyperedge splitting function with a sparse gadget. In order for this to work for our more general class of splitting functions, we use a slight generalization of an asymmetric gadget we introduced in previous work [66].

Definition B.2.
The asymmetric cardinality-based gadget (ACB-gadget) for a k-node hyperedge e is parameterized by scalars a and b and constructed as follows:

• Introduce an auxiliary vertex v_e.
• For each v ∈ e, introduce a directed edge from v to v_e with weight a · (k − b), and a directed edge from v_e to v with weight a · b.

The ACB-gadget models the following k-GSCB integer function:

w_{a,b}(i) = a · min{i · (k − b), (k − i) · b}.   (51)

To see why, consider where we must place the auxiliary node v_e when solving a minimum s-t cut problem involving the ACB-gadget. If we place i nodes on the s-side, then placing v_e on the s-side has a cut penalty of a b (k − i), whereas placing v_e on the t-side gives a penalty of a i (k − b). To minimize the cut, we choose the smaller of the two options.

Previously we showed that asymmetric splitting functions can be modeled exactly by a combination of k − 1 ACB-gadgets [66]. That construction assumed w_e(∅) = w_e(e) = 0 even for asymmetric splitting functions, but we remove this constraint here. In order to model the cut properties of an arbitrary GSCB splitting function, we define a combined gadget involving multiple ACB-gadgets, as well as edges from each node v ∈ e to the source and sink nodes of the graph. The augmented cut function for the resulting directed graph Ĝ = (V ∪ A ∪ {s, t}, Ê) will then be given by cut_Ĝ(S) = min_{T⊆A} dircut_Ĝ({s} ∪ S ∪ T) for a set S ⊆ V, where dircut_Ĝ is the directed cut function on Ĝ. Finding a minimum s-t cut in Ĝ will solve objective (49), or equivalently, the cardinality-based decomposable submodular function minimization problem.

Definition B.3. A k-CG function (k-node, combined gadget function) ŵ of order J is a k-GSCB integer function that is parameterized by scalars z_0, z_k, and (a_j, b_j) for j ∈ [J]. The function has the form:

ŵ(i) = z_0 · (k − i) + z_k · i + Σ_{j=1}^J a_j min{i · (k − b_j), (k − i) · b_j}.
(52) The scalars parameterizing ˆ w satisfy b j > , a j > for all j ∈ [ J ] b j < b j +1 for all j ∈ [ J − b J < kz ≥ and z k ≥ . Conceptually, the function shown in (52) represents a combination of J ACB-gadgets for ahyperedge e , where additionally for each node v ∈ e we have place a directed edge from a sourcenode s to v of weight z , and an edge from v to a sink node t with weight z k .The continuous extension of the k -CG function (52) is defined to be:ˆ f ( x ) = z · ( k − x ) + z k · x + J (cid:88) j =1 a j min { x · ( k − b j ) , ( k − x ) · b j } for x ∈ [0 , k ]. (53) Lemma B.2.
The continuous extension f̂ of ŵ is nonnegative over the interval [0, k], piecewise linear, concave, and has exactly J + 1 linear pieces.

Proof. Nonnegativity follows quickly from the nonnegativity of z_0 and z_k, the positivity of (a_j, b_j) for j ∈ [J], and the fact that b_J < k. For the other properties, we begin by rewriting the function as

    \hat{f}(x) = z_0 (k - x) + z_k x + \sum_{j=1}^{J} a_j \min\{ x(k - b_j), \, (k - x) b_j \}    (54)
             = k z_0 + x (z_k - z_0) + k \sum_{j=1}^{J} a_j \min\{ x, b_j \} - x \sum_{j=1}^{J} a_j b_j    (55)
             = k z_0 + x (z_k - z_0) + kx \sum_{j : x \le b_j} a_j + k \sum_{j : x > b_j} a_j b_j - x \sum_{j=1}^{J} a_j b_j.    (56)

From (56) we see that f̂ is linear on each interval [b_t, b_{t+1}] (where we set b_0 = 0 and b_{J+1} = k), so f̂ is piecewise linear with breakpoints at b_1, ..., b_J. Crossing a breakpoint b_j decreases the slope by k·a_j > 0, so f̂ is concave and its J + 1 linear pieces are distinct.
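As a quick numerical check of this modeling argument, the following Python sketch (our own illustrative code; the function names are not from the paper) confirms that choosing the cheaper placement of the auxiliary node v_e recovers the ACB function of Eq. (51), and combines J gadgets with per-node source/sink edges as in Eq. (52).

```python
def acb_closed_form(i, k, a, b):
    """Eq. (51): the k-GSCB integer function modeled by one ACB-gadget."""
    return a * min(i * (k - b), (k - i) * b)

def acb_via_placement(i, k, a, b):
    """Minimum cut contribution of one ACB-gadget when i of the k nodes are on the s-side."""
    # v_e on the s-side: the (k - i) edges v_e -> v into t-side nodes are cut,
    # each of weight a*b.
    cost_s = a * b * (k - i)
    # v_e on the t-side: the i edges v -> v_e leaving s-side nodes are cut,
    # each of weight a*(k - b).
    cost_t = a * i * (k - b)
    return min(cost_s, cost_t)

def kcg_value(i, k, z0, zk, gadgets):
    """Eq. (52): J ACB-gadgets plus per-node source/sink edges of weights z0, zk."""
    return z0 * (k - i) + zk * i + sum(
        acb_via_placement(i, k, a, b) for (a, b) in gadgets)

if __name__ == "__main__":
    k = 7
    for (a, b) in [(1.0, 2.0), (0.5, 3.5)]:
        for i in range(k + 1):
            assert abs(acb_closed_form(i, k, a, b)
                       - acb_via_placement(i, k, a, b)) < 1e-12
    # The k-CG values are submodular in the discrete sense:
    # 2*w(i) >= w(i+1) + w(i-1) for i in [k-1], a discrete analogue of concavity.
    w = [kcg_value(i, k, 0.25, 0.5, [(1.0, 2.0), (0.5, 3.5)]) for i in range(k + 1)]
    assert all(2 * w[i] >= w[i + 1] + w[i - 1] - 1e-12 for i in range(1, k))
```

The gadget weights here are arbitrary test values; the check exercises every i ∈ {0} ∪ [k] rather than proving anything beyond the identity min{ab(k − i), ai(k − b)} = a·min{i(k − b), (k − i)b}.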
Lemma B.3. For every function f that is nonnegative, piecewise linear with J + 1 linear pieces, and concave over the interval [0, k], there exists some k-CG function w of order J such that f is the continuous extension of w.

Proof. The function w will be defined by choosing parameters z_0, z_k, and (a_j, b_j) for j ∈ [J]. Let f̂ denote the continuous extension of the function w that we will build. From the proof of Lemma B.2, we know that the parameter b_j will correspond to the jth breakpoint of f̂. Therefore, given f, we set b_j to be the jth breakpoint of the function f, so that the functions match at breakpoints. For convenience, we also set b_0 = 0 and b_{J+1} = k. We then set z_0 = f(0)/k and z_k = f(k)/k, to guarantee that f̂(0) = f(0) and f̂(k) = f(k). In order to set the a_j values, we first compute the slopes of each linear piece of f. Let f_j = f(b_j) for j ∈ {0} ∪ [J + 1]. The jth linear piece of f has slope

    m_j = \frac{f_j - f_{j-1}}{b_j - b_{j-1}}.

Finally, for j ∈ [J] we set a_j = (m_j − m_{j+1})/k. All of our chosen parameters satisfy the conditions of Definition B.3, so it simply remains to check that f and f̂ coincide at breakpoints.

Let t ∈ [J]. Using (56) to evaluate f̂ at breakpoint b_t, we get

    \hat{f}(b_t) = f_0 + \frac{b_t}{k}(f_{J+1} - f_0) + k b_t \sum_{j=t+1}^{J} a_j + k \sum_{j=1}^{t} a_j b_j - b_t \sum_{j=1}^{J} a_j b_j.    (57)

We can simplify several terms using the fact that a_j = (m_j − m_{j+1})/k. First of all,

    k \sum_{j=t+1}^{J} a_j = \sum_{j=t+1}^{J} (m_j - m_{j+1}) = m_{t+1} - m_{J+1}.

Furthermore,

    k \sum_{j=1}^{t} a_j b_j = \sum_{j=1}^{t} (m_j - m_{j+1}) b_j = m_1 b_1 - m_{t+1} b_t + \sum_{j=2}^{t} m_j (b_j - b_{j-1})
                            = (f_1 - f_0) - m_{t+1} b_t + \sum_{j=2}^{t} (f_j - f_{j-1}) = f_t - f_0 - m_{t+1} b_t.

Taking t = J in this last identity gives \sum_{j=1}^{J} a_j b_j = \frac{1}{k}(f_J - f_0 - m_{J+1} b_J). Plugging these into (57), we get

    \hat{f}(b_t) = f_0 + \frac{b_t}{k}(f_{J+1} - f_0) + b_t (m_{t+1} - m_{J+1}) + f_t - f_0 - m_{t+1} b_t - \frac{b_t}{k}(f_J - f_0 - m_{J+1} b_J)
                = \frac{b_t}{k} f_{J+1} - b_t m_{J+1} + f_t - \frac{b_t}{k}(f_J - m_{J+1} b_J)
                = f_t + \frac{b_t}{k}(f_{J+1} - f_J) - b_t m_{J+1} \left(1 - \frac{b_J}{k}\right)
                = f_t + \frac{b_t}{k}(f_{J+1} - f_J) - b_t \left(\frac{f_{J+1} - f_J}{k - b_J}\right)\left(\frac{k - b_J}{k}\right)
                = f_t = f(b_t).

So we see that f = f̂ at all breakpoints. Since both functions are piecewise linear with the same breakpoints and coincide there (as well as at the endpoints 0 and k), they must be the same piecewise linear function.

B.3 Finding the Best Piecewise Approximation
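The reverse-engineering step of Lemma B.3, which this section relies on, can be sketched in a few lines of Python (our own illustrative code with hypothetical names; the breakpoint data in the example is ours, chosen to be nonnegative and concave):

```python
def fit_kcg(k, bks, vals):
    """Recover k-CG parameters from a concave piecewise linear f on [0, k].

    bks  = [b_0 = 0, b_1, ..., b_J, b_{J+1} = k]  (breakpoints, endpoints included)
    vals = [f(b) for b in bks]
    Returns z0, zk, and the list of ACB-gadget parameters (a_j, b_j).
    """
    J = len(bks) - 2
    z0, zk = vals[0] / k, vals[-1] / k          # matches f at 0 and k
    # Slope of the j-th linear piece: m_j = (f_j - f_{j-1}) / (b_j - b_{j-1}).
    m = [(vals[j] - vals[j - 1]) / (bks[j] - bks[j - 1]) for j in range(1, J + 2)]
    # a_j = (m_j - m_{j+1}) / k; positive exactly because f is concave.
    a = [(m[j] - m[j + 1]) / k for j in range(J)]
    return z0, zk, list(zip(a, bks[1:-1]))

def extension(x, k, z0, zk, gadgets):
    """Eq. (53): the continuous extension of the fitted k-CG function."""
    return z0 * (k - x) + zk * x + sum(
        a * min(x * (k - b), (k - x) * b) for (a, b) in gadgets)

if __name__ == "__main__":
    # Example: breakpoints at 2 and 5 on [0, 8], with concave values.
    k, bks, vals = 8, [0, 2, 5, 8], [1.0, 5.0, 8.0, 8.0]
    z0, zk, gadgets = fit_kcg(k, bks, vals)
    for bj, fj in zip(bks, vals):
        assert abs(extension(bj, k, z0, zk, gadgets) - fj) < 1e-9
    # Agreement also holds between breakpoints, since both sides are linear there.
    assert abs(extension(3.5, k, z0, zk, gadgets) - 6.5) < 1e-9
```

Note that concavity of the input is what makes every recovered a_j positive, so the fitted parameters always satisfy the conditions of Definition B.3.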
As we did for symmetric splitting functions, we can quickly find the best piecewise linear (1 + ε)-approximation to a k-GSCB integer function w using a greedy approach. We omit proof details, as they exactly mirror the arguments provided for the symmetric case. The submodularity constraint 2w(i) ≥ w(i + 1) + w(i − 1) for i ∈ [k − 1] can be viewed as a discrete version of concavity, and will ensure that the piecewise linear function returned by such a procedure is also nonnegative and concave. After obtaining the piecewise linear approximation, we can apply Lemma B.3 to reverse engineer a k-CG function of small order that approximates w. We obtain the same asymptotic upper bound on the number of linear pieces needed to approximate w.

Lemma B.4.
Let w be a k-GSCB integer function and ε > 0. There exists a k-CG function ŵ of order J = O(ε^{-1} log k) that satisfies w(i) ≤ ŵ(i) ≤ (1 + ε)·w(i) for every i ∈ {0} ∪ [k].

B.4 Approximating Cardinality-Based Sum of Submodular Functions
Recall that k-CG functions correspond to combinations of ACB-gadgets for a hyperedge e, as well as directed edges between nodes in e and the source and sink nodes in some minimum s-t cut problem. Each ACB-gadget involves one new auxiliary node and 2|e| directed edges, and the number of ACB-gadgets is equal to the order of the k-CG function (the number of linear pieces minus one). Let H = (V, E) be a hypergraph with n = |V| nodes, where each splitting function is submodular, cardinality-based, and is not required to be symmetric or to penalize only cut hyperedges. Finding the minimum cut in H corresponds to minimizing the sum of submodular splitting functions given in (49). For ε > 0, we can preserve cuts in H to within a factor (1 + ε) by introducing a source node s and a sink node t and applying our sparse reduction techniques to each hyperedge, obtaining a directed graph Ĝ = (V ∪ A ∪ {s, t}, Ê), where A is the set of auxiliary nodes, with N = O(n + ε^{-1} Σ_{e ∈ E} log|e|) nodes and M = O(n + ε^{-1} Σ_{e ∈ E} |e| log|e|) edges. Even if the size of each e ∈ E is O(n), we have N = O(n + ε^{-1}|E| log n) and M = O(ε^{-1} log n · Σ_{e ∈ E} |e|).

References

[1] Sameer Agarwal, Kristin Branson, and Serge Belongie. Higher order learning with graphs. In
Proceedings of the 23rd International Conference on Machine Learning, ICML '06, pages 17–24, New York, NY, USA, 2006. ACM.
[2] Kadir Akbudak, Enver Kayaaslan, and Cevdet Aykanat. Hypergraph partitioning based models and methods for exploiting cache locality in sparse matrix-vector multiplication. SIAM Journal on Scientific Computing, 35(3):C237–C262, 2013.
[3] Noga Alon. On the edge-expansion of graphs. Comb. Probab. Comput., 6(2):145–152, June 1997.
[4] Charles J. Alpert and Andrew B. Kahng. Recent directions in netlist partitioning: a survey. Integration, 19(1):1–81, 1995.
[5] Alexandr Andoni, Jiecao Chen, Robert Krauthgamer, Bo Qin, David P. Woodruff, and Qin Zhang. On sketching quadratic forms. In Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science, ITCS '16, pages 311–319, New York, NY, USA, 2016. Association for Computing Machinery.
[6] Grey Ballard, Alex Druinsky, Nicholas Knight, and Oded Schwartz. Hypergraph partitioning for sparse matrix-matrix multiplication. ACM Trans. Parallel Comput., 3(3):18:1–18:34, December 2016.
[7] N. Bansal, O. Svensson, and L. Trevisan. New notions and constructions of sparsification for graphs and hypergraphs. In Proceedings of the 2019 IEEE Annual Symposium on Foundations of Computer Science, FOCS '19, pages 910–928, 2019.
[8] Joshua Batson, Daniel A. Spielman, and Nikhil Srivastava. Twice-Ramanujan sparsifiers. SIAM Review, 56(2):315–334, 2014.
[9] András A. Benczúr and David R. Karger. Approximating s-t minimum cuts in Õ(n²) time. In Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, STOC '96, pages 47–55, 1996.
[10] Austin R. Benson, David F. Gleich, and Jure Leskovec. Higher-order organization of complex networks.
Science, 353(6295):163–166, 2016.
[11] Austin R. Benson, Paul Liu, and Hao Yin. A simple bipartite graph projection model for clustering in networks. arXiv preprint: https://arxiv.org/abs/2007.00761, 2020.
[12] Mindaugas Bloznelis et al. Degree and clustering coefficient in sparse random intersection graphs. The Annals of Applied Probability, 23(3):1254–1289, 2013.
[13] Mindaugas Bloznelis and Justinas Petuchovas. Correlation between clustering and degree in affiliation networks. In International Workshop on Algorithms and Models for the Web-Graph, pages 90–104. Springer, 2017.
[14] T. H. Hubert Chan and Zhibin Liang. Generalizing the hypergraph Laplacian via a diffusion process with mediators. In Computing and Combinatorics, pages 441–453. Springer International Publishing, 2018.
[15] Karthekeyan Chandrasekaran, Chao Xu, and Xilin Yu. Hypergraph k-cut in randomized polynomial time. In Proceedings of the 2018 Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '18, pages 1426–1438, USA, 2018. Society for Industrial and Applied Mathematics.
[16] Chandra Chekuri and Chao Xu. Computing minimum cuts in hypergraphs. In Proceedings of the 2017 Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '17, pages 1085–1100, 2017.
[17] Chandra Chekuri and Chao Xu. Minimum cuts and sparsification in hypergraphs. SIAM Journal on Computing, 47(6):2118–2156, 2018.
[18] Julia Chuzhoy. On vertex sparsifiers with Steiner nodes. In Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing, STOC '12, pages 673–688, New York, NY, USA, 2012. Association for Computing Machinery.
[19] Aaron Clauset, Cosma Rohilla Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. SIAM Review, 51(4):661–703, 2009.
[20] Alina Ene, Huy Nguyen, and László A. Végh. Decomposable submodular function minimization: discrete and continuous. In Advances in Neural Information Processing Systems, NeurIPS '17, pages 2870–2880, 2017.
[21] Alina Ene and Huy L. Nguyen. Random coordinate descent methods for minimizing decomposable submodular functions. In Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, ICML '15, pages 787–795. JMLR.org, 2015.
[22] A. Fix, T. Joachims, S. M. Park, and R. Zabih. Structured learning of sum-of-submodular higher order energy functions. In Proceedings of the IEEE International Conference on Computer Vision, ICCV '13, pages 3104–3111, 2013.
[23] Kyle Fox, Debmalya Panigrahi, and Fred Zhang. Minimum cut and minimum k-cut in hypergraphs via branching contractions. In Proceedings of the 2019 Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '19, pages 881–896, 2019.
[24] D. Freedman and P. Drineas. Energy minimization via graph cuts: settling what is possible. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR '05, 2005.
[25] Kaito Fujii, Tasuku Soma, and Yuichi Yoshida. Polynomial-time algorithms for submodular Laplacian systems. arXiv preprint: 1803.10923, 2018.
[26] Junhao Gan, David F. Gleich, Nate Veldt, Anthony Wirth, and Xin Zhang. Graph clustering in all parameter regimes. In
International Symposium on Mathematical Foundations of Computer Science, MFCS '20, 2020.
[27] Andrew V. Goldberg and Satish Rao. Beyond the flow decomposition barrier. J. ACM, 45(5):783–797, September 1998.
[28] Scott W. Hadley. Approximation techniques for hypergraph partitioning problems. Discrete Applied Mathematics, 59(2):115–127, 1995.
[29] Matthias Hein, Simon Setzer, Leonardo Jost, and Syama Sundar Rangapuram. The total variation on hypergraphs - learning on hypergraphs revisited. In Proceedings of the 26th International Conference on Neural Information Processing Systems, NeurIPS '13, pages 2427–2435, 2013.
[30] Jin Huang, Rui Zhang, and Jeffrey Xu Yu. Scalable hypergraph learning and processing. In Proceedings of the 2015 IEEE International Conference on Data Mining, ICDM '15, pages 775–780, Washington, DC, USA, 2015. IEEE Computer Society.
[31] Satoru Iwata, Lisa Fleischer, and Satoru Fujishige. A combinatorial strongly polynomial algorithm for minimizing submodular functions. J. ACM, 48(4):761–777, July 2001.
[32] Satoru Iwata and James B. Orlin. A simple combinatorial algorithm for submodular function minimization. In Proceedings of the 2009 Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '09, pages 1230–1237, Philadelphia, PA, USA, 2009. Society for Industrial and Applied Mathematics.
[33] Stefanie Jegelka, Francis Bach, and Suvrit Sra. Reflection methods for user-friendly submodular optimization. In Proceedings of the 26th International Conference on Neural Information Processing Systems, NeurIPS '13, pages 1313–1321, 2013.
[34] Stefanie Jegelka, Hui Lin, and Jeff A. Bilmes. On fast approximate submodular minimization. In Advances in Neural Information Processing Systems, NeurIPS '11, pages 460–468, 2011.
[35] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar. Multilevel hypergraph partitioning: applications in VLSI domain. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 7(1):69–79, March 1999.
[36] Dmitry Kogan and Robert Krauthgamer. Sketching cuts in graphs and hypergraphs. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, ITCS '15, pages 367–376, New York, NY, USA, 2015. Association for Computing Machinery.
[37] Pushmeet Kohli, Philip H. S. Torr, et al. Robust higher order potentials for enforcing label consistency.
International Journal of Computer Vision, 82(3):302–324, 2009.
[38] V. Kolmogorov and R. Zabin. What energy functions can be minimized via graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2):147–159, Feb 2004.
[39] Vladimir Kolmogorov. Minimizing a sum of submodular functions. Discrete Appl. Math., 160(15):2246–2258, October 2012.
[40] Silvio Lattanzi and D. Sivakumar. Affiliation networks. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, pages 427–434, 2009.
[41] E. L. Lawler. Cutsets and partitions of hypergraphs. Networks, 3(3):275–285, 1973.
[42] Menghui Li, Jinshan Wu, Dahui Wang, Tao Zhou, Zengru Di, and Ying Fan. Evolving model of weighted networks inspired by scientific collaboration networks. Physica A: Statistical Mechanics and its Applications, 375(1):355–364, 2007.
[43] Pan Li, Niao He, and Olgica Milenkovic. Quadratic decomposable submodular function minimization: Theory and practice. Journal of Machine Learning Research, 21(106):1–49, 2020.
[44] Pan Li and Olgica Milenkovic. Inhomogeneous hypergraph clustering with applications. In Advances in Neural Information Processing Systems 30, NeurIPS '17, pages 2308–2318, 2017.
[45] Pan Li and Olgica Milenkovic. Revisiting decomposable submodular function minimization with incidence relations. In Advances in Neural Information Processing Systems 31, NeurIPS '18, pages 2237–2247, 2018.
[46] Pan Li and Olgica Milenkovic. Submodular hypergraphs: p-Laplacians, Cheeger inequalities and spectral clustering. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of ICML '18, pages 3014–3023. PMLR, 2018.
[47] Anand Louis. Hypergraph Markov operators, eigenvalues and approximation algorithms. In
Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, STOC '15, pages 713–722, New York, NY, USA, 2015. Association for Computing Machinery.
[48] A. Lubotzky. Ramanujan graphs. Combinatorica, 8:261–278, 1988.
[49] Thomas L. Magnanti and Dan Stratila. Separable concave optimization approximately equals piecewise linear optimization. In IPCO 2004, pages 234–243, 2004.
[50] Thomas L. Magnanti and Dan Stratila. Separable concave optimization approximately equals piecewise-linear optimization. arXiv preprint arXiv:1201.3148, 2012.
[51] Grigorii Aleksandrovich Margulis. Explicit group-theoretical constructions of combinatorial schemes and their application to the design of expanders and concentrators. Problemy Peredachi Informatsii, 24(1):51–60, 1988.
[52] Zachary Neal. The backbone of bipartite projections: Inferring relationships from co-authorship, co-sponsorship, co-attendance and other co-behaviors. Social Networks, 39:84–97, 2014.
[53] M. E. J. Newman, S. H. Strogatz, and D. J. Watts. Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E, 64:026118, Jul 2001.
[54] Robert Nishihara, Stefanie Jegelka, and Michael I. Jordan. On the convergence rate of decomposable submodular function minimization. In Proceedings of the 27th International Conference on Neural Information Processing Systems, NeurIPS '14, pages 640–648, 2014.
[55] James B. Orlin. A faster strongly polynomial time algorithm for submodular function minimization. Mathematical Programming, 118(2):237–251, May 2009.
[56] Pulak Purkait, Tat-Jun Chin, Alireza Sadri, and David Suter. Clustering with hypergraphs: the case for large hyperedges. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(9):1697–1711, 2016.
[57] José J. Ramasco and Steven A. Morris. Social inertia in collaboration networks. Phys. Rev. E, 73:016122, Jan 2006.
[58] J. A. Rodríguez. Laplacian eigenvalues and partition problems in hypergraphs. Applied Mathematics Letters, 22(6):916–921, 2009.
[59] I. Shanu, C. Arora, and P. Singla. Min norm point algorithm for higher order MRF-MAP inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR '16, pages 5365–5374, 2016.
[60] Tasuku Soma and Yuichi Yoshida. Spectral sparsification of hypergraphs. In
Proceedings of the 2019 Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '19, pages 2570–2581, 2019.
[61] Daniel A. Spielman and Shang-Hua Teng. Spectral sparsification of graphs. SIAM Journal on Computing, 40(4):981–1025, 2011.
[62] Daniel A. Spielman and Shang-Hua Teng. Nearly linear time algorithms for preconditioning and solving symmetric, diagonally dominant linear systems. SIAM Journal on Matrix Analysis and Applications, 35(3):835–885, 2014.
[63] Domenico De Stefano, Vittorio Fuccella, Maria Prosperina Vitale, and Susanna Zaccarin. The use of different data sources in the analysis of co-authorship networks and scientific performance. Social Networks, 35(3):370–381, 2013.
[64] Peter Stobbe and Andreas Krause. Efficient minimization of decomposable submodular functions. In Proceedings of the 23rd International Conference on Neural Information Processing Systems, NeurIPS '10, pages 2208–2216, 2010.
[65] A. Vannelli and S. W. Hadley. A Gomory-Hu cut tree representation of a netlist partitioning problem. IEEE Transactions on Circuits and Systems, 37(9):1133–1139, Sep. 1990.
[66] Nate Veldt, Austin R. Benson, and Jon Kleinberg. Hypergraph cuts with general splitting functions. arXiv preprint: 2001.02817, 2020.
[67] Nate Veldt, Austin R. Benson, and Jon Kleinberg. Minimizing localized ratio cut objectives in hypergraphs. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (to appear), KDD '20, 2020.
[68] Nate Veldt, Anthony Wirth, and David F. Gleich. Parameterized correlation clustering in hypergraphs and bipartite graphs. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (to appear), KDD '20, 2020.
[69] Duncan J. Watts and Steven H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440, 1998.
[70] Yuichi Yoshida. Cheeger inequalities for submodular transformations. In Proceedings of the 2019 Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '19, pages 2582–2601, 2019.
[71] Dengyong Zhou, Jiayuan Huang, and Bernhard Schölkopf. Learning with hypergraphs: Clustering, classification, and embedding. In Proceedings of the 19th International Conference on Neural Information Processing Systems, NeurIPS '06, pages 1601–1608, 2006.
[72] Tao Zhou, Jie Ren, Matúš Medo, and Yi-Cheng Zhang. Bipartite network projection and personal recommendation. Phys. Rev. E, 76:046115, Oct 2007.
[73] J. Y. Zien, M. D. F. Schlag, and P. K. Chan. Multilevel spectral hypergraph partitioning with arbitrary vertex sizes. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18(9):1389–1399, Sep. 1999.
[74] Stanislav Živný, David A. Cohen, and Peter G. Jeavons. The expressive power of binary submodular functions. In Rastislav Královič and Damian Niwiński, editors,