[PDF] Approximate Modularity Revisited

Abstract

Set functions with convenient properties (such as submodularity) appear in application areas of current interest, such as algorithmic game theory, and allow for improved optimization algorithms. It is natural to ask (e.g., in the context of data driven optimization) how robust such properties are, and whether small deviations from them can be tolerated. We consider two such questions in the important special case of linear set functions. One question that we address is whether any set function that approximately satisfies the modularity equation (linear functions satisfy the modularity equation exactly) is close to a linear function. The answer to this is positive (in a precise formal sense) as shown by Kalton and Roberts [1983] (and further improved by Bondarenko, Prymak, and Radchenko [2013]). We revisit their proof idea that is based on expander graphs, and provide significantly stronger upper bounds by combining it with new techniques. Furthermore, we provide improved lower bounds for this problem. Another question that we address is that of how to learn a linear function h that is close to an approximately linear function f, while querying the value of f on only a small number of sets. We present a deterministic algorithm that makes only linearly many (in the number of items) nonadaptive queries, by this improving over a previous algorithm of Chierichetti, Das, Dasgupta and Kumar [2015] that is randomized and makes more than a quadratic number of queries. Our learning algorithm is based on a Hadamard transform.

Full PDF

aa r X i v : . [ c s . D S ] M a r Approximate Modularity Revisited ∗ Uriel Feige † , Michal Feldman ‡ , Inbal Talgam-Cohen § Abstract

Set functions with convenient properties (such as submodularity) appear in application ar-eas of current interest, such as algorithmic game theory, and allow for improved optimizationalgorithms. It is natural to ask (e.g., in the context of data driven optimization) how robustsuch properties are, and whether small deviations from them can be tolerated. We consider twosuch questions in the important special case of linear set functions.One question that we address is whether any set function that approximately satisﬁes themodularity equation (linear functions satisfy the modularity equation exactly) is close to alinear function. The answer to this is positive (in a precise formal sense) as shown by Kaltonand Roberts [1983] (and further improved by Bondarenko, Prymak, and Radchenko [2013]).We revisit their proof idea that is based on expander graphs, and provide signiﬁcantly strongerupper bounds by combining it with new techniques. Furthermore, we provide improved lowerbounds for this problem.Another question that we address is that of how to learn a linear function h that is closeto an approximately linear function f , while querying the value of f on only a small numberof sets. We present a deterministic algorithm that makes only linearly many (in the number ofitems) nonadaptive queries, by this improving over a previous algorithm of Chierichetti, Das,Dasgupta and Kumar [2015] that is randomized and makes more than a quadratic number ofqueries. Our learning algorithm is based on a Hadamard transform. A set function f over a universe U of n items assigns a real value f ( S ) to every subset S ⊆ U (including the empty set ∅ ). Equivalently, it is a function whose domain is the Boolean n -dimensional cube { , } n , where each coordinate corresponds to an item, and a vector in { , } n corresponds to the indicator vector of a set. Set functions appear in numerous applications, someof which are brieﬂy mentioned in Section 1.2. Though set functions are deﬁned over domains ofsize 2 n , one is often interested in optimizing over them in time polynomial in n . This oﬀers severalchallenges, not least of which is the issue of representing f . An explicit representation of the truthtable of f is of exponential size, and hence other representations are sought.Some classes of set functions have convenient structure that leads to a polynomial size represen-tation, from which the value of every set can easily be computed. A prime example for this is theclass of linear functions. Formally, a set function f is linear if there exist constants c , c , . . . , c n ∗ The conference version appeared in STOC’17. Part of this work was done at Microsoft Research, Herzliya. Partof the work of U. Feige was done while visiting Princeton University. This work has received funding from theIsrael Science Foundation (grant No. 1388/16), the European Research Council under the European Union’s SeventhFramework Programme (FP7/2007-2013) / ERC grant agreement No. 337122, and the European Union’s Horizon2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 708935. † Weizmann Institute of Science, Rehovot, Israel, [email protected] . ‡ Tel-Aviv University, Tel-Aviv, and Microsoft Research, Herzliya, Israel, [email protected] . § Hebrew University, Jerusalem, Israel, [email protected] . S , f ( S ) = c + P i ∈ S c i . The constants ( c , c , . . . , c n ) may serve as a poly-nomial size representation of f . More generally, for set functions that arise naturally, one typicallyassumes that there is a so called value oracle , such that for every set S one can query the oracle on S and receive f ( S ) in reply (either in unit time, or in polynomial time, depending on the context).The value oracle serves as an abstraction of either having some explicit polynomial time represen-tation (e.g., a Boolean circuit) from which the value of f on any given input can be computed, or(in cases in which set functions model some physical reality) having a physical process of evaluating f on the set S (e.g., by making a measurement).Optimizing over general set functions (e.g., ﬁnding the maximum, the minimum, maximizingsubject to size constraints, etc.) is a diﬃcult task, requiring exponentially many value queries ifonly a value oracle is given, and NP-hard if an explicit representation is given. However, for somespecial classes of set functions various optimization problems can be solved in polynomial timeand with only polynomially many value queries. Notable nontrivial examples are minimizationof submodular set functions Schrijver [2000]Iwata et al. [2001], and welfare maximization whenthe valuation function of each agent satisﬁes the gross substitutes property (see, e.g., Paes Leme[2017]). For the class of linear set functions, many optimization problems of interest can be solvedin polynomial time, often by trivial algorithms.A major concern regarding the positive algorithmic results for some nice classes of set functionsis their stability . Namely, if f is not a member of that nice class, but rather is only close to being amember (under some natural notion of closeness ), is optimizing over f still easy? Can one obtainsolutions that are “close” to optimal? Or are the algorithmic results “unstable” in the sense thata small divergence from the nice class leads to a dramatic deterioration in the performance of theassociated algorithms?A complicating factor is that there is more than one way of deﬁning closeness . For example,when considering two functions, one may consider the variational distance between them, the meansquare distance, the Hamming distance, and more. The situation becomes even more complicatedwhen one wishes to deﬁne how close f is to a given class C of nice functions (rather than toa particular function). One natural deﬁnition is in terms of a distance to the function g ∈ C closest to f . But other deﬁnitions make sense as well, especially if the class C is deﬁned in termsof properties that functions in C have. For example, the distance from being submodular can bemeasured also by the extent to which the submodularity condition f ( S )+ f ( T ) ≥ f ( S ∪ T )+ f ( S ∩ T )might be violated by f (as in Lehmann et al. [2006]; Krause and Cevher [2010]), or even by the socalled supermodular degree Feige and Izsak [2013].Following the lead of Chierichetti, Das, Dasgupta, and Kumar [2015], the goal of this work is tostudy questions such as the above in a setting that is relatively simple, yet important, namely, thatof linear set functions. As illustrated by the work of Chierichetti et al. [2015], even this relativelysimple setting is challenging.

Kalton constants

One question that we address is the relation between two natural notions ofbeing close to a linear function. The ﬁrst notion is that of being close point-wise in an additivesense. We say that f is ∆ -linear if there is a linear set function g such that | f ( S ) − g ( S ) | ≤ ∆for every set S . Under this notion, the smaller ∆ is, the closer we consider f to being linear.The other notion of closeness concerns a diﬀerent but equivalent deﬁnition of linear functions,namely, as those functions that satisfy the modular equation f ( S ) + f ( T ) = f ( S ∪ T ) + f ( S ∩ T )for every two sets S and T . This form of deﬁning linear functions is the key to generalizing linearfunctions to other classes of functions of interest, and speciﬁcally to submodular functions that2atisfy f ( S ) + f ( T ) ≥ f ( S ∪ T ) + f ( S ∩ T ), and are considered to be the discrete analog of convexfunctions. Formally, we say that f is • ǫ -modular if | f ( S ) + f ( T ) − f ( S ∪ T ) − f ( S ∩ T ) | ≤ ǫ for every two sets S and T ; • weakly ǫ -modular if the inequality holds for every two disjoint sets S and T .The smaller ǫ is, the closer we consider f to being linear (or equivalently, to being modular).It can easily be shown that every ∆-linear function is ǫ -modular for ǫ ≤ ǫ -modularfunction is ∆-linear for ∆ = O ( ǫ log n ). However, the authors of Chierichetti et al. [2015] were notaware of the earlier work of Kalton and Roberts [1983], which already showed that ∆ ≤ O ( ǫ ) (thiswork was brought to our attention by Assaf Naor). We shall use K s to denote the smallest constantsuch that every ǫ -modular function is K s ǫ -linear, and refer to K s as the strong Kalton constant . Thebound provided in Kalton and Roberts [1983] was K s ≤ , and this was subsequently improvedby Bondarenko, Prymak, and Radchenko Bondarenko et al. [2013] to K s ≤ . K s . We revisit thisapproach, simplify it and add to it new ingredients. Technically, the advantage that we get by ournew ingredients is that we can use bipartite graphs in which only small sets of vertices expand,whereas previous work needed to use bipartite graphs in which large sets expand. This allows usto use expanders with much better parameters, leading to substantially improved upper bounds on K s . For concreteness, we prove in this paper the following upper bound. Theorem 1.1 (Upper bound, strong Kalton) . Every ǫ -modular function is ∆ -linear for ∆ < . ǫ .Hence K s < . . We remark that our technique for upper bounding K s adds a lot of versatility to the expanderapproach, which is not exploited to its limit in the current version of the paper: the upper boundthat we report strikes a balance between simplicity of the proof and quality of the upper bound.Directions for further improvements are mentioned in Section 3.6.Obtaining good lower bounds on K s is also not easy. Part of the diﬃculty is that even if onecomes up with a function f that is a candidate for a lower bound, verifying that it is ǫ -modularinvolves checking roughly 2 n approximate modularity equations (one equation for every pair S and T of sets), making a computer assisted search for good lower bounds impractical. For set functionsover 4 items, we could verify that the worst possible constant is 1 /

2. Checking ǫ -modularity ismuch easier for symmetric functions (for which the value of a set depends only on its size), butwe show that for such functions K s = , and this is also the case for ǫ -modular functions that aresubmodular (see Section 2.3). The only lower bound on K s that we could ﬁnd in previous work is K s ≥ , implicit in Pawlik [1987]. We consider a class of functions that enjoys many symmetries (wecall such functions ( k, M ) -symmetric ), and for some function in this class we provide the followinglower bound. Theorem 1.2 (Lower bound, strong Kalton) . There is an integer valued set function over 70 itemsthat is 2-modular and tightly 2-linear. Hence K s ≥ . In addition, we shall use K w to denote the smallest constant such that every weakly ǫ -modularfunction is K w ǫ -linear, and refer to K w as the weak Kalton constant . For the weak Kalton constantwe prove the following theorem. Theorem 1.3 (Upper and lower bounds, weak Kalton) . Every weakly ǫ -modular function is ∆ -linear for ∆ < ǫ , so K w < . Moreover, there is an integer valued set function over 20 itemsthat is weakly 2-modular and 3-linear, so K w ≥ / . earning algorithms Another part of our work concerns the following setting. Suppose thatone is given access to a value oracle for a function f that is ∆-linear. In such a setting, it isdesirable to obtain an explicit representation for some linear function h that is close to f , becauseusing such an h one can approximately solve optimization problems on f by solving them exactlyon h . The process of learning h involves two complexity parameters, namely the number of queriesmade to the value oracle for f , and the computation time of the learning procedure, and its qualitycan be measured by the distance d ( h, f ) = max S {| h ( S ) − f ( S ) |} of h from f . It was shownin Chierichetti et al. [2015] that for every learning algorithm that makes only polynomially manyvalue queries, there will be cases in which d ( h, f ) ≥ Ω(∆ √ n/ log n ). This might sound discouragingbut it need not be. One reason is that in many cases ∆ < log n/ √ n . For example, this may be thecase when f itself is actually linear, but is modeled as ∆-linear due to noise in measurements ofvalues of f . By investing more in the measurements, ∆ can be decreased. Another reason is thatthe lower bounds are for worst case f , and it is still of interest to design natural learning algorithmsthat might work well in practice. Indeed, Chierichetti et al. [2015] designed a randomized learningalgorithm that makes O ( n log n ) nonadaptive queries and learns h within distance O (∆ √ n ) from f . We improve upon this result in two respects, one being reducing the number of queries, and theother being removing the need for randomization. Theorem 1.4 (Learning algorithm) . There is a deterministic polynomial time learning algorithmthat given value oracle access to a ∆ -linear function on n items, makes O ( n ) nonadaptive valuequeries and outputs a linear function h , such that h ( S ) is O (∆(1 + p min {| S | , n − | S |} )) -close to f ( S ) for every set S . The learning algorithm is based on the Hadamard basis. A similar technique is applied inDwork and Yekhanin [2008] for a diﬀerent purpose (as brought to our attention by Moni Naor) –attacking the privacy of an n -sized binary database, and recovering all but o ( n ) entries via a linearnumber of queries.The Hadamard basis is an orthogonal basis of R n consisting of vectors with ± n that is a power of 2. It has the following property: for every twolinear functions that are O (∆)-close on the basis vectors (where each basis vector can be interpretedas a diﬀerence between two sets, one corresponding to its +1 indices, the other to its − O (∆(1 + p min {| S | , n − | S |} ))-close on every set S . (This property followsfrom the vectors having large norms, and thus a large normalization factor by which the distanceof ∆ is divided.) Given O ( n ) value queries to a ∆-linear function f , an algorithm can learnthe values of the linear function g that is ∆-close to f for the n Hadamard basis vectors, up toan additive error of O (∆). This is enough information to construct a linear function h that is O (∆(1 + p min {| S | , n − | S |} ))-close to g ( S ), and thus to f ( S ), for every set S . The stability of linearity and the connection between approximate modularity and approximatelinearity, which we study on the discrete hypercube, have been extensively studied in mathematicson continuous domains (e.g., Banach spaces). An early example is Hyers [1941], and this resulttogether with its extensions to other classes of functions are known as the Hyers-Ulam-Rassiastheory Jung [2011].Our work is related to the literature on data-driven optimization (see, e.g., Bertsimas and Thiele[2014]; Singer and Vondr´ak [2015]; Hassidim and Singer [2017]; Balkanski et al. [2017]). This liter-ature studies scenarios in which one wishes to optimize some objective function, whose parametersare derived from real-world data and so can be only approximately evaluated. Such scenarios arise4n applications like machine learning Krause et al. [2008], sublinear algorithms and property testingBlum et al. [1993], and algorithmic game theory Lehmann et al. [2006]. This motivates the studyof optimization of functions that satisfy some properties only approximately (e.g., submodularityKrause and Cevher [2010], gross substitutes Roughgarden et al. [2017], or convexity Belloni et al.[2015]). Our work is related to this strand of works in that we are also interested in functionsthat satisfy some property (modularity in our case) approximately, but unlike the aforementionedworks, we are interested in the characterization and learning of functions, not in their optimization.Learning of (exactly) submodular functions was studied by Balcan and Harvey [2011] andGoemans et al. [2009], in the PMAC model and general query model, respectively. We shall discusslearning of approximately modular functions in a value query model.

Organization

In Section 2 we present preliminaries. Section 3 shows improved upper bounds onthe weak and strong Kalton constants, and Section 4 shows improved lower bounds on the weakand strong Kalton constants. Section 5 describes how to learn an approximately linear functionfrom an approximately modular one.

Let U = { , . . . , n } be a ground set of n ≥ items (also called elements). For every set S ⊆ U ofitems, let ¯ S denote its complement U \ S . A collection G is a multiset of sets, and its complement¯ G is the collection of complements of the sets in G . Given a set function f : 2 U → R , the value of set S is f ( S ). Throughout we focus on additive closeness, and say that two values x, y ∈ R are∆-close (or equivalently, that x is ∆-close to y ) if | x − y | ≤ ∆.A set function f is ǫ -modular if for every two sets S, T ⊆ U , | f ( S ) + f ( T ) − f ( S ∪ T ) − f ( S ∩ T ) | ≤ ǫ. (1)If ǫ = 0 then f is modular . A set function f is weakly ǫ -modular if Condition (1) holds for everytwo disjoint sets S, T ⊆ U . The following proposition shows the relation: Proposition 2.1.

Every weakly ǫ -modular set function is a ǫ -modular set function. Proof.

Let f be a weakly ǫ -modular set function, we show it must be 2 ǫ -modular. For every twosets S and T , since f is weakly ǫ -modular we have that f ( T ) = f ( S ∩ T )+ f ( T \ S ) − f ( ∅ ) ± ǫ , and that f ( S ∪ T ) ± ǫ = f ( S )+ f ( T \ S ) − f ( ∅ ). Therefore, f ( S )+ f ( T ) = f ( S )+ f ( S ∩ T )+ f ( T \ S ) − f ( ∅ ) ± ǫ = f ( S ∪ T ) + f ( S ∩ T ) ± ǫ , completing the proof. (cid:4) Observation 2.2 (Approximate modularity for ≥ . Let f be a weakly ǫ -modular set function,and let S ⊆ U be a set of items with a partition ( S , . . . , S s ) (i.e., the disjoint union S ⊎ · · · ⊎ S s is equal to S ). By iterative applications of weak ǫ -modularity, f ( S ) + ( s − f ( ∅ ) − ǫ ) ≤ f ( S ) + · · · + f ( S s ) ≤ f ( S ) + ( s − f ( ∅ ) + ǫ ) . A set function f is linear if there exist constants c , c , . . . , c n such that for every set S , f ( S ) = c + P i ∈ S c i . A linear set function is additive if c = 0. The zero function has c i = 0 for every0 ≤ i ≤ n . A set function f is ∆ -linear if there exists a linear set function g that is ∆ -close to f ,i.e., f ( S ) and g ( S ) are ∆-close for every set S . We say that a set function f is tightly ∆-linear if itis ∆-linear and for every ∆ ′ < ∆, f is not ∆ ′ -linear. A closest linear function to f is a linear setfunction g that is ∆-close to f where f is tightly ∆-linear.5 roposition 2.3 (Modularity = linearity) . A set function is modular if and only if it is linear.

Proof.

Let f be a modular function, we show it must be linear: Let c = f ( ∅ ), and let c i = f ( { i } ) − c . For every set S , by modularity, f ( S ) = f ( S ) + f ( S ) − c for any S , S whose disjointunion is S . Applying this equality recursively gives f ( S ) = P i ∈ S c i + | S | c −| S − | c = P i ∈ S c i + c ,as required.For the other direction, let f be a linear function such that f ( S ) = c + P i ∈ S c i . For any twosets S, T , it holds that f ( S ) + f ( T ) = 2 c + P i ∈ S c i + P i ∈ T c i , and f ( S ∪ T ) + f ( S ∩ T ) = 2 c + P i ∈ S ∪ T c i + P i ∈ S ∩ T c i . The desired equality follows by P i ∈ S c i + P i ∈ T c i = P i ∈ S ∪ T c i + P i ∈ S ∩ T c i . (cid:4) Proposition 2.4.

Every ∆ -linear set function is -modular. This result is tight, even for sym-metric functions. Proof.

Let f be a ∆-linear function, and let g be a linear function such that g ( S ) = c + P i ∈ S c i and | f ( S ) − g ( S ) | ≤ ∆ for every set S . For every two sets S and T , f ( S ) + f ( T ) − f ( S ∪ T ) − f ( S ∩ T ) ≤ ( c + P i ∈ S c i + ∆) + ( c + P i ∈ T c i + ∆) − ( c + P i ∈ S ∪ T c i − ∆) − ( c + P i ∈ S ∩ T c i − ∆) = 4∆.Similarly, f ( S ) + f ( T ) − f ( S ∪ T ) − f ( S ∩ T ) ≥ ( c + P i ∈ S c i − ∆) + ( c + P i ∈ T c i − ∆) − ( c + P i ∈ S ∪ T c i + ∆) − ( c + P i ∈ S ∩ T c i + ∆) = − f is 4∆-modular.The proposition is tight, even for symmetric functions: Consider the 1-linear function on 4items in which sets of size 0 and 4 are worth 0, sets of size 1 and 3 are worth −

1, and sets of size2 are worth +1. If S and T are two diﬀerent sets of size 2 that intersect, the modularity equationis violated by 4. (cid:4) We shall often refer to set functions whose closest linear function is the zero function.

Observation 2.5 (Chierichetti et al. [2015]) . For every ǫ -modular (resp., weakly ǫ -modular) setfunction f that is tightly ∆ -linear, there is an ǫ -modular (resp., weakly ǫ -modular) set function f ′ that is tightly ∆ -linear and whose closest linear function is the zero function.The function f ′ can be deﬁned as follows: if g is a closest linear function to f then f ′ ( S ) = f ( S ) − g ( S ) for every set S . Let K w denote the smallest constant such that every weakly ǫ -modular set function is K w ǫ -linear.Let K s denote the smallest constant such that every ǫ -modular set function is K s ǫ -linear. Noticethat K s ≤ K w . We refer to K w as the weak Kalton constant (the possibility that there is sucha constant K w independent of n was advocated in the works of Nigel Kalton), and to K s as thestrong Kalton constant. Formally: Deﬁnition 2.6. K s ∈ R ≥ (resp., K w ∈ R ≥ ) is the strong (weak) Kalton constant if: • for every n ∈ N ≥ , ǫ ∈ R ≥ , every (weakly) ǫ -modular set function over [ n ] is K s ǫ -linear( K w ǫ -linear); and • for every κ < K s ( κ < K w ) and for every ǫ ∈ R ≥ , there exists a suﬃciently large n and a(weakly) ǫ -modular set function f over [ n ] such that f is not κǫ -linear (it is suﬃcient thatthere exist a (weakly) ǫ -modular set function that is tightly K s ǫ -linear ( K w ǫ -linear)). The propositions in Section 2.1 imply the following corollaries on Kalton constants.6 orollary 2.7 (of Proposition 2.1) . K w / ≤ K s ≤ K w . Corollary 2.8 (of Observation 2.5) . If K s is the strong Kalton constant for set functions whoseclosest linear function is the zero function, then K s is the strong Kalton constant (for general setfunctions). Similarly, if K w is the weak Kalton constant for set functions whose closest linearfunction is the zero function, then K w is the weak Kalton constant (for general set functions). The analysis of Kalton constants becomes much easier in the following special cases.

Here we give tight bounds on K s for symmetric set functions. A set function f is symmetric iffor every two sets S and T , | S | = | T | = ⇒ f ( S ) = f ( T ). A symmetric set function over [ n ] canbe represented as a function over integers f : { , . . . , n } → R . The following example shows that K s ≥ / Example 2.9.

For every n ∈ N ≥ , consider the symmetric set function f n over [ n ] where f n ([ n ]) = − ǫ and f n ( S ) = 0 for every other set S . Let g δ be a symmetric linear set function over [ n ] suchthat g δ ( { j } ) = − ǫjn + δ for every ≤ j ≤ n . Observe that g δ is max { δ, ǫ − ǫn − δ } -close to f n .When δ = ǫ − ǫ n , the distance is minimized and g δ is a closest linear function to f n . Thus f n istightly δ -linear. This shows that for every κ < / , there exists n such that f n is tightly δ -linearfor δ = ǫ − ǫ n > κǫ . Proposition 2.10.

The strong Kalton constant for symmetric set functions is K s = . Proof.

Let f be a symmetric ǫ -modular set function, we argue that f must be ǫ -linear. ByObservation 2.11 we can assume without loss of generality that f ’s closest linear function is thezero function. Let M be the maximum absolute value of f , then there exist k < k < k such thateither f ( k ) = f ( k ) = − M and f ( k ) = M , or f ( k ) = f ( k ) = M and f ( k ) = − M . Withoutloss of generality assume the former. Suppose that k ≥ n/

2. Using ǫ -modularity we get that2 f ( k ) ≤ f ( k ) + f (2 k − k ) + ǫ ≤ − M + M + ǫ . On the other hand, 2 f ( k ) = 2 M . We get2 M ≤ − M + M + ǫ , implying that M ≤ ǫ/

2, as desired. If k < n/

2, an analogous argument isinvoked using k . (cid:4) Observation 2.11.

For every symmetric ǫ -modular set function f that is tightly ∆ -linear, thereis a symmetric ǫ -modular set function f ′ that is tightly ∆ -linear and whose closest linear functionis the zero function. Proof.

Let g be a linear set function ∆-close to f . We show a symmetric set function g ′ that is∆-close to f : for every k ∈ [ n ], let g ′ ( k ) be the average value of k items according to g . The prooffollows as in Observation 2.5. (cid:4) We now give tight bounds on K s for submodular set functions. A set function f is submodular iffor every two sets S, T it holds that f ( S ) + f ( T ) ≥ f ( S ∪ T ) + f ( S ∩ T ). Example 2.9 shows that K s ≥ / roposition 2.12. The strong Kalton constant for submodular set functions is K s = . Proof.

Without loss of generality we normalize f such that f ( ∅ ) = 0. Since f is submodular andnormalized, it belongs to the class of XOS functions, and so there exists an additive set function g such that for every set S , f ( S ) ≥ g ( S ), and for the set S = U , f ( U ) = g ( U ) (see, e.g., Feige[2009]). It remains to show that for every S , f ( S ) ≤ g ( S ) + ǫ . Assume for contradiction that f ( S ) > g ( S ) + ǫ . Then f ( S ) + f ( S ) > g ( S ) + ǫ + g ( S ) = g ( U ) + ǫ = f ( U ) + ǫ , where we use that g ( S ) + g ( S ) = g ( U ) by additivity. We have shown a contradiction to ǫ -modularity, completing theproof. (cid:4) Kalton and Roberts [1983] prove that K w ≤ . This upper bound was subsequently improved to K w ≤ . K s ≤ .

8. Let us provide more detailson how the known upper bounds on K w are achieved. Deﬁnition 2.13 (Expander) . For k ∈ N ≥ and α, r, θ ∈ R ≥ such that α, θ < and r > , wesay that a bipartite graph G k ( V, W ; E ) is an ( α, r, θ ) -expander if | V | = 2 k , | W | = 2 θk , | E | = 2 kr ,and every set S ⊂ V of at most kα vertices has at least | S | neighbors in W (and hence a perfectmatching into W ). We say that ( α, r, θ )-expanders exist if there is some k ′ ∈ N ≥ such that for every integermultiple k of k ′ , there exists an ( α, r, θ )-expander G k .The following theorem (rephrased from Kalton and Roberts [1983]) is the key to the knownupper bounds on K w . Theorem 2.14 (Kalton and Roberts [1983]) . Suppose that for ﬁxed r and θ and all suﬃcientlylarge k there are ( , r, θ ) -expanders { G k } . Then an upper bound on the weak Kalton constant is: K w ≤ r − θ − θ ) . Pippenger [1977] shows that ( , r, θ )-expanders exist with r = 6 and θ = , if k is suﬃcientlylarge ( k ≥ K w ≤ . In Bondarenko et al. [2013]it is shown that r can be reduced to 5 .

05, thus leading to the improved bound of K w ≤ . Theorem 2.15 (Lower bound Pawlik [1987]) . Lower bounds on the Kalton constants are K w ≥ − Θ(1 /n ) and K s ≥ − Θ(1 /n ) . For a set function f , let M = max S {| f ( S ) |} be the maximum absolute value of f (also called f ’s extreme value). We say that a set S has value M if f ( S ) = M , and value − M if f ( S ) = − M .Given a distribution ( p , . . . , p κ ) over sets S , . . . , S κ , the marginal probability of item i accordingto this distribution is the probability that i appears in a set randomly selected according to thedistribution, i.e., P j | i ∈ S j p j . 8 emma 2.16 (Chierichetti et al. [2015]) . The closest linear function to a set function f is the zerofunction if and only if there exist probability distributions P + and P − with rational probabilities oversets with value M and sets with value − M , respectively, such that for every item i , the marginalprobabilities of i according to P + and P − are the same. Lemma 2.16 appears as Lemma 10 in Chierichetti et al. [2015]. The rationality of the probabilitydistributions follows since they are obtained as solutions to a linear program.

Deﬁnition 2.17 (Positive and Negative Supports) . For a set function f whose closest linear setfunction is the zero function, let P + , P − be the distributions guaranteed by Lemma 2.16. Then the positive support PS = ( P , . . . , P κ ) of f is the support of P + (sets assigned a positive probabilityby P + ), and the negative support NS = ( N , . . . , N ν ) of f is the support of P − . We emphasize that all sets in PS have value M and all sets in NS have value − M . Our main goal in this section is to provide improved upper bounds for the strong Kalton con-stant K s , which was previously known to be ≤ . K w , which was previouslyknown to be ≤ . K w and how they are derived. The basicapproach of Kalton and Roberts [1983] is outlined in Theorem 2.14. There the value of K w isrelated to parameters of ( , r, θ )-expanders. Rearranging the bound from Theorem 2.14, denotingthe denominator by D and the numerator by N r + N where N r depends on r and N does not, itis shown that: K w ≤ N r + N D , where D = 1 − θ , N r = 2 r and N = − θ . Kalton and Roberts [1983] use a previously knownexpander construction of Pippenger [1977] to get an upper bound of 44.5. The improved upperbound of 38.8 of Bondarenko et al. [2013] comes from constructions of expanders with a smallervalue of r . The value of r cannot be substantially reduced further (without changing θ ), and so theapproach of constructing better expanders is unlikely to signiﬁcantly further reduce K w .In Section 3.4 we improve upon the upper bound of 38.8 on K w by reducing N from − θ to − − θ , giving an upper bound of 26.8 using the expander construction of Bondarenko et al. [2013].The key to this improvement is extracting the main idea from the proof of Kalton and Roberts[1983], cleaning away redundancies and using instead Lemma 3.8 (which establishes the existenceof complementary collections with values approximately equal to the function’s extreme value orits negation). It seems that the value of N cannot be substantially reduced further within thisframework of ideas, and hence new ideas appear to be needed if one wishes to obtain signiﬁcantimprovements in the upper bound on K w . In Appendix B.2 we upper-bound K w using the minimumbetween two expressions rather than a bound of the form ( N r + N ) /D , and show that K w ≤ . K s , the strong Kalton constant. This places additionalrestrictions on f ( ǫ -modularity instead of only weak ǫ -modularity). Indeed, these additional re-strictions were used in Bondarenko et al. [2013] to reduce N from − θ to − θ , achieving anupper bound of 35 . K s . 9ur approach for improving K s will make more extensive use of ǫ -modularity. Rather thanconsidering sets from a collection with approximately the extreme value M , we shall consider intersections of these sets. Using ǫ -modularity we shall be able to show that the function valuesof intersections are also close to M . In fact, using some averaging arguments we shall obtain evenstronger estimates on how close these values are to M (a point that is relevant to controlling thevalue of N ). Thereafter, we will no longer be restricted to using ( α, r, θ )-expanders with α = .We will be able to use α = instead (for intersections of two sets), or even α = (for intersectionsof three sets), and so on. The advantage of reducing α is that for smaller values of α , expanderswith lower values of r and θ exist, leading to better upper bounds on K s . However, we cannotreduce α to arbitrarily small values because each reduction of α by a factor of two is accompaniedby an increase in N , and one needs to balance between these two factors. We present deﬁnitions and preliminary results used to establish our upper bounds.

As discussed when describing our approach (Section 3.1), we utilize the existence of expanders witha range of parameters. Our existence argument (see Appendix B.1) uses the probabilistic methodas in Pippenger [1977], and so results in expanders that are biregular (all vertices on the same sideof the bipartite graph have the same degree). More complicated expander constructions that arenot biregular may achieve even better parameters, as in Bondarenko et al. [2013].

Lemma 3.1.

The following families of (biregular) expanders exist:1. ( , , ) -expanders.2. ( , , ) -expanders.3. ( , , ) -expanders.4. ( , , ) -expanders.5. ( , , ) -expanders.6. ( , , ) -expanders. ǫ -modular set functions Throughout this section, let f be a weakly ǫ -modular set function whose closest linear set functionis the zero function. Observation 3.2. − ǫ ≤ f ( ∅ ) + f ( U ) ≤ ǫ . Proof.

Denote δ = f ( ∅ ) + f ( U ). Let M denote the maximum absolute value of f . If M = 0the claim follows trivially, otherwise by Lemma 2.16 there exist sets P, N with values M, − M ,respectively. Consider the values of ¯ P , ¯ N . By weak ǫ -modularity and the deﬁnition of M , we have − M ≤ f ( ¯ P ) ≤ − f ( P ) + δ + ǫ = − M + δ + ǫ , and M + δ − ǫ = − f ( N ) + δ − ǫ ≤ f ( ¯ N ) ≤ M . Weconclude that 0 ≤ δ + ǫ and δ − ǫ ≤

0, completing the proof. (cid:4)

The following is a direct corollary of Observation 3.2 and weak ǫ -modularity:10 orollary 3.3 (Value of complement set) . Let δ = f ( ∅ ) + f ( U ) . Then for every set S , − f ( S ) − ǫ + δ ≤ f ( ¯ S ) ≤ − f ( S ) + ǫ + δ . Let M be the maximum absolute value of f . Deﬁnition 3.4.

A set S has deﬁcit d ≥ if f ( S ) = M − d , and has surplus s ≥ if f ( S ) = − M + s .A collection has average deﬁcit d (resp., surplus s ) if the expected deﬁcit (resp. surplus) of its setswith respect to the uniform distribution is d (resp. s ). The next observation follows directly from Corollary 3.3.

Observation 3.5 (Average deﬁcit/surplus of complement collection) . Let δ = f ( ∅ ) + f ( U ) . Let G be a collection with average deﬁcit d and surplus s . Then the average surplus of its complement ¯ G is at most d + ǫ + δ , and the average deﬁcit of ¯ G is at most s + ǫ − δ . Deﬁnition 3.6.

An item is α -frequent in a collection if it appears in exactly an α -fraction of thesets. A collection has α -frequent items if every item is α -frequent in it. Observation 3.7.

The complement of a collection with α -frequent items has (1 − α ) -frequent items. Lemma 3.8 (Complementary collections) . There exists k ′ ∈ N ≥ such that for every k which is aninteger multiple of k ′ , f has a collection PS ∗ of k sets (the same set might appear multiple times inthe collection and we treat these appearances as distinct) with / -frequent items and average deﬁcit d , whose complement collection NS ∗ has / -frequent items and average surplus s , and d + s ≤ ǫ . Proof.

Consider the positive and negative supports PS , NS of f , where PS = { P , . . . , P κ } and NS = { N , . . . , N ν } , as deﬁned in Deﬁnition 2.17. By Lemma 2.16, there exist distributions P + = ( p +1 , . . . , p + κ ) and P − = ( p − , . . . , p − ν ) with rational probabilities over PS and NS, whosemarginals are equal for all items. Consider the complement collections PS and NS; let d ′ be theaverage deﬁcit of NS and let s ′ be the average surplus of PS. Since the average deﬁcit of PS andthe average surplus of NS are 0, then by Corollary 3.5, d ′ ≤ ǫ − δ and s ′ ≤ ǫ + δ and so d ′ + s ′ ≤ ǫ .Towards constructing the collections PS ∗ and NS ∗ , consider the collections PS ∪ NS and PS ∪ NSof κ + ν sets each. We deﬁne a distribution Q over κ + ν sets, which can be associated with bothPS ∪ NS and PS ∪ NS, to be Q = ( p +1 , . . . , p + κ , p − , . . . , p − ν ). This distribution has the followingproperties: • First, if sets are randomly drawn from PS ∪ NS according to this distribution, the probabilityof selecting a set with value M is at least 1 /

2, since the total weight on sets in PS is exactly1 /

2. Similarly, the probability of selecting a set from PS ∪ NS with value − M when samplingaccording to Q is at least 1 / • Second, for PS ∪ NS and for every item i , the probability of selecting a set with item i (i.e., the marginal of i ) is exactly 1 /

2, since it is equal to P j | i ∈ P j p + j + P j | i ∈ ¯ N j p − j = P j | i ∈ P j p + j + − P j | i/ ∈ ¯ N j p − j , and from the equality of the marginals of P + , P − we havethat P j | i ∈ P j p + j = P j | i ∈ N j p − j = P j | i/ ∈ ¯ N j p − j . The same holds for item marginals when thedistribution is taken over PS ∪ NS. • Third, the probabilities of the distribution Q are all rational and strictly positive.By the third property, we can duplicate sets in PS ∪ NS and in PS ∪ NS to construct thecollections PS ∗ and NS ∗ , such that sampling a set uniformly at random from PS ∗ is equivalent tosampling a set according to Q from PS ∪ NS, and similarly for NS ∗ and PS ∪ NS. By the ﬁrst11roperty, the average deﬁcit of PS ∗ is d ≤ d ′ / ∗ is s ≤ s ′ /

2, and d + s ≤ ǫ . By the second property, every item must appear in exactly half the sets in PS ∗ andhalf the sets in NS ∗ , meaning that the number of sets in PS ∗ and NS ∗ is even, and we can denoteit by 2 k ′ for some integer k ′ >

0. We have thus shown the existence of a collection PS ∗ and itscomplement NS ∗ with k ′ sets each whose average surplus and deﬁcit guarantee d + s ≤ ǫ . Observethat existence of such collections of size k holds for every integer multiple k = ck ′ , since taking c copies of PS ∗ and c copies of NS ∗ satisﬁes all the conditions of the lemma. (cid:4) We present the two key lemmas used to establish our upper bounds. We begin with Lemma 3.9,which is a simpliﬁed version of Lemma 3.1 of Kalton and Roberts [1983].

Lemma 3.9 (Using expanders for set recombination) . Let k ∈ N ≥ and α, r, θ ∈ R ≥ be suchthat there exists an ( α, r, θ ) -expander G k . Consider a collection G of k sets with α -frequent items(referred to as the source sets). Then G has a reﬁned partition into a total of kr subsets (referredto as the intermediate subsets), which can be recombined by disjoint unions into a collection of kθ sets with α/θ -frequent items (referred to as the target sets). Proof.

Let G k = G k ( V, W ; E ). Align the 2 k source sets with the 2 k vertices of V in the ( α, r, θ )-expander G k . Because every item appears in 2 kα sets, then for every item i there are 2 kα vertices in V corresponding to i (i.e., aligned with the source sets that contain i ). By the expansion propertyof G k , for every item i there exists a perfect matching M i between the vertices in V that correspondto i , and some 2 kα vertices in W .We now use these matchings to label the edges: For every item i , add i to the labels of thematched edges in M i . Every edge in E is now labeled by a subset of items (some labels may bethe empty set). Let these labels be the intermediate subsets. Their total number is | E | = 2 kr , asdesired. For every vertex v ∈ V , the items of the source set S v corresponding to v are partitionedamong the edges leaving v , and hence the intermediate subsets indeed reﬂect a reﬁned partitioningof the target sets.Observe that the edges entering a vertex w ∈ W are labeled by disjoint intermediate subsets(since for every item i , the edges labeled by subsets containing i form a matching). Let the set S w corresponding to w be the disjoint union of the subsets labeling the edges adjacent to w . The setscorresponding to the vertices in W can thus be the target sets.Notice that by construction, every item i appears in the same number of source and target sets,so if the source sets have α -frequent items, the target sets have α/θ -frequent items, completing theproof. (cid:4) Let f be a weakly ǫ -modular set function whose closest linear set function is the zero function.Let M be the absolute highest value of f . The following lemma upper-bounds M ; a more nuancedversion appears as Lemma B.2 in Appendix B.2. Lemma 3.10.

Let k ∈ N ≥ and α, r, θ ∈ R ≥ be such that there exists an ( α, r, θ ) -expander G k .Let G and G ′ be collections of k sets each, both with α -frequent items, such that the average deﬁcitof G is at most d and the average surplus of G ′ is at most s . Then M ≤ ( d + s ) + 2 ǫ ( r − − θ + ǫ. roof. We ﬁrst apply Lemma 3.9 to partition and disjointly recombine the sets of collection G using the expander G k = G k ( V, W ; E ). We use the following notation: For a vertex v ∈ V (resp., w ∈ W ) let S v (resp., S w ) be the source (resp., target) set corresponding to v (resp., w ). Denote theneighboring vertices of a vertex v by N ( v ) and its degree by deg( v ). Let S v,w be the intermediatesubset that labels (corresponds to) edge ( v, w ) ∈ E .By Lemma 3.9, for every v ∈ V , the intermediate subsets labeling the edges adjacent to v aredisjoint, and the same holds for every w ∈ W . We can thus apply Observation 2.2 to get P w ∈ N ( v ) f ( S v,w ) ≥ f ( S v ) + (deg( v ) − f ( ∅ ) − ǫ ) ∀ v ∈ V. P v ∈ N ( w ) f ( S v,w ) ≤ f ( S w ) + (deg( w ) − f ( ∅ ) + ǫ ) ∀ w ∈ W. Denote the maximum absolute value of f by M . Since collection G has average deﬁcit of at most d , by summing over vertices v ∈ V (where | V | = 2 k ) we get P v ∈ V f ( S v ) ≥ k ( M − d ). Since thetarget sets of G have value at most M , by summing over vertices w ∈ W (where | W | = 2 θk ) we get P w ∈ W f ( S w ) ≤ θkM . Clearly in the bipartite graph G k , P v ∈ V deg( v ) = P w ∈ W deg( w ), and bythe parameters of G k both are equal to 2 kr . Therefore, summing over v ∈ V and w ∈ W we get2 kM − kd + (2 kr − k )( f ( ∅ ) − ǫ ) ≤ X ( v,w ) ∈ E f ( S v,w ) ≤ θkM + (2 kr − θk )( f ( ∅ ) + ǫ ) . Dividing the resulting inequality by 2 k and rearranging gives(1 − θ ) M ≤ d + ( r − ǫ − f ( ∅ )) + ( r − θ )( ǫ + f ( ∅ ))= d + 2 ǫ ( r −

1) + (1 − θ )( ǫ + f ( ∅ )) . (2)Similarly, using that the average deﬁcit of collection G ′ is at most s and the value of its targetsets is at most M , P v ∈ V f ( S v ) ≤ k ( − M + s ), and P w ∈ W f ( S w ) ≥ − θkM . Therefore − kM + 2 ks + (2 kr − k )( f ( ∅ ) + ǫ ) ≥ X ( v,w ) ∈ E f ( S v,w ) ≥ − θkM + (2 kr − θk )( f ( ∅ ) − ǫ ) . Dividing the resulting inequality by 2 k and rearranging gives(1 − θ ) M ≤ s + ( r − ǫ + f ( ∅ )) + ( r − θ )( ǫ − f ( ∅ ))= s + 2 ǫ ( r −

1) + (1 − θ )( ǫ − f ( ∅ )) . (3)Rearranging Inequalities (2) and (3) as well as averaging the resulting inequalities implies thetheorem. (cid:4) K w To upper bound K w in this section and in Appendix B.2, we may focus without loss of generality ona weakly 1-modular set function f , whose closest linear set function is the zero function (Corollary2.8). We show the following upper bound: Lemma 3.11 (Upper bound on K w ) . Suppose that for ﬁxed r, θ ∈ R ≥ there exist ( , r, θ ) -expanders. Then the weak Kalton constant satisﬁes: K w ≤ r − − θ − θ . roof. By Lemma 3.8, there exists k ′ such that for every k that is a member of the arithmetic pro-gression k ′ , k ′ , k ′ , . . . , the function f has collections PS ∗ and NS ∗ of 2 k sets with -frequent items,and whose average deﬁcit d and average surplus s , respectively, satisfy d + s ≤

1. The assumptionthat ( , r, θ )-expanders exist implies that there is another arithmetic progression k ” , k ” , k ” , . . . such that for every k in this sequence an expander G k with the above parameters exist. As the twoarithmetic progressions must meet at k ′ k ”, there is a common k in both progressions. The upperbound follows from applying Lemma 3.10 to collections PS ∗ , NS ∗ to get K w ≤ +2 r − − θ + 1. (cid:4) Theorem 3.12.

The weak Kalton constant satisﬁes K w ≤ . . Proof.

In Lemma 3.11, we can get K w ≤ . r = 5 .

05 and θ = by using theexpanders from Bondarenko et al. [2013]. (cid:4) Using additional ideas, we further improve our upper bound on the weak Kalton constant byproviding a stronger version of Lemma 3.10. Using the new ideas we show that the weak Kaltonconstant satisﬁes K w ≤ . K s As in the previous section, we focus here without loss of generality on a 1-modular set function f whose closest linear set function is the zero function, and on its collections PS ∗ and NS ∗ as deﬁnedin Lemma 3.8. Recall that d and s are the average deﬁcit and surplus of PS ∗ and NS ∗ , respectively,and d + s ≤

1. Denote the average deﬁcit of an intersection of ℓ sets in PS ∗ by d ℓ and the averagesurplus of an intersection of ℓ sets in NS ∗ by s ℓ . Lemma 3.13.

For even ℓ , d ℓ + s ℓ ≤ ℓ − . For odd ℓ , d ℓ + s ℓ ≤ ℓ − + 1 . Proof.

We ﬁrst prove the following claim: d + s ≤ ∗ and NS ∗ are comple-ments. For every P , P ∈ PS ∗ whose complements are N , N ∈ NS ∗ it holds that f ( P ∪ P ) = f ( P ∩ P ) = f ( N ∩ N ) ≤ − f ( N ∩ N ) + ǫ + δ (Corollary 3.3). Similarly, f ( N ∪ N ) = f ( N ∩ N ) = f ( P ∩ P ) ≥ − f ( P ∩ P ) − ǫ + δ . By 1-modularity we thus have f ( P ∩ P ) ≥ f ( P ) + f ( P ) − f ( P ∪ P ) − ≥ f ( P ) + f ( P ) + f ( N ∩ N ) − δ −

2, and f ( N ∩ N ) ≤ f ( N ) + f ( N ) − f ( N ∪ N ) + 1 ≤ f ( N ) + f ( N ) + f ( P ∩ P ) − δ + 2.We now take the average over P , P . This gives M − d ≥ M − d + ( − M + s ) − δ − N , N gives − M + s ≤ − M + 2 s + ( M − d ) − δ + 2, where M is the maximum absolute value of f . Subtracting the second inequality from the ﬁrst we get2 M − d − s ≥ M − d − s − M + d + s −

4, so d + s ≤ d + s + 2 ≤

3, completing the proofof the claim.We can now prove Lemma 3.13 by induction: It holds for ℓ = 1 and ℓ = 2 using the aboveclaim. For every ℓ sets P , . . . , P ℓ ∈ PS ∗ , by 1-modularity the value of their intersection satisﬁes f (( P ∩ . . . ∩ P ℓ ) ∩ ( P ℓ +1 ∩ . . . ∩ P ℓ )) ≥ f ( P ∩ . . . ∩ P ℓ ) + f ( P ℓ +1 ∩ . . . ∩ P ℓ ) − M −

1, where M is themaximum absolute value of f . Taking the average over the ℓ sets we get M − d ℓ ≥ M − d ℓ − d ℓ − ℓ − − M + s ℓ ≤ − M + s ℓ + s ℓ − ℓ + 1. So d ℓ + s ℓ ≤ + d ℓ + s ℓ − ℓ + d ℓ + s ℓ − ℓ + 2. Using the expanders from Pippenger [1977] (with r = 6 instead of r = 5 .

05) would result in K w ≤ . ℓ is even we can take ℓ = 2 and using the induction hypothesis get d ℓ + s ℓ ≤ ℓ − − ℓ −

2. If ℓ is odd we can take ℓ = 1 and using the induction hypothesis get d ℓ + s ℓ ≤ ℓ − − ℓ − + 1. This completes the proof. (cid:4) Observation 3.14.

Every item is contained in / ℓ of the intersections of ℓ sets in PS ∗ or NS ∗ . Observation 3.14 allows us to use the main lemma for upper-bounding the Kalton constants(Lemma 3.9) with α = 1 / ℓ . Together with Lemma 3.13 that establishes the average deﬁcit andsurplus, this enables us to obtain the following result. Theorem 3.15.

Suppose that for ﬁxed r, θ ∈ R ≥ there exist ( α, r, θ ) -expanders. Then the strongKalton constant satisﬁes K s ≤ r + N − θ − θ , where if α = then N = − ; if α = then N = ; if α = then N = 2 ; if α = then N = 3 ; and if α = then N = 4 . . Proof.

For every ℓ , by taking the intersections of ℓ sets in PS ∗ and NS ∗ we get collections G and G ′ , both with ℓ -frequent items (Observation 3.14), whose average deﬁcit and surplus are d ℓ and s ℓ ,respectively. Moreover, there exists some k such that G and G ′ have 2 k sets each, and an expander G k with the above parameters exists. We now apply Lemma 3.10, which gives an upper bound of K s ≤ ( d ℓ + s ℓ )+2 r − − θ + 1 . By Lemma 3.13, d ℓ + s ℓ is at most 3 , , ,

11 for ℓ = 2 , , ,

5, respectively.This implies the theorem. (cid:4)

To use Theorem 3.15 one needs to substitute in parameters of expanders. Theorem 3.15 com-bined with the ( , , )-expanders from Lemma 3.1 implies that the value of the strong Kaltonconstant satisﬁes K s ≤ · − . − . = 14 . K s Using our techniques, the upper bound on K s can be improved even further. The sources of theseimprovements are twofold. First, all of our results are derived using bi-regular expanders; betterbounds can be obtained using more sophisticated expanders. Second, improvements can be ob-tained by using additional properties of ǫ -modular functions to improve the bounds in Lemma 3.13.Improvements of the second type are demonstrated in this section, where the main result is showingthat K s < .

65 (establishing Theorem 1.1).

Deﬁnition 3.16.

Given ≤ α ≤ and µ ≥ , an ( α, µ )-collection-pair is a pair of collections D, S , such that in both collections items are α -frequent, the average value of sets in D is at least M − d , the average value of sets in S is at most − M + s , and ( d + s ) = µ . For a given µ ≥

0, let α [ µ ] denote the smallest 0 ≤ α ≤ such that there is some µ ′ ≤ µ forwhich an ( α, µ ′ )-collection-pair exists. By Lemma 3.13, we may assume for our given 1-modularfunction f that α [ ] ≤ , α [ ] ≤ , and α [4] ≤ .Fix some < δ < δ satisfying α [ δ ] ≃ .) Now we consider two cases, each addressed in its own lemma. Lemma 3.17.

Suppose that α [ δ ] ≤ α [ ]2 . Then α [2 δ − ] ≤ . Proof.

Consider the µ ′ ≤ δ for which an ( α [ δ ] , µ ′ )-collection-pair ( D, S ) with deﬁcit d ′ andsurplus s ′ exists, and let m denote the number of sets in D and in S (sets appearing more thanonce are counted more than once). Consider now the two collection-pairs ( D ∩ , S ∩ ) and ( D ∪ , S ∪ )15btained by taking all m pairwise intersections or unions (respectively) of sets from D, S . (A pairis generated by picking one set and then another set, with repetitions.) Let α ∩ and α ∪ be theitem frequencies associated with these two collection-pairs, respectively. Let d ∪ , d ∩ be the deﬁcitsof D ∪ , D ∩ , respectively, and let s ∪ , s ∩ be the surpluses of S ∪ , S ∩ , respectively. Let µ ∪ (resp., µ ∩ )be the average of d ∪ , s ∪ (resp., d ∩ , s ∩ ). Then: • α ∩ = ( α [ δ ]) ≤ (cid:16) α [ ]2 (cid:17) ≤ ; • α ∪ < α [ δ ] ≤ α [ ].It follows from the second bullet that µ ∪ ≥ . The 1-modularity condition implies that 2 d ′ − d ∪ +1 ≥ d ∩ and 2 s ′ − s ∪ + 1 ≥ s ∩ , and so by averaging 2 µ ′ − µ ∪ + 1 ≥ µ ∩ . Substitute µ ∪ ≥ / µ ′ ≤ δ to get 2 δ − ≥ µ ∩ . By the ﬁrst bullet, it follows that α [2 δ − ] ≤ , as required. (cid:4) Lemma 3.18.

Suppose that α [ δ ] > α [ ]2 . Then α [9 − δ ] ≤ . Proof.

Consider the µ ′ ≤ for which an ( α [ ] , µ ′ )-collection-pair ( D, S ) with deﬁcit d ′ and surplus s ′ exists. For the collection-pair ( D ∩ , S ∩ ) (that we shall denote by ( ˜ D, ˜ S )) the item frequencies are (cid:0) α [ ] (cid:1) , and the average of the deﬁcit and surplus ˜ µ is at most 2 µ ′ + 1 ≤ D ∩ , ˜ S ∩ ) and ( ˜ D ∪ , ˜ S ∪ ). Let ˜ α ∩ and ˜ α ∪ be the item frequencies associated with thesetwo collections respectively. Then • ˜ α ∩ = (cid:0) α [ ] (cid:1) ≤ , • ˜ α ∪ < (cid:0) α [ ] (cid:1) < α [ δ ],where the last inequality uses the fact that α [ ] ≤ and the premise of the lemma.Similarly to the proof of Lemma 3.17, it follows from the second bullet that ˜ µ ∪ ≥ δ . The1-modularity condition then implies that 9 − δ ≥ · − δ + 1 ≥ µ − ˜ µ ∪ + 1 ≥ ˜ µ ∩ . This completesthe proof of the lemma. (cid:4) Corollary 3.19.

For every < δ < , every 1-modular function f as above has an ( α, µ ) -collection-pair with either α ≤ and µ ≤ δ − , or α ≤ and µ ≤ − δ . Improved upper bound no. 1: K s < .

25 We now apply Lemma 3.10, which gives an upperbound of K s ≤ µ +2 r − − θ + 1 when there are ( α, r, θ )-expanders. By Corollary 3.19 we get K s ≤ max (cid:26) δ + 2 r − . − θ , − δ + 2 r + 71 − θ (cid:27) + 1 , where r , θ are the best expander parameters for α = , and r , θ are the best expander param-eters for α = . Plugging in δ = 43 /

16, and using Lemma 3.1 we get that K s ≤ . mproved upper bound no. 2: K s < .

65 The 13 .

25 bound can be further improved by usingLemma B.2 (which is an improved version of Lemma 3.10). To make use of Lemma B.2 we need d ′ + s ′ (the lower bounds on the average deﬁcit and surplus of the target sets) to be suﬃcientlylarge. We therefore distinguish between two cases as shown next.Suppose f has an ( α, µ )-collection-pair with α ≤ and µ ≤ δ − / α, , )-expander (which exists by Lemma 3.1). Consider the target sets ofthe collection pair guaranteed by Lemma 3.9; their frequency is α / ≤ / / < /

16. If d ′ + s ′ ≤ . , , )-expanders (Lemma 3.1), to get K s < .

65 (by substituting d ′ + s ′ ≤ . r = 4, θ = 4 / f has an ( α, µ )-collection-pair with α ≤ and µ ≤ − δ (thesecond case in Corollary 3.19), repeat an analogous analysis using an ( α, , )-expander (whichexists by Lemma 3.1) to get K s < .

65, as above (this is valid since the frequency of the targetsets is α / ≤ / / < / d ′ + s ′ > .

08. By Corollary 3.19, we get thatfor every < δ < K s is upper-bounded bymax ( δ − − . θ + 2( r − − θ , − δ − . θ + 2( r − − θ ) + 1 , where r , θ are the best expander parameters for α = , and r , θ are the best expander parame-ters for α = . Plugging in δ = 43 /

16, and using Lemma 3.1 we get that K s ≤ . K s < . Remark

The bounds shown in this section illustrate the techniques we use. Clearly, these tech-niques can be extended to give even better bounds by, e.g., establishing better expanders andapplying our ideas recursively. We save these extensions for future work.

In this section we prove the lower bound stated in Theorem 1.2 on the strong Kalton constant; i.e.,we prove that K s ≥

1. The lower bound stated in Theorem 1.3 on K w is proved in Appendix C.We begin by explicitly constructing a set function f that establishes the lower bound. The ideaunderlying f ’s construction is to achieve symmetry properties which facilitate its analysis; this ideais developed in Section 4.1, and the analysis appears in Section 4.2. Proof. [Proof of Theorem 1.2] The following set function with n = 70 is 2-modular and tightly2-linear, thus showing K s ≥

1. Let M = 2. The positive support PS and the negative supportNS of f (Deﬁnition 2.17) are as follows: There are 8 sets in PS, each with 35 out of the 70items, such that every item appears in exactly 4 of the 8 sets (notice that (cid:0) (cid:1) = 70). (We canﬁx any such collection of sets as PS without loss of generality.) The negative support NS is thecomplement collection of PS. Notice that PS , NS support uniform distributions P + , P − with equalitem marginals, as required. The values of sets under f are determined by the following rules,where the ﬁrst applicable rule applies (and hence f is well deﬁned):1. Each positive support set S ∈ PS has value f ( S ) = M = 2.2. For every two sets S and S in PS and every set R , we have f ( S ∪ ( S ∩ R )) = 1, and likewise f ( S ∩ ( S ∪ R )) = 1. (In particular, for every two sets S , S ∈ PS, f ( S ∪ S ) = f ( S ∩ S ) = 1.)17. We impose − f ( S ) = f ( ¯ S ) and derive from this sets with negative value.4. All other sets have value 0. Claim 4.1. f is tightly 2-linear. Claim 4.2. f is 2-modular. The proofs of Claims 4.1 and 4.2 use the tool of ( k, M )-symmetric set functions introducedin Section 4.1. In particular, the proof of the latter claim is based on an analysis of a subset ofselected cases, which are proven to be suﬃcient to establish ǫ -modularity for f due to its symmetryproperties (see Lemma 4.7). The proofs of the claims appear in Section 4.2, thus completing theproof of Theorem 1.2. (cid:4) ( k, M ) -Symmetric Set Functions In this section we introduce the class of ( k, M )-symmetric set functions, which are tightly M -linear(Proposition 4.6), while enjoying many symmetries. In general, checking whether a set functionover n items is approximately modular involves verifying that roughly 2 n modular equations (oneequation for every pair of sets) approximately hold. Verifying approximate modularity for ( k, M )-symmetric set functions becomes an easier task thanks to their symmetries. We begin by introducingsome terminology. Deﬁnition 4.3.

A collection G = { S , S , . . . , S g } of g subsets of U is generating if the followingconditions hold:1. It is covering , i.e., S S j ∈G S j = U (every item is contained in at least one set).2. It is item-diﬀerentiating , i.e., T S j ∈G| i ∈ S j S j = { i } for every item i (equivalently, for everypair of items, there is a set containing one but not the other). Note that this implies inparticular that T S j ∈G S j = ∅ (no item is contained in all sets). Observe that if a collection G is generating, then every subset of U can be generated from setsin G by a sequence of intersections and unions (possibly, in more than one way). Also observe thatgiven a generating collection G = { S , S , . . . , S g } , the complement collection ¯ G = { ¯ S , ¯ S , . . . , ¯ S g } (obtained by complementing each of the generating sets) is also generating. This can be shown byapplying De Morgan’s laws. Deﬁnition 4.4.

A generating collection G is canonical if the number g of generating sets is even(we denote g = 2 k for some positive integer k ), every item is contained in exactly k sets from G ,every set in G contains exactly n/ items, and n = (cid:0) kk (cid:1) . Given a canonical generating collection G where G = { S , S , . . . , S k } , items can be thoughtof as balanced vectors in {± } k , where coordinate j of item i is +1 if i ∈ S j , and − i ∈ ¯ S j .Observe also that if G is a canonical generating collection, then so is its complement ¯ G .A generating circuit C is a directed acyclic graph (namely, with no directed cycles) with g nodes referred to as input nodes (these nodes have no incoming edges), one node referred to as the output node (this node has no outgoing edges), and in which each non-source node has at mosttwo incoming edges. Nodes with two incoming edges are labeled by either a ∩ (intersection) or ∪ (union) operation, whereas nodes with one incoming edge are labeled by ¯ · (complementation).18ssociating the input nodes with the g sets of a generating collection G , the set at each node iscomputed by applying the respective operation on the incoming sets (either intersection, union, orcomplementation), and the output of the circuit is the set at the output node.Given a circuit C , the dual circuit ˆ C is obtained by replacing ∩ by ∪ and vice versa. For agiven generating collection G and a permutation that maps G to the input nodes of circuit C , if S is the set output by C , we refer to the set output by the dual circuit ˆ C as the dual of S , anddenote in by ˆ S . It can be shown that given G and S , the dual set ˆ S is well deﬁned, in the sensethat for all circuits C that generate S , their duals generate the same ˆ S . (In a canonical generatingcollection G , every item appears in exactly k generating sets. This induces a perfect matching overitems, where two items are matched if there is no generating set in which they both appear, orequivalently, if the vectors in {± } k representing them are negations of each other. Given a set S ,its dual can be seen to be the set ˆ S that contains all items that are not matched to items in S . Inparticular, every set in G is the dual of itself.)From now on we restrict attention to set functions f whose closest linear set function is thezero function. Recall from Deﬁnition 2.17 that the positive and negative supports PS and NS of f are the collections of sets that f assigns maximum or minimum values to and are in the supportsof distributions P + or P − , respectively (see Lemma 2.16 above). Deﬁnition 4.5.

For integers k ≥ and M ≥ , we say that a set function f over a set U of n = (cid:0) kk (cid:1) items is ( k, M ) -symmetric if it has the following properties:1. Integrality: f attains only integer values.2. Antisymmetry: for every set S and its complement ¯ S it holds that f ( S ) = − f ( ¯ S ) .3. Canonical generating sets: positive support PS contains k sets P , . . . , P k (of value M ),negative support NS contains k sets N = ¯ P , . . . , N k = ¯ P k (of value − M , these are thecomplements of the sets in PS ), and PS (and likewise NS ) is a canonical generating collection.4. Generator anonymity: let C be an arbitrary generating circuit. Then the value (under f ) ofthe set output by the circuit is independent of the permutation that determines which of thegenerating sets from PS is mapped to which source node.5. Dual symmetry: for every set S and its dual ˆ S it holds that f ( S ) = f ( ˆ S ) . Proposition 4.6.

Every ( k, M ) -symmetric function f is tightly M -linear, and the 0 function is alinear function closest to f . Proof.

Consider item 3 (canonical generating sets) in Deﬁnition 4.5. By the virtue of PS beinga canonical generating set, the uniform distribution over PS has marginal 1 / U ,and likewise for NS. Hence Lemma 2.16 implies that the 0 function is a linear function closest to f . As the maximum value of f is M , it then follows that f is tightly M -linear. (cid:4) We shall design certain ( k, M )-symmetric functions and would like to prove that they are ǫ -modular, typically for ǫ = 2. The case analysis involved in checking ǫ -modularity can be reducedto the cases outlined in the following lemma. Lemma 4.7.

To verify that a ( k, M ) -symmetric function f is ǫ -modular it suﬃces to verify theapproximate modularity equation | f ( S ) + f ( T ) − f ( S ∩ T ) − f ( S ∪ T ) | ≤ ǫ in the cases where S and T satisfy all of the following conditions: (a) | f ( T ) | ≤ | f ( S ) | ; (b) ≤ f ( S ) ≤ M ; and (c) f ( S ∩ T ) ≤ f ( S ∪ T ) . roof. Suppose that we wish to verify the approximate modularity condition | f ( S ) + f ( T ) − f ( S ∩ T ) − f ( S ∪ T ) | ≤ ǫ for two sets S and T . If S and T satisfy the three conditions in the lemma,then indeed the approximate modularity condition will be checked directly. Hence it remains toshow that even if S and T do not satisfy some of the conditions of the lemma, there will be twoother sets, say S ′ and T ′ , that do satisfy the conditions, and such that the approximate modularitycondition holds for S ′ and T ′ if and only if it holds for S and T .Suppose that condition (a) is violated, namely, | f ( T ) | ≥ | f ( S ) | . Then simply interchange S and T and then condition (a) holds.Suppose that condition (a) holds and condition (b) is violated, namely, f ( S ) < f ( T ). Thenconsider the complement sets ¯ S and ¯ T , for which both conditions (a) and (b) hold. By antisymmetry they satisfy f ( S ) = − f ( ¯ S ), f ( T ) = − f ( ¯ T ), f ( S ∩ T ) = − f ( ¯ S ∪ ¯ T ), and f ( S ∪ T ) = − f ( ¯ S ∩ ¯ T ). Hence the approximate modularity condition for ¯ S and ¯ T implies it for S and T .Suppose that conditions (a) and (b) hold, and condition (c) is violated, namely, f ( S ∩ T ) >f ( S ∪ T ). In this case, consider the dual sets ˆ S and ˆ T . By dual symmetry we have that f ( S ) = f ( ˆ S ), f ( T ) = f ( ˆ T ), f ( S ∩ T ) = f ( ˆ S ∪ ˆ T ), and f ( S ∪ T ) = f ( ˆ S ∩ ˆ T ). Hence the approximate modularitycondition for ˆ S and ˆ T implies it for S and T . Moreover, for ˆ S and ˆ T conditions (a), (b) are inheritedfrom S and T , and condition (c) does hold. (cid:4) In the remainder of the section we use ( k, M )-symmetric functions to prove Claims 4.1 and 4.2.

Proof. [Proof of Claim 4.1] It is not hard to verify that the function f deﬁned above is ( k, M )-symmetric according to Deﬁnition 4.5 for k = 4 and M = 2: the main thing to check is that thereare no two positive sets that are complements of each other (and consequently antisymmetry isenforced by rule 3), and this is implied by the proof of Claim 4.8 below. Proposition 4.6 impliesthat f is tightly 2-linear. (cid:4) We say that S is a positive set if f ( S ) >

0, a negative set if f ( S ) <

0, and a zero set if f ( S ) = 0. Claim 4.8.

There is no pair of sets S and T such that one of them is positive and the other isnegative and S ⊂ T . Proof.

We show that no positive set is contained in a negative set. The opposite direction can beshown analogously. It is suﬃcient to show that no minimal positive set is contained in a maximalnegative set. View each item as a balanced vector in {± } . Let S be a minimal positive set. S is the intersection of two sets in PS, thus has two coordinates (out of eight), say 1 and 2, ﬁxed to+1 and contains all items that agree with both. Let T be a maximal negative set. T is the unionof two sets in NS, thus has two coordinates, say 3 and 4, ﬁxed to − k ≥

4) is in S but not in T . (cid:4) Proof. [Proof of Claim 4.2] We show that f is 2-modular. Consider two sets S and T . ByLemma 4.7, the cases in the following case analysis suﬃces in order to establish 2-modularity.1. f ( S ) = 2 (namely, S ∈ PS). In this case both f ( S ∪ T ) ≥ f ( S ∩ T ) ≥

0, by Claim 4.8,a fact that will be implicitly used in the subcases below.(a) f ( T ) = 2 (namely, T ∈ PS). In this case f ( S ∪ T ) = f ( S ∩ T ) = 1, by deﬁnition of f (rule 2). Thus, f ( S ) + f ( T ) − f ( S ∪ T ) − f ( S ∩ T ) = 2.20b) f ( T ) = 1. In this case f ( S ) + f ( T ) = 3. So it suﬃces to show that either f ( S ∪ T ) ≥ f ( S ∩ T ) ≥

1. By the deﬁnition of f , there are two possibilities:i. T is contained in some PS set S , and then f ( S ∪ T ) ≥ f ( S ∪ ( S ∩ R )) = 1 (with S serving as S ).ii. T contains some PS set S , and then f ( S ∩ T ) ≥ f ( S ∩ ( S ∪ R )) =1.(c) f ( T ) = 0. In this case f ( S ) + f ( T ) = 2 and 0 ≤ f ( S ∪ T ) + f ( S ∩ T ) ≤

4, satisfying the2-modularity condition.(d) f ( T ) <

0. Given that f ( S ) > f ( T ) < f ( S ∪ T ) = f ( S ∩ T ) ≥ f ( S ) = 1. Both f ( S ∪ T ) ≥ f ( S ∩ T ) ≥

0, by Claim 4.8.(a) f ( T ) = 1. The 2-modularity condition then holds since f ( S ) + f ( T ) = 2 and 0 ≤ f ( S ∪ T ) + f ( S ∩ T ) ≤ f ( T ) = 0. Observe that it cannot be that both f ( S ∪ T ) = 2 and f ( S ∩ T ) = 2. Hence f ( S ) + f ( T ) = 1 and 0 ≤ f ( S ∪ T ) + f ( S ∩ T ) ≤

3, satisfying the 2-modularity condition.(c) f ( T ) = −

1. Given that f ( S ) > f ( T ) < f ( S ∪ T ) = f ( S ∩ T ) ≥ f ( S ) + f ( T ) = f ( S ∪ T ) + f ( S ∩ T ) in this case.3. f ( S ) = f ( T ) = 0. We need to show that − ≤ f ( S ∪ T ) + f ( S ∪ T ) ≤

2. This might beviolated only if max[ | f ( S ∪ T ) | , | f ( S ∪ T ) | ] = 2. Lemma 4.7 implies that it suﬃces to considerthe case that the set f ( S ∪ T ) = 2. Namely, S ∪ T ∈ PS. Then necessarily | S ∩ T | < k and0 ≤ f ( S ∩ T ) <

2. We show that f ( S ∩ T ) = 1, and then indeed − ≤ f ( S ∪ T ) + f ( S ∪ T ) ≤ | S ∩ T | < k , the only rule that might cause f ( S ∩ T ) = 1 is that S ∩ T = S ∩ ( S ∪ R ),where S , S ∈ PS. But given that k ≥ S ∪ T (which we shall call P ) is also in PS, itfollows that either S or S are equal to P . (No set in PS contains the intersection of two othersets from PS.) If S = P then S (and also T ) is sandwiched between S ∩ ( S ∪ R ) ⊂ S ⊂ S ,and hence is itself of the form S = S ∩ ( S ∪ R ′ ), contradicting the assumption that f ( S ) = 0.If S = P then S (and also T ) is sandwiched between S ∩ ( S ∪ R ) ⊂ S ⊂ S , and this impliesthat without loss of generality R = ∅ . Hence S is of the form S = ( S ∪ R ′ ) ∩ S , contradictingthe assumption that f ( S ) = 0.Hence we established that f is 2-modular. (cid:4) ∆ -Linear and ǫ -Modular Functions Consider the following natural question: We have value-query access to a ∆-linear set function f .Our goal is to learn in polynomial-time a “hypothesis” linear set function h that is δ -close to f .How small can δ be as a function of ∆ and n ? We say that an algorithm δ -learns a set function f if given value-query access to f it returns in polynomial-time a linear set function that is δ -closeto f .The work of Chierichetti et al. Chierichetti et al. [2015] (Theorem 4) presents an algorithmthat O ( ǫ √ n )-learns ǫ -modular set functions. Since every ∆-linear set function is 4∆-modular(Proposition 2.4), their algorithm also O (∆ √ n )-learns ∆-linear set functions. The algorithm of21hierichetti et al. [2015] is randomized, and after making O ( n log n ) non-adaptive queries to thefunction it is learning, returns a δ -close linear set function with probability 1 − o (1).In this section we present an alternative algorithm for O (∆ √ n )-learning ∆-linear set functions(and since every ǫ -modular set function is K s ǫ -linear for a constant K s , also for O ( ǫ √ n )-learning ǫ -modular set functions). Our algorithm (Algorithm 1) is simple, deterministic , and makes a linear number of non-adaptive queries to the function it is learning. A Hadamard basis is an orthogonal basis { v , . . . , v n } of R n which consists of vectors in {± } n .Let { y , . . . , y n } = { v / √ n, . . . , v n / √ n } , then { y , . . . , y n } is an orthonormal basis of R n . Easyrecursive constructions of a Hadamard basis are known whenever n is a power of 2 (and also forsome other values of n ). For all values of n for which a Hadamard basis exists, and for every choiceof one particular vector v ∈ {± } n , we may assume that v is a member of the Hadamard basis. Thiscan be enforced by taking the ﬁrst vector v in an arbitrary Hadamard basis of R n , and ﬂipping –in all vectors of the basis – those coordinates in which v and v do not agree.For every set of items S ⊆ [ n ], let v S ∈ { , } n be the indicator vector of S . Vector v S canbe written as a linear combination P ni =1 λ i y i of the basis vectors { y , . . . , y n } . We shall makeuse of the following property of orthonormal bases: Express v S as P ni =1 λ i y i ; since { y , . . . , y n } isorthonormal, then by the Pythagorean theorem generalized to Euclidean spaces, n X i =1 ( λ i ) = | S | . (4)Given a basis vector v i we denote by S i the set of indices in which v i has entries +1, and by ¯ S i the set of indices in which v i has entries − i ∈ [ n ] speciﬁes the index of a basis vector (as in v i or y i ), whereas toindex other objects (such as coordinates of vectors, or items) we shall use j rather than i . When avector is indexed by a set (such as v S , or v { j } ) then the vector is the indicator vector (in { , } n )for the set. Claim 5.1 (Expressing a linear set function using S i , ¯ S i ) . Consider a linear set function g where g ( S ) = P j ∈ S g j + g ( ∅ ) for every set S ⊆ [ n ] . Then for every v S it holds that g ( S ) = √ n P ni =1 λ i ( g ( S i ) − g ( ¯ S i )) + g ( ∅ ) , where here λ , . . . , λ n are the unique coeﬃcients for which v S = P ni =1 λ i y i . In partic-ular g j = g ( { j } ) can be obtained by substituting in the unique λ , . . . , λ n for which v { j } = P ni =1 λ i y i . Proof.

Let g ′ be the additive function deﬁned as g ′ ( S ) = g ( S ) − g ( ∅ ). Extend the domain of g ′ from { , } n to R n , giving the additive function ˜ g over R n deﬁned as follows: ˜ g ( x ) = P j ∈ S g j x j forevery x ∈ R n . Observe that for every Hadamard basis vector v i , ˜ g ( v i ) = g ( S i ) − g ( ¯ S i ) (by deﬁnitionof the sets S i , ¯ S i ). By additivity of ˜ g , for every set S whose indicator vector v S can be expressedas P ni =1 λ i y i , we have that ˜ g ( v S ) = P ni =1 λ i ˜ g ( y i ) = √ n P ni =1 λ i ˜ g ( v i ) = √ n P ni =1 λ i ( g ( S i ) − g ( ¯ S i )).Finally, g ( S ) − g ( ∅ ) = g ( S ) − g = ˜ g ( v S ) for every set S . (cid:4) Lemma 5.2.

Let h and g be two linear set functions such that for every i ∈ [ n ] , h ( S i ) − h ( ¯ S i ) is O (∆) -close to g ( S i ) − g ( ¯ S i ) , and for every S ′ ∈ {∅ , U } , h ( S ′ ) is O (∆) -close to g ( S ′ ) . Then forevery set S ⊆ [ n ] , h ( S ) is O (∆(1 + p min {| S | , n − | S |} )) -close to g ( S ) . Proof.

The lemma clearly holds for S ∈ {∅ , U } , since in this case O (∆(1 + p min {| S | , n − | S |} )) = O (∆ · < | S | < n . By Claim 5.1 applied to h and g ,22 ( S ) = √ n P ni =1 λ i ( h ( S i ) − h ( ¯ S i )) + h ( ∅ ), and g ( S ) = √ n P ni =1 λ i ( g ( S i ) − g ( ¯ S i )) + g ( ∅ ). For every i ∈ [ n ], h ( S i ) − h ( ¯ S i ) estimates g ( S i ) − g ( ¯ S i ) up to an additive error of O (∆), and h ( ∅ ) estimates g ( ∅ ) up to an additive error of O (∆). Thus we get that h ( S ) estimates g ( S ) up to an additive errorof O (∆) √ n P ni =1 | λ i | + O (∆). Invoking (4), this error is maximized when ( λ i ) = | S | /n for every i . Weconclude that an upper bound on the additive estimation error is O (∆) √ n n r | S | n + O (∆) = O (∆) p | S | + O (∆) = O (∆ p | S | ) . (5)Since h ( S ) = h ( ∅ ) + h ( U ) − h ( ¯ S ) and g ( S ) = g ( ∅ ) + g ( U ) − g ( ¯ S ), and since we know from (5) that h ( ¯ S ) estimates g ( ¯ S ) up to an additive error of O (∆ p | ¯ S | ) = O (∆ p n − | S | ), we also get that themaximum additive estimation error is O (∆) + O (∆ p n − | S | ) = O (∆ p n − | S | ) . (6)Taking the minimum among (5) and (6) completes the proof. (cid:4) ALGORITHM 1:

Using the Hadamard basis to learn a linear set function.

Input:

Value-query access to a set function f . Output:

A linear set function h where h ( S ) = h + P j ∈ S h j .% Query linearly many values of f query f ( ∅ ) for every i ∈ [ n ] do % Consider the i th Hadamard basis vector v i query f ( S i ) and f ( ¯ S i ) % S i , ¯ S i are sets (possibly empty) of the +1 , − v i end for % Compute h set h = f ( ∅ ) for every j ∈ [ n ] do express v { j } as P ni =1 λ i y i % y i is the Hadamard basis vector v i after normalizationset h j = √ n P ni =1 λ i ( f ( S i ) − f ( ¯ S i )) end for For simplicity, we state the following theorem for values of n that are a power of 2. In Remark 5.5we explain how to extend it beyond powers of 2. In the following Theorem, when referring toAlgorithm 1, we mean Algorithm 1 run with the ﬁrst Hadamard basis vector v being the all-onesvector. Theorem 5.3.

Let n be a power of 2. Given value-query access to a ∆ -linear set function f over n items, Algorithm 1 returns in polynomial time a linear set function h such that for every set S ⊆ [ n ] , h ( S ) is O (∆(1 + p min {| S | , n − | S |} )) -close to f ( S ) . Algorithm 1 thus O (∆ √ n ) -learns ∆ -linear set functions over n items. Proof.

First note that Algorithm 1 clearly runs in polynomial time. By the construction of h inAlgorithm 1 and by Claim 5.1, h is the function that assigns to every set S = P ni =1 λ i y i the value h ( S ) = √ n P ni =1 λ i ( f ( S i ) − f ( ¯ S i )) + f ( ∅ ). So for every i ∈ [ n ], h ( S i ) − h ( ¯ S i ) = f ( S i ) − f ( ¯ S i ), and h ( ∅ ) = f ( ∅ ). Let g be a linear set function ∆-close to f , then for every i ∈ [ n ], h ( S i ) − h ( ¯ S i ) is23∆-close to g ( S i ) − g ( ¯ S i ), and h ( ∅ ) is ∆-close to g ( ∅ ). Recall that v is the all-ones vector, whichis the indicator vector of U . So h ( U ) is also 2∆-close to g ( U ). Invoking Lemma 5.2, we get thatfor every set S , h ( S ) is O (∆(1 + p min {| S | , n − | S |} ))-close to g ( S ). Since g is ∆-close to f , thiscompletes the proof. (cid:4) Theorem 5.3 is essentially the best possible, in the following strong sense: Corollary 23 ofChierichetti et al. [2015] shows that no algorithm (deterministic or randomized) that performspolynomially-many value queries can ﬁnd a o (∆ p n/ log n )-close linear set function, even if thequeries are allowed to be adaptive. We include a proof sketch of this tightness result in AppendixD for completeness. Our tightness proof holds even for learning monotone ∆-linear set functions. Remark 5.4 (LP-based approach) . We describe an alternative way to derive the linear function h : After querying f ( S ) for S = ∅ and for S = S i , ¯ S i ∀ i ∈ [ n ] , one solves a linear program (LP).The n + 1 variables x j of the LP are intended to have the value g j for every j . The constraintsare f ( S ) − ∆ ≤ P j ∈ S x j ≤ f ( S ) + ∆ for every set S that was queried. Let h be the linear setfunction deﬁned by c j = x ∗ j for every j where x ∗ is a feasible solution of the LP. We claim that h is O (∆(1 + p min {| S | , n − | S |} )) -close to f : On each of the queried sets S , the values of h and g diﬀer by at most . Since we may assume that v is the all-ones vector, the set S = U is oneof the queried sets. Invoking Lemma 5.2 and using that g is ∆ -close to f shows the claim. Theadvantage of this LP-based approach is that additional constraints can easily be incorporated oncemore data of f is collected, potentially leading to better accuracy. Likewise, one can easily enforcedesirable properties such as non-negativity on h (if g is nonnegative). Remark 5.5 (Beyond powers of 2) . Using either Algorithm 1 or the LP-based algorithm describedin Remark 5.4, we can O (∆ √ n ) -learn ∆ -linear set functions over n items even when n is not apower of 2, as follows. Extend f to f ′ over n ′ ≥ n items U ′ , where n ′ is a power of 2 and U ⊆ U ′ ,by setting f ′ ( S ) = f ( S ∩ U ) for every set S ⊆ U ′ . Extend g to g ′ over U ′ in the same way. Noticethat the extended versions f ′ , g ′ are still ∆ -close.Given h ′ over U returned by Algorithm 1, we deﬁne h over U by setting h ( S ) = h ′ ( S ) for everyset S ⊆ U . The proof of Theorem 5.3 holds verbatim with the single following modiﬁcation: theHadamard basis vector v ∈ R n ′ is no longer the all-ones vector, but rather the vector that is +1on the ﬁrst n coordinates and − on the n ′ − n auxiliary variables. This ensures that we can applyLemma 5.2 to h instead of h ′ .For the LP-based algorithm, we can formulate the LP with n ′ + 1 variables, but since we knowthere is a feasible solution in which x j = 0 for every j > n (namely x = g ), we can add theseconstraints so that the resulting linear function h is over n items. A Missing Proof from Section 2 (Preliminaries)

Proof. [Proof of Theorem 2.15] Let ǫ = 1, and let f be the function given in Pawlik’s paper“Approximately Additive Set Functions” Pawlik [1987] over the items X ⊎ Y . We present thefunction f and the proof for completeness (and due to typos and brevity in Pawlik [1987]). Let f ( S ∪ T ) denote the value of the union of a set S ⊆ X and a set T ⊆ Y . Let X ′ (resp., Y ′ ) denotea non-empty proper subset of X (resp., Y ). The function f is deﬁned as follows: f ( ∅ ∪ ∅ ) = 0, f ( X ′ ∪ ∅ ) = 1, f ( X ∪ ∅ ) = 3, f ( ∅ ∪ Y ′ ) = − f ( X ′ ∪ Y ′ ) = 0, f ( X ∪ Y ′ ) = 1, f ( ∅ ∪ Y ) = − f ( X ′ ∪ Y ) = − f ( X ∪ Y ) = 0.As claimed in Pawlik [1987], f is weakly 1-modular. We observe that f is 2-modular, but nobetter than 2 modular (consider two non-disjoint proper subsets of X whose union is X ).24et µ be the closest linear function to f . Consider a set C = X − x + y ( x ∈ X, y ∈ Y willbe speciﬁed below). By deﬁnition f ( X ) = 3 and f ( C ) = 0. We argue that f is no closer than3 / µ .Assume for contradiction that f is (3 / − δ )-close to µ for δ >

0. We show there is large enough k for which this leads to contradiction ( k is the number of items in X , which equals the number ofitems in Y ).First notice that µ ( X ) ≥ / δ (otherwise f and µ are more than 3 / − δ apart on X ). If itwere to hold that µ ( x ) ≤ µ ( y ), we’d get that µ ( C ) = µ ( X ) − µ ( x ) + µ ( y ) ≥ / δ (i.e., f and µ are more than 3 / − δ apart on C ). So µ ( x ) > µ ( y ) for every x, y .We know that µ ( X ) can’t be too big (since it can’t exceed f ( X ) = 3 by too much). For thesame reason and since f ( Y ) = − µ ( Y ) is negative but can’t be too small. By letting k grow large, it becomes apparent that µ ( x ) cannot be bounded away from 0 from above, similarly µ ( y ) cannot be bounded away from 0 from below. Using µ ( x ) > µ ( y ) we conclude that µ ( x ) ≥ x and µ ( y ) ≤ y , and that for most items µ ( x ) → µ ( y ) → x and y that minimize | µ ( x ) | + | µ ( y ) | . For large enough k we get that µ ( x ) − µ ( y ) < δ ,so µ ( C ) = µ ( X ) − µ ( x ) + µ ( y ) > / δ − δ > f ( C ) + 3 / − δ , contradiction. The bounds assertedin the theorem follow. (cid:4) B Supplement to Section 3 (Upper Bounds)

B.1 Expanders

In this appendix we give a proof sketch for Lemma 3.1, establishing the existence of expanders witha range of parameters. We begin with a claim following from Stirling’s approximation. The proofof this claim appears for completeness in Feige et al. [2016].

Claim B.1 (Stirling) . For every c, d ∈ R where c > d > , for every suﬃciently large integer m such that cm and dm are integers, (cid:18) cmdm (cid:19) = (cid:18) c c d d ( c − d ) c − d (cid:19) m Θ (cid:18) √ m (cid:19) . Proof. [Proof of Lemma 3.1 (Sketch)] We give the proof for ( , , )-expanders. Existence for theother families in Lemma 3.1 follows from the same argument. Let k = 2 m . Suppose that there are4 m left-hand side vertices and consequently 2 m right-hand side vertices (because θ = ). The edgesare determined by making r = 5 copies of each left-hand side vertex, taking a random permutationover all 2 kr = 20 m copies, and connecting to the right-hand side in a round-robin fashion. Theprobability that there is a set of 2 kα = m left-hand side vertices with fewer than m neighbors is atmost: (cid:18) k kα (cid:19)(cid:18) kθ kα (cid:19) (cid:0) kr αθ krα (cid:1)(cid:0) kr krα (cid:1) = (cid:18) mm (cid:19)(cid:18) mm (cid:19) (cid:0) m m (cid:1)(cid:0) m m (cid:1) = (cid:18) · · · · · · · (cid:19) m Θ (cid:18) m (cid:19) = (cid:18) (cid:19) m Θ (cid:18) m (cid:19) < , m to overcome the constants in Θ(1 /m ). Hence such expanders exist for every integer multiple k of suﬃciently large k ′ . (cid:4) B.2 Strengthening the Upper Bound on K w In this appendix we use the notation of Section 3.4, and in addition denote the average deﬁcit andsurplus of the target sets of PS ∗ and NS ∗ by d ′ and s ′ , respectively. We ﬁrst state a stronger versionof Lemma 3.10: Lemma B.2 (Strong Version of Lemma 3.10) . Let k ∈ N ≥ and α, r, θ ∈ R ≥ be such that thereexists an ( α, r, θ ) -expander G k . Let G and G ′ (possibly G = G ′ ) be collections of k sets each, bothwith α -frequent items, such that the average deﬁcit of G is ≤ d , the average surplus of G ′ is ≤ s ,and their target sets have average deﬁcit ≥ d ′ and average surplus ≥ s ′ , respectively. Then M ≤ ( d + s − θ ( d ′ + s ′ )) + 2 ǫ ( r − − θ + ǫ. Proof.

We ﬁrst apply Lemma 3.9 to partition and disjointly recombine the sets of collection G via the expander G k = G k ( V, W ; E ). We use the following notation: For a vertex v ∈ V (resp., w ∈ W ) let S v (resp., S w ) be the source (resp., target) set corresponding to v (resp., w ). Denote theneighboring vertices of a vertex v by N ( v ) and its degree by deg( v ). Let S v,w be the intermediatesubset that labels (corresponds to) edge ( v, w ) ∈ E .By Lemma 3.9, for every v ∈ V , the intermediate subsets labeling the edges adjacent to v aredisjoint, and the same holds for every w ∈ W . We can thus apply Observation 2.2 to get P w ∈ N ( v ) f ( S v,w ) ≥ f ( S v ) + (deg( v ) − f ( ∅ ) − ǫ ) ∀ v ∈ V. P v ∈ N ( w ) f ( S v,w ) ≤ f ( S w ) + (deg( w ) − f ( ∅ ) + ǫ ) ∀ w ∈ W. Denote the maximum absolute value of f by M . Since collection G has average deﬁcit ≤ d , bysumming over vertices v ∈ V (where | V | = 2 k ) we get P v ∈ V f ( S v ) ≥ k ( M − d ). Since the targetsets of G have average deﬁcit ≥ d ′ , by summing over vertices w ∈ W (where | W | = 2 θk ) we get P w ∈ W f ( S w ) ≤ θk ( M − d ′ ). Clearly in the bipartite graph G k , P v ∈ V deg( v ) = P w ∈ W deg( w ),and by the parameters of G k both are equal to 2 kr . Therefore, summing over v ∈ V and w ∈ W we get 2 kM − kd + (2 kr − k )( f ( ∅ ) − ǫ ) ≤ X ( v,w ) ∈ E f ( S v,w ) ≤ θkM − θkd ′ + (2 kr − θk )( f ( ∅ ) + ǫ ) . Dividing the resulting inequality by 2 k and rearranging gives(1 − θ ) M ≤ d − θd ′ + ( r − ǫ − f ( ∅ )) + ( r − θ )( ǫ + f ( ∅ ))= d − θd ′ + 2 ǫ ( r −

1) + (1 − θ )( ǫ + f ( ∅ )) . (7)Similarly, using that the average deﬁcit of collection G ′ is ≤ s and the average deﬁcit of itstarget sets is ≥ s ′ , P v ∈ V f ( S v ) ≤ k ( − M + s ), and P w ∈ W f ( S w ) ≥ − θk ( M − s ′ ). Therefore − kM + 2 ks + (2 kr − k )( f ( ∅ ) + ǫ ) ≥ X ( v,w ) ∈ E f ( S v,w ) ≥ − θkM + 2 θks ′ + (2 kr − θk )( f ( ∅ ) − ǫ ) . k and rearranging gives(1 − θ ) M ≤ s − θs ′ + ( r − ǫ + f ( ∅ )) + ( r − θ )( ǫ − f ( ∅ ))= s − θs ′ + 2 ǫ ( r −

1) + (1 − θ )( ǫ − f ( ∅ )) . (8)Rearranging Inequalities (7) and (8) as well as averaging the resulting inequalities implies thetheorem. (cid:4) We use this lemma to show the following upper bound.

Lemma B.3 (Upper Bound on K w ) . Suppose that for ﬁxed r, r ′ , θ, θ ′ ∈ R ≥ there exist ( , r, θ ) -expanders and (1 − θ , r ′ , θ ′ ) -expanders. Then the weak Kalton constant satisﬁes: K w ≤ min ( r − − θ − θ d ′ + s ′ − θ , r ′ − θ ′ + d ′ + s ′ − θ ′ ) . Proof.

Observe that there exists k such that f has collections PS ∗ and NS ∗ of 2 k sets as describedabove, and expanders G k and G ′ k with the above parameters, respectively, exist. The ﬁrst upperbound is found in Lemma 3.11. Now consider the collection G that is the complement of the targetsets of PS ∗ . Since PS ∗ has -frequent items, then the target sets have θ -frequent items (Lemma3.9), and G has (1 − θ )-frequent items (Observation 3.7). By Corollary 3.5 with ǫ = 1, the averagesurplus of G is ≤ d ′ + 1 + δ , and clearly the average surplus of its target sets is ≥

0. Similarly, let G ′ be the complement of the target sets of NS ∗ , which has (1 − θ )-frequent items and average deﬁcit ≤ s ′ + 1 − δ . Clearly the average deﬁcit of the target sets of G ′ is ≥

0. Applying Lemma B.2 thusgives an upper bound of K w ≤ ( d ′ + s ′ + 2) + 2 r ′ − − θ ′ + 1 = 2 r ′ − d ′ + s ′ − θ ′ + 1 . (9)Taking the minimum of the bound in Lemma 3.11 and the bound in (9) completes the proof. (cid:4) We can now prove the upper bound in Theorem 1.3, by which the weak Kalton constant satisﬁes K w ≤ . Proof. [Proof of First Part of Theorem 1.3] By using the expanders with α = from Lemma 3.1we can substitute r = 5 and θ = . Since 1 − θ = , we can complement these parameters byusing the expanders with α ′ = from Lemma 3.1 for which r ′ = 4 and θ ′ = . This implies that K w ≤ min { . − . d ′ + s ′ ) , + ( d ′ + s ′ ) } = 23 . (cid:4) C Lower Bound on K w We prove the lower bound stated in Theorem 1.3 on K w . Proof. [Proof of Second Part of Theorem 1.3] We show there exists a weakly 2-modular set functionwith n = 20 that is tightly 3-linear. Thus, K w ≥ . Consider the following ( k, M )-symmetricfunction f with k = 3 (hence with n = (cid:0) (cid:1) = 20 items) and M = 3. The positive support sets(PS) form a canonical generating collection. The values of sets under f is determined by the ﬁrstapplicable rule (and hence f is well deﬁned):1. Each positive support set S ∈ PS has value f ( S ) = M = 3.27. If there is some set P ∈ PS such that S ⊂ P and for every set N ∈ NS it holds that S N , then f ( S ) = 1. Likewise (enforcing the dual symmetry property of ( k, M )-symmetricfunctions), if there is some set P ∈ PS such that P ⊂ S and for every set N ∈ NS it holdsthat N S , then f ( S ) = 1.3. Enforcinging the antisymmetry property of ( k, M )-symmetric functions, we impose − f ( S ) = f ( ¯ S ) and derive from this sets with negative value.4. All other sets have value 0.One can verify (proof omitted) that the function f deﬁned above is indeed ( k, M )-symmetricaccording to Deﬁnition 4.5. The reason why we choose k = 3 (and not smaller) is because we shalluse the following claim. Claim C.1.

For every two sets P , P ∈ PS and two sets N , N ∈ NS all the following hold:1. P ∩ P N .2. N ∩ N P .3. P N ∪ N .4. N P ∪ P . Proof.

We prove only item 1, as the proofs of the remaining items are similar. Suppose withoutloss of generality that P i (for i ∈ { , } ) contains those items whose vector representation in {± } k has bit i set to 1, and that N contains those items whose vector representation in {± } k has bit 3set to −

1. (If in N it was bit 1 that was set to −

1, the proof would be immediate.) Then because k ≥ P ∩ P contains a vector for which all three bits 1, 2 and 3 are set to 1, and hence isnot in N . (cid:4) Proposition 4.6 implies that f is tightly 3-linear. In remains to show that f is weakly 2-modular.Consider two disjoint nonempty sets S and T . As S ∩ T = ∅ , we have that f ( S ∩ T ) = 0. Henceto prove weak 2-modularity, one needs to show that | f ( S ) + f ( T ) − f ( S ∪ T ) | ≤

2. Similar to theproof of Lemma 4.7 it can be shown that one can assume without loss of generality that f ( S ) ≥ f ( S ) ≥ | f ( T ) | . We proceed by a case analysis.1. f ( S ) = 3 (namely, S ∈ PS). In this case, because T is disjoint from S , we have that T ⊂ ¯ S where ¯ S ∈ NS. The deﬁnition of f then implies that f ( T ) ≤

0. We consider three cases.(a) f ( T ) = 0. Given that T ⊂ ¯ S and ¯ S ∈ NS, for f ( T ) = 0 to hold there must be someset P ∈ PS such that T ⊂ P . We claim that f ( S ∪ T ) = 1 (which establishes weak2-modularity, because f ( S ) + f ( T ) = 3). Suppose for the sake of contradiction that f ( S ∪ T ) = 1. As S ⊂ S ∪ T and S ∈ PS, this can happen only if there is a set N ∈ NSsuch that N ⊂ S ∪ T . But then N ∩ ¯ S ⊂ T ⊂ P . Hence we found two sets in NS whoseintersection lies in a set in PS, contradicting item 2 of Claim C.1.(b) f ( T ) = −

1. In this case f ( S ) + f ( T ) = 2. We also have that f ( S ∪ T ) ≥ S ⊂ S ∪ T and S ∈ PS. Hence regardless of the value of f ( S ∪ T ), weak 2-modularityholds.(c) f ( T ) = −

3. In this case T = ¯ S and S ∪ T = U , implying that f ( S ∪ T ) = 0 = f ( S )+ f ( T ).28. f ( S ) = 1. Observe that the only way by which it may happen that f ( S ∪ T ) < P ∈ PS such that S ⊂ P (leading to f ( S ) = 1, and there is some N ∈ NS such that N ⊂ S ∪ T , and f ( S ∪ T ) = −

1. (If S ∪ T ⊂ N then also S ⊂ N and it cannot be that f ( S ) = 1.)(a) f ( T ) = 1. We claim that f ( S ∪ T ) ≥ f ( S ∪ T ) = −

1. As noted above this impliesthat there is P ∈ PS with S ⊂ P , another P ′ ∈ PS with T ⊂ P ′ (because we caninterchange S and T ), and N ∈ N S such that N ⊂ S ∪ T . But then N ⊂ P ∪ P ′ ,contradicting item 4 of Claim C.1.(b) f ( T ) = 0. In this case f ( S ) + f ( T ) = 1, and regardless of the value of f ( S ∪ T ) (whichlies in the range [ − , f ( T ) = −

1. We need to show that | f ( S ∪ T ) | 6 = 3. This follows from the fact that S innot contained in any set in NS, and T is not contained in any set in PS.3. f ( S ) = f ( T ) = 0. We need to show that | f ( S ∪ T ) | 6 = 3. We show that S ∪ T PS (and theproof that S ∪ T NS is similar). Suppose for the sake of contradiction that S ∪ T = P with P ∈ PS. Then S ⊂ P , and the fact that f ( S ) = 0 implies that there is a set N ∈ NS suchthat S ⊂ N . Likewise, T ⊂ N for some N ∈ NS. Hence P ⊂ N ∪ N , contradicting item 3of Claim C.1.Hence we established that f is weakly 2-modular, completing the proof. (cid:4) D Tightness of Learning Algorithm

The following theorem establishes tightness of the learning algorithm. Results similar to Theo-rem D.1 appear in the literature – see for example Singer and Vondr´ak [2015]. See also Chierichetti et al.[2015] (Corollary 23).

Theorem D.1.

For ∆ ≤ p log n/n and δ = o ( p n/ log n ) , no δ -learning algorithm exists for ∆ -linear set functions f , even if f is monotone, and even if one allows for unbounded computationtime (but only polynomially-many value queries). Proof. [Proof (Sketch)] Consider a random linear function g which contains a random set T ofcardinality n , in which each item has value q = ∆ / √ n log n , and the other items have value 0.Then g ( T ) = (∆ √ n ) / (2 √ log n ) (which is at most 1), whereas g ( ¯ T ) = 0. Consider now a function f constructed as follows. Call a set S balanced if || S ∩ T | − | S | | ≤ √ n log n , positively unbalanced if | S ∩ T | − | S | ≥ √ n log n , and negatively unbalanced if | S | − | S ∩ T | ≥ √ n log n . Then f ( S ) = | S | q forbalanced sets, f ( S ) = g ( S ) − √ n log n for positively unbalanced sets, and f ( S ) = g ( S ) + √ n log n for negatively unbalanced sets. Observe that g approximates f because q √ n log n = ∆. If thereare only polynomially many queries then w.h.p., for every query S , the underlying set is balanced.Hence these queries are not informative in exposing T , and δ ≥ ( f ( T ) − f ( ¯ T )) = ∆ √ n √ log n − ∆. (cid:4) Acknowledgements

We thank Assaf Naor for helpful discussions and for directing us to the paper of Kalton and Roberts[1983]. We thank Moni Naor for pointing us to the result of Dwork and Yekhanin [2008].29 eferences

M.-F. Balcan and N. J. A. Harvey. Learning submodular functions. In

Proceedings of the 43rdAnnual ACM Symposium on Theory of Computing , pages 793–802, 2011.E. Balkanski, A. Rubinstein, and Y. Singer. The limitations of optimization from samples. In

Proceedings of the 48th Annual ACM Symposium on Theory of Computing , pages 1016–1027,2017.A. Belloni, T. Liang, H. Narayanan, and A. Rakhlin. Escaping the local minima via simulated an-nealing: Optimization of approximately convex functions. In

Proceedings of the 28th Conferenceon Learning Theory , pages 240–265, 2015.D. Bertsimas and A. Thiele.

Robust and data-driven optimization: Modern decision making underuncertainty , chapter 5, pages 95–122. INFORMS PubsOnline, 2014. TutORials in OperationsResearch.M. Blum, M. Luby, and R. Rubinfeld. Self-testing/correcting with applications to numerical prob-lems.

J. Comput. Syst. Sci. , 47(3):549–595, 1993.A. V. Bondarenko, A. Prymak, and D. Radchenko. On concentrators and related approximationconstants.

J. Math. Anal. Appl. , 402(1):234–241, 2013.F. Chierichetti, A. Das, A. Dasgupta, and R. Kumar. Approximate modularity. In

Proceedings ofthe 56th Symposium on Foundations of Computer Science , pages 1143–1162, 2015.C. Dwork and S. Yekhanin. New eﬃcient attacks on statistical disclosure control mechanisms. In

Proceedings of the 28th Annual International Cryptology Conference , pages 469–480, 2008.U. Feige. On maximizing welfare when utility functions are subadditive.

SIAM J. Comput. , 39(1):122–142, 2009.U. Feige and R. Izsak. Welfare maximization and the supermodular degree. In

Proceedings of the4th Innovations in Theoretical Computer Science , pages 247–256, 2013.U. Feige, M. Feldman, and I. Talgam-Cohen. Approximate modularity revisited. Supplementedversion, available at https://arxiv.org/abs/1612.02034 , 2016.M. X. Goemans, N. J. A. Harvey, S. Iwata, and V. S. Mirrokni. Approximating submodularfunctions everywhere. In

Proceedings of the 20th Annual ACM-SIAM Symposium on DiscreteAlgorithms , pages 535–544, 2009.A. Hassidim and Y. Singer. Submodular optimization under noise. In

Proceedings of the 34thInternational Conference on Machine Learning , pages 1069–1122, 2017.D. H. Hyers. On the stability of the linear functional equation.

PNAS , 27:222–224, 1941.S. Iwata, L. Fleischer, and S. Fujishige. A combinatorial strongly polynomial algorithm for mini-mizing submodular functions.

J. ACM , 48(4):761–777, 2001. doi: 10.1145/502090.502096. URL http://doi.acm.org/10.1145/502090.502096 .S. M. Jung.

Hyers-Ulam-Rassias stability of functional equations in nonlinear analysis . Springer,2011. 30. Kalton and J. W. Roberts. Uniformly exhaustive submeasures and nearly additive set functions.

Transactions of the American Mathematical Society , 278(2):803–816, 1983.A. Krause and V. Cevher. Submodular dictionary selection for sparse representation. In

Proceedingsof the 27th International Conference on Machine Learning , pages 567–574, 2010.A. Krause, A. P. Singh, and C. Guestrin. Near-optimal sensor placements in gaussian processes:Theory, eﬃcient algorithms and empirical studies.

Journal of Machine Learning Research , 9:235–284, 2008.B. Lehmann, D. Lehmann, and N. Nisan. Combinatorial auctions with decreasing marginal utilities.

Games and Economic Behavior , 55:270–296, 2006.R. Paes Leme. Gross substitutability: An algorithmic survey. To appear in Games and EconomicBehavior, 2017.B. Pawlik. Approximately additive set functions.

Colloquium Mathematicae , 54(1):163–164, 1987.N. Pippenger. Superconcentrators.

SIAM J. Comput. , 6(2):298–304, 1977.T. Roughgarden, I. Talgam-Cohen, and J. Vondr´ak. When are welfare guarantees robust? In

Proceedings of the 20th International Workshop on Approximation Algorithms for CombinatorialOptimization Problems , pages 22:1–22:23, 2017.A. Schrijver. A combinatorial algorithm minimizing submodular functions in strongly polynomialtime.

Journal of Combinatorial Theory , 80(2):346–355, 2000.Y. Singer and J. Vondr´ak. Information-theoretic lower bounds for convex optimization with erro-neous oracles. In