A Faster Exponential Time Algorithm for Bin Packing With a Constant Number of Bins via Additive Combinatorics
Jesper Nederlof, Jakub Pawlewicz, Céline M. F. Swennenhuis, Karol Węgrzycki
Abstract
In the Bin Packing problem one is given $n$ items with weights $w_1, \ldots, w_n$ and $m$ bins with capacities $c_1, \ldots, c_m$. The goal is to find a partition of the items into sets $S_1, \ldots, S_m$ such that $w(S_j) \le c_j$ for every bin $j$, where $w(X)$ denotes $\sum_{i \in X} w_i$.

Björklund, Husfeldt and Koivisto (SICOMP 2009) presented an $O^*(2^n)$ time algorithm for Bin Packing. In this paper, we show that for every $m \in \mathbb{N}$ there exists a constant $\sigma_m > 0$ such that an instance of Bin Packing with $m$ bins can be solved in $O(2^{(1-\sigma_m)n})$ randomized time. Before our work, such improved algorithms were not known even for $m = 4$.

A key step in our approach is the following new result in Littlewood–Offord theory on the additive combinatorics of subset sums: For every $\delta > 0$ there exists an $\varepsilon > 0$ such that if $|\{X \subseteq \{1,\ldots,n\} : w(X) = v\}| \ge 2^{(1-\varepsilon)n}$ for some $v$, then $|\{w(X) : X \subseteq \{1,\ldots,n\}\}| \le 2^{\delta n}$.

∗ Utrecht University, The Netherlands, [email protected]. Supported by the project CRACKNP that has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 853234).
† Institute of Informatics, University of Warsaw, Poland, [email protected].
‡ Eindhoven University of Technology, The Netherlands, [email protected]. Supported by the Netherlands Organization for Scientific Research under project no. 613.009.031b.
§ Institute of Informatics, University of Warsaw, Poland, [email protected]. Supported by the grants 2016/21/N/ST6/01468 and 2018/28/T/ST6/00084 of the Polish National Science Center and project TOTAL that has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 677651).

1 Introduction
A central aim in contemporary algorithm design is to minimize the worst-case complexity of an algorithm for a given (supposedly) hard computational problem in a fine-grained sense. The underlying goal is to reveal the optimal runtime, witnessed by (1) an algorithm with worst-case complexity $T(n)$ on instances with parameter $n$, and (2) a lower bound that excludes improvements to $T(n)^{1-\varepsilon}$ time for some constant $\varepsilon > 0$. For some problems, it is an especially intriguing question whether the natural runtimes of the basic algorithms solving them are optimal. One of the most important instances of such a question for an NP-complete problem is about improvements over a relatively direct dynamic programming algorithm for Set Cover:

Question 1:
Can Set Cover with $n$ elements be solved in $O^*((2-\varepsilon)^n)$ time, for some $\varepsilon > 0$?

Unfortunately, Question 1 seems to have a fate similar to the Strong Exponential Time Hypothesis (which concerns a similar improvement for the CNF-SAT problem): While there is an increasing interest in, and dependence on, its validity (see e.g. [15, 39]), we seem to be far from resolving it. Therefore, it is natural to study Question 1 for special cases of Set Cover. And indeed, improved algorithms of the type asked for in Question 1 were already presented for instances with small sets [38], (more generally) large solutions [45], and for several other cases (see e.g. [27]).

However, some of the most fundamental NP-complete problems that are special cases of Set Cover, such as Graph Coloring and Directed Hamiltonicity, still defy considerable research efforts to obtain the type of improved algorithms asked for in Question 1 (see e.g. [10, 22]).

Bin Packing.
We study one such fundamental NP-complete problem, the Bin Packing problem: Given item weights $w(1), \ldots, w(n) \in \mathbb{N}$ and capacities $c_1, \ldots, c_m \in \mathbb{N}$, is there a partition $S_1, \ldots, S_m$ of $[n]$ such that $w(S_j) \le c_j$ for each $j \in [m]$? Here $w(X)$ denotes $\sum_{i \in X} w(i)$. Due to its elegant formulation and clear practical applicability, Bin Packing is a central problem in computer science. For example, it models the most basic non-trivial scheduling problem with multiple machines. While Bin Packing has been extensively studied from an approximation and online algorithms perspective [13], much less research has been devoted to exact algorithms for Bin Packing. The currently fastest algorithm for Bin Packing is a consequence of the aforementioned algorithm for Set Cover from [9], and it runs in $O^*(2^n)$ time. With Question 1 on the horizon, we ask whether this can be improved:

Question 2:
Can Bin Packing with $n$ items be solved in $O((2-\varepsilon)^n)$ time, for some $\varepsilon > 0$?

The only improvement over the $O^*(2^n)$ time algorithm for Bin Packing is due to Lente et al. [40], who gave an $O^*(m^{n/2})$ time algorithm. Note that this is only an improvement for $m = 2, 3$ bins, and Question 2 remained elusive already for $m = 4$. In stark contrast, our main result is an improvement over the $O^*(2^n)$ time algorithm for every constant number of bins:

Theorem 1.1 (Main Theorem). For every $m \in \mathbb{N}$ there is a constant $\sigma_m > 0$ such that every Bin Packing instance with $m$ bins can be solved in $O(2^{(1-\sigma_m)n})$ time with high probability.

While our algorithm does not resolve Question 2, we believe it makes substantial progress on it because (1) Set Cover with a constant-sized solution is at least as hard as general Set Cover, and (2) the other extreme, Set Cover with a linear number of sets in the solution (and hence Bin Packing with a linear number of bins with equal capacity), can be solved in $O((2-\varepsilon)^n)$ time (see [45]).

Footnotes: In principle, it is natural to assume the Set Cover instance has $n$ elements and $\mathrm{poly}(n)$ sets, but an algorithm by Björklund et al. [9] solves Set Cover instances in $O^*(2^n)$ time irrespective of the number of sets. Krauthgamer and Trabelsi [39] rewrite a Directed Hamiltonicity instance efficiently as a Set Cover instance. Assuming the capacity of each bin equals $c$, create a Set Cover instance with all item sets of weight at most $c$.

1.1 Our Approach for Proving Theorem 1.1

As our starting point, we extend the methods from [8, 45] to show that instances of Bin Packing with the following restrictions admit an $O^*(2^{(1-\sigma_m)n})$ time randomized algorithm for some $\sigma_m > 0$:

(R1) the instance has anti-concentrated subset sums, in the sense that $\beta(w) \le 2^{(1-\varepsilon)n}$ for some $\varepsilon > 0$, where $\beta(w) := \max_v |\{X \subseteq \{1, \ldots, n\} : w(X) = v\}|$, and

(R2) the instance is tight, in the sense that $\sum_{j=1}^{m} c_j = w([n])$.

Fix a set of bins $L \subseteq [m]$ and recall that $(S_1, \ldots, S_m)$ denotes a solution. The crux of (R1) and (R2) is that together they imply that the number of candidates for $S_L := \bigcup_{j \in L} S_j$ is at most $2^{(1-\varepsilon)n}$, since $w(S_L) = \sum_{j \in L} c_j$. We explain in §1.1.3 how this allows a faster algorithm via the methods of [8, 45].

However, extending this algorithm to an improved algorithm that solves all instances with a constant number of bins requires both new combinatorial insights (for relaxing (R1)) and new algorithmic insights (for relaxing (R2)) that are our main contributions. Therefore we discuss these insights first.

Our main combinatorial contribution is a new structural insight on instances that do not satisfy (R1), i.e. vectors $w$ with $|\{X \subseteq \{1, \ldots, n\} : w(X) = v\}| \ge 2^{(1-\varepsilon)n}$ for some $v$ and $\varepsilon > 0$. Determining the structure of such vectors $w$ is well known in additive combinatorics as the Littlewood–Offord Problem. Its rich theory has found applications ranging from pure mathematics (such as estimating the singularity probability of random Bernoulli matrices [51] or the zeroes of random polynomials [41]), to database security [28], and to complexity theory [19, 34, 43]. See also the designated chapter in the standard textbook on additive combinatorics [50]. However, whereas most works (with notable exceptions being e.g. [29, 48]) assumed inversely polynomially small concentration, e.g. $\beta(w) \ge 2^n/n^{O(1)}$, restriction (R1) is about inversely exponentially small concentration.

Recent work studied such exponentially small concentration with applications to improved exponential time algorithms for the Subset Sum problem [2, 5]. Specifically, they studied the trade-off between the parameters $\beta(w)$ and $|w(2^{[n]})| := |\{w(X) : X \subseteq [n]\}|$. Two extremal cases are: if $w_a := (0, 0, \ldots, 0)$ then $|w_a(2^{[n]})| = 1$ and $\beta(w_a) = 2^n$; if $w_b := (1, 2, 4, \ldots, 2^{n-1})$ then $|w_b(2^{[n]})| = 2^n$ and $\beta(w_b) = 1$.

One may suspect that all vectors $w \in \mathbb{Z}^n$ are a combination of these two extremes and therefore that a smooth trade-off between $\beta(w)$ and $|w(2^{[n]})|$ can be proved. This suspicion can be confirmed in the case $w \in (\mathbb{F}_2^n)^n$, where $|w(2^{[n]})| \cdot \beta(w) = 2^n$. Observe that a similar trade-off for $w \in \mathbb{Z}^n$ would allow us to lift (R1) via a simple $O^*(|w(2^{[n]})|^m)$ time algorithm for Bin Packing (Lemma 3.4). Unfortunately, this intuition is not true and the case $w \in \mathbb{Z}^n$ is far more subtle. For instance, Wiman [54] showed in his remarkable bachelor thesis that, surprisingly, vectors satisfying simultaneously both $|w(2^{[n]})| \ge 2^{(1-\varepsilon)n}$ and $\beta(w) \ge 2^{cn}$ for a fixed constant $c > 0$ exist for any $\varepsilon > 0$. Our main combinatorial contribution is that instances with the same parameters but with the roles of $\beta(w)$ and $|w(2^{[n]})|$ swapped do not exist:

Theorem 1.2.
Let $\varepsilon > 0$. If $\beta(w) \ge 2^{(1-\varepsilon)n}$, then $|w(2^{[n]})| \le 2^{\delta n}$, where $\delta(\varepsilon) = O_{\varepsilon \to 0}\!\left(\frac{\log\log(1/\varepsilon)}{\sqrt{\log(1/\varepsilon)}}\right)$.

(Footnote: So $w(i)$ is an $n$-dimensional binary vector for every $i$. Then $|w(2^{[n]})| = 2^{\mathrm{rk}(w)}$ and $\beta(w) = 2^{n-\mathrm{rk}(w)}$, where $w$ is interpreted as a matrix by concatenating the vectors $w(1), \ldots, w(n)$ and $\mathrm{rk}$ denotes the rank over $\mathbb{F}_2$.)

A natural route towards Theorem 1.2 is the reduction from [2] to Uniquely Decodable Code Pairs (UDCPs) from information theory (see Subsection 1.2 for details). This implies, for example, that if $\beta(w) \ge 2^{(1-\varepsilon)n}$, then $|w(2^{[n]})| \le 2^{(c_0+\sqrt{\varepsilon})n}$ for a constant $c_0 < 1/2$, by a result on UDCPs from [4]. However, the reduction from [2] is symmetric with respect to swapping the roles of $\beta(w)$ and $w(2^{[n]})$, and thus by the result from [54] UDCP techniques alone cannot decrease this constant below a fixed positive bound.

Therefore, we need new ideas to reduce the constant to an arbitrarily small one. To do so, we first investigate the combinatorial structure of the hyperplane $H := \{x \in \mathbb{Z}^n : \langle w, x \rangle = v\}$, assuming $|H \cap \{0,1\}^n| \ge 2^{(1-\varepsilon)n}$. Afterwards we apply an argument similar to the UDCP connection from [2]. We formally describe our approach for proving Theorem 1.2 in Section 4.

Note that Theorem 1.2 enables us to lift (R1): We may assume $\beta(w) \le 2^{(1-\varepsilon_m)n}$, where $\varepsilon_m > 0$ depends on $m$, since otherwise the dynamic programming algorithm with running time $O^*(|w(2^{[n]})|^m)$ will be fast enough (see Lemma 3.4).

As mentioned before, (R2) is algorithmically useful for the following reason: We aim to detect a solution $S_1, \ldots, S_m$ of the Bin Packing instance by listing all candidates for $S_L := \bigcup_{j \in L} S_j$ for some $L \subseteq [m]$, and (R2) implies that $w(S_L) = \sum_{j \in L} c_j$. This allows us to narrow down the number of candidates to $2^{(1-\varepsilon)n}$ by (R1) (we explain in §1.1.3 why this is useful).
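To make these quantities concrete, the following brute-force sketch computes $\beta(w)$, $|w(2^{[n]})|$, and the number of candidate sets for a given target weight (a toy illustration in Python; the example weights and the target value are our own hypothetical choices, not from the paper):

```python
from collections import Counter

def subset_sum_stats(w):
    """Enumerate all 2^n subsets of [n] and tally their weights.
    Returns (beta, distinct, counts) where beta = max_v #{X : w(X) = v},
    distinct = |w(2^[n])|, and counts is the full histogram of subset sums."""
    n = len(w)
    counts = Counter()
    for mask in range(1 << n):
        counts[sum(w[i] for i in range(n) if mask >> i & 1)] += 1
    return max(counts.values()), len(counts), counts

# The two extremal vectors discussed above (toy size n = 8):
beta_a, distinct_a, _ = subset_sum_stats([0] * 8)                     # w_a = (0, ..., 0)
beta_b, distinct_b, _ = subset_sum_stats([1 << i for i in range(8)])  # w_b = (1, 2, ..., 2^(n-1))
assert (beta_a, distinct_a) == (2**8, 1) and (beta_b, distinct_b) == (1, 2**8)

# In a tight instance, every candidate for S_L has weight exactly sum_{j in L} c_j,
# so the number of candidates is bounded by beta(w):
beta, distinct, counts = subset_sum_stats([3, 1, 4, 1, 5, 9, 2, 6])
assert counts[10] <= beta  # 10 plays the role of sum_{j in L} c_j here
```

On $w_a$ the sketch reports $\beta = 2^8$ and a single distinct sum, while on $w_b$ it reports $\beta = 1$ and $2^8$ distinct sums, matching the two extremes above.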
Note this even narrows down the number of candidates for $S_L$ if all bins have polynomially bounded slack, i.e., $c_j - w(S_j) \le n^{O(1)}$, since the number of possibilities for $w(S_L)$ is then only $n^{O(1)}$.

But generally this strategy does not work whenever a bin has a large slack, that is, when $c_j - w(S_j)$ is large. While reductions in several similar situations were able to turn inequalities into equalities via general rounding techniques (such as [46, 53]), we need a more sophisticated method in this paper to deal with this issue: The idea of [46] is to divide the weights by roughly $c_j - w(S_j)$ and (conservatively) round to an integer. In this case, bin $j$ has small slack with respect to the rounded weight function. The major complication, however, is that for different bins we would then need to work with differently rounded weight functions, which still does not allow us to narrow down the number of options for $w(S_L)$ and hence (via (R1)) for $S_L$.

Instead we work with a rounded version $w_\theta$ of the weights $w$, where $w_\theta(i)$ is obtained from $w(i)$ by keeping only the $\theta$ most significant bits. We will show that we can choose $\theta$ such that $|w_\theta(2^{[n]})| \approx 2^{\delta n}$, for some parameter $\delta$ that depends on $m$. We will deal with the bins in two different ways, depending on whether their slack is large (i.e., at least approximately $n \cdot 2^{\ell-\theta}$, assuming all weights are $\ell$-bit integers) or not:

• Large Slack Bins:
In this case, our idea is loosely inspired by rounding in approximation algorithms, e.g. for Knapsack (see e.g. [37, Section 11.8]). Observe that if some bin has large slack, we can split it into two parts. Then we only need to keep track of the rounded weights of these parts in order to determine whether they jointly fit into the bin. Because we assumed the upper bound $|w_\theta(2^{[n]})| \lesssim 2^{\delta n}$, we can afford to keep track of all combinations of rounded weights as long as $\delta < \frac{1}{2m}$.

• Small Slack Bins:
In this case we have a split of the bins $(L, R)$ and all bins in $L$ have small slack. Now we use the lower bound $|w_\theta(2^{[n]})| \gtrsim 2^{\delta n}$, and our additive combinatorics result guarantees $\beta(w_\theta) \le 2^{(1-\varepsilon(\delta))n}$ for some $\varepsilon(\delta) > 0$. Now we use the fact that all bins in $L$ have small slack: there are only $n^{O(1)}$ candidates for $w_\theta(S_L)$, and therefore at most $n^{O(1)} \cdot 2^{(1-\varepsilon(\delta))n}$ candidates for $S_L$, which can be algorithmically exploited.

Figure 1: Schematic view of the algorithm from §1.1.3. A point in the square represents a subset of $[n]$. The vertical axis corresponds to the cardinality of this set (e.g., the longest horizontal line represents all sets in $\binom{[n]}{n/2}$). The left figure illustrates the analysis for the case where there exists an $\alpha$-balanced solution with $W \in \binom{[n]}{n/2 \pm \alpha n}$; we iterate through all $W$ in time proportional to the area of the red region. The right figure illustrates the case of an $\alpha$-unbalanced solution; a division of the solution $(L, R)$ is witnessed by roughly $2^{\alpha n}$ sets $W$ in $\binom{[n]}{n/2}$ satisfying $S_L \subseteq W \subseteq [n] \setminus S_R$.

In this informal discussion, we omitted several nontrivial technical issues. For example, in order to deal with instances with both a substantial number of small slack bins and of large slack bins, we need to distinguish a number of cases. Due to subtle technical issues, we need to deal with each of them in slightly different ways. Details are postponed to Section 3.

We now discuss how the methods from [8, 45] can be used to solve all instances that satisfy Restrictions (R1) and (R2) in $O^*(2^{(1-\sigma_m)n})$ time for some $\sigma_m > 0$. An important subroutine from [8] is an algorithm that, given a set family $\mathcal{W} \subseteq 2^{[n]}$ and a set of bins $L$, computes for all $W \in \mathcal{W}$ whether the items in $W$ can be divided among the bins in $L$. That is, it computes whether $W$ can be a candidate for $S_L = \bigcup_{j \in L} S_j$.
The runtime of this algorithm is $O(|{\downarrow}\mathcal{W}| \cdot n)$, where ${\downarrow}\mathcal{W} := \{X \subseteq W : W \in \mathcal{W}\}$ is the down-closure of $\mathcal{W}$. The analogous up-closure of all supersets of elements of $\mathcal{W}$ is denoted by ${\uparrow}\mathcal{W}$. Let us fix a solution $(S_1, \ldots, S_m)$. We consider two cases based on how 'balanced' the solution is, with respect to a small parameter $\alpha$:

• There is a $b \in [m]$ such that $\sum_{j=1}^{b} |S_j| \in [n/2 \pm \alpha n]$. Observe that $\bigcup_{j=1}^{b} S_j$ is an element of
$$\mathcal{W} := \left\{ Y \subseteq [n] : w(Y) = \sum_{j=1}^{b} c_j \text{ and } |Y| \in [n/2 \pm \alpha n] \right\},$$
and that $|\mathcal{W}| \le \beta(w) \le 2^{(1-\varepsilon)n}$ by (R1). Now we can enumerate $\mathcal{W}$ in essentially $2^{(1-\varepsilon)n}$ time. We will present an $O((|{\downarrow}\mathcal{W}| + |{\uparrow}\mathcal{W}|) \cdot n)$ time algorithm, based on techniques from [8], that for each $W \in \mathcal{W}$ computes whether $W$ can be divided among bins $1, \ldots, b$ and $[n] \setminus W$ among bins $b+1, \ldots, m$. This will detect a solution if it exists. We bound the running time using the property $|W| \in [n/2 \pm \alpha n]$: in this case we will show $|{\downarrow}\mathcal{W}| + |{\uparrow}\mathcal{W}| \le 2^{(1-\varepsilon')n}$, and hence the algorithm is fast enough (see the left part of Figure 1 for an illustration). (The actual definition of $\alpha$-balancedness (Definition 3.3) will be independent of the ordering of the bins.)

• There exists no $b \in [m]$ such that $\sum_{j=1}^{b} |S_j| \in [n/2 \pm \alpha n]$. Here we can use a method from [45]: We let $\mathcal{W}$ consist of $2^{(1-\alpha)n}$ independently sampled subsets of $[n]$ of cardinality $n/2$. We answer yes if there exist a $W \in \mathcal{W}$ and disjoint sets $S'_1, \ldots, S'_{b-1} \subseteq W$ and $S'_{b+1}, \ldots, S'_m \subseteq [n] \setminus W$ such that $w(S'_j) = c_j$ for all $j \in [m] \setminus \{b\}$. This condition can also be computed in $O((|{\downarrow}\mathcal{W}| + |{\uparrow}\mathcal{W}|) \cdot n)$ time by the methods of [8]. The crux is that these conditions together imply our instance is a yes-instance, since the remaining elements have weight $c_b$ by Restriction (R2). Moreover, by the unbalancedness assumption, at least $2^{\alpha n}$ sets $W \subseteq [n]$ with the above conditions exist.
Therefore the random sampling will include such a $W$ with good probability (see the right part of Figure 1 for an illustration).

Littlewood–Offord, UDCPs, and Exponential Time Algorithms.
Two sets $A, B \subseteq \{0,1\}^n$ form a Uniquely Decodable Code Pair (UDCP) if $|A + B| = |A| \cdot |B|$, where $A + B := \{a + b : a \in A, b \in B\}$ (and addition is in $\mathbb{Z}^n$). The maximal sizes of UDCPs have been very well studied in information theory; see e.g. [49, Section 3.5.1] for a (not so recent) overview. Two record upper bounds are $|A| \cdot |B| \le 2^{1.5n}$ (from [52]) and $|A| \le 2^{(c_0 + \sqrt{\varepsilon})n}$ for a constant $c_0 < 1/2$ whenever $|B| \ge 2^{(1-\varepsilon)n}$ (from [4]). The study of UDCPs is relevant for this paper by the following connection shown in [3]: For any vector $w \in \mathbb{Z}^n$, there is a UDCP $A, B \subseteq \{0,1\}^n$ such that $|A| = |w(2^{[n]})|$ and $|B| = \beta(w)$.

A study of the trade-off between the parameters $|w(2^{[n]})|$ and $\beta(w)$ was already fruitful for obtaining improved exponential time algorithms in two earlier papers in the context of the Subset Sum problem. In this problem one is given $w \in \mathbb{Z}^n$ and a target integer $t$, and one needs to find a subset $X \subseteq [n]$ such that $w(X) = t$. First, the aforementioned paper [3] combined their connection to UDCPs with the bound from [52] to show that instances of Subset Sum satisfying $|w(2^{[n]})| \ge 2^{0.997n}$ can be solved in $O(2^{0.49991n})$ time, thereby improving the best $O^*(2^{n/2})$ worst-case run time from [31] for these instances. Second, a slight variant of the trade-off was used in [5] to give an $O(2^{0.86n})$ time algorithm that uses a random oracle and only a polynomial (in the input size) amount of working memory.

Exact Algorithms with Minimum Worst Case Run Time for Set Cover.
Question 1 was for the first time explicitly posed in [16], who showed that a no answer to (a variant of) the question implies hardness in a fine-grained sense for the Subset Sum, Steiner Tree, and Connected Vertex Cover problems. A main motivation of [16] in posing the question was a curious reduction showing that there is no improved algorithm for counting the number of Set Cover solutions modulo 2 unless improved algorithms for CNF-SAT exist (i.e. the Strong Exponential Time Hypothesis fails). Later the assumption that no improved algorithm exists was dubbed the 'Set Cover Conjecture' (see e.g. [17, Conjecture 14.36]). Since then, the conjecture has been used in several works, e.g. in [1, 39].

On the positive side (and especially important for this work), algorithmic tools were developed in [9]: Fast zeta/Möbius transformations were introduced in the area of exponential time algorithms to show that Set Cover can be solved in $2^n \cdot n^{O(1)}$ time even when the number of sets in the input is exponential in $n$. One major consequence was a $2^n \cdot n^{O(1)}$ time algorithm for computing whether an input graph on $n$ vertices has a proper coloring with $k$ colors. While for small $k$ (e.g. $k = 3$) faster algorithms exist (see e.g. [11]), this is still the fastest known algorithm for larger $k$.

Improved algorithms for solving Set Cover instances with sets of bounded cardinality were given in [38]. Later, this was generalized to improved algorithms for Set Cover instances where the optimum is linear in the universe size [45]. Other instances that allow improved algorithms were also presented in e.g. [26].

Exact Algorithms with Minimum Worst Case Run Time for Bin Packing.

In a textbook on exact exponential time algorithms it was shown that Bin Packing can be solved in $O(n \cdot \max_i w(i) \cdot 2^n)$ time [23, Section 4.2.3]. A faster, $O^*(2^n)$ time algorithm was given in [9].
Even faster algorithms were given for $m = 2, 3$ in [40]. In [25] it was shown that Bin Packing can be solved in polynomial time if there is only a constant number of distinct item weights. In [32], Bin Packing with a constant number of bins and bounded item weights was studied. A dynamic programming algorithm similar to the one from Lemma 3.4 was analyzed: it was observed that the algorithm runs in $n^{O(m)}$ time if the item weights are polynomial in $n$. The authors show this run time cannot be improved to $n^{o(m/\log m)}$ unless the Exponential Time Hypothesis fails.

Heuristics for Bin Packing.
The applications and combinatorial properties of Bin Packing have been studied since the 1930s [35]. To the best of our knowledge, the first attempt to solve Bin Packing exactly with the assistance of a modern computer was made in the fifties by Eisemann [21], with the motivation of trimming losses in cutting rolls of paper. Starting from the seventies, research on exact algorithms for Bin Packing focused on the branch-and-bound technique proposed by Eilon and Christofides [20]. These heuristics work well in practice; nevertheless, there are no theoretical guarantees on their worst-case performance. For a modern survey and experimental evaluations of the available software see [42, 18].
Approximation Algorithms for Bin Packing.
Bin Packing is one of the problems that initiated the study of approximation algorithms. The earliest one is the First Fit algorithm analysed by Johnson [33], which requires at most $1.7 \cdot \mathrm{OPT} + 1$ bins. The major breakthrough was made by Karmarkar and Karp [36], who provided a polynomial time algorithm that requires at most $\mathrm{OPT} + O(\log^2(\mathrm{OPT}))$ bins. Recently, a big leap forward was made by Rothvoß [47], who gave a polynomial time algorithm that requires only $\mathrm{OPT} + O(\log(\mathrm{OPT}) \log\log(\mathrm{OPT}))$ bins, and by Hoberg and Rothvoß [30], who improved this even further to $\mathrm{OPT} + O(\log(\mathrm{OPT}))$ bins.

This paper is organized as follows: In Section 2 we present some preliminaries and introduce some notation. In Section 3 we present the algorithm and the proof of our main theorem, assuming Theorem 1.2. The latter theorem is proved in the subsequent two sections, Sections 4 and 5.
2 Preliminaries

All logarithms are base 2 unless stated otherwise. In this paper we assume that basic arithmetic operations take constant time. We can then use a result of Frank and Tardos [24] to assume that $\max_i w(i) \le 2^{n^{O(1)}}$. Throughout the paper we use the $O^*$ notation to hide polynomial factors and the $\tilde{O}$ notation to hide logarithmic factors. We say a function $f(\varepsilon) = O_{\varepsilon \to 0}(g(\varepsilon))$ if there exist a positive number $C$ and a sufficiently small $\varepsilon_0 > 0$ such that $|f(\varepsilon)| \le C \cdot g(\varepsilon)$ for all $\varepsilon < \varepsilon_0$. We use $\Omega_{\varepsilon \to 0}$ similarly to express lower bounds.

We use $[n]$ to denote the set $\{1, \ldots, n\}$. If $a, b \in \mathbb{R}$ and $b \ge 0$, we let $[a \pm b]$ denote the interval $[a - b, a + b]$. If $A$ and $B$ are sets, we denote by $A^B$ the set of vectors indexed by $B$ with values from $A$, and we will interchangeably address these vectors as functions from $B$ to $A$. If $f \in A^B$ and $a \in A$, we denote $f^{-1}(a) := \{b \in B : f(b) = a\}$ for its preimage at $a$. If $x, y \in \mathbb{R}^B$, we denote $\langle x, y \rangle := \sum_{b \in B} x_b \cdot y_b$ for their inner product.

To quickly refer to properties of a solution of a Bin Packing instance we use the following notation: The function $w$ indicates the weights of the input. It is extended to sets $X \subseteq [n]$ by defining $w(X) := \sum_{i \in X} w(i)$ and to set families $\mathcal{F} \subseteq 2^{[n]}$ by defining $w(\mathcal{F}) := \{w(X) : X \in \mathcal{F}\}$. We say a set $X \subseteq [n]$ of items can be divided over bins $L \subseteq [m]$ if there is a partition $X_1, \ldots, X_{|L|}$ of $X$ such that for all $j \in \{1, \ldots, |L|\}$, the set $X_j$ can be placed in the $j$-th bin of $L$, i.e., $w(X_j)$ is at most its capacity.

Our algorithm crucially relies on the following algorithmic tools and definitions from [8].
Definition 2.1 (Zeta and Möbius Transform). Let $f : 2^U \to \mathbb{N}$. The zeta transform $\zeta f$ and the Möbius transform $\mu f$ are the functions from $2^U$ to $\mathbb{N}$ such that for every $X \subseteq U$:
$$(\zeta f)(X) := \sum_{Y \subseteq X} f(Y), \qquad (\mu f)(X) := \sum_{Y \subseteq X} (-1)^{|X \setminus Y|} f(Y).$$

Definition 2.2. Given $\mathcal{S} \subseteq 2^U$, the down-closure ${\downarrow}\mathcal{S}$ and up-closure ${\uparrow}\mathcal{S}$ are defined as follows:
$${\downarrow}\mathcal{S} := \{X \mid \exists S \in \mathcal{S} : X \subseteq S\}, \qquad {\uparrow}\mathcal{S} := \{X \mid \exists S \in \mathcal{S} : X \supseteq S\}.$$

Theorem 2.3 (Fast Zeta/Möbius Transform [8]). Let $\mathcal{S} \subseteq 2^U$ be a set family and let $f : 2^U \to \mathbb{N}$. There is an algorithm that, given $\mathcal{S}$ and access to an oracle that for given $X$ evaluates $f(X)$ in $T$ time, computes for every $X \in {\downarrow}\mathcal{S}$ the values $(\zeta f)(X)$ and $(\mu f)(X)$. The algorithm runs in $O(|{\downarrow}\mathcal{S}| \cdot |U| \cdot T)$ time.

Definition 2.4 (Cover Product and Dot Product). Given $f, g : 2^U \to \mathbb{N}$, the cover product $f *_c g$ and the dot product $f \cdot g$ are the functions from $2^U$ to $\mathbb{N}$ such that
$$(f *_c g)(Z) := \sum_{X \cup Y = Z} f(X)\, g(Y), \qquad (f \cdot g)(Z) := f(Z) \cdot g(Z).$$

Theorem 2.5 ([7]). $\mu((\zeta f) \cdot (\zeta g)) = f *_c g$.

Theorem 2.6.
Suppose that we have a Bin Packing instance with bin capacities $c_1, \ldots, c_m$ and an item weight function $w$. Then for any $B \subseteq [m]$ and set family $\mathcal{W} \subseteq 2^{[n]}$, computing for all $X \in {\downarrow}\mathcal{W}$ whether $X$ can be divided over the bins in $B$ can be done in $O(|{\downarrow}\mathcal{W}| \cdot n)$ time. Similarly, for any $B \subseteq [m]$ and set family $\mathcal{W} \subseteq 2^{[n]}$, computing for all $X \in {\uparrow}\mathcal{W}$ whether $[n] \setminus X$ can be divided over the bins in $B$ can be done in $O(|{\uparrow}\mathcal{W}| \cdot n)$ time.

Proof. For all $j = 1, \ldots, m$ define a function $f_j : 2^{[n]} \to \{0, 1\}$ as
$$f_j(X) = \begin{cases} 1, & \text{if } w(X) \le c_j, \\ 0, & \text{otherwise.} \end{cases}$$
Assume without loss of generality that $B = \{1, \ldots, d\}$. Notice that $X$ can be divided over the bins in $B$ if and only if $(f_1 *_c f_2 *_c \cdots *_c f_d)(X) > 0$. By Theorem 2.5 we have that
$$f_1 *_c f_2 *_c \cdots *_c f_d = \mu((\zeta f_1) \cdot (\zeta f_2) \cdots (\zeta f_d)).$$
The right-hand side can be computed in $O(|{\downarrow}\mathcal{W}| \cdot n)$ time by subsequently applying $d$ fast zeta transforms (Theorem 2.3), naïve dot product computations, and one fast Möbius transform (Theorem 2.3). For the second part of the theorem, one takes $\mathcal{W}' := \{[n] \setminus W : W \in \mathcal{W}\}$ and applies the technique above to $\mathcal{W}'$. Notice that indeed ${\downarrow}\mathcal{W}' = \{[n] \setminus X : X \in {\uparrow}\mathcal{W}\}$.

Note this can be used to obtain the algorithm already mentioned in Section 1:

Theorem 2.7 ([8]). Bin Packing with capacities $c_1, \ldots, c_m$ can be solved in $O^*(2^n)$ time.

Proof. For $i = 1, \ldots, m$ define the function $f_i : 2^{[n]} \to \{0, 1\}$ as
$$f_i(X) = \begin{cases} 1, & \text{if } w(X) \le c_i, \\ 0, & \text{otherwise.} \end{cases}$$
Note that $(f_1 *_c f_2 *_c \cdots *_c f_m)([n]) > 0$ if and only if we have a YES-instance. By Theorem 2.5 we have that
$$f_1 *_c f_2 *_c \cdots *_c f_m = \mu((\zeta f_1) \cdot (\zeta f_2) \cdots (\zeta f_m)),$$
and the right-hand side can be computed in $O^*(2^n)$ time by subsequently applying $m$ fast zeta transforms (Theorem 2.3), naïve dot product computations, and one fast Möbius transform (Theorem 2.3).

We heavily use properties of the entropy function, which we will now define.
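As a concrete illustration of the transform machinery used in the two proofs above, consider the following minimal Python sketch of Theorem 2.7 over the full lattice $2^{[n]}$ (it runs in $O^*(2^n)$ and is meant for tiny instances only; the function and variable names are ours):

```python
def zeta(f, n):
    """Fast zeta transform: returns g with g(X) = sum_{Y subseteq X} f(Y)."""
    g = f[:]
    for i in range(n):
        for mask in range(1 << n):
            if mask >> i & 1:
                g[mask] += g[mask ^ (1 << i)]
    return g

def mobius(f, n):
    """Fast Möbius transform, the inverse of the zeta transform."""
    g = f[:]
    for i in range(n):
        for mask in range(1 << n):
            if mask >> i & 1:
                g[mask] -= g[mask ^ (1 << i)]
    return g

def bin_packing(weights, caps):
    """Decide Bin Packing via f_1 *_c ... *_c f_m = mu((zeta f_1) ... (zeta f_m)):
    the cover product evaluated at [n] is positive iff the items can be packed."""
    n = len(weights)
    # Subset weights w(X) for every X, computed iteratively via the lowest set bit.
    wsum = [0] * (1 << n)
    for mask in range(1, 1 << n):
        low = mask & -mask
        wsum[mask] = wsum[mask ^ low] + weights[low.bit_length() - 1]
    # Pointwise product of the zeta transforms of the indicators f_j(X) = [w(X) <= c_j].
    prod = [1] * (1 << n)
    for c in caps:
        zf = zeta([1 if wsum[mask] <= c else 0 for mask in range(1 << n)], n)
        prod = [a * b for a, b in zip(prod, zf)]
    # One Möbius transform recovers the cover product; check its value at [n].
    return mobius(prod, n)[(1 << n) - 1] > 0
```

A cover $X_1 \cup \cdots \cup X_m = [n]$ with $w(X_j) \le c_j$ can always be turned into a partition by removing duplicated items from all but one of the sets containing them, which is why checking the cover product at $[n]$ suffices.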
Let $D = (\Omega, p)$ be a discrete probability space. The entropy $h(D)$ of $D$ is defined as follows:
$$h(D) := -\sum_{x \in \Omega} p(x) \log p(x). \quad (1)$$
We say $p = (p_1, \ldots, p_k)$ is a probability vector if the $p_i$'s are non-negative and satisfy $\sum_{i=1}^{k} p_i = 1$. If no underlying probability space is given, we may interpret $p$ as a probability measure over $\{1, \ldots, k\}$, and thus (1) gives $h(p) = -\sum_{i=1}^{k} p_i \log p_i$. The support of $p$ is the number of its nonzero entries. If $p \in (0, 1)$, we use the shorthand notation $h(p) := h(p, 1-p)$. If $n$ is a positive integer, we let $\binom{n}{p \cdot n}$ denote the multinomial coefficient $\binom{n}{p_1 n, p_2 n, \ldots, p_k n}$. This multinomial coefficient can be approximated with $h(p)$ as follows:

Lemma 2.8 ([14], Lemma 2.2). If $p$ is a probability vector with support at most $s$, then
$$\binom{n + s - 1}{s - 1}^{-1} 2^{h(p)n} \le \binom{n}{p \cdot n} \le 2^{h(p)n}.$$

We will frequently use the special case $\binom{n}{pn} \le 2^{h(p)n}$, where $p \in (0, 1)$. The following lemma states the intuitive fact that close probability vectors have close entropy.

Lemma 2.9.
Let $p, q \in \mathbb{R}^k$ be probability vectors such that $|p_i - q_i| \le \varepsilon$ for each $i = 1, \ldots, k$. Then $|h(p) - h(q)| \le k\left(\frac{\varepsilon}{\ln 2} + \varepsilon \log \frac{1}{\varepsilon}\right)$.

Proof. Recall that $h(p) = -\sum_{i=1}^{k} p_i \log p_i$. The lemma follows by applying the following inequalities to all summands of the entropies of $p$ and $q$: if $x, \varepsilon, x + \varepsilon \in [0, 1]$, then
$$x \log x + \varepsilon \log \varepsilon \;\le\; (x + \varepsilon) \log(x + \varepsilon) \;\le\; x \log x + \frac{\varepsilon}{\ln 2}.$$
The first inequality can be derived as
$$(x + \varepsilon) \log(x + \varepsilon) = x \log x + x \log \frac{x + \varepsilon}{x} + \varepsilon \log(x + \varepsilon) \ge x \log x + \varepsilon \log \varepsilon,$$
since the middle term is non-negative and $x + \varepsilon \ge \varepsilon$. The second inequality follows since $\varepsilon \log(x + \varepsilon) \le 0$ and $x \log\left(1 + \frac{\varepsilon}{x}\right) \le \frac{\varepsilon}{\ln 2}$, where we use the standard fact $1 + z \le \exp(z)$ in the last step.

3 Proof of Theorem 1.1
In this section we prove our main theorem, which we first restate for convenience:
Theorem 1.1 (restated). For every $m \in \mathbb{N}$ there is a constant $\sigma_m > 0$ such that every Bin Packing instance with $m$ bins can be solved in $O(2^{(1-\sigma_m)n})$ time with high probability.

This section is organized as follows: In Subsection 3.1 we introduce definitions that will be used throughout this section, such as the key definition of $\alpha$-balanced solutions. We then prove in Subsection 3.2 that 'easy' instances of Bin Packing, namely those where $w$ generates relatively few distinct sums and those with $\alpha$-unbalanced solutions (for some $\alpha > 0$), can be solved fast. We can therefore assume in the rest of the section that there are only $\alpha$-balanced solutions and that $|w(2^{[n]})| \ge 2^{\delta n}$ (for some $\delta < \frac{1}{2m}$).

Subsection 3.3 introduces a few more definitions, such as the slack of a bin, which is the unused capacity of a bin in a solution. This is also where we define the '$\theta$-pruned item weights' as the bit representation of the weights, pruned to the $\theta$ most significant bits. The parameter $\theta$ is then chosen such that $|w_\theta(2^{[n]})| \approx 2^{\delta n}$, as discussed in §1.1.2. These definitions will be central in solving the remaining two types of instances.

In Subsection 3.4, we consider instances where at least roughly half of the items are in a bin with small slack. This is where we use the approach discussed in §1.1.1 and apply Theorem 1.2 to the $\theta$-pruned item weights, to conclude that $\beta(w_\theta) \le 2^{(1-\epsilon)n}$ for some $\epsilon > 0$.

Subsection 3.5 then solves instances where at least roughly half of the items are in a bin with large slack. In the proof, we split the large slack bins into two parts and use the $\theta$-pruned item weights of each of these parts in order to determine whether they fit. Because $|w_\theta(2^{[n]})| \approx 2^{\delta n}$, there are at most $2^{2\delta m n}$ different tuples of weights, which we can keep track of since we assumed $\delta < \frac{1}{2m}$.
Furthermore, the splitting of the large slack bins into two parts guarantees a constant probability of correctly guessing how to split the items of the small slack bins into two. Finally, the proof of Theorem 1.1 can be found in Subsection 3.6, where we combine all these results by choosing the right parameters $\delta$ and $\alpha$ based on the number of bins.

Fix an instance of Bin Packing. Let us first recall what we mean by a solution.
Definition 3.1.
A partition S_1, …, S_m of [n] is a solution of an instance of Bin Packing with n items and m bins if, for all j = 1, …, m, the set S_j can be put in bin j (i.e., w(S_j) ≤ c_j).

The following notion of a witness will be crucial in our approach.
Definition 3.2 ((L, R)-witnesses). Let L, R ⊆ [m]. A set W ⊆ [n] is an (L, R)-witness if there is a solution S_1, …, S_m such that ⋃_{j∈L} S_j ⊆ W and ⋃_{j∈R} S_j ⊆ [n] \ W.

We commonly denote S_L := ⋃_{j∈L} S_j and S_R := ⋃_{j∈R} S_j. To prove that W ⊆ [n] is an (L, R)-witness, it is sufficient to prove that there exist S_L ⊆ W and S_R ⊆ [n] \ W such that:

• S_L can be divided over the bins in L,
• S_R can be divided over the bins in R,
• and [n] \ (S_L ∪ S_R) can be divided over the bins in [m] \ (L ∪ R).

Hence, finding a witness gives us a proof of the existence of a solution. This will be used several times throughout this section.

Our algorithmic approach will heavily depend on whether or not the set of items can be evenly divided, which we formalize as follows:

Definition 3.3 (α-balanced solution). Let S_1, …, S_m be a solution of Bin Packing. The solution is α-balanced if for all permutations π : [m] → [m] there exists a b ∈ [m] such that ∑_{j=1}^{b} |S_{π(j)}| ∈ [n/2 ± αn]. If a solution is not α-balanced, it is called α-unbalanced.

Hence, a solution is α-unbalanced if and only if there exist a permutation π : [m] → [m] and a b ∈ [m] such that ∑_{j=1}^{b−1} |S_{π(j)}| < (1/2 − α)n and ∑_{j=1}^{b} |S_{π(j)}| > (1/2 + α)n.

If the instance generates relatively few distinct sums, in the sense that |w(2^{[n]})| ≤ 2^{δn} for some small δ, we can solve Bin Packing sufficiently fast relatively easily, as follows.

Lemma 3.4.
A solution of Bin Packing can be found in time O*(|w(2^{[n]})|^m).

Proof. First compute w(2^{[n]}) in time O*(|w(2^{[n]})|) with Lemma A.1. Subsequently, use the following Dynamic Programming algorithm: For every i = 1, …, n and W_1, …, W_m ∈ w(2^{[n]}), define

A(i, W_1, …, W_m) = 1 if items 1, …, i can be divided over m bins with capacities W_1, …, W_m, and 0 otherwise.

Then the following recurrence relation is easily seen to hold:

A(i, W_1, …, W_m) = ⋁_{j=1}^{m} A(i − 1, W_1, …, W_j − w(i), …, W_m).

Let c_1, …, c_m be the capacities of the bins of the Bin Packing instance. Then A(n, c_1, …, c_m) = 1 if and only if there is a solution. We only have to consider W_j ∈ w(2^{[n]}), since those are all the possible sums that w generates, and thus A(i, W_1, …, W_m) = 0 whenever W_j ∉ w(2^{[n]}) for some j. Since we can compute each table entry in O(m) time, the running time follows.

Next we show that α-unbalanced solutions with α > 0 can be detected quickly:

Lemma 3.5.
If a Bin Packing instance has an α-unbalanced solution, then with constant probability it can be found in time O(2^{2m} · 2^{(1−f_A(α))n}), where f_A(α) = Ω_{α→0}(α²/log²(1/α)).

Proof. The algorithm iterates over all subsets
L, R ⊆ [m] such that L ∩ R = ∅ and |L ∪ R| = m − 1. Let b ∈ [m] be the only element not in L ∪ R. For each such L and R, the algorithm searches for (L, R)-witnesses of size n/2. Concretely, it samples a set 𝒲 of 2^{(1−α)n} random subsets of [n] of size n/2, and it computes for every W ∈ 𝒲 whether it is an (L, R)-witness as follows: First, it computes which sets from ↓𝒲 ∪ ↑𝒲 are potential candidates for S_L and S_R. This is done by computing the booleans l_X for every X ∈ ↓𝒲 and r_X for every X ∈ ↑𝒲, where

l_X := 1 if X can be divided over the bins in L, and 0 otherwise,
r_X := 1 if [n] \ X can be divided over the bins in R, and 0 otherwise.

This can be done in time O((|↓𝒲| + |↑𝒲|)n) using Theorem 2.6.

Second, for each W ∈ 𝒲, we search for sets X_L ⊆ W and X_R ⊆ [n] \ W of maximum weight such that they can be distributed over the bins of L and R respectively. To do this, we compute l*_X for every X ∈ ↓𝒲 and r*_X for every X ∈ ↑𝒲, where

l*_X := max_{Y ⊆ X : l_Y = 1} w(Y),    r*_X := max_{Y ⊇ X : r_Y = 1} w([n] \ Y).

This can be done using Dynamic Programming with the recurrence relations

l*_X = w(X) if l_X = 1, and max_{i ∈ X} l*_{X \ {i}} if l_X = 0,
r*_X = w([n] \ X) if r_X = 1, and max_{i ∉ X} r*_{X ∪ {i}} if r_X = 0.

The runtime is only O((|↓𝒲| + |↑𝒲|)n) since the values l*_X for X ∈ ↓𝒲 do not depend on entries l*_Y for Y ∉ ↓𝒲, and the values r*_X for X ∈ ↑𝒲 do not depend on entries r*_Y for Y ∉ ↑𝒲. Thus the algorithm only needs to evaluate |↓𝒲| + |↑𝒲| table entries, which can be done in time O(n) per entry.

Third, the algorithm checks if there exists a W ∈ 𝒲 such that ∑_{i=1}^{n} w(i) − l*_W − r*_W ≤ c_b, and returns yes if this is the case. If for all different choices of L and R no (L, R)-witness has been found, the algorithm returns no.

Correctness of Algorithm
Assume that there is an α-unbalanced solution S_1, …, S_m. Let π : [m] → [m] be a permutation of the bins such that ∑_{j=1}^{b−1} |S_{π(j)}| < (1/2 − α)n and ∑_{j=1}^{b} |S_{π(j)}| > (1/2 + α)n for some b ∈ [m]. Thus |S_{π(b)}| ≥ 2αn. Take L = {π(1), …, π(b−1)} and R = {π(b+1), …, π(m)}. Recall the notation S_L = ⋃_{j=1}^{b−1} S_{π(j)}. Since for every Y ∈ binom(S_{π(b)}, n/2 − |S_L|) the set Y ∪ S_L is an (L, R)-witness of size n/2, there are at least binom(|S_{π(b)}|, n/2 − |S_L|) (L, R)-witnesses of cardinality n/2, which is at least binom(2αn, αn) since |S_{π(b)}| ≥ 2αn and n/2 − |S_L| ≥ αn. Thus the algorithm will detect the solution with probability

1 − Pr[𝒲 has no witness] = 1 − (1 − binom(2αn, αn)/2^n)^{|𝒲|} ≥ 1 − exp(−binom(2αn, αn) · |𝒲| / 2^n) ≥ 1 − exp(−2^{αn} · |𝒲| / (2αn · 2^{(1−α)n})),

where we use 1 − x ≤ exp(−x) in the first inequality and binom(a, a/2) ≥ 2^a/a in the second inequality. Hence, if we take 𝒲 to be a set of 2^{(1−α)n} random subsets of [n] of size n/2, with constant probability there will be an (L, R)-witness of the solution in 𝒲. Notice that for any witness W it will hold that ∑_{i=1}^{n} w(i) − l*_W − r*_W ≤ c_{π(b)}, and so the algorithm will return yes if W ∈ 𝒲.

Moreover, when the algorithm finds a W ∈ 𝒲 such that ∑_{i=1}^{n} w(i) − l*_W − r*_W ≤ c_{π(b)}, it means there exist sets X_L ⊆ W and X_R ⊆ [n] \ W that can be divided over the bins of L and R respectively, such that [n] \ (X_L ∪ X_R) fits into bin π(b). Therefore, W is an (L, R)-witness and we have proved the existence of a solution to the Bin Packing instance.

Runtime Analysis
We are left to prove the runtime of the algorithm. Recall that the algorithm repeats the procedure above for all O(2^{2m}) combinations of L and R. The runtime per guess of L and R is dominated by O((|↓𝒲| + |↑𝒲|)n), hence we are left to prove that |↓𝒲| + |↑𝒲| = O(2^{(1−f_A(α))n}).

Let γ_α = α/(8 log(12/α)). We can describe any X ∈ ↓𝒲 either as a set of size at most (1/2 − γ_α)n (if |X| ≤ (1/2 − γ_α)n), or as a subset of a W ∈ 𝒲 together with the items that W and X differ on (if |X| ≥ (1/2 − γ_α)n). In the latter case, the two sets differ on at most γ_α n items, since |W| = n/2. This, together with the fact that |X| can take only n + 1 distinct values, implies that

|↓𝒲| ≤ n · binom(n, (1/2 − γ_α)n) + |𝒲| · binom(n, γ_α n) ≤ n · 2^{h(1/2 − γ_α)n} + 2^{(1 − α + h(2γ_α))n}.

Notice that |↑𝒲| can be bounded in the same way. Now we apply Lemma B.2 with x := α, γ := γ_α, b := 1 and c := 2. Note that indeed the condition γ_α ≤ x/(4c · log(12b/x)) of Lemma B.2 is satisfied. Thus we obtain that

h(1/2 − γ_α) ≥ 1 − α + h(2γ_α).

Lastly, Lemma B.1 tells us that h(1/2 − γ_α) ≤ 1 − 2γ_α²/ln 2. Hence, |↓𝒲| + |↑𝒲| = O(2^{(1−f_A(α))n}) where

f_A(α) = α²/(32 ln 2 · (log(12/α))²) = Ω_{α→0}(α²/log²(1/α)).

This gives us the desired runtime.
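As an illustration of the witness-sampling idea behind Lemma 3.5, here is a minimal brute-force sketch. It is not the algorithm above: the down-/up-closure dynamic programming over ↓𝒲 and ↑𝒲 is replaced by a naive packing check (so it only runs for very small instances), a single split of the bins into L and R is tried rather than all of them, and the instance at the end is an illustrative assumption:

```python
import itertools
import random

def can_pack(item_ids, w, caps):
    # Naive check: can item_ids be partitioned into bins with capacities caps?
    ids = list(item_ids)
    if not caps:
        return not ids
    for r in range(len(ids) + 1):
        for sub in itertools.combinations(ids, r):
            rest = [i for i in ids if i not in sub]
            if sum(w[i] for i in sub) <= caps[0] and can_pack(rest, w, caps[1:]):
                return True
    return False

def sample_witness(w, caps, tries=500, seed=1):
    # Sample random subsets W of size n/2 and test a crude (L, R)-witness
    # condition: bins L packed from W, bins R packed from its complement.
    n, m = len(w), len(caps)
    L, R = list(range(m // 2)), list(range(m // 2, m))
    rng = random.Random(seed)
    for _ in range(tries):
        W = set(rng.sample(range(n), n // 2))
        comp = set(range(n)) - W
        if can_pack(W, w, [caps[j] for j in L]) and \
           can_pack(comp, w, [caps[j] for j in R]):
            return sorted(W)
    return None

print(sample_witness([4, 3, 2, 1], [7, 3]))   # -> [0, 1]
```

The real algorithm gains its speed by evaluating all sampled sets at once through the tables l*_X and r*_X instead of testing each sampled W independently.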
The results from the previous subsection enable us to assume both that |w(2^{[n]})| ≥ 2^{δn} for some small constant δ > 0 (that we will fix later) and that there is an α-balanced solution for some α > 0. To solve these instances of Bin Packing, we first need to define several parameters of an instance that determine our proof strategy.

Definition 3.6 (s-pruned item weights). Let l = 1 + ⌈log(max_i{w(i)})⌉. For s ∈ {0, …, l}, define the s-pruned weight of an item i as w_s(i) := ⌊w(i)/2^{l−s}⌋.

The s-pruned weight of item i comes down to pruning the l-bit representation of w(i) to the s most significant bits. Indeed, w_s(i) ≤ 2^s, w_0(i) = 0 for all items i, and w_l = w. We will need the fact that the sequence

|w_0(2^{[n]})|, |w_1(2^{[n]})|, …, |w_l(2^{[n]})| = |w(2^{[n]})|

is almost non-decreasing and relatively smooth. Observe that the sequence need not be non-decreasing. For example, when w = (3, 9, 12) the number of bits is l = 5, and

w_0 = (0, 0, 0), |w_0(2^{[n]})| = 1,
w_1 = (0, 0, 0), |w_1(2^{[n]})| = 1,
w_2 = (0, 1, 1), |w_2(2^{[n]})| = 3,
w_3 = (0, 2, 3), |w_3(2^{[n]})| = 4,
w_4 = (1, 4, 6), |w_4(2^{[n]})| = 8,
w_5 = (3, 9, 12), |w_5(2^{[n]})| = 7.

Nevertheless, we show that such decreases are only an artefact of smaller order rounding errors and that the sequence in fact is smooth in the following precise sense:
Lemma 3.7.
Let w : [n] → ℕ be an item weight function. Then for all s ∈ [l]:

(1/(2n+1)) · |w_s(2^{[n]})| ≤ |w_{s−1}(2^{[n]})| ≤ (n+1) · |w_s(2^{[n]})|.

Proof. Let A ⊆ [n]. Since ⌊w(i)/2^{l−s+1}⌋ = ⌊⌊w(i)/2^{l−s}⌋/2⌋ for every item i, we can bound w_{s−1}(A) as follows:

∑_{i∈A} (⌊w(i)/2^{l−s}⌋ − 1)/2 ≤ w_{s−1}(A) ≤ ∑_{i∈A} (⌊w(i)/2^{l−s}⌋ + 1)/2,

and therefore

(1/2)(w_s(A) − n) ≤ w_{s−1}(A) ≤ (1/2)(w_s(A) + n).

Hence, for each value of w_s(A) there are at most n + 1 possible values of w_{s−1}(A), i.e.

|w_{s−1}(2^{[n]})| ≤ (n+1) · |w_s(2^{[n]})|.

Conversely, rearranging the same inequality shows that for every A ⊆ [n]

2 · w_{s−1}(A) − n ≤ w_s(A) ≤ 2 · w_{s−1}(A) + n.

Hence, for each value of w_{s−1}(A) there are at most 2n + 1 possible values of w_s(A), i.e.

|w_s(2^{[n]})| ≤ (2n+1) · |w_{s−1}(2^{[n]})|.

A part of our strategy is to use techniques from Lemma 3.4 to deal with bins that are largely empty. However, the analogous Dynamic Programming table needs to be indexed by w_s for some s ∈ [l]; therefore we will work only with finite precision (similar to, e.g., approximation schemes for Knapsack; see e.g. [37, Section 11.8]). The precision parameter we will be using is the following:

Definition 3.8 (Critical pruner). Let δ ∈ (0, 1) be a fixed parameter such that |w(2^{[n]})| ≥ 2^{δn}. We define the critical pruner θ as

θ := θ(δ) := min{ s ∈ ℕ : |w_s(2^{[n]})| ≥ 2^{δn} }.

Observe that |w_θ(2^{[n]})| = Θ*(2^{δn}) by Lemma 3.7 and the fact that w_0(2^{[n]}) = {0}. Furthermore, by Corollary A.2 the critical pruner θ can be computed in O*(2^{δn}) time.

Definition 3.9 (Slack). The slack of a bin j is c_j − ∑_{i∈S_j} w(i). A bin has δ-large slack if it has slack at least n · 2^{l−θ} and δ-small slack otherwise. We often omit δ, as δ will be fixed later in Subsection 3.6. An item is a large slack item if it is in a bin of large slack and a small slack item otherwise.
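A small executable sketch of Definitions 3.6 and 3.8 (the weight vector and the threshold are illustrative choices, and the distinct subset sums are computed by brute force rather than with the output-sensitive routine of Lemma A.1):

```python
from math import ceil, log2

def pruned(w, s, l):
    # s-pruned weights: keep the s most significant of the l bits (Definition 3.6)
    return [wi >> (l - s) for wi in w]

def distinct_subset_sums(w):
    # all values w(X) for X subset of [n], built incrementally
    sums = {0}
    for wi in w:
        sums |= {v + wi for v in sums}
    return sums

def critical_pruner(w, bound):
    # smallest s with |w_s(2^[n])| >= bound (Definition 3.8, brute force)
    l = 1 + ceil(log2(max(w)))
    for s in range(l + 1):
        if len(distinct_subset_sums(pruned(w, s, l))) >= bound:
            return s
    return l

w = [3, 9, 12]
l = 1 + ceil(log2(max(w)))                    # l = 5
seq = [len(distinct_subset_sums(pruned(w, s, l))) for s in range(l + 1)]
print(seq)                                    # [1, 1, 3, 4, 8, 7]: not monotone at the end
print(critical_pruner(w, 5))                  # -> 4
```

This reproduces the non-monotone example sequence above and shows how θ picks out the first pruning level whose sum count crosses the threshold.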
3.4 Detecting a balanced solution with many small slack items

In the next lemma we solve Bin Packing instances with at least (1/2 − α)n small slack items.

Lemma 3.10.
Suppose |w(2^{[n]})| ≥ 2^{δn} and 0 < α ≤ 2^{−20/δ⁴}. If a Bin Packing instance has a solution that is α-balanced and has at least (1/2 − α)n items with δ-small slack, then such a solution can be found in time O(2^m · 2^{(1−f_B(δ))n}) for f_B(δ) = Ω_{δ→0}(2^{−40/δ⁴}).

Proof. Use Corollary A.2 to compute the critical pruner θ in time O*(2^{δn}). Then iterate over all combinations of sets L, R ⊆ [m] that form a partition of [m]. For each such partition, the algorithm searches for (L, R)-witnesses of size n/2 ± αn as follows: First, enumerate 𝒲, which is defined as

𝒲 := { W ⊆ [n] : |W| ∈ [n/2 ± αn], w_θ(W) ∈ [ ∑_{j∈L} c_j/2^{l−θ} − (|L| + 1) · n, ∑_{j∈L} c_j/2^{l−θ} ] }.

We can enumerate 𝒲 in time O*(2^{n/2} + |𝒲|) with a standard Meet-in-the-Middle approach (see e.g. [6, Section 3.2] or [44, Lemma 3.8]). Next, for every W ∈ 𝒲 we determine whether W is an (L, R)-witness. This is done by computing the boolean l_X for every X ∈ ↓𝒲 and r_X for every X ∈ ↑𝒲, where

l_X := 1 if X can be divided over the bins in L, and 0 otherwise,
r_X := 1 if [n] \ X can be divided over the bins in R, and 0 otherwise.

Using Theorem 2.6 we can do this in time O((|↓𝒲| + |↑𝒲|)n). Next, the algorithm checks for all W ∈ 𝒲 whether l_W = r_W = 1, and if so the algorithm returns yes. If for no partition L, R of [m] the algorithm finds a witness, the algorithm returns no.

Correctness of Algorithm
Assume that there is an α-balanced solution S_1, …, S_m. Let π : [m] → [m] be a permutation of the bins such that all bins with small slack come before the large slack bins, i.e. π^{−1}(j) ≤ π^{−1}(j′) for every small slack bin j and large slack bin j′. Since we assumed the solution to be α-balanced, there exists a b ∈ [m] such that ∑_{j=1}^{b} |S_{π(j)}| ∈ [n/2 ± αn]. Take L = {π(1), …, π(b)} and R = {π(b+1), …, π(m)}. Notice that S_L = ⋃_{j=1}^{b} S_{π(j)} is an (L, R)-witness. We will prove that in the iteration of the algorithm where this partition L, R is chosen, it holds that S_L ∈ 𝒲.

Since there are at least (1/2 − α)n small slack items, all bins in L have small slack. Hence,

∑_{j=1}^{b} (c_{π(j)} − n · 2^{l−θ}) ≤ w(S_L) ≤ ∑_{j=1}^{b} c_{π(j)}.

Then, using the bound on the pruned weights of the items,

w_θ(S_L) ≤ w(S_L)/2^{l−θ} ≤ w_θ(S_L) + n,

we can conclude that

∑_{j=1}^{b} c_{π(j)}/2^{l−θ} − (b+1) · n ≤ w_θ(S_L) ≤ ∑_{j=1}^{b} c_{π(j)}/2^{l−θ}.

Since moreover |S_L| ∈ [n/2 ± αn], we conclude that the set S_L is present in

𝒲 = { W ⊆ [n] : |W| ∈ [n/2 ± αn], w_θ(W) ∈ [ ∑_{j=1}^{b} c_{π(j)}/2^{l−θ} − (b+1) · n, ∑_{j=1}^{b} c_{π(j)}/2^{l−θ} ] }.

Notice that l_W = r_W = 1 if and only if W is an (L, R)-witness, since we chose L and R to partition [m]. Because S_L ∈ 𝒲, the algorithm always returns yes on a yes-instance. Furthermore, when we find a W such that l_W = r_W = 1, we can conclude that there is a solution to the Bin Packing instance, since all items are divided over all bins.

Runtime Analysis
It remains to analyze the runtime of the algorithm. Recall that the algorithm iterates over all O(2^m) combinations of L and R. Each iteration takes O((|↓𝒲| + |↑𝒲|)n + 2^{n/2}) time. First we prove that |↓𝒲| + |↑𝒲| = O(2^{(1−f_B(δ))n}). Recall that θ is the critical pruner, and therefore by definition |w_θ(2^{[n]})| ≥ 2^{δn}. Theorem 4.1 states that if β(w_θ) ≥ 2^{(1−ε′)n}, then |w_θ(2^{[n]})| ≤ 2^{δ′n} where

δ′ = O_{ε′→0}( log log(1/ε′) / √(log(1/ε′)) ).

By the bound log log(1/ε′) ≤ (log(1/ε′))^{1/4}, this implies

δ′ ≤ O_{ε′→0}( 1/(log(1/ε′))^{1/4} )  ⇔  log(1/ε′) ≥ Ω(1/δ′⁴)  ⇔  ε′ ≤ 2^{−Ω(1/δ′⁴)}.

Hence, if we denote ε(δ) := 2^{−16/δ⁴}, there is some constant δ₀ > 0 such that for all δ ≤ δ₀ it holds that if |w_θ(2^{[n]})| ≥ 2^{δn}, then β(w_θ) ≤ 2^{(1−ε(δ))n}. Note that because we only claim an asymptotic time bound in the lemma, we may assume that δ ≤ δ₀. As a consequence, for every fixed weight value v there are at most 2^{(1−ε(δ))n} sets W ⊆ [n] that have weight w_θ(W) = v, and so |𝒲| ≤ 2mn · 2^{(1−ε(δ))n}.

Knowing this, we still need to bound the sizes of ↓𝒲 and ↑𝒲. Let γ_δ := ε(δ)/(8 log(12/ε(δ))). We can describe every X ∈ ↓𝒲 either as a set of size at most (1/2 − γ_δ)n (if |X| ≤ (1/2 − γ_δ)n), or as a subset of a W ∈ 𝒲 together with the items that W and X differ on (if |X| ≥ (1/2 − γ_δ)n). In the latter case, the sets differ on at most 2γ_δ n items, since |W| ≤ (1/2 + α)n and α ≤ γ_δ. Therefore

|↓𝒲| ≤ n · binom(n, (1/2 − γ_δ)n) + |𝒲| · binom(n, 2γ_δ n) ≤ n · 2^{h(1/2 − γ_δ)n} + 2mn · 2^{(1 − ε(δ) + h(2γ_δ))n}.

Notice that |↑𝒲| can be bounded in the same way. Now we apply Lemma B.2 with x := ε(δ), γ := γ_δ, b := 1 and c := 2. We assumed 0 < α ≤ 2^{−20/δ⁴}.
Note there is some δ₁ > 0 such that for any δ satisfying 0 < δ ≤ δ₁ we have

α ≤ 2^{−20/δ⁴} ≤ ε(δ)/(8 log(12/ε(δ))) = γ_δ.

Thus the condition of Lemma B.2 is satisfied and it states that h(1/2 − γ_δ) ≥ 1 − ε(δ) + h(2γ_δ). Lastly, Lemma B.1 tells us that h(1/2 − γ_δ) ≤ 1 − 2γ_δ²/ln 2. Hence, |↓𝒲| + |↑𝒲| = O(2^{(1−f_B(δ))n}) where

f_B(δ) = ε(δ)²/(32 ln 2 · (log(12/ε(δ)))²) = Ω_{δ→0}( ε(δ)²/log²(1/ε(δ)) ) = Ω_{δ→0}( δ⁸ · 2^{−32/δ⁴} ).

Since f_B(δ) ≤ 1/2 for δ ∈ (0, 1], the O(2^{n/2}) time bound is subsumed by the O(2^{(1−f_B(δ))n}) term. Multiplying this term by the 2^m different choices for L and R gives the requested runtime.

We are left to prove the remaining case. In this case we assume that the solution is α-balanced for some 0 < α < 1/(8m), that |w(2^{[n]})| ≥ 2^{δn} for some δ > 0, and that there are at most (1/2 − α)n small slack items with respect to the critical pruner θ. First we observe the following property of an α-balanced solution.

Observation 3.11.
Let S_1, …, S_m be an α-balanced solution for some 0 < α < 1/(8m). Assume k, k′ ∈ [m] to be two different bins with the most items. Then either:

1. |S_k|, |S_{k′}| ∈ [(1/2 ± α)n], or
2. |S_j| ≤ (1/2 − 1/(4m))n for all bins j.

Proof. If condition 2 does not hold, then we know that |S_k| > (1/2 − 1/(4m))n. That means that on average the other bins have (n − |S_k|)/(m−1) items, so that |S_{k′}| ≥ (n − |S_k|)/(m−1). Hence,

|S_k| + |S_{k′}| ≥ ( 1/2 − 1/(4m) + (1/2 + 1/(4m))/(m−1) ) n > ( 1/2 − 1/(4m) + 1/(2(m−1)) ) n ≥ ( 1/2 − 1/(4m) + 1/(2m) ) n = ( 1/2 + 1/(4m) ) n.

Since the solution is α-balanced (with α ≤ 1/(8m), so that (1/2 + 1/(4m))n ≥ (1/2 + 2α)n), for all permutations π, in particular those with π^{−1}(k) = 1 and π^{−1}(k′) = 2, there exists a b ∈ [m] such that ∑_{j=1}^{b} |S_{π(j)}| ∈ [n/2 ± αn]. Because |S_k| + |S_{k′}| > (1/2 + 2α)n, we know that b = 1 and |S_k| ∈ [n/2 ± αn]. We can conclude the same for k′ by repeating these last arguments for all permutations π with π^{−1}(k) = 2 and π^{−1}(k′) = 1. Thus condition 1 must hold, and the observation follows.
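Definition 3.3 and Observation 3.11 can be checked directly on toy bin-size profiles; this brute-force sketch enumerates all bin orderings (feasible only for small m, and the example sizes are illustrative):

```python
from itertools import permutations

def is_balanced(sizes, alpha):
    # alpha-balanced (Definition 3.3): every ordering of the bins has a prefix
    # whose total number of items lies in [n/2 - alpha*n, n/2 + alpha*n]
    n = sum(sizes)
    lo, hi = n / 2 - alpha * n, n / 2 + alpha * n
    for perm in permutations(sizes):
        total, ok = 0, False
        for s in perm:
            total += s
            if lo <= total <= hi:
                ok = True
                break
        if not ok:
            return False
    return True

# n = 20, m = 4, alpha = 0.04, so the window is [9.2, 10.8]
assert is_balanced([5, 5, 5, 5], 0.04)      # every two-bin prefix hits 10
assert not is_balanced([9, 9, 1, 1], 0.04)  # ordering (9, 9, 1, 1) jumps from 9 to 18
```

In the second example the two largest bins together hold 18 > (1/2 + 2α)n = 11.6 items while neither has size in [n/2 ± αn], which by the argument in the proof of Observation 3.11 already rules out α-balancedness.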
Assume 0 < α < 1/(8m). If a solution of a Bin Packing instance with m bins is α-balanced and has at most (1/2 − α)n items with δ-small slack, then with constant probability a solution can be found in time O(2^{(1 − f_C(m) + δm)n}) with f_C(m) = Ω_{m→∞}( h(1/(2m))² / log²(h(1/(2m))) ).

Proof. For an overview of the algorithm, see Algorithm 1. Compute the critical pruner θ and the set w_θ(2^{[n]}) in O*(2^{δn}) time with Corollary A.2. The algorithm searches for (L, R)-witnesses for all L, R ⊆ [m] such that |R| = 1 and L ∩ R = ∅. For notational purposes, assume without loss of generality that R = {1}, L = {2, …, k}, and let M = {k+1, …, m}. Let 𝒲 ⊆ binom([n], n/2) be sampled uniformly at random with size 2^{(1−g(m))n}, where g(m) = h(1/(2m))/2. For given L and R, we guess a_{k+1}, …, a_m ∈ w_θ(2^{[n]}). Then we compute the boolean l_X for every X ∈ ↓𝒲 and r_X for every X ∈ ↑𝒲, where

l_X := 1 if there exists a partition X_2, …, X_k, Y_{k+1}, …, Y_m of X such that for all j ∈ L: w(X_j) ≤ c_j and for all j′ ∈ M: w_θ(Y_{j′}) ≤ a_{j′}, and 0 otherwise,

r_X := 1 if there exists a partition X_1, Y′_{k+1}, …, Y′_m of [n] \ X such that w(X_1) ≤ c_1 and for all j′ ∈ M: w_θ(Y′_{j′}) ≤ c_{j′}/2^{l−θ} − n − a_{j′}, and 0 otherwise.

This can be done using fast zeta/Möbius transforms. For j ∈ L ∪ R define the functions f_j : 2^{[n]} → {0, 1} as

f_j(X) = 1 if w(X) ≤ c_j, and 0 otherwise,

and for j′ ∈ M define the functions f^{θ,1}_{j′}, f^{θ,2}_{j′} : 2^{[n]} → {0, 1} as

f^{θ,1}_{j′}(X) = 1 if w_θ(X) ≤ a_{j′}, and 0 otherwise,
f^{θ,2}_{j′}(X) = 1 if w_θ(X) ≤ c_{j′}/2^{l−θ} − n − a_{j′}, and 0 otherwise.

Now, we observe the following:

Claim 3.13. l_X = 1 if and only if (f_2 ∗_c ⋯ ∗_c f_k ∗_c f^{θ,1}_{k+1} ∗_c ⋯ ∗_c f^{θ,1}_m)(X) > 0.

Proof. First assume that l_X = 1, and let X_2, …, X_k, Y_{k+1}, …, Y_m be a partition of X as in the definition of l_X. This partition gives a non-zero contribution to (f_2 ∗_c ⋯ ∗_c f_k ∗_c f^{θ,1}_{k+1} ∗_c ⋯ ∗_c f^{θ,1}_m)(X), and hence the cover product must be positive.

For the other direction, if (f_2 ∗_c ⋯ ∗_c f_k ∗_c f^{θ,1}_{k+1} ∗_c ⋯ ∗_c f^{θ,1}_m)(X) > 0, there exist S′_2, …, S′_m such that each factor evaluates to 1 on its set S′_j and S′_2 ∪ ⋯ ∪ S′_m = X. Observe that we can transform this into a partition S″_2, …, S″_m of X by choosing S″_j ⊆ S′_j. Because the factors do not decrease when taking subsets, they still evaluate to 1 on the sets S″_j, and thus l_X = 1.

Similarly, we can argue that r_X = 1 if and only if (f_1 ∗_c f^{θ,2}_{k+1} ∗_c ⋯ ∗_c f^{θ,2}_m)([n] \ X) > 0. We can compute the booleans l_X and r_X in time O((|↓𝒲| + |↑𝒲|)nm) by combining Theorem 2.3 and Theorem 2.5. Finally, if we find a W ∈ 𝒲 such that l_W = r_W = 1, we can return yes.

Constant probability of a witness in 𝒲

Recall that the set 𝒲 is a random subset of binom([n], n/2) of size 2^{(1−g(m))n} with g(m) = h(1/(2m))/2. We first analyze the number of (L, R)-witnesses that are in binom([n], n/2), and with that we will show that the probability that 𝒲 contains such a witness is constant. Assume that there is an α-balanced solution S_1, …, S_m for some 0 < α < 1/(8m). We use Observation 3.11 to conclude that either |S_j| ≤ (1/2 − 1/(4m))n for all bins, or |S_k|, |S_{k′}| ∈ [n/2 ± αn] for the two largest bins k and k′. Since we assumed that there are at most (1/2 − α)n small slack items, in the latter case bins k and k′ are large slack bins. In either case, we conclude that |S_j| ≤ (1/2 − 1/(4m))n for all small slack bins.

Algorithm: BinPacking(w_1, …, w_n)
Output: yes (whp), if an α-balanced solution with at most (1/2 − α)n small slack items exists
  Compute the critical pruner θ with Corollary A.2.                // in time O*(2^{δn})
  Compute the set w_θ(2^{[n]}) with Lemma A.1.                     // in time O*(2^{δn})
  Choose 𝒲 to be a set of 2^{(1−g(m))n} random subsets of [n] of size n/2.
  for L, R ⊆ [m] such that |R| = 1 and L ∩ R = ∅ do                // m · 2^m repetitions
      Assume without loss of generality that R = {1}, L = {2, …, k}.
      for a_{k+1}, …, a_m ∈ w_θ(2^{[n]}) do                        // |w_θ(2^{[n]})|^{m−k} repetitions
          Compute l_X for all X ∈ ↓𝒲.                              // in time O((|↓𝒲| + |↑𝒲|)nm)
          Compute r_X for all X ∈ ↑𝒲.                              // in time O((|↓𝒲| + |↑𝒲|)nm)
          if l_W = r_W = 1 for some W ∈ 𝒲 then return yes
  return no

Algorithm 1: Overview of the algorithm for Lemma 3.12

We assume without loss of generality that bin 1 is the largest small slack bin and that bins 2, …, k are the other small slack bins. Then let L = {2, …, k} and R = {1}, so that M = {k+1, …, m} consists of all large slack bins. We will lower bound the number of (L, R)-witnesses of size n/2.

Figure 2: Overview of (L, R)-witnesses of size n/2, illustrating equation (2). Let x be the number of small slack items. Any such (L, R)-witness W must include all items of S_2, …, S_k and exclude all items of S_1. The other items of W can be any combination of large slack items, which are exactly the items in the bins of M.

Let x be the number of small slack items in the solution. Note that the number of (L, R)-witnesses of size n/2 is equal to

binom(n − x, n/2 − (x − |S_1|)),    (2)

since the sets S_2, …, S_k together with any subset of n/2 − (x − |S_1|) large slack items form a witness. See Figure 2 for an illustration of this. Since binom(n − x, n/2 − (x − |S_1|)) ≥ binom(n − x, n/2 − x), if x ≤ n/4 there are at least binom(n/2, n/4) (L, R)-witnesses of size n/2.

If x ≥ n/4, then notice that x/m ≤ |S_1| ≤ (1/2 − 1/(4m))n, because S_1 is the largest small slack bin. Therefore, the number of witnesses is at least the number of ways to choose the n/2 − |S_1| large slack items to exclude from the witness. Thus the number of witnesses of size n/2 is at least

binom(n − x, n/2 − |S_1|) ≥ min{ binom(n/2, n/2 − x/m), binom(n/2, n/2 − (n/2 − n/(4m))) } ≥ binom(n/2, n/(4m)).
So in both cases for x, we can conclude that the number of (L, R)-witnesses of size n/2 is at least 2^{g(m)n}. Thus the algorithm will detect the solution with probability

1 − Pr[𝒲 has no witness] = 1 − (1 − 2^{g(m)n}/2^n)^{|𝒲|} ≥ 1 − exp(−2^{g(m)n} · |𝒲| / 2^n) = 1 − exp(−|𝒲| / 2^{(1−g(m))n}),

where we use 1 − x ≤ exp(−x). Hence, if we take 𝒲 to be a set of 2^{(1−g(m))n} random subsets of [n] of size n/2, with constant probability there will be an (L, R)-witness of the solution in 𝒲.

Correctness of Algorithm
The algorithm returns yes if and only if l_W = r_W = 1 for some W ∈ 𝒲. So if it returns yes, by definition there exist a partition X_2, …, X_k, Y_{k+1}, …, Y_m of W and a partition X_1, Y′_{k+1}, …, Y′_m of [n] \ W. Together they partition all items. Notice that by definition X_j can be put into bin j for all j ∈ L ∪ R. Hence we are left to prove that for all j ∈ M the set X_j := Y_j ∪ Y′_j can be put into bin j. Notice that since f^{θ,1}_j(Y_j) = f^{θ,2}_j(Y′_j) = 1 we have that

∑_{i∈X_j} w_θ(i) ≤ a_j + c_j/2^{l−θ} − n − a_j  ⟹  ∑_{i∈X_j} ⌊w(i)/2^{l−θ}⌋ ≤ c_j/2^{l−θ} − n  ⟹  ∑_{i∈X_j} (w(i) − 2^{l−θ}) ≤ c_j − n · 2^{l−θ}  ⟹  ∑_{i∈X_j} w(i) ≤ c_j,

and so indeed the items of X_j fit into bin j, and we have a yes-instance.

For the implication in the other direction, we prove that if there exists a solution, the algorithm finds it with constant probability. We already showed that with constant probability there is an (L, R)-witness W ∈ 𝒲. Next, we prove that there exist a_{k+1}, …, a_m ∈ w_θ(2^{[n]}) such that l_W = r_W = 1 for such a witness W. Let S_2, …, S_k, S_{k+1} ∩ W, …, S_m ∩ W be the partition of W from the definition of l_W, and let S_1, S_{k+1} \ W, …, S_m \ W be the partition of [n] \ W from the definition of r_W.

Note that for all j ∈ L ∪ R it holds that w(S_j) ≤ c_j, because S_1, …, S_m is a solution. So we are left to prove that for all j ∈ M there exists an a_j ∈ w_θ(2^{[n]}) such that w_θ(S_j ∩ W) ≤ a_j and w_θ(S_j \ W) ≤ c_j/2^{l−θ} − n − a_j. Recall that we assumed that the bins of M are large slack bins. Hence we know that for j ∈ M:

∑_{i∈S_j} w(i) ≤ c_j − n · 2^{l−θ}  ⟹  ∑_{i∈S_j} ⌊w(i)/2^{l−θ}⌋ ≤ c_j/2^{l−θ} − n  ⟹  ∑_{i∈S_j} w_θ(i) ≤ (c_j/2^{l−θ} − n − a_j) + a_j.
So, take a_j = w_θ(S_j ∩ W) ∈ w_θ(2^{[n]}), and indeed the correctness of the algorithm follows.

Runtime Analysis

The algorithm goes through the procedure of computing the booleans l_X and r_X for all different sets L, R ⊆ [m] with |R| = 1 and for all different values of a_{k+1}, …, a_m ∈ w_θ(2^{[n]}). This gives a total of at most m · 2^m · |w_θ(2^{[n]})|^m repetitions. By Lemma 3.7, we have |w_θ(2^{[n]})| ≤ (2n+1) · |w_{θ−1}(2^{[n]})|. Because θ is the critical pruner, and since w_0(2^{[n]}) = {0}, we know that |w_{θ−1}(2^{[n]})| ≤ 2^{δn}. Hence, the number of repetitions is at most O*(2^m · 2^{δmn}).

Now we analyze the time complexity per choice of (L, R) and a_{k+1}, …, a_m. Recall that we chose 𝒲 ⊆ binom([n], n/2) as a random set of size 2^{(1−g(m))n}. Computing all the booleans l_X and r_X can be done in O((|↓𝒲| + |↑𝒲|)nm) time. Let γ_m = g(m)/(4 log(12/g(m))). Notice that we can describe every X ∈ ↓𝒲 either as a set of size at most (1/2 − γ_m)n (if |X| ≤ (1/2 − γ_m)n), or as a subset of a set W ∈ 𝒲 together with the items that W and X differ on (if |X| ≥ (1/2 − γ_m)n). In the latter case, the sets differ on at most γ_m n items. Therefore:

|↓𝒲| ≤ n · binom(n, (1/2 − γ_m)n) + |𝒲| · binom(n, γ_m n) ≤ n · 2^{h(1/2 − γ_m)n} + 2^{(1 − g(m) + h(γ_m))n}.

Notice that |↑𝒲| can be bounded in the same way. We apply Lemma B.2 with b := c := 1, γ := γ_m and x := g(m). Since γ_m ≤ g(m)/(4 log(12/g(m))), it implies that h(1/2 − γ_m) ≥ 1 − g(m) + h(γ_m). Lastly, Lemma B.1 tells us that h(1/2 − γ_m) ≤ 1 − 2γ_m²/ln 2. Hence, |↓𝒲| + |↑𝒲| = O(2^{(1−f_C(m))n}) where

f_C(m) = g(m)²/(8 ln 2 · (log(12/g(m)))²) = h(1/(2m))²/(32 ln 2 · (log(24/h(1/(2m))))²) = Ω_{m→∞}( h(1/(2m))² / log²(h(1/(2m))) ).

Combining this with the number of repetitions, we get a running time of O(2^m · 2^{(1 − f_C(m) + δm)n}). This gives us the requested runtime.

We are now ready to prove Theorem 1.1, by combining all the work of the previous sections and setting the parameters α and δ:

Proof.
We will now combine all the previous lemmas. An overview of the algorithm can be found in Figure 3. To facilitate the asymptotic analysis, note that we can assume the number of bins m to be at least m₀ for some constant m₀: if this is not the case, we can add m₀ − m artificial bins with unique small capacities and matching items. Since m₀ is a constant, this does not influence the asymptotic running time of the algorithm. Define f_C(m) as in Lemma 3.12:

f_C(m) = h(1/(2m))²/(32 ln 2 · (log(24/h(1/(2m))))²) = Ω_{m→∞}( h(1/(2m))² / log²(h(1/(2m))) ).

Then set δ := f_C(m)/(2m) and α := 2^{−20/δ⁴}. Then f_C(m) > 0, δ > 0 and α > 0, and note that α < 1/(8m).

1. If |w(2^{[n]})| ≤ 2^{δn}, the algorithm from Lemma 3.4 solves the instance in time O*(2^{(f_C(m)/2)n}).

2. If the instance has an α-unbalanced solution, the algorithm from Lemma 3.5 can detect it with constant probability in time O(2^{2m + (1 − Ω(δ⁸ · 2^{−40/δ⁴}))n}).

[Figure 3: Overview of the use of the lemmas proving Theorem 1.1. A decision tree: if |w(2^{[n]})| ≤ 2^{δn}, apply Lemma 3.4; otherwise, if there is an α-unbalanced solution, apply Lemma 3.5; otherwise branch on the number of small slack items: at least (1/2 − α)n gives Lemma 3.10, at most (1/2 − α)n gives Lemma 3.12.]

3. If the instance has an α-balanced solution, |w(2^{[n]})| > 2^{δn}, and a solution with at least (1/2 − α)n small slack items, the upper bound α ≤ 2^{−20/δ⁴} ensures that the solution can be detected by the algorithm from Lemma 3.10 in time O(2^{m + (1 − Ω(2^{−40/δ⁴}))n}).
4. Otherwise, if the instance has an α-balanced solution, |w(2^[n])| > 2^{δn}, and a solution with at most (1/4 − α)n small slack items, the algorithm from Lemma 3.12 detects the solution with probability at least 1/2 in time O(2^{m + (1 − f_C(m)/2)n}).

Thus, we obtain a probabilistic algorithm for Bin Packing that runs in time O(2^{(1−σ_m)n}), where σ_m is a strictly positive number.

4 The Additive Combinatorics Result

In this section, we prove our additive combinatorics result, which we first restate for convenience:
Theorem 1.2 (restated). Let ε > 0. If β(w) ≥ 2^{(1−ε)n}, then |w(2^[n])| ≤ 2^{δn}, where

δ(ε) = O_{ε→0}( log(log(1/ε)) / √(log(1/ε)) ).

For our proof, it will be convenient to use a reformulation of Theorem 1.2 into a version with two set families that attain the parameters and that uses vector notation (so w is a vector and w(X) is the inner product ⟨w, x⟩ of w with the characteristic vector x of the set X):

Theorem 4.1 (Theorem 1.2 reformulated). Let w = (w_1, ..., w_n) ∈ Z^n be a vector with integer weights, and let A, B ⊆ {0,1}^n be such that |a^{−1}(1)| = αn for each a ∈ A and

(1) ⟨w, b⟩ = τ for every b ∈ B, and
(2) if a, a′ ∈ A and ⟨w, a⟩ = ⟨w, a′⟩, then a = a′.

If |B| ≥ 2^{(1−ε)n}, then |A| ≤ 2^{δ(ε)n}, where δ(ε) = O_{ε→0}( log(log(1/ε)) / √(log(1/ε)) ).

We first show that Theorem 4.1 implies Theorem 1.2:
Proof of Theorem 1.2 from Theorem 4.1.
Suppose w_1, ..., w_n and τ are such that |{X ⊆ [n] : w(X) = τ}| ≥ 2^{(1−ε)n}. Then B := {b ∈ {0,1}^n : ⟨w, b⟩ = τ} satisfies the conditions of Theorem 4.1 and |B| ≥ 2^{(1−ε)n}. For every i ∈ w(2^[n]) arbitrarily choose a vector a(i) ∈ {0,1}^n such that ⟨w, a(i)⟩ = i. Define A′ = {a(i) : i ∈ w(2^[n])}. Since |a^{−1}(1)| can only take n different values, there exists an α such that |a^{−1}(1)| = αn for at least a 1/n fraction of the elements of A′. This gives a set A that satisfies the conditions of Theorem 4.1, and thus |w(2^[n])| = |A′| ≤ |A| · n ≤ 2^{δ(ε)n + o(n)}, and we can use the O(·) notation in the term δ(ε) to hide the o(n) factors.

The rest of this section is dedicated to the proof of Theorem 4.1. We use the following standard definitions from additive combinatorics: for sets X, Y we define the sumset X + Y as {x + y : x ∈ X, y ∈ Y}. For an integer k, we define k · X as the k-fold sum X + X + ... + X (k times).

The starting point of the proof of Theorem 4.1 is the following simple lemma, which proves that |A| · |k · B| = |A + k · B|. It is heavily inspired by the UDCP connection from [3, Proposition 4.2].

Lemma 4.2. If a, a′ ∈ A and b, b′ ∈ k · B are such that a + b = a′ + b′, then (a, b) = (a′, b′).

Proof. Note that ⟨w, a⟩ + ⟨w, b⟩ = ⟨w, a + b⟩ = ⟨w, a′ + b′⟩ = ⟨w, a′⟩ + ⟨w, b′⟩. By the definition of B, we know that ⟨w, b⟩ = ⟨w, b′⟩ = k · τ, hence ⟨w, a⟩ = ⟨w, a′⟩. Therefore, by the definition of the set A, it has to be that a′ = a.
This implies that b = b′, since a + b = a′ + b′.

Thus |A| is equal to |A + k · B| / |k · B|, and we may restrict our attention to upper bounding the latter quantity for any integer k ∈ N. Since this is in general not easy, we instead define a set P ⊆ A × k · B of pairs such that for each (a, b) ∈ P the distribution of the values in the vector a + b is close to what one would expect for random vectors. This is useful since the control on pairs (a, b) ∈ P gives us control on the vectors a + b, which allows us to upper bound P. Moreover, we also provide a lower bound that shows that P is not much smaller than |A| · |k · B|. Combining the two bounds results in the upper bound for A. We will make this more formal in the next subsections, but first we give a warm-up result that sets up the notation for the main proof.

4.1 A Warm-up with B = {0,1}^n

Let us first investigate what happens in the case when B is equal to the whole Boolean hypercube {0,1}^n. While |A| can easily be upper bounded by direct methods, it is instructive to see what our approach will be in this special case. In this setting we can think about vectors from B as sampled uniformly at random. Fix a parameter 0 < α < 1, let a ∈ {0,1}^n be a fixed, adversarially chosen vector with |a^{−1}(1)| = αn, and let b_1, ..., b_k ∈ {0,1}^n be independently sampled random vectors. Let b = b_1 + b_2 + ... + b_k and c = a + b. Observe that for every i ∈ {0, ..., k} and i′ ∈ {0, ..., k + 1},

E[ |b^{−1}(i)| / n ] = C(k, i) · 2^{−k},  and  E[ |c^{−1}(i′)| / n ] = ( (1 − α) C(k, i′) + α C(k, i′ − 1) ) · 2^{−k}.

For further reference, we now define these distributions explicitly:
Definition 4.3 ((Altered) Binomial Distribution). For every k ∈ N, we let Bin(k) denote the binomial distribution ({0, ..., k}, p) where p(i) = C(k, i) · 2^{−k}. For an additional parameter α ∈ (0, 1), we define the altered binomial distribution Bin(k, α) as ({0, ..., k + 1}, p′) where p′(i) = (1 − α) C(k, i) 2^{−k} + α C(k, i − 1) 2^{−k}.

Note that
Bin(k + 1) = Bin(k, 1/2) by Pascal's formula. Now, we present the intuition for the random case. We have that

n · h(Bin(k, α)) = h(c) = h(a, b) = h(a) + h(b) = h(a) + n · h(Bin(k)),

where the second equality follows from Lemma 4.2 and the third equality follows because a and b are independent. Thus h(a) = n (h(Bin(k, α)) − h(Bin(k))), and the proof in the random case can be concluded by using Lemma 4.8, in which we show that for any constant α ∈ (0, 1) it holds that h(Bin(k, α)) − h(Bin(k)) = O_{k→∞}((log k)/√k), and the standard fact that the support of any uniform random variable of entropy h has size at most 2^h.

To extend this to the setting where B ⊂ {0,1}^n, we need to obtain vectors b_1, ..., b_k that are sufficiently random. This condition will translate to ε ≤ 1/2^{O(k)}, which will enforce k := O(log(1/ε)). The following chain of (in)equalities summarizes the strategy of our proof:

|A| =_{Lem. 4.2} |A + k·B| / |k·B| ≤_{Sec. 4.2} (|P| / |k·B|) · 2^{f(ε,k) n} ≤_{Sec. 4.3} 2^{n (h(Bin(k+1)) − h(Bin(k)))} · 2^{f(ε,k) n} ≤_{Lem. 4.8} 2^{δ(ε) n}.

Now, we will make the above idea more precise.
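As an aside, Definition 4.3, the Pascal identity Bin(k + 1) = Bin(k, 1/2), and the O((log k)/√k) entropy gap invoked above are all easy to sanity-check numerically. The following sketch is our own illustration (the parameter ranges are arbitrary, not from the paper):

```python
from math import comb, log2, sqrt

def bin_k(k):
    """Bin(k): p(i) = C(k, i) * 2^-k on {0, ..., k}."""
    return [comb(k, i) / 2**k for i in range(k + 1)]

def bin_k_alpha(k, alpha):
    """Altered binomial distribution Bin(k, alpha) on {0, ..., k + 1}."""
    c = lambda i: comb(k, i) if 0 <= i <= k else 0
    return [((1 - alpha) * c(i) + alpha * c(i - 1)) / 2**k for i in range(k + 2)]

def H(p):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(q * log2(q) for q in p if q > 0)

# Pascal's formula: Bin(k, 1/2) equals Bin(k + 1).
for k in range(1, 12):
    assert all(abs(a - b) < 1e-12
               for a, b in zip(bin_k_alpha(k, 0.5), bin_k(k + 1)))

# Gap between consecutive binomial entropies is at most log2(k)/sqrt(k).
for k in range(2, 200):
    assert H(bin_k(k)) - H(bin_k(k - 1)) <= log2(k) / sqrt(k)

print("Pascal identity and entropy gap verified")
```

The second loop checks the bound of Lemma 4.8 pointwise for small k; the lemma itself only claims it for large enough k.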
The following definition quantifies the 'sufficiently random' term from the previous subsection by measuring how far the distribution of the values of a vector is from a given (expected) distribution.
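Concretely, the condition being quantified here (every value ω occurring among a designated set of coordinates with frequency within ±γ of a reference probability p(ω)) can be expressed as a small checker. This sketch is our own illustration, with the reference distribution uniform on {0, 1}:

```python
def is_balanced(v, X, p, gamma):
    """Check that every outcome omega occurs among the coordinates X of v
    with frequency in [p[omega] - gamma, p[omega] + gamma]."""
    counts = {omega: 0 for omega in p}
    for i in X:
        counts[v[i]] += 1
    return all(abs(counts[omega] / len(X) - p[omega]) <= gamma
               for omega in p)

p = {0: 0.5, 1: 0.5}          # uniform reference distribution on {0, 1}
v = (0, 1, 0, 1, 1, 1)

assert is_balanced(v, range(4), p, gamma=0.0)      # exactly balanced on the first 4 coordinates
assert not is_balanced(v, range(6), p, gamma=0.1)  # on all 6 coordinates the 1-frequency is 2/3
assert is_balanced(v, range(6), p, gamma=0.2)
print("balance checks passed")
```

The same checker works verbatim for the binomial distributions used below, by passing a dictionary p over {0, ..., k}.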
Definition 4.4 (Balanced vectors). Let D = (Ω, p) be a discrete probability space. Fix γ ∈ (0, 1). Let U be the finite universe set and let X ⊆ U. A mapping (or a vector) v ∈ Ω^U is γ-D balanced for X if for all ω ∈ Ω it holds that

|v^{−1}(ω) ∩ X| / |X| ∈ [ p(ω) ± γ ].

As a shorthand, we say a mapping (or a vector) v ∈ Ω^U is γ-D balanced if it is γ-D balanced for U. We denote the set of all γ-D balanced vectors v ∈ Ω^U by (D ± γ)^U.

As an illustration of Definition 4.4, suppose U = {1, ..., 6}, X = {1, ..., 4}, Ω = {0, 1}, p(0) = p(1) = 1/2, and D = (Ω, p). Then (0, 1, 0, 1, 1, 1) is 0-D balanced for X but not 0-D balanced. The vector (0, 0, 0, 1, 1, 1) is not 0-D balanced for X but it is 0-D balanced.

We will use Definition 4.4 with D being the distribution we would get in the random case as outlined in Subsection 4.1 (hence D will usually be Bin(k) or Bin(k, α)). Now, we prove a general upper bound on the number of γ-D balanced vectors.

Lemma 4.5.
Let D = (Ω, p) be a discrete probability space. The number of γ-D balanced vectors is at most 2^{(h(D) + f(Ω, γ)) |U|}, where f(Ω, γ) := O(|Ω| · γ log(1/γ)).

Proof. The number of γ-D balanced vectors is at most Σ_q C(|U|; q·|U|), a sum of multinomial coefficients over all probability distributions q such that q · |U| is a vector with integer coordinates and q(ω) ∈ [p(ω) ± γ] for every ω ∈ Ω. Since the number of possibilities for such a q is at most |U|^{|Ω|} and C(|U|; q·|U|) ≤ 2^{h(q)|U|} by Lemma 2.8, we obtain

Σ_q C(|U|; q·|U|) ≤ |U|^{|Ω|} · 2^{h(q)|U|} ≤ |U|^{|Ω|} · 2^{(h(D) + ln(2) |Ω| γ log(1/γ)) |U|} ≤ 2^{h(D)|U|} · 2^{O(|U| |Ω| γ log(1/γ))},

where the second inequality follows from Lemma 2.9 and the third from |U|^{|Ω|} ≪ 2^{|U||Ω|}.

For example, Lemma 4.5 bounds the number of γ-Bin(k) balanced vectors by 2^{n h(Bin(k)) + n f(γ, k)} for some positive function f(γ, k) → 0 when γ → 0.

With Definition 4.4 in hand, we are ready to define the set of pairs mentioned at the start of this section:

P := { (a, b) ∈ A × k·B : b is ε^{0.…}-Bin(k) balanced for a }.

We will devote Section 5 to the proof of the following somewhat technical lemma:
Lemma 4.6.
Let k < 0.05 · log(1/ε). Then, for every a ∈ {0,1}^n with |a^{−1}(1)| > ε^{0.…} n, there exists E_a ⊆ k·B such that |E_a| ≥ 2^{(h(Bin(k)) − ε^{0.…}) n} and (a, b) ∈ P for every b ∈ E_a.

Note that we can assume that α > ε^{0.…}, because otherwise |A| ≤ C(n, ε^{0.…} n) ≤ 2^{ε^{0.…} log(4/ε) n} ≤ 2^{δn} and Theorem 4.1 follows automatically. Thus, we may apply Lemma 4.6 for each a ∈ A and obtain that

|P| ≥ |A| · 2^{(h(Bin(k)) − ε^{0.…}) n}.  (3)

On the other hand, the balancedness property can be used to give an upper bound on P via Lemma 4.2. To do so, the following will be useful:

Lemma 4.7. If (a, b) ∈ P, then a + b is (2ε^{0.…})-Bin(k, α) balanced.

Proof. From the definition of P, the vector b is ε^{0.…}-Bin(k) balanced for a. So, for every i ∈ {0, ..., k}:

|a^{−1}(1) ∩ b^{−1}(i)| ∈ [ C(k, i) αn 2^{−k} ± ε^{0.…} n ].

And similarly,

|a^{−1}(0) ∩ b^{−1}(i)| ∈ [ C(k, i) (1 − α) n 2^{−k} ± ε^{0.…} n ].

It follows that for every i ∈ {0, ..., k + 1} it holds that

|(a + b)^{−1}(i)| ∈ [ C(k, i) (1 − α) n 2^{−k} + C(k, i − 1) α n 2^{−k} ± 2ε^{0.…} n ].

Next we define the function η(a, b) := a + b. Observe that η is injective on A × k·B by Lemma 4.2, and since P ⊆ A × k·B we have |η(P)| = |P|. By Lemma 4.7, every vector in η(P) is (2ε^{0.…})-Bin(k, α) balanced, and thus Lemma 4.5 implies

|P| ≤ 2^{n · h(Bin(k, α)) + O(n k ε log(1/ε))}.  (4)

By combining (3) and (4) we obtain the following bound:

|A| ≤ 2^{n (h(Bin(k, α)) − h(Bin(k))) + O((ε^{0.…} + k ε log(1/ε)) n)}.  (5)

By Lemma B.4 we in fact have that h(Bin(k, α)) ≤ h(Bin(k + 1)), and thus it remains to bound the difference in entropy of two consecutive binomial distributions as follows:

Lemma 4.8.
For large enough k, we have that h(Bin(k)) − h(Bin(k − 1)) ≤ log(k)/√k.

Before we present the proof of Lemma 4.8, let us see how to use it. We choose k := Θ(log(1/ε)). Thus Lemma 4.8 implies that

|A| ≤ 2^{n (log k/√k + ε^{0.…} log(1/ε))} = 2^{O(n · δ(ε))}, where δ(ε) = O_{ε→0}( log(log(1/ε)) / √(log(1/ε)) ),

because ε^{0.…} log(1/ε) ≪ δ(ε) for small enough ε. This finishes the proof of Theorem 4.1.

Proof of Lemma 4.8.
For every i, k ∈ N such that i ≤ k, let us define an auxiliary function

f(k, i) := (C(k, i)/2^k) · log( 2^k / C(k, i) ).

Thus we have h(Bin(k)) = Σ_{i=0}^{k} f(k, i). To relate h(Bin(k)) with h(Bin(k − 1)), the following will be useful:

Claim 4.9. f(k, i) ≤ f(k − 1, i) if i < ⌊k/2⌋, and f(k, i) ≤ f(k − 1, i − 1) if i ≥ k/2.  (6)

Proof. Define g(x) = x · log(1/x). Since its derivative is g′(x) = −(ln(x) + 1)/ln(2), we have that g(x) ≤ g(x′) whenever x ≤ x′ ≤ 1/e.

Note that f(k, i) = g(C(k, i)/2^k), and since C(k, i)/2^k ≤ 2/√k by the standard bound C(k, i) ≤ 2^{k+1}/√k, we have C(k, i)/2^k ≤ 1/e for k large enough. Thus to prove the claim it remains to show that

C(k, i) 2^{−k} ≤ C(k − 1, i) 2^{−(k−1)} if i < ⌊k/2⌋, and C(k, i) 2^{−k} ≤ C(k − 1, i − 1) 2^{−(k−1)} if i ≥ k/2.

To see this, first suppose i < ⌊k/2⌋. Then we have that

C(k, i) 2^{−k} = C(k − 1, i) · (k/(k − i)) · 2^{−k} ≤ C(k − 1, i) 2^{−(k−1)}.

Second, if i ≥ k/2, then we have that

C(k, i) 2^{−k} = C(k − 1, i − 1) · (k/i) · 2^{−k} ≤ C(k − 1, i − 1) 2^{−(k−1)}.

Now we can use Claim 4.9 to give the required upper bound:

h(Bin(k)) = Σ_{i=0}^{k} f(k, i)
          = Σ_{i=0}^{⌊k/2⌋−1} f(k, i) + Σ_{i=⌊k/2⌋+1}^{k} f(k, i) + f(k, ⌊k/2⌋)
          ≤ Σ_{i=0}^{⌊k/2⌋−1} f(k − 1, i) + Σ_{i=⌊k/2⌋+1}^{k} f(k − 1, i − 1) + f(k, ⌊k/2⌋)
          = h(Bin(k − 1)) + f(k, ⌊k/2⌋)
          ≤ h(Bin(k − 1)) + log(k)/√k,

where we use Claim 4.9 in the first inequality, and C(k, ⌊k/2⌋) ≤ 2^{k+1}/√k in the second inequality. Hence h(Bin(k)) − h(Bin(k − 1)) ≤ log(k)/√k.

5 The Set k·B: Proof of Lemma 4.6

In this section we prove the following lemma, which we used in Section 4 to prove Theorem 1.2.
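The object at the heart of this section is the k-fold sumset k·B of a set of binary vectors. The sumset operations themselves are mechanical, and a toy experiment (entirely our own construction, not from the paper) illustrates the key property that every element of k·B has the same inner product k·τ with w:

```python
from itertools import product
from functools import reduce

def add(u, v):
    """Coordinate-wise addition of two vectors given as tuples."""
    return tuple(x + y for x, y in zip(u, v))

def sumset(X, Y):
    """X + Y = {x + y : x in X, y in Y}."""
    return {add(x, y) for x in X for y in Y}

def k_fold(X, k):
    """k . X = X + X + ... + X (k times)."""
    return reduce(sumset, [X] * k)

# Toy B: all vectors in {0,1}^4 with exactly two ones. With w = (1,1,1,1)
# they all share the inner product tau = 2, as in the setting of Theorem 4.1.
B = {v for v in product((0, 1), repeat=4) if sum(v) == 2}
kB = k_fold(B, 3)

# Every element of 3.B has inner product 3 * tau = 6 with w.
assert all(sum(b) == 6 for b in kB)
print(len(B), len(kB))
```

Note that k·B lives in {0, ..., k}^n rather than {0,1}^n; for instance (3, 3, 0, 0) = 3 · (1, 1, 0, 0) is an element of the toy 3·B above.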
Lemma 4.6 (restated). Let k < 0.05 · log(1/ε). Then, for every a ∈ {0,1}^n with |a^{−1}(1)| > ε^{0.…} n, there exists E_a ⊆ k·B such that |E_a| ≥ 2^{(h(Bin(k)) − ε^{0.…}) n} and (a, b) ∈ P for every b ∈ E_a.

Recall that P := { (a, b) ∈ A × k·B : b is ε^{0.…}-Bin(k) balanced for a }. Intuitively, we prove that for any fixed set B ⊆ {0,1}^n there exists a large set E_a ⊆ k·B with the following property: for every b ∈ E_a we can perturb ε^{0.…} n entries in b such that it is indistinguishable from a vector randomly sampled from the binomial distribution, even if we focus on the concrete subset of coordinates a^{−1}(1) ⊆ [n].

Figure 4: The ζ operation takes an x × y matrix c ∈ (Z^x)^y as input and outputs a vector ζ(c) ∈ Z^x by adding all columns.

First, observe that we can interpret a tuple (b_1, ..., b_k) ∈ B^k as the n × k matrix with the i'th column equal to b_i. We interchangeably address such a tuple as an n × k matrix and as an element of ({0,1}^n)^k. To emphasize the type of such variables, we denote such matrices with bold face. For example, (b_1, ..., b_k) is denoted by b ∈ ({0,1}^n)^k.

Using the notation b^T to denote the transpose of a matrix, we let C := { b^T : b ∈ B^k } ⊆ ({0,1}^k)^n denote the set of matrices B^k interpreted in the transposed way.

In Section 5.1 we show how to select a subset D ⊆ C of matrices in C, in such a way that for all b ∈ D ⊆ ({0,1}^k)^n, any column z ∈ {0,1}^k occurs n/2^k ± f(ε)n times in b, that is, |b^{−1}(z)| ∈ [ n/2^k ± f(ε)n ].

Next, in Section 5.2 we define the operation ζ((a_1, ..., a_x)) := Σ_{i=1}^{x} a_i, which sums the columns of a matrix a ∈ (Z^y)^x to a single column ζ(a) ∈ Z^y (see Figure 4). We consider the set E := { ζ(b^T) : b ∈ D } ⊆ {0, ..., k}^n, and argue that each vector in E is γ-Bin(k) balanced for some small γ > 0.

Finally, in Section 5.3 we take care of a ∈ {0,1}^n and select the set E_a ⊆ E to be all vectors in E that are ε^{0.…}-Bin(k) balanced for a^{−1}(1).

Uniform distribution.
We define the uniform distribution to be
Uni(Ω) = (Ω, p) where p(ω) = 1/|Ω| for each ω ∈ Ω. We will focus on the special cases Ω = {0, 1} and Ω = {0,1}^k. Thus, v ∈ (Uni({0,1}) ± γ)^{[n]} means that |v^{−1}(i)| ∈ [n/2 ± γn] for all i ∈ {0, 1}. Similarly, v ∈ (Uni({0,1}^k) ± γ)^{[n]} means that |v^{−1}(z)| ∈ [n/2^k ± γn] for all z ∈ {0,1}^k. If Ω is clear from the context, v ∈ Ω^U and X ⊆ U, we also say that a vector is γ-uniform for X to refer to the statement that it is γ-Uni(Ω) balanced for X.

Inequalities.
Throughout the section we assume that ε ≤ 1/2^{20k}, γ := ε^{1/4}, and that k > 0 is an integer. This means that the following inequalities hold:

ε^{1/4} ≤ 2^k γ ≤ ε^{1/5},  (7)

2(k + 1) · 2^k γ · log(1/(2^k γ)) ≤ 2 · ε^{1/5} · log²(1/ε) ≤ ε^{1/6}.  (8)

5.1 The set D of uniform k-tuples

We first prove the following result, which will be helpful to obtain the aforementioned set C.

Lemma 5.1 (Most vectors in B are uniform). Let U_1 ⊎ ... ⊎ U_ℓ = [n] be a partition such that |U_i| ≥ µn for all i ∈ [ℓ]. Let λ ∈ (0, 1/2). For every B ⊆ {0,1}^n with |B| ≥ 2^{(1 − µλ²)n + o(n)} it holds that

|{ b ∈ B : b is λ-uniform in U_i for every i }| ≥ |B|/2.

Proof.
For a fixed i, we argue that the number of vectors that are not λ-uniform for U_i is bounded by 2^{(1 − λ²µ)n + o(n)}. This will finish the proof, since we can sum this bound over all parts of the partition.

Let s = |U_i|/n and note that s ≥ µ. Observe that the number of vectors v ∈ {0,1}^n such that |v^{−1}(1) ∩ U_i| ∉ [sn/2 ± λsn] is at most

Σ_{λ′ ∉ [−λ, λ]} C(sn, sn/2 − λ′sn) · 2^{n − sn},

because a vector v that is not λ-uniform on U_i can be arbitrary on [n] \ U_i. We upper bound this with the binary entropy function by

Σ_{λ ≤ λ′ ≤ 1/2} 2^{sn · h(1/2 − λ′) + o(n)} · 2^{n − sn}.

The expression is maximized when λ′ = λ, because the entropy function h(p) is increasing on [0, 1/2]. Hence we can upper bound the expression by 2^{n(1 + s(h(1/2 − λ) − 1)) + o(n)}. Now, we use the bound h(1/2 − x) ≤ 1 − x² for 0 ≤ x ≤ 1/2 (recall that λ ∈ (0, 1/2)) and obtain that the number of vectors that are not λ-uniform for U_i is at most

2^{(1 − sλ²)n + o(n)} ≤ 2^{(1 − µλ²)n + o(n)}.

Thus, by summing over all U_i, the number of vectors that are not λ-uniform for some U_i is at most 2^{(1 − µλ²)n + o(n)}, and the number of vectors in B that are λ-uniform for all U_i is at least

|B| − 2^{(1 − µλ²)n + o(n)} ≥ |B|/2,

and the claim follows.

Set the balance parameter γ := ε^{1/4}, and define

D := C ∩ (Uni({0,1}^k) ± γ)^{[n]}.

Lemma 5.2 (Most k-tuples are uniform). Let k ∈ N be such that ε < 1/2^{4(k+2)}. Then it holds that

|D| ≥ (|B|/2)^k.

Proof. We denote by C_j ⊆ ({0,1}^j)^{[n]} the set of matrices obtained from C by keeping only the first j rows, namely C_j := { b^T : b ∈ B^j }; thus C = C_k. For j ∈ {1, ..., k}, let

D_j := C_j ∩ (Uni({0,1}^j) ± γ)^{[n]}.

We prove that |D_j| ≥ (|B|/2)^j by induction on j. First we prove the base case j = 1 of the induction, so |D_1| ≥ |B|/2.
This follows by applying Lemma 5.1 with λ = √ε and the trivial partition U_1 = [n], since it implies that

|B|/2 ≤ | B ∩ (Uni({0,1}) ± √ε)^{[n]} | ≤ | C_1 ∩ (Uni({0,1}) ± γ)^{[n]} | = |D_1|.

The induction step with j > 1 is a direct consequence of the following claim, which is therefore sufficient to finish the proof.

Claim 5.3.
Let b ∈ D_{j−1}. Then there are at least |B|/2 vectors b_j ∈ B such that b⁺ ∈ (Uni({0,1}^j) ± γ)^{[n]}, where b⁺ is obtained from b by appending b_j as the j'th row.

Proof. Define a partition {U_z}_{z ∈ {0,1}^{j−1}} of [n] by U_z = b^{−1}(z). Because b ∈ (Uni({0,1}^{j−1}) ± γ)^{[n]}, we know that

µ := min_{z ∈ {0,1}^{j−1}} |U_z| / n ≥ 2^{−(j−1)} − γ.

Note that µ > 1/2^j because we assumed that ε < 1/2^{4(k+2)} (hence γ < 1/2^j). Now, we use Lemma 5.1 with the partition {U_z}_{z ∈ {0,1}^{j−1}} and λ := 2^{j−3} · γ. First let us assert that the condition |B| ≥ 2^{(1 − µλ²)n + o(n)} holds. Recall that we assumed |B| ≥ 2^{(1−ε)n} and µλ² ≥ 2^{−j} (2^{j−3} γ)² = 2^{j−6} √ε ≥ ε (for small enough ε). Hence |B| ≥ 2^{(1−ε)n} ≥ 2^{(1 − µλ²)n + f(n)} for some function f ∈ o(n).

Lemma 5.1 states that there are at least |B|/2 vectors b_j ∈ B such that for each z ∈ {0,1}^{j−1}

|U_z ∩ b_j^{−1}(1)| ∈ [ |U_z|/2 ± λ|U_z| ].  (9)

We know that |U_z| ∈ [n/2^{j−1} ± γn] (because b ∈ (Uni({0,1}^{j−1}) ± γ)^{[n]}). Thus (9) can be rewritten to

|U_z ∩ b_j^{−1}(1)| ∈ [ n/2^j ± (λ|U_z| + (γ/2)n) ].

We bound λ|U_z| by

λ|U_z| = 2^{j−3} γ |U_z| ≤ 2^{j−3} γ (n/2^{j−1} + γn) = γn (1/4 + 2^{j−3} γ) = γn (1/4 + 2^{j−3} ε^{1/4}) < γn (1/4 + 2^{j−3} / 2^{k+2}) < γn (1/4 + 1/4) = (γ/2)n,

where we use the assumption ε ≤ 1/2^{4(k+2)} in the second line of the inequality. Thus, for every z ∈ {0,1}^{j−1} we have

|U_z ∩ b_j^{−1}(1)| ∈ [ n/2^j ± γn ].

Now, observe that for all z ∈ {0,1}^{j−1} it holds that

U_z ∩ b_j^{−1}(1) = b^{−1}(z) ∩ b_j^{−1}(1) = (b⁺)^{−1}(z′),

where z′ ∈ {0,1}^j is the vector obtained from z by adding a j-th entry with value 1. Thus the vector z′ fulfills the condition for b⁺ to be in (Uni({0,1}^j) ± γ)^{[n]}. Similarly we can prove this condition by concatenating a 0 to the vector z. Hence, for every b ∈ D_{j−1} there are at least |B|/2 vectors b_j ∈ B such that b⁺ ∈ (Uni({0,1}^j) ± γ)^{[n]}.

This claim proves our induction hypothesis and hence the lemma.

5.2 D gives many distinct sums

As mentioned in the beginning of this section, we define the operation ζ((a_1, ..., a_x)) := Σ_{i=1}^{x} a_i, which sums the columns of a matrix a ∈ (Z^y)^x to a single column ζ(a) ∈ Z^y (see Figure 4). We define E to be all sums of tuples from D:

E := { ζ(b^T) : b ∈ D } ⊆ {0, ..., k}^{[n]}.

In fact, by the assumption on D we have the following control on the distribution of the values in the vectors in E:

Lemma 5.4. If v ∈ E, then for j = 0, ..., k it holds that |v^{−1}(j)| ∈ [ C(k, j) n/2^k ± C(k, j) γn ], i.e., every vector in E is a (2^k γ)-Bin(k) balanced vector.

Proof. Consider an arbitrary vector v ∈ E and fix j ∈ {0, ..., k}. From the definition of E, there exists a vector b ∈ D = C ∩ (Uni({0,1}^k) ± γ)^{[n]} such that ζ(b^T) = v. Hence for every z ∈ {0,1}^k:

|b^{−1}(z)| ∈ [ n/2^k ± γn ].
Hence, if we sum over all vectors z ∈ {0,1}^k such that |z^{−1}(1)| = j, we have

|v^{−1}(j)| ≤ Σ_{z ∈ {0,1}^k : |z^{−1}(1)| = j} ( n/2^k + γn ) = C(k, j) n/2^k + C(k, j) γn,

and analogously |v^{−1}(j)| ≥ C(k, j) n/2^k − C(k, j) γn. Thus indeed v is a (2^k γ)-Bin(k) balanced vector, as desired.

We now show that E is sufficiently large:

Lemma 5.5.
It holds that |E| ≥ 2^{(h(Bin(k)) − ε^{0.…}) n}.

Proof. For a vector v ∈ E we define D_v := { b ∈ D : ζ(b^T) = v }. By grouping all elements of D by their image with respect to ζ:

|D| = Σ_{v ∈ E} |D_v| ≤ |E| · max_{v ∈ E} |D_v|,

so that

|E| ≥ |D| / max_{v ∈ E} |D_v| ≥ (|B|/2)^k / max_{v ∈ E} |D_v| ≥ 2^{k(n − εn − 1)} / max_{v ∈ E} |D_v|.  (10)

Thus in the remainder of the proof we can focus on showing that for any vector v ∈ E, |D_v| ≤ 2^{n(k − h(Bin(k)) + ε^{0.…})}; the lemma then follows by substituting this bound into (10).

Let b ∈ D_v. This means that for every j = 0, ..., k:

⋃_{z ∈ {0,1}^k : |z^{−1}(1)| = j} b^{−1}(z) = v^{−1}(j).

Thus the number of possibilities for b is

Π_{j=0}^{k} C(k, j)^{|v^{−1}(j)|}.

We multiply this quantity by the multinomial coefficient C([n]; |v^{−1}(0)|, ..., |v^{−1}(k)|) = C([n]; φ·n), where φ := (|v^{−1}(0)|/n, ..., |v^{−1}(k)|/n), and obtain

C([n]; φ·n) · Π_{j=0}^{k} C(k, j)^{|v^{−1}(j)|} ≤ 2^{kn},

where the inequality follows since the left-hand side counts partitions of [n] into Σ_{i=0}^{k} C(k, i) = 2^k parts. Thus by Lemma 2.8 we have |D_v| ≤ 2^{n(k − h(φ))}. Because v is (2^k γ)-Bin(k) balanced (since it is in E), we have

h(φ) ≥ h(Bin(k)) − ln(2) · 2(k + 1) · 2^k γ · log(1/(2^k γ)) ≥ h(Bin(k)) − ε^{0.…},

where the first inequality is by Lemma 2.9, and the second inequality uses that γ = ε^{1/4} (see Inequality (8)).

5.3 Selecting E_a ⊆ E for every a ∈ A

Lemma 5.6.
Let D = (Ω, p) be a discrete probability space, and let X ⊆ [n] with |X| = αn. The number of vectors v ∈ (D ± 0)^{[n]} that are not ρ-D balanced for X and [n] \ X is at most

2^{n ( h(D) − min{ α²ρ², α log(1/(2α)) } )}.

Proof. We define a relation R ⊆ (D ± 0)^{[n]} × C([n], αn) as follows:

(v, X) ∈ R  ⇔  v is not ρ-D balanced for X.

Additionally, let

R_v = R ∩ ( {v} × C([n], αn) ), for v ∈ (D ± 0)^{[n]},
R_X = R ∩ ( (D ± 0)^{[n]} × {X} ), for X ∈ C([n], αn).

Note that |R_X| is the value we want to bound. Note that the mapping (v, X) ↦ (v ∘ π, π(X)) for any permutation π : [n] ↔ [n] of the index set [n] is an automorphism of R (i.e., (v, X) ∈ R if and only if (π(v), π(X)) ∈ R). Therefore, we have

|R| = |(D ± 0)^{[n]}| · |R_v| = |R_X| · C(n, αn)  (11)

for fixed v and X. By (11) we can focus on bounding |R_v| instead of |R_X|. To do so, note that if (v, X) ∈ R for X ∈ C([n], αn), there must exist ω ∈ Ω such that |X ∩ v^{−1}(ω)| ∉ [p(ω)αn ± ραn]. We can construct any such X by first selecting a subset of v^{−1}(ω) (which has cardinality p(ω)n), and then choosing the remaining elements. Hence:

|R_v| ≤ Σ_{ω ∈ Ω} Σ_{x ∉ [−ρ, ρ]} C( p(ω)n, (p(ω) + x)αn ) · C( (1 − p(ω))n, (1 − p(ω) − x)αn ).

Next, we use Lemma B.3. In our case, with β := p(ω), α := α and ρ := −αx, it implies:

|R_v| ≤ Σ_{ω ∈ Ω} Σ_{x ∉ [−ρ, ρ]} C(n, αn) · 2^{−Fn},

where

F = (αx)² if |αx| < α(1 − α)p(ω), and F = α log(1/(2α)) otherwise.

Since |x| ≥ ρ, we have F ≥ min{ α²ρ², α log(1/(2α)) }, thus

|R_v| ≤ C(n, αn) · 2^{−min{ α²ρ², α log(1/(2α)) } n},

which plugged into (11) gives the desired inequality.

Proof of Lemma 4.6.
Recall that we assume that α > ε^{0.…}. By Lemma 5.5, E is large, and by Lemma 5.4 each vector in E is (2^k γ)-Bin(k) balanced. By the pigeonhole principle there must be a distribution D = ({0, ..., k}, p) with p = (p_0, ..., p_k) such that |p_j − C(k, j) 2^{−k}| ≤ 2^k γ for each j, and E has a subset E′ of at least |E|/n^k vectors that are in (D ± 0)^{[n]}. Hence:

|E′| ≥ |E| / n^k ≥ 2^{(h(D) − ε^{0.…}) n − o(n)}.

Now, for each a ∈ A, define E_a to be all vectors in E′ that are ε^{0.…}-D balanced for a^{−1}(1). Observe that this means that the vectors in E_a are ε^{0.…}-Bin(k) balanced (because ε^{0.…} + 2^k γ ≪ ε^{0.…}). Applying Lemma 5.6 with E′ and a^{−1}(1), we get that there are at most

2^{(h(D) − min{ α² ε^{0.…}, α log(1/(2α)) }) n} ≤ 2^{(h(D) − ε^{0.…}) n}

vectors in E′ that are not ε^{0.…}-D balanced. Hence:

|E′ \ E_a| ≤ 2^{(h(D) − ε^{0.…}) n} ≤ |E′| / 2.

Now the lemma follows because

|E_a| ≥ |E′| / 2 ≥ 2^{(h(D) − ε^{0.…}) n − o(n)} ≥ 2^{(h(Bin(k)) − ln(2) · 2(k+1) · 2^k γ · log(1/(2^k γ)) − ε^{0.…}) n} ≥ 2^{(h(Bin(k)) − ε^{0.…}) n},

where the last inequality follows from Lemma 2.9 and Inequality (8) (since ε is small enough).

6 Conclusion and Open Problems
In this paper, we presented a randomized O(2^{(1−σ_m)n}) time algorithm for the Bin Packing problem, where σ_m > 0 and m denotes the number of bins. This is an improvement over the state-of-the-art algorithm of Björklund et al. [9] that runs in O*(2^n) time, for small m. Nevertheless, it still remains to give an algorithm for Bin Packing that works in O*((2 − ε)^n) time for an arbitrarily large number of bins, for some fixed constant ε > 0. We believe our algorithm made significant progress on this question. One open end for further research is how the number of bins influences the complexity of an instance. By the methods of [45], instances of Bin Packing with a linear number of bins (with equal capacity) can also be solved in time O(2^{(1−ε)n}), based on a witness sampling technique similar to what we used in some of our cases. It is thus natural to wonder whether (an extension of) the methods presented in this paper is enough to give improved algorithms for all numbers of bins.

Our improvement is tiny, and we provide only inverse-exponentially small asymptotic lower bounds on σ_m. The main bottleneck in our analysis is the additive combinatorics result. We conjecture that the bound on δ(ε) in Theorem 1.2 can be significantly improved. This would automatically yield a better bound on the running time of our algorithm.

We believe our additive combinatorics result is natural and may have applications beyond the scope of this paper. As mentioned in the introduction, Littlewood-Offord theory has a wide variety of applications, and it is natural to expect that the setting we address may be of interest in any of these settings.

In the introduction we mentioned Question 1 as one motivation for studying improved exact exponential time algorithms for the Bin Packing problem.
While it is not clear whether we made direct progress on this question, we do believe that some of our ideas, such as the approach to narrow down the number of witnesses, may inspire future work on improved algorithms for Set Cover. For example, our algorithm gives improved running times for all Set Cover instances with a set family F ⊆ 2^U such that F = H ∩ {0,1}^n, where H is some hyperplane in R^n, via standard methods. Our methods may also inspire progress on improved exponential time algorithms for special cases of Set Cover, such as the Graph Coloring problem mentioned in the introduction. To the best of our knowledge, there is still no O((2 − ε)^n) time algorithm, for some ε > 0, to determine whether a graph admits a proper coloring with a given number of colors (see e.g. [11]).

Acknowledgement
The research leading to the results presented in this paper was partially carried out during theParameterized Algorithms Retreat of the University of Warsaw, PARUW 2020, held in Krynica-Zdrój in February 2020. This workshop was supported by a project that has received funding fromthe European Research Council (ERC) under the European Union’s Horizon 2020 research andinnovation programme under grant agreement No 714704 (PI: Marcin Pilipczuk).
References

[1] A. Abboud. Fine-grained reductions and quantum speedups for dynamic programming. In C. Baier, I. Chatzigiannakis, P. Flocchini, and S. Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, volume 132 of LIPIcs, pages 8:1–8:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
[2] P. Austrin, P. Kaski, M. Koivisto, and J. Nederlof. Subset sum in the absence of concentration. In E. W. Mayr and N. Ollinger, editors, 32nd International Symposium on Theoretical Aspects of Computer Science, STACS 2015, volume 30 of LIPIcs, pages 48–61. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2015.
[3] P. Austrin, P. Kaski, M. Koivisto, and J. Nederlof. Dense subset sum may be the hardest. In N. Ollinger and H. Vollmer, editors, 33rd Symposium on Theoretical Aspects of Computer Science, STACS 2016, volume 47 of LIPIcs, pages 13:1–13:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016.
[4] P. Austrin, P. Kaski, M. Koivisto, and J. Nederlof. Sharper upper bounds for unbalanced uniquely decodable code pairs. IEEE Trans. Inf. Theory, 64(2):1368–1373, 2018.
[5] N. Bansal, S. Garg, J. Nederlof, and N. Vyas. Faster space-efficient algorithms for subset sum, k-sum, and related problems. SIAM J. Comput., 47(5):1755–1777, 2018.
[6] A. Becker, J. Coron, and A. Joux. Improved generic algorithms for hard knapsacks. In K. G. Paterson, editor, Advances in Cryptology - EUROCRYPT 2011 - 30th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Tallinn, Estonia, May 15-19, 2011. Proceedings, volume 6632 of Lecture Notes in Computer Science, pages 364–385. Springer, 2011.
[7] A. Björklund, T. Husfeldt, P. Kaski, and M. Koivisto. Fourier meets Möbius: fast subset convolution. In D. S. Johnson and U. Feige, editors, Proceedings of the 39th Annual ACM Symposium on Theory of Computing, San Diego, California, USA, June 11-13, 2007, pages 67–74. ACM, 2007.
[8] A. Björklund, T. Husfeldt, P. Kaski, and M. Koivisto. Counting paths and packings in halves. In A. Fiat and P. Sanders, editors, Algorithms - ESA 2009, 17th Annual European Symposium, Copenhagen, Denmark, September 7-9, 2009. Proceedings, volume 5757 of Lecture Notes in Computer Science, pages 578–586. Springer, 2009.
[9] A. Björklund, T. Husfeldt, and M. Koivisto. Set partitioning via inclusion-exclusion. SIAM J. Comput., 39(2):546–563, 2009.
[10] A. Björklund, P. Kaski, and I. Koutis. Directed Hamiltonicity and out-branchings via generalized Laplacians. In I. Chatzigiannakis, P. Indyk, F. Kuhn, and A. Muscholl, editors, 44th International Colloquium on Automata, Languages, and Programming, ICALP 2017, volume 80 of LIPIcs, pages 91:1–91:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017.
[11] J. M. Byskov. Enumerating maximal independent sets with applications to graph colouring. Oper. Res. Lett., 32(6):547–556, 2004.
[12] C. Calabro. The exponential complexity of satisfiability problems. PhD thesis, UC San Diego, 2009.
[13] E. G. Coffman Jr., J. Csirik, G. Galambos, S. Martello, and D. Vigo. Bin Packing Approximation Algorithms: Survey and Classification, pages 455–531. Springer New York, New York, NY, 2013.
[14] I. Csiszár and P. C. Shields. Information theory and statistics: A tutorial. Now Publishers Inc, 2004.
[15] M. Cygan, H. Dell, D. Lokshtanov, D. Marx, J. Nederlof, Y. Okamoto, R. Paturi, S. Saurabh, and M. Wahlström. On problems as hard as CNF-SAT. In Proceedings of the 27th Conference on Computational Complexity, CCC 2012, Porto, Portugal, June 26-29, 2012, pages 74–84. IEEE Computer Society, 2012.
[16] M. Cygan, H. Dell, D. Lokshtanov, D. Marx, J. Nederlof, Y. Okamoto, R. Paturi, S. Saurabh, and M. Wahlström. On problems as hard as CNF-SAT. ACM Trans. Algorithms, 12(3):41:1–41:24, 2016.
[17] M. Cygan, F. V. Fomin, L. Kowalik, D. Lokshtanov, D. Marx, M. Pilipczuk, M. Pilipczuk, and S. Saurabh. Parameterized Algorithms. Springer, 2015.
[18] M. Delorme, M. Iori, and S. Martello. Bin packing and cutting stock problems: Mathematical models and exact algorithms. European Journal of Operational Research, 255(1):1–20, 2016.
[19] I. Diakonikolas and R. A. Servedio. Improved approximation of linear threshold functions. Comput. Complex., 22(3):623–677, 2013.
[20] S. Eilon and N. Christofides. The loading problem. Management Science, 17(5):259–268, 1971.
[21] K. Eisemann. The trim problem. Management Science, 3(3):279–284, 1957.
[22] F. V. Fomin and P. Kaski. Exact exponential algorithms. Commun. ACM, 56(3):80–88, 2013.
[23] F. V. Fomin and D. Kratsch. Exact Exponential Algorithms. Texts in Theoretical Computer Science. An EATCS Series. Springer, 2010.
[24] A. Frank and É. Tardos. An application of simultaneous diophantine approximation in combinatorial optimization. Combinatorica, 7(1):49–65, 1987.
[25] M. X. Goemans and T. Rothvoß. Polynomiality for bin packing with a constant number of item types. In C. Chekuri, editor, Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014, pages 830–839. SIAM, 2014.
[26] A. Golovnev, A. S. Kulikov, and I. Mihajlin. Families with infants: A general approach to solve hard partition problems. In J. Esparza, P. Fraigniaud, T. Husfeldt, and E. Koutsoupias, editors, Automata, Languages, and Programming - 41st International Colloquium, ICALP 2014, Copenhagen, Denmark, July 8-11, 2014, Proceedings, Part I, volume 8572 of Lecture Notes in Computer Science, pages 551–562. Springer, 2014.
[27] A. Golovnev, A. S. Kulikov, and I. Mihajlin. Families with infants: Speeding up algorithms for NP-hard problems using FFT. ACM Trans. Algorithms, 12(3):35:1–35:17, 2016.
[28] J. R. Griggs. Database security and the distribution of subset sums in $\mathbb{R}^m$. In Graph Theory and Combinatorial Biology, 1998.
[29] G. Halász. Estimates for the concentration function of combinatorial number theory and probability. Periodica Mathematica Hungarica, 8(3-4):197–211, 1977.
[30] R. Hoberg and T. Rothvoss. A logarithmic additive integrality gap for bin packing. In P. N. Klein, editor, Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16-19, pages 2616–2625. SIAM, 2017.
[31] E. Horowitz and S. Sahni. Computing partitions with applications to the knapsack problem. J. ACM, 21(2):277–292, 1974.
[32] K. Jansen, S. Kratsch, D. Marx, and I. Schlotter. Bin packing with fixed number of bins revisited. J. Comput. Syst. Sci., 79(1):39–49, 2013.
[33] D. S. Johnson. Near-optimal bin packing algorithms. PhD thesis, Massachusetts Institute of Technology, 1973.
[34] D. M. Kane and R. Williams. Super-linear gate and super-quadratic wire lower bounds for depth-two and depth-three threshold circuits. In D. Wichs and Y. Mansour, editors, Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 633–643. ACM, 2016.
[35] L. V. Kantorovich. Mathematical methods of organizing and planning production. Management Science (English translation of a 1939 paper written in Russian), 6(4):366–422, 1960.
[36] N. Karmarkar and R. M. Karp. An efficient approximation scheme for the one-dimensional bin-packing problem. In 23rd Annual Symposium on Foundations of Computer Science, FOCS 1982, pages 312–320. IEEE, 1982.
[37] J. M. Kleinberg and É. Tardos. Algorithm design. Addison-Wesley, 2006.
[38] M. Koivisto. Partitioning into sets of bounded cardinality. In J. Chen and F. V. Fomin, editors, Parameterized and Exact Computation, 4th International Workshop, IWPEC 2009, Copenhagen, Denmark, September 10-11, 2009, Revised Selected Papers, volume 5917 of Lecture Notes in Computer Science, pages 258–263. Springer, 2009.
[39] R. Krauthgamer and O. Trabelsi. The Set Cover Conjecture and Subgraph Isomorphism with a Tree Pattern. In R. Niedermeier and C. Paul, editors, 36th International Symposium on Theoretical Aspects of Computer Science, STACS 2019, volume 126 of Leibniz International Proceedings in Informatics (LIPIcs), pages 45:1–45:15, Dagstuhl, Germany, 2019. Schloss Dagstuhl - Leibniz-Zentrum für Informatik.
[40] C. Lenté, M. Liedloff, A. Soukhal, and V. T'Kindt. On an extension of the sort & search method with application to scheduling theory. Theor. Comput. Sci., 511:13–22, 2013.
[41] J. E. Littlewood and A. C. Offord. On the number of real roots of a random algebraic equation. Journal of the London Mathematical Society, s1-13(4):288–295, 1938.
[42] S. Martello and P. Toth. Knapsack Problems: Algorithms and Computer Implementations. Wiley Series in Discrete Mathematics and Optimization. Wiley, 1990.
[43] R. Meka, O. Nguyen, and V. Vu. Anti-concentration for polynomials of independent random variables. Theory Comput., 12(1):1–17, 2016.
[44] M. Mucha, J. Nederlof, J. Pawlewicz, and K. Węgrzycki. Equal-subset-sum faster than the meet-in-the-middle. In M. A. Bender, O. Svensson, and G. Herman, editors, 27th Annual European Symposium on Algorithms, ESA 2019, volume 144 of LIPIcs, pages 73:1–73:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
[45] J. Nederlof. Finding large set covers faster via the representation method. In P. Sankowski and C. D. Zaroliagis, editors, 24th Annual European Symposium on Algorithms, ESA 2016, volume 57 of LIPIcs, pages 69:1–69:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016.
[46] J. Nederlof, E. J. van Leeuwen, and R. van der Zwaan. Reducing a target interval to a few exact queries. In B. Rovan, V. Sassone, and P. Widmayer, editors, Mathematical Foundations of Computer Science 2012 - 37th International Symposium, MFCS 2012, Bratislava, Slovakia, August 27-31, 2012. Proceedings, volume 7464 of Lecture Notes in Computer Science, pages 718–727. Springer, 2012.
[47] T. Rothvoß. Approximating bin packing within $O(\log OPT \cdot \log\log OPT)$ bins. In 54th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2013, pages 20–29, 2013.
[48] M. Rudelson and R. Vershynin. The Littlewood-Offord problem and invertibility of random matrices. Advances in Mathematics, 218(2):600–633, 2008.
[49] C. Schlegel and A. Grant. Coordinated multiuser communications. Springer, 2006.
[50] T. Tao and V. H. Vu. Additive combinatorics, volume 105 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2007.
[51] K. Tikhomirov. Singularity of random Bernoulli matrices. Annals of Mathematics, 191(2):593–634, 2020.
[52] H. C. A. van Tilborg. An upper bound for codes in a two-access binary erasure channel (corresp.). IEEE Trans. Inf. Theory, 24(1):112–116, 1978.
[53] V. V. Williams and R. R. Williams. Subcubic equivalences between path, matrix, and triangle problems. J. ACM, 65(5):27:1–27:38, 2018.
[54] M. Wiman. Improved constructions of unbalanced uniquely decodable code pairs. Bachelor thesis, KTH, 2017.
A Computing the number of distinct sums and critical pruner
Lemma A.1. Let $w \colon [n] \to \mathbb{N}$ be an item weight function. Then the set $w(2^{[n]})$ can be computed in time $O^\star(|w(2^{[n]})|)$.

Proof. Define for all $i \in \{0, \dots, n\}$ the set $W_i$ as
$$W_i := \{ w(X) : X \subseteq [i] \}.$$
Notice that $W_n = w(2^{[n]})$. We iterate over $i$ to compute these sets, setting $W_0 = \{0\}$. To compute the next set we use
$$W_i = W_{i-1} \cup \{ x + w(i) : x \in W_{i-1} \}.$$
Since $W_{i-1} \subseteq W_i \subseteq W_n$ for every $i$, the total computation time can be upper bounded by $\sum_{i=1}^{n} O(|W_i|) = O^\star(|W_n|)$.

Corollary A.2. Let $|w(2^{[n]})| \geq 2^{\delta n}$. Then the critical pruner $\theta$ can be computed in time $O^\star(2^{\delta n})$.

Proof. Recall the following definitions. Let $l = 1 + \lceil \log(\max_i \{w(i)\}) \rceil$. For $s \in \{0, \dots, l\}$, the $s$-pruned weight of item $i$ is $w_s(i) := \lfloor w(i)/2^{l-s} \rfloor$. The critical pruner $\theta$ is
$$\theta = \min\{ s \in \mathbb{N} : |w_s(2^{[n]})| \geq 2^{\delta n} \}.$$
Notice that we can assume $l = n^{O(1)}$ by [24].

The algorithm finds $\theta$ by computing $|w_s(2^{[n]})|$ using Lemma A.1 for each $s = 1, 2, \dots$ until $|w_s(2^{[n]})| \geq 2^{\delta n}$; we take this $s$ as $\theta$. Because $w_0(2^{[n]}) = \{0\}$ and Lemma 3.7 tells us that $|w_s(2^{[n]})| \leq (n+1) \cdot |w_{s-1}(2^{[n]})|$ for any $s$, we know that $|w_\theta(2^{[n]})| = O^\star(2^{\delta n})$. Therefore, the algorithm takes $O^\star(2^{\delta n})$ time per iteration and repeats at most $l$ times, which gives the requested running time.

B Inequalities with binomials and entropy
Let us start with some useful facts about the binary entropy function
$$h(x) := -x \log(x) - (1-x)\log(1-x).$$
The first derivative of binary entropy is $h'(x) = \log(1-x) - \log(x)$, the second derivative is
$$h''(x) = \frac{-1}{(\ln 2)\, x(1-x)},$$
and we will also need the third derivative
$$h'''(x) = \frac{1-2x}{(\ln 2)\, x^2(1-x)^2}.$$
Observe that for $x \in [0, 0.5]$ we have $h'(x) \geq 0$, $h''(x) \leq 0$ and $h'''(x) \geq 0$. From the fourth derivative we will only need that $h^{(4)}(x) \leq 0$ when $x \in [0, 0.5]$.

Hence, from the Taylor expansion, for $x \in [0, 0.5]$ it holds that
$$h(x + \varepsilon) = h(x) + h'(x)\varepsilon + \frac{h''(x)}{2}\varepsilon^2 + \frac{h'''(x)}{6}\varepsilon^3 + O(\varepsilon^4).$$
If we assume that $x \in [0, 0.5]$ and $|\varepsilon| \leq \left|\frac{3h''(x)}{2h'''(x)}\right|$, then
$$h(x + \varepsilon) \leq h(x) + h'(x)\varepsilon + \frac{h''(x)}{4}\varepsilon^2, \qquad (12)$$
because $h^{(4)}(x) \leq 0$ when $x \in [0, 0.5]$.

Lemma B.1 (Theorem 2.2 from [12]). For all $x \in [0,1]$:
$$1 - 4\left(x - \tfrac{1}{2}\right)^2 \leq h(x) \leq 1 - \frac{2}{\ln 2}\left(x - \tfrac{1}{2}\right)^2,$$
$$\frac{x}{2\log(6/x)} \leq h^{-1}(x) \leq \frac{x}{\log(1/x)},$$
where the inverse entropy function $h^{-1} \colon [0,1] \to [0, 0.5]$ is the inverse of $h$ restricted to the interval $[0, 0.5]$.

Lemma B.2. Let $c \geq 1$ and $b \geq 1$. Then for any $x$ satisfying $0 < x < 1$:
$$h\!\left(\tfrac{1}{2} - \gamma\right) \geq 1 - x + b \cdot h(c\gamma), \quad \text{if} \quad \gamma \leq \frac{x}{4bc\log(12b/x)}.$$

Proof. Note that by the various assumptions of the lemma $\gamma \leq \frac{x}{2\log(6/x)}$, and thus
$$4\gamma^2 \leq \frac{x^2}{\log^2(6/x)} \leq \frac{x}{2}. \qquad (13)$$
Now $h\!\left(\tfrac{1}{2} - \gamma\right)$ can be lower bounded by
$$h\!\left(\tfrac{1}{2} - \gamma\right) \geq 1 - 4\gamma^2 \quad \text{(by Lemma B.1)} \quad \geq 1 - x/2 \quad \text{(by (13))} \quad \geq 1 - x + b \cdot h(c\gamma),$$
as desired. Here, the last inequality follows because $c\gamma \leq \frac{x/(2b)}{2\log(12b/x)} \leq h^{-1}\!\left(\frac{x}{2b}\right)$ by Lemma B.1, and thus $h(c\gamma) \leq \frac{x}{2b}$ since $h$ is monotone on $[0, 1/2]$. Therefore we have that $b \cdot h(c\gamma) \leq x/2$.

Lemma B.3.
For every $\beta, \alpha, \rho \in [0, 0.5]$ it holds that
$$\binom{\beta n}{\alpha\beta n - \rho n}\binom{(1-\beta)n}{\alpha(1-\beta)n + \rho n} \leq \binom{n}{\alpha n} \cdot 2^{-f(\rho, \alpha, \beta)\, n},$$
where
$$f(\rho, \alpha, \beta) := \begin{cases} \rho^2 & \text{if } |\rho| < \alpha(1-\alpha)\min\{\beta, 1-\beta\},\\ -\alpha^2 \log(2\alpha) & \text{otherwise.}\end{cases}$$

Proof. First, observe that when $|\rho| \geq \alpha(1-\alpha)\min\{\beta, 1-\beta\}$ our expression is upper bounded by
$$\binom{\beta n}{\alpha(1-\alpha)\beta n}\binom{(1-\beta)n}{\alpha(1-\alpha)(1-\beta)n} \leq \binom{n}{\alpha(1-\alpha)n}.$$
This, however, is bounded by $2^{h(\alpha(1-\alpha))n}$. Observe that
$$h(\alpha - \alpha^2) \leq h(\alpha) - h'(\alpha)\,\alpha^2 = h(\alpha) + \alpha^2\big(\log(\alpha) - \log(1-\alpha)\big) \leq h(\alpha) + \alpha^2\log(2\alpha).$$
Hence, when $|\rho|$ is large we upper bound our expression by
$$\binom{n}{\alpha n} \cdot 2^{\alpha^2 \log(2\alpha)\, n}.$$

Now we consider the case of small $|\rho|$. We upper bound the expression with binary entropy:
$$\binom{\beta n}{\alpha\beta n - \rho n}\binom{(1-\beta)n}{\alpha(1-\beta)n + \rho n} \leq 2^{n\left(\beta h\left(\alpha - \frac{\rho}{\beta}\right) + (1-\beta)h\left(\alpha + \frac{\rho}{1-\beta}\right)\right)}.$$
Let us consider the exponent
$$\beta h\!\left(\alpha - \frac{\rho}{\beta}\right) + (1-\beta) h\!\left(\alpha + \frac{\rho}{1-\beta}\right).$$
We use Inequality (12) with $x = \alpha$ and $\varepsilon := -\frac{\rho}{\beta}$ for $h(\alpha - \rho/\beta)$, and with $\varepsilon := \frac{\rho}{1-\beta}$ for $h(\alpha + \rho/(1-\beta))$.

Observe that at the beginning we assumed $|\rho| \leq \alpha(1-\alpha)\min\{\beta, 1-\beta\}$, hence $\left|\frac{\rho}{\beta}\right|$ and $\left|\frac{\rho}{1-\beta}\right|$ are upper bounded by $\left|\frac{3h''(\alpha)}{2h'''(\alpha)}\right|$. So, by Inequality (12):
$$\beta h\!\left(\alpha - \frac{\rho}{\beta}\right) + (1-\beta)h\!\left(\alpha + \frac{\rho}{1-\beta}\right) \leq h(\alpha) + \frac{h''(\alpha)}{4} \cdot \frac{\rho^2}{\beta(1-\beta)}.$$
Observe that the first-order terms cancel out. Hence
$$\binom{\beta n}{\alpha\beta n - \rho n}\binom{(1-\beta)n}{\alpha(1-\beta)n + \rho n} \leq \binom{n}{\alpha n} \cdot 2^{\frac{h''(\alpha)}{4} \cdot \frac{\rho^2}{\beta(1-\beta)} n}.$$
Finally, observe that $h''(\alpha) \leq -4$ for all $\alpha \in [0, 0.5]$ and $\beta(1-\beta) \leq \frac{1}{4}$ for all $\beta \in [0, 0.5]$, hence
$$\binom{\beta n}{\alpha\beta n - \rho n}\binom{(1-\beta)n}{\alpha(1-\beta)n + \rho n} \leq \binom{n}{\alpha n} \cdot 2^{-\rho^2 n}.$$

Lemma B.4. For all $k \in \mathbb{N}$ and $\alpha \in [0,1]$ we have $h(\mathrm{Bin}(k, \alpha)) \leq h(\mathrm{Bin}(k+1))$.

Proof. Let us fix $k \in \mathbb{N}$. Recall that $\mathrm{Bin}(k+1) := (\{0, \dots, k+1\}, p(i))$ and $\mathrm{Bin}(k, \alpha) := (\{0, \dots, k+1\}, p_\alpha(i))$, where
$$p(i) := \frac{\binom{k+1}{i}}{2^{k+1}} \quad \text{and} \quad p_\alpha(i) := \frac{\binom{k}{i}(1-\alpha)}{2^k} + \frac{\binom{k}{i-1}\alpha}{2^k}.$$
Hence, we need to prove that for all $\alpha \in [0,1]$:
$$h(p_\alpha(0), \dots, p_\alpha(k+1)) \leq h(p(0), \dots, p(k+1)).$$
Let us denote $\varphi(\alpha) := h(p_\alpha(0), \dots, p_\alpha(k+1))$. First, observe that $\varphi(0.5) = h(p(0), \dots, p(k+1))$ because $\binom{k+1}{i} = \binom{k}{i} + \binom{k}{i-1}$. Therefore we need to prove that for all $\alpha \in [0,1]$ it holds that $\varphi(\alpha) \leq \varphi(0.5)$.

Recall that the entropy of a probability vector is $h(a_0, \dots, a_{k+1}) := -a_0 \log(a_0) - \dots - a_{k+1}\log(a_{k+1})$ and $(x \ln(x))' = \ln(x) + 1$. Observe that the function $\varphi(\alpha)$ is well defined for $\alpha = 0$ and $\alpha = 1$ as limits; moreover $\varphi(\alpha) \geq 0$ for all $\alpha \in [0,1]$.

Now, we compute the first derivative:
$$\varphi'(\alpha) = -\frac{1}{2^k \ln 2} \sum_i \left[\binom{k}{i-1} - \binom{k}{i}\right]\big(\ln(p_\alpha(i)) + 1\big).$$
Because $\sum_i \binom{k}{i-1} = \sum_i \binom{k}{i}$, the first derivative simplifies to
$$\varphi'(\alpha) = -\frac{1}{2^k \ln 2} \sum_i \left[\binom{k}{i-1} - \binom{k}{i}\right]\ln(p_\alpha(i)).$$
Now the second derivative is
$$\varphi''(\alpha) = -\frac{1}{4^k \ln 2} \sum_i \frac{\left[\binom{k}{i-1} - \binom{k}{i}\right]^2}{p_\alpha(i)} \leq 0,$$
thus $\varphi(\alpha)$ is concave for all $\alpha \in [0,1]$. So in order to show that the function $\varphi(\alpha)$ has exactly one maximum at $\alpha = 1/2$ it is sufficient to show that $\varphi'(0.5) = 0$.

Let us rearrange the sum; up to the nonzero factor $-\frac{1}{2^k \ln 2}$,
$$\sum_i \left[\binom{k}{i-1} - \binom{k}{i}\right]\ln(p_{0.5}(i)) = \sum_i \binom{k}{i}\ln(p_{0.5}(i+1)) - \sum_i \binom{k}{i}\ln(p_{0.5}(i)) = \sum_i \binom{k}{i}\ln\frac{p_{0.5}(i+1)}{p_{0.5}(i)}.$$
Because $p_{0.5}(i) = \frac{1}{2^{k+1}}\binom{k+1}{i}$ we can simplify the fraction:
$$\frac{p_{0.5}(i+1)}{p_{0.5}(i)} = \frac{\binom{k+1}{i+1}}{\binom{k+1}{i}} = \frac{k+1-i}{i+1},$$
thus
$$\sum_i \binom{k}{i}\ln\frac{k+1-i}{i+1} = \sum_i \binom{k}{i}\ln(k+1-i) - \sum_i \binom{k}{i}\ln(i+1) = \sum_i \binom{k}{i}\ln(k+1-i) - \sum_i \binom{k}{k-i}\ln(k-i+1) = 0,$$
where the last equality follows by substituting $i \mapsto k-i$ in the second sum. Hence $\varphi'(0.5) = 0$, which completes the proof.
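As a quick numerical sanity check of Lemma B.4, one can compare $h(\mathrm{Bin}(k,\alpha))$ with $h(\mathrm{Bin}(k+1))$ for a small $k$. The sketch below is ours (helper names are not from the paper); it builds the two distributions exactly as defined above, with $\log$ taken base 2, and verifies $h(\mathrm{Bin}(k,\alpha)) \leq h(\mathrm{Bin}(k+1))$ on a grid of $\alpha$, with equality at $\alpha = 1/2$:

```python
from math import comb, log2

def entropy(dist):
    # Shannon entropy in bits; zero-probability entries contribute 0.
    return -sum(p * log2(p) for p in dist if p > 0)

def bin_k_plus_1(k):
    # Distribution of Bin(k+1): p(i) = C(k+1, i) / 2^(k+1) for i = 0..k+1.
    return [comb(k + 1, i) / 2 ** (k + 1) for i in range(k + 2)]

def bin_k_alpha(k, alpha):
    # Distribution of Bin(k, alpha):
    # p_alpha(i) = (C(k, i) * (1 - alpha) + C(k, i - 1) * alpha) / 2^k,
    # where C(k, -1) = 0 and C(k, k + 1) = 0.
    return [(comb(k, i) * (1 - alpha)
             + (comb(k, i - 1) if i >= 1 else 0) * alpha) / 2 ** k
            for i in range(k + 2)]

k = 6
reference = entropy(bin_k_plus_1(k))
# h(Bin(k, alpha)) never exceeds h(Bin(k+1)) on the grid ...
for step in range(21):
    alpha = step / 20
    assert entropy(bin_k_alpha(k, alpha)) <= reference + 1e-12
# ... and the two entropies coincide at alpha = 1/2,
# since C(k+1, i) = C(k, i) + C(k, i-1).
assert abs(entropy(bin_k_alpha(k, 0.5)) - reference) < 1e-12
```

This only spot-checks a single $k$; the lemma's concavity argument is what gives the claim for all $k$ and $\alpha$.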