Online Bin Packing with Predictions
Spyros Angelopoulos, Shahin Kamali, and Kimia Shadkami

CNRS and Sorbonne Université, Laboratoire d'Informatique de Paris 6, 4 Place Jussieu, 75252 Paris, France. email: [email protected]
Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada. email: [email protected], [email protected]
Abstract
Bin packing is a classic optimization problem with a wide range of applications, from load balancing in networks to supply chain management. In this work we study the online variant of the problem, in which a sequence of items of various sizes must be placed into a minimum number of bins of uniform capacity. The online algorithm is enhanced with a (potentially erroneous) prediction concerning the frequency of item sizes in the sequence. We design and analyze online algorithms with efficient tradeoffs between their consistency (i.e., the competitive ratio assuming no prediction error) and their robustness (i.e., the competitive ratio under adversarial error), and whose performance degrades gently as a function of the error. Previous work on this problem has only addressed the extreme cases with respect to the prediction error, and has relied on overly powerful and error-free prediction oracles.
1 Introduction

Bin packing is a classic optimization problem and one of the original NP-hard problems. Given a set of items, each with a (positive) size, and a bin capacity, the objective is to assign the items to the minimum number of bins such that the sum of item sizes in each bin does not exceed the bin capacity. Bin packing is instrumental in modeling resource allocation problems such as load balancing and scheduling [13], and has many applications in supply chain management such as capacity planning in logistics and cutting stock. Efficient exact algorithms for the problem have been proposed within AI [23, 24, 16, 32].

In this work we focus on the online variant of bin packing, in which the set of items is not known in advance, but is rather revealed in the form of a sequence. Upon the arrival of a new item, the online algorithm must either place it into one of the currently open bins, as long as this action does not violate the bin's capacity, or into a new bin. The online model has several applications related to dynamic resource management, such as virtual machine placement for server consolidation [33, 34] and memory allocation in data centers [10].

We rely on the standard framework of competitive analysis in order to evaluate the performance of online algorithms for the problem. In fact, as stated in [13], bin packing has served as an early proving ground for this type of analysis, in the broader context of online computation. The competitive ratio of an online algorithm is the worst-case ratio of the algorithm's cost (total number of opened bins) over the optimal offline cost (optimal number of opened bins given knowledge of all items). In what concerns bin packing, in particular, the standard metric for worst-case performance is the asymptotic competitive ratio, where we restrict to input sequences whose length is arbitrarily large (i.e., as the number of items tends to infinity) [13].

While the standard online framework assumes that the algorithm has no information on the input sequence, there is a very active direction in machine learning that studies online algorithms which leverage predictions concerning the input. More precisely, the online algorithm has access to some machine-learned information on the input; this information is, in general, erroneous, namely there is a prediction error η associated with it, which is problem-specific. The objective is to design algorithms whose competitive ratio degrades gently as the prediction error increases. Following [27], we refer to the competitive ratio of an algorithm with error-free prediction as the consistency of the algorithm, and to the competitive ratio with adversarial prediction as its robustness. Several online optimization problems have been studied in this setting, including caching [27, 31], ski rental and non-clairvoyant scheduling [30], makespan scheduling [26], rent-or-buy problems [9, 18], secretary and matching problems [4], and metrical task systems [3]. See also the survey [29].

In this work we study online bin packing with (potentially erroneous) predictions. Following [32, 16], we assume that the size of each item is an integer in [1, k], where k is the bin capacity. This is motivated by typical applications of bin packing such as Virtual Machine (VM) placement, in which each item corresponds to a resource of discrete size. For example, Amazon AWS offers 42 types of EC2 instances [15] and Microsoft Azure offers 12 series of Virtual Machines [6]. We thus assume that k is a constant, independent of n.

We make use of very natural predictions, namely the frequencies at which items of different sizes are expected to appear in the input sequence.
The prediction error can be defined using a natural distance metric such as the L1 distance.

Our algorithms are based on the concept of the profile set, which serves as an approximation of the items that are expected to appear in the sequence, given the prediction. We define this concept in Section 3, and present an algorithm called PROFILEPACKING that is based on optimal packings of the profile set. Our theoretical analysis of this algorithm is given in Section 4. Here, our approach is to exploit first an analysis for the ideal case of error-free predictions, which we then extend to the realistic case of erroneous predictions. PROFILEPACKING has near-optimal consistency but is not robust. In Section 5 we address this issue by defining and analyzing a class of hybrid algorithms that combine PROFILEPACKING and any one of the known robust online algorithms, and which offer a more balanced tradeoff between robustness and consistency.

The above algorithms do not update their predictions as they process the input, and they are better suited for inputs that are generated according to an unknown, but fixed distribution. We thus present a natural heuristic that updates the predictions based on previously served items, and which is better suited for inputs generated from distributions that change over time (e.g., in the case of evolving data [19]). Last, in Section 6 we present the experimental evaluation of our algorithms.
Related work

Online bin packing has a long history of study. Simple algorithms such as FIRSTFIT (which places an item into the first bin of sufficient space, and opens a new bin if required) and BESTFIT (which works similarly, except that it places the item into the bin of minimum available capacity which can still fit the item) are 1.7-competitive [21]. Improving upon this performance requires more sophisticated algorithms. The currently best upper bound on the competitive ratio is 1.57829 [7], whereas the best known lower bound is 1.54037 [8]. The results above apply to the standard model in which there is no prediction on the input.

The problem has also been studied under the advice complexity model [11, 2], in which the online algorithm has access to some error-free information on the input called advice, and the objective is to quantify the tradeoffs between the performance of the algorithm and the size of the advice (in terms of bits). It should be emphasized that such studies are only of theoretical interest, not only because the advice is assumed to have no errors, but also because it may encode any information, with no learnability considerations (i.e., it may be provided by an omnipotent oracle that knows the optimal solution).

Last, online bin packing was recently studied under an extension of the advice complexity model, in which the advice may be untrusted [1]. Here, the performance of the algorithm is evaluated only at the extreme cases in which the advice is either error-free or adversarially generated, namely with respect to its consistency and its robustness, respectively. The objective is to find Pareto-efficient algorithms with respect to these two metrics, as a function of the advice size. However, this model is not concerned with the performance of the algorithm on typical cases, in which the prediction does not fall in one of the two above extremes.

2 Preliminaries

The input to the online algorithm is a sequence σ = a_1, . . . , a_n, where a_i ∈ [1, k] is the size of the i-th item in σ, and k is the bin capacity.
We denote by n the length of σ, and by σ[i, j] the subsequence of σ that consists of items with indices i, . . . , j in σ. Given an online algorithm A (with no predictions), we denote by A(σ) its output (packing) on input σ, and by |A(σ)| the number of bins in its output. We denote by OPT(σ) the output of an offline optimal algorithm with knowledge of the input sequence. The (asymptotic) competitive ratio of A is defined as lim_{n→∞} sup_{σ : |σ| = n} |A(σ)| / |OPT(σ)|.

Consider a bin b. For the purposes of the analysis, we will often have to associate b with a specific configuration of items that can be placed into it. We thus say that b is of type (s_1, s_2, . . . , s_l, e), with s_i, e ∈ [1, k] and Σ_{j=1}^{l} s_j + e = k, in the sense that the bin can pack l items of sizes s_1, . . . , s_l, with a remaining empty space equal to e. We will also say that a bin is filled according to type (s_1, s_2, . . . , s_l, e) if it contains l items of sizes s_1, . . . , s_l, with a remaining empty space equal to e. Note that the definition of the type induces a partition of k into l + 1 integers; we call each of the l elements s_1, . . . , s_l in the partition a placeholder. Last, we denote by τ_k the number of all possible bin types. It is important to note that τ_k depends only on k and not on the length n of the sequence, and is constant, since k is constant.

Consider an input sequence σ. For any x ∈ [1, k], let n_{x,σ} denote the number of items of size x in σ. We define the frequency of size x in σ, denoted by f_{x,σ}, to be equal to n_{x,σ}/n; hence f_{x,σ} ∈ [0, 1]. Our algorithms will use these frequencies as predictions. Namely, for every x ∈ [1, k], there is a predicted value of the frequency of size x in σ, which we denote by f′_{x,σ}. The predictions come with an error, and in general, f′_{x,σ} ≠ f_{x,σ}.
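As a concrete reference, the frequency vector above can be computed directly from its definition. The following is a minimal sketch (the function name is ours, not the paper's):

```python
from collections import Counter

def frequencies(sigma, k):
    """f_x = n_x / n for each size x in [1, k], where n_x is the number of
    items of size x in the sequence sigma, and n = len(sigma)."""
    n = len(sigma)
    counts = Counter(sigma)
    return [counts.get(x, 0) / n for x in range(1, k + 1)]

# Example with k = 4: two items of size 1, one of size 2, three of size 4.
f = frequencies([1, 1, 2, 4, 4, 4], 4)
# f == [2/6, 1/6, 0, 3/6]
```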
To quantify the prediction error, let f_σ and f′_σ denote the frequencies and their predictions in σ, respectively, as points in the k-dimensional space. In line with previous work on online algorithms with predictions, e.g. [30], we can define the error η as the L1 norm of the distance between f_σ and f′_σ.

In general, the error η may be bounded by a value H, i.e., we have η ≤ H. We can thus make a distinction between H-aware and H-oblivious algorithms, depending on whether the algorithm knows H. Such an upper bound may be estimated, e.g., from available data on typical sequences. Note, however, that unless otherwise specified, we will assume that the algorithm is H-oblivious.

We denote by A(σ, f′_σ) the output of A on input σ and predictions f′_σ. To simplify notation, we will omit σ when it is clear from context, i.e., we will use f′ in place of f′_σ.

3 PROFILEPACKING

In this section we present an algorithm for online bin packing with predictions f′, which we call PROFILEPACKING. The algorithm uses a parameter denoted by m, which is a sufficiently large, but constant, integer that will be specified later. The algorithm is based on the concept of a profile, denoted by P_f′, which is defined as a multiset that consists of ⌈f′_x · m⌉ items of size x, for all x ∈ [1, k]. One may think of the profile as an "approximation" of the multiset of items that the algorithm expects as input, given the predictions f′.

Consider an optimal packing of the items in the profile P_f′; since the size of items in P_f′ is bounded by k, it is possible to compute the optimal packing in constant time (e.g., using the algorithm of [23]). We will denote by p_f′ the number of bins in the optimal packing of all items in the profile. Note that each of these p_f′ bins is filled according to a certain type that is specified by the optimal packing of the profile.
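The profile and its packing can be sketched in a few lines. Note that this is only an illustration: following the paper's experimental section, we pack the profile with First Fit Decreasing instead of an exact optimal packing, and the helper names are ours:

```python
import math

def build_profile(f_pred, m, k):
    """Profile multiset P_f': ceil(f'_x * m) items of size x for each x in [1, k]."""
    return [x for x in range(1, k + 1)
            for _ in range(math.ceil(f_pred[x - 1] * m))]

def first_fit_decreasing(items, k):
    """Pack items into bins of capacity k; returns the bins as lists of sizes."""
    bins, loads = [], []
    for item in sorted(items, reverse=True):
        for i, load in enumerate(loads):
            if load + item <= k:       # first bin with enough room
                bins[i].append(item)
                loads[i] += item
                break
        else:
            bins.append([item])        # no bin fits: open a new one
            loads.append(item)
    return bins

# With k = 4, m = 100 and predictions f' = (0.01, 0, 0, 0.99), the profile
# holds 1 item of size 1 and 99 items of size 4, and packs into 100 bins.
profile = build_profile([0.01, 0, 0, 0.99], 100, 4)
p = len(first_fit_decreasing(profile, 4))
```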
We will simplify notation and use P and p instead of P_f′ and p_f′, respectively, when f′ is implied.

We define the actions of PROFILEPACKING. Prior to serving any items, PROFILEPACKING opens p empty bins of types that are in accordance with the optimal packing of the profile (so that there are ⌈f′_x · m⌉ placeholders of size x in these empty bins). When an item, say of size x, arrives, the algorithm will place it into any placeholder reserved for items of size x, provided that one exists. Otherwise, i.e., if all placeholders for size x are occupied, the algorithm will open another set of p bins, again of types determined by the optimal profile packing. We call each such set of p bins a profile group. Note that the algorithm does not close any bins at any time; that is, any placeholder for an item of size x can be used at any point in time, so long as it is unoccupied.

We require that PROFILEPACKING opens bins in a lazy manner, that is, the p bins in the profile group are opened virtually, and each bin is counted in the cost of the algorithm only after receiving an item. Last, suppose that for some size x we have f_x > 0, whereas its prediction is f′_x = 0. In this case, x is not in the profile set P. PROFILEPACKING packs these special items separately from others, using the FIRSTFIT strategy.

4 Analysis of PROFILEPACKING
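Before turning to the analysis, the serving procedure of the previous section can be summarized in code. This is a simplified sketch under our own helper names: the profile is packed with First Fit Decreasing (as in the experimental section) rather than optimally, and bins are counted lazily, i.e., only once they receive an item:

```python
import math
from collections import defaultdict

def ffd(items, k):
    """First Fit Decreasing: pack items into bins of capacity k."""
    bins = []
    for item in sorted(items, reverse=True):
        for b in bins:
            if sum(b) + item <= k:
                b.append(item)
                break
        else:
            bins.append([item])
    return bins

def profile_packing_cost(sigma, f_pred, m, k):
    """Number of bins opened by (this sketch of) PROFILEPACKING on sigma."""
    profile = [x for x in range(1, k + 1)
               for _ in range(math.ceil(f_pred[x - 1] * m))]
    group = ffd(profile, k)       # bin types of one profile group
    free = defaultdict(list)      # size -> ids of bins with a free placeholder
    used = set()                  # ids of bins that received at least one item
    ff_bins = []                  # First Fit bins for sizes with f'_x = 0
    next_id = 0

    def open_group():             # open one (virtual) profile group
        nonlocal next_id
        for b in group:
            for s in b:
                free[s].append(next_id)
            next_id += 1

    open_group()
    for x in sigma:
        if f_pred[x - 1] == 0:    # special item: its size is absent from P
            for b in ff_bins:
                if sum(b) + x <= k:
                    b.append(x)
                    break
            else:
                ff_bins.append([x])
        else:
            if not free[x]:       # no free placeholder of size x: new group
                open_group()
            used.add(free[x].pop())
    return len(used) + len(ff_bins)
```

For instance, with k = 4, m = 100 and an accurate prediction f′ = (1, 0, 0, 0), a stream of 20 items of size 1 occupies 5 bins, matching the optimum; the adversarial prediction f′ = (0.01, 0, 0, 0.99) from the tightness example later in this section forces 20 bins on the same stream.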
We first show that in the ideal setting of error-free prediction (η = 0), PROFILEPACKING is near-optimal (see Lemma 1). This result will be useful for analyzing the algorithm in the realistic setting of erroneous predictions. We denote by ε any fixed constant less than 0.2, and we define m to be any constant such that m ≥ τ_k · k/ε².

Lemma 1. For any constant ε ∈ (0, 0.2) and error-free prediction (f′ = f), PROFILEPACKING has competitive ratio at most 1 + ε.

Proof.
Let ε′ = ε/3, and note that m ≥ τ_k · k/ε′. Given an input sequence σ, denote by PP(σ, f′) the packing output by the algorithm. This output can be seen as consisting of g profile group packings (since each time the algorithm allocates a new set of p bins, a new profile group is generated). Since the input consists of n items, and the profile is comprised of at least m items, we have that g ≤ ⌈n/m⌉.

Given an optimal packing OPT(σ), we define a new packing, which we denote by N, that not only packs items in σ, but also additional items, as follows. N contains all (filled) bins of OPT(σ), along with their corresponding items. For every bin type in OPT(σ), we want to ensure that N contains a number of bins of that type that is divisible by g. To this end, we add at most g − 1 filled bins of the same type in N.

We can argue that |N| is not much bigger than |OPT(σ)|. We have that |N| ≤ |OPT(σ)| + (g − 1)τ_k < |OPT(σ)| + nτ_k/m ≤ |OPT(σ)|(1 + τ_k · k/m) (since |OPT(σ)| ≥ ⌈n/k⌉). We conclude that |N| ≤ (1 + ε′)|OPT(σ)|.

By construction, N contains g copies of each bin type (i.e., bins that are filled according to the same type). Equivalently, we can say that N consists of g copies of the same packing, and we denote this packing by N₁. Let q = |N₁| be the number of bins in this packing. We will show that p is not much bigger than q, which is crucial in the proof. The number of items of size x in the packing N₁ is at least ⌈n_x/g⌉, since N contains at least n_x items of size x. We also have

  ⌈n_x/g⌉ ≥ n_x/⌈n/m⌉                      (g ≤ ⌈n/m⌉)
          > n_x · m/(n + m)                 (⌈n/m⌉ < (n + m)/m)
          = n_x (m/n − m²/(n² + mn))
          ≥ n_x · m/n − m²/(n + m)          (n_x ≤ n)
          ≥ ⌈n_x · m/n⌉ − 1 − m²/(n + m)
          > ⌈n_x · m/n⌉ − 2.                (m² < n)

In other words, for each x ∈ [1, k], N₁ packs each item of size x that appears in the profile set, with the exception of at most one such item. From the statement of PROFILEPACKING, and its optimal packing of the profile set, we infer that q + k ≥ p. Note that

  q ≥ |OPT(σ)|/g ≥ n/(kg) ≥ n/(k⌈n/m⌉) > (⌈n/m⌉ · m − m)/(k⌈n/m⌉) ≥ m/k − m²/(kn) > m/k − ε′ ≥ τ_k/ε′ − ε′ > (τ_k − 1)/ε′ > k/ε′.

We thus showed that q > k/ε′, and the inequality p ≤ q + k implies that p < q(1 + ε′).

We conclude that the number of bins in each profile group of PROFILEPACKING is within a factor (1 + ε′) of the number of bins in N₁. Moreover, recall that PP(σ, f′) consists of g profile groups, and N consists of g copies of N₁. Combining this with previously shown properties, we have that |PP(σ, f′)| ≤ g · p < (1 + ε′) · g · q = (1 + ε′)|N| ≤ (1 + ε′)²|OPT(σ)| ≤ (1 + ε)|OPT(σ)|.

Theorem 2. For any constant ε ∈ (0, 0.2), and predictions f′ with error η, PROFILEPACKING has competitive ratio at most 1 + (2 + 5ε)ηk + ε.

Proof. Let f be the frequency vector for the input σ. Of course, f is unknown to the algorithm. In this context, PP(σ, f) is the packing output by PROFILEPACKING with error-free prediction, and from Lemma 1 we know that

  |PP(σ, f)| ≤ (1 + ε)|OPT(σ)|.   (1)

Recall that P_f′ denotes the profile set of PROFILEPACKING on input σ with predictions f′, and p_f′ denotes the number of bins in the optimal packing of P_f′; P_f and p_f are defined similarly. We will first relate p_f and p_f′ in terms of the error η. Note that the multisets P_f and P_f′ differ in at most Σ_{x=1}^{k} µ_x elements, where µ_x = |⌈f_x · m⌉ − ⌈f′_x · m⌉|. We call these elements differing.
We have µ_x ≤ |(f_x − f′_x)m| + 1, hence Σ_{x=1}^{k} µ_x ≤ k + Σ_{x=1}^{k} |(f_x − f′_x)m| ≤ k + ηm. We conclude that the number of bins in the optimal packing of P_f′ exceeds the number of bins in the optimal packing of P_f by at most k + ηm, i.e., p_f′ ≤ p_f + k + ηm.

Let g and g′ denote the number of profile groups in PP(σ, f) and PP(σ, f′), respectively. We aim to bound |PP(σ, f′)|, and to this end we will first show a bound on the number of bins opened by PP(σ, f′) in its first g profile groups, and then on the number of bins in its remaining g′ − g profile groups (if g′ ≤ g, there is no such contribution to the total cost). For the first part, the bound follows easily: there are g profile groups, each one consisting of p_f′ bins, therefore the number of bins in question is at most g · p_f′ ≤ g(p_f + k + ηm). For the second part, we observe that since PROFILEPACKING is lazy, any item packed by PP(σ, f′) in its last g′ − g packings has to be a differing element, which implies from the discussion above that PP(σ, f′) opens at most g(k + ηm) bins in its last g′ − g profile groups. We can thus bound |PP(σ, f′)| as follows:

  |PP(σ, f′)| ≤ g(p_f + k + ηm) + g(k + ηm)
             = g(p_f + 2k + 2ηm)
             ≤ g(p_f + 2ηm(1 + ε))                    (k ≤ εm)
             ≤ g(p_f + 2η · p_f · k(1 + ε))           (p_f ≥ ⌈m/k⌉)
             = g · p_f (1 + 2ηk(1 + ε))
             ≤ |PP(σ, f)|(1 + 2ηk(1 + ε))
             ≤ (1 + ε)(1 + 2ηk(1 + ε))|OPT(σ)|        (from (1))
             = (1 + 2ηk(1 + ε)² + ε)|OPT(σ)|
             = (1 + 2ηk(1 + 2ε + ε²) + ε)|OPT(σ)|
             < (1 + (2 + 5ε)ηk + ε)|OPT(σ)|.          (ε² < ε/2)

Theorem 2 is asymptotically tight, in the sense that there is a worst-case input for which the competitive ratio of PROFILEPACKING indeed depends on ηk. Consider the situation in which k = 4, m = 100, f = (1, 0, 0, 0), and f′ = (0.01, 0, 0, 0.99). Here, we have η = EMD(f, f′) = 0.99. With predictions f′, PROFILEPACKING creates a profile that consists of 1 item of size 1 and 99 items of size 4. The optimal profile packing thus has size 100. This means that for an input σ of n items (all of size 1, as dictated by f), PROFILEPACKING outputs a packing of n bins, whereas |OPT(σ)| = n/4. This implies that PROFILEPACKING is not robust. We address this shortcoming in the next section.

5 Hybrid algorithms

In this section we obtain online bin packing algorithms of improved robustness. The main idea is to let PROFILEPACKING serve only certain items, whereas the remaining ones are served by an online algorithm A that is robust. In particular, let A denote any algorithm of competitive ratio c_A in the standard online model in which there is no prediction. For instance, as mentioned in Section 1, FIRSTFIT is 1.7-competitive (thus 1.7-robust), and the best-known competitive ratio is 1.57829. We will define a class of algorithms based on a parameter λ ∈ [0, 1], which we denote by HYBRID(λ). In particular, let a, b ∈ N+ be such that λ = a/(a + b). We require that the parameter m in the statement of PROFILEPACKING is a sufficiently large constant, namely m ≥ τ_k · max{k, a + b}/ε².

Upon arrival of an item of size x ∈ [1, k], HYBRID(λ) marks it as either an item to be served by PROFILEPACKING, or as an item to be served by A; we call such an item a PP-item or an A-item, in accordance with this action. Moreover, for every x ∈ [1, k], HYBRID(λ) maintains two counters: count(x), which is the number of items of size x that have been served so far, and ppcount(x), which is the number of PP-items of size x that have been served so far.
We describe the actions of HYBRID(λ). Suppose that an item of size x arrives. If there is an empty placeholder of size x in a non-empty bin, then the item is assigned to that bin (and to the corresponding placeholder), and declared a PP-item. Otherwise, there are two possibilities: if ppcount(x) ≤ λ · count(x), then the item is served using PROFILEPACKING and is declared a PP-item; if ppcount(x) > λ · count(x), then it is served using A and declared an A-item.

It is important to note that in HYBRID(λ), A and PROFILEPACKING each maintain their own bin space, so when serving according to one of these two algorithms, only the bins opened by the corresponding algorithm are considered. Thus, we can partition the bins used by HYBRID(λ) into PP-bins and A-bins.

Theorem 3. For any ε ∈ (0, 0.2) and λ ∈ [0, 1], HYBRID(λ) has competitive ratio (1 + ε)((1 + (2 + 5ε)ηk + ε)λ + c_A(1 − λ)), where c_A is the competitive ratio of A.

Proof. We define two partitions of the multiset of items in the input sequence σ. The first partition is S_PP ∪ S_A, where S_PP and S_A are the PP-items and A-items of HYBRID(λ), respectively. The second partition is S′_PP ∪ S′_A, where S′_PP and S′_A are defined such that for any x ∈ [1, k] there are ⌊λn_x⌋ items of size x in S′_PP and n_x − ⌊λn_x⌋ items of size x in S′_A. Given an optimal packing OPT(σ), we will define a new packing, which we denote by N, such that every bin in N contains only items in S′_PP or only items in S′_A. Let N_PP and N_A denote the set of bins in N that include items in S′_PP and in S′_A, respectively. Similarly, let B_PP and B_A denote the set of bins in the packing of HYBRID(λ) that contain only PP-items (PP-bins) and A-items (A-bins), respectively.
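As an aside from the proof, the dispatch rule above can be sketched in a few lines. This is one consistent reading of when the counters are updated; the names are ours, and the placeholder fast path of PROFILEPACKING is omitted:

```python
def hybrid_marks(sigma, lam):
    """Mark each item of sigma as a PP-item ('PP') or an A-item ('A'),
    maintaining count(x) and ppcount(x) as in HYBRID(lambda)."""
    count, ppcount, marks = {}, {}, []
    for x in sigma:
        count[x] = count.get(x, 0) + 1
        if ppcount.get(x, 0) <= lam * count[x]:   # serve by PROFILEPACKING
            ppcount[x] = ppcount.get(x, 0) + 1
            marks.append("PP")
        else:                                     # serve by the robust algorithm A
            marks.append("A")
    return marks

# With lambda = 0.5, roughly half of the items of each size go to
# PROFILEPACKING and the rest to A.
marks = hybrid_marks([1, 1, 1, 1, 1, 1], 0.5)
```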
In order to prove the theorem, we will first show the following properties:

(i) |N| ≤ (1 + ε)|OPT(σ)|.
(ii) |B_PP| ≤ (1 + (2 + 5ε)ηk + ε)|N_PP|.
(iii) |B_A| ≤ c_A |N_A|.

We first explain how to derive N from OPT(σ). N contains the filled bins of OPT(σ) and up to (a + b)τ_k additional filled bins, so as to guarantee that the number of bins of each given type in N is divisible by a + b. Given that m ≥ τ_k · max{k, a + b}/ε², we can use an argument similar to the proof of Lemma 1 to show |N| ≤ (1 + ε)|OPT(σ)|. Since the number of bins of each type in N is divisible by a + b, we can partition N into N_PP and N_A so that |N_PP| ≤ a(1 + ε)|OPT(σ)|/(a + b) and |N_A| ≤ b(1 + ε)|OPT(σ)|/(a + b). That is, |N_PP| ≤ λ(1 + ε)|OPT(σ)| and |N_A| ≤ (1 − λ)(1 + ε)|OPT(σ)|. Note that N not only packs items in σ, but also additional items in the added bins. This implies that all items of S′_PP are packed in N_PP and all items of S′_A are packed in N_A, and hence (i) follows.

To prove (ii) and (iii), we note that S_A ⊆ S′_A, which implies S′_PP ⊆ S_PP. This is because the algorithm declares an item of size x an A-item only if ppcount(x) > λ · count(x). Hence, at any given time during the execution of HYBRID(λ), the number of A-items of size x is no more than a fraction (1 − λ) of count(x).

Next we will show property (ii). First, note that |B_PP| = |PP(σ_PP, f′)|, where σ_PP is the subsequence of σ formed by the PP-items, and PP abbreviates the output of PROFILEPACKING. Consider a sequence σ′_PP obtained by removing, for every x ∈ [1, k], the last d_x items of size x from σ_PP, where d_x is the number of items of size x in S_PP \ S′_PP. We show next that |PP(σ_PP, f′)| = |PP(σ′_PP, f′)|.
For any x, consider the last PP-item L_x of size x for which HYBRID(λ) opens a new bin. At the time L_x is packed, ppcount(x) ≤ λ · count(x). Thus, by removing items of size x that appear after L_x in σ_PP, the remaining items form a subsequence of σ′_PP, and the number of bins does not decrease. This implies that |PP(σ_PP, f′)| = |PP(σ′_PP, f′)|. From Theorem 2, |B_PP| = |PP(σ_PP, f′)| = |PP(σ′_PP, f′)| ≤ (1 + (2 + 5ε)ηk + ε)|OPT(σ′_PP)| ≤ (1 + (2 + 5ε)ηk + ε)|N_PP|.

To show (iii), we note that the number of bins that HYBRID(λ) opens for items in S_A is at most c_A|OPT(S_A)| ≤ c_A|OPT(S′_A)| ≤ c_A|N_A|. This is because S_A ⊆ S′_A. Given properties (i)–(iii), we have that

  |HYBRID(σ, f′)| = |B_PP| + |B_A|
                 ≤ (1 + (2 + 5ε)ηk + ε)|N_PP| + c_A|N_A|
                 ≤ (1 + (2 + 5ε)ηk + ε)λ(1 + ε)|OPT(σ)| + c_A(1 − λ)(1 + ε)|OPT(σ)|
                 = (1 + ε)((1 + (2 + 5ε)ηk + ε)λ + c_A(1 − λ))|OPT(σ)|.

Choosing A as the algorithm of the best known competitive ratio [7], we obtain the following corollary.

Corollary 4. For any ε ∈ (0, 0.2) and λ ∈ [0, 1], there is an algorithm with competitive ratio (1 + ε)(1.57829 + λ((2 + 5ε)ηk − 0.57829 + ε)).

It is worth noting that algorithms such as the one of [7] belong in a class that is tailored to worst-case competitive analysis (namely the class of harmonic-based algorithms) and do not tend to perform well on typical instances [22]. For this reason, simple algorithms such as FIRSTFIT and BESTFIT are preferred in practice, since they have a much better average-case performance at the expense of a somewhat inferior worst-case performance [13]. Hence we obtain:

Corollary 5. For any ε ∈ (0, 0.2) and λ ∈ [0, 1], HYBRID(λ) using FIRSTFIT has competitive ratio (1 + ε)(1.7 + λ((2 + 5ε)ηk − 0.7 + ε)).

From Theorem 3, it follows that for HYBRID(λ) to be robust, one must choose λ = 1/Ω(k), which in turn implies that the consistency is not much better than c_A. But we can obtain a better result if an upper bound H on the error is known. More precisely, let H-AWARE denote the algorithm which executes HYBRID(1) if H < (c_A − 1 − ε)/(k(2 + 5ε)), and HYBRID(0) otherwise. An equivalent statement is that H-AWARE executes PROFILEPACKING if H < (c_A − 1 − ε)/(k(2 + 5ε)), and A otherwise. The following theorem follows directly from Theorem 3, with the observation that as long as η < (c_A − 1 − ε)/(k(2 + 5ε)), PROFILEPACKING has a competitive ratio better than c_A.

Theorem 6. For any ε ∈ (0, 0.2), H-AWARE using algorithm A has competitive ratio min{c_A, 1 + (2 + 5ε)ηk + ε}, where c_A is the competitive ratio of A.

Using FIRSTFIT as A, we obtain the following corollary.

Corollary 7. For any ε ∈ (0, 0.2), H-AWARE using FIRSTFIT has competitive ratio min{1.7, 1 + (2 + 5ε)ηk + ε}.

In all previous algorithms the prediction does not change throughout their execution. While such algorithms can be useful for inputs that are drawn from a fixed distribution, they may not always perform well if the input sequence is generated from distributions that change with time, e.g., when dealing with evolving data streams. We define a heuristic called ADAPTIVE(w), in which predictions are updated dynamically using a sliding window approach; see e.g. [19].

Specifically, ADAPTIVE(w) uses a parameter w ∈ N+ as follows. In the initial phase, ADAPTIVE(w) serves σ[1, w] using FIRSTFIT; moreover, at the end of this phase, it computes f_{σ[1,w]}, namely the frequency vector of all sizes in σ[1, w].
From this point onwards, the algorithm will serve items using PROFILEPACKING with predictions f′, which are initialized to f_{σ[1,w]}. Specifically, every time ADAPTIVE(w) encounters item σ[iw], for i ∈ N+, it updates f′ to f_{σ[(i−1)w+1, iw]}.

6 Experimental evaluation

Several benchmarks have been used in previous work on exact and approximation algorithms for (offline) bin packing (see [12] for a list of related work). Many of these benchmarks typically use item sizes that are generated using either uniform or normal distributions. There are two important issues that one needs to take into account. First, inputs generated from such simple distributions are often unrealistic and do not capture typical applications of bin packing such as resource allocation [17]. Second, in what concerns online algorithms, simple algorithms such as FIRSTFIT and BESTFIT are very close to optimal for input sequences generated from uniform distributions [13], and very often outperform, in practice, many online algorithms of better competitive ratio [22].

We evaluate our algorithms on two types of benchmarks. The first type is based on the Weibull distribution, and was first studied in [12] as a model of several real-world applications of bin packing, e.g., the 2012 ROADEF/EURO Challenge on a data center problem provided by Google, and several examination timetabling problems. The Weibull distribution is specified by two parameters: the shape parameter sh and the scale parameter sc (with sh, sc > 0). The shape parameter defines the spread of item sizes: lower values indicate greater skew towards smaller items. The scale parameter, informally, has the effect of stretching out the probability density. In our experiments, we chose sh ∈ [1.0, 4.0]. This is because values outside this range result in trivial sequences with items that are generally too small (hence easy to pack) or too large (for which any online algorithm tends to open a new bin).
The scale parameter is not critical, since we scale items to the bin capacity, as discussed later; we thus set sc = 1000, in accordance with [12].

The second type of benchmarks is generated from the BPPLIB Bin Packing Library [14]. This is a collection of bin packing benchmarks used in various works on (offline) algorithms for bin packing and its variants. Due to space limitations, we report results on the GI Benchmark of the BPPLIB Library, which is the most recent benchmark of BPPLIB. The GI benchmark is from [20], which studied bin packing in the context of cutting stock applications.

We describe how we generate input sequences from the benchmarks. We fix the size of the sequence to n = 10⁶. We generate two different classes of input sequences.

Sequences from a fixed distribution. For Weibull benchmarks, the input sequence consists of items generated independently at random from the Weibull distribution, with the shape parameter set to sh = 3.0. For BPPLIB benchmarks, each item is chosen uniformly and independently at random from the item sizes in one of the benchmark files; this file is also chosen uniformly at random.

Sequences from an evolving distribution. Here, the distribution of the input sequence changes every 50000 items. Namely, the input sequence is the concatenation of n/50000 subsequences. For Weibull benchmarks, each subsequence follows a Weibull distribution whose shape parameter is chosen uniformly at random from [1.0, 4.0]. For BPPLIB benchmarks, each subsequence is generated by choosing a file uniformly at random, then generating items uniformly at random from that specific file.

We set the bin capacity to k = 100, and we also scale down each item to the closest integer in [1, k]. The choice of k = 100 is relevant for practical applications such as Virtual Machine placement, as explained in Section 1.

We evaluate HYBRID(λ) using FIRSTFIT, for λ ∈ {0, 0.25, 0.5, 0.75, 1}. This means that HYBRID(0) is identical to FIRSTFIT, whereas HYBRID(1) is identical to PROFILEPACKING.
We fix the size of the profile set to m = 5000. To ensure a time-efficient and simple implementation of ProfilePacking, we use the FirstFitDecreasing algorithm [13] to compute the profile packing, instead of an optimal algorithm. FirstFitDecreasing first sorts the items in non-increasing order of size and then applies the FirstFit algorithm to the sorted sequence. Using FirstFitDecreasing helps reduce the time complexity, in particular with regard to Adaptive(w), which must compute a new profile packing every time it updates the frequency prediction. The experimental results would only improve by using an optimal profile packing instead of FirstFitDecreasing.

We evaluate Hybrid(λ) on fixed distributions, since it is tailored to this type of input. We generate the frequency predictions for Hybrid(λ) as follows: for a parameter b ∈ N+, we define the predictions f' as f_{σ[1,b]}. In words, we use a prefix of size b of the input σ so as to estimate the frequencies of item sizes in σ. In our experiments, we consider 100 different prefix sizes; more precisely, we consider all b of the form b = ⌊100 · 1.05^i⌋, with i ∈ [25, 125]. Thus the smallest prefix size is equal to 338, and the largest is equal to 44530. We also evaluate Adaptive(w) for 100 values of the sliding window w, equidistant in the range [100, 50000].

In evaluating Hybrid(λ), we define the prediction error η as the L1 distance between the predicted and the actual frequencies. Note that for a given input sequence σ, the prediction error is a function of the prefix size b. Since we consider 100 distinct values for b, as discussed above, for each sequence we consider 100 possible error values.
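FirstFitDecreasing, as described above, admits a very compact implementation. The following is a minimal quadratic-time sketch for illustration, not the paper's implementation:

```python
def first_fit(items, capacity=100):
    """Place each item into the first open bin with enough room;
    open a new bin otherwise.  Returns the number of bins used."""
    bins = []  # remaining capacity of each open bin
    for item in items:
        for j, free in enumerate(bins):
            if item <= free:
                bins[j] -= item
                break
        else:
            bins.append(capacity - item)  # open a new bin
    return len(bins)

def first_fit_decreasing(items, capacity=100):
    """Sort items in non-increasing order of size, then run First Fit."""
    return first_fit(sorted(items, reverse=True), capacity)
```

A faster implementation would track bin capacities in a balanced tree rather than scanning all open bins, but the scan suffices to convey the rule.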
It is expected that the prediction error decreases in b, which is confirmed in our experiments, as we will discuss.

As explained earlier, for typical instances, simple algorithms such as FirstFit and BestFit tend to perform very well in practice, and we use them as benchmarks for comparing our algorithms. As is often done for approximation heuristics for offline bin packing, we also report the L2 lower bound [28, 16] as a lower-bound estimate of the optimal (offline) bin packing solution.

Figure 1 illustrates our results for this type of distribution. The input sequences are generated as described in Section 6.2. Figures 1a and 1b depict the cost of the various algorithms (total number of opened bins) for a typical sequence, as a function of the prediction error (for the GI benchmark, the chosen file is "csBA125_9"). We consider a single sequence, as opposed to averaging the cost of algorithms over multiple input sequences, because each input sequence is associated with its own prediction error for any given prefix size (and averaging over both the cost and the error may produce misleading results). This should not be an issue, because the input sequence is of considerable size (n = 10^6) and the distribution is fixed. The largest prediction error in our experiments is 0.3622 for the Weibull instance, and 0.3082 for the GI instance.

[Figure 1: Number of opened bins as a function of the prediction error η, for sequences from a fixed distribution: (a) Weibull distribution; (b) GI benchmark from BPPLIB. Each plot shows the L2 lower bound (Opt), First Fit, Best Fit, Hybrid (λ = 0.25, 0.5, 0.75), and Profile Packing (λ = 1).]

[Figure 2, panel (a): Weibull distribution; number of bins as a function of the sliding window w, for the L2 lower bound (Opt), First Fit, Best Fit, Adaptive, and Dynamic.]
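The prefix-based prediction error used throughout these experiments can be computed as below, taking the error to be the L1 distance between the frequency vector estimated from the prefix σ[1, b] and the true frequency vector of the whole sequence; the helper names are ours.

```python
from collections import Counter

def frequencies(items):
    """Empirical frequency of each item size in a list of items."""
    n = len(items)
    return {size: count / n for size, count in Counter(items).items()}

def prediction_error(sequence, b):
    """L1 distance between the frequencies estimated from the prefix
    of size b and the true frequencies of the entire sequence."""
    f_pred = frequencies(sequence[:b])
    f_true = frequencies(sequence)
    sizes = set(f_pred) | set(f_true)
    return sum(abs(f_pred.get(s, 0.0) - f_true.get(s, 0.0)) for s in sizes)
```

For a fixed distribution, larger prefixes give better frequency estimates, so the error shrinks as b grows, matching the trend observed in the experiments.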
[Figure 2, panel (b): GI benchmark from BPPLIB.] Figure 2: Number of opened bins for sequences from an evolving distribution.

For both benchmarks, we observe that ProfilePacking (λ = 1) degrades quickly as the error increases, even though it has very good performance for small values of error. As λ decreases, Hybrid(λ) becomes less sensitive to error, which confirms the statement of Corollary 5. For the Weibull benchmarks, Hybrid(λ) dominates both FirstFit and BestFit for all λ ∈ {0.25, 0.5, 0.75} and for all but the largest observed values of η. For the GI benchmarks, Hybrid(λ) dominates FirstFit and BestFit for two of the three values of λ, and for practically all values of error.

The results demonstrate that frequency-based predictions indeed lead to performance gains. Even for very large prediction error (i.e., a prefix size as small as b = 338), Hybrid(λ) with sufficiently small λ outperforms both FirstFit and BestFit; the performance improvement thus comes from observing only a tiny portion of the input sequence.

We report experiments on the performance of Adaptive(w) for evolving sequences generated as discussed in Section 6.2. Recall that w is the sliding window that determines how often the prediction is updated. This parameter must be chosen judiciously: if w is too small, we do not obtain sufficient information on the frequencies, whereas if w is too big, the predictions become "stale".

Figure 2 depicts the number of bins opened by Adaptive(w) as a function of w for the two types of benchmarks. Here we report the average cost of the algorithms over 20 randomly generated sequences. We observe that for both benchmarks, there is a relatively wide range of w, beginning at roughly w = 2100, that leads to marked performance improvement in comparison to FirstFit and BestFit.
In this work we presented the first results for online bin packing in a setting in which the algorithm has access to machine-learned predictions. We believe that our approach can be applicable to generalizations of online bin packing, such as online vector packing [5], which is another important problem for modeling machine placement in cloud computing [35]. Here, the size of the profile set increases exponentially in the vector dimension, so it will be crucial to use time-efficient heuristics for profile packing.

Previous work on the experimental evaluation of online bin packing algorithms has focused on fixed input distributions. In our work we supplemented the analysis with a model for evolving input distributions, as well as a heuristic based on a sliding window. This should be considered only a first step in this direction. Future work needs to address more sophisticated input models and algorithms, drawn from the rich literature on evolving data streams; see, e.g., the survey [25].

References

[1] Spyros Angelopoulos, Christoph Dürr, Shendan Jin, Shahin Kamali, and Marc P. Renault. Online computation with untrusted advice. In Proceedings of the 11th Innovations in Theoretical Computer Science Conference (ITCS), pages 52:1–52:15, 2020.

[2] Spyros Angelopoulos, Christoph Dürr, Shahin Kamali, Marc P. Renault, and Adi Rosén. Online bin packing with advice of small size. Theory of Computing Systems, 62(8):2006–2034, 2018.

[3] Antonios Antoniadis, Christian Coester, Marek Eliás, Adam Polak, and Bertrand Simon. Online metric algorithms with untrusted predictions. In Proceedings of the 37th International Conference on Machine Learning (ICML), pages 345–355, 2020.

[4] Antonios Antoniadis, Themis Gouleakis, Pieter Kleer, and Pavel Kolev. Secretary and online matching problems with machine learned advice. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS), 2020.

[5] Yossi Azar, Ilan Reuven Cohen, Seny Kamara, and Bruce Shepherd.
Tight bounds for online vector bin packing. In Proceedings of the 45th Annual ACM Symposium on Theory of Computing (STOC), pages 961–970, 2013.

[6] Azure. Azure virtual machine series. http://azure.microsoft.com/en-ca/pricing/details/virtual-machines/series/. Accessed: 2021-02-02.

[7] János Balogh, József Békési, György Dósa, Leah Epstein, and Asaf Levin. A new and improved algorithm for online bin packing. In Proceedings of the 26th European Symposium on Algorithms (ESA), volume 112, pages 5:1–5:14, 2018.

[8] János Balogh, József Békési, and Gábor Galambos. New lower bounds for certain classes of bin packing algorithms. Theoretical Computer Science, 440:1–13, 2012.

[9] Soumya Banerjee. Improving online rent-or-buy algorithms with sequential decision making and ML predictions. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS), 2020.

[10] Doina Bein, Wolfgang Bein, and Swathi Venigella. Cloud storage and online bin packing. In Proceedings of the 5th International Symposium on Intelligent Distributed Computing, pages 63–68, 2011.

[11] Joan Boyar, Shahin Kamali, Kim S. Larsen, and Alejandro López-Ortiz. Online bin packing with advice. Algorithmica, 74(1):507–527, 2016.

[12] Ignacio Castiñeiras, Milan De Cauwer, and Barry O'Sullivan. Weibull-based benchmarks for bin packing. In Proceedings of the 18th International Conference on Principles and Practice of Constraint Programming (CP), volume 7514, pages 207–222, 2012.

[13] E. G. Coffman, M. R. Garey, and D. S. Johnson. Approximation algorithms for bin packing: A survey. In Approximation Algorithms for NP-Hard Problems, pages 46–93, 1996.

[14] M. Delorme, M. Iori, and S. Martello. BPPLIB: a bin packing problem library. Accessed: 2021-01-15.

[15] EC2. Amazon EC2 instance types. http://aws.amazon.com/ec2/instance-types/?trkCampaign=acq_paid_search_brand. Accessed: 2021-02-02.

[16] Alex S. Fukunaga and Richard E. Korf. Bin completion algorithms for multicontainer packing, knapsack, and covering problems.
Journal of Artificial Intelligence Research (JAIR), 28:393–429, 2007.

[17] Ian P. Gent. Heuristic solution of open bin packing problems. Journal of Heuristics, 3(4):299–304, 1998.

[18] Sreenivas Gollapudi and Debmalya Panigrahi. Online algorithms for rent-or-buy with expert advice. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 2319–2327, 2019.

[19] Heitor Murilo Gomes, Jean Paul Barddal, Fabrício Enembreck, and Albert Bifet. A survey on ensemble learning for data stream classification. ACM Computing Surveys (CSUR), 50(2):1–36, 2017.

[20] Timo Gschwind and Stefan Irnich. Dual inequalities for stabilized column generation revisited. INFORMS Journal on Computing, 28(1):175–194, 2016.

[21] David S. Johnson, A. Demers, J. D. Ullman, Michael R. Garey, and Ronald L. Graham. Worst-case performance bounds for simple one-dimensional packing algorithms. SIAM Journal on Computing (SICOMP), 3:256–278, 1974.

[22] Shahin Kamali and Alejandro López-Ortiz. All-around near-optimal solutions for the online bin packing problem. In International Symposium on Algorithms and Computation (ISAAC), pages 727–739, 2015.

[23] Richard E. Korf. A new algorithm for optimal bin packing. In Proceedings of the 18th AAAI Conference on Artificial Intelligence, pages 731–736, 2002.

[24] Richard E. Korf. An improved algorithm for optimal bin packing. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI), pages 1252–1258, 2003.

[25] Bartosz Krawczyk, Leandro L. Minku, João Gama, Jerzy Stefanowski, and Michał Woźniak. Ensemble learning for data stream analysis: A survey. Information Fusion, 37:132–156, 2017.

[26] Silvio Lattanzi, Thomas Lavastida, Benjamin Moseley, and Sergei Vassilvitskii. Online scheduling via learned weights. In Proceedings of the 31st ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1859–1877, 2020.

[27] Thodoris Lykouris and Sergei Vassilvitskii. Competitive caching with machine learned advice.
In Proceedings of the 35th International Conference on Machine Learning (ICML), pages 3302–3311, 2018.

[28] Silvano Martello and Paolo Toth. Lower bounds and reduction procedures for the bin packing problem. Discrete Applied Mathematics, 28(1):59–70, 1990.

[29] M. Mitzenmacher and S. Vassilvitskii. Algorithms with predictions. In Tim Roughgarden, editor, Beyond the Worst-Case Analysis of Algorithms, pages 646–662. Cambridge University Press, 2020.

[30] Manish Purohit, Zoya Svitkina, and Ravi Kumar. Improving online algorithms via ML predictions. In Proceedings of the 31st Conference on Neural Information Processing Systems (NeurIPS), volume 31, pages 9661–9670, 2018.

[31] Dhruv Rohatgi. Near-optimal bounds for online caching with machine learned advice. In Proceedings of the 31st ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1834–1845, 2020.

[32] Ethan L. Schreiber and Richard E. Korf. Improved bin completion for optimal bin packing and number partitioning. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), pages 651–658, 2013.

[33] Weijia Song, Zhen Xiao, Qi Chen, and Haipeng Luo. Adaptive resource provisioning for the cloud using online bin packing. IEEE Transactions on Computers, 63(11):2647–2660, 2013.

[34] Meng Wang, Xiaoqiao Meng, and Li Zhang. Consolidating virtual machines with dynamic bandwidth demand in data centers. In Proceedings of the 30th IEEE Conference on Computer Communications (INFOCOM), pages 71–75, 2011.

[35] Qi Zhang, Lu Cheng, and Raouf Boutaba. Cloud computing: state-of-the-art and research challenges. Journal of Internet Services and Applications, 1(1):7–18, 2010.