Near Optimal Online Algorithms and Fast Approximation Algorithms for Resource Allocation Problems
Nikhil R. Devanur, Kamal Jain, Balasubramanian Sivan, Christopher A. Wilkens
Abstract
We present prior robust algorithms for a large class of resource allocation problems where requests arrive one-by-one (online), drawn independently from an unknown distribution at every step. We design a single algorithm that, for every possible underlying distribution, obtains a 1 − ǫ fraction of the profit obtained by an algorithm that knows the entire request sequence ahead of time. The factor ǫ approaches 0 when no single request consumes/contributes a significant fraction of the global consumption/contribution by all requests together. We show that the tradeoff we obtain here, which determines how fast ǫ approaches 0, is near optimal: we give a nearly matching lower bound showing that the tradeoff cannot be improved much beyond what we obtain.

Going beyond the model of a static underlying distribution, we introduce the adversarial stochastic input model, where an adversary, possibly in an adaptive manner, controls the distributions from which the requests are drawn at each step. Placing no restriction on the adversary, we design an algorithm that obtains a 1 − ǫ fraction of the optimal profit obtainable w.r.t. the worst distribution in the adversarial sequence. Further, if the algorithm is given one number per distribution, namely, the optimal profit possible for each of the adversary's distributions, we design an algorithm that achieves a 1 − ǫ fraction of the weighted average of the optimal profit of each distribution the adversary picks.

In the offline setting we give a fast algorithm to solve very large LPs with both packing and covering constraints.
We give algorithms to approximately solve (within a factor of 1 + ǫ) the mixed packing-covering problem with O(γm log(n/δ)/ǫ²) oracle calls, where the constraint matrix of this LP has dimension n × m, the success probability of the algorithm is 1 − δ, and γ quantifies how significant a single request is when compared to the sum total of all requests.

We discuss implications of our results for several special cases, including online combinatorial auctions, network routing and the adwords problem.

∗ Part of this work was done when the second author was a researcher at Microsoft Research, Redmond, and the third and fourth authors were interns at Microsoft Research, Redmond.
† Microsoft Research. [email protected].
‡ Faira. [email protected].
§ Google Research. [email protected].
¶ Facebook Research. [email protected].

1 Introduction

There has been an increasing interest in online algorithms for resource allocation problems motivated by their wide variety of applications in Internet advertising, allocating multi-leg flight seats for customers online, allocating goods to customers arriving online in a combinatorial auction, etc. Designing efficient resource allocation algorithms has significant scientific and commercial value. The traditional computer science approach to dealing with uncertain future inputs has been worst-case competitive analysis. Here nothing is assumed about the sequence of requests that arrive online, and the benchmark is the optimal algorithm that knows the entire sequence of requests ahead of time. Several problems in this space have been analyzed in the traditional framework, exemplified, for instance, in the well-studied
Adwords problem introduced by Mehta et al. [2005]. While worst-case analysis is a robust framework, for many problems it leads to pessimistic bounds that rule out obtaining more than a constant fraction of the optimal profit. Consequently, there has been a drive in the last few years to go beyond worst-case analysis. A frequently used alternative is to perform stochastic analysis: assume that the input is drawn from a known distribution and optimize the objective w.r.t. this distribution. While stochastic analysis circumvents the impossibility results in worst-case analysis, any error in the knowledge of the distribution could render the algorithm suboptimal, and sometimes even infeasible.

In this paper, we study a middle ground between the worst-case and stochastic settings. We assume that the input is drawn from an underlying distribution that is unknown to the algorithm designer. We present a single algorithm that, for every distribution, performs nearly as well as the optimal algorithm that knows the entire sequence of requests ahead of time. In this sense, the algorithm is prior robust.

We now give an informal description of the resource allocation framework and our main contributions. See Section 2 for a formal description and theorem statements. We consider a resource allocation setting where requests arrive online; every request can be served by some subset of several available options; each (request, option) pair consumes some amount of every resource, and generates some profit. There is a given budget for each resource. Requests are drawn i.i.d. from an unknown distribution. The goal is to maximize the total profit generated while making sure that the total consumption of each resource is no more than the corresponding budget. We compare the profit of our algorithms against the offline optimum and prove competitive ratio bounds.
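To make the informal framework above concrete, here is a minimal Python sketch (the names and data layout are ours, not the paper's): a request offers several options, each option consumes resources against remaining budgets, and an option is available only if every budget still covers its consumption.

```python
# Toy illustration of the resource allocation framework (our invention, for
# intuition only): an option is a (consumption, profit) pair, and it is
# feasible iff every resource's remaining budget covers its consumption.

def feasible_options(options, remaining):
    """options: list of (consumption dict resource -> amount, profit).
    remaining: dict resource -> remaining budget.
    Returns the indices of options that fit within every remaining budget."""
    return [k for k, (cons, _) in enumerate(options)
            if all(remaining.get(i, 0.0) >= u for i, u in cons.items())]
```

An online algorithm in this framework must pick one feasible option (or drop the request) at each step, before seeing future requests.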
Even for very restricted special cases of this problem, the worst-case setting cannot yield anything beyond a 1 − 1/e competitive ratio Kalyanasundaram and Pruhs [1996], Mehta et al. [2005]. While the stochastic setting with a fully known distribution can give near optimal performance guarantees, it often leads to very distribution dependent algorithms (e.g., see Alaei et al. [2012] for the special case of the adwords problem, which requires knowledge of the entire distribution to perform the optimization). Hence neither of these approaches is satisfactory, and this problem lends itself well to the middle ground of prior robust analysis.

Going beyond i.i.d., our work introduces the adversarial stochastic input (ASI) model as a more realistic model for analyzing online algorithmic problems. Here the distribution from which the requests arrive is allowed to change over time (unlike i.i.d., where it stays identical for every request). The adversary decides how to pick the distributions, and is even allowed to pick them adaptively. For many practical applications such as display advertising, the distribution of requests shows trends that change over the course of time: mornings are different from evenings and weekdays are different from weekends. Thus a time varying distributional model is more realistic than the i.i.d. model. A keen reader might notice that the above description includes the worst-case setting as well; therefore we have to make some extra assumptions, either by restricting how these distributions can be picked, or by allowing the algorithm some extra information about the distributions. We will describe these in greater detail later on.
Apart from the theoretical contribution, the algorithms we design for the ASI models were used to completely overhaul the display advertising management system at Microsoft, leading to a significant improvement in revenue. We believe that our results make a significant contribution to the search for "allows-positive-results-yet-realistic" models for online algorithms.
First Result: Near-Optimal Prior Robust Online Algorithms for Resource Allocation Problems
A key parameter on which algorithms for several resource allocation problems depend is the relative significance of any single request when compared to the entire sequence of requests. For instance, for the special case of the Adwords problem, this is the ratio of a single bid to an advertiser's budget. For the Adwords problem, Mehta et al. [2005] and Buchbinder et al. [2007] design algorithms that achieve a worst case competitive ratio tending to 1 − 1/e as the bid to budget ratio (which we denote by γ) tends to 0. Devanur and Hayes [2009] studied the same problem in the random permutation model, and showed that the competitive ratio tends to 1 as γ tends to 0. This result showed that the competitive ratio of algorithms in stochastic models could be much better than that of algorithms in the worst case. The important question since then has been to determine the optimal trade-off between γ and the competitive ratio. Devanur and Hayes [2009] showed how to get a 1 − O(ǫ) competitive ratio when γ is at most O(ǫ³/(n log(mn/ǫ))), where n is the number of advertisers and m is the number of keywords. Subsequently Agrawal et al. [2014] improved the bound on γ to O(ǫ²/(n log(m/ǫ))). The papers of Feldman et al. [2010] and Agrawal et al. [2014] have also shown that the technique of Devanur and Hayes [2009] can be extended to other online problems.

The first main result in this paper is the following three-fold improvement of previous results (Theorems 2.2 and 2.3), for the i.i.d. with unknown distributions model. All our results apply to the general class of problems that we call the resource allocation framework. A formal definition of the framework is presented in Section 2 and a discussion of many interesting special cases including online network routing and online combinatorial auctions is presented in Section 7.

1. We give an algorithm which guarantees a 1 − ǫ approximation factor when γ = O(ǫ²/log(n/ǫ)).
2. We show that our bound on γ is almost optimal: no algorithm, even one that knows the distribution, can guarantee a 1 − ǫ approximation when γ = ω(ǫ²/log(n)).

3. Our algorithms lend themselves to natural generalizations that provide identical guarantees in the more general adversarial stochastic input (ASI) model that was described earlier. We provide three different versions of the ASI model in Section 3.5.

Significance
1. Regarding the bound on γ, we remove a factor of n from γ, making the algorithm more practical. (Note that γ approaching zero is the easiest case: even with γ approaching zero, 1 − 1/e is the best competitive ratio that any randomized algorithm can achieve in the worst case, illustrating how worst-case analysis leads to pessimistic bounds.) Consider for instance the Adwords problem and suppose that the bids are all in [0,1]. The earlier bound implies that the advertiser budgets need to be of the order of n log(n)/ǫ² in order to get a 1 − ǫ competitive algorithm, where n is the number of advertisers. With realistic values for these parameters, it seems unlikely that this condition would be met. With the improved bounds presented in this paper, we only need the advertiser budget to be of the order of log(n)/ǫ², and this condition is met for reasonable values of the parameters. Furthermore, in the more general resource allocation framework, the previous best upper bound on γ is from Agrawal, Wang and Ye Agrawal et al. [2014] and equals O(ǫ²/(n log(mK/ǫ))). Here K is the number of available "options" (see Section 2.2), and in typical applications like network routing, K could be exponential in n; thus, the factor saved by our algorithm becomes quadratic in n.

(The display advertising system at Microsoft mentioned earlier had been globally operational from 2011 to 2015, when Microsoft made a deal with AOL to allow AOL to sell display advertisements on behalf of Microsoft.)

2. Our ASI models are realistic models of time varying distributions for which we present algorithms with asymptotically optimal performance guarantees. We consider three different benchmarks, each progressively stronger than the previous, and they require different levels of information about the distributions to achieve near optimal performance guarantees. For the weakest benchmark, we need just one parameter from the distribution, while for the strongest benchmark, we still need only 2mn parameters.
Note that the distributions themselves can have an arbitrarily large support size, and hence the amount of information we need is much smaller than the description of all the distributions. Our results for the ASI model can be thought of as generalizations of the "Prophet Inequality". Finally, as mentioned earlier, our algorithms for this model have made a significant impact on the practice of display advertising management at Microsoft.
Second Result: Prior Robust 1 − 1/e Approximation Greedy Algorithm for Adwords

A natural algorithm for the Adwords problem that is widely used for its simplicity is the greedy algorithm: always match an incoming query to the advertiser that has the maximum effective bid (the minimum of bid and remaining budget) for that query. Because of its wide use, the performance of the greedy algorithm has previously been analyzed by Goel and Mehta Goel and Mehta [2008], who showed that in the random permutation and the i.i.d. models, it has a competitive ratio of 1 − 1/e with an assumption which is essentially that γ tends to 0.

It has been an important open problem to analyze the performance of the greedy algorithm in a stochastic setting for unbounded γ, i.e., for all 0 ≤ γ ≤ 1. The best factor known so far is 1/2, and this holds for the worst case also. Nothing better was known, even in the stochastic models.
The second result in this paper is that for the Adwords problem in the i.i.d. unknown distributions model, with no assumption on γ (i.e., γ could be as big as 1), the greedy algorithm gets an approximation factor of 1 − 1/e against the optimal fractional solution to the expected instance (Theorem 2.4).

Our proof technique for this result has been subsequently used to prove a similar result for the greedy algorithm in online submodular welfare maximization Kapralov et al. [2013]. We note here that there are other algorithms that achieve a 1 − 1/e approximation for the Adwords problem with unbounded γ, but the greedy algorithm is the only prior robust (i.e., distribution independent) algorithm known, and it is quite simple too. For example, Alaei, Hajiaghayi and Liaghat Alaei et al. [2012] design a randomized algorithm that obtains a 1 − 1/e approximation, but it requires knowledge of the entire distribution. Devanur, Sivan and Azar Devanur et al. [2012] design an algorithm that obtains a 1 − 1/e approximation, but requires a few parameters from the distribution. (Two remarks: we even allow continuous distributions, which have infinite support, and the Prophet Inequality is essentially a 1/2 approximation.)

Third Result: Fast Approximation Algorithms for Mixed Packing and Covering Integer Programs
Charles et al. [2010] considered the following (offline) problem: given a lopsided bipartite graph G = (L, R, E), that is, a bipartite graph where m = |L| ≫ |R| = n, does there exist an assignment M : L → R with (j, M(j)) ∈ E for all j ∈ L, and such that for every vertex i ∈ R, |M⁻¹(i)| ≥ B_i for some given values B_i? Even though this is a classic problem in combinatorial optimization with well known polynomial time algorithms, the instances of interest are too large to use traditional approaches to solve this problem. (The value of m in particular is very large.) The approach used by Charles et al. [2010] was essentially to design an online algorithm in the i.i.d. model: choose vertices from L uniformly at random and assign them to vertices in R in an online fashion. The online algorithm is guaranteed to be close to optimal, as long as sufficiently many samples are drawn. Therefore it can be used to solve the original problem (approximately): the online algorithm finds an almost satisfying assignment if and only if the original graph has a satisfying assignment (with high probability).

The third result in this paper is a generalization of this result to get fast approximation algorithms for a wide class of mixed packing and covering integer programs (IPs) inspired by problems in the resource allocation framework (Theorem 2.5). Problems in the resource allocation framework where the instances are too large to use traditional algorithms occur fairly often, in particular in the management of display advertising systems, where these algorithms are being used. Formal statements and a more detailed discussion are presented in Section 2.4.
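The sample-then-solve-online idea above can be sketched schematically. This is our illustration of the general pattern (the function names and the trivial counting callback in the test are ours), not the actual algorithm of Charles et al. [2010]:

```python
import random

# Schematic "sample and assign online" driver: rather than solving the huge
# offline problem over all of L, draw i.i.d. samples from L and feed them one
# at a time to an online assignment rule; the rule's outcome on the sample
# estimates feasibility of the original instance.

def sample_and_assign(items, assign_online, sample_size, seed=0):
    """items: the large left-hand side L (a list).
    assign_online: callable (item, state) -> new state, one online step.
    Returns the online rule's final state after sample_size i.i.d. draws."""
    rng = random.Random(seed)
    state = None
    for _ in range(sample_size):
        j = rng.choice(items)            # i.i.d. sample from L
        state = assign_online(j, state)  # one online assignment step
    return state
```

The point of this structure is that the running time depends on the sample size, not on m = |L|, which is what makes the approach viable when m is enormous.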
High Level Description of Techniques
The underlying idea used for all these results can be summarized at a high level as follows: consider a hypothetical algorithm called Hypothetical-Oblivious that knows the distribution from which the input is drawn and uses an optimal solution w.r.t. this distribution. Suppose that we can analyze the performance of Hypothetical-Oblivious by considering a potential function and showing that it decreases by a certain amount in each step. Then we can design an algorithm that does not know the distribution as follows: consider the same potential function, and in every step choose the option that minimizes the potential function. Since the algorithm minimizes the potential in each step, the decrease in the potential for this algorithm is at least as large as that for Hypothetical-Oblivious, and hence we obtain the same guarantee as for Hypothetical-Oblivious. The choice of potential function varies across the results; also, whether we minimize or maximize the potential function varies across the results.

For instance, in our first result (Theorem 2.2), the performance of Hypothetical-Oblivious is analyzed using Chernoff bounds. The Chernoff bounds are proven by showing bounds on the expectation of the moment generating function of a random variable. Thus the potential function is the sum of the moment generating functions of all the random variables to which we apply the Chernoff bounds. The proof shows that in each step this potential function decreases by some multiplicative factor. The algorithm is then designed to achieve the same decrease in the potential function. A particularly pleasing aspect of this technique is that we obtain very simple proofs. E.g., the proof of the second result mentioned above (that greedy is 1 − 1/e competitive, Theorem 2.4) is extremely simple: the potential function in this case is simply the total amount of unused budget, and we show that this amount (in expectation) decreases by a factor of 1 − 1/m in each step, where there are m steps in all.
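The potential-minimization idea can be illustrated with a small sketch. The potential below, a sum of exponentials (moment generating functions) of the capacity-normalized resource loads, is a simplified stand-in for the one used in the analysis; the parameter eta and the normalization are our illustrative assumptions, not the paper's exact algorithm:

```python
import math

# Schematic "minimize the potential" online rule: the potential is
# sum_i exp(eta * load_i / capacity_i), and each arriving request is served
# by the option whose consumption leads to the smallest potential.

def serve_requests(requests, capacities, eta=0.1):
    """requests: list of requests; each request is a list of options, and each
    option is a dict resource -> consumption.
    Returns the index of the option chosen for each request."""
    load = {i: 0.0 for i in capacities}
    choices = []
    for options in requests:
        def potential_after(opt):
            # potential if this option were taken now
            return sum(math.exp(eta * (load[i] + opt.get(i, 0.0)) / capacities[i])
                       for i in capacities)
        k = min(range(len(options)), key=lambda t: potential_after(options[t]))
        for i, u in options[k].items():
            load[i] += u
        choices.append(k)
    return choices
```

The exponential penalizes heavily loaded resources, so the rule steers consumption away from resources that are close to their budgets, mirroring the role of the moment generating function in the Chernoff-bound analysis.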
Multiplicative-Weight Updates

Our techniques and the resulting algorithms for our first and third results (Theorem 2.2 and Theorem 2.5) are similar to the algorithms of Young [1995, 2001] for derandomizing randomized rounding and the fast approximation algorithms for solving covering/packing LPs of Plotkin, Shmoys, and Tardos [1991], Garg and Koenemann [1998], Fleischer [2000]. In fact Arora et al. [2005] showed that all these algorithms are related to the multiplicative weights update method for solving the experts problem, and especially highlighted the similarity between the potential function used in the analysis of the multiplicative update method and the moment generating function used in the proof of Chernoff bounds and Young's algorithms. Hence it is no surprise that our algorithm, which uses Chernoff bounds, is also a multiplicative update algorithm. Our algorithm is closer in spirit to Young's algorithms than the others. The main difference is that our algorithm solves an online problem, rather than an offline one, and hence runs short of essential distribution dependent parameters needed to run the multiplicative weights based algorithm directly: we show that these parameters can be estimated near optimally. Further, we introduce more adversarial models of online input, namely, the varying ASI models, and come up with varying levels of knowledge of the distribution that are sufficient to be able to design good algorithms for these models. For the offline case, a basic difference of our algorithm from this previous set of results is that our algorithm uses the special structure of the polytope defined by the constraints Σ_k x_{j,k} ≤ 1: a direct use of the Plotkin et al. [1991] algorithm would incur a quadratic dependence on γm, whereas using the special structure of the polytope, we obtain a linear dependence on γm.

It is possible that our algorithm can also be interpreted as an algorithm for the experts problem. In fact Mehta et al. [2005] asked if there is a 1 − o(1) competitive algorithm for Adwords in the i.i.d. model with small bid to budget ratio, and in particular if the algorithms for experts could be used. They also conjectured that such an algorithm would iteratively adjust a budget discount factor based on the rate at which the budget is spent. Our algorithms for resource allocation problems, when specialized for Adwords, look exactly like that, but we do not provide formal connections to the experts framework. This was done in follow-up works Agrawal and Devanur [2015], Gupta and Molinaro [2014], which showed that essentially the same algorithm as ours can be thought of as using a subroutine of multiplicative-weight updates on a suitably defined learning with experts problem.

Follow-up work
There have been a number of follow-up papers since the conference version of this paper was published. Alaei et al. [2012] show that for the Adwords problem with a known distribution, it is enough for γ to be O(ǫ²) to get a 1 − ǫ approximation. Simultaneously, Devanur et al. [2012] showed the same dependence of γ = O(ǫ²) for the Adwords problem, but requiring only a few parameters from the distribution. Kapralov et al. [2013] study a generalized version of the adwords problem where an advertiser's profit, instead of being budget-additive, could be an arbitrary submodular function of the queries assigned to him. For this problem in the worst case setting, they show that no algorithm can obtain better than a 1/2 approximation, which the greedy algorithm already achieves. For the same problem in the i.i.d. setting, they show, using techniques we develop in this work, that the greedy algorithm obtains a 1 − 1/e approximation. Kesselheim et al. [2014] gave guarantees similar to ours for the random permutation model (i.i.d. without replacement), and also get the improved bound of γ = O(ǫ²) for the special case of the Adwords problem. On the other hand, the algorithms of Kesselheim et al. [2014] are computationally expensive, requiring a linear program to be solved for serving every single request, whereas our algorithm performs a much simpler optimization in every step: for the adwords problem, for instance, it takes only linear time to perform each step's optimization. Both Agrawal and Devanur [2015] and Gupta and Molinaro [2014] showed that essentially the same algorithm as ours also works for the random permutation model, with the same guarantees, while also relating it formally to the learning from experts framework. Agrawal and Devanur [2015] also greatly generalize the resource allocation framework to handle arbitrary concave objectives and convex constraints. Eghbali et al.
[2014] interpret our algorithm as an exponentiated sub-gradient algorithm, show that it works for the random permutation model, and give a slight generalization to handle additively separable concave reward functions.

2 Preliminaries and Main Results

2.1 Resource Allocation Framework

We consider the following framework of optimization problems. There are n resources, with resource i ∈ A having a capacity of c_i. There are m requests; each request j ∈ J can be satisfied by a vector x_j ∈ {0, 1}^K, with coordinates x_{j,k}, such that Σ_k x_{j,k} ≤
1. Think of the vector x_j as picking a single option to satisfy a request, from a total of K options. We use K to denote the set of options. The vector x_j consumes a_{i,j} · x_j amount of resource i, and gives w_{i,j} · x_j amount of type i profit. The a_{i,j}'s and w_{i,j}'s are non-negative vectors of length K (and so are the x_j's). The coordinates of the vectors a_{i,j} and w_{i,j} will be denoted by a_{ijk} and w_{ijk} respectively, i.e., the k-th option consumes a_{ijk} amount of resource i and gives a type i profit of w_{ijk}. The objective is to maximize the minimum among all types of profit, subject to the capacity constraints on the resources. The following is the linear program relaxation of the resource allocation problem:

Maximize  min_{i ∈ A} Σ_{j ∈ J} w_{i,j} · x_j   s.t.
∀ i ∈ A:  Σ_{j ∈ J} a_{i,j} · x_j ≤ c_i
∀ j ∈ J:  Σ_{k ∈ K} x_{j,k} ≤ 1
∀ j ∈ J, k ∈ K:  x_{j,k} ≥ 0

We assume that there is a ⊥ option (⊥ may not be in the set K) for which a_{ij⊥} = 0 and w_{ij⊥} = 0 for all i, j.

We consider two versions of the above problem. The first is an online version with stochastic input: requests are drawn from an unknown distribution. The second is an offline problem where the number of requests is much larger than the number of resources, and our goal is to design a fast PTAS for the problem. (While this notation seems to imply that the number of resource-types is equal to the number/set of profit-types, namely n, this choice was made purely to reduce clutter in notation. In general the number/set of resource-types could be different from that of the profit-types, and it is straightforward to verify that our proofs go through for the general case.)

2.2 Near-Optimal Online Algorithm for Resource Allocation

We now consider an online version of the resource allocation framework. Here requests arrive online. We consider the i.i.d. model, where each request is drawn independently from a given distribution. The distribution is unknown to the algorithm. The algorithm knows m, the total number of requests.
To define our benchmark, we now define the expected instance.

Expected Instance
Consider the following expected instance of the problem, where everything happens as per expectation. It is a single offline instance which is a function of the given distribution over requests and the total number of requests m. Every request in the support of the distribution is also a request in this instance. The capacities of the resources in this instance are the same as in the original instance. Suppose request j has a probability p_j of arriving in the given distribution. The resource consumption of j in the expected instance is given by m p_j a_{i,j} for all i, and the type i profit is m p_j w_{i,j}. The intuition is that if the requests were drawn from this distribution, then the expected number of times request j is seen is m p_j. To summarize, the LP relaxations of a random instance with set of requests R, and the expected instance E, are as follows (slightly rewritten for convenience).

LP relaxations for random and expected instances (1)

Random Instance R:
Maximize λ s.t.
∀ i ∈ A:  Σ_{j ∈ R, k ∈ K} w_{ijk} x_{j,k} ≥ λ
∀ i ∈ A:  Σ_{j ∈ R, k ∈ K} a_{ijk} x_{j,k} ≤ c_i
∀ j ∈ R:  Σ_{k ∈ K} x_{j,k} ≤ 1
∀ j ∈ R, k ∈ K:  x_{j,k} ≥ 0.

Expected Instance E:
Maximize λ s.t.
∀ i ∈ A:  Σ_{j ∈ J, k ∈ K} m p_j w_{ijk} x_{j,k} ≥ λ
∀ i ∈ A:  Σ_{j ∈ J, k ∈ K} m p_j a_{ijk} x_{j,k} ≤ c_i
∀ j ∈ J:  Σ_{k ∈ K} x_{j,k} ≤ 1
∀ j ∈ J, k ∈ K:  x_{j,k} ≥ 0.

We now prove that the fractional optimal solution W^E to the expected instance is an upper bound on the expectation of W^R, where W^R is the offline fractional optimum of the actual sequence of requests in a random instance R.

Lemma 2.1  W^E ≥ E[W^R].

Proof:
The average of the optimal solutions for all possible sequences of requests is a feasible solution to the expected instance, with a profit equal to E[W^R]. Thus the optimal profit for the expected instance can only be larger.

The approximation factor of an algorithm in the i.i.d. model is defined as the ratio of the expected profit of the algorithm to the fractional optimal profit W^E for the expected instance. Let γ = max({a_{ijk}/c_i}_{i,j,k} ∪ {w_{ijk}/W^E}_{i,j,k}) be the parameter capturing the significance of any one request when compared to the total set of requests that arrive online. The main result is that as γ tends to zero, the approximation factor tends to 1. In fact, we give the almost optimal trade-off.

Theorem 2.2
For any ǫ ≥ 1/m, Algorithm 2 achieves an objective value of W^E(1 − O(ǫ)) for the online resource allocation problem with probability at least 1 − ǫ, assuming γ = O(ǫ²/log(n/ǫ)). Algorithm 2 does not require any knowledge of the distribution at all.

Theorem 2.3
There exist instances with γ = ǫ²/log(n) such that no algorithm, even with complete knowledge of the distribution, can get a 1 − o(ǫ) approximation factor.

Oracle Assumption
We assume that we have the following oracle available to us: given a request j and a vector v, the oracle returns the vector x_j that maximizes v · x_j among all x_j ∈ {0, 1}^K satisfying Σ_{k ∈ K} x_{j,k} ≤
1. This assumption boils down to being able to find the maximum among K numbers, but K may be exponential in some cases. For the Adwords and display ads problems (described below), K is actually equal to n, and this is trivial. For network routing (described in Section 7), K could be exponential in the size of the network, and this assumption corresponds to being able to find the shortest path in a graph in polynomial time. For combinatorial auctions (described in Section 7), this corresponds to the demand query assumption: given prices on various items, the buyer should be able to decide in polynomial time which bundle gives her the maximum utility. (While this is not always achievable in polynomial time, there cannot be any hope of a posted pricing solution for combinatorial auctions without this minimum assumption.)

Extensions and Special Cases
The extensions of Theorem 2.2 to the various generalizations of the i.i.d. model, including the adversarial stochastic input model, are presented in Section 3.5. We refer the reader to Section 7 for a discussion of several problems that are special cases of the resource allocation framework and have been previously considered. Here, we discuss two special cases: the Adwords problem and the display ads problem.

1.
Adwords.
In the adwords problem, there are n advertisers, with advertiser i having a daily budget of B_i. There are m keywords/queries that arrive online, and advertiser i has a bid of b_{ij} for query j. This is a special case of the resource allocation framework where the set of options K matches the set of resources/advertisers A, i.e., each query can be given to at most one advertiser, and will consume budget just from that advertiser. Let x_{ij} denote the indicator variable for whether or not query j was allocated to advertiser i. After all allocation is over, advertiser i pays min(Σ_{j ∈ J} b_{ij} x_{ij}, B_i), i.e., the minimum of the sum of the bids for queries allocated to i and his budget B_i. The objective is to maximize the sum of the payments from all advertisers; this is again a special case of the resource allocation framework where there is only a single profit type, and we just want to maximize it. One could raise a technical objection that this is not a special case of the resource allocation framework because the budget constraint is not binding: the value of the allocated bids to an advertiser can exceed his budget, although the total payment from the advertiser will be at most the budget. But it is not difficult to see that the LP relaxation of the offline problem can be written as in LP (2), which is clearly a special case of the resource allocation framework LP. Note that the benchmark is anyway an upper bound even on the expected optimal fractional solution. Therefore, any algorithm that gets an α approximation factor for resource allocation is also guaranteed to get the same approximation factor for Adwords; the only notable point is that an algorithm for resource allocation, when used for adwords, will treat the budget constraints as binding, and obtain the guarantee promised in Theorem 2.2. (Our 1 − 1/e approximation algorithm for adwords in Section 5 holds for all values of γ ≤ 1.)

2. Display Ads.
In the display ads problem, there are n advertisers, and m impressions arrive online. Advertiser i wants c_i impressions in total and pays v_{ij} for impression j, and will be paid a penalty of ρ_i for every undelivered impression. If over-delivered, he will pay his bid for the first c_i impressions delivered. Letting b_{ij} = v_{ij} + ρ_i, we can write the LP relaxation of the offline display ads problem as in LP (2), which is clearly a special case of the resource allocation LP, where, just like the Adwords special case, the set of options K is equal to the set of resources/advertisers A, and there is only a single profit type.

LP relaxations for Adwords and Display Ads (2)
Maximize P i ∈A ,j ∈J b ij x ij s.t. Maximize P i ∈A ,j ∈J b ij x ij s.t. ∀ i ∈ A , P j ∈J b ij x ij ≤ B i ∀ i ∈ A , P j ∈J x ij ≤ c i ∀ j ∈ J , P i ∈A x ij ≤ ∀ j ∈ J , P i ∈A x ij ≤ ∀ i ∈ A , j ∈ J , x ij ≥ . ∀ i ∈ A , j ∈ J , x ij ≥ . As noted in the introduction, the greedy algorithm is widely implemented due to its simplicity, butits performance was known to be only a 1 / − /e approximation for all γ , i.e., 0 ≤ γ ≤ Theorem 2.4
The greedy algorithm achieves an approximation factor of $1 - 1/e$ for the adwords problem in the i.i.d. unknown distributions model for all $\gamma$, i.e., $0 \le \gamma \le 1$.

We note here that the competitive ratio of $1 - 1/e$ is tight for the greedy algorithm Goel and Mehta [2008]. It is however not known to be tight for an arbitrary algorithm. Charles et al. [2010] consider the following problem: given a bipartite graph $G = (L, R, E)$ where $m = |L| \gg |R| = n$, does there exist an assignment $M : L \to R$ with $(j, M(j)) \in E$ for all $j \in L$, and such that for every vertex $i \in R$, $|M^{-1}(i)| \ge B_i$ for some given values $B_i$? Since $m$ is very large, classic matching algorithms are not useful. Charles et al. [2010] gave an algorithm that runs in time linear in the number of edges of an induced subgraph obtained by taking a random sample from $L$ of size $O\!\left(\frac{m \log n}{\min_i \{B_i\}\, \epsilon^2}\right)$, for a gap-version of the problem with gap $\epsilon$. Such an algorithm is very useful in a variety of applications involving ad assignment for online advertising, particularly when $\min_i \{B_i\}$ is large.

We consider a generalization of the above problem inspired by the resource allocation framework. In fact, we consider the following mixed covering-packing integer program. Suppose that there are $n$ packing constraints, one for each $i \in [n]$, of the form $\sum_{j=1}^m a_{i,j} \cdot x_j \le c_i$, and $n$ covering constraints, one for each $i \in [n]$, of the form $\sum_{j=1}^m w_{i,j} \cdot x_j \ge d_i$. Each $x_j$ (with coordinates $x_{j,k}$) is constrained to be in $\{0,1\}^K$ and to satisfy $\sum_k x_{j,k} \le 1$. The $a_{i,j}$'s and $w_{i,j}$'s (and hence the $x_j$'s) are non-negative vectors of length $K$, with coordinates $a_{ijk}$ and $w_{ijk}$. Does there exist a feasible solution to this system of constraints? The gap-version of this problem is as follows. Distinguish between the two cases, with a high probability, say $1 - \delta$:
• YES: There is a feasible solution.
• NO: There is no feasible solution even if all the $c_i$'s are multiplied by $1 + \epsilon$ and all the $d_i$'s are multiplied by $1 - \epsilon$.
We note that solving (offline) an optimization problem in the resource allocation framework can be reduced to the above problem through a binary search on the objective function value. Let $\gamma = \max\left(\left\{\frac{a_{ijk}}{c_i}\right\}_{i,j,k} \cup \left\{\frac{w_{ijk}}{d_i}\right\}_{i,j,k}\right)$.

Theorem 2.5
For any $\epsilon > 0$, assuming $\gamma = O\!\left(\frac{\epsilon^2}{\log(n/\epsilon)}\right)$, Algorithm 5 solves the gap version of the mixed covering-packing integer program with $\Theta\!\left(\frac{\gamma m \log(n/\delta)}{\epsilon^2}\right)$ oracle calls.

We present here the form of Chernoff bounds that we use throughout the rest of this paper. Let $X = \sum_i X_i$, where the $X_i \in [0, B]$ are i.i.d. random variables, and let $\operatorname{E}[X] = \mu$. Then, for all $\epsilon > 0$,
$$\Pr[X < \mu(1-\epsilon)] < \exp\left(-\frac{\epsilon^2 \mu}{2B}\right).$$
Consequently, for all $\delta > 0$, with probability at least $1 - \delta$,
$$X - \mu \ge -\sqrt{2\mu B \ln(1/\delta)}.$$
Similarly, for all $\epsilon \in [0, 2e - 1]$,
$$\Pr[X > \mu(1+\epsilon)] < \exp\left(-\frac{\epsilon^2 \mu}{4B}\right).$$
Consequently, for all $\delta > \exp\left(-\frac{(2e-1)^2 \mu}{4B}\right)$, with probability at least $1 - \delta$,
$$X - \mu \le \sqrt{4\mu B \ln(1/\delta)}.$$
For $\epsilon > 2e - 1$,
$$\Pr[X > \mu(1+\epsilon)] < 2^{-(1+\epsilon)\mu/B}.$$
(In fact, the algorithm makes a single pass through this graph.)

Near-Optimal Prior Robust Online Algorithms for Resource Allocation
For convenience, we begin by rewriting the LP relaxation of a random instance $R$ of the online resource allocation problem and the expected instance (already defined in Section 2.2 as LP (1)).

LPs for random and expected instances (3)

Random Instance $R$:
  Maximize $\lambda$ s.t.
  $\forall i \in \mathcal{A}$, $\sum_{j \in R, k \in \mathcal{K}} w_{ijk} x_{j,k} \ge \lambda$
  $\forall i \in \mathcal{A}$, $\sum_{j \in R, k \in \mathcal{K}} a_{ijk} x_{j,k} \le c_i$
  $\forall j \in R$, $\sum_{k \in \mathcal{K}} x_{j,k} \le 1$
  $\forall j \in R, k \in \mathcal{K}$, $x_{j,k} \ge 0$.

Expected Instance $E$:
  Maximize $\lambda$ s.t.
  $\forall i \in \mathcal{A}$, $\sum_{j \in \mathcal{J}, k \in \mathcal{K}} m p_j w_{ijk} x_{j,k} \ge \lambda$
  $\forall i \in \mathcal{A}$, $\sum_{j \in \mathcal{J}, k \in \mathcal{K}} m p_j a_{ijk} x_{j,k} \le c_i$
  $\forall j \in \mathcal{J}$, $\sum_{k \in \mathcal{K}} x_{j,k} \le 1$
  $\forall j \in \mathcal{J}, k \in \mathcal{K}$, $x_{j,k} \ge 0$.

We showed in Lemma 2.1 that $W_E \ge \operatorname{E}[W_R]$. All our approximation guarantees are w.r.t. the stronger benchmark of $W_E$, which is the optimal fractional solution of the expected instance. We would like to remind the reader that while the benchmark is allowed to be fractional, the online algorithm of course is allowed to find only integral solutions.

We divide the rest of this section into four subsections. The subsections progressively weaken the assumptions on knowledge of the distribution of the input.
1. In Section 3.1 we develop a hypothetical algorithm called Hypothetical-Oblivious-Conservative, denoted by $\tilde{P}$, that achieves an objective value of $W_E(1 - 2\epsilon)$ w.p. at least $1 - \epsilon$, assuming $\gamma = O\!\left(\frac{\epsilon^2}{\log(n/\epsilon)}\right)$. Theorem 3.1 is the main result of this section. The algorithm is hypothetical because it assumes knowledge of the entire distribution, whereas the goal of this paper is to develop algorithms that work without distributional knowledge.
2. In Section 3.2 we design an algorithm for the online resource allocation problem that achieves the same guarantee as the Hypothetical-Oblivious-Conservative algorithm $\tilde{P}$, without any knowledge of the distribution except for a single parameter of the distribution: the value of $W_E$. Theorem 3.2 is the main result of this section.
3. In Section 3.3 we design an algorithm for the online resource allocation problem that achieves an objective value of at least $W_E(1 - O(\epsilon))$ w.p. at least $1 - \epsilon$, assuming $\gamma = O\!\left(\frac{\epsilon^2}{\log(n/\epsilon)}\right)$, without any knowledge at all about the distribution. The algorithm in Section 3.2 serves as a good warm-up for the algorithm in this section. Theorem 2.2 is the main result of this section.
4. In Section 3.5 we relax the assumption that the distribution from which the requests are drawn is i.i.d.; we give three different generalizations of the i.i.d. model with strong revenue guarantees as in the i.i.d. model.

When the distributions are completely known, we first compute the expected instance and solve its LP relaxation (LP (3)) optimally. Let $x^*_{jk}$ denote the optimal solution to the expected LP (3). The Hypothetical-Oblivious algorithm $P$ works as follows: when request $j$ arrives, it serves it using option $k$ with probability $x^*_{jk}$. Let $X^*_{i,t}$ denote the amount of resource $i$ consumed in step $t$ by the algorithm $P$. Thus the total amount of resource $i$ consumed over the entire $m$ steps of algorithm $P$ is $\sum_{t=1}^m X^*_{i,t}$. Note that $\operatorname{E}[X^*_{i,t}] = \sum_{j,k} p_j a_{ijk} x^*_{jk} \le \frac{c_i}{m}$. Thus, we can bound the probability $\Pr\left[\sum_{t=1}^m X^*_{i,t} \ge c_i(1+\epsilon)\right]$ using Chernoff bounds. We explicitly derive this bound here since we use this derivation in designing the algorithm in Section 3.2.

Since we cannot exceed $c_i$ amount of resource consumption by any non-zero amount, we need to be more conservative than $P$. So we analyze the following algorithm $\tilde{P}$, called Hypothetical-Oblivious-Conservative, instead of $P$: when request $j$ arrives, it serves it using option $k$ with probability $\frac{x^*_{jk}}{1+\epsilon}$, where $\epsilon$ is an error parameter of the algorithm designer's choice. Let $\tilde{X}_{i,t}$ denote the amount of resource $i$ consumed in step $t$ by the algorithm $\tilde{P}$. Note that $\operatorname{E}[\tilde{X}_{i,t}] \le \frac{c_i}{(1+\epsilon)m}$.
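The serving rule of the Hypothetical-Oblivious-Conservative algorithm $\tilde{P}$ can be sketched in a few lines. This is a minimal illustration, not the paper's code: the dictionary `x_star_j` below is a hypothetical stand-in for the optimal LP marginals $x^*_{jk}$ of a single request type, and `None` plays the role of the null option $\perp$.

```python
import random

def serve_conservative(x_star_j, eps, rng):
    """Serve one request j: pick option k with probability x_star_j[k]/(1+eps),
    and fall back to the null option (None, the ⊥ of the text) otherwise."""
    u = rng.random()
    cum = 0.0
    for k, p in x_star_j.items():
        cum += p / (1.0 + eps)
        if u < cum:
            return k
    return None

# Hypothetical optimal LP marginals for one request type (they sum to 1):
x_star_j = {"k1": 0.6, "k2": 0.4}
rng = random.Random(0)
counts = {"k1": 0, "k2": 0, None: 0}
for _ in range(100_000):
    counts[serve_conservative(x_star_j, eps=0.1, rng=rng)] += 1
frac_null = counts[None] / 100_000  # about 1 - 1/1.1: slack left deliberately
```

Scaling every marginal down by $1+\epsilon$ is what leaves enough slack for the Chernoff-style fluctuations analyzed next, at the price of a small loss in expected profit.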
Thus, even with a $(1+\epsilon)$ deviation in the Chernoff bounds, the resource consumption is at most $c_i$. We begin by noting that $\tilde{X}_{i,t} \le \gamma c_i$ by the definition of $\gamma$. For all $\epsilon \in [0,1]$ we have,
$$\Pr\left[\sum_{t=1}^m \tilde{X}_{i,t} \ge c_i\right] = \Pr\left[\frac{\sum_{t=1}^m \tilde{X}_{i,t}}{\gamma c_i} \ge \frac{1}{\gamma}\right] = \Pr\left[(1+\epsilon)^{\sum_{t=1}^m \tilde{X}_{i,t}/(\gamma c_i)} \ge (1+\epsilon)^{1/\gamma}\right]$$
$$\le \operatorname{E}\left[(1+\epsilon)^{\sum_{t=1}^m \tilde{X}_{i,t}/(\gamma c_i)}\right] \Big/ (1+\epsilon)^{1/\gamma} = \operatorname{E}\left[\prod_{t=1}^m (1+\epsilon)^{\tilde{X}_{i,t}/(\gamma c_i)}\right] \Big/ (1+\epsilon)^{1/\gamma}$$
$$\le \operatorname{E}\left[\prod_{t=1}^m \left(1 + \frac{\epsilon \tilde{X}_{i,t}}{\gamma c_i}\right)\right] \Big/ (1+\epsilon)^{1/\gamma} \le \left[\prod_{t=1}^m \left(1 + \frac{\epsilon}{(1+\epsilon)\gamma m}\right)\right] \Big/ (1+\epsilon)^{1/\gamma}$$
$$\le \left(\frac{e^{\epsilon/(1+\epsilon)}}{1+\epsilon}\right)^{1/\gamma} = \left(\frac{e^{\epsilon}}{(1+\epsilon)^{1+\epsilon}}\right)^{\frac{1}{(1+\epsilon)\gamma}} \le e^{-\frac{\epsilon^2}{6\gamma}} \le \frac{\epsilon}{2n},$$
where the first inequality follows from Markov's inequality, the second from convexity of the exponential function together with the fact that $\tilde{X}_{i,t} \le \gamma c_i$, the third from $\operatorname{E}[\tilde{X}_{i,t}] \le \frac{c_i}{(1+\epsilon)m}$, the fourth from $1 + x \le e^x$, the fifth is standard for all $\epsilon \in [0,1]$, and the last follows from $\gamma = O(\epsilon^2/\log(n/\epsilon))$ for an appropriate choice of the constant inside the big-oh, coupled with $n \ge 2$.

Remark 3.1
At first sight this bound might seem anomalous: the bound $\frac{\epsilon}{2n}$ is increasing in $\epsilon$, i.e., the probability of a smaller deviation is smaller than the probability of a larger deviation! The reason for this anomaly is that $\gamma$ is related to $\epsilon$ as $\gamma = O\!\left(\frac{\epsilon^2}{\log(n/\epsilon)}\right)$, and the smaller the $\gamma$, the better the revenue we can get (i.e., more granular requests lead to less wastage from errors, and hence more revenue). Thus a small deviation for a small $\gamma$ has a smaller probability than a larger deviation for a larger $\gamma$.

Similarly, let $\tilde{Y}_{i,t}$ denote the revenue obtained from type $i$ profit in step $t$ by the algorithm $\tilde{P}$. Note that $\operatorname{E}[\tilde{Y}_{i,t}] = \sum_{j,k} \frac{p_j w_{ijk} x^*_{jk}}{1+\epsilon} \ge \frac{W_E}{(1+\epsilon)m}$. By the definition of $\gamma$, we have $\tilde{Y}_{i,t} \le \gamma W_E$. For all $\epsilon \in [0,1]$ we have,
$$\Pr\left[\sum_{t=1}^m \tilde{Y}_{i,t} \le \frac{W_E(1-\epsilon)}{1+\epsilon}\right] = \Pr\left[\frac{\sum_{t=1}^m \tilde{Y}_{i,t}}{\gamma W_E} \le \frac{1-\epsilon}{(1+\epsilon)\gamma}\right] = \Pr\left[(1-\epsilon)^{\sum_{t=1}^m \tilde{Y}_{i,t}/(\gamma W_E)} \ge (1-\epsilon)^{\frac{1-\epsilon}{(1+\epsilon)\gamma}}\right]$$
$$\le \operatorname{E}\left[(1-\epsilon)^{\sum_{t=1}^m \tilde{Y}_{i,t}/(\gamma W_E)}\right] \Big/ (1-\epsilon)^{\frac{1-\epsilon}{(1+\epsilon)\gamma}} = \operatorname{E}\left[\prod_{t=1}^m (1-\epsilon)^{\tilde{Y}_{i,t}/(\gamma W_E)}\right] \Big/ (1-\epsilon)^{\frac{1-\epsilon}{(1+\epsilon)\gamma}}$$
$$\le \operatorname{E}\left[\prod_{t=1}^m \left(1 - \frac{\epsilon \tilde{Y}_{i,t}}{\gamma W_E}\right)\right] \Big/ (1-\epsilon)^{\frac{1-\epsilon}{(1+\epsilon)\gamma}} \le \left[\prod_{t=1}^m \left(1 - \frac{\epsilon}{(1+\epsilon)\gamma m}\right)\right] \Big/ (1-\epsilon)^{\frac{1-\epsilon}{(1+\epsilon)\gamma}}$$
$$\le \left(\frac{e^{-\epsilon}}{(1-\epsilon)^{1-\epsilon}}\right)^{\frac{1}{(1+\epsilon)\gamma}} \le e^{-\frac{\epsilon^2}{4\gamma}} \le \frac{\epsilon}{2n}.$$
Thus, we have all the capacity constraints satisfied (i.e., $\sum_t \tilde{X}_{i,t} \le c_i$), and all profits are at least $\frac{W_E(1-\epsilon)}{1+\epsilon}$ (i.e., $\sum_t \tilde{Y}_{i,t} \ge \frac{W_E(1-\epsilon)}{1+\epsilon} \ge W_E(1-2\epsilon)$), with probability at least $1 - 2n \cdot \frac{\epsilon}{2n} = 1 - \epsilon$. This proves the following theorem:

Theorem 3.1
For any $\epsilon > 0$, the Hypothetical-Oblivious-Conservative algorithm $\tilde{P}$ achieves an objective value of $W_E(1-2\epsilon)$ for the online resource allocation problem with probability at least $1 - \epsilon$, assuming $\gamma = O\!\left(\frac{\epsilon^2}{\log(n/\epsilon)}\right)$.

3.2 Unknown Distribution, Known $W_E$

We now design an algorithm $A$ without knowledge of the distribution, knowing just the single parameter $W_E$. (Note that the calligraphic $\mathcal{A}$ that we use for the set of advertisers/resources is different from the non-calligraphic $A$ that we use for an algorithm; it is immediate from the context which one we are referring to.) Let $A_s\tilde{P}_{m-s}$ be a hybrid algorithm that runs $A$ for the first $s$ steps and $\tilde{P}$ for the remaining $m-s$ steps. Let $\epsilon \in [0,1]$ be the error parameter, which is the algorithm designer's choice. For any algorithm $A$, let the amount of resource $i$ consumed in the $t$-th step be denoted by $X^A_{i,t}$ and the amount of type $i$ profit be denoted by $Y^A_{i,t}$. Call the algorithm a failure if at least one of the following fails:
1. For all $i$, $\sum_{t=1}^m X^A_{i,t} \le c_i$.
2. For all $i$, $\sum_{t=1}^m Y^A_{i,t} \ge \frac{W_E(1-\epsilon)}{1+\epsilon}$.
Let $S_s(X^A_i) = \sum_{t=1}^s X^A_{i,t}$ denote the amount of resource $i$ consumed in the first $s$ steps, and let $S_s(Y^A_i) = \sum_{t=1}^s Y^A_{i,t}$ denote the type $i$ profit in the first $s$ steps. Similar to the derivation in Section 3.1, which bounded the failure probability of $\tilde{P}$, we can bound the failure probability of any algorithm $A$:
$$\Pr\left[\sum_{t=1}^m X^A_{i,t} \ge c_i\right] = \Pr\left[\frac{\sum_{t=1}^m X^A_{i,t}}{\gamma c_i} \ge \frac{1}{\gamma}\right] = \Pr\left[(1+\epsilon)^{\sum_{t=1}^m X^A_{i,t}/(\gamma c_i)} \ge (1+\epsilon)^{1/\gamma}\right]$$
$$\le \operatorname{E}\left[(1+\epsilon)^{\sum_{t=1}^m X^A_{i,t}/(\gamma c_i)}\right] \Big/ (1+\epsilon)^{1/\gamma} = \operatorname{E}\left[\prod_{t=1}^m (1+\epsilon)^{X^A_{i,t}/(\gamma c_i)}\right] \Big/ (1+\epsilon)^{1/\gamma} \quad (4)$$
$$\Pr\left[\sum_{t=1}^m Y^A_{i,t} \le \frac{W_E(1-\epsilon)}{1+\epsilon}\right] = \Pr\left[\frac{\sum_{t=1}^m Y^A_{i,t}}{\gamma W_E} \le \frac{1-\epsilon}{(1+\epsilon)\gamma}\right] = \Pr\left[(1-\epsilon)^{\sum_{t=1}^m Y^A_{i,t}/(\gamma W_E)} \ge (1-\epsilon)^{\frac{1-\epsilon}{(1+\epsilon)\gamma}}\right]$$
$$\le \operatorname{E}\left[(1-\epsilon)^{\sum_{t=1}^m Y^A_{i,t}/(\gamma W_E)}\right] \Big/ (1-\epsilon)^{\frac{1-\epsilon}{(1+\epsilon)\gamma}} = \operatorname{E}\left[\prod_{t=1}^m (1-\epsilon)^{Y^A_{i,t}/(\gamma W_E)}\right] \Big/ (1-\epsilon)^{\frac{1-\epsilon}{(1+\epsilon)\gamma}} \quad (5)$$
Had our algorithm $A$ been $\tilde{P}$ (and therefore had we been able to use $\operatorname{E}[\tilde{X}_{i,t}] \le \frac{c_i}{(1+\epsilon)m}$ and $\operatorname{E}[\tilde{Y}_{i,t}] \ge \frac{W_E}{(1+\epsilon)m}$), the total failure probability, which is the sum of (4) and (5) over all the $i$'s, would have been $n \cdot \left[\frac{\epsilon}{2n} + \frac{\epsilon}{2n}\right] = \epsilon$.
The goal is to design an algorithm $A$ that, unlike $\tilde{P}$, does not know the distribution and knows just $W_E$, but obtains the same $\epsilon$ failure probability. That is, we want to show that the sum of (4) and (5) over all $i$'s is at most $\epsilon$:
$$\sum_i \frac{\operatorname{E}\left[\prod_{t=1}^m (1+\epsilon)^{X^A_{i,t}/(\gamma c_i)}\right]}{(1+\epsilon)^{1/\gamma}} + \sum_i \frac{\operatorname{E}\left[\prod_{t=1}^m (1-\epsilon)^{Y^A_{i,t}/(\gamma W_E)}\right]}{(1-\epsilon)^{\frac{1-\epsilon}{(1+\epsilon)\gamma}}} \le \epsilon.$$
For the algorithm $A_s\tilde{P}_{m-s}$, the above quantity can be rewritten as
$$\sum_i \frac{\operatorname{E}\left[(1+\epsilon)^{S_s(X^A_i)/(\gamma c_i)} \prod_{t=s+1}^m (1+\epsilon)^{\tilde{X}_{i,t}/(\gamma c_i)}\right]}{(1+\epsilon)^{1/\gamma}} + \sum_i \frac{\operatorname{E}\left[(1-\epsilon)^{S_s(Y^A_i)/(\gamma W_E)} \prod_{t=s+1}^m (1-\epsilon)^{\tilde{Y}_{i,t}/(\gamma W_E)}\right]}{(1-\epsilon)^{\frac{1-\epsilon}{(1+\epsilon)\gamma}}},$$
which, by using $(1+\epsilon)^x \le 1 + \epsilon x$ and $(1-\epsilon)^x \le 1 - \epsilon x$ for $0 \le x \le 1$, is in turn upper bounded by
$$\sum_i \frac{\operatorname{E}\left[(1+\epsilon)^{S_s(X^A_i)/(\gamma c_i)} \prod_{t=s+1}^m \left(1 + \frac{\epsilon \tilde{X}_{i,t}}{\gamma c_i}\right)\right]}{(1+\epsilon)^{1/\gamma}} + \sum_i \frac{\operatorname{E}\left[(1-\epsilon)^{S_s(Y^A_i)/(\gamma W_E)} \prod_{t=s+1}^m \left(1 - \frac{\epsilon \tilde{Y}_{i,t}}{\gamma W_E}\right)\right]}{(1-\epsilon)^{\frac{1-\epsilon}{(1+\epsilon)\gamma}}}.$$
Since for all $t$ the random variables $\tilde{X}_{i,t}$, $X^A_{i,t}$, $\tilde{Y}_{i,t}$ and $Y^A_{i,t}$ are all independent, and $\operatorname{E}[\tilde{X}_{i,t}] \le \frac{c_i}{(1+\epsilon)m}$ and $\operatorname{E}[\tilde{Y}_{i,t}] \ge \frac{W_E}{(1+\epsilon)m}$, the above is in turn upper bounded by
$$\sum_i \frac{\operatorname{E}\left[(1+\epsilon)^{S_s(X^A_i)/(\gamma c_i)}\right] \left(1 + \frac{\epsilon}{(1+\epsilon)\gamma m}\right)^{m-s}}{(1+\epsilon)^{1/\gamma}} + \sum_i \frac{\operatorname{E}\left[(1-\epsilon)^{S_s(Y^A_i)/(\gamma W_E)}\right] \left(1 - \frac{\epsilon}{(1+\epsilon)\gamma m}\right)^{m-s}}{(1-\epsilon)^{\frac{1-\epsilon}{(1+\epsilon)\gamma}}}. \quad (6)$$
Let $F[A_s\tilde{P}_{m-s}]$ denote the quantity in (6), which is an upper bound on the failure probability of the hybrid algorithm $A_s\tilde{P}_{m-s}$. By Theorem 3.1, we know that $F[\tilde{P}_m] \le \epsilon$. We now prove that for all $s \in \{0, 1, \ldots, m-1\}$, $F[A_{s+1}\tilde{P}_{m-s-1}] \le F[A_s\tilde{P}_{m-s}]$, thus proving that $F[A_m] \le \epsilon$, i.e., running the algorithm $A$ for all the $m$ steps results in a failure with probability at most $\epsilon$. To design such an $A$ we closely follow the derivation of Chernoff bounds, which is what established that $F[\tilde{P}_m] \le \epsilon$ in Theorem 3.1. However, the design process will reveal that, unlike algorithm $\tilde{P}$, which needs the entire distribution, just the knowledge of $W_E$ suffices for bounding the failure probability by $\epsilon$. Assuming that for all $s < p$ the algorithm $A$ has been defined for the first $s+1$ steps in such a way that $F[A_{s+1}\tilde{P}_{m-s-1}] \le F[A_s\tilde{P}_{m-s}]$, we now define $A$ for the $(p+1)$-th step in a way that will ensure that $F[A_{p+1}\tilde{P}_{m-p-1}] \le F[A_p\tilde{P}_{m-p}]$. We have
$$F[A_{p+1}\tilde{P}_{m-p-1}] = \sum_i \frac{\operatorname{E}\left[(1+\epsilon)^{S_{p+1}(X^A_i)/(\gamma c_i)}\right] \left(1 + \frac{\epsilon}{(1+\epsilon)\gamma m}\right)^{m-p-1}}{(1+\epsilon)^{1/\gamma}} + \sum_i \frac{\operatorname{E}\left[(1-\epsilon)^{S_{p+1}(Y^A_i)/(\gamma W_E)}\right] \left(1 - \frac{\epsilon}{(1+\epsilon)\gamma m}\right)^{m-p-1}}{(1-\epsilon)^{\frac{1-\epsilon}{(1+\epsilon)\gamma}}}$$
$$\le \sum_i \frac{\operatorname{E}\left[(1+\epsilon)^{S_p(X^A_i)/(\gamma c_i)} \left(1 + \frac{\epsilon X^A_{i,p+1}}{\gamma c_i}\right)\right] \left(1 + \frac{\epsilon}{(1+\epsilon)\gamma m}\right)^{m-p-1}}{(1+\epsilon)^{1/\gamma}} + \sum_i \frac{\operatorname{E}\left[(1-\epsilon)^{S_p(Y^A_i)/(\gamma W_E)} \left(1 - \frac{\epsilon Y^A_{i,p+1}}{\gamma W_E}\right)\right] \left(1 - \frac{\epsilon}{(1+\epsilon)\gamma m}\right)^{m-p-1}}{(1-\epsilon)^{\frac{1-\epsilon}{(1+\epsilon)\gamma}}} \quad (7)$$
Define
$$\phi_{i,s} = \frac{1}{c_i}\left[\frac{(1+\epsilon)^{S_s(X^A_i)/(\gamma c_i)} \left(1 + \frac{\epsilon}{(1+\epsilon)\gamma m}\right)^{m-s-1}}{(1+\epsilon)^{1/\gamma}}\right], \qquad \psi_{i,s} = \frac{1}{W_E}\left[\frac{(1-\epsilon)^{S_s(Y^A_i)/(\gamma W_E)} \left(1 - \frac{\epsilon}{(1+\epsilon)\gamma m}\right)^{m-s-1}}{(1-\epsilon)^{\frac{1-\epsilon}{(1+\epsilon)\gamma}}}\right].$$
Define step $p+1$ of algorithm $A$ as picking the following option $k^*$ for request $j$, where
$$k^* = \arg\min_k \left\{\sum_i a_{ijk} \cdot \phi_{i,p} - \sum_i w_{ijk} \cdot \psi_{i,p}\right\}. \quad (8)$$
For the sake of clarity, the entire algorithm is presented in Algorithm 1.

Algorithm 1: Algorithm for stochastic online resource allocation with unknown distribution, known $W_E$

Input:
Capacities $c_i$ for $i \in [n]$, the total number of requests $m$, the values of $\gamma$ and $W_E$, an error parameter $\epsilon > 0$.

Output:
An online allocation of resources to requests.
  Initialize $\phi_{i,0} = \frac{1}{c_i}\left[\frac{\left(1 + \frac{\epsilon}{(1+\epsilon)\gamma m}\right)^{m-1}}{(1+\epsilon)^{1/\gamma}}\right]$ and $\psi_{i,0} = \frac{1}{W_E}\left[\frac{\left(1 - \frac{\epsilon}{(1+\epsilon)\gamma m}\right)^{m-1}}{(1-\epsilon)^{\frac{1-\epsilon}{(1+\epsilon)\gamma}}}\right]$.
  for $s = 1$ to $m$ do
    If the incoming request is $j$, use the following option $k^*$:
    $$k^* = \arg\min_{k \in \mathcal{K} \cup \{\perp\}} \left\{\sum_i a_{ijk} \cdot \phi_{i,s-1} - \sum_i w_{ijk} \cdot \psi_{i,s-1}\right\}.$$
    $X^A_{i,s} = a_{ijk^*}$, $Y^A_{i,s} = w_{ijk^*}$
    Update $\phi_{i,s} = \phi_{i,s-1} \cdot \frac{(1+\epsilon)^{X^A_{i,s}/(\gamma c_i)}}{1 + \frac{\epsilon}{(1+\epsilon)\gamma m}}$ and $\psi_{i,s} = \psi_{i,s-1} \cdot \frac{(1-\epsilon)^{Y^A_{i,s}/(\gamma W_E)}}{1 - \frac{\epsilon}{(1+\epsilon)\gamma m}}$
  end for

By the definition of step $p+1$ of algorithm $A$ given in equation (8), it follows that for any two algorithms with the first $p$ steps being identical, and the last $m-p-1$ steps following $\tilde{P}$, algorithm $A$'s $(p+1)$-th step is the one that minimizes expression (7). In particular, it follows that expression (7) is upper bounded by the same expression where the $(p+1)$-th step is according to $\tilde{X}_{i,p+1}$ and $\tilde{Y}_{i,p+1}$, i.e., we replace $X^A_{i,p+1}$ by $\tilde{X}_{i,p+1}$ and $Y^A_{i,p+1}$ by $\tilde{Y}_{i,p+1}$. Therefore we have
$$F[A_{p+1}\tilde{P}_{m-p-1}] \le \sum_i \frac{\operatorname{E}\left[(1+\epsilon)^{S_p(X^A_i)/(\gamma c_i)} \left(1 + \frac{\epsilon \tilde{X}_{i,p+1}}{\gamma c_i}\right)\right] \left(1 + \frac{\epsilon}{(1+\epsilon)\gamma m}\right)^{m-p-1}}{(1+\epsilon)^{1/\gamma}} + \sum_i \frac{\operatorname{E}\left[(1-\epsilon)^{S_p(Y^A_i)/(\gamma W_E)} \left(1 - \frac{\epsilon \tilde{Y}_{i,p+1}}{\gamma W_E}\right)\right] \left(1 - \frac{\epsilon}{(1+\epsilon)\gamma m}\right)^{m-p-1}}{(1-\epsilon)^{\frac{1-\epsilon}{(1+\epsilon)\gamma}}}$$
$$\le \sum_i \frac{\operatorname{E}\left[(1+\epsilon)^{S_p(X^A_i)/(\gamma c_i)}\right] \left(1 + \frac{\epsilon}{(1+\epsilon)\gamma m}\right)^{m-p}}{(1+\epsilon)^{1/\gamma}} + \sum_i \frac{\operatorname{E}\left[(1-\epsilon)^{S_p(Y^A_i)/(\gamma W_E)}\right] \left(1 - \frac{\epsilon}{(1+\epsilon)\gamma m}\right)^{m-p}}{(1-\epsilon)^{\frac{1-\epsilon}{(1+\epsilon)\gamma}}} = F[A_p\tilde{P}_{m-p}].$$
This completes the proof of the following theorem.

Theorem 3.2
For any $\epsilon > 0$, Algorithm 1 achieves an objective value of $W_E(1-2\epsilon)$ for the online resource allocation problem with probability at least $1 - \epsilon$, assuming $\gamma = O\!\left(\frac{\epsilon^2}{\log(n/\epsilon)}\right)$. The algorithm $A$ does not require any knowledge of the distribution except for the single parameter $W_E$.

We first give a high-level overview of this section before going into the details. In this section, we design an algorithm $A$ without any knowledge of the distribution at all. The algorithm is similar in spirit to the one in Section 3.2, except that, since we do not have knowledge of $W_E$, we divide the algorithm into many stages. In each stage, we run an algorithm similar to the one in Section 3.2, except that instead of $W_E$ we use an estimate of $W_E$ that gets increasingly accurate with each successive stage.

More formally, the algorithm runs in $l$ stages $\{0, 1, \ldots, l-1\}$, where $l$ is such that $\epsilon 2^l = 1$, and $\epsilon \in [1/m, 1/2]$ (we need $\epsilon \le 1/2$ so that $l$ is at least 1) is the error parameter of the algorithm designer's choice. Further, we need $m \ge 1/\epsilon$ so that $\epsilon m \ge 1$. We assume that $\epsilon m$ is an integer for clarity of exposition. Stage $r$ handles $t_r = \epsilon m 2^r$ requests for $r \in \{0, \ldots, l-1\}$. The first $\epsilon m$ requests are used just for future estimation, and none of them are served. For convenience we sometimes call this pre-zero stage stage $-1$, and let $t_{-1} = \epsilon m$. Stage $r \ge 0$ consists of the steps $t \in [t_r + 1, t_{r+1}]$. Note that in the optimal solution to the expected instance of stage $r$, no resource $i$ gets consumed by more than $\frac{t_r c_i}{m}$, and every type $i$ gets a profit of $\frac{t_r W_E}{m}$, i.e., consumption and profit have been scaled down by a factor of $\frac{t_r}{m}$. As in the previous sections, with a high probability, we can only reach close to $\frac{t_r W_E}{m}$. Further, since stage $r$ consists of only $t_r$ requests, which is much smaller than $m$ for small $r$, it follows that for small $r$, our error in how close we get to $\frac{t_r W_E}{m}$ will be higher. Indeed, instead of having the same error parameter $\epsilon$ in every stage, we set stage-specific error parameters which get progressively smaller and become close to $\epsilon$ in the final stages. These parameters are chosen such that the overall error is still $O(\epsilon)$, because the later stages, having more requests, matter more than the earlier ones. There are two sources of error/failure, which we detail below.
1. The first source of failure stems from not knowing $W_E$. Instead, we estimate a quantity $Z_r$, which is the approximation we use for $W_E$ in stage $r$; the approximation gets better as $r$ increases. We use $Z_r$ to set a profit target of $\frac{t_r Z_r}{m}$ for stage $r$. Since $Z_r$ could be much smaller than $W_E$, our algorithm could become very suboptimal. We prove that with probability at least $1 - 2\delta$ we have $W_E(1 - 2\epsilon_{x,r-1}) \le Z_r \le W_E$ (see next for what $\epsilon_{x,r}$ is), where $\delta = \frac{\epsilon}{4l}$. Thus for all the $l$ stages, these bounds are violated with probability at most $2l\delta = \epsilon/2$.
2. The second source of failure is in the serving itself: we pick stage-specific error parameters $\epsilon_{x,r}$ and $\epsilon_{y,r}$ such that for every $i$ stage $r$ consumes at most $\frac{t_r c_i}{m}(1 + \epsilon_{x,r})$ amount of resource $i$, and for every $i$ we get a profit of at least $\frac{t_r Z_r}{m}(1 - \epsilon_{y,r})$, with probability at least $1 - \delta$. Thus the overall failure probability, as regards falling short of the target $\frac{t_r Z_r}{m}$ by more than $\epsilon_{y,r}$ and exceeding $\frac{t_r c_i}{m}$ by more than $\epsilon_{x,r}$, for all the $l$ stages together, is at most $\delta \cdot l \le \epsilon/2$. Together with the first source, the total failure probability is at most $\epsilon/2 + \epsilon/2 = \epsilon$.
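The doubling stage schedule and the stage-specific errors above can be computed directly. The sketch below uses $t_r = \epsilon m 2^r$ and $\epsilon_{x,r} = \sqrt{\gamma m \ln(2n/\delta)/t_r}$ as described in the text; the choice $\delta = \epsilon/(4l)$ is our reading of the failure budget, and the values of $m$, $n$, $\epsilon$, $\gamma$ are purely illustrative.

```python
import math

def stage_schedule(m, eps):
    """l = log2(1/eps) stages; stage r serves t_r = eps*m*2^r requests (r = 0..l-1),
    after an initial eps*m requests (stage -1) that are observed but never served."""
    l = round(math.log2(1 / eps))
    t = [int(eps * m) * 2 ** r for r in range(l)]
    return l, t

def x_errors(m, n, gamma, t, delta):
    """Stage-specific error eps_{x,r} = sqrt(gamma*m*ln(2n/delta)/t_r):
    later (larger) stages get progressively tighter error parameters."""
    return [math.sqrt(gamma * m * math.log(2 * n / delta) / t_r) for t_r in t]

# Illustrative parameters (not from the paper):
m, n, eps, gamma = 2 ** 20, 10, 2 ** -4, 1e-4
l, t = stage_schedule(m, eps)
delta = eps / (4 * l)            # assumed per-stage failure budget
eps_x = x_errors(m, n, gamma, t, delta)
covered = int(eps * m) + sum(t)  # stage -1 plus stages 0..l-1 cover all m requests
```

Because the stage lengths double, the $t_r$ telescope: $\epsilon m + \epsilon m(2^l - 1) = \epsilon m 2^l = m$, so every request falls into exactly one stage, and the error sequence $\epsilon_{x,r}$ shrinks by a factor $\sqrt{2}$ per stage.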
We have that, with probability at least $1 - \epsilon$: for every $i$, the total consumption of resource $i$ is at most $\sum_{r=0}^{l-1} \frac{t_r c_i}{m}(1 + \epsilon_{x,r})$, and the total profit of each type $i$ is at least $\sum_{r=0}^{l-1} \frac{t_r W_E}{m}(1 - 2\epsilon_{x,r-1})(1 - \epsilon_{y,r})$. We set $\epsilon_{x,r} = \sqrt{\frac{\gamma m \ln(2n/\delta)}{t_r}}$ for $r \in \{-1, 0, 1, \ldots, l-1\}$, and $\epsilon_{y,r} = \sqrt{\frac{w_{\max} m \ln(2n/\delta)}{t_r Z_r}}$ for $r \in \{0, 1, \ldots, l-1\}$ (we define $\epsilon_{x,r}$ starting from $r = -1$, with $t_{-1} = \epsilon m$, just for technical convenience). From this it follows that $\sum_{r=0}^{l-1} \frac{t_r c_i}{m}(1 + \epsilon_{x,r}) \le c_i$ and $\sum_{r=0}^{l-1} \frac{t_r W_E}{m}(1 - 2\epsilon_{x,r-1})(1 - \epsilon_{y,r}) \ge W_E(1 - O(\epsilon))$, assuming $\gamma = O\!\left(\frac{\epsilon^2}{\log(n/\epsilon)}\right)$. The algorithm is described in Algorithm 2. This completes the high-level overview of the proof. All that is left to prove is points 1 and 2 above, upon which we will have proved our main theorem, namely Theorem 2.2, which we recall below.

Theorem 2.2. For any $\epsilon \ge 1/m$, Algorithm 2 achieves an objective value of $W_E(1 - O(\epsilon))$ for the online resource allocation problem with probability at least $1 - \epsilon$, assuming $\gamma = O\!\left(\frac{\epsilon^2}{\log(n/\epsilon)}\right)$. Algorithm 2 does not require any knowledge of the distribution at all.

Detailed Description and Proof
We begin with the first point in our high-level overview above, namely by describing how $Z_r$ is estimated and proving its concentration around $W_E$. After stage $r$ (including stage $-1$), we compute the optimal fractional solution $e_r$ to the following instance $I_r$: the instance has the $t_r$ requests of stage $r$, and the capacity of resource $i$ is capped at $\frac{t_r c_i}{m}$. Using the optimal fractional objective value $e_r$ of this instance, we set $Z_{r+1} = \frac{e_r}{1 + \epsilon_{x,r}} \cdot \frac{m}{t_r}$. The first task now is to prove that $Z_{r+1}$ as estimated above is concentrated enough around $W_E$. This basically requires proving concentration of $e_r$.

Lemma 3.3. With probability at least $1 - 2\delta$, we have $\frac{t_r W_E}{m}(1 - \epsilon_{x,r}) \le e_r \le \frac{t_r W_E}{m}(1 + \epsilon_{x,r})$.

Algorithm 2: Algorithm for stochastic online resource allocation with unknown distribution

Input:
Capacities $c_i$ for $i \in [n]$, the total number of requests $m$, the value of $\gamma$, an error parameter $\epsilon > 1/m$.

Output:
An online allocation of resources to requests.
  Set $l = \log_2(1/\epsilon)$. Initialize $t_{-1} \leftarrow \epsilon m$.
  for $r = 0$ to $l-1$ do
    Compute $e_{r-1}$: the optimal objective value over the $t_{r-1}$ requests of stage $r-1$, with the capacity of resource $i$ capped at $\frac{t_{r-1} c_i}{m}$.
    Set $Z_r = \frac{e_{r-1}}{1 + \epsilon_{x,r-1}} \cdot \frac{m}{t_{r-1}}$.
    Set $\epsilon_{x,r} = \sqrt{\frac{\gamma m \ln(2n/\delta)}{t_r}}$, $\epsilon_{y,r} = \sqrt{\frac{w_{\max} m \ln(2n/\delta)}{t_r Z_r}}$.
    Set $\phi^r_{i,0} = \frac{\epsilon_{x,r}}{\gamma c_i}\left[\frac{\left(1 + \frac{\epsilon_{x,r}}{m\gamma}\right)^{t_r - 1}}{(1+\epsilon_{x,r})^{(1+\epsilon_{x,r})\frac{t_r}{m\gamma}}}\right]$ and $\psi^r_{i,0} = \frac{\epsilon_{y,r}}{w_{\max}}\left[\frac{\left(1 - \frac{\epsilon_{y,r}}{m\gamma}\right)^{t_r - 1}}{(1-\epsilon_{y,r})^{(1-\epsilon_{y,r})\frac{t_r Z_r}{m w_{\max}}}}\right]$.
    for $s = 1$ to $t_r$ do
      If the incoming request is $j$, use the following option $k^*$:
      $$k^* = \arg\min_{k \in \mathcal{K} \cup \{\perp\}} \left\{\sum_i a_{ijk} \cdot \phi^r_{i,s-1} - \sum_i w_{ijk} \cdot \psi^r_{i,s-1}\right\}.$$
      $X^A_{i,t_r+s} = a_{ijk^*}$, $Y^A_{i,t_r+s} = w_{ijk^*}$
      Update $\phi^r_{i,s} = \phi^r_{i,s-1} \cdot \frac{(1+\epsilon_{x,r})^{X^A_{i,t_r+s}/(\gamma c_i)}}{1 + \frac{\epsilon_{x,r}}{m\gamma}}$ and $\psi^r_{i,s} = \psi^r_{i,s-1} \cdot \frac{(1-\epsilon_{y,r})^{Y^A_{i,t_r+s}/w_{\max}}}{1 - \frac{\epsilon_{y,r}}{m\gamma}}$
    end for
  end for
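The estimate $Z_r$ that Algorithm 2 carries from stage to stage is a simple rescaling of the previous stage's capped offline optimum. The sketch below encodes the update $Z_r = e_{r-1} \cdot m / (t_{r-1}(1+\epsilon_{x,r-1}))$, which is our reading of the formula in the text; the numeric values of $W_E$, $m$, $t_{r-1}$ and $\epsilon_{x,r-1}$ are illustrative.

```python
def next_estimate(e_prev, t_prev, m, eps_x_prev):
    """Z_r = e_{r-1} * m / (t_{r-1} * (1 + eps_{x,r-1})): rescale the capped stage-(r-1)
    optimum back to the full horizon, deflated so the estimate never overshoots W_E."""
    return e_prev * m / (t_prev * (1.0 + eps_x_prev))

# By Lemma 3.3, e_{r-1} lies in [(t_{r-1} W_E/m)(1-eps), (t_{r-1} W_E/m)(1+eps)] w.h.p.
# At either endpoint the estimate stays in the interval [W_E*(1-2*eps), W_E]:
W_E, m, t_prev, eps_prev = 100.0, 1000, 125, 0.2
scale = t_prev * W_E / m
Z_hi = next_estimate(scale * (1 + eps_prev), t_prev, m, eps_prev)  # exactly W_E
Z_lo = next_estimate(scale * (1 - eps_prev), t_prev, m, eps_prev)  # W_E*(1-eps)/(1+eps)
```

Dividing by $1+\epsilon_{x,r-1}$ is what makes the estimate one-sided: $Z_r$ may undershoot $W_E$ (by at most a $1-2\epsilon_{x,r-1}$ factor) but never exceed it, so the stage profit targets are always achievable.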
We prove that the lower and upper bounds each hold with probability $1 - \delta$, thus proving the lemma.

We begin with the lower bound on $e_r$. Note that the expected instance of the instance $I_r$ has the same optimal solution $x^*_{jk}$ as the optimal solution to the full expected instance (i.e., the one without scaling down by $\frac{t_r}{m}$). Now consider the algorithm $\tilde{P}(r)$, which is the same as the $\tilde{P}$ defined in Section 3.1 except that $\epsilon$ is replaced by $\epsilon_{x,r}$, i.e., it serves request $j$ with option $k$ with probability $\frac{x^*_{jk}}{1+\epsilon_{x,r}}$. When $\tilde{P}(r)$ is run on instance $I_r$, with probability at least $1 - \frac{\delta}{2n}$, at most $\frac{t_r c_i}{m}$ amount of resource $i$ is consumed, and with probability at least $1 - \frac{\delta}{2n}$, at least $\frac{t_r W_E}{m}(1 - \epsilon_{x,r})$ profit of type $i$ is obtained. Thus with probability at least $1 - 2n \cdot \frac{\delta}{2n} = 1 - \delta$, $\tilde{P}(r)$ achieves an objective value of at least $\frac{t_r W_E}{m}(1 - \epsilon_{x,r})$. Therefore the optimal objective value $e_r$ will also be at least $\frac{t_r W_E}{m}(1 - \epsilon_{x,r})$.

To prove the upper bound, we consider the primal and dual LPs that define $e_r$ in LP (9), and the primal and dual LPs defining the expected instance in LP (10). In the latter, for convenience, we use $m p_j \beta_j$ as the dual multiplier instead of just $\beta_j$. Note that the set of constraints in the dual of LP (10) is a superset of the set of constraints in the dual of LP (9), making any feasible solution to the dual of LP (10) also feasible for the dual of LP (9).

Primal and dual LPs defining $e_r$ (9)

Primal defining $e_r$:
  Maximize $\lambda$ s.t.
  $\forall i$, $\sum_{j \in I_r, k} w_{ijk} x_{j,k} \ge \lambda$
  $\forall i$, $\sum_{j \in I_r, k} a_{ijk} x_{j,k} \le \frac{t_r c_i}{m}$
  $\forall j \in I_r$, $\sum_k x_{j,k} \le 1$
  $\forall j \in I_r, k$, $x_{j,k} \ge 0$.

Dual defining $e_r$:
  Minimize $\sum_{j \in I_r} \beta_j + \frac{t_r}{m} \sum_i \alpha_i c_i$ s.t.
  $\forall j \in I_r, k$, $\beta_j + \sum_i (\alpha_i a_{ijk} - \rho_i w_{ijk}) \ge 0$
  $\sum_i \rho_i \ge 1$
  $\forall i$, $\rho_i \ge 0$, $\alpha_i \ge 0$
  $\forall j \in I_r$, $\beta_j \ge 0$.

Primal and dual LPs for the expected instance (10)

Primal for the expected instance:
  Maximize $\lambda$ s.t.
  $\forall i$, $\sum_{j,k} m p_j w_{ijk} x_{j,k} \ge \lambda$
  $\forall i$, $\sum_{j,k} m p_j a_{ijk} x_{j,k} \le c_i$
  $\forall j$, $\sum_k x_{j,k} \le 1$
  $\forall j, k$, $x_{j,k} \ge 0$.

Dual for the expected instance:
  Minimize $\sum_j m p_j \beta_j + \sum_i \alpha_i c_i$ s.t.
  $\forall j, k$, $m p_j \left(\beta_j + \sum_i (\alpha_i a_{ijk} - \rho_i w_{ijk})\right) \ge 0$
  $\sum_i \rho_i \ge 1$
  $\forall i$, $\rho_i \ge 0$, $\alpha_i \ge 0$
  $\forall j$, $\beta_j \ge 0$.

In particular, the optimal dual solution to LP (10), given by the $\beta^*_j$'s, $\alpha^*_i$'s and $\rho^*_i$'s, is feasible for the dual of LP (9). Hence the value of $e_r$ is upper bounded by the objective of the dual of LP (9) evaluated at the $\beta^*_j$'s, $\alpha^*_i$'s and $\rho^*_i$'s. That is, we have
$$e_r \le \sum_{j \in I_r} \beta^*_j + \frac{t_r}{m} \sum_i \alpha^*_i c_i.$$
We now upper bound the RHS by applying Chernoff bounds on $\sum_{j \in I_r} \beta^*_j$. Since the dual LP in (10) is a minimization LP, the constraints there imply that $\beta^*_j \le w_{\max}$. Applying Chernoff bounds, we have
$$e_r \le t_r \sum_j p_j \beta^*_j + \sqrt{t_r \Big(\sum_j p_j \beta^*_j\Big) w_{\max} \ln(1/\delta)} + \frac{t_r}{m} \sum_i \alpha^*_i c_i \le \frac{t_r W_E}{m} + \frac{t_r W_E}{m} \epsilon_{x,r},$$
where the first inequality holds with probability at least $1 - \delta$, and the second inequality uses the fact that the optimal value of the expected instance (the dual of LP (10)) is $W_E$. This proves the required upper bound, namely that $e_r \le \frac{t_r W_E}{m}(1 + \epsilon_{x,r})$ with probability at least $1 - \delta$.

Going back to our application of Chernoff bounds above: in order to apply them in the form above, we require that the multiplicative deviation from the mean, namely $\sqrt{\frac{w_{\max} \ln(1/\delta)}{t_r \sum_j p_j \beta^*_j}}$, lies in $[0, 2e-1]$. If $\sum_j p_j \beta^*_j \ge \frac{\epsilon W_E}{m}$, then this requirement would follow. Suppose, on the other hand, that $\sum_j p_j \beta^*_j < \frac{\epsilon W_E}{m}$. Since we are happy if the excess over the mean is at most $\frac{t_r W_E}{m} \epsilon_{x,r}$, let us look for a multiplicative error of $\frac{t_r W_E \epsilon_{x,r}}{m \cdot t_r \sum_j p_j \beta^*_j}$. Based on the fact that $\sum_j p_j \beta^*_j < \frac{\epsilon W_E}{m}$ and that $\epsilon_{x,r} > \epsilon$ for all $r$, this multiplicative error can be seen to be at least a constant, and can be made larger than $2e - 1$. We now use the version of Chernoff bounds for multiplicative errors larger than $2e - 1$, which gives us that a deviation of $\frac{t_r W_E}{m} \epsilon_{x,r}$ occurs with probability at most
$$2^{-\left(\frac{t_r W_E \epsilon_{x,r}}{m \cdot t_r \sum_j p_j \beta^*_j}\right) \cdot \frac{t_r \sum_j p_j \beta^*_j}{w_{\max}}} = 2^{-\frac{t_r W_E \epsilon_{x,r}}{m w_{\max}}},$$
where the division by $w_{\max}$ is because of the fact that $\beta^*_j \le w_{\max}$. Noting that $w_{\max} \le \gamma W_E$, we get that this probability is at most $\frac{\delta}{2n}$, which is at most $\delta$.

Lemma 3.3 implies that $W_E(1 - 2\epsilon_{x,r-1}) \le Z_r \le W_E$ for all $r \in \{0, 1, \ldots, l-1\}$. The rest of the proof is similar to that of Section 3.2, and is focused on the second point in the high-level overview we gave in the beginning of this Section 3.3. In Section 3.2 we knew $W_E$ and obtained a $W_E(1-2\epsilon)$ approximation, with no resource $i$ consumed beyond $c_i$, with probability $1 - \epsilon$. Here, instead of $W_E$, we have an approximation of $W_E$ in the form of $Z_r$, which gets increasingly accurate as $r$ increases. We set a target of $\frac{t_r Z_r}{m}$ for stage $r$, and show that, with probability at least $1 - \delta$, we get a profit of at least $\frac{t_r Z_r}{m}(1 - \epsilon_{y,r})$ of every type $i$, and no resource $i$ is consumed beyond $\frac{t_r c_i}{m}(1 + \epsilon_{x,r})$ capacity.

As in Section 3.2, call stage $r$ of algorithm $A$ a failure if at least one of the following fails:
1. For all $i$, $\sum_{t=t_r+1}^{t_{r+1}} X^A_{i,t} \le \frac{t_r c_i}{m}(1 + \epsilon_{x,r})$.
2. For all $i$, $\sum_{t=t_r+1}^{t_{r+1}} Y^A_{i,t} \ge \frac{t_r Z_r}{m}(1 - \epsilon_{y,r})$.
Let $S^r_s(X_i) = \sum_{t=t_r+1}^{t_r+s} X_{i,t}$ denote the amount of resource $i$ consumed in the first $s$ steps of stage $r$, and let $S^r_s(Y_i) = \sum_{t=t_r+1}^{t_r+s} Y_{i,t}$ denote the type $i$ profit in the first $s$ steps of stage $r$. We have
$$\Pr\left[\sum_{t=t_r+1}^{t_{r+1}} X^A_{i,t} \ge \frac{t_r c_i}{m}(1 + \epsilon_{x,r})\right] = \Pr\left[\frac{\sum_{t=t_r+1}^{t_{r+1}} X^A_{i,t}}{\gamma c_i} \ge \frac{t_r}{m\gamma}(1 + \epsilon_{x,r})\right]$$
Note that we are allowed to consume a bit beyond $\frac{t_r c_i}{m}$, because our goal is just that overall we do not consume more than $c_i$, not that every stage respects the $\frac{t_r c_i}{m}$ constraint.
In spite of this $(1 + \epsilon_{x,r})$ excess consumption in all stages, the total consumption remains at most $c_i$, since the $\epsilon m$ requests of stage $-1$ are never served. Continuing the derivation,
$$\Pr\left[(1+\epsilon_{x,r})^{\sum_{t=t_r+1}^{t_{r+1}} X^A_{i,t}/(\gamma c_i)} \ge (1+\epsilon_{x,r})^{(1+\epsilon_{x,r})\frac{t_r}{m\gamma}}\right] \le \frac{\operatorname{E}\left[(1+\epsilon_{x,r})^{\sum_{t=t_r+1}^{t_{r+1}} X^A_{i,t}/(\gamma c_i)}\right]}{(1+\epsilon_{x,r})^{(1+\epsilon_{x,r})\frac{t_r}{m\gamma}}} = \frac{\operatorname{E}\left[\prod_{t=t_r+1}^{t_{r+1}} (1+\epsilon_{x,r})^{X^A_{i,t}/(\gamma c_i)}\right]}{(1+\epsilon_{x,r})^{(1+\epsilon_{x,r})\frac{t_r}{m\gamma}}} \quad (11)$$
$$\Pr\left[\sum_{t=t_r+1}^{t_{r+1}} Y^A_{i,t} \le \frac{t_r Z_r}{m}(1 - \epsilon_{y,r})\right] = \Pr\left[\frac{\sum_{t=t_r+1}^{t_{r+1}} Y^A_{i,t}}{w_{\max}} \le \frac{t_r Z_r}{m w_{\max}}(1 - \epsilon_{y,r})\right] = \Pr\left[(1-\epsilon_{y,r})^{\sum_{t=t_r+1}^{t_{r+1}} Y^A_{i,t}/w_{\max}} \ge (1-\epsilon_{y,r})^{(1-\epsilon_{y,r})\frac{t_r Z_r}{m w_{\max}}}\right]$$
$$\le \frac{\operatorname{E}\left[(1-\epsilon_{y,r})^{\sum_{t=t_r+1}^{t_{r+1}} Y^A_{i,t}/w_{\max}}\right]}{(1-\epsilon_{y,r})^{(1-\epsilon_{y,r})\frac{t_r Z_r}{m w_{\max}}}} = \frac{\operatorname{E}\left[\prod_{t=t_r+1}^{t_{r+1}} (1-\epsilon_{y,r})^{Y^A_{i,t}/w_{\max}}\right]}{(1-\epsilon_{y,r})^{(1-\epsilon_{y,r})\frac{t_r Z_r}{m w_{\max}}}} \quad (12)$$
Had our algorithm $A$ been $P$ (and therefore had we been able to use $\operatorname{E}[X^*_{i,t}] \le \frac{c_i}{m}$ and $\operatorname{E}[Y^*_{i,t}] \ge \frac{W_E}{m} \ge \frac{Z_r}{m}$), the total failure probability for each stage $r$, which is the sum of (11) and (12) over all the $i$'s, would have been at most
$$n \cdot \left[e^{-\frac{\epsilon_{x,r}^2 t_r}{m\gamma}} + e^{-\frac{\epsilon_{y,r}^2 t_r Z_r}{m w_{\max}}}\right] = n \cdot \left[\frac{\delta}{2n} + \frac{\delta}{2n}\right] = \delta.$$
The goal is to design an algorithm $A$ for stage $r$ that, unlike $P$, does not know the distribution but obtains the same $\delta$ failure probability, just as we did in Section 3.2. That is, we want to show that the sum of (11) and (12) over all $i$'s is at most $\delta$:
$$\sum_i \frac{\operatorname{E}\left[\prod_{t=t_r+1}^{t_{r+1}} (1+\epsilon_{x,r})^{X^A_{i,t}/(\gamma c_i)}\right]}{(1+\epsilon_{x,r})^{(1+\epsilon_{x,r})\frac{t_r}{m\gamma}}} + \sum_i \frac{\operatorname{E}\left[\prod_{t=t_r+1}^{t_{r+1}} (1-\epsilon_{y,r})^{Y^A_{i,t}/w_{\max}}\right]}{(1-\epsilon_{y,r})^{(1-\epsilon_{y,r})\frac{t_r Z_r}{m w_{\max}}}} \le \delta.$$
For the algorithm $A_s P_{t_r - s}$, the above quantity can be rewritten as
$$\sum_i \frac{\operatorname{E}\left[(1+\epsilon_{x,r})^{S^r_s(X^A_i)/(\gamma c_i)} \prod_{t=t_r+s+1}^{t_{r+1}} (1+\epsilon_{x,r})^{X^*_{i,t}/(\gamma c_i)}\right]}{(1+\epsilon_{x,r})^{(1+\epsilon_{x,r})\frac{t_r}{m\gamma}}} + \sum_i \frac{\operatorname{E}\left[(1-\epsilon_{y,r})^{S^r_s(Y^A_i)/w_{\max}} \prod_{t=t_r+s+1}^{t_{r+1}} (1-\epsilon_{y,r})^{Y^*_{i,t}/w_{\max}}\right]}{(1-\epsilon_{y,r})^{(1-\epsilon_{y,r})\frac{t_r Z_r}{m w_{\max}}}},$$
which, by using $(1+\epsilon)^x \le 1 + \epsilon x$ and $(1-\epsilon)^x \le 1 - \epsilon x$ for $0 \le x \le 1$, is in turn upper bounded by
$$\sum_i \frac{\operatorname{E}\left[(1+\epsilon_{x,r})^{S^r_s(X^A_i)/(\gamma c_i)} \prod_{t=t_r+s+1}^{t_{r+1}} \left(1 + \frac{\epsilon_{x,r} X^*_{i,t}}{\gamma c_i}\right)\right]}{(1+\epsilon_{x,r})^{(1+\epsilon_{x,r})\frac{t_r}{m\gamma}}} + \sum_i \frac{\operatorname{E}\left[(1-\epsilon_{y,r})^{S^r_s(Y^A_i)/w_{\max}} \prod_{t=t_r+s+1}^{t_{r+1}} \left(1 - \frac{\epsilon_{y,r} Y^*_{i,t}}{w_{\max}}\right)\right]}{(1-\epsilon_{y,r})^{(1-\epsilon_{y,r})\frac{t_r Z_r}{m w_{\max}}}}.$$
Since for all $t$ the random variables $X^*_{i,t}$, $X^A_{i,t}$, $Y^*_{i,t}$ and $Y^A_{i,t}$ are all independent, and $\operatorname{E}[X^*_{i,t}] \le \frac{c_i}{m}$, $\operatorname{E}[Y^*_{i,t}] \ge \frac{W_E}{m}$, and $\frac{W_E}{w_{\max}} \ge \frac{1}{\gamma}$, the above is in turn upper bounded by
$$\sum_i \frac{\operatorname{E}\left[(1+\epsilon_{x,r})^{S^r_s(X^A_i)/(\gamma c_i)}\right] \left(1 + \frac{\epsilon_{x,r}}{m\gamma}\right)^{t_r - s}}{(1+\epsilon_{x,r})^{(1+\epsilon_{x,r})\frac{t_r}{m\gamma}}} + \sum_i \frac{\operatorname{E}\left[(1-\epsilon_{y,r})^{S^r_s(Y^A_i)/w_{\max}}\right] \left(1 - \frac{\epsilon_{y,r}}{m\gamma}\right)^{t_r - s}}{(1-\epsilon_{y,r})^{(1-\epsilon_{y,r})\frac{t_r Z_r}{m w_{\max}}}}. \quad (13)$$
Let $F_r[A_s P_{t_r - s}]$ denote the quantity in (13), which is an upper bound on the failure probability of the hybrid algorithm $A_s P_{t_r - s}$ for stage $r$. We just showed that $F_r[P_{t_r}] \le \delta$. We now prove that for all $s \in \{0, 1, \ldots, t_r - 1\}$, $F_r[A_{s+1} P_{t_r - s - 1}] \le F_r[A_s P_{t_r - s}]$, thus proving that $F_r[A_{t_r}] \le \delta$, i.e., running the algorithm $A$ for all the $t_r$ steps of stage $r$ results in a failure with probability at most $\delta$. Assuming that for all $s < p$ the algorithm $A$ has been defined for the first $s+1$ steps in such a way that $F_r[A_{s+1} P_{t_r - s - 1}] \le F_r[A_s P_{t_r - s}]$, we now define $A$ for the $(p+1)$-th step of stage $r$ in a way that will ensure that $F_r[A_{p+1} P_{t_r - p - 1}] \le F_r[A_p P_{t_r - p}]$.
We have
$$F_r[A^{p+1} P^{t_r-p-1}] = \sum_i \frac{E\left[(1+\epsilon_{x,r})^{S^r_{p+1}(X^A_i)/(\gamma c_i)}\right] \left(1+\frac{\epsilon_{x,r}}{m\gamma}\right)^{t_r-p-1}}{(1+\epsilon_{x,r})^{(1+\epsilon_{x,r})\frac{t_r}{m\gamma}}} + \sum_i \frac{E\left[(1-\epsilon_{y,r})^{S^r_{p+1}(Y^A_i)/w_{\max}}\right] \left(1-\frac{\epsilon_{y,r}}{m\gamma}\right)^{t_r-p-1}}{(1-\epsilon_{y,r})^{(1-\epsilon_{y,r})\frac{t_r Z_r}{m w_{\max}}}}$$
$$\le \sum_i \frac{E\left[(1+\epsilon_{x,r})^{S^r_p(X^A_i)/(\gamma c_i)} \left(1+\frac{\epsilon_{x,r} X^A_{i,t_r+p+1}}{\gamma c_i}\right)\right] \left(1+\frac{\epsilon_{x,r}}{m\gamma}\right)^{t_r-p-1}}{(1+\epsilon_{x,r})^{(1+\epsilon_{x,r})\frac{t_r}{m\gamma}}} + \sum_i \frac{E\left[(1-\epsilon_{y,r})^{S^r_p(Y^A_i)/w_{\max}} \left(1-\frac{\epsilon_{y,r} Y^A_{i,t_r+p+1}}{w_{\max}}\right)\right] \left(1-\frac{\epsilon_{y,r}}{m\gamma}\right)^{t_r-p-1}}{(1-\epsilon_{y,r})^{(1-\epsilon_{y,r})\frac{t_r Z_r}{m w_{\max}}}} \qquad (14)$$
Define
$$\phi^r_{i,s} = \frac{\epsilon_{x,r}}{\gamma c_i}\cdot\frac{(1+\epsilon_{x,r})^{S^r_s(X^A_i)/(\gamma c_i)} \left(1+\frac{\epsilon_{x,r}}{m\gamma}\right)^{t_r-s-1}}{(1+\epsilon_{x,r})^{(1+\epsilon_{x,r})\frac{t_r}{m\gamma}}}, \qquad \psi^r_{i,s} = \frac{\epsilon_{y,r}}{w_{\max}}\cdot\frac{(1-\epsilon_{y,r})^{S^r_s(Y^A_i)/w_{\max}} \left(1-\frac{\epsilon_{y,r}}{m\gamma}\right)^{t_r-s-1}}{(1-\epsilon_{y,r})^{(1-\epsilon_{y,r})\frac{t_r Z_r}{m w_{\max}}}}.$$
Define the step $p+1$ of algorithm $A$ as picking the following option $k^*$ for request $j$:
$$k^* = \arg\min_k \left(\sum_i a_{ijk}\cdot\phi^r_{i,p} - \sum_i w_{ijk}\cdot\psi^r_{i,p}\right).$$
By the above definition of step $p+1$ of algorithm $A$ (for stage $r$), it follows that among any algorithms whose first $p$ steps are identical and whose last $t_r-p-1$ steps follow $P$, algorithm $A$'s $(p+1)$-th step is the one that minimizes expression (14). In particular, it follows that expression (14) is upper bounded by the same expression where the $(p+1)$-th step is made according to $X^*_{i,t_r+p+1}$ and $Y^*_{i,t_r+p+1}$, i.e., we replace $X^A_{i,t_r+p+1}$ by $X^*_{i,t_r+p+1}$ and $Y^A_{i,t_r+p+1}$ by $Y^*_{i,t_r+p+1}$.
Therefore we have
$$F_r[A^{p+1} P^{t_r-p-1}] \le \sum_i \frac{E\left[(1+\epsilon_{x,r})^{S^r_p(X^A_i)/(\gamma c_i)} \left(1+\frac{\epsilon_{x,r} X^*_{i,t_r+p+1}}{\gamma c_i}\right)\right] \left(1+\frac{\epsilon_{x,r}}{m\gamma}\right)^{t_r-p-1}}{(1+\epsilon_{x,r})^{(1+\epsilon_{x,r})\frac{t_r}{m\gamma}}} + \sum_i \frac{E\left[(1-\epsilon_{y,r})^{S^r_p(Y^A_i)/w_{\max}} \left(1-\frac{\epsilon_{y,r} Y^*_{i,t_r+p+1}}{w_{\max}}\right)\right] \left(1-\frac{\epsilon_{y,r}}{m\gamma}\right)^{t_r-p-1}}{(1-\epsilon_{y,r})^{(1-\epsilon_{y,r})\frac{t_r Z_r}{m w_{\max}}}}$$
$$\le \sum_i \frac{E\left[(1+\epsilon_{x,r})^{S^r_p(X^A_i)/(\gamma c_i)}\right] \left(1+\frac{\epsilon_{x,r}}{m\gamma}\right)^{t_r-p}}{(1+\epsilon_{x,r})^{(1+\epsilon_{x,r})\frac{t_r}{m\gamma}}} + \sum_i \frac{E\left[(1-\epsilon_{y,r})^{S^r_p(Y^A_i)/w_{\max}}\right] \left(1-\frac{\epsilon_{y,r}}{m\gamma}\right)^{t_r-p}}{(1-\epsilon_{y,r})^{(1-\epsilon_{y,r})\frac{t_r Z_r}{m w_{\max}}}} = F_r[A^p P^{t_r-p}].$$
This completes the proof of Theorem 2.2.

Our Algorithm 2 in Section 3.3 required periodically computing the optimal solution to an offline instance, and our Algorithm 1 in Section 3.2 requires the value of $W_E$ to be given. Suppose we can only approximately estimate these quantities; do our results carry through approximately? That is, suppose the solution to the offline instance is guaranteed to be at least an $\alpha$ fraction of the optimal, and the stand-in that we are given for $W_E$ is guaranteed to be at least an $\alpha$ fraction of $W_E$. Both our Theorem 2.2 and Theorem 3.2 then go through with $W_E$ replaced by $\alpha W_E$. Every step of the proof of the exact version goes through in this approximate version, and so we skip the formal proof of this statement.

In this section, we relax the assumption that requests are drawn i.i.d. at every time step. Namely, the distributions need not be identical across time steps; instead, an adversary decides which distribution each request is sampled from. The adversary may even choose the distribution for step $t$ adaptively, based on how the algorithm has performed in the first $t-1$ steps. The relevance of this model for the real world is that in settings like display advertising, the distribution fluctuates over the day.
In general, a day is divided into many chunks, and within a chunk the distribution is assumed to remain i.i.d.; this is exactly what our model captures. We give algorithms with guarantees against three different benchmarks in the three models below. The benchmarks get successively stronger, and hence the information sought by the algorithm also increases successively.

Model 1. In this model, the guarantee we give is against the worst distribution over all time steps picked by the adversary. More formally, let $W_E(t)$ denote the optimal profit for the expected instance of the distribution at time step $t$. Our benchmark is $W_E = \min_t W_E(t)$. Given just the single number $W_E$, our Algorithm 1 in Section 3.2 guarantees a profit of $W_E(1-\epsilon)$ with probability at least $1-\epsilon$, assuming $\gamma = O\!\left(\frac{\epsilon^2}{\log(n/\epsilon)}\right)$, just like the guarantee in Theorem 3.2. Algorithm 1 works for this ASI model because the proof did not use the similarity of the distributions beyond the facts that $E[X^*_{i,t} \mid X^*_{i,t'} \ \forall t' < t] \le \frac{c_i}{m}$ for all values of $X^*_{i,t'}$, and $E[Y^*_{i,t} \mid Y^*_{i,t'} \ \forall t' < t] \ge \frac{W_E}{m}$ for all values of $Y^*_{i,t'}$.

Algorithm 3: Algorithm for stochastic online resource allocation in ASI model 2
Input: capacities $c_i$ for $i \in [n]$, the total number of requests $m$, the values of $\gamma$ and $W_E(t)$ for $t \in [m]$, and an error parameter $\epsilon > 0$.
Output: an online allocation of resources to requests.
Initialize
$$\phi_{i,0} = \frac{1}{c_i}\left[\frac{\left(1+\frac{\epsilon}{(1+\epsilon)\gamma m}\right)^{m-1}}{(1+\epsilon)^{\frac{1+\epsilon}{\gamma}}}\right], \qquad \psi_{i,0} = \frac{1}{W_E}\left[\frac{\prod_{t=2}^m \left(1-\frac{\epsilon W_E(t)}{(1+\epsilon) W_E \gamma m}\right)}{(1-\epsilon)^{\frac{1-\epsilon}{\gamma(1+\epsilon)}}}\right].$$
for $s = 1$ to $m$ do
  If the incoming request is $j$, use the following option $k^*$:
  $$k^* = \arg\min_{k \in K \cup \{\perp\}} \left(\sum_i a_{ijk}\cdot\phi_{i,s-1} - \sum_i w_{ijk}\cdot\psi_{i,s-1}\right).$$
  Set $X^A_{i,s} = a_{ijk^*}$, $Y^A_{i,s} = w_{ijk^*}$.
  Update $\phi_{i,s} = \phi_{i,s-1}\cdot\frac{(1+\epsilon)^{X^A_{i,s}/(\gamma c_i)}}{1+\frac{\epsilon}{(1+\epsilon)\gamma m}}$, and $\psi_{i,s} = \psi_{i,s-1}\cdot\frac{(1-\epsilon)^{Y^A_{i,s}/(\gamma W_E)}}{1-\frac{\epsilon W_E(s+1)}{(1+\epsilon) W_E \gamma m}}$.
end for

We skip the proof of the profit guarantee of $W_E(1-\epsilon)$ since it is almost identical to the proof in Section 3.2 for Algorithm 1.
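As a concrete illustration, the potential-based allocation rule of Algorithm 3 can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions of ours — a single scalar profit per option rather than per-resource profits, and the uniform update denominators; the function and variable names are illustrative, not the paper's:

```python
def allocate(option_lists, c, W_E, gamma, eps):
    """Potential-based online allocation, sketching Algorithm 3.

    option_lists[s]: options of the s-th request; each option is a pair
    (cons, prof) with cons[i] = resource-i consumption and prof = profit.
    """
    m, n = len(option_lists), len(c)
    phi = [1.0 / c[i] for i in range(n)]  # packing potential per resource
    psi = 1.0 / W_E                       # covering potential for profit
    used, profit, choices = [0.0] * n, 0.0, []
    for options in option_lists:
        # score each option; the null option (serve nothing) scores 0
        best_k, best_score = None, 0.0
        for k, (cons, prof) in enumerate(options):
            score = sum(cons[i] * phi[i] for i in range(n)) - prof * psi
            if score < best_score:
                best_k, best_score = k, score
        choices.append(best_k)
        cons, prof = options[best_k] if best_k is not None else ([0.0] * n, 0.0)
        profit += prof
        for i in range(n):
            used[i] += cons[i]
            # multiplicative updates mirroring phi_{i,s} and psi_{i,s}
            phi[i] *= (1 + eps) ** (cons[i] / (gamma * c[i]))
            phi[i] /= 1 + eps / ((1 + eps) * gamma * m)
        psi *= (1 - eps) ** (prof / (gamma * W_E))
        psi /= 1 - eps / ((1 + eps) * gamma * m)
    return choices, used, profit
```

Serving a request makes the packing potentials grow with consumption and the covering potential shrink with profit, so the arg-min rule automatically steers away from scarce resources and toward profitable options.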
Model 3. In this model, which is otherwise identical to models 1 and 2, our benchmark is even stronger: namely, the optimal profit of the expected instance with all the time-varying distributions (explicitly spelled out in LP (15)). This benchmark $W_E$ is the strongest benchmark possible. Correspondingly, our algorithm requires more information than in model 2: we ask for $W_{E,i}(t)$ and $c_i(t)$ for every $i$ and $t$ at the beginning of the algorithm, where $W_{E,i}(t)$ and $c_i(t)$ are the amount of type-$i$ profit obtained and type-$i$ resource consumed by the optimal solution to the expected instance in LP (15) at step $t$. Namely, $W_{E,i}(t) = \sum_{j,k} p_{j,t} w_{ijk} x^*_{j,k,t}$ and $c_i(t) = \sum_{j,k} p_{j,t} a_{ijk} x^*_{j,k,t}$, where the $x^*_{j,k,t}$ are the optimal solution to LP (15).

Primal and dual LPs defining the expected instance (15):

Primal for ASI model 3: Maximize $\lambda$ subject to
$$\forall i, \ \sum_{t,j,k} p_{j,t} w_{ijk} x_{j,k,t} \ge \lambda; \qquad \forall i, \ \sum_{t,j,k} p_{j,t} a_{ijk} x_{j,k,t} \le c_i; \qquad \forall j,t, \ \sum_k x_{j,k,t} \le 1; \qquad \forall j,k,t, \ x_{j,k,t} \ge 0.$$

Dual for ASI model 3: Minimize $\sum_{j,t} p_{j,t}\,\beta_{j,t} + \sum_i \alpha_i c_i$ subject to
$$\forall j,k,t, \ p_{j,t}\left(\beta_{j,t} + \sum_i (\alpha_i a_{ijk} - \rho_i w_{ijk})\right) \ge 0; \qquad \sum_i \rho_i \ge 1; \qquad \forall i, \ \rho_i \ge 0, \ \alpha_i \ge 0; \qquad \forall j,t, \ \beta_{j,t} \ge 0.$$

A slight modification of our Algorithm 1 in Section 3.2 gives a profit of $W_E(1-\epsilon)$ with probability at least $1-\epsilon$. We modify the two potential functions $\phi_{i,s}$ and $\psi_{i,s}$ in the most natural way to account for the fact that the distribution changes every step. Let $W_{E,i} = \sum_{t=1}^m W_{E,i}(t)$; thus our benchmark $W_E$ is simply $\min_i W_{E,i}$.
Note also that $\sum_{t=1}^m c_i(t)$, call it $c^*_i$, is at most $c_i$ by the feasibility of the optimal solution to LP (15). Define
$$\phi_{i,s} = \frac{1}{c_i}\left[\frac{(1+\epsilon)^{S_s(X^A_i)/(\gamma c_i)} \prod_{t=s+2}^m \left(1+\frac{\epsilon c_i(t)}{(1+\epsilon) c_i \gamma}\right)}{(1+\epsilon)^{\frac{1+\epsilon}{\gamma}}}\right], \qquad \psi_{i,s} = \frac{1}{W_{E,i}}\left[\frac{(1-\epsilon)^{S_s(Y^A_i)/(\gamma W_{E,i})} \prod_{t=s+2}^m \left(1-\frac{\epsilon W_{E,i}(t)}{(1+\epsilon) W_{E,i} \gamma}\right)}{(1-\epsilon)^{\frac{1-\epsilon}{\gamma(1+\epsilon)}}}\right].$$
We present our algorithm below in Algorithm 4. Algorithm 4 works for this ASI model for much the same reason Algorithm 3 worked for ASI model 2: all the proof needs is that $E[X^*_{i,t} \mid X^*_{i,t'} \ \forall t' < t] = c_i(t)$ for all values of $X^*_{i,t'}$, and $E[Y^*_{i,t} \mid Y^*_{i,t'} \ \forall t' < t] = W_{E,i}(t)$ for all values of $Y^*_{i,t'}$ (here $X^*_{i,t}$ and $Y^*_{i,t}$ denote the random variables for resource consumption and profit at time $t$ when allocating according to the optimal solution to the expected instance captured by LP (15)). We skip the proof of the profit guarantee of $W_E(1-\epsilon)$ since it is almost identical to the proof in Section 3.2 for Algorithm 1.

Algorithm 4: Algorithm for stochastic online resource allocation in ASI model 3
Input: capacities $c_i(t)$ and $c_i$, and profits $W_{E,i}(t)$ for $i \in [n]$, $t \in [m]$, the total number of requests $m$, the value of $\gamma$, and an error parameter $\epsilon > 0$.
Output: an online allocation of resources to requests.
Initialize
$$\phi_{i,0} = \frac{1}{c_i}\left[\frac{\prod_{t=2}^m \left(1+\frac{\epsilon c_i(t)}{(1+\epsilon) c_i \gamma}\right)}{(1+\epsilon)^{\frac{1+\epsilon}{\gamma}}}\right], \qquad \psi_{i,0} = \frac{1}{W_{E,i}}\left[\frac{\prod_{t=2}^m \left(1-\frac{\epsilon W_{E,i}(t)}{(1+\epsilon) W_{E,i} \gamma}\right)}{(1-\epsilon)^{\frac{1-\epsilon}{\gamma(1+\epsilon)}}}\right].$$
for $s = 1$ to $m$ do
  If the incoming request is $j$, use the following option $k^*$:
  $$k^* = \arg\min_{k \in K \cup \{\perp\}} \left(\sum_i a_{ijk}\cdot\phi_{i,s-1} - \sum_i w_{ijk}\cdot\psi_{i,s-1}\right).$$
  Set $X^A_{i,s} = a_{ijk^*}$, $Y^A_{i,s} = w_{ijk^*}$.
  Update $\phi_{i,s} = \phi_{i,s-1}\cdot\frac{(1+\epsilon)^{X^A_{i,s}/(\gamma c_i)}}{1+\frac{\epsilon c_i(s+1)}{(1+\epsilon) c_i \gamma}}$, and $\psi_{i,s} = \psi_{i,s-1}\cdot\frac{(1-\epsilon)^{Y^A_{i,s}/(\gamma W_{E,i})}}{1-\frac{\epsilon W_{E,i}(s+1)}{(1+\epsilon) W_{E,i} \gamma}}$.
end for

In this section, we construct a family of instances of the resource allocation problem in the i.i.d. setting for which $\gamma = \omega\!\left(\frac{\epsilon^2}{\log n}\right)$ rules out a competitive ratio of $1-O(\epsilon)$. The construction closely follows the construction by Agrawal et al. [2014] for proving a similar result in the random-permutation model.

The instance has $n = 2^z$ resources with $B$ units of each resource, and $m = Bz(2 + 1/\alpha) + \sqrt{Bz}$ requests, where $\alpha < 1$ is a constant. Each request, if served, consumes resources according to a binary (or $\alpha$-scaled binary) vector of length $2^z$, with the ones (or the scalars $\alpha$) marking the coordinates of the resources consumed by that request.

The requests are classified into $z$ categories. Each category in expectation consists of $m/z = B(2 + 1/\alpha) + \sqrt{B/z}$ requests. A category, indexed by $i$, is composed of two different binary vectors $v_i$ and $w_i$ (each of length $2^z$). The easiest way to visualize these vectors is to construct two $2^z \times z$ matrices whose rows are the $2^z$ binary strings of length $z$, written one string per row. The first matrix lists the strings in ascending order and the second matrix in descending order. The $i$-th column of the first matrix, multiplied by the scalar $\alpha$, is the vector $v_i$, and the $i$-th column of the second matrix is the vector $w_i$. There are two properties of these vectors that are useful for us:
1. The vectors $v_i/\alpha$ and $w_i$ are complements of one another.
2. Any matrix of $z$ columns, with column $i$ being either $v_i/\alpha$ or $w_i$, has exactly one row with all ones in it.

We are ready to construct the i.i.d. instance now. Each request is drawn from the following distribution. A given request could be, for each $i$, of type:
1. $v_i$ with profit $4\alpha$, with probability $\frac{B}{\alpha z m}$;
2. $w_i$ with profit 3, with probability $\frac{B}{zm}$;
3. $w_i$ with profit 2, with probability $\frac{\sqrt{B/z}}{m}$;
4. $w_i$ with profit 1, with probability $\frac{B}{zm}$; or
5.
the zero vector with zero profit, with the remaining probability $1 - \frac{B}{\alpha m} - \frac{2B}{m} - \frac{\sqrt{Bz}}{m}$ (obtained by summing the first four probabilities over all $i$).

We use the following notation for request types: a $(2, w_i)$ request stands for a $w_i$-type request of profit 2. Observe that the expected instance has an optimal profit of $\mathrm{OPT} = 7B$. This is obtained by picking, for each $i$, the $\frac{B}{\alpha z}$ vectors of type $v_i$ with profit $4\alpha$, along with the $\frac{B}{z}$ vectors of type $w_i$ with profit 3. Note that this exhausts every unit of every item, and thus, combined with the fact that the most profitable requests have been served, the value $7B$ is indeed optimal. This means that any algorithm that obtains a $1-\epsilon$ competitive ratio must have an expected profit of at least $7B - 7\epsilon B$.

Let $r_i(w)$ and $r_i(v)$ be the random variables denoting the number of vectors of type $w_i$ and $v_i$ picked by some $1-\epsilon$ competitive algorithm ALG, and let $a_i(v)$ denote the total number of vectors of type $v_i$ that arrived in the instance.

Lemma 4.1. For some constant $k$, the $r_i(w)$'s satisfy
$$\sum_i E\big[\,|r_i(w) - B/z|\,\big] \le 7\epsilon B + 4k\sqrt{\alpha B z}.$$

Proof: Let $Y$ denote the set of indices $i$ for which $r_i(w) > B/z$, and split the set of indices into $Y$ and $X = [z] \setminus Y$. One way to upper bound the total number of vectors of type $v$ picked by ALG is the following. The number of $v$'s from $Y$ is, by the chosen notation, $\sum_{i \in Y} r_i(v)$. The number of $v$'s from $X$, we show, is at most $\frac{B - \sum_{i \in Y} r_i(w)}{\alpha}$. Note that since there are only $B$ copies of every item, it follows that $\alpha \sum_i r_i(v) \le B$ and $\sum_i r_i(w) \le B$. Further, by property 2 of the $v_i$'s and $w_i$'s, we have $\alpha \sum_{i \in X} r_i(v) + \sum_{i \in Y} r_i(w) \le B$. This means that the number of $v$'s from $X$ is at most $\frac{B - \sum_{i \in Y} r_i(w)}{\alpha}$.

Let $P = \sum_{i \in Y} (r_i(w) - B/z)$ and $M = \sum_{i \in X} (B/z - r_i(w))$, so that $P + M = \sum_i |r_i(w) - B/z|$. Showing $E[P + M] \le 7\epsilon B + 4k\sqrt{\alpha B z}$ proves the lemma.
By an abuse of notation, let ALG also denote the profit obtained by the algorithm ALG, and let $\mathrm{best}_{w_i}(t)$ denote the profit of the most profitable $t$ requests of type $w_i$ in a given instance. Note that $4B + \sum_{i=1}^z \mathrm{best}_{w_i}(B/z) \le 7B = \mathrm{OPT}$. We upper bound $E[\mathrm{ALG}]$ as:
$$E[\mathrm{ALG}] \le E\left[\sum_{i=1}^z \mathrm{best}_{w_i}(r_i(w))\right] + 4\alpha\left(\frac{B - \sum_{i \in Y} E[r_i(w)]}{\alpha} + \sum_{i \in Y} E[r_i(v)]\right)$$
$$\le E\left[\sum_{i=1}^z \mathrm{best}_{w_i}(B/z) + 3P - M\right] + 4\left(B - E\left[\sum_{i \in Y} (r_i(w) - B/z)\right] - |Y|\frac{B}{z}\right) + 4\alpha\, E\left[\sum_{i \in Y} r_i(v)\right]$$
$$\le \mathrm{OPT} - E[P + M] + 4\alpha\, E\left[\sum_{i \in Y} \left(r_i(v) - \frac{B}{\alpha z}\right)\right] \qquad \left(\text{since } P = \sum_{i \in Y} (r_i(w) - B/z)\right)$$
$$\le \mathrm{OPT} - E[P + M] + 4\alpha\, E\left[\sum_{i \in Y} \left(a_i(v) - \frac{B}{\alpha z}\right)\right] \qquad (\text{since } r_i(v) \le a_i(v))$$
$$\le \mathrm{OPT} - E[P + M] + 4\alpha\, E\left[\sum_{i:\, a_i(v) \ge \frac{B}{\alpha z}} \left(a_i(v) - \frac{B}{\alpha z}\right)\right]$$
$$\le \mathrm{OPT} - E[P + M] + 4\alpha \cdot z \cdot k \cdot \sqrt{\frac{B}{\alpha z}} \qquad (\text{where } k \text{ is a constant from the central limit theorem})$$
$$= \mathrm{OPT} - E[P + M] + 4k\sqrt{\alpha B z}.$$
Combined with $E[\mathrm{ALG}] \ge \mathrm{OPT} - 7\epsilon B$, this gives $E[P + M] \le 7\epsilon B + 4k\sqrt{\alpha Bz}$, as required. The inequality that follows from the CLT uses the fact that for a random variable $X \sim B(m, c/m)$ (i.e., $X$ binomially distributed with success probability $c/m$), whenever $c = \omega(1)$ and $c \le m$, we have $E[X \mid X \ge c] \le c + k\sqrt{c}$ for some constant $k$. In this case we have $\frac{B}{\alpha z}$ in place of $c$. For example, if $n = \log m$ (and thus $z = \log n = \log\log m$), then as long as $B = \omega(\log\log m)$ and $B \le m$, the CLT inequality holds. Note that $\alpha$ could have been any constant and this argument still holds.

We are now ready to prove Theorem 2.3, which we restate here for convenience.

Theorem 2.3. There exist instances with $\gamma = \frac{\epsilon^2}{\log n}$ such that no algorithm, even with complete knowledge of the distribution, can get a $1 - o(\epsilon)$ approximation factor.

Proof: We first give an overview of the proof before providing the detailed argument.

Overview.
Lemma 4.1 says that $r_i(w)$ has to be almost always close to $B/z$ for all $i$. In particular, by Markov's inequality, the probability that $\sum_i |r_i(w) - B/z| \le 4\left(7\epsilon B + 4k\sqrt{\alpha Bz}\right)$ is at least $3/4$. In this proof we show, by an argument similar to the one in Agrawal et al. [2014], that if this has to be true, one has to lose a profit of $\Omega(\sqrt{Bz}) - O(\epsilon B + \sqrt{\alpha Bz})$. Since $\alpha$ can be set to an arbitrarily small constant, this means that we lose a profit of $\Omega(\sqrt{Bz}) - O(\epsilon B)$. Since OPT is $7B$, to get a $1 - \epsilon$ approximation we require $\Omega(\sqrt{Bz}) - O(\epsilon B) \le O(\epsilon B)$. Thus we need $B \ge \Omega(z/\epsilon^2) = \Omega(\log n/\epsilon^2)$. In other words, we require $\gamma = \frac{1}{B} \le O\!\left(\frac{\epsilon^2}{\log n}\right)$.

In detail. We now proceed to prove the claim that a profit loss of $\Omega(\sqrt{Bz}) - O(\epsilon B + \sqrt{\alpha Bz})$ is inevitable. We just showed that with probability at least $3/4$, $\sum_i |r_i(w) - B/z| \le 4\left(7\epsilon B + 4k\sqrt{\alpha Bz}\right)$. For now we assume that $r_i(w)$ is exactly $B/z$, and later account for the leeway that is allowed by Lemma 4.1. With this assumption, we show that for each $i$ there is a loss of $\Omega(\sqrt{B/z})$.

For each $i$, let $o_i$ denote the number of $(1, w_i)$ requests that the algorithm serves in total. With a constant probability, the number of 3's and 2's (of type $w_i$) exceeds $B/z$. If $o_i = \Omega(\sqrt{B/z})$, there is a loss of at least $\Omega(\sqrt{B/z})$ because of picking 1's instead of 2's or 3's. This establishes the $\Omega(\sqrt{B/z})$ loss that we wanted to prove, for this case.

Suppose instead that $o_i = o(\sqrt{B/z})$. For each $i$, let $R_i$ be the set of requests of type $w_i$ with profit either 1 or 3. For every $i$, with a constant probability, $2B/z - \sqrt{B/z} \le |R_i| \le 2B/z + 2\sqrt{B/z}$. Conditional on the set $R_i$, we make the following two observations:
• the profits of the requests in $R_i$ are independent random variables that take value 1 or 3 with equal probability;
• the order of the requests in $R_i$ is a uniformly random permutation of $R_i$.

Now consider any $(2, w_i)$ request, say the $t$-th request, of profit 2.
With a constant probability, this request can be served without violating any capacity constraint, and thus the algorithm has to decide whether or not to serve it. In at least half of the random permutations of $R_i$, the number of requests from the set $R_i$ arriving before request $t$ is less than $B/z$. Conditional on this event, the profits of the requests in $R_i$ arriving before $t$ could, each with a constant probability:
1. take values such that there are enough $(3, w_i)$ requests after $t$ to make the total number of $w_i$ requests picked by the algorithm at least $B/z$;
2. take values such that even if all the $(3, w_i)$ requests after $t$ were picked, the total number of $w_i$ requests picked is at most $B/z - \sqrt{B/z}$.

In instances of the first kind (where the number of $(3, w_i)$ requests exceeds $B/z$), serving the $(2, w_i)$ request causes a loss of 1, since a 3 could have been picked instead. In instances of the second kind, skipping the $(2, w_i)$ request causes a loss of 1, since that 2 could have been picked instead of a 1. Thus there is an inevitable constant-probability loss of 1 per $(2, w_i)$ request, and since a category sees $\sqrt{B/z}$ such requests in expectation, there is an expected loss of $\Omega(\sqrt{B/z})$.

Thus whether $o_i = \Omega(\sqrt{B/z})$ or $o_i = o(\sqrt{B/z})$, we have established a loss of $\Omega(\sqrt{B/z})$ per $i$, and thus a total expected loss of $\Omega(\sqrt{Bz})$. This is under the assumption that $r_i(w)$ is exactly $B/z$; Lemma 4.1 grants a leeway of $4\left(7\epsilon B + 4k\sqrt{\alpha Bz}\right)$. Even after that leeway, since $\alpha$ can be made an arbitrarily small constant and Lemma 4.1 still holds, the loss remains $\Omega(\sqrt{Bz}) - O(\epsilon B)$. Moreover, after the leeway, the statement $\sum_i |r_i(w) - B/z| \le 4\left(7\epsilon B + 4k\sqrt{\alpha Bz}\right)$ has to hold only with probability $3/4$, but even this puts the loss at $\Omega(\sqrt{Bz}) - O(\epsilon B)$. Therefore,
$$E[\mathrm{ALG}] \le \mathrm{OPT} - \Omega(\sqrt{Bz}) + O(\epsilon B).$$
Since $\mathrm{OPT} = 7B$, we have $E[\mathrm{ALG}] \le \mathrm{OPT}\left(1 - \Omega(\sqrt{z/B}) + O(\epsilon)\right)$, and in order to get a $1 - O(\epsilon)$ approximation we need $\Omega(\sqrt{z/B}) - O(\epsilon) \le O(\epsilon)$, implying $B \ge \Omega(z/\epsilon^2) = \Omega(\log n/\epsilon^2)$.
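The two-matrix construction of the category vectors, and the two properties it guarantees, can be verified directly with a short script (a sketch; the names `category_vectors`, `z` and `alpha` are ours):

```python
from itertools import product

def category_vectors(z, alpha):
    """Build the lower-bound instance's vectors v_i and w_i.

    Rows are the 2^z binary strings of length z: ascending order for the
    first matrix, descending for the second.  Column i of the first matrix,
    scaled by alpha, is v_i; column i of the second matrix is w_i.
    """
    rows_up = list(product([0, 1], repeat=z))   # ascending order
    rows_down = list(reversed(rows_up))         # descending order
    v = [[alpha * row[i] for row in rows_up] for i in range(z)]
    w = [[row[i] for row in rows_down] for i in range(z)]
    return v, w
```

Property 1 holds because the $r$-th string in descending order is the bitwise complement of the $r$-th string in ascending order; property 2 holds because choosing $v_i/\alpha$ or $w_i$ per column pins down each bit, leaving exactly one matching row.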
In this section, we give a simple proof of Theorem 2.4, which we restate below for convenience.

Theorem 2.4. The greedy algorithm achieves an approximation factor of $1 - 1/e$ for the Adwords problem in the i.i.d. unknown-distributions model for all $\gamma$, i.e., for all $0 \le \gamma \le 1$.

Recall that in the Adwords problem, when a query $j$ arrives with a bid amount $b_{ij}$ greater than the remaining budget of advertiser $i$, we are still allowed to allot that query to advertiser $i$, but we earn only the remaining budget of $i$, and not the full bid $b_{ij}$.

Goel and Mehta [2008] prove that the greedy algorithm gives a $(1 - 1/e)$ approximation to the Adwords problem when the queries arrive in a random permutation or i.i.d., but under an assumption that essentially amounts to $\gamma$ tending to zero, i.e., bids being much smaller than budgets. We give a much simpler proof of the $(1 - 1/e)$ approximation by the greedy algorithm for the i.i.d. unknown-distributions case, and our proof works for all $\gamma$.

Let $p_j$ be the probability of query $j$ appearing in any given impression, and let $y_j = mp_j$. Let $x_{ij}$ denote the offline fractional optimal solution for the expected instance. Let $w_i(t)$ denote the amount of money spent by advertiser $i$ at time step $t$, i.e., on the $t$-th query, in the greedy algorithm (to be described below). Let $f_i(0) = \sum_j b_{ij} x_{ij} y_j$, let $f_i(t) = f_i(0) - \sum_{r=1}^t w_i(r)$, and let $f(t) = \sum_{i=1}^n f_i(t)$. Note that $f_i(0)$ is the amount spent by $i$ in the offline fractional optimal solution to the expected instance.

Consider the greedy algorithm, which allocates the query $j$ arriving at time $t$ to the advertiser with the maximum effective bid for that query, i.e., to $\arg\max_i \min\{b_{ij},\, B_i - \sum_{r=1}^{t-1} w_i(r)\}$. We prove that this algorithm obtains a revenue of $(1 - 1/e)\sum_{i,j} b_{ij} x_{ij} y_j$, and thus gives the desired $1 - 1/e$ competitive ratio against the fractional optimal solution to the expected instance. Consider a hypothetical algorithm that allocates queries to advertisers according to the $x_{ij}$'s.
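The greedy rule just described — route each query to the advertiser with the largest effective bid, earning at most the remaining budget — can be sketched as follows (a sketch; the input layout is our own):

```python
def greedy_adwords(queries, bids, budgets):
    """Greedy for Adwords: serve each arriving query to the advertiser i
    maximizing the effective bid min(b_ij, remaining budget of i); when the
    bid exceeds the remaining budget, only the remaining budget is earned.
    bids[i][j] is advertiser i's bid on query j."""
    remaining = list(budgets)
    revenue = 0.0
    for j in queries:
        i_star = max(range(len(budgets)),
                     key=lambda i: min(bids[i][j], remaining[i]))
        pay = min(bids[i_star][j], remaining[i_star])
        remaining[i_star] -= pay
        revenue += pay
    return revenue
```

Note that greedy needs no distributional information at all, which is what makes the $1-1/e$ guarantee for unknown distributions notable.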
We prove that this hypothetical algorithm obtains an expected revenue of $(1 - 1/e)\sum_{i,j} b_{ij} x_{ij} y_j$, and argue that the greedy algorithm only performs better. Let $w^h_i(t)$ and $f^h_i(t)$ denote the quantities analogous to $w_i(t)$ and $f_i(t)$ for the hypothetical algorithm, with the initial value $f^h_i(0) = f_i(0) = \sum_j b_{ij} x_{ij} y_j$, and let $f^h(t) = \sum_{i=1}^n f^h_i(t)$. Let $\mathrm{EXCEED}_i(t)$ denote the set of all $j$ such that $b_{ij}$ is strictly greater than the remaining budget of $i$ at the beginning of time step $t$, namely $b_{ij} > B_i - \sum_{r=1}^{t-1} w^h_i(r)$.

Lemma 5.1. $E[w^h_i(t) \mid f^h_i(t-1)] \ge \frac{f^h_i(t-1)}{m}$.

Proof: The expected amount of money spent at time step $t$ is given by
$$E[w^h_i(t) \mid f^h_i(t-1)] = \sum_{j \in \mathrm{EXCEED}_i(t)} \left(B_i - \sum_{r=1}^{t-1} w^h_i(r)\right)\frac{x_{ij} y_j}{m} + \sum_{j \notin \mathrm{EXCEED}_i(t)} b_{ij}\,\frac{x_{ij} y_j}{m}. \qquad (16)$$
If $\sum_{j \in \mathrm{EXCEED}_i(t)} x_{ij} y_j \ge 1$, then by (16),
$$E[w^h_i(t) \mid f^h_i(t-1)] \ge \frac{B_i - \sum_{r=1}^{t-1} w^h_i(r)}{m} \ge \frac{f^h_i(0) - \sum_{r=1}^{t-1} w^h_i(r)}{m} = \frac{f^h_i(t-1)}{m},$$
where the second inequality uses $f^h_i(0) \le B_i$. Suppose on the other hand that $\sum_{j \in \mathrm{EXCEED}_i(t)} x_{ij} y_j < 1$. We can rewrite (16) as
$$E[w^h_i(t) \mid f^h_i(t-1)] = \frac{f^h_i(0)}{m} - \sum_{j \in \mathrm{EXCEED}_i(t)} \left(b_{ij} - \left(B_i - \sum_{r=1}^{t-1} w^h_i(r)\right)\right)\frac{x_{ij} y_j}{m}. \qquad (17)$$
Since $b_{ij} \le B_i$ and $\sum_{j \in \mathrm{EXCEED}_i(t)} x_{ij} y_j < 1$, (17) simplifies to
$$E[w^h_i(t) \mid f^h_i(t-1)] > \frac{f^h_i(0)}{m} - \frac{\sum_{r=1}^{t-1} w^h_i(r)}{m} = \frac{f^h_i(t-1)}{m}.$$

Lemma 5.2. The hypothetical algorithm satisfies $E[f^h(t) \mid f^h(t-1)] \le f^h(t-1)\left(1 - \frac{1}{m}\right)$.

Proof: From the definition of $f^h_i(t)$, we have $f^h_i(t) = f^h_i(t-1) - w^h_i(t)$, and so
$$E[f^h_i(t) \mid f^h_i(t-1)] = f^h_i(t-1) - E[w^h_i(t) \mid f^h_i(t-1)] \le f^h_i(t-1)\left(1 - \frac{1}{m}\right),$$
where the inequality is due to Lemma 5.1. Summing over all $i$ gives the lemma.
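The recursion of Lemma 5.2 is where the $1-1/e$ factor comes from: iterating $E[f(t)] \le f(t-1)(1-1/m)$ for $m$ steps leaves at most a $(1-1/m)^m \le 1/e$ fraction of $f(0)$ unspent. A quick numerical check:

```python
import math

def residual_fraction(m):
    """Iterate f <- f * (1 - 1/m) for m steps, starting from f = 1.

    This is the worst case allowed by Lemma 5.2, so 1 minus the result
    lower bounds the fraction of f(0) earned as revenue."""
    f = 1.0
    for _ in range(m):
        f *= 1 - 1 / m
    return f
```

Since $(1-1/m)^m$ increases toward $1/e$ from below, the earned fraction `1 - residual_fraction(m)` is always at least $1 - 1/e$.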
Lemma 5.3. $E[\mathrm{GREEDY}] \ge (1 - 1/e)\sum_{i,j} b_{ij} x_{ij} y_j$.

Proof: Lemma 5.2 proves that for the hypothetical algorithm, the expected amount of money spent at step $t$ by all the advertisers together, conditioned on $f^h(t-1)$, namely $f^h(t-1) - E[f^h(t) \mid f^h(t-1)]$, is at least $\frac{f^h(t-1)}{m}$. But by definition, conditioned on the amount of money spent in the first $t-1$ steps, the greedy algorithm spends the maximum possible amount at step $t$. Thus, for the greedy algorithm too, the statement of Lemma 5.2 must hold, namely $E[f(t) \mid f(t-1)] \le f(t-1)(1 - 1/m)$. This means that $E[f(m)] \le f(0)(1 - 1/m)^m \le f(0)(1/e)$. Thus the expected revenue earned is
$$E\left[\sum_{r=1}^m w(r)\right] = f(0) - E[f(m)] \ge f(0)\left(1 - 1/e\right) = (1 - 1/e)\sum_{i,j} b_{ij} x_{ij} y_j,$$
and this proves the lemma. Lemma 5.3 proves Theorem 2.4.

In this section, we consider the mixed packing-covering problem stated in Section 2.4 and prove Theorem 2.5. We restate the integer program for the mixed covering-packing problem here:
$$\forall i, \ \sum_{j,k} a_{ijk} x_{j,k} \le c_i; \qquad \forall i, \ \sum_{j,k} w_{ijk} x_{j,k} \ge d_i; \qquad \forall j, \ \sum_k x_{j,k} \le 1; \qquad \forall j,k, \ x_{j,k} \in \{0,1\}. \qquad (18)$$
The goal is to check whether there is a feasible solution to this IP. We solve a gap version of this problem: distinguish, with high probability (say $1-\delta$), between the two cases
• YES: there is a feasible solution;
• NO: there is no feasible solution even with a slack, namely, even if all of the $c_i$'s are multiplied by $1 + 3\epsilon(1+\epsilon)$ and all of the $d_i$'s are multiplied by $1 - 3\epsilon(1+\epsilon)$.
We use $1 + 3\epsilon(1+\epsilon)$ and $1 - 3\epsilon(1+\epsilon)$ for the slack, instead of just $1+\epsilon$ and $1-\epsilon$, purely to reduce notational clutter in what follows (mainly for the NO case).

As in the online problem, we refer to the quantities indexed by $j$ as requests, to $a_{ijk}$ as resource-$i$ consumption, to $w_{ijk}$ as resource-$i$ profit, and to the quantities indexed by $k$ as options. There are a total of $m$ requests, $n$ resources, and $K$ options, plus the "zero" option denoted by $\perp$. Recall that the parameter $\gamma$ for this problem is defined by $\gamma = \max\left(\left\{\frac{a_{ijk}}{c_i}\right\}_{i,j,k} \cup \left\{\frac{w_{ijk}}{d_i}\right\}_{i,j,k}\right)$.
Our algorithm needs the values of $m$, $n$ and $\gamma$ (an upper bound on the value of $\gamma$ also suffices).

High-level overview. We solve this offline problem in an online manner via random sampling. We sample $T = \Theta\!\left(\frac{\gamma m \log(n/\delta)}{\epsilon^2}\right)$ requests $j$ from the set of possible requests uniformly at random with replacement, and then design an algorithm that allocates resources online for these requests. At the end of serving the $T$ requests, we check whether the obtained solution proportionally satisfies the constraints of IP (18). If yes, we declare YES as the answer, and we declare NO otherwise. At the core of the solution is the online sampling algorithm we use, which is identical to the techniques used to develop the online algorithms in Sections 3.2 and 3.3. We describe our algorithm in Algorithm 5. The main theorem of this section is Theorem 2.5, which we restate here:

Theorem 2.5. For any $\epsilon > 0$, Algorithm 5 solves the gap version of the mixed covering-packing problem with $\Theta\!\left(\frac{\gamma m \log(n/\delta)}{\epsilon^2}\right)$ oracle calls.

Detailed description and proof. The proof is in two parts. The first part proves that our algorithm indeed answers YES, with probability at least $1-\delta$, when the actual answer is YES. The second part is the identical statement for the NO case.

The YES case. We begin with the case where the true answer is YES. Let $x^*_{jk}$ denote some feasible solution to the LP relaxation of IP (18). In a spirit similar to Sections 3.1, 3.2 and 3.3, we define the algorithm $P$ as follows. It samples a total of $T = \Theta\!\left(\frac{\gamma m \log(n/\delta)}{\epsilon^2}\right)$ requests uniformly at random, with replacement, from the total pool of $m$ requests. When request $j$ is sampled, $P$ serves $j$ using option $k$ with probability $x^*_{jk}$. Thus, if we denote by $X^*_{i,t}$ the consumption of resource $i$ in step $t$ of $P$, then we have $E[X^*_{i,t}] = \sum_{j=1}^m \frac{1}{m} \sum_k a_{ijk} x^*_{jk} \le \frac{c_i}{m}$; this inequality follows from $x^*_{jk}$ being a feasible solution to the LP relaxation of (18). Similarly, if $Y^*_{i,t}$ denotes the resource-$i$ profit in step $t$ of $P$, we have $E[Y^*_{i,t}] \ge \frac{d_i}{m}$.
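The sampling-based test described in the overview can be sketched as follows. This is a simplified sketch of ours: the names are illustrative, the constants hidden by $\Theta(\cdot)$ are left to the caller, and the potential updates use the simplified denominators $1 \pm \epsilon/(m\gamma)$:

```python
def gap_test(m, sample_request, n, c, d, gamma, eps, T):
    """Sampling-based YES/NO test for the mixed packing-covering IP.

    sample_request() returns a uniformly random request: a list of options,
    each a pair (cons, prof) of length-n vectors.  Declares YES (True) when
    the T sampled requests can be served proportionally within the relaxed
    packing and covering constraints."""
    phi = [1.0 / c[i] for i in range(n)]
    psi = [1.0 / d[i] for i in range(n)]
    X, Y = [0.0] * n, [0.0] * n
    for _ in range(T):
        options = sample_request()
        best, best_score = None, 0.0  # the null option scores 0
        for cons, prof in options:
            s = sum(cons[i] * phi[i] - prof[i] * psi[i] for i in range(n))
            if s < best_score:
                best, best_score = (cons, prof), s
        cons, prof = best if best is not None else ([0.0] * n, [0.0] * n)
        for i in range(n):
            X[i] += cons[i]
            Y[i] += prof[i]
            phi[i] *= (1 + eps) ** (cons[i] / (gamma * c[i])) / (1 + eps / (m * gamma))
            psi[i] *= (1 - eps) ** (prof[i] / (gamma * d[i])) / (1 - eps / (m * gamma))
    pack_ok = all(X[i] < T * c[i] / m * (1 + eps) for i in range(n))
    cover_ok = all(Y[i] > T * d[i] / m * (1 - eps) for i in range(n))
    return pack_ok and cover_ok
```

Each sampled request costs one oracle call (the arg-min over options), which is where the $\Theta(\gamma m \log(n/\delta)/\epsilon^2)$ oracle-call bound comes from.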
We now bound the probability that our condition for declaring YES is violated for some algorithm $A$. For the packing constraints,
$$\Pr\left[\sum_{t=1}^T X^A_{i,t} \ge \frac{T c_i}{m}(1+\epsilon)\right] = \Pr\left[\frac{\sum_{t=1}^T X^A_{i,t}}{\gamma c_i} \ge \frac{T}{m\gamma}(1+\epsilon)\right] = \Pr\left[(1+\epsilon)^{\sum_{t=1}^T X^A_{i,t}/(\gamma c_i)} \ge (1+\epsilon)^{(1+\epsilon)\frac{T}{m\gamma}}\right] \le \frac{E\left[\prod_{t=1}^T (1+\epsilon)^{X^A_{i,t}/(\gamma c_i)}\right]}{(1+\epsilon)^{(1+\epsilon)\frac{T}{m\gamma}}} \qquad (19)$$
and, for the covering constraints,
$$\Pr\left[\sum_{t=1}^T Y^A_{i,t} \le \frac{T d_i}{m}(1-\epsilon)\right] = \Pr\left[(1-\epsilon)^{\sum_{t=1}^T Y^A_{i,t}/(\gamma d_i)} \ge (1-\epsilon)^{(1-\epsilon)\frac{T}{m\gamma}}\right] \le \frac{E\left[\prod_{t=1}^T (1-\epsilon)^{Y^A_{i,t}/(\gamma d_i)}\right]}{(1-\epsilon)^{(1-\epsilon)\frac{T}{m\gamma}}} \qquad (20)$$

Algorithm 5: Online sampling algorithm for offline mixed covering-packing problems
Input: the mixed packing and covering IP (18), a failure probability $\delta > 0$, and an error parameter $\epsilon > 0$.
Output: distinguish between the case ‘YES’, where there is a feasible solution to IP (18), and the case ‘NO’, where there is no feasible solution to IP (18) even if all of the $c_i$'s are multiplied by $1 + 3\epsilon(1+\epsilon)$ and all of the $d_i$'s are multiplied by $1 - 3\epsilon(1+\epsilon)$.
Set $T = \Theta\!\left(\frac{\gamma m \log(n/\delta)}{\epsilon^2}\right)$.
Initialize
$$\phi_{i,0} = \frac{1}{c_i}\left[\frac{\left(1+\frac{\epsilon}{m\gamma}\right)^{T-1}}{(1+\epsilon)^{(1+\epsilon)\frac{T}{m\gamma}}}\right], \qquad \psi_{i,0} = \frac{1}{d_i}\left[\frac{\left(1-\frac{\epsilon}{m\gamma}\right)^{T-1}}{(1-\epsilon)^{(1-\epsilon)\frac{T}{m\gamma}}}\right].$$
for $s = 1$ to $T$ do
  Sample a request $j$ uniformly at random from the total pool of $m$ requests, and use the following option $k^*$:
  $$k^* = \arg\min_{k \in K \cup \{\perp\}} \left(\sum_i a_{ijk}\cdot\phi_{i,s-1} - \sum_i w_{ijk}\cdot\psi_{i,s-1}\right).$$
  Set $X^A_{i,s} = a_{ijk^*}$, $Y^A_{i,s} = w_{ijk^*}$.
  Update $\phi_{i,s} = \phi_{i,s-1}\cdot\frac{(1+\epsilon)^{X^A_{i,s}/(\gamma c_i)}}{1+\frac{\epsilon}{m\gamma}}$, and $\psi_{i,s} = \psi_{i,s-1}\cdot\frac{(1-\epsilon)^{Y^A_{i,s}/(\gamma d_i)}}{1-\frac{\epsilon}{m\gamma}}$.
end for
if $\forall i$, $\sum_{t=1}^T X^A_{i,t} < \frac{T c_i}{m}(1+\epsilon)$, and $\sum_{t=1}^T Y^A_{i,t} > \frac{T d_i}{m}(1-\epsilon)$, then declare YES; else declare NO.

If our algorithm $A$ were $P$ (so that we could use $E[X^*_{i,t}] \le \frac{c_i}{m}$ and $E[Y^*_{i,t}] \ge \frac{d_i}{m}$), the total failure probability in the YES case, which is the sum of (19) and (20) over all $i$, would have been at most $\delta$ for $T = \Theta\!\left(\frac{\gamma m \log(n/\delta)}{\epsilon^2}\right)$ with an appropriate constant inside the $\Theta$. The goal is to design an algorithm $A$ that, unlike $P$, does not first solve the LP relaxation of IP (18) and then use the $x^*_{jk}$'s to allocate resources, but instead allocates online and also obtains the same failure probability $\delta$, just as we did in Sections 3.2 and 3.3. That is, we want to show that the sum of (19) and (20) over all $i$ is at most $\delta$:
$$\sum_i \frac{E\left[\prod_{t=1}^T (1+\epsilon)^{X^A_{i,t}/(\gamma c_i)}\right]}{(1+\epsilon)^{(1+\epsilon)\frac{T}{m\gamma}}} + \sum_i \frac{E\left[\prod_{t=1}^T (1-\epsilon)^{Y^A_{i,t}/(\gamma d_i)}\right]}{(1-\epsilon)^{(1-\epsilon)\frac{T}{m\gamma}}} \le \delta.$$
For the hybrid algorithm $A^s P^{T-s}$, the above quantity can be rewritten as
$$\sum_i \frac{E\left[(1+\epsilon)^{S_s(X^A_i)/(\gamma c_i)} \prod_{t=s+1}^T (1+\epsilon)^{X^*_{i,t}/(\gamma c_i)}\right]}{(1+\epsilon)^{(1+\epsilon)\frac{T}{m\gamma}}} + \sum_i \frac{E\left[(1-\epsilon)^{S_s(Y^A_i)/(\gamma d_i)} \prod_{t=s+1}^T (1-\epsilon)^{Y^*_{i,t}/(\gamma d_i)}\right]}{(1-\epsilon)^{(1-\epsilon)\frac{T}{m\gamma}}},$$
which, by using $(1+\epsilon)^x \le 1+\epsilon x$ for $0 \le x \le 1$, is in turn upper bounded by
$$\sum_i \frac{E\left[(1+\epsilon)^{S_s(X^A_i)/(\gamma c_i)} \prod_{t=s+1}^T \left(1+\frac{\epsilon X^*_{i,t}}{\gamma c_i}\right)\right]}{(1+\epsilon)^{(1+\epsilon)\frac{T}{m\gamma}}} + \sum_i \frac{E\left[(1-\epsilon)^{S_s(Y^A_i)/(\gamma d_i)} \prod_{t=s+1}^T \left(1-\frac{\epsilon Y^*_{i,t}}{\gamma d_i}\right)\right]}{(1-\epsilon)^{(1-\epsilon)\frac{T}{m\gamma}}}.$$
Since for all $t$ the random variables $X^*_{i,t}$, $X^A_{i,t}$, $Y^*_{i,t}$ and $Y^A_{i,t}$ are all independent, and $E[X^*_{i,t}] \le \frac{c_i}{m}$ and $E[Y^*_{i,t}] \ge \frac{d_i}{m}$, the above is in turn upper bounded by
$$\sum_i \frac{E\left[(1+\epsilon)^{S_s(X^A_i)/(\gamma c_i)}\right] \left(1+\frac{\epsilon}{m\gamma}\right)^{T-s}}{(1+\epsilon)^{(1+\epsilon)\frac{T}{m\gamma}}} + \sum_i \frac{E\left[(1-\epsilon)^{S_s(Y^A_i)/(\gamma d_i)}\right] \left(1-\frac{\epsilon}{m\gamma}\right)^{T-s}}{(1-\epsilon)^{(1-\epsilon)\frac{T}{m\gamma}}}. \qquad (21)$$
Let $F[A^s P^{T-s}]$ denote the quantity in (21), which is an upper bound on the failure probability of the hybrid algorithm $A^s P^{T-s}$. We just showed that $F[P^T] \le \delta$. We now prove that for all $s \in \{0, 1, \ldots, T-1\}$, $F[A^{s+1} P^{T-s-1}] \le F[A^s P^{T-s}]$, thus proving that $F[A^T] \le \delta$, i.e., running the algorithm $A$ for all the $T$ steps results in a failure with probability at most $\delta$. Assuming that for all $s < p$ the algorithm $A$ has been defined for the first $s+1$ steps in such a way that $F[A^{s+1} P^{T-s-1}] \le F[A^s P^{T-s}]$, we now define $A$ for the $(p+1)$-th step in a way that will ensure $F[A^{p+1} P^{T-p-1}] \le F[A^p P^{T-p}]$.
We have
$$F[A^{p+1} P^{T-p-1}] = \sum_i \frac{E\left[(1+\epsilon)^{S_{p+1}(X^A_i)/(\gamma c_i)}\right] \left(1+\frac{\epsilon}{m\gamma}\right)^{T-p-1}}{(1+\epsilon)^{(1+\epsilon)\frac{T}{m\gamma}}} + \sum_i \frac{E\left[(1-\epsilon)^{S_{p+1}(Y^A_i)/(\gamma d_i)}\right] \left(1-\frac{\epsilon}{m\gamma}\right)^{T-p-1}}{(1-\epsilon)^{(1-\epsilon)\frac{T}{m\gamma}}}$$
$$\le \sum_i \frac{E\left[(1+\epsilon)^{S_p(X^A_i)/(\gamma c_i)} \left(1+\frac{\epsilon X^A_{i,p+1}}{\gamma c_i}\right)\right] \left(1+\frac{\epsilon}{m\gamma}\right)^{T-p-1}}{(1+\epsilon)^{(1+\epsilon)\frac{T}{m\gamma}}} + \sum_i \frac{E\left[(1-\epsilon)^{S_p(Y^A_i)/(\gamma d_i)} \left(1-\frac{\epsilon Y^A_{i,p+1}}{\gamma d_i}\right)\right] \left(1-\frac{\epsilon}{m\gamma}\right)^{T-p-1}}{(1-\epsilon)^{(1-\epsilon)\frac{T}{m\gamma}}} \qquad (22)$$
Define
$$\phi_{i,s} = \frac{1}{c_i}\left[\frac{(1+\epsilon)^{S_s(X^A_i)/(\gamma c_i)} \left(1+\frac{\epsilon}{m\gamma}\right)^{T-s-1}}{(1+\epsilon)^{(1+\epsilon)\frac{T}{m\gamma}}}\right]; \qquad \psi_{i,s} = \frac{1}{d_i}\left[\frac{(1-\epsilon)^{S_s(Y^A_i)/(\gamma d_i)} \left(1-\frac{\epsilon}{m\gamma}\right)^{T-s-1}}{(1-\epsilon)^{(1-\epsilon)\frac{T}{m\gamma}}}\right].$$
Define the step $p+1$ of algorithm $A$ as picking the following option $k^*$ for request $j$:
$$k^* = \arg\min_{k \in K \cup \{\perp\}} \left(\sum_i a_{ijk}\cdot\phi_{i,p} - \sum_i w_{ijk}\cdot\psi_{i,p}\right).$$
By the above definition of step $p+1$ of algorithm $A$, it follows that among any algorithms whose first $p$ steps are identical and whose last $T-p-1$ steps follow $P$, algorithm $A$'s $(p+1)$-th step is the one that minimizes expression (22). In particular, it follows that expression (22) is upper bounded by the same expression where the $(p+1)$-th step is made according to $X^*_{i,p+1}$ and $Y^*_{i,p+1}$, i.e., we replace $X^A_{i,p+1}$ by $X^*_{i,p+1}$ and $Y^A_{i,p+1}$ by $Y^*_{i,p+1}$.
Thereforewe have F [ A p +1 P T − p − ] ≤ X i E (cid:20) (1 + ǫ ) Sp ( XAi ) γci (cid:16) ǫ X ∗ i,p +1 γc i (cid:17) (cid:16) ǫmγ (cid:17) T − p − (cid:21) (1 + ǫ ) (1+ ǫ ) Tmγ + X i E (cid:20) (1 − ǫ ) Sp ( Y Ai ) γdi (cid:16) − ǫ Y ∗ i,T + p +1 γd i (cid:17) (cid:16) − ǫmγ (cid:17) T − p − (cid:21) (1 − ǫ ) (1 − ǫ ) Tmγ ≤ X i E (cid:20) (1 + ǫ ) Sp ( XAi ) γci (cid:16) ǫmγ (cid:17) (cid:16) ǫmγ (cid:17) T − p − (cid:21) (1 + ǫ ) (1+ ǫ ) Tmγ +38 i E (cid:20) (1 − ǫ ) Sp ( Y Ai ) γdi (cid:16) − ǫmγ (cid:17) (cid:16) − ǫmγ (cid:17) T − p − (cid:21) (1 − ǫ ) (1 − ǫ ) Tmγ = F [ A p P T − p ] The NO case We now proceed to prove that when the real answer is NO, our algorithm saysNO with a probability at least 1 − δ . To prove this result (formally stated in Lemma 6.3), we useas a tool the fact that when the integer program in (18) is in the NO case where even a slack of3 ǫ (1 + ǫ ) will not make it feasible, then even the LP relaxation of (18) will be infeasible with aslack of 2 ǫ . We prove this statement now by proving its contrapositive in Lemma 6.1. Lemma 6.1 If the LP relaxation of (18) is feasible with a slack of s , then the integer programin (18) is feasible with a slack of s (1 + ǫ ) + ǫ .Proof: To prove this, we write the LP relaxation of the integer program in (18) slightly differentlybelow. Primal and dual LPs corresponding to integer program in (18) (23) Primal LP corresponding to IP (18) Dual LP corresponding to IP (18)Minimize λ s.t. Maximize P i ( ρ i − α i ) − P j β j s.t. ∀ i, λ − P j,k a ijk x j,k c i ≥ − ∀ j, k, β j ≥ P i (cid:16) ρ i w ijk d i − α i a ijk c i (cid:17) ∀ i, λ + P j,k w ijk x j,k d i ≥ P i ( α i + ρ i ) ≤ ∀ j, P k x j,k ≤ ∀ i, α i , ρ i ≥ ∀ j, k, x j,k ≥ ∀ j, β j ≥ λ ≥ λ ∗ of the primal LP in (23) represents the slack in the YES/NO problem.I.e., if λ ∗ = 0, then we have zero slack and hence are in the YES case. Else, we are in theNO case with a slack equal to λ ∗ . 
Given this, all we have to show is that when the LP in (23) has an optimal value of $\lambda^*$, the corresponding integer program has a solution with slack at most $\lambda^*(1+\epsilon)+\epsilon$. To see this is true, let $x^*_{j,k}$ denote the optimal solution to the primal LP (23). Consider the integral solution that does a randomized rounding of the $x^*_{j,k}$'s and allocates according to these rounded values, and let $X_{jk}$ be the corresponding $\{0,1\}$ random variable. Let the random variable $X_{ij} = \sum_k \frac{a_{ijk}X_{jk}}{c_i}$. By the definition of LP (23), we have $\mathbf{E}[\sum_j X_{ij}] \leq 1+\lambda^*$. Noting that each of the $X_{ij}$'s is at most $\gamma$, by Chernoff bounds it follows that $\Pr[\sum_j X_{ij} \geq (1+\lambda^*)(1+\epsilon)] \leq e^{-\epsilon^2/(4\gamma)}$, which, given that $\gamma = O\big(\frac{\epsilon^2}{\log(n/\epsilon)}\big)$, is at most $\frac{\epsilon}{n}$ (the probability derivation is just like the derivation in Section 3.1). Likewise, if we define by $Y_{ij}$ the random variable $\sum_k \frac{w_{ijk}X_{jk}}{d_i}$, then by the definition of LP (23) we have $\mathbf{E}[\sum_j Y_{ij}] \geq 1-\lambda^*$. By an identical Chernoff-bound argument, we get $\Pr[\sum_j Y_{ij} \leq (1-\lambda^*)(1-\epsilon)] \leq \frac{\epsilon}{n}$. By a union bound over the $2n$ Chernoff bounds, the randomized-rounding integer solution is feasible with a slack of at most $\lambda^* + \epsilon + \lambda^*\epsilon$ with probability at least $1-2\epsilon$. This means that there exists an integer solution with slack $\lambda^*(1+\epsilon)+\epsilon$, and this proves the lemma.

**Corollary 6.2** *For a NO instance with a slack of $3\epsilon(1+\epsilon)$, the LP relaxation of the instance is still infeasible with a slack of $2\epsilon$. In particular, this implies that the optimal value $\lambda^*$ of the primal in (23) is at least $2\epsilon$, and likewise the optimal dual value in (23) satisfies $\sum_i(\rho^*_i-\alpha^*_i) - \sum_j \beta^*_j \geq 2\epsilon$.*

**Lemma 6.3** *For a NO instance, if $T \geq \Theta\big(\frac{\gamma m \log(n/\delta)}{\epsilon^2}\big)$, then*
\[
\Pr\left[\max_i \frac{S_T\big(X^A_i\big)}{c_i} < \frac{T}{m}(1+\epsilon) \;\;\&\;\; \min_i \frac{S_T\big(Y^A_i\big)}{d_i} > \frac{T}{m}(1-\epsilon)\right] \leq \delta.
\]

**Proof:** Let $R$ denote the set of requests sampled. Consider the following sampled primal and dual LPs (24). The sampled primal LP is: Minimize $\lambda$ s.t.
\[
\forall i,\ \lambda - \sum_{j\in R,\,k}\frac{a_{ijk}\,x_{j,k}}{c_i} \geq -\frac{T}{m};\qquad
\forall i,\ \lambda + \sum_{j\in R,\,k}\frac{w_{ijk}\,x_{j,k}}{d_i} \geq \frac{T}{m};\qquad
\forall j\in R,\ \sum_k x_{j,k}\leq 1;\qquad x_{j,k}\geq 0,\ \lambda \geq 0.
\]
The sampled dual LP is: Maximize $\frac{T}{m}\sum_i(\rho_i-\alpha_i) - \sum_{j\in R}\beta_j$ s.t.
\[
\forall j\in R,\ \forall k,\ \beta_j \geq \sum_i\Big(\frac{\rho_i\, w_{ijk}}{d_i} - \frac{\alpha_i\, a_{ijk}}{c_i}\Big);\qquad
\sum_i(\alpha_i+\rho_i)\leq 1;\qquad \alpha_i,\rho_i,\beta_j \geq 0. \quad (24)
\]
Note that if the optimal value of the sampled LP (24) is at least $\frac{T\epsilon}{m}$, then by the definition of our Algorithm 5 we would have declared NO; i.e., if the sampled LP itself has a slack of $\epsilon$ (scaled by $\frac{T}{m}$), then no integral allocation based on those samples can obtain a smaller slack. We now show that by picking $T = \Theta\big(\frac{\gamma m \ln(n/\delta)}{\epsilon^2}\big)$, the above LP (24) has its optimal objective value at least $\frac{T\epsilon}{m}$ with probability at least $1-\delta$. This makes our algorithm answer NO with probability at least $1-\delta$.

Now, the primal of LP (24) has an optimal value equal to that of its dual, which in turn is lower bounded by the value of the dual at any feasible solution. One such feasible solution is $(\alpha^*,\beta^*,\rho^*)$, the optimal solution to the full version of the dual in LP (24), namely the one written in LP (23), where $R=[m]$ and $T=m$. This is because the set of constraints in the full version of the dual is clearly a superset of the constraints in the dual of LP (24). Thus, the optimal value of the primal of LP (24) is lower bounded by the value of the dual at $(\alpha^*,\beta^*,\rho^*)$, which is
\[
\frac{T}{m}\sum_i(\rho^*_i-\alpha^*_i) - \sum_{j\in R}\beta^*_j. \quad (25)
\]
To proceed further in lower bounding (25), we apply Chernoff bounds to $\sum_{j\in R}\beta^*_j$. The fact that the dual of the full version of LP (24) is a maximization LP, coupled with the constraints therein, implies that $\beta^*_j \leq \gamma$. Further, let $\tau^*$ denote the optimal value of the full version of LP (24), i.e., $\sum_i(\rho^*_i-\alpha^*_i) - \sum_j\beta^*_j = \tau^*$. Now, the constraint $\sum_i(\alpha^*_i+\rho^*_i)\leq 1$, together with $\tau^*\geq 0$, implies that $\sum_j\beta^*_j \leq 1$. We are now ready to lower bound the quantity in (25).
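As a quick numerical sanity check on the concentration step used next, the following toy simulation (all numbers are assumptions, and this is illustration only, not part of the proof) samples $T$ of $m$ values $\beta_j \in [0,\gamma]$ and verifies that $\sum_{j\in R}\beta_j$ rarely exceeds its mean $\frac{T\sum_j\beta_j}{m}$ by more than the Chernoff-style deviation term.

```python
import math
import random

random.seed(0)
m, T, gamma, delta = 1000, 400, 0.05, 0.1
beta = [random.uniform(0, gamma) for _ in range(m)]  # stand-ins for beta*_j in [0, gamma]

mean = T * sum(beta) / m                             # E[ sum_{j in R} beta_j ]
# mean + sqrt( T * (sum_j beta_j) * gamma * ln(1/delta) / m ): the high-probability bound
bound = mean + math.sqrt(T * sum(beta) * gamma * math.log(1 / delta) / m)

trials, violations = 2000, 0
for _ in range(trials):
    R = [random.randrange(m) for _ in range(T)]      # sample T requests u.a.r.
    if sum(beta[j] for j in R) > bound:
        violations += 1
# Chernoff predicts the bound fails with probability at most delta;
# empirically the failure rate is far below delta.
```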
We have, with probability at least $1-\delta$, the optimal value of the primal of LP (24)
\[
\geq\; \frac{T}{m}\sum_i(\rho^*_i-\alpha^*_i) - \sum_{j\in R}\beta^*_j
\;\geq\; \frac{T}{m}\sum_i(\rho^*_i-\alpha^*_i) - \left[\frac{T\sum_j\beta^*_j}{m} + \sqrt{\frac{T\big(\sum_j\beta^*_j\big)\,\gamma\ln(1/\delta)}{m}}\right] \quad\Big(\text{since }\beta^*_j\in[0,\gamma]\Big)
\]
\[
\geq\; \frac{T\tau^*}{m} - \sqrt{\frac{T\gamma\ln(1/\delta)}{m}} \;=\; \frac{T\tau^*}{m}\left[1 - \sqrt{\frac{\gamma m\ln(1/\delta)}{T\cdot(\tau^*)^2}}\right] \quad (26)
\]
where the second inequality is a "with probability at least $1-\delta$" inequality, i.e., we apply Chernoff bounds to $\sum_{j\in R}\beta^*_j$, along with the observation that each $\beta^*_j\in[0,\gamma]$. The third inequality follows from $\sum_j\beta^*_j \leq 1$ and the definition $\sum_i(\rho^*_i-\alpha^*_i) - \sum_j\beta^*_j = \tau^*$. Setting $T = \Theta\big(\frac{\gamma m\ln(n/\delta)}{\epsilon^2}\big)$ with an appropriate constant inside the $\Theta$, coupled with the fact that $\tau^*\geq 2\epsilon$ in the NO case (see Corollary 6.2), it is easy to verify that the quantity in (26) is at least $\frac{T\epsilon}{m}$.

Going back to our application of Chernoff bounds above, in order to apply it in the form above we require that the multiplicative deviation from the mean, $\sqrt{\frac{\gamma m\ln(1/\delta)}{T\sum_j\beta^*_j}}$, lie in $[0, 2e-1]$. If $\sum_j\beta^*_j \geq \epsilon^2$, this requirement follows. Suppose on the other hand that $\sum_j\beta^*_j < \epsilon^2$. Since we are happy if the excess over the mean is at most $\frac{T\epsilon}{m}$, let us look for a multiplicative error of $\frac{T\epsilon/m}{T\sum_j\beta^*_j/m} = \frac{\epsilon}{\sum_j\beta^*_j}$. Based on the fact that $\sum_j\beta^*_j < \epsilon^2$, this multiplicative error is at least $1/\epsilon$, which is larger than $2e-1$ for small enough $\epsilon$. We now use the version of Chernoff bounds for multiplicative error larger than $2e-1$, which gives that a deviation of $\frac{T\epsilon}{m}$ occurs with probability at most $2^{-\frac{T\epsilon}{m\gamma}}$, where the division by $\gamma$ is because $\beta^*_j\leq\gamma$. This probability is at most $\big(\frac{\delta}{n}\big)^{\Theta(1/\epsilon)}$, which is at most $\delta$.

The proofs for the YES and NO cases together prove Theorem 2.5.

**Applications.** We now list the problems that are special cases of the resource allocation framework and have been previously considered. The Adwords and Display Ads special cases were already discussed in Section 2.2.

**Network routing.** Consider a graph (either undirected or directed) with edge capacities.
Requests arrive online; request $j$ consists of a source-sink pair $(s_j, t_j)$ and a bandwidth $\rho_j$. In order to satisfy a request, a capacity of $\rho_j$ must be allocated to it on every edge along some path from $s_j$ to $t_j$ in the graph. In the throughput maximization version, the objective is to maximize the number of satisfied requests while not allocating more bandwidth than the available capacity for each edge (different requests could have different values, and one could also consider maximizing the total value of the satisfied requests). Our Algorithm 2 for the resource allocation framework directly applies here, and the approximation guarantee there directly carries over. Kamath et al. [1996] consider a different version of this problem where requests arrive according to a Poisson process with unknown arrival rates. Each request has an associated holding time that is assumed to be exponentially distributed, and once a request has been served, the bandwidth it uses gets freed after its holding time (this is unlike our setting, where once a certain amount of resource capacity has been consumed, it remains unavailable to all future requests). Some aspects of the distribution are assumed to be known, namely, the algorithm knows the average rate of profit generated by all the incoming circuits, the average holding time, and also the target offline optimal solution that the online algorithm is aiming to approximate (again unlike our setting, where no aspect of the request distribution is known to the algorithm, and there could even be some adversarial aspects, as in the ASI model). When each request consumes at most a $\gamma$ fraction of any edge's bandwidth, Kamath et al. [1996] give an online algorithm that achieves an expected profit of $(1-\epsilon)$ times the optimal offline solution when $\gamma = O\big(\frac{\epsilon^2}{\log n}\big)$.

**Combinatorial auctions.** Suppose we have $n$ items for sale, with $c_i$ copies of item $i$. Bidders arrive online, and bidder $j$ has a utility function $U_j : 2^{[n]} \to \mathbb{R}$.
If we post a price $p_i$ for each item $i$, then bidder $j$ buys the bundle $S$ that maximizes $U_j(S) - \sum_{i\in S} p_i$. We assume that bidders can compute such a bundle. The goal is to maximize social welfare, the total utility of all the bidders, subject to the supply constraint that there are only $c_i$ copies of item $i$. Firstly, incentive constraints aside, this problem can be written as an LP in the resource allocation framework. The items are the resources, and the agents arriving online are the requests. All the different subsets of items form the set of options. The utility $U_j(S)$ represents the profit $w_{j,S}$ of serving agent $j$ through option $S$, i.e., subset $S$. If an item $i\in S$, then $a_{i,j,S}=1$, and zero otherwise. Incentive constraints aside, our algorithm for resource allocation at step $s$ will choose the option $k^*$ (or equivalently the bundle $S$) as specified in Algorithm 2, i.e., minimize the potential function. That is, if step $s$ falls in stage $r$,
\[
k^* = \arg\min_k \Big(\sum_i a_{ijk}\cdot\phi^r_{i,s-1} - w_{j,k}\cdot\psi^r_{s-1}\Big)
\]
(note that, unlike in Algorithm 2, there is no subscripting by $i$ for $w_{j,k}$). This can be equivalently written as
\[
k^* = \arg\max_k \Big(w_{j,k}\cdot\psi^r_{s-1} - \sum_i a_{ijk}\cdot\phi^r_{i,s-1}\Big).
\]
Now, maximizing the above expression at step $s$ is the same as picking the $k$ that maximizes $w_{j,k} - \sum_i p_i(s)\,a_{ijk}$, where $p_i(s) = \phi^r_{i,s-1}/\psi^r_{s-1}$. Thus, if we post a price of $p_i(s)$ on item $i$ for bidder number $s$, he will do exactly what the algorithm would have done otherwise. Suppose that the bidders are i.i.d. samples from some distribution (or they arrive as in the adversarial stochastic input model). We can use Theorem 2.2 to get an incentive-compatible posted-price auction with a competitive ratio of $1-O(\epsilon)$ whenever $\min_i\{c_i\} \geq \Omega\big(\frac{\log(n/\epsilon)}{\epsilon^2}\big)$.
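The pricing reduction above can be sketched in code. This is a toy instance with assumed numbers: in the actual algorithm the $\phi$ and $\psi$ values come from the running potentials, and the bidder's utility need not be additive.

```python
from itertools import chain, combinations

# Made-up potentials for n = 3 items at the current step s (stage superscript omitted).
phi = [0.4, 0.1, 0.3]            # phi^r_{i, s-1} for each item i
psi = 2.0                        # psi^r_{s-1} (no i subscript)
prices = [p / psi for p in phi]  # posted price p_i(s) = phi_i / psi

def utility(S):                  # an assumed additive utility for the arriving bidder
    vals = [0.5, 0.2, 0.1]
    return sum(vals[i] for i in S)

bundles = list(chain.from_iterable(combinations(range(3), r) for r in range(4)))

# Bidder's self-interested choice under posted prices ...
buyer = max(bundles, key=lambda S: utility(S) - sum(prices[i] for i in S))
# ... coincides with the algorithm's potential-based choice
#     argmax_S ( U(S) * psi - sum_{i in S} phi_i ),
# since the two objectives differ only by the positive factor 1/psi.
algo = max(bundles, key=lambda S: utility(S) * psi - sum(phi[i] for i in S))
```

Here the bidder keeps exactly the items whose value exceeds their posted price, and the algorithm's potential-minimizing option is the same bundle.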
Further, if an analog of Theorem 2.2 also holds in the random permutation model, then we get a similar result for combinatorial auctions in the offline case: we simply consider the bidders one by one in a random order.¹

¹ Here we assume that each agent reveals his true utility function after he makes his purchase. This information is necessary to compute the prices to be charged to future agents.

References

Shipra Agrawal and Nikhil R. Devanur. Fast algorithms for online stochastic convex programming. In Proceedings of the Symposium on Discrete Algorithms, SODA, 2015.

Shipra Agrawal, Zizhuo Wang, and Yinyu Ye. A dynamic near-optimal algorithm for online linear programming. Operations Research, 62(4):876–890, 2014.

Saeed Alaei, MohammadTaghi Hajiaghayi, and Vahid Liaghat. Online prophet-inequality matching with applications to ad allocation. In ACM Conference on Electronic Commerce, pages 18–35, 2012.

Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta algorithm and applications. Technical report, 2005.

Niv Buchbinder, Kamal Jain, and Joseph (Seffi) Naor. Online primal-dual algorithms for maximizing ad-auctions revenue. In ESA '07: Proceedings of the 15th Annual European Conference on Algorithms, pages 253–264, 2007.

Denis Xavier Charles, Max Chickering, Nikhil R. Devanur, Kamal Jain, and Manan Sanghi. Fast algorithms for finding matchings in lopsided bipartite graphs with applications to display ads. In ACM Conference on Electronic Commerce, pages 121–128, 2010.

Nikhil R. Devanur and Thomas P. Hayes. The adwords problem: online keyword matching with budgeted bidders under random permutations. In EC '09: Proceedings of the 10th ACM Conference on Electronic Commerce, pages 71–78, 2009.

In ACM Conference on Electronic Commerce, pages 388–404, 2012.

R. Eghbali, J. Swenson, and M. Fazel.
Exponentiated subgradient algorithm for online optimization under the random permutation model. ArXiv e-prints, (1410.7171), October 2014.

Jon Feldman, Monika Henzinger, Nitish Korula, Vahab S. Mirrokni, and Clifford Stein. Online stochastic packing applied to display ad allocation. In ESA, pages 182–194, 2010.

Lisa K. Fleischer. Approximating fractional multicommodity flow independent of the number of commodities. SIAM J. Discrete Math., 13(4):505–520, 2000.

Naveen Garg and Jochen Koenemann. Faster and simpler algorithms for multicommodity flow and other fractional packing problems. In FOCS '98: Proceedings of the 39th Annual Symposium on Foundations of Computer Science, page 300, 1998.

Gagan Goel and Aranyak Mehta. Online budgeted matching in random input models with applications to adwords. In SODA '08: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 982–991, 2008.

Anupam Gupta and Marco Molinaro. How experts can solve LPs online. In Algorithms - ESA 2014, volume 8737 of Lecture Notes in Computer Science, pages 517–529. Springer, 2014.

Bala Kalyanasundaram and Kirk Pruhs. An optimal deterministic algorithm for online b-matching. In Foundations of Software Technology and Theoretical Computer Science, volume 1180 of Lecture Notes in Computer Science, pages 193–199. Springer, 1996.

Anil Kamath, Omri Palmon, and Serge Plotkin. Routing and admission control in general topology networks with Poisson arrivals. In SODA '96: Proceedings of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 269–278, 1996.
Michael Kapralov, Ian Post, and Jan Vondrák. Online submodular welfare maximization: Greedy is optimal. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, pages 1216–1225, 2013.

Thomas Kesselheim, Klaus Radke, Andreas Tönnis, and Berthold Vöcking. Primal beats dual on online packing LPs in the random-order model. In Symposium on Theory of Computing, STOC 2014, pages 303–312, 2014.

Aranyak Mehta, Amin Saberi, Umesh Vazirani, and Vijay Vazirani. Adwords and generalized online matching. In FOCS '05: Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, pages 264–273, 2005.

Serge A. Plotkin, David B. Shmoys, and Éva Tardos. Fast approximation algorithms for fractional packing and covering problems. In SFCS '91: Proceedings of the 32nd Annual Symposium on Foundations of Computer Science, pages 495–504, 1991.

Ester Samuel-Cahn. Comparison of threshold stop rules and maximum for independent nonnegative random variables. The Annals of Probability, 12(4):1213–1216, 1984.

Neal E. Young. Randomized rounding without solving the linear program. In SODA '95: Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 170–178, 1995.

Neal E. Young. Sequential and parallel algorithms for mixed packing and covering. In FOCS '01: Proceedings of the 42nd Annual IEEE Symposium on Foundations of Computer Science, 2001.