Online Packing and Covering Framework with Convex Objectives
Niv Buchbinder, Shahar Chen, Anupam Gupta, Viswanath Nagarajan, Joseph (Seffi) Naor
Niv Buchbinder∗   Shahar Chen†   Anupam Gupta‡   Viswanath Nagarajan§   Joseph (Seffi) Naor†

Abstract
We consider online fractional covering problems with a convex objective, where the covering constraints arrive over time. Formally, we want to solve min{ f(x) | Ax ≥ 1, x ≥ 0 }, where the objective function f : R^n → R is convex, and the constraint matrix A_{m×n} is non-negative. The rows of A arrive online over time, and we wish to maintain a feasible solution x at all times while only increasing coordinates of x. We also consider packing problems of the form max{ c⊺y − g(μ) | A⊺y ≤ μ, y ≥ 0 }, where g is a convex function. In the online setting, the variables y and the columns of A⊺ arrive over time, and we wish to maintain a non-decreasing solution (y, μ). These problems are dual to each other when g = f⋆, the Fenchel dual of f. We provide an online primal-dual framework for both classes of problems whose competitive ratio depends on certain “monotonicity” and “smoothness” parameters of f; our results match or improve on the guarantees for some special classes of functions f considered previously.

Using this fractional solver with problem-dependent randomized rounding procedures, we obtain competitive algorithms for the following problems: online covering LPs minimizing ℓp-norms of arbitrary packing constraints, set cover with multiple cost functions, capacity constrained facility location, the capacitated multicast problem, set cover with set requests, and profit maximization with non-separable production costs. Some of these results are new and others provide a unified view of previous results, with matching or slightly worse competitive ratios.

∗ Statistics and Operations Research Dept., Tel Aviv University. Research supported in part by ISF grant 954/11 and by BSF grant 2010426.
† Technion - Israel Institute of Technology, Haifa, Israel. Work supported by ISF grant 954/11 and BSF grant 2010426.
‡ Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA. Research partly supported by NSF awards CCF-1016799 and CCF-1319811.
§ Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109.

1 Introduction
We consider the following class of fractional covering problems:

min { f(x) : Ax ≥ 1, x ≥ 0 }.   (1)

Above, f : R^n → R is a non-decreasing convex function and A_{m×n} is non-negative. (Observe that we can transform the more general constraints Ax ≥ b with all non-negative entries into this form by scaling the constraints.) The covering constraints a_i⊺ x ≥ 1 arrive online over time, and we must maintain a feasible solution x, where x is required to be non-decreasing over time. We also consider the Fenchel dual of (1), which is the following packing problem:

max { 1⊺ y − f⋆(μ) : A⊺ y ≤ μ, y ≥ 0 }.   (2)

Here, the variables y_i along with the columns of A⊺ (or, alternatively, the rows of A) arrive over time, and the Fenchel dual f⋆ is formally defined in (6); see, e.g., [Roc70] for background and properties. Let d denote the row sparsity of the matrix A, i.e., the maximum number of non-zeroes in any row, and let ∇_ℓ f(z) be the ℓ-th coordinate of the gradient of f at a point z ∈ R^n.

This paper gives an online primal-dual algorithm for the pair of convex programs (1) and (2). This extends the widely-used online primal-dual framework for linear objective functions to the convex case. The competitive ratio is given as the ratio between the primal and dual objective functions.∗ It depends on certain “smoothness” parameters of the function f. We provide two general algorithms:

• In the first algorithm, the primal variables x and the dual variables μ are monotonically non-decreasing, while the dual variables y are allowed to both increase and decrease over time. The competitive ratio of this algorithm is:

Dual/Primal ≥ max_{c>0} [ (1 / (8 log(1+2d))) · min_z min_{ℓ=1}^n { ∇_ℓ f(z) / ∇_ℓ f(cz) } − max_z ( (z⊺ ∇f(z) − f(z)) / f(cz) ) ].   (3)

• In the second algorithm, all variables (the primal variables x as well as the dual variables y, μ) are required to be monotonically non-decreasing. The competitive ratio is slightly worse in this case, given by:

Dual/Primal ≥ max_{c>0} [ (1 / (2 log(1+dρ))) · min_z min_{ℓ=1}^n { ∇_ℓ f(z) / ∇_ℓ f(cz) } − max_z ( (z⊺ ∇f(z) − f(z)) / f(cz) ) ].   (4)

Observe that the difference from (3) is the additional parameter ρ, which is defined to be an upper bound on the maximum-to-minimum ratio of the positive entries in any column of A.

The above expressions are difficult to parse because of their generality, so the first special case of interest is that of linear objectives. In this case z⊺ ∇f(z) = f(z), and also ∇f(z) = ∇f(cz); hence the competitive ratio becomes O(log d) for monotone primals, and O(log(dρ)) for monotone primals and duals. Both of these competitive ratios are known to be best possible [BN09, GN14].

(∗ However, for clarity of exposition, we provide the ratio as Dual/Primal and not vice versa.)

The applicability of our framework extends to a number of settings, most of which have been studied before in different works. We now outline some of these connections.

• Mixed Covering and Packing LPs. In this problem, covering constraints Ax ≥ 1 arrive online. In addition, there are K “packing constraints” Σ_{j=1}^n b_kj · x_j ≤ λ_k, for k ∈ [K], that are given up-front. The right-hand sides λ_k of these packing constraints are themselves variables, and the objective is to minimize the ℓp-norm (Σ_{k=1}^K λ_k^p)^{1/p} of the “load vector” λ = (λ_1, …, λ_K). All entries a_ij and b_kj are non-negative. Clearly, the objective function is a monotonically non-decreasing convex function.

We obtain an O(p log d)-competitive algorithm for this problem, where d ≤ n is the row-sparsity of the matrix A. Prior to our work, [ABFP13] gave an O(log K · log(dκγ))-competitive algorithm for the special case of p = log K (corresponding to ‖λ‖_∞, the makespan of the loads); here γ and κ are the maximum-to-minimum ratios of the entries in the covering and packing constraints.

• Set Cover with Multiple Costs.
Here the offline input is a collection of n sets {S_j}_{j=1}^n over a universe U, and K different linear cost functions B_k : [n] → R_+ for k ∈ [K]. Elements from U arrive online and must be covered by some set upon arrival, where the decision to select a set into the solution is irrevocable. The goal is to maintain a set cover that minimizes the ℓp-norm of the K cost functions. Combining our framework with a simple randomized rounding scheme gives an O(p log p · log d · log|U|)-competitive randomized online algorithm; here d is the maximum number of sets containing any element. The special case of K = 1 (when p = 1 without loss of generality) is the online set-cover problem [AAA+09], for which the O(log d log|U|)-competitive bound is tight, at least for randomized polynomial-time online algorithms [Kor05].

• Capacity Constrained Facility Location (CCFL). Here we are given m potential facility locations, each with an opening cost c_i and a capacity u_i. Now, n clients arrive online, each client j ∈ [n] having an assignment cost a_ij and a demand/load p_ij for each facility i ∈ [m]. The online algorithm must open facilities (paying the opening costs c_i) and assign each arriving client j to some open facility i (paying the assignment cost a_ij, and incurring a load p_ij on facility i). The makespan of an assignment is the maximum load on any facility. The objective in CCFL is to minimize the sum of opening costs, assignment costs and the makespan. Using our framework, we obtain an O(log m)-competitive fractional solution to a convex relaxation of CCFL. This is then rounded online to get an O(log m log mn)-competitive randomized online algorithm. This competitive ratio is worse by a logarithmic factor than the best result [ABFP13], but it follows easily from our general framework.

• Capacitated Multicast Problem (CMC).
This is a common generalization of CCFL and the online multicast problem [AAA+06]. The offline input consists of m edge-disjoint rooted trees T_1, …, T_m, corresponding to multicast trees in some network. Each tree T_i has a capacity u_i, and each edge e ∈ ∪_{i=1}^m T_i has an opening cost c_e. A sequence of n clients arrives online, and each must be assigned to one of these trees. Each client j has a tree-dependent load p_ij for tree T_i, and is connected to exactly one vertex π_ij in tree T_i. Thus, if client j is assigned to tree T_i, then the load of T_i increases by p_ij, and all edges on the path in T_i from π_ij to its root must be opened. The objective is to minimize the total cost of the opened edges, subject to the capacity constraints that the total load on each tree T_i is at most u_i. Solving a natural fractional convex relaxation, and then applying a suitable randomized rounding to it, we get an O(log m log mn)-competitive randomized online algorithm that violates each capacity by an O((d + log m) log mn) factor; here d is the maximum depth of the trees {T_i}_{i=1}^m. The capacitated multicast problem with depth d = 2 trees generalizes the CCFL problem, in which case we recover the above result for CCFL.

• Online Set Cover with Set Requests (SCSR). We are given a universe U of n resources, and a collection of m facilities, where each facility i ∈ [m] is specified by (i) a subset S_i ⊆ U of resources, (ii) an opening cost c_i, and (iii) a capacity u_i. The resources and facilities are given up-front. Now, a sequence of k requests arrives over time. Each request j ∈ [k] requires some subset R_j ⊆ U of resources. The request has to be served by assigning it to some collection F_j ⊆ [m] of facilities whose sets collectively cover R_j, i.e., R_j ⊆ ∪_{i∈F_j} S_i. Note that these facilities have to be open, and we incur the cost of these facilities.
Moreover, if a facility i is used to serve request j, this contributes to the load of facility i, and this total load must be at most the capacity u_i. This problem was considered recently by Bhawalkar et al. [BGP14]. Using an approach identical to that for the CCFL problem, we get an O(log m log mnk)-competitive randomized online algorithm that violates each capacity by an O(log m log mnk) factor. Again this factor is weaker than the best result by a logarithmic factor, but it directly follows from our general framework.

• Profit Maximization with Production Costs (PMPC). This is an application of the dual packing problem (2), in contrast to the above applications, which all use the primal covering problem.

Consider a seller with m items that can be produced and sold. The seller has a production cost function g : R^m_+ → R_+ which is monotone, convex, and satisfies some other technical conditions; the total cost incurred by the seller to produce μ_j units of each item j ∈ [m] is given by g(μ).† There are n buyers who arrive online. Each buyer i ∈ [n] is interested in subsets of items (bundles) that belong to a set family S_i ⊆ 2^{[m]}. The value of buyer i for a subset S ∈ S_i is given by v_i(S), where v_i : S_i → R_+ is her valuation function. If buyer i is allocated a bundle T ∈ S_i, she pays the seller her valuation v_i(T). (Observe: this is not an auction setting.) The goal in the PMPC problem is to produce items and allocate subsets to buyers so as to maximize the profit Σ_{i=1}^n v_i(T_i) − g(μ), where T_i ∈ S_i denotes the subset allocated to buyer i and μ ∈ R^m is the total quantity of all items produced. As mentioned above, we consider a non-strategic setting, where the valuation of each buyer is known to the seller.

† An important difference from prior work on such problems [BGMS11, HK14]: in these works, each item j had a separate production cost function g_j(μ_j), and g(μ) := Σ_j g_j(μ_j). We call this the separable case.
Our techniques allow the production cost to be non-separable over items; e.g., we can handle g(μ) = (Σ_{j=1}^m μ_j)^p. To obtain polynomial running time, the allocation to each buyer i is allowed to be any point in the convex hull of the set family S_i. We show that for a large class of valuation functions (e.g., supermodular functions, or weighted rank functions of matroids) and production cost functions, our framework provides a polynomial-time online algorithm; the precise competitive ratio is given by expression (4) with f = g⋆. As a concrete example, suppose the production cost function is g(μ) = (Σ_{j=1}^m μ_j)^p for some p > 1. In this case, we get an O(q log β)^q-competitive algorithm, where q > 1 satisfies 1/q + 1/p = 1, and β is the maximum-to-minimum ratio of the valuation functions {v_i}.

As the above list indicates, the framework for solving fractional convex programs is fairly versatile and gives good fractional results for a variety of problems. In some cases, solving the particular relaxation we consider and then rounding ends up being weaker than the best known result for the specific problem (by a logarithmic factor); we hope that further investigation into this problem will help close this gap.

Bibliographic Note:
In independent and concurrent work, Azar et al. [ACP14] consider online covering problems with convex objectives, i.e., problem (1). They also obtain a competitive ratio that depends on properties of the function f, but their parameterization is somewhat different from ours. As an example, for online covering LPs minimizing the ℓp-norm of packing constraints, they obtain an O(p log(dκγ))-competitive algorithm, whereas we obtain a tighter O(p log d) ratio.

In §2 we show how online convex covering with objective f can be reduced to linear optimization using the gradient of the convex function f. In the process we also give a cleaner algorithm and proof for linear optimization problems, significantly simplifying the previous algorithm from [GN14]. The resulting algorithm performs multiplicative increases on the primal variables; for the dual, it does an initial increase followed by a linear decrease after some point.

The results in §2 are for the convex covering problem (1); the applications appear in §3. Some comments on the main ideas to watch out for:

• For applications to combinatorial problems we have to define the convex relaxation with some care in order to avoid bad integrality gaps. Moreover, some of our convex relaxations are motivated by the particular constraints we want to enforce when subsequently rounding.

• For some of the problems our convex relaxations have an exponential number of constraints. To get a polynomial running time, we use the natural “separation oracle” approach. Moreover, we relax the constraints by a constant factor, so that each call to the separation oracle gives us a “big” improvement, and hence there are only a few updates per request.
For capacity constrained facility location (in §3.3) we show how to round the fractional solution online. For the profit maximization application, our guarantee has an additive loss that depends only on the number m of items and the cost function g; in particular, it does not depend on n, the number of buyers. We note that such an additive loss is necessary for our approach due to an integrality gap of the convex relaxation.

Related Work. This paper adds to the body of work on online primal-dual algorithms; see [BN07] for a survey of this area. The approach has been applied successfully to a large class of online problems, including set cover [AAA+09] and many others. Online covering and packing linear programs were first considered by Buchbinder and Naor [BN09], where they obtained an O(log n)-competitive algorithm for covering and an O(log(n · a_max/a_min))-competitive algorithm for packing. The competitive ratio for covering linear programs was improved to O(log d) by Gupta and Nagarajan [GN14], where d ≤ n is the maximum number of non-zero entries in any row.

Azar, Bhaskar, Fleischer, and Panigrahi [ABFP13] gave the first algorithm for online mixed packing and covering LPs, where the packing constraints are given upfront and covering constraints arrive online; the objective is to minimize the maximum violation of the packing constraints. Their algorithm had a competitive ratio of O(log K · log(dκγ)), where K is the number of packing constraints and γ (resp. κ) denotes the maximum-to-minimum ratio of the entries in the covering (resp. packing) constraints. Using our framework, this bound can be improved to O(log K · log d). This is also best possible, as shown in [ABFP13].

The capacity constrained facility location problem was also introduced by Azar, Bhaskar, Fleischer, and Panigrahi [ABFP13], who gave an O(log m log mn)-competitive algorithm. Our result for this problem is worse by a log-factor, but has the advantage of following directly from our general framework. Moreover, our approach can be extended to the capacitated multicast problem, which is a generalization of CCFL to multi-level facility costs. The online multicast problem (without capacities) was considered by Alon et al.
[AAA+06], where they obtained an O(log m · log n)-competitive randomized algorithm.

The online set cover problem with set requests was considered recently by Bhawalkar, Gollapudi, and Panigrahi [BGP14], who obtained an O(log m log mnk)-competitive algorithm where capacities are violated by an O(log m log mnk) factor. The competitive ratio obtained through our approach is worse by a logarithmic factor in the cost guarantee. Still, we think this is useful, since it follows with almost no additional effort given our online fractional framework and the CCFL rounding scheme. Our approach is also likely to be useful in other such generalizations.

The class of online maximization problems with production costs was introduced by Blum, Gupta, Mansour, and Sharma [BGMS11] and extended by Huang and Kim [HK14]. The key differences from our setting are: (i) these papers deal with an auction setting where the seller is not aware of the valuations of the buyers, whereas our setting is not strategic, and (ii) these papers are restricted to separable production costs, whereas we can handle much more general (non-separable) cost functions.

2 Online Covering with Convex Objectives

Let f : R^n → R be a non-negative, non-decreasing convex function. We assume that the function f is continuous and differentiable, and satisfies the following monotonicity condition:

∀ x ≥ x′ ∈ R^n :  ∇f(x) ≥ ∇f(x′).   (5)

Here, x ≥ x′ means x_i ≥ x′_i for all i ∈ [n].

We consider the online fractional covering problem (1), where the constraints in A arrive online. Our algorithm is a primal-dual algorithm, which works with the following pair of convex programs:

(P): min f(x)          (D): max Σ_{i=1}^m y_i − f⋆(μ)
     Ax ≥ 1                 y⊺A ≤ μ⊺
     x ≥ 0                  y ≥ 0.

Here f⋆ is the Fenchel dual of f, which is defined as

f⋆(μ) = sup_z { μ⊺z − f(z) }.   (6)

(Observe that by scaling the rows of A appropriately, we can transform any covering LP of the form Ax ≥ b into the form above.) The following duality is standard.

Lemma 2.1 (Weak duality).
Let x and (y, μ) be feasible primal and dual solutions to (P) and (D), respectively. Then,

Primal objective = f(x) ≥ Σ_{i=1}^m y_i − f⋆(μ) = Dual objective.   (7)

Proof. We have

Σ_{i=1}^m y_i = y⊺ 1 ≤ y⊺ A x ≤ μ⊺ x = (μ⊺ x − f(x)) + f(x) ≤ f⋆(μ) + f(x).

Rearranging, we get the desired bound.

2.1 The Algorithm
The algorithm maintains a feasible primal solution x and a feasible dual solution (y, μ) at all times.

Fractional Algorithm: At round t:
• Let τ be a continuous variable denoting the current time.
• While the new constraint is unsatisfied, i.e., Σ_{j=1}^n a_tj x_j < 1, increase τ at rate 1 and:
• Change of primal variables:
– For each j with a_tj > 0, increase x_j at rate

∂x_j/∂τ = (a_tj x_j + 1/d) / ∇_j f(x).   (8)

Here d is an upper bound on the row sparsity of the matrix, and ∇_j f(x) is the j-th coordinate of the gradient ∇f(x).
• Change of dual variables:
– Set μ = ∇f(δx), where δ > 0 is a parameter to be chosen later.
– Increase y_t at rate r = (1 / (2 ln(1+2d))) · min_{ℓ=1}^n { ∇_ℓ f(δx) / ∇_ℓ f(x) }.
– If the dual constraint of variable x_j is tight, that is, Σ_{i=1}^t a_ij y_i = μ_j, then:
∗ Let m⋆_j = argmax_{i=1,…,t} { a_ij | y_i > 0 }.
∗ Decrease y_{m⋆_j} at rate (a_tj / a_{m⋆_j, j}) · r.
(Note that this change occurs only if a_tj is strictly positive.)

We emphasize that the primal algorithm does not depend on the value of δ. The last step of the algorithm decreases certain dual variables; all other steps only increase primal and dual variables. For the analysis, we denote by x^τ, y^τ, μ^τ, r^τ the values of x, y, μ, r at time τ, respectively.
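As a sanity check of the primal update (8), the following is a minimal discretized sketch for a single arriving covering constraint. The instance, the objective f(x) = Σ_j (x_j² + x_j) (so that ∇_j f(x) = 2x_j + 1 is positive and non-decreasing), and the step size are all hypothetical; the dual updates are omitted.

```python
import numpy as np

# Discretized sketch of the primal update (8) for one arriving row a_t,
# under the hypothetical objective f(x) = sum_j (x_j^2 + x_j), whose
# gradient is grad_j f(x) = 2*x_j + 1.
def process_row(a_t, x, d, dtau=1e-3):
    """Raise x until a_t . x >= 1, via dx_j/dtau = (a_tj*x_j + 1/d)/grad_j f(x)."""
    while a_t @ x < 1.0:
        active = a_t > 0                     # only coordinates with a_tj > 0 move
        grad = 2.0 * x[active] + 1.0
        x[active] += dtau * (a_t[active] * x[active] + 1.0 / d) / grad
    return x

x = np.zeros(3)
x = process_row(np.array([1.0, 1.0, 0.0]), x, d=2)
assert np.array([1.0, 1.0, 0.0]) @ x >= 1.0   # covering constraint now satisfied
assert x[2] == 0.0                            # untouched coordinate stays at zero
```

Coordinates with a larger gradient (higher marginal cost) grow more slowly, which is exactly the multiplicative-update behavior the analysis below exploits.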
Observation 2.2. For any δ > 0, the following properties are maintained:
• The algorithm maintains a feasible, monotonically non-decreasing primal solution.
• The algorithm maintains a feasible dual solution with non-decreasing μ_j.

Proof. The first property follows by construction, since we only increase x until reaching a feasible solution. For the second property, we observe that the dual variables μ are non-decreasing since ∇f(x) is non-decreasing. We prove that (y, μ) is feasible by induction over the execution of the algorithm. While processing constraint t, if Σ_{i=1}^t a_ij y_i^τ < μ_j^τ for column j, the constraint is trivially satisfied. Suppose that during the processing of constraint t, we have Σ_{i=1}^t a_ij y_i^τ = μ_j^τ for some dual constraint j and time τ. Now the dual-decrease part of the algorithm kicks in, and the rate of change of the left-hand side of the dual constraint is:

d/dτ ( Σ_{i=1}^t a_ij y_i^τ ) = a_tj · r^τ − a_{m⋆_j, j} · (a_tj / a_{m⋆_j, j}) · r^τ = 0.

Before analyzing the competitive factor, let us first prove the following claim.

Claim 2.3.
For a variable x_j, let T_j = { i | a_ij > 0 } and let S_j be any subset of T_j. Then,

x_j^τ ≥ (1 / (max_{i∈S_j} {a_ij} · d)) · [ exp( (2 ln(1+2d) / μ_j^τ) · Σ_{i∈S_j} a_ij y_i^τ ) − 1 ].   (9)

Proof.
Let τ(i) denote the value of τ at the arrival of the i-th primal constraint. We first note that the increase of the primal variables at any time τ(i) ≤ τ ≤ τ(i+1) can be alternatively formulated by the following differential equation:

∂x_j/∂y_i = ( 2 ln(1+2d) / min_{ℓ=1}^n { ∇_ℓ f(δx)/∇_ℓ f(x) } ) · (a_ij x_j + 1/d)/∇_j f(x) ≥ 2 ln(1+2d) · (a_ij x_j + 1/d)/∇_j f(δx).   (10)

By solving the latter equation we get, for any τ(i) ≤ τ ≤ τ(i+1),

( x_j^τ + 1/(a_ij d) ) / ( x_j^{τ(i)} + 1/(a_ij d) ) ≥ exp( (2 ln(1+2d) / ∇_j f(δx^τ)) · a_ij y_i^τ ),   (11)

where we use the fact that ∇_j f(δx) is monotonically non-decreasing. Note that Inequality (11) is satisfied even when no decrease is performed on the dual variables, and such a decrease only affects the right-hand side of the inequality. For convenience, let us denote τ(t+1) = τ (the actual value of τ(t+1) has not been revealed by the algorithm yet). Writing M_j = max_{i∈S_j} {a_ij} and multiplying over all indices in S_j, we get:

exp( (2 ln(1+2d)/μ_j^τ) Σ_{i∈S_j} a_ij y_i^τ ) ≤ exp( Σ_{i∈S_j} (2 ln(1+2d)/∇_j f(δx^{τ(i+1)})) · a_ij y_i^{τ(i+1)} )   (12)
≤ Π_{i∈S_j} ( x_j^{τ(i+1)} + 1/(a_ij d) ) / ( x_j^{τ(i)} + 1/(a_ij d) ) ≤ Π_{i∈S_j} ( x_j^{τ(i+1)} + 1/(M_j d) ) / ( x_j^{τ(i)} + 1/(M_j d) )   (13)
≤ Π_{i∈T_j} ( x_j^{τ(i+1)} + 1/(M_j d) ) / ( x_j^{τ(i)} + 1/(M_j d) ) = ( x_j^τ + 1/(M_j d) ) / ( 1/(M_j d) ).   (14)

Inequality (12) follows as μ_j^τ = ∇_j f(δx^τ) and the value of ∇_j f(δx) monotonically non-decreases in time. Inequality (13) follows by substituting (11) into (12). Inequality (14) follows as the value of x_j monotonically non-decreases in time. Finally, the last equality is obtained using a telescoping product and the fact that x_j increases only in rounds with a_tj > 0.

Theorem 2.4.
The competitive ratio of the algorithm is:

(1 / (8 ln(1+2d))) · min_z min_{ℓ=1}^n { ∇_ℓ f(δz) / ∇_ℓ f(z) } − max_z ( ((δz)⊺ ∇f(δz) − f(δz)) / f(z) ),   (15)

where δ > 0 is the parameter chosen in the algorithm.

Proof. Consider the update when primal constraint t arrives and τ is the current time. Let U(τ) denote the set of tight dual constraints at time τ; that is, for every j ∈ U(τ) we have a_tj > 0 and Σ_{i=1}^t a_ij y_i^τ = μ_j^τ. So |U(τ)| ≤ d, the row-sparsity of A. Moreover, let us define for every j ∈ U(τ), S_j = { i | a_ij > 0, y_i^τ > 0 }. Clearly, Σ_{i∈S_j} a_ij y_i^τ = Σ_{i=1}^t a_ij y_i^τ = μ_j^τ; hence by Claim 2.3 and the fact that Σ_j a_tj x_j^τ < 1, we get for every j ∈ U(τ):

1/a_tj > x_j^τ ≥ (1 / (max_{i∈S_j} {a_ij} · d)) · ( exp(2 ln(1+2d)) − 1 ),

and after simplifying we get a_tj / a_{m⋆_j, j} = a_tj / max_{i∈S_j} {a_ij} ≤ 1/(2d). As a result, we can bound the rate of change of the dual expression Σ_{i=1}^t y_i at any time τ:

d( Σ_{i=1}^t y_i )/dτ ≥ r^τ − Σ_{j∈U(τ)} (a_tj / a_{m⋆_j, j}) · r^τ ≥ r^τ − Σ_{j∈U(τ)} r^τ/(2d) ≥ r^τ/2,   (16)

where the last inequality follows as |U(τ)| ≤ d.

On the other hand, when processing constraint t, the rate of increase of the primal objective f is:

df(x^τ)/dτ = Σ_j ∇_j f(x^τ) · ∂x_j^τ/∂τ = Σ_{j : a_tj>0} ∇_j f(x^τ) · (a_tj x_j^τ + 1/d)/∇_j f(x^τ) = Σ_{j : a_tj>0} ( a_tj x_j^τ + 1/d ) ≤ 2.   (17)

The final inequality uses the fact that the covering constraint is unsatisfied, and that d is at least the number of non-zeroes in the vector a_t. From (16) and (17) we can now bound the primal-dual ratio:

d( Σ_{i=1}^t y_i^τ ) / df(x^τ) ≥ r^τ/4 = (1 / (8 ln(1+2d))) · min_{ℓ=1}^n { ∇_ℓ f(δx^τ) / ∇_ℓ f(x^τ) }.   (18)

Thus, if x and y are the final primal and dual solutions, we get

Σ_{i=1}^m y_i ≥ (1 / (8 ln(1+2d))) · min_{x′} min_{ℓ=1}^n { ∇_ℓ f(δx′) / ∇_ℓ f(x′) } · f(x).   (19)

To complete the proof of Theorem 2.4, we use the following standard claim.

Claim 2.5.
For any a ∈ R^n, we have f⋆(∇f(a)) = a⊺ ∇f(a) − f(a).

Proof. By definition, f⋆(∇f(a)) = sup_x { x⊺ ∇f(a) − f(x) }. Note that x⊺ ∇f(a) − f(x) is concave as a function of x, so a necessary and sufficient condition for optimality is ∇_i f(x) = ∇_i f(a) for all i ∈ [n]. Thus, setting x = a, we have f⋆(∇f(a)) = a⊺ ∇f(a) − f(a).

Finally, we obtain the competitive ratio by a simple application of Claim 2.5 and Inequality (19) to the definition of the dual. Indeed,

Dual = Σ_{i=1}^m y_i − f⋆(μ) ≥ [ (1/(8 ln(1+2d))) · min_{x′} min_{ℓ=1}^n { ∇_ℓ f(δx′)/∇_ℓ f(x′) } − f⋆(∇f(δx))/f(x) ] · f(x)

by Inequality (19); using Claim 2.5 (with a = δx), this equals

[ (1/(8 ln(1+2d))) · min_{x′} min_{ℓ=1}^n { ∇_ℓ f(δx′)/∇_ℓ f(x′) } − ((δx)⊺ ∇f(δx) − f(δx))/f(x) ] · f(x)
≥ [ (1/(8 ln(1+2d))) · min_z min_{ℓ=1}^n { ∇_ℓ f(δz)/∇_ℓ f(z) } − max_z ( ((δz)⊺ ∇f(δz) − f(δz))/f(z) ) ] · Primal,

which proves the theorem.

How should we choose the value of δ? If we set c = 1/δ and optimize over c, the competitive ratio is:

Dual/Primal ≥ max_{c>0} [ (1/(8 ln(1+2d))) · min_z min_{ℓ=1}^n { ∇_ℓ f(z)/∇_ℓ f(cz) } − max_z ( (z⊺ ∇f(z) − f(z)) / f(cz) ) ].   (20)

This expression looks quite formidable; however, it simply captures how sharply the function f changes locally. For special cases it yields very simple expressions; e.g., for linear cost functions f(x) = c⊺x it gives Dual ≥ Primal/O(log d). See the applications below for more examples.

In the general framework above, we maintained both the primal and dual solutions simultaneously. If our goal is to solve (1) online, i.e., to minimize the convex function f(x) subject to covering constraints arriving online, then the dual values can be determined in hindsight once the final value of the primal variables x has been computed. In particular, we set μ = ∇f(δx) once and for all, and increase y at a constant rate r = (1/(2 ln(1+2d))) · min_{ℓ=1}^n { ∇_ℓ f(δx) / ∇_ℓ f(x) }.
These modifications can be easily plugged into the analysis above, allowing us to omit the separate minimization over x′ in the competitive ratio. (Observe that the update for the primal variables remains the same.)
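Claim 2.5 above can also be checked numerically for a concrete smooth convex function; the choice f(x) = Σ_i x_i², whose Fenchel dual is f⋆(μ) = Σ_i μ_i²/4, is a hypothetical example used only for illustration.

```python
import numpy as np

# Numeric check of Claim 2.5: f*(grad f(a)) = a . grad f(a) - f(a),
# for the hypothetical f(x) = sum(x_i^2), with grad f(a) = 2a and
# f*(mu) = sum(mu_i^2)/4; both sides then equal sum(a_i^2).
f = lambda x: np.sum(x**2)
grad_f = lambda x: 2.0 * x
f_star = lambda mu: np.sum(mu**2) / 4.0

a = np.array([0.3, 1.7, 2.0])
lhs = f_star(grad_f(a))
rhs = a @ grad_f(a) - f(a)
assert abs(lhs - rhs) < 1e-9
```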
Corollary 2.6. For online minimization, the competitive ratio of the algorithm is:

max_{c>0} min_z [ (1/(4 ln(1+2d))) · min_{ℓ=1}^n { ∇_ℓ f(z)/∇_ℓ f(cz) } − (z⊺ ∇f(z) − f(z))/f(cz) ].   (21)

If our goal is to solve (2) and maximize a dual objective function subject to packing constraints, then the above framework indeed keeps the dual variables μ non-decreasing; however, the dual variables y can both increase and decrease. (Moreover, this potential decrease is essential for the competitive ratio to be independent of the magnitude of the entries in the matrix A [GN14].) In settings where decreasing dual variables is not allowed, we slightly modify (and simplify) the online dual update by setting

r = min_{ℓ=1}^n { ∇_ℓ f(δx) / ∇_ℓ f(x) } / ln(1+dρ),

where ρ is an upper bound on max_t {a_tj} / min_{t : a_tj>0} {a_tj} for all 1 ≤ j ≤ n, and we skip the last step, which decreases duals. Here, an application of Claim 2.3 at any round t and time τ(t) ≤ τ ≤ τ(t+1) yields

1/a_tj ≥ x_j^τ ≥ (1 / (max_{i=1}^t {a_ij} · d)) · [ exp( (ln(1+dρ)/μ_j^τ) · Σ_{i=1}^t a_ij y_i ) − 1 ],   (22)

which implies ln( 1 + d · max_{i=1}^t {a_ij} / a_tj ) / ln(1+dρ) ≥ ( Σ_{i=1}^t a_ij y_i ) / μ_j^τ, and thus (as max_{i≤t} {a_ij}/a_tj ≤ ρ) guarantees Σ_{i=1}^t a_ij y_i ≤ μ_j^τ.

Corollary 2.7.
For online maximization, when decreasing dual variables is not allowed, the adjusted algorithm obtains the following competitive ratio:

max_{c>0} [ (1/(2 ln(1+dρ))) · min_z min_{ℓ=1}^n { ∇_ℓ f(z)/∇_ℓ f(cz) } − max_z ( (z⊺ ∇f(z) − f(z))/f(cz) ) ].   (23)

This results in a worse competitive ratio, but having monotone duals is useful for two reasons: (a) in some settings we need monotone duals, as in the profit maximization application in Section 4; and (b) we get a simpler algorithm, since we skip the third step of the online dual update (involving the dual decrease).

3 Applications

We show how the general framework above can be used to give algorithms for several previously-studied as well as new problems. In contrast to previous papers, where a primal-dual algorithm had to be tailored to each of these problems, we use the framework above to solve the underlying convex program, and then apply a suitable rounding algorithm to the fractional solution.

3.1 ℓp-norm of Packing Constraints

We consider the problem of solving a mixed packing-covering linear program online, as defined by Azar et al. [ABFP13]. The covering constraints Ax ≥ 1 arrive online. In addition, there are K “packing constraints” Σ_{j=1}^n b_kj · x_j ≤ λ_k for k ∈ [K] that are given up-front. The right-hand sides λ_k of these packing constraints are themselves variables, and the objective is to minimize Σ_{k=1}^K λ_k^p or, alternatively, ‖λ‖_p = (Σ_{k=1}^K λ_k^p)^{1/p}. All the entries in the constraint matrices A = (a_ij) and B = (b_kj) are non-negative.

Theorem 3.1.
There is an O(p log d)-competitive online algorithm for fractional covering with the objective of minimizing the ℓp-norm of multiple packing constraints.

Proof. In order to apply our framework to this problem, we seek to minimize the convex function

f(x) = (1/p) ‖Bx‖_p^p = (1/p) Σ_{k=1}^K (B_k x)^p = (1/p) Σ_{k=1}^K ( Σ_{j=1}^n b_kj · x_j )^p.

This is (up to the factor 1/p) the p-th power of the original objective; above, B_k = (b_k1, …, b_kn) is the k-th packing constraint. To obtain the competitive ratio, observe that ∇_j f(x) = Σ_{k=1}^K b_kj · (B_k x)^{p−1}. Thus we have, for all c > 0, z ∈ R^n_+ and 1 ≤ j ≤ n:

f(z)/f(cz) = (1/c)^p,
∇_j f(z)/∇_j f(cz) = (1/c)^{p−1},
( Σ_{j=1}^n z_j · ∇_j f(z) ) / f(cz) = ( Σ_{j=1}^n z_j Σ_{k=1}^K b_kj (B_k z)^{p−1} ) / f(cz) = p · f(z)/f(cz) = p (1/c)^p.

Substituting δ = 1/c and plugging into (20), we get:

Dual ≥ ( δ^{p−1}/(8 ln(1+2d)) − p δ^p + δ^p ) · Primal.   (24)

So the primal-dual ratio (as a function of δ) is Dual/Primal ≥ δ^{p−1}/L − (p−1) δ^p, where L = 8 ln(1+2d). This quantity is maximized when δ = 1/(pL), leading to a primal-dual ratio of (1/(pL))^p. Taking the p-th root, the ℓp-norm of the primal solution is at most pL = O(p log d) times the optimum.

When p = Θ(log K), the ℓp and ℓ∞ norms are within constant factors of each other, and we obtain the online mixed packing-covering LP (OMPC) problem studied by Azar et al. [ABFP13]. For this setting our result gives an improved O(log d · log K)-competitive ratio, where d is the row-sparsity of the matrix A and K is the number of packing constraints. This competitive ratio is known to be tight [ABFP13, Theorem 1.2].
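The homogeneity identities used in the proof above can be verified numerically; the matrix B, the point z, and the values of p and c below are a hypothetical instance chosen only for illustration.

```python
import numpy as np

# Numeric check of the smoothness quantities from Theorem 3.1 for
# f(x) = (1/p) * sum_k (B_k x)^p on a small arbitrary instance.
p, c = 3.0, 2.0
B = np.array([[1.0, 2.0], [0.5, 1.0], [3.0, 0.5]])

f = lambda x: np.sum((B @ x) ** p) / p
grad_f = lambda x: B.T @ ((B @ x) ** (p - 1))

z = np.array([0.4, 1.1])
assert np.isclose(f(z) / f(c * z), c ** (-p))                 # f(z)/f(cz) = (1/c)^p
assert np.allclose(grad_f(z) / grad_f(c * z), c ** (-(p - 1)))  # gradient ratio (1/c)^(p-1)
assert np.isclose(z @ grad_f(z), p * f(z))                    # Euler's identity: z.grad f(z) = p f(z)
```

The last assertion is the step z⊺∇f(z) = p·f(z), which makes the second term of (20) equal to (p−1)·f(z)/f(cz).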
Remark 3.2. The above result also holds if the function f is a sum of distinct powers of linear functions, i.e., f(x) = Σ_{k=1}^K (B_k x)^{p_k} where the exponents p_1, …, p_K ≥ 1 may be non-uniform. In this case, we obtain an O(p log d)-competitive algorithm where p = max_{k=1}^K p_k.

3.2 Set Cover with Multiple Costs

Consider the online set-cover problem [AAA+09] with n sets {S_j}_{j=1}^n over some ground set U. Apart from the set system, we are also given K cost functions B_k : [n] → R_+ for k ∈ [K]. Elements from U arrive online and must be covered by some set upon arrival; the decision to select a set into the solution is irrevocable. The goal is to maintain a set cover that minimizes the ℓp-norm of the K cost functions. We use Theorem 3.1 along with a rounding scheme (similar to [GKP12]) to obtain:

Theorem 3.3.
There is an $O\big(\frac{p^3}{\log p}\cdot\log d\cdot\log r\big)$-competitive randomized online algorithm for set cover minimizing the $\ell_p$-norm of multiple cost functions. Here $d$ is the maximum number of sets containing any element, and $r = |U|$ is the number of elements.

Proof. We use the following convex relaxation, with a variable $x_j$ for each set $j\in[n]$ denoting whether this set is chosen:
$$\min\ g(x) = \sum_{k=1}^K\Big(\sum_{j=1}^n b_{kj}\cdot x_j\Big)^p + \sum_{j=1}^n\Big(\sum_{k=1}^K b_{kj}^p\Big)\cdot x_j \quad\text{s.t.}\quad \sum_{j\,:\,e\in S_j} x_j \ \ge\ 1 \quad \forall e\in U, \qquad x\ge 0.$$
We can use our framework to solve this fractional convex covering problem online. Although the objective has a linear term in addition to the $p$-th powers, we obtain an $O(p\log d)^p$-competitive algorithm, as noted in Remark 3.2.

Let $C^\star$ denote the $p$-th power of the optimal objective of the given set-cover instance. Then it is clear that the optimal objective of the above fractional relaxation is at most $2C^\star$. Thus the objective of our fractional online solution satisfies $g(x) = O(p\log d)^p\cdot C^\star$.

To get an integer solution, we use a simple online randomized rounding algorithm. For each set $j\in[n]$, define $X_j$ to be a $\{0,1\}$-random variable with $\Pr[X_j = 1] = \min\{4p\log r\cdot x_j,\ 1\}$. This can easily be implemented online. It is easy to see by a Chernoff bound that each element $e$ is left uncovered with probability at most $r^{-4p}$. If an element $e$ is not covered by this rounding, we choose the set minimizing $\min_{j=1}^n\{\sum_{k=1}^K b_{kj}^p : e\in S_j\}$; let $s_e\in[n]$ index this set and $C_e = \sum_{k=1}^K b_{k s_e}^p$. Observe that $C_e \le C^\star$ for all $e\in U$.

To bound the $\ell_p$-norm of the cost, let $C_k = \sum_{j=1}^n b_{kj}\cdot X_j$ be the cost of the randomly rounded solution under the $k$-th cost function, and let $C := \sum_{k=1}^K C_k^p$. Also, for each element $e\in U$, define:
• $D_{ek} = b_{k s_e}$ for all $k\in[K]$ and $D_e = C_e$, if $e$ is not covered by the rounding;
• $D_{ek} = 0$ for all $k\in[K]$ and $D_e = 0$, otherwise.
Note that $D_e = \sum_{k=1}^K D_{ek}^p$.
The $p$-th power of the total cost is:
$$\widehat C \;=\; \sum_{k=1}^K\Big(C_k + \sum_{e\in U} D_{ek}\Big)^p \;\le\; 2^p\sum_{k=1}^K C_k^p + 2^p\sum_{k=1}^K\Big(\sum_{e\in U} D_{ek}\Big)^p \;\le\; 2^p\cdot C + 2^p\sum_{k=1}^K r^{p-1}\sum_{e\in U} D_{ek}^p \;\le\; 2^p\cdot C + (2r)^p\sum_{e\in U} D_e. \qquad (25)$$
We now bound $\mathbb{E}[\widehat C]$ using (25). Observe that $\mathbb{E}[C_k] \le 4p\log r\cdot\sum_{j=1}^n b_{kj}\cdot x_j$. Since each $C_k$ is the sum of independent non-negative random variables, we can bound $\mathbb{E}[C_k^p]$ using a concentration inequality involving $p$-th moments [Lat97]:
$$\mathbb{E}[C_k^p] \;\le\; K_p\cdot\Big(\mathbb{E}[C_k]^p + \sum_{j=1}^n\mathbb{E}[b_{kj}^p\cdot X_j^p]\Big) \;\le\; K_p\cdot\Big((4p\log r)^p\Big(\sum_{j=1}^n b_{kj}\cdot x_j\Big)^p + 4p\log r\sum_{j=1}^n b_{kj}^p\cdot x_j\Big).$$
Above, $K_p = O(p/\log p)^p$. By linearity of expectation,
$$\mathbb{E}[C] = \sum_{k=1}^K\mathbb{E}[C_k^p] \;\le\; K_p\,(4p\log r)^p\sum_{k=1}^K\Big(\Big(\sum_{j=1}^n b_{kj}\cdot x_j\Big)^p + \sum_{j=1}^n b_{kj}^p\cdot x_j\Big) \;=\; K_p\,(4p\log r)^p\cdot g(x).$$
Thus we have $\mathbb{E}[C] = O\big(\frac{p^3}{\log p}\cdot\log d\cdot\log r\big)^p\cdot C^\star$. Observe that $\mathbb{E}\big[\sum_{e\in U} D_e\big] = \sum_{e\in U}\Pr[e\text{ uncovered}]\cdot C_e \le r^{-4p}\cdot\sum_{e\in U} C^\star \le r^{1-4p}\cdot C^\star$. Using these bounds in (25), we have $\mathbb{E}[\widehat C] \le 2^p\cdot\mathbb{E}[C] + (2r)^p\sum_{e\in U}\mathbb{E}[D_e] = O\big(\frac{p^3}{\log p}\cdot\log d\cdot\log r\big)^p\cdot C^\star$.

In the Capacity-constrained Facility Location (CCFL) problem, there are $m$ potential facility locations, each with an opening cost $c_i$ and a capacity $u_i$ that are given up-front. There are $n$ clients which arrive online. Each client $j\in[n]$ has, for each facility $i\in[m]$, an assignment cost $a_{ij}$ and a demand/load $p_{ij}$. The online algorithm needs to open facilities (paying the opening costs) and assign each arriving client $j$ to some open facility $i$ (paying the assignment cost $a_{ij}$, and incurring a load $p_{ij}$ on $i$). The makespan of an assignment is the maximum load on any facility. The objective in CCFL is to minimize the sum of opening costs, assignment costs, and the makespan.
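To make the three cost components concrete, here is a minimal Python sketch that evaluates the CCFL cost of a fixed assignment on a small hypothetical instance (all numbers are illustrative, not from the paper):

```python
# Hypothetical CCFL instance (illustrative numbers): 2 facilities, 3 clients.
c = [5.0, 3.0]                              # opening costs c_i
a = [[1.0, 2.0, 1.0], [2.0, 1.0, 3.0]]      # assignment costs a[i][j]
p = [[2.0, 1.0, 2.0], [1.0, 2.0, 1.0]]      # loads p[i][j]

def ccfl_cost(assign):
    """assign[j] = facility serving client j; returns opening + assignment + makespan."""
    opening = sum(c[i] for i in set(assign))        # each opened facility paid once
    assignment = sum(a[i][j] for j, i in enumerate(assign))
    load = [0.0] * len(c)
    for j, i in enumerate(assign):
        load[i] += p[i][j]
    return opening + assignment + max(load)         # makespan = max facility load

print(ccfl_cost([0, 1, 0]))  # 15.0: opening 8, assignment 3, makespan 4
```

The sketch only illustrates the objective being minimized; the online algorithm below additionally respects irrevocability of opening and assignment decisions.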
An integer programming formulation for this problem is the following:
$$\min\ \sum_{i=1}^m c_i x_i + \sum_{i,j} a_{ij}\, y_{ij} + \max_{i=1}^m\sum_{j=1}^n p_{ij}\cdot y_{ij} \quad\text{s.t.}\quad \sum_{i\in S} x_i + \sum_{i\notin S} y_{ij} \ \ge\ 1 \quad \forall j\in[n],\ \forall S\subseteq[m], \qquad y, x\in\{0,1\}.$$
In order to apply our framework to CCFL, we allow the variables to be fractional, and use the following objective function with $p = \Theta(\log m)$:
$$f(x,y) \;=\; \Big(\sum_{i=1}^m c_i x_i\Big)^p + \Big(\sum_{i,j} a_{ij}\, y_{ij}\Big)^p + \sum_{i=1}^m\Big(\sum_{j=1}^n p_{ij}\cdot y_{ij}\Big)^p.$$
Note that $f(x,y)^{1/p}$ is within a constant factor of the original objective. We refer to the above convex program as the fractional CCFL problem. Theorem 3.4.
There is an $O(\log m)$-competitive online algorithm for fractional CCFL.

Proof. To apply our framework to fractional CCFL, we need a few observations. First, although the function $f$ is not fully known in advance, at any time we know the parts of $f$ that correspond to variables appearing in the constraints revealed so far. It is easy to check that this suffices for our framework to apply.

Another issue is that there is an exponential number of covering constraints. This does not affect the $O(p\log d) = O(\log m)$ competitive ratio we obtain through Theorem 3.1, since it is independent of the number of covering constraints. However, the running time of the straightforward implementation is exponential. In order to obtain a polynomial running time, we relax the covering constraints to $1/2$ (instead of one). Upon arrival of client $j$, we add covering constraints based on the following procedure.

While there is some $S\subseteq[m]$ with $\sum_{i\in S} x_i + \sum_{i\notin S} y_{ij} < 1/2$, do: Add constraint $\sum_{i\in S} x_i + \sum_{i\notin S} y_{ij} \ge$
1, and update the solution $(x,y)$ according to the algorithm of Theorem 3.1.

Note that, given a current solution $(x,y)$, the set $S$ that minimizes $\sum_{i\in S} x_i + \sum_{i\notin S} y_{ij}$ is $S = \{i\in[m] \mid x_i < y_{ij}\}$; so a violated constraint (if any) can be found by comparing this minimum to $1/2$. The number of iterations per client is $O(m)$, because $\sum_{i=1}^m(\min\{x_i,1\} + \min\{y_{ij},1\})$ increases by at least $1/2$ in each iteration, and this sum is always between $0$ and $2m$. Hence, at any time, $(2x,2y)$ is a feasible fractional solution, which satisfies all constraints.

The online fractional solution can be rounded in an online fashion to obtain a randomized $O(\log m\cdot\log mn)$-competitive algorithm. While this is worse by a $\log m$ factor than the result in [ABFP13], it follows directly from our general algorithm.

We use a "guess and double" approach in the rounding. Let $M$ denote some upper bound on the optimal offline value. Upon arrival of a new client, our algorithm will succeed if $M$ is a correct upper bound. If the algorithm fails, then $M$ is doubled and we repeat the updates. We start with $M$ being some known lower bound. A phase is a sequence of client arrivals for which $M$ remains the same. At any point in the algorithm, the only allowed facilities are $\{i\in[m] \mid c_i\le M\}$ and the only allowed assignments are $\{(i,j) \mid i\in[m],\ j\in[n],\ p_{ij}\le M\}$. We denote by $I_M$ the restricted instance, which consists only of the clients that arrive in this phase together with the above allowed facilities and assignments. When we progress from one phase to the next (i.e., $M$ is doubled), we reset all the $x,y$ variables to zero.

Define a modified objective as follows:
$$g(x,y) \;=\; \Big(\sum_i c_i\Big(x_i + \frac{\sum_j p_{ij}\cdot y_{ij}}{M}\Big)\Big)^p + \Big(\sum_{i,j} a_{ij}\, y_{ij}\Big)^p + \sum_i\Big(\sum_j p_{ij}\cdot y_{ij}\Big)^p.$$
Note that this depends on the guess $M$ and is fixed for a single phase. Below we focus on the restricted instance $I_M$; unless specified otherwise, clients $j$ and facilities $i$ are only from $I_M$. Consider the following convex program:
$$\min\ g(x,y) \qquad (26)$$
s.t.
$$\sum_{i\in S} x_i + \sum_{i\notin S} y_{ij} \ \ge\ 1 \quad \forall j\in[n],\ S\subseteq[m], \qquad y, x \ \ge\ 0.$$
When a new client $h$ arrives, the algorithm first updates the fractional solution to ensure the covering constraints of client $h$ up to a factor $2$, as in Theorem 3.4. Now we have to do the rounding. To do this, first define the following modified variables:
$$\bar y_{ij} = \min\{y_{ij},\, x_i\}\ \ \forall i,j, \qquad\text{and}\qquad \bar x_i = \max\Big\{x_i,\ \frac{\sum_j p_{ij}\cdot\bar y_{ij}}{M}\Big\}\ \ \forall i.$$
By construction, the variables $(\bar x,\bar y)$ clearly satisfy:
$$\sum_j p_{ij}\cdot\bar y_{ij} \ \le\ M\cdot\bar x_i \quad \forall i, \qquad (27)$$
$$\sum_{i=1}^m \bar y_{ij} \ \ge\ 1/2 \quad \forall j, \qquad (28)$$
$$\bar y_{ij} \ \le\ \bar x_i \quad \forall i,j. \qquad (29)$$
Claim 3.5.
Suppose there exists an integral solution to the current CCFL instance having cost at most $M$. Then the following inequalities hold, where $\alpha = O(\log m)$ is the competitive ratio in Theorem 3.4:
$$\sum_i c_i\cdot\bar x_i \ \le\ 4\alpha\cdot M, \qquad (30)$$
$$\sum_{i,j} a_{ij}\cdot\bar y_{ij} \ \le\ 4\alpha\cdot M, \qquad (31)$$
$$\sum_j p_{ij}\cdot\bar y_{ij} \ \le\ 4\alpha\cdot M \quad \forall i. \qquad (32)$$
Proof.
Since the optimal integral value of the current CCFL instance is at most $M$, the optimal CCFL value of the restricted instance $I_M$ is also at most $M$. That is, there is an integral assignment with opening cost $\le M$, assignment cost $\le M$, and maximum load $\le M$. So the optimal fractional value of program (26) is at most $(2M)^p + M^p + m\cdot M^p \le m(3M)^p$. Since the fractional algorithm in Theorem 3.4 is $\alpha$-competitive, we have $g(x,y) \le \alpha^p\cdot m(3M)^p \le (4\alpha M)^p$, since $m\le(4/3)^p$ for $p\ge\log_{4/3} m$. This implies:
$$\sum_i c_i\cdot\bar x_i \ \le\ \sum_i c_i\Big(x_i + \frac{\sum_j p_{ij}\cdot\bar y_{ij}}{M}\Big) \ \le\ g(x,y)^{1/p},$$
$$\sum_{i,j} a_{ij}\cdot\bar y_{ij} \ \le\ \sum_{i,j} a_{ij}\, y_{ij} \ \le\ g(x,y)^{1/p},$$
$$\sum_j p_{ij}\cdot\bar y_{ij} \ \le\ \sum_j p_{ij}\cdot y_{ij} \ \le\ g(x,y)^{1/p},$$
and $g(x,y)^{1/p}\le 4\alpha M$ proves all three claims.

Hence, after the fractional updates, we check whether the conditions (30)-(32) are satisfied; if not, we end the phase and double $M$ (knowing by Claim 3.5 that $M$ is a lower bound on the optimal value of the CCFL instance so far), and start the next phase with the new client $h$ and the new value of $M$. So assume that after fractionally assigning $h$, all the inequalities (27)-(32) hold for the current value $M$. Now we perform randomized rounding as follows.
• For each $i$, set $X_i$ to $1$ with probability $\min\{4\log(mn)\cdot\bar x_i,\ 1\}$. Let $F_f = \{i : \bar x_i \ge$
$1/(4\log(mn))\}$ denote the set of fixed facilities, for which $\Pr[X_i = 1] = 1$.
• For each $i,j$, define $Z_{ij}$ as follows:
$$\Pr[Z_{ij} = 1] \;=\; \begin{cases}\min\{4\log(mn)\cdot\bar y_{ij},\ 1\} & \text{if } i\in F_f,\\ \bar y_{ij}/\bar x_i & \text{otherwise.}\end{cases}$$
All the above random variables are independent. Each client $j$ is assigned to some facility $i$ with $X_i\cdot Z_{ij} = 1$; if there are multiple possible assignments, the algorithm breaks ties arbitrarily. (For the sake of analysis, we may imagine that the client is assigned to all facilities with $X_i\cdot Z_{ij} = 1$.) If client $j$ is unassigned, we open the facility corresponding to $\min_{i=1}^m(c_i + a_{ij} + p_{ij})$ and assign $j$ to it (note that this minimum value is at most $M$); we will show that this event happens with low probability, so the effect on the objective will be small. We now analyze this rounding.

Claim 3.6. For any client $j$, $\Pr[j\text{ not assigned}] = \Pr[\sum_i X_i\cdot Z_{ij} = 0] < 1/(mn)^2$.

Proof. If $i\in F_f$, then $\mathbb{E}[X_i Z_{ij}] = \mathbb{E}[Z_{ij}] = \min\{4\log(mn)\cdot\bar y_{ij},\ 1\}$. Else, $\mathbb{E}[X_i Z_{ij}] = 4\log(mn)\cdot\bar y_{ij}$. In either case,
$$\Pr[j\text{ not assigned}] = \Pr\Big[\sum_i X_i Z_{ij} = 0\Big] = \prod_i\big(1 - \mathbb{E}[X_i Z_{ij}]\big) \ \le\ \exp\Big(-4\log(mn)\sum_i\bar y_{ij}\Big) \ <\ 1/(mn)^2,$$
where the last inequality is by (28). Claim 3.7.
For any facility $i\in F_f$, we have $\Pr[\text{load} > 32\alpha\log(mn)\cdot M] \le 1/(mn)^2$.

Proof. For facility $i\in F_f$, the load assigned to it is $\sum_j p_{ij}\cdot Z_{ij}$. This is a sum of independent $[0,M]$-bounded random variables (by definition of the restricted instance $I_M$), with expectation at most $4\log(mn)\sum_j p_{ij}\cdot\bar y_{ij}$, which by (32) is at most $16\alpha\log(mn)\cdot M$. The claim now follows by a Chernoff bound. Claim 3.8.
For any facility $i\notin F_f$, we have $\Pr[\text{load} > 8\log(mn)\cdot M \mid X_i = 1] \le 1/(mn)^2$.

Proof. Fix $i\notin F_f$ and condition on $X_i = 1$. The load assigned to $i$ is $\sum_j p_{ij}\cdot(Z_{ij}\mid X_i = 1)$, which is a sum of independent $[0,M]$-bounded random variables (again by definition of the restricted instance). The expectation is at most $\sum_j p_{ij}\cdot\bar y_{ij}/\bar x_i \le M$, by (27). The claim again follows by a Chernoff bound. Claim 3.9.
$\Pr[\text{opening cost} > 32\alpha\log(mn)\cdot M] \le 1/(mn)^2$.

Proof. The opening cost is $\sum_i c_i\cdot X_i$, which is a sum of independent $[0,M]$-bounded random variables whose expectation is at most $4\log(mn)\sum_i c_i\cdot\bar x_i \le 16\alpha\log(mn)\cdot M$ by (30). The claim now follows by a Chernoff bound.

Claim 3.10. $\mathbb{E}[\text{assignment cost}] \le 16\alpha\log(mn)\cdot M$.

Proof. The assignment cost is $\sum_i\sum_j a_{ij}\cdot X_i Z_{ij}$, which has mean at most $4\log(mn)\sum_{i,j} a_{ij}\,\bar y_{ij} \le 16\alpha\log(mn)\cdot M$ by (31).

Combining the above claims, and using the fact that each client remains unassigned with probability less than $1/(mn)^2$, we get: Lemma 3.11.
The expected sum of opening and assignment costs and makespan is $O(\alpha\log mn)\cdot M$.

A standard doubling argument accounts for all the phases as follows. Let $M^\star$ denote the final value of the parameter $M$ achieved by the algorithm. By Claim 3.5 we have $OPT > M^\star/$
2. On the other hand, the expected cost in any phase corresponding to $M$ is at most $O(\alpha\log mn)\cdot M$ by Lemma 3.11; this gives a geometric sum with total cost at most $O(\alpha\log mn)\cdot(M^\star + M^\star/2 + \cdots) \le O(\alpha\log mn)\cdot OPT$. This proves the following theorem.
Theorem 3.12.
There is a randomized $O(\log m\cdot\log mn)$-competitive online algorithm for CCFL. Remark 3.13.
We can use randomized rounding with alteration, as in [GN14], to obtain a more nuanced $O(\log m\cdot\log m\ell)$-competitive ratio, where $\ell\le n$ is the "machine degree", i.e., $\max_{i\in[m]}|\{j : p_{ij} < \infty\}|$. We omit the details.

3.4 Capacitated Multicast Problem

We consider the online multicast problem [AAA+
06] in the presence of capacities, which we callthe
Capacitated Multicast (CMC) problem. In this problem, there are $m$ edge-disjoint rooted trees $T_1,\cdots,T_m$, corresponding to multicast trees in some network. Each tree $T_i$ has a capacity $u_i$, which is the maximum load that can be assigned to it. Each edge $e\in\cup_{i=1}^m T_i$ has an opening cost $c_e$. A sequence of $n$ clients arrives online, and each must be assigned to one of these trees. Each client $j$ has a tree-dependent load $p_{ij}$ for tree $T_i$, and is connected to vertex $\pi_{ij}$ in tree $T_i$. Thus, if client $j$ is assigned to tree $T_i$, then the load of $T_i$ increases by $p_{ij}$, and all edges on the path in $T_i$ from $\pi_{ij}$ to its root must be opened. The objective is to minimize the total cost of opening the edges, subject to the capacity constraints that the total load on each tree $T_i$ is at most $u_i$.

The capacitated multicast problem generalizes the CCFL problem. Indeed, let each machine $i\in[m]$ correspond to a two-level tree $T_i$ with capacity $u_i$, where tree $T_i$ has a single edge $r_i$ incident to the root and $n$ leaves corresponding to the clients. Edge $r_i$ has opening cost $c_i$, and the leaf edge corresponding to client $j$ has opening cost $a_{ij}$. The load of client $j$ in tree $T_i$ is $p_{ij}$. It is easy to check that a feasible solution to this CMC instance corresponds precisely to a CCFL solution with the same cost.

In this section, we generalize the solution from the previous section to give the following result: Theorem 3.14.
There is a randomized online algorithm that, given any instance of the capacitated multicast problem on $d$-level trees and a bound $C$ on its optimal cost, computes a solution of cost $O(\log m\cdot\log mn)\cdot C$ with congestion $O((d+\log m)\cdot\log mn)$.

The congestion of a solution is the maximum, over all trees, of the multiplicative factor by which the capacity is violated. The proof of this theorem will occupy the rest of this section. The main idea is similar: we solve a convex programming relaxation of this problem in an online fashion, and show how to round the solution online as well. However, this requires some ideas over and above those used in the previous section.

First, the convex relaxation. It will be convenient to augment each tree $T_i$ as follows. For each client $j$ with $p_{ij}\le u_i$ (i.e., that can be feasibly assigned to $T_i$), we introduce a new leaf vertex $v_{ij}$ connected to vertex $\pi_{ij}\in T_i$ via an edge of zero cost. These new leaf vertices $v_{ij}$ are assigned a vertex weight $p_{v_{ij}} := p_{ij}$, whereas all the original vertices of the trees are given zero weight. To minimize extra notation, we refer to these augmented trees also as $T_i$. Finally, we merge the roots of the trees $T_i$ into a single root vertex $r$ to get a new tree $T = (V,E)$. For client $j$, let $V_j = \{v_{ij} \mid i\in[m] \text{ s.t. } p_{ij}\le u_i\}$ denote the leaves in $T$ corresponding to client $j$.

For any edge $e\in E$, denote the subtree of $T$ below edge $e$ by $T_e$. Observe that if $e$ was in $T_i$, then $T_e$ is a subtree of the $i$-th tree $T_i$; in this case, we use the notation $j\in T_e$ to denote that $v_{ij}\in T_e$. For each vertex $v\in V\setminus\{r\}$, its parent in $T$ is denoted $\tau(v)$.

Our fractional relaxation has a variable $x_e$ for each edge $e\in E$. For brevity, we use $y_{ij} := x_{(v_{ij},\tau(v_{ij}))}$ to denote the variable for the edge connecting the leaf corresponding to client $j$ in tree $T_i$ to its parent.
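The augmentation just described is purely mechanical; the following Python sketch builds the merged tree $T$ with the zero-cost leaf edges $(\pi_{ij}, v_{ij})$ on a hypothetical two-tree instance (all names and numbers are illustrative):

```python
# Hypothetical instance: m=2 multicast trees, 1 client; names are illustrative.
# Each tree maps child vertex -> (parent vertex, edge cost). 'r' is the merged root.
trees = {
    0: {"a0": ("r", 4.0), "b0": ("a0", 1.0)},
    1: {"a1": ("r", 2.0)},
}
u = {0: 10.0, 1: 1.0}                   # tree capacities u_i
p = {(0, 0): 3.0, (1, 0): 5.0}          # load p[i, j] of client j on tree i
attach = {(0, 0): "b0", (1, 0): "a1"}   # attachment vertex pi_{ij}

def augment(trees, u, p, attach):
    """Add a zero-cost leaf edge v_ij under pi_ij whenever p_ij <= u_i."""
    T = {v: pa for tr in trees.values() for v, pa in tr.items()}
    leaves = {}
    for (i, j), pij in p.items():
        if pij <= u[i]:                 # only feasible (client, tree) pairs get a leaf
            v = f"v_{i}_{j}"
            T[v] = (attach[(i, j)], 0.0)
            leaves[(i, j)] = v
    return T, leaves

T, leaves = augment(trees, u, p, attach)
print(sorted(leaves))  # [(0, 0)]: client 0 fits in tree 0 but not in tree 1
```

The covering constraints below then simply ask for a fractional unit cut-free connection from the leaf set $V_j$ to the merged root.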
The $x_e$ variables naturally denote the "opening" of edges, and the $y_{ij}$ variables denote the assignment of clients to trees. The objective is the following convex function:
$$g(x) \;=\; \Bigg(\sum_{i=1}^m\sum_{e\in T_i} c_e\Big(x_e + \frac{2}{u_i}\sum_{v\in T_e} p_v\cdot x_{v,\tau(v)}\Big)\Bigg)^p + \sum_{i=1}^m\Big(\frac{2C}{u_i}\cdot\sum_{v\in T_i} p_v\cdot x_{v,\tau(v)}\Big)^p. \qquad (33)$$
In the above expression, we choose $p = \Theta(\log m)$. Using the facts that the weights $p_v$ are non-zero only for the new leaf vertices, and that the leaf-edge variables are denoted $y_{ij}$, we can write the above expression equivalently as follows:
$$g(x) = g(x,y) \;=\; \Bigg(\sum_{i=1}^m\sum_{e\in T_i} c_e\Big(x_e + \frac{2}{u_i}\sum_{j\in T_e} p_{ij}\cdot y_{ij}\Big)\Bigg)^p + \sum_{i=1}^m\Big(\frac{2C}{u_i}\cdot\sum_{j\in T_i} p_{ij}\cdot y_{ij}\Big)^p. \qquad (34)$$
We will solve the following convex covering program:
$$\min\ g(x) \quad\text{s.t.}\quad \sum_{e\in\delta(S)} x_e \ \ge\ 1 \quad \forall\, V_j\subseteq S\subseteq V\setminus\{r\},\ \forall j\in[n], \qquad x\ge 0.$$
The constraints say that the min-cut between the root and the nodes in the set $V_j$, which contains all the nodes corresponding to client $j$ in the various trees, is at least $1$; i.e., $j$ is (fractionally) connected to unit extent. Much as in Section 3.3, we deal with the exponential number of covering constraints as follows: we relax the covering constraints to $1/2$ (instead of one). Upon arrival of client $j$, we add covering constraints based on the following procedure.

While there is some $V_j\subseteq S\subseteq V\setminus\{r\}$ with $\sum_{e\in\delta(S)} x_e < 1/2$, do: Add constraint $\sum_{e\in\delta(S)} x_e \ge$
1, and update $(x,y)$ according to Theorem 3.1.

Note that, given a current solution $x$, one can find such a violated constraint (if there is one) by a minimum-cut subroutine, which takes polynomial time. The number of iterations of the above procedure is at most $2|E|$, because $\sum_{e\in E}\min\{x_e, 1\}$ increases by at least $1/2$ in each iteration, but it starts at $0$ and stays at most $|E|$. Moreover, twice the solution is always feasible, which implies an $O(\log m)$-competitive online algorithm for the fractional problem.

For the online rounding, define some modified variables. For each client $j\in[n]$, compute a unit flow $F_j$ from the set $V_j$ to the root $r$ in the tree $T$ with edge capacities $2x$; note that the fractional solution guarantees that this flow exists. Let $f_{je}$ be the amount of flow on edge $e$ in $F_j$, and define $f_e := \max_j f_{je}$. Note that the $f_e$ values are monotone non-decreasing as we go up the tree $T$. Now set:
$$\bar x_e \;=\; \max\Big\{f_e,\ \frac{1}{u_i}\sum_{j\in T_e} p_{ij}\cdot y_{ij}\Big\}, \qquad \forall e\in T_i,\ \forall i\in[m]. \qquad (35)$$
Also define $\bar y_{ij} = \bar x_{(v_{ij},\tau(v_{ij}))}$ for any client $j$ and tree $T_i$, to capture the assignment of clients to trees. Claim 3.15.
The variables $\bar x_e$ are monotone non-decreasing up the tree $T$.

Proof. The flow values $f_e$ are monotone non-decreasing up the tree. Also, for any tree $T_i$ and any edge $e\in T_i$, the quantity $f'_e := \frac{1}{u_i}\sum_{j\in T_e} p_{ij}\cdot y_{ij}$ is also monotone non-decreasing up the tree, since it is the sum of non-negative quantities over larger subtrees. Since tree $T$ is obtained by merging the trees $T_i$ at the root, the monotonicity of $\bar x_e = \max\{f_e, f'_e\}$ is maintained.

Note that $(\bar x,\bar y)$ clearly satisfies:
$$\bar y_{ij} \ \le\ \bar x_e \qquad \forall j\in T_e,\ \forall e\in T_i,\ \forall i\in[m]. \qquad (36)$$
$$\sum_{j\in T_e} p_{ij}\cdot\bar y_{ij} \;=\; \sum_{v\in T_e} p_v\cdot\bar x_{v,\tau(v)} \ \le\ u_i\cdot\bar x_e \qquad \forall e\in T_i,\ \forall i\in[m]. \qquad (37)$$
$$\bar x_e \ \le\ \bar x_{\tau(e)} \qquad \forall e\in T. \qquad (38)$$
$$\sum_{i=1}^m \bar y_{ij} \ \ge\ 1 \qquad \forall j\in[n]. \qquad (39)$$
Claim 3.16.
Assuming there exists an integral solution to the CMC instance having cost at most $C$, the following inequalities hold with $\alpha = O(\log m)$:
$$\sum_{e\in T} c_e\cdot\bar x_e \ \le\ 8\alpha\cdot C, \qquad (40)$$
$$\sum_{j\in T_i} p_{ij}\cdot\bar y_{ij} \;=\; \sum_{v\in T_i} p_v\cdot\bar x_{v,\tau(v)} \ \le\ 4\alpha\cdot u_i \qquad \forall i\in[m]. \qquad (41)$$
Proof.
The optimal integral solution of the current CMC instance has cost at most $C$; hence the optimal fractional value of our convex covering program is at most $(3C)^p + m\cdot(2C)^p \le (m+1)(3C)^p$, and our $\alpha$-competitive fractional algorithm ensures that $g(x,y) \le \alpha^p\cdot(m+1)(3C)^p \le (4\alpha C)^p$ for $p\ge\log_{4/3}(m+1) = \Theta(\log m)$. This, in turn, implies that
$$\sum_{e\in T} c_e\cdot\bar x_e \ \le\ \sum_{i=1}^m\sum_{e\in T_i} c_e\Big(2x_e + \frac{2}{u_i}\sum_{j\in T_e} p_{ij}\cdot y_{ij}\Big) \ \le\ 2\,g(x,y)^{1/p}, \qquad (42)$$
$$\frac{C}{u_i}\sum_{j\in T_i} p_{ij}\cdot\bar y_{ij} \ \le\ \frac{2C}{u_i}\sum_{j\in T_i} p_{ij}\cdot y_{ij} \ \le\ g(x,y)^{1/p}, \qquad (43)$$
which together with $g(x,y)^{1/p}\le 4\alpha C$ proves the claim.

Having defined these convenient modified variables, the rounding proceeds as follows. For each tree $T_i$, the edges $F_i = \{e\in T_i \mid \bar x_e \ge 1/2\}$ form a rooted subtree, by the monotonicity of the $\bar x$ values. We include the edges of $F_i$ in the solution deterministically. For the rest of the edges, we perform the following experiment $\beta := \Theta(\log mn)$ times independently, and take the union of the edges picked. For each tree $T_i$, independently:

(i) For each edge $e\in T_i\setminus F_i$, pick it independently with probability $\bar x_e/\bar x_{\tau(e)}$, where we use $\tau(e)$ to denote the parent edge of $e$. An edge $e$ whose parent edge does not lie in $T_i\setminus F_i$ is chosen with probability $\bar x_e$.

(ii) If the load for tree $T_i$ exceeds $8(d+4\alpha)\cdot u_i$, declare failure for all clients assigned to $T_i$.

The rounding in step (i) is from Garg et al. [GKR00], and hence is often called GKR rounding; it can be implemented online using ideas from [AAA+06]. For a client $j$, we may declare failure in all $\beta$ experiments; in that case, we choose the path in the tree $T_i$ for client $j$ which is cheapest, subject to $p_{ij}\le u_i$.

A client $j$ is assigned in tree $T_i$ if all edges on the path from $v_{ij}$ to $r$ are picked in $T_i$ during Step 1, and if we do not declare failure in Step 2; a client is assigned if it is assigned in at least one tree. We first show that there is a good probability of any client being assigned in one run of the random experiment above. Claim 3.17.
For any client $j$, $\Pr[j\text{ assigned in one run}] \ge 1/2$.

Proof. It is easy to check that for any tree $T_i$, $\Pr[j\text{ assigned to } T_i\text{ in Step 1}] = \min\{\bar y_{ij}, 1\}$. Since the random choices in different trees $T_i$ are independent,
$$\Pr[j\text{ not assigned to any tree in Step 1}] \;=\; \prod_{i=1}^m\big(1 - \min\{\bar y_{ij}, 1\}\big) \ \le\ e^{-\sum_{i=1}^m\bar y_{ij}} \ \overset{(39)}{\le}\ \frac{1}{e}.$$
Next, we claim that conditioned on $j$ being assigned in tree $T_i$ in Step 1 (i.e., on all edges on the path $P_{ij}$ from the root of $T_i$ to $v_{ij}$ being chosen in the solution), the conditional probability that it is rejected in Step 2 is at most $1/$
8, i.e.,
$$\Pr[j\text{ rejected in Step 2} \mid j\text{ assigned to } T_i\text{ in Step 1}] \ \le\ \frac{1}{8}. \qquad (44)$$
This would imply that $j$ is assigned in at least one tree with probability at least $1-1/e$, and survives rejection in that tree with probability at least $7/$
8, giving $(1-\frac{1}{e})\cdot\frac{7}{8} \ge \frac{1}{2}$.

To prove (44), let $e_1,\cdots,e_k$ be the edges of $T_i\setminus F_i$ on path $P_{ij}$, at increasing distance from the root; hence $e_k = (\tau(v_{ij}), v_{ij})$. For $h = 1,\cdots,k$, define the subtree $S_h := T_{e_h}\setminus T_{e_{h+1}}$, which consists of all nodes whose path to the root first intersects $P_{ij}$ at the edge $e_h$. By the properties of the GKR rounding, we have in Step 1:
$$\mathbb{E}[\text{load from } S_h \mid j\text{ assigned to } T_i\text{ in Step 1}] \;=\; \sum_{\ell\in S_h} p_{i\ell}\cdot\frac{\bar y_{i\ell}}{\bar x_{e_h}} \ \le\ \frac{1}{\bar x_{e_h}}\sum_{\ell\in T_{e_h}} p_{i\ell}\cdot\bar y_{i\ell} \ \overset{(37)}{\le}\ u_i.$$
Summing the expression above over all $h = 1,\cdots,k$,
$$\mathbb{E}[\text{load from } \cup_{h=1}^k S_h \mid j\text{ assigned to } T_i\text{ in Step 1}] \ \le\ k\cdot u_i \ \le\ d\cdot u_i,$$
since the tree has depth at most $d$. This bounds the expected load of those clients whose paths to the root share an edge with $P_{ij}$ in tree $T_i$. For any other client $\ell$, the conditioning does not matter, and hence $\Pr[\ell\text{ assigned to } T_i \mid j\text{ assigned to } T_i\text{ in Step 1}] = \Pr[\ell\text{ assigned to } T_i] = \bar y_{i\ell}$. So using (41),
$$\mathbb{E}[\text{load from } [n]\setminus\{j\}\setminus\cup_{h=1}^k S_h \mid j\text{ assigned to } T_i\text{ in Step 1}] \ \le\ \sum_{\ell\in T_i} p_{i\ell}\cdot\bar y_{i\ell} \ \le\ 4\alpha\cdot u_i.$$
Thus the total expected load from $[n]\setminus\{j\}$, conditioned on $j$ being assigned to $T_i$ in Step 1, is at most $(d+4\alpha)\cdot u_i$. Markov's inequality now implies (44), and hence the claim.

By Step 2 of the algorithm, we immediately have: Claim 3.18.
The load assigned to tree $T_i$ is at most $8\beta(d+4\alpha)\cdot u_i$, for each $i\in[m]$. Claim 3.19.
The expected opening cost is at most $8\alpha\beta\cdot C$.

Proof. The expected cost of the edges chosen in each of the $\beta$ independent trials is at most $\sum_e c_e\cdot\bar x_e \le 8\alpha C$, using (40). Summing the cost over all trials gives the claim. Claim 3.20.
For any client $j$, the probability that $j$ is unassigned is at most $1/(mn)^2$.

Proof. By Claim 3.17, the probability of $j$ being unassigned in one trial is at most $1/2$. Since there are $\beta = \Theta(\log mn)$ independent trials, the claim follows. Proof of Theorem 3.14:
By Claims 3.18 and 3.19, the cost and load of the solution are at most the claimed bounds. Moreover, the probability of a client not being assigned to any of the trees is at most $1/(mn)^2$. Since such a client increases the load of some tree $i$ by at most $u_i$ and the cost by at most $OPT$, and this happens with probability at most $1/(mn)^2$, it increases the expected cost and congestion by a negligible factor. $\Box$

We consider here the online set cover with set requests (SCSR) problem, first considered by Bhawalkar et al. [BGP14], which is defined as follows. We are given a universe $U$ of $n$ resources, and a collection of $m$ facilities, where each facility $i\in[m]$ is specified by (i) a subset $S_i\subseteq U$ of resources, (ii) an opening cost $c_i$, and (iii) a capacity $u_i$. The resources and facilities are given up-front. Now a sequence of $k$ requests arrives over time. Each request $j\in[k]$ requires some subset $R_j\subseteq U$ of resources. The request has to be served by assigning it to some collection $F_j\subseteq[m]$ of facilities whose sets collectively cover $R_j$, i.e., $R_j\subseteq\cup_{i\in F_j} S_i$. Note that these facilities have to be open, and we incur the cost of these facilities. Moreover, if a facility $i$ is used to serve request $j$, this contributes to the load of facility $i$, and this total load must be at most the capacity $u_i$.

As in previous sections, we give an algorithm to compute a solution online which violates the capacity constraints by some factor. Our main result for this problem is the following:

Theorem 3.21. There is a randomized online algorithm that, given any instance of the set cover with set requests problem and a bound $C$ on its optimal cost, computes a solution of cost $O(\log m\cdot\log mnk)\cdot C$ with congestion $O(\log m\cdot\log mnk)$.

The ideas, for both the convex relaxation and the rounding, are very similar to those for CCFL; hence we only sketch the main ideas here.
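In this notation, serving request $j$ by a collection $F_j$ just means $R_j\subseteq\cup_{i\in F_j} S_i$; a short Python check (with hypothetical data) makes this concrete:

```python
# Hypothetical SCSR data: facility sets S[i] over named resources (illustrative).
S = {0: {"cpu", "ram"}, 1: {"ram", "disk"}, 2: {"gpu"}}

def serves(Fj, Rj):
    """True iff the facilities in Fj collectively cover the requested resources Rj."""
    return Rj <= set().union(*(S[i] for i in Fj))

print(serves({0, 1}, {"cpu", "disk"}), serves({0, 2}, {"cpu", "disk"}))  # True False
```

The relaxation below encodes exactly this coverage requirement, one covering constraint per resource of each request.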
For the fractional relaxation, there is a variable $x_i$ for each facility $i\in[m]$ denoting if the facility is opened, and for each request $j\in[k]$ and facility $i$ there is a variable $y_{ij}$ denoting if request $j$ is connected to facility $i$. We set $p = \Theta(\log m)$, and the objective is:
$$g(x,y) \;=\; \Bigg(\sum_i c_i\Big(x_i + \frac{\sum_j y_{ij}}{u_i}\Big)\Bigg)^p + C^p\cdot\sum_i\Big(x_i + \frac{1}{u_i}\sum_j y_{ij}\Big)^p.$$
We define the following convex covering program, where we use $F(\ell) := \{i\in[m] \mid \ell\in S_i\}$ for each resource $\ell\in U$:
$$\min\ g(x,y) \quad\text{s.t.}\quad \sum_{i\in T} x_i + \sum_{i\in F(\ell)\setminus T} y_{ij} \ \ge\ 1 \quad \forall T\subseteq F(\ell),\ \forall\ell\in R_j,\ \forall j\in[k], \qquad y, x\ge 0.$$
We can solve this convex program in an online fashion, much as in Theorem 3.1. Now for the rounding: we maintain the following modified variables:
$$\bar y_{ij} = \min\{y_{ij},\, x_i\}\ \ \forall i,j, \qquad \bar x_i = \max\Big\{x_i,\ \frac{\sum_j\bar y_{ij}}{u_i}\Big\}\ \ \forall i.$$
Note that $(\bar x,\bar y)$ clearly satisfy:
$$\sum_j \bar y_{ij} \ \le\ u_i\cdot\bar x_i \quad \forall i, \qquad (45)$$
$$\sum_{i\in F(\ell)} \bar y_{ij} \ \ge\ 1/2 \quad \forall\ell\in R_j,\ \forall j\in[k], \qquad (46)$$
$$\bar y_{ij} \ \le\ \bar x_i \quad \forall i,j. \qquad (47)$$
Claim 3.22.
Assuming that there is an integral solution to the SCSR instance having cost at most $C$, the following inequalities hold for $\alpha = O(\log m)$:
$$\sum_i c_i\cdot\bar x_i \ \le\ \alpha\cdot C, \qquad (48)$$
$$\bar x_i \ \le\ \alpha \quad \forall i. \qquad (49)$$
The proof is similar to that of Claim 3.5 and is omitted.

The final randomized rounding is the same as for CCFL. For each facility $i$, set $X_i$ to one with probability $\min\{4\log(mnk)\cdot\bar x_i,\ 1\}$. Let $F$ denote the set of fixed facilities, i.e., those with $\Pr[X_i = 1] = 1$: $F = \{i : \bar x_i \ge$
$1/(4\log(mnk))\}$. For each facility $i$ and request $j$, set $Z_{ij}$ to one with probability
$$\Pr[Z_{ij} = 1] \;=\; \begin{cases}\min\{4\log(mnk)\cdot\bar y_{ij},\ 1\} & \text{if } i\in F,\\ \bar y_{ij}/\bar x_i & \text{otherwise.}\end{cases}$$
All the above random variables are independent. Each request $j$ gets connected to all facilities $i$ with $X_i\cdot Z_{ij} = 1$. The analysis of the rounding is also identical to that of CCFL, and is omitted. This completes the proof of Theorem 3.21.

In this section we consider a profit maximization problem (called PMPC) for a single seller with production costs for items. There are $m$ items that the seller can produce and sell. The production levels are given by a vector $\mu\in\mathbb{R}^m_+$; the total cost incurred by the seller to produce $\mu_j$ units of every item $j\in[m]$ is $g(\mu)$, for some production cost function $g : \mathbb{R}^m_+\to\mathbb{R}_+$. In this work we allow functions $g$ which are convex and monotone in a certain sense.‡ There are $n$ buyers who arrive online. Each buyer $i\in[n]$ is interested in certain subsets of items (a.k.a. bundles), which belong to some set family $\mathcal{S}_i\subseteq 2^{[m]}$. The extent of interest of buyer $i$ in subset $S\in\mathcal{S}_i$ is given by $v_i(S)$, where $v_i : \mathcal{S}_i\to\mathbb{R}_+$ is her valuation function. If buyer $i$ is allocated a subset $T\in\mathcal{S}_i$ of items, she pays the seller her valuation $v_i(T)$. Consider the optimization problem for the seller: he must produce some items and allocate bundles to buyers so as to maximize the profit $\sum_{i=1}^n v_i(T_i) - g(\mu)$, where $T_i\in\mathcal{S}_i$ denotes the bundle allocated to buyer $i$ and $\mu = \sum_{i=1}^n\chi_{T_i}\in\mathbb{R}^m$ is the total quantity of all items produced. (Here $\chi_S\in\{0,1\}^m$ is the characteristic vector of the set $S$.)
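As a concrete illustration of the seller's objective, the following Python sketch evaluates the profit of a fixed allocation on a hypothetical instance; the quadratic cost `g` is an illustrative non-separable choice, not one prescribed by the paper:

```python
from collections import Counter

def profit(alloc, v, g):
    """alloc[i] = frozenset bundle T_i for buyer i; profit = sum_i v_i(T_i) - g(mu)."""
    mu = Counter()
    for T in alloc:
        mu.update(T)                  # mu = sum of characteristic vectors chi_{T_i}
    return sum(v[i][T] for i, T in enumerate(alloc)) - g(mu)

# Hypothetical instance: 2 buyers over items {0, 1}; a non-separable quadratic cost.
v = [{frozenset({0}): 5.0}, {frozenset({0, 1}): 7.0}]
g = lambda mu: 0.5 * sum(mu.values()) ** 2    # g(mu) = (sum_j mu_j)^2 / 2
print(profit([frozenset({0}), frozenset({0, 1})], v, g))  # 12 - 4.5 = 7.5
```

Note that the chosen `g` depends on the total production across items, so it cannot be written as a sum of per-item costs, which is exactly the non-separable regime handled here.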
Observe that in this paper we consider a non-strategic setting, where the valuation of each buyer is known to the seller; this differs from an auction setting, where the seller has to allocate items to buyers without knowledge of the true valuations, and the buyers may have an incentive to mis-report their true valuations.

This class of maximization problems with production costs was introduced by Blum et al. [BGMS11] and more recently studied by Huang and Kim [HK14]. Both these works dealt with the online auction setting, but both considered the special case where the production costs are separable over items, i.e., where $g(\mu) = \sum_j g_j(\mu_j)$ for some convex functions $g_j(\cdot)$. In contrast, we can handle general production costs $g(\cdot)$, but we do not consider the auction setting. Our main result is for the fractional version of the problem, where the allocation to each buyer $i$ is allowed to be any point in the convex hull of $\mathcal{S}_i$. In particular, we want to solve the following convex program in an online fashion:
$$\text{maximize}\quad \sum_{i=1}^n\sum_{T\in\mathcal{S}_i} v_i(T)\cdot y_{iT} - g(\mu) \qquad (D)$$
$$\sum_{T\in\mathcal{S}_i} y_{iT} \ \le\ 1 \qquad \forall i\in[n], \qquad (50)$$
$$\sum_{i=1}^n\sum_{T\in\mathcal{S}_i} \mathbf{1}_{j\in T}\cdot y_{iT} - \mu_j \ \le\ 0 \qquad \forall j\in[m], \qquad (51)$$
$$y, \mu \ \ge\ 0. \qquad (52)$$
Note that this problem looks like the dual of the covering problems we have been studying in previous sections, and hence is suggestively called $(D)$. Consider the following "dual" program that gives an upper bound on the value of $(D)$:
$$\text{minimize}\quad \sum_{i=1}^n u_i + g^\star(x) \qquad (P)$$
$$u_i + \sum_{j\in T} x_j \ \ge\ v_i(T) \qquad \forall i\in[n],\ \forall T\in\mathcal{S}_i, \qquad (53)$$
$$u, x \ \ge\ 0. \qquad (54)$$
Again, to be consistent with our general framework, we refer to this minimization (covering) problem as the "primal" $(P)$. Notice that this primal-dual pair falls into the general framework of Section 2 if we set
$$f(u,x) := \sum_{i=1}^n u_i + g^\star(x).$$
‡ The formal conditions on $g$ appear in Assumption 4.1.
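As a sanity check on the conjugate pairing used here, for the illustrative choice $g(\mu)=\frac12\|\mu\|^2$ one has $g^\star(x)=\frac12\|x\|^2$; the Python sketch below verifies $g^\star(x)=\sup_\mu\{x^\intercal\mu-g(\mu)\}$ numerically on a finite grid (an approximation, exact here because the maximizer $\mu=x$ lies on the grid):

```python
import itertools

def g(mu):
    """Illustrative convex production cost: g(mu) = ||mu||^2 / 2."""
    return 0.5 * sum(m * m for m in mu)

def g_star_numeric(x, grid):
    """Approximate g*(x) = sup_mu { <x, mu> - g(mu) } over a finite grid."""
    best = float("-inf")
    for mu in itertools.product(grid, repeat=len(x)):
        best = max(best, sum(xi * mi for xi, mi in zip(x, mu)) - g(mu))
    return best

grid = [i / 10 for i in range(31)]          # mu components in [0, 3]
x = (1.0, 2.0)
closed_form = 0.5 * (1.0 ** 2 + 2.0 ** 2)   # g*(x) = ||x||^2 / 2 for this g
assert abs(g_star_numeric(x, grid) - closed_form) < 1e-9
print(g_star_numeric(x, grid))  # 2.5
```

This particular $g^\star$ is monotone, convex, and has monotone gradients, so it satisfies the conditions of Assumption 4.1 below.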
Indeed, if we were to construct the Fenchel dual of $(P)$ as in Section 2, we would again arrive at $(D)$ after some simplification (using the fact that $g^{\star\star} = g$ for any convex function $g$ with subgradients§ [Roc70]). In order to apply our framework, we assume that $f$ is continuous, differentiable, and satisfies $\nabla f(z)\ge\nabla f(z')$ for all $z\ge z'$. This translates to the following assumption on the production function $g$: Assumption 4.1.
The function $g^\star : \mathbb{R}^m_+ \to \mathbb{R}_+$ (recall $g^\star(x) = \sup_\mu \{x^\intercal \mu - g(\mu)\}$) is monotone, convex, continuous, differentiable, and has $\nabla g^\star(x) \ge \nabla g^\star(x')$ for all $x \ge x'$.

Since we require irrevocable allocations, we cannot use the primal-dual algorithm from Section 2.1, since that algorithm may decrease the dual variables $y_{iT}$. Instead, we use the algorithm from Section 2.2, which ensures that both primal and dual variables are raised monotonically. We can now use the competitive ratio from (23): when $g^\star(0) = 0$, this ratio is at least
\[ \max_{c > 1} \left( \frac{\min_{z \ge 0} \min_{\ell=1}^{n} \left\{ \nabla_\ell g^\star(z) / \nabla_\ell g^\star(cz) \right\}}{O(\log(\rho d))} \;-\; \max_{z \ge 0} \frac{z^\intercal \nabla g^\star(z) - g^\star(z)}{g^\star(cz)} \right). \tag{55} \]
In this expression, recall that $d$ is the row-sparsity of the covering constraints in ($P$), i.e., $d = 1 + \max_{T \in \cup_i \mathcal{S}_i} |T|$, and $\rho$ is the ratio between the maximum and minimum (non-zero) valuations any player $i$ has for any set in $\mathcal{S}_i$. In other words,
\[ \rho \;\le\; R := \frac{\max\{ v_i(T) : T \in \mathcal{S}_i,\ i \in [n] \}}{\min\{ v_i(T) : T \in \mathcal{S}_i,\ v_i(T) > 0,\ i \in [n] \}}. \tag{56} \]

\S\ A subgradient of $g : \mathbb{R}^m \to \mathbb{R}$ at $u$ is a vector $V_u \in \mathbb{R}^m$ such that $g(w) \ge g(u) + V_u^\intercal (w - u)$ for all $w \in \mathbb{R}^m$.

Solving ($D$) in polynomial time. To solve the primal-dual convex programs using our general framework in polynomial time, we need access to the following oracle:
Oracle:
Given vectors $(u, x)$ and an index $i$, find a set $T \in \mathcal{S}_i$ such that
\[ u_i + \Big( \sum_{j \in T} x_j - v_i(T) \Big) < 0, \tag{57} \]
or else report that no such set exists.

Given such an oracle, we maintain $(u, x)$ such that $(2u, 2x)$ is feasible for ($P$), as follows. When a new buyer $i$ arrives, we run the oracle on $(2u, 2x)$. While it returns a set $T \in \mathcal{S}_i$, we update $(u, x)$ to satisfy the corresponding constraint (53). Otherwise, we know that $(2u, 2x)$ is a feasible solution for ($P$). This scaling by a factor of 2 allows us to bound the number of iterations as follows: when buyer $i$ arrives, define $Q_i = \min\{u_i, V^i_{\max}\} + \sum_{j=1}^m \min\{x_j, V^i_{\max}\}$, where $V^i_{\max} = \max\{ v_i(T) \mid T \in \mathcal{S}_i \}$. Note that $Q_i \le (m+1) \cdot V^i_{\max}$, and each iteration increases $Q_i$ by at least $V^i_{\min}/2$, where $V^i_{\min} = \min\{ v_i(T) \mid T \in \mathcal{S}_i,\ v_i(T) > 0 \}$. So the number of iterations is at most $O(mR)$, where $R$ is defined in (56). This gives a polynomial-time online algorithm whenever $R$ is polynomially bounded.

What properties do we need from the collections $\mathcal{S}_i$ and valuation functions $v_i$ so that we can implement the oracle efficiently? Here are some cases where this is possible.

• Small $\mathcal{S}_i$. If each $|\mathcal{S}_i|$ is polynomially bounded, then we can solve (57) just by enumeration. An example is when each buyer is "single-minded", i.e., she wants exactly one bundle.

• Supermodular valuations.
Here, buyer $i$ has $\mathcal{S}_i = 2^{[m]}$ and $v_i : 2^{[m]} \to \mathbb{R}_+$ is supermodular, i.e., $v_i(T_1) + v_i(T_2) \le v_i(T_1 \cup T_2) + v_i(T_1 \cap T_2)$ for all $T_1, T_2 \subseteq [m]$. In this case, we can solve (57) using polynomial-time algorithms for submodular minimization [Sch03], since the expression inside the minimum is a linear function minus a supermodular function.

• Matroid constrained valuations.
In this setting, each buyer $i$ has some value $v_{ij}$ for each item $j \in [m]$, and the feasible bundles $\mathcal{S}_i$ are the independent sets of some matroid.\P\ Here we can solve (57) by maximizing a linear function over a matroid, because
\[ \min_{T \in \mathcal{S}_i} \Big( \sum_{j \in T} x_j - v_i(T) \Big) = \min_{T \in \mathcal{S}_i} \sum_{j \in T} (x_j - v_{ij}) = - \max_{T \in \mathcal{S}_i} \sum_{j \in T} (v_{ij} - x_j), \]
and the latter maximization can be solved by the greedy algorithm.

\P\ An alternative description of such valuation functions is to have $\mathcal{S}'_i = 2^{[m]}$ and $v'_i(T)$ = the maximum weight of an independent subset of $T$ (where each item $j$ has weight $v_{ij}$). Viewed this way, the buyer's valuation is a weighted matroid rank function, which is a special submodular function.

We now have a deterministic online algorithm for ($D$) with competitive ratio as given in (55). Moreover, this algorithm runs in polynomial time in many special cases. Here we show how the fractional online solution can be rounded to give integral allocations. We make the following additional assumption on the production costs.

Assumption 4.2.
There is a constant $\beta > 1$ such that $g(a\mu) \le a^\beta \cdot g(\mu)$ for all $0 < a < 1$ and $\mu \in \mathbb{R}^m_+$.

Theorem 4.3.
For any $\epsilon \in (0, 1]$, there is a randomized online algorithm for PMPC under Assumptions 4.1 and 4.2 that achieves expected profit at least
\[ (1+\epsilon)^{-2\beta/(\beta-1)} \, 2^{-1/(\beta-1)} \cdot \frac{\mathrm{OPT}}{\alpha} \;-\; \frac{\mathrm{OPT}}{m\alpha} \;-\; g(L \cdot \mathbf{1}), \]
where $\mathrm{OPT}$ is the offline optimal profit, $\alpha$ is the fractional competitive ratio, and $L = O\big(\tfrac{\log m}{\epsilon^2}\big)$.

Note that the additive error term $g(L \cdot \mathbf{1})$ is independent of the number $n$ of buyers: it depends only on the number $m$ of items and the production function $g$. We also give an example below which shows that any rounding algorithm for ($D$) must incur some such additive error.

We now describe the rounding algorithm. Let $\epsilon \in (0, 1]$ be any value; set $a = (1+\epsilon)^{-2\beta/(\beta-1)} \, 2^{-1/(\beta-1)}$, so that $((1+\epsilon)^2 a)^\beta = a/2$ and $(1+\epsilon)^2 a < 1$. The rounding algorithm scales the fractional allocation $y$ by the factor $a < 1$. Let $M \in \mathbb{Z}^m_+$ denote the (integral) quantities of the different items produced at any point in the online rounding. Upon arrival of buyer $i$, the algorithm does the following.

1. Update the fractional solution $(y, \mu)$ according to the fractional online algorithm.
2. If $M_j > (1+\epsilon) a \mu_j + \ell - 1$ for any $j \in [m]$, where $\ell = \Theta\big(\tfrac{\log m}{\epsilon}\big)$ is fixed below, then skip this buyer.
3. Else, allocate set $T \in \mathcal{S}_i$ to buyer $i$ with probability $a \cdot y_{iT}$.

Claim 4.4.
$\Pr[\, M_j > (1+\epsilon) a \mu_j + \ell - 1 \,] \le 1/m^2$ for every item $j \in [m]$ and $\epsilon \in (0, 1]$.

Proof. Fix $j \in [m]$ and $\epsilon \in (0, 1]$. Observe that $M_j$ is a sum of independent 0–1 random variables with $\mathbb{E}[M_j] \le a \cdot \mu_j$. The claim now follows by a Chernoff bound, for a suitable choice of the constant in $\ell$.

Below, $\ell := \Theta\big(\tfrac{\log m}{\epsilon}\big)$ with the constant chosen large enough for Claim 4.4, and $L := \big(1 + \tfrac{1}{\epsilon}\big) \cdot \ell = O\big(\tfrac{\log m}{\epsilon^2}\big)$.

Lemma 4.5.
The expected objective value of the integral allocation is at least
\[ a \Big( 1 - \frac{1}{m} \Big) \cdot \sum_{i=1}^n \sum_{T \in \mathcal{S}_i} v_i(T) \cdot y_{iT} \;-\; \frac{a}{2} \cdot g(\mu) \;-\; g(L \cdot \mathbf{1}). \]

Proof.
Note that the algorithm ensures (in step 2 above) that $M \le (1+\epsilon) a \cdot \mu + \ell \cdot \mathbf{1}$. So with probability one, the production cost is at most
\[ g\big((1+\epsilon) a \mu + \ell \cdot \mathbf{1}\big) = g\Big( \tfrac{1}{1+\epsilon} \cdot (1+\epsilon)^2 a \, \mu + \tfrac{\epsilon}{1+\epsilon} \cdot \big(1 + \tfrac{1}{\epsilon}\big) \ell \cdot \mathbf{1} \Big) \le g\big((1+\epsilon)^2 a \, \mu\big) + g\Big( \big(1 + \tfrac{1}{\epsilon}\big) \ell \cdot \mathbf{1} \Big) \le \big((1+\epsilon)^2 a\big)^\beta \cdot g(\mu) + g(L \cdot \mathbf{1}) = \frac{a}{2} \cdot g(\mu) + g(L \cdot \mathbf{1}). \]
The first inequality is by convexity and non-negativity of $g$; the second inequality uses Assumption 4.2 and $(1+\epsilon)^2 a < 1$.

By Claim 4.4 and a union bound over the $m$ items, the probability that we skip any given buyer $i$ is at most $1/m$. Thus the expected total value is at least $\big(1 - \tfrac{1}{m}\big) a \cdot \sum_{i=1}^n \sum_{T \in \mathcal{S}_i} v_i(T) \, y_{iT}$. Subtracting the upper bound on the cost from the expected value, we obtain the lemma.

This completes the proof of Theorem 4.3.

Integrality Gap.
We note that the additive error term is necessary for any algorithm based on the convex relaxation ($D$). Consider a single buyer with $\mathcal{S} = 2^{[m]}$ and $v(T) = |T|$, and let $g(\mu) = \sum_{j=1}^m \mu_j^2$. The optimal integral allocation clearly has profit zero. However, the fractional optimum is $\Omega(m)$, due to the feasible solution with $y_T = 2^{-m}$ for all $T \subseteq [m]$ and $\mu_j = \tfrac{1}{2}$ for all $j \in [m]$. Thus any algorithm using this relaxation incurs an additive error depending on $m$.

Here we give two examples of production costs (satisfying Assumptions 4.1 and 4.2) to which our results apply. In each case, we first show the competitive ratio obtained for the fractional convex program, and then use the rounding algorithm to obtain an integral solution.
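Before turning to the examples, here is a small numeric check of the integrality-gap instance above, taking the production cost to be quadratic, $g(\mu) = \sum_j \mu_j^2$ (the form under which the instance exhibits an $\Omega(m)$ gap); the enumeration below is a sketch for small $m$ only.

```python
# Integrality gap of (D): one buyer, S = 2^[m], v(T) = |T|,
# quadratic production cost g(mu) = sum_j mu_j^2.
from itertools import combinations

m = 4
items = range(m)
subsets = [frozenset(c) for r in range(m + 1) for c in combinations(items, r)]

def g(mu):
    return sum(t * t for t in mu)

# Best integral allocation: choose one bundle T and produce mu = 1_T,
# earning |T| at cost g(1_T) = |T|, so the profit is always zero.
best_integral = max(len(T) - g([1.0 if j in T else 0.0 for j in items])
                    for T in subsets)

# Fractional solution: y_T = 2^{-m} for every T, mu_j = 1/2 covers the
# fractional demand (each item appears in exactly half of all subsets).
yT = 1.0 / 2 ** m
assert all(abs(sum(yT for T in subsets if j in T) - 0.5) < 1e-9 for j in items)
frac_profit = sum(len(T) * yT for T in subsets) - g([0.5] * m)   # m/2 - m/4

assert best_integral == 0
assert abs(frac_profit - m / 4) < 1e-9    # Omega(m) fractional profit
```

The check confirms the gap scales linearly: the fractional profit is $m/4$ while every integral allocation earns exactly zero.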
Example 1.
Consider a seller who can produce items in $K$ different factories, where the $k$-th factory produces $p_{kj}$ units of item $j$ in one hour of work. The production cost is the sum of $q$-th powers of the work hours of the $K$ factories (in particular, we get a linear production cost for $q = 1$, and the $q$-th power of the makespan when $q \ge \log K$). This corresponds to the following function:
\[ g(\mu) = \min \Big\{ \frac{1}{q} \sum_{k=1}^K z_k^q \;:\; \sum_{k=1}^K p_{kj} \cdot z_k \ge \mu_j \ \ \forall j \in [m], \ z \ge 0 \Big\}. \tag{58} \]
(We scale the objective by $1/q$ to get a more convenient form.) The dual function is
\[ g^\star(x) = \frac{1}{p} \sum_{k=1}^K \Big( \sum_{j=1}^m p_{kj} \cdot x_j \Big)^p, \quad \text{where } \frac{1}{p} + \frac{1}{q} = 1. \]
Applying our framework (Assumption 4.1 is satisfied), as in Section 3.1 we obtain an $\alpha = O(p \log(\rho d))^p$-competitive fractional online algorithm, where $\rho = R$ is the maximum-to-minimum ratio of valuations, and the row-sparsity is $d \le m+1$. Combined with Theorem 4.3 (note that Assumption 4.2 is satisfied with $\beta = q$), setting $\epsilon = 1$, we obtain:

Corollary 4.6.
There is a randomized online algorithm for PMPC with cost function (58), for $q > 1$, that achieves expected profit at least
\[ \Big( 1 - \frac{1}{m} \Big) \frac{\mathrm{OPT}}{O(p \log(Rd))^p} \;-\; g(O(\log m) \cdot \mathbf{1}). \]
Note that $g(O(\log m) \cdot \mathbf{1}) \le K \cdot O\big(\tfrac{\log m}{p_{\min}}\big)^q$, where $p_{\min} > 0$ is the smallest non-zero value among the $p_{kj}$'s.

Example 2. This deals with the dual of the above production cost. Suppose there are $K$ different linear cost functions: for $k \in [K]$, the $k$-th cost function is given by $(c_{k1}, \dots, c_{km})$, where $c_{kj}$ is the cost per unit of item $j \in [m]$. The production cost $g$ is defined to be the (scaled) sum of $p$-th powers of these $K$ different costs:
\[ g(\mu) = \frac{1}{p} \sum_{k=1}^K \Big( \sum_{j=1}^m c_{kj} \cdot \mu_j \Big)^p. \tag{59} \]
This has dual
\[ g^\star(x) = \min \Big\{ \frac{1}{q} \sum_{k=1}^K z_k^q \;:\; \sum_{k=1}^K c_{kj} \cdot z_k \ge x_j \ \ \forall j \in [m], \ z \ge 0 \Big\}, \quad \text{where } \frac{1}{p} + \frac{1}{q} = 1. \]
The primal program ($P$), after eliminating the variables $\{x_j\}_{j=1}^m$, is given below together with its dual:
\[ \text{minimize } \sum_{i=1}^n u_i + \frac{1}{q} \sum_{k=1}^K z_k^q \tag{$P'$} \]
\[ u_i + \sum_{j \in T} \sum_{k=1}^K c_{kj} \cdot z_k \ge v_i(T) \quad \forall i \in [n],\ \forall T \in \mathcal{S}_i; \qquad u, z \ge 0. \]
\[ \text{maximize } \sum_{i=1}^n \sum_{T \in \mathcal{S}_i} v_i(T) \cdot y_{iT} - \frac{1}{p} \sum_{k=1}^K \lambda_k^p \tag{$D'$} \]
\[ \sum_{T \in \mathcal{S}_i} y_{iT} \le 1 \quad \forall i \in [n]; \qquad \sum_{i=1}^n \sum_{T \in \mathcal{S}_i} \Big( \sum_{j \in T} c_{kj} \Big) \cdot y_{iT} - \lambda_k \le 0 \quad \forall k \in [K]; \qquad y, \lambda \ge 0. \]
Note that the row-sparsity in ($P'$) is $d = K + 1$, which is incomparable to $m$. We obtain a solution to ($P$) by setting $x = Cz$ from any solution $(u, z)$ to ($P'$), where $C$ is the $m \times K$ matrix whose $k$-th column is $(c_{k1}, \dots, c_{km})$. We can apply our algorithm to the convex covering problem ($P'$), since $g'(\lambda) = \frac{1}{p} \sum_{k=1}^K \lambda_k^p$ satisfies Assumption 4.1. This algorithm maintains monotone feasible solutions $(u, z)$ to ($P'$) and $(y, \lambda)$ to ($D'$). However, to solve ($D$) online we need to maintain the variables $(y, \mu)$, which differ from the variables $(y, \lambda)$ in ($D'$). We keep $y$ in ($D$) the same as in ($D'$), and set the production quantities $\mu_j = \sum_{i=1}^n \sum_{T \in \mathcal{S}_i} \mathbf{1}[j \in T] \cdot y_{iT}$, so that all constraints in ($D$) are satisfied.
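A small numeric sanity check of this mapping (with hypothetical toy data): when the $\lambda$-constraints in ($D'$) are set tight, the cost term $g(\mu)$ in ($D$) coincides exactly with $\frac{1}{p} \sum_k \lambda_k^p$ in ($D'$).

```python
# Mapping (y, lambda) in (D') to (y, mu) in (D) for the cost (59):
#   mu_j = sum_i sum_{T in S_i : j in T} y_{iT},  lambda_k set tight.
m, K, p = 3, 2, 2.0
c = [[1.0, 2.0, 0.5],    # c[k][j]: unit cost of item j under cost function k
     [0.5, 1.0, 1.5]]

# One buyer with two bundles and a fractional allocation y (toy data).
y = {frozenset({0, 1}): 0.4, frozenset({2}): 0.3}

mu = [sum(yT for T, yT in y.items() if j in T) for j in range(m)]
lam = [sum(sum(c[k][j] for j in T) * yT for T, yT in y.items())
       for k in range(K)]

# g(mu) = (1/p) sum_k (c_k . mu)^p equals (1/p) sum_k lam_k^p when tight.
g_mu = sum(sum(c[k][j] * mu[j] for j in range(m)) ** p for k in range(K)) / p
g_lam = sum(l ** p for l in lam) / p
assert abs(g_mu - g_lam) < 1e-9    # objectives of (D) and (D') agree
```

This equality is the tight case of the inequality $c_k^\intercal \mu \le \lambda_k$ used in the argument that follows.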
Note that the dual variables $y$ (allocations) and $\mu$ (production quantities) are monotone increasing, so this is a valid online algorithm. In order to bound the objective in ($D$), we use the feasible solution $(y, \lambda)$ to ($D'$). Note that for all $k \in [K]$:
\[ c_k^\intercal \mu = \sum_{j=1}^m c_{kj} \cdot \mu_j = \sum_{j=1}^m c_{kj} \sum_{i=1}^n \sum_{T \in \mathcal{S}_i} \mathbf{1}[j \in T] \cdot y_{iT} = \sum_{i=1}^n \sum_{T \in \mathcal{S}_i} y_{iT} \sum_{j \in T} c_{kj} \;\le\; \lambda_k. \]
So the objective of $(y, \mu)$ in ($D$) is at least that of $(y, \lambda)$ in ($D'$). Our general framework then implies a competitive ratio for the fractional problem of $\alpha = O(q \log(\rho d))^q = O(q \log \rho)^q$, where
\[ \rho = R \cdot K \cdot \frac{\max\{ c_{kj} : k \in [K],\ j \in [m] \}}{\min\{ c_{kj} : k \in [K],\ j \in [m] \}}, \]
$R$ is the maximum-to-minimum ratio of valuations, and recall $d \le K + 1$. Combined with Theorem 4.3 (with $\epsilon = 1$), we obtain:

Corollary 4.7.
There is a randomized online algorithm for PMPC with cost function (59), for $p > 1$, that achieves expected profit at least
\[ \Big( 1 - \frac{1}{m} \Big) \frac{\mathrm{OPT}}{O(q \log \rho)^q} \;-\; g(O(\log m) \cdot \mathbf{1}). \]
Here $g(O(\log m) \cdot \mathbf{1}) \le K \cdot O(m \log m \cdot c_{\max})^p$, where $c_{\max}$ is the maximum entry among the $c_{kj}$'s.

References

[AAA+06] Noga Alon, Baruch Awerbuch, Yossi Azar, Niv Buchbinder, and Joseph (Seffi) Naor. A general approach to online network optimization problems. ACM Trans. Algorithms, 2(4):640–660, 2006.

[AAA+09] Noga Alon, Baruch Awerbuch, Yossi Azar, Niv Buchbinder, and Joseph Naor. The online set cover problem. SIAM J. Comput., 39(2):361–370, 2009.

[ABFP13] Yossi Azar, Umang Bhaskar, Lisa Fleischer, and Debmalya Panigrahi. Online mixed packing and covering. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, New Orleans, Louisiana, USA, January 6-8, 2013, pages 85–100, 2013.

[ACP14] Yossi Azar, Ilan Reuven Cohen, and Debmalya Panigrahi. Online covering with convex objectives and applications. CoRR, abs/1412.3507, 2014.

[BBN12] Nikhil Bansal, Niv Buchbinder, and Joseph Naor. A primal-dual randomized algorithm for weighted paging. J. ACM, 59(4):19, 2012.

[BGMS11] Avrim Blum, Anupam Gupta, Yishay Mansour, and Ankit Sharma. Welfare and profit maximization with production costs. In FOCS, pages 77–86, Nov 2011.

[BGP14] Kshipra Bhawalkar, Sreenivas Gollapudi, and Debmalya Panigrahi. Online set cover with set requests. In APPROX/RANDOM, pages 64–79, 2014.

[BN07] Niv Buchbinder and Joseph (Seffi) Naor. The design of competitive online algorithms via a primal-dual approach. Found. Trends Theor. Comput. Sci., 3(2-3):93–263, 2007.

[BN09] Niv Buchbinder and Joseph (Seffi) Naor. Online primal-dual algorithms for covering and packing. Math. Oper. Res., 34(2):270–286, 2009.

[DH14] Nikhil R. Devanur and Zhiyi Huang. Primal dual gives almost optimal energy efficient online algorithms. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014, pages 1123–1140, 2014.

[GKP12] Anupam Gupta, Ravishankar Krishnaswamy, and Kirk Pruhs. Online primal-dual for non-linear optimization with applications to speed scaling. In Thomas Erlebach and Giuseppe Persiano, editors, Workshop on Approximation and Online Algorithms, volume 7846 of Lecture Notes in Computer Science, pages 173–186, Sep 2012.

[GKR00] Naveen Garg, Goran Konjevod, and R. Ravi. A polylogarithmic approximation algorithm for the group Steiner tree problem. Journal of Algorithms, 37(1):66–84, 2000. (Preliminary version in , pages 253–259, 1998.)

[GN14] Anupam Gupta and Viswanath Nagarajan. Approximating sparse covering integer programs online. Mathematics of Operations Research, 39(4):998–1011, 2014.

[HK14] Zhiyi Huang and Anthony Kim. Welfare maximization with production costs: A primal dual approach. CoRR, abs/1411.4384, 2014. To appear in SODA 2015.

[Kor05] Simon Korman. On the use of randomness in the online set cover problem. M.Sc. thesis, Weizmann Institute of Science, 2005.

[Lat97] Rafał Latała. Estimation of moments of sums of independent real random variables. Ann. Probab., 25(3):1502–1513, 1997.

[Roc70] R. Tyrrell Rockafellar. Convex Analysis. Princeton University Press, 1970.

[Sch03] Alexander Schrijver. Combinatorial Optimization: Polyhedra and Efficiency, volume 24 of Algorithms and Combinatorics. Springer-Verlag, Berlin, 2003.