Tight Bounds for Online Vector Scheduling
Sungjin Im∗, Nathaniel Kell†, Janardhan Kulkarni‡, Debmalya Panigrahi†
∗ Electrical Engineering and Computer Science, University of California at Merced, Merced, CA, USA.
Email: [email protected]
† Department of Computer Science, Duke University, Durham, NC, USA. Email: {kell,debmalya}@cs.duke.edu
‡ Microsoft Research, Redmond, WA, USA. Email: [email protected]
Abstract
Modern data centers face a key challenge of effectively serving user requests that arrive online. Such requests are inherently multi-dimensional and characterized by demand vectors over multiple resources such as processor cycles, storage space, and network bandwidth. Typically, different resources require different objectives to be optimized, and $L_r$ norms of loads are among the most popular objectives considered. Furthermore, the server clusters are also often heterogeneous, making the scheduling problem more challenging. To address these problems, we consider the online vector scheduling problem in this paper. Introduced by Chekuri and Khanna (SIAM J. of Comp. 2006), vector scheduling is a generalization of classical load balancing, where every job has a vector load instead of a scalar load. The scalar problem, introduced by Graham in 1966, and its many variants (identical and unrelated machines, makespan and $L_r$-norm optimization, offline and online jobs, etc.) have been extensively studied over the last 50 years. In this paper, we resolve the online complexity of the vector scheduling problem and its important generalizations: for all $L_r$ norms and in both the identical and unrelated machines settings. Our main results are:
• For identical machines, we show that the optimal competitive ratio is
$\Theta(\log d/\log\log d)$ by giving an online lower bound and an algorithm with an asymptotically matching competitive ratio. The lower bound is technically challenging, and is obtained via an online lower bound for the minimum monochromatic clique problem, using a novel online coloring game and a randomized coding scheme. Our techniques also extend to asymptotically tight upper and lower bounds for general $L_r$ norms.
• For unrelated machines, we show that the optimal competitive ratio is
$\Theta(\log m + \log d)$ by giving an online lower bound that matches a previously known upper bound. Unlike identical machines, however, extending these results, particularly the upper bound, to general $L_r$ norms requires new ideas. In particular, we use a carefully constructed potential function that balances the individual $L_r$ objectives with the overall (convexified) min-max objective to guide the online algorithm, and track the changes in potential to bound the competitive ratio.
Index Terms
Online algorithms, scheduling, load balancing.
I. INTRODUCTION
A key algorithmic challenge in modern data centers is the scheduling of online resource requests on the available hardware. Such requests are inherently multi-dimensional and simultaneously ask for multiple resources such as processor cycles, network bandwidth, and storage space [23], [27], [34] (see also multi-dimensional load balancing in virtualization [28], [32]). In addition to the multi-dimensionality of resource requests, another challenge is the heterogeneity of server clusters, because of incremental hardware deployment and the use of dedicated specialized hardware for particular tasks [1], [24], [45]. As a third source of non-uniformity, the objective of the load balancing exercise is often defined by the application at hand and the resource being allocated. In addition to the traditional goals of minimizing maximum ($L_\infty$ norm) and total ($L_1$ norm) machine loads, various intermediate $L_r$ norms are also important for specific applications. For example, the $L_2$ norm of machine loads is suitable for disk storage [17], [20], while the $L_r$ norm for $r$ between 2 and 3 is used for modeling energy consumption [3], [38], [44].
In the algorithmic literature, the (single-dimensional) load balancing problem, also called list scheduling, has a long history since the pioneering work of Graham in 1966 [26]. However, the multi-dimensional problem, introduced by Chekuri and Khanna [18] and called vector scheduling (VS), remains less understood. In the simplest version of this problem, each job has a vector load and the goal is to assign the jobs to machines so as to minimize the maximum machine load over all dimensions. As an example of our limited understanding of this problem, we note that the approximation complexity of this most basic version is not resolved yet: the current best approximation factor is $O(\log d/\log\log d)$ (e.g., [30]), where $d$ is the number of dimensions, while only an $\omega(1)$ lower bound is known [18].
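To make the objective concrete, here is a minimal sketch (our own toy illustration; the instance, function name, and values are not from the paper) of the vector scheduling objective: the cost of an assignment is the maximum summed load over all machines and all dimensions.

```python
def makespan(jobs, assignment, m):
    """Max load over all machines and dimensions for a given assignment.
    jobs[j][k] is the load of job j in dimension k; assignment[j] is the
    machine that job j is assigned to."""
    d = len(jobs[0])
    load = [[0.0] * d for _ in range(m)]
    for j, machine in enumerate(assignment):
        for k in range(d):
            load[machine][k] += jobs[j][k]
    return max(max(row) for row in load)

# Two 2-dimensional jobs of each "type" on m = 2 machines: mixing the
# types balances both dimensions; separating them doubles the makespan.
jobs = [(1, 0), (0, 1), (1, 0), (0, 1)]
print(makespan(jobs, [0, 0, 1, 1], 2))  # mixed types: 1.0
print(makespan(jobs, [0, 1, 0, 1], 2))  # separated types: 2.0
```

The second assignment looks balanced by job count, but each machine is overloaded in one dimension, which is exactly the multi-dimensional difficulty.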
In this paper, we consider the online version of this problem, i.e., where the jobs appear in a sequence and have to be assigned irrevocably to a machine on arrival. Note that this is the most common scenario in the data center applications that we described earlier, and in other real-world settings. In addition to the basic setting described above, we also consider more general scenarios to capture the practical challenges that we outlined. In particular, we consider this problem in both the identical and unrelated machines settings, the latter capturing the non-uniformity of servers. Furthermore, we also consider all $L_r$ norm objectives of machine loads in addition to the makespan ($L_\infty$) objective. In this paper, we completely resolve the online complexity of all these variants of the vector scheduling problem.
Formally, there are $n$ jobs (denoted $J$) that arrive online and must be immediately and irrevocably assigned on arrival to one among a fixed set of $m$ machines (denoted $M$). We denote the $d$-dimensional load vector of job $j$ on machine $i$ by $p_{i,j} = \langle p_{i,j}(k) : k \in [d] \rangle$, which is revealed on its online arrival. For identical machines, the load of job $j$ in dimension $k$ is identical for all machines $i$, and we denote it $p_j(k)$. Let us denote the assignment function of jobs to machines by $f : J \to M$. An assignment $f$ produces a load of $\Lambda_i(k) = \sum_{j : f(j) = i} p_{i,j}(k)$ in dimension $k$ of machine $i$; we succinctly denote the machine loads in dimension $k$ by an $m$-dimensional vector $\Lambda(k)$. (Note that for the scalar problem, there is only one such machine load vector.)
The makespan norm.
We assume (by scaling) that the optimal makespan norm on each dimension is 1. Then, the VS problem for the makespan norm (denoted VSMAX) is defined as follows.
Definition 1.
VSMAX: For any dimension $k$, the objective is the maximum load over all machines, i.e., $\|\Lambda(k)\|_\infty = \max_{i \in M} \Lambda_i(k)$. An algorithm is said to be $\alpha$-competitive if $\|\Lambda(k)\|_\infty \le \alpha$ for every dimension $k$.
We consider this problem in both the identical machines (denoted VSMAX-I) and the unrelated machines (denoted VSMAX-U) settings. First, we state our result for identical machines.
Theorem 1.
There is a lower bound of $\Omega\left(\frac{\log d}{\log\log d}\right)$ on the competitive ratio of online algorithms for the VSMAX-I problem. Moreover, there is an online algorithm whose competitive ratio asymptotically matches this lower bound.
The upper bound is a slight improvement over the previous best $O(\log d)$ [8], [35], but the only lower bound known previously was NP-hardness of obtaining an $O(1)$-approximation for the offline problem [18]. We remark that while the offline approximability remains unresolved, the best offline algorithms currently known ([8], [35], this paper) are in fact online. Also, our lower bound is information-theoretic, i.e., it relies on the online model instead of computational limitations.
For unrelated machines (VSMAX-U), an $O(\log m + \log d)$-competitive algorithm was given by Meyerson et al. [35]. We show that this is the best possible.
Theorem 2.
There is a lower bound of
$\Omega(\log m + \log d)$ on the competitive ratio of online algorithms for the VSMAX-U problem.
Extensions to other $L_r$ norms. As we briefly discussed above, there are many applications where an $L_r$ norm (for some $r \ge 1$) is more suitable than the makespan norm. First, we consider identical machines, and aim to simultaneously optimize all norms on all dimensions (denoted VSALL-I).
Definition 2.
VSALL-I: For dimension $k$ and norm $L_r$, $r \ge 1$, the objective is $\|\Lambda(k)\|_r = \left(\sum_{i \in M} \Lambda_i^r(k)\right)^{1/r}$. An algorithm is said to be $\alpha_r$-competitive for the $L_r$ norm if $\|\Lambda(k)\|_r \le \alpha_r$ for every dimension $k$ and every $L_r$ norm, $r \ge 1$.
The next theorem extends Theorem 1 to an all-norms optimization.
Theorem 3.
There is an online algorithm for the VSALL-I problem that obtains a competitive ratio of $O\left(\left(\frac{\log d}{\log\log d}\right)^{\frac{r-1}{r}}\right)$, simultaneously for all $L_r$ norms. Moreover, these competitive ratios are tight, i.e., there is a matching lower bound for every individual $L_r$ norm.
For unrelated machines, there is a polynomial lower bound for simultaneously optimizing multiple $L_r$ norms, even with scalar loads. This rules out an all-norms approximation. Therefore, we focus on an any-norm approximation, where the algorithm is given norms $r_1, r_2, \ldots, r_d$ (where $1 \le r_k \le \log m$), and the goal is to minimize the $L_{r_k}$ norm for dimension $k$. (Our $L_r$-norms are typically referred to as $p$-norms or $L_p$-norms; we use $L_r$-norms to reserve the letter $p$ for job processing times. Also, for any $m$-dimensional vector $x$, $\|x\|_\infty = \Theta(\|x\|_{\log m})$; therefore, for any $r_k > \log m$, an algorithm can instead use an $L_{\log m}$ norm to approximate an $L_{r_k}$ norm objective up to constant distortion. Thus, in both our upper and lower bound results we restrict $1 \le r_k \le \log m$.) The same lower bound also rules out the possibility of the algorithm being competitive against the optimal value of each individual norm in its respective dimension. We use a standard trick in multi-objective optimization to circumvent this impossibility: we only require the algorithm to be competitive against any given feasible target vector $T = \langle T_1, \ldots, T_d \rangle$. For ease of notation, we assume wlog (by scaling) that $T_k = 1$ for all dimensions $k$. Now, we are ready to define the VS problem with arbitrary $L_r$ norms for unrelated machines; we call this problem VSANY-U.
Definition 3.
VSANY-U: For dimension $k$, the objective is $\|\Lambda(k)\|_{r_k} = \left(\sum_{i \in M} \Lambda_i^{r_k}(k)\right)^{1/r_k}$. An algorithm is said to be $\alpha_{r_k}$-competitive in the $L_{r_k}$ norm if $\|\Lambda(k)\|_{r_k} \le \alpha_{r_k}$ for every dimension $k$. Note the (necessary) difference between the definitions of VSALL-I and VSANY-U: in the former, the algorithm must be competitive in all norms in all dimensions simultaneously, whereas in VSANY-U, the algorithm only needs to be competitive against a single norm in each dimension that is specified in the problem input. We obtain the following result for the any-norm problem.
Theorem 4.
There is an online algorithm for the VSANY-U problem that simultaneously obtains a competitive ratio of $O(r_k + \log d)$ for each dimension $k$, where the goal is to optimize the $L_{r_k}$ norm in the $k$th dimension. Moreover, these competitive ratios are tight, i.e., there is a matching lower bound for every $L_r$ norm.
A. Our Techniques
First, we outline the main techniques used for the identical machines setting. A natural starting point for lower bounds is the online vertex coloring (VC) lower bound of Halldórsson and Szegedy [29], for which connections to VSMAX-I [18] have previously been exploited. The basic idea is to encode a VC instance as a VSMAX-I instance, where the number of dimensions $d$ is (roughly) $n^B$, and show that an approximation factor of (roughly) $B$ for VSMAX-I implies an approximation factor of (roughly) $n^{1-1/B}$ for VC. One may want to try to combine this reduction and the online lower bound of $\Omega(n/\log^2 n)$ for VC [29] to get a better lower bound for VSMAX-I. However, the reduction crucially relies on the fact that a graph with largest clique size at most $k$ has a chromatic number of (roughly) $O(n^{1-1/k})$, and this does not imply that the graph can be colored online with a similar number of colors.
A second approach is to explore the connection of VSMAX-I with online vector bin packing (VBP), where multi-dimensional items arriving online must be packed into a minimum number of identical multi-dimensional bins. Recently, Azar et al. [8] obtained strong lower bounds of $\Omega(d^{1/B})$, where $B \ge 1$ is the capacity of each bin in every dimension (the items have a maximum size of 1 in any dimension). It would be tempting to conjecture that the inability to obtain a constant-factor approximation algorithm for the VBP problem unless $B = \Omega(\log d)$ should yield a lower bound of $\Omega(\log d)$ for the VSMAX-I problem. Unfortunately, this is false.
The difference between the two problems is in the capacity of the bins/machines that the optimal solution is allowed to use: in VSMAX-I, this capacity is 1, whereas in VBP, this capacity is $B$, and using bins with larger capacity can decrease the number of bins needed super-linearly in the increased capacity. Therefore, a lower bound for VBP does not imply any lower bound for VSMAX-I. On the other hand, an upper bound of $O(d^{1/(B-1)} \log d)$ for the VBP problem is obtained in [8] via an $O(\log d)$-competitive algorithm for VSMAX-I. Improving this ratio considerably for VSMAX-I would have been a natural approach for closing the gap for VBP; unfortunately, our lower bound of
$\Omega(\log d/\log\log d)$ rules out this possibility.
Our lower bound is obtained via a different approach from the ones outlined above. At a high level, we leverage the connection with coloring, but to a problem of minimizing the size of the largest monochromatic clique given a fixed set of colors. Our main technical result is to show that this problem has a lower bound of $\Omega(\sqrt{t})$ for online algorithms, where $t$ is the number of colors. To the best of our knowledge, this problem was not studied before, and we believe this result should be of independent interest. (In [37], the problem of coloring vertices without creating certain monochromatic subgraphs was studied, which is different from our goal of minimizing the largest monochromatic clique size. Furthermore, this previous work was only for random graphs, and the focus was on whether the desirable coloring is achievable online depending on the parameters of the random graph.) As is typical in establishing online lower bounds, the construction of the lower bound instance is viewed as a game between the online algorithm and the adversary. Our main goal is to force the online algorithm to grow cliques while guaranteeing that the optimal (offline) solution can color vertices in a way that limits clique sizes to a constant. The technical challenge is to show that the optimal solution does not form large cliques across the cliques that the algorithm has created. For this purpose, we develop a novel randomized code that dictates the choices of the optimal solution and restricts those of the online algorithm. Using the probabilistic method on this code, we are able to show the existence of codewords that always lead to a good optimal solution and an expensive algorithmic one. We also show that the same idea can be used to obtain a lower bound for any $L_r$ norm.
We now turn our attention to our second main result, which is in the unrelated machines setting: an upper bound for the VSANY-U problem. (A target vector is feasible if there is an assignment such that for every dimension $k$, the value of the $L_{r_k}$ norm in that dimension is at most $T_k$. Our results do not rely heavily on the exact feasibility of the target vector; if there is a feasible solution that violates targets in all dimensions by at most a factor of $\beta$, then our results hold with an additional factor of $\beta$ in the competitive ratio.) Our algorithm is greedy with respect to a potential function (as are algorithms for all special cases studied earlier [4], [6], [15], [35]), and the novelty lies in the choice of the potential function. For each individual dimension $k$, we use the $L_{r_k}^{r_k}$ norm as the potential (following [4], [15]). The main challenge is to combine these individual potentials into a single potential. We use a weighted linear combination of the individual potentials for the different dimensions. This is somewhat counter-intuitive, since the combined potential can possibly allow a large potential in one dimension to be compensated by a small potential in a different one; indeed, a naïve combination only gives a competitive ratio of $O(\max_k r_k + \log d)$ for all $k$. However, we observe that we are aiming for a competitive ratio of $O(r_k + \log d)$, which allows some slack compared to scalar loads if $r_k < \log d$. Suppose $q_k = r_k + \log d$; then we use weights of $q_k^{-q_k}$ in the linear combination, after changing the individual potentials to $L_{r_k}^{q_k}$. Note that, as one would expect, the weights are larger for dimensions that allow a smaller slack. We show that this combined potential simultaneously leads to the asymptotically optimal competitive ratio on every individual dimension.
Finally, we briefly discuss our other results. Our slightly improved upper bound for the VSMAX-I problem follows from a simple random assignment and redistributing 'overloaded' machines.
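The weighted combination of per-dimension potentials described for VSANY-U can be sketched as follows. This is our reading of the construction (constants, scaling, and the exact update rule elided), with $q_k = r_k + \log d$, per-dimension potentials $\|\Lambda(k)\|_{r_k}^{q_k}$, and weights $q_k^{-q_k}$; the function names are our own, not the paper's.

```python
import math

def potential(load, r):
    """Phi = sum_k q_k^(-q_k) * (sum_i load[i][k]^r_k)^(q_k / r_k),
    with q_k = r_k + log2(d), a sketch of the combined potential."""
    d = len(r)
    phi = 0.0
    for k, rk in enumerate(r):
        qk = rk + math.log2(d)
        norm_rk = sum(row[k] ** rk for row in load)  # ||Lambda(k)||_{r_k}^{r_k}
        phi += qk ** (-qk) * norm_rk ** (qk / rk)
    return phi

def greedy_potential(jobs, m, r):
    """Assign each job (p[i][k]: its load on machine i, dimension k) to
    the machine that minimizes the increase in the potential."""
    d = len(r)
    load = [[0.0] * d for _ in range(m)]
    out = []
    for p in jobs:
        def increase(i):
            trial = [row[:] for row in load]
            for k in range(d):
                trial[i][k] += p[i][k]
            return potential(trial, r) - potential(load, r)
        best = min(range(m), key=increase)
        for k in range(d):
            load[best][k] += p[best][k]
        out.append(best)
    return out
```

For instance, a job that is cheap on one machine and expensive on another raises the potential far more on the expensive machine, so the greedy rule steers it to the cheap one in every dimension simultaneously.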
We remark that derandomizing this random assignment strategy is relatively straightforward. Although this improvement is very marginal, we feel that it is somewhat interesting, since our algorithm is simple and perhaps more intuitive, yet gives the tight upper bound. For the VSALL-I problem, we give a reduction to VSMAX-I by structuring the instance, i.e., "smoothing" large jobs, and then arguing that for structured instances, a VSMAX-I algorithm is also optimal for other $L_r$ norms.
B. Related Work
Due to the large volume of related work, we will only sample some relevant results in online scheduling and refer the interested reader to more detailed surveys (e.g., [7], [39]–[41]) and textbooks (e.g., [13]).
Scalar loads.
Since the $(2 - 1/m)$-competitive algorithm of Graham [26] for online (scalar) load balancing on identical machines, a series of papers [2], [10], [33] have led to the current best ratio of 1.9201 [22]. On the negative side, this problem was shown to be NP-hard in the strong sense by Faigle et al. [21] and has since been shown to have a competitive ratio of at least 1.880 [2], [11], [25], [31]. For other norms, Avidor et al. [5] obtained competitive ratios of $\sqrt{4/3}$ and $2 - O\left(\frac{\log r}{r}\right)$ for the $L_2$ and general $L_r$ norms, respectively.
For unrelated machines, Aspnes et al. [4] obtained a competitive ratio of $O(\log m)$ for makespan minimization, which is asymptotically tight [9]. Scheduling for the $L_2$ norm was considered by [17], [20], and Awerbuch et al. [6] obtained a competitive ratio of $1 + \sqrt{2}$, which was shown to be tight [16]. For general $L_r$ norms, Awerbuch et al. [6] (and Caragiannis [15]) obtained a competitive ratio of $O(r)$, and showed that it is tight up to constants. Various intermediate settings, such as related machines (machines have unequal but job-independent speeds) [4], [12] and restricted assignment (each job has a machine-independent load but can only be assigned to a subset of machines) [9], [16], [19], [42], have also been studied for the makespan and $L_r$ norms.
Vector loads.
The
VSMAX-I problem was introduced by Chekuri and Khanna [18], who gave an offline approximation of $O(\log^2 d)$ and observed that a random assignment has a competitive ratio of $O\left(\frac{\log dm}{\log\log dm}\right)$. Azar et al. [8] and Meyerson et al. [35] improved the competitive ratio to $O(\log d)$ using deterministic online algorithms. An offline $\omega(1)$ lower bound was also proved in [18], and it remains open as to what the exact dependence of the approximation ratio on $d$ should be. Our online lower bound asserts that a significantly sub-logarithmic dependence would require a radically different approach from all the known algorithms for this problem.
For unrelated machines, Meyerson et al. [35] noted that the natural extension of the algorithm of Aspnes et al. [4] to vector loads has a competitive ratio of $O(\log dm)$ for makespan minimization; in fact, for identical machines, they used exactly the same algorithm but gave a tighter analysis. For the offline VSMAX-U problem, Harris and Srinivasan [30] recently showed that the dependence on $m$ is not required by giving a randomized $O(\log d/\log\log d)$ approximation algorithm.
II. IDENTICAL MACHINES
First, we consider the online vector scheduling problem for identical machines. In this section, we obtain tight upper and lower bounds for this problem, both for the makespan norm (Theorem 1) and for arbitrary $L_r$ norms (Theorem 3).
A. Lower Bounds for VSMAX-I and VSALL-I
In this section, we will prove the lower bound in Theorem 1, i.e., show that any online algorithm for the VSMAX-I problem can be forced to construct a schedule such that there exists a dimension where one machine has load $\Omega(\log d/\log\log d)$, whereas the optimal schedule has $O(1)$ load on all dimensions of all machines. This construction will also be extended to all $L_r$ norms (VSALL-I) in order to establish the lower bound in Theorem 3.
We give our lower bound for VSMAX-I in two parts. First, in Section II-A1, we define a lower bound instance for an online graph coloring problem, which we call MONOCHROMATIC CLIQUE. Next, in Section II-A2, we show how our lower bound instance for MONOCHROMATIC CLIQUE can be encoded as an instance for VSMAX-I in order to obtain the desired $\Omega(\log d/\log\log d)$ bound.
1) Lower Bound for MONOCHROMATIC CLIQUE: The MONOCHROMATIC CLIQUE problem is defined as follows:
MONOCHROMATIC CLIQUE: We are given a fixed set of $t$ colors. The input graph is revealed to an algorithm as an online sequence of $n$ vertices $v_1, \ldots, v_n$ that arrive one at a time. When vertex $v_j$ arrives, we are given all edges between vertices $v_1, v_2, \ldots, v_{j-1}$ and vertex $v_j$. The algorithm must then assign $v_j$ one of the $t$ colors before it sees the next arrival. The objective is to minimize the size of the largest monochromatic clique in the final coloring.
The goal of this section will be to prove the following theorem, which we will use later in Section II-A2 to establish our lower bound for VSMAX-I.
Theorem 5.
The competitive ratio of any online algorithm for MONOCHROMATIC CLIQUE is $\Omega(\sqrt{t})$, where $t$ is the number of available colors. More specifically, for any online algorithm $A$, there is an instance on which $A$ produces a monochromatic clique of size $\sqrt{t}$, whereas the optimal solution can color the graph such that the size of the largest monochromatic clique is $O(1)$.
We will frame the lower bound as a game between the adversary and the online algorithm. At a high level, the instance is designed as follows. For each new arriving vertex $v$ and color $c$, the adversary connects $v$ to every vertex in some currently existing monochromatic clique of color $c$. Since we do this for every color, this ensures that regardless of the algorithm's coloring of $v$, some monochromatic clique grows in size by 1 (or the first vertex of a clique is introduced). Since this growth happens for every vertex, the adversary is able to quickly force the algorithm to create a monochromatic clique of size $\sqrt{t}$.
The main challenge now is to ensure that the adversary can still obtain a good offline solution. Our choice for this solution will be naïve: the adversary will simply properly color the monochromatic cliques it attempted to grow in the algorithm's solution. Since the game stops once the algorithm has produced a monochromatic clique of size $\sqrt{t}$, and there are $t$ colors, such a proper coloring of every clique is possible. The risk with this approach is that a large monochromatic clique may now form in the adversary's coloring from edges that cross these independently grown cliques (in other words, properly colored cliques in the algorithm's solution could now become monochromatic cliques for the adversary). This may seem hard to avoid, since each vertex is connecting to some monochromatic clique for every color.
However, in our analysis we show that if on each step the adversary selects which cliques to grow in a carefully defined random fashion, then with positive probability, all properly colored cliques in the algorithm's solution that hurt the adversary's naïve solution are of size $O(1)$.
Instance Construction:
We adopt the standard terminology used in online coloring problems (see, e.g., [29]). Namely, the algorithm will place each vertex in one of $t$ bins to define its color assignments, whereas we will use colors to refer to the color assignment in the optimal solution (controlled by the adversary). For each vertex arrival, the game is defined by the following 3-step process:
1) The adversary issues a vertex $v_j$ and defines $v_j$'s adjacencies with vertices $v_1, \ldots, v_{j-1}$.
2) The online algorithm places $v_j$ in one of the available $t$ bins.
3) The adversary selects a color for the vertex.
We further divide each bin into $\sqrt{t}$ slots $1, 2, \ldots, \sqrt{t}$. These slots will only be used for the adversary's bookkeeping. Correspondingly, we partition the $t$ colors into $\sqrt{t}$ color sets $C_1, \ldots, C_{\sqrt{t}}$, each of size $\sqrt{t}$. Each vertex will reside in a slot inside the bin chosen by the algorithm, and all vertices residing in slot $i$ across all bins will be colored by the optimal solution using a color from $C_i$. The high-level goal of the construction will be to produce properly colored cliques inside each slot of every bin.
Consider the arrival of vertex $v_j$. Inductively assume the previous vertices $v_1, \ldots, v_{j-1}$ have been placed in the bins by the algorithm, and that every vertex within a bin lies in some slot. Further assume that all the vertices in any particular slot of a bin form a properly colored clique.
To specify the new adjacencies formed by vertex $v_j$ for Step 1, we will use a $t$-length $\sqrt{t}$-ary string $s_j$, where we connect $v_j$ to every vertex in slot $s_j[k]$ of bin $k$, for all $k = 1, 2, \ldots, t$. Next, for Step 2, the algorithm places $v_j$ in some bin $b_j$. We say that $v_j$ is then placed in slot $q_j = s_j[b_j]$ in bin $b_j$. Finally, for Step 3, the adversary chooses an arbitrary color for $v_j$ from the colors in $C_{q_j}$ that have not yet been used for any vertex in slot $q_j$ of bin $b_j$.
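The three-step game can be simulated directly. The following sketch (our own illustration with hypothetical helper names, not code from the paper) plays the adversary against an arbitrary bin-choosing callback, drawing each adjacency string uniformly at random and stopping once some bin-slot pair holds $\sqrt{t}$ vertices.

```python
import math, random

def run_adversary(algorithm, t):
    """Play the MONOCHROMATIC CLIQUE game with t colors (t assumed a
    perfect square). `algorithm(j, adj)` returns a bin in range(t) for
    vertex j, given its adjacency list adj. Returns (vertices issued,
    clique size) once the algorithm holds a monochromatic sqrt(t)-clique."""
    s = math.isqrt(t)                  # sqrt(t) slots per bin
    slot_members = {}                  # (bin, slot) -> vertex ids in it
    for j in range(t * t):             # t^2 arrivals suffice by pigeonhole
        # Step 1: draw a t-length sqrt(t)-ary string; v_j is adjacent to
        # every vertex currently in slot string[b] of bin b, for each b.
        string = [random.randrange(s) for _ in range(t)]
        adj = [u for (b, q), mem in slot_members.items()
               if q == string[b] for u in mem]
        b = algorithm(j, adj)          # Step 2: the algorithm picks a bin
        q = string[b]                  # Step 3 bookkeeping: slot within bin b
        slot_members.setdefault((b, q), []).append(j)
        if len(slot_members[(b, q)]) >= s:
            return j + 1, s            # a monochromatic sqrt(t)-clique exists
    return None                        # unreachable for valid inputs
```

Every vertex placed in bin $b$, slot $q$ is adjacent to all earlier vertices of that pair, so each (bin, slot) pair holds a clique that is monochromatic in the algorithm's coloring; any bin-choosing rule is forced to complete one of size $\sqrt{t}$ within $t^2$ arrivals.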
The adversary will end the instance whenever there exists a slot in some bin that contains $\sqrt{t}$ vertices. This ensures that as long as the game is running, there is always an unused color in every slot of every bin. Also observe that after this placement, the clique in slot $q_j$ in bin $b_j$ has grown in size by 1 but is still properly colored. So, this induction is well defined. This completes the description of the instance (barring our choice for each adjacency string $s_j$). See Figures 1 and 2 for illustrations of the construction.
Instance Analysis:
The following lemma follows directly from the construction.
Lemma 6.
For any online algorithm, there is a monochromatic clique of size $\sqrt{t}$.
[Fig. 1: Illustration of the construction set-up for Theorem 5: $t = 16$ bins (the algorithm's colors), $\sqrt{t} = 4$ slots per bin (the adversary's bookkeeping), and $t = 16$ adversary colors partitioned slot-wise ($\sqrt{t}$ colors per slot).]
[Fig. 2: Depiction of the three-step lower-bound game for Theorem 5, with $s_j = 142122213434\ldots$ and the algorithm assigning $v_j$ to bin 15, which the adversary then colors. For simplicity, the only adjacencies shown for vertices issued before $v_j$ are those between vertices in the same bin-slot pair (in reality, other adjacencies also exist). Also for simplicity, the only adjacencies shown for $v_j$ are those it has with vertices in bins 13 through 16 (dictated by the substring "3213" in $s_j$); in reality, $v_j$ is also adjacent to vertices in bins 1 through 12, due to the remaining prefix of $s_j$.]
Proof:
After $t^2$ vertices are issued, there will be some bin $b$ containing at least $t$ vertices, and therefore some slot in bin $b$ containing at least $\sqrt{t}$ vertices, forming a clique of size $\sqrt{t}$. Since all the vertices in the clique are in the same bin, there exists a monochromatic clique of size $\sqrt{t}$ in the algorithm's solution.
Thus, it remains to show that there exists a sequence of $\sqrt{t}$-ary strings of length $t$ (recall that these strings define the adjacencies for each new vertex) such that the size of the largest monochromatic clique in the optimal coloring is $O(1)$. For brevity, we call such a sequence a good sequence.
First observe that monochromatic edges (i.e., edges between vertices of the same color) cannot form between vertices in slots $s$ and $s' \neq s$ (in the same or in different bins), since the color sets used for the slots are disjoint. Moreover, monochromatic edges cannot form within the same slot in the same bin, since these vertices always form a properly colored clique. Therefore, monochromatic edges can only form between two adjacent vertices $v_j$ and $v_{j'}$ such that $q_j = q_{j'}$ and $b_j \neq b_{j'}$, i.e., vertices in the same slot but in different bins. Relating back to our earlier discussion, these are exactly the edges that are properly colored in the algorithm's solution that could potentially form monochromatic cliques in the adversary's solution; we will refer to such edges as bad edges.
Thus, in order to define a good sequence of $t^2$ strings, we need to ensure our adjacency strings do not induce large cliques of bad edges. To do this, we first need a handle on what structure must exist across the sequence in order for bad-edge cliques to form. This undesired structure is characterized by the following lemma.
Lemma 7.
Suppose $K = \{u_{\phi(1)}, \ldots, u_{\phi(w)}\}$ is a $w$-sized monochromatic clique of color $c \in C_\ell$ that forms during the instance, where $\phi : [w] \to [t^2]$ maps $k \in [w]$ to the index of the $k$th vertex to join $K$ (note, from the above discussion, that $b_{\phi(i)}$ are different for all $i \in [w]$). Then $s_{\phi(j)}[b_{\phi(i)}] = \ell$ for all $j \in \{2, \ldots, w\}$ and all $i \in \{1, \ldots, j-1\}$.
Consider vertex $u_{\phi(j)}$ (the $j$th vertex to join $K$). Since $K$ is a clique, $u_{\phi(j)}$ must be adjacent to vertices $u_{\phi(1)}, \ldots, u_{\phi(j-1)}$. Since all these vertices are colored with $c \in C_\ell$, they must have been placed in slot $\ell$ in their respective bins. Therefore, the positions in $s_{\phi(j)}$ that correspond to these bins must also be $\ell$, i.e., $s_{\phi(j)}[b_{\phi(i)}] = \ell$ for all previous vertices $u_{\phi(i)}$.
In the remainder of the proof, we show that the structure in Lemma 7 can be avoided with non-zero probability for constant-sized cliques if we generate our strings uniformly at random, thus implying the existence of a good set of $t^2$ strings. Specifically, suppose the adversary picks each $s_j$ uniformly at random, i.e., for each character in $s_j$ we pick $w \in [\sqrt{t}]$ with probability $t^{-1/2}$. We define the following notation:
• Let $K$ be the event that the adversary creates a monochromatic clique of size 20 or greater.
• Let $K(S, c)$ be the event that a monochromatic clique $K$ of color $c$ and size 20 or greater forms such that the first 10 vertices to join $K$ are placed in the bins specified by the set of 10 indices $S$.
• Let $P_j(S, q)$ be a random variable that is 1 if $s_j[i] = q$ for all $i \in S$, and 0 otherwise. Let $P(S, q) = \sum_{j=1}^{t^2} P_j(S, q)$.
• Let $q(c) \in [\sqrt{t}]$ be the index of the color set to which color $c$ belongs (i.e., $c \in C_{q(c)}$).
• Let $[n]_k := \binom{[n]}{k}$ denote the set of all size-$k$ subsets of $[n]$.
The next lemma follows from standard Chernoff-Hoeffding bounds, which we state first for completeness.
Theorem 8. (Chernoff-Hoeffding Bounds (e.g., [36])) Let $X_1, X_2, \ldots, X_n$ be independent binary random variables and let $a_1, a_2, \ldots, a_n$ be coefficients in $[0, 1]$. Let $X = \sum_i a_i X_i$. Then,
• For any $\mu \ge E[X]$ and any $\delta > 0$, $\Pr[X > (1+\delta)\mu] \le \left(\frac{e^\delta}{(1+\delta)^{(1+\delta)}}\right)^\mu$.
• For any $\mu \le E[X]$ and $\delta > 0$, $\Pr[X < (1-\delta)\mu] \le e^{-\mu\delta^2/2}$.
We are now ready to state and prove the lemma.
Lemma 9.
If the adversary picks each $s_j$ uniformly at random, then $\Pr[P(S, q) \geq 10] < t^{-12}$.

Proof: First, we observe that for any set $S \in [t]_{10}$ and any $r \in [\sqrt{t}]$, we have $\Pr[P_i(S, r) = 1] = (1/\sqrt{t})^{10} = t^{-5}$. Therefore, by linearity of expectation, we have

$E[P(S, r)] = E\left[ \sum_{i=1}^{t} P_i(S, r) \right] = t \cdot t^{-5} = t^{-4}$.  (1)

Applying Theorem 8 to $P(S, r)$ with $X_i = P_i(S, r)$, $a_i = 1$, $\delta = 10t^4 - 1$, and $\mu = t^{-4}$ from Eqn. (1), we get

$\Pr[P(S, r) \geq 10] \leq \left( \frac{e^{10t^4 - 1}}{(10t^4)^{10t^4}} \right)^{t^{-4}} \leq \left( \frac{e}{10} \right)^{10} \cdot \left( \frac{1}{t} \right)^{40} < t^{-12}$.

Using Lemmas 7 and 9, we argue that there exists an offline solution with no monochromatic clique of super-constant size.
Lemma 10.
There exists an offline solution where every monochromatic clique is of size $O(1)$.

Proof: To show the existence of a good set of $t$ strings, it is sufficient to show that $\Pr[K] < 1$. Using Lemma 9, we in fact show this event occurs with low probability. Observe that

$\Pr[K] \leq \sum_{c} \sum_{S \in [t]_{10}} \Pr[K(S, c)] \leq \sum_{c} \sum_{S \in [t]_{10}} \Pr[P(S, q(c)) \geq 10]$.  (2)

(Here, 20 is an arbitrarily chosen large enough constant.) The first inequality is a straightforward union bound. The second inequality follows by Lemma 7: if the event $K(S, c)$ occurs, then Lemma 7 implies $s_j[b_i] = q(c)$ for the $j$th vertices to join the clique with $j = 11, \ldots, 20$ and all $i \in S$, and hence $P(S, q(c)) \geq 10$.

Since there are $t^{3/2}$ possible colors and $|[t]_{10}| < t^{10}$, applying both (2) and Lemma 9 we get

$\Pr[K] \leq \sum_{c} \sum_{S \in [t]_{10}} \Pr[P(S, q(c)) \geq 10] \leq \sum_{c} \sum_{S \in [t]_{10}} t^{-12} \leq t^{3/2} \cdot t^{10} \cdot t^{-12} = t^{-1/2} < 1$

for all $t > 1$. Therefore, there is an optimal coloring such that there is no monochromatic clique of size more than 20. Theorem 5 now follows directly from Lemmas 6 and 10.
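The first-moment computation behind Lemma 9 can be sanity-checked numerically. The sketch below is our own illustration (function names and the scaled-down parameters are not from the paper): it uses a prefix set $S$ of size 2 instead of 10, so that $E[P(S, q)] = t \cdot (1/\sqrt{t})^2 = 1$ exactly, and checks the final union-bound arithmetic $t^{3/2} \cdot t^{10} \cdot t^{-12} < 1$.

```python
import random

def expected_matches(t, S, q, trials=2000, rng=None):
    """Monte-Carlo estimate of E[P(S, q)]: the number of the t random
    strings (alphabet of size sqrt(t)) whose characters at every index
    in S equal q. Only the characters at indices in S are drawn, which
    is distributionally identical to drawing whole strings."""
    rng = rng or random.Random(0)
    sqrt_t = int(round(t ** 0.5))
    total = 0
    for _ in range(trials):
        count = 0
        for _ in range(t):  # one random string per bin
            if all(rng.randrange(sqrt_t) == q for _ in S):
                count += 1
        total += count
    return total / trials

# Scaled-down illustration: |S| = 2 instead of 10, so E[P(S, q)] = 1.
t = 16
est = expected_matches(t, S=[3, 7], q=0)

# Union-bound arithmetic from the proof of Lemma 10.
union_bound = (10 ** 6) ** 1.5 * (10 ** 6) ** 10 * (10 ** 6) ** (-12.0)
```

With the fixed seed, `est` concentrates tightly around the exact expectation 1, and `union_bound` evaluates to $t^{-1/2} < 1$.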
2) Lower Bound for VSMAX-I and VSALL-I from MONOCHROMATIC CLIQUE: We are now ready to use Theorem 5 to show an $\Omega(\log d / \log\log d)$ lower bound for VSMAX-I. We will describe a lower bound instance for VSMAX-I whose structure is based on an instance of MONOCHROMATIC CLIQUE. This will allow us to use the lower bound instance from Theorem 5 as a black box to produce the desired lower bound for VSMAX-I.

We first set the problem definition of MONOCHROMATIC CLIQUE to be for $m$ colors, where $m$ is also the number of machines used in the VSMAX-I instance. Let $I_C$ be the lower-bound instance for this problem given by Theorem 5. This produces a graph $G$ of $m$ vertices such that the algorithm forms a monochromatic clique of size $\sqrt{m}$, whereas the largest monochromatic clique in the optimal solution is of size $O(1)$. Let $G_j = (V_j, E_j)$ be the graph in $I_C$ after vertices $v_1, \ldots, v_j$ have been issued (and so $G_m = G$). We define the corresponding lower bound instance for VSMAX-I as follows (see Figures 3 and 4 for an illustration):

• There are $m$ jobs, which correspond to vertices $v_1, \ldots, v_m$ from $I_C$.
• Each job has $d = \binom{m}{\sqrt{m}}$ dimensions, where each dimension corresponds to a specific $\sqrt{m}$-sized vertex subset of the $m$ vertices. Let $S_1, \ldots, S_d$ be an arbitrary ordering of these subsets.
• Job vectors will be binary. Namely, the $k$th vector entry for job $j$ is 1 if $v_j \in S_k$ and the vertices in $\{v_1, \ldots, v_j\} \cap S_k$ form a clique in $G_j$ (if $\{v_1, \ldots, v_j\} \cap S_k = \{v_j\}$, then it is considered a 1-clique); otherwise, the $k$th entry is 0.
• Let $c_1, \ldots, c_m$ define an ordering on the available colors from $I_C$. We match each color from $I_C$ to a machine in our scheduling instance. Therefore, when the VSMAX-I algorithm makes an assignment for a job, we translate this machine assignment as the corresponding color assignment in $I_C$. Formally, if job $j$ is placed on machine $i$ in the scheduling instance, then vertex $v_j$ is assigned color $c_i$ in $I_C$.

Since assigning jobs to machines corresponds to colorings in $I_C$, it follows that the largest load in dimension $k$ is the size of the largest monochromatic sub-clique in $S_k$. $I_C$ is given by the construction in Theorem 5; therefore, at the end of the instance, there will exist a dimension $k'$ such that the online algorithm colored every vertex in $S_{k'}$ with some color $c_i$.
Thus, machine $i$ will have $\sqrt{m}$ load in dimension $k'$. In contrast, Theorem 5 ensures that all the monochromatic cliques in the optimal solution are of size $O(1)$, and therefore the load on every machine in dimension $k'$ is $O(1)$. The relationship between $m$ and $d$ is given as follows.

Fact 11. If $d = \binom{m}{\sqrt{m}}$, then $\sqrt{m} = \Omega(\log d / \log\log d)$.

Proof: We will use the following well-known bounds on $\binom{n}{k}$: for integers $1 \leq k \leq n$, $\left( \frac{n}{k} \right)^k \leq \binom{n}{k} \leq \left( \frac{en}{k} \right)^k$. First, we observe that

$\log d = \log \binom{m}{\sqrt{m}} \leq \log \left( \frac{em}{\sqrt{m}} \right)^{\sqrt{m}} = \sqrt{m} \cdot \log(e\sqrt{m}) \leq \sqrt{m} \cdot (1 + (3/2)\log m)$.  (3)

We also have

$\log\log d \geq \log\log \left( \frac{m}{\sqrt{m}} \right)^{\sqrt{m}} = \log\left( \sqrt{m} \cdot (1/2)\log m \right) \geq (1/2)\log m$.  (4)

Hence, combining Eqns. (3) and (4), we obtain $\sqrt{m} \geq \frac{\log d}{1 + (3/2)\log m} \geq \frac{\log d}{1 + 3\log\log d}$, which implies that $\sqrt{m} = \Omega(\log d / \log\log d)$, as desired.

To end the section, we show that our lower bound for VSMAX-I extends to general $L_r$ norms (Theorem 3). As before, our lower bound construction forces any algorithm to schedule jobs so that there exists a dimension $k'$ where at least one machine has load at least $\sqrt{m}$, whereas the load on every dimension of every machine in the optimal solution is bounded by some constant $C$. Since any dimension has at most $\sqrt{m}$ jobs with load 1, any assignment ensures that there are at most $\sqrt{m}$ machines with non-zero load in a given dimension. Therefore, in the optimal solution, the $L_r$-norm of the load vector for dimension $k'$ is at most $(C^r \cdot \sqrt{m})^{1/r} = C \cdot m^{1/(2r)}$. Thus, the ratio between the objective of the solution produced by the online algorithm and the optimal solution is at least $m^{1/2} / (C \cdot m^{1/(2r)}) = (1/C) \cdot m^{(r-1)/(2r)}$. Using Fact 11, we conclude the lower bound.
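For concreteness, the job-vector construction above can be sketched in code. This is our own illustration (the function names and the tiny example graph are not from the paper): one dimension per $\sqrt{m}$-sized vertex subset, with a 1-entry exactly when the issued vertices of that subset form a clique containing $v_j$.

```python
from itertools import combinations
from math import comb

def job_vector(j, edges, m, sqrt_m):
    """Binary load vector of job j (vertex v_j, 1-indexed): the entry for
    subset S is 1 iff v_j is in S and the vertices of S issued so far
    (v_1..v_j) form a clique; a lone vertex counts as a 1-clique."""
    subsets = list(combinations(range(1, m + 1), sqrt_m))  # S_1, ..., S_d
    vec = []
    for S in subsets:
        issued = [v for v in S if v <= j]
        is_clique = all((u, v) in edges or (v, u) in edges
                        for u, v in combinations(issued, 2))
        vec.append(1 if j in S and is_clique else 0)
    return vec

# Example with m = 9, sqrt(m) = 3, so d = C(9,3) = 84 dimensions:
# when v_6 arrives, vertices 2, 3, 6 form a triangle but 4-6 is missing.
edges = {(2, 3), (2, 6), (3, 6), (2, 4)}
v6 = job_vector(6, edges, m=9, sqrt_m=3)
subsets = list(combinations(range(1, 10), 3))
d = comb(9, 3)
```

The entry for a subset such as $\{6,7,8\}$, whose only issued vertex is $v_6$ itself, is 1 by the 1-clique convention.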
Fig. 3. Illustration of the lower bound construction for VSMAX-I using the MONOCHROMATIC CLIQUE lower bound (Theorem 5) for an instance where $m = 9$ (and thus $d = \binom{9}{3} = 84$ for the VSMAX-I instance and $t = 9$ for the MONOCHROMATIC CLIQUE instance). Currently job 6 is being issued; its binary load vector, which is based on the current edge structure in the MONOCHROMATIC CLIQUE instance, is given above the machines/dimensions. Observe that job 6 has 0 load in the first three dimensions and the last dimension, since 6 is not contained in any of these dimensions' $S_k$ sets (indicated below). It does have load 1 in the dimension corresponding to set $\{2, 3, 6\}$, since vertex 6 forms a clique with vertices 2 and 3 in the MONOCHROMATIC CLIQUE instance; however, it still has load 0 in dimension $\{2, 4, 6\}$, since vertex 6 does not form a clique with vertices 2 and 4.
Fig. 4. State of the construction after job 6 is assigned to machine 3. Since black is the color we associated with machine 3, this job assignment by the VSMAX-I algorithm is translated as coloring vertex 6 black in the MONOCHROMATIC CLIQUE instance.
B. Upper Bounds for VSMAX-I and VSALL-I

In this section, we prove the upper bounds in Theorem 1 (VSMAX-I) and Theorem 3 (VSALL-I). First, we give a randomized $O(\log d / \log\log d)$-competitive online algorithm for VSMAX-I (Section II-B1) and then show how to derandomize it (Section II-B2). Next, we give an $O((\log d / \log\log d)^{\frac{r-1}{r}})$-competitive algorithm for VSALL-I (Section II-B3), i.e., for each dimension $k$ and $1 \leq r \leq \log m$, $\|\Lambda(k)\|_r$ is competitive with the optimal schedule for dimension $k$ under the $L_r$ norm objective.

Throughout the section, we assume that a priori, the online algorithm is aware of both the final volume of all jobs on each dimension and the largest load over all dimensions and jobs. We note that the lower bounds claimed in Theorems 1 and 3 are robust against this assumption, since the optimal makespan is always a constant and this knowledge does not help the online algorithm. Furthermore, these assumptions can be completely removed for our VSMAX-I algorithm by updating a threshold on the maximum job load on any dimension and the total volume of jobs that the algorithm has observed so far. However, in order to make our presentation more transparent and our notation simple, we present our results under these assumptions.

For each job $j$ that arrives online, both our VSMAX-I and VSALL-I algorithms will perform the following transformation:

• Transformation 1: Let $V = \langle V_1, \ldots, V_d \rangle$ be the volume vector given to the algorithm a priori, where $V_k$ denotes the total volume of all jobs for dimension $k$. For this transformation, we normalize $p_j(k)$ by dividing it by $V_k / m$ (for ease of notation, we will still refer to this normalized value as $p_j(k)$).

Our VSMAX-I and VSALL-I algorithms will also perform subsequent transformations; however, these transformations will differ slightly for the two algorithms.

1) Randomized Algorithm for VSMAX-I: We now present our randomized $O(\log d / \log\log d)$-competitive algorithm for VSMAX-I. Informally, our algorithm works as follows. For each job $j$, we first attempt to assign it to a machine $i$ chosen uniformly at random; however, if the resulting assignment would result in a load larger than $O(\log d / \log\log d)$ on machine $i$, then we dismiss the assignment and instead assign $j$ greedily among other previously dismissed jobs. In general, a greedy assignment can be as bad as $\Omega(d)$-competitive; however, in our analysis we show that a job is dismissed by the random assignment with low probability. Therefore, in expectation, the total volume of these jobs is low enough to assign greedily and still remain competitive.

Instance Transformations:
Before formally defining our algorithm, we define additional online transformations and outline the properties that these transformations guarantee. Note that we perform these transformations for both the randomized algorithm presented in this section and the derandomized algorithm presented in Section II-B2. These additional transformations are defined as follows (and are performed in sequence after Transformation 1):

• Transformation 2: Let $T$ be the load of the largest job in the instance (given a priori). If for dimension $k$ we have $T \geq V_k / m$, then for each job $j$ we set $p_j(k)$ to be $(p_j(k) \cdot V_k) / (mT)$. In other words, we normalize jobs in dimension $k$ by $T$ instead of $V_k / m$.

• Transformation 3: For each job $j$ and dimension $k$, if $p_j(k) < (1/d) \max_{k'} p_j(k')$, then we increase $p_j(k)$ to $(1/d) \max_{k'} p_j(k')$.

Observe that after we apply Transformations 1 and 2 to all jobs, we have $\sum_j p_j(k) \leq m$ for all $k \in [d]$ and $0 \leq p_j(k) \leq 1$ for all jobs $j$ and $k \in [d]$.

In Lemmas 12 and 13, we prove additional properties that Transformation 3 preserves. Since Transformations 1 and 2 are simple scaling procedures, an $\alpha$-competitive algorithm on the resulting scaled instance is also $\alpha$-competitive on the original instance, if we only apply the first two transformations. In Lemma 12, we prove that this property is still maintained after Transformation 3.

Lemma 12.
After Transformations 1 and 2 have been applied, Transformation 3 increases the optimal makespan by a factor of at most 2.

Proof: Fix a machine $i$ and a dimension $k$. Let OPT denote the optimal assignment before Transformation 3 is applied. Let $J^*(i)$ denote the jobs assigned to machine $i$ in OPT, $\Lambda^*_i(k)$ be the load of OPT on machine $i$ in dimension $k$, and $\Lambda^* = \max_{i,k} \Lambda^*_i(k)$ denote the makespan of OPT. We will show that Transformation 3 can increase the load on machine $i$ in dimension $k$ by at most $\Lambda^*$.

Let $V^*_i = \sum_{j \in J^*(i)} \sum_{k' \in [d]} p_j(k')$ denote the total volume of jobs that OPT assigns to machine $i$. Observe that by a simple averaging argument, we have $V^*_i / d \leq \max_{k' \in [d]} \Lambda^*_i(k')$. Since Transformation 3 can increase the load of a job $j$ in a fixed dimension by at most $(1/d) \max_{k'} p_j(k')$, we can upper bound the total increase in load on machine $i$ in dimension $k$ as follows:

$\sum_{j \in J^*(i)} (1/d) \max_{k'} p_j(k') \leq V^*_i / d \leq \max_{k' \in [d]} \Lambda^*_i(k') \leq \Lambda^*$,  (5)

as desired. Note that the first inequality follows from the fact that the sum of maximum loads on a machine is at most the total volume of its jobs.

Recall that after Transformations 1 and 2, $\sum_j p_j(k) \leq m$ for all $k \in [d]$. In Lemma 13, we show that this property is preserved within a constant factor after Transformation 3.

Lemma 13.
After performing Transformation 3, $\sum_j p_j(k) \leq 2m$ for all $k \in [d]$.

Proof: Consider any fixed dimension $k \in [d]$. After Transformation 3, each job $j$'s load on dimension $k$ increases by at most $(1/d) \max_{k'} p_j(k')$. Hence, the total increase in load from jobs in dimension $k$ is at most

$\sum_j (1/d) \max_{k'} p_j(k') \leq (1/d) \sum_j \sum_{k' \in [d]} p_j(k') \leq (1/d) \cdot md \leq m$,

where the second inequality and the lemma follow from the fact that $\sum_j p_j(k) \leq m$ before Transformation 3.

In summary, the properties that we collectively obtain from these transformations are as follows:
• Property 1. For all $k \in [d]$, $\sum_j p_j(k) \leq 2m$.
• Property 2. For all $j$ and $k \in [d]$, $0 \leq p_j(k) \leq 1$.
• Property 3. For all $j$ and $k \in [d]$, $(1/d) \max_{k'} p_j(k') \leq p_j(k) \leq \max_{k'} p_j(k')$.
• Property 4. The optimal makespan is at least 1.

Property 1 is a restatement of Lemma 13. Property 2 was true after the first two transformations, and Transformation 3 has no effect on this property. Property 3 is a direct consequence of Transformation 3. To see why Property 4 is true, let $j$ be the job with the largest load $T$ in the instance, and let $k = \arg\max_{k'} p_j(k')$ (i.e., $\max_{k'} p_j(k') = T$). If Transformation 2 is applied to dimension $k$, then $p_j(k) = 1$ afterwards, which immediately implies Property 4. Otherwise, only Transformations 1 and 3 are applied to dimension $k$ and we have $\sum_{j'} p_{j'}(k) \geq m$, which again leads to Property 4 by a simple volume argument. Thus, by Property 4 and Lemma 12, it is sufficient to show that the makespan of the algorithm's schedule is $O(\log d / \log\log d)$.

Algorithm Definition:
As discussed earlier, our algorithm consists of two procedures: a random assignment and greedy packing. It will be convenient to assume that the algorithm has two disjoint sets $M_1, M_2$ of $m$ identical machines that will be used independently by the two procedures, respectively. Each machine in $M_1$ is paired with an arbitrary distinct machine in $M_2$, and the actual load on a machine will be evaluated as the sum of the loads on the corresponding pair of machines. In other words, to show competitiveness it is sufficient to prove that all machines in both $M_1$ and $M_2$ have load $O(\log d / \log\log d)$. Define the parameter $\alpha := \frac{10 \log d}{\log\log d}$. Our two procedures are formally defined as follows.

• First procedure (random assignment): Assign each job to one of the machines in $M_1$ uniformly at random. Let $J^1_j(i)$ denote the subset of the first $j$ jobs $\{1, 2, \ldots, j\}$ that are assigned to machine $i$ in this procedure, and let $\Lambda^1_{i,j}(k)$ denote the resulting load on machine $i$ on dimension $k$ due to jobs in $J^1_j(i)$. If $\Lambda^1_{i,j}(k) \geq \alpha + 1$ for some $k \in [d]$, then we pass job $j$ to the second procedure. (However, note that all jobs are still scheduled by the first procedure; so even if a job $j$ is passed to the second procedure after being assigned to machine $i$ in the first procedure, $j$ still contributes load to $\Lambda^1_i(k)$ for all $k$.)

• Second procedure (greedy packing): This procedure is only concerned with the jobs $J^2$ that are passed from the first procedure. It allocates each job in $J^2$ (in the order that the jobs arrive in) to one of the machines in $M_2$ such that the resulting makespan, $\max_{i \in M_2, k \in [d]} \Lambda^2_{i,j}(k)$, is minimized; $\Lambda^2_{i,j}(k)$ is defined analogously for this second procedure as above.

This completes the description of the algorithm. We will let $J^1(i) := J^1_n(i)$ and $\Lambda^1_i(k) := \Lambda^1_{i,n}(k)$, and define $J^2(i)$ and $\Lambda^2_i(k)$ similarly. We emphasize again that jobs in $J^2$ are scheduled only on machines $M_2$; all other jobs are scheduled on $M_1$ machines.

Algorithm Analysis:
It follows directly from the definition of the algorithm that the loads on machines in $M_1$ are at most $\alpha + 1 = O(\log d / \log\log d)$. Therefore, we are only left with bounding the loads on machines in $M_2$. The following lemma shows that the second procedure receives only a small fraction of the total volume, which then allows us to argue that the greedy assignment in the second procedure is competitive.

Lemma 14. The probability that a job $j$ is passed to the second procedure is at most $1/d$, i.e., $\Pr[j \in J^2] \leq 1/d$.

Proof: Fix a machine $i$, job $j$, and dimension $k$. Suppose job $j$ was assigned to machine $i$ by the first procedure and is passed to the second procedure because we would have had $\Lambda^1_{i,j}(k) \geq \alpha + 1$. Since $p_j(k) \leq 1$, this implies $\Lambda^1_{i,j-1}(k) \geq \alpha$. Therefore, we will show

$\Pr[\Lambda^1_{i,j-1}(k) \geq \alpha] \leq 1/d^2$,  (6)

where the probability space is over the random choices of jobs $1, 2, \ldots, j-1$. Once inequality (6) is established, the lemma follows from a simple union bound over all $d$ dimensions.

To show (6), we use standard Chernoff-Hoeffding bounds (stated in Theorem 8 earlier). Note that $E[\Lambda^1_{i,j-1}(k)] \leq 2$ due to Property 1 and the fact that jobs are assigned to machines uniformly at random. To apply the inequality, we define random variables $X_1, X_2, \ldots, X_{j-1}$, where $X_{j'} = 1$ if job $j'$ is assigned to machine $i$ and $X_{j'} = 0$ otherwise. Set the parameters of Theorem 8 as follows: $a_{j'} = p_{j'}(k)$, $\mu = 2$, and $\delta = \alpha/2 - 1$. Thus we have:

$\Pr[\Lambda^1_{i,j-1}(k) \geq \alpha] = \Pr\left[ \sum_{j' \in [j-1]} a_{j'} X_{j'} \geq (1+\delta)\mu \right] \leq \left( \frac{e^\delta}{(1+\delta)^{(1+\delta)}} \right)^{\mu} \leq \left( \frac{e}{1+\delta} \right)^{(1+\delta)\mu} = \left( \frac{e \log\log d}{5 \log d} \right)^{10 \log d / \log\log d} \leq 1/d^2$ (for sufficiently large $d$),

as desired.

Next, we upper bound the makespan of the second procedure in terms of its total volume of jobs $V^2$, i.e., $V^2 = \sum_{j \in J^2, k \in [d]} p_j(k)$.

Lemma 15. $\max_{i \in M_2, k \in [d]} \Lambda^2_i(k) \leq V^2/m + 1$.

Proof: For the sake of contradiction, suppose that at the end of the instance there exists a dimension $k$ and machine $i$ such that $\Lambda^2_i(k) > V^2/m + 1$. Let $j$ be the job that made machine $i$ first cross this $V^2/m + 1$ threshold in dimension $k$. For each machine $i'$, let $k_{i'} = \arg\max_{k'} \Lambda^2_{i',j-1}(k')$ denote the dimension with maximum load on machine $i'$ before $j$ was assigned. By Property 2 and the greediness of the algorithm, we have that $\Lambda^2_{i',j-1}(k_{i'}) > V^2/m$ for all $i'$; otherwise, $j$ would have been assigned to a machine other than $i$, resulting in a makespan less than $V^2/m + 1$ (since $\max_{k,j} p_j(k) \leq 1$). However, this implies that every machine in $M_2$ has a dimension with more than $V^2/m$ load. Clearly, this contradicts the definition of $V^2$.

We are now ready to complete the analysis. From Lemma 14 and linearity of expectation, we know that

$E[V^2] \leq \frac{1}{d} \sum_{j, k \in [d]} p_j(k) \leq \frac{1}{d} \cdot 2md = 2m$,  (7)

where the second inequality follows from Property 1. Hence, inequality (7), along with Lemma 15, implies that the second procedure yields an expected makespan of $O(1)$, which completes our analysis.
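The two procedures can be sketched as follows. This is our own minimal simulation (the machine count, the value of $\alpha$, and the random jobs are illustrative, not the paper's parameters), assuming loads are already normalized so that every $p_j(k) \in [0, 1]$; the greedy rule here minimizes the chosen machine's resulting maximum coordinate, a simplification of minimizing the global makespan.

```python
import random

def vsmax_i_randomized(jobs, m, alpha, rng):
    """Two-procedure assignment: uniform random placement on M1, with
    jobs that would push some virtual coordinate to alpha+1 or more
    re-routed to a greedy rule on the disjoint machine set M2."""
    d = len(jobs[0])
    virt = [[0.0] * d for _ in range(m)]   # virtual loads on M1 (all jobs count)
    real1 = [[0.0] * d for _ in range(m)]  # actual loads on M1 (kept jobs only)
    load2 = [[0.0] * d for _ in range(m)]  # loads on M2 (passed jobs only)
    for p in jobs:
        i = rng.randrange(m)
        for k in range(d):
            virt[i][k] += p[k]
        if any(virt[i][k] >= alpha + 1 for k in range(d)):
            # pass to the second procedure: greedy packing on M2
            best = min(range(m),
                       key=lambda i2: max(load2[i2][k] + p[k] for k in range(d)))
            for k in range(d):
                load2[best][k] += p[k]
        else:
            for k in range(d):
                real1[i][k] += p[k]
    return real1, load2

rng = random.Random(1)
m, d, alpha = 8, 4, 5.0
jobs = [[rng.random() for _ in range(d)] for _ in range(60)]
real1, load2 = vsmax_i_randomized(jobs, m, alpha, rng)
```

By construction, every actual coordinate on an $M_1$ machine stays below $\alpha + 1$, and no volume is lost between the two procedures.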
2) Derandomized Algorithm for VSMAX-I: Our derandomization borrows the technique developed in [14]. To derandomize the algorithm, we replace the first procedure — a uniformly random assignment — with a deterministic assignment guided by the following potential $\Phi$. Let $f(x) := \alpha^x$ for notational simplicity. Recall that $\alpha := 10 \log d / \log\log d$.

$\Phi_{i,k}(j) := f\left( \Lambda^1_{i,j}(k) - \frac{\alpha}{m} \sum_{j' \in [j]} p_{j'}(k) \right) \quad \forall i \in M_1, j \in [n], k \in [d]$

$\Phi(j) := \sum_{i \in M_1} \sum_{k=1}^{d} \Phi_{i,k}(j)$

• (New deterministic) first procedure. Each job $j$ is assigned to a machine $i$ such that $\Phi(j)$ is minimized. If $\Lambda^1_{i,j}(k) \geq 4\alpha + 1$, then $j$ is added to queue $J^2$ so that it can be scheduled by the second procedure. As before, each job is scheduled by either the first procedure or the second, and contributes to the "virtual" load $\Lambda^1_{i,j}(k)$ in either case.

Lemma 16. $\Phi(j)$ is non-increasing in $j$.

Proof: Consider the arrival of job $j$. To structure our argument, we assume the algorithm still assigns $j$ to a machine in $M_1$ uniformly at random. Our goal now is to show that $E[\Phi(j)] \leq \Phi(j-1)$, which implies the existence of a machine $i$ such that assigning job $j$ to machine $i$ leads to $\Phi(j) \leq \Phi(j-1)$ (and such an assignment is actually found by the algorithm, since its assignment maximizes the decrease in potential). We bound $E[\Phi_{i,k}(j)]$ as follows:

$E[\Phi_{i,k}(j)] = \frac{1}{m} f\left( \Lambda^1_{i,j-1}(k) + p_j(k) - \frac{\alpha}{m} p_j(k) - \frac{\alpha}{m} \sum_{j' \in [j-1]} p_{j'}(k) \right) + \left(1 - \frac{1}{m}\right) f\left( \Lambda^1_{i,j-1}(k) - \frac{\alpha}{m} p_j(k) - \frac{\alpha}{m} \sum_{j' \in [j-1]} p_{j'}(k) \right)$

$= \Phi_{i,k}(j-1) \cdot \alpha^{-\frac{\alpha}{m} p_j(k)} \cdot \left( \frac{1}{m} \left( \alpha^{p_j(k)} - 1 \right) + 1 \right)$

$\leq \Phi_{i,k}(j-1) \cdot \alpha^{-\frac{\alpha}{m} p_j(k)} \left( \frac{p_j(k)}{m} (\alpha - 1) + 1 \right)$  (8)

$\leq \Phi_{i,k}(j-1) \cdot \exp\left( -(\alpha \log \alpha) \cdot \frac{p_j(k)}{m} \right) \exp\left( \frac{p_j(k)}{m} \cdot (\alpha - 1) \right)$  (9)

$\leq \Phi_{i,k}(j-1)$.

Inequality (8) follows since $\alpha^x - 1 \leq (\alpha - 1)x$ for $x \in [0, 1]$, and $p_j(k) \leq 1$ due to Property 2. Inequality (9) follows from the fact that $x + 1 \leq e^x$; the final inequality holds because $\alpha \log \alpha \geq \alpha - 1$. Therefore, by linearity of expectation, we have $E[\Phi(j)] \leq \Phi(j-1)$, thereby proving the lemma.

The next corollary follows from Lemma 16 and the simple observation that $\Phi(0) = md$.

Corollary 17. $\Phi(n) \leq md$.

As in Section II-B1, it is straightforward to see that the algorithm forces machines in $M_1$ to have makespan $O(\alpha)$, so we again focus on the second procedure of the algorithm. Here, we need a deterministic bound on the total volume $V^2 = \sum_{j \in J^2} \sum_{k \in [d]} p_j(k)$ that can be scheduled on machines in $M_2$. Lemma 18 provides us with such a bound.

Lemma 18. $V^2 \leq m/d$.

Proof: Consider a job $j \in J^2$ that was assigned to machine $i$ in the first procedure. Let $k(j)$ be an arbitrary dimension $k$ with $\Lambda^1_{i,j}(k) \geq 4\alpha + 1$ (such a dimension exists since $j \in J^2$). Let $J^2_i(k) = \{ j : j \in J^1(i) \cap J^2 \text{ and } k(j) = k \}$ denote the set of jobs $j \in J^2$ that were assigned to machine $i$ by the first procedure and are associated with dimension $k$. We upper bound $V^2$ as follows:

$V^2 = \sum_{j \in J^2} \sum_{k' \in [d]} p_j(k') = \sum_{i \in M_1, k \in [d]} \sum_{j \in J^2_i(k)} \sum_{k' \in [d]} p_j(k')$ (since we associate each job $j \in J^2$ with a unique dimension $k(j)$)

$\leq \sum_{i \in M_1, k \in [d]} \sum_{j \in J^2_i(k)} d^2 p_j(k)$ (by Property 3)

$\leq d^2 \sum_{i \in M_1} \sum_{k \in [d]} \left( \Lambda^1_i(k) - 4\alpha \right)_+$.  (10)

To see why the last inequality holds, recall that $\Lambda^1_{i,j}(k) \geq 4\alpha + 1$ when $j \in J^2_i(k)$ and $k = k(j)$. This can happen only when $\Lambda^1_{i,j-1}(k) \geq 4\alpha$, since $p_j(k) \leq 1$ due to Property 2. Since $\Lambda^1_{i,j'}(k)$ is non-decreasing in $j'$, the sum of $p_j(k)$ over all such jobs $j$ is at most $(\Lambda^1_i(k) - 4\alpha)_+$; here $(x)_+ := \max\{0, x\}$.

We claim that for all $i \in M_1$, $k \in [d]$,

$\Phi_{i,k}(n) \geq \alpha^{2\alpha} \left( \Lambda^1_{i,n}(k) - 4\alpha \right)_+.$  (11)

If $\Lambda^1_{i,n}(k) - 4\alpha \leq 0$, then the claim is obviously true, since $\Phi_{i,k}(n)$ is always non-negative. Otherwise, we have

$\Phi_{i,k}(n) \geq \alpha^{\Lambda^1_{i,n}(k) - 2\alpha} \geq \alpha^{2\alpha} \left( \Lambda^1_{i,n}(k) - 4\alpha \right)$,

where the first inequality follows from Property 1 (which gives $\frac{\alpha}{m} \sum_{j'} p_{j'}(k) \leq 2\alpha$), and the second follows since $\alpha^x \geq x$ for $x \geq 0$. So in either case, (11) holds.

By combining (10), (11), Corollary 17, and recalling $\alpha = \frac{10 \log d}{\log\log d}$ (so that $\alpha^{2\alpha} \geq d^4$ for sufficiently large $d$), we have

$V^2 \leq d^2 \sum_i \sum_k \left( \Lambda^1_i(k) - 4\alpha \right)_+ \leq d^2 \alpha^{-2\alpha} \sum_i \sum_k \Phi_{i,k}(n) \leq d^2 \alpha^{-2\alpha} \cdot md \leq \frac{m}{d}$.

By Lemma 15, we have $\max_{i \in M_2, k \in [d]} \Lambda^2_i(k) \leq V^2/m + 1 = O(1)$. Thus, we have shown that each of the two deterministic procedures yields a makespan of $O(\alpha) = O(\log d / \log\log d)$, thereby proving the upper bound.
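A minimal sketch of the potential-guided assignment follows. This is our own illustration, not the paper's implementation: $\alpha$ and the jobs are small toy values (the argument only needs $p_j(k) \in [0,1]$ and $\alpha \ln \alpha \geq \alpha - 1$), and we track only the first-procedure virtual loads. The testable property is Lemma 16: the potential never increases.

```python
def phi(loads, prefix_vol, alpha, m):
    """Potential: sum over machine-dimension pairs (i, k) of
    alpha ** (loads[i][k] - (alpha/m) * volume seen so far in dimension k)."""
    return sum(alpha ** (loads[i][k] - (alpha / m) * prefix_vol[k])
               for i in range(len(loads)) for k in range(len(prefix_vol)))

def assign_jobs(jobs, m, alpha):
    d = len(jobs[0])
    loads = [[0.0] * d for _ in range(m)]
    prefix_vol = [0.0] * d
    history = [phi(loads, prefix_vol, alpha, m)]  # Phi(0) = m * d
    for p in jobs:
        for k in range(d):
            prefix_vol[k] += p[k]  # the subtracted sum includes job j
        def value(i):
            # potential if job j were placed on machine i (try, then undo)
            for k in range(d):
                loads[i][k] += p[k]
            v = phi(loads, prefix_vol, alpha, m)
            for k in range(d):
                loads[i][k] -= p[k]
            return v
        best = min(range(m), key=value)
        for k in range(d):
            loads[best][k] += p[k]
        history.append(phi(loads, prefix_vol, alpha, m))
    return loads, history

jobs = [[0.6, 0.1, 0.3], [0.2, 0.9, 0.4], [1.0, 0.0, 0.5],
        [0.3, 0.3, 0.3], [0.8, 0.2, 0.1], [0.5, 0.7, 0.6]]
loads, history = assign_jobs(jobs, m=4, alpha=4.0)
```

Since the minimizing machine does at least as well as a uniformly random one, the recorded potentials form a non-increasing sequence starting at $md$.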
3) Algorithm for VSALL-I: We now give our $O((\log d / \log\log d)^{\frac{r-1}{r}})$-competitive algorithm for VSALL-I. Throughout the section, let $A$ denote the $O(\log d / \log\log d)$-competitive algorithm for VSMAX-I defined in Section II-B2. Our VSALL-I algorithm essentially works by using $A$ as a black box; however, we will perform a smoothing transformation on large loads before scheduling jobs with $A$.

Algorithm Definition: We will apply the following transformation to all jobs $j$ that arrive online, after Transformation 1 has been performed (note that this is in replacement of Transformations 2 and 3 defined in Section II-B1).

• Transformation 2: If $p_j(k) > 1$, we reduce $p_j(k)$ to be 1. If this load reduction is applied in dimension $k$ for job $j$, we say $j$ is large in $k$; otherwise, $j$ is small in dimension $k$.

It is straightforward to see that Transformations 1 and 2 provide the following two properties:
• Property 1: $\sum_{j \in J} p_j(k) \leq m$ for all $k \in [d]$.
• Property 2: $0 \leq p_j(k) \leq 1$ for all $j \in J$, $k \in [d]$.

On this transformed instance, our algorithm simply schedules jobs using our VSMAX-I algorithm $A$.

Algorithm Analysis:
Let $\alpha = O(\log d / \log\log d)$ be the competitive ratio of algorithm $A$. Clearly, if we can establish $\alpha^{(r-1)/r}$-competitiveness for the scaled instance (i.e., just applying Transformation 1 to all jobs but not Transformation 2), then our algorithm is competitive on the original instance as well. Let $OPT'(k, r)$ be the cost of the optimal solution on the scaled loads in dimension $k$. In Lemma 19, we establish two lower bounds on $OPT'(k, r)^r$.

Lemma 19. $OPT'(k, r)^r \geq \max\left\{ \sum_{j \in J} p_j(k)^r, \; m \cdot \left( \sum_{j \in J} p_j(k) / m \right)^r \right\} = \max\left\{ \sum_{j \in J} p_j(k)^r, \; m \right\}$.

Proof:
Consider any fixed assignment of jobs, and let $J'(i) \subseteq J$ be the set of jobs assigned to machine $i$. Consider any fixed $k$. The first lower bound (within the max in the statement of the lemma) follows since

$\sum_{i \in M} \left( \sum_{j \in J'(i)} p_j(k) \right)^r \geq \sum_{i \in M} \sum_{j \in J'(i)} p_j(k)^r = \sum_{j \in J} p_j(k)^r$.

The second lower bound is due to the convexity of $x^r$ when $r \geq 1$.

Let $J(i) \subseteq J$ be the set of jobs assigned to machine $i$ by the online algorithm. Let $\ell(i, k)$ and $s(i, k)$ be the sets of jobs assigned to machine $i$ that are large and small in dimension $k$, respectively. For brevity, let $\sigma_\ell(i, k) = \sum_{j \in \ell(i,k)} p_j(k)$ and $\sigma_s(i, k) = \sum_{j \in s(i,k)} p_j(k)$. Observe that since algorithm $A$ is $\alpha$-competitive on an instance with both Properties 1 and 2, we obtain the following additional two properties for the algorithm's schedule:
• Property 3: $|\ell(i, k)| \leq \alpha$ for all $i \in M$, $k \in [d]$.
• Property 4: $\sigma_s(i, k) \leq \alpha$ for all $i \in M$, $k \in [d]$.

Using these additional properties, the next two lemmas will bound the contribution of both large and small loads to the objective; namely, we need to bound both $\sigma_\ell(i, k)^r$ and $\sum_i \sigma_s(i, k)^r$ in terms of $\alpha$. Lemma 20 provides this bound for large loads, while Lemma 21 will be used to bound small loads.

Lemma 20. $\sigma_\ell(i, k)^r = \left( \sum_{j \in \ell(i,k)} p_j(k) \right)^r \leq \alpha^{r-1} \sum_{j \in \ell(i,k)} p_j(k)^r$.

Proof: Let $h = |\ell(i, k)|$. Then, it follows that

$\left( \sum_{j \in \ell(i,k)} p_j(k) \right)^r = h^r \left( \sum_{j \in \ell(i,k)} \frac{p_j(k)}{h} \right)^r \leq h^r \cdot \frac{1}{h} \sum_{j \in \ell(i,k)} p_j(k)^r$ (due to the convexity of $x^r$) $= h^{r-1} \sum_{j \in \ell(i,k)} p_j(k)^r \leq \alpha^{r-1} \sum_{j \in \ell(i,k)} p_j(k)^r$ (by Property 3).

Recall that by Property 1, we have $\sum_i \sigma_s(i, k) \leq m$. Using this fact along with Property 4, the general statement shown in Lemma 21 will immediately provide us with the desired bound on $\sum_i \sigma_s(i, k)^r$ (stated formally in Corollary 22).
Let $f(x) = x^r$ for some $r \geq 1$, whose domain is defined over a set of variables $x_1, \ldots, x_m \in [0, \alpha]$, where $\alpha \geq 1$. If $\sum_{i=1}^m x_i \leq m$, then

$\sum_{i=1}^{m} f(x_i) \leq 2m\alpha^{r-1}$.

Proof: Let $\tilde{f} = \sum_{i=1}^m f(x_i)$. We claim that $\tilde{f}$ is maximized when $0 < x_i < \alpha$ for at most one $i \in [m]$. If there are two such variables $x_i$ and $x_j$ with $0 < x_i \leq x_j < \alpha$, it is easy to see that we can further increase $\tilde{f}$ by decreasing $x_i$ and increasing $x_j$ by an infinitesimal equal amount (i.e., $x_i \leftarrow x_i - \epsilon$ and $x_j \leftarrow x_j + \epsilon$) due to the convexity of $f$.

Hence, $\tilde{f}$ is maximized when the multi-set $\{x_i : i \in [m]\}$ has $\lfloor m/\alpha \rfloor$ copies of $\alpha$ and one copy of $m - \alpha \lfloor m/\alpha \rfloor$ (which is at most $\alpha$), which gives

$\tilde{f} \leq \lfloor m/\alpha \rfloor f(\alpha) + f(m - \alpha \lfloor m/\alpha \rfloor)$.  (12)

If $\lfloor m/\alpha \rfloor \geq 1$, then it follows that

$\sum_{i=1}^{m} f(x_i) = \tilde{f} \leq (\lfloor m/\alpha \rfloor + 1) f(\alpha)$ (by Eqn. (12) and since $m - \alpha \lfloor m/\alpha \rfloor \leq \alpha$) $\leq (2m/\alpha) \alpha^r = 2m\alpha^{r-1}$ (since $\lfloor m/\alpha \rfloor \geq 1$).

In the case where $m < \alpha$, $\tilde{f}$ is maximized by making a single $x_i = m$. Therefore, $\tilde{f} \leq f(m) = m^r \leq m\alpha^{r-1}$.
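The bound of Lemma 21 can be sanity-checked numerically. This is our own check (the random feasible points and parameter values are illustrative): it verifies the inequality both on random points satisfying the constraints and on the extremal configuration identified in the proof.

```python
import random

def lemma21_bound_holds(xs, m, alpha, r):
    """Check sum x_i^r <= 2 * m * alpha^(r-1), given x_i in [0, alpha]
    and sum x_i <= m (the hypotheses of Lemma 21)."""
    assert all(0 <= x <= alpha for x in xs) and sum(xs) <= m + 1e-9
    return sum(x ** r for x in xs) <= 2 * m * alpha ** (r - 1) + 1e-9

rng = random.Random(7)
m, alpha, r = 16, 3.0, 2.5
ok = True
for _ in range(200):
    xs = [rng.uniform(0, alpha) for _ in range(m)]
    scale = min(1.0, m / sum(xs))  # enforce the volume constraint
    xs = [x * scale for x in xs]
    ok = ok and lemma21_bound_holds(xs, m, alpha, r)

# Extremal point from the proof: floor(m/alpha) copies of alpha plus the remainder.
extremal = [alpha] * 5 + [m - alpha * 5]
```

The extremal configuration ($\lfloor 16/3 \rfloor = 5$ copies of $\alpha = 3$ plus one copy of the remainder 1) also satisfies the bound, as the proof guarantees.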
For all dimensions $k$, $\sum_{i \in M} \sigma_s(i, k)^r \leq 2m\alpha^{r-1}$.

We are now ready to bound $\|\Lambda(k)\|_r$ against $OPT'(k, r)$.

Lemma 23.
For all dimensions $k$, $\|\Lambda(k)\|_r = O(\alpha^{(r-1)/r}) \cdot OPT'(k, r)$, i.e., the $L_r$ norm of the load vector is at most $O(\alpha^{(r-1)/r})$ times the $L_r$ norm of the load vector of the optimal solution.

Proof: Using Lemmas 19, 20, and Corollary 22, we have the following bound for $\|\Lambda(k)\|_r^r = \sum_{i \in M} \left( \sum_{j \in J(i)} p_j(k) \right)^r$:

$\sum_{i \in M} \left( \sum_{j \in J(i)} p_j(k) \right)^r = \sum_{i \in M} \left( \sum_{j \in \ell(i,k)} p_j(k) + \sum_{j \in s(i,k)} p_j(k) \right)^r \leq \sum_{i \in M} \left( 2 \max\left\{ \sum_{j \in \ell(i,k)} p_j(k), \sum_{j \in s(i,k)} p_j(k) \right\} \right)^r$

$\leq 2^r \sum_{i \in M} \left( \left( \sum_{j \in \ell(i,k)} p_j(k) \right)^r + \left( \sum_{j \in s(i,k)} p_j(k) \right)^r \right) \leq 2^r \left( \alpha^{r-1} \sum_{j} p_j(k)^r + 2m \cdot \alpha^{r-1} \right)$ (by Lemma 20 and Corollary 22)

$\leq (2^{r+2} \alpha^{r-1}) \cdot OPT'(k, r)^r$ (by Lemma 19),

which, raising both the LHS and RHS to the power $1/r$, gives us $\|\Lambda(k)\|_r \leq \left( 2^{(r+2)/r} \alpha^{(r-1)/r} \right) OPT'(k, r) = O(\alpha^{(r-1)/r}) \cdot OPT'(k, r)$.

The upper bound in Theorem 3 now follows immediately from Lemma 23.

III. UNRELATED MACHINES
Now, we consider the online vector scheduling problem for unrelated machines. In this section, we obtain tight upper andlower bounds for this problem, both for the makespan norm (Theorem 2) and for arbitrary L r norms (Theorem 4). A. Lower Bound for
VSANY - U In this section we prove the lower bound in Theorem 4, i.e., we show that we can force any algorithm to make an assignmentwhere there exists a dimension k that has cost at least Ω(log d + r k ) where ≤ r k ≤ log m .Our construction is an adaptation of the lower bounds in [15] and [6] but for a multidimensional setting. Informally, theinstance is defined as follows. We set m = d and then associate i th machine with the i th dimension, i.e., machine i onlyreceives load in the i th dimension. We then issue jobs in a series of log d + 1 phases. In a given phase, there will be a currentset of active machines , which are the only machines that can be loaded in the current phase and for the rest of the instance (soonce a machine is inactivated it stays inactive). More specifically, in a given phase we arbitrarily pair off the active machinesand then issue one job for each pair, where each job has unit load but is defined such that it must be assigned to a uniquemachine in its pair. When a phase completes, we inactivate all the machines that did not receive load (so we cut the numberof active machines in half). This process eventually produces a load of log d + 1 on some machine, whereas reversing thedecisions of the algorithm gives an optimal schedule where L k = 1 for all k ∈ [ d ] .More formally let d = 2 h . The adversary sets the instance target parameters to be T k = 1 for all k ∈ [ d ] (it will be clearfrom our construction that these targets are feasible). For each job j , let m ( j ) , m ( j ) ∈ [ m ] denote the machine pair theadversary associates with job j . We define j to have unit load on machines m ( j ) , m ( j ) in their respective dimensions andarbitrarily large load on all other machines. Formally, p i,j ( k ) is defined to be p i,j ( k ) = if i (cid:54) = k and i ∈ { m ( j ) , m ( j ) } if i = k and i ∈ { m ( j ) , m ( j ) }∞ otherwise . As discussed above, the adversary issues jobs in h + 1 phases. 
Phases 1 through h work as previously specified (we describe how the final (h+1)-th phase works shortly). Let S_ℓ denote the active machines in phase ℓ. In the ℓ-th phase, we issue a set of jobs J_ℓ where |J_ℓ| = 2^{h−ℓ}. We then pair off the machines in S_ℓ and use each machine pair as m_1(j) and m_2(j) for a unique job j ∈ J_ℓ. Clearly the algorithm must schedule j on m_1(j) or m_2(j), and thus 2^{h−ℓ} machines accumulate an additional load of 1 in phase ℓ. Machines that receive jobs in phase ℓ remain active in phase ℓ + 1; all other machines are set to be inactive. In the final phase h + 1, there will be a single remaining active machine i′; thus, we issue a single job j′ with unit load that must be scheduled on i′ (note that this final phase is added to the instance only to make our target vector feasible).

Based on this construction, at the end of the instance there will exist a dimension k′ that has load h + 1 on machine k′ and 0 on all other machines. Observe that the optimal schedule is obtained by reversing the decisions of the algorithm, which places a unit load on one machine in each dimension. Namely, if j was assigned to m_1(j), then the optimal schedule assigns j to m_2(j) (and vice versa), with the exception that j′ is assigned to its only feasible machine.

In the case that log d ≥ r_{k′}, the adversary stops. Since T_{k′} = 1 and L_{k′} = h + 1 = log d + 1, we have that L_{k′}/T_{k′} = Ω(log d + r_{k′}). If log d < r_{k′}, then the adversary stops the current instance and begins a new instance. In the new instance, we simply simulate the lower bound from [6] in dimension k′ (i.e., the only dimension that receives load is dimension k′; the adversary also resets the target vector accordingly).
Here, the adversary forces the algorithm to be Ω(r_{k′})-competitive, which, since log d < r_{k′}, gives us the desired bound of Ω(log d + r_{k′}).
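To make the pairing-and-halving construction concrete, the following self-contained simulation (our own sketch; `run_adversary` and `greedy` are illustrative names, not from the paper) plays the adversary against an arbitrary online rule and reports the algorithm's maximum load versus that of the reversed schedule:

```python
# Sketch of the adversary for d = m = 2^h, one dimension per machine.
# "algo" is any online rule that picks one of the two feasible machines per job.

def run_adversary(h, algo):
    d = 2 ** h
    load = [0] * d                      # load[i] = algorithm's load on machine i
    opt = [0] * d                       # loads under the reversed (optimal) schedule
    active = list(range(d))
    for _ in range(h):                  # phases 1..h
        next_active = []
        for a, b in zip(active[0::2], active[1::2]):
            chosen = algo(a, b, load)   # algorithm must pick m1(j) or m2(j)
            other = b if chosen == a else a
            load[chosen] += 1
            opt[other] += 1             # OPT reverses the decision
            next_active.append(chosen)  # only the loaded machine stays active
        active = next_active
    load[active[0]] += 1                # phase h+1: one forced job
    opt[active[0]] += 1
    return max(load), max(opt)

# An example online rule: assign to the less-loaded machine of the pair.
greedy = lambda a, b, load: a if load[a] <= load[b] else b
worst, best = run_adversary(5, greedy)  # d = 32 machines
```

For h = 5 (d = 32), any rule accumulates load h + 1 = 6 on the surviving machine, while the reversed schedule keeps every machine's load at 1, exhibiting the Ω(log d) gap.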
B. Upper Bound

Our goal is to prove the upper bound in Theorem 4. Recall that we are given targets T_k, and we have to show that ‖Λ(k)‖_{r_k} = O(log d + r_k) · T_k for all k ∈ [d]. (Λ(k) is the load vector in dimension k and r_k is the norm that we are optimizing.) First, we normalize p_{i,j}(k) to p_{i,j}(k)/T_k for all dimensions k; to keep the notation simple, we will also denote this normalized load p_{i,j}(k). This ensures that the target objective is 1 in every dimension. (We assume wlog that T_k > 0. If T_k = 0, the algorithm discards all assignments that put non-zero load on dimension k.)
1) Description of the Algorithm:
As described in the introduction, our algorithm is greedy with respect to a potential function defined on modified L_{r_k} norms. Let L_k = ‖Λ(k)‖_{r_k} denote the L_{r_k}-norm of the machine loads in the k-th dimension, and let q_k = r_k + log d denote the desired competitive ratio; all logs are base 2. We define the potential for dimension k as Φ_k = L_k^{q_k}. The potentials for the d different dimensions are combined using a weighted linear combination, where the weight of dimension k is α_k = (3 q_k)^{−q_k}. Note that dimensions that allow a smaller slack in the competitive ratio are given a larger weight in the potential. We denote the combined potential by Φ = Σ_{k=1}^{d} α_k · Φ_k. The algorithm assigns job j to the machine that minimizes the increase in potential Φ.
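The greedy rule can be sketched as follows; this is our own illustrative rendering (function and variable names are assumptions, not the paper's), with loads assumed pre-normalized by the targets T_k:

```python
import math

# jobs[j][i][k] is the (normalized) load job j places on machine i in
# dimension k; an infeasible machine can be encoded with math.inf.

def schedule(jobs, m, d, r):
    q = [r[k] + math.log2(d) for k in range(d)]        # q_k = r_k + log d
    alpha = [(3 * q[k]) ** (-q[k]) for k in range(d)]  # alpha_k = (3 q_k)^{-q_k}
    lam = [[0.0] * d for _ in range(m)]                # lam[i][k]: current loads

    def potential_if(job, i):
        # Phi = sum_k alpha_k * L_k^{q_k} if the job were assigned to machine i,
        # where L_k is the L_{r_k} norm of the loads in dimension k.
        total = 0.0
        for k in range(d):
            s = sum((lam[t][k] + (job[t][k] if t == i else 0.0)) ** r[k]
                    for t in range(m))
            total += alpha[k] * s ** (q[k] / r[k])
        return total

    assignment = []
    for job in jobs:
        best = min(range(m), key=lambda i: potential_if(job, i))
        for k in range(d):
            lam[best][k] += job[best][k]
        assignment.append(best)
    return assignment, lam

# Two unit jobs, two machines, one dimension, r_1 = 2: the second job goes to
# the other machine, since that minimizes the increase in the L_2 potential.
assignment, loads = schedule([[[1.0], [1.0]], [[1.0], [1.0]]], m=2, d=1, r=[2])
```

Recomputing the potential from scratch per candidate machine keeps the sketch short; an actual implementation would update the per-dimension sums incrementally.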
2) Competitive Analysis:
Let us fix a solution satisfying the target objectives, and call it the optimal solution. Let Λ_i(k) and Λ*_i(k) be the load on the i-th machine in the k-th dimension for the algorithmic solution and the optimal solution, respectively. We also use L*_k to denote the L_{r_k} norm in the k-th dimension for the optimal solution; we have already asserted that, by scaling, L*_k = 1.

Similar to [4], [15], we compare the actual assignment made by the algorithm (starting with zero load on every machine in every dimension) to a hypothetical assignment made by the optimal solution starting with the final algorithmic load on every machine (i.e., load of Λ_i(k) on machine i in dimension k).

We will need the following fact for our analysis, which follows by observing that all parameters are positive, the function is continuous in the domain, and its derivative is non-negative.

Fact 24.
The function f(x_1, x_2, …, x_m) = (Σ_i (x_i + a_i)^w)^z − (Σ_i x_i^w)^z is non-decreasing if, for all i ∈ [m], we restrict the domain of x_i to be [0, ∞), w ≥ 1, z ≥ 1, and a_i ≥ 0. Indeed,
\[
\frac{\partial f}{\partial x_i} = wz\left[\Big(\sum_{i'} (x_{i'} + a_{i'})^w\Big)^{z-1} (x_i + a_i)^{w-1} - \Big(\sum_{i'} x_{i'}^w\Big)^{z-1} x_i^{w-1}\right] \geq 0,
\]
since a_{i'} ≥ 0 and the exponents w − 1 and z − 1 are non-negative.

Using the greediness of the algorithm and the convexity of the potential function, we argue in Lemma 25 that the change in potential in the former process is upper bounded by that in the latter process.

Lemma 25. The total change in potential in the online algorithm satisfies
\[
\sum_{k=1}^{d} \alpha_k L_k^{q_k} = \Phi(n) - \Phi(0) \leq \sum_{k=1}^{d} \alpha_k \Big( \sum_{i=1}^{m} \big( \Lambda_i(k) + \Lambda^*_i(k) \big)^{r_k} \Big)^{q_k/r_k} - \sum_{k=1}^{d} \alpha_k L_k^{q_k}.
\]
Proof:
Let y_{i,j} = 1 if the algorithm assigns job j to machine i; otherwise, y_{i,j} = 0. Define y*_{i,j} similarly but for the optimal solution's assignments. We can express the resulting change in potential from scheduling job j as follows:
\[
\Phi(j) - \Phi(j-1) = \sum_{k=1}^{d} \alpha_k \big( L_k^{q_k}(j) - L_k^{q_k}(j-1) \big)
= \sum_{k=1}^{d} \alpha_k \Big( \Big( \sum_{i=1}^{m} \Lambda_{i,j}^{r_k}(k) \Big)^{q_k/r_k} - \Big( \sum_{i=1}^{m} \Lambda_{i,j-1}^{r_k}(k) \Big)^{q_k/r_k} \Big)
= \sum_{k=1}^{d} \alpha_k \Big( \Big( \sum_{i=1}^{m} \big( \Lambda_{i,j-1}(k) + p_{i,j}(k) \cdot y_{i,j} \big)^{r_k} \Big)^{q_k/r_k} - \Big( \sum_{i=1}^{m} \Lambda_{i,j-1}^{r_k}(k) \Big)^{q_k/r_k} \Big). \tag{13}
\]
Since the online algorithm schedules greedily based on Φ(j), using the optimal schedule's assignment for job j must result in a potential increase that is at least as large. Therefore, by (13) we have
\[
\Phi(j) - \Phi(j-1) \leq \sum_{k=1}^{d} \alpha_k \Big( \Big( \sum_{i=1}^{m} \big( \Lambda_{i,j-1}(k) + p_{i,j}(k) \cdot y^*_{i,j} \big)^{r_k} \Big)^{q_k/r_k} - \Big( \sum_{i=1}^{m} \Lambda_{i,j-1}^{r_k}(k) \Big)^{q_k/r_k} \Big). \tag{14}
\]
As loads are non-decreasing, Λ_i(k) ≥ Λ_{i,j−1}(k). Also note that r_k ≥ 1 and q_k/r_k = (r_k + log d)/r_k > 1. Thus, we can apply Fact 24 to (14) (setting w = r_k, z = q_k/r_k, and a_i = p_{i,j}(k) · y*_{i,j}, and increasing x_i from Λ_{i,j−1}(k) to Λ_i(k)) to obtain
\[
\Phi(j) - \Phi(j-1) \leq \sum_{k=1}^{d} \alpha_k \Big( \Big( \sum_{i=1}^{m} \big( \Lambda_i(k) + p_{i,j}(k) \cdot y^*_{i,j} \big)^{r_k} \Big)^{q_k/r_k} - \Big( \sum_{i=1}^{m} \Lambda_i^{r_k}(k) \Big)^{q_k/r_k} \Big). \tag{15}
\]
We can again use Fact 24 to further bound the potential increase (using the same values of a_i, w, and z, but now increasing x_i from Λ_i(k) to Λ_i(k) + Λ*_{i,j−1}(k)):
\[
\Phi(j) - \Phi(j-1) \leq \sum_{k=1}^{d} \alpha_k \Big( \Big( \sum_{i=1}^{m} \big( \Lambda_i(k) + \Lambda^*_{i,j-1}(k) + p_{i,j}(k) \cdot y^*_{i,j} \big)^{r_k} \Big)^{q_k/r_k} - \Big( \sum_{i=1}^{m} \big( \Lambda_i(k) + \Lambda^*_{i,j-1}(k) \big)^{r_k} \Big)^{q_k/r_k} \Big)
= \sum_{k=1}^{d} \alpha_k \Big( \Big( \sum_{i=1}^{m} \big( \Lambda_i(k) + \Lambda^*_{i,j}(k) \big)^{r_k} \Big)^{q_k/r_k} - \Big( \sum_{i=1}^{m} \big( \Lambda_i(k) + \Lambda^*_{i,j-1}(k) \big)^{r_k} \Big)^{q_k/r_k} \Big). \tag{16}
\]
Observe that for a fixed k, the RHS of (16) is a telescoping series if we sum over all jobs j:
\[
\sum_{j=1}^{n} \alpha_k \Big( \Big( \sum_{i=1}^{m} \big( \Lambda_i(k) + \Lambda^*_{i,j}(k) \big)^{r_k} \Big)^{q_k/r_k} - \Big( \sum_{i=1}^{m} \big( \Lambda_i(k) + \Lambda^*_{i,j-1}(k) \big)^{r_k} \Big)^{q_k/r_k} \Big)
= \alpha_k \Big( \Big( \sum_{i=1}^{m} \big( \Lambda_i(k) + \Lambda^*_i(k) \big)^{r_k} \Big)^{q_k/r_k} - \Big( \sum_{i=1}^{m} \Lambda_i^{r_k}(k) \Big)^{q_k/r_k} \Big). \tag{17}
\]
We also have Σ_{j=1}^{n} (Φ(j) − Φ(j−1)) = Φ(n) − Φ(0), since this is also a telescoping series. By definition, Φ(0) = 0 and Φ(n) = Σ_{k=1}^{d} α_k L_k^{q_k}. Using these facts along with (16) and (17), we establish the lemma:
\[
\sum_{k=1}^{d} \alpha_k L_k^{q_k} = \sum_{j=1}^{n} \big( \Phi(j) - \Phi(j-1) \big) \leq \sum_{k=1}^{d} \alpha_k \Big( \sum_{i=1}^{m} \big( \Lambda_i(k) + \Lambda^*_i(k) \big)^{r_k} \Big)^{q_k/r_k} - \sum_{k=1}^{d} \alpha_k L_k^{q_k}.
\]
We proceed by applying the Minkowski inequality (e.g., [43]), which states that for any two vectors v_1 and v_2, we have ‖v_1 + v_2‖_r ≤ ‖v_1‖_r + ‖v_2‖_r. Applying this inequality to the RHS in Lemma 25, we obtain
\[
\sum_{k=1}^{d} \alpha_k L_k^{q_k} \leq \sum_{k=1}^{d} \alpha_k \Big( \Big( \sum_{i=1}^{m} \Lambda_i^{r_k}(k) \Big)^{1/r_k} + \Big( \sum_{i=1}^{m} \big( \Lambda^*_i(k) \big)^{r_k} \Big)^{1/r_k} \Big)^{q_k} - \sum_{k=1}^{d} \alpha_k L_k^{q_k}
= \sum_{k=1}^{d} \alpha_k \big( L_k + L^*_k \big)^{q_k} - \sum_{k=1}^{d} \alpha_k L_k^{q_k}. \tag{18}
\]
Next, we prove a simple lemma that we will apply to inequality (18).

Lemma 26. (L_k + L*_k)^{q_k} ≤ e^{1/2} L_k^{q_k} + (3 q_k · L*_k)^{q_k} for all k ∈ [d].

Proof: First consider the case L_k ≥ 2 q_k · L*_k. Then it follows that
\[
(L_k + L^*_k)^{q_k} \leq \big( 1 + 1/(2 q_k) \big)^{q_k} \cdot L_k^{q_k} \leq \big( e^{1/(2 q_k)} \big)^{q_k} \cdot L_k^{q_k} = e^{1/2} L_k^{q_k}. \tag{19}
\]
Otherwise L_k < 2 q_k · L*_k, and then we have (L_k + L*_k)^{q_k} ≤ ((2 q_k + 1) · L*_k)^{q_k} ≤ (3 q_k · L*_k)^{q_k}, using q_k ≥ 1. Combining these two upper bounds completes the proof.

Thus, we can rearrange (18) and bound Σ_{k=1}^{d} α_k L_k^{q_k} as follows:
\[
2 \sum_{k=1}^{d} \alpha_k L_k^{q_k} \leq \sum_{k=1}^{d} \alpha_k (L_k + L^*_k)^{q_k} \leq e^{1/2} \sum_{k=1}^{d} \alpha_k L_k^{q_k} + \sum_{k=1}^{d} \alpha_k (3 q_k \cdot L^*_k)^{q_k} \quad \text{(by Lemma 26)}
= e^{1/2} \sum_{k=1}^{d} \alpha_k L_k^{q_k} + \sum_{k=1}^{d} (L^*_k)^{q_k}. \tag{20}
\]
Note that the last equality is due to the fact that α_k^{−1} = (3 q_k)^{q_k}. By our initial scaling, L*_k = 1 for all k. Therefore, after rearranging (20), we obtain
\[
\big( 2 - e^{1/2} \big) \sum_{k=1}^{d} \alpha_k L_k^{q_k} \leq \sum_{k=1}^{d} (L^*_k)^{q_k} \leq d,
\]
which for any fixed k implies
\[
L_k \leq \big( 2 - e^{1/2} \big)^{-1/q_k} \cdot \left(\frac{d}{\alpha_k}\right)^{1/q_k} \leq \frac{1}{2 - e^{1/2}} \cdot \left(\frac{d}{\alpha_k}\right)^{1/q_k} = \frac{3 q_k}{2 - e^{1/2}} \cdot d^{1/(r_k + \log d)} \leq \frac{6 q_k}{2 - e^{1/2}} < 20 \, q_k = O(r_k + \log d),
\]
where the first inequality uses q_k ≥ 1 and 2 − e^{1/2} < 1, and the second-to-last step uses d^{1/(r_k + log d)} ≤ d^{1/log d} = 2. This completes the proof of the upper bound claimed in Theorem 4.
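The constants in this argument can be sanity-checked numerically; the short script below (ours, not part of the paper) tests the inequality of Lemma 26 on random values with q ≥ 1 and verifies that 6/(2 − e^{1/2}) < 20:

```python
import math
import random

# Check (L + L*)^q <= e^{1/2} L^q + (3 q L*)^q for random q >= 1, L, L* >= 0,
# and the final constant 6 / (2 - e^{1/2}) ~= 17.1 < 20.

random.seed(0)
for _ in range(100000):
    q = random.uniform(1.0, 40.0)
    L = random.uniform(0.0, 10.0)
    L_star = random.uniform(0.0, 10.0)
    lhs = (L + L_star) ** q
    rhs = math.exp(0.5) * L ** q + (3 * q * L_star) ** q
    assert lhs <= rhs * (1 + 1e-9)  # small relative slack for float rounding

assert 6 / (2 - math.exp(0.5)) < 20
```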
ACKNOWLEDGEMENTS

S. Im is supported in part by NSF Award CCF-1409130. A part of this work was done by J. Kulkarni at Duke University, supported in part by NSF Awards CCF-0745761, CCF-1008065, and CCF-1348696. N. Kell and D. Panigrahi are supported in part by NSF Award CCF-1527084, a Google Faculty Research Award, and a Yahoo FREP Award.

REFERENCES

[1] Faraz Ahmad, Srimat T. Chakradhar, Anand Raghunathan, and T. N. Vijaykumar. Tarazu: optimizing MapReduce on heterogeneous clusters. In ACM SIGARCH Computer Architecture News, volume 40, pages 61–74. ACM, 2012.
[2] Susanne Albers. Better bounds for online scheduling. SIAM J. Comput., 29(2):459–473, 1999.
[3] Susanne Albers. Energy-efficient algorithms. Communications of the ACM, 53(5):86–96, 2010.
[4] James Aspnes, Yossi Azar, Amos Fiat, Serge A. Plotkin, and Orli Waarts. On-line routing of virtual circuits with applications to load balancing and machine scheduling. J. ACM, 44(3):486–504, 1997.
[5] Adi Avidor, Yossi Azar, and Jiri Sgall. Ancient and new algorithms for load balancing in the lp norm. Algorithmica, 29(3):422–441, 2001.
[6] Baruch Awerbuch, Yossi Azar, Edward F. Grove, Ming-Yang Kao, P. Krishnan, and Jeffrey Scott Vitter. Load balancing in the lp norm. In FOCS, pages 383–391, 1995.
[7] Yossi Azar. On-line load balancing. In Online Algorithms, The State of the Art, pages 178–195, 1996.
[8] Yossi Azar, Ilan Reuven Cohen, Seny Kamara, and F. Bruce Shepherd. Tight bounds for online vector bin packing. In STOC, pages 961–970, 2013.
[9] Yossi Azar, Joseph Naor, and Raphael Rom. The competitiveness of on-line assignments. J. Algorithms, 18(2):221–237, 1995.
[10] Yair Bartal, Amos Fiat, Howard J. Karloff, and Rakesh Vohra. New algorithms for an ancient scheduling problem. J. Comput. Syst. Sci., 51(3):359–366, 1995.
[11] Yair Bartal, Howard J. Karloff, and Yuval Rabani. A better lower bound for on-line scheduling. Inf. Process. Lett., 50(3):113–116, 1994.
[12] Piotr Berman, Moses Charikar, and Marek Karpinski. On-line load balancing for related machines. J. Algorithms, 35(1):108–121, 2000.
[13] A. Borodin and R. El-Yaniv. Online Computation and Competitive Analysis. Cambridge University Press, 1998.
[14] Niv Buchbinder and Joseph Naor. Online primal-dual algorithms for covering and packing problems. In ESA, pages 689–701, 2005.
[15] Ioannis Caragiannis. Better bounds for online load balancing on unrelated machines. In SODA, pages 972–981, 2008.
[16] Ioannis Caragiannis, Michele Flammini, Christos Kaklamanis, Panagiotis Kanellopoulos, and Luca Moscardelli. Tight bounds for selfish and greedy load balancing. Algorithmica, 61(3):606–637, 2011.
[17] Ashok K. Chandra and C. K. Wong. Worst-case analysis of a placement algorithm related to storage allocation. SIAM J. Comput., 4(3):249–263, 1975.
[18] Chandra Chekuri and Sanjeev Khanna. On multidimensional packing problems. SIAM J. Comput., 33(4):837–851, 2004.
[19] George Christodoulou, Vahab S. Mirrokni, and Anastasios Sidiropoulos. Convergence and approximation in potential games. Theor. Comput. Sci., 438:13–27, 2012.
[20] R. A. Cody and Edward G. Coffman Jr. Record allocation for minimizing expected retrieval costs on drum-like storage devices. J. ACM, 23(1):103–115, 1976.
[21] Ulrich Faigle, Walter Kern, and György Turán. On the performance of on-line algorithms for partition problems. Acta Cybern., 9(2):107–119, 1989.
[22] Rudolf Fleischer and Michaela Wahl. Online scheduling revisited. In ESA, pages 202–210, 2000.
[23] Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, and Ion Stoica. Dominant resource fairness: Fair allocation of multiple resource types. In NSDI, volume 11, pages 24–24, 2011.
[24] Ali Ghodsi, Matei Zaharia, Scott Shenker, and Ion Stoica. Choosy: Max-min fair sharing for datacenter jobs with constraints. In EuroSys, pages 365–378. ACM, 2013.
[25] Todd Gormley, Nick Reingold, Eric Torng, and Jeffery Westbrook. Generating adversaries for request-answer games. In SODA, pages 564–565, 2000.
[26] R. L. Graham. Bounds for certain multiprocessing anomalies. SIAM Journal on Applied Mathematics, 1966.
[27] Robert Grandl, Ganesh Ananthanarayanan, Srikanth Kandula, Sriram Rao, and Aditya Akella. Multi-resource packing for cluster schedulers. In ACM SIGCOMM, pages 455–466, 2014.
[28] Ajay Gulati, Ganesha Shanmuganathan, Anne Holler, Carl Waldspurger, Minwen Ji, and Xiaoyun Zhu. VMware distributed resource management: Design, implementation, and lessons learned. https://labs.vmware.com/vmtj/vmware-distributed-resource-management-design-implementation-and-lessons-learned.
[29] Magnús M. Halldórsson and Mario Szegedy. Lower bounds for on-line graph coloring. Theor. Comput. Sci., 130(1):163–174, 1994.
[30] David G. Harris and Aravind Srinivasan. The Moser-Tardos framework with partial resampling. In FOCS, pages 469–478, 2013.
[31] J. F. Rudin III. Improved bound for the on-line scheduling problem. PhD thesis, The University of Texas at Dallas, 2001.
[32] Gueyoung Jung, Kaustubh R. Joshi, Matti A. Hiltunen, Richard D. Schlichting, and Calton Pu. Generating adaptation policies for multi-tier applications in consolidated server environments. In ICAC, pages 23–32. IEEE, 2008.
[33] David R. Karger, Steven J. Phillips, and Eric Torng. A better algorithm for an ancient scheduling problem. J. Algorithms, 20(2):400–430, 1996.
[34] Gunho Lee, Byung-Gon Chun, and Randy H. Katz. Heterogeneity-aware resource allocation and scheduling in the cloud. In HotCloud, volume 11, 2011.
[35] Adam Meyerson, Alan Roytman, and Brian Tagiku. Online multidimensional load balancing. In APPROX/RANDOM, pages 287–302, 2013.
[36] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1997.
[37] Torsten Mütze, Thomas Rast, and Reto Spöhel. Coloring random graphs online without creating monochromatic subgraphs. Random Structures & Algorithms, 44(4):419–464, 2014.
[38] Steven Pelley, David Meisner, Thomas F. Wenisch, and James W. VanGilder. Understanding and abstracting total data center power. In Workshop on Energy-Efficient Design, 2009.
[39] Kirk Pruhs, Jiri Sgall, and Eric Torng. Online scheduling. In Handbook of Scheduling: Algorithms, Models, and Performance Analysis, pages 15-1, 2004.
[40] Jiri Sgall. On-line scheduling. In Online Algorithms, pages 196–231, 1996.
[41] Jiri Sgall. Online scheduling. In Algorithms for Optimization with Incomplete Information, 2005.
[42] Subhash Suri, Csaba D. Tóth, and Yunhong Zhou. Selfish load balancing and atomic congestion games. Algorithmica, 47(1):79–96, 2007.
[43] Wikipedia. Minkowski inequality — Wikipedia, the free encyclopedia.
[44] Frances Yao, Alan Demers, and Scott Shenker. A scheduling model for reduced CPU energy. In FOCS, pages 374–382. IEEE, 1995.
[45] Shuo Zhang, Baosheng Wang, Baokang Zhao, and Jing Tao. An energy-aware task scheduling algorithm for a heterogeneous data center. In