Online Page Migration with ML Advice
Piotr Indyk, Frederik Mallmann-Trenn, Slobodan Mitrović, Ronitt Rubinfeld
Abstract
We consider online algorithms for the page migration problem that use predictions, potentially imperfect, to improve their performance. The best known online algorithms for this problem, due to Westbrook '94 and Bienkowski et al. '17, have competitive ratios strictly bounded away from 1. In contrast, we show that if the algorithm is given a prediction of the input sequence, then it can achieve a competitive ratio that tends to 1 as the prediction error rate tends to 0. Specifically, the competitive ratio is equal to 1 + O(q), where q is the prediction error rate. We also design a "fallback option" that ensures that the competitive ratio of the algorithm for any input sequence is at most O(1/q). Our result adds to the recent body of work that uses machine learning to improve the performance of "classic" algorithms.

* CSAIL, MIT, {indyk,slobo}@mit.edu
† King's College London, [email protected]
‡ CSAIL, MIT, [email protected]

Introduction

Recently, there has been a lot of interest in using machine learning to design improved algorithms for various computational problems. This includes work on data structures [KBC+18, Mit18], online algorithms [LV18, PSK18, GP19a, Roh20], combinatorial optimization [KDZ+17, BDSV18], similarity search [WLKC16], compressive sensing [MPB15, BJPD17] and streaming algorithms [HIKV19]. This body of work is motivated by the fact that modern machine learning methods are capable of discovering subtle structure in collections of input data, which can be utilized to improve the performance of algorithms that operate on similar data.

In this paper we focus on learning-augmented online algorithms. An online algorithm makes non-revocable decisions based only on the part of the input seen so far, without any knowledge of the future. It is thus natural to consider a relaxation of the model where the algorithm has access to (imperfect) predictors of the future input that could be used to improve the algorithm's performance. Over the last couple of years this line of research has attracted growing attention in the machine learning and algorithms literature, for classical online problems such as caching [LV18, Roh20], ski-rental and scheduling [PSK18, GP19b, LLMV20] and graph matching [KPS+19]. A natural "optimistic" strategy is to simply trust the prediction and run the optimal offline algorithm for it; in general this can fail badly. For instance, for caching, even a single misprediction can lead to an unbounded competitive ratio [LV18].

In this paper we show that, perhaps surprisingly, the aforementioned "optimistic" strategy leads to near-optimal performance for some well-studied online problems. (To the best of our knowledge, the only problem for which this strategy is known to result in an optimal algorithm is online bipartite matching; see Section 1.1 for more details.) We focus on the problem of page migration [BS89] (a.k.a. file migration [Bie12] or [MMS90]). Here, the algorithm is given a sequence s of points (called requests) s_1, s_2, ... from a metric space (X, d), in an online fashion. The state of the algorithm is also a point from (X, d). Given the next request s_i, the algorithm moves to its next state a_i (at the cost of D · d(a_{i-1}, a_i), where D is a parameter), and then "satisfies" the request s_i (at the cost of d(a_i, s_i)). The objective is to satisfy all requests while minimizing the total cost. The problem has been the focus of a large body of research, see e.g., [ABF93, Wes94, CLRW97, BCI97, KM16, BBM17].

The best known algorithms for this problem have competitive ratios of 4 (a deterministic algorithm due to [BBM17]), 3 (a randomized algorithm against adaptive adversaries due to [Wes94]) and approximately 2.618 (a randomized algorithm against oblivious adversaries due to [Wes94]). The original paper [BS89] also showed that the competitive ratio of any deterministic algorithm must be at least 3, which was recently improved to 3 + ε for some ε > 0.

Our results
Suppose that we are given a predicted request sequence ŝ that, in each interval of length εD, differs from the actual sequence s on at most a fraction q of positions, where ε, q ∈ (0, 1) are the parameters (note that the lower the values of ε and q are, the stronger our assumption is). Under this assumption we show that the optimal offline solution for ŝ is a (1 + ε)(1 + O(q))-competitive solution for s, as long as the parameter q is sufficiently small. Furthermore, to make the algorithm robust, we also design a "fallback option", which is triggered if the input sequence violates the aforementioned assumption (i.e., if the fraction of errors in the suffix of the current input sequence exceeds q). The fallback option ensures that the competitive ratio of the algorithm for any input sequence is at most O(1/q). Thus, our final algorithm produces a near-optimal solution if the prediction error is small, while guaranteeing a constant competitive ratio otherwise.

For the case when the underlying metric is uniform, i.e., all distances between distinct points are equal to 1, we further improve the competitive ratio to 1 + O(q) under the assumption that each interval of length D differs from the actual sequence in at most qD positions. That is, the parameter ε is not needed in this case. Moreover, any algorithm has a competitive ratio of at least 1 + Ω(q).

It is natural to wonder whether the same guarantees hold even when the predicted sequence differs from the actual sequence on at most a fraction q of positions distributed arbitrarily over ŝ, as opposed to over chunks of length εD. We construct a simple example showing that such a relaxed assumption results in the same lower bound as for the classical problem.

Related work

Multiple variations of the page migration problem have been studied over the years. For example, if the page can be copied as well as moved, the problem has been studied under the name of file allocation, see e.g., [BFR95, ABF03, LRWY98]. Other formulations add constraints on node capacities, allow dynamically changing networks, etc.
See the survey [Bie12] for an overview.

There is a large body of work concerning online algorithms working under stochastic or probabilistic assumptions about the input [Unc16]. In contrast, in this paper we do not make such assumptions, and allow worst case prediction errors (similarly to [LV18, KPS+19, PSK18]). Among these works, our prediction error model (bounding the fraction of mispredicted requests) is most similar to the "agnostic" model defined in [KPS+19], where the prediction may err on d vertices. Since each vertex impacts at most one matching edge, it directly follows that d errors reduce the matching size by at most d. In contrast, in our case a single error can affect the cost of the optimum solution by an arbitrary amount. Thus, our analysis requires a more detailed understanding of the properties of the optimal solution.

Multiple papers have studied online algorithms that are given a small number of bits of advice [BFK+...].

Page Migration
In the classical version, the algorithm is given a sequence s of points (called requests) s = (s_i)_{i∈[n]} from a metric space (X, d), in an online fashion. The state of the algorithm (i.e., the page) is also a point from (X, d). Given the next request s_i, the algorithm moves to its next state a_i (at the cost of D · d(a_{i-1}, a_i), where D > 1 is a parameter), and then serves the request s_i (at the cost of d(a_i, s_i)). The objective is to satisfy all requests while minimizing the total cost. We consider a version of this problem where the algorithm is given, prior to the arrival of the requests, a predicted sequence ŝ = (ŝ_i)_{i∈[n]}. The (final) sequence s is generated adversarially from ŝ and an arbitrary adversarial sequence s* = (s*_i)_{i∈[n]}; that is, either s_i = ŝ_i or s_i = s*_i. If we make no assumptions on how well s is predicted by ŝ, then the problem is no easier than the classical online version. On the other hand, if s = ŝ, then one obtains an optimal online algorithm by simply computing the optimal offline algorithm. The interesting regime lies in between these two cases. We will make the following assumption throughout the paper, which roughly speaking demands that a 1 − q fraction of the input is correctly predicted and that the q fraction of errors is somewhat spread out.

(Footnote: if each interval of length D has at most a fraction q of errors, then it is also the case that each interval of length √q · D has at most a fraction √q of errors. Thus, if q tends to 0, the competitive ratio tends to 1 even if the interval length remains fixed.)

Definition 1 (Number of mismatches m(·)). Let I be an interval of indices. We define m(I) := Σ_{t∈I} 1[s_t ≠ ŝ_t] to be the number of mismatches between s and ŝ within the interval I.

Assumption 1.
Consider an interval I of s of length εD. For any such I it holds that m(I) ≤ qεD.

Remark 1.
Relaxing Assumption 1 by allowing the adversary to change an arbitrary q fraction of the input results in the same lower bound as for the classical problem. To see this, consider an arbitrary instance on qn elements that gives a lower bound of c in the classical problem. Call this sequence of elements adversarial. Let ŝ consist of n elements equal to the starting point; that is, ŝ is simply the starting position replicated n times. Let s be equal to the sequence ŝ with its suffix of length qn replaced by the adversarial sequence. Now, on s defined in this way no algorithm can be better than c-competitive. Hence, in general this relaxation of Assumption 1 gives no advantage.

Our main results hold for general metric spaces, where for all p, p′, p′′ ∈ X all of the following hold: d(p, p) = 0; d(p, p′) > 0 for p ≠ p′; d(p, p′) = d(p′, p); and d(p, p′′) ≤ d(p, p′) + d(p′, p′′). We obtain better results for uniform metric spaces, where d(p, p′) = 1 for all p ≠ p′.

Notation
Given a sequence s, we use s_i to denote the i-th element of s. For integers i and j such that 1 ≤ i ≤ j, we use s_[i,j] to denote the subsequence of s consisting of the elements s_i, ..., s_j.

For a fixed algorithm, let p_i be the position of the page at time i. In particular, p_0 denotes the start position for all algorithms.

Given an algorithm B that pays cost C for serving n requests, we denote by C_{t1,t2} the cost paid by B during the interval [t1, t2]. We sometimes abuse notation and write C_t as a shorthand for C_{1,t}. In particular, C denotes C_{1,n} as well as C_n. This notation is most often used in the context of our algorithm ALG and the optimal solution OPT, whose total serving costs are A and O, respectively.

Our two main contributions are: an algorithm ALG that is (1 + O(q))-competitive provided Assumption 1 holds; and a black-box reduction from ALG to an O(1/q)-competitive algorithm ALG robust for when Assumption 1 does not hold. In Section 3.1 we present an overview of ALG, while an overview of ALG robust is given in Section 3.2.
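To make the cost model of Section 2 concrete, here is a small sketch (ours, not the paper's) of the serving/moving cost, the mismatch count m(I) from Definition 1, and a sliding-window check of Assumption 1. The function names and interfaces are our own illustrations:

```python
def migration_cost(start, states, requests, D, dist):
    """Total page-migration cost: at request i the algorithm moves its page
    from the previous state to states[i] (paying D * dist), then serves the
    request remotely (paying dist(states[i], requests[i]))."""
    cost, prev = 0.0, start
    for a, s in zip(states, requests):
        cost += D * dist(prev, a) + dist(a, s)
        prev = a
    return cost

def mismatches(s, s_hat, lo, hi):
    """m(I) of Definition 1 for the interval I = [lo, hi)."""
    return sum(1 for t in range(lo, hi) if s[t] != s_hat[t])

def assumption_holds(s, s_hat, D, eps, q):
    """Assumption 1: every window of w = eps*D consecutive requests contains
    at most q*w mismatches between s and its prediction s_hat."""
    w = int(eps * D)  # window length eps*D, assumed integral here
    return all(mismatches(s, s_hat, t, t + w) <= q * w
               for t in range(len(s) - w + 1))
```

For example, on the line metric with `dist = lambda x, y: abs(x - y)`, serving requests [0, 1, 1] from states [0, 0, 1] with D = 2 costs 0 + 1 + 2 = 3.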
Algorithm ALG

ALG (given as Algorithm 1) simply computes the optimal offline solution and moves pages accordingly.
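The text does not fix a particular offline solver; on a finite metric space the offline optimum that ALG relies on can be computed exactly by a standard dynamic program over page locations. The sketch below is our illustration (the name `offline_opt` and its interface are assumptions), one of several possible O(n·|X|²) implementations:

```python
import math

def offline_opt(points, dist, start, requests, D):
    """Exact offline page-migration optimum on a finite metric space.
    best[p] = minimum cost of serving the requests so far with the page at p.
    Returns (optimal cost, page position chosen at each request)."""
    best = {p: (0.0 if p == start else math.inf) for p in points}
    parents = []
    for s in requests:
        new_best, par = {}, {}
        for p in points:
            # move from the cheapest previous location r, then serve s from p
            c_move, p_prev = min((best[r] + D * dist(r, p), r) for r in points)
            new_best[p] = c_move + dist(p, s)
            par[p] = p_prev
        parents.append(par)
        best = new_best
    end = min(best, key=best.get)
    schedule = [end]                 # backtrack the optimal page positions
    for par in reversed(parents):
        schedule.append(par[schedule[-1]])
    schedule.reverse()               # schedule[0] == start
    return best[end], schedule[1:]
```

For instance, with points {0, 1} on the line, start 0 and requests [1, 1, 1]: for D = 2 it is optimal to move immediately (cost 2), while for D = 4 it is optimal to stay and serve remotely (cost 3).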
Algorithm 1: ALG(i, s, ŝ)
Input: the number i of the next request; s and ŝ are sequences as defined in Section 2.
Let p_i be the position of the page in the optimal algorithm at the i-th request with respect to ŝ. Move the page to p_i and serve the request s_i.

The main challenge in proving that ALG still performs well in the online setting lies in leveraging the optimality of
ALG with respect to the offline sequence. The reason for this is that, due to s and ŝ not being identical, OPT and ALG may be on different page locations throughout all the requests. In addition, we have no control over which q fraction of any interval of length D is changed, nor what it is changed to. In particular, if s_i ≠ ŝ_i, then s_i and ŝ_i could be very far from each other. To circumvent this, we use the following way to argue about the offline optimality, that is, about the optimality computed with respect to ŝ.

We think of ALG (OPT, respectively) as a sequence of page locations that are defined with respect to ŝ (s, respectively). These page locations do not change even if, for instance, the i-th online request to ALG deviates from ŝ_i. Let A_t (O_t, respectively) be the cost of ALG (OPT, respectively) for serving the t requests given by s_[1,t]. Similarly, let Â_t (Ô_t, respectively) be the cost of ALG (OPT, respectively) for serving the oracle subsequence ŝ_[1,t]. In particular, A_n is the cost of ALG (optimal on ŝ) on the final sequence s, whereas Ô_n is the cost of the optimal algorithm for s on the predicted sequence ŝ. It is convenient to think of Ô_n as the 'evil twin' of A_n.

Due to the optimality of ALG on the offline sequence, we have

A_n − O_n = A_n − Â_n + Â_n − O_n ≤ A_n − Â_n + Ô_n − O_n.   (1)

The intuition behind this is best explained pictorially, which we do in Fig. 1. Here ALG is at a and OPT is at o. In the depicted example a request is moved from s to ŝ. This causes A_n − Â_n to increase; however, at the same time, Ô_n − O_n decreases by almost the same amount. In fact, one can show that for such a moved request the right hand side of Eq. (1) increases by no more than 2d(a, o). For requests that are not moved, i.e., s = ŝ, the costs of ALG and OPT do not change. It remains to bound d(a_t, o_t), which we do next. By triangle inequality, it holds that

d(a_t, o_t) ≤ d(a_t, s_t) + d(o_t, s_t) ≤ A_t − A_{t−1} + O_t − O_{t−1}.   (2)

Consider an interval (t_{i−1}, t_i]. Let c_move^{(t_{i−1},t_i]} be the total sum of the moving costs of both OPT and ALG for the requests in the interval (t_{i−1}, t_i]. As a reminder (see Definition 1), for a given interval I, m(I) is the number of mismatches between s and ŝ within I. From Eq. (2), we derive

A_n − O_n ≤ Σ_i 2 m((t_{i−1}, t_i]) · (A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}} − c_move^{(t_{i−1},t_i]}) / (t_i − t_{i−1}).   (3)

We would like the right hand side of Eq. (3) to be small, implying that A_n − O_n is small as well. To understand what is required for it to be small, assume for a moment that m((t_{i−1}, t_i]) = α(t_i − t_{i−1}). Then the rest of the summation telescopes to A_n + O_n, and Eq. (3) reduces to A_n − O_n ≤ 2α(A_n + O_n). Now, if α is sufficiently small, e.g., α ≤ 2q, then we are able to upper-bound Eq. (3) by 4q(A_n + O_n) and derive A_n/O_n ≤ (1 + 4q)/(1 − 4q), which gives the desired competitive factor.

So, to utilize Eq. (3), in our proof we will focus on showing that m((t_{i−1}, t_i]) is sufficiently smaller than t_i − t_{i−1}. However, this can be challenging, as OPT is allowed to move often, potentially on every request, which results in t_i − t_{i−1} being very small. If t_i − t_{i−1} is too small, then Assumption 1 gives no information about m((t_{i−1}, t_i]). However, if the intervals t_i − t_{i−1} were large enough, e.g., at least βD for some positive constant β, then from Assumption 1 we would be able to conclude that α = O(q). Since in principle OPT can move in every step, we design 'lazy' versions of OPT and
ALG that only move O(1) times in any interval of length D. This will enable us to argue that t_i − t_{i−1} is not too small. It turns out that the respective competitive factors of the lazy versions with respect to the original versions are very close, allowing us to prove

A_n/O_n ≈ A^lazy_n/O^lazy_n ≤ (1 + ε) · (1 + O(q))/(1 − O(q)).

(We oversimplified here, since the right hand side of (1) only holds for the sum over all requests, but a similar argument can be made for a single request.)

Figure 1: A pictorial representation of Eq. (1); ALG is at a, OPT is at o, and a request is moved from s to ŝ.

ALG robust, a robust version of ALG
We now describe
ALG robust . This algorithm follows a “lazy” variant of
ALG as long as Assumption 1holds, and otherwise switches to
ALG online . Instead of using
ALG directly, we use a ‘lazy’ version of
ALG that works as follows: Follow the optimal offline solution given by
ALG with a delay of 6qD steps. Let ALG lazy be the corresponding algorithm. We point out that performing some delay with respect to
ALG is crucial here. To see that, consider the following example in the case of uniform metric spaces: s = {0}^n and ŝ = {1}^n, and let the starting location be 0. According to ALG, the page should be moved from 0 to 1 in the very beginning, incurring a cost of D. On the other hand, OPT never moves from 0. If
ALG robust would follow
ALG until it realizes that the fraction of errors is too high, it would already pay a cost of at least D, leading to an unbounded competitive ratio. However, if ALG robust delays following
ALG, then it gets some "slack" in verifying whether the predicted sequence properly predicts the requests or not. As a result, when Assumption 1 holds, this delay increases the overall serving cost by a factor of 1 + O(q), but in turn achieves a bounded competitive ratio when this assumption does not hold.

While serving requests, ALG robust also maintains the execution of
ALG online , i.e.,
ALG robust maintains where
ALG online would be at a given point in time, in case a fallback is needed. Now
ALG robust simply executes
ALG lazy unless a violation of Assumption 1 is detected. Once such a violation is detected, the algorithm switches to
ALG online by moving its location to
ALG online's current location. From there on
ALG online is executed.

We now present the intuition behind the proof of the competitive factor of the algorithm.
Case when Assumption 1 holds. In this case ALG robust is ALG lazy, and the analysis boils down to proving the competitive ratio of ALG lazy. We show that ALG lazy is (1 + O(q))-competitive with respect to ALG, which is, as we argued in the previous section, (1 + O(q))-competitive with respect to OPT. To see this, we employ the following charging argument: whenever ALG moves from p to p′, it pays D · d(p, p′). The lazy algorithm eventually pays the same moving cost or less. However, in addition, the serving cost of ALG lazy for each of the 6qD delayed requests is potentially increased, as ALG lazy is not at the same location as ALG. Nevertheless, by triangle inequality, the movement of ALG from p to p′ translates into an increase in the serving cost of ALG lazy of at most d(p, p′) per request. In total, over all the 6qD requests and per each move of ALG from p to p′, ALG lazy pays at most 6qD · d(p, p′) extra cost compared to ALG. Considering all migrations, this gives a 1 + O(q) competitive factor.

Case when Assumption 1 is violated.
The case where Assumption 1 is violated (say at time t′) is considerably more involved. We then have

ALG robust ≤ ALG lazy(0, t′) + ALG online(t′ + 1, n) + D · d(a, a′),

and we seek to upper-bound each of these terms by O(OPT/q). While the upper-bound holds directly for ALG online(t′ + 1, n), showing it for the other terms is more challenging. The key insight is that, due to the optimality of ALG,

d(a, p_0) ≤ OPT(t′)/(qD),   (4)

which can be proven as follows. If ALG migrates its page to a location that is far from the starting location p_0, then there have to be, even when taking the noise into account, at least 4qD page requests that are far from p_0. OPT also has to serve these requests (either remotely or by moving), and hence has to pay a cost of at least qD · d(a, p_0). Equipped with this idea, we can bound D · d(a, a′) in terms of OPT(t′)/q. To bound ALG lazy(0, t′) we need one more idea: namely, we compare ALG lazy(0, t′) to the optimal solution that is constrained to be at the same position as ALG lazy at time t′. A formal analysis is given in Section 5.

Now we analyze
ALG (Algorithm 1). As discussed in Section 3.1, our main objective is to establish Eq. (3), which we do in Section 4.1. That upper-bound will be directly used to obtain our result for uniform metric spaces, as we present in Section 4.2. To construct our algorithm for general metric spaces, in Section 4.3 we build on
ALG by first designing its "lazy" variant. As the final result, we show the following. Recall that q is the fraction of symbols that the adversary is allowed to change in any sequence of length εD of the predicted sequence.

Theorem 1.
If Assumption 1 holds with respect to parameter ε, then we obtain the following results:

(A) There exists a (1 + ε) · (1 + O(q))-competitive algorithm for the online page migration problem.

(B) There exists a (1 + O(q))-competitive algorithm for the online page migration problem in uniform metric spaces.

Note that Theorem 1 is asymptotically optimal with respect to q: any algorithm is at least (1 + Ω(q))-competitive, even in the uniform metric case. To see this, consider the following binary example, where the algorithm starts at position 0. The advice is ŝ = 1⋯1 0⋯0, consisting of (1 − q)D ones followed by qD zeros. The final sequence s equals ŝ with probability 1/2, and otherwise consists of 1⋯1 repeated (1 + q)D times. In the first case OPT simply stays at 0, since moving costs D; in the second case, OPT goes immediately to 1. Note that ALG can only distinguish between the two sequences after (1 − q)D steps, at which point it is doomed to incur an additional cost of qD with probability at least 1/2.

In our proofs we will use the following corollary of Assumption 1.
Corollary 1. If Assumption 1 holds, then for any interval I of length ℓ > εD it holds that m(I) ≤ 2qℓ.

Proof. This statement follows from the fact that each such I can be subdivided into k ≥ 1 intervals of length exactly εD and at most one interval I′ of length less than εD. On one hand, the total number of mismatches over the intervals of length exactly εD is upper-bounded by qkεD ≤ qℓ. On the other hand, since I′ is a subinterval of an interval of length εD, it holds that m(I′) ≤ qεD < qℓ. The claim now follows.

Most of our analysis in this section proceeds by reasoning about intervals where neither ALG nor
OPT moves. Let t_1, t_2, ... be the time steps at which either OPT or ALG moves. The final product of this section will be an upper-bound on A_n − O_n as given by Eq. (3), i.e.,

A_n − O_n ≤ Σ_i 2 m((t_{i−1}, t_i]) · (A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}} − c_move^{(t_{i−1},t_i]}) / (t_i − t_{i−1}).

We begin by rewriting and upper-bounding A_t − O_t as follows:

A_t − O_t = A_t − Â_t + Â_t − O_t ≤ A_t − Â_t + Ô_t − O_t,   (5)

where we used that Â_t ≤ Ô_t, as Â_t is the optimum for ŝ. (As a reminder, A_t (O_t, respectively) is the cost of ALG (OPT, respectively) at time t for the sequence s_[1,t].) Consider a fixed interval I = (t_{i−1}, t_i]. Then, by triangle inequality, it holds that

d(a_t, o_t) ≤ d(a_t, s_t) + d(o_t, s_t) ≤ A_t − A_{t−1} + O_t − O_{t−1}.   (6)

Let c_move^{(t_{i−1},t_i]} be the sum of the moving costs of OPT and ALG in (t_{i−1}, t_i]. Note that

A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}} = Σ_{t∈(t_{i−1},t_i]} (A_t − A_{t−1} + O_t − O_{t−1}) ≥ c_move^{(t_{i−1},t_i]} + d(a_{t_i}, o_{t_i}) · |t_i − t_{i−1}|,   (7)

where the inequality comes from Eq. (6) applied to every time step in (t_{i−1}, t_i], and the fact that ALG or OPT must have moved, inducing a cost of at least c_move^{(t_{i−1},t_i]}. The following notation is used to represent the difference between serving s_t and ŝ_t by ALG:

A[t−1, t] := A_t − Â_t − (A_{t−1} − Â_{t−1}) = d(a_t, s_t) − d(a_t, ŝ_t).

Note that this holds even when ALG moves, since the moving costs on the oracle sequence and on the final sequence are the same and therefore cancel each other out. Similarly to A[t−1, t], let

Ô[t−1, t] := Ô_t − O_t − (Ô_{t−1} − O_{t−1}) = d(o_t, ŝ_t) − d(o_t, s_t).

Consider now any t ∈ [1, n]. By triangle inequality we have

A[t−1, t] + Ô[t−1, t] = d(a_t, s_t) − d(o_t, s_t) + d(o_t, ŝ_t) − d(a_t, ŝ_t)
  ≤ (d(a_t, o_t) + d(o_t, s_t)) − d(o_t, s_t) + (d(a_t, ŝ_t) + d(a_t, o_t)) − d(a_t, ŝ_t)
  = 2 d(a_t, o_t)
  ≤ 2 (A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}} − c_move^{(t_{i−1},t_i]}) / (t_i − t_{i−1}),   (8)

where the last step uses Eq. (7). Let ∆_i = A_{t_i} − Â_{t_i} + Ô_{t_i} − O_{t_i}, where ∆_0 = 0 by definition. Note that, by Eq. (5),

A_n − O_n ≤ A_n − Â_n + Ô_n − O_n = Σ_i (∆_i − ∆_{i−1}) = Σ_i Σ_{t∈(t_{i−1},t_i]} (A[t−1, t] + Ô[t−1, t]).

Recall that, for a given interval I, the function m(I) denotes the number of mismatches between s and ŝ within I (see Definition 1). Now, since for t such that s_t = ŝ_t we have A[t−1, t] = Ô[t−1, t] = 0, the last chain of inequalities together with Eq. (8) further implies

A_n − O_n ≤ Σ_i Σ_{t∈(t_{i−1},t_i]: s_t≠ŝ_t} 2 (A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}} − c_move^{(t_{i−1},t_i]}) / (t_i − t_{i−1})
  ≤ Σ_i 2 m((t_{i−1}, t_i]) · (A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}} − c_move^{(t_{i−1},t_i]}) / (t_i − t_{i−1}).   (9)

This establishes the desired upper-bound on A_n − O_n. As discussed in Section 3.1, this upper-bound is used to derive our non-robust results for uniform (Section 4.2) and general (Section 4.3) metric spaces. The main task in those two sections will be to show that m((t_{i−1}, t_i]) is sufficiently smaller than t_i − t_{i−1}.

Proof of Theorem 1 (B): uniform metric spaces

We now use the upper-bound on A_n − O_n given by Eq. (9) to show that ALG is (1 + O(q))-competitive under Assumption 1, i.e., we show Theorem 1 (B). We distinguish between two cases: t_i − t_{i−1} ≥ D and t_i − t_{i−1} < D.

Case t_i − t_{i−1} ≥ D.
In this case, by Corollary 1 we have m((t_{i−1}, t_i]) ≤ 2q|t_i − t_{i−1}|. Plugging this into Eq. (9) we derive

A_n − O_n ≤ Σ_i 2 m((t_{i−1}, t_i]) · (A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}}) / (t_i − t_{i−1}) ≤ 4q Σ_i (A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}}) = 4q (A_n + O_n).

Case t_i − t_{i−1} < D. We proceed by upper-bounding all the terms in Eq. (9). As the interval (t_{i−1}, t_i] is a subinterval of (t_{i−1}, t_{i−1} + D], we have

m((t_{i−1}, t_i]) ≤ m((t_{i−1}, t_{i−1} + D]) ≤ qD.

Also, since in a uniform metric space each request can be served remotely at cost at most 1, it trivially holds that

A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}} ≤ 2|t_i − t_{i−1}| + c_move^{(t_{i−1},t_i]}.   (10)

Combining the derived upper-bounds, we establish

A_n − O_n ≤ Σ_i 2qD · (2(t_i − t_{i−1}) + c_move^{(t_{i−1},t_i]} − c_move^{(t_{i−1},t_i]}) / (t_i − t_{i−1})   (11)
  = Σ_i 4qD.   (12)

To conclude this case, note that by definition either ALG or OPT moves within (t_{i−1}, t_i], incurring a cost of at least D. Therefore, A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}} ≥ D. This together with Eq. (12) implies

A_n − O_n ≤ 4q Σ_i (A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}}) = 4q (A_n + O_n).

Combining the two cases.
We have concluded that in either case it holds that A_n − O_n ≤ 4q(A_n + O_n), and hence we derive

A_n/O_n ≤ (1 + 4q)/(1 − 4q) = 1 + O(q).

This concludes the analysis for uniform metric spaces.

Proof of Theorem 1 (A): general metric spaces
As in the uniform case, our goal for general metric spaces is to use Eq. (3) to prove the advertised competitive ratio. However, as we discussed in Section 3.1, the main challenge in applying Eq. (3) lies in upper-bounding the ratio between m((t_{i−1}, t_i]) and t_i − t_{i−1} by a small constant, ideally much smaller than 1. Unfortunately, this ratio can be as large as 1, as OPT (or ALG) could possibly move on every single request. To see that, consider the scenario in which all the requests are on the x-axis and are requested in increasing order of their location. Then, for all but potentially the last D requests, OPT would move from request to request. To bypass this behavior of OPT and ALG, we define and analyze their "lazy" variants, i.e., variants in which OPT and ALG are allowed to move at the i-th request only when i is a multiple of εD. We now state the algorithm.
ALG lazy : Compute the optimal offline solution (on ˆ s ) while only movingon multiples of εD . Let A lazy be the cost of the solution s and let (cid:92) A lazy be the cost of the solution onˆ s . Note that there can be better offline algorithms for ˆ s , however ALG lazy has the minimal cost amongall online algorithms that are only allowed to move every multiple of εD .8 .3.2 Proof We also need to consider a lazy version of
OPT, which we do in the following lemma. There we show that making any algorithm lazy does not increase its cost by more than a factor of (1 + ε). In particular, we will show O^lazy ≤ (1 + ε) · OPT. Let A^lazy_t and O^lazy_t denote the corresponding costs at time t.

Lemma 1.
Let ε ∈ (0, 1]. Consider an arbitrary prefix w of length t of a sequence of requests. Let B_t be the cost of any algorithm ALG B serving w. Let B^lazy_{t′} be the cost of the algorithm that has to move at every time step that is a multiple of εD (and is not allowed to move at any other time step), and that at each such step moves to the position where ALG B is at that time step. Then we have B^lazy_{t′} ≤ (1 + ε) B_t.

Proof.
Let x_i be the distance of the i-th move and y_i the cost of serving the i-th request remotely. Then,

B_t = D Σ_i x_i + Σ_i y_i.

Now we relate B_t and B^lazy_{t′}. B^lazy_{t′} has two components: the moving cost and the cost of serving remotely. By triangle inequality, the moving cost is upper-bounded by D Σ_i x_i. Consider now the interval I_j = [jεD + 1, (j + 1)εD] for some integer j. To serve a request i ∈ I_j remotely, the cost is, by triangle inequality, at most y_i plus the cost of traversing all the points with indices in I_j to which ALG B has moved. Thus the cost per request i ∈ I_j is upper-bounded by y_i + Σ_{k∈I_j} x_k. Note that the summation Σ_{k∈I_j} x_k is charged to εD requests. Hence, summing over all the intervals gives

B^lazy_{t′} ≤ D Σ_i x_i + Σ_i y_i + εD Σ_i x_i ≤ (1 + ε) B_t.

Define O^lazy as the cost of the optimal algorithm for s that is allowed to move only at time steps which are multiples of εD. As in Lemma 1, we have O^lazy_n ≤ (1 + ε) O_n. Thus,

A^lazy_n / O_n ≤ (1 + ε) · A^lazy_n / O^lazy_n.   (13)

Now we need to upper-bound A^lazy_n / O^lazy_n. We will do that by showing that the statements developed in Section 4.1 also hold for A^lazy and O^lazy. To that end, observe that to derive Eq. (5) we used the fact that Â_t ≤ Ô_t. The analogous inequality Â^lazy ≤ Ô^lazy holds, since ALG lazy is the optimal offline algorithm that only moves at every multiple of εD. Hence, we can obtain the analogue of Eq. (9) for A^lazy_n − O^lazy_n:

A^lazy_n − O^lazy_n ≤ Σ_i 2 m((t_{i−1}, t_i]) · (A^lazy_{t_i} − A^lazy_{t_{i−1}} + O^lazy_{t_i} − O^lazy_{t_{i−1}}) / (t_i − t_{i−1}).   (14)

Since for the lazy versions we have |t_i − t_{i−1}| = εD, Assumption 1 implies m((t_{i−1}, t_i]) ≤ qεD. Plugging this into Eq.
(14) gives

A^lazy_n − O^lazy_n ≤ 2q Σ_i (A^lazy_{t_i} − A^lazy_{t_{i−1}} + O^lazy_{t_i} − O^lazy_{t_{i−1}}) = 2q (A^lazy_n + O^lazy_n).

From Eq. (13) we establish

A^lazy_n / O_n ≤ (1 + ε) · A^lazy_n / O^lazy_n ≤ (1 + ε) · (1 + 2q)/(1 − 2q).

This concludes the proof of Theorem 1 (A).
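Lemma 1's lazy transformation is constructive, and its (1 + ε) guarantee is easy to sanity-check numerically on the line metric. The harness below is our own addition (not from the paper); `w` plays the role of εD:

```python
import random

def cost(start, pos, reqs, D):
    """Page-migration cost on the line: D * |move| per move plus |serve|."""
    c, prev = 0.0, start
    for a, s in zip(pos, reqs):
        c += D * abs(prev - a) + abs(a - s)
        prev = a
    return c

def make_lazy(pos, w):
    """Lemma 1's transformation: only move at time steps that are multiples
    of w (= eps*D), jumping to wherever the base algorithm was at that step."""
    return [pos[(i // w) * w] for i in range(len(pos))]

random.seed(0)
D, eps = 5, 0.4
w = int(eps * D)  # = 2
for _ in range(200):
    reqs = [random.randint(-3, 3) for _ in range(30)]
    pos = [random.randint(-3, 3) for _ in range(30)]  # arbitrary base schedule
    base = cost(0, pos, reqs, D)
    lazy = cost(0, make_lazy(pos, w), reqs, D)
    # Lemma 1's deterministic guarantee, checked on random instances
    assert lazy <= (1 + eps) * base + 1e-9
```

The guarantee holds for every base schedule, not just in expectation: the lazy moves telescope against the base moves (triangle inequality), and each window of w requests is overcharged by at most the base's movement inside that window.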
Robust Page Migration
So far we have designed algorithms for the online page migration problem that have a small competitive ratio when Assumption 1 holds. In this section we build on those algorithms and design a (robust) algorithm that performs well even when Assumption 1 does not hold, while still retaining competitiveness when Assumption 1 is true. We refer to this algorithm by
ALG robust . For
ALG robust we prove the following.
Theorem 2.
Let γ be the competitive ratio of ALG for the online page migration problem, and let q be a positive number smaller than a fixed constant. If Assumption 1 holds, then ALG robust is γ · (1 + O(q))-competitive, and otherwise ALG robust is O(1/q)-competitive.
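As a concrete (and highly simplified) illustration of the control flow behind ALG robust in Theorem 2, the sketch below follows precomputed lazy positions until the first prefix on which the assumption check fails, then jumps to the tracked online algorithm's positions. The name `robust_positions` and the precomputed-inputs interface are our own simplifications, not the paper's implementation:

```python
def robust_positions(n, lazy_pos, online_pos, prefix_ok):
    """Page position of the robust algorithm at each of n requests.
    lazy_pos[i] / online_pos[i]: where the delayed offline ("lazy") schedule
    and the fallback online algorithm would be at request i; prefix_ok(i):
    whether the prediction-error assumption still holds on the prefix s[0..i]."""
    positions, fallback = [], False
    for i in range(n):
        if not fallback and not prefix_ok(i):
            fallback = True  # first violation: switch permanently to the online algorithm
        positions.append(online_pos[i] if fallback else lazy_pos[i])
    return positions
```

For example, with a check that fails from request 2 onward, the schedule follows the lazy positions for two steps and the online positions afterwards.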
1, then Algorithm
ALG robust is (1 + O ( x · q ))-competitive ifAssumption 1 holds and O (1 / ( x · q ))-competitive otherwise. robust Let
ALG online refer to an arbitrary online algorithm for the problem, e.g., [Wes94]. We now define
ALG^robust. This algorithm switches from ALG to ALG^online when it detects that Assumption 1 does not hold. Instead of using ALG directly, we use a "lazy" version of ALG that works as follows: follow the optimal offline solution given by ALG with a delay of $6qD$ steps. Let ALG^lazy be the corresponding algorithm. (A lazy version for a different setup of parameters was presented in Section 4.3.)

Throughout its execution, ALG^robust tracks in its memory the execution of ALG^online on the prefix of $s$ seen so far; that is, ALG^robust maintains where ALG^online would be at any given point in time, in case a fallback is needed. ALG^robust simply executes ALG^lazy until a violation of Assumption 1 is detected. Once such a violation is detected, the algorithm switches to ALG^online by moving its location to ALG^online's current location. From there on, ALG^online is executed.

We now analyze
ALG^robust and show that, if Assumption 1 holds, then ALG and ALG^robust are close in terms of total cost, and otherwise the cost of ALG^robust is at most a factor $O(1/q)$ larger than that of ALG^online.

Case 1: Assumption 1 holds for the entire sequence. In this case ALG^robust executes ALG^lazy throughout. Following the same argument, with $\varepsilon = 6q$, as given for $A^{lazy}$ in the proof of Lemma 1, we have
$$A^{lazy}_t \le (1 + 6q) A_t. \quad (15)$$
Thus,
$$A^{robust} = A^{lazy}_n \le (1 + 6q) A_n \le \gamma (1 + O(q)) O_n,$$
where we used the assumption that ALG is $\gamma$-competitive. This completes this case.

Case 2: Assumption 1 is violated at the $t$-th request. Let $t' = t - 6qD + 1$; note that up to this point in time no violation has occurred. We define the following: $a$ is the position of ALG^lazy at time $t'$; $a'$ is the position of ALG^online at time $t' + 1$; $o$ is the position of OPT at time $t'$; and $O_{p,t''}$ is the cost of OPT up to time $t''$ under the constraint that OPT is at position $p$ at time $t''$. In what follows we assume that the following inequality holds, deferring its proof to the end of this section:
$$d(a, p_0) \le O_{t'} / (qD). \quad (16)$$
Intuitively, this means that we can bound the distance of $a$ from the starting position $p_0$ by the cost of OPT. The total cost of ALG^robust decomposes as
$$A^{robust} \le A^{lazy}_{t'} + A^{online}_{t'+1,n} + D \cdot d(a, a'). \quad (17)$$
As ALG^lazy and the constrained optimum defining $O_{a,t'}$ are at the same position $a$ at time $t'$, the inequality $A^{lazy}_{t'} \le (1 + cq) O_{a,t'}$ follows from Eq. (15) for a suitable constant $c$. Note that $O_{t'} \ge D \cdot d(p_0, o)$, which holds since, by the triangle inequality, this cost is already incurred by moving to $o$. Next, using the triangle inequality again, we get
$$A^{lazy}_{t'} \le (1 + cq) O_{a,t'} \le (1 + cq)\bigl(O_{o,t'} + D \cdot d(a, o)\bigr) \le (1 + cq)\bigl(O_{o,t'} + D \cdot d(a, p_0) + D \cdot d(p_0, o)\bigr) \le (1 + cq)\bigl(O_{o,t'} + O_{t'}/q + O_{t'}\bigr) = O(O_{t'}/q). \quad (18)$$
Furthermore, using Eq. (16), the triangle inequality and a simple lower bound on $A^{online}_{t'}$, as well as Eq. (18), we get
$$D \cdot d(a, a') \le D \cdot d(a, p_0) + D \cdot d(p_0, a') \le O_{t'}/q + A^{online}_{t'} \le 2 O_{t'}/q. \quad (19)$$
Thus, plugging Eq. (19) and Eq. (18) into Eq. (17) and using $A^{online} \le O(O_n)$, we get
$$A^{robust} \le A^{lazy}_{t'} + A^{online}_{t'+1,n} + D \cdot d(a, a') = O(O_{t'}/q) + O(O_n) + 2 O_{t'}/q = O(O_n/q).$$
Thus, it only remains to prove Eq. (16), which we do using the following lemma. The lemma shows that if
ALG moves its page to a location that is far from $p_0$, then there must be many request points that are far from $p_0$. Later we show that OPT pays a considerable cost to serve them, even if it does so remotely. See Fig. 2 for an illustration of the lemma.
Lemma 2. Let $P = p_0, p_1, \ldots$ be the sequence of page locations that ALG produces, starting from the initial location $p_0$. Let $p_{max}$ be the furthest point with respect to $p_0$ that a page is moved to by ALG, i.e., $p_{max} \stackrel{\text{def}}{=} \arg\max_{p_i} d(p_i, p_0)$. In case there are several pages at $p_{max}$, we let $p_{max}$ be the first among them. Let $d_{max} \stackrel{\text{def}}{=} d(p_{max}, p_0)$. Let $\bar{P}$ be the maximal consecutive subsequence of $P$ that includes $p_{max}$ and consists of pages that are each at distance at least $r \stackrel{\text{def}}{=} d_{max}/4$ from $p_0$. Then, for $q < 1/24$, the page locations in $\bar{P}$ together serve at least $6qD$ points at distance at least $r$ from $p_0$ in the oracle sequence.

Proof. The proof proceeds by contradiction. Suppose that $\bar{P}$ serves fewer than $6qD$ points in the oracle sequence. We will show that a better solution consists of replacing the sequence $\bar{P}$ by simply moving to $p_0$ and serving all of its points remotely from there. Since $\bar{P}$ is a maximal consecutive subsequence of $P$ including $p_{max}$ such that each page location is at distance at least $r$ from $p_0$, ALG moves by at least $d_{max} - r$ within $\bar{P}$. Hence, the cost of ALG on the page locations $\bar{P}$ is at least
$$D(d_{max} - r) + \sum_i d_i, \quad (20)$$
where $\sum_i d_i$ represents the distances of the requests served remotely from the page locations in $\bar{P}$ (depicted as solid lines connected to $p'$, $p_{max}$ and $p''$ in Fig. 2). Consider a request $s$ that is served from a location $p \in \bar{P}$ in the original solution (the one using $\bar{P}$). In the new solution, where all points are served from $p_0$, serving any such request has, by the triangle inequality, cost at most $d(p_0, p) + d(p, s) \le d_{max} + d(p, s)$. Moreover, observe that the sequence $\bar{P}$ consists of at most $6qD$ locations, since otherwise there would be a location that does not serve any point. Putting everything together, the cost of the new solution is at most
$$2Dr + 6qD \cdot d_{max} + \sum_i d_i, \quad (21)$$
where the term $2Dr$ accounts for moving the page from the location preceding $\bar{P}$ to $p_0$ (a cost of at most $Dr$) and for moving the page from $p_0$ to the location just after $\bar{P}$ (also a cost of at most $Dr$), and the term $6qD \cdot d_{max}$ accounts for the extra cost of at most $d_{max}$ for each of the fewer than $6qD$ remotely served requests. Recall that $r = d_{max}/4$. Thus, the solution of Eq. (21) is cheaper than that of Eq. (20) for $q$ small enough (i.e., for $q < 1/24$, since then $2Dr + 6qD \cdot d_{max} = (1/2 + 6q) D \cdot d_{max} < (3/4) D \cdot d_{max} = D(d_{max} - r)$), contradicting the optimality of ALG on the oracle sequence.

By Lemma 2, we conclude that there are at least $6qD$ points at distance at least $r$ from $p_0$ in the oracle sequence. Note that the final sequence $s$ will contain at least $6qD - 2qD$ of these points, due to our assumption on the noise and the fact that the first violation of Assumption 1 was detected at time $t$. OPT has to serve these points as well, and thus
$$O_{t'} \ge (6qD - 2qD) \cdot r = 4qD \cdot d_{max}/4 = qD \cdot d_{max} \ge qD \cdot d(a, p_0),$$
where the last inequality uses that $a$ is one of ALG's page locations, so $d(a, p_0) \le d_{max}$. This yields Eq. (16) and therefore completes the proof.

Figure 2: An illustration of Lemma 2: the fact that ALG moved a page to a location far away (at distance $d_{max}$) from $p_0$ means that there must be many points at distance at least $r = d_{max}/4$ from $p_0$; OPT will have to serve most of these points as well. The squares denote locations of pages, the small circles denote page requests, and the solid lines between squares and small circles depict remotely served requests. The dashed lines denote the movement of the page. The sequence $\bar{P}$ consists of $p'$, $p_{max}$ and $p''$.

Experiments

We evaluate our approach on two synthetic data sets, and compare it to the state-of-the-art algorithm for page migration due to Westbrook [Wes94]. The two data sets are obtained by generating "predicted" sequences of points in the plane, and then perturbing each point by independent Gaussian noise to obtain "actual" sequences. The predicted sequence is fed to our algorithm, while the actual sequence forms the input of the online algorithm. Recall that our algorithm sees the actual sequence only in an online fashion.
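The perturbation just described — each predicted point shifted by independent Gaussian noise per coordinate to produce the actual sequence — can be sketched as follows; the function name and the tuple representation of points are our own illustrative choices:

```python
import random


def perturb(predicted, sigma):
    """Turn a "predicted" sequence of 2D points into an "actual" one by
    adding independent Gaussian noise N(0, sigma) to each coordinate."""
    return [(x + random.gauss(0.0, sigma), y + random.gauss(0.0, sigma))
            for (x, y) in predicted]
```

For $\sigma = 0$ the actual sequence coincides with the predicted one, so the prediction error rate is 0; increasing $\sigma$ increases the prediction error.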
Data sets
The predicted sequences of the two sets of points are generated as follows:

1. Line process: the $t$-th point $(\hat{X}_1(t), \hat{X}_2(t))$ is equal to $(t, 0)$.

2. Brownian motion process: the $t$-th point $\hat{X}(t)$ is equal to $\hat{X}(t-1) + (\Delta_1(t), \Delta_2(t))$, where $\Delta_1(t)$ and $\Delta_2(t)$ are i.i.d. random variables chosen from $N(0, 1)$.

In both cases, the $t$-th request $X(t)$ in the actual sequence is equal to $\hat{X}(t) + (N_1(t), N_2(t))$, where $N_1(t)$ and $N_2(t)$ are i.i.d. random variables chosen from $N(0, \sigma)$. The value of $\sigma$ varies, depending on the specific experiment. An example Brownian motion sequence is depicted in Fig. 3.

Figure 3: An example of a Brownian motion sequence. The predicted sequence is in blue, the actual sequence is in red.

Set up
We use the two data sets to compare the following three algorithms:

• Predict refers to our algorithm, which computes the optimum solution for the predicted sequence (by using standard dynamic programming) and follows that optimum to serve the actual requests.

• Opt is the optimum offline algorithm executed on the actual sequence. This optimum is computed by using the same dynamic programming as in the implementation of Predict.

• Online is the state-of-the-art randomized online algorithm for page migration, which achieves a competitive ratio of roughly 2.62 [Wes94]. Since Online is randomized, we perform several runs of Online and report the average of all the runs as the output; the standard deviation is smaller than 5%.

For both data sets, we depict the costs of the three algorithms as a function of either $D$ or $\sigma$. See the text above each plot for the specification.

Results
The results for the Brownian motion data set are depicted in Fig. 4. The top two plots show the cost incurred by each algorithm for fixed values of $\sigma$ and varying $D$, while the bottom two plots show the costs for fixed values of $D$ while $\sigma$ varies. Not surprisingly, for low values of $\sigma$ the costs of Predict and Opt are almost equal, since the predicted and the actual sequences are very close to each other. As the value of $\sigma$ increases, their costs start to diverge. Nevertheless, the benefit of predictions is clear, as the cost of Predict is significantly lower than the cost of Online. Interestingly, this holds even though the fraction of requests predicted exactly is very close to 0.

The results for the Line data set are depicted in Fig. 5. They are qualitatively similar to those for the Brownian motion data set.

Figure 4: Comparison between Predict, Opt and Online on the Brownian motion data set. (a), (b): fixed $\sigma$, varying $D$; (c): fixed $D = 2$, varying $\sigma$; (d): fixed $D = 5$, varying $\sigma$.

Figure 5: Comparison between Predict, Opt and Online on the Line data set. (a), (b): fixed $\sigma$, varying $D$; (c): fixed $D = 2$, varying $\sigma$; (d): fixed $D = 5$, varying $\sigma$.

References

[ABF93] Baruch Awerbuch, Yair Bartal, and Amos Fiat. Competitive distributed file allocation. In
STOC, volume 93, pages 164–173, 1993.
[ABF03] Baruch Awerbuch, Yair Bartal, and Amos Fiat. Competitive distributed file allocation. Information and Computation, 185(1):1–40, 2003.
[BBM17] Marcin Bienkowski, Jaroslaw Byrka, and Marcin Mucha. Dynamic beats fixed: On phase-based algorithms for file migration. ICALP, 2017.
[BCI97] Yair Bartal, Moses Charikar, and Piotr Indyk. On page migration and other relaxed task systems. SODA, 1997.
[BDSV18] Maria-Florina Balcan, Travis Dick, Tuomas Sandholm, and Ellen Vitercik. Learning to branch. In International Conference on Machine Learning, pages 353–362, 2018.
[BFK+17] Joan Boyar, Lene M Favrholdt, Christian Kudahl, Kim S Larsen, and Jesper W Mikkelsen. Online algorithms with advice: A survey. ACM Computing Surveys (CSUR), 50(2):19, 2017.
[BFR95] Yair Bartal, Amos Fiat, and Yuval Rabani. Competitive algorithms for distributed data management. Journal of Computer and System Sciences, 51(3):341–358, 1995.
[Bie12] Marcin Bienkowski. Migrating and replicating data in networks. Computer Science - Research and Development, 27(3):169–179, 2012.
[BJPD17] Ashish Bora, Ajil Jalal, Eric Price, and Alexandros G Dimakis. Compressed sensing using generative models. In International Conference on Machine Learning, pages 537–546, 2017.
[BS89] David L Black and Daniel D Sleator. Competitive algorithms for replication and migration problems. Carnegie-Mellon University, Department of Computer Science, 1989.
[CLRW97] Marek Chrobak, Lawrence L Larmore, Nick Reingold, and Jeffery Westbrook. Page migration algorithms using work functions. Journal of Algorithms, 24(1):124–157, 1997.
[GP19a] Sreenivas Gollapudi and Debmalya Panigrahi. Online algorithms for rent-or-buy with expert advice. In Proceedings of the 36th International Conference on Machine Learning, pages 2319–2327, 2019.
[GP19b] Sreenivas Gollapudi and Debmalya Panigrahi. Online algorithms for rent-or-buy with expert advice. In International Conference on Machine Learning, pages 2319–2327, 2019.
[HIKV19] Chen-Yu Hsu, Piotr Indyk, Dina Katabi, and Ali Vakilian. Learning-based frequency estimation algorithms. In
International Conference on Learning Representations, 2019.
[KBC+18] Tim Kraska, Alex Beutel, Ed H Chi, Jeffrey Dean, and Neoklis Polyzotis. The case for learned index structures. In Proceedings of the 2018 International Conference on Management of Data, pages 489–504, 2018.
[KDZ+17] Elias Khalil, Hanjun Dai, Yuyu Zhang, Bistra Dilkina, and Le Song. Learning combinatorial optimization algorithms over graphs. In Advances in Neural Information Processing Systems, pages 6348–6358, 2017.
[KM16] Amanj Khorramian and Akira Matsubayashi. Uniform page migration problem in Euclidean space. Algorithms, 9(3):57, 2016.
[KPS+19] Ravi Kumar, Manish Purohit, Aaron Schild, Zoya Svitkina, and Erik Vee. Semi-online bipartite matching. ITCS, 2019.
[LLMV20] Silvio Lattanzi, Thomas Lavastida, Benjamin Moseley, and Sergei Vassilvitskii. Online scheduling via learned weights. In Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1859–1877. SIAM, 2020.
[LRWY98] Carsten Lund, Nick Reingold, Jeffery Westbrook, and Dicky Yan. Competitive on-line algorithms for distributed data management. SIAM Journal on Computing, 28(3):1086–1111, 1998.
[LV18] Thodoris Lykouris and Sergei Vassilvitskii. Competitive caching with machine learned advice. In International Conference on Machine Learning, pages 3302–3311, 2018.
[Mat15] Akira Matsubayashi. A 3 + Ω(1) lower bound for page migration. pages 314–320. IEEE, 2015.
[Mit18] Michael Mitzenmacher. A model for learned Bloom filters and optimizing by sandwiching. In Advances in Neural Information Processing Systems, pages 464–473, 2018.
[MMS90] Mark S Manasse, Lyle A McGeoch, and Daniel D Sleator. Competitive algorithms for server problems. Journal of Algorithms, 11(2):208–230, 1990.
[MPB15] Ali Mousavi, Ankit B Patel, and Richard G Baraniuk. A deep learning approach to structured signal recovery. In 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 1336–1343. IEEE, 2015.
[PSK18] Manish Purohit, Zoya Svitkina, and Ravi Kumar. Improving online algorithms via ML predictions. In Advances in Neural Information Processing Systems, pages 9661–9670, 2018.
[Roh20] Dhruv Rohatgi. Near-optimal bounds for online caching with machine learned advice. In Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1834–1845. SIAM, 2020.
[Unc16] Special Semester on Algorithms and Uncertainty, 2016. https://simons.berkeley.edu/programs/uncertainty2016.
[Wes94] Jeffery Westbrook. Randomized algorithms for multiprocessor page migration. SIAM Journal on Computing, 23(5):951–965, 1994.
[WLKC16] Jun Wang, Wei Liu, Sanjiv Kumar, and Shih-Fu Chang. Learning to hash for indexing big data - a survey.