Online Page Migration with ML Advice
Piotr Indyk, Frederik Mallmann-Trenn, Slobodan Mitrović, Ronitt Rubinfeld
Abstract
We consider online algorithms for the page migration problem that use predictions, potentially imperfect, to improve their performance. The best known online algorithms for this problem, due to Westbrook '94 and Bienkowski et al. '17, have competitive ratios strictly bounded away from 1. In contrast, we show that if the algorithm is given a prediction of the input sequence, then it can achieve a competitive ratio that tends to 1 as the prediction error rate tends to 0. Specifically, the competitive ratio is equal to 1 + O(q), where q is the prediction error rate. We also design a "fallback option" that ensures that the competitive ratio of the algorithm for any input sequence is at most O(1/q). Our result adds to the recent body of work that uses machine learning to improve the performance of "classic" algorithms.

* CSAIL, MIT, {indyk,slobo}@mit.edu
† King's College London, [email protected]
‡ CSAIL, MIT, [email protected]

Introduction

Recently, there has been a lot of interest in using machine learning to design improved algorithms for various computational problems. This includes work on data structures [KBC+18, Mit18], online algorithms [LV18, PSK18, GP19a, Roh20], combinatorial optimization [KDZ+17, BDSV18], similarity search [WLKC16], compressive sensing [MPB15, BJPD17] and streaming algorithms [HIKV19]. This body of work is motivated by the fact that modern machine learning methods are capable of discovering subtle structure in collections of input data, which can be utilized to improve the performance of algorithms that operate on similar data.

In this paper we focus on learning-augmented online algorithms. An online algorithm makes non-revocable decisions based only on the part of the input seen so far, without any knowledge of the future. It is thus natural to consider a relaxation of the model where the algorithm has access to (imperfect) predictors of the future input that could be used to improve the algorithm's performance. Over the last couple of years this line of research has attracted growing attention in the machine learning and algorithms literature, for classical online problems such as caching [LV18, Roh20], ski-rental and scheduling [PSK18, GP19b, LLMV20] and graph matching [KPS+19]. A natural "optimistic" strategy is to simply trust the prediction and run the optimal offline algorithm for it; in general this can fail badly. For instance, for caching, even a single misprediction can lead to an unbounded competitive ratio [LV18].

In this paper we show that, perhaps surprisingly, the aforementioned "optimistic" strategy leads to near-optimal performance for some well-studied online problems. (To the best of our knowledge, the only problem for which this strategy is known to result in an optimal algorithm is online bipartite matching; see Section 1.1 for more details.) We focus on the problem of page migration [BS89] (a.k.a. file migration [Bie12] or [MMS90]). Here, the algorithm is given a sequence s of points (called requests) s_1, s_2, ... from a metric space (X, d), in an online fashion. The state of the algorithm is also a point from (X, d). Given the next request s_i, the algorithm moves to its next state a_i (at the cost of D · d(a_{i-1}, a_i), where D is a parameter), and then "satisfies" the request s_i (at the cost of d(a_i, s_i)). The objective is to satisfy all requests while minimizing the total cost. The problem has been the focus of a large body of research, see e.g., [ABF93, Wes94, CLRW97, BCI97, KM16, BBM17].

The best known algorithms for this problem have competitive ratios of 4 (a deterministic algorithm due to [BBM17]), 3 (a randomized algorithm against adaptive adversaries due to [Wes94]) and approximately 2.618 (a randomized algorithm against oblivious adversaries due to [Wes94]). The original paper [BS89] also showed that the competitive ratio of any deterministic algorithm must be at least 3, which was recently improved to 3 + ε for some ε > 0.

Our results
Suppose that we are given a predicted request sequence ŝ that, in each interval of length εD, differs from the actual sequence s on at most a fraction q of positions, where ε, q ∈ (0, 1) are the parameters (note that the lower the values of ε and q are, the stronger our assumption is). Under this assumption we show that the optimal offline solution for ŝ is a (1 + ε)(1 + O(q))-competitive solution for s, as long as the parameter q is sufficiently small. Furthermore, to make the algorithm robust, we also design a "fallback option", which is triggered if the input sequence violates the aforementioned assumption (i.e., if the fraction of errors in the suffix of the current input sequence exceeds q). The fallback option ensures that the competitive ratio of the algorithm for any input sequence is at most O(1/q). Thus, our final algorithm produces a near-optimal solution if the prediction error is small, while guaranteeing a constant competitive ratio otherwise.

For the case when the underlying metric is uniform, i.e., all distances between distinct points are equal to 1, we further improve the competitive ratio to 1 + O(q) under the assumption that each interval of length D differs from the actual sequence in at most qD positions. That is, the parameter ε is not needed in this case. Moreover, any algorithm has a competitive ratio of at least 1 + Ω(q).

It is natural to wonder whether the same guarantees hold even when the predicted sequence differs from the actual sequence on at most a fraction q of positions distributed arbitrarily over ŝ, as opposed to over chunks of length εD. We construct a simple example showing that such a relaxed assumption results in the same lower bound as for the classical problem.

Related work

Multiple variations of the page migration problem have been studied over the years. For example, if the page can be copied as well as moved, the problem has been studied under the name of file allocation, see e.g., [BFR95, ABF03, LRWY98]. Other formulations add constraints on node capacities, allow dynamically changing networks, etc.
See the survey [Bie12] for an overview.

There is a large body of work concerning online algorithms working under stochastic or probabilistic assumptions about the input [Unc16]. In contrast, in this paper we do not make such assumptions, and allow worst case prediction errors (similarly to [LV18, KPS+19, PSK18]). Among these works, our prediction error model (bounding the fraction of mispredicted requests) is most similar to the "agnostic" model defined in [KPS+19], where the prediction may err on d vertices. Since each vertex impacts at most one matching edge, it directly follows that d errors reduce the matching size by at most d. In contrast, in our case a single error can affect the cost of the optimum solution by an arbitrary amount. Thus, our analysis requires a more detailed understanding of the properties of the optimal solution.

Multiple papers have studied online algorithms that are given a small number of bits of advice [BFK+...].

Page Migration
In the classical version, the algorithm is given a sequence s of points (called requests) s = (s_i)_{i∈[n]} from a metric space (X, d), in an online fashion. The state of the algorithm (i.e., the page) is also a point from (X, d). Given the next request s_i, the algorithm moves to its next state a_i (at the cost of D · d(a_{i-1}, a_i), where D > 1 is a parameter), and then serves the request s_i (at the cost of d(a_i, s_i)). The objective is to satisfy all requests while minimizing the total cost. We consider a version of this problem where the algorithm is given, prior to the arrival of the requests, a predicted sequence ŝ = (ŝ_i)_{i∈[n]}. The (final) sequence s is generated adversarially from ŝ and an arbitrary adversarial sequence s* = (s*_i)_{i∈[n]}; that is, either s_i = ŝ_i or s_i = s*_i. If we make no assumptions on how well s is predicted by ŝ, then the problem is no easier than the classical online version. On the other hand, if s = ŝ, then one obtains an optimal online algorithm by simply computing the optimal offline algorithm. The interesting regime lies in between these two cases. We will make the following assumption throughout the paper, which roughly speaking demands that a 1 − q fraction of the input is correctly predicted and that the q fraction of errors is somewhat spread out.

(Footnote: if each interval of length D has at most a fraction q of errors, then it is also the case that each interval of length √q · D has at most a fraction √q of errors. Thus, if q tends to 0, the competitive ratio tends to 1 even if the interval length remains fixed.)

Definition 1 (Number of mismatches m(·)). Let I be an interval of indices. We define m(I) := Σ_{t∈I} 1[s_t ≠ ŝ_t] to be the number of mismatches between s and ŝ within the interval I.

Assumption 1.
Consider an interval I of s of length εD. For any such I it holds that m(I) ≤ qεD.

Remark 1.
Relaxing Assumption 1 by allowing the adversary to change an arbitrary q fraction of the input results in the same lower bound as for the classical problem. To see this, consider an arbitrary instance on qn elements that gives a lower bound of c in the classical problem. Call this sequence of elements adversarial. Let ŝ consist of n elements equal to the starting point; that is, ŝ is simply the starting position replicated n times. Let s be equal to the sequence ŝ with its suffix of length qn replaced by the adversarial sequence. Now, on s defined in this way no algorithm can be better than c-competitive. Hence, in general this relaxation of Assumption 1 gives no advantage.

Our main results hold for general metric spaces, where for all p, p′, p′′ ∈ X all of the following hold: d(p, p) = 0; d(p, p′) > 0 for p ≠ p′; d(p, p′) = d(p′, p); and d(p, p′′) ≤ d(p, p′) + d(p′, p′′). We obtain better results for uniform metric spaces, where d(p, p′) = 1 for all p ≠ p′.

Notation
Given a sequence s, we use s_i to denote the i-th element of s. For integers i and j such that 1 ≤ i ≤ j, we use s_[i,j] to denote the subsequence of s consisting of the elements s_i, ..., s_j.

For a fixed algorithm, let p_i be the position of the page at time i. In particular, p_0 denotes the start position for all algorithms.

Given an algorithm B that pays cost C for serving n requests, we denote by C_{t1,t2} the cost paid by B during the interval [t1, t2]. We sometimes abuse notation and write C_t as a shorthand for C_{1,t}. In particular, C denotes C_{1,n} as well as C_n. This notation is most often used in the context of our algorithm ALG and the optimal solution OPT, whose total serving costs are A and O, respectively.

Our two main contributions are: an algorithm ALG that is (1 + O(q))-competitive provided Assumption 1 holds; and a black-box reduction from ALG to an O(1/q)-competitive algorithm ALG robust for when Assumption 1 does not hold. In Section 3.1 we present an overview of ALG, while an overview of ALG robust is given in Section 3.2.
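To make the cost model of Section 2 concrete, here is a small sketch (ours, not the paper's) of the serving/moving cost, the mismatch count m(I) from Definition 1, and a sliding-window check of Assumption 1. The function names and interfaces are our own illustrations:

```python
def migration_cost(start, states, requests, D, dist):
    """Total page-migration cost: at request i the algorithm moves its page
    from the previous state to states[i] (paying D * dist), then serves the
    request remotely (paying dist(states[i], requests[i]))."""
    cost, prev = 0.0, start
    for a, s in zip(states, requests):
        cost += D * dist(prev, a) + dist(a, s)
        prev = a
    return cost

def mismatches(s, s_hat, lo, hi):
    """m(I) of Definition 1 for the interval I = [lo, hi)."""
    return sum(1 for t in range(lo, hi) if s[t] != s_hat[t])

def assumption_holds(s, s_hat, D, eps, q):
    """Assumption 1: every window of w = eps*D consecutive requests contains
    at most q*w mismatches between s and its prediction s_hat."""
    w = int(eps * D)  # window length eps*D, assumed integral here
    return all(mismatches(s, s_hat, t, t + w) <= q * w
               for t in range(len(s) - w + 1))
```

For example, on the line metric with `dist = lambda x, y: abs(x - y)`, serving requests [0, 1, 1] from states [0, 0, 1] with D = 2 costs 0 + 1 + 2 = 3.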
Algorithm ALG

ALG (given as Algorithm 1) simply computes the optimal offline solution and moves pages accordingly.
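The text does not fix a particular offline solver; on a finite metric space the offline optimum that ALG relies on can be computed exactly by a standard dynamic program over page locations. The sketch below is our illustration (the name `offline_opt` and its interface are assumptions), one of several possible O(n·|X|²) implementations:

```python
import math

def offline_opt(points, dist, start, requests, D):
    """Exact offline page-migration optimum on a finite metric space.
    best[p] = minimum cost of serving the requests so far with the page at p.
    Returns (optimal cost, page position chosen at each request)."""
    best = {p: (0.0 if p == start else math.inf) for p in points}
    parents = []
    for s in requests:
        new_best, par = {}, {}
        for p in points:
            # move from the cheapest previous location r, then serve s from p
            c_move, p_prev = min((best[r] + D * dist(r, p), r) for r in points)
            new_best[p] = c_move + dist(p, s)
            par[p] = p_prev
        parents.append(par)
        best = new_best
    end = min(best, key=best.get)
    schedule = [end]                 # backtrack the optimal page positions
    for par in reversed(parents):
        schedule.append(par[schedule[-1]])
    schedule.reverse()               # schedule[0] == start
    return best[end], schedule[1:]
```

For instance, with points {0, 1} on the line, start 0 and requests [1, 1, 1]: for D = 2 it is optimal to move immediately (cost 2), while for D = 4 it is optimal to stay and serve remotely (cost 3).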
Algorithm 1: ALG(i, s, ŝ)
Input: the number i of the next request; s and ŝ are sequences as defined in Section 2.
Let p_i be the position of the page in the optimal algorithm at the i-th request with respect to ŝ. Move the page to p_i and serve the request s_i.

The main challenge in proving that ALG still performs well in the online setting lies in leveraging the optimality of
ALG with respect to the offline sequence. The reason for this is that, due to s and ŝ not being identical, OPT and ALG may be on different page locations throughout all the requests. In addition, we have no control over which q fraction of any interval of length D is changed, nor what it is changed to. In particular, if s_i ≠ ŝ_i, then s_i and ŝ_i could be very far from each other. To circumvent this, we use the following way to argue about the offline optimality, that is, about the optimality computed with respect to ŝ.

We think of ALG (OPT, respectively) as a sequence of page locations that are defined with respect to ŝ (s, respectively). These page locations do not change even if, for instance, the i-th online request to ALG deviates from ŝ_i. Let A_t (O_t, respectively) be the cost of ALG (OPT, respectively) for serving the t requests given by s_[1,t]. Similarly, let Â_t (Ô_t, respectively) be the cost of ALG (OPT, respectively) for serving the oracle subsequence ŝ_[1,t]. In particular, A_n is the cost of ALG (optimal on ŝ) on the final sequence s, whereas Ô_n is the cost of the optimal algorithm for s on the predicted sequence ŝ. It is convenient to think of Ô_n as the 'evil twin' of A_n.

Due to the optimality of ALG on the offline sequence, we have

A_n − O_n = A_n − Â_n + Â_n − O_n ≤ A_n − Â_n + Ô_n − O_n.   (1)

The intuition behind this is best explained pictorially, which we do in Fig. 1. Here ALG is at a and OPT is at o. In the depicted example a request is moved from s to ŝ. This causes A_n − Â_n to increase; however, at the same time, Ô_n − O_n decreases by almost the same amount. In fact, one can show that for such a moved request the right hand side of Eq. (1) increases by no more than 2d(a, o). For requests that are not moved, i.e., s = ŝ, the costs of ALG and OPT do not change. It remains to bound d(a_t, o_t), which we do next. By triangle inequality, it holds that

d(a_t, o_t) ≤ d(a_t, s_t) + d(o_t, s_t) ≤ A_t − A_{t−1} + O_t − O_{t−1}.   (2)

Consider an interval (t_{i−1}, t_i]. Let c_move^{(t_{i−1},t_i]} be the total sum of the moving costs of both OPT and ALG for the requests in the interval (t_{i−1}, t_i]. As a reminder (see Definition 1), for a given interval I, m(I) is the number of mismatches between s and ŝ within I. From Eq. (2), we derive

A_n − O_n ≤ Σ_i 2 m((t_{i−1}, t_i]) · (A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}} − c_move^{(t_{i−1},t_i]}) / (t_i − t_{i−1}).   (3)

We would like the right hand side of Eq. (3) to be small, implying that A_n − O_n is small as well. To understand what is required for it to be small, assume for a moment that m((t_{i−1}, t_i]) = α(t_i − t_{i−1}). Then the rest of the summation telescopes to A_n + O_n, and Eq. (3) reduces to A_n − O_n ≤ 2α(A_n + O_n). Now, if α is sufficiently small, e.g., α ≤ 2q, then we are able to upper-bound Eq. (3) by 4q(A_n + O_n) and derive A_n/O_n ≤ (1 + 4q)/(1 − 4q), which gives the desired competitive factor.

So, to utilize Eq. (3), in our proof we will focus on showing that m((t_{i−1}, t_i]) is sufficiently smaller than t_i − t_{i−1}. However, this can be challenging, as OPT is allowed to move often, potentially on every request, which results in t_i − t_{i−1} being very small. If t_i − t_{i−1} is too small, then Assumption 1 gives no information about m((t_{i−1}, t_i]). However, if the intervals t_i − t_{i−1} were large enough, e.g., at least βD for some positive constant β, then from Assumption 1 we would be able to conclude that α = O(q). Since in principle OPT can move in every step, we design 'lazy' versions of OPT and
ALG that only move O(1) times in any interval of length D. This will enable us to argue that t_i − t_{i−1} is not too small. It turns out that the respective competitive factors of the lazy versions with respect to the original versions are very close, allowing us to prove

A_n/O_n ≈ A^lazy_n/O^lazy_n ≤ (1 + ε) · (1 + O(q))/(1 − O(q)).

(We oversimplified here, since the right hand side of (1) only holds for the sum over all requests, but a similar argument can be made for a single request.)

Figure 1: A pictorial representation of Eq. (1); ALG is at a, OPT is at o, and a request is moved from s to ŝ.

ALG robust, a robust version of ALG
We now describe
ALG robust . This algorithm follows a “lazy” variant of
ALG as long as Assumption 1holds, and otherwise switches to
ALG online . Instead of using
ALG directly, we use a ‘lazy’ version of
ALG that works as follows: Follow the optimal offline solution given by
ALG with a delay of 6qD steps. Let ALG lazy be the corresponding algorithm. We point out that performing some delay with respect to
ALG is crucial here. To see that, consider the following example in the case of uniform metric spaces: s = {0}^n and ŝ = {1}^n, and let the starting location be 0. According to ALG, the page should be moved from 0 to 1 in the very beginning, incurring a cost of D. On the other hand, OPT never moves from 0. If
ALG robust would follow
ALG until it realizes that the fraction of errors is too high, it would already pay a cost of at least D, leading to an unbounded competitive ratio. However, if ALG robust delays following
ALG, then it gets some "slack" in verifying whether the predicted sequence properly predicts the requests or not. As a result, when Assumption 1 holds, this delay increases the overall serving cost by a factor of 1 + O(q), but in turn achieves a bounded competitive ratio when this assumption does not hold.

While serving requests, ALG robust also maintains the execution of
ALG online , i.e.,
ALG robust maintains where
ALG online would be at a given point in time, in case a fallback is needed. Now
ALG robust simply executes
ALG lazy unless a violation of Assumption 1 is detected. Once such a violation is detected, the algorithm switches to
ALG online by moving its location to
ALG online's current location. From there on
ALG online is executed.

We now present the intuition behind the proof of the competitive factor of the algorithm.
Case when Assumption 1 holds. In this case ALG robust is ALG lazy, and the analysis boils down to proving the competitive ratio of ALG lazy. We show that ALG lazy is (1 + O(q))-competitive with respect to ALG, which is, as we argued in the previous section, (1 + O(q))-competitive with respect to OPT. To see this, we employ the following charging argument: whenever ALG moves from p to p′, it pays D · d(p, p′). The lazy algorithm eventually pays the same moving cost or less. However, in addition, the serving cost of ALG lazy for each of the 6qD delayed requests is potentially increased, as ALG lazy is not at the same location as ALG. Nevertheless, by triangle inequality, the movement of ALG from p to p′ translates into an increase in the serving cost of ALG lazy of at most d(p, p′) per request. In total, over all the 6qD requests and per each move of ALG from p to p′, ALG lazy pays at most 6qD · d(p, p′) extra cost compared to ALG. Considering all migrations, this gives a 1 + O(q) competitive factor.

Case when Assumption 1 is violated.
The case where Assumption 1 is violated (say at time t′) is considerably more involved. We then have

ALG robust ≤ ALG lazy(0, t′) + ALG online(t′ + 1, n) + D · d(a, a′),

and we seek to upper-bound each of these terms by O(OPT/q). While the upper-bound holds directly for ALG online(t′ + 1, n), showing it for the other terms is more challenging. The key insight is that, due to the optimality of ALG,

d(a, p_0) ≤ OPT(t′)/(qD),   (4)

which can be proven as follows. If ALG migrates its page to a location that is far from the starting location p_0, then there have to be, even when taking the noise into account, at least 4qD page requests that are far from p_0. OPT also has to serve these requests (either remotely or by moving), and hence has to pay a cost of at least qD · d(a, p_0). Equipped with this idea, we can bound D · d(a, a′) in terms of OPT(t′)/q. To bound ALG lazy(0, t′) we need one more idea: namely, we compare ALG lazy(0, t′) to the optimal solution that is constrained to be at the same position as ALG lazy at time t′. A formal analysis is given in Section 5.

Now we analyze
ALG (Algorithm 1). As discussed in Section 3.1, our main objective is to establish Eq. (3), which we do in Section 4.1. That upper-bound will be directly used to obtain our result for uniform metric spaces, as we present in Section 4.2. To construct our algorithm for general metric spaces, in Section 4.3 we build on
ALG by first designing its "lazy" variant. As the final result, we show the following. Recall that q is the fraction of symbols that the adversary is allowed to change in any sequence of length εD of the predicted sequence.

Theorem 1.
If Assumption 1 holds with respect to parameter ε, then we obtain the following results:

(A) There exists a (1 + ε) · (1 + O(q))-competitive algorithm for the online page migration problem.

(B) There exists a (1 + O(q))-competitive algorithm for the online page migration problem in uniform metric spaces.

Note that Theorem 1 is asymptotically optimal with respect to q: any algorithm is at least (1 + Ω(q))-competitive, even in the uniform metric case. To see this, consider the following binary example, where the algorithm starts at position 0. The advice is ŝ = 1⋯1 0⋯0, consisting of (1 − q)D ones followed by qD zeros. The final sequence s equals ŝ with probability 1/2, and otherwise consists of 1⋯1 repeated (1 + q)D times. In the first case OPT simply stays at 0, since moving costs D; in the second case, OPT goes immediately to 1. Note that ALG can only distinguish between the two sequences after (1 − q)D steps, at which point it is doomed to incur an additional cost of qD with probability at least 1/2.

In our proofs we will use the following corollary of Assumption 1.
Corollary 1. If Assumption 1 holds, then for any interval I of length ℓ > εD it holds that m(I) ≤ 2qℓ.

Proof. This statement follows from the fact that each such I can be subdivided into k ≥ 1 intervals of length exactly εD and at most one interval I′ of length less than εD. On one hand, the total number of mismatches over the intervals of length exactly εD is upper-bounded by qkεD ≤ qℓ. On the other hand, since I′ is a subinterval of an interval of length εD, it holds that m(I′) ≤ qεD < qℓ. The claim now follows.

Most of our analysis in this section proceeds by reasoning about intervals where neither ALG nor
OPT moves. Let t_1, t_2, ... be the time steps at which either OPT or ALG moves. The final product of this section will be an upper-bound on A_n − O_n as given by Eq. (3), i.e.,

A_n − O_n ≤ Σ_i 2 m((t_{i−1}, t_i]) · (A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}} − c_move^{(t_{i−1},t_i]}) / (t_i − t_{i−1}).

We begin by rewriting and upper-bounding A_t − O_t as follows:

A_t − O_t = A_t − Â_t + Â_t − O_t ≤ A_t − Â_t + Ô_t − O_t,   (5)

where we used that Â_t ≤ Ô_t, as Â_t is the optimum for ŝ. (As a reminder, A_t (O_t, respectively) is the cost of ALG (OPT, respectively) at time t for the sequence s_[1,t].) Consider a fixed interval I = (t_{i−1}, t_i]. Then, by triangle inequality, it holds that

d(a_t, o_t) ≤ d(a_t, s_t) + d(o_t, s_t) ≤ A_t − A_{t−1} + O_t − O_{t−1}.   (6)

Let c_move^{(t_{i−1},t_i]} be the sum of the moving costs of OPT and ALG in (t_{i−1}, t_i]. Note that

A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}} = Σ_{t∈(t_{i−1},t_i]} (A_t − A_{t−1} + O_t − O_{t−1}) ≥ c_move^{(t_{i−1},t_i]} + d(a_{t_i}, o_{t_i}) · |t_i − t_{i−1}|,   (7)

where the inequality comes from Eq. (6) applied to every time step in (t_{i−1}, t_i], and the fact that ALG or OPT must have moved, inducing a cost of at least c_move^{(t_{i−1},t_i]}. The following notation is used to represent the difference between serving s_t and ŝ_t by ALG:

A[t−1, t] := A_t − Â_t − (A_{t−1} − Â_{t−1}) = d(a_t, s_t) − d(a_t, ŝ_t).

Note that this holds even when ALG moves, since the moving costs on the oracle sequence and on the final sequence are the same and therefore cancel each other out. Similarly to A[t−1, t], let

Ô[t−1, t] := Ô_t − O_t − (Ô_{t−1} − O_{t−1}) = d(o_t, ŝ_t) − d(o_t, s_t).

Consider now any t ∈ [1, n]. By triangle inequality we have

A[t−1, t] + Ô[t−1, t] = d(a_t, s_t) − d(o_t, s_t) + d(o_t, ŝ_t) − d(a_t, ŝ_t)
  ≤ (d(a_t, o_t) + d(o_t, s_t)) − d(o_t, s_t) + (d(a_t, ŝ_t) + d(a_t, o_t)) − d(a_t, ŝ_t)
  = 2 d(a_t, o_t)
  ≤ 2 (A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}} − c_move^{(t_{i−1},t_i]}) / (t_i − t_{i−1}),   (8)

where the last step uses Eq. (7). Let ∆_i = A_{t_i} − Â_{t_i} + Ô_{t_i} − O_{t_i}, where ∆_0 = 0 by definition. Note that, by Eq. (5),

A_n − O_n ≤ A_n − Â_n + Ô_n − O_n = Σ_i (∆_i − ∆_{i−1}) = Σ_i Σ_{t∈(t_{i−1},t_i]} (A[t−1, t] + Ô[t−1, t]).

Recall that, for a given interval I, the function m(I) denotes the number of mismatches between s and ŝ within I (see Definition 1). Now, since for t such that s_t = ŝ_t we have A[t−1, t] = Ô[t−1, t] = 0, the last chain of inequalities together with Eq. (8) further implies

A_n − O_n ≤ Σ_i Σ_{t∈(t_{i−1},t_i]: s_t≠ŝ_t} 2 (A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}} − c_move^{(t_{i−1},t_i]}) / (t_i − t_{i−1})
  ≤ Σ_i 2 m((t_{i−1}, t_i]) · (A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}} − c_move^{(t_{i−1},t_i]}) / (t_i − t_{i−1}).   (9)

This establishes the desired upper-bound on A_n − O_n. As discussed in Section 3.1, this upper-bound is used to derive our non-robust results for uniform (Section 4.2) and general (Section 4.3) metric spaces. The main task in those two sections will be to show that m((t_{i−1}, t_i]) is sufficiently smaller than t_i − t_{i−1}.

Proof of Theorem 1 (B): uniform metric spaces

We now use the upper-bound on A_n − O_n given by Eq. (9) to show that ALG is (1 + O(q))-competitive under Assumption 1, i.e., we show Theorem 1 (B). We distinguish between two cases: t_i − t_{i−1} ≥ D and t_i − t_{i−1} < D.

Case t_i − t_{i−1} ≥ D.
In this case, by Corollary 1 we have m((t_{i−1}, t_i]) ≤ 2q|t_i − t_{i−1}|. Plugging this into Eq. (9) we derive

A_n − O_n ≤ Σ_i 2 m((t_{i−1}, t_i]) · (A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}}) / (t_i − t_{i−1}) ≤ 4q Σ_i (A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}}) = 4q (A_n + O_n).

Case t_i − t_{i−1} < D. We proceed by upper-bounding all the terms in Eq. (9). As the interval (t_{i−1}, t_i] is a subinterval of (t_{i−1}, t_{i−1} + D], we have

m((t_{i−1}, t_i]) ≤ m((t_{i−1}, t_{i−1} + D]) ≤ qD.

Also, since in a uniform metric space each request can be served remotely at cost at most 1, it trivially holds that

A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}} ≤ 2|t_i − t_{i−1}| + c_move^{(t_{i−1},t_i]}.   (10)

Combining the derived upper-bounds, we establish

A_n − O_n ≤ Σ_i 2qD · (2(t_i − t_{i−1}) + c_move^{(t_{i−1},t_i]} − c_move^{(t_{i−1},t_i]}) / (t_i − t_{i−1})   (11)
  = Σ_i 4qD.   (12)

To conclude this case, note that by definition either ALG or OPT moves within (t_{i−1}, t_i], incurring a cost of at least D. Therefore, A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}} ≥ D. This together with Eq. (12) implies

A_n − O_n ≤ 4q Σ_i (A_{t_i} − A_{t_{i−1}} + O_{t_i} − O_{t_{i−1}}) = 4q (A_n + O_n).

Combining the two cases.
We have concluded that in either case it holds that A_n − O_n ≤ 4q(A_n + O_n), and hence we derive

A_n/O_n ≤ (1 + 4q)/(1 − 4q) = 1 + O(q).

This concludes the analysis for uniform metric spaces.

Proof of Theorem 1 (A): general metric spaces
As in the uniform case, our goal for general metric spaces is to use Eq. (3) to prove the advertised competitive ratio. However, as we discussed in Section 3.1, the main challenge in applying Eq. (3) lies in upper-bounding the ratio between m((t_{i−1}, t_i]) and t_i − t_{i−1} by a small constant, ideally much smaller than 1. Unfortunately, this ratio can be as large as 1, as OPT (or ALG) could possibly move on every single request. To see that, consider the scenario in which all the requests are on the x-axis and are requested in increasing order of their location. Then, for all but potentially the last D requests, OPT would move from request to request. To bypass this behavior of OPT and ALG, we define and analyze their "lazy" variants, i.e., variants in which OPT and ALG are allowed to move at the i-th request only when i is a multiple of εD. We now state the algorithm.
ALG lazy : Compute the optimal offline solution (on ˆ s ) while only movingon multiples of εD . Let A lazy be the cost of the solution s and let (cid:92) A lazy be the cost of the solution onˆ s . Note that there can be better offline algorithms for ˆ s , however ALG lazy has the minimal cost amongall online algorithms that are only allowed to move every multiple of εD .8 .3.2 Proof We also need to consider a lazy version of
OPT, which we do in the following lemma. There we show that making any algorithm lazy does not increase its cost by more than a factor of (1 + ε). In particular, we will show O^lazy ≤ (1 + ε) · OPT. Let A^lazy_t and O^lazy_t denote the corresponding costs at time t.

Lemma 1.
Let ε ∈ (0, 1]. Consider an arbitrary prefix w of length t of a sequence of requests. Let B_t be the cost of any algorithm ALG B serving w. Let B^lazy_{t′} be the cost of the algorithm that has to move at every time step that is a multiple of εD (and is not allowed to move at any other time step), and that at each such step moves to the position where ALG B is at that time step. Then we have B^lazy_{t′} ≤ (1 + ε) B_t.

Proof.
Let x_i be the distance of the i-th move and y_i the cost of serving the i-th request remotely. Then,

B_t = D Σ_i x_i + Σ_i y_i.

Now we relate B_t and B^lazy_{t′}. B^lazy_{t′} has two components: the moving cost and the cost of serving remotely. By triangle inequality, the moving cost is upper-bounded by D Σ_i x_i. Consider now the interval I_j = [jεD + 1, (j + 1)εD] for some integer j. To serve a request i ∈ I_j remotely, the cost is, by triangle inequality, at most y_i plus the cost of traversing all the points with indices in I_j to which ALG B has moved. Thus the cost per request i ∈ I_j is upper-bounded by y_i + Σ_{k∈I_j} x_k. Note that the summation Σ_{k∈I_j} x_k is charged to εD requests. Hence, summing over all the intervals gives

B^lazy_{t′} ≤ D Σ_i x_i + Σ_i y_i + εD Σ_i x_i ≤ (1 + ε) B_t.

Define O^lazy as the cost of the optimal algorithm for s that is allowed to move only at time steps which are multiples of εD. As in Lemma 1, we have O^lazy_n ≤ (1 + ε) O_n. Thus,

A^lazy_n / O_n ≤ (1 + ε) · A^lazy_n / O^lazy_n.   (13)

Now we need to upper-bound A^lazy_n / O^lazy_n. We will do that by showing that the statements developed in Section 4.1 also hold for A^lazy and O^lazy. To that end, observe that to derive Eq. (5) we used the fact that Â_t ≤ Ô_t. The analogous inequality Â^lazy ≤ Ô^lazy holds, since ALG lazy is the optimal offline algorithm that only moves at every multiple of εD. Hence, we can obtain the analogue of Eq. (9) for A^lazy_n − O^lazy_n:

A^lazy_n − O^lazy_n ≤ Σ_i 2 m((t_{i−1}, t_i]) · (A^lazy_{t_i} − A^lazy_{t_{i−1}} + O^lazy_{t_i} − O^lazy_{t_{i−1}}) / (t_i − t_{i−1}).   (14)

Since for the lazy versions we have |t_i − t_{i−1}| = εD, Assumption 1 implies m((t_{i−1}, t_i]) ≤ qεD. Plugging this into Eq.
(14) gives

A^lazy_n − O^lazy_n ≤ 2q Σ_i (A^lazy_{t_i} − A^lazy_{t_{i−1}} + O^lazy_{t_i} − O^lazy_{t_{i−1}}) = 2q (A^lazy_n + O^lazy_n).

From Eq. (13) we establish

A^lazy_n / O_n ≤ (1 + ε) · A^lazy_n / O^lazy_n ≤ (1 + ε) · (1 + 2q)/(1 − 2q).

This concludes the proof of Theorem 1 (A).
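Lemma 1's lazy transformation is constructive, and its (1 + ε) guarantee is easy to sanity-check numerically on the line metric. The harness below is our own addition (not from the paper); `w` plays the role of εD:

```python
import random

def cost(start, pos, reqs, D):
    """Page-migration cost on the line: D * |move| per move plus |serve|."""
    c, prev = 0.0, start
    for a, s in zip(pos, reqs):
        c += D * abs(prev - a) + abs(a - s)
        prev = a
    return c

def make_lazy(pos, w):
    """Lemma 1's transformation: only move at time steps that are multiples
    of w (= eps*D), jumping to wherever the base algorithm was at that step."""
    return [pos[(i // w) * w] for i in range(len(pos))]

random.seed(0)
D, eps = 5, 0.4
w = int(eps * D)  # = 2
for _ in range(200):
    reqs = [random.randint(-3, 3) for _ in range(30)]
    pos = [random.randint(-3, 3) for _ in range(30)]  # arbitrary base schedule
    base = cost(0, pos, reqs, D)
    lazy = cost(0, make_lazy(pos, w), reqs, D)
    # Lemma 1's deterministic guarantee, checked on random instances
    assert lazy <= (1 + eps) * base + 1e-9
```

The guarantee holds for every base schedule, not just in expectation: the lazy moves telescope against the base moves (triangle inequality), and each window of w requests is overcharged by at most the base's movement inside that window.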
Robust Page Migration
So far we have designed algorithms for the online page migration problem that have a small competitive ratio when Assumption 1 holds. In this section we build on those algorithms and design a (robust) algorithm that performs well even when Assumption 1 does not hold, while still retaining competitiveness when Assumption 1 is true. We refer to this algorithm by
ALG robust . For
ALG robust we prove the following.
Theorem 2.
Let γ be the competitive ratio of ALG for the online page migration problem, and let q be a positive number smaller than a fixed constant. If Assumption 1 holds, then ALG robust is γ · (1 + O(q))-competitive, and otherwise ALG robust is O(1/q)-competitive.
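As a concrete (and highly simplified) illustration of the control flow behind ALG robust in Theorem 2, the sketch below follows precomputed lazy positions until the first prefix on which the assumption check fails, then jumps to the tracked online algorithm's positions. The name `robust_positions` and the precomputed-inputs interface are our own simplifications, not the paper's implementation:

```python
def robust_positions(n, lazy_pos, online_pos, prefix_ok):
    """Page position of the robust algorithm at each of n requests.
    lazy_pos[i] / online_pos[i]: where the delayed offline ("lazy") schedule
    and the fallback online algorithm would be at request i; prefix_ok(i):
    whether the prediction-error assumption still holds on the prefix s[0..i]."""
    positions, fallback = [], False
    for i in range(n):
        if not fallback and not prefix_ok(i):
            fallback = True  # first violation: switch permanently to the online algorithm
        positions.append(online_pos[i] if fallback else lazy_pos[i])
    return positions
```

For example, with a check that fails from request 2 onward, the schedule follows the lazy positions for two steps and the online positions afterwards.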
1, then Algorithm
ALG robust is (1 + O ( x · q ))-competitive ifAssumption 1 holds and O (1 / ( x · q ))-competitive otherwise. robust Let
ALG online refer to an arbitrary online algorithm for the problem, e.g., [Wes94]. We now define
ALG^robust. This algorithm switches from ALG to ALG^online when it detects that Assumption 1 does not hold. Instead of using ALG directly, we use a "lazy" version of ALG that works as follows: follow the optimal offline solution given by ALG with a delay of $6qD$ steps. Let ALG^lazy be the corresponding algorithm. (A lazy version for a different setup of parameters was presented in Section 4.3.)

Throughout its execution, ALG^robust tracks in its memory the execution of ALG^online on the prefix of $s$ seen so far; that is, ALG^robust maintains where ALG^online would be at any given point in time, in case a fallback is needed. ALG^robust simply executes ALG^lazy until a violation of Assumption 1 is detected. Once such a violation is detected, the algorithm switches to ALG^online by moving its location to ALG^online's current location. From there on, ALG^online is executed.

We now analyze
ALG^robust and show that, if Assumption 1 holds, then ALG and ALG^robust are close in terms of total cost, and otherwise the cost of ALG^robust is at most a factor $O(1/q)$ larger than that of ALG^online.

Case 1: Assumption 1 holds for the entire sequence. In this case ALG^robust executes ALG^lazy throughout. Following the same argument, with $\varepsilon = 6q$, as given for $A^{lazy}$ in the proof of Lemma 1, we have
$$A^{lazy}_t \le (1 + 6q) A_t. \quad (15)$$
Thus,
$$A^{robust} = A^{lazy}_n \le (1 + 6q) A_n \le \gamma (1 + O(q)) O_n,$$
where we used the assumption that ALG is $\gamma$-competitive. This completes this case.

Case 2: Assumption 1 is violated at the $t$-th request. Let $t' = t - 6qD + 1$; note that up to this point in time no violation has occurred. We define the following: $a$ is the position of ALG^lazy at time $t'$; $a'$ is the position of ALG^online at time $t' + 1$; $o$ is the position of OPT at time $t'$; and $O_{p,t''}$ is the cost of OPT up to time $t''$ under the constraint that OPT is at position $p$ at time $t''$. In what follows we assume that the following inequality holds, deferring its proof to the end of this section:
$$d(a, p_0) \le O_{t'} / (qD). \quad (16)$$
Intuitively, this means that we can bound the distance of $a$ from the starting position $p_0$ by the cost of OPT. The total cost of ALG^robust decomposes as
$$A^{robust} \le A^{lazy}_{t'} + A^{online}_{t'+1,n} + D \cdot d(a, a'). \quad (17)$$
As ALG^lazy and the constrained optimum defining $O_{a,t'}$ are at the same position $a$ at time $t'$, the inequality $A^{lazy}_{t'} \le (1 + cq) O_{a,t'}$ follows from Eq. (15) for a suitable constant $c$. Note that $O_{t'} \ge D \cdot d(p_0, o)$, which holds since, by the triangle inequality, this cost is already incurred by moving to $o$. Next, using the triangle inequality again, we get
$$A^{lazy}_{t'} \le (1 + cq) O_{a,t'} \le (1 + cq)\bigl(O_{o,t'} + D \cdot d(a, o)\bigr) \le (1 + cq)\bigl(O_{o,t'} + D \cdot d(a, p_0) + D \cdot d(p_0, o)\bigr) \le (1 + cq)\bigl(O_{o,t'} + O_{t'}/q + O_{t'}\bigr) = O(O_{t'}/q). \quad (18)$$
Furthermore, using Eq. (16), the triangle inequality and a simple lower bound on $A^{online}_{t'}$, as well as Eq. (18), we get
$$D \cdot d(a, a') \le D \cdot d(a, p_0) + D \cdot d(p_0, a') \le O_{t'}/q + A^{online}_{t'} \le 2 O_{t'}/q. \quad (19)$$
Thus, plugging Eq. (19) and Eq. (18) into Eq. (17) and using $A^{online} \le O(O_n)$, we get
$$A^{robust} \le A^{lazy}_{t'} + A^{online}_{t'+1,n} + D \cdot d(a, a') = O(O_{t'}/q) + O(O_n) + 2 O_{t'}/q = O(O_n/q).$$
Thus, it only remains to prove Eq. (16), which we do using the following lemma. The lemma shows that if
ALG moves its page to a location that is far from $p_0$, then there must be many request points that are far from $p_0$. Later we show that OPT pays a considerable cost to serve them, even if it does so remotely. See Fig. 2 for an illustration of the lemma.
Lemma 2. Let $P = p_0, p_1, \ldots$ be the sequence of page locations that ALG produces, starting from the initial location $p_0$. Let $p_{max}$ be the furthest point with respect to $p_0$ that a page is moved to by ALG, i.e., $p_{max} \stackrel{\text{def}}{=} \arg\max_{p_i} d(p_i, p_0)$. In case there are several pages at $p_{max}$, we let $p_{max}$ be the first among them. Let $d_{max} \stackrel{\text{def}}{=} d(p_{max}, p_0)$. Let $\bar{P}$ be the maximal consecutive subsequence of $P$ that includes $p_{max}$ and consists of pages that are each at distance at least $r \stackrel{\text{def}}{=} d_{max}/4$ from $p_0$. Then, for $q < 1/24$, the page locations in $\bar{P}$ together serve at least $6qD$ points at distance at least $r$ from $p_0$ in the oracle sequence.

Proof. The proof proceeds by contradiction. Suppose that $\bar{P}$ serves fewer than $6qD$ points in the oracle sequence. We will show that a better solution consists of replacing the sequence $\bar{P}$ by simply moving to $p_0$ and serving all of its points remotely from there. Since $\bar{P}$ is a maximal consecutive subsequence of $P$ including $p_{max}$ such that each page location is at distance at least $r$ from $p_0$, ALG moves by at least $d_{max} - r$ within $\bar{P}$. Hence, the cost of ALG on the page locations $\bar{P}$ is at least
$$D(d_{max} - r) + \sum_i d_i, \quad (20)$$
where $\sum_i d_i$ represents the distances of the requests served remotely from the page locations in $\bar{P}$ (depicted as solid lines connected to $p'$, $p_{max}$ and $p''$ in Fig. 2). Consider a request $s$ that is served from a location $p \in \bar{P}$ in the original solution (the one using $\bar{P}$). In the new solution, where all points are served from $p_0$, serving any such request has, by the triangle inequality, cost at most $d(p_0, p) + d(p, s) \le d_{max} + d(p, s)$. Moreover, observe that the sequence $\bar{P}$ consists of at most $6qD$ locations, since otherwise there would be a location that does not serve any point. Putting everything together, the cost of the new solution is at most
$$2Dr + 6qD \cdot d_{max} + \sum_i d_i, \quad (21)$$
where the term $2Dr$ accounts for moving the page from the location preceding $\bar{P}$ to $p_0$ (a cost of at most $Dr$) and for moving the page from $p_0$ to the location just after $\bar{P}$ (also a cost of at most $Dr$), and the term $6qD \cdot d_{max}$ accounts for the extra cost of at most $d_{max}$ for each of the fewer than $6qD$ remotely served requests. Recall that $r = d_{max}/4$. Thus, the solution of Eq. (21) is cheaper than that of Eq. (20) for $q$ small enough (i.e., for $q < 1/24$, since then $2Dr + 6qD \cdot d_{max} = (1/2 + 6q) D \cdot d_{max} < (3/4) D \cdot d_{max} = D(d_{max} - r)$), contradicting the optimality of ALG on the oracle sequence.

By Lemma 2, we conclude that there are at least $6qD$ points at distance at least $r$ from $p_0$ in the oracle sequence. Note that the final sequence $s$ will contain at least $6qD - 2qD$ of these points, due to our assumption on the noise and the fact that the first violation of Assumption 1 was detected at time $t$. OPT has to serve these points as well, and thus
$$O_{t'} \ge (6qD - 2qD) \cdot r = 4qD \cdot d_{max}/4 = qD \cdot d_{max} \ge qD \cdot d(a, p_0),$$
where the last inequality uses that $a$ is one of ALG's page locations, so $d(a, p_0) \le d_{max}$. This yields Eq. (16) and therefore completes the proof.

Figure 2: An illustration of Lemma 2: the fact that ALG moved a page to a location far away (at distance $d_{max}$) from $p_0$ means that there must be many points at distance at least $r = d_{max}/4$ from $p_0$; OPT will have to serve most of these points as well. The squares denote locations of pages, the small circles denote page requests, and the solid lines between squares and small circles depict remotely served requests. The dashed lines denote the movement of the page. The sequence $\bar{P}$ consists of $p'$, $p_{max}$ and $p''$.

Experiments

We evaluate our approach on two synthetic data sets, and compare it to the state-of-the-art algorithm for page migration due to Westbrook [Wes94]. The two data sets are obtained by generating "predicted" sequences of points in the plane, and then perturbing each point by independent Gaussian noise to obtain "actual" sequences. The predicted sequence is fed to our algorithm, while the actual sequence forms the input of the online algorithm. Recall that our algorithm sees the actual sequence only in an online fashion.
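The perturbation just described — each predicted point shifted by independent Gaussian noise per coordinate to produce the actual sequence — can be sketched as follows; the function name and the tuple representation of points are our own illustrative choices:

```python
import random


def perturb(predicted, sigma):
    """Turn a "predicted" sequence of 2D points into an "actual" one by
    adding independent Gaussian noise N(0, sigma) to each coordinate."""
    return [(x + random.gauss(0.0, sigma), y + random.gauss(0.0, sigma))
            for (x, y) in predicted]
```

For $\sigma = 0$ the actual sequence coincides with the predicted one, so the prediction error rate is 0; increasing $\sigma$ increases the prediction error.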
Data sets
The predicted sequences of the two sets of points are generated as follows:

1. Line process: the $t$-th point $(\hat{X}_1(t), \hat{X}_2(t))$ is equal to $(t, 0)$.

2. Brownian motion process: the $t$-th point $\hat{X}(t)$ is equal to $\hat{X}(t-1) + (\Delta_1(t), \Delta_2(t))$, where $\Delta_1(t)$ and $\Delta_2(t)$ are i.i.d. random variables chosen from $N(0, 1)$.

In both cases, the $t$-th request $X(t)$ in the actual sequence is equal to $\hat{X}(t) + (N_1(t), N_2(t))$, where $N_1(t)$ and $N_2(t)$ are i.i.d. random variables chosen from $N(0, \sigma)$. The value of $\sigma$ varies, depending on the specific experiment. An example Brownian motion sequence is depicted in Fig. 3.

Figure 3: An example of a Brownian motion sequence. The predicted sequence is in blue, the actual sequence is in red.

Set up
We use the two data sets to compare the following three algorithms:

• Predict refers to our algorithm, which computes the optimum solution for the predicted sequence (by using standard dynamic programming) and follows that optimum to serve the actual requests.

• Opt is the optimum offline algorithm executed on the actual sequence. This optimum is computed by using the same dynamic programming as in the implementation of Predict.

• Online is the state-of-the-art randomized online algorithm for page migration, which achieves a competitive ratio of roughly 2.62 [Wes94]. Since Online is randomized, we perform several runs of Online and report the average of all the runs as the output; the standard deviation is smaller than 5%.

For both data sets, we depict the costs of the three algorithms as a function of either $D$ or $\sigma$. See the text above each plot for the specification.

Results
The results for the Brownian motion data set are depicted in Fig. 4. The top two plots show the cost incurred by each algorithm for fixed values of $\sigma$ and varying $D$, while the bottom two plots show the costs for fixed values of $D$ while $\sigma$ varies. Not surprisingly, for low values of $\sigma$ the costs of Predict and Opt are almost equal, since the predicted and the actual sequences are very close to each other. As the value of $\sigma$ increases, their costs start to diverge. Nevertheless, the benefit of predictions is clear, as the cost of Predict is significantly lower than the cost of Online. Interestingly, this holds even though the fraction of requests predicted exactly is very close to 0.

The results for the Line data set are depicted in Fig. 5. They are qualitatively similar to those for the Brownian motion data set.

Figure 4: Comparison between Predict, Opt and Online on the Brownian motion data set. (a), (b): fixed $\sigma$, varying $D$; (c): fixed $D = 2$, varying $\sigma$; (d): fixed $D = 5$, varying $\sigma$.

Figure 5: Comparison between Predict, Opt and Online on the Line data set. (a), (b): fixed $\sigma$, varying $D$; (c): fixed $D = 2$, varying $\sigma$; (d): fixed $D = 5$, varying $\sigma$.

References

[ABF93] Baruch Awerbuch, Yair Bartal, and Amos Fiat. Competitive distributed file allocation. In
STOC, volume 93, pages 164–173, 1993.
[ABF03] Baruch Awerbuch, Yair Bartal, and Amos Fiat. Competitive distributed file allocation. Information and Computation, 185(1):1–40, 2003.
[BBM17] Marcin Bienkowski, Jaroslaw Byrka, and Marcin Mucha. Dynamic beats fixed: On phase-based algorithms for file migration. ICALP, 2017.
[BCI97] Yair Bartal, Moses Charikar, and Piotr Indyk. On page migration and other relaxed task systems. SODA, 1997.
[BDSV18] Maria-Florina Balcan, Travis Dick, Tuomas Sandholm, and Ellen Vitercik. Learning to branch. In International Conference on Machine Learning, pages 353–362, 2018.
[BFK+17] Joan Boyar, Lene M Favrholdt, Christian Kudahl, Kim S Larsen, and Jesper W Mikkelsen. Online algorithms with advice: A survey. ACM Computing Surveys (CSUR), 50(2):19, 2017.
[BFR95] Yair Bartal, Amos Fiat, and Yuval Rabani. Competitive algorithms for distributed data management. Journal of Computer and System Sciences, 51(3):341–358, 1995.
[Bie12] Marcin Bienkowski. Migrating and replicating data in networks. Computer Science - Research and Development, 27(3):169–179, 2012.
[BJPD17] Ashish Bora, Ajil Jalal, Eric Price, and Alexandros G Dimakis. Compressed sensing using generative models. In International Conference on Machine Learning, pages 537–546, 2017.
[BS89] David L Black and Daniel D Sleator. Competitive algorithms for replication and migration problems. Carnegie-Mellon University, Department of Computer Science, 1989.
[CLRW97] Marek Chrobak, Lawrence L Larmore, Nick Reingold, and Jeffery Westbrook. Page migration algorithms using work functions. Journal of Algorithms, 24(1):124–157, 1997.
[GP19a] Sreenivas Gollapudi and Debmalya Panigrahi. Online algorithms for rent-or-buy with expert advice. In Proceedings of the 36th International Conference on Machine Learning, pages 2319–2327, 2019.
[GP19b] Sreenivas Gollapudi and Debmalya Panigrahi. Online algorithms for rent-or-buy with expert advice. In International Conference on Machine Learning, pages 2319–2327, 2019.
[HIKV19] Chen-Yu Hsu, Piotr Indyk, Dina Katabi, and Ali Vakilian. Learning-based frequency estimation algorithms. In
International Conference on Learning Representations, 2019.
[KBC+18] Tim Kraska, Alex Beutel, Ed H Chi, Jeffrey Dean, and Neoklis Polyzotis. The case for learned index structures. In Proceedings of the 2018 International Conference on Management of Data, pages 489–504, 2018.
[KDZ+17] Elias Khalil, Hanjun Dai, Yuyu Zhang, Bistra Dilkina, and Le Song. Learning combinatorial optimization algorithms over graphs. In Advances in Neural Information Processing Systems, pages 6348–6358, 2017.
[KM16] Amanj Khorramian and Akira Matsubayashi. Uniform page migration problem in Euclidean space. Algorithms, 9(3):57, 2016.
[KPS+19] Ravi Kumar, Manish Purohit, Aaron Schild, Zoya Svitkina, and Erik Vee. Semi-online bipartite matching. ITCS, 2019.
[LLMV20] Silvio Lattanzi, Thomas Lavastida, Benjamin Moseley, and Sergei Vassilvitskii. Online scheduling via learned weights. In Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1859–1877. SIAM, 2020.
[LRWY98] Carsten Lund, Nick Reingold, Jeffery Westbrook, and Dicky Yan. Competitive on-line algorithms for distributed data management. SIAM Journal on Computing, 28(3):1086–1111, 1998.
[LV18] Thodoris Lykouris and Sergei Vassilvitskii. Competitive caching with machine learned advice. In International Conference on Machine Learning, pages 3302–3311, 2018.
[Mat15] Akira Matsubayashi. A 3 + Ω(1) lower bound for page migration. pages 314–320. IEEE, 2015.
[Mit18] Michael Mitzenmacher. A model for learned Bloom filters and optimizing by sandwiching. In Advances in Neural Information Processing Systems, pages 464–473, 2018.
[MMS90] Mark S Manasse, Lyle A McGeoch, and Daniel D Sleator. Competitive algorithms for server problems. Journal of Algorithms, 11(2):208–230, 1990.
[MPB15] Ali Mousavi, Ankit B Patel, and Richard G Baraniuk. A deep learning approach to structured signal recovery. In 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 1336–1343. IEEE, 2015.
[PSK18] Manish Purohit, Zoya Svitkina, and Ravi Kumar. Improving online algorithms via ML predictions. In Advances in Neural Information Processing Systems, pages 9661–9670, 2018.
[Roh20] Dhruv Rohatgi. Near-optimal bounds for online caching with machine learned advice. In Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1834–1845. SIAM, 2020.
[Unc16] Special Semester on Algorithms and Uncertainty, 2016. https://simons.berkeley.edu/programs/uncertainty2016.
[Wes94] Jeffery Westbrook. Randomized algorithms for multiprocessor page migration. SIAM Journal on Computing, 23(5):951–965, 1994.
[WLKC16] Jun Wang, Wei Liu, Sanjiv Kumar, and Shih-Fu Chang. Learning to hash for indexing big data - a survey.