Improved Analysis of RANKING for Online Vertex-Weighted Bipartite Matching
Billy Jin ∗ Cornell University David P. Williamson † Cornell University
Abstract
In this paper, we consider the online vertex-weighted bipartite matching problem in the random arrival model. We consider the generalization of the RANKING algorithm for this problem introduced by Huang, Tang, Wu, and Zhang [HTWZ19], who show that their algorithm has a competitive ratio of 0.6534. We show that the assumptions in their analysis can be weakened, allowing us to replace their derivation of a crucial function g on the unit square with a linear program that computes the values of a best possible g under these assumptions on a discretized unit square. We show that the discretization does not incur much error, and show computationally that we can obtain a competitive ratio of 0.6629. To compute the bound over our discretized unit square we use parallelization, and still needed two days of computing on a 64-core machine. Furthermore, by modifying our linear program somewhat, we can show computationally an upper bound of 0.6688 on our approach; any further progress beyond this bound will require either further weakening of the assumptions on g or a stronger analysis than that of Huang et al.

1 Introduction

In the maximum bipartite matching problem, we are given as input a bipartite graph G = (U, V, E) such that each edge (u, v) ∈ E has u ∈ U and v ∈ V. A set F ⊆ E of edges is a matching if there is at most one edge of F incident to each vertex u ∈ U and v ∈ V. The goal is to find a matching of maximum cardinality. This problem has been well studied and is one of the fundamental problems in combinatorial optimization (see, for example, Schrijver [Sch03, Chapter 16]).

In a classic paper from 1990, Karp, Vazirani, and Vazirani [KVV90] introduce an online version of this problem and the RANKING algorithm for it. In their online version of the problem, the vertices V are known to the algorithm in advance, while the vertices of U are introduced one at a time; we refer to the vertices of V as the offline vertices and those of U as the online vertices.
The algorithm maintains a matching F, initially empty. As each vertex of U arrives, the edges incident to it are also revealed to the algorithm. Once a vertex u ∈ U arrives, the algorithm must either choose an edge incident to u to add to F or decide not to add any edge incident to u to the matching F. These choices are irrevocable: no edge incident to u may be added at any later point in time. In the RANKING algorithm, the algorithm initially chooses a random permutation π of the offline vertices V; when a new vertex u ∈ U arrives, the algorithm adds to the matching the edge (u, v) that maximizes π(v) over the vertices v ∈ V incident to u that do not have any edge of F already incident (i.e., the unmatched neighbors of u), if such a vertex exists; otherwise it leaves u unmatched. Karp, Vazirani, and Vazirani prove that this algorithm achieves a competitive ratio of at least 1 − 1/e; that is, the algorithm finds a matching whose expected cardinality is at least 1 − 1/e times the size of the maximum matching in G. They further show that this ratio is tight; that is, there are instances of the problem such that no online algorithm can achieve a better competitive ratio.

∗ Supported in part by an NSERC fellowship PGSD3-532673-2019 and NSF grant CCF-1908517. Address: School of Operations Research and Information Engineering, Cornell University, Ithaca, NY, 14853, USA. Email: [email protected].
† Supported in part by NSF grant CCF-1908517. Address: School of Operations Research and Information Engineering, Cornell University, Ithaca, NY, 14853, USA. Email: [email protected].

Since this work, there have been many simplifications of the original analysis (e.g. Birnbaum and Mathieu [BM08]; Devanur, Jain, and Kleinberg [DJK13]), proposed changes in the online model, and extensions to more general matching problems.
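The RANKING algorithm described above can be sketched as follows; the graph representation (adjacency lists keyed by online vertex) and the vertex names are our own illustrative choices:

```python
import random

def ranking(offline, online, neighbors):
    """One run of the KVV RANKING algorithm.

    offline: list of offline vertices (V), known in advance
    online: list of online vertices (U), in arrival order
    neighbors: dict mapping each u in U to its neighbors in V
    Returns the matching as a dict u -> v.
    """
    # Choose a random permutation pi of the offline vertices.
    perm = random.sample(offline, len(offline))
    pi = {v: r for r, v in enumerate(perm)}
    matched, matching = set(), {}
    for u in online:
        # Among u's currently unmatched neighbors, take the one maximizing pi(v).
        free = [v for v in neighbors.get(u, []) if v not in matched]
        if free:
            v = max(free, key=lambda w: pi[w])
            matching[u] = v
            matched.add(v)
    return matching
```

Averaging the matching size over many runs estimates the algorithm's expected performance on a fixed instance.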
Of interest to us in this paper are the random arrival model, proposed by Goel and Mehta [GM08], and the maximum vertex-weighted online matching problem, introduced by Aggarwal, Goel, Karande, and Mehta [AGKM11]. In the random arrival model, the online vertices of U arrive in an order given by a random permutation. Goel and Mehta show that the greedy algorithm attains a competitive ratio of 1 − 1/e in the random arrival model. Later, Karande, Mehta, and Tripathi [KMT11] and Mahdian and Yan [MY11] show that the RANKING algorithm has a competitive ratio strictly better than 1 − 1/e in this model, with Mahdian and Yan giving a competitive ratio of 0.696. In the vertex-weighted version of the problem, the offline vertices v ∈ V have weight w_v ≥ 0, and the goal is to find a matching F that maximizes the total weight of the matched vertices in V (that is, the vertices in V that have an incident edge in F). Aggarwal et al. show that a generalization of RANKING achieves a 1 − 1/e competitive ratio for the vertex-weighted version of the problem (with adversarial arrivals). Devanur, Jain, and Kleinberg [DJK13] later interpreted the Aggarwal et al. algorithm as follows. Each offline vertex v ∈ V draws a value y_v from [0,1] uniformly at random; when a new vertex u ∈ U arrives, we add edge (u, v) to matching F for the unmatched v (if any) that maximizes w_v(1 − g(y_v)), where g(y) = e^{y−1}.

Huang, Tang, Wu, and Zhang [HTWZ19] studied the combination of these two models, the maximum vertex-weighted online matching problem in the random arrival model. Drawing on the ideas of Devanur et al., they proposed the following further generalization of the RANKING algorithm. In addition to having each offline vertex v ∈ V draw a value y_v from [0,1], since the online vertices arrive in random order, they propose having each online vertex u ∈ U draw a value y_u ∈ [0,
1] uniformly at random, and have the online vertices arrive in order of nondecreasing y_u. When a new vertex u ∈ U arrives, we add edge (u, v) to the matching F for the unmatched v (if any) that maximizes w_v(1 − g(y_v, y_u)), for a function g with certain properties. Huang et al. assume that g(x, y) = (h(x) + 1 − h(y))/2 for an increasing function h : [0,1] → [0,1], with h(x) = min(1, e^x/2), to achieve a competitive ratio of 0.6534, beating the 1 − 1/e ≈ 0.632 competitive ratio achieved by Aggarwal et al. in the adversarial arrival model.

We build upon the work of Huang et al. to give a competitive ratio of 0.6629 for the maximum vertex-weighted online matching problem in the random arrival model. We begin by showing that several assumptions Huang et al. make about the form of g(x, y) needed for the analysis of their generalization of RANKING can be relaxed. Instead, we can make several weaker assumptions about the form of g(x, y). These assumptions can be encoded in a linear program that allows us to produce the best possible piecewise-affine function g : [0,1]² → [0,
1] under these assumptions for any given discretization of [0,1]².

We then need to compute the competitive ratio by finding a point in [0,1]² where a certain complicated function of g given by Huang et al. attains its minimum. To do this, we show that the error in the competitive ratio incurred by restricting ourselves to finding the minimum over the set of discretized points is linear in the size of the discretization, so we can restrict ourselves to checking just the points in this set if we are willing to tolerate some small error. However, even checking all the discretized points becomes computationally infeasible if we discretize the square finely enough that the error is tolerable. We note that the checking is easily parallelizable, and we wrote our code to use all the cores of the machine on which it is run. Even so, we still needed two days on a 64-core, 64GB machine on Amazon's EC2 platform to achieve our competitive ratio of 0.6629.

Because we use a linear program to find the function g, we can also use a slight modification of it to find an upper bound on the best possible competitive ratio obtainable using the Huang et al. analysis with our weakened assumptions on g. We modify the linear program so that any function g satisfying our weakened assumptions is feasible, and modify the objective function so that it gives an upper bound on the ratio obtained via the Huang et al. analysis. Solving the linear program results in an upper bound of 0.6688. Thus any further improvement in the competitive ratio will require either further weakening of the assumptions on g or a stronger analysis than that of Huang et al.

Our paper is organized as follows. In Section 2, we recap the argument of Huang et al. that we will use. In Section 3, we introduce the weaker assumptions on the function g that we will use, and prove that the arguments of Huang et al. continue to hold under these weaker assumptions, so that we can still use their bound on the competitive ratio. In Section 4, we introduce the LP that will define our function g; we show how to define a piecewise-affine function g from the LP solution, and we show that the assumptions we need on g hold for this LP-defined function. In Section 5, we provide a bound on the error we incur in the competitive ratio by only checking the Huang et al. bound at discrete points of the unit square. In Section 6, we explain the computation that was used to obtain our competitive ratio of 0.6629. Section 7 explains how we modify our linear program to obtain an upper bound on the competitive ratio attainable via the Huang et al. analysis with our weakened assumptions on g. We conclude in Section 8.

2 The Huang et al. Analysis

As stated in the introduction, we assign each offline vertex v ∈ V a value y_v from [0,1] chosen uniformly at random, and following Huang et al. we assume that each online vertex u ∈ U also has a value y_u from [0,1] chosen uniformly at random, and that the online vertices arrive in nondecreasing order of their y_u values. The variant of the RANKING algorithm for the problem uses a function g : [0,1]² → [0,
1] that is increasing in the first argument and decreasing in the second. When an online vertex u ∈ U arrives, it is matched to the unmatched neighbor v ∈ V that maximizes w_v(1 − g(y_v, y_u)).

The analysis of this algorithm by Huang et al. [HTWZ19] follows that of Devanur, Jain, and Kleinberg [DJK13]. It considers the linear programming relaxation of the vertex-weighted bipartite matching problem and its dual linear program, shown below, with the primal on the left and the dual on the right.

Max Σ_{(u,v)∈E} w_v x_{uv}                  Min Σ_{u∈U} α_u + Σ_{v∈V} α_v
s.t. Σ_{v:(u,v)∈E} x_{uv} ≤ 1  ∀u ∈ U       s.t. α_u + α_v ≥ w_v  ∀(u,v) ∈ E
     Σ_{u:(u,v)∈E} x_{uv} ≤ 1  ∀v ∈ V            α_u, α_v ≥ 0  ∀u ∈ U, v ∈ V
     x_{uv} ≥ 0  ∀(u,v) ∈ E

The goal of the analysis is to find a set of nonnegative variables α, whose values may depend on the random y values, such that Σ_{(u,v)∈F} w_v = Σ_{u∈U} α_u + Σ_{v∈V} α_v and E_y[α_u + α_v] ≥ β · w_v for all (u,v) ∈ E. (Here, F is the set of edges in the matching found by the algorithm.) Given the two conditions, it is possible to define a dual solution that is within a factor of β of the total weight of the matched edges, implying a competitive ratio of β. Whenever the algorithm adds a matching edge (u, v) to F, it sets α_u = w_v · (1 − g(y_v, y_u)) and α_v = w_v · g(y_v, y_u), ensuring that the first condition is met.

The main result of Huang et al. is the following.

Lemma 1 (Lemma 4.1 [HTWZ19]). Suppose that g(x, y) = (h(x) + 1 − h(y))/2 for some increasing function h : [0,1] → [0,1] that satisfies h′(x) ≤ h(x). Then for any u ∈ U and v ∈ V such that (u, v) ∈ E,

(1/w_v) · E_y[α_u + α_v] ≥ min_{0≤γ,τ≤1} f(γ, τ)

for

f(γ, τ) = (1 − τ)(1 − γ) + (1 − τ) ∫_0^γ g(x, τ) dx + ∫_0^τ min_{θ≤γ} { (1 − g(θ, y)) + ∫_0^θ g(x, y) dx + ∫_θ^γ g(x, τ) dx } dy.
Thus, the competitive ratio of RANKING is at least min_{0≤γ,τ≤1} f(γ, τ).

Huang et al. show that by taking h(x) = min(1, e^x/2), they can prove that f(γ, τ) ≥ 0.6534 for all 0 ≤ γ, τ ≤ 1, attaining their claimed competitive ratio by the reasoning above.
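The generalized algorithm and its dual assignment, as described in this section, can be sketched as follows. The instance format is our own, and the particular h below (the uncapped exponential from the Devanur et al. analysis) is only an illustrative stand-in:

```python
import math
import random

def generalized_ranking(weights, online, neighbors, g):
    """Sketch of the Huang et al. generalization of RANKING.

    weights: dict mapping each offline vertex v to its weight w_v
    online: list of online vertices
    neighbors: dict mapping each online vertex u to its offline neighbors
    g: a function g(y_v, y_u) with values in [0, 1], increasing in its
       first argument and decreasing in its second
    Returns (matching, alpha): the matching u -> v and the dual values.
    """
    y = {v: random.random() for v in weights}
    y.update({u: random.random() for u in online})
    matching, alpha, matched = {}, {}, set()
    # Online vertices arrive in nondecreasing order of y_u.
    for u in sorted(online, key=lambda w: y[w]):
        free = [v for v in neighbors.get(u, []) if v not in matched]
        if not free:
            continue
        v = max(free, key=lambda v: weights[v] * (1 - g(y[v], y[u])))
        matching[u] = v
        matched.add(v)
        # Split the matched weight w_v between the two endpoints, so the
        # alphas sum to the total weight of the matching.
        alpha[u] = weights[v] * (1 - g(y[v], y[u]))
        alpha[v] = weights[v] * g(y[v], y[u])
    return matching, alpha

# One g of the assumed form g(x, y) = (h(x) + 1 - h(y)) / 2:
def g_example(x, y, h=lambda t: math.exp(t - 1)):
    return (h(x) + 1 - h(y)) / 2
```

A quick sanity check is that the sum of the alphas equals the weight of the matching found, which is the first condition of the analysis.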
3 Relaxing Assumptions
Huang et al. assume that g(x, y) = (1 + h(x) − h(y))/2 for some increasing function h : [0,1] → [0,1] that satisfies h′(x) ≤ h(x). This is a strong assumption, and it gives several nice properties of g which are useful in the analysis. We relax this assumption and do not constrain g to satisfy this condition. Instead, we replace this condition by several weaker conditions. This allows us to search over a wider class of functions g when trying to maximize the bound in Lemma 1. However, to leverage their result, we must show that the conclusion of Lemma 1 still holds for all g that satisfy these weaker conditions. We prove the following.

Theorem 1.
Let g be a function obeying the following conditions:

1. g(x, y) : [0,1]² → [0,1] is continuous,
2. g(x, y) is increasing in x and decreasing in y,
3. ∂g(x,y)/∂x ≤ g(x, y),
4. ∂g(x,y)/∂y ≥ g(x, y) − 1, and
5. for all x, y, y′ with y′ > y, we have g(1, y) − g(x, y) ≥ g(1, y′) − g(x, y′).

Then Lemma 4.1 in [HTWZ19] still holds, and the competitive ratio of the RANKING algorithm is at least

min_{0≤γ,τ≤1} { (1 − τ)(1 − γ) + (1 − τ) ∫_0^γ g(x, τ) dx + ∫_0^τ min_{θ≤γ} { (1 − g(θ, y)) + ∫_0^θ g(x, y) dx + ∫_θ^γ g(x, τ) dx } dy }   (1)

Proof.
The result in Lemma 4.1 of [HTWZ19] follows entirely from facts proved in their Lemmas 3.3, 3.4, and 3.5. We show that these lemmas continue to hold given the conditions on g above.

Claim 1. Claim 2.1 in [HTWZ19] still holds.

Proof.
Claim 2.1 in [HTWZ19] is precisely condition 4.
Claim 2.
Fact 3.1 in [HTWZ19] still holds.

Proof. This fact follows from condition 2, that is, that g(x, y) is increasing in x and decreasing in y.

Claim 3.
Lemma 3.1 in [HTWZ19] still holds.

Proof.
This lemma follows mostly from condition 2, and that part of the proof still holds. The only part of the proof which doesn't directly follow from condition 2 is the assertion that if θ(x) = 1 for some x ∈ [0,1], then θ(x′) = 1 for all x′ ≥ x. This part mimics the proof in [HTWZ19], except that we use condition 5 instead of the stronger assumption g(x, y) = (1 + h(x) − h(y))/2.

Suppose for the sake of contradiction that θ(x) = 1 but θ(x′) < 1 for some x′ > x. Since θ(x′) < 1, this means that when y_u = x′ and y_v = 1, we have that v is unmatched when u arrives, and u is matched to some vertex z ≠ v. Thus, w_z(1 − g(y_z, x′)) > w_v(1 − g(1, x′)).

Now consider what happens when u arrives at time x. Since θ(x) = 1, this means that when y_u = x and y_v = 1, we have that v is matched to u. On the other hand, z is an unmatched neighbour of u if u arrives at time x. (This is because z is unmatched if u arrives at the later time x′.) Moreover, choosing z induces utility

w_z(1 − g(y_z, x)) = w_z(1 − g(y_z, x′)) · (1 − g(y_z, x))/(1 − g(y_z, x′))
                   > w_v(1 − g(1, x′)) · (1 − g(y_z, x))/(1 − g(y_z, x′))
                   ≥ w_v(1 − g(1, x′)) · (1 − g(1, x))/(1 − g(1, x′))
                   = w_v(1 − g(1, x)).

The first inequality holds because u matched to z over v when y_u = x′. The second inequality is true because

(1 − g(y_z, x))/(1 − g(y_z, x′)) = (1 − g(1, x) + g(1, x) − g(y_z, x))/(1 − g(1, x′) + g(1, x′) − g(y_z, x′)).

If we let a = 1 − g(1, x), b = 1 − g(1, x′), c = g(1, x) − g(y_z, x), and d = g(1, x′) − g(y_z, x′), then the expression above is (a + c)/(b + d). Observe that for positive a, b, c, d, whenever a ≤ b and c ≥ d, we have (a + c)/(b + d) ≥ a/b: indeed, b(a + c) − a(b + d) = bc − ad ≥ bd − ad = (b − a)d ≥ 0. The second inequality follows directly from this observation. Note that a ≤ b follows from condition 2 (monotonicity of g), and c ≥ d is condition 5.

Thus we have shown that w_z(1 − g(y_z, x)) > w_v(1 − g(1, x)), which says that u is better off choosing z than v. This is a contradiction.

(We use notation for partial derivatives, but the result also holds for non-differentiable functions if we use subgradients, etc. In particular, the result holds for the piecewise-affine functions g we obtain from solving the LP in Section 4. To keep the exposition simple, we will continue using partial derivative notation throughout the paper.)

Remark 1.
This is the only place where condition 5 is used.
Claim 4.
Lemma 3.3 in [HTWZ19] still holds.

Proof. This lemma relies on nothing but the definitions of γ and τ.

Claim 5. Lemma 3.4 in [HTWZ19] still holds.

Proof.
The proof of Lemma 3.4 has three parts:

1. In the first part, they show that for any fixed y_v = x < γ, we have α_v ≥ w_v · g(x, β⁻¹(x)) for all y_u ∈ [0,1].

2. Next, they show that for fixed y_v = x < γ,
(1/w_v) · E_{y_u}[α_v · 1(y_v < γ) + α_u · 1(y_v < γ, y_u > τ)] ≥ f(x, β⁻¹(x)),
where f(x, β⁻¹(x)) := g(x, β⁻¹(x)) + max{0, β⁻¹(x) − τ} · (1 − g(x, β⁻¹(x))).

3. Finally, they show that f(x, β⁻¹(x)) ≥ g(x, τ).

The proofs of 1 and 2 follow entirely from Lemma 3.1 in [HTWZ19], which in turn follows from our conditions 2 and 5. More specifically, the proof of 1 only uses the monotonicity of g (condition 2), whereas the proof of 2 also relies on the fact that if θ(x) = 1 then θ(x′) = 1 for all x′ ≥ x, so it uses condition 5 as well.

The proof of 3 follows unchanged, and uses condition 2 and condition 4. For completeness we write the argument here:

• If β⁻¹(x) < τ, then f(x, β⁻¹(x)) = g(x, β⁻¹(x)) ≥ g(x, τ), since ∂g(x,y)/∂y ≤ 0.

• If β⁻¹(x) ≥ τ, then f(x, β⁻¹(x)) = g(x, β⁻¹(x)) + (β⁻¹(x) − τ)(1 − g(x, β⁻¹(x))). Observe that f(x, β⁻¹(x)) is non-decreasing in its second argument, since

∂f(x, β⁻¹(x))/∂β⁻¹(x) = [∂g(x, β⁻¹(x))/∂β⁻¹(x) + 1 − g(x, β⁻¹(x))] − (β⁻¹(x) − τ) · ∂g(x, β⁻¹(x))/∂β⁻¹(x) ≥ 0,

where the bracketed term is ≥ 0 by condition 4, and the remaining term is ≥ 0 because β⁻¹(x) − τ ≥ 0 and ∂g/∂y ≤ 0. Thus, f(x, β⁻¹(x)) ≥ f(x, τ) = g(x, τ).

Claim 6.
Lemma 3.5 in [HTWZ19] still holds.

Proof. In this lemma, the authors write w_v · g(y_u, y_v) to denote the gain, α_u, of the online vertex u. This relies on their assumption that g(y_v, y_u) + g(y_u, y_v) = 1, because the RANKING algorithm actually sets α_u = w_v · (1 − g(y_v, y_u)). Since we are not assuming g(y_v, y_u) + g(y_u, y_v) = 1, we need to replace all occurrences of the form g(y_u, y_v) in their lemma with 1 − g(y_v, y_u). We show that everything in the lemma goes through with this replacement. With this replacement, the statement of the lemma becomes

E[α_u · 1(y_u < τ) + α_v · 1(y_u < τ, y_v > γ)] ≥ w_v · ∫_0^τ (1 − g(γ, x)) dx.

The proof of Lemma 3.5 has three parts (again, the statements below are rewritten from their proof, replacing all occurrences of the form g(y_u, y_v) with 1 − g(y_v, y_u)):

1. In the first part, they show that for any fixed y_u = x < τ, we have α_u ≥ w_v · (1 − g(θ(y_u), y_u)).

2. Next, they show that for fixed y_u = x < τ,
(1/w_v) · E_{y_v}[α_u · 1(y_u < τ) + α_v · 1(y_u < τ, y_v > γ)] ≥ f(θ(x), x),
where f(θ(x), x) := (1 − g(θ(x), x)) + max{0, θ(x) − γ} · g(θ(x), x).

Note: In [HTWZ19], they use the notation f(x, θ(x)). We choose to use f(θ(x), x) here in order to make it consistent with the notation g(θ(x), x).

3. Finally, they show that f(θ(x), x) ≥ 1 − g(γ, x).

The proofs of 1 and 2 follow entirely from Lemma 3.1 in [HTWZ19], which in turn follows from our conditions 2 and 5. Actually, the proofs of 1 and 2 only rely on the parts of Lemma 3.1 which follow from the monotonicity of g (i.e., the parts of Lemma 3.1 except the statement that θ(x) = 1 for some x implies θ(x′) = 1 for all x′ ≥ x). Thus 1 and 2 follow entirely from our condition 2, and do not need to use condition 5.

The proof of 3 uses, in addition, condition 3. The proof in [HTWZ19] goes through unchanged, but for completeness we write the argument here:

• If θ(x) ≤ γ, then f(θ(x), x) = 1 − g(θ(x), x) ≥ 1 − g(γ, x), because g is increasing in its first argument (condition 2).

• If θ(x) > γ, then f(θ(x), x) is non-decreasing in its first argument, since

∂f(θ(x), x)/∂θ(x) = [−∂g(θ(x), x)/∂θ(x) + g(θ(x), x)] + (θ(x) − γ) · ∂g(θ(x), x)/∂θ(x) ≥ 0,

where the bracketed term is ≥ 0 by condition 3, and the remaining term is ≥ 0 because θ(x) − γ ≥ 0 and ∂g/∂x ≥ 0. Thus f(θ(x), x) ≥ f(γ, x) = 1 − g(γ, x).

From now on, we will refer to the five conditions in Theorem 1 as conditions 1-5.
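Before encoding conditions 1-5 as LP constraints, it is useful to be able to spot-check them numerically for a candidate g. A sketch (forward differences stand in for the partial derivatives, and the candidate g below, built from the uncapped exponential h, is only an illustration):

```python
import math

def check_conditions(g, n=20, d=1e-4, tol=1e-3):
    """Numerically spot-check conditions 1-5 of Theorem 1 on an n x n grid.

    Partial derivatives are approximated by forward differences, so tol
    absorbs the discretization error; this is a sanity check, not a proof.
    """
    pts = [i / n for i in range(n + 1)]
    for x in pts:
        for y in pts:
            assert 0 <= g(x, y) <= 1                          # condition 1 (range)
            if x + d <= 1:
                assert g(x + d, y) >= g(x, y) - tol           # condition 2
                assert (g(x + d, y) - g(x, y)) / d <= g(x, y) + tol   # condition 3
            if y + d <= 1:
                assert g(x, y + d) <= g(x, y) + tol           # condition 2
                assert (g(x, y + d) - g(x, y)) / d >= g(x, y) - 1 - tol  # condition 4
            for yp in pts:
                if yp > y:                                    # condition 5
                    assert g(1, y) - g(x, y) >= g(1, yp) - g(x, yp) - tol
    return True

# Illustrative candidate built from the uncapped exponential h:
h = lambda t: math.exp(t - 1)
g_candidate = lambda x, y: (h(x) + 1 - h(y)) / 2
```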
4 LP Formulation

To find a function g that maximizes the bound in Theorem 1, we discretize [0,1]² into an n × n grid for a sufficiently large positive integer n, and write an LP to search for the values of g on this discretized grid. In Section 4.1, we formulate conditions 1-5, which are the conditions that any feasible g must satisfy, as constraints in the LP. Next, in Section 4.2, we formulate the expression in Theorem 1, which is the bound we are trying to maximize, as an LP objective. Finally, in Section 4.3, we will see how to extend the values of g on the discretized n × n grid, which is what the LP returns, to a function g defined on the entire unit square.

In this section, we show how to formulate conditions 1-5 as constraints in the LP. Fix a positive integer n and let x_i = y_i = i/n, for i = 0, 1, ..., n. Our LP will have variables g(x_i, y_j), the values of g on the discretized unit square. Next, we encode conditions 1-5 as constraints of the LP. Below are the conditions and their corresponding LP constraints:

1. g(x, y) : [0,1]² → [0,
1] and g is continuous. The corresponding LP constraints are 0 ≤ g(x_i, y_j) ≤ 1 for all i, j = 0, 1, ..., n. Note that we do not include any constraints to enforce the continuity of g, since the aim of the LP is to determine the values of g at a discretized set of points.

2. g(x, y) is increasing in x and decreasing in y. The corresponding LP constraints are

• g(x_i, y_j) ≤ g(x_k, y_j) for all 0 ≤ i, j, k ≤ n with i ≤ k;
• g(x_i, y_j) ≥ g(x_i, y_ℓ) for all 0 ≤ i, j, ℓ ≤ n with j ≤ ℓ.

3. ∂g(x,y)/∂x ≤ g(x, y). We discretize this constraint to create the following LP constraints:

• (g(x_{i+1}, y_j) − g(x_i, y_j))/(x_{i+1} − x_i) ≤ g(x_i, y_{j+1}) for all 0 ≤ i, j ≤ n − 1;
• (g(x_{i+1}, y_n) − g(x_i, y_n))/(x_{i+1} − x_i) ≤ g(x_i, y_n) for all 0 ≤ i ≤ n − 1.

Remark.
It is more natural to encode the constraints as (g(x_{i+1}, y_j) − g(x_i, y_j))/(x_{i+1} − x_i) ≤ g(x_i, y_j) for all 0 ≤ i ≤ n − 1, 0 ≤ j ≤ n. Since g(x_i, y_{j+1}) ≤ g(x_i, y_j), our constraints are even stronger. We do this because when we extend g from its discretized values to a function defined on the entire unit square, this slightly stronger version of the constraint will be needed to show that the extended function also satisfies the condition.

4. ∂g(x,y)/∂y ≥ g(x, y) −
1. As with the previous constraint, the corresponding LP constraints are

• (g(x_i, y_{j+1}) − g(x_i, y_j))/(y_{j+1} − y_j) ≥ g(x_{i+1}, y_j) − 1, for all 0 ≤ i, j ≤ n − 1;
• (g(x_n, y_{j+1}) − g(x_n, y_j))/(y_{j+1} − y_j) ≥ g(x_n, y_j) − 1, for all 0 ≤ j ≤ n − 1.

5. For all x, y, y′ with y′ > y, g(1, y) − g(x, y) ≥ g(1, y′) − g(x, y′). The corresponding LP constraints are g(x_n, y_j) − g(x_i, y_j) ≥ g(x_n, y_ℓ) − g(x_i, y_ℓ) for all 0 ≤ i, j, ℓ ≤ n with ℓ > j.

The expression we are trying to maximize is given in (1). To formulate this approximately as an LP objective, we

1. Approximate the min_{0≤γ,τ≤1} and min_{θ≤γ} expressions by minimizing over a finite set of values, and
2. Approximate the integrals by finite sums.

We begin by letting f(γ, τ) be the expression inside the outermost min, so that the bound is equal to min_{0≤γ,τ≤1} f(γ, τ). Since we cannot check all values of γ and τ, we approximate it by min_{0≤i,j≤n} f(x_i, y_j). We write this as a linear objective using the standard trick of introducing a dummy variable t, and writing max t s.t. t ≤ f(x_i, y_j) for all 0 ≤ i, j ≤ n.

Next, we must write constraints to model f(x_i, y_j). We replace the inner min_{θ≤x_i} by a minimum over the discretized grid: min_{θ≤x_i} becomes min_{x_k≤x_i}. For each integral that appears in the expression for f, we replace it by a left Riemann sum. For example, the integral ∫_0^{x_i} g(x, y_j) dx would be replaced by (1/n) Σ_{k=0}^{i−1} g(x_k, y_j). With these approximations, we can approximate f(x_i, y_j) as a linear function f̃(x_i, y_j) of the g(x_i, y_j) variables:

f(x_i, y_j) ≈ f̃(x_i, y_j) = (1 − x_i)(1 − y_j) + (1 − y_j) · (1/n) Σ_{k=0}^{i−1} g(x_k, y_j) + (1/n) Σ_{ℓ=0}^{j−1} min_{k≤i} { (1 − g(x_k, y_ℓ)) + (1/n) Σ_{d=0}^{k−1} g(x_d, y_ℓ) + (1/n) Σ_{d=k}^{i−1} g(x_d, y_j) }

Hence, to summarize this section and Section 4.1, the full linear program we use to compute the values of g on the discretized n × n grid is as follows:

max t s.t. t ≤ f̃(x_i, y_j) for all 0 ≤ i, j ≤ n, and such that g satisfies the constraints from Section 4.1.
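The approximation f̃ above can be computed directly from the grid values. A sketch (the array layout gv[k][l] = g(x_k, y_l) is our own convention):

```python
def f_tilde(gv, i, j, n):
    """Left-Riemann-sum approximation of f at the gridpoint (x_i, y_j),
    where gv[k][l] holds g(x_k, y_l) on the (n+1) x (n+1) grid."""
    xi, yj = i / n, j / n
    val = (1 - xi) * (1 - yj)
    # (1 - y_j) * integral_0^{x_i} g(x, y_j) dx, as a left Riemann sum
    val += (1 - yj) * sum(gv[k][j] for k in range(i)) / n
    # outer integral over y in [0, y_j], with the inner min over theta <= x_i
    for l in range(j):
        val += min(
            (1 - gv[k][l])
            + sum(gv[d][l] for d in range(k)) / n
            + sum(gv[d][j] for d in range(k, i)) / n
            for k in range(i + 1)
        ) / n
    return val
```

The quantity the LP maximizes via the dummy variable t is then min over all 0 ≤ i, j ≤ n of f_tilde(gv, i, j, n).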
The linear program gives us values of g on any given discretization of [0,1]², but to use the bound in Theorem 1 we must

1. Extend g to be defined on the entire unit square, and
2. Show that this extended function satisfies conditions 1-5.

To extend g from its values on an n × n grid to a function defined on the entire unit square, we triangulate the n × n grid as shown in Figure 1. For a point (x, y) on a gridpoint, its function value is given by the LP. For any other point (x, y), we define g(x, y) to be a convex combination of the function values on the three vertices of the triangle containing (x, y). More precisely, suppose (x, y) is contained in the triangle with vertices (a_1, b_1), (a_2, b_2), and (a_3, b_3), where the (a_i, b_i) are gridpoints. (See Figure 2.) Then we define

g(x, y) = λ_1 · g(a_1, b_1) + λ_2 · g(a_2, b_2) + λ_3 · g(a_3, b_3),

where λ_1, λ_2, λ_3 are the unique coefficients that satisfy λ_1, λ_2, λ_3 ≥ 0, λ_1 + λ_2 + λ_3 = 1, and

(x, y) = λ_1 · (a_1, b_1) + λ_2 · (a_2, b_2) + λ_3 · (a_3, b_3).

Geometrically, the extended function is piecewise affine: it is affine on each triangle.

Figure 1: Triangulating the grid. Here, n = 4.
Figure 2: Extending the function values to a point inside a triangle.
Figure 3: Illustration of the proof of Condition 3.
Figure 4: (a) Illustration of the proof of why we may assume (a_1, b_1) and (a_2, b_2) are contained in the same triangle. (b) Illustration of the proof of why g(a_1, b_1) ≤ g(a_2, b_2) for (a_1, b_1), (a_2, b_2) in the same triangle and a_1 ≤ a_2.
Figure: Side-by-side comparison of the function g we used, versus the function g used by Huang et al.

We devote the remainder of this section to proving that the extended function satisfies conditions 1-5. We list the conditions below, and prove that the extended function satisfies them:

1. g(x, y) : [0,1]² → [0,1] and g is continuous. The extended function takes values in [0,1] because its values are convex combinations of its values on the discretized grid, which are in [0,1].

2. g(x, y) is increasing in x and decreasing in y. We will show that g is increasing in x; the proof that it is decreasing in y is similar.

Let (a_1, b) and (a_2, b) be two points in the unit square with the same second coordinate and a_1 ≤ a_2. We must show that g(a_1, b) ≤ g(a_2, b). First, observe that it suffices to show this when the two points are contained in the same triangle. This is because if (a_1, b) and (a_2, b) were contained in different triangles, then the horizontal line segment ℓ from (a_1, b) to (a_2, b) can be divided from left to right into a sequence of segments (say ℓ_1, ..., ℓ_k), each of which is contained in a single triangle. Then the fact that g is increasing on each smaller segment ℓ_i would imply that g is increasing on ℓ. (For an illustration of this, see Figure 4(a).)

So, we can assume (a_1, b) and (a_2, b) are contained in the same triangle. Without loss of generality, suppose (a_1, b) and (a_2, b) are both contained in the lower-leftmost triangle, that is, the triangle with vertices (0, 0), (x_1, 0), and (0, y_1); the proof for any other triangle is the same.

Note that (a_1, b) and (a_2, b) are both on the line segment from (0, b) to (1/n − b, b). Since g is piecewise affine in any triangle, it follows that g(a_1, b) = (1 − λ_1) · g(0, b) + λ_1 · g(1/n − b, b), where 0 ≤ λ_1 ≤ 1 and λ_1 · (1/n − b) = a_1. Similarly, g(a_2, b) = (1 − λ_2) · g(0, b) + λ_2 · g(1/n − b, b), where 0 ≤ λ_2 ≤ 1 and λ_2 · (1/n − b) = a_2. Now, since a_1 ≤ a_2, it follows that λ_1 ≤ λ_2. Therefore, to show that g(a_1, b) ≤ g(a_2, b) it suffices to show that g(0, b) ≤ g(1/n − b, b).

To see this, we note that g(0, b) = (1 − λ) · g(0, 0) + λ · g(0, 1/n), where 0 ≤ λ ≤ 1 and λ/n = b. Similarly, g(1/n − b, b) = (1 − λ) · g(1/n, 0) + λ · g(0, 1/n). Since g(1/n, 0) ≥ g(0, 0) (this was a constraint in the LP), it follows that g(0, b) ≤ g(1/n − b, b), as needed.

Figure 5: Illustration of the proof of condition 5.

3. ∂g(x,y)/∂x ≤ g(x, y). Consider a horizontal line segment ℓ between two adjacent gridpoints, say between (x_i, y_j) and (x_{i+1}, y_j). In the triangulation, ℓ is adjacent to two triangles: one triangle T_1 below it and one triangle T_2 above it. (If y_j = 0 or y_j = 1, then ℓ is only adjacent to one triangle, but the same argument still goes through.) See Figure 3 for an illustration. Because g is piecewise affine in each triangle, it follows that ∂g(x,y)/∂x is constant on T_1 ∪ T_2, and is equal to the slope of ℓ. Recall that the LP imposes the following constraint on the slope of ℓ:

slope(ℓ) = (g(x_{i+1}, y_j) − g(x_i, y_j))/(x_{i+1} − x_i) ≤ g(x_i, y_{j+1}).

Because g is increasing in x and decreasing in y, we note that g(x_i, y_{j+1}) ≤ inf{g(x, y) : (x, y) ∈ T_1 ∪ T_2}. Thus ∂g(x,y)/∂x ≤ g(x, y) holds on T_1 ∪ T_2. Because any triangle is adjacent to some horizontal line segment in the grid, this argument shows that ∂g(x,y)/∂x ≤ g(x, y) holds for all (x, y) in the unit square, and we are done.

4. ∂g(x,y)/∂y ≥ g(x, y) − 1. The proof of this is similar to the proof of the previous condition.

5. For all x, y, y′ with y′ > y, g(1, y) − g(x, y) ≥ g(1, y′) − g(x, y′). Let F = {0, 1/n, 2/n, ..., 1}. If x, y, y′ ∈ F, then the condition holds, because these were constraints imposed by the LP.

Suppose now (x, y) lies in the interior of some triangle T. Fix x and y′, and imagine varying y up and down such that (x, y) remains inside T. Let I be the range of values of y such that (x, y) remains inside T. Since g is affine on each triangle, it follows that ∂/∂y (g(1, y) − g(x, y)) is constant for all y in I. Therefore (by moving y in the direction that decreases the left-hand side of the inequality if necessary), it suffices to prove the inequality in the case that (x, y) is on the boundary of a triangle. Similarly, we may assume that (x, y′) lies on the boundary of a triangle.

Suppose (x, y) and (x, y′) both lie on hypotenuses (see Figure 5). The case where one or both of the points lie on a base of a triangle is very similar (and easier), so we will omit it here. Let h_1 and h_2 be the two endpoints of the hypotenuse containing (x, y), with h_1 lower than h_2. Similarly, define h′_1 and h′_2. Let b_1 and b_2 be the two endpoints of the vertical grid segment containing (1, y). Similarly, define b′_1 and b′_2. We will use the fact that the inequality holds for the gridpoints (b_1, h_1, b′_1, h′_1) and the gridpoints (b_2, h_2, b′_2, h′_2) to deduce that it holds for our points.

The inequality on the points (b_1, h_1, b′_1, h′_1) is g(b_1) − g(h_1) ≥ g(b′_1) − g(h′_1). The inequality on the points (b_2, h_2, b′_2, h′_2) is g(b_2) − g(h_2) ≥ g(b′_2) − g(h′_2).

Now let 0 ≤ λ ≤ 1 be such that λb_1 + (1 − λ)b_2 = (1, y).
Observe that we also have $\lambda h_1 + (1-\lambda) h_2 = (x,y)$, $\lambda b_1' + (1-\lambda) b_2' = (1,y')$, and $\lambda h_1' + (1-\lambda) h_2' = (x,y')$. Now multiply the inequality for $(b_1, h_1, b_1', h_1')$ by $\lambda$, multiply the inequality for $(b_2, h_2, b_2', h_2')$ by $1-\lambda$, and add them together. Since $g$ is affine on each of the segments involved, the result is the inequality
$$g(1,y) - g(x,y) \ge g(1,y') - g(x,y'),$$
which is what we wanted.

The linear program gives us function values defined on a discretization of the unit square, which we then extend to a function $g$ defined on the entire unit square via triangulation. It remains now to plug this $g$ into the bound for the competitive ratio given by Theorem 1. We cannot evaluate the bound analytically for the function $g$ returned by the LP; instead, we evaluate it computationally.

For $0 \le \gamma, \tau \le 1$, let
$$f(\gamma,\tau) = (1-\tau)(1-\gamma) + (1-\tau)\int_0^{\gamma} g(x,\tau)\,dx + \int_0^{\tau} \min_{\theta \le \gamma}\left\{(1 - g(\theta,y)) + \int_0^{\theta} g(x,y)\,dx + \int_{\theta}^{\gamma} g(x,\tau)\,dx\right\} dy,$$
so that, by Theorem 1, the competitive ratio of $g$ is at least $\min_{0 \le \gamma,\tau \le 1} f(\gamma,\tau)$.

When we evaluate this bound using a computer, we incur two sources of error:

1. The bound takes a minimum over all $(\gamma,\tau)$ in the unit square. However, using a computer, we can only check a finite number of points $(\gamma,\tau)$.

2. For a fixed $(\gamma,\tau)$, we do not calculate $f(\gamma,\tau)$ exactly. Instead, using a computer, we calculate an approximation $\hat f(\gamma,\tau)$, by
- approximating the integrals with finite sums, and
- replacing the inner minimum over all $\theta \le \gamma$ by a minimum over a finite set of $\theta$.

In what follows, we bound both of these errors. This proves that the output of the computer program is a valid bound on the competitive ratio. We will show that $f$ is Lipschitz in $\gamma$ and $\tau$, which implies that checking all values of $(\gamma,\tau)$ in a sufficiently fine discretization of the unit square is enough to obtain a quantifiable bound on the error.

Before we move on, we remind the reader what it means for a function to be Lipschitz.

Definition 1.
A function $f : \mathbb{R}^n \to \mathbb{R}$ is $L$-Lipschitz if $|f(x) - f(y)| \le L\,\|x - y\|$ for all $x, y \in \mathbb{R}^n$.

It will be convenient for us to work with Lipschitzness in a particular coordinate.
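To illustrate how Lipschitzness converts a finite grid search into a bound over a whole interval, here is a minimal Python sketch. The function, constant, and grid size below are illustrative stand-ins, not the $f$ analyzed in the paper:

```python
# Illustration (not the paper's f): if f is L-Lipschitz on [0, 1], then the
# minimum over an evenly spaced grid is within L/(2n) of the true minimum,
# since every point of [0, 1] is within 1/(2n) of some grid point.
L, n = 2.0, 1000
f = lambda x: 2.0 * abs(x - 0.3141)  # 2-Lipschitz; true minimum is 0 at 0.3141

grid_min = min(f(k / n) for k in range(n + 1))   # minimum over the grid
certified_lower_bound = grid_min - L / (2 * n)   # valid bound on min over [0, 1]
```

The same idea, applied coordinate-wise in $\gamma$ and $\tau$, is what Corollary 1 below formalizes for the bound of Theorem 1.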
Definition 2.
A function $f : \mathbb{R}^n \to \mathbb{R}$ is $L$-Lipschitz in its $i$th coordinate if $|f(x_1, \ldots, x_i, \ldots, x_n) - f(x_1, \ldots, x_i', \ldots, x_n)| \le L\,|x_i - x_i'|$ for all $x_1, \ldots, x_n, x_i' \in \mathbb{R}$.

Throughout the proofs below, we will repeatedly use the following facts:
- $0 \le \frac{\partial g(x,y)}{\partial x} \le g(x,y) \le 1$; this follows from conditions 1, 2, and 3.
- $-1 \le g(x,y) - 1 \le \frac{\partial g(x,y)}{\partial y} \le 0$; this follows from conditions 1, 2, and 4.

To make notation less cluttered, let
- $p(\gamma,\tau) = (1-\gamma)(1-\tau) + (1-\tau)\int_0^{\gamma} g(x,\tau)\,dx$,
- $h(\gamma,\tau,\theta,y) = (1 - g(\theta,y)) + \int_0^{\theta} g(x,y)\,dx + \int_{\theta}^{\gamma} g(x,\tau)\,dx$, and
- $q(\gamma,\tau,y) = \min_{\theta \le \gamma} h(\gamma,\tau,\theta,y)$,

so that $f(\gamma,\tau) = p(\gamma,\tau) + \int_0^{\tau} q(\gamma,\tau,y)\,dy$.

Lemma 2. $f(\gamma,\tau)$ is 1-Lipschitz in $\gamma$ and 3-Lipschitz in $\tau$.

Proof. First, consider varying $\gamma$. For $\delta > 0$, we have
$$p(\gamma+\delta,\tau) - p(\gamma,\tau) = -\delta(1-\tau) + (1-\tau)\int_{\gamma}^{\gamma+\delta} \underbrace{g(x,\tau)}_{\in [0,1]}\,dx \;\in\; [-\delta(1-\tau),\, 0].$$
Next,
$$\frac{\partial}{\partial\theta} h(\gamma,\tau,\theta,y) = -\underbrace{\frac{\partial g(\theta,y)}{\partial\theta}}_{\in [0,\, g(\theta,y)]} + g(\theta,y) - g(\theta,\tau) \;\in\; [-g(\theta,\tau),\, g(\theta,y)] \subseteq [-1, 1],$$
so $h$ is 1-Lipschitz in $\theta$. Also,
$$h(\gamma+\delta,\tau,\theta,y) - h(\gamma,\tau,\theta,y) = \int_{\gamma}^{\gamma+\delta} g(x,\tau)\,dx \in [0, \delta].$$
Therefore,
$$q(\gamma+\delta,\tau,y) - q(\gamma,\tau,y) = \min_{\theta \le \gamma+\delta} h(\gamma+\delta,\tau,\theta,y) - \min_{\theta \le \gamma} h(\gamma,\tau,\theta,y) \in \min_{\theta \le \gamma+\delta}\big\{h(\gamma,\tau,\theta,y) + [0,\delta]\big\} - \min_{\theta \le \gamma} h(\gamma,\tau,\theta,y)$$
$$= [0,\delta] + \min_{\theta \le \gamma+\delta} h(\gamma,\tau,\theta,y) - \min_{\theta \le \gamma} h(\gamma,\tau,\theta,y) \subseteq [0,\delta] + [-\delta, 0] = [-\delta, \delta],$$
where the last containment holds because $h$ is 1-Lipschitz in $\theta$, so extending the range of the minimum by $\delta$ decreases it by at most $\delta$. Combining, we get
$$f(\gamma+\delta,\tau) - f(\gamma,\tau) = p(\gamma+\delta,\tau) - p(\gamma,\tau) + \int_0^{\tau}\big(q(\gamma+\delta,\tau,y) - q(\gamma,\tau,y)\big)\,dy \in [-\delta(1-\tau), 0] + [-\delta\tau, \delta\tau] = [-\delta, \delta\tau] \subseteq [-\delta, \delta].$$
This shows that $f$ is 1-Lipschitz in $\gamma$.

Next, we consider varying $\tau$. Let $\delta > 0$. First, we have
$$\frac{\partial p(\gamma,\tau)}{\partial\tau} = -(1-\gamma) - \int_0^{\gamma} \underbrace{g(x,\tau)}_{\in [0,1]}\,dx + (1-\tau)\int_0^{\gamma} \underbrace{\frac{\partial g(x,\tau)}{\partial\tau}}_{\in [-1,0]}\,dx \in \{-(1-\gamma)\} + [-\gamma, 0] + [-(1-\tau), 0] \subseteq [-2, 0],$$
so $p(\gamma,\tau+\delta) - p(\gamma,\tau) \in [-2\delta, 0]$. Next, note that
$$h(\gamma,\tau+\delta,\theta,y) - h(\gamma,\tau,\theta,y) = \int_{\theta}^{\gamma} \underbrace{\big(g(x,\tau+\delta) - g(x,\tau)\big)}_{\in [-\delta, 0]}\,dx \in [-\delta, 0],$$
and hence
$$q(\gamma,\tau+\delta,y) - q(\gamma,\tau,y) = \min_{\theta \le \gamma} h(\gamma,\tau+\delta,\theta,y) - \min_{\theta \le \gamma} h(\gamma,\tau,\theta,y) \in \min_{\theta \le \gamma}\big\{h(\gamma,\tau,\theta,y) + [-\delta, 0]\big\} - \min_{\theta \le \gamma} h(\gamma,\tau,\theta,y) = [-\delta, 0].$$
Hence,
$$\int_0^{\tau+\delta} q(\gamma,\tau+\delta,y)\,dy - \int_0^{\tau} q(\gamma,\tau,y)\,dy \in (\tau+\delta)[-\delta, 0] + \int_0^{\tau+\delta} q(\gamma,\tau,y)\,dy - \int_0^{\tau} q(\gamma,\tau,y)\,dy$$
$$\subseteq [-\delta, 0] + \int_{\tau}^{\tau+\delta} \underbrace{q(\gamma,\tau,y)}_{\in [0,2]}\,dy \subseteq [-\delta, 0] + [0, 2\delta] = [-\delta, 2\delta].$$
Here, $q(\gamma,\tau,y) \in [0,2]$ is because $q(\gamma,\tau,y) = \min_{\theta \le \gamma} h(\gamma,\tau,\theta,y)$, and
$$h(\gamma,\tau,\theta,y) = (1 - g(\theta,y)) + \int_0^{\theta} g(x,y)\,dx + \int_{\theta}^{\gamma} g(x,\tau)\,dx \le 1 - g(\theta,y) + g(\theta,y) + \int_{\theta}^{\gamma} g(x,\tau)\,dx \le 2,$$
using $\int_0^{\theta} g(x,y)\,dx \le \theta\, g(\theta,y) \le g(\theta,y)$ since $g$ is increasing in $x$. (Clearly, $h(\gamma,\tau,\theta,y) \ge 0$.) Combining, we get
$$f(\gamma,\tau+\delta) - f(\gamma,\tau) = p(\gamma,\tau+\delta) - p(\gamma,\tau) + \int_0^{\tau+\delta} q(\gamma,\tau+\delta,y)\,dy - \int_0^{\tau} q(\gamma,\tau,y)\,dy \in [-2\delta, 0] + [-\delta, 2\delta] = [-3\delta, 2\delta] \subseteq [-3\delta, 3\delta].$$
This shows that $f$ is 3-Lipschitz in $\tau$.

The preceding lemma allows us to control the error incurred from checking the bound over all $(\gamma,\tau)$ in a discretization of the unit square instead of over the entire unit square. The second source of error is that for a fixed $(\gamma,\tau)$, we evaluate an approximation $\hat f(\gamma,\tau)$ of $f(\gamma,\tau)$, because we replace the integrals with discrete sums and the minimization over all $\theta \le \gamma$ with a minimization over finitely many $\theta$. The following lemma controls this second source of error.

Lemma 3.
Fix $\gamma, \tau \in [0,1]$, and let $m$ be a positive integer. Let $\hat f(\gamma,\tau)$ be the approximation to $f(\gamma,\tau)$ obtained by:
- replacing the integral $\int_0^{\tau} q(\gamma,\tau,y)\,dy$ with a trapezoidal sum with subdivision length $\frac{1}{m}$,
- replacing the other three integrals with left Riemann sums with subdivision length $\frac{1}{m}$, and
- replacing the minimum over all $\theta \le \gamma$ with a minimum over a discretization with subdivision length $\frac{1}{m}$.

Then $\hat f(\gamma,\tau) \le f(\gamma,\tau) + \frac{5}{4m}$.

More precisely, $\hat f$ is defined as follows. Define $x_k = y_k = \frac{k}{m}$ for $k = 0, 1, \ldots, m$. Let $i$ and $j$ be the integers such that $x_i \le \gamma < x_{i+1}$ and $y_j \le \tau < y_{j+1}$. Then
$$\hat f(\gamma,\tau) = \hat p(\gamma,\tau) + \frac{1}{m}\sum_{k=0}^{j-1} \frac{\hat q(\gamma,\tau,y_k) + \hat q(\gamma,\tau,y_{k+1})}{2},$$
where $\hat p$ and $\hat q$ are defined to be
$$\hat p(\gamma,\tau) = (1-\gamma)(1-\tau) + (1-\tau)\cdot\frac{1}{m}\sum_{k=0}^{i-1} g(x_k,\tau)$$
and
$$\hat q(\gamma,\tau,y) = \min_{k \le i+1}\left\{1 - g(x_k,y) + \frac{1}{m}\sum_{d=0}^{k-1} g(x_d,y) + \frac{1}{m}\sum_{d=k}^{i-1} g(x_d,y_{j+1})\right\}.$$

Proof.
Recall that $f(\gamma,\tau) = p(\gamma,\tau) + \int_0^{\tau} q(\gamma,\tau,y)\,dy$, where
- $p(\gamma,\tau) = (1-\gamma)(1-\tau) + (1-\tau)\int_0^{\gamma} g(x,\tau)\,dx$, and
- $q(\gamma,\tau,y) = \min_{\theta \le \gamma}\left\{1 - g(\theta,y) + \int_0^{\theta} g(x,y)\,dx + \int_{\theta}^{\gamma} g(x,\tau)\,dx\right\}$.

First, note that left Riemann sums always underapproximate the integral of an increasing function, and $g(x,y)$ is increasing in $x$. Hence, wherever we discretize an integral of $g$ with respect to $x$, we obtain a lower bound. It follows that
- $\hat p(\gamma,\tau) \le p(\gamma,\tau)$, and
- $\hat q(\gamma,\tau,y) \le \min_{k \le i+1}\left\{1 - g(x_k,y) + \int_0^{x_k} g(x,y)\,dx + \int_{x_k}^{x_i} g(x,y_{j+1})\,dx\right\}$.

Let $\mathcal F_\gamma = \{x_k : x_k \le \gamma\}$. Note that we can further upper bound the previous expression by $\min_{\theta \in \mathcal F_\gamma}\left\{1 - g(\theta,y) + \int_0^{\theta} g(x,y)\,dx + \int_{\theta}^{\gamma} g(x,\tau)\,dx\right\}$; this is because $x_i \le \gamma < x_{i+1}$ and $y_j \le \tau < y_{j+1}$, and $g(x, y_{j+1}) \le g(x,\tau)$.

Hence,
$$\hat f(\gamma,\tau) \le p(\gamma,\tau) + \frac{1}{m}\sum_{k=0}^{j-1} \frac{\min_{\theta \in \mathcal F_\gamma} h(\gamma,\tau,\theta,y_k) + \min_{\theta \in \mathcal F_\gamma} h(\gamma,\tau,\theta,y_{k+1})}{2},$$
where $h(\gamma,\tau,\theta,y) := 1 - g(\theta,y) + \int_0^{\theta} g(x,y)\,dx + \int_{\theta}^{\gamma} g(x,\tau)\,dx$.
The rest of the proof will be devoted to bounding the second term in the displayed inequality above.

Observe that $h$ is 1-Lipschitz in $\theta$, because
$$\frac{\partial}{\partial\theta} h(\gamma,\tau,\theta,y) = -\underbrace{\frac{\partial g(\theta,y)}{\partial\theta}}_{\in [0,\, g(\theta,y)]} + g(\theta,y) - g(\theta,\tau) \in [-g(\theta,\tau),\, g(\theta,y)] \subseteq [-1, 1].$$
We claim that for all $\gamma, \tau, y$,
$$\min_{\theta \in \mathcal F_\gamma} h(\gamma,\tau,\theta,y) \le \min_{\theta \le \gamma} h(\gamma,\tau,\theta,y) + \frac{1}{m}.$$
Indeed, if $\theta^* = \arg\min_{\theta \le \gamma} h(\gamma,\tau,\theta,y)$, we can find some $\bar\theta \in \mathcal F_\gamma$ with $|\bar\theta - \theta^*| < \frac{1}{m}$, and then
$$\min_{\theta \in \mathcal F_\gamma} h(\gamma,\tau,\theta,y) \le h(\gamma,\tau,\bar\theta,y) \le h(\gamma,\tau,\theta^*,y) + \frac{1}{m} = \min_{\theta \le \gamma} h(\gamma,\tau,\theta,y) + \frac{1}{m},$$
where the second inequality is because $h$ is 1-Lipschitz in $\theta$.

Next, observe that $h(\gamma,\tau,\theta,y)$ is 1-Lipschitz in $y$, because
$$\frac{\partial}{\partial y} h(\gamma,\tau,\theta,y) = -\underbrace{\frac{\partial g(\theta,y)}{\partial y}}_{\in [-1, 0]} + \int_0^{\theta} \underbrace{\frac{\partial g(x,y)}{\partial y}}_{\in [-1, 0]}\,dx \in [-1, 1].$$
It follows that $q(\gamma,\tau,y) = \min_{\theta \le \gamma} h(\gamma,\tau,\theta,y)$ is 1-Lipschitz in $y$.

We can now put these facts together to prove the desired bound. We have
$$\frac{1}{m}\sum_{k=0}^{j-1} \frac{\min_{\theta \in \mathcal F_\gamma} h(\gamma,\tau,\theta,y_k) + \min_{\theta \in \mathcal F_\gamma} h(\gamma,\tau,\theta,y_{k+1})}{2} \le \frac{1}{m}\sum_{k=0}^{j-1}\left(\frac{1}{m} + \frac{\min_{\theta \le \gamma} h(\gamma,\tau,\theta,y_k) + \min_{\theta \le \gamma} h(\gamma,\tau,\theta,y_{k+1})}{2}\right)$$
$$= \frac{j}{m^2} + \frac{1}{m}\sum_{k=0}^{j-1}\frac{q(\gamma,\tau,y_k) + q(\gamma,\tau,y_{k+1})}{2} \le \frac{j}{m^2} + \frac{j}{4m^2} + \int_0^{\tau} q(\gamma,\tau,y)\,dy.$$
The last inequality is because $\frac{1}{m}\sum_{k=0}^{j-1}\frac{q(\gamma,\tau,y_k) + q(\gamma,\tau,y_{k+1})}{2}$ is the trapezoidal sum approximation of the integral $\int_0^{y_j} q(\gamma,\tau,y)\,dy$ with discretization length $\frac{1}{m}$; since $q(\gamma,\tau,y)$ is 1-Lipschitz in $y$, this incurs an additive error of at most $\frac{j}{4m^2}$ by Lemma 4 in Appendix A, and since $q \ge 0$ and $y_j \le \tau$, we have $\int_0^{y_j} q(\gamma,\tau,y)\,dy \le \int_0^{\tau} q(\gamma,\tau,y)\,dy$.

Combining, we get
$$\hat f(\gamma,\tau) \le p(\gamma,\tau) + \frac{j}{m^2} + \frac{j}{4m^2} + \int_0^{\tau} q(\gamma,\tau,y)\,dy = f(\gamma,\tau) + \frac{j}{m^2} + \frac{j}{4m^2} \le f(\gamma,\tau) + \frac{5}{4m},$$
as claimed, since $j \le m$.

Combining the above two lemmas allows us to quantify the error incurred when we evaluate the bound in Theorem 1 using a computer.

Corollary 1.
Let $\mathcal F = \{0, \frac{1}{n}, \frac{2}{n}, \ldots, 1\}^2$ be an $n \times n$ discretization of the unit square. If we minimize the function $\hat f(\gamma,\tau)$ defined in Lemma 3 over all $(\gamma,\tau) \in \mathcal F$, then the minimum value satisfies
$$\min_{(\gamma,\tau) \in \mathcal F} \hat f(\gamma,\tau) \le \min_{(\gamma,\tau) \in [0,1]^2} f(\gamma,\tau) + \frac{2}{n} + \frac{5}{4m}.$$

Proof.
By Lemma 3, we have
$$\min_{(\gamma,\tau) \in \mathcal F} \hat f(\gamma,\tau) \le \min_{(\gamma,\tau) \in \mathcal F} f(\gamma,\tau) + \frac{5}{4m}.$$
Let $(\gamma^*, \tau^*) = \arg\min_{(\gamma,\tau) \in [0,1]^2} f(\gamma,\tau)$, and let $(\hat\gamma, \hat\tau)$ be the closest point to $(\gamma^*, \tau^*)$ in the discretized grid $\mathcal F$. Then $|\hat\gamma - \gamma^*| \le \frac{1}{2n}$ and $|\hat\tau - \tau^*| \le \frac{1}{2n}$. By Lemma 2, we know $f$ is 1-Lipschitz in $\gamma$ and 3-Lipschitz in $\tau$, which implies that
$$\min_{(\gamma,\tau) \in \mathcal F} f(\gamma,\tau) \le f(\hat\gamma, \hat\tau) \le f(\gamma^*, \tau^*) + \frac{1}{2n} + \frac{3}{2n}.$$
Chaining this with the previous displayed inequality, we obtain
$$\min_{(\gamma,\tau) \in \mathcal F} \hat f(\gamma,\tau) \le f(\gamma^*, \tau^*) + \frac{2}{n} + \frac{5}{4m},$$
as claimed.

In this section, we describe the computations that we performed to obtain a competitive ratio of 0.6629. Recall that Theorem 1 states that the competitive ratio of RANKING is bounded below by an expression of the form $\min_{(\gamma,\tau) \in [0,1]^2} f(\gamma,\tau)$, where $f$ depends on the function $g$ that the algorithm uses. To obtain our competitive ratio, we

1. solve the LP in Section 4 for an appropriate discretization of the unit square (we chose a $50 \times 50$ grid), and
2. plug the function $g$ obtained from the LP into the bound in Theorem 1.

Note that for the function $g$ obtained from the LP, we can only evaluate the bound in Theorem 1 approximately. This is because $g$ is a piecewise-affine function defined by interpolating its values on a $50 \times 50$ grid, so it has no amenable closed form. As described in Section 5, we let $\mathcal F = \{0, \frac{1}{n}, \ldots, 1\}^2$ for some large enough $n$, and we evaluate $\min_{(\gamma,\tau) \in \mathcal F} \hat f(\gamma,\tau)$, where $\hat f$ is an approximation to $f$ amenable to computer evaluation. (Again, refer to Section 5 for the details.)

Corollary 1 gives us a quantifiable bound on the error incurred when we evaluate $\min_{(\gamma,\tau) \in \mathcal F} \hat f(\gamma,\tau)$ instead of the true bound $\min_{(\gamma,\tau) \in [0,1]^2} f(\gamma,\tau)$. We used a computer to evaluate $\min_{(\gamma,\tau) \in \mathcal F} \hat f(\gamma,\tau)$ with $n = 2^{14}$ and $m = 2^{10}$, and obtained a value for which
$$\min_{(\gamma,\tau) \in \mathcal F} \hat f(\gamma,\tau) - \frac{2}{n} - \frac{5}{4m} = 0.6629.$$
Computing the bound for the above choice of parameters $n$ and $m$ necessitated the use of clever computation techniques; the naive computation (which simply goes through all $(\gamma,\tau) \in \mathcal F$ one by one, evaluating $\hat f(\gamma,\tau)$ from scratch for each) is too slow for the size of the discretization we required to obtain a good bound. (For a point $(\gamma,\tau) \in \mathcal F$, we estimate that evaluating $\hat f(\gamma,\tau)$ from scratch is roughly an $O(m^2)$ operation. The naive computation, which does this for each of the $n^2$ points in $\mathcal F$, is then an $O(n^2 m^2)$ computation, which is much too slow for the parameters $n = 2^{14}$ and $m = 2^{10}$.) To speed up the computation, we used two techniques:

1. precomputation of values that are used repeatedly by the code, and
2. parallelization.

Even after speeding up the computation using precomputed tables and parallelization, we still needed two days of computing time on a 64-core machine with 64GB of memory. Without either one of these techniques, the computation would not have terminated in a reasonable amount of time. In the remainder of this section, we describe the above techniques in more detail. The code for this paper can be found at https://github.com/MapleOx/rankinglp. We performed this computation on Amazon EC2, using a compute-optimized c6g.16xlarge instance running the Amazon Linux 2 AMI.

Remark.
The perceptive reader might notice that it would be conceptually simpler to skip the second step given above altogether (i.e., plugging the function $g$ from the LP into the bound of Theorem 1), since the objective of the LP is already an approximation of the bound in Theorem 1. The reason we do not do this is that, to prove a good enough bound, we need to evaluate $\min_{(\gamma,\tau) \in \mathcal F} \hat f(\gamma,\tau)$ on a fine enough discretization; however, solving the LP is prohibitively expensive for large discretizations. To put this into context, we solved the LP on a $50 \times 50$ discretization, and used the output of the LP to evaluate $\min_{(\gamma,\tau) \in \mathcal F} \hat f(\gamma,\tau)$ on an $n \times n$ discretization, where $n = 2^{14}$. From the description of the LP in Section 4, it can be seen that for an $n \times n$ discretization, the LP has roughly $n^2$ variables and $n^3$ constraints. For $n = 2^{14}$, this would have been too large an LP to solve.

The computation we are trying to perform is $\min_{(\gamma,\tau) \in \mathcal F} \hat f(\gamma,\tau)$. Recall from Section 5 that $\hat f$ is defined as
$$\hat f(\gamma,\tau) = \hat p(\gamma,\tau) + \frac{1}{m}\sum_{k=0}^{j-1} \frac{\hat q(\gamma,\tau,y_k) + \hat q(\gamma,\tau,y_{k+1})}{2},$$
where $\hat p$ and $\hat q$ are defined to be
- $\hat p(\gamma,\tau) = (1-\gamma)(1-\tau) + (1-\tau)\cdot\frac{1}{m}\sum_{k=0}^{i-1} g(x_k,\tau)$, and
- $\hat q(\gamma,\tau,y) = \min_{k \le i+1}\left\{1 - g(x_k,y) + \frac{1}{m}\sum_{d=0}^{k-1} g(x_d,y) + \frac{1}{m}\sum_{d=k}^{i-1} g(x_d,y_{j+1})\right\}$,

where in the above expressions,
- $x_k = y_k = \frac{k}{m}$, and
- $i$ and $j$ are defined to be the integers such that $x_i \le \gamma < x_{i+1}$ and $y_j \le \tau < y_{j+1}$.

The key observation is that for two different points $(\gamma,\tau)$ and $(\gamma',\tau')$, some parts of the computations of $\hat f(\gamma,\tau)$ and $\hat f(\gamma',\tau')$ are the same. Thus, we can speed up the code by precomputing these values and storing them in memory, so that they can be fetched instead of being recomputed each time they are needed. We identified two types of values that could be reused, and precomputed a table for each.
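In simplified form, the table-based computation might look like the following Python sketch. The grid size, the placeholder $g$, and all names here are illustrative stand-ins (the paper uses $m = 2^{10}$, a far finer outer grid, and the $g$ obtained from the LP); the index conventions also differ slightly from the paper's exact rounding:

```python
import numpy as np

m = 16                       # toy subdivision count (the paper uses m = 2^10)
xs = np.arange(m + 1) / m

def g(x, y):
    # Placeholder satisfying conditions 1-2 (values in [0,1], increasing in
    # x, decreasing in y). NOT the function obtained from the LP.
    return 0.5 * (x + 1.0 - y)

# Table of g on the grid: G[j, i] = g(x_i, y_j).
G = g(xs[None, :], xs[:, None])

# Prefix sums P[j, t] = (1/m) * sum_{d < t} g(x_d, y_j): O(1) partial sums.
P = np.hstack([np.zeros((m + 1, 1)), np.cumsum(G, axis=1)]) / m

# Table of the inner minimum for every grid triple (x_i, y_j, y_k).
def q_hat(i, j, k):
    return min(1 - G[k, t] + P[k, t] + (P[j, i] - P[j, t]) for t in range(i + 1))

Q = np.array([[[q_hat(i, j, k) for k in range(m + 1)]
               for j in range(m + 1)] for i in range(m + 1)])

# Stage 2: evaluate f_hat at grid points using only table lookups.
def f_hat(i, j):  # gamma = x_i, tau = y_j
    p = (1 - xs[i]) * (1 - xs[j]) + (1 - xs[j]) * P[j, i]
    trap = sum((Q[i, j, k] + Q[i, j, k + 1]) / 2 for k in range(j)) / m
    return p + trap

best = min(f_hat(i, j) for i in range(m + 1) for j in range(m + 1))
```

The point of the sketch is the data flow: once `G`, `P`, and `Q` are filled (each entry independently, hence parallelizable), every `f_hat` evaluation is a cheap scan over precomputed values.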
A table to store the values of g.
From the expression for $\hat f$, we see that it involves many evaluations of $g$. We can precompute these values and store them in a table for future use. Note that we only ever need to evaluate $g$ at points of the form $(x_i, y_j) = (i/n, j/n)$, which results in an $(n+1) \times (n+1)$ table to be stored in memory. For our choice of $n = 2^{14} = 16384$, this resulted in a table of size roughly 6GB.
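For concreteness, here is a sketch of how values on a grid can be extended to a piecewise-affine function on the whole unit square, as the $g$-table requires. This is an illustrative implementation of one standard triangulation (each grid cell split along a diagonal); the diagonal orientation used in the paper's computations may differ, and the numbers below are made up, not LP output:

```python
# Extend grid values V[i][j] = value at (i/N, j/N) to the unit square by
# splitting each grid cell into two triangles and interpolating affinely.
def interp_g(V, x, y):
    N = len(V) - 1
    i = min(int(x * N), N - 1)      # grid cell containing (x, y)
    j = min(int(y * N), N - 1)
    u, v = x * N - i, y * N - j     # local coordinates within the cell
    if u + v <= 1:                  # lower-left triangle of the cell
        return V[i][j] + u * (V[i + 1][j] - V[i][j]) + v * (V[i][j + 1] - V[i][j])
    # upper-right triangle of the cell
    return (V[i + 1][j + 1] + (1 - u) * (V[i][j + 1] - V[i + 1][j + 1])
            + (1 - v) * (V[i + 1][j] - V[i + 1][j + 1]))

# Toy values on a 2x2 grid of points (N = 1); illustrative numbers only.
V = [[0.0, 0.2], [0.5, 0.6]]
```

Because the extension is affine on each triangle, it agrees with the given values at every gridpoint, which is what the proofs of conditions 1–5 for the extended $g$ rely on.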
We also precomputed a table to store the values of $\hat q(\gamma,\tau,y)$. Note that $\hat q(\gamma,\tau,y)$ depends only on $x_i$, $y_j$, and $y$, where $x_i \le \gamma < x_{i+1}$ and $y_j \le \tau < y_{j+1}$. That is, the computation of $\hat q(\gamma,\tau,y)$ rounds $\gamma$ and $\tau$ down to the nearest points of the grid with subdivision length $\frac{1}{m}$. Thus there are $m+1$ possible values for each of $x_i$, $y_j$, and $y$, which implies that there are $(m+1)^3$ possible values for $\hat q(\gamma,\tau,y)$. We precomputed a table to store all of these values in memory. For our choice of $m = 2^{10} = 1024$, this resulted in a table of size roughly 8GB.

The other efficiency gain came from parallelizing the code. With the use of precomputed tables, our code runs in two stages:

1. Stage 1: Compute the two tables described above.
2. Stage 2: Use the precomputed tables to compute $\min_{(\gamma,\tau) \in \mathcal F} \hat f(\gamma,\tau)$.

Both stages are amenable to parallelization. For the first stage, computing the value of one table entry is independent of computing the value of another, so filling each table can be done in parallel. For the second stage, evaluating $\hat f(\gamma,\tau)$ is independent of evaluating $\hat f(\gamma',\tau')$ for different pairs $(\gamma,\tau)$ and $(\gamma',\tau')$, so this can also be done in parallel. We used the multiprocessing module in Python to parallelize our code, which we then ran on a 64-core machine on Amazon's EC2. In total, this took about two days. Parallelizing was an essential step for making the code run in a reasonable amount of time; extrapolating from its running time on the 64-core machine, we estimate that a non-parallelized version would have taken more than 100 days to run.

In this section, we show that our approach cannot obtain a competitive ratio significantly better than 0.6629. Thus, any further progress beyond this bound will require either further weakening of the assumptions on $g$, or a stronger analysis than that of Huang et al. Precisely, we prove the following theorem.

Theorem 2.
For any function $g$ that satisfies conditions 1–5, the value of the bound in Theorem 1 is at most 0.6688.

Proof. We can write an LP for which any function $g$ that satisfies conditions 1–5 is feasible, and whose objective upper bounds the value of the bound in Theorem 1 for $g$. This LP is a slight modification of the one in Section 4. The reason we must modify the LP slightly is to ensure that

1. it is a relaxation of the original problem, in the sense that any function $g$ which satisfies conditions 1–5 must be feasible for the LP, and
2. the objective of the LP is provably an upper bound on the value of the expression in Theorem 1 when evaluated on $g$.

Below, we explain how this modification works.

First, recall that the LP in Section 4 searched for $g$ by discretizing the unit square into an $n \times n$ grid, for some positive integer $n$, and instantiating variables for the values of $g$ on the discretized grid. Our modified LP also does this. Using the same notation as in Section 4, let $x_i = y_i = \frac{i}{n}$ for $i = 0, \ldots, n$. Then the variables in our LP will be $g(x_i, y_j)$.

Constraints.
The original LP had five sets of constraints, modeling conditions 1–5. The modified LP takes these five sets of constraints and changes two of them slightly to obtain a valid relaxation. Below are the constraints. Constraints 1, 2, and 5 are unchanged from the LP in Section 4, but we write them again here for completeness; constraints 3 and 4 are the ones that have been changed slightly. As we describe each constraint, we explain why it is a valid relaxation of the corresponding condition.

1. $g(x,y) : [0,1]^2 \to [0,1]$ and $g$ is continuous. The corresponding LP constraints are $0 \le g(x_i, y_j) \le 1$ for all $i, j = 0, 1, \ldots, n$. This constraint is clearly a valid relaxation.

2. $g(x,y)$ is increasing in $x$ and decreasing in $y$. The corresponding LP constraints are
- $g(x_i, y_j) \le g(x_k, y_j)$ for all $0 \le i, j, k \le n$ with $i \le k$;
- $g(x_i, y_j) \ge g(x_i, y_\ell)$ for all $0 \le i, j, \ell \le n$ with $j \le \ell$.

This constraint is a valid relaxation, because it checks monotonicity at only a finite set of points.

3. $\frac{\partial g(x,y)}{\partial x} \le g(x,y)$. We discretize this constraint to create the following LP constraint:
$$\frac{g(x_{i+1}, y_j) - g(x_i, y_j)}{x_{i+1} - x_i} \le g(x_{i+1}, y_j) \quad \text{for all } 0 \le i \le n-1,\ 0 \le j \le n.$$
To see why this constraint is a valid relaxation, we use the mean value theorem, which tells us that
$$\frac{g(x_{i+1}, y_j) - g(x_i, y_j)}{x_{i+1} - x_i} = \frac{\partial g(x, y_j)}{\partial x}$$
for some $x \in [x_i, x_{i+1}]$. If $g$ satisfies conditions 1–5, then $\frac{\partial g(x, y_j)}{\partial x} \le g(x, y_j) \le g(x_{i+1}, y_j)$, where the first inequality is by condition 3 and the second inequality is by condition 2. Thus the discretized constraint is indeed a valid relaxation.

Remark.
The LP in Section 4 instead had $g(x_i, y_{j+1})$ as the right-hand side of this constraint. Since $g(x_{i+1}, y_j) \ge g(x_i, y_{j+1})$, the constraints of the modified LP are weaker. We do this to make the modified LP a valid relaxation.

4. $\frac{\partial g(x,y)}{\partial y} \ge g(x,y) - 1$. Similar to the previous constraint, the corresponding LP constraint is
$$\frac{g(x_i, y_{j+1}) - g(x_i, y_j)}{y_{j+1} - y_j} \ge g(x_i, y_{j+1}) - 1 \quad \text{for all } 0 \le i \le n,\ 0 \le j \le n-1.$$

5. For all $x, y, y'$ with $y' > y$, $g(1,y) - g(x,y) \ge g(1,y') - g(x,y')$. The corresponding LP constraints are
$$g(x_n, y_j) - g(x_i, y_j) \ge g(x_n, y_\ell) - g(x_i, y_\ell) \quad \text{for all } 0 \le i, j, \ell \le n \text{ with } \ell > j.$$
This is a valid relaxation, because it checks condition 5 at only a finite set of points.

Now that we have described the constraints of the modified LP, and proved that any feasible function $g$ must satisfy them, we turn to describing the objective of the modified LP.

Objective. Recall that the LP in Section 4.2 approximated the objective in Theorem 1 by

1. replacing the $\min_{0 \le \gamma,\tau \le 1}$ and $\min_{\theta \le \gamma}$ expressions with minima over a finite set of values, and
2. replacing the integrals with finite sums.

For the modified LP, we also replace the min expressions with minima over a finite set of values as before; since this can only increase the objective, it is valid for obtaining an upper bound. To get an upper bound on the integrals, we replace them with right Riemann sums whenever the integrand is monotone increasing. It turns out that all but one of the integrals in the objective have a monotone increasing integrand. For the remaining integral (which is not necessarily monotone increasing or monotone decreasing), we replace it with a trapezoidal sum and bound the error using a Lipschitz continuity argument.

More precisely, recall that the bound in Theorem 1 is $\min_{0 \le \gamma,\tau \le 1} f(\gamma,\tau)$, where
$$f(\gamma,\tau) = (1-\tau)(1-\gamma) + (1-\tau)\int_0^{\gamma} g(x,\tau)\,dx + \int_0^{\tau} \min_{\theta \le \gamma}\left\{(1 - g(\theta,y)) + \int_0^{\theta} g(x,y)\,dx + \int_{\theta}^{\gamma} g(x,\tau)\,dx\right\} dy.$$

Below, we describe how we express the objective in the modified LP. We justify that each step can only increase the objective.
- Replace $\min_{0 \le \gamma,\tau \le 1} f(\gamma,\tau)$ by $\min_{(i,j) \in S} f(x_i, y_j)$, for some small finite subset $S \subset \{0, \ldots, n\}^2$. This can only increase the objective. The way we chose the subset $S$ was by observing, empirically, where the expression $\min_{0 \le \gamma,\tau \le 1} f(\gamma,\tau)$ attains its minimum. (We did this by computing $\min_{0 \le i,j \le n} f(x_i, y_j)$ for several small values of $n$, and seeing at which points $(x_i, y_j)$ in the unit square the minimum was attained.) We then chose several of these points to put into $S$; in the end, our set $S$ contained 7 points. The reason for minimizing over a small set $S$ (instead of a discretization of the entire unit square) is that this led to a substantially smaller LP while increasing the objective value by only a negligible amount. The full LP is very computationally expensive to solve, so this type of heuristic was necessary for obtaining an upper bound in a reasonable amount of computation time.

- Replace $\min_{\theta \le \gamma}$ by $\min_{\theta \in \{0, \gamma\}}$. Again, this can only increase the objective. We opted to take a minimum over just the two values $\{0, \gamma\}$, instead of a discretization of the interval $[0, \gamma]$, because this led to a substantially smaller LP while increasing the objective value by only a negligible amount. Empirically, we noticed that for the function $g$ returned by the LP, the $\min_{\theta \le \gamma}$ was usually attained at $\theta = 0$ or $\theta = \gamma$.

- Replace the integral
$$\int_0^{\tau} \min_{\theta \in \{0,\gamma\}}\left\{(1 - g(\theta,y)) + \int_0^{\theta} g(x,y)\,dx + \int_{\theta}^{\gamma} g(x,\tau)\,dx\right\} dy$$
with a trapezoidal sum. That is, if $h(y)$ is the integrand, then we replace the integral $\int_0^{\tau} h(y)\,dy$ with $\frac{1}{n}\sum_{j=0}^{\ell-1}\frac{h(y_{j+1}) + h(y_j)}{2}$, where $y_\ell = \tau$. This by itself may not result in an upper bound; in particular, it is not clear whether $h(y)$ is monotone in $y$. However, it cannot be too far from an upper bound, and we can quantify this using a Lipschitz argument. Indeed, note that $h(y)$ is 1-Lipschitz in $y$.
(The argument for this is the same as one used in the proof of Lemma 3, so we do not repeat it here.) Hence, by Lemma 4 in Appendix A, we have
$$\int_{y_j}^{y_{j+1}} h(y)\,dy \le \frac{h(y_{j+1}) + h(y_j)}{2n} + \frac{1}{4n^2}.$$
(Equality is attained when $h(y_j) = h(y_{j+1})$, and $h$ is linear with slope $1$ on the interval $[y_j, y_j + \frac{1}{2n}]$ and linear with slope $-1$ on $[y_j + \frac{1}{2n}, y_{j+1}]$.) Therefore,
$$\int_0^{y_\ell} h(y)\,dy \le \frac{1}{n}\sum_{j=0}^{\ell-1}\frac{h(y_{j+1}) + h(y_j)}{2} + \frac{\ell}{4n^2} \le \frac{1}{n}\sum_{j=0}^{\ell-1}\frac{h(y_{j+1}) + h(y_j)}{2} + \frac{1}{4n}.$$
Thus, we can replace the integral $\int_0^{y_\ell} h(y)\,dy$ with the sum $\frac{1}{n}\sum_{j=0}^{\ell-1}\frac{h(y_{j+1}) + h(y_j)}{2} + \frac{1}{4n}$, and this will be a valid upper bound.

- Replace the three other integrals (which are over $x$) with right Riemann sums over the discretized grid. Since those three integrals all have a monotone increasing integrand (as $g(x,y)$ is increasing in $x$), replacing them with right sums can only increase the objective. For example, the integral $\int_0^{x_k} g(x,\tau)\,dx$ is replaced by the sum $\frac{1}{n}\sum_{i=1}^{k} g(x_i, \tau)$.

Putting together.
Putting everything together, the upper bound LP is as follows:
$$\max\ t \quad \text{s.t.} \quad t \le (1 - x_k)(1 - y_\ell) + (1 - y_\ell)\cdot\frac{1}{n}\sum_{i=1}^{k} g(x_i, y_\ell) + \frac{1}{n}\sum_{j=0}^{\ell-1}\frac{\tilde h(y_{j+1}) + \tilde h(y_j)}{2} + \frac{1}{4n} \quad \forall\, k, \ell \in [n],$$
and $g$ satisfies the constraints described above. In the above, $\tilde h$ is the function $h$ with the integrals replaced by right Riemann sums:
$$\tilde h(y_j) := \min_{x_i \in \{0, x_k\}}\left\{1 - g(x_i, y_j) + \frac{1}{n}\sum_{d=1}^{i} g(x_d, y_j) + \frac{1}{n}\sum_{d=i+1}^{k} g(x_d, y_\ell)\right\}.$$
We solved the LP with $n = 210$, and obtained an objective value of 0.6688. This shows that any function $g$ satisfying conditions 1–5 has value at most 0.6688 when plugged into the bound in Theorem 1.

[Figure 6: Side-by-side comparison of the function $g$ we used and the function $g$ used by Huang et al. (a) Contour plot of the function values obtained from solving the LP in Section 4, on a $50 \times 50$ grid. (b) Contour plot of $g(x,y) = \frac{1}{2}(h(x) + 1 - h(y))$, where $h(x) = \min(1, e^x)$; this is the function used by Huang et al.]

Figure 6 compares the contour plot of the function $g$ we used to the function $g(x,y) = \frac{1}{2}(h(x) + 1 - h(y))$ used by Huang et al. (Here, $h(x) = \min(1, e^x)$.) The plots look qualitatively quite different. One interesting question is to try to extrapolate a simple function $g$ from the LP contour plot, such that when $g$ is plugged into the bound in Theorem 1, it improves upon the competitive ratio of Huang et al. Visually, the LP contour plot suggests trying a piecewise-linear $g$ with two pieces, where one piece depends only on $y$. However, we can prove that no function in this class can improve upon the competitive ratio of Huang et al.

As we have noted, because of our upper bound, the value of the competitive ratio cannot be improved by much without either weakening the assumptions on $g$ that we use in Theorem 1 or improving the analysis of Huang et al. [HTWZ19] that we use. Huang et al.
give a potentially stronger bound on the competitiveratio in the conclusion of their paper. However, it was unclear to us how to express their stronger bound asa linear program on discretized values of g .As discussed in Section 7, in order to compute the upper bound, we significantly relaxed the points at whichthe minimums of the function f ( γ, τ ) are taken, yet this did not change the value of the LP by much. Itseems possible that the Huang et al. analysis can be simplified to reflect this fact.It would also be interesting to derive an improved upper bound for this problem. To our knowledge, thebest upper bound is the same as for the unweighted online bipartite matching problem with random arrivals,which is 0.823, and is due to Manshadi, Oveis Gharan, and Saberi [MOS12]. References [AGKM11] Gagan Aggarwal, Gagan Goel, Chinmay Karande, and Aranyak Mehta. Online vertex-weightedbipartite matching and single-bid budgeted allocations. In
Proceedings of the Twenty-SecondAnnual ACM-SIAM Symposium on Discrete Algorithms , SODA ’11, page 1253–1264, USA, 2011.Society for Industrial and Applied Mathematics.[BM08] Benjamin Birnbaum and Claire Mathieu. On-line bipartite matching made simple.
SIGACTNews , 39:81–87, 2008.[DJK13] Nikhil R. Devanur, Kamal Jain, and Robert D. Kleinberg. Randomized primal-dual analysis ofranking for online bipartite matching.
Proceedings of the 24th Annual ACM-SIAM Symposiumon Discrete Algorithms , Jun 2013. 21GM08] Gagan Goel and Aranyak Mehta. Online budgeted matching in random input models withapplications to adwords. In
Proceedings of the Nineteenth Annual ACM-SIAM Symposium onDiscrete Algorithms , SODA ’08, page 982–991, USA, 2008. Society for Industrial and AppliedMathematics.[HTWZ19] Zhiyi Huang, Zhihao Gavin Tang, Xiaowei Wu, and Yuhao Zhang. Online vertex-weightedbipartite matching: Beating 1 − /e with random arrivals. ACM Transactions on Algorithms ,15(3), June 2019.[KMT11] Chinmay Karande, Aranyak Mehta, and Pushkar Tripathi. Online bipartite matching withunknown distributions. In
Proceedings of the Forty-Third Annual ACM Symposium on Theoryof Computing , STOC ’11, page 587–596, 2011.[KVV90] R. M. Karp, U. V. Vazirani, and V. V. Vazirani. An optimal algorithm for on-line bipartitematching.
Proceedings of the 22nd Annual ACM Symposium on Theory of Computing - STOC90 , 1990.[MOS12] Vahideh H. Manshadi, Shayan Oveis Gharan, and Amin Saberi. Online stochastic matching:Online actions based on offline statistics.
Mathematics of Operations Research , 37:559–573,2012.[MY11] Mohammad Mahdian and Qiqi Yan. Online bipartite matching with random arrivals.
Proceedingsof the 43rd Annual ACM Symposium on Theory of Computing - STOC 11 , 2011.[Sch03] Alexander Schrijver.
Combinatorial Optimization: Polyhedra and Efficiency . Springer, 2003.
A Approximating the Integral of a Lipschitz function
In this section, we show that integrals of Lipschitz functions are well-approximated by trapezoidal sums. The lemma below is used twice in our previous proofs: once in the proof of Lemma 3, and once in the proof of Theorem 2.
Lemma 4.
Suppose $f : \mathbb{R} \to \mathbb{R}$ is $L$-Lipschitz. Let $n$ be a positive integer, and let $x_k = \frac{k}{n}$ for all integers $k$. Then for all $i < j$, we have
$$\left|\int_{x_i}^{x_j} f(x)\,dx - \frac{1}{2n}\sum_{k=i}^{j-1}\big(f(x_k) + f(x_{k+1})\big)\right| \le \frac{L(j-i)}{4n^2}.$$

Proof.
It suffices to show that
$$\left|\int_{x_k}^{x_{k+1}} f(x)\,dx - \frac{f(x_k) + f(x_{k+1})}{2n}\right| \le \frac{L}{4n^2}.$$
The statement in the lemma then follows from summing over $k$ and applying the triangle inequality. For clarity of exposition, we will prove the following simpler claim. The proof of this claim generalizes easily to the more general statement above.

Claim 7.
Whenever $f$ is 1-Lipschitz,
$$\int_0^1 f(x)\,dx - \frac{f(0) + f(1)}{2} \le \frac{1}{4}.$$

Proof.
Define $g_1(x)$ to be the linear function through $(0, f(0))$ with slope $1$, and define $g_2(x)$ to be the linear function passing through $(1, f(1))$ with slope $-1$. Since $f$ is 1-Lipschitz, it follows that $f(x) \le \min\{g_1(x), g_2(x)\}$ on the interval $[0,1]$, so $\int_0^1 f(x)\,dx \le \int_0^1 \min\{g_1(x), g_2(x)\}\,dx$. (For an illustration, see Figure 7.)

[Figure 7: The 1-Lipschitz function $f$, together with its two upper bounds $g_1$ and $g_2$.]

It is a straightforward exercise to calculate that
$$\int_0^1 \min\{g_1(x), g_2(x)\}\,dx - \frac{f(0) + f(1)}{2} = \frac{1}{4}\cdot\big(1 - (f(1) - f(0))^2\big) \le \frac{1}{4},$$
as claimed.
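As a quick numerical sanity check of Lemma 4, the following Python sketch compares the trapezoidal sum against an exactly integrable 1-Lipschitz function; the function and parameters are illustrative choices, not taken from the paper:

```python
# Check (illustration only): for a 1-Lipschitz f, the trapezoidal sum over
# [x_i, x_j] is within L*(j - i)/(4*n^2) of the integral (Lemma 4). We use
# f(x) = |x - 0.37|, whose kink stresses the bound; its exact integral over
# [0, 1] is (0.37^2 + 0.63^2) / 2.
def trapezoidal_sum(f, n, i, j):
    return sum(f(k / n) + f((k + 1) / n) for k in range(i, j)) / (2 * n)

f = lambda x: abs(x - 0.37)          # 1-Lipschitz (L = 1)
n, i, j = 50, 0, 50
exact = (0.37 ** 2 + 0.63 ** 2) / 2
error = abs(trapezoidal_sum(f, n, i, j) - exact)
bound = 1 * (j - i) / (4 * n ** 2)   # Lemma 4 with L = 1
```

Here the only approximation error comes from the single subinterval containing the kink, so the observed error sits well below the worst-case bound, consistent with the tightness discussion in the proof of Claim 7.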