A Converse to Banach's Fixed Point Theorem and its CLS Completeness
Constantinos Daskalakis, Christos Tzamos, Manolis Zampetakis
Constantinos Daskalakis, EECS and CSAIL, [email protected]
Christos Tzamos, EECS and CSAIL, [email protected]
Manolis Zampetakis, EECS and CSAIL, [email protected]
Abstract
Banach's fixed point theorem for contraction maps has been widely used to analyze the convergence of iterative methods in non-convex problems. It is a common experience, however, that iterative maps fail to be globally contracting under the natural metric in their domain, making the applicability of Banach's theorem limited. We explore how generally we can apply Banach's fixed point theorem to establish the convergence of iterative methods when pairing it with carefully designed metrics.

Our first result is a strong converse of Banach's theorem, showing that it is a universal analysis tool for establishing global convergence of iterative methods to unique fixed points, and for bounding their convergence rate. In other words, we show that, whenever an iterative map globally converges to a unique fixed point, there exists a metric under which the iterative map is contracting and which can be used to bound the number of iterations until convergence. We illustrate our approach on the widely used power method, providing a new way of bounding its convergence rate through contraction arguments.

We next consider the computational complexity of Banach's fixed point theorem. Making the proof of our converse theorem constructive, we show that computing a fixed point whose existence is guaranteed by Banach's fixed point theorem is CLS-complete. We thus provide the first natural complete problem for the class CLS, which was defined in [DP11] to capture the complexity of problems such as P-matrix LCP, computing KKT-points, and finding mixed Nash equilibria in congestion and network coordination games.
Introduction
Several widely used computational methods are fixed point iteration methods. These include gradient descent, the power iteration method, alternating optimization, the expectation-maximization algorithm, k-means clustering, and others. In several important applications, we have theoretical guarantees for the convergence of these methods. For example, convergence to a unique solution can be guaranteed when the method is explicitly, or can be related to, gradient descent on a convex function [BTN01, Nes13, BV04]. More broadly, convergence to a stationary point can be guaranteed when the method is, or can be related to, gradient descent; for some interesting recent work on the limit points of gradient descent, see [PP16, LSJR16] and their references.

Another, more general, style of analysis for proving convergence of fixed point iteration methods is via a potential (a.k.a. Lyapunov) function. For example, analyzing the power iteration method amounts to showing that, as time progresses, the unit vector maintained by the algorithm places more and more of its ℓ_2 energy on the principal eigenvector of the matrix used in the iteration, if it is unique, or, in any case, on the eigenspace spanned by the principal eigenvectors; see Appendix A.2 for a refresher. In passing, it should also be noted that the power iteration method itself is commonly used as a tool for establishing the convergence of other fixed point iteration methods, such as alternating optimization; e.g. [Har14].

Ultimately, all fixed point iteration methods aim at converging to a fixed point of their iteration map. For global convergence to a unique solution, it should also be the case that the fixed point of the iteration map is unique. It is, thus, unsurprising that another widely used approach for establishing convergence of these methods is by appealing to Banach's fixed point theorem.
To recall, consider an iteration map x_{t+1} ← f(x_t), where f : D → D, and suppose that there is a distance metric d such that (D, d) is a complete metric space and f is contracting with respect to d, i.e. for some constant c < 1, d(f(x), f(y)) ≤ c · d(x, y), for all x, y ∈ D. Under this condition, Banach's fixed point theorem guarantees that there is a unique fixed point x* = f(x*). Moreover, iterating f is bound to converge to x*. Specifically, the t-fold composition, f^[t], of f with itself satisfies: d(f^[t](x_0), x*) ≤ c^t d(x_0, x*), for any starting point x_0.

Given Banach's theorem, if you established that your iteration method is contracting under some distance metric d, you would also have immediately proven that your method converges and that it may only converge to a unique point. Moreover, you can predict how many steps you need from any starting point x_0 to reach an approximate fixed point x satisfying d(f(x), x) < ε for some accuracy ε. Alas, several widely used fixed point iteration methods are not generally contracting, or only contracting in a small neighborhood around their fixed points and not the entire domain where they are defined. At least, this is typically the case for the metric d under which approximate fixed points, d(f(x), x) < ε, are sought. There is also quite an important reason why they may not be contracting: several of these methods may in fact have multiple fixed points.

Given the above motivation, our goal in this paper is to investigate the extent to which Banach's fixed point theorem is a universal analysis tool for establishing that a fixed point iteration method both converges and globally converges to a unique fixed point.
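To make the step-count prediction concrete, here is a small self-contained illustration (our own toy example, not from the paper): cos is a contraction on [0, 1] with constant c = sin(1) ≈ 0.84, so the number of iterations needed to reach a given accuracy can be computed before running the iteration.

```python
import math

c = math.sin(1.0)   # contraction constant of cos on [0, 1], since |cos'(x)| = |sin(x)| <= sin(1) < 1
f = math.cos
eps = 1e-6

x0 = 0.1
x1 = f(x0)

# Predicted number of steps until d(x, f(x)) < eps, via t = log_{1/c}(d(x0, x1) / eps):
t = math.ceil(math.log(abs(x1 - x0) / eps) / math.log(1 / c))

x = x0
for _ in range(t):
    x = f(x)
assert abs(f(x) - x) < eps   # an approximate fixed point, reached within the predicted budget
```

The point of the sketch is only that the contraction constant alone, known in advance, bounds the running time of the iteration.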
More precisely, our question is the following: if an iterative map x_{t+1} ← f(x_t) for some f : D → D converges to a unique fixed point x* from any starting point, is there always a way to prove this using Banach's fixed point theorem? Additionally, can we always use Banach's fixed point theorem to compute how many iterations we would need to find an approximate fixed point x of f satisfying d(x, f(x)) < ε, for some distance metric d and accuracy ε > 0? (Indeed, when f is c-contracting with respect to d, it can easily be shown that d(f^[t+1](x_0), f^[t](x_0)) ≤ c^t d(x_0, x_1), so t = log_{1/c}(d(x_0, x_1)/ε) steps suffice.)

We study this question from both a mathematical and a computational perspective. On the mathematical side, we show a strong converse of Banach's fixed point theorem, saying the following: given an iterative map x_{t+1} ← f(x_t) for some f : D → D, some distance metric d on D such that (D, d) is a complete and proper metric space, and some accuracy ε > 0, if f has a unique fixed point that the f-iteration converges to from any starting point, then for any constant c ∈ (0, 1), there exists a distance metric d_c on D such that:

1. d_c certifies uniqueness of the fixed point and convergence to it, by satisfying d_c(f(x), f(y)) ≤ c · d_c(x, y), for all x, y ∈ D;

2. d_c allows an analyst to predict how many iterations of f would suffice to arrive at an approximate fixed point x satisfying d(x, f(x)) < ε; notice in particular that we are interested in finding an approximate fixed point with respect to the original distance metric d (and not the constructed one d_c).

Our converse theorem is formally stated as Theorem 1 in Section 3. In the same section we discuss its relationship to other converses of Banach's theorem known in the literature, in particular Bessaga's and Meyers's converse theorems.
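As a preview of what such a constructed metric can look like, consider a toy example of our own (not from the paper): f(x) = x/(1+x) on [0, 10] converges to its unique fixed point 0 from every starting point, yet it is not a c-contraction under |x − y| for any c < 1, since the contraction ratio approaches 1 near 0. Still, for any target constant c ∈ (0, 1), the metric d_c(x, y) = |c^{1/x} − c^{1/y}| (reading c^{1/0} as 0) makes f an exact c-contraction, because 1/f(x) = 1/x + 1:

```python
c = 0.5                          # target contraction constant, any value in (0, 1)
f = lambda x: x / (1 + x)        # globally convergent to 0 on [0, 10]

psi = lambda x: 0.0 if x == 0 else c ** (1.0 / x)
d_c = lambda x, y: abs(psi(x) - psi(y))   # crafted metric: psi(f(x)) = c * psi(x)

# Under |x - y| the map is nearly an isometry close to 0 ...
x, y = 0.001, 0.002
assert abs(f(x) - f(y)) > 0.99 * abs(x - y)

# ... but under d_c it contracts by exactly c everywhere:
for x, y in [(0.001, 0.002), (0.5, 3.0), (1e-6, 10.0)]:
    assert abs(d_c(f(x), f(y)) - c * d_c(x, y)) < 1e-12
```

The crafted metric here certifies both convergence and rate even though the natural metric fails; this is exactly the kind of witnessing metric whose general existence our converse theorem establishes.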
The improvement over these converses is that our constructed metric d_c is such that it allows us to bound the number of steps required to reach an approximate fixed point according to the metric of interest d, and not just d_c; namely, Property 2 above holds. We discuss this further in Section 3.3. Section 3.2 provides a sketch of the proof, and the complete details can be found in Appendix B.

While the proof of Theorem 1 is non-constructive, it does imply that Banach's fixed point theorem is a universal analysis tool for establishing global convergence of fixed point iteration methods to unique solutions. Namely, it implies that one can always find a witnessing metric. We illustrate this by studying an important such method: power iteration. The power iteration method is a widely-used and well-understood method for computing the eigenvalues and eigenvectors of a matrix. It is well known that if a matrix A has a unique principal eigenvector, then the power method starting from a vector non-perpendicular to the principal eigenvector will converge to it. This is shown using a potential function argument, outlined above and in Appendix A.2, which also pins down the rate of convergence.

Our converse to Banach's theorem guarantees that, besides the potential function argument, there must also exist a distance metric under which the power iteration is a contraction map. Such a distance metric is not obvious, as contraction under any ℓ_p-norm fails; we provide counterexamples in Section 4. To illustrate our theorem, we identify a new distance metric under which the power method is indeed contracting at the optimal rate. See Proposition 1.
Our distance metric serves as an alternative proof for establishing that the power iteration converges and for pinning down its convergence rate.

We close the circle by studying Banach's fixed point theorem from a computational standpoint. Recent work of Daskalakis and Papadimitriou [DP11] has identified a complexity class, CLS, where finding a Banach fixed point lies. CLS, defined formally in Section 5, is a complexity class at the intersection of PLS [JPY88] and PPAD [Pap94]. Roughly speaking, PLS contains total problems whose existence of solutions is guaranteed by a potential function argument, while PPAD contains total problems whose existence of solutions is guaranteed by Brouwer's fixed point theorem. Lots of interesting work has been done on both classes in the past two decades; for a small sample see e.g. [DGP09, CDT09, Rub16, FPT04, SV08, ER17, ABPW17] and their references. CLS, lying in the intersection of PLS and PPAD, contains computational problems whose existence of solutions is guaranteed by both a potential function and a fixed point argument. More precisely, it contains all problems reducible to Continuous LocalOpt, the problem through which the class was defined; note that CLS, defined in Section 5 via Continuous LocalOpt, doesn't necessarily capture the whole intersection of PPAD and PLS. By making our converse to Banach's fixed point theorem constructive, we show that finding a Banach fixed point is CLS-complete. More precisely, in Section 5 we define the problem Banach, whose input is a continuous function f and a continuous metric d, and whose goal is to either output an approximate fixed point of f or a violation of the contraction of f with respect to d. In Theorem 2 we show that Banach is CLS-complete.

Further Related Work. We note that, contemporaneously and independently from our work, Fearnley et al. [FGMS17] have also identified a CLS-complete problem related to Banach's fixed point theorem. Their problem, called MetametricContraction, takes as input a function f and a metametric d, and asks to find an approximate fixed point of f, or a violation of the contraction of f with respect to d. In comparison to our CLS-completeness results, the CLS-hardness of Banach in our paper is stronger than that of MetametricContraction, as the input to Banach is a metric. On the other hand, the containment of MetametricContraction in CLS is stronger than the containment of Banach, as Banach is polynomial-time reducible to MetametricContraction.

Basic Notation.
We use R_+ to refer to the set of non-negative real numbers, and N to refer to the set of natural numbers except 0. We call a function f a selfmap if it maps a domain D to itself, i.e. f : D → D. For a selfmap f, we use f^[n] to refer to the n-fold composition of f with itself, i.e. f(f(... f(·))) with n applications of f. We use ||·||_p to refer to the ℓ_p norm of a vector in R^n. We use D/∼ to refer to the set of equivalence classes of the equivalence relation ∼ on a set D. Finally, we use S* to refer to the Kleene star of a set S.

A real-valued function g : D × D → R is called symmetric if g(x, y) = g(y, x) and anti-symmetric if g(x, y) = −g(y, x).

In Appendix A.1, the reader can find some well-known definitions that we are going to use in the rest of the paper. More precisely, in the field of:

Topological Spaces, we define the notions of: topology, topological space, open sets, closed sets, interior of a set A, denoted Int(A), and closure of a set A, denoted Clos(A).

Metric Spaces, we define the notions of: distance metric, metric space, diameter, bounded metric space, continuous function, open and closed sets in a metric space, compact set, locally compact set.

It is worth pointing out that, while some problems in CLS (e.g. Banach fixed points, simple stochastic games) have unique solutions, most do not. Given that contraction maps have unique fixed points, the way we bypass this potential oxymoron is by accepting as solutions violations of contraction.
Definition 1.
Let D be a set and d : D × D → R a function with the following properties:

(i) d(x, y) ≥ 0 for all x, y ∈ D.
(ii) d(x, y) = 0 if and only if x = y.
(iii) d(x, y) = d(y, x) for all x, y ∈ D.
(iv) d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z ∈ D. This is called the triangle inequality.

Then we say that d is a metric on D, and (D, d) is a metric space.

Basic Iterative Procedure.
If a selfmap f has a fixed point and is continuous, we can define the sequence of points x_{n+1} = f(x_n), where the starting point x_0 can be picked arbitrarily. If (x_n) converges to a point x̄, then

lim_{n→∞} x_{n+1} = lim_{n→∞} f(x_n) ⇒ lim_{n→∞} x_{n+1} = f(lim_{n→∞} x_n) ⇒ x̄ = f(x̄).

This observation implies that a candidate procedure for computing a fixed point of a selfmap f is to iteratively apply the function f starting from an arbitrary point x_0. If this procedure converges, then the limit is a fixed point x* of f. We will refer to this method of computing fixed points as the Basic Iterative Procedure.

Arithmetic Circuits.
In Section 5 we work with functions from continuous domains to continuous domains, represented as arithmetic circuits. An arithmetic circuit is defined by a directed acyclic graph (DAG). The inputs to the circuit are the in-degree-0 nodes, and the outputs are the out-degree-0 nodes. Each non-input node is a gate from the set {+, −, ∗, max, min, >}, performing an operation on the outputs of its in-neighbors. The meaning of the ">" gate is >(x, y) = 1 if x > y and 0 otherwise. We also allow "output a rational constant" gates. These are gates without any inputs, which output a rational constant.

We start, in Section 3.1, with an overview of known converses to Banach's fixed point theorem. We also explain why these converses are not enough to prove that Banach's fixed point theorem is a universal tool for analyzing the convergence of iterative algorithms. Then, in Section 3.2, we prove a stronger converse theorem that demonstrates the universality of Banach's fixed point theorem for the analysis of iterative algorithms. Before beginning, we formally state Banach's Fixed Point Theorem. A useful survey of the applications of this theorem can be found in [Con14].
Banach’s Fixed Point Theorem.
Suppose d is a distance metric function such that (D, d) is a complete metric space, and suppose that f : D → D is a contraction map according to d, i.e.

d(f(x), f(y)) ≤ c · d(x, y), ∀ x, y, for some c < 1.   (1)

Then f has a unique fixed point x*, and the convergence rate of the Basic Iterative Procedure with respect to d is c. That is, d(f^[n](x_0), x*) < c^n · d(x_0, x*), for all x_0.

3.1 Known Converses to Banach's Fixed Point Theorem

The first known converse to Banach's fixed point theorem is the following [Bes59].
Bessaga’s Converse Theorem.
Let f be a map from D to itself, and suppose that f^[n] has a unique fixed point for every n ∈ N. Then, for every constant c ∈ (0, 1), there exists a distance metric d_c such that (D, d_c) is a complete metric space and f is a contraction map with respect to d_c with contraction constant c.

The implication of the above theorem is that, if we want to prove existence and uniqueness of fixed points of f^[n] for all n, then Banach's fixed point theorem is a universal way to do it. Moreover, there is a potential function of the form p(x) = d_c(x, f(x)), where d_c is a distance metric, that decreases under successive applications of f, and successive applications of f starting from any point x_0 are bound to converge to the unique fixed point of f.

Unfortunately, d_c cannot provide any information about the number of steps that the Basic Iterative Procedure needs before computing an approximate fixed point under some metric d of interest. The reason is that, after log_{1/c}(d_c(x_0, f(x_0))/ε) steps of the Basic Iterative Procedure, we only have d_c(x_n, f(x_n)) ≤ ε. However, d_c might not have any relation to d, hence an approximate fixed point under d_c may not be one for d. So Bessaga's theorem is not useful for bounding the running time of iterative methods for approximate fixed point computation.

Given the above discussion, it is reasonable to expect that a converse to Banach's theorem that is useful for bounding the running time of approximate fixed point computation methods should take into account, besides the function f and its domain D, the distance metric d under which we are interested in computing approximate fixed points. One step in this direction has already been made by Meyers [Mey67].

Meyers's Converse Theorem.
Let (D, d) be a complete metric space, where D is compact, and suppose that f : D → D is continuous with respect to d. Suppose further that f has a unique fixed point x*, that the Basic Iterative Procedure converges to x* from any starting point, and that there exists an open neighborhood U of x* such that f^[n](U) → {x*}. Then, for any c ∈ (0, 1), there exists a distance metric d_c, which is topologically equivalent to d, such that (D, d_c) is a complete metric space and f is a contraction map with respect to d_c with contraction constant c.

Compared to Bessaga's theorem, the improvement offered by Meyers's theorem is that, instead of the existence of an arbitrary metric, it proves the existence of a metric that is topologically equivalent to the metric d. However, this is still not enough to bound the number of steps needed by the Basic Iterative Procedure in order to arrive at a point x_n such that d(x_n, f(x_n)) ≤ ε. Our goal in the next section is to close this gap. We will also replace the compactness assumption with the assumption that (D, d) is proper, so that the converse holds for unbounded spaces.

The main technical idea behind our converse to Banach's fixed point theorem is to adapt the proof of Meyers's theorem to get a distance metric d_c with the property d_c(x, y) ≥ d(x, y) everywhere, except maybe for the region d(x, x*) ≤ ε. This implies that, if we guarantee that d_c(x_n, x*) ≤ ε, then d(x_n, x*) ≤ ε.

Theorem 1.
Suppose (D, d) is a complete, proper metric space, f : D → D is continuous with respect to d, and the following hold:

1. f has a unique fixed point x*;
2. for every x ∈ D, the sequence (f^[n](x)) converges to x* with respect to d; moreover, there exists an open neighborhood U of x* such that f^[n](U) → {x*}.

Then, for every c ∈ (0, 1) and ε > 0, there exists a distance metric function d_{c,ε} that is topologically equivalent to d and is such that (D, d_{c,ε}) is a complete metric space and:

∀ x, y ∈ D : d_{c,ε}(f(x), f(y)) ≤ c · d_{c,ε}(x, y);   (2a)

∀ x, y ∈ D : d_{c,ε}(x, y) ≤ ε ⇒ min{ d(x*, x), d(x*, y), d(x, y) } ≤ ε.   (2b)

Remark.
Notice that the continuity of f is a necessary assumption for the above statement to hold, as (2a) implies continuity, given that d_{c,ε} and d are topologically equivalent. Also, condition 2 of the theorem is implied by the existence of d_{c,ε}; moreover, its second part does not follow automatically even if f^[n](x) → x* for every x, since counterexamples exist. Therefore this assumption is also necessary for our theorem to hold.

The proof of our Theorem 1 adapts the construction of Meyers's proof to ensure that (2b) is satisfied. We give here a proof sketch, postponing the complete details to Appendix B, where we also repeat all the technical details proven by Meyers [Mey67].

Proof Sketch.
The construction of the metric d_{c,ε} is done in three steps:

I. Starting from the original metric d, a non-expanding closure of d is defined as the metric d_M(x, y) = sup_{i≥0} d(f^[i](x), f^[i](y)). This is topologically equivalent to d, but ensures that the images of any two points are at least as close in d_M as the original two points (the non-expanding property). Notice that, as d_M(x, y) ≥ d(x, y) for all points x, y ∈ D, if we ensure that Property (2b) holds with respect to d_M for the final constructed metric d_{c,ε}, it will also hold with respect to the original metric d.

II. Given d_M, the construction proceeds by defining a function ρ_{c,ε} which satisfies (2a). This function achieves contraction by a constant c < 1 by counting the number of steps required to reach an ε-ball close to the fixed point. While for the original proof of Meyers any such ε-ball suffices, in order to guarantee Property (2b) our proof requires a set S of points with small diameter with respect to d such that performing an iteration of f on any one of them results in a point still in the set S. We show that such a set always exists in Lemma 5 in Appendix B. This guarantees that ρ_{c,ε}(x, y) ≥ d_M(x, y) if max{d(x*, x), d(x*, y)} ≥ ε, and therefore Property (2b) is preserved. The function ρ_{c,ε} satisfies all required properties other than the triangle inequality, and thus is not a metric. However, it can be converted into one.

III. Given ρ_{c,ε}, we construct the sought-after metric d_{c,ε} by taking it equal to the ρ_{c,ε}-geodesic distance (the metric closure of ρ_{c,ε}). This directly converts ρ_{c,ε} into a metric. We show that Properties (2a) and (2b) are preserved under this operation; this is done in Lemma 9 and Lemma 10 in Appendix B.

Property (2b) of the metric output by Theorem 1 has some interesting corollaries that we would not be able to get using the known converses to Banach's theorem discussed in Section 3.1.
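Step I of the construction above can be illustrated numerically. The sketch below is our own illustration, with the supremum truncated at a finite number of iterations (a reasonable proxy when the iterates converge quickly); it checks the non-expanding property of d_M on a simple selfmap of [0, 1]:

```python
def d_M(f, d, x, y, iters=100):
    # Truncated version of the non-expanding closure
    # d_M(x, y) = sup_{i >= 0} d(f^[i](x), f^[i](y)).
    best = d(x, y)
    for _ in range(iters):
        x, y = f(x), f(y)
        best = max(best, d(x, y))
    return best

f = lambda x: x * x              # selfmap of [0, 1]; expands distances near 1
d = lambda x, y: abs(x - y)

x, y = 0.9, 0.95
assert d(f(x), f(y)) > d(x, y)                            # d expands under f here
assert d_M(f, d, f(x), f(y)) <= d_M(f, d, x, y) + 1e-12   # d_M never does
```

The closure absorbs any future expansion into the distance itself, which is exactly what makes the subsequent contraction argument of Steps II and III possible.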
The first one is that we can now compute, from d_{c,ε}, the number of iterations needed in order to get to within ε of the fixed point x* of f from any starting point x_0 ∈ D.

Corollary 1.
Under the assumptions of Theorem 1, starting from a point x_0 ∈ D, and for any constant c ∈ (0, 1), the Basic Iterative Procedure finds a point x such that d(x, x*) ≤ ε after

( log(d_{c,ε/2}(x_0, f(x_0))) + log((2 − c)/ε) ) / log(1/c)

iterations, where d_{c,ε/2} is the metric guaranteed by Theorem 1.

In Corollary 1, for any given ε of interest, we have to identify a different distance metric d_{c,ε/2}, guaranteed by Theorem 1, to bound the number of steps required by the Basic Iterative Procedure to get to within ε of the fixed point. Sometimes we are interested in the explicit tradeoff between the number of steps required to get to the proximity of the fixed point and the amount of proximity ε. To find such a tradeoff we have to make additional assumptions on f. A mild assumption that is commonly satisfied by iterative procedures for non-convex problems is that the Basic Iterative Procedure locally converges to the fixed point x*: that is, if x_0 is appropriately close to x*, then the Basic Iterative Procedure converges. A common way of proving local convergence is to prove that f is a contraction with respect to d locally, for x, y ∈ B̄(x*, ε). Theorem 1 provides a way to extend this local contraction property to the whole domain D and get an explicit closed form for the tradeoff between the number of steps and ε, as implied by the following result.

Corollary 2.
Under the assumptions of Theorem 1, and under the additional assumption that there exist 0 < c < 1 and δ > 0 such that d(f(x), f(y)) ≤ c · d(x, y) for all x, y ∈ B̄(x*, δ), starting from any point x_0 ∈ D, the Basic Iterative Procedure finds a point x such that d(x, x*) ≤ ε after

( log(d_{c,δ/2}(x_0, f(x_0))) + log(1/ε) + log(1 − c) + 1 ) / log(1/c) + 1

iterations, where d_{c,δ/2} is the metric guaranteed by Theorem 1.

The results of the previous section imply that Banach's fixed point theorem is a universal analysis tool for establishing global convergence of fixed point iteration methods to unique solutions. While the proof of Theorem 1 is non-constructive, it does imply that one can always find a witnessing metric under which the iterative map is contracting.

In this section, we illustrate this possibility by studying an important iterative method: the power iteration. The power iteration method is a widely-used and well-understood method for computing the eigenvalues and eigenvectors of a matrix. For a given matrix A, it is defined as:

x_{t+1} = A x_t / ||A x_t||_2

It is well known that if a matrix A has a unique principal eigenvector, then the power method starting from a vector non-perpendicular to the principal eigenvector will converge to it. This is shown using a potential function argument, presented in Appendix A.2, which also pins down the rate of convergence.

Our converse to Banach's theorem guarantees that, besides the potential function argument, there must also exist a distance metric under which the power iteration is a contraction map. To illustrate our theorem, we identify a new distance metric under which the power method is indeed contracting at the optimal rate. Such a distance metric is not obvious: as the following counterexample shows, contraction under any ℓ_p-norm fails.

Counterexamples for ||·||_p. We show a counterexample for the ℓ_2 norm which directly extends to any ℓ_p norm.
In particular, let n = 2, λ_1 = 2, λ_2 = 1, and let the corresponding eigenvectors be e_1 = (1, 0) and e_2 = (0, 1). The power iteration is given by f(x) = (2x_1, x_2)/√(4x_1² + x_2²). We set x = (1/√5, 2/√5) and y = (1/√10, 3/√10). We get that

||f(x) − f(y)||_2 = || (1/√2, 1/√2) − (2/√13, 3/√13) ||_2 ≥ 0.19.

Also,

||x − y||_2 = || (1/√5, 2/√5) − (1/√10, 3/√10) ||_2 ≤ 0.15,

and therefore ||f(x) − f(y)||_2 > ||x − y||_2.

Even though contraction is not achieved under any ℓ_p-norm, it is possible to construct a metric under which power iteration is contracting, even at the optimal rate, which is given by the ratio of the two largest eigenvalues of the matrix A. Our next theorem constructs such a metric.

Proposition 1.
Let A ∈ R^{n×n} be a matrix with left eigenvector-eigenvalue pairs (λ_1, v_1), ..., (λ_n, v_n) such that λ_1 > λ_2 ≥ ... ≥ λ_n. Then the power iteration, x_{t+1} = f(x_t) = A x_t / ||A x_t||_2, is contracting under the metric

d(x, y) = || x/⟨x, v_1⟩ − y/⟨y, v_1⟩ ||_2

with contraction constant λ_2/λ_1, i.e. for all x, y ∈ R^n:

d(f(x), f(y)) ≤ (λ_2/λ_1) · d(x, y).

Moreover, t = log(d(x_0, v_1)/ε) / log(λ_1/λ_2) iterations suffice to have ||x_t − v_1||_2 ≤ d(x_t, v_1) ≤ ε.

Proof. For any vector x, it holds that ⟨Ax, v_1⟩ = λ_1 ⟨x, v_1⟩. We have that

d(f(x), f(y)) = || Ax/⟨Ax, v_1⟩ − Ay/⟨Ay, v_1⟩ ||_2 = (1/λ_1) || A( x/⟨x, v_1⟩ − y/⟨y, v_1⟩ ) ||_2 ≤ (λ_2/λ_1) || x/⟨x, v_1⟩ − y/⟨y, v_1⟩ ||_2 = (λ_2/λ_1) d(x, y),

where the inequality is true as the vector x/⟨x, v_1⟩ − y/⟨y, v_1⟩ is perpendicular to the principal eigenvector v_1. This shows that f is contracting with respect to d, as required.

To convert a bound on the d metric to a bound on the error with respect to the ℓ_2 norm, we observe that at every step t > 0, ||x_t||_2 = 1. If at some step t > 0 it holds that d(x_t, v_1) ≤ ε, we get

ε ≥ d(x_t, v_1) = || x_t/⟨x_t, v_1⟩ − v_1 ||_2 = (⟨x_t, v_1⟩^{−2} − 1)^{1/2} ⇒ ⟨x_t, v_1⟩ ≥ (1 + ε²)^{−1/2}.

This implies that ||x_t − v_1||_2² = 2(1 − ⟨x_t, v_1⟩) ≤ 2(1 − (1 + ε²)^{−1/2}) ≤ ε². This guarantees that bounding the metric d by ε implies a bound of ε on the ℓ_2 distance between the principal eigenvector and the current iterate x_t.

Using these observations, and following the same approach as in Corollaries 1-2, we get the required bound on the number of iterations.

Notice that the definition of the metric in Proposition 1 depends on the principal eigenvector but not on any of the other eigenvectors.
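Both the failure of ℓ_2-contraction and the contraction under the metric of Proposition 1 are easy to check numerically on a small instance (our own toy diagonal matrix, assuming numpy is available):

```python
import numpy as np

A = np.diag([2.0, 1.0, 0.5])       # lambda1 = 2 > lambda2 = 1; v1 = e1
v1 = np.array([1.0, 0.0, 0.0])

def f(x):                          # one step of power iteration
    y = A @ x
    return y / np.linalg.norm(y)

def d(x, y):                       # the metric of Proposition 1
    return np.linalg.norm(x / (x @ v1) - y / (y @ v1))

# No contraction under the l2 norm: unit vectors near the second eigenvector expand.
x = np.array([0.1, np.sqrt(1 - 0.01), 0.0])
y = np.array([0.2, np.sqrt(1 - 0.04), 0.0])
assert np.linalg.norm(f(x) - f(y)) > np.linalg.norm(x - y)

# Contraction at rate lambda2 / lambda1 = 1/2 under d, on random unit vectors.
rng = np.random.default_rng(0)
for _ in range(200):
    x, y = rng.normal(size=3), rng.normal(size=3)
    x, y = x / np.linalg.norm(x), y / np.linalg.norm(y)
    if min(abs(x @ v1), abs(y @ v1)) < 1e-3:
        continue                   # d is undefined on vectors perpendicular to v1
    assert d(f(x), f(y)) <= 0.5 * d(x, y) + 1e-9
```

The check works because normalization cancels inside d: f(x)/⟨f(x), v_1⟩ = Ax/⟨Ax, v_1⟩, so the contraction factor is exactly the one computed in the proof.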
When applied to show global convergence of Markov chains, the principal eigenvector corresponds to the stationary distribution. For a symmetric Markov chain, whose stationary distribution is uniform, Proposition 1 implies that the iterations are contracting directly with respect to the ℓ_2 norm.

Banach is Complete for CLS
As discussed in Section 1, the complexity class CLS was defined in [DP11] to capture problems in the intersection of PPAD and PLS, such as P-matrix LCP, mixed Nash equilibria of congestion and multi-player coordination games, finding KKT points, etc. It also contains computational variants of finding fixed points whose existence is guaranteed by Banach's fixed point theorem. In this section, we close the circle by proposing two variants of Banach fixed point computation that are both CLS-complete. Our CLS-completeness results are obtained by making our proof of Theorem 1 constructive. We start with a formal definition of CLS, which is defined in terms of the problem Continuous LocalOpt.

Definition 2.
Continuous LocalOpt takes as input two functions f : [0,1]³ → [0,1]³ and p : [0,1]³ → [0,1], both represented as arithmetic circuits, and two rational positive constants ε and λ. The desired output is any of the following:

(CO1) a point x ∈ [0,1]³ such that p(f(x)) ≥ p(x) − ε;
(CO2) two points x, x′ ∈ [0,1]³ violating the λ-Lipschitz continuity of f, i.e. ||f(x) − f(x′)|| > λ||x − x′||;
(CO3) two points x, x′ violating the λ-Lipschitz continuity of p, i.e. |p(x) − p(x′)| > λ||x − x′||.

The class CLS is the set of search problems that can be reduced to Continuous LocalOpt.

Remark 1.
As discussed in [DP11], both the choice of the domain [0,1]³ and the use of the Euclidean norm in the definition of the above problem are not crucial; higher-dimensional polytopes as well as other ℓ_p norms can also be used in the definition without any essential effect on the complexity of the problem. Moreover, instead of the functions f and p being provided in the input as arithmetic circuits, there is a canonical way to provide them in the input as binary circuits that define the values of f and p on all points of some finite bit complexity and (implicitly) extend to the full domain via continuous interpolation. In this way, we can syntactically guarantee the Lipschitz continuity of both f and p and can remove (CO2) and (CO3) from the above definition. For more details, please see [DP11], [DGP09] and [EY07]. This remark applies to all definitions in this section.

The variant of Banach's theorem that is known to belong to CLS is Contraction Map, defined as follows:
Definition 3 ([DP11]).
Contraction Map takes as input a function f : [0,1]³ → [0,1]³ represented as an arithmetic circuit and three rational positive constants ε, λ, and c < 1. The desired output is any of the following (where d denotes the Euclidean distance):

(Oa) a point x ∈ [0,1]³ such that d(x, f(x)) ≤ ε;
(Ob) two points x, x′ ∈ [0,1]³ disproving the contraction of f w.r.t. d with constant c, i.e. d(f(x), f(x′)) > c · d(x, x′);
(Oc) two points x, x′ ∈ [0,1]³ disproving the λ-Lipschitz continuity of f, i.e. ||f(x) − f(x′)|| > λ||x − x′||.

Contraction Map targets fixed points whose existence is guaranteed by Banach's fixed point theorem when f is a contraction map with respect to the Euclidean distance. However, it doesn't capture the full generality of Banach's theorem, since the latter can be applied to any complete metric space. We thus define a more general problem, Banach, that: (i) still lies inside CLS; (ii) captures the generality of Banach's theorem; and (iii) in fact tightly captures the complexity of the class CLS, by being CLS-complete. This problem is defined as follows:

Definition 4.
Banach takes as input two functions f : [0,1]^3 → [0,1]^3 and d : [0,1]^3 × [0,1]^3 → R represented as arithmetic circuits, where d is promised to be a metric that is topologically equivalent to the Euclidean distance and satisfies that ([0,1]^3, d) is a complete metric space, and three rational positive constants ε, λ, and c < 1. The desired output is any of the following:

(Oa) a point x ∈ [0,1]^3 such that d(x, f(x)) ≤ ε;

(Ob) two points x, x′ ∈ [0,1]^3 disproving the contraction of f w.r.t. d with constant c, i.e. d(f(x), f(x′)) > c · d(x, x′);

(Oc) two points x, x′ ∈ [0,1]^3 disproving the λ-Lipschitz continuity of f, i.e. |f(x) − f(x′)| > λ|x − x′|;

(Od) four points x_1, x_2, y_1, y_2 ∈ [0,1]^3 with x_1 ≠ x_2 and y_1 ≠ y_2 disproving the λ-Lipschitz continuity of d(·, ·), i.e. |d(x_1, x_2) − d(y_1, y_2)| > λ(|x_1 − y_1| + |x_2 − y_2|).

Remark 2.
We remark that
Banach is tightly related to
Contraction Map defined above, with the following differences. First, instead of the Euclidean distance, the metric with respect to which f is purportedly contracting is provided as part of the input, and it is promised to be a metric. Second, we need to add an extra type of accepted solution, (Od), which is a violation of the Lipschitz property of that metric. This is necessary to guarantee that the above problem has a solution of polynomial length for any possible input, and in particular is needed to place the above problem in CLS. (It is not needed for the CLS-hardness.) Our main result is the following:
Theorem 2.
Banach is CLS-complete.
We give here a sketch of the proof of Theorem 2; the full proof is presented in Appendix C.
Proof Sketch.
Since the inclusion to
CLS is a simple argument, very similar to the argument from [DP11] that shows that
Contraction Map belongs to
CLS, we focus here on the hardness proof. We are given two functions f : [0,1]^3 → [0,1]^3 and p : [0,1]^3 → [0,1], and we want to find a metric d : [0,1]^3 × [0,1]^3 → R such that f is a contraction map with respect to d and such that the points where p(f(x)) ≥ p(x) − ε are approximate fixed points of f with respect to d. The inspiration for this proof is to make the proof of Theorem 1 constructive in polynomial time. We therefore follow the steps of the proof sketch of Theorem 1 as presented in Section 3.

Step I.
Since we do not have the strong requirement of Theorem 1 to output a metric that is topologically equivalent to some given metric, we can use in place of d_M any metric d′ such that f is non-expanding with respect to d′. Hence we can easily observe that the discrete metric can be used as d_M.

Step II.
The construction of Theorem 1 uses, in the definition of d(x, y), the number of times n(x) that we have to apply f to x in order for f^[n(x)](x) to come ε-close to the fixed point x* of f. Of course, n(x) is not a quantity that can be computed in polynomial time. Instead, we show that it suffices to use an upper bound on n(x), which we can get from the potential function, namely p(x)/ε. Of course, the operations that we are allowed to use to describe d as an arithmetic circuit are limited, and this step appears to need more expressive power than the simple arithmetic operations that we are allowed to use. We give a careful construction that bypasses these difficulties and completes this step of the proof.

Step III.
This step of Theorem 1 is highly non-constructive, and hence we cannot hope to replicate it in polynomial time. But we prove that our carefully designed metric already satisfies the triangle inequality, and hence the transitive-closure step is not necessary.

The last part of our proof is to show that the constructed circuit of d is actually Lipschitz, with a relatively small Lipschitz constant, if the potential function p is Lipschitz. That is, we have to show that the circuit of d does not need exponentially many bits with respect to the size of the circuits of p and the magnitude of the constant 1/ε. Not surprisingly, we observe that in order to succeed in this task we have to set approximately c = 1 − ε. This is natural to expect, since if we could set a much lower contraction constant then we could find the approximate fixed point of f in much less than poly(1/ε) steps, which cannot hold unless CLS = FP.
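To make the last point concrete, here is a small numerical illustration (ours, not part of the proof): iterating a one-dimensional contraction with constant c = 1 − ε reaches an ε-ball around its fixed point after about log(1/ε)/log(1/c), i.e. roughly (1/ε)·log(1/ε), steps, which is poly(1/ε) as claimed.

```python
# Illustration (not from the paper): with contraction constant c = 1 - eps,
# Banach iteration needs on the order of log(1/eps)/log(1/c) steps, which is
# about (1/eps) * log(1/eps), i.e. poly(1/eps).
import math

def steps_to_converge(f, x0, fixed_point, eps):
    """Iterate f from x0 until the iterate is eps-close to the fixed point."""
    x, steps = x0, 0
    while abs(x - fixed_point) > eps:
        x = f(x)
        steps += 1
    return steps

eps = 0.01
c = 1 - eps                    # the contraction constant from the reduction
f = lambda x: c * x            # a 1-d c-contraction with fixed point 0
steps = steps_to_converge(f, 1.0, 0.0, eps)
bound = math.ceil(math.log(1 / eps) / math.log(1 / c))
print(steps, bound)            # the iteration count matches the bound
```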
Acknowledgements
The authors were supported by NSF awards CCF-1551875, CCF-1617730, CCF-1650733, and a Simons Graduate Research Fellowship.
References

[ABPW17] Omer Angel, Sébastien Bubeck, Yuval Peres, and Fan Wei. Local max-cut in smoothed polynomial time. In Symposium on Theory of Computing (STOC), 2017.

[Bes59] C. Bessaga. On the converse of Banach's "fixed-point principle". Colloquium Mathematicae, 7(1):41–43, 1959.

[BPR15] Nir Bitansky, Omer Paneth, and Alon Rosen. On the cryptographic hardness of finding a Nash equilibrium. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, pages 1480–1498. IEEE, 2015.

[BTN01] Aharon Ben-Tal and Arkadi Nemirovski. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. SIAM, 2001.

[BV04] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[CDT09] Xi Chen, Xiaotie Deng, and Shang-Hua Teng. Settling the complexity of computing two-player Nash equilibria. Journal of the ACM (JACM), 56(3):14, 2009.

[Con14] Keith Conrad. The contraction mapping theorem. Expository paper, University of Connecticut, Department of Mathematics, 2014.

[DGP09] Constantinos Daskalakis, Paul W. Goldberg, and Christos H. Papadimitriou. The complexity of computing a Nash equilibrium. SIAM Journal on Computing, 39(1):195–259, 2009.

[DP11] Constantinos Daskalakis and Christos H. Papadimitriou. Continuous local search. In Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2011, San Francisco, California, USA, January 23–25, 2011, pages 790–804, 2011.

[ER17] Michael Etscheid and Heiko Röglin. Smoothed analysis of local search for the maximum-cut problem. ACM Transactions on Algorithms (TALG), 2017.

[EY07] Kousha Etessami and Mihalis Yannakakis. On the complexity of Nash equilibria and other fixed points. In Foundations of Computer Science (FOCS). IEEE, 2007.

[FGMS17] John Fearnley, Spencer Gordon, Ruta Mehta, and Rahul Savani. CLS: New problems and completeness. arXiv preprint arXiv:1702.06017, 2017.

[FPT04] Alex Fabrikant, Christos Papadimitriou, and Kunal Talwar. The complexity of pure Nash equilibria. In Symposium on Theory of Computing (STOC). ACM, 2004.

[Har14] Moritz Hardt. Understanding alternating minimization for matrix completion. In Foundations of Computer Science (FOCS), 2014 IEEE 55th Annual Symposium on, pages 651–660. IEEE, 2014.

[HY17] Pavel Hubáček and Eylon Yogev. Hardness of continuous local search: Query complexity and cryptographic lower bounds. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1352–1371. SIAM, 2017.

[JPY88] David S. Johnson, Christos H. Papadimitriou, and Mihalis Yannakakis. How easy is local search? J. Comput. Syst. Sci., 37(1):79–100, 1988.

[KNY17] Ilan Komargodski, Moni Naor, and Eylon Yogev. White-box vs. black-box complexity of search problems: Ramsey and graph property testing. In Electronic Colloquium on Computational Complexity (ECCC), volume 24, page 15, 2017.

[Kör10] T. Körner. Metric and topological spaces, 2010.

[LSJR16] Jason D. Lee, Max Simchowitz, Michael I. Jordan, and Benjamin Recht. Gradient descent only converges to minimizers. In Proceedings of the 29th Conference on Learning Theory, COLT 2016, New York, USA, June 23–26, 2016, pages 1246–1257, 2016.

[Mey67] Philip R. Meyers. A converse to Banach's contraction theorem. Journal of Research of the National Bureau of Standards, Section B, 71B(2–3):73, 1967.

[Nes13] Yurii Nesterov. Introductory Lectures on Convex Optimization: A Basic Course, volume 87. Springer Science & Business Media, 2013.

[Pap94] Christos H. Papadimitriou. On the complexity of the parity argument and other inefficient proofs of existence. J. Comput. Syst. Sci., 48(3):498–532, 1994.

[PP16] Ioannis Panageas and Georgios Piliouras. Gradient descent converges to minimizers: The case of non-isolated critical points. CoRR, abs/1605.00405, 2016.

[RSS16] Alon Rosen, Gil Segev, and Ido Shahaf. Can PPAD hardness be based on standard cryptographic assumptions? In Electronic Colloquium on Computational Complexity (ECCC), volume 23, page 59, 2016.

[Rub16] Aviad Rubinstein. Settling the complexity of computing approximate two-player Nash equilibria. In Foundations of Computer Science (FOCS). IEEE, 2016.

[SV08] Alexander Skopalik and Berthold Vöcking. Inapproximability of pure Nash equilibria. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, pages 355–364. ACM, 2008.
A Preliminaries
A.1 Basic Definitions
Topological Spaces
Let D be a set and τ a collection of subsets of D with the following properties:

(a) The empty set ∅ ∈ τ and the whole space D ∈ τ.

(b) If U_a ∈ τ for all a ∈ A, then ⋃_{a∈A} U_a ∈ τ.

(c) If U_j ∈ τ for all 1 ≤ j ≤ n ∈ N, then ⋂_{j=1}^{n} U_j ∈ τ.

Then we say that τ is a topology on D and that (D, τ) is a topological space. We call the members of τ open sets. Also, a subset C of D is called closed if D \ C is an open set, i.e. belongs to τ. Let (D, τ) be a topological space and A a subset of D. We write

Int(A) = ⋃ { U ∈ τ | U ⊆ A },    Clos(A) = ⋂ { U closed | A ⊆ U },

and we call Clos(A) the closure of A and Int(A) the interior of A. We now give a basic lemma without proof; a proof can be found in [Kör10].

Lemma 1. (i)
Int(A) = { x ∈ A | ∃ U ∈ τ with x ∈ U ⊆ A }.

(ii) Clos(A) = { x ∈ D | ∀ U ∈ τ with x ∈ U, we have U ∩ A ≠ ∅ }.

Metric Spaces
The diameter of a set W ⊆ D according to the metric d is defined as diam_d[W] = max_{x,y ∈ W} d(x, y). A metric space (D, d) is called bounded if diam_d[D] is finite. We define d_S : D × D → R by d_S(x, y) = 1 if x ≠ y and d_S(x, y) = 0 if x = y; then d_S is called the discrete metric on D.

Remark.
It is very easy to see that the discrete metric is indeed a metric, i.e. it satisfies conditions (i)–(iv). Let (D, d) and (X, d′) be metric spaces. A function f : D → X is called continuous if, given x ∈ D and ε > 0, we can find a δ(x, ε) > 0 such that d′(f(x), f(y)) < ε whenever d(x, y) < δ(x, ε). We say that a subset E ⊆ D is open in D if, whenever e ∈ E, we can find a δ > 0 (depending on e) such that x ∈ E whenever d(x, e) < δ. The next lemma connects the definition of open sets according to some metric with the definition of open sets in a topological space.
Lemma 2. If (D, d) is a metric space, then the collection of open sets forms a topology.

We define the open ball of radius r around x to be B(x, r) = { y ∈ D | d(x, y) < r }.

Closed Sets for Metric Spaces Consider a sequence (x_n) in a metric space (D, d). If x ∈ D and, given ε > 0, we can find an integer N ∈ N (depending maybe on ε) such that d(x_n, x) < ε for all n ≥ N, then we say that x_n → x as n → ∞ and that x is the limit of the sequence (x_n). A set G ⊆ D is said to be closed if, whenever x_n ∈ G and x_n → x, then x ∈ G. A proof of the following lemma can be found in [Kör10].

Lemma 3.
Let (D, d) be a metric space and A a subset of D. Then Clos(A) consists of all those x ∈ D such that we can find (x_n) with x_n ∈ A and d(x_n, x) → 0.

We define the closed ball of radius r around x to be B̄(x, r) = { y ∈ D | d(x, y) ≤ r }. A subset G of a metric space (D, d) is called compact if G is closed and every sequence in G has a convergent subsequence. A metric space (D, d) is called compact if D is compact, locally compact if every x ∈ D has a neighborhood that is compact, and proper if every closed ball is compact.

Complete Metric Spaces
We say that a sequence (x_n) in D is a Cauchy sequence (or d-Cauchy sequence if the distance metric is not clear from the context) if, given ε > 0, we can find N(ε) ∈ N with d(x_n, x_m) < ε whenever n, m ≥ N(ε). A metric space (D, d) is complete if every Cauchy sequence converges. Two metrics d, d′ on the same set D are called topologically equivalent (or just equivalent) if, for every sequence (x_n) in D, (x_n) is a d-Cauchy sequence if and only if it is a d′-Cauchy sequence.

Definition 5.
Let (D, d) be a metric space. A function f : D → D is called continuous with respect to d if, given x ∈ D and ε > 0, we can find a δ(x, ε) > 0 such that d(f(x), f(y)) < ε whenever d(x, y) < δ(x, ε).

Lipschitz Continuity
Let (D, d) and (X, d′) be metric spaces. A function f : D → X is Lipschitz continuous (or (d, d′)-Lipschitz continuous if the distance metric is not clear from the context, or d-Lipschitz continuous if d = d′) if there exists a positive constant λ ∈ R_+ such that for all x, y ∈ D, d′(f(x), f(y)) ≤ λ d(x, y).

Lemma 4.
If a function f : D → X is Lipschitz continuous, then it is continuous.
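As a small illustration of these definitions (ours, not taken from the paper), a Lipschitz constant can be estimated empirically by sampling difference quotients: sin is 1-Lipschitz on R, while x ↦ x/2 is Lipschitz with constant 1/2.

```python
# A small numerical illustration (not from the paper): estimating a Lipschitz
# constant by sampling the ratio d'(f(x), f(y)) / d(x, y) over point pairs.
import math

def estimate_lipschitz(f, points):
    """Largest difference quotient |f(x) - f(y)| / |x - y| over sampled pairs."""
    return max(abs(f(x) - f(y)) / abs(x - y)
               for x in points for y in points if x != y)

pts = [i / 100 for i in range(101)]               # samples in [0, 1]
lam = estimate_lipschitz(math.sin, pts)           # sin is 1-Lipschitz
c_half = estimate_lipschitz(lambda x: x / 2, pts) # x/2 has constant 1/2
print(lam <= 1.0, abs(c_half - 0.5) < 1e-12)
```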
Definition 6.
Let (D, d) and (X, d′) be metric spaces. A function f : D → X is a contraction (or (d, d′)-contraction, or d-contraction if d = d′) if there exists a positive constant c < 1, c ∈ R_+, such that for all x, y ∈ D, d′(f(x), f(y)) ≤ c d(x, y). If c = 1, then we call f a non-expansion. A fixed point of a selfmap f is any point x* ∈ D such that f(x*) = x*.

A.2 Introduction to the Power Method

Let A ∈ R^{n×n}. Recall that if q is an eigenvector of A with eigenvalue λ, then Aq = λq and, in general, A^k q = λ^k q for all k ∈ N. This observation is the foundation of the power iteration method. Suppose that the set {q_i} of unit eigenvectors of A forms a basis of R^n, with corresponding real eigenvalues {λ_i} such that |λ_1| > |λ_2| > ··· > |λ_n|. Let v_0 be an arbitrary initial vector, not perpendicular to q_1, with ‖v_0‖ = 1. We can write v_0 as a linear combination of the eigenvectors of A: for some c_1, ..., c_n ∈ R we have

v_0 = c_1 q_1 + c_2 q_2 + ··· + c_n q_n,

and since we assumed that v_0 is not perpendicular to q_1 we have c_1 ≠ 0. Also

A v_0 = c_1 λ_1 q_1 + c_2 λ_2 q_2 + ··· + c_n λ_n q_n,

and therefore

A^k v_0 = c_1 λ_1^k q_1 + c_2 λ_2^k q_2 + ··· + c_n λ_n^k q_n = λ_1^k ( c_1 q_1 + c_2 (λ_2/λ_1)^k q_2 + ··· + c_n (λ_n/λ_1)^k q_n ).

Since the eigenvalues are assumed to be real, distinct, and ordered by decreasing magnitude, it follows that

lim_{k→∞} (λ_i/λ_1)^k = 0 for i ≥ 2.

So, as k increases, A^k v_0 approaches c_1 λ_1^k q_1, and thus for large values of k,

A^k v_0 / ‖A^k v_0‖ → q_1 as k → ∞.

The power iteration method is simple and elegant, but suffers some drawbacks. Except for a measure of initial conditions, the method returns a single eigenvector, corresponding to the eigenvalue of largest magnitude.
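The iteration just described can be sketched in a few lines (our illustration, not code from the paper); renormalizing at every step avoids overflow and directly produces the unit vector A^k v / ‖A^k v‖.

```python
# A minimal power iteration in plain Python for a small matrix (an
# illustration of the method described above, not code from the paper).
def power_iteration(A, v, num_iters=100):
    """Repeatedly apply A and renormalize; v converges to the top eigenvector."""
    for _ in range(num_iters):
        w = [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

A = [[2.0, 0.0],
     [0.0, 1.0]]                 # eigenvalues 2 and 1; top eigenvector (1, 0)
v = power_iteration(A, [0.6, 0.8])
print(v)                         # close to [1.0, 0.0]
```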
In addition, convergence is only guaranteed if the eigenvalues are distinct; in particular, the two eigenvalues of largest absolute value must have distinct magnitudes. The rate of convergence primarily depends on the ratio of these magnitudes, so if the two largest eigenvalues have similar sizes, then the convergence will be slow. In spite of its drawbacks, the power method is still used in many applications, since it works well on large, sparse matrices when only a single eigenvector is needed. However, there are other methods overcoming some of the issues of the power iteration method.

B Proof of Theorem 1
B.1 Meyers’s Construction and Our Contribution
The proof of Theorem 1 follows the construction of [Mey67]. We give a complete step-by-step description of this construction. For every step we explain what was already proven by Meyers, and we additionally prove some properties that are needed in order to satisfy our additional condition (2b). The construction of d_{c,ε} starts with an open neighborhood of x* with some desired properties. Meyers starts with an arbitrary open neighborhood W such that f(W) ⊂ W, whereas we also need that diam_d[W] ≤ ε.

Lemma 5. There exists an open neighborhood W of x* such that

f(W) ⊆ W    (3a)
diam_d[W] ≤ ε    (3b)

Proof.
From the hypothesis of the theorem there exists an open neighborhood U such that f^[n](U) → {x*}. This implies that any open subset V of U satisfies f^[n](V) → {x*}. Therefore we can choose V = Int(B̄(x*, ε)) such that diam_d[V] ≤ ε and f^[n](V) → {x*}. For simplicity of notation we just refer to V as U, and so diam_d[U] ≤ ε.

Starting from U we prove the existence of W. For this, we will prove that there exists an open neighborhood W of x* such that f(W) ⊂ W and W ⊂ U. The latter implies f^[n](W) → {x*} and diam_d[W] ≤ ε. Since f^[n](U) → {x*}, there is an integer k such that f^[k](U) ⊆ U. Let

W = ⋂_{j=0}^{k−1} f^[−j](U) ⊆ U.

Then for any x ∈ W and for any 1 ≤ j ≤ k − 1 it holds that x ∈ f^[−j](U) and thus f(x) ∈ f^[−(j−1)](U). Moreover x ∈ U, so that f^[k](x) ∈ f^[k](U) ⊂ U and thus f(x) ∈ f^[−(k−1)](U). Hence x ∈ W implies f(x) ∈ W, which was to be shown. The diameter of W can be bounded by the diameter of U and hence is less than ε.

We now proceed to the main line of the proof. The construction follows three steps:

I. We first construct a metric d_M, which is topologically equivalent to d and with respect to which f is non-expanding. It also holds that d_M(x, y) ≥ d(x, y) for all x, y ∈ D, and therefore Property (2b) can be transferred from d_M to d.

II. Given d_M, we proceed to construct a “distance” function ρ_{c,ε}, which satisfies (2a) and all the metric properties except possibly the triangle inequality. Moreover, ρ_{c,ε} satisfies ρ_{c,ε}(x, y) ≥ d_M(x, y) if max{d(x*, x), d(x*, y)} ≥ ε, and therefore (2b) is preserved.

III. Given ρ_{c,ε}, we construct the sought-after metric d_{c,ε} by taking it equal to the ρ_{c,ε}-geodesic distance. Given the properties of ρ_{c,ε} and the definition of d_{c,ε}, we can prove that d_{c,ε} is a metric and that Properties (2a) and (2b) hold.

I.
Construction of d_M In the first step of the construction we define a metric d_M as

d_M(x, y) = sup_{n∈N} { d(f^[n](x), f^[n](y)) }

and we show that f is non-expanding with respect to d_M.

Lemma 6 ([Mey67]). For the d_M function defined above we have that:
1. d_M is well defined and satisfies all the metric properties (see Definition 1).
2. d_M is topologically equivalent to d.

The proof of Lemma 6 can be found in Section B.2, where for completeness we keep the proofs of the statements already proved by Meyers [Mey67]. For our purposes we also observe that by the definition of d_M it holds that d_M(x, y) ≥ d(x, y); hence if d_M(x, y) ≤ ε then also d(x, y) ≤ ε, and therefore d_M satisfies (2b).

II. Construction of ρ_c We begin by defining K_n to be the closure of f^[n](W) for n ≥ 0; in particular, K_0 = Clos(W), and hence by Lemma 5 we have diam_d[K_0] ≤ ε. Also, we define K_{−n} = f^[−n](K_0), so that our assumption f^[n](W) → {x*} implies

K_n → {x*} as n → ∞.    (4)

For x ∈ K_0 \ {x*} we set n(x) = max{ n | x ∈ K_n } ≥ 0. The fact that n(x) is finite is guaranteed by (4). To see this, assume that there is an infinite sequence n_1, n_2, ... such that x ∈ K_{n_i}; this implies that x ∈ ⋂_{i=1}^{∞} K_{n_i}, which contradicts (4). We define also n(x*) = ∞, and for x ∈ D \ K_0 we set

n(x) = − min{ m | f^[m](x) ∈ K_0 } = max{ n | x ∈ K_n } < 0,

which again is finite because of condition 2. Letting κ(x, y) = min{n(x), n(y)}, we define ρ_c by ρ_c(x, y) = c^{κ(x,y)} d_M(x, y).

Lemma 7 ([Mey67]). For the ρ_c function defined above we have that:
1. ρ_c is well defined and satisfies the metric properties (see Definition 1), except possibly the triangle inequality (iv).
2. f is a contraction map with respect to ρ_c with contraction constant c:

ρ_c(f(x), f(y)) ≤ c · ρ_c(x, y).
(5)

The proof of Lemma 7 is almost immediate from the definition of ρ_c, but for a detailed explanation we refer to the original proof by Meyers [Mey67].

III. Construction of d_c In this last step, we assign as the distance between two points the length of the shortest path that connects them, with lengths computed according to ρ_c. The resulting distance then satisfies the triangle inequality because of the shortest-path property. Formally, denote by S_{xy} the set of chains s_{xy} = (x = x_0, x_1, ..., x_m = y) from x to y, with associated lengths L_c(s_{xy}) = Σ_{i=1}^{m} ρ_c(x_i, x_{i−1}). We define

d_c(x, y) = inf { L_c(s_{xy}) | s_{xy} ∈ S_{xy} }.    (6)

Lemma 8 ([Mey67]). For the d_c function defined above we have that:
1. d_c is well defined and satisfies all the metric properties (see Definition 1).
2. f is a contraction map with respect to d_c with contraction constant c:

d_c(f(x), f(y)) ≤ c · d_c(x, y).    (7)

3. d_c is topologically equivalent to d, and hence (D, d_c) is a complete metric space.

The proof of Lemma 8 can be found in Section B.2. We now need to prove two lemmas showing that Meyers’s construction also satisfies (2b).
Lemma 9.
Consider any x ≠ x* and y ≠ x, x*, and assume that y ∉ K_0. Then

d_c(x, y) ≥ min{ d_M(x, y), d_M(x, K_0) } > 0.    (8)
By definition, any chain s_{xy} either lies entirely in D \ K_0 or has a last link which leaves K_0. If s_{xy} lies entirely in D \ K_0, then n(x), n(y) < 0 and hence d_c(x, y) ≥ d_M(x, y). Otherwise, we consider the last link that leaves K_0, and we have that the length between x and y according to d_c is greater than the distance with respect to d_M of x from K_0, which gives d_M(x, K_0) ≤ d_c(x, y). The final step is to prove (2b).
Lemma 10. For all x, y ∈ D: d_{c,ε}(x, y) ≤ ε implies min{ d(x*, x), d(x*, y), d(x, y) } ≤ ε.
Let A = diam_d[B̄(x*, ε)] and assume without loss of generality that d(x, x*) ≥ d(y, x*). If either d_M(x, x*) ≤ ε or d_M(y, x*) ≤ ε, then we are done since, as we have seen in the construction of d_M, d_M(x, y) ≥ d(x, y); thus either d(x, x*) ≤ ε or d(y, x*) ≤ ε and (2b) is satisfied. So we may assume that d(x, x*) ≥ ε and d(y, x*) ≥ ε. Therefore x, y ∈ D \ K_0, which translates to n(x), n(y) < 0. So now, using Lemma 9, we get

d_c(x, y) ≥ min{ d_M(x, y), d_M(x, K_0) }.    (9)

Now we consider two cases according to the value of d_M(x, K_0). If d_M(x, K_0) ≥ ε, then

d_c(x, y) ≤ ε ⇒ d_M(x, y) ≤ ε ≤ A ⇒ d(x, y) ≤ ε.

Otherwise, if d_M(x, K_0) ≤ ε, then d(x, K_0) ≤ ε and, by the triangle inequality, d(x, x*) ≤ ε. By our assumption on the relative position of x and y, we also get d(y, x*) ≤ ε, and therefore x, y ∈ B̄(x*, ε). Thus, d(x, y) ≤ diam_d[B̄(x*, ε)].
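Before moving on to the omitted proofs, the first step of the construction can be explored numerically. The following sketch (ours, not from the paper) truncates the supremum in the definition of d_M to finitely many iterates, which suffices for maps whose orbits converge to the fixed point; it exhibits a map that expands the base distance d at first yet is non-expanding with respect to d_M, as Lemma 6 guarantees.

```python
# Numerical sketch (an illustration, not from the paper) of Step I's metric
# d_M(x, y) = sup_n d(f^[n](x), f^[n](y)), truncated to finitely many
# iterates; for orbits converging to the fixed point, the sup is attained
# at some finite n.
def d_M(f, x, y, max_iters=1000):
    """Approximate d_M by the max of |f^[n](x) - f^[n](y)| over n = 0..max_iters."""
    best = abs(x - y)
    for _ in range(max_iters):
        x, y = f(x), f(y)
        best = max(best, abs(x - y))
    return best

f = lambda x: x * x           # on [0, 1): every orbit converges to 0
x, y = 0.9, 0.8
print(abs(f(x) - f(y)) > abs(x - y))       # f expands d at first: True
print(d_M(f, f(x), f(y)) <= d_M(f, x, y))  # but is non-expanding in d_M: True
print(d_M(f, x, y) >= abs(x - y))          # and d_M dominates d: True
```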
Proof of Lemma 6:
The fact that this maximum is finite can be proved using condition 2 of the theorem. Indeed, since d(f^[n](x), x*) → 0 and d(f^[n](y), x*) → 0, for any δ > 0 there is a number N ∈ N such that d(f^[n](x), x*) ≤ δ and d(f^[n](y), x*) ≤ δ for all n > N. Now, if we let δ = d(x, y)/2, we get that max_{n ≥ N} { d(f^[n](x), f^[n](y)) } ≤ d(x, y), and therefore max_{n ∈ N} { d(f^[n](x), f^[n](y)) } = max_{0 ≤ n ≤ N} { d(f^[n](x), f^[n](y)) }. Hence the maximum has a finite value. Observe also that by definition it holds that d_M(f(x), f(y)) ≤ d_M(x, y), and hence f is a non-expansion according to d_M. It only remains to prove that d_M satisfies the properties of a metric function. Positive definiteness and symmetry of d_M follow immediately from the corresponding properties of d. The fact that d_M(x, y) ≠ 0 for x ≠ y follows from the fact that d(x, y) ≤ d_M(x, y), which follows directly from the definition of d_M since f^[0](x) = x. It remains to prove the triangle inequality. For this we observe that, by the definition of d_M and using the fact that the maximum in the definition of d_M is finite for any x, y ∈ D, there exists an n ∈ N such that

d_M(x, z) = d(f^[n](x), f^[n](z)) ≤ d(f^[n](x), f^[n](y)) + d(f^[n](y), f^[n](z)) ≤ d_M(x, y) + d_M(y, z).

Thus d_M is indeed a metric. We now show that d_M is topologically equivalent to d. From the inequality d(x, y) ≤ d_M(x, y) it follows that any d_M-convergent sequence is also d-convergent, with the same limit point. To prove the implication in the opposite direction, note that condition 2 of the hypotheses of the theorem implies the existence, for each η > 0, of an N such that

diam_d[ f^[n](W) ] < η for n > N.

For each x ∈ D, it follows from 2 that

ν(x) = min{ n ∈ N | f^[n](x) ∈ W }    (10)

is finite.
Since f is continuous, there is a δ > 0 so small that d(x, y) < δ implies f^[ν(x)](y) ∈ W and

d(f^[j](x), f^[j](y)) < η for 0 ≤ j ≤ N + ν(x).    (11)

By (3a) we have f^[n+N+ν(x)](x) ∈ f^[n+N](W) and f^[n+N+ν(x)](y) ∈ f^[n+N](W) for all n > 0, so that (11) implies

d(f^[j](x), f^[j](y)) < η for j > N + ν(x).

Thus d(x, y) ≤ δ implies d_M(x, y) ≤ η. This shows that a sequence which is d-convergent to x is also d_M-convergent to x, completing the proof of topological equivalence. Finally, since d and d_M are topologically equivalent and d is complete for D, it follows that d_M is also complete for D.

Proof of Lemma 8:
We will prove that d_c is the desired metric. That f is a contraction with constant c with respect to d_c follows by applying (5) to the links [x_{i−1}, x_i] of any chain s_{xy}. Clearly, d_c is symmetric and d_c(x, x) = 0. The triangle law holds since following a chain s_{xy} with a chain s_{yz} yields a chain s_{xz}. It remains to show that d_c is positive definite.

Consider any x ≠ x* and y ≠ x, and assume n(x) ≤ n(y) without loss of generality. If y ≠ x*, any chain s_{xy} either lies in D \ K_{n(y)+1} or has a last link which leaves K_{n(y)+1}, so that

d_c(x, y) ≥ c^{n(y)} min{ d_M(x, y), d_M(x, K_{n(y)+1}) } > 0.    (12)

The remaining case, y = x*, is covered by

d_c(x, y) ≥ c^{n(x)} d_M(x, K_{n(x)+1}) > 0.    (13)

Thus d_c is a distance metric. We now have to prove that d_c is equivalent to d_M. Let B_ν = D \ f^[−ν](W) for ν ≥ 0, so that the definition (10) of ν(x) implies d_M(x, B_{ν(x)}) > 0 and n(x) ≥ −ν(x). For any x ≠ x*, if y obeys

d_M(x, y) < δ(x) = min{ d_M(x, K_{n(x)+1}), d_M(x, B_{ν(x)}) },    (14)

then n(y) ≥ −ν(x), so that (6) and (12), the last with x and y interchanged, imply

c^{n(x)} d_M(x, y) ≤ d_c(x, y) ≤ ρ_c(x, y) ≤ c^{−ν(x)} d_M(x, y).    (15)

Now choose k(x) > max{0, n(x)} such that z ∈ K_{k(x)} implies d_M(z, x*) < d_c(x, x*)/2. Then d_c(x, K_{k(x)}) ≥ d_c(x, x*)/2, so that if y obeys

d_c(x, y) < d_c(x, x*)/2,    (16)

then only chains disjoint from K_{k(x)} need enter (6), implying

d_c(x, y) ≥ c^{k(x)} d_M(x, y).    (17)

In particular, if d_c(x, y) < min{ d_c(x, x*)/2, c^{k(x)} δ(x) }, then with (16) and (17) this implies (14), and hence (15) applies. Thus d_c(x_n, x) → 0 whenever d_M(x_n, x) → 0. Now if x = x*, note first that if d_M(x*, y) < d_M(x*, B_0), then

d_c(x*, y) ≤ ρ_c(x*, y) ≤ d_M(x*, y).
(18)

Also note that for any η > 0, f^[n](W) → {x*} guarantees an N(η) > 0 such that d_M(x*, z) < η/2 for all z ∈ K_{N(η)}. Then d_M(x*, y) > η implies that d_M(y, K_{N(η)}) ≥ η/2, and thus that

d_c(x*, y) ≥ d_c(K_{N(η)}, y) ≥ c^{N(η)} η/2.

Hence d_c(x_n, x*) → 0 if and only if d_M(x_n, x*) → 0. To show that d_M-completeness is preserved, assume that (x_n) is a d_c-Cauchy sequence and that (D, d_M) is complete. If (x_n) does not converge to x*, then, since d_c and d_M are equivalent, for some N ∈ N and all sufficiently large n, n(x_n) < N. Now, exactly as above, choose k((x_n)) = P > max{0, N} such that z ∈ K_P implies

d_M(x*, z) < inf_{i ∈ N} { d_c(x_i, x*)/2 } = R.

Then, since (x_n) is a Cauchy sequence, there is an i ∈ N such that d_c(x_p, x_{p+j}) < R for all p > i, and, using (17) with k(x) = P, we have c^{−P} d_c(x_p, x_{p+j}) ≥ d_M(x_p, x_{p+j}), so that (x_n) is a d_M-Cauchy sequence. Therefore, since (D, d_M) is complete, the metric space (D, d_c) is complete too.

C Proof of Theorem 2
En route to establishing the CLS-completeness of
Banach, we will define an intermediate, syntactic problem
MetricBanach , which is similar to
Banach except that the function d given in the input is not promised to be a metric, and hence a violation of the metricity of d is accepted as a solution.

Definition 7.
MetricBanach takes as input two functions f : [0,1]^3 → [0,1]^3 and d : [0,1]^3 × [0,1]^3 → R, both represented as arithmetic circuits, and three rational positive constants ε, λ, and c < 1. The desired output is any of the following:

(Oa) a point x ∈ [0,1]^3 such that d(x, f(x)) ≤ ε;

(Ob) two points x, x′ ∈ [0,1]^3 disproving the contraction of f w.r.t. d with constant c, i.e. d(f(x), f(x′)) > c · d(x, x′);

(Oc) two points x, x′ ∈ [0,1]^3 disproving the λ-Lipschitz continuity of f, i.e. |f(x) − f(x′)| > λ|x − x′|;

(Od) four points x_1, x_2, y_1, y_2 ∈ [0,1]^3 with x_1 ≠ x_2 and y_1 ≠ y_2 disproving the λ-Lipschitz continuity of d(·, ·), i.e. |d(x_1, x_2) − d(y_1, y_2)| > λ(|x_1 − y_1| + |x_2 − y_2|);

(Oe) points x, y, z ∈ [0,1]^3 violating any of the metric properties of d ((i)–(iv) of Definition 1).

Notice that MetricBanach is syntactic, namely for any input there exists a solution. We proceed to show that the problem is CLS-complete.
Theorem 3.
MetricBanach is CLS-complete.

Proof of Theorem 3.
We first show that
MetricBanach belongs to
CLS even when we disallow (Oe). Starting from an instance (f, d, ε, λ, c) of MetricBanach, we create the following instance of Continuous LocalOpt:

( f′(x) = f(x), p(x) = d(x, f(x)), ε′ = (1 − c) · ε, λ′ = λ ).

Now we have to show that any output of the
Continuous LocalOpt with input (f′, p, ε′, λ′) will give us an output of MetricBanach with input (f, d, ε, λ, c).

(CO1) ⇒ If d(f(x), f(f(x))) > c · d(x, f(x)), then (x, f(x)) satisfies (Ob). Otherwise,

p(f(x)) ≥ p(x) − ε′ ⇒ d(f(x), f(f(x))) ≥ d(x, f(x)) − ε′
⇒ c · d(x, f(x)) ≥ d(f(x), f(f(x))) ≥ d(x, f(x)) − ε′
⇒ c · d(x, f(x)) ≥ d(x, f(x)) − (1 − c) · ε
⇒ (1 − c) · d(x, f(x)) ≤ (1 − c) · ε
⇒ d(x, f(x)) ≤ ε.

Therefore x satisfies (Oa) and hence is a solution of MetricBanach.

(CO2) ⇒ (Oc).

(CO3) ⇒ Without loss of generality, let ‖x − f(x)‖ ≤ ‖y − f(y)‖. If x = f(x), then if d(x, f(x)) = 0 we immediately satisfy (Oa); otherwise we satisfy (Oe). Otherwise, we can give x_1 = x, x_2 = f(x), y_1 = y, y_2 = f(y), and since x_1 ≠ x_2 and y_1 ≠ y_2 we satisfy (Od).

This implies that any output of Continuous LocalOpt on the instance (f′, p, ε′, λ′) can produce an output for the instance (f, d, ε, λ, c) of the MetricBanach problem. Therefore
MetricBanach ∈ CLS.

Now we are going to show the opposite direction and reduce
Continuous LocalOpt to MetricBanach. Starting from an instance (f, p, ε, λ) of Continuous LocalOpt, we define for any x, y ∈ [0,1]^3

κ(x, y) = min{ −p(x)/ε, −p(y)/ε }.

We also remind the reader of the definition of the discrete metric: d_S(x, y) = 1 if x ≠ y and d_S(x, x) = 0. Finally, we define the smooth interpolation function, for w ≤ 0,

B(w) = (1 − (⌈w⌉ − w)) c^{⌈w⌉} + (⌈w⌉ − w) c^{⌈w⌉+1}.

The basic observation about B(·), since c < 1, is that c^{⌈κ(x,y)⌉+1} ≤ B(κ(x, y)) ≤ c^{⌈κ(x,y)⌉}. Based on these definitions, we create the following instance of MetricBanach:

f′ = f,  d(x, y) = B(κ(x, y)) · d_S(x, y),  ε′ = 1/c,  λ′ = max{ λ, ⌈ c^{−1/ε} λ ln(1/c) / ε ⌉ },  c = 1 − ε.
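The bracketing property of B(·) noted above is easy to check numerically; the following quick sketch (ours, not from the paper) verifies that B(w) lies between c^{⌈w⌉+1} and c^{⌈w⌉} for sampled w ≤ 0, since it is a convex combination of these two values.

```python
# A quick numerical check (an illustration, not from the paper) of the
# bracketing property used in the reduction: since 0 < c < 1, the
# interpolation B(w) is a convex combination of c^ceil(w) and c^(ceil(w)+1),
# hence c^(ceil(w)+1) <= B(w) <= c^ceil(w) for every w <= 0.
import math

def B(w, c):
    frac = math.ceil(w) - w                  # lies in [0, 1)
    return (1 - frac) * c ** math.ceil(w) + frac * c ** (math.ceil(w) + 1)

c = 0.9
ws = [-i / 7 for i in range(50)]             # sample points w <= 0
ok = all(c ** (math.ceil(w) + 1) - 1e-9 <= B(w, c) <= c ** math.ceil(w) + 1e-9
         for w in ws)
print(ok)  # True
```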
As in the previous reduction, we have to show that any result of the
MetricBanach with input ( f, d, ε ′ , λ, c ) will give us a result of Continuous LocalOpt with input ( f, p, ε, λ ) .(Oa) = ⇒ If p ( f ( x )) ≥ p ( x ) then x satisfies (CO1). Otherwise we can see that κ ( x, f ( x )) = − p ( x ) / ε and x = f ( x ) so d ( x, f ( x )) ≤ ε ′ ⇒ B ( κ ( x, y )) ≤ ε ′ ⇒ (cid:18) p ( x ) ε (cid:19) log(1 /c ) ≤ log( ε ′ ) ⇒ p ( x ) ε ≤ log( ε ′ )log(1 /c ) ⇒ p ( x ) ≤ ε so p ( f ( x )) ≥ ≥ p ( x ) − ε and so x satisfies (CO1).(Ob) = ⇒ As in the previous case we may assume that p ( f ( x )) ≤ p ( x ) − ε and that p ( f ( y )) ≤ p ( y ) − ε . Without loss of generality we can assume that p ( x ) > p ( y ) . If also p ( f ( x )) ≥ p ( f ( y )) then κ ( x, y ) = − p ( x ) /ε and κ ( f ( x ) , f ( y )) = − p ( f ( x )) /ε . Therefore d ( x, y ) = B ( κ ( x, y )) , d ( f ( x ) , f ( y )) = B ( κ ( f ( x ) , f ( y ))) Now if (Ob) is satisfied then c − j p ( f ( x )) ε k ≥ d ( f ( x ) , f ( y )) = B ( κ ( f ( x ) , f ( y ))) > c · B ( κ ( x, y )) = c · d ( x, y ) ≥ c · c − j p ( x ) ε k = ⇒ (cid:22) p ( f ( x )) ε (cid:23) ≥ (cid:22) p ( x ) ε (cid:23) − ⇒ p ( f ( x )) ≥ p ( x ) − ε Therefore x satisfies (CO1).Now similarly if p ( f ( y )) > p ( f ( x )) then p ( f ( y )) > p ( x ) − ε . But by our assumption that p ( x ) > p ( y ) we get p ( f ( y )) > p ( y ) − ε . Therefore y satisfies (CO1).(Oc) = ⇒ (CO2).(Od) = ⇒ We will analyze the function h ( x ) = c − x when x ∈ [0 , /ε ] . By the mean value theoremwe have that the Lipschitz constant ℓ h of h is less that max x ∈ [0 , /ε ] h ′ ( x ) . But h ′ ( x ) = (cid:16) e − x ln c (cid:17) ′ = ln(1 /c ) c − x and because c < we have that max x ∈ [0 , /ε ] h ′ ( x ) = c − /ε ln(1 /c ) .Let now κ ( x , x ) = − p ( x ) /ε and κ ( y , y ) = − p ( y ) /ε . Since x = x and y = y wehave d ( x , x ) = B (cid:0) c − p ( x ) /ε (cid:1) and d ( y , y ) = B (cid:0) c − p ( y ) /ε (cid:1) . 
Since $B(\kappa(x, y))$ is just a linear interpolation of points that belong to the graph of $h$, using the mean value theorem we have
\[ |d(x_1, x_2) - d(y_1, y_2)| = |B(\kappa(x_1, x_2)) - B(\kappa(y_1, y_2))| \le \left( \max_{x \in [0, 1/\varepsilon]} h'(x) \right) \left| \frac{p(x_1)}{\varepsilon} - \frac{p(y_1)}{\varepsilon} \right|, \]
and hence
\[ |d(x_1, x_2) - d(y_1, y_2)| \le \frac{c^{-1/\varepsilon} \ln(1/c)}{\varepsilon} \, |p(x_1) - p(y_1)|. \]
Now if $|p(x_1) - p(y_1)| > \lambda |x_1 - y_1|$ then $x_1, y_1$ satisfy (CO3) and we have a solution for Continuous LocalOpt. So $|p(x_1) - p(y_1)| \le \lambda |x_1 - y_1|$, and from the last inequality we have that
\[ |d(x_1, x_2) - d(y_1, y_2)| \le \frac{c^{-1/\varepsilon} \lambda \ln(1/c)}{\varepsilon} \, |x_1 - y_1|, \]
which contradicts (Od), since $\lambda' = \max\left\{ \lambda, \left\lceil c^{-1/\varepsilon} \lambda \ln(1/c) / \varepsilon \right\rceil \right\}$.

(Note that at this point we obtain the inequality $p(f(x)) \ge p(x) - 2\varepsilon$ rather than $p(f(x)) \ge p(x) - \varepsilon$; avoiding this slack would complicate the calculations in the rest of the cases. It is clear though that we could scale every parameter so that $\varepsilon$ becomes $\varepsilon/2$ and nothing changes.)

Finally, it is easy to see that the size of the arithmetic circuits that we used for this reduction is polynomial in the size of the input. The only functions that need explanation are those of $d$ and $\lambda'$. We start with the observation that both $c$ and $c^{-1}$ are given and have descriptions of size only linear in the description of $\varepsilon$, since $\varepsilon$ is a rational constant. The difficult term in the description of $d$ is the term $B(\kappa(x, y))$. For this, we need to bound the size of $\kappa(x, y)$; let this bound be $A$. Then we can have precomputed the possible digits of $\lceil \kappa(x, y) \rceil$ using $\log(A)$ arithmetic circuits. Finally, a last circuit combines the digits in order to get $\lceil \kappa(x, y) \rceil$.
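The digit-based evaluation of $c^{\lceil \kappa(x,y) \rceil}$ can be sketched in ordinary code (a sketch of the arithmetic only, not the actual circuit; the function name and constants below are illustrative):

```python
# Sketch: evaluate c^{-k} from the binary digits of the nonnegative integer
# k = -ceil(kappa(x, y)). Each set digit m_i contributes the factor
# (1/c)^(2^i), which is obtained by repeated squaring.
def power_by_digits(c_inv, k):
    """Compute c_inv**k (= c^{-k}) digit by digit, mirroring the circuit."""
    result = 1.0
    square = c_inv                 # c_inv^(2^0)
    while k > 0:
        if k & 1:                  # digit m_i = 1: multiply its power in
            result *= square
        square *= square           # repeated squaring: 2^i -> 2^(i+1)
        k >>= 1
    return result

c = 0.95
assert abs(power_by_digits(1 / c, 13) - c ** (-13)) < 1e-9
```

Each of the $\log(A)$ digits costs a constant number of gates plus one squaring, which matches the gate count discussed in the text.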
Now, to compute $c^{\lceil \kappa(x,y) \rceil}$: for each digit $m_i$ of the $\log(A)$ binary digits of $\lceil \kappa(x, y) \rceil$ we compute the corresponding power $c^{2^i m_i}$ with repeated squaring, using $O(i)$ arithmetic gates. Then we combine the results to compute $c^{\lceil \kappa(x,y) \rceil}$. This whole process needs $O(\log^2 A)$ arithmetic gates. Since $A \le 1/\varepsilon$, the overall circuit for $d$ needs $\mathrm{poly}(\log(1/\varepsilon))$ arithmetic gates. For $\lambda'$ we can follow a similar process, but we have to bound the value $c^{\lceil \kappa(x,y) \rceil}$. We can see that $c^{\lceil \kappa(x,y) \rceil} \le c^{-1/\varepsilon} = (1 - 0.5 \cdot \varepsilon)^{-1/\varepsilon} \le e$. Therefore the size of $c^{-1/\varepsilon}$ is bounded, and its ceiling can be computed using a polynomial-sized circuit.

By inspecting the proof of the CLS-completeness of MetricBanach we realize that, in the CLS-hardness part of the proof, we can actually guarantee that $d$ is a metric. We can thus also establish the CLS-completeness of Banach.

Proof of Theorem 2.
Obviously, because of Theorem 3, Banach belongs to CLS.

For the opposite direction, we use the same reduction as in the proof of Theorem 3. We then prove that $d$ satisfies the desired properties. We remind the reader that we used the following instance of Banach for the reduction:
\[ f' = f, \qquad d(x, y) = B(\kappa(x, y)) \cdot d_S(x, y), \qquad \varepsilon' = \frac{1}{\sqrt{c}}, \qquad \lambda' = \max\left\{ \lambda, \left\lceil \frac{c^{-1/\varepsilon} \lambda \ln(1/c)}{\varepsilon} \right\rceil \right\}, \qquad c = 1 - 0.5 \cdot \varepsilon. \]
We first prove that $d$ is always a distance metric.

(i) Obvious from the definition of $d$.

(ii) If $x \neq y$ then $d_S(x, y) > 0$. Also, $B(\kappa(x, y)) > 0$ always, and therefore $d(x, y) > 0$. Now since $d_S(x, x) = 0$ we also have $d(x, x) = 0$.

(iii) It is obvious from the definition of $\kappa$ that $\kappa(x, y) = \kappa(y, x)$, and since $d_S$ is a distance metric the same symmetry holds for $d_S$, and thus for $d$.

(iv) If any two of $x, y, z$ coincide the triangle inequality is immediate, so assume they are pairwise distinct. Without loss of generality we assume that $p(x) \ge p(y)$, so that $d(x, y) = B(-p(x)/\varepsilon)$. We consider the following cases. If $p(x) \ge p(z)$, then we have $d(x, y) = d(x, z)$, and therefore obviously $d(x, y) \le d(x, z) + d(z, y)$. If $p(x) \le p(z)$, then we have $d(x, y) = B(-p(x)/\varepsilon)$ and $d(x, z) = d(z, y) = B(-p(z)/\varepsilon)$; but since $p(x) \le p(z)$ and $B$ is non-increasing, obviously $B(-p(z)/\varepsilon) \ge B(-p(x)/\varepsilon)$, so $d(x, y) \le d(x, z) + d(z, y)$.

Finally, we will show the completeness of $([0, 1]^3, d)$. We first observe that for all $x \neq y$ we have $d(x, y) \ge c \ge 1/2$; this comes from the fact that $\kappa(x, y) \le 0$ and $c < 1$, so $B(\kappa(x, y)) \ge c^{\lceil \kappa(x,y) \rceil + 1} \ge c$.

Now let $(x_n)$ be a Cauchy sequence; then for every $\delta > 0$ there exists $N \in \mathbb{N}$ such that for all $n, m > N$, $d(x_n, x_m) \le \delta$. We set $\delta = 1/2$; then there exists $N \in \mathbb{N}$ such that for all $n, m > N$, $d(x_n, x_m) < 1/2$. But from the previous observation this implies $d(x_n, x_m) = 0$, and since $d$ defines a metric we get $x_n = x_m$. Therefore $(x_n)$ is constant for all $n > N$ and obviously converges. This means that every Cauchy sequence converges, and so $([0, 1]^3, d)$ is a complete metric space.

Proofs of Corollaries 1 and 2
Proof of Corollary 1:
Let $d_{c, \varepsilon/2}$ be the distance metric guaranteed by Theorem 1 with parameters $c$, $\varepsilon/2$. Let also $(x_n)$ be the sequence produced by the Basic Iterative Procedure. Since $f$ is a contraction with respect to $d_{c, \varepsilon/2}$, we have
\[ d_{c, \varepsilon/2}(x_n, x^*) \le \frac{c^n}{1 - c} \, d_{c, \varepsilon/2}(x_0, x_1). \]
If we make sure that $d_{c, \varepsilon/2}(x_n, x_{n+1}) \le \varepsilon/2$ then, according to Theorem 1, $d(x_n, x^*) \le \varepsilon$. So the number of steps that are needed satisfies
\[ \frac{c^n}{1 - c} \, d_{c, \varepsilon/2}(x_0, x_1) \le \frac{\varepsilon}{2} \;\iff\; n \ge \frac{\log\!\big(d_{c, \varepsilon/2}(x_0, x_1)\big) + \log\!\big(2 / ((1 - c)\varepsilon)\big)}{\log(1/c)}. \]
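To make the bound concrete, here is a toy run of the Basic Iterative Procedure on a map that is already a contraction under the usual metric on the reals, so that $d_{c, \varepsilon/2}$ can simply be taken to be $|x - y|$; the map $f$, the parameters, and the starting point are illustrative choices, not from the paper.

```python
import math

# Toy instance: f is a c-contraction under |.| with fixed point x* = 2,
# so the metric d_{c, eps/2} of Theorem 1 can be taken to be |x - y| itself.
def f(x):
    return 0.5 * x + 1.0           # contraction factor c = 0.5

c, eps, x0 = 0.5, 1e-6, 0.0

# iteration bound from Corollary 1: c^n / (1 - c) * d(x0, x1) <= eps / 2
d01 = abs(x0 - f(x0))
n_bound = math.ceil(
    (math.log(d01) + math.log(2 / ((1 - c) * eps))) / math.log(1 / c)
)

# Basic Iterative Procedure: apply f until successive iterates are eps/2-close
x, steps = x0, 0
while abs(x - f(x)) > eps / 2:
    x, steps = f(x), steps + 1

assert steps <= n_bound            # converged within the predicted bound
assert abs(x - 2.0) <= eps         # and the final iterate is eps-close to x*
```

On this instance the bound predicts at most 22 iterations, and the procedure indeed stops one step earlier.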
Using Corollary 1, we get that after
\[ n = \frac{\log\!\big(d_{c, \delta/2}(x_0, f(x_0))\big) + \log\!\big(2 / ((1 - c)\delta)\big)}{\log(1/c)} \]
iterations we will have $d(x_n, x^*) \le \delta$ or $d(x_{n+1}, x^*) \le \delta$. Since $f$ is a contraction with respect to $d$ in $\bar{B}(x^*, \delta)$, it certainly must be that $d(x_{n+1}, x^*) \le \delta$. By the same token, $d(x_{n+1+m}, x^*) \le c^m \, d(x_{n+1}, x^*)$ for all $m > 0$. Therefore, to guarantee $d(x_{n+1+m}, x^*) \le \varepsilon$, it suffices to take
\[ m \ge \frac{\log(1/\varepsilon) - \log(1/\delta)}{\log(1/c)}. \]
So in total we need $n + 1 + m$ steps.
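As an illustrative instantiation of the second-phase bound (the numbers are chosen only for concreteness), take $c = 1/2$, $\delta = 1/10$ and $\varepsilon = 10^{-6}$; then

```latex
\[ m \;\ge\; \frac{\log(1/\varepsilon) - \log(1/\delta)}{\log(1/c)}
   \;=\; \frac{\log(10^6) - \log(10)}{\log 2}
   \;=\; \frac{5 \log 10}{\log 2} \;\approx\; 16.6, \]
```

so $m = 17$ additional iterations suffice once the iterates are inside $\bar{B}(x^*, \delta)$.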