Local Adaption for Approximation and Minimization of Univariate Functions
Sou-Cheng T. Choi, Yuhan Ding, Fred J. Hickernell
Department of Applied Mathematics, Illinois Institute of Technology, RE 208, 10 West 32nd Street, Chicago, Illinois 60616, USA
Xin Tong
Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Room 322 SEO, 851 S. Morgan Street, Chicago, Illinois 60607, USA
Abstract
Most commonly used adaptive algorithms for univariate real-valued function approximation and global minimization lack theoretical guarantees. Our new locally adaptive algorithms are guaranteed to provide answers that satisfy a user-specified absolute error tolerance for a cone, $\mathcal{C}$, of non-spiky input functions in the Sobolev space $W^{2,\infty}[a,b]$. Our algorithms automatically determine where to sample the function, sampling more densely where the second derivative is larger. The computational cost of our algorithm for approximating a univariate function $f$ on a bounded interval with $L^{\infty}$-error no greater than $\varepsilon$ is $\mathcal{O}\bigl(\sqrt{\|f''\|_{1/2}/\varepsilon}\bigr)$ as $\varepsilon \to 0$. This is the same order as that of the best function approximation algorithm for functions in $\mathcal{C}$. The computational cost of our global minimization algorithm is of the same order, and the cost can be substantially less if $f$ significantly exceeds its minimum over much of the domain. Our Guaranteed Automatic Integration Library (GAIL) contains these new algorithms. We provide numerical experiments to illustrate their superior performance.

Keywords: adaption, automatic, computational complexity, function approximation, function recovery, global minimization, nonlinear optimization
1. Introduction
Our goal is to reliably solve univariate function approximation and global minimization problems by adaptive algorithms. We prescribe a suitable set,
Email address: [email protected] (Fred J. Hickernell)
Preprint submitted to Journal of Complexity, September 27, 2018.

$\mathcal{C}$, of continuously differentiable, real-valued functions defined on a finite interval $[a,b]$. Then, we construct algorithms $A : (\mathcal{C}, (0,\infty)) \to L^{\infty}[a,b]$ and $M : (\mathcal{C}, (0,\infty)) \to \mathbb{R}$ such that for any $f \in \mathcal{C}$ and any error tolerance $\varepsilon > 0$,
$$\|f - A(f,\varepsilon)\| \le \varepsilon, \qquad \text{(APP)}$$
$$0 \le M(f,\varepsilon) - \min_{a \le x \le b} f(x) \le \varepsilon. \qquad \text{(MIN)}$$
Here, $\|\cdot\|$ denotes the $L^{\infty}$-norm on $[a,b]$, i.e., $\|f\| = \sup_{x \in [a,b]} |f(x)|$. The algorithms $A$ and $M$ depend only on function values.

Our algorithms proceed iteratively until their data-dependent stopping criteria are satisfied. The input functions are sampled nonuniformly over $[a,b]$, with the sampling density determined by the function data. We call our algorithms locally adaptive, to distinguish them from globally adaptive algorithms that have a fixed sampling pattern and only the sample size determined adaptively.

Our algorithms $A$ and $M$ are based on a linear spline, $S(f, x_{0:n})$, defined on $[a,b]$. Let $0{:}n$ be shorthand for $\{0, \ldots, n\}$, and let $x_{0:n}$ be any ordered sequence of $n+1$ points that includes the endpoints of the interval, i.e., $a =: x_0 < x_1 < \cdots < x_{n-1} < x_n := b$. We call such a sequence a partition. Then given any $x_{0:n}$ and any $i \in 1{:}n$, the linear spline is defined for $x \in [x_{i-1}, x_i]$ by
$$S(f, x_{0:n})(x) := \frac{x - x_i}{x_{i-1} - x_i} f(x_{i-1}) + \frac{x - x_{i-1}}{x_i - x_{i-1}} f(x_i). \qquad (1)$$
The error of the linear spline is bounded in terms of the second derivative of the input function as follows [2, Theorem 3.3]:
$$\|f - S(f, x_{0:n})\|_{[x_{i-1}, x_i]} \le \frac{(x_i - x_{i-1})^2}{8} \, \|f''\|_{[x_{i-1}, x_i]}, \qquad i \in 1{:}n, \qquad (2)$$
where $\|f\|_{[\alpha,\beta]}$ denotes the $L^{\infty}$-norm of $f$ restricted to the interval $[\alpha,\beta] \subseteq [a,b]$. This error bound leads us to focus on input functions in the Sobolev space $W^{2,\infty} := W^{2,\infty}[a,b] := \{f \in C^1[a,b] : \|f''\| < \infty\}$.

Algorithms $A$ and $M$ require upper bounds on $\|f''\|_{[x_{i-1},x_i]}$, $i \in 1{:}n$, to make use of (2). A nonadaptive algorithm might assume that $\|f''\| \le \sigma$ for some known $\sigma$, and proceed to choose $n = \lceil (b-a)\sqrt{\sigma/(8\varepsilon)} \rceil$ and $x_i = a + i(b-a)/n$, $i \in 0{:}n$. Providing an upper bound on $\|f''\|$ is often impractical, and so we propose adaptive algorithms that do not require such information.

However, one must have some a priori information about $f \in W^{2,\infty}$ to construct successful algorithms for (APP) or (MIN). Suppose that algorithm $A$ satisfies (APP) for the zero function $f = 0$, and $A(0, \varepsilon)$ uses the data sites $x_{0:n} \subset [a,b]$. Then one can construct a nonzero function $g \in W^{2,\infty}$ satisfying $g(x_i) = 0$, $i \in 0{:}n$, but with $\|g - A(g,\varepsilon)\| = \|g - A(0,\varepsilon)\| > \varepsilon$.

Our set $\mathcal{C} \subset W^{2,\infty}$ for which $A$ and $M$ succeed includes only those functions whose second derivatives do not change dramatically over a short distance. The precise definition of $\mathcal{C}$ is given in Section 2. This allows us to use second-order divided differences to construct rigorous upper bounds on the linear spline error in (2).
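Equations (1) and (2) are easy to exercise numerically. The following is a minimal Python sketch (illustrative only, not part of GAIL): it evaluates the linear spline and confirms that the observed interpolation error never exceeds the right-hand side of (2) for the smooth test function $\exp$ on $[0,1]$.

```python
import math
import bisect

def linear_spline(nodes, values, x):
    """Evaluate the linear spline S(f, x_{0:n}) of equation (1) at x."""
    i = max(1, bisect.bisect_left(nodes, x))
    i = min(i, len(nodes) - 1)
    x0, x1 = nodes[i - 1], nodes[i]
    f0, f1 = values[i - 1], values[i]
    return (x - x1) / (x0 - x1) * f0 + (x - x0) / (x1 - x0) * f1

# Check the error bound (2) for f(x) = exp(x) on [0, 1]:
# |f - S(f)| <= h^2 / 8 * max |f''| on each subinterval.
a, b, n = 0.0, 1.0, 10
nodes = [a + i * (b - a) / n for i in range(n + 1)]
values = [math.exp(t) for t in nodes]
h = (b - a) / n
worst_ratio = 0.0
for k in range(1000):
    x = a + (k + 0.5) * (b - a) / 1000
    err = abs(math.exp(x) - linear_spline(nodes, values, x))
    i = min(int((x - a) / h) + 1, n)
    bound = h**2 / 8 * math.exp(nodes[i])  # max of f'' = exp on that subinterval
    worst_ratio = max(worst_ratio, err / bound)
print(worst_ratio)  # stays below 1, and is close to 1 near subinterval midpoints
```

The ratio approaching 1 near midpoints shows that the factor $1/8$ in (2) is essentially sharp.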
These data-driven error bounds inform the stopping criteria for Algorithm $A$ in Section 3.1 and Algorithm $M$ in Section 4.1.

The computational cost of Algorithm $A$ is analyzed in Section 3.2 and is shown to be $\mathcal{O}\bigl(\sqrt{\|f''\|_{1/2}/\varepsilon}\bigr)$ as $\varepsilon \to 0$. Here, $\|\cdot\|_{1/2}$ denotes the $L^{1/2}$-quasi-norm, a special case of the $L^p$-quasi-norm,
$$\|f\|_p := \Bigl(\int_a^b |f|^p \, \mathrm{d}x\Bigr)^{1/p}, \qquad 0 < p < 1.$$
Since $\|f''\|_{1/2}$ can be much smaller than $\|f''\|$, locally adaptive algorithms can be more efficient than globally adaptive algorithms, whose computational costs are proportional to $\sqrt{\|f''\|/\varepsilon}$. The computational complexity of (APP) is determined in Section 3.3 to be $\mathcal{O}\bigl(\sqrt{\|f''\|_{1/2}/\varepsilon}\bigr)$ as well. The computational cost of our optimization algorithm $M$ is analyzed in Section 4.2. A lower bound on the computational complexity of (MIN) is a subject for future investigation.

Our algorithms are implemented in our MATLAB [23] Guaranteed Automatic Integration Library (GAIL) [5]. Section 5 provides numerical examples of our algorithms and compares their performance with MATLAB's and Chebfun's algorithms. We note cases where our algorithms succeed in meeting the error tolerance and other algorithms do not.

Adaptive algorithms relieve the user of having to specify the number of samples required; only the desired error tolerance is needed. Existing adaptive numerical algorithms for function approximation, such as the MATLAB toolbox Chebfun [11], succeed for some functions but fail for others. No theory explains for which $f$ Chebfun succeeds. A corresponding situation exists for minimization algorithms, such as min in Chebfun or MATLAB's built-in fminbnd [1, 10].

Our theoretically justified Algorithms $A$ and $M$ build upon the ideas used to construct the adaptive algorithms in [7, 9, 12, 13, 15, 16, 24]. In all those cases, a cone, $\mathcal{C}$, of input functions is identified for which the adaptive algorithms succeed, just as is done here. However, unlike the algorithms in [7, 9, 12, 24], the definition of $\mathcal{C}$ here does not depend on a weaker norm. Also, unlike the globally adaptive approximation and optimization algorithms in [7, 24], the algorithms proposed here are locally adaptive, sampling the interval $[a,b]$ nonuniformly.

Novak [18] summarizes the settings under which adaption may provide an advantage over nonadaption. For linear problems, such as (APP), adaption has no advantage if the set of functions being considered is symmetric and convex [18, Theorem 1], [25, Chapter 4, Theorem 5.2.1], [26]. The cone $\mathcal{C}$ defined for our approximation problem (APP) is symmetric, but not convex. Plaskota et al. [20] have developed adaptive algorithms for functions with singularities. Our algorithms are not designed for such functions; rather, they are designed to be efficient when the second derivative is large in a small part of the domain. Plaskota [19] has developed an adaptive Simpson's algorithm for approximating $\int_a^b f(x) \, \mathrm{d}x$ under the assumption that $f^{(4)}(x) \ge 0$ for $x \in [a,b]$. His algorithm relies on divided differences, like ours do. His error is asymptotically proportional to a quasi-norm of $f^{(4)}$, which is analogous to the $\|f''\|_{1/2}$ that appears in our analysis. Horn [14] has developed an optimization algorithm for Lipschitz continuous functions that does not require knowledge of the Lipschitz constant.

There is a significant literature on theoretically justified algorithms based on interval arithmetic [17, 22], which are implemented in INTLAB [21]. This approach assumes that functions have interval inputs and outputs. We focus on the more common situation where functions have point inputs and outputs.
2. The Cone, $\mathcal{C}$, of Functions of Interest

Linear splines (1) are the foundation for the adaptive algorithms $A$ and $M$. To bound the error of the linear spline in (2), our algorithms construct data-based upper bounds on $\|f''\|_{[\alpha,\beta]}$ in terms of divided differences. For these bounds to hold, we must assume that $f''(x)$ does not change drastically with respect to a small change in $x$. These assumptions define our cone of functions, $\mathcal{C}$, for which our algorithms ultimately apply.

Let $p$ denote the quadratic Lagrange interpolating polynomial at the nodes $\{\alpha, (\alpha+\beta)/2, \beta\}$, which may be written as
$$p(x) := f(\alpha) + \frac{(x-\alpha)[f(\beta) - f(\alpha)]}{\beta - \alpha} + (x-\alpha)(x-\beta) \, D(f,\alpha,\beta),$$
$$D(f,\alpha,\beta) := \frac{2\bigl[f(\beta) - 2f\bigl((\alpha+\beta)/2\bigr) + f(\alpha)\bigr]}{(\beta-\alpha)^2}. \qquad (3)$$
For any $f \in W^{2,\infty}$, the function $f - p$ has at least three distinct zeros on $[\alpha,\beta]$, so $f' - p'$ has at least two distinct zeros on $(\alpha,\beta)$. Specifically, there exist $\xi_{\pm}$ with $\alpha < \xi_- < (\alpha+\beta)/2 < \xi_+ < \beta$ and $f'(\xi_{\pm}) - p'(\xi_{\pm}) = 0$. Thus,
$$\|f''\|_{-\infty,[\alpha,\beta]} := \inf_{\alpha \le \eta < \zeta \le \beta} \left| \frac{f'(\zeta) - f'(\eta)}{\zeta - \eta} \right| \le \left| \frac{f'(\xi_+) - f'(\xi_-)}{\xi_+ - \xi_-} \right| = \left| \frac{p'(\xi_+) - p'(\xi_-)}{\xi_+ - \xi_-} \right| = 2|D(f,\alpha,\beta)|$$
$$\le \sup_{\alpha \le \eta < \zeta \le \beta} \left| \frac{f'(\zeta) - f'(\eta)}{\zeta - \eta} \right| =: \|f''\|_{[\alpha,\beta]}. \qquad (4)$$
This inequality tells us that twice the divided difference, $2|D(f,\alpha,\beta)|$, is a lower bound for $\|f''\|_{[\alpha,\beta]}$, which by itself is not helpful. But $2|D(f,\alpha,\beta)|$ is also an upper bound for $\|f''\|_{-\infty,[\alpha,\beta]}$.
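The sandwich inequality (4) can be verified numerically. In the Python sketch below (illustrative, not from GAIL), $f = \sin$ is used on $[0.2, 1.2]$; because $f'' = -\sin$ keeps one sign there, $\|f''\|_{-\infty,[\alpha,\beta]}$ and $\|f''\|_{[\alpha,\beta]}$ reduce to the minimum and maximum of $|\sin|$ on the interval.

```python
import math

def D(f, alpha, beta):
    """Second-order divided difference of equation (3)."""
    mid = (alpha + beta) / 2
    return 2 * (f(beta) - 2 * f(mid) + f(alpha)) / (beta - alpha) ** 2

# For f = sin on [0.2, 1.2], f'' = -sin has one sign, so the two
# (semi-)norms in (4) are the min and max of |sin| on the interval.
alpha, beta = 0.2, 1.2
lower = math.sin(alpha)          # ||f''||_{-inf,[alpha,beta]}
upper = math.sin(beta)           # ||f''||_{[alpha,beta]}
middle = 2 * abs(D(math.sin, alpha, beta))
print(lower <= middle <= upper)  # True: the sandwich inequality (4)
```

This is exactly the information the algorithms exploit: the computable quantity $2|D|$ pins $\|f''\|_{-\infty}$ from above, and the cone condition then converts that into an upper bound on $\|f''\|$.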
The cone of interesting functions, $\mathcal{C}$, will contain those $f$ for which $\|f''\|_{[\alpha,\beta]}$ is not drastically greater than the maximum of $\|f''\|_{-\infty,[\beta-h_-,\alpha]}$ and $\|f''\|_{-\infty,[\beta,\alpha+h_+]}$, where $h_{\pm} > \beta - \alpha$.

The cone $\mathcal{C}$ is defined in terms of two numbers: an integer $n_{\mathrm{init}} \ge 5$ and a constant $C \ge 1$. Let
$$\mathfrak{h} := \frac{3(b-a)}{n_{\mathrm{init}} - 1}, \qquad \mathfrak{C}(h) := \frac{C\mathfrak{h}}{\mathfrak{h} - h} \quad \text{for } 0 < h < \mathfrak{h}. \qquad (5)$$

Figure 1: For some sample $f$, a plot of $|f''(x)|$ (solid), $\|f''\|_{[\alpha,\beta]}$ (dashed), $\|f''\|_{-\infty,[\beta-h_-,\alpha]}$ and $\|f''\|_{-\infty,[\beta,\alpha+h_+]}$ (dotted), and $2|D(f,\beta-h_-,\alpha)|$ and $2|D(f,\beta,\alpha+h_+)|$ (dot-dashed). All figures in this paper are reproducible by LocallyAdaptivePaperFigs.m in GAIL [5].
For any $[\alpha,\beta] \subset [a,b]$ and any $h_{\pm}$ satisfying $0 < \beta - \alpha < h_{\pm} < \mathfrak{h}$, define
$$B(f'', \alpha, \beta, h_-, h_+) := \begin{cases} \max\bigl( \mathfrak{C}(h_-) \|f''\|_{-\infty,[\beta-h_-,\alpha]}, \; \mathfrak{C}(h_+) \|f''\|_{-\infty,[\beta,\alpha+h_+]} \bigr), & a \le \beta - h_- < \alpha + h_+ \le b, \\ \mathfrak{C}(h_-) \|f''\|_{-\infty,[\beta-h_-,\alpha]}, & a \le \beta - h_- < b < \alpha + h_+, \\ \mathfrak{C}(h_+) \|f''\|_{-\infty,[\beta,\alpha+h_+]}, & \beta - h_- < a < \alpha + h_+ \le b. \end{cases} \qquad (6)$$
$$\mathcal{C} := \Bigl\{ f \in W^{2,\infty} : \|f''\|_{[\alpha,\beta]} \le B(f'', \alpha, \beta, h_-, h_+) \text{ for all } [\alpha,\beta] \subset [a,b] \text{ and } h_{\pm} \in (\beta-\alpha, \mathfrak{h}) \Bigr\}. \qquad (7)$$
The set $\mathcal{C}$ is a cone because $f \in \mathcal{C} \implies cf \in \mathcal{C}$ for all real $c$. The integer $n_{\mathrm{init}}$ is the initial number of subintervals in Algorithms $A$ and $M$. The parameter $C$ is some number no less than one for which
$$\lim_{h \to 0} \|f''\|_{[x-h,x+h]} \le C \lim_{h \to 0} \|f''\|_{-\infty,[x-h,x+h]}, \qquad \forall x \in (a,b), \; f \in \mathcal{C}.$$
Increasing either $n_{\mathrm{init}}$ or $C$ expands the cone to include more functions.

Figure 1 depicts the second derivative of a typical function in $W^{2,\infty}$. In this figure $\|f''\|_{-\infty,[\beta-h_-,\alpha]} = |f''(\beta-h_-)| = 0$, which means that the behavior of $f''$ to the left of $[\alpha,\beta]$ cannot help provide an upper bound on $\|f''\|_{[\alpha,\beta]}$. However, $\|f''\|_{-\infty,[\beta,\alpha+h_+]} = |f''(\alpha+h_+)| > 0$, so this $f$ may lie in the cone $\mathcal{C}$ provided that $\mathfrak{C}(h_+)$ is large enough. The possibility of points in $[a,b]$ where $f''$ vanishes motivates the definition of $B(f'', \alpha, \beta, h_-, h_+)$ to depend on the behavior of $f''$ both to the left and to the right of $[\alpha,\beta]$. One may note that if $f''$ vanishes at two points that are close to each other, or at a point that is close to either $a$ or $b$, then $f$ will lie outside $\mathcal{C}$. The definition of "close" depends on $(b-a)/n_{\mathrm{init}}$.

We give an example of a family of functions whose members lie inside $\mathcal{C}$ if they are not too spiky. Consider the following hump-shaped function defined on $[-1,1]$:
Figure 2: (a) The example $f_0$ with $-c = \delta = 0.2$. (b) The functions $0$ and $\pm\gamma f_0$ used to prove (19) (with different choices of $c$ and $\delta$). The case $n = 15$ is shown.

$$f_0(x) = \begin{cases} \dfrac{1}{2\delta^2} \bigl[ 4\delta^2 + (x-c)^2 + (x-c-\delta)|x-c-\delta| - (x-c+\delta)|x-c+\delta| \bigr], & |x-c| \le 2\delta, \\ 0, & \text{otherwise}, \end{cases} \qquad (8)$$
$$f_0''(x) = \begin{cases} \delta^{-2} \bigl[ 1 + \operatorname{sign}(x-c-\delta) - \operatorname{sign}(x-c+\delta) \bigr], & |x-c| \le 2\delta, \\ 0, & \text{otherwise}. \end{cases}$$
Here $c$ and $\delta$ are parameters satisfying $-1 \le c - 2\delta < c + 2\delta \le 1$. This function and its second derivative are shown in Figure 2(a) for $-c = \delta = 0.2$.

If $\delta \ge 2\mathfrak{h}$, then $f_0 \in \mathcal{C}$ for any choice of $C \ge 1$. For $[\alpha,\beta] \subseteq [-1,1]$ and $h_{\pm}$ satisfying the conditions in the definition of $\mathcal{C}$ in (7), it follows that
$$\|f_0''\|_{[\alpha,\beta]} = \frac{1}{\delta^2} = \begin{cases} \|f_0''\|_{-\infty,[\beta,\alpha+h_+]} & \text{if } \alpha \text{ or } \beta \in [c-2\delta, c-1.5\delta] \cup [c-\delta, c-0.5\delta] \cup [c+\delta, c+1.5\delta], \\ \|f_0''\|_{-\infty,[\beta-h_-,\alpha]} & \text{if } \alpha \text{ or } \beta \in [c-1.5\delta, c-\delta] \cup [c-0.5\delta, c+\delta] \cup [c+1.5\delta, c+2\delta]. \end{cases}$$
Thus, $B(f_0'', \alpha, \beta, h_-, h_+) \ge \|f_0''\|_{[\alpha,\beta]}$ for $c - 2\delta \le \beta$ and $\alpha \le c + 2\delta$. For $[\alpha,\beta] \subset [-1, c-2\delta) \cup (c+2\delta, 1]$ we have $\|f_0''\|_{[\alpha,\beta]} = 0$, so $B(f_0'', \alpha, \beta, h_-, h_+) \ge \|f_0''\|_{[\alpha,\beta]}$ automatically. Thus, the definition of the cone is satisfied.

However, if the hump is too narrow, i.e., $\delta < 2\mathfrak{h}$, the function $f_0$ is too spiky to lie in the cone $\mathcal{C}$ regardless of how $C$ is chosen. For $\alpha$, $\beta$, and $h$ satisfying
$$0 < c - 1.5\delta - \alpha = \beta - c + 1.5\delta < 0.5\delta, \qquad \beta - c + 2\delta < h < \mathfrak{h},$$
it follows that $\beta - h < c - 2\delta < \alpha < c - 1.5\delta < \beta < c - \delta < \alpha + h$, whence
$$\|f_0''\|_{-\infty,[\beta-h,\alpha]} = \|f_0''\|_{-\infty,[\beta,\alpha+h]} = 0 < \delta^{-2} = \|f_0''\|_{[\alpha,\beta]}.$$
This violates the definition of $\mathcal{C}$. This example illustrates how the choice of $n_{\mathrm{init}}$, or equivalently $\mathfrak{h}$, limits how narrow a spike may be and determines whether $f_0$ lies in $\mathcal{C}$.
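A small numerical check of this example follows. The formula for $f_0$ below is the reconstruction of (8) used in this chunk, so treat the code as a sketch under that assumption; it confirms the peak value, the vanishing at the support edges, and that $\|f_0''\|_{1/2} = 16$ independently of $\delta$.

```python
def f0(x, c=-0.2, delta=0.2):
    """Hump function of equation (8) (as reconstructed here).

    Support is [c - 2*delta, c + 2*delta]; the peak value is 1 at x = c."""
    if abs(x - c) > 2 * delta:
        return 0.0
    s = lambda t: t * abs(t)     # double antiderivative of 2*sign(t)
    return (4 * delta**2 + (x - c) ** 2 + s(x - c - delta) - s(x - c + delta)) / (2 * delta**2)

def f0pp(x, c=-0.2, delta=0.2):
    """Second derivative: +1/d^2, -1/d^2, +1/d^2 on the three pieces of the support."""
    if abs(x - c) > 2 * delta:
        return 0.0
    sign = lambda t: (t > 0) - (t < 0)
    return (1 + sign(x - c - delta) - sign(x - c + delta)) / delta**2

c, delta = -0.2, 0.2
print(f0(c))                                   # peak value 1 (up to rounding)
print(f0(c - 2 * delta), f0(c + 2 * delta))    # vanishes at the support edges
# Midpoint rule for ||f0''||_{1/2} = (integral of sqrt|f0''|)^2 over [-1, 1]:
m = 4000
q = sum(abs(f0pp(-1 + (k + 0.5) * 2 / m)) ** 0.5 * (2 / m) for k in range(m)) ** 2
print(round(q, 2))                             # 16.0, independent of delta
```

The $L^{1/2}$-quasi-seminorm stays at 16 as the spike narrows, while $\|f_0''\| = \delta^{-2}$ blows up; this gap is precisely why local adaption pays off for spiky-but-cone functions.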
3. The Function Approximation Algorithm, $A$

3.1. Algorithm $A$

The idea of Algorithm $A$ is to use divided differences to provide upper bounds on $\|f''\|_{-\infty,[\beta-h_-,\alpha]}$ and $\|f''\|_{-\infty,[\beta,\alpha+h_+]}$ via (4), which then provide an upper bound on $\|f''\|_{[\alpha,\beta]}$ via the definition of the cone, $\mathcal{C}$, in (7). This in turn yields an upper bound on the spline error via (2). After stating the algorithm, its effectiveness is proven.

Algorithm $A$. For some finite interval $[a,b]$, integer $n_{\mathrm{init}} \ge 5$, and constant $C \ge 1$, let $\mathfrak{h}$ and $\mathfrak{C}(h)$ be defined as in (5). Let $f : [a,b] \to \mathbb{R}$ and $\varepsilon > 0$ be given. Set $n = n_{\mathrm{init}}$ and the iteration number $l = 0$. Define the initial partition of equally spaced points, $x_{0:n}$, and an index set of subintervals:
$$h_0 = \frac{b-a}{n}, \qquad x_i = a + i h_0, \quad i \in 0{:}n, \qquad I = 1{:}(n-1).$$

Step 1. Check for convergence. For all $i \in I$ compute
$$\mathrm{err}_i = \tfrac{1}{8} \mathfrak{C}(3h_l) \, |f(x_{i+1}) - 2f(x_i) + f(x_{i-1})|. \qquad (9)$$
Let $\widetilde{I} = \{i \in I : \mathrm{err}_i > \varepsilon\}$ be the index set for those $\mathrm{err}_i$ that are too large. If $\widetilde{I} = \emptyset$, return the linear spline $A(f,\varepsilon) = S(f, x_{0:n})$ and terminate the algorithm. Otherwise, continue to the next step.

Step 2. Split the subintervals as needed. Update the present partition, $x_{0:n}$, to include the subinterval midpoints
$$\frac{x_{i-2} + x_{i-1}}{2}, \quad \frac{x_{i-1} + x_i}{2}, \quad \frac{x_i + x_{i+1}}{2}, \quad \frac{x_{i+1} + x_{i+2}}{2}, \qquad i \in \widetilde{I}.$$
(The leftmost midpoint is only needed for $i \ge 2$, and the rightmost midpoint is only needed for $i \le n-2$.) Update $I$ to consist of the new indices corresponding to the old points
$$x_{i-1}, \quad \frac{x_{i-1} + x_i}{2}, \quad \frac{x_i + x_{i+1}}{2}, \quad x_{i+1}, \qquad i \in \widetilde{I}.$$
(The point $x_{i-1}$ is only included for $i \ge 2$, and the point $x_{i+1}$ is only included for $i \le n-2$.) Set $l \leftarrow l + 1$ and $h_l = h_{l-1}/2$. Return to Step 1.
Theorem 1. Algorithm $A$ defined above satisfies (APP) for functions in the cone $\mathcal{C}$ defined in (7).

Proof.
For every iteration $l$ and every $i \in I$, the definitions in this algorithm imply that $x_i - x_{i-1} = x_{i+1} - x_i = h_l = 2^{-l} h_0$, and
$$\mathrm{err}_i = \tfrac{1}{4} \mathfrak{C}(3h_l) h_l^2 \, |D(f, x_{i-1}, x_{i+1})| \quad \text{by (3)} \qquad (10)$$
$$\ge \tfrac{1}{8} \mathfrak{C}(3h_l) h_l^2 \, \|f''\|_{-\infty,[x_{i-1},x_{i+1}]} \quad \text{by (4)}. \qquad (11)$$
We show that when all $\mathrm{err}_i$ get small enough, Algorithm $A$ terminates successfully.

For all $x \in [a,b]$, let $I_{x,l}$ be the closed interval of width $h_l$ containing $x$ that might arise at some stage in Algorithm $A$ as $[x_{i_l-1}, x_{i_l}]$ for some $i_l \in 1{:}n$. (The dependence of $n$ on $l$ is suppressed.) Specifically, this interval is defined for all $x \in [a,b]$ and $l \in \mathbb{N}_0$ as
$$I_{x,l} := [a + j h_l, \, a + (j+1) h_l], \qquad j = \min\Bigl( \Bigl\lfloor \frac{x-a}{h_l} \Bigr\rfloor, \; 2^l n_{\mathrm{init}} - 1 \Bigr). \qquad (12)$$
Let $\ell(x)$ be defined such that $I_{x,\ell(x)}$ is the final subinterval in Algorithm $A$ that contains $x$ when the algorithm terminates. We need to establish that $\|f - S(f)\|_{I_{x,\ell(x)}} \le \varepsilon$ for every $x \in [a,b]$.

Fix $x \in [a + h_0, b - h_0]$. The proof for $x \in [a, a + h_0) \cup (b - h_0, b]$ is similar. By (11) there exists some $l_- \le \ell(x)$ for which $I_{x,l_-} = [x_{i_{l_-}-1}, x_{i_{l_-}}]$ and
$$\tfrac{1}{8} \mathfrak{C}(3h_{l_-}) h_{l_-}^2 \, \|f''\|_{-\infty,[x_{i_{l_-}-2}, \, x_{i_{l_-}}]} \le \mathrm{err}_{i_{l_-}-1} \le \varepsilon. \qquad \text{(13a)}$$
There also exists an $l_+ \le \ell(x)$ such that $I_{x,l_+} = [x_{i_{l_+}-1}, x_{i_{l_+}}]$ and
$$\tfrac{1}{8} \mathfrak{C}(3h_{l_+}) h_{l_+}^2 \, \|f''\|_{-\infty,[x_{i_{l_+}}, \, x_{i_{l_+}+2}]} \le \mathrm{err}_{i_{l_+}+1} \le \varepsilon. \qquad \text{(13b)}$$
Noting that $x_{i_{l_\pm}-1} \le x_{i_{\ell(x)}-1} < x_{i_{\ell(x)}} \le x_{i_{l_\pm}}$, we may conclude that
$$\|f - S(f)\|_{I_{x,\ell(x)}} \le \frac{h_{\ell(x)}^2}{8} \|f''\|_{I_{x,\ell(x)}} \quad \text{by (2)}$$
$$\le \frac{h_{\ell(x)}^2}{8} B\bigl(f'', x_{i_{\ell(x)}-1}, x_{i_{\ell(x)}}, \, x_{i_{\ell(x)}} - x_{i_{l_-}-2}, \, x_{i_{l_+}+2} - x_{i_{\ell(x)}-1}\bigr) \quad \text{by (7)}$$
$$\le \frac{h_{\ell(x)}^2}{8} \max\Bigl( \mathfrak{C}(x_{i_{\ell(x)}} - x_{i_{l_-}-2}) \, \|f''\|_{-\infty,[x_{i_{l_-}-2}, \, x_{i_{\ell(x)}-1}]}, \; \mathfrak{C}(x_{i_{l_+}+2} - x_{i_{\ell(x)}-1}) \, \|f''\|_{-\infty,[x_{i_{\ell(x)}}, \, x_{i_{l_+}+2}]} \Bigr)$$
by the definition of $B$ in (6),
$$\le \max\Bigl( \frac{h_{l_-}^2}{8} \mathfrak{C}(3h_{l_-}) \, \|f''\|_{-\infty,[x_{i_{l_-}-2}, \, x_{i_{l_-}}]}, \; \frac{h_{l_+}^2}{8} \mathfrak{C}(3h_{l_+}) \, \|f''\|_{-\infty,[x_{i_{l_+}}, \, x_{i_{l_+}+2}]} \Bigr)$$
because $h_{\ell(x)} \le h_{l_\pm}$ and $\mathfrak{C}$ is non-decreasing,
$$\le \varepsilon \quad \text{by (13)}.$$
This concludes the proof.

Figure 3: (a) The nonuniform sampling density of Algorithm $A$ for the input function $-f_0$ defined by $\delta = 0.2$ and $c = -0.2$. A total of 3 iterations and 65 points are used to meet the error tolerance of $0.02$. We have chosen $n_{\mathrm{init}} = 20$ and $C = 10$. (b) The same situation as in (a), but now with Algorithm $M$. Still 3 iterations, but only 43 nonuniform sampling points are needed to obtain the minimum of $-f_0$.

Figure 3(a) displays the function $-f_0$ defined in (8) for a certain choice of parameters, along with the data used to compute the linear spline approximation $A(-f_0, 0.02)$ by the algorithm described above. Note that $-f_0$ is sampled less densely where it is flat.

3.2. The Computational Cost of Algorithm $A$

In this section, we investigate the computational cost of our locally adaptive algorithm. Recall the definitions of $h_l$, $I_{x,l}$, and $\ell(x)$ from the previous subsection. Let $\bar{I}_{x,l}$ be a similar interval with generally five times the width of $I_{x,l}$:
$$\bar{I}_{x,l} = \bigl[ a + \max(0, j-2) h_l, \; a + \min(j+3, \, 2^l n_{\mathrm{init}}) h_l \bigr] \supset I_{x,l}, \qquad (14)$$
with the same $j$ as in (12) above. Let
$$L(x) = \min\Bigl\{ l \in \mathbb{N}_0 : \tfrac{1}{8} \mathfrak{C}(3h_l) h_l^2 \, \|f''\|_{\bar{I}_{x,l}} \le \varepsilon \Bigr\}. \qquad (15)$$
Note that $L(x)$ does depend on $f$ and $\varepsilon$, although this dependence is suppressed in the notation.

We now show that $\ell(x) \le L(x)$. At each iteration of Algorithm $A$, $x$ lies in $I_{x,l}$ for some $l$, and by the time Algorithm $A$ terminates, all values of $l = 0, \ldots, \ell(x)$ are realized. If $\ell(x) > L(x)$, then at iteration $L(x)$ the interval $I_{x,L(x)}$ must be split in Step 2 of $A$. So, $I_{x,L(x)}$ has width $h_{L(x)}$ and corresponds to $[x_{i-1}, x_i]$ for some $i$. We assume that $i \in 3{:}(n-2)$; the other cases have a similar proof. According to Step 2 of Algorithm $A$, the only way for $[x_{i-1}, x_i]$ to be split is if $\mathrm{err}_{i-2}$, $\mathrm{err}_{i-1}$, $\mathrm{err}_i$, or $\mathrm{err}_{i+1}$ is larger than $\varepsilon$. However, in the proof of Theorem 1 it is noted that for $k \in \{-2, -1, 0, 1\}$,
$$\mathrm{err}_{i+k} = \tfrac{1}{4} \mathfrak{C}(3h_{L(x)}) h_{L(x)}^2 \, |D(f, x_{i-1+k}, x_{i+1+k})| \quad \text{by (10)}$$
$$\le \tfrac{1}{8} \mathfrak{C}(3h_{L(x)}) h_{L(x)}^2 \, \|f''\|_{[x_{i-1+k}, \, x_{i+1+k}]} \quad \text{by (4)}$$
$$\le \tfrac{1}{8} \mathfrak{C}(3h_{L(x)}) h_{L(x)}^2 \, \|f''\|_{\bar{I}_{x,L(x)}} \le \varepsilon \quad \text{by (14) and (15)}. \qquad (16)$$
This is a contradiction, so in fact $\ell(x) \le L(x)$, which is used to prove an upper bound on the computational cost of Algorithm $A$.
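The refinement mechanism analyzed above can be sketched in a few lines. The Python sketch below is a simplification, not the GAIL implementation: it replaces the factor $\mathfrak{C}(3h_l)$ by a fixed inflation constant, handles boundary subintervals crudely, and accepts or bisects each subinterval independently, but it exhibits the essential local adaption of Algorithm $A$, sampling densely only where the second derivative is large.

```python
import math

def adaptive_spline_nodes(f, a, b, eps, n_init=20, inflation=2.0, max_depth=30):
    """Simplified sketch in the spirit of Algorithm A: accept a subinterval
    once an inflated second-difference error estimate (cf. (9)) falls below
    eps, otherwise bisect it. 'inflation' crudely stands in for C(3 h_l)."""
    nodes = set()

    def refine(x0, x2, depth):
        x1 = (x0 + x2) / 2
        nodes.update((x0, x1, x2))
        est = inflation * abs(f(x0) - 2 * f(x1) + f(x2)) / 8
        if est > eps and depth < max_depth:
            refine(x0, x1, depth + 1)
            refine(x1, x2, depth + 1)

    h0 = (b - a) / n_init
    for i in range(n_init):
        refine(a + i * h0, a + (i + 1) * h0, 0)
    return sorted(nodes)

# A sharp Gaussian bump is sampled densely only near its spike.
spike = lambda x: math.exp(-((x / 0.05) ** 2))
pts = adaptive_spline_nodes(spike, -1.0, 1.0, 1e-4)
near = sum(1 for x in pts if abs(x) <= 0.2)
far = sum(1 for x in pts if abs(x) > 0.2)
print(near > far)   # True: sampling concentrates where |f''| is large
```

The nonuniform node distribution mirrors Figure 3(a): flat regions keep the initial coarse spacing while the spike is resolved down to the tolerance.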
Theorem 2. Let $\mathrm{cost}(A, f, \varepsilon)$ denote the number of function evaluations required by $A(f, \varepsilon)$. This computational cost has the following upper bound:
$$\mathrm{cost}(A, f, \varepsilon) \le \frac{1}{h_0} \int_a^b 2^{L(x)} \, \mathrm{d}x + 1 = \int_a^b \frac{\mathrm{d}x}{h_{L(x)}} + 1,$$
where $L(x)$ is defined in (15).
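Theorem 2 predicts cost growth like $\varepsilon^{-1/2}$, so quadrupling the accuracy demand should roughly double the number of subintervals. The following self-contained Python sketch (taking $\mathfrak{C}(h) \approx 1$, an assumption for simplicity) counts the subintervals produced by bisection under the local acceptance test $h^2 \|f''\|/8 \le \varepsilon$.

```python
import math

def refine_count(fpp_bound, a, b, eps, n0=5):
    """Count subintervals when each is bisected until h^2/8 * max|f''| <= eps,
    the local acceptance test behind Theorem 2 (with C(h) ~ 1 for simplicity)."""
    stack = [(a + i * (b - a) / n0, a + (i + 1) * (b - a) / n0) for i in range(n0)]
    count = 0
    while stack:
        x0, x1 = stack.pop()
        h = x1 - x0
        if h * h / 8 * fpp_bound(x0, x1) <= eps:
            count += 1
        else:
            m = (x0 + x1) / 2
            stack += [(x0, m), (m, x1)]
    return count

# f(x) = exp(x) on [0, 1]: |f''| on [x0, x1] is at most exp(x1).
bound = lambda x0, x1: math.exp(x1)
n1 = refine_count(bound, 0.0, 1.0, 1e-6)
n2 = refine_count(bound, 0.0, 1.0, 1e-6 / 4)
print(n1, n2)   # n2 is about twice n1, matching the O(eps^{-1/2}) cost
```

Because bisection halves $h$, the count exactly doubles when $\varepsilon$ is divided by 4, which is the discrete analogue of the $\sqrt{1/\varepsilon}$ rate in the theorem.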
Proof. Let $x_{0:n}$ be the final partition when $A(f,\varepsilon)$ successfully terminates. Note that $\ell(x)$ is constant for $x \in I_{x_{i-1},\ell(x_{i-1})} = [x_{i-1}, x_i]$, $i \in 1{:}n$. Furthermore, since $x_i - x_{i-1} = h_0 2^{-\ell(x)}$ there, $\int_{x_{i-1}}^{x_i} 2^{\ell(x)} \, \mathrm{d}x = h_0$. Then the number of function values required is
$$n + 1 = 1 + \sum_{i=1}^n 1 = 1 + \sum_{i=1}^n \frac{1}{h_0} \int_{x_{i-1}}^{x_i} 2^{\ell(x)} \, \mathrm{d}x = 1 + \frac{1}{h_0} \int_a^b 2^{\ell(x)} \, \mathrm{d}x.$$
Noting that $\ell(x) \le L(x)$ establishes the formula for $\mathrm{cost}(A, f, \varepsilon)$.

From the definition of $L(x)$ in (15), the acceptance criterion fails at level $L(x) - 1$, so
$$\frac{1}{h_{L(x)}} = \frac{2}{h_{L(x)-1}} < 2 \sqrt{\frac{\mathfrak{C}(3h_{L(x)-1}) \, \|f''\|_{\bar{I}_{x,L(x)-1}}}{8\varepsilon}} = \sqrt{\frac{\mathfrak{C}(3h_{L(x)-1}) \, \|f''\|_{\bar{I}_{x,L(x)-1}}}{2\varepsilon}}.$$
As $\varepsilon \to 0$, $L(x) \to \infty$, $h_{L(x)} \to 0$, and $\|f''\|_{\bar{I}_{x,L(x)-1}}$ approaches $|f''(x)|$. Thus, the small-$\varepsilon$ asymptotic upper bound on the computational cost is
$$\mathrm{cost}(A, f, \varepsilon) \lesssim \int_a^b \sqrt{\frac{\mathfrak{C}(0) \, |f''(x)|}{2\varepsilon}} \, \mathrm{d}x + 1 = \sqrt{\frac{\mathfrak{C}(0) \, \|f''\|_{1/2}}{2\varepsilon}} + 1 \le (b-a) \sqrt{\frac{\mathfrak{C}(0) \, \|f''\|}{2\varepsilon}} + 1 \quad \text{by (17a) below}.$$

For functions in the cone $\mathcal{C}$, the (quasi-)seminorms $\|f''\|_{1/2}$ and $\|f''\|$ are equivalent, but for functions in $W^{2,\infty}$ they are not, as shown in the following proposition.

Proposition 3. The quantities $\|f''\|_{1/2}$ and $\|f''\|$ bound each other as follows:
$$(b-a)^2 \, \|f''\|_{-\infty} \le \|f''\|_{1/2} \le (b-a)^2 \, \|f''\| \qquad \forall f \in W^{2,\infty}, \qquad \text{(17a)}$$
$$\frac{4\mathfrak{h}^2}{27C} \, \|f''\| \le \|f''\|_{1/2} \qquad \forall f \in \mathcal{C}, \qquad \text{(17b)}$$
$$\sup_{f \in W^{2,\infty} : \, \|f''\|_{1/2} \le 1} \|f''\| = \infty. \qquad \text{(17c)}$$

Proof.
The first inequality, (17a), follows from the definitions of the (quasi-)norms: for any $[\alpha,\beta] \subseteq [a,b]$,
$$(\beta-\alpha)^2 \|f''\|_{-\infty,[\alpha,\beta]} = \biggl\{ \sqrt{\|f''\|_{-\infty,[\alpha,\beta]}} \int_\alpha^\beta \mathrm{d}x \biggr\}^2 \le \biggl\{ \int_\alpha^\beta \sqrt{|f''(x)|} \, \mathrm{d}x \biggr\}^2 = \|f''\|_{1/2,[\alpha,\beta]}$$
$$\le \biggl\{ \sqrt{\|f''\|_{[\alpha,\beta]}} \int_\alpha^\beta \mathrm{d}x \biggr\}^2 = (\beta-\alpha)^2 \|f''\|_{[\alpha,\beta]}. \qquad (18)$$
The second inequality, (17b), comes from the cone definition. Since $\|f''\| = \|f''\|_{[\alpha,\beta]}$ for some interval $[\alpha,\beta]$ whose width can be made arbitrarily small, we have
$$\|f''\|_{[\alpha,\beta]} \le \inf\bigl\{ B(f'', \alpha, \beta, h, h) : h \in (\beta-\alpha, \mathfrak{h}) \bigr\} \quad \text{by (7)}$$
$$\le \inf\bigl\{ \mathfrak{C}(h) \max\bigl( \|f''\|_{-\infty,[\beta-h,\alpha]}, \, \|f''\|_{-\infty,[\beta,\alpha+h]} \bigr) : h \in (\beta-\alpha, \mathfrak{h}) \bigr\}$$
$$\le \inf\biggl\{ \frac{\mathfrak{C}(h) \, \|f''\|_{1/2}}{(h - (\beta-\alpha))^2} : h \in (\beta-\alpha, \mathfrak{h}) \biggr\} \quad \text{by (18)},$$
and letting $\beta - \alpha \to 0$ and taking $h = 2\mathfrak{h}/3$ yields $\|f''\| \le (27C/(4\mathfrak{h}^2)) \, \|f''\|_{1/2}$. The unboundedness in (17c) is exhibited by the spiky functions $f_0$ in (8), for which $\|f_0''\|_{1/2} = 16$ while $\|f_0''\| = \delta^{-2} \to \infty$ as $\delta \to 0$.

3.3. The Computational Complexity of the Approximation Problem

The next theorem gives lower bounds on the cost of any algorithm $A^*$ based on function values. Let $\mathcal{B}^{1/2}_\sigma := \{f \in W^{2,\infty} : \|f''\|_{1/2} \le \sigma\}$ and $\mathcal{B}^{\infty}_\sigma := \{f \in W^{2,\infty} : \|f''\| \le \sigma\}$ denote balls of input functions.

Theorem 4.

i. If $A^*$ solves (APP) for all $f \in \mathcal{B}^{1/2}_\sigma$ and some $\varepsilon < \sigma/16$, then
$$\mathrm{cost}(A^*, f, \varepsilon) = \infty. \qquad \text{(19a)}$$
ii. If $A^*$ solves (APP) for all $f \in \mathcal{B}^{\infty}_\sigma$ and all $\varepsilon > 0$, then
$$\mathrm{cost}(A^*, f, \varepsilon) \ge \frac{b-a}{4} \sqrt{\frac{\sigma}{\varepsilon}} - 1. \qquad \text{(19b)}$$
iii. If $A^*$ satisfies (APP) for all $f \in \mathcal{C} \cap \mathcal{B}^{1/2}_\sigma$ and all $\varepsilon > 0$, then
$$\mathrm{cost}(A^*, f, \varepsilon) \ge \frac{1}{4} \sqrt{\frac{(C-1)\sigma}{(C+1)\varepsilon}} - 1. \qquad \text{(20a)}$$
iv. If $A^*$ satisfies (APP) for all $f \in \mathcal{C} \cap \mathcal{B}^{\infty}_\sigma$ and all $\varepsilon > 0$, then
$$\mathrm{cost}(A^*, f, \varepsilon) \ge \frac{b-a}{4} \sqrt{\frac{(C-1)\sigma}{(C+1)\varepsilon}} - 1. \qquad \text{(20b)}$$

Note, by comparing (19a) and (20a), that the lower complexity bound is significantly altered by restricting the set of input functions from the whole ball $\mathcal{B}^{1/2}_\sigma$ to the intersection of that ball with the cone $\mathcal{C}$. Also note that the lower bounds above assume that the radius of the ball, $\sigma$, is known a priori, whereas for our Algorithm $A$, no bound on a norm of $f''$ is provided as input. However, the computational cost of Algorithm $A$ is asymptotically the same as the computational cost of the best possible algorithm, $A^*$ in (20), as $\varepsilon \to 0$.

Proof. The lower bounds are proved by constructing fooling functions for which algorithm $A^*$ succeeds, and then showing that at least a certain number of samples must be used. The proofs of (19) are simpler, so we start with them. Let $A^*$ be a successful algorithm for all $f$ in the relevant ball, and consider the partition $x_{0:n+1}$, where $x_{1:n}$ are the data sites used to compute $A^*(0, \varepsilon)$. We now allow the possibility of $a = x_0 = x_1$ and $x_n = x_{n+1} = b$. Choose any $j = 1, \ldots, n+1$ with $x_j - x_{j-1} \ge (b-a)/(n+1)$. Let $f_0$ be defined as in (8) with $c = (x_j + x_{j-1})/2$ and $\delta = (b-a)/[4(n+1)]$.

For any real $\gamma$, it follows that $\gamma f_0(x_i) = 0$ for $i = 0, \ldots, n+1$. Figure 2(b) illustrates this situation. Since $0$ and $\pm\gamma f_0$ share the same values at the data sites, they must share the same approximation: $A^*(\pm\gamma f_0, \varepsilon) = A^*(0, \varepsilon)$. Moreover, $\mathrm{cost}(A^*, 0, \varepsilon) = \mathrm{cost}(A^*, \pm\gamma f_0, \varepsilon) = n$. Since the approximations of $0$, $-\gamma f_0$, and $\gamma f_0$ are identical, $\gamma$ must be no greater than $\varepsilon$:
$$\varepsilon \ge \max\bigl( \|\gamma f_0 - A^*(\gamma f_0, \varepsilon)\|, \; \|{-\gamma} f_0 - A^*(-\gamma f_0, \varepsilon)\| \bigr) = \max\bigl( \|\gamma f_0 - A^*(0,\varepsilon)\|, \; \|{-\gamma} f_0 - A^*(0,\varepsilon)\| \bigr)$$
$$\ge \tfrac{1}{2} \bigl[ \|\gamma f_0 - A^*(0,\varepsilon)\| + \|{-\gamma} f_0 - A^*(0,\varepsilon)\| \bigr] \ge \tfrac{1}{2} \|\gamma f_0 - (-\gamma f_0)\| = \|\gamma f_0\| = \gamma$$
$$= \begin{cases} \|\gamma f_0''\|_{1/2}/16, \\ \delta^2 \|\gamma f_0''\| = (b-a)^2 \|\gamma f_0''\| / [16(n+1)^2], \end{cases}$$
since $\|f_0''\|_{1/2} = 16$ and $\|f_0''\| = \delta^{-2}$.
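The fooling-function construction can be replayed numerically. In the Python sketch below, the data sites are arbitrary hypothetical values, and `bump` is the reconstruction of (8) used in this chunk; the hump hides in the widest gap between sites, so any algorithm seeing only these sites cannot distinguish it from the zero function.

```python
def bump(x, c, delta):
    """Hump (8) (as reconstructed here): zero outside [c-2d, c+2d], peak 1 at c."""
    if abs(x - c) > 2 * delta:
        return 0.0
    s = lambda t: t * abs(t)
    return (4 * delta**2 + (x - c) ** 2 + s(x - c - delta) - s(x - c + delta)) / (2 * delta**2)

# Any algorithm that samples only at 'sites' cannot distinguish 0 from +-bump:
a, b = 0.0, 1.0
sites = sorted([0.31, 0.05, 0.55, 0.62, 0.77, 0.93])   # hypothetical data sites
pts = [a] + sites + [b]
n = len(sites)
j = max(range(1, len(pts)), key=lambda k: pts[k] - pts[k - 1])  # widest gap
c = (pts[j] + pts[j - 1]) / 2
delta = (b - a) / (4 * (n + 1))
print(all(bump(x, c, delta) == 0.0 for x in pts))       # True: invisible to the data
print(abs(bump(c, c, delta) - 1.0) < 1e-9)              # True: yet its sup-norm is 1
```

Since the gap chosen has width at least $(b-a)/(n+1)$ and the hump's support has width $4\delta = (b-a)/(n+1)$, the support always fits strictly between two consecutive sites.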
The top inequality cannot be satisfied unless $\sigma = \|\gamma f_0''\|_{1/2}$ is small enough, which establishes (19a). Solving the bottom inequality for $n$ in terms of $\sigma = \|\gamma f_0''\|$ establishes (19b).

Now, we prove the lower complexity bounds (20), assuming that $A^*$ is a successful algorithm for all $f \in \mathcal{C}$ in the relevant ball. Let $f_1$ be defined as follows:
$$f_1(x) = \tfrac{1}{2} x^2, \quad f_1''(x) = 1, \quad x \in [a,b]; \qquad \|f_1''\|_{1/2} = (b-a)^2, \quad \|f_1''\| = 1.$$
Since $f_1''$ is constant, it follows that $f_1 \in \mathcal{C}$, and $A^*$ successfully approximates $\gamma f_1$ for any $\gamma \ge 0$. Consider the partition $x_{0:n+1}$, where $x_{1:n}$ are the data sites used to compute $A^*(\gamma f_1, \varepsilon)$, and we again allow the possibility of $a = x_0 = x_1$ and $x_n = x_{n+1} = b$. Again choose any $j = 1, \ldots, n+1$ with $x_j - x_{j-1} \ge (b-a)/(n+1)$, and let $f_0$ be defined as in (8) with $c = (x_j + x_{j-1})/2$ and $\delta = (b-a)/[4(n+1)]$. We construct two fooling functions:
$$f_\pm = f_1 \pm \widetilde{\gamma} f_0, \qquad \widetilde{\gamma} = \frac{(C-1)\delta^2}{C+1}, \qquad \|f_\pm''\| = 1 + \widetilde{\gamma}\delta^{-2} = \frac{2C}{C+1},$$
$$\|f_\pm''\|_{-\infty,[\alpha,\beta]} \ge 1 - \widetilde{\gamma}\delta^{-2} = \frac{2}{C+1} = \frac{\|f_\pm''\|}{C} \qquad \forall [\alpha,\beta] \subseteq [a,b].$$
The above calculations show that $\gamma f_\pm \in \mathcal{C}$ for all real $\gamma$. Moreover, the definition of $f_\pm$ ensures that $A^*(\gamma f_1, \varepsilon) = A^*(\gamma f_\pm, \varepsilon)$, and $\mathrm{cost}(A^*, \gamma f_1, \varepsilon) = \mathrm{cost}(A^*, \gamma f_\pm, \varepsilon) = n$. Analogously to the argument above, we show that $\gamma\widetilde{\gamma}$ must be no larger than $\varepsilon$:
$$\varepsilon \ge \max\bigl( \|\gamma f_+ - A^*(\gamma f_+, \varepsilon)\|, \; \|\gamma f_- - A^*(\gamma f_-, \varepsilon)\| \bigr) \ge \tfrac{1}{2} \bigl[ \|\gamma f_+ - A^*(\gamma f_+, \varepsilon)\| + \|\gamma f_- - A^*(\gamma f_-, \varepsilon)\| \bigr]$$
$$= \tfrac{1}{2} \bigl[ \|\gamma f_+ - A^*(\gamma f_1, \varepsilon)\| + \|\gamma f_- - A^*(\gamma f_1, \varepsilon)\| \bigr] \ge \tfrac{1}{2} \|\gamma f_+ - \gamma f_-\| = \|\gamma \widetilde{\gamma} f_0\| = \gamma \widetilde{\gamma}$$
$$= \begin{cases} \|\gamma f_1''\|_{1/2}/(b-a)^2, \\ \|\gamma f_1''\|, \end{cases} \times \frac{(C-1)(b-a)^2}{16(C+1)(n+1)^2}.$$
Substituting $\|\gamma f_1''\|_{1/2} = \sigma$ in the top case and $\|\gamma f_1''\| = \sigma$ in the bottom case, and then solving for $n$, yields the two bounds in (20).

4. The Minimization Algorithm, $M$

4.1. Algorithm $M$

Our minimization algorithm $M$ relies on the derivations in the previous sections. The main departure from Algorithm $A$ is the stopping criterion: it is unnecessary to approximate $f$ accurately everywhere, only where $f$ is small.

Algorithm $M$. For some finite interval $[a,b]$, integer $n_{\mathrm{init}} \ge 5$, and constant $C \ge 1$, let $\mathfrak{h}$ and $\mathfrak{C}(h)$ be defined as in (5). Let $f : [a,b] \to \mathbb{R}$ and $\varepsilon > 0$ be given. Set $n = n_{\mathrm{init}}$, and define the initial partition of equally spaced points, $x_{0:n}$, and certain index sets of subintervals:
$$x_i = a + i \, \frac{b-a}{n}, \quad i \in 0{:}n, \qquad I_+ = 2{:}(n-1), \qquad I_- = 1{:}(n-2).$$
Compute $\widehat{M} = \min_{i \in 0:n} f(x_i)$. For $s \in \{+, -\}$ do the following.

Step 1. Check for convergence. Compute $\mathrm{err}_i$ for all $i \in I_\pm$ according to (9). Let $\widetilde{I}_s = \{i \in I_s : \mathrm{err}_i > \varepsilon\}$.
Next compute
\[
\widehat{\mathrm{err}}_{i,s} := \mathrm{err}_i + \widehat{M} - \min\bigl( f(x_{i-2s}), f(x_{i-s}) \bigr) \quad \forall i \in \widetilde{I}_s,
\]
\[
\widehat{I}_s = \Bigl\{ i \in \widetilde{I}_s : \widehat{\mathrm{err}}_{i,s} > \varepsilon \ \text{ or }\ \bigl( i - 3s \in \widetilde{I}_{-s} \ \&\ \widehat{\mathrm{err}}_{i-3s,-s} > \varepsilon \bigr) \Bigr\},
\]
where $s$ is interpreted as $\pm 1$ in the subscripts. If $\widehat{I}_+ \cup \widehat{I}_- = \emptyset$, return $M(f,\varepsilon) = \widehat{M}$ and terminate the algorithm. Otherwise, continue to the next step.

Step 2. Split the subintervals as needed. Update the present partition, $x_{0:n}$, to include the subinterval midpoints
\[
\frac{x_{i-2s} + x_{i-s}}{2}, \quad \frac{x_{i-s} + x_i}{2} \qquad \forall i \in \widehat{I}_s.
\]
(The point $(x_{i-2} + x_{i-1})/2$ is only included for $i \ge 2$, and the point $(x_{i+1} + x_{i+2})/2$ is only included for $i \le n-2$.) Update $n$, $\widehat{M}$, and the index sets $I_\pm$, the latter to consist of the new indices corresponding to the old points $x_{i-2s}$, $x_{i-s}$, and $x_i$ for $i \in \widehat{I}_s$. (The point $x_{i-2}$ is only included for $i \ge 2$, and the point $x_{i+1}$ is only included for $i \le n-1$.) Return to Step 1.

Theorem 5. Algorithm M defined above satisfies (MIN) for functions in the cone $\mathcal{C}$ defined in (7).

Proof. The proof of success of Algorithm $M$ is similar to that for Algorithm $A$; here we give the highlights. We use the notation $I_{x,l}$ introduced in (12) and analogously define $\widetilde{\ell}(x)$ such that $I_{x,\widetilde{\ell}(x)}$ is the final subinterval of Algorithm $M$ containing $x$ when the algorithm terminates. For a fixed $x \in [a,b]$ we argue as in the proof of Theorem 1 that there exist $l_\pm \le l^* \le \widetilde{\ell}(x)$ such that $I_{x,l^*} = [x_{i_{l^*}-1}, x_{i_{l^*}}]$, $x_{i_{l_-}-1} \le x_{i_{l^*}-1} \le x_{i_{l^*}} \le x_{i_{l_+}}$, and
\[
\tfrac{1}{8}\, C(3h_{l_-})\, h_{l_-}^2 \|f''\|_{[x_{i_{l_-}-3},\, x_{i_{l_-}-1}]} + \widehat{M}_{l^*} - \min\bigl( f(x_{i_{l^*}-1}), f(x_{i_{l^*}}) \bigr) \le \varepsilon,
\]
\[
\tfrac{1}{8}\, C(3h_{l_+})\, h_{l_+}^2 \|f''\|_{[x_{i_{l_+}},\, x_{i_{l_+}+2}]} + \widehat{M}_{l^*} - \min\bigl( f(x_{i_{l^*}-1}), f(x_{i_{l^*}}) \bigr) \le \varepsilon,
\]
where $\widehat{M}_l$ denotes the value of $\widehat{M}$ at iteration $l \in \mathbb{N}$.
By the definition of $\mathcal{C}$ in (7), this then implies that
\[
\widehat{M}_{l^*} - \min_{x_{i_{l^*}-1} \le x \le x_{i_{l^*}}} f(x) \le \widehat{M}_{l^*} - \min\bigl( f(x_{i_{l^*}-1}), f(x_{i_{l^*}}) \bigr) + \tfrac{1}{8}\, h_{l^*}^2 \|f''\|_{[x_{i_{l^*}-1},\,x_{i_{l^*}}]} \le \varepsilon. \qquad (21)
\]
Further iterations of the algorithm can only bring $\widehat{M}_l$ closer to the solution, $\min_{a \le x \le b} f(x)$.

Figure 3(b) displays the same function $-f$ as in Figure 3(a), but this time with the sampling points used for minimization. Here $M(-f, 0.02)$ uses only 43 points, whereas $A(-f, 0.02)$ uses 65 points. This is because $-f$ does not need to be approximated accurately where its value is far from the minimum.

4.1. The Computational Cost of Algorithm M

The derivation of an upper bound on the cost of Algorithm $M$ proceeds in a similar manner as that for Algorithm $A$. There are essentially two reasons that a subinterval $[x_{i-1}, x_i]$ need not be split further. The first reason is the same as that for Algorithm $A$: the function being minimized is approximated on $[x_{i-1}, x_i]$ with an error no greater than the tolerance $\varepsilon$. This is reflected in the definition of $\widetilde{I}_\pm$ in Step 1 of Algorithm $M$. The second reason is that, although the spline approximation error on $[x_{i-1}, x_i]$ is larger than $\varepsilon$, the function values on that subinterval significantly exceed the minimum of the function over $[a,b]$. This is reflected in the definition of $\widehat{I}_\pm$ in Step 1 of Algorithm $M$. Our definition of $\widetilde{L}(x)$ reflects these two reasons. Let $x^*$ be some point where the minimum of $f$ is attained, i.e., $f(x^*) = \min_{a \le x \le b} f(x)$.
Let
\[
\widetilde{L}(x) = \min\bigl( L(x), \widehat{L}(x) \bigr), \quad x \in [a,b], \qquad (22)
\]
where $L(x)$ is defined above in (15),
\[
\widehat{L}(x) = \min\biggl\{ l \in \mathbb{N} : \Bigl\{ \bigl[ \tfrac{1}{8}\, C(3h_l) + 2 \bigr] \|f''\|_{\widetilde{I}_{x,l}} + \tfrac{1}{8} \|f''\|_{I_{x^*,l}} \Bigr\} h_l^2 + 2 |f'(x)|\, h_l + [f(x^*) - f(x)] \le 0 \biggr\}, \qquad (23)
\]
and $\widetilde{I}_{x,l}$ is similar to $I_{x,l}$, but with generally seven times the width:
\[
\widetilde{I}_{x,l} = \bigl[ a + \max(0,\, j-4)\, h_l,\ a + \min(j+3,\, 2^l n_{\mathrm{init}})\, h_l \bigr] \supseteq I_{x,l},
\]
with the same $j$ as in (12) above. Note that $\widehat{L}(x)$ does not depend on $\varepsilon$, whereas $L(x)$ does. As is the case with $L(x)$, both $\widehat{L}(x)$ and $\widetilde{L}(x)$ depend on $f$, although this dependence is suppressed in the notation.

Theorem 6. Denote by $\mathrm{cost}(M, f, \varepsilon)$ the number of function evaluations required by $M(f, \varepsilon)$. This computational cost is bounded as follows:
\[
\mathrm{cost}(M, f, \varepsilon) \le \frac{1}{\bar{h}} \int_a^b 2^{\widetilde{L}(x)} \, \mathrm{d}x + 1,
\]
where $\widetilde{L}(x)$ is defined in (22).

Proof. Using the same argument as for Theorem 2, we only need to show that $\widetilde{\ell}(x) \le \widetilde{L}(x)$ for all $x \in [a,b]$. At each iteration of Algorithm $M$, the index sets $I_\pm$ are both subsets of the index set $I$ for the corresponding iteration of Algorithm $A$. Thus $\widetilde{\ell}(x) \le L(x)$ by the same argument as used to prove Theorem 2. It remains to show that $\widetilde{\ell}(x) \le \widehat{L}(x)$.

We will show that $\widehat{L}(x) < \widetilde{\ell}(x) \le L(x)$ for any fixed $x$ leads to a contradiction. If $\widehat{L}(x) < \widetilde{\ell}(x)$, then at the $\widehat{L}(x)$th iteration, $I_{x,\widehat{L}(x)} = [x_{i-1}, x_i]$ for some $i$ must be split in Step 2 of $M$, where $x_i - x_{i-1} = h_{\widehat{L}(x)} = 2^{-\widehat{L}(x)}\bar{h}$. This means that one or more of the following must exceed $\varepsilon$:
\[
\widehat{\mathrm{err}}_{i+2,+}, \quad \widehat{\mathrm{err}}_{i+1,+}, \quad \widehat{\mathrm{err}}_{i,+}, \quad \widehat{\mathrm{err}}_{i-1,-}, \quad \widehat{\mathrm{err}}_{i-2,-}, \quad \widehat{\mathrm{err}}_{i-3,-}.
\]
We prove that $\widehat{\mathrm{err}}_{i+2,+} > \varepsilon$ is impossible; the arguments for the other cases are similar.

If $[x_{i-1}, x_i]$ must be split because $\widehat{\mathrm{err}}_{i+2,+} > \varepsilon$, then it is also the case that $i-1 \in \widetilde{I}_-$, and so $\mathrm{err}_{i-1} > \varepsilon$. In this case $x_j - x_{j-1} = h_{\widehat{L}(x)}$ for $j = i-1, \ldots, i+3$. This means that $[x_{i-2}, x_{i+3}] \subseteq \widetilde{I}_{x,\widehat{L}(x)}$. By the same argument used in (16) it can be shown that
\[
\mathrm{err}_{i+2} \le \tfrac{1}{8}\, C(3h_{\widehat{L}(x)})\, h_{\widehat{L}(x)}^2 \|f''\|_{\widetilde{I}_{x,\widehat{L}(x)}}. \qquad (24)
\]
The quantity $\mathrm{err}_{i+2}$ is the first term in the definition of $\widehat{\mathrm{err}}_{i+2,+}$ in Step 1 of Algorithm $M$.

Next, we bound $\min\bigl( f(x_i), f(x_{i+1}) \bigr)$, which also appears in the definition of $\widehat{\mathrm{err}}_{i+2,+}$. As was argued earlier, $[x_{i-1}, x_{i+1}] \subseteq \widetilde{I}_{x,\widehat{L}(x)}$. Then a Taylor expansion about the arbitrary $x \in [x_{i-1}, x_i]$ under consideration establishes that
\[
\min\bigl( f(x_i), f(x_{i+1}) \bigr) \ge f(x) - 2 h_{\widehat{L}(x)} |f'(x)| - 2 h_{\widehat{L}(x)}^2 \|f''\|_{\widetilde{I}_{x,\widehat{L}(x)}}, \qquad (25)
\]
since $|x_i - x| \le |x_{i+1} - x| \le 2 h_{\widehat{L}(x)}$.

Finally, we bound $\widehat{M}_{\widehat{L}(x)}$. Let $x^*$ be a point where $f$ attains its minimum, and let $I_{x^*,l^*} = [x_{i^*-1}, x_{i^*}]$ be the subinterval in the present partition containing $x^*$, where $l^* \le \widehat{L}(x)$. By (2) it follows that
\[
f(x^*) \ge \min\bigl( f(x_{i^*-1}), f(x_{i^*}) \bigr) - \tfrac{1}{8}\, h_{l^*}^2 \|f''\|_{I_{x^*,l^*}}. \qquad (26)
\]
There are two possibilities regarding $l^*$. If $l^* < \widehat{L}(x)$, then by the argument in (21) used to prove Theorem 5,
\[
\widehat{M}_{\widehat{L}(x)} \le \widehat{M}_{l^*} \le \min\bigl( f(x_{i^*-1}), f(x_{i^*}) \bigr) - \tfrac{1}{8}\, h_{l^*}^2 \|f''\|_{[x_{i^*-1},\,x_{i^*}]} + \varepsilon \le f(x^*) + \varepsilon \quad \text{by (26)}.
\]
Otherwise, if $l^* = \widehat{L}(x)$, then
\[
\widehat{M}_{\widehat{L}(x)} \le \min\bigl( f(x_{i^*-1}), f(x_{i^*}) \bigr) \le f(x^*) + \tfrac{1}{8}\, h_{\widehat{L}(x)}^2 \|f''\|_{I_{x^*,\widehat{L}(x)}} \quad \text{by (26)}.
\]
Thus, in either case we have
\[
\widehat{M}_{\widehat{L}(x)} \le f(x^*) + \tfrac{1}{8}\, h_{\widehat{L}(x)}^2 \|f''\|_{I_{x^*,\widehat{L}(x)}} + \varepsilon. \qquad (27)
\]
Combining the three inequalities (24), (25), and (27) yields the inequality that allows us to contradict the assumption that $\widetilde{\ell}(x) > \widehat{L}(x)$:
\begin{align*}
\varepsilon &< \widehat{\mathrm{err}}_{i+2,+} \quad \text{by assumption} \\
&= \mathrm{err}_{i+2} + \widehat{M}_{\widehat{L}(x)} - \min\bigl( f(x_i), f(x_{i+1}) \bigr) \quad \text{by Step 1 of Algorithm } M \\
&\le \tfrac{1}{8}\, C(3h_{\widehat{L}(x)})\, h_{\widehat{L}(x)}^2 \|f''\|_{\widetilde{I}_{x,\widehat{L}(x)}} + f(x^*) + \tfrac{1}{8}\, h_{\widehat{L}(x)}^2 \|f''\|_{I_{x^*,\widehat{L}(x)}} + \varepsilon \\
&\qquad - f(x) + 2 h_{\widehat{L}(x)} |f'(x)| + 2 h_{\widehat{L}(x)}^2 \|f''\|_{\widetilde{I}_{x,\widehat{L}(x)}} \quad \text{by (24), (25), and (27)} \\
&= \varepsilon + \Bigl\{ \bigl[ \tfrac{1}{8}\, C(3 h_{\widehat{L}(x)}) + 2 \bigr] \|f''\|_{\widetilde{I}_{x,\widehat{L}(x)}} + \tfrac{1}{8} \|f''\|_{I_{x^*,\widehat{L}(x)}} \Bigr\} h_{\widehat{L}(x)}^2 + 2 |f'(x)|\, h_{\widehat{L}(x)} + [f(x^*) - f(x)] \\
&\le \varepsilon \quad \text{by (23)}.
\end{align*}
This gives a contradiction and completes the proof.

If $f(x)$ is close to the minimum function value, $f(x^*)$, for $x$ in much of $[a,b]$, then $\widehat{L}(x)$ may be quite large, and $L(x)$ determines the computational cost of Algorithm $M$. In this case, the computational cost for minimization is similar to that for function approximation.
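The two reasons a subinterval may be left unsplit (its error bound is already below $\varepsilon$, or its function values sit too far above the running minimum to matter) can be illustrated with a short sketch. The following Python code is an illustrative re-implementation in the spirit of Algorithm $M$, not GAIL's funmin_g: the name funmin_sketch, the three-point second-difference surrogate for $f''$, and all constants are our own simplifications, and the sketch carries none of the guarantees proved above.

```python
def funmin_sketch(f, a, b, eps, n_init=20, n_max=10000):
    """Illustrative locally adaptive global minimization in the spirit of
    Algorithm M (NOT GAIL's funmin_g).  A subinterval is split only when
    its data-driven error bound exceeds eps AND its function values are
    within that bound of the running minimum."""
    x = [a + i * (b - a) / n_init for i in range(n_init + 1)]
    y = [f(xi) for xi in x]
    while True:
        m_hat = min(y)          # current estimate of min f, from data only
        split = []              # subintervals [x[i-1], x[i]] to refine
        for i in range(1, len(x)):
            h = x[i] - x[i - 1]
            # crude |f''| surrogate from three neighboring samples
            j = i if i < len(x) - 1 else i - 1
            d2 = abs(y[j + 1] - 2 * y[j] + y[j - 1]) / max(h * h, 1e-300)
            err = d2 * h * h / 8          # linear-interpolant error heuristic
            near_min = min(y[i - 1], y[i]) - m_hat <= err
            if err > eps and near_min:
                split.append(i)
        if not split or len(x) > n_max:
            return m_hat
        for i in reversed(split):         # insert midpoints right to left
            xm = (x[i - 1] + x[i]) / 2
            x.insert(i, xm)
            y.insert(i, f(xm))
```

For $(x-0.3)^2$ on $[0,1]$ the sketch refines only near $x = 0.3$; subintervals whose values lie far above the running minimum are never split, which mirrors why minimization can be cheaper than full function approximation. As with Algorithm $M$ itself, the answer is trustworthy only for functions without spikes hidden below the initial resolution.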
However, if $f$ attains its minimum at only a finite number of points, then for vanishing $\varepsilon$, $\widetilde{L}(x) = \widehat{L}(x)$ for nearly all $x$, and the computational cost for minimization is significantly smaller than that for function approximation.

The minimization problem (MIN) for functions in the whole Sobolev space $W^{2,\infty}[a,b]$ has a lower complexity bound similar to (19) for the function approximation problem, by a similar proof. For functions in the cone $\mathcal{C}$, however, we have not yet derived a lower bound on the complexity of (MIN).

5. Numerical Examples

Together with our collaborators, we have developed the Guaranteed Automatic Integration Library (GAIL) [5]. This MATLAB software library implements algorithms that provide answers to univariate and multivariate integration problems, as well as (APP) and (MIN), by automatically determining the sampling needed to satisfy a user-provided error tolerance. GAIL is under active development. It implements our best adaptive algorithms and upholds the principles of reproducible and reliable computational science as elucidated in Choi et al. [6, 3]. We have adopted practices including input parsing, extensive testing, code comments, a user guide [4], and case studies. Algorithms $A$ and $M$ described here are implemented in GAIL version 2.2 as the functions funappx_g and funmin_g, respectively. The following examples showcase the merits and drawbacks of our algorithms. We compare them to the performance of algorithms in MATLAB and the Chebfun toolbox.

Chebfun [11] is a MATLAB toolbox that approximates functions in terms of a Chebyshev polynomial basis, in principle to machine precision ($\approx 10^{-15}$) by default. In this example, we show that it fails to reach its intended error tolerance for the spiky function $f_1$ defined in (8) with a small $\delta$ and $c \in (0,1)$.
Figure 4(a) shows the absolute errors of Chebfun's approximation to $f_1$, with the error tolerance supplied as input and the "splitting" option turned on to allow Chebfun to construct a piecewise polynomial interpolant if derivative discontinuities are detected. Nevertheless, Chebfun produces pointwise errors, computed on a fine partition of $[-1,1]$, that exceed the requested tolerance. In contrast, the pointwise errors of the piecewise linear interpolant produced by funappx_g are uniformly below the error tolerance. Unfortunately, funappx_g takes about 30 times as long as Chebfun.

Next, we compare our adaptive algorithms with Chebfun for random samples from the following families of test functions defined on $[-1,1]$:
\[
f_1(x) \text{ defined in (8)}, \quad \delta = 0.2, \quad c \sim \mathcal{U}[0, 0.5], \qquad (28a)
\]
\[
f_2(x) = x^4 \sin(d/x), \quad d \sim \mathcal{U}[0, 2], \qquad (28b)
\]
\[
f_3(x) = 10 x^2 + f_2(x), \qquad (28c)
\]
where $\mathcal{U}[a,b]$ denotes the uniform distribution over $[a,b]$. We set $n_{\mathrm{init}} = 250$, $C(h) = 10\bar{h}/(\bar{h}-h)$, and $\varepsilon = 10^{-6}$. Our new algorithm funappx_g and Chebfun are used to approximate 1000 random test functions from each family.

Figure 4: (a) The approximation errors for $f_1(x)$, $x \in [-1,1]$, using Chebfun with the given error tolerance. (b) An empirical distribution function of performance ratios based on 1000 simulations for each test function in (28): funappx_g time / Chebfun time (solid), funappx_g error / Chebfun error (dashed). This figure is conditionally reproducible by funappx_g_test.m and LocallyAdaptivePaperFigs.m in GAIL.

Table 1: Comparison of number of sample points, computational time, and success rates of funappx_g and Chebfun (upper table); funmin_g, fminbnd, and Chebfun's min (lower table). This table is conditionally reproducible by funappx_g_test.m and funmin_g_test.m in GAIL.
               Mean no. of points      Mean time (s)            Success (%)
              funappx_g   Chebfun    funappx_g   Chebfun     funappx_g   Chebfun
    f_1
    f_2
    f_3

              funmin_g  fminbnd  min   funmin_g  fminbnd    min   funmin_g  fminbnd  min
    -f_1        111        8    116     0.0029   0.0006  0.0256      100      100    14
    f_2          48       22     43     0.0028   0.0007  0.0063      100       27    60
    f_3         108        9     22     0.0028   0.0007  0.0037      100      100    35

For Chebfun we override the default tolerance to $10^{-6}$ and switch on the splitting feature to allow piecewise Chebyshev polynomials for approximation. Success is determined by whether a discrete approximation to the $L^\infty$ error is no greater than the error tolerance.

We see in Table 1 that funappx_g obtains the correct answer in all cases, even for $f_2$, which lies outside the cone $\mathcal{C}$. Since it is a higher-order algorithm, Chebfun generally uses substantially fewer samples than funappx_g, but its run time is longer than that of funappx_g for a significant proportion of the cases; see Figure 4(b). Moreover, Chebfun rarely approximates the test functions satisfactorily.

Similar simulation tests have been run to compare our funmin_g, MATLAB's fminbnd, and Chebfun's min, but this time with $n_{\mathrm{init}} = 20$ for funmin_g. The results are summarized in the lower half of Table 1. Our funmin_g achieves 100% success for all families of test functions, with substantially fewer sample points and less run time than funappx_g. This is because funmin_g does not sample densely where the function is not close to its minimum value. Although MATLAB's fminbnd uses far fewer function values than funmin_g, it cannot locate the global minimum (at the left boundary) for about 70% of the $f_2$ test cases. Chebfun's min uses fewer points than funmin_g, but Chebfun is slower and less accurate than funmin_g in these tests.

6. Discussion

Adaptive and automatic algorithms are popular because they require only a (black-box) function and an error tolerance. Such algorithms exist in a variety of software packages. We have highlighted those found in MATLAB and Chebfun because they are among the best. However, as we have shown by numerical examples, these algorithms may fail.
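This failure mode can be made concrete with a fooling function: a bump so narrow that every sample of it vanishes. The snippet below is a generic Python illustration of that idea; the function bump is a stand-in for the perturbation family in (8), not code from GAIL, MATLAB, or Chebfun, and its exact formula is our own.

```python
def bump(x, c, delta):
    """C^1 piecewise-quadratic bump supported on [c - delta, c + delta]
    with |bump''| = 1 on its support and peak value delta**2 / 4.
    A generic stand-in for the fooling perturbations of Section 3."""
    t = abs(x - c)
    if t >= delta:
        return 0.0
    if t <= delta / 2:
        return delta**2 / 4 - t**2 / 2
    return (delta - t) ** 2 / 2

# All 101 equispaced samples on [0, 1] miss the bump's support
# (0.001, 0.009), so the data are identical to the zero function's,
# yet the zero approximant is off by the peak value delta**2 / 4.
c, delta = 0.005, 0.004
samples = [bump(i / 100, c, delta) for i in range(101)]
```

Any algorithm that stops after inspecting these samples must return the same answer for bump as for the zero function, and hence errs by the peak value for one of the two inputs. Restricting attention to a cone of functions whose spikes cannot hide below the initial sampling resolution is what rules such inputs out.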
Moreover, there is no theory providing necessary conditions for failure, or equivalently, sufficient conditions for success. Our Algorithms $A$ (funappx_g) and $M$ (funmin_g) are locally adaptive and come with sufficient conditions for success. Although it may be difficult to verify those conditions in practice, the theory behind these algorithms provides several advantages:

• The cone, $\mathcal{C}$, has an intuitive interpretation as a set of functions whose second derivatives do not change drastically over a small interval. This intuition can guide the user in setting the parameters defining $\mathcal{C}$, if desired.

• The norms of $f$ and its derivatives appearing in the upper bounds on computational cost in Theorems 2 and 6 may be unknown, but these theorems explain how the norms influence the time required by our algorithms.

• Our Algorithm $A$ has been shown to be asymptotically optimal for the complexity of the function approximation problem (APP).

The minimum horizontal scale of functions in $\mathcal{C}$ is roughly $1/n_{\mathrm{init}}$. The computational cost of our algorithms is at least $n_{\mathrm{init}}$, but $n_{\mathrm{init}}$ is not a multiplicative factor. Increasing $n_{\mathrm{init}}$ makes our new algorithms more robust, and it increases the minimum number of sample points and the computational cost, if at all, only mildly.

As mentioned in the introduction, there are general theorems providing sufficient conditions under which adaption provides no advantage. Our setting fails to satisfy those conditions because $\mathcal{C}$ is not convex: one may average two mildly spiky functions in $\mathcal{C}$, whose spikes have opposite signs and partially overlap, to obtain a very spiky function outside $\mathcal{C}$.

Nonadaptive algorithms are unable to solve (APP) or (MIN) using a finite number of function values when the set of interesting functions, $\mathcal{C}$, is a cone, unless there exist nonadaptive algorithms that solve these problems exactly. Suppose that some nonadaptive algorithm $A$ satisfies (APP) for some cone $\mathcal{C}$, and that for an error tolerance $\varepsilon$, this algorithm $A$ requires $n$ function values.
For any positive $c$, define $A^*(f, \varepsilon) = A(cf, \varepsilon)/c$ for all $f \in \mathcal{C}$. Then $\|f - A^*(f, \varepsilon)\|_\infty = \|cf - A(cf, \varepsilon)\|_\infty / c \le \varepsilon/c$ for all $f \in \mathcal{C}$, since $cf$ is also in $\mathcal{C}$. Thus, $A^*$ satisfies (APP) for the error tolerance $\varepsilon/c$, using the same number of function values as $A$. Making $c$ arbitrarily large establishes the existence of a nonadaptive algorithm that solves (APP) exactly.

Our algorithms do not take advantage of higher orders of smoothness that the input function may have. We view the present work as a stepping stone to developing higher-order algorithms. Nonlinear splines or higher-degree polynomials, such as those used in Chebfun, are potential candidates.

Acknowledgments

We dedicate this article to the memory of our colleague Joseph F. Traub, who passed away on August 24, 2015. He was a polymath and an influential figure in computer science and applied mathematics. He was the founding Editor-in-Chief of the Journal of Complexity, and we are grateful for his tremendous impact and lifelong service to our research community.

We thank Greg Fasshauer, Erich Novak, the GAIL team, and two anonymous referees for their valuable comments and suggestions. This research was supported in part by grants NSF-DMS-1522687 and NSF-DMS-0923111.

References

[1] R.P. Brent, Algorithms for Minimization Without Derivatives, Dover Publications, Inc., Mineola, NY, 2013. Republication of the 1973 edition by Prentice-Hall, Inc.

[2] R.L. Burden, J.D. Faires, A.M. Burden, Numerical Analysis, tenth ed., Brooks/Cole, 2016.

[3] S.C.T. Choi, MINRES-QLP Pack and reliable reproducible research via staunch scientific software, J. Open Research Software 2 (2014) 1–7.

[4] S.C.T. Choi, Y. Ding, F.J. Hickernell, L. Jiang, Ll.A. Jiménez Rugama, X. Tong, Y. Zhang, X. Zhou, GAIL: Guaranteed Automatic Integration Library in MATLAB, documentation for version 2.1, arXiv:1503.06544 (2015).

[5] S.C.T. Choi, Y. Ding, F.J. Hickernell, L. Jiang, Ll.A. Jiménez Rugama, X. Tong, Y. Zhang, X.
Zhou, GAIL: Guaranteed Automatic Integration Library (versions 1.0–2.1), MATLAB software, 2013–2015. URL: http://gailgithub.github.io/GAIL_Dev/.

[6] S.C.T. Choi, F.J. Hickernell, IIT MATH-573 Reliable Mathematical Software, Illinois Institute of Technology, 2013. Course slides available at [5].

[7] N. Clancy, Y. Ding, C. Hamilton, F.J. Hickernell, Y. Zhang, The cost of deterministic, adaptive, automatic algorithms: Cones, not balls, J. Complexity 30 (2014) 21–45.

[8] R. Cools, D. Nuyens (Eds.), Monte Carlo and Quasi-Monte Carlo Methods: MCQMC, Leuven, Belgium, April 2014, volume 163 of Springer Proceedings in Mathematics and Statistics, Springer-Verlag, Berlin, 2016.

[9] Y. Ding, Guaranteed Adaptive Univariate Function Approximation, Ph.D. thesis, Illinois Institute of Technology, 2015.

[10] G. Forsythe, M. Malcolm, C. Moler, Computer Methods for Mathematical Computations, Prentice-Hall, 1976.

[11] N. Hale, L.N. Trefethen, T.A. Driscoll, Chebfun Version 5.4, 2016.

[12] F.J. Hickernell, L. Jiang, Y. Liu, A.B. Owen, Guaranteed conservative fixed width confidence intervals via Monte Carlo sampling, in: J. Dick, F.Y. Kuo, G.W. Peters, I.H. Sloan (Eds.), Monte Carlo and Quasi-Monte Carlo Methods 2012, volume 65 of Springer Proceedings in Mathematics and Statistics, Springer-Verlag, Berlin, 2013, pp. 105–128.

[13] F.J. Hickernell, Ll.A. Jiménez Rugama, Reliable adaptive cubature using digital sequences, in: [8], pp. 367–383. ArXiv:1410.8615 [math.NA].

[14] M. Horn, Optimal algorithms for global optimization in case of unknown Lipschitz constant, J. Complexity 22 (2006) 50–70.

[15] L. Jiang, Guaranteed Adaptive Monte Carlo Methods for Estimating Means of Random Variables, Ph.D. thesis, Illinois Institute of Technology, 2016.

[16] Ll.A. Jiménez Rugama, F.J. Hickernell, Adaptive multidimensional integration based on rank-1 lattices, in: [8], pp. 407–422. ArXiv:1411.1966.

[17] R. Moore, R. Kearfott, M.
Cloud, Introduction to Interval Analysis, SIAM, Philadelphia, 2009.

[18] E. Novak, On the power of adaption, J. Complexity 12 (1996) 199–237.

[19] L. Plaskota, Automatic integration using asymptotically optimal adaptive Simpson quadrature, Numer. Math. 131 (2015) 173–198.

[20] L. Plaskota, G.W. Wasilkowski, Y. Zhao, The power of adaption for approximating functions with singularities, Math. Comput. 77 (2008) 2309–2338.

[21] S.M. Rump, INTLAB - INTerval LABoratory, in: T. Csendes (Ed.), Developments in Reliable Computing, Kluwer Academic Publishers, Dordrecht, 1999, pp. 77–104.