A seven-point algorithm for piecewise smooth univariate minimization
aa r X i v : . [ m a t h . O C ] D ec Noname manuscript No. (will be inserted by the editor)
A seven-point algorithm for piecewise smoothunivariate minimization
Jonathan Grant-Peters · Raphael Hauser
Received: date / Accepted: date
Abstract
In this paper, we construct an algorithm for minimising piecewisesmooth functions for which derivative information is not available. The al-gorithm constructs a pair of quadratic functions, one on each side of thepoint with smallest known function value, and selects the intersection of thesequadratics as the next test point. This algorithm relies on the quadratic func-tion underestimating the true function within a specific range, which is accom-plished using a adjustment term that is modified as the algorithm progresses.
Keywords
Nonsmooth optimisation · nondifferentiable programming · univariate minimisation · quadratic approximation When solving the problem of minimizing a function of many variables, manyexisting solutions belong to a class of algorithms called line-search algorithms[15,3,5] which reduce the problem by iteratively restricting the search to 1-dimensional subspaces determined by a search direction to determine a stepsize. This 1-dimensional problem may be solved either exactly or inexactly.Most methods rely on the latter approach, as it is computationally cheaperwhile no less effective.While exact line search is rarely used, due to its cost, there exist certainspecial cases where it is more effective such as Linear Programming. Moreover,
This publication is based on work supported by the EPSRC Centre for Doctoral Train-ing in Industrially Focused Mathematical Modelling (EP/L015803/1) in collaboration withSiemens.J. Grant-PetersMathematical Institute, University of Oxford, United KingdomE-mail: [email protected]. HauserMathematical Institute, University of Oxford, United Kingdom Jonathan Grant-Peters, Raphael Hauser it has been observed by Yu et al [17] that in the context of non-smooth optimi-sation, using exact line-search may result in stepping to a location from wherea better subdifferential approximation may be constructed, thus resulting inselecting better search directions at future iterations.In this paper, motivated by designing an exact line search algorithm fornon-smooth optimisation, we examine the effectiveness of existing univari-ate optimisation algorithms on a class of non-smooth objective functions andintroduce a new algorithm which is specifically designed for this class. Theparticular problem for which we develop this method is a black box, piecewisesmooth objective function for which derivatives are not available.When conducting an exact line-search, there are two distinct steps. Firstof all we must construct a bracket , which means finding a closed interval whichis guaranteed to contain a local minimum of the univariate objective function.The second is find the local minimum within this interval. In this paper, wefocus mainly on the second step which we rigorously state below.
Problem 1.1
Given is a function f : R → R of the form f = max i =1 ,...,k f i ( x )where f i : R → R are smooth functions, an oracle for calling function values of f ( x ), and an interval [ a, b ]. Design an optimisation method for finding a localminimiser of f in [ a, b ] which makes use of only oracle calls for f .One class of algorithms which solves Problem 1.1 is that of 1d global op-timisers. This class includes using interval analysis based methods such asthe Moore-Skelboe algorithm [13]. However, these methods require that theobjective function is known, and are not applicable to black box functions.Another category of solutions is that of global Lipschitzian methods suchas those summarised by Hansen et al [7]. While these might be feasible if wecould guess a suitable Lipschitz Constant, in practice there is little gainedfrom them. Even if more than one local optimum existed, we have no reasonto expect that finding one minimum as opposed to another would make anexact line search based method more effective. Therefore, while we will makeuse of some of the ideas behind such algorithms, we reject them as they involveneedless extra work.When choosing a local univariate solver, one faces a trade off between speedand robustness. The basic methods are Golden Section [10,4] which convergesQ-linearly for any continuous function, and interpolating methods [4,9] which,when successful, converge super linearly. The latter’s stability depends on theirability to construct a polynomial which approximates the objective functionwell locally. If the approximation does not fit the objective function well, thenthese algorithms may fail.In practice, the most effective methods are bracketing interpolation hybridmethods such as Brent’s Method [2,1] and Hager’s cubic method [6] whichcombine the speed of interpolation methods with the stability of Golden Sec-tion by making use of a fall back option when the interpolation is not working.While convergence is guaranteed for such hybrid methods, they will still needtheir interpolations to match the objective function well if super linear conver- seven-point algorithm for piecewise smooth univariate minimization 3 gence is to be obtained. Therefore, we can’t expect them to converge quicklyfor an objective function of the form stated in Problem 1.1.There exist other bracketing-interpolation hybrid methods for non differ-entiable functions of the form f ( x ) = max i f i ( x ) [14,17]. However, these al-gorithms assume that we can compute each f i (from which f is computed)separately. Therefore, these are unsuitable for Problem 1.1 given that the or-acle is defined to only return the value of f ( x ).The only algorithm which seems to be optimised for our setting is thefive point method of Mifflin and Strodiot [12], to which we will from nowon refer to as “the Mifflin-Strodiot method”. This algorithm is designed toconverge rapidly to the local minimum x ∗ even when the objective function isnon-differentiable at x ∗ .The remainder of this paper is structured as follows: In Section 2, we definethe notion of a bracket rigorously and describe in brief the existing methodswhich are currently most applicable to our problem. Sections 3 to 5 focus on thederivation of a new univariate optimiser, for which we present three variations.We compare our new methods against relevant competitors in Section 6. We begin by rigorously defining a bracket:
Definition 2.1 (Bracket)
Let f : R → R be a function, and x L , x M , x R ∈ R . We call the trio of points x L , x M , x R a bracket of f if they satisfy thefollowing:1. x L < x M < x R , 2. f ( x L ) ≥ f ( x M ) ≤ f ( x R ).The significance of a bracket is that Definition 2.1 guarantees that there existsa local minimum of f in ( x L , x R ). Definition 2.2 (Set of Brackets)
We define B f to be the set of all bracketsfor the function f .In this paper, we refer to the elements of a bracket X as x L , x M and x R .If a bracket X i is associated with a particular iteration of an algorithm, thenwe write its elements as x Li , x Mi and x Ri . Given this notation, we define thefunction b ( X ) = x R − x L .Bracketing methods are the set of algorithms which are vaguely in the formof Algorithm 1, given the inputs of a step function σ and an update function U . The step function should select a new point within ( x L , x M ) ∪ ( x M , x R ),while the update function should return a bracket. When discussing existingbracketing methods and constructing our new one, we use the structure ofAlgorithm 1 where one algorithm is distinguished from another based on howthe step function σ and update function U are defined.The purpose of this paper is to introduce a new bracketing method calledthe Underestimating Polynomial Method (UPM) for Problem 1.1 for which, to Jonathan Grant-Peters, Raphael Hauser
Algorithm 1
General Bracketing Method
Require: f Objective function, σ Step Function, U Update Function, X Initial Bracket, ǫ > i = 0; while b ( X i ) > ǫ do ˜ x i = σ ( X i ). X i +1 = U (˜ x i , X i ). i = i + 1; end whilereturn X i . our knowledge, there does not currently exist a robust and fast solution. Whenassessing the effectiveness of this algorithm, we will compare it to a smallselection of existing bracketing methods including Golden Section, Brent’sMethod [2,1], and the Mifflin-Strodiot method [11].Of these algorithms, Golden Section may be considered to most robust as itis guaranteed to converge Q-linearly with a rate of approximately 0 . Definition 2.3
A function f : R n ⊇ Ω → R is called piecewise smooth if f is continuous on Ω and there exists a disjoint finite family of sets { Ω i ; i =1 , , . . . , k } such that f is smooth on Ω i for all i ∈ I and ∪ ki =1 Ω i is dense in Ω . We call any point x such that x ∈ Ω i ∩ Ωj for some i, j ∈ I a kink .The theory behind algorithms such as Brent’s method collapses due to theexistence of kinks, because interpolating across a kink has no meaning andyields no meaningful error bound. Therefore, if the local minimum of a function f is a kink, we expect algorithms like Brent’s method to converge to it slowly. The premise of the UPM is inspired by the approach used in the Mifflin-Strodiot method [11]. We approximate the objective function with two poly-nomials, one on each side of the bracket. If the local minimum is a kink, thenthe combination of these polynomials may be valid approximations and usefulfor locating the minimum. To construct these polynomials, we need more than seven-point algorithm for piecewise smooth univariate minimization 5 the three points contained in a bracket and therefore extend the definition ofa bracket to include seven points:
Definition 3.1 (Extended Bracket)
Given the function f , we call X ∈ R an extended bracket written in the following form X = (cid:0) x L x L x L x M x R x R x R (cid:1) T , (3.1)if the following conditions apply: x L < x L < x L Let X be an extended bracketof f . We define: diam ( X ) = diam ( Conv ( X )), and b ( X ) = x R − x L . Remark 3.2 Definition 3.2 is the natural extension of the function b ( X ) fromSection 2 where X is merely a bracket as opposed to an extended bracket.The UPM mostly conforms to the form of Algorithm 1. The main difference,apart from replacing the bracket with an extended bracket, is the fact thatthe UPM also depends on an input parameter α . This parameter is used bythe step function σ when constructing the next point to evaluate. The threevariations of the UPM presented in this paper differ in how they tread α .The Static Underestimating Polynomial Method (SUPM) requires an ini-tial value of α to be supplied by the user, which is then kept constant for theduration of the algorithm. For the remainder of this section, we denote thestep function σ and update function U which apply to the SUPM by σ S and U S respectively.Given the objective function f , our strategy is to use the points x L , x L , x L and x R , x R , x R to construct the model functions q L ( x ; X , α ), and q R ( x ; X , α )which approximate f L and f R respectively. We will employ Newtons DividedDifference notation which we summarize below. Definition 3.3 Given a, b, c ∈ R , and f : R → R , the 1 st and 2 nd divideddifferences are defined by: f [ a, b ] = f ( a ) − f ( b ) a − b , and f [ a, b, c ] = f [ a, b ] − f [ a, c ] b − c . The reason we use this notation is that the 1 st and 2 nd divided differencesare natural approximations for the 1 st and 2 nd derivatives of f respectively.We will state the result in a later section when more rigour is needed.Now we define our model functions q L and q R as the following: q k ( x ; X , α ) = f ( x k )+ f [ x k , x k ]( x − x k )+( f [ x k , x k , x k ] − αh ( X ))( x − x k )( x − x k ) , (3.3) Jonathan Grant-Peters, Raphael Hauser where k ∈ { L, R } , α > h ( X ) is a scaling function with theproperty that h ( X ) → diam ( X ) → 0. Note that q k has the form of the 2 nd order Newton Interpolating Polynomial combined with the adjustment term αh ( X ). Finally we express σ S in terms of q L and q R : σ S ( X ; α ) = argmin x ∈ [ x L ,x R ] max( q L ( x ; X , α ) , q R ( x ; X , α )) . (3.4) Remark 3.3 The model functions q L and q R are designed to underestimate f L and f R within [ x L , x R ]. We will show how we achieve this in Lemma 3.2. − . − . − . − . 25 0 . 00 0 . 25 0 . 50 0 . 75 1 . − . . . . f ( x ) q L ( x ) q R ( x ) Fig. 3.1 We plot the objective function in addition to the quadratic function approximatingthe left and right. These two quadratics intersect at x = 0 . . . . . Example 3.1 Consider the function f ( x ) = max (cid:0) sin( xπ ) , − cos( xπ ) (cid:1) . Thisfunction is piecewise smooth and unimodal in the interval [ − , 1] with a localminimum at 0. At this local minimum, f is not differentiable.Now suppose that { x Li } i =1 = {− . , − . , − } and { x Ri } i =1 = { . , . , . } . We construct q L and q R by interpolating { ( x ki , f ( x ki )) } i =1 for k = L, R .These two quadratics intersect at x ≈ . 005 as shown in Figure 3.1. This valueis taken to be the next point at which we evaluate f .All that remains is for us to define the function U S . U S (˜ x ; X , α ) := U (˜ x ; X ) if ˜ x < x M and f (˜ x ) < f ( x M ) U (˜ x ; X ) if ˜ x > x M and f (˜ x ) < f ( x M ) U (˜ x ; X ) if ˜ x > x M and f (˜ x ) > f ( x M ) U (˜ x ; X ) if ˜ x < x M and f (˜ x ) > f ( x M ) , (3.5)where U (˜ x ; X ) := (cid:0) x L x L x L ˜ x x M x R x R (cid:1) T , (3.6a) U (˜ x ; X ) := (cid:0) x L x L x M ˜ x x R x R x R (cid:1) T , (3.6b) U (˜ x ; X ) := (cid:0) x L x L x L x M ˜ x x R x R (cid:1) T , (3.6c) U (˜ x ; X ) := (cid:0) x L x L ˜ x x M x R x R x R (cid:1) T . (3.6d) seven-point algorithm for piecewise smooth univariate minimization 7 Remark 3.4 The update function σ S ( X ) is the natural extension of the oneused by the Mifflin-Strodiot method [12] to 7 points. Lemma 3.1 Let X be an extended bracket. If ˜ x = σ S ( X ) ∈ ( x L , x M ) ∪ ( x M , x R ) , then X = U S (˜ x ; X , α ) is an extended bracket and b ( X ) < b ( X ) . Definition 3.4 A function f : [ a, b ] → R is called unimodal if there exists ζ ∈ ( a, b ) such that f is monotonically decreasing on ( a, ζ ) and monotonicallyincreasing on ( ζ, b ). If a function is not unimodal, then it is multi-modal. Definition 3.5 Let f be a piece-wise smooth function. We call f locallyunimodal if for any local minimum x ∗ of f , there exists an open set ( a, b ) ∋ x ∗ such that f is unimodal on ( a, b ).When constructing convergence results, there are three levels of assump-tions we might make. First is the case where f is a piece-wise smooth func-tion, X is a bracket, and nothing more is assumed. In this general context,we can do nothing beyond defining a minimum distance between points δ .If | σ S ( X ) − x M | < δ , we replace σ S ( X ) with argmin x | x − σ S ( X ) | such that | x − x M | ≥ δ , x R − x ≥ δ and x − x L ≥ δ . If σ S ( X ) = x M , then there exists twoequally valid solutions: x M + δ and x M − δ . When this applies, we arbitrarilyset σ S ( X ) = x M − δ . By doing this, we are sure to converge eventually, evenif slowly.For the next level of assumptions, we additionally require f to be in theform f ( x ) = max( f L ( x ) , f R ( x )), and X satisfy f ( x Lj ) = f L ( x Lj ) , j = 1 , , f ( x Rj ) = f L ( x Rj ) , j = 1 , , 3. Given these assumptions, we show in Lemma 3.2and Corollary 3.1 that σ S ( X ; α ) ∈ ( x L , x R ) given a sufficiently large choice of α . In short, when there are not too many kinks in one place, then the UPMselects sensible points to evaluate f at.Finally, we add the assumptions that f is unimodal and diam ( X ) is suf-ficiently small. Under these circumstances, we show in Lemma 3.3 and The-orem 3.2 that the distance between σ S ( X ) and the true solution x ∗ can bebounded.Taken in the context of locally unimodal functions, these results will implythat the SUPM is stable when the function is not unimodal, and fast when itis. This is sufficient for our purposes given that a stable algorithm applied toa locally unimodal function will eventually converge to a bracket in which thefunction is unimodal.For the analysis that follows, we require the following external result: Theorem 3.1 ([1]) Suppose that k, n ≥ ; f ∈ C n + k [ a, b ] ; ζ ∈ [ a, b ] ; and x , . . . , x n are distinct points in [ a, b ] . Then f [ x , . . . , x n ] = f ( n ) ( ζ ) n ! + X ≤ r ≤ n ( x r − ζ ) f ( n +1) ( ζ )( n + 1)!+ . . . + X ≤ r ≤ ... ≤ r k ≤ n k Y j =1 ( x r j − ζ ) f ( n + k ) ( ζ )( n + k )! + E (3.7) Jonathan Grant-Peters, Raphael Hauser where E = 1( n + k )! X ≤ r ≤ ... ≤ r k ≤ n k Y j =1 ( x r j − ζ ) [ f ( n + k ) ( ξ r ,...,r k ) − f ( n + k ) ( ζ )] , (3.8) and ξ r ,...,r k are points in the interval spanned by x r , . . . , x r k and ζ . Lemma 3.2 Let f be a function of the form f = max( f L , f R ) such that f ′′ L and f ′′ R are both Lipschitz continuous with Lipschitz constants M L and M R respectively, and X be a an extended bracket of f where f ( x kj ) = f k ( x kj ) for j = 1 , , and k ∈ { L, R } . If α ≥ max ( M L , M R ) and h ( X ) = max( x R − x L , x R − x L ) , then q R ( x ; X , α ) < f R ( x ) ∀ x ∈ [ x L , x R ) and q L ( x ; X , α ) 3, we have: f L ( y ) > q L ( y ; X , α ) , ⇔ f L ( y ) > f L ( x L ) + f L [ x L , x L ]( y − x L ) + ( f L [ x L , x L , x L ] − αh ( X ))( y − x L )( y − x L ) , ⇔ f L [ y, x L ] > f L [ x L , x L ] + ( f L [ x L , x L , x L ] − αh ( X ))( y − x L ) , ⇔ f L [ y, x L , x L ] > f L [ x L , x L , x L ] − αh ( X ) , ⇔ αh ( X ) > f L [ x L , x L , x L ] − f L [ y, x L , x L ] . Analogously, for y ∈ [ x L , x R ) we have: αh ( X ) > f R [ x R , x R , x R ] − f R [ y, x R , x R ] . From Theorem 3.1, we know that there exists ξ R ∈ [ x R , x R ] such that f ′′ ( ξ R ) = f R [ x R , x R , x R ] and ξ R ∈ [ y, x R ] such that f ′′ ( ξ R ) = f R [ y, x R , x R ].Equivalently, the points ξ L and ξ L exist for f L [ x L , x L , x L ] and f L [ y, x L , x L ]respectively. Therefore: f L [ x L , x L , x L ] − f L [ y, x L , x L ] = (cid:0) f ′′ L ( ξ L ) − f ′′ L ( ξ L ) (cid:1) , ≤ M L | ξ L − ξ L | , ≤ M L | max( y, x L ) − x L | . The bound on the RHS depends on the value of y , and is itself bounded by M L | x R − x L | when considering y ∈ [ x L , x R ]. Therefore, a sufficient condi-tion for f L ( y ) > q L ( y ; X , α ) ∀ y ∈ ( x L , x R ] is that αh ( X ) > M L | x R − x L | .Similarly, a sufficient condition for f R ( y ) > q R ( y ; X , α ) ∀ y ∈ [ x L , x R ) is αh ( X ) > M R | x R − x L | . Choosing h ( X ) = max( x R − x L , x R − x L ), and 2 α > max( M L , M R ) ensuresthat both of these are satisfied. Corollary 3.1 Let f be a function of the form f = max( f L , f R ) such that f ′′ L and f ′′ R are both Lipschitz continuous with Lipschitz constants M L and M R respectively, and X be a an extended bracket of f where f ( x kj ) = f k ( x kj ) for j = 1 , , and k ∈ { L, R } . If α ≥ max( M L , M R ) and h ( X ) = max( x R − x L , x R − x L ) , then σ S ( X ) ∈ ( x L , x R ) . seven-point algorithm for piecewise smooth univariate minimization 9 Proof As Lemma 3.2 is applicable, we know that q R ( x ) < f R ( x ) ∀ x ∈ [ x L , x R )and q L ( x ) < f L ( x ) ∀ x ∈ ( x L , x R ]. Since f = max( f L , f R ), it follows that q L ( x M ) < f ( x M ) and q R ( x M ) < f ( x M ). For convenience, write q ( x ) =max( q L ( x ; X , α ) , q R ( x ; X , α )) . Starting from Equation (3.4), we obtain: q ( σ S ( X ; α )) ≤ q ( x M ) < f ( x M ) ≤ min( f ( x L ) , f ( x R )) = min( q L ( x L ) , q R ( x R )) . But q ( x L ) = q L ( x L ) and q ( x R ) = q R ( x R ). Therefore q ( σ S ( X ; α )) < min( q ( x L ) , q ( x R )),which implies that σ S ( X ; α ) = x L or x R . Since σ S ( X ; α ) ∈ [ x L , x R ] by defini-tion (see Equation (3.4)), it follows that σ S ( X ; α ) ∈ ( x L , x R ). Remark 3.5 Between Corollary 3.1 and the requirement that the minimumdistance between any two points in X is δ , we are assured that the conditionsfor Lemma 3.1 will be satisfied.For the remainder of this section, we work towards bounding the conver-gence of the SUPM when f is piecewise-smooth and unimodal. We define x ∗ to be the unique local minimum of f in ( x L , x R ). When assessing the SUPM’sconvergence rate, we consider two cases based on the values of f ′ L ( x ∗ ) and f ′ R ( x ∗ ).If f ′ R ( x ∗ ) = 0 or f ′ L ( x ∗ ) = 0 holds, then we expect the SUPM to behavesimilarly to Brent’s Method. For example, if f ′ R ( x ∗ ) = 0 and f ′ L ( x ∗ ) < 0, thenwe expect q R to behave similarly to the interpolated polynomial from Brent’sMethod. The SUPM will be slower however for two reasons.1. When the SUPM computes a new point, that value may be stored in x M which does not impact the next computation but only the one after that.2. As only q R is converging, any point which is used to construct q L will playlittle to no role in determining σ S ( X ), and is therefore useless.We focus our attention primarily on the case where f ′ L ( x ∗ ) < f ′ R ( x ∗ ) > 0. In order to bound | σ S ( X ) | , we need to understand what typeof point σ S ( X ) returns. We address this in Lemma 3.3. First however, weintroduce a shorthand notation for Divided Differences which we use for theremainder of this section. f k = f ( x k ) , k ∈ { L, R } ,f k = f [ x k , x k ] , k ∈ { L, R } ,f k = f [ x k , x k , x k ] , k ∈ { L, R } . Lemma 3.3 Let f be a piecewise smooth unimodal function of the form f =max( f L , f R ) such that f ′′ L and f ′′ R are both Lipschitz continuous with Lipschitzconstants M L and M R respectively, and X be a an extended bracket of f .Further let α be given such that α ≥ max( M L , M R ) .If f ′ L (0) < and f ′ R (0) > , then there exists ǫ > such that if X ∈ ( − ǫ, ǫ ) , then the model functions q L and q R intersect exactly once within ( x L , x R ) and σ S ( X ) returns this uniqueintersection point. Proof Since f = max( f L , f R ) is unimodal, and X is an extended bracket, itfollows that f ( x kj ) = f k ( x kj ) for j = 1 , , k ∈ { L, R } . We use this alongwith Lemma 3.2 to show: q L ( x L ) = f ( x L ) = f L ( x L ) ≥ f R ( x L ) > q R ( x L ), andsimilarly q R ( x R ) > q L ( x L ). By the Intermediate Value Theorem [16, Theorem4.23], it follows that q L intersects with q R at least once in ( x L , x R ). Moreover,since q L and q R are both quadratic functions, they must intersect exactly once.A sufficient but not necessary condition for the result to follow would be ∃ ǫ > q L and q R are monotonically decreasing and increasingrespectively. Once this holds, σ S ( X ) = argmin x max( q L ( x ) , q R ( x )) must referto the intersection of q L and q R .To show that q L is monotonically decreasing, it is sufficient to prove that( q L ) ′ ( x L ) ≤ q L ) ′ ( x R ) < 0. Once this is shown, we are done since ( q L ) ′ is a linear function of x . We find :( q L ) ′ ( x L ) = f L = f ( x L ) − f ( x L ) x L − x L ≤ , since f is unimodal and X is an extended bracket of f (recall Definition 3.1).For ( q L ) ′ ( x R ) we begin with( q L ) ′ ( x R ) = f L + ( f L − αh ( X ))(2 x R − x L − x L ) . (3.9)Next we apply Theorem 3.1 to conclude: ∃ ξ L , ξ L ∈ [ x L , x L ] such that f L = f ′ L (0) + ( x L f ′′ L ( ξ L ) + x L f ′′ L ( ξ L )) , ∃ ξ L ∈ [ x L , x L ] such that f L = f ′′ L ( ξ L ) . From this and Equation (3.9) it follows that( q L ) ′ ( x R ) = f ′ L (0) + (cid:0) x L f ′′ L ( ξ L ) + x L f ′′ L ( ξ L ) (cid:1) + (2 x R − x L − x L ) (cid:0) f ′′ L ( ξ L ) − αh ( X ) (cid:1) . Since f ′′ L and f ′′ R are Lipschitz continuous, we know that they are bounded on[ x L , x R ]. Let M be constant which bounds f ′′ L and f ′′ R . We use this along withthe fact that X ∈ ( − ǫ, ǫ ) to conclude:( q L ) ′ ( x R ) ≤ f ′ L (0) + (cid:0) − x L M − x L M + (2 x R − x L − x L )( M − αh ( X ) (cid:1) , ( q L ) ′ ( x R ) ≤ f ′ L (0) + (2 ǫM + 4 ǫ ( M − αh ( X )) , ≤ f ′ L (0) + ǫ (3 M − αh ( X )) ≤ f ′ L (0) + 3 ǫM, since 0 < h ( X ) < ǫ and α is constant. Therefore a sufficient condition for( q L ) ′ ( x R ) < ǫ < | f ′ L (0) | / M . That q R is monotonicallyincreasing given sufficiently small ǫ follows from the equivalent argument.Having established the conditions under which we can express σ S ( X ) ana-lytically, we now bound the distance between this point and the true minimiser. seven-point algorithm for piecewise smooth univariate minimization 11 Theorem 3.2 Let f : [ a, b ] → R be a piecewise smooth unimodal functionof the form f = max( f L , f R ) such that x ∗ ∈ [ a, b ] is a minimiser of f , f ′ L ( x ∗ ) < , f ′ R ( x ∗ ) > , and X be an extended bracket of f . If the func-tions f ′′ L and f ′′ R are both Lipschitz continuous (with constant M L and M R )and α > max( M L , M R ) , then there exists ǫ > such that | σ S ( X ) | ≤ √ f ′ R ( x ∗ ) − f ′ L ( x ∗ ) (cid:0) | x L − x ∗ || x L − x ∗ | ( | x L − x ∗ | + | x L − x ∗ | ) M L + | x R − x ∗ || x R − x ∗ | ( | x R − x ∗ | + | x R − x ∗ | ) M R (cid:1) , when X ∈ ( − ǫ, ǫ ) .Proof Since Lemma 3.3 applies, we know that σ S ( X ) returns the unique pointof intersection between q L and q R in ( x L , x R ). The intersection points of thesequadratics are the roots to the polynomial ax + bx + c = 0, where: a = f R − f L ,b = f R − f L − ( x R + x R )( f R − αh ) + ( x L + x L )( f L − αh ) ,c = f R − f L − x R f R + x L f L + x R x R ( f R − αh ) − x L x L ( f L − αh ) . Assuming without loss of generality x ∗ = 0, we invoke Theorem 3.1 to makethe following substitutions for ℓ ∈ { L, R } . f ℓ = f ℓ (0) + x ℓ f ′ ℓ (0) + ( x ℓ ) f ′′ ℓ ( ξ ℓ ) , ξ ℓ ∈ [0 , x ℓ ] ,f ℓ = f ′ ℓ (0) + (cid:0) x ℓ f ′′ ℓ ( ξ ℓ ) + x ℓ f ′′ ℓ ( ξ ℓ ) (cid:1) , ξ ℓ ∈ [0 , x ℓ ] , ξ ℓ ∈ [0 , x ℓ ] ,f ℓ = f ′′ ℓ ( ξ ℓ ) , ξ ℓ ∈ [0 , x ℓ ] . By inserting the substitutions above into the definitions of a, b and c , we get: a = ( f ′′ R (0) − f ′′ L (0)) + K Ra − K La ,b = f ′ R (0) − f ′ L (0) + K Rb − K Lb ,c = K Rc − K Lc , where K ℓa , K ℓb and K ℓc are in turn defined for ℓ ∈ { L, R } by: K ℓa = (cid:0) f ′′ ℓ ( ξ ℓ ) − f ′′ ℓ (0) (cid:1) ,K ℓb = (cid:0) x ℓ ( f ′′ ℓ ( ξ ℓ ) − f ′′ ℓ ( ξ ℓ )) + x ℓ ( f ′′ ℓ ( ξ ℓ ) − f ′′ ℓ ( ξ ℓ )) (cid:1) + ( x ℓ + x ℓ ) αh ( X ) ,K ℓc = x ℓ (cid:0) x ℓ ( f ′′ ℓ ( ξ ℓ ) − f ′′ ℓ ( ξ ℓ )) + x ℓ ( f ′′ ℓ ( ξ ℓ ) − f ′′ ℓ ( ξ ℓ )) (cid:1) − x ℓ x ℓ αh ( X ) . Writing a, b and c in terms of these K values allows us to give bounds usingthe assumption that f ′′ L and f ′′ R are Lipschitz Continuous . | K ℓa | ≤ M ℓ | x ℓ | ≤ M ℓ ǫ, (3.10a) | K ℓb | ≤ ( | x ℓ | + | x ℓ | ) (cid:0) M ℓ | x ℓ | + αh ( X ) (cid:1) ≤ ǫ ( M ℓ + 2 α ) , (3.10b) | K ℓc | ≤ | x ℓ || x ℓ | M ℓ (cid:0) | x ℓ | + | x ℓ | (cid:1) ≤ M ℓ ǫ . (3.10c) Choose an ǫ small enough to ensure ( f ′ R (0) − f ′ L (0)) > K Lb − K Rb (notethat b → f ′ R (0) − f ′ L (0) as ǫ → ǫ ≤ ǫ , then it follows that b > σ S ( X ) = − b + √ b − ac a , (3.11)because the other root lies outside [ x L , x R ] as ǫ → 0. We apply Taylors The-orem to conclude ∃ ξ ∈ [ −| ac | , | ac | ] such that √ b − ac = b + 2 ac/ p b − ξ .Since ac → ǫ → 0, there exists ǫ such that | ac | ≤ b when ǫ < ǫ .Then if ǫ < min( ǫ , ǫ , ǫ ), we have: | σ S ( X ) | = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) − b + √ b − ac a (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = 12 | a | (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ac p b − ξ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) , ξ ∈ [0 , ac ] , ≤ | c | p b − ξ ≤ | c | p b − | ac | ≤ √ (cid:12)(cid:12)(cid:12) cb (cid:12)(cid:12)(cid:12) , = √ K Rc − K Lc f ′ R (0) − f ′ L (0) + K Rb − K Lb ≤ √ | K Rc | + | K Lc | f ′ R (0) − f ′ L (0) . Finally we insert Equation (3.10c) and the result follows.Theorem 3.2 is remarkably similar to [11, Theorem 4.1], the equivalentresult which bounds the convergence of the Mifflin-Strodiot method . Usingour own notation for their result, Mifflin and Strodiot showed | x Mi − x ∗ | ≤ A i | x ∗ − x L ,i | + B i | x L ,i − x ∗ | , for each iteration i , where A i and B i are sequenceswhich converge to 0 as i → ∞ . If we rewrite the result from Theorem 3.2 in thesame form, the sequence A i may be written explicitly in terms of x L ,i , x L ,i , and x L ,i , and similarly for B i . Then, if x L ,i and x L ,i both converge to x ∗ , the rateat which A i and B i tend to zero begins to resemble quadratic convergence.However, Theorem 3.2, like [11, Theorem 4.1] does not prove quadraticconvergence, nor even linear convergence. This is because we cannot know apriori whether the sequences { x L ,i } and { x L ,i } converge at all, nor how quicklyif they do. Suppose there exists an iteration i such that ∀ i > i , x L ,i isconstant. In this case, the bound given by Theorem 3.2 will decrease, but notconverge to 0. In practice, when this occurs x Mi will still converge to x ∗ butonly sub-linearly, while b ( X i ) { x L ,i } and { x L ,i } converge to x ∗ , butwe only update x L ,i once every k iterations. Then we may say that | x Mki − x ∗ | → i → ∞ , but depending on the size of k , this may not beparticularly useful.In short, there are three main weaknesses of the SUPM which lead toits failure. The first is that we need to choose α , a suitable value of whichwe cannot know in advance. The second is that convergence only occurs inreasonable time if both updates of x L ,i and x L ,i occur regularly. Finally, werequire a bracket sufficiently close to the solution that f is unimodal before anyof our results apply. These shortcomings of the SUPM motivate subsequentsections, in which we describe more stable variations of the UPM. seven-point algorithm for piecewise smooth univariate minimization 13 In Section 3 , we showed that if α was sufficiently large, then the σ S is guar-anteed to select a point within ( x L , x R ). Coupled with a restriction on theminimum distance between points, this is enough to ensure convergence even-tually. However, for an arbitrary piecewise smooth objective function, we haveno idea what a suitable α might be. As there is no finite value of α whichsatisfies Lemma 3.2 for all functions, we ask what happens if we let α → ∞ and name the corresponding variation of the SUPM the Extremal Underesti-mating Polynomial Method (EUPM), denoting its step function and updatefunction by σ E and U E respectively.As α → ∞ we are assured by Lemma 3.2 that q L and q R will underestimate f L and f R respectively within [ x L , x R ]. Furthermore, if h ( X ) > 0, then we canpick a sufficiently large α such that the coefficient of x in q k ( x ; X , α ) (seeEquation (3.3)) will be negative. Once α satisfies both of these, then q L and q R will intersect at exactly one point, which is the point that σ S ( X ) returns.We define σ E ( X ) to be this point which we calculate by identifying the root of q R ( x ; X , α ) − q L ( x ; X , α ) = 0 , which remains bounded as α → ∞ . By startingfrom Equation (3.11) and noting the Taylor Series as α → ∞ , we find: σ E ( X ) = x R x R − x L x L x R + x R − x L − x L . (4.1)Note that the step function σ E does not depend on either x L or x R . There-fore, within the context of this section, our extended bracket X is of the form: X = (cid:0) x L x L x M x R x R (cid:1) T , (4.2)In order to construct U E , we first define ˜ U S to be the update function U S ,were we remove the entries corresponding to x R and x L . Using this notation,we define: U E ( X ) := ˜ U S ( σ E ( X ) , X ) . (4.3)Given σ E and U E , the EUPM is precisely in the form of Algorithm 1. Remark 4.1 Equation (4.3) is of the same form as the update function usedby the Mifflin-Strodiot method [12]. Remark 4.2 We see the EUPM is qualitatively different from the SUPM inthat σ E depends only on the values of X , and not on the function f at all.Therefore, like Golden Section, the EUPM is equally suited to smooth, non-smooth, unimodal and multi-modal functions and is scale invariant.4.1 Theoretical Convergence of the EUPMIn this section we derive a bound for the convergence rate of the EUPM. Inparticular, we find an integer k and constant C such that b ( X i + k ) /b ( X i ) ≤ C holds for any objective function. While there exist infinitely many possibleojective functions, there are only finitely possible values for X i + k given X i .This is because σ E does not depend on the objective function f and thus theonly effect that f has on the EUPM is in the function U E (See Equation (3.5))by determining which update function: U , U , U or U to apply, none of whichdepend on f .Therefore, there are only 4 possible values of X i +1 given X i : U j X for j =1 , , , 4. As a consequence of this, there are at most 4 k possible values of X i + k given X i . Each possible value of X i + k depends on the sequence of updatefunctions which generated it. Remark 4.3 In this section, we will refer to the update functions U , U , U and U extensively. To simplify notation, we replace U j with ˜ U j , where ˜ U j ( X ) := U j ( σ E ( X ) , X ). For the remainder of Section 4, U j should be read as ˜ U j . Definition 4.1 Let {X i } ni =0 be a sequence of extended brackets generated bythe EUPM such that X i +1 = U E ( X i ) for each i . Let I = { i j } n − j =0 ∈ { , , , } n be the sequence of numbers such that X j +1 = U i j ( X j ) , ∀ j ∈ { , , . . . , n − } .Then we call I the sequence of EUPM iterations which generate {X i } ni =0 .Moreover we write: U I := U i n ◦ . . . U i ◦ U i , where I = { i j } nj =1 . Definition 4.2 Define I n to be the set of all length n sequences of EUPMiterations. Definition 4.3 Define the Q k ( f, X ) to be the sequence of the first k itera-tions generated by the EUPM when applied to the function f with the startingbracket X . Lemma 4.1 Let f be a function, X ∈ B f and I ∈ I n such that I = { I , I } = Q n ( f, X ) , where I ∈ I n − k and I ∈ I k . Then it holds that: Q k ( f, U I X ) = I . Using the notation from Definitions 4.1 and 4.3, it follows X n + k = U Q k ( f, X n ) ( X n ).Since Q k ( f, X n ) ∈ I k for every function f , it follows that: b ( X n + k ) b ( X n ) ≤ C for any function f ⇔ b ( U I X n ) b ( X n ) , ∀ I ∈ I k . Definition 4.4 Given I ∈ I n , we define:˜ B I := {X ∈ R | ∃ f such that X ∈ B f , Q n ( f, X ) = I } . Lemma 4.2 Let I ∈ I n be of the form I = { I , I } , where I ∈ I n − k and I ∈ I k . Then it holds that: ˜ B I ⊆ ˜ B I and U I ( ˜ B I ) ⊆ ˜ B I .Proof If X ∈ ˜ B I , then there exists a function f such that I = Q n ( f, X ).Since, I = { I , I } , then I = Q n − k ( f, X ) by definition of Q . Hence X ∈ ˜ B I .Moreover, I = Q k ( f, U I X ) by Lemma 4.1, which implies U I X ∈ ˜ B I . seven-point algorithm for piecewise smooth univariate minimization 15 We finally state our main result. Theorem 4.1 The EUPM converges R-linearly. In particular: max I ∈I max X ∈ ˜ B I b ( U I X ) b ( X ) ≤ , for any bracket . Given how long the proof of Theorem 4.1 is, we reserve an entire sectionfor it. The proof may be found in Section 4.2, although it may be skipped onfirst reading.Theorem 4.1 provides a bound on the linear convergence rate for the EUPMalbeit a weak one. In practice, we observe far faster convergence as will bedemonstrated in Section 6 and appendix B. However, the main justificationfor the EUPM is its robustness. In Section 3, we assumed that the SUPMstarted with a bracket that was already sufficiently close to the solution.Once equipped with this, we might expect the SUPM to converge quickly.The EUPM is useful for producing such a bracket in the first place. In Sec-tion 5, we present the final variation of the UPM, in which we combine theinitial robustness of the EUPM with the desirable properties of the SUPM.4.2 Proof of Theorem 4.1We divide this proof into two parts, which are defined by Lemmas 4.3 and 4.4. Lemma 4.3 It holds that b ( U I X ) /b ( X ) ≤ ∀ I ∈ A where A = { , , , , , , , , , , , } . Lemma 4.4 If b ( U I X ) /b ( X ) ≤ ∀ I ∈ A then b ( U I X ) /b ( X ) ≤ ∀ I ∈ I ,where A is as defined in Lemma 4.3. Given that the proof of Lemma 4.3 is entirely algebraic calculations, weleave it for Appendix A. In this section, we work towards proving Lemma 4.4.Ultimately, Theorem 4.1 follows from these two results. Definition 4.5 Let I = { i j } nj =1 ∈ I n be a length n sequence of iterationsof the EUPM. A sub-string of the sequence I is a subsequence of the form J = { i j } k j = k such that 1 ≤ k < k ≤ n . Denote by S ( I ) the set of sub-stringsof the sequence I .The first step towards proving Lemma 4.4 is to show that if I ∈ I isa length 5 sequence of iterations and J ∈ S ( I ) is a sub-string of I , then b ( U J X ) /b ( X ) ≤ implies b ( U I X ) /b ( X ) ≤ . Lemma 4.5 Let X be an extended bracket of f . Given I ∈ I n , and J ∈ S ( I ) ,it holds that: max X ∈ ˜ B I b ( U I X ) b ( X ) ≤ max X ∈ ˜ B J b ( U J X ) b ( X ) . In the next few lemmas, we show that not all sequences of { , , , } arepossible sequences of EUPM iterations given a particular starting bracket X . Lemma 4.6 Let I = { i j } n − j =0 ∈ I n be a sequence of EUPM iterations. If i j = 1 then i j +1 ∈ { , } . Similarly, if i j = 2 then i j +1 ∈ { , } .Proof If i j = 1, then by definition X j +1 = U X j . From this, we calculate: σ E ( X j +1 ) − x Mj +1 = x L ,j +1 x L ,j +1 − x L ,j +1 x L ,j +1 x L ,j +1 + x L ,j +1 − x L ,j +1 − x L ,j +1 − x Mj +1 , = x L ,j x Mj − x L ,j x L ,j x L ,j + x Mj − x L ,j − x L ,j − x L ,j x L ,j − x L ,j x L ,j x L ,j + x L ,j − x L ,j − x L ,j , = − ( x L ,j − x Mj )( x L ,j − x L ,j )( x L ,j − x L ,j )( x Mj + x L ,j − x L ,j − x L ,j )( x L ,j + x L ,j − x L ,j − x L ,j ) < , since each X j is an extended bracket. Given that σ E ( X j +1 ) − x Mj +1 < 0, it fol-lows from Equation (3.5) that i j +1 ∈ { , } . The equivalent argument appliesfor when i j = 2.In the statement of Lemma 4.6, one may notice a symmetry between thefunctions U , U and U , U respectively. In the lemmas which follow, we derivea relation between U , U and U , U respectively such that if b ( U I X ) /b ( X ) ≤ holds for some I , it will still hold after replacing U and U with U and U and vice versa. Definition 4.6 Let I ∈ I n be a sequence of EUPM iterations. We define X : R → R to be the function which satisfies X ( a, b, c, d, e ) = − ( e, d, c, b, a ).We make use of the function X because of the following useful properties. Lemma 4.7 It holds that XX X = X , σ E ( X X ) = − σ E ( X ) , b ( X X ) = b ( X ) , U X = XU X X , U X = XU X X , U X = XU X X and U X = XU X X . Having shown that the function X relates U to U and U to U , we nowshow that this relation also holds for sequences of iterations. Definition 4.7 Let I = { i j } nj =1 ∈ I n . Define W : I n → I n to be the function: W ( I ) = { w ( i j ) } nj =1 , where w (1) = 2 , w (2) = 1 , w (3) = 4 and w (4) = 3. Lemma 4.8 If f and g are functions such that g ( x ) = f ( − x ) and X ∈B f , then X X ∈ B g . Moreover, if {X fi } ni =0 and {X gi } ni =0 are the sequences ofextended brackets generated by the EUPM when applied to f and g startingfrom X and X X respectively, then it holds that X fj = X X gj for every j =0 , , . . . , n and Q n ( g, X X ) = W ( Q n ( f, X )) .Proof Note that X X ∈ B g follows from Definitions 3.1 and 4.6. Let {X fi } ni =0 and {X gi } ni =0 be the sequences of extend brackets generated by applying theEUPM with the starting brackets X and X X to f and g respectively. Define: ϕ ( X ) = σ E ( X ) − x M . (4.4) seven-point algorithm for piecewise smooth univariate minimization 17 Since σ E ( X ) = − σ E ( X X ), it follows that f ( σ E ( X )) = g ( σ E ( X X )) and ϕ ( X ) < ⇔ ϕ ( X X ) > 0. Given that sign ( ϕ ( X )) is precisely the condition inEquation (3.5) which determines whether to apply update functions U , U or U , U , it follows that X f = U j X f = U j X ⇔ X g = U W ( j ) X g = U W ( j ) X X .Applying Lemma 4.7, we note that U W ( j ) X X = XU j X which impliesthat X g = X X f . From this it follows inductively that X fj = X X gj for all j = 1 , , . . . , n . Moreover, if we write Q n ( f, X ) and Q n ( g, X X ) in the forms { i fj } n − j =0 and { i gj } n − j =0 respectively, it also follows inductively that i fj = W ( i gj )for each j = 0 , , . . . , n − 1; in other words Q n ( g, X X ) = W ( Q n ( f, X )). Corollary 4.1 If I ∈ I n , then: X ( ˜ B I ) = ˜ B W ( I ) (recall Definition 4.4). Corollary 4.2 It holds that U I X = XU W ( I ) X X , ∀ I ∈ I n .Proof Let f be a function such that I = Q ( f, X ). By Lemma 4.8, the function g ( x ) = f ( − x ) has the property that Q n ( g, X X ) = W ( Q n ( f, X )) and U I ( X ) = U Q n ( f, X ) ( X ) = XU Q n ( g,X X ) ( X X ) = XU W ( I ) ( X X ).The map W serves as a natural bijective between sequences of EUPMiterations which start with 1 or 4 and those which start with 2 or 3. Dividing I is useful due to the following result. Corollary 4.3 Given I ∈ I n a sequence of EUPM iterations, it holds that: max X ∈ ˜ B I b ( U I X ) b ( X ) = max X ∈ ˜ B W ( I ) b ( U W ( I ) X ) b ( X ) Proof Using Lemma 4.7 and Corollaries 4.1 and 4.2 it follows that:max X ∈ ˜ B I b ( U I X ) b ( X ) = max X ∈ ˜ B I b ( XU W ( I ) X X ) b ( X ) = max X ∈ ˜ B I b ( U W ( I ) X X ) b ( X X ) = max sss ∈ ˜ B W ( I ) b ( U W ( I ) sss ) b ( sss ) . We are now equipped to prove Lemma 4.4. For this proof, we write partic-ular sequences of iterations such as { , , } simply as 413. Proof (Lemma 4.4) To prove this lemma, we list out elements of I , I and I which begin with either 1 or 4 and mark them in the following way. If asequence is contained in A , we overline it. If a sequence contains a sub-stringwhich is contained in A , we underline the relevant sub-string. Finally, if asequence contains a sub-string J such that W ( J ) ∈ A , then we place squarebrackets around J . We will show that all elements of I can be marked in oneof these three ways. First we list elements of I which start with either 1 or4. There is no need to list the sequences starting with 2 or 3 because each ofthose sequences is merely W ( J ) for some J listed below.111 141 143 411 422 431 4[33] 441 443114 142 144 414 423 432 434 442 444 We see that only 7 (or 14 if you count the equivalent sequences starting with 2or 3) of the sequences above are unmarked. We take these seven sequences, andlist all elements of I which begin with any of these 7 sub-strings. We neglectto list out elements of I which begin with a marked sub-string because theseelements are themselves guaranteed to be marked .1141 1143 1411 1422 4111 4231 42[33] 4[311] 43221142 1144 1414 1423 4114 4232 4[234] 4314 4[323]Using the same procedure as earlier, we see only 3 sequences are left unmarked.We list out all elements of I which begin with one of these three sequencesand note that all of them are marked.11422 42311 4[2322] 11423 4[2314] 42[323] . If a sequence J is marked in any of these ways, then it follows from Lemma 4.5and Corollary 4.3 that b ( U I X ) /b ( X ) ≤ ∀ I ∈ A implies b ( U J X ) /b ( X ) ≤ . In Section 3, we identified two main weaknesses of the SUPM: that of choosinga suitable α , and that of ensuring that both x R ,i and x L ,i converge to x ∗ fastenough. In this section, we introduce a heuristic which makes use of the featuresof the SUPM, while being equipped with a safeguarding option designed toavoid the situations where the SUPM fails. We call this heuristic the DynamicUPM (DUPM), and denote the step and update functions for the DUPM by σ D and U D respectively.The effect that α has on σ S lies in how the model functions (see Equa-tion (3.3)) are constructed. We see that in this definition, α is multiplied to h ( X ). Since h ( X ) → α should not a problem ultimately. However, theDUPM may stall temporarily when α is too small. Therefore, we constructthe DUPM such that it will increase α when necessary, but never decrease it.In order for Theorem 3.2 to apply, we need α to satisfy Lemma 3.2. Whilethere is no way to check this, a necessary condition for Lemma 3.2 to apply isthat f ( x M ) ≥ q L ( x M ) and f ( x M ) ≥ q R ( x M ). Using a method similar to theproof of Lemma 3.2, we find that this is equivalent to: α ≥ h ( X ) max k ∈{ L,R } (cid:0) f [ x k , x k , x k ] − f [ x M , x k , x k ] (cid:1) . (5.1)Therefore, at each iteration of the DUPM we set α i +1 ≥ max( α i , α ∗ ) where α ∗ is the smallest value α which satisfies Equation (5.1).Next we wish to force the DUPM to update both x L ,i and x R ,i regularly. Inorder to do this, we must first equip the DUPM to recognize when this occurs.We append the three variables u , u , u , whose purpose is record whether theupdate function U D updated x L or x R during each of the last three iterations, seven-point algorithm for piecewise smooth univariate minimization 19 to the extended bracket X . The update function U D is defined to be the naturalextension of U S (see Equation (3.5)) which includes u , u and u : U D (˜ x ; X , α ) = ( U (˜ x ; X ) , R, u , u ) if ˜ x < x M and f (˜ x ) < f ( x M )( U (˜ x ; X ) , L, u , u ) if ˜ x > x M and f (˜ x ) < f ( x M )( U (˜ x ; X ) , R, u , u ) if ˜ x > x M and f (˜ x ) > f ( x M )( U (˜ x ; X ) , L, u , u ) if ˜ x < x M and f (˜ x ) > f ( x M ) , (5.2)where the functions U , . . . , U are as defined in Equation (3.6).If u = u = u , then this means that the DUPM has updated either x L ,i ,or x R ,i for each of the last three iterations. When this occurs, we force theDUPM to take an EUPM step instead, σ D ( X ; α ) = σ E ( X ). This interferenceis similar to the fall back option used by Brent’s method [2,1]. The choice tointerfere after three iterations is based on practical experience, and not on anytheoretical insight.Finally, we define one more condition under which we interfere with theSUPM. From practical experience, we observe situations where either q L or q R is a convex function whose local minimum is returned by σ S , which resultin slow convergence. As a response to this, we ensure that α is sufficiently bigthat σ D returns a point of intersection between q L and q R . More rigorously,we require α such that q L ( σ S ( X ; α ); X , α ) = q R ( σ S ( X ; α ); X , α ) (5.3)holds, where σ S is the step function for the SUPM (see Equation (3.4)).Define χ ( X ) to be the the smallest value of α such that Equation (5.3)holds for all α ≥ χ ( X ). Then at each iteration of the DUPM, we require α i +1 ≥ max( α i , χ ( X )).In order to compute χ ( X ), note Equation (5.3) is satisfied whenever q L and q R are concave functions, provided that Equation (5.1) holds. Define α + to be max k f [ x k , x k , x k ] /h ( X ), k ∈ { L, R } . Then it follows that for all α > α + ,Equation (5.3) is satisfied, implying that χ ( X ) ≤ α + .If at iteration i , α i satisfies Equation (5.3), then there is no need to compute χ ( X ). Otherwise, we know that χ ( X ) lies in the interval ( α i , α + ]. Therefore,we can apply bisection to compute χ ( X ) to any desired accuracy.To define the DUPM rigorously, we must insert an additional line intoAlgorithm 1. Before computing σ D , we set α i +1 = max (cid:18) α i , χ ( X i ) , h ( X ) max k ∈{ L,R } (cid:0) f [ x k , x k , x k ] − f [ x M , x k , x k ] (cid:1)(cid:19) . Finally we define σ D ( X ; α ) := (cid:26) if u = u = u σ E ( X )else σ S ( X ; α ) . (5.4)Ultimately the purpose of these interferences is to ensure that Theorem 3.2applies as soon as possible while forcing both x L ,i and x R ,i to be updated. Thus far, we have presented three algorithms: the SUPM, EUPM, and DUPM.Of these, we have only bounded the theoretical convergence rate for EUPM.While Theorem 3.2 is insightful, it does not yield any concrete bound. More-over, while the DUPM has some nice properties which are designed to strengthenthe SUPM, we have not offered any proof as to their effectiveness. Therefore, inorder to justify these algorithms, we demonstrate their use on a variety of testfunctions, comparing them to Brent’s method [1,2] and the Mifflin-Strodiotmethod [11]. Also, recall Golden Section which converges Q-linearly with rate0 . f such that the minimum Lipschitz constant for f ′′′ lies between 0 . f SU = − √ e exp (cid:0) − x (cid:1) , − ≤ x ≤ , (6.1) f SU = 124 x , − ≤ x ≤ , (6.2) f SU = 111 (cid:0) − sin(2 x − π ) − x − x (cid:1) , − . ≤ x ≤ , (6.3) f SU = 12500 (cid:18) x − cos (5 πx )25 π − x sin(5 πx )5 π (cid:19) , − ≤ x ≤ , (6.4) f SU = − (cid:16) x + (1 − x ) (cid:17) , . ≤ x ≤ . , (6.5) f SU = 16000 (cid:18) e x + 1 √ x (cid:19) , . ≤ x ≤ , (6.6) f SU = − 113 (16 x − x + 5) e − x , . ≤ x ≤ . . (6.7)For each test function, we have an interval [ a, b ] on which the function isintended to be used. We sample 4 points uniformly from both [ a, a + ( b − a )]and [ b − ( b − a ) , b ]. After evaluating the function at these 8 points, we selectthe point where f is minimised to be x M ; usually this will be one of the pointscloser to ( b − a ). Finally we order the remaining points, and select the 3closest on the left and right to be x L , x L , . . . , x R . Selecting our initial pointsin this way yields an extended bracket.Having generated the bracket, we run each algorithm and compute theaverage convergence rate over 1000 different random initialisations. If an algo-rithm converged for a particular starting bracket, then its average convergencerate is by definition between 0 and 1, the smaller the better. Iffor any random seven-point algorithm for piecewise smooth univariate minimization 21SUPM with parameter value Other methodsFunctions α = 0 α = 0 . α = 1 α = 10 EUPM DUPM Brent Mifflin f SU . . . . . . 48 0 . . f SU . 68 0 . . . . . . . f SU ∞ . . . . . . ∞ f SU ∞ . . . . . . ∞ f SU ∞ . . . . . . . f SU ∞ . . 615 0 . . . . ∞ f SU . 35 0 . . . . . . ∞ Table 6.1 Average convergence rate of different algorithms on the set of smooth unimodaltest functions. initialisation, an algorithm did not converge, we assign ∞ instead. We presentthese results in Table 6.1.First of all, we see that Brent’s method consistently converges the fastest.This is not surprising, given that Brent’s method is the only algorithm shownhere which is specifically designed for smooth functions. While the Mifflin-Strodiot method is faster in select circumstances, it fails entirely on others.This is also what we observe from the SUPM with α = 0.Looking at the SUPM’s performance, we see that that the larger α is, theslower the SUPM will converge (assuming it converges in the first place). TheDUPM outperforms the SUPM with α = 1, but is not always faster than theSUPM with α = 0 . 1, which suggests that while Lemma 3.2 is required forour theoretical results, satisfying it is by no means necessary. Meanwhile, theEUPM is both consistent and slow, converging with an average rate compara-ble to Golden Section.In short, the DUPM is not competitive with Brent’s method for this classof test functions. However, it converge with an average rate which is betterthan for Golden Section, implying that it is not prohibitively slow.Next we define our set of non-smooth unimodal test functions: f NU = − (cid:18) − | x | (cid:19) , − ≤ x ≤ , (6.8) f NU = 16 max (cid:18) x + 3 , log( x ) (cid:19) , − ≤ x ≤ , (6.9) f NU = 124 max (cid:18) x + 3 , x − (cid:19) , − ≤ x ≤ , (6.10) f NU = 1160 max (cid:18) x + 3 , exp( x ) (cid:19) , − ≤ x ≤ , (6.11) f NU = 1150 max (exp( − x ) , exp( x )) , − ≤ x ≤ . (6.12)The results for these functions are shown in Table 6.2. These Results areparticularly important as it was for this class of functions that the UPM wasdesigned in the first place. As we’d hope, we see the SUPM with α = 1 andthe DUPM outperform Brent’s method for every function. Despite the fact α = 0 α = 0 . α = 1 α = 10 EUPM DUPM Brent Mifflin f NU ∞ . . . . . 264 0 . ∞ f NU ∞ . . . . . 427 0 . ∞ f NU ∞ . . . . . . . f NU ∞ . . . . . . . f NU ∞ . . . . . . . Table 6.2 Average convergence rate of different algorithms on the set of non-smooth uni-modal test functions. that the Mifflin-Strodiot method is also designed for this context, it does notalways successfully converge. In particular, we observe that the Mifflin-Strodiotmethod struggles with non-convex functions.Finally we define our smooth multimodal test functions. f SM = x (cid:18) (cid:18) x (cid:19)(cid:19) , − ≤ x ≤ , (6.13) f SM = − πx ) , − ≤ x ≤ , (6.14) f SM = − (cid:18) π ( x − 120 ) (cid:19) , . ≤ x ≤ , (6.15) f SM = 15 sin (cid:18) x − (cid:19) + sin (cid:18) x − (cid:19) ! , − ≤ x ≤ , (6.16) f SM = x − cos x + 1 , ≤ x ≤ , (6.17) f SM = 171 (cid:16) (log( x − + (log(10 − x )) − x (cid:17) , . ≤ x ≤ . , (6.18) f SM = 140 (cid:18) sin( x ) + sin (cid:18) x (cid:19) + log( x ) + 2125 x (cid:19) , . ≤ x ≤ . (6.19)It is important to note that theory for neither the UPM nor indeed Brent’smethod is suitable for multi-modal functions. However, both algorithms areequipped with features ensuring sufficient robustness in the general case thatthey converge (possibly slowly) to a locality where the function is unimodal.For these test-functions, Brent’s method is once again the best. Whileslower than Brent’s method, the DUPM consistently performs better thanGolden Section, which is an ideal result considering that these are not idealcircumstances for the DUPM. Finally we see that the Mifflin-Strodiot methodis entirely unsuitable for this type of problem, and as such consistently failsto converge in reasonable time. seven-point algorithm for piecewise smooth univariate minimization 23SUPM with parameter value Other methodsFunctions α = 0 α = 0 . α = 1 α = 10 EUPM DUPM Brent Mifflin f SM ∞ . . . . . . ∞ f SM ∞ . . . . . 486 0 . ∞ f SM ∞ . . . . . . ∞ f SM ∞ . . . . . . ∞ f SM ∞ . . . . . . ∞ f SM ∞ . . . . 624 0 . . ∞ f SM ∞ . . . . . . ∞ Table 6.3 Average convergence rate of different algorithms on the set of smooth multimodaltest functions. In this paper, we have constructed a univariate optimization algorithm forblack box, piece-wise smooth functions. As seen in Section 6, this new method,the DUPM, converges both more robustly and often faster than existing meth-ods for such problems. Furthermore, while it is not the fastest algorithms forstandard smooth test functions, it is not prohibitively slow either.It must be acknowledged that the comparison between the UPM, theMifflin-Strodiot method and Brent’s method is not entirely fair given thatthey require a different number of points to start. Therefore, in the context ofa line-search, we expect to start with a method like Golden Section, and switchto the DUPM once we have accumulated enough points to form and extendedbracket. Regardless, the DUPM offers a univariate solver which performs wellin most contexts and is therefore suitable for non-smooth functions. A Proof of Lemma 4.3 For the calculations in this section, we use the following change of variables: ttt = (cid:0) p q r s (cid:1) T = (cid:0) x L − x L , x M − x L , x R − x M , x R − x R (cid:1) T . Converting our calculations to being in termsof ttt both reduces the number of variables, and simplifies the constraint X ∈ ˜ B (recall Defi-nition 4.4) into ttt ∈ R .4 Jonathan Grant-Peters, Raphael HauserThe algebra for the calculations which follow is tedious, and therefore we employedMathematica to compute the values of U I , I ∈ A as well as changing the variables used. b ( U X ) b ( X ) = ( q + r )( p + q + r )2( q + r )( p + q + r ) + ( q + r ) + s + s ( p + 3 q + 3 r ) < ,b ( U X ) b ( X ) = qp + 2 q + r < ,b ( U X ) b ( X ) = q ( p + q )( p + q + r )( p + 2 q + r )(3 q + 3 qr + r + p (2 q + r )) ,< q ( p + q )3 q + 3 qr + r + p (2 q + r ) < ,b ( U X ) b ( X ) = ( q + r )( p + q + r )2( q + r )( p + q + r ) + ( q + r ) + s + s ( p + 3 q + 3 r ) < ,b ( U X ) b ( X ) = 12 + q + p r + 7 q r + 5 qr + r + p ( q + r )(3 q + 2 r ) q ( p + q )( p + q + r ) ! ≤ ,b ( U X ) b ( X ) = q ( p + q )( p + q + r )( p + 2 q + r )( p + 3 q + r ( r + s ) + 2 q (2 r + s ) + p (3 q + 2 r + s )) ,< q ( p + q )3 q ( p + q ) + p + r ( r + s ) + 2 q (2 r + s ) + 2 rp + ps < ,b ( U X ) b ( X ) = ( p + q + r )( p + 2 q + r )( q ( p + q ) − r ( r + s ))( p + 2 q + 2 r + s ) × q ( q + r ) + r + p (2 q + r ) + q ( s − r ) + p (6 q + 2 r + q (7 r + s ))) ,< ( p + q + r )( q ( p + q ) − r ( r + s ))4 q ( q + r ) + r + p (2 q + r ) + q ( s − r ) + p (6 q + 2 r + q (7 r + s )) ,< ( p + q + r )( q ( p + q ) − r )4 q ( q + r ) + r + p (2 q + r ) − q r + p (6 q + 2 r + 7 qr ) ,< q ( p + q ) − r q ( p + q ) < . For the sequence 414, we need to employ another piece of information. From Equa-tion (3.5), we see that U is only applied to X if ϕ ( X ) < q ( p + q ) − r ( r + s ) > b ( U X ) b ( X ) = 12 + q + r ( p + r ) + q (7 r + s ) + qr (3 p + 7 r + 3 s ) q ( q ( p + q ) − r ( r + s )) ! , which is bounded above by precisely when q ( p + q ) − r ( r + s ) > 0. Therefore b ( U X ) /b ( X ) < as required.There remain 4 elements of A for which we must establish a bound: 434, 4322, 4314 and4114. As the previous change of variables does little to simplify the calculations for thesesubsequences, we instead use the following alternative change of variables: ( a, b, c, d ) = (cid:0) x R − x L , x R − x L , x R − x L , ϕ ( X ) (cid:1) . For this set of variables, X ∈ ˜ B is equivalent to seven-point algorithm for piecewise smooth univariate minimization 25 a, b, c > b > a and c > a . Using this transformation, we find: b ( U X ) b ( X ) = bc ( b + c )( b + c − a )( ab + bc + c )( ab + 2 b c + 3 bc + c ) < bc ( b + c ) ab + 2 b c + 3 bc + c < bc ( b + c )2 b c + 3 bc + c < b ( b + c )2 b + 3 bc + c < b b + c < ,b ( U X ) b ( X ) = bc ( b + c )( b + c − a )( ab + bc + c )( ab + 2 b c + 3 bc + c ) < bc ( b + c ) ab + 2 b c + 3 bc + c < bc ( b + c )2 b c + 3 bc + c < b ( b + c )2 b + 3 bc + c < b b + c < . Remark A.1 From the calculations above, we see that b ( U X ) /b ( X ) = b ( U X ) /b ( X )and b ( U X ) /b ( X ) = b ( U X ) /b ( X ). While this may imply another relation which we havenot taken advantage of, we have not examined this further.Finally we have the sequences 4314 and 4114. Similar to what we did with the sequence 414,we need to make use of additional information, specifically the fact that U is only appliedif ϕ ( X ) = d < 0. In addition, U may only follow after U if ϕ ( U X ) < 0. This is equivalentto: − d ( ab + bc + c ) > abc ( b + c − a ). From this it follows that: b ( U X ) b ( X ) = abc ( b + c )( b + c − a )( ab + bc + c )( ac ( b + c ) − d ( ab + bc + c )) ,< abc ( b + c )( b + c − a )( ab + bc + c )( ac ( b + c ) + abc ( b + c − a )) , = bc ( b + c )( b + c − a )( ab + bc + c )((2 b + c )( b + c ) − ab ) < bc ( b + c ) ( ab + bc + c )(2 b + c )( b + c ) , = bc ( b + c )( ab + bc + c )(2 b + c ) < b b + c < . Similarly for the sequence 4314, U may only follow after U if ϕ ( U X ) > 0. This isequivalent to: − d ( ab + bc + c ) < abc ( b + c − a ). First we compute b ( U X ) b ( X ) = − d ( ac − d )( ab + bc + c ) a ( b + c )( ac ( b + c ) − d ( ab + bc + c )) . Next we observe that ϕ ( U X ) > − d > abc ( b + c − a ) ab + bc + c ⇔ ac − d > ac ( b + c ) ab + bc + c . (A.1)This implies that: b ( U X ) b ( X ) < − dc ( b + c ) ac ( b + c ) − d ( ab + bc + c ) . Finally we that a sufficient condition for b ( U X ) /b ( X ) < is: − dc ( b + c ) < ac ( b + c ) − d ( ab + bc + c ) , ⇔ − d ( − ab + bc + c ) < ac ( b + c ) . We see that this is true when we combine the requirement that ϕ ( U X ) > − d ( − ab + bc + c ) < − d ( ab + bc + c ) < ( ac − d )( ab + bc + c ) < ac ( b + c ) . Therefore, b ( U X ) /b ( X ) < holds and moreover b ( U I X ) /b ( X ) < ∀ I ∈ I .6 Jonathan Grant-Peters, Raphael Hauser B Numerical Performance of the EUPM In Section 4.1 we bounded the convergence rate over 5 iterations of the EUPM. In practicewe observe far superior convergence rates. From our observation, this seems to be becausethe extremal values of ttt which yield the worst convergence rate over 5 iterations tendto yield best case performance over 6 iterations and so on. In this section, we show whatperformance might realistically be expected from the EUPM.In order to construct a suitable experiment, we first must ask what factors affect theconvergence of the EUPM? The answer to this question is the function to minimise f ,along with the initial value of ttt . However, the only effect that the function f actually hasis to determine whether f (˜ x ) < f ( x M ). This along with the current value of ttt uniquelydetermines which update function will be used. Therefore, when testing the convergencerate of the EUPM, we do not test it on a range of functions, but rather for different binarysequences, where 1’s mean f (˜ x ) < f ( x M ) and 0’s the opposite.The main advantage of this is that while the set C ([ a, b ]) has infinite cardinality, the setof binary sequences of length n is finite. Therefore, given an initial bracket ttt , it is possible todetermine how the EUPM will perform on literally any function. Given ttt and I ∈ { , } n ,we plot the average convergence rate of the EUPM. I ∈ { , } . . A v e r ag e C o n v e r g e n ce R a t e Golden SectionEUPM Fig. B.1 We compare the convergence rate of the EUPM with Golden Section over allpossible sequences I ∈ { , } 0. These sequences are ordered such that the slowest ones areplotted first. For each sequences, we sample a large number of values for ttt and average overthese.For each sequence I ∈ { , } , we sample a large number ttt values, constrained suchthat | ttt | = 1, and average over these. This leaves us with an average convergence rate forevery possible sequence in { , } , and by extension for every possible function. We choose n = 10 and plot the resulting data corresponding to both the EUPM and Golden Section inFigure B.1.What we see is that Golden Section is the more conservative of the two. It performsnotably better in the worst case scenario even if notably worse in the best case scenario.Even though we see some convergence rates from the EUPM which are significantly worsethan those observed in Tables 6.1 to 6.3, they still are far better than the upper bound fromTheorem 4.1. References 1. Brent, R.P.: Algorithms for minimization without derivatives. 19732. Brent, R.P.: A new algorithm for minimizing a function of several variables withoutcalculating derivatives. Algorithms for minimization without derivatives pp. 200–248(1976) seven-point algorithm for piecewise smooth univariate minimization 273. Fletcher, R.: Practical methods of optimization. John Wiley & Sons (2013)4. Gill, P.E., Murray, W., Wright, M.H.: Practical optimization. SIAM (2019)5. Gould, N.: An introduction to algorithms for continuous optimization (2006)6. Hager, W.W.: A derivative-based bracketing scheme for univariate minimization andthe conjugate gradient method. Computers & Mathematics with Applications (9),779–795 (1989)7. Hansen, P., Jaumard, B., Lu, S.H.: Global optimization of univariate Lipschitz functions:I. Survey and properties. Mathematical programming (1-3), 251–272 (1992)8. Jamil, M., Yang, X.S.: A literature survey of benchmark functions for global optimiza-tion problems. arXiv preprint arXiv:1308.4008 (2013)9. Jarratt, P.: An iterative method for locating turning points. The Computer Journal (1), 82–84 (1967)10. Kiefer, J.: Sequential minimax search for a maximum. Proceedings of the Americanmathematical society (3), 502–506 (1953)11. Mifflin, R.: On superlinear convergence in univariate nonsmooth minimization. Mathe-matical programming (1-3), 273–279 (1990)12. Mifflin, R., Strodiot, J.J.: A rapidly convergent five-point algorithm for univariate min-imization. Mathematical programming (1-3), 299–319 (1993)13. Moore, R.E.: Interval analysis, vol. 4. Prentice-Hall Englewood Cliffs, NJ (1966)14. Murray, W., Overton, M.L.: Steplength algorithms for minimizing a class of nondiffer-entiable functions. Computing (4), 309–331 (1979)15. Nocedal, J., Wright, S.: Numerical optimization. Springer Science & Business Media(2006)16. Rudin, W., Others: Principles of mathematical analysis, vol. 3. McGraw-hill New York(1976)17. Yu, J., Vishwanathan, S.V.N., G¨unter, S., Schraudolph, N.N.: A quasi-Newton approachto nonsmooth convex optimization problems in machine learning. Journal of MachineLearning Research11