A seven-point algorithm for piecewise smooth univariate minimization
Jonathan Grant-Peters · Raphael Hauser
This publication is based on work supported by the EPSRC Centre for Doctoral Training in Industrially Focused Mathematical Modelling (EP/L015803/1) in collaboration with Siemens.

J. Grant-Peters, Mathematical Institute, University of Oxford, United Kingdom. E-mail: [email protected]

R. Hauser, Mathematical Institute, University of Oxford, United Kingdom
Abstract
In this paper, we construct an algorithm for minimising piecewise smooth functions for which derivative information is not available. The algorithm constructs a pair of quadratic functions, one on each side of the point with smallest known function value, and selects the intersection of these quadratics as the next test point. The algorithm relies on the quadratic functions underestimating the true function within a specific range, which is accomplished using an adjustment term that is modified as the algorithm progresses.
Keywords
Nonsmooth optimisation · nondifferentiable programming · univariate minimisation · quadratic approximation

1 Introduction

When solving the problem of minimizing a function of many variables, many existing solutions belong to a class of algorithms called line-search algorithms [15,3,5], which reduce the problem by iteratively restricting the search to 1-dimensional subspaces determined by a search direction, in order to determine a step size. This 1-dimensional problem may be solved either exactly or inexactly. Most methods rely on the latter approach, as it is computationally cheaper while no less effective.

While exact line search is rarely used, due to its cost, there exist certain special cases where it is more effective, such as Linear Programming. Moreover,
it has been observed by Yu et al. [17] that in the context of non-smooth optimisation, using exact line search may result in stepping to a location from where a better subdifferential approximation may be constructed, thus resulting in better search directions being selected at future iterations.

In this paper, motivated by designing an exact line search algorithm for non-smooth optimisation, we examine the effectiveness of existing univariate optimisation algorithms on a class of non-smooth objective functions and introduce a new algorithm which is specifically designed for this class. The particular problem for which we develop this method is a black box, piecewise smooth objective function for which derivatives are not available.

When conducting an exact line search, there are two distinct steps. First of all, we must construct a bracket, which means finding a closed interval which is guaranteed to contain a local minimum of the univariate objective function. The second is to find the local minimum within this interval. In this paper, we focus mainly on the second step, which we rigorously state below.
Problem 1.1
Given is a function f : R → R of the form f(x) = max_{i=1,...,k} f_i(x), where the f_i : R → R are smooth functions, an oracle for evaluating f(x), and an interval [a, b]. Design an optimisation method for finding a local minimiser of f in [a, b] which makes use of only oracle calls for f.

One class of algorithms which solves Problem 1.1 is that of 1d global optimisers. This class includes interval analysis based methods such as the Moore-Skelboe algorithm [13]. However, these methods require that the objective function is known explicitly, and are not applicable to black box functions. Another category of solutions is that of global Lipschitzian methods, such as those summarised by Hansen et al. [7]. While these might be feasible if we could guess a suitable Lipschitz constant, in practice there is little gained from them. Even if more than one local optimum existed, we have no reason to expect that finding one minimum as opposed to another would make an exact line search based method more effective. Therefore, while we will make use of some of the ideas behind such algorithms, we reject them as they involve needless extra work.

When choosing a local univariate solver, one faces a trade-off between speed and robustness. The basic methods are Golden Section [10,4], which converges Q-linearly for any continuous function, and interpolating methods [4,9] which, when successful, converge superlinearly. The latter's stability depends on their ability to construct a polynomial which approximates the objective function well locally. If the approximation does not fit the objective function well, then these algorithms may fail.

In practice, the most effective methods are bracketing-interpolation hybrid methods such as Brent's method [2,1] and Hager's cubic method [6], which combine the speed of interpolation methods with the stability of Golden Section by making use of a fall-back option when the interpolation is not working. While convergence is guaranteed for such hybrid methods, they still need their interpolations to match the objective function well if superlinear convergence is to be obtained. Therefore, we can't expect them to converge quickly for an objective function of the form stated in Problem 1.1.

There exist other bracketing-interpolation hybrid methods for non-differentiable functions of the form f(x) = max_i f_i(x) [14,17]. However, these algorithms assume that we can compute each f_i (from which f is computed) separately. Therefore, these are unsuitable for Problem 1.1, given that the oracle is defined to only return the value of f(x).

The only algorithm which seems to be optimised for our setting is the five-point method of Mifflin and Strodiot [12], which we will from now on refer to as "the Mifflin-Strodiot method". This algorithm is designed to converge rapidly to the local minimum x* even when the objective function is non-differentiable at x*.

The remainder of this paper is structured as follows: In Section 2, we define the notion of a bracket rigorously and describe in brief the existing methods which are currently most applicable to our problem. Sections 3 to 5 focus on the derivation of a new univariate optimiser, for which we present three variations. We compare our new methods against relevant competitors in Section 6.

2 Brackets and existing methods

We begin by rigorously defining a bracket:
Definition 2.1 (Bracket)
Let f : R → R be a function, and x_L, x_M, x_R ∈ R. We call the trio of points x_L, x_M, x_R a bracket of f if they satisfy the following:

1. x_L < x_M < x_R,
2. f(x_L) ≥ f(x_M) ≤ f(x_R).

The significance of a bracket is that Definition 2.1 guarantees that there exists a local minimum of f in (x_L, x_R).
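The bracket conditions are cheap to verify. The following minimal sketch is our own illustration, not from the paper; the name is_bracket and the callable-oracle signature are our assumptions.

```python
def is_bracket(f, x_l, x_m, x_r):
    # Conditions of Definition 2.1: the points are ordered and the
    # middle point does not lie above either neighbour.
    return x_l < x_m < x_r and f(x_l) >= f(x_m) <= f(x_r)
```

By continuity, any f passing this check attains a local minimum strictly inside (x_l, x_r).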
Definition 2.2 (Set of Brackets) We define B_f to be the set of all brackets for the function f.

In this paper, we refer to the elements of a bracket X as x_L, x_M and x_R. If a bracket X_i is associated with a particular iteration of an algorithm, then we write its elements as x_{L,i}, x_{M,i} and x_{R,i}. Given this notation, we define the function b(X) = x_R − x_L.

Bracketing methods are the set of algorithms which are broadly of the form of Algorithm 1, given the inputs of a step function σ and an update function U. The step function should select a new point within (x_L, x_M) ∪ (x_M, x_R), while the update function should return a bracket. When discussing existing bracketing methods and constructing our new one, we use the structure of Algorithm 1, where one algorithm is distinguished from another based on how the step function σ and update function U are defined.

The purpose of this paper is to introduce a new bracketing method called the Underestimating Polynomial Method (UPM) for Problem 1.1, for which, to our knowledge, there does not currently exist a robust and fast solution.
Algorithm 1 General Bracketing Method
Require: objective function f, step function σ, update function U, initial bracket X_0, tolerance ε > 0
  i = 0
  while b(X_i) > ε do
    x̃_i = σ(X_i)
    X_{i+1} = U(x̃_i, X_i)
    i = i + 1
  end while
  return X_i

When assessing the effectiveness of this algorithm, we will compare it to a small selection of existing bracketing methods including Golden Section, Brent's method [2,1], and the Mifflin-Strodiot method [11]. Of these algorithms, Golden Section may be considered the most robust, as it is guaranteed to converge Q-linearly with a rate of approximately 0.618.
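To make the template of Algorithm 1 concrete, here is a minimal Python sketch (our illustration; the function names and the tuple representation of a bracket are our own assumptions). Any concrete bracketing method is obtained by supplying its own step and update callables.

```python
def bracket_width(X):
    # b(X) = x_R - x_L for a bracket X = (x_L, x_M, x_R).
    return X[-1] - X[0]

def bracketing_method(f, step, update, X, eps):
    # Algorithm 1: shrink the bracket until its width falls below eps.
    # `step` plays the role of sigma and `update` the role of U.
    while bracket_width(X) > eps:
        x_new = step(X)          # select a trial point inside the bracket
        X = update(f, x_new, X)  # re-form the bracket around the best point
    return X
```

Golden Section, Brent's method, the Mifflin-Strodiot method and the UPM of this paper all fit this template; they differ only in the two callables.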
A function f : R^n ⊇ Ω → R is called piecewise smooth if f is continuous on Ω and there exists a disjoint finite family of sets {Ω_i ; i ∈ I = {1, 2, . . . , k}} such that f is smooth on Ω_i for all i ∈ I and ∪_{i=1}^k Ω_i is dense in Ω. We call any point x such that x ∈ cl(Ω_i) ∩ cl(Ω_j) for some i ≠ j ∈ I a kink.

The theory behind algorithms such as Brent's method collapses due to the existence of kinks, because interpolating across a kink has no meaning and yields no meaningful error bound. Therefore, if the local minimum of a function f is a kink, we expect algorithms like Brent's method to converge to it slowly.

3 The Static Underestimating Polynomial Method

The premise of the UPM is inspired by the approach used in the Mifflin-Strodiot method [11]. We approximate the objective function with two polynomials, one on each side of the bracket. If the local minimum is a kink, then the combination of these polynomials may be a valid approximation and useful for locating the minimum. To construct these polynomials, we need more than the three points contained in a bracket, and therefore extend the definition of a bracket to include seven points:
Definition 3.1 (Extended Bracket)
Given the function f, we call X ∈ R^7 an extended bracket, written in the form

X = (x_{L_3} x_{L_2} x_{L_1} x_M x_{R_1} x_{R_2} x_{R_3})^T,  (3.1)

if the following conditions apply:

1. x_{L_3} < x_{L_2} < x_{L_1} < x_M < x_{R_1} < x_{R_2} < x_{R_3},
2. f(x_{L_1}) ≥ f(x_M) ≤ f(x_{R_1}).

Definition 3.2 Let X be an extended bracket of f. We define diam(X) = diam(Conv(X)) and b(X) = x_{R_1} − x_{L_1}.

Remark 3.2 Definition 3.2 is the natural extension of the function b(X) from Section 2, where X is merely a bracket as opposed to an extended bracket.

The UPM mostly conforms to the form of Algorithm 1. The main difference, apart from replacing the bracket with an extended bracket, is the fact that the UPM also depends on an input parameter α. This parameter is used by the step function σ when constructing the next point to evaluate. The three variations of the UPM presented in this paper differ in how they treat α.

The Static Underestimating Polynomial Method (SUPM) requires an initial value of α to be supplied by the user, which is then kept constant for the duration of the algorithm. For the remainder of this section, we denote the step function σ and update function U which apply to the SUPM by σ_S and U_S respectively.

Given the objective function f, our strategy is to use the points x_{L_1}, x_{L_2}, x_{L_3} and x_{R_1}, x_{R_2}, x_{R_3} to construct the model functions q_L(x; X, α) and q_R(x; X, α), which approximate f_L and f_R respectively. We will employ Newton's Divided Difference notation, which we summarise below.

Definition 3.3 Given a, b, c ∈ R and f : R → R, the 1st and 2nd divided differences are defined by:

f[a, b] = (f(a) − f(b)) / (a − b),  and  f[a, b, c] = (f[a, b] − f[a, c]) / (b − c).  (3.2)

The reason we use this notation is that the 1st and 2nd divided differences are natural approximations for the 1st and 2nd derivatives of f respectively. We will state the precise result in a later section, when more rigour is needed.

Now we define our model functions q_L and q_R as the following:

q_k(x; X, α) = f(x_{k_1}) + f[x_{k_1}, x_{k_2}](x − x_{k_1}) + (f[x_{k_1}, x_{k_2}, x_{k_3}] − αh(X))(x − x_{k_1})(x − x_{k_2}),  (3.3)

where k ∈ {L, R}, α > 0, and h(X) is a scaling function with the property that h(X) → 0 as diam(X) → 0. Note that q_k has the form of the 2nd order Newton Interpolating Polynomial combined with the adjustment term αh(X). Finally, we express σ_S in terms of q_L and q_R:

σ_S(X; α) = argmin_{x ∈ [x_{L_1}, x_{R_1}]} max(q_L(x; X, α), q_R(x; X, α)).  (3.4)

Remark 3.3 The model functions q_L and q_R are designed to underestimate f_L and f_R within [x_{L_1}, x_{R_1}]. We will show how we achieve this in Lemma 3.2.

Fig. 3.1 The objective function of Example 3.1 plotted together with the quadratics q_L and q_R approximating it on the left and on the right. The two quadratics intersect at x ≈ 0.005.

Example 3.1 Consider the function f(x) = max(sin(xπ), −cos(xπ)). This function is piecewise smooth and unimodal on the interval under consideration, with a local minimum at 0. At this local minimum, f is not differentiable. Now suppose that the points x_{L_3} < x_{L_2} < x_{L_1} lie to the left of the minimum and x_{R_1} < x_{R_2} < x_{R_3} to its right. We construct q_L and q_R by interpolating {(x_{k_i}, f(x_{k_i}))}_{i=1}^3 for k = L, R. These two quadratics intersect at x ≈ 0.005, as shown in Figure 3.1. This value is taken to be the next point at which we evaluate f.

All that remains is for us to define the function U_S.
U_S(x̃; X, α) :=
  U_1(x̃; X)  if x̃ < x_M and f(x̃) < f(x_M),
  U_2(x̃; X)  if x̃ > x_M and f(x̃) < f(x_M),
  U_3(x̃; X)  if x̃ > x_M and f(x̃) > f(x_M),
  U_4(x̃; X)  if x̃ < x_M and f(x̃) > f(x_M),  (3.5)

where

U_1(x̃; X) := (x_{L_3} x_{L_2} x_{L_1} x̃ x_M x_{R_1} x_{R_2})^T,  (3.6a)
U_2(x̃; X) := (x_{L_2} x_{L_1} x_M x̃ x_{R_1} x_{R_2} x_{R_3})^T,  (3.6b)
U_3(x̃; X) := (x_{L_3} x_{L_2} x_{L_1} x_M x̃ x_{R_1} x_{R_2})^T,  (3.6c)
U_4(x̃; X) := (x_{L_2} x_{L_1} x̃ x_M x_{R_1} x_{R_2} x_{R_3})^T.  (3.6d)

Remark 3.4 The update function U_S is the natural extension of the one used by the Mifflin-Strodiot method [12] to 7 points.

Lemma 3.1 Let X be an extended bracket. If x̃ = σ_S(X) ∈ (x_{L_1}, x_M) ∪ (x_M, x_{R_1}), then X′ = U_S(x̃; X, α) is an extended bracket and b(X′) < b(X).
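The step (3.4) and update (3.5) are straightforward to realise in code. The sketch below is our own illustration, not the authors' implementation: it builds the two models of Equation (3.3) from divided differences, approximates the argmin in (3.4) by a dense grid search rather than the closed-form intersection used in the paper, and applies the case analysis of Equation (3.5). An extended bracket is represented as a tuple (x_{L_3}, x_{L_2}, x_{L_1}, x_M, x_{R_1}, x_{R_2}, x_{R_3}).

```python
import numpy as np

def divdiff1(f, a, b):
    # First divided difference f[a, b] (Definition 3.3).
    return (f(a) - f(b)) / (a - b)

def divdiff2(f, a, b, c):
    # Second divided difference f[a, b, c] (Definition 3.3).
    return (divdiff1(f, a, b) - divdiff1(f, a, c)) / (b - c)

def model(f, x1, x2, x3, alpha, h):
    # q_k of Equation (3.3): a Newton interpolating quadratic through
    # x1, x2, x3 whose second divided difference is lowered by alpha * h.
    c0, c1 = f(x1), divdiff1(f, x1, x2)
    c2 = divdiff2(f, x1, x2, x3) - alpha * h
    return lambda x: c0 + c1 * (x - x1) + c2 * (x - x1) * (x - x2)

def supm_step(f, X, alpha, h):
    # sigma_S of Equation (3.4): minimise max(q_L, q_R) over [x_L1, x_R1].
    xL3, xL2, xL1, xM, xR1, xR2, xR3 = X
    qL = model(f, xL1, xL2, xL3, alpha, h)
    qR = model(f, xR1, xR2, xR3, alpha, h)
    grid = np.linspace(xL1, xR1, 10001)
    return grid[np.argmin(np.maximum(qL(grid), qR(grid)))]

def supm_update(f, x_new, X):
    # U_S of Equation (3.5): insert x_new into the seven-point bracket
    # and relabel, dropping one outermost point.
    xL3, xL2, xL1, xM, xR1, xR2, xR3 = X
    if x_new < xM:
        if f(x_new) < f(xM):   # U_1: x_new becomes the new midpoint
            return (xL3, xL2, xL1, x_new, xM, xR1, xR2)
        return (xL2, xL1, x_new, xM, xR1, xR2, xR3)   # U_4
    if f(x_new) < f(xM):       # U_2: x_new becomes the new midpoint
        return (xL2, xL1, xM, x_new, xR1, xR2, xR3)
    return (xL3, xL2, xL1, xM, x_new, xR1, xR2)       # U_3
```

For simplicity the sketch ignores the degenerate case x_new = x_M, which the paper handles with the minimum-distance safeguard δ described below.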
Definition 3.4 A function f : [a, b] → R is called unimodal if there exists ζ ∈ (a, b) such that f is monotonically decreasing on (a, ζ) and monotonically increasing on (ζ, b). If a function is not unimodal, then it is multi-modal.

Definition 3.5 Let f be a piecewise smooth function. We call f locally unimodal if for any local minimum x* of f, there exists an open set (a, b) ∋ x* such that f is unimodal on (a, b).

When constructing convergence results, there are three levels of assumptions we might make. First is the case where f is a piecewise smooth function, X is a bracket, and nothing more is assumed. In this general context, we can do nothing beyond defining a minimum distance between points δ. If |σ_S(X) − x_M| < δ, we replace σ_S(X) with argmin_x |x − σ_S(X)| such that |x − x_M| ≥ δ, x_{R_1} − x ≥ δ and x − x_{L_1} ≥ δ. If σ_S(X) = x_M, then there exist two equally valid solutions: x_M + δ and x_M − δ. When this applies, we arbitrarily set σ_S(X) = x_M − δ. By doing this, we are sure to converge eventually, even if slowly.

For the next level of assumptions, we additionally require f to be of the form f(x) = max(f_L(x), f_R(x)), and X to satisfy f(x_{L_j}) = f_L(x_{L_j}), j = 1, 2, 3, and f(x_{R_j}) = f_R(x_{R_j}), j = 1, 2, 3. Given these assumptions, we show in Lemma 3.2 and Corollary 3.1 that σ_S(X; α) ∈ (x_{L_1}, x_{R_1}) given a sufficiently large choice of α. In short, when there are not too many kinks in one place, the UPM selects sensible points at which to evaluate f.

Finally, we add the assumptions that f is unimodal and diam(X) is sufficiently small. Under these circumstances, we show in Lemma 3.3 and Theorem 3.2 that the distance between σ_S(X) and the true solution x* can be bounded.

Taken in the context of locally unimodal functions, these results will imply that the SUPM is stable when the function is not unimodal, and fast when it is. This is sufficient for our purposes, given that a stable algorithm applied to a locally unimodal function will eventually converge to a bracket in which the function is unimodal.

For the analysis that follows, we require the following external result:

Theorem 3.1 ([1]) Suppose that k, n ≥ 0; f ∈ C^{n+k}[a, b]; ζ ∈ [a, b]; and x_0, . . . , x_n are distinct points in [a, b]. Then

f[x_0, . . . , x_n] = f^{(n)}(ζ)/n! + Σ_{0 ≤ r_1 ≤ n} (x_{r_1} − ζ) f^{(n+1)}(ζ)/(n+1)! + . . .
  + Σ_{0 ≤ r_1 ≤ ... ≤ r_k ≤ n} Π_{j=1}^k (x_{r_j} − ζ) f^{(n+k)}(ζ)/(n+k)! + E,  (3.7)

where

E = (1/(n+k)!) Σ_{0 ≤ r_1 ≤ ... ≤ r_k ≤ n} Π_{j=1}^k (x_{r_j} − ζ) [f^{(n+k)}(ξ_{r_1,...,r_k}) − f^{(n+k)}(ζ)],  (3.8)

and the ξ_{r_1,...,r_k} are points in the interval spanned by x_{r_1}, . . . , x_{r_k} and ζ.

Lemma 3.2 Let f be a function of the form f = max(f_L, f_R) such that f''_L and f''_R are both Lipschitz continuous with Lipschitz constants M_L and M_R respectively, and let X be an extended bracket of f where f(x_{k_j}) = f_k(x_{k_j}) for j = 1, 2, 3 and k ∈ {L, R}. If α ≥ max(M_L, M_R) and h(X) = max(x_{R_1} − x_{L_3}, x_{R_3} − x_{L_1}), then q_R(x; X, α) < f_R(x) for all x ∈ [x_{L_1}, x_{R_1}) and q_L(x; X, α) < f_L(x) for all x ∈ (x_{L_1}, x_{R_1}].

Proof For y ∈ (x_{L_1}, x_{R_1}], we have:

f_L(y) > q_L(y; X, α)
⇔ f_L(y) > f_L(x_{L_1}) + f_L[x_{L_1}, x_{L_2}](y − x_{L_1}) + (f_L[x_{L_1}, x_{L_2}, x_{L_3}] − αh(X))(y − x_{L_1})(y − x_{L_2})
⇔ f_L[y, x_{L_1}] > f_L[x_{L_1}, x_{L_2}] + (f_L[x_{L_1}, x_{L_2}, x_{L_3}] − αh(X))(y − x_{L_2})
⇔ f_L[y, x_{L_1}, x_{L_2}] > f_L[x_{L_1}, x_{L_2}, x_{L_3}] − αh(X)
⇔ αh(X) > f_L[x_{L_1}, x_{L_2}, x_{L_3}] − f_L[y, x_{L_1}, x_{L_2}].

Analogously, for y ∈ [x_{L_1}, x_{R_1}) we have:

αh(X) > f_R[x_{R_1}, x_{R_2}, x_{R_3}] − f_R[y, x_{R_1}, x_{R_2}].

From Theorem 3.1, we know that there exists ξ_{R,1} ∈ [x_{R_1}, x_{R_3}] such that (1/2) f''_R(ξ_{R,1}) = f_R[x_{R_1}, x_{R_2}, x_{R_3}], and ξ_{R,2} ∈ [min(y, x_{R_1}), x_{R_2}] such that (1/2) f''_R(ξ_{R,2}) = f_R[y, x_{R_1}, x_{R_2}]. Equivalently, the points ξ_{L,1} and ξ_{L,2} exist for f_L[x_{L_1}, x_{L_2}, x_{L_3}] and f_L[y, x_{L_1}, x_{L_2}] respectively. Therefore:

f_L[x_{L_1}, x_{L_2}, x_{L_3}] − f_L[y, x_{L_1}, x_{L_2}] = (1/2)(f''_L(ξ_{L,1}) − f''_L(ξ_{L,2}))
  ≤ (1/2) M_L |ξ_{L,1} − ξ_{L,2}|
  ≤ (1/2) M_L |max(y, x_{L_1}) − x_{L_3}|.

The bound on the right-hand side depends on the value of y, and is itself bounded by (1/2) M_L |x_{R_1} − x_{L_3}| when considering y ∈ [x_{L_1}, x_{R_1}]. Therefore, a sufficient condition for f_L(y) > q_L(y; X, α) for all y ∈ (x_{L_1}, x_{R_1}] is that αh(X) > (1/2) M_L |x_{R_1} − x_{L_3}|. Similarly, a sufficient condition for f_R(y) > q_R(y; X, α) for all y ∈ [x_{L_1}, x_{R_1}) is αh(X) > (1/2) M_R |x_{R_3} − x_{L_1}|. Choosing h(X) = max(x_{R_1} − x_{L_3}, x_{R_3} − x_{L_1}) and 2α > max(M_L, M_R) ensures that both of these are satisfied.

Corollary 3.1 Let f be a function of the form f = max(f_L, f_R) such that f''_L and f''_R are both Lipschitz continuous with Lipschitz constants M_L and M_R respectively, and let X be an extended bracket of f where f(x_{k_j}) = f_k(x_{k_j}) for j = 1, 2, 3 and k ∈ {L, R}. If α ≥ max(M_L, M_R) and h(X) = max(x_{R_1} − x_{L_3}, x_{R_3} − x_{L_1}), then σ_S(X) ∈ (x_{L_1}, x_{R_1}).

Proof As Lemma 3.2 is applicable, we know that q_R(x) < f_R(x) for all x ∈ [x_{L_1}, x_{R_1}) and q_L(x) < f_L(x) for all x ∈ (x_{L_1}, x_{R_1}]. Since f = max(f_L, f_R), it follows that q_L(x_M) < f(x_M) and q_R(x_M) < f(x_M). For convenience, write q(x) = max(q_L(x; X, α), q_R(x; X, α)). Starting from Equation (3.4), we obtain:

q(σ_S(X; α)) ≤ q(x_M) < f(x_M) ≤ min(f(x_{L_1}), f(x_{R_1})) = min(q_L(x_{L_1}), q_R(x_{R_1})).

But q(x_{L_1}) = q_L(x_{L_1}) and q(x_{R_1}) = q_R(x_{R_1}). Therefore q(σ_S(X; α)) < min(q(x_{L_1}), q(x_{R_1})), which implies that σ_S(X; α) ≠ x_{L_1} and σ_S(X; α) ≠ x_{R_1}. Since σ_S(X; α) ∈ [x_{L_1}, x_{R_1}] by definition (see Equation (3.4)), it follows that σ_S(X; α) ∈ (x_{L_1}, x_{R_1}).

Remark 3.5 Between Corollary 3.1 and the requirement that the minimum distance between any two points in X is δ, we are assured that the conditions for Lemma 3.1 will be satisfied.

For the remainder of this section, we work towards bounding the convergence of the SUPM when f is piecewise smooth and unimodal.
We define x* to be the unique local minimum of f in (x_{L_1}, x_{R_1}). When assessing the SUPM's convergence rate, we consider two cases based on the values of f'_L(x*) and f'_R(x*).

If f'_R(x*) = 0 or f'_L(x*) = 0 holds, then we expect the SUPM to behave similarly to Brent's method. For example, if f'_R(x*) = 0 and f'_L(x*) < 0, then we expect q_R to behave similarly to the interpolated polynomial from Brent's method. The SUPM will be slower, however, for two reasons.

1. When the SUPM computes a new point, that value may be stored in x_M, which does not impact the next computation but only the one after that.
2. As only q_R is converging, any point which is used to construct q_L will play little to no role in determining σ_S(X), and is therefore useless.

We focus our attention primarily on the case where f'_L(x*) < 0 < f'_R(x*). In order to bound the distance between σ_S(X) and x*, we need to understand what type of point σ_S(X) returns. We address this in Lemma 3.3. First, however, we introduce a shorthand notation for divided differences which we use for the remainder of this section:

f_{k_0} = f(x_{k_1}),  f_{k_1} = f[x_{k_1}, x_{k_2}],  f_{k_2} = f[x_{k_1}, x_{k_2}, x_{k_3}],  k ∈ {L, R}.

Lemma 3.3 Let f be a piecewise smooth unimodal function of the form f = max(f_L, f_R) such that f''_L and f''_R are both Lipschitz continuous with Lipschitz constants M_L and M_R respectively, and let X be an extended bracket of f. Further, let α be given such that α ≥ max(M_L, M_R). If f'_L(0) < 0 and f'_R(0) > 0, where the minimiser is taken to be x* = 0 without loss of generality, then there exists ε > 0 such that if X ⊂ (−ε, ε), then the model functions q_L and q_R intersect exactly once within (x_{L_1}, x_{R_1}) and σ_S(X) returns this unique intersection point.

Proof Since f = max(f_L, f_R) is unimodal and X is an extended bracket, it follows that f(x_{k_j}) = f_k(x_{k_j}) for j = 1, 2, 3, k ∈ {L, R}. We use this along with Lemma 3.2 to show: q_L(x_{L_1}) = f(x_{L_1}) = f_L(x_{L_1}) ≥ f_R(x_{L_1}) > q_R(x_{L_1}), and similarly q_R(x_{R_1}) > q_L(x_{R_1}). By the Intermediate Value Theorem [16, Theorem 4.23], it follows that q_L intersects q_R at least once in (x_{L_1}, x_{R_1}). Moreover, since q_L and q_R are both quadratic functions, q_L − q_R has at most two roots, of which an odd number lie in (x_{L_1}, x_{R_1}); hence they intersect exactly once in this interval.

A sufficient but not necessary condition for the result to follow would be that there exists ε > 0 such that q_L and q_R are monotonically decreasing and increasing respectively on [x_{L_1}, x_{R_1}]. Once this holds, σ_S(X) = argmin_x max(q_L(x), q_R(x)) must refer to the intersection of q_L and q_R.

To show that q_L is monotonically decreasing, it is sufficient to prove that (q_L)'((x_{L_1} + x_{L_2})/2) ≤ 0 and (q_L)'(x_{R_1}) < 0. Once this is shown, we are done, since (q_L)' is a linear function of x. We find:

(q_L)'((x_{L_1} + x_{L_2})/2) = f_{L_1} = (f(x_{L_1}) − f(x_{L_2})) / (x_{L_1} − x_{L_2}) ≤ 0,

since f is unimodal and X is an extended bracket of f (recall Definition 3.1). For (q_L)'(x_{R_1}) we begin with

(q_L)'(x_{R_1}) = f_{L_1} + (f_{L_2} − αh(X))(2x_{R_1} − x_{L_1} − x_{L_2}).  (3.9)

Next we apply Theorem 3.1 to conclude: there exist ξ_{L,1}, ξ_{L,2} such that

f_{L_1} = f'_L(0) + (1/2)(x_{L_1} f''_L(ξ_{L,1}) + x_{L_2} f''_L(ξ_{L,2})),

and ξ_{L,3} ∈ [x_{L_3}, x_{L_1}] such that f_{L_2} = (1/2) f''_L(ξ_{L,3}). From this and Equation (3.9) it follows that

(q_L)'(x_{R_1}) = f'_L(0) + (1/2)(x_{L_1} f''_L(ξ_{L,1}) + x_{L_2} f''_L(ξ_{L,2})) + (2x_{R_1} − x_{L_1} − x_{L_2})((1/2) f''_L(ξ_{L,3}) − αh(X)).

Since f''_L and f''_R are Lipschitz continuous, we know that they are bounded on [x_{L_1}, x_{R_1}]. Let M be a constant which bounds |f''_L| and |f''_R|.
We use this along with the fact that X ⊂ (−ε, ε) to conclude:

(q_L)'(x_{R_1}) ≤ f'_L(0) + (1/2)(−x_{L_1} M − x_{L_2} M) + (2x_{R_1} − x_{L_1} − x_{L_2})((1/2) M − αh(X))
  ≤ f'_L(0) + εM + 4ε((1/2) M − αh(X))
  = f'_L(0) + ε(3M − 4αh(X)) ≤ f'_L(0) + 3εM,

since h(X) > 0 and α is constant. Therefore a sufficient condition for (q_L)'(x_{R_1}) < 0 is ε < |f'_L(0)|/(3M). That q_R is monotonically increasing given sufficiently small ε follows from the equivalent argument.

Having established the conditions under which we can express σ_S(X) analytically, we now bound the distance between this point and the true minimiser.

Theorem 3.2 Let f : [a, b] → R be a piecewise smooth unimodal function of the form f = max(f_L, f_R) such that x* ∈ [a, b] is a minimiser of f, f'_L(x*) < 0, f'_R(x*) > 0, and let X be an extended bracket of f. If the functions f''_L and f''_R are both Lipschitz continuous (with constants M_L and M_R) and α > max(M_L, M_R), then there exists ε > 0 such that

|σ_S(X) − x*| ≤ (√2 / (f'_R(x*) − f'_L(x*))) ( |x_{L_1} − x*||x_{L_2} − x*|(|x_{L_1} − x*| + |x_{L_2} − x*|) M_L
  + |x_{R_1} − x*||x_{R_2} − x*|(|x_{R_1} − x*| + |x_{R_2} − x*|) M_R ),

when X ⊂ (x* − ε, x* + ε).

Proof Since Lemma 3.3 applies, we know that σ_S(X) returns the unique point of intersection between q_L and q_R in (x_{L_1}, x_{R_1}). The intersection points of these quadratics are the roots of the polynomial ax² + bx + c = 0, where:

a = f_{R_2} − f_{L_2},
b = f_{R_1} − f_{L_1} − (x_{R_1} + x_{R_2})(f_{R_2} − αh) + (x_{L_1} + x_{L_2})(f_{L_2} − αh),
c = f_{R_0} − f_{L_0} − x_{R_1} f_{R_1} + x_{L_1} f_{L_1} + x_{R_1} x_{R_2}(f_{R_2} − αh) − x_{L_1} x_{L_2}(f_{L_2} − αh).

Assuming without loss of generality x* = 0, we invoke Theorem 3.1 to make the following substitutions for ℓ ∈ {L, R}:

f_{ℓ_0} = f_ℓ(0) + x_{ℓ_1} f'_ℓ(0) + (1/2)(x_{ℓ_1})² f''_ℓ(ξ_{ℓ,1}),  ξ_{ℓ,1} ∈ [0, x_{ℓ_1}],
f_{ℓ_1} = f'_ℓ(0) + (1/2)(x_{ℓ_1} f''_ℓ(ξ_{ℓ,2}) + x_{ℓ_2} f''_ℓ(ξ_{ℓ,3})),  ξ_{ℓ,2} ∈ [0, x_{ℓ_1}], ξ_{ℓ,3} ∈ [0, x_{ℓ_2}],
f_{ℓ_2} = (1/2) f''_ℓ(ξ_{ℓ,4}),  ξ_{ℓ,4} ∈ [0, x_{ℓ_3}].

By inserting the substitutions above into the definitions of a, b and c, we get:

a = (1/2)(f''_R(0) − f''_L(0)) + K_{R_a} − K_{L_a},
b = f'_R(0) − f'_L(0) + K_{R_b} − K_{L_b},
c = K_{R_c} − K_{L_c},

where K_{ℓ_a}, K_{ℓ_b} and K_{ℓ_c} are in turn defined for ℓ ∈ {L, R} by:

K_{ℓ_a} = (1/2)(f''_ℓ(ξ_{ℓ,4}) − f''_ℓ(0)),
K_{ℓ_b} = (1/2)(x_{ℓ_1}(f''_ℓ(ξ_{ℓ,2}) − f''_ℓ(ξ_{ℓ,4})) + x_{ℓ_2}(f''_ℓ(ξ_{ℓ,3}) − f''_ℓ(ξ_{ℓ,4}))) + (x_{ℓ_1} + x_{ℓ_2}) αh(X),
K_{ℓ_c} = (1/2) x_{ℓ_1}(x_{ℓ_1}(f''_ℓ(ξ_{ℓ,1}) − f''_ℓ(ξ_{ℓ,2})) + x_{ℓ_2}(f''_ℓ(ξ_{ℓ,4}) − f''_ℓ(ξ_{ℓ,3}))) − x_{ℓ_1} x_{ℓ_2} αh(X).

Writing a, b and c in terms of these K values allows us to give bounds using the assumption that f''_L and f''_R are Lipschitz continuous:

|K_{ℓ_a}| ≤ (1/2) M_ℓ |x_{ℓ_3}| ≤ (1/2) M_ℓ ε,  (3.10a)
|K_{ℓ_b}| ≤ (|x_{ℓ_1}| + |x_{ℓ_2}|)(M_ℓ ε + αh(X)) ≤ 2ε²(M_ℓ + 2α),  (3.10b)
|K_{ℓ_c}| ≤ |x_{ℓ_1}||x_{ℓ_2}| M_ℓ (|x_{ℓ_1}| + |x_{ℓ_2}|) ≤ 2 M_ℓ ε³.  (3.10c)

Choose ε_1 small enough to ensure (f'_R(0) − f'_L(0)) > K_{L_b} − K_{R_b} (note that b → f'_R(0) − f'_L(0) as ε → 0). If ε ≤ ε_1, then it follows that b > 0 and

σ_S(X) = (−b + √(b² − 4ac)) / (2a),  (3.11)

because the other root lies outside [x_{L_1}, x_{R_1}] as ε → 0.
We apply Taylor's Theorem to conclude that there exists ξ between 0 and 4ac such that √(b² − 4ac) = b − 2ac/√(b² − ξ). Since ac → 0 as ε → 0, there exists ε_2 such that |4ac| ≤ b²/2 when ε < ε_2. Then, if ε < min(ε_0, ε_1, ε_2), where ε_0 is as supplied by Lemma 3.3, we have:

|σ_S(X)| = |(−b + √(b² − 4ac)) / (2a)| = (1/(2|a|)) |2ac / √(b² − ξ)|
  ≤ |c| / √(b² − ξ) ≤ |c| / √(b² − |4ac|) ≤ √2 |c/b|
  = √2 |K_{R_c} − K_{L_c}| / (f'_R(0) − f'_L(0) + K_{R_b} − K_{L_b})
  ≤ √2 (|K_{R_c}| + |K_{L_c}|) / (f'_R(0) − f'_L(0)).

Finally, we insert Equation (3.10c) and the result follows.

Theorem 3.2 is remarkably similar to [11, Theorem 4.1], the equivalent result which bounds the convergence of the Mifflin-Strodiot method. Using our own notation for their result, Mifflin and Strodiot showed |x_{M,i} − x*| ≤ A_i |x* − x_{L_1,i}| + B_i |x_{R_1,i} − x*| for each iteration i, where A_i and B_i are sequences which converge to 0 as i → ∞. If we rewrite the result from Theorem 3.2 in the same form, the sequence A_i may be written explicitly in terms of x_{L_1,i}, x_{L_2,i} and x_{L_3,i}, and similarly for B_i. Then, if x_{L_1,i} and x_{L_2,i} both converge to x*, the rate at which A_i and B_i tend to zero begins to resemble quadratic convergence.

However, Theorem 3.2, like [11, Theorem 4.1], does not prove quadratic convergence, nor even linear convergence. This is because we cannot know a priori whether the sequences {x_{L_1,i}} and {x_{L_2,i}} converge at all, nor how quickly if they do. Suppose there exists an iteration i_0 such that for all i > i_0, x_{L_1,i} is constant. In this case, the bound given by Theorem 3.2 will decrease, but not converge to 0. In practice, when this occurs, x_{M,i} will still converge to x* but only sub-linearly, while b(X_i) will not converge to 0 at all. Suppose instead that {x_{L_1,i}} and {x_{L_2,i}} converge to x*, but we only update x_{L_1,i} once every k iterations. Then we may say that |x_{M,ki} − x*| → 0 as i → ∞, but depending on the size of k, this may not be particularly useful.

In short, there are three main weaknesses of the SUPM which lead to its failure. The first is that we need to choose α, a suitable value of which we cannot know in advance. The second is that convergence only occurs in reasonable time if updates of both the left-hand and right-hand points occur regularly. Finally, we require a bracket sufficiently close to the solution that f is unimodal before any of our results apply. These shortcomings of the SUPM motivate the subsequent sections, in which we describe more stable variations of the UPM.

4 The Extremal Underestimating Polynomial Method

In Section 3, we showed that if α was sufficiently large, then σ_S is guaranteed to select a point within (x_{L_1}, x_{R_1}). Coupled with a restriction on the minimum distance between points, this is enough to ensure convergence eventually. However, for an arbitrary piecewise smooth objective function, we have no idea what a suitable α might be. As there is no finite value of α which satisfies Lemma 3.2 for all functions, we ask what happens if we let α → ∞, and name the corresponding variation of the SUPM the Extremal Underestimating Polynomial Method (EUPM), denoting its step function and update function by σ_E and U_E respectively.

As α → ∞ we are assured by Lemma 3.2 that q_L and q_R will underestimate f_L and f_R respectively within [x_{L_1}, x_{R_1}].
Furthermore, if h(X) > 0, then we can pick a sufficiently large α such that the coefficient of x² in q_k(x; X, α) (see Equation (3.3)) will be negative. Once α satisfies both of these conditions, q_L and q_R will intersect at exactly one point, which is the point that σ_S(X) returns. We define σ_E(X) to be the limit of this point, which we calculate by identifying the root of q_R(x; X, α) − q_L(x; X, α) = 0 which remains bounded as α → ∞. By starting from Equation (3.11) and noting the Taylor series as α → ∞, we find:

σ_E(X) = (x_{R_1} x_{R_2} − x_{L_1} x_{L_2}) / (x_{R_1} + x_{R_2} − x_{L_1} − x_{L_2}).  (4.1)

Note that the step function σ_E depends on neither x_{L_3} nor x_{R_3}. Therefore, within the context of this section, our extended bracket X is of the form:

X = (x_{L_2} x_{L_1} x_M x_{R_1} x_{R_2})^T.  (4.2)

In order to construct U_E, we first define Ũ_S to be the update function U_S, where we remove the entries corresponding to x_{L_3} and x_{R_3}. Using this notation, we define:

U_E(X) := Ũ_S(σ_E(X), X).  (4.3)

Given σ_E and U_E, the EUPM is precisely in the form of Algorithm 1.

Remark 4.1 Equation (4.3) is of the same form as the update function used by the Mifflin-Strodiot method [12].

Remark 4.2 We see the EUPM is qualitatively different from the SUPM in that σ_E depends only on the values of X, and not on the function f at all. Therefore, like Golden Section, the EUPM is equally suited to smooth, non-smooth, unimodal and multi-modal functions, and is scale invariant.
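Equation (4.1) makes the EUPM step a closed-form expression in four of the bracket points. A minimal sketch (our own; the function name and tuple layout are assumptions, following the five-point bracket of Equation (4.2)):

```python
def eupm_step(X):
    # sigma_E of Equation (4.1): the limit of the SUPM step as
    # alpha -> infinity.  X = (x_L2, x_L1, x_M, x_R1, x_R2).
    xL2, xL1, xM, xR1, xR2 = X
    return (xR1 * xR2 - xL1 * xL2) / (xR1 + xR2 - xL1 - xL2)
```

Consistent with Remark 4.2, the step never evaluates f, and translating or rescaling the points translates or rescales the step accordingly.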
4.1 Theoretical Convergence of the EUPM

In this section we derive a bound for the convergence rate of the EUPM. In particular, we find an integer k and a constant C < 1 such that b(X_{i+k})/b(X_i) ≤ C holds for any objective function. While there exist infinitely many possible objective functions, there are only finitely many possible values for X_{i+k} given X_i. This is because σ_E does not depend on the objective function f, and thus the only effect that f has on the EUPM is in the function U_E (see Equation (3.5)), by determining which update function, U_1, U_2, U_3 or U_4, to apply, none of which depend on f.

Therefore, there are only 4 possible values of X_{i+1} given X_i: U_j X for j = 1, 2, 3, 4. As a consequence of this, there are at most 4^k possible values of X_{i+k} given X_i. Each possible value of X_{i+k} depends on the sequence of update functions which generated it.

Remark 4.3 In this section, we will refer to the update functions U_1, U_2, U_3 and U_4 extensively. To simplify notation, we replace U_j with Ũ_j, where Ũ_j(X) := U_j(σ_E(X), X). For the remainder of Section 4, U_j should be read as Ũ_j.

Definition 4.1 Let {X_i}_{i=0}^n be a sequence of extended brackets generated by the EUPM, such that X_{i+1} = U_E(X_i) for each i. Let I = {i_j}_{j=0}^{n−1} ∈ {1, 2, 3, 4}^n be the sequence of numbers such that X_{j+1} = U_{i_j}(X_j) for all j ∈ {0, 1, . . . , n − 1}. Then we call I the sequence of EUPM iterations which generate {X_i}_{i=0}^n. Moreover, we write U_I := U_{i_n} ∘ . . . ∘ U_{i_2} ∘ U_{i_1}, where I = {i_j}_{j=1}^n.

Definition 4.2 Define I_n to be the set of all length n sequences of EUPM iterations.

Definition 4.3 Define Q_k(f, X) to be the sequence of the first k iterations generated by the EUPM when applied to the function f with the starting bracket X.

Lemma 4.1 Let f be a function, X ∈ B_f and I ∈ I_n such that I = {I_1, I_2} = Q_n(f, X), where I_1 ∈ I_{n−k} and I_2 ∈ I_k. Then it holds that Q_k(f, U_{I_1} X) = I_2.

Using the notation from Definitions 4.1 and 4.3, it follows that X_{n+k} = U_{Q_k(f, X_n)}(X_n). Since Q_k(f, X_n) ∈ I_k for every function f, it follows that:

b(X_{n+k})/b(X_n) ≤ C for any function f  ⇔  b(U_I X_n)/b(X_n) ≤ C for all I ∈ I_k.

Definition 4.4 Given I ∈ I_n, we define:

B̃_I := {X ∈ R^5 | there exists f such that X ∈ B_f and Q_n(f, X) = I}.

Lemma 4.2 Let I ∈ I_n be of the form I = {I_1, I_2}, where I_1 ∈ I_{n−k} and I_2 ∈ I_k. Then it holds that B̃_I ⊆ B̃_{I_1} and U_{I_1}(B̃_I) ⊆ B̃_{I_2}.

Proof If X ∈ B̃_I, then there exists a function f such that I = Q_n(f, X). Since I = {I_1, I_2}, we have I_1 = Q_{n−k}(f, X) by definition of Q. Hence X ∈ B̃_{I_1}. Moreover, I_2 = Q_k(f, U_{I_1} X) by Lemma 4.1, which implies U_{I_1} X ∈ B̃_{I_2}.

We finally state our main result.

Theorem 4.1 The EUPM converges R-linearly. In particular:

max_{I ∈ I_5} max_{X ∈ B̃_I} b(U_I X)/b(X) ≤ 1/2,

for any bracket.

Given how long the proof of Theorem 4.1 is, we reserve an entire section for it. The proof may be found in Section 4.2, although it may be skipped on a first reading.

Theorem 4.1 provides a bound on the linear convergence rate for the EUPM, albeit a weak one. In practice, we observe far faster convergence, as will be demonstrated in Section 6 and Appendix B. However, the main justification for the EUPM is its robustness. In Section 3, we assumed that the SUPM started with a bracket that was already sufficiently close to the solution. Once equipped with this, we might expect the SUPM to converge quickly. The EUPM is useful for producing such a bracket in the first place. In Section 5, we present the final variation of the UPM, in which we combine the initial robustness of the EUPM with the desirable properties of the SUPM.

4.2 Proof of Theorem 4.1

We divide this proof into two parts, which are defined by Lemmas 4.3 and 4.4.

Lemma 4.3 It holds that b(U_I X)/b(X) ≤ 1/2 for all I ∈ A, where A is a fixed set of twelve short sequences of EUPM iterations whose elements are treated individually in Appendix A.

Lemma 4.4 If b(U_I X)/b(X) ≤ 1/2 for all I ∈ A, then b(U_I X)/b(X) ≤ 1/2 for all I ∈ I_5, where A is as defined in Lemma 4.3.

Given that the proof of Lemma 4.3 consists entirely of algebraic calculations, we leave it for Appendix A. In this section, we work towards proving Lemma 4.4. Ultimately, Theorem 4.1 follows from these two results.

Definition 4.5 Let I = {i_j}_{j=1}^n ∈ I_n be a length n sequence of iterations of the EUPM. A sub-string of the sequence I is a subsequence of the form J = {i_j}_{j=k_1}^{k_2} such that 1 ≤ k_1 < k_2 ≤ n. Denote by S(I) the set of sub-strings of the sequence I.

The first step towards proving Lemma 4.4 is to show that if I ∈ I_5 is a length 5 sequence of iterations and J ∈ S(I) is a sub-string of I, then b(U_J X)/b(X) ≤ 1/2 implies b(U_I X)/b(X) ≤ 1/2.

Lemma 4.5 Let X be an extended bracket of f. Given I ∈ I_n and J ∈ S(I), it holds that:

max_{X ∈ B̃_I} b(U_I X)/b(X) ≤ max_{X ∈ B̃_J} b(U_J X)/b(X).

In the next few lemmas, we show that not all sequences of {1, 2, 3, 4} are possible sequences of EUPM iterations given a particular starting bracket X.

Lemma 4.6 Let I = {i_j}_{j=0}^{n−1} ∈ I_n be a sequence of EUPM iterations. If i_j = 1 then i_{j+1} ∈ {1, 4}. Similarly, if i_j = 2 then i_{j+1} ∈ {2, 3}.

Proof If i_j = 1, then by definition X_{j+1} = U_1 X_j.
From this, we calculate:

σ_E(X_{j+1}) − x_{M,j+1}
= (x_{R_1,j+1} x_{R_2,j+1} − x_{L_1,j+1} x_{L_2,j+1}) / (x_{R_1,j+1} + x_{R_2,j+1} − x_{L_1,j+1} − x_{L_2,j+1}) − x_{M,j+1}
= (x_{M,j} x_{R_1,j} − x_{L_1,j} x_{L_2,j}) / (x_{M,j} + x_{R_1,j} − x_{L_1,j} − x_{L_2,j}) − (x_{R_1,j} x_{R_2,j} − x_{L_1,j} x_{L_2,j}) / (x_{R_1,j} + x_{R_2,j} − x_{L_1,j} − x_{L_2,j})
= −(x_{R_2,j} − x_{M,j})(x_{R_1,j} − x_{L_1,j})(x_{R_1,j} − x_{L_2,j}) / ((x_{M,j} + x_{R_1,j} − x_{L_1,j} − x_{L_2,j})(x_{R_1,j} + x_{R_2,j} − x_{L_1,j} − x_{L_2,j})) < 0,

since each X_j is an extended bracket. Given that σ_E(X_{j+1}) − x_{M,j+1} < 0, it follows from Equation (3.5) that i_{j+1} ∈ {1, 4}. The equivalent argument applies for when i_j = 2.

In the statement of Lemma 4.6, one may notice a symmetry between the functions U_1, U_2 and U_3, U_4 respectively. In the lemmas which follow, we derive a relation between U_1, U_2 and between U_3, U_4 such that if b(U_I X)/b(X) ≤ 1/2 holds for some I, it will still hold after replacing U_1 and U_3 with U_2 and U_4, and vice versa.

Definition 4.6 We define X : R^5 → R^5 to be the function which satisfies X(a, b, c, d, e) = −(e, d, c, b, a).

We make use of the function X because of the following useful properties.

Lemma 4.7 It holds that X(X X) = X, σ_E(X X) = −σ_E(X), b(X X) = b(X), U_1 X = X U_2 X X, U_2 X = X U_1 X X, U_3 X = X U_4 X X and U_4 X = X U_3 X X.

Having shown that the function X relates U_1 to U_2 and U_3 to U_4, we now show that this relation also holds for sequences of iterations.

Definition 4.7 Let I = {i_j}_{j=1}^n ∈ I_n. Define W : I_n → I_n to be the function W(I) = {w(i_j)}_{j=1}^n, where w(1) = 2, w(2) = 1, w(3) = 4 and w(4) = 3.

Lemma 4.8 If f and g are functions such that g(x) = f(−x) and X ∈ B_f, then X X ∈ B_g. Moreover, if {X^f_i}_{i=0}^n and {X^g_i}_{i=0}^n are the sequences of extended brackets generated by the EUPM when applied to f and g starting from X and X X respectively, then it holds that X^f_j = X X^g_j for every j = 0, 1, . . . , n, and Q_n(g, X X) = W(Q_n(f, X)).

Proof Note that X X ∈ B_g follows from Definitions 3.1 and 4.6. Let {X^f_i}_{i=0}^n and {X^g_i}_{i=0}^n be the sequences of extended brackets generated by applying the EUPM with the starting brackets X and X X to f and g respectively. Define:

ϕ(X) = σ_E(X) − x_M.  (4.4)

Since σ_E(X) = −σ_E(X X), it follows that f(σ_E(X)) = g(σ_E(X X)) and ϕ(X) < 0 ⇔ ϕ(X X) > 0. Given that sign(ϕ(X)) is precisely the condition in Equation (3.5) which determines whether to apply update functions U_1, U_4 or U_2, U_3, it follows that X^f_1 = U_j X^f_0 = U_j X ⇔ X^g_1 = U_{W(j)} X^g_0 = U_{W(j)} X X. Applying Lemma 4.7, we note that U_{W(j)} X X = X U_j X, which implies that X^g_1 = X X^f_1. From this it follows inductively that X^f_j = X X^g_j for all j = 1, 2, . . . , n. Moreover, if we write Q_n(f, X) and Q_n(g, X X) in the forms {i^f_j}_{j=0}^{n−1} and {i^g_j}_{j=0}^{n−1} respectively, it also follows inductively that i^g_j = w(i^f_j) for each j = 0, 1, . . . , n − 1; in other words, Q_n(g, X X) = W(Q_n(f, X)).

Corollary 4.1 If I ∈ I_n, then X(B̃_I) = B̃_{W(I)} (recall Definition 4.4).

Corollary 4.2 It holds that U_I X = X U_{W(I)} X X for all I ∈ I_n.

Proof Let f be a function such that I = Q_n(f, X).
By Lemma 4.8, the function g(x) = f(−x) has the property that Q_n(g, X X) = W(Q_n(f, X)), and

U_I(X) = U_{Q_n(f, X)}(X) = X U_{Q_n(g, X X)}(X X) = X U_{W(I)}(X X).

The map W serves as a natural bijection between sequences of EUPM iterations which start with 1 or 4 and those which start with 2 or 3. Dividing I_n in this way is useful due to the following result.

Corollary 4.3 Given a sequence of EUPM iterations I ∈ I_n, it holds that:

max_{X ∈ B̃_I} b(U_I X)/b(X) = max_{X ∈ B̃_{W(I)}} b(U_{W(I)} X)/b(X).

Proof Using Lemma 4.7 and Corollaries 4.1 and 4.2, it follows that:

max_{X ∈ B̃_I} b(U_I X)/b(X) = max_{X ∈ B̃_I} b(X U_{W(I)} X X)/b(X) = max_{X ∈ B̃_I} b(U_{W(I)} X X)/b(X X) = max_{Y ∈ B̃_{W(I)}} b(U_{W(I)} Y)/b(Y).

We are now equipped to prove Lemma 4.4. For this proof, we write particular sequences of iterations such as {4, 1, 3} simply as 413.

Proof (Lemma 4.4) To prove this lemma, we list out the elements of I_3, I_4 and I_5 which begin with either 1 or 4 and mark them in the following way. If a sequence is contained in A, we overline it. If a sequence contains a sub-string which is contained in A, we underline the relevant sub-string. Finally, if a sequence contains a sub-string J such that W(J) ∈ A, then we place square brackets around J. We will show that all elements of I_5 can be marked in one of these three ways. First we list the elements of I_3 which start with either 1 or 4. There is no need to list the sequences starting with 2 or 3, because each of those sequences is merely W(J) for some J listed below.

111 141 143 411 422 431 4[33] 441 443
114 142 144 414 423 432 434 442 444

We see that only 7 (or 14, if one counts the equivalent sequences starting with 2 or 3) of the sequences above are unmarked. We take these seven sequences, and list all elements of I_4 which begin with any of these 7 sub-strings. We neglect to list the elements of I_4 which begin with a marked sub-string, because these elements are themselves guaranteed to be marked.

1141 1143 1411 1422 4111 4231 42[33] 4[311] 4322
1142 1144 1414 1423 4114 4232 4[234] 4314 4[323]

Using the same procedure as earlier, we see only 3 sequences are left unmarked. We list all elements of I_5 which begin with one of these three sequences, and note that all of them are marked.

11422 42311 4[2322] 11423 4[2314] 42[323]

If a sequence J is marked in any of these ways, then it follows from Lemma 4.5 and Corollary 4.3 that b(U_I X)/b(X) ≤ 1/2 for all I ∈ A implies b(U_J X)/b(X) ≤ 1/2.

5 The Dynamic Underestimating Polynomial Method

In Section 3, we identified two main weaknesses of the SUPM: that of choosing a suitable α, and that of ensuring that both x_{L_1,i} and x_{R_1,i} converge to x* fast enough. In this section, we introduce a heuristic which makes use of the features of the SUPM, while being equipped with a safeguarding option designed to avoid the situations where the SUPM fails. We call this heuristic the Dynamic UPM (DUPM), and denote the step and update functions for the DUPM by σ_D and U_D respectively.

The effect that α has on σ_S lies in how the model functions (see Equation (3.3)) are constructed. We see that in this definition, α is multiplied by h(X). Since h(X) → 0 as the bracket shrinks, an overestimate of α should ultimately not be a problem. However, the DUPM may stall temporarily when α is too small. Therefore, we construct the DUPM such that it will increase α when necessary, but never decrease it.

In order for Theorem 3.2 to apply, we need α to satisfy Lemma 3.2.
While there is no way to check this, a necessary condition for Lemma 3.2 to apply is that f(x_M) ≥ q_L(x_M) and f(x_M) ≥ q_R(x_M). Using a method similar to the proof of Lemma 3.2, we find that this is equivalent to:

α ≥ (1/h(X)) max_{k ∈ {L,R}} (f[x_{k_1}, x_{k_2}, x_{k_3}] − f[x_M, x_{k_1}, x_{k_2}]).  (5.1)

Therefore, at each iteration of the DUPM we set α_{i+1} ≥ max(α_i, α*), where α* is the smallest value of α which satisfies Equation (5.1).

Next, we wish to force the DUPM to update both x_{L_1,i} and x_{R_1,i} regularly. In order to do this, we must first equip the DUPM to recognise when this occurs. We append to the extended bracket X three variables u_1, u_2, u_3, whose purpose is to record whether the update function U_D updated x_{L_1} or x_{R_1} during each of the last three iterations. The update function U_D is defined to be the natural extension of U_S (see Equation (3.5)) which includes u_1, u_2 and u_3:

U_D(x̃; X, α) =
  (U_1(x̃; X), R, u_1, u_2)  if x̃ < x_M and f(x̃) < f(x_M),
  (U_2(x̃; X), L, u_1, u_2)  if x̃ > x_M and f(x̃) < f(x_M),
  (U_3(x̃; X), R, u_1, u_2)  if x̃ > x_M and f(x̃) > f(x_M),
  (U_4(x̃; X), L, u_1, u_2)  if x̃ < x_M and f(x̃) > f(x_M),  (5.2)

where the functions U_1, . . . , U_4 are as defined in Equation (3.6).

If u_1 = u_2 = u_3, then the DUPM has updated only one side of the bracket for each of the last three iterations. When this occurs, we force the DUPM to take an EUPM step instead, σ_D(X; α) = σ_E(X). This interference is similar to the fall-back option used by Brent's method [2,1]. The choice to interfere after three iterations is based on practical experience, and not on any theoretical insight.

Finally, we define one more condition under which we interfere with the SUPM. From practical experience, we observe situations where either q_L or q_R is a convex function whose local minimum is returned by σ_S, which results in slow convergence. As a response to this, we ensure that α is sufficiently big that σ_D returns a point of intersection between q_L and q_R. More rigorously, we require α such that

q_L(σ_S(X; α); X, α) = q_R(σ_S(X; α); X, α)  (5.3)

holds, where σ_S is the step function for the SUPM (see Equation (3.4)).

Define χ(X) to be the smallest value of α such that Equation (5.3) holds for all α ≥ χ(X). Then at each iteration of the DUPM, we require α_{i+1} ≥ max(α_i, χ(X)).

In order to compute χ(X), note that Equation (5.3) is satisfied whenever q_L and q_R are concave functions, provided that Equation (5.1) holds. Define α⁺ to be max_k f[x_{k_1}, x_{k_2}, x_{k_3}]/h(X), k ∈ {L, R}. Then it follows that for all α > α⁺, Equation (5.3) is satisfied, implying that χ(X) ≤ α⁺. If at iteration i, α_i satisfies Equation (5.3), then there is no need to compute χ(X). Otherwise, we know that χ(X) lies in the interval (α_i, α⁺]. Therefore, we can apply bisection to compute χ(X) to any desired accuracy.

To define the DUPM rigorously, we must insert an additional line into Algorithm 1. Before computing σ_D, we set

α_{i+1} = max(α_i, χ(X_i), (1/h(X)) max_{k ∈ {L,R}} (f[x_{k_1}, x_{k_2}, x_{k_3}] − f[x_M, x_{k_1}, x_{k_2}])).

Finally, we define

σ_D(X; α) := σ_E(X) if u_1 = u_2 = u_3, and σ_S(X; α) otherwise.  (5.4)

Ultimately, the purpose of these interferences is to ensure that Theorem 3.2 applies as soon as possible, while forcing both x_{L_1,i} and x_{R_1,i} to be updated.
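The following sketch (our own illustration, reusing the hypothetical helpers divdiff2, supm_step and eupm_step from the earlier sketches) shows the two DUPM interferences: the monotone update of α from Equation (5.1), and the fall-back to an EUPM step when the same side of the bracket has been updated three times in a row. The χ(X) bisection of Equation (5.3) is omitted for brevity.

```python
def dupm_alpha(f, X, alpha, h):
    # Equation (5.1): raise alpha to the smallest value making the models
    # underestimate f at x_M; never decrease it.
    xL3, xL2, xL1, xM, xR1, xR2, xR3 = X
    required = max(divdiff2(f, xL1, xL2, xL3) - divdiff2(f, xM, xL1, xL2),
                   divdiff2(f, xR1, xR2, xR3) - divdiff2(f, xM, xR1, xR2)) / h
    return max(alpha, required)

def dupm_step(f, X, alpha, h, last_sides):
    # sigma_D of Equation (5.4): if the last three updates all touched the
    # same side ('L' or 'R'), take the function-free EUPM step instead.
    if len(last_sides) == 3 and len(set(last_sides)) == 1:
        return eupm_step(X[1:6])   # drop x_L3 and x_R3 for the 5-point step
    return supm_step(f, X, alpha, h)
```

In a full implementation, the α update would additionally enforce α ≥ χ(X) via bisection before the step is computed.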
6 Numerical results

Thus far, we have presented three algorithms: the SUPM, EUPM, and DUPM. Of these, we have only bounded the theoretical convergence rate for the EUPM. While Theorem 3.2 is insightful, it does not yield any concrete bound. Moreover, while the DUPM has some nice properties which are designed to strengthen the SUPM, we have not offered any proof as to their effectiveness. Therefore, in order to justify these algorithms, we demonstrate their use on a variety of test functions, comparing them to Brent's method [1,2] and the Mifflin-Strodiot method [11]. Also, recall Golden Section, which converges Q-linearly with rate 0.618.

We begin with a set of smooth unimodal test functions f, scaled such that the minimum Lipschitz constant for f'' lies within a fixed range:

f_{SU_1} = −√e · exp(−x²),  −1 ≤ x ≤ 1,  (6.1)
f_{SU_2} = (1/24) x⁴,  −1 ≤ x ≤ 1,  (6.2)
f_{SU_3} = (1/11)(−sin(2x − π) − x² − x),  −1.5 ≤ x ≤ 1,  (6.3)
f_{SU_4} = (1/2500)(x² − cos(5πx)/(25π²) − x sin(5πx)/(5π)),  −1 ≤ x ≤ 1,  (6.4)
f_{SU_5} = −(x^{2/3} + (1 − x²)^{1/3}),  0.001 ≤ x ≤ 0.99,  (6.5)
f_{SU_6} = (1/6000)(eˣ + 1/√x),  0.1 ≤ x ≤ 1,  (6.6)
f_{SU_7} = −(1/13)(16x² − 24x + 5)e^{−x},  1.9 ≤ x ≤ 3.9.  (6.7)

For each test function, we have an interval [a, b] on which the function is intended to be used. We sample 4 points uniformly from both [a, a + (b − a)/4] and [b − (b − a)/4, b]. After evaluating the function at these 8 points, we select the point where f is minimised to be x_M; usually this will be one of the points closer to the centre of [a, b]. Finally, we order the remaining points, and select the 3 closest on the left and right to be x_{L_1}, x_{L_2}, . . . , x_{R_3}. Selecting our initial points in this way yields an extended bracket.

Having generated the bracket, we run each algorithm and compute the average convergence rate over 1000 different random initialisations. If an algorithm converged for a particular starting bracket, then its average convergence rate is by definition between 0 and 1, the smaller the better. If, for any random initialisation, an algorithm did not converge, we assign ∞ instead. We present these results in Table 6.1.

Table 6.1 Average convergence rate of different algorithms on the set of smooth unimodal test functions (SUPM with parameter values α = 0, 0.1, 1, 10; EUPM; DUPM; Brent's method; Mifflin-Strodiot method).

First of all, we see that Brent's method consistently converges the fastest. This is not surprising, given that Brent's method is the only algorithm shown here which is specifically designed for smooth functions. While the Mifflin-Strodiot method is faster in select circumstances, it fails entirely on others. This is also what we observe from the SUPM with α = 0.

Looking at the SUPM's performance, we see that the larger α is, the slower the SUPM will converge (assuming it converges in the first place). The DUPM outperforms the SUPM with α = 1, but is not always faster than the SUPM with α = 0.1, which suggests that while Lemma 3.2 is required for our theoretical results, satisfying it is by no means necessary in practice. Meanwhile, the EUPM is both consistent and slow, converging with an average rate comparable to Golden Section.

In short, the DUPM is not competitive with Brent's method for this class of test functions. However, it converges with an average rate which is better than that of Golden Section, implying that it is not prohibitively slow.

Next we define our set of non-smooth unimodal test functions:

f_{NU_1} = −(1 − |x|)²,  −1 ≤ x ≤ 1,  (6.8)
f_{NU_2} = (1/6) max(x + 3, −log(x)),  0.01 ≤ x ≤ 3,  (6.9)
f_{NU_3} = (1/24) max(x + 3, x² − 3),  −4 ≤ x ≤ 5,  (6.10)
f_{NU_4} = (1/160) max(x + 3, exp(x)),  −4 ≤ x ≤ 5,  (6.11)
f_{NU_5} = (1/150) max(exp(−x), exp(x)),  −5 ≤ x ≤ 5.  (6.12)

The results for these functions are shown in Table 6.2. These results are particularly important, as it was for this class of functions that the UPM was designed in the first place. As we'd hope, we see the SUPM with α = 1 and the DUPM outperform Brent's method for every function.
However, it converge with an average rate which is betterthan for Golden Section, implying that it is not prohibitively slow.Next we define our set of non-smooth unimodal test functions: f NU = − (cid:18) − | x | (cid:19) , − ≤ x ≤ , (6.8) f NU = 16 max (cid:18) x + 3 , log( x ) (cid:19) , − ≤ x ≤ , (6.9) f NU = 124 max (cid:18) x + 3 , x − (cid:19) , − ≤ x ≤ , (6.10) f NU = 1160 max (cid:18) x + 3 , exp( x ) (cid:19) , − ≤ x ≤ , (6.11) f NU = 1150 max (exp( − x ) , exp( x )) , − ≤ x ≤ . (6.12)The results for these functions are shown in Table 6.2. These Results areparticularly important as it was for this class of functions that the UPM wasdesigned in the first place. As we’d hope, we see the SUPM with α = 1 andthe DUPM outperform Brent’s method for every function. Despite the fact α = 0 α = 0 . α = 1 α = 10 EUPM DUPM Brent Mifflin f NU ∞ . . . . . 264 0 . ∞ f NU ∞ . . . . . 427 0 . ∞ f NU ∞ . . . . . . . f NU ∞ . . . . . . . f NU ∞ . . . . . . . Table 6.2 Average convergence rate of different algorithms on the set of non-smooth uni-modal test functions. that the Mifflin-Strodiot method is also designed for this context, it does notalways successfully converge. In particular, we observe that the Mifflin-Strodiotmethod struggles with non-convex functions.Finally we define our smooth multimodal test functions. f SM = x (cid:18) (cid:18) x (cid:19)(cid:19) , − ≤ x ≤ , (6.13) f SM = − πx ) , − ≤ x ≤ , (6.14) f SM = − (cid:18) π ( x − 120 ) (cid:19) , . ≤ x ≤ , (6.15) f SM = 15 sin (cid:18) x − (cid:19) + sin (cid:18) x − (cid:19) ! , − ≤ x ≤ , (6.16) f SM = x − cos x + 1 , ≤ x ≤ , (6.17) f SM = 171 (cid:16) (log( x − + (log(10 − x )) − x (cid:17) , . ≤ x ≤ . , (6.18) f SM = 140 (cid:18) sin( x ) + sin (cid:18) x (cid:19) + log( x ) + 2125 x (cid:19) , . ≤ x ≤ . (6.19)It is important to note that theory for neither the UPM nor indeed Brent’smethod is suitable for multi-modal functions. However, both algorithms areequipped with features ensuring sufficient robustness in the general case thatthey converge (possibly slowly) to a locality where the function is unimodal.For these test-functions, Brent’s method is once again the best. Whileslower than Brent’s method, the DUPM consistently performs better thanGolden Section, which is an ideal result considering that these are not idealcircumstances for the DUPM. Finally we see that the Mifflin-Strodiot methodis entirely unsuitable for this type of problem, and as such consistently failsto converge in reasonable time. seven-point algorithm for piecewise smooth univariate minimization 23SUPM with parameter value Other methodsFunctions α = 0 α = 0 . α = 1 α = 10 EUPM DUPM Brent Mifflin f SM ∞ . . . . . . ∞ f SM ∞ . . . . . 486 0 . ∞ f SM ∞ . . . . . . ∞ f SM ∞ . . . . . . ∞ f SM ∞ . . . . . . ∞ f SM ∞ . . . . 624 0 . . ∞ f SM ∞ . . . . . . ∞ Table 6.3 Average convergence rate of different algorithms on the set of smooth multimodaltest functions. In this paper, we have constructed a univariate optimization algorithm forblack box, piece-wise smooth functions. As seen in Section 6, this new method,the DUPM, converges both more robustly and often faster than existing meth-ods for such problems. Furthermore, while it is not the fastest algorithms forstandard smooth test functions, it is not prohibitively slow either.It must be acknowledged that the comparison between the UPM, theMifflin-Strodiot method and Brent’s method is not entirely fair given thatthey require a different number of points to start. 
Therefore, in the context ofa line-search, we expect to start with a method like Golden Section, and switchto the DUPM once we have accumulated enough points to form and extendedbracket. Regardless, the DUPM offers a univariate solver which performs wellin most contexts and is therefore suitable for non-smooth functions. A Proof of Lemma 4.3 For the calculations in this section, we use the following change of variables: ttt = (cid:0) p q r s (cid:1) T = (cid:0) x L − x L , x M − x L , x R − x M , x R − x R (cid:1) T . Converting our calculations to being in termsof ttt both reduces the number of variables, and simplifies the constraint X ∈ ˜ B (recall Defi-nition 4.4) into ttt ∈ R .4 Jonathan Grant-Peters, Raphael HauserThe algebra for the calculations which follow is tedious, and therefore we employedMathematica to compute the values of U I , I ∈ A as well as changing the variables used. b ( U X ) b ( X ) = ( q + r )( p + q + r )2( q + r )( p + q + r ) + ( q + r ) + s + s ( p + 3 q + 3 r ) < ,b ( U X ) b ( X ) = qp + 2 q + r < ,b ( U X ) b ( X ) = q ( p + q )( p + q + r )( p + 2 q + r )(3 q + 3 qr + r + p (2 q + r )) ,< q ( p + q )3 q + 3 qr + r + p (2 q + r ) < ,b ( U X ) b ( X ) = ( q + r )( p + q + r )2( q + r )( p + q + r ) + ( q + r ) + s + s ( p + 3 q + 3 r ) < ,b ( U X ) b ( X ) = 12 + q + p r + 7 q r + 5 qr + r + p ( q + r )(3 q + 2 r ) q ( p + q )( p + q + r ) ! ≤ ,b ( U X ) b ( X ) = q ( p + q )( p + q + r )( p + 2 q + r )( p + 3 q + r ( r + s ) + 2 q (2 r + s ) + p (3 q + 2 r + s )) ,< q ( p + q )3 q ( p + q ) + p + r ( r + s ) + 2 q (2 r + s ) + 2 rp + ps < ,b ( U X ) b ( X ) = ( p + q + r )( p + 2 q + r )( q ( p + q ) − r ( r + s ))( p + 2 q + 2 r + s ) × q ( q + r ) + r + p (2 q + r ) + q ( s − r ) + p (6 q + 2 r + q (7 r + s ))) ,< ( p + q + r )( q ( p + q ) − r ( r + s ))4 q ( q + r ) + r + p (2 q + r ) + q ( s − r ) + p (6 q + 2 r + q (7 r + s )) ,< ( p + q + r )( q ( p + q ) − r )4 q ( q + r ) + r + p (2 q + r ) − q r + p (6 q + 2 r + 7 qr ) ,< q ( p + q ) − r q ( p + q ) < . For the sequence 414, we need to employ another piece of information. From Equa-tion (3.5), we see that U is only applied to X if ϕ ( X ) < q ( p + q ) − r ( r + s ) > b ( U X ) b ( X ) = 12 + q + r ( p + r ) + q (7 r + s ) + qr (3 p + 7 r + 3 s ) q ( q ( p + q ) − r ( r + s )) ! , which is bounded above by precisely when q ( p + q ) − r ( r + s ) > 0. Therefore b ( U X ) /b ( X ) < as required.There remain 4 elements of A for which we must establish a bound: 434, 4322, 4314 and4114. As the previous change of variables does little to simplify the calculations for thesesubsequences, we instead use the following alternative change of variables: ( a, b, c, d ) = (cid:0) x R − x L , x R − x L , x R − x L , ϕ ( X ) (cid:1) . For this set of variables, X ∈ ˜ B is equivalent to seven-point algorithm for piecewise smooth univariate minimization 25 a, b, c > b > a and c > a . Using this transformation, we find: b ( U X ) b ( X ) = bc ( b + c )( b + c − a )( ab + bc + c )( ab + 2 b c + 3 bc + c ) < bc ( b + c ) ab + 2 b c + 3 bc + c < bc ( b + c )2 b c + 3 bc + c < b ( b + c )2 b + 3 bc + c < b b + c < ,b ( U X ) b ( X ) = bc ( b + c )( b + c − a )( ab + bc + c )( ab + 2 b c + 3 bc + c ) < bc ( b + c ) ab + 2 b c + 3 bc + c < bc ( b + c )2 b c + 3 bc + c < b ( b + c )2 b + 3 bc + c < b b + c < . Remark A.1 From the calculations above, we see that b ( U X ) /b ( X ) = b ( U X ) /b ( X )and b ( U X ) /b ( X ) = b ( U X ) /b ( X ). 
While this may imply another relation which we havenot taken advantage of, we have not examined this further.Finally we have the sequences 4314 and 4114. Similar to what we did with the sequence 414,we need to make use of additional information, specifically the fact that U is only appliedif ϕ ( X ) = d < 0. In addition, U may only follow after U if ϕ ( U X ) < 0. This is equivalentto: − d ( ab + bc + c ) > abc ( b + c − a ). From this it follows that: b ( U X ) b ( X ) = abc ( b + c )( b + c − a )( ab + bc + c )( ac ( b + c ) − d ( ab + bc + c )) ,< abc ( b + c )( b + c − a )( ab + bc + c )( ac ( b + c ) + abc ( b + c − a )) , = bc ( b + c )( b + c − a )( ab + bc + c )((2 b + c )( b + c ) − ab ) < bc ( b + c ) ( ab + bc + c )(2 b + c )( b + c ) , = bc ( b + c )( ab + bc + c )(2 b + c ) < b b + c < . Similarly for the sequence 4314, U may only follow after U if ϕ ( U X ) > 0. This isequivalent to: − d ( ab + bc + c ) < abc ( b + c − a ). First we compute b ( U X ) b ( X ) = − d ( ac − d )( ab + bc + c ) a ( b + c )( ac ( b + c ) − d ( ab + bc + c )) . Next we observe that ϕ ( U X ) > − d > abc ( b + c − a ) ab + bc + c ⇔ ac − d > ac ( b + c ) ab + bc + c . (A.1)This implies that: b ( U X ) b ( X ) < − dc ( b + c ) ac ( b + c ) − d ( ab + bc + c ) . Finally we that a sufficient condition for b ( U X ) /b ( X ) < is: − dc ( b + c ) < ac ( b + c ) − d ( ab + bc + c ) , ⇔ − d ( − ab + bc + c ) < ac ( b + c ) . We see that this is true when we combine the requirement that ϕ ( U X ) > − d ( − ab + bc + c ) < − d ( ab + bc + c ) < ( ac − d )( ab + bc + c ) < ac ( b + c ) . Therefore, b ( U X ) /b ( X ) < holds and moreover b ( U I X ) /b ( X ) < ∀ I ∈ I .6 Jonathan Grant-Peters, Raphael Hauser B Numerical Performance of the EUPM In Section 4.1 we bounded the convergence rate over 5 iterations of the EUPM. In practicewe observe far superior convergence rates. From our observation, this seems to be becausethe extremal values of ttt which yield the worst convergence rate over 5 iterations tendto yield best case performance over 6 iterations and so on. In this section, we show whatperformance might realistically be expected from the EUPM.In order to construct a suitable experiment, we first must ask what factors affect theconvergence of the EUPM? The answer to this question is the function to minimise f ,along with the initial value of ttt . However, the only effect that the function f actually hasis to determine whether f (˜ x ) < f ( x M ). This along with the current value of ttt uniquelydetermines which update function will be used. Therefore, when testing the convergencerate of the EUPM, we do not test it on a range of functions, but rather for different binarysequences, where 1’s mean f (˜ x ) < f ( x M ) and 0’s the opposite.The main advantage of this is that while the set C ([ a, b ]) has infinite cardinality, the setof binary sequences of length n is finite. Therefore, given an initial bracket ttt , it is possible todetermine how the EUPM will perform on literally any function. Given ttt and I ∈ { , } n ,we plot the average convergence rate of the EUPM. I ∈ { , } . . A v e r ag e C o n v e r g e n ce R a t e Golden SectionEUPM Fig. B.1 We compare the convergence rate of the EUPM with Golden Section over allpossible sequences I ∈ { , } 0. These sequences are ordered such that the slowest ones areplotted first. 
Fig. B.1 We compare the convergence rate of the EUPM with Golden Section over all possible sequences I ∈ {0, 1}^10. These sequences are ordered such that the slowest ones are plotted first. For each sequence, we sample a large number of values for t and average over these.

For each sequence I ∈ {0, 1}^n, we sample a large number of t values, constrained such that |t| = 1, and average over these. This leaves us with an average convergence rate for every possible sequence in {0, 1}^n, and by extension for every possible function. We choose n = 10 and plot the resulting data corresponding to both the EUPM and Golden Section in Figure B.1.

What we see is that Golden Section is the more conservative of the two. It performs notably better in the worst-case scenario, even if notably worse in the best-case scenario. Even though we see some convergence rates from the EUPM which are significantly worse than those observed in Tables 6.1 to 6.3, they are still far better than the upper bound from Theorem 4.1.

References

1. Brent, R.P.: Algorithms for Minimization without Derivatives. Prentice-Hall, Englewood Cliffs, NJ (1973)
2. Brent, R.P.: A new algorithm for minimizing a function of several variables without calculating derivatives. In: Algorithms for Minimization without Derivatives, pp. 200–248 (1976)
3. Fletcher, R.: Practical Methods of Optimization. John Wiley & Sons (2013)
4. Gill, P.E., Murray, W., Wright, M.H.: Practical Optimization. SIAM (2019)
5. Gould, N.: An introduction to algorithms for continuous optimization (2006)
6. Hager, W.W.: A derivative-based bracketing scheme for univariate minimization and the conjugate gradient method. Computers & Mathematics with Applications 18(9), 779–795 (1989)
7. Hansen, P., Jaumard, B., Lu, S.H.: Global optimization of univariate Lipschitz functions: I. Survey and properties. Mathematical Programming 55(1-3), 251–272 (1992)
8. Jamil, M., Yang, X.S.: A literature survey of benchmark functions for global optimization problems. arXiv preprint arXiv:1308.4008 (2013)
9. Jarratt, P.: An iterative method for locating turning points. The Computer Journal 10(1), 82–84 (1967)
10. Kiefer, J.: Sequential minimax search for a maximum. Proceedings of the American Mathematical Society 4(3), 502–506 (1953)
11. Mifflin, R.: On superlinear convergence in univariate nonsmooth minimization. Mathematical Programming (1-3), 273–279 (1990)
12. Mifflin, R., Strodiot, J.J.: A rapidly convergent five-point algorithm for univariate minimization. Mathematical Programming 62(1-3), 299–319 (1993)
13. Moore, R.E.: Interval Analysis, vol. 4. Prentice-Hall, Englewood Cliffs, NJ (1966)
14. Murray, W., Overton, M.L.: Steplength algorithms for minimizing a class of nondifferentiable functions. Computing 23(4), 309–331 (1979)
15. Nocedal, J., Wright, S.: Numerical Optimization. Springer Science & Business Media (2006)
16. Rudin, W.: Principles of Mathematical Analysis, vol. 3. McGraw-Hill, New York (1976)
17. Yu, J., Vishwanathan, S.V.N., Günter, S., Schraudolph, N.N.: A quasi-Newton approach to nonsmooth convex optimization problems in machine learning. Journal of Machine Learning Research 11, 1145–1200 (2010)