Piecewise linear secant approximation via Algorithmic Piecewise Differentiation
Andreas Griewank, Tom Streubel, Lutz Lehmann, Manuel Radons, Richard Hasenfelder
School of Mathematical Sciences and Information Technology, Ecuador; Zuse Institute Berlin, Germany; Humboldt University of Berlin, Germany; Technical University of Berlin, Germany

September 18, 2018
Abstract
It is shown how piecewise differentiable functions $F : \mathbb{R}^n \to \mathbb{R}^m$ that are defined by evaluation programs can be approximated locally by a piecewise linear model based on a pair of sample points $\check x$ and $\hat x$. We show that the discrepancy between function and model at any point $x$ is of the bilinear order $O(\|x - \check x\|\,\|x - \hat x\|)$. As an application of the piecewise linearization procedure we devise a generalized Newton's method based on successive piecewise linearization and prove for it sufficient conditions for convergence and convergence rates equaling those of semismooth Newton. We conclude with the derivation of formulas for the numerically stable implementation of the aforedeveloped piecewise linearization methods.

Keywords
Automatic differentiation, Stable piecewise linearization, Generalized Newton's method, Lipschitz continuity, Generalized Hermite interpolation, ADOL-C
MSC2010
In this paper we refine and extend the theory of piecewise linearizations of piecewise differentiable functions $F : \mathbb{R}^n \to \mathbb{R}^m$ that was introduced in [Gri13]. Our notion of linearity includes nonhomogeneous functions, where the adjective affine or perhaps polyhedral would be more precise. However, such mathematical terminology might be less appealing to computational practitioners, and to the best of our knowledge there are no good nouns corresponding to linearity and linearization for the adjectives affine and polyhedral.

We consider functions defined by an evaluation procedure of elemental assignments
\[ v_i = \varphi_i(v_j)_{j \prec i}, \]
where $\varphi_i$ is either smooth or the absolute value function. The data dependence relation $\prec$ generates a partial ordering, which yields a directed acyclic graph. In other words, we assume that we have a straight line program without any loops or jumps in the control flow.

In the above reference a piecewise linearization $\Delta F(\mathring x; \Delta x)$ of $F$ about the point $\mathring x$ is constructed such that, for arbitrary $x \in \mathbb{R}^n$ with $\mathring F = F(\mathring x)$, it holds
\[ F(x) = \mathring F + \Delta F(\mathring x; x - \mathring x) + O(\|x - \mathring x\|^2). \]
Since $\Delta F(\mathring x; \Delta x)$ is constructed by replacing smooth elementals by their tangent lines at $\mathring x$, we will refer to it as piecewise tangent linearization. An important property of these piecewise linearizations is that they vary continuously with respect to the sample point or points at which they are developed.

As a generalization of $\Delta F(\mathring x; \Delta x)$ we construct a piecewise secant linearization $\Delta F(\check x, \hat x; \Delta x)$ of $F$ at the pair $\check x$ and $\hat x$ such that for the midpoints $\mathring x = (\check x + \hat x)/2$ and $\mathring F = (\check F + \hat F)/2$ with $\check F = F(\check x)$ and $\hat F = F(\hat x)$ it holds
\[ F(x) = \mathring F + \Delta F(\check x, \hat x; x - \mathring x) + O(\|x - \check x\|\,\|x - \hat x\|). \]
The new $\Delta F(\check x, \hat x; \Delta x)$ reduces to $\Delta F(\mathring x; \Delta x)$ when the two sample points $\check x$ and $\hat x$ coalesce at $\mathring x$.

Our results in this article are twofold:

• Firstly, we present approximation properties and Lipschitz continuity estimates.
Here the two major points are that the piecewise linearization is a second order approximation to the underlying function and that we do not only prove the existence of Lipschitz constants, but provide explicit estimates. These results immediately yield additional statements on conditions for perturbation-stable surjectivity of a piecewise linear model.

• Secondly, we develop an application for the piecewise linearization. Namely, we introduce, for each linearization mode, a Newton's method based on successive piecewise linearization and give sufficient conditions for convergence, as well as statements on the convergence rates.

In an appended section we moreover provide formulas for singularity free and thus numerically stable implementations of the piecewise secant linearizations. One significant advantage of the piecewise linearization approach is that, once the piecewise linearization is generated, we also know where its kinks are located. This means that we are liberated of the complications of event handling that the nondifferentiabilities of piecewise smooth functions usually bring about, which was one of the motivations for the development of the techniques presented in [Gri13]. In said reference it is proved that the variations of the piecewise linear model are Lipschitz continuous with respect to perturbations of the development point. The results mentioned in the first bullet point represent a significant improvement over those given in [Gri13] in so far as they are not only sharper, but also provide explicit bounds for the Lipschitz constants for the variations of the model with respect to perturbations of the base point.

Content and Structure
To provide some more background, we will, in the second section, elaborate on the setting outlined above both from the mathematical and the implementation perspective. In Section 2 we will introduce the piecewise linearization framework that is investigated thereafter. In Section 3 we derive the approximation and stability properties of said framework. In the subsequent section the generalized Newton's methods are derived for both linearization modes. In Section 5 we show that the secant linearization can be computed in a division free, centered form such that it reduces continuously to the tangent mode when the reference points $\check x$ and $\hat x$ coalesce. The latter is followed by a numerical example in Section 6. The article is concluded by some final remarks.

Preliminary Remarks
Let $\tilde\Phi$ be a library of elemental functions that conforms to the condition of elemental differentiability as described in [GW08]. In the following we will consider piecewise differentiable functions that are defined by an evaluation procedure consisting of a sequence of such elemental functions $v_i = \varphi_i(v_j)_{j \prec i}$, where the $\varphi_i$ are contained in a library
\[ \Phi := \tilde\Phi \cup \{\mathrm{abs}()\}. \]
We remark that the inclusion of the absolute value function into the library means that we can also evaluate $\min()$ and $\max()$ via the identities
\[ \max(u, w) = (u + w + \mathrm{abs}(u - w))/2, \qquad \min(u, w) = (u + w - \mathrm{abs}(u - w))/2. \quad (1) \]
There is a slight implicit restriction, namely, we assume that whenever min or max are evaluated both their arguments have well defined finite values so that the same is true for their sum and difference. On the other hand, the expression $\min(1, 1/\mathrm{abs}(u))$ makes perfect sense in IEEE arithmetic [iee85], but rewriting it as above leads to a NaN at $u = 0$. While this restriction may appear quite technical, it imposes the requirement that all relevant quantities are well defined at least in some open neighborhood, which is exactly in the nature of piecewise differentiability. For an in-depth investigation of piecewise differentiable functions, see, e.g., the books by Kummer [Kum88] and Scholtes [Sch12].

The function $\Delta F(\mathring x; \Delta x)$ is incremental in that $\Delta F(\mathring x; 0) = 0$, but like general piecewise linear continuous functions, it is only locally and positively homogeneous so that
\[ \Delta F(\mathring x; \alpha\,\Delta x) = \alpha\,\Delta F(\mathring x; \Delta x) \quad \text{for} \quad 0 < \alpha, \quad \alpha\,\|\Delta x\| < \rho(\mathring x). \]
Here the bound $\rho(\mathring x)$ is positive everywhere, but generally not continuous with respect to $\mathring x$. In [Gri13] it has been shown that the Jacobians of the linear pieces of $\Delta F(\mathring x; \alpha\,\Delta x)$ in the ball of radius $\rho(\mathring x)$ about $\mathring x$ are conically active generalized Jacobians of the underlying nonlinear function $F(x)$.
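The identities (1) and the IEEE caveat mentioned above can be checked directly. The following Python sketch is our own illustration (helper names like `max_` are hypothetical, not from the paper's software):

```python
import math

def abs_(u):
    """Elemental absolute value, the only nonsmooth member of the library Phi."""
    return u if u >= 0 else -u

def max_(u, w):
    """max via identity (1); assumes both arguments are finite."""
    return (u + w + abs_(u - w)) / 2

def min_(u, w):
    """min via identity (1); assumes both arguments are finite."""
    return (u + w - abs_(u - w)) / 2

assert max_(3.0, -1.0) == 3.0 and min_(3.0, -1.0) == -1.0

# The caveat from the text: min(1, 1/abs(u)) is fine in IEEE arithmetic at
# u = 0 (1/0 gives inf, and min(1, inf) = 1), but the rewritten form (1)
# produces NaN there, because inf - inf is undefined.
direct = min(1.0, math.inf)      # the unrewritten expression at u = 0
rewritten = min_(1.0, math.inf)  # identity (1) applied to the same arguments
assert direct == 1.0 and math.isnan(rewritten)
```

This makes concrete why the rewriting implicitly requires both arguments to be finite.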
We will not elaborate on this connection here, because even in the smooth case secant approximations need not correspond to exact Jacobians. In contrast, the generalized derivative sets in the sense of Clarke [Cla83] are for piecewise differentiable functions almost everywhere just singletons, containing the classical Jacobian matrix. In floating point arithmetic the user or client algorithm will then quite likely never 'see' the nonsmoothness or gain any useful information about it.

Suppose the vector function $F : D \subset \mathbb{R}^n \to \mathbb{R}^m$ in question is evaluated by a sequence of assignments
\[ v_i = v_j \circ v_k \quad \text{or} \quad v_i = \varphi_i(v_j) \quad \text{for} \quad i = 1 \ldots l. \]
Here $\circ \in \{+, -, *, /\}$ is a polynomial arithmetic operation and $\varphi_i \in \Phi \equiv \{\mathrm{rec}, \mathrm{sqrt}, \sin, \cos, \exp, \log, \ldots, \mathrm{abs}, \ldots\}$ a univariate function. The user or reader may extend the library by other locally Lipschitz-continuously differentiable functions like the analysis favorite
\[ \varphi(u) \equiv \begin{cases} u^p \cdot \sin(1/u) & \text{if } |u| > 0 \\ 0 & \text{if } u = 0 \end{cases} \quad \text{for } p \geq 4, \]
so that $\varphi'$ is locally Lipschitz. To fit into the framework they then also have to supply an evaluation procedure for both the elemental function $\varphi$ and its derivative $\varphi'$, which cannot be based mechanically on the chain rule.

Following the notation from [GW08] we partition the sequence of scalar variables $v_i$ into the vector triple
\[ (x, z, y) = (v_{1-n}, \ldots, v_0,\ v_1, \ldots, v_{l-m},\ v_{l-m+1}, \ldots, v_l) \in \mathbb{R}^{n+l} \]
such that $x \in \mathbb{R}^n$ is the vector of independent variables, $y \in \mathbb{R}^m$ the vector of dependent variables and $z \in \mathbb{R}^{l-m}$ the (internal) vector of intermediates. Some of the elemental functions like the reciprocal $\mathrm{rec}(u) \equiv 1/u$, the square root and the logarithm are not globally defined.
As mentioned above, we will assume that the input variables $x$ are restricted to an open domain $D \subset \mathbb{R}^n$ such that all resulting intermediate values $v_i = v_i(x)$ are well defined. Throughout we will assume that the evaluation procedure for $F$ involves exactly $s \geq 0$ evaluations of $\mathrm{abs}()$, including min and max rewritten or at least reinterpreted as discussed above. Starting from $\mathring x$ and an increment $\Delta x = x - \mathring x$, we will now construct for each intermediate $v_i$ an approximation
\[ v_i(\mathring x + \Delta x) - \mathring v_i \approx \Delta v_i \equiv \Delta v_i(\Delta x). \]
Here the incremental function $\Delta v_i(\Delta x)$ is continuous and piecewise linear, with $\mathring x$ or $\check x$ and $\hat x$ considered constant in the tangent and secant case, respectively. Hence, we will often list $\Delta x$, but only rarely use $\mathring x$, $\check x$ and $\hat x$ as arguments of the $\Delta v_i$ in proofs.

Defining Relations for Tangent Approximation

We use the reference values $\mathring v_i = v_i(\mathring x)$ and, assuming that all $\varphi_i$ other than the absolute value function are differentiable within the domain of interest, we may use the tangent linearizations
\[ \Delta v_i = \Delta v_j \pm \Delta v_k \quad \text{for} \quad v_i = v_j \pm v_k, \quad (2) \]
\[ \Delta v_i = \mathring v_j * \Delta v_k + \Delta v_j * \mathring v_k \quad \text{for} \quad v_i = v_j * v_k, \quad (3) \]
\[ \Delta v_i = \mathring c_{ij} * \Delta v_j \quad \text{for} \quad v_i = \varphi_i(v_j) \neq \mathrm{abs}(). \quad (4) \]
Here $\mathring c_{ij} \equiv \varphi_i'(\mathring v_j)$, which will be different for the secant linearization. If no absolute value or other nonsmooth elemental occurs, the function $y = F(x)$ is, at the current point, differentiable and by the chain rule we have the relation
\[ \Delta y = \Delta F(\mathring x; \Delta x) \equiv F'(\mathring x)\,\Delta x, \]
where $F'(x) \in \mathbb{R}^{m \times n}$ is the Jacobian matrix. Thus we observe the obvious fact that smooth differentiation is equivalent to linearizing all elemental functions. Now, let us move to the piecewise differentiable scenario, where the absolute value function does occur $s > 0$ times. Then we approximate $F(\mathring x + \Delta x) - F(\mathring x)$ by incrementing
\[ \Delta v_i = \mathrm{abs}(\mathring v_j + \Delta v_j) - \mathring v_i \quad \text{when} \quad v_i = \mathrm{abs}(v_j). \quad (5) \]
Here $\mathring v_i = \mathrm{abs}(\mathring v_j)$, which will be slightly different for the secant linearization. In other words, we keep the piecewise linear function $\mathrm{abs}()$ unchanged so that the resulting $\Delta y$ represents, for each fixed $x \in D$, the piecewise linear and continuous increment function
\[ \Delta y = \Delta y(\Delta x) = \Delta F(\mathring x; \Delta x) : \mathbb{R}^n \to \mathbb{R}^m. \]

Defining Relations for Secant Approximation
In the tangent approximation the reference point was always the evaluation point $\mathring x$ and the resulting values $\mathring v_i = v_i(\mathring x)$. Now we will make reference to the midpoints
\[ \mathring v_i \equiv (\check v_i + \hat v_i)/2 \quad \text{with} \quad \check v_i \equiv v_i(\check x) \quad \text{and} \quad \hat v_i \equiv v_i(\hat x). \quad (6) \]
Consequently, we have the functional dependence $\mathring v_i = \mathring v_i(\check x, \hat x)$, which is at least Lipschitz continuous under our assumptions. Now an intriguing observation is that the recurrences (2) and (3) for arithmetic operations can stay just the same, and the recurrence (4) for nonlinear univariates is still formally valid, except that the tangent slope $\varphi'(\mathring v_j)$ must be replaced by the secant slope
\[ c_{ij} \equiv \begin{cases} (\check v_i - \hat v_i)/(\check v_j - \hat v_j) & \text{if } \check v_j \neq \hat v_j, \\ \varphi_i'(\mathring v_j) & \text{otherwise}. \end{cases} \quad (7) \]
Theoretically, some $\check v_i$ and $\hat v_i$ may coincide, even if the underlying sample points $\check x$ and $\hat x$ are not selected identically, in which case the secant based model would reduce to the tangent based model. While exact coincidence of any pair $\check v_i$ and $\hat v_i$ is rather unlikely, taking the divided difference over small increments is likely to generate numerical cancellation. Therefore we will develop a division free centered form in Section 5. Finally, the nonsmooth rule (5) can stay unchanged except that we now set
\[ \mathring v_i \equiv (\check v_i + \hat v_i)/2 = [\mathrm{abs}(\check v_j) + \mathrm{abs}(\hat v_j)]/2. \quad (8) \]
Hence, it is immediately clear that the new secant approximation reduces to the old tangent approximation when $\check x = \hat x$. In general, we will denote the mapping between the input increments $\Delta x \in \mathbb{R}^n$ and the resulting values $\Delta y \in \mathbb{R}^m$ by
\[ \Delta y = \Delta y(\Delta x) = \Delta F(\check x, \hat x; \Delta x) : \mathbb{R}^n \to \mathbb{R}^m. \]
Its piecewise linear structure is very much the same as that of the tangent based model, which is described in detail in [Gri13].
Here we emphasize its quality in approximating the underlying nonlinear and nonsmooth $F$. In contrast to the tangent model, the secant model is not a priori unique in that it depends quite strongly on the procedural representation of the vector function $F$ and not just its values, i.e. its properties as a mathematical map. For example, one can easily check that applying the above secant modeling rules to $f(x) = \log(\exp(x))$ does not yield the same approximation $\Delta f(\check x, \hat x; \Delta x)$ as the one for $f(x) = x$. On the other hand the natural secant linearization rule for the product $v = u \cdot w$ is equivalent to that obtained by applying the Apollonius identity
\[ u \cdot w = \tfrac14\left[(u + w)^2 - (u - w)^2\right]. \]
Of course, the same is true for the tangent linearization and we may assume without loss of generality that we only have three kinds of elemental functions: the addition, the modulus and smooth univariate functions. That reduction greatly simplifies the theoretical analysis but might not be numerically optimal for actual implementations.
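To make the propagation rules concrete, the following Python sketch of our own (class and function names are hypothetical, and the representation by closures is purely illustrative) carries the secant model through rules (2), (3), (4) with the secant slope (7), and (5) with the midpoint (8), for the toy function $F(x) = |\sin x|$:

```python
import math

class PL:
    """One intermediate of the secant-mode piecewise linear model: the two
    sample values v(x_check), v(x_hat) and the continuous, piecewise linear
    increment function dv(dx)."""
    def __init__(self, lo, hi, dv):
        self.lo, self.hi, self.dv = lo, hi, dv

    @property
    def mid(self):                       # midpoint reference value, cf. (6)
        return 0.5 * (self.lo + self.hi)

def var(xlo, xhi):                       # independent variable: dv(dx) = dx
    return PL(xlo, xhi, lambda dx: dx)

def add(u, w):                           # rule (2)
    return PL(u.lo + w.lo, u.hi + w.hi, lambda dx: u.dv(dx) + w.dv(dx))

def mul(u, w):                           # rule (3) with midpoint coefficients
    return PL(u.lo * w.lo, u.hi * w.hi,
              lambda dx: u.mid * w.dv(dx) + u.dv(dx) * w.mid)

def univ(phi, dphi, u):                  # rule (4) with secant slope (7)
    lo, hi = phi(u.lo), phi(u.hi)
    c = (lo - hi) / (u.lo - u.hi) if u.lo != u.hi else dphi(u.mid)
    return PL(lo, hi, lambda dx: c * u.dv(dx))

def abs_(u):                             # rule (5) with midpoint (8)
    return PL(abs(u.lo), abs(u.hi),
              lambda dx: abs(u.mid + u.dv(dx)) - 0.5 * (abs(u.lo) + abs(u.hi)))

# secant model of F(x) = |sin x| developed at the pair 0.5 and 2.0
xlo, xhi = 0.5, 2.0
F = abs_(univ(math.sin, math.cos, var(xlo, xhi)))

def model(xv):                           # nonincremental form: midpoint + increment
    return F.mid + F.dv(xv - 0.5 * (xlo + xhi))

# the secant model reproduces F exactly at both sample points
assert abs(model(xlo) - abs(math.sin(xlo))) < 1e-12
assert abs(model(xhi) - abs(math.sin(xhi))) < 1e-12
```

The final assertions check the interpolation property of the secant model at the two sample points, which holds elemental by elemental and hence for the whole composite function.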
In contrast to the presentation in our previous papers we will now also use the nonincremental forms
\[ \lozenge_{\mathring x} F(x) \equiv F(\mathring x) + \Delta F(\mathring x; x - \mathring x) \]
and
\[ \lozenge_{\check x}^{\hat x} F(x) \equiv \tfrac12\left(F(\check x) + F(\hat x)\right) + \Delta F(\check x, \hat x; x - \mathring x). \]
For the square as a univariate nonlinear function $v(x) = x^2$ we find:
\[ \lozenge_{\check x}^{\hat x} v = \lozenge_{\check x}^{\hat x} x^2 = \tfrac12[\check x^2 + \hat x^2] + \frac{\hat x^2 - \check x^2}{\hat x - \check x}\left(x - \tfrac12[\hat x + \check x]\right) = \tfrac12[\check x^2 + \hat x^2] + (\hat x + \check x)(x - \mathring x) = \tfrac12[\check x^2 + \hat x^2] + 2\mathring x(x - \mathring x) = \mathring v + 2\mathring x(x - \mathring x). \quad (9) \]

Lemma 3.1. Plugging the secant approximation for the square into the Apollonius identity, we obtain for the general multiplication $v = u \cdot w$:
\[ \lozenge_{(\check u, \check w)}^{(\hat u, \hat w)}(u \cdot w) = \tfrac14\left[\lozenge_{(\check u, \check w)}^{(\hat u, \hat w)}(u + w)^2 - \lozenge_{(\check u, \check w)}^{(\hat u, \hat w)}(u - w)^2\right] = \tfrac12(\hat u \hat w + \check u \check w) + \mathring w(u - \mathring u) + \mathring u(w - \mathring w). \]

Proof.
\[ 4 \cdot \lozenge_{(\check u, \check w)}^{(\hat u, \hat w)}(u \cdot w) = \lozenge_{(\check u, \check w)}^{(\hat u, \hat w)}(u + w)^2 - \lozenge_{(\check u, \check w)}^{(\hat u, \hat w)}(u - w)^2 \]
\[ = \tfrac12\left([\hat u + \hat w]^2 + [\check u + \check w]^2\right) + 2(\mathring u + \mathring w)\left(u + w - [\mathring u + \mathring w]\right) - \tfrac12\left([\hat u - \hat w]^2 + [\check u - \check w]^2\right) - 2(\mathring u - \mathring w)\left(u - w - [\mathring u - \mathring w]\right) \]
\[ = \tfrac12\left([\hat u + \hat w]^2 + [\check u + \check w]^2\right) + 2(\mathring u + \mathring w)\left([u - \mathring u] + [w - \mathring w]\right) - \tfrac12\left([\hat u - \hat w]^2 + [\check u - \check w]^2\right) - 2(\mathring u - \mathring w)\left([u - \mathring u] - [w - \mathring w]\right) \]
\[ = 4 \cdot \left[\tfrac12(\check u \check w + \hat u \hat w) + \mathring w\,[u - \mathring u] + \mathring u\,[w - \mathring w]\right]. \]

Hereafter we will denote by $\|\cdot\| \equiv \|\cdot\|_\infty$ the infinity norm. Due to the norm equivalence in finite dimensional spaces all inequalities to be derived take the same form in other norms, provided the constants are adjusted accordingly. The infinity norm is particularly convenient, since we can then prove the following result for the general vector case $m > 1$ componentwise via the scalar functions $f = F_i$ for $i = 1 \ldots m$. Moreover, we will make use of the Apollonius identity in the following proposition:

Proposition 3.2.
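The closed forms (9) and Lemma 3.1 can be verified numerically. In this short Python check (our own illustration; the sample values are arbitrary test data), both secant models interpolate the underlying values at the sample points:

```python
# xlo and xhi play the roles of the sample points x_check and x_hat
xlo, xhi = 0.7, 1.9
x0 = 0.5 * (xlo + xhi)                  # midpoint

def square_model(x):
    # formula (9): half the sum of squared samples plus slope 2*x0
    return 0.5 * (xlo**2 + xhi**2) + 2.0 * x0 * (x - x0)

# the secant model of the square interpolates at both sample points
assert abs(square_model(xlo) - xlo**2) < 1e-12
assert abs(square_model(xhi) - xhi**2) < 1e-12

ulo, uhi, wlo, whi = 0.3, 1.1, -2.0, 0.5
u0, w0 = 0.5 * (ulo + uhi), 0.5 * (wlo + whi)

def product_model(u, w):
    # Lemma 3.1: half the sum of sampled products plus midpoint-coefficient terms
    return 0.5 * (uhi * whi + ulo * wlo) + w0 * (u - u0) + u0 * (w - w0)

# the product model likewise interpolates the exact products
assert abs(product_model(ulo, wlo) - ulo * wlo) < 1e-12
assert abs(product_model(uhi, whi) - uhi * whi) < 1e-12
```

This is exactly the interpolation property one expects of a secant construction and matches the reduction of both formulas to the tangent model when the sample points coalesce.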
Suppose $\tilde x, \check x, \hat x, \check y, \hat y, \check z, \hat z \in \mathbb{R}^n$ are restricted to a sufficiently small closed convex neighborhood $K \subset \mathbb{R}^n$ where the evaluation procedure for $F : \mathbb{R}^n \to \mathbb{R}^m$ is well defined. Then there are Lipschitz constants $\beta_F$ and $\gamma_F$ such that we have

(i) Lipschitz continuity of function, tangent and secant models:
\[ \|F(x) - F(\tilde x)\| \leq \beta_F \|x - \tilde x\| \quad \text{for } x, \tilde x \in K, \]
\[ \max\left(\|\lozenge_{\check x}^{\hat x} F(x) - \lozenge_{\check x}^{\hat x} F(\tilde x)\|,\ \|\lozenge_{\mathring x} F(x) - \lozenge_{\mathring x} F(\tilde x)\|\right) \leq \beta_F \|x - \tilde x\| \quad \text{for } x, \tilde x \in \mathbb{R}^n. \]
The constant $\beta_F$ can be defined by the recurrences $\beta_v = \beta_u + \beta_w$ if $v = u + w$, $\beta_v = \beta_u$ if $v = |u|$, and $\beta_v = \beta_u L_K(\varphi)$ if $v = \varphi(u)$ with $L_K(\varphi) \equiv \max_{x \in K} |\varphi'(u(x))|$.

(ii) Error between function and secant or tangent model:
\[ \|F(x) - \lozenge_{\check x}^{\hat x} F(x)\| \leq \gamma_F \|x - \check x\|\,\|x - \hat x\|, \qquad \|F(x) - \lozenge_{\mathring x} F(x)\| \leq \gamma_F \|x - \mathring x\|^2, \]
where $x \in K$. The constant $\gamma_F$ can be defined using the recurrences $\gamma_v = \gamma_u + \gamma_w$ if $v = u + w$, $\gamma_v = \gamma_u$ if $v = |u|$, and $\gamma_v = L_K(\varphi)\,\gamma_u + L_K(\varphi')\,\beta_u^2$ if $v = \varphi(u)$ with $L_K(\varphi') \equiv \max_{x \in K} |\varphi''(u(x))|$.

(iii) Lipschitz continuity of secant and tangent model:
\[ \|\lozenge_{\check z}^{\hat z} F(x) - \lozenge_{\check y}^{\hat y} F(x)\| \leq \gamma_F \max\left[\|\hat z - \hat y\| \max(\|x - \check y\|, \|x - \check z\|),\ \|\check z - \check y\| \max(\|x - \hat y\|, \|x - \hat z\|)\right], \]
\[ \|\lozenge_{\mathring z} F(x) - \lozenge_{\mathring y} F(x)\| \leq \gamma_F \|\mathring z - \mathring y\| \max(\|x - \mathring y\|, \|x - \mathring z\|), \]
where $x \in \mathbb{R}^n$.

(iv) Lipschitz continuity of the incremental part: Let $x \in K$. Abbreviating $\Delta y = \hat y - \check y$ and $\Delta z = \hat z - \check z$ we obtain in the secant case
\[ \|\Delta_{\check z}^{\hat z} F(\Delta x) - \Delta_{\check y}^{\hat y} F(\Delta x)\| \leq 2\beta_F \|\mathring z - \mathring y\| + \tfrac12\gamma_F \left(\|\mathring z - \mathring y\| + \max(\|\Delta y\|, \|\Delta z\|)\right)^2 + \gamma_F \left(\|\mathring z - \mathring y\| + \tfrac12(\|\Delta y\| + \|\Delta z\|)\right)\|\Delta x\|, \]
which reduces in the tangent case to
\[ \|\Delta_{\mathring z} F(\Delta x) - \Delta_{\mathring y} F(\Delta x)\| \leq 2\beta_F \|\mathring z - \mathring y\| + \tfrac12\gamma_F \|\mathring z - \mathring y\|^2 + \gamma_F \|\mathring z - \mathring y\|\,\|\Delta x\|. \]
Here the second bounds applying to the tangent model are always specializations of the previous ones for the secant model.

Proof.
Since otherwise the bounds can be applied componentwise, we may assume without loss of generality that $F$ is a scalar function $f$ and the norm in the range is simply the absolute value $|\cdot|$. The proof proceeds by induction on the intermediate quantities $v$ in the computational graph of $f$. We will define the constants $\beta_v$ and $\gamma_v$ recursively on the basis of the Lipschitz constants of the elemental functions and their derivatives:

Variable initialization:
The initializations of independent variables represent the minimal nodes, and all assertions are trivially true with the constants $\beta_{x_i} = 1$ and $\gamma_{x_i} = 0$.

Smooth univariate operation:
Let $v = \varphi(u)$, $\varphi \in \tilde\Phi$, be some elemental function in the computational graph of $f$:
\[ |\varphi(u) - \varphi(\tilde u)| \leq L_K(\varphi)\,|u - \tilde u| \leq \beta_u L_K(\varphi)\,\|x - \tilde x\| \]
and
\[ |\lozenge_{\check u}^{\hat u}\varphi(u) - \lozenge_{\check u}^{\hat u}\varphi(\tilde u)| \leq L_K(\lozenge_{\check u}^{\hat u}\varphi)\,|u - \tilde u| \leq \beta_u L_K(\lozenge_{\check u}^{\hat u}\varphi)\,\|x - \tilde x\|. \]
The first inequality holds due to the Lipschitz continuity of $\varphi$ and $\lozenge_{\check u}^{\hat u}\varphi$. The latter inequality is the induction hypothesis, and since $L_K(\lozenge_{\check u}^{\hat u}\varphi) \leq L_K(\varphi)$ holds by the mean value theorem, we may set $\beta_v \equiv \beta_u L_K(\varphi)$.

Absolute value function and sum: The absolute value function $v = \mathrm{abs}(u)$ naturally maintains the Lipschitz constant and the addition summates them. Thus we have established (i) for all cases.

Now let us consider the approximation property (ii). For additions $v = u + w$ we may set $\gamma_v = \gamma_u + \gamma_w$ and then have by the triangle inequality
\[ |v(x) - \lozenge_{\check x}^{\hat x} v(x)| \leq |u(x) - \lozenge_{\check x}^{\hat x} u(x)| + |w(x) - \lozenge_{\check x}^{\hat x} w(x)| \leq \gamma_u\,\|x - \check x\|\|x - \hat x\| + \gamma_w\,\|x - \check x\|\|x - \hat x\| = \gamma_v\,\|x - \check x\|\|x - \hat x\|. \]
For the absolute value function $v = \mathrm{abs}(u)$ we may also set $\gamma_v = \gamma_u$, since
\[ |v(x) - \lozenge_{\check x}^{\hat x} v(x)| = \big||u(x)| - |\lozenge_{\check x}^{\hat x} u(x)|\big| \leq |u(x) - \lozenge_{\check x}^{\hat x} u(x)| \leq \gamma_v\,\|x - \check x\|\|x - \hat x\|. \]
For the univariate functions $v = \varphi(u)$ we have with $\tilde u \equiv \lozenge_{\check x}^{\hat x} u(x)$
\[ |v(x) - \lozenge_{\check x}^{\hat x} v(x)| \leq |\varphi(u(x)) - \varphi(\tilde u)| + |\varphi(\tilde u) - \lozenge_{\check u}^{\hat u}\varphi(\tilde u)|. \]
By the mean value theorem and the induction hypothesis, the first term is bounded by
\[ L_K(\varphi)\,|u(x) - \lozenge_{\check x}^{\hat x} u(x)| \leq L_K(\varphi)\,\gamma_u\,\|x - \check x\|\|x - \hat x\|. \]
The second term represents the error in the Hermite interpolation of $\varphi$ between $\check u$ and $\hat u$.
With $L_K(\varphi')$ a Lipschitz constant of $\varphi'$ on $u(K)$ it is bounded by
\[ L_K(\varphi')\,|\tilde u - \check u|\,|\tilde u - \hat u| \leq L_K(\varphi')\,\beta_u^2\,\|x - \check x\|\|x - \hat x\|, \]
where the last bound follows from the fact that according to (i) the approximation $\lozenge_{\check x}^{\hat x} u(x)$ has the Lipschitz constant $\beta_u$ and takes on at $\check x$ and $\hat x$ the values $\check u$ and $\hat u$. Hence, we have shown that (ii) holds indeed with
\[ \gamma_v = L_K(\varphi)\,\gamma_u + L_K(\varphi')\,\beta_u^2. \quad (10) \]
Next we want to prove Lipschitz continuity of the model as stated in (iii). Again, we find for additions and the abs function that the assertion is almost trivial, with the constants $\gamma$ either being summated or just passed on. The challenge is once more the induction through the nonlinear univariates $v = \varphi(u)$. To limit the notational complexity we will connect the two point pairs at the $u$ level by straight lines, setting
\[ \check u \equiv \check u(t) \equiv u(\check y)(1 - t) + t\,u(\check z) \quad \Rightarrow \quad \partial\check u(t)/\partial t = \Delta\check u \equiv u(\check z) - u(\check y), \]
\[ \hat u \equiv \hat u(t) \equiv u(\hat y)(1 - t) + t\,u(\hat z) \quad \Rightarrow \quad \partial\hat u(t)/\partial t = \Delta\hat u \equiv u(\hat z) - u(\hat y), \]
\[ \tilde u \equiv \tilde u(t) \equiv \lozenge_{\check y}^{\hat y} u(x)(1 - t) + t\,\lozenge_{\check z}^{\hat z} u(x) \quad \Rightarrow \quad \partial\tilde u/\partial t = \Delta\tilde u \equiv \lozenge_{\check z}^{\hat z} u(x) - \lozenge_{\check y}^{\hat y} u(x). \]
Here $x$ is fixed and we assume as induction hypothesis that
\[ \|\Delta\tilde u\| \leq \gamma_u \max\left[\|\hat z - \hat y\| \max(\|x - \check y\|, \|x - \check z\|),\ \|\check z - \check y\| \max(\|x - \hat y\|, \|x - \hat z\|)\right]. \quad (11) \]
To connect the piecewise linearizations of $v$ define
\[ \tilde v(t) \equiv \tfrac12\left[\varphi(\check u(t)) + \varphi(\hat u(t))\right] + \frac{\varphi(\check u(t)) - \varphi(\hat u(t))}{\check u(t) - \hat u(t)}\left[\tilde u(t) - \tfrac12(\check u(t) + \hat u(t))\right]. \]
The quantity we want to find a bound for is
\[ \Delta\tilde v \equiv \tilde v(1) - \tilde v(0) = \lozenge_{\check z}^{\hat z} v(x) - \lozenge_{\check y}^{\hat y} v(x). \]
By the mean value theorem we find some $\bar t \in [0,1]$ where
\[ \Delta\tilde v = \tilde v(1) - \tilde v(0) = \left.\frac{\partial\tilde v(t)}{\partial t}\right|_{t = \bar t} = \tfrac12\left[\varphi'(\check u)\Delta\check u + \varphi'(\hat u)\Delta\hat u\right] + \frac{\varphi(\check u) - \varphi(\hat u)}{\check u - \hat u}\left[\Delta\tilde u - \tfrac12(\Delta\check u + \Delta\hat u)\right] + \left\{\frac{\varphi'(\check u)\Delta\check u - \varphi'(\hat u)\Delta\hat u}{\check u - \hat u} - \frac{(\varphi(\check u) - \varphi(\hat u))(\Delta\check u - \Delta\hat u)}{(\check u - \hat u)^2}\right\}\left[\tilde u - \tfrac12(\check u + \hat u)\right]. \]
The functions $\check u, \hat u, \tilde u$ here and in the following are to be read as evaluated at $t = \bar t$. Now introduce $\bar u$ as the mean value between $\check u$ and $\hat u$ at which the difference quotient of $\varphi$ over the intervening interval is equal to its derivative. This yields:
\[ \Delta\tilde v = \tfrac12\left[\varphi'(\check u)\Delta\check u + \varphi'(\hat u)\Delta\hat u\right] + \varphi'(\bar u)\left[\Delta\tilde u - \tfrac12(\Delta\check u + \Delta\hat u)\right] + \left\{\frac{\varphi'(\check u)\Delta\check u + \varphi'(\bar u)(\Delta\hat u - \Delta\check u) - \varphi'(\hat u)\Delta\hat u}{\check u - \hat u}\right\}\left[\tilde u - \tfrac12(\check u + \hat u)\right] \]
\[ = \tfrac12\left\{\frac{(\varphi'(\check u) - \varphi'(\bar u))\Delta\check u + (\varphi'(\hat u) - \varphi'(\bar u))\Delta\hat u}{\check u - \hat u}\right\}(\check u - \hat u) + \varphi'(\bar u)\,\Delta\tilde u + \tfrac12\left\{\frac{(\varphi'(\check u) - \varphi'(\bar u))\Delta\check u - (\varphi'(\hat u) - \varphi'(\bar u))\Delta\hat u}{\check u - \hat u}\right\}\left[(\tilde u - \check u) + (\tilde u - \hat u)\right] \]
\[ = \frac{\varphi'(\bar u) - \varphi'(\hat u)}{\check u - \hat u}\,\Delta\hat u\,(\tilde u - \check u) - \frac{\varphi'(\bar u) - \varphi'(\check u)}{\check u - \hat u}\,\Delta\check u\,(\tilde u - \hat u) + \varphi'(\bar u)\,\Delta\tilde u. \]
The two quotients are bounded according to
\[ \left|\frac{\varphi'(\bar u) - \varphi'(\check u)}{\check u - \hat u}\right| + \left|\frac{\varphi'(\bar u) - \varphi'(\hat u)}{\check u - \hat u}\right| \leq \left|\frac{\varphi'(\check u) - \varphi'(\bar u)}{\check u - \bar u}\right|\left|\frac{\bar u - \check u}{\check u - \hat u}\right| + \left|\frac{\varphi'(\hat u) - \varphi'(\bar u)}{\hat u - \bar u}\right|\left|\frac{\bar u - \hat u}{\check u - \hat u}\right| \leq L_K(\varphi')\left(\left|\frac{\bar u - \check u}{\check u - \hat u}\right| + \left|\frac{\bar u - \hat u}{\check u - \hat u}\right|\right) = L_K(\varphi'), \]
where the last equality follows from $\bar u$ lying between $\check u$ and $\hat u$. Hence, we find that
\[ |\Delta\tilde v| \leq L_K(\varphi') \max\left(|\Delta\check u|\,|\tilde u - \hat u|,\ |\Delta\hat u|\,|\tilde u - \check u|\right) + L_K(\varphi)\,|\Delta\tilde u|. \quad (12) \]
The factors in the middle are easily bounded by
\[ |\Delta\check u| = |u(\check z) - u(\check y)| \leq \beta_u \|\check z - \check y\| \quad \text{and} \quad |\Delta\hat u| = |u(\hat z) - u(\hat y)| \leq \beta_u \|\hat z - \hat y\|. \]
Moreover, since $\tilde u(t) - \check u(t)$ is linear in $t$, we have for all $t$
\[ |\tilde u(t) - \check u(t)| \leq \max(|\tilde u(0) - \check u(0)|, |\tilde u(1) - \check u(1)|) = \max(|\lozenge_{\check y}^{\hat y} u(x) - u(\check y)|, |\lozenge_{\check z}^{\hat z} u(x) - u(\check z)|) = \max(|\lozenge_{\check y}^{\hat y} u(x) - \lozenge_{\check y}^{\hat y} u(\check y)|, |\lozenge_{\check z}^{\hat z} u(x) - \lozenge_{\check z}^{\hat z} u(\check z)|) \leq \beta_u \max(\|x - \check y\|, \|x - \check z\|), \]
where the last inequality follows from (i). Analogously, we can derive
\[ |\tilde u(t) - \hat u(t)| \leq \beta_u \max(\|x - \hat y\|, \|x - \hat z\|). \]
Substituting this into (12) we get
\[ |\Delta\tilde v| \leq \gamma_v \max\left(\|\check z - \check y\|\,\|x - \hat y\|,\ \|\check z - \check y\|\,\|x - \hat z\|,\ \|\hat z - \hat y\|\,\|x - \check y\|,\ \|\hat z - \hat y\|\,\|x - \check z\|\right) \]
with $\gamma_v \equiv L_K(\varphi)\,\gamma_u + L_K(\varphi')\,\beta_u^2$, which completes the proof of (iii). Finally, we have to prove (iv), which gives a bound on the increment part only.
Setting $\xi \equiv \tfrac12(\mathring z + \mathring y) + \Delta x$ one gets, with the results already proved and a few triangle inequalities, that
\[ \|\Delta_{\check z}^{\hat z} F(\Delta x) - \Delta_{\check y}^{\hat y} F(\Delta x)\| = \|\lozenge_{\check z}^{\hat z} F(\mathring z + \Delta x) - F(\mathring z) - (\lozenge_{\check y}^{\hat y} F(\mathring y + \Delta x) - F(\mathring y))\| \]
\[ = \|\lozenge_{\check z}^{\hat z} F(\xi) - F(\mathring z) + \lozenge_{\check z}^{\hat z} F(\mathring z + \Delta x) - \lozenge_{\check z}^{\hat z} F(\xi) - \lozenge_{\check y}^{\hat y} F(\xi) + F(\mathring y) - \lozenge_{\check y}^{\hat y} F(\mathring y + \Delta x) + \lozenge_{\check y}^{\hat y} F(\xi)\| \]
\[ \leq \beta_F \|\mathring z - \mathring y\| + \beta_F \|\mathring z + \Delta x - \tfrac12(\mathring z + \mathring y) - \Delta x\| + \beta_F \|\mathring y + \Delta x - \tfrac12(\mathring z + \mathring y) - \Delta x\| + \gamma_F \max\left[\|\check z - \check y\| \max(\|\xi - \hat z\|, \|\xi - \hat y\|),\ \|\hat z - \hat y\| \max(\|\xi - \check z\|, \|\xi - \check y\|)\right] \]
\[ = 2\beta_F \|\mathring z - \mathring y\| + \gamma_F \max\left[\|\check z - \check y\| \max(\|\xi - \hat z\|, \|\xi - \hat y\|),\ \|\hat z - \hat y\| \max(\|\xi - \check z\|, \|\xi - \check y\|)\right]. \]
Now, since for example
\[ \|\tfrac12(\mathring y + \mathring z) - \check z\| = \|\tfrac12(\mathring y - \mathring z) - \check z + \tfrac12(\check z + \hat z)\| \leq \tfrac12\left(\|\mathring y - \mathring z\| + \|\Delta z\|\right), \]
both inner maxima can be bounded by the same expression, namely
\[ \|\Delta x\| + \tfrac12\left(\|\mathring z - \mathring y\| + \max(\|\Delta z\|, \|\Delta y\|)\right), \]
so that we obtain the upper bound
\[ \|\Delta_{\check z}^{\hat z} F(\Delta x) - \Delta_{\check y}^{\hat y} F(\Delta x)\| \leq 2\beta_F \|\mathring z - \mathring y\| + \gamma_F \max(\|\check z - \check y\|, \|\hat z - \hat y\|)\left[\|\Delta x\| + \tfrac12\left(\|\mathring z - \mathring y\| + \max(\|\Delta z\|, \|\Delta y\|)\right)\right]. \]
Finally, we can also bound
\[ \|\check y - \check z\| = \|\check y - \mathring y - \check z + \mathring z + \mathring y - \mathring z\| \leq \|\mathring y - \mathring z\| + \tfrac12\left(\|\Delta y\| + \|\Delta z\|\right), \]
which yields the assertion after some elementary modifications. From the secant result we can easily get the bound for the tangent model by setting $\check z = \hat z = \mathring z$ and $\check y = \hat y = \mathring y$.

As one can see by setting $\check y = x = \check z$, the assertion (ii) almost follows from (iii), except that a factor of 2 is lost in the constants. The proposition also states that the values of $F$ at $\check x$ and $\hat x$ are reproduced exactly by our approximation, as one would expect from a secant approximation. This property clearly nails down the piecewise linearization rules (5) and (4) with (7) for all univariate functions.
Also, there is no doubt that addition and subtraction should be linearized according to (2) and that multiplications $v_i = c\,v_j$ by constants $c$ should yield the differentiated version $\Delta v_i = c\,\Delta v_j$, which is a special case of (3). For general multiplications $v_i = v_j * v_k$ the two values $\check v_i = \check v_j * \check v_k$ and $\hat v_i = \hat v_j * \hat v_k$ could also be interpolated by linear functions other than the one defined by (3). However, we currently see no possible gain in that flexibility, and maintaining the usual product rule form seems rather attractive.

Stable Surjectivity
Recall that a continuous function $F : \mathbb{R}^n \to \mathbb{R}^m$ is called proper if the preimage of every compact set is compact. One can easily see that any piecewise linear function is proper if and only if it maps no affine ray $\{a + \lambda b : \lambda \geq 0\}$, $(a, b \in \mathbb{R}^n)$, to a point. This is trivially the case if $n = m$ and the Jacobians of all selection functions of $F$ are invertible. In [Rad17] it was proved that if a piecewise linear function $F : \mathbb{R}^n \to \mathbb{R}^n$ is proper, there exists a $d \in \mathbb{Z}$ such that for all regular values $y$ of $F$, i.e. all values $y$ such that the Jacobian at all preimages of $y$ exists and is invertible [which is trivially the case if $F^{-1}(y) = \emptyset$], it holds
\[ d = \sum_{x \in F^{-1}(y)} \mathrm{sign}[\det(D_x F)] =: \deg(F). \]
Hence, any regular value $y$ of a proper piecewise linear function $F : \mathbb{R}^n \to \mathbb{R}^n$ has at least one preimage if $\deg(F) \neq 0$. This implies surjectivity of $F$ by the closedness of piecewise linear functions (cf. [Sch12]) and the well known fact that regular values lie dense in the range. We call $\deg(F)$ the degree of $F$.

Lemma 3.3. ([Rad17, Cor. 5.2]) Assume a piecewise linear function $F : \mathbb{R}^n \to \mathbb{R}^n$ is composed of $k$ affine functions $F_i$ with invertible linear parts $A_i$ and define
\[ \rho_F := \min\left\{\frac{1}{\|A_1^{-1}\|}, \ldots, \frac{1}{\|A_k^{-1}\|}\right\}. \]
Moreover, let $0 < \varepsilon < \rho_F$ and $l \geq 0$. Then all piecewise linear functions $G : \mathbb{R}^n \to \mathbb{R}^n$ with
\[ \|F(x) - G(x)\| \leq \varepsilon\|x\| + l \quad \forall x \in \mathbb{R}^n \]
are proper and their degree equals that of $F$.

The latter statement enables us to prove another stability result:
Proposition 3.4.
Let $F \in \mathrm{span}(\Phi)$ where $F : \mathbb{R}^n \to \mathbb{R}^n$ and assume a piecewise linearization $\lozenge_{\mathring x} F : \mathbb{R}^n \to \mathbb{R}^n$ is composed of $k$ affine functions $F_i$ with invertible linear parts $A_i$. Moreover, define $\rho_F$ as in Lemma 3.3. Then for all tangent and secant mode piecewise linearizations with development points in the ball $B(\mathring x, \rho_F/\gamma_F)$, where $\gamma_F$ is defined as in Proposition 3.2, the mapping degree is well defined and equals that of $\lozenge_{\mathring x} F$.

Proof. Just plug Proposition 3.2 (iii) into Lemma 3.3.

The fact that the mapping degree of a coherently oriented piecewise linear function cannot be zero yields a rather pointed statement:
Corollary 3.5.
In the situation of Proposition 3.4, if $\lozenge_{\mathring x} F$ is coherently oriented, then all tangent and secant mode piecewise linearizations with development points in the ball $B(\mathring x, \rho_F/\gamma_F)$ are surjective.

We will proceed by proposing and analyzing a possible application of the tangent and secant approximations developed in the previous sections. For this we present generalized Newton's methods for composite piecewise smooth functions $F : \mathbb{R}^n \to \mathbb{R}^n$ based on the piecewise linear approximations, both for the tangent and the secant mode. The merit of these methods is the fact that they impose no strong differentiability requirements but require only piecewise differentiability at the root in question.

Definition 1 (Newton operator). Let $F \in \mathrm{span}(\Phi)$ and $x^*$ be an isolated root of $F$ in an open neighborhood $D$. The Newton step for $F$ is definable on $D$ in tangent or secant mode if the piecewise linear equation $\lozenge_{\mathring x} F(x) = 0$ resp. $\lozenge_{\check x}^{\hat x} F(x) = 0$ has at least one root for all $\mathring x \in D$ resp. $\check x, \hat x \in D$. Then the Newton operator is defined in tangent mode as
\[ N(\mathring x) = \arg\min\{\|x - \mathring x\| : \lozenge_{\mathring x} F(x) = 0\} \]
and in secant mode as
\[ N(\hat x, \check x) = \arg\min\{\|x - \tfrac12(\hat x + \check x)\| : \lozenge_{\check x}^{\hat x} F(x) = 0\}. \]
Definition 1 is the minimal assumption under which we can conclude, using the approximation and Lipschitz continuity results of Section 3, that the iteration
\[ x_{k+1} = N(x_k) \ \text{(tangent)} \qquad x_{k+1} = N(x_k, x_{k-1}) \ \text{(secant)} \quad (13) \]
converges locally to an isolated root $x^*$.
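For a scalar example, the tangent-mode iteration (13) can be carried out by solving the piecewise linear model equation explicitly on each of its linear pieces and keeping, as in Definition 1, the root closest to the current point. The following Python sketch is our own illustration on the hypothetical test function $f(x) = |x^2 - 1| + x - 2$, whose tangent model at $\mathring x$ is $\lozenge_{\mathring x} f(x) = |\mathring u + 2\mathring x(x - \mathring x)| + x - 2$ with $\mathring u = \mathring x^2 - 1$:

```python
import math

def newton_step(x0):
    """One tangent-mode PL Newton step for f(x) = |x**2 - 1| + x - 2."""
    u0, du = x0**2 - 1.0, 2.0 * x0      # value and slope of the smooth inner function
    candidates = []
    for s in (+1.0, -1.0):              # try both linear pieces of the absolute value
        # solve  s*(u0 + du*d) + (x0 + d) - 2 = 0  for the step d
        denom = s * du + 1.0
        if denom != 0.0:
            d = (2.0 - x0 - s * u0) / denom
            if s * (u0 + du * d) >= 0.0:    # keep only sign-consistent pieces
                candidates.append(d)
    # Newton operator: root of the PL model closest to the current point
    return x0 + min(candidates, key=abs)

x = 3.0
for _ in range(8):
    x = newton_step(x)

root = (-1.0 + math.sqrt(13.0)) / 2.0   # exact root on the piece with x**2 >= 1
assert abs(x - root) < 1e-12
```

On this example the iterates quickly enter the smooth piece containing the root, where the method coincides with ordinary Newton and converges quadratically, in line with the convergence statements below; in general the PL model equation may have to be solved by a dedicated piecewise linear solver.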
Assuming that the piecewise linearization $\lozenge_{x^*} F$ is bijective on a ball $D = B(x^*, \bar R)$ we will proceed to show that close to $x^*$

• the Newton step can be defined,
• the Newton step stays close to $x^*$,
• the tangent mode Newton method converges quadratically and the secant mode Newton method converges with the golden mean as order.

We know that the assumption of bijectivity on a ball can be relaxed through the use of degree theory, but this requires a significant technical effort for which we refer to subsequent works. Note that the minimal assumption for the local convergence of semismooth Newton [Hin10] is similar, namely, that there always exists a generalized Jacobian $J(x)$ that has a uniformly bounded inverse over all $x \in B(x^*, \bar\rho)$.

For most of the following results up to the last it is sufficient to consider the secant mode piecewise linear approximation of $F$, as results for the tangent mode can be obtained by setting $\check x = \hat x = \mathring x$. The general bound for the difference of piecewise linearizations with different base points $\check z, \hat z$ and $\check y = \hat y = \mathring y$ can be refined to a closer bound in the case that one is to express that bound only in distances to $\mathring y$.

Lemma 4.1.
Let the second order constant γ_F be valid on some convex set U and let x, ẙ, ž, ẑ ∈ U. Then

    ‖♦_{ž,ẑ} F(x) − ♦_ẙ F(x)‖ ≤ γ_F [ max(‖ẑ − ẙ‖, ‖ž − ẙ‖) · ‖x − ẙ‖ + ½ ‖ẑ − ẙ‖ ‖ž − ẙ‖ ].

Proof.
Select some N ∈ ℕ and consider the subdivision of the segments [ž, ẙ] and [ẑ, ẙ] given by ǔ_k = ẙ + (k/N)(ž − ẙ) and û_k = ẙ + (k/N)(ẑ − ẙ). Then by Prop. 3.2 (iii)

    ‖♦_{ǔ_{k+1},û_{k+1}} F(x) − ♦_{ǔ_k,û_k} F(x)‖
      ≤ γ_F max[ ‖û_{k+1} − û_k‖ max(‖x − ǔ_{k+1}‖, ‖x − ǔ_k‖),
                 ‖ǔ_{k+1} − ǔ_k‖ max(‖x − û_{k+1}‖, ‖x − û_k‖) ]
      ≤ γ_F max[ (1/N) ‖ẑ − ẙ‖ ( ‖x − ẙ‖ + ((k+1)/N) ‖ž − ẙ‖ ),
                 (1/N) ‖ž − ẙ‖ ( ‖x − ẙ‖ + ((k+1)/N) ‖ẑ − ẙ‖ ) ]
      = γ_F [ (1/N) max(‖ẑ − ẙ‖, ‖ž − ẙ‖) ‖x − ẙ‖ + ((k+1)/N²) ‖ẑ − ẙ‖ ‖ž − ẙ‖ ].

Summation over k = 0, …, N−1 and the limit N → ∞ yield the claim, since Σ_{k=0}^{N−1} (k+1)/N² → ½. □

We assume hereafter that ♦_{x*} F is bijective on a ball D = B(x*, R̄), where x* is an isolated root of F, which implies that it is also coherently oriented and metrically regular on that ball. This means that there exists a constant c > 0 such that for all x ∈ D and y ∈ ♦_{x*} F(D)

    ‖x − (♦_{x*} F|_D)^{−1}(y)‖ ≤ c ‖♦_{x*} F(x) − y‖.

Lemma 4.2 (existence of roots of the PL approximation). If R ≤ min(R̄, 1/(cγ_F)) and ρ = R/4, then for any x̌, x̂ ∈ B(x*, ρ) the piecewise linearization ♦_{x̌,x̂} F has a root in B(x*, R).

Proof. We intend to apply the Brouwer fixed point theorem to the operator

    T(x) = x + x* − (♦_{x*} F|_D)^{−1}( ♦_{x̌,x̂} F(x) ).    (14)

Any fixed point of T is then a root of ♦_{x̌,x̂} F. As T is obviously continuous, we only need to show that T maps B(x*, R) into itself. Using metric regularity and Lemma 4.1 we find

    ‖T(x) − x*‖ = ‖x − (♦_{x*} F|_D)^{−1}( ♦_{x̌,x̂} F(x) )‖
      ≤ c ‖♦_{x*} F(x) − ♦_{x̌,x̂} F(x)‖
      ≤ cγ_F [ max(‖x̌ − x*‖, ‖x̂ − x*‖) ‖x − x*‖ + ½ ‖x̌ − x*‖ ‖x̂ − x*‖ ]
      ≤ (ρ/R)(R + ½ρ) ≤ R. □

Lemma 4.3 (containment of the roots).
Under the same assumptions any root of ♦_{x̌,x̂} F in B(x*, R) is actually contained in B(x*, ρ).

Proof. Let x be a root of ♦_{x̌,x̂} F. Then, again using metric regularity and Lemma 4.1, the distance to x* has the bound

    ‖x − x*‖ ≤ c ‖♦_{x*} F(x)‖ = c ‖♦_{x*} F(x) − ♦_{x̌,x̂} F(x)‖
      ≤ cγ_F [ max(‖x̌ − x*‖, ‖x̂ − x*‖) ‖x − x*‖ + ½ ‖x̌ − x*‖ ‖x̂ − x*‖ ]
      ≤ ¼ ‖x − x*‖ + (1/(8ρ)) ‖x̌ − x*‖ ‖x̂ − x*‖,

so that

    ‖x − x*‖ ≤ (1/(6ρ)) ‖x̌ − x*‖ ‖x̂ − x*‖ ≤ (1/(4ρ)) ‖x̌ − x*‖ ‖x̂ − x*‖ ≤ ¼ min(‖x̌ − x*‖, ‖x̂ − x*‖) < ρ.    (15)

□

Corollary 4.4 (Newton iteration). Let ♦_{x*} F be bijective on a ball D = B(x*, R̄), where x* is an isolated root of F ∈ span(Φ). Then on B(x*, ρ), where ρ is defined as in Lemma 4.2, the Newton step is defined in both the tangent and the secant mode and maps back into B(x*, ρ). The thus definable Newton iteration converges at least linearly.

Proof. As the next Newton iterate is among the roots of minimal distance to the basis point(s) of the linearization, one needs to ensure that any root that ♦_{x̌,x̂} F may have outside the ball B(x*, R) has a larger distance to {x̌, x̂} than the root that is known to exist inside B(x*, ρ). The distance from the basis points inside B(x*, ρ) to the outside of B(x*, R) is at least R − ρ = ¾R. The distance from the root of ♦_{x̌,x̂} F inside B(x*, R) (which by Lemma 4.3 lies even in B(x*, ρ)) to the basis points is at most ρ + ρ = ½R, and thus the smaller distance.

By equation (15) of the last lemma we also see that the distance of the root of ♦_{x̌,x̂} F to x* is at most a quarter of the minimal distance of the basis points to x*, which implies linear convergence. □

Corollary 4.5 (convergence rates). Let R = min(R̄, 1/(cγ_F)) and ρ = R/4. Then for all initial points x_0, x_1 ∈ B(x*, ρ) the Newton iteration in tangent mode x_{k+1} = N(x_k) converges quadratically, resp. in secant mode x_{k+1} = N(x_k, x_{k−1}) converges with order φ = (1 + √5)/2 towards x*.

Proof. With the choice of the initial points and by Lemma 4.3, the full iteration sequence stays inside B(x*, ρ).

Tangent mode:
By replacing x, x̌, x̂ with x_{j+1}, x_j, x_j in equation (15) of Lemma 4.3 we get

    ‖x_{j+1} − x*‖ ≤ (1/(4ρ)) ‖x_j − x*‖²,

which implies the quadratic convergence of the sequence (x_j) towards x*,

    ‖x_j − x*‖ ≤ 4ρ ( ‖x_0 − x*‖ / (4ρ) )^{2^j}.

Secant mode:
Replacing x, x̌, x̂ with x_{j+2}, x_{j+1}, x_j in equation (15) of Lemma 4.3 one finds

    ‖x_{j+2} − x*‖ ≤ (1/(4ρ)) ‖x_{j+1} − x*‖ ‖x_j − x*‖.

As in the scalar secant method, this implies convergence with rate φ = (1 + √5)/2, or more precisely

    ‖x_j − x*‖ ≤ 4ρ ( ‖x_0 − x*‖ / (4ρ) )^{F_{j−1}} ( ‖x_1 − x*‖ / (4ρ) )^{F_j}
              ≤ 4ρ ( max(‖x_0 − x*‖, ‖x_1 − x*‖) / (4ρ) )^{φ^{j−1}},

where (F_j) is the Fibonacci sequence with F_0 = 0, F_1 = 1 and φ^{j−1} ≤ F_{j+1} = F_{j−1} + F_j. □

The inner iterations of (13) require the solution of piecewise linear equations. However, these may possess several solutions Δx_j, of which we must find one of minimal norm. So far, the solvers that we know of and have surveyed in [Gri13, GBRS15, SGRB14, Rad16] require at least coherent orientation to guarantee convergence. Hence, the actual implementation of successive piecewise linearization needs further study.

Singularity Free Implementation
In contrast to the tangent mode of piecewise linearization, the secant mode involves two points of evaluation v̌_i, v̂_i ∈ ℝ for every intermediate quantity, which means that the computational cost of the primal values is roughly twice that of the tangent mode. These points define a line segment

    [v̌_i, v̂_i] ≡ { λ v̌_i + (1 − λ) v̂_i | λ ∈ [0, 1] }

for each intermediate operation. The formal definition of the secant slope given in equation (7) may cause numerically unstable divisions when the denominator becomes small during the transition from secant to tangent mode, e.g. when the secant mode Newton iteration converges. In this section we provide singularity free closed form expressions, so that an exception handling at v̌_i = v̂_i is no longer necessary. To that end we move from the line segment representation to a midpoint-radius representation. Let v_i = ϕ_i(v_j), where ϕ_i ∈ {sin, exp, …} is some elemental operation, and

    (v̌_i, v̂_i) = (ϕ_i(v̌_j), ϕ_i(v̂_j)) ↔ (v̊_i, δv_i),  where  v̊_i = (v̌_i + v̂_i)/2,  δv_i = −(v̌_i − v̂_i)/2.

We adopted the concept of representing intervals via midpoint and radius from interval arithmetic (described in detail e.g. by Siegfried Rump in [Rum99] or by Götz Alefeld and Jürgen Herzberger in [AH12]). We remark, though, that in the present setting the radius δv_i ∈ ℝ is allowed to become negative as well. One can now rewrite the secant slope of differentiable functions in the new representation:

    c_ij ≡ (v̌_i − v̂_i)/(v̌_j − v̂_j) = δv_i / δv_j.
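The change of representation itself is a pair of trivially invertible affine maps. A minimal sketch (our code; the helper names to_mid_rad and from_mid_rad are ours, not the paper's):

```python
# Sketch of the midpoint/(signed) radius representation of a value pair.
# Helper names (to_mid_rad, from_mid_rad) are illustrative, not from the paper.

def to_mid_rad(v_check, v_hat):
    """(v_check, v_hat) -> (mid, rad) with mid = (v_check+v_hat)/2, rad = (v_hat-v_check)/2."""
    return (v_check + v_hat) / 2.0, (v_hat - v_check) / 2.0

def from_mid_rad(mid, rad):
    """Inverse map: v_check = mid - rad, v_hat = mid + rad."""
    return mid - rad, mid + rad

if __name__ == "__main__":
    mid, rad = to_mid_rad(1.0, 3.0)
    print(mid, rad)                 # 2.0 1.0
    print(from_mid_rad(mid, rad))   # (1.0, 3.0)
```

Note that rad carries a sign, exactly as the text allows δv_i to become negative; no absolute value is taken, so the ratio c_ij = δv_i/δv_j keeps its sign as well.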
Using some algebraic manipulations one finds the following propagation rules for the secant mode. For binary operations:

    operation         v̊_i                                       δv_i                                      c_ij                    c_ik
    v_i = v_j + v_k   v̊_j + v̊_k                                 δv_j + δv_k                               1                       1
    v_i = v_j − v_k   v̊_j − v̊_k                                 δv_j − δv_k                               1                       −1
    v_i = v_j · v_k   v̊_j v̊_k + δv_j δv_k                       δv_j v̊_k + v̊_j δv_k                      v̊_k                     v̊_j
    v_i = v_j / v_k   (v̊_j v̊_k − δv_j δv_k)/(v̊_k² − δv_k²)     (δv_j v̊_k − v̊_j δv_k)/(v̊_k² − δv_k²)    v̊_k/(v̊_k² − δv_k²)    −v̊_j/(v̊_k² − δv_k²)

Note that v̊_k² − δv_k² = v̌_k v̂_k. Alternatively, we could represent v_i = v_j/v_k as a multiplication by the reciprocal 1/v_k. Furthermore, we can represent the multiplication via the Apollonius identity v_j v_k = ¼[(v_j + v_k)² − (v_j − v_k)²]. Moreover, for unary operations we get:

    operation        v̊_i                      δv_i                     c_ij
    v_i = sin(v_j)   sin(v̊_j) cos(δv_j)       cos(v̊_j) sin(δv_j)       cos(v̊_j) sinc(δv_j)
    v_i = cos(v_j)   cos(v̊_j) cos(δv_j)       −sin(v̊_j) sin(δv_j)      −sin(v̊_j) sinc(δv_j)
    v_i = exp(v_j)   exp(v̊_j) cosh(δv_j)      exp(v̊_j) sinh(δv_j)      exp(v̊_j) sinhc(δv_j)
    v_i = log(v_j)   ½ log(v̊_j² − δv_j²)      artanh(δv_j / v̊_j)       (1/v̊_j) artanhc(δv_j / v̊_j)

Note that, by e.g. [Wei16], sinc(x) and sinhc(x) (the hyperbolic analogue of sinc(x)) have regular Taylor expansions:

    sin(x) = x · sinc(x) = x · Σ_{n≥0} (−1)ⁿ x^{2n}/(2n+1)!,    sinh(x) = x · sinhc(x) = x · Σ_{n≥0} x^{2n}/(2n+1)!.

We define artanhc(x) analogously to sinhc(x) via its Taylor expansion,

    tanh^{−1}(x) = artanh(x) = x · artanhc(x) = x · Σ_{n≥0} x^{2n}/(2n+1),

and it can be implemented in a similar fashion as sinhc(x) from the boost C++ libraries (see [lib]). For root functions (v_i = v_j^{1/c}), general powers (v_i = v_j^c, or in binary fashion v_i = v_j^{v_k}) and monomials (v_i = v_j^n) one can use the identities

    v_i = v_j^c = exp(c · log(v_j))    or    v_i = v_j^{v_k} = exp(v_k · log(v_j))

and apply the rules above.
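The table entries can be implemented essentially verbatim; the only care needed is a short series branch for the slope functions near zero. A sketch (our code, not part of the paper; the cutoff 1e-4 and the truncation order are illustrative choices) for sinhc and the resulting stable secant slope of exp:

```python
import math

def sinhc(x):
    """sinh(x)/x, continued by its Taylor series 1 + x^2/6 + x^4/120 + ... at 0."""
    if abs(x) < 1e-4:
        x2 = x * x
        # truncated series; the omitted terms are O(x^6), far below double precision here
        return 1.0 + x2 / 6.0 + x2 * x2 / 120.0
    return math.sinh(x) / x

def exp_secant_slope(mid, rad):
    """Stable secant slope c_ij = exp(mid) * sinhc(rad) for v_i = exp(v_j)."""
    return math.exp(mid) * sinhc(rad)

if __name__ == "__main__":
    # agrees with the naive difference quotient away from rad = 0 ...
    mid, rad = 0.3, 0.2
    naive = (math.exp(mid + rad) - math.exp(mid - rad)) / (2.0 * rad)
    assert abs(exp_secant_slope(mid, rad) - naive) < 1e-12
    # ... and is well defined (the tangent slope) at rad = 0
    assert exp_secant_slope(0.3, 0.0) == math.exp(0.3)
```

In contrast, the naive quotient degenerates to 0/0 at rad = 0, which is exactly the transition from secant to tangent mode that the closed form expressions avoid.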
Of course, the base v_j > 0 must be positive in these cases. For natural powers we also have the direct formulas:

    operation                    v̊_i                                        δv_i                                       c_ij
    v_i = v_j^n (n ∈ ℕ)          Σ_{k≤n, k even} C(n,k) v̊_j^{n−k} δv_j^k    Σ_{k≤n, k odd} C(n,k) v̊_j^{n−k} δv_j^k    Σ_{k≤n, k odd} C(n,k) v̊_j^{n−k} δv_j^{k−1}
    v_i = v_j² (special case)    v̊_j² + δv_j²                               2 v̊_j δv_j                                 2 v̊_j

Remark. Of course one can find many more singularity free formulas for the secant slopes of other operations. Using a Taylor expansion approach one can provide general approximation formulas for the triplet v̊_i, δv_i and c_ij:

    v̊_i = ½ [ Σ_{k≥0} ϕ^{(k)}(v̊_j)/k! · δv_j^k + Σ_{k≥0} ϕ^{(k)}(v̊_j)/k! · (−δv_j)^k ] = Σ_{k≥0, k even} ϕ^{(k)}(v̊_j)/k! · δv_j^k,

    δv_i = [ ϕ(v̊_j + δv_j) − ϕ(v̊_j − δv_j) ] / 2 = Σ_{k>0, k odd} ϕ^{(k)}(v̊_j)/k! · δv_j^k,

    c_ij = Σ_{k>0, k odd} ϕ^{(k)}(v̊_j)/k! · δv_j^{k−1}.

Numerical Example

Consider the function

    F(x) = [ cos(ϕ(∠x) − ∠x)   −sin(ϕ(∠x) − ∠x) ] · x − c,
           [ sin(ϕ(∠x) − ∠x)    cos(ϕ(∠x) − ∠x) ]

where c ∈ ℝ² is a fixed shift vector and ∠x ∈ [0, 2π[ is the angle of x = (x_1, x_2)ᵀ in polar coordinate representation. Moreover ϕ, a polynomial in ψ with coefficients given by multiples of powers of π, maps [0, 2π[ strictly monotonically onto itself. Hence F is bijective. Furthermore, the function is differentiable everywhere except at the origin. There it is, just as the Euclidean norm, locally Lipschitz continuous but not even piecewise differentiable in the sense of [Sch12].

We investigated the behavior of the tangent and secant mode Newton methods on F both with and without noise. That is, we investigated F and F̃, where

    F̃(x) = F(x) + ε sin(5000 · (x_1 + x_2))

for a small fixed noise amplitude ε > 0. The secant mode was started with a pair of distinct initial values x̌_0, x̂_0.
The tangent mode was started with the mean value of the two secant initial points.

We recall the well known formula for approximating the convergence rate numerically:

    γ ≈ log( |x_{n+1} − x_n| / |x_n − x_{n−1}| ) / log( |x_n − x_{n−1}| / |x_{n−1} − x_{n−2}| ).

In the noise free experiment the residuals of both iterations decay to the order of machine precision within a handful of steps, with estimated rates γ ≈ 2 for the tangent mode and γ ≈ (1 + √5)/2 for the secant mode. A second table recorded the residuals of the iterations with noise; there, too, the estimated rates approach γ ≈ 2 resp. γ ≈ (1 + √5)/2. Both methods attain their theoretical convergence rates in either experiment. However, we observe that the secant mode fares better on the problem with noise, as it cuts through the latter, while the tangent mode Newton method is thrown off for several iteration steps.
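As an illustration (our sketch, not the authors' experiment), the rate formula can be applied to the classical scalar secant iteration, which the secant mode Newton method generalizes; the estimated order approaches (1 + √5)/2 ≈ 1.618:

```python
import math

def secant_iterates(f, x0, x1, steps):
    """Classical secant iteration; returns the full list of iterates."""
    xs = [x0, x1]
    for _ in range(steps):
        a, b = xs[-2], xs[-1]
        slope = (f(b) - f(a)) / (b - a)
        xs.append(b - f(b) / slope)
    return xs

def estimated_order(xs, n):
    """gamma ~ log(|x_{n+1}-x_n|/|x_n-x_{n-1}|) / log(|x_n-x_{n-1}|/|x_{n-1}-x_{n-2}|)."""
    a = abs(xs[n + 1] - xs[n]) / abs(xs[n] - xs[n - 1])
    b = abs(xs[n] - xs[n - 1]) / abs(xs[n - 1] - xs[n - 2])
    return math.log(a) / math.log(b)

if __name__ == "__main__":
    # placeholder test function x^3 - 2 with root 2^(1/3)
    xs = secant_iterates(lambda x: x**3 - 2.0, 1.0, 1.5, 6)
    phi = (1.0 + math.sqrt(5.0)) / 2.0
    print(estimated_order(xs, 6), phi)
```

The index n is chosen late enough that the asymptotic regime is reached but before the differences drop below machine precision, where the quotient becomes meaningless noise.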
Conclusion and Outlook

The framework developed in the present paper, as well as in [Gri13, GBRS15, SGRB14, Rad16, GHRS16], aims at presenting viable alternatives to current approaches to piecewise differentiability as it occurs, e.g., in nonsmooth nonlinear systems or in ODEs with nonsmooth right hand sides. The piecewise linearizations, which were first introduced in [Gri13], can be obtained in an automated fashion by an adaptation of AD tools such as ADOL-C [WG12].

The generalized Newton iterations introduced above are intended as an alternative to semismooth Newton [Hin10]. The quadratic convergence rate of the tangent version is in line with that of semismooth Newton. While for one step of semismooth Newton an appropriate element of the generalized derivative has to be calculated, the piecewise linear Newton's methods solve a piecewise linear system in each step. Both problems are NP-hard in general, but may be solvable in practice with essentially the same effort as an ordinary linear system. Hence, which approach yields better performance is likely highly situation dependent.

It is our hope that, in combination with the formulas for the numerically stable implementation of the secant linearization, the generalized Newton's methods can be developed into robust and stable workhorse algorithms, which might even outperform the quadratic methods on selected problems. For example, on problems with oscillating noise we observed that the secant method required fewer Newton steps to converge, as it cuts through the oscillations.
Acknowledgements
The work for this article has been partially conducted within the Research Campus MODAL funded by the German Federal Ministry of Education and Research (BMBF) (fund number 05M14ZAM).
References

[AH12] Alefeld, G.; Herzberger, J.: Introduction to Interval Computations. Elsevier Science, 2012 (Computer Science and Applied Mathematics). https://books.google.de/books?id=rUsX5x0OqUcC. ISBN 9780080916361

[Cla83] Clarke, F. H.: Optimization and Nonsmooth Analysis. Wiley-Interscience, 1983 (Canadian Mathematical Society Series of Monographs and Advanced Texts). ISBN 047187504X

[GBRS15] Griewank, A.; Bernt, J. U.; Radons, M.; Streubel, T.: Solving piecewise linear systems in abs-normal form. In: Linear Algebra and its Applications 471 (2015), Nr. 0, 500–530

[GHRS16] Griewank, A.; Hasenfelder, R.; Radons, M.; Streubel, T.: Integrating Lipschitzian dynamical systems using piecewise algorithmic differentiation. Submitted (2016)

[Gri13] Griewank, A.: On stable piecewise linearization and generalized algorithmic differentiation. In: Optimization Methods and Software (2013). http://dx.doi.org/10.1080/10556788.2013.796683. DOI 10.1080/10556788.2013.796683

[GW08] Griewank, A.; Walther, A.: Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Society for Industrial and Applied Mathematics (SIAM), 2008 (Other Titles in Applied Mathematics). http://epubs.siam.org/doi/book/10.1137/1.9780898717761. ISBN 978-0-89871-776-1

[Hin10] Hintermüller, M.: Semismooth Newton Methods and Applications. Oberwolfach Seminar on Mathematics of PDE-Constrained Optimization, Mathematisches Forschungsinstitut, Oberwolfach, Germany, 2010

[iee85] IEEE Standard for Binary Floating-Point Arithmetic. In: ANSI/IEEE Std 754-1985 (1985). http://dx.doi.org/10.1109/IEEESTD.1985.82928

[Kum88] Kummer, B.: Newton's method for non-differentiable functions. In: Advances in Mathematical Optimization (1988)

[lib] Boost C++ Libraries: boost/math/special_functions/sinhc.hpp. Last visited on 12/01/2016

[Rad16] Radons, M.: Direct solution of piecewise linear systems. In: Theoretical Computer Science 626 (2016), 97–109

[Rad17] Radons, M.: Degree theory for piecewise affine functions. Submitted (2017)

[Rum99] Rump, S. M.: Fast and parallel interval arithmetic. In: BIT Numerical Mathematics 39 (1999), Nr. 3, 534–554. http://dx.doi.org/10.1023/A:1022374804152. DOI 10.1023/A:1022374804152. ISSN 1572-9125

[Sch12] Scholtes, S.: Introduction to Piecewise Differentiable Equations. Springer New York, 2012 (SpringerBriefs in Optimization). http://link.springer.com/book/10.1007/978-1-4614-4340-7. ISBN 978-1-4614-4340-7

[SGRB14] Streubel, T.; Griewank, A.; Radons, M.; Bernt, J. U.: Representation and analysis of piecewise linear functions in abs-normal form. In: Proc. of the IFIP TC 7 (2014), 323–332

[Wei16] Weisstein, E. W.: Sinhc Function. From MathWorld—A Wolfram Web Resource. http://mathworld.wolfram.com/SinhcFunction.html. Last visited on 12/01/2016

[WG12] Walther, A.; Griewank, A.: Getting started with ADOL-C. In: Naumann, U.; Schenk, O. (eds.): Combinatorial Scientific Computing. Chapman & Hall/CRC, 2012