Numerical Differentiation using local Chebyshev-Approximation
Stefan H. Reiterer

February 5, 2021
Abstract
In applied mathematics, especially in optimization, functions are often only provided as so-called "black boxes" by software packages, or as very complex algorithms, which makes automatic differentiation very complicated or even impossible. Hence one seeks a numerical approximation of the derivative. Unfortunately, numerical differentiation is a difficult task in itself, and it is well known to be numerically unstable. There are many works on this topic, including the use of (global) Chebyshev approximations. Chebyshev approximations have the great property that they converge very fast if the function is smooth. Nevertheless those approaches have several drawbacks, since in practice functions are not smooth, and a global approximation needs many function evaluations.

Nevertheless there is hope, since functions in real-world applications are usually smooth except at finitely many points, corners, or edges. This motivates a local Chebyshev approach, where the function is only approximated locally, so that the Chebyshev approximation still yields a fast approximation of the desired function. We will study such an approach in this work and provide numerical examples.

Disclaimer: This work was done as private research during my study years and is not related to my current affiliation at all. No funding was received for this work.
Chebyshev Approximation

First we recall some well-known definitions. The following definitions and properties are taken from [1].
Definition 2.1 (Chebyshev-Polynomials). For $n \in \mathbb{N}$ the $n$-th Chebyshev polynomial $T_n : [-1, 1] \to \mathbb{R}$ is given by
\[ T_n(x) = \cos(n \arccos(x)). \]
From this definition we see that the Chebyshev polynomial takes its extrema at
\[ x_k := -\cos\left(\frac{k\pi}{n}\right) \quad \text{for } k = 0, \dots, n. \]
The points $(x_k)_{k=0}^{n}$ are the so-called Gauss-Lobatto grid points.

The best way to approximate functions via Chebyshev polynomials lies in the following

Proposition 2.2 (Chebyshev-Series & Co). Set $\omega(x) := \pi^{-1} (1 - x^2)^{-1/2}$; then
\[ \langle f, g \rangle_\omega := \int_{-1}^{1} f(x) g(x) \omega(x) \, dx \]
forms the (weighted) scalar product of the Hilbert space
\[ L^2_\omega := \left\{ f : [-1, 1] \to \mathbb{R} \;\middle|\; \|f\|_{L^2_\omega} = \sqrt{\langle f, f \rangle_\omega} < \infty \right\}. \]
Then for a function $f \in L^2_\omega$ the Chebyshev coefficients $a_n := \langle f, T_n \rangle_\omega$ exist, and the Chebyshev series
\[ a_0 + \sum_{n=1}^{\infty} a_n T_n(x) \]
converges in the $L^2_\omega$ sense to $f$.

Proof. These well-known facts follow from applying classical Fourier theory to $\tilde{f} = f(\cos(t)) \in L^2(0, \pi)$.

Since Chebyshev polynomials are "cosines in disguise", other important properties from Fourier theory carry over to Chebyshev series, like spectral convergence for smooth functions.

Another way to approximate a function is to use the Chebyshev interpolation polynomial $C_n = \sum_{k=0}^{n} b_k T_k$, which is uniquely defined by $C_n(x_k) = f(x_k)$ for $k = 0, \dots, n$.
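As a quick illustration (my own sketch, not part of the paper): the Gauss-Lobatto grid and the interpolation coefficients $b_k$ in a few lines of NumPy, where the coefficients are obtained by solving the linear system $\sum_j T_j(x_k) b_j = f(x_k)$ instead of using a fast cosine transform:

    import numpy as np
    from numpy.polynomial import chebyshev as C

    def lobatto_grid(n):
        # Gauss-Lobatto points x_k = -cos(k*pi/n), k = 0, ..., n (Definition 2.1)
        return -np.cos(np.pi * np.arange(n + 1) / n)

    def cheb_interp_coeffs(f, n):
        # Coefficients b_0, ..., b_n of C_n = sum_k b_k T_k with C_n(x_k) = f(x_k);
        # chebvander builds the matrix V[k, j] = T_j(x_k).
        x = lobatto_grid(n)
        V = C.chebvander(x, n)
        return np.linalg.solve(V, f(x))

    # Smooth test function: the interpolant matches exp to machine precision.
    b = cheb_interp_coeffs(np.exp, 12)
    xs = np.linspace(-1.0, 1.0, 7)
    print(np.max(np.abs(C.chebval(xs, b) - np.exp(xs))))

Solving the small linear system keeps the sketch short; in practice one would compute the $b_k$ with a discrete cosine transform.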
Although the Chebyshev series yields the best approximation, there is a little-known relation between the two approximation types, namely

Proposition 2.3. Let $f : [-1, 1] \to \mathbb{R}$ be a function with
\[ f(x) = a_0 + \sum_{n=1}^{\infty} a_n T_n(x), \quad \text{and} \quad |a_0| + \sum_{n=1}^{\infty} |a_n| < \infty. \]
Further, for $f_N(x) := a_0 + \sum_{n=1}^{N} a_n T_n(x)$, let
\[ E_T(N) := \sup_{x \in [-1, 1]} |f(x) - f_N(x)| \]
be the truncation error. Then
\[ E_T(N) \le \sum_{n=N+1}^{\infty} |a_n|, \]
and for the interpolation error
\[ \sup_{x \in [-1, 1]} |f(x) - C_N(x)| \le 2 E_T(N). \]
Hence the penalty for using interpolation instead of truncation is at most a factor of two! Additionally the coefficients $b_k$ of the interpolation polynomial $C_N$ are related to the exact coefficients $a_k$ by the identity
\[ b_k = a_k + \sum_{j=1}^{\infty} \left( a_{k+2jN} + a_{-k+2jN} \right). \tag{1} \]
That means the approximated coefficients differ from the exact coefficients by the aliasing error, and hence this error vanishes like $O(a_{2N})$.

Proof. See [1, Thm. 6 & Thm. 21].
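A small numerical sanity check of the factor-of-two bound (my own sketch, assuming the test function $f(x) = e^x$, whose exact Chebyshev coefficients are known in closed form: $a_0 = I_0(1)$ and $a_n = 2 I_n(1)$ with the modified Bessel functions $I_n$):

    import numpy as np
    from numpy.polynomial import chebyshev as C
    from scipy.special import iv  # modified Bessel function I_n

    N = 8
    # Exact Chebyshev coefficients of exp on [-1, 1]: a_0 = I_0(1), a_n = 2*I_n(1).
    a = np.array([iv(0, 1.0)] + [2.0 * iv(n, 1.0) for n in range(1, 60)])

    xs = np.linspace(-1.0, 1.0, 2001)
    trunc_err = np.max(np.abs(np.exp(xs) - C.chebval(xs, a[:N + 1])))

    # Interpolation on the Gauss-Lobatto grid (cf. the snippet above).
    xk = -np.cos(np.pi * np.arange(N + 1) / N)
    b = np.linalg.solve(C.chebvander(xk, N), np.exp(xk))
    interp_err = np.max(np.abs(np.exp(xs) - C.chebval(xs, b)))

    print(trunc_err, interp_err, interp_err <= 2 * trunc_err)  # expect: True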
Since with Chebyshev interpolation we are close to the realm of the DFT, additional methods from signal processing, like denoising, could be considered to handle numerical distortions.

To work with a function $g : [a, b] \to \mathbb{R}$, a generalized Chebyshev interpolation for $g$ can be achieved by using a linear transformation $\varphi : [-1, 1] \to [a, b]$ and $f = g \circ \varphi$.

Since computing the derivative of a polynomial can be done exactly, we could use the derivative of $C_N$ for some $N$ as the numerical derivative of $f$. This means $f' \approx C_N'$.

Then how about the errors? First recall the identity $T_n'(x) = n U_{n-1}(x)$, where $U_n$ denotes the $n$-th Chebyshev polynomial of the second kind. Hence for the truncation error we have
\[ f'(x) - f_N'(x) = E_N'(x) = \sum_{n=N+1}^{\infty} n a_n U_{n-1}(x). \]
Considering that $U_{n-1}$ takes its extrema at $\pm 1$ with $U_{n-1}(\pm 1) = (\pm 1)^{n-1} n$, we immediately see that
\[ \sup_{x \in [-1, 1]} |f'(x) - f_N'(x)| \le \sum_{n=N+1}^{\infty} n |a_n| \sup_{x \in [-1, 1]} |U_{n-1}(x)| \le \sum_{n=N+1}^{\infty} n^2 |a_n| = O(N^2 a_N). \]
For the interpolation polynomial $C_N$ we get, by using the aliasing identity (1) and rearranging terms as in the proof of [1, Thm. 21], the error estimate
\[ |f'(x) - C_N'(x)| \le 2 \sum_{n=N+1}^{\infty} n^2 |a_n| = O(2 N^2 a_N). \]
Hence we have the
Proposition 2.4.
For $f : [-1, 1] \to \mathbb{R}$ smooth we have the error estimate
\[ \sup_{x \in [-1, 1]} |f'(x) - C_N'(x)| \le 2 E_T'(N) = O(2 N^2 a_N), \]
where $E_T'(N) := \sum_{n=N+1}^{\infty} n^2 |a_n|$ bounds the truncation error of the differentiated series. Hence the penalty for using interpolation instead of truncation when differentiating is at most a factor of two!
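As a code-level illustration (my own sketch, not from the paper): NumPy's chebder differentiates a Chebyshev coefficient vector exactly, and the affine map to a general interval $[a, b]$ contributes the chain-rule factor $2/(b - a)$:

    import numpy as np
    from numpy.polynomial import chebyshev as C

    def cheb_derivative(g, a, b, N):
        # Interpolate g on [a, b] via the Gauss-Lobatto grid mapped from [-1, 1],
        # then differentiate the interpolant exactly with chebder.
        t = -np.cos(np.pi * np.arange(N + 1) / N)      # Lobatto grid on [-1, 1]
        x = 0.5 * (a + b) + 0.5 * (b - a) * t          # mapped to [a, b]
        coeffs = np.linalg.solve(C.chebvander(t, N), g(x))
        dcoeffs = C.chebder(coeffs)                    # exact derivative of C_N
        # Chain rule for the affine transformation [-1, 1] -> [a, b].
        return lambda y: C.chebval((2 * y - (a + b)) / (b - a), dcoeffs) * 2 / (b - a)

    d = cheb_derivative(np.sin, 0.0, 2.0, 12)
    print(abs(d(1.0) - np.cos(1.0)))   # ~1e-13 for this smooth example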
Practical Considerations in the 1D Case and Generalizations

In this section we consider a function $f : [a, b] \to \mathbb{R}$ which is continuous and piecewise smooth, but not differentiable on a set of finitely many points $(y_k)_{k \in I} \subset [a, b]$ (with $I$ finite and allowed to be empty). If we now interpolate the function $f$ on the whole interval $[a, b]$, we know from Fourier theory that the error will converge only slowly to zero. We also note that it is easy to see (from variable transformations) that the error estimate depends on the length of the interval $[a, b]$ through a factor of order $(b - a)^N$. Hence if one restricts the function $f$ to a small interval, one achieves more precise interpolation results with lower degrees of the interpolation polynomial. If we restrict the function $f$ to some interval $[c, d] \subset [a, b]$ with

1. $d - c$ sufficiently small, and
2. $y_i \not\in (c, d)$ for all $i \in I$,

then the sequence of Chebyshev interpolation polynomials $(C_N)_{N \in \mathbb{N}}$ converges very fast (with spectral convergence) and uniformly to the original function on the interval $[c, d]$; a small numerical comparison is sketched below. From Proposition 2.4 we know that this then also applies to $C_N'$, since the error decays with $O(N^2 a_N)$. Nevertheless we also see that for practical use it is important to ensure smoothness on the observed interval $[c, d]$, because of the factor $N^2$.
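To illustrate the effect (my own sketch, assuming a test function with a single kink at $y_0 = 0.3$), compare global interpolation on $[-1, 1]$ with interpolation on the kink-free subinterval $[0.4, 0.6]$:

    import numpy as np
    from numpy.polynomial import chebyshev as C

    f = lambda x: np.abs(np.sin(x - 0.3))   # smooth except for a kink at 0.3

    def interp_error(f, a, b, N):
        # Sup-error of the degree-N Chebyshev interpolant of f on [a, b].
        t = -np.cos(np.pi * np.arange(N + 1) / N)
        x = 0.5 * (a + b) + 0.5 * (b - a) * t
        coeffs = np.linalg.solve(C.chebvander(t, N), f(x))
        tt = np.linspace(-1.0, 1.0, 2001)
        xx = 0.5 * (a + b) + 0.5 * (b - a) * tt
        return np.max(np.abs(f(xx) - C.chebval(tt, coeffs)))

    for N in (4, 8, 16, 32):
        # global: slow algebraic decay; local, kink-free: spectral decay
        print(N, interp_error(f, -1.0, 1.0, N), interp_error(f, 0.4, 0.6, N))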
This leads to the following definition.

Definition 3.1. Let $f : [a, b] \to \mathbb{R}$ be continuous and piecewise smooth. Then we call the Chebyshev interpolation polynomial $C_{N,[c,d]} : [c, d] \to \mathbb{R}$ of $f|_{[c,d]}$ the local smooth Chebyshev polynomial iff $(y_k)_{k \in I} \cap (c, d) = \emptyset$.

This motivates the following algorithm to compute derivatives of a function $f$ (if the points $Y := (y_k)_{k \in I}$ are known):

    c = x - h; d = x + h
    if intersection(Y, [c, d]) == {}:
        # x is no kink: shrink [c, d] until it lies inside [a, b]
        # and contains no non-differentiable point
        while not (a < c and d < b) or intersection(Y, [c, d]) != {}:
            h = make_h_smaller(h)
            c = x - h; d = x + h
        CN = compute_interpolation(f, c, d, N)
        CN_prime = derive_polynomial(CN)
        return CN_prime(x)
    else:
        # x itself is a kink: treat the intervals left and right of x
        # separately, which yields the one-sided derivatives
        c[0] = x - h; d[0] = x
        c[1] = x;     d[1] = x + h
        while not all([a < c[i] and d[i] < b and
                       intersection(Y, (c[i], d[i])) == {} for i in [0, 1]]):
            h = make_h_smaller(h)
            c[0] = x - h; d[0] = x
            c[1] = x;     d[1] = x + h
        for i in [0, 1]:
            CN[i] = compute_interpolation(f, c[i], d[i], N)
            CN_prime[i] = derive_polynomial(CN[i])
        return [CN_prime[0](x), CN_prime[1](x)]

As one can see, we also included more general derivatives based on directional derivatives, like subgradients. It would also be possible to define a weak derivative, motivated by Chebyshev convergence theory, by $f'(x) := \frac{1}{2} \left( f'(x^-) + f'(x^+) \right)$.

A question which still remains is what to do when the set $Y$ is unknown. This is still an open problem, but since one can observe the speed of the decay of the coefficients $a_n$ (or $b_n$), it is possible to locate non-differentiable points. There are several works on this topic from spectral theory.

Now to the $M$-dimensional case: Consider a functional $f : \Omega \subseteq \mathbb{R}^M \to \mathbb{R}$ which is continuous and piecewise smooth on the domain $\Omega$. We can compute the directional derivative in a direction $h \in \mathbb{R}^M$ at a point $x \in \Omega$ by differentiating the one-dimensional function $g(t) := f(x + th)$. Hence the methods from the one-dimensional case carry over to the $M$-dimensional setting.

Numerical Examples

As a first example we define the function $f_1$ given by (see also Figure 1)
\[ f_1(x) = \begin{cases} x & \text{if } x > 0, \\ -x & \text{else,} \end{cases} \]
and compare the local Chebyshev derivative with the central difference quotient
\[ f'(x) \approx \frac{f(x + h) - f(x - h)}{2h} =: f_h(x), \]
by computing the values at $x = 0$ and $x = 0.5$.
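A runnable sketch of this first experiment (my own reconstruction, assuming the $|x|$-type test function above; $N = 2$, i.e. three Lobatto points, is the variant whose derivative coincides with the central difference quotient):

    import numpy as np
    from numpy.polynomial import chebyshev as C

    f1 = lambda x: np.abs(x)

    def local_cheb_derivative(f, x, h, N):
        # derivative at x of the degree-N Chebyshev interpolant of f on [x-h, x+h]
        t = -np.cos(np.pi * np.arange(N + 1) / N)    # Lobatto grid on [-1, 1]
        coeffs = np.linalg.solve(C.chebvander(t, N), f(x + h * t))
        return C.chebval(0.0, C.chebder(coeffs)) / h # chain rule: dt/dy = 1/h

    x = 0.5                                          # f1'(0.5) = 1
    for h in (1e-1, 1e-3, 1e-5):
        fd = (f1(x + h) - f1(x - h)) / (2 * h)       # central difference f_h
        # for N = 2 both values (and hence the errors) coincide
        print(h, abs(fd - 1.0), abs(local_cheb_derivative(f1, x, h, 2) - 1.0))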
See the table below for results.

[Table: maximal errors of $f_h$ and of the local Chebyshev derivatives at $x = 0.5$ for decreasing $h$; the numerical values did not survive the extraction.]

As we can observe, the errors of $f_h$ and of the low-order $C_{N,[x \pm h]}$ are equal. This comes without surprise, since the derivative of the three-point Chebyshev interpolation polynomial, evaluated at the midpoint, is exactly the central difference quotient. Next we will modify $f_1$ in the following way:
For some $\varepsilon > 0$ and a standard normally distributed random variable $X$ we define $f_2$ by
\[ f_2(x) = f_1(x) + \varepsilon X, \]
and then try to differentiate it at $0.5$.

Figure 1: Function $f_1$. [plot not recoverable from the extraction]
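A sketch of the noisy experiment (again my own illustration; the noise level $\varepsilon = 10^{-8}$ and the tested values of $h$ and $N$ are assumptions, chosen only to reproduce the qualitative behavior discussed next):

    import numpy as np
    from numpy.polynomial import chebyshev as C

    rng = np.random.default_rng(0)
    eps = 1e-8                                   # assumed noise level
    f2 = lambda x: np.abs(x) + eps * rng.standard_normal(np.shape(x))

    def local_cheb_derivative(f, x, h, N):
        # derivative at x of the degree-N Chebyshev interpolant on [x-h, x+h]
        t = -np.cos(np.pi * np.arange(N + 1) / N)
        coeffs = np.linalg.solve(C.chebvander(t, N), f(x + h * t))
        return C.chebval(0.0, C.chebder(coeffs)) / h

    x = 0.5                                      # exact derivative there is 1
    for h in (1e-1, 1e-3, 1e-5, 1e-7):
        fd = (f2(x + h) - f2(x - h)) / (2 * h)   # noisy central difference
        errs = [abs(local_cheb_derivative(f2, x, h, N) - 1.0) for N in (2, 4, 8)]
        print(h, abs(fd - 1.0), errs)            # small h amplifies noise ~ eps/h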
The results can be seen in the table below. The errors are maximal values after taking some samples (there is some space for improvement here, e.g. better statistics, but we get the picture).

[Table: maximal errors of $f_h$ and of three local Chebyshev derivatives $C_{N,[x \pm h]}$ of different orders for $f_2$ at $x = 0.5$ for decreasing $h$; the numerical values did not survive the extraction.]

We observe the following behavior: While $f_h$ and the low-order $C_{N,[x \pm h]}$ behave as expected, we see that making $h$ smaller does not make the approximation quality of the higher-order variants better, but even worse. This can be explained by the fact that for smooth $f$ the function locally behaves like its lower-order Taylor polynomial, while the higher-order terms are negligible. This also means that computing a higher-order Chebyshev polynomial does not make sense, and we even get into more trouble. Hence a suitable $h$ for the Chebyshev approximation should be not too small, and yet not too big.

Finally we try the local Chebyshev method on the 2D Rosenbrock function
\[ R_{a,b}(x) = (a - x_1)^2 + b \left( x_2 - x_1^2 \right)^2, \]
with parameters $a = 1$ and $b = 100$, and try to find the minimum with help of gradient methods. The argmin is $x^* = (1, 1)$.
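For concreteness, a sketch of such a setup (my own illustration; the starting point, Armijo parameters, tolerance, and the choices of $h$ and $N$ are assumptions, not the configuration used for the table below):

    import numpy as np
    from numpy.polynomial import chebyshev as C

    rosenbrock = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

    def cheb_gradient(f, x, h=1e-2, N=4):
        # Directional derivatives along the coordinate axes via local Chebyshev
        # interpolation of g(t) = f(x + t*e_i) on [-h, h].
        t = -np.cos(np.pi * np.arange(N + 1) / N)
        grad = np.zeros_like(x)
        for i in range(len(x)):
            e = np.zeros_like(x); e[i] = 1.0
            g = np.array([f(x + h * ti * e) for ti in t])
            coeffs = np.linalg.solve(C.chebvander(t, N), g)
            grad[i] = C.chebval(0.0, C.chebder(coeffs)) / h
        return grad

    x = np.array([-1.0, 1.0])                  # assumed starting point
    for it in range(20000):
        g = cheb_gradient(rosenbrock, x)
        if np.linalg.norm(g) < 1e-6:
            break
        s = 1.0                                # Armijo backtracking line search
        while rosenbrock(x - s * g) > rosenbrock(x) - 1e-4 * s * (g @ g):
            s *= 0.5
        x -= s * g
    print(it, x)                               # converges towards (1, 1)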
Additionally we add some disturbances $\delta(x)$ and $\varepsilon(x)$ taking values in a small interval around zero, where $\delta$ is a normally distributed random variable and $\varepsilon$ is a jump function which changes its sign (which is fatal for finite differences). As optimization method we used steepest descent with the Armijo rule and the stopping criterion that $\|\nabla f\|$ falls below a small tolerance. For the local Chebyshev method two step widths $h$ were tested; as noted above, $h$ should be neither too small nor too big.

    Function                 Method              Iterations            Result
    R_{a,b}                  Exact Gradient      1427                  (0.…, 0.…)
    R_{a,b} + delta          Finite Differences  19999 (interrupted)   (0.…, 0.…)
    R_{a,b} + epsilon        Finite Differences  19999 (interrupted)   (0.…, 0.…)

[Further rows of the table and the exact numerical results did not survive the extraction.]

References

[1] Boyd, John P.: Chebyshev and Fourier Spectral Methods. 2nd edition, Dover Publications, 2001.