Numerical Differentiation using local Chebyshev-Approximation
Stefan H. Reiterer

February 5, 2021
Abstract
In applied mathematics, especially in optimization, functions are often only provided as so-called "black boxes" by software packages, or as very complex algorithms, which makes automatic differentiation very complicated or even impossible. Hence one seeks a numerical approximation of the derivative. Unfortunately, numerical differentiation is a difficult task in itself, and it is well known to be numerically unstable. There are many works on this topic, including the use of (global) Chebyshev approximations. Chebyshev approximations have the great property that they converge very fast if the function is smooth. Nevertheless those approaches have several drawbacks, since in practice functions are not smooth, and a global approximation needs many function evaluations.

Nevertheless there is hope, since functions in real-world applications are usually smooth except at finitely many points, corners, or edges. This motivates a local Chebyshev approach, where the function is only approximated locally, so that the Chebyshev approximation still yields a fast approximation of the desired function. We will study such an approach in this work and provide numerical examples.

Disclaimer: This work was done as private research during my study years and is not related to my current affiliation at all. No funding was received for this work.
Chebyshev Approximation

First we recall some well-known definitions. The following definitions and properties are taken from [1].
Definition 2.1 (Chebyshev-Polynomials). For $n \in \mathbb{N}$ the $n$-th Chebyshev polynomial $T_n : [-1, 1] \to \mathbb{R}$ is given by
\[ T_n(x) = \cos(n \arccos(x)). \]
From this definition we see that the Chebyshev polynomial takes its extrema at
\[ x_k := -\cos\left(\frac{k\pi}{n}\right) \quad \text{for } k = 0, \dots, n. \]
The points $(x_k)_{k=0}^{n}$ are the so-called Gauss-Lobatto grid points.

The best way to approximate functions via Chebyshev polynomials lies in the following

Proposition 2.2 (Chebyshev-Series & Co). Set $\omega(x) := \pi^{-1} (1 - x^2)^{-1/2}$; then
\[ \langle f, g \rangle_\omega := \int_{-1}^{1} f(x) g(x) \omega(x) \, dx \]
forms the (weighted) scalar product of the Hilbert space
\[ L^2_\omega := \left\{ f : [-1, 1] \to \mathbb{R} \;\middle|\; \|f\|_{L^2_\omega} = \sqrt{\langle f, f \rangle_\omega} < \infty \right\}. \]
Then for a function $f \in L^2_\omega$ the Chebyshev coefficients $a_n := \langle f, T_n \rangle_\omega$ exist, and the Chebyshev series
\[ a_0 + \sum_{n=1}^{\infty} a_n T_n(x) \]
converges in the $L^2_\omega$ sense to $f$.

Proof. These well-known facts follow from applying classical Fourier theory to $\tilde{f} = f(\cos(t)) \in L^2(0, \pi)$.

Since Chebyshev polynomials are "cosines in disguise", other important properties from Fourier theory carry over to Chebyshev series, like spectral convergence for smooth functions.

Another way to approximate a function is to use the Chebyshev interpolation polynomial $C_n = \sum_{k=0}^{n} b_k T_k$, which is uniquely defined by $C_n(x_k) = f(x_k)$ for $k = 0, \dots, n$.
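As a quick illustration (my own sketch, not part of the paper): the Gauss-Lobatto grid and the interpolation coefficients $b_k$ in a few lines of NumPy, where the coefficients are obtained by solving the linear system $\sum_j T_j(x_k) b_j = f(x_k)$ instead of using a fast cosine transform:

    import numpy as np
    from numpy.polynomial import chebyshev as C

    def lobatto_grid(n):
        # Gauss-Lobatto points x_k = -cos(k*pi/n), k = 0, ..., n (Definition 2.1)
        return -np.cos(np.pi * np.arange(n + 1) / n)

    def cheb_interp_coeffs(f, n):
        # Coefficients b_0, ..., b_n of C_n = sum_k b_k T_k with C_n(x_k) = f(x_k);
        # chebvander builds the matrix V[k, j] = T_j(x_k).
        x = lobatto_grid(n)
        V = C.chebvander(x, n)
        return np.linalg.solve(V, f(x))

    # Smooth test function: the interpolant matches exp to machine precision.
    b = cheb_interp_coeffs(np.exp, 12)
    xs = np.linspace(-1.0, 1.0, 7)
    print(np.max(np.abs(C.chebval(xs, b) - np.exp(xs))))

Solving the small linear system keeps the sketch short; in practice one would compute the $b_k$ with a discrete cosine transform.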
Although the Chebyshev series yields the best approximation, there is a little-known relation between the two approximation types, namely

Proposition 2.3. Let $f : [-1, 1] \to \mathbb{R}$ be a function with
\[ f(x) = a_0 + \sum_{n=1}^{\infty} a_n T_n(x), \quad \text{and} \quad |a_0| + \sum_{n=1}^{\infty} |a_n| < \infty. \]
Further, for $f_N(x) := a_0 + \sum_{n=1}^{N} a_n T_n(x)$, let
\[ E_T(N) := \sup_{x \in [-1, 1]} |f(x) - f_N(x)| \]
be the truncation error. Then
\[ E_T(N) \le \sum_{n=N+1}^{\infty} |a_n|, \]
and for the interpolation error
\[ \sup_{x \in [-1, 1]} |f(x) - C_N(x)| \le 2 E_T(N). \]
Hence the penalty for using interpolation instead of truncation is at most a factor of two! Additionally the coefficients $b_k$ of the interpolation polynomial $C_N$ are related to the exact coefficients $a_k$ by the identity
\[ b_k = a_k + \sum_{j=1}^{\infty} \left( a_{k+2jN} + a_{-k+2jN} \right). \tag{1} \]
That means the approximated coefficients differ from the exact coefficients by the aliasing error, and hence this error vanishes like $O(a_{2N})$.

Proof. See [1, Thm. 6 & Thm. 21].
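A small numerical sanity check of the factor-of-two bound (my own sketch, assuming the test function $f(x) = e^x$, whose exact Chebyshev coefficients are known in closed form: $a_0 = I_0(1)$ and $a_n = 2 I_n(1)$ with the modified Bessel functions $I_n$):

    import numpy as np
    from numpy.polynomial import chebyshev as C
    from scipy.special import iv  # modified Bessel function I_n

    N = 8
    # Exact Chebyshev coefficients of exp on [-1, 1]: a_0 = I_0(1), a_n = 2*I_n(1).
    a = np.array([iv(0, 1.0)] + [2.0 * iv(n, 1.0) for n in range(1, 60)])

    xs = np.linspace(-1.0, 1.0, 2001)
    trunc_err = np.max(np.abs(np.exp(xs) - C.chebval(xs, a[:N + 1])))

    # Interpolation on the Gauss-Lobatto grid (cf. the snippet above).
    xk = -np.cos(np.pi * np.arange(N + 1) / N)
    b = np.linalg.solve(C.chebvander(xk, N), np.exp(xk))
    interp_err = np.max(np.abs(np.exp(xs) - C.chebval(xs, b)))

    print(trunc_err, interp_err, interp_err <= 2 * trunc_err)  # expect: True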
Since with Chebyshev interpolation we are close to the realm of the DFT, additional methods from signal processing, like denoising, could be considered to handle numerical distortions.

To work with a function $g : [a, b] \to \mathbb{R}$, a generalized Chebyshev interpolation for $g$ can be achieved by using a linear transformation $\varphi : [-1, 1] \to [a, b]$ and $f = g \circ \varphi$.

Since computing the derivative of a polynomial can be done exactly, we could use the derivative of $C_N$ for some $N$ as the numerical derivative of $f$. This means $f' \approx C_N'$.

Then how about the errors? First recall the identity $T_n'(x) = n U_{n-1}(x)$, where $U_n$ denotes the $n$-th Chebyshev polynomial of the second kind. Hence for the truncation error we have
\[ f'(x) - f_N'(x) = E_N'(x) = \sum_{n=N+1}^{\infty} n a_n U_{n-1}(x). \]
Considering that $U_{n-1}$ takes its extrema at $\pm 1$ with $U_{n-1}(\pm 1) = (\pm 1)^{n-1} n$, we immediately see that
\[ \sup_{x \in [-1, 1]} |f'(x) - f_N'(x)| \le \sum_{n=N+1}^{\infty} n |a_n| \sup_{x \in [-1, 1]} |U_{n-1}(x)| \le \sum_{n=N+1}^{\infty} n^2 |a_n| = O(N^2 a_N). \]
For the interpolation polynomial $C_N$ we get, by using the aliasing identity (1) and rearranging terms as in the proof of [1, Thm. 21], the error estimate
\[ |f'(x) - C_N'(x)| \le 2 \sum_{n=N+1}^{\infty} n^2 |a_n| = O(2 N^2 a_N). \]
Hence we have the
Proposition 2.4.
For $f : [-1, 1] \to \mathbb{R}$ smooth we have the error estimate
\[ \sup_{x \in [-1, 1]} |f'(x) - C_N'(x)| \le 2 E_T'(N) = O(2 N^2 a_N), \]
where $E_T'(N) := \sum_{n=N+1}^{\infty} n^2 |a_n|$ bounds the truncation error of the differentiated series. Hence the penalty for using interpolation instead of truncation when differentiating is at most a factor of two!
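As a code-level illustration (my own sketch, not from the paper): NumPy's chebder differentiates a Chebyshev coefficient vector exactly, and the affine map to a general interval $[a, b]$ contributes the chain-rule factor $2/(b - a)$:

    import numpy as np
    from numpy.polynomial import chebyshev as C

    def cheb_derivative(g, a, b, N):
        # Interpolate g on [a, b] via the Gauss-Lobatto grid mapped from [-1, 1],
        # then differentiate the interpolant exactly with chebder.
        t = -np.cos(np.pi * np.arange(N + 1) / N)      # Lobatto grid on [-1, 1]
        x = 0.5 * (a + b) + 0.5 * (b - a) * t          # mapped to [a, b]
        coeffs = np.linalg.solve(C.chebvander(t, N), g(x))
        dcoeffs = C.chebder(coeffs)                    # exact derivative of C_N
        # Chain rule for the affine transformation [-1, 1] -> [a, b].
        return lambda y: C.chebval((2 * y - (a + b)) / (b - a), dcoeffs) * 2 / (b - a)

    d = cheb_derivative(np.sin, 0.0, 2.0, 12)
    print(abs(d(1.0) - np.cos(1.0)))   # ~1e-13 for this smooth example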
Practical Considerations in the 1D Case and Generalizations

In this section we consider a function $f : [a, b] \to \mathbb{R}$ which is continuous and piecewise smooth, but not differentiable on a set of finitely many points $(y_k)_{k \in I} \subset [a, b]$ (with $I$ finite and allowed to be empty). If we now interpolate the function $f$ on the whole interval $[a, b]$, we know from Fourier theory that the error will converge only slowly to zero. We also note that it is easy to see (from variable transformations) that the error estimate depends on the length of the interval $[a, b]$ through a factor of order $(b - a)^N$. Hence if one restricts the function $f$ to a small interval, one achieves more precise interpolation results with lower degrees of the interpolation polynomial. If we restrict the function $f$ to some interval $[c, d] \subset [a, b]$ with

1. $d - c$ sufficiently small, and
2. $y_i \not\in (c, d)$ for all $i \in I$,

then the sequence of Chebyshev interpolation polynomials $(C_N)_{N \in \mathbb{N}}$ converges very fast (with spectral convergence) and uniformly to the original function on the interval $[c, d]$; a small numerical comparison is sketched below. From Proposition 2.4 we know that this then also applies to $C_N'$, since the error decays with $O(N^2 a_N)$. Nevertheless we also see that for practical use it is important to ensure smoothness on the observed interval $[c, d]$, because of the factor $N^2$.
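To illustrate the effect (my own sketch, assuming a test function with a single kink at $y_0 = 0.3$), compare global interpolation on $[-1, 1]$ with interpolation on the kink-free subinterval $[0.4, 0.6]$:

    import numpy as np
    from numpy.polynomial import chebyshev as C

    f = lambda x: np.abs(np.sin(x - 0.3))   # smooth except for a kink at 0.3

    def interp_error(f, a, b, N):
        # Sup-error of the degree-N Chebyshev interpolant of f on [a, b].
        t = -np.cos(np.pi * np.arange(N + 1) / N)
        x = 0.5 * (a + b) + 0.5 * (b - a) * t
        coeffs = np.linalg.solve(C.chebvander(t, N), f(x))
        tt = np.linspace(-1.0, 1.0, 2001)
        xx = 0.5 * (a + b) + 0.5 * (b - a) * tt
        return np.max(np.abs(f(xx) - C.chebval(tt, coeffs)))

    for N in (4, 8, 16, 32):
        # global: slow algebraic decay; local, kink-free: spectral decay
        print(N, interp_error(f, -1.0, 1.0, N), interp_error(f, 0.4, 0.6, N))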
This leads to the following definition.

Definition 3.1. Let $f : [a, b] \to \mathbb{R}$ be continuous and piecewise smooth. Then we call the Chebyshev interpolation polynomial $C_{N,[c,d]} : [c, d] \to \mathbb{R}$ of $f|_{[c,d]}$ the local smooth Chebyshev polynomial iff $(y_k)_{k \in I} \cap (c, d) = \emptyset$.

This motivates the following algorithm to compute derivatives of a function $f$ (if the points $Y := (y_k)_{k \in I}$ are known):

    c = x - h; d = x + h
    if intersection(Y, [c, d]) == {}:
        # x is no kink: shrink [c, d] until it lies inside [a, b]
        # and contains no non-differentiable point
        while not (a < c and d < b) or intersection(Y, [c, d]) != {}:
            h = make_h_smaller(h)
            c = x - h; d = x + h
        CN = compute_interpolation(f, c, d, N)
        CN_prime = derive_polynomial(CN)
        return CN_prime(x)
    else:
        # x itself is a kink: treat the intervals left and right of x
        # separately, which yields the one-sided derivatives
        c[0] = x - h; d[0] = x
        c[1] = x;     d[1] = x + h
        while not all([a < c[i] and d[i] < b and
                       intersection(Y, (c[i], d[i])) == {} for i in [0, 1]]):
            h = make_h_smaller(h)
            c[0] = x - h; d[0] = x
            c[1] = x;     d[1] = x + h
        for i in [0, 1]:
            CN[i] = compute_interpolation(f, c[i], d[i], N)
            CN_prime[i] = derive_polynomial(CN[i])
        return [CN_prime[0](x), CN_prime[1](x)]

As one can see, we also included more general derivatives based on directional derivatives, like subgradients. It would also be possible to define a weak derivative, motivated by Chebyshev convergence theory, by $f'(x) := \frac{1}{2} \left( f'(x^-) + f'(x^+) \right)$.

A question which still remains is what to do when the set $Y$ is unknown. This is still an open problem, but since one can observe the speed of the decay of the coefficients $a_n$ (or $b_n$), it is possible to locate non-differentiable points. There are several works on this topic from spectral theory.

Now to the $M$-dimensional case: Consider a functional $f : \Omega \subseteq \mathbb{R}^M \to \mathbb{R}$ which is continuous and piecewise smooth on the domain $\Omega$. We can compute the directional derivative in a direction $h \in \mathbb{R}^M$ at a point $x \in \Omega$ by differentiating the one-dimensional function $g(t) := f(x + th)$. Hence the methods from the one-dimensional case carry over to the $M$-dimensional setting.

Numerical Examples

As a first example we define the function $f_1$ given by (see also Figure 1)
\[ f_1(x) = \begin{cases} x & \text{if } x > 0, \\ -x & \text{else,} \end{cases} \]
and compare the local Chebyshev derivative with the central difference quotient
\[ f'(x) \approx \frac{f(x + h) - f(x - h)}{2h} =: f_h(x), \]
by computing the values at $x = 0$ and $x = 0.5$.
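A runnable sketch of this first experiment (my own reconstruction, assuming the $|x|$-type test function above; $N = 2$, i.e. three Lobatto points, is the variant whose derivative coincides with the central difference quotient):

    import numpy as np
    from numpy.polynomial import chebyshev as C

    f1 = lambda x: np.abs(x)

    def local_cheb_derivative(f, x, h, N):
        # derivative at x of the degree-N Chebyshev interpolant of f on [x-h, x+h]
        t = -np.cos(np.pi * np.arange(N + 1) / N)    # Lobatto grid on [-1, 1]
        coeffs = np.linalg.solve(C.chebvander(t, N), f(x + h * t))
        return C.chebval(0.0, C.chebder(coeffs)) / h # chain rule: dt/dy = 1/h

    x = 0.5                                          # f1'(0.5) = 1
    for h in (1e-1, 1e-3, 1e-5):
        fd = (f1(x + h) - f1(x - h)) / (2 * h)       # central difference f_h
        # for N = 2 both values (and hence the errors) coincide
        print(h, abs(fd - 1.0), abs(local_cheb_derivative(f1, x, h, 2) - 1.0))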
See the table below for results.

[Table: maximal errors of $f_h$ and of the local Chebyshev derivatives at $x = 0.5$ for decreasing $h$; the numerical values did not survive the extraction.]

As we can observe, the errors of $f_h$ and of the low-order $C_{N,[x \pm h]}$ are equal. This comes without surprise, since the derivative of the three-point Chebyshev interpolation polynomial, evaluated at the midpoint, is exactly the central difference quotient. Next we will modify $f_1$ in the following way:
For some $\varepsilon > 0$ and a standard normally distributed random variable $X$ we define $f_2$ by
\[ f_2(x) = f_1(x) + \varepsilon X, \]
and then try to differentiate it at $0.5$.

Figure 1: Function $f_1$. [plot not recoverable from the extraction]
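A sketch of the noisy experiment (again my own illustration; the noise level $\varepsilon = 10^{-8}$ and the tested values of $h$ and $N$ are assumptions, chosen only to reproduce the qualitative behavior discussed next):

    import numpy as np
    from numpy.polynomial import chebyshev as C

    rng = np.random.default_rng(0)
    eps = 1e-8                                   # assumed noise level
    f2 = lambda x: np.abs(x) + eps * rng.standard_normal(np.shape(x))

    def local_cheb_derivative(f, x, h, N):
        # derivative at x of the degree-N Chebyshev interpolant on [x-h, x+h]
        t = -np.cos(np.pi * np.arange(N + 1) / N)
        coeffs = np.linalg.solve(C.chebvander(t, N), f(x + h * t))
        return C.chebval(0.0, C.chebder(coeffs)) / h

    x = 0.5                                      # exact derivative there is 1
    for h in (1e-1, 1e-3, 1e-5, 1e-7):
        fd = (f2(x + h) - f2(x - h)) / (2 * h)   # noisy central difference
        errs = [abs(local_cheb_derivative(f2, x, h, N) - 1.0) for N in (2, 4, 8)]
        print(h, abs(fd - 1.0), errs)            # small h amplifies noise ~ eps/h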
The results can be seen in the table below. The errors are maximal values after taking some samples (there is some space for improvement here, e.g. better statistics, but we get the picture).

[Table: maximal errors of $f_h$ and of three local Chebyshev derivatives $C_{N,[x \pm h]}$ of different orders for $f_2$ at $x = 0.5$ for decreasing $h$; the numerical values did not survive the extraction.]

We observe the following behavior: While $f_h$ and the low-order $C_{N,[x \pm h]}$ behave as expected, we see that making $h$ smaller does not make the approximation quality of the higher-order variants better, but even worse. This can be explained by the fact that for smooth $f$ the function locally behaves like its lower-order Taylor polynomial, while the higher-order terms are negligible. This also means that computing a higher-order Chebyshev polynomial does not make sense, and we even get into more trouble. Hence a suitable $h$ for the Chebyshev approximation should be not too small, and yet not too big.

Finally we try the local Chebyshev method on the 2D Rosenbrock function
\[ R_{a,b}(x) = (a - x_1)^2 + b \left( x_2 - x_1^2 \right)^2, \]
with parameters $a = 1$ and $b = 100$, and try to find the minimum with help of gradient methods. The argmin is $x^* = (1, 1)$.
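For concreteness, a sketch of such a setup (my own illustration; the starting point, Armijo parameters, tolerance, and the choices of $h$ and $N$ are assumptions, not the configuration used for the table below):

    import numpy as np
    from numpy.polynomial import chebyshev as C

    rosenbrock = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

    def cheb_gradient(f, x, h=1e-2, N=4):
        # Directional derivatives along the coordinate axes via local Chebyshev
        # interpolation of g(t) = f(x + t*e_i) on [-h, h].
        t = -np.cos(np.pi * np.arange(N + 1) / N)
        grad = np.zeros_like(x)
        for i in range(len(x)):
            e = np.zeros_like(x); e[i] = 1.0
            g = np.array([f(x + h * ti * e) for ti in t])
            coeffs = np.linalg.solve(C.chebvander(t, N), g)
            grad[i] = C.chebval(0.0, C.chebder(coeffs)) / h
        return grad

    x = np.array([-1.0, 1.0])                  # assumed starting point
    for it in range(20000):
        g = cheb_gradient(rosenbrock, x)
        if np.linalg.norm(g) < 1e-6:
            break
        s = 1.0                                # Armijo backtracking line search
        while rosenbrock(x - s * g) > rosenbrock(x) - 1e-4 * s * (g @ g):
            s *= 0.5
        x -= s * g
    print(it, x)                               # converges towards (1, 1)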
Additionally we add some disturbances $\delta(x)$ and $\varepsilon(x)$ taking values in a small interval around zero, where $\delta$ is a normally distributed random variable and $\varepsilon$ is a jump function which changes its sign (which is fatal for finite differences). As optimization method we used steepest descent with the Armijo rule and the stopping criterion that $\|\nabla f\|$ falls below a small tolerance. For the local Chebyshev method two step widths $h$ were tested; as noted above, $h$ should be neither too small nor too big.

    Function                 Method              Iterations            Result
    R_{a,b}                  Exact Gradient      1427                  (0.…, 0.…)
    R_{a,b} + delta          Finite Differences  19999 (interrupted)   (0.…, 0.…)
    R_{a,b} + epsilon        Finite Differences  19999 (interrupted)   (0.…, 0.…)

[Further rows of the table and the exact numerical results did not survive the extraction.]

References

[1] Boyd, John P.: Chebyshev and Fourier Spectral Methods. 2nd edition, Dover Publications, 2001.