Ergodic Sensitivity Analysis of One-Dimensional Chaotic Maps
Adam A. Śliwiak∗, Nisha Chandramoorthy, and Qiqi Wang

July 30, 2020
Abstract
Sensitivity analysis in chaotic dynamical systems is a challenging task from a computational point of view. In this work, we present a numerical investigation of a novel approach, known as the space-split sensitivity or S3 algorithm. The S3 algorithm is an ergodic-averaging method to differentiate statistics in ergodic, chaotic systems, rigorously based on the theory of hyperbolic dynamics. We illustrate S3 on one-dimensional chaotic maps, revealing its computational advantage over naïve finite difference computations of the same statistical response. In addition, we provide an intuitive explanation of the key components of the S3 algorithm, including the density gradient function.
Keywords: sensitivity analysis, chaotic systems, ergodicity, space-split sensitivity (S3) method
Sensitivity analysis is a discipline that studies the response of the outputs of a model to changes in input parameters. It involves computing the derivatives of output quantities of interest with respect to specified parameters. This mathematical tool is essential in many engineering and scientific applications, as it enables optimal design of structures [11, 28] and fluid-thermal systems [2], analysis of heterogeneous flows [22], supply chain management [26], and estimation of errors and uncertainties in measurement, modeling, and numerical computations [3, 24]. For example, consider an optimal design problem in structural mechanics involving an elastic truss under loading. In such problems, the ultimate goal might be to approximate derivatives of constrained functions (e.g., resulting displacements) with respect to design parameters (e.g., bar cross-sections). Such derivatives can be approximated using analytical methods, as well as finite differences [13]. A classic example from fluid mechanics is a turbulent flow past a rigid object, in which the sensitivity of the drag (resistance) forces with respect to the Reynolds number and other flow parameters [4, 19] is of interest. In particular, aerospace engineers use the computed sensitivity in the design of airfoils [10]. Both mechanical phenomena described above are governed by strongly nonlinear dynamical systems; however, the latter features an extra difficulty, namely chaotic behavior.

Computing such sensitivities in chaotic dynamical systems is a challenging task. The primary issue is the so-called butterfly effect, which is a large sensitivity of the system to initial conditions. This concept is associated with the classical study of Edward Lorenz on climate prediction [16]. Quantitatively, it means that any two points initially separated by an infinitesimal distance diverge at an exponential rate. This implies that the prediction of far-future states in chaotic phenomena is hardly possible.
We observe this phenomenon in daily weather forecasts, as predictions several weeks ahead tend to be highly inaccurate. However, we are sometimes interested in predicting the response of long-time averaged behavior to perturbations [4, 19, 7].

∗ Corresponding author, E-mail address: [email protected]

In the last few decades, there have been different attempts to compute sensitivities of long-term averages in chaotic systems. The conventional methods [12, 6], which require solving either tangent or adjoint equations, fail if the time-averaging window is large. Due to the butterfly effect, almost every infinitesimal perturbation to the system expands exponentially, and therefore the sensitivity computed using the tangent or adjoint solutions grows equally fast. A more successful family of methods utilizes the concept of shadowing trajectories. Methods like least squares shadowing (LSS) [25, 23], or its computationally cheaper variant known as non-intrusive least squares shadowing (NILSS) [20], provide accurate derivatives of long-time averages in many small- and large-scale problems, e.g., weakly turbulent flows [19]. However, shadowing methods have been proven to have a systematic error, which can be non-zero if the connecting map between the base and shadowing trajectory is not differentiable [18]. Some approaches adopt the Fluctuation-Dissipation Theorem, which is widely used in the statistical equilibrium analysis of turbulence, Brownian motion, and other areas [14]. Unfortunately, they are inexact as well when they do not assume specific properties of the physical systems, e.g., a Gaussian distribution of the equilibrium state [1]. Moreover, they require solving costly Fokker-Planck equations, which makes them infeasible for large systems [5]. Another group of methods for sensitivity analysis are trajectory-based and utilize Ruelle's linear response formula [21].
Many of these techniques, generically referred to as ensemble methods, solve tangent/adjoint equations and compute an ensemble average over a trajectory to estimate the sensitivity [9, 15]. The major drawback of ensemble-based methods is that they exhibit slow convergence, since they suffer from exponentially increasing variance of the tangent/adjoint solutions [9, 7].

Space-split sensitivity (S3) is an alternative trajectory-based method that uses Ruelle's formula [8]. However, unlike the ensemble methods, it does not manifest the problem of unbounded variances. Moreover, the S3 method does not assume that the probability distribution in state space is of a particular type (e.g., Gaussian), and also does not rely on directly estimating the probability distribution by, e.g., discretizing phase space. In this paper, we closely review the basic concepts of the S3 method in the context of one-dimensional chaotic maps.

In Section 2 of this paper, we review two representative one-dimensional maps that exhibit chaotic behavior, namely the sawtooth and cusp maps. The space-split sensitivity method is derived in Section 3. Section 4 focuses on the interpretation and computational aspects of the density gradient, which is a key quantity appearing in the S3 method. Section 5 demonstrates numerical examples showing sensitivities generated using the S3 method. Finally, Section 6 concludes this paper.

In this section, we introduce two families of perturbed one-dimensional chaotic systems that can generally be expressed as

x_{k+1} = \varphi(x_k; s), \quad x_0 = x_{\mathrm{init}},    (1)

where x_init is a given initial condition, while s denotes a scalar parameter. Let J be a scalar observable. Our quantity of interest is the infinite-time average, or ergodic average, of J,

\langle J \rangle := \lim_{N \to \infty} \frac{1}{N} \sum_{i=0}^{N-1} J(x_i; s).    (2)

In particular, we focus on the relationship between ⟨J⟩ and the parameter s for φ.
In addition, we review the concepts of Lyapunov exponents and ergodicity through numerical illustrations on the two maps.

We consider as our first example perturbations of the sawtooth map, also known as the dyadic transformation, defined in the following way:

x_{k+1} = \varphi(x_k; s) = 2 x_k + s \sin(2\pi x_k) \mod 1, \quad x_k \in [0, 1).    (3)

It is a periodic map that maps [0, 1) to itself. Figure 1 illustrates the sawtooth map for different values of the parameter s.

Figure 1: The sawtooth map at different values of parameter s. Note if s = 0, we obtain the classical Bernoulli shift.

A natural question that arises is whether the chosen map is actually chaotic. Roughly speaking, a chaotic map shows high sensitivity to initial conditions. For example, consider s = 0 and two phase points x and x + δx. Under one iteration of the map, these two points are separated by a distance of 2δx. Thus, in the limit δx → 0, a trajectory that is infinitesimally separated from x at n = 0 moves away from the trajectory of x at an exponential rate of log 2 ≈ 0.693. This exponential growth of perturbations to the state is the signature of chaotic systems and is measured by the asymptotic rate of growth, known as the Lyapunov exponent (LE) and denoted by λ. More rigorously, the Lyapunov exponent is defined by

\lambda(s) = \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \log \left| \frac{\partial \varphi}{\partial x}(x_k; s) \right|,    (4)

which clearly indicates that the infinite-time averaged rate of growth converges to a constant. We say that a map is chaotic when its LE is positive. Formula 4 requires computing the derivative of the map at points along a trajectory. Note that the value of the Lyapunov exponent depends neither on the initial condition x_init nor on the step k. Figure 2 shows that λ > 0 for s ∈ (−1/(2π), 1/π], meaning that the sawtooth map is chaotic in this regime. This can be justified by the observation that ∂φ/∂x ≥ 1 for all x ∈ [0, 1) when |s| ≤ 1/(2π). At s = 0, the sawtooth map always appears to converge to a fixed point after some iterations when simulated numerically. This is because all machine-representable numbers, which have a fixed number of digits after the binary point, are dyadic-rational numbers, and these converge to the fixed point 0 under this map, because the sawtooth map at s = 0 is simply a left-shift operation on binary digits.
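The Lyapunov-exponent computation of Eq. 4 is straightforward to reproduce numerically. The Python sketch below iterates the sawtooth map (Eq. 3) and ergodic-averages log |∂φ/∂x| along a single trajectory; the function names and the trajectory length are our own choices, not taken from the paper.

```python
import math

def sawtooth(x, s):
    """Perturbed sawtooth map, Eq. (3): 2x + s*sin(2*pi*x) mod 1."""
    return (2.0 * x + s * math.sin(2.0 * math.pi * x)) % 1.0

def sawtooth_dx(x, s):
    """Derivative of the map with respect to the state x."""
    return 2.0 + 2.0 * math.pi * s * math.cos(2.0 * math.pi * x)

def lyapunov_exponent(s, n=100000, x0=0.3141):
    """Ergodic average of log|dphi/dx| along one trajectory, Eq. (4)."""
    x, acc = x0, 0.0
    for _ in range(n):
        acc += math.log(abs(sawtooth_dx(x, s)))
        x = sawtooth(x, s)
    return acc / n
```

At s = 0 the derivative equals 2 everywhere, so the average returns log 2 ≈ 0.693 regardless of the numerical collapse of the trajectory discussed above; for small nonzero s the estimate remains strictly positive, consistent with Figure 2.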
More details about this problem and possible remedies can be found in Appendix A.1. Note also that if s = 0 and x_init is rational, the forward orbit of x_init either converges to a fixed point or is periodic, containing a finite number of distinct values within the interval [0, 1). For example, if x_init = 0.1, then all future states belong to the four-element set {0.2, 0.4, 0.6, 0.8}, and x_k = x_{k+4} for all k > 0. This is an example of an unstable periodic orbit; in this paper, we are interested in chaotic orbits, which are aperiodic and unstable to perturbations.

Figure 2: Relation between the Lyapunov exponent λ and parameter s ∈ (−1/(2π), 1/π] for the sawtooth map.

Another example of a chaotic map is the cusp map φ : [0, 1] → [0, h], defined as follows:

x_{k+1} = \varphi(x_k; h, \gamma) = h - \left| \frac{1}{2} - x_k \right| - \left( h - \frac{1}{2} \right) \left| 1 - 2 x_k \right|^{\gamma}.    (5)

The above function produces a spade-shaped graph, as shown in Figure 3. The cusp map is a two-parameter map with s = {h, γ}, where h is the height, while γ is a parameter that determines the sharpness of the tip. We use the definition in Eq. 4 to compute the LE of the cusp map at different values of h and γ. From the positivity of the Lyapunov exponent shown in Figure 4, we see that the cusp map is always chaotic if γ ∈ [0, 1] and h ≥ 0.6.

Figure 3: The cusp map at different values of parameters h and γ. Note all the curves include the points (0, 0), (1, 0), and (0.5, h). If h = γ = 1, the map is piecewise linear, and this particular case is usually referred to as the tent map.

Figure 4: Relation between the Lyapunov exponent λ and parameter γ ∈ [0, 1] for the cusp map.

Historically, the cusp map has been used as a one-dimensional representation of the three-dimensional Lorenz '63 system [16], a set of ordinary differential equations used as a model for atmospheric convection. Specifically, the iterates of the cusp map are local maxima of the third coordinate of the Lorenz '63 system [17].

2.3 Ergodic probability distributions
The long-time average of the objective function, ⟨J⟩, was calculated using 100 million iterates of the map, with the initial condition chosen uniformly at random in (0, 1), in the following way:

\langle J \rangle \approx \frac{1}{N} \sum_{i=0}^{N-1} J(x_i),    (6)

where x_{i+1} = φ(x_i). We choose a sufficiently large N to ensure that the right-hand side converges to a fixed value within numerical precision. Figures 5 and 6 illustrate examples of the mean statistics (i.e., long-time averages) and their dependence on the map parameters for the sawtooth and cusp maps, respectively.

Figure 5: Long-time averaged behavior with respect to the map parameter for the sawtooth map. The objective function itself does not depend on s, and is defined as J(x) = cos(2πx). In our computations, J is averaged over 100 million samples.

Figure 6: Long-time averaged behavior with respect to the map parameters for the cusp map. The objective function itself does not depend on h or γ, and is the same as in the previous example, i.e., J(x) = cos(2πx).

In the computation of the long-time averages of the objective function, we used the concept of ergodicity. This property guarantees that long-time averages do not depend on the initial conditions. That is, the time average of the objective function (right-hand side of Eq. 6) converges, as N → ∞, to a value independent of the initial condition x_0, for almost every x_0 chosen uniformly in (0, 1). This limiting value is the expectation with respect to an ergodic, invariant probability distribution ρ. This probability distribution ρ is invariant under φ, in the sense that for any open interval A ⊂ (0, 1), ρ(A) = ρ(φ^{−1}(A)). In addition, ρ is defined by the fact that expectations with respect to ρ are the same as infinite-time averages starting from a point uniformly distributed in the unit interval. Such a probability distribution ρ is known as the SRB distribution [27], and only sometimes coincides with the uniform distribution (it does, e.g., for the sawtooth map at s = 0).
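The time average of Eq. 6 translates directly into a short loop. The following Python sketch (our own naming; a much shorter trajectory than the 100 million iterates used for the figures) estimates ⟨J⟩ for the sawtooth map; by ergodicity, runs started from different random initial conditions should agree to within the Monte Carlo error.

```python
import math
import random

def sawtooth(x, s):
    """Perturbed sawtooth map, Eq. (3)."""
    return (2.0 * x + s * math.sin(2.0 * math.pi * x)) % 1.0

def ergodic_average(J, s, n=500000, seed=0):
    """Time average (1/N) * sum_i J(x_i) along one trajectory, Eq. (6)."""
    x = random.Random(seed).uniform(0.0, 1.0)  # random initial condition in (0, 1)
    acc = 0.0
    for _ in range(n):
        acc += J(x)
        x = sawtooth(x, s)
    return acc / n
```

For J(x) = cos(2πx) and, e.g., s = 0.05, two different seeds typically agree to two or three decimal places, reflecting the O(1/√N) Monte Carlo convergence of such averages.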
The above description can be mathematically rephrased as follows: for almost every x_0 uniformly distributed in (0, 1),

\langle J \rangle = \lim_{N \to \infty} \frac{1}{N} \sum_{i=0}^{N-1} J(x_i) = \int_U J(x) \rho(x) \, dx.    (7)

Thus, in ergodic systems, there exist two alternative ways of computing the long-time average: either through averaging of the time series or through ensemble averaging. The latter requires prior computation of the probability distribution, which will be explained and illustrated in the next sections. Using these preliminary concepts, we will review the space-split sensitivity method to compute the derivative of ⟨J⟩ with respect to the map parameter.

In [21], Ruelle rigorously derived a formula for the derivative of the quantity of interest, ⟨J⟩, with respect to the map parameter s. This expression is an ensemble average (or expectation) with respect to ρ, which can be simplified for one-dimensional maps φ : U → U to

\frac{d}{ds} \int_U J(x) \rho(x) \, dx = \sum_{k=0}^{\infty} \int_U f(x) \, \frac{d(J \circ \varphi^k)(x)}{dx} \, \rho(x) \, dx,    (8)

where

f(x) := \frac{\partial \varphi\left(\varphi^{-1}(x); s\right)}{\partial s}    (9)

reflects the parameter perturbation of the map, while U refers to the unit interval [0, 1], and

\frac{d(J \circ \varphi^k)(x)}{dx} = \frac{dJ}{dx}\left(\varphi^k(x)\right) \prod_{j=0}^{k-1} \frac{\partial \varphi}{\partial x}\left(\varphi^j(x)\right).    (10)

As discussed earlier, for large k, the product of the derivatives grows exponentially with k. However, Ruelle's series converges due to cancellations of these large quantities upon taking an ensemble average. This problem makes the direct evaluation of Ruelle's formulation computationally impractical, since a large number of trajectories is needed for these cancellations. More precisely, since for large k,

\frac{d(J \circ \varphi^k)}{dx}(x) \sim O(e^{\lambda k})    (11)

at almost every x, we need to increase the number of trajectories by a factor of O(e^{λk}) in order to reduce the mean-squared error in a linear fashion.
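The exponential growth in Eq. 11 is easy to observe directly. The sketch below (our own naming) accumulates the chain-rule product from Eq. 10 for the sawtooth map; the product grows like e^{λk} and eventually overflows double-precision arithmetic, which is exactly the practical obstacle described next.

```python
import math

def sawtooth(x, s):
    """Perturbed sawtooth map, Eq. (3)."""
    return (2.0 * x + s * math.sin(2.0 * math.pi * x)) % 1.0

def sawtooth_dx(x, s):
    """Derivative of the map with respect to x."""
    return 2.0 + 2.0 * math.pi * s * math.cos(2.0 * math.pi * x)

def chain_rule_product(x, s, k):
    """prod_{j=0}^{k-1} (dphi/dx)(phi^j(x)), the product appearing in Eq. (10)."""
    prod = 1.0
    for _ in range(k):
        prod *= sawtooth_dx(x, s)
        x = sawtooth(x, s)
    return prod
```

At s = 0 the product is exactly 2^k; for k on the order of a thousand it exceeds the largest representable double and evaluates to infinity.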
For example, consider the sawtooth map with s ∈ [−1/(2π), 1/(2π)]. In this case, (∂φ/∂x) ∈ [1, 3], and thus for a sufficiently large k, an overflow error is encountered. Another challenge is that the evaluation of the SRB distribution requires expensive computation of map probability densities [5]. In a recent study [8], Ruelle's formula has been reformulated into a different ensemble average, known as the S3 formula. There, the latter formula has been derived for maps of arbitrary dimension, and is based on splitting the total sensitivity into that due to stable and unstable perturbations. Note that the notion of splitting the perturbation space is irrelevant for 1D maps, as the one-dimensional perturbation is, by definition of chaos, unstable. Therefore, we will skip some aspects of the original derivation, and note that our derivation represents only the unstable component of sensitivity in [8], specialized to 1D. The S3 formula, corresponding to Equations 8–9, can be expressed as follows:

\frac{d}{ds} \int_U J(x) \rho(x) \, dx = -\sum_{k=0}^{\infty} \int_U \nabla_\rho f(x) \, J\left(\varphi^k(x)\right) \rho(x) \, dx,    (12)

where

\nabla_\rho f(x) := \frac{1}{\rho(x)} \frac{d\left(\rho(x) f(x)\right)}{dx} = \frac{df}{dx}(x) + f(x) g(x),    (13)

and

g(x) := \frac{1}{\rho(x)} \frac{d\rho}{dx}(x).    (14)

For one-dimensional maps, the derivation is simple, as it requires integrating Eq. 8 by parts and using the fact that the boundary terms vanish at the boundary of U; see Appendix A.2 for the full derivation. We observe that both J and df/dx have analytical forms. However, the function g(x), which will be referred to as the density gradient, does not have a closed-form expression, since the SRB distribution ρ is unknown. The density gradient g represents the variation in phase space of the logarithm of ρ(x),

g(x) := \frac{1}{\rho(x)} \frac{d\rho}{dx}(x) = \frac{d \log \rho(x)}{dx}.    (15)

In the next section, we focus on further interpreting g(x), its computation, and its verification on the 1D maps introduced in Section 2.

In this section, we focus on the density gradient function, denoted by g(x).
First, we present a computable, iterative scheme for g(x). Moreover, we provide an intuitive explanation for g and visualize it on the maps introduced in Section 2.

Based on the S3 formula (Eq. 12), we can conclude that the following recursive relation holds:

g\left(\varphi(x)\right) = \frac{g(x)}{d\varphi(x)/dx} - \frac{d^2\varphi(x)/dx^2}{\left(d\varphi(x)/dx\right)^2}.    (16)

The full derivation of Eq. 16 is included in Appendix A.3. This recursive procedure can be used to approximate g(x) along a trajectory in the asymptotic sense, which means that we need a sufficiently large number of iterations to obtain an accurate approximation of g(x) [8]. In practice, we generate a sufficiently long trajectory, compute the first and second derivatives of the map evaluated along the trajectory, and apply Eq. 16. We arbitrarily set g(x_init) = 0 to start the recursive procedure, and obtain g(φ(x_init)). The recursion is continued by setting x = φ(x_init), and so on. For a sufficiently large K, the true value of g(φ^K(x_init)) is approached, for almost every initial condition x_init.

To intuitively understand the density gradient formula (Eq. 16), we isolate the effects of each term in Eq. 16. In order to do this, we consider a small interval around an iterate x_k and examine two cases: 1. the map is a straight line on this interval, and 2. the map has a constant curvature on this interval. These two cases are graphically shown on the left (numbered as 1) and right-hand sides (numbered as 2) of Figure 7. The x-axis represents an interval around an iterate x_k and the y-axis, an interval around x_{k+1} = φ(x_k). The density ρ around each interval is shown adjacent to the axes as a colormap. The colors reflect the distribution of ρ on a logarithmic scale.

Figure 7: A graphical representation of two different scenarios in one-dimensional maps, to intuitively understand the derivation of the quantity g.
The bold lines illustrate the map, while the shaded bars adjacent to each axis represent the corresponding density distribution on that axis. The region around H corresponds to high values of the density, while the region around L to low values. The slope of the line is indicated as t.

1. Consider a small region (x − ε, x + ε) where the map φ(x) has zero second derivative, i.e., the first derivative dφ/dx is constant. As shown in Figure 7(1), let us assume that the density on the left side of the region, ρ(x − ε), is higher than the density on the right side, ρ(x + ε). Due to probability mass conservation, the mapped density can be calculated using the following equation:

\rho\left(\varphi(x)\right) = \frac{\rho(x)}{\left| d\varphi/dx \right|}.    (17)

Since we consider the case dφ/dx >
0, we drop the absolute value. On this interval where the map is a straight line, this statement says that the density around φ(x) is a constant multiple of the density around x. On the logarithmic scale, the density around φ(x) is shifted by a constant when compared to the density around x, since

\log \rho\left(\varphi(x)\right) = \log \rho(x) - \log \frac{d\varphi}{dx}.    (18)

This relationship is graphically depicted in Figure 7(1), where the regions marked H and L, corresponding to higher and lower densities, are shifted to the left. Notice that Eq. 18 implies that the difference, on the logarithmic scale, between the higher and lower densities on the y-axis equals the difference between the higher and lower densities on the x-axis. However, the small interval is stretched by a factor of dφ/dx under one iteration of the map. Thus, the derivative of the logarithm of density decreases by a factor of dφ/dx. Mathematically, we can see this by differentiating both sides of the equation with respect to x (and using that dφ/dx is constant):

\left( \frac{1}{\rho} \frac{d\rho}{dx} \right)\bigg|_{\varphi(x)} \frac{d\varphi}{dx} = \left( \frac{1}{\rho} \frac{d\rho}{dx} \right)\bigg|_{x}.    (19)

From the definition of g, this reduces to

g\left(\varphi(x)\right) = \frac{g(x)}{d\varphi/dx},    (20)

which is confirmed by our formula, Eq. 16, by setting d²φ/dx² = 0.

2. To isolate the effect of the curvature of the map on g, we consider a curved map and a constant-density region. Thus, by definition, g(x) = 0 in the interval considered. We now describe how g ∘ φ(x) becomes non-zero on this interval due to the curvature of the map. Note that Eq. 18 still applies, since it is a restatement of probability mass conservation. This means that ρ is reduced by a factor equal to the slope of the map at every point.
This is graphically depicted in Figure 7(2), in which we have assumed that dφ/dx is an increasing function that crosses the value 1 at the point indicated using dashed lines. To the left of this point, the density around φ(x) is therefore increased (shown as H) and, to the right, the density is decreased (shown as L), when compared to its uniform value around x. Note also that the larger the first derivative of the map, the lower the density on the y-axis. Again, by taking the derivative of Eq. 18 with respect to x, and using the definition of g, we obtain

g\left(\varphi(x)\right) = -\frac{d^2\varphi/dx^2}{\left(d\varphi/dx\right)^2}.    (21)

As mentioned in Case 1, the first derivative (dφ/dx)(x) gives the factor by which a length dx around x is stretched (or compressed) by φ. The second derivative gives us the change of this stretching (or compression) as a function of x. Thus, the effect of a non-zero second derivative is felt by the derivative of the density, and can again be derived from measure preservation or probability mass conservation.

In the second part of this section, we show numerical results of the density gradient procedure in two examples, the sawtooth and cusp maps, which were introduced in Eq. 3 and Eq. 5, respectively. Figure 8 shows the stationary probability densities of the sawtooth map at different values of s. We observe that all curves appear differentiable; however, their derivatives are large near the interval boundaries when s is close to −1/(2π) or 1/π.

Figure 8: The plot shows the empirically estimated stationary probability distributions achieved by the sawtooth map (Eq. 3). Every curve was generated using 125,829,120,000 samples and counting the number of solutions in each of 2048 bins of equal length in the interval [0, 1).

In Figure 9, we plot the density gradient function g(x), computed using Eq. 16, at different values of s, and compare it against its finite difference approximation: (log(ρ(x + ε)) − log(ρ(x − ε)))/(2ε).
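The recursion of Eq. 16 can be sketched in a few lines of Python. The example below applies it to the sawtooth map (Eq. 3); the names and the arbitrary initialization g0 are ours. At s = 0 the map has zero curvature and constant slope 2, so the iteration contracts any initial guess toward the exact value g = 0 (the density is uniform there).

```python
import math

def phi(x, s):
    """Sawtooth map, Eq. (3)."""
    return (2.0 * x + s * math.sin(2.0 * math.pi * x)) % 1.0

def dphi(x, s):
    """First derivative of the map."""
    return 2.0 + 2.0 * math.pi * s * math.cos(2.0 * math.pi * x)

def d2phi(x, s):
    """Second derivative of the map."""
    return -4.0 * math.pi ** 2 * s * math.sin(2.0 * math.pi * x)

def density_gradient_along(x0, s, n, g0=0.0):
    """Iterate Eq. (16): g(phi(x)) = g(x)/phi'(x) - phi''(x)/phi'(x)**2.

    Returns the trajectory x_0..x_n and the running estimates of g(x_k);
    the arbitrary initialization g0 is forgotten at an exponential rate."""
    xs, gs = [x0], [g0]
    for _ in range(n):
        x, g = xs[-1], gs[-1]
        gs.append(g / dphi(x, s) - d2phi(x, s) / dphi(x, s) ** 2)
        xs.append(phi(x, s))
    return xs, gs
```

Since the map is expanding, each step divides the influence of g0 by dφ/dx > 1, so a burn-in of a few dozen iterations suffices in practice.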
Note that the expected value of the density gradient is always zero, since

\int_U g(x) \rho(x) \, dx = \int_U \frac{\partial \rho}{\partial x} \, dx = \left[ \rho(x) \right]_0^1 = 0.    (22)

Figure 9: Density gradient function g(x) (solid lines), generated using Eq. 16 and compared against the empirically computed value of g(x) (dots), where the derivative of ρ(x) is estimated using finite differences.

We also repeat a similar experiment for the cusp map, whose results are presented in Figures 10–11. We observe behavior similar to the sawtooth map. Similar to the sawtooth density, the densities computed for the cusp map appear to be differentiable over a range of the parameter γ. However, as γ gets close to 1, the density ρ(x) acquires large derivatives at the boundaries of the interval. The boundedness of dρ/dx is needed for the computation of g(x) to be well-conditioned.

Figure 10: The plot shows the empirically estimated stationary probability distributions achieved by the cusp map (Eq. 5), at h = 1 and the indicated values of γ. All curves were generated in the same fashion as for the sawtooth case.

Figure 11: The plot compares g(x) (solid lines) against the derivative of the empirically estimated stationary probability distributions (dots) achieved by the cusp map (Eq. 5), at h = 1 and the indicated values of γ. All curves were generated in the same fashion as for the sawtooth case.

5.1 Space-split sensitivity as a sum of time correlations
The evaluation of Eq. (12) is the main focus of this paper. In practice, expectations with respect to ρ, or ensemble averages, are computed by time-averaging on a single typical trajectory. As mentioned earlier, a time average converges to the ensemble average of the function as the length of the trajectory approaches infinity. Thus, Eq. 12 can be written as follows, replacing the ensemble averages with ergodic averages:

\frac{d}{ds} \int_U J(x) \rho(x) \, dx = -\sum_{k=0}^{\infty} \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \nabla_\rho f(x_n) \, J(x_{n+k}),    (23)

where x_i = φ^i(x_init) is the point at time i along a trajectory starting at a typical point x_init. Using the definition of g, and taking a long trajectory,

\frac{d}{ds} \int_U J(x) \rho(x) \, dx \approx -\frac{1}{N} \sum_{k=0}^{\infty} \sum_{n=0}^{N-1} \left( \frac{df}{dx}(x_n) + f(x_n) g(x_n) \right) J(x_{n+k}).    (24)

To numerically verify Eq. 24, we consider a set of objective functions, each of which is an indicator function denoted by δ_c, defined such that its value is a constant 1 in a small interval around c and zero everywhere else on the unit interval. With this particular choice, Eq. 24 gives us the gradient of the probability density, since

\frac{d}{ds} \int_U \delta_c(x) \rho(x; s) \, dx = \int_U \delta_c(x) \frac{\partial \rho(x; s)}{\partial s} \, dx \approx \frac{\partial \rho(c; s)}{\partial s}.    (25)

Thus, by varying the constant c in the interval [0, 1], we can approximate dρ/ds over the unit interval using Eq. 24. This can be compared with the finite difference approximation of dρ/ds, generated using slightly perturbed values of s and approximating the density empirically. This particular choice of J(x) exhibits yet another advantage of the S3 method over Ruelle's formula (Eq. 8). The former is also applicable to objective functions that have non-differentiable points, since, unlike a direct evaluation of Ruelle's formula, the derivative of J is not used.
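Putting the pieces together, the ergodic-average form of the S3 formula (Eq. 24) can be sketched as follows for the sawtooth map. Along a single trajectory we carry the density gradient g from the recursion of Eq. 16, together with f and df/dx obtained from Eq. 9 by evaluating ∂φ/∂s and (∂²φ/∂s∂x)/(∂φ/∂x) at the previous iterate. All names, the truncation of the k-sum, and the burn-in length are our own choices, not the paper's implementation.

```python
import math

def phi(x, s):        # sawtooth map, Eq. (3)
    return (2.0 * x + s * math.sin(2.0 * math.pi * x)) % 1.0

def dphi(x, s):       # d(phi)/dx
    return 2.0 + 2.0 * math.pi * s * math.cos(2.0 * math.pi * x)

def d2phi(x, s):      # d^2(phi)/dx^2
    return -4.0 * math.pi ** 2 * s * math.sin(2.0 * math.pi * x)

def dphi_ds(x, s):    # d(phi)/ds (independent of s for this map)
    return math.sin(2.0 * math.pi * x)

def d2phi_dsdx(x, s): # d^2(phi)/(ds dx)
    return 2.0 * math.pi * math.cos(2.0 * math.pi * x)

def s3_sensitivity(J, s, n=200000, n_terms=8, x0=0.3141, burn=1000):
    """Estimate d<J>/ds by the truncated ergodic average of Eq. (24)."""
    x, g = x0, 0.0
    for _ in range(burn):  # let the g-recursion forget its initialization
        g = g / dphi(x, s) - d2phi(x, s) / dphi(x, s) ** 2
        x = phi(x, s)
    xs, w = [], []  # trajectory points and weights (df/dx + f*g)(x_n)
    for _ in range(n + n_terms):
        f_next = dphi_ds(x, s)                     # f at phi(x), Eq. (9)
        dfdx_next = d2phi_dsdx(x, s) / dphi(x, s)  # df/dx at phi(x)
        g = g / dphi(x, s) - d2phi(x, s) / dphi(x, s) ** 2  # Eq. (16)
        x = phi(x, s)
        xs.append(x)
        w.append(dfdx_next + f_next * g)
    total = 0.0
    for k in range(n_terms):  # lag-k correlations, truncated k-sum
        total += sum(w[i] * J(xs[i + k]) for i in range(n))
    return -total / n
```

For J(x) = cos(2πx) at s = 0.1, the estimate can be checked against a central finite difference of the time-averaged ⟨J⟩ at slightly perturbed parameter values; a handful of k-terms suffices because the lag-k correlations decay exponentially.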
Figure 12 shows numerical results for the cusp map, in which the density gradient is computed using the space-split formula (Eq. 24) and compared with the central difference derivative. We observe that only a few terms of the series are required to produce accurate sensitivities.

Figure 12: Sensitivity of the density of the cusp map with respect to γ at h = 1, γ = 0.5. The solid lines represent the result of Equation (12) when a finite number of terms is used in the summation over k. The solid line marked with (⋆) represents Equation (12) evaluated with 17 terms, which is visibly indistinguishable from the same series summed over 6 or more terms. The dots represent the finite difference derivative of the density, evaluated based on the empirical densities at h = 1, γ = 0.505 and at h = 1, γ = 0.495.

Figure 13: Contribution of the k-th term to Equation (12) for the cusp map. Later terms are overwhelmed by statistical noise.

Figure 13 clearly indicates that the consecutive terms of the series in Eq. 23 decay exponentially in norm. We repeat a similar experiment for the sawtooth map (see Figures 14 and 15). In this case, we only need three terms of Eq. 24 to obtain a result that is indistinguishable from its finite difference approximation. The consecutive terms of Eq. 24 also decay exponentially in norm.

Figure 14: Sensitivity of the density of the sawtooth map with respect to s at s = 0.1. The solid lines represent the result of Eq. 12 when a finite number of terms is used in the summation over k. The solid line marked with (◦) represents Equation (12) evaluated with 3 terms, and is aligned with the finite difference derivative of the density, evaluated based on the densities at s = 0.105 and at s = 0.095.

Figure 15: Contribution of the k-th term to Eq. 12 for the sawtooth map. Later terms are overwhelmed by statistical noise.

Note that each term of Eq. 12 is in the form of a lag-k time correlation between ∇_ρ f and J. We use the term "lag-k" as ∇_ρ f and the objective function are evaluated at two different states that are k steps apart. In mixing systems, the lag-k time correlations converge to zero as k → ∞. Moreover, for a family of dynamical systems known as Axiom A, the rate of decay of time correlations is proven to be exponential [27]. In the case of one-dimensional maps, Axiom A systems are the ones in which the derivative of the map is different from 1 everywhere. All the map examples we consider in this paper satisfy this requirement. This guarantees that only a small number of time-correlation terms is needed to secure high accuracy of the sensitivity approximation.

5.2 Computational performance of S3

Finally, we compare the space-split sensitivity and classical finite difference methods in terms of computational efficiency. We observe in Figures 16–17, generated for the sawtooth and cusp maps, respectively, that the S3 method clearly outperforms its competitor, as it requires a few orders of magnitude fewer samples to generate a result with a similar relative error. This is a very promising observation in the context of analyzing higher-dimensional systems, since the large cost of generating very long trajectories can make such computations infeasible. Note that in the case of both the S3 and finite difference methods, the error is upper-bounded as follows [8]:

\text{error} \leq \frac{C}{\sqrt{N}},    (26)

where N denotes the number of samples, while C is some positive number. This means we observe the convergence rate of a typical Monte Carlo simulation in both methods. However, the factor C is substantially larger in the case of finite differencing.
Moreover, decreasing the step size (indicated as δs) in the finite difference calculation worsens the accuracy, due to the dominance of statistical noise.

Figure 16: Relative error of the space-split and finite difference methods as a function of the trajectory length. We compute the parametric derivative of the density of the sawtooth map at s = 0.1, in a small neighborhood of x = 0. For the S3 computation (curve marked with (□)), we consider only the first three terms of Eq. 12, which corresponds to the line marked with (◦) in Figure 14. For the finite difference approximation, we calculate densities at s = 0.1, 0.095 (curve marked with (◦)) and s = 0.1, 0.105 (curve marked with (△)). We also computed the S3 approximation using 125,829,120,000 samples and 9 terms of Eq. 12, which serves as a reference value. The dashed lines are proportional to the inverse of the square root of the number of samples.

Figure 17: Relative error of the space-split and finite difference methods as a function of the trajectory length. We compute the parametric derivative of the density of the cusp map at h = 1, γ = 0.5, in a small neighborhood U(x = 0.5). For the S3 computation (curve marked with (□)), we consider the first ten terms of Eq. 12. For the finite difference approximation, we calculate densities at γ = 0.5, 0.495 (curve marked with (○)) and γ = 0.5, 0.505 (curve marked with (△)). We also computed the S3 approximation using 125,829,120,000 samples and 17 terms of Eq. 12, which serves as a reference value. The dashed lines are proportional to the inverse of the square root of the number of samples.

We demonstrated a new method to compute the statistical linear response of chaotic systems to changes in input parameters. This method, known as space-split sensitivity or S3, is used to compute the derivatives with respect to parameters of the long-time average of an objective function. In the S3 method, a quantity called the density gradient, defined as the derivative of the log-density with respect to the state, is obtained using a computationally efficient ergodic-averaging scheme. An intuitive explanation of this iterative ergodic-averaging scheme, based on probability mass conservation, is discussed in this paper. The density gradient plays a key role in the computation of linear response. Specifically, the sum of time correlations between the density gradient and the objective function partially determines the derivative of the mean statistic of the objective function with respect to the parameter. The computational efficiency of the S3 formula when compared to finite differences, which require several orders of magnitude more samples, stems precisely from this new formula to efficiently estimate the density gradient.

In this work, we restrict ourselves to expanding maps in 1D, which are simple examples of chaotic systems. These examples nevertheless give rich insight into chaotic linear response, and specifically into the behavior of the density gradient. Our study shows that in some cases the derivative of the density gradient might be very large, which corresponds to heavy-tailedness of the density gradient distribution. This phenomenon, as well as its implications for the analysis of higher-dimensional maps, is the main topic of our future work.
Acknowledgments
This work was supported by Air Force Office of Scientific Research Grant No. FA8650-19-C-2207.

References

[1] R. V. Abramov and A. J. Majda. Blended response algorithms for linear fluctuation-dissipation for complex nonlinear dynamical systems. Nonlinearity, 20, 2793, (2007).
[2] D. Balagangadhar and R. Subrata. Design sensitivity analysis and optimization of steady fluid-thermal systems. Computer Methods in Applied Mechanics and Engineering, 190, 5465–5479, (2001).
[3] K. K. Benke, K. E. Lowell, and A. J. Hamilton. Parameter uncertainty, sensitivity analysis and prediction error in a water-balance hydrological model. Mathematical and Computer Modelling, 47(11-12), 1134–1149, (2008).
[4] P. Blonigan. Least Squares Shadowing for Sensitivity Analysis of Large Chaotic Systems and Fluid Flows. PhD thesis, Massachusetts Institute of Technology, (2016).
[5] P. Blonigan and Q. Wang. Probability density adjoint for sensitivity analysis of the mean of chaos. Journal of Computational Physics, 270, 660–686, (2014).
[6] Y. Cao, S. Li, L. Petzold, and R. Serban. Adjoint sensitivity analysis for differential-algebraic equations: The adjoint DAE system and its numerical solution. SIAM Journal on Scientific Computing, 24(3), 1076–1089, (2003).
[7] N. Chandramoorthy, P. Fernandez, C. Talnikar, and Q. Wang. Feasibility analysis of ensemble sensitivity computation in turbulent flows. AIAA Journal, 57(10), (2019).
[8] N. Chandramoorthy and Q. Wang. A computable realization of Ruelle's formula for linear response of statistics in chaotic systems. arXiv e-prints, arXiv:2002.04117, (2020).
[9] G. Eyink, T. Haine, and D. Lea. Ruelle's linear response formula, ensemble adjoint schemes and Lévy flights. Nonlinearity, 17, 1867, (2004).
[10] F. Geng, I. Kalkman, A. S. J. Suiker, and B. Blocken. Sensitivity analysis of airfoil aerodynamics during pitching motion at a Reynolds number of 1.35·10^5. Journal of Wind Engineering and Industrial Aerodynamics, 183, 315–332, (2018).
[11] J. Infante Barbosa, C. M. Mota Soares, and C. A. Mota Soares. Sensitivity analysis and shape optimal design of axisymmetric shell structures. Computing Systems in Engineering, 2(5-6), 525–533, (1991).
[12] A. Jameson. Aerodynamic design via control theory. Journal of Scientific Computing, 3(3), 233–260, (1988).
[13] U. Kirsch. Efficient sensitivity analysis for structural optimization. Computer Methods in Applied Mechanics and Engineering, 117, 143–156, (1994).
[14] R. Kubo. The fluctuation-dissipation theorem. Reports on Progress in Physics, 29, 255, (1966).
[15] D. J. Lea, M. R. Allen, and T. W. N. Haine. Sensitivity analysis of the climate of a chaotic system. Tellus Series A: Dynamic Meteorology and Oceanography, 52, 523–532, (2000).
[16] E. Lorenz. Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20(2), 130–141, (1963).
[17] M. Mehta and A. K. Mittal. The double-cusp map for the forced Lorenz system. International Journal of Bifurcation and Chaos, 13, 3029–3035, (2003).
[18] A. Ni. Approximating Ruelle's linear response formula by shadowing methods. arXiv e-prints, arXiv:2003.09801, (2020).
[19] A. Ni. Hyperbolicity, shadowing directions and sensitivity analysis of a turbulent three-dimensional flow. Journal of Fluid Mechanics, 863, 644–669, (2019).
[20] A. Ni and Q. Wang. Sensitivity analysis on chaotic dynamical systems by non-intrusive least squares shadowing (NILSS). Journal of Computational Physics, 347, 56–77, (2017).
[21] D. Ruelle. Differentiation of SRB states. Communications in Mathematical Physics, 187, 227–241, (1997).
[22] C. Shi-qing, Z. Shen-zhong, H. Yan-zhang, and Z. Wei-yao. Sensitivity coefficients of single-phase flow in low-permeability heterogeneous reservoirs. Applied Mathematics and Mechanics (English Edition), 23, 712–720, (2002).
[23] Q. Wang. Convergence of the least squares shadowing method for computing derivative of ergodic averages. SIAM Journal on Numerical Analysis, 52, 156–170, (2014).
[24] Q. Wang. Uncertainty quantification for unsteady fluid flow using adjoint-based approaches. PhD thesis, Stanford University, (2009).
[25] Q. Wang, R. Hu, and P. Blonigan. Least squares shadowing sensitivity analysis of chaotic limit cycle oscillations. Journal of Computational Physics, 267, 210–224, (2014).
[26] X. Wang, M. Yoo, R. Glardon, and J. Furbringer. Performance and sensitivity analysis of supply chain alternative configurations - A case study. 708–713, (2009).
[27] L. S. Young. Statistical properties of dynamical systems with some hyperbolicity. Annals of Mathematics, 147(3), 585–650, (1998).
[28] D. Zhang, Y. Jiang, and J. Cai. Analytic sensitivity analysis for shape optimization. Applied Mathematics and Mechanics (English Edition), 22, 1325–1332, (2001).
A.1 Binary floating point problem in simulating 1D maps
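The failure mode discussed in this appendix, and the $1 - \epsilon$ workaround, can be reproduced in a few lines. The sketch below is a minimal illustration; the helper name `doubling_orbit` and the particular value of `eps` are our own choices.

```python
def doubling_orbit(x, n, eps=0.0):
    """Iterate x_{k+1} = 2 x_k mod (1 - eps) for n steps and return the final state."""
    for _ in range(n):
        x = (2.0 * x) % (1.0 - eps)
    return x

# With eps = 0, doubling only shifts the finite mantissa of the initial
# condition, and the modulo discards the leading bit each time the state
# exceeds 1; the orbit collapses to exactly zero within ~60 iterations:
print(doubling_orbit(0.1234567, 100))            # 0.0

# With a small eps, the subtraction of 1 - eps reinjects low-order bits,
# and the orbit keeps wandering in (0, 1):
print(doubling_orbit(0.1234567, 100, eps=1e-12))
```

Both the doubling and the (exact) subtraction are individually round-off free in IEEE double precision, which is precisely why the unmodified map settles on the fixed point $x = 0$ instead of behaving chaotically.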
Consider the case s = 0. Map 3 can be compactly expressed using the modulo operator, i.e., $x_{n+1} = 2x_n \bmod 1$: we multiply $x_n$ by 2 and, if the result exceeds 1, we subtract 1. Using floating-point arithmetic, we will observe that there exists an $N > 0$ such that $x_n = 0$ for all $n \geq N$, which contradicts the assumption of chaotic behavior. This phenomenon is a consequence of the finite binary representation of $x_n$: multiplication by 2 is exact, while the modulo operation discards the leading bit, so the mantissa of the initial condition is exhausted after roughly as many iterations as there are mantissa bits (53 in double precision). To circumvent this problem, one can change the divisor parameter of the modulo operation from 1 to $1 - \epsilon$, where $\epsilon$ is a small positive number. Another possible (and simple) workaround is a change of variables such that the domain of the new variable has irrational length; note this approach would also require a modification of the objective function.

A.2 Derivation of the S3 formula for 1D maps
In this section, we show that Eq. 8–9 are equivalent to Eq. 12–14. Throughout this derivation we use the short-hand notation $v \circ \varphi^k = v_k$, where $v$ is a scalar function defined on $U = (0,1)$ and $k$ is an integer; if $k = 0$, the subscript is dropped. First, note that

\[ \int_U f \frac{dJ_k}{dx}\,\rho\,dx = \int_U \frac{d}{dx}(f J_k)\,\rho\,dx - \int_U J_k \frac{df}{dx}\,\rho\,dx. \tag{1} \]

Integrating the first term of Eq. 1 by parts,

\[ \int_U \frac{d}{dx}(f J_k)\,\rho\,dx = \left[ f J_k \rho \right]_{U_L}^{U_R} - \int_U f J_k \frac{\partial \rho}{\partial x}\,dx, \tag{2} \]

where $U_L = 0$ and $U_R = 1$ correspond to the left and right boundary of $U$, respectively. Since the domain is periodic, the first term of Eq. 2 vanishes. Thus, we can combine Eq. 1 and Eq. 2 to conclude that

\[ \int_U f \frac{dJ_k}{dx}\,\rho\,dx = -\int_U J_k \left( \frac{\partial f}{\partial x} + \frac{f}{\rho}\frac{\partial \rho}{\partial x} \right) \rho\,dx. \tag{3} \]

A.3 Derivation of the iterative procedure for g in 1D maps

The purpose of this section is to derive the iterative procedure used to calculate the density gradient $g$. We use the same notational convention as in Appendix A.2. Consider a function $h$ that is integrable in $U = (0,1)$ and vanishes at the boundaries $U_L = 0$ and $U_R = 1$. Using the definition $g = (1/\rho)(\partial \rho/\partial x)$ and integrating by parts, we obtain

\[ \int_U g\,h\,\rho\,dx = \int_U h \frac{d\rho}{dx}\,dx = \left[ h\,\rho \right]_{U_L}^{U_R} - \int_U \frac{dh}{dx}\,\rho\,dx = -\int_U \frac{dh}{dx}\,\rho\,dx. \tag{1} \]

The key property used in this derivation is the density preservation of $\varphi$. We say that the map $\varphi$ is density-preserving with respect to the density $\rho$ if, for any scalar observable $f$, $\int_U f\,\rho\,dx = \int_U f \circ \varphi^k\,\rho\,dx$ holds for any integer $k$. Applied with $f = g\,h$, this implies that the left-hand side of Eq. 1 can be expressed as

\[ \int_U g\,h\,\rho\,dx = \int_U g_1\,h_1\,\rho\,dx. \tag{2} \]

We now apply the density preservation together with the chain rule to the right-hand side of Eq. 1, which gives rise to

\[ -\int_U \frac{dh}{dx}\,\rho\,dx = -\int_U \left( \frac{dh}{dx} \right)_1 \rho\,dx = -\int_U \frac{dh_1/dx}{d\varphi/dx}\,\rho\,dx. \tag{3} \]

Note that

\[ \frac{dh_1/dx}{d\varphi/dx} = \frac{d}{dx}\left( \frac{h_1}{d\varphi/dx} \right) - h_1 \frac{d}{dx}\left( \frac{1}{d\varphi/dx} \right) = \frac{d}{dx}\left( \frac{h_1}{d\varphi/dx} \right) + h_1 \frac{d^2\varphi/dx^2}{(d\varphi/dx)^2}, \tag{4} \]

and, using $h_1(U_L) = h_1(U_R) = 0$, integrate by parts to get

\[ -\int_U \frac{d}{dx}\left( \frac{h_1}{d\varphi/dx} \right) \rho\,dx = -\left[ \frac{h_1}{d\varphi/dx}\,\rho \right]_{U_L}^{U_R} + \int_U \frac{h_1}{d\varphi/dx}\frac{d\rho}{dx}\,dx = \int_U \frac{h_1}{d\varphi/dx}\frac{d\rho}{dx}\,dx. \tag{5} \]

Combine Eq. 3–5 to observe that

\[ -\int_U \frac{dh}{dx}\,\rho\,dx = \int_U h_1 \left( \frac{g}{d\varphi/dx} - \frac{d^2\varphi/dx^2}{(d\varphi/dx)^2} \right) \rho\,dx. \tag{6} \]

Finally, by combining Eq. 1, 2, and 6, we obtain the following identity,

\[ \int_U h_1\,g_1\,\rho\,dx = \int_U h_1 \left( \frac{g}{d\varphi/dx} - \frac{d^2\varphi/dx^2}{(d\varphi/dx)^2} \right) \rho\,dx, \tag{7} \]

from which we infer that

\[ g_1 = \frac{g}{d\varphi/dx} - \frac{d^2\varphi/dx^2}{(d\varphi/dx)^2}. \]
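The recursion derived above can be iterated along any orbit of a smooth expanding map: because $|d\varphi/dx| > 1$, the influence of the unknown initial value $g_0$ is damped geometrically, so the iterates converge to the true density gradient along the trajectory. The sketch below is a minimal illustration under our own assumptions: the perturbed doubling map $\varphi(x) = (2x + s\sin 2\pi x) \bmod 1$ and all function names are illustrative choices, not the maps studied in the paper.

```python
import math

def phi(x, s=0.05):
    # illustrative expanding map: a smoothly perturbed doubling map
    return (2.0 * x + s * math.sin(2.0 * math.pi * x)) % 1.0

def dphi(x, s=0.05):
    # first derivative of phi; >= 2 - 2*pi*s > 1, so the map is expanding
    return 2.0 + 2.0 * math.pi * s * math.cos(2.0 * math.pi * x)

def d2phi(x, s=0.05):
    # second derivative of phi
    return -4.0 * math.pi**2 * s * math.sin(2.0 * math.pi * x)

def density_gradient_along_orbit(x0, n, s=0.05):
    """Iterate g_{k+1}(phi(x)) = g_k(x)/phi'(x) - phi''(x)/phi'(x)^2 along an orbit.

    Starts from the (arbitrary) guess g_0 = 0; the error contracts at the
    rate 1/phi', so later iterates approximate the density gradient.
    """
    x, g = x0, 0.0
    xs, gs = [x], [g]
    for _ in range(n):
        g = g / dphi(x, s) - d2phi(x, s) / dphi(x, s) ** 2
        x = phi(x, s)
        xs.append(x)
        gs.append(g)
    return xs, gs
```

Note that for the unperturbed doubling map (s = 0) the recursion gives $g \equiv 0$, consistent with its uniform invariant density, whose log derivative vanishes.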