Automatic differentiation for error analysis
Alberto Ramos 𝑎,∗
𝑎 Instituto de Física Corpuscular (IFIC), CSIC-Universitat de Valencia, 46071 Valencia, Spain
E-mail: [email protected]
We present
ADerrors.jl, a software package for linear error propagation and analysis of Monte Carlo data. Although the focus is on data analysis in Lattice QCD, where estimates of the observables have to be computed from Monte Carlo samples, the software also deals with variables with uncertainties, either correlated or uncorrelated. Thanks to automatic differentiation techniques, linear error propagation is performed exactly, even in iterative algorithms (i.e. errors in parameters of non-linear fits). In this contribution we present an overview of the capabilities of the software, including access to uncertainties in fit parameters and dealing with correlated data. The software, written in Julia, is available for download and use at https://gitlab.ift.uam-csic.es/alberto/aderrors.jl. IFIC/20-55
Tools for High Energy Physics and Cosmology - TOOLS2020
2-6 November, 2020
Institut de Physique des 2 Infinis (IP2I), Lyon, France

∗ Speaker

© Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0). https://pos.sissa.it/
1. Data analysis in Lattice QCD
Lattice QCD is nowadays able to provide precise results for many quantities of phenomenological interest. A key ingredient in providing solid results is the analysis of lattice QCD data. Lattice QCD is based on discretizing space-time in a hypercubic grid of spacing 𝑎. The usual computation of QCD observables requires the evaluation of path integrals. On the lattice these observables are computed as integrals over very many dimensions. The biggest appeal of the method is that these integrals can be estimated using large-scale computer simulations and Monte Carlo methods. The usual workflow in lattice QCD analysis is the following:

Production of configurations A representative ensemble of the discretized lattice QCD action is generated using large scale computer simulations. These are usually called configurations.

Measurements of primary observables Several observables of interest, like hadron masses (𝑎𝑀_{p,Ω,...}), decay constants (𝑎𝑓_{𝜋,𝐾,𝐷,...}), form factors, etc. are measured in lattice units on the configurations generated in the previous step. These are usually called primary observables.

Data analysis The observable of interest is generically a function of these primary observables. These are called derived observables.

The challenges in data analysis are, on one hand, to control the systematic uncertainties associated with the different extrapolations (to the continuum 𝑎 → 0, to the infinite volume limit, etc.) required to produce a final answer. On the other hand we have to correctly account for the statistical uncertainties of the results. This includes a correct estimate of errors taking into account the autocorrelation of data measured in a Monte Carlo chain. We also have to take into account the correlation of observables measured on the same configurations. It is important to note that the functions that define the derived observables commonly involve iterative algorithms (cf. section 3).

In this proceedings contribution we present a freely available analysis code,
ADerrors.jl (https://gitlab.ift.uam-csic.es/alberto/aderrors.jl), that aims to simplify the data analysis. It combines the determination of statistical uncertainties using the Γ-method [1–3] with automatic differentiation techniques for error propagation [4]. We think that some of the techniques that are used for error propagation are potentially useful beyond the lattice QCD community.

The software is written in Julia (https://julialang.org). The package is not in the general registry. Still, one can use the package manager to install it. ADerrors.jl also depends on BDIO.jl, which is also not registered and should be installed beforehand. In summary, the steps to have a fully functional working environment are:
Code (Julia)
julia> import Pkg
julia> Pkg.add(url="https://gitlab.ift.uam-csic.es/alberto/bdio.jl")
julia> Pkg.add(url="https://gitlab.ift.uam-csic.es/alberto/aderrors.jl")
2. A calculator with uncertainties
At the center of the package
ADerrors.jl is the data type uwreal. Generally speaking, a uwreal data type contains the value and the different sources of uncertainty of a variable. Here we list the main concepts to work with these variables:

• Calling var = uwreal([x, dx], "TAG") will define var as a uwreal variable with central value x and uncertainty dx. The user-provided "TAG" allows one to keep track of the different contributions to the uncertainty of a variable.

• One can perform normal operations with uwreal data types, as if they were normal real variables: var2 = 2.0*sin(var)/var will propagate errors from var to var2.

• Calling uwerr(var) will determine the total uncertainty in var.

• value(var) and err(var) return, respectively, the central value and total error of var.

• Calling details(var) will give the contribution to the total error of var from each source.

Here we show an example demonstrating that ADerrors.jl can be used as a “calculator with uncertainties” that automatically takes care of error propagation and correlations.
Code (Julia)
julia> x = uwreal([12.31, 0.23], "Experiment A")
12.31 (Error not available ... maybe run uwerr)
julia> y = uwreal([4.22, 0.12], "Experiment B")
4.22 (Error not available ... maybe run uwerr)
julia> z = x + y
16.53 (Error not available ... maybe run uwerr)
julia> uwerr(z)
julia> println(z)
16.53 +/- 0.25942243542145693
julia> details(z)
julia> zero_error = sin(z) - (sin(x)*cos(y) + cos(x)*sin(y))
-6.661338147750939e-16 (Error not available ... maybe run uwerr)
julia> uwerr(zero_error)
julia> println(zero_error)
-6.661338147750939e-16 +/- 1.7281005654554058e-16
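As a quick cross-check of the transcript above (plain Julia, independent of the package): for z = x + y with uncorrelated inputs, linear propagation gives σ_z = sqrt(σ_x² + σ_y²), which reproduces the uncertainty quoted by uwerr.

```julia
# Independent cross-check of the error printed for z = x + y above:
# for uncorrelated x, y linear propagation gives
# sigma_z = sqrt(sigma_x^2 + sigma_y^2)
sigma_z = sqrt(0.23^2 + 0.12^2)
println(sigma_z)   # matches the 0.2594... quoted by ADerrors.jl
```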
In many practical situations we have several variables with uncertainties that are not independent. The covariance between these variables is given, and has to be taken into account in order to perform error propagation correctly. The method cobs(central_values, covariance, "TAG") allows one to import correlated data as a vector of uwreal. One can operate with the elements of this vector naturally. Let us see an example

Code (Julia)
julia> avg = [16.26, 0.12, -0.0038];
julia> Mcov = [ 0.478071  -0.176116    0.0135305;
               -0.176116   0.0696489  -0.00554431;
                0.0135305 -0.00554431  0.000454180];
julia> p = cobs(avg, Mcov, "Correlated data")
3-element Array{uwreal,1}:
 16.26 (Error not available ... maybe run uwerr)
 0.12 (Error not available ... maybe run uwerr)
 -0.0038 (Error not available ... maybe run uwerr)
julia> uwerr.(p);
julia> p
3-element Array{uwreal,1}:
 16.26 +/- 0.6914267857119798
 0.12 +/- 0.2639107803785211
 -0.0038 +/- 0.021311499243366245
julia> cov(p)
3×3 Array{Float64,2}:
  0.478071   -0.176116    0.0135305
 -0.176116    0.0696489  -0.00554431
  0.0135305  -0.00554431  0.00045418
julia> z = p[1] + p[2] + sin(p[3]);
julia> uwerr(z)
julia> z
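What uwerr(z) computes for z = p[1] + p[2] + sin(p[3]) can be checked by hand with plain Julia (a sketch, not part of the package): linear propagation contracts the Jacobian of z with the given covariance matrix.

```julia
using LinearAlgebra

# Hand-made linear propagation for z = p1 + p2 + sin(p3):
# sigma_z^2 = J' * Cov * J with Jacobian J = (1, 1, cos(p3))
avg  = [16.26, 0.12, -0.0038]
Mcov = [ 0.478071  -0.176116    0.0135305;
        -0.176116   0.0696489  -0.00554431;
         0.0135305 -0.00554431  0.000454180]
J = [1.0, 1.0, cos(avg[3])]
sigma_z = sqrt(dot(J, Mcov * J))
println(sigma_z)   # total error of z, including all correlations
```

Note that the off-diagonal entries of Mcov reduce the naive (uncorrelated) error substantially.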
3. Iterative algorithms
In many practical situations our derived observables are defined by an iterative procedure. This is the case of fit parameters: being functions of the data that require an iterative procedure to be determined (i.e. the minimization of the 𝜒² function), they make cumbersome the determination of the derivatives required for linear error propagation.
ADerrors.jl uses automatic differentiation to perform exact linear error propagation. Errors can be naively propagated even in iterative algorithms. As a concrete example, imagine that we are interested in finding the root of a non-linear function

𝑓(𝑥) = 𝑎 cos(𝑏 sin(𝑥)) − 𝑥 .  (1)

If the quantities 𝑎, 𝑏 have uncertainties, these will propagate into an uncertainty in the root 𝑥★. Since 𝑥★ is defined by

𝑥★ = Find the root of 𝑓(𝑥) ,  (2)

what error propagation needs is the derivative of "Find the root of 𝑓(𝑥)". The following program performs the job using ADerrors.jl
Code (Julia)
julia> a = uwreal([1.34, 0.12], "Data 01")
julia> b = uwreal([1.34, 0.12], "Data 02")
julia> x0 = uwreal([0.5, 0.5], "Initial position")
julia> while true
           val = a*cos(b*sin(x0)) - x0
           der = -a*b*sin(b*sin(x0))*cos(x0) - 1.0
           x1 = x0 - val/der
           if abs(value(x0) - value(x1)) < 1.0E-10
               break
           else
               x0 = x1
           end
       end
julia> uwerr(x1)
julia> details(x1)
0.7838003744331717 +/- 0.056992589665847124

Note that the while loop is just a straightforward application of Newton's method. Automatic differentiation approaches the problem of computing the derivative of "Find the root of 𝑓(𝑥)" by just computing the derivative of each expression of Newton's algorithm. (In fact ADerrors.jl uses the Julia package ForwardDiff.jl [5] (https://github.com/JuliaDiff/ForwardDiff.jl/) for this purpose.) The position of the root 𝑥★ has an uncertainty that has been propagated from the uncertainties of the original variables 𝑎, 𝑏.

Note that we also assigned an error to the initial position 𝑥₀ where the iteration starts. This uncertainty does not contribute to the final uncertainty of the position of the root. This is just a manifestation of the fact that it does not matter what we choose as starting point for our Newton method, we will end up finding the same root.
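To see how forward-mode automatic differentiation differentiates "Find the root of 𝑓(𝑥)", here is a minimal self-contained sketch with a hand-rolled dual number (the names Dual and newton_root are ours, for illustration only; ADerrors.jl delegates this machinery to ForwardDiff.jl):

```julia
# Minimal forward-mode AD sketch (hypothetical helper types, not the
# ADerrors.jl implementation): a dual number carries a value v and a
# derivative d, and every operation propagates both.
struct Dual
    v::Float64   # value
    d::Float64   # derivative w.r.t. the chosen input
end
Dual(c::Real) = Dual(float(c), 0.0)
Base.:+(x::Dual, y::Dual) = Dual(x.v + y.v, x.d + y.d)
Base.:-(x::Dual, y::Dual) = Dual(x.v - y.v, x.d - y.d)
Base.:-(x::Dual)          = Dual(-x.v, -x.d)
Base.:*(x::Dual, y::Dual) = Dual(x.v*y.v, x.d*y.v + x.v*y.d)
Base.:/(x::Dual, y::Dual) = Dual(x.v/y.v, (x.d*y.v - x.v*y.d)/y.v^2)
Base.sin(x::Dual) = Dual(sin(x.v),  cos(x.v)*x.d)
Base.cos(x::Dual) = Dual(cos(x.v), -sin(x.v)*x.d)

# Newton iteration for the root of f(x) = a*cos(b*sin(x)) - x; running it
# on duals differentiates every step of the algorithm automatically.
function newton_root(a::Dual, b::Dual; x0 = Dual(0.5), tol = 1.0e-12)
    x = x0
    for _ in 1:100
        val = a*cos(b*sin(x)) - x
        der = -(a*b*sin(b*sin(x))*cos(x)) - Dual(1.0)
        xnew = x - val/der
        abs(xnew.v - x.v) < tol && return xnew
        x = xnew
    end
    return x
end

# Seed a with derivative 1 to obtain d x*/d a in the .d field
xstar = newton_root(Dual(1.34, 1.0), Dual(1.34))
```

The derivative so obtained agrees with implicit differentiation of 𝑓(𝑥★) = 0, which gives ∂𝑥★/∂𝑎 = cos(𝑏 sin 𝑥★) / (1 + 𝑎𝑏 sin(𝑏 sin 𝑥★) cos 𝑥★); linear error propagation then combines such derivatives with the uncertainties of 𝑎 and 𝑏.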
One of the most common situations in data analysis is fitting data to a model. This is done by finding the values of the parameters 𝑝ᵢ that minimize the function

𝜒²(𝑝ᵢ; 𝑑ₐ) ,  𝑝ᵢ (𝑖 = 1, . . . , 𝑁_parm) ,  𝑑ₐ (𝑎 = 1, . . . , 𝑁_data) ,  (3)

where 𝑑ₐ are the data that carry uncertainties (i.e. Monte Carlo data from a lattice simulation, or experimental data). These data might be correlated, and the determination of the uncertainties in the fit parameters has to take these correlations into account.

The previous section shows that ADerrors.jl can solve this problem by just propagating errors in each step of the minimization routine (i.e. Levenberg-Marquardt, see an explicit example in [4]). This is cumbersome because it would require coding this algorithm in such a way that uwreal data types are accepted as input.

Reference [4] proposes an alternative based on the determination of the Hessian of the 𝜒² function at its minimum. This allows ADerrors.jl to rely on external (and efficient) libraries to perform the minimization of the 𝜒², and delegate the error propagation to the computation of the Hessian at the minimum once it is found. This Hessian can be determined exactly using automatic differentiation techniques, allowing exact error propagation in fit parameters.

Let us here show the main characteristics of data fitting using ADerrors.jl (the complete example is available in appendix A). First we minimize the 𝜒² with an efficient library

Code (Julia)
julia> fit = optimize(xx -> lm(xx, value.(uwy)), [1.0, 1.0],
                      LevenbergMarquardt(), autodiff = :forward)
julia> println(fit)
Results of Optimization Algorithm
 * Algorithm: LevenbergMarquardt
 * Minimizer: [1.19727262030669, -0.2938846629270645]
 * Sum of squares at Minimum: 19.590325
 * Iterations: 7
 * Convergence: true
 * |x - x'| < 1.0e-08: false
 * |f(x) - f(x')| / |f(x)| < 1.0e-08: true
 * |g(x)| < 1.0e-08: false
 * Function Calls: 8
 * Gradient Calls: 7
 * Multiplication Calls: 21

Once the minimum of the 𝜒² is found, ADerrors.jl is able to determine the uncertainties in the fit parameters. We get the fit parameters as a vector of uwreal from the routine

fit_error(csq, fitp_central_values, data)

where csq is the 𝜒²(𝑝ᵢ, 𝑑ₐ) function (as defined in equation (3)), fitp_central_values are the central values of the fit parameters (i.e. the minimum of the 𝜒²), and data is the data whose errors have to be propagated to the fit parameters.

Note that we obtain the fit parameters as an array of uwreal data. This means that we can operate with fit parameters as if they were normal variables, and even use them as input for other fits. We can also obtain the breakdown of the error in fit parameters (i.e. how much each original source of uncertainty contributes to the uncertainty in fit parameters).

Code (Julia)
julia> fitp, chiexp = fit_error(csq, fit.minimizer, uwy)
julia> uwerr.(fitp)
julia> println("Fit results: chi^2 / chi^2_exp: ", fit.ssr, " / ", chiexp)
julia> for i in 1:length(fitp)
           print("Fit Parameter "*string(i)*" (exact value "*string(exact[i])*"): ")
           details(fitp[i])
       end
Fit results: chi^2 / chi^2_exp: 19.59032480880994 / 18.000000000000004
Fit Parameter 1 (exact value ...
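The idea behind fit_error can be summarized as follows (a sketch of the standard derivation, following the logic of [4]; notation ours). At the minimum the gradient of the 𝜒² vanishes for any value of the data, so differentiating the stationarity condition with respect to the data gives the exact derivatives needed for linear propagation:

```latex
\left.\frac{\partial \chi^2}{\partial p_i}\right|_{p^\star(d)} = 0
\quad\Longrightarrow\quad
\sum_j H_{ij}\,\frac{\partial p_j^\star}{\partial d_a}
  + \frac{\partial^2 \chi^2}{\partial p_i\,\partial d_a} = 0 ,
\qquad
H_{ij} = \frac{\partial^2 \chi^2}{\partial p_i\,\partial p_j} ,
```

so that ∂𝑝★ᵢ/∂𝑑ₐ = −(𝐻⁻¹)ᵢⱼ ∂²𝜒²/∂𝑝ⱼ∂𝑑ₐ, with both second derivatives evaluated exactly by automatic differentiation at the minimum found by the external minimizer.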
Figure 1: Monte Carlo histories of 𝑥² and 𝑥⁴ for a random walk in the interval [−1, 1]. Clearly the two histories are correlated.
4. Dealing with Monte Carlo data
Up to this point we have shown how
ADerrors.jl works with variables with uncertainties, even when they are correlated. But the most common situation in Lattice QCD analysis is that the data to be analyzed come from a Monte Carlo simulation. In this case one cannot reduce the variable to a single value +/- error, because when several observables measured on the same ensemble are combined, the correlation has to be computed from the Monte Carlo history.
ADerrors.jl deals with this case transparently: uwreal variables can also contain full Monte Carlo histories. Let us first generate some Monte Carlo data (see figure 1)
Code (Julia)
julia> nsamp = 500;
julia> eta = randn(nsamp);
julia> x = Vector{Float64}(undef, nsamp);
julia> x[1] = 0.0;
julia> for i in 2:nsamp
           x[i] = x[i-1] + 0.2*eta[i]
           if abs(x[i]) > 1.0
               x[i] = x[i-1]
           end
       end

The vector x[:] contains samples of the uniform distribution U[−1, 1]. We can determine the expected values, for example, of E[𝑋²] or E[𝑋⁴]. Let us show how these Monte Carlo data can be input as uwreal data types and check that the variables are correlated.

Code (Julia)
julia> xp2 = uwreal(x.^2, "Random walk ensemble in [-1,1]")
0.3533602504472119 (Error not available ... maybe run uwerr)
julia> xp4 = uwreal(x.^4, "Random walk ensemble in [-1,1]")
0.2197572322981959 (Error not available ... maybe run uwerr)
julia> cov([xp2, xp4])
2×2 Array{Float64,2}:
 0.0007523017604391568  0.0005551868103090107
 0.0005551868103090107  0.00039631731910912886
Crucially, the ensemble tag ("Random walk ensemble in [-1,1]" in this case) is the same for both observables. This tells ADerrors.jl that the measurements of both observables have been performed on the same ensemble and makes it possible to determine the covariance of the data from the Monte Carlo histories. We can determine functions of these expected values. Correlations are extracted from the Monte Carlo histories and automatically taken into account. These correlations are crucial to obtain correct error estimates of derived observables that depend on both variables xp2, xp4, as the following example shows

Code (Julia)
julia> z = xp4 / xp2^2;
julia> uwerr(z)
julia> println(z)
1.7764683477541687 +/- 0.12751045462709376
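For orientation, a minimal sketch of the Γ-method that uwerr implements on such histories is shown below (plain Julia; the name gamma_error and the fixed summation window W are our simplifications, the real implementation uses the automatic windowing procedure of [1]):

```julia
using Statistics

# Simplified Γ-method error of a mean (sketch only; ADerrors.jl implements
# the full automatic-windowing procedure of Wolff [1]).
function gamma_error(a::Vector{Float64}, W::Int)
    N    = length(a)
    abar = mean(a)
    # autocovariance Γ(t) for t = 0, ..., W
    G = [sum((a[i] - abar)*(a[i+t] - abar) for i in 1:N-t)/(N - t) for t in 0:W]
    # integrated autocorrelation time with a fixed window W
    tauint = 0.5 + sum(G[2:end]) / G[1]
    # error of the mean, enhanced by 2*tauint w.r.t. the naive estimate
    err = sqrt(2.0 * tauint * G[1] / N)
    return abar, err, tauint
end
```

For uncorrelated data τ_int → 1/2 and the naive error sqrt(Γ(0)/N) is recovered; for the random-walk history above τ_int is much larger, which is why uwerr quotes an error well above the naive one.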
5. Conclusions
In this proceedings contribution we have presented
ADerrors.jl. Using automatic differentiation, this software package performs exact linear error propagation even in quantities that are defined via iterative algorithms, like fit parameters. ADerrors.jl keeps track of the origin of the error in each variable, making it possible to obtain how much each ensemble has contributed to the uncertainty in a particular variable.

The benefit of data analysis within this framework is that it is robust: if your analysis code determines the central values correctly, the exact nature of automatic differentiation ensures that errors are propagated correctly.

Some other features of the software include

• Exact linear error propagation in fit parameters, integrals and roots of non-linear functions.

• Handles data from any number of ensembles (i.e. simulations with different parameters).

• Handles correlated data, with error estimates automatically including the correlations.

• Support for replicas (i.e. several runs with the same simulation parameters).

• Irregular MC measurements are handled transparently.

We refer the interested readers to the original documentation of the package, https://gitlab.ift.uam-csic.es/alberto/aderrors.jl, and the on-line tutorial, https://ific.uv.es/~alramos/docs/ADerrors/tutorial/.

Acknowledgments
I am grateful to C. Pena for his help preparing the material of this talk. The author received support from the Generalitat Valenciana via the genT program (CIDEGENT/2019/040).
A. Error propagation in fits
Code (Julia)
using ADerrors, Distributions, LeastSquaresOptim, Printf

npt = 20
x = 2.0*rand(npt)
@. model(x) = 1.2*exp(-x*0.3)
dy = 0.05 ./ (x .+ model(x))
y = model(x) + dy .* randn(npt)

uwy = Vector{uwreal}(undef, npt)
for i in 1:npt
    xfmt = @sprintf("%6.4f", x[i])
    uwy[i] = uwreal([y[i], dy[i]], "Data for point at x="*string(xfmt))
end
uwerr.(uwy)
for i in 1:npt
    println("dt ", x[i], " ", uwy[i])
end

function fit_definitions(x, dy)
    function lmfit(prm, y)
        res = Vector{eltype(prm)}(undef, length(y))
        for i in 1:length(y)
            res[i] = (y[i] - prm[1]*exp(prm[2]*x[i])) / dy[i]
        end
        return res
    end
    chisq(prm, data) = sum(lmfit(prm, data) .^ 2)
    return lmfit, chisq
end

lm, csq = fit_definitions(x, err.(uwy))
fit = optimize(xx -> lm(xx, value.(uwy)), [1.0, 1.0],
               LevenbergMarquardt(), autodiff = :forward)
println(fit)

fitp, chiexp = fit_error(csq, fit.minimizer, uwy)
uwerr.(fitp)
exact = [1.2, -0.3]
println("Fit results: chi^2 / chi^2_exp: ", fit.ssr, " / ", chiexp)
for i in 1:length(fitp)
    print("Fit Parameter "*string(i)*" (exact value "*string(exact[i])*"): ")
    details(fitp[i])
end

References

[1]
ALPHA Collaboration, U. Wolff, "Monte Carlo errors with less errors," Comput. Phys. Commun. (2004) 143–153, arXiv:hep-lat/0306017.

[2] ALPHA Collaboration, S. Schaefer, R. Sommer, and F. Virotta, "Critical slowing down and error analysis in lattice QCD simulations," Nucl. Phys. B845 (2011) 93–119, arXiv:1009.5228.

[3] F. Virotta, Critical slowing down and error analysis of lattice QCD simulations. PhD thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät I, 2012.

[4] A. Ramos, "Automatic differentiation for error analysis of Monte Carlo data," Comput. Phys. Commun. (2019) 19–35, arXiv:1809.01289.

[5] J. Revels, M. Lubin, and T. Papamarkou, "Forward-mode automatic differentiation in Julia," arXiv:1607.07892 [cs.MS] (2016). https://arxiv.org/abs/1607.07892