[PDF] PC Proxy: A Method for Dynamical Tracer Reconstruction

Abstract

A detailed development of the principal component proxy method of dynamical tracer reconstruction is presented, including error analysis. The method works by correlating the largest principal components of a matrix representation of the transport dynamics with a set of sparse measurements. The Lyapunov spectrum was measured and used to quantify the lifetime of each principal component. The method was tested on the 500 K isentropic surface with stratospheric ozone concentration measurements from the Polar Aerosol and Ozone Measurement (POAM) III satellite instrument during October and November 1998 and compared with the older proxy tracer method which works by correlating measurements with a single other tracer or proxy. Using a 60 day integration time and five (5) principal components, cross validation of globally reconstructed ozone and comparison with ozone sondes returned root-mean-square errors of 0.16 ppmv and 0.36 ppmv, respectively. This compares favourably with the classic proxy tracer method in which a passive tracer equivalent latitude field was used for the proxy and which returned RMS errors of 0.22 ppmv and 0.59 ppmv for cross-validation and sonde validation respectively. The method seems especially effective for shorter lived tracers and was far more accurate than the classic method at predicting ozone concentration in the Southern hemisphere over the same time period. It is also more effective when reconstruction is performed over the entire Earth rather than a single hemisphere allowing for seamless reconstruction of global fields.

Full PDF

aa r X i v : . [ phy s i c s . a o - ph ] A ug PC Proxy: a New Method of Dynamical TracerReconstruction

Peter Mills [email protected]

September 19, 2018

Abstract

A detailed development of the principal component proxy method of dynamicaltracer reconstruction is presented, including error analysis. The method worksby correlating the largest principal components of a matrix representation of thetransport dynamics with a set of sparse measurements. The Lyapunov spectrumwas measured and used to quantify the lifetime of each principal component.The method was tested on the 500 K isentropic surface with ozone measurementsfrom the Polar Aerosol and Ozone Measurement (POAM) III satellite instru-ment during October and November 1998 and compared with the older proxytracer method which works by correlating measurements with a single othertracer or proxy. Using a 60 day integration time and ﬁve (5) principal com-ponents, cross validation of globally reconstructed ozone and comparison withozone sondes returned root-mean-square errors of 0.16 ppmv and 0.36 ppmv,respectively. This compares favourably with the classic proxy tracer method inwhich a passive tracer equivalent latitude ﬁeld was used for the proxy and whichreturned RMS errors of 0.30 ppmv and 0.59 ppmv for cross-validation and sondevalidation respectively. The method seems especially eﬀective for shorter livedtracers and was far more accurate than the classic method at predicting ozoneconcentration in the Southern hemisphere at the end of winter.

Keywords transport dynamics, interpolation methods, satellite remote sensing,tracer advection, assimilation models, numerical analysis, transportmodelling; ozone ills: PC Proxy Contents

A.1 Mass conservation . . . . . . . . . . . . . . . . . . . . . . . . . . 27A.2 Diﬀusion and the Lyapunov spectrum . . . . . . . . . . . . . . . 27

Satellite remote sensing instruments have become invaluable for monitoringtrace atmospheric constituents such as ozone. One area in which they frequentlyfall short is density of coverage. Short of comprehensive prognostic assimilationmodels, there are methods of interpolating sparse measurements that nonethe-less take into account the atmospheric dynamics. One such method was ﬁrstintroduced in Butchart and Remsberg (1986) and used in Randall et al. (2002)to derive Northern hemisphere maps of ozone from Polar Ozone and Aerosol ills: PC Proxy

Proxy tracer analysis works by correlating mea-surements with another atmospheric variable, the proxy, that is conserved bythe ﬂow. The two most common proxies are potential vorticity (Hoskins et al.,1985) or a passive tracer simulated over a long time scale (Allen and Nakamura,2003). In either case, it’s standard practice to transform the proxy to an area-conserving

Lagrangian coordinate called equivalent latitude before performingthe reconstruction. This method is mainly useful for long-lived tracers such asozone.Here we introduce a new method of dynamical tracer reconstruction called principal component proxy analysis that is more appropriate for shorter-livedtracers because it works on shorter time scales. The method is summarized asfollows: the passive tracer dynamics are represented as a matrix which linearlymaps the initial tracer conﬁguration to the ﬁnal tracer conﬁguration. We termthis matrix the transport map . The transport map is decomposed using princi-pal component analysis (PCA)–hence the name of the method–and the largestprincipal components are then ﬁtted to the measurements. Because the trans-port map is integrated over only a short time scale and because it represents alarge number of possible tracer conﬁgurations, with the most likely singled outthrough the decomposition, PC proxy should be better able to compensate fornon-advective changes in concentration or sources and sinks.

The advection-diﬀusion equation is given as follows: ∂q ( ~x, t ) ∂t = {− ~v ( ~x, t ) · ∇ + ∇ · D ∇} q ( ~x, t ) (1)where q is the tracer concentration, ~x is spatial position, t is time, ~v is theﬂuid velocity, and D is the diﬀusivity tensor. There is no mass-conservationterm, q ∇ · ~v , either because q is a volume-mixing ratio (vmr) or the ﬂuid isincompressible ( ∇ · ~v = 0).In an Eulerian tracer simulation, the approximate value of q is known onlyat discrete locations so can be represented as a vector, ~q = { q i } . The linearoperator contained in the braces in Equation (1) is approximated as a matrix, A , which is multiplied with ~q : d ~q d t = A ( t ) ~q (2)For example, consider a second-order ﬁnite diﬀerence scheme in one dimen-sion: ∂q i ∂t = v i ( q i +1 − q i − )2∆ x + d ( q i − + q i +1 − q i )∆ x (3)= (cid:18) − v i x + d ∆ x (cid:19) q i − − d ∆ x q i + (cid:18) v i x + d ∆ x (cid:19) q i +1 (4) ills: PC Proxy d is a scalar diﬀusion coeﬃcient, ∆ x is the grid spacing and v i is the windspeed at the i th grid point. Expressed as elements of the matrix, A : a i,i − = (cid:18) − v i x + d ∆ x (cid:19) (5) a i,i = − d ∆ x (6) a i,i +1 = (cid:18) v i x + d ∆ x (cid:19) (7)To produce a general solution to Equation (2), we ﬁrst substitute a matrix, R , for ~q : d R ( t , t )d t = A ( t + t ) R ( t , t ) (8)We will call the matrix R the discrete transport map or simply the transportmap .Unlike in most analyses, there are two parameters for the time: the in-tegration start time, t , and the integration time, t . In this way, R may bedecomposed in terms of itself: R ( t , t n − t ) = R ( t n , ∆ t n ) R ( t n − , ∆ t n − ) R ( t n − , ∆ t n − ) ... R ( t , ∆ t ) (9)where, t n = t + n X i =0 ∆ t i (10)It follows that R ( t ,

0) = I where I is the identity matrix.Given R , we can calculate ~q given ~q at any other time: ~q ( t ) = R ( t , t − t ) ~q (11)where ~q = ~q ( t ) is the initial tracer conﬁguration.To make this solution fully analytical, although not realizable in practice,we solve R for an inﬁnitessimal integration time using linear algebra:lim ∆ t → R ( t, ∆ t ) = exp [∆ tA ( t )] (12)where exp is the matrix generalization of the exponential function. Suppose we decompose a given transport map using singular value decomposi-tion: R ( t , t n − t ) = U SV T (13)where t n − t is the integration time , S is a diagonal matrix of singular values ,and both U and V are orthogonal matrices: U T U = V T V = I , (14) ills: PC Proxy U is the matrix of left singular vectors and V is the matrix of right singularvectors (Press et al., 1992). This method of matrix decomposition is also knownas principal component analysis or PCA, hence the name of the interpolationtechnique. Singular vectors are increasingly being used in meteorology bothto quantify the predictability of a forecast and to generate perturbations forensemble forecasts (Tang et al., 2006).A good way to think of it is as a set of orthogonal initial conditions–the rightsingular vectors, { ~v ( i ) } –which, when the tracer dynamics are applied, map ontoa set of orthogonal ﬁnal conditions–the left singular vectors, { ~u ( i ) } –that havegrown by respective factors { s i } : R~v ( i ) = s i ~u ( i ) (15)The superscript denotes the column number: this notation will be used through-out. Also, the term PC or principal component will be used as a synonym forleft singular vector.The singular values are all positive and by convention are arranged fromlargest to smallest: s i ≤ s i − (16)The matrix can normally be reconstructed to a high degree of accuracy usingonly a few of the largest singular values and vectors. This also makes theproblem tractable since a typical size for the transport map might be [20000 × { m i } , with the top k principalcomponents, we ﬁrst ﬁnd interpolates at each measurement location in the leftsingular vectors and then ﬁnd a set of coeﬃcients, ~c , that minimizes the mean-square error of the interpolates versus the measurements. In any real problem,the measurements are unlikely to occur at the same time, so rather than usingthe left singular vectors, we use R to advance the right singular vectors to thesame time as the measurement. The time period during which measurementsare admitted is the measurement window . While the diﬀerence between theintegration start time, t , and the center of the measurement window is the leadtime .We have the following minimization problem:min ~c X i  k X j =1 c j ~w i R ( t , t ( i ) − t ) ~v ( j ) − m i  (17)where t ( i ) is the time stamp of the i th measurement, m i , and ~w i is a vector ofinterpolation coeﬃcients.Once the coeﬃcients have been ﬁtted, the tracer is reconstructed as follows: ~q ( t n ) ≈ k X i =1 c i ~u ( i ) (18) ills: PC Proxy Contrast the above description of principal component proxy tracer analysiswith the original proxy tracer technique, hereafter referred to as “classic” proxytracer. In the earlier method, the measurements are simply correlated with an-other tracer (the proxy): either a passive tracer that has been advected continu-ously with periodic re-normalization (Allen and Nakamura, 2003) or some otherquantity that is conserved by the ﬂow such as potential vorticity (Randall et al.,2002; Hoskins et al., 1985). The regression is typically done to second-order andthe proxy variable converted to an area-based “Lagrangian” coordinate calledequivalent latitude (Butchart and Remsberg, 1986).If Φ is the proxy value, then we have the following minimization problem:min ~c X i  N X j =0 c j h ~w i · ~ Φ( t i ) i j − m i  (19)where ~c are the regression coeﬃcients and N is the order of the method. Thetracer is reconstructed: q k ( t ) ≈ N X j =0 c i Φ jk ( t ) (20) Suppose we have a system of ordinary diﬀerential equations (ODEs):d ~r d t = ~f ( ~r, t ) (21)where ~r is a vector of dependent variables. Linearize this about ~r using the tangent vector , ∇ ~r f : dd t ( ~r + δ~r ) ≈ ~f + ∇ ~r ~f · δ~x (22)dd t δ~r ≈ ∇ ~r ~f · ~r (23)where δ~r is a vector of inﬁnitessimal error vectors . Now deﬁne the tangentmodel , H , such that: dd t H = ∇ ~r ~f H (24)A passive Eulerian tracer simulation is linear: the tangent vector is simplythe dynamics. If we take Equation (2), a system of linear ODEs, as our systemof ODEs in (21), then setting ~r = ~q , we have ~f ( ~q, t ) = A ( t ) ~q , while the tangentvector is given as: ∇ ~q ~f = A (25)Thus the transport matrix, R , is equivalent to the tangent model, H . ills: PC Proxy λ i = lim t →∞ t log s i ; λ i − ≤ λ i (26)where s i is the i th singular value (Ott, 1993). For most systems: | δ~r | ≈ | δ~r (0) | exp( λ i ) (27)That is, as H is integrated forward, the largest singular value and the largestsingular vector will increasingly begin to dominate (Ott, 1993). The Lyapunovexponents can help us gauge the signiﬁcance of each singular vector at a givenlead time. An important property of ﬂow tracers is that the amount of substance is con-served: X i q i = const. (28)The equation is exact if the simulation uses an equal area grid and the ﬂuid isincompressible ( ∇ · ~v = 0). From this it follows: X i r ij = 1 (29) X i a ij = 0 (30)See Appendix A.1 for the derivation.All gridded Eulerian tracer simulations are by necessity diﬀusive. Given inaddition the constraint above in (28), we can also show that all the singularvalues are less-than-or-equal-to one:0 ≤ s i ≤ A detailed error analysis can help us both to better understand the techniqueand to chose the best parameters for a given interpolation. Real ﬂow tracerswill have sources and sinks, thus we introduce a source term, σ , to Equations(9) and (11): ~q ( t n ) ≈ ∆ t n σ n + R ( t n − , ∆ t n − )[ ~σ n − + ills: PC Proxy R ( t n − , ∆ t n − )[ ~σ n − +...+ R ( t , ∆ t )[ ~σ + R ( t , ∆ t ) ~q ] ... ]] (32)where ~σ i is the integrated source term for the i th time step. Expanding: ~q ( t n ) = R ( t , t n − t ) ~q + n X i =1 R ( t i , t n − t i ) ~σ i (33)To get a handle on the error, we consider ﬁrst a fully passive tracer (nosources or sinks) started with initial conditions ~q . Expanding this in terms ofthe right singular vectors with a set of coeﬃcients, ~c : ~q = V ~c (34)means that the ﬁnal tracer takes the following form: ~q ( t n ) = U S~c (35)however we are only calculating the top k singular vectors, so the interpolationlooks like this: ~q ( t n ) ≈ k X i =0 c i s i ~u ( i ) (36)The error is simply the terms left out of the equation, discounting er-rors introduced through discretization and because of a ﬁnitely resolved (andpossibly inaccurate) wind ﬁeld. The smaller the remaining singular values, { s i | i = [ k + 1 ..n p ] } , the smaller the error, where n p is the total number of gridpoints in the tracer. This is why the Lyapunov exponents are useful: they tellus how fast the singular values shrink.A tracer interpolated with all the singular vectors will take the form: ~q ( t n ) = U~c (37)Substitution of (34) into (33) and comparison with (37) produces the following: ~c = S~c + U T n X i =1 R ( t i , t n − t i ) ~σ i (38)In other words, we need to project the integrated source terms onto the singularvectors. As before, the error is given by terms not included in the analysis: ~ǫ = n p X i = k +1 ~u ( i )  s i c i + ~u ( i ) · n X j =1 R ( t j , t n − t j ) ~σ j  (39) ills: PC Proxy ~u ( i ), does not cancel outthe second occurrence because the factor in square brackets is a scalar, i.e.,order of operations matters.The ﬁrst term in Equation (39) sums the components of the initial tracerconﬁguration that project onto the smaller singular vectors not included in theanalysis. This term will be small because of the shrinking of singular values overtime. The second term are the components of the source terms projected ontothe same singular vectors. The less the source terms line up with the largestsingular vectors, the larger this source of error.Equation (39) suggests two approaches to reducing the error. The ﬁrst is toreduce the size of R ( t j , t n − t j ) so that the source terms grow as little as possible.This would suggest that the measurement window should be in the middle ofthe integration, i.e. the lead time is half the integration time. The second isto make this factor as close as possible to R ( t n , t n − t ) so that projection ontothe singular vectors leaves the term SV and the leftover smaller componentsare shrunk by the singular values. This would suggest that the measurementwindow should be towards the end of the integration, i.e. the lead time is thesame as the integration time.To understand this last point, rewrite Equation (39) as follows: ~ǫ = n p X i = k +1 s i ~u ( i )  c i + ~v ( i ) · n X j =1 R − ( t , t j − t ) ~σ j  (40)Note that because of diﬀusion, a backwards integration is not equivalent to theinverse of the forwards integration, but only approximately so, that is, R ( t +∆ t, − ∆ t ) ≈ R − ( t, ∆ t ).Measurement error and ﬁtting discrepancies can be treated in the same wayas any other linear least squares problem. It stands to reason that havingfewer measurements will magnify both measurement errors and discrepanciesgenerated by sources and sinks. More measurements will allow the use of moresingular vectors, reducing both the error terms in (39). In this paper we willtake an empirical approach to the error analysis by validating reconstructedtracer ﬁelds against actual measurements whenever possible. To generate the transport maps, the ctraj software package is used (http://ctraj.sf.net).The software is written in C++ and contain programs for gridded, semi-Lagrangiantracer advection on an azimuthally-equidistant-projected coordinate system: x = r cos θ (41) y = r sin θ (42) r = π/ − hφ (43) ills: PC Proxy θ is longitude, φ is latitude and h is the hemisphere in which the projectionis deﬁned: h = (cid:26)

1; North −

1; South (44)Two ﬁelds are advected simultaneously: one for the Northern hemisphere ( h =1) and one for the Southern hemisphere ( h = − (cid:18) dsdx (cid:19) = 1 r (cid:20) R E r sin (cid:18) rR E (cid:19) y + x (cid:21) (45) (cid:18) dsdy (cid:19) = 1 r (cid:20) R E r sin (cid:18) rR E (cid:19) x + y (cid:21) where R E is the radius of the Earth.Because it is a semi-Lagrangian simulation, the factors, { R ( t i , ∆ t ) } , areoutput directly as sparse matrices by calculating the interpolation coeﬃcients.By storing the output as sparse matrices and not multiplying them throughuntil needed, it becomes possible to calculate the singular values and vectorsthrough iterative methods such as the Lanczoz method (Golub and van Loan,1996). The Arnoldi package (ARPACK) (Lehoucq and Scott, 1996) is used tocompute the eigenvalues and eigenvectors needed for the SVD.To calculate back-trajectories, a fourth-order Runge-Kutta integration schemewas used with a 1.2 hour time-step. Back trajectories were linearly interpolatedafter each 1-day, Eulerian time step. Gridding in both hemispheres is 50 by 50,or 400km-, 3.6-degree-latitude-separation at the pole. The Polar Ozone and Aerosol Measurement (POAM) III instrument was a solar-occultation instrument mounted on the SPOT-4 sun-synchronous, low-earth-orbit satellite (Lucke et al., 1999). Operating between March 1998 and Decem-ber 2005, it had nine channels in the visual and near infrared range. Usingoptimal estimation (Rodgers, 2000), ozone proﬁles have been retrieved within apair of narrow latitude bands in both the Arctic and Antarctic (Lumpe et al.,2002). The instrument is capable of returning 28 or 29 measurements per day,alternating between Northern and Southern hemisphere, however because of amalfunction in the instrument, it normally operates in only one or the otherhemisphere for longer periods. Therefore, we conﬁne ourselves to earlier data,October and November 1998, when more frequent and diverse measurementsare available.

The National Center for Environmental Prediction (NCEP) supplies, free-of-charge, gridded (2.5 by 2.5 degrees longitude/latitude, 4 time daily), reanalyzed ills: PC Proxy

The World Ozone and Ultraviolet Data Centre (WOUDC) collects ozone sondedata from around the world (Hare et al., 2000). A list of all contributors isavailable on the website: http://woudc.org . The data is archived by Envi-ronment Canada. The location of all the launch stations used in the validationexercises is shown in Figure 1. Sonde locations are a good match for the POAMIII measurement locations since they are primarily at high latitudes with fewstations towards the equator.

For reference, examples of singular vectors derived from the transport map, R ( t, ∆ t ), where t is August 1, 1998 and ∆ t is sixty days, are shown in Figure2. If R ( t, ∆ t ) = U SV T , Figure 2a. shows the ﬁrst right singular vector, ~v (1) ,while Figure 2b. shows the corresponding left singular vector, ~u (1) ; Figure2c. shows the second right singular vector, ~v (2) , while Figure 2b. shows thecorresponding left singular vector, ~u (2) and so on. While the globally projectedﬁelds representing the right singular vectors appear to be quite similar and ills: PC Proxy − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . . . . . . . . . . . . . . . . . − . − . − . − . − . . . . . . . . . . . . . . . . . . a. Right SV 1 c. Right SV 2 e. Right SV 3 g. Right SV 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . . . . . . . . . . . . . . . . . . . . . . . . . . . . . − . − . − . − . − . − . − . − . − . − . − . − . . . . . . . . . . . . . . . . b. Left SV 1 d. Left SV 2 f. Left SV 3 h. Left SV 4Figure 2: An example of the singular vectors derived from a discrete transportmap. Sixty day integration started on August 1 1998.are indeed quite strongly correlated, the abstract vectors from which they arederived have negligible dot products. Two diﬀerently-initialized tracers, when integrated with the same wind ﬁeldsover a long time period, become correlated. As discussed in Section 2.3, thiscan be used to infer global ﬁelds of a long-lived tracer such as ozone based on onlya few sparse measurements (Allen and Nakamura, 2003; Randall et al., 2002).Figure 3 demonstrates this with the extreme example of an initially zonally-symmetric tracer and an initially meridionally-symmetric tracer. Tracers areadvected with National Center for Environmental Prediction (NCEP) reanalysis1 data at the 500 K isentrop (Kalnay et al., 1996).We also plot the correlation of the ﬁrst tracer with the largest singular vector.We see that, because of Equation (27), they too become correlated over time.This at least partially explains the eﬃcacy of the proxy tracer method. Theperiodic dips in correlation are not yet understood. A sample PC as comparedwith the tracer is shown in Figure 4.

Figure 5 plots the time evolution of the singular values. From this we cancalculate the Lyapunov spectrum by making straight line ﬁts of their logarithms.While the resulting ﬁelds may develop into complex fractals (Mills, 2009) theLyapunov spectrum shows that the tracer dynamics themselves are not truly ills: PC Proxy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . − . Figure 4: Comparison of a simulated tracer (a.) and the ﬁrst principal compo-nent (b.) for the same time integration. Integration started on January 1, 1997and continued until December 1, 1997, a period of 334 days. ills: PC Proxy ills: PC Proxy /π + 1 / ≈ . In this section we use PC proxy to reconstruct ozone ﬁelds from POAM IIIsatellite data. To perform the analysis, we need to choose an integration time, t n − t in Equation (13), as well as a measurement window. The integrationtime determines how long the tracer is advected before performing the SVD.Measurements are selected from within the measurement window. We alsoneed to choose a lead time which determines how far from the beginning of theintegration, t , the measurement window is centered. For this experiment, itwas centered at the end of the integration period. Placing the measurementwindow in the middle of the integration period was found to work poorly. TheLyapunov spectrum can help us select the number of singular vectors as it showshow many remain signiﬁcant at a given lead time–see Figure 5.Two dimensional ﬁelds were reconstructed daily on the 500 K isentrop be-tween September 26 and November 17, 1998, which is one of only a few periodsin which POAM III was operating in both hemispheres simultaneously. Thetracer simulation was run at a 50 by 50 resolution or 400 by 400 km at the poleswith a 1.2 hour Runge-Kutta time step for the back-trajectories and a 1 day Eu-lerian time step. The integration time was 60 days with the same lead time andﬁve principal components were used unless otherwise noted. The measurementwindow was two days.An example of a reconstructed ozone ﬁeld is shown in Figure 6 with Fig-ure 6a. using the classic proxy tracer method while Figure 6b. uses the PCproxy method. Because POAM III data is conﬁned to two rather narrow lati-tude bands near the poles, values near the equator may not be that accurate.Nonetheless, the author thought it important to test the method with globalreconstructions to see how well the PC proxy method can extrapolate to areaswhere measurements are sparse or non-existant. As we will see, PC proxy isnot only more accurate than the classic method, it can also be better at takinginto account non-local information when it does not become unstable due toover-ﬁtting. ills: PC Proxy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . a. Classic proxy tracer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . b. PC proxyFigure 6: Sample globally reconstructed ozone ﬁelds on the 500 K isentrop forOctober 1, 1998. ills: PC Proxy ills: PC Proxy ills: PC Proxy ills: PC Proxy ills: PC Proxy In the ﬁrst validation exercise, the POAM data was randomly separated intotwo equal groups, each of which was used to predict the other. Since the POAMmeasurements are closely grouped, falling into one of two narrow latitude bandsin the Arctic and Antarctic, and spaced at roughly 85 degrees longitude betweenconsecutive measurements, skill scores will tend to be quite high. PC proxy wascompared to a classic proxy tracer with a second order ﬁt. The same tracersimulation as used for PC proxy was used to generate equivalent latitude ﬁelds(Allen and Nakamura, 2003) for the classic proxy tracer but with a two yearspin-up. Unlike in Randall et al. (2002), the reconstruction was done over theentire globe. Reconstructed ozone ﬁelds were linearly interpolated to match thelocations of the sonde measured test group.Figure 7 shows the cross-validation results for classic proxy tracer whileFigure 8 shows those for principal component proxy. The correlation for theclassic method was 0.956, as shown in the scatter plot in Figure 7a. The biaswas -0.020 ppmv while the root-mean-square (RMS) error was 0.30 ppmv. Inthe histogram error plot in Figure 7b. the errors have been normalized by theoriginal error estimates for the POAM III retrievals. When this is done, thebias becomes -0.063 while the RMS error is 1.6.The correlation for the PC proxy method was 0.987, as shown in the scatterplot in Figure 7a. The bias was 0.0039 ppmv while the RMS error was 0.16ppmv. The normalized values are 0.021 for the bias and 0.86 for RMS error. Inother words, observed errors for the reconstructed ozone are better, on average,than the error bars from the original retrievals! On the other hand, the PCproxy method, while more accurate, appears to be less stable as the pair ofnegative values in 8a suggest.

In the second validation exercise, ozone ﬁelds were reconstructed from all avail-able POAM data and compared with radio-sonde measurements. Figure 9 showsthe results for the classic proxy tracer reconstruction validated against ozonesonde data from the WOUDC, while ﬁgure 10 shows the same for the PC proxymethod. Correlation for the classic method stands at 0.76, with a bias of -0.058ppmv and a RMS error of 0.59 ppmv while the PC proxy method provides acorrelation of 0.91, a bias of -0.0040 ppmv and a RMS error of 0.36 ppmv.For this case study, the PC proxy method is clearly superior. On the otherhand, because ozone values tend to be higher in the Northern hemisphere thanthey are in the South, skill scores may be unnaturally high. As pointed out inthe previous section, the original proxy tracer was typically only applied to onehemisphere at a time. To see how the two methods truly compare, we should alsoapply them to only the Northern or Southern hemisphere. Meanwhile, to get abetter idea of the skill of the methods, we should likewise restrict comparisonsto only a single hemisphere.To get a more rigorous evaluation of each method we shall conﬁne them to ills: PC Proxy ills: PC Proxy ills: PC Proxy

The raw skill scores listed in the previous section show the PC proxy method tobe more accurate than the classic proxy tracer method, even when the classicmethod is done only hemi-spherically while the PC reconstruction is performedglobally. A closer look at the scatter plots, Figures 12 and 11, in the previoussection suggest that this may not be so clear cut. Except for one outlier, theplot of the N. hemisphere reconstruction for the classic proxy in Figure 12a.appears to have less scatter than either the N. hemisphere reconstruction usingPC proxy in 12b. or the global reconstruction for the same restricted to the N.hemisphere in 11.On the other hand, when the reconstruction is done over the whole globe,PC proxy is clearly superior. It is also superior in the Southern hemisphere.We surmise that part of the advantage lies in having more degrees of freedomor at least, more meaningful degrees of freedom: more than one possible tracerconﬁguration is represented. Moreover, because it is using information fromshorter time scales, it is better able to ﬁt more active tracers that are changingmore rapidly, as ozone would be in the Southern hemisphere during winter time.Unfortunately, because of these extra degrees of freedom, the method canalso be more unstable. This is observed in negative values that showed up ﬁrstin the cross-validation exercise in Section 5.1, but also in ringing and regions ofnegative concentration towards the equator in some of the reconstructed ﬁelds. ills: PC Proxy ~w i , in Equation (17) to be a setof weights for integrating a column of air or performing a radiative transfersimulation rather than a set of interpolation coeﬃcients. Thanks to the National Center for Environmental Prediction and the NationalCenter for Atmospheric Research for the reanalysis data used in the simulations.Thanks also to World Ozone and Ultraviolet Data Center and EnvironmentCanada for ozone sonde data. And thanks especially to my former colleaguesat the Naval Research Laboratory for POAM III ozone data.Contour maps were created with Generic Mapping Tools (GMT) while scat-ter plots and historgrams were done in Open Oﬃce.

References

Allen, D. R. and Nakamura, N. (2003). Tracer Equivalent Latitude: A Di-agnostic Tool for Isentropic Transport Studies.

Journal of the AtmosphericSciences , 60:287–303.Butchart, N. and Remsberg, E. E. (1986). The area of the stratospheric polarvortex as a diagnostic for tracer transport on an isentropic surface.

Journalof the Atmospheric Sciences , 43:1319–1339.Golub, G. H. and van Loan, C. F. (1996).

Matrix Computations . Johns HopkinsUniversity Press. ills: PC Proxy

Quarterly Journal of theRoyal Meteorological Society , 111:877–946.Kalnay, E., Kanamitsu, M., Kistler, R., Collins, W., Deaven, D., Gandin, L.,Iredell, M., Saha, S., White, G., Woollen, J., Zhu, Y., Chelliah, M., Ebisuzaki,W., Higgins, W., Janowiak, J., Mo, K., Ropelewski, C., Wang, J., Leetmaa,A., Reynolds, R., Jenne, R., and Joseph, D. (1996). The NCEP/NCAR 40-year reanalysis project.

Bull. Amer. Meteor. Soc. , 77:437–470.Lehoucq, R. B. and Scott, J. A. (1996). An Evaluation of Software for Comput-ing Eigenvalues of Sparse Nonsymmetric Matrices. Technical Report MCS-P547-1195, Argonne National Laboratory.Lucke, R. L., Korwan, D. R., Bevilacqua, R. M., Hornstein, J. S., Shettle, E. P.,Chen, D. T., Daehler, M., Lumpe, J. D., Fromm, M. D., Debrestian, D.,Neﬀ, B., Squire, M., Knig-Langlo, G., and J. Davies, J. (1999). The PolarOzone and Aerosol Measurement(POAM) III instrument and early validationresults.

Journal of Geophysical Research , 104(D15):18785–18799.Lumpe, J. D., Bevilacqua, R. M., Hoppel, K. W., and Randall, C. E. (2002).POAM III retrieval algorithm and error analysis.

Journal of GeophysicalResearch , 107(D21):ACH5.1–ACH5.32.Mills, P. (2009). Isoline retrieval: An optimal method for validation of advectedcontours.

Computers & Geosciences , 35(20):2020–2031.Ott, E. (1993).

Chaos in Dynamical Systems . Cambridge University Press.Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1992).

Numerical Recipes in C . Cambridge University Press, 2nd edition.Randall, C. E., Lumpe, J. D., Bevilacqua, R. M., Hoppel, K. W., Fromm, M. D.,Salawitch, R. J., Swartz, W. H., Lloyd, S. A., Kyro, E., von der Gathen, P.,Claude, H., Davies, J., DeBacker, H., Dier, H., Molyneux, M. J., and Sanchoi,J. (2002). Reconstruction of three-dimensional ozone ﬁelds using POAM IIIduring SOLVE.

Journal of Geophysical Research , 107(D20):8299–8312.Rodgers, C. D. (2000).

Inverse Methods for Atmospheric Sounding: Theory andPractice . World Scientiﬁc.Tang, Y. R., Kleeman, R., and Miller, S. (2006). ENSO predictability of a FullyCoupled GCM Model Using Singular Vector Analysis.

Journal of Climate ,19(14):3361–3377. ills: PC Proxy A Derivations

A.1 Mass conservation

Suppose that: X i q i = const. (46)This will be true for non-divergent ﬂows on equal area grids. Then: X i X j r ij q j = X j q j (47) X j q j X i r ij − ! = 0 (48)Therefore: X i r ij = 1 (49)If (46) is true, then: dd t X i q i = 0 (50)is also be true. Continuing: X i d q i d t = 0 (51) X i X j a ij q j = 0 (52) X j q j X i a ij = 0 (53)which shows the second part of (29) and (30): X i a ij = 0 (54) A.2 Diﬀusion and the Lyapunov spectrum

As pointed out already, a discrete tracer mapping will always require someamount of diﬀusion. This means that the tracer conﬁguration will tend towardsa uniform distribution over time, that is, it will “ﬂatten out.” We can showthat, given the constraint in (46), a tracer ﬁeld with all the same values has thesmallest magnitude. Suppose there are only two elements in the tracer vector, ~q = { q, q } . The magnitude of the vector is: | ~q | = p q + q = √ q (55) ills: PC Proxy q , that nonethelesskeeps the sum of the elements constant: | q + ∆ q, q − ∆ q | = p ( q + ∆ q ) + ( q − ∆ q ) (56)= √ p q + (∆ q ) ≥ √ q (57)This will generalize to higher-dimensional vectors. In general, we can say that: ~qR T R~q ≤ | ~q | (58)Implying that for the eigenvalue problem, R T R~v = s ~vs ≤ ~q in terms of the right singularvalues, { ~v i } : ~q = X i c i ~v i (60)where { c i } are a set of coeﬃcients. Substituting this into the left-hand-side of(58): ~qR T R~q = X i c i ~v i X i c i s i ~v i (61)= X i X j c i c j s i ~v i ~v j (62)= X i X j c i c j s i δ ij (63)= X i c i s i (64)where δ is the Kronecker delta. Similarly, we can show that: ~q · ~q = X i c i (65)If we assume that s i ≤ i , then: X i c i s i ≤ X i c i (66)since each term on the left side is less-than-or-equal-to the corresponding termon the right side. Note that in order for the inequality in (66) to be broken,at least one singular value must be greater-than one. Therefore (58) is true forevery ~q if-and-only-if (59) is true for every s . In the language of set theory andﬁrst-order logic: ∀ ~q ∈ ℜ n ( | R~q | ≤ | ~q | ) ⇐⇒ ∀ s ∈ ℜ| R T R~v = s ~v ( s ≤≤