[PDF] Pinsker estimators for local helioseismology

Abstract

A major goal of helioseismology is the three-dimensional reconstruction of the three velocity components of convective flows in the solar interior from sets of wave travel-time measurements. For small amplitude flows, the forward problem is described in good approximation by a large system of convolution equations. The input observations are highly noisy random vectors with a known dense covariance matrix. This leads to a large statistical linear inverse problem. Whereas for deterministic linear inverse problems several computationally efficient minimax optimal regularization methods exist, only one minimax-optimal linear estimator exists for statistical linear inverse problems: the Pinsker estimator. However, it is often computationally inefficient because it requires a singular value decomposition of the forward operator or it is not applicable because of an unknown noise covariance matrix, so it is rarely used for real-world problems. These limitations do not apply in helioseismology. We present a simplified proof of the optimality properties of the Pinsker estimator and show that it yields significantly better reconstructions than traditional inversion methods used in helioseismology, i.e.\ Regularized Least Squares (Tikhonov regularization) and SOLA (approximate inverse) methods. Moreover, we discuss the incorporation of the mass conservation constraint in the Pinsker scheme using staggered grids. With this improvement we can reconstruct not only horizontal, but also vertical velocity components that are much smaller in amplitude.

Full PDF

PPinsker estimators for local helioseismology:inversion of travel times for mass-conserving ﬂows

Damien Fournier , Laurent Gizon , , , Martin Holzke ,Thorsten Hohage Institut f¨ur Numerische und Angewandte Mathematik, Georg-August-Universit¨at,Lotzestr. 16-18, D-37083 G¨ottingen Max-Planck-Institut f¨ur Sonnensystemforschung, Justus-von-Liebig-Weg 3, 37077G¨ottingen, Germany Institut f¨ur Astrophysik, Georg-August-Universit¨at G¨ottingen,Friedrich-Hund-Platz 1, 37077 G¨ottingen, Germany National Astronomical Observatory of Japan, Mitaka, Tokyo 181-8588, JapanE-mail: [email protected]

18 September 2018

Abstract.

A major goal of helioseismology is the three-dimensional reconstructionof the three velocity components of convective ﬂows in the solar interior from setsof wave travel-time measurements. For small amplitude ﬂows, the forward problemis described in good approximation by a large system of convolution equations. Theinput observations are highly noisy random vectors with a known dense covariancematrix. This leads to a large statistical linear inverse problem.Whereas for deterministic linear inverse problems several computationally eﬃcientminimax optimal regularization methods exist, only one minimax-optimal linearestimator exists for statistical linear inverse problems: the Pinsker estimator. However,it is often computationally ineﬃcient because it requires a singular value decompositionof the forward operator or it is not applicable because of an unknown noise covariancematrix, so it is rarely used for real-world problems. These limitations do not applyin helioseismology. We present a simpliﬁed proof of the optimality properties of thePinsker estimator and show that it yields signiﬁcantly better reconstructions thantraditional inversion methods used in helioseismology, i.e. Regularized Least Squares(Tikhonov regularization) and SOLA (approximate inverse) methods.Moreover, we discuss the incorporation of the mass conservation constraint in thePinsker scheme using staggered grids. With this improvement we can reconstructnot only horizontal, but also vertical velocity components that are much smaller inamplitude.

1. Introduction

Time-distance helioseismology aims at recovering the internal properties of the Sun frommeasurements of wave travel times between pairs of points [12]. The raw observationsin helioseismology are time sequences of images of the line-of-sight velocity on the a r X i v : . [ m a t h . NA ] F e b insker estimators for local helioseismology insker estimators for local helioseismology

2. Estimating ﬂows by local helioseismology

In local helioseismology, it is acceptable to consider small patches of the solar surfaceand to neglect solar curvature. The domain of interest is approximated by a Cartesianbox, with horizontal coordinates r = ( x, y ) and vertical coordinate (height) z . Let usdenote this domain by V . Typically, x and y span several hundreds of megameters and z several tens of megameters.The observables are time series of the line-of-sight velocities ψ ( r , t j ) at diﬀerentpoints r obtained from dopplergrams of the Sun’s surface taken by satellites atequidistant time points t j . From these quantities, we compute averaged travel times˜ τ a ( r ) at diﬀerent points r (and at time t , but we assume the time series ψ to bestationary) of the form˜ τ a ( r ) = (cid:88) j (cid:90) Cov( ψ ( r , t ) , ψ ( r + ˜ r , t + t j )) w a (˜ r , t j ) d˜ r , a = 1 , . . . , N a . (We reserve the symbol τ a for diﬀerences of ˜ τ a to a reference model.) The weights w a arechosen such that ˜ τ a ( r ) approximates a spatial average of the times a certain type of wavepacket needs to travel from point r to points r + ˜ r , see [5, 12] and the end of this sectionfor more details. Hence, what will be called travel times in the following are linearfunctionals of the covariance operator of the observable ψ , written as ˜ τ a = W a ( Cov [ ψ ])or ˜ τ = W ( Cov [ ψ ]) for the vector ˜ τ = (˜ τ a ) a =1 ..N a of all travel times.The observable ψ is the image of the wave displacement ξ = ξ ( r , z, t ) under anobservation operator T , i.e. ψ = T ξ . Ideally, ψ ( r , t ) = l ( r , t ) · ξ ( r , , t ) with the unit-length line-of-sight vector l ( r , t ), but in practice T also involves the point spread functionof the instrument. The wave displacement is linked to internal properties of the Sun via insker estimators for local helioseismology

4a PDE describing the wave propagation in the Sun [6]: L ξ := ρ ( ∂ t + Γ + v · ∇ ) ξ − ∇ (cid:0) ρc ∇ · ξ (cid:1) + ∇ ( ξ · ∇ P ) + ∇ · ( ρg ξ ) = f, where ρ is the density, c the sound speed, P the pressure, g the gravitational acceleration,Γ the damping, v the ﬂow, and f a (stochastic) source term responsible for the excitationof the seismic waves. Additional terms can be included to take into account the eﬀectsof rotation, magnetic ﬁeld or a more complex form of the gravitational term.Our aim is to recover the 3D ﬂow velocity ﬁeld v = ( v x , v y , v z ) from observed traveltimes ˜ τ . Inversion for other physical quantities can be performed analogously. We pointout that actually computations are performed in the frequency domain, but at leastformally we can write the forward operator as F ( v ) = W ( T L [ v ] − Cov[ f ]( L [ v ] − ) ∗ T ∗ ),so we have to solve the nonlinear operator equation ˜ τ = F ( v ) + n where n denotesnoise. Under the assumption that v is small compared to the local wave speed, whichis true at least in quiet parts of the Sun, the Born approximation F ( v ) ≈ F (0) + F (cid:48) [0] v is suﬃciently accurate [18], and we obtain the linear operator equation F (cid:48) [0] v = τ + n with τ := ˜ τ − F (0). The operator F (cid:48) [0] can be written as an integral operator, and dueto horizontal translation invariance the Schwartz kernel K of F (cid:48) [0] only depends on thediﬀerence r − r (cid:48) , so τ a ( r ) = (cid:90) V (cid:88) β ∈{ x , y , z } K a ; β ( r (cid:48) − r , z ) v β ( r (cid:48) , z ) d r (cid:48) dz + n a ( r ) , a = 1 , . . . , N a (1)(see [18]). The functions K a ; β are known as sensitivity kernels, but in contrast to theconvention used in helioseismology where K a ; β ( r (cid:48) − r , z ) is replaced by K a ; β ( r (cid:48) + r , z ),we use a standard convolution integral as it is mathematically more convenient. Theassumption that the kernels are invariant under horizontal translation is intimatelyconnected to the assumption that we are modeling only a small patch on the solarsurface.Due to mass conservation the ﬂow velocity satisﬁes the equationdiv ( ρ v ) = 0 , (2)where the mass density ρ is assumed to depend on z only. Note that this constraintreduces the eﬀective number of unknowns of the inverse problem by about one third.Besides the Born approximation we will use two further simplifying assumptions:The ﬁrst approximation consists in imposing periodic boundary conditions in thehorizontal variables. Since the kernels are localized, aliasing artifacts can be avoided byzero-padding, but, nevertheless, this approximation leads to a loss of information closeto the boundaries. We may assume without further loss of generality that the periodicitycell is [ − π, π ] in dimensionless coordinates. The second approximation consists in adiscrete treatment of the depth variable z . For simplicity, we assume that the v β ( r , · ) isrepresented by its values on a grid { z , . . . , z N z } and deﬁne v β,z j ( r ) := v β ( r , z j ).Then, (1) can be written as τ a ( r ) = N z (cid:88) j =1 (cid:88) β ∈{ x , y , z } (cid:0) K a ; β,z j ∗ v β,z j (cid:1) ( r ) + n a ( r ) , a = 1 , . . . , N a (3) insker estimators for local helioseismology ∗ denotes periodic convolution. Denoting by v k := (2 π ) − (cid:90) π − π (cid:90) π − π f ( r ) exp( − i r · k ) dxdy , k ∈ Z , the Fourier coeﬃcients of a periodic function f : ( R / Z ) → C , we can write (3)equivalently in Fourier space as τ a k = N z (cid:88) j =1 (cid:88) β ∈{ x , y , z } K a ; β,z j k v β,z j k + n a k , a = 1 , . . . , N a , k ∈ Z . (4)The problem is now decoupled for each spatial frequency k and can be written in amatrix form as τ k = K k v k + n k , k ∈ Z (5)where the quantities we want to recover have been reorganized in the column vectors v k = ( v β,z j k ) β,z j ∈ C N z , the observables are τ k = ( τ a k ) a ∈ C N a , and the Fouriertransformed convolution kernels are K k = ( K a ; β,z j k ) ∈ C N a × N z .The noise is assumed to be translation invariant, so the noise covariance matrix,Λ ab ( d ) = Cov[ n a ( r ) , n b ( r + d )] , a, b = 1 , . . . , N a does not depend on r . As a consequence, noise vectors n k , n k (cid:48) for diﬀerent spatialfrequencies k , k (cid:48) ∈ Z are uncorrelated, and the covariance matrix of n k is given byΛ k = (Λ ab k ) ab ∈ C N a × N a . An expression for these matrices was ﬁrst derived in [19] andgeneralized in [17] taking into account that the observation time is ﬁnite.For our computations we will use the kernels K from [32], which we are goingto describe brieﬂy. We consider a Cartesian patch of the solar surface containing 200 ×

200 pixels with a spatial sampling width of h x = 1 .

46 Mm. The vertical direction z is discretized with N z = 89 points using a variable step size as the variations arestronger close to the surface due to the density proﬁle. This variation of the massdensity of several orders of magnitude near the surface is one of the diﬃculties to invertfor velocities. The quantity v = ( v x , v y , v z ) we want to recover has thus 3 N z = 267degrees of freedom for each spatial frequency k .In order to improve the signal-to-noise ratio, certain averages of point-to-pointtravel times are used, for example between the center of a disk and all the pointslocated at a given radius of this disk. Such types of data are sensitive to in/out ﬂowsin this disk. Imposing other weights on the circle leads to data that are sensitive toEast-West or North-South ﬂows. Varying the center of this disk on the whole surfaceof the observational domain allow to build a map of observations. We use each of thesethree averaging schemes for 16 radii from 5 Mm to 20 Mm. Moreover, we use ﬁlters forf, p1, p2, p3, and p4 waves. (The ﬁrst one is a gravitational wave, and the latters areacoustic waves with 1,2,3 or 4 nodes.) This yields N a = 3 × × points on the solar surface. Thus, the kernels K k are of thesize 240 ×

267 for each of the 200 frequencies k .To provide some intuition for the problem we are solving, a representation of kernelsfor v x and v z using diﬀerent ﬁlters is given in Figure 1. The right column represents insker estimators for local helioseismology Figure 1. y = 0; columns 2 and 4: cuts at z = 0. The diﬀerent rows showdiﬀerent wave ﬁlters, from top to bottom: f, p1 and p3. To facilitate the comparisonthe amplitude of p1 (resp. p3) kernels have been multiplied by 2 (resp. 6). The largestthe radial order the deepest is the sensitivity. The columns 1 and 2 show East-Westaveraging schemes that are designed to be sensitive to v x velocities, and columns 3 and4 are in-out averaging sensitive v z velocities. cuts at z = 0 (surface of the Sun) for the part sensitive to v x (top) and v z (middle).The kernels are localized around the center indicating that the data are relatively closeto the quantities we want to infer for. The bottom plot shows the cross-talk betwween v z and v x i.e. how the data sensitive to v z are related to the ones for v x . One can seethat the amplitude is around ten times smaller than the one of the main kernel (top)and that the integral of this kernel is zero indicating that the average value of v z is notinﬂuenced by v x . The left and right columns of Figure 1 show the depth dependance ofthe kernels for diﬀerent type of waves. One can see the importance of using diﬀerentwaves in order to probe diﬀerent depths in the solar interior. However, all kernels areextremely sensitive to the surface making inversion at large depths highly ill-posed.An example of travel time map for a given ﬁlter is given in Figure 2 before addingthe noise and after. The noise level corresponds to data averaged over 4 days with atemporal sampling of 45 seconds. Even with such a long averaging time, one can seethat the noise is highly correlated, which underlines the importance of a good knowledgeof the noise covariance matrix as computed in [17, 19].

3. Classical inversion methods used in local helioseismology

Tikhonov regularization is generally called Regularized Least Squares (RLS) in thehelioseismology community. Since the problem decouples for all k ([24]), we can compute (cid:98) v RLS k := argmin v k (cid:20)(cid:13)(cid:13)(cid:13) Λ − k ( K k v k − τ k ) (cid:13)(cid:13)(cid:13) + α (cid:107) L k v k (cid:107) (cid:21) = (cid:0) K ∗ k Λ − k K k + αL ∗ k L k (cid:1) − K ∗ k Λ − k τ k (6)independently for all spatial frequencies k ∈ Z . Here α > L k is a regularization matrix that can be the identity or the discretized insker estimators for local helioseismology Figure 2.

Noisy synthetic travel times for a supergranule velocity model from Ref.[11]. The top label indicates the ﬁlter and the factor by which the data are multiplied(e.g. times 2 for p1 mode). The deeper the waves are travelling, the noisier theobservations. version of the gradient or the Laplacian in order to impose additional smoothness onthe solution.

Diﬀerent types of Optimally Localized Averaging (OLA) methods are used inhelioseismology. Recently, it was proposed to take advantage of the convolution inthe horizontal space and to propose a multichannel OLA [23]. Similar to the previousapproach the problem decouples for all frequencies and can be solved eﬃciently. We seekfor a linear combination of travel times via weighting matrices W k = ( W β,z j ; a k ) ∈ C N z × N a (the Fourier coeﬃcients of weighting kernels W ( r ) := (cid:80) k ∈ Z W k exp( i r · k ) with valuesin R N z × N a ) such that (cid:98) v k = W k τ k , k ∈ Z (7)is a good estimate of v k .Note from the second line in (6) that RLS is also of this form with W RLS k =( K ∗ k Λ − k K k + αL ∗ k L k ) − K ∗ k Λ − k . Inserting (4) into (7) yields (cid:98) v k = W k K k v k + W k n k . (8) Deﬁnition 3.1.

For a regularization method of the form (7) the function K ( r ) := (cid:80) k ∈ Z K k exp( i r · k ) with Fourier coeﬃcients K k := W k K k (9) and values in R N z × N z is called the averaging kernel of the method. (Often only speciﬁcrows of W and K corresponding to a speciﬁc depth z j and a Cartesian component β areconsidered. We will denote them by W [ β, z j ; :]( r ) and K [ β, z j ; :]( r ) .)insker estimators for local helioseismology E [ (cid:98) v ] and hence the bias E [ (cid:98) v ] − v of the estimator (cid:98) v is characterized by a convolution with the averaging kernel: E [ (cid:98) v ] = K ∗ v. To keep the bias small the diagonal entries ( α = β ) of the averaging kernel K α,z j ,β,z l ( r )should be well concentrated around z l ≈ z j and r = 0. The oﬀ-diagonal entries ( β (cid:54) = α )measure the leakage from one Cartesian component β to another component α andshould be small.The SOLA (Subtractive OLA) methods aims at ﬁnding rows of a weighting kernel W indexed by β, z j such that the corresponding rows of the averaging kernel K are asclose as possible to rows of a prescribed target function T ( r ) ∈ R N z × N z while keepingthe noise (last term in (8)) small. This can be achieved by setting W OLA k [ β, z j ; :] := argmin W ∈ C × Na (cid:2) (cid:107) W K k − T k [ β, z j ; :] (cid:107) + µW Λ k W ∗ (cid:3) (10)(see [23]) where µ > T β,z j ; α,z l ( r ) for α = β is generally chosen as aGaussian in ( r , z l ) around the point (0 , z j ). For α (cid:54) = β it is chosen as 0. Obviously, theconvex quadratic minimization problem (10) can be solved by solving the linear ﬁrstorder optimality conditions. We also mention the MOLA (Multiplicative OLA) [10]method which uses a product KT instead of the diﬀerence.These methods involve the target functions T and the parameter µ as parameters,the choices of which are not obvious and involve certain subjectivity. In the next section,we propose to use the Pinsker estimator that is optimal in the sense that it minimizesthe risk in a given class of functions.

4. Pinsker estimator

The problem described in Section 2 can be formulated as a linear operator equation τ = Kv + n. (11)in the Hilbert spaces X = L ([ − π, π ] ) N z and Y = L ([ − π, π ] ) N a with a compact,linear operator K : X → Y given by a matrix of convolution operators.We assume that the noise n is a Hilbert space process in Y with zero mean valueand known covariance operator Cov[ n ]. The modelling errors that are ignored in theassumption E [ n ] = 0 and references for Cov[ n ] have been discussed in Section 2.An estimator is an operator W : Y → X that maps observations τ to anapproximation W τ ∈ X of v . The risk (or expected square error) of an estimator W at v is deﬁned by R ( W, v ) = E (cid:2) (cid:107) W ( Kv + n ) − v (cid:107) (cid:3) . (12)If W is linear, the risk can be decomposed into a bias and a variance part using E [ n ] = 0: R ( W, v ) = (cid:107) ( W K − I ) v (cid:107) + E [ (cid:107) W n (cid:107) ] . (13) insker estimators for local helioseismology (cid:107) ( W K − I ) v (cid:107) describes how far W is from the inverse of the forward operatorwhile the variance term E [ (cid:107) W n (cid:107) ] = trace (Cov[ W n ]) = trace ( W ∗ Cov[ n ] W ) describesthe stochastic part of the error.The maximal risk of an estimator W on a set Θ ⊂ X is deﬁned as R ( W, Θ) = sup v ∈ Θ R ( W, v ) = sup v ∈ Θ E (cid:2) (cid:107) W ( Kv + n ) − v (cid:107) (cid:3) . (14)The minimax risk and the minimax linear risk on Θ are obtained by taking the inﬁmumover all estimators (or all linear estimator, resp.) of (14) R N (Θ) = inf W R ( W, Θ) , R L (Θ) = inf W linear R ( W, Θ) . (15)A linear estimator W that attains the inﬁmum in (15) is called a minimax linearestimator . To construct such an estimator for (11) we ﬁrst perform a whitening bymultiplying (11) from the left by Cov[ n ] − / to obtain˜ τ = ˜ Kv + ˜ n (16)where ˜ τ := Cov[ n ] − / τ and ˜ n := Cov[ n ] − / n is now a white noise process, i.e.Cov[˜ n ] = I Y . To ensure that ˜ K := Cov[ n ] − / K is well deﬁned, we assume that Cov[ n ]is strictly positive deﬁnite, i.e. every linear functional of τ contains a minimal ﬁxedamount of noise. Although this assumption could be relaxed, it is simple and intuitive,and also guarantees compactness of ˜ K . Hence, ˜ K admits a singular value decomposition { ( σ l , ϕ l , ψ l ) : l ∈ N } . This allows us to rewrite the operator equation (11) as a diagonaloperator equation in sequence spaces given by y l = σ l v l + n l (17)with observables y l := (cid:104) Cov[ n ] − / τ, ψ l (cid:105) Y and unknowns v l := (cid:104) v, ϕ l (cid:105) X . Due toGaussianity the noise ( n l ) l ∈ N is a sequence of uncorrelated N (0 ,

1) random variables.Let us consider linear diagonal estimators of the formˆ v l := λ l σ l y l , W λ τ := (cid:88) l ∈ N λ l σ l (cid:104) Cov[ n ] − / τ, ψ l (cid:105) Y ϕ l (18)with weights λ l ∈ R . The risk R ( λ, v ) := R ( W λ , v ) of such estimators is given by R ( λ, v ) = (cid:88) l ∈ N (cid:20) (1 − λ l ) v l + λ l σ l (cid:21) . (19)We will consider ellipsoids of the formΘ := (cid:40) v ∈ X : ∞ (cid:88) l =1 a l v l ≤ Q (cid:41) (20)with Q > a l >

0. Then the risk R ( λ, Θ) := R ( W λ , Θ) is given by R ( λ, Θ) = Q sup l ∈ N (1 − λ l ) a l + ∞ (cid:88) l =1 λ l σ l . (21) Lemma 4.1.

Any minimax linear estimator must be of the diagonal form (18).insker estimators for local helioseismology Proof.

Note that since a l → ∞ , the supremum in (21) is attained at some index l λ ∈ N ,and R ( λ, Θ) = R ( λ, { v λ } ) with v λ = ( √ Q/a l λ ) ϕ l λ ∈ Θ. If a linear estimator W with anondiagonal (inﬁnite) matrix representation is replaced by its diagonal part diag( W ),the bias part of R ( W, { v diag( W ) } ) cannot increase and the variance part strictly decreases.Hence, R (diag( W ) , Θ) = R (diag( W ) , { v diag( W ) } ) < R ( W, { v diag( W ) } ) ≤ R ( W, Θ) , which shows that W is not minimax.Even though the following result is well-known, we would like to present a shortproof since we consider it more instructive and simpler than other proofs, e.g. in[2, 30, 36] (all for the equivalent regression problems version of the theorem). Inparticular, we derive the formulas (22) and (23) and not just verify them, and it becomesapparent that κ √ Q is the bound on the bias. Theorem 4.2 (Pinsker estimator) . Consider a sequence ( a l ) l ∈ N such that a l > and lim l →∞ a l = ∞ , and an ellipsoid of the form (20) with Q > . Then there exists aunique minimax linear estimator on Θ . It is of the form (18), and its weights are givenby λ l = max(1 − κa l , , (22) where the constant κ > is the unique solution of the equation κQ − ∞ (cid:88) l =1 a l σ l max(1 − κa l ,

0) = 0 . (23) The minimax linear risk is given by R L (Θ) = (cid:80) ∞ l =1 1 σ l max(1 − κa l , . Proof.

The inﬁmum of R ( λ, Θ) over all sequences λ can be reduced to the set Λ := { λ ∈ l ( N ) : (cid:107) λ (cid:107) ∞ ≤ } since R ( λ, Θ) = ∞ if λ / ∈ l ( N ) and R ( λ, Θ) strictly decreases ifsome λ j / ∈ [ − ,

1] is replaced by its metric projection onto [ − , (cid:91) <κ ≤ /a Λ κ with Λ κ := (cid:26) λ ∈ Λ : sup j ∈ N | − λ j | a j = κ (cid:27) with a := min j a j . In view of (21) we have R ( λ, Θ) = κ Q + (cid:80) ∞ l =1 ( λ l /σ l ) for λ ∈ Λ κ ,so the inﬁmum over λ ∈ Λ κ is attained if and only if λ l = argmin | − x |≤ κa l x =max(1 − κa l ,

0) for all l ∈ N . Note that this is (22) if κ = κ . Using this formulafor the minimizer we ﬁnd thatinf λ ∈ Λ κ R ( λ, Θ) = ϕ ( κ ) with ϕ ( κ ) := κ Q + ∞ (cid:88) l =1 max(1 − κa l , σ l . Therefore inf λ ∈ Λ R ( λ, Θ) = inf <κ ≤ /a ϕ ( κ ). Note that ϕ is strictly convex anddiﬀerentiable with ϕ (cid:48) ( κ ) given by the left hand side of (23) since the sum is ﬁnitein a neighborhood of any κ >

0. Moreover, lim κ (cid:38) ϕ ( κ ) = ∞ and ϕ (cid:48) (1 /a ) = Q/a > ϕ attains its inﬁmum on (0 , /a ] at the unique solution ¯ κ to ϕ (cid:48) ( κ ) = 0. insker estimators for local helioseismology κ there is also the following explicit formulaif the sequence ( a l ) is non-decreasing (see [36]): κ = (cid:80) Nj =1 a j σ j Q + (cid:80) Nj =1 a j σ j with N := max (cid:40) n ∈ N : n (cid:88) l =1 σ l a l ( a n − a l ) < Q (cid:41) . From a practical point of view, this formula is only useful if Q is known exactly. Butthis is a rather unrealistic assumption. Q should rather be seen as a regularizationparameter. But since there is a one-to-one correspondence between Q and κ via (23), itis much simpler to consider κ as regularization parameter. The choice of regularizationparameters is an important and well-studied problem, but since the focus of this paperis on the comparison of regularization methods, we do not further discuss it here.A comparison of the linear minimax risk R L with the nonlinear one was given in[30]. Under the additional assumptions that the noise is Gaussian and thatsup j ∈ N (cid:80) Jj =1 σ j sup j ≤ J σ j < ∞ , (24)then R L (Θ) ∼ R N (Θ) as the noise level tends to 0. Assumption (24) was later relaxed tosup j σ j /σ j +1 < ∞ [20]. This assumption is very plausible in the context of our problem.It remains to discuss the choice of the ellipsoid Θ. Without depth inversion, i.e. for N z = 1 and a scalar physical quantity, it is natural to deﬁne Θ in terms of some boundon the power spectrum of the form (cid:88) k ∈ Z γ ( k ) | v k | ≤ Q. E.g. for the choice γ ( k ) = (1 + | k | ) s the ellipsoids Θ are balls in the periodic Sobolev H s ([ − π, π ] ). In depth direction admissible choices of Θ are more diﬃcult to interpretsince the axes of the ellipsoid must coincide with the singular vectors of the forwardoperators.We choose the weights a l such that a l grows asymptotically as the weights γ ( k ) = γ ( k ( l )) on the L -Fourier coeﬃcients deﬁning an H ( V )-ball in a cuboid V ⊂ R as l → ∞ , i.e., a l = l / . (25)Here k ( l ) denotes an enumeration of the three-dimensional spatial frequencies such that | k ( l ) | is non-decreasing. Empirically, we observe that the singular values σ l of ourforward operator decay exponentially, i.e. σ l ∼ exp( − α − βl ) for some α ∈ R and β > k ( l ) (seeFigure 3).

5. Mass conservation constraint

In this section we discuss how the mass conservation constraint div ( ρ v ) = 0 mentionedin (2) can be incorporated into Tikhonov regularization and the Pinsker method. We insker estimators for local helioseismology Figure 3.

Left panel: Singular values of the forward operator with whitening. Notethe exponential decay of the singular values. Middle and right panel: Distribution ofsingular values in frequency space. The color at each spatial frequency k = ( k x , k y )represents the number of singular values > .

071 and > . σ = 41 . λ l ≥ . will assume that 0 < ρ min ≤ ρ ≤ ρ max < ∞ , that ρ is smooth and depends only on z .Then the inverse problem can be formulated as K v = τ subject to div ( ρ v ) = 0 . (26)In an abstract Hilbert space setting an equality constraint B v = 0 with a boundedlinear operator B : X → Z does not change much since we can simply replace X bythe null-space N ( B ) of B . However, as it is often inconvenient to explicitly construct abasis of N ( B ), it is preferable to work in the larger space X .E.g. statistical Tikhonov regularization with noise covariance operator Λ and a(diﬀerential) operator L mapping to a Hilbert space V applied to (26) reads v α = argmin B v =0 (cid:2) (cid:107) Λ − / K v − τ (cid:107) Y + α (cid:107) L v (cid:107) V (cid:3) . To treat the side condition we consider the corresponding Lagrange function L ( v , µ ) := (cid:107) Λ − / K v − τ (cid:107) Y + α (cid:107) L v (cid:107) V + (cid:104) µ, αB v (cid:105) Z with a Lagrange multiplier µ ∈ Z . Here B has been multiplied by the regularization parameter α to improve the condition numberof the optimality conditions ∂ L ∂ v = 0 and ∂ L ∂µ = 0. These then lead to the saddle pointequation (cid:32) K ∗ Λ − K + αL ∗ L αB ∗ αB (cid:33) (cid:32) v α µ (cid:33) = (cid:32) K ∗ Λ − τ (cid:33) . In this subsection we discuss a continuous treatment of the depth variable z . If V = [ − π, π ] × [ z N z , z ] is the domain of interest, we may choose X = (cid:8) v ∈ H ( V ) : v ( · ; z ) periodic, v z ( · , z ) = v z ( · , z N z ) = 0 (cid:9) . This choice of boundary conditions rules out coronal mass ejections, which are verysimple to detect and for which the Born approximation used in the derivation of theforward operator breaks down anyways. insker estimators for local helioseismology X with the norm (cid:107) v (cid:107) X := (cid:104) ρ v , ρ v (cid:105) / H where (cid:104) ρ v , ρ w (cid:105) H :=  (cid:88) β ∈{ x , y , z } (cid:104) ρv β , ρw β (cid:105) L ( V ) + (cid:104) grad ρv β , grad ρw β (cid:105) L ( V )  / . Under our assumptions on ρ the norms (cid:107) ρ v (cid:107) H and (cid:107) v (cid:107) H are equivalent, but since ρ varies over several orders of magnitude, the incorporation of ρ in the norm makes asigniﬁcant diﬀerence.Let us introduce the operators grad ρ v := grad ( ρ v ), curl ρ v := curl ( ρ v ), anddiv ρ v := div ( ρ v ). The following lemma summarizes the properties of the subspace N ( div ρ ) ⊂ X and will be proved in an appendix. Lemma 5.1. (i) For all v , w ∈ X we have (cid:88) β ∈{ x , y , z } (cid:104) grad ρ v β , grad ρ w β (cid:105) L ( V ) = (cid:104) curl ρ v , curl ρ w (cid:105) L ( V ) + (cid:104) div ρ v , div ρ w (cid:105) L ( V ) (27) (ii) There exists a constant c > such that the inequalities c (cid:107) ρ v (cid:107) H ≤ (cid:107) curl ρ v (cid:107) L + 1 | V | (cid:88) β ∈{ x , y } (cid:18)(cid:90) V ρv β d( r , z ) (cid:19) ≤ (cid:107) ρ v (cid:107) H (28) hold true for all v ∈ X with div ρ v = 0 .(iii) X (cid:5) := { v ∈ X : (cid:82) V ρv x d ( r , x ) = (cid:82) V ρv y d ( r , x ) = 0 } has the Helmholtz decomposition X (cid:5) = N (cid:5) ( div ρ ) ⊕ N (cid:5) ( curl ρ ) with N (cid:5) ( div ρ ) := { v ∈ X (cid:5) : div ρ v = 0 } and N (cid:5) ( curl ρ ) := { v ∈ X (cid:5) : curl ρ v = 0 } .These subspaces are orthogonal both with respect to the X inner product and theinner product (cid:104) ρ v , ρ w (cid:105) L ( V ) . We will choose L v := curl ( ρ v ). This means we do not incorporate the means ofthe horizontal velicity components into the penalty term, which are needed to obtaina norm on { v ∈ X : div ρ v = 0 } . This is justiﬁed as the data are sensitive to constanthorizontal velocities, i.e. K restricted to span { (1 /ρ, , , (0 , /ρ, } is bounded frombelow (see [13, § In this subsection we discuss a discrete approximation of the z -variable which inherits theessential properties of the continuous setting. We found this crucial for good numericalresults. Since ρ depends only on z , the constraint div ( ρ v ) = 0 separates into ik x ρv x k + ik y ρv y k + ∂ρv z k ∂z = 0 , k = ( k x , k y ) ∈ Z . Hence the only diﬀerence between a continuous and a discrete treatment of the (periodic)horizontal variables x and y is that in the former case inﬁnitely many spatial frequenciesmust be considered, and in the latter case only ﬁnitely many. insker estimators for local helioseismology z > z > . . . > z N z in verticaldirection we introduce the midpoints z j +1 / := ( z j + z j +1 ). The horizontal velocities willbe represented on { z / , . . . , z N z − / } whereas the vertical velocities will be representedby their values on { z , . . . , z n − } . Here the points z and z N z have been omitted due tothe Dirichlet boundary conditions for v z such that v x k , v y k ∈ V := C N z , v z k ∈ W := C N z − . These quantities will be indexed by v z k = ( v z k , , . . . , v z k ,N z − ) (cid:62) and v β k =( v β k , / , . . . , v β k ,N z − / ) (cid:62) , β = x, y . To deﬁne inner products on V and W we introduceweights δ j := z j − / − z j +1 / for j = 1 , . . . , N z − δ j +1 / := z j − z j +1 for j = 0 , . . . , N z −

1. Then we introduce Gram matrices G V := diag( δ / , . . . , δ N z − / ) and G W := diag( δ , . . . , δ N z − ) (29)deﬁning inner products (cid:104) v , v (cid:105) V := v ∗ G V v on V and (cid:104) v , v (cid:105) W := v ∗ G W v on W .Similarly, we deﬁne ρ j := ρ ( z j ) for j ∈ { , / , , . . . , N z } and the matrices M V ρ =diag( ρ / , . . . ρ N z − / ) and M W ρ = diag( ρ , . . . ρ N z − ). We approximate derivatives bythe ﬁnite diﬀerences ∂v z k ∂z ( z j +1 / ) ≈ v z k ,j − v z k ,j +1 δ j +1 / = ( D W z v z k ) j , ∂v β k ∂z ( z j ) ≈ v β k ,j − / − v β k ,j +1 / δ j = ( D V z v β k ) j for β = x , y with D W z ∈ C N z × ( N z − . = L ( W , V ) and D V z ∈ C ( N z − × N z . = L ( V , W ) givenby D V z := G − W  −

1. . . . . .1 −  and D W z := G − V  − −

1. . . . . .1 −  . These matrices are skew-adjoint with respect to the inner products in V and W since G V D W z = − ( G W D V z ) ∗ , and hence (cid:104) D W z w, v (cid:105) V = v ∗ G V D W z w = − v ∗ ( G W D V z ) ∗ w = − ( D V z v ) ∗ G W w = −(cid:104) w, D V z v (cid:105) W . (30)Now we introduce the following approximations to the div , grad , curl , and curl ∗ forthe spatial frequency k ∈ Z :div k := (cid:16) ik x I V ik y I V D W z (cid:17) : V × V × W → V , grad k := (cid:16) ik x I V ik y I V ( D V z ) (cid:62) (cid:17) (cid:62) : V → V × V × W curl k :=  − D V z ik y I W D V z − ik x I W − ik y I V ik x I V  : V × V × W → W × W × V , insker estimators for local helioseismology k :=  − D W z ik y I V D W z − ik x I V − ik y I W ik x I W  : W × W × V → V × V × W . Let us introduce the spaces X k := V × V × W and Y k := W × W × V , the multiplicationoperator M X ρ := blockdiag (cid:16) M V ρ , M V ρ , M W ρ (cid:17) , and the mappingsdiv ρ, k := div k M X ρ , curl ρ, k := curl k M X ρ , (31)curl ρ, k := ( M X ρ ) − curl k , grad ρ, k := ( M X ρ ) − grad k . The Gram matrices in X and Y are G X := ( M X ρ ) blockdiag( G V , G V , G W ) and G Y := blockdiag( G W , G W , G V ) . (32)These matrices have the following properties: Lemma 5.2. (i) div ρ, k curl ρ, k = div k curl k = 0 and curl ρ, k grad ρ, k = curl k grad k = 0 .(ii) N ( D W z ) = { } (iii) curl ρ, k is the adjoint of curl ρ, k with respect to the Gram matrices G X and G Y , i.e. G X curl ρ, k = ( G Y curl ρ, k ) ∗ , and similarly G V div ρ, k = − ( G X grad ρ, k ) ∗ .(iv) With respect to the Gram matrix G X we have the orthogonal decomposition X k = N ( div ρ, k ) ⊕ N ( curl ρ, k ) and N ( div ρ, k ) = R ( curl ρ, k ) for k (cid:54) = 0 . (33) Proof.

Part (i) can be veriﬁed by straightforward computations.Part (ii) is also easy to see, and part (iii) follows from (30).To show part (iv) we ﬁrst demonstrate that N ( curl ρ, k ) ∩ N ( div ρ, k ) = { } (34)Let v k = ( v x k , v y k , v z k ) ∈ N ( curl ρ, k ) ∩ N ( div ρ, k ). We only treat the case k x (cid:54) = 0 as thecase k y (cid:54) = 0 is analogous. The last line in curl ρ, k v k = 0 implies that k x M V ρ v y k = k y M V ρ v x k . (35)Together with the relation div ρ, k v k = 0 this yields ik x D W z M W ρ v z k = k M V ρ v x k + k x k y M V ρ v y k = | k | M V ρ v x k . (36)From the second line in curl ρ, k v k = 0 we obtain ik x M W ρ v z k = D V z M V ρ v x k , so D W z D V z M V ρ v x k = | k | M V ρ v x k . Together with (30) we ﬁnd that (cid:0) D V z ∗ G ∗ W D V z + | k | G − V (cid:1) M V ρ v x k = 0. Since the matrix onthe left hand side is strictly positive deﬁnite, it follows that v x k = 0. Now it follows frompart (ii), (35), (36) and k x (cid:54) = 0 that v y k = 0 and v z k = 0, completing the proof of (34).From parts (i) and (iii) we obtain N ( curl ρ, k ) ⊥ = R ( curl ρ, k ) ⊂ N ( div ρ, k )with orthogonality with respect to the inner product generated by G X . Together with(34) this implies (33). insker estimators for local helioseismology Remark 5.3.

Let us discuss the case k = 0 . We claim that in analogy to the continuoussituation we have N ( curl ρ, ) ∩ N ( div ρ, ) = (cid:8)(cid:0) c x ( M V ρ ) − e, c x ( M V ρ ) − e, (cid:1) : c x , c y ∈ C (cid:9) (37) where e ∈ V is the vector with all entries equal to . In fact, for v ∈ N ( curl ρ, ) ∩N ( div ρ, ) it follows follows from part (ii) and div ρ, v = 0 that v z = 0 . Note that N ( D W z ) = { c ( M V ρ ) − e : c ∈ C } . Now (37) follows from curl ρ, v = 0 . The projection matrices onto N ( curl ρ, k ) and N ( div ρ, k ) can be computed using aQR-decomposition of G / X curl ρ, k : G / X curl ρ, k = ( Q k ˜ Q k ) (cid:16) R k (cid:17) , P k := G − / X Q k Q ∗ k G / X , k (cid:54) = 0 . (38)Here R k has full row rank p , [ Q k ˜ Q k ] is unitary, and Q k has p columns. We summarizethe properties of P k : Lemma 5.4.

Let k (cid:54) = 0 . Then P k is a projection onto N ( div ρ, k ) (i.e. P k = P k and R ( P k ) = N ( div ρ, k ) ), and I − P k is a projection onto N ( curl ρ, k ) . P k is orthogonal bothwith respect to the inner product induced by G X (i.e. P ∗ k G X = G X P k ) and the semi-deﬁnitinner product induced by the (Hermitian) Gram matrix G H k ,ρ = G X curl ρ, k G Y curl ρ, k − G X grad ρ, k G V div ρ, k (i.e. P ∗ k G H k ,ρ = G H k ,ρ P k ).Proof. The identity P ∗ k G X = G / X Q k Q ∗ k G / X = G X P k is obvious from the deﬁnition. Wehave P k = G − / X Q k ( Q ∗ k Q k ) Q ∗ k G / X = P k , so P k is a projection, which implies that I − P k is a projection as well. Using Lemma 5.2, parts (iii) and (i) we obtain R ( P k ) = R ( G − / X Q k ) = R ( curl ρ, k ) = N ( div ρ, k ) . Moreover, R ( I − P k ) = R ( P k ) ⊥ = N ( div ρ, k ) ⊥ = N ( curl ρ, k ) using 5.2(iv) and theself-adjointness of P k in X . By Lemma 5.2(iii) we have G H k ,ρ = curl ∗ ρ, k G Y curl ρ, k +div ∗ ρ, k G V div ρ, k , so G H k ,ρ is Hermitian and positive semi-deﬁnite. Moreover, sincediv ρ, k P k = 0 we have P ∗ k G H k ,ρ = (cid:16) P ∗ k G / X (cid:17) (cid:16) G / X curl ρ, k (cid:17) G Y curl ρ, k = (cid:16) G / X Q k Q ∗ k (cid:17) ( Q k R k ) G Y curl ρ, k = G / X ( Q k R k ) G Y curl ρ, k = G X curl ρ, k G Y curl ρ, k = curl ∗ ρ, k G Y curl ρ, k . Since the right hand side of this equation is Hermitian, so is the left hand side, whichimplies P ∗ k G H k ,ρ = G H k ,ρ P k .For k = 0 we deﬁne P k as follows:  G / X curl ρ, (( M V ρ ) − e, , , ( M V ρ ) − e,  = ( Q ˜ Q ) (cid:32) R (cid:33) , P := G − / X Q Q ∗ G / X , (39)see Remark 5.3. In this case P is the orthogonal projection onto N ( div ρ, ) ⊕ (cid:8)(cid:0) c x ( M V ρ ) − e, c x ( M V ρ ) − e, (cid:1) : c x , c y ∈ C (cid:9) . insker estimators for local helioseismology Let us recall of the deﬁnition of the Generalized Singular Value Decomposition GSVD(see [38]): Let A ∈ R m,n and L ∈ R q,n be matrices with m ≥ n and rank( L ) = p . Thenthere exist unitary matrices U ∈ R m,m and V ∈ R q,q and an invertible matrix X ∈ R n,n such that A = U SX − and L = V CX − (40)where S = diag( s , ..., s n ) ∈ R m × n and C = diag( c , ..., c min( q,n ) ) ∈ R q × n with1 ≥ c ≥ . . . ≥ c p > c p +1 = . . . = 0. The generalized singular values σ i of ( A, L ) are σ i = s i /c i for i = 1 , . . . , p , and the generalized right singular vectors of ( A, L ) are the ﬁrst p columns x , . . . , x p of X . They satisfy the orthogonality relations x ∗ j L ∗ Lx k = c j δ j,k for j, k ∈ { , . . . , n } . If L = I then the GSVD and the SVD coincide (except for thatthe ordering of the singular values).We will set A := Λ − / k K k P k with P k deﬁned in (38) and L := (cid:32) G / Y curl ρ, k G / V div ρ, k (cid:33) . For k (cid:54) = 0 this yields vectors x , . . . , x dim( X k ) , v , . . . , v dim( X k ) , and u , . . . , u p and numbers s , . . . , s p , c , . . . , c p > − / k K k P k x j = s j u j , j = 1 , . . . , pLx j = c j v j , j = 1 . . . , dim( X k ) . For j ≤ p we have x j ∈ N (Λ − / k K k P k ) ⊥ ⊂ N ( P k ) ⊥ = N ( div ρ, k )with orthogonality w.r.t. the L -induced inner product and x ∗ j G H k ,ρ x j = (cid:107) Lx j (cid:107) = c j .Therefore x j /c j are the singular vectors of A w.r.t. this inner product, and the k -thFourier coeﬃcient of the Pinsker estimator is W k τ k = p (cid:88) j =1 max(1 − κa j , σ j (cid:104) u j , Λ − / k τ k (cid:105) x j c j = p (cid:88) j =1 max(1 − κa j , s j (cid:104) u j , Λ − / k τ k (cid:105) x j .

6. Numerical results

In the following we will compare RLS, SOLA and Pinsker methods for recovering three-dimensional velocity ﬁelds from travel time measurements on the solar surface.To compare the diﬀerent inversion methods on synthetic data, we use the velocitymodel presented in [11] which reproduces an average supergranule. Supergranulation isa convection pattern with an average life time of about 1 day and a characteristic lengthof around 30 Mm that is observed at the surface of the Sun. A representation of thevelocity ﬁeld v z and v x is given in Figures 4 and 6 (top rows). This velocity is built suchthat mass is conserved, which explains the decrease of the amplitude with depth due tothe strong density gradient. These velocities are then convolved with the kernels, andnoise is added according to (1) in order to obtain travel time maps as shown in Figure 2. insker estimators for local helioseismology Data : • kernels K k ∈ C M × N a and noise covariance matrices Λ k ∈ C N a × N a forall frequencies k ; • regularization parameter κ ; Result : linear estimator W k for all frequencies k set up Gram matrices G V , G W , G X , and G Y (eqs. (29), (32)); for k ∈ [ − N k / , . . . , N k / − do set up matrices div ρ, k , curl ρ, k and P k (eqs. (31), (38), (39)) ;[ U k , X k , V k , s k , c k ] = gsvd (cid:32) Λ − / k K k P k , (cid:32) G / Y curl ρ, k G / V div ρ, k (cid:33)(cid:33) ;( U k , X k v, V k are matrices with columns u k ,j , x k ,j , v k ,j ; s k , c k are vectors with entries s k ,j , c k ,j ); end Find bijective ordering l : [ − N k , . . . , N k − × { , . . . , M } → N k N z such that s k ,j c k ,j ≥ s ˜ k , ˜ j c ˜ k , ˜ j if l ( k , j ) ≤ l (˜ k , ˜ j ), c k ,j , c ˜ k , ˜ j > l ( k , j ) ≥ l (˜ k , ˜ j ) if c k ,j = 0 and c ˜ k , ˜ j > for k ∈ [ − N k / , . . . , N k / − do p k = max { j : c k ,j > } ; for j = 1 , . . . , p k do a k ,j := l ( k , j ) / ; λ k ,j := max(1 − κa k ,j ,

0) ; end W k = X k [: , p k ]diag (cid:16) λ k , s k , , . . . , λ k ,p k s k ,p k (cid:17) U k [: , p k ] ∗ Λ − / k ; endAlgorithm 1: Pinsker algorithm with mass conservation constraint.

In the RLS method we have chosen the regularization term as H norm in horizontal andvertical directions, and in the Pinsker method the ellipsoid Θ was chosen according to(25)) to approximate a ball in the Sobolev space H ( V ). The regularization parameters α and κ have been chosen by the discrepancy principle. Although the discrepancyprinciple performs poorly for high dimensional white noise (and is not even well-deﬁned in the inﬁnite-dimensional case), here the noise is suﬃciently correlated forthe discrepancy principle to work reasonably well. The SOLA weighting kernels areobtained by minimizing (10) with a target function T β,z j ; α,z l ( r ) = exp (cid:32) (cid:107) r (cid:107) s h + (cid:107) z j − z l (cid:107) s v (cid:33) δ αβ , where s h and s v determine the localization of the averaging kernels in the horizontal andvertical directions. As usual we added a constraint for k = 0 via Lagrange multipliers toensure that the integrals over the averaging kernels for horizontal velocities are 1. This insker estimators for local helioseismology K . To allow a fair comparison with RLS and Pinsker, we did not impose a strongadditional penalty to suppress cross-talk as in [32] since we found that this induces asigniﬁcant loss of resolution.It is well-known in helioseismology that RLS and SOLA can reconstruct horizontalvelocity components v x , v y fairly well, but perform poorly for the reconstructionof vertical velocity components. Figure 4 shows the reconstruction of v z for thediﬀerent methods without mass conservation. As expected, the results for Tikhonovregularization are poor except close to the surface. The SOLA method is a bit better atlarger depths and the Pinsker estimator leads to a clear improvement with almost correctreconstructions at z t = − . z t = − . − . − . v z , we look at thedepth localisation of the averaging kernels. To compare the diﬀerent estimators W β forsome velocity component v β , β ∈ { x , y , z } , we choose the parameters in these methodssuch that the variance E [ | ( W β n )( r , z t ) | ] at the target depth z t has the same value for allthe methods. (Due to translation invariance of the noise covariance structure this valueis independent of r .). Then we compare the corresponding averaging kernels K β,z t ; α,z j describing the bias (see Deﬁnition 3.1). In Figure 5, we represented the horizontal L norm of K z ,z t ;z ,z j as a function of the depth z j for RLS, SOLA and Pinsker methodsat two diﬀerent target depths z t . One can see that the averaging kernel for the RLSmethod is mostly localized close to the surface rather than at the target depth. Incontrast, the averaging kernel of Pinsker is much better localized at z t , but still exhibitssome sensitivity to the values close to the surface. Intermediately, the SOLA averagingkernel is localized at the correct depth, but is extremely broad, so the reconstruction of v z at the target depth z t is greatly inﬂuenced by the other depths.The reconstructions of the horizontal velocity v x by Tikhonov, SOLA and Pinskermethods are shown in Figure 6. As expected, all methods perform well. Surprisingly,from visual inspection Pinsker seems slightly less accurate than Tikhonov regularizationat z t = − . K x ,z t ;x ,z j ( x,

0) as a function of z j and x for three diﬀerent depths z t ∈{− . , − . , − . } for the RLS, SOLA and Pinsker estimators.The diﬀerences between the three methods are the more pronounced the greaterthe target depth z t , i.e. the greater the ill-posedness. The Pinsker averaging kernelsturn out to be the most localized, in particular in z direction while the SOLA averagingkernels are the least localized. RLS and Pinsker produce similar averaging kernels forthe v x estimators, which is consistent with the observed reconstructions. However, it issurprising that the reconstruction with the Pinsker method is not the best at − . insker estimators for local helioseismology E x a c t -40040 z t = -0.9 Mm z t = -3.5 Mm z t = -5.5 Mm R L S -40040 S O L A -40040 -40 0 40 P i n sk e r -40040 -20 0 20 40 60 -40 0 40 -40 0 40 Figure 4.

Vertical velocities v z ( x, y, z t ) in m/s of a supergranule model from Ref.[11] (top) and their reconstructions with RLS (2nd row), SOLA (3rd row) and Pinsker(bottom) at three diﬀerent depths z t ∈ {− . , − . , − . } . The circlesat radii 10 Mm and 20 Mm represent zero level lines of the exact solution. need to look at the cross-talk, i.e. how v y and v z inﬂuence the estimator of v x . Figure 8shows the averaging kernels K x ,z t ;y ,z j and K x ,z t ;z ,z j at a target depth of z t = − . K x,z t ; x,z j , as opposed to around10% for RLS and 5% for SOLA. Figure 9 shows the reconstruction of the vertical component of the velocity for theTikhonov and Pinsker methods with mass conservation constraint. It underlines theimportance of incorporating the constraint into the inversion process. The verticalvelocity is now properly reconstructed by both methods. insker estimators for local helioseismology z k [Mm]-10 -5 0 || K z , z t , z , z k || RLSPinskerSOLA z k [Mm]-10 -5 0 || K z , z t , z , z k || RLSPinskerSOLA

Figure 5.

Horizontal norm of the averaging kernels K z ,z t ;z ,z k as a function of z k for target depth z t ∈ {− . , − . } for the estimators RLS, SOLA andPinsker. In each panel the regularization parameters are chosen such that for all threeestimators W z of the vertical velocity the standard deviation at the target depth z t is(Var[( W z τ )( r , z t )]) / = 3m / s for all r for noise corresponding to an averaging time of4 days. To better compare all the methods, Figure 10 represents a cut of the vertical velocityat x = 0 and z t ∈ {− . , − . } . Incorporating mass conservation into Tikhonovleads to a quite good reconstruction with an amplitude of about 70% of the true one.Finally, Pinsker with mass conservation is almost perfect at the depths up to − . δ -peaks arenot divergence free. We redeﬁne the Fourier coeﬃcients of the averaging kernel as K k, div = ( I − P k ) + P k K k P k (41)where P k denotes the L -orthogonal projection on the nullspace of div ρ . This typeof kernel still characterizes the bias of regularization methods if they are appliedto solutions satisfying the mass conservation constraint and if P k is applied as apostprocessing step. Note that this deﬁnition implies that the Fourier coeﬃcients ofthe averaging kernel are non-zero even at high frequencies due to the identity term.Thus, these averaging kernels cannot be directly compared to the ones of Section 3(and thus to the ones classically used in helioseismology), but their deﬁnition using (41)is natural as the convolution of K div with the velocities characterizes the bias of themethod.In Figure 11 we plot at each voxel the Frobenius norm of the 3 × insker estimators for local helioseismology E x a c t -40040 z t = -0.9 Mm z t = -3.5 Mm z t = -5.5 Mm R L S -40040 S O L A -40040 -40 0 40 P i n sk e r -40040 -200 0 200 -40 0 40 -100 0 100 -40 0 40 -50 0 50 Figure 6.

Horizontal velocities v x ( x, y, z t ) in m/s of a supergranule model from Ref.[11] (top) and their reconstruction with diﬀerent methods: RLS, SOLA and Pinsker(from top to bottom) at three diﬀerent depths z t = − . − . − . in z -direction and at − . z -direction, which is not needed forreconstructing the supergranule model used as test example.

7. Conclusions

We have shown that Pinsker estimators yield signiﬁcantly better reconstructionsof vertical velocities from travel time maps than Tikhonov regularization, and isalso superior to Subtractive Optimally Localized Averaging. This is consistentwith theoretical optimality properties of these estimators. However, as soon as insker estimators for local helioseismology z t = − . z t = − . z t = − . S O L A R L S P i n s k e r Figure 7.

Averaging kernels K x ,z t ;x ,z j ( x,

0) (characterizing the v x inﬂuence onthe bias of the v x estimators) as functions of x and z j for target depths z t ∈{− . , − . , − . } for the estimators SOLA, RLS, and Pinsker. Thevariances of these estimators for each target depth are chosen to be of the same size. depth inversion is involved, no simple, precise characterization of the ellipsoids onwhich Pinsker method is optimal is available. This is the usual situation for allspectral regularization methods such as Tikhonov regularization, Showalter’s method,Landweber iteration, and many others in a deterministic context. As opposed to manyother real-world problems, Pinsker estimators are computationally eﬃcient and easy toimplement in the context of local helioseismology.The mass conservation constraint can be incorporated naturally into the Pinsker insker estimators for local helioseismology v x - v y crosstalk v x - v z crosstalk S O L A R L S P i n s k e r Figure 8.

Cross-talk averaging kernels K x ,z t ;y ,z j and K x ,z t ;z ,z j for z t = − . v y and v z on the bias of the v x -estimators. estimator leading to another signiﬁcant improvement of accuracy and resolution. Underrealistic noise levels this yields reliable estimators of vertical velocity components up toa depth of − . f and p to p modes.Alternatively, one may study an adaptive, data-driven choice of the size of theellipsoid in the Pinsker method, which may be interpreted as a regularization parameter.We plan to address this as well as the application to real data in future work. insker estimators for local helioseismology R L S d i v -40040 z t = -0.9 Mm z t = -3.5 Mm z t = -5.5 Mm-40 0 40 P i n sk e r d i v -40040 -20 0 20 40 60 -40 0 40 -40 0 40 Figure 9.

Reconstructions for v z as in Figure 4 but by imposing mass conservation(top: RLS, bottom: Pinsker). x [Mm] -50 0 50 v z [ m / s ] -20020406080100 ExactPinskerPinsker divRLSRLS divSOLA x [Mm] -50 0 50 v z [ m / s ] -20020406080100 ExactPinskerPinsker divRLSRLS divSOLA

Figure 10.

Comparison of the diﬀerent methods to reconstruct the vertical velocity v z ( x, , z t ) at z t = − . z t = − . z t = − . z t = − . z t = − . Figure 11.

Pointwise Frobenius norm of the 3x3 averaging kernels K div forthe Pinsker method with mass conservation at three diﬀerent depths z t ∈{− . , − . , − . } . insker estimators for local helioseismology Acknowledgement

The authors would like to thank Michal ˇSvanda for providing the source codes andthe sensitivity kernels used in the paper [32] and for helpful discussions. We wouldlike to thank Takashi Sekii for useful comments on an earlier version of the manuscript.We acknowledge ﬁnancial support from Deutsche Forschungsgemeinschaft through SFB-963 “Astrophysical Flow Instabilities and Turbulence” (Project A1). L.G. acknowledgessupport from the Center for Space Science, NYU Abu Dhabi Institute, Abu Dhabi.

Appendix

This appendix contains the proof of Lemma 5.1.

Proof.

We make the substitutions p = ρ v and q = ρ w . To prove (i), note that (cid:88) β ∈{ x , y , z } (cid:104) grad p β , grad q β (cid:105) L ( V ) = (cid:88) β,γ ∈{ x , y , z } ∂p β ∂γ ∂q β ∂γ and (cid:104) curl p , curl q (cid:105) L ( V ) + (cid:104) div p , div q (cid:105) L ( V ) = (cid:88) β,γ ∈{ x , y , z } (cid:90) V ∂p β ∂γ ∂q β ∂γ d x + (cid:88) β (cid:54) = γ (cid:90) V (cid:20) ∂p β ∂β ∂q γ ∂γ − ∂p β ∂γ ∂q γ ∂β (cid:21) d x For all terms in the second sum (coming among other terms from the curl part) wecan perform partial integrations without boundary terms to see that these terms vanish.(Note that this would not work without the Dirichlet boundary conditions for the z -components, e.g. for β = x and γ = z.) Therefore, the left hand sides of the last twoequations are equal.Part (ii): As (cid:107) p (cid:107) H ( V ) = (cid:88) β ∈{ x , y , z } (cid:107) p β (cid:107) L ( V ) + (cid:107) grad p β (cid:107) L ( V ) the second inequality in (28) follows from (27) and the Cauchy-Schwarz inequality (cid:82) V p β d( r , z ) ≤ (cid:107) p β (cid:107) L | V | / for β ∈ { x , y } .To prove the ﬁrst inequality in (28) it suﬃces to show that there exists a constant C ≥ (cid:107) p β (cid:107) L ≤ C (cid:107) grad p β (cid:107) L + C (1 − δ β,z ) | V | (cid:12)(cid:12)(cid:12)(cid:12)(cid:90) V p β d( r , z ) (cid:12)(cid:12)(cid:12)(cid:12) for all β ∈ { x , y , z } and all p ∈ X . For β = z this follows from the Poincar´e inequality due to the Dirichletboundary conditions, and for β ∈ { x , y } it is a consequence of the Poincar´e-Wirtingerinequality.Part (iii): To show orthogonality w.r.t. the inner product (cid:104) ρ v , ρ w (cid:105) L ( V ) , let v ∈ N (cid:5) ( curl ρ ). Then by potential theorems (see e.g. [29, Thm. 3.37]) we have ρ v = grad f for some f ∈ H ( V ). It follows by partial integration that (cid:104) ρ v , ρ w (cid:105) L ( V ) = insker estimators for local helioseismology (cid:82) V f div ρ w d x = 0 for all w ∈ N (cid:5) ( div ρ ) where the boundary terms vanish due tothe boundary conditions. Hence, v ⊥ N (cid:5) ( div ρ ) w.r.t. the weighted L inner product.Together with (27) we also obtain orthogonality w.r.t. the X inner product.Let v ∈ X (cid:5) satisfy (cid:104) ρ v , ρ w (cid:105) L ( V ) = 0 for all w ∈ N (cid:5) ( div ρ ) and all w ∈ N (cid:5) ( curl ρ ).We aim to show that v = 0. Since div ρ ( ρ − curl g ) = 0 and curl ρ ( ρ − grad f ) = 0for all smooth f and g vanishing at the boundaries, we may choose w = ρ − curl g or w = ρ − grad f and perform partial integrations to obtain (cid:104) curl ρ v , g (cid:105) L ( V ) = 0 and (cid:104) div ρ v , f (cid:105) = 0. Therefore curl ρ v = 0 and div ρ v = 0, and so v = 0 from part (ii). Thisshows that the sum of the nullspaces is dense in L (cid:5) ( V ). Since the nullspaces are closedand orthogonal in X (cid:5) , their sum equals X (cid:5) . References [1] G. E. Backus and J. F. Gilbert. Numerical applications of a formalism for geophysical inverseproblems.

Geophys. J. R. Astr. Soc. , 13:247–276, 1967.[2] E. N. Belitser and B. Y. Levit. On minimax ﬁltering over ellipsoids.

Math. Methods Statist. ,4(3):259–273, 1995.[3] N. Bissantz, T. Hohage, A. Munk, and F. Ruymgaart. Convergence rates of general regularizationmethods for statistical inverse problems and applications.

SIAM J. Numer. Anal. , 45:2610–2636,2007.[4] G. Blanchard and P. Math´e. Discrepancy principle for statistical inverse problems with applicationto conjugate gradient iteration.

Inverse Problems , 28(11):115011, 23, 2012.[5] T. J. Bogdan. A comment on the relationship between the modal and time-distance formulationsof local helioseismology.

The Astrophysical Journal , 477:475, 1997.[6] R. Burston, L. Gizon, and A. C. Birch. Interpretation of helioseismic travel times.

Space ScienceReviews , pages 1–19, 2015.[7] L. Cavalier. Nonparametric statistical inverse problems.

Inverse Problems , 24(3), 2008.[8] L. Cavalier and N. W. Hengartner. Adaptive estimation for inverse problems with noisy operators.

Inverse Problems , 21(4):1345–1361, 2005.[9] G. Chavent. Least-squares, sentinels and substractive [subtractive] optimally localized average.In ´Equations aux d´eriv´ees partielles et applications , pages 345–356. Gauthier-Villars, ´Ed. Sci.M´ed. Elsevier, Paris, 1998.[10] J. Christensen-Dalsgaard, J. Schou, and M. J. Thompson. A comparison of methods for invertinghelioseismic data.

Monthly Notices of the Royal Astronomical Society , 242:353–369, February1990.[11] D. E. Dombroski, A. C. Birch, D. C. Braun, and S. M. Hanasoge. Testing helioseismic-holographyinversions for supergranular ﬂows using synthetic data.

Sol. Phys. , 282:361–378, 2013.[12] T. L. Duvall, S. M. Jeﬀeries, J. W. Harvey, and M. A. Pomerantz. Time-distance helioseismology.

Nature , 362:430–432, 1993.[13] H. W. Engl, M. Hanke, and A. Neubauer.

Regularization of Inverse Problems . Kluwer AcademicPublisher, Dordrecht, Boston, London, 1996.[14] M. Ermakov. Asymptotically minimax and Bayes estimation in a deconvolution problem.

InverseProblems , 19(6):1339–1359, 2003.[15] M. S. Ermakov. On optimal solutions of the deconvolution problem.

Inverse Problems , 6:863–872,1990.[16] S. N. Evans and P. B. Stark. Inverse problems as statistics.

Inverse Problems , 18:R55–R97, 2002.[17] D. Fournier, L. Gizon, T. Hohage, and A. C. Birch. Generalization of the noise model for time-distance helioseismology.

Astron. Astrophys. , 567:A113, 2014. insker estimators for local helioseismology [18] L. Gizon and A. C. Birch. Time-distance helioseismology: the forward problem for randomdistributed sources. The Astrophysical Journal , 571:966–986, 2002.[19] L. Gizon and A. C. Birch. Time-distance helioseismology: Noise estimation.

The AstrophysicalJournal , 614:472–489, 2004.[20] G. K. Golubev and R. Z. Khasminski. Statistical approach to some inverse boundary problemsfor partial diﬀerential equations.

Probl. Inform. Transm. , 35(2):51–66, 1999.[21] D. A. Haber, B. W. Hindman, J. Toomre, and M. J. Thompson. Organized subsurface ﬂows nearactive regions.

Sol. Phys. , 220:371–380, 2004.[22] R. Hielscher and M. Quellmalz. Optimal molliﬁers for spherical deconvolution.

Inverse Problems ,31(8):085001, 28, 2015.[23] J. Jackiewicz, A. C. Birch, L. Gizon, S. M. Hanasoge, T. Hohage, J.-B. Ruﬃo, and M. Svanda.Multichannel three-dimensional SOLA inversion for local helioseismology.

Sol. Phys. , 276:19–33,2012.[24] B. H. Jacobsen, I. Møller, J. M. Jensen, and F. Eﬀersø. Multichannel deconvolution, MCD, ingeophysics and helioseismology.

Physics and Chemistry of the Earth, Part A: Solid Earth andGeodesy , 24(3):215–220, 1999.[25] A. G. Kosovichev. Tomographic imaging of the Sun’s interior.

Astrophys. J. Lett. , 461:L55, 1996.[26] Ja. Kuks and V. Olman. Linear minimax estimation of regression coeﬃcients. II.

Eesti NSV Tead.Akad. Toimetised F¨u¨us.-Mat. , 20:480–482, 1971.[27] J. L. Lions. Sur les sentinelles des syst`emes distribu´es. Le cas des conditions initiales incompl`etes.

C. R. Acad. Sci. Paris S´er. I Math. , 307(16):819–823, 1988.[28] A. K. Louis and P. Maass. A molliﬁer method for linear operator equations of the ﬁrst kind.

Inverse Problems , 6:427–440, 1990.[29] P. Monk.

Finite element methods for Maxwell’s equations . Numerical Mathematics and ScientiﬁcComputation. Oxford University Press, New York, 2003.[30] M. S. Pinsker. Optimal ﬁltering of square-integrable signals in Gaussian noise.

Probl. Peredachi.Inf. , 16:52–68, 1980.[31] T. Schuster.

The Method of Approximate Inverse: Theory and Applications . Springer LectureNotes in Mathematics 1906, 2007.[32] M. Svanda, L. Gizon, S. M. Hanasoge, and S. D. Ustyugov. Validated helioseismic inversions for3D vector ﬂows.

Astron. Astrophys. , 530(A148), 2011.[33] U. Tautenhahn. Optimality for ill-posed problems under general source conditions.

NumericalFunctional Analysis and Optimization , 19:377–398, 1998.[34] L. Tenorio. Statistical regularization of inverse problems.

SIAM Rev. , 43(2), 2001.[35] A. N. Tikhonov and V. Y. Arsenin.

Solution of Ill-posed Problems . Washington: Winston & Sons,1977.[36] A. B. Tsybakov.

Introduction to Nonparametric Estimation . Springer Series in Statistics. SpringerNew York, 2009.[37] G. M. Vainikko. On the optimality of methods for ill-posed problems.

Zeit. Anal. und ihreAnwend , 6:351–362, 1987.[38] C. F. Van Loan. Generalizing the singular value decomposition.

SIAM J. Numer. Anal. , 13:76–83,1976.[39] J. Zhao et al. Validation of time-distance helioseismology by use of realistic simulations of Solarconvection.