Inversion of noisy Radon transform by SVD based needlet
Gérard Kerkyacharian, George Kyriazis, Erwan Le Pennec, Pencho Petrushev, Dominique Picard
March 11, 2009
Abstract
A linear method for inverting noisy observations of the Radon transform is developed based on decomposition systems (needlets) with rapidly decaying elements induced by the Radon transform SVD basis. Upper bounds of the risk of the estimator are established in L^p (1 ≤ p ≤ ∞) norms for functions with Besov space smoothness. A practical implementation of the method is given and several examples are discussed.

1 Introduction

Reconstructing images (functions) from their Radon transforms is a fundamental problem in medical imaging and, more generally, in tomography. The problem is to find an accurate and efficient algorithm for approximating the function to be recovered from its Radon projections. In this paper, we consider the problem of inverting noisy observations of the Radon transform. As in many other inverse problems, there exists a basis which is fully adapted to the problem; in particular, the inversion in this basis is very stable: this is the Singular Value Decomposition (SVD) basis. The Radon transform SVD basis, however, is not quite suitable for decomposition of functions with regularities in other than L²-related spaces. In particular, the SVD basis is not quite capable of representing local features of images, which are especially important to recover.

The problem requires a special construction adapted to the sphere and the Radon SVD, since usual tensorized wavelets will never reflect the manifold structure of the sphere and will necessarily create unwanted artifacts, or will concentrate on special features (such as ridgelets...).

Our idea is to design an estimation method for inverting the Radon transform which has the advantage of the maximal localization of wavelet based methods combined with the stability and computability of the SVD methods. To this end we utilize the construction from [22] (see also [15]) of localized frames based on orthogonal polynomials on the ball, which are closely related to the Radon transform SVD basis.
As shown in the simulation section, the results obtained are quite promising.

To investigate the properties of this method, we perform two different studies. The first study is of a theoretical kind and investigates the possible losses (in expectation) of the method in the 'minimax framework'. This principle, fairly standard in statistics, consists in analyzing the mathematical properties of estimation algorithms via optimization of their worst case performances over large ensembles of parameters. We carry out this study in a random model which is also well known in statistics, the white noise model. This random model is a toy model, well accepted in statistics since the 1980s as an approximation of the 'real' model on scattered data. It is proved, for instance in [2], that the regression model with uniform design and the white noise model are close in the sense of Le Cam's deficiency, which roughly means that any procedure can be transferred from one model to the other with the same order of risk. This model has the main advantage of avoiding unnecessary technicalities. In this context we prove that over large classes of functions (described later) our method has optimal rates of convergence for all the L^p losses. To our knowledge, most of the parallel results are stated for L² losses, as in [18] for example; very few (if any) consider L^p losses, although these guarantee, for instance, that the procedure will be able to detect small features. Again, the problem of choosing appropriate spaces of regularity in this context is a serious question, and it is important to consider the spaces which may be the closest to our natural intuition: those which generalize to the present case the approximation properties shared by standard Besov and Sobolev spaces. We can also prove that our results apply for ordinary Besov spaces. In the case of large p, we observe new rates of convergence, along with edge effects induced by the geometry of the ball.
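In symbols, the minimax paradigm evaluates an estimator through its worst-case risk over a smoothness class F; a sketch of the standard definition (the notation below is ours, not the paper's):

```latex
% Minimax risk over a class F for the L^p loss: an estimator \hat f_\varepsilon
% is (rate) optimal if its worst-case risk matches, up to constants, the
% infimum over all measurable estimators based on the observation Y.
R_\varepsilon(F, p) \;=\; \inf_{\hat f}\; \sup_{f \in F}\;
    \mathbb{E}\, \| \hat f - f \|_p^p .
```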
These rates are interesting from a statistical point of view and should be compared with similar phenomena occurring in other inverse problems involving Jacobi polynomials (e.g. the Wicksell problem), see [14].

Our second study of the performances of our procedure is carried out on simulations. Since in practical situations scattered data are generally observed, we carried out our simulation study in the scattered data model. We basically compared our method to the SVD procedure, since it is the most commonly studied method in statistics, and the simulation study consistently shows quite good performances of our procedure and a comparison clearly in favor of our algorithm. One could object that it is a rather common opinion that 'one should smooth the SVD'. However, there are many ways to do so (for instance, we mention a parallel method, employing a similar idea for smoothing out the projection operator but without using the needlet construction and in a no-noise framework, which has been developed by Yuan Xu and his co-authors in [28, 29, 30]). Ours has the advantage of being optimal from at least one point of view, since we are able to obtain the right rates of convergence in L^p norms.

The paper is structured as follows. In Section 2 we introduce the model and the Radon transform Singular Value Decomposition. In Section 3 we describe the class of linear estimators built upon the SVD; we also give the needlet construction and introduce the needlet estimation algorithm. In Section 4 we establish bounds for the risk of this estimator over large classes of regularity spaces. Section 5 is devoted to the practical implementation and results of our method. Section 6 is an appendix where the proofs of some claims from Section 3 are given.

2 The model

2.1 The Radon transform

Here we recall the definition and some basic facts about the Radon transform (cf. [12], [20], [16]). Denote by B^d the unit ball in R^d, i.e. B^d = {x = (x_1, ..., x_d) ∈ R^d : |x| ≤ 1} with |x| = (Σ_{i=1}^d x_i²)^{1/2}, and by S^{d−1} the unit sphere in R^d. The Lebesgue measure on B^d will be denoted by dx, and the usual surface measure on S^{d−1} by dσ(x) (sometimes we will also deal with the surface measure on S^d, which will be denoted by dσ_d). We let |A| denote the measure |A| = ∫_A dx if A ⊂ B^d, as well as |A| = ∫_A dσ(x) if A ⊂ S^{d−1}.

The Radon transform of a function f is defined by

  Rf(θ, s) = ∫_{y ∈ θ^⊥, sθ+y ∈ B^d} f(sθ + y) dy,   θ ∈ S^{d−1}, s ∈ [−1, 1],

where dy is the Lebesgue measure of dimension d − 1 on θ^⊥ = {x ∈ R^d : ⟨x, θ⟩ = 0}. With a slight abuse of notation, we will rewrite this integral as

  Rf(θ, s) = ∫_{⟨y,θ⟩=s} f(y) dy.

It is easy to see (cf. e.g. [20]) that the Radon transform is a bounded linear operator mapping L²(B^d, dx) into L²(S^{d−1} × [−1, 1], dµ(θ, s)), where

  dµ(θ, s) = dσ(θ) ds / (1 − s²)^{(d−1)/2}.

2.2 Noisy observation of the Radon transform

We consider observations of the form

  dY(θ, s) = Rf(θ, s) dµ(θ, s) + ε dW(θ, s),   (2.1)

where the unknown function f belongs to L²(B^d, dx). The meaning of this equation is that for any φ(θ, s) in L²(S^{d−1} × [−1, 1], dµ(θ, s)) one can observe

  Y_φ = ∫ φ(θ, s) dY(θ, s) = ∫_{S^{d−1}×[−1,1]} Rf(θ, s) φ(θ, s) dµ(θ, s) + ε ∫ φ(θ, s) dW(θ, s) = ⟨Rf, φ⟩_µ + ε W_φ.

Here W_φ = ∫ φ(θ, s) dW(θ, s) is a Gaussian field of zero mean and covariance

  E(W_φ W_ψ) = ∫_{S^{d−1}×[−1,1]} φ(θ, s) ψ(θ, s) dσ(θ) ds / (1 − s²)^{(d−1)/2} = ⟨φ, ψ⟩_µ.

The goal is to recover the unknown function f from the observation of Y. As explained in the introduction, this model is a toy model, widely accepted in statistics as an approximation of the 'real' model of scattered data.
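As a concrete illustration of the Radon transform defined above (our own sketch, not from the paper): for d = 2 and f the indicator of the unit disc, Rf(θ, s) is the chord length 2√(1 − s²), which a direct numerical line integral reproduces. The function name `radon_disc` is ours.

```python
import numpy as np

def radon_disc(theta, s, n=20001):
    # Rf(theta, s) = integral of f(s*theta + t*theta_perp) over t,
    # here for f = indicator of the unit disc B^2
    perp = np.array([-theta[1], theta[0]])            # spans theta^perp
    t = np.linspace(-1.0, 1.0, n)
    pts = s * theta[None, :] + t[:, None] * perp[None, :]
    inside = ((pts ** 2).sum(axis=1) <= 1.0).astype(float)
    dt = t[1] - t[0]                                  # trapezoid rule by hand
    return dt * (inside.sum() - 0.5 * (inside[0] + inside[-1]))

theta = np.array([np.cos(0.3), np.sin(0.3)])
for s in (0.0, 0.5, 0.9):
    chord = 2.0 * np.sqrt(1.0 - s ** 2)               # exact chord length
    assert abs(radon_disc(theta, s) - chord) < 1e-3
```

The direction θ is arbitrary here because the disc is rotation invariant; only the offset s matters.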
The study is carriedout in this setting to avoid unnecessary technicalities.Our idea is to devise an estimation scheme which combines the stability andcomputability of SVD decompositions with the superb localization and multiscalestructure of wavelets. To this end we utilize a frame (essentially following theconstruction from [15]) with elements of nearly exponential localization which iscompatible with the SVD basis of the Radon transform. This procedure is also tobe considered as a first step towards a nonlinear procedure especially suitable tohandle spatial adaptivity since real objects frequently exhibit a variety of shapesand spatial inhomogeneity. The SVD of the Radon transform was first established in [5, 6, 17]. In this regardwe also refer the reader to [20, 28]. In this section we record some basic factsrelated to the Radon SVD and recall some standard definitions which will be usedin the sequel.
2.3.1 Jacobi and Gegenbauer polynomials

The Radon SVD bases are defined in terms of Jacobi and Gegenbauer polynomials. The Jacobi polynomials P_n^{(α,β)}, n ≥ 0, constitute an orthogonal basis for the space L²([−1, 1], w_{α,β}(t) dt) with weight w_{α,β}(t) = (1 − t)^α (1 + t)^β, α, β > −1. They are standardly normalized by P_n^{(α,β)}(1) = binom(n+α, n), and then [1, 10, 25]

  ∫_{−1}^{1} P_n^{(α,β)}(t) P_m^{(α,β)}(t) w_{α,β}(t) dt = δ_{n,m} h_n^{(α,β)},

where

  h_n^{(α,β)} = 2^{α+β+1}/(2n + α + β + 1) · Γ(n+α+1) Γ(n+β+1) / [Γ(n+1) Γ(n+α+β+1)].   (2.2)

The Gegenbauer polynomials C_n^λ are a particular case of Jacobi polynomials, traditionally defined by

  C_n^λ(t) = (2λ)_n / (λ + 1/2)_n · P_n^{(λ−1/2, λ−1/2)}(t),   λ > −1/2,

where by definition (a)_n = a(a+1)···(a+n−1) = Γ(a+n)/Γ(a). It is readily seen that C_n^λ(1) = binom(n+2λ−1, n) = Γ(n+2λ)/(n! Γ(2λ)) and

  ∫_{−1}^{1} C_n^λ(t) C_m^λ(t) (1 − t²)^{λ−1/2} dt = δ_{n,m} h_n^{(λ)}  with  h_n^{(λ)} = 2^{1−2λ} π / Γ(λ)² · Γ(n+2λ) / [(n+λ) Γ(n+1)].   (2.3)

2.3.2 Orthogonal polynomials on B^d and spherical harmonics on S^{d−1}

We detail the following well known notations which will be used in the sequel. Let Π_n(R^d) be the space of all polynomials in d variables of degree ≤ n. We denote by P_n(R^d) the space of all homogeneous polynomials of degree n, and by V_n(R^d) the space of all polynomials of degree n which are orthogonal to lower degree polynomials with respect to the Lebesgue measure on B^d; V_0 is the set of constants. We have the orthogonal decomposition

  Π_n(R^d) = ⊕_{k=0}^{n} V_k(R^d).

Also, denote by H_n(R^d) the subspace of all harmonic homogeneous polynomials of degree n, and by H_n(S^{d−1}) the restriction of the polynomials from H_n(R^d) to S^{d−1}. Let Π_n(S^{d−1}) be the space of restrictions to S^{d−1} of polynomials of degree ≤ n on R^d. As is well known,

  Π_n(S^{d−1}) = ⊕_{m=0}^{n} H_m(S^{d−1})

(the orthogonality is with respect to the surface measure dσ on S^{d−1}).

Let Y_{l,i}, 1 ≤ i ≤ N_{d−1}(l), be an orthonormal basis of H_l(S^{d−1}), i.e.

  ∫_{S^{d−1}} Y_{l,i}(ξ) Y_{l,i′}(ξ) dσ(ξ) = δ_{i,i′}.

Then the natural extensions of Y_{l,i} to B^d are defined by Y_{l,i}(x) = |x|^l Y_{l,i}(x/|x|) and satisfy

  ∫_{B^d} Y_{l,i}(x) Y_{l,i′}(x) dx = ∫_0^1 r^{d−1} ∫_{S^{d−1}} Y_{l,i}(rξ) Y_{l,i′}(rξ) dσ(ξ) dr = ∫_0^1 r^{d+2l−1} dr · δ_{i,i′} = δ_{i,i′} / (2l + d).

For more details we refer the reader to [8].

The spherical harmonics on S^{d−1} and the orthogonal polynomials on B^d are naturally related to Gegenbauer polynomials. If N_{d−1}(n) is the dimension of H_n(S^{d−1}), the kernel of the orthogonal projector onto H_n(S^{d−1}) can be written as (see e.g. [24])

  Σ_{i=1}^{N_{d−1}(n)} Y_{n,i}(ξ) Y_{n,i}(θ) = (2n + d − 2)/(d − 2) · 1/|S^{d−1}| · C_n^{(d−2)/2}(⟨ξ, θ⟩).
(2.4)

The "ridge" Gegenbauer polynomials C_n^{d/2}(⟨x, ξ⟩), ξ ∈ S^{d−1}, are orthogonal to Π_{n−1}(B^d) in L²(B^d), and the kernel L_n(x, y) of the orthogonal projector onto V_n(B^d) can be written in the form (see e.g. [21, 28])

  L_n(x, y) = (2n + d)/|S^{d−1}|² ∫_{S^{d−1}} C_n^{d/2}(⟨x, ξ⟩) C_n^{d/2}(⟨y, ξ⟩) dσ(ξ).   (2.5)

The following important identities are valid for ridge Gegenbauer polynomials:

  ∫_{B^d} C_n^{d/2}(⟨ξ, x⟩) C_n^{d/2}(⟨η, x⟩) dx = |S^{d−1}|/(2n + d) · C_n^{d/2}(⟨ξ, η⟩),   ξ, η ∈ S^{d−1},   (2.6)

and, for x ∈ B^d, η ∈ S^{d−1},

  ∫_{S^{d−1}} C_n^{d/2}(⟨ξ, x⟩) C_n^{d/2}(⟨ξ, η⟩) dσ(ξ) = |S^{d−1}| C_n^{d/2}(⟨η, x⟩),   (2.7)

see e.g. [21]. By (2.5) and (2.7),

  L_n(x, ξ) = (2n + d)/|S^{d−1}| · C_n^{d/2}(⟨x, ξ⟩),   ξ ∈ S^{d−1},

and again by (2.5),

  ∫_{S^{d−1}} L_n(x, ξ) L_n(y, ξ) dσ(ξ) = (2n + d) L_n(x, y).

Figure 1: A few Radon SVD basis elements (low quality figure due to arXiv constraints).

Assume that {Y_{l,i} : 1 ≤ i ≤ N_{d−1}(l)} is an orthonormal basis for H_l(S^{d−1}). Then it is standard and easy to see that the family of polynomials

  f_{k,l,i}(x) = (2k + d)^{1/2} P_j^{(0, l+d/2−1)}(2|x|² − 1) Y_{l,i}(x),   0 ≤ l ≤ k, k − l = 2j, 1 ≤ i ≤ N_{d−1}(l),

forms an orthonormal basis of V_k(B^d), see e.g. [8]. On the other hand, the collection

  g_{k,l,i}(θ, s) = [h_k^{(d/2)}]^{−1/2} (1 − s²)^{(d−1)/2} C_k^{d/2}(s) Y_{l,i}(θ),   k ≥ 0, l ≥ 0, 1 ≤ i ≤ N_{d−1}(l),

is obviously an orthonormal basis of L²(S^{d−1} × [−1, 1], dµ(θ, s)). Figure 1 displays a few f_{k,l,i} and illustrates their lack of localization. The following theorem gives the SVD decomposition of the Radon transform.

Theorem 2.1.
For any f ∈ L²(B^d),

  Rf = Σ_{k≥0} λ_k Σ_{0≤l≤k, k−l even} Σ_{1≤i≤N_{d−1}(l)} ⟨f, f_{k,l,i}⟩ g_{k,l,i},   (2.8)

and for any g ∈ L²(S^{d−1} × [−1, 1], dµ(θ, s)),

  R* g = Σ_{k≥0} λ_k Σ_{0≤l≤k, k−l even} Σ_{1≤i≤N_{d−1}(l)} ⟨g, g_{k,l,i}⟩_µ f_{k,l,i}.   (2.9)

Furthermore, for f ∈ L²(B^d),

  f = Σ_{k≥0} λ_k^{−1} Σ_{0≤l≤k, k−l even} Σ_{1≤i≤N_{d−1}(l)} ⟨Rf, g_{k,l,i}⟩_µ f_{k,l,i}.   (2.10)

In the above identities the convergence is in L² and

  λ_k² = 2^d π^{d−1} / [(k+1)(k+2)···(k+d−1)] = 2^d π^{d−1} / (k+1)_{d−1} ∼ k^{−d+1}.   (2.11)

Remark: Observe that if k ≥ 0, 0 ≤ l ≤ k, k − l is odd, and 1 ≤ i ≤ N_{d−1}(l), then R* g_{k,l,i} = 0. ⋆

For the proof of this result we refer the reader to [20] and [28]. We will only show in the appendix that λ_k has the value given in (2.11).

3 Linear estimators based on the SVD

In a general noisy inverse model

  dY_t = Kf dt + ε dW_t,

where K is a linear operator mapping f ∈ H to Kf ∈ K, with H and K two Hilbert spaces, the SVD yields a family of linear estimators via the following classical scheme. Suppose K has an SVD

  Kf = Σ_m σ_m ⟨f, e_m⟩ e*_m,   f ∈ H,

where {e_m} and {e*_m} are orthonormal bases for H and K, respectively, and K e_m = σ_m e*_m, K* e*_m = σ_m e_m, with K* the adjoint operator of K. We also assume that σ_m → 0. Then, for σ_m ≠ 0,

  ∫ e*_m dY_t = ∫ Kf · e*_m dt + ε ∫ e*_m dW_t = ∫ f · K* e*_m dt + ε ∫ e*_m dW_t = σ_m ⟨f, e_m⟩ + ε ∫ e*_m dW_t,

so that

  (1/σ_m) ∫ e*_m dY_t = ⟨f, e_m⟩ + (ε/σ_m) ∫ e*_m dW_t.   (3.1)

Going further, suppose that {φ_l} is a tight frame for H, so that for any f ∈ H,

  f = Σ_l α_l φ_l,   α_l = ⟨f, φ_l⟩.

We can represent φ_l in the basis {e_m}: φ_l = Σ_m ⟨φ_l, e_m⟩ e_m = Σ_m γ_{ml} e_m, and hence α_l = Σ_m γ_{ml} ⟨f, e_m⟩. On account of (3.1), this leads to the estimator

  f̂_N = Σ_{l ≤ N} α̂_l φ_l  with  α̂_l = Σ_m (γ_{ml}/σ_m) ∫ e*_m dY_t,   (3.2)

where N is a parameter. By (3.1) we have

  α̂_l = Σ_m γ_{ml} ⟨f, e_m⟩ + Σ_m (γ_{ml} ε/σ_m) ∫ e*_m dW_t = α_l + Z_l,

where Z_l has a normal distribution N(0, Σ_m γ_{ml}² ε²/σ_m²). In this scheme the factors 1/σ_m², which are inherent to the inverse model, bring instability by inflating the variance.

The selection of the frame {φ_l} is critical for the method described above. The standard SVD method corresponds to the choice φ_l = e_l. This SVD method is very attractive theoretically and can be shown to be asymptotically optimal in many situations (see Dicken and Maass [7] and Mathé and Pereverzev [19], together with their nonlinear counterparts Cavalier and Tsybakov [4], Cavalier et al. [3], Tsybakov [26], Goldenschluger and Pereverzev [11], Efromovich and Koltchinskii [9]). It also has the big advantage of performing a quick and stable inversion of the operator K. However, while the SVD bases are fully adapted to describe the operator K, they are usually not quite appropriate for an accurate description of the solution with a small number of parameters. Although the SVD method is suitable for estimating the unknown function f with an L² loss, it is rather inappropriate for other losses.
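A toy numerical illustration of the scheme (3.1)–(3.2) in the diagonal case φ_l = e_l (a sketch with made-up singular values and coefficients, not the Radon setting): the 1/σ_m factors inflate the noise, so the truncation level N trades bias against variance.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200
sigma = np.arange(1, m + 1, dtype=float) ** -1.0   # decaying singular values
theta = np.arange(1, m + 1, dtype=float) ** -1.5   # true coefficients <f, e_m>
eps = 1e-3

# discrete analogue of (3.1): y_m = sigma_m * theta_m + eps * z_m
y = sigma * theta + eps * rng.standard_normal(m)

def svd_estimate(y, sigma, N):
    # truncated SVD estimator: invert only the N most stable components;
    # each kept component carries noise variance eps^2 / sigma_m^2
    hat = np.zeros_like(y)
    hat[:N] = y[:N] / sigma[:N]
    return hat

risk = {N: np.sum((svd_estimate(y, sigma, N) - theta) ** 2) for N in (5, 20, 200)}
# a moderate truncation beats both keeping too few and too many components
assert risk[20] < risk[5] and risk[20] < risk[200]
```

Keeping all components (N = 200) lets the 1/σ_m² inflation dominate; keeping very few leaves a large approximation bias.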
It is also restricted to functions which are well represented in terms of the Sobolev space associated to the SVD basis. Switching to an arbitrary frame {φ_l}, however, may yield additional instability through the factors γ_{ml}.

Our idea is to utilize a frame {φ_l} which is compatible with the SVD basis {e_m}, allowing to keep the variance within reasonable bounds, and whose elements have superb space localization and smoothness, guaranteeing excellent approximation of the unknown function f. In the following we implement the method described above for the inversion of the Radon transform. We build upon the frames constructed in [22], called "needlets".

3.2 The needlet construction

In this part we construct the building blocks of our estimator. We essentially follow the construction from [22].

3.2.1 The projector L_k on V_k(B^d)

Let {f_{k,l,i}} be the orthonormal basis of V_k(B^d) defined above, and denote by T_k the index set of this basis, i.e. T_k = {(l, i) : 0 ≤ l ≤ k, l ≡ k (mod 2), 1 ≤ i ≤ N_{d−1}(l)}. Also, set ν = d/2 − 1. Then the orthogonal projector of L²(B^d) onto V_k(B^d) can be written in the form

  L_k f = ∫_{B^d} f(y) L_k(x, y) dy  with  L_k(x, y) = Σ_{(l,i) ∈ T_k} f_{k,l,i}(x) f_{k,l,i}(y).

Using (2.4), L_k(x, y) can be written in the form

  L_k(x, y) = (2k + d) Σ_{l ≤ k, k−l = 2j} P_j^{(0, l+ν)}(2|x|² − 1) |x|^l P_j^{(0, l+ν)}(2|y|² − 1) |y|^l Σ_i Y_{l,i}(x/|x|) Y_{l,i}(y/|y|)
           = (2k + d)/|S^{d−1}| Σ_{l ≤ k, k−l = 2j} P_j^{(0, l+ν)}(2|x|² − 1) |x|^l P_j^{(0, l+ν)}(2|y|² − 1) |y|^l · (l + ν)/ν · C_l^ν(⟨x/|x|, y/|y|⟩).   (3.3)

Another representation of L_k(x, y) has already been given in (2.5). Clearly

  ∫_{B^d} L_m(x, z) L_k(z, y) dz = δ_{m,k} L_m(x, y)   (3.4)

and, for f ∈ L²(B^d),

  f = Σ_{k≥0} L_k f  and  ‖f‖₂² = Σ_k ‖L_k f‖₂² = Σ_k ⟨L_k f, f⟩.   (3.5)

3.2.2 Smoothing

Let a ∈ C^∞[0, ∞) be a cut-off function such that 0 ≤ a ≤ 1, a(t) = 1 for t ∈ [0, 1/2], and supp a ⊂ [0, 1]. For j ≥ 0 define the operator A_j on L²(B^d) by

  A_j f(x) = Σ_{k≥0} a(k/2^j) L_k f(x) = ∫_{B^d} A_j(x, y) f(y) dy  with  A_j(x, y) = Σ_k a(k/2^j) L_k(x, y).

Also, define B_j f = A_{j+1} f − A_j f. Then, setting b(t) = a(t/2) − a(t), we have

  B_j f(x) = Σ_k b(k/2^j) L_k f(x) = ∫_{B^d} B_j(x, y) f(y) dy  with  B_j(x, y) = Σ_k b(k/2^j) L_k(x, y).

Evidently, for f ∈ L²(B^d),

  ⟨A_j f, f⟩ = Σ_k a(k/2^j) ⟨L_k f, f⟩ ≤ ‖f‖₂²   (3.6)

and

  lim_{j→∞} ‖A_j f − f‖₂ = lim_{j→∞} ‖(A_0 + Σ_{m=0}^{j−1} B_m) f − f‖₂ = 0.   (3.7)

An important result from [22] (see also [15]) asserts that the kernels A_j(x, y), B_j(x, y) have nearly exponential localization: for any M > 0 there is a constant C_M > 0 such that

  |A_j(x, y)|, |B_j(x, y)| ≤ C_M 2^{jd} / [(1 + 2^j d(x, y))^M √(W_j(x)) √(W_j(y))],   x, y ∈ B^d,   (3.8)

where W_j(x) = 2^{−j} + √(1 − |x|²), |x| = |x|_d = (Σ_{i=1}^d x_i²)^{1/2}, and

  d(x, y) = Arccos(⟨x, y⟩ + √(1 − |x|²) √(1 − |y|²)),   ⟨x, y⟩ = Σ_{i=1}^d x_i y_i.   (3.9)

The left part (1) of Figure 2 illustrates this concentration: it displays the influence of a point x on the value of B_j f at a second point y, namely the values of B_j(x, y) for a fixed y and j = 4. This influence peaks at y and vanishes exponentially fast as soon as one moves away from y. The central part (2) of Figure 2 shows how the concentration changes when j is set to a larger value (j = 6). The right part (3) of Figure 2 shows the lack of concentration of B_j when the cut-off function a is far from being C^∞: the resulting kernel still peaks at y, but the value of B_j f at y is strongly influenced by values far away from y.

Figure 2: Smoothing kernel B_j(x, y) for a fixed y, for (1) a C^∞ cut-off function a with j = 4, (2) the same a with j = 6, and (3) a non-smooth cut-off function a with j = 4 (low quality figure due to arXiv constraints).

Remark: At this point it is important to notice the following correspondence, which will be used in the sequel. For S^d_+ = {(x, z) ∈ R^d × R_+ : |x|_d² + z² = 1}, we have the natural bijection x ∈ B^d ↔ x̃ = (x, √(1 − |x|²)), and d(x, y) = d_{S^d_+}(x̃, ỹ), where d_{S^d_+} is the geodesic distance on S^d_+. ⋆

3.2.3 Approximation properties of A_j

Here we discuss the approximation properties of the operators {A_j}. We will show that, in a sense, they are operators of "near best" polynomial L^p-approximation. Denote by E_n(f, p) the best L^p-approximation of f ∈ L^p(B^d) from Π_n, i.e.

  E_n(f, p) = inf_{P ∈ Π_n} ‖f − P‖_p.   (3.10)

Estimate (3.8) yields (cf. [22, Proposition 4.5])

  ∫_{B^d} |A_j(x, y)| dy ≤ c*,   x ∈ B^d, j ≥ 0,

where c* is a constant depending only on d. Therefore, the operators A_j are (uniformly) bounded on L¹(B^d) and L^∞(B^d), and hence, by interpolation, on L^p(B^d), 1 ≤ p ≤ ∞:

  ‖A_j f‖_p ≤ c* ‖f‖_p,   f ∈ L^p(B^d).   (3.11)

On the other hand, since a(t) = 1 on [0, 1/2], we have A_j P = P for P ∈ Π_{2^{j−1}}. We use this and (3.11) to obtain, for f ∈ L^p(B^d) and an arbitrary polynomial P ∈ Π_{2^{j−1}},

  ‖f − A_j f‖_p ≤ ‖f − P‖_p + ‖A_j(P − f)‖_p ≤ (1 + c*) ‖f − P‖_p = K ‖f − P‖_p.

Consequently, ‖f − A_j f‖_p ≤ K E_{2^{j−1}}(f, p). In the opposite direction, evidently A_j f ∈ Π_{2^j}, and hence E_{2^j}(f, p) ≤ ‖f − A_j f‖_p. Therefore, for f ∈ L^p(B^d), 1 ≤ p ≤ ∞,

  E_{2^j}(f, p) ≤ ‖f − A_j f‖_p ≤ K E_{2^{j−1}}(f, p).   (3.12)

These estimates do not tell the whole story about the approximation power of A_j. Because of the superb localization of the kernel A_j(x, y), the operator A_j provides far better rates of approximation than E_{2^{j−1}}(f, ∞) away from the singularities of f. In contrast, the kernel S_j(x, y) = Σ_{0 ≤ k ≤ 2^j} L_k(x, y) of the orthogonal projector S_j onto Π_{2^j} is poorly localized, and hence S_j is useless for approximation in L^p, p ≠ 2. This partially explains why the traditional SVD estimators perform poorly in L^p norms when p ≠ 2.

3.2.4 Splitting of the kernels

Let us define

  C_j(x, y) = Σ_m √(a(m/2^j)) L_m(x, y)  and  D_j(x, y) = Σ_m √(b(m/2^j)) L_m(x, y).

Note that C_j and D_j have the same localization as A_j, B_j in (3.8) (cf. [22]). Using (3.4), we get the desired splitting

  A_j(x, y) = ∫_{B^d} C_j(x, z) C_j(z, y) dz   (3.13)

and

  B_j(x, y) = ∫_{B^d} D_j(x, z) D_j(z, y) dz.   (3.14)

Obviously z ↦ C_j(x, z) C_j(z, y) is a polynomial of degree < 2^{j+1} and z ↦ D_j(x, z) D_j(z, y) is a polynomial of degree < 2^{j+2}. The next step is to discretize the kernels A_j(x, y) and B_j(x, y).
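The Littlewood-Paley structure of the pair (a, b) can be checked numerically; below is a sketch with one concrete C^∞ choice of a (any function with a = 1 on [0, 1/2] and supp a ⊂ [0, 1] works; this particular bump is our choice, not the paper's). Since b(t) = a(t/2) − a(t), the sum Σ_{j≥0} b(2^{−j} t) telescopes to 1 − a(t), which equals 1 for t ≥ 1, as used around (3.7).

```python
import math

def h(u):
    # smooth step: 0 for u <= 0, 1 for u >= 1, C-infinity in between
    f = lambda v: math.exp(-1.0 / v) if v > 0 else 0.0
    return f(u) / (f(u) + f(1.0 - u))

def a(t):
    # C-infinity cut-off: a = 1 on [0, 1/2], supp a in [0, 1]
    if t <= 0.5:
        return 1.0
    if t >= 1.0:
        return 0.0
    return h(2.0 * (1.0 - t))   # smooth descent from 1 to 0 on [1/2, 1]

def b(t):
    # b(t) = a(t/2) - a(t), supported in [1/2, 2]
    return a(t / 2.0) - a(t)

# partition of unity on [1, infinity): sum_j b(t / 2^j) = 1 - a(t) = 1
for t in (1.0, 2.5, 37.0, 1000.0):
    total = sum(b(t / 2.0 ** j) for j in range(60))
    assert abs(total - 1.0) < 1e-12
```

The telescoping is exact in floating point here because t/2^j is an exact power-of-two division, so each a-value cancels with its neighbor.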
3.2.5 A cubature formula on B^d

To construct the needlets on B^d we need one more ingredient: a cubature formula on B^d exact for polynomials of a given degree. Recall first the bijection between the ball B^d (equipped with the usual Lebesgue measure) and the unit upper hemisphere in R^{d+1},

  S^d_+ = {(x, y) : x ∈ R^d, 0 ≤ y ≤ 1, |x|_d² + y² = 1},

equipped with the usual surface measure dσ:

  T : (x, y) ∈ S^d_+ ↦ x ∈ B^d,   T^{−1} : x ∈ B^d ↦ x̃ = (x, √(1 − |x|_d²)) ∈ S^d_+.

Applying the substitution T one has (see e.g. [27])

  ∫_{S^d_+} F(x, y) dσ(x, y) = ∫_{B^d} F(x, √(1 − |x|²)) dx / √(1 − |x|²)   (3.15)

and hence, for f : R^d → R,

  ∫_{(x,y) ∈ S^d_+} f(x) y dσ_d(x, y) = ∫_{B^d} f(x) √(1 − |x|_d²) dx / √(1 − |x|_d²) = ∫_{B^d} f(x) dx.   (3.16)

Therefore, given a cubature formula on S^d one can easily derive a cubature formula on B^d. Indeed, suppose we have a cubature formula on S^d_+ exact for all polynomials of degree n + 1, i.e. there exist χ̃_n ⊂ S^d_+ and coefficients ω_ξ̃ > 0, ξ̃ ∈ χ̃_n, such that

  ∫_{S^d_+} P(u) dσ(u) = Σ_{ξ̃ ∈ χ̃_n} ω_ξ̃ P(ξ̃)   for all P ∈ Π_{n+1}(R^{d+1}).

If P ∈ Π_n(R^d), then P(x) y ∈ Π_{n+1}(R^{d+1}) and hence

  Σ_{ξ̃ ∈ χ̃_n} ω_ξ̃ P(ξ) √(1 − |ξ|²) = ∫_{S^d_+} P(x) y dσ = ∫_{B^d} P(x) dx.

Thus the projection χ_n of χ̃_n on B^d is the set of nodes, and the associated coefficients ω_ξ = √(1 − |ξ|²) ω_ξ̃ induce a cubature formula on B^d exact for Π_n(R^d). The following proposition follows from results in [22] and [27].

Proposition 3.1.
Let {B(ξ̃_i, ρ) : i ∈ I} be a maximal family of disjoint spherical caps of radius ρ = τ 2^{−j} with centers on the hemisphere S^d_+. Then, for sufficiently small 0 < τ ≤ 1, the set of points χ_j = {ξ_i : i ∈ I} obtained by projecting the set {ξ̃_i : i ∈ I} on B^d is a set of nodes of a cubature formula which is exact for Π_{2^{j+2}}(B^d). Moreover, the coefficients ω_{ξ_i} of this cubature formula are positive and ω_{ξ_i} ∼ W_j(ξ_i) 2^{−jd}. Also, the cardinality #χ_j ∼ 2^{jd}.

3.2.6 Needlets

Going back to identities (3.13) and (3.14) and applying the cubature formula described in Proposition 3.1, we get

  A_j(x, y) = ∫_{B^d} C_j(x, z) C_j(z, y) dz = Σ_{ξ ∈ χ_j} ω_ξ C_j(x, ξ) C_j(y, ξ)

and

  B_j(x, y) = ∫_{B^d} D_j(x, z) D_j(z, y) dz = Σ_{ξ ∈ χ_j} ω_ξ D_j(x, ξ) D_j(y, ξ).

We define the father needlets φ_{j,ξ} and the mother needlets ψ_{j,ξ} by

  φ_{j,ξ}(x) = √(ω_ξ) C_j(x, ξ)  and  ψ_{j,ξ}(x) = √(ω_ξ) D_j(x, ξ),   ξ ∈ χ_j, j ≥ 0.

We also set ψ_{−1,0} = |B^d|^{−1/2} 1_{B^d} and χ_{−1} = {0}. From the above it follows that

  A_j(x, y) = Σ_{ξ ∈ χ_j} φ_{j,ξ}(x) φ_{j,ξ}(y),   B_j(x, y) = Σ_{ξ ∈ χ_j} ψ_{j,ξ}(x) ψ_{j,ξ}(y).

Therefore,

  A_j f(x) = ∫_{B^d} A_j(x, y) f(y) dy = Σ_{ξ ∈ χ_j} ⟨f, φ_{j,ξ}⟩ φ_{j,ξ} = Σ_{ξ ∈ χ_j} α_{j,ξ} φ_{j,ξ},   α_{j,ξ} = ⟨f, φ_{j,ξ}⟩,   (3.17)

and

  B_j f(x) = ∫_{B^d} B_j(x, y) f(y) dy = Σ_{ξ ∈ χ_j} ⟨f, ψ_{j,ξ}⟩ ψ_{j,ξ} = Σ_{ξ ∈ χ_j} β_{j,ξ} ψ_{j,ξ},   β_{j,ξ} = ⟨f, ψ_{j,ξ}⟩.   (3.18)

By (3.17) and (3.6) we have

  ‖φ_{j,ξ}‖₂² ≥ ⟨A_j φ_{j,ξ}, φ_{j,ξ}⟩ = ⟨Σ_{ξ′ ∈ χ_j} ⟨φ_{j,ξ}, φ_{j,ξ′}⟩ φ_{j,ξ′}, φ_{j,ξ}⟩ = Σ_{ξ′ ∈ χ_j} |⟨φ_{j,ξ}, φ_{j,ξ′}⟩|² ≥ ‖φ_{j,ξ}‖₂⁴,

and hence

  ‖φ_{j,ξ}‖₂ ≤ 1.   (3.19)

From (3.5) and the fact that Σ_{j≥0} b(2^{−j} t) = 1 for t ∈ [1, ∞), it readily follows that

  f = Σ_{j≥−1} Σ_{ξ ∈ χ_j} ⟨f, ψ_{j,ξ}⟩ ψ_{j,ξ},   f ∈ L²(B^d);

for f ∈ L² this leads to

  ‖f‖₂² = Σ_j Σ_{ξ ∈ χ_j} |⟨f, ψ_{j,ξ}⟩|²,

which in turn shows that the family {ψ_{j,ξ}} is a tight frame for L²(B^d), and consequently ‖ψ_{j,ξ}‖₂² ≥ ‖ψ_{j,ξ}‖₂⁴, i.e.

  ‖ψ_{j,ξ}‖₂ ≤ 1.   (3.20)

Observe that, using the properties of the cubature formula from Proposition 3.1, estimate (3.8) leads to the localization estimate (cf. [22]):

  |φ_{j,ξ}(x)|, |ψ_{j,ξ}(x)| ≤ C_M 2^{jd/2} / [√(W_j(ξ)) (1 + 2^j d(x, ξ))^M]   for all M > 0.   (3.21)

Nontrivial lower bounds for the norms of the needlets are obtained in [15]. More precisely, in [15] it is shown that for 0 < p ≤ ∞,

  ‖ψ_{j,ξ}‖_p ∼ ‖φ_{j,ξ}‖_p ∼ (2^{jd} / W_j(ξ))^{1/2 − 1/p},   ξ ∈ χ_j.   (3.22)

We next record some properties of needlets which will be needed later on. For convenience, in the following h_{j,ξ} denotes either φ_{j,ξ} or ψ_{j,ξ}.

Theorem 3.2.
Let 1 ≤ p ≤ ∞ and j ≥ 0. The following inequalities hold (with (x)_+ = max(x, 0)):

  Σ_{ξ ∈ χ_j} ‖h_{j,ξ}‖_p^p ≤ c 2^{j(dp/2 + (p/2 − 2)_+)}  if p ≠ 4,   (3.23)

  Σ_{ξ ∈ χ_j} ‖h_{j,ξ}‖_p^p ≤ c j 2^{jdp/2}  if p = 4,   (3.24)

and, for any collection of complex numbers {d_ξ}_{ξ ∈ χ_j},

  ‖Σ_{ξ ∈ χ_j} d_ξ h_{j,ξ}‖_p ≤ c (Σ_{ξ ∈ χ_j} |d_ξ|^p ‖h_{j,ξ}‖_p^p)^{1/p}.   (3.25)

Here c > 0 is a constant depending only on d, p, and τ.

To make our presentation more fluid we relegate the proof of this theorem to the appendix.

3.3 Linear needlet estimator
Our motivation for introducing the estimator described below is the excellent approximation power of the operators A_j defined in §3.2.2. For f ∈ L²(B^d) we have

  f = Σ_{k,l,i} ⟨f, f_{k,l,i}⟩ f_{k,l,i},

where the sum is over the index set {(k, l, i) : k ≥ 0, 0 ≤ l ≤ k, l ≡ k (mod 2), 1 ≤ i ≤ N_{d−1}(l)}. Combining this with the definition of A_j, we get

  A_j f = Σ_{ξ ∈ χ_j} ⟨f, φ_{j,ξ}⟩ φ_{j,ξ} = Σ_{ξ ∈ χ_j} α_{j,ξ} φ_{j,ξ},

where

  α_{j,ξ} = ⟨f, φ_{j,ξ}⟩ = Σ_{k,l,i} γ^{j,ξ}_{k,l,i} ⟨f, f_{k,l,i}⟩ = Σ_{k,l,i} (γ^{j,ξ}_{k,l,i}/λ_k) ⟨Rf, g_{k,l,i}⟩_µ = Σ_{k,l,i} (γ^{j,ξ}_{k,l,i}/λ_k) ∫ g_{k,l,i} Rf dµ.

Here the coefficients γ^{j,ξ}_{k,l,i} = ⟨f_{k,l,i}, φ_{j,ξ}⟩ can be precomputed. It is therefore natural to define an estimator f̂_j of the unknown function f by

  f̂_j = Σ_{ξ ∈ χ_j} α̂_{j,ξ} φ_{j,ξ},   (3.26)

where

  α̂_{j,ξ} = Σ_{k,l,i} (γ^{j,ξ}_{k,l,i}/λ_k) ∫ g_{k,l,i} dY.   (3.27)

Here the summation is over {(k, l, i) : 0 ≤ k < 2^j, 0 ≤ l ≤ k, l ≡ k (mod 2), 1 ≤ i ≤ N_{d−1}(l)}, and j is a parameter.

Some clarification is needed here. The father and mother needlets introduced in §3.2.6, φ_{j,ξ} and ψ_{j,ξ}, both have superb localization. However, the mother needlets {ψ_{j,ξ}} have a multilevel structure and are therefore an excellent tool for nonlinear n-term approximation of functions on the ball, whereas the father needlets are perfectly suited for linear approximation. So it should be no surprise that we use the father needlets for our linear estimator.

Furthermore, even though the needlets are central to the analysis of the estimator, the estimator f̂_j can be defined without them. Indeed,

  f̂_j = Σ_{ξ ∈ χ_j} Σ_{k,l,i} (γ^{j,ξ}_{k,l,i}/λ_k) (∫ g_{k,l,i} dY) φ_{j,ξ}.

As all the sums are finite, their order can be interchanged, yielding

  f̂_j = Σ_{k,l,i} (1/λ_k) (∫ g_{k,l,i} dY) Σ_{ξ ∈ χ_j} γ^{j,ξ}_{k,l,i} φ_{j,ξ} = Σ_{k,l,i} (1/λ_k) (∫ g_{k,l,i} dY) A_j f_{k,l,i},

and thus the estimator is obtained by a simple componentwise multiplication of the SVD coefficients:

  f̂_j = Σ_{k,l,i} a(k/2^j) (1/λ_k) (∫ g_{k,l,i} dY) f_{k,l,i}.

However, as will be shown in the sequel, this precise choice of smoothing makes it possible to handle L^p losses; and, precisely because of the localization properties of the atoms, this approach will be extended in further work to a thresholding procedure, using this time the mother needlets.

4 Risk bounds for the needlet estimator

In this section we estimate the risk of the needlet estimator introduced above in terms of the Besov smoothness of the unknown function.
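The componentwise (filtered SVD) form of f̂_j derived at the end of Section 3 amounts to multiplying the k-th SVD coefficient by a(k/2^j)/λ_k. A numerical sketch (our illustration: d = 2, λ_k taken from (2.11), coefficients indexed by k only, and a piecewise-linear stand-in for the cut-off; the theory requires a C^∞ a, cf. Figure 2):

```python
import numpy as np

def a(t):
    # piecewise-linear stand-in for the C-infinity cut-off:
    # equals 1 on [0, 1/2] and 0 on [1, inf) (illustration only)
    return np.clip(2.0 - 2.0 * np.asarray(t, dtype=float), 0.0, 1.0)

def lam(k, d=2):
    # lambda_k from (2.11): lambda_k^2 = 2^d pi^(d-1) / ((k+1)...(k+d-1))
    k = np.asarray(k, dtype=float)
    prod = np.ones_like(k)
    for i in range(1, d):
        prod = prod * (k + i)
    return np.sqrt(2.0 ** d * np.pi ** (d - 1) / prod)

def needlet_filter(svd_data, j, d=2):
    # hat f_j = sum_k a(k/2^j) lambda_k^{-1} (int g_{k,l,i} dY) f_{k,l,i}:
    # componentwise weights applied to the observed SVD coefficients
    k = np.arange(len(svd_data))
    return a(k / 2.0 ** j) / lam(k, d) * svd_data

w = needlet_filter(np.ones(40), j=4)
assert np.isclose(w[7], 1.0 / lam(7))   # k < 2^{j-1}: plain SVD inversion
assert w[16] == 0.0                     # k >= 2^j: coefficient annihilated
```

Between 2^{j−1} and 2^j the weights taper off smoothly instead of being cut, which is exactly what distinguishes this estimator from the truncated SVD.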
4.1 Besov spaces on the ball

We introduce the Besov spaces of positive smoothness on the ball as spaces of L^p-approximation by algebraic polynomials. As before, E_n(f, p) denotes the best L^p-approximation of f ∈ L^p(B^d) from Π_n. We will mainly use the notation from [15].

Definition 4.1. Let 0 < s < ∞, 1 ≤ p ≤ ∞, and 0 < q ≤ ∞. The space B^{s,∗}_{p,q} on the ball is defined as the space of all functions f ∈ L^p(B^d) such that

  |f|_{B^{s,∗}_{p,q}} = (Σ_{n≥1} (n^s E_n(f, p))^q / n)^{1/q} < ∞  if q < ∞,

and

  |f|_{B^{s,∗}_{p,∞}} = sup_{n≥1} n^s E_n(f, p) < ∞  if q = ∞.

The norm on B^{s,∗}_{p,q} is defined by ‖f‖_{B^{s,∗}_{p,q}} = ‖f‖_p + |f|_{B^{s,∗}_{p,q}}.

Remark: From the monotonicity of {E_n(f, p)} it readily follows that

  ‖f‖_{B^{s,∗}_{p,q}} ∼ ‖f‖_p + (Σ_{j≥0} (2^{js} E_{2^j}(f, p))^q)^{1/q},

with the obvious modification for q = ∞. ⋆

There are several different equivalent norms on the Besov space B^{s,∗}_{p,q}.

Theorem 4.2.
With indices s, p, q as in the above definition, the following norms are equivalent to the Besov norm ‖f‖_{B^{s,∗}_{p,q}}:

  (i)  N_1(f) = ‖f‖_p + ‖(2^{js} ‖f − A_j f‖_p)_{j≥0}‖_{l^q},
  (ii) N_2(f) = ‖f‖_p + ‖(2^{js} ‖B_j f‖_p)_{j≥0}‖_{l^q},
  (iii) N_3(f) = ‖f‖_p + ‖(2^{js} (Σ_{ξ ∈ χ_j} |⟨f, ψ_{j,ξ}⟩|^p ‖ψ_{j,ξ}‖_p^p)^{1/p})_{j≥−1}‖_{l^q}.

Proof. The equivalence N_1(f) ∼ ‖f‖_{B^{s,∗}_{p,q}} is immediate from (3.12). To prove that N_1(f) ∼ N_2(f), we recall that B_j = A_{j+1} − A_j (see §3.2.2), hence

  ‖B_j f‖_p ≤ ‖f − A_{j+1} f‖_p + ‖f − A_j f‖_p,

which readily implies N_2(f) ≤ c N_1(f). In the other direction, we have

  ‖f − A_j f‖_p = ‖Σ_{l=j}^∞ B_l f‖_p ≤ Σ_{l=j}^∞ ‖B_l f‖_p.

Assuming that N_2(f) < ∞, we can write ‖B_l f‖_p = α_l 2^{−ls} with {α_l} ∈ l^q. Hence

  Σ_{l=j}^∞ ‖B_l f‖_p = Σ_{l=j}^∞ α_l 2^{−ls} = 2^{−js} Σ_{l=j}^∞ α_l 2^{−(l−j)s} =: 2^{−js} β_j,

and by the convolution inequality {β_j} ∈ l^q. Therefore N_1(f) ≤ c N_2(f). For the equivalence N_3(f) ∼ ‖f‖_{B^{s,∗}_{p,q}}, see [15, Theorem 5.4].

4.1.1 Comparison with classical Besov spaces

The classical Besov space B^s_{p,q}(B^d) is defined through the L^p norm of finite differences: ∆_h f(x) = f(x + h) − f(x) if x, x + h ∈ B^d (and 0 otherwise), and in general

  ∆_h^N f(x) = 1_{x, x+Nh ∈ B^d} Σ_{k=0}^N (−1)^{N−k} binom(N, k) f(x + kh).

Then the N-th modulus of smoothness in L^p is defined by

  ω_p^N(f, t) = sup_{|h| ≤ t} ‖∆_h^N f‖_p,   t > 0.

For 0 < s < N, 1 ≤ p ≤ ∞, and 0 < q ≤ ∞, the classical Besov space B^s_{p,q} is defined by the norm

  ‖f‖_{B^s_{p,q}} = ‖f‖_p + (∫_0^1 [t^{−s} ω_p^N(f, t)]^q dt/t)^{1/q} ∼ ‖f‖_p + (Σ_{j=0}^∞ [2^{js} ω_p^N(f, 2^{−j})]^q)^{1/q},

with the usual modification for q = ∞. It is well known that the definition of B^s_{p,q} does not depend on N as long as s < N [13]. Moreover, the embedding

  B^s_{p,q} ⊆ B^{s,∗}_{p,q}   (4.1)

is immediate from the estimate E_n(f, p) ≤ c ω_p^N(f, 1/n) [13].

4.2 Risk bounds

Theorem 4.3.
Let $1 \le p \le \infty$, $0 < s < \infty$, and assume that $f \in B^s_{p,\infty}$ with $\|f\|_{B^s_{p,\infty}} \le M$. Let
$$\hat f_J = \sum_{\xi \in \chi_J} \hat\alpha_{J,\xi}\,\varphi_{J,\xi}$$
be the needlet estimator introduced above, where the level $J$ is selected depending on the parameters as described below.

1. If $M 2^{-J(s+d)} \sim \varepsilon$ when $p = \infty$, then
$$\mathbb E\|f - \hat f_J\|_\infty \le c_\infty\, M^{\frac{d}{s+d}}\,\varepsilon^{\frac{s}{s+d}}\sqrt{\log(M/\varepsilon)}.$$
2. If $M 2^{-Js} \sim \varepsilon\, 2^{J(d-2/p)}$ when $4 \le p < \infty$, then
$$\mathbb E\|f - \hat f_J\|_p^p \le c_p\, M^{\frac{(d-2/p)p}{s+d-2/p}}\,\varepsilon^{\frac{sp}{s+d-2/p}},$$
where for $p = 4$ there is an additional factor $\ln(M/\varepsilon)$ on the right.

3. If $M 2^{-Js} \sim \varepsilon\, 2^{J(d-1/2)}$ when $1 \le p < 4$, then
$$\mathbb E\|f - \hat f_J\|_p^p \le c_p\, M^{\frac{(d-1/2)p}{s+d-1/2}}\,\varepsilon^{\frac{sp}{s+d-1/2}}.$$

Remarks:
• It will be shown in a forthcoming paper that these rates of convergence are, in fact, minimax; i.e., there exist positive constants $c$ and $c'$ such that
$$\inf_{\tilde f \text{ estimator}}\ \sup_{\|f\|_{B^s_{p,\infty}} \le M} \mathbb E\|f - \tilde f\|_p^p \ge c\,\max\big\{\varepsilon^{\frac{sp}{s+d-2/p}},\ \varepsilon^{\frac{sp}{s+d-1/2}}\big\},$$
$$\inf_{\tilde f \text{ estimator}}\ \sup_{\|f\|_{B^s_{\infty,\infty}} \le M} \mathbb E\|f - \tilde f\|_\infty \ge c'\,\varepsilon^{\frac{s}{s+d}}\sqrt{\log(1/\varepsilon)}.$$
• The case $p = 2$ above corresponds to the standard SVD method, which involves Sobolev spaces. In this setting, minimax rates have already been established (cf. [7], [19], [4], [3], [26], [11], [9]); these rates are $\varepsilon^{\frac{2s}{s+d-1/2}}$. Also, it has been shown that SVD algorithms achieve minimax rates. These results extend (using straightforward comparisons of norms) to $L^p$ losses for $p < 4$, but still considering the Sobolev ball $\{\|f\|_{B^s_{2,\infty}} \le M\}$ rather than the Besov ball $\{\|f\|_{B^s_{p,\infty}} \le M\}$. Therefore, our results can be viewed as an extension of the above results, allowing a much wider variety of regularity spaces.
• The Besov spaces involved in our bounds are in a sense well adapted to our method. However, the embedding results of Section 4.1.1 show that the bounds of Theorem 4.3 hold in terms of the standard Besov spaces as well. This means that, stated for the Besov spaces described above, our results are only stronger.
• In the case $p \ge 4$ the rates involve the exponent $d - 2/p$ and reflect edge effects induced by the geometry of the ball. These rates are to be compared with similar phenomena occurring in other inverse problems involving Jacobi polynomials (e.g. the Wicksell problem); see [14].

Proof. Assume $f \in B^s_{p,\infty}$ and $\|f\|_{B^s_{p,\infty}} \le M$. Then by Theorem 4.2,
$$\|A_j f - f\|_p \le c\,\|f\|_{B^s_{p,\infty}}\, 2^{-js} \le cM 2^{-js}. \tag{4.2}$$
Now, from $dY = Rf\,d\mu + \varepsilon\,dW$ we have
$$\int g_{k,l,i}\,dY = \int Rf\, g_{k,l,i}\,d\mu + \varepsilon\int g_{k,l,i}\,dW = \int_{B^d} f\, R^* g_{k,l,i}\,dx + \varepsilon Z_{k,l,i} = \lambda_k \int_{B^d} f\, f_{k,l,i}\,dx + \varepsilon Z_{k,l,i},$$
and hence
$$\frac{1}{\lambda_k}\int g_{k,l,i}\,dY = \int_{B^d} f\, f_{k,l,i}\,dx + \frac{\varepsilon}{\lambda_k} Z_{k,l,i}.$$
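For orientation, the rate exponents of Theorem 4.3 can be collected in a small helper. This is our own reconstruction from the statement above; constants and logarithmic factors are deliberately ignored.

```python
def lp_rate_exponent(s: float, p: float, d: int) -> float:
    """Exponent r such that E||f - f_J||_p^p is of order eps^r
    (reconstructed from Theorem 4.3; constants and log factors omitted)."""
    if p >= 4:                        # regime governed by d - 2/p
        return s * p / (s + d - 2.0 / p)
    return s * p / (s + d - 0.5)      # 1 <= p < 4: regime governed by d - 1/2

# The two regimes agree at p = 4, and p = 2 recovers the classical
# SVD/Sobolev-scale rate eps^{2s/(s + d - 1/2)}.
rates = {p: lp_rate_exponent(1.0, p, 2) for p in (1, 2, 4, 8)}
```

Note how the threshold $p = 4$ is precisely where $d - 2/p = d - 1/2$, so the two branches match continuously.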
On account of (3.27) this leads to
$$\hat\alpha_{j,\xi} = \sum_{k,l,i} \gamma^{j,\xi}_{k,l,i} \int_{B^d} f\, f_{k,l,i}\,dx + \sum_{k,l,i} \gamma^{j,\xi}_{k,l,i}\,\frac{\varepsilon}{\lambda_k}\, Z_{k,l,i} = \alpha_{j,\xi} + Z_{j,\xi}.$$
Here the summation is over $\{(k,l,i) : 0 \le k < 2^j,\ 0 \le l \le k,\ l \equiv k \ (\mathrm{mod}\ 2),\ 1 \le i \le N_{d-1}(l)\}$. Since the $Z_{k,l,i}$ are independent $N(0,1)$ random variables, $Z_{j,\xi} \sim N(0, \sigma_{j,\xi}^2)$ with
$$\sigma_{j,\xi}^2 = \varepsilon^2 \sum_{k,l,i} |\gamma^{j,\xi}_{k,l,i}|^2\,\frac{(k+1)(k+2)\cdots(k+d-1)}{2^d\pi^{d-1}} \le \varepsilon^2\,\frac{(d\,2^j)^{d-1}}{2^d\pi^{d-1}} \le c_1^2\, 2^{j(d-1)}\varepsilon^2 \tag{4.3}$$
with $c_1^2 = d^{d-1}/(2^d\pi^{d-1})$. Here we used the value of $\lambda_k$ computed in the Appendix and that $\{f_{k,l,i}\}$ is an orthonormal basis for $L^2$, hence $\sum_{k,l,i}|\gamma^{j,\xi}_{k,l,i}|^2 = \|\varphi_{j,\xi}\|_2^2 \le 1$.

Recalling that $\hat f_j = \sum_{\xi\in\chi_j}\hat\alpha_{j,\xi}\varphi_{j,\xi}$ and using (4.2) we have, whenever $1 \le p < \infty$,
$$\mathbb E\|f - \hat f_j\|_p^p \le 2^{p-1}\big\{\|f - A_j f\|_p^p + \mathbb E\|A_j f - \hat f_j\|_p^p\big\} \le 2^{p-1}\big\{c^p M^p 2^{-jsp} + \mathbb E\|A_j f - \hat f_j\|_p^p\big\}, \tag{4.4}$$
and, for $p = \infty$,
$$\mathbb E\|f - \hat f_j\|_\infty \le \|f - A_j f\|_\infty + \mathbb E\|A_j f - \hat f_j\|_\infty \le cM2^{-js} + \mathbb E\|A_j f - \hat f_j\|_\infty. \tag{4.5}$$
On the other hand, using inequality (3.25) of Theorem 3.2 we obtain, if $1 \le p < \infty$,
$$\|A_j f - \hat f_j\|_p^p = \Big\|\sum_{\xi\in\chi_j}(\alpha_{j,\xi}-\hat\alpha_{j,\xi})\varphi_{j,\xi}\Big\|_p^p \le c\sum_{\xi\in\chi_j}|\alpha_{j,\xi}-\hat\alpha_{j,\xi}|^p\,\|\varphi_{j,\xi}\|_p^p,$$
and hence
$$\mathbb E\|A_j f - \hat f_j\|_p^p \le c\sum_{\xi\in\chi_j}\mathbb E|Z_{j,\xi}|^p\,\|\varphi_{j,\xi}\|_p^p \le c\big(\varepsilon 2^{j(d-1)/2}\big)^p\sum_{\xi\in\chi_j}\|\varphi_{j,\xi}\|_p^p, \tag{4.6}$$
where we used that $\mathbb E|Z_{j,\xi}|^p \le c(\varepsilon 2^{j(d-1)/2})^p$. Similarly, for $p = \infty$,
$$\|A_j f - \hat f_j\|_\infty = \Big\|\sum_{\xi\in\chi_j}(\alpha_{j,\xi}-\hat\alpha_{j,\xi})\varphi_{j,\xi}\Big\|_\infty \le c\max_{\xi\in\chi_j}|\alpha_{j,\xi}-\hat\alpha_{j,\xi}|\,\|\varphi_{j,\xi}\|_\infty,$$
and hence
$$\mathbb E\|A_j f - \hat f_j\|_\infty \le c\,\mathbb E\Big\{\max_{\xi\in\chi_j}|Z_{j,\xi}|\,\|\varphi_{j,\xi}\|_\infty\Big\} \le c\,\varepsilon 2^{j(d-1)/2}\max_{\xi\in\chi_j}\|\varphi_{j,\xi}\|_\infty\,\sqrt{jd} \le c\,\varepsilon 2^{jd}\sqrt{j}. \tag{4.7}$$
For the second inequality above we used Pisier's lemma: if $Z_j \sim N(0,\sigma_j^2)$ with $\sigma_j \le \sigma$, then $\mathbb E\big(\sup_{1\le j\le N}|Z_j|\big) \le \sigma\sqrt{2\log 2N}$.
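Pisier's lemma invoked above is easy to probe by simulation. A hedged Monte Carlo sketch (sample size, trial count, and seed are arbitrary choices of ours):

```python
import numpy as np

# Monte Carlo probe of Pisier's lemma: for Z_1..Z_N centered Gaussians with
# std <= sigma, E max_j |Z_j| <= sigma * sqrt(2 log(2N)).
rng = np.random.default_rng(0)
sigma, N, trials = 1.0, 4096, 200
Z = rng.normal(0.0, sigma, size=(trials, N))
empirical = np.abs(Z).max(axis=1).mean()        # estimate of E max_j |Z_j|
bound = sigma * np.sqrt(2.0 * np.log(2.0 * N))
```

With $N \sim \#\chi_j \sim 2^{jd}$ this is exactly the $\sqrt{jd}$ factor used in (4.7).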
We also used that $\max_{\xi\in\chi_j}\|\varphi_{j,\xi}\|_\infty \le c\,2^{j(d+1)/2}$, which follows from inequality (3.23) of Theorem 3.2.

Combining (4.5) and (4.7) we obtain, for $p = \infty$,
$$\mathbb E\|f - \hat f_j\|_\infty \le c\big\{M2^{-js} + \varepsilon 2^{jd}\sqrt{j}\big\},$$
and if $M2^{-j(s+d)} \sim \varepsilon$, then
$$\mathbb E\|f - \hat f_j\|_\infty \le c\,M^{\frac{d}{s+d}}\,\varepsilon^{\frac{s}{s+d}}\sqrt{\log(M/\varepsilon)}.$$
Similarly, combining estimate (3.23) of Theorem 3.2 with (4.6) and inserting the resulting estimate in (4.4), we obtain, in the case $4 < p < \infty$,
$$\mathbb E\|f - \hat f_j\|_p^p \le c\big\{M^p 2^{-jsp} + (\varepsilon 2^{j(d-1)/2})^p\, 2^{jdp/2}\, 2^{j(p/2-2)}\big\} = c\big\{M^p 2^{-jsp} + \varepsilon^p 2^{j(dp-2)}\big\}.$$
If $M2^{-js} \sim \varepsilon 2^{j(d-2/p)}$, this yields
$$\mathbb E\|f - \hat f_j\|_p^p \le c\,M^{\frac{(d-2/p)p}{s+d-2/p}}\,\varepsilon^{\frac{sp}{s+d-2/p}}.$$
Accordingly, for $p = 4$ we combine inequality (3.24) with (4.6) and insert the result in (4.4) to obtain
$$\mathbb E\|f - \hat f_j\|_p^p \le c\big\{M^p 2^{-jsp} + (\varepsilon 2^{j(d-1)/2})^p\, j\, 2^{jdp/2}\big\} = c\big\{M^p 2^{-jsp} + j\,(\varepsilon 2^{j(d-1/2)})^p\big\},$$
and if $M2^{-js} \sim \varepsilon 2^{j(d-1/2)}$, this yields
$$\mathbb E\|f - \hat f_j\|_p^p \le c\,M^{\frac{(d-1/2)p}{s+d-1/2}}\,\varepsilon^{\frac{sp}{s+d-1/2}}\,\log(M/\varepsilon).$$
Finally, if $1 \le p < 4$,
$$\mathbb E\|f - \hat f_j\|_p^p \le c\big\{M^p 2^{-jsp} + (\varepsilon 2^{j(d-1)/2})^p\, 2^{jdp/2}\big\} = c\big\{M^p 2^{-jsp} + (\varepsilon 2^{j(d-1/2)})^p\big\}.$$
So, if $M2^{-js} \sim \varepsilon 2^{j(d-1/2)}$, then
$$\mathbb E\|f - \hat f_j\|_p^p \le c\,M^{\frac{(d-1/2)p}{s+d-1/2}}\,\varepsilon^{\frac{sp}{s+d-1/2}}.$$
This completes the proof of the theorem. □

Figure 3: Simplified CAT device (pointwise ray source, receptor array, angles $\theta_1$, $\theta_2$, offset $s$).

Fan Beam Tomography
We have implemented this scheme for $d = 2$ in the radiological setting of Cormack [5]. This case corresponds to the fan beam Radon transform used in Computed Axial Tomography (CAT). As shown in Figure 3, an object is positioned in the middle of the device. X-rays are sent from a pointwise source $S(\theta_1)$ located on the boundary and making an angle $\theta_1$ with the horizontal. They go through the object and are received on the other side on a uniformly sampled array of receptors $R(\theta_1,\theta_2)$. The log-decay of the energy from the source to a receptor is proportional to the integral of the density $f$ of the object along the ray, and thus one finally measures
$$\tilde Rf(\theta_1,\theta_2) = \int_{\{\lambda\,:\,e_{\theta_1}+\lambda e_{\theta_1-\theta_2}\,\in\,B^2\}} f(e_{\theta_1}+\lambda e_{\theta_1-\theta_2})\,d\lambda, \qquad e_\theta = (\cos\theta,\sin\theta),$$
or equivalently the classical Radon transform
$$Rf(\theta,s) = \int_{\{y\in\theta^\perp\,:\,s\theta+y\,\in\,B^2\}} f(s\theta+y)\,dy, \qquad \theta\in S^1,\ s\in[-1,1],$$
for $\theta = \theta_1 - \theta_2$ and $s = \sin\theta_2$. The device is then rotated to a different angle $\theta_1$ and the process is repeated. Note that $d\theta\,ds\,(1-s^2)^{-1/2}$ is nothing but the measure corresponding to the uniform $d\theta_1\,d\theta_2$ under the change of variables that maps $(\theta_1,\theta_2)$ into $(\theta,s)$.

The fan beam Radon SVD basis of the disk is tensorial in polar coordinates:
$$f_{k,l,i}(r,\theta) = (2k+2)^{1/2}\, P^{(0,l)}_j(2|r|^2-1)\,|r|^l\, Y_{l,i}(\theta), \qquad 0\le l\le k,\ k-l=2j,\ 1\le i\le 2,$$
where $P^{(0,l)}_j$ is the corresponding Jacobi polynomial, and $Y_{l,1}(\theta) = c_l\cos(l\theta)$, $Y_{l,2}(\theta) = c_l\sin(l\theta)$ with $c_0 = 1/\sqrt{2\pi}$ and $c_l = 1/\sqrt{\pi}$ otherwise. The basis of $S^1\times[-1,1]$ is
$$g_{k,l,i}(\theta,t) = [h_k]^{-1/2}\,(1-t^2)^{1/2}\,C^1_k(t)\,Y_{l,i}(\theta), \qquad k\ge 0,\ l\ge 0,\ 1\le i\le 2,$$
where $C^1_k$ is the Gegenbauer polynomial of parameter 1 and degree $k$ (i.e. the Chebyshev polynomial of the second kind) and $h_k = \int_{-1}^1 [C^1_k(t)]^2(1-t^2)^{1/2}\,dt = \pi/2$. The corresponding eigenvalues are
$$\lambda_k = \frac{2\sqrt{\pi}}{\sqrt{k+1}}.$$
In this paper, we have considered the theoretical framework of the white noise model. In this model, we assume that we have access to the noisy "scalar products" $\int g_{k,l,i}\,dY$, that is, to the scalar products of $Rf$ with the SVD basis $g_{k,l,i}$ up to an i.i.d. centered Gaussian perturbation of known variance $\epsilon^2$.
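Under the normalizations written above ($C^1_k = U_k$, $h_k = \pi/2$, $c_0 = 1/\sqrt{2\pi}$, $c_l = 1/\sqrt{\pi}$), the $g_{k,l,i}$ should be orthonormal with respect to $d\theta\,dt\,(1-t^2)^{-1/2}$. A small numerical check; the quadrature choices (Gauss–Chebyshev nodes in $t$, uniform grid in $\theta$) are ours, not the paper's.

```python
import numpy as np

def cheb_u(k, t):
    # C^1_k = U_k, Chebyshev polynomial of the second kind
    x = np.arccos(np.clip(t, -1.0, 1.0))
    return np.sin((k + 1) * x) / np.sin(x)

def g(k, l, i, theta, t):
    c = 1.0 / np.sqrt(2.0 * np.pi) if l == 0 else 1.0 / np.sqrt(np.pi)
    Y = c * (np.cos(l * theta) if i == 1 else np.sin(l * theta))
    return np.sqrt(2.0 / np.pi) * np.sqrt(1.0 - t**2) * cheb_u(k, t) * Y

# Gauss-Chebyshev nodes in t (exact for the weight (1 - t^2)^(-1/2)) and a
# uniform grid in theta (exact for trigonometric polynomials)
n = 64
t_nodes = np.cos((2.0 * np.arange(1, n + 1) - 1.0) * np.pi / (2.0 * n))
theta = 2.0 * np.pi * np.arange(n) / n
T, TT = np.meshgrid(theta, t_nodes)

def inner(a, b):
    # <g_a, g_b> with respect to dtheta dt (1 - t^2)^(-1/2)
    return (np.pi / n) * (2.0 * np.pi / n) * np.sum(g(*a, T, TT) * g(*b, T, TT))
```

Up to quadrature error, `inner` returns 1 when the two index triples coincide and 0 otherwise, which is the orthonormality that makes the inversion formula stable.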
This white noise model is a convenient statistical framework closely related to a more classical regression problem with a uniform design in $\theta_1$ and $\theta_2$, which is closer to the implementation in real devices. In this regression design, one observes
$$Y_{i_1,i_2} = Rf\Big(2\pi\Big(\frac{i_1}{N_1}-\frac{i_2}{N_2}\Big),\,\sin\Big(2\pi\frac{i_2}{N_2}\Big)\Big) + \epsilon_{i_1,i_2}, \qquad i_1\le N_1,\ i_2\le N_2,$$
where $N_1$ and $N_2$ give the discretization levels of the angles $\theta_1$ and $\theta_2$, and $\epsilon_{i_1,i_2}$ is an i.i.d. centered Gaussian sequence of known variance $\sigma^2$. Note that these points are not cubature points for the SVD coefficients. The correspondence between the two models is obtained by replacing the noisy scalar product $\int g_{k,l,i}\,dY$ with the corresponding Riemann sum
$$\widehat{\langle Rf, g_{k,l,i}\rangle} = \frac{1}{N_1 N_2}\sum_{i_1=0}^{N_1-1}\sum_{i_2=0}^{N_2-1} g_{k,l,i}\Big(2\pi\Big(\frac{i_1}{N_1}-\frac{i_2}{N_2}\Big),\,\sin\Big(2\pi\frac{i_2}{N_2}\Big)\Big)\, Y_{i_1,i_2}$$
and using the calibration $\epsilon^2 = \sigma^2/(N_1 N_2)$. It is proved, for instance in [2], that the regression model with uniform design and the white noise model are close in the sense of Le Cam's deficiency, which roughly means that any procedure can be transferred from one model to the other with the same order of risk. The estimator $\hat f_j$, defined in the white noise model by
$$\hat f_j = \sum_{k,l,i} a\Big(\frac{k}{2^j}\Big)\frac{1}{\lambda_k}\int g_{k,l,i}\,dY\ f_{k,l,i} = \sum_{k,l,i} a\Big(\frac{k}{2^j}\Big)\frac{\sqrt{k+1}}{2\sqrt{\pi}}\int g_{k,l,i}\,dY\ f_{k,l,i},$$
is thus replaced in the regression model by
$$\hat f_j = \sum_{k,l,i} a\Big(\frac{k}{2^j}\Big)\frac{1}{\lambda_k}\widehat{\langle Rf,g_{k,l,i}\rangle}\, f_{k,l,i} = \sum_{k,l,i} a\Big(\frac{k}{2^j}\Big)\frac{\sqrt{k+1}}{2\sqrt{\pi}}\widehat{\langle Rf,g_{k,l,i}\rangle}\, f_{k,l,i}.$$

Numerical results

To illustrate the advantages of the linear needlet estimator over the linear SVD estimator, we have compared their performances on a synthetic example, the classical Logan–Shepp phantom [23], for different $L^p$ norms and different noise levels, and for both the white noise model and the regression model. The Logan–Shepp phantom is a synthetic image used as a benchmark in the tomography community.
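The effect of the smooth weights $a(k/2^j)$ can be illustrated directly in coefficient space. A toy sketch: the coefficient decay, the piecewise-linear stand-in window, and the cutoff level are our own synthetic choices, and the risks are exact expectations computed coefficient-wise (one coefficient per degree, for simplicity).

```python
import numpy as np

def a_window(u):
    # stand-in for the smooth window a(.): equal to 1 on [0, 1/2], vanishing
    # beyond 1 (piecewise linear here; the actual window is smoother)
    return np.clip(2.0 * (1.0 - u), 0.0, 1.0)

# synthetic setup (d = 2): one coefficient per degree k for simplicity
k = np.arange(0, 257)
lam = 2.0 * np.sqrt(np.pi) / np.sqrt(k + 1.0)    # singular values
c = (k + 1.0) ** -2.0                            # hypothetical true coefficients
eps = 1e-2                                       # white-noise level

def risk(a):
    # exact coefficient-space expected squared error of the filtered estimator
    bias2 = np.sum(((a - 1.0) * c) ** 2)
    variance = eps**2 * np.sum((a / lam) ** 2)
    return bias2 + variance

naive = risk(np.ones(k.size))            # invert every observed degree
filtered = risk(a_window(k / 2.0**6))    # damp the high, noise-dominated degrees
```

Because $1/\lambda_k$ grows with $k$, the unweighted inversion is variance-dominated, while the window trades a small bias for a much smaller variance.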
It is a simple toy model for human body structures, simplified as a piecewise constant function with discontinuities along ellipsoids (see Figure 6). This example is not regular in the classical sense: it belongs only to Besov spaces of low smoothness. To compute the SVD coefficients of the Logan–Shepp function $f$ presented above, its decomposition in the SVD basis $f_{k,l,i}$ up to degree $\tilde k = 512$ has been approximated with an initial numerical quadrature $\chi$ valid for polynomials of degree $4\times\tilde k = 2048$,
$$\langle f, f_{k,l,i}\rangle \simeq \sum_{(r_i,\theta_i)\in\chi}\omega_{(r_i,\theta_i)}\, f(r_i,\theta_i)\, f_{k,l,i}(r_i,\theta_i) = c_{k,l,i},$$
and we used this value to approximate the SVD coefficients of $R(f)$, the noiseless Radon transform of $f$: $\langle R(f), g_{k,l,i}\rangle \simeq \lambda_k c_{k,l,i}$.

In the white noise setting, for all $k \le k_0 = 256$, a noisy observation $\int g_{k,l,i}\,dY$ is generated by
$$\int g_{k,l,i}\,dY \simeq \lambda_k c_{k,l,i} + \epsilon W_{k,l,i},$$
where $\epsilon$ is the noise level and $W_{k,l,i}$ an i.i.d. sequence of standard Gaussian random variables. Our linear needlet estimator $\hat f_J$ of level $J = \log_2(k_0)$ is then computed as
$$\hat f_J = \sum_{k\le k_0,\,l,i} a\Big(\frac{k}{2^J}\Big)\big(c_{k,l,i} + \epsilon\lambda_k^{-1}W_{k,l,i}\big)\, f_{k,l,i},$$
while the linear SVD estimator $\hat f^S_{k_S}$ of degree $k_S$ is defined as
$$\hat f^S_{k_S} = \sum_{k\le k_S,\,l,i}\big(c_{k,l,i} + \epsilon\lambda_k^{-1}W_{k,l,i}\big)\, f_{k,l,i}.$$
We also consider the naive inversion up to degree $k_0$, $\hat f^I$, which is equal to $\hat f^S_{k_0}$. The $L^p$ estimation error is measured by reusing the initial quadrature formula,
$$\|f - \hat f\|_p^p \simeq \sum_{(r_i,\theta_i)\in\chi}\omega_{(r_i,\theta_i)}\,|f(r_i,\theta_i) - \hat f(r_i,\theta_i)|^p.$$
In the regression setting, we have computed the values of the Radon transform $Rf$ of $f$ on an equispaced grid for the angles $\theta_1$ and $\theta_2$, specified by its sizes $N_1$ and $N_2$, using its SVD decomposition up to $k = \tilde k = 512$. We have then defined the noisy observations as
$$Y_{i_1,i_2} = Rf\Big(2\pi\Big(\frac{i_1}{N_1}-\frac{i_2}{N_2}\Big),\,\sin\Big(2\pi\frac{i_2}{N_2}\Big)\Big) + \epsilon_{i_1,i_2},$$
with $\epsilon_{i_1,i_2}$ an i.i.d. centered Gaussian sequence of known variance $\sigma^2$.
The estimated SVD coefficients are obtained through the Riemann sums
$$\widehat{\langle Rf,g_{k,l,i}\rangle} = \frac{1}{N_1 N_2}\sum_{i_1=0}^{N_1-1}\sum_{i_2=0}^{N_2-1} g_{k,l,i}\Big(2\pi\Big(\frac{i_1}{N_1}-\frac{i_2}{N_2}\Big),\,\sin\Big(2\pi\frac{i_2}{N_2}\Big)\Big)\, Y_{i_1,i_2}.$$
We then plug these values in place of the $\int g_{k,l,i}\,dY$ in the previous estimators. For each noise level and each norm, the best level $J$ and the best degree $k_S$ have been selected as the ones minimizing the average error over 50 realizations of the noise. Figure 4 displays, on a logarithmic scale, the estimation errors $\|f - \hat f\|_p$ in the white noise model plotted against the logarithm of the noise level $\epsilon$. It shows that, except in the very low noise case, both the linear SVD estimator and the linear needlet estimator reduce the error over the naive inversion (the linear SVD estimate up to the maximal available degree $k_0$). It also shows that the needlet estimator outperforms the SVD estimator in a large majority of cases from the norm point of view, and almost always from the visual point of view, as shown in Figure 6. The localization of the needlets also "localizes" the errors: the simple smooth regions are much better restored with the needlet estimate than with the SVD one, because for the needlet estimate the errors are essentially concentrated along the edges. The results obtained for the regression model in Figure 5 are similar. We have plotted, on a logarithmic scale, the estimation errors against the logarithm of the equivalent noise level in the regression, $\epsilon^2 = \sigma^2/(N_1 N_2)$, with $N_1 = N_2 = 64$ and various $\sigma$. Observe that the curves are similar as long as $\sigma$ is not too small, i.e. as long as the error due to the noise dominates the error due to the discretization. As can be seen, both analyses agree. This confirms that the white noise model analysis is relevant for the corresponding fixed design. A fine tuning of the choice of the maximum degree is very important to obtain a good estimator.
In our proposed scheme, and in the theorem, this parameter is set by the user according to some expected properties of the unknown function, or using some oracle. Nevertheless, an adaptive estimator, which does not require this input, can already be obtained from this family, for example by using some aggregation technique. A different way to obtain an adaptive estimator, based on thresholding, is under investigation by some of the authors.

Figure 4: Error decay in the white noise model (panels for the different $L^p$ norms, including $L^\infty$): the red curve corresponds to the needlet estimator and the black one to the SVD estimator.

Figure 5: Error decay in the regression model: the red curve corresponds to the needlet estimator and the black one to the SVD estimator.

Figure 6: Visual comparison for the Logan–Shepp phantom with $\epsilon = 8$: original ($f$), naive inversion ($\hat f_I$), needlet ($\hat f_N$), SVD ($\hat f_D$). (Low quality figure due to arXiv constraints.)

Appendix
From [20, p. 99], with some adjustment of notation, we have
$$\lambda_k^2 = \frac{|S^{d-2}|\,\pi^{(d-1)/2}}{\Gamma\big(\frac{d+1}{2}\big)\,C^{d/2}_k(1)\,C^{(d-2)/2}_l(1)}\int_{-1}^1 C^{d/2}_k(t)\,C^{(d-2)/2}_l(t)\,(1-t^2)^{(d-3)/2}\,dt,$$
where $0 \le l \le k$ and $l \equiv k \pmod 2$. As will be seen shortly, $\lambda_k$ is independent of $l$. We will only consider the case $d > 2$ ($d = 2$ is simpler; see [20, p. 99]).

To compute the above integral we will use the well-known identity (cf. [25, (4.7.29)])
$$(n+\lambda)\,C^\lambda_n(t) = \lambda\big(C^{\lambda+1}_n(t) - C^{\lambda+1}_{n-2}(t)\big).$$
Summing up these identities (with indices $n, n-2, \dots$) and taking into account that $C^\lambda_0(t) = 1$ and $C^\lambda_1(t) = 2\lambda t$, we get
$$C^{\lambda+1}_n(t) = \sum_{j=0}^{\lfloor n/2\rfloor}\frac{n-2j+\lambda}{\lambda}\,C^\lambda_{n-2j}(t). \tag{6.1}$$
This with $\lambda = (d-2)/2$, combined with the orthogonality of the polynomials $C^{(d-2)/2}_n(t)$, $n \ge 0$, yields
$$\int_{-1}^1 C^{d/2}_k(t)\,C^{(d-2)/2}_l(t)\,(1-t^2)^{(d-3)/2}\,dt = \frac{l+\lambda}{\lambda}\int_{-1}^1\big[C^{(d-2)/2}_l(t)\big]^2(1-t^2)^{(d-3)/2}\,dt = \frac{l+\lambda}{\lambda}\,h^{(\lambda)}_l = \frac{(l+\lambda)\,2^{1-2\lambda}\,\pi}{\lambda\,\Gamma(\lambda)^2}\cdot\frac{\Gamma(l+2\lambda)}{(l+\lambda)\,\Gamma(l+1)}.$$
We use this, together with $C^\lambda_n(1) = \frac{\Gamma(n+2\lambda)}{n!\,\Gamma(2\lambda)}$ (see [25]) and $|S^{d-2}| = \frac{2\pi^{(d-1)/2}}{\Gamma(\frac{d-1}{2})}$, to obtain (all factors involving $l$ cancel)
$$\lambda_k^2 = \frac{2^{4-d}\,\pi^{d}\,\Gamma(d-2)}{\Gamma\big(\frac{d-1}{2}\big)\Gamma\big(\frac{d+1}{2}\big)\Gamma\big(\frac{d}{2}\big)\Gamma\big(\frac{d-2}{2}\big)}\cdot\frac{k!\,\Gamma(d)}{\Gamma(k+d)}. \tag{6.2}$$
The doubling formula for the Gamma function says $\Gamma(2z) = \frac{2^{2z-1}}{\sqrt{\pi}}\,\Gamma(z)\Gamma(z+\tfrac12)$ (see e.g. [25]), and hence
$$\Gamma(d-2) = \frac{2^{d-3}}{\sqrt{\pi}}\,\Gamma\Big(\frac{d-2}{2}\Big)\Gamma\Big(\frac{d-1}{2}\Big), \qquad \Gamma(d) = \frac{2^{d-1}}{\sqrt{\pi}}\,\Gamma\Big(\frac{d}{2}\Big)\Gamma\Big(\frac{d+1}{2}\Big).$$
We insert these in (6.2) and a little algebra shows that
$$\lambda_k^2 = 2^d\pi^{d-1}\,\frac{k!}{\Gamma(k+d)} = \frac{2^d\pi^{d-1}}{(k+1)(k+2)\cdots(k+d-1)},$$
in agreement with $\lambda_k = 2\sqrt{\pi}/\sqrt{k+1}$ for $d = 2$.

Proof of Theorem 3.2

For the proof of estimates (3.23)–(3.24) we first note that by (3.3)
$$\sum_{\xi\in\chi_j}\|h_{j,\xi}\|_p^p \le c\,2^{jdp/2}\cdot 2^{-jd}\sum_{\xi\in\chi_j}\big(2^{-j}+\sqrt{1-|\xi|^2}\,\big)^{1-p/2},$$
and we need an upper bound for $\Omega_r := 2^{-jd}\sum_{\xi\in\chi_j}\big(2^{-j}+\sqrt{1-|\xi|^2}\,\big)^{-r}$ with $r = p/2-1$. To this end, we will use the natural bijection between $B^d$ and $S^d_+$ considered in the remark in §2: for $x \in B^d$ we write $\tilde x = (x,\sqrt{1-|x|^2}) \in S^d_+$. Let $\tilde p = (0,\dots,0,1)$ be the "north pole" of $S^d$. For $\tilde\xi \in \chi_j$ we denote by $B_{S^d}(\tilde\xi,\rho)$ the geodesic ball on $S^d$ of radius $\rho$ centered at $\tilde\xi$, i.e.
$$B_{S^d}(\tilde\xi,\rho) := \{\tilde x \in S^d : d_{S^d}(\tilde x,\tilde\xi) < \rho\},$$
where $d_{S^d}(\tilde x,\tilde p) = \operatorname{Arccos}\sqrt{1-|x|^2} = \operatorname{Arccos}\langle\tilde x,\tilde p\rangle$ is the geodesic distance between $\tilde x$ and $\tilde p$. Using that
$$\big|\sqrt{1-|u|^2}-\sqrt{1-|\xi|^2}\big| = \big|\langle\tilde u,\tilde p\rangle-\langle\tilde\xi,\tilde p\rangle\big| \le d_{S^d}(\tilde\xi,\tilde u) \le \rho \quad \text{for } \tilde u \in B_{S^d}(\tilde\xi,\rho),$$
and $\rho = \tau 2^{-j} \le 2^{-j}$ (see Proposition 3.1), it follows that
$$\tfrac12\big(2^{-j}+\sqrt{1-|\xi|^2}\big) \le 2^{-j}+\sqrt{1-|u|^2} = 2^{-j}+\langle\tilde u,\tilde p\rangle \le 2\big(2^{-j}+\sqrt{1-|\xi|^2}\big) \qquad \forall\,\tilde u \in B_{S^d}(\tilde\xi,\rho).$$
On the other hand, we have $|B_{S^d}(\tilde\xi,\rho)| = |S^{d-1}|\int_0^\rho(\sin\theta)^{d-1}\,d\theta \ge \rho^d\,|S^{d-1}|\,\frac{2^{d-1}}{d\,\pi^{d-1}}$ with $|S^{d-1}| = \frac{2\pi^{d/2}}{\Gamma(d/2)}$. We use the above and the fact that the balls $\{B_{S^d}(\tilde\xi,\rho)\}_{\xi\in\chi_j}$ are disjoint to obtain
$$\Omega_r \le c\,2^{-jd}\sum_{\xi\in\chi_j}\frac{1}{|B_{S^d}(\tilde\xi,\rho)|}\int_{B_{S^d}(\tilde\xi,\rho)}\big(2^{-j}+\langle\tilde u,\tilde p\rangle\big)^{-r}\,d\sigma(\tilde u) \le c\int_{S^d_+}\big(2^{-j}+\langle\tilde u,\tilde p\rangle\big)^{-r}\,d\sigma(\tilde u)$$
$$\le c\,|S^{d-1}|\int_0^{\pi/2}\frac{(\sin\theta)^{d-1}}{(2^{-j}+\cos\theta)^{r}}\,d\theta \le c\int_0^{\pi/2}\frac{\sin\theta}{(2^{-j}+\cos\theta)^{r}}\,d\theta = c\int_0^1\frac{dt}{(2^{-j}+t)^{r}} \le c(d,\tau,r)\int_{2^{-j}}^{2}t^{-r}\,dt.$$
This yields estimates (3.23)–(3.24).

We now turn to the proof of estimate (3.25). We will employ the maximal operator $\mathcal M_t$ ($t > 0$):
$$\mathcal M_t f(x) := \sup_{B\ni x}\Big(\frac{1}{|B|}\int_B|f(y)|^t\,dy\Big)^{1/t}, \qquad x \in B^d, \tag{6.3}$$
where the sup is over all balls $B \subset B^d$ with respect to the distance $d(\cdot,\cdot)$ from (3.9) containing $x$. It is easy to show (see §2) that the Lebesgue measure on $B^d$ is a doubling measure with respect to the distance $d(\cdot,\cdot)$. Hence the general theory of maximal operators applies. In particular, the Fefferman–Stein vector-valued maximal inequality is valid: if $0 < p < \infty$, $0 < q \le \infty$, and $0 < t < \min\{p,q\}$, then for any sequence of functions $\{f_\nu\}_\nu$ on $B^d$
$$\Big\|\Big(\sum_{\nu=1}^\infty|\mathcal M_t f_\nu(\cdot)|^q\Big)^{1/q}\Big\|_p \le c\,\Big\|\Big(\sum_{\nu=1}^\infty|f_\nu(\cdot)|^q\Big)^{1/q}\Big\|_p. \tag{6.4}$$
Denote by $B(\xi,r)$ the projection of $B_{S^d}(\tilde\xi,r)$ onto $B^d$, i.e. $B(\xi,r) := \{x \in B^d : d(x,\xi) < r\}$. By [15, Lemma 2.5], we have
$$\big(\mathcal M_t\mathbb{1}_{B(\xi,r)}\big)(x) \ge c\Big(1+\frac{d(\xi,x)}{r}\Big)^{-(d+1)/t}, \qquad \xi \in B^d,\ 0 < r \le \pi. \tag{6.5}$$
It is easy to see (cf. [15]) that
$$|B(\xi,\rho)| \sim 2^{-jd}\big(2^{-j}+\sqrt{1-|\xi|^2}\big) \sim 2^{-jd}\,W_j(\xi), \qquad \xi \in \chi_j. \tag{6.6}$$
Also, we let $\tilde{\mathbb{1}}_E := |E|^{-1/2}\mathbb{1}_E$ denote the $L^2$-normalized characteristic function of $E \subset B^d$. Then (3.21) and (6.6) imply
$$\|h_{j,\xi}\|_p \sim \|\tilde{\mathbb{1}}_{B(\xi,\rho)}\|_p, \qquad \xi \in \chi_j. \tag{6.7}$$
Now pick $0 < t < \min\{p,1\}$ and $M > (d+1)/t$. From (3.21) and (6.5) it follows that
$$|h_{j,\xi}(x)| \le c\big(\mathcal M_t\tilde{\mathbb{1}}_{B(\xi,\rho)}\big)(x), \qquad x \in B^d. \tag{6.8}$$
Using this, the maximal inequality (6.4), and (6.7) we obtain
$$\Big\|\sum_{\xi\in\chi_j}d_\xi h_{j,\xi}\Big\|_p \le c\,\Big\|\sum_{\xi\in\chi_j}\mathcal M_t\big(d_\xi\tilde{\mathbb{1}}_{B(\xi,\rho)}\big)\Big\|_p \le c\,\Big\|\sum_{\xi\in\chi_j}d_\xi\tilde{\mathbb{1}}_{B(\xi,\rho)}\Big\|_p \le c\Big(\sum_{\xi\in\chi_j}\|d_\xi\tilde{\mathbb{1}}_{B(\xi,\rho)}\|_p^p\Big)^{1/p} \le c\Big(\sum_{\xi\in\chi_j}\|d_\xi h_{j,\xi}\|_p^p\Big)^{1/p}.$$
This completes the proof of (3.25). □
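The summation identity (6.1) used above, and the closed form for $\lambda_k^2$, can be verified numerically. The three-term Gegenbauer recurrence used below is standard, not taken from the paper.

```python
import numpy as np
from math import pi, gamma, factorial

def gegenbauer(n, lam, t):
    # three-term recurrence: m C_m = 2(m + lam - 1) t C_{m-1} - (m + 2 lam - 2) C_{m-2}
    c0, c1 = np.ones_like(t), 2.0 * lam * t
    if n == 0:
        return c0
    for m in range(2, n + 1):
        c0, c1 = c1, (2.0 * (m + lam - 1.0) * t * c1 - (m + 2.0 * lam - 2.0) * c0) / m
    return c1

# identity (6.1): C^{lam+1}_n = sum_j ((n - 2j + lam)/lam) C^{lam}_{n-2j}
t = np.linspace(-1.0, 1.0, 7)
n, lam = 6, 1.5
lhs = gegenbauer(n, lam + 1.0, t)
rhs = sum((n - 2 * j + lam) / lam * gegenbauer(n - 2 * j, lam, t)
          for j in range(n // 2 + 1))

# closed form lambda_k^2 = 2^d pi^(d-1) k!/Gamma(k+d); for d = 2 this is 4 pi/(k+1)
d, kk = 2, 7
lam_sq = 2**d * pi**(d - 1) * factorial(kk) / gamma(kk + d)
```

The last two lines check that the general eigenvalue formula reduces, for $d = 2$, to $\lambda_k = 2\sqrt{\pi}/\sqrt{k+1}$ as used in the fan beam implementation.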
References

[1] G. Andrews, R. Askey, and R. Roy. Special Functions. Cambridge University Press, Cambridge, 1999.
[2] L. Brown and M. Low. Asymptotic equivalence of nonparametric regression and white noise. Ann. Statist., 24(6):2384–2398, 1996.
[3] L. Cavalier, G. K. Golubev, D. Picard, and A. B. Tsybakov. Oracle inequalities for inverse problems. Ann. Statist., 30(3):843–874, 2002.
[4] L. Cavalier and A. Tsybakov. Sharp adaptation for inverse problems with random noise. Probab. Theory Related Fields, 123(3):323–354, 2002.
[5] A. M. Cormack. Representation of a function by its line integrals with some radiological applications II. J. Appl. Phys., 35:2908–2913, 1964.
[6] M. Davison. A singular value decomposition for the Radon transform in n-dimensional Euclidean space. Numer. Funct. Anal. Optimiz., 3:321–340, 1981.
[7] V. Dicken and P. Maass. Wavelet–Galerkin methods for ill-posed problems. J. Inverse Ill-Posed Probl., 4(3):203–221, 1996.
[8] C. Dunkl and Y. Xu. Orthogonal Polynomials of Several Variables, volume 81 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, New York, 2001.
[9] S. Efromovich and V. Koltchinskii. On inverse problems with unknown operators. IEEE Trans. Inform. Theory, 47(7):2876–2894, 2001.
[10] A. Erdélyi, W. Magnus, F. Oberhettinger, and F. G. Tricomi. Higher Transcendental Functions, volume 2. McGraw-Hill, New York, 1953.
[11] A. Goldenshluger and S. Pereverzev. On adaptive inverse estimation of linear functionals in Hilbert scales. Bernoulli, 9(5):783–807, 2003.
[12] S. Helgason. The Radon Transform. Birkhäuser, Basel and Boston, 1980.
[13] H. Johnen and K. Scherer. On the equivalence of the K-functional and moduli of continuity and some applications. Lecture Notes in Math., 571:119–140, 1976.
[14] G. Kerkyacharian, P. Petrushev, D. Picard, and T. Willer. Needlet algorithms for estimation in inverse problems. Electron. J. Stat., 1:30–76, 2007.
[15] G. Kyriazis, P. Petrushev, and Y. Xu. Decomposition of weighted Triebel–Lizorkin and Besov spaces on the ball. Proc. London Math. Soc., 2008. To appear (arXiv:math/0703403).
[16] B. Logan and L. Shepp. Optimal reconstruction of a function from its projections. Duke Math. J., 42(4):645–659, 1975.
[17] A. Louis. Orthogonal function series expansions and the null space of the Radon transform. SIAM J. Math. Anal., 15:621–633, 1984.
[18] W. R. Madych and S. A. Nelson. Polynomial based algorithms for computed tomography. SIAM J. Appl. Math., 43(1):157–185, 1983.
[19] P. Mathé and S. Pereverzev. Geometry of linear ill-posed problems in variable Hilbert scales. Inverse Problems, 19(3):789–803, 2003.
[20] F. Natterer. The Mathematics of Computerized Tomography, volume 32 of Classics in Applied Mathematics. SIAM, Philadelphia, PA, 2001. Reprint of the 1986 original.
[21] P. Petrushev. Approximation by ridge functions and neural networks. SIAM J. Math. Anal., 30:155–189, 1999.
[22] P. Petrushev and Y. Xu. Localized polynomial frames on the ball. Constr. Approx., 27:121–148, 2008.
[23] L. A. Shepp and B. F. Logan. The Fourier reconstruction of a head section. IEEE Trans. Nucl. Sci., NS-21:21–43, 1974.
[24] E. Stein and G. Weiss. Introduction to Fourier Analysis on Euclidean Spaces. Princeton University Press, Princeton, NJ, 1971.
[25] G. Szegő. Orthogonal Polynomials. American Mathematical Society, Providence, RI, 1975.
[26] A. Tsybakov. On the best rate of adaptive estimation in some inverse problems. C. R. Acad. Sci. Paris Sér. I Math., 330(9):835–840, 2000.
[27] Y. Xu. Orthogonal polynomials and cubature formulae on spheres and on balls. SIAM J. Math. Anal., 29(3):779–793, 1998.
[28] Y. Xu. Reconstruction from Radon projections and orthogonal expansion on a ball. 2007. arXiv:0705.1984v1.
[29] Y. Xu and O. Tischenko. Fast OPED algorithm for reconstruction of images from Radon data. 2007. arXiv:math/0703617v1.
[30] Y. Xu, O. Tischenko, and C. Hoeschen. Image reconstruction by OPED algorithm with averaging.