Minimax optimal estimator in the stochastic inverse problem for exponential Radon transform
Anuj Abhishek
Abstract
In this article, we consider the problem of inverting the exponential Radon transform of a function in the presence of noise. We propose a kernel estimator to estimate the true function, analogous to the one proposed by Korostelëv and Tsybakov in their article 'Optimal rates of convergence of estimators in a probabilistic setup of tomography problem', Problems of Information Transmission, 27:73–81, 1991. For the estimator proposed in this article, we then show that it converges to the true function at a minimax optimal rate.
1 Introduction

The exponential Radon transform (ERT), which is the object of study in this article, can be thought of as a generalization of the classical Radon transform. In fact, the ERT of a compactly supported function $f(x)$ in $\mathbb{R}^2$ is given by:
\[
T_\mu f(\theta, s) = \int_{x\cdot\theta = s} e^{\mu\, x\cdot\theta^\perp} f(x)\, dx. \tag{1}
\]
Here $s \in \mathbb{R}$, $\theta \in S^1$ where $S^1$ is the unit circle in $\mathbb{R}^2$, $\mu$ is a constant, and $\theta^\perp$ denotes a unit vector perpendicular to $\theta$. Recall that lines in $\mathbb{R}^2$ can be parameterized as $L(\theta, s) = \{x : x\cdot\theta = s\}$. Thus, just as the classical Radon transform, the ERT takes a function defined on the plane and maps it to a function defined over the set of lines parameterized by $(\theta, s)$. Such transforms arise naturally in imaging modalities such as SPECT (single photon emission computed tomography) imaging [37] and nuclear magnetic resonance imaging [17].

The exponential Radon transform is a special case of a more general transform called the attenuated Radon transform, which takes the integral of a function over straight lines with respect to an exponential weight that models a non-constant attenuation effect. We refer the readers to the article by Finch [6] and the textbook by Natterer and Wübbeling [23] for an excellent overview of the attenuated Radon transform. Indeed, the attenuated Radon transform is itself an example of the generalized Radon transforms studied by Quinto in [25, 26].

Inversion methods for the exponential Radon transform were derived by Natterer in [22] and by Tretiak and Metz in [34]. Hazou and Solmon in [9] gave filtered backprojection (FBP) type formulas for the inversion of the ERT using a class of filters. Such FBP type inversion formulas are based on the method of approximate inverse, which was developed systematically in the articles by Louis [15] and Louis and Maass [16]. An exhaustive treatment of the method of approximate inverse can be found in the book by Schuster [30].
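To make definition (1) concrete, here is a minimal numerical sketch (not code from the article) that approximates $T_\mu f(\theta, s)$ by quadrature along the parameterized line $x = s\theta + t\theta^\perp$, on which $x\cdot\theta^\perp = t$. The radial bump function and the grid parameters are illustrative assumptions:

```python
import numpy as np

def ert(f, mu, angle, s, t_max=1.5, n=4001):
    # Approximate T_mu f(theta, s) = int_{x.theta = s} exp(mu * x.theta_perp) f(x) dx
    # by parameterizing the line as x = s*theta + t*theta_perp, so x.theta_perp = t.
    theta = np.array([np.cos(angle), np.sin(angle)])
    theta_perp = np.array([-np.sin(angle), np.cos(angle)])
    t = np.linspace(-t_max, t_max, n)
    x = s * theta + t[:, None] * theta_perp
    g = np.exp(mu * t) * f(x)
    dt = t[1] - t[0]
    return dt * (g.sum() - 0.5 * (g[0] + g[-1]))  # trapezoid rule

def bump(x):
    # a smooth function supported in the unit ball (an illustrative phantom)
    r2 = np.sum(x * x, axis=-1)
    out = np.zeros_like(r2)
    inside = r2 < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - r2[inside]))
    return out

# For mu = 0 the ERT reduces to the classical Radon transform; since the
# phantom is radial, the transform is then independent of the angle theta.
print(ert(bump, 0.0, 0.3, 0.5))
print(ert(bump, 0.5, 0.3, 0.5))  # mu != 0 weights one half of the line more
```

For $\mu = 0$ and a radial phantom the printed values do not depend on the angle, which is a convenient sanity check on any discretization of (1).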
Rigaud and Lakhal have used the method of approximate inverse to derive Sobolev estimates for the attenuated Radon transform in [27]; these estimates are central to proving some of the theorems in this article. Furthermore, Novikov in [24] and Natterer in [20] give an inversion formula for the more general attenuated Radon transform. There is extensive literature available on this subject, and we now give a partial list of references where an interested reader may find important insights and advances made in the study of exponential and attenuated Radon transforms, see e.g. [1, 2, 3, 7, 10, 18, 28, 29, 31, 32].

The classical Radon transform has also been extensively studied in the stochastic setting. A detailed discussion of positron emission tomography (PET) in the presence of noise can be found in the seminal article by Johnstone and Silverman [11]. In [8], Hahn and Quinto establish upper and lower bounds for the convergence of two probability measures in terms of the rates of convergence of their Radon transforms. Korostelëv and Tsybakov show in [12, 13] that optimal minimax convergence rates are attained by kernel type estimators, which are closely linked to FBP inversion methods. An exhaustive coverage of the non-parametric estimation methods that are used to establish the optimal convergence rates in this article and elsewhere can be found in the books written by Korostelëv and Tsybakov [14] and Tsybakov [35]. Cavalier obtained results on efficient estimation of a density in the non-parametric setting for the stochastic PET problem in [4, 5]. In addition to the non-parametric kernel type estimators, Bayesian estimators for the stochastic problem of X-ray tomography have been studied by several authors, most notably by Lassas, Siltanen and Somersalo, see e.g. [33, 36] and references therein.
More recently, Monard, Nickl and Paternain have obtained results on efficient Bayesian inference for the attenuated X-ray transform on a Riemannian manifold, see [19].

In this article, we propose a statistical kernel estimator for the ERT problem and show that it attains the optimal minimax rate of convergence. The organization of the article is as follows: in section 2, we describe the mathematical set-up of the stochastic problem for the ERT and recall some standard definitions from the literature. In section 3, we recall the FBP type inversion in the deterministic (noise-less) setting. In section 4, we propose a kernel type estimator and establish that it is asymptotically unbiased. Finally, in section 5 we show that this estimator attains optimal minimax rates of convergence.

2 The mathematical framework

In this section we will describe the mathematical framework for the problem and recall some standard definitions from the literature that will help us assess the optimality of the estimator proposed in this article. Let $f(x) : \mathbb{R}^2 \to \mathbb{R}$ be a function that satisfies the following assumptions:

Assumption 1 (A1):
Let $B = \{x : \|x\| \leq 1\}$ be the unit ball in $\mathbb{R}^2$. We assume that $f(x)$ is supported in the unit ball $B$.

Assumption 2 (A2):
Let $\tilde f(\xi)$ represent the Fourier transform of $f(x)$, i.e. $\tilde f(\xi) = \int_{\mathbb{R}^2} f(x)\, e^{-i\xi\cdot x}\, dx$. We assume that the Fourier transform of $f(x)$ satisfies the following inequality:
\[
\int_{\mathbb{R}^2} (1 + \|\xi\|)^{2\beta}\, |\tilde f(\xi)|^2\, d\xi \leq L^2
\]
for some fixed positive numbers $L$ and $\beta > 1$. We denote by $H(\beta, L)$ the class of functions satisfying assumptions A1 and A2.

Definition 1.
Let $S^1$ denote the unit circle in $\mathbb{R}^2$ and let $Z = S^1 \times [-1, 1]$ be the cylinder whose points are given by $(\theta, s)$, where $s \in [-1, 1]$ and $\theta \in S^1$. By $\theta^\perp$, we will denote a unit vector perpendicular to $\theta$. The exponential Radon transform of $f \in H(\beta, L)$ is defined as the following function on $Z$:
\[
T_\mu f(\theta, s) = \int_{x\cdot\theta = s} e^{\mu\, x\cdot\theta^\perp} f(x)\, dx,
\]
where $\mu$ is a fixed constant. It is clear that if $\mu = 0$, then the exponential Radon transform reduces to the classical Radon transform.

Definition 2.
Associated to the exponential Radon transform is its dual transform
\[
T^\sharp_\mu g(x) = \int_{S^1} e^{\mu\, x\cdot\theta^\perp} g(\theta, x\cdot\theta)\, d\theta.
\]
Clearly, for $\mu = 0$, this is the backprojection operator for the classical Radon transform.

Now we will describe the stochastic problem for the exponential Radon transform. Let $\{(\theta_i, s_i)\}_{i=1}^{n}$ be $n$ random points on the observation space $Z$ and let the observations be of the form:
\[
Y_i = T_\mu f(\theta_i, s_i) + \epsilon_i. \tag{2}
\]
We assume that the points $(\theta_i, s_i)$ are independent and identically distributed (i.i.d.) on $Z$ and that the $\epsilon_i$ are i.i.d. random variables with zero mean and some finite positive variance $\sigma^2$. The collection of the random points $\{(\theta_i, s_i)\}_{i=1}^{n}$ where observations are made is called the design and will be denoted by $D_n$. In the observation model given by equation (2), the random variables $\epsilon_i$ account for noise. The stochastic inverse problem for the exponential Radon transform is then to estimate the function $f(x)$ based on the observations $Y_i$ for $i \in \{1, 2, \ldots, n\}$. This problem is non-parametric in the sense that the function $f$ itself is not assumed to be of any parametric form, but is rather assumed to belong to a general class of functions, say $\mathcal{F}$. In this article we have assumed $f \in H(\beta, L)$. Suppose one devises an estimator $\hat f_n(x)$ based on the observed data. One is then naturally led to ask whether this estimator is optimal. The most popular approach to assess the optimality of estimators in a non-parametric setting is the minimax approach, which we describe below. Let the non-parametric class of functions $\mathcal{F}$ be equipped with a semi-norm $d$. The semi-distance between two elements $f \in \mathcal{F}$ and $g \in \mathcal{F}$ will be represented as $d(f, g)$, and we will use the quantity $d^2(\hat f, f) = (d(\hat f, f))^2$ as a measure of error between an estimator $\hat f$ and the true function $f$.
First of all, note that since any such estimator $\hat f_n(x)$ will depend on the random observation points $\{(\theta_i, s_i)\}_{i=1}^{n}$ and the observations $\{Y_i\}_{i=1}^{n}$, it is better to consider the expected value of the error between the estimator and the true function (under the chosen semi-norm) as a measure of accuracy. The following definitions are standard in the literature.

Definition 3 ([11, 35]). The risk function of an estimator $\hat f_n(x)$ is defined as:
\[
R(\hat f_n, f) = \mathbb{E}_f\big(d^2(\hat f_n, f)\big).
\]
From here on, $\mathbb{E}_f$ will be used to denote the expectation with respect to the joint distribution of the random variables $(\theta_i, s_i, Y_i)$, $i \in \{1, \ldots, n\}$, satisfying the model given by (2). Ideally, one would like to devise an estimator that minimizes the risk function. However, as the definition of the risk function depends on $f$ as well, one tries instead to find an overall measure of risk such as the minimax risk.

Definition 4. [35, Page 78] Let $f(x)$ belong to some non-parametric class of functions $\mathcal{F}$. The maximum risk of an estimator $\hat f_n$ is defined as:
\[
r(\hat f_n) = \sup_{f \in \mathcal{F}} R(\hat f_n, f).
\]
Finally, the minimax risk on $\mathcal{F}$ is defined as:
\[
r_n(\mathcal{F}) = \inf_{\hat f_n} \sup_{f \in \mathcal{F}} R(\hat f_n, f),
\]
where the infimum is taken over the set of all possible estimators $\hat f_n$ of $f$. Clearly, $r_n(\mathcal{F}) \leq r(\hat f_n)$.

Definition 5. [35, Page 78] Let $\{\Psi_n\}_{n=1}^{\infty}$ be a positive sequence converging to zero. An estimator $\hat f^*_n$ is said to be minimax optimal if there exist finite positive constants $C_1$ and $C_2$ such that
\[
C_1 \Psi_n \leq r_n(\mathcal{F}) \leq r(\hat f^*_n) \leq C_2 \Psi_n.
\]
Furthermore, $\Psi_n$ is then said to be the optimal rate of convergence.

In this article, whenever we refer to the optimality of an estimator, we will mean its minimax optimality. In section 4, we will propose an estimator for $f(x) \in H(\beta, L)$ based on the model (2) and establish its optimality in the following (semi) norms:

1. $d_1(f, g) = |f(x_0) - g(x_0)|$ ($x_0$ is an arbitrary fixed point in $B$),
2. $d_2(f, g) = \big( \int |f(x) - g(x)|^2\, dx \big)^{1/2}$,
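Concretely, the two semi-distances can be approximated on a discrete grid. The following small sketch (the grid resolution and the pair of test functions are arbitrary illustrative choices, not objects from the article) computes $d_1$ at the origin and $d_2$ over the unit ball:

```python
import numpy as np

xs = np.linspace(-1.0, 1.0, 401)          # grid over [-1,1]^2 covering the unit ball B
X, Y = np.meshgrid(xs, xs)
ball = X**2 + Y**2 <= 1.0

f = np.where(ball, 1.0 - X**2 - Y**2, 0.0)          # a "true" function supported in B
g = np.where(ball, 0.9 * (1.0 - X**2 - Y**2), 0.0)  # a hypothetical estimate of f

i0 = 200                               # xs[200] == 0.0, so (i0, i0) is the origin
d1 = abs(f[i0, i0] - g[i0, i0])        # d1(f, g) = |f(x0) - g(x0)| at x0 = 0
dA = (xs[1] - xs[0])**2                # area element of the grid
d2 = np.sqrt(np.sum((f - g)**2) * dA)  # d2(f, g) = (int |f - g|^2 dx)^{1/2}

print(d1, d2)
```

Averaging $d_1^2$ or $d_2^2$ over repeated noisy realizations of an estimator is then exactly the empirical analogue of the MSE and MISE risks discussed next.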
as per Definition 5 above. We also note that the risk function defined using the semi-norm $d_1$ is called the mean squared error (MSE), while the risk function defined using $d_2$ is referred to in the literature as the mean integrated squared error (MISE) of the estimator. Thus:
\[
\mathrm{MSE}(\hat f_n, f) = \mathbb{E}_f\big(d_1^2(\hat f_n, f)\big), \qquad \mathrm{MISE}(\hat f_n, f) = \mathbb{E}_f\big(d_2^2(\hat f_n, f)\big).
\]
Finally, we recall the Kullback distance between two probability measures on a measurable space:
Definition 6. [35, Page 84] Let $P$ and $Q$ be two probability measures on some measurable space $(\mathcal{X}, \mathcal{A})$. The Kullback distance between the two measures is given by
\[
I(P, Q) = \int \log\Big(\frac{dP}{dQ}\Big)\, dP
\]
if $P$ is absolutely continuous with respect to $Q$, and $I(P, Q) = \infty$ otherwise.

3 The deterministic setting

In this section we will describe some of the results from the deterministic set-up, i.e. when the observations in the model given by (2) are not corrupted by noise. Let $\rho > 0$ be such that $|\mu| < 1/\rho$. Consider the function $K_\rho(\theta, s) = K_\rho(s)$ defined as:
\[
K_\rho(s) = \frac{1}{\pi} \int_{|\mu|}^{\sqrt{(1/\rho)^2 + \mu^2}} r \cos(sr)\, dr. \tag{3}
\]
Functions of this kind have been used in the context of filtered backprojection formulas for Radon transforms, see e.g. [14, Page 237], [21, Page 109]. Let $I_\rho(t)$ denote the indicator function:
\[
I_\rho(t) = \begin{cases} 1, & |t| < 1/\rho, \\ 0, & |t| \geq 1/\rho. \end{cases}
\]
The one dimensional Fourier transform of $K_\rho(\theta, s)$ (in the $s$-variable) is:
\[
\tilde K_\rho(\theta, t) = \begin{cases} |t|, & |\mu| < |t| < \sqrt{(1/\rho)^2 + \mu^2}, \\ 0, & \text{otherwise}. \end{cases} \tag{4}
\]
In the following analysis, $\star$ will represent the operation of convolution of functions. Furthermore, whenever the convolution of two functions $f$ and $g$ defined on the cylinder $Z = S^1 \times \mathbb{R}$ is considered, the convolution will be understood to be taken with respect to their second variable, i.e.
\[
f \star g(\theta, s) = \int_{\mathbb{R}} f(\theta, s - t)\, g(\theta, t)\, dt.
\]

Theorem 1. [21, Page 49] Let $f_\rho(x) = \frac{1}{4\pi} T^\sharp_{-\mu}(K_\rho \star T_\mu f)$. Then, $f(x) = \lim_{\rho \to 0} f_\rho(x)$.

Proof.
The proof of this theorem is well known, see e.g. [21, Section II.6]. However, we reproduce it here for the sake of completeness. First of all, recall from [21, (6.2), Page 47] that $T^\sharp_{-\mu}(g \star T_\mu f) = (T^\sharp_{-\mu} g) \star f$. Thus, if we can show that $\frac{1}{4\pi} T^\sharp_{-\mu} K_\rho$ is an approximate Dirac-delta function, then we are done. Let us then compute:
\[
T^\sharp_{-\mu} K_\rho(x) = \int_{S^1} e^{-\mu\, x\cdot\theta^\perp} K_\rho(\theta, x\cdot\theta)\, d\theta
= \frac{1}{2\pi} \int_{S^1} e^{-\mu\, x\cdot\theta^\perp} \int_{\mathbb{R}} e^{i(x\cdot\theta)t}\, \tilde K_\rho(\theta, t)\, dt\, d\theta
= \frac{1}{2\pi} \int_{|\mu| < |t| < \sqrt{(1/\rho)^2 + \mu^2}} |t| \int_{S^1} e^{-\mu\, x\cdot\theta^\perp + i(x\cdot\theta)t}\, d\theta\, dt.
\]
In what follows, $J_0$ denotes the Bessel function of the first kind of integer order 0. Now, from [21, VII.3.17], $\int_{S^1} e^{-\mu\, x\cdot\theta^\perp + i(x\cdot\theta)t}\, d\theta = 2\pi J_0\big(|x|(t^2 - \mu^2)^{1/2}\big)$. Thus,
\[
T^\sharp_{-\mu} K_\rho(x) = \int_{|\mu| < |t| < \sqrt{(1/\rho)^2 + \mu^2}} |t|\, J_0\big(|x|(t^2 - \mu^2)^{1/2}\big)\, dt
= 2 \int_0^{1/\rho} \sigma J_0(|x|\sigma)\, d\sigma \quad \big(\sigma = (t^2 - \mu^2)^{1/2}\big)
= 4\pi \Big( \frac{1}{2\pi} \int_0^{1/\rho} \sigma J_0(|x|\sigma)\, d\sigma \Big)
= 4\pi\, \delta_{1/\rho}(x) \quad [21,\ (1.3),\ \text{Page 183}],
\]
where
\[
\delta_{1/\rho}(x) = \frac{1}{4\pi^2} \int_{|\xi| < 1/\rho} e^{i x\cdot\xi}\, d\xi = \frac{1}{4\pi^2} \int_{\mathbb{R}^2} I_\rho(|\xi|)\, e^{i x\cdot\xi}\, d\xi
\]
is an approximate Dirac-delta function that converges pointwise (in the space of tempered distributions) to the Dirac distribution $\delta(x)$ as $\rho \to 0$. This completes the proof.

4 A kernel estimator for $f \in H(\beta, L)$

In this section we propose a statistical estimator for $f \in H(\beta, L)$ based on the model (2) in the stochastic problem for the exponential Radon transform. Inspired by the estimator proposed in [12] and by Theorem 1 above, let us consider the statistical estimator:
\[
f^*_n(x) = \frac{1}{n} \sum_{i=1}^{n} e^{-\mu\, x\cdot\theta_i^\perp} K_{\rho_n}(x\cdot\theta_i - s_i)\, Y_i, \tag{5}
\]
where $\theta_i$, $s_i$ and $Y_i$ are i.i.d. random variables as per the model (2) and $\rho_n \to 0$ as $n \to \infty$. We will call $\rho_n$ the bandwidth of the estimator. Note that the MSE of the estimator in the non-parametric setting can be broken down into two terms, a "bias term" and a "variance term":
\[
\mathrm{MSE}(f^*_n, f) = \mathbb{E}_f\big[(f^*_n(x) - f(x))^2\big]
= \big(\mathbb{E}_f(f^*_n(x)) - f(x)\big)^2 + \mathbb{E}_f\big[(f^*_n(x) - \mathbb{E}_f(f^*_n(x)))^2\big]
= B_n^2(x) + V_n(x), \tag{6}
\]
where $B_n(x)$ is the bias of the estimator and $V_n(x)$ is its variance. Note that
\[
\mathrm{MISE}(f^*_n, f) = \|B_n\|_{L^2}^2 + \|V_n\|_{L^1},
\]
where $\|\cdot\|_{L^2}$ and $\|\cdot\|_{L^1}$ denote the corresponding Lebesgue norms. Recall that an estimator is said to be asymptotically unbiased if its bias goes to zero pointwise as the number of observations (samples) $n$ grows. We will now show that the estimator proposed above is asymptotically unbiased.

Theorem 2.
Let $(\theta_i, s_i)$, $i \in \{1, \ldots, n\}$, be i.i.d. random variables uniformly distributed on $Z = S^1 \times [-1, 1]$, and let these points be independent of the errors $(\epsilon_1, \ldots, \epsilon_n)$. If we consider the kernel estimator $f^*_n(x) = \frac{1}{n} \sum_{i=1}^{n} e^{-\mu\, x\cdot\theta_i^\perp} K_{\rho_n}(x\cdot\theta_i - s_i) Y_i$, then for each $x \in B$ the bias term $B_n(x) = \mathbb{E}_f(f^*_n(x)) - f(x)$ of this estimator goes to zero as $n \to \infty$.

Proof. It suffices to show that $\mathbb{E}_f(f^*_n(x)) = f_{\rho_n}(x)$, where $f_{\rho_n}(x)$ is given by Theorem 1. Then, since $\rho_n \to 0$ as $n \to \infty$, we get $\mathbb{E}_f(f^*_n(x)) = f_{\rho_n}(x) \to f(x)$ pointwise. In what follows, the i.i.d. random variables $\theta_i$ have the same distribution as some random variable $\theta$, all $s_i$ are distributed as some random variable $s$, and similarly $Y$ and $\epsilon$ are random variables with the same distribution as the $Y_i$ and $\epsilon_i$ respectively. We will also denote by $\mathbb{E}_{(\theta,s)}(\cdot)$ the expected value of a random variable with respect to the joint distribution of $(\theta, s)$, and by $\mathbb{E}_{f|(\theta,s)}(\cdot)$ the conditional expectation of a random variable given $(\theta, s)$. Consider:
\[
\mathbb{E}_f(f^*_n(x)) = \frac{1}{n}\, \mathbb{E}_f\Big( \sum_{i=1}^{n} e^{-\mu\, x\cdot\theta_i^\perp} K_{\rho_n}(x\cdot\theta_i - s_i)\, Y_i \Big)
= \mathbb{E}_f\big( e^{-\mu\, x\cdot\theta^\perp} K_{\rho_n}(x\cdot\theta - s)\, Y \big)
\]
\[
= \mathbb{E}_{(\theta,s)}\Big( \mathbb{E}_{f|(\theta,s)}\big( e^{-\mu\, x\cdot\theta^\perp} K_{\rho_n}(x\cdot\theta - s)\,(T_\mu f(\theta, s) + \epsilon) \big) \Big) \quad \text{(law of iterated expectation)}
\]
\[
= \mathbb{E}_{(\theta,s)}\big( e^{-\mu\, x\cdot\theta^\perp} K_{\rho_n}(x\cdot\theta - s)\, T_\mu f(\theta, s) \big) \quad (\epsilon \text{ has mean } 0)
\]
\[
= \frac{1}{4\pi} \int_{S^1} e^{-\mu\, x\cdot\theta^\perp} \int_{-1}^{1} K_{\rho_n}(x\cdot\theta - s)\, T_\mu f(\theta, s)\, ds\, d\theta = f_{\rho_n}(x).
\]

5 Optimality of the estimator

In this section we will first show that while the bias of the estimator decreases as the bandwidth goes to zero, the variance increases as the bandwidth decreases. Thus an optimal rate of convergence can be obtained by finding a suitable bandwidth $\rho_n$ which balances the bias and the variance terms.
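As a concrete illustration of the estimator (5) and of the role of the bandwidth, here is a small end-to-end simulation sketch (not code from the article). The kernel $K_\rho$ of (3) is evaluated in closed form by antidifferentiating $r\cos(sr)$, data $Y_i = T_\mu f(\theta_i, s_i) + \epsilon_i$ are generated for a radial phantom, and $f^*_n$ is evaluated at the origin. The phantom, noise level, sample size and bandwidth are illustrative assumptions, not the rate-optimal choices of the theorems below:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, rho, n_obs, sigma = 0.2, 0.35, 4000, 0.05

def K_rho(s, rho, mu):
    # K_rho(s) = (1/pi) * int_{|mu|}^{M} r cos(sr) dr,  M = sqrt(1/rho^2 + mu^2),
    # via the antiderivative cos(sr)/s^2 + r sin(sr)/s (with the s -> 0 limit).
    M, a = np.sqrt(1.0 / rho**2 + mu**2), abs(mu)
    s = np.atleast_1d(np.asarray(s, dtype=float))
    out = np.empty_like(s)
    small = np.abs(s) < 1e-8
    out[small] = (M**2 - a**2) / (2.0 * np.pi)   # limiting value at s = 0
    sb = s[~small]
    F = lambda r: np.cos(sb * r) / sb**2 + r * np.sin(sb * r) / sb
    out[~small] = (F(M) - F(a)) / np.pi
    return out

def bump(x):
    # smooth radial phantom supported in the unit ball
    r2 = np.sum(x * x, axis=-1)
    return np.where(r2 < 1.0, np.exp(-1.0 / np.maximum(1.0 - r2, 1e-300)), 0.0)

def ert(mu, angle, s, t_max=1.5, n=801):
    # quadrature for T_mu f(theta, s) along x = s*theta + t*theta_perp
    theta = np.array([np.cos(angle), np.sin(angle)])
    perp = np.array([-np.sin(angle), np.cos(angle)])
    t = np.linspace(-t_max, t_max, n)
    g = np.exp(mu * t) * bump(s * theta + t[:, None] * perp)
    return (t[1] - t[0]) * (g.sum() - 0.5 * (g[0] + g[-1]))

# simulate the design and the noisy observations (2)
angles = rng.uniform(0.0, 2.0 * np.pi, n_obs)
s_i = rng.uniform(-1.0, 1.0, n_obs)
Y = np.array([ert(mu, a, s) for a, s in zip(angles, s_i)]) \
    + sigma * rng.standard_normal(n_obs)

# f*_n at x = 0: there x.theta = x.theta_perp = 0, so each weight is K_rho(-s_i)
f_hat0 = np.mean(K_rho(-s_i, rho, mu) * Y)
print(f_hat0, "vs f(0) =", np.exp(-1.0))
```

Shrinking `rho` sharpens the kernel (smaller bias) but inflates the weights (larger variance), which is exactly the trade-off quantified in this section.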
Furthermore, we will establish the optimality of the proposed estimator under both semi-norms $d_1$ and $d_2$ as defined in Section 2. Let us now analyze the bias and the variance terms one by one. It is easy to check that for $\beta > 1$:
\[
|I_{\rho_n}(t) - 1| \leq (|t|\rho_n)^{\beta}, \tag{7}
\]
\[
|I_{\rho_n}(t) - 1| \leq \frac{2(|t|\rho_n)^{\beta}}{1 + (|t|\rho_n)^{\beta}}. \tag{8}
\]
Consider first the bias term, $B_n(x) = f_{\rho_n}(x) - f(x) = \delta_{1/\rho_n} \star f(x) - f(x)$. Then for any fixed point $x \in B$ and $\beta > 1$:
\[
|B_n(x)| = |\delta_{1/\rho_n} \star f(x) - f(x)|
\leq \frac{1}{4\pi^2} \int_{\mathbb{R}^2} \big|I_{\rho_n}(|\xi|) - 1\big|\, |\tilde f(\xi)|\, d\xi
\leq \frac{1}{4\pi^2} \int_{\mathbb{R}^2} |\tilde f(\xi)|\, \frac{2(|\xi|\rho_n)^{\beta}}{1 + (|\xi|\rho_n)^{\beta}}\, d\xi \quad \text{(using (8))}
\]
\[
\leq \frac{\rho_n^{\beta}}{2\pi^2} \Big[ \int_{\mathbb{R}^2} |\tilde f(\xi)|^2\, |\xi|^{2\beta}\, d\xi \Big]^{1/2} \Big[ \int_{\mathbb{R}^2} \big(1 + (|\xi|\rho_n)^{\beta}\big)^{-2}\, d\xi \Big]^{1/2} \quad \text{(using H\"older's inequality)}
\leq c_1 \rho_n^{\beta - 1}, \quad c_1 > 0, \tag{9}
\]
since the first integral is bounded by $L^2$ and the second integral is of order $\rho_n^{-2}$ when $\beta > 1$. For the analysis under the semi-norm $d_2$, we also find an estimate for $\|B_n\|_{L^2}$:
\[
\|B_n\|_{L^2}^2 = \|\delta_{1/\rho_n} \star f - f\|_{L^2}^2 = \frac{1}{4\pi^2} \int_{\mathbb{R}^2} \big|I_{\rho_n}(|\xi|) - 1\big|^2\, |\tilde f(\xi)|^2\, d\xi \quad \text{(using Parseval's theorem)}
\]
\[
\leq \frac{1}{4\pi^2} \int_{\mathbb{R}^2} |\tilde f(\xi)|^2\, (|\xi|\rho_n)^{2\beta}\, d\xi \quad \text{(using (7))} \tag{10}
\]
\[
\leq \frac{L^2 \rho_n^{2\beta}}{4\pi^2} = c_3 \rho_n^{2\beta}, \tag{11}
\]
where $c_3 = L^2/4\pi^2$. Now we estimate the variance.

Lemma 1. $V_n(x) = \mathbb{E}_f\big( (f^*_n(x) - \mathbb{E}_f(f^*_n(x)))^2 \big) \leq c_2/(n\rho_n^3)$ for $x \in B$ and for some constant $c_2 > 0$. From this it also follows that $\|V_n\|_{L^1} \leq c_4/(n\rho_n^3)$ for some constant $c_4$.

Proof. In the following, $\mathrm{Var}$ will denote the variance as per standard notation. First of all, note that $\mathbb{E}_f(f^*_n(x)) = f_{\rho_n}(x)$ and that $s_i$, $\theta_i$ and $Y_i$ are i.i.d. random variables. Thus,
\[
V_n(x) = \mathbb{E}_f\big( (f^*_n(x) - \mathbb{E}_f(f^*_n(x)))^2 \big)
= \frac{1}{n}\, \mathrm{Var}_f\big( e^{-\mu\, x\cdot\theta^\perp} K_{\rho_n}(x\cdot\theta - s)\, T_\mu f(\theta, s) \big) + \frac{1}{n}\, \mathbb{E}_f\big( e^{-2\mu\, x\cdot\theta^\perp} K_{\rho_n}^2(x\cdot\theta - s)\, \epsilon^2 \big)
\]
\[
\leq \frac{\sigma^2 + 4 e^{2|\mu|} L'^2}{4\pi n} \int_{S^1} e^{-2\mu\, x\cdot\theta^\perp} \int_{-1}^{1} K_{\rho_n}^2(x\cdot\theta - s)\, ds\, d\theta,
\]
where we use the fact that since $f \in H(\beta, L)$ is compactly supported in $B$, we get $|T_\mu f(\theta, s)| \leq 2 e^{|\mu|} L'$ for a constant $L'$ depending only on $\beta$ and $L$. Let us now estimate:
\[
\int_{-1}^{1} K_{\rho_n}^2(x\cdot\theta - s)\, ds \leq \int_{-\infty}^{\infty} |K_{\rho_n}(s)|^2\, ds = \frac{1}{2\pi} \int_{-\infty}^{\infty} |\tilde K_{\rho_n}(t)|^2\, dt \quad \text{(using Parseval's theorem)}
\]
\[
= \frac{1}{3\pi} \Big[ \big((1/\rho_n)^2 + \mu^2\big)^{3/2} - |\mu|^3 \Big]
= \frac{1}{3\pi} \cdot \frac{(1/\rho_n)^2 \Big( (1/\rho_n)^2 + 2\mu^2 + |\mu| \sqrt{(1/\rho_n)^2 + \mu^2} \Big)}{\sqrt{(1/\rho_n)^2 + \mu^2} + |\mu|}
\leq \frac{3 + \sqrt{2}}{3\pi \rho_n^3},
\]
where we have used the fact that $|\mu| \leq 1/\rho_n$. Thus
\[
V_n(x) \leq \frac{(3 + \sqrt{2})(\sigma^2 + 4 e^{2|\mu|} L'^2)}{12\pi^2 n \rho_n^3} \int_{S^1} e^{-2\mu\, x\cdot\theta^\perp}\, d\theta \leq \frac{c_2}{n\rho_n^3} \quad (\text{for } x \in B), \tag{12}
\]
where $c_2 > 0$. Finally, $\|V_n\|_{L^1} = \int_{B} V_n(x)\, dx \leq c_4/(n\rho_n^3)$ for some constant $c_4$.

Theorem 3.
Let $f \in H(\beta, L)$ where $\beta > 1$, and let $f^*_n(x)$ be the estimator defined in section 4. Let $\theta_i, s_i$ for $i = 1, \ldots, n$ be i.i.d. random variables, and let the observation model corresponding to the problem of the ERT be given by (2). Let $x_0 \in B$ be an arbitrary fixed point and, in the definition of risk in section 2, use the seminorm $d_1(f, g) = |f(x_0) - g(x_0)|$. Let $\rho_n = \alpha_1 n^{-1/(2\beta + 1)}$ for some constant $\alpha_1$; then the following upper bound holds:
\[
\sup_{f \in H(\beta, L)} \psi_n^{-1}\, \mathrm{MSE}(f^*_n, f) \leq C,
\]
where $\psi_n = n^{-2(\beta - 1)/(2\beta + 1)}$.

Proof. $\mathrm{MSE}(f^*_n, f) = B_n^2(x_0) + V_n(x_0) \leq c_1^2 \rho_n^{2(\beta - 1)} + c_2/(n\rho_n^3)$. The minimum of the right-hand side is attained for $\rho^*_n = \big( \frac{3 c_2}{2 c_1^2 (\beta - 1)} \big)^{1/(2\beta + 1)} n^{-1/(2\beta + 1)}$. With this choice $\rho_n = \rho^*_n$, we get $\mathrm{MSE}(f^*_n, f) = O\big(n^{-2(\beta - 1)/(2\beta + 1)}\big)$.

Theorem 4.
Let $f \in H(\beta, L)$ where $\beta > 1$, and let $f^*_n(x)$ be the estimator defined in section 4. Let $\theta_i, s_i$ for $i = 1, \ldots, n$ be i.i.d. random variables, and let the observation model corresponding to the problem of the ERT be given by (2). Consider the seminorm given by $d_2(f, g) = \|f - g\|_{L^2}$, where $\|\cdot\|_{L^2}$ indicates the $L^2$ norm as usual. Let $\rho_n = \alpha_2 n^{-1/(2\beta + 3)}$, where $\alpha_2$ is a constant. Then the following upper bound holds:
\[
\sup_{f \in H(\beta, L)} \Psi_n^{-1}\, \mathrm{MISE}(f^*_n, f) \leq C,
\]
where $\Psi_n = n^{-2\beta/(2\beta + 3)}$ and $C$ is a positive constant.

Proof. $\mathrm{MISE}(f^*_n, f) = \|B_n\|_{L^2}^2 + \|V_n\|_{L^1} \leq c_3 \rho_n^{2\beta} + c_4/(n\rho_n^3)$. Note that the minimum of the right-hand side above is attained for $\rho^*_n = \big( \frac{3 c_4}{2 \beta c_3} \big)^{1/(2\beta + 3)} n^{-1/(2\beta + 3)}$. With this choice $\rho_n = \rho^*_n$, $\mathrm{MISE}(f^*_n, f) = O\big(n^{-2\beta/(2\beta + 3)}\big)$. This completes our proof.

The upper bounds established in Theorems 3 and 4 above imply that the maximum risks for the estimator under the two seminorms $d_1$ and $d_2$ are bounded above by $C\psi_n$ and $C\Psi_n$ respectively, where $\psi_n$ and $\Psi_n$ are sequences that go to zero as $n \to \infty$. As per Definition 5, to establish the optimality of the estimator we need to show that each of the two minimax risks also satisfies the corresponding lower bound. To that end, we first make the following additional assumptions on the observation model (2):

Assumption on the distribution of noise (B1):
The random variables $\epsilon_i$ are i.i.d. with a distribution $G(\cdot)$ that satisfies:
\[
\int_{-\infty}^{\infty} \ln\Big( \frac{dG(u)}{dG(u + v)} \Big)\, dG(u) \leq I_0 v^2, \qquad |v| \leq v_0, \tag{13}
\]
where $I_0 > 0$ and $v_0 > 0$.

Assumption on design points (B2): A design $\{\theta_i, s_i\}_{i=1}^{n}$ on the cylinder $Z = S^1 \times [-1, 1]$ will be said to be feasible if any non-negative measurable function $g(\theta, s)$ defined on $Z$ satisfies:
\[
\mathbb{E}_{(\theta,s)}\Big[ \sum_{i=1}^{n} g(\theta_i, s_i) \Big] \leq C_0\, n \int_{Z} g(\theta, s)\, ds\, d\theta. \tag{14}
\]
In what follows, we will assume that the design is feasible in the sense described above.

Theorem 5.
Let $\beta, f, f^*_n, \theta_i, s_i$ be as in Theorem 3. If, in addition, assumptions B1 and B2 are satisfied by the observation model (2), then the following inequality holds:
\[
\liminf_{n \to \infty}\, \inf_{\hat f_n}\, \sup_{f \in H(\beta, L)} \psi_n^{-1}\, \mathrm{MSE}(\hat f_n, f) \geq c',
\]
where $\psi_n$ is the same sequence as in Theorem 3, $\inf_{\hat f_n}$ denotes the infimum over all estimators, and $c' > 0$ is some constant.

Proof. The proof method follows that of [12, Theorem 4], and we adapt their proof wherever needed. As noted there, using standard reduction techniques for establishing lower bounds on the minimax risk of regression estimators in a non-parametric setting, the problem can be reduced to showing that the Kullback distance between the two probability measures corresponding to two appropriately chosen functions (hypotheses) is bounded, see also [35, Section 2.5]. Thus consider the functions (hypotheses) $f_0(x) = 0$ and $f_1(x) = A h^{\beta - 1} \eta((x - x_0)/h)$, where $h = n^{-1/(2\beta + 1)}$, $\eta(x) \in H(\beta, L)$ is a compactly supported bounded function such that $\eta(0) > 0$, and $0 < A < 1$ is chosen small enough that $f_1(x) \in H(\beta, L)$. Note that, by the change of variables $y = (x - x_0)/h$:
\[
\tilde f_1(\xi) = A h^{\beta - 1} \int \eta((x - x_0)/h)\, e^{-i\xi\cdot x}\, dx = A h^{\beta + 1}\, e^{-i\xi\cdot x_0}\, \tilde\eta(h\xi).
\]
Thus,
\[
\int (1 + |\xi|)^{2\beta}\, |\tilde f_1(\xi)|^2\, d\xi = A^2 h^{2(\beta + 1)} \int (1 + |\xi|)^{2\beta}\, |\tilde\eta(h\xi)|^2\, d\xi = A^2 \int (h + |\bar\xi|)^{2\beta}\, |\tilde\eta(\bar\xi)|^2\, d\bar\xi \leq L^2,
\]
where we have used the fact that $0 < h, A < 1$ and $\eta(x) \in H(\beta, L)$. Also observe that $|f_1(x_0) - f_0(x_0)| = A h^{\beta - 1} \eta(0)$ with $\eta(0) > 0$, so the two hypotheses are separated by a constant multiple of $\psi_n^{1/2}$. Let $P_0$ and $P_1$ be the probability measures corresponding to the experiments with observations given by the regression model (2) for $f = f_0$ and $f = f_1$ respectively, and let $p_0$ and $p_1$ be the densities corresponding to the measures $P_0$ and $P_1$. Then, to complete the proof of the theorem, it suffices to show that the Kullback information distance between the two measures satisfies $I(P_0, P_1) \leq 1/2$. Again, from [12],
\[
I(P_0, P_1) = \int \ln\Big( \frac{dP_0}{dP_1} \Big)\, dP_0 = \mathbb{E}_{f_0} \int \ln\Big( \frac{dp_0}{dp_1} \Big)\, d\nu \quad (\nu \text{ is the Lebesgue measure})
\]
\[
= \mathbb{E}_{(\theta,s)}\Big[ \sum_{i=1}^{n} \int \ln\Big( \frac{dG(v - T_\mu f_0(\theta_i, s_i))}{dG(v - T_\mu f_1(\theta_i, s_i))} \Big)\, dG(v - T_\mu f_0(\theta_i, s_i)) \Big] \quad (\text{see } [35])
\]
\[
\leq C_0\, n\, I_0 \int_{Z} |T_\mu f_1(\theta, s)|^2\, ds\, d\theta \quad \text{(using B1 and B2)}. \tag{15}
\]
To estimate $\int_{Z} |T_\mu f_1(\theta, s)|^2\, ds\, d\theta$, we will follow [27, Section 4]. Consider a function $\phi(x) \in \mathcal{S}(\mathbb{R}^2)$ (i.e. of Schwartz class) such that $\phi(x) = 1$ for $x \in B$. Let us introduce
\[
\bar w(x, \theta) = \phi(x)\, e^{\mu\, x\cdot\theta^\perp}. \tag{16}
\]
Clearly, for any function $f(x)$ supported in $B$,
\[
T_\mu f(\theta, s) = T_{\bar w} f(\theta, s) = \int_{\mathbb{R}^2} \bar w(x, \theta)\, f(x)\, \delta(x\cdot\theta - s)\, dx.
\]
Taking the Fourier transform of $T_{\bar w} f(\theta, s)$ with respect to the $s$-variable, we get the following inequality [27, equation 27]:
\[
|\tilde T_{\bar w} f(\theta, t)| \leq (2\pi)^{-1}\, \big( W_{\bar w} \star |\tilde f| \big)(t\theta), \tag{17}
\]
where $W_{\bar w}(\xi) = \sup_{\theta \in S^1} |\tilde{\bar w}(\xi, \theta)|$ and $\tilde{(\cdot)}$ indicates the corresponding Fourier transform (either 1-d or 2-d) as usual. Now, from [27, equation 29],
\[
\|T_\mu f\|_{L^2(Z)}^2 \leq K\, \|W_{\bar w}\|_{L^1(\mathbb{R}^2)}^2\, \|f\|_{H^{-1/2}(\mathbb{R}^2)}^2 = \bar K\, \|f\|_{H^{-1/2}(\mathbb{R}^2)}^2, \tag{18}
\]
where $\bar K = K \|W_{\bar w}\|_{L^1(\mathbb{R}^2)}^2$. We note in passing that since $\bar w(x, \theta)$ is given by (16), $\|W_{\bar w}\|_{L^1(\mathbb{R}^2)}$ is finite. Now, since $\tilde f_1(\xi) = A h^{\beta + 1} e^{-i\xi\cdot x_0} \tilde\eta(h\xi)$,
\[
\|f_1\|_{H^{-1/2}(\mathbb{R}^2)}^2 = \frac{A^2 h^{2(\beta + 1)}}{4\pi^2} \int_{\mathbb{R}^2} (1 + |\xi|^2)^{-1/2}\, |\tilde\eta(h\xi)|^2\, d\xi \leq \frac{A^2 h^{2\beta + 1}}{4\pi^2} \int_{\mathbb{R}^2} |\bar\xi|^{-1}\, |\tilde\eta(\bar\xi)|^2\, d\bar\xi = c_\eta A^2 h^{2\beta + 1},
\]
where we used the substitution $\bar\xi = h\xi$ together with the bound $(1 + |\bar\xi/h|^2)^{-1/2} \leq h/|\bar\xi|$; the constant $c_\eta$ is finite since $\eta$ is a compactly supported bounded function in $H(\beta, L)$. Thus,
\[
I(P_0, P_1) \leq C_0 I_0 \bar K c_\eta A^2\, n h^{2\beta + 1} = C_0 I_0 \bar K c_\eta A^2 \quad (h = n^{-1/(2\beta + 1)}). \tag{19}
\]
Thus, if we choose $A$ small enough, $I(P_0, P_1) \leq 1/2$.

Remark 1.
Note that Theorems 3 and 5 together establish the optimality of the rate of convergence of the minimax risk for the estimator proposed in Section 4 under the seminorm $d_1$.

Theorem 6.
Let $\beta, f, f^*_n, \theta_i, s_i$ be as in Theorem 4. If, in addition, assumptions B1 and B2 are satisfied by the observation model (2), then the following inequality holds:
\[
\liminf_{n \to \infty}\, \inf_{\hat f_n}\, \sup_{f \in H(\beta, L)} \Psi_n^{-1}\, \mathrm{MISE}(\hat f_n, f) \geq c'',
\]
where $\Psi_n$ is the same sequence as in Theorem 4, $\inf_{\hat f_n}$ denotes the infimum over all estimators, and $c'' > 0$ is some constant.

Proof. First of all, we recall from [35, Section 2.6] that establishing lower bounds for the convergence rate of estimators in $L^p$ seminorms requires us to work with many hypotheses ($M$ hypotheses) instead of just two as we did in the proof of Theorem 5 above. The proof of this theorem follows that of [12, Theorem 5]. All the geometric arguments in this proof are identical to the geometrical arguments in [12], and we only need to change the argument wherever an estimate for the usual Radon transform is to be replaced with an analogous estimate for the exponential Radon transform. For the sake of completeness, we outline the proof given in [12] here, adapting it to the case of the ERT wherever needed.

Consider a collection of non-intersecting balls $\Delta_k$, $k \in \{1, \ldots, M\}$, inscribed in $B$, with centers $a_k$ and radius $1/m$, where $m$ and $M$ are sequences such that $m \to \infty$ as $n \to \infty$. Furthermore, one can choose $m$ and $M$ (the precise choice of $m$ is described later) such that the following relation is satisfied:
\[
C_1 m^2 \leq M \leq C_2 m^2. \tag{20}
\]
Let $\eta(x)$ be a smooth function supported in $B$. Then each function $\eta_k(x) = \eta(m(x - a_k))$ is supported in $\Delta_k$. To each $M$-tuple $b = (b_1, \ldots, b_M)$, where each $b_k$ is either 0 or 1, we associate a function $f(x, b)$ supported in $B$ such that:
\[
f(x, b) = A m^{-\beta} \sum_{k=1}^{M} b_k\, \eta_k(x),
\]
where $A > 0$.

Lemma 2. [12, Lemma 3] There exists $A_\beta > 0$ such that for $A < A_\beta$, the function $f(x, b) \in H(\beta, L)$ for any $M$-tuple $b$.

Consider any design $D_n = \{(\theta_i, s_i)\}_{i=1}^{n}$ and consider the lines $L_i = \{x \in \mathbb{R}^2 : x\cdot\theta_i = s_i\}$. Let the set $J$ be defined as:
\[
J = J(D_n) = \big\{ k \in \{1, \ldots, M\} : \text{the number of lines corresponding to } D_n \text{ that intersect } \Delta_k \text{ is at most } C_3 n/m \big\}.
\]

Lemma 3. [12, Lemma 4] There exists $C_3 > 0$ such that for any design $D_n$ we have the inequality $\mathrm{card}\, J > M/2$.

In what follows, $C_3$ is chosen such that Lemma 3 is satisfied. Following [12], let us also denote by $b(k, 0) = (b_1, \ldots, b_{k-1}, 0, b_{k+1}, \ldots, b_M)$ and $b(k, 1) = (b_1, \ldots, b_{k-1}, 1, b_{k+1}, \ldots, b_M)$ the $M$-tuples with the $k$-th element fixed as indicated. Furthermore, we use the following notation for functions: $f_{k0} = f(x, b(k, 0))$ and $f_{k1} = f(x, b(k, 1))$. Let $g_k(x) = f_{k1}(x) - f_{k0}(x)$, which is supported only on $\Delta_k$ by construction. Let $P_{k0}$ and $P_{k1}$ be the probability measures corresponding to the model (2) for $f = f_{k0}$ and $f = f_{k1}$, and let $I(P_{k0}, P_{k1})$ be the Kullback information distance between these two probability measures. Thus, from [12], the desired lower bound on the minimax rate will be obtained if we can show that, for a suitably chosen constant $C_5 > 0$ and $m = (C_5 n)^{1/(2\beta + 3)}$, we have $I(P_{k0}, P_{k1}) \leq 1/2$. Just as in [12], and similar to the proof of Theorem 5 above, from assumptions B1 and B2 we get:
\[
I(P_{k0}, P_{k1}) \leq I_0 \sum_{i=1}^{n} \big( T_\mu g_k(\theta_i, s_i) \big)^2. \tag{21}
\]
Now, from the definition of the ERT and from the fact that $\eta_k(x)$ is supported in $\Delta_k \subset B$:
\[
|T_\mu g_k(\theta_i, s_i)| = \Big| \int_{L_i \cap \Delta_k} e^{\mu\, x\cdot\theta_i^\perp}\, A m^{-\beta} \eta(m(x - a_k))\, dx \Big|
\leq C \int_{L_i \cap \Delta_k} \big| A m^{-\beta} \eta(m(x - a_k)) \big|\, dx \quad \Big( C = \sup_{x \in B,\, \theta \in S^1} e^{\mu\, x\cdot\theta^\perp} \Big)
\leq C_6\, m^{-\beta - 1}, \tag{22}
\]
since each chord $L_i \cap \Delta_k$ has length at most $2/m$. Now note that since $k \in J$, at most $C_3 n/m$ of the terms in the sum on the right-hand side of (21) are non-zero. Putting it all together, we have:
\[
I(P_{k0}, P_{k1}) \leq I_0 C_3 C_6^2\, (n/m)\, m^{-2\beta - 2} = I_0 C_3 C_6^2\, n\, m^{-(2\beta + 3)} = \frac{I_0 C_3 C_6^2}{C_5}. \tag{23}
\]
Thus, if we choose $C_5 \geq 2 I_0 C_3 C_6^2$, then we get $I(P_{k0}, P_{k1}) \leq 1/2$.

Remark 2.
Note that Theorems 4 and 6 together establish the optimality of the estimator in the $d_2$ semi-norm setting.

References

[1] Valentina Aguilar, Leon Ehrenpreis, and Peter Kuchment. Range conditions for the exponential Radon transform.
J. Anal. Math., 68:1–13, 1996.

[2] Guillaume Bal and Philippe Moireau. Fast numerical inversion of the attenuated Radon transform with full and partial measurements. Inverse Problems, 20(4):1137–1164, 2004.

[3] Jan Boman and Jan-Olov Strömberg. Novikov's inversion formula for the attenuated Radon transform—a new approach. J. Geom. Anal., 14(2):185–198, 2004.

[4] L. Cavalier. Asymptotically efficient estimation in a problem related to tomography. Math. Methods Statist., 7(4):445–456 (1999), 1998.

[5] Laurent Cavalier. Efficient estimation of a density in a problem of tomography. Ann. Statist., 28(2):630–647, 2000.

[6] David V. Finch. The attenuated x-ray transform: recent developments. In Inside Out: Inverse Problems and Applications, volume 47 of Math. Sci. Res. Inst. Publ., pages 47–66. Cambridge Univ. Press, Cambridge, 2003.

[7] J.-P. Guillement, F. Jauberteau, L. Kunyansky, R. Novikov, and R. Trebossen. On single-photon emission computed tomography imaging based on an exact formula for the nonuniform attenuation correction. Inverse Problems, 18(6):L11–L19, 2002.

[8] Marjorie G. Hahn and Eric Todd Quinto. Distances between measures from 1-dimensional projections as implied by continuity of the inverse Radon transform. Z. Wahrsch. Verw. Gebiete, 70(3):361–380, 1985.

[9] Irene A. Hazou and Donald C. Solmon. Filtered-backprojection and the exponential Radon transform. J. Math. Anal. Appl., 141(1):109–119, 1989.

[10] Sean Holman, François Monard, and Plamen Stefanov. The attenuated geodesic x-ray transform. Inverse Problems, 34(6):064003, 2018.

[11] Iain M. Johnstone and Bernard W. Silverman. Speed of estimation in positron emission tomography and related inverse problems. Ann. Statist., 18(1):251–280, 1990.

[12] A. P. Korostelëv and A. B. Tsybakov. Optimal rates of convergence of estimators in a probabilistic setup of tomography problem. Problems of Information Transmission, 27:73–81, 1991.

[13] A. P. Korostelëv and A. B. Tsybakov. Asymptotically minimax image reconstruction problems. In Topics in Nonparametric Estimation, volume 12 of Adv. Soviet Math., pages 45–86. Amer. Math. Soc., Providence, RI, 1992.

[14] A. P. Korostelëv and A. B. Tsybakov. Minimax Theory of Image Reconstruction, volume 82 of Lecture Notes in Statistics. Springer-Verlag, New York, 1993.

[15] A. K. Louis. Approximate inverse for linear and some nonlinear problems. Inverse Problems, 11(6):1211–1223, 1995.

[16] A. K. Louis and P. Maass. A mollifier method for linear operator equations of the first kind. Inverse Problems, 6(3):427–440, 1990.

[17] A. K. Louis. Optimal sampling in nuclear magnetic resonance (NMR) tomography. Journal of Computer Assisted Tomography, 6(2):334–340, 1982.

[18] François Monard. Inversion of the attenuated geodesic X-ray transform over functions and vector fields on simple surfaces. SIAM J. Math. Anal., 48(2):1155–1177, 2016.

[19] François Monard, Richard Nickl, and Gabriel P. Paternain. Efficient nonparametric Bayesian inference for X-ray transforms. Ann. Statist., 47(2):1113–1147, 2019.

[20] F. Natterer. Inversion of the attenuated Radon transform.
Inverse Problems, 17(1):113–119, 2001.

[21] F. Natterer. The Mathematics of Computerized Tomography. Society for Industrial and Applied Mathematics, 2001.

[22] Frank Natterer. On the inversion of the attenuated Radon transform. Numer. Math., 32(4):431–438, 1979.

[23] Frank Natterer and Frank Wübbeling. Mathematical Methods in Image Reconstruction. SIAM Monographs on Mathematical Modeling and Computation. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2001.

[24] Roman G. Novikov. An inversion formula for the attenuated x-ray transformation. Ark. Mat., 40(1):145–167, 2002.

[25] E. T. Quinto. The dependence of the generalized Radon transform on defining measures. Trans. Amer. Math. Soc., 257(2):331–346, 1980.

[26] E. T. Quinto. The invertibility of rotation invariant Radon transforms. J. Math. Anal. Appl., 91(2):510–522, 1983.

[27] G. Rigaud and A. Lakhal. Approximate inverse and Sobolev estimates for the attenuated Radon transform. Inverse Problems, 31(10):105010, 2015.

[28] Hans Rullgård. An explicit inversion formula for the exponential Radon transform using data from 180°. Ark. Mat., 42(2):353–362, 2004.

[29] Mikko Salo and Gunther Uhlmann. The attenuated ray transform on simple surfaces. J. Differential Geom., 88(1):161–187, 2011.

[30] Thomas Schuster. The Method of Approximate Inverse: Theory and Applications, volume 1906 of Lecture Notes in Mathematics. Springer, Berlin, 2007.

[31] I. Ya. Shneĭberg. Exponential Radon transform. In Applied Problems of Radon Transform, volume 162 of Amer. Math. Soc. Transl. Ser. 2, pages 235–245. Amer. Math. Soc., Providence, RI, 1994.

[32] I. Ya. Shneĭberg, I. V. Ponomarev, V. A. Dmitrichenko, and S. D. Kalashnikov. On a new reconstruction algorithm in emission tomography. In Applied Problems of Radon Transform, volume 162 of Amer. Math. Soc. Transl. Ser. 2, pages 247–255. Amer. Math. Soc., Providence, RI, 1994.

[33] S. Siltanen, V. Kolehmainen, S. Järvenpää, J. P. Kaipio, P. Koistinen, M. Lassas, J. Pirttilä, and E. Somersalo. Statistical inversion for medical x-ray tomography with few radiographs: I. General theory. Physics in Medicine and Biology, 48(10):1437–1463, 2003.

[34] Oleh Tretiak and Charles Metz. The exponential Radon transform. SIAM J. Appl. Math., 39(2):341–354, 1980.

[35] Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer Series in Statistics. Springer, New York, 2009. Revised and extended from the 2004 French original, translated by Vladimir Zaiats.

[36] Simopekka Vänskä, Matti Lassas, and Samuli Siltanen. Statistical X-ray tomography using empirical Besov priors. Int. J. Tomogr. Stat., 11(S09):3–32, 2009.

[37] Junhai Wen and Zhengrong Liang. An inversion formula for the exponential Radon transform in spatial domain with variable focal-length fan-beam collimation geometry.