Applying Lepskij-Balancing in Practice
Frank Bauer
E-mail: [email protected]
Abstract.
In a stochastic noise setting, the Lepskij balancing principle for choosing the regularization parameter in the regularization of inverse problems depends on a parameter $\tau$ which, in the currently known proofs, depends on the unknown noise level of the input data. In practice, however, this parameter seems to be obsolete. We present an explanation for this behavior by using a stochastic model for noise and initial data. Furthermore, we prove that a small modification of the algorithm also improves the performance of the method, in both speed and accuracy.

AMS classification scheme numbers: 47A52, 65J22, 60G99, 62H12
1. Introduction
In the following, we will consider linear inverse problems [EHN96, Hof86] given as an operator equation
$$Ax = y, \qquad (1)$$
where $A : X \to Y$ is a linear, continuous, compact operator acting between separable real infinite-dimensional Hilbert spaces $X$, $Y$. Without loss of generality we assume that $A$ has a trivial null-space $N(A) = \{0\}$. Because $A$ is compact and $X$ is infinite dimensional, $A$ does not have a continuous inverse, and hence (1) is ill-posed.

For the analysis we will need the singular value decomposition of $A$: there exist orthonormal bases $(u_k)_{k \in \mathbb{N}}$ of $X$ and $(v_k)_{k \in \mathbb{N}}$ of $Y$ and a sequence of positive decreasing singular values $(s_k)_{k \in \mathbb{N}}$ such that
$$Ax = \sum_{k=1}^{\infty} s_k \langle x, u_k \rangle v_k. \qquad (2)$$
Moreover, we assume that the data $y$ are noisy,
$$y^\delta = Ax + \xi, \qquad \xi \text{ noise}. \qquad (3)$$
The noise model for $\xi$ will be specified later; in contrast to the classical considerations, in a stochastic setting $\xi$ is not necessarily an element of $Y$.

In order to counter the ill-posedness, we need to regularize; in this article we will concentrate on the regularization method truncated singular value decomposition (TSVD, also called spectral cut-off regularization), which has some specific features that make the proofs considerably easier. The level $n$ at which we truncate is called the regularization parameter. The subsampling function $s(\cdot): \mathbb{N} \to \mathbb{N}$ is assumed to be strictly increasing. The regularized solution is
$$x_n^\delta = A_n^- y^\delta = \sum_{k=1}^{s(n)-1} \left( \langle x, u_k \rangle + s_k^{-1} \langle \xi, v_k \rangle \right) u_k. \qquad (4)$$
The unknown noise-free regularized solution is defined as
$$x_n = A_n^- y = \sum_{k=1}^{s(n)-1} \langle x, u_k \rangle u_k. \qquad (5)$$
The correct choice of the regularization parameter is of major importance for the performance of the method. In recent times, a number of articles [GP00, MP03, BP05, MP06, HPR07, BHM09] have considered the Lepskij balancing principle [Lep90] for choosing this parameter in various situations. For practical applications there are still three open issues:

• In the case of stochastic noise one loses, in comparison to the optimal situation, a logarithmic factor; i.e., the proven convergence rate of the error is $O(\delta^H \log(\delta))$ in comparison to an optimal $O(\delta^H)$, where $\xi = \delta \xi_0$ with a normalized $\xi_0$, and $H$ depends on $x$ and $\xi$. This phenomenon cannot be observed in practical implementations; the question is why?

• In practical implementations, one can replace some knowledge needed explicitly in the proofs (the size of the regularized error in $X$) by a data-driven approximation without losing performance. Can this be put on a firm mathematical basis?

• Is there a possibility to improve the speed of the method such that it can compete with others, e.g. the Morozov discrepancy principle [EHN96, Mor66]?

In order to explain some behavior observed using other parameter choice methods in practical situations, an alternative model for describing the solution and the noise has recently proven successful [BR08, BK08]. Using this model, we can answer the questions posed above by slightly modifying Lepskij's algorithm such that we can prove an oracle inequality.

The outline of the article is as follows. First we cite the definition of the Lepskij balancing principle. Then we define our model and calculate the underlying expectations, on whose basis we estimate the probabilities that the balancing principle behaves differently than expected. This yields the desired oracle inequality. Using the same methodology, we then show that an estimation of the noise behavior based on two measurements is sufficient to obtain the same result, of course with weaker constants.
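As an illustration of (4), the following minimal Python sketch applies spectral cut-off to a discretized operator. It is not part of the original exposition: the function name tsvd_solve and the toy operator are assumptions made only for this example, and numpy's SVD factors play the roles of the bases $(u_k)$ and $(v_k)$.

    import numpy as np

    def tsvd_solve(A, y_delta, trunc):
        """Spectral cut-off, cf. (4): invert only the trunc largest
        singular values of A and discard the remaining components."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        k = min(trunc, len(s))
        # coefficients <y_delta, .> / s_j; numpy's U spans Y, Vt spans X
        coeff = (U[:, :k].T @ y_delta) / s[:k]
        return Vt[:k, :].T @ coeff

    # toy usage with a compact (rapidly decaying) operator, illustrative only
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 200)) @ np.diag(np.arange(1.0, 201.0) ** -1.5)
    x_true = rng.standard_normal(200)
    y_delta = A @ x_true + 1e-3 * rng.standard_normal(200)
    x_reg = tsvd_solve(A, y_delta, trunc=25)

The truncation index trunc corresponds to $s(n)-1$ in the notation of (4); choosing it is exactly the parameter choice problem discussed in the remainder of the paper.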
2. Lepskij Balancing Principle
The key point in the Lepskij Balancing Principle is the knowledge of the noise behavior, which has different forms for different noise regimes [GP00, MP03, BP05].
Definition 2.1 (Noise Behavior) If $\xi$ is assumed to be in a deterministic regime (i.e., $\|\xi\| \le \delta$), then define
$$\varrho(n) := s_{s(n)}^{-1}\,\delta \ \ge\ \|A_n^- \xi\| \qquad (6)$$
where $\delta$ is the noise level. If $\xi$ is assumed to be stochastic, then define
$$\varrho(n)^2 := E\|A_n^- \xi\|^2. \qquad (7)$$
Later on we will specify more precisely what we mean by stochastic. In both cases, $\varrho(\cdot)$ is a monotonically increasing function.

Now we will follow the approach presented in [BM07], which already incorporates the (minor) modifications of the balancing principle needed to make it fit for practice, in particular by limiting the number of necessary computations.
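For TSVD the stochastic noise behavior (7) can be evaluated directly once the variances of the noise coefficients $\langle \xi, v_k \rangle$ are known, since $E\|A_n^- \xi\|^2 = \sum_{k < s(n)} s_k^{-2} E\langle \xi, v_k \rangle^2$. A minimal Python sketch, with parameter names anticipating the model of section 3 (all values illustrative):

    import numpy as np

    def rho(n, s_fun, sing, noise_sd):
        """rho(n)^2 = E||A_n^- xi||^2 from (7), for TSVD with singular
        values sing(k) and Var<xi, v_k> = noise_sd(k)^2."""
        k = np.arange(1, s_fun(n))          # components 1 .. s(n)-1
        return np.sqrt(np.sum((noise_sd(k) / sing(k)) ** 2))

    # white noise (eps = 0) of level delta, polynomially decaying s_k
    lam, eps, delta = 1.0, 0.0, 1e-2
    r = rho(5,
            s_fun=lambda n: int(np.ceil(3.0 * 2.0 ** n)),
            sing=lambda k: k ** (-lam),
            noise_sd=lambda k: delta * k ** eps)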
Definition 2.2 (Special parameters)
There are two special regularization parameters which are important for the later proofs:

• $n_{opt}$: the optimal regularization parameter, i.e., we have $\|x - A^-_{n_{opt}} A x\| \approx \varrho(n_{opt})$. The parameter $n_{opt}$ is generally unknown.

• $N$: the maximal regularization parameter, i.e., the point where one can be sure that in any case $n_{opt} < N$. Even when one has just a very rough idea of the noise, respectively the noise level $\delta$, this parameter can be estimated rather reliably (e.g., in the deterministic case $N = \varrho^{-1}(\delta)$; see [MP03], and [MP06] for a statistical setup).

However, assuming the knowledge of such a parameter $N$ is problematic at some point; it is likely that a number of other parameter choice methods would work better if one were able to detect outliers easily.

Definition 2.3 (Look-Ahead) Let $\sigma > 1$. Define the look-ahead function by
$$l_{N,\sigma}(n) = \min\left\{ \min\{ m \mid \varrho(n)^{-2} > \sigma\,\varrho(m)^{-2} \},\ N \right\}.$$

Definition 2.4 (Balancing Functional) The balancing functional is defined as
$$b_{N,\sigma}(n) = \max_{n < m \le l_{N,\sigma}(n)} \frac{\|x_m^\delta - x_n^\delta\|}{4\,\varrho(m)} \qquad (8)$$
together with its monotonized version $B_{N,\sigma}(n) = \max_{n \le m \le N} b_{N,\sigma}(m)$.
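Definitions 2.3 and 2.4 translate directly into code. The following Python sketch assumes the regularized solutions xs[n] and the values rho[n] for $n = 0, \dots, N$ have been precomputed; all names are illustrative, and $B_{N,\sigma}$ is implemented as the running maximum just described.

    import numpy as np

    def look_ahead(rho, n, sigma, N):
        """l_{N,sigma}(n): first m with rho(m)^2 > sigma * rho(n)^2, capped at N."""
        for m in range(n + 1, N + 1):
            if rho[m] ** 2 > sigma * rho[n] ** 2:
                return m
        return N

    def b(xs, rho, n, sigma, N):
        """Balancing functional (8)."""
        l = look_ahead(rho, n, sigma, N)
        return max(np.linalg.norm(xs[m] - xs[n]) / (4.0 * rho[m])
                   for m in range(n + 1, l + 1))

    def stopping_index(xs, rho, sigma, N, kappa=1.0):
        """n_* of (9): smallest n with B_{N,sigma}(n) <= kappa."""
        bs = [b(xs, rho, n, sigma, N) for n in range(N)]
        Bs = [max(bs[n:]) for n in range(N)]      # B(n) = max_{m >= n} b(m)
        return next(n for n in range(N) if Bs[n] <= kappa)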
Definition 2.5 (Stopping Index) The balancing stopping index is defined as
$$n_{N,\sigma,\kappa} = \min\{ n \le N \mid B_{N,\sigma}(n) \le \kappa \}. \qquad (9)$$
If no ambiguities can occur, we will denote $n_{N,\sigma,\kappa}$ by $n_*$.

Remark 2.6 A number of results and facts are known:

• The classical proofs are for $\sigma = \infty$, i.e. $l_{N,\infty}(n) = N$. However, reducing $\sigma$ just worsens some constants.

• In the case of deterministic noise, $\kappa = 1$. Then it holds [MP03]
$$\|x - x^\delta_{n_*}\| \le c\left( \|x - A^-_{n_{opt}} A x\| + \varrho(n_{opt}) \right)$$
where $c$ is independent of $x$ and $\xi$.

• In the case of stochastic noise and $\kappa$ chosen in dependence on $\varrho(N)$ it holds [GP00]
$$\sqrt{E\|x - x^\delta_{n_*}\|^2} \le c\,\log(\varrho(N)) \left( \|x - A^-_{n_{opt}} A x\| + \varrho(n_{opt}) \right)$$
where $c$ is independent of $x$ and $\xi$.

• These results are basically independent of the regularization method, i.e. they also apply to other well-known methods like Tikhonov regularization and Landweber iteration [MP03].

• Similar results hold for non-linear inverse problems in combination with the Iteratively Regularized Gauß-Newton Method (IRGNM) [BHM09].

3. A Closer Analysis

In order to analyze the behavior of the methods in practice, we will now use the Bayesian model introduced in [BR08]:
$$\langle x, u_k \rangle \sim \mathcal{N}(0, (\eta k^{-\gamma})^2), \qquad s_k = k^{-\lambda}, \qquad \langle \xi, v_k \rangle \sim \mathcal{N}(0, (\delta k^{\varepsilon})^2)$$
where $\gamma > 1/2$, $\lambda > 0$, $\lambda > -\varepsilon$ and all Gaussian random variables are independent (the normalized variables being identically distributed, iid). All expectations $E$ should now be interpreted as joint expectations of $x$ and $\xi$.

Definition 3.1 (Subsampling) Let $\omega_0 > 1$, $\omega > 1$ and $\omega_0 \omega > \omega_0 + 1$. We choose the following subsampling for obtaining the regularization parameter:
$$s(n) = \lceil \omega_0\,\omega^n \rceil.$$

Remark 3.2 Due to $\omega_0 \omega > \omega_0 + 1$ it always holds $s(n+1) > s(n)$. Furthermore we have
$$\omega_0\,\omega^n \le s(n) \le \frac{\omega_0 + 1}{\omega_0}\,\omega_0\,\omega^n.$$

Basic calculus, using upper and lower sums to approximate an integral, yields:

Lemma 3.3 Let $m/\omega \ge n \ge \omega_0$. If $\kappa > 1$ then
$$\left(1 - \omega^{-\kappa+1}\right) \frac{1}{\kappa-1}\,n^{-\kappa+1} \;<\; \sum_{k=n}^{m-1} k^{-\kappa} \;<\; \left(\frac{\omega_0-1}{\omega_0}\right)^{-\kappa+1} \frac{1}{\kappa-1}\,n^{-\kappa+1}.$$
If $\kappa \ge 0$ then
$$\left(\frac{\omega_0-1}{\omega_0}\right)^{\kappa+1} \left(1 - \omega^{-\kappa-1}\right) \frac{1}{1+\kappa}\,m^{\kappa+1} \;<\; \sum_{k=n}^{m-1} k^{\kappa} \;<\; \frac{1}{1+\kappa}\,m^{\kappa+1}.$$

Corollary 3.4 (Adjacent Difference) Let $0 \le n < m$. Then it holds
$$c_1 \left( \eta^2 \frac{\omega_0^{-2\gamma+1}}{2\gamma-1}\,\omega^{n(-2\gamma+1)} + \delta^2 \frac{\omega_0^{2\lambda+2\varepsilon+1}}{2\lambda+2\varepsilon+1}\,\omega^{m(2\lambda+2\varepsilon+1)} \right) \le E\|x^\delta_m - x^\delta_n\|^2$$
$$\le c_2 \left( \eta^2 \frac{\omega_0^{-2\gamma+1}}{2\gamma-1}\,\omega^{n(-2\gamma+1)} + \delta^2 \frac{\omega_0^{2\lambda+2\varepsilon+1}}{2\lambda+2\varepsilon+1}\,\omega^{m(2\lambda+2\varepsilon+1)} \right) \qquad (10)$$
with
$$c_1 = \min\left\{ \left(\frac{\omega_0+1}{\omega_0}\right)^{-2\gamma+1} \left(1 - \omega^{-2\gamma+1}\right),\ \left(\frac{\omega_0-1}{\omega_0}\right)^{2\lambda+2\varepsilon+1} \left(1 - \omega^{-2\lambda-2\varepsilon-1}\right) \right\}$$
$$c_2 = \max\left\{ \left(\frac{\omega_0-1}{\omega_0}\right)^{-2\gamma+1},\ \left(\frac{\omega_0+1}{\omega_0}\right)^{2\lambda+2\varepsilon+1} \right\}.$$

Proof It holds
$$x^\delta_m - x^\delta_n = \sum_{k=s(n)}^{s(m)-1} \left( \langle x, u_k \rangle + s_k^{-1} \langle \xi, v_k \rangle \right) u_k$$
and hence
$$E\|x^\delta_m - x^\delta_n\|^2 = \sum_{k=s(n)}^{s(m)-1} \left( \eta^2 k^{-2\gamma} + \delta^2 k^{2\lambda+2\varepsilon} \right).$$
Applying Lemma 3.3 to both sums and using Remark 3.2 yields the proposition. □

Corollary 3.5 (Propagated Noise) Let $m \ge 0$. Then it holds
$$c_3\,\delta^2 \frac{\omega_0^{2\lambda+2\varepsilon+1}}{2\lambda+2\varepsilon+1}\,\omega^{m(2\lambda+2\varepsilon+1)} \le \varrho(m)^2 \le c_4\,\delta^2 \frac{\omega_0^{2\lambda+2\varepsilon+1}}{2\lambda+2\varepsilon+1}\,\omega^{m(2\lambda+2\varepsilon+1)} \qquad (11)$$
with
$$c_3 = \left(\frac{\omega_0-1}{\omega_0}\right)^{2\lambda+2\varepsilon+1} \left(1 - \omega^{-2\lambda-2\varepsilon-1}\right), \qquad c_4 = \left(\frac{\omega_0+1}{\omega_0}\right)^{2\lambda+2\varepsilon+1}.$$

Proof Using
$$\varrho(m)^2 = E\|x^\delta_m - x_m\|^2 = \sum_{k=1}^{s(m)-1} \delta^2 k^{2\lambda+2\varepsilon}$$
we can proceed as beforehand. □
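The model above is straightforward to simulate, which gives a quick empirical check of the two-sided bound (10). The following sketch (names and parameter values illustrative) draws the coefficients directly in the singular basis:

    import numpy as np

    def draw_coeffs(k_max, gamma, lam, eps, eta, delta, rng):
        """Draw <x,u_k> and <xi,v_k>, k = 1..k_max, in the model above and
        return the coefficients of x and of x^delta (s_k^{-1} = k^lam)."""
        k = np.arange(1, k_max + 1)
        x = eta * k ** (-gamma) * rng.standard_normal(k_max)
        xi = delta * k ** eps * rng.standard_normal(k_max)
        return x, x + k ** lam * xi

    rng = np.random.default_rng(1)
    gamma, lam, eps, eta, delta = 1.5, 1.0, 0.0, 1.0, 1e-3
    s = lambda n: int(np.ceil(3.0 * 2.0 ** n))        # omega_0 = 3, omega = 2

    # Monte Carlo check of E||x_m^delta - x_n^delta||^2 against (10), n=2, m=4
    vals = []
    for _ in range(500):
        _, xd = draw_coeffs(s(4), gamma, lam, eps, eta, delta, rng)
        vals.append(np.sum(xd[s(2) - 1: s(4) - 1] ** 2))
    print(np.mean(vals))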
Corollary 3.6 (Regularization Error) Let $n \ge 0$. Then it holds
$$c_5 \left( \eta^2 \frac{\omega_0^{-2\gamma+1}}{2\gamma-1}\,\omega^{n(-2\gamma+1)} + \delta^2 \frac{\omega_0^{2\lambda+2\varepsilon+1}}{2\lambda+2\varepsilon+1}\,\omega^{n(2\lambda+2\varepsilon+1)} \right) \le E\|x^\delta_n - x\|^2$$
$$\le c_6 \left( \eta^2 \frac{\omega_0^{-2\gamma+1}}{2\gamma-1}\,\omega^{n(-2\gamma+1)} + \delta^2 \frac{\omega_0^{2\lambda+2\varepsilon+1}}{2\lambda+2\varepsilon+1}\,\omega^{n(2\lambda+2\varepsilon+1)} \right) \qquad (12)$$
with the same constants as in Corollary 3.4, i.e. $c_5 = c_1$ and $c_6 = c_2$.

Proof Using
$$E\|x^\delta_n - x\|^2 = \sum_{k=s(n)}^{\infty} \eta^2 k^{-2\gamma} + \sum_{k=1}^{s(n)-1} \delta^2 k^{2\lambda+2\varepsilon}$$
we can proceed as beforehand. □

Remark 3.7 Obviously it holds $c_5 = c_1 < c_3 < 1 < c_4 < c_2 = c_6$, where we can get as close to $1$ as we want, as long as for fixed $\omega$ the constant $\omega_0$ is big enough. Although this constant $\omega_0$ will have a large influence in the later proofs, we cannot observe any major influence in practice [BL10]; $\omega_0 = 3$ seems to be sufficient in most situations, even when $\omega$ is rather close to $1$. As $\omega_0$ is independent of the noise level $\delta$, we have that at least all proofs hold asymptotically. An explanation for the insensitivity in practice towards $\gamma$ and the other parameters might be that our inequalities handling the probabilities are too conservative.

Now we can approximately determine the expected minimal point of $E\|x^\delta_n - x\|^2$: balancing $E\|x^\delta_n - x_n\|^2 = E\|x_n - x\|^2$ yields
$$\eta^2 \frac{\omega_0^{-2\gamma+1}}{2\gamma-1}\,\omega^{n_{opt}(-2\gamma+1)} = \delta^2 \frac{\omega_0^{2\lambda+2\varepsilon+1}}{2\lambda+2\varepsilon+1}\,\omega^{n_{opt}(2\lambda+2\varepsilon+1)}, \qquad (13)$$
i.e.,
$$\frac{\eta^2}{2\gamma-1}\,s(n_{opt})^{-2\gamma+1} = \frac{\delta^2}{2\lambda+2\varepsilon+1}\,s(n_{opt})^{2\lambda+2\varepsilon+1}$$
and hence
$$s(n_{opt}) = \left( \frac{\eta^2}{\delta^2}\,\frac{2\lambda+2\varepsilon+1}{2\gamma-1} \right)^{1/(2\lambda+2\varepsilon+2\gamma)}$$
$$n_{opt} = \log\left( \left( \frac{\eta^2}{\delta^2}\,\frac{2\lambda+2\varepsilon+1}{2\gamma-1} \right)^{1/(2\lambda+2\varepsilon+2\gamma)} \omega_0^{-1} \right) \Big/ \log \omega.$$

Obviously $n_{opt}$ does not need to exist if $\omega_0$ is getting too big. However, for the rest of the article we will assume the existence of $n_{opt}$, as there exists (depending on $\omega_0$) a $\delta_0$ such that $n_{opt}$ exists for any $\delta < \delta_0$. Additionally, it holds
$$l_{N,\sigma}(n) = n + K$$
for some fixed $K \approx \log(\sigma)/\log(\omega)$.
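For concreteness, the closed-form expressions for $s(n_{opt})$ and $n_{opt}$ can be evaluated numerically; a minimal sketch with purely illustrative parameter values:

    import numpy as np

    gamma, lam, eps = 1.5, 1.0, 0.0
    eta, delta = 1.0, 1e-3
    omega0, omega = 3.0, 2.0

    s_opt = (eta ** 2 / delta ** 2
             * (2 * lam + 2 * eps + 1) / (2 * gamma - 1)
             ) ** (1.0 / (2 * lam + 2 * eps + 2 * gamma))
    n_opt = np.log(s_opt / omega0) / np.log(omega)
    print(s_opt, n_opt)     # approx. 17.2 and 2.5 for these values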
Furthermore, we have a lemma which was proven in [BR08].

Lemma 3.8 Let $Z = \sum_{k=1}^{\infty} \alpha_k \zeta_k^2$ with $\sum_{k=1}^{\infty} \alpha_k = 1$, $\alpha_k \ge 0$, and $\zeta_k \sim \mathcal{N}(0,1)$ iid. Then
$$\forall z \in (0,1): \quad P(Z \le z) \le \exp\left( \frac{-1 + z + \log(z)}{2 \max_k \alpha_k} \right) \le (ez)^{\frac{1}{2 \max_k \alpha_k}} \qquad (14)$$
$$\forall z > 1: \quad P(Z \ge z) \le \sqrt{e}\,e^{-z/4}. \qquad (15)$$

Now we will evaluate the probabilities.

Lemma 3.9 Assume that $n_{opt} < n$ and that $\omega_0$ is big enough such that
$$c_3\,c_2^{-1} \ge \tfrac{1}{4}. \qquad (16)$$
Then it holds that
$$P\{ b_{N,\sigma}(n) > \tau \} \le K\,\sqrt{e}^{\,-\tau} \qquad \text{and} \qquad P\{ B_{N,\sigma}(n) > \tau \} \le K\,\frac{\log(\delta\,\omega_0)}{-\lambda \log \omega}\,\sqrt{e}^{\,-\tau}.$$

Proof It holds, due to (10), (11) and (15),
$$P\{ b_{N,\sigma}(n) > \tau \} \le \sum_{1 \le k \le K} P\left\{ \tfrac14 \|x^\delta_n - x^\delta_{n+k}\|\,\varrho(n+k)^{-1} > \tau \right\} \le K \max_{1 \le k \le K} P\left\{ \tfrac14 \|x^\delta_n - x^\delta_{n+k}\|\,\varrho(n+k)^{-1} > \tau \right\}$$
$$= K \max_{1 \le k \le K} P\left\{ \frac{\|x^\delta_n - x^\delta_{n+k}\|^2}{E\|x^\delta_n - x^\delta_{n+k}\|^2} > 16\,\tau^2\,\frac{\varrho(n+k)^2}{E\|x^\delta_n - x^\delta_{n+k}\|^2} \right\}.$$
For $n > n_{opt}$ the noise term dominates in (10), so that by (11) and (16) the threshold on the right-hand side is large enough for (15) to yield the first inequality. The second inequality follows directly, using additionally that any $s(N)^{-\gamma} < \delta$ does not make any sense (which bounds the number of admissible indices up to $N$). □

Lemma 3.10 Assume that it holds $n_{opt} \ge n$, with $\omega_0$ big enough such that
$$\frac{c_1\,\omega_0}{2(2\gamma-1)} \ge 1 \qquad (17)$$
and
$$c_4\,c_1^{-1} \le 2. \qquad (18)$$
Then it holds that
$$P\{ b_{N,\sigma}(n) < \tau \} \le e\,\tilde\omega^{K(2\lambda+2\varepsilon+1)}\,\tilde\tau\,\omega^{-(n_{opt}-n)(2\lambda+2\varepsilon+2\gamma)}$$
where $\tilde\tau$ is independent of $n$ and linearly dependent on $\tau$, and $\tilde\omega > \omega$ is independent of $n$ and linearly dependent on $\omega$. Furthermore, it holds
$$P\{ B_{N,\sigma}(n) < \tau \} \le e\,\tilde\omega^{(2\lambda+2\varepsilon+1)}\,\tilde\tau\,\omega^{-(n_{opt}-n)(2\lambda+2\varepsilon+2\gamma)}.$$

Proof It holds, due to (10), (11), (13) and (14),
$$P\{ b_{N,\sigma}(n) < \tau \} \le P\left\{ \forall\,1 \le k \le K:\ \tfrac14 \|x^\delta_n - x^\delta_{n+k}\|\,\varrho(n+k)^{-1} < \tau \right\} \le \min_{1 \le k \le K} P\left\{ \tfrac14 \|x^\delta_n - x^\delta_{n+k}\|\,\varrho(n+k)^{-1} < \tau \right\}$$
$$= \min_{1 \le k \le K} P\left\{ \frac{\|x^\delta_n - x^\delta_{n+k}\|^2}{E\|x^\delta_n - x^\delta_{n+k}\|^2} < 16\,\tau^2\,\frac{\varrho(n+k)^2}{E\|x^\delta_n - x^\delta_{n+k}\|^2} \right\}.$$
For $n \le n_{opt}$ the regularization term dominates in (10); inserting (11) and (13) we obtain, using (18),
$$16\,\tau^2\,\frac{\varrho(n+k)^2}{E\|x^\delta_n - x^\delta_{n+k}\|^2} \le \tilde\tau\,\omega^{-(n_{opt}-n)(2\lambda+2\varepsilon+2\gamma)}\,\omega^{k(2\lambda+2\varepsilon+1)},$$
where $\tilde\tau$ collects the constants. Applying (14), whose exponent $1/(2\max_k \alpha_k)$ is at least $1$ by (17), yields the first inequality. The second inequality is trivial. □

This means that the balancing functional $b_{N,\sigma}$, respectively its smoothed version $B_{N,\sigma}$, shows the following behavior:

• Assume $n < n_{opt}$. The probability that $b(\cdot)$ falls below the threshold becomes smaller and smaller the farther away $n$ is from $n_{opt}$; near $n_{opt}$, one cannot make any sensible statements, as there the above bound for the probability is bigger than 1. In particular, the decay of the probabilities is faster than the increase of the error for smaller regularization parameters.

• Beyond the point $n_{opt}$, the probability of being above the threshold depends only on the level of the threshold.

Using this behavior, we can define the following method; as shown in the sketch after this definition, it admits a purely forward implementation. This idea has already been presented in a different form in [RH08], however in a purely deterministic setting with a focus on convergence results.

Definition 3.11 (Fast Balancing) Define
$$n_{fb} = \operatorname{argmin}_n \{ b_{N,\sigma}(n) < \tau \},$$
i.e., the first index at which the balancing functional falls below the threshold $\tau$.
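In contrast to Definition 2.5, the fast balancing index can be found by a forward search that stops at the first admissible $n$, so solutions can be computed on demand. A sketch under assumed names (solve(n) returning $x_n^\delta$ and rho(n) returning $\varrho(n)$, neither part of the paper):

    import numpy as np

    def fast_balancing(solve, rho, tau=1.0, sigma=4.0, n_max=100):
        """n_fb of Definition 3.11: first n with b_{N,sigma}(n) < tau.

        Solutions are computed lazily via solve(n), so at most about
        n_fb + K of them are ever needed."""
        cache = {}
        def x(n):
            if n not in cache:
                cache[n] = solve(n)
            return cache[n]
        for n in range(n_max):
            terms = []
            m = n + 1
            while True:                   # m runs up to l_{N,sigma}(n)
                terms.append(np.linalg.norm(x(m) - x(n)) / (4.0 * rho(m)))
                if rho(m) ** 2 > sigma * rho(n) ** 2 or m >= n_max:
                    break
                m += 1
            if max(terms) < tau:
                return n
        raise RuntimeError("no admissible index below n_max")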
Theorem 3.12 Let $\sigma$ be such that $K = 1$ and assume that $\omega_0$ is big enough such that (16), (17) and (18) hold; furthermore assume that $n_{opt}$ exists. For any $N$ (including $N = \infty$) and any $\tau \ge 1$, $\lambda + 2\varepsilon > 0$, the parameter $n_{fb}$ exists with probability $1$ and the oracle inequality
$$E\|x^\delta_{n_{fb}} - x\|^2 \le C \min_n E\|x^\delta_n - x\|^2$$
holds, where $C$ does not depend on the particular $x$ and $\xi$ (i.e., not on $\delta$ resp. $\eta$).

The proof we use is rather similar to the one used in [BR08]:

Proof The proof consists of three parts.

Due to $K = 1$, all random variables $b_{N,\sigma}(n)$ are independent. Hence, using Lemma 3.9, it holds that
$$P(n_{fb} \ge n_{opt} + k) \le \left( \sqrt{e}^{\,-\tau} \right)^k \xrightarrow{k \to \infty} 0.$$
This trivially yields that $n_{fb}$ exists with probability $1$.

Hence we obtain, using the Hölder inequality with $p^{-1} + q^{-1} = 1$,
$$E\|x^\delta_{n_{fb}} - x\|^2 = \sum_{n=0}^{\infty} E\left[ \|x - x^\delta_n\|^2\,\mathbb{1}_{n = n_{fb}} \right]$$
$$\le \sum_{n=0}^{n_{opt}-1} \left( E\|x - x^\delta_n\|^{2q} \right)^{1/q} \left( E\,\mathbb{1}_{n=n_{fb}} \right)^{1/p} + \max\left\{ E\|x - x^\delta_{n_{opt}-1}\|^2,\ E\|x - x^\delta_{n_{opt}}\|^2 \right\} + \sum_{n=n_{opt}+1}^{\infty} \left( E\|x - x^\delta_n\|^{2q} \right)^{1/q} \left( E\,\mathbb{1}_{n=n_{fb}} \right)^{1/p}.$$
In [BR08] it is proven, using the Gaussian behavior, that
$$\left( E\|x - x^\delta_n\|^{2q} \right)^{1/q} \le c_q\,E\|x - x^\delta_n\|^2 \qquad (19)$$
for some constant $c_q \ge 1$. Now, using that $\lambda > -\varepsilon$, we can choose $p$ near enough to $1$ such that
$$2\lambda + 2\varepsilon + 2\gamma(1-p) + p > 0, \qquad (20)$$
and $\tau$ in relation to $\omega$ was chosen in such a way that
$$\omega^{2\lambda+2\varepsilon+1} \left( \sqrt{e}^{\,-\tau} \right)^{1/p} < 1. \qquad (21)$$
Bounding the indicator expectations via Lemma 3.10 for $n < n_{opt}$ and via Lemma 3.9 together with the independence of the $b_{N,\sigma}(k)$ for $n > n_{opt}$, and using (12), we obtain
$$E\|x^\delta_{n_{fb}} - x\|^2 \le c_q\,E\|x - x^\delta_{n_{opt}}\|^2 \Bigg( \sum_{n=0}^{n_{opt}-1} \left( e\,\tilde\omega^{2\lambda+2\varepsilon+1}\,\tilde\tau\,\omega^{-(n_{opt}-n)(2\lambda+2\varepsilon+2\gamma)} \right)^{1/p} c_6 c_5^{-1}\,\omega^{(n_{opt}-n)(2\gamma-1)}$$
$$\qquad + \omega^{2\gamma-1} + \sum_{n=n_{opt}+1}^{\infty} \omega^{(n-n_{opt})(2\lambda+2\varepsilon+1)} \left( \sqrt{e}^{\,-\tau} \right)^{(n-n_{opt})/p} \Bigg) \le C\,E\|x - x^\delta_{n_{opt}}\|^2 \le C \min_n E\|x^\delta_n - x\|^2$$
due to the definition of $n_{opt}$, where by (20) and (21) both series converge and
$$C \le c_q \left( \left( e\,\tilde\omega^{2\lambda+2\varepsilon+1}\,\tilde\tau \right)^{1/p} c_6 c_5^{-1} \left( 1 - \omega^{-(2\lambda+2\varepsilon+2\gamma(1-p)+p)/p} \right)^{-1} + \omega^{2\gamma-1} + \left( 1 - \omega^{2\lambda+2\varepsilon+1} \left( \sqrt{e}^{\,-\tau} \right)^{1/p} \right)^{-1} \right).$$
Obviously $C$ is independent of the particular $x$ and $\xi$. □

This means in particular that we do not lose a logarithmic factor and can set $\tau = 1$ without a problem as long as we keep $\omega$ small enough. Furthermore, this speeds up the method considerably since, as with the Morozov discrepancy principle, we no longer need to compute solutions for all $n$ up to $N$ but can stop after considering at most $n_* + K \approx n_{opt} + K$ solutions. Practice shows that the method also works for $K > 1$.

4. Obtaining the Noise Behavior

In practice one often does not know $\varrho$ and therefore needs to estimate it. Nevertheless, in most practical situations it is possible to measure more than once or to partition the data into two or more data sets.

Assume that one can partition the measurement into two parts $y_1^{\tilde\delta}$ and $y_2^{\tilde\delta}$ with $\tilde\delta = \sqrt{2}\,\delta$; then we have
$$x^\delta_n = \frac{x^{\tilde\delta}_{n,1} + x^{\tilde\delta}_{n,2}}{2}.$$
The canonical estimator for $\varrho$ is now
$$\tilde\varrho(n)^2 = \left\| \frac{x^{\tilde\delta}_{n,1} - x^{\tilde\delta}_{n,2}}{2} \right\|^2$$
and it obviously holds
$$E\,\tilde\varrho(n)^2 = \varrho(n)^2. \qquad (22)$$
Accordingly, we can define $\tilde b_{N,\sigma}(n)$ by just replacing $\varrho$ with $\tilde\varrho$.

This means that we can modify the probability estimations using a similar trick as in [BR08]. It is important to notice that there is no way to reliably estimate the color of the noise based on only two solutions; the same holds for the noise level when the color of the noise is not known. Nevertheless, the information we obtain from two solutions is sufficient for optimal reconstructions.

Lemma 4.1 Assume that $n_{opt} < n$ and that $\omega_0$ is big enough such that (16) holds and
$$\frac{c_3\,\omega_0}{2(2\lambda+2\varepsilon+1)} \ge 1. \qquad (23)$$
Then it holds, if $e/\tau < 1$, that
$$P\{ \tilde b_{N,\sigma}(n) > \tau \} \le 2Ke\,\tau^{-1}.$$

Proof Using (22), (11), Lemma 3.8 and parts which have already been shown in Lemma 3.9, it holds:
$$K^{-1} P\{ \tilde b_{N,\sigma}(n) > \tau \} \le \max_{1 \le k \le K} P\left\{ \tfrac14 \|x^\delta_n - x^\delta_{n+k}\|\,\tilde\varrho(n+k)^{-1} > \tau \right\}$$
$$= \max_{1 \le k \le K} P\left\{ \frac{\|x^\delta_n - x^\delta_{n+k}\|^2}{E\|x^\delta_n - x^\delta_{n+k}\|^2} > 16\,\tau^2\,\frac{\varrho(n+k)^2}{E\|x^\delta_n - x^\delta_{n+k}\|^2}\,\frac{\tilde\varrho(n+k)^2}{\varrho(n+k)^2} \right\}$$
$$\le \max_{1 \le k \le K} \left[ P\left\{ \frac{\|x^\delta_n - x^\delta_{n+k}\|^2}{E\|x^\delta_n - x^\delta_{n+k}\|^2} > 2\tau \right\} + P\left\{ \frac{\tilde\varrho(n+k)^2}{E\,\tilde\varrho(n+k)^2} < \frac{1}{\tau} \right\} \right]$$
$$\le \sqrt{e}\,e^{-\tau/2} + \left( \frac{e}{\tau} \right)^{\frac{1}{2 \max_k \alpha_k}} \le \sqrt{e}\,e^{-\tau/2} + \frac{e}{\tau} \le \frac{2e}{\tau},$$
where the first term was bounded using (15) and (16) as in Lemma 3.9, and the second using (14), whose exponent is at least $1$ by (23). □
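The estimator $\tilde\varrho$ needs nothing but the two half-data solutions; a minimal sketch (names illustrative):

    import numpy as np

    def rho_estimate(x1, x2):
        """tilde-rho(n) = ||(x_{n,1} - x_{n,2})/2||, cf. (22); x1[n], x2[n]
        are the TSVD solutions from the two data halves (noise level
        sqrt(2)*delta each). Since E tilde-rho(n)^2 = rho(n)^2, these
        values can replace rho in the balancing functional."""
        return [np.linalg.norm((a - b) / 2.0) for a, b in zip(x1, x2)]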
Lemma 4.2 Assume that it holds $n_{opt} \ge n$ and assume that $\omega_0$ is big enough such that (17) and (18) hold. Then it holds that
$$P\{ \tilde b_{N,\sigma}(n) < \tau \} \le e\,\tilde\omega^{2\lambda+2\varepsilon+1}\,\tilde\tau\,\omega^{-(n_{opt}-n)(\lambda+2\varepsilon+2\gamma)}.$$

Proof It holds, using Lemma 3.8 and parts of Lemma 3.10:
$$P\{ \tilde b_{N,\sigma}(n) < \tau \} \le P\left\{ \forall\,1 \le k \le K:\ \tfrac14 \|x^\delta_n - x^\delta_{n+k}\|\,\tilde\varrho(n+k)^{-1} < \tau \right\} \le \min_{1 \le k \le K} P\left\{ \tfrac14 \|x^\delta_n - x^\delta_{n+k}\|\,\tilde\varrho(n+k)^{-1} < \tau \right\}.$$
Splitting according to whether $\tilde\varrho(n+1)^2 \le \tau\,\omega^{(n_{opt}-n)\lambda}\,E\,\tilde\varrho(n+1)^2$ or not, we obtain, as in Lemma 3.10,
$$P\{ \tilde b_{N,\sigma}(n) < \tau \} \le P\left\{ \frac{\|x^\delta_n - x^\delta_{n+1}\|^2}{E\|x^\delta_n - x^\delta_{n+1}\|^2} < \tilde\tau\,\omega^{-(n_{opt}-n)(\lambda+2\varepsilon+2\gamma)}\,\omega^{2\lambda+2\varepsilon+1} \right\} + P\left\{ \frac{\tilde\varrho(n+1)^2}{E\,\tilde\varrho(n+1)^2} > \tau\,\omega^{(n_{opt}-n)\lambda} \right\}$$
$$\le e\,\tilde\omega^{2\lambda+2\varepsilon+1}\,\tilde\tau\,\omega^{-(n_{opt}-n)(\lambda+2\varepsilon+2\gamma)} + \sqrt{e}\,e^{-\tau\,\omega^{(n_{opt}-n)\lambda}/4} \le e\,\tilde\omega^{2\lambda+2\varepsilon+1}\,\tilde\tau\,\omega^{-(n_{opt}-n)(\lambda+2\varepsilon+2\gamma)},$$
after possibly enlarging $\tilde\tau$, since the second term decays much faster in $n_{opt}-n$ than the first. □

This means that, in principle, the balancing functional $\tilde b_{N,\sigma}$ shows the same behavior as its non-estimated counterpart $b_{N,\sigma}$, only with the weaker exponent $\lambda+2\varepsilon+2\gamma$ instead of $2\lambda+2\varepsilon+2\gamma$. Using this behavior, we can define a version of the new method:

Definition 4.3 (Fast Balancing) Define
$$n_{fb} = \operatorname{argmin}_n \{ \tilde b_{N,\sigma}(n) < \tau \}.$$

Theorem 4.4 Let $\sigma$ be such that $K = 1$ and assume that $\omega_0$ is big enough such that (16), (17), (18) and (23) hold; furthermore assume that $n_{opt}$ exists. For any $N$ (including $N = \infty$) and any $\tau > e$, $\lambda + 2\varepsilon > 0$, the parameter $n_{fb}$ exists with probability $1$ and the oracle inequality
$$E\|x^\delta_{n_{fb}} - x\|^2 \le C \min_n E\|x^\delta_n - x\|^2$$
holds, where $C$ does not depend on the particular $x$ and $\xi$ (i.e., not on $\delta$ resp. $\eta$).

The proof works in exactly the same way as for Theorem 3.12.

5. Conclusion

Assuming that our model is suitable for describing real data, we have presented an answer to the initial questions, at least for the newly defined methods:

• We do not lose a logarithmic factor, because the probability of the balancing principle going completely wrong is negligibly small.

• We do not need explicit knowledge of the noise level $\delta$ and the noise behavior. A rough estimation based on two independent measurements is sufficient.

• The newly introduced method is as fast as the Morozov discrepancy principle (if one neglects constant factors).

Although the situation is not completely comparable with the case of a deterministic $x$, which suffers from the mentioned logarithmic factor, we think this is a significant advance in understanding the difference between the theoretical and the actual behavior of the balancing principle. Though it has not been shown in this paper, one can also transfer parts of the proofs to the case of Tikhonov regularization [Bau10].

Furthermore, large numerical experiments show that the newly defined method works very well and can, in contrast to most other parameter choice regimes, cope with colored noise without any performance loss [BL10]. In these experiments it was observed that the factor $C$ in the oracle inequality is at most around 2. The method is very stable, i.e., the number of observed outliers is very low, both for Tikhonov and spectral cut-off regularization.

Additionally, it was observed that the stability increases if one uses more than two measurements in order to estimate the noise behavior, and if one chooses $K$ a bit bigger than 1.

Acknowledgements

The author gratefully acknowledges the financial support by the Upper Austrian Technology and Research Promotion.
References

[Bau10] F. Bauer, Parameter choice by fast balancing, arXiv.org (2010).
[BHM09] F. Bauer, T. Hohage, and A. Munk, Iteratively regularized Gauss-Newton method for nonlinear inverse problems with random noise, SIAM Journal on Numerical Analysis 47 (2009), no. 3, 1827–1846.
[BK08] F. Bauer and S. Kindermann, The quasi-optimality criterion for classical inverse problems, Inverse Problems 24 (2008), 035002.
[BL10] F. Bauer and M. Lukas, Comparing parameter choice methods for regularization of ill-posed problems, preprint (2010).
[BM07] F. Bauer and A. Munk, Optimal regularization for ill-posed problems in metric spaces, J. Inverse Ill-Posed Probl. 15 (2007), no. 2, 137–148.
[BP05] F. Bauer and S. Pereverzev, Regularization without preliminary knowledge of smoothness and error behavior, European Journal of Applied Mathematics 16 (2005), no. 3, 303–317.
[BR08] F. Bauer and M. Reiß, Regularization independent of the noise level: an analysis of quasi-optimality, Inverse Problems 24 (2008), no. 5, 055009 (16pp).
[EHN96] H. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Problems, Kluwer Academic Publishers, Dordrecht, Boston, London, 1996.
[GP00] A. Goldenshluger and S. Pereverzev, Adaptive estimation of linear functionals in Hilbert scales from indirect white noise observations, Probab. Theory Related Fields 118 (2000), 169–186.
[Hof86] B. Hofmann, Regularization of Applied Inverse and Ill-Posed Problems, Teubner, Leipzig, 1986.
[HPR07] U. Hämarik, R. Palm, and T. Raus, Use of extrapolation in regularization methods, J. Inverse Ill-Posed Probl. 15 (2007), no. 3, 277–294.
[Lep90] O. V. Lepskij, On a problem of adaptive estimation in Gaussian white noise, Theory of Probability and its Applications 35 (1990), no. 3, 454–466.
[Mor66] V. A. Morozov, On the solution of functional equations by the method of regularization, Soviet Math. Dokl. 7 (1966), 414–417.
[MP03] P. Mathé and S. Pereverzev, Geometry of linear ill-posed problems in variable Hilbert scales, Inverse Problems 19 (2003), no. 3, 789–803.
[MP06] P. Mathé and S. V. Pereverzev, Regularization of some linear ill-posed problems with discretized random noisy data, Math. Comput. 75 (2006), no. 256, 1913–1929.
[RH08] T. Raus and U. Hämarik, About the balancing principle for choice of the regularization parameter, Journal of Physics: Conference Series 135 (2008).