arXiv preprint [math.NA]

On local analysis
Felipe Cucker∗
Dept. of Mathematics, City University of Hong Kong
[email protected]
Teresa Krick†
Departamento de Matemática & IMAS, Univ. de Buenos Aires & CONICET, Argentina
[email protected]
Abstract. We extend to Gaussian distributions a result providing smoothed analysis estimates for condition numbers given as relativized distances to ill-posedness. We also introduce a notion of local analysis meant to capture the behavior of these condition numbers around a point.

2010 Mathematics Subject Classification: Primary 65Y20, Secondary 65F35.
Keywords: Conic condition number. Smoothed analysis. Local analysis.
∗ Partially supported by a GRF grant from the Research Grants Council of the Hong Kong SAR (project number CityU 11302418).
† Corresponding author. Partially supported by grant CONICET-PIP 2014-2016 11220130100073CO.

Introduction

In the 1990s D. Spielman and S.H. Teng introduced the notion of smoothed analysis, in an attempt to give a more realistic analysis of the practical performance of an algorithm than those obtained through the use of worst-case or average-case analyses. In a nutshell, this new paradigm in probabilistic analysis interpolates between worst-case and average-case by considering the worst-case (over the data) of the average value (over possible random perturbations) of the analyzed quantity. See, for instance, [7] for an overview.

An example of this analysis, for the quantity ln κ(A) where A is a square matrix and κ(A) := ‖A‖ ‖A⁻¹‖, was provided by M. Wschebor in [10]. Wschebor showed that

  max_{Ā ∈ S(R^{n×n})} E_{A∼N(Ā,σ²Id)} ln κ(A) ≤ ln( n / min{σ,1} ) + O(1),   (1)

where here, and in what follows, x ∼ N(x̄, σ²Id) indicates that x is drawn from an isotropic Gaussian distribution centered at x̄ with covariance matrix σ²Id. The behavior of the bound H(n,σ) in the right-hand side of (1) shows two expected properties of a smoothed analysis:

(SA1) When σ → 0, H(n,σ) tends to its worst-case value (there are no random perturbations of the input in this case).

(SA2) When σ → ∞, H(n,σ) tends to the average value of the analyzed quantity (the random perturbation is over all the input data in this case).

Indeed, the convergence of H(n,σ) to infinity when σ → 0 shows (SA1), and it is known that E_{A∼N(0,σ²Id)} ln κ(A) = ln n + O(1), thus showing (SA2).

The main agenda of this paper is to introduce the notion of local analysis, which aims to study, locally at a base point x̄, the average value over possible random perturbations of the analyzed quantity, without then taking the worst case over all input data. The benefit of such an analysis is that it provides information depending directly on the base point instead of assuming a worst case, as in smoothed analysis.

We illustrate this notion by developing it for a conic condition number. This is a condition number satisfying a Condition Number Theorem. We next describe this notion and its context more precisely.

In 1936 Eckart and Young [5] proved that for a square matrix A, κ(A) = ‖A‖/d(A,Σ), where Σ is the set of non-invertible matrices and d denotes distance. This result came to be known as the Condition Number Theorem, even though it was proved more than ten years before the introduction of condition numbers by Turing [8] and von Neumann and Goldstine [9]. In 1987 J. Demmel observed (and proved) that similar Condition Number Theorems hold true for the condition numbers of various problems [3]. More precisely, he showed that these condition numbers are either equal to or closely bounded by the (normalized) inverse of the distance to ill-posedness. That is, for an input x of the problem at hand, the condition number of x for that problem is either equal to or closely bounded by

  C(x) = ‖x‖ / d(x,Σ),   (2)

where Σ ≠ ∅ is an algebraic cone of ill-posed inputs. One year later, Demmel [4] derived general average analysis bounds for those (conic) condition numbers.
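Wschebor's estimate (1) lends itself to a quick numerical experiment. The sketch below is our own illustration, not part of the paper's results: the matrix size, the trial count, and the nearly singular center Ā are arbitrary choices. It estimates the smoothed average of ln κ(A) by Monte Carlo and prints it next to the leading term ln(n/min{σ,1}) of (1).

```python
import numpy as np

def smoothed_ln_kappa(A_bar, sigma, trials=2000, seed=1):
    """Monte Carlo estimate of E_{A ~ N(A_bar, sigma^2 Id)} ln kappa(A),
    where kappa(A) = ||A|| ||A^{-1}|| is computed from the singular values."""
    rng = np.random.default_rng(seed)
    n = A_bar.shape[0]
    total = 0.0
    for _ in range(trials):
        A = A_bar + sigma * rng.standard_normal((n, n))
        s = np.linalg.svd(A, compute_uv=False)
        total += np.log(s[0] / s[-1])        # ln kappa(A) = ln(s_max / s_min)
    return total / trials

# A nearly ill-posed center of (Frobenius-)unit norm: we force the smallest
# singular value of a random 8x8 matrix down to 1e-8.
n = 8
rng = np.random.default_rng(0)
U, _, Vt = np.linalg.svd(rng.standard_normal((n, n)))
s = np.linspace(1.0, 0.1, n)
s[-1] = 1e-8
A_bar = U @ np.diag(s) @ Vt
A_bar /= np.linalg.norm(A_bar)

results = {}
for sigma in (1e-3, 1e-1, 1.0):
    results[sigma] = smoothed_ln_kappa(A_bar, sigma)
    print(f"sigma={sigma:g}: E ln kappa ~ {results[sigma]:.2f}  "
          f"(ln(n/min(sigma,1)) = {np.log(n / min(sigma, 1.0)):.2f})")
```

Although the center is nearly singular, a little smoothing already tames ln κ: the estimates decrease toward the average value ln n + O(1) as σ grows, in line with (SA1) and (SA2).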
These bounds depend only on the dimension N+1 of the ambient space, the codimension of Σ, and its degree. He carried out this idea in the complex case and stated it for the real case (requiring Σ to be a complete intersection) based on an unpublished (and not findable anywhere) result by Ocneanu. The underlying probability distribution is the isotropic Gaussian on R^{N+1}, but it is easy to observe that the bounds hold as well for the uniform distribution on the unit sphere S^N (or, equivalently, on any half-sphere, due to the equality C(−x) = C(x)).

In [2] Demmel's idea was extended to perform a smoothed analysis of the conic condition number C(x) in the case that Σ is the zero set of a single real homogeneous polynomial F in N+1 variables. For this analysis one considers centers x̄ of the distributions in S^N (as in (1)), and there are two natural choices for the distribution itself: a Gaussian supported on R^{N+1} or a uniform distribution on a spherical cap in S^N. The uniform case is studied in [2], where the following bound is obtained for θ ∈ (0, π/2]:

  max_{x̄∈S^N} E_{x∈B_S(x̄,θ)} ln C(x) ≤ ln( N d / sin θ ) + 2(ln 2 + 1),   (3)

where d is the degree of F and B_S(x̄,θ) is the spherical cap of radius θ centered at x̄, which we endow with the uniform distribution. This bound H(N,d,θ) recovers an average analysis in the particular case that the spherical cap is a half-sphere. That is,

(SA2') H(N,d,π/2) = ln(Nd) + O(1) is the average value of ln C(x) for x ∈ S^N, see [4].

A smoothed analysis of the conic condition number C(x) in the Gaussian case N(x̄,σ²Id) was still lacking, and it is one of the results we present in this paper, since it is strongly linked with our local analysis, as we will see below. Theorem 4.1 shows that

  max_{x̄∈S^N} E_{x∼N(x̄,σ²Id)} ln C(x) ≤ H(N,d,σ),

where H(N,d,σ) is an explicit bound that satisfies (SA1) and (SA2). That is,

  lim_{σ→0} H(N,d,σ) = ∞  and  lim_{σ→∞} H(N,d,σ) = ln(Nd) + O(1).

With respect to local analysis, the gist is to obtain bounds for the quantities

  E_{x∼D(x̄)} ln C(x),

where x̄ ∈ S^N and D(x̄) is either the uniform distribution on the spherical cap B_S(x̄,θ) or the Gaussian N(x̄,σ²Id). These bounds will be expressions H(N,d,ν,C(x̄)), where ν is either θ or σ depending on the underlying distribution, which should coincide with smoothed analysis bounds when C(x̄) = ∞. More precisely, if we denote by H_∞(N,d,ν) the result of replacing C(x̄) by ∞ in H(N,d,ν,C(x̄)), then we want the following:

(LA0) H_∞(N,d,ν) has the same behavior as the smoothed analysis bound H(N,d,ν).

Furthermore, when C(x̄) < ∞ we seek the following limiting behavior:

(LA1) lim_{ν→0} H(N,d,ν,C(x̄)) = ln C(x̄) + O(1), the local complexity at x̄.

(LA2) lim_{σ→∞} H(N,d,σ,C(x̄)) = ln(Nd) + O(1) in the Gaussian case, the average complexity.

(LA2') H(N,d,π/2,C(x̄)) = ln(Nd) + O(1) in the uniform case, the average complexity.

Indeed, we show that this is the case in Theorem 3.1 (uniform case) and Theorem 4.8 (Gaussian case).

Acknowledgments.
We are grateful to Pierre Lairez for many useful discussions and, in particular, for pointing out to us an argument in Proposition 4.2.
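Before the formal development, the local-analysis agenda can be previewed on a toy example (ours, not taken from the results below): take Σ to be the hyperplane {x_{N+1} = 0}, the zero set of the degree-1 homogeneous polynomial F(x) = x_{N+1}, so that C(x) = ‖x‖/|x_{N+1}|. The local Gaussian average of ln C around a base point x̄ then reproduces ln C(x̄) for small σ, as (LA1) predicts, and drifts toward a σ-independent average for large σ. All parameters below are ad hoc.

```python
import numpy as np

def local_average_ln_C(x_bar, sigma, samples=200000, seed=0):
    """Monte Carlo estimate of E_{x ~ N(x_bar, sigma^2 Id)} ln C(x) for the
    toy cone Sigma = {x_{N+1} = 0}, where C(x) = ||x|| / |x_{N+1}|."""
    rng = np.random.default_rng(seed)
    X = x_bar + sigma * rng.standard_normal((samples, x_bar.size))
    return float(np.mean(np.log(np.linalg.norm(X, axis=1) / np.abs(X[:, -1]))))

N = 5
x_bar = np.zeros(N + 1)
x_bar[0], x_bar[-1] = np.sqrt(1 - 0.01**2), 0.01   # base point with C(x_bar) = 100

for sigma in (1e-4, 1e-2, 1.0):
    print(f"sigma={sigma:g}: local average of ln C ~ {local_average_ln_C(x_bar, sigma):.2f}")
print(f"ln C(x_bar) = {np.log(100):.2f}")
```

For σ = 10⁻⁴ the local average essentially equals ln C(x̄) ≈ 4.61: the information is attached to the base point, with no worst case taken over x̄.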
Preliminaries

In all that follows we consider the space R^{N+1} endowed with the standard inner product ⟨·,·⟩ and its induced norm ‖·‖. Within this space we have the unit sphere S^N = {x ∈ R^{N+1} : ‖x‖ = 1}. We denote by B(x̄,r) = {x ∈ R^{N+1} : ‖x − x̄‖ ≤ r} the closed ball centered at x̄ ∈ R^{N+1} with radius r ≥ 0, and, for x̄ ∈ S^N, by

  B_S(x̄,θ) = {x ∈ S^N : 0 ≤ ∢(x,x̄) ≤ θ} = {x ∈ S^N : ⟨x,x̄⟩ ≥ cos θ}

the spherical cap in S^N centered at x̄ with radius 0 ≤ θ ≤ π, that is, the closed ball of radius θ around x̄ in S^N with respect to the Riemannian distance in S^N.

We will also refer to the sine distance d_sin on R^{N+1} \ {0} given by d_sin(x,x̄) := sin(∢(x,x̄)). Let B_sin(x̄,ρ) := {x ∈ S^N : d_sin(x,x̄) ≤ ρ} denote the closed ball of radius ρ with respect to d_sin around x̄ ∈ S^N. This is the union of B_S(x̄,θ) with B_S(−x̄,θ), where θ ∈ [0,π/2] is such that ρ = sin θ.

We will denote by O_N := vol(S^N) the volume of S^N. We recall (see [1, Prop. 2.19(a)]) that

  O_N = 2π^{(N+1)/2} / Γ((N+1)/2)   (4)

as well as [1, Cor. 2.20]

  vol(B(0,1)) = O_N / (N+1),   (5)

and, for x̄ ∈ S^N and θ ∈ [0, π/2], the bound (see [1, Lem. 2.34])

  (O_N / (2√(π(N+1)))) (sin θ)^N ≤ vol(B_S(x̄,θ)) ≤ (O_N / 2) (sin θ)^N.   (6)

The main object in this paper is a conic condition number on R^{N+1}, i.e., a function given by

  C : R^{N+1} → [1,∞],  C(x) = ‖x‖ / d(x,Σ),

where Σ ≠ ∅ is the set of ill-posed inputs in R^{N+1}, which we assume closed under scalar multiplication. We note that C(x) ≥ 1 since 0 ∈ Σ. As C is scale invariant we may restrict to data x lying in S^N, where C can also be expressed as

  C(x) = 1 / d_sin(x, Σ ∩ S^N).

The uniform case
We endow B_sin(x̄,ρ) with the uniform probability measure. A smoothed analysis for this measure is given in [1, Th. 21.1]. Assume that Σ is contained in a real algebraic hypersurface, given as the zero set of a homogeneous polynomial of degree d. Then, for all θ ∈ (0, π/2] and ρ := sin θ, we have

  E_{x∈B_S(x̄,θ)} ln C(x) = E_{x∈B_sin(x̄,ρ)} ln C(x) ≤ ln( N d / sin θ ) + K   (7)

and

  E_{x∈S^N} ln C(x) ≤ ln(Nd) + K,   (8)

where K = 2(ln 2 + 1). Here ln denotes the natural logarithm. We observe that the equality above is due to the fact that C(x) = C(−x) for all x ∈ S^N and that vol B_sin(x̄,ρ) = vol B_S(x̄,θ) + vol B_S(−x̄,θ). The same observation applies to the following result.

Theorem 3.1.
Let C be a conic condition number on R^{N+1} with set of ill-posed inputs Σ. Assume that Σ is contained in a real algebraic hypersurface, given as the zero set of a homogeneous polynomial of degree d. Let x̄ ∈ S^N and 0 < θ ≤ π/2. Then, for ρ := sin θ,

  E_{x∈B_S(x̄,θ)} ln C(x) ≤  ln( N d / (ρ + (1−ρ)/C(x̄)) ) + ln 12 + 2   if ρ > 1/(2C(x̄)+1),
                             ln( 1 / (ρ + (1−ρ)/C(x̄)) ) + ln 4        if ρ ≤ 1/(2C(x̄)+1).

In particular, there is a uniform explicit bound H(N,d,θ,C(x̄)) –defined in (10) below– such that

  E_{x∈B_S(x̄,θ)} ln C(x) ≤ H(N,d,θ,C(x̄)).

This bound satisfies (LA0), since H_∞(N,d,θ) = ln( N d / sin θ ) + O(1) behaves as H(N,d,θ) in (3), as well as (LA1) and (LA2').

Proof.
Assume first that 1/(2C(x̄)+1) ≤ ρ ≤ 1. In this case we have

  ρ(2C(x̄)+1) ≥ 1 ⟺ 2ρC(x̄) + ρ ≥ 1 ⟺ 2ρC(x̄) ≥ 1−ρ ⟺ 2ρ ≥ (1−ρ)/C(x̄),

and we can decompose ρ = (1/3)ρ + (1/3)(2ρ) ≥ (1/3)( ρ + (1−ρ)/C(x̄) ), i.e.,

  1/ρ ≤ 3 / ( ρ + (1−ρ)/C(x̄) ).

Therefore, by (7),

  E_{x∈B_S(x̄,θ)} ln C(x) ≤ ln(Nd/ρ) + ln 4 + 2
   ≤ ln( 3Nd / (ρ + (1−ρ)/C(x̄)) ) + ln 4 + 2
   = ln( Nd / (ρ + (1−ρ)/C(x̄)) ) + ln 12 + 2.

We next assume 0 ≤ ρ < 1/(2C(x̄)+1). In this case,

  1/(2C(x̄)+1) = (1/4)( 1/(2C(x̄)+1) + 3/(2C(x̄)+1) ) > (1/4)( ρ + (1−ρ)/C(x̄) ),

since 3/(2C(x̄)+1) ≥ 1/C(x̄) > (1−ρ)/C(x̄) (recall C(x̄) ≥ 1). Equivalently,

  2C(x̄)+1 < 4 / ( ρ + (1−ρ)/C(x̄) ).

We also use here that, for all x ∈ B_S(x̄,θ),

  1/C(x) = d_sin(x,Σ) ≥ d_sin(x̄,Σ) − d_sin(x,x̄) ≥ 1/C(x̄) − ρ ≥ 1/C(x̄) − 1/(2C(x̄)+1) ≥ 1/(2C(x̄)+1),   (9)

and therefore C(x) ≤ 2C(x̄)+1 < 4/( ρ + (1−ρ)/C(x̄) ), which implies

  ln C(x) ≤ ln( 1 / (ρ + (1−ρ)/C(x̄)) ) + ln 4.

This shows the first statement. We now derive the expression of a bound H(N,d,θ,C(x̄)). Let φ : [0,1] → R be the function defined by

  φ(ρ) = 2(Nd − 1) ρ^{e(x̄)} + 1,

where the exponent e(x̄) is the logarithm in base 1/(2C(x̄)+1) of 1/2, which, by continuity, we take to be 0 when C(x̄) = ∞. We note that φ is concave, monotonically increasing, and satisfies φ(0) = 1 and φ(1) = 2Nd − 1, while for C(x̄) = ∞ it is constantly φ(ρ) = 2Nd − 1. Moreover,

  φ( 1/(2C(x̄)+1) ) = 2(Nd − 1)·(1/2) + 1 = Nd.
This implies, since

  ln( 1/(ρ + (1−ρ)/C(x̄)) ) + ln 4 ≤ ln( φ(ρ)/(ρ + (1−ρ)/C(x̄)) ) + ln 12 + 2  for 0 ≤ ρ < 1/(2C(x̄)+1)

and, using the properties of φ above,

  ln( Nd/(ρ + (1−ρ)/C(x̄)) ) + ln 12 + 2 ≤ ln( φ(ρ)/(ρ + (1−ρ)/C(x̄)) ) + ln 12 + 2  for 1/(2C(x̄)+1) ≤ ρ ≤ 1,

that

  E_{x∈B_S(x̄,θ)} ln C(x) ≤ ln( φ(ρ)/(ρ + (1−ρ)/C(x̄)) ) + ln 12 + 2.

That is,

  H(N,d,θ,C(x̄)) = ln( (2(Nd−1) ρ^{e(x̄)} + 1) / (ρ + (1−ρ)/C(x̄)) ) + ln 12 + 2.   (10)

Finally, it is straightforward to verify, from the specific values taken by φ mentioned previously, that H(N,d,θ,C(x̄)) satisfies (LA0), (LA1) and (LA2').  □

The Gaussian case

We keep the same conic condition number C but now consider a Gaussian measure N(x̄,σ²Id) on R^{N+1}, centered at x̄ ∈ S^N and with covariance matrix σ²Id for 0 < σ < ∞, that is, with density function given by

  (2πσ²)^{−(N+1)/2} exp( −‖x − x̄‖²/(2σ²) ).

Since our local analysis will rely on a smoothed analysis for this case, which is not yet known, we begin by studying a general smoothed analysis for the Gaussian case.
Let x̄ ∈ S^N. We recall that, for any 0 ≤ θ ≤ π/2,

  B_S(x̄,θ) = {x ∈ S^N : 0 ≤ ∢(x,x̄) < θ},

and for θ = π/2 we denote

  S^N_+(x̄) := B_S(x̄, π/2) = {x ∈ S^N : 0 ≤ ∢(x,x̄) < π/2} = {x ∈ S^N : ⟨x,x̄⟩ > 0},

the open half-sphere centered at x̄. The main result of this section is the following smoothed analysis for the Gaussian distribution.

Theorem 4.1.
Let C be a conic condition number on R^{N+1} with set of ill-posed inputs Σ. Assume that Σ is contained in a real algebraic hypersurface, given as the zero set of a homogeneous polynomial of degree d, and that N ≥ 5. Then, there exists an explicit bound H(N,d,σ) –defined in (13)– such that

  max_{x̄∈S^N} E_{x∼N(x̄,σ²Id)} ln C(x) ≤ H(N,d,σ).

This bound satisfies

(SA1) lim_{σ→0} H(N,d,σ) = ∞, the worst-case value.

(SA2) lim_{σ→∞} H(N,d,σ) = ln(Nd) + 2(ln 2 + 1), the average value, in remarkable coincidence with (8).

The following map plays a central role in all that follows:

  Ψ : R^{N+1} \ x̄^⊥ → S^N_+(x̄),  Ψ(x) = x/‖x‖ if ⟨x,x̄⟩ > 0, and Ψ(x) = −x/‖x‖ otherwise.   (11)

The main stepping stone towards the proof of Theorem 4.1 is the following.

Proposition 4.2.
Let x̄ ∈ S^N. There exists a probability density f : [0,π/2] → R_{≥0} of a random variable θ ∈ [0,π/2], associated to x̄, σ and N, such that for every measurable function F : R^{N+1} → R_{≥0} satisfying F(x) = F(λx) for all λ ∈ R^×, one has

  E_{y∼N(x̄,σ²Id)} F(y) = (1 − e^{−1/(2σ²)}) E_{θ∼f}( E_{x∈B_S(x̄,θ)} F(x) ) + e^{−1/(2σ²)} E_{x∈S^N_+(x̄)} F(x).

We begin by proving the following lemma.
Lemma 4.3.

For any measurable function F : R^{N+1} → R_{≥0} satisfying F(λy) = F(y) for all λ ∈ R^×, one has

  E_{y∼N(x̄,σ²Id)} F(y) = ∫_{S^N_+(x̄)} G_x̄(∢(x,x̄)) F(x) dx,

where G_x̄ : [0,π/2] → R_{>0} is a decreasing function of α defined by

  G_x̄(α) = (2πσ²)^{−(N+1)/2} ∫_{−∞}^{∞} exp( −(λ² + 1 − 2λ cos α)/(2σ²) ) |λ|^N dλ.

Proof. We have

  E_{y∼N(x̄,σ²Id)} F(y) = (2πσ²)^{−(N+1)/2} ∫_{R^{N+1}} F(y) exp( −‖y − x̄‖²/(2σ²) ) dy
   = (2πσ²)^{−(N+1)/2} ∫_{S^N_+(x̄)} ( ∫_{−∞}^{∞} F(λx) exp( −‖λx − x̄‖²/(2σ²) ) |λ|^N dλ ) dx
   = ∫_{S^N_+(x̄)} F(x) [ (2πσ²)^{−(N+1)/2} ∫_{−∞}^{∞} exp( −‖λx − x̄‖²/(2σ²) ) |λ|^N dλ ] dx
   = ∫_{S^N_+(x̄)} F(x) G(x) dx,

where the second equality follows from the transformation formula [1, Thm. 2.1] applied to the diffeomorphism

  Φ : R^{N+1} \ x̄^⊥ → S^N_+(x̄) × (R \ {0}),  Φ(x) = (Ψ(x), ‖x‖) if ⟨x,x̄⟩ > 0, and Φ(x) = (Ψ(x), −‖x‖) otherwise,

and

  G(x) := (2πσ²)^{−(N+1)/2} ∫_{−∞}^{∞} exp( −‖λx − x̄‖²/(2σ²) ) |λ|^N dλ

does not depend on F. Now, for x ∈ S^N_+(x̄),

  ‖λx − x̄‖² = λ² − 2λ cos(∢(x,x̄)) + 1.

Therefore G(x) =: G_x̄(∢(x,x̄)), where for 0 ≤ α ≤ π/2,

  G_x̄(α) = (2πσ²)^{−(N+1)/2} ∫_{−∞}^{∞} exp( −(λ² + 1 − 2λ cos α)/(2σ²) ) |λ|^N dλ,

which is a continuously differentiable, decreasing function of α.  □

Proof of Proposition 4.2.
By Lemma 4.3,

  E_{y∼N(x̄,σ²Id)} F(y) = ∫_{S^N_+(x̄)} G_x̄(∢(x,x̄)) F(x) dx.   (12)

Now, by the Fundamental Theorem of Calculus, for 0 ≤ α ≤ π/2,

  G_x̄(α) = G_x̄(π/2) − ∫_α^{π/2} G'_x̄(θ) dθ = G_x̄(π/2) − ∫_0^{π/2} 1_{{α ≤ θ}} G'_x̄(θ) dθ.

Replacing this in (12) and changing the order of integration, we obtain

  E_{y∼N(x̄,σ²Id)} F(y) = G_x̄(π/2) ∫_{S^N_+(x̄)} F(x) dx − ∫_0^{π/2} ( ∫_{S^N_+(x̄)} F(x) 1_{{∢(x,x̄) ≤ θ}} dx ) G'_x̄(θ) dθ.

Since

  E_{x∈S^N_+(x̄)} F(x) = (1/vol(S^N_+(x̄))) ∫_{S^N_+(x̄)} F(x) dx  and  E_{x∈B_S(x̄,θ)} F(x) = (1/vol(B_S(x̄,θ))) ∫_{B_S(x̄,θ)} F(x) dx,

we obtain

  E_{y∼N(x̄,σ²Id)} F(y) = G_x̄(π/2) vol(S^N_+(x̄)) E_{x∈S^N_+(x̄)} F(x) − ∫_0^{π/2} ( vol(B_S(x̄,θ)) E_{x∈B_S(x̄,θ)} F(x) ) G'_x̄(θ) dθ.

We now denote

  f(θ) := −vol(B_S(x̄,θ)) G'_x̄(θ) / (1 − e^{−1/(2σ²)}),

which is a non-negative function since G_x̄ is decreasing, and rewrite the equality above as

  E_{y∼N(x̄,σ²Id)} F(y) = H(N,σ) E_{x∈S^N_+(x̄)} F(x) + (1 − e^{−1/(2σ²)}) ∫_0^{π/2} ( E_{x∈B_S(x̄,θ)} F(x) ) f(θ) dθ,

where

  H(N,σ) = G_x̄(π/2) vol(S^N_+(x̄)) = vol(S^N_+(x̄)) (2πσ²)^{−(N+1)/2} ∫_{−∞}^{∞} exp( −(λ² + 1)/(2σ²) ) |λ|^N dλ.

We now prove that H(N,σ) = e^{−1/(2σ²)}. Changing variables to ν = λ/σ we have

  H(N,σ) = vol(S^N_+(x̄)) (2πσ²)^{−(N+1)/2} e^{−1/(2σ²)} ∫_{−∞}^{∞} e^{−ν²/2} |ν|^N σ^{N+1} dν
   = e^{−1/(2σ²)} [ vol(S^N_+(x̄)) (2π)^{−(N+1)/2} ∫_{−∞}^{∞} e^{−ν²/2} |ν|^N dν ].

To estimate the quantity between the square brackets we use the known equality

  ∫_0^{∞} ν^N e^{−ν²/2} dν = Γ((N+1)/2) 2^{(N−1)/2},

which, together with vol(S^N_+(x̄)) = O_N/2 = π^{(N+1)/2}/Γ((N+1)/2), yields

  vol(S^N_+(x̄)) (2π)^{−(N+1)/2} ∫_{−∞}^{∞} e^{−ν²/2} |ν|^N dν = (π^{(N+1)/2}/Γ((N+1)/2)) (2π)^{−(N+1)/2} Γ((N+1)/2) 2^{(N+1)/2} = 1.

Therefore

  E_{y∼N(x̄,σ²Id)} F(y) = e^{−1/(2σ²)} E_{x∈S^N_+(x̄)} F(x) + (1 − e^{−1/(2σ²)}) ∫_0^{π/2} ( E_{x∈B_S(x̄,θ)} F(x) ) f(θ) dθ.

This implies, by taking F = 1, that

  1 = e^{−1/(2σ²)} + (1 − e^{−1/(2σ²)}) ∫_0^{π/2} f(θ) dθ,  i.e.,  ∫_0^{π/2} f(θ) dθ = 1.

Therefore f is a density on [0,π/2], and

  ∫_0^{π/2} ( E_{x∈B_S(x̄,θ)} F(x) ) f(θ) dθ = E_{θ∼f}( E_{x∈B_S(x̄,θ)} F(x) ).  □

Since C(x) = C(λx) for all λ ∈ R^×, we can now focus on F(x) := ln C(x).

Proposition 4.4.
With the notation of Proposition 4.2, we have

  E_{y∼N(x̄,σ²Id)} ln C(y) ≤ (1 − e^{−1/(2σ²)}) E_{θ∼f}( ln(1/sin θ) ) + ln(Nd) + 2(ln 2 + 1).

Proof.

Replacing the expectations in the right-hand side of the equality in Proposition 4.2 by their bounds in (7) and (8), for ρ = sin θ and ρ = 1 = sin(π/2) respectively, we obtain

  E_{y∼N(x̄,σ²Id)} ln C(y) = (1 − e^{−1/(2σ²)}) E_{θ∼f}( E_{x∈B_S(x̄,θ)} ln C(x) ) + e^{−1/(2σ²)} E_{x∈S^N_+(x̄)} ln C(x)
   ≤ (1 − e^{−1/(2σ²)}) E_{θ∼f}( ln(Nd/sin θ) + K ) + e^{−1/(2σ²)} ( ln(Nd) + K )
   = (1 − e^{−1/(2σ²)}) ( E_{θ∼f}( ln(1/sin θ) ) + ln(Nd) + K ) + e^{−1/(2σ²)} ( ln(Nd) + K )
   = (1 − e^{−1/(2σ²)}) E_{θ∼f}( ln(1/sin θ) ) + ln(Nd) + K,

with K = 2(ln 2 + 1). The first equality is Proposition 4.2 applied to F(x) = ln C(x).  □

Our next goal is to estimate the right-hand side in Proposition 4.4.

Lemma 4.5.
Let 0 ≤ t ≤ π/4. Then

  ∫_t^{π/2} ln(1/sin θ) f(θ) dθ ≤ ln √2 + ∫_{sin t}^{√2/2} ( ∫_t^{arcsin s} f(θ) dθ ) ds/s.

Proof.
Write

  ∫_t^{π/2} ln(1/sin θ) f(θ) dθ = ∫_t^{π/4} ln(1/sin θ) f(θ) dθ + ∫_{π/4}^{π/2} ln(1/sin θ) f(θ) dθ.

Since 1/sin θ ≤ √2 for θ ∈ [π/4, π/2], the second term satisfies

  ∫_{π/4}^{π/2} ln(1/sin θ) f(θ) dθ ≤ ln √2 ∫_{π/4}^{π/2} f(θ) dθ.

We analyze the first term. Let

  A = {(θ,r) ∈ [t,π/4] × [0, ln(1/sin t)] : r ≤ ln(1/sin θ)}  and  A_r = {θ ∈ [t,π/4] : r ≤ ln(1/sin θ)}.

By Fubini's Theorem we have both

  ∫_{(θ,r)∈A} f(θ) d(θ,r) = ∫_t^{π/4} ( ∫_0^{ln(1/sin θ)} dr ) f(θ) dθ = ∫_t^{π/4} ln(1/sin θ) f(θ) dθ

and

  ∫_{(θ,r)∈A} f(θ) d(θ,r) = ∫_0^{ln(1/sin t)} ( ∫_{θ∈A_r} f(θ) dθ ) dr
   = ∫_0^{ln √2} ( ∫_{θ∈A_r} f(θ) dθ ) dr + ∫_{ln √2}^{ln(1/sin t)} ( ∫_{θ∈A_r} f(θ) dθ ) dr
   = ln √2 ∫_t^{π/4} f(θ) dθ + ∫_{ln √2}^{ln(1/sin t)} ( ∫_{sin t ≤ sin θ ≤ e^{−r}} f(θ) dθ ) dr,

since t ≤ π/4 implies ln √2 ≤ ln(1/sin t), and when r ≤ ln √2 we have A_r = [t, π/4]. Therefore, by taking s = e^{−r},

  ∫_{(θ,r)∈A} f(θ) d(θ,r) = ln √2 ∫_t^{π/4} f(θ) dθ + ∫_{sin t}^{√2/2} ( ∫_t^{arcsin s} f(θ) dθ ) ds/s.

Finally,

  ∫_t^{π/2} ln(1/sin θ) f(θ) dθ ≤ ln √2 ∫_{π/4}^{π/2} f(θ) dθ + ln √2 ∫_t^{π/4} f(θ) dθ + ∫_{sin t}^{√2/2} ( ∫_t^{arcsin s} f(θ) dθ ) ds/s
   = ln √2 ∫_t^{π/2} f(θ) dθ + ∫_{sin t}^{√2/2} ( ∫_t^{arcsin s} f(θ) dθ ) ds/s
   ≤ ln √2 + ∫_{sin t}^{√2/2} ( ∫_t^{arcsin s} f(θ) dθ ) ds/s,

since ∫_t^{π/2} f(θ) dθ ≤ 1.  □

Lemma 4.6.
Assume N ≥ 5. For all t ∈ [0, π/4], one has

  ∫_0^t f(θ) dθ ≤ min{ 1, (1/(1 − e^{−1/(2σ²)})) ( (1/2)(sin 2t)^N + (sin t)^N/σ^{N+1} ) }.

Proof.
For t ≤ π/4,

  ∫_0^t f(θ) dθ = E_{θ∼f}( 1_{{θ ≤ t}} ) ≤ E_{θ∼f}( E_{x∈B_S(x̄,θ)} 1_{{∢(x,x̄) ≤ t}} )
   ≤ (1/(1 − e^{−1/(2σ²)})) E_{y∼N(x̄,σ²Id)}( 1_{{∢(Ψ(y),x̄) ≤ t}} )
   = (1/(1 − e^{−1/(2σ²)})) Prob_{y∼N(x̄,σ²Id)}{ ∢(Ψ(y),x̄) ≤ t },

for Ψ defined in (11). The first inequality holds because for θ ≤ t, ∢(x,x̄) ≤ θ implies ∢(x,x̄) ≤ t, and the second by Proposition 4.2 applied to F(y) = 1_{{∢(Ψ(y),x̄) ≤ t}}. It is then enough to bound the right-hand expression.

We observe that for 0 ≤ t ≤ π/4, the set K = {y ∈ R^{N+1} : ∢(Ψ(y),x̄) ≤ t} is a pointed cone with vertex at 0, central axis passing through x̄, and angular opening α := 2t. In addition, one can prove, by the cosine theorem, that this cone is included in the union of the pointed cone K₁ with vertex at x̄, central axis passing through 2x̄, and angular opening 2α, with the intersection K ∩ B(x̄,1) (see Figure 1). Hence, the measure of K (with respect to N(x̄,σ²Id)) is bounded by the sum of the measures of K₁ and K ∩ B(x̄,1).

Figure 1: The cones K (shaded) and K₁ (line patterned).

As the vertex x̄ of K₁ coincides with the center of N(x̄,σ²Id), the measure of K₁ with respect to N(x̄,σ²Id) equals the proportion of the volume (in S(x̄,1)) of the intersection of K₁ with S(x̄,1) within this sphere. That is,

  Prob_{x∼N(x̄,σ²Id)}{ x ∈ K₁ } = vol(B_S(x̄,2t)) / O_N,

where, we recall, O_N := vol(S^N). Using (6) we deduce that, for t ∈ [0, π/4],

  Prob_{x∼N(x̄,σ²Id)}{ x ∈ K₁ } ≤ (1/2)(sin 2t)^N.

On the other hand, since K ∩ B(x̄,1) is contained in the intersection of B(0,2) with the nappe of K around x̄, whose solid angle is that of B_S(x̄,t),

  Prob_{x∼N(x̄,σ²Id)}{ x ∈ K ∩ B(x̄,1) } = ∫_{K∩B(x̄,1)} (2πσ²)^{−(N+1)/2} exp( −‖x − x̄‖²/(2σ²) ) dx
   ≤ (2πσ²)^{−(N+1)/2} vol(K ∩ B(x̄,1))
   ≤ (2πσ²)^{−(N+1)/2} (vol(B_S(x̄,t))/O_N) vol(B(0,2))
   ≤_{(6)} (2πσ²)^{−(N+1)/2} ((sin t)^N/2) vol(B(0,2))
   =_{(4),(5)} 2^{(N+1)/2} (sin t)^N / ( Γ((N+1)/2)(N+1) σ^{N+1} )
   ≤ 2^N e^{(N−1)/2} (sin t)^N / ( √π (N−1)^{N/2} (N+1) σ^{N+1} ).

Here we used the well-known lower bound Γ((N+1)/2) ≥ √(2π) ((N−1)/2)^{N/2} e^{−(N−1)/2} (see for instance [1, Eq. 2.14]) for the last inequality. We finish the proof by noting that it can easily be proved by induction, using for instance that N^{N+1} ≥ 2N(N−1)^N, that for all N ≥ 5 we have

  2^N e^{(N−1)/2} / ( √π (N−1)^{N/2} (N+1) ) ≤ 1.

Adding both bounds yields the claim.  □

Lemma 4.7.
Assume N ≥ 5. Then

  E_{θ∼f}( ln(1/sin θ) ) ≤ (1/N) ( 1 + ln( 2^{N−1} + 1/σ^{N+1} ) − ln(1 − e^{−1/(2σ²)}) ).

Proof.
We have, by Lemma 4.5 with t = 0,

  E_{θ∼f}( ln(1/sin θ) ) ≤ ln √2 + ∫_0^{√2/2} ( ∫_0^{arcsin s} f(θ) dθ ) ds/s,

where by Lemma 4.6, since 0 ≤ arcsin s ≤ π/4 for 0 ≤ s ≤ √2/2,

  ∫_0^{arcsin s} f(θ) dθ ≤ min{ 1, (1/(1 − e^{−1/(2σ²)})) ( (1/2)(sin(2 arcsin s))^N + (sin(arcsin s))^N/σ^{N+1} ) }
   ≤ min{ 1, (1/(1 − e^{−1/(2σ²)})) ( 2^{N−1} s^N + s^N/σ^{N+1} ) }
   = min{ 1, ((2^{N−1} + 1/σ^{N+1})/(1 − e^{−1/(2σ²)})) s^N }.

We have

  ((2^{N−1} + 1/σ^{N+1})/(1 − e^{−1/(2σ²)})) s^N ≤ 1 ⟺ s ≤ c(N,σ),

where

  c(N,σ) := ( (1 − e^{−1/(2σ²)}) σ^{N+1} / (2^{N−1} σ^{N+1} + 1) )^{1/N}.

In addition, we observe that c(N,σ) < √2/2, since

  c(N,σ) < √2/2 ⟺ (1 − e^{−1/(2σ²)}) σ^{N+1} / (2^{N−1} σ^{N+1} + 1) < 2^{−N/2} ⟺ 2^{N/2} (1 − e^{−1/(2σ²)}) σ^{N+1} < 2^{N−1} σ^{N+1} + 1,

and the last inequality holds because 2^{N/2} ≤ 2^{N−1} for N ≥ 2. Rewriting c(N,σ)^{−N} = (2^{N−1} + 1/σ^{N+1})/(1 − e^{−1/(2σ²)}), we get

  E_{θ∼f}( ln(1/sin θ) ) ≤ ln √2 + ∫_0^{c(N,σ)} c(N,σ)^{−N} s^{N−1} ds + ∫_{c(N,σ)}^{√2/2} ds/s
   = ln √2 + 1/N + ln(√2/2) − ln c(N,σ)
   = 1/N − ln c(N,σ)
   = (1/N) ( 1 + ln( 2^{N−1} + 1/σ^{N+1} ) − ln(1 − e^{−1/(2σ²)}) ).  □

Proof of Theorem 4.1.
By Proposition 4.4 and Lemma 4.7,

  E_{y∼N(x̄,σ²Id)} ln C(y) ≤ (1 − e^{−1/(2σ²)}) E_{θ∼f}( ln(1/sin θ) ) + ln(Nd) + K
   ≤ ((1 − e^{−1/(2σ²)})/N) ( 1 + ln( 2^{N−1} + 1/σ^{N+1} ) − ln(1 − e^{−1/(2σ²)}) ) + ln(Nd) + K,

with K = 2(ln 2 + 1). We then define

  H(N,d,σ) = ((1 − e^{−1/(2σ²)})/N) ( 1 + ln( 2^{N−1} + 1/σ^{N+1} ) − ln(1 − e^{−1/(2σ²)}) ) + ln(Nd) + 2(ln 2 + 1).   (13)

We now verify that H(N,d,σ) satisfies (SA1) and (SA2):

(SA1) lim_{σ→0} H(N,d,σ) = lim_{σ→0} ( (1/N)( 1 + ln( 2^{N−1} + 1/σ^{N+1} ) ) + ln(Nd) + 2(ln 2 + 1) ) = lim_{σ→0} ( ((N+1)/N) ln(1/σ) + ln(Nd) + O(1) ) = ∞. Note that the difference between the formula in the last limit and the bound in (7), with the dispersion parameter σ replacing sin θ, is negligible.

(SA2) lim_{σ→∞} H(N,d,σ) = ln(Nd) + 2(ln 2 + 1), and we recover the well-known average-case analysis bound for E_{x∈S^N} ln C(x) (see [4] and [1, Theorem 21.1]).  □

The main result of this section is the following.
Theorem 4.8.
Let C be a conic condition number on R^{N+1}, with N ≥ 5, with set of ill-posed inputs Σ. Assume that Σ is contained in a real algebraic hypersurface, given as the zero set of a homogeneous polynomial of degree d. Let x̄ ∈ S^N and σ > 0. Then, there is an explicit bound H(N,d,σ,C(x̄)) –defined in (23) below– such that

  E_{x∼N(x̄,σ²Id)} ln C(x) ≤ H(N,d,σ,C(x̄)).

This bound satisfies (LA0), (LA1) and (LA2).

In order to prove Theorem 4.8 we need the following lemma.
Lemma 4.9.
Assume N ≥ 5. For all t ∈ (0, π/2],

  ∫_t^{π/2} f(θ) dθ ≤ min{ 1, 2πσ√(N+1) / ((1 − e^{−1/(2σ²)}) t) }.

Proof.
The idea is to apply Markov's inequality (e.g. [1, Corollary 2.9]) to the density f to deduce that

  ∫_t^{π/2} f(θ) dθ = Prob_{θ∼f}( θ ≥ t ) ≤ E_{θ∼f}(θ) / t.

Therefore we need to bound E_{θ∼f}(θ). We first prove that

  E_{θ∼f}(θ) ≤ (√2 π / (1 − e^{−1/(2σ²)})) E_{y∼N(x̄,σ²Id)}( ‖Ψ(y) − x̄‖ ),   (14)

where Ψ is given by (11), and then that

  E_{y∼N(x̄,σ²Id)}( ‖Ψ(y) − x̄‖ ) ≤ √2 σ √(N+1).   (15)

This implies

  E_{θ∼f}(θ) ≤ 2πσ√(N+1) / (1 − e^{−1/(2σ²)}).

To show (14) we apply Proposition 4.2 with F(y) = ‖Ψ(y) − x̄‖ and get

  E_{y∼N(x̄,σ²Id)}( ‖Ψ(y) − x̄‖ ) ≥ (1 − e^{−1/(2σ²)}) E_{θ∼f}( E_{x∈B_S(x̄,θ)}( ‖x − x̄‖ ) ).   (16)

We claim that

  E_{x∈B_S(x̄,θ)}( ‖x − x̄‖ ) ≥ θ/(√2 π).   (17)

Indeed, for 0 ≤ α := ∢(x,x̄) ≤ π/2, one has

  (2√2/π) α ≤ ‖x − x̄‖ ≤ α.

Therefore, writing v(θ) := vol(B_S(x̄,θ)),

  E_{x∈B_S(x̄,θ)}( ‖x − x̄‖ ) ≥ (2√2/π) E_{x∈B_S(x̄,θ)} ∢(x,x̄)
   = (2√2/(π v(θ))) ( ∫_{B_S(x̄,θ/2)} ∢(x,x̄) dx + ∫_{B_S(x̄,θ)\B_S(x̄,θ/2)} ∢(x,x̄) dx )
   ≥ (2√2/(π v(θ))) ∫_{B_S(x̄,θ)\B_S(x̄,θ/2)} (θ/2) dx
   = (√2 θ/π) ( (v(θ) − v(θ/2)) / v(θ) ).

Now, for 0 ≤ θ ≤ π/2, we have sin θ = 2 sin(θ/2) cos(θ/2) ≥ √2 sin(θ/2), which implies sin(θ/2) ≤ sin θ/√2. Using (6) twice together with this inequality, we deduce that, for N ≥ 5, v(θ/2) ≤ v(θ)/2, and hence

  (v(θ) − v(θ/2)) / v(θ) ≥ 1/2.

With this,

  E_{x∈B_S(x̄,θ)}( ‖x − x̄‖ ) ≥ θ/(√2 π),

and, by (16),

  E_{y∼N(x̄,σ²Id)}( ‖Ψ(y) − x̄‖ ) ≥ ((1 − e^{−1/(2σ²)})/(√2 π)) E_{θ∼f}(θ),

which shows (14). We now show (15). We let Ψ*(y) be the point closest to x̄ on the line through 0 and y (see Figure 2), and have

  E_{y∼N(x̄,σ²Id)}( ‖Ψ(y) − x̄‖ ) ≤ √2 E_{y∼N(x̄,σ²Id)}( ‖Ψ*(y) − x̄‖ ) ≤ √2 E_{y∼N(x̄,σ²Id)}( ‖y − x̄‖ ) ≤ √2 σ √(N+1),

where the last inequality is a consequence of [1, Prop. 2.10 & Lem. 2.15].

Figure 2: The point Ψ*(y).

This shows (15). Therefore

  E_{θ∼f}(θ) ≤ 2πσ√(N+1) / (1 − e^{−1/(2σ²)}),

as desired, and hence

  ∫_t^{π/2} f(θ) dθ ≤ 2πσ√(N+1) / ((1 − e^{−1/(2σ²)}) t).  □

Proof of Theorem 4.8.
Let t := arcsin( 1/(2C(x̄)) ). Since C(x̄) ≥ 1, we have 1/(2C(x̄)) ≤ 1/2 and therefore t ≤ π/6. For all θ ≤ t and all x ∈ B_S(x̄,θ) we have

  1/C(x) = d_sin(x,Σ) ≥ d_sin(x̄,Σ) − d_sin(x,x̄) ≥ 1/C(x̄) − sin θ ≥ 1/C(x̄) − 1/(2C(x̄)) = 1/(2C(x̄)),

so that ln C(x) ≤ ln(2C(x̄)). We apply Proposition 4.2 to F(y) = ln C(y) and use the previous inequality and the bounds (7) and (8) to obtain

  E_{y∼N(x̄,σ²Id)} ln C(y) = (1 − e^{−1/(2σ²)}) E_{θ∼f}( E_{x∈B_S(x̄,θ)} ln C(x) ) + e^{−1/(2σ²)} E_{x∈S^N_+(x̄)} ln C(x)
   ≤ (1 − e^{−1/(2σ²)}) ( ln(2C(x̄)) ∫_0^t f(θ) dθ + ∫_t^{π/2} ( ln(Nd/sin θ) + K ) f(θ) dθ ) + e^{−1/(2σ²)} ( ln(Nd) + K )
   ≤ ln C(x̄) (1 − e^{−1/(2σ²)}) ∫_0^t f(θ) dθ + (1 − e^{−1/(2σ²)}) ∫_t^{π/2} ln(1/sin θ) f(θ) dθ
    + ln(Nd) ( e^{−1/(2σ²)} + (1 − e^{−1/(2σ²)}) ∫_t^{π/2} f(θ) dθ ) + K,   (18)

since

  (1 − e^{−1/(2σ²)}) ( ln 2 ∫_0^t f(θ) dθ + K ∫_t^{π/2} f(θ) dθ ) + K e^{−1/(2σ²)} ≤ K.

We next bound each of the first three terms in the right-hand side.

Applying Lemma 4.6 and the inequality sin 2t ≤ 2 sin t, we obtain

  (1 − e^{−1/(2σ²)}) ∫_0^t f(θ) dθ ≤ min{ 1 − e^{−1/(2σ²)}, (1/2)(sin 2t)^N + (sin t)^N/σ^{N+1} }
   ≤ min{ 1 − e^{−1/(2σ²)}, 2^{N−1}/(2C(x̄))^N + 1/((2C(x̄))^N σ^{N+1}) }
   = min{ 1 − e^{−1/(2σ²)}, (2C(x̄))^{−N} ( 2^{N−1} + 1/σ^{N+1} ) }.

This bounds the first term in (18) by

  ln C(x̄) min{ 1 − e^{−1/(2σ²)}, (2C(x̄))^{−N} ( 2^{N−1} + 1/σ^{N+1} ) }.   (19)

Second, by Lemma 4.5 (note t ≤ π/6 ≤ π/4), Lemma 4.9, and t ≥ sin t = 1/(2C(x̄)),

  ∫_t^{π/2} ln(1/sin θ) f(θ) dθ ≤ ln √2 + ∫_{1/(2C(x̄))}^{√2/2} ( ∫_t^{arcsin s} f(θ) dθ ) ds/s
   ≤ ln √2 + min{ 1, 2πσ√(N+1)/((1 − e^{−1/(2σ²)}) t) } ∫_{1/(2C(x̄))}^{√2/2} ds/s
   = ln √2 + min{ 1, 4πσ C(x̄)√(N+1)/(1 − e^{−1/(2σ²)}) } ( ln(√2/2) − ln(1/(2C(x̄))) )
   = ln √2 + min{ 1, 4πσ C(x̄)√(N+1)/(1 − e^{−1/(2σ²)}) } ln(√2 C(x̄))
   ≤ min{ 1, 4πσ C(x̄)√(N+1)/(1 − e^{−1/(2σ²)}) } ln C(x̄) + ln 2.   (20)

Also, as t ≥ 0, we have by Lemma 4.7 that

  ∫_t^{π/2} ln(1/sin θ) f(θ) dθ ≤ E_{θ∼f}( ln(1/sin θ) ) ≤ (1/N) ( 1 + ln( 2^{N−1} + 1/σ^{N+1} ) − ln(1 − e^{−1/(2σ²)}) ).

Putting together this inequality and (20), we deduce that the second term in (18) is bounded by

  min{ (1 − e^{−1/(2σ²)}) ln C(x̄), 4πσ C(x̄)√(N+1) ln C(x̄), ((1 − e^{−1/(2σ²)})/N) ( 1 + ln( 2^{N−1} + 1/σ^{N+1} ) − ln(1 − e^{−1/(2σ²)}) ) } + ln 2.   (21)

Finally, using again Lemma 4.9 and t ≥ sin t = 1/(2C(x̄)), we obtain

  e^{−1/(2σ²)} + (1 − e^{−1/(2σ²)}) ∫_t^{π/2} f(θ) dθ ≤ e^{−1/(2σ²)} + min{ 1 − e^{−1/(2σ²)}, 2πσ√(N+1)/t }
   ≤ min{ 1, e^{−1/(2σ²)} + 4π C(x̄)σ√(N+1) },

which bounds the third term in (18) by

  ln(Nd) min{ 1, e^{−1/(2σ²)} + 4π C(x̄)σ√(N+1) }.   (22)

Combining (19), (21) and (22) with the bound in (18), we obtain

  H(N,d,σ,C(x̄)) = ln C(x̄) min{ 1 − e^{−1/(2σ²)}, (2C(x̄))^{−N} ( 2^{N−1} + 1/σ^{N+1} ) }
   + min{ (1 − e^{−1/(2σ²)}) ln C(x̄), 4πσ C(x̄)√(N+1) ln C(x̄), ((1 − e^{−1/(2σ²)})/N) ( 1 + ln( 2^{N−1} + 1/σ^{N+1} ) − ln(1 − e^{−1/(2σ²)}) ) }
   + ln(Nd) min{ 1, e^{−1/(2σ²)} + 4π C(x̄)σ√(N+1) } + K̃,   (23)

where K̃ = ln 2 + K = 3 ln 2 + 2. We now verify that H(N,d,σ,C(x̄)) satisfies (LA0), (LA1) and (LA2).

(LA0) When C(x̄) = ∞ we get

  H_∞(N,d,σ) = ((1 − e^{−1/(2σ²)})/N) ( 1 + ln( 2^{N−1} + 1/σ^{N+1} ) − ln(1 − e^{−1/(2σ²)}) ) + ln(Nd) + O(1),

which is the bound in (13) (with a slightly bigger constant), as required in (LA0).

(LA1) When σ → 0, we have lim_{σ→0} H(N,d,σ,C(x̄)) = ln C(x̄) + K̃, the local complexity at x̄ up to a constant, as required.

(LA2) When σ → ∞, we get lim_{σ→∞} H(N,d,σ,C(x̄)) = ln(Nd) + K̃, and we recover the average-case analysis bound for E_{x∈S^N} ln C(x).  □

References

[1] P. Bürgisser and F. Cucker.
Condition, volume 349 of Grundlehren der mathematischen Wissenschaften. Springer-Verlag, Berlin, 2013.
[2] P. Bürgisser, F. Cucker, and M. Lotz. The probability that a slightly perturbed numerical analysis problem is difficult. Mathematics of Computation, 77:1559–1583, 2008.
[3] J. Demmel. On condition numbers and the distance to the nearest ill-posed problem. Numer. Math., 51:251–289, 1987.
[4] J. Demmel. The probability that a numerical analysis problem is difficult. Math. Comp., 50:449–480, 1988.
[5] C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1:211–218, 1936.
[6] A. Edelman. Eigenvalues and condition numbers of random matrices. SIAM J. Matrix Anal. Applic., 9:543–556, 1988.
[7] D.A. Spielman and S.H. Teng. Smoothed analysis: an attempt to explain the behavior of algorithms in practice. Communications of the ACM, 52(10):76–84, 2009.
[8] A.M. Turing. Rounding-off errors in matrix processes. Quart. J. Mech. Appl. Math., 1:287–308, 1948.
[9] J. von Neumann and H.H. Goldstine. Numerical inverting of matrices of high order. Bull. Amer. Math. Soc., 53:1021–1099, 1947.
[10] M. Wschebor. Smoothed analysis of κ(A). J. Complexity, 20:97–107, 2004.