Conditional regression for single-index models
ALESSANDRO LANTERI, MAURO MAGGIONI and STEFANO VIGOGNA*

DEMM, Università degli Studi di Milano, Milano, Italy. E-mail: [email protected]
Department of Mathematics and Department of Applied Mathematics & Statistics, Johns Hopkins University, Baltimore, USA. E-mail: [email protected]
MaLGa Center, DIBRIS, Università degli Studi di Genova, Genova, Italy. E-mail: [email protected]
Collegio Carlo Alberto, Torino, Italy.
The single-index model is a statistical model for intrinsic regression where responses are assumed to depend on a single yet unknown linear combination of the predictors, allowing the regression function to be expressed as E[Y | X] = f(⟨v, X⟩) for some unknown index vector v and link function f. Conditional methods provide a simple and effective approach to estimate v by averaging moments of X conditioned on Y, but depend on parameters whose optimal choice is unknown, and do not provide generalization bounds on f. In this paper we propose a new conditional method converging at √n rate under an explicit parameter characterization. Moreover, we prove that polynomial partitioning estimates achieve the 1-dimensional min-max rate for regression of Hölder functions when combined with any √n-convergent index estimator. Overall this yields an estimator for dimension reduction and regression of single-index models that attains statistical optimality in quasilinear time.

MSC2020 subject classifications: primary 62G05; secondary 62G08; 62H99
Keywords:
Single-index model; dimension reduction; nonparametric regression; finite sample bounds
1. Introduction
Consider the standard regression problem of estimating a function F : R^d → R from n samples {(X_i, Y_i)}_{i=1}^n, where the X_i's are independent realizations of a predictor variable X ∈ R^d,

Y_i = F(X_i) + ζ_i ,   i = 1, ..., n,   (1)

and the ζ_i's are realizations, independent among themselves and of the X_i's, of a random variable ζ modeling noise. Under rather general assumptions on ζ and the distribution ρ of X, if we only know that F is s-Hölder regular (and, say, compactly supported), it is well known that the min-max nonparametric rate for estimating F in L²(ρ) is n^{−s/(2s+d)} [25]. This is an instance of the curse of dimensionality: the rate slows down dramatically as the dimension d increases. Many regression models have been introduced throughout the decades to circumvent this phenomenon; see, for example, the classical reference [47]. When the covariates are intrinsically low-dimensional, concentrating on an unknown low-dimensional set, several estimators have been proved to converge at rates that are optimal with respect to the intrinsic dimension [3, 36, 37, 43, 44]. In other models, the domain may be high-dimensional, but the function itself is assumed to depend only on a small number of features. A classical case is the so-called single-index model, where F has the structure

F(x) = f(⟨v, x⟩)   (2)

for some index vector v ∈ R^d (which we may assume unitary without loss of generality) and link function f : R → R. In this context one may consider different estimation problems, depending on whether f is known (e.g. in logistic regression) or both f and v are unknown. We are interested in the latter case.
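To fix ideas, the following minimal Python sketch (ours, not from the paper) draws samples from model (1) with the single-index structure (2); the dimension, the link function f = tanh and the noise level are arbitrary illustrative choices.

```python
# Sample from the single-index model (1)-(2): Y_i = f(<v, X_i>) + zeta_i.
# All concrete choices below (n, d, f, noise level) are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 10

v = rng.standard_normal(d)
v /= np.linalg.norm(v)               # unit index vector v

f = np.tanh                          # hypothetical link function
X = rng.standard_normal((n, d))      # isotropic (already standardized) predictors
zeta = 0.1 * rng.standard_normal(n)  # sub-Gaussian noise
Y = f(X @ v) + zeta
```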
Clearly, if v were known we could learn f by solving a 1-dimensional regression problem, which may be done efficiently for large classes of functions f. So the question is: what is the price to pay for not knowing v? It was conjectured in [47] that the min-max rate for regression of single-index models is n^{−s/(2s+1)}, that is, the min-max rate for univariate functions: no statistical cost would have to be paid. This rate was proved for pointwise convergence with kernel estimators in [28, Theorem 3.3] and [29, Section 2.5], observing that the index can be learned at the parametric rate n^{−1/2}. Based on these results or on similar heuristics, a wide part of the literature focused on index estimation, setting aside the regression problem. From this perspective, the main point is that the estimation of the index v can be carried out at parametric rate in spite of the unknown nonparametric nonlinearity f. A proof of Stone's conjecture (for convergence in L²(ρ)) can be found in [25, Corollary 22.1].

Granted that the estimation of the index does not entail additional statistical costs (in terms of regression rates), a different but no less important problem is determining the computational cost to implement a statistically optimal estimator for the single-index model. The rate in [25, Corollary 22.1] is obtained by a least squares joint minimization over v and f, but no executable algorithm is provided. In [23] it was proposed to aggregate local polynomial estimators on a lattice of the unit sphere, yielding an adaptive, universal min-max estimator, although at the expense of a possibly exponential number of operations Ω(n^{(d−1)/2}). While a heuristic faster algorithm is therein also proposed, its statistical effectiveness was not proved.

Several other methods for the estimation of v or f have been developed over the years. A first category includes semiparametric methods based on maximum likelihood estimation [32, 27, 17, 18, 19, 8, 9]. M-estimators produce √n-consistent index estimates under general assumptions, but their implementation is cumbersome and computationally demanding, in that it depends on sensitive bandwidth selections for kernel smoothing and relies on high-dimensional joint optimization. An attempt at avoiding the data sparsity problem was made by [13], which proposed a fixed-point iterative scheme only involving 1-dimensional nonparametric smoothers. Direct methods such as average derivative estimation (ADE [46, 28]) estimate the index vector exploiting its proportionality with the derivative of the regression function. Early implementations of this idea suffer from the curse of dimensionality due to kernel estimation of the gradient, while later iterative modifications [31] provide √n-consistency under mild assumptions, yet without eliminating the computational overhead. More recently, Isotron [34, 33] and SILO [24] achieved linear complexity, but the proven regression rate, even if independent of d, is not min-max (albeit SILO focuses on the n ≪ d regime, rather than the limit n → ∞ as here and in most past work).
In a different yet related direction, the even more recent [1] showed that convex neural networks can adapt to a large variety of statistical models, including single-index; however, they do not match the optimal learning rate (even for the single-index case), and at the same time do not have associated fast algorithms.

Meanwhile, a line of research addressed sufficient dimension reduction [39] in the more general multi-index model (or a slight extension thereof), where F depends on multiple k < d index directions spanning an unknown index subspace. Along this thread we can find nonparametric methods extending ADE to multiple indices, such as structural adaptation ([30, 14]), the outer product of gradients (OPG [52]) and the minimum average variance estimation (MAVE [52, 51]). Alternatively, conditional methods derive their estimates from statistics of the conditional distribution of the explanatory variable X given the response variable Y. Prominent examples are sliced inverse regression (SIR [21, 42]), sliced average variance estimation (SAVE [10]), simple contour regression (SCR [41]) and its generalizations (e.g. GCR [41], DR [40]). Conditional methods are appealing for several reasons. Compared to nonparametric methods, their implementation is straightforward, consisting of noniterative computation of "sliced" empirical moments and having only one "slicing" parameter to tune. Moreover, they are computationally efficient and simple to analyze, enjoying √n-consistency and, in most cases, complexity (quasi)linear in the number of samples.
Table 1. Proven rate (up to log factors) and computational cost of several methods for index estimation and/or regression in single-index models, together with salient assumptions on the model.
Method | rate v̂ | rate f̂ | cost v̂ | cost f̂ | assumptions on X | on f | on ζ
SIR [42] | n^{−1/2} | − | d²n log n | − | linear E[X | v^T X] | N/A | N/A
SAVE [10] | n^{−1/2} | − | d²n log n | − | linear E[X | v^T X], const Cov[X | v^T X] | N/A | N/A
SCR [41] | n^{−1/2} | − | d²n log n | − | linear E[X | v^T X], const Cov[X | v^T X] | stochastically monotone | decreasing density of ζ − ζ̃
DR [40] | n^{−1/2} | − | d²n log n | − | linear E[X | v^T X], const Cov[X | v^T X] | N/A | N/A
ADE [31] | n^{−1/2} | − | d²n log n | − | C¹ positive density | C² | Gaussian
rMAVE [51] | n^{−1/2} | N/A | d²n per iteration | − | v^T X has smooth density, finite moment conditions | C² | finite moment conditions
Aggregation [23] | − | n^{−s/(2s+1)} | − | Ω(n^{(d−1)/2}) | compactly supported, lower bounded density | C^s | σ(X) N(0,1)
SlIsotron [33] | N/A | (d log n / n)^{1/3} | dn² log n | dn² log n | bounded | monotone, Lipschitz | bounded
SILO [24] | n^{−1/2} | n^{−1/4} | dn | n log n | Gaussian | monotone, Lipschitz | bounded
SVR | n^{−1/2} | n^{−s/(2s+1)} | d²n log n | n log n | linear E[X | v^T X], Var[w^T X | v^T X] ≳ 1 | coarsely monotone, C^s | sub-Gaussian

In this work we introduce a new estimator and a corresponding algorithm, called Smallest Vector Regression (SVR), that are statistically optimal and computationally efficient. Our dimension reduction technique falls in the category of conditional methods. Unlike existing studies of similar approaches, we are able to provide a characterization for the parameter selection, and to bound both the index estimation and the regression errors. Since regression is performed using standard piecewise polynomial estimates on the projected samples, after and independently of the index estimation step, our regression
bounds hold conditioned on any index estimation method of sufficient accuracy. Our analysis yields convergence by proving finite-sample bounds in high probability. The resulting statements are stronger compared to the ones in the available literature on conditional methods, where typically only asymptotic convergence, at most, is established. As a side note, SVR has been empirically tested with success also in the multi-index model, but our analysis, and therefore our exposition, will be restricted to the single-index case. In summary, the contributions of this work are:

1. We introduce a new conditional regression method that combines accuracy, robustness and low computational cost. This method is multiscale and sheds light on parameter choices that are important in theory and practice, and are mostly left unaddressed in other techniques.
2. We prove strong, finite-sample convergence bounds, both in probability and in expectation, for the index estimate of our conditional method.
3. We prove that polynomial partitioning estimates are Hölder continuous with high probability with respect to the index estimation error. This allows us to bridge the gap between a good estimator of the index subspace and the performance of regression on the estimated subspace.
4. We prove that all √n-consistent index estimation methods lead to the min-max 1-dimensional rate of convergence when combined with polynomial partitioning estimates.
5. Using the above, we provide an algorithm for the estimation of the single-index model with theoretical guarantees of optimal convergence in quasilinear time.

The paper is organized as follows. In Section 2 we review several conditional regression methods, and introduce our new estimator; in Section 3 we analyze the convergence of our method and establish min-max rates for regression conditioned on any sufficiently accurate index estimate; in Section 4 we conduct several numerical experiments, both validating the theory and exploring numerically aspects of various techniques that are not covered by theoretical results; in the Appendix we collect additional proofs and technical results.

Notation
C, c : positive absolute constants
a ≲ b : a ≤ Cb for some C
a ≍ b : a ≲ b and b ≲ a
⟨u, v⟩ : inner product of vectors u and v
‖u‖ : Euclidean norm of vector u
B(x, r) : Euclidean ball of center x and radius r
‖A‖ : spectral norm of matrix A
λ_i(A) : i-th largest eigenvalue of matrix A
|I| : Lebesgue measure of interval I
|S| : number of samples in set S
1{E} : indicator function of event E
X | Y : r.v. X conditioned on r.v. Y
2. Conditional regression methods
We consider the regression problem as in (1), within the single-index model, with the definition and notation as in (2). When f is at least Lipschitz, (2) implies ∇F(x) ∈ span{v} for a.e. x; this is the reason why we may refer to v as the gradient direction. Given n independent copies (X_i, Y_i), i = 1, ..., n, of the random pair (X, Y), we will construct estimators v̂ and f̂, and derive separate and compound non-asymptotic error bounds in probability and expectation. Our method is conditional in two ways: 1) the estimator v̂ is a statistic of the conditional distributions of the X_i's given the Y_i's; 2) the estimator f̂ is conditioned on the estimate v̂. Several conditional methods for step 1) have been previously introduced, see e.g. [42, 10, 41, 40]. Our error bounds for step 2) are independent of the particular method used in 1), only requiring a minimal non-asymptotic convergence rate. For these reasons, one may as well consider other existing or new methods for 1), even non conditional, and check for each one the convergence rate needed to pair it with 2).

The common idea of all conditional methods is to compute statistics of the predictor X conditioned on the response Y. Conditioning on Y, one introduces anisotropy in the distribution of X, forcing it to reveal the index structure through its moments, be they means (SIR) or variances (SAVE, SCR, DR). Before introducing SVR, we will review the two methods that have the strongest connections with ours, namely SIR and SAVE. For consistency with SVR, we will present SIR and SAVE through a particular multiscale implementation. This will allow us to progressively define the objects SVR is built upon, facilitating the comparison.

Note. All the algorithms we consider include a preprocessing step where data are standardized to have mean 0 and isotropic covariance. Thus, when illustrating each method, we will assume such standardization.

Sliced Inverse Regression [42] (SIR) estimates the index vector by a principal component analysis of a sliced empirical approximation of the inverse regression curve E[X | Y]. Samples on this curve are obtained by slicing the range of the function and computing sample means of the corresponding approximate level sets. In our version of SIR, we take dyadic partitions {S_{l,h}}_{h=1}^{2^l}, l ∈ Z, of a subinterval of the range of Y, where each S_{l,h} is an interval of length ≍ 2^{−l}. After calculating the sample mean for each slice,

μ̂_{l,h} = (1/|S_{l,h}|) Σ_i X_i 1{Y_i ∈ S_{l,h}} ,   h = 1, ..., 2^l,

SIR outputs v̂_l as the eigenvector of largest eigenvalue of the weighted covariance matrix

M̂_l = Σ_h μ̂_{l,h} μ̂_{l,h}^T |S_{l,h}| / n .

Note that the population limits of μ̂_{l,h} and M̂_l are, respectively,

μ_{l,h} = E[X | Y ∈ S_{l,h}] ,   M_l = Σ_h μ_{l,h} μ_{l,h}^T P{Y ∈ S_{l,h}} .

Sliced Average Variance Estimation [10] (SAVE) generalizes SIR to second order moments. After slicing the range of Y and computing the centers μ̂_{l,h}, it goes further and constructs the sample covariance on each slice:

Σ̂_{l,h} = (1/|S_{l,h}|) Σ_i (X_i − μ̂_{l,h})(X_i − μ̂_{l,h})^T 1{Y_i ∈ S_{l,h}} .
Then, it averages the Σ̂_{l,h}'s and defines v̂_l as the eigenvector of largest eigenvalue of

Σ̂_l = Σ_h (I − Σ̂_{l,h})² |S_{l,h}| / n .

The matrices Σ̂_{l,h} and Σ̂_l are empirical estimates of

Σ_{l,h} = Cov[X | Y ∈ S_{l,h}] ,   Σ_l = Σ_h (I − Σ_{l,h})² P{Y ∈ S_{l,h}} .

Smallest Vector Regression (SVR) is the new method we propose here. We perform a local principal component analysis on each approximate level set obtained by multiscale slices of Y. Because of the special structure (2), we expect each (approximate) level set to be narrow along v and spread out along the orthogonal directions. Thus, the smallest principal component should approximate v. Once we have an estimate for v, we can project down the d-dimensional samples and perform nonparametric regression of the 1-dimensional function f. The method consists of the following steps (a code sketch of step 1 is given at the end of this section):

1.a) Construct a multiscale family of dyadic partitions of a subinterval S of the range of Y: {S_{l,h}}_{h=1}^{2^l}, l ∈ Z. For each l, {S_{l,h}}_h is a partition of S with |S_{l,h}| = |S| 2^{−l}.

1.b) Let H_l be the set of h's such that |S_{l,h}| ≥ 2^{−l} n. For h ∈ H_l, let v̂_{l,h} be the eigenvector corresponding to the smallest eigenvalue of Σ̂_{l,h}.

1.c) Compute the eigenvector v̂_l corresponding to the largest eigenvalue of

V̂_l = (1 / Σ_{h∈H_l} |S_{l,h}|) Σ_{h∈H_l} v̂_{l,h} v̂_{l,h}^T |S_{l,h}| .
2) Regress f using a dyadic polynomial estimator f̂_{j|v̂_l} at scale j ≥ 0 on the samples (⟨v̂_l, X_i⟩, Y_i), i = 1, ..., n (more details in Section 2.4). Return F̂_{j|v̂_l}(x) = f̂_{j|v̂_l}(⟨v̂_l, x⟩).

While SVR shares step 1.a) with SIR, it differs from SIR in step 1.b), where it takes conditional (co)variance statistics in place of conditional means, and in step 1.c), where it averages smallest-variance directions rather than means. We may regard SAVE and SVR as two different modifications of SIR to higher order statistics, which allows in general for better and more robust estimates (see Section 4.1). The fundamental difference between SVR and SAVE is that SVR computes local estimates of the index vector, which it then aggregates into a global estimate, while SAVE first aggregates local information and then computes a single global estimate.

In step 2) we use piecewise polynomial estimators in the spirit of [5, 4]: these techniques are based on partitioning the domain (here, in a multiscale fashion), and constructing a local polynomial on each element of the partition by solving a least squares fitting problem. A global estimator is then obtained by patching the local polynomials together. Given v̂, our step 2) consists of:

2.a) Construct a multiscale family of dyadic partitions of an interval I: {I_{j,k}}_{k∈K_j}, j ∈ Z. For each j, {I_{j,k}}_{k∈K_j} is a partition of I with |I_{j,k}| = |I| 2^{−j}.

2.b) For each I_{j,k}, compute the best fitting polynomial

f̂_{j,k|v̂} = argmin_{deg(p)≤m} Σ_i |Y_i − p(⟨v̂, X_i⟩)|² 1{⟨v̂, X_i⟩ ∈ I_{j,k}} .

2.c) Patch the f̂_{j,k|v̂} together over the partition {I_{j,k}}_{k∈K_j}:

f̂_{j|v̂}(t) = Σ_{k∈K_j} f̂_{j,k|v̂}(t) 1{t ∈ I_{j,k}} .

The final estimator of F at scale j and conditioned on v̂ is given by F̂_{j|v̂}(x) = f̂_{j|v̂}(⟨v̂, x⟩) (a code sketch of this step is given after the algorithm listing below). In SVR, step 2) is carried out on v̂ = v̂_l, yielding for each l a multiscale family of local polynomials f̂_{j,k|l} and global estimators f̂_{j|l}, F̂_{j|l}. However, we will prove results on the performance of 2) also when v̂ is the output of any estimator with n^{−1/2} probabilistic convergence rate (Theorem 2).

Note that for SVR, but also for SIR and SAVE, the final estimator F̂_{j|l} depends on two scale parameters, l and j, which may be chosen independently. Our analysis yields optimal choices for these two scale parameters; the scale 2^{−l} at which the direction v is estimated will not be finer than the noise level, while a possibly finer partition with j > l may be selected to improve the polynomial fit, allowing the estimator F̂_{j|l} to de-noise its predictions, provided that enough training samples are available (see Figure 1).

We report below the complete sequence of steps run by SVR. The time complexity of the algorithm is shown in Table 2. Note that 2.c) has only an evaluation cost, i.e. f̂_{j|v̂} does not need to be constructed, but only evaluated. The intervals S and I can be chosen such that they contain all or most of the observed points. We will discuss in Section 3 possible choices for S and I.
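To make step 1) concrete, here is a minimal Python sketch (our illustration, not the authors' reference code) of the index estimation: slice the range of Y dyadically at scale l, take the smallest-variance direction of each sufficiently populated slice, and aggregate as in 1.c). The function and variable names are ours, and the data are assumed to be standardized.

```python
import numpy as np

def svr_index(X, Y, l):
    """SVR step 1: estimate the index vector at slicing scale l.

    X: (n, d) standardized predictors; Y: (n,) responses.
    Returns a unit vector, defined up to sign.
    """
    n, d = X.shape
    edges = np.linspace(Y.min(), Y.max(), 2**l + 1)   # dyadic partition {S_{l,h}}
    slice_of = np.clip(np.digitize(Y, edges[1:-1]), 0, 2**l - 1)
    V = np.zeros((d, d))                              # accumulator for V_l
    total = 0
    for h in range(2**l):
        mask = slice_of == h
        n_h = int(mask.sum())
        if n_h < max(2, 2**(-l) * n):                 # keep only slices in H_l
            continue
        Sigma_h = np.cov(X[mask].T, bias=True)        # slice covariance Sigma_{l,h}
        _, eigvecs = np.linalg.eigh(Sigma_h)          # eigenvalues in ascending order
        v_h = eigvecs[:, 0]                           # smallest-variance direction v_{l,h}
        V += n_h * np.outer(v_h, v_h)
        total += n_h
    _, eigvecs = np.linalg.eigh(V / total)
    return eigvecs[:, -1]                             # top eigenvector of V_l
```

With synthetic data as in the introduction, |⟨svr_index(X, Y, l=4), v⟩| should approach 1 as n grows; the sign ambiguity is harmless, since it can be absorbed into f.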
3. Analysis of convergence
To carry out our analysis we shall make several assumptions on the distributions of X, Y and ζ:

(X) X has sub-Gaussian distribution with variance proxy R².
(Y) Y has sub-Gaussian distribution with variance proxy R².

(Z) ζ is sub-Gaussian with variance proxy σ².

(X), (Y) and (Z) are standard assumptions in regression analysis tout court. Note that we start from standardized data, that is, we will not be tracking the (negligible) error resulting from standardization based on data samples. The following is instead typical of single-index models:

(LCM) E[X | ⟨v, X⟩] = v ⟨v, X⟩.

(LCM) is commonly referred to as the linear conditional mean assumption [42, Condition 3.1], because (for centralized X) it is equivalent to requiring E[X | ⟨v, X⟩] to be linear in ⟨v, X⟩ [39, Lemma 1.1]. Every spherical distribution, hence every elliptical distribution after standardization, satisfies (LCM) for every v [7, Corollary 5], and conversely [22]. While it does introduce some symmetry, it is less restrictive than it may seem. It has been shown to hold approximately in high dimension, where most low-dimensional projections are nearly normal [20, 26]. (LCM) is introduced to ensure that v̂ is an unbiased estimate of v [11, Theorem 1]. As an alternative to (LCM), one could rely on the stronger assumption

(SMD) for every w ∈ span{v}^⊥, ⟨w, X⟩ has symmetric distribution (i.e. has the same distribution as −⟨w, X⟩).

Figure 1: Local linear estimator (red) at different scales l, to regress the function f (green) from noisy samples (black). The horizontal axis is ⟨v, x⟩, while of course the estimator f̂_{j|l} is a function of ⟨v̂, x⟩ and may appear multi-valued in ⟨v, x⟩. For small l (top row) the error in the estimation of the index v is large, leading to poor regression estimates regardless of the regression scale j. For larger l (bottom row) a good accuracy for the index vector v is achieved, and the estimator is able to approximate the function even below the noise level and the non-monotonicity scale (e.g. for j = 6); overfitting occurs for j too large (e.g. j = 12 in this case). [The figure consists of six scatter-plot panels, (l, j) ∈ {1, 5} × {1, 6, 12}.]
Algorithm: SVR

Input: samples $\{(X_i,Y_i)\}_{i=1}^n \subset \mathbb{R}^d \times \mathbb{R}$, intervals $S$, $I$, polynomial degree $m \in \mathbb{N}$.
Output: $\hat v_l$ estimate of $v$, $\hat f_{j|l}$ estimate of $f$.

1. standardize the data to mean $0$ and covariance $I_d$;
2. construct $\{S_{l,h}\}_{l,h}$, dyadic decomposition of $S$, and let $n_{l,h} = \#\{i : Y_i \in S_{l,h}\}$;
3. compute $\hat v_{l,h}$, the eigenvector of $\frac{1}{n_{l,h}}\sum_i X_i X_i^T\,\mathbf{1}\{Y_i \in S_{l,h}\}$ corresponding to the smallest eigenvalue, for all $h \in \mathcal{H}_l = \{h : n_{l,h} \geq 2^{-l}n\}$;
4. compute $\hat v_l$, the eigenvector of $\frac{1}{\sum_{h\in\mathcal{H}_l} n_{l,h}}\sum_{h\in\mathcal{H}_l} \hat v_{l,h}\hat v_{l,h}^T\, n_{l,h}$ corresponding to the largest eigenvalue;
5. construct $\{I_{j,k}\}_{j,k}$, dyadic decomposition of $I$;
6. compute $\hat f_{j,k|l} = \arg\min_{\deg(p)\leq m} \sum_i |Y_i - p(\langle \hat v_l, X_i\rangle)|^2\, \mathbf{1}\{\langle \hat v_l, X_i\rangle \in I_{j,k}\}$;
7. define $\hat f_{j|l}(t) = \sum_k \hat f_{j,k|l}(t)\, \mathbf{1}\{t \in I_{j,k}\}$.
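For concreteness, the following is a minimal NumPy sketch of the two stages of SVR described above: local smallest-eigenvector estimation on dyadic slices of the response, aggregation into a global direction, and piecewise polynomial regression along the estimated index. It is an illustration under simplifying assumptions (equal-width slices over the observed range, data assumed already standardized, no sample discarding), not the authors' reference implementation; the names `svr_index` and `svr_regress` are ours.

```python
import numpy as np

def svr_index(X, Y, l):
    """Estimate the index vector at scale l: slice Y into 2^l dyadic bins,
    take the smallest-eigenvalue eigenvector of each local second moment,
    then aggregate and take the largest-eigenvalue eigenvector."""
    n, d = X.shape
    edges = np.linspace(Y.min(), Y.max(), 2 ** l + 1)
    M = np.zeros((d, d))
    total = 0
    for h in range(2 ** l):
        in_slice = (Y >= edges[h]) & (Y <= edges[h + 1])
        n_lh = in_slice.sum()
        # keep only populated slices (cf. the threshold n_{l,h} >= 2^{-l} n);
        # the extra d + 1 is a stability heuristic of this sketch
        if n_lh < max(2.0 ** (-l) * n, d + 1):
            continue
        S = X[in_slice].T @ X[in_slice] / n_lh      # local second moment
        vals, vecs = np.linalg.eigh(S)              # ascending eigenvalues
        v_lh = vecs[:, 0]                           # smallest-eigenvalue eigenvector
        M += n_lh * np.outer(v_lh, v_lh)
        total += n_lh
    vals, vecs = np.linalg.eigh(M / total)
    return vecs[:, -1]                              # largest-eigenvalue eigenvector

def svr_regress(X, Y, v_hat, j, m=1):
    """Piecewise degree-m polynomial fit of Y against <v_hat, X>
    over 2^j dyadic bins; returns a callable estimate of f."""
    T = X @ v_hat
    edges = np.linspace(T.min(), T.max(), 2 ** j + 1)
    coefs = {}
    for k in range(2 ** j):
        in_bin = (T >= edges[k]) & (T <= edges[k + 1])
        if in_bin.sum() > m:                        # need enough points to fit
            coefs[k] = np.polyfit(T[in_bin], Y[in_bin], deg=m)
    def f_hat(t):
        t = np.atleast_1d(t)
        ks = np.clip(np.searchsorted(edges, t) - 1, 0, 2 ** j - 1)
        return np.array([np.polyval(coefs[k], ti) if k in coefs else 0.0
                         for k, ti in zip(ks, t)])
    return f_hat
```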
Table 2. Computational cost breakdown for SVR.

task                                                computational cost
standardization                                     $O(d^2 n)$
dyadic decomposition of $S$                         $O(n \log n)$
local eigenvector computations ($\hat v_{l,h}$)     $O(d^2 n \log n)$
global eigenvector computations ($\hat v_l$)        $O(d^2 n \log n)$
dyadic decomposition of $I$                         $O(n \log n)$
$m$-order polynomial regression                     $O(m^2 n \log n)$
total                                               $O((d^2 + m^2)\, n \log n)$

The restriction (SMD) to symmetric marginal distributions is purely technical, and we impose it only in order to apply standard Bernstein inequalities for bounded variables [48, Lemma 2.2.9]. Since $X$ and $Y$ are in general unbounded, we will condition the statistics of interest on suitable balls of constant radius (see Section 3.1). Such conditioning would in general break (LCM), but not (SMD). Using Bernstein concentration for bounded variables will allow us to disentangle and better analyze the role of the scale parameter $l$ in our bounds. On the other hand, similar though less explicit bounds could be obtained using directly Bernstein inequalities for sub-exponential variables [48, Lemma 2.2.11], avoiding the conditioning and hence relaxing (SMD) to the weaker (LCM). While in our bounds the scale parameter is encoded in an explicit variance term, in bounds resulting from sub-exponential Bernstein inequalities such a term would be replaced by a conditional sub-Gaussian norm, which is less interpretable and much more difficult to characterize. Since this all boils down to a technical distinction, we will not stress (SMD) versus (LCM) any further. In addition to (LCM) or (SMD), second order methods usually require the so-called constant conditional variance assumption [10, p. 2117]:

(CCV) $\operatorname{Cov}[X \mid \langle v,X\rangle]$ is nonrandom.

Assuming (X) and (LCM), (CCV) is equivalent to $\operatorname{Cov}[X \mid \langle v,X\rangle] = I_d - vv^T$ almost surely [39, Corollary 5.1]. (CCV) is true for the normal distribution [39, Proposition 5.1], and again approximately true in high dimension [20, 26]. Some care is required when assuming both (LCM) and (CCV): imposing (LCM) for every $v$ is equivalent to assuming spherical symmetry [22], and the only spherical distribution satisfying (CCV) is the normal distribution [35, Theorem 7]. For this reason we will introduce a relaxation of (CCV) in the next subsection.

We present separately bounds on the estimation of $v$ in the next subsection, and on the regression of $f$ in subsection 3.2. Our main result, Theorem 2, will give in particular a near-optimal high probability bound on the SVR estimator of $F$, in the form
$$\big(\mathbb{E}_X[\,|\hat F(X) - F(X)|^2\,]\big)^{1/2} \leq K\,(d\sqrt{\log d} + d^{s})\,(\log n)^{s/2}\,\Big(\frac{\log n}{n}\Big)^{\frac{s}{2s+1}}$$
for $F \in C^s$, $s \in [1/2, 2]$, with $K$ a constant independent of $n$ and $d$.

Suppose (X), (Y) and (SMD) hold true. In the following, we will condition the statistics $\mu_{l,h}$ and $\Sigma_{l,h}$ on
$$\|X\| \leq C_X\sqrt d\,R, \qquad |Y| \leq C_Y R \quad (3)$$
for some constants $C_X, C_Y$. Since our samples $(X_i, Y_i)$ are not drawn from this conditional distribution, we implement SVR discarding the samples which do not verify (3). In doing so, we only reject an arbitrarily small constant fraction of the $X_i$'s and $Y_i$'s (as $C_X, C_Y$ increase) with probability higher than $1 - e^{-cn}$, thanks to assumptions (X) and (Y) and Lemma B.3, while not invalidating assumption (SMD). Moreover, the accepted samples are still independent conditioned on verifying (3). The number of these samples is random, but larger than $n/C$ for some $C > 1$ with probability higher than $p = 1 - e^{-cn}$.
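In practice, the conditioning in (3) amounts to a single filtering pass over the data before running SVR; a minimal sketch, where the constants $C_X = C_Y = 4$ are arbitrary illustrative choices:

```python
import numpy as np

def truncate_samples(X, Y, C_X=4.0, C_Y=4.0, R=1.0):
    """Discard samples violating (3): ||X_i|| <= C_X sqrt(d) R and |Y_i| <= C_Y R.
    For sub-Gaussian X and Y, only a small constant fraction is rejected
    with overwhelming probability."""
    d = X.shape[1]
    keep = (np.linalg.norm(X, axis=1) <= C_X * np.sqrt(d) * R) \
           & (np.abs(Y) <= C_Y * R)
    return X[keep], Y[keep]
```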
In the process of computing the bounds, the constant $C$ will be absorbed by other constants; since the probability $p$ is higher than all the other probabilities we will be computing, it will be absorbed by them. As we mentioned in Section 3, similar but less interpretable bounds for SVR can be proved for unbounded distributions, and thus without discarding points. In accordance with (3), we pick the interval $S$ in SVR as $S = [-C_Y R, C_Y R]$. Furthermore, we will be assuming a lower bounded conditional variance on the distribution conditioned on (3):

(LCV) There is $\alpha \geq 1$ such that $\operatorname{Var}[\langle w,X\rangle \mid \langle v,X\rangle, (3)] \geq R^2/\alpha$ almost surely for all $w \in \operatorname{span}\{v\}^{\perp} \cap \mathbb S^{d-1}$.

This assumption is a relaxation of the standard (CCV) we mentioned in Section 3. Besides the distributional assumptions discussed so far, we introduce the following functional property:

($\Omega$) There are $\omega \geq 0$ and $\ell > 0$ such that, for every subinterval $T \subset S$ with $|T| \geq \omega$, $|[\min f^{-1}(T), \max f^{-1}(T)]| \leq |T|/\ell$.

Assumption ($\Omega$) may be regarded as a large scale sub-Lipschitz property. Note that, if $f$ is bi-Lipschitz, then ($\Omega$) is satisfied with $\omega = 0$. However, ($\Omega$) for $\omega > 0$ does not imply that $f$ is monotone; it relaxes monotonicity to monotonicity "at scales larger than $\omega$".

We may now state the main result for the SVR estimator of $v$:

Theorem 1 (SVR). Suppose (X), (Y), (Z), ($\Omega$), (SMD) and (LCV) hold true. Let $l$ be such that $2^{-l} \lesssim \ell/\sqrt\alpha$ and $|S_{l,h}| \gtrsim \max\{\sigma, \omega\}$, $h = 1,\dots,2^l$. Then, for $n$ large enough so that $n/\sqrt{\log n} \gtrsim (t + l + \log d)\,2^l$, we have

(a) $\|\hat v_l - v\| \lesssim \alpha\,\ell^{-1}\,\sqrt{t + l + \log d}\;2^{-l/2}\,\sqrt{\dfrac{d}{n/\sqrt{\log n}}}$ with probability higher than $1 - e^{-t}$.

Moreover, if $\dfrac{n}{\log n\,\sqrt{\log n}} \gtrsim \alpha^2\ell^{-2}\,d\,2^l$, then

(b) $\mathbb E[\|\hat v_l - v\|^2] \lesssim \alpha^2\ell^{-2}\,(l + \log d)\,2^{-l}\,\dfrac{d}{n/\sqrt{\log n}}$.

If, furthermore, $|\zeta| \leq \sigma$ a.s., then (a) and (b) hold with $n/\sqrt{\log n}$ replaced by $n$.

Theorem 1 not only proves convergence for SVR, but also shows that finer scales give more accurate estimates, provided the number of local samples $n_{l,h}$ is not too small and we stay above the critical scales $\sigma$ and $\omega$, representing the noise and the non-monotonicity levels, respectively. Without assumption (LCM), both SIR and SVR provide biased estimates of the index vector; it is not known whether such bias is removable. Nevertheless, Theorem 1 suggests that the estimation error of SVR could be driven to $0$ by increasing $l$, only limited by the constraint of keeping the slice size $|S_{l,h}|$ larger than $\max\{\sigma, \omega\}$. On the other hand, for distributions not satisfying the assumptions above, the inverse regression curve can deviate considerably from the direction $v$, regardless of the size of the noise (see Figure 3). In SVR, assuming for a moment monotonicity ($\omega = 0$) and zero noise ($\sigma = 0$), choosing the scale parameter $l$ according to the lower bound on $n$ yields a $O(n^{-1})$ convergence rate for the MSE, disregarding logarithmic factors.
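As an informal sanity check of this parametric rate, one can track the error of the index estimate as $n$ grows in a monotone ($\omega = 0$), noiseless instance; a sketch reusing the hypothetical `svr_index` from the earlier listing (the exponential link is only an example):

```python
import numpy as np

rng = np.random.default_rng(0)
d, l = 10, 5
v = np.ones(d) / np.sqrt(d)
for n in [4000, 16000, 64000]:
    X = rng.standard_normal((n, d))
    Y = np.exp(X @ v / 2)               # monotone, noiseless single-index response
    v_hat = svr_index(X, Y, l)
    v_hat *= np.sign(v_hat @ v)         # eigenvectors are defined up to sign
    # the error should roughly halve as n is quadrupled (~ n^{-1/2})
    print(n, np.linalg.norm(v_hat - v))
```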
To prove Theorem 1, we first establish bounds on the local statistics involved in the computation of the estimator of $v$:

Proposition 1. Suppose (Z) and ($\Omega$) hold true. Let $T \subset S$ be a bounded interval with $|T| \geq \omega$. Then:

(a) For every $\tau \geq 1$,
$$\mathbb P\big\{|\langle v, X_i\rangle - \mathbb E[\langle v,X\rangle \mid Y \in T]| \gtrsim \ell^{-1}\,(|T| + \sqrt{\tau\log n}\,\sigma) \,\big|\, Y_i \in T\big\} \leq n^{-\tau}.$$
If $|\zeta| \leq \sigma$ a.s., then
$$\mathbb P\big\{|\langle v, X_i\rangle - \mathbb E[\langle v,X\rangle \mid Y \in T]| \lesssim \ell^{-1}\,(|T| + \sigma) \,\big|\, Y_i \in T\big\} = 1.$$

(b) $\operatorname{Var}[\langle v,X\rangle \mid Y \in T] \lesssim \ell^{-2}\,(|T| + \sigma)^2$.
Proof. Let $Z_t = \big(-\sqrt{t+1}\,\sigma, -\sqrt t\,\sigma\big] \cup \big[\sqrt t\,\sigma, \sqrt{t+1}\,\sigma\big)$ for $t \in \mathbb N$. To prove (a) we first note that, thanks to (Z), we have $\zeta_i \in \bigcup_{t \leq \tau\log n} Z_t$ for every $i$ with probability higher than $1 - n^{-\tau}$. Conditioned on this event and $Y_i \in T$, $\langle v, X_i\rangle \in f^{-1}\big(T + \bigcup_{t \leq \tau\log n} Z_t\big)$. On the other hand,
$$\mathbb E[\langle v,X\rangle \mid Y \in T, \zeta \in Z_t] \in [\min f^{-1}(T + Z_t), \max f^{-1}(T + Z_t)].$$
It follows from assumption ($\Omega$) that
$$|\langle v, X_i\rangle - \mathbb E[\langle v,X\rangle \mid Y \in T, \zeta \in Z_t]| \lesssim \ell^{-1}\,\big(|T| + \sqrt{\max\{t, \tau\log n\}}\,\sigma\big).$$
Thus, by the law of total expectation,
$$|\langle v, X_i\rangle - \mathbb E[\langle v,X\rangle \mid Y \in T]| \leq \sum_{t=0}^\infty |\langle v, X_i\rangle - \mathbb E[\langle v,X\rangle \mid Y \in T, \zeta \in Z_t]|\,\mathbb P\{\zeta \in Z_t\}$$
$$\lesssim \ell^{-1}\Big(|T| + \sqrt{\tau\log n}\,\sigma + \sigma\sum_{t > \tau\log n} \sqrt t\,e^{-t/2}\Big) \lesssim \ell^{-1}\,(|T| + \sqrt{\tau\log n}\,\sigma).$$
The case where $|\zeta| \leq \sigma$ almost surely is similar and simpler. For (b), we write
$$\operatorname{Var}[\langle v,X\rangle \mid Y \in T] = \mathbb E[(\langle v,X\rangle - \mathbb E[\langle v,X\rangle \mid Y \in T])^2 \mid Y \in T] = \sum_{t=0}^\infty \mathbb E[(\langle v,X\rangle - \mathbb E[\langle v,X\rangle \mid Y \in T])^2 \mid Y \in T, \zeta \in Z_t]\,\mathbb P\{\zeta \in Z_t\}.$$
Conditioned on $\zeta \in Z_t$, assumption ($\Omega$) gives
$$|\langle v,X\rangle - \mathbb E[\langle v,X\rangle \mid Y \in T]| \leq \sum_{s=0}^\infty |\langle v,X\rangle - \mathbb E[\langle v,X\rangle \mid Y \in T, \zeta \in Z_s]|\,\mathbb P\{\zeta \in Z_s\} \lesssim \ell^{-1}\Big(|T| + \sqrt t\,\sigma + \sigma\sum_{s=0}^\infty \sqrt s\,e^{-s/2}\Big) \lesssim \ell^{-1}\,(|T| + \sqrt t\,\sigma),$$
whence
$$\operatorname{Var}[\langle v,X\rangle \mid Y \in T] \lesssim \ell^{-2}\Big(|T|^2 + \sigma^2\sum_{t=0}^\infty t\,e^{-t/2}\Big) \lesssim \ell^{-2}\,(|T| + \sigma)^2.$$
Proposition 2. Suppose (X), (Y), (Z), ($\Omega$), (LCM) and (LCV) hold true. Then, for every $l$ such that $2^{-l} \lesssim \ell/\sqrt\alpha$ and $|S_{l,h}| \geq \max\{\sigma, \omega\}$, $h = 1,\dots,2^l$, $v$ is the eigenvector of smallest eigenvalue of $\Sigma_{l,h}$, and
$$\lambda_{d-1}(\Sigma_{l,h}) - \lambda_d(\Sigma_{l,h}) \gtrsim R^2/\alpha$$
with probability higher than $1 - e^{-cn}$.
Proof. We first lower bound $\lambda_{d-1}(\Sigma_{l,h})$. We have
$$\operatorname{Cov}[X \mid Y \in S_{l,h}] = \mathbb E[XX^T \mid Y \in S_{l,h}] - \mathbb E[X \mid Y \in S_{l,h}]\,\mathbb E[X^T \mid Y \in S_{l,h}].$$
Since $X$ is independent of $\zeta$, (2) implies that $X$ is independent of $Y$ given $\langle v,X\rangle$, hence
$$\mathbb E[XX^T \mid Y \in S_{l,h}] = \mathbb E\big[\mathbb E[XX^T \mid \langle v,X\rangle, Y \in S_{l,h}] \mid Y \in S_{l,h}\big] = \mathbb E\big[\mathbb E[XX^T \mid \langle v,X\rangle] \mid Y \in S_{l,h}\big].$$
For the same reason, and using assumption (LCM), we have
$$\mathbb E[X \mid Y \in S_{l,h}] = \mathbb E\big[\mathbb E[X \mid \langle v,X\rangle, Y \in S_{l,h}] \mid Y \in S_{l,h}\big] = \mathbb E\big[\mathbb E[X \mid \langle v,X\rangle] \mid Y \in S_{l,h}\big] = \mathbb E[v\,\langle v,X\rangle \mid Y \in S_{l,h}].$$
Now, let $w$ be a unitary vector orthogonal to $v$. Then
$$w^T\,\mathbb E[X \mid Y \in S_{l,h}] = \mathbb E[\langle w,v\rangle\langle v,X\rangle \mid Y \in S_{l,h}] = 0,$$
$$w^T\,\mathbb E[XX^T \mid Y \in S_{l,h}]\,w = \mathbb E\big[\operatorname{Var}[\langle w,X\rangle \mid \langle v,X\rangle] \mid Y \in S_{l,h}\big] \geq R^2/\alpha.$$
Moreover, (LCM) implies by [11, Theorem 1.a] that $v$ is an eigenvector of $\Sigma_{l,h}$. Therefore,
$$\lambda_{d-1}(\Sigma_{l,h}) = \min_{w \in \operatorname{span}\{v\}^\perp,\ \|w\|=1} w^T\operatorname{Cov}[X \mid Y \in S_{l,h}]\,w \geq R^2/\alpha.$$
To upper bound $\lambda_d(\Sigma_{l,h})$, note that $|S_{l,h}| = |S|\,2^{-l} \lesssim R\,2^{-l}$. Thus, assumption ($\Omega$) implies by Proposition 1(b) that $\lambda_d(\Sigma_{l,h}) \lesssim \ell^{-2}R^2\,2^{-2l}$. We finally put together the lower and upper bounds: taking $2^{-l} \lesssim \ell/\sqrt\alpha$ yields the desired inequality.

We now establish convergence in probability for the local estimators $\hat v_{l,h}$.

Proposition 3 (local SVR). Suppose (X), (Y), (Z), ($\Omega$), (SMD) and (LCV) hold true. Then, conditioned on $n_{l,h}$, for every $l$ such that $2^{-l} \lesssim \ell/\sqrt\alpha$ and $|S_{l,h}| \gtrsim \max\{\sigma,\omega\}$, $h = 1,\dots,2^l$, for every $\varepsilon > 0$ and $\tau \geq 1$,
$$\mathbb P\{\|\hat v_{l,h} - v\| > \varepsilon\} \lesssim d\Bigg[\exp\Bigg(-\frac{c\,n_{l,h}\,\varepsilon^2}{\alpha^2\ell^{-2}\,d\,\sqrt{\tau\log n}\,(2^{-2l} + 2^{-l}\varepsilon)}\Bigg) + \exp\Big(-\frac{c\,n_{l,h}}{\alpha^2 d}\Big)\Bigg] + n^{-\tau}.$$
If $|\zeta| \leq \sigma$ a.s., then
$$\mathbb P\{\|\hat v_{l,h} - v\| > \varepsilon\} \lesssim d\Bigg[\exp\Bigg(-\frac{c\,n_{l,h}\,\varepsilon^2}{\alpha^2\ell^{-2}\,d\,(2^{-2l} + 2^{-l}\varepsilon)}\Bigg) + \exp\Big(-\frac{c\,n_{l,h}}{\alpha^2 d}\Big)\Bigg].$$
Proof. Since $\|\hat v_{l,h} - v\| \leq 2$, we can assume $\varepsilon \leq 2$ whenever needed. The Davis–Kahan Theorem [2, Theorem VII.3.1] together with Proposition 2 gives
$$\|\hat v_{l,h} - v\| \lesssim \frac{\|v^T(\hat\Sigma_{l,h} - \Sigma_{l,h})\|}{|\lambda_{d-1}(\hat\Sigma_{l,h}) - \lambda_d(\Sigma_{l,h})|}.$$
By Proposition 2 and the Weyl inequality we get
$$|\lambda_{d-1}(\hat\Sigma_{l,h}) - \lambda_d(\Sigma_{l,h})| \geq \lambda_{d-1}(\Sigma_{l,h}) - \lambda_d(\Sigma_{l,h}) - |\lambda_{d-1}(\hat\Sigma_{l,h}) - \lambda_{d-1}(\Sigma_{l,h})| \gtrsim R^2/\alpha - \|\hat\Sigma_{l,h} - \Sigma_{l,h}\|.$$
We bound $\|\hat\Sigma_{l,h} - \Sigma_{l,h}\|$ using the Bernstein inequality. First, we introduce the intermediate term
$$\tilde\Sigma_{l,h} = \frac{1}{n_{l,h}}\sum_i (X_i - \mu_{l,h})(X_i - \mu_{l,h})^T\,\mathbf 1\{Y_i \in S_{l,h}\},$$
and split $\hat\Sigma_{l,h} - \Sigma_{l,h}$ into
$$\hat\Sigma_{l,h} - \Sigma_{l,h} = \tilde\Sigma_{l,h} - \Sigma_{l,h} - (\hat\mu_{l,h} - \mu_{l,h})(\hat\mu_{l,h} - \mu_{l,h})^T.$$
We have $\|X_i - \mu_{l,h}\|^2 \lesssim R^2 d$, hence
$$\mathbb P\{\|\tilde\Sigma_{l,h} - \Sigma_{l,h}\| \gtrsim R^2/\alpha \mid n_{l,h}\} \lesssim d\,\exp\Big(-\frac{c\,n_{l,h}}{\alpha^2 d}\Big).$$
Moreover, $\|\hat\mu_{l,h} - \mu_{l,h}\|^2 \lesssim R^2/\alpha$ with the same probability. We now apply the Bernstein inequality to concentrate $v^T(\hat\Sigma_{l,h} - \Sigma_{l,h})$. By Proposition 1(a) we have, with probability no lower than $1 - n^{-\tau}$,
$$|v^T(X_i - \mu_{l,h})|\,\|X_i - \mu_{l,h}\| \lesssim \ell^{-1}R^2\sqrt{d\,\tau\log n}\;2^{-l},$$
or $|v^T(X_i - \mu_{l,h})|\,\|X_i - \mu_{l,h}\| \lesssim \ell^{-1}R^2\sqrt d\;2^{-l}$ when $|\zeta| \leq \sigma$. Next, we estimate the variance. We have
$$\|v^T(\tilde\Sigma_{l,h} - \Sigma_{l,h})\|^2 = v^T(\tilde\Sigma_{l,h} - \Sigma_{l,h})^2 v = v^T\tilde\Sigma_{l,h}^2 v - v^T\tilde\Sigma_{l,h}\Sigma_{l,h} v - v^T\Sigma_{l,h}\tilde\Sigma_{l,h} v + v^T\Sigma_{l,h}^2 v,$$
hence, taking the expectation,
$$\mathbb E[\|v^T(\tilde\Sigma_{l,h} - \Sigma_{l,h})\|^2 \mid Y \in S_{l,h}] = \mathbb E[v^T\tilde\Sigma_{l,h}^2 v \mid Y \in S_{l,h}] - v^T\Sigma_{l,h}^2 v,$$
where
$$\mathbb E[v^T\tilde\Sigma_{l,h}^2 v \mid Y \in S_{l,h}] = \frac{1}{n_{l,h}^2}\,v^T\,\mathbb E\Big[\Big(\sum_i (X_i - \mu_{l,h})(X_i - \mu_{l,h})^T\Big)^2 \,\Big|\, Y \in S_{l,h}\Big]\,v$$
$$\leq \frac{1}{n_{l,h}}\,v^T\,\mathbb E\big[(X - \mu_{l,h})\,\|X - \mu_{l,h}\|^2\,(X - \mu_{l,h})^T \mid Y \in S_{l,h}\big]\,v + v^T\Sigma_{l,h}^2 v$$
$$\leq \frac{1}{n_{l,h}}\,dR^2\;\mathbb E\big[(v^T(X - \mu_{l,h}))^2 \mid Y \in S_{l,h}\big] + v^T\Sigma_{l,h}^2 v = \frac{1}{n_{l,h}}\,dR^2\,\operatorname{Var}[v^TX \mid Y \in S_{l,h}] + v^T\Sigma_{l,h}^2 v.$$
Thus, Proposition 1(b) gives
$$\mathbb E[\|v^T(\tilde\Sigma_{l,h} - \Sigma_{l,h})\|^2 \mid Y \in S_{l,h}] \lesssim \frac{1}{n_{l,h}}\,\ell^{-2}\,dR^4\,2^{-2l}.$$
We therefore obtain
$$\mathbb P\{\|v^T(\hat\Sigma_{l,h} - \Sigma_{l,h})\| > \alpha^{-1}R^2\varepsilon \mid n_{l,h}\} \lesssim d\,\exp\Bigg(-\frac{c\,n_{l,h}\,\varepsilon^2}{\alpha^2\ell^{-2}\,d\,\sqrt{\tau\log n}\,(2^{-2l} + 2^{-l}\varepsilon)}\Bigg),$$
without $\sqrt{\tau\log n}$ if $|\zeta| \leq \sigma$. The same bounds hold for $v^T(\hat\mu_{l,h} - \mu_{l,h})(\hat\mu_{l,h} - \mu_{l,h})^T$, which completes the proof.

We are finally in a position to prove Theorem 1.
Proof of Theorem 1. The Davis–Kahan Theorem [45, Theorem 2] yields
$$\|\hat v_l - v\| \lesssim \Bigg\|\frac{1}{\sum_h n_{l,h}}\sum_h \hat v_{l,h}\hat v_{l,h}^T\,n_{l,h} - vv^T\Bigg\| \lesssim \frac{1}{\sum_h n_{l,h}}\sum_h \|\hat v_{l,h} - v\|\,n_{l,h}.$$
Applying Proposition 3 and taking the union bound over $h$ now gives (a). For (b), we condition on $|\zeta_i| \leq \sqrt{\tau\log n}\,\sigma$ for all $i$'s and calculate
$$\mathbb E[\|\hat v_l - v\|^2] - n^{-\tau} \lesssim \int_0^2 \varepsilon\,\mathbb P\{\|\hat v_l - v\| > \varepsilon\}\,d\varepsilon = \int_0^{2^{-l}} \varepsilon\,\mathbb P\{\|\hat v_l - v\| > \varepsilon\}\,d\varepsilon + \int_{2^{-l}}^2 \varepsilon\,\mathbb P\{\|\hat v_l - v\| > \varepsilon\}\,d\varepsilon$$
$$\leq \int_0^{2^{-l}} \min\Big\{1,\ 2^l d\,\exp\Big(-\frac{c\,(n/\sqrt{\log n})\,\varepsilon^2}{\alpha^2\ell^{-2}\sqrt\tau\,d\,2^{-l}}\Big)\Big\}\,\varepsilon\,d\varepsilon + \int_{2^{-l}}^2 2^l d\,\exp\Big(-\frac{c\,(n/\sqrt{\log n})}{\alpha^2\ell^{-2}\sqrt\tau\,d\,2^l}\Big)\,\varepsilon\,d\varepsilon$$
$$\lesssim \alpha^2\ell^{-2}\sqrt\tau\,d\,\log(2^l d)\,\frac{2^{-l}}{n/\sqrt{\log n}} + 2^l d\,\exp\Big(-\frac{c\,(n/\sqrt{\log n})}{\alpha^2\ell^{-2}\sqrt\tau\,d\,2^l}\Big),$$
where the last inequality follows from Lemma B.4. For $\tau = 2$ and $n$ large enough as in the first assumed lower bound, we obtain (b). Analogous computations for the case where $|\zeta| \leq \sigma$ lead to the final claim.

In this section we study how partitioning polynomial regression of the link function in a single-index model is affected by an estimate $\hat v$ of the index vector, where regression estimators are viewed as conditioned on $\hat v$. We will focus on one standard class of priors for regression functions, namely the class $C^s$ of Hölder continuous functions. We recall that a function $g : \mathbb R^d \to \mathbb R$ is $C^s$ Hölder continuous ($g \in C^s$) if, for $s = k + \alpha$, $k \geq 0$ an integer and $\alpha \in (0,1]$, $g$ has bounded continuous derivatives up to order $k$ and
$$|g|_{C^s} = \max_{|\lambda| = k}\ \sup_{x \neq z} \frac{|\partial^\lambda g(x) - \partial^\lambda g(z)|}{\|x - z\|^\alpha} < \infty.$$
The Hölder norm is defined by
$$\|g\|_{C^s} = \sum_{|\lambda| \leq k} \|\partial^\lambda g\|_\infty + |g|_{C^s}.$$
It is well known that in general it is not possible to obtain optimal estimates in high probability with piecewise polynomials of order greater than zero [4, Section 3]. For this reason, in the case of a $C^s$ Hölder regression function with $s > 1$ we will assume the following regularity condition:

(R) For every interval $I$ with $\{x \in \mathbb R^d : \langle v,x\rangle \in I\} \subset \operatorname{supp}\rho$, $\operatorname{Var}[\langle v,X\rangle \mid \langle v,X\rangle \in I] \gtrsim |I|^2$.

To control the distributional mismatch of the projection $\langle\hat v,X\rangle$, we will also make one of the following two assumptions:

(P1) $X$ has spherical distribution.

(P2) $X$ has upper bounded density $\rho$ with bounded support and, for every interval $I$ with $\{x \in \mathbb R^d : \langle v,x\rangle \in I\} \subset \operatorname{supp}\rho$, $\rho(\{x \in \mathbb R^d : \langle v,x\rangle \in I\}) \gtrsim |I|$.

As discussed early in Section 3, spherical distributions (P1) provide a customary model for conditional regression estimators and cover, but are not limited to, the Gaussian distribution. On the other hand, the class (P2) includes a variety of regular densities on compact normal domains, where no special symmetry is required.

We now state our main theorem:
Theorem 2. Assume (X) and (Z). Furthermore, assume either (P1) or (P2). Let $\hat v$ be an estimator of $v$ such that, for every $\varepsilon > 0$,

($\hat{\mathrm V}$) $\mathbb P\{\|\hat v - v\| > \varepsilon\} \leq A\,\exp(-n\varepsilon^2/B)$

for some $A, B \geq 1$ possibly dependent on $d$ and specific parameters. Suppose $f \in C^s$ with $s \in [1/2, 2]$, and assume (R) in the case $s > 1$. Let $\hat F_{j|\hat v}$ be a piecewise constant ($s \leq 1$) or linear ($s > 1$) estimator of $F$ at scale $j$ conditioned on $\hat v$, as defined in Section 2.4, for $\langle\hat v,x\rangle \in [-r, r]$, and $0$ outside. Then, setting $2^{-j} \asymp \sqrt B\,(\log n/n)^{1/(2s+1)}$ and $r \asymp \sqrt{d\log n}\,R$, we have:

(a) For every $\nu > 0$ there is $c_\nu(d, B, R, \|f\|_{C^s}, s) \geq 1$ such that
$$\mathbb P\Big\{\mathbb E_X[|\hat F_{j|\hat v}(X) - F(X)|^2] > (\kappa + c_\nu)\,(\log n)^{s}\,\Big(\frac{\log n}{n}\Big)^{\frac{2s}{2s+1}}\Big\} \lesssim A\,n^{-\nu}$$
for some $\kappa(d, B, R, \|f\|_{C^s}, s)$.

(b) $\mathbb E[|\hat F_{j|\hat v}(X) - F(X)|^2] \leq K\,(\log n)^{s}\,\Big(\dfrac{\log n}{n}\Big)^{\frac{2s}{2s+1}}$ for some $K = K(d, A, B, R, \|f\|_{C^s}, s)$.

The dependence of all constants upon $d$, $A$ and $B$ is polynomial.

Theorem 2 shows that partitioning polynomial estimators achieve the $1$-dimensional min-max convergence rate (up to logarithmic factors) when conditioned on any $\sqrt n$-convergent estimate of $v$, and thus, in particular, on the estimate $\hat v$ obtained with SVR (under its assumptions). The same conclusion follows for other prominent index estimators, including conditional methods such as SIR, SAVE, SCR and DR. Although formally our theorem requires non-asymptotic $\sqrt n$-convergence to the index, and only asymptotic $\sqrt n$-consistency has been established for the aforementioned methods, finite sample bounds can be derived as well with arguments similar to the ones employed in Section 3.1.

To prove Theorem 2, we first show that, with high probability, a conditional polynomial estimator differs from an oracle estimator (possessing knowledge of $v$) by the angle between $\hat v$ and $v$:
Proposition 4. Assume (X) and (Z). Furthermore, assume either (P1) or (P2). Suppose $f \in C^\alpha$ with $\alpha \in [1/2, 1]$. Let $\hat v$ be an estimate of $v$. For $u \in \{v, \hat v\}$, let $\hat F_{j|u}$ be a piecewise constant, or linear ($\alpha = 1$), estimator of $F$ at scale $j$ conditioned on $u$ as defined in Section 2.4, for $\langle u,x\rangle \in [-r, r]$, and $0$ outside. Assume (R) in the case of a linear estimator. Then, for every $\varepsilon > 0$, $r \geq 1$, and conditioned on $2^{-j} \geq \|\hat v - v\|/t$ for some $t \geq 1$, we have
$$\big(\mathbb E_X[\,|\hat f_{j|\hat v}(\langle\hat v,X\rangle) - \hat f_{j|v}(\langle\hat v,X\rangle)|^2\,\mathbf 1\{X \in B(0,r)\}\,]\big)^{1/2} \lesssim t\,|f|_{C^\alpha}\,r^{\frac{1}{2-\alpha}}\,\|\hat v - v\|^{\frac{1}{2-\alpha}} + \varepsilon$$
with probability higher than
$$1 - C\,K_j\,\exp\Big(-\frac{c\,n\,\varepsilon^2}{K_j\,t^2\,\|f\|^2_{C^\alpha}\,r^2}\Big) - n\,\exp(-c\,r^2/dR^2).$$

The proof of Proposition 4 can be found in Appendix A. The key tool to obtain the dependence on $\|\hat v - v\|$ in the upper bound is the Wasserstein distance, which enables us to control the difference between statistics computed on the conditional distribution given $\hat v$ rather than $v$. We now proceed to prove Theorem 2.
Proof of Theorem 2. Let $k = \lceil s\rceil - 1 \in \{0,1\}$ and $\alpha = s \wedge 1 \in [1/2,1]$. We start by isolating the error outside a ball $B(0,r)$:
$$\mathbb E_X[|F(X) - \hat F_{j|\hat v}(X)|^2] \lesssim \mathbb E_X[|F(X) - \hat F_{j|\hat v}(X)|^2\,\mathbf 1\{\|X\| \leq r\}] + |f|^2_{C^0}\,\mathbb P\{\|X\| > r\},$$
where, using Lemma B.2, we get the tail bound
$$\mathbb P\{\|X\| > r\} \lesssim \exp(-r^2/2dR^2). \quad (T)$$
For $x \in B(0,r)$ we decompose
$$|F(x) - \hat F_{j|\hat v}(x)| = |f(\langle v,x\rangle) - \hat f_{j|\hat v}(\langle\hat v,x\rangle)| \leq \underbrace{|f(\langle v,x\rangle) - f(\langle\hat v,x\rangle)|}_{(\theta 1)} + |f(\langle\hat v,x\rangle) - \hat f_{j|\hat v}(\langle\hat v,x\rangle)|,$$
and we bound $(\theta 1)$ by the angle $\|\hat v - v\|$:
$$\mathbb E_X[|f(\langle v,X\rangle) - f(\langle\hat v,X\rangle)|^2\,\mathbf 1\{\|X\| \leq r\}] \leq |f|^2_{C^\alpha}\,r^{2\alpha}\,\|\hat v - v\|^{2\alpha}.$$
Hence, from assumption ($\hat{\mathrm V}$) and Lemma B.4 we get
$$\mathbb P\big\{\mathbb E_X[|f(\langle v,X\rangle) - f(\langle\hat v,X\rangle)|^2\,\mathbf 1\{\|X\| \leq r\}] > \varepsilon^2\big\} \leq A\,\exp\Big(-\frac{c\,n\,\varepsilon^{2/\alpha}}{B\,r^2\,|f|^{2/\alpha}_{C^\alpha}}\Big),$$
$$\mathbb E[|f(\langle v,X\rangle) - f(\langle\hat v,X\rangle)|^2\,\mathbf 1\{\|X\| \leq r\}] \lesssim (\log(A)B)^{\alpha}\,|f|^2_{C^\alpha}\,r^{2\alpha}\,n^{-\alpha}. \quad (\Theta 1)$$
We further decompose
$$|f(\langle\hat v,x\rangle) - \hat f_{j|\hat v}(\langle\hat v,x\rangle)| \leq \underbrace{|f(\langle\hat v,x\rangle) - f_{j|v}(\langle\hat v,x\rangle)|}_{\text{(b)}} + \underbrace{|f_{j|v}(\langle\hat v,x\rangle) - \hat f_{j|v}(\langle\hat v,x\rangle)|}_{\text{(v)}} + \underbrace{|\hat f_{j|v}(\langle\hat v,x\rangle) - \hat f_{j|\hat v}(\langle\hat v,x\rangle)|}_{(\theta 2)}.$$
Integrating (b) we get a bias term, which we control by exploiting the Hölder continuity of $f$ (see [44, Section 3.2]):
$$\mathbb E_X[|f(\langle\hat v,X\rangle) - f_{j|v}(\langle\hat v,X\rangle)|^2\,\mathbf 1\{\|X\| \leq r\}] \lesssim |f|^2_{C^s}\,r^{2s}\,2^{-2js}. \quad (B)$$
The term (v) leads to a variance term, which can be concentrated with known calculations (see [44, Proposition 13]):
$$\mathbb P\big\{\mathbb E_X[|f_{j|v}(\langle\hat v,X\rangle) - \hat f_{j|v}(\langle\hat v,X\rangle)|^2\,\mathbf 1\{\|X\| \leq r\}] > \varepsilon^2\big\} \lesssim K_j\,\exp\Big(-\frac{c\,n\,\varepsilon^2}{K_j\,|f|^2_{C^0}}\Big),$$
$$\mathbb E[|f_{j|v}(\langle\hat v,X\rangle) - \hat f_{j|v}(\langle\hat v,X\rangle)|^2\,\mathbf 1\{\|X\| \leq r\}] \lesssim |f|^2_{C^0}\,\frac{\log(K_j)\,K_j}{n}. \quad (V)$$
For $(\theta 2)$ we condition on the event $\|\hat v - v\| \leq t\,2^{-j}$, taking into account that, thanks to assumption ($\hat{\mathrm V}$), the complement has probability
$$\mathbb P\{\|\hat v - v\| > t\,2^{-j}\} \leq A\,\exp(-n\,t^2\,2^{-2j}/B). \quad (\Theta 2)$$
Thus, Proposition 4 along with assumption ($\hat{\mathrm V}$) and Lemma B.4 gives
$$\mathbb P\big\{\mathbb E_X[|\hat f_{j|v}(\langle\hat v,X\rangle) - \hat f_{j|\hat v}(\langle\hat v,X\rangle)|^2\,\mathbf 1\{\|X\| \leq r\}] > \varepsilon^2 \,\big|\, \|\hat v - v\| \leq t\,2^{-j}\big\} \lesssim K_j\,\exp\Big(-\frac{c\,n\,\varepsilon^2}{K_j\,t^2\,|f|^2_{C^\alpha}\,r^2}\Big) + A\,\exp\Big(-\frac{n\,\varepsilon^{2(2-\alpha)}}{B\,t^{2(2-\alpha)}\,|f|^{2(2-\alpha)}_{C^\alpha}\,r^2}\Big),$$
$$\mathbb E[|\hat f_{j|v}(\langle\hat v,X\rangle) - \hat f_{j|\hat v}(\langle\hat v,X\rangle)|^2\,\mathbf 1\{\|X\| \leq r\} \mid \|\hat v - v\| \leq t\,2^{-j}] \lesssim |f|^2_{C^\alpha}\,r^{2\alpha}\,\frac{\log(K_j)\,K_j}{n} + (\log(A)B)^{\frac{1}{2-\alpha}}\,|f|^2_{C^\alpha}\,r^{\frac{2}{2-\alpha}}\,n^{-\frac{1}{2-\alpha}}. \quad (\Theta 3)$$
In order to balance the tail (T), the bias (B), the variance (V) and the angle terms $(\Theta 1)$, $(\Theta 2)$ and $(\Theta 3)$, we choose
$$r = \sqrt{d\log n}\,R, \qquad 2^{-j} \asymp \sqrt B\,(\log n/n)^{1/(2s+1)}, \qquad t = 1,$$
and plug in $K_j = 2^j$, which leads to
$$\mathbb E[|\hat F_{j|\hat v}(X) - F(X)|^2] \lesssim |f|^2_{C^0}\,\frac1n \quad (T)$$
$$+\ |f|^2_{C^\alpha}\,R^{2\alpha}\,(\log(A)Bd)^{\alpha}\,\Big(\frac{\log n}{n}\Big)^{\alpha} \quad (\Theta 1)$$
$$+\ |f|^2_{C^s}\,R^{2s}\,(Bd)^{s}\,(\log n)^{s}\,\Big(\frac{\log n}{n}\Big)^{\frac{2s}{2s+1}} \quad (B)$$
$$+\ |f|^2_{C^0}\,B^{-1/2}\,\log n\,\Big(\frac{\log n}{n}\Big)^{\frac{2s}{2s+1}} \quad (V)$$
$$+\ A\,\exp\Big(-n\,\Big(\frac{\log n}{n}\Big)^{\frac{2}{2s+1}}\Big) \quad (\Theta 2)$$
$$+\ |f|^2_{C^\alpha}\,R^{2\alpha}\,d^{\alpha}\,(\log n)^{\alpha}\,\Big(\frac{\log n}{n}\Big)^{\frac{2s}{2s+1}} \quad (\Theta 3)$$
$$+\ |f|^2_{C^\alpha}\,R^{\frac{2}{2-\alpha}}\,(\log(A)B)^{\frac{1}{2-\alpha}}\,d^{\frac{1}{2-\alpha}}\,\Big(\frac{\log n}{n}\Big)^{\frac{1}{2-\alpha}}.$$
In $(\Theta 1)$, $(\log n/n)^{\alpha} \leq (\log n/n)^{\frac{2s}{2s+1}}$ for $\alpha = s \in [1/2,1]$, and for $\alpha = 1$ and $s > 1$. In $(\Theta 2)$, $\exp(-n(\log n/n)^{2/(2s+1)}) \leq 1/n$ for $s \geq 1/2$. In $(\Theta 3)$, $(\log n/n)^{\frac{1}{2-\alpha}} \leq (\log n/n)^{\frac{2s}{2s+1}}$ for $\alpha = s \in (0,1]$, and for $\alpha = 1$ and $s > 1$. Collecting the constants we obtain (b). The bound (a) follows setting $\varepsilon^2 = c_\nu\,(\log n)^{s}\,(\log n/n)^{\frac{2s}{2s+1}}$, $t = \sqrt{c_\nu}$, and taking $c_\nu$ large enough.

The additional logarithmic factors in (a) and (b) are exclusively due to the unboundedness of the distribution, and can be avoided in the bounded case. While here we restrict to constant and linear estimators, hence to the smoothness range $s \leq 2$, the results may be extended to higher order polynomials, and thus to smoother regression functions, with similar proofs. However, from our decomposition, and specifically from $(\Theta 1)$, it seems not possible to maintain min-max optimality in the range $s < 1/2$. We remark that this smoothness constraint does not depend on the regression technique, that is, the term $(\Theta 1)$ would still arise when using regression methods other than partitioning polynomials.
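Concretely, the scale prescription of Theorem 2 can be wired into the two-stage pipeline; a sketch with $B$ set to $1$, using the hypothetical helpers `svr_index` and `svr_regress` defined earlier:

```python
import numpy as np

def svr_pipeline(X, Y, l, s=1.0):
    """Index estimation followed by piecewise polynomial regression at the
    scale 2^{-j} ~ (log n / n)^{1/(2s+1)} of Theorem 2 (with B = 1):
    piecewise constant for s <= 1, piecewise linear for s > 1."""
    n = len(Y)
    j = int(np.ceil(np.log2((n / np.log(n)) ** (1.0 / (2 * s + 1)))))
    v_hat = svr_index(X, Y, l)
    f_hat = svr_regress(X, Y, v_hat, j, m=0 if s <= 1 else 1)
    return v_hat, f_hat, j
```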
4. Numerical experiments
In this section we conduct numerical experiments to demonstrate that the theoretical results above have practical relevance, and to investigate how relaxations of the assumptions affect the estimators. In order to highlight specific aspects of the different algorithms, we use three different functions in our experiments. The first two are
$$F_1(x) = \exp(c_1\,\langle v,x\rangle), \qquad F_2(x) = F_1(x) + c_2\,\sin(20\,\langle v,x\rangle),$$
for fixed constants $c_1, c_2 > 0$, with $c_2$ small. Both functions are smooth. $F_1$ is monotone, and thus we may choose $\omega = 0$, while $F_2$ is non-monotone, so that condition ($\Omega$) is satisfied only for $\omega > 0$. This allows us to explore the behavior of $\hat v$ under monotonicity or lack thereof, and how the estimators are affected by the choice of the scales $l$ and $j$. To investigate the convergence rate of the regression estimator $\hat F$, we use a monotone function $F_3$ which is piecewise quadratic on a random partition and continuous. The domain of $x$, and its dimension $d$, will be specified in each experiment. $F_1$, $F_2$, $F_3$ are shown in Figure 2.

[Figure 2 here: $F_1$ and $F_2$ (left panel) and $F_3$ (right panel), plotted against $\langle v,x\rangle \in [-4,4]$.]

Figure 2: Different functions used in the experiments, with horizontal axis representing $\langle v,x\rangle$.

Here we compare the performances of SIR, SAVE and SVR in estimating the index vector $v$. We consider two settings S1, S2, corresponding to two different non-elliptical, and thus non-spherical, distributions for $X$: $\rho_{X,1}$, $\rho_{X,2}$. $\rho_{X,1}$ is a standard normal $N(0,1)$ in one coordinate, and a skewed normal with shape parameter $\alpha = 5$ in the other coordinate; $\rho_{X,2}$ is uniform on the triangle with vertices $(0,0)$, $(1,1)$, $(0,1)$; all distributions are normalized to have zero mean and standard deviation equal to one. Note that in both settings conditions (LCM) and (SMD) are not satisfied. For each setting we draw $n = 1000$ i.i.d. samples and generate the response variable $Y_i = F(X_i) + \zeta_i$ using the functions $F_1$ and $F_2$, where $\zeta_i \sim N(0,\sigma^2)$. We use three increasing levels of noise, setting $\sigma$ equal to increasing percentages of $|f(-2) - f(4)|$, the smallest level being zero. We chose $v = (1/\sqrt2, 1/\sqrt2)$ for setting S1, and $v = (1, 0)$ for setting S2. The results in Table 3 show the detailed performance of SIR, SAVE and SVR for all settings, functions, and noise levels. First, we note that the cases of $F_1$ with noise and $F_2$ with zero noise produce similar results. This is consistent with the intuition suggested by Theorem 1 that the noise and non-monotonicity levels play a similar role in the accuracy of the estimators. In all settings SIR performs worse than the other two methods, SVR and SAVE, which have similar performance, although SVR produces slightly better estimates most of the times.
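The two non-elliptical designs are easy to simulate; a sketch of the data generation for S1 and S2 (SciPy's `skewnorm` provides the shape-parameter-$5$ coordinate; the link scaling and the 5% noise level are illustrative stand-ins, since the paper's exact constants are not reproduced here):

```python
import numpy as np
from scipy.stats import skewnorm

rng = np.random.default_rng(0)

def sample_S1(n):
    """Standard normal in one coordinate, skew-normal (shape 5) in the other,
    each standardized to zero mean and unit standard deviation."""
    x1 = rng.standard_normal(n)
    x2 = skewnorm.rvs(a=5, size=n, random_state=rng)
    x2 = (x2 - skewnorm.mean(a=5)) / skewnorm.std(a=5)
    return np.column_stack([x1, x2])

def sample_S2(n):
    """Uniform on the triangle (0,0), (1,1), (0,1), then standardized."""
    u, w = rng.random(n), rng.random(n)
    swap = u > w
    u[swap], w[swap] = w[swap], u[swap]       # reflect so that u <= w
    X = np.column_stack([u, w])
    return (X - X.mean(0)) / X.std(0)

n = 1000
v = np.array([1.0, 0.0])                      # index vector used for S2
f = lambda t: np.exp(t / 2)                   # stand-in for F1 (exact scaling differs)
X = sample_S2(n)
sigma = 0.05 * abs(f(-2) - f(4))              # illustrative noise level
Y = f(X @ v) + sigma * rng.standard_normal(n)
```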
Table 3. Performance of the different algorithms in different settings, with err $= \log(\|\hat v - v\|)$, corresponding standard error, and average computational time in seconds. Rows are grouped by function and increasing noise level $\sigma_1 \leq \sigma_2 \leq \sigma_3$.

                          S1                        S2
                  err     se     time       err     se     time
F1, s1   SIR    -3.04   -2.83   0.60      -0.68   -1.83   0.60
         SAVE   -5.97   -1.06   1.30      -7.41   -7.14   1.40
         SVR    -6.42   -6.06   1.00      -7.60   -6.65   0.70
F1, s2   SIR    -3.11   -2.99   0.50      -0.67   -1.83   0.50
         SAVE   -4.39   -4.10   1.30      -4.13   -4.01   1.30
         SVR    -4.41   -4.08   0.90      -4.16   -3.92   0.70
F1, s3   SIR    -2.97   -2.69   0.50      -0.68   -1.81   0.50
         SAVE   -4.57   -4.24   1.40      -4.16   -4.03   1.30
         SVR    -4.58   -4.28   1.00      -4.12   -3.75   0.80
F2, s1   SIR    -2.94   -2.72   0.50      -0.68   -1.77   0.50
         SAVE   -4.06   -3.57   1.40      -3.42   -3.45   1.40
         SVR    -4.08   -3.87   0.90      -3.45   -3.46   0.80
F2, s2   SIR    -2.92   -2.67   0.50      -0.68   -1.38   0.60
         SAVE   -3.92   -3.58   1.30      -3.08   -3.01   1.40
         SVR    -3.91   -3.61   0.90      -3.21   -3.06   0.80
F2, s3   SIR    -2.87   -2.64   0.60      -0.68   -1.55   0.60
         SAVE   -3.64   -1.98   1.40      -2.90   -2.85   1.40
         SVR    -3.66   -3.45   0.90      -3.02   -2.80   0.80
The poor performance of SIR in these settings requires a better explanation. In Figure 3 we show graphically how the empirical inverse regression curve may drift away from $v$, resulting in a poor SIR estimate. On the other hand, the local gradients used by SVR provide good local estimates. This example shows how methods based on higher order statistics are in general more robust to relaxations of the assumptions.

[Figure 3 here: scatter plots of the data with the inverse regression curve and local gradients (left column), and the resulting SIR and SVR estimates of $v$ (right column), for settings S1 and S2.]

Figure 3: Left column displays the ingredients for the estimates: the empirical inverse regression curve (green) used by SIR, and the local gradients (blue), with length proportional to the number of samples in the corresponding level set, used by SVR. Estimates of $v$ using SVR (blue) and SIR (green) are displayed in the right column. The methods are applied to setting S1 (top row) and S2 (bottom row). The black line indicates $v$, while data points are colored according to the value of the corresponding response variable, generated with $F_1$ and $\sigma = 0$, using a red-to-yellow color scale.

To investigate more extensively the performance of SVR in estimating $v$, we perform another experiment: we draw $X$ from a $d$-dimensional standard normal distribution (for a fixed $d$), and generate the response variable using the function $F_2$ plus additive Gaussian noise with standard deviation $\sigma$ equal to a small percentage of $|f(-2) - f(4)|$. We repeat the experiment for different values of the sample size $n$. Results are shown in Figure 4. The left panel shows that the error in $\hat v$ stabilizes at scales comparable to the noise level $\sigma$, which suggests that the assumption $|S_{l,h}| \gtrsim \sigma$ is needed. The right panel shows that the rate of the error of $\hat v$, for scales $l$ coarser than the noise level, is approximately $n^{-1/2}$, which is again consistent with Theorem 1.

[Figure 4 here: left, $\log(\|\hat v_l - v\|)$ versus scale $l$ for $n = 16000$ to $512000$, with the noise level marked; right, $\log(\|\hat v_l - v\|)$ versus $\log(n)$, with fitted slopes $-0.48$ ($l=4$), $-0.51$ ($l=5$), $-0.53$ ($l=6$), $-0.54$ ($l=7$).]

Figure 4: Behavior of the SVR estimate $\hat v_l$ with respect to scale and sample size, for regression of $F_2$ (see text). Left: error versus scale $l$. Right: error versus sample size $n$.

In this section we perform some experiments to support our theoretical results regarding the regression estimator obtained with SVR. The first experiment consists in drawing $X_i$, $i = 1,\dots,n$, from a $d$-dimensional standard normal distribution and obtaining $Y_i = F_3(X_i) + \zeta_i$, where $\zeta_i \sim N(0,\sigma^2)$. Here we use the function $F_3$ because we want to limit the smoothness of the function, in order to obtain concentration rates comparable with the min-max rate with $s = 1$. We vary the dimension $d = 5, 10, 50, 100$, and the size of the noise $\sigma$, equal to the 5% and 10% of $|f(-2) - f(4)|$. To investigate the convergence rates of the estimator we repeat each experiment for different sample sizes $n$. In Figure 5 we show the empirical MSE, averaged over repeated runs, as a function of the sample size, in logarithmic scale, for both our estimator and $k$-Nearest-Neighbor (kNN) regression. We see that the MSE of the SVR estimator decays at a rate slightly better than the optimal value $-2/3$, independently of the dimension $d$ and the noise level $\sigma$: this is all consistent with Theorem 2. As expected, kNN regression has a convergence rate that severely deteriorates with the dimension (curse of dimensionality).
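A sketch of the rate comparison behind Figure 5, fitting the empirical MSE against the sample size for the SVR pipeline and scikit-learn's $k$-NN regressor; the piecewise quadratic $F_3$ is replaced by the explicit stand-in $t \mapsto t|t|/4$ (itself piecewise quadratic), since the paper's random partition is not reproduced:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
d, l = 10, 5
v = np.ones(d) / np.sqrt(d)
f = lambda t: t * np.abs(t) / 4               # monotone piecewise quadratic stand-in

for n in [4000, 8000, 16000, 32000]:
    X = rng.standard_normal((n, d))
    Y = f(X @ v) + 0.05 * rng.standard_normal(n)
    Xt = rng.standard_normal((2000, d))       # fresh test set
    v_hat = svr_index(X, Y, l)                # hypothetical helper from above
    v_hat *= np.sign(v_hat @ v)
    f_hat = svr_regress(X, Y, v_hat, j=int(np.log2(n) / 3), m=1)
    mse_svr = np.mean((f_hat(Xt @ v_hat) - f(Xt @ v)) ** 2)
    knn = KNeighborsRegressor(n_neighbors=int(n ** (2 / (2 + d)))).fit(X, Y)
    mse_knn = np.mean((knn.predict(Xt) - f(Xt @ v)) ** 2)
    print(n, mse_svr, mse_knn)                # SVR should decay ~ n^{-2/3}; kNN slower
```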
We can also notice that the MSE drops far below the noise level, which confirms the de-noising feature of the SVR estimator.

To explore the behavior of the empirical MSE as a function of the scales $l$ and $j$ we conduct another experiment: we draw $X$ from a $d$-dimensional standard normal distribution, and obtain the response variable $Y = F_2(X) + \zeta$, with $\zeta$ Gaussian noise with standard deviation $\sigma$ equal to a small percentage of $|f(-2) - f(4)|$. Figure 6 shows the behavior of the $\log(\mathrm{MSE})$, obtained with SVR, for different values of $l$, $j$ and $n$. To obtain robust estimates in regions with high Monte Carlo variability, in regimes where our results do not hold, the errors are averaged over 50 repetitions of each setting, with trimming. By observing each row, we notice that the MSE reaches its minimum for low values of $l$ and stays constant for larger $l$. By looking at the plot column-wise, we observe the bias-variance trade-off, with coarse scales giving rough estimates, and fine scales resulting in overfitting. As expected, as the sample size grows, the optimal scale $j$ increases.

[Figure 5 here: $\log(\mathrm{MSE})$ versus $\log(n)$ at noise levels 5% (left) and 10% (right). Fitted rates: SVR $-0.82/-0.83$ ($d=5$), $-0.79/-0.82$ ($d=10$), $-0.81/-0.82$ ($d=50$), $-0.84/-0.85$ ($d=100$); kNN $-0.32/-0.31$ ($d=5$), $-0.2/-0.2$ ($d=10$).]

Figure 5: Comparison of convergence rates for the regression estimator with SVR and kNN regression in different settings.

[Figure 6 here: heatmaps of the empirical $\log(\mathrm{MSE})$ over the scales $l$ (vertical) and $j$ (horizontal), for $n = 16000$ to $512000$; color scale from $-4.5$ to $-1.5$.]

Figure 6: Empirical MSE versus sample size $n$ and scales $l$ and $j$.

Acknowledgements. This research was partially supported by AFOSR FA9550-17-1-0280, NSF-DMS-1821211, NSF-ATD-1737984. A.L. acknowledges support from the de Castro Statistics Initiative, Collegio Carlo Alberto, Torino, Italy. S.V. thanks Timo Klock for the discussion and the useful exchange of views about this and related problems.
References

[1] F. Bach. Breaking the curse of dimensionality with convex neural networks. Journal of Machine Learning Research, 19(18):1–53, 2017.
[2] R. Bhatia. Matrix Analysis, volume 169 of Graduate Texts in Mathematics. Springer, 1997.
[3] P. J. Bickel and B. Li. Local polynomial regression on unknown manifolds. Lecture Notes–Monograph Series, 54:177–186, 2007.
[4] P. Binev, A. Cohen, W. Dahmen, and R. A. DeVore. Universal algorithms for learning theory. Part II: Piecewise polynomial functions. Constructive Approximation, 26(2):127–152, 2007.
[5] P. Binev, A. Cohen, W. Dahmen, R. A. DeVore, and V. N. Temlyakov. Universal algorithms for learning theory. Part I: Piecewise constant functions. Journal of Machine Learning Research, 6(1):1297–1321, 2005.
[6] V. Buldygin and E. Pechuk. Inequalities for the distributions of functionals of sub-Gaussian vectors. Theory of Probability and Mathematical Statistics, 80:25–36, 2010.
[7] S. Cambanis, S. Huang, and G. Simons. On the theory of elliptically contoured distributions. Journal of Multivariate Analysis, 11(3):368–385, 1981.
[8] R. J. Carroll, J. Fan, I. Gijbels, and M. P. Wand. Generalized partially linear single-index models. Journal of the American Statistical Association, 92(438):477–489, 1997.
[9] R. J. Carroll, D. Ruppert, and A. H. Welsh. Local estimating equations. Journal of the American Statistical Association, 93(441):214–227, 1998.
[10] R. D. Cook. SAVE: a method for dimension reduction and graphics in regression. Communications in Statistics – Theory and Methods, 29(9-10):2109–2121, 2000.
[11] R. D. Cook and H. Lee. Dimension reduction in binary response regression. Journal of the American Statistical Association, 94(448):1187–1200, 1999.
[12] R. Coudret, B. Liquet, and J. Saracco. Comparison of sliced inverse regression approaches for underdetermined cases. J. SFdS, 155(2):72–96, 2014.
[13] X. Cui, W. K. Härdle, and L. Zhu. The EFM approach for single-index models. Ann. Statist., 39(3):1658–1688, 2011.
[14] A. S. Dalalyan, A. Juditsky, and V. Spokoiny. A new algorithm for estimating the effective dimension-reduction subspace. Journal of Machine Learning Research, 9:1647–1678, 2008.
[15] G. Dall'Aglio. Sugli estremi dei momenti delle funzioni di ripartizione doppia. Annali della Scuola Normale Superiore di Pisa – Classe di Scienze, Ser. 3, 10(1-2):35–74, 1956.
[16] E. del Barrio, E. Giné, and C. Matrán. Central limit theorems for the Wasserstein distance between the empirical and the true distributions. Annals of Probability, 27(2):1009–1071, 1999.
[17] M. Delecroix, W. Härdle, and M. Hristache. Efficient estimation in single-index regression. Interdisciplinary Research Project 373: Quantification and Simulation of Economic Processes, SFB 373 Discussion Paper 37, Humboldt University of Berlin, 1997.
[18] M. Delecroix and M. Hristache. M-estimateurs semi-paramétriques dans les modèles à direction révélatrice unique. Bull. Belg. Math. Soc. Simon Stevin, 6(2):161–185, 1999.
[19] M. Delecroix, M. Hristache, and V. Patilea. On semiparametric M-estimation in single-index regression. Journal of Statistical Planning and Inference, 136(3):730–769, 2006.
[20] P. Diaconis and D. Freedman. Asymptotics of graphical projection pursuit. Ann. Statist., 12(3):793–815, 1984.
[21] N. Duan and K.-C. Li. Slicing regression: a link-free regression method. Ann. Statist., 19(2):505–530, 1991.
[22] M. L. Eaton. A characterization of spherical distributions. Journal of Multivariate Analysis, 20(2):272–276, 1986.
[23] S. Gaïffas and G. Lecué. Optimal rates and adaptation in the single-index model using aggregation. Electron. J. Statist., 1:538–573, 2007.
[24] R. Ganti, N. Rao, R. M. Willett, and R. Nowak. Learning single index models in high dimensions. arXiv:1506.08910, 2015.
[25] L. Györfi, M. Kohler, A. Krzyzak, and H. Walk. A Distribution-Free Theory of Nonparametric Regression. Springer Series in Statistics. Springer, 2002.
[26] P. Hall and K.-C. Li. On almost linearity of low dimensional projections from high dimensional data. Ann. Statist., 21(2):867–889, 1993.
[27] W. Härdle, P. Hall, and H. Ichimura. Optimal smoothing in single-index models. Ann. Statist., 21(1):157–178, 1993.
[28] W. Härdle and T. M. Stoker. Investigating smooth multiple regression by the method of average derivatives. Journal of the American Statistical Association, 84(408):986–995, 1989.
[29] J. L. Horowitz. Semiparametric Methods in Econometrics. Lecture Notes in Statistics. Springer New York, 1998.
[30] M. Hristache, A. Juditsky, J. Polzehl, and V. Spokoiny. Structure adaptive approach for dimension reduction. Annals of Statistics, 29(6):1537–1566, 2001.
[31] M. Hristache, A. Juditsky, and V. Spokoiny. Direct estimation of the index coefficient in a single-index model. Ann. Statist., 29(3):593–623, 2001.
[32] H. Ichimura. Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics, 58(1):71–120, 1993.
[33] S. M. Kakade, V. Kanade, O. Shamir, and A. T. Kalai. Efficient learning of generalized linear and single index models with isotonic regression. Advances in Neural Information Processing Systems 24, pages 927–935, 2011.
[34] A. T. Kalai and R. Sastry. The Isotron algorithm: high-dimensional isotonic regression. In Proceedings of the 22nd Annual Conference on Learning Theory (COLT), 2009.
[35] D. Kelker. Distribution theory of spherical distributions and a location-scale parameter generalization. Sankhyā: The Indian Journal of Statistics, Series A, 32(4):419–430, 1970.
[36] S. Kpotufe. k-NN regression adapts to local intrinsic dimension. Advances in Neural Information Processing Systems 24, pages 729–737, 2011.
[37] S. Kpotufe and V. Garg. Adaptivity to local smoothness and dimension in kernel regression. Advances in Neural Information Processing Systems 26, pages 3075–3083, 2013.
[38] S. Kuksin and A. Shirikyan. Mathematics of Two-Dimensional Turbulence. Cambridge Tracts in Mathematics. Cambridge University Press, 2012.
[39] B. Li. Sufficient Dimension Reduction: Methods and Applications with R. Chapman & Hall/CRC Monographs on Statistics and Applied Probability. CRC Press, 2018.
[40] B. Li and S. Wang. On directional regression for dimension reduction. Journal of the American Statistical Association, 102(479):997–1008, 2007.
[41] B. Li, H. Zha, and F. Chiaromonte. Contour regression: a general approach to dimension reduction. The Annals of Statistics, 33(4):1580–1616, 2005.
[42] K.-C. Li. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414):316–327, 1991.
[43] W. Liao, M. Maggioni, and S. Vigogna. Learning adaptive multiscale approximations to data and functions near low-dimensional sets. In 2016 IEEE Information Theory Workshop (ITW), pages 226–230. IEEE, 2016.
[44] W. Liao, M. Maggioni, and S. Vigogna. Multiscale regression on unknown manifolds. ArXiv e-prints, 2020.
[45] R. J. Samworth, T. Wang, and Y. Yu. A useful variant of the Davis–Kahan theorem for statisticians. Biometrika, 102(2):315–323, 2014.
[46] T. M. Stoker. Consistent estimation of scaled coefficients. Econometrica, pages 1461–1481, 1986.
[47] C. J. Stone. Optimal global rates of convergence for nonparametric regression. Ann. Statist., 10(4):1040–1053, 1982.
[48] A. W. van der Vaart and J. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics. Springer, 1996.
[49] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Y. C. Eldar and G. Kutyniok, editors, Compressed Sensing, pages 210–268. Cambridge University Press, 2012.
[50] R. Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2018.
[51] Y. Xia. Asymptotic distributions for two estimators of the single-index model. Econometric Theory, 22(6):1112–1137, 2006.
[52] Y. Xia, H. Tong, W. K. Li, and L.-X. Zhu. An adaptive estimation of dimension reduction space. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 64(3):363–410, 2002.

Appendix A: Proof of Proposition 4
First, we set out some notation and exclude some low-probability events. In the case of (P1), we condition on $\|X_i\| \leq r$ for all $i$'s, which happens with probability higher than $1 - n\exp(-r^2/2dR^2)$ thanks to assumption (X) and Lemma B.2. We define $\rho$ to be the distribution of $X$, $\rho(\cdot \mid E)$ the conditional distribution of $X$ given $X \in E$ (also conditioned on $\|X_i\| \leq r$ for all $i$'s in the case of (P1)), and $\rho_v(\cdot \mid E)$ the push-forward of $\rho(\cdot \mid E)$ along $x \mapsto \langle v,x\rangle$. Let $u \in \{v, \hat v\}$; when a property is stated for $u$, it is meant to hold for both $v$ and $\hat v$. We write $E_{j,k|u} = \{x \in \mathbb R^d : \langle u,x\rangle \in I_{j,k}\}$, and denote by $n_{j,k|u}$ the number of samples falling in $E_{j,k|u}$. We will restrict to the sets $E_{j,k|u}$ with
$$\rho(E_{j,k|u}) \gtrsim \varepsilon^2/(K_j\,|f|^2_{C^0}). \quad (E1)$$
Indeed,
$$\mathbb E_X[\,|\hat f_{j|v}(\langle\hat v,X\rangle) - \hat f_{j|\hat v}(\langle\hat v,X\rangle)|^2\,\mathbf 1\{\|X\| \leq r\}\,] = \sum_{k \in \mathcal K_j} \mathbb E_X[\,|\hat f_{j|v}(\langle\hat v,X\rangle) - \hat f_{j|\hat v}(\langle\hat v,X\rangle)|^2\,\mathbf 1\{X \in E_{j,k|\hat v} \cap B(0,r)\}\,],$$
where the subsum of the terms in $k$ with $\rho(E_{j,k|\hat v} \cap B(0,r)) \lesssim \varepsilon^2/(K_j\,|f|^2_{C^0})$ is bounded by $\varepsilon^2$; hence, we can restrict to $\rho(E_{j,k|\hat v}) \geq \rho(E_{j,k|\hat v} \cap B(0,r)) \gtrsim \varepsilon^2/(K_j\,|f|^2_{C^0})$. If we assume (P1), then $\rho(E_{j,k|v}) = \rho(E_{j,k|\hat v}) \gtrsim \varepsilon^2/(K_j\,|f|^2_{C^0})$ as well. Otherwise, if we assume (P2), we still have $\rho(E_{j,k|v}) \gtrsim |I_{j,k}| > 1/K_j \gtrsim \varepsilon^2/(K_j\,|f|^2_{C^0})$.
We further condition on the event
$$n_{j,k|u} \gtrsim n\,\rho(E_{j,k|u}) \quad\text{for all } k\text{'s}, \quad (E2)$$
which has probability at least $1 - C\,K_j\exp(-c\,n\,\varepsilon^2/(K_j\,|f|^2_{C^0}))$, thanks to Lemma B.1 and (E1). Also recall that we are conditioning on
$$\|\hat v - v\| \leq t\,2^{-j}. \quad (E3)$$
For two probability measures $\mu$ and $\nu$, we define the Kantorovich distance
$$\mathcal K_\alpha(\mu,\nu) = \sup_{g \in C^\alpha,\ |g|_{C^\alpha} \leq 1} \int g(x)\,d(\mu - \nu)(x).$$
The proof goes through a series of decompositions into statistics that are defined in Table 4, and whose concentration properties are stated in Lemma A.1.
Table 4. Statistics used in the decompositions of the proof of Proposition 4.

$$\hat y_{j,k|u} = \frac{1}{n_{j,k|u}}\sum_i Y_i\,\mathbf 1\{X_i \in E_{j,k|u}\}, \qquad \tilde y_{j,k|u} = \frac{1}{n_{j,k|u}}\sum_i F(X_i)\,\mathbf 1\{X_i \in E_{j,k|u}\}, \qquad y_{j,k|u} = \mathbb E[F(X) \mid X \in E_{j,k|u}],$$
$$\hat\zeta_{j,k|u} = \frac{1}{n_{j,k|u}}\sum_i \zeta_i\,\mathbf 1\{X_i \in E_{j,k|u}\}, \qquad \hat x_{j,k|u} = \frac{1}{n_{j,k|u}}\sum_i X_i\,\mathbf 1\{X_i \in E_{j,k|u}\}, \qquad x_{j,k|u} = \mathbb E[X \mid X \in E_{j,k|u}],$$
$$\hat\beta_{j,k|u} = \frac{\sum_i \langle u, X_i - \hat x_{j,k|u}\rangle\,(Y_i - \hat y_{j,k|u})\,\mathbf 1\{X_i \in E_{j,k|u}\}}{\sum_i |\langle u, X_i - \hat x_{j,k|u}\rangle|^2\,\mathbf 1\{X_i \in E_{j,k|u}\}},$$
$$\hat q_{j,k|u} = \frac{1}{n_{j,k|u}}\sum_i \langle u, X_i - \hat x_{j,k|u}\rangle\,(F(X_i) - \tilde y_{j,k|u})\,\mathbf 1\{X_i \in E_{j,k|u}\}, \qquad \tilde q_{j,k|u} = \frac{1}{n_{j,k|u}}\sum_i \langle v, X_i - x_{j,k|v}\rangle\,(F(X_i) - y_{j,k|v})\,\mathbf 1\{X_i \in E_{j,k|u}\},$$
$$q_{j,k|u} = \mathbb E[\langle v, X - x_{j,k|v}\rangle\,(F(X) - y_{j,k|v}) \mid X \in E_{j,k|u}],$$
$$\hat s_{j,k|u} = \frac{1}{n_{j,k|u}}\sum_i |\langle u, X_i - \hat x_{j,k|u}\rangle|^2\,\mathbf 1\{X_i \in E_{j,k|u}\}, \qquad \tilde s_{j,k|u} = \frac{1}{n_{j,k|u}}\sum_i |\langle v, X_i - x_{j,k|v}\rangle|^2\,\mathbf 1\{X_i \in E_{j,k|u}\}, \qquad s_{j,k|u} = \operatorname{Var}[\langle v,X\rangle \mid X \in E_{j,k|u}],$$
$$\hat z_{j,k|u} = \frac{1}{n_{j,k|u}}\sum_i \langle u, X_i - \hat x_{j,k|u}\rangle\,\zeta_i\,\mathbf 1\{X_i \in E_{j,k|u}\}.$$
Lemma A.1. Under the assumptions of Proposition 4 and adopting the definitions in Table 4, for all $k$'s satisfying (E1) and conditioned on (E2) and (E3) we have:

(a) $\mathbb P\{|\hat\zeta_{j,k|u}| > \varepsilon/\sqrt{K_j\,\rho(E_{j,k|u})}\} \lesssim \exp(-c\,n\,\varepsilon^2/(K_j\,\sigma^2))$;

(b) $\mathbb P\{|\tilde y_{j,k|u} - y_{j,k|u}| > t^{-1}\varepsilon/\sqrt{K_j\,\rho(E_{j,k|u})}\} \lesssim \exp(-c\,n\,\varepsilon^2/(K_j\,t^2\,|f|^2_{C^0}))$;

(c) $|\hat q_{j,k|v}| \leq |f|_{C^1}\,2^{-2j}$ and $|\hat q_{j,k|\hat v}| \lesssim t\,|f|_{C^1}\,r\,2^{-2j}$ ($\alpha = 1$);

(d) $\mathbb P\{\hat s_{j,k|u} \lesssim 2^{-2j}\} \lesssim \exp(-c\,n\,\varepsilon^2/(K_j\,t^2\,|f|^2_{C^1}))$;

(e) $\mathbb P\{|\hat z_{j,k|u}| \gtrsim t^{-1}\,2^{-j}\,\varepsilon/\sqrt{K_j\,\rho(E_{j,k|u})}\} \lesssim \exp(-c\,n\,\varepsilon^2/(K_j\,t^2\,\sigma^2))$;

(f) $\mathbb P\{|\langle v, \hat x_{j,k|u} - x_{j,k|u}\rangle| > t^{-1}\,|f|^{-1}_{C^1}\,\varepsilon/\sqrt{K_j\,\rho(E_{j,k|u})}\} \lesssim \exp(-c\,n\,\varepsilon^2/(K_j\,t^2\,|f|^2_{C^1}\,r^2))$;

(g) $\mathbb P\{|\tilde q_{j,k|u} - q_{j,k|u}| > r\,2^{-j}\,\varepsilon/\sqrt{K_j\,\rho(E_{j,k|u})}\} \lesssim \exp(-c\,n\,\varepsilon^2/(K_j\,t^2\,|f|^2_{C^1}\,r^2))$.

Proof. (a) Follows directly from [49, Proposition 5.10], exploiting (E1) and (E2).
(b) By the Bernstein inequality (along with (E1) and (E2)), we get
$$\mathbb P\Big\{|\tilde y_{j,k|u} - y_{j,k|u}| > \frac{\varepsilon}{\sqrt{K_j\,\rho(E_{j,k|u})}}\Big\} \lesssim \exp\Bigg(-\frac{c\,n\,\varepsilon^2/K_j}{|f|^2_{C^0} + |f|_{C^0}\,\varepsilon/\sqrt{K_j\,\rho(E_{j,k|u})}}\Bigg) \leq \exp\Big(-\frac{c\,n\,\varepsilon^2}{K_j\,|f|^2_{C^0}}\Big),$$
where the last step uses (E1).
(c) Follows by definition and condition (E3).
(d) We decompose
$$|\hat s_{j,k|u} - s_{j,k|v}| \leq \Big|\frac{1}{n_{j,k|u}}\sum_i \langle u, x_{j,k|v} - \hat x_{j,k|u}\rangle\,\langle u, X_i - \hat x_{j,k|u}\rangle\,\mathbf 1\{X_i \in E_{j,k|u}\}\Big| + \Big|\frac{1}{n_{j,k|u}}\sum_i \langle u, X_i - x_{j,k|v}\rangle\,\langle u, x_{j,k|v} - \hat x_{j,k|u}\rangle\,\mathbf 1\{X_i \in E_{j,k|u}\}\Big|$$
$$+ \Big|\frac{1}{n_{j,k|u}}\sum_i \langle u - v, X_i - x_{j,k|v}\rangle\,\langle u, X_i - x_{j,k|v}\rangle\,\mathbf 1\{X_i \in E_{j,k|u}\}\Big| + \Big|\frac{1}{n_{j,k|u}}\sum_i \langle v, X_i - x_{j,k|v}\rangle\,\langle u - v, X_i - x_{j,k|v}\rangle\,\mathbf 1\{X_i \in E_{j,k|u}\}\Big|$$
$$+ |\tilde s_{j,k|u} - s_{j,k|u}| + |s_{j,k|u} - s_{j,k|v}|.$$
Using the Bernstein inequality (with (E1) and (E2)), condition (E3), and Lemma B.5, we obtain $|\hat s_{j,k|u} - s_{j,k|v}| \lesssim t\,r\,2^{-2j}$, and hence $\hat s_{j,k|u} \gtrsim 2^{-2j}$ by (R), with probability higher than $1 - C\exp(-c\,n\,\varepsilon^2/(K_j\,t^2\,|f|^2_{C^1}))$.
(e) Follows from [6, Theorem 3.1], (E1) and (E2).
(f) Follows by the Bernstein inequality (with (E1) and (E2)).
(g) Follows by the Bernstein inequality (with (E1), (E2)) and condition (E3).

We can now work to establish the main bound of Proposition 4. Let $x \in B(0,r)$, and let $k$ be the index such that $\langle\hat v,x\rangle \in I_{j,k}$.
Then

$\hat f_{j|\hat v}(\langle\hat v, x\rangle) - \hat f_{j|v}(\langle\hat v, x\rangle) = \hat f_{j,k|\hat v}(\langle\hat v, x\rangle) - \hat f_{j,k|v}(\langle\hat v, x\rangle),$

where

$\hat f_{j,k|u}(t) = \hat y_{j,k|u} + \hat\beta_{j,k|u}\big(t - \langle u, \hat x_{j,k|u}\rangle\big)$

defines our local empirical estimator with respect to the oracle direction ($u = v$) and the estimated direction ($u = \hat v$). Let us separate the constant and the linear components:

$|\hat f_{j,k|\hat v}(\langle\hat v, x\rangle) - \hat f_{j,k|v}(\langle\hat v, x\rangle)| \le C + L,$
$C = |\hat y_{j,k|\hat v} - \hat y_{j,k|v}|,$
$L = |\hat\beta_{j,k|\hat v}\langle\hat v, x - \hat x_{j,k|\hat v}\rangle - \hat\beta_{j,k|v}(\langle\hat v, x\rangle - \langle v, \hat x_{j,k|v}\rangle)|.$

We first approach the constant part. We have

$C \le |\hat\zeta_{j,k|\hat v}| + |\tilde y_{j,k|\hat v} - y_{j,k|\hat v}| + |y_{j,k|\hat v} - y_{j,k|v}| + |y_{j,k|v} - \tilde y_{j,k|v}| + |\hat\zeta_{j,k|v}|.$
Now, $P\{|\hat\zeta_{j,k|u}| > \varepsilon/\sqrt{K_j\,\rho(E_{j,k|u})}\} \lesssim \exp(-c\,n\varepsilon^2/(K_j\sigma^2))$ and $P\{|\tilde y_{j,k|u} - y_{j,k|u}| > \varepsilon/\sqrt{K_j\,\rho(E_{j,k|u})}\} \lesssim \exp(-c\,n\varepsilon^2/(K_j|f|_{C^\alpha}^2))$ by Lemma A.1(a) and (b). In view of [38, Proposition 1.2.6] (bounding $K_\alpha$ in terms of $W_1$) and the Kantorovich–Rubinstein duality for $W_1$, we have

$|y_{j,k|\hat v} - y_{j,k|v}| \le |f|_{C^\alpha}\,K_\alpha\big(\rho_v(\cdot \mid E_{j,k|\hat v}), \rho_v(\cdot \mid E_{j,k|v})\big) \le |f|_{C^\alpha}\big(W_1(\rho_v(\cdot \mid E_{j,k|\hat v}), \rho_v(\cdot \mid E_{j,k|v}))\big)^{\alpha} \lesssim |f|_{C^\alpha}\,r^{\alpha}\|\hat v - v\|^{\alpha},$

where in the last inequality we have used Lemma B.5.

We now take care of the linear part, for which we can assume $\alpha = 1$. We split $L \le L_1 + L_2 + L_3$, where

$L_1 = |\hat\beta_{j,k|v}|\,|\langle v - \hat v, x\rangle|$
$L_2 = |\hat\beta_{j,k|\hat v}|\,|\langle\hat v, x - \hat x_{j,k|\hat v}\rangle - \langle v, x - \hat x_{j,k|v}\rangle|$
$L_3 = |\hat\beta_{j,k|\hat v} - \hat\beta_{j,k|v}|\,|\langle v, x - \hat x_{j,k|v}\rangle|.$

We have

$|\hat\beta_{j,k|u}| \le \hat s_{j,k|u}^{-1}\big(|\hat q_{j,k|u}| + |\hat z_{j,k|u}|\big)$
$|\hat\beta_{j,k|\hat v} - \hat\beta_{j,k|v}| \le \hat s_{j,k|\hat v}^{-1}\,|\hat q_{j,k|\hat v} - \hat q_{j,k|v}| + \hat s_{j,k|\hat v}^{-1}\hat s_{j,k|v}^{-1}\,|\hat s_{j,k|\hat v} - \hat s_{j,k|v}|\,|\hat q_{j,k|v}| + \hat s_{j,k|\hat v}^{-1}|\hat z_{j,k|\hat v}| + \hat s_{j,k|v}^{-1}|\hat z_{j,k|v}|.$

Using Lemma A.1(d), (c) and (e), we get

$|\hat\beta_{j,k|v}| \le |f|_{C^\alpha} + t^{-2}(r2^{-j})^{-1}\varepsilon/\sqrt{K_j\,\rho(E_{j,k|\hat v})}$
$|\hat\beta_{j,k|\hat v}| \le t|f|_{C^\alpha} + t^{-2}(r2^{-j})^{-1}\varepsilon/\sqrt{K_j\,\rho(E_{j,k|\hat v})}$
$|\hat\beta_{j,k|\hat v} - \hat\beta_{j,k|v}| \lesssim t^{-1}(r2^{-j})^{-2}\,|\hat q_{j,k|\hat v} - \hat q_{j,k|v}| + t^{-2}(r2^{-j})^{-2}\,|f|_{C^\alpha}\,|\hat s_{j,k|\hat v} - \hat s_{j,k|v}| + t^{-2}(r2^{-j})^{-1}\varepsilon/\sqrt{K_j\,\rho(E_{j,k|\hat v})}$

with probability higher than $1 - C\exp(-c\,n\varepsilon^2/(K_j t^2|f|_{C^\alpha}^2))$. Hence

$L_1 \lesssim |f|_{C^\alpha}\,r\|\hat v - v\| + \varepsilon/\sqrt{K_j\,\rho(E_{j,k|\hat v})}$
$L_2 \le t|f|_{C^\alpha}\big(|\langle\hat v - v, x\rangle| + |\langle v, \hat x_{j,k|v} - \hat x_{j,k|\hat v}\rangle| + |\langle v - \hat v, \hat x_{j,k|\hat v}\rangle|\big) + t^{-2}(r2^{-j})^{-1}\varepsilon/\sqrt{K_j\,\rho(E_{j,k|\hat v})}\,\big(|\langle\hat v, x - \hat x_{j,k|\hat v}\rangle| + |\langle v, x - \hat x_{j,k|v}\rangle|\big) \lesssim t|f|_{C^\alpha}\big(r\|\hat v - v\| + |\langle v, \hat x_{j,k|v} - \hat x_{j,k|\hat v}\rangle|\big) + \varepsilon/\sqrt{K_j\,\rho(E_{j,k|\hat v})}$
$L_3 \lesssim (r2^{-j})^{-1}|\hat q_{j,k|\hat v} - \hat q_{j,k|v}| + |f|_{C^\alpha}(r2^{-j})^{-1}|\hat s_{j,k|\hat v} - \hat s_{j,k|v}| + \varepsilon/\sqrt{K_j\,\rho(E_{j,k|\hat v})}$

with probability higher than $1 - C\exp(-c\,n\varepsilon^2/(K_j t^2|f|_{C^\alpha}^2))$.
Now,

$|\langle v, \hat x_{j,k|\hat v} - \hat x_{j,k|v}\rangle| \le |\langle v, \hat x_{j,k|\hat v} - x_{j,k|\hat v}\rangle| + |\langle v, x_{j,k|\hat v} - x_{j,k|v}\rangle| + |\langle v, x_{j,k|v} - \hat x_{j,k|v}\rangle|,$

where, by Lemma A.1(f),

$|\langle v, \hat x_{j,k|u} - x_{j,k|u}\rangle| \le t^{-1}|f|_{C^\alpha}^{-1}\varepsilon/\sqrt{K_j\,\rho(I_{j,k|u})}$

with probability higher than $1 - C\exp\big(-c\,n\varepsilon^2/(K_j t^2|f|_{C^\alpha}^2)\big)$, and, thanks to Lemma B.5,

$|\langle v, x_{j,k|\hat v} - x_{j,k|v}\rangle| \le W_1\big(\rho_v(\cdot \mid E_{j,k|\hat v}), \rho_v(\cdot \mid E_{j,k|v})\big) \lesssim r\|\hat v - v\|.$

We are now left to estimate $|\hat q_{j,k|\hat v} - \hat q_{j,k|v}|$ and $|\hat s_{j,k|\hat v} - \hat s_{j,k|v}|$. First, we break down $|\hat q_{j,k|\hat v} - \hat q_{j,k|v}| \le \sum_{a=1}^{8} Q_a$, where

$Q_1 = \big|\hat{\mathbb{E}}_{j,k|\hat v}\big[\langle\hat v, x_{j,k|v} - \hat x_{j,k|\hat v}\rangle\,(F(X_i) - \tilde y_{j,k|\hat v})\,\mathbb{1}\{X_i \in E_{j,k|\hat v}\}\big]\big|$
$Q_2 = \big|\hat{\mathbb{E}}_{j,k|\hat v}\big[\langle\hat v - v, X_i - x_{j,k|v}\rangle\,(F(X_i) - \tilde y_{j,k|\hat v})\,\mathbb{1}\{X_i \in E_{j,k|\hat v}\}\big]\big|$
$Q_3 = \big|\hat{\mathbb{E}}_{j,k|\hat v}\big[\langle v, X_i - x_{j,k|v}\rangle\,(y_{j,k|v} - \tilde y_{j,k|\hat v})\,\mathbb{1}\{X_i \in E_{j,k|\hat v}\}\big]\big|$
$Q_4 = |\tilde q_{j,k|\hat v} - q_{j,k|\hat v}|$
$Q_5 = |q_{j,k|\hat v} - q_{j,k|v}|$
$Q_6 = |q_{j,k|v} - \tilde q_{j,k|v}|$
$Q_7 = \big|\hat{\mathbb{E}}_{j,k|v}\big[\langle v, \hat x_{j,k|v} - x_{j,k|v}\rangle\,(F(X_i) - y_{j,k|v})\,\mathbb{1}\{X_i \in E_{j,k|v}\}\big]\big|$
$Q_8 = \big|\hat{\mathbb{E}}_{j,k|v}\big[\langle v, X_i - \hat x_{j,k|v}\rangle\,(\tilde y_{j,k|v} - y_{j,k|v})\,\mathbb{1}\{X_i \in E_{j,k|v}\}\big]\big|.$

We bound the terms $Q_a$ as follows.

$Q_1$. We have

$Q_1 \le |\langle\hat v, x_{j,k|v} - \hat x_{j,k|\hat v}\rangle|\,\hat{\mathbb{E}}_{j,k|\hat v}\big[|F(X_i) - \tilde y_{j,k|\hat v}|\,\mathbb{1}\{X_i \in E_{j,k|\hat v}\}\big]$

with $|F(X_i) - \tilde y_{j,k|\hat v}| \lesssim t|f|_{C^\alpha}\,r2^{-j}$. Hence

$(r2^{-j})^{-1}Q_1 \lesssim t|f|_{C^\alpha}\,|\langle\hat v, \hat x_{j,k|\hat v} - x_{j,k|v}\rangle| \le t|f|_{C^\alpha}\big(|\langle v, \hat x_{j,k|\hat v} - x_{j,k|\hat v}\rangle| + |\langle v, x_{j,k|\hat v} - x_{j,k|v}\rangle| + r\|\hat v - v\|\big),$

where

$P\big\{|\langle v, \hat x_{j,k|\hat v} - x_{j,k|\hat v}\rangle| > t^{-1}|f|_{C^\alpha}^{-1}\varepsilon/\sqrt{K_j\,\rho(E_{j,k|\hat v})}\big\} \lesssim \exp\big(-c\,n\varepsilon^2/(K_j t^2|f|_{C^\alpha}^2 r^2)\big)$

by Lemma A.1(f), and

$|\langle v, x_{j,k|\hat v} - x_{j,k|v}\rangle| \le W_1\big(\rho_v(\cdot \mid E_{j,k|\hat v}), \rho_v(\cdot \mid E_{j,k|v})\big) \lesssim r\|\hat v - v\|$

by Lemma B.5.

$Q_2$. We have $Q_2 \lesssim r2^{-j}\,t|f|_{C^\alpha}\,r\|\hat v - v\|$, hence $(r2^{-j})^{-1}Q_2 \lesssim t|f|_{C^\alpha}\,r\|\hat v - v\|$.

$Q_3$. We have

$Q_3 \le \hat{\mathbb{E}}_{j,k|\hat v}\big[|\langle v, X_i - x_{j,k|v}\rangle|\,\mathbb{1}\{X_i \in E_{j,k|\hat v}\}\big]\,|y_{j,k|v} - \tilde y_{j,k|\hat v}|$

with $|\langle v, X_i - x_{j,k|v}\rangle| \lesssim t\,r2^{-j}$.
Hence

$(r2^{-j})^{-1}Q_3 \lesssim t\,|\tilde y_{j,k|\hat v} - y_{j,k|v}| \le t\big(|\tilde y_{j,k|\hat v} - y_{j,k|\hat v}| + |y_{j,k|\hat v} - y_{j,k|v}|\big),$

where

$P\big\{|\tilde y_{j,k|\hat v} - y_{j,k|\hat v}| > t^{-1}\varepsilon/\sqrt{K_j\,\rho(I_{j,k|\hat v})}\big\} \le C\exp\big(-c\,n\varepsilon^2/(K_j t^2|f|_{C^\alpha}^2)\big)$

by Lemma A.1(b), and

$|y_{j,k|\hat v} - y_{j,k|v}| \le |f|_{C^\alpha}\,W_1\big(\rho(\cdot \mid I_{j,k|\hat v}), \rho(\cdot \mid I_{j,k|v})\big) \lesssim |f|_{C^\alpha}\,r\|\hat v - v\|$

by Lemma B.5.

$Q_4$ and $Q_6$. We apply Lemma A.1(g).

$Q_5$. We have

$Q_5 = \left|\int g(\tau)\,\big(d\rho_v(\tau \mid E_{j,k|\hat v}) - d\rho_v(\tau \mid E_{j,k|v})\big)\right|,$

where $g(\tau) = (\tau - \langle v, x_{j,k|v}\rangle)(f(\tau) - y_{j,k|v})$ is Lipschitz of constant $\lesssim t|f|_{C^\alpha}\,r2^{-j}$:

$|g(\tau) - g(\tau')| \le |\tau - \tau'|\,|f(\tau) - y_{j,k|v}| + |\tau' - \langle v, x_{j,k|v}\rangle|\,|f(\tau) - f(\tau')| \lesssim t|f|_{C^\alpha}\,r2^{-j}\,|\tau - \tau'|.$

Thus, by Lemma B.5,

$(r2^{-j})^{-1}Q_5 \lesssim t|f|_{C^\alpha}\,W_1\big(\rho_v(\cdot \mid E_{j,k|\hat v}), \rho_v(\cdot \mid E_{j,k|v})\big) \lesssim t|f|_{C^\alpha}\,r\|\hat v - v\|.$

$Q_7$. We apply Lemma A.1(f) to $Q_7 \le |\langle v, \hat x_{j,k|v} - x_{j,k|v}\rangle|\,|f|_{C^\alpha}\,r2^{-j}$.

$Q_8$. We apply Lemma A.1(b) to $Q_8 \le r2^{-j}\,|\tilde y_{j,k|v} - y_{j,k|v}|$.

The quantity $|\hat s_{j,k|\hat v} - \hat s_{j,k|v}|$ can be estimated through an analogous decomposition. We can finally put all the terms together and take the union bound over the $K_j$ cells, which completes the proof.

Appendix B: Proofs of technical results
In our proofs, we make use of the following lemma to ensure that we have enough local samples, or to concentrate the empirical measure around the underlying distribution.
Lemma B.1.
Let $X$ be a random variable, and let $X_1, \dots, X_n$ be independent copies of $X$. Given a measurable set $E$, define $\rho(E) = P\{X \in E\}$ and $\hat\rho(E) = n^{-1}\sum_i \mathbb{1}\{X_i \in E\}$. Then

$P\{|\hat\rho(E) - \rho(E)| > t\} \le 2\exp\left(-\frac{nt^2}{2\rho(E) + 2t/3}\right).$

In particular, for $t = \rho(E)/2$ we have

$P\left\{\hat\rho(E) \notin \left[\tfrac{1}{2}\rho(E), \tfrac{3}{2}\rho(E)\right]\right\} \le P\left\{|\hat\rho(E) - \rho(E)| > \tfrac{1}{2}\rho(E)\right\} \le 2\exp\left(-\tfrac{3}{28}\,n\rho(E)\right).$

Proof.
The bound follows by a direct application of the Bernstein inequality to the random variables $\mathbb{1}\{X_i \in E\}$.
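As a quick numerical illustration of Lemma B.1 (a sketch, not part of the original analysis), the following Monte Carlo experiment compares the empirical deviation probability with the Bernstein bound; the Gaussian law of $X$ and the set $E = [0, 1)$ are arbitrary test choices.

```python
import numpy as np
from math import erf, sqrt, exp

# Monte Carlo check of Lemma B.1 (illustrative): X ~ N(0,1), E = [0, 1).
rng = np.random.default_rng(1)
n, trials = 500, 20000
rho = 0.5 * erf(1 / sqrt(2))                    # P{X in [0,1)} = Phi(1) - Phi(0)

X = rng.standard_normal((trials, n))
rho_hat = ((X >= 0.0) & (X < 1.0)).mean(axis=1) # empirical measure of E

t = rho / 2
empirical = np.mean(np.abs(rho_hat - rho) > t)
bound = 2 * exp(-n * t**2 / (2 * rho + 2 * t / 3))
print(f"empirical tail {empirical:.1e} <= Bernstein bound {bound:.1e}")
```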
When working with possibly unbounded distributions, we need some control on their tails. A common choice is to assume sub-Gaussian decay. We recall that a random variable $X$ is sub-Gaussian with variance proxy $R^2$ if

$P\{|X| > t\} \le 2\exp\left(-\frac{t^2}{2R^2}\right).$

A random vector $X \in \mathbb{R}^d$ is sub-Gaussian if $\langle u, X\rangle$ is sub-Gaussian for every $u \in S^{d-1}$. In particular, bounded and normal distributions are sub-Gaussian.

Lemma B.2. Let $X \in \mathbb{R}^d$ be a sub-Gaussian vector with variance proxy $R^2$. Then

$P\{\|X\| > t\} \le 2\exp\left(-\frac{t^2}{2dR^2}\right).$
Proof.
Let $X_k$ be the $k$-th coordinate of $X$. By the generalized Hölder inequality,

$\mathbb{E}\left[\exp\left(\frac{\|X\|^2}{2dR^2}\right)\right] = \mathbb{E}\left[\prod_{k=1}^d \exp\left(\frac{|X_k|^2}{2dR^2}\right)\right] \le \left(\prod_{k=1}^d \mathbb{E}\left[\exp\left(\frac{|X_k|^2}{2R^2}\right)\right]\right)^{1/d} \le 2.$

The result follows from [50, Proposition 2.5.2].
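A quick numerical check of Lemma B.2 (again an illustrative sketch, not from the paper): for $X \sim N(0, I_d)$, $R = 1$ is a valid variance proxy in the sense recalled above, and the empirical tail of $\|X\|$ can be compared with the stated bound.

```python
import numpy as np

# Sanity check of Lemma B.2 for X ~ N(0, I_d) with variance proxy R = 1
# (an illustrative choice, not prescribed by the paper).
rng = np.random.default_rng(2)
d, trials, R = 10, 100000, 1.0
norms = np.linalg.norm(rng.standard_normal((trials, d)), axis=1)
for t in (4.0, 6.0, 8.0):
    empirical = np.mean(norms > t)
    bound = 2 * np.exp(-t**2 / (2 * d * R**2))
    print(f"t={t}: empirical {empirical:.3e} <= bound {bound:.3e}")
```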
The lemma below shows that most samples from a $d$-dimensional sub-Gaussian distribution with variance proxy $R^2$ fall into a ball of radius of order $\sqrt{d}\,R$.

Lemma B.3. Let $X_1, \dots, X_n$ be independent copies of a sub-Gaussian vector $X \in \mathbb{R}^d$ with variance proxy $R^2$. Then, for every $\alpha \ge 1$ and $\beta \in (0, 1)$,

$P\left\{\#\big\{i : X_i \in B\big(0, \sqrt{2d\log(2\alpha)}\,R\big)\big\} < (1 - \alpha^{-1})\beta n\right\} \le 2\exp\left(-\frac{3(1 - \beta)^2}{8 - 2\beta}\,(1 - \alpha^{-1})\,n\right).$

Proof.
Let $B = B(0, \sqrt{2d\log(2\alpha)}\,R)$, let $N_B = \#\{i : X_i \in B\}$, and let $\rho(B) = P\{X \in B\}$. Lemma B.2 gives

$\rho(B) \ge 1 - 2\exp(-\log(2\alpha)) = 1 - \alpha^{-1}.$

An application of Lemma B.1 with $t = (1 - \beta)\rho(B)$ yields

$P\big\{N_B < (1 - \alpha^{-1})\beta n\big\} \le P\big\{N_B < \beta\rho(B)n\big\} \le 2\exp\left(-\frac{3(1 - \beta)^2}{8 - 2\beta}\,\rho(B)\,n\right) \le 2\exp\left(-\frac{3(1 - \beta)^2}{8 - 2\beta}\,(1 - \alpha^{-1})\,n\right).$
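The count bound of Lemma B.3 can be checked in the same spirit; in the sketch below (illustrative, not from the paper) the values of $\alpha$ and $\beta$ are arbitrary test choices.

```python
import numpy as np

# Illustrative check of Lemma B.3 with X ~ N(0, I_d) and R = 1; alpha and
# beta are arbitrary test values, not prescribed by the paper.
rng = np.random.default_rng(3)
d, n, trials = 10, 200, 2000
alpha, beta, R = 4.0, 0.5, 1.0
radius = np.sqrt(2 * d * np.log(2 * alpha)) * R
threshold = (1 - 1 / alpha) * beta * n          # required number of points in the ball

X = rng.standard_normal((trials, n, d))
counts = (np.linalg.norm(X, axis=2) <= radius).sum(axis=1)
empirical = np.mean(counts < threshold)
bound = 2 * np.exp(-3 * (1 - beta)**2 / (8 - 2 * beta) * (1 - 1 / alpha) * n)
print(f"empirical {empirical:.2e} <= bound {bound:.2e}")
```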
We often carry out the following integration to obtain expectation bounds from bounds in probability.

Lemma B.4.
Let $X$ be a random variable. Suppose there are $p \in [1, 2]$, $a \ge e$ and $b > 0$ such that $P\{|X| > \varepsilon\} \le a e^{-b\varepsilon^p}$ for every $\varepsilon > 0$. Then

$\mathbb{E}|X| \le 2\left(\frac{\log a}{b}\right)^{1/p}.$

Proof.
Integrating over $\varepsilon > 0$ we get $\mathbb{E}|X| \le \int_0^{\varepsilon_0} d\varepsilon + \int_{\varepsilon_0}^{\infty} a e^{-b\varepsilon^p}\,d\varepsilon$ with $\varepsilon_0 = (\log a/b)^{1/p}$. The first integral is equal to $(\log a/b)^{1/p}$, while the substitution $b\varepsilon^p \mapsto \varepsilon$ in the second integral gives

$\frac{a}{p}\left(\frac{1}{b}\right)^{1/p}\int_{\log a}^{\infty} \varepsilon^{1/p - 1} e^{-\varepsilon}\,d\varepsilon \le a\left(\frac{1}{b}\right)^{1/p}\int_{\log a}^{\infty} e^{-\varepsilon}\,d\varepsilon = \left(\frac{1}{b}\right)^{1/p} \le \left(\frac{\log a}{b}\right)^{1/p},$

where we used $p \ge 1$, $\varepsilon \ge \log a \ge 1$, and $a e^{-\log a} = 1$.
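Lemma B.4 can also be verified numerically by sampling a variable whose tail matches $a e^{-b\varepsilon^p}$ exactly via inverse-transform sampling; in the sketch below (illustrative, not from the paper) $a$, $b$ and $p$ are arbitrary test values.

```python
import numpy as np

# Monte Carlo check of Lemma B.4: X is built so that P{|X| > eps} equals
# min(1, a * exp(-b * eps**p)) exactly; a, b, p are arbitrary test values.
rng = np.random.default_rng(4)
a, b, p = np.e**2, 1.5, 2.0
U = rng.uniform(size=1_000_000)
X = (np.log(a / U) / b) ** (1 / p)    # inverse-transform sampling of the tail
print(f"E|X| ~ {X.mean():.3f} <= 2*(log a / b)^(1/p) = {2*(np.log(a)/b)**(1/p):.3f}")
```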
In the proof of Proposition 4 we use the following bound on the first Wasserstein distance $W_1$ between two conditional distributions.

Lemma B.5. Let $\rho$ be a probability distribution on $\mathbb{R}^d$, $v, w \in S^{d-1}$, $P_u x = \langle u, x\rangle$, and let $I \subset \mathbb{R}$ be an interval with $\rho(P_u^{-1}I) > 0$ for $u \in \{v, w\}$.

(a) Suppose $\rho$ is spherical. Then

$W_1\big(\rho(x \mid P_v x \in I), \rho(x \mid P_w x \in I)\big) \lesssim \sin(\angle(v, w))\,\mathbb{E}_{X \sim \rho}[\|X\| \mid P_v X \in I].$

(b) Suppose $\rho$ has an upper bounded density and $\rho(P_v^{-1}I) \gtrsim |I|$, and let $P_u$ also denote the corresponding push-forward. Then

$W_1\big(P_v\rho(x \mid P_v x \in I), P_v\rho(x \mid P_w x \in I)\big) \lesssim \sin(\angle(v, w))\,\mathrm{diam}(\mathrm{supp}\,\rho).$

Proof.
Let $X$ be a random vector distributed according to $\rho$, and let $\theta = \angle(v, w)$. If $\rho$ is spherical, then $\rho(x \mid P_w x \in I) = \rho(Rx \mid P_v x \in I)$, where $R$ is a rotation of angle $\theta$. Hence, coupling the two conditional laws through $R$,

$W_1\big(\rho(x \mid P_v x \in I), \rho(x \mid P_w x \in I)\big) \le \int \|x - Rx\|\,d\rho(x \mid P_v x \in I) = 2\sin(\theta/2)\,\mathbb{E}[\|X\| \mid P_v X \in I].$

Assume now the absolutely continuous case. Let $\mu = P_v\rho(x \mid P_v x \in I)$, $\nu = P_v\rho(x \mid P_w x \in I)$. By virtue of [15, Teorema 1] ([16, Equation (1.2)]), we have

$W_1(\mu, \nu) = \int_{-\infty}^{+\infty} |M(t) - N(t)|\,dt$

with $M(t) = P\{P_v X \le t \mid P_v X \in I\}$, $N(t) = P\{P_v X \le t \mid P_w X \in I\}$. Let $J' = P_v(\partial P_w^{-1}I \cap \mathrm{supp}\,\rho)$, where $\partial$ denotes the boundary, and let $J = I \setminus J'$. Then

$\int_{-\infty}^{+\infty} |M(t) - N(t)|\,dt \le |J'| + \int_J |M(t) - N(t)|\,dt.$

The first term is bounded as $|J'| \lesssim \sin(\theta)\,\mathrm{diam}(\mathrm{supp}\,\rho)$. For the second term, define

$I_u = P_u^{-1}I, \qquad I_{u,t} = P_v^{-1}(-\infty, t] \cap P_u^{-1}I, \qquad u \in \{v, w\},$

and note that $\rho(I_{u,t}) \le \rho(I_u)$. Then

$|M(t) - N(t)| = \left|\frac{\rho(I_{v,t})}{\rho(I_v)} - \frac{\rho(I_{w,t})}{\rho(I_w)}\right| \le \frac{\rho(I_w)\,|\rho(I_{v,t}) - \rho(I_{w,t})| + |\rho(I_w) - \rho(I_v)|\,\rho(I_{w,t})}{\rho(I_v)\,\rho(I_w)} \le \rho(I_v)^{-1}\big(|\rho(I_{v,t}) - \rho(I_{w,t})| + |\rho(I_w) - \rho(I_v)|\big).$

Denoting $A \triangle B = (A \cup B) \setminus (A \cap B)$, we have

$|\rho(I_{v,t}) - \rho(I_{w,t})| \le \rho(I_{v,t} \triangle I_{w,t}) \le \rho(I_v \triangle I_w),$
$|\rho(I_w) - \rho(I_v)| \le \rho(I_w \triangle I_v) = \rho(I_v \triangle I_w) \lesssim \sin(\theta)\,\mathrm{diam}(\mathrm{supp}\,\rho).$

The claim now follows from the assumption $\rho(I_v) \gtrsim |I| \ge |J|$, which gives

$\int_J |M(t) - N(t)|\,dt \lesssim |J|\,|I|^{-1}\sin(\theta)\,\mathrm{diam}(\mathrm{supp}\,\rho) \le \sin(\theta)\,\mathrm{diam}(\mathrm{supp}\,\rho).$
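To illustrate Lemma B.5(b), the following sketch (not part of the paper) estimates $W_1$ between the two projected conditional laws through the one-dimensional CDF formula $W_1(\mu, \nu) = \int |M(t) - N(t)|\,dt$ used in the proof; the Gaussian $\rho$, the interval $I$ and the angle $\theta$ are arbitrary test choices.

```python
import numpy as np

# W_1 between P_v rho(. | P_v x in I) and P_v rho(. | P_w x in I), estimated
# via empirical CDFs and the formula W_1 = int |M(t) - N(t)| dt.
rng = np.random.default_rng(5)
d, n = 2, 500_000
theta = 0.05                                   # angle between v and w
v = np.array([1.0, 0.0])
w = np.array([np.cos(theta), np.sin(theta)])
I = (0.2, 0.7)

X = rng.standard_normal((n, d))                # spherical rho (Gaussian)
pv, pw = X @ v, X @ w
A = pv[(pv >= I[0]) & (pv <= I[1])]            # P_v X given P_v X in I
B = pv[(pw >= I[0]) & (pw <= I[1])]            # P_v X given P_w X in I

ts = np.linspace(-4, 4, 4001)
M = np.searchsorted(np.sort(A), ts) / len(A)   # empirical CDF M(t)
N = np.searchsorted(np.sort(B), ts) / len(B)   # empirical CDF N(t)
w1 = np.abs(M - N).sum() * (ts[1] - ts[0])     # rectangle-rule integral
print(f"W1 ~ {w1:.4f}, sin(theta) = {np.sin(theta):.4f}")
```

As predicted by the lemma, the estimated distance shrinks proportionally to $\sin(\theta)$ as the angle between the two directions decreases.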