Recovery of regular ridge functions on the ball
Tatyana Zaitseva*, Yuri Malykhin†, Konstantin Ryutin‡

March 1, 2021
Abstract
We consider the problem of the uniform (in $L_\infty$) recovery of ridge functions $f(x) = \varphi(\langle a, x \rangle)$, $x \in B^n$, using noisy evaluations $y_1 \approx f(x^1), \ldots, y_N \approx f(x^N)$. It is known that for classes of functions $\varphi$ of finite smoothness the problem suffers from the curse of dimensionality: in order to provide good accuracy for the recovery it is necessary to make an exponential number of evaluations. We prove that if $\varphi$ is analytic in a neighborhood of $[-1, 1]$ and the noise is small, then there is an efficient algorithm that recovers $f$ with good accuracy using $\asymp n \log^2 n$ function evaluations.

Ridge functions recovery.
We consider ridge functions (plane waves) on the ball $B^n = \{x \in \mathbb{R}^n \colon |x| \le 1\}$, i.e. functions of the form $f(x) = \varphi(\langle a, x \rangle)$, where $a \in \mathbb{R}^n$, $|a| = 1$, is a fixed vector and $\varphi$ is a function on $[-1, 1]$ (here $\langle x, y \rangle$ is the scalar product on $\mathbb{R}^n$). The problem is to recover $f$ from its $N$ noisy values at some points: $y_1 \approx f(x^1), \ldots, y_N \approx f(x^N)$.

* Moscow Center for Fundamental and Applied Mathematics, Laboratory “High-dimensional Approximation and Applications” of Lomonosov Moscow State University. Email: [email protected]
† Steklov Mathematical Institute, Laboratory “High-dimensional Approximation and Applications” of Lomonosov Moscow State University. Email: [email protected]
‡ Moscow Center for Fundamental and Applied Mathematics, Laboratory “High-dimensional Approximation and Applications” of Lomonosov Moscow State University. Email: [email protected]

The recovery algorithm is allowed to choose the points $x^1, \ldots, x^N$ for function evaluation; the algorithm is called adaptive if the point $x^{i+1}$ may depend on the previously obtained values $y_1, \ldots, y_i$. When we say that our algorithm makes an evaluation of the unknown function, we mean that it prescribes the point and receives the approximate value of the function at this point.

The difficulty is that when $\log N = o(n)$, for any points $x^1, \ldots, x^N \in B^n$ there exists a unit vector $a$ such that $\max_j |\langle a, x^j \rangle| = o(1)$. Therefore we can measure $\varphi$ only in a small neighborhood of zero, and we cannot distinguish a nontrivial function supported outside this neighborhood from the identically zero function.
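This concentration phenomenon is easy to observe numerically: already a random unit direction $a$ is nearly orthogonal to every point of a large fixed sample at once. The sketch below is purely illustrative (the dimension, sample size and the choice of random points are arbitrary, not part of any construction in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 1000, 10_000                     # dimension and number of points (arbitrary)

X = rng.standard_normal((N, n))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # N points on the unit sphere in B^n

a = rng.standard_normal(n)
a /= np.linalg.norm(a)                  # a random unit direction

# <a, x_j> is approximately N(0, 1/n), so the largest |<a, x_j>| over the
# N points is only of order sqrt(2 log N / n): o(1) when log N = o(n).
print(float(np.max(np.abs(X @ a))), float(np.sqrt(2 * np.log(N) / n)))
```

All inner products are tiny, so any measurement scheme of subexponential size sees $\varphi$ only near zero.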
If one knows only that $\varphi$ is smooth, then any recovery method requires an exponential (with respect to the dimension) number of evaluations.

In the theory of function recovery one studies the number of function evaluations sufficient to recover the unknown function with a given accuracy. If its dependence on the dimension is exponential, one speaks of the curse of dimensionality. Results on the existence of recovery algorithms with a small number of evaluations (say, polynomially depending on the dimension of the space) or, conversely, results showing that any algorithm requires a large (exponential) number of evaluations, are actively studied within Information-Based Complexity theory. See [NW16] for some statements and results in this direction.

For $r > 0$ let $\mathrm{Lip}(r)$ be the class of functions $g \colon [-1, 1] \to \mathbb{R}$ with $m = \lceil r - 1 \rceil$ continuous derivatives on $[-1, 1]$, such that $\|g^{(j)}\|_\infty \le 1$, $j = 0, \ldots, m$, and $g^{(m)}$ is $(r - m)$-Hölder with constant 1.

The recovery in $L_\infty$ of smooth ridge functions on the cube,
$$f(x) = \varphi(\langle a, x \rangle), \quad x \in [-1, 1]^n, \quad \|a\|_1 = 1, \quad \varphi \colon [-1, 1] \to \mathbb{R}, \qquad (1)$$
was studied by A. Cohen, I. Daubechies, R. DeVore, G. Kerkyacharian and D. Picard in [CDD12]. The authors assume that $\varphi \in \mathrm{Lip}(r)$ and that the vector $a$ is nonnegative: $a_i \ge 0$, $i = 1, \ldots, n$. This condition is very restrictive: it permits one to recover $\varphi(t)$ at any point $t \in [-1, 1]$ as $\varphi(t) = f(x_t)$, $x_t := (t, t, \ldots, t)$. The most interesting results in [CDD12] were obtained under the additional restriction that $a$ is bounded in the weak-$\ell_p^n$ norm, $p \in (0, 1)$. One can replace the weak-$\ell_p$ condition by the following:
$$\|a\|_p \le M \quad \text{for some } p \in (0, 1) \text{ and } M \ge 1. \qquad (2)$$
Thus the class $\mathcal{R}(r, p, M)$ of ridge functions $f$ on the cube, (1), with $\varphi \in \mathrm{Lip}(r)$ and vector $a$ satisfying (2) arises. The algorithm from [DM21] gives an approximation $\widetilde{f}$ such that
$$\sup_{f \in \mathcal{R}(r, p, M)} \bigl(\mathbb{E}\|f - \widetilde{f}\|_\infty^2\bigr)^{1/2} \le C(r, p, M)(\log n)^{-r(1/p - 1)}$$
for $N \ge n$. On the other hand, it was proven that without (2) we have the curse of dimensionality: a similar recovery error for the class of ridge functions of the form (1) with $\varphi \in \mathrm{Lip}(r)$ is at least some $\varepsilon > 0$ whenever $N \le C\exp(n/C)$.

In this paper we consider ridge functions on the ball:
$$f(x) = \varphi(\langle a, x \rangle), \quad |a| = 1, \quad x \in B^n.$$
The corresponding recovery problem, even in a more general setting, was considered by M. Fornasier, K. Schnass and J. Vybiral in [FSV12]. They aim to recover functions of the form $f(x) = \varphi(Ax)$, where $A$ is some fixed $k \times n$ matrix and $\varphi$ is a function of $k$ variables. Let us restrict ourselves to the case $k = 1$ (ridge functions). The authors of [FSV12] introduce the following quantity:
$$\alpha := \int_{S^{n-1}} |\varphi'(\langle a, x \rangle)|^2\, d\mu_{S^{n-1}}(x) \qquad (3)$$
(it does not depend on $a$). They proved that if $a$ satisfies (2) (the case $p = 1$ is possible), $\varphi \in C^2$ and $\alpha$ is not close to zero (e.g. $|\varphi'(0)|$ is separated from zero), then efficient recovery of $f$ is possible. Morally speaking, $N \asymp (\omega\alpha)^{-2}$ evaluations are sufficient in order to achieve the error $\le \omega$ with high probability (when $p = 1$). Later, H. Tyagi and V. Cevher in [TC14] managed to get rid of the condition (2).

We remark that for the functions considered in our paper the quantity $\alpha$ can be very small. Given the admissible error $\omega$, the class of analytic functions contains a function with oscillation more than $\omega$ but with $\alpha \le \exp(-c\log(1/\omega)\log n)$.

In [MUV15] different approximation characteristics for several classes of ridge functions on the ball were estimated. As a corollary, a formal proof of the curse of dimensionality for the corresponding recovery problem was given.

A closely related problem in statistics is known as “single index model” regression. We briefly describe its formulation. Suppose $X$ is a random vector in $\mathbb{R}^n$ and $Y$ is a random variable. It is assumed that the regression function $f(x) = \mathbb{E}(Y \mid X = x)$ has the form $f(x) = g(\langle a, x \rangle)$, i.e., it is a ridge function; in the language of statistics such an assumption is called the single index model. It is required to construct a regression, i.e., to approximate $f$, from a sample $\{(X_i, Y_i)\}_{i=1}^N$. We note the following important distinctions between the regression problem and our problem: (1) we are not free to choose the $X_i$; these points are a random sample from a distribution $\mu_X$ unknown to us; (2) the error is measured in the $L_2(\mu_X)$-norm rather than in the uniform norm. See [GL07] for the statistical setting.

In the case $\varphi(u) = u^2$ our problem is deeply related to the Phase Retrieval Problem.

Regular ridge functions.
S. V. Konyagin (personal communication) suggested considering the case of regular ridge functions on the ball, i.e. $f(x) = \varphi(\langle a, x \rangle)$ with regular (analytic) $\varphi$.

The recovery of classes of analytic functions has been actively studied since the 1960s. The main focus was on the best possible method. Details about these results can be found in [Os00]. In our paper we apply the recovery method and error estimates from [DT19]: an analytic function is extrapolated from its noisy evaluations on a grid using a least-squares fitted algebraic polynomial.

Let $H(\sigma, Q)$ be the class of functions $\varphi$ analytic in $\Pi_\sigma = \{z = t + iy \colon |t| \le 1 + \sigma,\ |y| \le \sigma\}$, such that $|\varphi(z)| \le Q$ in $\Pi_\sigma$ and $\varphi(z)$ is real-valued for real $z$. For this class we do have a polynomial recovery algorithm, provided the evaluation errors are sufficiently small.

Given a class $\Phi$ of functions $\varphi \colon [-1, 1] \to \mathbb{R}$, we denote by $\mathcal{R}(\Phi, B^n)$ the class of ridge functions $f \colon B^n \to \mathbb{R}$ of the form $f(x) = \varphi(\langle a, x \rangle)$, where $a \in \mathbb{R}^n$, $|a| = 1$, and $\varphi \in \Phi$.

Theorem.
Let $n \in \mathbb{N}$, $\sigma \in (0, 1)$, $Q \ge 1$, $\delta_* \in (0, 1/2)$, $\omega_* \in (0, 1/2)$. There is a probabilistic adaptive algorithm: for any function $f \in \mathcal{R}(H(\sigma, Q), B^n)$ it uses $N$ evaluations of $f$ with errors not exceeding $\varepsilon$, where
$$\varepsilon := Q\exp\bigl(-C(\sigma)\log(Qn/\omega_*)\log n\bigr), \qquad N := \bigl\lceil C(\sigma)\log^2(Qn/\omega_*)\bigl(\log(1/\delta_*) + n\bigr)\bigr\rceil,$$
and outputs an approximation $\widetilde{f}$ such that with probability $\ge 1 - \delta_*$ we have
$$\max_{x \in B^n} |\widetilde{f}(x) - f(x)| \le \omega_*.$$
We remark that $\widetilde{f}$ is of the form $\widetilde{f}(x) = \widetilde{\varphi}(\langle \widetilde{a}, x \rangle)$, with $\widetilde{\varphi}$ an algebraic polynomial of degree not exceeding $C(\sigma)\log(Q/\omega_*)$. The number of operations of the algorithm depends polynomially on $n$, $Q/\omega_*$ and $\log(1/\delta_*)$ for fixed $\sigma$.

The function $f$ is invariant under the substitution $(a, \varphi(t)) \mapsto (-a, \varphi(-t))$. We recover either $a$ and $\varphi(t)$, or $-a$ and $\varphi(-t)$.

Corollary.
Let $\kappa > 0$. There is a polynomial algorithm that uses at most $C(\sigma, \kappa)\, n\log^2 n$ evaluations of an unknown function $f \in \mathcal{R}(H(\sigma, Q), B^n)$ with errors $\le Q\exp(-C(\sigma, \kappa)\log^2 n)$ and outputs an approximation $\widetilde{f}$ such that $\|\widetilde{f} - f\|_{L_\infty(B^n)} \le Qn^{-\kappa}$ with probability $\ge 1 - e^{-n}$.

We did not try to optimize all the steps of the algorithm. Our main point is the possibility of an effective (polynomial-complexity) recovery algorithm for regular ridge functions. We also developed a new method involving global properties of functions (the so-called embedding) and order statistics. In Section 3 we discuss the results of a numerical implementation of the algorithm.
Notations and useful facts.
We shall freely use the fact that $n$ is large. Additionally we suppose $\omega_*$ and $\delta_*$ to be small enough.

By homogeneity we may further assume that $Q = 1$. For the rest of the paper $\varphi$ denotes some unknown function from $H(\sigma, 1)$, $\sigma \in (0, 1)$, and $a$ denotes the unknown vector of coefficients of the ridge function $f(x) = \varphi(\langle a, x \rangle)$, $x \in B^n$, $|a| = 1$.

The quantities produced by the algorithm will be denoted by variables with tildes, e.g. $\widetilde{\varphi}_i$, $\widetilde{\Delta}^i_{h,\nu}$ (they depend on the taken samples of our function $f$).

In the rest of the paper, for any vector $\gamma \in B^n$ we denote $v_\gamma := n^{1/2}\langle a, \gamma \rangle$, and $\widetilde{\varphi}_\gamma$ is some approximation of the function $\varphi(v_\gamma t)$ satisfying (4).

We fix some constants $b, B > 0$; it is supposed that $b^{-1}$ and $B$ are large enough. They must satisfy condition (19) given below; in fact one can take $b = 0.01$, $B = 5$. We call a real number $v$ typical if $b \le |v| \le B$. Let $\sigma_1 := \sigma b/(4B)$.

Throughout the paper $t$ and $x$ are real variables and $z$ denotes a complex variable. Thus the set $\{|t| \le h\}$ is a segment, and $\{|z| \le h\}$ is a disk in the complex plane.

We denote by $c, c_1, \ldots, C, C_1, \ldots$ positive reals (their values may differ from line to line).

The functions from $H(\sigma, 1)$ are $L_\sigma$-Lipschitz on $[-1, 1]$, with $L_\sigma \le C\sigma^{-1}$ (the explicit dependence on $\sigma$ is not important for us).

Let $E_\rho$ be the ellipse with foci $\pm 1$ and sum of semi-axes equal to $\rho > 1$. We note that the major semi-axis of $E_\rho$ equals $R = \frac{1}{2}(\rho + \rho^{-1})$, so that $\rho = R + \sqrt{R^2 - 1}$. It is clear that $E_{1+\sigma} \subset \Pi_\sigma$. When we say that some function $\psi$ is analytic in $E_\rho$ and $|\psi| \le C$ there, we mean that it is analytic in the open domain bounded by $E_\rho$ and that the estimate holds in the closed domain.

We make use of various probabilistic notions and constructions. For any random variable $\xi$ we denote by $\mathrm{Law}(\xi)$ its distribution. $\Phi$ denotes the distribution function of the standard Gaussian random variable; by $\Phi^*$ we denote the distribution function of $|\xi|$, $\xi \sim \mathcal{N}(0, 1)$; thus $\Phi^*(x) = 2(\Phi(x) - 1/2)$ for $x \ge 0$.

The algorithm has parameters $N_1, N_2, N_3, M_1, M_2$ (sufficiently large natural numbers) and $\omega_1, \omega_2, \omega_3$ (sufficiently small positive reals) that will be chosen in order to satisfy the various inequalities required for our algorithm. All of them may depend on $\sigma$, $n$, $\omega_*$, with the sole exception of $N_2$, which depends only on $\delta_*$. The meaning of these parameters will be clear from the scheme of the algorithm given below.

The scheme of the recovery algorithm.
Here we describe the steps of the algorithm and the ideas behind them. We hope that this will help the reader to follow the detailed proofs given in the next section.

1. The procedure of extrapolation. Given a vector $\gamma \in B^n$, we evaluate the function $f$ on a grid:
$$y_k \approx f\Bigl(\frac{k}{N_1}\gamma\Bigr) = \varphi\Bigl(\frac{k}{N_1}\langle a, \gamma \rangle\Bigr), \quad k = -N_1, \ldots, N_1,$$
where $N_1$ is large enough. Let $v_\gamma := n^{1/2}\langle a, \gamma \rangle$. Thus, $\varphi(v_\gamma t)$ is evaluated on the uniform grid in $[-n^{-1/2}, n^{-1/2}]$. We fit a polynomial of an appropriate degree $M_1$ to the obtained values by least squares and use it to extrapolate $\varphi(v_\gamma t)$ to larger segments. We rely on the estimates from [DT19] to prove that the constructed function $\widetilde{\varphi}_\gamma$ gives an approximation with small enough error $\omega_1$:
$$|\widetilde{\varphi}_\gamma(t) - \varphi(v_\gamma t)| \le \omega_1, \quad |t| \le \min\bigl(1, \sigma/(2|v_\gamma|)\bigr). \qquad (4)$$

2. The construction of $\widetilde{\varphi}_i$. We take $N_2$ random vectors on the sphere: $\gamma_1, \ldots, \gamma_{N_2}$. For each $\gamma_i$ we construct the function $\widetilde{\varphi}_i := \widetilde{\varphi}_{\gamma_i}$ that approximates $\varphi(v_i t)$, $v_i := v_{\gamma_i}$. As a result we obtain the set of functions $\{\widetilde{\varphi}_i\}_{i=1}^{N_2}$.

Recall that a number $v$ is called typical if $b \le |v| \le B$. Typical $v_i$ are the most convenient for us: we see from (4) that, first, they are informative, i.e., a rather large part of $\varphi$ is recovered, and, second, the functions $\widetilde{\varphi}_i$ exhibit a “good” behaviour on a fairly large segment $|t| \le \sigma/(2B)$.

Since our choice of the $\gamma_i$ is random, our algorithm is probabilistic. All other constructions and statements are true under condition (16) on the $\gamma_i$, which holds with high probability. That condition implies, e.g., that most of the $v_i$ are typical.

3. The estimation of the function oscillation. At this step we distinguish the case of a function $\varphi$ close to a constant from the case of a function whose oscillation is large. We estimate the oscillation of the functions $\widetilde{\varphi}_i$ and obtain either
$$\Delta_{\sigma_1/4} := \max_{|t| \le \sigma_1/4} |\varphi(t) - \varphi(0)| \le \omega_2, \qquad (5)$$
or the inequality
$$\Delta_{\sigma_1} = \max_{|t| \le \sigma_1} |\varphi(t) - \varphi(0)| \ge \omega_3. \qquad (6)$$
The inequality (5) for small enough $\omega_2$ leads to the global bound
$$\max_{|t| \le 1} |\varphi(t) - \varphi(0)| \le \omega_*/2. \qquad (7)$$
Hence $f$ can be approximated by $f(0)$ and the algorithm stops. In the case of (6) we proceed to the next step.

4. The procedure of the search for the embedding $\widetilde{\varphi}_{\gamma_2} \hookrightarrow \widetilde{\varphi}_{\gamma_1}$. We do not know the values of $v_\gamma$, but a simple idea helps us to approximate the ratio $|v_{\gamma_2}|/|v_{\gamma_1}|$ for any pair of vectors $\gamma_1, \gamma_2$. Namely, if $|v_{\gamma_2}| \ge |v_{\gamma_1}|$, then from (4) it follows that $\widetilde{\varphi}_{\gamma_1}(t) \approx \widetilde{\varphi}_{\gamma_2}(\pm t/\lambda)$ for some $\lambda \ge 1$. If this approximate equality holds, we call it an “embedding” $\widetilde{\varphi}_{\gamma_2} \hookrightarrow \widetilde{\varphi}_{\gamma_1}$. We show that if $v_{\gamma_1}$ is typical, the corresponding $\lambda$ can be found with high accuracy, namely
$$\Bigl|\widetilde{\lambda}(\widetilde{\varphi}_{\gamma_2}, \widetilde{\varphi}_{\gamma_1}) - \frac{|v_{\gamma_2}|}{|v_{\gamma_1}|}\Bigr| \le \omega_3.$$

5. The search for a typical $v_i$. The goal of this step is to find an index $i$ such that $v_i$ is typical and $|v_i|$ is not too large. We find all possible pairwise embeddings $\widetilde{\varphi}_i \hookrightarrow \widetilde{\varphi}_j$ for the set of functions $\{\widetilde{\varphi}_i\}$. That allows us to compare (approximately) the pairs $|v_i|, |v_j|$ and to analyze the order statistics of the set $\{|v_i|\}$ in order to find the required $v_i$.

6. The recovery of the vector $a$. Using the function $\widetilde{\varphi}_i$ from the previous step we construct embeddings $\widetilde{\varphi}_i \hookrightarrow \widetilde{\varphi}_\gamma$ for appropriate vectors $\gamma$ (linear combinations of the standard basis vectors $e_k$), and we find approximate values of $a_k/|v_i|$, where the $a_k$ are the coordinates of $a$. As a result we obtain an approximation $\widetilde{a}$ to the vector $a$ with error
$$|a - \widetilde{a}| \le C\omega_3 \le \frac{\omega_*}{2L_\sigma}. \qquad (8)$$

7. The recovery of $\varphi$. The good approximation of $a$ allows us to approximate $\varphi(t)$ for any $t \in [-1, 1]$: $\varphi(t) \approx f(t\widetilde{a})$. We can compute $\varphi$ on a sufficiently fine uniform grid of size $2N_3 + 1$ in $[-1, 1]$ and apply the technique of [DT19] to approximate $\varphi$ by a polynomial $\widetilde{\varphi}$ of degree $M_2$. As a result we get
$$\max_{|t| \le 1} |\varphi(t) - \widetilde{\varphi}(t)| \le \omega_*/2. \qquad (9)$$
Finally, we set $\widetilde{f}(x) := \widetilde{\varphi}(\langle \widetilde{a}, x \rangle)$. We estimate the error of the approximation using (8), (9) and the Lipschitz property of $\varphi$:
$$|f(x) - \widetilde{f}(x)| \le |\varphi(\langle a, x \rangle) - \varphi(\langle \widetilde{a}, x \rangle)| + |\varphi(\langle \widetilde{a}, x \rangle) - \widetilde{\varphi}(\langle \widetilde{a}, x \rangle)| \le L_\sigma|a - \widetilde{a}| + \omega_*/2 \le \omega_*.$$

The algorithm
In this section we describe the recovery algorithm and estimate its accuracy. The theorem follows from these considerations. Each subsection corresponds to one step of the algorithm.
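Before the formal statements, here is a toy numerical sketch of the least-squares extrapolation underlying step 1. Everything in it is an arbitrary illustrative choice (the test function, the grid size, the degree and the noise level), not the calibrated parameters $N_1$, $M_1$, $\varepsilon$ of the proofs:

```python
import numpy as np

rng = np.random.default_rng(1)

psi = np.cos                    # stand-in for the analytic function (an assumption)
N, M, eps = 400, 12, 1e-10      # grid half-size, degree, noise level (illustrative)

x = np.arange(-N, N + 1) / N    # uniform grid {k/N} on [-1, 1]
y = psi(x) + eps * rng.uniform(-1, 1, x.size)   # noisy evaluations

# Least-squares fit by a polynomial of degree <= M (Chebyshev basis for stability).
p = np.polynomial.chebyshev.Chebyshev.fit(x, y, deg=M, domain=[-1, 1])

s = np.linspace(-1.5, 1.5, 7)   # includes points outside the sampling segment
err = float(np.max(np.abs(p(s) - psi(s))))
print(err)                      # small: the fit extrapolates beyond [-1, 1]
```

For an analytic function the error decays geometrically in the degree until the noise term of type $(\rho r)^M\varepsilon$ takes over; this is why condition (13) below demands exponentially small evaluation errors.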
We will apply the following useful statement on the extrapolation of analytic functions from their values on a uniform grid. Recall that $E_\rho$ is the ellipse with foci $\pm 1$ and sum of semi-axes $\rho$.

Lemma 1 (see [DT19], Corollaries 2 and 4). Let $\psi$ be analytic in $E_\rho$ and $|\psi(z)| \le Q$ there; let the values $y_k = \psi(k/N) + \xi_k$, $k = -N, \ldots, N$, be known with accuracy $|\xi_k| \le \varepsilon$. Let $p_M$ be the polynomial of degree not exceeding $M$ that minimizes $\sum_{k=-N}^{N} |p(k/N) - y_k|^2$, and let $M \le \sqrt{N/2}$. Then:

(i) interpolation: for $|x| \le 1$ we have
$$|p_M(x) - \psi(x)| \le CM^{1/2}\Bigl(\frac{Q\rho^{-M}}{\rho - 1} + \varepsilon\Bigr), \qquad (10)$$
with an absolute constant $C$;

(ii) extrapolation: for $|x| \in [1, \frac{1}{2}(\rho + \rho^{-1}))$ we have
$$|p_M(x) - \psi(x)| \le C\Bigl(Q\,\frac{M^{1/2}\rho^{-1}}{1 - r}\,r^M + M^{1/2}(\rho r)^M\varepsilon\Bigr), \qquad (11)$$
with $r = (|x| + \sqrt{x^2 - 1})/\rho$ and an absolute constant $C$.

Proof. The case (i) corresponds to Corollary 2 from [DT19], and the case (ii) to Corollary 4. We apply Theorems 3 and 4 from [DT19] in order to obtain the necessary estimates for singular numbers.

Let us recall the setup of our algorithm. We recover an unknown function $f(x) = \varphi(\langle a, x \rangle)$ with analytic $\varphi \in H(\sigma, 1)$; the function evaluations are made with errors at most $\varepsilon$.

Procedure (extrapolation). Given any vector $\gamma \in B^n$, we receive the values $y_k \approx f(\frac{k}{N_1}\gamma)$, $k = -N_1, \ldots, N_1$, take the polynomial $p_{M_1}$ of degree $\le M_1$ that minimizes $\sum_{k=-N_1}^{N_1} |p_{M_1}(k/N_1) - y_k|^2$, and output the function $\widetilde{\varphi}_\gamma(t) := p_{M_1}(n^{1/2}t)$.

Proposition 1. Suppose that $N_1 = 2M_1^2$ and that the following inequalities hold:
$$M_1 \ge C(\sigma)\log\omega_1^{-1}, \qquad (12)$$
$$\log(1/\varepsilon) \ge C(\log\omega_1^{-1} + M_1\log n). \qquad (13)$$
Then the function $\widetilde{\varphi}_\gamma$ given by the extrapolation Procedure satisfies the inequality (4).

Proof. In our procedure we evaluate the function $\psi(z) = \varphi(uz)$, $u := \langle a, \gamma \rangle$, on the uniform grid $\{k/N_1\}_{k=-N_1}^{N_1}$ in $[-1, 1]$. It suffices to prove that
$$|p_{M_1}(x) - \psi(x)| \le \omega_1, \quad |x| \le \min(n^{1/2}, R/2), \quad R := \sigma/|u|. \qquad (14)$$
We start with the case $R \ge 2$. The function $\psi$ is analytic in the disk of radius $R$. Therefore it is analytic in the disk of radius $R_1 = \min(R, n^{1/2})$ and in the set $E_{\rho_1}$, $\rho_1 = R_1 + \sqrt{R_1^2 - 1}$. We need an estimate valid for $|x| \le R_1/2$. For such $x$ we have
$$r = \frac{|x| + \sqrt{x^2 - 1}}{\rho_1} \le \frac{R_1/2 + \sqrt{R_1^2/4 - 1}}{R_1 + \sqrt{R_1^2 - 1}} \le \frac{1}{2}.$$
In the (more difficult) case of extrapolation we apply (11) with $\rho_1$, $Q = 1$, $1 \le |x| \le R_1/2$, $r \le 1/2$. Since $\rho_1 \ge R_1 \ge 1$ and $\rho_1 \le 2R_1 \le 2n^{1/2}$, we get
$$|p_{M_1}(x) - \psi(x)| \le CM_1^{1/2}\bigl(2^{-M_1} + (\rho_1/2)^{M_1}\varepsilon\bigr) \le CM_1^{1/2}\bigl(2^{-M_1} + (4n)^{M_1/2}\varepsilon\bigr).$$
We want each summand not to exceed $\omega_1/2$. The first summand imposes the condition $CM_1^{1/2}2^{-M_1} \le \omega_1/2$ on $M_1$, which holds under (12). The second summand imposes the condition (13). The estimates in the interpolation case $|x| \le 1$ lead to conditions on $M_1$ and $\varepsilon$ that are weaker than in the extrapolation case.

Let us consider the case $R < 2$. Since we have to estimate $|p_{M_1}(x) - \psi(x)|$ only for $|x| \le \min(\sqrt{n}, R/2) < 1$, we apply the interpolation error estimate (10). Since $E_{1+\sigma} \subset \Pi_\sigma$, we see that $\varphi$ is analytic and bounded in $E_{1+\sigma}$; hence, as $|u| \le 1$, the function $\psi$ is also analytic and bounded in this set. Using the inequality (10) with $\rho = 1 + \sigma$, we get the condition $CM_1^{1/2}\sigma^{-1}(1 + \sigma)^{-M_1} \le \omega_1/2$ on $M_1$; the resulting condition on $\varepsilon$ is weaker than (13). The inequality (14) is proven.

The construction of $\widetilde{\varphi}_i$.

We take $N_2$ random vectors (uniformly) on the unit sphere: $\gamma_1, \ldots, \gamma_{N_2} \in S^{n-1}$. For each $\gamma_i$, using the procedure described above, we construct the function $\widetilde{\varphi}_i := \widetilde{\varphi}_{\gamma_i}$ that approximates $\varphi(v_i t)$, where $v_i := v_{\gamma_i} = n^{1/2}\langle a, \gamma_i \rangle$.

The statistics of the $v_i$. Let us consider the sequence $(v_i)$. We trivially have $|v_i| \le n^{1/2}$. We do not know the values of the $v_i$, but we know their distribution: $\{v_i\}_{i=1}^{N_2}$ is a sample from the random variable $V = n^{1/2}X_1$, where the vector $X = (X_1, \ldots, X_n)$ is uniformly distributed on the sphere $S^{n-1}$. Let $F_n$ be the distribution function of the r.v. $|V|$, and let $\widehat{F}_n$ be the empirical distribution function of the sequence $|v_1|, \ldots, |v_{N_2}|$ (a sample from the distribution $F_n$).

The distribution of $V$ is close to the Gaussian distribution: $\mathrm{Law}(V) \approx \mathcal{N}(0, 1)$. Recall that $\Phi$ and $\Phi^*$ denote the distribution functions of a standard Gaussian variable $\xi$ and of the variable $|\xi|$, respectively. It is known (see, e.g., [K06]) that
$$\sup_{x \in \mathbb{R}} |\Phi^*(x) - F_n(x)| \le c/n. \qquad (15)$$
We estimate the difference between $F_n$ and $\widehat{F}_n$ using the Dvoretzky-Kiefer-Wolfowitz inequality [M90]:
$$\mathbb{P}\Bigl(\sup_{x \in \mathbb{R}} |F_n(x) - \widehat{F}_n(x)| > \lambda\Bigr) \le 2\exp(-2N_2\lambda^2).$$
Let $\lambda = 1/100$; then with probability at least $1 - 2\exp(-N_2/5000)$ we have
$$\sup_{x \in \mathbb{R}} |F_n(x) - \widehat{F}_n(x)| \le \frac{1}{100}. \qquad (16)$$
From here on we assume that (16) holds. This happens with probability not less than $1 - \delta_*$ if we take $N_2 \ge C\ln(2/\delta_*)$.

From (15) and (16), for sufficiently large $n$ and any $q_2 > q_1 > 0$ we have
$$\Bigl|\frac{1}{N_2}\#\{i \colon q_1 \le |v_i| \le q_2\} - \bigl(\Phi^*(q_2) - \Phi^*(q_1)\bigr)\Bigr| \le \frac{3}{100}. \qquad (17)$$
In the proof we will use the following approximate values of $\Phi^*$: $\Phi^*(2) \approx 0.954$, $\Phi^*(1/2) \approx 0.383$, $\Phi^*(0.75) \approx 0.547$, $\Phi^*(0.46) \approx 0.354$, $\Phi^*(0.62) \approx 0.465$, $\Phi^*(0.57) \approx 0.431$. In particular, from the inequality $\Phi^*(2) - \Phi^*(1/2) > 0.57$ we see that
$$\frac{1}{N_2}\#\{i \colon 1/2 \le |v_i| \le 2\} > \frac{1}{2}. \qquad (18)$$
Recall that we call a number $v$ typical if $b \le |v| \le B$. An appropriate choice of $b$ and $B$ guarantees that only a small fraction of the $v_i$ is nontypical. Formally, we assume that the numbers $b, B$ are such that
$$\Phi^*(B) - \Phi^*(b) > 0.98, \qquad (19)$$
and therefore, by (17), we have
$$\#\{i \colon b \le |v_i| \le B\} \ge 0.95\,N_2. \qquad (20)$$

We consider the oscillation of the function $\varphi$:
$$\Delta_h := \max_{|t| \le h} |\varphi(t) - \varphi(0)|.$$
In order to estimate it we replace $\varphi$ with $\widetilde{\varphi}_i$, and the maximum over the segment with the maximum over a grid of some step $\nu > 0$:
$$\widetilde{\Delta}^i_{h,\nu} := \max_{|k\nu| \le h} |\widetilde{\varphi}_i(k\nu) - \widetilde{\varphi}_i(0)|, \quad i = 1, \ldots, N_2.$$
Let $\widetilde{\Delta}^{\mathrm{med}}_{h,\nu}$ be the median of the sequence $\{\widetilde{\Delta}^i_{h,\nu}\}_{i=1}^{N_2}$. Let us prove that if $h \le \sigma/4$ and $\nu$ is sufficiently small, say $\nu \le \omega_1(2L_\sigma)^{-1}$, then the following inequality holds:
$$\Delta_{h/2} - 3\omega_1 \le \widetilde{\Delta}^{\mathrm{med}}_{h,\nu} \le \Delta_{2h} + 3\omega_1. \qquad (21)$$
Indeed, let $i$ be such that $1/2 \le |v_i| \le 2$. Since $h \le \sigma/4$, the approximation (4) works for $|t| \le h$ and we can replace $\widetilde{\varphi}_i$ with $\varphi$:
$$\Bigl|\widetilde{\Delta}^i_{h,\nu} - \max_{|k\nu| \le h} |\varphi(v_i k\nu) - \varphi(0)|\Bigr| \le 2\omega_1.$$
Using the Lipschitz condition and the inequality $|v_i| \le 2$, we get
$$\Bigl|\Delta_{|v_i|h} - \max_{|k\nu| \le h} |\varphi(v_i k\nu) - \varphi(0)|\Bigr| \le 2L_\sigma\nu \le \omega_1.$$
From this, $|\widetilde{\Delta}^i_{h,\nu} - \Delta_{|v_i|h}| \le 3\omega_1$. Therefore, for all $i$ with $1/2 \le |v_i| \le 2$,
$$\Delta_{h/2} - 3\omega_1 \le \widetilde{\Delta}^i_{h,\nu} \le \Delta_{2h} + 3\omega_1.$$
Since, by (18), this holds for more than a half of all indices $i$, the same inequalities hold for the median.

We will use (21) with $h := \sigma_1/2$ and $\nu := \omega_1/(2L_\sigma)$. If $\widetilde{\Delta}^{\mathrm{med}}_{h,\nu} \ge \omega_2 - 3\omega_1$, then
$$\Delta_{\sigma_1} = \Delta_{2h} \ge \widetilde{\Delta}^{\mathrm{med}}_{h,\nu} - 3\omega_1 \ge \omega_2 - 6\omega_1 \ge \omega_3$$
(we assume $\omega_2 \ge \omega_3 + 6\omega_1$). Therefore we obtain (6) and proceed to the next step of the algorithm. In the opposite case, $\Delta_{\sigma_1/4} = \Delta_{h/2} \le \widetilde{\Delta}^{\mathrm{med}}_{h,\nu} + 3\omega_1 \le \omega_2$, so we get (5). Let us derive the global bound (7).
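The median-of-oscillations statistic is easy to simulate. The sketch below uses toy assumptions (exact samples of $\varphi(v_i t)$ with $v_i \sim \mathcal{N}(0,1)$, an arbitrary test function, no extrapolation error, arbitrary constants), and checks the two-sided bound in the spirit of (21):

```python
import numpy as np

rng = np.random.default_rng(3)
phi = lambda s: np.sin(2 * s)          # stand-in for the unknown phi (an assumption)

h, nu = 0.4, 0.01
grid = np.arange(-h, h + nu / 2, nu)   # the grid {k*nu : |k*nu| <= h}

v = rng.standard_normal(500)           # Law(v_i) is approximately N(0, 1), cf. (15)
osc = np.array([np.max(np.abs(phi(vi * grid) - phi(0))) for vi in v])
med = float(np.median(osc))            # the median oscillation statistic

# Sandwich: the median lies between the true oscillations Delta_{h/2} and Delta_{2h}.
t1 = np.arange(-h / 2, h / 2 + 1e-9, 1e-3)
t2 = np.arange(-2 * h, 2 * h + 1e-9, 1e-3)
Delta_half = float(np.max(np.abs(phi(t1) - phi(0))))
Delta_double = float(np.max(np.abs(phi(t2) - phi(0))))
print(Delta_half <= med <= Delta_double)
```

The sandwich holds because more than half of the $|v_i|$ fall into $[1/2, 2]$, exactly the counting argument behind (18) and (21).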
Lemma 2. Let $\psi \in H(\sigma, 1)$, $\sigma \in (0, 1)$, and let $\Delta_\tau := \max_{|t| \le \tau} |\psi(t)|$, $\tau \in (0, 1)$. Then the following estimate holds:
$$\max_{|t| \le 1} |\psi(t)| \le A\Delta_\tau^{\alpha}, \quad \alpha = \alpha(\sigma, \tau) \in (0, 1), \quad A = A(\sigma, \tau) > 0.$$

Proof.
It is known that for functions $\psi$ analytic in $E_\rho$ and bounded there by 1, for any $M$ there exists a polynomial $p_M$ of degree not exceeding $M$ such that $\max_{|t| \le 1} |p_M(t) - \psi(t)| \le 2\rho^{-M}/(\rho - 1)$ (see [Ber]). Since $E_{1+\sigma} \subset \Pi_\sigma$, we can take $\rho = 1 + \sigma$. In order to make the error of the approximation less than $\Delta_\tau$, we take the minimal $M$ such that $2\sigma^{-1}(1 + \sigma)^{-M} \le \Delta_\tau$. Then $|p_M(t) - \psi(t)| \le \Delta_\tau$ on $[-1, 1]$, and $|p_M(t)| \le 2\Delta_\tau$ for $|t| \le \tau$.

We use Chebyshev's inequality in the following form (see [N64], p. 233): if $\tau_1 = \tau(1 + 2q)$, $q > 0$, and $P$ is a real algebraic polynomial of degree $M$, then
$$\frac{\|P\|_{C[-\tau_1, \tau_1]}}{\|P\|_{C[-\tau, \tau]}} \le T_M(\tau_1/\tau) \le \bigl(1 + 2q + 2\sqrt{q + q^2}\bigr)^M,$$
where $T_M$ is the classical Chebyshev polynomial of degree $M$. If $q \le 1$, the right-hand side does not exceed $(1 + 5\sqrt{q})^M$; we take $q = c_1\sigma^2$ to make $1 + 5\sqrt{q} \le (1 + \sigma)^{1/2}$ (this holds for sufficiently small $c_1$ and $\sigma < 1$). For our polynomial $p_M$ we obtain
$$\max_{|t| \le \tau_1} |p_M(t)| \le 2\Delta_\tau(1 + 5\sqrt{q})^M \le 2\Delta_\tau(1 + \sigma)^{M/2} \le C(\sigma)\Delta_\tau^{1/2}.$$
Hence $\Delta_{\tau_1} \le C(\sigma)\Delta_\tau^{1/2}$ for $\tau_1/\tau = 1 + 2c_1\sigma^2$. We iterate this construction, using the points $\tau, \tau_1, \tau_2, \ldots$, and obtain the required estimate on $\max_{|t| \le 1} |\psi(t)|$.

Applying the lemma to the function $\varphi(t) - \varphi(0)$, we get the estimate (7) under the condition
$$A(\sigma, \sigma_1/4)\,\omega_2^{\alpha(\sigma, \sigma_1/4)} \le \omega_*/2. \qquad (22)$$
So, in this case the algorithm stops with the approximation $f \approx f(0)$ (here we require that $\varepsilon \le \omega_*/2$).

The search for embeddings $\widetilde{\varphi}_{\gamma_2} \hookrightarrow \widetilde{\varphi}_{\gamma_1}$.

Definition.
We say that a function $h_2$ embeds into $h_1$ with coefficient $\lambda \ge 1$ and accuracy $\delta \ge 0$ (written $h_2 \stackrel{\lambda,\,\delta}{\hookrightarrow} h_1$) when
$$\max_{|t| \le 1} |h_1(t) - h_2(t/\lambda)| \le \delta.$$
We similarly define the embedding $h_2 \stackrel{\lambda,\,\delta,\,\nu}{\hookrightarrow} h_1$ when we replace the maximum over $[-1, 1]$ with the maximum over the grid of step $\nu > 0$:
$$\max_{|k\nu| \le 1} |h_1(k\nu) - h_2(k\nu/\lambda)| \le \delta.$$

General embedding. Let us consider the following general situation. Let $g$ be some function on $[-R, R]$ and $v_1, v_2 > 0$. Suppose we are given approximations $\widetilde{g}_i$ to the functions $g_i(t) := g(v_i t)$:
$$|\widetilde{g}_i(t) - g_i(t)| \le \omega \quad \text{for } |t| \le \min(1, R/v_i), \quad i = 1, 2. \qquad (23)$$
We assume the functions $\widetilde{g}_i$ to be defined on $[-1, 1]$. The restriction $v_i|t| \le R$ is natural since the function $g_i$ is defined only for such $t$.

For any $v_1 \le R$ and any $v_2 \ge v_1$ we have the embedding $g_2 \stackrel{\lambda_\circ,\,0}{\hookrightarrow} g_1$, where $\lambda_\circ = v_2/v_1$. Indeed, for $|t| \le 1$ we have $|v_1 t| \le R$, and therefore the functions $g_1(t)$ and $g_2(t/\lambda_\circ)$ are correctly defined and
$$g_1(t) = g(v_1 t) = g(v_2 t/\lambda_\circ) = g_2(t/\lambda_\circ).$$
The coefficient of the embedding $\lambda$ is uniquely defined in the case of a continuous nonconstant function $h$. This follows from the next simple fact: if $h$ is continuous and $h(\theta s) \equiv h(s)$ for some $\theta \in (0, 1)$, then $h(s) \equiv h(0)$. For the case of nonzero-accuracy embeddings we need a quantitative analogue of this fact.

Lemma 3. Let $h \colon [-r, r] \to \mathbb{C}$ be an $L$-Lipschitz function, $|h(s) - h(t)| \le L|s - t|$, and let $\Delta := \max_{|t| \le r} |h(t) - h(0)|$. If $\theta \in (0, 1]$ and
$$\max_{|t| \le r} |h(t) - h(\theta t)| \le \delta \le \Delta/2,$$
then
$$|1 - \theta| < \frac{\ln(2Lr/\Delta)}{\lfloor \Delta/(2\delta) \rfloor}.$$

Proof.
Let $\Delta = |h(\hat{t}) - h(0)|$ for some $\hat{t}$. We see that
$$\Delta = |h(\hat{t}) - h(0)| \le \sum_{j=0}^{k-1} |h(\theta^{j+1}\hat{t}) - h(\theta^{j}\hat{t})| + |h(\theta^{k}\hat{t}) - h(0)| \le k\delta + Lr\theta^{k}.$$
Take $k = \lfloor \Delta/(2\delta) \rfloor$ to obtain $\theta^{k} \ge \Delta/(2Lr)$, and hence
$$\ln(\Delta/(2Lr)) \le k\ln\theta = k\ln(1 - (1 - \theta)) \le -k(1 - \theta).$$
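Lemma 3 admits a quick numerical sanity check; the test function and the values of $\theta$ below are arbitrary illustrative choices:

```python
import numpy as np

h = lambda t: np.sin(5 * t)     # test function (illustrative); 5-Lipschitz on [-1, 1]
L, r = 5.0, 1.0
t = np.linspace(-r, r, 4001)
Delta = float(np.max(np.abs(h(t) - h(0))))   # oscillation of h around h(0)

for theta in (0.99, 0.9, 0.5):
    delta = float(np.max(np.abs(h(t) - h(theta * t))))
    if delta <= Delta / 2:      # hypothesis of the lemma
        bound = np.log(2 * L * r / Delta) / np.floor(Delta / (2 * delta))
        assert 1 - theta < bound            # conclusion of the lemma
    # for theta far from 1 (here 0.5) the discrepancy delta exceeds Delta / 2
```

In words: a function with nontrivial oscillation cannot be close to a nontrivially rescaled copy of itself, which is exactly what makes the embedding coefficient identifiable.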
Lemma 4.
Lemma 4. Let $0 < r < R$ and let $g \colon [-R, R] \to \mathbb{R}$ be an $L$-Lipschitz function such that the condition (23) holds for some $v_1, v_2, \omega > 0$; let $\lambda_\circ := v_2/v_1$. Suppose that $v_1 \in [r, R/2]$. Then for any $\lambda \ge 1$ we have:

(i) If $v_2 \ge v_1$ and $|\lambda - \lambda_\circ| \le \min\bigl(\frac{\omega}{RL}, 1\bigr)$, then $\widetilde{g}_2 \stackrel{\lambda,\,3\omega}{\hookrightarrow} \widetilde{g}_1$.

(ii) If $\widetilde{g}_2 \stackrel{\lambda,\,3\omega,\,\nu}{\hookrightarrow} \widetilde{g}_1$, $L\nu\max(v_1, v_2) \le \omega$, and
$$\Delta_r := \max_{|t| \le r} |g(t) - g(0)| \ge 14\omega,$$
then
$$\frac{|\lambda - \lambda_\circ|}{\max(\lambda, \lambda_\circ)} \le C\omega\Delta_r^{-1}\ln(2Lr/\Delta_r).$$

Proof. (i) Let $t \in [-1, 1]$. Then
$$|\widetilde{g}_1(t) - \widetilde{g}_2(t/\lambda)| \le |\widetilde{g}_1(t) - g_1(t)| + |g_1(t) - g_2(t/\lambda_\circ)| + |g_2(t/\lambda_\circ) - g_2(t/\lambda)| + |g_2(t/\lambda) - \widetilde{g}_2(t/\lambda)| \le \omega + 0 + \omega + \omega = 3\omega.$$
The first term is at most $\omega$ since (23) holds and $\min(1, R/v_1) = 1$. Next, $g_2(t/\lambda_\circ)$ is defined and equals $g(v_1 t) = g_1(t)$, so the second term vanishes. The third term is estimated using the Lipschitz condition:
$$|g(v_2 t/\lambda_\circ) - g(v_2 t/\lambda)| \le Lv_2\Bigl|\frac{1}{\lambda} - \frac{1}{\lambda_\circ}\Bigr| = \frac{Lv_1}{\lambda}|\lambda - \lambda_\circ| \le LR\cdot\frac{\omega}{RL} = \omega.$$
Finally, note that $g_2(t/\lambda)$ is defined and the approximation $g_2 \approx \widetilde{g}_2$ holds at $t/\lambda$: indeed, if $\lambda_\circ \ge 2$, then $\lambda \ge \lambda_\circ - 1 \ge \lambda_\circ/2$, so
$$v_2|t|/\lambda \le 2v_2/\lambda_\circ = 2v_1 \le R;$$
if $\lambda_\circ < 2$, then $v_2|t|/\lambda \le v_2 = \lambda_\circ v_1 < 2v_1 \le R$.

(ii) Consider the grid where we have the embedding: $\{t_k = k\nu \colon |k\nu| \le 1\}$. Pick some $t \in [-1, 1]$ and find a point $t_k$ in the grid such that $|t_k| \le |t|$ and $|t - t_k| \le \nu$.
As in (i), we have a chain of inequalities: | g ( t ) − g ( t/λ ) | (cid:54) | g ( t ) − g ( t k ) | (cid:124) (cid:123)(cid:122) (cid:125) (cid:54) Lv ν (cid:54) ω + | g ( t k ) − (cid:101) g ( t k ) | (cid:124) (cid:123)(cid:122) (cid:125) (cid:54) ω + | (cid:101) g ( t k ) − (cid:101) g ( t k /λ ) | (cid:124) (cid:123)(cid:122) (cid:125) (cid:54) ω ++ | (cid:101) g ( t k /λ ) − g ( t k /λ ) | (cid:124) (cid:123)(cid:122) (cid:125) (cid:54) ω + | g ( t k /λ ) − g ( t/λ ) | (cid:124) (cid:123)(cid:122) (cid:125) (cid:54) Lv ν/λ (cid:54) ω (cid:54) ω. The third term is estimated by the definition of the embedding (cid:101) g λ, ω, ν (cid:44) −→ (cid:101) g .The bounds for the two last terms are valid whenever the expressions g ( t k /λ )and g ( t/λ ) are defined. For this, we need that v | t | /λ (cid:54) R . So, we haveobtained the following inequality: | g ( v t ) − g ( v t/λ ) | (cid:54) ω for | t | (cid:54) min(1 , Rλ/v ). (24)We consider two cases: λ > λ ◦ and λ < λ ◦ (if lambdas are equal, there isnothing to prove). If λ > λ ◦ , we set θ = λ ◦ /λ < s = v t , and from (24) weobtain | g ( s ) − g ( θs ) | (cid:54) ω when | s | (cid:54) v min(1 , Rλ/v ). Since Rλ/v > Rλ ◦ /v = R/v >
1, the aboveinequality is true for | s | (cid:54) v , hence for all | s | (cid:54) r .In the second case, λ < λ ◦ , the inequality (24) can be written as | g (˜ θ ˜ s ) − g (˜ s ) | (cid:54) ω, ˜ θ := λ/λ ◦ , ˜ s := v t/λ, and it holds for | ˜ s | (cid:54) v λ min(1 , Rλv ) = min( v /λ, R ) . v /λ > v /λ ◦ = v (cid:62) r , it holds for | ˜ s | (cid:54) r .Applying in both cases Lemma 3, we arrive at | λ − λ ◦ | max( λ, λ ◦ ) (cid:54) ln(2 Lr/ ∆ r ) (cid:98) ∆ r / (14 ω ) (cid:99) (cid:54) ω ∆ − r ln(2 Lr/ ∆ r ) . Let us apply the general constructions in our algorithm. The role offunction g is played by function ϕ ( ηt ), where η := bσ − ; also r := b and R :=2 B . Note that a typical v i satisfies | v i | ∈ [ r, R/ ϕ ( ηt ) on the segment [ − ησ , ησ ] = [ − b, b ], as required. The function ϕ ( ηt )is L (cid:48) σ –Lipshitz, L (cid:48) σ := ηL σ . Procedure ( embedding ) . For any pair of functions (cid:101) ϕ γ , (cid:101) ϕ γ we try to find λ ∈ [1 , λ max ], λ max := n / b − , such that (cid:101) ϕ γ ( ηt ) λ, ω , ν (cid:44) −→ (cid:101) ϕ γ ( ηt ) , where ν = n − / ( L (cid:48) σ ) − ω , we search for the parameter λ over the grid in[1 , λ max ] of step ω / ( BL (cid:48) σ ). If we obtain an embedding with some λ , we set (cid:101) λ ( (cid:101) ϕ γ , (cid:101) ϕ γ ) := λ and write (cid:101) ϕ γ (cid:44) → (cid:101) ϕ γ . Otherwise, we similarly try to embed (cid:101) ϕ γ ( − t ) into (cid:101) ϕ γ and in the case of success we set (cid:101) λ ( (cid:101) ϕ γ , (cid:101) ϕ γ ) := λ . If we failto find such λ , we say that there is no embedding and denote this situationas (cid:101) ϕ γ (cid:54) (cid:44) → (cid:101) ϕ γ . Proposition 2.
Let v_{γ₁} be a typical number and let the values ω_i satisfy the inequalities

ω₃ ≥ ω₄,  Cω₄ω₃⁻¹ ln(4L′_σ b/ω₃) ≤ ω₅ b n^{−1/2}.   (25)

If |v_{γ₂}| ≥ |v_{γ₁}|, then ϕ̃_{γ₁} ↪ ϕ̃_{γ₂}. If ϕ̃_{γ₁} ↪ ϕ̃_{γ₂}, then |v_{γ₂}| ≥ |v_{γ₁}|(1 − ω₅). In both cases

| λ̃(ϕ̃_{γ₁}, ϕ̃_{γ₂}) − |v_{γ₂}|/|v_{γ₁}| | ≤ ω₅.

Proof. We apply Lemma 4 with g(t) = ϕ(ηt), r = b, R = 2B, ω = ω₄, L = L′_σ, g̃₁(t) = ϕ̃_{γ₁}(ηt), g̃₂(t) = ϕ̃_{γ₂}(ηt). Note that the typical v_{γ₁} satisfies |v_{γ₁}| ∈ [r, R/2]. Consider the case v_{γ₂} > v_{γ₁} > 0; other cases are similar. Recall the notation λ∘ = v_{γ₂}/v_{γ₁}.

Let v_{γ₂} ≥ v_{γ₁}. We have λ∘ ≤ n^{1/2}/b = λ_max; therefore, the grid contains some λ with |λ − λ∘| ≤ ω₄/(2BL′_σ) (it is clear that ω₄/(2BL′_σ) < 1/2); for such λ the inequality defining the embedding holds on the whole segment, hence also on the grid; thus, ϕ̃_{γ₁} ↪ ϕ̃_{γ₂}.

Now suppose that ϕ̃_{γ₁} ↪ ϕ̃_{γ₂}. Recall the inequality (6) which we obtained at Step 2. It gives the estimate for g exactly on the segment [−b, b], since ησ = b:

Δ = max_{|t| ≤ b} |g(t) − g(0)| ≥ ω₃.

We may apply (ii) from Lemma 4; note that the condition ω₃/14 ≥ ω₄ is satisfied by (25). We obtain

|λ − λ∘| / max(λ, λ∘) ≤ Cω₄ω₃⁻¹ ln(4L′_σ b/ω₃) ≤ ω₅/λ_max.   (26)

Due to the construction, it holds that max(λ, λ∘) ≤ λ_max, and consequently |λ − λ∘| ≤ ω₅. Thus,

v_{γ₂} = v_{γ₁}λ∘ = v_{γ₁}(λ − (λ − λ∘)) ≥ v_{γ₁}(1 − ω₅). ∎

Comparison of the numbers v_i. For all pairs i, j ∈ {1, . . . , N₀} we start the embedding procedure to search for embeddings ϕ̃_i ↪ ϕ̃_j. It will allow us to compare the numbers |v_i| up to the relative error ω₅. In the following we use the estimates (17), (20) and Proposition 2.

Let us calculate the numbers K_j := #{i : ϕ̃_i ↪ ϕ̃_j}. Denote by J the set of j such that 0.4 ≤ K_j/N₀ ≤ 0.5. Let us consider the set {v_j, j ∈ J}.

Let us show that if j ∈ J, then |v_j| ≤ 0.75. Suppose the converse: let |v_j| > 0.75. Use that Φ∗(0.75) > 0.54; hence, due to (17), there are at least 0.53N₀ indices i such that |v_i| ≤ 0.75; there are at least 0.52N₀ typical numbers among them. Proposition 2 implies that all such ϕ̃_i embed into ϕ̃_j; this contradicts j ∈ J.

Now we show that j ∈ J implies |v_j| ≥ 0.45. Let |v_j| < 0.45 and ϕ̃_i ↪ ϕ̃_j. If v_i is typical, then due to Proposition 2 we have |v_i| ≤ 0.46 (we may assume that ω₅ is small). The number of such v_i is at most N₀(Φ∗(0.46) + 0.01) ≤ 0.37N₀. The number of non-typical v_i is at most 0.01N₀. Therefore, there are at most 0.38N₀ such i; this contradicts j ∈ J.

Arguing as above, we see that the indices j with |v_j| ∈ [0.56, 0.62] belong to J. Therefore, this set is at least non-empty: |J| ≥ N₀(Φ∗(0.62) − Φ∗(0.56) − 0.02) ≥ 0.02N₀. Finally, we take an arbitrary i ∈ J and obtain |v_i| ∈ [0.45, 0.75]; we may also assume that v_i is typical.

Recovery of the vector a. Here we use the extrapolation and embedding procedures and also the function ϕ̃_i with typical |v_i| ≤ 3/4 found at the previous step.

For every k = 1, . . . , n we try to embed ϕ̃_i ↪ ϕ̃_{e_k} and find the corresponding λ̃ (using the embedding procedure). Since a is a unit vector, for at least one k we have n^{1/2}|a_k| ≥ 1 > |v_i|, and thus the embedding exists. Suppose that for k = k∗ the corresponding λ̃ is maximal. We have | n^{1/2}|a_{k∗}|/|v_i| − λ̃(ϕ̃_i, ϕ̃_{e_{k∗}}) | ≤ ω₅. Although |a_{k∗}| is not necessarily maximal, in any case max_k |a_k| ≤ 1.1|a_{k∗}|.

Without loss of generality we may assume that a_{k∗} > 0.
Indeed, as we already noticed in the introduction, the function f is invariant under the substitution (a, ϕ(x)) ↦ (−a, ϕ(−x)).

Thus, we know the ratio n^{1/2}a_{k∗}/|v_i| up to ω₅; let us determine the other ratios. For a given k consider the vector γ = 0.9e_{k∗} + 0.1e_k ∈ B^n. Then

v_γ = n^{1/2}⟨0.9e_{k∗} + 0.1e_k, a⟩ = n^{1/2}(0.9a_{k∗} + 0.1a_k) > 0.75 ≥ |v_i|;

therefore there exists an embedding ϕ̃_i ↪ ϕ̃_γ, and | λ̃(ϕ̃_i, ϕ̃_γ) − v_γ/|v_i| | ≤ ω₅ holds. From the equality n^{1/2}a_k = −9n^{1/2}a_{k∗} + 10v_γ and from the obtained inequalities on λ̃(ϕ̃_i, ϕ̃_γ), λ̃(ϕ̃_i, ϕ̃_{e_{k∗}}) it follows that

| n^{1/2}a_k/|v_i| − 10λ̃(ϕ̃_i, ϕ̃_γ) + 9λ̃(ϕ̃_i, ϕ̃_{e_{k∗}}) | ≤ Cω₅.

Thus, we approximate the vector n^{1/2}a/|v_i| coordinatewise by a vector w with accuracy O(ω₅), so we have |w − n^{1/2}a/|v_i|| ≤ Cn^{1/2}ω₅. It is easy to see that

|w − ra| ≤ δ ⇒ | w/|w| − a | ≤ 2δ/(r − δ),

so for ã := w/|w| we have |ã − a| ≤ Cω₅. We obtain (8). At this step we have used 2n(2N₁ + 1) additional evaluations.

Recovery of ϕ. The knowledge of ã allows us to approximate ϕ at an arbitrary point of the segment [−1, 1]. For t ∈ [−1, 1] we evaluate f at the point tã and obtain the approximation y_t: |y_t − f(tã)| = |y_t − ϕ(t⟨a, ã⟩)| ≤ ε. Hence

|y_t − ϕ(t)| ≤ |ϕ(t⟨a, ã⟩) − ϕ(t)| + ε ≤ L_σ|a − ã| + ε ≤ CL_σω₅ + ε,

where we applied the inequality |⟨a, ã⟩ − 1| ≤ |a − ã|. Let us approximate ϕ in this way on the grid {k/N₂ : k = −N₂, . . . , N₂}. Then we construct the polynomial p_{M₂} of degree M₂ using the least squares fit, as in Step 1. Lemma 1(i) gives us an estimate for the accuracy:

|p_{M₂}(x) − ϕ(x)| ≤ CM₂^{1/2}(ρ^{−M₂} + L_σω₅ + ε).

We can take ρ = 1 + σ. For our purposes it is sufficient that each of the summands is at most ω∗/6. The first summand requires that M₂ ≥ C(σ) log(6/ω∗). The condition for the second summand is

ω₅ ≤ ω∗/(CL_σM₂^{1/2}).   (27)

The condition on ε is similar to the condition (13) but weaker than it. So we can put N₂ = 2M₂. At this step we used 2N₂ + 1 values of f.

We choose the parameters in the following order:
• N₀ = ⌈C ln(2/δ∗)⌉.
• M₂ = ⌈C(σ) log(2/ω∗)⌉, N₂ = 2M₂.
• We choose ω₅ based on the condition (27). It automatically implies the condition on ω₅ from (8).
• We choose ω₄ based on the condition (22).
• We choose ω₃ based on the inequality (25) and the condition ω₃ ≥ ω₄.
• We define M by the condition (12) and put N₁ = 2M.
• The main requirement on ε is (13).

The total number of function evaluations equals N = (2N₁ + 1)(N₀ + 2n) + 2N₂ + 1.

The implementation of the recovery algorithm
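Before describing the implementation in detail, here is a minimal self-contained sketch of its key numerical step: the search for the embedding parameter by exact minimization of the L₂ functional S(µ) = ∫_{|t|≤1} (p₁(t) − p₂(µt))² dt over µ ∈ [−1, 1]. This is not the repository code from [gitTZ]; the function names are ours, and the polynomials are given by numpy coefficient arrays in low-to-high degree order.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def embedding_objective(p1, p2):
    """Coefficients (low -> high, in mu) of S(mu) = int_{-1}^{1} (p1(t) - p2(mu*t))^2 dt.

    S is itself a polynomial in mu: expanding the square and using
    int_{-1}^{1} t^m dt = 2/(m+1) for even m (and 0 for odd m)."""
    d = max(len(p1), len(p2))
    a = np.pad(np.asarray(p1, float), (0, d - len(p1)))
    c = np.pad(np.asarray(p2, float), (0, d - len(p2)))
    S = np.zeros(2 * d - 1)
    for j in range(d):
        for k in range(d):
            if (j + k) % 2 == 0:           # odd powers of t integrate to zero
                I = 2.0 / (j + k + 1)      # int_{-1}^{1} t^{j+k} dt
                # (a_j - c_j mu^j)(a_k - c_k mu^k) * I, collected by powers of mu
                S[0]     += I * a[j] * a[k]
                S[k]     -= I * a[j] * c[k]
                S[j]     -= I * c[j] * a[k]
                S[j + k] += I * c[j] * c[k]
    return S

def best_mu(p1, p2):
    """Minimize S(mu) over [-1, 1]: check the endpoints and the real critical points."""
    S = embedding_objective(p1, p2)
    dS = P.polyder(S)
    cand = [-1.0, 1.0]
    cand += [r.real for r in np.roots(dS[::-1])   # np.roots wants high -> low order
             if abs(r.imag) < 1e-12 and abs(r.real) < 1]
    vals = [P.polyval(m, S) for m in cand]
    return cand[int(np.argmin(vals))]
```

For example, embedding p₁(t) = t into p₂(t) = 2t gives µ = 1/2, i.e. λ = 2, since p₁(t) = p₂(t/2). Restricting the minimization to the endpoints and the real roots of S′ is what makes the search exact: S is a polynomial, so its minimum on [−1, 1] is attained at one of these finitely many points.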
We implemented the recovery algorithm, with some modifications, in Python using the numpy library. The code is available on GitHub: [gitTZ]. Let us describe the main changes that we introduced and the results of numerical experiments.

The most important modification is related to the embedding procedure. The optimal parameter λ for embedding a polynomial p₁(t) into a polynomial p₂(t) can be found as the solution of the problem

min_{−1 ≤ µ ≤ 1} max_{|t| ≤ 1} |p₁(t) − p₂(µt)|,

where µ = λ⁻¹. Instead of the brute-force search over a grid of possible µ and the estimation of the C-norm, we consider the L₂-norm and explicitly find

min_{−1 ≤ µ ≤ 1} S(µ),  where S(µ) := ∫_{|t| ≤ 1} (p₁(t) − p₂(µt))² dt.

Indeed, S(µ) is a polynomial in µ, and it can be minimized on the set {−1, 1} ∪ {µ ∈ (−1, 1) : S′(µ) = 0}. This approach allows us to efficiently calculate rather accurate estimates of the embedding coefficient.

To test our algorithm we recover functions of the form

ϕ(x) := x^{K₀} · K^{−1/2} Σ_{k=1}^{K} (A_k cos(πkx) + B_k sin(πkx)).   (28)

The values of the parameters used in the program are: n = 50, ε = 10⁻¹², M = 30, M₂ = 200, N₀ = 25, N₁ = 200, K = 8, K₀ = 7.

We used the values of ϕ̃_γ(t) only for t ∈ [−n^{−1/2}, n^{−1/2}], i.e. only the interpolation was done (extrapolation was not required); the oscillation in our experiments was large enough to search for embeddings on the segment [−n^{−1/2}, n^{−1/2}] and to find the lambdas with high accuracy.

The following table contains the results of 10 random experiments made with the parameters given above. In each experiment the vector of the coefficients of the trigonometric polynomial defining ϕ by (28) and the vector a of the ridge function were chosen randomly and uniformly on the unit sphere. We also specify the value of the parameter α from (3).

[Table: the errors |ã − a| and ‖ϕ̃ − ϕ‖_C and the parameter α for the 10 experiments.]

An example of a function ϕ of the form (28) is shown in Fig. 1.

Figure 1: The function ϕ (example)

Non-analytic functions.
Although the classes R(H(σ, Q), B^n) that we have considered are rather rich (e.g., they include polynomial ridge functions f(x) = p(⟨a, x⟩)), it seems that the assumption that ϕ is analytic may be weakened.

The extrapolation technique of [DT19] may work for functions ϕ with certain restrictions on the order of decay of the sequences ‖ϕ^{(k)}‖_{C[−1,1]} (as k → ∞) or E_k(ϕ) := inf_{deg p ≤ k} ‖ϕ − p‖_{C[−1,1]}.

We also note that the embedding technique may work without extrapolation in some cases; a lower bound on the oscillation max_{|t| ≤ h} |ϕ(t) − ϕ(0)|, h ≈ n^{−1/2}, is crucial here. Nevertheless, we have to deal with the case of small oscillation (that was the goal of Step 3).

Derandomization.
Our algorithm is probabilistic, so the natural question is: can we get rid of the randomness? The only point where we need it is the condition (16). Let (X, µ) be a probability space and C some family of measurable subsets C ⊂ X. Recall that the discrepancy of a finite set Γ ⊂ X for the family C is defined as

disc(Γ, C) := sup_{C ∈ C} | µ(C) − |Γ ∩ C|/|Γ| |.

The condition (16) is equivalent to the following discrepancy bound:

disc({γ_i}_{i=1}^{N₀}, C_n) ≤ c,   (29)

for a suitable small constant c, where C_n is the family of symmetric spherical caps {γ ∈ S^{n−1} : |⟨a, γ⟩| > t}. There are two difficulties. First, we need a bound on the discrepancy as n → ∞, while discrepancy theory is mostly developed for fixed n. Second, we want a deterministic construction of {γ_i}. In the case of the boxes R_n in [0, 1]^n it is known [Tbook, Prop. 6.72] that there are constructive sets Ξ_N of N points with

disc(Ξ_N, R_n) ≤ Cn^{1/2}N^{−1/2} ln^{1/2} max(n, N).

So, N = n^{1+o(1)} points would suffice for small discrepancy. It is interesting to obtain good constructive bounds in the spherical case.

General ridge functions.
A ridge function is a simple yet interesting object, but in practice one has to deal with more complex functions. So it is important to study the recovery of generalized ridge functions, e.g., sums

ϕ₁(⟨a₁, x⟩) + ϕ₂(⟨a₂, x⟩) + . . . + ϕ_r(⟨a_r, x⟩).

One may consider the case of analytic (or even polynomial) functions ϕ_j.

Acknowledgement.
The authors wish to express their gratitude to S.V. Konyagin for his suggestion to consider the case of regular ridge functions and for constant encouragement.

References

[DT19] L. Demanet, A. Townsend, “Stable extrapolation of analytic functions”, Foundations of Computational Mathematics, 19:2 (2019), 297–331.
[DM21] B. Doerr, S. Mayer, “The recovery of ridge functions on the hypercube suffers from the curse of dimensionality”, J. Complexity, 63 (2021).
[FSV12] M. Fornasier, K. Schnass, J. Vybiral, “Learning functions of few arbitrary linear parameters in high dimensions”, Foundations of Computational Mathematics, 12:2 (2012), 229–262.
[TC14] H. Tyagi, V. Cevher, “Learning non-parametric basis independent models from point queries via low-rank methods”, Applied and Computational Harmonic Analysis, 37:3 (2014), 389–412.
[MUV15] S. Mayer, T. Ullrich, J. Vybiral, “Entropy and sampling numbers of classes of ridge functions”, Constructive Approximation, 42:2 (2015), 231–264.
[CDD12] A. Cohen, I. Daubechies, R. DeVore, G. Kerkyacharian, D. Picard, “Capturing ridge functions in high dimensions from point queries”, Constructive Approximation, 35:2 (2012), 225–243.
[Ber] S. Bernstein, “Sur la meilleure approximation des fonctions continues par les polynomes du degré donné. II”, Communications de la Société mathématique de Kharkow, 2-ème série, 13:4-5 (1912), 145–194 (in Russian).
[NW16] E. Novak, H. Wozniakowski, “Tractability of multivariate problems for standard and linear information in the worst case setting, Part I”, J. Approx. Th., 207 (2016), 177–192.
[GL07] S. Gaiffas, G. Lecue, “Optimal rates and adaptation in the single-index model using aggregation”, Electronic Journal of Statistics, 1 (2007), 538–573.
[Os00] K.Yu. Osipenko, Optimal Recovery of Analytic Functions, Nova Publishers, 2000.
[M90] P. Massart, “The tight constant in the Dvoretzky–Kiefer–Wolfowitz inequality”, Annals of Probability, 18:3 (1990), 1269–1283.
[Tbook] V. Temlyakov, Greedy Approximation, Cambridge University Press, 2011.
[K06] V.I. Khokhlov, “The uniform distribution on a sphere in R^s. Properties of projections. I”, Theory Probab. Appl., 50:3 (2006), 386–399.
[N64] I.P. Natanson, Constructive Function Theory, vol. 1, Ungar, 1964.
[gitTZ] https://github.com/TZZZZ/new_ridge_no_curse