A sufficient condition for n-Best Kernel Approximation in Reproducing Kernel Hilbert Spaces

Abstract

We show that if a reproducing kernel Hilbert space $H_K$, consisting of functions defined on $\mathbf{E}$, enjoys the Double Boundary Vanishing Condition (DBVC) and the Linear Independent Condition (LIC), then for any preset natural number $n$ and any function $f \in H_K$, there exists a set of $n$ parameterized multiple kernels $\tilde{K}_{w_1}, \dots, \tilde{K}_{w_n}$, $w_k \in \mathbf{E}$, $k = 1, \dots, n$, and real (or complex) constants $c_1, \dots, c_n$, giving rise to a solution of the optimization problem
\[ \Big\| f - \sum_{k=1}^{n} c_k \tilde{K}_{w_k} \Big\| = \inf\Big\{ \Big\| f - \sum_{k=1}^{n} d_k \tilde{K}_{v_k} \Big\| \ \Big|\ v_k \in \mathbf{E},\ d_k \in \mathbf{R}\ (\text{or } \mathbf{C}),\ k = 1, \dots, n \Big\}. \]
By applying the theorem of this paper we show that the Hardy space and the Bergman space, as well as all the weighted Bergman spaces in the unit disc, possess $n$-best approximations. In the Hardy space case this gives a new proof of a classical result. Based on the obtained results we further prove existence of $n$-best spherical Poisson kernel approximation to functions of finite energy on the real spheres.


Wei Qu, Tao Qian*, and Guan-Tie Deng

School of Mathematical Sciences, Beijing Normal University, Beijing, China
Macau Center for Mathematical Sciences, Macau University of Science and Technology, Macau, China
School of Mathematical Sciences, Beijing Normal University, Beijing, China

August 4, 2020

MSC: 41A20; 41A65; 46E22; 30H20

Keywords:

Reproducing Kernel Hilbert Space, Double Boundary Vanishing Condition, $n$-Linearly Independent Condition, Hardy Space, Bergman Space, Approximation by Rational Functions of Certain Degrees

* Corresponding author: Tao Qian. Supported by The Science and Technology Development Fund, Macau SAR (File no. 0123/2018/A3).

Let $\mathcal{H}$ be a complex Hilbert space consisting of functions defined in a topological space $\mathbf{E}$. Assume that the point evaluation functional $f \mapsto f(z)$, for any fixed $z \in \mathbf{E}$, is a bounded linear functional, i.e.,
\[ |f(z)| \le C_z \|f\|, \]
where $C_z$ is a constant depending on $z$. Then, according to the Riesz representation theorem, there is a function $K_z$, with $z$ being a parameter, such that
\[ f(z) = \langle f, K_z \rangle \]
for all $z \in \mathbf{E}$. In such a case we say that $\mathcal{H}$ is a reproducing kernel Hilbert space, abbreviated as RKHS, and call $K_z$ the reproducing kernel of $\mathcal{H}$. Denote by $H_K(\mathbf{E})$ the Hilbert space $\mathcal{H}$ whose corresponding reproducing kernel function is $K_z$. Indeed, any RKHS can have only one reproducing kernel. A wide class of Hilbert spaces, including the classical Hardy $H^2$-spaces, Bergman spaces, weighted Bergman spaces, and Sobolev spaces, belong to the category of reproducing kernel Hilbert spaces (RKHSs).

The subject of $n$-best approximation in reproducing kernel Hilbert spaces includes, as a particular case, the problem of best approximation to Hardy space functions by rational functions of order not exceeding $n$. The present study amounts to extending the question and solving it in a wide class of Hilbert spaces.

Below we first give an account of the related concepts in the classical Hardy space of the unit disc. Denote by $\mathbf{C}$ the complex plane and by $\mathbf{D}$ the open unit disc in $\mathbf{C}$. The Hardy space in the unit disc is defined, among other equivalent definitions, by
\[ H^2(\mathbf{D}) = \Big\{ f : \mathbf{D} \to \mathbf{C} \ \Big|\ f(z) = \sum_{k=0}^{\infty} c_k z^k,\ \sum_{k=0}^{\infty} |c_k|^2 < \infty \Big\}. \]
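As a small worked illustration of the coefficient definition of the Hardy space (the function is chosen here only as an example), take $f(z) = 1/(1-\overline{w}z)$ for a fixed $w \in \mathbf{D}$. Expanding it as a geometric series gives
\[ \frac{1}{1-\overline{w}z} = \sum_{k=0}^{\infty} \overline{w}^{\,k} z^k, \qquad \sum_{k=0}^{\infty} \big|\overline{w}^{\,k}\big|^2 = \sum_{k=0}^{\infty} |w|^{2k} = \frac{1}{1-|w|^2} < \infty, \]
so this function satisfies the square-summability requirement, and the sum of its squared coefficient moduli is $1/(1-|w|^2)$. The sum grows without bound as $|w| \to 1$, which is why kernel functions of this form are normalized before their behaviour near the boundary is studied.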
It is a basic property of the Hardy space that for any $f \in H^2(\mathbf{D})$ there exists a boundary limit function, denoted $f(e^{it}) \in L^2(\partial\mathbf{D})$, in both the pointwise non-tangential limit sense and the $L^2$-convergence sense. It is standard knowledge that under the inner product
\[ \langle f, g \rangle = \frac{1}{2\pi} \int_0^{2\pi} f(e^{it})\, \overline{g(e^{it})}\, dt \]
the space $H^2(\mathbf{D})$ forms a Hilbert space.

Of particular importance in the Hardy space theory are the functions
\[ k_w(e^{it}) = \frac{1}{1-\overline{w}e^{it}} \quad \text{and} \quad e^{H}_w(z) = \frac{k_w}{\|k_w\|} = \frac{\sqrt{1-|w|^2}}{1-\overline{w}z}. \]
The function $k_w(z)$, where $w$ is considered as a parameter, is the reproducing kernel of the Hardy space $H^2(\mathbf{D})$: by invoking the Cauchy formula it follows that for any $f \in H^2(\mathbf{D})$ there holds
\[ \langle f, k_w \rangle = f(w), \]
and, subsequently,
\[ \Big\langle f, \Big(\frac{\partial}{\partial \overline{w}}\Big)^{l} k_w \Big\rangle = f^{(l)}(w), \qquad l = 1, 2, \dots. \]

Definition 1.1

For any $n$ complex numbers $(w_1, \dots, w_n) \in \mathbf{D}^n$ and $n$ complex numbers $(c_1, \dots, c_n) \in \mathbf{C}^n$, the function
\[ \sum_{k=1}^{n} c_k B_k(z) \]
is called an $n$-Blaschke form, and an $n$-degenerate Blaschke form if $c_n = 0$, where $\{B_k\}_{k=1}^{n}$ is the $n$-Takenaka-Malmquist ($n$-TM) system generated by the sequence $(w_1, \dots, w_n)$,
\[ B_k(z) = \frac{\sqrt{1-|w_k|^2}}{1-\overline{w}_k z} \prod_{l=1}^{k-1} \frac{z-w_l}{1-\overline{w}_l z}. \]

We note that the $n$-TM system $\{B_k\}_{k=1}^{n}$ is the orthonormalization of the $n$-system $(\tilde{k}_{w_1}, \dots, \tilde{k}_{w_n})$, where
\[ \tilde{k}_{w_k}(z) \triangleq \Big(\frac{d}{d\overline{w}}\Big)^{(l(w_k)-1)} k_w(z) \Big|_{w=w_k}, \tag{1.1} \]
called the multiple reproducing kernels, where $l(w_k) \triangleq$ the multiple number of $w_k$ in $(w_1, \dots, w_k)$ ([10, 17]). Besides the multiple reproducing kernels we also use the normalized multiple reproducing kernels
\[ \tilde{e}^{H}_{w_k}(z) = \frac{\tilde{k}_{w_k}(z)}{\|\tilde{k}_{w_k}\|}. \tag{1.2} \]
For fast expansion of a given function into a TM system the adaptive Fourier decomposition (AFD) was proposed. It is related to the Beurling-Lax decomposition of the Hardy space into the direct sum of the forward- and the backward-shift invariant subspaces ([9, 21]). AFD theory and algorithms have been generalized to matrix-valued functions defined in the disc ([1]) and in the ball of several complex variables ([2]).

The $n$-best rational approximation problem in the Hardy space is formulated as follows. A pair of polynomials $(p, q)$ is said to be $n$-admissible if it satisfies the following conditions: (i) $p$ and $q$ are co-prime; (ii) $q$ does not have zeros in $\mathbf{D}$; and (iii) the degrees of both $p$ and $q$ are at most $n$ ([10, 18]).

The $n$-best Rational Approximation Problem: For $f \in H^2(\mathbf{D})$, find an $n$-admissible pair of polynomials $(p_1, q_1)$ such that
\[ \Big\| f - \frac{p_1}{q_1} \Big\|_{H^2(\mathbf{D})} = \inf\Big\{ \Big\| f - \frac{p}{q} \Big\|_{H^2(\mathbf{D})} \ \Big|\ (p, q) \text{ is } n\text{-admissible} \Big\}. \tag{1.3} \]
The above optimization problem may be re-formulated as finding a non-degenerate Blaschke form
\[ \sum_{k=1}^{n} \langle f, B^{\mathbf{w}}_k \rangle B^{\mathbf{w}}_k(z), \]
where the $B^{\mathbf{w}}_k$'s correspond to $\mathbf{w} = (w_1, \dots, w_n)$, such that
\[ \Big\| f - \sum_{k=1}^{n} \langle f, B^{\mathbf{w}}_k \rangle B^{\mathbf{w}}_k \Big\|_{H^2(\mathbf{D})} = \inf\Big\{ \Big\| f - \sum_{k=1}^{n} \langle f, B^{\mathbf{v}}_k \rangle B^{\mathbf{v}}_k \Big\|_{H^2(\mathbf{D})} \ \Big|\ \mathbf{v} = (v_1, \dots, v_n) \in \mathbf{D}^n \Big\}. \tag{1.4} \]
Or, alternatively, we can ask the following question. Denote by $f/\mathrm{span}\{\tilde{e}^{H}_{v_1}, \dots, \tilde{e}^{H}_{v_n}\}$ the projection of $f$ into the span of $\tilde{e}^{H}_{v_1}, \dots, \tilde{e}^{H}_{v_n}$, $(v_1, \dots, v_n) \in \mathbf{D}^n$. Find $(w_1, \dots, w_n) \in \mathbf{D}^n$ such that $\| f/\mathrm{span}\{\tilde{e}^{H}_{w_1}, \dots, \tilde{e}^{H}_{w_n}\} \|$ is maximized over all $(v_1, \dots, v_n) \in \mathbf{D}^n$.

There have been several proofs in the literature of the existence of a solution to the above $n$-best rational approximation problem in the classical Hardy spaces, see [15] (J. L. Walsh, 1962), [14] (G. Ruckebusch, 1978), [3] (L. Baratchart), [10], [6]. In the last two articles the problem is reformulated in terms of $n$-Blaschke forms. In the Hardy space case the practical algorithms, including the INRIA method ([19]), cyclic AFD ([18]), and lately the gradient descent method in [11], can only claim to converge to a local minimum. A mathematical algorithm that finds the global minimum is still being sought.

The present paper works in the RKHS context. In a general RKHS one has a set of analogous objects and can raise the same $n$-best approximation question. Let $H_K$ be a reproducing kernel Hilbert space (RKHS) consisting of a class of functions defined in a topological space $\mathbf{E}$, an open and connected set if it is in a larger topological space, with the reproducing kernel $K_w$, $w \in \mathbf{E}$, that is, for any $f \in H_K$,
\[ \langle f, K_w \rangle = f(w). \]
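The Hardy-space objects above can be checked numerically. The following is a minimal sketch, not taken from the paper: the helper names `tm_system` and `inner`, the sample parameters, and the trapezoid quadrature on the unit circle are all assumptions made for illustration. It builds the TM system for a parameter sequence with one repeated entry, approximates the Gram matrix under the Hardy inner product, and compares $\langle k_b, B_k \rangle$ with $\overline{B_k(b)}$, which is what the reproducing property predicts.

```python
import cmath
import math


def tm_system(ws):
    """Return callables B_1, ..., B_n of the TM system generated by ws (points of the open unit disc)."""
    def make(k):
        def B(z):
            # Normalized Szego kernel at ws[k] ...
            val = math.sqrt(1 - abs(ws[k]) ** 2) / (1 - ws[k].conjugate() * z)
            # ... multiplied by the Blaschke factors of the earlier parameters.
            for l in range(k):
                val *= (z - ws[l]) / (1 - ws[l].conjugate() * z)
            return val
        return B
    return [make(k) for k in range(len(ws))]


def inner(f, g, num_nodes=2048):
    """Approximate <f, g> = (1/2pi) int_0^{2pi} f(e^{it}) conj(g(e^{it})) dt by the trapezoid rule."""
    total = 0j
    for j in range(num_nodes):
        z = cmath.exp(2j * math.pi * j / num_nodes)
        total += f(z) * g(z).conjugate()
    return total / num_nodes


# Hypothetical sample parameters; 0.5 is repeated on purpose.
ws = [0.5 + 0j, -0.3 + 0.4j, 0.5 + 0j]
Bs = tm_system(ws)
gram = [[inner(Bi, Bj) for Bj in Bs] for Bi in Bs]

b = 0.2 + 0.3j


def k_b(z):
    """Reproducing kernel of H^2(D) at the point b."""
    return 1 / (1 - b.conjugate() * z)


# By the reproducing property, <k_b, B> should equal conj(B(b)).
coeffs = [inner(k_b, B) for B in Bs]
```

In this sketch `gram` should be close to the identity matrix, and each entry of `coeffs` should agree with `B(b).conjugate()` up to quadrature and rounding error.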
We will also use the objects $\tilde{K}_w$ and $\tilde{E}_w$ as, respectively, the multiple reproducing kernel and the multiple normalized reproducing kernel, defined similarly to $\tilde{k}_a$ and $\tilde{e}^{H}_a$ in, respectively, (1.1) and (1.2).

For a fixed positive integer $n$, the $n$-best question is formulated as follows: find $n$ parameters $\mathbf{a} = (a_1, \dots, a_n) \in \mathbf{E}^n$ that minimize the objective function
\[ A(f; \mathbf{a}) = \Big\| f - \sum_{k=1}^{n} \langle f, B^{\mathbf{a}}_k \rangle_{H_K} B^{\mathbf{a}}_k \Big\|_{H_K}, \tag{1.5} \]
where
\[ \sum_{k=1}^{n} \langle f, B^{\mathbf{a}}_k \rangle_{H_K} B^{\mathbf{a}}_k \tag{1.6} \]
is called the $n$-kernel orthonormal form of $f$ corresponding to the $n$-tuple $(a_1, \dots, a_n)$, and $(B^{\mathbf{a}}_1, \dots, B^{\mathbf{a}}_k)$ is the G-S orthonormalization of the multiple kernels $(\tilde{E}_{a_1}, \dots, \tilde{E}_{a_k})$, $k \le n$. Note that the above formulation is equivalent to the following minimization problem: find $(a_1, \dots, a_n) \in \mathbf{E}^n$ and $(c_1, \dots, c_n) \in \mathbf{C}^n$ such that
\[ \Big\| f - \sum_{k=1}^{n} c_k \tilde{K}_{a_k} \Big\| = \inf\Big\{ \Big\| f - \sum_{k=1}^{n} d_k \tilde{K}_{b_k} \Big\| \ \Big|\ b_k \in \mathbf{E},\ d_k \in \mathbf{C},\ k = 1, \dots, n \Big\}. \tag{1.7} \]
If for some $d_k$'s and $b_k$'s there holds $f = \sum_{k=1}^{m} d_k \tilde{K}_{b_k}$, then $f$ is said to be an $m$-kernel expansion.

We note that in the cases where the RKHS under study is the Hardy space inside the unit disc or the Hardy space in the upper-half complex plane, if $(B_1, \dots, B_n)$ is the G-S orthonormalization of the $n$-tuple of the multiple reproducing kernels $(\tilde{e}^{H}_{a_1}, \dots, \tilde{e}^{H}_{a_n})$, then, by adding one more multiple reproducing kernel $\tilde{e}^{H}_{a_{n+1}}$ to the $n$-sequence, the corresponding $(n+1)$-orthonormalization system $(B_1, \dots, B_n, B_{n+1})$ has its $(n+1)$-th term of the form $B_{n+1} = \phi_n \tilde{e}^{H}_{a_{n+1}}$, where $\phi_n$ is the Blaschke product, unique up to a uni-modular constant, defined by the first $n$ parameters $a_1, \dots, a_n$ as its zeros, including the multiples. Indeed, TM systems are constructed in this way. In AFD, through a generalized backward shift operation, the TM systems are automatically generated ([9]).
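The equivalence between the orthonormal-form objective (1.5) and the linear-combination problem (1.7) can be sketched numerically for a fixed parameter tuple. The example below is an illustration under stated assumptions, not the paper's algorithm: it works in the Hardy space of the disc with distinct parameters, takes $f = k_b$ so that every inner product has a closed form, and uses hypothetical helper names (`kernel`, `solve`, `residual_sq_least_squares`, `residual_sq_tm`). For fixed $a_k$ the optimal coefficients solve the normal equations with the Gram matrix $G_{ij} = k_{a_j}(a_i)$, and the squared residual should match the one obtained from the TM coefficients.

```python
import math


def kernel(z, w):
    """Szego kernel k_w(z) = 1 / (1 - conj(w) z) of the Hardy space H^2(D)."""
    return 1 / (1 - w.conjugate() * z)


def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(y)
    M = [list(row) + [rhs] for row, rhs in zip(A, y)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= m * M[c][j]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def residual_sq_least_squares(b, a):
    """min over d of ||k_b - sum_j d_j k_{a_j}||^2, via the normal equations G d = y."""
    G = [[kernel(ai, aj) for aj in a] for ai in a]   # G[i][j] = k_{a_j}(a_i)
    y = [kernel(ai, b) for ai in a]                  # y[i] = f(a_i) with f = k_b
    d = solve(G, y)
    norm_sq = kernel(b, b).real                      # ||k_b||^2 = 1 / (1 - |b|^2)
    proj_sq = sum(dj * yj.conjugate() for dj, yj in zip(d, y)).real
    return norm_sq - proj_sq


def residual_sq_tm(b, a):
    """||k_b||^2 - sum_t |<k_b, B_t>|^2, using <k_b, B_t> = conj(B_t(b)) for the TM system."""
    energy = 0.0
    blaschke = 1 + 0j
    for ak in a:
        Bk = math.sqrt(1 - abs(ak) ** 2) / (1 - ak.conjugate() * b) * blaschke
        energy += abs(Bk) ** 2
        blaschke *= (b - ak) / (1 - ak.conjugate() * b)
    return kernel(b, b).real - energy


# Hypothetical sample data: three distinct parameters and a target point b.
a = [0.5 + 0j, -0.3 + 0.4j, 0.1 - 0.6j]
b = 0.2 + 0.3j
ls_value = residual_sq_least_squares(b, a)
tm_value = residual_sq_tm(b, a)
```

Here `ls_value` and `tm_value` are two evaluations of $A(f; \mathbf{a})^2$ for the same tuple. The $n$-best problem itself additionally minimizes over the tuple $\mathbf{a}$, which this sketch does not attempt.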
It is a question whether there exist other types of RKHSs that possess such or similar constructive property. From our observation it seems that only the Hardy spaces of the classical domains possess such property (see [1, 2]). In the weighted Bergman spaces of the classical domains this property does not hold ([12, 13]).

As technical preparation we need to recall the so-called $\rho$-weak pre-orthogonal adaptive Fourier decomposition ($\rho$-Weak-POAFD) developed in the general RKHS context. Assume that $\mathcal{H}$ is a general Hilbert space with a dictionary parameterized by elements in $\mathbf{E}$, denoted $E_a$, $a \in \mathbf{E}$. Let $\rho \in (0, 1)$. Suppose that we have obtained an $n$-term orthogonal expansion
\[ f = \sum_{k=1}^{n} \langle f, B_k \rangle B_k + g_{n+1}, \]
where $(B_1, \dots, B_n)$ is the G-S orthonormalization of $(E_{a_1}, \dots, E_{a_n})$ for a selected $n$-sequence $(a_1, \dots, a_n)$ whose entries $a_k$ are mutually different. Select $a_{n+1}$, different from all the already selected $a_k$'s, $k = 1, \dots, n$, such that
\[ |\langle f, B^{a_{n+1}}_{n+1} \rangle| \ge \rho \sup\{ |\langle f, B^{b}_{n+1} \rangle| \ |\ b \in \mathbf{E} \}, \tag{1.8} \]
where for any $b \in \mathbf{E}$, $(B_1, \dots, B_n, B^{b}_{n+1})$ is the G-S orthonormalization of $(E_{a_1}, \dots, E_{a_n}, E_b)$. Making such selections from $n = 1$ and for all consecutive $n > 1$, we obtain $\rho$-Weak-POAFD ([8, 12]).

Remark 1.2 $\rho$-Weak-POAFD is available for all RKHSs. The selection criterion (1.8) shows that it is closer to optimal than the other types of weak greedy algorithms in the classical literature ([5, 4]). When a dictionary satisfies BVC (see below), the selection corresponding to $\rho = 1$ is available, called POAFD. POAFD has the optimal maximal selection at each step of the algorithm ([8]). It reduces to AFD in the classical Hardy space ([9]).

In this paper we introduce what we call the Double Boundary Vanishing Condition (DBVC), which will play an important role in the $n$-best optimization problem. Assume that the parameter set $\mathbf{E}$ is equipped with a topology. We usually work with the cases in which $\mathbf{E}$ is a region (open and connected) of the complex plane $\mathbf{C}$ or a region of the space of several complex variables $\mathbf{C}^n$ under its natural topology. We adopt the convention that, together with the finite boundary points, the infinite point is included in the set of boundary points if $\mathbf{E}$ is unbounded, which corresponds to the one-point compactification of the original topological space. The added point is denoted $\infty$. Take $\mathbf{E} = \mathbf{C}^{+} = \{ z \in \mathbf{C} \ |\ \mathrm{Im}(z) > 0 \}$ as an example. $\mathbf{E}$ is equipped with the topology of $\mathbf{C}$. By adding the point $\infty$, the sequence of open sets $\{ z \in \mathbf{C}^{+} \ |\ \mathrm{Im}(z) < 1/m \text{ or } |z| > n \}$, where $m, n$ are positive integers, forms a basis of open neighborhoods of $\partial\mathbf{E}$. A RKHS is said to satisfy DBVC if for any sequences $z_n \to \tilde{z} \in \mathbf{E}$ and $w_n \to \tilde{w} \in \partial\mathbf{E}$ with $z_n \ne w_n$, there holds
\[ \lim_{n \to \infty} \langle E_{z_n}, E_{w_n} \rangle = 0. \tag{1.9} \]
If DBVC holds, then we can show that BVC (the boundary vanishing condition) holds. That is, for any $f \in H_K$ and $w_n \to \tilde{w} \in \partial\mathbf{E}$, there holds
\[ \lim_{n \to \infty} \langle f, E_{w_n} \rangle = 0. \]
We have the following

Lemma 1.3 If $H_K$ is a RKHS satisfying DBVC, then it satisfies BVC.

Proof.

Let $f \in H_K$. Since $H_K$ is a RKHS, by any type of matching pursuit algorithm, including POAFD and Weak-POAFD, one can find $(a_1, \dots, a_n, \dots)$, consisting of mutually different terms in $\mathbf{E}$, such that
\[ f = \sum_{k=1}^{\infty} \langle f, B_k \rangle B_k, \]
where for any $n$, $(B_1, \dots, B_n)$ is the G-S orthonormalization of the selected $(E_{a_1}, \dots, E_{a_n})$, $n = 1, 2, \dots$. Then, for any $\epsilon > 0$, one can find a natural number $N$ such that
\[ \Big\| f - \sum_{k=1}^{N} \langle f, B_k \rangle B_k \Big\| \le \epsilon/2. \]
By the triangle inequality and the Cauchy-Schwarz inequality, together with $\|E_{w_n}\| = 1$, we have
\[ |\langle f, E_{w_n} \rangle| \le \Big| \Big\langle f - \sum_{k=1}^{N} \langle f, B_k \rangle B_k, E_{w_n} \Big\rangle \Big| + \Big| \Big\langle \sum_{k=1}^{N} \langle f, B_k \rangle B_k, E_{w_n} \Big\rangle \Big| \le \Big\| f - \sum_{k=1}^{N} \langle f, B_k \rangle B_k \Big\| + \Big| \Big\langle \sum_{k=1}^{N} \langle f, B_k \rangle B_k, E_{w_n} \Big\rangle \Big| \le \epsilon/2 + \Big| \Big\langle \sum_{k=1}^{N} \langle f, B_k \rangle B_k, E_{w_n} \Big\rangle \Big|. \]
We note that in the last summation the functions $B_1, \dots, B_N$ can be expressed as linear combinations of $E_{a_1}, \dots, E_{a_N}$. The inner products involving $B_1, \dots, B_N$ can then be passed on to those with $E_{a_1}, \dots, E_{a_N}$, and thus DBVC can be used. As a result, the right-hand side of the above inequality chain is less than $\epsilon$ if $n$ is large enough. The proof is complete.

We need a condition on RKHSs called the $n$-Linearly Independent Condition ($n$-LIC): if for a fixed $n$ and any mutually distinct $w_1, \dots, w_n$ the corresponding function set $\{E_{w_1}, \dots, E_{w_n}\}$ is linearly independent, then the RKHS is said to satisfy the $n$-Linearly Independent Condition. This condition is rather mild, for, if it is not true, then a parameterized reproducing kernel is a linear combination of some others. The latter implies that there exist $w_1, \dots, w_n$ such that for all functions $f$ in the space there holds $f(w_n) = c_1 f(w_1) + \dots + c_{n-1} f(w_{n-1})$, where the $c_k$'s are fixed complex constants.
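The boundary vanishing behaviour used in Lemma 1.3 can be observed numerically in the Hardy space of the disc, where the normalized kernels have closed forms. The sketch below is only an illustration under assumptions: the function names, the sample point $z$, the test polynomial, and the sequence $w_m$ approaching the boundary point $e^{i}$ are hypothetical choices. It uses $\langle e_z, e_w \rangle = \sqrt{1-|z|^2}\sqrt{1-|w|^2}/(1-\overline{z}w)$ and $\langle f, e_w \rangle = \sqrt{1-|w|^2}\, f(w)$, both of which follow from the reproducing property.

```python
import cmath
import math


def normalized_kernel_inner(z, w):
    """<e_z, e_w> for the normalized Szego kernels of H^2(D)."""
    return (math.sqrt(1 - abs(z) ** 2) * math.sqrt(1 - abs(w) ** 2)
            / (1 - z.conjugate() * w))


def inner_with_normalized_kernel(f, w):
    """<f, e_w> = sqrt(1 - |w|^2) * f(w), by the reproducing property."""
    return math.sqrt(1 - abs(w) ** 2) * f(w)


def poly(z):
    """A hypothetical test function in H^2(D)."""
    return 1 + 2 * z + z * z


z = 0.3 + 0.2j                       # fixed interior point
direction = cmath.exp(1j)            # boundary point that w_m approaches
w_seq = [(1 - 10.0 ** (-m)) * direction for m in range(1, 7)]

dbvc_values = [abs(normalized_kernel_inner(z, w)) for w in w_seq]
bvc_values = [abs(inner_with_normalized_kernel(poly, w)) for w in w_seq]
```

Both lists should decrease towards zero as $w_m$ approaches the unit circle, which is the behaviour that DBVC and BVC require of the kernel dictionary.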
A consequence of $n$-LIC, which is also the form that we use in the proof of our main Theorem 2.1, is that if $a_1, \dots, a_k, b$ are mutually distinct points in $\mathbf{E}$, then the norm of the component of $E_b$ orthogonal to the span of $E_{a_1}, \dots, E_{a_k}$, namely
\[ \Big\| E_b - \sum_{l=1}^{k} \langle E_b, B_l \rangle B_l \Big\| = \sqrt{1 - \sum_{l=1}^{k} |\langle E_b, B_l \rangle|^2}, \]
is nonzero, where $(B_1, \dots, B_k)$ is the G-S orthonormalization of $(E_{a_1}, \dots, E_{a_k})$, $k \le n-1$.

The main result of this paper is

Theorem 1.4

A RKHS $H_K$ has a solution for the $n$-best optimization problem (1.5) in the open set $\mathbf{E}^n$ if the RKHS satisfies DBVC and $n$-LIC.

The precise statement of the theorem will be given in the next section. The main effort of the proof is to show that under the conditions DBVC and $n$-LIC a solution exists and must be situated in the open set $\mathbf{E}^n$ (interior solution). In both the theory (sifting process) and applications (model reduction) a solution being inside the open set is crucial, as has been seen in the complex Hardy space rational approximation theory (see, for instance, the enclosed references by Walsh, Baratchart, Qian, and Qu et al.). The main mechanism for such interior solutions is DBVC. In general RKHSs, a solution of the $n$-best problem may also occur at the boundary. Hence DBVC is not a necessary condition for existence of a general solution.

After proving the main theorem we verify that the weighted Bergman spaces in the disc satisfy DBVC and $n$-LIC, and thus conclude that the weighted Bergman spaces have $n$-best kernel approximations in the corresponding Hilbert space norms. Based on the obtained results we further prove existence of $n$-best spherical Poisson kernel approximation to functions of finite energy on the real spheres. Except for the classical Hardy space case, the other $n$-best existence results proved in this paper, including the version on RKHSs with a DBVC and $n$-LIC dictionary and the concrete examples with complex holomorphic function spaces and the spaces of functions of finite energy on the real spheres, are all new results and proved for the first time.

2 n-Best Approximation for RKHS with DBVC and n-LIC

Theorem 2.1

Let $H_K$ be a RKHS that satisfies DBVC and $n$-LIC. Let $n$ be any but fixed positive integer. Then for any $f \in H_K$, if $f$ by itself is not an $m$-kernel expansion for any $m < n$, then there exists an $n$-tuple of parameters $(a_1, \dots, a_n) \in \mathbf{E}^n$, with $(B_1, \dots, B_n)$ being the associated orthonormal system, such that
\[ A(f; \mathbf{a}) = \Big\| f - \sum_{t=1}^{n} \langle f, B_t \rangle B_t \Big\| \]
attains the minimum value over all possible values arising from all the $n$-tuples in place of $(a_1, \dots, a_n)$ in $\mathbf{E}^n$.

Proof of Theorem 2.1. Denote $\mathbf{b} = (b_1, \dots, b_n)$. It is obvious that $A(f; \mathbf{b})$ has a non-negative global infimum over all $\mathbf{b}$ in $\mathbf{E}^n$; call it $d$. We show that this global infimum is attained at an interior point of $\mathbf{E}^n$. Let $\mathbf{a}^{(k)} = (a^{(k)}_1, \dots, a^{(k)}_n)$ be an $n$-tuple at which $A(f; \mathbf{a}^{(k)}) < d + 1/k$. There then exists a subsequence tending to an $n$-tuple $\mathbf{a}$ in $\overline{\mathbf{E}}^n$, the closure being taken in the compactified topology. Without loss of generality we can assume that the sequence $\mathbf{a}^{(k)}$ itself tends to $\mathbf{a}$. We are to show $\mathbf{a} \in \mathbf{E}^n$. Assume the opposite, which means that some coordinates of $\mathbf{a}$ are on $\partial\mathbf{E}$; we will derive a contradiction in that case. We divide the $n$ coordinates into two groups, $I$ and $B$, where for $l \in I$ there holds $\lim_{k \to \infty} a^{(k)}_l = a_l \in \mathbf{E}$, and for $l \in B$ there holds $\lim_{k \to \infty} a^{(k)}_l = a_l \in \partial\mathbf{E}$. We are assuming $B \ne \emptyset$. Since $A(f; \mathbf{a}^{(k)})$ is the energy of the projection of $f$ onto the orthogonal complement of the span of the multiple reproducing kernels in the $n$-tuple $\mathbf{a}^{(k)}$, and this energy does not depend on the order of the elements in $\mathbf{a}^{(k)}$, we can assume, without loss of generality, that the coordinates in $I$ are all in front of those in $B$. The point is to show that, because $\lim_{k \to \infty} a^{(k)}_l \in \partial\mathbf{E}$ for $l \in B$, the components $a^{(k)}_l$ of $\mathbf{a}^{(k)}$ with $l \in B$ make no contribution to the optimization of $A(f; \mathbf{a})$.
To simplify the argument we may assume without loss of generality that for every $k$ the $n$-tuple $\mathbf{a}^{(k)}$ does not have multiple components, although the limiting $n$-tuple $\mathbf{a}$ may have. Let $l_0$ be the largest index in $I$; then the indices $l_0 + t$, $0 < t \le n - l_0$, are in the index range $B$. Since $B \ne \emptyset$, we have $l_0 < n$. Denote $R^{(k)} = \mathrm{span}\{E_{a^{(k)}_1}, \dots, E_{a^{(k)}_n}\}$ and by $P^{(k)}$ the orthogonal projection onto $R^{(k)}$; likewise, $R^{(k)}_I = \mathrm{span}\{E_{a^{(k)}_1}, \dots, E_{a^{(k)}_{l_0}}\}$ and $P^{(k)}_I$ is the orthogonal projection onto $R^{(k)}_I$. It is easy to show that for the given function $f$ the projections $P^{(k)}_I f$ have a limit as $k \to \infty$, denoted $P_I f$, which is the projection of $f$ into $\mathrm{span}\{\tilde{E}_{a_1}, \dots, \tilde{E}_{a_{l_0}}\}$. Denote $g = f - P_I f$.

The general form of the elements in the Gram-Schmidt orthonormalization of the system $\{E_{a^{(k)}_t}\}_{t=1}^{n}$ is
\[ B^{(k)}_t = \frac{E_{a^{(k)}_t} - \sum_{j=1}^{t-1} \langle E_{a^{(k)}_t}, B^{(k)}_j \rangle B^{(k)}_j}{\big\| E_{a^{(k)}_t} - \sum_{j=1}^{t-1} \langle E_{a^{(k)}_t}, B^{(k)}_j \rangle B^{(k)}_j \big\|}, \tag{2.10} \]
where $\{B^{(k)}_1, \dots, B^{(k)}_j\}$ is the Gram-Schmidt orthonormalization of $\{E_{a^{(k)}_1}, \dots, E_{a^{(k)}_j}\}$, $1 \le j \le n$. We show that for any function $h$ in the reproducing kernel Hilbert space there holds
\[ \lim_{k \to \infty} \langle h, B^{(k)}_t \rangle = 0, \qquad l_0 < t \le n. \tag{2.11} \]
Temporarily accepting (2.11), and using it for $h = f$, we have
\[ d^2 = \lim_{k \to \infty} \Big( \|f\|^2 - \sum_{t=1}^{l_0} |\langle f, B^{(k)}_t \rangle|^2 \Big) = \|f - P_I f\|^2 = \|g\|^2. \]
We note that $g \ne 0$, for otherwise $f$ is an $m$-kernel form with $m = l_0 < n$, contrary to the assumption. Then $g \ne 0$ implies $d > 0$. Let $g^{(k)}_j := f - \sum_{t=1}^{j} \langle f, B^{(k)}_t \rangle B^{(k)}_t$, $l_0 \le j \le n$. We have $\lim_{k \to \infty} g^{(k)}_j = g_{l_0} = g$, $l_0 \le j \le n$. Find $a \in \mathbf{E}$ such that $|\langle g, E_a \rangle| = \delta > 0$. Let the new parameter matrix be $b^{(k)}_t = a^{(k)}_t$, $1 \le t < n$; $b^{(k)}_n = a$, where only the last column is different from the old one.
Then in the new system, using $\tilde{B}^{(k)}_t$ in place of $B^{(k)}_t$, $1 \le t \le n$, and $\tilde{P}^{(k)}$ in place of $P^{(k)}$, we have
\[ \lim_{k \to \infty} \|\tilde{P}^{(k)} f\|^2 = \lim_{k \to \infty} \Big( \sum_{t=1}^{n-1} |\langle f, \tilde{B}^{(k)}_t \rangle|^2 + |\langle f, \tilde{B}^{(k)}_n \rangle|^2 \Big) = \lim_{k \to \infty} \Big( \Big\| \sum_{t=1}^{l_0} \langle f, B^{(k)}_t \rangle B^{(k)}_t \Big\|^2 + |\langle f, \tilde{B}^{(k)}_n \rangle|^2 \Big) = \|P_I f\|^2 + \lim_{k \to \infty} |\langle f, \tilde{B}^{(k)}_n \rangle|^2, \tag{2.12} \]
where
\[ \langle f, \tilde{B}^{(k)}_n \rangle = \frac{\langle f, (I - P^{(k)}_{n-1}) E_a \rangle}{\rho_k}, \]
and, as a consequence of LIC, $\rho_k \in (0, 1]$. We further have $\langle f, (I - P^{(k)}_{n-1}) E_a \rangle = \langle (I - P^{(k)}_{n-1}) f, E_a \rangle$, and also $\lim_{k \to \infty} P^{(k)}_{n-1} f = P_{l_0} f = P_I f$. Take into account $(I - P_{l_0}) f = g$ and
\[ \lim_{k \to \infty} \rho_k = \rho = \sqrt{1 - \sum_{t=1}^{l_0} |\langle E_a, B_t \rangle|^2} \in (0, 1], \]
as a consequence of LIC again. The last equality chain (2.12) finally equals
\[ \|\tilde{P} f\|^2 = \|P_I f\|^2 + \Big| \frac{\langle g, E_a \rangle}{\rho} \Big|^2 = \|f\|^2 - \|g\|^2 + (\delta/\rho)^2 > \|f\|^2 - d^2. \]
In other words,
\[ \lim_{k \to \infty} \|f - \tilde{P}^{(k)} f\| < d, \]
which contradicts $d$ being the global infimum of $A(f; \mathbf{b})$, $\mathbf{b} \in \mathbf{E}^n$. The proof of the theorem is complete once (2.11) is established.

Now we proceed to prove the relation (2.11) for $t = l_0 + 1, \dots, n$. First let $t = l_0 + 1$. We have
\[ \langle f, B^{(k)}_{l_0+1} \rangle = \Big\langle f, \frac{E_{a^{(k)}_{l_0+1}} - \sum_{j=1}^{l_0} \langle E_{a^{(k)}_{l_0+1}}, B^{(k)}_j \rangle B^{(k)}_j}{\sqrt{1 - \sum_{j=1}^{l_0} |\langle E_{a^{(k)}_{l_0+1}}, B^{(k)}_j \rangle|^2}} \Big\rangle. \]
Since $H_K$ satisfies DBVC, by Lemma 1.3 it also satisfies BVC. As a consequence,
\[ \lim_{k \to \infty} \langle f, E_{a^{(k)}_{l_0+1}} \rangle = 0. \tag{2.13} \]
Since $\lim_{k \to \infty} (a^{(k)}_1, \dots, a^{(k)}_{l_0}) = (a_1, \dots, a_{l_0}) \in \mathbf{E}^{l_0}$, the limits $\lim_{k \to \infty} B^{(k)}_j = B_j$ exist as functions in $H_K$ for $j = 1, \dots, l_0$. Then BVC and the Cauchy-Schwarz inequality imply
\[ |\langle E_{a^{(k)}_{l_0+1}}, B^{(k)}_j \rangle| \le |\langle E_{a^{(k)}_{l_0+1}}, B_j \rangle| + |\langle E_{a^{(k)}_{l_0+1}}, B^{(k)}_j - B_j \rangle| \le |\langle E_{a^{(k)}_{l_0+1}}, B_j \rangle| + \|B^{(k)}_j - B_j\| \to 0, \quad \text{as } k \to \infty,\ j \le l_0. \tag{2.14} \]
In accordance with the relations (2.13) and (2.14), we have (2.11) for $t = l_0 + 1$. Now we prove (2.11) for $t > l_0 + 1$.
The induction hypotheses include that each term $B^{(k)}_j$, $1 \le j \le t-1$, is a linear combination of $E_{a^{(k)}_s}$, $1 \le s \le j$, where the coefficients of the linear combination are all constituted by sums and products of $\langle E_{a^{(k)}_s}, E_{a^{(k)}_{s'}} \rangle$, $1 \le s', s \le j$, and divisions by $\sqrt{1 - \sum_{l=1}^{s-1} |\langle E_{a^{(k)}_s}, B^{(k)}_l \rangle|^2}$, $1 \le s \le j$, without involving universal constants; and that $\lim_{k \to \infty} \langle f, B^{(k)}_l \rangle = 0$, $l_0 < l \le t-1$. Write, in accordance with (2.10),
\[ \langle f, B^{(k)}_t \rangle = \frac{1}{\big\| E_{a^{(k)}_t} - \sum_{j=1}^{t-1} \langle E_{a^{(k)}_t}, B^{(k)}_j \rangle B^{(k)}_j \big\|} \Big[ \langle f, E_{a^{(k)}_t} \rangle - \sum_{j=1}^{t-1} \overline{\langle E_{a^{(k)}_t}, B^{(k)}_j \rangle}\, \langle f, B^{(k)}_j \rangle \Big]. \]
The assumed DBVC, its consequence BVC, and the induction hypotheses together establish
\[ \lim_{k \to \infty} \langle f, E_{a^{(k)}_t} \rangle = 0, \qquad \lim_{k \to \infty} \langle E_{a^{(k)}_t}, B^{(k)}_j \rangle = 0, \quad j \le t-1, \qquad \text{and} \qquad \lim_{k \to \infty} \langle f, B^{(k)}_j \rangle = 0, \quad l_0 < j \le t-1. \]
Consequently,
\[ \lim_{k \to \infty} \langle f, B^{(k)}_t \rangle = 0. \]
By the principle of mathematical induction the proof is complete.
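The mechanism of the proof can be illustrated in the Hardy space of the disc, where the Gram-Schmidt elements of (2.10) are the TM functions and, for $f = k_b$, the coefficient $\langle f, B_t \rangle$ equals $\overline{B_t(b)}$. The sketch below is a hedged illustration, not part of the proof: the helper names `tm_value` and `captured_energy` and all sample points are hypothetical. It shows the coefficient attached to a parameter drifting to the boundary tending to zero, as in (2.11), and shows that replacing that parameter by an interior point captures strictly more energy, which is the source of the contradiction.

```python
import math


def tm_value(params, t, z):
    """Value B_t(z) (1-based t) of the TM system generated by params."""
    a_t = params[t - 1]
    val = math.sqrt(1 - abs(a_t) ** 2) / (1 - a_t.conjugate() * z)
    for a_j in params[:t - 1]:
        val *= (z - a_j) / (1 - a_j.conjugate() * z)
    return val


def captured_energy(params, b):
    """||P f||^2 for f = k_b, i.e. sum_t |<f, B_t>|^2 = sum_t |B_t(b)|^2."""
    return sum(abs(tm_value(params, t, b)) ** 2
               for t in range(1, len(params) + 1))


b = 0.2 + 0.3j          # f = k_b, the reproducing kernel at b
a1 = 0.5 + 0j           # parameter that stays in the interior (group I)

# Second parameter drifting to the boundary point 1 (group B).
boundary_coeffs = []
for m in range(1, 7):
    a2 = complex(1 - 10.0 ** (-m))
    boundary_coeffs.append(abs(tm_value([a1, a2], 2, b)))

# Energy captured in the limit (only a1 contributes) versus the energy
# captured after replacing the drifting parameter by an interior point.
limit_energy = captured_energy([a1], b)
interior_energy = captured_energy([a1, -0.4 + 0.1j], b)
```

In this sketch `boundary_coeffs` shrinks towards zero, while `interior_energy` exceeds `limit_energy`, so a minimizing sequence gains nothing from parameters that escape to the boundary.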

Remark 2.2

A large number of commonly used Hilbert spaces are RKHSs in which DBVC and LIC are satisfied. The above theorem guarantees that such RKHSs have $n$-best kernel approximations. The recently developed cyclic and gradient descent algorithms ([18, 11]) for Hardy spaces are adaptable to abstract RKHSs with DBVC and LIC. The proof of the existence result guarantees convergence of the adapted algorithms in abstract spaces. It, in particular, serves as a useful reference in learning theory for simultaneously selecting $n$ parameters to optimize an energy-based objective function.

References

[1] D. Alpay, F. Colombo, T. Qian, I. Sabadini, Adaptive orthonormal systems for matrix-valued functions, Proceedings of the American Mathematical Society, 2017, (5): 2089-2106.
[2] D. Alpay, F. Colombo, T. Qian, I. Sabadini, Adaptative decomposition: The case of the Drury-Arveson space, Journal of Fourier Analysis and Applications, 2017, (6): 1426-1444.
[3] L. Baratchart, Existence and generic properties of $L^2$ approximations for linear systems, Math. Control Inform., : 89-101.
[4] E. D. Livshitz, V. N. Temlyakov, On convergence of weak greedy algorithms, South Carolina University Columbia Dept of Mathematics, 2000.
[5] S. Mallat, Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Trans. Signal Process, 1993, : 3397-3415.
[6] W. Mi, T. Qian, F. Wan, A fast adaptive model reduction method based on Takenaka-Malmquist systems, Systems and Control Letters, 2012, (1): 223-230.
[7] T. Qian, Reproducing Kernel Sparse Representations in Relation to Operator Equations, Complex Anal. Oper. Theory 14 (2020), no. 2, 1C15.
[8] T. Qian, Two-Dimensional Adaptive Fourier Decomposition, Mathematical Methods in the Applied Sciences, 2016, (10): 2431-2448.
[9] T. Qian, Y.B. Wang, Adaptive Fourier series-a variation of greedy algorithm, Advances in Computational Mathematics, 2011, (3): 279–293.
[10] T. Qian, E. Wegert, Optimal approximation by Blaschke forms, Complex Variables and Elliptic Equations, 2013, (1): 123-133.
[11] T. Qian, J. Z. Wang, W. X. Mai, An Enhancement Algorithm for Cyclic Adaptive Fourier Decomposition, Applied and Computational Harmonic Analysis, available, 2019.
[12] W. Qu, P. Dang, Rational approximation in a class of weighted Hardy spaces, Complex Analysis and Operator Theory, 2019, (4): 1827-1852.
[13] W. Qu, P. Dang, Reproducing kernel approximation in weighted Bergman spaces: Algorithm and applications, Mathematical Methods in the Applied Sciences, 2019, (12): 4292-4304.
[14] G. Ruckebusch, Sur l'approximation rationnelle des filtres, Report No 35 CMA Ecole Polytechnique, 1978.
[15] J. L. Walsh, Interpolation and approximation by rational functions in the complex domain, American Mathematical Soc. Publication, 1962.
[16] T. Qian, Reproducing Kernel Sparse Representations in Relation to Operator Equations, Complex Anal. Oper. Theory 14 (2020), no. 2, 1C15.
[17] T. Qian, Y.B. Wang, Remarks on adaptive Fourier decomposition, International Journal of Wavelets, Multiresolution and Information Processing, 2013, (01).
[18] T. Qian, Cyclic AFD Algorithm for best approximation by rational functions of given order, Mathematical Methods in the Applied Sciences, 2014, (6): 846-859.
[19] L. Baratchart, M. Cardelli, M. Olivi, Identification and rational $L^2$ approximation: a gradient algorithm, Automatica, 1991, (2): 413-417.
[20] E. Stein, G. Weiss, Introduction to Fourier Analysis in Euclidean Spaces, Princeton University Press, 1970.
[21] L. H. Tan, T. Qian, Q. H. Chen, New aspects of Beurling-Lax shift invariant subspaces, Applied Mathematics and Computation, 2015, : 257-266.
[22] X. Y. Wang, T. Qian, I. T. Leong, Y. Gao, Two-Dimensional Frequency-Domain System Identification, IEEE Transactions on Automatic Control, 2019, DOI: 10.1109/TAC.2019.2913047.
[23] B. Korenblum, H. Hedenmalm, K. Zhu, D. Békollé, Theory of Bergman spaces, The Mathematical Intelligencer, 2005, (1): 85–86.
[24] B. MacCluer, Elementary functional analysis, Springer Science & Business Media, 2008, 253.