Perturbations of Christoffel-Darboux kernels. I: detection of outliers

Abstract

Two central objects in constructive approximation, the Christoffel-Darboux kernel and the Christoffel function, encode ample information about the associated moment data and ultimately about the possible generating measures. We develop a multivariate theory of the Christoffel-Darboux kernel in C^d, with emphasis on the perturbation of Christoffel functions and their level sets with respect to perturbations of small norm or low rank. The statistical notion of leverage score provides a quantitative criterion for the detection of outliers in large data. Using the refined theory of Bergman orthogonal polynomials, we illustrate the main results, including some numerical simulations, in the case of finite atomic perturbations of area measure of a 2D region. Methods of function theory of a complex variable and (pluri)potential theory are widely used in the derivation of our perturbation formulas.


PERTURBATIONS OF CHRISTOFFEL-DARBOUX KERNELS. I: DETECTION OF OUTLIERS

BERNHARD BECKERMANN, MIHAI PUTINAR, EDWARD B. SAFF, AND NIKOS STYLIANOPOULOS

Contents

1. Introduction
1.1. The Christoffel-Darboux kernel and Christoffel function
1.2. Detecting outliers and anomalies in statistical data
1.3. Outline
2. Univariate and multivariate Christoffel-Darboux kernel
2.1. Definition and basic properties in the univariate case
2.2. Definition of the Christoffel-Darboux kernel in the multivariate case
2.3. Basic properties of multivariate Christoffel-Darboux kernels
2.4. Leverage scores, the Mahalanobis distance and Christoffel-Darboux kernels
3. Approximation of Christoffel-Darboux kernels
3.1. Comparing two measures

Date: April 30, 2019.

2010 Mathematics Subject Classification.

Key words and phrases. orthogonal polynomial, Christoffel-Darboux kernel, Green function, Siciak function, Bergman space, outlier, leverage score.

Acknowledgements.

The first author was supported in part by the Labex CEMPI (ANR-11-LABX-0007-01). The third author was partially supported by the U.S. National Science Foundation grant DMS-1516400. The fourth author was supported by the University of Cyprus grant 3/311-21027.

1. Introduction

The wide scope of this first article in a series is a systematic study of perturbations of multivariate Christoffel-Darboux kernels. We depart from a Bourbaki style by treating in parallel a specific application, namely the detection of outliers in statistical data. Both themes have a rich and glorious history which cannot be condensed in a single research article. We will merely provide glimpses with precise references into the past or recent achievements as our story unfolds, continuously interlacing the two themes.

The reproducing kernel $K^\mu_n(z,w)$ on the space of univariate or multivariate polynomials of degree less than or equal to $n$, in the presence of an inner product derived from the Lebesgue space $L^2(\mu)$ associated with a positive measure $\mu$ in $\mathbb{C}^d$, bears the name of Christoffel-Darboux kernel. It is given in the univariate case by
$$K^\mu_n(z,w) = \sum_{j=0}^{n} p^\mu_j(z)\,\overline{p^\mu_j(w)},$$
where the $p^\mu_j$ are orthonormal polynomials of respective degrees $j$. For more than a century this object has repeatedly appeared (mostly in the one complex variable setting) as a central figure in interpolation, approximation, and asymptotic expansion problems.

1.1. The Christoffel-Darboux kernel and Christoffel function.

Consider for instance the wide class of inverse problems leading in the end to the classical moment problem on the real line. There one has a grasp on $K_n(z,w) = K^\mu_n(z,w)$, $n \ge 0$, before knowing the representing measure(s) $\mu$. This gives for complex and non-real values $z \in \mathbb{C}$ the radius of Weyl's circle of order $n$:
$$r_n(z) = \frac{1}{|z - \overline{z}|\, K_n(z,z)}.$$

The vanishing of the limit $r_\infty(z) = \lim_n r_n(z)$ for a single non-real $z$ resolves without ambiguity the delicate question whether the moment problem is determinate (has a unique solution). For a real value $x \in \mathbb{R}$ and a fixed $n$ the reciprocal $1/K_n(x,x)$, also known as the Christoffel function, gives a tight upper bound for the maximal mass a finite atomic measure with prescribed moments up to degree $n$ can charge at the point $x$. An authoritative account of the central role of the reproducing kernel method in moment problems and the theory of orthogonal polynomials on the line or on the circle is contained in Freud's monograph [15]. For an enthusiastic and informative eulogy of Freud's predilection for the kernel method see [31].

Much happened meanwhile. On the theoretical side, Totik [44] has settled a long standing open problem concerning the limiting values of $n/K_n(x,x)$ for a positive measure on the line, culminating a long series of particular cases originating a century ago in the pioneering work of Szegő, see also the survey [40]. Roughly speaking, the limit of $1/K_n(x,x)$ detects the point mass charge at the point $x$ in the support of the original measure $\mu$, while $n/K_n(x,x)$ (seen as a function of $x$) gives the Radon-Nikodym derivative of $\mu$ with respect to the equilibrium measure of the support of orthogonality. One step further, Lubinsky [26] has studied a universality principle for the asymptotics of the kernel $K^\mu_n(x,y)$, highly relevant for the theory of random matrices. The weak-* limits of the counting measures of the eigenvalues of truncations of (unitary or self-adjoint) random matrices were recently unveiled via similar kernel methods by Simon [41].

All the above results refer to a single real variable. There is, however, notable progress in the study of the asymptotics of the Christoffel-Darboux kernel in the multivariate case, cf. [9, 24, 25, 48]. These studies cover specific measures, in general invariant under a large symmetry group, the main theorems offering kernel asymptotics at points belonging to the support of the original measure. The behavior of the multivariate Christoffel-Darboux kernel outside the support of the generating measure is decided in most cases by a classical extremal problem in pluripotential theory. We shall provide details on the latter in the second section of this article and we will record a few illustrative cases in the last section.

1.2. Detecting outliers and anomalies in statistical data.

But that's not all. Scientists outside the strict boundaries of the mathematics community continue to rediscover the wonders of Christoffel-Darboux type kernels. For a long time statisticians used the kernel method to estimate the probability density of a sequence of identically distributed random variables [33], or to separate observational data by the level set of a positive definite kernel, a technique originally developed for pattern recognition [47]. The last couple of years have witnessed the entry of the Christoffel-Darboux kernel through the front door of the very dynamical fields of Machine Learning and Artificial Intelligence [28, 29, 21, 22]. Notice that in these applications $\mu$ is typically an atomic (empirical) measure with finite support in $\mathbb{C}^d$ representing a point cloud, which makes analysis like kernel asymptotics more involved.

In the case $d = 1$, Malyshkin [28] suggested the use of modified moments, that is, computing the Christoffel-Darboux kernel of a measure $\mu$ from orthogonal polynomials of a similar but classical measure (Jacobi, Laguerre, Hermite), without going through monomials, an idea also used in the modified Chebyshev algorithm of Sack and Donovan [17]. Some numerical evidence is provided for numerical stability of this approach, even for high degrees $n$. The approximation of the support of the original measure and detection of outliers thus became possible through the investigation of the level sets of the Christoffel function [21, 22]. Not unrelated, three of the authors of this article have successfully used the Christoffel function for reconstruction, from indirect measurements in 2D, of the boundaries of an archipelago of islands each carrying the same constant density against area measure [19].

The aim of this first essay in a series is to study the effect of various perturbations on Christoffel-Darboux kernels associated to positive measures compactly supported by $\mathbb{C}^d$.
An important thesis one can draw from classical constructive function theory is that even purely real questions ask for the domain of the reproducing kernel, potential functions, spectra of associated operators to be extended to complex values. We therefore suggest to consider multivariate orthogonal polynomials in $\mathbb{C}^d$, where in applications real point clouds could be embedded in $\mathbb{C}^d$ by using the isomorphism with $\mathbb{R}^d$. As we will recall in (6) for $d = 1$ and in Lemma 2.9 for $d > 1$, the Christoffel-Darboux kernel $K^\mu_n(z,z)$ typically has exponential growth in $\Omega$ (the unbounded connected component of $\mathbb{C}^d \setminus \mathrm{supp}(\mu)$) but not in $\mathrm{supp}(\mu)$. This confirms the claim of [28, 29, 21, 22] that level sets of the Christoffel-Darboux kernel for sufficiently large $n$ allow us to detect outliers, that is, points of $\mathrm{supp}(\mu)$ which are isolated in $\Omega$.

However, this claim is certainly wrong for atomic measures $\tilde\nu$ (also referred to as discrete measures)
$$\tilde\nu = \sum_{j=1}^{N} t_j\, \delta_{z^{(j)}}$$
for $t_j > 0$ and $z^{(j)} \in \mathbb{C}^d$, with support describing a cloud of points since, following the above terminology, every element of $\mathrm{supp}(\tilde\nu)$ would be an outlier, and, even worse, there is only a finite number of orthogonal polynomials, making kernel asymptotics impossible. Here typically the point cloud splits into two parts, namely most of the points approximating well some continuous set $S \subseteq \mathbb{C}^d$, and some outliers sufficiently far from $S$. To formalize this, we suppose that $N \gg n$, and that the atomic measure $\tilde\nu$ can be split into two atomic measures
$$\tilde\nu = \tilde\mu + \sigma, \qquad \tilde\mu \approx \mu, \qquad \mathrm{supp}(\sigma) \subseteq \Omega, \tag{1}$$
with $\mu$ and $\Omega$ as before and $n$ large enough such that we have sufficient information about $K^\mu_n(z,z)$. However, we should insist that in applications we only know the (empirical) measure $\tilde\nu$ and possibly its moments, and we want to learn from level lines of $K^{\tilde\nu}_n(z,z)$ the set of outliers $\mathrm{supp}(\sigma)$, and, if possible, $\mathrm{supp}(\mu)$ or even $\mu$ itself.

The notion $\tilde\mu \approx \mu$ in (1) means that the moments of both measures up to order $n$ are sufficiently close, measured in § $K^\mu_n(z,z)$ is sufficiently close to $K^{\tilde\mu}_n(z,z)$ for all $z \in \mathbb{C}^d$.
In order to deduce lower and upper bounds for $K^{\tilde\nu}_n(z,z)$ and actually show that level lines of this Christoffel-Darboux kernel are indeed useful for outlier detection, we still have to examine how a Christoffel-Darboux kernel changes after adding a finite number of point masses represented by $\sigma$, that is, provide lower and upper bounds for
$$K^{\mu+\sigma}_n(z,z)/K^{\mu}_n(z,z) \quad\text{and}\quad K^{\tilde\mu+\sigma}_n(z,z)/K^{\tilde\mu}_n(z,z),$$
which is the scope of §4. For the special case $d = 1$, we establish in § $K^{\mu+\sigma}_n(z,z)/K^{\mu}_n(z,z)$ in the special case where we have ratio asymptotics for the underlying $\mu$-orthogonal polynomials.

Roughly speaking, as long as $N \gg n$, by combining the above results we find two different regimes, namely for $z \in \mathrm{supp}(\sigma) \cup \mathrm{supp}(\mu)$ where $K^{\tilde\nu}_n(z,z)$ essentially remains bounded, and on compact subsets of the complement $\Omega \setminus \mathrm{supp}(\sigma)$ where $K^{\tilde\nu}_n(z,z)$ grows exponentially like $K^\mu_n(z,z)$. We may conclude that level lines for $K^{\nu}_n(z,z)$ for sufficiently large parameters will separate outliers from $\mathrm{supp}(\mu)$, and refer the readers to [21, 22] for a practical choice of the parameter of critical level lines.

We expect the required critical level lines to encircle once either $\mathrm{supp}(\mu)$ or exactly one of the outliers, but not the others. In practical terms, it is non-trivial to draw certified level lines on a computer because of the stiff gradients of $K^{\tilde\nu}_n(z,z)$, and hence this approach might be quite costly.

Therefore, we suggest an alternative strategy with an overall cost which scales linearly with $N$ (and at most cubically in $n$): Inspired by related results in statistics and data analysis, we consider in Section 2.4 a leverage score attached to each mass point $z^{(j)}$, namely
$$t_j\, K^{\tilde\nu}_n(z^{(j)}, z^{(j)}) \in (0,1],$$
which we expect to be close to 1 if and only if $z^{(j)}$ is an outlier. For the particular case of $d = 1$ and $\mu$ being area measure on a simply connected and bounded subset of the complex plane, the above-mentioned results on the different Christoffel-Darboux kernels allow us to fully justify this claim, see Corollary 6.5 and the following discussion and numerical experiments.

In this paper we do not discuss and exploit for the moment any recurrence relation between multivariate orthogonal polynomials and thus the $d$ underlying commuting Hessenberg operators representing the multiplication with the independent variables $z_1, z_2, \dots, z_d$.
This will be considered in a sequel publication.
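The leverage score strategy described above can be sketched numerically in the univariate case $d = 1$. The following is a minimal illustration under stated assumptions, not the authors' implementation: the point cloud is synthetic (uniform samples in the unit disk plus one distant point), all weights $t_j$ are equal, and the orthonormal polynomials of the empirical measure are obtained from a thin QR factorization of a weighted monomial Vandermonde matrix. The function name `leverage_scores` and all parameter values are hypothetical choices made for this example.

```python
import numpy as np


def leverage_scores(points, weights, n):
    """Return t_j * K_n(z_j, z_j) for the atomic measure sum_j t_j * delta_{z_j}.

    Assumes d = 1, more than n distinct points, and a degree n small enough
    that the monomial basis is still numerically usable.
    """
    points = np.asarray(points, dtype=complex)
    weights = np.asarray(weights, dtype=float)
    # Row j of V is sqrt(t_j) * (1, z_j, z_j^2, ..., z_j^n).
    V = np.sqrt(weights)[:, None] * np.vander(points, n + 1, increasing=True)
    # Thin QR: column k of Q holds sqrt(t_j) * q_k(z_j), where q_0, ..., q_n
    # are orthonormal for the atomic measure (up to unimodular factors).
    Q, _ = np.linalg.qr(V)
    # Squared row norms give t_j * sum_k |q_k(z_j)|^2 = t_j * K_n(z_j, z_j).
    return np.sum(np.abs(Q) ** 2, axis=1)


# Synthetic example (assumed data): N points in the unit disk and one outlier.
rng = np.random.default_rng(0)
N, n = 2000, 8
radius = np.sqrt(rng.uniform(size=N))
angle = rng.uniform(0.0, 2.0 * np.pi, size=N)
bulk = radius * np.exp(1j * angle)
points = np.append(bulk, 2.5 + 0.0j)          # last point is the outlier
weights = np.full(N + 1, 1.0 / (N + 1))       # equal masses t_j

scores = leverage_scores(points, weights, n)
print("outlier score:", scores[-1])
print("largest bulk score:", scores[:-1].max())
print("sum of scores:", scores.sum())         # equals n + 1 up to rounding
```

The cost of this sketch is dominated by the QR factorization of an $(N+1) \times (n+1)$ matrix, which is linear in $N$ for fixed $n$. For larger degrees the monomial basis becomes ill-conditioned, which is where a better adapted basis, such as the modified moments mentioned earlier, would be the natural replacement.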

Adding point masses to a given measure and recording the change of orthogonal polynomials is not a new theme, see for instance the historical notes in [39, p. 684]. The latter addresses measures on the unit circle, but the same applies to orthogonality on the real line. In case of several variables, quite a few authors have considered classical measures $\mu$ with explicit formulas for the corresponding orthogonal polynomials like Jacobi-type measures on the unit ball, and $\sigma$ being one [11] or several mass points [12], or the Lebesgue measure on the sphere [30, Theorem 3.3].

1.3. Outline.

§ § § § § $n$, the Christoffel-Darboux kernels for two different measures behave similarly. This requires us to first address in § $p^\nu_n$ from $p^\mu_n$ for two measures $\mu, \nu$, giving rise to so-called modified moment matrices.

Our first main result can be found in § § $d = 1$ of univariate polynomials and describes asymptotics for the change of the Christoffel-Darboux kernel after adding a finite number of point masses. This second result is illustrated in § $\mathbb{C}$ in situation (1) under some smoothness assumptions on $\mu$. We finally illustrate by examples in § $C^\mu_n(z,w)$. For the convenience of the reader, an index of notation is included in §

Acknowledgements.

The authors are grateful to the Mathematical Research Institute at Oberwolfach, Germany, which provided exceptional working conditions for a Research in Pairs collaboration in 2016 during which time the main ideas of this manuscript were developed. Special thanks are due to the referee for a careful examination of the manuscript and authoritative comments which led to a clarification of Lemma 2.9.

2. Univariate and multivariate Christoffel-Darboux kernel

The role of reproducing kernels in the development of the classical theories of moments, orthogonal polynomials, statistical mechanics and in general function theory cannot be overestimated. Numerous studies, old and new, recognize the asymptotics of the Christoffel functions as the key technical ingredient for a variety of interpolation and approximation questions.

Our aim is to study the stability of the Christoffel-Darboux kernels under small perturbations or additive perturbations of the underlying measure. Quite a long time ago it was recognized that even purely real questions of, say, approximation have to be treated with methods of function theory of a complex variable and potential theory. For this reason, we advocate below the complex variable setting. For the advantages and richness of the theory of orthogonal polynomials in a single complex variable see [36, 42].

2.1. Definition and basic properties in the univariate case.

Consider the Lebesgue space $L^2(\mu)$ with scalar product and norm
$$\langle f, g\rangle_{2,\mu} = \int f(z)\,\overline{g(z)}\,d\mu(z), \qquad \|f\|_{2,\mu} = \left(\langle f, f\rangle_{2,\mu}\right)^{1/2}, \tag{2}$$
where $\mu$ is a positive Borel measure in the complex plane. Assume that the measure $\mu$ has compact and infinite support $\mathrm{supp}(\mu)$, so that one can orthogonalize without ambiguity the sequence of monomials. Then the Gram-Schmidt process gives us the associated sequence of orthonormal polynomials, $p^\mu_j$ of degree $j$, with positive leading coefficient, together with the Christoffel-Darboux kernel
$$K^\mu_n(z,w) = \sum_{j=0}^{n} p^\mu_j(z)\,\overline{p^\mu_j(w)} \tag{3}$$
for the subspace of polynomials of degree at most $n$. The case $z = w$ will be referred to as the diagonal kernel, whereas for $z \neq w$ we speak of a polarized kernel.

For any $z \in \mathbb{C}$ and any polynomial $p$ of degree at most $n$ we have the reproducing property
$$\langle p, K^\mu_n(\cdot, z)\rangle_{2,\mu} = \int K^\mu_n(z,w)\,p(w)\,d\mu(w) = p(z); \tag{4}$$
in other words, the function $w \mapsto K^\mu_n(w,z)$ represents the linear functional of point evaluation $p \mapsto p(z)$ in the algebra of polynomials. Simple Hilbert space arguments show that $K^\mu_n(z,w) = \langle K^\mu_n(\cdot, w), K^\mu_n(\cdot, z)\rangle_{2,\mu}$, and that, for all $z \in \mathbb{C}$,
$$\frac{1}{\sqrt{K^\mu_n(z,z)}} = \min\{\|p\|_{2,\mu} : p \text{ polynomial of degree} \le n,\ p(z) = 1\}, \tag{5}$$
with the unique extremal polynomial being given by $x \mapsto K^\mu_n(x,z)/K^\mu_n(z,z)$.

The reciprocal $\lambda_n(z) = 1/K^\mu_n(z,z)$ naturally appears in a variety of approximation questions in the complex plane, and is traditionally called the Christoffel or Christoffel-Darboux function. For instance, it is shown by Totik [45, Theorem 1.2] that, provided that $\mu$ is sufficiently smooth and supported on a system of arcs, the quantity $n/K_n(z,z)$ tends to the Radon-Nikodym derivative of $\mu$ on open subarcs. If $z$ belongs to the (two-dimensional) interior of $\mathrm{supp}(\mu)$, a similar result is established for $n^2/K^\mu_n(z,z)$ in [45, Theorem 1.4]. Under weaker assumptions we may also describe $n$th root limits of the Christoffel-Darboux kernel: Denote by $\Omega$ the unbounded connected component of $\mathbb{C} \setminus \mathrm{supp}(\mu)$, with $g_\Omega(\cdot)$ its Green function with pole at $\infty$. Using the inequality $\|p\|^2_{2,\mu} \le \mu(\mathbb{C})\, \|p\|^2_{L^\infty(\mathrm{supp}(\mu))}$ in (5) and, e.g., the extremal property [38, Theorem III.1.3] of Fekete points, we get at least exponential growth for the Christoffel-Darboux kernel in $\Omega$, namely
$$\liminf_{n\to\infty} K^\mu_n(z,z)^{1/n} \ge e^{2 g_\Omega(z)} \tag{6}$$
uniformly on compact subsets of $\Omega$. Moreover, if $\Omega$ is regular with respect to the Dirichlet problem, then for sufficiently "dense" measures, namely the class Reg introduced in [42, §
$$\lim_{n\to\infty} K^\mu_n(z,z)^{1/n} = e^{2 g_\Omega(z)} \tag{7}$$
uniformly on compact subsets of $\mathbb{C}$. In particular, there is no exponential growth on the support. Roughly speaking, this is the motivation of [21, 22] for considering level sets of the Christoffel-Darboux kernel to help detect outliers, that is, points of the support of $\mu$ which are isolated in $\Omega$.

2.2. Definition of the Christoffel-Darboux kernel in the multivariate case.

The theory of orthogonal polynomials of several (real or complex) variables is much more involved and thus less developed; we refer the reader for instance to [13] or the recent summary [14] for the case of several real variables. Briefly speaking, we have to face at least two basic problems which we will address in the next two subsections. The first problem is that there is no natural ordering of multivariate monomials. Second, the multivariate monomials might not be linearly independent in $L^2(\mu)$, or, in other words, the null space
$$\mathcal{N}(\mu) = \Big\{ p \in \mathbb{C}[z] : \int |p|^2\, d\mu = 0 \Big\}$$
might be nontrivial. This means in particular that we have to revisit the reproducing property (4), and the extremal problems (5) and (6), the aim of §. $\mathbb{C}[z]$ stands for the algebra of polynomials with complex coefficients in the indeterminates $z = (z_1, z_2, \dots, z_d)$. Notice that every function continuous in $\mathbb{C}^d$ and vanishing $\mu$-almost everywhere has to vanish on $\mathrm{supp}(\mu)$, which actually shows that $\mathcal{N}(\mu)$ depends only on $\mathrm{supp}(\mu)$:
$$\mathcal{N}(\mu) = \{ p \in \mathbb{C}[z] : p \text{ vanishes on } \mathrm{supp}(\mu) \}.$$
It immediately follows that $\mathcal{N}(\mu)$ is a (radical) ideal in the algebra $\mathbb{C}[z]$.

(Footnote: Recall that $g_\Omega(z) > 0$ for $z \in \Omega$, and equal to infinity if $\mathrm{supp}(\mu)$ is polar, that is, of logarithmic capacity zero.)

Hereafter, $\mu$ denotes a positive Borel measure with compact support in $\mathbb{C}^d$, with corresponding scalar product as in (2). Depending on the context, the hermitian inner product in $\mathbb{C}^d$ is denoted as follows:
$$\langle z, w\rangle = z \cdot \overline{w} = w^* z = z_1 \overline{w_1} + z_2 \overline{w_2} + \cdots + z_d \overline{w_d}.$$
In a few instances we will also encounter the bilinear form
$$z \cdot w = z_1 w_1 + z_2 w_2 + \cdots + z_d w_d.$$
The standard multi-index notation
$$z^\alpha = z_1^{\alpha_1} z_2^{\alpha_2} \cdots z_d^{\alpha_d}, \qquad \alpha = (\alpha_1, \alpha_2, \dots, \alpha_d) \in \mathbb{N}^d,$$
is adopted. The graded lexicographical order on the set of indices is denoted by "$\le_\ell$".
Specifically $\alpha \le_\ell \beta$ if either $\alpha = \beta$ or $\alpha <_\ell \beta$; i.e., $|\alpha| = \alpha_1 + \cdots + \alpha_d < \beta_1 + \cdots + \beta_d = |\beta|$, or $|\alpha| = |\beta|$ and $\alpha_1 = \beta_1, \dots, \alpha_k = \beta_k$, $\alpha_{k+1} < \beta_{k+1}$, for some $k < d$.

In what follows we suppose that a fixed enumeration $\alpha(0), \alpha(1), \alpha(2), \dots$ of all multi-indices is given with $\alpha(0) = 0$, for instance (but not necessarily) a graded lexicographical ordering. Henceforth we assume that multiplication by any variable increases the order; that is, for all $j, n$, that $z_j z^{\alpha(n)} = z^{\alpha(k)}$ for some $k > n$. We write $\deg p = k$ if $p$ is a linear combination of the monomials $z^{\alpha(0)}, z^{\alpha(1)}, \dots, z^{\alpha(k)}$, with the coefficient of $z^{\alpha(k)}$ being non-trivial. Note that, for $d > 1$, our notion of degree differs from the so-called total degree $\operatorname{tdeg} p$, being the largest of the orders $|\alpha|$ of monomials $z^\alpha$ occurring in $p$.

Example 2.1.

Consider the positive measure µ on C given by (cid:90) p ( z , z ) q ( z , z ) dµ ( z ) = (cid:90) | u |≤ p ( u , u ) q ( u , u ) dA ( u ) , where p, q ∈ C [ z ] and dA stands for the area measure in C . Notice thatsupp( µ ) is compact and strictly contained in the complex aﬃne variety { ( z , z ) ∈ C : P ( z , z ) = 0 } ; P ( z , z ) := z − z , in particular, the ideal N ( µ ) is generated by P . The graded lexicographicalordering for bivariate monomials is given by1 , z , z , z , z z , z , z , z z , z z , z , z , z z , z z , z z , z , ... This is equivalent to requiring that any subset of multi-indices { α (0) , α (1) , α (2) , ..., α ( n ) } is downward close, compare with, e.g., [10]. and, for instance, α (8) = (2 , z and z cannot be distin-guished, and the same is true for z z and z , and for z and z z , and soon. We thus have to go to the quotient space C [ z , z ] / N ( µ ), with a basisgiven by1 , z , z , z , z z , z , z , z z , z z , z , ..., z n , z n − z , z n − z , z n +11 , ..., written in the same order. We will see in Lemma 2.3 below, that exactly thismaximal set of monomials being linearly independent in L ( µ ) will allow usto construct corresponding multivariate orthogonal polynomials. Deﬁnition 2.2.

Let $k \in \mathbb{N}$. The tautological vector of ordered monomials is the row vector with $(k+1)$ components, given by
$$v_k(z) = \big(z^{\alpha(j)}\big)_{j=0,1,\dots,k}.$$
The corresponding matrix of moments is defined by
$$\widetilde{M}_k(\mu) = \Big( \langle z^{\alpha(\ell)}, z^{\alpha(j)} \rangle_{2,\mu} \Big)_{j,\ell=0}^{k} = \int v_k(z)^* v_k(z)\, d\mu(z).$$

Notice that $\widetilde{M}_k(\mu)$ is a Gram matrix and hence positive semi-definite. However, these moment matrices are not necessarily of full rank $k+1$. This may happen already in the case $d = 1$ of univariate orthogonal polynomials, where $z^{\alpha(j)} = z^j$ and the support of $\mu$ is finite. Indeed, for $d = 1$, assume that $\det \widetilde{M}_n(\mu) = 0$ and $n$ is the first index with this property. That means there exists a non-trivial combination of the rows of the matrix $\widetilde{M}_n(\mu)$, or in other terms, there exists a monic polynomial $h(z) = z^n + \cdots$ of exact degree $n$, such that $h \in \mathcal{N}(\mu)$. That is, $\mu$ is an atomic measure with exactly $n$ distinct points in its support, namely the roots of $h$. Consequently, $h(z) z^j$ is also annihilated by $\mu$ for all $j \ge 0$, showing that $\mathcal{N}(\mu)$ is the ideal generated by $h$, and
$$\operatorname{rank} \widetilde{M}_{n+j}(\mu) = \operatorname{rank} \widetilde{M}_n(\mu) = n, \qquad j \ge 0.$$
In case of several variables, the ideal $\mathcal{N}(\mu)$ is not necessarily a principal ideal, but it is finitely generated, according to Hilbert's basis theorem. However, in general the above rank stabilization phenomenon of moment matrices ceases to hold for $d > 1$.

For $d \ge 1$, we may still generate orthogonal polynomials $p^\mu_0, p^\mu_1, \dots$ by the Gram-Schmidt procedure starting with the family of functions $z^{\alpha(0)}, z^{\alpha(1)}, \dots$; however, it may happen that not every monomial $z^{\alpha(k)}$ gives rise to a new orthogonal polynomial. Indeed, since we have fixed an ordering of the monomials, we may and will construct a maximal set of monomials $z^{\alpha(k^\mu_n)}$, with $0 = k^\mu_0 < k^\mu_1 < \cdots$, being linearly independent in $L^2(\mu)$, and an orthogonal basis $p^\mu_0, p^\mu_1, \dots$ of the set spanned by these monomials. As we will see in Lemma 2.6 below, this set together with $\mathcal{N}(\mu)$ will allow us to write the set of multivariate polynomials as an orthogonal sum.
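The selection of a maximal linearly independent set of monomials can be illustrated on a small atomic measure in $\mathbb{C}^2$. The sketch below is only an illustration under stated assumptions: the measure consists of ten hypothetical points on the line $z_1 = z_2$ (so that $z_1 - z_2$ lies in the null space), the enumeration is one admissible graded ordering chosen for this example (total degree first, then higher powers of $z_1$ first), and ranks are computed numerically with an explicit tolerance. The helper names `graded_multi_indices` and `independent_monomial_indices` are not from the paper.

```python
import numpy as np
from itertools import product


def graded_multi_indices(d, max_total_degree):
    """Enumerate multi-indices by total degree; inside one degree, higher
    powers of z_1 come first (an assumed, illustrative ordering)."""
    out = []
    for deg in range(max_total_degree + 1):
        level = [a for a in product(range(deg + 1), repeat=d) if sum(a) == deg]
        out.extend(sorted(level, reverse=True))
    return out


def independent_monomial_indices(points, weights, indices, tol=1e-10):
    """Return the positions k at which the rank of the moment matrix of the
    first k + 1 monomials increases, for the measure sum_j t_j delta_{z_j}."""
    pts = np.asarray(points, dtype=complex)          # shape (N, d)
    w = np.sqrt(np.asarray(weights, dtype=float))    # sqrt(t_j)
    # Column for index a: sqrt(t_j) * z_j^a evaluated at every atom.
    cols = [w * np.prod(pts ** np.array(a), axis=1) for a in indices]
    chosen, rank = [], 0
    for k in range(len(cols)):
        A = np.column_stack(cols[: k + 1])
        # The moment matrix equals A^* A, so it has the same rank as A.
        r = np.linalg.matrix_rank(A, tol=tol)
        if r > rank:
            chosen.append(k)
            rank = r
    return chosen


# Ten atoms on the complex line z_1 = z_2 (assumed example data).
u = np.array([(j + 1) / 10.0 * np.exp(1j * j) for j in range(10)])
points = np.column_stack([u, u])
weights = np.full(10, 0.1)

indices = graded_multi_indices(2, 3)   # 1, z1, z2, z1^2, z1 z2, z2^2, z1^3, ...
k_mu = independent_monomial_indices(points, weights, indices)
print(k_mu)                            # positions of 1, z1, z1^2, z1^3: [0, 1, 3, 6]
```

In this example only the pure powers of $z_1$ produce a new orthogonal polynomial, since every other monomial coincides on the support with an earlier one.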

The next result, necessarily technical, formalizes the construction of orthogonal polynomials in the presence of an ordering of the monomials and possible linear dependence of the associated moments. The interested reader might first have a look at the special case $\mathcal{N}(\mu) = \{0\}$ where the moment matrices $\widetilde{M}_k(\mu)$ have full rank $k+1$ for all $k \in \mathbb{N}$, and thus orthogonal polynomials exist for each degree $n = k^\mu_n$.

Lemma 2.3.

We may construct multivariate orthogonal polynomials $p^\mu_0, p^\mu_1, \dots$ with
$$\deg p^\mu_n = k^\mu_n := \min\{ k \ge 0 : \operatorname{rank} \widetilde{M}_k(\mu) = n + 1 \}; \tag{8}$$
in particular, $k^\mu_0 = 0$. Moreover, for $k = k^\mu_n$, there exists an invertible and upper triangular matrix $\widetilde{R}_k(\mu)$ of order $k+1$ such that
$$\widetilde{R}_k(\mu)^* \widetilde{M}_k(\mu) \widetilde{R}_k(\mu) = E_k(\mu) E_k(\mu)^*, \qquad E_k(\mu) := \big(\delta_{j, k^\mu_\ell}\big)_{j=0,\dots,k,\ \ell=0,\dots,n}. \tag{9}$$
Furthermore, $M_n(\mu) = E_k(\mu)^* \widetilde{M}_k(\mu) E_k(\mu)$ is a maximal invertible submatrix of $\widetilde{M}_k(\mu)$, and
$$R_n(\mu)^* M_n(\mu) R_n(\mu) = I, \qquad v^\mu_n := (p^\mu_0, \dots, p^\mu_n) = v_k E_k(\mu) R_n(\mu), \tag{10}$$
with the upper triangular and invertible matrix $R_n(\mu) = E_k(\mu)^* \widetilde{R}_k(\mu) E_k(\mu)$.

Proof. We construct orthogonal polynomials with (8) by recurrence. Since $\|1\|^2_{2,\mu} = \mu(\mathbb{C}^d) \neq 0$, we may define $q^\mu_0(z) = 1$ and $p^\mu_0(z) = 1/\sqrt{\mu(\mathbb{C}^d)}$. At step $k \ge 1$, with $k^\mu_{n-1} < k \le k^\mu_n$, we compute $q^\mu_k = v_k \xi$ by orthogonalizing $z^{\alpha(k)}$ against $p^\mu_0, \dots, p^\mu_{n-1}$. By recurrence hypothesis (8) and the assumption $k^\mu_{n-1} < k$, the degree of any of the polynomials $p^\mu_0, \dots, p^\mu_{n-1}$ is strictly less than $k$. Thus $\deg q^\mu_k = k$; that is, the last component of $\xi$ is nontrivial. If $k = k^\mu_n$, then by definition of $k^\mu_n$ the last column of $\widetilde{M}_k(\mu)$ is linearly independent of the others, and hence $\widetilde{M}_k(\mu)\xi \neq 0$. Using the fact that $\widetilde{M}_k(\mu)$ is Hermitian and positive semi-definite, we infer
$$\|q^\mu_k\|^2_{2,\mu} = \xi^* \widetilde{M}_k(\mu) \xi \neq 0.$$
Consequently one can define $p^\mu_n = q^\mu_k / \|q^\mu_k\|_{2,\mu}$ of degree $k^\mu_n$, with positive leading coefficient. If however $k < k^\mu_n$, we find that $\widetilde{M}_k(\mu)\xi = 0$ and thus $\|q^\mu_k\|_{2,\mu} = 0$, in other words, $q^\mu_k \in \mathcal{N}(\mu)$. The same argument shows that there is no polynomial of degree exactly $k$ and of norm 1 that is orthogonal to $p^\mu_0, \dots, p^\mu_{n-1}$. This proves (8).

Redefining $q^\mu_k(z) = p^\mu_n(z)$ for all $k = k^\mu_n$, we have thus constructed a basis $\{q^\mu_0, \dots, q^\mu_k\}$ of the space of polynomials of degree at most $k$; in other words, there exists an upper triangular and invertible matrix $\widetilde{R}_k(\mu)$ with
$$(q^\mu_0(z), \dots, q^\mu_k(z)) = v_k(z) \widetilde{R}_k(\mu),$$
and (9) follows by orthonormality.

By construction, $p^\mu_n$ contains only the powers $z^{\alpha(k^\mu_0)}, \dots, z^{\alpha(k^\mu_n)}$; that is, the columns of the coefficient vectors of our orthogonal polynomials satisfy
$$\widetilde{R}_k(\mu) E_n(\mu) = E_n(\mu) E_n(\mu)^* \widetilde{R}_k(\mu) E_n(\mu) = E_n(\mu) R_n(\mu),$$
which implies (10). $\square$

Notice that (9) gives a full rank decomposition $\widetilde{M}_k(\mu) = X X^*$, with the factor $X^* = E_n(\mu)^* \widetilde{R}_n(\mu)^{-1}$ being of upper echelon form, the pivot in the $j$-th column lying in row $k^\mu_j$.
We also learn from (9) that a basis of the nullspace of $\widetilde{M}_k(\mu)$ is given by the columns of $\widetilde{R}_k(\mu)$ with indices different from $k^\mu_0, k^\mu_1, \dots$. Finally, from (9), $R_n(\mu)^{-1}$ is the Cholesky factor of the invertible matrix $M_n(\mu)$.

With the vector of orthogonal polynomials $v^\mu_n$ as in Lemma 2.3, we are now prepared to define the multivariate Christoffel-Darboux kernel of order $n$ as
$$K^\mu_n(z,w) = v^\mu_n(z)\, v^\mu_n(w)^* = \sum_{j=0}^{n} p^\mu_j(z)\,\overline{p^\mu_j(w)},$$
where we notice that, according to (10),
$$K^\mu_n(z,w) = v_{k^\mu_n}(z)\, E_n(\mu)\, M_n(\mu)^{-1} E_n(\mu)^*\, v_{k^\mu_n}(w)^*.$$

For later use we mention the affine invariance of orthogonal polynomials and Christoffel functions: consider the change of variables $\tilde z = Bz + b$ for a diagonal and invertible $B \in \mathbb{C}^{d \times d}$ and $b \in \mathbb{C}^d$, and let $c > 0$. If we define $\tilde\mu$ by $\tilde\mu(S) = c\,\mu(BS + b)$ for any Borel set $S \subseteq \mathbb{C}^d$, then
$$K^\mu_n(z,w) = c\, K^{\tilde\mu}_n(Bz + b, Bw + b), \tag{11}$$
compare with [21, Lemma 1] where the authors allow also for general invertible matrices $B$ but then require graded lexicographical ordering and the additional assumption $\mathcal{N}(\mu) = \{0\}$.

2.3. Basic properties of multivariate Christoffel-Darboux kernels.

Definition 2.4.

Consider for $k = k^\mu_n$ the following two subsets of multivariate polynomials
$$\mathcal{N}_n(\mu) := \{ v_k \xi : \widetilde{M}_k(\mu)\xi = 0 \} = \{ p \in \mathcal{N}(\mu) : \deg p \le k \},$$
$$\mathcal{L}_n(\mu) := \operatorname{span}\{ p^\mu_j : j = 0, 1, \dots, n \} = \operatorname{span}\{ z^{\alpha(k^\mu_j)} : j = 0, 1, \dots, n \},$$
and the affine variety
$$S_n(\mu) := \bigcap_{p \in \mathcal{N}_n(\mu)} \{ z \in \mathbb{C}^d : p(z) = 0 \}.$$

By definition of $\mathcal{N}_n(\mu)$, we get $\mathrm{supp}(\mu) \subseteq S_n(\mu)$. Notice that $\mathcal{N}_n(\mu)$ is increasing in $n$ and hence $S_n(\mu)$ decreases in $n$. Moreover, recalling that $\mathcal{N}(\mu) = \bigcup_{n \ge 0} \mathcal{N}_n(\mu)$ is a finitely generated ideal, we see that, for $n$ being larger than or equal to the largest of the degrees of the generators, the set $S_n(\mu)$ no longer depends on $n$, and we will use the notation $S(\mu)$, being the Zariski closure of $\mathrm{supp}(\mu)$.

Notice that $\mathcal{N}(\mu) = \{0\}$ implies that $S(\mu) = \mathbb{C}^d$. In case of an atomic measure $\mu$ with a finite number of point masses one can show that $S(\mu) = \mathrm{supp}(\mu)$, but in general $S(\mu)$ is larger than $\mathrm{supp}(\mu)$; see for instance Example 2.1.

Examples 2.5.

Consider two measures $\mu, \nu$, with $\mathrm{supp}(\mu) \subseteq \mathrm{supp}(\nu)$, subject for instance to the condition $\nu - \mu \ge 0$. Then obviously $\mathcal{N}(\nu) \subseteq \mathcal{N}(\mu)$, with equality if and only if $\mathrm{supp}(\nu) \subseteq S(\mu)$. The Lebesgue measure $\mu$ on the real or complex unit ball in $\mathbb{C}^d$ has been extensively studied; in particular $\mathcal{N}(\mu) = \{0\}$, see §7. We conclude that, if $\mathrm{supp}(\nu)$ has a non-empty interior in $\mathbb{C}^d$ (or $\mathrm{supp}(\mu) \subseteq \mathbb{R}^d$ has a non-empty interior in $\mathbb{R}^d$), then $\mathcal{N}(\nu) = \{0\}$ and $S(\nu) = \mathbb{C}^d$.

It is easy to check that we keep the reproducing property (4) for $p \in \mathcal{L}_n(\mu)$, but it fails for nontrivial elements of $\mathcal{N}_n(\mu)$. The following result shows that the set of multivariate polynomials of degree at most $k = k^\mu_n$ can be written as an orthogonal sum $\mathcal{L}_n(\mu) \oplus \mathcal{N}_n(\mu)$. Thus the multivariate Christoffel-Darboux kernel is indeed a projection kernel onto $\mathcal{L}_n(\mu)$, and only a reproducing kernel in the case $\mathcal{N}_n(\mu) = \{0\}$.

Lemma 2.6.

Let $k, n \in \mathbb{N}$ with $k^\mu_n \le k < k^\mu_{n+1}$, and let $p$ be a polynomial of degree $\le k$. Then the polynomial defined by
\[ q(z) := \int K^\mu_n(z,w)\, p(w)\, d\mu(w) \]
satisfies $q \in \mathcal{L}_n(\mu)$ and $p - q \in \mathcal{N}_{n+1}(\mu)$. In particular, for $k = k^\mu_n = \deg p^\mu_n$ we have that
\[ \mathcal{N}_n(\mu) = \{ p \in \mathbb{C}[z] : \deg p \le \deg p^\mu_n,\ p \perp_\mu p^\mu_0, \dots, p^\mu_n \}. \tag{12} \]

Proof.

By construction, $q \in \mathcal{L}_n(\mu)$. Also, $\deg(p - q) \le k < \deg p^\mu_{n+1}$, and, by orthogonality,
\[ \int |p - q|^2\, d\mu = \|p\|_{2,\mu}^2 - \|q\|_{2,\mu}^2 = \|p\|_{2,\mu}^2 - \sum_{j=0}^{n} |\langle p, p^\mu_j \rangle_{2,\mu}|^2 = 0, \]
showing that $p - q \in \mathcal{N}_{n+1}(\mu)$, as claimed in the first assertion. Notice that in the special case $k = k^\mu_n$, the same reasoning gives the slightly more precise information $p - q \in \mathcal{N}_n(\mu)$, and $p \in \mathcal{N}_n(\mu)$ iff $q = 0$ iff $p \perp_\mu p^\mu_0, \dots, p^\mu_n$, leading to the second part. □

Let us now investigate to what extent the extremal property (5) remains true in the multivariate setting. If $z \notin S_n(\mu)$, then by definition there exists $p \in \mathcal{N}_n(\mu)$ with $p(z) = 1$, and hence
\[ \inf\{ \|p\|_{2,\mu} : \deg p \le k^\mu_n,\ p(z) = 1 \} = 0, \]
showing that (5) fails. There are two ways to specialize this extremal problem in order to recover (5).

Lemma 2.7.

For all $n \in \mathbb{N}$ and $z \in \mathbb{C}^d$,
\[ \frac{1}{\sqrt{K^\mu_n(z,z)}} = \min\{ \|p\|_{2,\mu} : p \in \mathcal{L}_n(\mu),\ p(z) = 1 \}, \tag{13} \]
and for all $k, n \in \mathbb{N}$ with $k^\mu_n \le k < k^\mu_{n+1}$ and $z \in S_{n+1}(\mu)$,
\[ \frac{1}{\sqrt{K^\mu_n(z,z)}} = \min\{ \|p\|_{2,\mu} : \deg p \le k,\ p(z) = 1 \}. \tag{14} \]

Proof.

For a proof of (13), we use exactly the same argument as in the univariate case, namely the Cauchy-Schwarz inequality and its sharpness. Now let $z \in S_{n+1}(\mu)$. Then, with $p$ and $q \in \mathcal{L}_n(\mu)$ as in Lemma 2.6, we have that $\|p\|_{2,\mu} = \|q\|_{2,\mu}$, and $p(z) - q(z) = 0$ by definition of $S_{n+1}(\mu)$. This shows that in (14) we may restrict ourselves to $p \in \mathcal{L}_n(\mu)$, and (14) follows from (13). □

We learn from (14) that, for $z \in S(\mu)$, the Christoffel-Darboux kernel $K^\mu_n(z,z)$ does not depend on the particular ordering of monomials as long as the set $\{\alpha(0), \alpha(1), \dots, \alpha(k)\}$ remains the same.

Of related interest will be the cosine function
\[ C^\mu_n(z,w) := \frac{K^\mu_n(z,w)}{\sqrt{K^\mu_n(z,z)}\,\sqrt{K^\mu_n(w,w)}}. \tag{15} \]
To the best of our knowledge, there are hardly any results in the literature concerning asymptotics for $n \to \infty$ for such a normalized and polarized Christoffel-Darboux kernel. Multipoint matricial analogs of these kernels will be used later on.

Definition 2.8.

Let $z_1, z_2, \dots, z_\ell$ and $w_1, w_2, \dots, w_\ell$ be two $\ell$-tuples of arbitrary points in $\mathbb{C}^d$, and define the $\ell \times \ell$ matrices
\[ K^\mu_n(z_1, \dots, z_\ell; w_1, \dots, w_\ell) := \big( K^\mu_n(z_j, w_k) \big)_{j,k=1}^{\ell}, \qquad C^\mu_n(z_1, \dots, z_\ell; w_1, \dots, w_\ell) := \big( C^\mu_n(z_j, w_k) \big)_{j,k=1}^{\ell}. \]

If $z_1, \dots, z_\ell \in S_n(\mu)$ are distinct and $K^\mu_n(z_1, \dots, z_\ell; z_1, \dots, z_\ell)$ is invertible (compare with Lemma 4.1 below), then this matrix also occurs in variational problems similar to the single point case. Specifically,
\[ \min\{ \|p\|_{2,\mu}^2 : \deg p \le k^\mu_n,\ p(z_j) = c_j \text{ for } j = 1, \dots, \ell \} = c^*\, K^\mu_n(z_1, \dots, z_\ell; z_1, \dots, z_\ell)^{-1} c, \qquad c^T = (c_1, \dots, c_\ell). \]
This follows along the same lines as the proof of (14) by observing that it is sufficient to examine candidates of the form $p(z) = a_1 K^\mu_n(z, z_1) + \cdots + a_\ell K^\mu_n(z, z_\ell)$ with $p(z_j) = c_j$ for $j = 1, \dots, \ell$.

When trying to extend the asymptotic properties (6) and (7) to the multivariate setting one encounters a series of complications. First of all, the sought limiting behavior of Christoffel-Darboux kernels outside the support of the generating measure is hardly discussed in the literature, especially in the present general setting. We restrict ourselves to measures $\mu$ whose support is non-pluripolar, which allows us to apply pluripotential theory, but excludes cases as in Example 2.1 where the support is an algebraic variety. As for many other asymptotic results in the literature, we also have to restrict ourselves to total degree, and therefore assume that, for all $m \in \mathbb{N}$, we have that
\[ \{\alpha(0), \dots, \alpha(n)\} = \{ \alpha \in \mathbb{N}^d : |\alpha| \le m \}, \tag{16} \]
with $n =: n_{\mathrm{tot}}(m) = \binom{m+d}{d} - 1$, as is, for instance, true for the graded lexicographical ordering. In addition, we are forced to focus only on the complement of the polynomial hull of $\operatorname{supp}(\mu)$.

Lemma 2.9 below summarizes some basic results of pluripotential theory. Without aiming at the most comprehensive (and hence more technical) results, we confine ourselves to offering a multivariate analogue of the much better understood case of a single complex variable. We stress that Lemma 2.9 below is not used anywhere else in the present article. Our main focus is perturbation formulas for the Christoffel-Darboux kernel and sharp asymptotics. As discussed below, such formulas are of course desirable, but we have to be well aware of the constraints imposed by their existence.

Lemma 2.9.

Consider an ordering with (16), and suppose that $S := \operatorname{supp}(\mu)$ is non-pluripolar. Denote by $\Omega$ the unbounded connected component of $\mathbb{C}^d \setminus S$, and by $\widehat{S}$ the polynomial convex hull of $S$.

(a) For $z \in \Omega \setminus \widehat{S}$,
\[ \liminf_{m \to \infty} K^\mu_{n_{\mathrm{tot}}(m)}(z,z)^{1/m} > 1. \tag{17} \]

(b) Suppose, in addition, that $S$ is $L$-regular in the sense of [23, p. 186 and §], and denote by $g_\Omega$ the plurisubharmonic Green function with pole at $\infty$, which is plurisubharmonic and continuous in $\mathbb{C}^d$, equals $0$ in $\widehat{S}$ and is strictly positive in $\mathbb{C}^d \setminus \widehat{S}$; see the proof below. Then, for all $z \in \mathbb{C}^d$,
\[ \liminf_{m \to \infty} K^\mu_{n_{\mathrm{tot}}(m)}(z,z)^{1/m} \ge e^{2 g_\Omega(z)}. \tag{18} \]

(c) If, moreover, $\mu$ belongs to the multivariate generalization of the class Reg introduced in [24, Eqn. (1.6)] (or, equivalently, $(S, \mu)$ satisfies a Bernstein-Markov property in the sense of [7, Eqn. (2)]), then the limit
\[ \lim_{m \to \infty} K^\mu_{n_{\mathrm{tot}}(m)}(z,z)^{1/m} = e^{2 g_\Omega(z)} \tag{19} \]
exists for all $z \in \mathbb{C}^d$, and equals one iff $z \in \widehat{S}$.

(Footnote on $\widehat{S}$: By definition [23, p. 75], $\widehat{S} = \{ z \in \mathbb{C}^d : |p(z)| \le \|p\|_{L^\infty(S)} \text{ for all polynomials } p \}$. One easily checks that $\widehat{S}$ is compact and contains $S$. Moreover, $S = \widehat{S}$ for real $S$ by [23, Lemma 5.4.1]. However, unlike the case of one complex variable, $\widehat{S}$ might be strictly larger than the complement of $\Omega$; see, e.g., [1].)

Proof.

Our proof of Lemma 2.9 is based on Siciak's function $\Phi_S$ in $\mathbb{C}^d$ for the compact set $S = \operatorname{supp}(\mu)$:
\[ \Phi_S(z) := \sup\left\{ \left( \frac{|p(z)|}{\|p\|_{L^\infty(S)}} \right)^{1/\operatorname{tdeg}(p)} : p \ne 0 \text{ a polynomial} \right\} = \lim_{m \to \infty} \sup\left\{ \left( \frac{|p(z)|}{\|p\|_{L^\infty(S)}} \right)^{1/m} : p \ne 0,\ \operatorname{tdeg}(p) \le m \right\}, \]
where we recall that $\operatorname{tdeg}(p)$ is the largest of the orders $|\alpha|$ of monomials $z^\alpha$ occurring in $p$. Taking a fixed polynomial $p$ in the second formula shows in particular that $\Phi_S(z) \ge 1$ for all $z \in \mathbb{C}^d$. By definition of the polynomial convex hull, we have that $\Phi_S(z) \le 1$ for $z \in \widehat{S}$, and thus $\Phi_S(z) = 1$ for $z \in \widehat{S}$. Moreover, if $z \notin \widehat{S}$, then there exists a polynomial $q$ with $|q(z)| / \|q\|_{L^\infty(S)} > 1$, and taking as $p$ suitable powers of $q$ we may conclude that $\Phi_S(z) > 1$.

Denote by $\mathcal{L}$ the Lelong class of plurisubharmonic functions having at most logarithmic growth at $\infty$, and consider the so-called pluricomplex Green function
\[ V_S(z) := \sup\{ u(z) : u \in \mathcal{L},\ u|_S \le 0 \}, \]
together with its upper semi-continuous regularization
\[ V_S^*(z) = \limsup_{w \to z} V_S(w). \]
It is shown, e.g., in [35, Theorem 2] or [38, Theorem B.2.8], that $\log \Phi_S(z) = V_S(z)$ for all $z \in \mathbb{C}^d$. Hence from the above-mentioned properties of Siciak's function we may conclude that $V_S = 0$ in $\widehat{S}$, and $V_S > 0$ in $\mathbb{C}^d \setminus \widehat{S}$. By [38, Definition B.1.6 and Theorem B.1.8], the function $V_S^*$ is plurisubharmonic whereas, in general, $V_S$ is not. By definition, $V_S^* \ge 0$ in $\mathbb{C}^d$, $V_S^* > 0$ outside $\widehat{S}$, and $V_S^* = 0$ on $\widehat{S}$ up to some pluripolar set by [38, Theorem B.1.7]. For all these reasons, following [35] we write
\[ g_\Omega(z) := V_S^*(z), \]
referred to as the plurisubharmonic Green function of $S$ with pole at infinity, since this function reduces to the classical univariate Green function in the case $d = 1$. Notice also that $g_\Omega = g_{\mathbb{C}^d \setminus \widehat{S}}$ by [23, Corollary 5.1.7]. For the particular case of $S$ being $L$-regular, we know that $V_S$ is continuous at each $z \in S$ [23, p. 186], and therefore the restriction of $V_S^*$ to $S$ is identically zero, which together with [23, Corollary 5.1.4] implies that $V_S$ is continuous in $\mathbb{C}^d$. Hence, for $L$-regular sets $S$ we have that $g_\Omega = V_S = V_S^*$.

We are now prepared to complete the proof. Since a non-pluripolar set $S = \operatorname{supp}(\mu)$ cannot be an algebraic variety, we have $S(\mu) = \mathbb{C}^d$, and we may apply (14) together with the inequality
\[ \|p\|_{2,\mu}^2 \le \mu(\mathbb{C}^d)\, \|p\|_{L^\infty(S)}^2 \]
for $p \in \mathbb{C}[z]$, in order to conclude that, for all $z \in \mathbb{C}^d$,
\[ \liminf_{m \to \infty} K^\mu_{n_{\mathrm{tot}}(m)}(z,z)^{1/m} = \liminf_{m \to \infty} \max_{\operatorname{tdeg} p \le m,\ p \ne 0} \left( \frac{|p(z)|^2}{\|p\|_{2,\mu}^2} \right)^{1/m} \ge \Phi_S(z)^2 = \exp(2 V_S(z)), \]
which shows parts (a) and (b). For our proof of part (c), we use the additional assumption that
\[ \limsup_{m \to \infty} \max_{\operatorname{tdeg} p \le m,\ p \ne 0} \left( \frac{\|p\|_{L^\infty(S)}}{\|p\|_{2,\mu}} \right)^{1/m} = 1, \]
implying that
\[ \limsup_{m \to \infty} K^\mu_{n_{\mathrm{tot}}(m)}(z,z)^{1/m} \le \Phi_S(z)^2 = \exp(2 V_S(z)), \]
as required for the claim of Lemma 2.9(c). □

Since for $d = 1$ the plurisubharmonic Green function becomes the classical Green function, and $n_{\mathrm{tot}}(m) = m$, we see that Lemma 2.9 gives the (pointwise) multivariate equivalent of (6), (7), at least for sufficiently smooth $S = \operatorname{supp}(\mu)$ and in the case $S = \widehat{S}$, that is, for polynomially convex supports $S$. We thus have exponential growth of the Christoffel function in $\Omega$, but not in $\operatorname{supp}(\mu)$. In particular, we should mention that (18) and (19) are wrong in general for outliers $z \in \operatorname{supp}(\mu)$ being isolated in $\Omega$, since the plurisubharmonic Green function remains the same after adding pluripolar sets [23, Corollary 5.2.5]. Later we recall examples of $L$-regular and polynomially convex sets where Lemma 2.9 could be applied successfully.

Also, we should mention that in the recent paper [8] the authors discuss Siciak functions for more general degree constraints based on convex bodies, together with their pluripotential interpretation.

2.4. Leverage scores, the Mahalanobis distance and Christoffel-Darboux kernels.

The aim of this section is to recall some links between the Christoffel-Darboux kernel and several notions from statistics. For a recent related work see [32]. In this subsection we will denote elements of $\mathbb{C}^d$ as row vectors. The mean $m(\mu)$ and positive definite covariance matrix $C(\mu)$ of a random variable with values in $\mathbb{C}^d$ with law described by some probability measure $\mu$ have the entries
\[ m(\mu)_j = \int z_j\, d\mu(z), \qquad C(\mu)_{j,k} = \int \overline{(z_j - m(\mu)_j)}\,(z_k - m(\mu)_k)\, d\mu(z). \]
The Mahalanobis distance between a point $z \in \mathbb{C}^d$ and the probability measure $\mu$ is given by
\[ \Delta(z, \mu) = \sqrt{(z - m(\mu))\, C(\mu)^{-1} (z - m(\mu))^*}. \]
The reader easily checks in terms of the moment matrices introduced in Lemma 2.3 that, for $v_d(z) = (1, z_1, \dots, z_d)$, we have that
\[ V^* \widetilde{M}_d(\mu) V = \begin{bmatrix} \langle 1, 1 \rangle_{2,\mu} & 0 \\ 0 & C(\mu) \end{bmatrix}, \qquad V := \begin{bmatrix} 1 & -m(\mu) \\ 0 & I \end{bmatrix}, \]
where we recall that $\langle 1, 1 \rangle_{2,\mu} = \mu(\mathbb{C}^d) = 1$ and hence $p^\mu_0(z) = K^\mu_0(z,z) = 1$ for all $z \in \mathbb{C}^d$. We conclude from the previous identity that, with $C(\mu)$, also $\widetilde{M}_d(\mu) = M_d(\mu)$ is invertible. Then
\[ K^\mu_d(z,z) = v_d(z)\, M_d(\mu)^{-1} v_d(z)^* = v_d(z)\, V \begin{bmatrix} 1 & 0 \\ 0 & C(\mu)^{-1} \end{bmatrix} V^* v_d(z)^* = 1 + \Delta(z, \mu)^2. \tag{20} \]
Hence, up to some additive constant, $\sqrt{K^\mu_n(z,z)}$ can be considered as a natural generalization of the Mahalanobis distance between a point $z \in \mathbb{C}^d$ and the probability measure $\mu$.

In the next statement we will introduce what will be referred to as our leverage score for each element of the support of a discrete measure.

Lemma 2.10.

Consider the discrete probability measure
\[ \widetilde{\nu} = \sum_{j=1}^{N} t_j\, \delta_{z^{(j)}} \]
for $t_j > 0$ and distinct $z^{(j)} \in \mathbb{C}^d$. Then, for all $j = 1, 2, \dots, N$ and for all $n$ such that $p^{\widetilde{\nu}}_n$ exists, our leverage score $t_j K^{\widetilde{\nu}}_n(z^{(j)}, z^{(j)})$ satisfies
\[ t_j K^{\widetilde{\nu}}_n(z^{(j)}, z^{(j)}) \in (0, 1]. \]

Proof.

This is an immediate consequence of (14) and of the fact that $z^{(j)} \in \operatorname{supp}(\widetilde{\nu}) \subseteq S(\widetilde{\nu})$. □

For a discrete probability measure $\widetilde{\nu}$ as in Lemma 2.10, it is classical in data analysis to consider the data matrix $X \in \mathbb{C}^{N \times d}$ with centered and scaled rows $\sqrt{t_j}\,\big(z^{(j)} - m(\widetilde{\nu})\big)$, and thus $C(\widetilde{\nu}) = X^* X$. In the literature on data analysis, the quantity
\[ \big( X (X^* X)^{-1} X^* \big)_{j,j} = t_j\, \Delta(z^{(j)}, \widetilde{\nu})^2 \quad \text{for } j = 1, \dots, N \]
is known to be an element of $[0, 1]$, and $z^{(j)}$ is considered to be an outlier if $\big( X (X^* X)^{-1} X^* \big)_{j,j}$ is "close" to 1. Similar techniques have been applied for eliminating outlier data in multiple linear regression; see for instance Cook's distance. To make the link with our leverage score for $n = d$, recall from (20) that
\[ t_j \Big( K^{\widetilde{\nu}}_d(z^{(j)}, z^{(j)}) - 1 \Big) = \big( X (X^* X)^{-1} X^* \big)_{j,j} \in [0, 1 - t_j], \]
the last inclusion following from $1 = K^{\widetilde{\nu}}_0(z^{(j)}, z^{(j)}) \le K^{\widetilde{\nu}}_d(z^{(j)}, z^{(j)})$ and Lemma 2.10. As a consequence, our leverage score of Lemma 2.10 for $n = d$ can be closer to 1, and thus can be considered as slightly more precise. Choosing an index $n > d$ corresponds to considering in the regression not only the random variables $X_j$ but also their powers $X^\alpha$ describing mixed effects.

It remains to show that $t_j K^{\widetilde{\nu}}_n(z^{(j)}, z^{(j)})$ is close to 1 (and for large $n$ "very" close) iff $z^{(j)}$ is an outlier. Of course this necessitates giving a more precise meaning to what is called an outlier. These two questions are studied in the next sections; see, e.g., Corollary 6.4 and Corollary 6.5 for $d = 1$.

3. Approximation of Christoffel-Darboux kernels

The aim of the present section is to give sufficient conditions on positive measures $\mu, \nu$ with compact support in $\mathbb{C}^d$ ensuring that the corresponding Christoffel-Darboux kernels $K^\mu_n(z,z)$ and $K^\nu_n(z,z)$ are close for a fixed $n$ and for all $z \in \mathbb{C}^d$. To this end the following matrices are essential.

Definition 3.1.

For $k = k^\mu_m = k^\nu_n$, we define the matrix of mixed moments as
\[ R_n(\nu, \mu) := \int v^\nu_n(z)^*\, v^\mu_m(z)\, d\nu(z) \in \mathbb{C}^{(n+1) \times (m+1)}, \]
and the matrix of modified moments
\[ M_m(\nu, \mu) := \int v^\mu_m(z)^*\, v^\mu_m(z)\, d\nu(z) \in \mathbb{C}^{(m+1) \times (m+1)}. \]

3.1. Comparing two measures.

In this subsection we study conditions allowing one family of orthogonal polynomials to be linear combinations of the other.

Lemma 3.2.

Assume that two positive measures $\mu, \nu$ satisfy $k = k^\mu_m = k^\nu_n$ for some positive integers $n, m$. The following statements are equivalent:
(a) $\mathcal{N}_n(\nu) \subseteq \mathcal{N}_m(\mu)$;
(b) $S_m(\mu) \subseteq S_n(\nu)$;
(c) $\{k^\mu_0, k^\mu_1, k^\mu_2, \dots, k^\mu_m\} \subseteq \{k^\nu_0, k^\nu_1, k^\nu_2, \dots, k^\nu_n\}$;
(d) the orthogonal polynomials $p^\mu_0, \dots, p^\mu_m$ can be written as linear combinations of $p^\nu_0, p^\nu_1, \dots, p^\nu_n$;
(e) $v^\mu_m = v^\nu_n R_n(\nu, \mu)$;
(f) there exists a constant $c_k$ such that, for all $p \in \mathbb{C}[z]$ of degree at most $k$, $\|p\|_{2,\mu} \le c_k \|p\|_{2,\nu}$.
In addition, the smallest such constant is given by $c_k = \|R_m(\mu, \nu)\|$.

Proof. We know from Lemma 2.6 that we may write the set of multivariate polynomials of degree at most $k = k^\mu_m = k^\nu_n$ as a direct sum $\mathcal{L}_m(\mu) \oplus \mathcal{N}_m(\mu) = \mathcal{L}_n(\nu) \oplus \mathcal{N}_n(\nu)$. Thus the equivalence between (a), (b), (c) and (d) follows immediately from Definition 2.4. The implication (d) $\Rightarrow$ (e) follows by $\nu$-orthogonality, and the implication (f) $\Rightarrow$ (a) is trivial. Hence it only remains to show that (e) implies (f).

Suppose that (e) holds, implying $\mathcal{L}_m(\mu) \subseteq \mathcal{L}_n(\nu)$. Let $p \in \mathbb{C}[z]$ be of degree at most $k$. Applying Lemma 2.6 we may write
\[ p = q + p_1, \qquad q = v^\nu_n \xi \in \mathcal{L}_n(\nu), \qquad p_1 \in \mathcal{N}_n(\nu) \subseteq \mathcal{N}_m(\mu) \]
for some $\xi \in \mathbb{C}^{n+1}$. Then, applying Lemma 2.6 again,
\[ q = p_2 + p_3, \qquad p_2 \in \mathcal{L}_m(\mu), \qquad p_3 \in \mathcal{N}_m(\mu) \cap \mathcal{L}_n(\nu), \]
where
\[ p_2(z) = \int K^\mu_m(z,w)\, q(w)\, d\mu(w) = \int v^\mu_m(z)\, v^\mu_m(w)^*\, v^\nu_n(w)\, \xi\, d\mu(w) = v^\mu_m(z)\, R_m(\mu, \nu)\, \xi. \]
Hence
\[ \|p\|_{2,\mu} = \|p_2\|_{2,\mu} = \|R_m(\mu, \nu)\xi\| \le \|R_m(\mu, \nu)\|\, \|\xi\| = \|R_m(\mu, \nu)\|\, \|q\|_{2,\nu} = \|R_m(\mu, \nu)\|\, \|p\|_{2,\nu}, \]
showing that (f) holds with $c_k = \|R_m(\mu, \nu)\|$. Taking the supremum over $\xi \in \mathbb{C}^{n+1} \setminus \{0\}$ shows that $c_k$ cannot be smaller.
□

Under the assumptions of Lemma 3.2(e), we can make a link between $R_n(\nu, \mu)$ and the matrices defined in Lemma 2.3, namely
\[ R_n(\nu, \mu) = R_n(\nu)^{-1} E_k(\nu)^T E_k(\mu) R_m(\mu). \tag{21} \]
To see this, recall from (10) that
\[ v^\mu_m = v_k E_k(\mu) R_m(\mu), \qquad v^\nu_n = v_k E_k(\nu) R_n(\nu), \]
where Lemma 3.2(c) tells us that $E_k(\mu) = E_k(\nu) E_k(\nu)^T E_k(\mu)$, and thus
\[ v^\mu_m = v^\nu_n R_n(\nu)^{-1} E_k(\nu)^T E_k(\mu) R_m(\mu) = v^\nu_n R_n(\nu, \mu), \]
implying (21). In this case, it is not difficult to check that $R_n(\nu, \mu)$ is of full column rank $m + 1$ and of upper echelon form, and that we have the full rank decomposition
\[ M_m(\nu, \mu) = R_n(\nu, \mu)^* R_n(\nu, \mu). \tag{22} \]

In the remainder of this section we will always suppose that $k = k^\mu_m = k^\nu_n$ with $\mathcal{N}_n(\nu) = \mathcal{N}_m(\mu)$. Thus we may apply Lemma 3.2 twice, implying that $m = n$ and $k^\mu_j = k^\nu_j$ for $j = 0, 1, \dots, n$ by Lemma 3.2(c). In particular, we find that $E_k(\nu)^T E_k(\mu)$ is the identity of order $n + 1$. We conclude using (21) that $R_n(\nu, \mu)$ is upper triangular and invertible, with inverse given by
\[ R_n(\nu, \mu)^{-1} = R_n(\mu)^{-1} R_n(\nu) = R_n(\mu, \nu). \tag{23} \]
We summarize these findings and their consequence for the Christoffel-Darboux kernel in the following statement.
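The relations (21) to (23), and the kernel identity they lead to, can be checked numerically in a toy setting. The following is a minimal sketch, assuming $d = 1$, two discrete measures with the same 12 support points and different weights (so that $\mathcal{N}_n(\mu) = \mathcal{N}_n(\nu) = \{0\}$ and $k^\mu_j = k^\nu_j = j$ for $n = 4$), and orthonormal polynomials obtained from a Cholesky factorization of the moment matrix. The helper names `orthonormalizer` and `basis_at` are hypothetical and not taken from the paper.

```python
import numpy as np

def orthonormalizer(points, weights, n):
    """Upper triangular R such that (1, z, ..., z^n) R is orthonormal in L^2(mu)."""
    V = np.vander(points, n + 1, increasing=True)   # rows v_n(z^(i)) = (1, z, ..., z^n)
    M = V.conj().T @ (weights[:, None] * V)         # moment matrix: integral of v^* v dmu
    L = np.linalg.cholesky(M)                       # M = L L^*
    return np.linalg.inv(L).conj().T                # R = L^{-*}, hence R^* M R = I

def basis_at(z, R):
    """Rows of orthonormal polynomial values (p_0(z), ..., p_n(z)) at the points z."""
    return np.vander(np.atleast_1d(z), R.shape[0], increasing=True) @ R

# Assumed toy data: common support, two different sets of positive weights.
rng = np.random.default_rng(0)
pts = rng.standard_normal(12) + 1j * rng.standard_normal(12)
t_mu = rng.uniform(0.5, 1.5, 12); t_mu /= t_mu.sum()
t_nu = rng.uniform(0.5, 1.5, 12); t_nu /= t_nu.sum()
n = 4

R_mu, R_nu = orthonormalizer(pts, t_mu, n), orthonormalizer(pts, t_nu, n)
P_mu, P_nu = basis_at(pts, R_mu), basis_at(pts, R_nu)

R_nu_mu = P_nu.conj().T @ (t_nu[:, None] * P_mu)    # mixed moments R_n(nu, mu)
R_mu_nu = P_mu.conj().T @ (t_mu[:, None] * P_nu)    # mixed moments R_n(mu, nu)
M_nu_mu = P_mu.conj().T @ (t_nu[:, None] * P_mu)    # modified moments M_n(nu, mu)

# Kernel of nu, once directly and once through the mu-basis and M_n(nu, mu)^{-1}.
z, w = 0.3 + 0.2j, -1.1 + 0.7j
K_nu_direct = (basis_at(z, R_nu) @ basis_at(w, R_nu).conj().T)[0, 0]
K_nu_via_mu = (basis_at(z, R_mu)
               @ np.linalg.solve(M_nu_mu, basis_at(w, R_mu).conj().T))[0, 0]

print(np.allclose(R_nu_mu, np.linalg.inv(R_nu) @ R_mu, atol=1e-6))   # (21) with E^T E = I
print(np.allclose(M_nu_mu, R_nu_mu.conj().T @ R_nu_mu, atol=1e-6))   # (22)
print(np.allclose(R_mu_nu, np.linalg.inv(R_nu_mu), atol=1e-6))       # (23)
print(np.isclose(K_nu_direct, K_nu_via_mu))
```

Forming the moment matrix explicitly is convenient for a small degree such as $n = 4$; for larger $n$ the monomial moment matrix becomes ill-conditioned, so a different starting basis would be a more robust choice.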

Corollary 3.3.

Let $n \ge 0$ be some integer with $k^\mu_n = k^\nu_n$ and $\mathcal{N}_n(\mu) = \mathcal{N}_n(\nu)$. Then
\[ v^\mu_n(z) = v^\nu_n(z)\, R_n(\nu, \mu), \qquad v^\nu_n(z) = v^\mu_n(z)\, R_n(\mu, \nu), \tag{24} \]
with the upper triangular and invertible matrices $R_n(\nu, \mu) = R_n(\nu)^{-1} R_n(\mu)$ and $R_n(\mu, \nu) = R_n(\nu, \mu)^{-1}$, allowing for the Cholesky decompositions
\[ M_n(\nu, \mu) = R_n(\nu, \mu)^* R_n(\nu, \mu), \qquad M_n(\mu, \nu) = R_n(\mu, \nu)^* R_n(\mu, \nu). \tag{25} \]
In particular,
\[ K^\nu_n(z,w) = v^\mu_n(z)\, M_n(\nu, \mu)^{-1} v^\mu_n(w)^*. \tag{26} \]

Examples 3.4. (a) Let $\mu = \mu_0$ be the normalized Lebesgue measure on $(\partial\mathbb{D})^d$ making monomials orthonormal. Here $\mathcal{N}(\mu_0) = \{0\}$, and thus $k^{\mu_0}_n = n$ for all $n$. As a consequence, Lemma 3.2(d),(e) reduces to Lemma 2.3.

(b) Consider the case $\nu = \mu + \sigma$ for some positive measure $\sigma$ with compact support; for instance, the case of adding a finite number of point masses. Since $\nu - \mu \ge 0$, clearly $\mathcal{N}(\nu) \subseteq \mathcal{N}(\mu)$, and from Lemma 3.2 we know that we may express the orthogonal polynomials $p^\mu_j$ in terms of the $p^\nu_j$. Moreover, $\|M_n(\mu, \nu)\| = \|R_m(\mu, \nu)\|^2 \le 1$. If $\operatorname{supp}(\sigma) \subseteq S(\mu)$, then $\mathcal{N}(\mu) = \mathcal{N}(\mu + \sigma)$ by Example 2.5. Thus, in this case, we may also express the orthogonal polynomials $p^\nu_j$ in terms of the $p^\mu_j$ via (24), $k^\mu_j = k^\nu_j$ for all $j \ge 0$, and, by (25) and (26),
\[ \forall z \in \mathbb{C}^d,\ \forall n \ge 0: \quad K^{\mu+\sigma}_n(z,z) \le K^\mu_n(z,z). \]

Remarks 3.5.

Lasserre and Pauwels in [22, Theorem 3.9 and Theorem 3.12] consider the graded lexicographical ordering (16) and give an explicit sequence of numbers $\kappa_m$ such that the Hausdorff distance $d_H(S, S_m)$ between the sets
\[ S = \operatorname{supp}(\mu) \quad \text{and} \quad S_m = \{ z \in \mathbb{R}^d : K^\mu_{n_{\mathrm{tot}}(m)}(z,z) \le \kappa_m \} \]
tends to 0. For this they assume that $S = \operatorname{Clos}(\operatorname{Int}(S))$, and that there is some constant $w_{\min} > 0$ such that $d\mu \ge w_{\min}\, dA|_S$, where $dA$ denotes (unnormalized) Lebesgue measure in $\mathbb{R}^d$.

We summarize their reasoning, using our notation. All measures $\nu$ considered in this context are supported in $\mathbb{R}^d$, with $\operatorname{supp}(\nu)$ having non-empty interior (in $\mathbb{R}^d$), such that $\mathcal{N}(\nu) = \{0\}$ and $S(\nu) = \mathbb{C}^d$ by Example 2.5. Notice that $d_H(S, S_m) \le \delta$ if
\[ \forall z \in \operatorname{supp}(\mu) \text{ with } \operatorname{dist}(z, \partial \operatorname{supp}(\mu)) \ge \delta: \quad K^\mu_{n_{\mathrm{tot}}(m)}(z,z) \le \kappa_m, \tag{27} \]
\[ \forall z \notin \operatorname{supp}(\mu) \text{ with } \operatorname{dist}(z, \partial \operatorname{supp}(\mu)) \ge \delta: \quad K^\mu_{n_{\mathrm{tot}}(m)}(z,z) \ge \kappa_m. \tag{28} \]
For $z$ as in (27) we denote by $B$ the real unit ball in $\mathbb{R}^d$, and observe that $z + \delta B \subseteq S$. Hence
\[ w_{\min} A|_{z + \delta B} \le w_{\min} A|_S \le \mu. \]
Using the fact that an explicit upper bound $\kappa'_m$ for $K^{A|_B}_{n_{\mathrm{tot}}(m)}(0, 0)$ has been known since the 1990s, we obtain by Example 3.4(b) and (11)
\[ K^\mu_{n_{\mathrm{tot}}(m)}(z,z) \le \frac{K^{A|_{z+\delta B}}_{n_{\mathrm{tot}}(m)}(z,z)}{w_{\min}} = \frac{A(B)}{A(z + \delta B)}\, \frac{K^{A|_B}_{n_{\mathrm{tot}}(m)}(0,0)}{w_{\min}} \le \frac{\kappa'_m}{\delta^d\, w_{\min}}, \]
compare with [22, Lemma 6.1].

[Footnote to Corollary 3.3: The interested reader might want to check that this condition is true for all $n$ if $\mathcal{N}(\mu) = \mathcal{N}(\nu)$.]

With $z$ as in (28), they consider the annulus $S' = \{ x \in \mathbb{R}^d : \delta \le \|x - z\| \le r \}$, where $r = \delta + \operatorname{diam}(S)$ such that $S \subseteq S'$. Following [24, 25], they then consider the real needle polynomial
\[ p(x) = T_{\lfloor m/2 \rfloor}\!\left( \frac{r^2 + \delta^2 - 2\|x - z\|^2}{r^2 - \delta^2} \right), \]
with $\|p\|_{L^\infty(S)} \le \|p\|_{L^\infty(S')} = 1$, $\operatorname{tdeg} p \le m$ and thus $\deg p \le n_{\mathrm{tot}}(m)$, and
\[ |p(z)| = T_{\lfloor m/2 \rfloor}\!\left( \frac{r^2 + \delta^2}{r^2 - \delta^2} \right) \ge \frac{1}{2} \exp\!\left( \lfloor m/2 \rfloor\, g_{\mathbb{C} \setminus [-1,1]}\!\left( \frac{r^2 + \delta^2}{r^2 - \delta^2} \right) \right) \ge \frac{1}{2} \left( \frac{r + \delta}{r - \delta} \right)^{(m-1)/2}; \]
compare with [22, Lemma 6.3]. It follows from (14) that
\[ K^\mu_{n_{\mathrm{tot}}(m)}(z,z) \ge \frac{|p(z)|^2}{\|p\|_{2,\mu}^2} \ge \frac{1}{4\,\mu(\mathbb{C}^d)} \left( 1 + \frac{2\delta}{\operatorname{diam}(S)} \right)^{m-1}. \]
In view of (27), (28), it thus only remains to find $\delta = \delta_m$ as a function of $m$ (possibly decreasing) and $\kappa_m$ such that
\[ \frac{\kappa'_m}{\delta_m^d\, w_{\min}} \le \kappa_m \le \frac{1}{4\,\mu(\mathbb{C}^d)} \left( 1 + \frac{2\delta_m}{\operatorname{diam}(S)} \right)^{m-1}. \]
For instance, for $\delta_m = 1/m^\alpha$ with $\alpha \in (0, 1)$, the left-hand side grows polynomially in $m$, and the right-hand side exponentially in $m$, so that the above inequalities are true for sufficiently large $m$.

We list some possible improvements.
(1) The assumption $S = \operatorname{Clos}(\operatorname{Int}(S))$ for $S = \operatorname{supp}(\mu)$ is strongly used, which does not allow outliers.
(2) The assumption $S = \operatorname{Clos}(\operatorname{Int}(S))$ is strongly used also to give upper bounds for $K^\mu_{n_{\mathrm{tot}}(m)}(z,z)$ in $\operatorname{supp}(\mu)$ at some distance from the boundary. This condition could be replaced by stepping to the relative interior of $\operatorname{supp}(\mu)$ in $S(\mu) \cap \mathbb{R}^d$ (in the real setting) or $S(\mu)$ in the complex setting. This may possibly result in an increase in the power of $m$.
(3) The lower bound for $K^\mu_{n_{\mathrm{tot}}(m)}(z,z)$ is very rough, and it should be possible to write something down in terms of the plurisubharmonic Green function of $\operatorname{supp}(\mu)$. One difficulty arises in this direction: one needs bounds uniformly in $z$ for all $z \notin \operatorname{supp}(\mu)$ with distance to $\operatorname{supp}(\mu)$ decaying like a fractional negative power of $m$.

Remarks 3.6.

Modified moment matrices and mixed moment matrices have been shown to be useful for explicitly computing orthogonal polynomials of one or several complex variables; see for instance the modified Chebyshev algorithm for orthogonality on the real line [17]. Here the basic idea is that one knows $\mu$ and its orthogonal polynomials $p^\mu_j$, and computes both $R_n(\mu, \nu)$ and $v^\nu_n(z) = v^\mu_n(z) R_n(\mu, \nu)$ through $\nu$-orthogonality, using, e.g., the Gram-Schmidt method. In finite precision arithmetic, we can only ensure small errors if $R_n(\mu, \nu)$ has a modest condition number, given by $\|R_n(\mu, \nu)\|\, \|R_n(\mu, \nu)^{-1}\|$. In the case of one complex variable it has been shown in [4, Lemma 3.4] that this condition number grows exponentially in $n$ unless the complements of $\operatorname{supp}(\mu)$ and $\operatorname{supp}(\nu)$ have the same unbounded connected component, and, according to Lemma 2.9, a similar result is expected to be true in the case of several complex variables. Thus the choice of $\mu$ is essential.

In particular, it is not always a good idea to compute $p^\nu_0, p^\nu_1, \dots$ from monomials, even after the scaling (11); compare with Example 3.4(a). But the modified moment matrix $M_n(\mu, \nu)$ and hence $R_n(\mu, \nu)$ might be much better conditioned for $\mu$ close to $\nu$, or, in other words, if the $p^\mu_j$ are "nearly" $\nu$-orthogonal for $j = 0, 1, \dots, n$.

3.2. Small perturbations.

In this subsection we compare for fixed $n$ the Christoffel-Darboux kernels for two measures $\mu$ and $\nu$ that are close in a precise sense. In view of Lemma 3.2(f), the proximity of the two measures is imposed by the following condition: there exists a sufficiently small $\epsilon \in (0, 1)$ such that
\[ (1 - \epsilon)\, \|p\|_{2,\mu}^2 \le \|p\|_{2,\nu}^2 \le (1 + \epsilon)\, \|p\|_{2,\mu}^2 \tag{29} \]
for all polynomials $p$ of degree at most $k^\mu_n$. As a consequence of (29), the quantity $\|p\|_{2,\mu}$ vanishes if and only if this is true for $\|p\|_{2,\nu}$; that is, $k^\mu_n = k^\nu_n$ and $\mathcal{N}_n(\mu) = \mathcal{N}_n(\nu)$. This allows us to apply Corollary 3.3, in particular $v^\mu_n = (p^\mu_0, \dots, p^\mu_n) = v^\nu_n R_n(\nu, \mu) = (p^\nu_0, \dots, p^\nu_n) R_n(\nu, \mu)$, with the Hermitian and positive definite modified moment matrix $M_n(\nu, \mu) = R_n(\nu, \mu)^* R_n(\nu, \mu)$ being similar to $M_n(\mu, \nu)^{-1}$.

This modified moment matrix allows us to restate assumption (29): for any $\xi \in \mathbb{C}^{n+1}$, by considering $p(z) = v^\mu_n(z)\xi = v^\nu_n(z) R_n(\nu, \mu)\xi$ in (29), we find
\[ (1 - \epsilon)\, \|\xi\|^2 \le \|R_n(\nu, \mu)\xi\|^2 = \xi^* M_n(\nu, \mu)\, \xi \le (1 + \epsilon)\, \|\xi\|^2; \tag{30} \]
in other words, $\nu$ is so close to $\mu$ that all eigenvalues of the Hermitian matrix $M_n(\nu, \mu) - I$ lie in the interval $[-\epsilon, \epsilon]$, or $\|M_n(\nu, \mu) - I\| \le \epsilon$. Using the Frobenius matrix norm, we therefore get the sufficient condition
\[ \sum_{j,k=0}^{n} \big| \langle p^\mu_j, p^\mu_k \rangle_{2,\nu} - \delta_{j,k} \big|^2 = \|M_n(\nu, \mu) - I\|_F^2 \le \epsilon^2, \tag{31} \]
which indicates how $p^\mu_0, \dots, p^\mu_n$ are "nearly" $\nu$-orthogonal.

For two Hermitian matrices $X_1, X_2$ we write $X_1 \le X_2$ if $X_2 - X_1$ is positive semi-definite. We make use of this notation in the proofs of the following results.

Proposition 3.7.

Under the assumption (29) there holds for all $z, w \in \mathbb{C}^d$
\[ (1 - \epsilon)\, K^\nu_n(z,z) \le K^\mu_n(z,z) \le (1 + \epsilon)\, K^\nu_n(z,z), \tag{32} \]
\[ |K^\mu_n(z,w) - K^\nu_n(z,w)| \le \epsilon\, \sqrt{K^\nu_n(z,z)}\, \sqrt{K^\nu_n(w,w)}, \tag{33} \]
\[ |C^\mu_n(z,w) - C^\nu_n(z,w)| \le 2\epsilon. \tag{34} \]

Proof.

Recall from (26) that $K^\mu_n(z,w) = v^\nu_n(z)\, M_n(\mu, \nu)^{-1} v^\nu_n(w)^*$. Since $M_n(\mu, \nu)^{-1}$ is similar to $M_n(\nu, \mu)$, we get from (30) that
\[ (1 - \epsilon)\, I \le M_n(\mu, \nu)^{-1} \le (1 + \epsilon)\, I, \tag{35} \]
and a combination with (26) for $w = z$ gives (32). Moreover, it follows from (35) that $\|M_n(\mu, \nu)^{-1} - I\| \le \epsilon$, and thus, again by (26),
\[ |K^\mu_n(z,w) - K^\nu_n(z,w)| \le \epsilon\, \|v^\nu_n(z)\|\, \|v^\nu_n(w)\|, \]
implying (33). Finally, from (32) we infer
\[ \frac{K^\mu_n(z,z)\, K^\mu_n(w,w)}{K^\nu_n(z,z)\, K^\nu_n(w,w)} \in \big[ (1 - \epsilon)^2, (1 + \epsilon)^2 \big] \]
and thus, by definition (15) and (33),
\[ |C^\mu_n(z,w) - C^\nu_n(z,w)| \le \left| \frac{K^\mu_n(z,w) - K^\nu_n(z,w)}{\sqrt{K^\nu_n(z,z)\, K^\nu_n(w,w)}} \right| + |C^\mu_n(z,w)| \left| 1 - \sqrt{\frac{K^\mu_n(z,z)\, K^\mu_n(w,w)}{K^\nu_n(z,z)\, K^\nu_n(w,w)}} \right| \le \epsilon + |C^\mu_n(z,w)|\, \epsilon \le 2\epsilon, \]
as claimed in (34). □

Remark 3.8.

If assumption (29) holds for a pair of measures $(\mu, \nu)$, then it also holds for $(\mu + \sigma, \nu + \sigma)$ for any measure $\sigma \ge 0$ with $\operatorname{supp}(\sigma) \subseteq S(\mu)$, and hence
\[ (1 - \epsilon)\, K^{\nu+\sigma}_n(z,z) \le K^{\mu+\sigma}_n(z,z) \le (1 + \epsilon)\, K^{\nu+\sigma}_n(z,z) \]
by Proposition 3.7. Indeed, from Example 3.4(b) we know that $k^\mu_n = k^{\mu+\sigma}_n$, and so, for any polynomial $p$ of degree at most $k^{\mu+\sigma}_n$,
\[ \big|\, \|p\|_{2,\mu+\sigma}^2 - \|p\|_{2,\nu+\sigma}^2 \big| = \big|\, \|p\|_{2,\mu}^2 - \|p\|_{2,\nu}^2 \big| \le \epsilon\, \|p\|_{2,\mu}^2 \le \epsilon\, \|p\|_{2,\mu+\sigma}^2, \]
which implies assumption (29) for $(\mu + \sigma, \nu + \sigma)$.

Remark 3.9.

Multi-point analogues of Proposition 3.7 are also available. We include them for completeness. Suppose that assumption (29) holds, and let $z_1, \dots, z_\ell \in \mathbb{C}^d$. Then
\[ (1 - \epsilon)\, K^\nu_n(z_1, \dots, z_\ell; z_1, \dots, z_\ell) \le K^\mu_n(z_1, \dots, z_\ell; z_1, \dots, z_\ell) \le (1 + \epsilon)\, K^\nu_n(z_1, \dots, z_\ell; z_1, \dots, z_\ell), \]
and, for the Frobenius matrix norm $\|\cdot\|_F$,
\[ \|C^\mu_n(z_1, \dots, z_\ell; z_1, \dots, z_\ell) - C^\nu_n(z_1, \dots, z_\ell; z_1, \dots, z_\ell)\|_F \le 2\sqrt{\ell(\ell - 1)}\, \epsilon. \]

For a proof, consider the matrix
\[ V^\nu_n(z_1, \dots, z_\ell) := \begin{bmatrix} v^\nu_n(z_1) \\ \vdots \\ v^\nu_n(z_\ell) \end{bmatrix}. \]
From the factorization
\[ K^\nu_n(z_1, \dots, z_\ell; z_1, \dots, z_\ell) = V^\nu_n(z_1, \dots, z_\ell)\, V^\nu_n(z_1, \dots, z_\ell)^*, \]
the inequalities for the polarized kernel follow by multiplying (35) on the left by $\xi^* V^\nu_n(z_1, \dots, z_\ell)$, and on the right by its adjoint. For the cosine bound, we apply (34) and obtain
\[ \|C^\mu_n(z_1, \dots, z_\ell; z_1, \dots, z_\ell) - C^\nu_n(z_1, \dots, z_\ell; z_1, \dots, z_\ell)\|_F^2 = \sum_{j,k=1,\ j \ne k}^{\ell} |C^\mu_n(z_j, z_k) - C^\nu_n(z_j, z_k)|^2 \le 4\,\ell(\ell - 1)\, \epsilon^2, \]
as required for the above claim.

Remarks 3.10.

In a series of papers, Migliorati and his co-authors considered a fixed general probability measure $\mu$ compactly supported in $\mathbb{R}^d$ (probably their results still remain valid in $\mathbb{C}^d$) and the discrete measure
\[ \nu = \frac{1}{N} \sum_{j=1}^{N} w(z^{(j)})\, \delta_{z^{(j)}}, \]
with the $N$ mass points $z^{(j)} \in \operatorname{supp}(\mu)$ being independent and identically distributed random variables with law given by some sampling probability measure $\widetilde{\mu}$ with $\operatorname{supp}(\widetilde{\mu}) = \operatorname{supp}(\mu)$ (e.g., $\mu = \widetilde{\mu}$), and the density $w(z) = \frac{d\mu}{d\widetilde{\mu}}(z)$ is assumed to be strictly positive in $\operatorname{supp}(\mu)$. We now refer to [10] where, as in the present paper, the authors allow for general degree sequences and general measures $\mu$. In particular, in [10, Theorem 2.1(i)], the authors choose $N$ large enough such that
\[ \max_{z \in \operatorname{supp}(\mu)} w(z)\, K^\mu_n(z,z) \le \frac{1 - \log 2}{2 + 2r}\, \frac{N}{\log(N)} \]
for some $r > 0$, and show that the probability that $\|M_n(\nu, \mu) - I\| > 1/2$ is at most $2N^{-r}$. In other words, condition (30) holds for $\epsilon = 1/2$ with a probability of at least $1 - 2N^{-r}$, which is close to 1. By checking the proof, it can be seen that a similar result is true for any $\epsilon \in (0, 1/2)$.

4. Additive perturbation of positive measures

The aim of this section is to give upper and lower bounds for the ratio of Christoffel-Darboux kernels $K^{\mu+\sigma}_n(z,z) / K^\mu_n(z,z)$ for $z \notin \operatorname{supp}(\mu)$, but possibly $z \in \operatorname{supp}(\sigma)$. Several of the examples studied in later sections require that the cosine function $C^\mu_n(z,w)$ (see (15)) has a limit, at least along subsequences, for $z, w \notin \operatorname{supp}(\mu)$. Hence our bounds will be formulated in terms of this cosine.

Let $\operatorname{supp}(\mu)$ be an infinite set such that there exists an infinite number of orthogonal polynomials $p^\mu_j$, and consider the case of adding $\ell$ distinct point masses in the Zariski closure $S(\mu)$ of the support of the original measure $\mu$ (and later outside $\operatorname{supp}(\mu)$). That is,
\[ \sigma = \sum_{j=1}^{\ell} t_j\, \delta_{z_j}, \qquad t_j > 0, \qquad z_1, \dots, z_\ell \in S(\mu) \text{ distinct}. \tag{36} \]
As in Remark 3.9, we depart from the canonical notation and allow $z_1, \dots, z_\ell$ to be elements of $\mathbb{C}^d$ rather than the numerical coordinates of a single point $z$. From Example 3.4(b) we know that $\mathcal{N}(\mu) = \mathcal{N}(\mu + \sigma)$, and thus we may apply Corollary 3.3 and in particular property (26) for $\nu = \mu + \sigma$.

Lemma 4.1. If $z_1, \dots, z_\ell \in S(\mu)$ are distinct, then there exists an $N$ such that the matrix $K^\mu_n(z_1, \dots, z_\ell; z_1, \dots, z_\ell)$ is invertible for all $n \ge N$.

Proof. There exist multivariate Lagrange polynomials $p_1, \dots, p_\ell$, each of them of minimal degree, such that $p_j(z_k) = \delta_{j,k}$ for $k = 1, \dots, \ell$. The relation $z_1, \dots, z_\ell \in S(\mu)$ implies that for each $p \in \mathcal{N}(\mu)$ there holds $p(z_j) = 0$ for $j = 1, 2, \dots, \ell$. For each $p_j$ there exists $q_j \in \operatorname{span}\{p^\mu_0, p^\mu_1, \dots\}$ with $\deg q_j \le \deg p_j$ and $p_j - q_j \in \mathcal{N}(\mu)$. Then $q_j(z_k) = p_j(z_k) = \delta_{j,k}$, and thus $q_1, \dots, q_\ell$ are also Lagrange polynomials. In particular, $\deg q_j = \deg p_j$ by minimality of $\deg p_j$. We conclude that there exists an integer $N$ with $k^\mu_N = \max\{ \deg q_j : j = 1, \dots, \ell \}$, and the rows $\{ v^\mu_n(z_j) : j = 1, 2, \dots, \ell \}$ are linearly independent for $n \ge N$. This is false for $n < N$ by minimality of the degrees. Thus $K^\mu_n(z_1, \dots, z_\ell; z_1, \dots, z_\ell)$ is invertible for all $n \ge N$, but not for $n < N$. □

Notice that Lemma 4.1 is trivial for the case $d = 1$ of univariate polynomials, where $z^{\alpha(j)} = z^j$ and hence $N = \ell - 1$. For $n \ge N$, both $K^\mu_n(z_1, \dots, z_\ell; z_1, \dots, z_\ell)$ and $C^\mu_n(z_1, \dots, z_\ell; z_1, \dots, z_\ell)$ are invertible for distinct points $z_j$. This enables us to establish the following comparison theorem consisting of a sequence of alternating inequalities.

Theorem 4.2. Let $z_1, ..., z_\ell \in S(\mu)$ be distinct, $n \ge N$ as in Lemma 4.1, and consider the following matrices (depending on $n$)
$$C := C_n^\mu(z_1, ..., z_\ell; z_1, ..., z_\ell), \qquad \widetilde{C} := C_n^\mu(z_1, ..., z_\ell, z; z_1, ..., z_\ell, z) = \begin{bmatrix} C & b^* \\ b & 1 \end{bmatrix}, \quad b \in \mathbb{C}^{1\times\ell},$$
$$D := \operatorname{diag}\left(\frac{1}{\sqrt{t_j K_n^\mu(z_j, z_j)}}\right)_{j=1,...,\ell},$$
and the constants
$$\Sigma_m := 1 - \sum_{j=0}^{m-1} (-1)^j\, b\, C^{-1} (D^2 C^{-1})^j\, b^*, \qquad m = 0, 1, 2, \ldots.$$
Then, for all $z \in \mathbb{C}^d$,
$$\frac{K_n^{\mu+\sigma}(z,z)}{K_n^\mu(z,z)} = 1 - b\,(D^2 + C)^{-1} b^*, \tag{37}$$
and
$$\Sigma_1 \le \Sigma_3 \le \Sigma_5 \le \cdots \le \frac{K_n^{\mu+\sigma}(z,z)}{K_n^\mu(z,z)} \le \cdots \le \Sigma_4 \le \Sigma_2 \le \Sigma_0. \tag{38}$$

Before presenting the proof of this theorem, we make some important observations and provide two examples where the result is applied.

Notice that, in the case of outlier detection, $z_j$ lies in $\Omega \cap S(\mu)$, with $\Omega$ the unbounded connected component of $\mathbb{C} \setminus \operatorname{supp}(\mu)$. Hence (6) in case $d = 1$ and (17) in case $d > 1$ imply that $\|D\| \to 0$ as $n \to \infty$. Also, $\|b\| \le \sqrt{\ell}$ since $|C_n^\mu(z, z_j)| \le 1$. Hence, as long as $C^{-1}$ is bounded uniformly for $n$ sufficiently large, we see that (38) provides quite sharp lower and upper bounds, even for modest values of $m$. This assumption on $C$ is verified in many special cases discussed below where we show that (a unitary scaled counterpart of) $C$ has a finite limit as $n \to \infty$.

In the next two examples, we make this last assertion a bit more explicit.

Example 4.3.

We begin with the simple special case $\sigma = t_1 \delta_{z_1}$ of adding one point mass. Here, with the notation of Theorem 4.2,
$$C = 1, \qquad b = C_n^\mu(z, z_1), \qquad D^2 = \frac{1}{t_1 K_n^\mu(z_1, z_1)}.$$
With these data we find from (37)
$$\frac{K_n^{\mu+\sigma}(z,z)}{K_n^\mu(z,z)} = 1 - b\,(1 + D^2)^{-1} b^* = 1 - \frac{|C_n^\mu(z, z_1)|^2}{1 + \frac{1}{t_1 K_n^\mu(z_1, z_1)}},$$
and, from (38) for $m = 2, 3$,
$$-\frac{1}{(t_1 K_n^\mu(z_1, z_1))^2} \le \Sigma_3 - \Sigma_2 \le \frac{K_n^{\mu+\sigma}(z,z)}{K_n^\mu(z,z)} - \Sigma_2 = \frac{K_n^{\mu+\sigma}(z,z)}{K_n^\mu(z,z)} - 1 + |C_n^\mu(z, z_1)|^2 \left(1 - \frac{1}{t_1 K_n^\mu(z_1, z_1)}\right) \le 0. \qquad \square$$

Example 4.4.

For $\sigma$ consisting of $\ell \ge 1$ point masses, we consider $m \in \{0, 1, 2\}$. From (38) for $m = 0, 1$,
$$\Sigma_1 = \frac{\det C_n^\mu(z_1, ..., z_\ell, z; z_1, ..., z_\ell, z)}{\det C_n^\mu(z_1, ..., z_\ell; z_1, ..., z_\ell)} \le \frac{K_n^{\mu+\sigma}(z,z)}{K_n^\mu(z,z)} \le \Sigma_0 = 1, \tag{39}$$
where, in the equality on the left, we have used Schur complement techniques. Since $\Sigma_1$ vanishes when $z$ is one of the point masses at $z_j$, we go one term further:
$$-\left(\Sigma_2 - \Sigma_1\right) \le \frac{K_n^{\mu+\sigma}(z,z)}{K_n^\mu(z,z)} - \Sigma_2 \le 0, \tag{40}$$
with
$$\Sigma_2 - \Sigma_1 = b\, C^{-1} D^2 C^{-1} b^* = \sum_{j=1}^{\ell} \frac{|b\, C^{-1} e_j|^2}{t_j K_n^\mu(z_j, z_j)} \tag{41}$$
$$= \sum_{j=1}^{\ell} \frac{|\det C_n^\mu(z_1, ..., z_{j-1}, z, z_{j+1}, ..., z_\ell; z_1, ..., z_\ell)|^2}{|\det C_n^\mu(z_1, ..., z_\ell; z_1, ..., z_\ell)|^2\; t_j K_n^\mu(z_j, z_j)},$$
where in the last equality we have applied Cramer's rule. $\square$

Proof of Theorem 4.2.

We start by recalling that, by (26) for $\nu = \mu + \sigma$,
$$K_n^{\mu+\sigma}(z,z) = v_n^\mu(z)\, M_n(\mu+\sigma, \mu)^{-1}\, v_n^\mu(z)^*.$$
By introducing the matrix
$$V := \begin{bmatrix} v_n^\mu(z_1)/\sqrt{K_n^\mu(z_1, z_1)} \\ \vdots \\ v_n^\mu(z_\ell)/\sqrt{K_n^\mu(z_\ell, z_\ell)} \end{bmatrix} \in \mathbb{C}^{\ell \times (n+1)},$$
and $D, C, b$ as in the statement of Theorem 4.2, it is not difficult to check that
$$M_n(\mu+\sigma, \mu) = I + V^* D^{-2} V,$$
and thus, by the Sherman-Morrison formula [18, §…],
$$M_n(\mu+\sigma, \mu)^{-1} = I - V^* D^{-1} \left(I + D^{-1} V V^* D^{-1}\right)^{-1} D^{-1} V.$$
Observing that
$$C = V V^*, \qquad b = \frac{v_n^\mu(z)\, V^*}{\sqrt{K_n^\mu(z,z)}},$$
we obtain (37).

If we neglect $D$, which usually is assumed to have small entries, we obtain in (37) that the right-hand side is the Schur complement $\Sigma_1 = \widetilde{C}/C$. For $D \ne 0$ we can give lower and upper bounds: by assumption, $C$ is Hermitian positive definite, and hence has a square root $C^{1/2}$ with inverse $C^{-1/2}$. Introducing the Hermitian positive definite matrix $E = C^{-1/2} D^2 C^{-1/2}$, we have for all integers $m \ge 0$
$$(-1)^m \left( (I+E)^{-1} - \sum_{j=0}^{m-1} (-1)^j E^j \right) = E^m (I+E)^{-1} = \left((E^{1/2})^m\right)^* (I+E)^{-1} (E^{1/2})^m \ge 0,$$
where again we write $A \le B$ for two Hermitian matrices if $B - A$ is positive definite. Substituting gives
$$(-1)^m \left( (D^2 + C)^{-1} - \sum_{j=0}^{m-1} (-1)^j C^{-1} (D^2 C^{-1})^j \right) \ge 0,$$
which together with (37) implies (38). $\square$

5. Cosine asymptotics in the univariate case

In this section we consider orthogonal polynomials of a single complex variable $z \in \mathbb{C}$. For obtaining asymptotics it is natural to assume throughout the section that $\operatorname{supp}(\mu)$ is infinite. Thus we have the nullspace $N(\mu) = \{0\}$, $k_n^\mu = n$, and $S(\mu) = \mathbb{C}$, compare with §…. We are interested in the asymptotic behavior of $C_n^\mu(z, w)$ and thus of $K_n^{\mu+\sigma}(z,z)/K_n^\mu(z,z)$. Our main result is the following.

Theorem 5.1. Let $\Omega$ denote a subdomain of the unbounded component of $\mathbb{C} \setminus \operatorname{supp}(\mu)$, and suppose that there is a function $g$ analytic and different from zero in $\Omega$ such that
$$\lim_{n\to\infty} \frac{p_n^\mu(z)}{p_{n+1}^\mu(z)} = g(z) \tag{42}$$
uniformly on any compact subset of $\Omega$. Let $F$ be a compact subset of $\Omega$. Then $1/K_n^\mu(z,z) \to 0$ uniformly for $z \in F$, and
$$\lim_{n\to\infty} C_n^\mu(z, w)\, \frac{|p_n^\mu(z)|}{p_n^\mu(z)}\, \frac{p_n^\mu(w)}{|p_n^\mu(w)|} = \frac{\sqrt{(1 - |g(z)|^2)(1 - |g(w)|^2)}}{1 - g(z)\overline{g(w)}} \tag{43}$$
uniformly for $z, w \in F$.

Furthermore, let $z_1, ..., z_\ell, w_1, ..., w_\ell \in \Omega$ with distinct $g(z_1), ..., g(z_\ell)$ and distinct $g(w_1), ..., g(w_\ell)$. Then, uniformly for $z, w \in F$, there holds
$$\lim_{n\to\infty} \left| \frac{\det C_n^\mu(z_1, ..., z_\ell, z; w_1, ..., w_\ell, w)}{\det C_n^\mu(z_1, ..., z_\ell; w_1, ..., w_\ell)} \right| = \frac{\sqrt{(1 - |g(z)|^2)(1 - |g(w)|^2)}}{|1 - g(z)\overline{g(w)}|} \prod_{j=1}^{\ell} \left| \frac{g(w) - g(w_j)}{1 - \overline{g(w_j)}\, g(w)} \cdot \frac{g(z) - g(z_j)}{1 - g(z)\overline{g(z_j)}} \right|. \tag{44}$$

Before giving the proof of this theorem we mention a sufficient condition for which the hypotheses hold (see [37, Proposition 3.4]).

Lemma 5.2. If there exists a function $G(z)$ analytic and non-zero at infinity such that
$$\lim_{n\to\infty} \frac{z\, p_n^\mu(z)}{p_{n+1}^\mu(z)} = G(z),$$
uniformly in some neighborhood of infinity, then
$$\lim_{n\to\infty} \frac{p_n^\mu(z)}{p_{n+1}^\mu(z)} = g(z) := \frac{G(z)}{z}, \tag{45}$$
uniformly on every closed subset of $\Omega := \mathbb{C} \setminus \mathrm{Co}(\operatorname{supp}(\mu))$, where $\mathrm{Co}(S)$ denotes the convex hull of the set $S$. Moreover, $0 < |g(z)| < 1$ for $z \in \Omega$.

Remark 5.3.

While it is possible for the limit (45) to hold everywhere in

1. (The same argu-ment implies that | g ( z ) | < z ∈ Ω.) In particular, K µn ( z, z ) ≥ | p µn ( z ) | ≥ q N − n ) | p µN ( z ) | ≥ c q N − n ) , showing that 1 /K µn ( z, z ) → F , as claimed in the theorem.Conversely, K µn ( z, z ) = K µN ( z, z ) + n (cid:88) j = N +1 | p µj ( z ) | ≤ c + | p µn ( z ) | n (cid:88) j =0 q n − j ≤ c + | p µn ( z ) | − q . As a consequence, | p µN ( z ) | ≤ (cid:113) K µN ( z, z ) = o ( | p µn ( z ) | ) n →∞ , (cid:113) K µn ( z, z ) = O ( | p µn ( z ) | ) n →∞ , uniformly on F .Summing the inequality (47) for j = N, N + 1 , ..., n − (cid:12)(cid:12)(cid:12) (1 − g ( z ) g ( w )) (cid:16) K µn ( z, w ) − K µN ( z, w ) (cid:17) − p µn ( z ) p µn ( w ) + p µN ( z ) p µN ( w ) (cid:12)(cid:12)(cid:12) ≤ n − (cid:88) j = N | p µj ( z ) p µj ( w ) − g ( z ) p µj +1 ( z ) g ( w ) p µj +1 ( w ) |≤ (2 (cid:15) + (cid:15) ) n − (cid:88) j = N (cid:113) | p µj ( z ) | + | p µj +1 ( z ) | (cid:113) | p µj ( w ) | + | p µj +1 ( w ) | ≤ (cid:15) + (cid:15) ) (cid:113) K µn ( z, z ) K µn ( w, w ) . Since (cid:15) > − g ( z ) g ( w )) K µn ( z, w ) = p µn ( z ) p µn ( w )(1 + o (1) n →∞ ) , (49)(1 − | g ( z ) | ) K µn ( z, z ) = | p µn ( z ) | (1 + o (1) n →∞ ) , (50)uniformly on F . This implies (43).For a proof of (44), we may suppose that z , ..., z (cid:96) , w , ..., w (cid:96) ∈ F , andneglect the factors of modulus 1 on the left-hand side of (43). Since the square roots on the right-hand side of (43) can be factored by linearity ofthe determinant, for establishing (44) it is suﬃcient to recall the well-knownexpression for the determinant of a Pick matrix, namelydet (cid:16) − a j b k (cid:17) j,k =1 ,...,(cid:96) = (cid:81) (cid:96)j,k =1 ,j

Corollary 5.4. Under the assumptions of Theorem 5.1, and in particular $z_1, ..., z_\ell \in \Omega$ with distinct $g(z_1), ..., g(z_\ell)$, we have uniformly on compact subsets of $\Omega$
$$\lim_{n\to\infty} \frac{K_n^{\mu+\sigma}(z,z)}{K_n^\mu(z,z)} = \left| \prod_{j=1}^{\ell} \frac{g(z) - g(z_j)}{1 - g(z)\overline{g(z_j)}} \right|^2, \tag{52}$$
and, at a point mass $z = z_m$,
$$K_n^{\mu+\sigma}(z_m, z_m) = \frac{1}{t_m} \left( 1 + O\!\left( \frac{1}{K_n^\mu(z_m, z_m)} \right) \right)_{n\to\infty}. \tag{53}$$

Proof.

We appeal to a special case of Theorem 4.2 that asserts
$$\Sigma_1 \le \Sigma_3 \le \frac{K_n^{\mu+\sigma}(z,z)}{K_n^\mu(z,z)} \le \Sigma_2,$$
where we recall that tacitly all quantities depend on $n$ and $z$. We will also use the abbreviations
$$B(z) := \prod_{j=1}^{\ell} \frac{g(z) - g(z_j)}{1 - g(z)\overline{g(z_j)}}, \qquad B_j(z) := \prod_{k=1, k\ne j}^{\ell} \frac{g(z) - g(z_k)}{1 - g(z)\overline{g(z_k)}}.$$
The matrix $C = C_n^\mu(z_1, ..., z_\ell; z_1, ..., z_\ell)$ is positive semi-definite; hence by (39)
$$\Sigma_1 = \frac{\det C_n^\mu(z_1, ..., z_\ell, z; z_1, ..., z_\ell, z)}{\det C_n^\mu(z_1, ..., z_\ell; z_1, ..., z_\ell)} = \left| \frac{\det C_n^\mu(z_1, ..., z_\ell, z; z_1, ..., z_\ell, z)}{\det C_n^\mu(z_1, ..., z_\ell; z_1, ..., z_\ell)} \right|,$$
which, by (44) for $w_j = z_j$ and $w = z$, tends to $|B(z)|^2$ as $n \to \infty$ for $z \in \Omega$.

It remains to examine $\Sigma_2 - \Sigma_1$: we recall from (41) that
$$\Sigma_2 - \Sigma_1 = \sum_{j=1}^{\ell} \frac{|\det C_n^\mu(z_1, ..., z_{j-1}, z, z_{j+1}, ..., z_\ell; z_1, ..., z_\ell)|^2}{|\det C_n^\mu(z_1, ..., z_\ell; z_1, ..., z_\ell)|^2\; t_j K_n^\mu(z_j, z_j)},$$
which according to (44) behaves like
$$\sum_{j=1}^{\ell} \frac{1}{t_j K_n^\mu(z_j, z_j)} \left( 1 - \left| \frac{g(z) - g(z_j)}{1 - g(z)\overline{g(z_j)}} \right|^2 \right) \left| \frac{B_j(z)\, \overline{B_j(z_j)}}{B_j(z_j)\, \overline{B_j(z_j)}} \right|^2,$$
where $B_j(z_j) \ne 0$ by assumption on $g(z_1), ..., g(z_\ell)$. From Theorem 5.1 and its proof we know that $1/K_n^\mu(z_j, z_j)$ exponentially decays to $0$ for all $j$ as $n \to \infty$, implying that (52) holds.

Notice that (52) is not very useful at a point mass $z = z_m$ since then $B(z_m) = 0$ and even $\Sigma_1 = 0$, implying that $b^* = C e_m$, the $m$th column of $C$. Here it is more helpful to return to the definition of $\Sigma_j$ and observe that
$$\Sigma_2 - \Sigma_1 = b\, C^{-1} D^2 C^{-1} b^* = e_m^* D^2 e_m = \frac{1}{t_m K_n^\mu(z_m, z_m)},$$
$$\Sigma_2 - \Sigma_3 = b\, C^{-1} D^2 C^{-1} D^2 C^{-1} b^* = e_m^* C^{-1} e_m \left( \frac{1}{t_m K_n^\mu(z_m, z_m)} \right)^2.$$
From Cramer's rule and (44) we deduce that $e_m^* C^{-1} e_m$ has a limit different from $0$ as $n \to \infty$. Hence (53) holds. $\square$

Remark 5.5.

We have precise asymptotics with error terms in (53), but such error terms are missing in (52) as well as in (43) and (44).

Under the assumption of Theorem 5.1, a slightly more careful error analysis shows that the maximal error for $z \in F$ in (49), (50) and thus in (43) is bounded by a constant times
$$\max_{z \in F} \left| \frac{p_n^\mu(z) - g(z)\, p_{n+1}^\mu(z)}{\sqrt{|p_n^\mu(z)|^2 + |p_{n+1}^\mu(z)|^2}} \right|$$
plus an exponentially decreasing term. Since $|C_n^\mu(\cdot, \cdot)| \le 1$, the same is true for (44) by linearity of the determinant, and thus also for (52).
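The limit (52) can be checked numerically in the simplest setting. The following is a minimal sketch, assuming that $\mu$ is area measure on the unit disk, for which $p_j^\mu(z) = \sqrt{(j+1)/\pi}\, z^j$ and hence $g(z) = 1/z$. The ratio is evaluated through formula (37) rather than through the moment matrix. The function names `disk_basis`, `kernel_ratio` and `blaschke_limit`, as well as the sample points and weights, are illustrative choices and not taken from the paper.

```python
import numpy as np

def disk_basis(z, n):
    """Row vector (p_0(z), ..., p_n(z)) for area measure on the unit disk,
    assuming p_j(z) = sqrt((j + 1) / pi) * z**j."""
    j = np.arange(n + 1)
    return np.sqrt((j + 1) / np.pi) * np.power(complex(z), j)

def kernel_ratio(z, masses, weights, n):
    """Evaluate K_n^{mu+sigma}(z,z) / K_n^mu(z,z) as 1 - b (D^2 + C)^{-1} b^*."""
    rows = np.array([disk_basis(zj, n) for zj in masses])
    k_diag = np.sum(np.abs(rows) ** 2, axis=1)       # K_n^mu(z_j, z_j)
    V = rows / np.sqrt(k_diag)[:, None]              # rows v(z_j) / sqrt(K(z_j, z_j))
    u = disk_basis(z, n)
    u = u / np.linalg.norm(u)                        # v(z) / sqrt(K(z, z))
    C = V @ V.conj().T                               # cosine matrix C
    b = u @ V.conj().T                               # row vector b
    D2 = np.diag(1.0 / (np.asarray(weights, dtype=float) * k_diag))
    return float(np.real(1.0 - b @ np.linalg.solve(D2 + C, b.conj())))

def blaschke_limit(z, masses):
    """Right-hand side of (52) with g(z) = 1/z (unit disk assumption)."""
    g = lambda w: 1.0 / complex(w)
    factors = [(g(z) - g(zj)) / (1.0 - g(z) * np.conj(g(zj))) for zj in masses]
    return float(abs(np.prod(factors)) ** 2)

# Hypothetical data: two point masses outside the disk and one test point.
masses = [2.0, -2.0j]
weights = [0.1, 0.3]
z = 1.5j
ratio = kernel_ratio(z, masses, weights, n=300)
limit = blaschke_limit(z, masses)
print(ratio, limit)
```

For moderate $n$ the two printed numbers should agree up to an error of order $1/n$, in line with the error discussion of this remark; the weights $t_j$ enter only through the exponentially small matrix $D^2$.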

Remark 5.6. In the examples presented in the next section, $\operatorname{supp}(\mu)$ is compact with smooth boundary, $\Omega$ is the unbounded connected component of $\mathbb{C} \setminus \operatorname{supp}(\mu)$, which is supposed to be simply connected, and (42) holds with $g(z) = 1/\Phi(z)$, with $\Phi$ the Riemann conformal map from $\Omega$ onto the exterior of the closed unit disk. In particular, $g$ is injective and, with $z_1, ..., z_\ell$, also $g(z_1), ..., g(z_\ell)$ are distinct. Moreover, the limit in (52) is different from $0$ for $z \in \Omega$ different from a point mass.

Also, in this case, $\Omega$ is known to be regular with respect to the Dirichlet problem, and the measures $\mu$ of §… belong to the class Reg, which together with (7) allows us to specify the rate of geometric convergence mentioned in the preceding Remark.

6. Asymptotics in Bergman space

In this section we discuss a special and well known case of orthogonality in one complex variable where the asymptotic behavior of the Christoffel-Darboux kernels, both in (a small neighborhood of) the support of the measure of orthogonality and far enough from the support, is precisely known; Theorem 6.2 below contains precise ratio asymptotics, relevant for our study. The findings of the previous sections are then put to work, yielding the performance of the leverage score, see Corollary 6.5.

Throughout this section $G$ denotes a bounded open subset of the complex plane with simply connected complement, with boundary $\Gamma$; $\mu$ stands for the area measure on $\mathrm{Clos}(G) = G \cup \Gamma$. We denote by $\Phi$ the Riemann outer conformal map from $\mathbb{C} \setminus \mathrm{Clos}(G)$ onto $\mathbb{C} \setminus \mathrm{Clos}(\mathbb{D})$ fixing the point at infinity. As usual, for $r > 1$, the compact level sets are defined by their complement,
$$\mathbb{C} \setminus G_r := \{ z \in \mathbb{C} \setminus \mathrm{Clos}(G) : |\Phi(z)| > r \}.$$
Henceforth $c_j$ is used to denote absolute, strictly positive constants depending neither on $z$ nor on $n$.

Bergman orthogonal polynomials and their kernels have been used quite successfully as building blocks of conformal maps; the long history of their asymptotics is recorded in the monographs [16], [43]; the recent article [5] deals with the case of piecewise analytic boundaries. To provide some basis for comparison, we start with the simplest case of the unit disk.

Example 6.1.

To be more explicit, set $\mu = \mu_{\mathbb{D}}$, the area measure on the unit disk. Since $p_n^{\mu_{\mathbb{D}}}(w) = \sqrt{(n+1)/\pi}\; w^n$, explicit formulas for the Christoffel kernel are at hand; in particular,
$$\max_{w \in \operatorname{supp}(\mu_{\mathbb{D}})} K_n^{\mu_{\mathbb{D}}}(w, w) = \frac{(n+1)(n+2)}{2\pi}, \tag{54}$$
$$K_n^{\mu_{\mathbb{D}}}(w, w) = \frac{n+1}{\pi}\, \frac{|w|^{2n+2}}{|w|^2 - 1}\, \left(1 + O(1/n)_{n\to\infty}\right), \tag{55}$$
the second relation being true uniformly on compact subsets of $\mathbb{C} \setminus \operatorname{supp}(\mu_{\mathbb{D}})$.

The following theorem collects all relevant estimates and asymptotics.

Theorem 6.2.

Let $\mu$ be area measure on $G$, and suppose that $\Gamma$ is a Jordan curve which is either piecewise analytic without cusps, or otherwise possesses an arc length parametrization with a derivative being $1/2$-Hölder continuous.

(a) For all $n \ge 0$ and $z \in G$:
$$K_n^\mu(z, z) \le \frac{1}{\pi\, \operatorname{dist}(z, \Gamma)^2}.$$

(b)

With r ( n ) := (cid:113) n +1 the estimates e max z ∈ G r ( n ) K µn ( z, z ) ≤ max z ∈ supp( µ ) K µn ( z, z ) ≤ max z ∈ G r ( n ) K µn ( z, z ) ≤ γ n := c dist(Γ , ∂G r ( n ) ) . hold for all n ≥ . (c) We have K µn ( z, z ) = n + 1 π | Φ (cid:48) ( z ) | | Φ( z ) | − | Φ( z ) | n +2 (1 + O ( 1 n ) n →∞ ) uniformly on compact subsets of C (cid:114) supp( µ ) . (d) The asymptotics p µn ( z ) p µn +1 ( z ) = 1Φ( z ) (1 + O ((1 /n ) n →∞ ) is valid uniformly on compact subsets of C (cid:114) supp( µ ) .Proof. Part (a) is classical (being valid for any domain G ); it follows, e.g.,from [16, Lemma 1]. For a proof of part (b), we claim that the ﬁrst twoinequalities are simple consequences of the maximum principle and (5). In-deed, for any r ≥ z ∈ supp( µ ) K µn ( z, z ) ≤ max z ∈ G r K µn ( z ) ≤ max z ∈ G r max deg P ≤ n | P ( z ) | (cid:107) P (cid:107) ,µ . Using the maximum principle for P ( z ) on G r we ﬁnd the upper boundmax z ∈ G r K µn ( z ) ≤ max deg P ≤ n max z ∈ ∂G r | P ( z ) | (cid:107) P (cid:107) ,µ ≤ max z ∈ ∂G r K µn ( z, z ) , and thus the maximum of K µn ( z, z ) is attained at the boundary. Similarly,from the maximum principle for P ( z ) / Φ( z ) n on G (cid:114) supp( µ ) we infer thatmax z ∈ ∂G r ( n ) K µn ( z, z ) ≤ r ( n ) n max deg P ≤ n max z ∈ supp( µ ) | P ( z ) | (cid:107) P (cid:107) ,µ ≤ e max z ∈ supp( µ ) K µn ( z, z ) , showing that the ﬁrst two inequalities of part (b) are true. In order to showthe last inequality, we closely follow [5], and consider the row vectors P := ( ψ (cid:48) ( w ) p µj ( z )) j =0 ,...,n , Q := ( (cid:114) j + 1 π w j ) j =0 ,...,n ,F := ( ψ (cid:48) ( w ) F (cid:48) j +1 ( z ) (cid:112) π ( j + 1) ) j =0 ,...,n , where w = Φ( z ) is of modulus larger than 1, z = ψ ( w ), and F n is the n thFaber polynomial associated to G . 
Notice that (cid:107) P (cid:107) = K µn ( z, z ) / | Φ (cid:48) ( z ) | ,whereas (cid:107) Q (cid:107) = K µ D n ( w, w ) ≤ n + 1 π | w | n +2 | w | − n th Bergman Christoﬀel-Darboux kernel for the unit disk discussedin Example 6.1. From [5, Eqn. (1.5)] we know that the k the component of F − Q is given by the k th component of ( (cid:112) ( j + 1) /πw − j − ) j =0 , ,... C , with C the inﬁnite Grunsky matrix having norm strictly less than 1, and hence ∀| w | > (cid:107) F − Q (cid:107) ≤ π ( | w | − . From [5, Eqn. (2.1) and Corollary 2.3] we know that there exists an uppertriangular matrix R n with (cid:107) R n (cid:107) ≤ (cid:107) R − n (cid:107) ≤ / (cid:112) − (cid:107) C (cid:107) such that F = P R n . Thus ∀| w | > (cid:107) F (cid:107) ≤ (cid:107) P (cid:107) ≤ (cid:107) F (cid:107) − (cid:107) C (cid:107) . Combining these ﬁndings gives for z ∈ ∂G r ( n ) and hence | w | − / ( n + 1) K µn ( z, z ) = | Φ (cid:48) ( z ) | (cid:107) P (cid:107) ≤ | Φ (cid:48) ( z ) | ( (cid:107) Q (cid:107) + (cid:107) F − Q (cid:107) ) − (cid:107) C (cid:107) ≤ . − (cid:107) C (cid:107) | Φ (cid:48) ( z ) | ( | Φ( z ) | − . Recalling from [46, Theorem 3.1] that1 / z, Γ) ≤ | Φ (cid:48) ( z ) || Φ( z ) | − ≤ z, Γ) , we have established the last inequality of part (b), with c ≤ . −(cid:107) C (cid:107) de-pending only on the geometry of G .For a proof of parts (c) and (d), we recall from [5, Eqn. (1.12)] that ε n = (cid:107) Ce n (cid:107) = O (1 /n ) by our assumptions on Γ, and hence by [5, Theorem 1.1],uniformly for z in a compact subset of C (cid:114) supp( µ ), p µn ( z ) = Φ (cid:48) ( z ) √ n + 1 π Φ( z ) n (cid:18) O (cid:18) n + 1 (cid:19) n →∞ (cid:19) . (56)Then part (c) follows by taking squares in (56), summing, and ﬁnally apply-ing (55) with w = Φ( z ). Also, part (d) on ratio asymptotics is an immediateconsequence of (56). (cid:3) Remarks 6.3. (a)

We claim without proof that the estimates of the previ-ous statement and its proof could be improved if we are willing to add addi-tional smoothness assumptions on the boundary Γ. E.g., for a C boundaryΓ, (cid:107) F − Q (cid:107) / (cid:107) Q (cid:107) can be shown to tend to zero for n → ∞ uniformly for | Φ( z ) | − ≥ / ( n + 1), leading to lower and upper bounds for K µn ( z, z ) thatare sharp for n → ∞ up to a constant. HRISTOFFEL-DARBOUX KERNELS. I 37 (b)

For general Γ as in Theorem 6.2, the quantity γ n of Theorem 6.2(b)can be shown to behave like ( n + 1) in the case when Γ has no corner ofouter angle > π . Since this is true in particular for a C boundary, our The-orem 6.2(b) is in accordance with a recent result of Totik [45, Theorem 1.3]who showed under this additional assumption that the limitlim n →∞ n max z ∈ ∂ supp( µ ) K µn ( z, z ) is ﬁnite and (cid:54) = 0.In the case when there is a maximal outer angle = απ with α > γ n canbe shown to behave like ( n + 1) α , which we believe to be also the rate ofgrowth of max z ∈ ∂ supp( µ ) K µn ( z, z ). . (c) The O (1 /n ) error term in Theorem 6.2(c) can be improved for smootherboundaries Γ, but not in Theorem 6.2(d), see Example 6.1. (cid:3) In the next statement we improve a statement of Lasserre and Pauwels[22, Theorem 3.9 and Theorem 3.12]; compare with Remark 3.5. Here weallow for a ﬁnite number of ﬁxed outliers in the case of one complex variable.

Corollary 6.4. Let $\mu$ be as in Theorem 6.2, and
$$\nu = \mu + \sigma, \qquad \sigma = \sum_{j=1}^{\ell} t_j \delta_{z_j}, \qquad t_j > 0, \qquad z_1, ..., z_\ell \in \mathbb{C} \setminus \operatorname{supp}(\mu) \text{ distinct}.$$
Then the Hausdorff distance between $\operatorname{supp}(\nu)$ and the level set
$$S_n := \{ z \in \mathbb{C} : K_n^\nu(z, z) \le \gamma_n \}$$
with $\gamma_n$ as in Theorem 6.2(b) tends to zero as $n \to \infty$.

Proof. We first recall from Theorem 4.2 that $K_n^\nu(z, z) \le K_n^\mu(z, z)$, and hence $\operatorname{supp}(\mu) \subseteq S_n$ by Theorem 6.2(b). Also, setting $\nu_j := \nu - t_j \delta_{z_j}$, we know from Theorem 4.2 that
$$K_n^\nu(z_j, z_j) = \frac{1/t_j}{1 + \frac{1}{t_j K_n^{\nu_j}(z_j, z_j)}}.$$
The quantity $K_n^{\nu_j}(z_j, z_j)/K_n^\mu(z_j, z_j)$ has a non-zero limit according to Corollary 5.4 and Theorem 6.2(d). Also, Theorem 6.2(c) together with Remark 6.3(b) imply that $K_n^\mu(z_j, z_j)/\gamma_n$ grows geometrically large. Hence $K_n^\nu(z_j, z_j) \to 1/t_j$ with a geometric rate, implying that $\operatorname{supp}(\nu) \subseteq S_n$ for sufficiently large $n$.

For $z$ outside of a neighborhood $U$ of $\operatorname{supp}(\nu)$ we write
$$\frac{K_n^\nu(z, z)}{\gamma_n} = \frac{K_n^\nu(z, z)}{K_n^\mu(z, z)} \cdot \frac{K_n^\mu(z, z)}{\gamma_n}.$$
The first factor on the right-hand side has a non-zero limit according to Corollary 5.4 and Theorem 6.2(d) uniformly for $z \notin U$. Also, again by Theorem 6.2(c) and Remark 6.3(b), $K_n^\mu(z, z)/\gamma_n$ grows at least as a constant times $|\Phi(z)|^{2n}/n^\beta$ uniformly for $z \notin U$ for some constant $\beta > 0$, and we conclude that $S_n \subseteq U$ for sufficiently large $n$, showing the convergence of the Hausdorff distance. $\square$

We are now prepared to describe a situation where the efficiency of our leverage score for detecting outliers can be tested.

Corollary 6.5.

Let µ, ν, σ be as in Corollary 6.4, and n be suﬃciently large.Furthermore, consider the discrete measures (cid:101) ν = (cid:101) µ + σ, (cid:101) µ = N (cid:88) j = (cid:96) +1 t j δ z j , t j > , z (cid:96) +1 , ..., z N ∈ supp( µ ) distinct,where we assume that (cid:101) µ is suﬃciently close to µ such that (cid:107) M n ( (cid:101) µ, µ ) − I (cid:107) ≤ / . Then there exist c > and q ∈ (0 , depending only on µ, σ but noton (cid:101) ν such that, for the outliers, j = 1 , , ..., (cid:96) : 1 − t j K (cid:101) νn ( z j , z j ) ≤ cq n , whereas for the other mass points of (cid:101) νj = (cid:96) + 1 , ..., N : t j K (cid:101) νn ( z j , z j ) ≤

32 min (cid:110) t j γ n , t j π dist( z j , Γ) (cid:111) , where γ n is as in Theorem 6.2(b).Proof. We ﬁrst consider the case j ∈ { , ..., (cid:96) } of outliers. Let ν j = ν − t j δ z j , (cid:101) ν j = (cid:101) ν − t j δ z j . According to Remark 3.8 with (cid:15) = 1 / K (cid:101) ν j n ( z j , z j ) ≥ K ν j n ( z j , z j ) /

2, and hence1 − t j K (cid:101) νn ( z j , z j ) = 11 + t j K (cid:101) ν j n ( z j , z j ) ≤ t j K ν j n ( z j , z j )= 2 t j K µn ( z j , z j ) K ν j n ( z j , z j ) 1 K µn ( z j , z j ) , and we conclude as in the previous proof using Theorem 6.2(c),(d) andCorollary 5.4 that our ﬁrst assertion holds.Now we consider the case j ∈ { (cid:96) + 1 , ..., N } of mass points z j ∈ supp( µ ).Applying ﬁrst Theorem 4.2 and then Proposition 3.7 with (cid:15) = 1 /

2, we arriveat t j K (cid:101) νn ( z j , z j ) ≤ t j K (cid:101) µn ( z j , z j ) ≤ t j K µn ( z j , z j ) . Thus our second assertion is a consequence of Theorem 6.2(a),(b). (cid:3)
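The behavior described in Corollary 6.5 can be illustrated on a small synthetic cloud. The following is a minimal sketch, not the Arnoldi/GMRES procedure used for the simulations reported in this section: it assumes a discrete measure with distinct points and obtains the leverage scores $t_j K_n^{\widetilde{\nu}}(z_j, z_j)$ as squared row norms of the $Q$ factor of a weighted Vandermonde matrix in the monomial basis, which is adequate only for small degrees. The name `leverage_scores` and the test cloud (five concentric circles plus a single outlier) are hypothetical choices.

```python
import numpy as np

def leverage_scores(points, weights, n):
    """Leverage scores t_j * K_n(z_j, z_j) of the discrete measure sum_j t_j delta_{z_j}.

    Sketch under the assumption that the weighted Vandermonde matrix has full
    column rank; the columns of Q then hold sqrt(t_j) * p_k(z_j) for the
    orthonormal polynomials p_0, ..., p_n of the discrete measure.
    """
    z = np.asarray(points, dtype=complex)
    t = np.asarray(weights, dtype=float)
    V = np.sqrt(t)[:, None] * np.vander(z, n + 1, increasing=True)
    Q, _ = np.linalg.qr(V)                  # reduced QR, Q has n + 1 orthonormal columns
    return np.sum(np.abs(Q) ** 2, axis=1)   # squared row norms

# Hypothetical cloud: 200 points in the closed unit disk and one outlier at 2 + i.
radii = np.linspace(0.2, 1.0, 5)
angles = 2 * np.pi * np.arange(40) / 40
cloud = (radii[:, None] * np.exp(1j * angles)[None, :]).ravel()
points = np.append(cloud, 2.0 + 1.0j)
weights = np.full(points.size, 1.0 / points.size)

scores = leverage_scores(points, weights, n=10)
print("outlier score:", scores[-1])
print("largest score inside the disk:", scores[:-1].max())
```

Each score lies in $[0, 1]$ and the scores sum to $n + 1$, so only a few points can have a score close to 1; in this example that point is the outlier.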

By comparing the upper left entry, our assumption (cid:107) M n ( (cid:101) µ, µ ) − I (cid:107) ≤ / (cid:101) µ ( C ) µ ( C ) = 1 µ ( C ) N (cid:88) j = (cid:96) +1 t j ∈ [1 / , / , and thus a typical weight of (cid:101) µ is of order 1 /N . Hence, roughly speaking,our leverage score t j K (cid:101) νn ( z j , z j ) is very close to 1 for the case j ∈ { , ..., (cid:96) } ofoutliers, and at least bounded for z j lying in the interior of supp( µ ), and moreprecisely, having a weight of typical size and z j being at least of distance HRISTOFFEL-DARBOUX KERNELS. I 39 / √ N to the boundary Γ. Moreover, for equal weights t (cid:96) +1 = · · · = t N (andthus of order 1 /N ), as long as (cid:96) is ﬁxed, n is suﬃciently large, and N ≥ γ n (57)Corollary 6.5 tells us that we are able to identify correctly the outliers asthose elements z j of the support where our leverage score t j K (cid:101) νn ( z j , z j ) isclose to 1.In order to illustrate this claim, we conclude this section by reportingsome numerical simulations. Here we discuss for diﬀerent clouds of distinctpoints z , ..., z N ∈ C , and equal weights t = · · · = t N = 1 /N , with thecorresponding discrete measure ν . The points z j of the clouds are drawnfollowing a color code given by our leverage score t j K µn ( z j , z j ), with red cor-responding to values close to 0, and blue values close to 1. For each cloud,we draw in the upper row (without theoretical justiﬁcation) the level scoresfor bivariate orthogonal polynomials with lexicographical ordering for totaldegrees 1 , , n = 2 , ,

44) and in the lower row for univariateBergman orthogonal polynomials for n = 1 , ,

44. The ﬁrst column essen-tially corresponds to the classical leverage score known from data analysisand mentioned in Section 2.4.The vector of values of the leverage score has been computed in ﬁniteprecision arithmetic as follows: in the univariate case, we ﬁrst compute bythe full Arnoldi method in complexity O ( n N ) the Hessenberg matrix al-lowing to represent zv (cid:101) νn ( z ) in terms of v (cid:101) νn +1 ( z ), and then use a link betweenvalues of the Christoﬀel-Darboux kernel and GMRES for the shifted Hes-senberg matrix, with a total complexity of O ( n N ). A generalization of thisapproach has been used for bivariate orthogonal polynomials. There existmore eﬃcient approaches, but many of them suﬀer from loss of orthogonality.For each of our simulations we give an indicator showing that our approachdoes not have this drawback. Also, and this is probably the most importantmessage for large data sets, the complexity and memory requirements scalelinearly with N .In the numerical experiments displayed in Figures 1, 2, and 3, we observethat the classical leverage scores in the ﬁrst column do not detect any of ouroutliers, probably due to the small weights t j = 1 /N . In the experiences ofthe other two columns, we clearly detect outliers, but the color separation ismore pronounced for one complex variable, in particular for outliers whichare closer to G . In the right column for n = 44, the color code for theoutliers is best, but in fact also corners of the domain and some other pointsof the support of (cid:101) ν have an outlier color code. This seems to be partly aconsequence of the sampling procedure, but clearly also a consequence ofthe fact that we do not respect (57) since N is not large enough.Similar sharp estimates are known for measures supported on Jordancurves, or arcs in the complex plane, but we do not detail them here. Figure 1.

A cloud of $N = 576$ points, with 7 random outliers, the others obtained by discretizing normalized Lebesgue measure on the unit square by a regular grid.

Figure 2. A cloud of $N = 600$ points, with $\ell = 7$ random outliers. The other points are random samplings of the unit disk.

Figure 3. A first cloud of $N = 600$ points, with $\ell = 7$ random outliers, and a second one with $N = 900$ and 15 outliers. The other points are random samplings.

7. Examples in the multivariate case

The present section collects a series of mostly known facts of pluripotentialtheory which are relevant for our study. In § g Ω ( z ). In § § Examples of plurisubharmonic Green functions with pole atinﬁnity.

With respect to, and in support of, Lemma 2.9, we are fortunate to have closed form expressions for the plurisubharmonic Green function $g_\Omega(z)$ with pole at infinity for a few classes of compact subsets $S = \mathbb{C}^d \setminus \Omega$ of $\mathbb{C}^d$. For all these examples, $S$ is non-pluripolar, $L$-regular and polynomially convex, and hence the explicit formulas for $g_\Omega(z)$ are obtained via computing Siciak's extremal function, see [23]. Notice also the affine invariance not only of the Christoffel-Darboux kernel (11), but also of the plurisubharmonic Green function [23, §…].

7.1.1. Complex ball.

Consider for instance a complex norm $[\,\cdot\,]$ in $\mathbb{C}^d$ and the closed ball
$$\mathcal{B}_a(r) = \{ z \in \mathbb{C}^d ;\ [z - a] \le r \}.$$
By a complex norm we mean a norm with the homogeneity property
$$[\lambda z] = |\lambda|\, [z], \qquad \lambda \in \mathbb{C},\ z \in \mathbb{C}^d.$$
Let $\Omega$ denote the complement of $\mathcal{B}_a(r)$ in $\mathbb{C}^d$. Then the set $\mathcal{B}_a(r)$ turns out to be regular and
$$g_\Omega(z) = \log^+ \frac{[z - a]}{r}. \tag{58}$$
For a proof, see Example 5.1.1 in [23]. In particular, for the unit ball $\mathcal{B}_d$ with respect to the standard Hermitian norm $\|\cdot\|$ we obtain
$$g_{\mathbb{C}^d \setminus \mathcal{B}_d}(z) = \log^+ \|z\|.$$

7.1.2. Polynomial polyhedra.

Let $p_1, p_2, \ldots, p_d$ be complex polynomials in $d$ variables, with highest homogeneous parts $\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_d$ respectively. Assume that
$$\sum_{j=1}^{d} |\hat{p}_j(z)| = 0 \quad \text{iff} \quad z = 0;$$
that is, the map $(p_1, p_2, \ldots, p_d) : \mathbb{C}^d \longrightarrow \mathbb{C}^d$ has finite fibres. Consider the analytic polyhedron
$$K = \{ z \in \mathbb{C}^d ;\ |p_j(z)| \le 1,\ 1 \le j \le d \},$$
and its complement $\Omega = \mathbb{C}^d \setminus K$. Then (see Corollary 5.3.2 in [23])
$$g_\Omega(z) = \max_{1 \le j \le d} \frac{\log^+ |p_j(z)|}{\deg p_j}. \tag{59}$$
In particular, for the polydisk $\mathcal{D}^d = \{ z \in \mathbb{C}^d ;\ |z_j| \le 1,\ 1 \le j \le d \}$ we get
$$g_{\mathbb{C}^d \setminus \mathcal{D}^d}(z) = \max_{1 \le j \le d} \log^+ |z_j|. \tag{60}$$

7.1.3. Product domains.

A theorem due to Siciak provides the naturality of the Green function on product sets. More specifically, let $K \subseteq \mathbb{C}^d$ and $L \subseteq \mathbb{C}^e$ be compact subsets. Then
$$g_{\mathbb{C}^{d+e} \setminus (K \times L)}(z, w) = \max\left( g_{\mathbb{C}^d \setminus K}(z),\ g_{\mathbb{C}^e \setminus L}(w) \right), \qquad z \in \mathbb{C}^d,\ w \in \mathbb{C}^e. \tag{61}$$
For details, see Theorem 5.1.8 in [23].

7.1.4. Real subsets of complex space.

Although a great many examples live in $\mathbb{R}^d$, it is necessary to treat them as subsets of $\mathbb{C}^d$. Almost all known closed form formulas are obtained by pull-back from the Green function with pole at infinity of the real interval $[-1, 1]$:
$$g_{\mathbb{C} \setminus [-1,1]}(z) = \log \left| z + \sqrt{z^2 - 1} \right|, \qquad z \in \mathbb{C} \setminus [-1, 1],$$
and $g_{\mathbb{C} \setminus [-1,1]}(z) = 0$ for $z \in [-1, 1]$, where the branch of the square root is chosen such that $g_{\mathbb{C} \setminus [-1,1]} \ge 0$.

Consider a compact set $E \subseteq \mathbb{R}^d$ that is convex, has non-empty interior and is symmetric with respect to the origin: $x \in E$ implies $-x \in E$. Following Lundin [27] one can represent $E$ as follows
$$E = \{ z \in \mathbb{C}^d : \forall \omega \in S^{d-1},\ a(\omega)\, \omega^* z \in [-1, 1] \},$$
where $S^{d-1}$ denotes the unit sphere in $\mathbb{R}^d$, and $a$ is continuous on $S^{d-1}$. We can choose $a(\omega)$ to be equal to the inverse of half the width of $E$ in the direction $\omega$. If the boundary of $E$ is smooth, then there is no ambiguity in defining $a(\omega)$. The main result of [27] then states that
$$g_{\mathbb{C}^d \setminus E}(z) = \max_{\omega \in S^{d-1}} g_{\mathbb{C} \setminus [-1,1]}\left( a(\omega)\, \omega^* z \right).$$
A few particular cases are relevant and we list them under separate subsections.
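The Green function of the interval is easy to evaluate once the branch of the square root is fixed. The following is a minimal sketch, assuming the branch is selected by taking the larger of $|z + \sqrt{z^2-1}|$ and $|z - \sqrt{z^2-1}|$ (their product equals 1, so the larger one is at least 1), and assuming the product formula (61) applied to the cube $[-1,1]^d$ viewed as a product of intervals. The names `green_interval` and `green_cube` are hypothetical.

```python
import numpy as np

def green_interval(z):
    """Green function with pole at infinity of C minus [-1, 1].

    The two values z + s and z - s, with s a square root of z^2 - 1,
    multiply to 1; choosing the one of larger modulus gives g >= 0.
    """
    z = complex(z)
    s = np.sqrt(z * z - 1.0)
    return float(np.log(max(abs(z + s), abs(z - s))))

def green_cube(point):
    """Green function of the complement of [-1, 1]^d via the product formula (61)."""
    return max(green_interval(zj) for zj in point)

# With z = (u + 1/u) / 2 and |u| >= 1 one expects g(z) = log|u|.
print(green_interval(1.25), np.log(2.0))   # u = 2
print(green_interval(0.3))                 # a point of [-1, 1]
print(green_cube((1.25, 0.3)))
```

The check at $z = (u + 1/u)/2$ reflects the Joukowski parametrization of the exterior of the interval; on $[-1, 1]$ itself the function returns zero up to rounding.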

7.1.5. Real ball. On the unit ball $B^d \subseteq \mathbb{R}^d$ we remark $a(\omega) = 1$ for all values of $\omega \in S^{d-1}$ and therefore
$$g_{\mathbb{C}^d \setminus B^d}(z) = \frac{1}{2}\, g_{\mathbb{C} \setminus [-1,1]}\left( \|z\|^2 + |z \cdot z - 1| \right), \qquad z \notin B^d. \tag{62}$$
This result is due to Siciak, see Theorem 5.4.6 in [23]. In particular, for a real vector $x \in \mathbb{R}^d$ we infer:
$$g_{\mathbb{C}^d \setminus B^d}(x) = g_{\mathbb{C} \setminus [-1,1]}(\|x\|) = \log\left( \|x\| + \sqrt{\|x\|^2 - 1} \right) = \frac{1}{2}\, g_{\mathbb{C} \setminus [-1,1]}\left( 2\|x\|^2 - 1 \right).$$

7.1.6. Real cube.

According to the product formula (61) we find
$$g_{\mathbb{C}^d \setminus [-1,1]^d}(z) = \max_{1 \le j \le d} g_{\mathbb{C} \setminus [-1,1]}(z_j). \tag{63}$$

7.1.7. Simplex.

Denote by $\Delta_d$ the standard simplex in $\mathbb{R}^d$; that is, the convex hull of the vectors $(0, e_1, \cdots, e_d)$, where $(e_j)$ is the standard orthonormal basis. A base change of Lundin's formula via the map $(z_1, z_2, \cdots, z_d) \mapsto (z_1^2, z_2^2, \cdots, z_d^2)$ leads to the following closed form expression, discovered by Baran:
$$g_{\mathbb{C}^d \setminus \Delta_d}(z) = g_{\mathbb{C} \setminus [-1,1]}\left( |z_1| + |z_2| + \cdots + |z_d| + |z_1 + \cdots + z_d - 1| \right). \tag{64}$$
For details, see Example 5.4.7 in [23].

7.2. The real square with tensor product of Chebyshev weights.

Let us consider for $x = (x_1, x_2)^T \in [-1,1]^2$ the tensor measure
$$d\mu(x) = d\omega(x_1)\, d\omega(x_2), \qquad d\omega(x) = \frac{dx}{\pi\sqrt{1-x^2}}.$$
The orthogonal polynomials for the equilibrium measure $\omega$ on $[-1,1]$ are explicitly known; namely, $p^\omega_0(z) = 1$ and $p^\omega_n(z) = \sqrt{2}\, T_n(z)$ for $n \geq 1$, where $T_n$ are Chebyshev polynomials of the first kind. In particular, if $\alpha(n) = (j,k)$ then $p^\mu_n(x) = p^\omega_j(x_1)\, p^\omega_k(x_2)$, making it possible to get more explicit information about the underlying Christoffel-Darboux kernel.

Theorem 7.1.

Consider the enumeration of the multi-indices consistent with the tensor structure; that is, for all integers $n \geq 0$ with $N = n_{\max}(n) = (n+1)^2 - 1$,
$$\{\alpha(0), \dots, \alpha(N)\} = \left\{ \begin{bmatrix} j \\ \ell \end{bmatrix} : 0 \leq j, \ell \leq n \right\},$$
and set
$$g(z) = g_{\mathbb{C}\setminus[-1,1]}(z_1) + g_{\mathbb{C}\setminus[-1,1]}(z_2). \tag{65}$$

(a) For $z' = (1,1)^T$ and $N = (n+1)^2 - 1$,
$$\max_{z \in \mathrm{supp}(\mu)} K^\mu_N(z,z) = K^\mu_N(z',z') = (2n+1)^2 \leq 4N + 1.$$

(b) For all $z \in \mathbb{R}^2 \setminus \mathrm{supp}(\mu)$ and $N = (n+1)^2 - 1 \to \infty$,
$$K^\mu_N(z,z) \sim \begin{cases} e^{2n g(z)} & \text{in the "corner case" } |z_1| > 1 \text{ and } |z_2| > 1, \\ (n+1)\, e^{2n g(z)} & \text{else,} \end{cases}$$
where $\sim$ means that the quotient has a finite and non-zero limit for $n \to \infty$.

(c) For all (generic) $z, w \in \mathbb{R}^2 \setminus \mathrm{supp}(\mu)$ with $z_1 \neq w_1$ and $z_2 \neq w_2$, and $N = (n+1)^2 - 1 \to \infty$, $n$ even, we have $C^\mu_N(z,w)$ tending to some finite non-zero limit in the corner case for both $z, w$, and else tending to $0$.

Proof. We first mention that, for $N = (n+1)^2 - 1$,
$$K^\mu_N(z,w) = K^\omega_n(z_1,w_1)\, K^\omega_n(z_2,w_2), \qquad C^\mu_N(z,w) = C^\omega_n(z_1,w_1)\, C^\omega_n(z_2,w_2),$$
and hence for the assertion of the Theorem it is sufficient to consider the univariate case. For a proof of part (a), it is sufficient to recall that, for $z_1 \in [-1,1]$, $T_k(z_1)^2 \leq T_k(1)^2$, and hence $K^\omega_n(z_1,z_1) \leq K^\omega_n(1,1) = 2n+1$.

For a proof of parts (b) and (c), we write
$$z_j = \frac{1}{2}\Big(u_j + \frac{1}{u_j}\Big), \qquad w_j = \frac{1}{2}\Big(v_j + \frac{1}{v_j}\Big), \qquad |u_j| \geq 1, \quad |v_j| \geq 1,$$
and $\log|u_1| = g_{\mathbb{C}\setminus[-1,1]}(z_1)$. Then by using geometric sums it is quite easy to check that
$$K^\omega_n(z_1,z_1) \sim \begin{cases} \dfrac{|u_1|^{2n}}{1 - 1/|u_1|^2} & \text{if } z_1 \notin [-1,1],\ |u_1| > 1, \\ (n+1) & \text{if } z_1 = \cos(t) \in [-1,1],\ u_1 = e^{it}, \end{cases}$$
implying part (b).

For a proof of part (c), we recall that, for distinct real $w_1, z_1 \notin [-1,1]$ with $|z_1| > 1$, $|w_1| > 1$, we have $p^\omega_n(z_1)/p^\omega_{n+1}(z_1) \to 1/u_1$ for $n \to \infty$, and hence by Theorem 5.1
$$\lim_{n\to\infty} C^\omega_n(z_1,w_1) \Big(\frac{z_1 w_1}{|z_1 w_1|}\Big)^n = \frac{\sqrt{(u_1^2-1)(v_1^2-1)}}{u_1 v_1 - 1},$$
which simplifies for even $n$. If however $z_1 \in [-1,1]$, then $|K^\omega_n(z_1,w_1)| / \sqrt{K^\omega_n(w_1,w_1)}$ is shown to be bounded in both cases $w_1 \notin [-1,1]$ and $w_1 \in [-1,1] \setminus \{z_1\}$, and hence $C^\omega_n(z_1,w_1)$ tends to $0$ by part (b). $\square$

If $z \in \mathbb{R}^2 \setminus \mathrm{supp}(\mu)$ and the mass points of $\sigma$ have distinct coordinates, we conclude as in the proof of Corollary 5.4 that $K^{\mu+\sigma}_N(z,z)/K^\mu_N(z,z)$ has a limit for $N = (n+1)^2 - 1 \to \infty$, which is different from one only if the corner case holds for $z$ and at least one mass point. Hence, as in Corollaries 6.4 and 6.5, we verify that our bivariate leverage score works in this setting.

In order to compare the preceding theorem with Lemma 2.9, we notice that, in the case of graded lexicographical ordering and $N = n_{\mathrm{tot}}(n) = (n+1)(n+2)/2 - 1$, we still have that $\max_{z \in \mathrm{supp}(\mu)} K^\mu_N(z,z) \leq 4N + 1$. Hence, with the plurisubharmonic Green function of $[-1,1]^2$,
$$\widetilde{g}(z) = \max\{ g_{\mathbb{C}\setminus[-1,1]}(z_1),\ g_{\mathbb{C}\setminus[-1,1]}(z_2) \}, \tag{66}$$
one may show that
$$\frac{1}{2}\, e^{2n\widetilde{g}(z)} \leq K^\mu_N(z,z) \leq e^{2n\widetilde{g}(z)} \max_{\deg P \leq N}\ \max_{x \in \mathrm{supp}(\mu)} \frac{|P(x)|^2}{\|P\|^2_{2,\mu}} \leq (4N+1)\, e^{2n\widetilde{g}(z)},$$
the left-hand side being obtained by (14) for $p(z_1,z_2) \in \{p^\omega_n(z_1), p^\omega_n(z_2)\}$. All these findings are less precise than for the case of tensor ordering, and it seems to be non-trivial to get cosine asymptotics.

Figure 4. Comparison of level lines of Green functions for the unit square and three different families of orthogonal polynomials: for the left-hand image we utilized the plurisubharmonic Green function (66) of $[-1,1]^2 \subseteq \mathbb{R}^2$, which corresponds to bivariate OP with graded lexicographical ordering (i.e., total degree); the middle image is generated by using the tensor Green function (65), corresponding to bivariate OP with partial degree; and for the right-most image, the complex Green function corresponding to Bergman OP of one complex variable is used.

In Figure 4 we compare three approaches to detect outliers outside the unit square $[-1,1]^2$, and in particular address the question whether one should prefer an analysis in $\mathbb{R}^2$ or in the complex plane $\mathbb{C}$. Though not fully justified for the case of two real variables, we expect to be able to successfully detect outliers at $z \notin [-1,1]^2$ for a parameter $N$ provided that $K^\mu_N(z,z)$ is large. The left-hand plot corresponds to graded lexicographical ordering (total degree) discussed in the previous paragraph, where we draw some level lines of $\exp(2n\widetilde{g}(z))$ with the plurisubharmonic Green function (66) for $z \in \mathbb{R}^2 \setminus [-1,1]^2$ instead of those of $K^\mu_N(z,z)$, $N+1 = (n+1)(n+2)/2$. In the middle plot, corresponding to the tensor case (partial degree) discussed in Theorem 7.1, we draw some level lines of $\exp(2n g(z))$ with the tensor Green function (65) for $z \in \mathbb{R}^2 \setminus [-1,1]^2$ instead of those of $K^\mu_N(z,z)$, $N+1 = (n+1)^2$. Finally, on the right, corresponding to the case of one complex variable $z$ discussed earlier, we draw some level lines of $\exp(2n\, g_{\mathbb{C}\setminus[-1,1]^2}(z))$ for $z \in \mathbb{C} \setminus [-1,1]^2$ instead of those of $K^\mu_N(z,z)$, $N+1 = n+1$; compare with Theorem 6.2(c). Notice that there is no easy closed form expression for this complex Green function $g_{\mathbb{C}\setminus[-1,1]^2}$; we have used the Schwarz-Christoffel toolbox for Matlab of Toby Driscoll. In order to make the comparison fair, we should use about the same number $N+1$ of orthogonal polynomials: we have chosen from left to right $n = 13$, $n = 9$ and $n = 100$, corresponding to $N+1 = 105$, $N+1 = 100$ and $N+1 = 101$, respectively.

Some observations are especially noteworthy. We start by comparing the two bivariate kernels on the left-hand side and in the middle: from the behavior of the level lines at the corners of $[-1,1]^2$ it is apparent that one should prefer partial degree to total degree for detecting outliers with all components outside $[-1,1]$ (called the corner case in Theorem 7.1). Comparing the values of the corresponding level lines, the opposite conclusion seems to be true for other outliers. However, much more strikingly, the parameters of the level lines on the right-hand side increase much faster, and here too the level curves behave well around the corners. This confirms our claim that outlier detection of $(2d)$-dimensional data should be done in $\mathbb{C}^d$ and not in $\mathbb{R}^{2d}$, at least for $d = 1$.

7.3. Christoffel-Darboux kernel associated to Reinhardt domains in $\mathbb{C}^d$.

Natural domains of convergence for power series in several complex variables are quite diverse, and in general they are not bi-holomorphically equivalent. These are known as (logarithmically convex) Reinhardt domains and already offer a vast area of research for function theory and complex geometry. The recent monograph [20] gives a comprehensive account of the theory of Reinhardt domains. For our immediate purpose, this class of domains $D$ is important because the monomials are orthogonal (but not orthonormal) with respect to Lebesgue measure $\mu_D$ (and thus $k^{\mu_D}_n = \deg p^{\mu_D}_n = n$ in Lemma 2.3); moreover, in most cases the monomials are complete in the associated Bergman space.

By way of illustration we consider the two most important, and nonequivalent, Reinhardt domains: the ball and the polydisk. In the sequel of this subsection we will work in $\mathbb{C}^d$ for some fixed $d \geq 1$, and enumerate the multivariate monomials $z^\alpha$ in graded lexicographical ordering. Let $N = N(n)$ be the number of monomials of total degree $\leq n$.

7.3.1. The complex ball.

We first consider the Lebesgue (or volume) measure $\mu_B$ on the unit ball
$$B = B_d = \{ z \in \mathbb{C}^d ;\ \|z\| < 1 \},$$
having as boundary the odd dimensional sphere $S^{2d-1}$. Here with the help of polar coordinates one gets
$$\int z^\alpha\, \overline{z}^\beta\, d\mu_B(z) = \pi^d\, \frac{\alpha!}{(|\alpha|+d)!}$$
for $\alpha = \beta$, and zero else. Thus, up to normalization, the orthonormal polynomials are monomials, and we get for the Christoffel-Darboux kernel using the multinomial formula
$$K^{\mu_B}_N(z,w) = \frac{1}{\pi^d} \sum_{k=0}^{n} \sum_{|\alpha|=k} \frac{(|\alpha|+d)!}{\alpha!}\, z^\alpha\, \overline{w}^\alpha = \frac{1}{\pi^d} \sum_{k=0}^{n} (k+1)_d\, (w^*z)^k,$$
where we use the abbreviation $w^*z = \overline{w_1} z_1 + \dots + \overline{w_d} z_d$, together with the Pochhammer symbol $(a)_k = a(a+1)\cdots(a+k-1)$ and $N = n_{\mathrm{tot}}(n)$. We already know from Siciak, see formula (58), that
$$\lim_{n\to\infty} K^{\mu_B}_N(z,z)^{1/n} = \|z\|^2, \qquad \|z\| > 1.$$
The explicit form of the polarized kernel allows a closer look at the asymptotic behavior for large degree.

Provided that $|w^*z| < 1$ (and in particular for $w, z \in B$), the limit for $N \to \infty$ and thus for $n \to \infty$ exists, and is given by the Bergman reproducing kernel
$$K^{\mu_B}(z,w) = \frac{1}{\pi^d} \sum_{k=0}^{\infty} (k+1)_d\, (w^*z)^k = \frac{d!}{\pi^d}\, (1 - w^*z)^{-d-1}.$$
Note that the unbounded region defined in $\mathbb{C}^d \times \mathbb{C}^d$ by the inequality $|w^*z| < 1$ strictly contains $B \times B$. In the case $w^*z = 1$ we obtain
$$K^{\mu_B}_N(z,w) = \frac{d!}{\pi^d} \sum_{k=0}^{n} \binom{k+d}{d} = \frac{d!}{\pi^d} \binom{n+1+d}{d+1} = \frac{(n+1)_{d+1}}{\pi^d\,(d+1)},$$
which is also an upper bound for $|K^{\mu_B}_N(z,w)|$ in the case $|w^*z| = 1$. In particular we learn that $K^{\mu_B}_N(z,z)$ grows at most like $n^{d+1} = O(N^{1+1/d})$ for $z$ in the support of $\mu_B$. Finally, in the case $|w^*z| > 1$ we have
$$K^{\mu_B}_N(z,w) = \frac{(n+1)_d}{\pi^d}\, (w^*z)^n \sum_{j=0}^{n} \frac{(n-j+1)_d}{(n+1)_d}\, (w^*z)^{-j} \sim \frac{(n+1)_d}{\pi^d}\, \frac{(w^*z)^{n+1}}{w^*z - 1},$$
since, for $0 \leq j \leq n$,
$$0 \leq 1 - \frac{(n-j+1)_d}{(n+1)_d} = \sum_{\ell=0}^{j-1} \frac{(n-\ell+1)_d - (n-\ell)_d}{(n+1)_d} \leq \frac{jd}{n+d}.$$
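The three regimes of $w^*z$ discussed above can be checked numerically from the single-variable sum $\pi^{-d}\sum_{k=0}^{n}(k+1)_d\,x^k$ with $x = w^*z$. The following is a minimal sketch, not taken from the paper; the function names `poch`, `ball_kernel`, `bergman_ball` and `exterior_asymptotic`, as well as the sample values of $d$, $n$ and $x$, are illustrative assumptions.

```python
from math import factorial, pi

def poch(a, k):
    """Pochhammer symbol (a)_k = a (a+1) ... (a+k-1)."""
    out = 1
    for i in range(k):
        out *= a + i
    return out

def ball_kernel(x, n, d):
    """Truncated kernel as a function of x = w^* z (degree up to n, ball in C^d)."""
    return sum(poch(k + 1, d) * x**k for k in range(n + 1)) / pi**d

def bergman_ball(x, d):
    """Limit for |x| < 1: d! / (pi^d (1 - x)^(d+1))."""
    return factorial(d) / (pi**d * (1 - x) ** (d + 1))

def exterior_asymptotic(x, n, d):
    """Leading term for |x| > 1: (n+1)_d x^(n+1) / (pi^d (x - 1))."""
    return poch(n + 1, d) * x ** (n + 1) / (pi**d * (x - 1))

d = 2
# |x| < 1: the partial sums approach the Bergman kernel.
inside_gap = abs(ball_kernel(0.3 + 0.2j, 400, d) - bergman_ball(0.3 + 0.2j, d))
# x = 1: the sum has the closed form (n+1)_{d+1} / (pi^d (d+1)).
boundary_gap = abs(ball_kernel(1.0, 10, d) - poch(11, d + 1) / (pi**d * (d + 1)))
# |x| > 1: the ratio to the leading term tends to 1 as n grows.
exterior_ratio = ball_kernel(1.5, 400, d) / exterior_asymptotic(1.5, 400, d)
```

Under these assumptions `inside_gap` and `boundary_gap` should be at rounding level, and `exterior_ratio` should differ from one by a term of order $d/(n+d)$, in line with the bound $jd/(n+d)$ stated above.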

If we exclude the (unlikely for outlier analysis) case that $z$ is a complex multiple of $w$, we conclude that, for $z, w \notin \mathrm{supp}(\mu_B)$,
$$K^{\mu_B}_N(z,z) \sim \frac{(n+1)_d}{\pi^d}\, \frac{\|z\|^{2n+2}}{\|z\|^2 - 1}, \qquad \lim_{N\to\infty} C_N(z,w) = 0. \tag{67}$$

7.3.2. The polydisk.

In complete analogy, consider the Lebesgue measure $\mu_P$ on the polydisk $P = \mathbb{D}^d$. Here again multivariate monomials are orthogonal, with the normalization constant
$$\|z^\alpha\|^2_{2,\mu_P} = \frac{\pi^d}{\prod_{j=1}^{d} (\alpha_j + 1)},$$
and hence
$$K^{\mu_P}_N(z,w) = \frac{1}{\pi^d} \sum_{k=0}^{n} \sum_{|\alpha|=k} z^\alpha\, \overline{w}^\alpha \prod_{j=1}^{d} (\alpha_j + 1).$$
It is not difficult to check that, provided that $|\overline{w_j} z_j| < 1$ for all $j$,
$$\lim_{N\to\infty} K^{\mu_P}_N(z,w) = \frac{1}{\pi^d} \prod_{j=1}^{d} \frac{1}{(1 - z_j \overline{w_j})^2},$$
which is the well known Bergman space reproducing kernel for $P = \mathbb{D}^d$. Also, the reader may check that, again, $K^{\mu_P}_N(z,z)$ grows at most polynomially in $N$ for $z \in \mathrm{supp}(\mu_{\mathbb{D}^d})$. As for the exterior behavior, again the extremal plurisubharmonic function approach gives
$$\lim_{n\to\infty} K^{\mu_P}_N(z,z)^{1/n} = \max_{1 \leq j \leq d} |z_j|^2, \qquad z \notin P,$$
cf. formula (61).

Since the further analysis is a bit involved, we will restrict ourselves to the special case $d = 2$ and $z, w \notin P$; that is, $\max(|z_1|,|z_2|) > 1$ and $\max(|w_1|,|w_2|) > 1$. If $|\overline{w_1} z_1| < |\overline{w_2} z_2|$ then
$$\pi^2 K^{\mu_P}_N(z,w) = \sum_{k=0}^{n} (\overline{w_2} z_2)^k \sum_{j=0}^{k} (j+1)(k+1-j) \Big( \frac{\overline{w_1} z_1}{\overline{w_2} z_2} \Big)^j \sim \sum_{k=0}^{n} (k+1)\, (\overline{w_2} z_2)^k\, \frac{(\overline{w_2} z_2)^2}{(\overline{w_2} z_2 - \overline{w_1} z_1)^2} \sim \frac{n+1}{\overline{w_2} z_2 - 1}\, \frac{(\overline{w_2} z_2)^{n+3}}{(\overline{w_2} z_2 - \overline{w_1} z_1)^2}.$$
The case $\overline{w_1} z_1 = \overline{w_2} z_2$ can be similarly treated. We conclude that $K^{\mu_P}_N(z,z)$ grows outside the support at least like $\max\{|z_1|^{2n}, |z_2|^{2n}\}$ times some polynomial in $n$. An asymptotic analysis for $C^{\mu_P}_N(z,w)$ is possible but quite involved and we omit the details.

Index of notation

We list here the meanings of many of the symbols that are frequently used throughout the paper.

• $K^\mu_n(z,w)$: the Christoffel-Darboux kernel consisting of $(n+1)$ summands;
• $\mathrm{tdeg}\, p$: the total degree of a multivariate polynomial $p$;
• $\deg p$: the degree of a multivariate orthogonal polynomial, indicating the position $n = \deg p$ of $p$ in a prescribed linear ordering of independent monomials;
• $S = \mathrm{supp}(\mu)$: the support of the measure $\mu$;
• $S(\mu)$: the Zariski closure of the support of a positive measure $\mu$;
• $N(\mu)$: the ideal of polynomials vanishing on $S(\mu)$;
• $v_n(z)$: the tautologic vector of monomials of degree less than or equal to $n$;
• $v^\mu_n(z)$: the tautologic vector of $\mu$-orthogonal polynomials;
• $\widetilde{M}_n(\mu)$: the matrix of moments of degree less than or equal to $n$, associated to a positive measure $\mu$;
• $M_n(\mu)$: the reduced moment matrix, a maximal invertible submatrix of $\widetilde{M}_n(\mu)$;
• $\Delta(z,\mu)$: the Mahalanobis distance between a point $z$ and a measure $\mu$;
• $t_j K^{\widetilde{\nu}}_n(z^{(j)}, z^{(j)})$: the leverage score of a mass point $z^{(j)}$ with mass $t_j$ of a discrete measure $\widetilde{\nu}$;
• $L_n(\mu)$: the linear span of orthogonal polynomials of degree less than or equal to $n$;
• $C^\mu_n(z,w)$: the cosine function associated to the Christoffel-Darboux kernel;
• $C(\mu)$: the covariance matrix of a random variable with law $\mu$;
• $g_\Omega$: the plurisubharmonic Green function of the unbounded open set $\Omega$, with pole at infinity;
• $\Phi$: the normalized conformal mapping of the unbounded component of the complement of a Jordan curve in $\mathbb{C}$ onto the exterior of the unit disk;
• $T_n$: Chebyshev polynomial of the first kind.

References

[1] H. Alexander, J. Wermer, Polynomial Hulls with Convex Fibers, Math. Ann. 271 (1985), 99-109.
[2] A. Ambroladze, On Exceptional Sets of Asymptotic Relations for General Orthogonal Polynomials, J. Approx. Th. 82 (1995), 257-273.
[3] T. Bayraktar, Zero distribution of random sparse polynomials, Mich. Math. J. 66 (2017), 389-419.
[4] B. Beckermann, On the numerical condition of polynomial bases: Estimates for the Condition Number of Vandermonde, Krylov and Hankel matrices, Habilitationsschrift, Universität Hannover (1996).
[5] B. Beckermann and N. Stylianopoulos, Bergman orthogonal polynomials and the Grunsky matrix, Constr. Approx. (2018), 211-235.
[6] T. Bloom, N. Levenberg, Weighted pluripotential theory in C^d, American Journal of Mathematics (2003), 57-103.
[7] T. Bloom, N. Levenberg, F. Piazzon, F. Wielonsky, Bernstein-Markov: a survey, Dolomites Research Notes on Approximation (2015), 75-91.
[8] L. Bos, N. Levenberg, Bernstein-Walsh theory associated to convex bodies and applications to multivariate approximation theory, Comp. Meth. Funct. Theory 18 (2018), 361-388.
[9] L. Bos, B. Della Vecchia and G. Mastroianni, On the asymptotics of Christoffel functions for centrally symmetric weights functions on the ball in R^n, Rendiconti del Circolo Matematico di Palermo 52 (1998), 277-290.
[10] A. Cohen, G. Migliorati, Optimal weighted least-squares methods, SMAI J. Comput. Math. 3 (2017), 181-203.
[11] A.M. Delgado, L. Fernández, T.E. Pérez, M.A. Piñar, On the Uvarov modification of two variable orthogonal polynomials on the disk, Complex Anal. Oper. Theory 6 (3) (2012), 665-676.
[12] A. Delgado, L. Fernandez, T. Perez, M. Pinar, Multivariate Orthogonal Polynomials and Modified Moment Functionals, arXiv:1601.07194 (2017).
[13] C. F. Dunkl, Y. Xu, Orthogonal Polynomials of Several Variables (second edition), Cambridge Univ. Press, Cambridge, 2014.
[14] Y. Xu, Orthogonal polynomials of several variables, arXiv:1701.02709.
[15] G. Freud, Orthogonale Polynome, Birkhäuser, Basel, 1969.
[16] D. Gaier, Lectures on Complex Approximation, Birkhäuser Boston Inc., Boston, MA, 1987.
[17] W. Gautschi, Orthogonal polynomials: computation and approximation, Numerical Mathematics and Scientific Computation, Oxford University Press, New York, 2004.
[18] G.H. Golub, C.F. Van Loan, Matrix computations, 4th edition, Johns Hopkins University Press (2013).
[19] B. Gustafsson, M. Putinar, E. Saff, and N. Stylianopoulos, Bergman polynomials on an archipelago: Estimates, zeros and shape reconstruction, Advances in Math. (2009), 1405-1460.
[20] M. Jarnicki, P. Pflug, First Steps in Several Complex Variables: Reinhardt Domains, Europ. Math. Soc., Zürich, 2008.
[21] J.B. Lasserre, E. Pauwels, Sorting out typicality with the inverse moment matrix SOS polynomial, Proceedings of the 30-th NIPS Conference, 2016.
[22] J. B. Lasserre, E. Pauwels, The empirical Christoffel function in Statistics and Machine Learning, ArXiv 1701.02886v4, to appear in Advances in Computational Mathematics (2019).
[23] M. Klimek, Pluripotential Theory, Clarendon Press, Oxford, 1991.
[24] A. Kroó and D. S. Lubinsky, Christoffel functions and universality in the bulk for multivariate orthogonal polynomials, Canadian J. Math. 65 (2013), 600-620.
[25] A. Kroó and D. S. Lubinsky, Christoffel functions and universality on the boundary of the ball, Acta Math. Hungarica 140 (2013), 117-133.
[26] D. S. Lubinsky, A new approach to universality limits involving orthogonal polynomials, Ann. of Math. 170 (2009), 915-939.
[27] M. Lundin, The extremal psh for the complement of convex, symmetric subsets of R^N, Michigan Math. J. 32 (1985), 197-201.
[28] V. G. Malyshkin, Multiple Instance Learning: Christoffel Function Approach to Distribution Regression Problem, ArXiv 1511.07085v1.
[29] V. G. Malyshkin, Mathematical Foundations of Realtime Equity Trading. Liquidity Deficit and Market Dynamics. Automated Trading Machines, ArXiv 1510.05510v5.
[30] C. Martinez and M.A. Pinar, Orthogonal Polynomials on the Unit Ball and Fourth-Order Partial Differential Equations, SIGMA (2016), 1-11.
[31] P. Nevai, Géza Freud, orthogonal polynomials and Christoffel functions. A case study, J. Approx. Theory 48 (1986), 3-167.
[32] E. Pauwels, F. Bach, J-P. Vert, Relating Leverage Scores and Density using Regularized Christoffel Functions, arXiv:1805.07943.
[33] E. Parzen, On the estimation of a probability density function and mode, Annals of Mathematical Statistics 33 (1962), 1065-1076.
[34] W. Plesniak, Siciak's extremal function in complex and real analysis, Annales Polonici Mathematici (2003), 37-46.
[35] A. Sadullaev, Plurisubharmonic Functions, in: G. M. Khenkin et al. (eds.), Several Complex Variables II, Encyclopaedia of Math. Sci., Springer-Verlag, Berlin Heidelberg, 1994.
[36] E. B. Saff, Orthogonal polynomials from a complex perspective, Orthogonal polynomials (Columbus, OH, 1989), Kluwer Acad. Publ., Dordrecht, 1990, pp. 363-393.
[37] E. B. Saff, Remarks on relative asymptotics for general orthogonal polynomials, Recent trends in orthogonal polynomials and approximation theory, Contemp. Math., vol. 507, Amer. Math. Soc., Providence, RI, 2010, pp. 233-239.
[38] E.B. Saff, V. Totik, Logarithmic Potentials with External Fields, Grundlehren der mathematischen Wissenschaften, vol. 316, Springer, Heidelberg (1997).
[39] B. Simon, Orthogonal Polynomials on the Unit Circle, Part 2: Spectral theory, AMS (2009).
[40] B. Simon, The Christoffel-Darboux kernel, in "Perspectives in Partial Differential Equations, Harmonic Analysis and Applications: A Volume in Honor of Vladimir G. Mazya's 70th Birthday", D. Mitrea, M. Mitrea, eds., Proc. Symp. Pure Math., Amer. Math. Soc., Providence, R.I., 2008, pp. 314-355.
[41] B. Simon, Weak convergence of CD kernels and applications, Duke Math. J. 146 (2009), 305-330.
[42] H. Stahl, V. Totik, General Orthogonal Polynomials, Cambridge Univ. Press, Cambridge, 1992.
[43] P. K. Suetin, Polynomials orthogonal over a region and Bieberbach polynomials, Proceedings of the Steklov Institute of Mathematics, (1971), Amer. Math. Soc., Providence, R.I., 1974.
[44] V. Totik, Asymptotics for Christoffel functions for general measures on the real line, J. Anal. Math. (2000), 283-303.
[45] V. Totik, Christoffel functions on curves and domains, Trans. Amer. Math. Soc. (2010), no. 4, 2053-2087.
[46] K. C. Toh and L. N. Trefethen, The Kreiss matrix theorem on a general complex domain, SIAM J. Matrix Anal. Appl. 21 (1999), 145-165.
[47] V. N. Vapnik, Statistical Learning Theory, Wiley Interscience, New York, 1998.
[48] Y. Xu, Asymptotics for orthogonal polynomials and Christoffel functions on a ball, Methods Appl. Analysis 3 (1996), 257-272.


Laboratoire Painlevé UMR 8524, Department of Mathematics, Université de Lille, 59655 Villeneuve d'Ascq, France

E-mail address: [email protected]

Department of Mathematics, University of California at Santa Barbara, Santa Barbara, California, 93106-3080, and School of Mathematics and Statistics, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK

E-mail address: [email protected], [email protected]

Center for Constructive Approximation, Department of Mathematics, Vanderbilt University, 1326 Stevenson Center, 37240 Nashville, TN, USA

E-mail address: [email protected]
URL: http://my.vanderbilt.edu/edsaff/

Department of Mathematics and Statistics, University of Cyprus, P.O. Box 20537, 1678 Nicosia, Cyprus

E-mail address: [email protected]