A Topological Approach to Inferring the Intrinsic Dimension of Convex Sensing Data
Min-Chun Wu and Vladimir Itskov
July 8, 2020
Abstract
We consider a common measurement paradigm, where an unknown subset of an affine space is measured by unknown continuous quasi-convex functions. Given the measurement data, can one determine the dimension of this space? In this paper, we develop a method for inferring the intrinsic dimension of the data from measurements by quasi-convex functions, under natural generic assumptions. The dimension inference problem depends only on discrete data, namely the ordering of the measured points of space induced by the sensor functions. We introduce a construction of a filtration of Dowker complexes, associated to measurements by quasi-convex functions. Topological features of these complexes are then used to infer the intrinsic dimension. We prove convergence theorems that guarantee obtaining the correct intrinsic dimension in the limit of large data, under natural generic assumptions. We also illustrate the usability of this method in simulations.
Data in many scientific applications are often obtained by “sensing” the phase space via sensors/functions that are convex. Convex sensing is a class of problems of inferring the geometry of data that are sampled via such functions. To be precise, let us recall the following
Definition 1.1.
Let K ⊆ R^d be open and convex. A function f: K → R is quasi-convex if each sublevel set f^{-1}(−∞, ℓ) = {x ∈ K | f(x) < ℓ} is convex or empty, for all ℓ ∈ R.

The following is perhaps the shortest, albeit naive and incomplete, formulation of a convex sensing problem. A collection of n points X = {x_a}_{a=1}^n in an open convex region K ⊂ R^d is sensed by measuring the values of m sensors, i.e. quasi-convex functions F = {f_i: K → R}_{i=1}^m. Suppose that one has access only to the m × n data matrix M = [M_{ia}] of sensor values, where

M_{ia} = f_i(x_a),   (1)

but does not have direct access to the information about the dimension d of the underlying space, the open convex region K, the points x_a ∈ K, or any further details of the quasi-convex functions f_i. Can one recover any geometric information about the sampled region K? At the very minimum, can one infer the dimension d?

While convex sensing problems may be not uncommon in many scientific applications, our chief motivation comes from neuroscience. Neurons in the brain regions that represent sensory information often possess receptive fields.

Figure 1: The activities of three different experimentally recorded place cells (cell 1, cell 2, cell 3) in a rat’s hippocampus. The color represents the probability of each neuron’s firing as a function of the animal’s location.

A paradigmatic example of a receptive field is that of a hippocampal place cell [9]. Place cells are a class of neurons in the rodent hippocampus that act as position sensors. Here the relevant stimulus space K ⊂ R^d is the animal’s physical environment [13], with d ∈ {1, 2, 3}, and x ∈ K is the animal’s location in this space. Each neuron is activated with a certain probability that is a continuous function f: K → R_{≥0} of the animal’s position in space. In other words, the probability of a single neuron’s activation at a time t is given by p(t) = f(x(t)), where x(t) is the animal’s position. For each neuron, the function f is called its place field, and is approximately quasi-concave (see examples of place fields in Figure 1). Place fields can be easily computed when both the neuronal activity data and the relevant stimulus space are available. A number of other classes of sensory neurons in the brain also possess quasi-concave receptive fields, that is, each such neuron responds with respect to a quasi-concave probability density function f: K → R_{≥0} on the stimulus space.

In many situations, the relevant stimulus space for a given neural population may be unknown. This raises a natural question: can one infer the dimension of a stimulus space with quasi-concave receptive fields from neural activity alone? More precisely, given the neural activity of m neurons with quasi-concave receptive fields f_i: K → R, can one “sense” the stimulus space by sampling the neural activity at n moments of time as M_{ia} = f_i(x(t_a))? Here one has access to the measurements M_{ia}, but not to the objects on the right-hand side. This motivates the naive formulation of the convex sensing problem above.
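To make the measurement model in equation (1) concrete, the following sketch simulates convex sensing data in a planar region: it samples points in the unit disk and evaluates them with bell-shaped (quasi-concave) “place fields”, whose negatives are quasi-convex sensors. This is only an illustration under our own assumptions; the helper names (`place_field`, `sensing_matrix`) and the Gaussian choice of receptive field are ours, not constructions from the paper.

```python
import math
import random

def place_field(center, width):
    """A Gaussian bump: quasi-concave on the plane, so its negative is
    a quasi-convex sensor in the sense of Definition 1.1."""
    cx, cy = center
    return lambda x, y: math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * width ** 2))

def sensing_matrix(sensors, points):
    """The m x n data matrix M_ia = f_i(x_a) of equation (1)."""
    return [[f(x, y) for (x, y) in points] for f in sensors]

random.seed(0)
# n = 50 points sampled uniformly from the open unit disk K
points = []
while len(points) < 50:
    x, y = random.uniform(-1, 1), random.uniform(-1, 1)
    if x * x + y * y < 1:
        points.append((x, y))

# m = 8 sensors: negatives of quasi-concave place fields centered on the circle
fields = [place_field((math.cos(2 * math.pi * i / 8), math.sin(2 * math.pi * i / 8)), 0.7)
          for i in range(8)]
sensors = [lambda x, y, f=f: -f(x, y) for f in fields]
M = sensing_matrix(sensors, points)  # an 8 x 50 matrix
```

With probability 1 each row of M has distinct entries, and only the row-wise orderings of M carry geometric information about the sampled region.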
The convex sensing problem possesses a natural transformation group. If φ: R → R is a strictly monotone-increasing function, then the sublevel sets of the composition φ ∘ f and of f are identical up to an order-preserving relabeling. Thus, if φ is a strictly monotone-increasing function, then f is quasi-convex if and only if φ ∘ f is quasi-convex. (Recall that a function f(x) is quasi-concave if its negative, −f(x), is quasi-convex, and a function φ: R → R is strictly monotone-increasing if φ(y) > φ(x) whenever y > x.) It is easy to show that, given two sets of real numbers a_1 < a_2 < ··· < a_n and b_1 < b_2 < ··· < b_n, there exists a strictly monotone function φ: R → R such that b_i = φ(a_i) for all i. It thus follows that it is only the total order of each row of the matrix M in equation (1) that constrains the geometric features of the point cloud X_n = {x_1, ..., x_n} in a convex sensing problem. This motivates the following definition.

Definition 1.2. Let V be a finite set. A sequence of length k in V is a k-tuple s = (v_1, ..., v_k) of elements of V without repetitions. We denote by S_k[V] the set of all sequences of length k on V.

If M is an m × n real matrix that has distinct entries in each row, then each row yields a sequence of length n. For the sake of an example, consider a real-valued matrix (the digits lost in extraction are restored here so that the orderings below are as stated)

M = [ 5.23  4.19  2.56  3.10
      3.78  2.88  5.76  13.2 ].

Since the first row has the ordering 2.56 < 3.10 < 4.19 < 5.23, the total order <_1 on V = {1, 2, 3, 4} is 3 <_1 4 <_1 2 <_1 1. Thus, the order sequence for the first row is s_1 = (3, 4, 2, 1) ∈ S_4[V]. Similarly, the order sequence for the second row is s_2 = (2, 1, 3, 4).

If the points X_n and the quasi-convex functions {f_i}_{i∈[m]} are generic in some natural sense (this will be rigorously defined in Section 1.3), then each row of the data matrix M_{ia} = f_i(x_a) has no repeated values with probability 1. We denote the set of all “generic” data matrices as

M^o_{m,n} := { m × n real-valued matrices with no repeated entries in each row }.

For any such matrix M = [M_{ia}] ∈ M^o_{m,n}, one can define a collection S(M) of m maximal-length sequences as S(M) = {s_1, ..., s_m}, where each sequence s_i = (a_{i1}, ..., a_{in}) ∈ S_n[n] is obtained from the total order of the i-th row: M_{i a_{i1}} < M_{i a_{i2}} < ··· < M_{i a_{in}}. (The accurate notation is S_n[[n]], but here we use the less cumbersome notation S_n[n].)

The geometry of a convex sensing problem for a data matrix M ∈ M^o_{m,n} is constrained only by the set of m sequences S(M) ⊂ S_n[n]. The following observation makes it possible to restate any convex sensing problem purely in terms of embedding a set of points that satisfy certain convex hull conditions. Let conv(x_1, ..., x_k) denote the convex hull of a collection of points x_1, ..., x_k in R^d.

Lemma 1.3.
For any collection of n distinct points {x_1, x_2, ..., x_n} ⊂ R^d, the following statements are equivalent:

(i) There exists a continuous quasi-convex function f: R^d → R such that

f(x_1) < f(x_2) < ··· < f(x_n),   (2)

(ii) For each k = 2, ..., n, x_k ∉ conv(x_1, ..., x_{k−1}).

Proof. The implication (i) ⟹ (ii) follows from Definition 1.1. To prove that (ii) ⟹ (i), denote C_k = conv(x_1, ..., x_k), d_k(x) := dist(x, C_k) for any k = 1, ..., n, and define f(x) = Σ_{k=1}^n h_k · d_k(x), where h_1 = 1 and

h_k := 1 + (1 / d_k(x_{k+1})) · max{ Σ_{j=1}^{k−1} h_j (d_j(x_k) − d_j(x_{k+1})), 0 },  for k ≥ 2.

Note that (ii) implies that d_k(x_{k+1}) > 0 for k ≥ 2. Recall that, for any convex set C ⊂ R^d, the function x ↦ dist(x, C) is continuous and convex (see, e.g., Example 3.16 and Section 3.2.1 in [3]). Thus, since the h_k are positive, f(x) is a continuous convex (and thus quasi-convex) function. Moreover, f(x_1) = 0 < dist(x_2, x_1) = f(x_2), and

h_k > (1 / d_k(x_{k+1})) Σ_{j=1}^{k−1} h_j (d_j(x_k) − d_j(x_{k+1}))  for k ≥ 2.

The last inequality is equivalent to f(x_{k+1}) > f(x_k). Thus inequalities (2) hold.

Corollary 1.4.
A matrix M = [M_{ia}] ∈ M^o_{m,n} can be obtained as M_{ia} = f_i(x_a) from a collection of m continuous quasi-convex functions f_i: R^d → R and n points x_1, ..., x_n ∈ R^d if and only if there exist n points x_1, ..., x_n ∈ R^d such that, for each sequence s = (a_1, a_2, ..., a_n) ∈ S(M) and each k = 2, ..., n, x_{a_k} ∉ conv(x_{a_1}, ..., x_{a_{k−1}}).

An important implication of Corollary 1.4 is that a convex sensing problem without any further constraint always has a two-dimensional solution. Recall that a set of points is convexly independent if none of these points lies in the convex hull of the others.
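The order sequences S(M) that constrain the geometry can be read off a data matrix by sorting each row; a minimal sketch (the helper name `order_sequences` is ours, and the 1-based labels follow the text’s convention [n] = {1, ..., n}):

```python
def order_sequences(M):
    """Return S(M) = [s_1, ..., s_m]: for each row, the column labels listed
    in increasing order of their values (Definition 1.2). Rows must have
    distinct entries, i.e. M must lie in M^o_{m,n}."""
    S = []
    for row in M:
        assert len(set(row)) == len(row), "row has repeated entries"
        # sort the column labels 1..n by the corresponding row value
        s = sorted(range(1, len(row) + 1), key=lambda a: row[a - 1])
        S.append(tuple(s))
    return S

# Example: a 2 x 4 matrix (values ours, chosen for illustration)
M = [[5.23, 4.19, 2.56, 3.10],
     [3.78, 2.88, 5.76, 13.2]]
print(order_sequences(M))  # → [(3, 4, 2, 1), (2, 1, 3, 4)]
```

Everything downstream (the empirical Dowker complexes of Section 2) depends on M only through these sequences.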
Corollary 1.5.
For every matrix M ∈ M^o_{m,n} and convexly independent points x_1, x_2, ..., x_n ∈ R^d, there exist m continuous quasi-convex functions f_i: R^d → R such that M_{ia} = f_i(x_a).

Indeed, by choosing d = 2 in Corollary 1.4, we obtain a two-dimensional solution. Note, however, that a configuration of convexly independent sampled points is non-generic for large n. If one explicitly excludes this situation, then the combinatorics of S(M) constrains the minimal possible dimension d of the geometric realization, as illustrated by the following example.
Let n > 2, and let M ∈ M^o_{n−1,n} be a matrix obtained as in equation (1) with continuous quasi-convex functions f_i, whose (n − 1) sequences S(M) = {s_1, s_2, ..., s_{n−1}} are of the form

s_i = (···, n, i),  for all i ∈ [n − 1],   (3)

where each of the “···” in s_i is an arbitrary permutation of [n] \ {n, i}. Assume that at least one point in X_n = {x_1, ..., x_n} ⊂ R^d is contained in the interior of the convex hull conv(X_n); then the dimension in which M can be obtained as in Corollary 1.4 is d = n − 2. The proof is given in Section 6.1 of the Appendix.
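Condition (ii) of Lemma 1.3 can also be checked algorithmically. In the plane, membership in a convex hull reduces, by Carathéodory’s theorem, to membership in some triangle (or segment) spanned by the points. The following brute-force sketch is ours, intended only for small 2D examples:

```python
from itertools import combinations

def _cross(o, a, b):
    # z-component of (a - o) x (b - o)
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_hull_2d(p, pts):
    """True iff p lies in conv(pts) for points in the plane (Caratheodory:
    a hull point lies in a triangle, or on a segment, spanned by pts)."""
    if any(q == p for q in pts):
        return True
    for a, b, c in combinations(pts, 3):
        if _cross(a, b, c) == 0:
            continue  # degenerate triangle; the segment loop handles it
        s1, s2, s3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
        if (s1 >= 0 and s2 >= 0 and s3 >= 0) or (s1 <= 0 and s2 <= 0 and s3 <= 0):
            return True
    for a, b in combinations(pts, 2):
        # p on segment ab: collinear with, and between, the endpoints
        if _cross(a, b, p) == 0 and min(a[0], b[0]) <= p[0] <= max(a[0], b[0]) \
                and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]):
            return True
    return False

def admits_quasiconvex_order(points):
    """Condition (ii): x_k not in conv(x_1, ..., x_{k-1}) for k = 2, ..., n."""
    return all(not in_hull_2d(points[k], points[:k]) for k in range(1, len(points)))
```

For instance, `admits_quasiconvex_order([(0, 0), (1, 0), (0, 1), (2, 2)])` holds, while appending an interior point of the current hull instead would violate (ii).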
It is clear from Corollary 1.5 and Example 1.6 that the problem of dimension inference is well-posed only in the presence of some genericity assumptions that guarantee convex dependence of the sampled points. Instead of making such an assumption explicit, we take a probabilistic perspective, wherein points are drawn from a probability distribution that is generic in some natural sense. We assume that there are three (unknown) objects that underlie any “convex sensing” data:

(i) an open convex set K ⊆ R^d,
(ii) m quasi-convex continuous functions F = {f_i: K → R}_{i=1}^m, and
(iii) a probability measure P_K on K.

In relation to the neuroscience motivation in Section 1.1, K ⊆ R^d is the stimulus space, each function f_i is the negative of the receptive field of a neuron, and P_K is the measure that describes the probability distribution of the stimuli. To guarantee that the convex sensing data are generic, we impose the following regularity assumptions.

Definition 1.7. A regular pair is a pair (F, P_K) that satisfies the conditions (i)–(iii) above as well as the following two conditions:

(R1) The probability measure P_K is equivalent to the Lebesgue measure on K.
(R2) Level sets of all functions in F are of measure zero, i.e. for every i ∈ [m] and ℓ ∈ R, P_K(f_i^{-1}(ℓ)) = 0.

Definition 1.8.
A point cloud {x_1, ..., x_n} ⊂ K is sampled from a regular pair (F, P_K) if it is i.i.d. from P_K. A matrix M = [M_{ia}] ∈ M^o_{m,n} is sampled from a regular pair (F, P_K) if, for all i ∈ [m] and a ∈ [n], M_{ia} = f_i(x_a), where {x_1, ..., x_n} ⊂ K is sampled from (F, P_K).

The assumption (R1) ensures that the domain K is well-sampled, and thus the probability that the points x_1, ..., x_n are convexly independent approaches zero in the limit of large n. The assumption (R2) guarantees, with probability 1, that the data matrix M has no repeated values in each row, and thus lies in M^o_{m,n}.

In this paper, we develop a method for estimating the dimension of convex sensing data. Intuitively, such an estimator needs to be consistent, i.e. “behave well” in the limit of large data. In addition to the conditions imposed on a regular pair, other properties of a pair (F, P_K) may be needed, depending on the context. It is therefore natural to define a consistent dimension estimator in relation to a particular class of regular pairs. Since an estimator may rely on different parameters for different regular pairs, we consider a one-parameter family of such estimators, motivating the following definition of consistency.

Definition 1.9.
Let RP be a class of regular pairs. For each regular pair (F, P_K) ∈ RP we denote by d(F, P_K) the dimension d of the ambient space in which the open convex set K ⊆ R^d is embedded. A one-parameter family of functions d̂^(ε): M^o_{m,n} → N is called an asymptotically consistent estimator in RP if, for every regular pair (F, P_K) ∈ RP, there exists l > 0 such that, for every ε ∈ (0, l) and each sequence of matrices M_n ∈ M^o_{m,n} sampled from (F, P_K),

lim_{n→∞} P( d̂^(ε)(M_n) = d(F, P_K) ) = 1.   (4)

The structure of this paper is as follows. In Section 2, we define two multi-dimensional filtrations of simplicial complexes: the empirical Dowker complex Dow(S(M)), which can be associated to a data matrix M, and the Dowker complex Dow(F, P_K), which can be associated to a regular pair (F, P_K). Using an interleaving distance between multi-filtered complexes, we prove (Theorem 2.9) that for a sequence {M_n} of data matrices, sampled from a regular pair (F, P_K), Dow(S(M_n)) → Dow(F, P_K) in probability, as n → ∞.

In Section 3, we develop tools for estimating the dimension of (F, P_K) using persistent homology. We define a set of maximal persistence lengths associated to Dow(F, P_K) and prove (Lemma 3.8) that a lower bound on the dimension of (F, P_K) can be derived from these persistence lengths. Next we define another set of maximal persistence lengths from Dow(S(M_n)) and prove (Theorem 3.10) that they converge in probability to the maximal persistence lengths associated to Dow(F, P_K), in the limit of large sampling of the data. The rest of Section 3 is devoted to two subsampling procedures for different practical situations, as well as simulation results that illustrate that the correct dimension can be inferred with these two methods.

In Section 4, we introduce complete regular pairs and prove (Theorem 4.3) that the lower bound in Lemma 3.8 is equal to the dimension d(F, P_K) for complete regular pairs. This establishes (Theorem 4.4) that the dimension estimator introduced in Section 3.3 is an asymptotically consistent estimator in the class of complete regular pairs. In Section 5, we define an estimator that can be used to test (Theorem 5.5) whether the data matrix is sampled from a complete regular pair.
The Appendix (Section 6) contains the proofs of the main theorems as well as some technical supporting lemmas.

In this section, we define the empirical Dowker complex from the m sequences induced from the rows of the data matrix M, and the Dowker complex from the regular pair (F, P_K), and prove that the empirical Dowker complex converges to the Dowker complex in probability. These complexes are both examples of multi-filtered simplicial complexes.

Definition 2.1.
Let I = ∏_{i∈[m]} I_i be an m-orthotope in R^m, where each I_i is an interval (open, closed, half-open, finite, or infinite are all allowed) in R. Let ≤ be the natural partial order on I induced from R^m. A multi-filtered simplicial complex D indexed over I is a collection {D_α}_{α∈I} of simplicial complexes on a fixed finite vertex set, such that D_α ⊆ D_β for all α ≤ β in I.

We define the empirical Dowker complexes from a collection of sequences of maximal length (i.e. of length n) on the vertex set [n].

Definition 2.2.
Let S = {s_1, ..., s_m} be a collection of sequences on [n] of length n. Let ≤_i be the total order on [n] induced from s_i; namely, for a, b ∈ [n], a ≤_i b if and only if a is before or equal to b in s_i. We define the following multi-filtered simplicial complex, with vertex set [m] and indexed over [0, 1]^m:

Dow(S) := { Dow(S)(t_1, ..., t_m) : (t_1, ..., t_m) ∈ [0, 1]^m },

where

Dow(S)(t_1, ..., t_m) := Δ({σ_a : a = 1, ..., n}),  and  σ_a = { i ∈ [m] : #({b ∈ [n] : b ≤_i a}) ≤ n t_i }.

Here Δ({σ_a}_{a∈[n]}) denotes the smallest simplicial complex containing the faces {σ_a}_{a∈[n]}. This filtered complex is called the empirical Dowker complex of S.

Recall from Section 1.2 that the relevant geometric information of the m × n data matrix M ∈ M^o_{m,n} is contained in the collection of m sequences S(M) = {s_1, ..., s_m}, where s_i ∈ S_n[n] is of length n and records the total order induced by the i-th row of M. Therefore, we can consider the empirical Dowker complex Dow(S(M)) derived from the data matrix M.

Note that our definition of the empirical Dowker complex is a multi-parameter generalization of the Dowker complex defined in [5]. Specifically, the one-dimensional filtration of simplicial complexes (indexed over t) Dow(S(M))(n · t, ..., n · t) is equal to the Dowker complex defined in [5].

Recall that, for a collection A = {A_i}_{i∈[m]} of sets, the nerve of A, denoted nerve(A), is the simplicial complex on the vertex set [m] defined as

nerve(A) := { σ ⊆ [m] : ⋂_{i∈σ} A_i ≠ ∅ }.

The following lemma is immediate from Definition 2.2.
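Definition 2.2 is directly computable: for a fixed parameter vector (t_1, ..., t_m), the generating faces σ_a are obtained from the row-wise ranks of the data matrix. A minimal sketch of ours (the complex is represented by its generating faces σ_a rather than by the full list of simplices):

```python
def ranks(row):
    """Rank of each column within a row with distinct entries:
    #({b : M[i][b] <= M[i][a]}) for each column a."""
    order = sorted(range(len(row)), key=lambda a: row[a])
    r = [0] * len(row)
    for position, a in enumerate(order):
        r[a] = position + 1
    return r

def dowker_faces(M, t):
    """Generating faces {sigma_a} of Dow(S(M))(t_1, ..., t_m):
    sigma_a = { i : rank_i(a) <= n * t_i }  (Definition 2.2)."""
    n = len(M[0])
    R = [ranks(row) for row in M]   # R[i][a] = rank of column a in row i
    return [frozenset(i for i in range(len(M)) if R[i][a] <= n * t[i])
            for a in range(n)]

M = [[5.23, 4.19, 2.56, 3.10],
     [3.78, 2.88, 5.76, 13.2]]
print(dowker_faces(M, (0.5, 0.5)))
```

At t = (1, ..., 1) every σ_a is the full vertex set [m], so the filtration ends with a simplex, as expected from the definition.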
Lemma 2.3.
Let S = {s_1, ..., s_m} be a collection of sequences on [n] of length n. For each i ∈ [m] and t ∈ R, consider

A^(i)(t) := { a ∈ [n] : #({b ∈ [n] : b ≤_i a}) ≤ nt } ⊆ [n],

where ≤_i is the total order on [n] induced by s_i. Then

Dow(S)(t_1, ..., t_m) = nerve({A^(i)(t_i)}_{i∈[m]}).

Next we connect the combinatorics of Dow(S(M)) to the geometry. From Lemma 2.3, we know that Dow(S(M)) is the nerve of {A^(i)(t_i)}_{i∈[m]}. To define an analogue of Dow(S(M)) from the regular pair (F, P_K), we use the following lemma (see the proof in Section 6.2) to define an analogue of A^(i)(t) from (F, P_K).

Lemma 2.4.
Let f: K → R be a continuous function with P_K(f^{-1}(ℓ)) = 0 for all ℓ ∈ R, where P_K is a probability measure on a convex open set K and P_K is equivalent to the Lebesgue measure on K. Then there exists a unique strictly increasing continuous function λ: (0, 1) → R such that, for all t ∈ (0, 1),

P_K(f^{-1}(−∞, λ(t))) = t.   (5)

For a regular pair (F, P_K) = ({f_i: K → R}_{i∈[m]}, P_K), by Lemma 2.4, for each i ∈ [m] there exists a unique strictly increasing continuous function λ_i: (0, 1) → R such that P_K(f_i^{-1}(−∞, λ_i(t))) = t. Using λ_i(t), the following definition provides a continuous analogue of A^(i)(t).

Definition 2.5.
Let (F, P_K) = ({f_i}_{i∈[m]}, P_K) be a regular pair. For each i ∈ [m] and t ∈ (0, 1), define

K^(i)(t) := f_i^{-1}(−∞, λ_i(t)),

where λ_i: (0, 1) → R is the unique function that satisfies P_K(f_i^{-1}(−∞, λ_i(t))) = t. For convenience, we also define K^(i)(0) := ∅ and K^(i)(1) := K.

Figure 2: K^(i)(t) is the sublevel set of f_i whose P_K measure is equal to t.

An illustration of K^(i)(t) can be found in Figure 2. These sets are simply sublevel sets of f_i, rescaled with respect to the P_K measure. On the other hand, for a point cloud X_n = {x_1, ..., x_n} sampled from P_K, if we identify [n] with X_n via a ↔ x_a, then A^(i)(t) may be interpreted as the set of points in X_n that lie inside an approximation of K^(i)(t). Informed by Lemma 2.3, we use K^(i)(t) to define the continuous version of the Dowker complex.

Definition 2.6.
Let (F, P_K) be a regular pair. Define a multi-filtered complex Dow(F, P_K), indexed over [0, 1]^m, by

Dow(F, P_K)(t_1, ..., t_m) := nerve({K^(i)(t_i)}_{i=1}^m).

This multi-filtered complex is called the Dowker complex induced from (F, P_K).

The complex Dow(S(M)) is what we can obtain from the data matrix M, but it does not capture the whole geometric information of (F, P_K). On the other hand, Dow(F, P_K) reflects the whole geometric information but is not directly computable. Since A^(i)(t_i) is an approximation of K^(i)(t_i), we might expect that Dow(S(M)) approximates Dow(F, P_K). As we shall see, this is the case; but to compare them formally, we need the concept of the interleaving distance.

Definition 2.7.
For a multi-filtered complex K indexed over R^m and ε > 0, the ε-shift of K, denoted K + ε, is the multi-filtered complex defined by

(K + ε)(t_1, ..., t_m) := K(t_1 + ε, ..., t_m + ε).

For two multi-filtered complexes K and L indexed over R^m, the simplicial interleaving distance between K and L is defined as

d_INT(K, L) := inf{ ε > 0 : K ⊆ L + ε and L ⊆ K + ε }.

Note that this interleaving distance is between multi-filtered simplicial complexes, while the standard interleaving distance in topological data analysis is between persistence modules, namely at the level where the homology functor has been applied to the multi-filtered complex (see, e.g., [8] for the standard definition of the interleaving distance between multi-dimensional persistence modules). Similar to the standard interleaving distance, the simplicial interleaving distance d_INT defined here is also a pseudo-metric; namely, d_INT(K, L) = 0 does not imply K = L.

The definition of the simplicial interleaving distance involves a shift of indices, and that is why the two multi-filtered complexes to be compared are required to be indexed over the whole of R^m. Since both Dow(S(M)) and Dow(F, P_K) are indexed only over [0, 1]^m, to compare them in terms of the interleaving distance we first need to extend their indexing domain to R^m. The definition below is a natural way to extend the indexing domain.

Definition 2.8.
For D = Dow(S(M)) or Dow(F, P_K) and (t_1, ..., t_m) ∈ R^m, define

D(t_1, ..., t_m) := D(θ(t_1), ..., θ(t_m)),

where θ: R → [0, 1] is defined by θ(t) = t if 0 ≤ t ≤ 1, θ(t) = 0 if t < 0, and θ(t) = 1 if t > 1.

Theorem 2.9 (Interleaving Convergence Theorem). Let (F, P_K) be a regular pair and M_n be an m × n data matrix sampled from (F, P_K). Then the simplicial interleaving distance between Dow(S(M_n)) and Dow(F, P_K) converges to 0 in probability as n → ∞; namely, for all ε > 0,

lim_{n→∞} Pr[ d_INT(Dow(S(M_n)), Dow(F, P_K)) > ε ] = 0.

The proof of Theorem 2.9 is given in Section 6.2. In Section 3, we use Theorem 2.9 to infer a lower bound for the dimension of (F, P_K).

First we recall the definitions of persistence modules, persistence intervals, and persistence diagrams; for more details see, e.g., Chapter 1 of [11]. Then we define the maximal persistence length for a 1-dimensional filtration of simplicial complexes. We fix a ground field F, which is normally taken to be F_2 for computational reasons; all the statements here do not depend on the choice of the field.

Definition 3.1. A persistence module M indexed over an interval [0, T] is a collection {M_t}_{t∈[0,T]} of vector spaces over F, along with linear maps φ_s^t: M_s → M_t for every s ≤ t in [0, T], such that φ_s^u = φ_t^u ∘ φ_s^t and φ_t^t = id_{M_t} for all s ≤ t ≤ u in [0, T].

A well-known structural characterization of a persistence module is via its persistence intervals (or, equivalently, its persistence diagram). To talk about persistence intervals, we need to define the direct sum of persistence modules and interval modules.

Definition 3.2.
Let M = {M_t}_{t∈[0,T]} and N = {N_t}_{t∈[0,T]} be persistence modules over the same index interval [0, T]. Let {φ_s^t : s, t ∈ [0, T], s ≤ t} and {ψ_s^t : s, t ∈ [0, T], s ≤ t} be the linear maps of M and N. The direct sum of M and N, denoted M ⊕ N, is the persistence module defined by (M ⊕ N)_t := M_t ⊕ N_t, along with the linear maps φ_s^t ⊕ ψ_s^t: M_s ⊕ N_s → M_t ⊕ N_t for every s ≤ t in [0, T].

Definition 3.3.
Let J ⊆ [0, T] be an interval in [0, T], which can be either open, closed, or half-open. The interval module I_J defined over [0, T] is the persistence module I_J = {(I_J)_t}_{t∈[0,T]} defined by (I_J)_t := F for all t ∈ J and (I_J)_t := 0 for all t ∉ J, along with the identity linear maps from (I_J)_s to (I_J)_t for every s ≤ t in J, and zero maps from (I_J)_s to (I_J)_t for all other s ≤ t in [0, T].

The next decomposition theorem is a structural theorem that characterizes persistence modules and guarantees the existence and uniqueness of persistence intervals (see, e.g., Sections 1.1 and 1.2 of [11] and references therein).

Theorem 3.4. Let M = {M_t}_{t∈[0,T]} be a persistence module over a closed interval [0, T]. If, for each t ∈ [0, T], M_t is a finite-dimensional vector space over F, then M can be decomposed as a direct sum of interval modules; namely,

M = ⊕_J I_J,

where {J} is a collection of intervals (which could be open, closed, or half-open) in [0, T]. The decomposition is unique in the sense that, for every such decomposition, the collection of intervals is the same.

Each interval J in the decomposition stated in Theorem 3.4 is called a persistence interval of M. We may summarize all persistence intervals as a 2D diagram in [0, T] × [0, T], called the persistence diagram of M: for each persistence interval with left end α and right end β, we mark a point (α, β) in [0, T] × [0, T]. The diagram consisting of all such points is called the persistence diagram of M, denoted dgm(M). Rigorously speaking, we should distinguish open, closed, and half-open intervals; for our purpose, we only use the lengths of the persistence intervals, and hence the distinction of open, closed, and half-open intervals does not really matter.

An important class of persistence modules is obtained from a 1-dimensional filtration of simplicial complexes by applying the homology functors H_k(·; F), k = 0, 1, 2, .... Specifically, for a 1-dimensional filtration of simplicial complexes K = {K_t}_{t∈[0,T]} and a fixed nonnegative integer k, we have the persistence module H_k(K; F) = {H_k(K_t; F)}_{t∈[0,T]}, along with the linear maps (i_s^t)_*: H_k(K_s; F) → H_k(K_t; F) for every s ≤ t in [0, T], where i_s^t is the inclusion map from K_s to K_t. Since H_k(·; F) is a covariant functor, the equality (i_t^u)_* ∘ (i_s^t)_* = (i_s^u)_* holds for every s ≤ t ≤ u in [0, T].

For each k, we may use the persistence diagram of H_k(K; F) for analysis.
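As an illustration of how persistence intervals arise from a filtration, in homological dimension k = 0 they can be computed by a union–find sweep over the 1-skeleton: when an edge merges two components, the younger one dies (the “elder rule”). This standard algorithm is sketched below for a filtered graph; it is not the paper’s pipeline, and all names are ours:

```python
import math

def h0_persistence(vertex_births, edges):
    """Persistence intervals of H_0 for a filtered graph.

    vertex_births: {vertex: birth time}
    edges: list of (time, u, v), the filtration time at which edge {u, v} appears.
    Returns (birth, death) pairs; death is math.inf for classes that never die
    (one per connected component of the final graph).
    """
    parent = {v: v for v in vertex_births}
    birth = dict(vertex_births)          # birth time of each component's class

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    intervals = []
    for t, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                     # edge creates a cycle; H_0 unchanged
        if birth[ru] > birth[rv]:
            ru, rv = rv, ru              # elder rule: the younger class dies
        intervals.append((birth[rv], t))
        parent[rv] = ru                  # merge younger into older
    roots = {find(v) for v in vertex_births}
    intervals.extend((birth[r], math.inf) for r in roots)
    return sorted(intervals)

births = {'a': 0.0, 'b': 0.2, 'c': 0.5}
edges = [(0.6, 'a', 'b'), (0.9, 'b', 'c'), (1.0, 'a', 'c')]
print(h0_persistence(births, edges))  # → [(0.0, inf), (0.2, 0.6), (0.5, 0.9)]
```

The lengths β − α of the finite intervals are exactly what the maximal persistence length below extracts from such a diagram.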
For our purpose, instead of the whole diagram, we summarize it by looking only at the longest length among all persistence intervals, which we formally define below.

Definition 3.5.
Let K = {K_t}_{t∈[0,T]} be a 1-dimensional filtration of simplicial complexes. For each nonnegative integer k, we define

l_max(k, K) := sup{ β − α : (α, β) ∈ dgm(H_k(K; F)) }   (6)

and call it the maximal persistence length in dimension k.

This definition is similar to the one used in Section 3 of [2]; the only difference is that the authors in [2] measure the maximal cycle multiplicatively, while we measure it additively. Normally, the length of a persistence interval in H_k(K; F) is viewed as its significance in dimension k. Therefore, l_max(k, K), the maximum among such interval lengths, is viewed as the significance of K in dimension k.

3.2 L_k(F, P_K) and its relation to the dimension of (F, P_K)

In this section, from the regular pair (F, P_K), we define quantities that we use to bound the dimension d(F, P_K) from below. We start with the following notation.

Definition 3.6.
Given (F, P_K), where F = {f_i: K → R}_{i∈[m]} is a collection of quasi-convex functions defined on a convex open set K and P_K is a probability measure on K, for x ∈ K we define

T_i(x) := P_K(f_i^{-1}(−∞, f_i(x))).

Figure 3: T_i(x) is the P_K measure of the shaded area f_i^{-1}(−∞, f_i(x)).

The function T_i(x) may be regarded as the P_K-rescaled version of f_i (see Figure 3 for an illustration). Now we define a one-dimensional filtration of simplicial complexes K_x that is used to infer a lower bound on the dimension d(F, P_K). The geometry underlying the definition is depicted in Figure 4.

Figure 4: From left to right, the filtration K_x(t), as the nerve of the sublevel sets of {f_i}_{i∈[m]}, starts at t = 0 as the empty simplicial complex and increases as t goes up to t_max(x), where the sublevel sets of {f_i}_{i∈[m]} touch the point x on their boundaries. The formal formulation of the process is in (8) of Definition 3.7.

Throughout Section 3, we fix an arbitrary coefficient field F when taking homology; namely, for a filtration of simplicial complexes K and a nonnegative integer k, H_k(K) := H_k(K; F).

Definition 3.7.
Let (F, P_K) be a regular pair, where F = {f_i: K → R}_{i∈[m]}. For x ∈ K, let

t_max(x) := max_{i∈[m]} T_i(x).   (7)

Define a one-dimensional filtered complex K_x, indexed over t ∈ [0, t_max(x)], by

K_x(t) := Dow(F, P_K)(T_1(x) − (t_max(x) − t), ..., T_m(x) − (t_max(x) − t)).   (8)

For every nonnegative integer k, we define

L_k(F, P_K) := sup_{x∈K} l_max(k, K_x).   (9)

As illustrated in Figure 4, if x is “central” in some appropriate sense (see Definition 4.1 in Section 4), a (d(F, P_K) − 1)-dimensional cycle appears in the filtration K_x. In any case, L_k(F, P_K) can at least be used to derive a lower bound for the dimension of the regular pair (F, P_K), due to the following lemma.

Lemma 3.8.
Let (F, P_K) be a regular pair. Then, for k ≥ d(F, P_K), L_k(F, P_K) = 0. In particular,

d_low(F, P_K) := 1 + max{ k : L_k(F, P_K) > 0 } ≤ d(F, P_K).   (10)

Proof.
For notational simplicity, in this proof we denote d_low = d_low(F, P_K) and d = d(F, P_K). Recall that Dow(F, P_K)(t_1, ..., t_m) = nerve({f_i^{-1}(−∞, λ_i(t_i))}_{i=1}^m). Since the functions f_i are quasi-convex, intersections of convex sets are convex, and convex sets are contractible, the collection {f_i^{-1}(−∞, λ_i(t_i))}_{i=1}^m is a good cover. Thus, by the nerve lemma (see, e.g., Theorem 10.7 in [1] or Corollary 4G.3 in [7]), we have the following homotopy equivalence:

Dow(F, P_K)(t_1, ..., t_m) ≃ ⋃_{i∈[m]} f_i^{-1}(−∞, λ_i(t_i)).   (11)

Notice that ⋃_{i∈[m]} f_i^{-1}(−∞, λ_i(t_i)) is open in R^d, and it is well known that, for every open set U ⊆ R^d, H_k(U) = 0 for all k ≥ d (see, e.g., Proposition 3.29 in [7]). Thus, for k ≥ d, H_k(⋃_{i∈[m]} f_i^{-1}(−∞, λ_i(t_i))) = 0. Combining with (11), we obtain H_k(Dow(F, P_K)(t_1, ..., t_m)) = 0 for k ≥ d and (t_1, ..., t_m) ∈ [0, 1]^m. Therefore, for k ≥ d, l_max(k, K_x) = 0 for all x ∈ K, and L_k(F, P_K) = 0. Thus d_low − 1 = max{k : L_k(F, P_K) > 0} ≤ d − 1, i.e. d_low ≤ d.

L_k(F, P_K) is defined with respect to a regular pair (F, P_K) and thus is not directly computable from discrete data. In Section 3.3, we follow an approach analogous to the definition of L_k(F, P_K) to define L_k(M), and prove that L_k(M) converges to L_k(F, P_K).

3.3 L_k(M) and its convergence to L_k(F, P_K)

In Theorem 2.9, we saw that, for the data matrix M, Dow(S(M)) approximates Dow(F, P_K) with high probability. Thus, it is natural to use Dow(S(M)) to define an analogue L_k(M) of L_k(F, P_K).

Definition 3.9.
Let M ∈ M°_{m,n} and S(M) = { s_1, ..., s_m } be the collection of m sequences induced from the rows of M (s_i corresponds to row i). For a ∈ [n] and i ∈ [m], denote

ord_i(M, a) := #{ b ∈ [n] : M_ib ≤ M_ia }. (12)

For a ∈ [n], which corresponds to the a-th column of the data matrix M, let

t̂_max(a) := max_{i ∈ [m]} ord_i(M, a)/n. (13)

Define a one-dimensional filtered complex K̂_a, indexed over t ∈ [0, t̂_max(a)], by

K̂_a(t) := Dow(S(M))( ord_1(M, a)/n − (t̂_max(a) − t), ..., ord_m(M, a)/n − (t̂_max(a) − t) ). (14)

See Definition 2.2 for the definition of Dow(S(M)). For every nonnegative integer k, we define

L_k(M) := max_{a ∈ [n]} l_max(k, K̂_a). (15)

Since Dow(S(M)) approximates Dow(F, P_K), intuitively, K̂_a(t) approximates K_{x_a}(t), and L_k(M) approximates L_k(F, P_K). With the help of Theorem 2.9 and the Isometry Theorem in topological data analysis (see, e.g., Theorem 6.16 in Section 6 of [11]), these intuitions are justified as follows:

Theorem 3.10.
Let (F, P_K) be a regular pair. Assume that K is bounded and each f_i ∈ F can be continuously extended to the closure K̄. Let M_n be an m × n matrix sampled from (F, P_K). Then, for all k ∈ {0} ∪ N, as n → ∞, L_k(M_n) converges to L_k(F, P_K) in probability; namely, for all ε > 0,

lim_{n→∞} Pr[ |L_k(M_n) − L_k(F, P_K)| < ε ] = 1.

Moreover, the rate of convergence is independent of k.

The proof of Theorem 3.10 is given in Section 6.4. According to Theorem 3.10, for each nonnegative integer k, L_k(M_n) is a consistent estimator of L_k(F, P_K), and these estimators converge uniformly in probability. Thus, by Lemma 3.8 and Theorem 3.10, we can estimate a lower bound for the dimension of (F, P_K) from the data matrix M by looking at the values of L_k(M_n). Formally, we can define the following estimator of d_low(F, P_K).

Definition 3.11.
For ε > 0 and M ∈ M°_{m,n}, we define

d̂_low(M, ε) := 1 + max{ k : L_k(M) > ε }. (16)

As a consequence of Lemma 3.8 and Theorem 3.10, it is immediate that d̂_low(M_n, ε) is a consistent estimator, for appropriately chosen ε.

Corollary 3.12.
Let (F, P_K) be a regular pair satisfying the conditions in Theorem 3.10, and let M_n ∈ M°_{m,n} be sampled from (F, P_K). Denote d_low = d_low(F, P_K). Then, for all 0 < ε < L_{d_low − 1}(F, P_K),

lim_{n→∞} Pr[ d̂_low(M_n, ε) = d_low(F, P_K) ] = 1. (17)

(Here, uniform convergence in probability means: for all ε > 0, lim_{n→∞} Pr[ sup_{k ≥ 0} |L_k(M_n) − L_k(F, P_K)| < ε ] = 1.)

Proof.
For notational simplicity, in this proof, we denote d = d(F, P_K). By Lemma 3.8 and Theorem 3.10, as n → ∞, L_k(M_n) → 0 for every k ≥ d_low, and L_{d_low − 1}(M_n) → L_{d_low − 1}(F, P_K) > 0, with the same rate of convergence. Since 0 < ε < L_{d_low − 1}(F, P_K), as n → ∞, w.h.p., L_{d_low − 1}(M_n) > ε and L_k(M_n) < ε for all k ≥ d_low. Therefore, w.h.p., d̂_low(M_n, ε) = d_low, and the result follows.

From Corollary 3.12, d̂_low(M_n, ε) can be used as a consistent estimator of d_low(F, P_K). However, we need to know how to choose an appropriate ε for d̂_low(M_n, ε), and hence estimation of L_k(F, P_K) is still necessary. Therefore, in practice, we suggest one use a statistical approach estimating L_k(F, P_K) to infer d_low(F, P_K), instead of using d̂_low(M_n, ε) directly. The details are discussed in Section 3.5.

3.4 Algorithm for K̂_a and L_k(M)

For ease of implementation, we combine Definition 3.9 and Definition 2.2 and summarize them as algorithms for the computation of K̂_a and L_k(M). Algorithm 1 is for K̂_a.

Algorithm 1
Computation of K̂_a

INPUTS:
(1) M: an m × n real matrix without repeated values in any row; namely, a matrix in M°_{m,n}.
(2) a: an integer in [n], referring to the a-th column of M.

OUTPUT:
[K̂_a(t)]_t: a filtration of simplicial complexes, where t ∈ { 0, 1/n, 2/n, ..., t̂_max(a) } (see Step 2 or Definition 3.9 for the definition of t̂_max(a)).

STEPS:

Step 1:
For i ∈ [m] and b ∈ [n], recall from Definition 3.9 the order (a positive integer)

ord_i(M, b) := #{ c ∈ [n] : M_ic ≤ M_ib }.

Step 2:
Define t̂_max(a) := (1/n) · max_{i ∈ [m]} ord_i(M, a).

Step 3:
Define an increasing filtration of simplicial complexes K̂_a by

K̂_a(t) := Δ( { σ_b : σ_b = { i ∈ [m] : ord_i(M, b) ≤ ord_i(M, a) − n(t̂_max(a) − t) } }_{b ∈ [n]} ),

where t ∈ { 0, 1/n, 2/n, ..., t̂_max(a) }.

The next algorithm, Algorithm 2, is for computing L_k(M). Note that, in the algorithm, PersistenceIntervals is a function with two inputs: a filtration of simplicial complexes, and a positive integer that limits the dimension in the computation of persistent homology, to avoid possibly intractable computational complexity. As the name suggests, the output of PersistenceIntervals is the persistence intervals of the first input in dimensions less than or equal to the second input.

Algorithm 2
Computation of L_k(M)

INPUTS:
(1) M: an m × n real data matrix.
(2) d_up: a positive integer, used to limit the dimension of the computation of persistent homology; namely, the persistent homology is only computed in dimensions 0, 1, ..., d_up to make it computationally feasible.

OUTPUT:
[L_k(M) for k = 0, 1, ..., d_up]: an array of nonnegative real numbers.

STEPS:

Step 1:
For a ∈ [n], compute

I_a := PersistenceIntervals(K̂_a, d_up) = { dgm(H_k(K̂_a)) : k = 0, 1, ..., d_up },

and

L_{a,max} := { max{ β − α : (α, β) ∈ dgm(H_k(K̂_a)) } : k = 0, 1, ..., d_up } = { l_max(k, K̂_a) : k = 0, ..., d_up },

where PersistenceIntervals(K̂_a, d_up) computes the persistence intervals of K̂_a in dimensions up to d_up.

Step 2:
For k = 0, 1, ..., d_up, compute L_k(M) by

L_k(M) := max{ l_max(k, K̂_a) : a ∈ [n] }.

The worst-case complexity of a standard algorithm for computing the persistent homology of a 1-dimensional filtration of simplicial complexes is cubic in the number of simplices (see, e.g., Section 5.3.1 in [10] and references therein). Since each K̂_a starts from the empty simplicial complex and ends at the full simplex Δ^{m−1}, we would in principle need to go through all faces of Δ^{m−1}. However, since we limit the computation to dimensions 0, 1, ..., d_up, where d_up ≤ m − 1, we only need the (min{ d_up + 1, m − 1 })-skeleton of Δ^{m−1}. Therefore, for our algorithm, the number of faces in the 1-dimensional filtration is

Σ_{k=0}^{min{ d_up + 2, m }} C(m, k),

which is O(m^{d_up + 2}). Since there are n choices of a ∈ [n], the worst-case complexity of computing { L_k(M) }_{k=0}^{d_up} is O(n · (m^{d_up + 2})^3) = O(n · m^{3 d_up + 6}), which is of degree 3 d_up + 6 in m but only linear in n.

Since the algorithm is linear in n, even in the case when n is large, as long as m is not too large, the algorithm is still tractable. Moreover, to use the full power of Theorem 3.10, we would want n to be large. In the case when n is large, we may subsample the points (i.e., the columns) to see how large the variance of L_k(M_n) is; this is called the bootstrap in statistics. Moreover, we can implement the subsampling for different numbers of columns and observe the trend of convergence.

On the other hand, to infer the dimension d(F, P_K), we need at least m ≥ d(F, P_K) + 1. Thus, we want m to be not too small. However, since the computational complexity of L_k(M_n) grows as a high-degree polynomial in m, we cannot have m too large. In the case when m is too large, we can overcome the computational difficulty by subsampling the functions (i.e.
the rows); namely, pick randomly m_s, say m_s = 10, functions, which correspond to their respective m_s rows of M_n, compute the L_k of the submatrix thus formed, repeat this process many times, and see how the result is distributed.

We elaborate on these two methods (subsampling points or subsampling functions) in the following two subsections. We also implement the methods for estimating the embedding dimension in their appropriate situations, plot the results, and give some principles for decision making (i.e., deciding, given the plot and k, whether we accept L_k(F, P_K) > 0).

3.5.1 Subsample points when n is sufficiently large

In the case when n is sufficiently large, say n ≥ n_s, we may subsample n_s points (i.e., columns of M_n) and obtain variance information. Moreover, letting n_s go up, we can further observe how the trend of convergence goes, which, by Theorem 3.10, should approach the true L_k(F, P_K). The technique of subsampling is called the bootstrap in statistics.

Figure 5 shows boxplots of L_k(M_{n_s}) obtained by implementing this idea under different settings of (d, m, n), where d = d(F, P_K) is the dimension of (F, P_K), m is the number of functions, and n is the number of data points. (The boxplot of a collection of real numbers is a box together with an upper whisker and a lower whisker attached to the top and bottom of the box, and possibly some dots above the upper whisker or below the lower whisker. From the box, one can read off the first quartile Q_1, the median Q_2, and the third quartile Q_3; Q_3 − Q_1 is the interquartile range (IQR). The lower and upper whiskers mark the values Q_1 − 1.5·IQR and Q_3 + 1.5·IQR, respectively. Values outside the whiskers are regarded as outliers and are drawn as dots.) Here, we choose (m, n) to be (10, 350), so that n is sufficiently large for subsampling. Subsampling is repeated 100 times for each boxplot. To compare with the result of a purely random matrix, we also generate a 10 × 350 matrix whose entries are i.i.d. from Unif(0, 1) and compute its L_k's. The details of how the boxplots are generated are in the caption of Figure 5.

[Figure 5 panels: (a) d = 3, (b) d = 4, (c) d = 5, (d) M: 10 × 350 random matrix.]

Figure 5: The four panels are boxplots of L_k obtained from subsampling the points. Throughout the panels, m = 10 and n = 350, where m is the number of functions and n is the number of total sample points. The panels correspond to (a) d = 3, (b) d = 4, (c) d = 5, where d = d(F, P_K) is the dimension of (F, P_K). The functions are chosen to be random quadratic functions defined on the unit d-ball in R^d. Panel (d) is obtained by computing the L_k's of an m × n matrix M_n with entries i.i.d. from Unif(0, 1), which is treated as a purely random matrix and whose main purpose is comparison with the other panels. Each figure in each panel is generated by subsampling n_s = 50, 150, 200 columns of M_n, repeated 100 times. By the decision principle, every figure in panel (a) successfully infers its respective true dimension; in panel (b), n_s = 50 fails to infer the true dimension 4 and only infers a lower bound 3, while n_s = 150, 200 successfully infer the true dimension 4; in panel (c), both n_s = 50, 150 fail to infer the true dimension 5 and only infer a lower bound 4, while n_s = 200 successfully infers the true dimension 5. In panel (d), the figure has quite different behavior.

The decision principle we propose to follow is: on each boxplot of L_k(M_{n_s}), if the first quartile Q_1 is not greater than 0, reject L_k(F, P_K) > 0; otherwise, accept L_k(F, P_K) > 0.

For panel (a), where d = 3, we can see that, as n_s goes up, the variance of L_k(M_{n_s}) for each k goes down. For k = 2, the first quartile of L_k(M_{n_s}) is greater than 0 even for n_s = 50; for k ≥ 3, L_k(M_n) stays at 0, with only some noise-like dots, throughout. According to this principle, we can conclude d ≥ 3, which recovers the true dimension d = 3. For panel (b), where d = 4, the same shrinking-variance behavior can be observed. Moreover, the principle concludes d ≥ 4 from n_s = 150 on, where the first quartile starts to stay away from 0. Similarly, for panel (c), where d = 5, at n_s = 50 and n_s = 150 our principle concludes only d ≥ 4, while at n_s = 200 it concludes d ≥ 5. For larger d, we would need n_s to be larger to make the best conclusion (i.e., inferring the true dimension). However, as n_s goes up, the variance of L_k(M_{n_s}) goes down, and we may also use this information. Therefore, in the small-sample case, one may rely on this convergence behavior and develop other principles by quantifying the trend of convergence. For example, in panel (c), where d = 5, when n_s goes up from 50 to 150, we observe that L_4(M_{n_s}) pokes out from noise-like outliers to a filled box. This trend suggests that we "may accept" L_4(F, P_K) > 0.

3.5.2 Subsample functions when m is large

As we mentioned earlier, the worst-case computational complexity of L_k(M_n) grows polynomially, but with high degree 3 d_up + 6, in m, the number of rows of M_n. To overcome this difficulty, we propose to subsample the rows (i.e., the collection of functions) of M_n.
Specifically, for a fixed number m_s < m, we randomly choose m_s rows of M_n, construct the m_s × n submatrix M_{m_s × n} accordingly, compute L_k(M_{m_s × n}), and repeat the process as many times as assigned. Figure 6 shows boxplots of L_k(M_{m_s × n}) with m_s = 10, repeated N_rep = 1000 times, under different settings. Notice that throughout the plots, m = 100, n = 150, m_s = 10, and N_rep = 1000. We still adopt the principle of the last subsection: we only accept L_k(F, P_K) > 0 when the first quartile (i.e., the 25th percentile) of the boxplot of L_k is greater than 0. (It can be proved that, if the entries of M_n ∈ M°_{m,n} are i.i.d. from Unif(0, 1), then L_k(M_n) behaves as positive for k ≤ m − 2, while L_k(M_n) behaves as going to zero for k ≥ m − 1; thus L_k(M_n) reflects m instead of an intrinsic d.)

[Figure 6 panels: (a) d = 2, (b) d = 3, (c) d = 4, (d) d = 5.]

Figure 6: Boxplots of L_k obtained from subsampling functions. Throughout the panels, m = 60, n = 150, and m_s = 10, where m is the number of functions (rows of M_n), n is the number of total sample points, and m_s is the number of functions used in each function subsampling. Each panel is generated under a fixed regular pair with (a) d = 2, (b) d = 3, (c) d = 4, (d) d = 5, where d is the true dimension of the regular pair. Function subsampling is repeated 1000 times for each panel.

A lower bound for d(F, P_K) may not be very satisfactory. In Sections 4 and 5, we develop some theory and methods to decide whether the lower bound obtained in this section is indeed the dimension d(F, P_K).

4 d̂_low(M, ε) as an asymptotically consistent dimension estimator in the class of complete regular pairs

We established in Section 3 that a lower bound d_low(F, P_K) of d(F, P_K) is generally inferable from sampled data. Here we provide a sufficient condition for d_low(F, P_K) = d(F, P_K); this ensures that the dimension d(F, P_K) can be inferred with high probability. Recall that the conic hull of a set S ⊆ R^d, denoted cone(S), is the set

cone(S) := { Σ_{i=1}^k c_i v_i | c_1, ..., c_k ≥ 0, v_1, ..., v_k ∈ S, k ∈ N }. (18)

Definition 4.1.
Let (F, P_K) be a regular pair, where F = { f_i }_{i ∈ [m]} and each f_i : K → R is differentiable. The set

Cent¹ = { x ∈ K | cone({ ∇f_1(x), ..., ∇f_m(x) }) = R^d }

is called the type 1 central region of (F, P_K).

Definition 4.2.
A regular pair (F, P_K) is said to be complete if its Cent¹ is non-empty.

It is perhaps intuitive (see Figure 4 on page 14) that, for a sufficiently nice complete regular pair, the lower bound in Lemma 3.8 is indeed the dimension d(F, P_K). More precisely:

Theorem 4.3.
Let (F, P_K) be a regular pair, where F = { f_i }_{i ∈ [m]} and each f_i : K → R is differentiable. If (F, P_K) is complete, then the lower bound in Lemma 3.8 is indeed the dimension of the regular pair, i.e., d_low(F, P_K) = d(F, P_K).

The proof is given in Section 4.1. An immediate corollary of the above theorem and Corollary 3.12 is the following.
Theorem 4.4.
Let (F, P_K) be a regular pair satisfying the conditions in Theorem 3.10, where F = { f_i }_{i ∈ [m]} and each f_i : K → R is differentiable. If (F, P_K) is a complete regular pair with dimension d = d(F, P_K), and matrices M_n ∈ M°_{m,n} are sampled from (F, P_K), then for every ε ∈ (0, L_{d−1}(F, P_K)),

lim_{n→∞} Pr[ d̂_low(M_n, ε) = d(F, P_K) ] = 1. (19)

In other words, d̂(ε) : M°_{m,n} → N defined by d̂(ε)(M_n) := d̂_low(M_n, ε) is an asymptotically consistent estimator in the class of complete regular pairs.

Proof.
By Theorem 4.3, d_low(F, P_K) = d(F, P_K). Moreover, by Corollary 3.12,

lim_{n→∞} Pr[ d̂_low(M_n, ε) = d_low(F, P_K) ] = 1.

Thus, the result follows.
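Once the values L_k(M) have been produced by Algorithm 2, the estimator of Definition 3.11 (used in Theorem 4.4) is a one-line computation. A minimal sketch, assuming `L` is a list with `L[k]` holding L_k(M); the empty-max case, which the definition leaves open, is returned as 0 here by convention:

```python
def d_hat_low(L, eps):
    # Definition 3.11: d_hat_low(M, eps) = 1 + max{ k : L_k(M) > eps }.
    # L is assumed to hold precomputed values [L_0(M), ..., L_{d_up}(M)],
    # e.g. the output of Algorithm 2.  If no L_k exceeds eps, we return 0
    # (a convention: the maximum is then over the empty set).
    ks = [k for k, lk in enumerate(L) if lk > eps]
    return 1 + max(ks) if ks else 0
```

By Theorem 4.4, any fixed ε ∈ (0, L_{d−1}(F, P_K)) makes this consistent for complete regular pairs; since that interval is unknown in practice, Section 3.5 recommends inspecting bootstrap distributions of L_k instead of committing to a single ε.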
Recall the following notation from Section 3. Let (F, P_K) = ({ f_i }_{i ∈ [m]}, P_K) be a regular pair; for any t ∈ [0, 1], K^(i)(t) := f_i^{-1}(−∞, λ_i(t)), where λ_i(t) is a monotone-increasing function that satisfies P_K(f_i^{-1}(−∞, λ_i(t))) = t. For any x ∈ K, we also denote T_i(x) := P_K(f_i^{-1}(−∞, f_i(x))). Theorem 4.3 follows from the following key lemma.
Lemma 4.5.
Let ({ f_i }_{i=1}^m, P_K) be a complete regular pair. Suppose x_0 ∈ Cent¹. Then there exists ε > 0 such that

nerve( { K^(i)(T_i(x_0) − t) }_{i ∈ [m]} ) ≃ S^{d−1}, for all t ∈ [0, ε). (20)

Proof of Theorem 4.3.
By Lemma 3.8, d_low(F, P_K) ≤ d(F, P_K). We therefore only need to prove L_{d−1}(F, P_K) > 0. Let x_0 ∈ Cent¹. By Lemma 4.5, l_max(d − 1, K_{x_0}) > 0, and hence L_{d−1}(F, P_K) > 0, completing the proof.
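The completeness condition of Definition 4.1 asks whether the gradients at a point positively span R^d. For intuition (this is not part of the paper's pipeline, and an exact test would use linear programming), note that cone({v_1, ..., v_m}) = R^d exactly when the dual cone { w : ⟨v_i, w⟩ ≥ 0 for all i } is {0}; a randomized sketch can therefore look for a nonzero dual witness:

```python
import math
import random

def cone_probably_full(vectors, dim, trials=5000, seed=0):
    # Heuristic check of the Cent^1 condition: cone(vectors) = R^dim
    # iff the only w with <v_i, w> >= 0 for all i is w = 0.  We search
    # for a nonzero witness w by sampling random unit vectors; finding
    # one certifies cone != R^dim, while finding none only suggests
    # (does not prove) that the cone is all of R^dim.
    rng = random.Random(seed)
    for _ in range(trials):
        w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            continue
        w = [c / norm for c in w]
        if all(sum(vi * wi for vi, wi in zip(v, w)) >= 0.0 for v in vectors):
            return False  # w lies in the dual cone, so cone(vectors) != R^dim
    return True
```

For example, in R^2 the gradients (1, 0), (0, 1), (−1, −1) positively span the plane, while (1, 0) and (0, 1) alone do not.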
Proof of Lemma 4.5.
For each i ∈ [m], we denote by

Ω_i = K^(i)(T_i(x_0)) = { x ∈ K | f_i(x) < f_i(x_0) }

the appropriate open convex sublevel set of f_i. Note that K^(i)(T_i(x_0) − t) ⊆ Ω_i for any t ≥ 0, and the nerve on the left-hand side of (20) is a subcomplex of nerve({ Ω_i }_{i ∈ [m]}).

For each non-empty σ ∈ nerve({ Ω_i }_{i ∈ [m]}), the subset ∩_{i ∈ σ} Ω_i is open and non-empty, and hence has nonzero P_K measure. Thus there exists ε_σ > 0 such that ∩_{i ∈ σ} K^(i)(T_i(x_0) − t) is non-empty for any t ∈ [0, ε_σ). Choosing ε to be the minimum of all such ε_σ thus guarantees that

nerve( { K^(i)(T_i(x_0) − t) }_{i ∈ [m]} ) = nerve( { Ω_i }_{i ∈ [m]} ) for all t ∈ [0, ε).

It thus suffices to prove (20) for t = 0. Since each Ω_i is open and convex, by the nerve lemma (see, e.g., Theorem 10.7 in [1] or Corollary 4G.3 in [7]) it is enough to show that ∪_{i ∈ [m]} Ω_i ≃ S^{d−1}. Moreover, since x_0 lies on the boundary of each Ω_i, the union { x_0 } ∪ ∪_{i ∈ [m]} Ω_i is star-shaped (see Lemma 6.24 in Section 6.5). Therefore, it suffices to prove that there exists η > 0 such that

B_η(x_0) \ { x_0 } ⊆ ∪_{i ∈ [m]} Ω_i, (21)

where B_η(x_0) = { x ∈ R^d : ‖x − x_0‖ < η }.

Suppose no such η > 0 exists. Then, for every n ∈ N, there exists a unit vector v_n ∈ S^{d−1} = { x ∈ R^d : ‖x‖ = 1 } such that x_n = x_0 + (1/n)·v_n ∉ ∪_{i ∈ [m]} Ω_i. By compactness of S^{d−1}, there is an infinite subsequence { v_{n_j} } that converges to a particular v* ∈ S^{d−1}. Since all f_i are differentiable,

n_j ( f_i(x_{n_j}) − f_i(x_0) ) = ⟨∇f_i(x_0), v_{n_j}⟩ + O(1/n_j). (22)

Since x_{n_j} ∉ Ω_i for all i, we have f_i(x_{n_j}) ≥ f_i(x_0). Taking lim inf_{j→∞} on both sides of equation (22), we conclude that ⟨∇f_i(x_0), v*⟩ ≥ 0 for all i ∈ [m]. Since −v* ∈ cone({ ∇f_i(x_0) }_{i ∈ [m]}) = R^d, choosing appropriate nonnegative coefficients in (18) yields 0 ≤ ⟨−v*, v*⟩, and hence v* = 0, a contradiction. Therefore the inclusion (21) holds for some η > 0.

5 Completeness of (F, P_K) from sampled data

Theorem 4.3 establishes that completeness of (F, P_K) implies d_low(F, P_K) = d(F, P_K), and thus the data dimension d(F, P_K) can be inferred from sampled data. Unfortunately, completeness cannot be directly tested from sampled data, since the gradient information is not directly accessible from discrete samples.
Here we consider a different notion of central region, Cent⁰ ⊆ K, which, under some generic assumption, is indistinguishable from Cent¹ in the probability measure P_K (Lemma 5.3). We also establish that the probability measure of Cent⁰ can be approximated from sampled data (Theorem 5.5). This enables one to test completeness of a regular pair from sampled data.

Definition 5.1.
Let (F, P_K) be a regular pair. The subset

Cent⁰ = { x ∈ K | ∩_{i ∈ [m]} f_i^{-1}(−∞, f_i(x)) = ∅ }

is called the type 0 central region of (F, P_K).

Definition 5.2.
A set of vectors V = { v_1, ..., v_m } ⊆ R^d is said to be in general direction if, for every σ ⊆ [m] with |σ| ≤ d, the set of vectors { v_i }_{i ∈ σ} is linearly independent. A collection of differentiable functions F = { f_i : K → R }_{i ∈ [m]} is said to be in general position if, for (Lebesgue) almost every x in K, the vectors { ∇f_i(x) }_{i ∈ [m]} are in general direction.

Lemma 5.3.
Let (F, P_K) be a regular pair, where each function in F is differentiable. Assume that F is in general position. Then

P_K(Cent¹ \ Cent⁰) = P_K(Cent⁰ \ Cent¹) = 0. (23)

The proof is given in Section 5.1. It can be shown that Cent¹ of a regular pair is an open set (see Lemma 6.25 in the Appendix). Thus completeness of a regular pair (F, P_K) is equivalent to P_K(Cent¹) > 0. Lemma 5.3 ensures that completeness of a regular pair in general position is equivalent to P_K(Cent⁰) > 0. In order to test whether P_K(Cent⁰) > 0, one can use the following natural discretization.
Definition 5.4.
For a matrix M ∈ M°_{m,n}, the set

Ĉent(M) := { a ∈ [n] : ∩_{i ∈ [m]} { b ∈ [n] : M_ib < M_ia } = ∅ }

is called the discretized central region.

If a matrix M ∈ M°_{m,n} is sampled from a regular pair, then for each a ∈ [n], the set { b ∈ [n] : M_ib < M_ia } is a discretization of f_i^{-1}(−∞, f_i(x_a)), and Ĉent(M) can be thought of as an approximation of Cent⁰. The following theorem confirms this intuition.

Theorem 5.5.
Let M_n ∈ M°_{m,n} be sampled from a regular pair. Then (1/n)·#(Ĉent(M_n)) converges to P_K(Cent⁰) in probability: for all ε > 0,

lim_{n→∞} Pr[ | (1/n)·#(Ĉent(M_n)) − P_K(Cent⁰) | > ε ] = 0.

The proof involves technicalities used in proving the Interleaving Convergence Theorem (Theorem 2.9) and is given in Section 6.7 in the Appendix. Theorem 5.5 establishes that (1/n)·#(Ĉent(M_n)) serves as an approximation of P_K(Cent⁰), and thus enables one to test whether P_K(Cent⁰) > 0, i.e., the completeness of (F, P_K).

5.1 Proof of Lemma 5.3

First we prove the first part of Lemma 5.3.
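As an implementation note for Theorem 5.5: the discretized central region of Definition 5.4 is directly computable from the data matrix. A minimal sketch (pure Python, O(m·n²); `M` is given as a list of rows):

```python
def discretized_central_region(M):
    # Definition 5.4: column a belongs to the discretized central region
    # iff the intersection over all rows i of { b : M[i][b] < M[i][a] }
    # is empty, i.e. no single column is strictly below column a in
    # every row.
    m, n = len(M), len(M[0])
    central = []
    for a in range(n):
        below = set(range(n))
        for i in range(m):
            below &= {b for b in range(n) if M[i][b] < M[i][a]}
            if not below:
                break
        if not below:
            central.append(a)
    return central
```

By Theorem 5.5, `len(discretized_central_region(M)) / n` approximates P_K(Cent⁰), so a fraction bounded away from 0 over increasing n is evidence of completeness.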
Lemma 5.6.
Let (F, P_K) = ({ f_i }_{i ∈ [m]}, P_K) be a regular pair, where each function in F is differentiable. Assume that F is in general position. Then P_K(Cent¹ \ Cent⁰) = 0.

Proof. Let K′ = { x ∈ K | ∃ i, ∇f_i(x) = 0 } denote the union of the critical points of the functions in F. Since F is in general position, K′ has Lebesgue measure zero. Assume x_0 ∈ Cent¹ \ K′, so that

for all u ∈ R^d, there exist c_1, ..., c_m ≥ 0 such that u = Σ_{i=1}^m c_i ∇f_i(x_0). (24)

It can be easily shown (see, e.g., Theorem 3.2.3 in [4]) that if f is differentiable and quasi-convex on an open convex K with ∇f(x_0) ≠ 0, then f(x) < f(x_0) implies ⟨∇f(x_0), x − x_0⟩ < 0. Thus

∩_{i ∈ [m]} f_i^{-1}(−∞, f_i(x_0)) ⊆ ∩_{i ∈ [m]} { x | ⟨∇f_i(x_0), x − x_0⟩ < 0 } = ∅,

where the last equality follows from (24), as one can choose u = x − x_0. This implies x_0 ∈ Cent⁰. Therefore, Cent¹ \ K′ ⊆ Cent⁰, and P_K(Cent¹ \ Cent⁰) ≤ P_K(K′) = 0.

To prove the second half of Lemma 5.3, we first recall that a convex cone C ⊆ R^d is called flat if there exists w ≠ 0 such that both w ∈ C and −w ∈ C. Otherwise, it is called salient. If a convex cone C is closed and salient, then there exists w ∈ R^d such that ⟨u, w⟩ < 0 for all non-zero u ∈ C.

Lemma 5.7.
Let (F, P_K) = ({ f_i }_{i ∈ [m]}, P_K) be a regular pair, where each f_i is differentiable. Then

Cent⁰ \ Cent¹ ⊆ { x ∈ K : cone({ ∇f_i(x) }_{i ∈ [m]}) is flat but not R^d }. (25)

Proof.
Let x_0 ∈ Cent⁰ \ Cent¹. Denote C = cone({ ∇f_i(x_0) }_{i ∈ [m]}). Since x_0 ∉ Cent¹, C ≠ R^d. Thus, it suffices to prove that the cone C is flat. Suppose that the cone C is not flat. Then there exists w such that ⟨u, w⟩ < 0 for all non-zero u ∈ C; in particular, ⟨∇f_i(x_0), w⟩ < 0 for all i ∈ [m]. Let us show that, for every i ∈ [m], there exists α_i > 0 such that x_0 + α_i w ∈ f_i^{-1}(−∞, f_i(x_0)). Suppose not; then there exists i ∈ [m] such that f_i(x_0 + αw) ≥ f_i(x_0) for all α > 0, and hence

⟨∇f_i(x_0), w⟩ = lim inf_{α → 0+} ( f_i(x_0 + αw) − f_i(x_0) ) / α ≥ 0,

which is a contradiction. Thus, such positive α_i's exist, and we obtain that

x_0 + ( min_{i ∈ [m]} α_i ) w ∈ ∩_{i ∈ [m]} f_i^{-1}(−∞, f_i(x_0)).

This contradicts the assumption that x_0 ∈ Cent⁰. Therefore the cone C is flat.

It can be shown that the inclusion in (25) is in fact an equality; however, since we do not need the equality here, it is left out of the proof. To finish the proof of Lemma 5.3, we use the following lemma. (Salient cones are also called pointed cones. It is well known (see, e.g., Section 2.6.1 in [3]) that if C ⊂ R^d is a closed salient cone, then its dual cone C* := { w ∈ R^d : ⟨w, u⟩ ≥ 0 for all u ∈ C } has nonempty interior, and hence so does −C* = { w ∈ R^d : ⟨w, u⟩ ≤ 0 for all u ∈ C }. Since, for any fixed non-zero u, the set { w ∈ R^d : ⟨w, u⟩ = 0 } has measure zero, the interior of −C* contains a vector w with ⟨w, u⟩ < 0 for all non-zero u ∈ C, and any such vector satisfies the property claimed above.)

Lemma 5.8. Let V = { v_1, ..., v_m } ⊂ R^d be a set of vectors in general direction. Then either cone(V) = R^d or cone(V) is salient.

To prove Lemma 5.8, we use the following lemma.
Lemma 5.9 (see, e.g., Theorem 2.5 in [12]). Let V = { v_1, ..., v_m } be a set of non-zero vectors in R^d. Then the following two statements are equivalent:

(i) cone(V) = span(V);
(ii) for each i ∈ [m], −v_i ∈ cone(V \ { v_i }).

Proof of Lemma 5.8. For m ≤ d, the vectors { v_1, ..., v_m } are linearly independent, by the general-direction assumption. Suppose there exists w ∈ R^d such that w, −w ∈ cone(V). Then there exist a_i, b_i ≥ 0 such that w = Σ_{i=1}^m a_i v_i = −Σ_{i=1}^m b_i v_i. Since the vectors { v_1, ..., v_m } are linearly independent, a_i + b_i = 0 for all i ∈ [m], and thus w = 0. Therefore cone(V) is salient.

For m > d, we prove the result by induction on the size of V. Suppose the result holds for any set of m ≥ d vectors in general direction. Let V = { v_1, ..., v_{m+1} } be a set of m + 1 vectors in general direction. Since any d vectors in V form a basis of R^d, span(V) = R^d. Suppose the result is false for V; equivalently, cone(V) ≠ R^d and cone(V) is flat. By Lemma 5.9, there exists j ∈ [m + 1] such that

−v_j ∉ cone(V \ { v_j }). (26)

Since cone(V) is flat, there exists a nonzero w ∈ R^d such that w, −w ∈ cone(V), and thus w = Σ_{i=1}^{m+1} a_i v_i = −Σ_{i=1}^{m+1} b_i v_i, with a_i, b_i ≥ 0 for all i. Let us prove that a_j + b_j > 0. If a_j + b_j = 0, then a_j = b_j = 0. Thus w, −w ∈ cone(V \ { v_j }), and cone(V \ { v_j }) is not salient. Since |V \ { v_j }| = m, by the induction hypothesis, we must have cone(V \ { v_j }) = R^d. However, cone(V) ≠ R^d, and hence cone(V \ { v_j }) ≠ R^d, a contradiction. Therefore a_j + b_j > 0, and we can conclude that

−v_j = Σ_{i ∈ [m+1] \ { j }} ( (a_i + b_i)/(a_j + b_j) ) v_i ∈ cone(V \ { v_j }),

contradicting (26). Therefore, the result holds for any V in general direction of size |V| = m + 1. This completes the proof by induction.

We now finish the proof of Lemma 5.3.

Proof of Lemma 5.3.
The first half of the proof of Lemma 5.3 is done in Lemma 5.6. To prove the second half, we combine Lemma 5.7 and Lemma 5.8 to obtain

Cent⁰ \ Cent¹ ⊆ { x ∈ K : { ∇f_i(x) }_{i ∈ [m]} is not in general direction }. (27)

Since { f_i }_{i ∈ [m]} is in general position, the right-hand side of (27) has measure zero, completing the proof.

Proof.
For any a ≤ n − 1, the point x_a is ordered last in the sequence s_a = (···, n, a); thus, by Lemma 1.3, each such point x_a cannot be in the interior of the convex hull of the other points; therefore x_n ∈ conv(x_1, ..., x_{n−1}). Assume that the embedding dimension is d ≤ n − 3. Then, by Carathéodory's theorem, we conclude that there exists b ∈ [n − 1] such that

x_n ∈ conv(x_1, ..., x̂_b, ..., x_{n−1}). (28)

However, by assumptions (3) there exists a continuous quasi-convex function f_b such that f_b(x_a) [...], contradicting (28). For d ≥ 2, one can place the points x_1, ..., x_{n−1} at the vertices of an (n − 2)-simplex in R^{n−2}, and place x_n at the barycenter of that simplex. By construction, { x_1, ..., x_{n−1} } are convexly independent, and we have the following convex-hull relations for every i < n: x_n ∉ conv({ x_1, ..., x_{n−1} } \ { x_i }), and x_i ∉ conv({ x_1, ..., x_n } \ { x_i }). Therefore, by Lemma 1.3, there exist quasi-convex continuous functions that realize the sequences in (3).

6.2 λ_i(t) for t ∈ (0, 1)

Lemma (Lemma 2.4). Let f : K → R be a continuous function with P_K(f^{-1}(ℓ)) = 0 for all ℓ ∈ R, where P_K is a probability measure on a convex open set K and P_K is equivalent to the Lebesgue measure on K. Then there exists a unique strictly increasing continuous function λ : (0, 1) → R such that, for all t ∈ (0, 1),

P_K( f^{-1}(−∞, λ(t)) ) = t. (29)

Proof. Since K is path-connected, by the intermediate value theorem, f(K) is an interval in R. Define a function p_f : f(K) → [0, 1] by p_f(ℓ) := P_K(f^{-1}(−∞, ℓ)). Rewriting equation (5) as p_f(λ(t)) = t, we note that λ(t) (if it exists) is the inverse of p_f, proving uniqueness of λ(t). For the existence and continuity of λ(t), it suffices to prove that p_f is continuous and strictly increasing.

To prove that p_f is continuous, we prove that it is continuous from the right and from the left. (Recall that a sequence (ℓ_n)_n ⊂ R goes up to ℓ_0 ∈ R, denoted ℓ_n ↗ ℓ_0, if ℓ_n ≤ ℓ_{n+1} for all n and lim_{n→∞} ℓ_n = ℓ_0; ℓ_n ↘ ℓ_0 is defined similarly. A sequence of sets (A_n)_n goes up to a set A, denoted A_n ↗ A, if A_n ⊆ A_{n+1} for all n and ∪_n A_n = A; A_n ↘ A is defined similarly.) Let ℓ_0 ∈ f(K). For ℓ_n ↗ ℓ_0 in f(K), by definition, f^{-1}(−∞, ℓ_n) ↗ f^{-1}(−∞, ℓ_0). Since P_K is a measure, applying P_K on both sides, we obtain p_f(ℓ_n) ↗ p_f(ℓ_0). Thus p_f is continuous from the left. On the other hand, for ℓ_n ↘ ℓ_0 in f(K), by definition,

( ∩_{n=1}^∞ f^{-1}(−∞, ℓ_n) ) \ f^{-1}(−∞, ℓ_0) = f^{-1}(ℓ_0).

Thus p_f(ℓ_n) ↘ P_K( ∩_{n=1}^∞ f^{-1}(−∞, ℓ_n) ) = P_K(f^{-1}(−∞, ℓ_0)) + P_K(f^{-1}(ℓ_0)) = p_f(ℓ_0) + 0 = p_f(ℓ_0), and p_f is continuous from the right. Therefore, p_f is a continuous function.

Now we turn to proving that p_f is strictly increasing. For ℓ_1 < ℓ_2 in f(K), we need to prove p_f(ℓ_1) < p_f(ℓ_2). Let U_1 = f^{-1}(−∞, ℓ_1) and U_2 = f^{-1}(−∞, ℓ_2), which are open convex sets with U_1 ⊆ U_2. Since f(K) is an interval, for any ℓ ∈ (ℓ_1, ℓ_2), there exists x ∈ K with f(x) = ℓ. Thus U_1 ≠ U_2. Note that U_2 ⊄ cl(U_1); otherwise, U_1 ⊆ U_2 = int(U_2) ⊆ int(cl(U_1)) = U_1 would imply U_1 = U_2, where the last equality follows from openness and convexity of U_1. Thus, there exists x_0 ∈ U_2 \ cl(U_1). Choose ε > 0 such that B(x_0, ε) ⊆ U_2 but B(x_0, ε) ∩ cl(U_1) = ∅. Then P_K(U_2) ≥ P_K(U_1) + P_K(B(x_0, ε)) > P_K(U_1); equivalently, p_f(ℓ_2) > p_f(ℓ_1). Hence, p_f is strictly increasing.

6.3 Proof of Interleaving Convergence Theorem

The goal of this subsection is to prove Theorem 2.9, the Interleaving Convergence Theorem. The asymptotic behavior of Dow(S(M)) actually follows from the asymptotic behaviors of several building blocks of Dow(S(M)).
We will first define these building blocks and prove their own asymptotic theorems, and then put these asymptotic theorems together to prove the Interleaving Convergence Theorem.

We start with an object that, as will be seen, can be used to express $\mathrm{Dow}(\mathcal{F}, P_K)$. Recall that, for $t \in [0,1]$ and $i \in [m]$,
$$K^{(i)}(t) \stackrel{\mathrm{def}}{=} f_i^{-1}(-\infty, \lambda_i(t)),$$
where $\lambda_i(t) \in \mathbb{R}$ satisfies $P_K(f_i^{-1}(-\infty, \lambda_i(t))) = t$.

Definition 6.1. For a regular pair $(\mathcal{F}, P_K)$, define a function $R_\infty : [0,1]^m \to [0,1]$ by
$$R_\infty(t_1, \dots, t_m) \stackrel{\mathrm{def}}{=} P_K\left( \bigcap_{i \in [m]} K^{(i)}(t_i) \right).$$

It is easy to see that $R_\infty$ is a cumulative distribution function (CDF). We next introduce another CDF, denoted $R_n$, which will be used as an intermediate between $\mathrm{Dow}(\mathcal{F}, P_K)$ and $\mathrm{Dow}(S(M))$. ($A_n \searrow A$ is defined similarly to $A_n \nearrow A$.)

Definition 6.2. For a point cloud $X_n \subset K$ of size $n$, sampled from a regular pair, we define a function $R_n : [0,1]^m \to [0,1]$ by
$$R_n(t_1, \dots, t_m) \stackrel{\mathrm{def}}{=} \frac{1}{n} \cdot \left| X_n \cap \bigcap_{i \in [m]} K^{(i)}(t_i) \right|.$$

For those familiar with nonparametric statistics, it is easy to see that $R_n$ is in fact the empirical cumulative distribution function (empirical CDF) of $R_\infty$. However, $R_n$ is still not obtainable from the $m \times n$ data matrix $M_n = [f_i(x_a)]$, since $K^{(i)}(t_i)$ is not directly accessible from $M_n$. The next definition solves this problem by considering a step-function approximation $K_n^{(i)}(t)$ of $K^{(i)}(t)$.

Definition 6.3. Let $M_n \in \mathcal{M}^o_{m,n}$ be sampled from a regular pair $(\mathcal{F}, P_K)$, where $\mathcal{F} = \{f_i\}_{i \in [m]}$. For $i \in [m]$ and $t \in [0,1]$, define
$$K_n^{(i)}(t) \stackrel{\mathrm{def}}{=} \begin{cases} f_i^{-1}(-\infty, f_i(x_a)) & \text{if } t \in \left[ \frac{\mathrm{ord}_i(M_n, a) - 1}{n}, \frac{\mathrm{ord}_i(M_n, a)}{n} \right), \ a = 1, \dots, n, \\ K & \text{if } t = 1. \end{cases}$$

For a pictorial illustration of $K_n^{(i)}(t)$, please refer to Figure 7.
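Everything that the constructions here extract from the data matrix passes through the within-row ranks $\mathrm{ord}_i(M_n, a)$. Below is a minimal sketch of this rank computation, together with the empirical quantity $\hat{R}_n$ of Definition 6.4 (below); it assumes distinct entries within each row, which holds almost surely for a regular pair, and the example matrix is hypothetical.

```python
def ord_ranks(M):
    """For each row i of the m x n matrix M, return the 1-based rank
    ord_i(M, a) of entry M[i][a] within its row (rank 1 = smallest)."""
    ranks = []
    for row in M:
        order = sorted(range(len(row)), key=lambda a: row[a])
        r = [0] * len(row)
        for rank, a in enumerate(order, start=1):
            r[a] = rank
        ranks.append(r)
    return ranks

def R_hat(M, t):
    """The quantity of Definition 6.4 (below): the fraction of columns a
    with ord_i(M, a)/n <= t_i for every row i."""
    m, n = len(M), len(M[0])
    r = ord_ranks(M)
    good = sum(1 for a in range(n)
               if all(r[i][a] / n <= t[i] for i in range(m)))
    return good / n

# Hypothetical 2 x 3 data matrix with entries M[i][a] = f_i(x_a).
M = [[0.3, 0.1, 0.2],
     [0.5, 0.9, 0.4]]
print(ord_ranks(M))  # [[3, 1, 2], [2, 3, 1]]
```

Only the ordering of the entries within each row matters here, which is exactly the point: the dimension inference pipeline depends on the data matrix only through these ranks.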
Notice that there is a subscript $n$ in $K_n^{(i)}(t)$, indicating its dependence on the sampled matrix $M_n$. The object in the next definition is obtainable solely from the data matrix $M_n$, sampled from a regular pair.

Definition 6.4. Let $M_n = [M_{ia}]$ be an $m \times n$ data matrix, sampled from a regular pair. Define a function $\hat{R}_n : [0,1]^m \to [0,1]$ by
$$\hat{R}_n(t_1, \dots, t_m) \stackrel{\mathrm{def}}{=} \frac{1}{n} \cdot \left| \left\{ a \in [n] : \frac{\mathrm{ord}_i(M_n, a)}{n} \leq t_i, \ \forall i \in [m] \right\} \right| = \frac{1}{n} \cdot \left| \bigcap_{i \in [m]} \left\{ a \in [n] : \frac{\mathrm{ord}_i(M_n, a)}{n} \leq t_i \right\} \right|.$$

In the following lemma, we rewrite $\hat{R}_n$ in a form that is similar to the definition of $R_n$, which helps build a connection between them.

Lemma 6.5. Let $X_n = \{x_1, \dots, x_n\} \subset K$ be a point cloud, sampled from a regular pair, and let $M_n$ be the corresponding $m \times n$ data matrix. Then, for all $(t_1, \dots, t_m) \in [0,1]^m$,
$$\hat{R}_n(t_1, \dots, t_m) = \frac{1}{n} \cdot \left| X_n \cap \bigcap_{i \in [m]} K_n^{(i)}(t_i) \right|.$$

Proof. For each $i \in [m]$ and $t_i \in [0,1)$, there exists a unique $b \in [n]$ such that $\frac{\mathrm{ord}_i(M_n, b) - 1}{n} \leq t_i < \frac{\mathrm{ord}_i(M_n, b)}{n}$. Then $K_n^{(i)}(t_i) = f_i^{-1}(-\infty, f_i(x_b))$ and
$$X_n \cap K_n^{(i)}(t_i) = \{x_a \in X_n : f_i(x_a) < f_i(x_b)\} = \{x_a \in X_n : \mathrm{ord}_i(M_n, a) < \mathrm{ord}_i(M_n, b)\} = \{x_a \in X_n : \mathrm{ord}_i(M_n, a) \leq n t_i\} = \left\{ x_a \in X_n : \frac{\mathrm{ord}_i(M_n, a)}{n} \leq t_i \right\}.$$
For $t_i = 1$, $K_n^{(i)}(t_i) = K$ and the above equality still holds. Thus, for any $(t_1, \dots, t_m) \in [0,1]^m$,
$$X_n \cap \bigcap_{i \in [m]} K_n^{(i)}(t_i) = \bigcap_{i \in [m]} \left\{ x_a \in X_n : \frac{\mathrm{ord}_i(M_n, a)}{n} \leq t_i \right\}.$$
By the definition of $\hat{R}_n$, the equality follows. $\square$

Now the intuition behind the approximations is quite clear: since $K_n^{(i)}$ is an approximation of $K^{(i)}$, by Lemma 6.5, $\hat{R}_n$ is an approximation of $R_n$. Therefore, $\hat{R}_n$ also approximates $R_\infty$. Next we connect $R_\infty$ and $\hat{R}_n$ with our target objects $\mathrm{Dow}(S(M))$ and $\mathrm{Dow}(\mathcal{F}, P_K)$. For simplicity, we introduce the following convenient notation.

Definition 6.6.
For $(t_1, \dots, t_m) \in [0,1]^m$ and $\sigma \subseteq [m]$, define
$$t_i^\sigma \stackrel{\mathrm{def}}{=} \begin{cases} t_i & \text{if } i \in \sigma, \\ 1 & \text{otherwise.} \end{cases}$$

With these notations, we have

Theorem 6.7. Let $(\mathcal{F}, P_K)$ be a regular pair and let $M_n$ be an $m \times n$ data matrix, sampled from $(\mathcal{F}, P_K)$. Then, for all $(t_1, \dots, t_m) \in [0,1]^m$, we have
(i) $\mathrm{Dow}(\mathcal{F}, P_K)(t_1, \dots, t_m) = \{\sigma \subseteq [m] : R_\infty(t_1^\sigma, \dots, t_m^\sigma) \neq 0\}$.
(ii) $\mathrm{Dow}(S(M_n))(t_1, \dots, t_m) = \{\sigma \subseteq [m] : \hat{R}_n(t_1^\sigma, \dots, t_m^\sigma) \neq 0\}$.

Proof. For the first equality, recall that $\mathrm{Dow}(\mathcal{F}, P_K)(t_1, \dots, t_m) = \mathrm{nerve}(\{K^{(i)}(t_i)\}_{i \in [m]})$; namely, $\sigma \in \mathrm{Dow}(\mathcal{F}, P_K)(t_1, \dots, t_m)$ if and only if $\bigcap_{i \in \sigma} K^{(i)}(t_i) \neq \emptyset$. Since each $K^{(i)}(t_i)$ is open, $\sigma \in \mathrm{Dow}(\mathcal{F}, P_K)(t_1, \dots, t_m)$ is also equivalent to $P_K(\bigcap_{i \in \sigma} K^{(i)}(t_i)) \neq 0$. Notice that, by the definitions of $R_\infty$ and $t_i^\sigma$,
$$R_\infty(t_1^\sigma, \dots, t_m^\sigma) = P_K\left( \bigcap_{i \in [m]} K^{(i)}(t_i^\sigma) \right) = P_K\left( \left( \bigcap_{i \in \sigma} K^{(i)}(t_i) \right) \cap \bigcap_{i \in [m] \setminus \sigma} K^{(i)}(1) \right) = P_K\left( \bigcap_{i \in \sigma} K^{(i)}(t_i) \right).$$
Therefore, the first equality follows.

For the second equality, recall from Lemma 2.3 that $\mathrm{Dow}(S(M_n))(t_1, \dots, t_m) = \mathrm{nerve}(\{A^{(i)}(t_i)\}_{i \in [m]})$, where $A^{(i)}(t_i) = \{a \in [n] : \frac{1}{n} |\{b \in [n] : b \leq_i a\}| \leq t_i\}$. Thus, $\sigma \in \mathrm{Dow}(S(M))(t_1, \dots, t_m)$ is equivalent to $\bigcap_{i \in \sigma} A^{(i)}(t_i) \neq \emptyset$, which is also equivalent to $|\bigcap_{i \in \sigma} A^{(i)}(t_i)| \neq 0$, since each $A^{(i)}(t_i)$ is a finite set. Note that $A^{(i)}(t_i)$ is exactly the index set of $X_n \cap K_n^{(i)}(t_i)$. Thus, by Lemma 6.5,
$$\hat{R}_n(t_1^\sigma, \dots, t_m^\sigma) = \frac{1}{n} \cdot \left| X_n \cap \bigcap_{i \in [m]} K_n^{(i)}(t_i^\sigma) \right| = \frac{1}{n} \cdot \left| \left( \bigcap_{i \in \sigma} A^{(i)}(t_i) \right) \cap \bigcap_{i \in [m] \setminus \sigma} A^{(i)}(1) \right| = \frac{1}{n} \cdot \left| \bigcap_{i \in \sigma} A^{(i)}(t_i) \right|.$$
Hence, $|\bigcap_{i \in \sigma} A^{(i)}(t_i)| \neq 0$ is equivalent to $\hat{R}_n(t_1^\sigma, \dots, t_m^\sigma) \neq 0$, and the second equality follows. $\square$

Before diving into asymptotic results, let us look at one useful property of $R_\infty$.

Lemma 6.8. The map $R_\infty$ in Definition 6.1 is uniformly continuous.

Proof. Let $(t_1, \dots, t_m), (t'_1, \dots, t'_m) \in [0,1]^m$. Denoting the symmetric difference of any two sets $A$ and $B$ by $A \,\triangle\, B \stackrel{\mathrm{def}}{=} (A \setminus B) \cup (B \setminus A)$, we have
$$\left| R_\infty(t_1, \dots, t_m) - R_\infty(t'_1, \dots, t'_m) \right| = \left| P_K\left( \bigcap_{i \in [m]} K^{(i)}(t_i) \right) - P_K\left( \bigcap_{i \in [m]} K^{(i)}(t'_i) \right) \right| \leq P_K\left( \bigcap_{i \in [m]} K^{(i)}(t_i) \,\triangle\, \bigcap_{i \in [m]} K^{(i)}(t'_i) \right) \leq P_K\left( \bigcup_{i \in [m]} \left( K^{(i)}(t_i) \,\triangle\, K^{(i)}(t'_i) \right) \right) \leq \sum_{i \in [m]} P_K\left( K^{(i)}(t_i) \,\triangle\, K^{(i)}(t'_i) \right) = \sum_{i \in [m]} |t_i - t'_i|.$$
Using this inequality, it is now easy to obtain that $R_\infty$ is uniformly continuous. $\square$

Now we arrive at a theorem that is key to the proof of the Interleaving Convergence Theorem. In the rest of the discussion, we use w.h.p. to refer to "with high probability". Namely, if we state that, as $n \to \infty$, w.h.p., a sequence $(A_n)_{n=1}^\infty$ of events holds, then this means that, as $n \to \infty$, the probability $\Pr[A_n]$ approaches 1.

Theorem 6.9 (1st Asymptotic Theorem). The sup-norm $\|R_n - R_\infty\|_\infty$ converges to $0$ in probability. In other words, for any $\epsilon > 0$,
$$\lim_{n \to \infty} \Pr\left[ \|R_n - R_\infty\|_\infty \leq \epsilon \right] = 1.$$

For the proof, we recall an intuitive fact from probability theory: for a finite collection of $N$ events $A_{1,n}, \dots, A_{N,n}$ that depend on $n = 1, 2, \dots$, if, for each $i \in [N]$, $\lim_{n \to \infty} \Pr[A_{i,n}] = 1$, then $\lim_{n \to \infty} \Pr[\bigcap_{i \in [N]} A_{i,n}] = 1$.

Proof of Theorem 6.9.
Let $(t_1, \dots, t_m) \in [0,1]^m$. Let $I$ be the indicator function of $\bigcap_{i \in [m]} K^{(i)}(t_i)$. In other words, $I : K \to \{0, 1\}$ is the function defined by
$$I(x) = \begin{cases} 1 & \text{if } x \in \bigcap_{i \in [m]} K^{(i)}(t_i), \\ 0 & \text{otherwise.} \end{cases}$$
Notice that, since $(K, P_K)$ is a probability space (with the Borel $\sigma$-algebra), $I$ is a random variable. Moreover, by Definition 6.2, if $I_1, \dots, I_n$ are i.i.d. copies of $I$, then
$$R_n(t_1, \dots, t_m) = \frac{1}{n} \sum_{a=1}^{n} I_a.$$
Let $p = P_K\left(\bigcap_{i \in [m]} K^{(i)}(t_i)\right) = R_\infty(t_1, \dots, t_m) = \Pr[I = 1]$. By Chebyshev's inequality, for any $\epsilon > 0$, as $n \to \infty$,
$$\Pr\left[ \left| \left( \frac{1}{n} \sum_{a=1}^{n} I_a \right) - p \right| > \epsilon \right] \leq \frac{\mathrm{Var}\left( \frac{1}{n} \sum_{a=1}^{n} I_a \right)}{\epsilon^2} = \frac{p(1-p)}{n \epsilon^2} \to 0,$$
i.e., $\Pr[|R_n(t_1, \dots, t_m) - R_\infty(t_1, \dots, t_m)| > \epsilon] \to 0$. Thus we have obtained the pointwise convergence version of the result.

To prove uniform convergence, consider the following. By Lemma 6.8, $R_\infty$ is uniformly continuous. Thus there exists $\delta > 0$ such that, for all $(t_1, \dots, t_m)$ and $(t'_1, \dots, t'_m)$ with $\max_{i \in [m]} |t_i - t'_i| \leq \delta$,
$$\left| R_\infty(t_1, \dots, t_m) - R_\infty(t'_1, \dots, t'_m) \right| \leq \epsilon.$$
Subdivide $[0,1]^m$ into finitely many ($m$-dimensional) rectangles of sides shorter than $\delta$. Let $V$ be the collection of all vertices of all rectangles in the subdivision. Since $V$ is a finite set, by the above pointwise result, as $n \to \infty$, w.h.p.,
$$\sup_{\vec{t} \in V} |R_n(\vec{t}) - R_\infty(\vec{t})| \leq \epsilon. \tag{30}$$
In other words,
$$\lim_{n \to \infty} \Pr\left[ \sup_{\vec{t} \in V} |R_n(\vec{t}) - R_\infty(\vec{t})| \leq \epsilon \right] = 1.$$
We now claim that Equation (30) implies
$$\|R_n - R_\infty\|_\infty = \sup_{\vec{t} \in [0,1]^m} |R_n(\vec{t}) - R_\infty(\vec{t})| \leq 2\epsilon. \tag{31}$$
Let $\vec{t} = (t_1, \dots, t_m)$ be an arbitrary element of $[0,1]^m$.
Then $\vec{t}$ lies in some small rectangle of the subdivision. Let $\vec{t}_2$ and $\vec{t}_1$ be the unique maximum and minimum, respectively, of that rectangle. Then
$$R_\infty(\vec{t}) - \epsilon \leq (R_\infty(\vec{t}_1) + \epsilon) - \epsilon = R_\infty(\vec{t}_1) \quad \text{(uniform continuity of } R_\infty\text{)}$$
$$\leq R_n(\vec{t}_1) + \epsilon \quad (\vec{t}_1 \text{ is a vertex in the subdivision})$$
$$\leq R_n(\vec{t}) + \epsilon \quad (R_n \text{ is monotone and } \vec{t}_1 \leq \vec{t})$$
$$\leq R_n(\vec{t}_2) + \epsilon \quad (R_n \text{ is monotone and } \vec{t} \leq \vec{t}_2)$$
$$\leq (R_\infty(\vec{t}_2) + \epsilon) + \epsilon \quad (\vec{t}_2 \text{ is a vertex in the subdivision})$$
$$\leq R_\infty(\vec{t}) + \epsilon + \epsilon + \epsilon \quad \text{(uniform continuity of } R_\infty\text{)}.$$
Thus, $R_\infty(\vec{t}) - 2\epsilon \leq R_n(\vec{t}) \leq R_\infty(\vec{t}) + 2\epsilon$, i.e., $|R_\infty(\vec{t}) - R_n(\vec{t})| \leq 2\epsilon$. Since $\vec{t} \in [0,1]^m$ is arbitrary, Equation (31) follows. In other words,
$$\lim_{n \to \infty} \Pr\left[ \sup_{\vec{t} \in [0,1]^m} |R_n(\vec{t}) - R_\infty(\vec{t})| \leq 2\epsilon \right] = 1.$$
Rescaling $2\epsilon$ to $\epsilon$, the uniform result follows. $\square$

For people familiar with nonparametric statistics, it is immediate that Theorem 6.9 is a natural $m$-dimensional generalization of the standard Glivenko–Cantelli theorem under specific conditions.

Recall that, for $x \in K$ and $i \in [m]$, $T_i(x) \stackrel{\mathrm{def}}{=} P_K\left( f_i^{-1}(-\infty, f_i(x)) \right)$.

Lemma 6.10. For all $\epsilon > 0$, as $n \to \infty$, w.h.p.,
$$K^{(i)}(t - \epsilon) \subseteq K_n^{(i)}(t) \subseteq K^{(i)}(t + \epsilon), \quad \forall i \in [m], \ t \in [0,1],$$
where $K^{(i)}(t) \stackrel{\mathrm{def}}{=} \emptyset$ for $t < 0$ and $K^{(i)}(t) \stackrel{\mathrm{def}}{=} K$ for $t > 1$.

Proof. For $t = 1$, $K_n^{(i)}(t) = K = K^{(i)}(t + \epsilon)$ and the inclusions are clearly satisfied. Now let $t \in [0,1)$. Then $t \in [\frac{a}{n}, \frac{a+1}{n})$ for some $a \in \{0, 1, \dots, n-1\}$. W.l.o.g., assume $f_i(x_1) < \cdots < f_i(x_n)$, so that $K_n^{(i)}(t) = f_i^{-1}(-\infty, f_i(x_{a+1})) = K^{(i)}(T_i(x_{a+1}))$. Then, with $T_i(x_{a+1})$ in the $i$-th coordinate, $R_\infty(1, \dots, 1, T_i(x_{a+1}), 1, \dots, 1) = T_i(x_{a+1})$ and $R_n(1, \dots, 1, T_i(x_{a+1}), 1, \dots, 1) = \frac{1}{n} \cdot |K^{(i)}(T_i(x_{a+1})) \cap X_n| = \frac{a}{n}$.
Choose $n$ large enough such that $\frac{1}{n} < \frac{\epsilon}{2}$. Then
$$|T_i(x_{a+1}) - t| \leq \left| T_i(x_{a+1}) - \frac{a}{n} \right| + \left| \frac{a}{n} - t \right| = \left| R_\infty(1, \dots, 1, T_i(x_{a+1}), 1, \dots, 1) - R_n(1, \dots, 1, T_i(x_{a+1}), 1, \dots, 1) \right| + \left| \frac{a}{n} - t \right|.$$
In the last expression, by Theorem 6.9, w.h.p., the first term is less than $\epsilon/2$, not depending on $t$, and the second term is less than $\epsilon/2$ by our choice of sufficiently large $n$. Thus, w.h.p., $|T_i(x_{a+1}) - t| < \epsilon$ for all $t \in [0,1]$, and the result follows. $\square$

(The Glivenko–Cantelli theorem has been generalized in many directions in the literature and is closely related to the famous VC (Vapnik–Chervonenkis) theory in theoretical machine learning. See, for example, Chapter 12 of [6] for a detailed introduction that connects the standard Glivenko–Cantelli theorem and VC theory.)

Corollary 6.11. For all $\epsilon > 0$, as $n \to \infty$, w.h.p.,
$$R_n(t_1 - \epsilon, \dots, t_m - \epsilon) \leq \hat{R}_n(t_1, \dots, t_m) \leq R_n(t_1 + \epsilon, \dots, t_m + \epsilon), \quad \forall (t_1, \dots, t_m) \in [0,1]^m,$$
where, for the arguments of $R_n$, negative inputs are automatically replaced by $0$ and inputs greater than $1$ are automatically replaced by $1$.

Proof. As $n \to \infty$, w.h.p.,
$$R_n(t_1 - \epsilon, \dots, t_m - \epsilon) = \frac{1}{n} \cdot \left| \bigcap_{i \in [m]} K^{(i)}(t_i - \epsilon) \cap X_n \right| \quad \text{(Definition 6.2)}$$
$$\leq \frac{1}{n} \cdot \left| \bigcap_{i \in [m]} K_n^{(i)}(t_i) \cap X_n \right| \quad \text{(Lemma 6.10)}$$
$$= \hat{R}_n(t_1, \dots, t_m) \quad \text{(Lemma 6.5)}$$
$$\leq \frac{1}{n} \cdot \left| \bigcap_{i \in [m]} K^{(i)}(t_i + \epsilon) \cap X_n \right| \quad \text{(Lemma 6.10)}$$
$$= R_n(t_1 + \epsilon, \dots, t_m + \epsilon) \quad \text{(Definition 6.2)}.$$
Hence the result follows. $\square$

Motivated by Theorem 6.7, the key to proving the Interleaving Convergence Theorem is the zero sets of $R_\infty$, $R_n$, and $\hat{R}_n$, explicitly defined below.

Definition 6.12. Let $R_\infty$, $R_n$, and $\hat{R}_n$ be defined as in Definitions 6.1, 6.2, and 6.4.
Define the following subsets of $[0,1]^m$:
(i) $Z_\infty \stackrel{\mathrm{def}}{=} R_\infty^{-1}(0) = \left\{ (t_1, \dots, t_m) : \bigcap_{i \in [m]} K^{(i)}(t_i) = \emptyset \right\}$.
(ii) $Z_n \stackrel{\mathrm{def}}{=} R_n^{-1}(0) = \left\{ (t_1, \dots, t_m) : X_n \cap \left( \bigcap_{i \in [m]} K^{(i)}(t_i) \right) = \emptyset \right\}$.
(iii) $\hat{Z}_n \stackrel{\mathrm{def}}{=} \hat{R}_n^{-1}(0) = \left\{ (t_1, \dots, t_m) : X_n \cap \left( \bigcap_{i \in [m]} K_n^{(i)}(t_i) \right) = \emptyset \right\}$. (See Definition 6.3 for the defining formula of $K_n^{(i)}(t_i)$.)

For $Z \subseteq \mathbb{R}^m$ and $\epsilon > 0$, define
$$Z + \epsilon \stackrel{\mathrm{def}}{=} \{ (t_1 + \epsilon_1, \dots, t_m + \epsilon_m) : (t_1, \dots, t_m) \in Z, \ 0 \leq \epsilon_i \leq \epsilon \}. \tag{32}$$
Note that, since $R_\infty$, $R_n$, and $\hat{R}_n$ are all monotone, $Z_\infty$, $Z_n$, and $\hat{Z}_n$ are closed under the lower partial order; namely, for $Z = Z_\infty$, $Z_n$, or $\hat{Z}_n$, if $\vec{t}^{\,*} \in Z$, then $\vec{t} \in Z$ for all $\vec{t} \leq \vec{t}^{\,*}$.

Lemma 6.13. Let $Z_\infty$, $Z_n$, and $\hat{Z}_n$ be defined as in Definition 6.12. Then, for all $\epsilon > 0$, as $n \to \infty$, w.h.p.,
(i) $Z_n \subseteq \hat{Z}_n + \epsilon$ and $\hat{Z}_n \subseteq Z_n + \epsilon$;
(ii) $Z_n \subseteq Z_\infty + \epsilon$.
Moreover, with probability $1$,
(iii) $Z_\infty \subseteq Z_n$.

Proof. For the first inclusion in (i), let $z \in Z_n$, where $z = (t_1, \dots, t_m)$; namely, $R_n(z) = 0$. By Corollary 6.11, w.h.p., $\hat{R}_n(t_1 - \epsilon, \dots, t_m - \epsilon) = 0$; namely, $(t_1 - \epsilon, \dots, t_m - \epsilon) \in \hat{Z}_n$. Thus $z = (t_1, \dots, t_m) \in \hat{Z}_n + \epsilon$.

For the second inclusion in (i), let $z \in \hat{Z}_n$, where $z = (t_1, \dots, t_m)$; namely, $\hat{R}_n(z) = 0$. By Corollary 6.11, w.h.p., $R_n(t_1 - \epsilon, \dots, t_m - \epsilon) = 0$; namely, $(t_1 - \epsilon, \dots, t_m - \epsilon) \in Z_n$. Thus $z = (t_1, \dots, t_m) \in Z_n + \epsilon$.

To prove (ii), first let $\nu = \inf\{ R_\infty(\vec{t}) : \vec{t} \in [0,1]^m \setminus (Z_\infty + \epsilon) \}$; if $[0,1]^m \setminus (Z_\infty + \epsilon) = \emptyset$, simply define $\nu = 1$.
Since $Z_\infty$ is the set where $R_\infty$ takes zero values and $[0,1]^m \setminus (Z_\infty + \epsilon)$ is strictly away from $Z_\infty$, by the continuity of $R_\infty$ and the fact that $R_\infty(1, \dots, 1) = 1$, we must have $\nu > 0$. By Theorem 6.9, w.h.p., $\|R_n - R_\infty\|_\infty \leq \nu/2$. Thus, by the triangle inequality, w.h.p., $R_n \geq \nu - \nu/2 = \nu/2 > 0$ on $[0,1]^m \setminus (Z_\infty + \epsilon)$. Now, for every $\vec{t} \in Z_n$, $R_n(\vec{t}) = 0$ and thus $\vec{t} \notin [0,1]^m \setminus (Z_\infty + \epsilon)$; namely, $\vec{t} \in Z_\infty + \epsilon$. Hence $Z_n \subseteq Z_\infty + \epsilon$.

To prove (iii), let $\vec{t} \in Z_\infty$. Then $R_\infty(\vec{t}) = 0$, where $\vec{t} = (t_1, \dots, t_m)$; namely, $P_K(\bigcap_{i \in [m]} K^{(i)}(t_i)) = 0$. Since $K^{(i)}(t_i)$ is open for all $i \in [m]$, $\bigcap_{i \in [m]} K^{(i)}(t_i)$ is open with zero measure. Therefore, $\bigcap_{i \in [m]} K^{(i)}(t_i) = \emptyset$ and thus $R_n(\vec{t}) = \frac{1}{n} \cdot |(\bigcap_{i \in [m]} K^{(i)}(t_i)) \cap X_n| = 0$. Hence, $\vec{t} \in Z_n$ and $Z_\infty \subseteq Z_n$. $\square$

Immediate from Lemma 6.13 is

Corollary 6.14. Let $Z_\infty$ and $\hat{Z}_n$ be defined as in Definition 6.12. Then, for all $\epsilon > 0$, as $n \to \infty$, w.h.p.,
$$\hat{Z}_n \subseteq Z_\infty + \epsilon \quad \text{and} \quad Z_\infty \subseteq \hat{Z}_n + \epsilon.$$

We are now able to prove Theorem 2.9.

Theorem (Theorem 2.9, Interleaving Convergence Theorem). Let $M_n \in \mathcal{M}^o_{m,n}$ be sampled from a regular pair $(\mathcal{F}, P_K)$. Then, for all $\epsilon > 0$, as $n \to \infty$,
$$\Pr\left[ d_{\mathrm{INT}}(\mathrm{Dow}(\mathcal{F}, P_K), \mathrm{Dow}(S(M_n))) \leq \epsilon \right] \to 1.$$

Proof. Let $\epsilon > 0$. We need to prove: as $n \to \infty$, w.h.p.,
$$\mathrm{Dow}(\mathcal{F}, P_K)(t_1 - \epsilon, \dots, t_m - \epsilon) \subseteq \mathrm{Dow}(S(M_n))(t_1, \dots, t_m) \subseteq \mathrm{Dow}(\mathcal{F}, P_K)(t_1 + \epsilon, \dots, t_m + \epsilon).$$
By Corollary 6.14, as $n \to \infty$, w.h.p., $\hat{Z}_n \subseteq Z_\infty + \epsilon$ and $Z_\infty \subseteq \hat{Z}_n + \epsilon$.

For the first inclusion, let $\sigma \in \mathrm{Dow}(\mathcal{F}, P_K)(t_1 - \epsilon, \dots, t_m - \epsilon)$. Then, by Theorem 6.7,
$$R_\infty((t_1 - \epsilon)^\sigma, \dots, (t_m - \epsilon)^\sigma) \neq 0,$$
where $(t_i - \epsilon)^\sigma = t_i - \epsilon$ if $i \in \sigma$ and $(t_i - \epsilon)^\sigma = 1$ if $i \notin \sigma$.
We need to prove $\hat{R}_n(t_1^\sigma, \dots, t_m^\sigma) \neq 0$. Let us prove this by contradiction. Suppose $\hat{R}_n(t_1^\sigma, \dots, t_m^\sigma) = 0$. Then $(t_1^\sigma, \dots, t_m^\sigma) \in \hat{Z}_n$. Thus $(t_1^\sigma, \dots, t_m^\sigma) \in Z_\infty + \epsilon$ and hence $((t_1 - \epsilon)^\sigma, \dots, (t_m - \epsilon)^\sigma) \in Z_\infty$, meaning $R_\infty((t_1 - \epsilon)^\sigma, \dots, (t_m - \epsilon)^\sigma) = 0$, a contradiction. Thus $\hat{R}_n(t_1^\sigma, \dots, t_m^\sigma) \neq 0$ and $\sigma \in \mathrm{Dow}(S(M_n))(t_1, \dots, t_m)$, completing the first part. Analogously, the second inclusion can be proved by using Theorem 6.7, with the help of $Z_\infty \subseteq \hat{Z}_n + \epsilon$. $\square$

6.4 Proof of Theorem 3.10, the convergence of $L_k(M_n)$ to $L_k(\mathcal{F}, P_K)$

In this subsection, we state the well-known Isometry Theorem in topological data analysis and use it to prove Theorem 3.10. We begin with the definition of a quadrant-tame persistence module.

Definition 6.15 (Definition 1.12 in [11]). A persistence module $V = (V_i, v_i^j)$ over $\mathbb{R}$ is quadrant-tame if $\mathrm{rank}\, v_i^j < \infty$ for all $i < j$.

Theorem 6.16 (Isometry Theorem, Theorem 3.1 in [11]). Let $V$, $W$ be quadrant-tame persistence modules over $\mathbb{R}$. Then
$$d_b(\mathrm{dgm}(V), \mathrm{dgm}(W)) = d_i(V, W),$$
where $d_b$ is the bottleneck distance between persistence diagrams and $d_i$ is the interleaving distance between persistence modules.

Notice that, throughout the paper, all simplicial complexes are subcomplexes of $2^{[m]}$, and hence all vector spaces in the persistence modules we consider are finite-dimensional and thus quadrant-tame. Therefore, we have the Isometry Theorem available. In the rest of this section, the proof of Theorem 3.10 is broken into several lemmas based on some newly developed tools. Since the presentation is in logical order instead of the order of ideas, we give a quick overview of how they are pieced together.

The central observation throughout the proof is Lemma 6.21, which writes both $L_k(\mathcal{F}, P_K)$ and $L_k(M_n)$ in terms of double supremum expressions.
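Once persistence diagrams along the diagonal rays are available, the double supremum expressions of Lemma 6.21 (below) reduce to a maximum of lifetimes taken over rays and over diagram points. Below is a minimal sketch with hypothetical diagrams; computing the diagrams themselves requires a persistent-homology package and is not shown.

```python
def double_sup(diagrams_by_ray):
    """Evaluate sup over rays of sup of lifetimes (beta - alpha),
    given a dict mapping each ray (any hashable label) to its k-th
    persistence diagram, a list of (birth, death) pairs."""
    best = 0.0
    for dgm in diagrams_by_ray.values():
        for birth, death in dgm:
            best = max(best, death - birth)
    return best

# Hypothetical k-th diagrams along two diagonal rays.
dgms = {"ray1": [(0.25, 0.75), (0.0, 0.5)],
        "ray2": [(0.125, 0.25)]}
print(double_sup(dgms))  # 0.5
```

In practice the set of rays is finite (one ray per sampled point, as in Definition 6.18 below), so this max-of-max is exactly how $L_k(M_n)$ is evaluated from data.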
Notice that their expressions differ only in $\mathrm{Dow}(\mathcal{F}, P_K)$ versus $\mathrm{Dow}(S(M_n))$, and in $T(\mathcal{F}, P_K)^+$ versus $\hat{T}(M_n)^+$, which are introduced in Definitions 6.18 and 6.17. With this in mind, it is easy to see that a result bounding the variation of the double supremum expression when $\mathrm{Dow}(\mathcal{F}, P_K)$ is replaced by $\mathrm{Dow}(S(M_n))$ is needed, which is exactly Lemma 6.23. Similarly, a result bounding the variation of the double supremum expression when $T(\mathcal{F}, P_K)^+$ is replaced by $\hat{T}(M_n)^+$ is also needed, which is Lemma 6.22. We still need to justify the applicability of Lemma 6.23 and Lemma 6.22, which corresponds to the Interleaving Convergence Theorem (Theorem 2.9) and Lemma 6.19, respectively. Now the pieces can be connected and combined to complete the proof of Theorem 3.10. Notice that the Isometry Theorem (Theorem 6.16) is lurking in the proofs of Lemma 6.23 and Lemma 6.22, and is thus playing an important role in the proof of Theorem 3.10.

Definition 6.17. Let $\mathcal{T} \subseteq [0,1]^m$. Define the set of diagonal rays of $\mathcal{T}$, denoted $\mathcal{T}^+$, by
$$\mathcal{T}^+ \stackrel{\mathrm{def}}{=} \{ \mathrm{ray}_T : T = (t_1, \dots, t_m) \in \mathcal{T} \}, \tag{33}$$
where $\mathrm{ray}_T \stackrel{\mathrm{def}}{=} \left\{ (t_1 - t, \dots, t_m - t) : t \in [0, \max_{i \in [m]} t_i] \right\}$.

Definition 6.18. Let $(\mathcal{F}, P_K)$ be a regular pair and let $M_n \in \mathcal{M}^o_{m,n}$ be sampled from $(\mathcal{F}, P_K)$. Define the following two subsets of $[0,1]^m$:
$$T(\mathcal{F}, P_K) \stackrel{\mathrm{def}}{=} \{ (T_1(x), \dots, T_m(x)) : x \in K \}, \quad \text{and} \quad \hat{T}(M_n) \stackrel{\mathrm{def}}{=} \{ (\mathrm{ord}_1(M_n, a)/n, \dots, \mathrm{ord}_m(M_n, a)/n) : a \in [n] \}.$$

Recall that the Hausdorff distance between two subsets $S_1, S_2$ of $\mathbb{R}^m$ is defined as
$$d_H(S_1, S_2) = \inf\{ \epsilon > 0 : S_1 \subseteq S_2 + B(0, \epsilon), \ S_2 \subseteq S_1 + B(0, \epsilon) \}, \tag{34}$$
where $B(0, \epsilon)$ is the $\epsilon$-ball in $\mathbb{R}^m$ centered at $0$ and $+$ inside the infimum is the Minkowski sum. In the next lemma, we prove that $\hat{T}(M_n)$ approximates $T(\mathcal{F}, P_K)$ in Hausdorff distance.

Lemma 6.19. Let $M_n \in \mathcal{M}^o_{m,n}$ be sampled from a regular pair $(\mathcal{F}, P_K)$.
Then, as $n \to \infty$,
$$d_H(T(\mathcal{F}, P_K), \hat{T}(M_n)) \to 0 \quad \text{in probability.} \tag{35}$$

Proof. Recall that, for each $i \in [m]$, $T_i = \varphi_i \circ f_i$, where $\varphi_i$ is a monotone increasing function. Since there is no measure jump in a regular pair (i.e., $P_K(f_i^{-1}(\ell)) = 0$ for all $i \in [m]$, $\ell \in \mathbb{R}$), each $\varphi_i$ is continuous and so is each $T_i$. Since each $f_i$ can be extended continuously to $\bar{K}$, each $T_i$ is also continuously extendable to $\bar{K}$. Since $\bar{K}$ is compact, the function $(T_1, \dots, T_m) : \bar{K} \to \mathbb{R}^m$ is uniformly continuous.

Let $\epsilon > 0$. We need to prove that, as $n \to \infty$, w.h.p.,
$$d_H(T(\mathcal{F}, P_K), \hat{T}(M_n)) < \epsilon. \tag{36}$$
By uniform continuity, there exists $\delta > 0$ such that, for $x, y \in K$ with $\|x - y\| \leq \delta$,
$$\|(T_1(x), \dots, T_m(x)) - (T_1(y), \dots, T_m(y))\| \leq \epsilon/2. \tag{37}$$
Let $X_n = \{x_1, \dots, x_n\}$ be a sample of size $n$, i.i.d. from $(\mathcal{F}, P_K)$. Let us prove that, as $n \to \infty$, w.h.p.,
$$K \subseteq X_n + B(0, \delta). \tag{38}$$
Since $K$ is bounded, we may cover $K$ by finitely many small rectangles of diameters smaller than $\delta$, where each rectangle intersects $K$ and the rectangles intersect each other only on their boundaries. Denote the rectangles by $\{R_1, \dots, R_N\}$. Let $p_j \stackrel{\mathrm{def}}{=} P_K(R_j \cap K)$, which are positive by the openness of $K$. Then
$$\Pr[(R_j \cap K) \cap X_n \neq \emptyset, \ \forall j \in [N]] \geq 1 - \sum_{j=1}^{N} (1 - p_j)^n. \tag{39}$$
Since $N$ is finite and each $1 - p_j \in [0, 1)$, as $n \to \infty$, w.h.p., $(R_j \cap K) \cap X_n \neq \emptyset$ for all $j = 1, \dots, N$. Since the diameter of each $R_j \cap K$ is less than $\delta$, Equation (38) follows.

Let us prove another preparatory result: as $n \to \infty$, w.h.p.,
$$\max_{a \in [n], \, i \in [m]} \left| \frac{\mathrm{ord}_i(M_n, a)}{n} - T_i(x_a) \right| \leq \frac{\epsilon}{2\sqrt{m}}. \tag{40}$$
Treating each $T_i$ as a cumulative distribution function defined on $K$, since $[m]$ is finite, Equation (40) is an immediate consequence of the Glivenko–Cantelli theorem. Now we return to the proof.
Notice that $T(\mathcal{F}, P_K)$ is the image of $(T_1, \dots, T_m) : K \to [0,1]^m$. By Equations (38) and (37), w.h.p., $T(\mathcal{F}, P_K) \subseteq (T_1, \dots, T_m)(X_n) + B(0, \epsilon/2)$, and, by Equation (40), $(T_1, \dots, T_m)(X_n) \subseteq \hat{T}(M_n) + B(0, \epsilon/2)$; hence $T(\mathcal{F}, P_K) \subseteq \hat{T}(M_n) + B(0, \epsilon)$. On the other hand, by Equation (40), $\hat{T}(M_n) \subseteq T(\mathcal{F}, P_K) + B(0, \epsilon/2) \subseteq T(\mathcal{F}, P_K) + B(0, \epsilon)$, completing the proof. $\square$

In the following, we develop the convention of restricting a multi-filtered complex to a diagonal ray as defined in Definition 6.17.

Definition 6.20. Let $\mathcal{T} \subseteq [0,1]^m$ and let $\mathcal{K} = \{\mathcal{K}(T) = \mathcal{K}(t_1, \dots, t_m)\}_{T \in \mathbb{R}^m}$ be a multi-filtered complex indexed over $\mathbb{R}^m$ with $\mathcal{K}(T) \subseteq 2^{[m]}$ for all $T \in \mathbb{R}^m$. For $T = (t_1, \dots, t_m) \in \mathcal{T}$, let $\mathrm{ray}_T$ be as in Definition 6.17. Define the restriction of $\mathcal{K}$ to $\mathrm{ray}_T$ as the 1-dimensional filtered complex $\mathcal{K}|_{\mathrm{ray}_T} = \{\mathcal{K}|_{\mathrm{ray}_T}(t)\}_t$, indexed over $t \in [0, \max_{i \in [m]} t_i]$, by
$$\mathcal{K}|_{\mathrm{ray}_T}(t) \stackrel{\mathrm{def}}{=} \mathcal{K}\left( t_1 - \max_{i \in [m]} t_i + t, \ \cdots, \ t_m - \max_{i \in [m]} t_i + t \right). \tag{41}$$
Since we usually need to use the interleaving distance to compare two filtered complexes, we extend the indexing set of $\mathcal{K}|_{\mathrm{ray}_T}$ to $\mathbb{R}$ by
$$\mathcal{K}|_{\mathrm{ray}_T}(t) \stackrel{\mathrm{def}}{=} \begin{cases} \emptyset & \text{if } t < 0, \\ 2^{[m]} & \text{if } t > \max_{i \in [m]} t_i. \end{cases} \tag{42}$$

With these conventions, we state the following lemma:

Lemma 6.21. For each $k \in \{0\} \cup \mathbb{N}$,
$$L_k(\mathcal{F}, P_K) = \sup_{\mathrm{ray} \in T(\mathcal{F}, P_K)^+} \sup\{ \beta - \alpha : (\alpha, \beta) \in \mathrm{dgm}(H_k(\mathrm{Dow}(\mathcal{F}, P_K)|_{\mathrm{ray}})) \},$$
and
$$L_k(M_n) = \sup_{\mathrm{ray} \in \hat{T}(M_n)^+} \sup\{ \beta - \alpha : (\alpha, \beta) \in \mathrm{dgm}(H_k(\mathrm{Dow}(S(M_n))|_{\mathrm{ray}})) \}.$$

Proof. The first equality follows from Equation (9) in Definition 3.7, and the second equality can be obtained from Equation (15) in Definition 3.9. $\square$

The next lemma will be used to connect Lemma 6.19 and Lemma 6.21.

Lemma 6.22. Let $\mathcal{T}_1, \mathcal{T}_2 \subseteq [0,1]^m$ be such that $d_H(\mathcal{T}_1, \mathcal{T}_2) < \epsilon$. Let $\mathcal{K} = \{\mathcal{K}(T)\}_{T \in \mathbb{R}^m}$ be a multi-filtered complex indexed over $\mathbb{R}^m$ with $\mathcal{K}(T) \subseteq 2^{[m]}$ for all $T$.
Then
$$\left| \sup_{\mathrm{ray} \in (\mathcal{T}_1)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} - \sup_{\mathrm{ray} \in (\mathcal{T}_2)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} \right| \leq 2\epsilon.$$

Proof. For any constant $\eta_1 > 0$, we may choose $\mathrm{ray}_1 \in (\mathcal{T}_1)^+$ such that
$$\sup_{\mathrm{ray} \in (\mathcal{T}_1)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} \leq \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}_1})) \} + \eta_1. \tag{43}$$
Let $T_1$ be the element of $\mathcal{T}_1$ that $\mathrm{ray}_1$ is constructed from. Since $d_H(\mathcal{T}_1, \mathcal{T}_2) < \epsilon$, there exists $T_2 \in \mathcal{T}_2$ such that $\|T_1 - T_2\| < \epsilon$. Let $\mathrm{ray}_2 \in (\mathcal{T}_2)^+$ be constructed from $T_2$. Then $d_{\mathrm{INT}}(\mathcal{K}|_{\mathrm{ray}_1}, \mathcal{K}|_{\mathrm{ray}_2}) < \epsilon$, implying
$$d_i(H_k(\mathcal{K}|_{\mathrm{ray}_1}), H_k(\mathcal{K}|_{\mathrm{ray}_2})) < \epsilon.$$
By the Isometry Theorem (Theorem 6.16),
$$d_b(\mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}_1})), \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}_2}))) < \epsilon. \tag{44}$$
For any constant $\eta_2 > 0$, there exists $(a_1, b_1) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}_1}))$ such that
$$\sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}_1})) \} \leq b_1 - a_1 + \eta_2. \tag{45}$$
By Equation (44), there exists $(a_2, b_2) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}_2}))$ such that $\max\{|a_1 - a_2|, |b_1 - b_2|\} < \epsilon$. Therefore,
$$b_1 - a_1 \leq b_2 - a_2 + 2\epsilon. \tag{46}$$
Combining Equations (43), (45), and (46), we obtain
$$\sup_{\mathrm{ray} \in (\mathcal{T}_1)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} \leq b_1 - a_1 + \eta_1 + \eta_2 \leq b_2 - a_2 + 2\epsilon + \eta_1 + \eta_2 \leq \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}_2})) \} + 2\epsilon + \eta_1 + \eta_2 \leq \sup_{\mathrm{ray} \in (\mathcal{T}_2)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} + 2\epsilon + \eta_1 + \eta_2.$$
Since $\eta_1, \eta_2 > 0$ are arbitrary,
$$\sup_{\mathrm{ray} \in (\mathcal{T}_1)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} \leq \sup_{\mathrm{ray} \in (\mathcal{T}_2)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} + 2\epsilon.$$
Reversing the roles of $\mathcal{T}_1$ and $\mathcal{T}_2$, we may obtain the other direction, completing the proof. $\square$

Lemma 6.23. Let $\mathcal{K}$ and $\mathcal{L}$ be multi-filtered complexes indexed over $\mathbb{R}^m$. Let $\mathcal{T} \subseteq [0,1]^m$.
If $d_{\mathrm{INT}}(\mathcal{K}, \mathcal{L}) < \epsilon$, then, for all $k \in \{0\} \cup \mathbb{N}$,
$$\left| \sup_{\mathrm{ray} \in \mathcal{T}^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} - \sup_{\mathrm{ray} \in \mathcal{T}^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{L}|_{\mathrm{ray}})) \} \right| \leq 2\epsilon.$$

Proof. For a constant $\eta > 0$, let $\mathrm{ray}_0 \in \mathcal{T}^+$ and $(a_0, b_0) \in \mathrm{dgm}(H_k(\mathcal{L}|_{\mathrm{ray}_0}))$ be such that
$$\sup_{\mathrm{ray} \in \mathcal{T}^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{L}|_{\mathrm{ray}})) \} \leq b_0 - a_0 + \eta. \tag{47}$$
Consider $\mathcal{K}|_{\mathrm{ray}_0}$. Since $d_{\mathrm{INT}}(\mathcal{K}, \mathcal{L}) < \epsilon$, we have $d_{\mathrm{INT}}(\mathcal{K}|_{\mathrm{ray}_0}, \mathcal{L}|_{\mathrm{ray}_0}) < \epsilon$. Taking the $H_k$ functor, by the Isometry Theorem,
$$d_b(\mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}_0})), \mathrm{dgm}(H_k(\mathcal{L}|_{\mathrm{ray}_0}))) < \epsilon,$$
which implies that there exists $(a_1, b_1) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}_0}))$ such that $\max(|a_0 - a_1|, |b_0 - b_1|) < \epsilon$, and hence
$$b_0 - a_0 \leq b_1 - a_1 + 2\epsilon. \tag{48}$$
Therefore,
$$\sup_{\mathrm{ray} \in \mathcal{T}^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{L}|_{\mathrm{ray}})) \} \leq b_1 - a_1 + 2\epsilon + \eta \quad \text{(by Equations (47) and (48))}$$
$$\leq \sup_{\mathrm{ray} \in \mathcal{T}^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} + 2\epsilon + \eta.$$
Since $\eta > 0$ is arbitrary,
$$\sup_{\mathrm{ray} \in \mathcal{T}^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{L}|_{\mathrm{ray}})) \} \leq \sup_{\mathrm{ray} \in \mathcal{T}^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} + 2\epsilon. \tag{49}$$
Reversing the roles of $\mathcal{K}$ and $\mathcal{L}$, the other direction can be obtained, completing the proof. $\square$

With all the above lemmas, we may now present a rigorous proof of Theorem 3.10. Let us restate Theorem 3.10 for easy reference.

Theorem (Theorem 3.10). Let $M_n \in \mathcal{M}^o_{m,n}$ be sampled from a regular pair $(\mathcal{F}, P_K)$. Assume that $K$ is bounded and each $f_i$ can be continuously extended to the closure $\bar{K}$. Then, for all $k \in \{0\} \cup \mathbb{N}$, as $n \to \infty$, $L_k(M_n)$ converges to $L_k(\mathcal{F}, P_K)$ in probability; namely, for all $\epsilon > 0$,
$$\lim_{n \to \infty} \Pr\left[ |L_k(M_n) - L_k(\mathcal{F}, P_K)| < \epsilon \right] = 1.$$
Moreover, the rate of convergence is independent of $k$.

Proof of Theorem 3.10. For notational simplicity, let $\mathcal{T}_1 = T(\mathcal{F}, P_K)$, $\mathcal{T}_2 = \hat{T}(M_n)$, $\mathcal{K} = \mathrm{Dow}(\mathcal{F}, P_K)$, and $\mathcal{L} = \mathrm{Dow}(S(M_n))$. Let $\epsilon > 0$.
By Lemma 6.21, we need to prove that, as $n \to \infty$, w.h.p.,
$$\left| \sup_{\mathrm{ray} \in (\mathcal{T}_1)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} - \sup_{\mathrm{ray} \in (\mathcal{T}_2)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{L}|_{\mathrm{ray}})) \} \right| \leq \epsilon.$$
By the Interleaving Convergence Theorem (Theorem 2.9), as $n \to \infty$, w.h.p., $d_{\mathrm{INT}}(\mathcal{K}, \mathcal{L}) \leq \epsilon/4$. Therefore, by Lemma 6.23,
$$\left| \sup_{\mathrm{ray} \in (\mathcal{T}_1)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} - \sup_{\mathrm{ray} \in (\mathcal{T}_1)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{L}|_{\mathrm{ray}})) \} \right| \leq \epsilon/2. \tag{50}$$
On the other hand, by Lemma 6.19, as $n \to \infty$, w.h.p., $d_H(\mathcal{T}_1, \mathcal{T}_2) \leq \epsilon/4$. Therefore, by Lemma 6.22,
$$\left| \sup_{\mathrm{ray} \in (\mathcal{T}_1)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{L}|_{\mathrm{ray}})) \} - \sup_{\mathrm{ray} \in (\mathcal{T}_2)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{L}|_{\mathrm{ray}})) \} \right| \leq \epsilon/2. \tag{51}$$
Hence, combining Equation (50) and Equation (51), the result follows. $\square$

6.5 A lemma used in the proof of Lemma 4.5

Lemma 6.24. For $t_1, \dots, t_m \in (0,1)$ and $\sigma \subseteq [m]$, if $\bigcap_{i \in [m]} K^{(i)}(t_i) \neq \emptyset$, then there exists $\epsilon > 0$ such that $\bigcap_{i \in \sigma} K^{(i)}(t_i - \epsilon) \neq \emptyset$. In addition, by the monotonicity of $K^{(i)}(t)$, we also have $\bigcap_{i \in \sigma} K^{(i)}(t_i - \eta) \neq \emptyset$ for all $0 < \eta \leq \epsilon$.

Proof. Let $(\epsilon_n)$ be a sequence with $\epsilon_n \searrow 0$. Let us first prove that $K^{(i)}(t_i - \epsilon_n) \nearrow K^{(i)}(t_i)$; equivalently, $K^{(i)}(t_i) = \bigcup_{n=1}^{\infty} K^{(i)}(t_i - \epsilon_n)$. For any $n$, $K^{(i)}(t_i - \epsilon_n) \subseteq K^{(i)}(t_i)$ by definition. Therefore, $\bigcup_{n=1}^{\infty} K^{(i)}(t_i - \epsilon_n) \subseteq K^{(i)}(t_i)$. For the other inclusion, assume $x \in K^{(i)}(t_i) = f_i^{-1}(-\infty, \lambda_i(t_i))$. Then $f_i(x) < \lambda_i(t_i)$. By Lemma 2.4, $\lambda_i$ is continuous and strictly increasing.
Hence, there exists $n_0$ such that $\lambda_i(t_i - \epsilon_{n_0}) > f_i(x)$; in other words, $x \in K^{(i)}(t_i - \epsilon_{n_0})$. Therefore, $x \in \bigcup_{n=1}^{\infty} K^{(i)}(t_i - \epsilon_n)$, proving the claim.

Now we have, as $n \nearrow \infty$, $K^{(i)}(t_i - \epsilon_n) \nearrow K^{(i)}(t_i)$. Thus $\bigcap_{i \in \sigma} K^{(i)}(t_i - \epsilon_n) \nearrow \bigcap_{i \in \sigma} K^{(i)}(t_i)$. Since $\bigcap_{i \in \sigma} K^{(i)}(t_i) \neq \emptyset$, there must exist $n_0$ such that $\bigcap_{i \in \sigma} K^{(i)}(t_i - \epsilon_{n_0}) \neq \emptyset$. Taking $\epsilon = \epsilon_{n_0}$, the result follows. $\square$

6.6 Cent is open

This subsection is devoted to the proof of the openness of Cent.

Lemma 6.25. Let $\{f_i : K \to \mathbb{R}\}_{i \in [m]}$ be a collection of quasi-convex $C^1$ functions, where $K$ is open and convex in $\mathbb{R}^d$. Then the set $\mathrm{Cent} = \{x \in K : \mathrm{cone}(\{\nabla f_i(x)\}_{i \in [m]}) = \mathbb{R}^d\}$ is open in $K$. In particular, $\mathrm{Cent} \neq \emptyset$ is equivalent to $P_K(\mathrm{Cent}) > 0$.

Proof. Define a function $h : K \times S^{d-1} \to \mathbb{R}$ by $h(x, u) = \max_{i \in [m]} \langle u, \nabla f_i(x) \rangle$. Since each $f_i$ is $C^1$, the functions $(x, u) \mapsto \langle u, \nabla f_i(x) \rangle$ are continuous, and hence $h$ is also continuous. For $x \in K$, we define $\rho(x) = \min_{u \in S^{d-1}} h(x, u)$. Let us prove that, for $x \in K$, $\rho(x) > 0$ if and only if $\mathrm{cone}(\{\nabla f_i(x)\}_{i \in [m]}) = \mathbb{R}^d$.

For one direction, let $x \in K$ satisfy $\rho(x) > 0$, or, equivalently, $\max_{i \in [m]} \langle u, \nabla f_i(x) \rangle > 0$ for all $u \in S^{d-1}$. If $0 \in \mathrm{bd}(\mathrm{conv}(\{\nabla f_i(x)\}_{i \in [m]}))$, then the nonzero vector $v$ pointing outward from $\mathrm{conv}(\{\nabla f_i(x)\}_{i \in [m]})$ and orthogonal to the hyperface containing $0$ would make $\max_{i \in [m]} \langle v, \nabla f_i(x) \rangle = 0$, a contradiction. If $0 \notin \mathrm{conv}(\{\nabla f_i(x)\}_{i \in [m]})$, then taking $v = -\mathrm{argmin}_{z \in \mathrm{conv}(\{\nabla f_i(x)\}_{i \in [m]})} \langle z, z \rangle$ would make $\max_{i \in [m]} \langle v, \nabla f_i(x) \rangle < 0$, also a contradiction. Therefore, $0 \in \mathrm{int}(\mathrm{conv}(\{\nabla f_i(x)\}_{i \in [m]}))$ and hence $\mathrm{cone}(\{\nabla f_i(x)\}_{i \in [m]}) = \mathbb{R}^d$.
For the other direction, let $x \in K$ satisfy $\mathrm{cone}(\{\nabla f_i(x)\}_{i \in [m]}) = \mathbb{R}^d$. To prove $\rho(x) > 0$, since $u \mapsto \max_{i \in [m]} \langle u, \nabla f_i(x) \rangle$ is continuous and $S^{d-1}$ is compact, it suffices to prove that $\max_{i \in [m]} \langle u, \nabla f_i(x) \rangle > 0$ for all $u \in S^{d-1}$. Given $u \in S^{d-1}$, since $\mathrm{cone}(\{\nabla f_i(x)\}_{i \in [m]}) = \mathbb{R}^d$, we can write $u = \sum_{i \in [m]} r_i \cdot \nabla f_i(x)$ for some $r_i \geq 0$. If $\langle u, \nabla f_i(x) \rangle \leq 0$ for all $i \in [m]$, then $\langle u, u \rangle = \sum_{i \in [m]} r_i \langle u, \nabla f_i(x) \rangle \leq 0$, a contradiction. Thus the other direction is proved and, for $x \in K$, $\rho(x) > 0$ if and only if $\mathrm{cone}(\{\nabla f_i(x)\}_{i \in [m]}) = \mathbb{R}^d$.

Now let $x_0 \in K$ be such that $\mathrm{cone}(\{\nabla f_i(x_0)\}_{i \in [m]}) = \mathbb{R}^d$. By what has been proved, this is equivalent to $\rho(x_0) > 0$. We want to prove that there exists $\epsilon > 0$ such that, for all $x \in B(x_0, \epsilon)$, $\mathrm{cone}(\{\nabla f_i(x)\}_{i \in [m]}) = \mathbb{R}^d$, or equivalently, $\rho(x) > 0$. Suppose not; then there exists a sequence $(x_n, u_n) \in K \times S^{d-1}$ such that $x_n \to x_0$ and $h(x_n, u_n) \leq 0$. By compactness of $S^{d-1}$, there is a subsequence $u_{n_j} \to u_0$, and thus, by continuity of $h$, $h(x_0, u_0) \leq 0$. However, $h(x_0, u_0) \geq \min_{u \in S^{d-1}} h(x_0, u) = \rho(x_0) > 0$, a contradiction. Thus the proof is complete.

6.7 Proof of Theorem 5.5

Throughout this subsection, $\mathrm{Cent}$ is as defined in Definition 5.1, $\widehat{\mathrm{Cent}}(M_n)$ is as defined in Definition 5.4, and $\hat{Z}_n$ and $Z_\infty$ are as defined in Definition 6.12. The following two functions play a crucial role throughout the proof of Theorem 5.5.

Definition 6.26. Let $X_n = \{x_1, \ldots, x_n\}$ be sampled from a regular pair $(\mathcal{F}, P_K)$ and $M_n \in \mathcal{M}_o^{m,n}$ be the corresponding data matrix. Define $\tau : K \to [0,1]^m$ and $\hat{\tau}_n : X_n \to [0,1]^m$ by
$$\tau(x) \stackrel{\mathrm{def}}{=} (T_1(x), \ldots, T_m(x)), \qquad \hat{\tau}_n(x_a) \stackrel{\mathrm{def}}{=} \left( \frac{\mathrm{ord}_1(M_n, a) - 1}{n}, \ldots, \frac{\mathrm{ord}_m(M_n, a) - 1}{n} \right).$$

In order to prove Theorem 5.5, we first prove the following lemmas (Lemmas 6.27--6.29).

Lemma 6.27.
Let $\tau : K \to [0,1]^m$ and $\hat{\tau}_n : X_n \to [0,1]^m$ be defined as in Definition 6.26. Then

(i) $\tau^{-1}(Z_\infty) = \mathrm{Cent}$,

(ii) $\hat{\tau}_n^{-1}(\hat{Z}_n) = \{x_a \in X_n : a \in \widehat{\mathrm{Cent}}(M_n)\}$, and

(iii) $\tau^{-1}(Z_\infty) \cap X_n \subseteq \hat{\tau}_n^{-1}(\hat{Z}_n)$.

Proof. To prove (i),
$$\tau^{-1}(Z_\infty) = \{x \in K : \tau(x) \in Z_\infty\} = \left\{ x \in K : P_K\Big( \bigcap_{i \in [m]} K^{(i)}(T_i(x)) \Big) = 0 \right\} = \left\{ x \in K : \bigcap_{i \in [m]} K^{(i)}(T_i(x)) = \emptyset \right\} = \left\{ x \in K : \bigcap_{i \in [m]} f_i^{-1}(-\infty, f_i(x)) = \emptyset \right\} = \mathrm{Cent}.$$

To prove (ii),
$$\hat{\tau}_n^{-1}(\hat{Z}_n) = \{x_a \in X_n : \hat{\tau}_n(x_a) \in \hat{Z}_n\} = \left\{ x_a \in X_n : \bigcap_{i \in [m]} \{b \in [n] : \mathrm{ord}_i(M_n, b) \leq \mathrm{ord}_i(M_n, a) - 1\} = \emptyset \right\} = \left\{ x_a \in X_n : a \in [n],\ \bigcap_{i \in [m]} \{b \in [n] : M_{ib} < M_{ia}\} = \emptyset \right\} = \{x_a \in X_n : a \in \widehat{\mathrm{Cent}}(M_n)\}.$$

To prove (iii), assume $x_a \in \tau^{-1}(Z_\infty) \cap X_n$. Then $\bigcap_{i \in [m]} f_i^{-1}(-\infty, f_i(x_a)) = \emptyset$. Thus
$$\hat{R}_n(\hat{\tau}_n(x_a)) = \hat{R}_n\left( \frac{\mathrm{ord}_1(M_n, a) - 1}{n}, \ldots, \frac{\mathrm{ord}_m(M_n, a) - 1}{n} \right) = \frac{1}{n} \cdot \left| X_n \cap \bigcap_{i \in [m]} K_n^{(i)}\left( \frac{\mathrm{ord}_i(M_n, a) - 1}{n} \right) \right| \quad \text{(by Lemma 6.5)}$$
$$= \frac{1}{n} \cdot \left| X_n \cap \bigcap_{i \in [m]} f_i^{-1}(-\infty, f_i(x_a)) \right| \quad \text{(by Definition 6.3)} \; = 0.$$
Hence, $x_a \in \hat{\tau}_n^{-1}(\hat{Z}_n)$ and the inclusion in (iii) follows.

Lemma 6.28. Let $\tau : K \to [0,1]^m$ and $\hat{\tau}_n : X_n \to [0,1]^m$ be defined as in Definition 6.26. Then, for any $\delta > 0$, as $n \to \infty$, w.h.p.,
$$\hat{\tau}_n^{-1}(\hat{Z}_n) \subseteq \tau^{-1}(Z_\infty + \delta).$$

Proof. By Corollary 6.14, w.h.p.,
$$\hat{Z}_n \subseteq Z_\infty + \delta/2. \quad (52)$$
If $Z_\infty = [0,1]^m$, then $\tau^{-1}(Z_\infty + \delta) = K$ and hence $\hat{\tau}_n^{-1}(\hat{Z}_n) \subseteq \tau^{-1}(Z_\infty + \delta)$ clearly holds. Now suppose $Z_\infty \neq [0,1]^m$. Since $Z_\infty$ is closed, $\mathrm{int}([0,1]^m \setminus Z_\infty)$ is nonempty and open, say containing $x_0$. Choose $\epsilon > 0$ such that $\epsilon < \delta$ and $B(x_0, \sqrt{d}\,\epsilon) \subseteq \mathrm{int}([0,1]^m \setminus Z_\infty)$. Then $Z_\infty + \epsilon/2 \subsetneq Z_\infty + \epsilon$. Thus $\mu^* \stackrel{\mathrm{def}}{=} d_H(Z_\infty + \epsilon/2, Z_\infty + \epsilon)/\sqrt{d}$ is a positive number, where $d_H$ is the Hausdorff distance. (See Equation (32) for the definition of $Z_\infty + \delta$ and Equation (34) for the definition of the Hausdorff distance.)
Since $Z_\infty + \epsilon \subseteq (Z_\infty + \epsilon/2) + B(0, \sqrt{d}\,\epsilon/2)$, we have the inequality
$$\mu^* \leq \sqrt{d} \cdot (\epsilon/2)/\sqrt{d} = \epsilon/2 < \delta/2. \quad (53)$$
By Equation (40) and the definitions of $\tau$ and $\hat{\tau}_n$, w.h.p.,
$$\sup_{x \in X_n} \|\tau(x) - \hat{\tau}_n(x)\| < \mu^*/2. \quad (54)$$
There is one more property of $Z_\infty$, following from the monotonicity of $R_\infty$, that we need in this proof: if $(t_1, \ldots, t_m) \in Z_\infty$ and $(t'_1, \ldots, t'_m) \leq (t_1, \ldots, t_m)$, then $(t'_1, \ldots, t'_m) \in Z_\infty$; in short, $Z_\infty$ is closed under $\leq$.

Now we can prove the inclusion. Assume $x \in \hat{\tau}_n^{-1}(\hat{Z}_n)$, namely, $\hat{\tau}_n(x) \in \hat{Z}_n$. Then
$$\tau(x) \in \hat{\tau}_n(x) + B(0, \mu^*/2) \quad \text{(by (54))}$$
$$\subseteq \hat{Z}_n + B(0, \mu^*/2) \quad \text{(since } \hat{\tau}_n(x) \in \hat{Z}_n\text{)}$$
$$\subseteq Z_\infty + \delta/2 + B(0, \mu^*/2) \quad \text{(by (52))}$$
$$\subseteq Z_\infty + \delta/2 + \mu^*/2 \quad \text{(since } Z_\infty \text{ is closed under } \leq\text{)}$$
$$\subseteq Z_\infty + \delta/2 + \delta/2 \subseteq Z_\infty + \delta \quad \text{(by (53))}.$$
Therefore, $x \in \tau^{-1}(Z_\infty + \delta)$. Since $x \in \hat{\tau}_n^{-1}(\hat{Z}_n)$ is arbitrary, the proof is complete.

Lemma 6.29. Let $X_n \subset K$ be a point cloud of size $n$, sampled from a regular pair $(\mathcal{F}, P_K)$. Let $\tau$ be defined as in Definition 6.26. Then, for all $\epsilon > 0$, there exists $\delta > 0$ such that, as $n \to \infty$, w.h.p.,
$$\left| \frac{|X_n \cap \tau^{-1}(Z_\infty + \delta)|}{n} - P_K(\mathrm{Cent}) \right| < \epsilon/2.$$

Proof. Let us first prove that $\lim_{\delta \searrow 0} P_K(\tau^{-1}(Z_\infty + \delta)) = P_K(\mathrm{Cent})$. Since $Z_\infty$ is closed, $Z_\infty + \delta \searrow Z_\infty$ as $\delta \searrow 0$. Thus $\tau^{-1}(Z_\infty + \delta) \searrow \tau^{-1}(Z_\infty)$. Therefore, by the monotone convergence theorem, $\lim_{\delta \searrow 0} P_K(\tau^{-1}(Z_\infty + \delta)) = P_K(\tau^{-1}(Z_\infty)) = P_K(\mathrm{Cent})$.

Consequently, we may choose $\delta$ such that
$$\left| P_K(\tau^{-1}(Z_\infty + \delta)) - P_K(\mathrm{Cent}) \right| < \epsilon/4. \quad (55)$$
Note that $X_n \cap \tau^{-1}(Z_\infty + \delta)$ is an i.i.d. sample of $\tau^{-1}(Z_\infty + \delta) \subseteq K$.
Thus, by the law of large numbers, as $n \to \infty$, w.h.p.,
$$\left| \frac{|X_n \cap \tau^{-1}(Z_\infty + \delta)|}{n} - P_K(\tau^{-1}(Z_\infty + \delta)) \right| < \epsilon/4. \quad (56)$$
Combining Equations (55) and (56), the result follows.

With the help of the previous lemmas, we give a proof of Theorem 5.5.

Proof of Theorem 5.5. Let $\tau$ and $\hat{\tau}_n$ be defined as in Definition 6.26, and let $\epsilon > 0$. Note that $X_n \cap \tau^{-1}(Z_\infty)$ is an i.i.d. sample of $\tau^{-1}(Z_\infty) = \mathrm{Cent} \subseteq K$. Thus, by the law of large numbers, as $n \to \infty$, w.h.p., $\big| |X_n \cap \tau^{-1}(Z_\infty)|/n - P_K(\mathrm{Cent}) \big| < \epsilon/2$. Therefore, as $n \to \infty$, w.h.p.,
$$P_K(\mathrm{Cent}) - \epsilon/2 < \frac{|X_n \cap \tau^{-1}(Z_\infty)|}{n} \leq \frac{|\hat{\tau}_n^{-1}(\hat{Z}_n)|}{n} \quad \text{(by (iii) of Lemma 6.27)}$$
$$= \frac{|\widehat{\mathrm{Cent}}(M_n)|}{n} \quad \text{(by (ii) of Lemma 6.27)}.$$
On the other hand, as $n \to \infty$, w.h.p.,
$$\frac{|\widehat{\mathrm{Cent}}(M_n)|}{n} = \frac{|\hat{\tau}_n^{-1}(\hat{Z}_n)|}{n} \quad \text{(by (ii) of Lemma 6.27)}$$
$$= \frac{|\hat{\tau}_n^{-1}(\hat{Z}_n) \cap X_n|}{n} \quad \text{(since } \hat{\tau}_n^{-1}(\hat{Z}_n) \subseteq X_n\text{)}$$
$$\leq \frac{|\tau^{-1}(Z_\infty + \delta) \cap X_n|}{n} \quad \text{(by Lemma 6.28)}$$
$$\leq P_K(\mathrm{Cent}) + \epsilon/2 \quad \text{(by Lemma 6.29)}.$$
Therefore, as $n \to \infty$, w.h.p.,
$$\left| \frac{|\widehat{\mathrm{Cent}}(M_n)|}{n} - P_K(\mathrm{Cent}) \right| \leq \epsilon/2.$$

Acknowledgments

This work was supported by the NSF IOS-155925 grant.

References

[1] A. Björner, Topological methods, Handbook of combinatorics (vol. 2), MIT Press, Cambridge, MA, USA, 1995, pp. 1819–1872.

[2] Omer Bobrowski, Matthew Kahle, and Primoz Skraba, Maximally persistent cycles in random geometric complexes, The Annals of Applied Probability (2015).

[3] Stephen Boyd and Lieven Vandenberghe, Convex optimization, Cambridge University Press, 2004.

[4] Alberto Cambini and Laura Martein, Generalized convexity and optimization: Theory and applications, Lecture Notes in Economics and Mathematical Systems (2009).

[5] Frédéric Chazal, Vin de Silva, and Steve Oudot, Persistence stability for geometric complexes, Geometriae Dedicata (2014), no. 1, 193–214.

[6] Luc Devroye, László Györfi, and Gábor Lugosi, A probabilistic theory of pattern recognition, vol.
31, Springer, 1996.

[7] Allen Hatcher, Algebraic topology, Cambridge University Press, Cambridge, 2002. MR 1867354 (2002k:55001)

[8] Michael Lesnick, The theory of the interleaving distance on multidimensional persistence modules, Foundations of Computational Mathematics (2015), no. 3, 613–650.

[9] John O'Keefe and Jonathan Dostrovsky, The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat., Brain Res. (1971), no. 1, 171–175.

[10] Nina Otter, Mason A. Porter, Ulrike Tillmann, Peter Grindrod, and Heather A. Harrington, A roadmap for the computation of persistent homology, EPJ Data Science (2017), no. 1.

[11] Steve Y. Oudot, Persistence theory: from quiver representations to data analysis, Mathematical Surveys and Monographs, vol. 209, American Mathematical Society, Providence, RI, 2015.

[12] Rommel G. Regis, On the properties of positive spanning sets and positive bases, Optimization and Engineering (2016), no. 1, 229–262.

[13] Michael M. Yartsev and Nachum Ulanovsky, Representation of three-dimensional space in the hippocampus of flying bats, Science 340 (2013).