A Topological Approach to Inferring the Intrinsic Dimension of Convex Sensing Data
Min-Chun Wu and Vladimir Itskov
July 8, 2020
Abstract
We consider a common measurement paradigm, where an unknown subset of an affine space is measured by unknown continuous quasi-convex functions. Given the measurement data, can one determine the dimension of this space? In this paper, we develop a method for inferring the intrinsic dimension of the data from measurements by quasi-convex functions, under natural generic assumptions. The dimension inference problem depends only on discrete data, namely the ordering of the measured points of space induced by the sensor functions. We introduce a construction of a filtration of Dowker complexes, associated to measurements by quasi-convex functions. Topological features of these complexes are then used to infer the intrinsic dimension. We prove convergence theorems that guarantee obtaining the correct intrinsic dimension in the limit of large data, under natural generic assumptions. We also illustrate the usability of this method in simulations.
Data in many scientific applications are often obtained by “sensing” the phase space via sensors/functions that are convex. Convex sensing is a class of problems of inferring the geometry of data that are sampled via such functions. To be precise, let us recall the following
Definition 1.1.
Let K ⊆ R^d be open and convex. A function f: K → R is quasi-convex if each sublevel set f^{-1}(−∞, ℓ) = {x ∈ K | f(x) < ℓ} is convex or empty, for all ℓ ∈ R.

The following is perhaps the shortest, albeit naive and incomplete, formulation of a convex sensing problem. A collection of n points X = {x_a}_{a=1}^n in an open convex region K ⊂ R^d is sensed by measuring the values of m sensors, i.e. quasi-convex functions F = {f_i: K → R}_{i=1}^m. Suppose that one has access only to the m × n data matrix M = [M_{ia}] of sensor values, where

M_{ia} = f_i(x_a),   (1)

but does not have direct access to the information about the dimension d of the underlying space, the open convex region K, the points x_a ∈ K, or any further details of the quasi-convex functions f_i. Can one recover any geometric information about the sampled region K? At the very minimum, can one infer the dimension d?

While convex sensing problems may be not uncommon in many scientific applications, our chief motivation comes from neuroscience. Neurons in the brain regions that represent sensory information often possess receptive fields.

Figure 1: The activities of three different experimentally recorded place cells (cell 1, cell 2, cell 3) in a rat’s hippocampus. The color represents the probability of each neuron’s firing as a function of the animal’s location.

A paradigmatic example of a receptive field is that of a hippocampal place cell [9]. Place cells are a class of neurons in the rodent hippocampus that act as position sensors. Here the relevant stimulus space K ⊂ R^d is the animal’s physical environment [13], with d ∈ {1, 2, 3}, and x ∈ K is the animal’s location in this space. Each neuron is activated with a certain probability that is a continuous function f: K → R_{≥0} of the animal’s position in space. In other words, the probability of a single neuron’s activation at a time t is given by p(t) = f(x(t)), where x(t) is the animal’s position. For each neuron, the function f is called its place field, and is approximately quasi-concave (see examples of place fields in Figure 1). Place fields can be easily computed when both the neuronal activity data and the relevant stimulus space are available. A number of other classes of sensory neurons in the brain also possess quasi-concave receptive fields, that is, each such neuron responds with respect to a quasi-concave probability density function f: K → R_{≥0} on the stimulus space.

In many situations, the relevant stimulus space for a given neural population may be unknown. This raises a natural question: can one infer the dimension of a stimulus space with quasi-concave receptive fields from neural activity alone? More precisely, given the neural activity of m neurons with quasi-concave receptive fields f_i: K → R, can one “sense” the stimulus space by sampling the neural activity at n moments of time as M_{ia} = f_i(x(t_a))? Here one has access to the measurements M_{ia}, but not to the objects on the right-hand side. This motivates the naive formulation of the convex sensing problem above.
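To make the measurement model in equation (1) concrete, the following sketch simulates convex sensing data in a planar region: it samples points in the unit disk and evaluates them with bell-shaped (quasi-concave) “place fields”, whose negatives are quasi-convex sensors. This is only an illustration under our own assumptions; the helper names (`place_field`, `sensing_matrix`) and the Gaussian choice of receptive field are ours, not constructions from the paper.

```python
import math
import random

def place_field(center, width):
    """A Gaussian bump: quasi-concave on the plane, so its negative is
    a quasi-convex sensor in the sense of Definition 1.1."""
    cx, cy = center
    return lambda x, y: math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * width ** 2))

def sensing_matrix(sensors, points):
    """The m x n data matrix M_ia = f_i(x_a) of equation (1)."""
    return [[f(x, y) for (x, y) in points] for f in sensors]

random.seed(0)
# n = 50 points sampled uniformly from the open unit disk K
points = []
while len(points) < 50:
    x, y = random.uniform(-1, 1), random.uniform(-1, 1)
    if x * x + y * y < 1:
        points.append((x, y))

# m = 8 sensors: negatives of quasi-concave place fields centered on the circle
fields = [place_field((math.cos(2 * math.pi * i / 8), math.sin(2 * math.pi * i / 8)), 0.7)
          for i in range(8)]
sensors = [lambda x, y, f=f: -f(x, y) for f in fields]
M = sensing_matrix(sensors, points)  # an 8 x 50 matrix
```

With probability 1 each row of M has distinct entries, and only the row-wise orderings of M carry geometric information about the sampled region.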
The convex sensing problem possesses a natural transformation group. If φ: R → R is a strictly monotone-increasing function, then the sublevel sets of the composition φ ∘ f and of f are identical up to an order-preserving relabeling. Thus, if φ is a strictly monotone-increasing function, then f is quasi-convex if and only if φ ∘ f is quasi-convex. (Recall that a function f(x) is quasi-concave if its negative, −f(x), is quasi-convex, and a function φ: R → R is strictly monotone-increasing if φ(y) > φ(x) whenever y > x.) It is easy to show that, given two sets of real numbers a_1 < a_2 < ··· < a_n and b_1 < b_2 < ··· < b_n, there exists a strictly monotone function φ: R → R such that b_i = φ(a_i) for all i. It thus follows that it is only the total order of each row of the matrix M in equation (1) that constrains the geometric features of the point cloud X_n = {x_1, ..., x_n} in a convex sensing problem. This motivates the following definition.

Definition 1.2. Let V be a finite set. A sequence of length k in V is a k-tuple s = (v_1, ..., v_k) of elements of V without repetitions. We denote by S_k[V] the set of all sequences of length k on V.

If M is an m × n real matrix that has distinct entries in each row, then each row yields a sequence of length n. For the sake of an example, consider a real-valued matrix (the digits lost in extraction are restored here so that the orderings below are as stated)

M = [ 5.23  4.19  2.56  3.10
      3.78  2.88  5.76  13.2 ].

Since the first row has the ordering 2.56 < 3.10 < 4.19 < 5.23, the total order <_1 on V = {1, 2, 3, 4} is 3 <_1 4 <_1 2 <_1 1. Thus, the order sequence for the first row is s_1 = (3, 4, 2, 1) ∈ S_4[V]. Similarly, the order sequence for the second row is s_2 = (2, 1, 3, 4).

If the points X_n and the quasi-convex functions {f_i}_{i∈[m]} are generic in some natural sense (this will be rigorously defined in Section 1.3), then each row of the data matrix M_{ia} = f_i(x_a) has no repeated values with probability 1. We denote the set of all “generic” data matrices as

M^o_{m,n} := { m × n real-valued matrices with no repeated entries in each row }.

For any such matrix M = [M_{ia}] ∈ M^o_{m,n}, one can define a collection S(M) of m maximal-length sequences as S(M) = {s_1, ..., s_m}, where each sequence s_i = (a_{i1}, ..., a_{in}) ∈ S_n[n] is obtained from the total order of the i-th row: M_{i a_{i1}} < M_{i a_{i2}} < ··· < M_{i a_{in}}. (The accurate notation is S_n[[n]], but here we use the less cumbersome notation S_n[n].)

The geometry of a convex sensing problem for a data matrix M ∈ M^o_{m,n} is constrained only by the set of m sequences S(M) ⊂ S_n[n]. The following observation makes it possible to restate any convex sensing problem purely in terms of embedding a set of points that satisfy certain convex hull conditions. Let conv(x_1, ..., x_k) denote the convex hull of a collection of points x_1, ..., x_k in R^d.

Lemma 1.3.
For any collection of n distinct points {x_1, x_2, ..., x_n} ⊂ R^d, the following statements are equivalent:

(i) There exists a continuous quasi-convex function f: R^d → R such that

f(x_1) < f(x_2) < ··· < f(x_n),   (2)

(ii) For each k = 2, ..., n, x_k ∉ conv(x_1, ..., x_{k−1}).

Proof. The implication (i) ⟹ (ii) follows from Definition 1.1. To prove that (ii) ⟹ (i), denote C_k = conv(x_1, ..., x_k), d_k(x) := dist(x, C_k) for any k = 1, ..., n, and define f(x) = Σ_{k=1}^n h_k · d_k(x), where h_1 = 1 and

h_k := 1 + (1 / d_k(x_{k+1})) · max{ Σ_{j=1}^{k−1} h_j (d_j(x_k) − d_j(x_{k+1})), 0 },  for k ≥ 2.

Note that (ii) implies that d_k(x_{k+1}) > 0 for k ≥ 2. Recall that, for any convex set C ⊂ R^d, the function x ↦ dist(x, C) is continuous and convex (see, e.g., Example 3.16 and Section 3.2.1 in [3]). Thus, since the h_k are positive, f(x) is a continuous convex (and thus quasi-convex) function. Moreover, f(x_1) = 0 < dist(x_2, x_1) = f(x_2), and

h_k > (1 / d_k(x_{k+1})) Σ_{j=1}^{k−1} h_j (d_j(x_k) − d_j(x_{k+1}))  for k ≥ 2.

The last inequality is equivalent to f(x_{k+1}) > f(x_k). Thus inequalities (2) hold.

Corollary 1.4.
A matrix M = [M_{ia}] ∈ M^o_{m,n} can be obtained as M_{ia} = f_i(x_a) from a collection of m continuous quasi-convex functions f_i: R^d → R and n points x_1, ..., x_n ∈ R^d if and only if there exist n points x_1, ..., x_n ∈ R^d such that, for each sequence s = (a_1, a_2, ..., a_n) ∈ S(M) and each k = 2, ..., n, x_{a_k} ∉ conv(x_{a_1}, ..., x_{a_{k−1}}).

An important implication of Corollary 1.4 is that a convex sensing problem without any further constraint always has a two-dimensional solution. Recall that a set of points is convexly independent if none of these points lies in the convex hull of the others.
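The order sequences S(M) that constrain the geometry can be read off a data matrix by sorting each row; a minimal sketch (the helper name `order_sequences` is ours, and the 1-based labels follow the text’s convention [n] = {1, ..., n}):

```python
def order_sequences(M):
    """Return S(M) = [s_1, ..., s_m]: for each row, the column labels listed
    in increasing order of their values (Definition 1.2). Rows must have
    distinct entries, i.e. M must lie in M^o_{m,n}."""
    S = []
    for row in M:
        assert len(set(row)) == len(row), "row has repeated entries"
        # sort the column labels 1..n by the corresponding row value
        s = sorted(range(1, len(row) + 1), key=lambda a: row[a - 1])
        S.append(tuple(s))
    return S

# Example: a 2 x 4 matrix (values ours, chosen for illustration)
M = [[5.23, 4.19, 2.56, 3.10],
     [3.78, 2.88, 5.76, 13.2]]
print(order_sequences(M))  # → [(3, 4, 2, 1), (2, 1, 3, 4)]
```

Everything downstream (the empirical Dowker complexes of Section 2) depends on M only through these sequences.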
Corollary 1.5.
For every matrix M ∈ M^o_{m,n} and convexly independent points x_1, x_2, ..., x_n ∈ R^d, there exist m continuous quasi-convex functions f_i: R^d → R such that M_{ia} = f_i(x_a).

Indeed, by choosing d = 2 in Corollary 1.4, we obtain a two-dimensional solution. Note, however, that a configuration of convexly independent sampled points is non-generic for large n. If one explicitly excludes this situation, then the combinatorics of S(M) constrains the minimal possible dimension d of the geometric realization, as illustrated by the following example.
Let n > 2, and let M ∈ M^o_{n−1,n} be a matrix obtained as in equation (1) with continuous quasi-convex functions f_i, whose (n − 1) sequences S(M) = {s_1, s_2, ..., s_{n−1}} are of the form

s_i = (···, n, i),  for all i ∈ [n − 1],   (3)

where each of the “···” in s_i is an arbitrary permutation of [n] \ {n, i}. Assume that at least one point in X_n = {x_1, ..., x_n} ⊂ R^d is contained in the interior of the convex hull conv(X_n); then the dimension in which M can be obtained as in Corollary 1.4 is d = n − 2. The proof is given in Section 6.1 of the Appendix.
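Condition (ii) of Lemma 1.3 can also be checked algorithmically. In the plane, membership in a convex hull reduces, by Carathéodory’s theorem, to membership in some triangle (or segment) spanned by the points. The following brute-force sketch is ours, intended only for small 2D examples:

```python
from itertools import combinations

def _cross(o, a, b):
    # z-component of (a - o) x (b - o)
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_hull_2d(p, pts):
    """True iff p lies in conv(pts) for points in the plane (Caratheodory:
    a hull point lies in a triangle, or on a segment, spanned by pts)."""
    if any(q == p for q in pts):
        return True
    for a, b, c in combinations(pts, 3):
        if _cross(a, b, c) == 0:
            continue  # degenerate triangle; the segment loop handles it
        s1, s2, s3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
        if (s1 >= 0 and s2 >= 0 and s3 >= 0) or (s1 <= 0 and s2 <= 0 and s3 <= 0):
            return True
    for a, b in combinations(pts, 2):
        # p on segment ab: collinear with, and between, the endpoints
        if _cross(a, b, p) == 0 and min(a[0], b[0]) <= p[0] <= max(a[0], b[0]) \
                and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]):
            return True
    return False

def admits_quasiconvex_order(points):
    """Condition (ii): x_k not in conv(x_1, ..., x_{k-1}) for k = 2, ..., n."""
    return all(not in_hull_2d(points[k], points[:k]) for k in range(1, len(points)))
```

For instance, `admits_quasiconvex_order([(0, 0), (1, 0), (0, 1), (2, 2)])` holds, while appending an interior point of the current hull instead would violate (ii).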
It is clear from Corollary 1.5 and Example 1.6 that the problem of dimension inference is well-posed only in the presence of some genericity assumptions that guarantee convex dependence of the sampled points. Instead of making such an assumption explicit, we take a probabilistic perspective, wherein points are drawn from a probability distribution that is generic in some natural sense. We assume that there are three (unknown) objects that underlie any “convex sensing” data:

(i) an open convex set K ⊆ R^d,
(ii) m quasi-convex continuous functions F = {f_i: K → R}_{i=1}^m, and
(iii) a probability measure P_K on K.

In relation to the neuroscience motivation in Section 1.1, K ⊆ R^d is the stimulus space, each function f_i is the negative of the receptive field of a neuron, and P_K is the measure that describes the probability distribution of the stimuli. To guarantee that the convex sensing data are generic, we impose the following regularity assumptions.

Definition 1.7. A regular pair is a pair (F, P_K) that satisfies the conditions (i)–(iii) above as well as the following two conditions:

(R1) The probability measure P_K is equivalent to the Lebesgue measure on K.
(R2) Level sets of all functions in F are of measure zero, i.e. for every i ∈ [m] and ℓ ∈ R, P_K(f_i^{-1}(ℓ)) = 0.

Definition 1.8.
A point cloud {x_1, ..., x_n} ⊂ K is sampled from a regular pair (F, P_K) if it is i.i.d. from P_K. A matrix M = [M_{ia}] ∈ M^o_{m,n} is sampled from a regular pair (F, P_K) if, for all i ∈ [m] and a ∈ [n], M_{ia} = f_i(x_a), where {x_1, ..., x_n} ⊂ K is sampled from (F, P_K).

The assumption (R1) ensures that the domain K is well-sampled, and thus the probability that the points x_1, ..., x_n are convexly independent approaches zero in the limit of large n. The assumption (R2) guarantees, with probability 1, that the data matrix M has no repeated values in each row, and thus lies in M^o_{m,n}.

In this paper, we develop a method for estimating the dimension of convex sensing data. Intuitively, such an estimator needs to be consistent, i.e. “behave well” in the limit of large data. In addition to the conditions imposed on a regular pair, other properties of a pair (F, P_K) may be needed, depending on the context. It is therefore natural to define a consistent dimension estimator in relation to a particular class of regular pairs. Since an estimator may rely on different parameters for different regular pairs, we consider a one-parameter family of such estimators, motivating the following definition of consistency.

Definition 1.9.
Let RP be a class of regular pairs. For each regular pair (F, P_K) ∈ RP we denote by d(F, P_K) the dimension d of the ambient space in which the open convex set K ⊆ R^d is embedded. A one-parameter family of functions d̂^(ε): M^o_{m,n} → N is called an asymptotically consistent estimator in RP if, for every regular pair (F, P_K) ∈ RP, there exists l > 0 such that, for every ε ∈ (0, l) and each sequence of matrices M_n ∈ M^o_{m,n} sampled from (F, P_K),

lim_{n→∞} P( d̂^(ε)(M_n) = d(F, P_K) ) = 1.   (4)

The structure of this paper is as follows. In Section 2, we define two multi-dimensional filtrations of simplicial complexes: the empirical Dowker complex Dow(S(M)), which can be associated to a data matrix M, and the Dowker complex Dow(F, P_K), which can be associated to a regular pair (F, P_K). Using an interleaving distance between multi-filtered complexes, we prove (Theorem 2.9) that for a sequence {M_n} of data matrices, sampled from a regular pair (F, P_K), Dow(S(M_n)) → Dow(F, P_K) in probability, as n → ∞.

In Section 3, we develop tools for estimating the dimension of (F, P_K) using persistent homology. We define a set of maximal persistence lengths associated to Dow(F, P_K) and prove (Lemma 3.8) that a lower bound on the dimension of (F, P_K) can be derived from these persistence lengths. Next we define another set of maximal persistence lengths from Dow(S(M_n)) and prove (Theorem 3.10) that they converge in probability to the maximal persistence lengths associated to Dow(F, P_K), in the limit of large sampling of the data. The rest of Section 3 is devoted to two subsampling procedures for different practical situations, as well as simulation results that illustrate that the correct dimension can be inferred with these two methods.

In Section 4, we introduce complete regular pairs and prove (Theorem 4.3) that the lower bound in Lemma 3.8 is equal to the dimension d(F, P_K) for complete regular pairs. This establishes (Theorem 4.4) that the dimension estimator introduced in Section 3.3 is an asymptotically consistent estimator in the class of complete regular pairs. In Section 5, we define an estimator that can be used to test (Theorem 5.5) whether the data matrix is sampled from a complete regular pair.
The Appendix (Section 6) contains the proofs of the main theorems as well as some technical supporting lemmas.

In this section, we define the empirical Dowker complex from the m sequences induced from the rows of the data matrix M, and the Dowker complex from the regular pair (F, P_K), and prove that the empirical Dowker complex converges to the Dowker complex in probability. These complexes are both examples of multi-filtered simplicial complexes.

Definition 2.1.
Let I = ∏_{i∈[m]} I_i be an m-orthotope in R^m, where each I_i is an interval (open, closed, half-open, finite, or infinite are all allowed) in R. Let ≤ be the natural partial order on I induced from R^m. A multi-filtered simplicial complex D indexed over I is a collection {D_α}_{α∈I} of simplicial complexes on a fixed finite vertex set, such that D_α ⊆ D_β for all α ≤ β in I.

We define the empirical Dowker complexes from a collection of sequences of maximal length (i.e. of length n) on the vertex set [n].

Definition 2.2.
Let S = {s_1, ..., s_m} be a collection of sequences on [n] of length n. Let ≤_i be the total order on [n] induced from s_i; namely, for a, b ∈ [n], a ≤_i b if and only if a is before or equal to b in s_i. We define the following multi-filtered simplicial complex, with vertex set [m] and indexed over [0, 1]^m:

Dow(S) := { Dow(S)(t_1, ..., t_m) : (t_1, ..., t_m) ∈ [0, 1]^m },

where

Dow(S)(t_1, ..., t_m) := Δ({σ_a : a = 1, ..., n}),  and  σ_a = { i ∈ [m] : #({b ∈ [n] : b ≤_i a}) ≤ n t_i }.

Here Δ({σ_a}_{a∈[n]}) denotes the smallest simplicial complex containing the faces {σ_a}_{a∈[n]}. This filtered complex is called the empirical Dowker complex of S.

Recall from Section 1.2 that the relevant geometric information of the m × n data matrix M ∈ M^o_{m,n} is contained in the collection of m sequences S(M) = {s_1, ..., s_m}, where s_i ∈ S_n[n] is of length n and records the total order induced by the i-th row of M. Therefore, we can consider the empirical Dowker complex Dow(S(M)) derived from the data matrix M.

Note that our definition of the empirical Dowker complex is a multi-parameter generalization of the Dowker complex defined in [5]. Specifically, the one-dimensional filtration of simplicial complexes (indexed over t) Dow(S(M))(n · t, ..., n · t) is equal to the Dowker complex defined in [5].

Recall that, for a collection A = {A_i}_{i∈[m]} of sets, the nerve of A, denoted nerve(A), is the simplicial complex on the vertex set [m] defined as

nerve(A) := { σ ⊆ [m] : ⋂_{i∈σ} A_i ≠ ∅ }.

The following lemma is immediate from Definition 2.2.
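Definition 2.2 is directly computable: for a fixed parameter vector (t_1, ..., t_m), the generating faces σ_a are obtained from the row-wise ranks of the data matrix. A minimal sketch of ours (the complex is represented by its generating faces σ_a rather than by the full list of simplices):

```python
def ranks(row):
    """Rank of each column within a row with distinct entries:
    #({b : M[i][b] <= M[i][a]}) for each column a."""
    order = sorted(range(len(row)), key=lambda a: row[a])
    r = [0] * len(row)
    for position, a in enumerate(order):
        r[a] = position + 1
    return r

def dowker_faces(M, t):
    """Generating faces {sigma_a} of Dow(S(M))(t_1, ..., t_m):
    sigma_a = { i : rank_i(a) <= n * t_i }  (Definition 2.2)."""
    n = len(M[0])
    R = [ranks(row) for row in M]   # R[i][a] = rank of column a in row i
    return [frozenset(i for i in range(len(M)) if R[i][a] <= n * t[i])
            for a in range(n)]

M = [[5.23, 4.19, 2.56, 3.10],
     [3.78, 2.88, 5.76, 13.2]]
print(dowker_faces(M, (0.5, 0.5)))
```

At t = (1, ..., 1) every σ_a is the full vertex set [m], so the filtration ends with a simplex, as expected from the definition.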
Lemma 2.3.
Let S = {s_1, ..., s_m} be a collection of sequences on [n] of length n. For each i ∈ [m] and t ∈ R, consider

A^(i)(t) := { a ∈ [n] : #({b ∈ [n] : b ≤_i a}) ≤ nt } ⊆ [n],

where ≤_i is the total order on [n] induced by s_i. Then

Dow(S)(t_1, ..., t_m) = nerve({A^(i)(t_i)}_{i∈[m]}).

Next we connect the combinatorics of Dow(S(M)) to the geometry. From Lemma 2.3, we know that Dow(S(M)) is the nerve of {A^(i)(t_i)}_{i∈[m]}. To define an analogue of Dow(S(M)) from the regular pair (F, P_K), we use the following lemma (see the proof in Section 6.2) to define an analogue of A^(i)(t) from (F, P_K).

Lemma 2.4.
Let f: K → R be a continuous function with P_K(f^{-1}(ℓ)) = 0 for all ℓ ∈ R, where P_K is a probability measure on a convex open set K and P_K is equivalent to the Lebesgue measure on K. Then there exists a unique strictly increasing continuous function λ: (0, 1) → R such that, for all t ∈ (0, 1),

P_K(f^{-1}(−∞, λ(t))) = t.   (5)

For a regular pair (F, P_K) = ({f_i: K → R}_{i∈[m]}, P_K), by Lemma 2.4, for each i ∈ [m] there exists a unique strictly increasing continuous function λ_i: (0, 1) → R such that P_K(f_i^{-1}(−∞, λ_i(t))) = t. Using λ_i(t), the following definition provides a continuous analogue of A^(i)(t).

Definition 2.5.
Let (F, P_K) = ({f_i}_{i∈[m]}, P_K) be a regular pair. For each i ∈ [m] and t ∈ (0, 1), define

K^(i)(t) := f_i^{-1}(−∞, λ_i(t)),

where λ_i: (0, 1) → R is the unique function that satisfies P_K(f_i^{-1}(−∞, λ_i(t))) = t. For convenience, we also define K^(i)(0) := ∅ and K^(i)(1) := K.

Figure 2: K^(i)(t) is the sublevel set of f_i whose P_K measure is equal to t.

An illustration of K^(i)(t) can be found in Figure 2. These sets are simply sublevel sets of f_i, rescaled with respect to the P_K measure. On the other hand, for a point cloud X_n = {x_1, ..., x_n} sampled from P_K, if we identify [n] with X_n via a ↔ x_a, then A^(i)(t) may be interpreted as the set of points in X_n that lie inside an approximation of K^(i)(t). Informed by Lemma 2.3, we use K^(i)(t) to define the continuous version of the Dowker complex.

Definition 2.6.
Let (F, P_K) be a regular pair. Define a multi-filtered complex Dow(F, P_K), indexed over [0, 1]^m, by

Dow(F, P_K)(t_1, ..., t_m) := nerve({K^(i)(t_i)}_{i=1}^m).

This multi-filtered complex is called the Dowker complex induced from (F, P_K).

The complex Dow(S(M)) is what we can obtain from the data matrix M, but it does not capture the whole geometric information of (F, P_K). On the other hand, Dow(F, P_K) reflects the whole geometric information but is not directly computable. Since A^(i)(t_i) is an approximation of K^(i)(t_i), we might expect that Dow(S(M)) approximates Dow(F, P_K). As we shall see, this is the case; but to compare them formally, we need the concept of the interleaving distance.

Definition 2.7.
For a multi-filtered complex K indexed over R^m and ε > 0, the ε-shift of K, denoted K + ε, is the multi-filtered complex defined by

(K + ε)(t_1, ..., t_m) := K(t_1 + ε, ..., t_m + ε).

For two multi-filtered complexes K and L indexed over R^m, the simplicial interleaving distance between K and L is defined as

d_INT(K, L) := inf{ ε > 0 : K ⊆ L + ε and L ⊆ K + ε }.

Note that this interleaving distance is between multi-filtered simplicial complexes, while the standard interleaving distance in topological data analysis is between persistence modules, namely at the level where the homology functor has been applied to the multi-filtered complex (see, e.g., [8] for the standard definition of the interleaving distance between multi-dimensional persistence modules). Similar to the standard interleaving distance, the simplicial interleaving distance d_INT defined here is also a pseudo-metric; namely, d_INT(K, L) = 0 does not imply K = L.

The definition of the simplicial interleaving distance involves a shift of indices, and that is why the two multi-filtered complexes to be compared are required to be indexed over the whole of R^m. Since both Dow(S(M)) and Dow(F, P_K) are indexed only over [0, 1]^m, to compare them in terms of the interleaving distance we first need to extend their indexing domain to R^m. The definition below is a natural way to extend the indexing domain.

Definition 2.8.
For D = Dow(S(M)) or Dow(F, P_K) and (t_1, ..., t_m) ∈ R^m, define

D(t_1, ..., t_m) := D(θ(t_1), ..., θ(t_m)),

where θ: R → [0, 1] is defined by θ(t) = t if 0 ≤ t ≤ 1, θ(t) = 0 if t < 0, and θ(t) = 1 if t > 1.

Theorem 2.9 (Interleaving Convergence Theorem). Let (F, P_K) be a regular pair and M_n be an m × n data matrix sampled from (F, P_K). Then the simplicial interleaving distance between Dow(S(M_n)) and Dow(F, P_K) converges to 0 in probability as n → ∞; namely, for all ε > 0,

lim_{n→∞} Pr[ d_INT(Dow(S(M_n)), Dow(F, P_K)) > ε ] = 0.

The proof of Theorem 2.9 is given in Section 6.2. In Section 3, we use Theorem 2.9 to infer a lower bound for the dimension of (F, P_K).

First we recall the definitions of persistence modules, persistence intervals, and persistence diagrams; for more details see, e.g., Chapter 1 of [11]. Then we define the maximal persistence length for a 1-dimensional filtration of simplicial complexes. We fix a ground field F, which is normally taken to be F_2 for computational reasons; all the statements here do not depend on the choice of the field.

Definition 3.1. A persistence module M indexed over an interval [0, T] is a collection {M_t}_{t∈[0,T]} of vector spaces over F, along with linear maps φ_s^t: M_s → M_t for every s ≤ t in [0, T], such that φ_s^u = φ_t^u ∘ φ_s^t and φ_t^t = id_{M_t} for all s ≤ t ≤ u in [0, T].

A well-known structural characterization of a persistence module is via its persistence intervals (or, equivalently, its persistence diagram). To talk about persistence intervals, we need to define the direct sum of persistence modules and interval modules.

Definition 3.2.
Let M = {M_t}_{t∈[0,T]} and N = {N_t}_{t∈[0,T]} be persistence modules over the same index interval [0, T]. Let {φ_s^t : s, t ∈ [0, T], s ≤ t} and {ψ_s^t : s, t ∈ [0, T], s ≤ t} be the linear maps of M and N. The direct sum of M and N, denoted M ⊕ N, is the persistence module defined by (M ⊕ N)_t := M_t ⊕ N_t, along with the linear maps φ_s^t ⊕ ψ_s^t: M_s ⊕ N_s → M_t ⊕ N_t for every s ≤ t in [0, T].

Definition 3.3.
Let J ⊆ [0, T] be an interval in [0, T], which can be either open, closed, or half-open. The interval module I_J defined over [0, T] is the persistence module I_J = {(I_J)_t}_{t∈[0,T]} defined by (I_J)_t := F for all t ∈ J and (I_J)_t := 0 for all t ∉ J, along with the identity linear maps from (I_J)_s to (I_J)_t for every s ≤ t in J, and zero maps from (I_J)_s to (I_J)_t for all other s ≤ t in [0, T].

The next decomposition theorem is a structural theorem that characterizes persistence modules and guarantees the existence and uniqueness of persistence intervals (see, e.g., Sections 1.1 and 1.2 of [11] and references therein).

Theorem 3.4. Let M = {M_t}_{t∈[0,T]} be a persistence module over a closed interval [0, T]. If, for each t ∈ [0, T], M_t is a finite-dimensional vector space over F, then M can be decomposed as a direct sum of interval modules; namely,

M = ⊕_J I_J,

where {J} is a collection of intervals (which could be open, closed, or half-open) in [0, T]. The decomposition is unique in the sense that, for every such decomposition, the collection of intervals is the same.

Each interval J in the decomposition stated in Theorem 3.4 is called a persistence interval of M. We may summarize all persistence intervals as a 2D diagram in [0, T] × [0, T], called the persistence diagram of M: for each persistence interval with left end α and right end β, we mark a point (α, β) in [0, T] × [0, T]. The diagram consisting of all such points is called the persistence diagram of M, denoted dgm(M). Rigorously speaking, we should distinguish open, closed, and half-open intervals; for our purpose, we only use the lengths of the persistence intervals, and hence the distinction of open, closed, and half-open intervals does not really matter.

An important class of persistence modules is obtained from a 1-dimensional filtration of simplicial complexes by applying the homology functors H_k(·; F), k = 0, 1, 2, .... Specifically, for a 1-dimensional filtration of simplicial complexes K = {K_t}_{t∈[0,T]} and a fixed nonnegative integer k, we have the persistence module H_k(K; F) = {H_k(K_t; F)}_{t∈[0,T]}, along with the linear maps (i_s^t)_*: H_k(K_s; F) → H_k(K_t; F) for every s ≤ t in [0, T], where i_s^t is the inclusion map from K_s to K_t. Since H_k(·; F) is a covariant functor, the equality (i_t^u)_* ∘ (i_s^t)_* = (i_s^u)_* holds for every s ≤ t ≤ u in [0, T].

For each k, we may use the persistence diagram of H_k(K; F) for analysis.
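As an illustration of how persistence intervals arise from a filtration, in homological dimension k = 0 they can be computed by a union–find sweep over the 1-skeleton: when an edge merges two components, the younger one dies (the “elder rule”). This standard algorithm is sketched below for a filtered graph; it is not the paper’s pipeline, and all names are ours:

```python
import math

def h0_persistence(vertex_births, edges):
    """Persistence intervals of H_0 for a filtered graph.

    vertex_births: {vertex: birth time}
    edges: list of (time, u, v), the filtration time at which edge {u, v} appears.
    Returns (birth, death) pairs; death is math.inf for classes that never die
    (one per connected component of the final graph).
    """
    parent = {v: v for v in vertex_births}
    birth = dict(vertex_births)          # birth time of each component's class

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    intervals = []
    for t, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                     # edge creates a cycle; H_0 unchanged
        if birth[ru] > birth[rv]:
            ru, rv = rv, ru              # elder rule: the younger class dies
        intervals.append((birth[rv], t))
        parent[rv] = ru                  # merge younger into older
    roots = {find(v) for v in vertex_births}
    intervals.extend((birth[r], math.inf) for r in roots)
    return sorted(intervals)

births = {'a': 0.0, 'b': 0.2, 'c': 0.5}
edges = [(0.6, 'a', 'b'), (0.9, 'b', 'c'), (1.0, 'a', 'c')]
print(h0_persistence(births, edges))  # → [(0.0, inf), (0.2, 0.6), (0.5, 0.9)]
```

The lengths β − α of the finite intervals are exactly what the maximal persistence length below extracts from such a diagram.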
For our purpose, instead of the whole diagram, we summarize it by looking only at the longest length among all persistence intervals, which we formally define below.

Definition 3.5.
Let K = {K_t}_{t∈[0,T]} be a 1-dimensional filtration of simplicial complexes. For each nonnegative integer k, we define

l_max(k, K) := sup{ β − α : (α, β) ∈ dgm(H_k(K; F)) }   (6)

and call it the maximal persistence length in dimension k.

This definition is similar to the one used in Section 3 of [2]; the only difference is that the authors in [2] measure the maximal cycle multiplicatively, while we measure it additively. Normally, the length of a persistence interval in H_k(K; F) is viewed as its significance in dimension k. Therefore, l_max(k, K), the maximum among such interval lengths, is viewed as the significance of K in dimension k.

3.2 L_k(F, P_K) and its relation to the dimension of (F, P_K)

In this section, from the regular pair (F, P_K), we define quantities that we use to bound the dimension d(F, P_K) from below. We start with the following notation.

Definition 3.6.
Given (F, P_K), where F = {f_i: K → R}_{i∈[m]} is a collection of quasi-convex functions defined on a convex open set K and P_K is a probability measure on K, for x ∈ K we define

T_i(x) := P_K(f_i^{-1}(−∞, f_i(x))).

Figure 3: T_i(x) is the P_K measure of the shaded area f_i^{-1}(−∞, f_i(x)).

The function T_i(x) may be regarded as the P_K-rescaled version of f_i (see Figure 3 for an illustration). Now we define a one-dimensional filtration of simplicial complexes K_x that is used to infer a lower bound on the dimension d(F, P_K). The geometry underlying the definition is depicted in Figure 4.

Figure 4: From left to right, the filtration K_x(t), as the nerve of the sublevel sets of {f_i}_{i∈[m]}, starts at t = 0 as the empty simplicial complex and increases as t goes up to t_max(x), where the sublevel sets of {f_i}_{i∈[m]} touch the point x on their boundaries. The formal formulation of the process is in (8) of Definition 3.7.

Throughout Section 3, we fix an arbitrary coefficient field F when taking homology; namely, for a filtration of simplicial complexes K and a nonnegative integer k, H_k(K) := H_k(K; F).

Definition 3.7.
Let (F, P_K) be a regular pair, where F = {f_i: K → R}_{i∈[m]}. For x ∈ K, let

t_max(x) := max_{i∈[m]} T_i(x).   (7)

Define a one-dimensional filtered complex K_x, indexed over t ∈ [0, t_max(x)], by

K_x(t) := Dow(F, P_K)(T_1(x) − (t_max(x) − t), ..., T_m(x) − (t_max(x) − t)).   (8)

For every nonnegative integer k, we define

L_k(F, P_K) := sup_{x∈K} l_max(k, K_x).   (9)

As illustrated in Figure 4, if x is “central” in some appropriate sense (see Definition 4.1 in Section 4), a (d(F, P_K) − 1)-dimensional cycle appears in the filtration K_x. In any case, L_k(F, P_K) can at least be used to derive a lower bound for the dimension of the regular pair (F, P_K), due to the following lemma.

Lemma 3.8.
Let (F, P_K) be a regular pair. Then, for k ≥ d(F, P_K), L_k(F, P_K) = 0. In particular,

d_low(F, P_K) := 1 + max{ k : L_k(F, P_K) > 0 } ≤ d(F, P_K).   (10)

Proof.
For notational simplicity, in this proof we denote d_low = d_low(F, P_K) and d = d(F, P_K). Recall that Dow(F, P_K)(t_1, ..., t_m) = nerve({f_i^{-1}(−∞, λ_i(t_i))}_{i=1}^m). Since the functions f_i are quasi-convex, intersections of convex sets are convex, and convex sets are contractible, the collection {f_i^{-1}(−∞, λ_i(t_i))}_{i=1}^m is a good cover. Thus, by the nerve lemma (see, e.g., Theorem 10.7 in [1] or Corollary 4G.3 in [7]), we have the following homotopy equivalence:

Dow(F, P_K)(t_1, ..., t_m) ≃ ⋃_{i∈[m]} f_i^{-1}(−∞, λ_i(t_i)).   (11)

Notice that ⋃_{i∈[m]} f_i^{-1}(−∞, λ_i(t_i)) is open in R^d, and it is well known that, for every open set U ⊆ R^d, H_k(U) = 0 for all k ≥ d (see, e.g., Proposition 3.29 in [7]). Thus, for k ≥ d, H_k(⋃_{i∈[m]} f_i^{-1}(−∞, λ_i(t_i))) = 0. Combining with (11), we obtain H_k(Dow(F, P_K)(t_1, ..., t_m)) = 0 for k ≥ d and (t_1, ..., t_m) ∈ [0, 1]^m. Therefore, for k ≥ d, l_max(k, K_x) = 0 for all x ∈ K, and L_k(F, P_K) = 0. Thus d_low − 1 = max{k : L_k(F, P_K) > 0} ≤ d − 1, i.e. d_low ≤ d.

L_k(F, P_K) is defined with respect to a regular pair (F, P_K) and thus is not directly computable from discrete data. In Section 3.3, we follow an approach analogous to the definition of L_k(F, P_K) to define L_k(M), and prove that L_k(M) converges to L_k(F, P_K).

3.3 L_k(M) and its convergence to L_k(F, P_K)

In Theorem 2.9, we saw that, for the data matrix M, Dow(S(M)) approximates Dow(F, P_K) with high probability. Thus, it is natural to use Dow(S(M)) to define an analogue L_k(M) of L_k(F, P_K).

Definition 3.9.
Let M ∈ M°_{m,n} and S(M) = { s_1, ..., s_m } be the collection of m sequences induced from the rows of M (s_i corresponds to row i). For a ∈ [n] and i ∈ [m], denote

ord_i(M, a) := #{ b ∈ [n] : M_ib ≤ M_ia }. (12)

For a ∈ [n], which corresponds to the a-th column of the data matrix M, let

t̂_max(a) := max_{i ∈ [m]} ord_i(M, a)/n. (13)

Define a one-dimensional filtered complex K̂_a, indexed over t ∈ [0, t̂_max(a)], by

K̂_a(t) := Dow(S(M))( ord_1(M, a)/n − (t̂_max(a) − t), ..., ord_m(M, a)/n − (t̂_max(a) − t) ). (14)

See Definition 2.2 for the definition of Dow(S(M)). For every nonnegative integer k, we define

L_k(M) := max_{a ∈ [n]} l_max(k, K̂_a). (15)

Since Dow(S(M)) approximates Dow(F, P_K), intuitively, K̂_a(t) approximates K_{x_a}(t), and L_k(M) approximates L_k(F, P_K). With the help of Theorem 2.9 and the Isometry Theorem in topological data analysis (see, e.g., Theorem 6.16 in Section 6 of [11]), these intuitions are justified as follows:

Theorem 3.10.
Let (F, P_K) be a regular pair. Assume that K is bounded and each f_i ∈ F can be continuously extended to the closure K̄. Let M_n be an m × n matrix sampled from (F, P_K). Then, for all k ∈ {0} ∪ N, as n → ∞, L_k(M_n) converges to L_k(F, P_K) in probability; namely, for all ε > 0,

lim_{n→∞} Pr[ |L_k(M_n) − L_k(F, P_K)| < ε ] = 1.

Moreover, the rate of convergence is independent of k.

The proof of Theorem 3.10 is given in Section 6.4. According to Theorem 3.10, for each nonnegative integer k, L_k(M_n) is a consistent estimator of L_k(F, P_K), and these estimators converge uniformly in probability. Thus, by Lemma 3.8 and Theorem 3.10, we can estimate a lower bound for the dimension of (F, P_K) from the data matrix M by looking at the values of L_k(M_n). Formally, we can define the following estimator of d_low(F, P_K).

Definition 3.11.
For ε > 0 and M ∈ M°_{m,n}, we define

d̂_low(M, ε) := 1 + max{ k : L_k(M) > ε }. (16)

As a consequence of Lemma 3.8 and Theorem 3.10, it is immediate that d̂_low(M_n, ε) is a consistent estimator, for appropriately chosen ε.

Corollary 3.12.
Let (F, P_K) be a regular pair satisfying the conditions in Theorem 3.10, and let M_n ∈ M°_{m,n} be sampled from (F, P_K). Denote d_low = d_low(F, P_K). Then, for all 0 < ε < L_{d_low − 1}(F, P_K),

lim_{n→∞} Pr[ d̂_low(M_n, ε) = d_low(F, P_K) ] = 1. (17)

(Here, uniform convergence in probability means: for all ε > 0, lim_{n→∞} Pr[ sup_{k ≥ 0} |L_k(M_n) − L_k(F, P_K)| < ε ] = 1.)

Proof.
For notational simplicity, in this proof, we denote d = d(F, P_K). By Lemma 3.8 and Theorem 3.10, as n → ∞, L_k(M_n) → 0 for every k ≥ d_low, and L_{d_low − 1}(M_n) → L_{d_low − 1}(F, P_K) > 0, with the same rate of convergence. Since 0 < ε < L_{d_low − 1}(F, P_K), as n → ∞, w.h.p., L_{d_low − 1}(M_n) > ε and L_k(M_n) < ε for all k ≥ d_low. Therefore, w.h.p., d̂_low(M_n, ε) = d_low, and the result follows.

From Corollary 3.12, d̂_low(M_n, ε) can be used as a consistent estimator of d_low(F, P_K). However, we need to know how to choose an appropriate ε for d̂_low(M_n, ε), and hence estimation of L_k(F, P_K) is still necessary. Therefore, in practice, we suggest one use a statistical approach estimating L_k(F, P_K) to infer d_low(F, P_K), instead of using d̂_low(M_n, ε) directly. The details are discussed in Section 3.5.

3.4 Algorithm for K̂_a and L_k(M)

For ease of implementation, we combine Definition 3.9 and Definition 2.2 and summarize them as algorithms for the computation of K̂_a and L_k(M). Algorithm 1 is for K̂_a.

Algorithm 1
Computation of K̂_a

INPUTS:
(1) M: an m × n real matrix without repeated values in any row; namely, a matrix in M°_{m,n}.
(2) a: an integer in [n], referring to the a-th column of M.

OUTPUT:
[K̂_a(t)]_t: a filtration of simplicial complexes, where t ∈ { 0, 1/n, 2/n, ..., t̂_max(a) } (see Step 2 or Definition 3.9 for the definition of t̂_max(a)).

STEPS:

Step 1:
For i ∈ [m] and b ∈ [n], recall from Definition 3.9 the order (a positive integer)

ord_i(M, b) := #{ c ∈ [n] : M_ic ≤ M_ib }.

Step 2:
Define t̂_max(a) := (1/n) · max_{i ∈ [m]} ord_i(M, a).

Step 3:
Define an increasing filtration of simplicial complexes K̂_a by

K̂_a(t) := Δ( { σ_b : σ_b = { i ∈ [m] : ord_i(M, b) ≤ ord_i(M, a) − n(t̂_max(a) − t) } }_{b ∈ [n]} ),

where t ∈ { 0, 1/n, 2/n, ..., t̂_max(a) }.

The next algorithm, Algorithm 2, is for computing L_k(M). Note that, in the algorithm, PersistenceIntervals is a function with two inputs: a filtration of simplicial complexes, and a positive integer that limits the dimension in the computation of persistent homology, to avoid possibly intractable computational complexity. As the name suggests, the output of PersistenceIntervals is the persistence intervals of the first input in dimensions less than or equal to the second input.

Algorithm 2
Computation of L_k(M)

INPUTS:
(1) M: an m × n real data matrix.
(2) d_up: a positive integer, used to limit the dimension of the computation of persistent homology; namely, the persistent homology is only computed in dimensions 0, 1, ..., d_up to make it computationally feasible.

OUTPUT:
[L_k(M) for k = 0, 1, ..., d_up]: an array of nonnegative real numbers.

STEPS:

Step 1:
For a ∈ [n], compute

I_a := PersistenceIntervals(K̂_a, d_up) = { dgm(H_k(K̂_a)) : k = 0, 1, ..., d_up },

and

L_{a,max} := { max{ β − α : (α, β) ∈ dgm(H_k(K̂_a)) } : k = 0, 1, ..., d_up } = { l_max(k, K̂_a) : k = 0, ..., d_up },

where PersistenceIntervals(K̂_a, d_up) computes the persistence intervals of K̂_a in dimensions up to d_up.

Step 2:
For k = 0, 1, ..., d_up, compute L_k(M) by

L_k(M) := max{ l_max(k, K̂_a) : a ∈ [n] }.

The worst-case complexity of a standard algorithm for computing the persistent homology of a 1-dimensional filtration of simplicial complexes is cubic in the number of simplices (see, e.g., Section 5.3.1 in [10] and references therein). Since each K̂_a starts from the empty simplicial complex and ends at the full simplex Δ^{m−1}, we would in principle need to go through all faces of Δ^{m−1}. However, since we limit the computation to dimensions 0, 1, ..., d_up, where d_up ≤ m − 1, we only need the (min{ d_up + 1, m − 1 })-skeleton of Δ^{m−1}. Therefore, for our algorithm, the number of faces in the 1-dimensional filtration is

Σ_{k=0}^{min{ d_up + 2, m }} C(m, k),

which is O(m^{d_up + 2}). Since there are n choices of a ∈ [n], the worst-case complexity of computing { L_k(M) }_{k=0}^{d_up} is O(n · (m^{d_up + 2})^3) = O(n · m^{3 d_up + 6}), which is of degree 3 d_up + 6 in m but only linear in n.

Since the algorithm is linear in n, even in the case when n is large, as long as m is not too large, the algorithm is still tractable. Moreover, to use the full power of Theorem 3.10, we would want n to be large. In the case when n is large, we may subsample the points (i.e., the columns) to see how large the variance of L_k(M_n) is; this is called the bootstrap in statistics. Moreover, we can implement the subsampling for different numbers of columns and observe the trend of convergence.

On the other hand, to infer the dimension d(F, P_K), we need at least m ≥ d(F, P_K) + 1. Thus, we want m to be not too small. However, since the computational complexity of L_k(M_n) grows as a high-degree polynomial in m, we cannot have m too large. In the case when m is too large, we can overcome the computational difficulty by subsampling the functions (i.e.
the rows); namely, pick randomly m_s, say m_s = 10, functions, which correspond to their respective m_s rows of M_n, compute the L_k of the submatrix thus formed, repeat this process many times, and see how the result is distributed.

We elaborate on these two methods (subsampling points or subsampling functions) in the following two subsections. We also implement the methods for estimating the embedding dimension in their appropriate situations, plot the results, and give some principles for decision making (i.e., deciding, given the plot and k, whether we accept L_k(F, P_K) > 0).

3.5.1 Subsample points when n is sufficiently large

In the case when n is sufficiently large, say n ≥ n_s, we may subsample n_s points (i.e., columns of M_n) and obtain variance information. Moreover, letting n_s go up, we can further observe how the trend of convergence goes, which, by Theorem 3.10, should approach the true L_k(F, P_K). The technique of subsampling is called the bootstrap in statistics.

Figure 5 shows boxplots of L_k(M_{n_s}) obtained by implementing this idea under different settings of (d, m, n), where d = d(F, P_K) is the dimension of (F, P_K), m is the number of functions, and n is the number of data points. (The boxplot of a collection of real numbers is a box together with an upper whisker and a lower whisker attached to the top and bottom of the box, and possibly some dots above the upper whisker or below the lower whisker. From the box, one can read off the first quartile Q_1, the median Q_2, and the third quartile Q_3; Q_3 − Q_1 is the interquartile range (IQR). The lower and upper whiskers mark the values Q_1 − 1.5·IQR and Q_3 + 1.5·IQR, respectively. Values outside the whiskers are regarded as outliers and are drawn as dots.) Here, we choose (m, n) to be (10, 350), so that n is sufficiently large for subsampling. Subsampling is repeated 100 times for each boxplot. To compare with the result of a purely random matrix, we also generate a 10 × 350 matrix whose entries are i.i.d. from Unif(0, 1) and compute its L_k's. The details of how the boxplots are generated are in the caption of Figure 5.

[Figure 5 panels: (a) d = 3, (b) d = 4, (c) d = 5, (d) M: 10 × 350 random matrix.]

Figure 5: The four panels are boxplots of L_k obtained from subsampling the points. Throughout the panels, m = 10 and n = 350, where m is the number of functions and n is the number of total sample points. The panels correspond to (a) d = 3, (b) d = 4, (c) d = 5, where d = d(F, P_K) is the dimension of (F, P_K). The functions are chosen to be random quadratic functions defined on the unit d-ball in R^d. Panel (d) is obtained by computing the L_k's of an m × n matrix M_n with entries i.i.d. from Unif(0, 1), which is treated as a purely random matrix and whose main purpose is comparison with the other panels. Each figure in each panel is generated by subsampling n_s = 50, 150, 200 columns of M_n, repeated 100 times. By the decision principle, every figure in panel (a) successfully infers its respective true dimension; in panel (b), n_s = 50 fails to infer the true dimension 4 and only infers a lower bound 3, while n_s = 150, 200 successfully infer the true dimension 4; in panel (c), both n_s = 50, 150 fail to infer the true dimension 5 and only infer a lower bound 4, while n_s = 200 successfully infers the true dimension 5. In panel (d), the figure has quite different behavior.

The decision principle we propose to follow is: on each boxplot of L_k(M_{n_s}), if the first quartile Q_1 is not greater than 0, reject L_k(F, P_K) > 0; otherwise, accept L_k(F, P_K) > 0.

For panel (a), where d = 3, we can see that, as n_s goes up, the variance of L_k(M_{n_s}) for each k goes down. For k = 2, the first quartile of L_k(M_{n_s}) is greater than 0 even for n_s = 50; for k ≥ 3, L_k(M_n) stays at 0, with only some noise-like dots, throughout. According to this principle, we can conclude d ≥ 3, which recovers the true dimension d = 3. For panel (b), where d = 4, the same shrinking-variance behavior can be observed. Moreover, the principle concludes d ≥ 4 from n_s = 150 on, where the first quartile starts to stay away from 0. Similarly, for panel (c), where d = 5, at n_s = 50 and n_s = 150 our principle concludes only d ≥ 4, while at n_s = 200 it concludes d ≥ 5. For larger d, we would need n_s to be larger to make the best conclusion (i.e., inferring the true dimension). However, as n_s goes up, the variance of L_k(M_{n_s}) goes down, and we may also use this information. Therefore, in the small-sample case, one may rely on this convergence behavior and develop other principles by quantifying the trend of convergence. For example, in panel (c), where d = 5, when n_s goes up from 50 to 150, we observe that L_4(M_{n_s}) pokes out from noise-like outliers to a filled box. This trend suggests that we "may accept" L_4(F, P_K) > 0.

3.5.2 Subsample functions when m is large

As we mentioned earlier, the worst-case computational complexity of L_k(M_n) grows polynomially, but with high degree 3 d_up + 6, in m, the number of rows of M_n. To overcome this difficulty, we propose to subsample the rows (i.e., the collection of functions) of M_n.
Specifically, for a fixed number m_s < m, we randomly choose m_s rows of M_n, construct the m_s × n submatrix M_{m_s × n} accordingly, compute L_k(M_{m_s × n}), and repeat the process as many times as assigned. Figure 6 shows boxplots of L_k(M_{m_s × n}) with m_s = 10, repeated N_rep = 1000 times, under different settings. Notice that throughout the plots, m = 100, n = 150, m_s = 10, and N_rep = 1000. We still adopt the principle of the last subsection: we only accept L_k(F, P_K) > 0 when the first quartile (i.e., the 25th percentile) of the boxplot of L_k is greater than 0. (It can be proved that, if the entries of M_n ∈ M°_{m,n} are i.i.d. from Unif(0, 1), then L_k(M_n) behaves as positive for k ≤ m − 2, while L_k(M_n) behaves as going to zero for k ≥ m − 1; thus L_k(M_n) reflects m instead of an intrinsic d.)

[Figure 6 panels: (a) d = 2, (b) d = 3, (c) d = 4, (d) d = 5.]

Figure 6: Boxplots of L_k obtained from subsampling functions. Throughout the panels, m = 60, n = 150, and m_s = 10, where m is the number of functions (rows of M_n), n is the number of total sample points, and m_s is the number of functions used in each function subsampling. Each panel is generated under a fixed regular pair with (a) d = 2, (b) d = 3, (c) d = 4, (d) d = 5, where d is the true dimension of the regular pair. Function subsampling is repeated 1000 times for each panel.

A lower bound for d(F, P_K) may not be very satisfactory. In Sections 4 and 5, we develop some theory and methods to decide whether the lower bound obtained in this section is indeed the dimension d(F, P_K).

4 d̂_low(M, ε) as an asymptotically consistent dimension estimator in the class of complete regular pairs

We established in Section 3 that a lower bound d_low(F, P_K) of d(F, P_K) is generally inferable from sampled data. Here we provide a sufficient condition for d_low(F, P_K) = d(F, P_K); this ensures that the dimension d(F, P_K) can be inferred with high probability. Recall that the conic hull of a set S ⊆ R^d, denoted cone(S), is the set

cone(S) := { Σ_{i=1}^k c_i v_i | c_1, ..., c_k ≥ 0, v_1, ..., v_k ∈ S, k ∈ N }. (18)

Definition 4.1.
Let (F, P_K) be a regular pair, where F = { f_i }_{i ∈ [m]} and each f_i : K → R is differentiable. The set

Cent¹ = { x ∈ K | cone({ ∇f_1(x), ..., ∇f_m(x) }) = R^d }

is called the type 1 central region of (F, P_K).

Definition 4.2.
A regular pair (F, P_K) is said to be complete if its Cent¹ is non-empty.

It is perhaps intuitive (see Figure 4 on page 14) that, for a sufficiently nice complete regular pair, the lower bound in Lemma 3.8 is indeed the dimension d(F, P_K). More precisely:

Theorem 4.3.
Let (F, P_K) be a regular pair, where F = { f_i }_{i ∈ [m]} and each f_i : K → R is differentiable. If (F, P_K) is complete, then the lower bound in Lemma 3.8 is indeed the dimension of the regular pair, i.e., d_low(F, P_K) = d(F, P_K).

The proof is given in Section 4.1. An immediate corollary of the above theorem and Corollary 3.12 is the following.
Theorem 4.4.
Let (F, P_K) be a regular pair satisfying the conditions in Theorem 3.10, where F = { f_i }_{i ∈ [m]} and each f_i : K → R is differentiable. If (F, P_K) is a complete regular pair with dimension d = d(F, P_K), and matrices M_n ∈ M°_{m,n} are sampled from (F, P_K), then for every ε ∈ (0, L_{d−1}(F, P_K)),

lim_{n→∞} Pr[ d̂_low(M_n, ε) = d(F, P_K) ] = 1. (19)

In other words, d̂(ε) : M°_{m,n} → N defined by d̂(ε)(M_n) := d̂_low(M_n, ε) is an asymptotically consistent estimator in the class of complete regular pairs.

Proof.
By Theorem 4.3, d_low(F, P_K) = d(F, P_K). Moreover, by Corollary 3.12,

lim_{n→∞} Pr[ d̂_low(M_n, ε) = d_low(F, P_K) ] = 1.

Thus, the result follows.
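Once the values L_k(M) have been produced by Algorithm 2, the estimator of Definition 3.11 (used in Theorem 4.4) is a one-line computation. A minimal sketch, assuming `L` is a list with `L[k]` holding L_k(M); the empty-max case, which the definition leaves open, is returned as 0 here by convention:

```python
def d_hat_low(L, eps):
    # Definition 3.11: d_hat_low(M, eps) = 1 + max{ k : L_k(M) > eps }.
    # L is assumed to hold precomputed values [L_0(M), ..., L_{d_up}(M)],
    # e.g. the output of Algorithm 2.  If no L_k exceeds eps, we return 0
    # (a convention: the maximum is then over the empty set).
    ks = [k for k, lk in enumerate(L) if lk > eps]
    return 1 + max(ks) if ks else 0
```

By Theorem 4.4, any fixed ε ∈ (0, L_{d−1}(F, P_K)) makes this consistent for complete regular pairs; since that interval is unknown in practice, Section 3.5 recommends inspecting bootstrap distributions of L_k instead of committing to a single ε.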
Recall the following notation from Section 3. Let (F, P_K) = ({ f_i }_{i ∈ [m]}, P_K) be a regular pair; for any t ∈ [0, 1], K^(i)(t) := f_i^{-1}(−∞, λ_i(t)), where λ_i(t) is a monotone-increasing function that satisfies P_K(f_i^{-1}(−∞, λ_i(t))) = t. For any x ∈ K, we also denote T_i(x) := P_K(f_i^{-1}(−∞, f_i(x))). Theorem 4.3 follows from the following key lemma.
Lemma 4.5.
Let ({ f_i }_{i=1}^m, P_K) be a complete regular pair. Suppose x_0 ∈ Cent¹. Then there exists ε > 0 such that

nerve( { K^(i)(T_i(x_0) − t) }_{i ∈ [m]} ) ≃ S^{d−1}, for all t ∈ [0, ε). (20)

Proof of Theorem 4.3.
By Lemma 3.8, d_low(F, P_K) ≤ d(F, P_K). We therefore only need to prove L_{d−1}(F, P_K) > 0. Let x_0 ∈ Cent¹. By Lemma 4.5, l_max(d − 1, K_{x_0}) > 0, and hence L_{d−1}(F, P_K) > 0, completing the proof.
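The completeness condition of Definition 4.1 asks whether the gradients at a point positively span R^d. For intuition (this is not part of the paper's pipeline, and an exact test would use linear programming), note that cone({v_1, ..., v_m}) = R^d exactly when the dual cone { w : ⟨v_i, w⟩ ≥ 0 for all i } is {0}; a randomized sketch can therefore look for a nonzero dual witness:

```python
import math
import random

def cone_probably_full(vectors, dim, trials=5000, seed=0):
    # Heuristic check of the Cent^1 condition: cone(vectors) = R^dim
    # iff the only w with <v_i, w> >= 0 for all i is w = 0.  We search
    # for a nonzero witness w by sampling random unit vectors; finding
    # one certifies cone != R^dim, while finding none only suggests
    # (does not prove) that the cone is all of R^dim.
    rng = random.Random(seed)
    for _ in range(trials):
        w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            continue
        w = [c / norm for c in w]
        if all(sum(vi * wi for vi, wi in zip(v, w)) >= 0.0 for v in vectors):
            return False  # w lies in the dual cone, so cone(vectors) != R^dim
    return True
```

For example, in R^2 the gradients (1, 0), (0, 1), (−1, −1) positively span the plane, while (1, 0) and (0, 1) alone do not.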
Proof of Lemma 4.5.
For each i ∈ [m], we denote by

Ω_i = K^(i)(T_i(x_0)) = { x ∈ K | f_i(x) < f_i(x_0) }

the appropriate open convex sublevel set of f_i. Note that K^(i)(T_i(x_0) − t) ⊆ Ω_i for any t ≥ 0, and the nerve on the left-hand side of (20) is a subcomplex of nerve({ Ω_i }_{i ∈ [m]}).

For each non-empty σ ∈ nerve({ Ω_i }_{i ∈ [m]}), the subset ∩_{i ∈ σ} Ω_i is open and non-empty, and hence has nonzero P_K measure. Thus there exists ε_σ > 0 such that ∩_{i ∈ σ} K^(i)(T_i(x_0) − t) is non-empty for any t ∈ [0, ε_σ). Choosing ε to be the minimum of all such ε_σ thus guarantees that

nerve( { K^(i)(T_i(x_0) − t) }_{i ∈ [m]} ) = nerve( { Ω_i }_{i ∈ [m]} ) for all t ∈ [0, ε).

It thus suffices to prove (20) for t = 0. Since each Ω_i is open and convex, by the nerve lemma (see, e.g., Theorem 10.7 in [1] or Corollary 4G.3 in [7]) it is enough to show that ∪_{i ∈ [m]} Ω_i ≃ S^{d−1}. Moreover, since x_0 lies on the boundary of each Ω_i, the union { x_0 } ∪ ∪_{i ∈ [m]} Ω_i is star-shaped (see Lemma 6.24 in Section 6.5). Therefore, it suffices to prove that there exists η > 0 such that

B_η(x_0) \ { x_0 } ⊆ ∪_{i ∈ [m]} Ω_i, (21)

where B_η(x_0) = { x ∈ R^d : ‖x − x_0‖ < η }.

Suppose no such η > 0 exists. Then, for every n ∈ N, there exists a unit vector v_n ∈ S^{d−1} = { x ∈ R^d : ‖x‖ = 1 } such that x_n = x_0 + (1/n)·v_n ∉ ∪_{i ∈ [m]} Ω_i. By compactness of S^{d−1}, there is an infinite subsequence { v_{n_j} } that converges to a particular v* ∈ S^{d−1}. Since all f_i are differentiable,

n_j ( f_i(x_{n_j}) − f_i(x_0) ) = ⟨∇f_i(x_0), v_{n_j}⟩ + O(1/n_j). (22)

Since x_{n_j} ∉ Ω_i for all i, we have f_i(x_{n_j}) ≥ f_i(x_0). Taking lim inf_{j→∞} on both sides of equation (22), we conclude that ⟨∇f_i(x_0), v*⟩ ≥ 0 for all i ∈ [m]. Since −v* ∈ cone({ ∇f_i(x_0) }_{i ∈ [m]}) = R^d, choosing appropriate nonnegative coefficients in (18) yields 0 ≤ ⟨−v*, v*⟩, and hence v* = 0, a contradiction. Therefore the inclusion (21) holds for some η > 0.

5 Completeness of (F, P_K) from sampled data

Theorem 4.3 establishes that completeness of (F, P_K) implies d_low(F, P_K) = d(F, P_K), and thus the data dimension d(F, P_K) can be inferred from sampled data. Unfortunately, completeness cannot be directly tested from sampled data, since the gradient information is not directly accessible from discrete samples.
Here we consider a different notion of central region, Cent⁰ ⊆ K, which, under some generic assumption, is indistinguishable from Cent¹ in the probability measure P_K (Lemma 5.3). We also establish that the probability measure of Cent⁰ can be approximated from sampled data (Theorem 5.5). This enables one to test completeness of a regular pair from sampled data.

Definition 5.1.
Let (F, P_K) be a regular pair. The subset

Cent⁰ = { x ∈ K | ∩_{i ∈ [m]} f_i^{-1}(−∞, f_i(x)) = ∅ }

is called the type 0 central region of (F, P_K).

Definition 5.2.
A set of vectors V = { v_1, ..., v_m } ⊆ R^d is said to be in general direction if, for every σ ⊆ [m] with |σ| ≤ d, the set of vectors { v_i }_{i ∈ σ} is linearly independent. A collection of differentiable functions F = { f_i : K → R }_{i ∈ [m]} is said to be in general position if, for (Lebesgue) almost every x in K, the vectors { ∇f_i(x) }_{i ∈ [m]} are in general direction.

Lemma 5.3.
Let (F, P_K) be a regular pair, where each function in F is differentiable. Assume that F is in general position. Then

P_K(Cent¹ \ Cent⁰) = P_K(Cent⁰ \ Cent¹) = 0. (23)

The proof is given in Section 5.1. It can be shown that Cent¹ of a regular pair is an open set (see Lemma 6.25 in the Appendix). Thus completeness of a regular pair (F, P_K) is equivalent to P_K(Cent¹) > 0. Lemma 5.3 ensures that completeness of a regular pair in general position is equivalent to P_K(Cent⁰) > 0. In order to test whether P_K(Cent⁰) > 0, one can use the following natural discretization.
Definition 5.4.
For a matrix M ∈ M°_{m,n}, the set

Ĉent(M) := { a ∈ [n] : ∩_{i ∈ [m]} { b ∈ [n] : M_ib < M_ia } = ∅ }

is called the discretized central region.

If a matrix M ∈ M°_{m,n} is sampled from a regular pair, then for each a ∈ [n], the set { b ∈ [n] : M_ib < M_ia } is a discretization of f_i^{-1}(−∞, f_i(x_a)), and Ĉent(M) can be thought of as an approximation of Cent⁰. The following theorem confirms this intuition.

Theorem 5.5.
Let M_n ∈ M°_{m,n} be sampled from a regular pair. Then (1/n)·#(Ĉent(M_n)) converges to P_K(Cent⁰) in probability: for all ε > 0,

lim_{n→∞} Pr[ | (1/n)·#(Ĉent(M_n)) − P_K(Cent⁰) | > ε ] = 0.

The proof involves technicalities used in proving the Interleaving Convergence Theorem (Theorem 2.9) and is given in Section 6.7 in the Appendix. Theorem 5.5 establishes that (1/n)·#(Ĉent(M_n)) serves as an approximation of P_K(Cent⁰), and thus enables one to test whether P_K(Cent⁰) > 0, i.e., the completeness of (F, P_K).

5.1 Proof of Lemma 5.3

First we prove the first part of Lemma 5.3.
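As an implementation note for Theorem 5.5: the discretized central region of Definition 5.4 is directly computable from the data matrix. A minimal sketch (pure Python, O(m·n²); `M` is given as a list of rows):

```python
def discretized_central_region(M):
    # Definition 5.4: column a belongs to the discretized central region
    # iff the intersection over all rows i of { b : M[i][b] < M[i][a] }
    # is empty, i.e. no single column is strictly below column a in
    # every row.
    m, n = len(M), len(M[0])
    central = []
    for a in range(n):
        below = set(range(n))
        for i in range(m):
            below &= {b for b in range(n) if M[i][b] < M[i][a]}
            if not below:
                break
        if not below:
            central.append(a)
    return central
```

By Theorem 5.5, `len(discretized_central_region(M)) / n` approximates P_K(Cent⁰), so a fraction bounded away from 0 over increasing n is evidence of completeness.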
Lemma 5.6.
Let (F, P_K) = ({ f_i }_{i ∈ [m]}, P_K) be a regular pair, where each function in F is differentiable. Assume that F is in general position. Then P_K(Cent¹ \ Cent⁰) = 0.

Proof. Let K′ = { x ∈ K | ∃ i, ∇f_i(x) = 0 } denote the union of the critical points of the functions in F. Since F is in general position, K′ has Lebesgue measure zero. Assume x_0 ∈ Cent¹ \ K′, so that

for all u ∈ R^d, there exist c_1, ..., c_m ≥ 0 such that u = Σ_{i=1}^m c_i ∇f_i(x_0). (24)

It can be easily shown (see, e.g., Theorem 3.2.3 in [4]) that if f is differentiable and quasi-convex on an open convex K with ∇f(x_0) ≠ 0, then f(x) < f(x_0) implies ⟨∇f(x_0), x − x_0⟩ < 0. Thus

∩_{i ∈ [m]} f_i^{-1}(−∞, f_i(x_0)) ⊆ ∩_{i ∈ [m]} { x | ⟨∇f_i(x_0), x − x_0⟩ < 0 } = ∅,

where the last equality follows from (24), as one can choose u = x − x_0. This implies x_0 ∈ Cent⁰. Therefore, Cent¹ \ K′ ⊆ Cent⁰, and P_K(Cent¹ \ Cent⁰) ≤ P_K(K′) = 0.

To prove the second half of Lemma 5.3, we first recall that a convex cone C ⊆ R^d is called flat if there exists w ≠ 0 such that both w ∈ C and −w ∈ C. Otherwise, it is called salient. If a convex cone C is closed and salient, then there exists w ∈ R^d such that ⟨u, w⟩ < 0 for all non-zero u ∈ C.

Lemma 5.7.
Let (F, P_K) = ({ f_i }_{i ∈ [m]}, P_K) be a regular pair, where each f_i is differentiable. Then

Cent⁰ \ Cent¹ ⊆ { x ∈ K : cone({ ∇f_i(x) }_{i ∈ [m]}) is flat but not R^d }. (25)

Proof.
Let x_0 ∈ Cent⁰ \ Cent¹. Denote C = cone({ ∇f_i(x_0) }_{i ∈ [m]}). Since x_0 ∉ Cent¹, C ≠ R^d. Thus, it suffices to prove that the cone C is flat. Suppose that the cone C is not flat. Then there exists w such that ⟨u, w⟩ < 0 for all non-zero u ∈ C; in particular, ⟨∇f_i(x_0), w⟩ < 0 for all i ∈ [m]. Let us show that, for every i ∈ [m], there exists α_i > 0 such that x_0 + α_i w ∈ f_i^{-1}(−∞, f_i(x_0)). Suppose not; then there exists i ∈ [m] such that f_i(x_0 + αw) ≥ f_i(x_0) for all α > 0, and hence

⟨∇f_i(x_0), w⟩ = lim inf_{α → 0+} ( f_i(x_0 + αw) − f_i(x_0) ) / α ≥ 0,

which is a contradiction. Thus, such positive α_i's exist, and we obtain that

x_0 + ( min_{i ∈ [m]} α_i ) w ∈ ∩_{i ∈ [m]} f_i^{-1}(−∞, f_i(x_0)).

This contradicts the assumption that x_0 ∈ Cent⁰. Therefore the cone C is flat.

It can be shown that the inclusion in (25) is in fact an equality; however, since we do not need the equality here, it is left out of the proof. To finish the proof of Lemma 5.3, we use the following lemma. (Salient cones are also called pointed cones. It is well known (see, e.g., Section 2.6.1 in [3]) that if C ⊂ R^d is a closed salient cone, then its dual cone C* := { w ∈ R^d : ⟨w, u⟩ ≥ 0 for all u ∈ C } has nonempty interior, and hence so does −C* = { w ∈ R^d : ⟨w, u⟩ ≤ 0 for all u ∈ C }. Since, for any fixed non-zero u, the set { w ∈ R^d : ⟨w, u⟩ = 0 } has measure zero, the interior of −C* contains a vector w with ⟨w, u⟩ < 0 for all non-zero u ∈ C, and any such vector satisfies the property claimed above.)

Lemma 5.8. Let V = { v_1, ..., v_m } ⊂ R^d be a set of vectors in general direction. Then either cone(V) = R^d or cone(V) is salient.

To prove Lemma 5.8, we use the following lemma.
Lemma 5.9 (see, e.g., Theorem 2.5 in [12]). Let V = { v_1, ..., v_m } be a set of non-zero vectors in R^d. Then the following two statements are equivalent:

(i) cone(V) = span(V);
(ii) for each i ∈ [m], −v_i ∈ cone(V \ { v_i }).

Proof of Lemma 5.8. For m ≤ d, the vectors { v_1, ..., v_m } are linearly independent, by the general-direction assumption. Suppose there exists w ∈ R^d such that w, −w ∈ cone(V). Then there exist a_i, b_i ≥ 0 such that w = Σ_{i=1}^m a_i v_i = −Σ_{i=1}^m b_i v_i. Since the vectors { v_1, ..., v_m } are linearly independent, a_i + b_i = 0 for all i ∈ [m], and thus w = 0. Therefore cone(V) is salient.

For m > d, we prove the result by induction on the size of V. Suppose the result holds for any set of m ≥ d vectors in general direction. Let V = { v_1, ..., v_{m+1} } be a set of m + 1 vectors in general direction. Since any d vectors in V form a basis of R^d, span(V) = R^d. Suppose the result is false for V; equivalently, cone(V) ≠ R^d and cone(V) is flat. By Lemma 5.9, there exists j ∈ [m + 1] such that

−v_j ∉ cone(V \ { v_j }). (26)

Since cone(V) is flat, there exists a nonzero w ∈ R^d such that w, −w ∈ cone(V), and thus w = Σ_{i=1}^{m+1} a_i v_i = −Σ_{i=1}^{m+1} b_i v_i, with a_i, b_i ≥ 0 for all i. Let us prove that a_j + b_j > 0. If a_j + b_j = 0, then a_j = b_j = 0. Thus w, −w ∈ cone(V \ { v_j }), and cone(V \ { v_j }) is not salient. Since |V \ { v_j }| = m, by the induction hypothesis, we must have cone(V \ { v_j }) = R^d. However, cone(V) ≠ R^d, and hence cone(V \ { v_j }) ≠ R^d, a contradiction. Therefore a_j + b_j > 0, and we can conclude that

−v_j = Σ_{i ∈ [m+1] \ { j }} ( (a_i + b_i)/(a_j + b_j) ) v_i ∈ cone(V \ { v_j }),

contradicting (26). Therefore, the result holds for any V in general direction of size |V| = m + 1. This completes the proof by induction.

We now finish the proof of Lemma 5.3.

Proof of Lemma 5.3.
The first half of the proof of Lemma 5.3 is done in Lemma 5.6. To prove the second half, we combine Lemma 5.7 and Lemma 5.8 to obtain

Cent⁰ \ Cent¹ ⊆ { x ∈ K : { ∇f_i(x) }_{i ∈ [m]} is not in general direction }. (27)

Since { f_i }_{i ∈ [m]} is in general position, the right-hand side of (27) has measure zero, completing the proof.

Proof.
For any a ≤ n − 1, the point x_a is ordered last in the sequence s_a = (···, n, a); thus, by Lemma 1.3, each such point x_a cannot be in the interior of the convex hull of the other points; therefore x_n ∈ conv(x_1, ..., x_{n−1}). Assume that the embedding dimension is d ≤ n − 3. Then, by Carathéodory's theorem, we conclude that there exists b ∈ [n − 1] such that

x_n ∈ conv(x_1, ..., x̂_b, ..., x_{n−1}). (28)

However, by assumptions (3) there exists a continuous quasi-convex function f_b such that f_b(x_a) [...], contradicting (28). For d ≥ 2, one can place the points x_1, ..., x_{n−1} at the vertices of an (n − 2)-simplex in R^{n−2}, and place x_n at the barycenter of that simplex. By construction, { x_1, ..., x_{n−1} } are convexly independent, and we have the following convex-hull relations for every i < n: x_n ∉ conv({ x_1, ..., x_{n−1} } \ { x_i }), and x_i ∉ conv({ x_1, ..., x_n } \ { x_i }). Therefore, by Lemma 1.3, there exist quasi-convex continuous functions that realize the sequences in (3).

6.2 λ_i(t) for t ∈ (0, 1)

Lemma (Lemma 2.4). Let f : K → R be a continuous function with P_K(f^{-1}(ℓ)) = 0 for all ℓ ∈ R, where P_K is a probability measure on a convex open set K and P_K is equivalent to the Lebesgue measure on K. Then there exists a unique strictly increasing continuous function λ : (0, 1) → R such that, for all t ∈ (0, 1),

P_K( f^{-1}(−∞, λ(t)) ) = t. (29)

Proof. Since K is path-connected, by the intermediate value theorem, f(K) is an interval in R. Define a function p_f : f(K) → [0, 1] by p_f(ℓ) := P_K(f^{-1}(−∞, ℓ)). Rewriting equation (5) as p_f(λ(t)) = t, we note that λ(t) (if it exists) is the inverse of p_f, proving uniqueness of λ(t). For the existence and continuity of λ(t), it suffices to prove that p_f is continuous and strictly increasing.

To prove that p_f is continuous, we prove that it is continuous from the right and from the left. (Recall that a sequence (ℓ_n)_n ⊂ R goes up to ℓ_0 ∈ R, denoted ℓ_n ↗ ℓ_0, if ℓ_n ≤ ℓ_{n+1} for all n and lim_{n→∞} ℓ_n = ℓ_0; ℓ_n ↘ ℓ_0 is defined similarly. A sequence of sets (A_n)_n goes up to a set A, denoted A_n ↗ A, if A_n ⊆ A_{n+1} for all n and ∪_n A_n = A; A_n ↘ A is defined similarly.) Let ℓ_0 ∈ f(K). For ℓ_n ↗ ℓ_0 in f(K), by definition, f^{-1}(−∞, ℓ_n) ↗ f^{-1}(−∞, ℓ_0). Since P_K is a measure, applying P_K on both sides, we obtain p_f(ℓ_n) ↗ p_f(ℓ_0). Thus p_f is continuous from the left. On the other hand, for ℓ_n ↘ ℓ_0 in f(K), by definition,

( ∩_{n=1}^∞ f^{-1}(−∞, ℓ_n) ) \ f^{-1}(−∞, ℓ_0) = f^{-1}(ℓ_0).

Thus p_f(ℓ_n) ↘ P_K( ∩_{n=1}^∞ f^{-1}(−∞, ℓ_n) ) = P_K(f^{-1}(−∞, ℓ_0)) + P_K(f^{-1}(ℓ_0)) = p_f(ℓ_0) + 0 = p_f(ℓ_0), and p_f is continuous from the right. Therefore, p_f is a continuous function.

Now we turn to proving that p_f is strictly increasing. For ℓ_1 < ℓ_2 in f(K), we need to prove p_f(ℓ_1) < p_f(ℓ_2). Let U_1 = f^{-1}(−∞, ℓ_1) and U_2 = f^{-1}(−∞, ℓ_2), which are open convex sets with U_1 ⊆ U_2. Since f(K) is an interval, for any ℓ ∈ (ℓ_1, ℓ_2), there exists x ∈ K with f(x) = ℓ. Thus U_1 ≠ U_2. Note that U_2 ⊄ cl(U_1); otherwise, U_1 ⊆ U_2 = int(U_2) ⊆ int(cl(U_1)) = U_1 would imply U_1 = U_2, where the last equality follows from openness and convexity of U_1. Thus, there exists x_0 ∈ U_2 \ cl(U_1). Choose ε > 0 such that B(x_0, ε) ⊆ U_2 but B(x_0, ε) ∩ cl(U_1) = ∅. Then P_K(U_2) ≥ P_K(U_1) + P_K(B(x_0, ε)) > P_K(U_1); equivalently, p_f(ℓ_2) > p_f(ℓ_1). Hence, p_f is strictly increasing.

6.3 Proof of Interleaving Convergence Theorem

The goal of this subsection is to prove Theorem 2.9, the Interleaving Convergence Theorem. The asymptotic behavior of Dow(S(M)) actually follows from the asymptotic behaviors of several building blocks of Dow(S(M)).
We will first define these building blocks and prove their own asymptotic theorems, and then put these asymptotic theorems together to prove the Interleaving Convergence Theorem.

We start with an object that, as will be seen, can be used to express $\mathrm{Dow}(\mathcal{F}, P_K)$. Recall that, for $t \in [0,1]$ and $i \in [m]$,
$$K^{(i)}(t) \stackrel{\mathrm{def}}{=} f_i^{-1}(-\infty, \lambda_i(t)),$$
where $\lambda_i(t) \in \mathbb{R}$ satisfies $P_K(f_i^{-1}(-\infty, \lambda_i(t))) = t$.

Definition 6.1. For a regular pair $(\mathcal{F}, P_K)$, define a function $R_\infty : [0,1]^m \to [0,1]$ by
$$R_\infty(t_1, \dots, t_m) \stackrel{\mathrm{def}}{=} P_K\left( \bigcap_{i \in [m]} K^{(i)}(t_i) \right).$$

It is easy to see that $R_\infty$ is a cumulative distribution function (CDF). We next introduce another CDF, denoted $R_n$, which will be used as an intermediate between $\mathrm{Dow}(\mathcal{F}, P_K)$ and $\mathrm{Dow}(S(M))$. ($A_n \searrow A$ is defined similarly to $A_n \nearrow A$.)

Definition 6.2. For a point cloud $X_n \subset K$ of size $n$, sampled from a regular pair, we define a function $R_n : [0,1]^m \to [0,1]$ by
$$R_n(t_1, \dots, t_m) \stackrel{\mathrm{def}}{=} \frac{1}{n} \cdot \left| X_n \cap \bigcap_{i \in [m]} K^{(i)}(t_i) \right|.$$

For those familiar with nonparametric statistics, it is easy to see that $R_n$ is in fact the empirical cumulative distribution function (empirical CDF) of $R_\infty$. However, $R_n$ is still not obtainable from the $m \times n$ data matrix $M_n = [f_i(x_a)]$, since $K^{(i)}(t_i)$ is not directly accessible from $M_n$. The next definition solves this problem by considering a step-function approximation $K_n^{(i)}(t)$ of $K^{(i)}(t)$.

Definition 6.3. Let $M_n \in \mathcal{M}^o_{m,n}$ be sampled from a regular pair $(\mathcal{F}, P_K)$, where $\mathcal{F} = \{f_i\}_{i \in [m]}$. For $i \in [m]$ and $t \in [0,1]$, define
$$K_n^{(i)}(t) \stackrel{\mathrm{def}}{=} \begin{cases} f_i^{-1}(-\infty, f_i(x_a)) & \text{if } t \in \left[ \frac{\mathrm{ord}_i(M_n, a) - 1}{n}, \frac{\mathrm{ord}_i(M_n, a)}{n} \right), \ a = 1, \dots, n, \\ K & \text{if } t = 1. \end{cases}$$

For a pictorial illustration of $K_n^{(i)}(t)$, please refer to Figure 7.
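Everything that the constructions here extract from the data matrix passes through the within-row ranks $\mathrm{ord}_i(M_n, a)$. Below is a minimal sketch of this rank computation, together with the empirical quantity $\hat{R}_n$ of Definition 6.4 (below); it assumes distinct entries within each row, which holds almost surely for a regular pair, and the example matrix is hypothetical.

```python
def ord_ranks(M):
    """For each row i of the m x n matrix M, return the 1-based rank
    ord_i(M, a) of entry M[i][a] within its row (rank 1 = smallest)."""
    ranks = []
    for row in M:
        order = sorted(range(len(row)), key=lambda a: row[a])
        r = [0] * len(row)
        for rank, a in enumerate(order, start=1):
            r[a] = rank
        ranks.append(r)
    return ranks

def R_hat(M, t):
    """The quantity of Definition 6.4 (below): the fraction of columns a
    with ord_i(M, a)/n <= t_i for every row i."""
    m, n = len(M), len(M[0])
    r = ord_ranks(M)
    good = sum(1 for a in range(n)
               if all(r[i][a] / n <= t[i] for i in range(m)))
    return good / n

# Hypothetical 2 x 3 data matrix with entries M[i][a] = f_i(x_a).
M = [[0.3, 0.1, 0.2],
     [0.5, 0.9, 0.4]]
print(ord_ranks(M))  # [[3, 1, 2], [2, 3, 1]]
```

Only the ordering of the entries within each row matters here, which is exactly the point: the dimension inference pipeline depends on the data matrix only through these ranks.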
Notice that there is a subscript $n$ in $K_n^{(i)}(t)$, indicating its dependence on the sampled matrix $M_n$. The object in the next definition is obtainable solely from the data matrix $M_n$, sampled from a regular pair.

Definition 6.4. Let $M_n = [M_{ia}]$ be an $m \times n$ data matrix, sampled from a regular pair. Define a function $\hat{R}_n : [0,1]^m \to [0,1]$ by
$$\hat{R}_n(t_1, \dots, t_m) \stackrel{\mathrm{def}}{=} \frac{1}{n} \cdot \left| \left\{ a \in [n] : \frac{\mathrm{ord}_i(M_n, a)}{n} \leq t_i, \ \forall i \in [m] \right\} \right| = \frac{1}{n} \cdot \left| \bigcap_{i \in [m]} \left\{ a \in [n] : \frac{\mathrm{ord}_i(M_n, a)}{n} \leq t_i \right\} \right|.$$

In the following lemma, we rewrite $\hat{R}_n$ in a form that is similar to the definition of $R_n$, which helps build a connection between them.

Lemma 6.5. Let $X_n = \{x_1, \dots, x_n\} \subset K$ be a point cloud, sampled from a regular pair, and let $M_n$ be the corresponding $m \times n$ data matrix. Then, for all $(t_1, \dots, t_m) \in [0,1]^m$,
$$\hat{R}_n(t_1, \dots, t_m) = \frac{1}{n} \cdot \left| X_n \cap \bigcap_{i \in [m]} K_n^{(i)}(t_i) \right|.$$

Proof. For each $i \in [m]$ and $t_i \in [0,1)$, there exists a unique $b \in [n]$ such that $\frac{\mathrm{ord}_i(M_n, b) - 1}{n} \leq t_i < \frac{\mathrm{ord}_i(M_n, b)}{n}$. Then $K_n^{(i)}(t_i) = f_i^{-1}(-\infty, f_i(x_b))$ and
$$X_n \cap K_n^{(i)}(t_i) = \{x_a \in X_n : f_i(x_a) < f_i(x_b)\} = \{x_a \in X_n : \mathrm{ord}_i(M_n, a) < \mathrm{ord}_i(M_n, b)\} = \{x_a \in X_n : \mathrm{ord}_i(M_n, a) \leq n t_i\} = \left\{ x_a \in X_n : \frac{\mathrm{ord}_i(M_n, a)}{n} \leq t_i \right\}.$$
For $t_i = 1$, $K_n^{(i)}(t_i) = K$ and the above equality still holds. Thus, for any $(t_1, \dots, t_m) \in [0,1]^m$,
$$X_n \cap \bigcap_{i \in [m]} K_n^{(i)}(t_i) = \bigcap_{i \in [m]} \left\{ x_a \in X_n : \frac{\mathrm{ord}_i(M_n, a)}{n} \leq t_i \right\}.$$
By the definition of $\hat{R}_n$, the equality follows. $\square$

Now the intuition behind the approximations is quite clear: since $K_n^{(i)}$ is an approximation of $K^{(i)}$, by Lemma 6.5, $\hat{R}_n$ is an approximation of $R_n$. Therefore, $\hat{R}_n$ also approximates $R_\infty$. Next we connect $R_\infty$ and $\hat{R}_n$ with our target objects $\mathrm{Dow}(S(M))$ and $\mathrm{Dow}(\mathcal{F}, P_K)$. For simplicity, we introduce the following convenient notation.

Definition 6.6.
For $(t_1, \dots, t_m) \in [0,1]^m$ and $\sigma \subseteq [m]$, define
$$t_i^\sigma \stackrel{\mathrm{def}}{=} \begin{cases} t_i & \text{if } i \in \sigma, \\ 1 & \text{otherwise.} \end{cases}$$

With these notations, we have

Theorem 6.7. Let $(\mathcal{F}, P_K)$ be a regular pair and let $M_n$ be an $m \times n$ data matrix, sampled from $(\mathcal{F}, P_K)$. Then, for all $(t_1, \dots, t_m) \in [0,1]^m$, we have
(i) $\mathrm{Dow}(\mathcal{F}, P_K)(t_1, \dots, t_m) = \{\sigma \subseteq [m] : R_\infty(t_1^\sigma, \dots, t_m^\sigma) \neq 0\}$.
(ii) $\mathrm{Dow}(S(M_n))(t_1, \dots, t_m) = \{\sigma \subseteq [m] : \hat{R}_n(t_1^\sigma, \dots, t_m^\sigma) \neq 0\}$.

Proof. For the first equality, recall that $\mathrm{Dow}(\mathcal{F}, P_K)(t_1, \dots, t_m) = \mathrm{nerve}(\{K^{(i)}(t_i)\}_{i \in [m]})$; namely, $\sigma \in \mathrm{Dow}(\mathcal{F}, P_K)(t_1, \dots, t_m)$ if and only if $\bigcap_{i \in \sigma} K^{(i)}(t_i) \neq \emptyset$. Since each $K^{(i)}(t_i)$ is open, $\sigma \in \mathrm{Dow}(\mathcal{F}, P_K)(t_1, \dots, t_m)$ is also equivalent to $P_K(\bigcap_{i \in \sigma} K^{(i)}(t_i)) \neq 0$. Notice that, by the definitions of $R_\infty$ and $t_i^\sigma$,
$$R_\infty(t_1^\sigma, \dots, t_m^\sigma) = P_K\left( \bigcap_{i \in [m]} K^{(i)}(t_i^\sigma) \right) = P_K\left( \left( \bigcap_{i \in \sigma} K^{(i)}(t_i) \right) \cap \bigcap_{i \in [m] \setminus \sigma} K^{(i)}(1) \right) = P_K\left( \bigcap_{i \in \sigma} K^{(i)}(t_i) \right).$$
Therefore, the first equality follows.

For the second equality, recall from Lemma 2.3 that $\mathrm{Dow}(S(M_n))(t_1, \dots, t_m) = \mathrm{nerve}(\{A^{(i)}(t_i)\}_{i \in [m]})$, where $A^{(i)}(t_i) = \{a \in [n] : \frac{1}{n} |\{b \in [n] : b \leq_i a\}| \leq t_i\}$. Thus, $\sigma \in \mathrm{Dow}(S(M))(t_1, \dots, t_m)$ is equivalent to $\bigcap_{i \in \sigma} A^{(i)}(t_i) \neq \emptyset$, which is also equivalent to $|\bigcap_{i \in \sigma} A^{(i)}(t_i)| \neq 0$, since each $A^{(i)}(t_i)$ is a finite set. Note that $A^{(i)}(t_i)$ is exactly the index set of $X_n \cap K_n^{(i)}(t_i)$. Thus, by Lemma 6.5,
$$\hat{R}_n(t_1^\sigma, \dots, t_m^\sigma) = \frac{1}{n} \cdot \left| X_n \cap \bigcap_{i \in [m]} K_n^{(i)}(t_i^\sigma) \right| = \frac{1}{n} \cdot \left| \left( \bigcap_{i \in \sigma} A^{(i)}(t_i) \right) \cap \bigcap_{i \in [m] \setminus \sigma} A^{(i)}(1) \right| = \frac{1}{n} \cdot \left| \bigcap_{i \in \sigma} A^{(i)}(t_i) \right|.$$
Hence, $|\bigcap_{i \in \sigma} A^{(i)}(t_i)| \neq 0$ is equivalent to $\hat{R}_n(t_1^\sigma, \dots, t_m^\sigma) \neq 0$, and the second equality follows. $\square$

Before diving into asymptotic results, let us look at one useful property of $R_\infty$.

Lemma 6.8. The map $R_\infty$ in Definition 6.1 is uniformly continuous.

Proof. Let $(t_1, \dots, t_m), (t'_1, \dots, t'_m) \in [0,1]^m$. Denoting the symmetric difference of any two sets $A$ and $B$ by $A \,\triangle\, B \stackrel{\mathrm{def}}{=} (A \setminus B) \cup (B \setminus A)$, we have
$$\left| R_\infty(t_1, \dots, t_m) - R_\infty(t'_1, \dots, t'_m) \right| = \left| P_K\left( \bigcap_{i \in [m]} K^{(i)}(t_i) \right) - P_K\left( \bigcap_{i \in [m]} K^{(i)}(t'_i) \right) \right| \leq P_K\left( \bigcap_{i \in [m]} K^{(i)}(t_i) \,\triangle\, \bigcap_{i \in [m]} K^{(i)}(t'_i) \right) \leq P_K\left( \bigcup_{i \in [m]} \left( K^{(i)}(t_i) \,\triangle\, K^{(i)}(t'_i) \right) \right) \leq \sum_{i \in [m]} P_K\left( K^{(i)}(t_i) \,\triangle\, K^{(i)}(t'_i) \right) = \sum_{i \in [m]} |t_i - t'_i|.$$
Using this inequality, it is now easy to obtain that $R_\infty$ is uniformly continuous. $\square$

Now we arrive at a theorem that is key to the proof of the Interleaving Convergence Theorem. In the rest of the discussion, we use w.h.p. to refer to "with high probability". Namely, if we state that, as $n \to \infty$, w.h.p., a sequence $(A_n)_{n=1}^\infty$ of events holds, then this means that, as $n \to \infty$, the probability $\Pr[A_n]$ approaches 1.

Theorem 6.9 (1st Asymptotic Theorem). The sup-norm $\|R_n - R_\infty\|_\infty$ converges to $0$ in probability. In other words, for any $\epsilon > 0$,
$$\lim_{n \to \infty} \Pr\left[ \|R_n - R_\infty\|_\infty \leq \epsilon \right] = 1.$$

For the proof, we recall an intuitive fact from probability theory: for a finite collection of $N$ events $A_{1,n}, \dots, A_{N,n}$ that depend on $n = 1, 2, \dots$, if, for each $i \in [N]$, $\lim_{n \to \infty} \Pr[A_{i,n}] = 1$, then $\lim_{n \to \infty} \Pr[\bigcap_{i \in [N]} A_{i,n}] = 1$.

Proof of Theorem 6.9.
Let $(t_1, \dots, t_m) \in [0,1]^m$. Let $I$ be the indicator function of $\bigcap_{i \in [m]} K^{(i)}(t_i)$. In other words, $I : K \to \{0, 1\}$ is the function defined by
$$I(x) = \begin{cases} 1 & \text{if } x \in \bigcap_{i \in [m]} K^{(i)}(t_i), \\ 0 & \text{otherwise.} \end{cases}$$
Notice that, since $(K, P_K)$ is a probability space (with the Borel $\sigma$-algebra), $I$ is a random variable. Moreover, by Definition 6.2, if $I_1, \dots, I_n$ are i.i.d. copies of $I$, then
$$R_n(t_1, \dots, t_m) = \frac{1}{n} \sum_{a=1}^{n} I_a.$$
Let $p = P_K\left(\bigcap_{i \in [m]} K^{(i)}(t_i)\right) = R_\infty(t_1, \dots, t_m) = \Pr[I = 1]$. By Chebyshev's inequality, for any $\epsilon > 0$, as $n \to \infty$,
$$\Pr\left[ \left| \left( \frac{1}{n} \sum_{a=1}^{n} I_a \right) - p \right| > \epsilon \right] \leq \frac{\mathrm{Var}\left( \frac{1}{n} \sum_{a=1}^{n} I_a \right)}{\epsilon^2} = \frac{p(1-p)}{n \epsilon^2} \to 0,$$
i.e., $\Pr[|R_n(t_1, \dots, t_m) - R_\infty(t_1, \dots, t_m)| > \epsilon] \to 0$. Thus we have obtained the pointwise convergence version of the result.

To prove uniform convergence, consider the following. By Lemma 6.8, $R_\infty$ is uniformly continuous. Thus there exists $\delta > 0$ such that, for all $(t_1, \dots, t_m)$ and $(t'_1, \dots, t'_m)$ with $\max_{i \in [m]} |t_i - t'_i| \leq \delta$,
$$\left| R_\infty(t_1, \dots, t_m) - R_\infty(t'_1, \dots, t'_m) \right| \leq \epsilon.$$
Subdivide $[0,1]^m$ into finitely many ($m$-dimensional) rectangles of sides shorter than $\delta$. Let $V$ be the collection of all vertices of all rectangles in the subdivision. Since $V$ is a finite set, by the above pointwise result, as $n \to \infty$, w.h.p.,
$$\sup_{\vec{t} \in V} |R_n(\vec{t}) - R_\infty(\vec{t})| \leq \epsilon. \tag{30}$$
In other words,
$$\lim_{n \to \infty} \Pr\left[ \sup_{\vec{t} \in V} |R_n(\vec{t}) - R_\infty(\vec{t})| \leq \epsilon \right] = 1.$$
We now claim that Equation (30) implies
$$\|R_n - R_\infty\|_\infty = \sup_{\vec{t} \in [0,1]^m} |R_n(\vec{t}) - R_\infty(\vec{t})| \leq 2\epsilon. \tag{31}$$
Let $\vec{t} = (t_1, \dots, t_m)$ be an arbitrary element of $[0,1]^m$.
Then $\vec{t}$ lies in some small rectangle of the subdivision. Let $\vec{t}_2$ and $\vec{t}_1$ be the unique maximum and minimum, respectively, of that rectangle. Then
$$R_\infty(\vec{t}) - \epsilon \leq (R_\infty(\vec{t}_1) + \epsilon) - \epsilon = R_\infty(\vec{t}_1) \quad \text{(uniform continuity of } R_\infty\text{)}$$
$$\leq R_n(\vec{t}_1) + \epsilon \quad (\vec{t}_1 \text{ is a vertex in the subdivision})$$
$$\leq R_n(\vec{t}) + \epsilon \quad (R_n \text{ is monotone and } \vec{t}_1 \leq \vec{t})$$
$$\leq R_n(\vec{t}_2) + \epsilon \quad (R_n \text{ is monotone and } \vec{t} \leq \vec{t}_2)$$
$$\leq (R_\infty(\vec{t}_2) + \epsilon) + \epsilon \quad (\vec{t}_2 \text{ is a vertex in the subdivision})$$
$$\leq R_\infty(\vec{t}) + \epsilon + \epsilon + \epsilon \quad \text{(uniform continuity of } R_\infty\text{)}.$$
Thus, $R_\infty(\vec{t}) - 2\epsilon \leq R_n(\vec{t}) \leq R_\infty(\vec{t}) + 2\epsilon$, i.e., $|R_\infty(\vec{t}) - R_n(\vec{t})| \leq 2\epsilon$. Since $\vec{t} \in [0,1]^m$ is arbitrary, Equation (31) follows. In other words,
$$\lim_{n \to \infty} \Pr\left[ \sup_{\vec{t} \in [0,1]^m} |R_n(\vec{t}) - R_\infty(\vec{t})| \leq 2\epsilon \right] = 1.$$
Rescaling $2\epsilon$ to $\epsilon$, the uniform result follows. $\square$

For people familiar with nonparametric statistics, it is immediate that Theorem 6.9 is a natural $m$-dimensional generalization of the standard Glivenko–Cantelli theorem under specific conditions.

Recall that, for $x \in K$ and $i \in [m]$, $T_i(x) \stackrel{\mathrm{def}}{=} P_K\left( f_i^{-1}(-\infty, f_i(x)) \right)$.

Lemma 6.10. For all $\epsilon > 0$, as $n \to \infty$, w.h.p.,
$$K^{(i)}(t - \epsilon) \subseteq K_n^{(i)}(t) \subseteq K^{(i)}(t + \epsilon), \quad \forall i \in [m], \ t \in [0,1],$$
where $K^{(i)}(t) \stackrel{\mathrm{def}}{=} \emptyset$ for $t < 0$ and $K^{(i)}(t) \stackrel{\mathrm{def}}{=} K$ for $t > 1$.

Proof. For $t = 1$, $K_n^{(i)}(t) = K = K^{(i)}(t + \epsilon)$ and the inclusions are clearly satisfied. Now let $t \in [0,1)$. Then $t \in [\frac{a}{n}, \frac{a+1}{n})$ for some $a \in \{0, 1, \dots, n-1\}$. W.l.o.g., assume $f_i(x_1) < \cdots < f_i(x_n)$, so that $K_n^{(i)}(t) = f_i^{-1}(-\infty, f_i(x_{a+1})) = K^{(i)}(T_i(x_{a+1}))$. Then, with $T_i(x_{a+1})$ in the $i$-th coordinate, $R_\infty(1, \dots, 1, T_i(x_{a+1}), 1, \dots, 1) = T_i(x_{a+1})$ and $R_n(1, \dots, 1, T_i(x_{a+1}), 1, \dots, 1) = \frac{1}{n} \cdot |K^{(i)}(T_i(x_{a+1})) \cap X_n| = \frac{a}{n}$.
Choose $n$ large enough such that $\frac{1}{n} < \frac{\epsilon}{2}$. Then
$$|T_i(x_{a+1}) - t| \leq \left| T_i(x_{a+1}) - \frac{a}{n} \right| + \left| \frac{a}{n} - t \right| = \left| R_\infty(1, \dots, 1, T_i(x_{a+1}), 1, \dots, 1) - R_n(1, \dots, 1, T_i(x_{a+1}), 1, \dots, 1) \right| + \left| \frac{a}{n} - t \right|.$$
In the last expression, by Theorem 6.9, w.h.p., the first term is less than $\epsilon/2$, not depending on $t$, and the second term is less than $\epsilon/2$ by our choice of sufficiently large $n$. Thus, w.h.p., $|T_i(x_{a+1}) - t| < \epsilon$ for all $t \in [0,1]$, and the result follows. $\square$

(The Glivenko–Cantelli theorem has been generalized in many directions in the literature and is closely related to the famous VC (Vapnik–Chervonenkis) theory in theoretical machine learning. See, for example, Chapter 12 of [6] for a detailed introduction that connects the standard Glivenko–Cantelli theorem and VC theory.)

Corollary 6.11. For all $\epsilon > 0$, as $n \to \infty$, w.h.p.,
$$R_n(t_1 - \epsilon, \dots, t_m - \epsilon) \leq \hat{R}_n(t_1, \dots, t_m) \leq R_n(t_1 + \epsilon, \dots, t_m + \epsilon), \quad \forall (t_1, \dots, t_m) \in [0,1]^m,$$
where, for the arguments of $R_n$, negative inputs are automatically replaced by $0$ and inputs greater than $1$ are automatically replaced by $1$.

Proof. As $n \to \infty$, w.h.p.,
$$R_n(t_1 - \epsilon, \dots, t_m - \epsilon) = \frac{1}{n} \cdot \left| \bigcap_{i \in [m]} K^{(i)}(t_i - \epsilon) \cap X_n \right| \quad \text{(Definition 6.2)}$$
$$\leq \frac{1}{n} \cdot \left| \bigcap_{i \in [m]} K_n^{(i)}(t_i) \cap X_n \right| \quad \text{(Lemma 6.10)}$$
$$= \hat{R}_n(t_1, \dots, t_m) \quad \text{(Lemma 6.5)}$$
$$\leq \frac{1}{n} \cdot \left| \bigcap_{i \in [m]} K^{(i)}(t_i + \epsilon) \cap X_n \right| \quad \text{(Lemma 6.10)}$$
$$= R_n(t_1 + \epsilon, \dots, t_m + \epsilon) \quad \text{(Definition 6.2)}.$$
Hence the result follows. $\square$

Motivated by Theorem 6.7, the key to proving the Interleaving Convergence Theorem is the zero sets of $R_\infty$, $R_n$, and $\hat{R}_n$, explicitly defined below.

Definition 6.12. Let $R_\infty$, $R_n$, and $\hat{R}_n$ be defined as in Definitions 6.1, 6.2, and 6.4.
Define the following subsets of $[0,1]^m$:
(i) $Z_\infty \stackrel{\mathrm{def}}{=} R_\infty^{-1}(0) = \left\{ (t_1, \dots, t_m) : \bigcap_{i \in [m]} K^{(i)}(t_i) = \emptyset \right\}$.
(ii) $Z_n \stackrel{\mathrm{def}}{=} R_n^{-1}(0) = \left\{ (t_1, \dots, t_m) : X_n \cap \left( \bigcap_{i \in [m]} K^{(i)}(t_i) \right) = \emptyset \right\}$.
(iii) $\hat{Z}_n \stackrel{\mathrm{def}}{=} \hat{R}_n^{-1}(0) = \left\{ (t_1, \dots, t_m) : X_n \cap \left( \bigcap_{i \in [m]} K_n^{(i)}(t_i) \right) = \emptyset \right\}$. (See Definition 6.3 for the defining formula of $K_n^{(i)}(t_i)$.)

For $Z \subseteq \mathbb{R}^m$ and $\epsilon > 0$, define
$$Z + \epsilon \stackrel{\mathrm{def}}{=} \{ (t_1 + \epsilon_1, \dots, t_m + \epsilon_m) : (t_1, \dots, t_m) \in Z, \ 0 \leq \epsilon_i \leq \epsilon \}. \tag{32}$$
Note that, since $R_\infty$, $R_n$, and $\hat{R}_n$ are all monotone, $Z_\infty$, $Z_n$, and $\hat{Z}_n$ are closed under the lower partial order; namely, for $Z = Z_\infty$, $Z_n$, or $\hat{Z}_n$, if $\vec{t}^{\,*} \in Z$, then $\vec{t} \in Z$ for all $\vec{t} \leq \vec{t}^{\,*}$.

Lemma 6.13. Let $Z_\infty$, $Z_n$, and $\hat{Z}_n$ be defined as in Definition 6.12. Then, for all $\epsilon > 0$, as $n \to \infty$, w.h.p.,
(i) $Z_n \subseteq \hat{Z}_n + \epsilon$ and $\hat{Z}_n \subseteq Z_n + \epsilon$;
(ii) $Z_n \subseteq Z_\infty + \epsilon$.
Moreover, with probability $1$,
(iii) $Z_\infty \subseteq Z_n$.

Proof. For the first inclusion in (i), let $z \in Z_n$, where $z = (t_1, \dots, t_m)$; namely, $R_n(z) = 0$. By Corollary 6.11, w.h.p., $\hat{R}_n(t_1 - \epsilon, \dots, t_m - \epsilon) = 0$; namely, $(t_1 - \epsilon, \dots, t_m - \epsilon) \in \hat{Z}_n$. Thus $z = (t_1, \dots, t_m) \in \hat{Z}_n + \epsilon$.

For the second inclusion in (i), let $z \in \hat{Z}_n$, where $z = (t_1, \dots, t_m)$; namely, $\hat{R}_n(z) = 0$. By Corollary 6.11, w.h.p., $R_n(t_1 - \epsilon, \dots, t_m - \epsilon) = 0$; namely, $(t_1 - \epsilon, \dots, t_m - \epsilon) \in Z_n$. Thus $z = (t_1, \dots, t_m) \in Z_n + \epsilon$.

To prove (ii), first let $\nu = \inf\{ R_\infty(\vec{t}) : \vec{t} \in [0,1]^m \setminus (Z_\infty + \epsilon) \}$; if $[0,1]^m \setminus (Z_\infty + \epsilon) = \emptyset$, simply define $\nu = 1$.
Since $Z_\infty$ is the set where $R_\infty$ takes zero values and $[0,1]^m \setminus (Z_\infty + \epsilon)$ is strictly away from $Z_\infty$, by the continuity of $R_\infty$ and the fact that $R_\infty(1, \dots, 1) = 1$, we must have $\nu > 0$. By Theorem 6.9, w.h.p., $\|R_n - R_\infty\|_\infty \leq \nu/2$. Thus, by the triangle inequality, w.h.p., $R_n \geq \nu - \nu/2 = \nu/2 > 0$ on $[0,1]^m \setminus (Z_\infty + \epsilon)$. Now, for every $\vec{t} \in Z_n$, $R_n(\vec{t}) = 0$ and thus $\vec{t} \notin [0,1]^m \setminus (Z_\infty + \epsilon)$; namely, $\vec{t} \in Z_\infty + \epsilon$. Hence $Z_n \subseteq Z_\infty + \epsilon$.

To prove (iii), let $\vec{t} \in Z_\infty$. Then $R_\infty(\vec{t}) = 0$, where $\vec{t} = (t_1, \dots, t_m)$; namely, $P_K(\bigcap_{i \in [m]} K^{(i)}(t_i)) = 0$. Since $K^{(i)}(t_i)$ is open for all $i \in [m]$, $\bigcap_{i \in [m]} K^{(i)}(t_i)$ is open with zero measure. Therefore, $\bigcap_{i \in [m]} K^{(i)}(t_i) = \emptyset$ and thus $R_n(\vec{t}) = \frac{1}{n} \cdot |(\bigcap_{i \in [m]} K^{(i)}(t_i)) \cap X_n| = 0$. Hence, $\vec{t} \in Z_n$ and $Z_\infty \subseteq Z_n$. $\square$

Immediate from Lemma 6.13 is

Corollary 6.14. Let $Z_\infty$ and $\hat{Z}_n$ be defined as in Definition 6.12. Then, for all $\epsilon > 0$, as $n \to \infty$, w.h.p.,
$$\hat{Z}_n \subseteq Z_\infty + \epsilon \quad \text{and} \quad Z_\infty \subseteq \hat{Z}_n + \epsilon.$$

We are now able to prove Theorem 2.9.

Theorem (Theorem 2.9, Interleaving Convergence Theorem). Let $M_n \in \mathcal{M}^o_{m,n}$ be sampled from a regular pair $(\mathcal{F}, P_K)$. Then, for all $\epsilon > 0$, as $n \to \infty$,
$$\Pr\left[ d_{\mathrm{INT}}(\mathrm{Dow}(\mathcal{F}, P_K), \mathrm{Dow}(S(M_n))) \leq \epsilon \right] \to 1.$$

Proof. Let $\epsilon > 0$. We need to prove: as $n \to \infty$, w.h.p.,
$$\mathrm{Dow}(\mathcal{F}, P_K)(t_1 - \epsilon, \dots, t_m - \epsilon) \subseteq \mathrm{Dow}(S(M_n))(t_1, \dots, t_m) \subseteq \mathrm{Dow}(\mathcal{F}, P_K)(t_1 + \epsilon, \dots, t_m + \epsilon).$$
By Corollary 6.14, as $n \to \infty$, w.h.p., $\hat{Z}_n \subseteq Z_\infty + \epsilon$ and $Z_\infty \subseteq \hat{Z}_n + \epsilon$.

For the first inclusion, let $\sigma \in \mathrm{Dow}(\mathcal{F}, P_K)(t_1 - \epsilon, \dots, t_m - \epsilon)$. Then, by Theorem 6.7,
$$R_\infty((t_1 - \epsilon)^\sigma, \dots, (t_m - \epsilon)^\sigma) \neq 0,$$
where $(t_i - \epsilon)^\sigma = t_i - \epsilon$ if $i \in \sigma$ and $(t_i - \epsilon)^\sigma = 1$ if $i \notin \sigma$.
We need to prove $\hat{R}_n(t_1^\sigma, \dots, t_m^\sigma) \neq 0$. Let us prove this by contradiction. Suppose $\hat{R}_n(t_1^\sigma, \dots, t_m^\sigma) = 0$. Then $(t_1^\sigma, \dots, t_m^\sigma) \in \hat{Z}_n$. Thus $(t_1^\sigma, \dots, t_m^\sigma) \in Z_\infty + \epsilon$ and hence $((t_1 - \epsilon)^\sigma, \dots, (t_m - \epsilon)^\sigma) \in Z_\infty$, meaning $R_\infty((t_1 - \epsilon)^\sigma, \dots, (t_m - \epsilon)^\sigma) = 0$, a contradiction. Thus $\hat{R}_n(t_1^\sigma, \dots, t_m^\sigma) \neq 0$ and $\sigma \in \mathrm{Dow}(S(M_n))(t_1, \dots, t_m)$, completing the first part. Analogously, the second inclusion can be proved by using Theorem 6.7, with the help of $Z_\infty \subseteq \hat{Z}_n + \epsilon$. $\square$

6.4 Proof of Theorem 3.10, the convergence of $L_k(M_n)$ to $L_k(\mathcal{F}, P_K)$

In this subsection, we state the well-known Isometry Theorem in topological data analysis and use it to prove Theorem 3.10. We begin with the definition of a quadrant-tame persistence module.

Definition 6.15 (Definition 1.12 in [11]). A persistence module $V = (V_i, v_i^j)$ over $\mathbb{R}$ is quadrant-tame if $\mathrm{rank}\, v_i^j < \infty$ for all $i < j$.

Theorem 6.16 (Isometry Theorem, Theorem 3.1 in [11]). Let $V$, $W$ be quadrant-tame persistence modules over $\mathbb{R}$. Then
$$d_b(\mathrm{dgm}(V), \mathrm{dgm}(W)) = d_i(V, W),$$
where $d_b$ is the bottleneck distance between persistence diagrams and $d_i$ is the interleaving distance between persistence modules.

Notice that, throughout the paper, all simplicial complexes are subcomplexes of $2^{[m]}$, and hence all vector spaces in the persistence modules we consider are finite-dimensional and thus quadrant-tame. Therefore, we have the Isometry Theorem available. In the rest of this section, the proof of Theorem 3.10 is broken into several lemmas based on some newly developed tools. Since the presentation is in logical order instead of the order of ideas, we give a quick overview of how they are pieced together.

The central observation throughout the proof is Lemma 6.21, which writes both $L_k(\mathcal{F}, P_K)$ and $L_k(M_n)$ in terms of double supremum expressions.
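Once persistence diagrams along the diagonal rays are available, the double supremum expressions of Lemma 6.21 (below) reduce to a maximum of lifetimes taken over rays and over diagram points. Below is a minimal sketch with hypothetical diagrams; computing the diagrams themselves requires a persistent-homology package and is not shown.

```python
def double_sup(diagrams_by_ray):
    """Evaluate sup over rays of sup of lifetimes (beta - alpha),
    given a dict mapping each ray (any hashable label) to its k-th
    persistence diagram, a list of (birth, death) pairs."""
    best = 0.0
    for dgm in diagrams_by_ray.values():
        for birth, death in dgm:
            best = max(best, death - birth)
    return best

# Hypothetical k-th diagrams along two diagonal rays.
dgms = {"ray1": [(0.25, 0.75), (0.0, 0.5)],
        "ray2": [(0.125, 0.25)]}
print(double_sup(dgms))  # 0.5
```

In practice the set of rays is finite (one ray per sampled point, as in Definition 6.18 below), so this max-of-max is exactly how $L_k(M_n)$ is evaluated from data.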
Notice that their expressions differ only in $\mathrm{Dow}(\mathcal{F}, P_K)$ versus $\mathrm{Dow}(S(M_n))$, and in $T(\mathcal{F}, P_K)^+$ versus $\hat{T}(M_n)^+$, which are introduced in Definitions 6.18 and 6.17. With this in mind, it is easy to see that a result bounding the variation of the double supremum expression when $\mathrm{Dow}(\mathcal{F}, P_K)$ is replaced by $\mathrm{Dow}(S(M_n))$ is needed, which is exactly Lemma 6.23. Similarly, a result bounding the variation of the double supremum expression when $T(\mathcal{F}, P_K)^+$ is replaced by $\hat{T}(M_n)^+$ is also needed, which is Lemma 6.22. We still need to justify the applicability of Lemma 6.23 and Lemma 6.22, which corresponds to the Interleaving Convergence Theorem (Theorem 2.9) and Lemma 6.19, respectively. Now the pieces can be connected and combined to complete the proof of Theorem 3.10. Notice that the Isometry Theorem (Theorem 6.16) is lurking in the proofs of Lemma 6.23 and Lemma 6.22, and is thus playing an important role in the proof of Theorem 3.10.

Definition 6.17. Let $\mathcal{T} \subseteq [0,1]^m$. Define the set of diagonal rays of $\mathcal{T}$, denoted $\mathcal{T}^+$, by
$$\mathcal{T}^+ \stackrel{\mathrm{def}}{=} \{ \mathrm{ray}_T : T = (t_1, \dots, t_m) \in \mathcal{T} \}, \tag{33}$$
where $\mathrm{ray}_T \stackrel{\mathrm{def}}{=} \left\{ (t_1 - t, \dots, t_m - t) : t \in [0, \max_{i \in [m]} t_i] \right\}$.

Definition 6.18. Let $(\mathcal{F}, P_K)$ be a regular pair and let $M_n \in \mathcal{M}^o_{m,n}$ be sampled from $(\mathcal{F}, P_K)$. Define the following two subsets of $[0,1]^m$:
$$T(\mathcal{F}, P_K) \stackrel{\mathrm{def}}{=} \{ (T_1(x), \dots, T_m(x)) : x \in K \}, \quad \text{and} \quad \hat{T}(M_n) \stackrel{\mathrm{def}}{=} \{ (\mathrm{ord}_1(M_n, a)/n, \dots, \mathrm{ord}_m(M_n, a)/n) : a \in [n] \}.$$

Recall that the Hausdorff distance between two subsets $S_1, S_2$ of $\mathbb{R}^m$ is defined as
$$d_H(S_1, S_2) = \inf\{ \epsilon > 0 : S_1 \subseteq S_2 + B(0, \epsilon), \ S_2 \subseteq S_1 + B(0, \epsilon) \}, \tag{34}$$
where $B(0, \epsilon)$ is the $\epsilon$-ball in $\mathbb{R}^m$ centered at $0$ and $+$ inside the infimum is the Minkowski sum. In the next lemma, we prove that $\hat{T}(M_n)$ approximates $T(\mathcal{F}, P_K)$ in Hausdorff distance.

Lemma 6.19. Let $M_n \in \mathcal{M}^o_{m,n}$ be sampled from a regular pair $(\mathcal{F}, P_K)$.
Then, as $n \to \infty$,
$$d_H(T(\mathcal{F}, P_K), \hat{T}(M_n)) \to 0 \quad \text{in probability.} \tag{35}$$

Proof. Recall that, for each $i \in [m]$, $T_i = \varphi_i \circ f_i$, where $\varphi_i$ is a monotone increasing function. Since there is no measure jump in a regular pair (i.e., $P_K(f_i^{-1}(\ell)) = 0$ for all $i \in [m]$, $\ell \in \mathbb{R}$), each $\varphi_i$ is continuous and so is each $T_i$. Since each $f_i$ can be extended continuously to $\bar{K}$, each $T_i$ is also continuously extendable to $\bar{K}$. Since $\bar{K}$ is compact, the function $(T_1, \dots, T_m) : \bar{K} \to \mathbb{R}^m$ is uniformly continuous.

Let $\epsilon > 0$. We need to prove that, as $n \to \infty$, w.h.p.,
$$d_H(T(\mathcal{F}, P_K), \hat{T}(M_n)) < \epsilon. \tag{36}$$
By uniform continuity, there exists $\delta > 0$ such that, for $x, y \in K$ with $\|x - y\| \leq \delta$,
$$\|(T_1(x), \dots, T_m(x)) - (T_1(y), \dots, T_m(y))\| \leq \epsilon/2. \tag{37}$$
Let $X_n = \{x_1, \dots, x_n\}$ be a sample of size $n$, i.i.d. from $(\mathcal{F}, P_K)$. Let us prove that, as $n \to \infty$, w.h.p.,
$$K \subseteq X_n + B(0, \delta). \tag{38}$$
Since $K$ is bounded, we may cover $K$ by finitely many small rectangles of diameters smaller than $\delta$, where each rectangle intersects $K$ and the rectangles intersect each other only on their boundaries. Denote the rectangles by $\{R_1, \dots, R_N\}$. Let $p_j \stackrel{\mathrm{def}}{=} P_K(R_j \cap K)$, which are positive by the openness of $K$. Then
$$\Pr[(R_j \cap K) \cap X_n \neq \emptyset, \ \forall j \in [N]] \geq 1 - \sum_{j=1}^{N} (1 - p_j)^n. \tag{39}$$
Since $N$ is finite and each $1 - p_j \in [0, 1)$, as $n \to \infty$, w.h.p., $(R_j \cap K) \cap X_n \neq \emptyset$ for all $j = 1, \dots, N$. Since the diameter of each $R_j \cap K$ is less than $\delta$, Equation (38) follows.

Let us prove another preparatory result: as $n \to \infty$, w.h.p.,
$$\max_{a \in [n], \, i \in [m]} \left| \frac{\mathrm{ord}_i(M_n, a)}{n} - T_i(x_a) \right| \leq \frac{\epsilon}{2\sqrt{m}}. \tag{40}$$
Treating each $T_i$ as a cumulative distribution function defined on $K$, since $[m]$ is finite, Equation (40) is an immediate consequence of the Glivenko–Cantelli theorem. Now we return to the proof.
Notice that $T(\mathcal{F}, P_K)$ is the image of $(T_1, \dots, T_m) : K \to [0,1]^m$. By Equations (38) and (37), w.h.p., $T(\mathcal{F}, P_K) \subseteq (T_1, \dots, T_m)(X_n) + B(0, \epsilon/2)$, and, by Equation (40), $(T_1, \dots, T_m)(X_n) \subseteq \hat{T}(M_n) + B(0, \epsilon/2)$; hence $T(\mathcal{F}, P_K) \subseteq \hat{T}(M_n) + B(0, \epsilon)$. On the other hand, by Equation (40), $\hat{T}(M_n) \subseteq T(\mathcal{F}, P_K) + B(0, \epsilon/2) \subseteq T(\mathcal{F}, P_K) + B(0, \epsilon)$, completing the proof. $\square$

In the following, we develop the convention of restricting a multi-filtered complex to a diagonal ray as defined in Definition 6.17.

Definition 6.20. Let $\mathcal{T} \subseteq [0,1]^m$ and let $\mathcal{K} = \{\mathcal{K}(T) = \mathcal{K}(t_1, \dots, t_m)\}_{T \in \mathbb{R}^m}$ be a multi-filtered complex indexed over $\mathbb{R}^m$ with $\mathcal{K}(T) \subseteq 2^{[m]}$ for all $T \in \mathbb{R}^m$. For $T = (t_1, \dots, t_m) \in \mathcal{T}$, let $\mathrm{ray}_T$ be as in Definition 6.17. Define the restriction of $\mathcal{K}$ to $\mathrm{ray}_T$ as the 1-dimensional filtered complex $\mathcal{K}|_{\mathrm{ray}_T} = \{\mathcal{K}|_{\mathrm{ray}_T}(t)\}_t$, indexed over $t \in [0, \max_{i \in [m]} t_i]$, by
$$\mathcal{K}|_{\mathrm{ray}_T}(t) \stackrel{\mathrm{def}}{=} \mathcal{K}\left( t_1 - \max_{i \in [m]} t_i + t, \ \cdots, \ t_m - \max_{i \in [m]} t_i + t \right). \tag{41}$$
Since we usually need to use the interleaving distance to compare two filtered complexes, we extend the indexing set of $\mathcal{K}|_{\mathrm{ray}_T}$ to $\mathbb{R}$ by
$$\mathcal{K}|_{\mathrm{ray}_T}(t) \stackrel{\mathrm{def}}{=} \begin{cases} \emptyset & \text{if } t < 0, \\ 2^{[m]} & \text{if } t > \max_{i \in [m]} t_i. \end{cases} \tag{42}$$

With these conventions, we state the following lemma:

Lemma 6.21. For each $k \in \{0\} \cup \mathbb{N}$,
$$L_k(\mathcal{F}, P_K) = \sup_{\mathrm{ray} \in T(\mathcal{F}, P_K)^+} \sup\{ \beta - \alpha : (\alpha, \beta) \in \mathrm{dgm}(H_k(\mathrm{Dow}(\mathcal{F}, P_K)|_{\mathrm{ray}})) \},$$
and
$$L_k(M_n) = \sup_{\mathrm{ray} \in \hat{T}(M_n)^+} \sup\{ \beta - \alpha : (\alpha, \beta) \in \mathrm{dgm}(H_k(\mathrm{Dow}(S(M_n))|_{\mathrm{ray}})) \}.$$

Proof. The first equality follows from Equation (9) in Definition 3.7, and the second equality can be obtained from Equation (15) in Definition 3.9. $\square$

The next lemma will be used to connect Lemma 6.19 and Lemma 6.21.

Lemma 6.22. Let $\mathcal{T}_1, \mathcal{T}_2 \subseteq [0,1]^m$ be such that $d_H(\mathcal{T}_1, \mathcal{T}_2) < \epsilon$. Let $\mathcal{K} = \{\mathcal{K}(T)\}_{T \in \mathbb{R}^m}$ be a multi-filtered complex indexed over $\mathbb{R}^m$ with $\mathcal{K}(T) \subseteq 2^{[m]}$ for all $T$.
Then
$$\left| \sup_{\mathrm{ray} \in (\mathcal{T}_1)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} - \sup_{\mathrm{ray} \in (\mathcal{T}_2)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} \right| \leq 2\epsilon.$$

Proof. For any constant $\eta_1 > 0$, we may choose $\mathrm{ray}_1 \in (\mathcal{T}_1)^+$ such that
$$\sup_{\mathrm{ray} \in (\mathcal{T}_1)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} \leq \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}_1})) \} + \eta_1. \tag{43}$$
Let $T_1$ be the element of $\mathcal{T}_1$ that $\mathrm{ray}_1$ is constructed from. Since $d_H(\mathcal{T}_1, \mathcal{T}_2) < \epsilon$, there exists $T_2 \in \mathcal{T}_2$ such that $\|T_1 - T_2\| < \epsilon$. Let $\mathrm{ray}_2 \in (\mathcal{T}_2)^+$ be constructed from $T_2$. Then $d_{\mathrm{INT}}(\mathcal{K}|_{\mathrm{ray}_1}, \mathcal{K}|_{\mathrm{ray}_2}) < \epsilon$, implying
$$d_i(H_k(\mathcal{K}|_{\mathrm{ray}_1}), H_k(\mathcal{K}|_{\mathrm{ray}_2})) < \epsilon.$$
By the Isometry Theorem (Theorem 6.16),
$$d_b(\mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}_1})), \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}_2}))) < \epsilon. \tag{44}$$
For any constant $\eta_2 > 0$, there exists $(a_1, b_1) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}_1}))$ such that
$$\sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}_1})) \} \leq b_1 - a_1 + \eta_2. \tag{45}$$
By Equation (44), there exists $(a_2, b_2) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}_2}))$ such that $\max\{|a_1 - a_2|, |b_1 - b_2|\} < \epsilon$. Therefore,
$$b_1 - a_1 \leq b_2 - a_2 + 2\epsilon. \tag{46}$$
Combining Equations (43), (45), and (46), we obtain
$$\sup_{\mathrm{ray} \in (\mathcal{T}_1)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} \leq b_1 - a_1 + \eta_1 + \eta_2 \leq b_2 - a_2 + 2\epsilon + \eta_1 + \eta_2 \leq \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}_2})) \} + 2\epsilon + \eta_1 + \eta_2 \leq \sup_{\mathrm{ray} \in (\mathcal{T}_2)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} + 2\epsilon + \eta_1 + \eta_2.$$
Since $\eta_1, \eta_2 > 0$ are arbitrary,
$$\sup_{\mathrm{ray} \in (\mathcal{T}_1)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} \leq \sup_{\mathrm{ray} \in (\mathcal{T}_2)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} + 2\epsilon.$$
Reversing the roles of $\mathcal{T}_1$ and $\mathcal{T}_2$, we may obtain the other direction, completing the proof. $\square$

Lemma 6.23. Let $\mathcal{K}$ and $\mathcal{L}$ be multi-filtered complexes indexed over $\mathbb{R}^m$. Let $\mathcal{T} \subseteq [0,1]^m$.
If $d_{\mathrm{INT}}(\mathcal{K}, \mathcal{L}) < \epsilon$, then, for all $k \in \{0\} \cup \mathbb{N}$,
$$\left| \sup_{\mathrm{ray} \in \mathcal{T}^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} - \sup_{\mathrm{ray} \in \mathcal{T}^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{L}|_{\mathrm{ray}})) \} \right| \leq 2\epsilon.$$

Proof. For a constant $\eta > 0$, let $\mathrm{ray}_0 \in \mathcal{T}^+$ and $(a_0, b_0) \in \mathrm{dgm}(H_k(\mathcal{L}|_{\mathrm{ray}_0}))$ be such that
$$\sup_{\mathrm{ray} \in \mathcal{T}^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{L}|_{\mathrm{ray}})) \} \leq b_0 - a_0 + \eta. \tag{47}$$
Consider $\mathcal{K}|_{\mathrm{ray}_0}$. Since $d_{\mathrm{INT}}(\mathcal{K}, \mathcal{L}) < \epsilon$, we have $d_{\mathrm{INT}}(\mathcal{K}|_{\mathrm{ray}_0}, \mathcal{L}|_{\mathrm{ray}_0}) < \epsilon$. Taking the $H_k$ functor, by the Isometry Theorem,
$$d_b(\mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}_0})), \mathrm{dgm}(H_k(\mathcal{L}|_{\mathrm{ray}_0}))) < \epsilon,$$
which implies that there exists $(a_1, b_1) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}_0}))$ such that $\max(|a_0 - a_1|, |b_0 - b_1|) < \epsilon$, and hence
$$b_0 - a_0 \leq b_1 - a_1 + 2\epsilon. \tag{48}$$
Therefore,
$$\sup_{\mathrm{ray} \in \mathcal{T}^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{L}|_{\mathrm{ray}})) \} \leq b_1 - a_1 + 2\epsilon + \eta \quad \text{(by Equations (47) and (48))}$$
$$\leq \sup_{\mathrm{ray} \in \mathcal{T}^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} + 2\epsilon + \eta.$$
Since $\eta > 0$ is arbitrary,
$$\sup_{\mathrm{ray} \in \mathcal{T}^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{L}|_{\mathrm{ray}})) \} \leq \sup_{\mathrm{ray} \in \mathcal{T}^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} + 2\epsilon. \tag{49}$$
Reversing the roles of $\mathcal{K}$ and $\mathcal{L}$, the other direction can be obtained, completing the proof. $\square$

With all the above lemmas, we may now present a rigorous proof of Theorem 3.10. Let us restate Theorem 3.10 for easy reference.

Theorem (Theorem 3.10). Let $M_n \in \mathcal{M}^o_{m,n}$ be sampled from a regular pair $(\mathcal{F}, P_K)$. Assume that $K$ is bounded and each $f_i$ can be continuously extended to the closure $\bar{K}$. Then, for all $k \in \{0\} \cup \mathbb{N}$, as $n \to \infty$, $L_k(M_n)$ converges to $L_k(\mathcal{F}, P_K)$ in probability; namely, for all $\epsilon > 0$,
$$\lim_{n \to \infty} \Pr\left[ |L_k(M_n) - L_k(\mathcal{F}, P_K)| < \epsilon \right] = 1.$$
Moreover, the rate of convergence is independent of $k$.

Proof of Theorem 3.10. For notational simplicity, let $\mathcal{T}_1 = T(\mathcal{F}, P_K)$, $\mathcal{T}_2 = \hat{T}(M_n)$, $\mathcal{K} = \mathrm{Dow}(\mathcal{F}, P_K)$, and $\mathcal{L} = \mathrm{Dow}(S(M_n))$. Let $\epsilon > 0$.
By Lemma 6.21, we need to prove that, as $n \to \infty$, w.h.p.,
$$\left| \sup_{\mathrm{ray} \in (\mathcal{T}_1)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} - \sup_{\mathrm{ray} \in (\mathcal{T}_2)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{L}|_{\mathrm{ray}})) \} \right| \leq \epsilon.$$
By the Interleaving Convergence Theorem (Theorem 2.9), as $n \to \infty$, w.h.p., $d_{\mathrm{INT}}(\mathcal{K}, \mathcal{L}) \leq \epsilon/4$. Therefore, by Lemma 6.23,
$$\left| \sup_{\mathrm{ray} \in (\mathcal{T}_1)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{K}|_{\mathrm{ray}})) \} - \sup_{\mathrm{ray} \in (\mathcal{T}_1)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{L}|_{\mathrm{ray}})) \} \right| \leq \epsilon/2. \tag{50}$$
On the other hand, by Lemma 6.19, as $n \to \infty$, w.h.p., $d_H(\mathcal{T}_1, \mathcal{T}_2) \leq \epsilon/4$. Therefore, by Lemma 6.22,
$$\left| \sup_{\mathrm{ray} \in (\mathcal{T}_1)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{L}|_{\mathrm{ray}})) \} - \sup_{\mathrm{ray} \in (\mathcal{T}_2)^+} \sup\{ b - a : (a, b) \in \mathrm{dgm}(H_k(\mathcal{L}|_{\mathrm{ray}})) \} \right| \leq \epsilon/2. \tag{51}$$
Hence, combining Equation (50) and Equation (51), the result follows. $\square$

6.5 A lemma used in the proof of Lemma 4.5

Lemma 6.24. For $t_1, \dots, t_m \in (0,1)$ and $\sigma \subseteq [m]$, if $\bigcap_{i \in [m]} K^{(i)}(t_i) \neq \emptyset$, then there exists $\epsilon > 0$ such that $\bigcap_{i \in \sigma} K^{(i)}(t_i - \epsilon) \neq \emptyset$. In addition, by the monotonicity of $K^{(i)}(t)$, we also have $\bigcap_{i \in \sigma} K^{(i)}(t_i - \eta) \neq \emptyset$ for all $0 < \eta \leq \epsilon$.

Proof. Let $(\epsilon_n)$ be a sequence with $\epsilon_n \searrow 0$. Let us first prove that $K^{(i)}(t_i - \epsilon_n) \nearrow K^{(i)}(t_i)$; equivalently, $K^{(i)}(t_i) = \bigcup_{n=1}^{\infty} K^{(i)}(t_i - \epsilon_n)$. For any $n$, $K^{(i)}(t_i - \epsilon_n) \subseteq K^{(i)}(t_i)$ by definition. Therefore, $\bigcup_{n=1}^{\infty} K^{(i)}(t_i - \epsilon_n) \subseteq K^{(i)}(t_i)$. For the other inclusion, assume $x \in K^{(i)}(t_i) = f_i^{-1}(-\infty, \lambda_i(t_i))$. Then $f_i(x) < \lambda_i(t_i)$. By Lemma 2.4, $\lambda_i$ is continuous and strictly increasing.
Hence, there exists $n_0$ such that $\lambda_i(t_i - \epsilon_{n_0}) > f_i(x)$; in other words, $x \in K^{(i)}(t_i - \epsilon_{n_0})$. Therefore, $x \in \bigcup_{n=1}^{\infty} K^{(i)}(t_i - \epsilon_n)$, proving the claim.

Now we have, as $n \nearrow \infty$, $K^{(i)}(t_i - \epsilon_n) \nearrow K^{(i)}(t_i)$. Thus $\bigcap_{i \in \sigma} K^{(i)}(t_i - \epsilon_n) \nearrow \bigcap_{i \in \sigma} K^{(i)}(t_i)$. Since $\bigcap_{i \in \sigma} K^{(i)}(t_i) \neq \emptyset$, there must exist $n_0$ such that $\bigcap_{i \in \sigma} K^{(i)}(t_i - \epsilon_{n_0}) \neq \emptyset$. Taking $\epsilon = \epsilon_{n_0}$, the result follows. $\square$

6.6 Cent is open

This subsection is devoted to the proof of the openness of Cent.

Lemma 6.25. Let $\{f_i : K \to \mathbb{R}\}_{i \in [m]}$ be a collection of quasi-convex $C^1$ functions, where $K$ is open and convex in $\mathbb{R}^d$. Then the set $\mathrm{Cent} = \{x \in K : \mathrm{cone}(\{\nabla f_i(x)\}_{i \in [m]}) = \mathbb{R}^d\}$ is open in $K$. In particular, $\mathrm{Cent} \neq \emptyset$ is equivalent to $P_K(\mathrm{Cent}) > 0$.

Proof. Define a function $h : K \times S^{d-1} \to \mathbb{R}$ by $h(x, u) = \max_{i \in [m]} \langle u, \nabla f_i(x) \rangle$. Since each $f_i$ is $C^1$, the functions $(x, u) \mapsto \langle u, \nabla f_i(x) \rangle$ are continuous, and hence $h$ is also continuous. For $x \in K$, we define $\rho(x) = \min_{u \in S^{d-1}} h(x, u)$. Let us prove that, for $x \in K$, $\rho(x) > 0$ if and only if $\mathrm{cone}(\{\nabla f_i(x)\}_{i \in [m]}) = \mathbb{R}^d$.

For one direction, let $x \in K$ satisfy $\rho(x) > 0$, or, equivalently, $\max_{i \in [m]} \langle u, \nabla f_i(x) \rangle > 0$ for all $u \in S^{d-1}$. If $0 \in \mathrm{bd}(\mathrm{conv}(\{\nabla f_i(x)\}_{i \in [m]}))$, then the nonzero vector $v$ pointing outward from $\mathrm{conv}(\{\nabla f_i(x)\}_{i \in [m]})$ and orthogonal to the hyperface containing $0$ would make $\max_{i \in [m]} \langle v, \nabla f_i(x) \rangle = 0$, a contradiction. If $0 \notin \mathrm{conv}(\{\nabla f_i(x)\}_{i \in [m]})$, then taking $v = -\mathrm{argmin}_{z \in \mathrm{conv}(\{\nabla f_i(x)\}_{i \in [m]})} \langle z, z \rangle$ would make $\max_{i \in [m]} \langle v, \nabla f_i(x) \rangle < 0$, also a contradiction. Therefore, $0 \in \mathrm{int}(\mathrm{conv}(\{\nabla f_i(x)\}_{i \in [m]}))$ and hence $\mathrm{cone}(\{\nabla f_i(x)\}_{i \in [m]}) = \mathbb{R}^d$.
For the other direction, let $x \in K$ satisfy $\mathrm{cone}(\{\nabla f_i(x)\}_{i \in [m]}) = \mathbb{R}^d$. To prove $\rho(x) > 0$, since $u \mapsto \max_{i \in [m]} \langle u, \nabla f_i(x) \rangle$ is continuous and $S^{d-1}$ is compact, it suffices to prove that $\max_{i \in [m]} \langle u, \nabla f_i(x) \rangle > 0$ for all $u \in S^{d-1}$. Given $u \in S^{d-1}$, since $\mathrm{cone}(\{\nabla f_i(x)\}_{i \in [m]}) = \mathbb{R}^d$, we can write $u = \sum_{i \in [m]} r_i \cdot \nabla f_i(x)$ for some $r_i \geq 0$. If $\langle u, \nabla f_i(x) \rangle \leq 0$ for all $i \in [m]$, then $\langle u, u \rangle = \sum_{i \in [m]} r_i \langle u, \nabla f_i(x) \rangle \leq 0$, a contradiction. Thus the other direction is proved and, for $x \in K$, $\rho(x) > 0$ if and only if $\mathrm{cone}(\{\nabla f_i(x)\}_{i \in [m]}) = \mathbb{R}^d$.

Now let $x_0 \in K$ be such that $\mathrm{cone}(\{\nabla f_i(x_0)\}_{i \in [m]}) = \mathbb{R}^d$. By what has been proved, this is equivalent to $\rho(x_0) > 0$. We want to prove that there exists $\epsilon > 0$ such that, for all $x \in B(x_0, \epsilon)$, $\mathrm{cone}(\{\nabla f_i(x)\}_{i \in [m]}) = \mathbb{R}^d$, or equivalently, $\rho(x) > 0$. Suppose not; then there exists a sequence $(x_n, u_n) \in K \times S^{d-1}$ such that $x_n \to x_0$ and $h(x_n, u_n) \leq 0$. By compactness of $S^{d-1}$, there is a subsequence $u_{n_j} \to u_0$, and thus, by continuity of $h$, $h(x_0, u_0) \leq 0$. However, $h(x_0, u_0) \geq \min_{u \in S^{d-1}} h(x_0, u) = \rho(x_0) > 0$, a contradiction. Thus the proof is complete.

6.7 Proof of Theorem 5.5

Throughout this subsection, $\mathrm{Cent}$ is as defined in Definition 5.1, $\widehat{\mathrm{Cent}}(M_n)$ is as defined in Definition 5.4, and $\hat{Z}_n$ and $Z_\infty$ are as defined in Definition 6.12. The following two functions play a crucial role throughout the proof of Theorem 5.5.

Definition 6.26. Let $X_n = \{x_1, \ldots, x_n\}$ be sampled from a regular pair $(\mathcal{F}, P_K)$ and $M_n \in \mathcal{M}_o^{m,n}$ be the corresponding data matrix. Define $\tau : K \to [0,1]^m$ and $\hat{\tau}_n : X_n \to [0,1]^m$ by
$$\tau(x) \stackrel{\mathrm{def}}{=} (T_1(x), \ldots, T_m(x)), \qquad \hat{\tau}_n(x_a) \stackrel{\mathrm{def}}{=} \left( \frac{\mathrm{ord}_1(M_n, a) - 1}{n}, \ldots, \frac{\mathrm{ord}_m(M_n, a) - 1}{n} \right).$$

In order to prove Theorem 5.5, we first prove the following lemmas (Lemmas 6.27--6.29).

Lemma 6.27.
Let $\tau : K \to [0,1]^m$ and $\hat{\tau}_n : X_n \to [0,1]^m$ be defined as in Definition 6.26. Then

(i) $\tau^{-1}(Z_\infty) = \mathrm{Cent}$,

(ii) $\hat{\tau}_n^{-1}(\hat{Z}_n) = \{x_a \in X_n : a \in \widehat{\mathrm{Cent}}(M_n)\}$, and

(iii) $\tau^{-1}(Z_\infty) \cap X_n \subseteq \hat{\tau}_n^{-1}(\hat{Z}_n)$.

Proof. To prove (i),
$$\tau^{-1}(Z_\infty) = \{x \in K : \tau(x) \in Z_\infty\} = \left\{ x \in K : P_K\Big( \bigcap_{i \in [m]} K^{(i)}(T_i(x)) \Big) = 0 \right\} = \left\{ x \in K : \bigcap_{i \in [m]} K^{(i)}(T_i(x)) = \emptyset \right\} = \left\{ x \in K : \bigcap_{i \in [m]} f_i^{-1}(-\infty, f_i(x)) = \emptyset \right\} = \mathrm{Cent}.$$

To prove (ii),
$$\hat{\tau}_n^{-1}(\hat{Z}_n) = \{x_a \in X_n : \hat{\tau}_n(x_a) \in \hat{Z}_n\} = \left\{ x_a \in X_n : \bigcap_{i \in [m]} \{b \in [n] : \mathrm{ord}_i(M_n, b) \leq \mathrm{ord}_i(M_n, a) - 1\} = \emptyset \right\} = \left\{ x_a \in X_n : a \in [n],\ \bigcap_{i \in [m]} \{b \in [n] : M_{ib} < M_{ia}\} = \emptyset \right\} = \{x_a \in X_n : a \in \widehat{\mathrm{Cent}}(M_n)\}.$$

To prove (iii), assume $x_a \in \tau^{-1}(Z_\infty) \cap X_n$. Then $\bigcap_{i \in [m]} f_i^{-1}(-\infty, f_i(x_a)) = \emptyset$. Thus
$$\hat{R}_n(\hat{\tau}_n(x_a)) = \hat{R}_n\left( \frac{\mathrm{ord}_1(M_n, a) - 1}{n}, \ldots, \frac{\mathrm{ord}_m(M_n, a) - 1}{n} \right) = \frac{1}{n} \cdot \left| X_n \cap \bigcap_{i \in [m]} K_n^{(i)}\left( \frac{\mathrm{ord}_i(M_n, a) - 1}{n} \right) \right| \quad \text{(by Lemma 6.5)}$$
$$= \frac{1}{n} \cdot \left| X_n \cap \bigcap_{i \in [m]} f_i^{-1}(-\infty, f_i(x_a)) \right| \quad \text{(by Definition 6.3)} \; = 0.$$
Hence, $x_a \in \hat{\tau}_n^{-1}(\hat{Z}_n)$ and the inclusion in (iii) follows.

Lemma 6.28. Let $\tau : K \to [0,1]^m$ and $\hat{\tau}_n : X_n \to [0,1]^m$ be defined as in Definition 6.26. Then, for any $\delta > 0$, as $n \to \infty$, w.h.p.,
$$\hat{\tau}_n^{-1}(\hat{Z}_n) \subseteq \tau^{-1}(Z_\infty + \delta).$$

Proof. By Corollary 6.14, w.h.p.,
$$\hat{Z}_n \subseteq Z_\infty + \delta/2. \quad (52)$$
If $Z_\infty = [0,1]^m$, then $\tau^{-1}(Z_\infty + \delta) = K$ and hence $\hat{\tau}_n^{-1}(\hat{Z}_n) \subseteq \tau^{-1}(Z_\infty + \delta)$ clearly holds. Now suppose $Z_\infty \neq [0,1]^m$. Since $Z_\infty$ is closed, $\mathrm{int}([0,1]^m \setminus Z_\infty)$ is nonempty and open, say containing $x_0$. Choose $\epsilon > 0$ such that $\epsilon < \delta$ and $B(x_0, \sqrt{d}\,\epsilon) \subseteq \mathrm{int}([0,1]^m \setminus Z_\infty)$. Then $Z_\infty + \epsilon/2 \subsetneq Z_\infty + \epsilon$. Thus $\mu^* \stackrel{\mathrm{def}}{=} d_H(Z_\infty + \epsilon/2, Z_\infty + \epsilon)/\sqrt{d}$ is a positive number, where $d_H$ is the Hausdorff distance. (See Equation (32) for the definition of $Z_\infty + \delta$ and Equation (34) for the definition of the Hausdorff distance.)
Since $Z_\infty + \epsilon \subseteq (Z_\infty + \epsilon/2) + B(0, \sqrt{d}\,\epsilon/2)$, we have the inequality
$$\mu^* \leq \sqrt{d} \cdot (\epsilon/2)/\sqrt{d} = \epsilon/2 < \delta/2. \quad (53)$$
By Equation (40) and the definitions of $\tau$ and $\hat{\tau}_n$, w.h.p.,
$$\sup_{x \in X_n} \|\tau(x) - \hat{\tau}_n(x)\| < \mu^*/2. \quad (54)$$
There is one more property of $Z_\infty$, following from the monotonicity of $R_\infty$, that we need in this proof: if $(t_1, \ldots, t_m) \in Z_\infty$ and $(t'_1, \ldots, t'_m) \leq (t_1, \ldots, t_m)$, then $(t'_1, \ldots, t'_m) \in Z_\infty$; in short, $Z_\infty$ is closed under $\leq$.

Now we can prove the inclusion. Assume $x \in \hat{\tau}_n^{-1}(\hat{Z}_n)$, namely, $\hat{\tau}_n(x) \in \hat{Z}_n$. Then
$$\tau(x) \in \hat{\tau}_n(x) + B(0, \mu^*/2) \quad \text{(by (54))}$$
$$\subseteq \hat{Z}_n + B(0, \mu^*/2) \quad \text{(since } \hat{\tau}_n(x) \in \hat{Z}_n\text{)}$$
$$\subseteq Z_\infty + \delta/2 + B(0, \mu^*/2) \quad \text{(by (52))}$$
$$\subseteq Z_\infty + \delta/2 + \mu^*/2 \quad \text{(since } Z_\infty \text{ is closed under } \leq\text{)}$$
$$\subseteq Z_\infty + \delta/2 + \delta/2 \subseteq Z_\infty + \delta \quad \text{(by (53))}.$$
Therefore, $x \in \tau^{-1}(Z_\infty + \delta)$. Since $x \in \hat{\tau}_n^{-1}(\hat{Z}_n)$ is arbitrary, the proof is complete.

Lemma 6.29. Let $X_n \subset K$ be a point cloud of size $n$, sampled from a regular pair $(\mathcal{F}, P_K)$. Let $\tau$ be defined as in Definition 6.26. Then, for all $\epsilon > 0$, there exists $\delta > 0$ such that, as $n \to \infty$, w.h.p.,
$$\left| \frac{|X_n \cap \tau^{-1}(Z_\infty + \delta)|}{n} - P_K(\mathrm{Cent}) \right| < \epsilon/2.$$

Proof. Let us first prove that $\lim_{\delta \searrow 0} P_K(\tau^{-1}(Z_\infty + \delta)) = P_K(\mathrm{Cent})$. Since $Z_\infty$ is closed, $Z_\infty + \delta \searrow Z_\infty$ as $\delta \searrow 0$. Thus $\tau^{-1}(Z_\infty + \delta) \searrow \tau^{-1}(Z_\infty)$. Therefore, by the monotone convergence theorem, $\lim_{\delta \searrow 0} P_K(\tau^{-1}(Z_\infty + \delta)) = P_K(\tau^{-1}(Z_\infty)) = P_K(\mathrm{Cent})$.

Consequently, we may choose $\delta$ such that
$$\left| P_K(\tau^{-1}(Z_\infty + \delta)) - P_K(\mathrm{Cent}) \right| < \epsilon/4. \quad (55)$$
Note that $X_n \cap \tau^{-1}(Z_\infty + \delta)$ is an i.i.d. sample of $\tau^{-1}(Z_\infty + \delta) \subseteq K$.
Thus, by the law of large numbers, as $n \to \infty$, w.h.p.,
$$\left| \frac{|X_n \cap \tau^{-1}(Z_\infty + \delta)|}{n} - P_K(\tau^{-1}(Z_\infty + \delta)) \right| < \epsilon/4. \quad (56)$$
Combining Equations (55) and (56), the result follows.

With the help of the previous lemmas, we give a proof of Theorem 5.5.

Proof of Theorem 5.5. Let $\tau$ and $\hat{\tau}_n$ be defined as in Definition 6.26, and let $\epsilon > 0$. Note that $X_n \cap \tau^{-1}(Z_\infty)$ is an i.i.d. sample of $\tau^{-1}(Z_\infty) = \mathrm{Cent} \subseteq K$. Thus, by the law of large numbers, as $n \to \infty$, w.h.p., $\big| |X_n \cap \tau^{-1}(Z_\infty)|/n - P_K(\mathrm{Cent}) \big| < \epsilon/2$. Therefore, as $n \to \infty$, w.h.p.,
$$P_K(\mathrm{Cent}) - \epsilon/2 < \frac{|X_n \cap \tau^{-1}(Z_\infty)|}{n} \leq \frac{|\hat{\tau}_n^{-1}(\hat{Z}_n)|}{n} \quad \text{(by (iii) of Lemma 6.27)}$$
$$= \frac{|\widehat{\mathrm{Cent}}(M_n)|}{n} \quad \text{(by (ii) of Lemma 6.27)}.$$
On the other hand, as $n \to \infty$, w.h.p.,
$$\frac{|\widehat{\mathrm{Cent}}(M_n)|}{n} = \frac{|\hat{\tau}_n^{-1}(\hat{Z}_n)|}{n} \quad \text{(by (ii) of Lemma 6.27)}$$
$$= \frac{|\hat{\tau}_n^{-1}(\hat{Z}_n) \cap X_n|}{n} \quad \text{(since } \hat{\tau}_n^{-1}(\hat{Z}_n) \subseteq X_n\text{)}$$
$$\leq \frac{|\tau^{-1}(Z_\infty + \delta) \cap X_n|}{n} \quad \text{(by Lemma 6.28)}$$
$$\leq P_K(\mathrm{Cent}) + \epsilon/2 \quad \text{(by Lemma 6.29)}.$$
Therefore, as $n \to \infty$, w.h.p.,
$$\left| \frac{|\widehat{\mathrm{Cent}}(M_n)|}{n} - P_K(\mathrm{Cent}) \right| \leq \epsilon/2.$$

Acknowledgments

This work was supported by the NSF IOS-155925 grant.

References

[1] A. Björner, Topological methods, Handbook of combinatorics (vol. 2), MIT Press, Cambridge, MA, USA, 1995, pp. 1819–1872.

[2] Omer Bobrowski, Matthew Kahle, and Primoz Skraba, Maximally persistent cycles in random geometric complexes, The Annals of Applied Probability (2015).

[3] Stephen Boyd and Lieven Vandenberghe, Convex optimization, Cambridge University Press, 2004.

[4] Alberto Cambini and Laura Martein, Generalized convexity and optimization: Theory and applications, Lecture Notes in Economics and Mathematical Systems (2009).

[5] Frédéric Chazal, Vin de Silva, and Steve Oudot, Persistence stability for geometric complexes, Geometriae Dedicata (2014), no. 1, 193–214.

[6] Luc Devroye, László Györfi, and Gábor Lugosi, A probabilistic theory of pattern recognition, vol.
31, Springer, 1996.

[7] Allen Hatcher, Algebraic topology, Cambridge University Press, Cambridge, 2002. MR 1867354 (2002k:55001)

[8] Michael Lesnick, The theory of the interleaving distance on multidimensional persistence modules, Foundations of Computational Mathematics (2015), no. 3, 613–650.

[9] John O'Keefe and Jonathan Dostrovsky, The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat., Brain Res. (1971), no. 1, 171–175.

[10] Nina Otter, Mason A. Porter, Ulrike Tillmann, Peter Grindrod, and Heather A. Harrington, A roadmap for the computation of persistent homology, EPJ Data Science (2017), no. 1.

[11] Steve Y. Oudot, Persistence theory: from quiver representations to data analysis, Mathematical Surveys and Monographs, vol. 209, American Mathematical Society, Providence, RI, 2015.

[12] Rommel G. Regis, On the properties of positive spanning sets and positive bases, Optimization and Engineering (2016), no. 1, 229–262.

[13] Michael M. Yartsev and Nachum Ulanovsky, Representation of three-dimensional space in the hippocampus of flying bats, Science 340 (2013).