Manifold-adaptive dimension estimation revisited
Zsigmond Benkő, Marcell Stippinger, Roberta Rehus, Attila Bencze, Dániel Fabó, Boglárka Hajnal, Loránd Erőss, András Telcs, Zoltán Somogyvári
Department of Computational Sciences, Wigner Research Centre for Physics, H-1121, Hungary
János Szentágothai Doctoral School of Neurosciences, Semmelweis University, H-1085, Hungary
Department of Computer Science and Information Theory, Faculty of Electrical Engineering and Informatics, Budapest University of Technology and Economics, Budapest, H-1111, Hungary
Epilepsy Center, Department of Neurology, National Institute of Clinical Neurosciences, Budapest, H-1145, Hungary
Department of Functional Neurosurgery, National Institute of Clinical Neurosciences, Budapest, H-1145, Hungary
Faculty of Information Technology and Bionics, Péter Pázmány Catholic University, Budapest, H-1083, Hungary
Department of Quantitative Methods, University of Pannonia, Faculty of Business and Economics, H-8200, Veszprém, Hungary
Neuromicrosystems Ltd., Budapest, H-1113, Hungary

* [email protected]

August 11, 2020
Abstract
Data dimensionality informs us about data complexity and sets limits on the structure of successful signal processing pipelines. In this work we revisit and improve the manifold-adaptive Farahmand-Szepesvári-Audibert (FSA) dimension estimator, making it one of the best nearest-neighbor-based dimension estimators available. We compute the probability density function of local FSA estimates when the local manifold density is uniform. Based on the probability density function, we propose to use the median of local estimates as a basic global measure of intrinsic dimensionality, and we demonstrate the advantages of this asymptotically unbiased estimator over the previously proposed statistics: the mode and the mean. Additionally, from the probability density function, we derive the maximum likelihood formula for global intrinsic dimensionality, if i.i.d. holds. We tackle edge and finite-sample effects with an exponential correction formula, calibrated on hypercube datasets. We compare the performance of the corrected-median-FSA estimator with kNN estimators: maximum likelihood (ML, Levina-Bickel) and two implementations of DANCo (R and matlab). We show that the corrected-median-FSA estimator beats the ML estimator and is on equal footing with DANCo on standard synthetic benchmarks according to mean percentage error and error rate metrics. With the median-FSA algorithm, we reveal diverse changes in the neural dynamics during resting state and during epileptic seizures. We identify brain areas with lower-dimensional dynamics that are possible causal sources and candidates for being seizure onset zones.
Introduction
Dimensionality sets profound limits on the stage where data takes place, therefore it is often crucial to know the intrinsic dimension of data to carry out meaningful analysis. Intrinsic dimension provides direct information about data complexity; as such, it was recognised as a useful measure to describe the dynamics of dynamical systems[1], to detect anomalies in time series[2], to diagnose patients with various conditions[3, 4, 5, 6] and to use simply as a plugin parameter for signal processing algorithms.

Most multivariate datasets lie on a lower-dimensional manifold embedded in a potentially very high-dimensional embedding space. This is because the observed variables are far from independent, and this interdependence introduces redundancies resulting in a lower intrinsic dimension (ID) of the data compared with the number of observed variables. To capture this – possibly non-linear – interdependence, nonlinear dimension-estimation techniques can be applied[7, 8, 9, 10].

To estimate the ID of data, various approaches have been proposed; for a full review of techniques see the work of Campadelli et al.[11]. Here we discuss the k-Nearest Neighbor (kNN) ID estimators, with a focus on some recent advancements.

A basic assumption of kNN ID estimators is that the fraction of points in a neighborhood is approximately determined by the intrinsic dimensionality ($D$) and the distance ($R_k$) times a – locally almost constant – mostly density-dependent factor ($\eta(x, R)$, Eq. 1):

$$\frac{k}{n} \approx \eta(x, R)\, R_k^D \qquad (1)$$

where $k$ is the number of samples in a neighborhood and $n$ is the total number of samples on the manifold.

Assuming a Poisson sampling process on the manifold, Levina and Bickel[12] derived a Maximum Likelihood estimator, which became a popular method and got several updates[13, 14]. These estimators are prone to underestimation of dimensionality because of finite-sample effects and to overestimation because of curvature.

To address the challenges posed by curvature and finite samples, new estimators were proposed[15, 16, 17, 18]. To tackle the effect of curvature, a minimal neighborhood size can be taken on normalized neighborhood distances, as in the case of MIND_ML[15]. To tackle the underestimation due to finite-sample effects, empirical corrections were applied. A naive empirical correction approach was applied by Camastra and Vinciarelli[19]: a perceptron was trained on the estimates computed for randomly sampled hypercubes to learn a correction function. Motivated by the correction in the previous work, the IDEA method was created[15]; and a more principled approach was carried out, where the full distribution of estimates was compared to the distributions computed on test datasets using the Kullback-Leibler divergence (MIND_KL[15], DANCo[17]). In the case of DANCo, not just the nearest neighbor distances but also the angles are measured and taken into account in the estimation process, resulting in more accurate estimates.

In recent years, further estimators have been proposed, such as the estimator that uses minimal neighborhood information leveraging the empirical distribution of the ratio of the nearest neighbors to fit intrinsic dimension[18], or other approaches based on simplex skewness[20] and normalized distances[21, 22, 23, 24].

In the following section, we revisit the manifold-adaptive dimension estimator proposed by Farahmand et al.[25] to measure the intrinsic dimensionality of datasets. From Eq. 1 we can take the logarithm of both sides:

$$\ln\left(\frac{k}{n}\right) \approx \ln \eta + D \ln R_k, \qquad \ln\left(\frac{2k}{n}\right) \approx \ln \eta + D \ln R_{2k} \qquad (2)$$

If $\eta$ is slowly varying and $R$ is small, we can take it as a constant. If we subtract the two equations from each other we get:

$$\ln(2) \approx D \ln\left(\frac{R_{2k}}{R_k}\right) \qquad (3)$$

Thus, to compute the local estimates, we fit a line through the log-distances of the $k$th and $2k$th nearest neighbors at a given location:

$$d(x) = \frac{\ln(2)}{\ln\left(R_{2k}/R_k\right)} \qquad (4)$$

To compute a global ID estimate, the authors proposed the mean of local estimates at sample points, or a vote for the winner global ID value (the mode) if the estimator is used in integer mode. They proved that the above global ID estimates are consistent for $k > 1$, if $\eta$ is differentiable and the manifold is regular. They calculated the upper bound for the probability of error for the global estimate; however, this bound contains unknown constants[25].

In this paper we propose an improved FSA estimator, based on the assumption that the density is locally uniform. We suggest to use the median of local values as a global intrinsic dimension estimate. We correct the underestimation effect with an exponential formula and test the new algorithm on benchmark datasets. We apply the proposed estimator to locate the epileptic focus on field potential measurements.
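The local estimate in Eq. 4 only needs the distances to the kth and 2kth nearest neighbors. The sketch below illustrates one way to compute it with a k-d tree; it is a minimal illustration under these assumptions, not the authors' released implementation, and the function name fsa_local_estimates is ours.

```python
# Minimal sketch of the local FSA estimate (Eq. 4): d(x) = ln 2 / ln(R_2k / R_k).
# Assumes X is an (n_samples, n_features) array; not the authors' published code.
import numpy as np
from scipy.spatial import cKDTree


def fsa_local_estimates(X, k=1):
    """Local FSA dimension estimate at every sample point of X."""
    tree = cKDTree(X)
    # query 2k+1 neighbors because the closest "neighbor" of a sample is the sample itself
    dists, _ = tree.query(X, k=2 * k + 1)
    R_k, R_2k = dists[:, k], dists[:, 2 * k]
    return np.log(2.0) / np.log(R_2k / R_k)
```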
Figure 1: Probability density function of the Farahmand-Szepesvári-Audibert estimator (d) for various dimensions (D) and neighborhood sizes (k). A-F The subplots show that the theoretical pdfs (continuous lines) fit the histograms (n = 10000) of local estimates calculated on uniformly sampled hypercubes (D = 2, 3, 5, 8, 10, 12) for k = 1 (blue), k = 11 (orange) and k = 50 (green).

Results
Manifold adaptive dimension estimator revisited
The probability density of the Farahmand-Szepesvári-Audibert estimator
We compute the probability density function of the Farahmand-Szepesvári-Audibert (FSA) intrinsic dimension estimator based on normalized distances.

The normalized distance density of the kth nearest neighbor can be computed in the context of a K-neighborhood, where the normalized distances of the K − 1 neighbors are measured:

$$p(r \mid k, K-1, D) = \frac{D}{B(k, K-k)}\, r^{Dk-1} \left(1 - r^D\right)^{K-k-1} \qquad (5)$$

where r is the distance of the kth neighbor normalized by the distance of the Kth neighbor ($r_k = R_k / R_K$, $k < K$) and B is the Euler beta function (see SI A.1 for a derivation). A maximum likelihood estimator based on Eq. 5 leads to the formula of the classical Levina-Bickel estimator[12]. For a derivation of this probability density and the maximum likelihood solution see SI A.1 and SI A.2, respectively.

We realize that the inverse of a normalized distance appears in the formula of the FSA estimator, so we can express it as a function of r:

$$d_k = \frac{\log 2}{\log\left(R_{2k}/R_k\right)} = -\frac{\log 2}{\log\left(R_k/R_{2k}\right)} = -\frac{\log 2}{\log r_k} \qquad (6)$$

where $r_k = R_k / R_{2k}$. Thus, we can compute the pdf of the estimated values by plugging K = 2k into Eq. 5, followed by a change of variables:

$$q(d_k) \equiv p(r \mid k, 2k-1, D) \left|\frac{\mathrm{d}r}{\mathrm{d}d_k}\right| = \frac{D\log(2)}{B(k, k)}\, \frac{2^{-Dk/d_k}\left(1 - 2^{-D/d_k}\right)^{k-1}}{d_k^2} \qquad (7)$$

Theorem 1. The median of $q(d_k)$ is at D.

Proof. We apply the substitution $a = 2^{-D/d_k}$ in Eq. 7:

$$p(a) = q(d_k)\left|\frac{\mathrm{d}d_k}{\mathrm{d}a}\right| \qquad (8)$$
$$= \frac{D\log(2)}{B(k,k)}\, a^{k}(1-a)^{k-1}\, \frac{\log^2 a}{D^2\log^2 2}\, \frac{D\log 2}{a\log^2 a} \qquad (9)$$
$$= \frac{1}{B(k,k)}\, a^{k-1}(1-a)^{k-1} \qquad (10)$$

The pdf in Eq. 10 belongs to a beta distribution. The cumulative distribution function of this density is the regularized incomplete beta function with k as both parameters, symmetrically:

$$P(a) = I_a(k, k) \qquad (11)$$

The median of this distribution is at $a = 1/2$, thus at $d_k = D$, since

$$a = 2^{-D/d_k} = \frac{1}{2} \qquad (12)$$
$$\Rightarrow\; D = d_k \qquad (13)$$

This means that the median of the FSA estimator is equal to the intrinsic dimension independent of neighborhood size, if the locally uniform point density assumption holds. The sample median is a robust statistic, therefore we propose to use the sample median of local estimates as a global dimension estimate. We will call this modified method the median Farahmand-Szepesvári-Audibert (mFSA) estimator.

Let us see the form for the smallest possible neighborhood size, k = 1 (Fig. 1). The pdf of the estimator takes a simpler form (Eq. 14):

$$q(d \mid k=1, D) = \frac{D\log(2)\, 2^{-D/d}}{d^2} \qquad (14)$$

Also, we can calculate the cumulative distribution function analytically (Eq. 15):

$$Q(d \mid k=1, D) = \int_0^d q(t \mid k=1, D)\,\mathrm{d}t = 2^{-D/d} \qquad (15)$$

The expectation of $d_k$ diverges for k = 1 – but not for k > 1 – while the median remains at D (Eq. 16):

$$Q(d = D \mid k=1, D) = 2^{-1} = \frac{1}{2} \qquad (16)$$
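The statement of Theorem 1 is easy to probe numerically. The sketch below samples a unit hypercube, computes the local estimates with the hypothetical fsa_local_estimates helper from the Introduction, and takes their sample median; with a hard boundary the result is expected to land slightly below D because of the finite-sample and edge effects discussed in the next sections.

```python
# Numerical check of Theorem 1: the median of local FSA estimates is close to D.
import numpy as np

rng = np.random.default_rng(0)
D, n = 5, 10000
X = rng.random((n, D))                    # uniform sample from the unit D-cube
d_local = fsa_local_estimates(X, k=1)     # helper sketched in the Introduction
print(np.median(d_local))                 # mFSA global estimate, expected near D = 5
```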
Sampling distribution of the median

We can easily compute the pdf of the sample median if an odd sample size is given (n = 2l + 1) and if sample points are drawn independently according to Eq. 7. Roughly half of the points have to be smaller, half of the points have to be bigger and one point has to be exactly at m (Eq. 17):

$$p(m \mid k, D, n) = \frac{1}{B(l+1, l+1)} \left[ P\!\left(a = 2^{-D/m}\right)\left(1 - P\!\left(a = 2^{-D/m}\right)\right) \right]^{l} q(m) \qquad (17)$$

where p(a) and P(a) are the pdf and cdf of a (Eqs. 10, 11) and q is the pdf of the FSA estimator (Fig. 2).
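Eq. 17 is straightforward to evaluate numerically. The sketch below does so (function and variable names are illustrative); integrating the resulting density over a grid should give a value close to one.

```python
# Numeric sketch of Eq. 17: pdf of the sample median of n = 2l + 1 local FSA estimates.
import numpy as np
from scipy.special import beta as beta_fn
from scipy.stats import beta as beta_dist


def median_pdf(m, D, n, k=1):
    l = (n - 1) // 2
    a = 2.0 ** (-D / m)                              # substitution from the proof of Theorem 1
    P = beta_dist.cdf(a, k, k)                       # cdf of a ~ Beta(k, k), Eq. 11
    q = (D * np.log(2) * 2.0 ** (-D * k / m)         # pdf of one local estimate, Eq. 7
         * (1 - 2.0 ** (-D / m)) ** (k - 1) / (beta_fn(k, k) * m ** 2))
    return (P * (1 - P)) ** l * q / beta_fn(l + 1, l + 1)


m = np.linspace(0.5, 10.0, 2000)
print((median_pdf(m, D=2, n=101) * (m[1] - m[0])).sum())   # should be close to 1
```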
Figure 2: The sampling distribution of the median of the FSA estimator (k = 1) on uniformly sampled hypercubes. The figure shows the pdf of the median-FSA estimator for points uniformly sampled from a square (A) and from a 5D hypercube (B) for three sample sizes: n = 11 (blue), n = 101 (orange) and n = 1001 (green). The solid lines represent the theoretical pdfs of the median and the shaded histograms are the results of simulations (N = 5000 realizations).

Maximum Likelihood solution for the manifold-adaptive estimator
If the samples are independent and identically distributed, we can formulate the likelihood function as the product of sample likelihoods (Eq. 18). We seek the maximum of the log-likelihood function, but the derivative is transcendental for k > 1. Therefore, we compute the place of the maximum numerically (Eq. 20).

$$L = \prod_{i=1}^{n} \frac{D\log(2)}{B(k,k)}\, \frac{2^{-Dk/d_k(i)} \left(1 - 2^{-D/d_k(i)}\right)^{k-1}}{d_k(i)^2} \qquad (18)$$

$$\log L = n \log\frac{\log(2)}{B(k,k)} + n \log D - Dk\log(2)\sum_i \frac{1}{d_k(i)} + (k-1)\sum_i \log\left(1 - 2^{-D/d_k(i)}\right) - 2\sum_i \log\left(d_k(i)\right) \qquad (19)$$

$$\frac{\partial \log L}{\partial D} = \frac{n}{D} - \log(2)\, k \sum_i \frac{1}{d_k(i)} + \log(2)\,(k-1)\sum_i \frac{1}{d_k(i)\left(2^{D/d_k(i)} - 1\right)} \overset{!}{=} 0 \qquad (20)$$

For k = 1, the ML formula is equal to the Levina-Bickel (k = 1) and MIND formulas.
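Eq. 20 can be solved with any one-dimensional root finder. The sketch below uses scipy's brentq with illustrative brackets; d_local stands for the array of local estimates d_k(i), and the bracket must contain the root.

```python
# Sketch of the maximum likelihood global estimate: solve Eq. 20 for D numerically.
import numpy as np
from scipy.optimize import brentq


def ml_global_dimension(d_local, k, d_min=0.1, d_max=100.0):
    d_local = np.asarray(d_local, dtype=float)
    n = d_local.size

    def score(D):  # derivative of the log-likelihood, Eq. 20
        s = n / D - np.log(2) * k * np.sum(1.0 / d_local)
        if k > 1:
            s += np.log(2) * (k - 1) * np.sum(
                1.0 / (d_local * (2.0 ** (D / d_local) - 1.0)))
        return s

    # the bracket [d_min, d_max] is an assumption and must enclose the sign change
    return brentq(score, d_min, d_max)
```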
Results on randomly sampled hypercube datasets

The theoretical probability density function of the local FSA estimator fits the empirical observations (Eq. 7, Fig. 1). We simulated hypercube datasets with fixed sample size (n = 10000), various intrinsic dimensions (D = 2, 3, 5, 8, 10, 12) and k parameter values (k = 1, 11, 50). The theoretical sampling distribution of the median also fits the simulations for D = 2 and D = 5 with the smallest possible neighborhood (k = 1), for different sample sizes (n = 11, 101, 1001; Fig. 2). At big sample sizes the pdf is approximately Gaussian[26], but for small samples the pdf is non-Gaussian and skewed.

The mFSA estimator underestimates intrinsic dimensionality in high dimensions. This phenomenon is partially a finite-sample effect (Fig. 3), but edge effects make the underestimation even more severe. We showed graphically that the mFSA estimator asymptotically converges to the real dimension values for hypercube datasets when we applied periodic boundary conditions (Fig. 4). We found that the convergence is much slower for hard boundary conditions, where edge effects make estimation errors higher.

We could estimate the logarithm of the relative error with an s-order polynomial:

$$\log(E_{\mathrm{rel}}) = \log\left(\frac{D}{d}\right) = \sum_{i=1}^{s} \alpha_i d^i \qquad (21)$$

The order of the polynomial was different for the two types of boundary conditions. When we applied a hard boundary, the order was s = 1; however, in the periodic case higher-order polynomials fit the data. Thus, in the case of the hard boundary, we could make the empirical correction formula:

$$D \approx C(\hat{d}) = \hat{d}\, e^{\alpha_n \hat{d}} \qquad (22)$$

where $\alpha_n$ is a sample-size-dependent coefficient that we could fit with the least squares method.
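Once $\alpha_n$ is available, applying the correction of Eq. 22 is a one-line operation; the coefficient value in the sketch below is only illustrative, not a calibrated one.

```python
# Sketch of the corrected estimate (Eq. 22): D ~ d * exp(alpha_n * d).
import numpy as np


def cmfsa(d_mfsa, alpha):
    return d_mfsa * np.exp(alpha * d_mfsa)


print(cmfsa(14.6, alpha=0.02))   # ~19.6 with this illustrative coefficient
```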
Results on synthetic benchmarks

We tested the mFSA estimator and its corrected version on synthetic benchmark datasets[27, 11]. We simulated N = 100 instances of 15 manifolds (M_i); the resulting estimates are summarized in Table 1 and Fig. 6. As on the hypercubes, the raw mFSA estimator tended to underestimate the intrinsic dimensionality of the benchmark manifolds (Table 1).
Figure 3: Intrinsic dimension dependence of the median-FSA estimator for uniformly sampled unit hypercubes with various sample sizes (k = 1). Subplots A-F show the mean of median-FSA estimator values (thick line) from N = 100 realizations (shading) of uniformly sampled unit hypercubes with periodic boundary.

In contrast, the corrected mFSA (cmFSA) estimator found the true intrinsic dimensionality of the datasets, and it reached the best overall error rate.

Analysing epileptic seizures
To show how mFSA works on real-world noisy data, we applied it to human neural recordings of epileptic seizures.

We acquired field potential measurements from a patient with drug-resistant epilepsy by 2 electrode grids and 3 electrode strips. We analyzed the neural recordings during interictal periods and during epileptic activity to map possible seizure onset zones (see Methods).

We found several characteristic differences in the dimension patterns between interictal (control) and seizure conditions. In interictal periods (Fig. 7 A), we found the lowest average dimension value at the FbB2 position on the fronto-basal grid. Also, we observed a diagonal gradient of intrinsic dimensions on the cortical grid (Gr). In contrast, during seizures we observed the lowest dimension values at the hippocampal electrode strip (JT), and the gradient on the cortical grid disappeared (Fig. 7 B). Curiously, the intrinsic dimensionality became higher at fronto-basal recording sites during seizures (Fig. 7 C).
Figure 4: Sample size dependence of the median-FSA estimator for uniformly sampled unit hypercubes with varied intrinsic dimension values (k = 1). Subplots A-F show the mean of median-FSA estimator values (thick line) from N = 100 realizations (shading).

Discussion
In this work we revisited and improved the manifold-adaptive FSA dimension estimator. We computed the probability density function of local estimates when the local density is uniform. From the pdf, we derived the maximum likelihood formula for intrinsic dimensionality.

We proposed to use the median of local estimates as a global measure of intrinsic dimensionality, and demonstrated that this measure is asymptotically unbiased.

We tackled edge effects with a correction formula calibrated on hypercube datasets. We showed that the coefficients are sample-size dependent. Camastra and Vinciarelli[19] took a resembling empirical approach, where they corrected correlation dimension estimates with a perceptron calibrated on d-dimensional datasets. Our approach is different, because we tried to grasp the connection between underestimation and intrinsic dimensionality more directly, by showing that the dimension-dependence of the relative error is exponential. The calibration procedure of DANCo may generalize better, because it compares the full distribution of local estimates rather than just a centrality measure[17]. Also, we are aware that our simple correction formula overlooks the effects of curvature and noise. We tried to address the former with the choice of minimal neighborhood size (k = 1), thus the overestimation effect due to curvature is minimal. Additionally, the effect of noise on the estimates is yet to be investigated. There are several strategies to alleviate noise effects, such as undersampling the data while keeping the neighborhood fixed[18], or using a bigger neighborhood size while keeping the sample size fixed. Both of these procedures make the effect of curvature more severe, which makes the dimension estimation of noisy curved data a challenging task.

Figure 5: Bias-correction of the median-FSA estimator for uniformly sampled unit hypercubes with various sample sizes (k = 1). Subplots A-F show the mean of median-FSA estimator values (grey line) from N = 100 realizations (shading) of uniformly sampled unit hypercubes. The boundary condition is hard, so the edge effect makes the underestimation more severe. The colored lines show the corrected estimates according to $\hat{w}_c = \hat{w}\exp(\alpha\hat{w})$ (Eq. 22); the per-panel values 0.023, 0.025, 0.020, 0.019, 0.017 and 0.014 correspond to n = 10, 100, 500, 1000, 2500 and 10000, respectively.

We benchmarked the new mFSA and corrected-mFSA methods against the Levina-Bickel estimator and DANCo on synthetic benchmark datasets and found that cmFSA showed comparable performance to DANCo. For many datasets, R-DANCo overestimated the intrinsic dimensionality, which is most probably due to rough default calibration[20]; the matlab implementation showed the best overall results, in agreement with Campadelli et al.[11]. This superiority was however dataset-specific: cmFSA performed genuinely the best in 4, DANCo in 2 out of the 15 benchmark datasets (with 7 ties, Table 1). Also, cmFSA showed a better overall error rate than DANCo. Combining the performance measured by different metrics, we recognise that cmFSA found the true intrinsic dimension of the data in more cases, but when mistaken, it makes relatively bigger errors compared with DANCo.

Figure 6: Performance comparison between cmFSA and DANCo on synthetic benchmark datasets. A Dataset-wise Mean Percentage Error (MPE) on benchmark data. cmFSA (blue) shows smaller MPE in 4 cases and bigger MPE in 4 cases compared with DANCo (matlab). B Dataset-wise error rate for cmFSA and DANCo. cmFSA shows smaller error rates in 5 cases and bigger error rates in 2 cases compared with DANCo.

The mFSA algorithm revealed diverse changes in the neural dynamics during epileptic seizures. In the normal condition, the gradient of dimension values on the cortical grid reflects the hierarchical organization of neocortical information processing[28]. During seizures, this pattern becomes disrupted, pointing to the breakdown of normal activation routes. Some channels showed lower-dimensional dynamics during seizures; that behaviour is far from exceptional: the decrease in dimensionality is due to widespread synchronization events between neural populations[29], a phenomenon reported by various authors[4, 30, 31]. These lower-dimensional areas are possible causal sources[7, 10, 9] and candidates for being the seizure onset zone. Interestingly, Esteller et al. found that the Higuchi fractal dimension values were higher at seizure onset and decreased to lower values as the seizures evolved over time[32]. We found that most areas showed decreased dimensionality, but a few areas also showed increased dimension values as the seizure takes place. This may suggest that new – so far unused – neural circuits are activated at seizure onset; whether this circuitry contributes to or counteracts the epileptic seizure is unclear.
Methods
The simulations and the FSA algorithms were implemented in python3[33] using the numpy[34], scipy[35] and matplotlib[36] packages, unless otherwise stated.
Simulations
We generated test datasets by uniform random sampling from the unit D-cube to demonstrate that the theoretical derivations fit the data. We measured distances with a circular boundary condition to avoid edge effects, hence the data is as close to the theoretical assumptions as possible.

To illustrate the probability density function of the FSA estimator, we computed the local FSA intrinsic dimension values (Fig. 1). We generated d-hypercubes (n = 10000, one realization) with dimensions of 2, 3, 5, 8, 10 and 12, then computed histograms of local FSA estimates for three neighborhood sizes: k = 1, 11, 50 respectively (Fig. 1 A-F). We drew the theoretically computed pdf to illustrate the fit.

To show that the theoretically computed sampling distribution of the mFSA fits the hypercube datasets, we varied the sample size (n = 11, 101, 1001) and generated N = 5000 realizations for each. We computed the mFSA for each realization and plotted the results for d = 2 (Fig. 2 A) and d = 5 (Fig. 2 B).

Table 1: Dimension estimates on synthetic benchmark datasets. The table shows the true dimension values and the mean estimates of the median-Farahmand-Szepesvári-Audibert (mFSA), Maximum Likelihood, corrected median Farahmand-Szepesvári-Audibert (cmFSA) and DANCo estimators from N = 100 realizations. The MPE values can be seen in the bottom line; the matlab version of the DANCo estimator produced the smallest error, followed by the cmFSA estimator.

Figure 7: mFSA dimension estimates on intracranial brain LFP measurements during interictal activity and epileptic seizures. The figure shows the dimension estimates on an intracranial cortical grid (Gr A-F), a smaller frontobasal grid (Fb A, B) and 3 electrode strips with hippocampal and temporal localization (JIH, BIH, JT). The areas with lower-dimensional dynamics are marked by stronger colors. A Average of mFSA dimension values from interictal LFP activity (N = 16, k = 10-20). B Average of mFSA dimension values from seizure LFP activity (N = 18, k = 10-20). C Difference of dimension values. Stronger red marks areas where the dynamics during seizure was lower-dimensional than its interictal counterpart, while stronger blue indicates electrodes where the during-seizure dynamics was higher-dimensional than the interictal dynamics.

We investigated the dimensionality and sample-size effects on mFSA estimates (Fig. 3 A-F). We simulated the hypercube data in the 2-30 dimension range and applied various sample sizes: n = 10, 100, 500, 1000, 2500 and 10000 (N = 100 realizations). We computed the mFSA values with minimal neighborhood size (k = 1), and observed finite-sample effects and asymptotic convergence. The finite-sample effect was pronounced at low sample sizes and high dimensions, but we experienced convergence to the real dimension value as we increased the sample size. We repeated the analysis with hard boundary conditions.

We fitted a correction formula on the logarithm of dimension values and estimates with the least squares method (Eq. 23), using all 100 realizations for each sample size separately:

$$\alpha = \frac{\sum_i (\ln E_i)\, d(i)}{\sum_i \left(d(i)\right)^2} \qquad (23)$$

where $E_i = D_i / d(i)$ is the relative error, $D_i$ is the intrinsic dimension of the data, and $d(i)$ are the corresponding mFSA estimates. This procedure fit the data well in the intrinsic dimension range 2-30 (Fig. 5 A-F).

A wider range of intrinsic dimension values (2-80) required more coefficients in the polynomial fit procedure (SFig. 1 A). Also, we used orthogonal distance regression to fit the mean over realizations of $\ln E_i$ with the same $D_i$ value (SFig. 1 B). We utilized the mean and standard deviation of the regression error to compute the error rate of the cmFSA estimator, assuming that the error distributions are normal (SFig. 1 C-D). We applied this calibration procedure (n = 2500) to compute cmFSA on the benchmark datasets below.
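The least-squares problem behind Eq. 23 has a closed-form solution, sketched below; D_true and d_mfsa are illustrative names for the arrays of true dimensions and mFSA estimates over the calibration realizations.

```python
# Sketch of the least-squares fit of the correction coefficient (Eq. 23).
import numpy as np


def fit_alpha(D_true, d_mfsa):
    d = np.asarray(d_mfsa, dtype=float)
    log_E = np.log(np.asarray(D_true, dtype=float) / d)   # ln E_i = ln(D_i / d(i))
    return np.sum(log_E * d) / np.sum(d ** 2)              # closed-form slope of ln E ~ alpha * d
```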
Comparison on synthetic benchmark datasets

We simulated N = 100 instances of 15 benchmark manifolds (M_i)[27, 11] and estimated their intrinsic dimensionality with each method. As a performance measure we used the Mean Percentage Error:

$$MPE = \frac{100}{MN} \sum_{j=1}^{M}\sum_{i=1}^{N} \frac{|D_j - \hat{d}_{ij}|}{D_j} \qquad (24)$$

where there are N realizations of the M types of manifolds, $D_j$ are the true dimension values, and $\hat{d}_{ij}$ are the dimension estimates.

Also, we used the error rate – the fraction of cases when the estimator did not find (missed) the true dimensionality – as an alternative metric.

We found that the corrected-mFSA estimator produced the second smallest MPE and the smallest error rate on the test datasets (Fig. 6).
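For reference, the two metrics can be computed as sketched below; D_true holds one true dimension per manifold, d_hat the estimates over realizations, and the rounding used for the error rate is our assumption about how a missed integer dimension is counted.

```python
# Sketch of the comparison metrics: Mean Percentage Error (Eq. 24) and error rate.
import numpy as np


def mean_percentage_error(D_true, d_hat):
    D_true = np.asarray(D_true, dtype=float)[:, None]      # shape (M, 1)
    return 100.0 * np.mean(np.abs(D_true - d_hat) / D_true)


def error_rate(D_true, d_hat):
    D_true = np.asarray(D_true)[:, None]
    return np.mean(np.round(d_hat) != D_true)               # fraction of missed true IDs
```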
Dimension estimation of interictal and epileptic dynamics

We used intracranial field potential data from two subdural grids positioned – parietofrontally and frontobasally – on the brain surface and from three strips located in the left and the right hippocampus and in the right temporal cortex, recorded as part of the presurgical protocol for a subject with drug-resistant epilepsy. This equipment recorded extracellular field potentials at 88 neural channels at a sampling rate of 2048 Hz. We read in – using the neo package[39] – selected 10-second-long chunks of the recordings from interictal periods (N = 16) and seizures (N = 18) for further analysis.

We standardised the data series and computed the Current Source Density (CSD) as the second spatial derivative of the recorded potential. We rescaled the 10-second-long signal chunks by subtracting the mean and dividing by the standard deviation. Then, we computed the CSD of the signals by applying the graph Laplacian operator on the time series. The Laplacian contains information about the topology of the electrode grids; to encode this topology, we used the von Neumann neighborhood in the adjacency matrix. After the CSD computation, we bandpass-filtered the CSD signals[40] (1-30 Hz, fourth-order Butterworth filter) to improve the signal-to-noise ratio.

We embedded the CSD signals and subsampled the embedded time series. We used an iterative manual procedure to optimize the embedding parameters (SFig. 2). Since the fastest oscillation in the signals is 30 Hz, a fixed value of one fourth period (2048/(4·30) ≈ 17 samples) was used as the embedding delay. We inspected the average space-time separation plots of the CSD signals to determine a proper subsampling (with embedding dimension D = 2; SFig. 2 A). We found that the first local maximum of the space-time separation was at around 5 ms: 9-10, 10-11 and 11-12 samples for the 1%, 25% and 50% percentile contour curves, respectively. Therefore, we divided the embedded time series into 10 subsets to ensure the required subsampling. Then, we embedded the CSD signal up to D = 12 and measured the intrinsic dimensionality for each embedding (SFig. 2 B, C). We found that intrinsic dimension estimates started to show saturation at D ≥ 3, therefore we chose D = 7 as a sufficiently high embedding dimension (averaged over k = 10-20 neighborhood sizes).

We measured the intrinsic dimensionality of the embedded CSD signals using the mFSA method during interictal and epileptic episodes (Fig. 7). We selected the neighborhood size between k = 10 and k = 20 and averaged the resulting estimates over the neighborhoods and subsampling realizations. We investigated the dimension values (Fig. 7 A, B) and differences (Fig. 7 C) in interictal and in epileptic periods.

We found characteristic changes in the pattern of intrinsic dimensions during seizures, which may help to localize the seizure onset zone.
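A minimal sketch of the embedding and subsampling step is given below. The delay_embed helper and the random stand-in signal are illustrative (the real pipeline operates on the filtered CSD channels and averages over k = 10-20), the lag of 17 samples and the 10 interleaved subsets follow the text, and fsa_local_estimates is the hypothetical helper sketched in the Introduction.

```python
# Sketch of time-delay embedding, subsampling and mFSA estimation for one channel.
import numpy as np


def delay_embed(x, dim, tau):
    """Embed a 1-D signal into dim-dimensional delay vectors with lag tau (in samples)."""
    n = len(x) - (dim - 1) * tau
    return np.stack([x[i * tau:i * tau + n] for i in range(dim)], axis=1)


x = np.random.randn(20480)                  # stand-in for one 10 s CSD channel at 2048 Hz
emb = delay_embed(x, dim=7, tau=17)         # embedding dimension and delay from the text
subsets = [emb[i::10] for i in range(10)]   # interleaved subsets break temporal correlations
d_est = np.mean([np.median(fsa_local_estimates(s, k=10)) for s in subsets])
```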
Acknowledgments

We are grateful to Ádám Zlatniczki for his comments on the manuscript.
Author contributions
Zsigmond Benkő performed the analytical and numerical calculations and wrote the manuscript.
Marcell Stippinger corrected the analytical calculations, wrote python code for the numerical calculations and corrected the manuscript.
Roberta Rehus carried out exploratory data analysis and proofreading.
Dániel Fabó, Boglárka Hajnal and Loránd Erőss recorded the EEG data, helped with the data analysis and contributed to the manuscript text.
Attila Bencze and András Telcs had a profound effect on the mFSA derivations and contributed to the manuscript.
Zoltán Somogyvári led the research, helped to interpret the results of the data analysis and contributed to the text.
Funding
The research reported in this paper was supported by the BME NC TKP2020 grant of NKFIH Hungary, by the BME-Artificial Intelligence FIKP grant of EMMI (BME FIKP-MI/SC), by the National Brain Research Program of Hungary (NAP-B, KTIA NAP 12-2-201), by the National Brain Project II, NRDIO Hungary, PATTERN Group, and by 2017-1.2.1-NKP-2017-00002 of NKFIH.
References

[1] Peter Grassberger and Itamar Procaccia. Measuring the strangeness of strange attractors. Physica D: Nonlinear Phenomena, 1983.
[2] M. E. Houle, E. Schubert, and A. Zimek. On the Correlation Between Local Intrinsic Dimensionality and Outlierness.
Lecture Notes in Computer Science (including subseries Lecture Notes inArtificial Intelligence and Lecture Notes in Bioinformatics) , 11223 LNCS:177–191, 2018.[3] Martin Dlask and Jaromir Kukal. Correlation Dimension Estimation from EEG Time Series forAlzheimer Disease Diagnostics. In
Proceedings of the International Conference on BioinformaticsResearch and Applications 2017 - ICBRA 2017 , pages 62–65. ACM Press, 2017.[4] G. E. Polychronaki, P. Y. Ktonas, S. Gatzonis, A. Siatouni, P. A. Asvestas, H. Tsekou, D. Sakas,and K. S. Nikita. Comparison of fractal dimension estimation algorithms for epileptic seizureonset detection.
Journal of Neural Engineering , 7(4), 2010.[5] Manish Sharma, Ram Bilas Pachori, and U. Rajendra Acharya. A new approach to characterizeepileptic seizures using analytic time-frequency flexible wavelet transform and fractal dimension.
Pattern Recognition Letters , 94:172–179, jul 2017.[6] U. Rajendra Acharya, S. Vinitha Sree, G. Swapna, Roshan Joy Martis, and Jasjit S. Suri. Auto-mated EEG analysis of epilepsy: A review.
Knowledge-Based Systems , 45:147–165, jun 2013.[7] Mahito Sugiyama and Karsten M. Borgwardt. Measuring statistical dependence via the mutualinformation dimension.
IJCAI International Joint Conference on Artificial Intelligence , pages1692–1698, 2013.[8] Simone Romano, Oussama Chelly, Vinh Nguyen, James Bailey, and Michael E. Houle. Measur-ing dependency via intrinsic dimensionality. In , number 4, pages 1207–1212. IEEE, dec 2016.[9] Zsigmond Benk˝o, ´Ad´am Zlatniczki, Marcell Stippinger, D´aniel Fab´o, Andr´as S´olyom, Lor´andEr˝oss, Andr´as Telcs, and Zolt´an Somogyv´ari. Complete Inference of Causal Relations betweenDynamical Systems. arXiv , aug 2018.[10] Anna Krakovsk´a. Correlation dimension detects causal links in coupled dynamical systems.
En-tropy , 21(9), 2019.[11] P Campadelli, E Casiraghi, C Ceruti, and A Rozza. Intrinsic Dimension Estimation: RelevantTechniques and a Benchmark Framework.
Mathematical Problems in Engineering , 2015:1–21,2015.[12] Elizaveta Levina and Peter J. Bickel. Maximum likelihood estimation of intrinsic dimension.In
Advances in Neural Information Processing Systems . Neural information processing systemsfoundation, 2005.[13] Zoubin Ghahramani and David Mckay. Comments on ’Maximum likelihood estimation of intrinsicdimension’, 2005.[14] Mithun Das Gupta and Thomas S. Huang. Regularized maximum likelihood for intrinsic dimen-sion estimation.
Uai , 1(1), 2010.[15] A Rozza, G Lombardi, C Ceruti, E Casiraghi, and P Campadelli. Novel high intrinsic dimension-ality estimators.
Machine Learning , 89(1-2):37–65, 2012.[16] S Bassis, A Rozza, C Ceruti, G Lombardi, E Casiraghi, and P Campadelli. A Novel IntrinsicDimensionality Estimator Based on Rank-Order Statistics. In Francesco Masulli, Alfredo Pet-rosino, and Stefano Rovetta, editors,
Clustering High–Dimensional Data , pages 102–117, Berlin,Heidelberg, 2015. Springer Berlin Heidelberg.[17] Claudio Ceruti, Simone Bassis, Alessandro Rozza, Gabriele Lombardi, Elena Casiraghi, and PaolaCampadelli. DANCo: An intrinsic dimensionality estimator exploiting angle and norm concen-tration.
Pattern Recognition, 47(8):2569–2581, 2014.
[18] Elena Facco, Maria d'Errico, Alex Rodriguez, and Alessandro Laio. Estimating the intrinsic dimension of datasets by a minimal neighborhood information.
Scientific Reports , (August):1–8,2017.[19] Francesco Camastra and Alessandro Vinciarelli. Estimating the intrinsic dimension of datawith a fractal-based method.
IEEE Transactions on Pattern Analysis and Machine Intelligence ,24(10):1404–1407, oct 2002.[20] Kerstin Johnsson, Charlotte Soneson, and Magnus Fontes. Low Bias Local Intrinsic Dimen-sion Estimation from Expected Simplex Skewness.
IEEE Transactions on Pattern Analysis andMachine Intelligence , 37(1):196–202, jan 2015.[21] Oussama Chelly, Michael E. Houle, and Ken Ichi Kawarabayashi. Enhanced estimation of localIntrinsic Dimensionality using auxiliary distances.
NII Technical Reports , (7), 2016.[22] Laurent Amsaleg, Oussama Chelly, Teddy Furon, St´ephane Girard, Michael E Houle, Ken-ichiKawarabayashi, and Michael Nett. Estimating Local Intrinsic Dimensionality. In
Proceedings ofthe 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining -KDD ’15 , number Cd, pages 29–38, New York, New York, USA, 2015. ACM Press.[23] Laurent Amsaleg, Oussama Chelly, Teddy Furon, St´ephane Girard, Michael E. Houle, Ken-ichiKawarabayashi, and Michael Nett. Extreme-value-theoretic estimation of local intrinsic dimen-sionality.
Data Mining and Knowledge Discovery , 32(6):1768–1805, nov 2018.[24] Laurent Amsaleg, Oussama Chelly, Michael E. Houle, Ken-ichi Kawarabayashi, MiloˇsRadovanovi´c, and Weeris Treeratanajaru. Intrinsic Dimensionality Estimation within Tight Local-ities. In
Proceedings of the 2019 SIAM International Conference on Data Mining , pages 181–189.Society for Industrial and Applied Mathematics, may 2019.[25] Amir Massoud Farahmand, Csaba Szepesv´ari, and Jean-Yves Audibert. Manifold-adaptive di-mension estimation. In
Proceedings of the 24th international conference on Machine learning -ICML ’07 , pages 265–272. ACM Press, 2007.[26] Pierre Simon Laplace. Memoir on the Probability of the Causes of Events.
Statistical Science ,1(3):364–378, aug 1986.[27] Matthias Hein and Jean-Yves Audibert. Intrinsic dimensionality estimation of submanifolds inR d. In
Proceedings of the 22nd international conference on Machine learning - ICML ’05 , pages289–296. ACM Press, 2005.[28] Satohiro Tajima, Toru Yanagawa, Naotaka Fujii, and Taro Toyoizumi. Untangling Brain-WideDynamics in Consciousness by Cross-Embedding.
PLOS Computational Biology , 11(11):e1004537,nov 2015.[29] Florian Mormann, Klaus Lehnertz, Peter David, and Christian E. Elger. Mean phase coherence asa measure for phase synchronization and its application to the EEG of epilepsy patients.
PhysicaD: Nonlinear Phenomena , 144(3):358–369, 2000.[30] E.T. Bullmore, M.J. Brammer, P. Bourlon, G. Alarcon, C.E. Polkey, R. Elwes, and C.D. Binnie.Fractal analysis of electroencephalographic signals intracerebrally recorded during 35 epilepticseizures: evaluation of a new method for synoptic visualisation of ictal events.
Electroencephalog-raphy and Clinical Neurophysiology , 91(5):337–345, nov 1994.[31] Niina P¨aivinen, Seppo Lammi, Asla Pitk¨anen, Jari Nissinen, Markku Penttonen, and TapioGr¨onfors. Epileptic seizure detection: A nonlinear viewpoint.
Computer Methods and Programs in Biomedicine, 79(2):151–159, aug 2005.
[32] R. Esteller, G. Vachtsevanos, J. Echauz, T. Henry, P. Pennell, C. Epstein, R. Bakay, C. Bowen, and B. Litt. Fractal dimension characterizes seizure onset in epileptic patients. In , pages 2343–2346 vol.4. IEEE, 1999.
[33] Guido Van Rossum and Fred L. Drake.
Python 3 Reference Manual . CreateSpace, Scotts Valley,CA, 2009.[34] Travis E Oliphant.
A guide to NumPy, volume 1. Trelgol Publishing USA, 2006.
[35] Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, CJ Carey, İlhan Polat, Yu Feng, Eric W. Moore, Jake VanderPlas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen, E. A. Quintero, Charles R Harris, Anne M. Archibald, Antônio H. Ribeiro, Fabian Pedregosa, Paul van Mulbregt, and SciPy 1.0 Contributors. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python.
Nature Methods , 17:261–272, 2020.[36] John D Hunter. Matplotlib: A 2d graphics environment.
Computing in science & engineering ,9(3):90–95, 2007.[37] MATLAB.
MATLAB version 9.8.0.1396136 (R2020a) . The Mathworks, Inc., Natick, Mas-sachusetts, 2020.[38] Gabriele Lombardi. Intrinsic dimensionality estimation techniques.
MATLAB Central File Ex-change , Retrieved July 16, 2020.[39] S. Garcia, D. Guarino, F. Jaillet, T.R. Jennings, R. Pr¨opper, P.L. Rautenberg, C. Rodgers,A. Sobolev, T. Wachtler, P. Yger, and A.P. Davison. Neo: an object model for handling electro-physiology data in multiple formats.
Frontiers in Neuroinformatics , 8:10, February 2014.[40] Alexandre Gramfort, Martin Luessi, Eric Larson, Denis Engemann, Daniel Strohmeier, ChristianBrodbeck, Roman Goj, Mainak Jas, Teon Brooks, Lauri Parkkonen, and Matti H¨am¨al¨ainen.MEG and EEG data analysis with MNE-Python.
Frontiers in Neuroscience, 7:267, 2013.

Supplemental information

A Calculations for normalized distances
A.1 Distance density of the k-th nearest neighbor

Let us take K − 1 points sampled randomly in a D-sphere, and choose one of them at distance r from the center. This situation simulates a K-neighborhood, with the normalized distances of K − 1 points measured from the center. The probability that r was the kth distance from the center is

$$P(k \mid r, K, D) = \binom{K-2}{k-1} r^{D(k-1)} \left(1 - r^D\right)^{K-k-1} \qquad (S.1)$$

where r can take values from the [0, 1] interval.

Moreover, the probability density that there is a point at radius r is given by the following derivation formula:

$$p(r \mid D) = D r^{D-1} \qquad (S.2)$$

If the sampling process is independent, the pdf that one of the points is at radius r is

$$p(r \mid K-1, D) = \sum_{j=1}^{n}\frac{1}{n} \underbrace{\int\!\mathrm{d}r_1\cdots\int\!\mathrm{d}r_i\cdots\int\!\mathrm{d}r_n}_{i\neq j}\, p(r_1, r_2, \ldots, r_j = r, \ldots, r_n \mid D) = \frac{1}{n}\sum_{j=1}^{n} D r_j^{D-1} = D r^{D-1} \qquad (S.3)$$

This is the prior pdf of the distance; we assume uniform density in the n-sphere. This prior could be any density; we chose this specific form with respect to the maximum entropy principle and also for practical reasons.

From the previous two formulas, we can write up the joint mixed probability function:

$$p(k, r \mid K-1, D) = D \binom{K-2}{k-1} r^{Dk-1} \left(1 - r^D\right)^{K-k-1} \qquad (S.4)$$

Also:

$$p(k \mid K-1, D) = \frac{1}{K-1} \qquad (S.5)$$

Bayes' theorem then gives the posterior distance density of the kth neighbor:

$$p(r \mid k, K-1, D) = \frac{P(k \mid r, K-1, D)\, p(r \mid K-1, D)}{p(k \mid K-1, D)} \qquad (S.6)$$
$$= (K-1)\, D \binom{K-2}{k-1} r^{Dk-1} \left(1 - r^D\right)^{K-k-1} \qquad (S.7)$$
$$= \frac{D}{B(k, K-k)}\, r^{Dk-1} \left(1 - r^D\right)^{K-k-1} \qquad (S.8)$$

where B is the beta function.

A.2 Maximum Likelihood estimation of intrinsic dimension

Given a dataset, we can use the Maximum Likelihood principle to estimate the intrinsic dimensionality using Eq. S.8. The dataset consists of the K − 1 normalized neighbor distances within a d-dimensional sphere. First, we have to express the likelihood function:

$$L(D \mid X) = p(r_1, \ldots, r_{K-1} \mid D) \qquad (S.9)$$

This expression can be factorized into a chain, because $p(r_k \mid r_{k+1}, r_{k+2}, \ldots, r_{K-1}) = p(r_k \mid r_{k+1})$, which is a Markov property of the neighbor distances:

$$L(D \mid X) = p(r_1, \ldots, r_{K-1} \mid D) = \prod_{k=1}^{K-1} p(r_k \mid r_{k+1}, D) \qquad (S.10)$$

where $r_K = 1$ and

$$p(r_k \mid r_{k+1}, D) = kD \left(\frac{r_k}{r_{k+1}}\right)^{kD-1} \frac{1}{r_{k+1}} \qquad (S.11)$$

So if we substitute back into the previous expression:

$$L(D \mid X) = \prod_{k=1}^{K-1} p(r_k \mid r_{k+1}, D) = (K-1)!\, D^{K-1} \left(\prod_{k=1}^{K-1} r_k\right)^{D-1} \qquad (S.12)$$

The log-likelihood is

$$\log L(D \mid X) = \left(\sum_{k=1}^{K-1} \log k\right) + (K-1)\log D + (D-1)\sum_{k=1}^{K-1} \log r_k \qquad (S.13)$$

We seek the extremum of the likelihood function:

$$\frac{\partial \log L(D \mid X)}{\partial D} \overset{!}{=} 0 \quad\Rightarrow\quad \frac{K-1}{D} + \sum_{k=1}^{K-1} \log r_k \overset{!}{=} 0 \qquad (S.14)$$

$$d_{\mathrm{ML}} = \frac{K-1}{-\sum_{k=1}^{K-1} \log r_k} \qquad (S.15)$$

This latter formula is basically equivalent to the local Levina-Bickel ML intrinsic dimension estimator if $r_k = R_k / R_K$.

B Derivation of the pdf of the FSA estimator
The starting point of our derivation is the posterior density of r computed in Section A.1:

$$p(r \mid k, K-1, D) = \frac{D}{B(k, K-k)}\, r^{Dk-1} \left(1 - r^D\right)^{K-k-1} \qquad (S.16)$$

We fill in K = 2k in the previous expression:

$$p(r \mid k, 2k-1, D) = \frac{D}{B(k, k)}\, r^{Dk-1} \left(1 - r^D\right)^{k-1} \qquad (S.17)$$

The pdf of d can be expressed from the pdf of r with a simple integral transform:

$$p(r \mid k, 2k-1, D)\,\mathrm{d}r = q(d)\,\mathrm{d}d \qquad (S.18)$$

so

$$q(d) = p(r \mid k, 2k-1, D) \left|\frac{\mathrm{d}r}{\mathrm{d}d}\right| \qquad (S.19)$$

To compute the above expression, we first express r as a function of d, then we compute the derivative. Afterwards we put the parts together:

$$d = -\frac{\log 2}{\log r} \;\Rightarrow\; r = \exp\left(-\frac{\log 2}{d}\right) \;\Rightarrow\; \frac{\mathrm{d}r}{\mathrm{d}d} = \exp\left(-\frac{\log 2}{d}\right)\frac{\log 2}{d^2} \qquad (S.20)$$

And finally, we put these parts together to get the pdf of the FSA estimator (Fig. 1):

$$q(d \mid k, D) = \frac{D}{B(k,k)}\, e^{-\frac{\log 2}{d}(Dk-1)} \left(1 - e^{-\frac{\log 2}{d}D}\right)^{k-1} e^{-\frac{\log 2}{d}}\,\frac{\log 2}{d^2} = \frac{D\log(2)}{B(k,k)}\, \frac{2^{-Dk/d}\left(1 - 2^{-D/d}\right)^{k-1}}{d^2} \qquad (S.21)$$

where B(k, k) is the Euler beta function.

C Supplemental Figures and tables
STable 1: Used symbols with interpretation.

k - the order of the neighbor (increasing order as the distance from the center rises)
K − 1 - the number of points in the neighborhood
R - distance from the center
r - normalized distance from the center, r = R/R_K (r ∈ [0, 1])
η - local density-dependent factor, approximately independent of R
D - intrinsic dimensionality of the space where the points are
d - estimated intrinsic dimension value
P - probability, probability mass function
p or q - probability density function (pdf)
n - sample size
N - number of realizations
B - Euler beta function
SFig. 1: Calibration procedure for the n = 2500 datasets up to D = 80 (k = 5). The figure shows the calibration procedure on 100 instances of uniformly sampled hypercubes. A Dimension estimates as a function of intrinsic dimensionality for the calibration hypercubes. The diagonal (dashed) is the ideal value; however, the mFSA estimates (blue) show saturation because of finite-sample and edge effects. cmFSA estimates (red) are also shown, with their mean (yellow) almost aligned with the diagonal. B The relative error (E) as a function of the uncorrected mFSA dimension on a semilogarithmic scale. The error-mFSA pairs (blue) lie on a short stripe for each intrinsic dimension value. The subplot also shows the id-wise average points (yellow) and the polynomial fitting curve (red). C The error of cmFSA estimates as a function of intrinsic dimension on the calibration datasets. The mean error (blue line) oscillates around zero and the 99.7% confidence interval (blue dashed) widens as ID grows. The rounding switchpoints are also shown. D The probability that cmFSA hits the real ID of the data, or misses by one, two or more, as a function of ID on the calibration dataset.
SFig. 2: Subsampling and embedding of the CSD signals. A Mean space-time separation plot of the CSD recordings; the lines show the contours of the 1% (blue), 25% (orange) and 50% (green) percentiles for the 34 (16 interictal and 18 seizure) recordings (thin lines) and their average (thick line, D = 2). The first local maximum is at around 5 ms (10 time steps), which appoints the proper subsampling to avoid the effect of temporal correlations during the dimension estimation. B Intrinsic dimension as a function of the embedding dimension for the 88 recording channels (averaged between k = 5-10, for the first seizure). Dimension estimates deviate from the diagonal above D = 3, thus we chose the embedding dimension accordingly (D = 7). C Intrinsic dimension as a function of neighborhood size for various embedding dimensions (88 channels, for the first seizure). The dimension estimates are settled at neighborhood sizes between k = 10-20 (dashed blue). The knee caused by the autocorrelation becomes pronounced for