Simplicial persistence of financial markets: filtering, generative processes and portfolio risk

Abstract

We introduce simplicial persistence, a measure of time evolution of network motifs in subsequent temporal layers. We observe long memory in the evolution of structures from correlation filtering, with a two regime power law decay in the number of persistent simplicial complexes. Null models of the underlying time series are tested to investigate properties of the generative process and its evolutional constraints. Networks are generated with both TMFG filtering technique and thresholding showing that embedding-based filtering methods (TMFG) are able to identify higher order structures throughout the market sample, where thresholding methods fail. The decay exponents of these long memory processes are used to characterise financial markets based on their stage of development and liquidity. We find that more liquid markets tend to have a slower persistence decay. This is in contrast with the common understanding that developed markets are more random. We find that they are indeed less predictable for what concerns the dynamics of each single variable but they are more predictable for what concerns the collective evolution of the variables. This could imply higher fragility to systemic shocks.


Simplicial persistence of financial markets: filtering, generative processes and portfolio risk

Jeremy D. Turiel a,1, Paolo Barucca a, and Tomaso Aste a,b
a Department of Computer Science, UCL, Gower Street, WC1E 6BT London, UK; b Systemic Risk Centre, London School of Economics and Political Sciences, London, United Kingdom
Corresponding author. E-mail: [email protected]
21, 2020

Keywords: network theory | topological filtering | TMFG | long memory | complex systems | time series analysis | financial networks

1. Introduction

Networks representing the structure of the interactions within complex systems have been increasingly studied in the last few decades (1). Applications range from biological networks to social networks, infrastructures and finance (2). In finance, mainly due to the abundance of time-series data regarding economic entities and the lack of data on direct relationships between them, there has been an extensive focus on the estimation of pairwise interactions from pair correlations (3) of the stochastic time series characterising financial markets. The need to extract significant links from noisy correlation matrices has triggered the development of filtering techniques which yield sparse network structures (4–7) based on a limited set of statistical or topological hypotheses. There are three main approaches to network filtering: thresholding, statistical validation, and topological filtering. These methods show that a meaningful and consistent taxonomy of financial assets emerges from sparse network structures, in particular when applying topological methods. Thresholding methods remove edges which are less significant based on their strength (or its absolute value); quantile thresholding is one of these methods and is the one we use in this paper. This method considers the distribution of edge strengths and removes edges with strength below a certain quantile level. It is often applied to financial correlation matrices due to its lack of assumptions on the underlying distribution. Statistical validation, which constitutes a generalisation of simpler thresholding methods, has been used to establish the significance of edges in correlation matrices, with applications to economics and finance as well as other fields (8–11). Statistical validation can be implemented by comparing empirical networks with random networks from constrained randomisations, which generate weighted ensembles of null models and allow one to quantify the significance of observed realisations with respect to the ensemble statistics of the constrained null model. Topological filtering through the Minimum Spanning Tree (MST) technique was initially suggested by Mantegna (3), and was further extended to planar graphs with the Planar Maximally Filtered Graph (PMFG) (12) and more recently to chordal graphs with predefined motif structure, such as the Triangulated Maximally Filtered Graph (TMFG) in (13) and the Maximally Filtered Clique Forest (MFCF) in (14).

Market efficiency imposes the absence of temporal memory in log returns, but long memory in higher-order moments of returns and long-term dependence (autocorrelation) of absolute and squared returns have been observed and are now considered among the important stylised facts in markets (15), e.g. volatility clustering, a form of regime switching in the fluctuations observed in financial markets. In (16, 17), later extended in (18), it was shown that order signs obey a long memory process, balanced by anti-correlated volumes which guarantee market efficiency. In financial time series analysis, through the generalised Hurst exponent analysis, it was demonstrated that memory effects are related to the stage of maturity of the market, with more mature markets being more random (19).

With the present paper we provide the missing piece, connecting market structure and market memory by analysing the autocorrelation of market structures (20), through the persistence of the filtered correlation matrix (21). We analyse a range of null models (22), corresponding to a range of parsimonious assumptions on the underlying generative processes, for groups of time series. We compare topological and thresholding network filtering approaches on both null model-based time-series ensembles and real data to test the long memory properties of multivariate financial time series. Each null model preserves different aspects of the time series, allowing us to validate hypotheses about the long memory of market structures by ranking persistence decays of real time series against null models. We show how the persistences of edges and motifs (such as triangles and tetrahedra) decay for these models in TMFG-filtered graphs and in graphs obtained by filtering correlation matrices through quantile thresholding. We show that topological filtering is a better suited tool to identify persistently correlated groups of securities throughout the market. We compare TMFG with quantile thresholding of the correlation matrix at fixed network density level, observing that quantile thresholding yields analogous results to planar filtering for edge persistence, but it fails to identify motifs distributed throughout the market sample, generating instead highly localised and clustered structures. Further, we demonstrate that our findings have a practical application by introducing an unsupervised technique to identify groups of stocks which share strong fundamental price drivers. This technique can be of particular use in less traded markets, where identifying structures with shared fundamental price drivers might otherwise require in-depth knowledge of the companies.

The rest of the paper is structured as follows. Section 2 describes the data, methods and definitions used for this work, Section 3 outlines the main findings, Section 4 discusses these findings and Section 5 concludes the work with suggestions for future work.

2. Materials and methods

A. Data.

We select the 100 most capitalised stocks from each of four stock markets: the NYSE and the Italian, German and Israeli markets (400 stocks in total). Markets range from highly liquid and more developed ones such as the New York Stock Exchange and the Frankfurt Stock Exchange to less liquid markets such as the Italian Stock Exchange and the Tel Aviv Stock Exchange.

We investigate daily closing price data from Bloomberg for:
• New York Stock Exchange (3/01/2014 - 31/12/2018);
• Frankfurt Stock Exchange (3/01/2014 - 28/12/2018);
• Borsa Italiana (Italian Stock Exchange) (3/01/2014 - 28/12/2018);
• Tel Aviv Stock Exchange (5/01/2014 - 1/1/2019).

The data include 1258 daily price observations for the NYSE, 1272 for FSE and BI, and 1225 for TASE.

B. Time series null models.

We generate ensembles of null models which preserve an increasing number of properties of the real time series.
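As a rough illustration of the first two generators described in the paragraphs that follow (random return shuffling and the rolling univariate Gaussian), a minimal sketch is given here. It is not the authors' code: the function names, the toy return series and the window length of 126 observations are assumptions made only for this example, and the sample standard deviation is used for the rolling estimate.

```python
import random
import statistics


def shuffle_returns(returns, rng):
    """Random return shuffling: permute each series independently along time."""
    shuffled = []
    for series in returns:
        permuted = list(series)
        rng.shuffle(permuted)  # keeps the values, destroys temporal and cross structure
        shuffled.append(permuted)
    return shuffled


def rolling_univariate_gaussian(series, window, rng):
    """Draw r_t ~ N(mu, sigma) with mu and sigma estimated on the trailing window ending at t."""
    samples = []
    for t in range(window - 1, len(series)):
        chunk = series[t - window + 1 : t + 1]
        mu = statistics.mean(chunk)
        sigma = statistics.stdev(chunk)  # sample standard deviation (an assumption)
        samples.append(rng.gauss(mu, sigma))
    return samples


rng = random.Random(0)
# Toy log-returns for two hypothetical assets (illustrative values only).
returns = [[rng.gauss(0.0, 0.01) for _ in range(300)] for _ in range(2)]
shuffled = shuffle_returns(returns, rng)
synthetic = rolling_univariate_gaussian(returns[0], window=126, rng=rng)
```

Each call produces one realisation; an ensemble would be obtained by repeating the call with the same inputs.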

Random return shuffling

Individual stock log-return ($r_t = \log \mathrm{Price}_t - \log \mathrm{Price}_{t-1}$) time series are randomly shuffled, i.e. a random permutation along the time dimension of each variable is applied, to obtain a null model for noise and spurious correlations. This model maintains the overall statistics of the values of each time series but eliminates any correlation structure.

Rolling univariate Gaussian generator

We calculate the rolling mean $\mu_{t-\delta t,\,t}$ and standard deviation $\sigma_{t-\delta t,\,t}$ of the log-return series for each security separately. We then generate ensembles by sampling the return $r_t$ at each point in time from the (rolling) univariate Gaussian distributions with sample mean and standard deviation, $r_t \sim \mathcal{N}(\mu_{t-\delta t,\,t}, \sigma_{t-\delta t,\,t})$, with $\mathcal{N}(\mu, \sigma)$ being a normal distribution with mean $\mu$ and standard deviation $\sigma$. This is intended to simulate the process as a simple moving average with uncorrelated time-varying Gaussian random noise.

Stable multivariate Gaussian generator

We calculate the mean $\boldsymbol{\mu}$ (for each security) and covariance matrix $\Sigma$ throughout the whole length of the log-return time series. We then generate ensembles by sampling the vector of returns $\mathbf{r}_t$ at each point in time for all securities from the fixed multivariate Gaussian with empirical means and covariance matrix, $\mathbf{r}_t \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma)$. This is intended to represent an underlying fixed market structure with sampling noise.

Rolling multivariate Gaussian generator

After obtaining the log-return time series, we calculate the rolling mean $\boldsymbol{\mu}_{[t-\delta t,\,t]}$ (for each security) and covariance matrix $\Sigma_{[t-\delta t,\,t]}$ between the series. We generate ensembles by sampling the return at each point in time $\mathbf{r}_t$ for all securities from the (rolling) multivariate Gaussian distributions with sample means and covariance matrices, $\mathbf{r}_t \sim \mathcal{N}(\boldsymbol{\mu}_{[t-\delta t,\,t]}, \Sigma_{[t-\delta t,\,t]})$. This is intended to detect the changing market structure and simulate the process as being generated by a multivariate Gaussian distribution with time-varying constraints on structural relations.

C. Correlation matrix estimation.

We then compute correlation matrices for the time series with exponential smoothing from rolling windows of δ = 126 trading days with a smoothing factor of θ = 46 days. This is done for all realisations of each null model ensemble and for the real data.

Correlations are noisy measures of co-movements of financial asset prices, which are often non-stationary within the observation window. Longer time windows benefit the measure's stability, as we have more observations to estimate the $N(N-1)/2$ distinct correlations between $N$ assets. However, a longer observation window can come with the disadvantage of weighting more and less recent co-movements equally, with the risk of averaging over a period in which the values are non-stationary. In order to compensate for this effect, we apply the exponential smoothing method for Kendall correlations (23). This allows for more stable correlations, as the method applies an exponential weighting to the correlation window, prioritising more recently observed co-movements.

D. Filtering: quantile thresholding and TMFG.

We apply two filtering techniques with fundamental differences. The first filtering method is quantile thresholding, which corresponds to hard thresholding to generate an adjacency matrix through the binarisation of individual correlations. For a correlation value $v_q$ corresponding to the quantile level $q$ of the matrix values, the adjacency matrix is defined as

$$A_{i,j} = \begin{cases} 1, & \rho_{i,j} \geq v_q, \\ 0, & \rho_{i,j} < v_q. \end{cases}$$

This filtering technique is entirely value-based with no structural or other constraints. We apply it by providing a quantile level $q$ which yields edge sparsity analogous to that of the corresponding TMFG filter.

The second filtering technique is the TMFG method (13). This topological filtering technique embeds the matrix with topological constraints on planarity in a graph composed of simplicial triangular and tetrahedral cliques. Edges are added in a constrained fashion with priority according to their (absolute) value. The graph essentially corresponds to tiling a surface of genus 0. This technique represents a filtering method that accounts for values, but also imposes an underlying chordal structural form which might help regularise the filtered graph, also for probabilistic modeling (24). Furthermore, this technique imposes higher order structures, namely triangles and tetrahedra, which are known to be a feature of financial markets and social networks.

E. Simplicial persistence.

We focus on temporal persistence of tetrahedral and triangular simplicial complexes (motifs) in the TMFGs and in graphs filtered via quantile thresholding, constructed from correlations over rolling windows. TMFG networks can be viewed as trees of tetrahedral (maximal) cliques connected by triangular faces; these are triangular cliques with a different meaning in the taxonomy, called separators. If removed, separators split the graph into two parts. Not all triangular faces of the tetrahedral cliques are separators and we will refer to those which are not as triangles.

This distinction is discarded for the results in Section B in order to account for all triangles in the filtered graph, as quantile thresholding does not distinguish between triangular faces and separators.

A motif corresponding to clique $X_c$ is considered soft-persistent at time $t + \tau$ if and only if the motif is present at both the initial time $t$ and at $t + \tau$. A visual intuition for motif (triangle) persistence through time is provided in Figure 1.

Fig. 1.

Motif persistence visualisation.

Visual representation of a TMFG structure's motif (triangle) persistence in time. The green triangle in figure a) is persistent through figure c), while the other two triangles (present in figure a) within the red triangle) do not persist due to the rewiring of an edge. Figure b) shows one of the non-persistent triangles with dashed contour. The rewired edge is also dashed. This visualisation aims at showing the impact of edge rewiring on motif persistence and the difference between edge and motif persistence.

We investigate the decay in the number of persistent motifs between filtered correlation networks with observation windows progressively shifted by one trading day and we quantify how the average persistence decays with the time shift $\tau$.

Here we use a form of soft persistence which is different from the hard persistence (survival) of motifs which is more common in the literature (25, 26). Specifically, the average motif persistence in the plateau regime is defined as

$$\langle P_m(X_c) \rangle_{T,\mathcal{T}} = \frac{1}{T \cdot (\mathcal{T} - \tau_{plat})} \cdot \sum_{t=0}^{T} \sum_{\tau=\tau_{plat}}^{\mathcal{T}} P_m(X_c^{t,t+\tau}),$$ [1]

where $\tau_{plat}$ denotes the transition point to the plateau region. The average persistence for the entire clique set over $T$ starting points at time shift $\tau$ is defined as

$$\langle P_m(X^{\tau}) \rangle_{T,C} = \frac{1}{T \cdot |C|} \cdot \sum_{t=0}^{T} \sum_{c \in C} P_m(X_c^{t,t+\tau}),$$ [2]

where, considering the motif sets $X_C^{t} = \{X_i^{t}\}_{i=1,\ldots,C}$ and $X_C^{t+\tau} = \{X_i^{t+\tau}\}_{i=1,\ldots,C}$, the binary persistence value of motif $c \in C$ at time $t$ and $t + \tau$ is

$$P_m(X_c^{t,t+\tau}) = (X_c \in X_C^{t}) \wedge (X_c \in X_C^{t+\tau}).$$ [3]

We obtain the power law fit for the decay law and identify two regimes: one with a faster decay followed by one with a slower decay. The transition point $\tau_{plat}$ is computed by minimising the unweighted average mean squared error (MSE) of the two fits over all possible transition points in time.

We also compare the decay exponents for multiple random stock selections over different markets to identify whether the steepness of motif decay (edge, closed triad or tetrahedron clique) is indicative of market stability/development stage. We further investigate more liquid markets such as the NYSE from both a quantitative and qualitative point of view. We classify motifs in the plateau by their soft persistence and study the sector structure of the most persistent motifs.

In order to further justify the analysis of motifs over individual edges, we test the null hypothesis that motifs are formed by edges in the network whose existence is not mutually dependent. The assumption would imply that coexistence of edges in motifs is not statistically significant and that motif structures have no extra persistence beyond the individual edges that form them. The hypothesis being tested implies that motif persistence is simply the result of the persistence characterising their component edges:

$$P_m(\chi_c^{t,t+\tau}) = P_m(\chi_{c_1}^{t,t+\tau}) \cdot P_m(\chi_{c_2}^{t,t+\tau}) \cdot P_m(\chi_{c_3}^{t,t+\tau}),$$ [4]

where the motif and its edges are defined as $\chi_c^{t,t+\tau} = \{\chi_{c_1}^{t,t+\tau}, \chi_{c_2}^{t,t+\tau}, \chi_{c_3}^{t,t+\tau}\}$.

In order to provide an application to systemic risk, we construct a portfolio containing all stocks in the ten most persistent motifs in the plateau region, as defined in Equation 1 (for each market). We then compare its volatility with that of random portfolios with the same number of assets.
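The soft persistence of Equation 3 and the independent-edge baseline of Equation 4 can be sketched on a toy sequence of network layers as follows. This is only an illustration under stated assumptions: the layers are hypothetical sets of triangles (not TMFG output), the function names are invented for this example, and persistence is averaged over all available starting layers.

```python
from itertools import combinations


def soft_persistence(item, layers, tau):
    """Fraction of starting layers t in which the item is present at both t and t + tau."""
    starts = range(len(layers) - tau)
    hits = sum(1 for t in starts if item in layers[t] and item in layers[t + tau])
    return hits / len(starts)


def edges_of(triangle):
    """The three edges of a triangle, each as a frozenset of two nodes."""
    return [frozenset(pair) for pair in combinations(sorted(triangle), 2)]


def independent_edge_baseline(triangle, edge_layers, tau):
    """Right-hand side of Eq. 4: product of the persistences of the component edges."""
    product = 1.0
    for edge in edges_of(triangle):
        product *= soft_persistence(edge, edge_layers, tau)
    return product


def tri(*nodes):
    return frozenset(nodes)


# Three hypothetical temporal layers, each a set of triangular motifs.
layers = [
    {tri("a", "b", "c"), tri("b", "c", "d")},
    {tri("a", "b", "c"), tri("b", "c", "d")},
    {tri("a", "b", "c"), tri("c", "d", "e")},
]
# Edge layers induced by the triangles of each layer.
edge_layers = [{e for motif in layer for e in edges_of(motif)} for layer in layers]

p_abc = soft_persistence(tri("a", "b", "c"), layers, tau=1)
p_bcd = soft_persistence(tri("b", "c", "d"), layers, tau=1)
baseline_bcd = independent_edge_baseline(tri("b", "c", "d"), edge_layers, tau=1)
```

On real data, a motif persistence consistently above the baseline would speak against the independent-edge hypothesis; in this tiny example the two coincide for the triangle that is rewired.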

3. Results

The main findings of this work are described in this section, starting with an overview of results on the long memory of edges and simplicial complexes in TMFG-filtered correlation networks. The section continues with an analysis of null models of financial market structures, described in Section B, and a comparison with real data to gain insights about the generative process of the stochastic structure. We then suggest how soft persistence captures the underlying change in market structure by relating its decay exponent to the stage of development (a proxy for stability) or average traded volume in the market (a proxy for liquidity which yields well-defined stable structures). We conclude the section with results on systemic risk applications to financial portfolios, where we show that the most persistent motifs correspond to stocks in the same sector and demonstrate how the portfolio of the 10 most persistent motifs is highly volatile and systemic.

A. Long-term memory of motif structures.

The plot in Figure (2) shows the power law decay (evident from the linear trend in log-log scale) in $\langle P_m(X^{\tau}) \rangle_{T=200,C}$ vs. $\tau$, followed by a plateau region that also decays as a power law, but with a smaller exponent. We also observe that all motif decays have $\tau_{plat} \in [\delta t_{window}/\ldots,\ \delta t_{window}]$, where $\delta t_{window}$ represents the length of the estimation window of the correlation matrix. The window used has $\delta t_{window} = 126$ trading days and a value of $\theta = 46$ for exponential smoothing, as per (23). The choice of $\delta t_{window}$ corresponds to roughly 6 months of trading and satisfies $N < \delta t_{window}$, with $N$ the number of assets in the correlation matrix. The correlation matrix is hence well-conditioned and invertible. On the other hand, the exponential smoothing with $\theta = 46$ mainly considers recent observations from the latest few months.

There are N − N −

Fig. 2.

Persistence Decay.

Decay of triangular clique faces, separators and clique motifs persistence for 100 NYSE stocks, as a function of time interval δt = [0, …] (average over 200 starting points). The two power-law regimes are identified by the minimum MSE sum of the fits.

In Figure (2) we notice that the minimum MSE for the two linear fits is achieved at the transition point between the decay phase and the plateau. The transition point $\tau_{plat}$ can therefore be identified by minimising a standard fit measure with two phases, which strengthens the unsupervised nature of our method. The method for the minimum MSE search is described in Section E.

B. Null models of persistence in filtered structures.

We report results for the edge and motif (triangle) persistence for real data as well as for the null models described in Section B. We compare real data with null models and TMFG filtering with quantile thresholding.

Figure 3 shows the decay in edge persistence for both filtering methods. We notice that the random shuffling null model lies at the bottom, as it should produce completely random structures with little residual persistence due to probabilistic combinatorics and structural filtering constraints in the TMFG. This shows that persistence is not an artifact of any of the filtering techniques used and not a mere result of return volatility of individual assets (which is preserved by return shuffling). From Figure 3 we also notice that the rolling univariate Gaussian model lies just above, as it does not account for structure at all and only preserves rolling means and standard deviations; this shows how persistence cannot merely be attributed to common long term trends or volatility variations. This null model carries some broad sense of structure and market direction and it shows how persistence does not merely originate from overall market trends. We then find a second cluster, of structured models, with the rolling multivariate Gaussian at the bottom. This shows how market persistence goes beyond asset means and covariance, even after spurious structures have been removed. We then find the real data, just below the stable multivariate Gaussian. This shows how markets have slowly evolving structures.

Figure 4 shows the decay in triangular motif persistence for both filtering methods. We notice results analogous to those in Figure 3 for TMFG filtered graphs. Graphs filtered through quantile thresholding instead show a high level of noise in their top cluster (where structure is present). A higher number of motifs than in the TMFG is found, but the ranking of null models is at times inconsistent, as is the position of the decay curve for real data. We would have expected some triangles to break when looking at edge persistence only, as well as to find that the clustering coefficient decreases in persistent graphs (as it does in TMFG graphs). The clustering coefficient for quantile thresholding-persistent graphs is also found to be much higher, suggesting that the filtered structure is highly localised and clustered, while that of the TMFG is more distributed, identifying systemic groups of stocks throughout the market structure.
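For reference, the quantile thresholding filter compared here (binarising correlations at the value $v_q$ of quantile level $q$) and the triangles it produces can be sketched as below. The correlation matrix is a made-up 4-asset example, the function names are hypothetical, and the empirical quantile is taken as a simple order statistic of the off-diagonal values, which is an assumption of this sketch rather than the paper's stated convention.

```python
from itertools import combinations


def quantile_threshold(corr, q):
    """Keep edge (i, j) when corr[i][j] >= v_q, the q-quantile of off-diagonal values."""
    n = len(corr)
    pairs = list(combinations(range(n), 2))
    values = sorted(corr[i][j] for i, j in pairs)
    # Order-statistic quantile (assumed convention for this sketch).
    v_q = values[min(int(q * len(values)), len(values) - 1)]
    return {frozenset((i, j)) for i, j in pairs if corr[i][j] >= v_q}


def triangles(edges, n):
    """All node triplets whose three edges are present in the filtered graph."""
    return {
        frozenset(triplet)
        for triplet in combinations(range(n), 3)
        if all(frozenset(pair) in edges for pair in combinations(triplet, 2))
    }


# Illustrative correlation matrix: assets 0, 1, 2 form a tightly correlated group.
corr = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
edges = quantile_threshold(corr, q=0.5)
motifs = triangles(edges, n=len(corr))
```

In this toy case every retained edge falls inside the single strongly correlated group, which mirrors the localised, highly clustered structures attributed to value-based thresholding in the text.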

Fig. 3.

Edge persistence decay of null models.

Edge persistence decay with δτ for the time series null models of market returns and real data for the NYSE. We notice how for both TMFG filtering and quantile thresholding the real data lies between the rolling multivariate Gaussian ensemble and the stable multivariate Gaussian ensemble. This indicates that the real market structure does evolve slowly in time, but with persistence beyond what can be inferred from estimates of its covariance structure.

[Figure 3 panels: Edge Persistence (Quantile); Edge Persistence (TMFG). Legend: shuffled ensemble (1), univariate rolling ensemble (2), hamiltonian ensemble (3), multivariate rolling ensemble (4), fixed gaussian ensemble (5), real.]

Fig. 4.

Motif (triangle) persistence decay of null models.

Motif persistence decay with δτ for the time series null models of market returns and real data for the NYSE. We notice how for TMFG filtering the real data still lies between the rolling multivariate Gaussian ensemble and the stable multivariate Gaussian ensemble (as in Figure 3). We instead notice that the decay ordering is noisier for quantile thresholding, showing how the method's focus on individual connections affects its generalisation to motifs. This is despite the higher number of motifs in the quantile thresholding graph.

[Figure 4 panels: Triangle Persistence (Quantile); Triangle Persistence (TMFG). Legend: shuffled ensemble (1), univariate rolling ensemble (2), hamiltonian ensemble (3), multivariate rolling ensemble (4), fixed gaussian ensemble (5), real.]

C. Market classification via decay exponent.

We now consider how the decay exponent of TMFG graphs behaves across markets. Table (1) compares the decay exponents for cliques, triangular motifs and clique separators in the NYSE, German stock market, Italian stock market and Israeli stock market. The decay exponent $\alpha$ is obtained from the fit based on the following expression,

$$\langle P_m(X^{\tau}) \rangle_{T,C} = \beta \cdot \tau^{\alpha}.$$ [5]

Table 1. Exponents for the decay power law regime computed with MSE. The analysis refers to 100 randomly selected stocks amongst the 500 most capitalised, over time intervals τ = [0, …] and t = [0, ..., …] different initial temporal network layers. For all motif analyses in this work, triangles and separators constitute non-overlapping sets, as these represent theoretically and taxonomically different structures and decay characteristics.

Market     Clique    Triangular Motif    Clique Separator
NYSE       -0.392    -0.493              -0.245
Germany    -0.792    -0.598              -0.381
Italy      -0.785    -0.811              -0.174*
Israel     -1.024    -0.866              -0.728

* Result compromised by regimes not well identified for motif decay in large systems (≈ 100 stocks).

We notice from the results in Table (1) that the NYSE, which is clearly the most developed and liquid stock market, has the lowest decay exponent (in modulus, which corresponds to the slowest decay) for both cliques and triangles. This indicates that its correlations are more stable on a shorter time window. Germany and Italy have similar values for clique exponents, with Germany seemingly more stable in terms of triangular motifs. Israel, a younger and less liquid stock market, follows with a faster decay in both tetrahedral cliques and triangular motifs. The ordering of these markets is not clearly identifiable in clique separators, as noise in the data does not allow for the two decay regimes to be correctly identified in all markets (in this case for Italy). Separators have a distinct role and meaning in the graph's taxonomy and further work should allow for a more thorough analysis of those.

We observe promising results for a monotonically increasing relation between the decay exponent and the average daily volume of the market. The robustness of this result will be investigated in future work.

In Table (1) the decay exponent is not adjusted by the probability that all edges in the clique must be present in the temporal layer for the clique to exist. We show in Table (2) that, when adjusted by the probability of all its edges existing simultaneously, triangular motifs have a slower decay than individual edges. The results in Table (2) are obtained from a set of randomly selected stocks different to those used for Table (1). This adds further confidence in the results and their generality.

We stress that Table (2) falsifies the hypothesis that motifs are formed by edges in the network whose existence is not mutually dependent (Equation 4). This is falsified by the consistently lower decay exponent (in modulus) for adjusted persistence of triangular motifs. We can then conclude that motifs are more stable structures across temporal layers of the network, with significant interdependencies in their edges' existence.
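The two-regime fit behind the exponents of Equation 5 can be illustrated with a small sketch: fit a straight line in log-log scale on each side of every candidate transition point and keep the split with the lowest unweighted average MSE. This is not the authors' implementation; the function names, the minimum of three points per regime, the synthetic decay curve and the use of lags starting at τ = 1 (so that the logarithm is defined) are all assumptions of the example.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares line; returns (slope, intercept, mean squared error)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    mse = sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / n
    return slope, intercept, mse


def two_regime_fit(taus, persistence, min_points=3):
    """Scan all transition points; return (tau_plat, fast exponent, slow exponent)."""
    xs = [math.log(t) for t in taus]
    ys = [math.log(p) for p in persistence]
    best = None
    for k in range(min_points, len(xs) - min_points + 1):
        alpha_fast, _, mse_fast = linear_fit(xs[:k], ys[:k])
        alpha_slow, _, mse_slow = linear_fit(xs[k:], ys[k:])
        score = (mse_fast + mse_slow) / 2  # unweighted average of the two MSEs
        if best is None or score < best[0]:
            best = (score, taus[k], alpha_fast, alpha_slow)
    return best[1], best[2], best[3]


# Synthetic persistence curve: exponent -0.8 up to tau = 20, then -0.2 (continuous).
taus = list(range(1, 61))
persistence = [
    t ** -0.8 if t <= 20 else (20 ** -0.8) * (t / 20) ** -0.2 for t in taus
]
tau_plat, alpha_fast, alpha_slow = two_regime_fit(taus, persistence)
```

On noisy empirical curves the minimum may be shallow, which is consistent with the table footnotes flagging cases where the two regimes are not well identified.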

D. Sector analysis in persistent motifs.

Figure (5) provides a visualisation of the network components formed by the ten most persistent triangles in the NYSE. We observe that all strongly persistent triangles have elements which belong to the same industry sector. Table 3 shows this for the same ten triangles displayed in Figure (5). We notice that stock prices in the sectors in Table (3) are mostly driven by sector-wide fundamentals, which justify the persistent structure in the long term. Other motifs are constituted by ETFs and their main holdings∗.

∗ The reason for the existence of these motifs is intuitive and does not affect our analysis, as ETF-related motifs are unlikely to be present in the network formed by a random selection of stocks or by stocks in a portfolio. These motifs are present here as we focus on the 100 most capitalised securities in the NYSE, which include ETFs.

Table 2. Exponent for the power law decay regime identified by MSE in different sample markets. The analysis refers to 100 randomly selected stocks amongst the 500 most capitalised, over time intervals τ = [0, …] and t = [0, ..., …] different initial temporal network layers.

Market     Edge       Triangular Motif    Triangular Motif**
NYSE       -0.164     -0.398              -0.133
Germany    -0.265     -0.471              -0.157
Italy      -0.144*    -0.458              -0.153
Israel     -0.397     -0.830              -0.277

* Result compromised by regimes not well identified for edge decay in large systems (≈ 100 stocks).
** Motif exponent adjusted by the probability of simultaneous edge persistence in the motif.

Fig. 5. Persistent NYSE motifs visualised. Network representation of the ten most persistent triangular motifs in the TMFG layers for the 100 most capitalised stocks of the NYSE.

We also investigate whether motif persistence and motif structures can be easily retrieved from the original correlation matrix. The purpose of this is to check that our TMFG filtering method is not redundant and trivially replaceable. To test this, we consider the ten most present persistent triangles across the plateau region and check their overlap with the ten most correlated triplets in each unfiltered correlation matrix. We find that no more than one triangle lies in the intersection between the two sets, in each temporal layer. We also check the correlation between motif persistence and the average sum or product (results are equivalent for our purpose) of its individual edges' correlation for all unfiltered correlation layers. We observed through the Pearson and Kendall correlation values that the two measures are only loosely related, as correlation explained no more than 20% of the variance in the set of variables with large persistence.

Table 3. Motif components and Financial Times sector affiliation for the ten most persistent motifs in the NYSE's 100 most capitalised stocks.

Security 1                Security 2               Security 3              FT Sector
Biogen Inc                Gilead Sciences Inc      Celgene Corp            Biopharmaceutical
UnitedHealth Group Inc    Cigna Corp               Anthem Inc              Health Care
Biogen Inc                Gilead Sciences Inc      Amgen Inc               Biopharma/tech
Bank of America Corp      JPMorgan Chase & Co      Morgan Stanley          Financials-Banks
Vanguard FTSE ETF**       MSCI EAFE ETF            Vanguard FTSE ETF***    Index ETFs
Invesco QQQ Trust*        Amazon.com Inc           Alphabet Inc            Tech
ConocoPhillips            Schlumberger NV          Exxon Mobil Corp        Oil & Gas
NVIDIA Corp               Texas Instruments Inc    Broadcom Inc            Tech Hardware
Chevron Corp              Schlumberger NV          Exxon Mobil Corp        Oil & Gas
Chevron Corp              ConocoPhillips           Schlumberger NV         Oil & Gas

* ETF on NASDAQ - Top Holdings include Amazon, Facebook, Apple, Alphabet
** Vanguard FTSE Developed Markets Index Fund ETF Shares
*** Vanguard FTSE Emerging Markets Index Fund ETF Shares

E. Portfolio volatility and systemic risk of persistent motifs vs. random portfolios.

Portfolio volatility distribution for the 100 most capitalised stocks in

We check that a portfolio formed by the 10 most persistent motifs in each market has a highly enhanced out of sample volatility due to its stable correlations.

To do this, we consider the volatility of the motif portfolio and a distribution of volatilities for 10 randomly selected portfolios with the same number of stocks.

As expected, we observe the motif portfolio to yield a volatility $\mathrm{vol}_{motif}$ close to the higher end of the distribution, i.e. $(\mathrm{vol}_{motif} - \langle \mathrm{vol}_{random} \rangle) > \ldots \cdot \sigma(\mathrm{vol}_{random})$, throughout the considered markets. We should highlight that the volatility of portfolios is evaluated out of sample with respect to the period the persistence was calculated on, showing that this method is not only observational, but also predictive.

Due to the more theoretical nature of this work, we refer the interested reader to the work by some of the authors of this paper for a more thorough analysis of portfolio applications and forecasting (27).
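The comparison between the motif portfolio and random portfolios can be sketched as follows. The sketch assumes equal portfolio weights, uses the sample standard deviation of daily portfolio returns as the volatility, and runs on synthetic returns with hypothetical tickers; none of these choices is specified in the text. In practice the returns would come from a period after the one used to measure persistence, in line with the out-of-sample evaluation described here.

```python
import random
import statistics


def portfolio_volatility(returns, tickers):
    """Standard deviation of the equal-weighted portfolio return series."""
    horizon = len(returns[tickers[0]])
    series = [
        sum(returns[name][t] for name in tickers) / len(tickers)
        for t in range(horizon)
    ]
    return statistics.stdev(series)


def compare_with_random(returns, motif_tickers, n_random, rng):
    """Return (vol_motif, random volatilities, excess in units of their std)."""
    vol_motif = portfolio_volatility(returns, motif_tickers)
    universe = sorted(returns)
    random_vols = [
        portfolio_volatility(returns, rng.sample(universe, len(motif_tickers)))
        for _ in range(n_random)
    ]
    excess = (vol_motif - statistics.mean(random_vols)) / statistics.stdev(random_vols)
    return vol_motif, random_vols, excess


rng = random.Random(1)
days = 250
# Synthetic market: S0-S2 share a common factor (a stand-in for a persistent motif),
# the remaining hypothetical assets are independent.
factor = [rng.gauss(0.0, 0.02) for _ in range(days)]
returns = {f"S{i}": [f + rng.gauss(0.0, 0.005) for f in factor] for i in range(3)}
returns.update({f"S{i}": [rng.gauss(0.0, 0.02) for _ in range(days)] for i in range(3, 12)})

vol_motif, random_vols, excess = compare_with_random(
    returns, ["S0", "S1", "S2"], n_random=10, rng=rng
)
```

The motif stocks are diversified away less effectively because they move together, which is the mechanism the section relies on; the size of the excess on real data depends on the market and sample.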

4. Discussion

The power law decay of edge and simplicial soft persistence reported in Figure 2 suggests that market structures are characterised by a slow evolution which allows for long memory in temporal layers. This type of decay contrasts with an exponential decay of the persistence, which would instead imply short or no memory in the system. This observation is in line with the works by Bouchaud et al. and Lillo et al. in (16, 18, 19, 28), where power law decays in autocorrelation are identified as manifestations of long-memory processes in efficient markets. However, it extends the concept to higher order structures.

The comparison between soft persistence in correlation structures from real data and artificial data generated from different null models (Figs. 3 and 4) demonstrates that the persistence of real structures goes beyond all univariate null models, hence confirming long memory as a characteristic requiring structural constraints. We also demonstrate that real structures exceed the persistence of the rolling multivariate Gaussian, hence suggesting that pairwise covariances and moving averages do not suffice to induce the long memory present in real markets. In line with the analysis of motif persistence beyond that of individual edges, we suggest that higher order relations in terms of structural evolution are present. The ordering of null models in Figure 3 further supports the validity of the persistence measure.

The comparison of simplicial persistence of triangles between quantile thresholded and TMFG filtered graphs, reported in Figure 4, reveals that quantile thresholding struggles to separate the decay of real structures from that of rolling Gaussian generated ones.
This could be attributed to the “local” nature of the method, which matches the pairwise interpretation of relations in generating from a rolling Gaussian. TMFG filtered graphs instead, perhaps due to their non-local embedding, provide a consistent ordering of null models with relatively low noise.

The ability to correctly identify persistent motifs throughout the market sample is essential as the most persistent motifs were found to be highly systemic (Section E). Persistent structures in quantile thresholded graphs present higher and more stable clustering coefficients. This suggests a very localised and compact structure. TMFG filtered graphs instead present a lower clustering coefficient and a decay with τ, as expected since some structures break. This is further evidence of the ability of the TMFG filtering method to identify meaningful persistent structures throughout the market. The issue with quantile thresholding is likely due to the method being merely value-based with no sensible structural constraint, unlike the TMFG.

The ranking of national markets based on their decay exponents in Table 1 can be interpreted in terms of the reduction of estimation noise in more liquid markets, as large deviations become less likely and correlations as well as prices become more reflective of the underlying generative processes and structures. Structures are perhaps clearer too, and deviations are exploited more quickly if they emerge. This suggests that more efficient and capitalised markets are characterised by structures which are more stable in time and better reflected by the data. The decay exponent ranking also leads to the conclusion that more developed markets are characterised by more meaningful underlying structures and cliques, suggesting that systemic risk may represent a greater threat in developed markets.

The results in Table 2 support the hypothesis that motifs constitute meaningful structures in markets, beyond their individual edges.
These results test the independence null model of individual edges in motif formation and show solid evidence to reject it. We can then conclude that highly persistent motifs are not a mere consequence of highly persistent individual edges, but also of the correlation in those edges existing concurrently. This result ties in with the above discussion on the issues with locality of filtering methods and generative processes.

Table 3 strengthens the importance of persistent motifs. Indeed, the ten most persistent motifs visualised in Figure 5 are representative of industry sectors in the NYSE. These sectors are not identified by the motifs with higher edge correlation, which instead are dominated by motifs often due to correlation noise in high volatility stocks. Persistence and the identification of persistent motifs are hence found to be non-trivial with respect to correlation strength of individual edges or motifs. The impact on portfolio diversification of the motifs in Figure 5 indicates that these structures are highly relevant for systemic risk and portfolio volatility, with high predictive power provided by the long memory property of persistence, which is an intrinsic temporal feature. As these motifs are not characterised by noticeably strong correlations, a common variance optimisation of the portfolio is unlikely to optimise the weights to sufficiently minimise the risk from these highly systemic structures.

The systemic relevance of persistent motifs as well as their out of sample forecasting power are shown by the results in Section E and in (27), where significantly higher out of sample portfolio volatility is observed for the portfolio of persistent motifs. The motif portfolio volatility is significantly above both the mean and median of the random portfolios’ volatility distribution.

This is a first example of how just selecting stocks from the ten most persistent motifs forms a portfolio with higher long term volatility.
Clearly, when aiming for a reduction in systemic risk, the objective is the opposite: low volatility. The observations from Section E and (27) lay the ground for the construction of portfolios where asset weights aim to reduce the volatility originating from persistent correlations in motif structures.
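The independence null model for motif edges discussed above can be illustrated with a small sketch. It is a deliberate simplification: it uses the plain frequency with which edges appear across temporal layers rather than the soft persistence defined in this paper, and it is not the procedure behind Table 2. The function names and the toy layers are hypothetical. Under independence, the frequency with which all three edges of a triangle appear together would equal the product of the individual edge frequencies, so a ratio well above 1 indicates that the edges co-occur more often than independent edges would.

```python
from itertools import combinations


def presence_frequency(layers, edges):
    """Fraction of layers in which all the given edges are present together."""
    hits = sum(1 for layer in layers if all(edge in layer for edge in edges))
    return hits / len(layers)


def triangle_edges(a, b, c):
    """The three undirected edges of triangle (a, b, c) as sorted tuples."""
    return [tuple(sorted(pair)) for pair in combinations((a, b, c), 2)]


def independence_ratio(layers, triangle):
    """Observed triangle frequency divided by the product of edge frequencies.

    Returns NaN if any edge of the triangle never appears.
    """
    edges = triangle_edges(*triangle)
    observed = presence_frequency(layers, edges)
    expected = 1.0
    for edge in edges:
        expected *= presence_frequency(layers, [edge])
    if expected == 0.0:
        return float("nan")
    return observed / expected


# Toy temporal layers (hypothetical): each layer is a set of undirected edges
# stored as sorted tuples. The triangle A-B-C appears whole in two of four layers.
layers = [
    {("A", "B"), ("A", "C"), ("B", "C")},
    {("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")},
    {("A", "D")},
    {("B", "D")},
]
ratio = independence_ratio(layers, ("A", "B", "C"))
print(ratio)  # 4.0: each edge has frequency 0.5, the full triangle also 0.5
```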

5. Conclusion

The present work introduces the concept of simplicial persistence, focusing on the soft persistence in simplicial cliques. This measure is applied to a complex system with a slowly evolving stochastic structure, namely financial markets. The graph structures are obtained from Kendall correlations with exponential smoothing and filtered with the TMFG or through quantile thresholding. The slow evolution of these systems with time manifests long memory in their structure with a two regime power law decay in persistence with time. The transition point between regimes is identified in an unsupervised way with mean-squared error minimisation.

Null models of market structure are then used to test hypotheses about the generative process underlying the system. Two persistence decay clusters are observed, where the least persistent corresponds to null models with no structural constraints and the upper one (most persistent) comprises the rolling multivariate Gaussian (lowest), real data, and the stable multivariate Gaussian (highest).

Simplicial persistence of higher order structures in real data and null models is hardly recognised by value-based thresholding methods, which are unable to identify persistent cliques throughout the market sample. Decay exponents for different markets are then observed to provide a ranking corresponding to their liquidity or stage of development, which suggests that, despite these systems being less predictable in their individual series, they are more stable and predictable in terms of structure.
Most persistent motifs are found to correspond to sectors where the price of stocks is mostly driven by sector-wide fundamentals.

Based on the ability of simplicial persistence to forecast and identify strongly correlated clusters of stocks, the impact of persistence-based systemic risk on portfolio volatility is verified with a comparison between the ten most persistent motifs portfolio and random portfolios of the same size (27).

The present work provides further evidence of how network analysis and complex systems can enhance our understanding of real world systems beyond traditional methods. Our results and methods lay the ground for future studies and modelling of the evolution of stochastic structures with long memory.
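The two regime power law decay and the unsupervised choice of the transition point summarised above can be sketched as a piecewise linear fit in log-log space. This is one possible reading of "mean-squared error minimisation" and not necessarily the authors' procedure: the two segments are fitted independently (continuity at the breakpoint is not enforced), and the function names, the `min_points` parameter, and the synthetic exponents 0.2 and 0.8 are assumptions chosen only for illustration.

```python
import math


def linear_fit(xs, ys):
    """Least-squares slope, intercept and summed squared error of y = a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def two_regime_fit(taus, persistence, min_points=3):
    """Scan candidate breakpoints and keep the one with the lowest total MSE.

    Returns the first lag of the second regime and the two log-log slopes.
    """
    log_tau = [math.log(t) for t in taus]
    log_p = [math.log(p) for p in persistence]
    best = None
    for k in range(min_points, len(taus) - min_points + 1):
        slope_a, _, sse_a = linear_fit(log_tau[:k], log_p[:k])
        slope_b, _, sse_b = linear_fit(log_tau[k:], log_p[k:])
        mse = (sse_a + sse_b) / len(taus)
        if best is None or mse < best[0]:
            best = (mse, k, slope_a, slope_b)
    _, k, slope_a, slope_b = best
    return taus[k], slope_a, slope_b


# Noise-free synthetic decay (hypothetical exponents): tau^-0.2 up to tau = 10,
# then tau^-0.8, rescaled so that the curve is continuous at tau = 10.
taus = list(range(1, 41))
persistence = [
    t ** -0.2 if t <= 10 else (10 ** 0.6) * t ** -0.8
    for t in taus
]
break_tau, slope_1, slope_2 = two_regime_fit(taus, persistence)
print(break_tau, round(slope_1, 3), round(slope_2, 3))
```

On real persistence curves the data are noisy, so the scan is best restricted to lags with enough points on both sides, which is what `min_points` is meant to control.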

6. Acknowledgments

TA and JT acknowledge the EC Horizon 2020 FIN-Tech project for partial support and useful opportunities for discussion. JT acknowledges support from EPSRC (EP/L015129/1). TA acknowledges support from ESRC (ES/K002309/1), EPSRC (EP/P031730/1) and EC (H2020-ICT-2018-2 825215).

1. Newman M (2018) Networks. (Oxford University Press).
2. Strogatz SH (2001) Exploring complex networks. Nature
3. The European Physical Journal B-Condensed Matter and Complex Systems
4. Available at SSRN 3294548.
5. Cimini G, et al. (2019) The statistical physics of real-world networks. Nature Reviews Physics
6. arXiv preprint arXiv:1903.10805.
7. Masuda N, Kojaku S, Sano Y (2018) Configuration model for correlation matrices preserving the node strength. Physical Review E
8. Physica A: Statistical Mechanics and its Applications
9. PLoS ONE
10. arXiv preprint arXiv:1902.07074.
11. Marcaccioli R, Livan G (2019) A Pólya urn approach to information filtering in complex networks. Nature Communications
12. Proceedings of the National Academy of Sciences
13. Journal of Complex Networks
14. arXiv preprint arXiv:1905.02266.
15. Cont R (2001) Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance
16. Studies in Nonlinear Dynamics & Econometrics
17. Phys. Rev. E
18. Handbook of financial markets: dynamics and evolution. (Elsevier), pp. 57–160.
19. Di Matteo T, Aste T, Dacorogna MM (2005) Long-term memories of developed and emerging markets: Using the scaling analysis to characterize their stage of development. Journal of Banking & Finance
20. Phys. Rev. E
21. Phys. Rev. E
22. Phys. Rev. E
23. The European Physical Journal B
24. Phys. Rev. E
25. BMC Bioinformatics
26. Network Theory in Finance
27. International Conference on Complex Networks and Their Applications. (Springer), pp. 573–585.
28. Bouchaud JP, Gefen Y, Potters M, Wyart M (2004) Fluctuations and response in financial markets: the subtle nature of ‘random’ price changes.