Iterated Integrals and Population Time Series Analysis

Abstract

One of the core advantages topological methods for data analysis provide is that the language of (co)chains can be mapped onto the semantics of the data, providing a natural avenue for human understanding of the results. Here, we describe such a semantic structure on Chen's classical iterated integral cochain model for paths in Euclidean space. Specifically, in the context of population time series data, we observe that iterated integrals provide a model-free measure of pairwise influence that can be used for causality inference. Along the way, we survey recent results and applications, review the current standard methods for causality inference, and briefly provide our outlook on generalizations to go beyond time series data.


CHAD GIUSTI AND DARRICK LEE


The growing availability of population time series data drawn from observations of complex systems is driving a concomitant demand for analytic tools. Of particular interest are methods for extracting features of the time series which provide human-understandable links between the observed function and the unknown structure or organizing principles of the system.

Over the last decade, substantial work has been done using persistent homology for time series analysis, including [15, 31, 34]. However, there are still substantial mathematical and conceptual barriers to direct interpretation of persistence diagrams in terms of the underlying data; most successes have come from statistical analyses of families of diagrams, which provide some measure of discriminatory power between systems. Thus, it is common to rely on persistence for classification. However, the success of such a program must be measured against the capabilities of modern machine learning tools, which appear capable of being tuned to out-perform topological methods. Using the results of topological computations as a pre-processing step for machine learning tools has been successful, providing a rich-but-low-dimensional feature set which retains strong discriminatory power, but human interpretation of the results suffers from the same difficulties as before.

It is the authors' opinion that one of applied topology's greatest potential advantages is the ability to ask specific, fundamentally qualitative questions of data sets and compute answers in a context and language that humans can interpret. The machinery of (co)homology provides a blueprint for asking and answering such questions, in the form of (co)chain models.
However, rather than encoding data and then searching for meaning in the (co)homology, the authors propose selecting or designing the encoding topological space explicitly for the purpose of leveraging a (co)chain model which naturally encodes questions and answers of interest.

This is a nuanced undertaking, perhaps best carried out in the context of a collaboration between mathematicians and scientific domain experts. However, in the case of certain general data types, we can rely on the substantial extant literature on cochain models in algebraic topology for inspiration. For example, in the case of our motivating question about families of time series, we can make use of the iterated integral model for cochains on $P\mathbb{R}^N$, originally developed by K. T. Chen [8, 9, 10, 11], more recently adapted to the study of stochastic differential equations [18, 27], and finally picked up by the machine learning community in the guise of path signatures as a feature set for paths. In this paper, we will survey path signatures, the 0-cochains in Chen's iterated integral model, and their fundamental properties, discuss how they have been applied to characterize cyclic structure in observed time series, and offer a new interpretation of lower-order iterated integrals as a measure of causality among simultaneously observed time series. Finally, we briefly provide our outlook on how higher cochains, and cochain models of more general mapping spaces, may be leveraged for data analysis beyond time series.

Date: April 30, 2019.

1. Path signatures as iterated integrals

Consider a collection of $N$ simultaneous real-valued time series, $\gamma_i : [0,1] \to \mathbb{R}$, $i = 1, \ldots, N$, thought of as coordinate functions for a path $\Gamma \in P\mathbb{R}^N = C([0,1], \mathbb{R}^N)$. Foundational work by K. T. Chen used iterated integrals to produce a rational cochain model for this space.

Definition 1.1.

Suppose $dx_1, \ldots, dx_N$ are the standard 1-forms for $\mathbb{R}^N$. For $t \in [0,1]$, let $\Gamma_t = \Gamma|_{[0,t]}$. For $i \in [N]$, define a path
\[ S^i(\Gamma)(t) = \int_{\Gamma_t} dx_i = \int_0^t \Gamma^* dx_i(s) = \int_0^t d\gamma_i(s). \]
Let $I = (i_1, \ldots, i_k)$, where $i_l \in [N]$. Higher order paths are inductively defined as
\[ S^I(\Gamma)(t) = \int_0^t S^{(i_1, \ldots, i_{k-1})}(\Gamma)(s) \, d\gamma_{i_k}(s). \]
The iterated integral of $\Gamma$ with respect to $I$ is defined to be $S^I(\Gamma) := S^I(\Gamma)(1)$.

We can also define the iterated integral in a non-inductive way. Let $\Delta^k$ be the simplex
\[ \Delta^k = \{ (t_1, \ldots, t_k) \mid 0 \le t_1 \le \ldots \le t_k \le 1 \}. \]
By direct computation, we have $\Gamma^* dx_i = \gamma_i'(t) \, dt$. Then, the iterated integral of $\Gamma$ with respect to $I$ is equivalently defined as
\[ S^I(\Gamma) = \int_{\Delta^k} \gamma_{i_1}'(t_1) \gamma_{i_2}'(t_2) \cdots \gamma_{i_k}'(t_k) \, dt_1 \, dt_2 \cdots dt_k. \tag{1.1} \]
These iterated integrals with respect to a fixed $I$ can be viewed as functions $S^I : P\mathbb{R}^N \to \mathbb{R}$ on $P\mathbb{R}^N$. Chen generalized this concept of iterated integration to produce forms on $P\mathbb{R}^N$, which fit together to generate a cochain model of $P\mathbb{R}^N$. The iterated integrals defined here are the 0-cochains of this cochain model. A summary of this construction is included in Appendix A, and a brief discussion of higher cochains is in Section 3.

In this section, we discuss various properties and characterizations of these iterated integrals, in preparation for their application to time series analysis in the following section. A wide class of paths for which these results hold is the class of paths of bounded variation. For the remainder of the paper, we consider $\mathbb{R}^N$ equipped with the standard Euclidean norm, denoted $\| \cdot \|$.
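The inductive definition above translates directly into a short computation on sampled data. The following is a minimal sketch, assuming the time series are given as a NumPy array of samples joined by straight line segments; the function name `iterated_integral` and the 0-based indexing of coordinates are illustrative choices, not notation from the paper. The trapezoid rule used here is exact for levels 1 and 2 of a piecewise-linear path and only an approximation at higher levels.

```python
import numpy as np

def iterated_integral(path, index):
    """Approximate S^I(path) for samples of shape (T, N).

    `index` is a tuple of 0-based coordinate indices (i_1, ..., i_k).
    The recursion S^I(t) = int_0^t S^{I'}(s) d gamma_{i_k}(s) is discretized
    with the trapezoid rule (exact at levels 1 and 2 for polygonal paths).
    """
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)      # d gamma on each segment, shape (T-1, N)
    running = np.ones(path.shape[0])        # level 0: the constant function 1
    for i in index:
        # Average of the previous-level path over each segment, times d gamma_i.
        midpoint = 0.5 * (running[:-1] + running[1:])
        running = np.concatenate([[0.0], np.cumsum(midpoint * increments[:, i])])
    return running[-1]

# Hypothetical example: the path t -> (t, t^2) on [0, 1].
t = np.linspace(0.0, 1.0, 1001)
path = np.stack([t, t**2], axis=1)
print(iterated_integral(path, (0,)))      # gamma_0(1) - gamma_0(0)
print(iterated_integral(path, (0, 1)))    # approximates int_0^1 t * 2t dt
```

For this example the second value should be close to $2/3$, and the two level-2 terms with indices `(0, 1)` and `(1, 0)` sum to the product of the two level-1 terms.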

Definition 1.2.

Let $\Gamma \in P\mathbb{R}^N$. The 1-variation of $\Gamma$ on $[0,1]$ is defined as
\[ |\Gamma|_{1\text{-var}} := \sup_{(t_i) \in \mathcal{P}([0,1])} \sum_i \| \Gamma(t_i) - \Gamma(t_{i-1}) \|, \tag{1.2} \]
where $\mathcal{P}([0,1])$ denotes the set of finite partitions of $[0,1]$. The elements of
\[ BV(\mathbb{R}^N) = \{ \Gamma \in P\mathbb{R}^N \mid |\Gamma|_{1\text{-var}} < \infty \} \]
are the paths of bounded variation on $[0,1]$.

The collection of iterated integrals of $\Gamma$ with respect to all multi-indices $I$ is called the path signature of $\Gamma$, denoted $S(\Gamma)$. The path signature can be represented as an element of the formal power series algebra of tensors (equivalently, of formal power series in the non-commutative indeterminates $X = \{X_1, \ldots, X_N\}$), denoted $T(\mathbb{R}^N)$,
\[ S(\Gamma) = 1 + \sum_{k=1}^{\infty} \sum_{I = (i_1, \ldots, i_k)} S^I(\Gamma) \, X_{i_1} \otimes \ldots \otimes X_{i_k}. \]
Several of the basic properties of these path signatures provide evidence that they are potentially useful for time series analysis.

Proposition 1.3.

Suppose $\Gamma \in BV(\mathbb{R}^N)$, $\phi : [0,1] \to [0,1]$ is a strictly increasing function onto $[0,1]$, $a \in \mathbb{R}^N$, and $\lambda \in \mathbb{R}$. The path signature is invariant under translation,
\[ S(\Gamma + a) = S(\Gamma), \]
and reparametrization,
\[ S(\Gamma \circ \phi) = S(\Gamma). \]
Additionally, under scaling, we have
\[ S(\lambda \Gamma) = 1 + \sum_{k=1}^{\infty} \sum_{I = (i_1, \ldots, i_k)} \lambda^k S^I(\Gamma) \, X_{i_1} \otimes \ldots \otimes X_{i_k}. \]

Proof. All three properties are straightforward to show using the definition of path signatures. Translation invariance is due to the translation invariance of the standard 1-forms on $\mathbb{R}^N$. Reparametrization invariance of the first level follows from the substitution $\tau = \phi(t)$:
\[ S^i(\Gamma \circ \phi) = \int_0^1 (\gamma_i(\phi(t)))' \, dt = \int_0^1 \gamma_i'(\phi(t)) \phi'(t) \, dt = \int_0^1 \gamma_i'(\tau) \, d\tau = S^i(\Gamma). \]
Invariance for higher level signatures is shown by induction. Finally, the scaling property is clear from Equation 1.1, since each of the $k$ derivative factors contributes one factor of $\lambda$. $\square$
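The three properties can be checked numerically on the first two signature levels. The sketch below assumes the path is sampled and joined by straight segments; the helper `signature_level2`, the test curve `curve`, and the specific values of the shift and of $\lambda$ are hypothetical choices made for illustration. Translation and scaling hold up to floating-point error, while reparametrization holds only up to discretization error because the two samplings give slightly different polygonal approximations of the same curve.

```python
import numpy as np

def signature_level2(samples):
    """Levels 1 and 2 of the signature of the polygonal path through `samples` (shape (T, N))."""
    x = np.asarray(samples, dtype=float)
    inc = np.diff(x, axis=0)                 # segment increments
    level1 = x[-1] - x[0]
    # Per segment: (x_i(t_m) - x_i(0)) * dx_j + 0.5 * dx_i * dx_j, summed over segments.
    level2 = (x[:-1] - x[0]).T @ inc + 0.5 * inc.T @ inc
    return level1, level2

def curve(s):
    """Hypothetical test curve in R^2, evaluated at parameter values s."""
    return np.stack([np.cos(2 * np.pi * s), s * np.sin(2 * np.pi * s)], axis=1)

t = np.linspace(0.0, 1.0, 4001)
l1, l2 = signature_level2(curve(t))

# Translation by a constant vector a.
l1_shift, l2_shift = signature_level2(curve(t) + np.array([3.0, -1.0]))
# Reparametrization by phi(t) = t**2, an increasing map of [0, 1] onto itself.
l1_rep, l2_rep = signature_level2(curve(t**2))
# Scaling by lambda multiplies level k by lambda**k.
lam = 2.5
l1_scaled, l2_scaled = signature_level2(lam * curve(t))

print(np.allclose(l1, l1_shift), np.allclose(l2, l2_shift))
print(np.allclose(l1, l1_rep, atol=1e-4), np.allclose(l2, l2_rep, atol=1e-4))
print(np.allclose(lam * l1, l1_scaled), np.allclose(lam**2 * l2, l2_scaled))
```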

Note that signatures can be defined for paths with an arbitrary closed interval $[a, b] \subset \mathbb{R}$ as a domain. However, without loss of generality due to reparametrization invariance, we only consider paths defined on $[0,1]$.

These path signatures characterize classes of paths in $\mathbb{R}^N$ up to a tree-like equivalence, originally defined in [24]. In order to define the relation, we first consider concatenation of paths. Suppose $\Gamma_1, \Gamma_2 \in BV(\mathbb{R}^N)$, then define the concatenation of the two paths, $\Gamma_1 * \Gamma_2 \in BV(\mathbb{R}^N)$, by
\[ \Gamma_1 * \Gamma_2(t) = \begin{cases} \Gamma_1(2t) & : t \in [0, \tfrac{1}{2}) \\ (\Gamma_1(1) - \Gamma_2(0)) + \Gamma_2(2t - 1) & : t \in [\tfrac{1}{2}, 1]. \end{cases} \tag{1.3} \]

The inverse of a path $\Gamma$ is defined to be the same path but running in the opposite direction, namely $\Gamma^{-1}(t) = \Gamma(1 - t)$.

Definition 1.4 ([24]).

A path $\Gamma \in BV(\mathbb{R}^N)$ is a tree-like path in $\mathbb{R}^N$ if there exists some non-negative real-valued continuous function $h$ defined on $[0,1]$ such that $h(0) = h(1) = 0$ and such that
\[ \| \Gamma(t) - \Gamma(s) \| \le h(s) + h(t) - 2 \inf_{u \in [s,t]} h(u), \tag{1.4} \]
where $\| \cdot \|$ is the Euclidean norm on $\mathbb{R}^N$. The function $h$ is called a height function for $\Gamma$, and if $h$ is of bounded variation, then $\Gamma$ is a Lipschitz tree-like path.

Definition 1.5.

Two paths $\Gamma_1, \Gamma_2 \in BV(\mathbb{R}^N)$ are tree-like equivalent, $\Gamma_1 \sim \Gamma_2$, if $\Gamma_1 * \Gamma_2^{-1}$ is a Lipschitz tree-like path.

It is shown in [24] that tree-like equivalence is an equivalence relation in $BV(\mathbb{R}^N)$ and that concatenation of paths respects $\sim$. By defining the inverse of a path $\Gamma$ by $\Gamma^{-1}(t) = \Gamma(1 - t)$, the equivalence classes $\Sigma = BV(\mathbb{R}^N) / \sim$ form a group under concatenation.

The more abstract notion of a tree-like path is required when working with general bounded variation paths, but if we restrict ourselves to piecewise regular paths, we can use a much more intuitive characterization based on reductions. Specifically, a path $\Gamma$ is called reducible if there exist paths $\alpha$, $\beta$, and $\gamma$ such that $\Gamma = \alpha * \gamma * \gamma^{-1} * \beta$ up to reparametrization, and called irreducible otherwise. Furthermore, $\alpha * \beta$ is called a reduction of $\Gamma$.

A path $\Gamma \in P\mathbb{R}^N$ is regular if $\Gamma'(t)$ is continuous and nonvanishing for all $t \in [0,1]$.

Lemma 1.6.

Suppose $\Gamma \in P\mathbb{R}^N$ is a piecewise regular path. Then $\Gamma$ is a Lipschitz tree-like path if and only if its irreducible reduction is the constant path.

Proof. First, suppose $\Gamma$ can be reduced to a point. Thus, $\Gamma$ can be constructed iteratively with a finite set of paths $\gamma_1, \ldots, \gamma_k$ as follows. Begin with $\Gamma_1 = \gamma_1 * \gamma_1^{-1}$, then $\Gamma_2 = \alpha_1 * \gamma_2 * \gamma_2^{-1} * \beta_1$, where $\Gamma_1 = \alpha_1 * \beta_1$. Continue in this manner until $\Gamma = \Gamma_k = \alpha_{k-1} * \gamma_k * \gamma_k^{-1} * \beta_{k-1}$. For example, consider the following point reducible path $\Gamma$ which can be built with two paths.

[Figure: a point reducible path $\Gamma = \Gamma_2$ built from $\Gamma_1$, with segments labelled $\gamma_1$, $\gamma_1^{-1}$, $\gamma_2$, $\gamma_2^{-1}$.]

Note that $\Gamma$ will traverse each of $\gamma_1, \gamma_1^{-1}, \ldots, \gamma_k, \gamma_k^{-1}$ exactly once. Now, define $\Gamma_t$ to be the image of $\Gamma|_{[0,t]}$, and treat each of the $\gamma_i$ as their images. Then, define the height function to be
\[ h(t) = \sum_{i=1}^{k} \ell(\gamma_i \cap \Gamma_t) - \sum_{i=1}^{k} \ell(\gamma_i^{-1} \cap \Gamma_t), \]
where $\ell(\cdot)$ represents the length of the given segment. Intuitively, $h(t)$ is the length of the curve up to $\Gamma(t)$, where we subtract off any segment that has been retraced. In our example, suppose that the red arrow represents the point $\Gamma(t)$, which has begun to traverse along one of the inverse segments. The corresponding height function at that point is the difference between the length traversed along the forward segments and the length already retraced.

At the end of the curve, all paths and inverse paths will have been traced, so $h(1) = 0$. Note that $h(t_1) + h(t_2) - 2 \inf_{u \in [t_1, t_2]} h(u)$ represents the length of a curve from $\Gamma(t_1)$ to $\Gamma(t_2)$, which must be at least $\| \Gamma(t_1) - \Gamma(t_2) \|$ since the latter is the length of the straight line path. Lastly, since the derivative of $\Gamma(t)$ is bounded over the closed interval, the arc length function, and thus the height function, is Lipschitz. Thus, $\Gamma$ is Lipschitz tree-like.

Next, suppose $\Gamma$ is Lipschitz tree-like, and suppose to the contrary that $\Gamma$ cannot be reduced to a point. Let $\Gamma_r$ be the irreducible reduction of $\Gamma$. Then, $\Gamma * \Gamma_r^{-1}$ is point reducible, and thus Lipschitz tree-like by the first part of the proof. Thus, $\Gamma$ and $\Gamma_r$ are tree-like equivalent, so by the equivalence relation, if $\Gamma_r$ is not tree-like, then $\Gamma$ is also not tree-like. Thus, we assume $\Gamma$ is reduced so that it is irreducible. The height function $h(t)$ is Lipschitz continuous, so there exists some local maximum at $t = t_m$. Next, choose $t_1 < t_m$ and $t_2 > t_m$ such that the following hold:
• $h(t_1) = h(t_2) = h(t_m) - \epsilon$ for some $\epsilon > 0$,
• $\inf_{u \in [t_1, t_2]} h(u) = h(t_m) - \epsilon$, and
• $\Gamma(t_1) \neq \Gamma(t_2)$.
The first two conditions are possible because $h(t)$ is continuous, and the last condition is possible because $\Gamma$ is irreducible. Therefore, we have
\[ 0 < \| \Gamma(t_1) - \Gamma(t_2) \| \le h(t_1) + h(t_2) - 2 \inf_{u \in [t_1, t_2]} h(u) = 0, \]
a contradiction. $\square$

Now we state the characterization theorem, which was proved by Chen [10] for irreducible piecewise regular continuous paths, and generalized in [24] to bounded variation paths $BV(\mathbb{R}^N) \subset P\mathbb{R}^N$.

Theorem 1.7 ([24]).

Suppose $\Gamma_1, \Gamma_2 \in BV(\mathbb{R}^N)$. Then $S(\Gamma_1) = S(\Gamma_2)$ if and only if they are tree-like equivalent.

In fact, this statement is even stronger when we consider the algebraic structure of the group of equivalence classes $\Sigma$ and the group-like elements in formal power series. An element $P \in T(\mathbb{R}^N)$ has a multiplicative inverse if and only if it has a nonzero constant term. Therefore, the restriction $\widetilde{T}(\mathbb{R}^N)$ to formal power series with constant term 1 is a group under multiplication. Note that $S(\Gamma) \in \widetilde{T}(\mathbb{R}^N)$ by definition. One of Chen's original results [8] showed that the path signature map respects the multiplicative structure of paths and the formal power series. Namely, given $\Gamma_1, \Gamma_2 \in P\mathbb{R}^N$, we have
\[ S(\Gamma_1 * \Gamma_2) = S(\Gamma_1) \otimes S(\Gamma_2). \]
Thus, the above theorem can be succinctly restated.
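Chen's multiplicative identity can be made concrete at low levels: the level-1 part of $S(\Gamma_1) \otimes S(\Gamma_2)$ is the sum of the two level-1 terms, and the level-2 part is $S^{i,j}(\Gamma_1) + S^i(\Gamma_1) S^j(\Gamma_2) + S^{i,j}(\Gamma_2)$. The following sketch checks this numerically for two polygonal paths; the helper names `signature_level2` and `concatenate` and the random sample paths are assumptions made for illustration only.

```python
import numpy as np

def signature_level2(samples):
    """Levels 1 and 2 of the signature of the polygonal path through `samples` (shape (T, N))."""
    x = np.asarray(samples, dtype=float)
    inc = np.diff(x, axis=0)
    level1 = x[-1] - x[0]
    level2 = (x[:-1] - x[0]).T @ inc + 0.5 * inc.T @ inc
    return level1, level2

def concatenate(p1, p2):
    """Concatenation as in (1.3): translate p2 so that it starts where p1 ends.

    The time parametrization is ignored, which is harmless by reparametrization invariance.
    """
    shifted = p2 + (p1[-1] - p2[0])
    return np.vstack([p1, shifted[1:]])

rng = np.random.default_rng(0)
p1 = rng.normal(size=(50, 3)).cumsum(axis=0)   # hypothetical sample paths in R^3
p2 = rng.normal(size=(70, 3)).cumsum(axis=0)

a1, a2 = signature_level2(p1)
b1, b2 = signature_level2(p2)
c1, c2 = signature_level2(concatenate(p1, p2))

# Level 1 and level 2 components of S(p1) (tensor) S(p2).
product_level1 = a1 + b1
product_level2 = a2 + np.outer(a1, b1) + b2
print(np.allclose(c1, product_level1), np.allclose(c2, product_level2))
```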

Theorem 1.8 ([24]).

The signature map $S : \Sigma \to \widetilde{T}(\mathbb{R}^N)$ is an injective group homomorphism.

That is, the path signature provides a complete set of invariants for paths up to tree-like equivalence, meaning any reparametrization-invariant property of such equivalence classes can be derived using the signature terms. Thus, any property of time series that does not rely on the parameterization can be extracted from the signature.

This point of view is further emphasized in recent results by Chevyrev and Oberhauser [14], which state that a normalized variant of the signature map $\tilde{S}$ is universal to the class $C_b(\Sigma, \mathbb{R})$ of continuous bounded functions on $\Sigma$, with respect to the strict topology, and is characteristic to the space of finite regular Borel measures on $\Sigma$. Loosely speaking, universal to $C_b(\Sigma, \mathbb{R})$ means that any continuous, bounded function $\phi : \Sigma \to \mathbb{R}$ can be approximated by a linear functional $\phi \approx \langle \ell, \tilde{S}(\cdot) \rangle$, where $\ell \in \widetilde{T}(\mathbb{R}^N)^*$. Namely, in the context of classification tasks, any decision boundary defined by a function in $C_b(\Sigma, \mathbb{R})$ can be represented as a linear decision boundary in $\widetilde{T}(\mathbb{R}^N)$ under the signature map. This provides theoretical justification for the classification tasks discussed in the next section. Characteristic means that finite, regular Borel measures on $\Sigma$ are characterized by their expected normalized signatures (in the same way that probability measures with compact support on $\mathbb{R}^N$ are characterized by their moments).

In addition to the multiplicative property of the signature, there exist a host of other properties, stemming from another early result of Chen that
\[ \log(S(\Gamma)) := \sum_{j \ge 1} \frac{(-1)^{j-1}}{j} (S(\Gamma) - 1)^{\otimes j} \]
is a Lie series for any path $\Gamma$ [9]. This fact is equivalent to a shuffle product identity [37], providing an internal multiplicative structure for the path signature.

Definition 1.9.

Let $k$ and $l$ be non-negative integers. A $(k, l)$-shuffle is a permutation $\sigma$ of the set $\{1, 2, \ldots, k + l\}$ such that
\[ \sigma^{-1}(1) < \sigma^{-1}(2) < \ldots < \sigma^{-1}(k) \]
and
\[ \sigma^{-1}(k + 1) < \sigma^{-1}(k + 2) < \ldots < \sigma^{-1}(k + l). \]
We denote by $Sh(k, l)$ the set of $(k, l)$-shuffles.

Given two finite ordered multi-indices $I = (i_1, \ldots, i_k)$ and $J = (j_1, \ldots, j_l)$, let
\[ R = (r_1, \ldots, r_k, r_{k+1}, \ldots, r_{k+l}) = (i_1, \ldots, i_k, j_1, \ldots, j_l) \]
be the concatenated multi-index. The shuffle product of $I$ and $J$ is defined to be the multiset
\[ I \sqcup\!\sqcup J = \{ (r_{\sigma(1)}, \ldots, r_{\sigma(k+l)}) \mid \sigma \in Sh(k, l) \}. \]
As an example, suppose $I = (1, 2)$ and $J = (2, 3)$. Then
\[ I \sqcup\!\sqcup J = \{ (1,2,2,3), (1,2,2,3), (2,1,2,3), (1,2,3,2), (2,1,3,2), (2,3,1,2) \}. \]
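The shuffle product of two multi-indices can be enumerated by choosing which positions of the output word are filled from $I$, with the remaining positions filled from $J$, each in its original order. This is a minimal sketch using only the Python standard library; the function name `shuffle_product` is an illustrative choice, and the output is a list so that repeated words in the multiset are kept.

```python
from itertools import combinations

def shuffle_product(I, J):
    """Return the multiset of shuffles of I and J as a list of tuples (with repetitions)."""
    k, l = len(I), len(J)
    result = []
    # Each choice of k positions for the entries of I determines one (k, l)-shuffle.
    for positions in combinations(range(k + l), k):
        chosen = set(positions)
        it_I, it_J = iter(I), iter(J)
        word = tuple(next(it_I) if p in chosen else next(it_J) for p in range(k + l))
        result.append(word)
    return result

for word in shuffle_product((1, 2), (2, 3)):
    print(word)
```

For $I = (1, 2)$ and $J = (2, 3)$ this produces six words, with $(1, 2, 2, 3)$ appearing twice, which is why the shuffle product is a multiset rather than a set.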

Theorem 1.10 ([37]).

Let $I$ and $J$ be multi-indices in $[N]$. Then
\[ S^I(\Gamma) S^J(\Gamma) = \sum_{K \in I \sqcup\!\sqcup J} S^K(\Gamma). \]

Proof. Let $R = (r_1, \ldots, r_k, r_{k+1}, \ldots, r_{k+l}) = (i_1, \ldots, i_k, j_1, \ldots, j_l)$. Writing out the signature on the left side of the equation using Equation 1.1, we get
\[ \int_{\Delta^k \times \Delta^l} \gamma_{r_1}'(t_1) \cdots \gamma_{r_{k+l}}'(t_{k+l}) \, dt_1 \cdots dt_{k+l}, \]
and the sum on the right side is
\[ \sum_{\sigma \in Sh(k,l)} \int_{\Delta^{k+l}} \gamma_{r_{\sigma(1)}}'(t_1) \cdots \gamma_{r_{\sigma(k+l)}}'(t_{k+l}) \, dt_1 \cdots dt_{k+l}. \]
The equivalence of the two formulas is given by the standard decomposition of $\Delta^k \times \Delta^l$ into $(k + l)$-simplices,
\[ \Delta^k \times \Delta^l = \{ (t_1, \ldots, t_{k+l}) \mid 0 < t_1 < \ldots < t_k < 1, \ 0 < t_{k+1} < \ldots < t_{k+l} < 1 \} = \bigsqcup_{\sigma \in Sh(k,l)} \{ (t_{\sigma(1)}, \ldots, t_{\sigma(k+l)}) \mid 0 < t_1 < \ldots < t_{k+l} < 1 \}. \]
$\square$

Note in particular that this implies the signature terms are not independent. For example, the shuffle formula says that $S^{1,2}(\Gamma) = S^1(\Gamma) S^2(\Gamma) - S^{2,1}(\Gamma)$. Thus computation of all signature terms, even truncated to a finite level, results in redundant information. Basis sets for Lie series exist [37], and Lyndon bases have been considered for signature computations [35, 36]. Further pertinent results related to Lie series can be found in [37].

Another property of central importance in data analysis is continuity of the signature map. Let $k \in \mathbb{N}$, then define the map $\pi_k : T(\mathbb{R}^N) \to T^k(\mathbb{R}^N)$ to be the projection to the $k$th tensor level. Additionally, we equip $T^k(\mathbb{R}^N)$ with the norm
\[ |P|_k := \sqrt{\sum_{i_1, \ldots, i_k} |P_{i_1, \ldots, i_k}|^2}, \quad \text{for all } P = \sum_{i_1, \ldots, i_k} P_{i_1, \ldots, i_k} X_{i_1} \otimes \ldots \otimes X_{i_k}. \]
Recall that $BV(\mathbb{R}^N)$ is equipped with the 1-variation norm defined in Equation 1.2. With respect to these two norms, we obtain the following continuity result.

Proposition 1.11 ([18]).

Suppose $\Gamma_1, \Gamma_2 \in BV(\mathbb{R}^N)$ and $L \ge \max_{i=1,2} |\Gamma_i|_{1\text{-var}}$. Then, for all $k \ge 1$, there exist constants $C_k > 0$ such that
\[ | \pi_k(S(\Gamma_1) - S(\Gamma_2)) |_k \le C_k L^{k-1} |\Gamma_1 - \Gamma_2|_{1\text{-var}}. \]
Additional analytic and geometric properties of the signature, along with applications to rough paths, can be found in [18].

2. Applications to time series analysis

The signature provides a faithful embedding of bounded variation paths, up to tree-like equivalence, into the formal power series algebra of tensors. By considering the truncated signature at some level $L \in \mathbb{N}$,
\[ S_L(\Gamma) = 1 + \sum_{k=1}^{L} \sum_{I = (i_1, \ldots, i_k)} S^I(\Gamma) \, X_{i_1} \otimes \ldots \otimes X_{i_k}, \]
we obtain a finite feature set $\{ S^I(\Gamma) \}_{|I| \le L}$ for a multi-dimensional time series, whose size does not depend on the length of the time series. One may draw parallels between the signature representation of a path and various series representations of functions such as Taylor series or Fourier series. However, there are two important differences:
(1) The Taylor series and Fourier series coefficients are linearly independent functionals, and provide a minimal set of features to describe functions. However, as described in the previous section, the full collection of path signatures $S(\cdot)$ is not independent and includes redundant information, though there do exist bases for the signature such as the Lyndon basis [35].
(2) Series representations of functions are linear, whereas the path signature is highly nonlinear. On the one hand, nonlinearity of the signature may capture nontrivial, discriminatory aspects of paths with fewer features than a linear representation. On the other hand, nonlinearity makes the inversion problem of finding a path with a given signature significantly more difficult. A general method for continuous paths is given in [29], and another method for piecewise linear paths is given in [28]. An algebraic-geometric approach to the problem was recently established in [1].
The feature set obtained from the truncated signature has recently been used in a variety of machine learning classification problems.
Early examples include applications to financial time series [22] and handwritten character recognition [41]. Other examples include classifying time series of self-reported mood scores to distinguish between bipolar and borderline personality disorders [2], and classifying time series of different brain region volumes to detect a diagnosis of Alzheimer's disease [30]. The surveys [12, 26] further discuss these applications, along with different ways to transform the time series such that the path is better suited for signature analysis.

The path signature feature set has also been successful in situations where the data isn't naturally a path. This is the case in [13], in which the path signature is used in conjunction with persistent homology to build a feature set for barcodes, a topological summary of a data set. Barcodes have no standard description as a vector of fixed dimension, and this method provides such a description, allowing techniques from topological data analysis to be used with standard machine learning algorithms. The proposed pipeline consists of the following compositions:
\[ \mathbf{Met} \xrightarrow{PH} \mathbf{Bar} \xrightarrow{\iota} BV(\mathbb{R}^N) \xrightarrow{S_L} \mathbb{R}\langle\langle X \rangle\rangle. \]
The map $PH : \mathbf{Met} \to \mathbf{Bar}$ refers to the persistent homology functor, which assigns a barcode to the input data represented by a metric space (such as a point cloud in Euclidean space) [19]. The barcode can then be transformed into a path in Euclidean space by the transformation $\iota$, and finally the truncated signature $S_L$ is computed. Several transformations $\iota$ from barcodes to paths are considered in the paper, and several are applied in this pipeline, resulting in state-of-the-art performance on some standard classification benchmarks.

These applications demonstrate the utility of using path signature terms for classification tasks. However, as posited in the opening discussion, the power of topological tools lies in their interpretability. Thus, we now turn our attention to the question of how path signatures encode human-understandable properties of multivariate time series. We begin with the notion of signed area and cyclicity, which is a way to study lead-lag relationships between time series in the absence of periodicity. This weak structure is difficult to capture with classical methods for time series analysis, which rely on the regularity of the parameterization to decompose the time series. To address this difficulty, Baryshnikov [4] suggested the use of path signatures to characterize cyclicity. Next, we consider how the second level signature terms can be viewed as a measure of causality.

2.1. Cyclicity and Lead-Lag Relationships.

We begin by explicitly computing the first two levels of the path signature. Again, we consider a collection of $N$ simultaneous time series $\gamma_i : [0,1] \to \mathbb{R}$, viewed as a path $\Gamma \in P\mathbb{R}^N$. By definition, we can compute
\[ S^i(\Gamma) = \int_0^1 \gamma_i'(t) \, dt = \gamma_i(1) - \gamma_i(0), \]
\[ S^{i,j}(\Gamma) = \int_0^1 S^i(\Gamma)(t) \gamma_j'(t) \, dt = \int_0^1 (\gamma_i(t) - \gamma_i(0)) \gamma_j'(t) \, dt. \]
The second level signature terms of a path in $\mathbb{R}^2$ are shown as the shaded areas in the following figure, where solid blue represents positive area, and hatched red represents negative area.

[Figure: three panels showing $S^{1,2}$, $S^{2,1}$, and $\frac{1}{2}(S^{1,2} - S^{2,1})$ as shaded areas.]

The third panel suggests that the linear combination $\frac{1}{2}(S^{i,j}(\Gamma) - S^{j,i}(\Gamma))$ encodes some information intrinsic to the path $\Gamma$.

Definition 2.1.

Let $\alpha : [0,1] \to \mathbb{R}^2$ be a continuous closed curve defined by $\alpha(t) = (\alpha_1(t), \alpha_2(t))$ and $x = (x_1, x_2) \in \mathbb{R}^2 \setminus \mathrm{im}(\alpha)$. We can rewrite $\alpha(t)$ in terms of polar coordinates $\alpha(t) = (r_{\alpha,x}(t), \theta_{\alpha,x}(t))$ centered at $x$, where
\[ r_{\alpha,x}(t) = | \alpha(t) - x |, \quad \theta_{\alpha,x}(0) = \tan^{-1} \left( \frac{\alpha_2(0) - x_2}{\alpha_1(0) - x_1} \right), \]
and $\theta_{\alpha,x}(t)$ is defined via continuity. The winding number of $\alpha$ with respect to $x$ is
\[ \eta(\alpha, x) = \frac{\theta_{\alpha,x}(1) - \theta_{\alpha,x}(0)}{2\pi}. \]

Proposition 2.2.

Suppose $\Gamma \in P\mathbb{R}^N$, and let $\tilde{\Gamma} = (\tilde{\gamma}_1, \ldots, \tilde{\gamma}_N)$ be the concatenation of $\Gamma$ with a linear path connecting $\Gamma(1)$ to $\Gamma(0)$. In addition, let $\tilde{\Gamma}_{i,j} = (\tilde{\gamma}_i(t), \tilde{\gamma}_j(t))$. Then
\[ A_{i,j}(\Gamma) := \frac{1}{2} \left( S^{i,j}(\Gamma) - S^{j,i}(\Gamma) \right) = \int_{\mathbb{R}^2} \eta(\tilde{\Gamma}_{i,j}, x) \, dx, \]
which is called the signed area.

Proof. We begin by assuming $\Gamma(0) = 0$ by translation invariance. Next, we show that $A_{i,j}(\Gamma) = A_{i,j}(\tilde{\Gamma})$. In the time interval $t \in [1/2, 1]$, we have $\tilde{\gamma}_i(t) = m_i t + b_i$ where $m_i = -b_i$, since the path must end at $\tilde{\gamma}_i(1) = 0$. Then, we have
\[ A_{i,j}(\tilde{\Gamma}) = \frac{1}{2} \int_0^1 \tilde{\gamma}_i(t) \tilde{\gamma}_j'(t) - \tilde{\gamma}_j(t) \tilde{\gamma}_i'(t) \, dt = A_{i,j}(\Gamma) + \frac{1}{2} \int_{1/2}^1 (m_i t + b_i) m_j - (m_j t + b_j) m_i \, dt = A_{i,j}(\Gamma), \]
where the last integrand equals $b_i m_j - b_j m_i$, which vanishes because $b_i = -m_i$ and $b_j = -m_j$. Finally, by applying Stokes' theorem, we get
\[ A_{i,j}(\tilde{\Gamma}) = \frac{1}{2} \oint_{\tilde{\Gamma}} x_i \, dx_j - x_j \, dx_i = \int_{\mathbb{R}^2} \eta(\tilde{\Gamma}_{i,j}, x) \, dx. \]
$\square$

In the third panel of the figure above, blue corresponds to a winding number of 1 whereas red corresponds to a winding number of $-1$, resulting in the same interpretation as the formula. More generally, it was shown in [5] that all moments of the winding number of the curve $\tilde{\Gamma} - \tilde{\Gamma}(0)$ can be computed by linear combinations of signature terms of $\Gamma$, and conversely that the first four terms of $\log S(\Gamma)$ can be expressed using only the function $\eta(\tilde{\Gamma} - \tilde{\Gamma}(0), x)$.

The appearance of the winding number suggests that path signatures should be useful in studying periodic time series. However, reparametrization invariance means that the signature naturally captures the broader and increasingly important class of cyclic time series. Cyclic time series are those which can be factored through the circle
\[ \Gamma : [0,1] \xrightarrow{\phi} S^1 \xrightarrow{f} \mathbb{R}^N, \]
where $\phi$ is an orientation-preserving parametrization of the process. Cyclic phenomena arise naturally in a plethora of fields. Some simple examples include physiological processes such as breathing, sleep, the cardiac cycle, and neuronal firing; ecological processes such as the carbon cycle; and control processes involving feedback loops. Despite their repetitive nature, very rarely are such processes truly periodic, or even quasi-periodic, except to a coarse approximation.

One question of interest when studying cyclic processes is whether there exists a lead-lag relationship between two or more signals; such a relationship may indicate causality, or simply provide a predictive signal. Consider the two pairs of time series $\Gamma^a = (\gamma^a_1, \gamma^a_2)$ and $\Gamma^b = (\gamma^b_1, \gamma^b_2)$, shown on the left in the following figure.

These two time series are chosen such that $\Gamma^b$ is simply a reparametrization of $\Gamma^a$, so there exists an orientation-preserving $\phi : [0,1] \to [0,1]$ such that $\Gamma^b = \Gamma^a \circ \phi$.

[Figure: left, the two pairs of time series $\Gamma^a$ and $\Gamma^b$; top right, their unbiased cross-correlations; bottom right, the curve traced out by $(\gamma_1, \gamma_2)$.]

Perhaps the most common method for detecting lead-lag relationships in time series $\Gamma : [0, T] \to \mathbb{R}^2$ is the unbiased cross-correlation, defined by
\[ r(\Gamma)(t_d) = \frac{1}{T - t_d} \int_0^T \gamma_1(t) \gamma_2(t - t_d) \, dt, \]
where $T$ is the total length of the time series and $\Gamma(t) = 0$ when $t \notin [0, T]$. The unbiased cross-correlations of both sets of time series are shown on the top right. The cross-correlation of $\Gamma^a$ has a clear periodic structure of its own, suggesting the presence of a cyclic process in which one signal leads the other. The distance between maxima provides an estimate of the period of the two signals, and the phase shift an estimate of the time delay between $\gamma^a_1$ and $\gamma^a_2$. However, the cross-correlation of $\Gamma^b$ is irregular: although it attains a large value at a negative delay $t_d$, it does not show the clear periodic structure seen for $\Gamma^a$.

Since $\Gamma^b$ is a reparametrization of $\Gamma^a$, they will have the same signed area $A_{1,2}(\Gamma^a) = A_{1,2}(\Gamma^b)$. Indeed, the curve traced out by $(\gamma_1, \gamma_2)$, shown in the bottom right, winds around counter-clockwise 4 times, indicating the four "events" in each time series. The positive signed area suggests a lead-lag relationship for both sets of time series; this equivalence arises because the path signature depends only on ordered, simultaneous measurements, rather than the time between measurements.

In general, we can apply such an analysis to multidimensional time series by calculating the signed area between every pair of time series. In the context of sampled data, this computation boils down to the dot product of vectors, so it is computationally feasible even for large systems, and the additivity of the integrals over partitions of domains means the measure can easily be implemented for streaming data.

Definition 2.3.

Let $\Gamma \in P\mathbb{R}^N$ represent $N$ simultaneous time series. The lead matrix of $\Gamma$ is an $N \times N$ skew-symmetric matrix with entries
\[ (A)_{i,j} = A_{i,j}(\Gamma). \]

The matrix characterizes pairwise lead-lag behavior among a family of simultaneous time series. This method has been applied to the study of fMRI data, distinguishing between patients with tinnitus and those with normal hearing [42]. The skew-symmetric nature of this matrix lends itself to analogies with covariance matrices; however, whereas the covariance matrix measures undirected and temporally independent relationships between variables, the lead matrix measures temporally directed relationships between variables.

Of course, computing the signed area of the entire time series will only provide sensible lead-lag information if this behavior persists throughout the entire time interval. In many scenarios, this is not the case. For example, in gene regulatory networks there are cycles of activity initiated by irregular, external chemical signals. Different signals may induce different cycles of behavior, which may even have inverse lead-lag relationships, so integration across the entire time domain will provide negligible signature. Similarly, in an experimental environment we may perturb a system, necessarily leading to non-stationarity in the observed behavior, in which case the interesting signal would be the change in relationships across different epochs. Such controlled perturbations are, in particular, necessary for rigorous causality inference.

[Figure: panels for three time series arranged in columns, with column titles "Original Time Series" and "Smoothed Time Series".]
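For sampled data, the lead matrix reduces to a pair of matrix products. The sketch below assumes the samples are joined by straight segments, so the signed area of each pair is that of the resulting polygonal path closed by a chord; the function name `lead_matrix` and the two phase-shifted sinusoids are hypothetical choices used only to illustrate the sign convention, in which a positive entry $A_{i,j}$ indicates that series $i$ leads series $j$.

```python
import numpy as np

def lead_matrix(series):
    """Matrix of signed areas A_{i,j} for samples of shape (T, N)."""
    x = np.asarray(series, dtype=float)
    x = x - x[0]                              # translation invariance: start at the origin
    inc = np.diff(x, axis=0)
    # Level-2 signature terms of the polygonal path through the samples.
    level2 = x[:-1].T @ inc + 0.5 * inc.T @ inc
    # Antisymmetric part; the symmetric term 0.5 * inc.T @ inc cancels.
    return 0.5 * (level2 - level2.T)

# Hypothetical example: four cycles of two sinusoids, the second lagging the first.
t = np.linspace(0.0, 8.0 * np.pi, 4001)
series = np.stack([np.sin(t), np.sin(t - 0.5)], axis=1)
A = lead_matrix(series)
print(A)
```

In this example the entry `A[0, 1]` is positive and close to $4\pi \sin(0.5)$, that is, four times the area of the ellipse traced out during one cycle.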

For example, consider the synthetic time series $\Gamma = (\gamma_1, \gamma_2, \gamma_3)$, as shown in the left column of the figure. We wish to detect whether or not there exist any lead-lag cycles that occur on a time scale that is small compared to the entire interval of the time series. Thus we perform signed area computations along a sliding window of the time series. We begin by convolving the time series with a narrow Gaussian as a smoothing preprocessing step to reduce noise. Next, we compute the three signed areas $A_{1,2}$, $A_{1,3}$ and $A_{2,3}$ along a sliding window of fixed length and show them, together with confidence intervals measured in units of the standard deviation $\sigma$, in the third column panels. While formal analysis of the probabilities requires a model of the underlying time series, we can empirically infer that a lead-lag relationship exists if the signed area is outside the confidence interval for a long sequence of consecutive time points. Thus, we likely have an event with positive signed area $A_{i,j}$, in which $\gamma_i$ leads $\gamma_j$, and also an event with negative signed area, in which the second series of the pair leads the first.

This example demonstrates how the path signature may be used to detect lead-lag relationships in a model-free setting. The generality of the path signature can be exploited in other ways, and we describe a different interpretation of the second level signatures in terms of causality in the next section.

2.2. Causality Analysis.

One of the fundamental steps in understanding the function of complex systems is the identification of causal relationships. However, empirically identifying such relationships is challenging, particularly when controlled experiments are difficult or expensive to perform. Three of the most common approaches to causal inference are structural equation modelling, Granger causality, and convergent cross mapping. Like most approaches, these rely on stringent assumptions that may not hold in empirical data. In order to understand these limitations, we first outline these methods, then describe how the second level signature terms can be applied as an assumption-free measurement of potential influences in observational data, and explore some examples of their use.

We follow our previous notation and let $\Gamma(t) = (\gamma_1(t), \ldots, \gamma_N(t))$ denote a collection of $N$ simultaneous time series. In the following examples, we consider whether $\gamma_1(t)$ causally affects $\gamma_2(t)$; the rest of the time series should be interpreted as measured external factors.

Structural equation modelling (SEM) [40, 23] was one of the earliest developments in causal inference. It has more recently been recast by Pearl [33] into a formal framework in which causal relationships can be determined. The fundamental operating principle of SEM is that causal assumptions are codified as hypotheses in the form of a directed graph, called a causal diagram. The nodes represent all variables of interest, and directed edges represent possible causal influences. Note that the crucial information in such a diagram is the absence of edges.

[Figure: an example causal diagram on four nodes.]

Given this causal diagram, the structural equation most commonly used in practice for time series assumes linearity, Gaussian errors and stationarity [7, 25].
It can be viewed as a combination of linear SEM and a vector autoregressive (VAR) model,

$$\Gamma(t) = \sum_{i=0}^{n} \beta_i \Gamma(t - i) + U(t),$$

where $\beta_i$ is a matrix of effect sizes for a given time lag, and $U$ is a vector of random Gaussian variables which represents error. The causal assumptions are encoded in $\beta_i$, which has a zero entry for every directed edge that is omitted from the causal diagram. The goal is then to estimate the parameters $\beta_i$ based on empirical data to determine whether or not causal influences exist.

Another measure of causality in common use is Granger causality [21], which explicitly accounts for the temporal nature of causality, and is often used with time series data. It operates based on two main principles.

(1) (Temporal precedence) The effect does not precede its cause in time.
(2) (Separability) The causal series contains unique information about the affected series that is otherwise not available.

Let $A \perp\!\!\!\perp B \mid C$ denote that $A$ and $B$ are independent given $C$, and let $X^t = \{X(s) \mid s \le t\}$ denote the history of $X(t)$ up to time $t$.

Definition 2.4.

The process $\gamma_1(t)$ is Granger non-causal for the series $\gamma_2(t)$ with respect to $\Gamma = (\gamma_1(t), \gamma_2(t), \gamma_3(t))$ if

$$\gamma_2(t+1) \perp\!\!\!\perp \gamma_1^t \mid \gamma_2^t, \gamma_3^t$$

for all $t \in \mathbb{Z}$; otherwise $\gamma_1(t)$ Granger causes $\gamma_2(t)$ with respect to $\Gamma$.

The idea behind this definition is that $\gamma_1$ does not causally influence $\gamma_2$ if future values of $\gamma_2$ are independent of all past values of $\gamma_1$, conditioned on past values of $\gamma_2$ and any external factors $\gamma_3$.

A measure of Granger causality is determined by a comparison of predictive power [6]. Let $\Gamma = (\gamma_1, \gamma_2, \gamma_3)$ and $\tilde{\Gamma} = (\gamma_2, \gamma_3)$, and assume that these time series are modeled by a VAR process. To test the independence criterion in Granger causality, we fit two VAR models

$$\Gamma(t) = \sum_{i=1}^{n} A_i \Gamma(t - i) + U(t), \tag{2.1}$$

$$\tilde{\Gamma}(t) = \sum_{i=1}^{n} \tilde{A}_i \tilde{\Gamma}(t - i) + \tilde{U}(t). \tag{2.2}$$

The prediction accuracy of either model is determined by the variance of the residual, $\operatorname{var}(U(t))$. Thus, the empirical notion of Granger causality is defined by

$$C_{\gamma_1 \to \gamma_2} = \ln \frac{\operatorname{var}(\tilde{U}(t))}{\operatorname{var}(U(t))}.$$

The separability assumption is untrue in many situations. Prominent examples are deterministic dynamical systems with coupling between variables, such as in a feedback loop. This is clear from Takens' theorem.
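Before turning to the theorem, the empirical Granger measure defined above can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the procedure of [6]: it fits the full and reduced models by ordinary least squares, compares only the residual variance of the effect component (a common convention, narrower than the vector residual written in the text), and uses a synthetic VAR(1) system with made-up coefficients in which the first component drives the second. All function names are hypothetical.

```python
import numpy as np

def lagged_design(series, n_lags):
    """Stack [x(t-1), ..., x(t-n_lags)] side by side for t = n_lags, ..., T-1."""
    T = series.shape[0]
    return np.hstack([series[n_lags - i:T - i] for i in range(1, n_lags + 1)])

def residual_variance(target, predictors, n_lags):
    """Variance of the least-squares residual when predicting target(t) from lagged predictors."""
    X = lagged_design(predictors, n_lags)
    X = np.hstack([X, np.ones((X.shape[0], 1))])   # intercept column
    y = target[n_lags:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.var(y - X @ coef)

def granger_measure(data, cause, effect, n_lags=1):
    """ln(var(reduced residual) / var(full residual)) for the pair cause -> effect."""
    full = residual_variance(data[:, effect], data, n_lags)
    reduced = residual_variance(data[:, effect], np.delete(data, cause, axis=1), n_lags)
    return np.log(reduced / full)

# Synthetic example (illustrative coefficients): component 0 drives component 1,
# and component 2 is an independent external factor.
rng = np.random.default_rng(0)
T = 2000
data = np.zeros((T, 3))
for t in range(1, T):
    noise = rng.normal(scale=0.1, size=3)
    data[t, 0] = 0.5 * data[t - 1, 0] + noise[0]
    data[t, 1] = 0.5 * data[t - 1, 1] + 0.8 * data[t - 1, 0] + noise[1]
    data[t, 2] = 0.5 * data[t - 1, 2] + noise[2]

c_forward = granger_measure(data, cause=0, effect=1)   # true direction of influence
c_reverse = granger_measure(data, cause=1, effect=0)   # no influence in this direction
```

Because the reduced model is nested in the full one, both values are non-negative in-sample; the forward measure should be clearly larger than the reverse one for this system.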


Theorem 2.5 ([39]). Let $M$ be a compact manifold of dimension $m$. For pairs $(\psi, y)$, where $\psi : M \to M$ is a diffeomorphism and the observation function $y : M \to \mathbb{R}$ is smooth, it is a generic property that the map $\Psi : M \to \mathbb{R}^{2m+1}$ defined by

$$\Psi(x) = \big(y(x), y(\psi(x)), y(\psi^2(x)), \ldots, y(\psi^{2m}(x))\big)$$

is an embedding.

Here we treat $M$ as an invariant manifold of a dynamical system evolving according to a vector field $V$, and the diffeomorphism $\psi$ corresponds to the flow of $V$ with respect to negative time $-\tau$. The observation function is usually taken to be a projection map $\pi_i$ onto the $X_i$ coordinate. In this context, Takens' theorem states that the manifold $M$ is diffeomorphic to reconstructions via the delay embedding $\Psi_i$ using any of the projection maps $\pi_i$, assuming they are generic. Thus, if two variables $X_i$ and $X_j$ are coupled in the dynamical system, then information about the state of one variable $X_i$ exists in the history of another $X_j$.

The final approach to causal inference that we describe takes advantage of this property of dynamical systems. The method of convergent cross mapping (CCM) was developed by Sugihara et al. [38] and later placed in a rigorous mathematical framework [16]. The motivation behind CCM is to understand the causal structure of an $N$-dimensional time series $\Gamma(t)$ which is a trajectory of an underlying deterministic dynamical system

$$\gamma_i'(t) = V_i(\gamma_1, \ldots, \gamma_N).$$

A component $\gamma_1(t)$ causally influences component $\gamma_2(t)$ if the $\gamma_2$ component of the vector field, $V_2(X)$, has a nontrivial dependence on $\gamma_1$. The idea is that if such a nontrivial dependence exists, then one can predict the states in $M_1$ based on the information in $M_2$, where $M_i$ denotes the delay reconstruction of $M$ obtained from $\gamma_i$.
Prediction accuracy should increase as we include more time points in $\Gamma(t)$, and the convergence of prediction accuracy is used as the indication of causal influence.

The three methods of causal inference surveyed here were established based on different notions of causality, and are thus applicable in different scenarios. However, the practical implementations of SEM and Granger causality depend on strong assumptions such as linearity, stationarity and Gaussian noise, which often do not hold for empirical data. Moreover, SEM requires a priori knowledge about the underlying process which may not be well established for complex data sets. CCM moves beyond linear and stationary assumptions to study complex nonlinear systems, but still depends on a dynamical systems model.

We propose the path signature as a model-free measure of causality, in which our only assumption is that of temporal precedence of causal effects. Namely, we wish to detect the observed influence between the various components in our time series. We do not claim that observed influences are truly causal. Omitted external factors may confound observed variables, and various true causal pathways may result in spurious influences.

This approach is motivated by the equation for the second level of the signature, in the case where $\gamma_i(0) = 0$ for all components $i$,

$$S^{i,j}(\Gamma) = \int_0^1 \gamma_i(t) \gamma_j'(t) \, dt.$$

Here the term $\gamma_i(t)$ should be thought of as the distance from the mean of the path component. In practice, this is done by translating each component of the path such that it has mean 0, normalizing the time series to have maximum value 1 (either separately or as a group, depending on the intended application), and appending $\gamma_i(0) = 0$ at the beginning of each time series.

With this context, the integrand can be viewed as a measure of how the magnitude of $\gamma_i(t)$ influences the rate of change $\gamma_j'(t)$ of $\gamma_j$.
By integrating over the entire path, we obtain an aggregate measure of the influence of $\gamma_i$ on the change in $\gamma_j$ over the given time interval. As such, the second order signatures provide a measure of potential observed influence, indicating possible causal relationships between variables using only observations of time series, without any prior assumptions. Of course, this method will not be able to distinguish between true and spurious causal relations (due to confounders, for example). However, such caveats would necessarily apply to any system in the absence of a model; thus, in addition to providing a coarse measure of causality, one can view this method as a preprocessing step for the model-based methods described above.

We close with a final example, considering the case that the system is known or suspected to be non-stationary. In this setting, a global measure of influence is inappropriate, as we are often interested in the change in such structure when the system changes modes. Fortunately, it is straightforward to modify the signature measure to detect temporally localized influences. This is done by studying the derivative of the signature, which is simply given by the integrand

$$(S^{i,j})'(\Gamma)(t) = \gamma_i(t) \gamma_j'(t).$$

Geometrically, this is the instantaneous area of the arc at the origin of the $(\gamma_i, \gamma_j)$-plane swept out by the pair of time series. If this measure has large magnitude on an interval, it suggests that one of the series is strongly influencing the other during that epoch.

We demonstrate this method using a familiar example of dynamics which exhibit mode-switching. Consider the time series $\Gamma(t) = (\gamma_1(t), \gamma_2(t), \gamma_3(t))$, which represents a portion of a discretized solution to the Lorenz equations

$$\gamma_1'(t) = \sigma(\gamma_2(t) - \gamma_1(t)),$$
$$\gamma_2'(t) = \gamma_1(t)(\rho - \gamma_3(t)) - \gamma_2(t),$$
$$\gamma_3'(t) = \gamma_1(t)\gamma_2(t) - \beta\gamma_3(t),$$

where we have taken the parameters $\sigma = 10$, $\rho = 28$, and $\beta = 8/3$. The equations are solved using the built-in ode45 function in MATLAB. For preprocessing, each component has been translated so that it has mean 0, and an additional point has been appended to the beginning of the time series so that it starts at the origin. Each component is individually normalized so that $\sup(\gamma_i(t)) - \inf(\gamma_i(t)) = 1$.

In the following figure, all six second-level signature derivative terms are shown on the left, with plots of the path projected onto the corresponding plane. Note that the time axis has arbitrary units due to reparametrization invariance. As with the previous example, we use a time-shuffled null model in which the same analysis is performed on the shuffled time series. The null distribution is generated by repeating this procedure 1000 times. The shaded portions of the signature plots correspond to the $3\sigma$ confidence intervals.
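The preprocessing and signature-derivative computation described above can be sketched in a few lines. This is a hedged illustration and not the authors' MATLAB code: a fixed-step fourth-order Runge-Kutta integrator is swapped in for ode45, the initial condition, step size and number of steps are arbitrary assumptions, the function names are hypothetical, and the time-shuffled null model is omitted because its details are not specified here. The discrete signature derivative is taken to be the per-step contribution $\gamma_i \, \Delta\gamma_j$, whose sum over time recovers $S^{i,j}$.

```python
import numpy as np

def lorenz_rk4(n_steps=4000, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Integrate the Lorenz equations with a fixed-step RK4 scheme (stand-in for ode45)."""
    def field(x):
        return np.array([sigma * (x[1] - x[0]),
                         x[0] * (rho - x[2]) - x[1],
                         x[0] * x[1] - beta * x[2]])
    out = np.empty((n_steps, 3))
    x = np.array([1.0, 1.0, 1.0])   # arbitrary initial condition (assumption)
    for k in range(n_steps):
        out[k] = x
        k1 = field(x)
        k2 = field(x + 0.5 * dt * k1)
        k3 = field(x + 0.5 * dt * k2)
        k4 = field(x + dt * k3)
        x = x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return out

def preprocess(path):
    """Center each component, scale its range to 1, and prepend the origin."""
    x = path - path.mean(axis=0)
    x = x / (x.max(axis=0) - x.min(axis=0))
    return np.vstack([np.zeros((1, x.shape[1])), x])

def signature_derivative(path):
    """Per-step terms gamma_i * delta(gamma_j); result has shape (T-1, N, N)."""
    dx = np.diff(path, axis=0)
    mid = 0.5 * (path[:-1] + path[1:])   # midpoint rule, exact for piecewise-linear paths
    return mid[:, :, None] * dx[:, None, :]

raw = lorenz_rk4()
x = preprocess(raw)
dS = signature_derivative(x)   # dS[k, i, j]: contribution of step k to S^{i,j}
S = dS.sum(axis=0)             # second-level signature of the whole path
```

A quick consistency check is the integration-by-parts identity $S^{i,j} + S^{j,i} = \gamma_i(1)\gamma_j(1)$ for a path starting at the origin, which the midpoint discretization satisfies exactly.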

The upper and lower bounds of the confidence intervals are outlined with red and green lines respectively. The time points at which the signature derivative is either above or below the confidence interval are considered significant, and are respectively colored red or green in the plot on the right.

We observe the expected result in the first row: the signature derivative $(S^{i,j})'$ picks out sections of the plot in which $\gamma_i$ is positive (negative) and $\gamma_j$ is increasing (decreasing). The opposite trend, sections in which $\gamma_i$ is positive (negative) and $\gamma_j$ is decreasing (increasing), is seen in the green time points.

3. Generalizations and Outlook

We have seen that path signatures provide a natural feature set for studying multivariate time series. In addition, we have discussed ways to view the second level signature terms in order to understand the path signature in an interpretable manner. In this section, we outline two directions for generalizations of these ideas to more complex settings, which will be further discussed in forthcoming work by the authors.

The first direction is to consider the full Chen cochain model $\mathrm{Chen}(P\mathbb{R}^N)$, alluded to in Section 1 and further discussed in Appendix A, which is a subcomplex of the de Rham complex of differential forms on $P\mathbb{R}^N$. The iterated integrals described thus far are the 0-forms in this cochain model, and we have seen that these cochains describe properties of individual points of $P\mathbb{R}^N$.

Integration of the 1-forms of $\mathrm{Chen}(P\mathbb{R}^N)$ along paths in $P\mathbb{R}^N$, interpreted as parametrized families of time series, provides information about such a family. To draw an analogy, consider the case of differential forms on $\mathbb{R}^N$. The 0-forms are simply functions, which provide information about individual points in $\mathbb{R}^N$, while integration of 1-forms provides information about paths in $\mathbb{R}^N$. For example, integration of $dx_i$ along a path tells us the displacement in the $x_i$ coordinate.

The simplest example of a 1-form in $\mathrm{Chen}(P\mathbb{R}^N)$ is generated by a single 2-form on $\mathbb{R}^N$. We follow the construction in Definition A.3 to obtain our desired 1-form. Suppose $\omega = dx_i \wedge dx_j$, and suppose $\alpha : I \to P\mathbb{R}^N$ is a family of paths. Associated to such a family is the map $\bar{\alpha} : I \times I \to \mathbb{R}^N$, defined by $\bar{\alpha}(s, t) = \alpha(s)(t)$. The pullback of $\omega$ with respect to $\bar{\alpha}$ is

$$\bar{\alpha}^*(\omega) = \left( \frac{\partial \bar{\alpha}_i}{\partial s} \frac{\partial \bar{\alpha}_j}{\partial t} - \frac{\partial \bar{\alpha}_i}{\partial t} \frac{\partial \bar{\alpha}_j}{\partial s} \right) ds \wedge dt.$$

The 1-form in $\mathrm{Chen}(P\mathbb{R}^N)$ with respect to $\omega$, viewed under the plot $\alpha$, is defined to be

$$\left( \int \omega \right)_{\alpha} = \left( \int_0^1 \frac{\partial \bar{\alpha}_i}{\partial s} \frac{\partial \bar{\alpha}_j}{\partial t} - \frac{\partial \bar{\alpha}_i}{\partial t} \frac{\partial \bar{\alpha}_j}{\partial s} \, dt \right) ds.$$

We can think of this expression as the pullback of the 1-form $\int \omega$ along $\alpha$. Thus, integrating over the family of paths corresponds to integrating over $s$, and we obtain

$$\int_{\alpha} \left( \int \omega \right) = \int_0^1 \int_0^1 \frac{\partial \bar{\alpha}_i}{\partial s} \frac{\partial \bar{\alpha}_j}{\partial t} - \frac{\partial \bar{\alpha}_i}{\partial t} \frac{\partial \bar{\alpha}_j}{\partial s} \, dt \, ds.$$

Note that the integrand is the determinant of the Jacobian of $\bar{\alpha}_{i,j} = (\bar{\alpha}_i, \bar{\alpha}_j)$. Therefore, integration of $\int \omega$ along a family of paths yields the area of the region $\bar{\alpha}_{i,j}(I^2)$, as shown in the figure.

[Figure: the map $\bar{\alpha}_{i,j}$ from the $(s, t)$ square to the $(x_i, x_j)$-plane.]

Although the information in $\int \omega$ may seem elementary, this example produces the simplest 1-form by Chen's construction, analogous to the first level signature terms $S^i$. The idea is to mimic the construction of the path signature, and construct iterated integral forms out of a 2-form $\omega \in A^2_{dR}(\mathbb{R}^N)$ and several 1-forms $\omega_i \in A^1_{dR}(\mathbb{R}^N)$ in different permutations to acquire more sophisticated properties of these families of paths. In fact, by considering the $p$-forms in $\mathrm{Chen}(P\mathbb{R}^N)$, we can study multiparameter families of paths in a similar manner.

The second direction is to consider spaces more general than $P\mathbb{R}^N$. Namely, we can think of the path space as the mapping space $P\mathbb{R}^N = \mathrm{Map}(I, \mathbb{R}^N)$, and consider iterated integral cochain models for more general mapping spaces. In fact, Chen's definition of the path signature was not restricted to paths on $\mathbb{R}^N$, but rather paths on differentiable manifolds $M$. The definition of the path signature is the same as Definition 1.1, except we replace the standard 1-forms with a collection of forms $\omega_1, \ldots, \omega_m \in A^1_{dR}(M)$. Most of the algebraic properties from Section 1 still hold for path signatures in $PM$. In the following theorem $C^r$ denotes $r$-times continuously differentiable.

Theorem 3.1 ([10]).
Let M be a C r manifold with r ≥ , and suppose ω , . . . , ω m ∈ A dR ( M ) such that they span the cotangent bundle T ∗ M at every point. Then if Γ , Γ ∈ P M are irreducible piecewise-regular continuous paths such that Γ (0) =Γ (0) and S (Γ ) = S (Γ ) , then Γ is a reparametrization of Γ . This is the analogous statement of Theorem 1.8 but for

$PM$ rather than $P\mathbb{R}^N$. This theorem states that the path signature for manifolds is still a faithful representation of paths. Thus, it provides a complete reparametrization-invariant feature set for multivariate time series that naturally lie on a manifold. For example, a time series of phases of a collection of oscillators would be a path on a toroidal manifold. Another example may be a time series of states of some dynamical system, which may be a trajectory on an invariant manifold.

We can generalize further and consider mapping spaces $\mathrm{Map}(Y, M)$, where $Y$ is a topological space such that $\operatorname{conn}(M) \ge \dim(Y)$. In this case, there exists a generalized iterated integral cochain model for the mapping space, which is developed in [32, 20]. This setting would allow for the study of data which is naturally modeled by elements of such mapping spaces. Possible examples include vector fields over an embedded manifold $M \subset \mathbb{R}^N$, which can be modelled by the mapping space $\mathrm{Map}(M, \mathbb{R}^N)$.

Acknowledgements

D.L. is supported by the Office of the Assistant Secretary of Defense Research & Engineering through ONR N00014-16-1-2010, and the Natural Sciences and Engineering Research Council of Canada (NSERC) PGS-D3.

Appendix A. Path Space Cochains

Chen's formulation of a cochain model begins by defining a de Rham-type cochain complex $A_{dR}$ on a general class of spaces called differentiable spaces, generalizing the usual differential forms defined on manifolds. Path spaces are examples of differentiable spaces, and thus are associated with such a de Rham cochain complex. By defining iterated integrals using higher-degree forms on $\mathbb{R}^N$, rather than the 1-forms used in Definition 1.1, we obtain forms on $P\mathbb{R}^N$ rather than functions. Finally, he shows that the forms generated by iterated integrals form a subcomplex of $A_{dR}$ which is in fact quasi-isomorphic to $A_{dR}$. A detailed account of this construction is found in [11], and a more modern treatment can be found in [17].

Smooth structures are defined on manifolds by using charts to exploit the well-defined notion of smoothness on Euclidean space. Charts can be viewed as probes into the local structure of a manifold. However, as homeomorphisms of some Euclidean space of fixed dimension, charts are a rather rigid way to view local structure, as they are both maps into and out of a manifold. Differentiable spaces relax this homeomorphism condition, and only require their plots, the differentiable space analog of a chart, to map into the space. Baez and Hoffnung [3] further discuss these ideas, along with categorical properties of differentiable spaces.

Definition A.1. A differentiable space is a set $X$ equipped with, for every dimension $n$ and every convex set $C \subseteq \mathbb{R}^n$ with nonempty interior, a collection of functions $\phi : C \to X$ called plots, satisfying the following:
(1) (Closure under pullback) If $\phi$ is a plot and $f : C' \to C$ is smooth, then $\phi f$ is a plot.
(2) (Open cover condition) Suppose the collection of convex sets $\{C_j\}$ forms an open cover of the convex set $C$, with inclusions $i_j : C_j \hookrightarrow C$. If $\phi i_j$ is a plot for all $j$, then $\phi$ is a plot.
(3) (Constant plots) Every map $f : \mathbb{R}^0 \to X$ is a plot.

It is clear that any manifold is a differentiable space by taking all smooth maps $\phi : C \to M$ to be plots. We obtain a canonical differentiable space structure on $PM$ by noting that, given any map $\alpha : C \to PM$, there is an associated adjoint map $\bar{\alpha} : I \times C \to M$ defined by $\bar{\alpha}(t, x) = \alpha(x)(t)$. Consider the collection of all maps $\alpha : C \to PM$ for which the adjoint $\bar{\alpha}$ is a smooth map, which clearly satisfies the first and third conditions. To obtain a collection of plots on $PM$, we additionally include all maps $\alpha : C \to PM$ such that the hypothesis of the second condition is true.

Definition A.2. A $p$-form $\omega$ on a differentiable space $X$ is an assignment of a $p$-form $\omega_\phi$ on $C$ to each plot $\phi : C \to X$ such that if $f : C' \to C$ is smooth, then $\omega_{\phi f} = f^* \omega_\phi$. The collection of $p$-forms on $X$ is denoted $A^p_{dR}(X)$, and the graded collection of all forms on $X$ is $A_{dR}(X)$.

Linearity, the wedge product, and the exterior derivative are all defined plot-wise. Namely, given $\omega, \omega_1, \omega_2 \in A_{dR}(X)$, $\lambda \in \mathbb{R}$, and any plot $\phi : C \to X$,
• $(\omega_1 + \lambda \omega_2)_\phi = (\omega_1)_\phi + \lambda (\omega_2)_\phi$,
• $(\omega_1 \wedge \omega_2)_\phi = (\omega_1)_\phi \wedge (\omega_2)_\phi$, and
• $(d\omega)_\phi = d\omega_\phi$.

Therefore, $A_{dR}(X)$ has the structure of a commutative differential graded algebra, and we may define the de Rham cohomology

$$H^*_{dR}(X) := H^*(A_{dR}(X))$$

of differentiable spaces.

From here forward, we will focus on the case of forms on $P\mathbb{R}^N$, for which there is a special, easily understood class of forms defined using iterated integrals. Much of what we explicitly construct can be lifted to paths in manifolds of interest, or to more general mapping spaces, and will be discussed in forthcoming work by the authors.

Definition A.3.

Let $\omega_1, \ldots, \omega_k$ be forms on $\mathbb{R}^N$ with $\omega_i \in A^{q_i}_{dR}(\mathbb{R}^N)$. The iterated integral $\int \omega_1 \ldots \omega_k$ is a $((q_1 + \ldots + q_k) - k)$-form on $P\mathbb{R}^N$ defined as follows. Let $\alpha : C \to P\mathbb{R}^N$ be a plot with adjoint $\bar{\alpha} : C \times I \to \mathbb{R}^N$. Decompose the pullback of $\omega_i$ along $\bar{\alpha}$ on $C \times I$ as

$$\bar{\alpha}^*(\omega_i)(x, t) = dt \wedge \omega_i'(x, t) + \omega_i''(x, t),$$

where $\omega_i'$ is a $(q_i - 1)$-form and $\omega_i''$ is a $q_i$-form on $C \times I$, both without a $dt$ term. Then, the iterated integral is defined as

$$\left( \int \omega_1 \ldots \omega_k \right)_{\alpha} = \int_{\Delta^k} \omega_1'(x, t_1) \wedge \ldots \wedge \omega_k'(x, t_k) \, dt_1 \ldots dt_k.$$

Note the conceptual similarities between this definition and the one given in Definition 1.1. In the language of our present formulation, $S^I(\Gamma)$, as given in Equation 1.1, is the iterated integral where $\omega_l = dx_{i_l}$, viewed through the one-point plot $\alpha_\Gamma : \{*\} \to P\mathbb{R}^N$ defined by $\alpha_\Gamma(*) = \Gamma$.

Definition A.4.

Let $\mathrm{Chen}(P\mathbb{R}^N)$ be the sub-vector space of forms on $P\mathbb{R}^N$ generated by

$$\pi_0^*(\omega_0) \wedge \int \omega_1 \ldots \omega_k \wedge \pi_1^*(\omega_{k+1}),$$

where
• $\omega_i \in A_{dR}(\mathbb{R}^N)$, for $i = 0, \ldots, k+1$,
• $\int \omega_1 \ldots \omega_k$ is the iterated integral in the previous definition, and
• $\pi_0, \pi_1 : P\mathbb{R}^N \to \mathbb{R}^N$ are the evaluation maps at 0 and 1 respectively.

Theorem A.5 ([11]). The complex $\mathrm{Chen}(P\mathbb{R}^N)$ is a differential graded subalgebra of $A_{dR}(P\mathbb{R}^N)$.

This theorem is proved by showing that $\mathrm{Chen}(P\mathbb{R}^N)$ is closed under the differential and the wedge product. As we will not make use of the details, we refer the reader to [11] for further discussion of the differential, noting only that the additional forms $\pi_0^*(\omega_0)$ and $\pi_1^*(\omega_{k+1})$ are required for closure. The wedge product structure is analogous to the shuffle product identity in Theorem 1.10, and is proved in a similar manner. Note that the wedge product structure for 0-cochains is exactly Theorem 1.10.

Given $m$ forms $\omega_i \in A^{q_i}_{dR}(\mathbb{R}^N)$ and a permutation $\sigma$ of the set $[m]$, we denote by $\epsilon_{\sigma,(q_i)} \in \{-1, 1\}$ the sign such that

$$\omega_1 \wedge \ldots \wedge \omega_m = \epsilon_{\sigma,(q_i)} \left( \omega_{\sigma(1)} \wedge \ldots \wedge \omega_{\sigma(m)} \right).$$

As the notation suggests, $\epsilon_{\sigma,(q_i)}$ depends on both the permutation and the ordered list of the degrees $(q_i)$.

Lemma A.6.

Let $\omega_i \in A^{q_i}_{dR}(\mathbb{R}^N)$ for $i = 1, \ldots, k+l$. We have the following product formula:

$$\int \omega_1 \ldots \omega_k \wedge \int \omega_{k+1} \ldots \omega_{k+l} = \sum_{\sigma \in Sh(k,l)} \epsilon_{\sigma,(q_i)} \int \omega_{\sigma(1)} \omega_{\sigma(2)} \ldots \omega_{\sigma(k+l)}. \tag{A.1}$$

Theorem A.5 and the following theorem show that the subcomplex of iterated integrals $\mathrm{Chen}(P\mathbb{R}^N)$ is a cochain model for $P\mathbb{R}^N$.

Theorem A.7. The two commutative differential graded algebras, $A_{dR}(P\mathbb{R}^N)$ and $\mathrm{Chen}(P\mathbb{R}^N)$, have the same minimal model as $\mathbb{R}^N$.

Returning our focus to iterated integrals as functions, we see that the $S^I$ are 0-cochains in this model, constructed via pullback and integration. Indeed, consider the evaluation map $\mathrm{ev}_k : \Delta^k \times P\mathbb{R}^N \to (\mathbb{R}^N)^k$ defined by

$$\mathrm{ev}_k((t_1, \ldots, t_k), \Gamma) := (\Gamma(t_1), \ldots, \Gamma(t_k)).$$

Then, $S^I$ is the image of $\otimes_{l=1}^{k} dx_{i_l}$ under the composition

$$A_{dR}(\mathbb{R}^N)^{\otimes k} \xrightarrow{\ \mathrm{ev}_k^*\ } A^k_{dR}(\Delta^k \times P\mathbb{R}^N) \xrightarrow{\ \int_{\Delta^k}\ } \mathrm{Chen}(P\mathbb{R}^N).$$

References

1. Carlos Améndola, Peter Friz, and Bernd Sturmfels, Varieties of Signature Tensors, arXiv:1804.08325 [math] (2018).
2. Imanol Perez Arribas, Kate Saunders, Guy Goodwin, and Terry Lyons, A signature-based machine learning model for bipolar disorder and borderline personality disorder, arXiv:1707.07124 [stat] (2017).
3. John C. Baez and Alexander E. Hoffnung, Convenient categories of smooth spaces, Transactions of the American Mathematical Society (2011), no. 11, 5789–5825.
4. Y. Baryshnikov and E. Schlafly, Cyclicity in multivariate time series and applications to functional MRI data, 2016 IEEE 55th Conference on Decision and Control (CDC), December 2016, pp. 1625–1630.
5. Horatio Boedihardjo, Hao Ni, and Zhongmin Qian, Uniqueness of signature for simple curves, Journal of Functional Analysis (2014), no. 6, 1778–1806.
6. Steven L. Bressler and Anil K. Seth, Wiener-Granger Causality: A well established methodology, NeuroImage (2011), no. 2, 323–329.
7. Gang Chen, Daniel R. Glen, Ziad S. Saad, J. Paul Hamilton, Moriah E. Thomason, Ian H. Gotlib, and Robert W. Cox, Vector Autoregression, Structural Equation Modeling, and Their Synthesis in Neuroimaging Data Analysis, Computers in Biology and Medicine (2011), no. 12, 1142–1155.
8. Kuo-Tsai Chen, Iterated Integrals and Exponential Homomorphisms, Proceedings of the London Mathematical Society s3-4 (1954), no. 1, 502–512.
9. Kuo-Tsai Chen, Integration of Paths, Geometric Invariants and a Generalized Baker-Hausdorff Formula, Annals of Mathematics (1957), no. 1, 163–178.
10. Kuo-Tsai Chen, Integration of Paths–A Faithful Representation of Paths by Noncommutative Formal Power Series, Transactions of the American Mathematical Society (1958), no. 2, 395–407.
11. Kuo-Tsai Chen, Iterated path integrals, Bulletin of the American Mathematical Society (1977), no. 5, 831–879. MR0454968
12. Ilya Chevyrev and Andrey Kormilitzin, A Primer on the Signature Method in Machine Learning, arXiv:1603.03788 [cs, stat] (2016).
13. Ilya Chevyrev, Vidit Nanda, and Harald Oberhauser, Persistence paths and signature features in topological data analysis, arXiv:1806.00381 [cs, math, stat] (2018).
14. Ilya Chevyrev and Harald Oberhauser, Signature moments to characterize laws of stochastic processes, arXiv:1810.10971 [math, stat] (2018).
15. David Cohen-Steiner, Herbert Edelsbrunner, and Dmitriy Morozov, Vines and Vineyards by Updating Persistence in Linear Time, Proceedings of the Twenty-second Annual Symposium on Computational Geometry (New York, NY, USA), SCG '06, ACM, 2006, pp. 119–126.
16. B. Cummins, T. Gedeon, and K. Spendlove, On the Efficacy of State Space Reconstruction Methods in Determining Causality, SIAM Journal on Applied Dynamical Systems (2015), no. 1, 335–381.
17. Yves Félix, John Oprea, and Daniel Tanré, Algebraic Models in Geometry, Oxford University Press, 2008.
18. Peter K. Friz and Nicolas B. Victoir, Multidimensional Stochastic Processes as Rough Paths: Theory and Applications, Cambridge Studies in Advanced Mathematics, Cambridge University Press, 2010.
19. Robert Ghrist, Barcodes: the persistent topology of data, Bulletin of the American Mathematical Society (2008), no. 1, 61–75.
20. Grégory Ginot, Thomas Tradler, and Mahmoud Zeinalian, A Chen model for mapping spaces and the surface product, Annales scientifiques de l'École Normale Supérieure Ser. 4, 43 (2010), no. 5, 811–881. MR2721877
21. C. W. J. Granger, Investigating Causal Relations by Econometric Models and Cross-spectral Methods, Econometrica (1969), no. 3, 424–438.
22. Lajos Gergely Gyurkó, Terry Lyons, Mark Kontkowski, and Jonathan Field, Extracting information from the signature of a financial data stream, arXiv:1307.7244 [q-fin] (2013).
23. Trygve Haavelmo, The Statistical Implications of a System of Simultaneous Equations, Econometrica (1943), no. 1, 1–12.
24. Ben Hambly and Terry Lyons, Uniqueness for the signature of a path of bounded variation and the reduced path group, Annals of Mathematics (2010), no. 1, 109–167.
25. Jieun Kim, Wei Zhu, Linda Chang, Peter M. Bentler, and Thomas Ernst, Unified structural equation modeling approach for the analysis of multisubject, multivariate functional MRI data, Human Brain Mapping (2007), no. 2, 85–93.
26. Terry Lyons, Rough paths, Signatures and the modelling of functions on streams, arXiv:1405.4537 [math, q-fin, stat] (2014).
27. Terry J. Lyons, Michael J. Caruana, and Thierry Lévy, Differential Equations Driven by Rough Paths: École d'Été de Probabilités de Saint-Flour XXXIV-2004, École d'Été de Probabilités de Saint-Flour, Springer-Verlag, Berlin Heidelberg, 2007.
28. Terry J. Lyons and Weijun Xu, Hyperbolic development and inversion of signature, Journal of Functional Analysis (2017), no. 7, 2933–2955.
29. Terry J. Lyons and Weijun Xu, Inverting the signature of a path, Journal of the European Mathematical Society (2018), no. 7, 1655–1687.
30. P. J. Moore, J. Gallacher, and T. J. Lyons, Using path signatures to predict a diagnosis of Alzheimer's disease, arXiv:1808.05865 [q-bio, stat] (2018).
31. Elizabeth Munch, Katharine Turner, Paul Bendich, Sayan Mukherjee, Jonathan Mattingly, and John Harer, Probabilistic Fréchet means for time varying persistence diagrams, Electronic Journal of Statistics (2015), no. 1, 1173–1204. MR3354335
32. Frédéric Patras and Jean-Claude Thomas, Cochain algebras of mapping spaces and finite group actions, Topology and its Applications (2003), no. 2, 189–207.
33. Judea Pearl, Causality: Models, Reasoning and Inference, 2nd ed., Cambridge University Press, New York, NY, USA, 2009.
34. Jose A. Perea and John Harer, Sliding Windows and Persistence: An Application of Topological Methods to Signal Analysis, Foundations of Computational Mathematics (2015), no. 3, 799–838.
35. Jeremy Reizenstein, Calculation of Iterated-Integral Signatures and Log Signatures, arXiv:1712.02757 [math] (2017).
36. Jeremy Reizenstein and Benjamin Graham, The iisignature library: efficient calculation of iterated-integral signatures and log signatures, arXiv:1802.08252 [cs, math] (2018).
37. Christophe Reutenauer, Free Lie Algebras, London Mathematical Society Monographs, Oxford University Press, Oxford, New York, June 1993.
38. George Sugihara, Robert May, Hao Ye, Chih-hao Hsieh, Ethan Deyle, Michael Fogarty, and Stephan Munch, Detecting Causality in Complex Ecosystems, Science (2012), no. 6106, 496–500.
39. Floris Takens, Detecting strange attractors in turbulence, Dynamical Systems and Turbulence, Warwick 1980 (David Rand and Lai-Sang Young, eds.), Lecture Notes in Mathematics, Springer Berlin Heidelberg, 1981, pp. 366–381.
40. S. Wright, Correlation and causation, Journal of Agricultural Research (1921), 557–585.
41. Weixin Yang, Lianwen Jin, and Manfei Liu, DeepWriterID: An End-to-end Online Text-independent Writer Identification System, arXiv:1508.04945 [cs, stat] (2015).
42. Benjamin J. Zimmerman, Ivan Abraham, Sara A. Schmidt, Yuliy Baryshnikov, and Fatima T. Husain, Dissociating tinnitus patients from healthy controls using resting-state cyclicity analysis and clustering, Network Neuroscience (2018), 1–23.

Department of Mathematical Sciences, University of Delaware, Newark, DE 19716

Department of Mathematics, University of Pennsylvania, Philadelphia, PA 19104