Model comparison via simplicial complexes and persistent homology
MModel comparison via simplicial complexes and persistent homology
Sean T. Vittadello ∗ and Michael P. H. Stumpf Abstract
In many scientific and technological contexts we have only a poor understanding of the struc-ture and details of appropriate mathematical models. We often need to compare differentmodels. With available data we can use formal statistical model selection to compare andcontrast the ability of different mathematical models to describe such data. But there is alack of rigorous methods to compare different models a priori . Here we develop and illustratetwo such approaches that allow us to compare model structures in a systematic way. Usingwell-developed and understood concepts from simplicial geometry we are able to define a dis-tance based on the persistent homology applied to the simplicial complexes that captures themodel structure. In this way we can identify shared topological features of different models.We then expand this, and move from a distance between simplicial complexes to studyingequivalences between models in order to determine their functional relatedness.
Key words and phrases : Positional information, Turing model, algebraic topology, algebraic biology, persistencebarcode, model distance, model equivalence School of BioSciences, The University of Melbourne, Parkville VIC 3010 Australia ∗ Corresponding author: [email protected] School of Mathematics and Statistics, The University of Melbourne, Parkville VIC 3010 Australia.
December 25, 2020 a r X i v : . [ m a t h . A T ] D ec Introduction
Scientific models are representations of the physical world that isolate features of interest through variouslevels of abstraction [1]. The complexity of biological phenomena, exemplified in systems biology, neces-sitates the development of models to further the understanding of these systems [2–5]. There are variousapproaches to modelling biological systems, including rule-based models [6–8], network models [9–11],mechanistic models [5, 12], and continuum models [13–15], which are not necessarily mutually exclusiveclassifications.While a model should represent the corresponding phenomenon as faithfully as possible, it is also arequirement that the model be manageable. It is neither practicable to develop a complete model of abiological system, due to combinatorial complexity [3, 7, 16], nor is it necessary, since many componentsof a given system have relatively small effects on the system behaviour [4].A biological system may be represented by various alternative models, which can differ with respect tothe specific features isolated by each model and the interconnections between these features. Contrarily,similar models may represent ostensibly divergent biological systems, where the difference between themodels may be only conceptual. The ability to rigorously and efficiently compare models is thereforeimperative for understanding complex biological systems, and can elucidate the relationship betweenapparently distinct phenomena [17–21]. Model comparison is, however, a nontrivial task for multiplereasons: the various representational forms of models; the different levels of abstraction and granularitybetween models; and the need for a systematic formalism with a visual element that provides elucidationof conceptual similarities between models.In this article we present a novel systematic formalism for model comparison that is both quantitativeand visual. Since models consist of a finite collection of components, we represent a given model as acombinatorial object where the components of the model are represented as labelled vertices. Further,since the interconnections between various model components may have dimension higher than the dyadicconnections in combinatorial graphs, we employ simplicial complexes to represent the models which allowfor interconnections of any finite dimension. Representing models as simplicial complexes ensures that anymodel can be represented within our framework, irrespective of form, granularity, level of abstraction, andso on. Model comparison is then effected by comparison of the simplicial representations of the models,which have the same form so can be compared directly.The simplicial representations provide a graphical form for the models, however we can also obtainvisual representations of the models as collections of persistence intervals by applying persistent homologyto the simplicial complexes. Persistent homology identifies the global topological structure of a space that1ersists across multiple scales [22, 23], and is often employed in topological data analysis to study high-dimensional data sets [24]. Persistent homology is robust to small perturbations in the input data, isindependent of dimensions and coordinates, and provides a compact representation of the qualitativefeatures of the input data [24]. Our application of persistent homology is novel since our simplicialcomplexes correspond to models and not data sets, and we utilise the persistence intervals associated witha simplicial complex as an alternative representation of the model, and hence the corresponding biologicalsystem. The persistence-barcode representations tend to facilitate the visual comparison between variousmodels, which may be more difficult with the simplicial representations and particularly with the originalmodels.We provide two general methodologies for comparing models. The first is comparison by distance,which measures the difference between models in terms of the differences between the correspondinglabelled simplicial representations, and can be calculated either directly using the simplicial complexes orindirectly with the corresponding persistence intervals. The second is comparison by equivalence, wherewe identify certain components between the simplicial representations as equivalent, and then employoperations on the complexes to transform one to another.As an application of our formalism we compare the two main classes of models for developmentalpattern formation, namely Turing systems and positional information. While there is an extensive lit-erature for models of both Turing systems and positional information, the relationship between thesemodels, and between the corresponding biological systems, remains unclear [25]. One main outcomeof our methodology is the demonstration that the Turing activator-inhibitor model is equivalent to thepositional-information annihilation model, since the fundamental difference between the two models is es-sentially due to the location of the source of the gradient-forming morphogen. This observation suggeststhat the location of the morphogen source may determine the particular mechanism, namely Turing orpositional information, for gradient formation, or vice versa.
We begin with a brief and informal discussion of the background in simplicial homology relevant forour work. A more detailed overview of simplicial complexes and homology is provided in the ElectronicSupplementary Material document.A p -simplex is a generalisation of a filled triangle in the plane to an arbitrary dimension p , so a point2s a 0-simplex, a line segment is a 1-simplex, a filled triangle is a 2-simplex, a filled tetrahedron is a3-simplex, and so forth. We can think of 0-simplices as vertices and 1-simplices as edges, so that a simplegraph is therefore a set of 0-simplices and 1-simplices. A k -face of a simplex is a k -dimensional subsimplex,and the 0-faces of a simplex are said to span the simplex. A simplicial complex is a generalisation ofa simple graph, allowing for simplicies of dimension higher than one which represent higher-dimensionalinteractions. A simplicial complex K therefore consists of a set of simplices such that if σ is a simplexin K then every face of σ is also in K , and the nonempty intersection of two simplices in K is a simplexin K . The k -skeleton of a simplicial complex K is the subcomplex K ( k ) consisting of the simplices in K with dimension at most k . In particular, the 0-skeleton K (0) is the set of vertices and the 1-skeleton K (1) is the underlying graph of a simplicial complex K .A simplicial complex K is a topological space, formed from the union of its simplices, and simplicialhomology characterises the complex using algebraic techniques to compute the number of k -dimensionalholes in the complex. So, for example, 0-dimensional holes are connected components and 1-dimensionalholes are edge cycles. The homology of K depends on the underlying space of simplices and their inter-sections in K . Persistent homology studies the topological features, namely the k -dimensional holes, ofa complex K across multiple scales, based on a filtration of a simplicial complex which is an increasingsequence of subcomplexes. Each k -dimensional hole in K will be created at some index in the filtration,and will either persist through to the full complex K or will be annihilated at some later index. Persistenthomology therefore gives a multiset of persistence intervals P ( K ) describing the creation and annihilationof all k -dimensional holes in K . We denote by P k ( K ) the submultiset of P ( K ) consisting of the intervalscorresponding to k -dimensional holes. The persistence intervals can be visualised as a persistence barcode ,displaying the intervals as horizontal line segments. By a model we mean an abstraction of an observable phenomenon [1]. A specific detail of a givenmodel is a component of the model, and all models consist of a finite number of components and theirinterconnections, which represent direct relationships between specific components. For example, a Turingsystem of reaction-diffusion equations, which is a model of a developmental patterning process, includescomponents such as morphogens, diffusion, boundary conditions, reactions, and a morphogen gradient,along with the interconnections between particular components such as a morphogen and its diffusion.We begin by defining the set of model components.
Definition . Let C be the set of all components that appear in the collection of models under consider-3tion for comparison. We say that each such model is generated by a subset of components from C . LetOrd : C → { , . . . , |C|} be a bijection that specifies an order, possibly arbitrary, for the categorical dataelements in C .Our labelling of 0-simplices is flexible, which allows for more efficient or more explicit labelling asappropriate: Notation . We represent each model component from C as a 0-simplex which is labelled variously as i , v i , or the name of the component given by Ord − ( i ), where i is the position in the total order specifiedby Ord. Note that the label v i is based on recognising that 0-simplices are vertices.To each model we associate a simplicial complex that consists of the relevant labelled 0-simplices, alongwith the one- and higher-dimensional simplices that represent the interconnections between the compo-nents, where the dyadic interactions are 1-simplices, the triadic interactions are 2-simplices, and so forth asrequired. The labelling of the 0-simplices induces a labelling of the higher-dimensional simplices throughtheir spanning 0-simplices. Since the models of interest are associated with the same set of components C , we can identify a particular labelled simplex, representing specific components and interconnections,within the different simplicial complexes associated with the models.In Figure 1 we provide two simple examples of simplicial representations of models. The first is the PreyPredatorPredation Prey growthPredator growthPredator death(a) EnzymeSubstrate 1Product 1 Product 2Substrate 2(b)
Figure 1: Examples of simplicial representations . (a) Lotka-Volterra model. (b) Random-sequential bisub-strate reaction. Note that the shaded triangle is a 2-simplex, which represents the ternary complex formed betweenthe two substrates and the enzyme. simplicial complex associated to the Lotka-Volterra model [26, Chapter 3, page 79],d x d t = x ( a − by ) and d y d t = y ( cx − d ) , (1)4here x is the prey, y is the predator, and a , b , c , and d are positive constants. The underlying assumptionsof the model are: in the absence of predation the prey population grows without bound; the reduction ofthe prey’s per capita growth rate due to predation is proportional to the prey and predator populations;in the absence of prey the predator’s death rate produces exponential decay; and, the prey’s contributionto the predator’s growth rate is proportional to both the available prey and the size of the predatorpopulation.The second example is a model for a random-sequential bisubstrate reaction [27, Chapter 7, page 91],whereby an enzyme E binds two substrates A and B simultaneously, with a random order of binding, toproduce an EAB ternary complex that results in the formation of two products P and Q . We can writethis reaction as A + B E −− *) −− P + Q. (2)Given a simplicial representation of a model, we employ persistent homology to provide a visualcharacterisation of the model. We begin by applying the shortlex order to the simplices of the simplicialcomplex with respect to the assigned vertex order, where the shortlex order is defined as follows [28,Chapter 0, page 14]: Definition
Shortlex order ) . Let A be a totally-ordered finite set, called the alphabet , and let W be the set of all words that are finite sequences of symbols from A . The shortlex order on W orders thewords as follows:1. Two different words in W with equal length are ordered according to the alphabetic order of A ,therefore lexicographically.2. For two words with unequal lengths, the shorter word precedes the longer word.Note that, since A is totally-ordered, the shortlex order is also a total order. Applying the shortlexorder to a collection of simplicies, we first order the simplices by increasing dimension, and then thesimplices of the same dimension are ordered lexicographically. Since the set of vertices is totally orderedas specified by Ord, so is the shortlex order for the simplicies.To compare a collection of simplicial representations of models using persistent homology we requirefiltrations for each complex that are compatible, so that the same topological structures in differentcomplexes give the same persistence intervals. For this we utilise a reference simplex that contains thesimplicial complexes associated to all of the models for comparison: Notation
Reference simplex ) . Let C be the set of all components that appear in the collectionof models under consideration for comparison. The reference simplex is the ( |C| − C , that is spanned by the labelled 0-simplices { v , v , . . . , v |C| } . If n is the maximum dimension of thesimplicies in the simplicial representations of the models for comparison then the filtration for the n -skeleton K ( n ) C provides the most efficient and compatible subfiltrations for the models. Note that eachsimplicial representation of a model under consideration is a subcomplex of K ( n ) C , and hence also of K C .Therefore, to obtain filtrations for the simplicial complexes that allow for direct comparison we firstobtain a filtration for K ( n ) C , for some unique n . The filtration of each simplicial representation is thena subfiltration of the filtration for K ( n ) C . In this way we can directly compare the persistence intervalsbetween different models, since identical persistence intervals will arise from identical topological featuresassociated with the same subset of model components. We want the persistent homology of a particularsimplicial complex to be unique amongst all of the simplicial complexes of the models of interest, inorder to reflect all differences between the complexes and hence models. We therefore employ the shortlexfiltration : Definition
Shortlex filtration, shortlex subfiltration ) . Let C be the set of all components thatappear in the collection of models under consideration for comparison, and let the n -skeleton K ( n ) C havethe shortlex ordering. Define the weight function w : K ( n ) C → R by w ( σ ) = j , where j is the index of σ inthe shortlex ordering for K ( n ) C , with the first index equal to 1. The shortlex filtration of K ( n ) C is the nestedsequence of subcomplexes K C , ⊆ K C , ⊆ · · · ⊆ K C ,m = K ( n ) C , where K C ,i = w − (( −∞ , i ]) for 1 ≤ i ≤ m .Suppose that L is the simplicial representation of a model generated by a subset of components from C . The shortlex subfiltration of L is the filtration of L induced by the shortlex filtration on K ( n ) C .The shortlex filtration adds at most one simplex at each index of the filtration according to the shortlexordering of the simplices.We now establish that the simplicial representations and associated filtrations provide the desireddescriptions of the models. The following theorem shows that the simplicial complex associated with agiven model has a unique multiset of persistence intervals with respect to the shortlex filtration. Thereforethe persistence intervals, visualised as a persistence barcode, provide a unique ‘fingerprint’ of the model. Theorem 2.6.
Let K and L be two labelled simplicial complexes associated with two models generated by C , both with the shortlex filtration. Then K = L if and only if P k ( K ) = P k ( L ) for all k ≥ .Proof. The forward implication is trivial so consider the backward implication, for which we providea contrapositive proof. Suppose that K = L , and let σ p be a simplex of smallest dimension p in thesymmetric difference K L . Without loss of generality we may assume that σ p ∈ K . We show that thereexists k ≥ P k ( K ) = P k ( L ). 6f p = 0 then P ( K ) contains an interval with left endpoint equal to the filtration index i at which σ p is added to the filtered complex, corresponding to the creation of a 0-dimensional homology class. Since σ p / ∈ L there is no simplex added at index i in the filtration of L , hence no interval with left endpoint i in P ( L ).For p ≥ σ p is a ( p − τ p − , which is a nonbounding cycle in the filtered complexprior to the addition of σ p . So either τ p − corresponds to a linearly independent ( p − τ p − is a linear combination of the ( p − τ p − is in a linearly independent homology class then P p − ( K ) contains an interval with rightendpoint equal to the index j of the filtration where the homology class is annihilated by the addition of σ p . Since σ p / ∈ L there is no interval in P p − ( L ) with the same right endpoint j . Alternatively, supposethe homology class for τ p − is a linear combination of the ( p − σ p at some index i in the filtration creates a p -cycle, and hence a new p -dimensional homologyclass, containing σ p then there is a corresponding interval in P p ( K ) with left endpoint i and no intervalin P p ( L ) with left endpoint i . Otherwise, if the addition of σ p at some index i in the filtration does notcreate a p -cycle then, since τ p − is now a bounding cycle, it follows that one of the previously linearlyindependent ( p − i . Therefore, there is a corresponding interval in P p − ( K ) with right endpoint i , and no interval in P p − ( L ) with right endpoint i .We can rephrase Theorem 2.6 in terms of the full multisets of persistence intervals: Corollary 2.7.
Let K and L be two labelled simplicial complexes associated with two models generatedby C , both with the shortlex filtration. Then K = L if and only if P ( K ) = P ( L ) .Proof. The forward implication follows from Theorem 2.6. The backward implication will also follow fromTheorem 2.6 if we establish that P ( K ) = P ( L ) if and only if P k ( K ) = P k ( L ) for all k ≥
0. Note that F ( K ) = { P k ( K ) | P k ( K ) = ∅ and k ≥ } is a partition of P ( K ), and similarly for F ( L ). So ( a, b ) ∈ P ( K )implies ( a, b ) ∈ P k ( K ) for some k ≥
0, and if ( a, b ) ∈ P ( L ) then ( a, b ) ∈ P j ( L ) for some j ≥
0. Sinceboth K and L are filtered with the shortlex filtration, whereby at most one simplex is added at eachfiltration index, the dimension of the simplex added at index a uniquely determines the dimension of thehomology class created at index a corresponding to the interval ( a, b ). Therefore k = j , and it followsthat P ( K ) = P ( L ) is equivalent to P k ( K ) = P k ( L ) for all k ≥ Proposition 2.8.
Let K be a labelled simplicial complex with the shortlex filtration. Then every intervalin P ( K ) has multiplicity one.Proof. It suffices to show that no two intervals in P ( K ) have the same left endpoint. So suppose to thecontrary that the two intervals I and I in the multiset P ( K ) have the same left endpoint.Note that I and I do not correspond to homology classes in different dimensions. Indeed, one p -cycleand one q -cycle with p < q cannot be created at the same index as this would require the simultaneousaddition of a ( p − q − I and I must correspond to linearly-independent homology classes in the same dimension, andtherefore to the creation of two linearly-independent nonbounding p -cycles σ and σ at the same filtrationindex i due to the addition of a p -simplex τ . Then σ + σ under F addition is a nonbounding cycle atindex i −
1, so the addition of τ at index i yields one new homology class, contradicting the creation of twohomology classes. Therefore, there is at most one interval in P ( K ) with a particular left endpoint.While we have established that the multiset of persistence intervals is a set, we continue to describeit as a multiset for clarity of reference.Since a submodel consists of a subset of the components and interconnections of a given model,we expect a similar relationship to hold between the simplicial representations and the correspondingpersistent homology, which we now confirm. Proposition 2.9.
Let M and N be models where M is a submodel of N , and let K and L be the associatedlabelled simplicial complexes of M and N , respectively, with the shortlex filtration. Then K is a subcomplexof L . Further, for each k ≥ , the submultiset of finite intervals from P k ( K ) is a submultiset of P k ( L ) ,and each infinite interval in P k ( K ) is either in P k ( L ) or corresponds to a finite interval in P k ( L ) withthe same left endpoint.Proof. Since M is a submodel of N , all of the model components and interconnections in M are also in N ,so the corresponding simplicial complexes satisfy K ⊆ L .Now, fix k ≥ a, b ) be a finite interval in P k ( K ) corresponding to the creation and subsequentannihilation of a k -homology class at indices a and b , respectively, in the filtration. Since K ⊆ L , thesubcomplex of K corresponding to the k -homology class is also a subcomplex of L , so ( a, b ) ∈ P k ( L ).Suppose now that the infinite interval ( a, ∞ ) is in P k ( K ), corresponding to the creation of a k -homology class at index a in the filtration which is not annihilated and persists in the full complex K .8uppose further, without loss of generality, that ( a, ∞ ) / ∈ P k ( L ). Since K ⊆ L , the subcomplex of K corresponding to the k -homology class is also a subcomplex of L , so the k -homology class is also createdin the filtration of L . Since ( a, ∞ ) / ∈ P k ( L ), L contains the ( k + 1)-simplex for which the k -homologyclass is the bounding cycle, so the k -homology class is annihilated at some index b in the filtration of L due to the addition of the ( k + 1)-simplex. Thus, ( a, b ) ∈ P k ( L ).We visualise the persistent homology of a simplicial complex, summarised by the multisets of per-sistence intervals, as a persistence barcode. The barcodes have the distinct advantage of allowing for avisual comparison of models. Definition
Persistence barcode ) . Let K be a labelled simplicial complex with the shortlex filtra-tion. For each dimension p ≥ p th persistence barcode BAR p ( K ) for K is a graphical representation ofthe multiset of p -dimensional persistence intervals P p ( K ). Specifically, let f : P p ( K ) → (cid:8) , . . . , |P p ( K ) | (cid:9) be a bijection, then BAR p ( K ) is given by the set of points (cid:8) (cid:0) x, f ( a, b ) (cid:1) | ( a, b ) ∈ P p ( K ) and x ∈ [ a, b ) (cid:9) .Note that the persistence barcode does not account for multiplicity of intervals in P p ( K ) for generalfiltrations, however for the shortlex filtration each interval in P ( K ) has multiplicity one by Proposition 2.8,so in our case no information is hidden within the persistence barcode representations. To compare models of interest in a quantitative manner we introduce a measure of distance between thesimplicial representations. We provide a direct measure between the simplicial complexes, along with ameasure based on the multisets of persistence intervals, and then show that the two measures yield thesame distance.
Definition
Distance between simplicial complexes ) . Let S be a collection of labelled simplicialcomplexes with labels from C . Define the function d : S × S → R such that d ( K, L ) = | K L | , (3)so d is the cardinality of the symmetric difference of the two simplicial complexes.Note that the symmetric difference of two labelled simplicial complexes accounts for the labelling of thesimplices, so that two labelled simplices from different simplicial complexes are considered identical whenthey are spanned by the same set of labelled 0-simplices. Further, since we add simplices under Z / Z , wehave that K L = K + L . 9 otation . Given a multiset S of intervals ( a, b ) ∈ R × R , we denote the projection onto the firstcoordinates as proj ( S ) = { a ∈ R | ( a, b ) ∈ S for some b ∈ R } , and the projection onto the secondcoordinates as proj ( S ) = { b ∈ R | ( a, b ) ∈ S for some a ∈ R } . Definition
Distance between multisets of persistence intervals ) . Let M be a collection ofmultisets of persistence intervals P ( K ) corresponding to labelled simplicial complexes K , with labels from C and with the shortlex filtration. LetΘ : M → R be such that Θ (cid:0) P ( K ) (cid:1) = proj (cid:0) P ( K ) (cid:1) ∪ (cid:0) proj (cid:0) P ( K ) (cid:1) \ {∞} (cid:1) , (4)which is the set of all left endpoints and finite right-endpoints of the persistence intervals. Define thefunction ˆ d : M × M → R by ˆ d (cid:0) P ( K ) , P ( L ) (cid:1) = (cid:12)(cid:12) Θ (cid:0) P ( K ) (cid:1) Θ (cid:0) P ( L ) (cid:1)(cid:12)(cid:12) . (5)We now show that d is a distance function on S , that ˆ d is a distance function on M , and that the metricspaces ( S , d ) and ( M , ˆ d ) are isometric. So the two distances d and ˆ d are essentially identical, and wecan use either metric to determine the distances between simplicial complexes. To prove these claimswe use the triangle inequality for the symmetric difference: for sets X , Y , and Z , the cardinality of thesymmetric difference satisfies subadditivity, that is, | X Z | ≤ | X Y | + | Y Z | . This relation follows byobserving that X Z ⊆ ( X Y ) ∪ ( Y Z ). Theorem 2.14.
Let S be a collection of labelled simplicial complexes with labels from C , and let M bethe collection of multisets of persistence intervals P ( K ) , for K ∈ S , with the shortlex filtration. Then thefunction d is a metric on S , the function ˆ d is a metric on M , and the metric spaces ( S , d ) and ( M , ˆ d ) are isometric.Proof. To see that d is a metric on S , let K , L ∈ S . Then d ( K, L ) = 0 if and only if | K L | = 0 if andonly if K = L , so the identity of indiscernibles holds. Symmetry follows from the symmetric difference.It remains to show subadditivity, so let M ∈ S . By the triangle inequality for the symmetric differencewe then have d ( K, L ) = | K L | ≤ | K M | + | M L | = d ( K, M ) + d ( M, L ), as required.Next we show that ˆ d is a metric on M . So let P ( K ), P ( L ) ∈ M be two multisets of persistenceintervals corresponding to the simplicial complexes K and L . For the identity of indiscernibles, note thatˆ d (cid:0) P ( K ) , P ( L ) (cid:1) = 0 if and only if Θ (cid:0) P ( K ) (cid:1) = Θ (cid:0) P ( L ) (cid:1) , so that K and L contain the same simplicies, ifand only if K = L if and only if P ( K ) = P ( L ) by Corollary 2.7. Symmetry follows from the symmetricdifference, and subadditivity follows from the triangle inequality for the symmetric difference.10inally, to show that the two metric spaces ( S , d ) and ( M , ˆ d ) are isometric we need to establish asurjective isometry f : S → M . Define f such that K 7→ P ( K ), which is a surjection. Recall that f is an isometry if and only if ˆ d ( f ( K ) , f ( L )) = d ( K, L ) for all K , L ∈ S , or equivalently the car-dinality of K L equals the cardinality of Θ (cid:0) P ( K ) (cid:1) Θ (cid:0) P ( L ) (cid:1) for all K , L ∈ S . To show that f isan isometry it therefore suffices to establish a bijection φ ( K,L ) , for each K , L ∈ S , from K L ontoΘ (cid:0) P ( K ) (cid:1) Θ (cid:0) P ( L ) (cid:1) . Let g : V → R be the shortlex filtration for the ( |C| − V spanned bythe 0-simplices in { v , v , . . . , v |C| } . Define the function φ ( K,L ) = g | K L by restriction. Then φ ( K,L ) isa bijection onto its image, since simplices are added individually in the shortlex filtration. It remainsto show that the image of K L under φ ( K,L ) is Θ (cid:0) P ( K ) (cid:1) Θ (cid:0) P ( L ) (cid:1) . Indeed, σ ∈ K L if and only if a ∈ Θ (cid:0) P ( K ) (cid:1) Θ (cid:0) P ( L ) (cid:1) where g | K L ( σ ) = a . We now consider model comparison by equivalence, which accounts for particular similarities betweenmodels that we may regard as essentially identical. So rather than comparing the simplicial complexesassociated to models based on the presence or absence of a particular simplex, as in model comparisonby distance, here we consider the equivalence of models in terms of an equivalence of the correspondinglabelled simplicial complexes.For this we need to specify a predicate , Pr, which is a statement that contains a finite number ofvariables with the specified domain Dom, such that Pr becomes a (Boolean) proposition when instantiated.We can consider a predicate to be a Boolean-valued function Pr : Dom → { true, false } . Predicates areoften described in terms of the number of their variables, so that for an integer n ≥ n -place predicatePr( x , . . . , x n ) has n variables x , . . . , x n with domain Dom ⊆ D ×· · ·× D n , where x i ∈ D i for i = 1 , . . . , n . Definition
Equivalent simplicial complexes ) . Let S be a collection of labelled simplicial com-plexes with labels from C , and let Pr be a two-place predicate on S × S such that R eq = { ( K, L ) ∈ S × S | Pr(
K, L ) } (6)is an equivalence relation on S . Then the simplicial complexes K , L ∈ S are equivalent if and only if( K, L ) ∈ R eq .To determine the similarity of two models, represented as simplicial complexes, we first need to formu-late a predicate on which equivalence is based. For our purposes we consider the 0-simplices, correspondingto the model components, and the 1-simplices, corresponding to the interconnections between the model11omponents, to be the primary features of the models. We neglect higher-dimensional simplices since, forthe current models under consideration, they are not essential for model representation. Informally, wewant to say that K , L ∈ S are equivalent when their labelled 0-simplices and 1-simplices are essentiallythe same, so we need to identify the labelled subcomplexes in K \ L with those in L \ K . For this weemploy general operations on a simplicial complex. We define three operations on the 1-skeleton K (1) of a labelled simplicial complex K . Since the 1-skeleton is a labelled undirected graph, we refer to the0-simplices and 1-simplices as vertices and edges, respectively.Let K and L be two simplicial complexes. Recall that a vertex map is a function ψ : K (0) → L (0) , anda simplicial map ψ : K → L is a function such that ψ | K (0) is a vertex map and whenever W = { w i } ni =0 spans a simplex in K then ψ ( W ) = { ψ ( w i ) } ni =0 spans a simplex in L [29, 30]. We also require the notionof a simplicial multivalued map, which we define as follows: Definition
Simplicial multivalued map ) . Let K and L be labelled simplicial complexes withlabels from C , and let F : K (0) ⇒ L (0) be a left-total binary relation such that F ( w ) is a nonempty subsetof L (0) for each w ∈ K (0) . Then F is a simplicial multivalued map , denoted F : K ⇒ L , if whenever aset of vertices W = { w i } ni =0 spans a simplex in K then the set of vertices F ( W ) = { F ( w i ) } ni =0 spans asimplex in L , and for each i we possibly have | F ( w i ) | > Definition
Operation 1: Vertex identification ) . Let K be a labelled simplicial complex withlabels from C , let [ u, v, w ] be a 1-cycle in K where u and v have the same adjacent vertices, and let c / ∈ K (0) be a vertex with label from C . A vertex identification is the operation whereby the two vertices u and v are identified with the new vertex c to give a new simplicial complex L , in which all of the vertices in K that are adjacent to u , or equivalently v , are adjacent to c . Then σ ∈ L if and only if there exists τ ∈ K such that either τ (0) ∩ { u, v } = ∅ and σ = τ , or τ (0) ∩ { u, v } 6 = ∅ and σ is spanned by (cid:0) τ \ { u, v } (cid:1) ∪ { c } .Vertex identification is therefore given by the simplicial map π : K → L where, for z ∈ K (0) , π ( z ) = z if z / ∈ { u, v } ,c if z ∈ { u, v } . (7) Definition
Operation 2: Vertex split ) . Let K be a labelled simplicial complex with labels from C , let u ∈ K (0) , and let c , d / ∈ K (0) be two vertices with labels in C . A vertex split is the operationwhereby the vertex u is split into the two new vertices c and d to give a new simplicial complex L , inwhich c and d are adjacent and all of the vertices adjacent to u are adjacent to both c and d . Then σ ∈ L
12f and only if there exists τ ∈ K such that either u / ∈ τ (0) and σ = τ , or u ∈ τ (0) and σ is spanned by (cid:0) τ (0) \ { u } (cid:1) ∪ { c, d } . Vertex splitting is therefore given by the simplicial multivalued map π : K ⇒ L where, for z ∈ K (0) , π ( z ) = z if z = u, { c, d } if z = u. (8) Definition
Operation 3: Vertex substitution ) . Let K be a labelled simplicial complex withlabels from C , let u ∈ K (0) , and let c / ∈ K be a vertex corresponding to a model component in C . A vertexsubstitution is the operation whereby the labelled vertex c is substituted for the labelled vertex u to givea new simplicial complex L . Then σ ∈ L if and only if there exists τ ∈ K such that either u / ∈ τ (0) and σ = τ , or u ∈ τ (0) and σ is spanned by (cid:0) τ (0) \ { u } (cid:1) ∪ { c } . Vertex substitution is therefore given by thesimplicial map π : K → L where, for z ∈ K (0) , π ( z ) = z if z = u,c if z = u. (9)Note that a vertex identification is mutually inverse with a corresponding vertex split, and a vertexsubstitution is mutually inverse with another vertex substitution. To rigorously establish equivalencebetween models through application of Operations 1–3, we must ensure that each operation preserves therepresentation of the general physical system. The application of both Operations 1 and 2 is intrinsicallyrestrictive, since they require similar interconnections between the vertices. Here we explicitly specify theform in which the three operations are admissible: Definition
Admissible operations on labelled simplicial complexes ) . An admissible operation on a labelled simplicial complex associated with a model is:1. Operation 1 (Vertex identification): the two identified vertices and the new vertex must have labelsthat represent conceptually equivalent components of the model, which should intrinsically hold forthe two identified vertices since they form the boundary of an edge;2. Operation 2 (Vertex split): the original vertex and the two new vertices must have labels thatrepresent conceptually equivalent components of the model, which should intrinsically hold for thetwo new vertices since they form the boundary of an edge;3. Operation 3 (Vertex substitution): the new and old vertices must have labels that represent con-ceptually equivalent components of the model.13e are now in a position to define the required predicate. Definition
Predicate for the equivalence of models ) . The two-place predicate Pr on
S × S is as follows: for (
K, L ) ∈ S × S there exists a sequence, which may be empty, of the three admissibleoperations on simplicial complexes, say ( f i ) ni =0 , such that f n ( · · · f ( f ( K )) · · · ) = L .So for ( K, L ) ∈ S × S , the proposition Pr( K, L ) is true when a sequence of the three admissibleoperations can transform K to L , otherwise if no such sequence exists then Pr( K, L ) is false. Thefollowing theorem establishes that the specified predicate gives the necessary equivalence relation.
Theorem 2.22. If Pr is the two-place predicate on S × S given in Definition 2.21 then the relation R eq = { ( x, y ) ∈ S × S | Pr( x, y ) } is an equivalence relation on S .Proof. R eq is reflexive, since we can apply the empty sequence of operations to any simplicial complex.Symmetry of R eq follows from the invertibility of each operation, and transitivity follows from the asso-ciativity of composition of the operations. Therefore, R eq is an equivalence relation on S .The model equivalence that we consider here is quite restrictive, so only models that have a highlevel of similarity would be identified as equivalent. There are many other possible definitions of modelequivalence that may be more appropriate when comparing different collections of models. Here we apply our methods for model comparison to the two main categories of models for developmen-tal pattern formation, namely Turing and positional information. We consider the patterning to occurthroughout a two-dimensional rectangular domain, where the boundary conditions are always zero-flux onthe two opposite sides parallel to the morphogen concentration gradient. We assume that the velocitiesof the cytoplasm and the growing tissue are negligible, and we therefore assume no advection.To represent the models as simplicial complexes we first determine the set of components C on whichthe models under consideration are based. In this case the possible components are: the agents, namelymorphogens, modulators, and substrates; the reactions involving the agents, such as self-activation, acti-vation, inhibition, and annihilation; agent degradation; the influx and outflux boundary conditions; agentdiffusion; profile of the morphogen gradient; and scale invariance of the morphogen gradient. The orderedset of components is shown in Figure 2, where we also indicate with arrows the subsets of componentsthat we regard as providing equivalent functions within the models.14 orphogen 1 Morphogen 2
Morphogen 3
14 151617Morphogen 3 Diffusion 3Degradation 3Production 3
Substrate 1
Modulator 1
Reactions and Interactions
Morphogen 1 gradient
Figure 2: Ordered set of components for the Turing and positional information models . The simplicial representation of each model is constructed from the union of subcomplexes associatedwith each agent in the model. The subcomplex associated with an agent has 0-simplices that correspondto the model components that have a direct interconnection with the agent, including the agent itself.Here we describe one positional information model and one Turing model, and provide the descriptionsof four additional positional information models and three additional Turing models in the ElectronicSupplementary Material document. We begin by describing the positional-information annihilation model.
Both the oppsosing gradients model (Electronic Supplementary Material) and the annihilation modelare mechanisms whereby two opposing morphogen gradients provide size information for developmentalpatterning. The annihilation model, in which morphogen 2 irreversibly binds to morphogen 1, inhibitsthe action of morphogen 1 on transcription factor activity [31]. The sources of each morphogen are atopposite ends of the domain, and the morphogens interact by an annihilation reaction with rate k that15esults in global scale-invariant patterning [31, 32]. Mathematically, the two morphogen gradients withconcentrations m ( x , t ) and c ( x , t ) can be modelled as ∂m∂t = D m ∇ m − k m m − kmc, (10) ∂c∂t = D c ∇ c − k c c − kmc, (11)where D m and D c are diffusivities, and k m and k c are degradation rates. The boundary conditions foreach morphogen are Neumann at both boundaries, with flux at the source and zero flux at the oppositeboundary.For the simplicial representation of the annihilation model, the vertices and corresponding modelcomponents are: • v ←→ Morphogen 1 • v ←→ Diffusion 1 • v ←→ Degradation 1 • v ←→ Influx 1 • v ←→ Morphogen 2 • v ←→ Diffusion 2 • v ←→ Degradation 2 • v ←→ Influx 2 • v ←→ Annihilation between morphogens 1 and 2 • v ←→ Monotonic gradient • v ←→ Global scale-invarianceFor the annihilation model the simplicial complex and corresponding persistent barcode is shown inFigure 3(a)–(b).We now describe the Turing activator-inhibitor model.
The activator-inhibitor system [15,33,34] consists of two diffusible morphogens, an autocatalytic activatorwith concentration m ( x , t ) and a rapidly diffusing inhibitor with concentration c ( x , t ). This model canbe represented mathematically as ∂m∂t = D m ∇ m + ρm c (1 + µ m m ) − k m m + ρ m , (12) ∂c∂t = D c ∇ c + ρm − k c c + ρ c , (13)16 ositional-information model - annihilation(c) Turing model - activator-inhibitor (a)
123 6 9 10 11 13264043Morphogen 1Diffusion 1Degradation 1Influx 1Morphogen 2 Diffusion 2 Degradation 2Influx 2AnnihilationbetweenMorphogens 1and 2Monotonic gradientGlobal scale-invariance Filtration index0 100 900800700600500400300200 1000 H Filtration index0 100 900800700600500400300200 1000 H (b) Filtration index0 100 900800700600500400300200 1000 H Filtration index0 100 900800700600500400300200 1000 H (d) Figure 3: Simplicial complex representations and corresponding persistence barcodes
Positional-information annihilation model: (a) Simplicial representation; (b) Persistence barcode. Turing activator-inhibitormodel: (c) Simplicial representation; (d) Persistence barcode. Note that the H and H labels on the persistencebarcodes refer to the 0- and 1-dimensional homology classes, and the arrows at the right endpoints of some persis-tence intervals indicate that the right endpoints are + ∞ , corresponding to homology groups that are not annihilatedin the filtration. where D m and D c are diffusivities, ρ m and ρ c are basal production rates, k m and k c are degradation rates,and µ m is a saturation constant. The parameter ρ is the source density , which measures the general abilityof the cells to perform the autocatalytic reaction. The patterning arises through local self-enhancementof the activator, activation of the inhibitor, and long-range inhibition of the activator. We assume thatthe boundary conditions are zero flux at both boundaries.For the simplicial representation of the activator-inhibitor model, the vertices and corresponding modelcomponents are: 17 v ←→ Morphogen 1 • v ←→ Diffusion 1 • v ←→ Degradation 1 • v ←→ Basal production 1 • v ←→ Morphogen 2 • v ←→ Diffusion 2 • v ←→ Degradation 2 • v ←→ Basal production 2 • v ←→ Self-activation of morphogen 1 • v ←→ Activation of morphogen 2 by morphogen 1 • v ←→ Inhibition of morphogen 1 by morphogen 2 • v ←→ Oscillatory gradient • v ←→ Global scale-invarianceFor the activator-inhibitor model the simplicial complex and corresponding persistent barcode is shownin Figure 3(c)–(d).
We now apply Theorem 2.14 to calculate the distances between the simplicial representations of thepositional information and Turing models, shown in Table 1. These distances indicate the total numberof labelled simplices that must be added and subtracted to transform one labelled simplicial complexassociated with a model to the labelled simplicial complex of another model.
Table 1:
Distances between Turing and positional information models
PI 1 PI 2 PI 3 PI 4 PI 5 T 1 T 2 T 3 T 4PI 1 (linear gradient) 0 11 29 27 33 53 45 78 73PI 2 (synthesis-diffusion-degradation) 11 0 18 24 30 48 40 73 68PI 3 (opposing gradients) 29 18 0 26 48 60 58 85 80PI 4 (annihilation) 27 24 26 0 44 54 58 79 74PI 5 (modulation) 33 30 48 44 0 72 64 97 92T 1 (activator-inhibitor) 53 48 60 54 72 0 44 77 20T 2 (substrate depletion) 45 40 58 58 64 44 0 87 64T 3 (inhibition of an inhibition) 78 73 85 79 97 77 87 0 97T 4 (modulation) 73 68 80 74 92 20 64 97 0In Table 2 we also show the number of simplices in each simplicial representation of the Turing andpositional information models.We observe that the distance between two models can be relatively large even when the models havesimilar numbers of simplices of each dimension, since the metric accounts for not just the numbers ofsimplicies but also for the labelling of the simplicies. Further, our measure of distance between models is18 able 2:
Number of simplices in the labelled simplicial representations of the Turing and positional informationmodels
We now demonstrate our notion of model equivalence by comparing the positional-information annihila-tion model with the Turing activator-inhibitor model. Our definition of model equivalence is intentionallyrestrictive, since we only want models to be regarded as equivalent when they are conceptually verysimilar.The simplicial representations of the annihilation model and the activator-inhibitor model are shownin Figure 3(a) and (c), respectively. In accordance with the admissibility requirements for Operations1–3, we regard as equivalent the following subsets of model components:
Positional information Turing
Influx 1 ⇐⇒ {
Basal production 1, Self-activation of morphogen 1 } Influx 2 ⇐⇒ Basal production 2Monotonic gradient ⇐⇒ Oscillatory gradientAnnihilation betweenmorphogens 1 and 2 ⇐⇒ {
Activation of morphogen 2 by morphogen 1,Inhibition of morphogen 1 by morphogen 2 } We shall demonstrate the equivalence of the two models by applying Operations 1–3 to the Turingactivator-inhibitor model to obtain the positional-information annihilation model. Note that, since thethree operations are necessarily invertible, we could similarly apply the operations to the positional-19nformation annihilation model to obtain the Turing activator-inhibitor model.We perform a vertex identification of the two vertices ‘Basal production 1’ and ‘Self-activation 1’ toobtain the new vertex ‘Influx 1’. The vertex ‘Basal production 2’ is substituted with the vertex ‘Influx 2’.Similarly, we substitute the vertex ‘Monotonic gradient’ for the vertex ‘Oscillatory gradient’, since we areinterested in the existence of the pattern-forming gradient and not the particular profile of the gradient.Finally, we identify the vertices ‘Activation of morphogen 2 by morphogen 1’ and ‘Inhibition of morphogen1 by morphogen 2’ to obtain the vertex ‘Annihilation between morphogens 1 and 2’. The resultingsimplicial complex is that which represents the positional-information annihilation model. Since we haveonly used admissible operations to transform the Turing model complex to the positional-informationmodel complex, the Turing activator-inhibitor model and the positional-information annihilator modelare equivalent.Conversely, we could apply vertex splits and substitutions to transform the simplicial representationof the positional-information annihilation model to the simplicial representation of the Turing activator-inhibitor model. We can also compare the persistence barcodes shown in Figure 3(b) and (d), whichhave a high degree of similarity in accordance with the established equivalence between the positional-information annihilation model and the Turing activator-inhibitor model.This analysis demonstrates that the Turing activator-inhibitor model and the positional-informationannihilation model, each one of the main models for their patterning mechanism, are in fact closely relatedin terms of structure and mechanism. The main difference between the positional information model andthe Turing model is the source of the gradient-forming morphogen, whereby the positional-informationmechanism takes advantage of an influx of morphogen from outside the patterning domain, whereasfor the Turing mechanism the morphogen is produced uniformly within the domain. The morphogeninflux naturally produces a morphogen gradient in the positional-information mechanism, so there is norequirement for specialised dynamics. In contrast, the Turing mechanism requires a Turing instability togenerate an oscillating gradient from the spatially uniform production of the morphogen, requiring veryparticular conditions for existence.
We have presented a new way of comparing models and their structures based on the relationshipsbetween different model aspects. Compared to some previous attempts at comparing model structures[35] our approach is readily automatable, and thus able to meet the demands of large-scale modellingattempts, and model-curation projects [36]. Studying the persistent homology of the relationships between20odel features allows us to determine meaningful distances between models. In addition to distances,here based on simplicial complex representations of the relationships between model features, we havealso developed and applied the concept of equivalent model features. This gives more nuanced insightsinto the relationships between different models and allows us to detect non-trivial similarities betweenmathematical models that are otherwise easily missed. Our realisation that the Turing activator-inhibitormodel and the positional-information annihilation model are in fact equivalent – which to our knowledgehas never before been shown or considered [25] – demonstrates the novel insights that can be gained fromour formalism for model comparison.We foresee more need for this in the immediate future: modelling is increasing in importance in thelife and biomedical sciences, yet remains to be fallible and often poorly grounded in reality [37, 38]. Theability to compare, contrast, reconcile, and triage potentially large sets of models will aid in makingmathematical modelling more helpful in biology.While we have applied our methodology to similar types of models, namely systems of reaction-diffusion equations, another important aspect of our formalism is that we can compare any models,irrespective of how distinct they are in form and size. Indeed, our methodology is universally applicableto all models.
Author contributions
STV and MPHS conceived and planned this analysis; STV conducted the research, performed the analysis,and drafted the manuscript; all authors reviewed, edited, and approved the final version.
Funding
The authors gratefully acknowledge funding through a “Life?” programme grant from the VolkswagenStiftung. MPHS is funded through the University of Melbourne Driving Research Momentum program.
Acknowledgments
We thank members of the Theoretical Systems Biology group at the University of Melbourne and ImperialCollege London, Heike Siebert (FU Berlin), James Briscoe (The Francis Crick Institute) and Mark Isalan(Imperial College London) for helpful discussions on Turing Patterns and model discrimination.21 eferences [1] Rosenblueth A, Wiener N. The role of models in science.
Philosophy of Science . 1945;12:316–321.doi:10.1086/286874.[2] Tomlin CJ, Axelrod JD. Biology by numbers: mathematical modelling in developmental biology.
Nature Reviews Genetics . 2007;8:331–340. doi:10.1038/nrg2098.[3] Sneddon MW, Faeder JR, Emonet T. Efficient modeling, simulation and coarse-graining of biologicalcomplexity with NFsim.
Nature Methods . 2010;8(2):177–183. doi:10.1038/nmeth.1546.[4] Pezzulo G, Levin M. Top-down models in biology: explanation and control of complex livingsystems above the molecular level.
Journal of The Royal Society Interface . 2016;13:20160555.doi:10.1098/rsif.2016.0555.[5] Transtrum MK, Qiu P. Bridging mechanistic and phenomenological models of complex biologicalsystems.
PLOS Computational Biology . 2016;12:e1004915. doi:10.1371/journal.pcbi.1004915.[6] Hlavacek WS, Faeder JR, Blinov ML, Posner RG, Hucka M, Fontana W. Rules for modeling signal-transduction systems.
Science Signaling . 2006;2006:re6. doi:10.1126/stke.3442006re6.[7] Danos V, Feret J, Fontana W, Harmer R, Krivine J. Rule-based modelling of cellular signalling. In:Caires L, Vasconcelos VT, editors. CONCUR 2007 – Concurrency Theory. Springer Berlin Heidelberg;2007. p. 17–41. doi:10.1007/978-3-540-74407-8.[8] Bachman JA, Sorger P. New approaches to modeling complex biochemistry.
Nature Methods .2011;8:130–131. doi:10.1038/nmeth0211-130.[9] Wang RS, Saadatpour A, Albert R. Boolean modeling in systems biology: an overview of method-ology and applications.
Physical Biology . 2012;9:055001. doi:10.1088/1478-3975/9/5/055001.[10] Wolkenhauer O. Why model?
Frontiers in Physiology . 2014;5:21. doi:10.3389/fphys.2014.00021.[11] Tavassoly I, Goldfarb J, Iyengar R. Systems biology primer: the basic methods and approaches.
Essays in Biochemistry . 2018;62:487–500. doi:10.1042/EBC20180003.[12] Stalidzans E, Zanin M, Tieri P, Castiglione F, Polster A, Scheiner S, et al. Mechanistic modeling andmultiscale applications for precision medicine: theory and practice.
Network and Systems Medicine .2020;3:36–56. doi:10.1089/nsm.2020.0002. 2213] Turing AM. The chemical basis of morphogenesis.
Philosophical Transactions of the Royal Societyof London Series B, Biological Sciences . 1952;237:37–72. doi:10.1098/rstb.1952.0012.[14] Wolpert L. Positional information and the spatial pattern of cellular differentiation.
Journal ofTheoretical Biology . 1969;25:1–47. doi:10.1016/S0022-5193(69)80016-0.[15] Gierer A, Meinhardt H. A theory of biological pattern formation.
Kybernetik . 1972;12:30–39.doi:10.1007/BF00289234.[16] Le Nov`ere N. Quantitative and logic modelling of molecular and gene networks.
Nature ReviewsGenetics . 2015;16:146–158. doi:10.1038/nrg3885.[17] Gay S, Soliman S, Fages F. A graphical method for reducing and relating models in systems biology.
Bioinformatics . 2010;26:i575–i581. doi:10.1093/bioinformatics/btq388.[18] Schulz M, Krause F, Nov`ere NL, Klipp E, Liebermeister W. Retrieval, alignment, and clusteringof computational models based on semantic annotations.
Molecular Systems Biology . 2011;7:512.doi:10.1038/msb.2011.41.[19] Kirk P, Thorne T, Stumpf MPH. Model selection in systems and synthetic biology.
Current Opinionin Biotechnology . 2013;24:767–774. doi:10.1016/j.copbio.2013.03.012.[20] Clark C, Kalita J. A comparison of algorithms for the pairwise alignment of biological networks.
Bioinformatics . 2014;30:2351–2359. doi:10.1093/bioinformatics/btu307.[21] Henkel R, Hoehndorf R, Kacprowski T, Kn¨upfer C, Liebermeister W, Waltemath D. No-tions of similarity for systems biology models.
Briefings in Bioinformatics . 2018;19:77–88.doi:10.1093/bib/bbw090.[22] Edelsbrunner H, Letscher D, Zomorodian A. Topological persistence and simplification.
Discrete &Computational Geometry . 2002;28:511–533. doi:10.1007/s00454-002-2885-2.[23] Zomorodian A, Carlsson G. Computing persistent homology.
Discrete & Computational Geometry .2005;33:249–274. doi:10.1007/s00454-004-1146-y.[24] Otter N, Porter MA, Tillmann U, Grindrod P, Harrington HA. A roadmap for the computation ofpersistent homology.
EPJ Data Science . 2017;6:17. doi:10.1140/epjds/s13688-017-0109-5.[25] Green JBA, Sharpe J. Positional information and reaction-diffusion: two big ideas in developmentalbiology combine.
Development . 2015;142:1203–1211. doi:10.1242/dev.114991.2326] Murray JD. Mathematical Biology. vol. I: An Introduction. Third edition ed. Springer New York;2002. doi:10.1007/b98868.[27] Marangoni AG. Enzyme Kinetics. Wiley-Interscience; 2003.[28] Sipser M. Introduction to the Theory of Computation. 3rd ed. Cengage Learning; 2013.[29] Rotman JJ. An Introduction to Algebraic Topology. Springer New York; 1988. doi:10.1007/978-1-4612-4576-6.[30] Munkres J. Elements of Algebraic Topology. CRC Press; 2018. doi:10.1201/9780429493911.[31] McHale P, Rappel WJ, Levine H. Embryonic pattern scaling achieved by oppositely directed mor-phogen gradients.
Physical Biology . 2006;3:107–120. doi:10.1088/1478-3975/3/2/003.[32] Ben-Naim E, Redner S. Inhomogeneous two-species annihilation in the steady state.
Journal ofPhysics A . 1992;25:L575–L583. doi:10.1088/0305-4470/25/9/012.[33] Meinhardt H. Turing ' s theory of morphogenesis of 1952 and the subsequent discovery of thecrucial role of local self-enhancement and long-range inhibition. Interface Focus . 2012;2:407–416.doi:10.1098/rsfs.2011.0097.[34] Landge AN, Jordan BM, Diego X, M¨uller P. Pattern formation mechanisms of self-organizingreaction-diffusion systems.
Developmental Biology . 2020;460:2–11. doi:10.1016/j.ydbio.2019.10.031.[35] Scholes NS, Schnoerr D, Isalan M, Stumpf MPH. A comprehensive network atlas reveals that Turingpatterns are common but not robust.
Cell Systems . 2019;9:243–257. doi:10.1016/j.cels.2019.07.007.[36] Malik-Sheriff RS, Glont M, Nguyen TVN, Tiwari K, Roberts MG, Xavier A, et al. BioModels — 15years of sharing computational models in life science.
Nucleic Acids Research . 2019;48:D407–D415.doi:10.1093/nar/gkz1055.[37] Gunawardena J. Models in biology: ‘accurate descriptions of our pathetic thinking’.
BMC Biology .2014;12:29. doi:10.1186/1741-7007-12-29.[38] Stumpf MPH. Multi-model and network inference based on ensemble estimates: avoiding the madnessof crowds.
Journal of The Royal Society Interface . 2020;17:20200419. doi:10.1098/rsif.2020.0419.24 upplementary Materialfor:
Model comparison via simplicial complexes and persistent homology
Sean T. Vittadello and Michael P. H. Stumpf a r X i v : . [ m a t h . A T ] D ec ontents Summary
In this Supplementary Material document we provide the simplicial representations and persistence barcodes for theadditional Turing and positional information models that we discuss in the main document.To assist the reader of the main document, we also provide a concise discussion of the background in algebraictopology that is relevant to this work, namely simplicial and persistent homology. Further details on simplicial ho-mology are available from standard references for algebraic topology [1–3], and further details on persistent homologycan be found in [4–6].Additionally, we describe the notation for reaction-diffusion equations that we employ in the main document.
We consider four additional models of positional information that produce patterning.
A linear concentration profile of a morphogen results when the production and degradation of the morphogen occuroutside of the tissue domain on opposite sides, and the morphogen passively diffuses along the domain from the sidewhere it is produced to the side where it is degraded [7–10]. Mathematically, the steady-state morphogen concentrationin this system satisfies Laplace’s equation, which yields global scale-invariant positional information. Specifically, ifthe tissue length is L with initial position at x = 0 then the morphogen concentration m ( x, t ) can be modelled as ∂m∂t = D ∂ m∂x , (S1)where D is the morphogen diffusivity. The steady-state solution of Equation S1 is the linear equation m ( x ) = ax + b ,for arbitrary constants a , b ∈ R . The boundary conditions must be influx at one end and outflux at the other end,and we may assume without loss of generality that influx occurs at x = 0 and outflux at x = L . The original Dirichletboundary conditions specify the constant concentrations m (0 , t ) = m > m ( L, t ) = m L ≥
0, with m > m L , sothe solution is m ( x ) = (cid:0) ( m L − m ) /L (cid:1) x + m . Monotonically decreasing linear gradients can also be achieved withNeumann boundary conditions at one boundary and Dirichlet boundary conditions at the opposite boundary.For the simplicial representation of the linear model, the vertices and corresponding model components are:2 v ←→ Morphogen 1 • v ←→ Diffusion 1 • v ←→ Influx 1 • v ←→ Outflux 1 • v ←→ Monotonic gradient • v ←→ Global scale-invarianceThe simplicial representation should capture the interconnections of the model components that result in the formationof the steady-state morphogen gradient, so for the linear model the simplicial complex is shown in Figure S1(a). Thecorresponding persistence barcode is shown in Figure S2(a).
In the SDD model, a morphogen gradient forms by morphogen production from a localised source at the boundarycombined with morphogen diffusion and uniform degradation throughout the tissue [10–12]. Mathematically, themorphogen concentration m ( x , t ) can be modelled as ∂m∂t = D ∇ m − km, (S2)where D is the morphogen diffusivity, and k is the morphogen degradation rate which is independent of morphogenconcentration. The boundary conditions are Neumann for the influx and zero outflux. The steady-state morphogengradient is an exponential function, so this system is not scale invariant unless D , k , and the influx vary with thetissue length L in a very specific manner [12].For the simplicial representation of the SDD model, the vertices and corresponding model components are: • v ←→ Morphogen 1 • v ←→ Diffusion 1 • v ←→ Degradation 1 • v ←→ Influx 1 • v ←→ Monotonic gradientFor the SDD model the simplicial complex is shown in Figure S1(b). The corresponding persistence barcode is shownin Figure S2(b).
We discuss two mechanisms whereby two opposing morphogen gradients provide size information for developmentalpatterning [13]. The first is the scaling by opposing gradients model, where the gene expression depends on therelative concentrations of the two morphogens. The second is the annihilation model, where morphogen 2 irreversiblybinds to morphogen 1 which inhibits its activity of transcription, and the target gene responds to the concentrationof morphogen 1 and the gradient of morphogen 2 provides size information to the concentration field of morphogen1. Here we consider the first mechanism, opposing gradients, and next consider the second mechanism.3 a) Positional-information model - linear gradient(b) Positional-information model - synthesis-diffusion-degradation(c) Positional-information model - opposing gradients
12 6 74043Morphogen 1Diffusion 1 Influx 1 Outflux 1Monotonic gradientGlobal scale-invariance1 2 3 640Morphogen 1 Diffusion 1 Degradation 1Influx 1Monotonic gradient123 6 9 10 11134042Morphogen 1Diffusion 1Degradation 1 Influx 1 Morphogen 2 Diffusion 2 Degradation 2Influx 2Monotonic gradientLocal scale-invariance (d) Positional-information model - modulation (e) Turing model - substrate depletion(f) Turing model - inhibition of an inhibition (g) Turing model - modulation
Figure S1: Simplicial complex representations for additional positional information and Turing models.
Positionalinformation models: (a) Linear gradient; (b) Synthesis-diffusion-degradation; (c) Opposing gradients; (d) Modulation. Turingmodels: (e) Substrate depletion; (f) Inhibition of inhibition; (g) Modulation. igure S2: Persistence barcodes for additional positional information and Turing models. Positional informationmodels: (a) Linear gradient; (b) Synthesis-diffusion-degradation; (c) Opposing gradients; (d) Modulation. Turing models: (e)Substrate depletion; (f) Inhibition of inhibition; (g) Modulation. m ( x , t ) and c ( x , t ) can be modelled as ∂m∂t = D m ∇ m − k m m, (S3) ∂c∂t = D c ∇ c − k c c, (S4)where D m and D c are diffusivities, and k m and k c are degradation rates. The boundary conditions for each morphogenare Neumann at both boundaries, with flux at the source and zero flux at the opposite boundary.For the simplicial representation of the opposing gradients model, the vertices and corresponding model compo-nents are: • v ←→ Morphogen 1 • v ←→ Diffusion 1 • v ←→ Degradation 1 • v ←→ Influx 1 • v ←→ Morphogen 2 • v ←→ Diffusion 2 • v ←→ Degradation 2 • v ←→ Influx 2 • v ←→ Monotonic gradient • v ←→ Local scale-invarianceFor the opposing gradients model the simplicial complex is shown in Figure S1(c). The corresponding persistencebarcode is shown in Figure S2(c).
The largest degree of scaling across the patterning domain is obtained when the biophysical properties of the mor-phogen are influenced by an accessory modulator that senses the size of the domain [10–12, 15]. The modulatormolecules with concentration c ( x , t ) may influence the diffusivity, degradation rate, or influx rate, of the morphogenwith concentration m ( x , t ). If the kinetics and source of the modulator are independent of the morphogen concentra-tion then the mechanism is passive modulation . Otherwise the mechanism is active modulation , and the distributionof the modulator is dependent on feedback from the morphogen. An example of active modulation is the expansion-repression mechanism, whereby the modulator expands the range of the morphogen gradient through an increase inthe morphogen diffusivity or a decrease in the morphogen degradation rate, while the morphogen represses/inhibits6roduction of the modulator [15]. The expander molecule, here the modulator, needs to have a high diffusivity inorder to rapidly equilibrate across the domain and therefore provide information about domain size. Therefore, thismechanism establishes global scaling by increasing morphogen range in larger domains and decreasing morphogenrange in smaller domains. Correspondingly, the induction-contraction model also scales globally [16, 17]. In this case,the morphogen induces the production of a contractor molecule, here the modulator with high diffusivity once again,which contracts the range of the morphogen gradient through a decrease in the morphogen diffusivity or an increaseof the morphogen degradation rate. Note that while the amplitude and shape of the morphogen gradient is globallyscale-invariant, the same is not necessarily true of the modulator since the modulator level reflects the domain sizeand therefore increases or decreases accordingly.Here we consider the induction-contraction mechanism, which can be represented mathematically as ∂m∂t = D m ( c ) ∇ m − k m ( c ) m, (S5) ∂c∂t = D c ∇ c − k c c + ρ ( m ) , (S6)where D m ( c ) and D c are diffusivities, k m ( c ) and k c are degradation rates, and ρ ( m ) is the localised production sourcefor the modulator. Note that D m ( c ) is a decreasing function of c , k m ( c ) is an increasing function of c , and ρ ( m ) isan increasing function of m . Further note that ‘degradation’ in this context refers to not only physical destruction,but to any mechanism that effects the removal of the morphogen from the patterning system, including irreversiblecomplex formation. Morphogen boundary conditions are Neumann for the influx and zero flux at the opposite endof the domain. For the modulator there are various possibilities for the boundary conditions, such as the same typeof boundary conditions as the morphogen when the modulator source is outside the domain or, as we consider here,zero flux at both boundaries when the modulator source is within the domain.For the simplicial representation of the induction-contraction modulation model, the vertices and correspondingmodel components are: • v ←→ Morphogen 1 • v ←→ Diffusion 1 • v ←→ Degradation 1 • v ←→ Influx 1 • v ←→ Modulator 1 • v ←→ Diffusion of modulator 1 • v ←→ Degradation of modulator 1 • v ←→ Production of modulator 1 • v ←→ Modulation of diffusion 1 by modulator 1 • v ←→ Modulation of degradation 1 by modulator 1 • v ←→ Inhibition of modulator 1 by morphogen 1 • v ←→ Monotonic gradient • v ←→ Global scale-invariance7or the induction-contraction modulation model the simplicial complex is shown in Figure S1(d). The correspondingpersistence barcode is shown in Figure S2(d).
We consider three further Turing models that produce patterning. Note that the boundary conditions may beDirichlet, Neumann, Robin, or periodic [18–20].
While long-range inhibition of an autocatalytic activator can occur with an inhibitor, as in the activator-inhibitormodel, the antagonistic effect can also arise from the depletion of a substrate that is consumed during activator pro-duction. In this system, the production of an autocatalytic activator with concentration m ( x , t ) results in either director indirect depletion of a substrate with concentration s ( x , t ) [21]. This model can be represented mathematically as ∂m∂t = D m ∇ m + ρsm − k m m + ρ m , (S7) ∂s∂t = D s ∇ s − ρsm − k s s + ρ s , (S8)where D m and D s are diffusivities, ρ m and ρ s are basal production rates, k m and k s are degradation rates, and ρ is the source density for the autocatalytic reaction of the activator, similar to the activator-inhibitor model. Thediffusivity D s of the substrate must be much faster than the diffusivity D m of the activator, and it is assumed thatthe substrate is produced uniformly throughout the domain. We assume that the boundary conditions are zero fluxat both boundaries.For the simplicial representation of the substrate depletion model, the vertices and corresponding model compo-nents are: • v ←→ Morphogen 1 • v ←→ Diffusion 1 • v ←→ Degradation 1 • v ←→ Basal production 1 • v ←→ Substrate 1 • v ←→ Diffusion of substrate 1 • v ←→ Degradation of substrate 1 • v ←→ Basal production of substrate 1 • v ←→ Self-activation of morphogen 1 • v ←→ Depletion of substrate 1 by morphogen 1 • v ←→ Oscillatory gradient • v ←→ Global scale-invarianceFor the substrate depletion model the simplicial complex is shown in Figure S1(e). The corresponding persistencebarcode is shown in Figure S2(e). 8 .2.2 Inhibition of an inhibition
Dynamics analogous to the activator-inhibitor model can be realised through an inhibition of an inhibition mechanism[21]. In this case, two morphogens with concentrations a ( x , t ) and c ( x , t ) inhibit the production of each other, therebyforming a switching system in which one of the morphogens becomes fully activated similar to being autcatalytic.In order that pattern formation occur, a third morphogen with concentration b ( x , t ) acts as a long-range signal thatdisrupts the indirect self-enhancement of either a or c . Morphogen b is rapidly diffusing, is produced under controlof a , and inhibits the inhibition of c production by a . therefore acting as inhibitor. This model can be representedmathematically as ∂a∂t = D a ∇ a + ρ a κ a + c − k a a, (S9) ∂b∂t = D b ∇ b + k b ( a − b ) , (S10) ∂c∂t = D c ∇ c + ρ c κ c + a /b − k c c, (S11)where D a , D b , and D c are diffusivities, ρ a and ρ c are production rates, k a and k c are degradation rates, k b is both therate of production and degradation of b , and κ a and κ c are saturation constants that limit the production rate if theconcentrations of a or c become too low. We assume that the boundary conditions are zero flux at both boundaries.For the simplicial representation of the inhibition of an inhibition model, the vertices and corresponding modelcomponents are: • v ←→ Morphogen 1 • v ←→ Diffusion 1 • v ←→ Degradation 1 • v ←→ Production 1 • v ←→ Morphogen 2 • v ←→ Diffusion 2 • v ←→ Degradation 2 • v ←→ Morphogen 3 • v ←→ Diffusion 3 • v ←→ Degradation 3 • v ←→ Production 3 • v ←→ Inhibition of morphogen 1 by morphogen 3 • v ←→ Inhibition of morphogen 3 by morphogen 1 • v ←→ Inhibition of an inhibition by morphogen 2 • v ←→ Production of morphogen 2 by morphogen 1 • v ←→ Oscillatory gradient • v ←→ Global scale-invarianceFor the inhibition of an inhibition model the simplicial complex is shown in Figure S1(f). The corresponding persis-tence barcode is shown in Figure S2(f). 9 .2.3 Modulation
The diffusion of a morphogen may be inhibited by adsorption on negatively-charged extracellular matrix (ECM) com-ponents, resulting in modulated diffusion and a smaller effective diffusivity for the morphogen [22]. The correspondingmodulation model [22] is an extension of the activator-inhibitor model [21, 23, 24]. Our description of the modulationmodel is based on the version of the activator-inhibitor model described above. The modulation model consists oftwo diffusible morphogens, an autocatalytic activator and an inhibitor, along with available binding sites on the ECMonto which the activator adsorbs. The inhibitor does not bind to the ECM, and the free activator and the inhibitorhave equal diffusivities D . It is assumed that autocatalysis and activation of the inhibitor by the activator occurs inboth the free and adsorbed states. It is also assumed that the degradation rate of the activator is equal in both thefree and adsorbed states. Free binding sites on the ECM therefore appear due to both desorption and degradation ofthe activator. Such modulation allows for stable dissipative morphogens gradients. Mathematically, the free activatorhas concentration a ( x , t ), the bound activator has concentration b ( x , t ), the inhibitor has concentration c ( x , t ), andthe concentration of available binding sites on the ECM is s ( x , t ): ∂a∂t = D ∇ a + ρ ( a + b ) c (1 + µ a ( a + b ) ) − k a a + ρ a − k sa + k − b, (S12) ∂c∂t = D ∇ c + ρ ( a + b ) − k c c + ρ c , (S13) ∂s∂t = − k sa + ( k − + k a ) b, (S14)where D is the diffusivity, ρ a and ρ c are basal production rates, k a and k c are degradation rates, µ a is the saturationconstant, ρ is the source density, k is the rate of adsorption of the activator onto the ECM, and k − is the rate ofdesorption of the activator from the ECM. We assume that the boundary conditions are zero flux at both boundaries.For the simplicial representation of the modulation model, the vertices and corresponding model components are: • v ←→ Morphogen 1 • v ←→ Diffusion 1 • v ←→ Degradation 1 • v ←→ Basal production 1 • v ←→ Morphogen 1 bound • v ←→ Morphogen 2 • v ←→ Diffusion 2 • v ←→ Degradation 2 • v ←→ Basal production 2 • v ←→ Self-activation of morphogen 1 • v ←→ Activation of morphogen 2 by morphogen 1 • v ←→ Inhibition of morphogen 1 by morphogen 2 • v ←→ Adsorption of morphogen 1 • v ←→ Desorption of bound morphogen 1 • v ←→ Oscillatory gradient • v ←→ Global scale-invariance10or the modulation model the simplicial complex is shown in Figure S1(g). The corresponding persistence barcode isshown in Figure S2(g).
Homology associates algebraic invariants, in particular abelian groups, to a topological space. These invariantsdetermine the number of connected components, holes, and voids in the space. This homological characterisationtherefore provides a method for distinguishing between topological spaces. There are various types of homologytheory, however simplicial homology often allows for easier calculation of the algebraic invariants in applications,and agrees with more general theories such as singular homology on spaces that can be triangulated. Note that atopological space and its triangulation, a simplicial complex, are homeomorphic, so they have the same homology.Triangulations of data sets provide common examples of simplicial complexes [25].We can view simplicial complexes as either combinatorial objects or as geometric objects, which we now describe.
Definition
Abstract simplicial complex ) . An abstract simplicial complex is a set V along with a collection K of finite nonempty subsets of V such that:1. { v } ∈ K for all v ∈ V .2. If σ ∈ K and τ is a nonempty subset of σ then τ ∈ K .The set V is the vertex set of the simplicial complex K , and the vertex v ∈ V is identified with { v } ∈ K . Eachelement σ ∈ K is called a simplex of K , and the dimension of σ is the nonnegative integer given by dim( σ ) = | σ | − σ ∈ K is a p -simplex if the dimension of σ is p , which is written as σ p for clarity if required. In particular,vertices are 0-simplices. Each nonempty subset τ ⊆ σ is called a face of σ , and is a proper face when τ = σ . The dimension of K is the largest dimension of its simplices, unless there is no finite upper bound for the dimensions ofthe simplices in which case the dimension of K is infinite. A subcomplex of K is a collection L ⊆ K that is alsoan abstract simplicial complex. While the definition of an abstract simplicial complex allows for an infinite set ofvertices V , henceforth we only consider finite V and therefore finite simplicial complexes K , since infinite-dimensionalcomplexes are not required for our work.Abstract simplicial complexes are combinatorial objects, however simplicial complexes can also be defined asgeometric objects in Euclidean space R n . In fact, an abstract simplicial complex has a geometric realisation asa geometric simplicial complex in R n . Correspondingly, if we just consider the subsets of vertices in a geometricsimplicial complex then we can obtain a corresponding abstract simplicial complex.11 efinition Geometric p -simplex ) . A geometric p -simplex σ in R n is the convex hull of p +1 affinely independentpoints in R n , which are the vertices of the p -simplex. A face of σ is the convex hull of a subset of the p + 1 affinelyindependent points that determine σ .Therefore, 0-simplices are points, 1-simplices are line segments, 2-simplices are triangles, 3-simplices are tetrahedra,and so forth in higher dimensions. Definition
Geometric simplicial complex ) . A geometric simplicial complex is a collection K of geometricsimplices such that:1. Every face of a simplex from K is also in K .2. If two simplices have nonempty intersection then they intersect at a common face.The abstract and geometric realisations of a simplicial complex are equivalent, so we may move between the tworealistions as required. Definition
Orientation of a simplex ) . Let V = { v i } pi =0 be the vertex set of a simplex σ . Denote an ordering of V by [ v i ] i ∈ I , where I is a permutation of the integers { , , . . . , p } . Two orderings of V are equivalent if they differ byan even permutation. If dim( σ ) ≥ V form two equivalence classes, and there is one equivalenceclass when dim( σ ) = 0. Each equivalence class is an orientation of the simplex σ . An oriented simplex is a simplex σ together with an orientation of σ . Definition
Oriented simplex complex ) . An oriented simplicial complex is a simplicial complex K togetherwith a partial order on the set of vertices of K , such that the restriction of the partial order to the vertices of anysimplex in K is a total order.Note that every total ordering of the vertices of a simplicial complex K gives an oriented simplicial complex. Inthe following we assume that K is an oriented simplicial complex. Definition p -chain ) . A p -chain is a finite formal sum of oriented p -simplices in K , c = P i a i σ pi , where thecoefficients a i are elements of an abelian group.For our purposes the coefficients are always from the finite cyclic group Z / Z , which we can identify as { , } ,and we can denote by F for simplicity since Z / Z is also a finite field. Under coefficients from F a p -chain is simplya set of p -simplices. Definition p -chain group ) . Denote by C p the set of all p -chains over K . Define a binary operation, denoted +,12n the set of p -chains C p by c + c = X i a i σ pi + X i b i σ pi = X i (cid:0) ( a i + b i (mod 2) (cid:1) σ pi , (S15)so that the sum of a simplex with itself is zero. Then ( C p , +) is an abelian group with identity P i σ pi and inverse − c = c . The set of p -simplices { σ pi } ni =1 in K generates C p and is the minimal set of such generators. Therefore,( C p , +) is a free abelian group, called the p -chain group , with basis { σ pi } ni =1 .For simplicity we denote the p -chain group as C p , without reference to the group operation. Definition
Boundary operator ) . For p ≥ boundary operator ∂ p : C p → C p − maps a p -simplex to thesum of the ( p − boundary . Specifically, for an oriented p -simplex σ p = [ v i ] pi =0 we have ∂ p ( σ p ) = p X i =0 ( − i [ v , . . . , ˆ v i , . . . , v p ] , (S16)where [ v , . . . , ˆ v i , . . . , v p ] is the ( p − v i . The boundary operator on general p -chains is given by ∂ p (cid:18) X i a i σ pi (cid:19) = X i ( − i a i ∂ p ( σ pi ) . (S17)Note that ∂ p is a group homomorphism. For 0-chains we define C − = 0, then the boundary operator ∂ satisfies ∂ ( c ) = 0 for all 0-chains c . Since we only consider chains with coefficients from F , all of the coefficients ( − i equalone.We now consider two important subgroups of C p . Definition
Cycle group, p -cycle ) . Denote by Z p the set of all p -chains in C p with boundary zero, that is, Z p = ker( ∂ p ) = { c ∈ C p | ∂ p ( c ) = 0 } . (S18)Since ∂ p is a homomorphism, Z p is a subgroup of C p called the cycle group . The p -chains in Z p are called p -cycles .Note that every 0-chain is a 0-cycle, so it follows that Z = C . Definition
Boundary group, p -boundary ) . Denote by B p the set of all p -chains in C p which are the boundaryof a ( p + 1)-chain, that is, B p = im( ∂ p +1 ) = { c ∈ C p | ∃ b ∈ C p +1 with ∂ p +1 ( b ) = c } . (S19)Since ∂ p +1 is a homomorphism, B p is a subgroup of C p called the boundary group . The p -chains in B p are called p -boundaries . 13ote that ∂ p ◦ ∂ p +1 = 0 for all p ≥
0, hence a boundary does not have a boundary so is a cycle, or equivalently B p is a subgroup of K p .The boundary operators provide a connection between the chain groups: Definition
Chain complex ) . Let K be an n -dimensional simplicial complex. Then the chain complex C ∗ isthe sequence 0 −−−→ C n ∂ n −−−→ C n − ∂ n − −−−→ · · · ∂ −−−→ C ∂ −−−→ C ∂ −−−→ , (S20)where ∂ k ∂ k +1 = 0 for all k . The sequence is augmented on the right by a 0, and ∂ = 0. On the left we have C n +1 = 0as there are no ( n + 1)-simplices in K .Since C p is abelian, B p is a normal subgroup and we can therefore construct the quotient group Z p /B p . Definition p -homology group ) . The p -homology group H p is defined as the quotient group Z p /B p .The p -homology group H p consists of equivalence classes of p -cycles that are not boundaries for any ( p + 1)-chains. Further, each equivalence class in H p consists of homologous p -cycles that differ only in a p -boundary. So, H p is nonzero if and only if the simplicial complex K has p -cycles that are not boundaries, or informally, K has p -dimensional holes. Computationally we are interested in the number of linearly independent p -dimensional holes,which we will see is given by the p -Betti number.Since the oriented p -simplices form a basis for the p -chain group C p , we can represent the boundary operator ∂ p : C p → C p − as a matrix with entries in F , called the boundary matrix. We can compute the homology groupsof a simplicial complex by reducing the boundary matrix to Smith normal form [3, Page 55, Theorem 11.3]. Thereduction algorithm employs the following elementary column and row operations on the boundary matrix [5]: Matrix operation 1:
Exchange two columns or rows;
Matrix operation 2:
Multiply any column or row by − Matrix operation 3:
Add an integer multiple of one column or row to another.The elementary column operations effect a change of basis for C p , and similarly the elementary row operations effecta change of basis for C p − , so these operations preserve the rank of the matrix. Since we are computing homologyover F , for which − Definition
Boundary matrix and Smith normal form ) . Let { σ pi } ni =1 and { σ p − i } mi =1 be bases for C p and14 p − , respectively. The boundary matrix M p of the boundary operator ∂ p : C p → C p − is the matrix σ p σ p · · · σ pn σ p − a a · · · a n σ p − a a · · · a n ... ... ... . . . ... σ p − m a m a m · · · a mn (S21)where a ij = 1 if and only if σ p − i is a face of σ pj , otherwise a ij = 0. By using column and row operations, M p can bereduced to Smith normal form S p , which has the same rank as M p : S p = d
0. . . 00 d l . (S22)The nonzero entries on the main diagonal, d i for 1 ≤ i ≤ l , satisfy d i | d i +1 for 1 ≤ i < l , which holds trivially for amatrix over F since each d i is one.The rank of S p is the number of ones on the main diagonal of S p , which we denote by b p − , and corresponds to p -chains with nonzero boundaries that generate B p − . Denote by z p the number of right-most columns of S p that arezero, which corresponds to p -cycles that generate Z p . Then r p = b p − + z p , where r p is the number of columns of S p .For p ≥ p -Betti number β p is obtained as β p = rank( Z p ) − rank( B p ) = z p − b p , (S23)and gives the number of linearly independent p -dimensional holes in K . Persistence is a measure for topological attributes [6], and persistent homology studies topological invariants thatreveal the topological features of a space that persist across multiple scales [4, 5]. Our discussion of persistenthomology follows [5]. We first define the notion of a weighted simplicial complex [26].
Definition
Weighted simplicial complex ) . A weighted simplicial complex is a pair ( K, w ) where K is asimplicial complex and w : K → R is a weight function over K such that if σ , τ ∈ K with σ ⊆ τ then w ( σ ) ≤ w ( τ ).There are various methods to assign a weight function to an unweighted simplicial complex. For example, a weight15unction can be constructed from a discrete Morse function [27]. Definition
Simplicial filtration, filtered simplicial complex, filtration index ) . Given a weighted simpli-cial complex (
K, w ) and a strictly-increasing finite sequence { λ i } ni =0 ⊆ R , the induced simplicial filtration is a nestedsequence of subcomplexes K ⊆ K ⊆ · · · ⊆ K n = K where K i := w − (( −∞ , λ i ]). For notational simplicity wedenote the filtration as { K i } ni =0 . A simplicial filtration, or simply filtration, is also referred to as a filtered simplicialcomplex . The filtration index of a simplex σ ∈ K is equal to i if σ ∈ K i and σ / ∈ K i − .Sometimes λ is chosen so that K = ∅ , however this is not always necessary in practice. Definition
Simplicial subfiltration ) . Let (
K, w ) be a weighted simplicial complex, let { λ i } ni =0 ⊆ R be astrictly-increasing finite sequence, and let { K i } ni =0 be a filtration for K . For a subcomplex L ⊂ K , a subfiltration for L is induced by the restriction of w to L , that is, the weighted simplicial complex is ( L, w | L ) and the subfiltrationis { K i ∩ L } ni =0 . Note that we may have K i ∩ L = K j ∩ L for some filtration indices i = j , so we therefore take thesmallest such index as the corresponding filtration index.Given a filtration we can consider the p -homology groups of each subcomplex in the filtration. Definition
Homology of filtration ) . Let (
K, w ) be a weighted simplicial complex with filtration { K i } ni =0 , andfix 0 ≤ i ≤ n . Let Z ip and B ip be the p -cycle and p -boundary groups, respectively, of K i . The p -homology group of K i is H ip = Z ip /B ip , and the p -Betti number β ip of K i is the rank of H ip .The p -Betti numbers of a simplicial filtration describe the topology of the filtered complex in terms of a sequenceof integers. These Betti numbers, however, may not describe the underlying topological space very well, since whilethe Betti numbers describe the topology of the underlying space, they also describe the noise associated with theparticular simplicial representation which appears as additional topological attribute. So the meaningful topologicalinformation is hidden within the topological noise.In order to obtain meaningful information about the underlying space using a simplicial representation, we need tobe able to distinguish between the noise arising from the representation and the topological features of the underlyingspace. We therefore require a measure of significance for the topological attributes associated with a simplicialrepresentation, and one very effective measure is persistence. The idea behind persistence is that a significanttopological attribute has a long lifetime, or persists, in a filtration. Persistence can therefore be defined solely interms of a filtration.In simplicial homology, nonbounding cycles from the same homology class are homologous to each other, meaningthat the cycles differ only by a boundary. In persistent homology we want to determine the nonbounding cyclesthat persist in the filtration, that is, the cycles that are nonbounding for a significant number of filtration indicesbefore possibly becoming boundaries. More formally, we factor the p -cycle group Z ip of the subcomplex K i by the p -boundary group B i + jp of the subcomplex K i + j that is j indices ahead in the filtration.16 efinition Persistent homology ) . Let (
K, w ) be a weighted simplicial complex with filtration { K i } ni =0 , andfix 0 ≤ i ≤ n . The j -persistent p -homology group H i,jp of K i is H i,jp = Z ip / ( B i + jp ∩ Z ip ) . (S24)The j -persistent p -Betti number β i,jp of K i is the rank of H i,jp .Note that H i,jp is a group since it is the intersection of the two subgroups Z ip and B i + jp of C i + jp .In order to understand the structure of persistent homology it is useful to approach it from a different perspective.Our intuition suggests that the computation of persistence necessitates compatible bases for the groups H ip and H i,jp ,however we need to establish the existence of a succinct description for the compatible bases. This is achieved in [5]by combining the homology of every complex in the filtration into a single algebraic structure. Definition
Persistence complex ) . A persistence complex C is a family of chain complexes { C i ∗ } i ≥ over acommutative ring R with unity, together with chain maps f i : C i ∗ → C i +1 ∗ so that we have the following diagram: C ∗ f −−−→ C ∗ f −−−→ C ∗ f −−−→ · · · . (S25)A filtered simplicial complex K with inclusion maps for the simplices is a persistence complex. Definition
Persistence module ) . Let R be a commutative ring with unity. A persistence module M is afamily of R -modules M i together with homomorphisms φ i : M i → M i +1 .The homology of a persistence complex is a persistence module, where the homomorphism φ i maps a homologyclass to the one that contains it. Definition
Finite type ) . Let R be a commutative ring with unity. A persistence complex { C i ∗ , f i } (resp.persistence module { M i , φ i } ) is of finite type if each component complex (resp. module) is a finitely generated R -module and if the maps f i (resp. φ i ) are isomorphisms for i ≥ m for some integer m .Since we consider only finite simplicial complexes K , each such K generates a persistence complex C of finite typewhose homology is a persistence module M of finite type.In the following we require the notions of graded rings and modules [28, Chapter 1, page 1; Chapter 2, page19]: Definition
Graded rings and modules ) . Let R be a ring with unity, and let G be a multiplicative group.Then R is a G-graded ring if there is a family { R g | g ∈ G of additive subgroups R g of R such that R = ⊕ g ∈ G R g ascommutative groups and R g R h ⊆ R gh for all g , h ∈ G . The set h ( R ) = ∪ g ∈ G R g is the set of homogeneous elements of R, and a nonzero element x ∈ R g is said to be homogeneous of degree g .A (left) G -graded R -module , or simply a graded module over R , is a left R -module M such that M = ⊕ g ∈ G M g where each M g is an additive subgroup of M , and for every g , h ∈ G we have R g M h ⊆ M gh . Since R e M h ⊆ M h , every17 h is an R e -submodule of M . The elements of ∪ h ∈ G M h are the homogeneous elements of M . A nonzero element m ∈ M h is homogeneous of degree h .An important example of a graded ring is the standard grading of a polynomial ring [29, Chapter 1, page 1]: Definition
Standard grading ) . Let S be the polynomial ring F [ x , . . . , x n ] over the field F . Set deg( x i ) = 1for each i . A monomial x c · · · x c n has degree c + · · · + c n . For a nonnegative integer i we denote by S i the F -vectorspace spanned by all monomials of degree i . In particular, S = F . A polynomial u ∈ S is called homogeneous if u ∈ S i for some i , and we then say that u has degree i and write deg( u ) = i . Note that 0 is a homogeneous elementwith arbitrary degree. The following two properties are equivalent, and hold for S:1. S i S j ⊆ S i + j for all nonnegative integers i and j .2. deg( uv ) = deg( u ) + deg( v ) for every two homogeneous elements u , v ∈ S .Every polynomial f ∈ S can be written uniquely as a finite sum f = P i f i of nonzero elements f i ∈ S i , and inthis case f i is called the homogeneous component of f of degree i . Therefore we have a direct-sum decomposition S = ⊕ i ≥ S i of S as a F -vector space, where S i S j ⊆ S i + j for all nonnegative integers i and j . We say that S has the standard grading .Suppose we have a persistence module M = { M i , φ i } i ≥ over a commutative ring R with unity. Equip thepolynomial ring R [ t ] with the standard grading, and define a graded module over R [ t ] by α ( M ) = ∞ M i =0 M i , (S26)where the R -module structure is simply the sum of the structures on the individual components, and the action of t is given by t · ( m , m , m , . . . ) = (0 , φ ( m ) , φ ( m ) , φ ( m ) , . . . ) . (S27)Note that t simply shifts elements of the module up in the gradation.We now obtain the following correspondence: Theorem 3.24 ( Correspondence ) . The correspondence α defines an equivalence of categories between the categoryof persistence modules of finite type over R and the category of finitely generated nonnegatively graded modules over R [ t ] . Intuitively, we are constructing a single algebraic structure that contains all of the simplicial subcomplexes in thefiltration. We begin by computing a direct sum of the complexes, arriving at a much larger space that is gradedaccording to the filtration ordering. We then remember the time each simplex enters using a polynomial coefficient.The key idea is that the filtration ordering is encoded in the coefficient polynomial ring.18he correspondence described in Theorem 3.24 suggests the nonexistence of simple classifications of persistencemodules over a ground ring that is not a field, such as Z . It is well known in commutative algebra that the classificationof modules over Z [ t ] is extremely complicated. While it is possible to assign interesting invariants to Z [ t ]-modules, asimple classification is not available, nor is it ever likely to be available.On the other hand, the correspondence gives a simple decomposition when the ground ring is a field F . Here, thegraded ring F [ t ] is a principal ideal domain and its only graded ideals are homogeneous of form ( t n ), so the structureof the F [ t ]-module is described as: (cid:18) n M i =1 Σ α i F [ t ] (cid:19) ⊕ (cid:18) m M j =1 Σ γ j F [ t ] / ( t n j ) (cid:19) , (S28)where n j ≤ n j +1 for 1 ≤ j ≤ m − α i , γ j in Z , and Σ α denotes an α -shift upward in grading.We wish to parametrise the isomorphism classes of F [ t ]-modules by suitable objects. Definition P -interval ) . A P -interval is an ordered pair ( i, j ) where 0 ≤ i < j for i ∈ Z and j ∈ Z ∪ { + ∞} .We associate a graded F [ t ]-module to a set S of P -intervals via a bijection Q , which is defined by Q ( i, j ) =Σ i F [ t ] / ( t j − ) for P -interval ( i, j ), noting that Q ( i, + ∞ ) = Σ i F [ t ]. For a set of P -intervals S = { ( i , j ) , ( i , j ) . . . , ( i n , j n ) } ,we define Q ( S ) = n M l =1 Q ( i l , j l ) . (S29)Our correspondence may now be restated as follows. Corollary 3.26.
The correspondence S → Q ( S ) defines a bijection between the finite sets of P -intervals and thefinitely generated graded modules over the graded ring F [ t ] . Consequently, the isomorphism classes of persistencemodules of finite type over F are in bijective correspondence with the finite sets of P -intervals. In summary, we are interested in the k th homology of a filtered simplicial complex K . In each dimension thehomology of the subcomplex K i becomes a vector space over a field, and is described fully by its rank β ik . We need tochoose compatible bases across the filtration in order to compute persistent homology for the entire filtration. So, weform the persistence module corresponding to K , which is a direct sum of these vector spaces. The structure theoremstates that a basis exists for this module that provides compatible bases for all of the vector spaces. In particular,each P -interval ( i, j ) describes a basis element for the homology vector spaces starting at time i until time j −
1. Thiselement is a k -cycle that is completed at time i , forming a new homology class. It also remains nonbounding untiltime j , at which time it joins the boundary group B jk .The persistent homology of a filtered simplicial complex is obtained by combining the homology of all subcomplexesin the simplicial filtration into a single algebraic structure, namely a graded module over a polynomial ring withindeterminate t , for which the standard homology is the persistent homology of the complex. Informally, the direct19um of the subcomplexes in the filtration gives a larger space that is graded according to the order of the filtration.Moving through the filtration, when a simplex enters the next subcomplex it is indicated with a polynomial coefficient,or from the perspective of the graded module, the simplex is shifted along the grading by multiplying the simplexusing t . The graded module therefore is a single structure that contains all of the subcomplexes in the filtration. Definition H p -barcode ) . Let (
K, w ) be a weighted simplicial complex. An H p -barcode is a multiset L p ofintervals of two forms:1. [ i, ∞ ), which corresponds to a topological attribute that is created at filtration index i ∈ N ∪ { } and exists inthe final structure;2. [ j, k ), which corresponds to a topological attribute that is created at filtration index j ∈ N ∪ { } and is destroyedat index k ∈ N ∪ { } .The algorithm for computing the H p -barcodes is detailed in [5]. A reaction-diffusion system for N ≥ × I ⊂ R n × R ≥ , n ≥
1, consists of a set of N coupled reaction-diffusion equations in n spatial dimensions, ∂ c ∂t = ∇ · ( D ∇ c ) + f ( c ) , (S30)where t ∈ I is time, x = ( x , x , . . . , x n ) ∈ Ω is position, the state variable c : Ω × I → R N , ( x , t ) c ( x , t ), isthe vector of agent concentrations, D = diag( D , D , . . . , D N ) is the diagonal diffusivity matrix, and the system ofreactions is described by the vector-valued function of a vector variable f , which is generally nonlinear. We allowfor the diffusivity of each agent to depend on the local concentrations of the other agents, and we assume there isno cross-diffusion. Solutions of the system (S30) on the domain Ω × I with spatial boundary ∂ Ω require functionsspecifying both the boundary conditions c ( x , t ) | ∂ Ω and initial conditions c ( x , Neumann : ∂ c ∂ n ( x ) = h ( x ) ∀ x ∈ ∂ Ω , (S31)where n is the exterior normal to the boundary ∂ Ω, and h is a scalar function. The normal derivative is defined20y ∂ c ∂ n ( x ) = ∇ c ( x ) · ˆ n ( x ) , (S32)where ˆ n is the unit normal. In particular, zero flux boundary conditions, where h is the zero function, correspondto a boundary of the spatial domain which is insulated, reflective, or isolated from the external environment. Dirichlet : c ( x ) = h ( x ) ∀ x ∈ ∂ Ω , (S33)where ∂ Ω is the boundary, and h is a scalar function. Periodic : for functions and derivatives.These boundary conditions may also be combined as in Robin and mixed boundary conditions.21 eferences [1] Spanier EH. Algebraic Topology. Springer New York; 1966. doi:10.1007/978-1-4684-9322-1.[2] Rotman JJ. An Introduction to Algebraic Topology. Springer New York; 1988. doi:10.1007/978-1-4612-4576-6.[3] Munkres J. Elements of Algebraic Topology. CRC Press; 2018. doi:10.1201/9780429493911.[4] Edelsbrunner H, Letscher D, Zomorodian A. Topological persistence and simplification.
Discrete & Computa-tional Geometry . 2002;28:511–533. doi:10.1007/s00454-002-2885-2.[5] Zomorodian A, Carlsson G. Computing persistent homology.
Discrete & Computational Geometry . 2005;33:249–274. doi:10.1007/s00454-004-1146-y.[6] Zomorodian A. Topology for Computing. Cambridge University Press; 2005. doi:10.1017/cbo9780511546945.[7] Stumpf HF. ¨Uber den Verlauf eines schuppenorientierenden Gef¨alles bei
Galleria mellonella . Wilhelm Roux ' Archiv f¨ur Entwicklungsmechanik der Organismen . 1967;158:315–330. doi:10.1007/bf00573402.[8] Wolpert L. Positional information and the spatial pattern of cellular differentiation.
Journal of TheoreticalBiology . 1969;25:1–47. doi:10.1016/S0022-5193(69)80016-0.[9] Crick F. Diffusion in embryogenesis.
Nature . 1970;225:420–422. doi:10.1038/225420a0.[10] ˇCapek D, M¨uller P. Positional information and tissue scaling during development and regeneration.
Development .2019;146:dev177709. doi:10.1242/dev.177709.[11] Wartlick O, Kicheva A, Gonz´alez-Gait´an M. Morphogen gradient formation.
Cold Spring Harbor Perspectivesin Biology . 2009;1:a001255. doi:10.1101/cshperspect.a001255.[12] Umulis DM, Othmer HG. Mechanisms of scaling in pattern formation.
Development . 2013;140:4830–4843.doi:10.1242/dev.100511.[13] McHale P, Rappel WJ, Levine H. Embryonic pattern scaling achieved by oppositely directed morphogen gradi-ents.
Physical Biology . 2006;3:107–120. doi:10.1088/1478-3975/3/2/003.[14] Houchmandzadeh B, Wieschaus E, Leibler S. Precise domain specification in the developing
Drosophila embryo.
Physical Review E . 2005;72:061920. doi:10.1103/PhysRevE.72.061920.[15] Ben-Zvi D, Barkai N. Scaling of morphogen gradients by an expansion-repression integral feedback control.
Proceedings of the National Academy of Sciences . 2010;107:6924–6929. doi:10.1073/pnas.0912734107.2216] Rahimi N, Averbukh I, Haskel-Ittah M, Degani N, Schejter ED, Barkai N, et al. A WntD-dependent inte-gral feedback loop attenuates variability in
Drosophila toll signaling.
Developmental Cell . 2016;36:401–414.doi:10.1016/j.devcel.2016.01.023.[17] Shilo BZ, Barkai N. Buffering global variability of morphogen gradients.
Developmental Cell . 2017;40:429–438.doi:10.1016/j.devcel.2016.12.012.[18] Dillon R, Maini PK, Othmer HG. Pattern formation in generalized Turing systems.
Journal of MathematicalBiology . 1994;32:345–393. doi:10.1007/BF00160165.[19] Varea C, Arag´on JL, Barrio RA. Confined Turing patterns in growing systems.
Physical Review E . 1997;56:1250–1253. doi:10.1103/PhysRevE.56.1250.[20] Barrio RA, Varea C, Arag´on JL, Maini PK. A two-dimensional numerical study of spatial pattern formation ininteracting Turing systems.
Bulletin of Mathematical Biology . 1999;61:483–505. doi:10.1006/bulm.1998.0093.[21] Meinhardt H. Turing ' s theory of morphogenesis of 1952 and the subsequent discovery of the crucial role of localself-enhancement and long-range inhibition. Interface Focus . 2012;2:407–416. doi:10.1098/rsfs.2011.0097.[22] Nesterenko AM, Kuznetsov MB, Korotkova DD, Zaraisky AG. Morphogene adsorption as a Turing instabil-ity regulator: theoretical analysis and possible applications in multicellular embryonic systems.
PLOS ONE .2017;12:e0171212. doi:10.1371/journal.pone.0171212.[23] Gierer A, Meinhardt H. A theory of biological pattern formation.
Kybernetik . 1972;12:30–39.doi:10.1007/BF00289234.[24] Landge AN, Jordan BM, Diego X, M¨uller P. Pattern formation mechanisms of self-organizing reaction-diffusionsystems.
Developmental Biology . 2020;460:2–11. doi:10.1016/j.ydbio.2019.10.031.[25] Giusti C, Ghrist R, Bassett DS. Two’s company, three (or more) is a simplex.
Journal of ComputationalNeuroscience . 2016;41:1–14. doi:10.1007/s10827-016-0608-6.[26] Espinoza JF, Hern´andez-Amador R, Hern´andez-Hern´andez HA, Ramonetti-Valencia B. A numerical approachfor the filtered generalized ˇCech complex.
Algorithms . 2020;13:11. doi:10.3390/a13010011.[27] Kannan H, Saucan E, Roy I, Samal A. Persistent homology of unweighted complex networks via discrete Morsetheory.