Topological data analysis distinguishes parameter regimes in the Anderson-Chaplain model of angiogenesis
John T. Nardini, Bernadette J. Stolz, Kevin B. Flores, Heather A. Harrington, Helen M. Byrne
TTopological data analysis distinguishes parameter regimes inthe Anderson-Chaplain model of angiogenesis
John T. Nardini* , Bernadette J. Stolz , Kevin B. Flores , Heather A. Harrington ,Helen M. Byrne Department of Mathematics, North Carolina State University, Raleigh, NorthCarolina, USA Mathematical Institute, University of Oxford, Oxford, OX2 6GG, UK* [email protected]
Abstract
Angiogenesis is the process by which blood vessels form from pre-existing vessels. Itplays a key role in many biological processes, including embryonic development andwound healing, and contributes to many diseases including cancer and rheumatoidarthritis. The structure of the resulting vessel networks determines their ability todeliver nutrients and remove waste products from biological tissues. Here we simulatethe Anderson-Chaplain model of angiogenesis at different parameter values and quantifythe vessel architectures of the resulting synthetic data. Specifically, we propose atopological data analysis (TDA) pipeline for systematic analysis of the model. TDA is avibrant and relatively new field of computational mathematics for studying the shape ofdata. We compute topological and standard descriptors of model simulations generatedby different parameter values. We show that TDA of model simulation data stratifiesparameter space into regions with similar vessel morphology. The methodologiesproposed here are widely applicable to other synthetic and experimental data includingwound healing, development, and plant biology.
Author summary
Vascular networks play a key role in many physiological processes, by deliveringnutrition to, and removing waste from, biological tissues. In cancer, tumors stimulatethe growth of new blood vessels, via a process called angiogenesis. The resultingvascular structure comprises many inter-connected vessels which lead to emergentmorphologies that influence the rate of tumor growth and treatment efficacy. In thiswork, we consider several approaches to summarize the morphology of syntheticvascular networks generated from a mathematical model of tumor-induced angiogenesis.We find that a topological approach can be used quantify vascular morphology of modelsimulations and group the simulations into biologically interpretable clusters. Thismethodology may be useful for the diagnosis of abnormal blood vessel networks andquantifying the efficacy of vascular-targeting treatments.
Introduction
Blood vessels deliver nutrients to, and remove waste products from, tissues during manyphysiological processes, including embryonic development, wound healing, andJanuary 5, 2021 1/38 a r X i v : . [ q - b i o . Q M ] J a n ig 1. Schematic of tumor-induced angiogenesis. We depict six aspects oftumor-induced angiogenesis that are captured in the Anderson-Chaplain model [4]. (1)Tumor cells release a chemoattractant, or tumor angiogenesis factor (TAF), thatstimulates the formation of new blood vessels. (2) Endothelial cells lining pre-existingblood vessels sense the TAF. (3) Endothelial tip cells migrate, via chemotaxis, upspatial gradients of the TAF and deposit a “snail trail” of new vessels as they move. (4)Endothelial tip cells also migrate, via haptotaxis, up spatial gradients in fibronectinwhich is bound to the tissue matrix. (5) A migrating endothelial cell can branch intotwo cells, to form a new sprout. (6) If a sprout coincides with an existing vessel, then itis annihilated and a new loop is formed; if two sprouts coincide or anastomose, thenthey are both annihilated and a new loop forms.cancer [1–3]. The structure and form of connected blood vessels (e.g., vascularmorphology), determines how nutrients and waste are supplied or removed from theenvironment and, in turn, influences the behavior of the tissue and its constituent cells.The morphology of vascular networks can reveal the presence of an underlying disease,or predict the response of a patient to treatment. High resolution vascular imagingtechnology creates an exciting opportunity to develop mathematical tools to discovernew links between blood vessel structure and function. In this work, we focus on tumor-induced angiogenesis , the growth of new blood vessels from pre-existingvasculature, in response to a tumor (Fig 1). We explore several approaches forquantifying vascular morphology, including recently developed techniques fromtopological data analysis. We develop and test our methodology through application tosimulated data from a hybrid (i.e., discrete and continuous) stochastic mathematicalmodel of tumor-induced angiogenesis proposed by Anderson and Chaplain [4].The Anderson-Chaplain model is based on a continuum model proposed by Baldingand McElwain [5], and a stochastic model proposed by Stokes and Lauffenberger [6]. Itis an on-lattice model in which endothelial cells perform a biased random walk togenerate blood vessels that reproduce those observed experimentally [4]. While moredetailed theoretical models have been developed (see reviews [7, 8]), we focus on theAnderson-Chaplain model due to its simplicity. The model describes thespatio-temporal evolution of three physical variables: endothelial tip cells, tumorangiogenesis factor (TAF), and fibronectin [4]. Fig 1 contains a model schematic whereJanuary 5, 2021 2/38umor cells located along the right boundary produce TAF and the endothelial tip cellsfrom a nearby healthy vessel placed along the left boundary move via chemotaxis upspatial gradients of TAF. As they move, they leave behind a “snail trail” of new bloodvessel segments [7]. Endothelial tip cells also migrate via haptotaxis , up spatialgradients of fibronectin, which is bound to the tissue matrix in which the endothelialcells are embedded. As they migrate, tip cells may branch , producing two tip cells andcreating a new vessel segment. Further, when a tip cell collides with another vesselsegment or another tip cell, the tip cell may be annihilated and a closed circuit or loopforms; this process is known as anastomosis . Fig 2.
Data generation and analysis pipeline. (1) Spatio-temporal modeling:Anderson-Chaplain model. The Anderson-Chaplain model is simulated for 11 ×
11 =121 different values of the haptotaxis and chemotaxis parameters, ( ρ, chi ). The modeloutput is saved as a binary image, where pixels labeled 1 or 0 denote the presence orabsence, respectively, of blood vessels. We generate 10 realizations for each of the 121parameter combinations, leading to 1,210 images of simulated vessel networks. (2) Dataanalysis. A.) Standard descriptor vectors. We compute the number of active tip cellsand the number of vessel segments over time. We also compute the length of the vesselsat the final time point [9–12]. B.) Topological descriptor vectors. We construct floodingand sweeping plane filtrations [13] and use them to track the birth and death oftopological features (connected components and loops) that are summarized as Betticurves and persistence images [12]. (3) Data clustering. We perform k -means clusteringof the descriptor vectors for the set of 1,210 simulations from step 2 to decompose the( ρ, χ ) parameter space into regions that group vessel networks with similar morphologies.January 5, 2021 3/38e would like a quantitative method to understand and compare vessel networksthat is applicable to experimental and synthetic data. Standard quantitative descriptorsof vascular morphology proposed by [9–11, 14, 15] include inter-vessel spacing, thenumber of branch points, measuring the total area (volume) covered for 2-dimensional(3-dimensional) simulations, vessel length, density, tortuosity, the number of vesselsegments, and the number of endothelial sprouts. These metrics correlate with tumorprogression and treatment response in experimental datasets [16]; however, they arecomputed at a fixed spatial scale, and neglect information about the connectedness (i.e.,the topology) of vascular structures. Therefore, standard descriptors do not exploit thefull information content of synthetic data generated from the models. At the same time,advances in imaging technology mean that it is now possible to visualize vasculararchitectures across multiple spatial scales [17]. These imaging developments arecreating a need for new methods that can quantify the multi-scale patterns in vascularnetworks, and how they change over time.A relatively new and expanding field of computational mathematics for studying theshape of data at multiple scales is TDA. TDA consists of algorithmic methods foridentifying global structures in high-dimensional datasets that may be noisy [18, 19].TDA has been useful for biomedical applications [20–23], including brain vessel MRIdata [13] and experimental data of tumor blood vessel networks [24, 25]. Acommonly-used tool from TDA is persistent homology (PH). Homology refers to thecharacterization of topological features ( e.g. , connected components and loops) in adataset, and persistence refers to the extent to which these features persist within thedata as a scale parameter varies. PH is a useful way to summarize the topologicalcharacteristics of spatial network data [24–26] as well as noisy and high dimensionalagent-based models (ABMs) [27, 28]. More recently, PH has been combined withmachine learning to extract accurate estimates of parameters from such models [29].Most of these studies focus on PH calculations for point clouds of data; notably, PH hasbeen adapted to analyze networks and grayscale images [30]. An active area of researchis exploring different filtrations for application-driven PH.In this paper, we develop a computational pipeline for analyzing simulations of theAnderson-Chaplain model over a range of biologically-relevant parameter values. Ourmain objectives are to identify model descriptor vectors that group together similarmodel simulations, and then to interpret the resulting clusters.We use two topological approaches, the sweeping plane and flooding filtrations, toconstruct a simplicial complex from data generated from simulations of theAnderson-Chaplain model. We show that PH of the sweeping plane filtration and itssubsequent vectorization provides an interpretable descriptor vector for the modelparameters governing chemotaxis and haptotaxis. We show that this topologicalapproach leads to more biologically informative clusters than existingmorphologically-motivated standard descriptor vectors. Furthermore, the clustersgenerated from the sweeping plane filtration are robust and generalize well to unseenmodel simulations.
Materials and Methods
We simulate the Anderson-Chaplain model of tumor-induced angiogenesis for a range ofvalues of the haptotaxis and chemotaxis coefficients, ρ and χ . Each simulation generatesvasculature composed of a collection of interconnected blood vessels in the form ofbinary image data. We calculate standard descriptors of these data at 50 time steps tocreate 3 standard descriptor vectors for each simulation. We also compute 30topological descriptor vectors for each model simulation; they encode the proposedmultiscale quantification of the vascular morphology. We calculate standard andJanuary 5, 2021 4/38opological descriptor vectors for the entire set of 1,210 simulations. We then analyzethe simulations by clustering the standard or topological descriptor vectors and comparethe results. Fig 2 shows the pipeline of data generation and analysis. We give details ofthe Anderson-Chaplain model, including governing equations, parameter values andimplementation, in the Appendix . The Python files and notebooks used for all stepsof our analysis are publicly available at https://github.com/johnnardini/Angio TDA.
Data generation
We simulate the Anderson-Chaplain model as the parameters ρ and χ are variedindependently among the following 11 values: { , . , . , . , . . . , . } . We generate10 realizations of the model for each of the 121 ( ρ, χ ) parameter combinations, andproduce 1,210 binary images that summarize the different synthetic blood vesselnetworks. Each simulation runs until either an endothelial tip cell crosses the linecorresponding to x = 0 .
95 nondimensional spatial units or the maximum simulationtime of t = 20 nondimensional time units is exceeded. Each simulation is initializedusing the same initial condition and with all other model parameters set to the baselinevalues from [4]. We use no-flux boundary conditions. Morphologically-motivated standard descriptors and descriptor vectors
Morphologically-motivated standard descriptors are used to summarize physicalcharacteristics of each model simulation. The standard descriptors that we use include:the number of vessel segments ( N S ), the number of endothelial tip cells ( N T ), and thefinal length of the longest vessel ( L ) [5, 9–11]. Fig 3 illustrates the standard descriptorsat a snapshot of a schematic model simulation. We record N S and N T at 50 time stepsto create descriptor vectors. To facilitate downstream clustering analysis, we requirevectors of the same length for all simulations. To ensure the dynamic descriptor vectorsfor each model simulation are of length 50, we record N S ( t ) and N T ( t ) at the 50 timesteps t = { t i } i =1 , where t i = ( i − t , ∆ t = t end /
49, and t end denotes the duration ofa given simulation. For each simulation, we also compute L ( t end ), the x -coordinate ofthe vessel segment location which has the greatest horizontal distance from the y -axis.Thus, the standard descriptor vectors we use for subsequent clustering analysis are N S ( t ) , N T ( t ) and L ( t end ). Quantifying vessel shape using topological data analysis
The most prominent method from topological data analysis is persistent homology(PH) [18, 30, 31]. PH is rooted in the mathematical concept of homology, which capturesthe characteristics of shapes. To compute (persistent) homology from data, one needs tofirst construct simplicial complexes, which can be thought of as collections ofgeneralized triangles. From the constructed simplicial complexes, one can quantify andvisualize the connected components (dimension 0) and loops (dimension 1) of a datasetat different spatial scales.
Simplicial complexes and homology
We construct simplicial complexes from data points derived from binary images of thecompleted simulations of the Anderson-Chaplain model (see“Synthetic data” in Step 1of Fig 2, where red pixels denote presence of vasculature). Normally, TDA of imagedata requires the construction of cubical complexes, which are generalizations of cubes;however, we have binary rather than grayscale data, so we instead use informationabout the model generation and Moore neighborhoods as we describe here. Each pixelJanuary 5, 2021 5/38 ig 3.
Standard descriptors are computed for each model simulation include thenumber of vessel segments ( N S , depicted by each colored solid curve), the number ofendothelial tip cells ( N T , depicted by black dots), and the final length of thevasculature ( L , depicted by the vertical grayed dashed line). This schematic example isinitialized with 2 vessel segments and 2 tip cells at locations (1) and (2). A branchingevent at location (3) increases the value of both N S and N T by one. An anastomosisevent at location (4) decreases N T by one. The branching event at location (5) increasesboth N S and N T by one.that has a value of one is embedded in R at the centroid of the pixel as a vertex (or0-simplex), as demonstrated in Fig. 4. We term the resulting collection of points a pointcloud . We connect two points by an edge (or 1-simplex) if they are within each others’Moore neighborhood (the Moore neighborhood for one pixel is shaded in Fig. 4B). Ifthree points are pairwise connected by an edge, then we connect them with a filled-intriangle (or 2-simplex). The union of 0-,1-, and 2-simplices form a simplicial complex , asshown in Fig. 4C.To compute topological invariants, such as connected components (dimension 0) andloops (dimension 1), we use homology. To obtain homology from a simplicial complex, K , we construct vector spaces whose bases are the 0-simplices, 1-simplices, and2-simplices, respectively, of K . There is a linear map between 2-simplices and1-simplices, called the boundary map ∂ , which sends triangles to their edges. Similarly,the boundary map ∂ sends edges to their points and ∂ sends points to 0. The action ofthe boundary map ∂ on the simplices is stored in a binary matrix where the entry a i,j indicates whether the i -th 0-simplex forms part of the boundary of the j -th 1-simplex.If so, then a i,j = 1; otherwise, a i,j = 0. One can compute the kernel Ker( · ) and imageIm( · ) of the boundary maps to obtain the vector spaces H ( K ) = Ker( ∂ ) / Im( ∂ ) and H ( K ) = Ker( ∂ ) / Im( ∂ ). These vector spaces are also referred to as homology groupsand their dimensions define topological invariants called the Betti numbers of K , β and β , which quantify the numbers of connected components and loops, respectively.January 5, 2021 6/38 ig 4. Converting a binary image into a simplicial complex. A) Schematic binaryimage. B) Point cloud for all nonzero pixels in the binary image. The Mooreneighborhood of one nonzero pixel is highlighted in salmon. C) A simplicial complexconstructed from the point cloud in panel B.
Filtrations for simulated vasculature
There are different ways to study vascular data at multiple scales [13, 24, 25]. We encodethe multiple spatial scales of the data using a filtration , which is a sequence ofembedded simplicial complexes K ⊆ K ⊆ · · · ⊆ K end built from the data. Here, theinput data is the binary image from the final timepoint of the Anderson-Chaplainmodel, which we call N (See Eq (13) in the Appendix ). We construct sequences ofbinary images that correspond to different filtered simplicial complexes: a sweepingplane filtration [13] and a flooding filtration. We remark that both filtrations can beconsidered sublevelset filtrations corresponding to a height function h : X → R on somesimplicial complex X (or just on the vertices/pixels and considering the simplicialsubcomplex spanned by them). Sweeping plane filtration.
In the sweeping plane filtration, we move a line acrossthe binary image N and include pixels in the filtration at discrete steps once a pixel iscrossed by the line. On every filtration step, we move the line by a fixed number ofpixels in a chosen direction. In Fig 5, we illustrate the method when a vertical linemoves from left to right (LTR). From the corresponding sequence of binary images, weconstruct a filtered simplicial complex K LTR = { K , K , . . . , K end } . Similarly, weconstruct the filtered simplicial complexes K RTL when the vertical line moves from rightto left (RTL), K TTB when a horizontal lines moves from top to bottom (TTB), and K BTT when a horizontal line moves from bottom to top (BTT).
Flooding filtration.
Our second filtration is the flooding filtration. From thebinary image, N , associated with a model simulation, we construct a sequence of binaryimages in the following way. On the first step, we input a model’s final output binaryimage. On each subsequent step, we iterate through each nonzero pixel from theprevious binary image and manually set every pixel in its Moore neighborhood (asshown in Fig 4B) to value one. We iterate until all pixels are set equal to one. In Fig6B, for example, the red pixels denote the nonzero pixels from the initial binary image,the orange pixels denote the nonzero pixels added on the first round of flooding, and theyellow pixels denote the nonzero pixels added on the second round of flooding. Each“round” is called a step. From the corresponding sequence of binary images, we constructa filtered simplicial complex K flood = { K , K , . . . , K end } . Persistent homology (PH)
PH is an algorithm that takes in data via a filtration and outputs a quantification oftopological features such as connected components (dimension 0) and loops (dimension1) across the filtration. The simplicial complexes are indexed by the scale parameter ofJanuary 5, 2021 7/38 ig 5.
The LTR sweeping plane filtration for a binary image. A) Sample binary imagefrom a simulation of the Anderson-Chaplain model. B-E) Point clouds used for eachiteration of the sweeping plane approach. On each step, only pixels located to the left ofthe plane (denoted here with a blue line) are included; gray pixels are ignored. F-I)Filtration steps constructed from the point clouds in panels B-E.
Fig 6.
The flooding filtration for a binary image. A) Schematic binary image. B)Grayscale image of the original binary image. Red pixels are nonzero in the initialimage, orange pixels become nonzero after one round of flooding, and yellow pixelsbecome nonzero after two rounds of flooding. C-E) We performed three filtrations bydilating the image twice. During each filtration, the eight neighboring pixels of allnonzero pixels (ie pixels not marked as white) become nonzero. Red pixels are nonzeroin the original image, orange pixels become nonzero on the second filtration (i.e., afterone step of flooding), and yellow pixels become nonzero on the third filtration (i.e., aftertwo steps of flooding). F-H) Simplicial complexes associated with the filtrationspresented in panels C-E.the filtration. In our case, the scale parameter corresponds to the spatial location in thesweeping plane filtration or steps of flooding in the flooding filtration. The inclusion ofa simplicial complex K i ⊆ K j for i ≤ j gives a relationship between the correspondinghomology groups H p ( K i ) and H p ( K j ) for p = 0 ,
1. This relationship enables us to tracktopological features such as loops along the simplicial complexes in the filtration.Intuitively, a topological feature is born in filtration step b when it is first computed aspart of the homology group H p ( K b ) and dies in filtration step d when that feature noJanuary 5, 2021 8/38onger exists in H p ( K d ), i.e. , when a connected component merges with anothercomponent or when a loop is covered by 2-simplices. The output from PH is a multisetof intervals [ b, d ) which quantifies the persistence of topological features. Eachtopological feature is said to persist for the scale d − b in the filtration. Our topologicaldescriptors are the connected components (i.e., 0-dimensional) and loops (i.e.,1-dimensional features) obtained by computing persistent homology of the sweepingplane and flooding filtrations. We use the superscript notation K v to denote the filteredsimplicial complex K in direction v = { LTR, RTL, TTB, BTT, flood } . Topological descriptor vectors from PH
We use PH to generate different output vectors. • Betti curves.
Betti curves [32] show the Betti numbers on each filtration step.Let β i ( K v ) denote the Betti curves of filtered simplicial complex K in direction v for dimensions i = { , } . • Persistence images.
A persistence image [12] uses as input the birth-deathpairs given by PH and converts the set of (birth b , persistence ( d − b )) coordinatesinto a vector, a format which is suitable for machine learning and otherclassification tasks. In a first step the coordinates ( b, d − b ) are blurred by aGaussian, with standard deviation σ , that is centered about each birth-persistencepoint. We take a weighted sum of the Gaussians and place a grid on this surfaceto create a vector. We compute two alternative weighting strategies: all ones (inwhich case all features are equally weighted) and a persistence weight of ( d − b )(in which more persistent features are given larger weights). Let PIO i ( K v ) andPIR i ( K v ) denote the corresponding persistence images computed from K v withequal and ramped weighting, respectively, for dimension i = { , } .The topological descriptor vectors that we compute for each model simulation are β i ( K v ), PIO i ( K v ), and PIR i ( K v ), where i = { , } and v = { LTR, RTL, TTB, BTT,flood } . In total each simulation has 3 × × Simulation Clustering
We use an unsupervised algorithm, k -means clustering, to group the quantitativedescriptor vectors of the synthetic data generated from simulations of theAnderson-Chaplain model. To cluster n samples x , x , . . . , x n ∈ R d into k clusters, the k -means algorithm finds k points in R d , and clusters each sample according to which ofthe k means it is closest to, and then minimizes the within-cluster residual sum ofsquares distance [33]. We cluster the standard descriptor vectors and the topologicaldescriptor vectors (see steps 2A and 2B of the methods pipeline in Fig 2), respectively,to investigate whether descriptor vectors lead to biologically meaningful clusters.We test the robustness of each clustering assignment by randomly splitting thedescriptor vectors into training and testing sets, and use the testing set to evaluateout-of-sample (OOS) accuracy. Specifically, 7 of the 10 descriptor vectors from each ofthe 121 ( ρ, χ ) parameter combinations are randomly chosen and placed into a trainingset (847 simulations); the remaining 3 descriptor vectors from each ( ρ, χ ) combinationare placed into a testing set (363 simulations). After performing unsupervised clusteringwith a k -means model on the training set, each ( ρ, χ ) combination is labeled by themajority of cluster assignments among its 7 training simulation descriptor vectors. Toevaluate each clustering model’s OOS accuracy, the 3 test data samples for each ( ρ, χ )parameter combination are given a “ground-truth” label equal to the cluster assignmentfor the 7 training simulations for that same ( ρ, χ ) parameter combination. A “predicted”January 5, 2021 9/38abel is calculated for each descriptor vector from the test data using the trained k -means model, and OOS accuracy is defined as the proportion of test simulations forwhich the “predicted” label matched the “ground-truth” label. For example, if theseven training simulations from the parameter combination ( ρ, χ ) = (0 . , .
2) are placedinto clusters { , , , , , , } , then the three testing simulations from ( ρ, χ ) = (0 . , . { , , } , then we compute an OOS accuracy of 66.67%. We use all 363 testingsimulations when computing OOS accuracy and use the KMeans command fromscikit-learn (version 0.22.1) for training and prediction.
Results
We simulated the Anderson-Chaplain model for 121 different values of the chemotaxisand haptotaxis coefficients ( ρ, χ ) and performed 10 model realizations for eachparameter pair. We observed that simulations generated by different parameter pairsmay appear visually distinct in their vessel growth patterns. To quantify andsummarize the morphologies, we computed dynamic standard descriptor vectors as wellas topological descriptor vectors for the set of 1,210 model simulations. We thenperformed unsupervised clustering using either standard or topological descriptorvectors to partition the ( ρ, χ ) parameter space.As we detail here, we found that the topological sweeping plane descriptor vectorsoffered a multi-scale analysis that led to biologically interpretable and robust clusteringsof the ( ρ, χ ) parameter space, whereas the clusterings of the standard descriptor vectorswere not biologically interpretable. The topological analysis succeeded in discriminatingfine-grained vascular structures and the parameter pairs that generated them.
Haptotaxis and chemotaxis alter vessel morphology
We investigated the influence of haptotaxis and chemotaxis by varying the parameters ρ and χ , respectively. Recall that in model simulations, the vessels emerged from theleft-most boundary and were recruited towards a tumor located on the right boundary.When chemotaxis alone drives endothelial tip cell movement ( ρ = 0 . χ = 0 . ρ = 0 . χ = 0 . ρ = χ = 0 . Standard and topological descriptors measure distinct aspectsof vascular architectures
Rather than performing visual inspection of all model simulations, we computed andcompared quantitative vessel descriptor vectors using data science approaches asdescribed in the
Materials and Methods . We demonstrate here how the biologicaland topological descriptors are distinct from each other. In Fig 8, we observe how thenumber of tip cells ( N T ), vessel segments ( N S ), connected components ( β ), and loops( β ) associated with an evolving vasculature change following different events. Aftereach anastomosis event, N T decreases by one (Fig 8B). After a branching event, both N T and N S increase by one (Fig 8C). We conclude that N T is unchanged and N S increases by one after one branching event and one anastomosis event have occurred. Bycontrast, the values of the topological descriptors before and after these events are notJanuary 5, 2021 10/38 pace ( x ) Sp a c e ( y ) A) Chemotaxis-Driven Simulation
Space ( x )B) Haptotaxis-Driven Simulation
Space ( x )C) Chemotaxis and Haptotaxis Simulation
Fig 7.
Typical realizations of the Anderson-Chaplain model for A) chemotaxis-driventip cell movement ( ρ = 0 . χ = 0 . ρ = 0 . χ = 0 . ρ = 0 . , χ = 0 . β , increases by 1(compare Fig 8A and D); when the vessel segments were not previously connected, β decreases by 1 (compare Fig 8A and E).Let N A ( r, s ) and N B ( r, s ) denote the numbers of anastomosis and branching events,respectively, that occur between times t = r and t = s . We computed N T ( t ) , N S ( t )vectors of the standard descriptors of evolving blood vessel simulations as follows: N T ( t ) = N S ( t ) = 9 ,N T ( t i ) = N T ( t i − ) + N B ( t i − , t i ) − N A ( t i − , t i ) , i = 2 , ..., ,N S ( t i ) = N S ( t i ) + N B ( t i − , t i ) , i = 2 , ..., . The values of N T ( t ) and N S ( t ) were determined from the models’ initial conditions.We first constructed the filtrations K v where v = { LTR, RTL, TTB, BTT, flood } of each simulation’s final binary image. Then we computed connected components andloops, which we call topological descriptors of K v , using PH. The topological descriptorvectors, Betti curves and persistence images, were then constructed for these 5filtrations. Quantification of blood vessel architecture data
Fig 9 shows the standard descriptor vectors for the three simulations presented in Fig 7.For the haptotaxis-driven simulation growth in the horizontal direction halts (the totallength remains at 0.095 units) and the numbers of tip cells and vessel segments stayconstant (at or near the initial condition of 9 vessels). By contrast, the number of tipcells and vessel segments increase over time for the chemotaxis-driven simulation, with amore marked increase when both chemotaxis and haptotaxis are active. For bothchemotaxis-active simulations, the vessels extend to the maximum horizontal distance.We next illustrate the topological analysis for the chemotaxis-driven,haptotaxis-driven, and chemotaxis & haptotaxis-driven simulations (from Fig 7). Wecomputed Betti curves, β ( K v ) and β ( K v ), of the simulated data with sweeping planefiltrations for v = { LTR, RTL, TTB, BTT } (see Fig 10) and a flooding filtration v = { flood } (see Fig 11). We could interpret the PH of the sweeping plane filtrationdirection. Since the blood vessels grow primarily from left to right, the topologicalanalysis primarily identifies dynamic changes in branching and anastomosis of vessels.The BTT and TTB filtrations are similar to each other, and reflect only horizontalJanuary 5, 2021 11/38 ig 8. Standard and topological descriptors describe distinct aspects of vesselmorphology (Key: β , number of connected components; β , number of loops; N T ,number of tip cells; N S , number of vessel segments). For clarity, each vessel segment iscolored differently, and black dots represent active tip cells. A) The initial vesselconfiguration. B) A schematic showing how the vascular architecture changes after ananastomosis event. C) A schematic showing how the vascular architecture changes aftera branching event. D-E) Schematics showing two ways in which vasculature can changeafter one anastomosis event and one branching event. F) A schematic showing how thevessels can change after one branching and two anastomosis events. Time steps C o un t A) N T ( t ) Chemotaxis-drivenHapotaxis-drivenChemotaxis & haptotaxis
Time steps C o un t B) N S ( t ) Length C) L ( t end ) Fig 9.
The standard descriptors used to summarize model simulations. We show howthe following descriptors change over 50 time steps for the three simulations presentedin Fig 7: A) the number of tip cells, N T ( t ), B) the number of vessel segments, N S ( t ),and C) the final vessel length, L ( t end ).January 5, 2021 12/38 pace ( x ) Sp a c e ( y ) A) LTR RTL TT B B TT Plane Orientations Space ( x )050 Sp a c e ( y ) Blood VesselLTRRTLBTTTTB Space ( x )025 Sp a c e ( y ) Blood VesselLTRRTLBTTTTB Space ( x )0250 Sp a c e ( y ) Blood VesselLTRRTLBTTTTB
Fig 10.
TDA sweeping plane descriptor vectors. A) The sweeping plane directions areleft-to-right (LTR, gray); right-to-left (RTL, orange); bottom-to-top (BTT, cyan); andtop-to-bottom (TTB, green). For these four v = i T j filtrations, we started with pointson the i th boundary and included more points as we stepped towards the j th boundary.Repeating this process produced the spatial filtration K v = K , K . . . K end for v = { LTR, RTL, TTB, BTT } . B-D) We formed the Betti curves β ( K v ) and β ( K v )by computing the Betti numbers along each step of K v for the three blood vesselsimulations presented in Fig 7; (B) the chemotaxis-driven simulation, (C) thehaptotaxis-driven simulation and (D) chemotaxis & haptotaxis simulation.changes, which could be interpreted as measuring bendiness or tortuosity. Therefore, weonly discuss the BTT results in this section.For the chemotaxis-driven simulation (Figs. 10B and 11(A-B)), the β ( K LTR )descriptor vector slightly decreases with distance x from the y -axis as vessel segmentsanastomose together. The β ( K RTL ) and β ( K RTL ) descriptor vectors decrease with x at the end of each vessel segment. The magnitude of β ( K BTT ) increases almostlinearly with the y -spatial coordinate, and β ( K BTT ) increases with y until it plateausfor y ≥ .
6. The β ( K flood ) descriptor vector decreases with each filtration step, and β ( K flood ) begins with β = 14 and achieves a local maximum of β = 10 after sevenfiltration steps before reaching β = 0 after 11 filtration steps.For the haptotaxis-driven simulation (Figs. 10C and 11(C-D)), all LTR and RTLBetti curves become zero for x > . x = 0 .
1. The β ( K BTT ) and β ( K BTT ) descriptor vectors increase steadily with y asthe horizontal plane moves past each individual vessel segment. The flooding descriptorvectors initially decrease with each flooding step and remain fixed at β = 1 , β = 0after six steps of flooding.January 5, 2021 13/38 pace ( x ) Sp a c e ( y ) A) B)
Flooding filtration S t e p nu m b e r , Betti curves Space ( x ) Sp a c e ( y ) C) D)
Flooding filtration S t e p nu m b e r , Betti curves
Space ( x ) Sp a c e ( y ) E) F)
Flooding filtration S t e p nu m b e r , Betti curves
Fig 11.
The flooding filtration. A,C,E) Illustrate the flooding filtrations for the threeblood vessel simulations from Fig 7. Dark red pixels were points included in the firststeps of the filtration and yellow pixels were the points included in later filtration steps.White pixels were not included after 24 steps of the flooding filtration. B,D,F) TheBetti Curves β ( K flood ) and β ( K flood ) for the chemotaxis-driven, haptotaxis-driven,and chemotaxis & haptotaxis simulations respectively.January 5, 2021 14/38or the chemotaxis & haptotaxis-driven simulation (Figs. 10D and 11(E-F)), β ( K LTR ) decreases with x as vessel segments anastomose together, while β ( K LTR )increases with x due to the formation of many loops. The β ( K RTL ) descriptor vectorincreases with x until it reaches its maximum value of β = 11 at x = 0 .
85 and thendecreases to β = 0 . The β ( K RTL ) descriptor vector decreases with x . The β ( K BTT )descriptor vector periodically increases and decreases with y as the horizontal planemoves past regions of high and low connectivity. The β ( K BTT ) descriptor vectorsteadily increases with the y coordinate. The β ( K flood ) descriptor vector was equal toone for all steps of flooding, and β ( K flood ) decreases with each round of flooding andreaches β = 0 after 20 flooding steps. Sweeping plane descriptor vectors cluster simulations byparameter values
We applied k -means clustering separately to the standard descriptor vectors and thetopological descriptor vectors. None of the clusterings of the standard descriptor vectorswere biologically meaningful in ( ρ , χ ) parameter space (see SI Fig. 16). While thetopological descriptor vectors from the sweeping plane filtration gave more biologicallyinterpretable groupings for combined haptotaxis and chemotaxis parameter values, theywere not robust for haptotaxis dominated or chemotaxis dominated ( ρ , χ ) parameterranges (see SI Fig 19). The flooding filtration did not give biologically meaningfulgroups for any parameter ranges (see Clustering results in the Appendix for alldetails). However, double descriptor vectors created by concatenating two sweepingplane topological descriptor vectors produced clusterings in ( ρ , χ ) parameter space thatwere biologically interpretable and robust. We note that doubles of standard descriptorvectors or topological flooding descriptor vectors were unable to produce biologicallyinterpretable and robust clusterings in ( ρ, χ ) parameter space (see Clustering results in the appendix).We computed a total of 2 × × β , β ), swept a plane across the domain in one of four directions (LTR, RTL,TTB, and BTT), and used one of the three PH summaries (BC, PIO, PIR). Wecomputed all double sweeping plane descriptor vectors and clustered all 276combinations in ( ρ, χ ) parameter space. The 15 double descriptor vectors with thehighest OOS accuracies are presented in Table 1; the four with the highest OOSaccuracies are liested below:1. PIO ( K LT R ) & PIO ( K RT L ) (93.1% OOS accuracy),2. PIO ( K RT L ) & PIR ( K LT R ) (92.6% OOS accuracy),3. PIR ( K T T B ) & PIR ( K LT R ) (91.5% OOS accuracy),4. PIR ( K BT T ) & PIR ( K LT R ) (91.5% OOS accuracy).In Fig 12, we show how the data clustered in ( ρ, χ ) parameter space when the top fourdouble sweeping plane descriptor vectors were used. The “PIO ( K LT R ) & PIO ( K RT L )”and “PIO ( K RT L ) & PIR ( K LT R )” doubles produced five connected clustering regionsand, thus, attained high OOS accuracy and biologically interpretable clusterings. Incontrast, the clustering generated by the following double descriptor vectors(“PIR ( K T T B ) & PIR ( K LT R )” and “PIR ( K BT T ) & PIR ( K LT R )”) was unable todistinguish between simulations with strong chemotaxis and strong haptotaxis. For example, the double descriptor vector “PIO ( K LTR ) & PIO ( K LTR )” denoted the topologicaldescriptor vector created by concatenating the PIO ( K LTR ) and PIO ( K LTR ) descriptor vectors.
January 5, 2021 15/38eature In Sample Accuracy Out of Sample AccuracyPIO ( K LT R ) & PIO ( K RT L ) 94.0% 93.1%PIO ( K RT L ) & PIR ( K LT R ) 94.2% 92.6%PIR ( K T T B ) & PIR ( K LT R ) 92.3% 91.5%PIR ( K BT T ) & PIR ( K LT R ) 92.3% 91.5%PIR ( K LT R ) & PIR ( K LT R ) 92.3% 90.9%PIO ( K T T B ) & PIR ( K LT R ) 94.2% 90.4%PIO ( K BT T ) & PIR ( K LT R ) 94.3% 90.1%PIO ( K LT R ) & PIR ( K LT R ) 92.6% 90.1%PIR ( K RT L ) & PIR ( K LT R ) 92.6% 89.0%PIR ( K LT R ) & PIR ( K RT L ) 90.2% 89.0%PIO ( K LT R ) & PIR ( K RT L ) 91.9% 88.4%PIO ( K RT L ) & PIO ( K LT R ) 86.1% 86.0%PIO ( K BT T ) & PIO ( K LT R ) 84.2% 85.1%PIO ( K LT R ) & PIR ( K BT T ) 85.6% 84.8%PIO ( K LT R ) & PIR ( K T T B ) 83.1% 84.8%PIO ( K T T B ) & PIO ( K LT R ) 84.9% 84.8%PIO ( K BT T ) & PIO ( K RT L ) 85.6% 84.8%
Table 1.
Summary of the 15 sweeping plane double descriptor vectors that achievedthe highest OOS accuracy scors.January 5, 2021 16/38 ig 12.
Partitioning ( ρ, χ ) parameter space into distinct regions using sweeping planedouble descriptor vectors. We applied k -means clustering, with k = 5, to sweeping planedouble descriptor vectors. The resulting partitions are presented for the four descriptorvectors that have the highest OOS accuracy: A) P IO ( K LT R ) &
P IO ( K RT L ), B)
P IO ( K RT L ) &
P IR ( K LT R ), C)
P IO ( K T T B ) &
P IR ( K LT R ), and D)
P IO ( K BT T ) &
P IR ( K LT R ). The five clusters are ordered according to the mean χ value within the cluster.January 5, 2021 17/38 roup 1 Group 2 Group 3Group 4 Group 5 Mean Cluster Realizations from PIO ( K LTR ) & PIO ( K RTL ) Fig 13.
Series of simulations that resemble the means from each of the five groups (asidentified from the double descriptor vectors associated with the sweeping planefiltrations PIO ( K LT R ) & and PIO ( K RT L )). The representative simulations forgroups 1-5 were generated using the following pairs of parameter values:( ρ, χ ) = (0 . , , (0 . , . , (0 . , . , (0 . , . , and (0 . , . k − means clustering to all sweeping plane double descriptor vectors andcomputed the average OOS accuracy for a range of values of k (SI Fig 15 ). While theaverage OOS accuracy decreased with increasing values of k , the rate of decrease slowedmarkedly for k >
5. We fixed k = 5 as this value robustly clustered the ( ρ, χ ) parameterspace into biologically interpretable clusters (Fig 12). Larger values of k may overfit thedata [34], while smaller values failed to partition ( ρ , χ ) parameter space into biologicallyinterpretable regions (SI Fig 21). For example, when k = 3, simulations with stronghaptotaxis ( ρ > χ ), equal chemotaxis and haptotaxis ( ρ = χ ), and strong chemotaxis( ρ < χ ) clustered together. When k = 4, two distinct clusters emerged near ρ = χ , butsimulations with large values of the haptotaxis parameter, ρ , were clustered with thosewith high values of the chemotaxis parameter, χ . With k = 5 clusters, the region withlarge values of the haptotaxis parameter, ρ , was further subdivided; we suggest that thisclustering provides sufficient resolution to distinguish model simulations generated fromdifferent regions of the parameter space.We identified five model simulations whose double descriptor vectors closelyresembled the means from the clusters associated with the “PIO ( K LT R ) &PIO ( K RT L )” clustering. Fig 13 presents representative simulations of each cluster. Wenote that as ρ and χ vary, the spatial extent and connectedness of the network vary asexpected (i.e., greater spatial extent in the x -direction as χ increases, and greaterconnectedness as ρ increases). We applied a similar analysis to model simulationssampled from ( ρ, χ, ψ ) parameter space, where the non-negative parameter ψ representsthe rate at which tip cells branch. The resulting parameter clustering changed only with ρ and χ , suggesting that the vascular morphology does not depend on ψ (SI Fig 22).January 5, 2021 18/38 onclusion and Discussion We have introduced a topological pipeline to analyze simulated data of tumor-inducedangiogenesis [4]. We investigated different filtrations to feed into the persistenthomology algorithm. We demonstrated that simple data science algorithms can grouptogether topologically similar simulated data. Pairs of topological descriptor vectorsgenerated from sweeping plane filtrations robustly clustered simulations into fivebiologically interpretable groups. Simulations within each group were generated fromsimilar values of the model parameters ρ and χ , which determine the sensitivity of thetip cells to haptotaxis and chemotaxis.The Betti numbers quantify connectedness of the vasculature across spatial scales.We computed PH of four sweeping plane filtrations (LTR, RTL, TTB, BTT), whichquantify topological features as a plane moves across the simulated data in a particulardirection. We found that the filtration direction provides information about differentaspects of the vascular network. For the LTR filtration, the number of connectedcomponents (i.e., β ) decreases when an anastomosis event occurs, whereas for the RTLfiltration, β decreases when a branching event occurs. The TTB and BTT filtrationssummarize vascular tortuosity: when tip cell movement in the y -spatial direction isnegligible and/or tip cells do not merge with adjacent sprouts, β increases with thenumber of tip cells/ number of sprouts. We found that the combined PIO ( K LT R ) &PIO ( K RT L ) sweeping plane descriptor vectors robustly cluster haptotaxis-chemotaxisparameter space into five distinct regions that appear to be connected.Our results may depend on the model setup, in which new blood vessels are createdas endothelial tip cells move from left to right. In the future, we may explore the use ofextended persistence [35] and/or PH transform [36], which allow consideration ofmultiple different directions in the topological analysis. If the vessel network growsradially, for example, then radial filtrations may be more informative than the LTRfiltration [24, 25]. Classically in PH, topological features of short persistence aretopological noise, yet there is growing evidence that these short features can provideinformation about geometry and curvature [13, 23, 25, 26, 37–40]. Our results usingpersistence images with a weighting of ones, rather than a ramped weighting, alsosuggests that short persistence features may be helpful for distinguishing vesselmorphologies.Many models of angiogenesis have been developed and implemented since theAnderson-Chaplain model, as has been extensively reviewed previously [7, 8, 41–44]. Weplan to use the methods described in this work to interpret more sophisticated modelsof angiogenesis and, in the longer term, to determine how vessel morphology affectstumor survival and to predict the efficacy of vascular targetting agents. The tumor isnot explicitly modeled in the Anderson-Chaplain model, but many studies coupleangiogenesis and tumor growth [15, 45–48]. For example, Vavourakis et al. developed a3D model of angiogenic tumor growth that incorporates the effects of mechanotaxis onvessel sprouting and mechano-sensitive vascular remodelling [47]. Vilanova et al.developed a phase-field model of vascular growth, regression, and re-growth that agreeswith in vivo experimental data [48]. Miller at al. used computational models to analyzetumor regression in response to drug treatment as a function of vasculature morphologyand heterogeneity [49]. Our future analysis of such state-of-the-art modeling techniqueswill require us to extend the topological methods presented here to handleheterogeneous data comprised of multiple cell types in 3D. Adapting topologicalmethods to 3D vessel data is near the edge of computational feasibility [25] and wouldrequire significant computational advancements to scale the TDA pipeline to multiple,more complex models and analyze the parameter landscape.The topological approaches considered in this work represent a multiscale way tosummarize the properties of in silico vasculature, and could be extended toJanuary 5, 2021 19/38xperimental image data. Previous studies have quantified experimental data oftumor-induced angiogenesis using standard descriptors that include sprout length anddiameter, number of tumor cells, concentration of angiogenic chemicals [50–52]. Tocompare models and data will require descriptors, such as the topological ones proposedhere, that can discriminate between different conditions (e.g., parameter values). Thechallenges of making direct comparisons between experimental and synthetic data is asignificant bottleneck in the validation of mathematical and computational models ofangiogenesis. In future work we will apply the topological approaches outlined in thispaper to compare different model simulations with experimental data [53, 54]. In thisway, we aim to determine biologically realistic parameter values for controlconditions [16], and to identify parameter values that are associated with response totreatments, including vessel normalization.
Appendix
Modeling of Tumor-induced angiogenesis
Model Equations for biological movement and reaction
We use the Anderson-Chaplain model [4] to simulate the formation of blood vesselnetworks in response to tumor-generated chemical gradients, as depicted in Fig 1. Thismodel describes the spatio-temporal evolution of three dependent variables, namelyendothelial tip cells, tumor angiogenic factor (TAF), and fibronectin. Endothelial tipcells emerge from a parent blood vessel and migrate towards the tumor in response tospatial gradients of TAF. In practice, tumors secrete multiple TAFs, including vascularendothelial growth factor (VEGF) and basic fibroblast growth factor (bFGF); in theAnderson-Chaplain model, TAF can be viewed as a generic factor. Tip cells migrate bychemotaxis up spatial gradients and also exhibit a small amount of random diffusion.Fibronectin, a major component of the extracellular matrix, also influences tip cellmovement; tip cells move, via haptotaxis, up spatial gradients of fibronectin. For a2-dimensional Cartesian geometry, the evolution of the tip cell density can berepresented by the following continuous partial differential equation: ∂n∂t = D n ∆ n − χ ∇ · ( n ∇ c ) − ∇ · ( ρ n ∇ f ) (1)where n = n ( t, x, y ) denotes tip cell density at time t and spatial position ( x, y ), f = f ( t, x, y ) denotes fibronectin concentration, and c = c ( t, x, y ) denotes TAFconcentration. The parameters D , χ , and ρ parameterize tip cell diffusion, chemotaxis,and haptotaxis, respectively.Fibronectin is assumed to be produced and consumed by tip cells and to remainbound to substrate ECM after its production. The evolution of fibronectinconcentration can be represented by the following partial differential equation: ∂f∂t = ωn − µnf, (2)where the parameters ω and µ parameterize fibronectin production and consumption,respectively.TAF is assumed to be secreted by the tumor and to diffuse on a scale faster than theendothelial cell migration. We assume tip cells consume TAF while migrating towardsthe tumor. The evolution of TAF concentration can be represented by the followingpartial differential equation: ∂c∂t = D c ∆ c − λnc, (3)January 5, 2021 20/38here the parameters D c and λ parameterize TAF diffusion and consumption by tipcells, respectively. Following [4], we will replace Equation (3) with the simpler partialdifferential equation: ∂c∂t = − λnc. (4)The parameters that appear in the model equations (1)-(2) and (4) are listed inTable 2, together with a brief description of their biological meaning and typicaldimensional values taken from [4]. Nondimensionalization, initial & boundary conditions:
Following [4], we nondimensionalize Eqs. (1)–(2) and (4) as follows. We consider thenondimensionalized variables (with asterisks) given by t ∗ = D c L t, n ∗ = nn , c ∗ = cc , ρ ∗ = ρρ ,D ∗ = D n D c , χ ∗ = χ c D c , ρ ∗ = ρ f D c ,β ∗ = ωL n f D c , γ ∗ = µL n D c , η ∗ = λL n D c . (5)We substitute the variables from Equation (5) into Equations (1)–(2) and (4) and findthe nondimensionalized system of equations given by (asterisks dropped for notationalconvenience) ∂n∂t = D ∆ n − χ ∇ · ( n ∇ c ) − ρ ∇ ( n ∇ f ) ,∂f∂t = βn − γnf,∂c∂t = − ηnc. (6)We list the nondimensionalized parameters for Equation in SI Tab 3. ∂n∂t = D ∆ n − χ ∇ · ( n ∇ c ) − ρ ∇ · ( n ∇ f ) ∂f∂t = βn − γnf∂c∂t = − ηnc (7)The dimensionless parameters used in all simulations (unless otherwise noted) are listedin SI Tab 3.We assume that the initial distribution of the TAF c can be described as aone-dimensional profile from a tumor source at x = 1 as c ( x, y,
0) = e − − x (cid:15) , ( x, y ) ∈ [0 , × [0 , . (8)The initial distribution of the fibronectin concentration f is given by f ( x, y,
0) = ke − x (cid:15) , ( x, y ) ∈ [0 , × [0 , . (9)Equations (8) and (9) are visually depicted in SI Fig 14.We use no-flux conditions at all boundary points to ensure that the net flux of tipcells into the spatial domain is zero (no boundary conditions are needed for fibronectinor TAF since their governing equations do not contain spatial derivatives).January 5, 2021 21/38 tochastic ABM implementation: We simulate Equation (7) by using a stochastic agent-based model (ABM) to describetip cell locations over time and the resulting concentrations of fibronectin and TAF. Inturn, we create network of angiogenic blood vessels comprised of the tip cells and theirprevious locations. We begin simulations by creating a lattice for our spatial domaingiven by x = i ∆ x, i = 0 , ..., N, ∆ x = 1 /N with N = 201. The discretization of thesecond spatial dimension, y , is defined equivalently. We then initialize ten tip cellslocations over time, n tip i ( t ), by setting n tip i (0) = { (0 , . i ) } i =1 .In order to simulate Eqs. (1)-(2) and (4), we discretize these equations around each n tip i ( t ) over timestep ∆ t using the upwind method and form five probabilities governingwhether the given tip cell will move left, right, up, down, or stay in place. We then set n tip i ( t + ∆ t ) equal to one of these locations based on these five computed probabilities.We define the i th cell sprout as n sprout i = (cid:91) t n tip i ( t ) , (10)We also consider tip cell branching, in which a new tip cell is placed in the ABMdomain. Similar to [4], we use the following three rules to determine when a tip cellbranches and forms a new active tip cell:1. Only sprouts of age greater than some new threshold value, ψ , may branch. Weset ψ = 0 .
1. After a tip cell branches, its “age” is re-set to zero.2. Tip cells can only branch if it neighbors lattice sites that are not occupied by anyother sprouts.3. The endothelial cell density at n tip i ( t ) must be greater than some threshold levelgiven by n b = 2 . c ( n tip i ( t ) , t ) . (11)We define endothelial cell density at the point ( x, y ) as the number of occupiedsprout locations in the 3 × x, y ), divided by nine.If all three of the above conditions are satisfied, then the tip cell will branch andrandomly place the new tip cell into one of its eight adjacent and unoccupied latticesites. Resulting vasculature image:
From an ABM simulation, we can form the entirevasculature by constructing N vasculature = (cid:91) i n sprout i . (12)We further convert this vasculature to a binary image, N , where we set each pixel ( i, j )to be N ( i, j ) = (cid:40) i ∆ x, j ∆ y ) ∈ N network . (13)We present three representative realizations of N in Fig 7 that result from different( ρ, χ ) combinations. In these figures, red cells have a values of one to denote a sprout,and white cells have a value of zero to denote the absence of a sprout.January 5, 2021 22/38 lustering results Clustering with standard descriptor vectors:
The use of the standard biologicaldescriptor vectors does not lead to biologically interpretable clusterings of the ( ρ , χ )parameter space (SI Fig 16). The clustering of the parameter space using tips and vesselsegment descriptor vectors suggested that the majority of simulations belong to onelarge cluster, which demonstrates that these descriptor vectors cannot distinguishbetween simulations that result from visually distinct vascular architectures generatedby different model parameter values. The final vasculature length descriptor vector ledto four distinct clusters in the parameter space, but almost all ( ρ , χ ) pairs with χ > ρ are placed into one large cluster (group 5), suggesting that this descriptor vector cannotdistinguish over half of the model simulations from each other. Concatenating all threedescriptor vectors into one large descriptor vector led again to one dominant cluster.These descriptor vectors clusterings are robust, however, as each achieves a high OOSaccuracy of above 90% (SI Fig 16). Clustering with topological flooding descriptor vectors:
We clustered oursimulated model data set using the the topological flooding descriptor vectors. Eachdescriptor vector counts either the number of connected components ( β ) or loops ( β );performs the flooding filtration; and is summarized using a BC, PIR, or PIO, so weconsidered 2 × × ρ, χ ) parameter space (Fig 17). These flooding descriptorvector clusterings are not robust, as we found a highest OOS accuracy of 65.8% isachieved with the PIR ( K flood ) descriptor vector.We next considered concatenating doubles of topological flooding vectors as a meansto improve robustness. We considered all (6 choose 2) = 15 possible flooding descriptorvectors doubles and ranked these clusterings by the highest OOS accuracies in SI Tab. 6.Doubles of flooding descriptor vectors attain only slightly higher OOS accuracy scoresthan individual flooding descriptor vectors. The PIO ( K flood ) & PIR ( K flood ) doubleachieved the highest OOS classification accuracy of 66.4%. Though the clusteringregions in ( ρ, χ ) space for this double flooding descriptor vector did appear biologicallyinterpretable, as the five groups are mostly well connected, aside from Group 4 (Fig 18). Clustering with individual topological sweeping plane descriptor vectors:
We clustered our simulated model dataset using the the topological sweeping planedescriptor vectors. Each descriptor vector counts either the number of connectedcomponents ( β ) or loops ( β ); sweeps the plane in one of four directions (LTR, RTL,TTB, and BTT); and is summarized using a Betti curve (BC), persistence image withramp weighting (PIR), or persistence image with one weighting (PIO), so we computed2 × × ( K LT R ) (91.5% OOSaccuracy), PIO ( K LT R ) (83.5% OOS accuracy), PIO ( K RT L ) (74.9% OOS accuracy),and PIO ( K LT R ) (74.7% OOS accuracy) descriptor vectors. While these clusteringsresult in high classification scores, the resulting clusters were determined to not bebiologically interpretable. For example, Fig 19 in the SI shows that the parameterclusterings that result from either the PIR ( K LT R ) or the PIO ( K LT R ) descriptorvector include both high haptotaxis ( e.g. , ( ρ, χ )=(0.5,0)) and high chemotaxis ( e.g. ,( ρ, χ )=(0,0.5)) parameter combinations in the same cluster (group 5), even though theseparameter combinations lead to markedly different vascular morphologies. ThePIO ( K RT L ) and PIO ( K LT R ) clusterings also led to clusterings that include highchemotaxis and high haptotaxis simulations within the same cluster (within groups 1and 2, respectively).January 5, 2021 23/38 arameter Dimensional value L length of domain 2 mm D n endothelial cell diffusion 10 − / s χ chemotactic response 2600 cm / ( M × s ) ρ haptotactic response 2600 cm / ( M × s ) f fibronectin concentration 10 − M D c TAF diffusion 2 . × − / s ω Production of fibronectin 1 . × − M/s µ uptake of fibronectin 1 × − M/s λ Uptake of TAF 2 . × − M/s
Table 2.
List of mechanistic parameters for the Chaplain Anderson ABM from [4].
Parameter baseline value D χ ρ β γ (cid:15) (cid:15) k Table 3.
Baseline nondimensionalized mechanistic parameters for theChaplain-Anderson Model from [4].
Acknowledgments
HAH, BJS and HB are grateful to the support provided by the UK Centre forTopological Data Analysis EPSRC grant EP/R018472/1. HAH gratefully acknowledgesfunding from EPSRC EP/R005125/1 and EP/T001968/1, the Royal SocietyRGF \ EA \ Space ( x ) Sp a c e ( y ) TAF, linear profile
Space ( x ) Sp a c e ( y ) Fibronectin, linear profile
Fig 14.
TAF concentration (left) and fibronectin concentration (right).January 5, 2021 24/38eature In Sample Accuracy Out of Sample AccuracyPIR ( K LT R ) 92.3% 91.5%PIO ( K LT R ) 90.1% 83.5%PIO ( K RT L ) 82.4% 74.9%PIO ( K LT R ) 82.6% 74.7% β ( K LT R ) 82.3% 74.1%PIR ( K RT L ) 78.3% 73.3%PIO ( K RT L ) 82.9% 73.0% β ( K RT L ) 77.9% 72.7%PIR ( K RT L ) 78.0% 72.7% β ( K RT L ) 79.0% 70.8% β ( K T T B ) 74.1% 68.6% β ( K LT R ) 77.7% 66.1% β ( K BT T ) 71.4% 65.0%PIO ( K BT T ) 73.3% 64.5%PIO ( K T T B ) 73.9% 64.5% β ( K BT T ) 72.3% 61.4%PIR ( K BT T ) 69.3% 60.6%PIO ( K T T B ) 68.0% 60.1%PIR ( K T T B ) 70.1% 59.8%PIO ( K BT T ) 67.9% 56.5%PIR ( K LT R ) 67.1% 54.8% β ( K T T B ) 64.6% 50.1%PIR ( K BT T ) 55.6% 41.6%PIR ( K T T B ) 51.9% 36.6%
Table 4.
Out of Sample Accuracy scores for individual feature vectors from thesweeping plane filtration using k -means classification with k = 5.January 5, 2021 25/38 k )405060708090100 OO S A cc u r a c y % Elbow Curve
Fig 15.
Elbow curve for the OOS Accuracy for the k -means clustering and labelingalgorithm over different values of k . We considered all 276 doubles of topological featurevectors and plot the mean OOS accuracy percentage plus or minus two standarddeviations for multiple values of k . We propose that the elbow occurs at k = 5.Feature In Sample Accuracy Out of Sample AccuracyPIR ( K flood ) 71.9% 65.8% β ( K flood ) 72.6% 63.4%PIO ( K flood ) 71.1% 60.6% β ( K flood ) 66.2% 52.6%PIO ( K flood ) 60.0% 46.0%PIR ( K flood ) 62.1% 44.9% Table 5.
Out of Sample Accuracy scores for individual feature vectors from theflooding filtration using k -means classification with k = 5.January 5, 2021 26/38 ig 16. Clustering of the ( ρ, χ ) parameter space using k -means with k = 5 on thestandard descriptor vectors to summarize each simulated vasculature. We considered A)Number of tips over time, C) Number of vessel segments over time, E) Final length ofthe simulated vasculature, and G) All of the previous descriptor vectors concatenatedinto one descriptor vector. The five clusters are ordered according to the mean χ valuewithin the cluster. Panels B,D,F,H depict the out of sample confusion matrices for eachdescriptor vector.January 5, 2021 27/38 ig 17. Clustering of the ( ρ, χ ) parameter space using k -means with k = 5 onindividual flooding topological descriptor vectors to summarize each simulatedvasculature. The four highest OOS accuracies resulted from the A) P IR ( K flood ), C) β ( K flood ), E) P IO ( K flood ), G) and β ( K flood ) descriptor vectors. The five clustersare ordered according to the mean χ value within the cluster. Panels B,D,F,H depictthe out of sample confusion matrices for each descriptor vector.January 5, 2021 28/38 ig 18. Clustering of the ( ρ, χ ) parameter space using k -means with k = 5 on doubleflooding topological descriptor vectors to summarize each simulation. The four highestOOS accuracies resulted from the A) P IO ( K flood )& P IR ( K flood ), C) P IR ( K flood )& P IR ( K flood ), E) β ( K flood )& β ( K flood ), G) and P IO ( K flood )& β ( K flood ) descriptor vectors. The five clusters are ordered accordingto the mean χ value within the cluster. Panels B,D,F,H depict the out of sampleconfusion matrices for each descriptor vector.January 5, 2021 29/38 ig 19. Clustering of the ( ρ, χ ) parameter space using k -means with k = 5 onindividual sweeping plane topological filtrations to summarize each simulatedvasculature. The four highest OOS accuracies resulted from the A) P IR ( K LT R ), C)
P IO ( K LT R ), E)
P IO ( K RT L ), G) and
P IO ( K LT R ) descriptor vectors. The fiveclusters are ordered according to the mean χ value within the cluster. Panels B,D,F,Hdepict the out of sample confusion matrices for each descriptor vectory.January 5, 2021 30/38 ig 20. Clustering of the ( ρ, χ ) parameter space using k -means with k = 5 on doublesof the sweeping plane topological filtration to summarize each simulated vasculature.The four highest OOS accuracies resulted from the A) P IO ( K LT R ) &
P IO ( K RT L ),C)
P IO ( K RT L ) &
P IR ( K LT R ), E)
P IO ( K T T B ) &
P IR ( K LT R ), and F)
P IO ( K BT T ) &
P IR ( K LT R ) descriptor vectors. The five clusters are orderedaccording to the mean χ value within the cluster. Panels B,D,F,H depict the out ofsample confusion matrices for each descriptor vectory.January 5, 2021 31/38 ig 21. Clustering of the ( ρ, χ ) parameter space using the PIO ( K LT R ) &PIO ( K RT L ) double of topological descriptor vectors using the k -means algorithm withA) k = 3, C) k = 4, E) k = 5. The clusters are ordered according to the mean χ valuewithin the cluster. Panels B,D,F) depict the out of sample confusion matrices for eachdescriptor vectory.January 5, 2021 32/38eature In Sample Accuracy Out of Sample AccuracyPIO ( K flood ) & PIR ( K flood ) 73.2% 66.4%PIR ( K flood ) & PIR ( K flood ) 71.9% 65.6%BC ( K flood ) & BC ( K flood ) 74.6% 65.0%PIO ( K flood ) & BC ( K flood ) 72.6% 63.9%PIR ( K flood ) & BC ( K flood ) 72.6% 63.4%PIR ( K flood ) & BC ( K flood ) 72.6% 63.4%PIO ( K flood ) & BC ( K flood ) 72.7% 63.4%PIO ( K flood ) & PIO ( K flood ) 70.7% 62.5%PIO ( K flood ) & PIR ( K flood ) 71.2% 60.6%PIO ( K flood ) & PIR ( K flood ) 70.6% 59.8%PIO ( K flood ) & BC ( K flood ) 66.4% 53.4%PIO ( K flood ) & BC ( K flood ) 66.6% 52.1%PIO ( K flood ) & PIR ( K flood ) 62.1% 51.5%PIR ( K flood ) & BC ( K flood ) 66.1% 51.0%PIR ( K flood ) & BC ( K flood ) 66.1% 51.0% Table 6.
Out of Sample Accuracy scores for doubles of feature vectors from theflooding filtration using k -means classification with k = 5.January 5, 2021 33/38 B r a n c h i n g () PIR RTL and PIO RTL clustering
Fig 22.
Clustering based on the ( ρ, χ ) grouping and the time to branch, ψ . We createda data set of blood vessel simulations by considering ( ρ, χ ) values from groups 1-5 fromFig 12A and letting ψ vary over the 10 values { . , . , . , . . . , . } . We simulated theAnderson-Chaplain model ten times at each ( ρ, χ, ψ ) values to create 500 totalsimulations. We performed our clustering methodology on this collection of imagesbased off the PIO ( K LT R ) & PIO ( K RT L ) descriptor vector.
Supporting informationReferences
1. Gupta MK, Qin RY. Mechanism and its regulation of tumor-inducedangiogenesis. World Journal of Gastroenterology. 2003;9(6):1144–1155.doi:10.3748/wjg.v9.i6.1144.2. Kushner EJ, Bautch VL. Building blood vessels in development and disease.Current Opinion in Hematology. 2013;20(3):231–236.doi:10.1097/MOH.0b013e328360614b.3. Tonnesen MG, Feng X, Clark RAF. Angiogenesis in Wound Healing. Journal ofInvestigative Dermatology Symposium Proceedings. 2000;5(1):40–46.doi:10.1046/j.1087-0024.2000.00014.x.4. Anderson ARA, Chaplain MAJ. Continuous and Discrete Mathematical Modelsof Tumor-induced Angiogenesis. Bulletin of Mathematical Biology.1998;60(5):857–899. doi:10.1006/bulm.1998.0042.5. Balding D, McElwain DLS. A mathematical model of tumour-induced capillarygrowth. Journal of Theoretical Biology. 1985;114(1):53–73.doi:10.1016/S0022-5193(85)80255-1.6. Stokes CL, Lauffenburger DA. Analysis of the roles of microvessel endothelial cellrandom motility and chemotaxis in angiogenesis. Journal of Theoretical Biology.1991;152(3):377–403. doi:10.1016/S0022-5193(05)80201-2.January 5, 2021 34/38. Byrne HM. Dissecting cancer through mathematics: from the cell to the animalmodel. Nat Rev Cancer. 2010;10(3):221–230. doi:10.1038/nrc2808.8. Metzcar J, Wang Y, Heiland R, Macklin P. A Review of Cell-BasedComputational Modeling in Cancer Biology. JCO Clinical Cancer Informatics.2019;doi:10.1200/CCI.18.00069.9. Folarin AA, Konerding MA, Timonen J, Nagl S, Pedley RB. Three-dimensionalanalysis of tumour vascular corrosion casts using stereoimaging andmicro-computed tomography. Microvascular Research. 2010;80(1):89–98.doi:10.1016/j.mvr.2010.03.007.10. Konerding MA, Malkusch W, Klapthor B, Ackern Cv, Fait E, Hill SA, et al.Evidence for characteristic vascular patterns in solid tumours: quantitativestudies using corrosion casts. British Journal of Cancer. 1999;80(5):724–732.doi:10.1038/sj.bjc.6690416.11. Konerding MA, Fait E, Gaumann A. 3D microvascular architecture ofpre-cancerous lesions and invasive carcinomas of the colon. British Journal ofCancer. 2001;84(10):1354–1362. doi:10.1054/bjoc.2001.1809.12. Adams H, Emerson T, Kirby M, Neville R, Peterson C, Shipman P, et al.Persistence images: a stable vector representation of persistent homology. Journalof Machine Learning Research. 2017;18(1):218–252.13. Bendich P, Marron JS, Miller E, Pieloch A, Skwerer S. Persistent HomologyAnalysis of Brain Artery Trees. Annals of Applied Statistics. 2016;10(1):198–218.doi:10.1214/15-AOAS886.14. Kannan P, Kretzschmar WW, Winter H, Warren D, Bates R, Allen PD, et al.Functional Parameters Derived from Magnetic Resonance Imaging ReflectVascular Morphology in Preclinical Tumors and in Human Liver Metastases.Clinical Cancer Research. 2018;24(19):4694–4704.doi:10.1158/1078-0432.CCR-18-0033.15. Perfahl H, Hughes BD, Alarc´on T, Maini PK, Lloyd MC, Reuss M, et al. 3Dhybrid modelling of vascular network formation. Journal of Theoretical Biology.2017;414:254–268. doi:10.1016/j.jtbi.2016.11.013.16. Bauer AL, Jackson TL, Jiang Y. A Cell-Based Model Exhibiting Branching andAnastomosis during Tumor-Induced Angiogenesis. Biophysical Journal.2007;92(9):3105–3121. doi:10.1529/biophysj.106.101501.17. Zhan Y, Li MZ, Yang L, Feng XF, Zhang QX, Zhang N, et al. An MRI Study ofNeurovascular Restorative After Combination Treatment With XiaoshuanEnteric-Coated Capsule and Enriched Environment in Rats After Stroke.Frontiers in Neuroscience. 2019;13. doi:10.3389/fnins.2019.00701.18. Carlsson G. Topology and data. Bulletin of the American Mathematical Society.2009;46(2):255–308. doi:10.1090/S0273-0979-09-01249-X.19. Zomorodian AJ. Topology for Computing. Cambridge University Press; 2005.20. Nicolau M, Levine AJ, Carlsson G. Topology based data analysis identifies asubgroup of breast cancers with a unique mutational profile and excellentsurvival. Proceedings of the National Academy of Sciences. 2011; p. 201102826.doi:10.1073/pnas.1102826108.January 5, 2021 35/381. Nielson JL, Paquette J, Liu AW, Guandique CF, Tovar CA, Inoue T, et al.Topological data analysis for discovery in preclinical spinal cord injury andtraumatic brain injury. Nature Communications. 2015;6:8581.doi:10.1038/ncomms9581.22. Qaiser T, Sirinukunwattana K, Nakane K, Tsang YW, Epstein D, Rajpoot N.Persistent Homology for Fast Tumor Segmentation in Whole Slide HistologyImages. Procedia Computer Science. 2016;90:119–124.doi:10.1016/j.procs.2016.07.033.23. Xia K, Wei GW. Persistent homology analysis of protein structure, flexibility,and folding. International Journal for Numerical Methods in BiomedicalEngineering. 2014;30(8):814–844. doi:10.1002/cnm.2655.24. Byrne HM, Harrington HA, Muschel R, Reinert G, Stolz BJ, Tillmann U.Topology characterises tumour vasculature. Mathematics Today. 2019;55(5):206 –210.25. Stolz BJ, Kaeppler J, Markelc B, Mech F, Lipsmeier F, Muschel RJ, et al.Multiscale topology characterises dynamic tumour vascular networks; 2020.26. Feng M, Porter MA. Persistent Homology of Geospatial Data: A Case Studywith Voting. arXiv:190205911 [physics]. 2019;.27. Topaz CM, Ziegelmeier L, Halverson T. Topological Data Analysis of BiologicalAggregation Models. PLoS ONE. 2015;10(5):e0126383.doi:10.1371/journal.pone.0126383.28. McGuirl MR, Volkening A, Sandstede B. Topological data analysis of zebrafishpatterns. Proceedings of the National Academy of Sciences.2020;117(10):5113–5124. doi:10.1073/pnas.1917763117.29. Bhaskar D, Manhart A, Milzman J, Nardini JT, Storey KM, Topaz CM, et al.Analyzing collective motion with machine learning and topology. Chaos.2019;29(12):123125. doi:10.1063/1.5125493.30. Otter N, Porter MA, Tillmann U, Grindrod P, Harrington HA. A roadmap forthe computation of persistent homology. European Physical Journal – DataScience. 2017;6(17):1–38. doi:10.1140/epjds/s13688-017-0109-5.31. Edelsbrunner H, Harer JL. Computational Topology: An Introduction.Providence R. I.: American Mathematical Society; 2010.32. Giusti C, Pastalkova E, Curto C, Itskov V. Clique topology reveals intrinsicgeometric structure in neural correlations. Proceedings of the National Academyof Sciences of the United States of America. 2015;112(44):13455–13460.doi:10.1073/pnas.1506407112.33. Lloyd S. Least squares quantization in PCM. IEEE Transactions on InformationTheory. 1982;28(2):129–137. doi:10.1109/TIT.1982.1056489.34. Bholowalia P, Kumar A. EBK-Means: A Clustering Technique based on ElbowMethod and K-Means in WSN. International Journal of Computer Applications.2014;105(9):17–24.35. Edelsbrunner H, Harer JL. Persistent homology — A survey. ContemporaryMathematics. 2008;453:257–282. doi:10.1090/conm/453/08802.January 5, 2021 36/386. Turner K, Mukherjee S, Boyer DM. Persistent homology transform for modelingshapes and surfaces. Information and Inference: A Journal of the IMA.2014;3(4):310–344. doi:10.1093/imaiai/iau011.37. Marchese A, Maroulas V. Signal classification with a point process distance onthe space of persistence diagrams. Advances in Data Analysis and Classification.2018;12(3):657–682. doi:10.1007/s11634-017-0294-x.38. Robins V, Turner K. Principal component analysis of persistent homology rankfunctions with case studies of spatial point patterns, sphere packing and colloids.Physica D: Nonlinear Phenomena. 2016;334:99–117.doi:10.1016/j.physd.2016.03.007.39. Stolz BJ, Harrington HA, Porter MA. Persistent homology of time-dependentfunctional networks constructed from coupled time series. Chaos.2017;27(4):047410. doi:10.1063/1.4978997.40. Bubenik P, Hull M, Patel D, Whittle B. Persistent homology detects curvature.Inverse Problems. 2020;36(2):025008. doi:10.1088/1361-6420/ab4ac0.41. Karolak A, Markov DA, McCawley LJ, Rejniak KA. Towards personalizedcomputational oncology: from spatial models of tumour spheroids, to organoids,to tissues. Journal of The Royal Society Interface. 2018;15(138):20170703.doi:10.1098/rsif.2017.0703.42. Mantzaris NV, Webb S, Othmer HG. Mathematical modeling of tumor-inducedangiogenesis. Journal of Mathematical Biology. 2004;49(2):111–187.doi:10.1007/s00285-003-0262-2.43. Peirce SM. Computational and Mathematical Modeling of Angiogenesis.Microcirculation. 2008;15(8):739–751. doi:10.1080/10739680802220331.44. Scianna M, Bell CG, Preziosi L. A review of mathematical models for theformation of vascular networks. Journal of Theoretical Biology. 2013;333:174–209.doi:10.1016/j.jtbi.2013.04.037.45. Owen MR, Alarc´on T, Maini PK, Byrne HM. Angiogenesis and vascularremodelling in normal and cancerous tissues. Journal of Mathematical Biology.2008;58(4):689. doi:10.1007/s00285-008-0213-z.46. Norton KA, Jin K, Popel AS. Modeling triple-negative breast cancerheterogeneity: Effects of stromal macrophages, fibroblasts and tumor vasculature.Journal of Theoretical Biology. 2018;452:56–68. doi:10.1016/j.jtbi.2018.05.003.47. Vavourakis V, Wijeratne PA, Shipley R, Loizidou M, Stylianopoulos T, HawkesDJ. A Validated Multiscale In-Silico Model for Mechano-sensitive TumourAngiogenesis and Growth. PLOS Computational Biology. 2017;13(1):e1005259.doi:10.1371/journal.pcbi.1005259.48. Vilanova G, Colominas I, Gomez H. A mathematical model of tumourangiogenesis: growth, regression and regrowth. Journal of The Royal SocietyInterface. 2017;14(126):20160918. doi:10.1098/rsif.2016.0918.49. Ha M, Hb F. Evaluation of Drug-Loaded Gold Nanoparticle Cytotoxicity as aFunction of Tumor Vasculature-Induced Tissue Heterogeneity. Annals ofBiomedical Engineering. 2018;47(1):257–271. doi:10.1007/s10439-018-02146-4.January 5, 2021 37/380. Feleszko W, Ba(cid:32)lkowiec EZ, Sieberth E, Marczak M, Dabrowska A, Giermasz A,et al. Lovastatin and tumor necrosis factor-alpha exhibit potentiated antitumoreffects against Ha-ras-transformed murine tumor Via inhibition of tumor-inducedangiogenesis. International Journal of Cancer. 1999;81(4):560–567.51. Cai H, Liu X, Zheng J, Xue Y, Ma J, Li Z, et al. Long non-coding RNA taurineupregulated 1 enhances tumor-induced angiogenesis through inhibitingmicroRNA-299 in human glioblastoma. Oncogene. 2017;36(3):318–331.doi:10.1038/onc.2016.212.52. Sailem HZ, Zen AAH. Morphological landscape of endothelial cell networksreveals a functional role of glutamate receptors in angiogenesis. Scientific reports.2020;10(1):1–14.53. Gimbrone MA, Cotran RS, Leapman SB, Folkman J. Tumor Growth andNeovascularization: An Experimental Model Using the Rabbit Cornea. Journal ofthe National Cancer Institute. 1974;52(2):413–427. doi:10.1093/jnci/52.2.413.54. Hlushchuk R, Barr´e S, Djonov V. Morphological Aspects of Tumor Angiogenesis.In: Ribatti D, editor. Tumor Angiogenesis Assays: Methods and Protocols.Methods in Molecular Biology. New York, NY: Springer; 2016. p. 13–24.Available from: https://doi.org/10.1007/978-1-4939-3999-2_2https://doi.org/10.1007/978-1-4939-3999-2_2