Data-driven cardiovascular flow modeling: examples and opportunities
Amirhossein Arzani (1) and Scott T. M. Dawson (2)
(1) Department of Mechanical Engineering, Northern Arizona University, Flagstaff, AZ, United States
(2) Department of Mechanical, Materials and Aerospace Engineering, Illinois Institute of Technology, Chicago, IL, United States
Correspondence: Amirhossein Arzani, Northern Arizona University, Flagstaff, AZ, 86011. Email: [email protected]
Abstract
High-fidelity modeling of blood flow is crucial for enhancing our understanding of cardiovascular disease. Despite significant advances in computational and experimental characterization of blood flow, the knowledge that we can acquire from such investigations remains limited by the presence of uncertainty in parameters, low spatiotemporal resolution, and measurement noise. Additionally, extracting useful information from these datasets is challenging. Data-driven modeling techniques have the potential to overcome these challenges and transform cardiovascular flow modeling. In this paper, we review several data-driven modeling techniques, highlight the common ideas and principles that emerge across numerous such techniques, and provide illustrative examples of how they could be used in the context of cardiovascular fluid mechanics. In particular, we discuss principal component analysis (PCA), robust PCA, compressed sensing, the Kalman filter for data assimilation, low-rank data recovery, and several additional methods for reduced-order modeling of cardiovascular flows, including the dynamic mode decomposition (DMD) and the sparse identification of nonlinear dynamics (SINDy). All of these techniques are presented in the context of cardiovascular flows with simple examples. These data-driven modeling techniques have the potential to transform computational and experimental cardiovascular flow research, and we discuss challenges and opportunities in applying these techniques in the field, looking ultimately towards data-driven patient-specific blood flow modeling.
Keywords: blood flow; hemodynamics; data science; sparse sensing; data-driven dynamical systems; reduced order modeling.

Introduction
We live in the age of data science, where the wealth of data and advances in data processing and computational power are beginning to affect every field. The field of cardiovascular fluid mechanics is no exception. Blood flow modeling has come a long way from Womersley's analytical Navier-Stokes solution for incompressible pulsatile flow in tubes during the 1950s [1] to the recent Food and Drug Administration approval of patient-specific computational fluid dynamics (CFD) estimation of pressure drop in coronary artery disease [2]. Advances in computational and experimental characterization of blood flow, along with promising advances in the field of data science, provide a unique opportunity for exciting developments in the field of cardiovascular fluid mechanics, with the potential to transform our understanding of cardiovascular disease.

In modeling cardiovascular disease, blood flow and hemodynamics data are provided by multiple modalities. Patient-specific CFD, in-vivo 4D flow magnetic resonance imaging (MRI), and in-vitro particle image velocimetry (PIV) or particle tracking velocimetry (PTV) are leading modalities for providing 3D time-resolved blood flow data. However, none of these modalities is perfect. Numerical error, parameter uncertainty, imaging artifacts, low spatiotemporal resolution, and measurement noise are among the limitations of these methods. Some open questions are summarized: How can we improve the quality and accuracy of the data generated by any of these modalities? If we possess data from more than one modality, is it possible to generate new data with superior accuracy with respect to either dataset? How can we best extract useful information from these datasets?
Given the complex spatiotemporal patterns in hemodynamics, is it possible to derive simplified representations of the data that facilitate physical understanding and further modeling? Data-driven modeling techniques provide potential means to answer these questions.

This work will focus on data-driven modeling techniques that seek to exploit underlying low-dimensionality and sparsity in data, and in the systems that generate such data. These methods combine classical data-driven analysis techniques originating from principal component analysis, developed over a century ago by Pearson [3], with more recent advances in data-driven dynamical systems analysis [4, 5, 6, 7], compressive sensing [8, 9, 10], and optimal data reconstruction and interpolation [11, 12, 13]. In the class of methods that we focus on, we are typically interested in representing complex high-dimensional data in a simpler low-dimensional space, while also potentially dealing with corrupted or under-resolved data. Note that the class of data-driven modeling methods considered here shares some similarities and distinctions with classical machine learning and deep learning applications. In both cases, we are interested in learning patterns in data. In deep learning with neural networks, we seek optimized function approximators that are capable of revealing hidden patterns in data. Similarly, in the data-driven modeling methods discussed here, we are often interested in finding optimized models that can represent the data. A major distinction is that in what we present as data-driven modeling, we often deal with small and sometimes corrupted datasets, whereas classical deep learning techniques typically require very large datasets. Additionally, lack of interpretability is a shortcoming of current deep learning techniques, which is an important issue in fluid mechanics, where we are often interested in the flow physics [14].
On the other hand, data-driven reduced-order modeling facilitates physical interpretation [15].

The fact that high-dimensional datasets are (at least approximately) low-rank can be observed empirically across a broad range of applications, and this ubiquitous phenomenon can also be demonstrated theoretically under certain assumptions [16]. By definition, a low-rank dataset can be represented by a small number of functions. For example, low-rank data representing a snapshot in time of the state of a system can be expressed as a certain linear combination of such functions, for any time. These functions can be viewed as a small number of elements of a basis that spans the space of all possible data, whether physically realistic or not. The fact that any physically-realistic snapshot can be represented by a small number of basis functions means that a dataset consisting of many such snapshots is sparse in this basis. The existence of a basis in which the data is sparse allows for the application of several related techniques that exploit this sparsity, such as reconstruction of full state information from a limited number of sensor measurements, which may be placed randomly, or more optimally if given more information about the system. Also, by considering data generated from a given physical system as having some type of sparsity in an appropriate basis, we can further consider notions of sparsity in the underlying equations that can generate or approximate the data. For example, if approximating a given physical system by a set of nonlinear ordinary differential equations involving a small number of state variables, the structure of the equations might typically only require a small number of nonzero terms.
These nonzero terms are sparse in the space of all possible terms and can be identified from data using sparsity-promoting identification methods [7].

In this manuscript, we review several data-driven modeling techniques and provide examples of how they could be used in cardiovascular fluid mechanics research. The overarching goal of these techniques is to facilitate data interpretation, augment data when the data is incomplete or low-resolution, improve the quality of data, and leverage data from multiple modalities to improve data fidelity. Many of these techniques are based on a well-established, yet diverse set of mathematical tools. Familiarity with these fundamental mathematical techniques and concepts may enable cardiovascular researchers to come up with customized techniques to improve their models. The objective of this manuscript is to provide an introduction to some of these techniques with tangible examples for the cardiovascular fluid mechanics community. To help researchers who would like a more in-depth introduction to these topics, in Table 1 we provide a list of excellent books that we recommend for each topic. The list is classified based on several important fundamental and applied mathematical topics that are generally important in data science and data-driven modeling for engineering and applied sciences.

Table 1: List of recommended books for an in-depth introduction to fundamental and applied mathematical topics that form the basis of data-driven modeling.
Linear algebra
- Linear algebra and learning from data, G. Strang. An excellent introduction with emphasis on data-driven modeling; the author's linear algebra video lectures are available on the MIT OCW website. [17]
- Introduction to applied linear algebra: vectors, matrices, and least squares, S. Boyd & L. Vandenberghe. Basic linear algebra concepts; S. Boyd's "Introduction to Linear Dynamical Systems" video lectures covering similar topics are available on Youtube. [18]
- Numerical linear algebra, L. Trefethen & D. Bau. A standard introductory textbook for numerical implementation of linear algebra algorithms. [19]

Statistics
- Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python, P. Bruce et al. An easy read for essential statistical tools in data-driven modeling. [20]
- Probability and Mathematical Statistics: Theory, Applications, and Practice in R, M. Meyer. A comprehensive mathematical introduction to statistical theories. [21]

Optimization
- Convex optimization, S. Boyd & L. Vandenberghe. The standard textbook for convex optimization; S. Boyd's video lectures based on this topic are available on Youtube. [22]

Dynamical systems
- Differential equations and dynamical systems, L. Perko. One of the best mathematical introductions to dynamical systems. [23]
- Nonlinear dynamics and chaos: With applications to physics, biology, chemistry, and engineering, S. Strogatz. A simple conceptual and practical introduction to dynamical systems; Strogatz's video lectures based on this topic are available on Youtube. [24]

Data assimilation
- Data assimilation: methods, algorithms, and applications, M. Asch et al. A rigorous and comprehensive introduction to the topic. [25]
- Data assimilation: A mathematical introduction, K. Law et al. A mathematical introduction with Matlab codes. [26]

Compressed sensing and sparsity
- Compressed sensing for engineers, A. Majumdar. An excellent practical introduction to compressed sensing with different examples such as medical imaging. [27]
- Sparse modeling: theory, algorithms, and applications, I. Rish & G. Grabarnik. A mathematically more detailed introduction to compressed sensing and sparsity. [28]

Machine learning
- The hundred-page machine learning book, A. Burkov. A concise but informative and conceptual introduction to machine learning. [29]
- An introduction to statistical learning, G. James et al. One of the most popular and accessible references in machine learning. [30]
- Matrix methods in data mining and pattern recognition, L. Eldén. A mathematical introduction to machine learning with emphasis on linear algebra. [31]

Reduced-order and data-driven modeling
- Reduced basis methods for partial differential equations: an introduction, A. Quarteroni et al. A detailed mathematical introduction to reduced-order modeling of partial differential equations and proper orthogonal decomposition (POD). [32]
- Turbulence, coherent structures, dynamical systems and symmetry, P. Holmes et al. An introduction to POD with focus on applications in fluid mechanics and turbulence. [5]
- Dynamic mode decomposition: data-driven modeling of complex systems, N. Kutz et al. An introduction to reduced-order modeling of dynamical systems using dynamic mode decomposition (DMD). [33]
- Data-driven modeling & scientific computation: methods for complex systems & big data, N. Kutz. A comprehensive introductory book on a wide range of data-driven modeling tools and necessary mathematical preliminaries; Kutz's video lectures based on this topic are available on Youtube. [34]
- Data-driven science and engineering: Machine learning, dynamical systems, and control, S. Brunton & N. Kutz. An excellent overview of various data-driven modeling techniques; Brunton's video lectures based on this topic are available on Youtube. [35]
In each section, we present an important data-driven modeling technique. Each section is divided into subsections where we present the motivation behind the need for the technique, the mathematical theory, example(s) relevant to cardiovascular flow problems, and a short discussion on opportunities for cardiovascular researchers as well as challenges in applying the technique to practical problems. Finally, we briefly discuss a few recent trends in computational mechanics related to data science that could be valuable in cardiovascular biomechanics modeling.
Thanks to high-performance computing, numerical simulations can provide spatiotemporally highly resolved hemodynamics data. However, hemodynamics data often possess hidden low-dimensionality, which, once exposed, can reduce the dimensionality of the data and provide opportunities for advanced data analysis, as discussed in the next sections. This is due to the high correlation among temporal snapshots in the data (or equivalently, between different locations in space). That is, under an appropriate coordinate system, we can reduce the size of hemodynamics data with minimal loss in accuracy. Principal component analysis (PCA) is based on singular value decomposition (SVD), which is one of the most ubiquitous and powerful matrix transformations in linear algebra [36]. As an aside, note however that PCA (and the independently-developed Hotelling analysis [37]) predated much of the theory and numerical analysis of SVD. PCA provides an optimal basis, in which the data may be represented with the minimum number of independent/uncorrelated modes. In other words, we seek to find a basis where the data has minimal correlation, and therefore reduced redundancy. What we introduce as PCA in section 2.2 is essentially equivalent to what is known as the proper orthogonal decomposition (POD) in fluid mechanics [38], which will be discussed explicitly in section 6.2.
In PCA, the data is stacked into a rectangular matrix. Here, we arrange the spatial data in n rows and the temporal data in m columns. Therefore, spatiotemporal hemodynamics data for a variable (e.g., velocity) could be arranged into a matrix

\[ \mathbf{X}_{(n \times m)} = \begin{bmatrix} \mathbf{w}_1 & \mathbf{w}_2 & \cdots & \mathbf{w}_m \end{bmatrix} , \qquad (1) \]

where the spatial data in each snapshot (time-step) is arranged as a column. This form of data representation will frequently be used throughout the paper. Here, for clarity, we on occasion use subscripts such as (n × m) to indicate the dimensions of a matrix. The temporal mean of the data at each spatial location is subtracted from each row to define a new matrix

\[ \tilde{\mathbf{X}}_{(n \times m)} = \sqrt{\tfrac{1}{m-1}} \left( \mathbf{X}_{(n \times m)} - \bar{\mathbf{x}}_{(n \times 1)} \mathbf{J}_{(1 \times m)} \right) , \qquad (2) \]

where $\mathbf{J}_{(1 \times m)} = \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}$, and $\bar{\mathbf{x}}_{(n \times 1)}$, with entries $\bar{x}_i = \tfrac{1}{m} \sum_{j=1}^{m} X_{ij}$, is a column vector containing the mean of each row. Subsequently, SVD is performed on the processed data matrix

\[ \tilde{\mathbf{X}} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^* , \qquad (3) \]

where the columns of U and V are the left and right singular vectors (which are orthonormal), the diagonal values of Σ are the singular values, and * specifies the complex conjugate transpose. For the "full" SVD, U and V are square matrices, but here we will only need to consider the columns corresponding to nonzero singular values.

The left and right singular vectors are also eigenvectors of the equivalent space- and time-correlation matrices, satisfying

\[ \tilde{\mathbf{X}} \tilde{\mathbf{X}}^* \mathbf{u}_j = \sigma_j^2 \mathbf{u}_j , \qquad (4) \]
\[ \tilde{\mathbf{X}}^* \tilde{\mathbf{X}} \mathbf{v}_j = \sigma_j^2 \mathbf{v}_j , \qquad (5) \]

where $\mathbf{u}_j$ and $\mathbf{v}_j$ are the j-th columns of U and V, and $\sigma_j$ is the j-th diagonal entry of Σ. The SVD is unique (up to rotation of the singular vectors in the complex plane), and allows for an "optimal" low-rank approximation of the original data.
More precisely, the rank-r approximation $\tilde{\mathbf{X}}_r$ that minimizes the reconstruction error (using the Frobenius norm)

\[ \epsilon_r = \| \tilde{\mathbf{X}} - \tilde{\mathbf{X}}_r \|_F \qquad (6) \]

is given by

\[ \tilde{\mathbf{X}}_r = \mathbf{U}_r \boldsymbol{\Sigma}_r \mathbf{V}_r^* , \qquad (7) \]

where $\mathbf{U}_r$ and $\mathbf{V}_r$ contain the first r columns of U and V, and $\boldsymbol{\Sigma}_r$ contains the first r rows and columns of Σ.

Physically, with the data arranged as described in Eq. 1, the columns of U denote an ordered set of spatial functions, or "modes", which represent the principal components of the data. The extent to which a given such function characterizes the data is given by the associated singular value (i.e., the corresponding diagonal entry of Σ). The columns of V (or equivalently, rows of V*) give information about the relative amplitude of the corresponding spatial mode for each snapshot of data. Time-resolved data is not required to compute the spatial PCA modes, but in the case where time-resolved data is available, these columns of V are functions of time. Therefore, we can equivalently express Eq. 3 as

\[ \tilde{\mathbf{X}} = \sum_{j=1}^{\min(n,m)} \mathbf{u}_j \sigma_j \mathbf{v}_j^* , \qquad (8) \]

where $\mathbf{u}_j$ and $\mathbf{v}_j$ are the j-th columns of U and V respectively, and $\sigma_j$ is the j-th diagonal entry of Σ.

Remark:
For vectorial data such as velocity, the data at one time-step is still placed in one column by considering all components of the vector.
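The steps in Eqs. 1-8 can be sketched in a few lines of Python. This is a purely illustrative sketch, not the authors' code: the synthetic rank-3 "snapshot matrix" below stands in for real hemodynamics data, and the 1/sqrt(m-1) scaling of Eq. 2 is omitted since it only rescales the singular values.

```python
# Minimal PCA-via-SVD sketch: rows are spatial points, columns are
# time snapshots (Eq. 1). All data here are synthetic/illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 200, 50, 3  # spatial points, snapshots, true rank

# Rank-3 "flow": 3 spatial modes with 3 temporal coefficient histories
t = np.linspace(0.0, 1.0, m)
spatial_modes = rng.standard_normal((n, r))
temporal_coeffs = np.stack([np.sin(2 * np.pi * t),
                            np.cos(4 * np.pi * t),
                            t**2])
X = spatial_modes @ temporal_coeffs  # (n x m) data matrix, as in Eq. 1

# Subtract the temporal mean of each row (Eq. 2, without the scaling)
Xtil = X - X.mean(axis=1, keepdims=True)

# Economy SVD (Eq. 3): columns of U are the spatial PCA modes
U, sing, Vh = np.linalg.svd(Xtil, full_matrices=False)

# Rank-r reconstruction (Eq. 7) and its Frobenius error (Eq. 6)
Xr = (U[:, :r] * sing[:r]) @ Vh[:r, :]
err = np.linalg.norm(Xtil - Xr, "fro")
print(f"rank-{r} reconstruction error: {err:.2e}")
```

Because the synthetic data are exactly rank 3, the rank-3 reconstruction error is at machine precision; for real hemodynamics data, the error decays with r according to the discarded singular values.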
Problem statement:
Do blood flow data in aneurysms exhibit low-dimensional behavior? How many modes are required to reconstruct the data? How does the complexity in cerebral and abdominal aortic aneurysm data compare?
Problem solution:
Patient-specific computational fluid dynamics (CFD) results from prior work in a cerebral aneurysm [39] and an abdominal aortic aneurysm (AAA) [40] are used. Details about the CFD simulations can be found in [39, 40]. 50 time-steps are used from one cardiac cycle, and PCA is performed on the velocity data. The results are shown in Fig. 1. In the cerebral aneurysm model, a single mode is capable of capturing a significant portion of the data (70% energy), whereas more modes are required for the AAA data. To capture 90% of the data (based on singular values), 6 and 16 modes are required for the cerebral aneurysm and AAA models, respectively. The dominant modes are capable of explaining the majority of the data. The higher number of modes in the AAA data is due to the more complex and chaotic nature of blood flow in aortic aneurysms [41]. In practice, the number of modes is selected such that a balance between model complexity and fidelity is achieved.
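The 90%-energy criterion used in this example can be sketched directly: count how many singular values are needed before their cumulative sum reaches 90% of the total. The decaying spectrum below is synthetic and purely illustrative.

```python
# Mode selection by cumulative singular-value energy (illustrative).
import numpy as np

sing = 100.0 * 0.5 ** np.arange(20)  # synthetic decaying singular values
energy = np.cumsum(sing) / np.sum(sing)
n_modes = int(np.searchsorted(energy, 0.90) + 1)
print(n_modes, "modes reach 90% cumulative singular-value energy")
```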
PCA could not only be used to post-process data, but it is also a powerful pre-processing step in various data-driven modeling tools, which will be discussed in the following sections. Herein, we have demonstrated how PCA reveals hidden low-dimensionality in hemodynamics data. In our example, the temporal data are stacked in columns; therefore, low-dimensionality implies a high correlation between these columns of temporal data. Equivalently, these results also show that there must exist high amounts of correlation between measurements at different spatial locations. Beyond this example, it is possible to organize hemodynamics data differently. For example, if the columns represent hemodynamic data with varying parameters (e.g., boundary conditions), then PCA can demonstrate the correlation among data with different parameters.

Figure 1: Principal component analysis (PCA) performed on cerebral and abdominal aortic aneurysm velocity data. a) A single snapshot of the velocity streamlines is shown. b) The singular values for different principal components (modes) are plotted. c) A semi-log plot is shown for better visualization in the range of singular values. d) The right panel shows the normalized cumulative energy, defined as the cumulative sum of singular values normalized by the sum of all singular values.

Applying PCA to such data will enable data classification and pattern recognition, which could serve as a crucial step in machine learning. For example, linear discriminant analysis could be applied to labeled SVD/PCA modes for supervised learning and data classification [34]. This technique has been used to investigate correlations between hemodynamic and geometric parameters and aortic insufficiency and ischemic events in the aorta of left ventricular assist device patients [42]. PCA has been used in predicting cerebral aneurysm rupture based on morphology and hemodynamics [43].
PCA could facilitate geometric characterization of the vasculature [44, 45, 46] and the heart [47]. One challenge in using PCA is that the rows need to be ordered consistently across different snapshots. For instance, if the hemodynamic data in the columns are based on different patients, care must be taken to ensure that the data are oriented and aligned consistently.
Noisy, fluctuating data can compromise the accuracy of PCA. Such data often result in a larger number of energetic PCA modes than what is truly needed to represent the physics of the data. The issue is pronounced in experimental data but can also exist in computational data. In experimental blood flow data, noisy and corrupt data are often inevitable. Additionally, in cases where transitional flow exists (e.g., blood flow in aneurysms [48]), high-resolution CFD solvers can capture fluctuating velocity data that may compromise the temporal statistical correlation in PCA if one is interested in mean flow behavior. Robust principal component analysis (RPCA), as developed in [49] and recently applied to fluids data in [50], has been proposed to overcome these limitations by separating the noisy data from the rest of the data using an optimization framework.
Consider the same data matrix X defined in Eq. 1. We assume that the noise is randomly and sparsely distributed in the data. The first step is to write X as the sum of a low-rank matrix L containing the correlated data we are interested in and a sparse matrix S containing the noisy/fluctuating/corrupt data:

\[ \mathbf{X} = \mathbf{L} + \mathbf{S} . \qquad (9) \]

Next, the above objective is mathematically formulated as an optimization problem

\[ \min_{\mathbf{L}, \mathbf{S}} \left( \operatorname{rank}(\mathbf{L}) + \| \mathbf{S} \|_0 \right) \quad \text{s.t.} \quad \mathbf{X} = \mathbf{L} + \mathbf{S} , \qquad (10) \]

where rank(L) is equivalent to the number of nonzero singular values of L, and $\| \mathbf{S} \|_0$ (which is not strictly a norm) denotes the number of nonzero entries in S. Optimizing Eq. 10 is a combinatorial problem, which can typically only be solved using an intractable brute-force search. Therefore, the problem is relaxed to a convex optimization problem, formulated as

\[ \min_{\mathbf{L}, \mathbf{S}} \left( \| \mathbf{L} \|_* + \lambda \| \mathbf{S} \|_1 \right) \quad \text{s.t.} \quad \mathbf{X} = \mathbf{L} + \mathbf{S} , \qquad (11) \]

where $\| \cdot \|_*$ is the nuclear norm (sum of the singular values), the nonconvex $\ell_0$ pseudonorm is replaced by the convex $\ell_1$ norm, and λ is a regularization parameter controlling the level of sparsity in S. To find L and S, the above problem can be solved using the augmented Lagrange multiplier algorithm [51]. Finally, we may use the L matrix to perform PCA as explained in the previous section.

Problem statement:
Can RPCA improve low-dimensional data extraction in patient-specific abdominal aortic aneurysm (AAA) data with noise?
Problem solution:
The same AAA blood flow velocity data used in the previous section are used. Noise is randomly added in space and time to 30% of the data. The value of the noise is sampled from a random normal distribution with zero mean and a standard deviation equal to 10% of the maximum streamwise velocity in the data. The noise is added to all components of the velocity vector using the same distribution. PCA and RPCA are applied to both the original and noisy data. The results shown in Fig. 2 demonstrate that PCA fails to expose the low-dimensionality in the noisy data, where the singular values do not asymptote to zero. RPCA overcomes this issue and reveals the low-dimensionality in the data. An interesting observation is that RPCA applied to the original data (without noise) leads to a smaller number of dominant modes compared to PCA, due to the fact that some of the data is incorporated into the sparse matrix S and discarded prior to the decomposition.

Figure 2: Robust principal component analysis (RPCA) performed on abdominal aortic aneurysm velocity data. PCA on the original data (top row), PCA on the original data with noise (second row), RPCA on the original data (third row), and RPCA on the original data with noise (fourth row) are plotted, where the principal components in (a) a regular plot, (b) a semi-log plot, and (c) the cumulative sum of singular values normalized by the sum of all singular values are shown.

RPCA is an excellent method to detect low-dimensionality and coherent patterns in experimental hemodynamics data, where noise is inevitable. Recently, RPCA has been successfully applied to experimental particle image velocimetry (PIV) data of complex fluid flow [50]. Experimental cardiovascular fluid mechanics techniques such as 4D flow magnetic resonance imaging (MRI) often carry a significant amount of noisy and corrupted data. RPCA could be applied to such data to reconstruct the dominant flow structures.
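A compact sketch of how the convex problem in Eq. 11 can be solved with an augmented Lagrange multiplier (alternating-directions) scheme. This is the standard principal component pursuit iteration rather than the authors' implementation, and the hyperparameter defaults (lam = 1/sqrt(max(n, m)), a fixed mu) are common choices, not values from this paper; the test data are synthetic.

```python
# RPCA (Eq. 11) via alternating singular value thresholding (L-step)
# and entrywise soft thresholding (S-step); illustrative sketch only.
import numpy as np

def rpca(X, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Split X into low-rank L and sparse S (X ~ L + S)."""
    n, m = X.shape
    lam = 1.0 / np.sqrt(max(n, m)) if lam is None else lam
    mu = 0.25 * n * m / np.abs(X).sum() if mu is None else mu
    shrink = lambda A, tau: np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    Y = np.zeros_like(X)  # Lagrange multiplier for X = L + S
    for _ in range(max_iter):
        # Low-rank step: threshold the singular values
        U, sig, Vh = np.linalg.svd(X - S + Y / mu, full_matrices=False)
        L = (U * shrink(sig, 1.0 / mu)) @ Vh
        # Sparse step: entrywise soft thresholding
        S = shrink(X - L + Y / mu, lam / mu)
        # Dual ascent on the constraint X = L + S
        Y += mu * (X - L - S)
        if np.linalg.norm(X - L - S, "fro") <= tol * np.linalg.norm(X, "fro"):
            break
    return L, S

# Synthetic test: rank-2 matrix plus sparse, large-amplitude corruption
rng = np.random.default_rng(1)
L0 = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 40))
S0 = np.zeros((60, 40))
S0.flat[rng.choice(S0.size, size=60, replace=False)] = 5.0
L, S = rpca(L0 + S0)
rel_err = np.linalg.norm(L - L0, "fro") / np.linalg.norm(L0, "fro")
print("relative low-rank recovery error:", rel_err)
```

In practice, increasing mu geometrically across iterations accelerates convergence; the fixed-mu version above is kept for brevity.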
Additionally, RPCA has the potential to facilitate data-driven modeling techniques such as dynamic mode decomposition (see Sec. 6) that are sensitive to noise. One challenge in applying RPCA is the requirement that the corrupt data should be sparsely distributed. This requirement and the sensitivity to the hyperparameter λ should be investigated in the context of cardiovascular flows in future work.

In experimental modeling of cardiovascular flows, high-resolution sampling of hemodynamics data is often not possible due to inherent limitations. However, high-resolution data are an essential part of many modeling tasks (e.g., patient-specific flow waveforms for inlet boundary conditions, spatial gradients for wall shear stress calculation, etc.). In the previous sections, we saw that high-resolution data may admit a low-dimensional sparse representation in an appropriate basis. The idea behind compressed sensing (CS) is that if we have a priori knowledge of such a basis, then we can reconstruct high-resolution data from low-resolution sampling. CS is a theoretical framework that solves an under-determined system of linear equations when the solution is known to be sparse. As a simple example, assume we have sampled velocity data in the aorta with a spatial resolution of 3 mm (typical 4D flow MRI resolution); however, we would like to reconstruct the data with a spatial resolution of 0.5 mm (typical CFD resolution). The only constraint here is that at the sampled MRI locations, the reconstructed data must match the 4D flow MRI data. Obviously, this is an under-determined system, meaning that the reconstruction problem has an infinite number of possible solutions. In CS, we regularize this problem to enable a solution, by promoting a solution that is sparse in the specified basis.
Suppose we have m low-resolution measurements $\mathbf{y} \in \mathbb{R}^m$ and we want to reconstruct the high-resolution data $\mathbf{x} \in \mathbb{R}^n$ with n components, where $n \gg m$. We know the location of the low-resolution measurements with respect to the high-resolution data via a known measurement matrix C:

\[ \mathbf{y} = \mathbf{C} \mathbf{x} . \qquad (12) \]

If the low-resolution measurements are a subset of the high-resolution data, then each row of C consists of zeros everywhere except for the column corresponding to a measurement location, which has an entry of 1. The central assumption is that we have a priori knowledge of a transformation Φ that relates the high-resolution data to a sparse vector s, where $\mathbf{x} = \boldsymbol{\Phi} \mathbf{s}$, which implies that we know how to sparsify the data. Combining the previous two equations, we arrive at an equation that could be solved for s:

\[ \mathbf{y} = \mathbf{C} \boldsymbol{\Phi} \mathbf{s} . \qquad (13) \]

This is an under-determined system of equations. To enable solution, the following optimization problem is solved:

\[ \min_{\mathbf{s}} \| \mathbf{s} \|_1 \quad \text{s.t.} \quad \mathbf{y} = \mathbf{C} \boldsymbol{\Phi} \mathbf{s} , \qquad (14) \]

where, using the $\ell_1$ norm, we seek the sparsest solution s that satisfies the measurement constraints. Here, the $\ell_1$ norm is again used as a convex proxy for the $\ell_0$ pseudonorm representing the number of nonzero elements of s, in order to make the problem computationally tractable. Alternatively, the optimization problem could be written using the least absolute shrinkage and selection operator (LASSO) method [52]:

\[ \min_{\mathbf{s}} \| \mathbf{y} - \mathbf{C} \boldsymbol{\Phi} \mathbf{s} \|_2^2 + \lambda \| \mathbf{s} \|_1 , \qquad (15) \]

where the hyperparameter λ controls the level of sparsity. Note in particular that this formulation means that Eq. 12 no longer needs to be exactly satisfied. Once s is found, the high-resolution data could be recovered using $\mathbf{x} = \boldsymbol{\Phi} \mathbf{s}$.

Problem statement:
Given low-resolution measured blood flow waveform data in the aorta, is it possible to reconstruct the high-resolution flow waveform using CS?
Problem solution:
We consider a typical blood flow waveform in the infrarenal aorta represented using a Fourier series (Fig. 3). Such a waveform is often used as an inlet boundary condition for CFD simulations in abdominal aortic aneurysms [53]. In-vivo phase-contrast MRI (PCMRI) enables patient-specific measurement of such waveforms. We assume that we have low-resolution temporal sampling over 20 cardiac cycles. We randomly sample 100 data points over 20 cardiac cycles (on average 5 points per cardiac cycle). The goal is to reconstruct the data with 1000 data points (50 points per cardiac cycle). The original and downsampled waveforms are shown in Fig. 3a and Fig. 3b, respectively. In this example, we assume that the waveform is sparse in the discrete cosine transform (DCT) basis. The CS solution (from Eq. 14) is shown in Fig. 3c. The result shows significant improvement over the low-resolution data. However, the CS waveform does not perfectly recover the original waveform, even though the main features are recovered. Artificial high-frequency features are present in the CS-recovered waveform, which may be removed using a Gaussian smoothing filter (Fig. 3d).
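The same recovery idea can be sketched in Python using the LASSO relaxation (Eq. 15) rather than the equality-constrained form, solved with plain iterative soft-thresholding (ISTA). This is an illustrative sketch: the DCT-sparse signal, random sampling pattern, λ, and iteration count are invented for demonstration, not the waveform or solver of this example.

```python
# Compressed sensing sketch: recover a DCT-sparse signal from random
# samples via ISTA on min_s (1/2)||y - A s||_2^2 + lam ||s||_1.
import numpy as np

n, m = 256, 60  # high-resolution length, number of random samples
rng = np.random.default_rng(2)

# Orthonormal DCT-II basis Phi (columns are cosine modes)
k = np.arange(n)
Phi = np.cos(np.pi * (k[:, None] + 0.5) * k[None, :] / n)
Phi[:, 0] *= np.sqrt(1.0 / n)
Phi[:, 1:] *= np.sqrt(2.0 / n)

# Ground truth: sparse in the DCT basis (4 active modes), x = Phi s
s_true = np.zeros(n)
s_true[[3, 9, 17, 40]] = [2.0, -1.0, 0.7, 0.5]
x_true = Phi @ s_true

# Random sub-sampling: rows of C pick measurement locations (Eq. 12)
rows = rng.choice(n, size=m, replace=False)
A = Phi[rows, :]   # the product C Phi of Eq. 13
y = x_true[rows]   # noiseless low-resolution measurements

# ISTA iterations: gradient step followed by soft thresholding
lam = 1e-3
step = 1.0 / np.linalg.norm(A, 2) ** 2
s = np.zeros(n)
for _ in range(3000):
    s = s - step * A.T @ (A @ s - y)                        # gradient
    s = np.sign(s) * np.maximum(np.abs(s) - step * lam, 0)  # shrink

x_rec = Phi @ s  # recovered high-resolution signal
rel_err = np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true)
print("relative reconstruction error:", rel_err)
```

With 60 random samples of a 256-point signal that has only 4 active DCT modes, the recovery is near-exact, illustrating why random (incoherent) sampling is essential.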
Figure 3: Pulsatile blood flow rate waveform in the infrarenal aorta is presented over 20 cardiac cycles (T = 1 s) using a Fourier series. (a) The original waveform, (b) the downsampled waveform, (c) compressed sensing waveform reconstruction, and (d) compressed sensing waveform reconstruction with Gaussian smoothing are plotted.

CS is a very important data-driven modeling tool with various applications [27]. In experimental cardiovascular biomechanics modeling, it is often not possible to collect data at the desired spatial and/or temporal resolution. Under certain conditions, CS provides the means to recover high-resolution data given low-resolution measurements. If we know the basis where the data is sparse, then we do not necessarily need to collect high-resolution data, as we can reconstruct the data using CS. In other words, if such a basis exists, then the Nyquist–Shannon sampling law [54], which requires data sampling at a rate twice as fast as the highest frequency, is no longer required. CS is widely used in medical imaging such as computed tomography (CT) [55] and MRI [56]. For example, CS has been used to accelerate blood flow velocity measurement in 4D flow MRI in the K-space (the Fourier transform of the magnetic resonance image) [57]. However, the application of CS to high-resolution reconstruction of low-resolution blood flow data has remained elusive. A major challenge is identifying the appropriate basis where the solution is sparse. In a number of applications, sparsity in space can be achieved via a wavelet transform, while temporal sparsity can often be obtained using a Fourier transform. For example, the wavelet transform has been used to successfully reconstruct spatial data governed by the diffusion equation with 10 times higher resolution [58].
However, successful reconstruction using a standard basis will likely be challenging. Dictionary learning and transform learning provide the means to identify an optimized basis [27]; however, their success on complex blood flow data remains to be investigated. An alternative approach is to learn the sparsifying basis using training data. This topic is discussed in Sec. 7.

Another challenge is the requirement that the data should be sampled in a manner that is incoherent with the basis in which the signal is sparse. This requirement, which can be expressed more formally as the restricted isometry property [59], ensures that the sampled signal can detect nonzero coefficients of all basis elements. As a counterexample, uniform sampling is "periodic" and thus is not incoherent with a Fourier basis. Most typically, this incoherence property can be satisfied by choosing a random sampling procedure.
Sometimes in cardiovascular flow modeling, we have the luxury of possessing both experimental data and computational models. However, both experimental and computational models have limitations and errors. Experimental data are often low-resolution and noisy, whereas even high-resolution computational simulations have limitations due to uncertainty in model parameters and/or governing equations. The field of "data assimilation" deals with hybrid models that integrate observed experimental data with computational models. The goal is to use the computational model to advance the solution in time and, based on the availability of experimental data, alter the computational model's prediction to improve solution reliability.
Here, we will discuss the simplest implementation of the Kalman filter. Consider our computational model governed by a dynamical system as

ẋ(t) = f(x, t) + q_1,  x(t_0) = x_0 + q_2,  (16)

where q_1 is the unknown error in the model and q_2 represents the uncertainty in the initial condition of the system. Consider the set of experimental data as

y(t) = g(t) + q_3,  (17)

where g(t) are the measured data and q_3 is the associated error. Note that the measurements do not necessarily need to be available at all points. We would like our model (Eq. 16) to satisfy the experimental observation. The above equations form an overdetermined system, where we have more equations than unknowns. Therefore, an optimization problem is defined to find the solution that minimizes the error weighted by the above errors in the model and experimental data. In the case of one-dimensional data, the solution to such an optimization problem may be written as

x̄_k = x_k + K(y_k − x_k),  (18)

where x̄ is the data-assimilated prediction, x is the computational/analytical model, y denotes the experimental data, K is the Kalman filter (Kalman gain), and the subscript k specifies the time-step. The experimental data may not be available at every model time-step; therefore, the above prediction will only be applied at the time-steps where the experimental data are available. The Kalman gain may be written as

K = σ_x² / (σ_x² + σ_y²),  (19)

where it is assumed that the errors can be represented with Gaussian distributions. σ_x and σ_y represent the standard deviations of the error distributions in the model and experimental data, respectively. It can be seen that K weights the contribution of experimental and computational data based on their expected errors.

The above implementation can be extended to vectorial data, and the Kalman gain can be improved and updated at every step using the extended Kalman filter method.
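The scalar update of Eqs. 18–19 can be sketched as follows; the toy dynamics (a model missing a forcing term present in the true system), noise levels, and measurement schedule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
dt, n_steps = 0.01, 1000
sigma_x, sigma_y = 0.15, 0.05                # model / measurement error std
K = sigma_x**2 / (sigma_x**2 + sigma_y**2)   # Eq. 19: fixed Kalman gain

f_true = lambda x: 1.0 - x     # "true" dynamics (relaxes to 1)
f_model = lambda x: -x         # imperfect model (missing the forcing term)

x_true = np.empty(n_steps); x_true[0] = 0.0
x_model = np.empty(n_steps); x_model[0] = rng.normal(0.0, sigma_x)  # uncertain IC
x_kf = np.empty(n_steps); x_kf[0] = x_model[0]

for k in range(n_steps - 1):
    x_true[k + 1] = x_true[k] + dt * f_true(x_true[k])      # reference data
    x_model[k + 1] = x_model[k] + dt * f_model(x_model[k])  # model alone
    x_kf[k + 1] = x_kf[k] + dt * f_model(x_kf[k])           # model + filter
    if (k + 1) % 10 == 0:          # noisy measurement available (Eq. 17)
        y = x_true[k + 1] + rng.normal(0.0, sigma_y)
        x_kf[k + 1] += K * (y - x_kf[k + 1])                # Eq. 18

err_model = np.abs(x_model - x_true).mean()
err_kf = np.abs(x_kf - x_true).mean()
```

The correction is applied only at the steps where a measurement exists, mirroring the intermittent availability of experimental data; the assimilated trajectory tracks the truth far better than the uncorrected model.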
These methods are described in a simple but informative presentation in [34] and in more detail in [60, 25].

Problem statement:
Consider an unsteady version of Hill's spherical vortex, which is an extension of the steady version that has been used as a very simple analytical model of 3D vortex flows in the left ventricle of the heart [61]. The goal is to construct particle trajectories considering uncertainty in the initial condition and availability of noisy experimental data.

Problem solution:
We extend Hill's spherical vortex [62] to a transient vortex by periodic movement of the vortex center:

u(x, y, z, t) = (x² + 1 − 2r², xy, xz)ᵀ,  if r ≤ 1,
u(x, y, z, t) = (x²r⁻⁵ − (1/3)r⁻³ − 2/3, xyr⁻⁵, xzr⁻⁵)ᵀ,  if r > 1,  (20)

where r = √(x² + (y − A)² + (z − A)²) and A = sin(πt/T). Particle trajectories are integrated from the initial condition x_0 over one period T with time-step Δt = 0.001 to generate the reference data. We assume synthetic experimental data are available at a lower sampling rate (Δt = 0.04), with uncorrelated Gaussian white noise added with σ_y = 0.05. For the computational model, we assume we do not have precise knowledge of the initial condition, and thus to each component of x_0 we add Gaussian noise with σ_x = 0.15. The Kalman filter method is used to correct the model prediction based on the synthetic experimental data using Eq. 18 for each component of x, applied once every 40 steps (based on the lower sampling rate of the experimental data).

The results are shown in Fig. 4. A snapshot of the vortex flow is shown. The induced unsteadiness in the flow causes trajectory sensitivity to the initial condition. We can see that the perturbed solution (Fig. 4c) behaves very differently from the exact solution (Fig. 4b). The Kalman filter improves the estimation of the trajectory (Fig. 4d); however, noise is still present due to the noisy nature of the experimental observation. Finally, the errors in the trajectories are shown in a semi-log plot (Fig. 4e), where the Kalman filter significantly outperforms the perturbed solution.

Figure 4: Data assimilation based on the Kalman filter is applied to a sample trajectory from the transient version of Hill's spherical vortex. (a) 3D streamlines and 2D streamlines in a cross-section are shown at the initial time-step. The trajectories of the (b) exact solution, (c) perturbed solution (uncertain initial condition), and (d) Kalman filter solution are shown in the 2D x-y plane. (e) The errors in the perturbed and Kalman filter solutions are plotted (semi-log plot).

5.4 Opportunities and challenges

Herein, we have presented a very simple example of the Kalman filter. More advanced implementations of the Kalman filter are possible for more reliable results in more complex settings. For example, the ensemble Kalman filter has been used to merge 4D flow MRI and CFD simulations [63]. Parameters in 0D and 1D blood flow models have been estimated based on clinical measurements using the ensemble Kalman filter [64, 65] and the unscented Kalman filter [66], which are extensions formulated to deal with large state dimensions and nonlinear system dynamics, respectively.
The ensemble Kalman filter has also been used to estimate the inlet flow waveform in patient-specific arterial network models [67]. A reduced-order unscented Kalman filter has been used in conjunction with fluid-structure interaction (FSI) simulations to estimate aortic aneurysm wall stiffness from wall displacement measurements [68]. Other data assimilation methods such as variational data assimilation could also be used to merge multi-modality hemodynamics data [69]. In general, the family of Kalman filter data assimilation models provides the means to improve the accuracy of computational models by leveraging even imperfect experimental data. There are a growing number of studies in the literature on merging CFD and 4D flow MRI data using data assimilation [70, 71, 63, 72]. PIV is another popular approach in hemodynamics quantification. Multi-modality data assimilation methods that leverage all of these modalities could improve the accuracy and reliability needed in transformative patient-specific hemodynamics modeling. One challenge in Kalman filter modeling is the prior knowledge of the error distribution and magnitude, which might not always be available and may need to be estimated. Additionally, the assumption that errors follow a Gaussian distribution might not always be appropriate.
Discovering low-dimensional behavior in blood flow data enables a simplified understanding of the flow physics and opens up the door for efficient reduced-order and data-driven modeling. Physically, these reduced-order models are motivated by the presence of coherent structures or patterns, which are often observed in even highly complicated fluid flow processes. Proper orthogonal decomposition (POD) has traditionally been used in fluid mechanics for reduced-order modeling of coherent structures [38, 73, 5]. POD guarantees optimal reconstruction of the original dataset from an energetic perspective for a given number of modes. However, this can come at the price of losing physical interpretability due to the associated frequency mixing [74]. Dynamic mode decomposition (DMD) is a more recent technique, which is data-driven (equation-free) and reduces temporal snapshots of input data to a best-fit linear reduced-order model [6, 75, 76, 33]. POD ranks modes by energy (reconstruction optimality), whereas DMD decomposes data into modes that isolate different dynamical features within a dataset.
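As a concrete sketch, POD modes and temporal coefficients can be obtained directly from the SVD of a mean-subtracted snapshot matrix; the two-mode synthetic "flow" below is an illustrative stand-in for real velocity snapshots.

```python
import numpy as np

# Synthetic snapshot matrix: n spatial points, m snapshots built from two
# known spatial structures with time-varying coefficients
n, m = 400, 50
x = np.linspace(0, 2 * np.pi, n)
t = np.linspace(0, 1, m)
X = (np.outer(np.sin(x), np.cos(2 * np.pi * t))
     + 0.3 * np.outer(np.sin(2 * x), np.sin(4 * np.pi * t)))

X_mean = X.mean(axis=1, keepdims=True)
Xc = X - X_mean                        # mean-subtracted data matrix

# POD via SVD: columns of U are spatial modes u_j(x); rows of diag(s) @ Vt
# are the temporal coefficients a_j(t) = sigma_j v_j*
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
a = s[:, None] * Vt

energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.99) + 1)   # modes capturing 99% energy
X_rec = X_mean + U[:, :r] @ a[:r, :]         # rank-r reconstruction
```

Since the synthetic data are built from two structures, two modes capture essentially all of the energy and the rank-2 reconstruction matches the original snapshots to machine precision.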
Generally speaking, modal decomposition methods seek to decompose a system, or dataset, into spatial functions (modes) and their temporal coefficients. That is, if we have a system state given by y(x, t), we wish to obtain a decomposition that separates the spatial and temporal features by

y(x, t) ≈ Σ_{j=1}^{m} a_j(t) u_j(x),  (21)

where u_j(x) are spatial functions, with time-varying coefficients given by a_j(t). There are several methods to obtain such a decomposition from data, each of which has certain desirable features. If we wish to obtain a decomposition that is most efficient in the sense that the difference between the left and right sides of equation 21 is minimized, then we can utilize a singular value decomposition (i.e., PCA), as described in section 2.2, utilizing the optimality property of SVD that minimizes the reconstruction error (Eq. 6). Here, the spatial modes u_j correspond to the left singular vectors of the data matrix, with the temporal functions given by the scaled right singular vectors, a_j(t) = σ_j v_j*. In this context, this is called a proper orthogonal decomposition. As the name suggests, this decomposition also gives spatial modes that are orthogonal (from the properties of SVD).

In practice, if the size of a snapshot is much larger than the number of snapshots (i.e., n ≫ m), it can be computationally cheaper to compute this decomposition by first finding v_j from the eigendecomposition given in Eq. 5, from which the POD modes can subsequently be found from

u_j = σ_j⁻¹ X̃ v_j.  (22)

This method for computing POD was first proposed in [77], and is the approach employed in the examples considered here.

POD is not the only choice of decomposition for Eq. 21. We might also want to obtain a decomposition where the coefficients a_j(t) possess certain properties (note that they are also orthogonal functions in the POD).
In particular, if we have

a_j(t) = exp(λ_j t),  (23)

then each spatial mode can be associated with a characteristic growth/decay rate and frequency of oscillation in time, given by the real and imaginary components of λ_j, respectively. Such a decomposition can be obtained via dynamic mode decomposition (DMD). For DMD, it is assumed that the data is time-resolved, with a fixed timestep, Δt. The fundamental idea underlying DMD is to find a linear operator A, which maps the system one timestep into the future, the eigenvectors and eigenvalues of which are the DMD modes and eigenvalues. Letting X_1 and X_2 denote the data matrix with the last and first column (snapshot) removed, respectively, the linear operator A is given by

A = X_2 X_1⁺,  (24)

where + denotes the pseudoinverse. In practice, the n × n matrix A is typically not computed directly, as it is computationally easier to work in a lower-dimensional space obtained via an SVD of the data. A typical algorithm to compute DMD involves the following steps:

1. Compute the (optionally truncated to rank r) SVD X_1 = UΣV* ≈ U_r Σ_r V_r*.
2. Compute A_r = U_r* X_2 V_r Σ_r⁻¹, and find its eigenvalues and eigenvectors satisfying A_r w_j = μ_j w_j.
3. Compute DMD modes φ_j = U_r w_j corresponding to each continuous-time eigenvalue λ_j, where μ_j = exp(λ_j Δt).

With this approach, the number of DMD modes (or equivalently, the number of terms in the sum in Eq. 21) is controlled by the truncation parameter, r. In practice, alternative approaches to choose a smaller number of modes can include modifying the basis of projection (e.g., [78, 79]), choosing a smaller subset of modes to use from those that are computed (e.g., [80, 81, 82]), or performing a balanced truncation on the resulting linear model (e.g., [83]). There are numerous further variants of DMD that extend its applicability and usefulness for a wider variety of systems, incorporating, for example, external inputs [84], nonlinear observables [85], and noisy data [86, 87, 88]. For a more comprehensive discussion of these variants, see [83, 15, 35].

Problem statement:
Given spatiotemporally resolved velocity data in a 2D cerebral aneurysm model, compare the dominant DMD and POD modes.
Problem solution:
Transient CFD simulations are performed using the open-source finite-element solver FEniCS [89]. An idealized 2D cerebral aneurysm model is considered with the geometric dimensions shown in Fig. 5. The incompressible Navier-Stokes equations are solved with µ = 0.04 P and ρ = 1.06 g/cm³. The pulsatile inlet flow waveform is shown in Fig. 5. The simulations are run for three cardiac cycles, and the last cycle is used in the data analysis. The mesh consists of 15.2K triangular elements. The first four dominant POD and DMD modes are shown in Fig. 5. The dominant DMD modes are selected using the sparsity-promoting algorithm given in [80]. The first dominant mode in both methods is very similar. POD picks up more complex structures in the other modes, whereas the DMD modes seem to be influenced by the dominant vortex in the aneurysm.
Figure 5: The first four proper orthogonal decomposition (POD) and dynamic mode decomposition (DMD) modes are shown in an idealized aneurysm model with the shown geometric dimensions. The pulsatile waveform used as the inlet boundary condition is shown.

6.4 Opportunities and challenges
Modal analysis techniques such as POD and DMD facilitate flow physics analysis by breaking down the flow evolution into several coherent structures (modes) ranked based on their energy (POD) or dynamics (DMD). These reduced-order models could also be used to reduce computational costs in simulations. However, accurate long-term extrapolation beyond the training data used in deriving these models remains a challenge. POD has been used in characterizing turbulent blood flow [90, 91], and DMD has been used for quantitative modal analysis of blood flow physics [92, 39]. The pulsatile nature of blood flow and variability in flow patterns during different phases of the cardiac cycle challenge the direct application of these techniques. Multistage DMD with control has been proposed to overcome these challenges in blood flow physics analysis with DMD [39].
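For reference, the SVD-based DMD algorithm described above can be sketched in a few lines; the synthetic dataset (two spatial structures oscillating and decaying with known continuous-time eigenvalues −0.1 ± 2i) is an illustrative assumption.

```python
import numpy as np

def dmd(X, r):
    """Exact DMD of snapshot matrix X with rank-r truncation."""
    X1, X2 = X[:, :-1], X[:, 1:]                 # shifted snapshot matrices
    U, s, Vt = np.linalg.svd(X1, full_matrices=False)
    Ur, sr, Vr = U[:, :r], s[:r], Vt[:r, :].conj().T
    Ar = Ur.conj().T @ X2 @ Vr / sr              # A_r = U_r* X2 V_r S_r^{-1}
    mu, W = np.linalg.eig(Ar)                    # DMD eigenvalues/eigenvectors
    Phi = Ur @ W                                 # DMD modes
    return mu, Phi

# Synthetic time-resolved data with known dynamics
n, m, dt = 200, 100, 0.05
xs = np.linspace(0, 1, n)
t = dt * np.arange(m)
phi1, phi2 = np.sin(np.pi * xs), np.sin(2 * np.pi * xs)
X = (np.outer(phi1, np.exp(-0.1 * t) * np.cos(2 * t))
     + np.outer(phi2, np.exp(-0.1 * t) * np.sin(2 * t)))

mu, Phi = dmd(X, r=2)
lam = np.log(mu) / dt          # continuous-time eigenvalues (growth + frequency)
```

Because the snapshots evolve exactly linearly within a two-dimensional subspace, rank-2 DMD recovers the decay rate (−0.1) and oscillation frequency (±2 rad/s) from the data alone.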
Uncertainty in model parameters is often an inevitable aspect of patient-specific computational hemodynamics simulations. The inlet flow waveform, flow split ratio among the outlets, blood rheology, the 3D geometry, and the vessel wall material properties in FSI simulations are often not precisely known. These uncertainties may call into question the fidelity of even high-resolution and minimally dissipative CFD solvers once the results are interpreted in the context of personalized medicine. On the other hand, in-vivo flow measurement techniques such as 4D flow MRI do not suffer from such limitations; however, they are limited in resolution and produce noisy data. Machine learning reduced-order models provide a framework where one can construct a library of high-resolution computational simulations by varying the uncertain parameters, and couple that with reduced-order modeling and compressed sensing to reconstruct high-resolution data and identify the uncertain parameter using the low-resolution experimental data [93].
First, we perform several high-resolution computational simulations by varying the uncertain parameter p times (β_1, β_2, ⋯, β_p). Each simulation often results in several data snapshots in time. POD (equivalently, PCA) is performed on the dataset created for each parameter. Assume we need to keep m POD modes for each simulation such that we can reconstruct the data with acceptable error (e.g., capturing 99% of the energy). The number m could be different for each set of simulations, depending on the change in data complexity with variation in the uncertain parameter. A library can be constructed as

ψ = ⟨φ_1(x, β_1) ⋯ φ_m(x, β_1)  φ_1(x, β_2) ⋯ φ_m(x, β_2)  ⋯  φ_1(x, β_p) ⋯ φ_m(x, β_p)⟩.  (25)

This library contains a reduced-order representation of all possible dynamics in the system within the range of the considered parameters and may be thought of as a supervised machine learning step. Note that it is possible to perform POD just once on the entire appended dataset as opposed to each β parameter. This approach could still be used for data reconstruction; however, it will make parameter identification less trivial. Next, an underdetermined system of equations is constructed as Ŷ = (Cψ)b, where Ŷ is the low-resolution experimental data, C is a measurement matrix specifying measurement locations, and b is a vector to be determined. To find the solution, the following compressed sensing problem is solved (see Sec. 4)

min_b ‖b‖_1  s.t.  Ŷ = (Cψ)b.  (26)

To enable compressed sensing, the measurement locations used (C) should be randomly selected. As we discuss in Example 2 below, relaxing the compressed sensing optimization problem can perform better in parameter identification:

min_b ‖b‖_1  s.t.  ‖Ŷ − (Cψ)b‖_2 < ε,  (27)

where ε is a tolerance.
Once the above optimization problem is solved, the high-resolution data are reconstructed as Y = ψb.

Thus far, we have not specified the locations of measurements (as defined by C). Additionally, we have assumed knowledge of a basis in which our data is sparse, but have not prespecified which basis elements will have nonzero coefficients (i.e., which terms in b are nonzero). Here, following the work of [13], we show that favorable sensor/measurement locations can readily be found given knowledge of such basis elements. From section 2.2, a low-dimensional set of functions that capture most of the original data may be obtained from the columns of U_r in the truncated SVD

ψ = UΣV* ≈ U_r Σ_r V_r*,  (28)

where U_r and V_r denote the first r columns of U and V, and Σ_r contains the first r diagonal entries of Σ. Under the assumption that ψ ≈ U_r U_r* ψ, we can, in theory, reconstruct the high-resolution data from r measurement locations by solving the (no longer underdetermined) equation

Ŷ = CU_r (U_r* ψ) = CU_r a,  (29)

where the r-dimensional vector of coefficients a = U_r* ψ. In principle, equation 29 can be solved for a for any choice of sensor locations C that give linearly independent equations, by using the pseudoinverse

a = (CU_r)⁺ Ŷ.  (30)

However, some choices of C can result in these equations being ill-conditioned, which means in particular that noisy measurements Ŷ can give wildly inaccurate reconstructions of the full data. One method to select measurement locations, as described in [13], is to choose sensor locations in a manner that results in CU_r having a small condition number. This can be achieved via the pivoted QR decomposition

U_rᵀ Cᵀ = QR,  (31)

where the permutations in Cᵀ ensure that the diagonal elements of R are ordered from largest to smallest. In the examples below, we call this approach optimal sampling. Similar data reconstruction methods are used in the empirical interpolation methods described in [11, 12, 94].
To summarize, application of this method involves:

1. Obtain a set of modes/functions U_r that sufficiently capture the high-fidelity data (e.g., via an SVD).
2. Find optimal measurement locations (i.e., where to collect the low-dimensional data) using a pivoted QR decomposition of U_r (equation 31).
3. Find the mode coefficients a from equation 30.
4. Reconstruct the high-fidelity data via Y = U_r a.

Problem statement:
Given a few clean or noisy velocity samples from the Womersley flow profile, is it possible to reconstruct the entire velocity profile without knowing the Womersley number?
Problem solution:
For demonstration purposes, we consider a wide range of Womersley numbers α (1–29) based on a fixed pressure gradient waveform (reported in [39]) to construct a library of Womersley velocity profiles based on Womersley's analytical solution [1]. The data are based on 150 radial locations and 100 time-steps over one cardiac cycle. In this example, in constructing the library, we apply POD once to the entire data. The reference data (groundtruth) is considered to be α = 14.7. To test how the method works under noise, noisy data are created by adding random Gaussian noise with zero mean and a standard deviation of 10% of the maximum velocity. Given only 5 random and optimal sample points along the radial direction at one time-step, we are interested in reconstructing the velocity profile. The results are shown in Fig. 6. Optimal sensor placement can reconstruct the true data with high accuracy (ℓ₂-norm error of 0.03 and 0.002 for noisy and clean data, respectively). Data reconstruction with random sensor placement has difficulty in reconstructing the data (ℓ₂-norm error of 0.45 and 0.35 for noisy and clean data, respectively). More sampling points are required for successful reconstruction with random sampling, and the optimal sensor placement method is more robust to noise.

Figure 6: The groundtruth velocity profile is shown with optimal (left) and random (right) sampling. For each case, the reconstructed velocity profile under clean and noisy data is shown.

7.4 Example 2: Reconstruct high-resolution cerebral aneurysm flow with uncertain viscosity

Problem statement:
Patient-specific variation in blood viscosity is common in patients with hemorheological disorders (e.g., diabetes) [95]. Given sparse (low-resolution) sampling of noisy velocity data in a 2D cerebral aneurysm flow with an unknown viscosity, is it possible to identify the viscosity?
Problem solution:
The 2D cerebral aneurysm model in Sec. 6 is considered. Simulations are performed for 8 different dynamic viscosity values uniformly distributed between µ = 0.03–0.2 P. Spatiotemporal velocity data are collected for all µ values. We consider µ = 0.2 P as the groundtruth data. Additionally, we assume we only have one temporal snapshot inside the cardiac cycle available for the measured data. In this example, we consider 150 random data samples (sensors) to collect 2D velocity vector data. Noisy data are created by adding Gaussian noise with zero mean and a standard deviation of 5% of the velocity data at each location. POD is performed for each parameter and appended to build the library ψ, keeping 12 modes for each parameter. The relaxed version of the compressed sensing problem (Eq. 27) is solved to identify the active modes in the library and reconstruct the data. Parameter identification is done based on the dominant active modes in the library. The resulting active modes are shown in Fig. 7. The dominant active mode belongs to the set of modes where µ = 0.2 P, and therefore the method correctly identifies the unknown viscosity. Most of the other modes are close to zero, in accordance with the sparse regression model. The reconstructed velocity patterns are very close to the original data. In general, further investigation showed that parameter identification could be achieved with fewer sensors, whereas accurate data reconstruction required more sensors. For example, with even 30 sensors the algorithm was capable of correctly identifying the unknown viscosity (with reduced reconstruction accuracy). Another important observation is the sensitivity of the compressed sensing algorithm to noise. In the presence of noise, the relaxed version of the optimization problem (Eq. 27) with a relatively high value of ε was required for convergence (herein, ε was set to 25% of the maximum velocity).
In the example considered here, an increase in ε increased the reconstruction error as expected but did not affect parameter identification.

The machine learning ROM framework is capable of merging high-resolution numerical data (with an uncertain parameter) with low-resolution experimental data. A promising application is overcoming limitations in patient-specific CFD and 4D flow MRI characterization of hemodynamics. In CFD modeling, we can generate high-resolution data but often have to face uncertainty in parameters and assumptions. On the other hand, in-vivo 4D flow MRI directly measures blood flow inside the body (without uncertainty in parameters); however, it is subject to imaging artifacts and noise. The machine learning ROM approach could provide a method to overcome these limitations. A similar framework has been used to merge CFD and 4D flow MRI where the authors assumed uncertainty in the CFD inlet flow waveform [70]. The machine learning ROM method could be extended to scenarios where multiple uncertain parameters are present. One challenge in merging CFD and 4D flow MRI data is the discrepancy in geometries during image segmentation and 3D model creation, which could be thought of as another uncertain parameter (the wall location where the no-slip boundary condition is imposed). It might be possible to use the machine learning framework in conjunction with non-uniform rational basis spline (NURBS) surfaces (e.g., isogeometric finite element analysis [96]) to have control over a set of parameterized geometries. Another challenge is the construction of the training library, which is the computationally expensive part of the model. Greedy algorithms that can appropriately identify the set of parameters in the library construction process could mitigate this difficulty [97].

Figure 7: The 2D aneurysm model and inlet boundary condition for CFD simulations are shown. The velocity streamlines for the original and reconstructed data are shown. The time-point of the original data is marked with a red dot in the waveform. The mode coefficients in the reduced-order model library are plotted. The method correctly identifies the dominant mode as being associated with the set of µ = 0.2 P modes. The measurement sensors were randomly placed.

In the previous sections, to reconstruct high-resolution data, we needed prior knowledge of a basis where the data was sparse. Such knowledge might not always be available. However, if the data matrix (Eq. 1) is low-rank, then the full data could be recovered with partial data sampling using low-rank matrix recovery [27]. A low-rank matrix implies that most of the columns of the matrix (herein, temporal snapshots) are not linearly independent. In other words, the total degrees of freedom of the matrix are much smaller than the total number of entries; therefore, we may hope to be able to reconstruct the full data with partial observation. In complex cardiovascular flows, given a large number of spatial data points (number of rows) and the complexity of the flow, perfectly correlated columns might not be observed. For instance, noise in the data, due to either measurement or numerical errors, will cause the data matrix to be full rank. Nevertheless, a high correlation in temporal snapshots is often observed in hemodynamics data. As shown below, the low-rank matrix recovery method can also work under these scenarios.
Therefore, given low-resolution random spatial and temporal sampling of hemodynamics data, we might be able to reconstruct the full data.

8.2 Model theory and background
Consider the data matrix X in Eq. 1. Assume we only observe a random subset of entries (Ω) in the matrix (random spatial and temporal sampling of data). Our goal is to reconstruct the entire matrix X from such observations, assuming that X is low-rank. Namely,

min rank(X)  s.t.  Y = M_Ω(X),  (32)

where M_Ω is a masking operator that selects the entries of X that have been observed as Y. This optimization problem is computationally very expensive to solve (NP-hard). However, the problem can be relaxed to a convex optimization problem

min ‖X‖_*  s.t.  Y = M_Ω(X),  (33)

where ‖·‖_* is the nuclear norm, in a similar manner to the "convexification" employed for robust PCA in section 3.2. This can be solved using the singular value shrinkage algorithm [27].

Problem statement:
Given low-resolution spatiotemporally random sampling of blood flow velocity data in a cerebral aneurysm, is it possible to recover the high-resolution data?
Problem solution:
Consider the pulsatile blood flow problem in the 2D idealized cerebral aneurysm problem in Sec. 6.3. We collect the velocity data in the aneurysmal region (boxed region in Fig. 8) and stack the spatiotemporal velocity data into the data matrix X similar to Eq. 1. The collected data include vectorial velocity data at 3943 spatial locations and 49 time-steps throughout the cardiac cycle. Using the low-rank matrix recovery method, we reconstruct the full data given only 10% and 40% random spatiotemporal sampling of the data matrix. The results for two points are shown in Fig. 8. We observe that 40% sampling can perfectly capture the velocity outside the aneurysm, and with mostly low error for the point inside the aneurysm. With 10% sampling, the algorithm can still perform well for the point outside of the aneurysm, with errors near the peak flow rate, whereas the performance for the point inside the aneurysm, where more complex flow is present, is not as good.

Matrix completion algorithms provide a promising tool to complete the data when random measurements are missing and a high correlation exists within the data. Similar ideas have been used in reconstructing statistics in turbulent flows [98]. In the context of cardiovascular flows, the data are often correlated, and therefore we may use matrix completion algorithms to increase the spatiotemporal resolution. Interestingly, we observed that the algorithm can successfully complete the data matrix even if the matrix was full-rank, but exhibited low-rank behavior once the threshold used in defining the matrix rank was relaxed. One challenge in such methods is the requirement that the incomplete data sampling should be random (in space and time). However, it might be possible to draw random samples from a uniform sampling to be able to use such methods. Of course, such random samplings need to be carefully devised to avoid accuracy loss.
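A minimal sketch of the recovery step: alternate between a rank projection and re-imposing the observed entries. Here a hard singular-value truncation with a known rank stands in for the soft singular-value shrinkage step (which would instead replace s with max(s − τ, 0)); the matrix sizes, rank, and sampling fraction are illustrative assumptions, not the aneurysm dataset.

```python
import numpy as np

def complete_matrix(Y, mask, rank, iters=200):
    """Fill missing entries of a low-rank matrix by alternating projections:
    truncate to the given rank, then re-impose the observed entries."""
    X = np.where(mask, Y, 0.0)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]   # rank projection
        X[mask] = Y[mask]                             # keep observed data
    return X

rng = np.random.default_rng(3)
n1, n2, rank = 80, 50, 3
M = rng.normal(size=(n1, rank)) @ rng.normal(size=(rank, n2))  # low-rank truth
mask = rng.random((n1, n2)) < 0.4        # observe 40% of entries at random
M_rec = complete_matrix(np.where(mask, M, 0.0), mask, rank)
rel_err = np.linalg.norm(M_rec - M) / np.linalg.norm(M)
```

With 40% random sampling of a rank-3 matrix, the number of observed entries far exceeds the degrees of freedom and the iteration converges to the missing values.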
Figure 8: Velocity data are reconstructed in a 2D cerebral aneurysm model from random spatiotemporal sampling of the data and using low-rank matrix recovery. The boxed region shows the region of interest where the data analysis is performed. The data are reconstructed at point A (inside the aneurysm) and point B (inside the parent artery) using 10% and 40% random spatiotemporal sampling. The velocity results are plotted versus time, where the red line is the reconstructed data and the blue line is the original data.
Extracting analytical governing equations from large datasets has the potential to simplify our physical understanding of dynamical systems and facilitate modeling. Sparse identification of nonlinear dynamics (SINDy) is a recent paradigm that enables the discovery of governing equations from data [7]. The central assumption is that the governing equations can be selected sparsely given an appropriately chosen set of candidate functions.
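As a minimal illustration of this idea, the sketch below identifies a toy two-state linear system from sampled states and (noise-free) derivatives using sequential thresholded least squares; the dynamics, candidate library, and threshold are illustrative assumptions, not the thrombosis model considered later in this section.

```python
import numpy as np

def sindy(X, Xdot, theta_fns, threshold=0.05, iters=10):
    """Sequential thresholded least squares (STLS) for SINDy."""
    Theta = np.column_stack([f(X) for f in theta_fns])
    Xi = np.linalg.lstsq(Theta, Xdot, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for k in range(Xdot.shape[1]):            # refit the active terms
            big = ~small[:, k]
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], Xdot[:, k],
                                             rcond=None)[0]
    return Xi

# Toy dynamics (assumed example): xdot = -0.1 x + 2 y, ydot = -2 x - 0.1 y
rng = np.random.default_rng(4)
X = rng.uniform(-2, 2, size=(500, 2))
x, y = X[:, 0], X[:, 1]
Xdot = np.column_stack([-0.1 * x + 2 * y, -2 * x - 0.1 * y])

# Candidate library Theta: constant, linear, and quadratic terms
theta_fns = [lambda X: np.ones(len(X)), lambda X: X[:, 0], lambda X: X[:, 1],
             lambda X: X[:, 0]**2, lambda X: X[:, 0] * X[:, 1],
             lambda X: X[:, 1]**2]
Xi = sindy(X, Xdot, theta_fns)
```

The thresholding step zeros out the inactive quadratic and constant candidates, so the recovered coefficient matrix contains exactly the linear terms of the assumed dynamics.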
Consider a nonlinear dynamical system
$$\dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}), \qquad (34)$$
where $\mathbf{x}(t)$ is the state of the system (e.g., particle position in fluid flow) as a function of time and $\mathbf{f}$ is the dynamics of the system that we would like to approximate analytically and sparsely using few functions. We consider a library of candidate functions
$$\Theta(\mathbf{X}) = \begin{bmatrix} | & | & | & | & & | & \\ 1 & \mathbf{X} & \mathbf{X}^{P_2} & \mathbf{X}^{P_3} & \cdots & \sin(\mathbf{X}) & \cdots \\ | & | & | & | & & | & \end{bmatrix}, \qquad (35)$$
where $\mathbf{X}^{P_2}$ is the set of quadratic nonlinear functions combining the state vector (e.g., $x_1(t)^2$, $x_1(t)x_2(t)$, ...) and so forth. Subsequently, we represent our dynamical system (Eq. 34) with these functions
$$\dot{\mathbf{X}}(t) = \Theta(\mathbf{X})\,\Xi, \qquad (36)$$
where each column $\zeta_k$ of the matrix $\Xi$ is a sparse vector of coefficients that determines the active functions in $\Theta$. To find $\zeta_k$ and therefore the active functions, a convex optimization problem such as the following can be solved
$$\min_{\zeta_k} \; \| \dot{\mathbf{x}}_k(t) - \Theta(\mathbf{X})\,\zeta_k \|_2 + \lambda \| \zeta_k \|_1, \qquad (37)$$
where the $\lambda$ term promotes sparsity in the selected coefficients. In practice, the identification of the coefficients $\zeta_k$ can instead be solved using a sequential thresholded least-squares algorithm to improve performance under noisy data [7]. Finally, the governing equation for each state $x_k$ could be written as
$$\dot{x}_k(t) = \Theta(\mathbf{X})\,\zeta_k. \qquad (38)$$

Problem statement:
Modeling the fluid mechanics of thrombosis (blood clot formation) requires solving large systems of advection-diffusion-reaction equations. It is highly desirable to identify reduced-order thrombosis models from data to simplify thrombosis simulations [99]. Given the temporal evolution of the prominent biochemicals involved in thrombosis, is it possible to identify the governing equations for the reaction kinetics?
Problem solution:
We consider the reduced-order thrombosis model of Papadopoulos [100, 101] shown in Fig. 9. The system of ordinary differential equations models the biochemical reaction kinetics involved between thrombin (IIa), prothrombin (II), activated platelets (AP), and resting platelets (RP). First, we eliminate the resting platelets from the system by noting that [AP] + [RP] is constant. This is an essential step since the SINDy algorithm performs poorly when some variables are highly correlated, due to Θ becoming ill-conditioned. The reduced Papadopoulos model is solved to generate biochemical trajectories. In practice, these trajectories come from experimental data or high-dimensional thrombosis models, and the goal is to identify the governing equations for the reaction kinetics based on these trajectories. In constructing the library Θ, the set of all linear and quadratic functions (based on state variables) is considered. Given the current model, considering a broader set of linearly independent functions did not affect the results. The original model, the generated trajectory, and the identified equations and trajectory from SINDy are shown in Fig. 9. Using the solution trajectories, SINDy exactly captures the active terms and coefficients in the governing equations.

[Figure 9 content. Papadopoulos thrombosis model:
d[IIa]/dt = -k_in[IIa] + (k_surf + k_IIAP[AP])[II]
d[II]/dt = -(k_surf + k_IIAP[AP])[II]
d[AP]/dt = k_APAP[AP][RP] + k_APIIa[RP]
d[RP]/dt = -k_APAP[AP][RP] - k_APIIa[RP]
Reduced Papadopoulos thrombosis model, with [AP] + [RP] = C constant:
d[IIa]/dt = -k_in[IIa] + (k_surf + k_IIAP[AP])[II]
d[II]/dt = -(k_surf + k_IIAP[AP])[II]
d[AP]/dt = k_APAP[AP](C - [AP]) + k_APIIa(C - [AP])
With C = 0.004, SINDy identifies the active terms and coefficients, e.g., [IIa]: -0.0262 (in d[IIa]/dt), [II]: -7.2230e-06 and [II][AP]: -0.5250 (in d[II]/dt), [AP]: -0.0018 and [AP][AP]: -0.0524 (in d[AP]/dt), with all other library terms zero.]
Figure 9: The SINDy algorithm is applied to the Papadopoulos thrombosis model to identify the equations from data. First, a reduced version of the Papadopoulos model is derived by eliminating the resting platelets [RP] using [AP] + [RP] = C, where C = 0.004 is determined from the initial conditions ([IIa] = 0, [II] = 9.509e-5, [AP] = 0, [RP] = 0.004). The constructed trajectories ([IIa](t), [II](t), and [AP](t)) are used to identify the original governing equations using the sparse regression in SINDy.

Problem statement:
Given particle trajectories in an unsteady 3D vortical flow, is it possible to estimate the time-dependent velocity vector field using SINDy?
Problem solution:
We consider a modified version of the unsteady Hill's spherical vortex in Sec. 5.3 where the 3D velocity field is given by
$$\begin{aligned} u(x,y,z,t) &= 0.75 + 0.25\cos(4\omega t) - x^2 - 2y^2 - 2z^2 + 2y\sin(2\omega t), \\ v(x,y,z,t) &= xy, \\ w(x,y,z,t) &= x\left(z + 0.5\sin(2\omega t)\right), \end{aligned} \qquad (39)$$
where ω = π. We simulate particle transport for 8 seconds under this flow for a particle with the initial position (x₀, y₀, z₀) = (0.02, 0.05, 0.01). Given the resulting trajectory (x(t), y(t), z(t)), we would like to find an analytical representation for the velocity field. Here, we use SINDy with control (SINDYc) [102] as a simple extension of the SINDy algorithm to be able to estimate time-dependent dynamical systems. For the library Θ, the set of all linear and quadratic functions in x, y, z as well as sin(jωt), cos(jωt), x_i sin(jωt), and x_i cos(jωt) are considered, where j = 1, 2, ..., 5 and x_i = x, y, z. For simplicity, we have assumed prior knowledge about ω; however, it is possible to consider a wide range of frequencies by expanding the library. The results are shown in Fig. 10. Comparing the active velocity terms and coefficients with Eq. 39, we see that the correct terms are accurately predicted.
[Figure 10 content: SINDYc-identified terms. For ẋ: the constant, cos(4ωt), and y·sin(2ωt) terms are active, with coefficients xx = -1.000, yy = -2.000, zz = -2.000; for ẏ: only xy is active; for ż: xz and x·sin(2ωt) are active; all other library terms are zero.]
Figure 10: The SINDy with control (SINDYc) algorithm is applied to a particle trajectory generated from an unsteady version of Hill's spherical vortex. The particle trajectory is plotted and the terms identified for the velocity vector field from SINDYc are listed. SINDYc exactly captures the correct terms.
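The sequential thresholded least-squares regression that drives both examples can be sketched on a toy system with known dynamics; the system, library, and function names below are illustrative assumptions, not the paper's code.

```python
import numpy as np

# Sequential thresholded least squares (the regression at the heart of
# SINDy) demonstrated on a toy system with known dynamics
#   x1' = -2 x1,   x2' = x1 * x2
rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
x1, x2 = X[:, 0], X[:, 1]
dXdt = np.column_stack([-2.0 * x1, x1 * x2])          # exact derivatives

# library columns: [1, x1, x2, x1^2, x1*x2, x2^2]
Theta = np.column_stack([np.ones(200), x1, x2, x1**2, x1 * x2, x2**2])

def stls(Theta, dXdt, threshold=0.05, n_iter=10):
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]  # initial least squares
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0                               # prune small terms
        for k in range(dXdt.shape[1]):                # refit surviving terms
            big = ~small[:, k]
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], dXdt[:, k],
                                             rcond=None)[0]
    return Xi

Xi = stls(Theta, dXdt)
# Xi recovers -2 in the x1 row of the x1' column and 1 in the x1*x2 row
# of the x2' column; all other entries are pruned to zero.
```

With noiseless data the first least-squares solve is already exact, and thresholding simply removes the numerically tiny spurious coefficients, which is why SINDy recovers the active terms exactly in Figs. 9 and 10.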
We have demonstrated two examples of SINDy that could be used in cardiovascular flow modeling. Systems biology models (systems of ordinary differential equations governing biological reaction kinetics) are an essential part of multiscale mechanobiology models of disease growth [103, 104]. SINDy provides a framework to derive such models from experimental data or to identify reduced-order systems biology models from higher-order models. Identifying analytical velocity fields from particle trajectories is another example. Of course, this is expected to be a challenging task in complex cardiovascular flows. We have shown an example based on an idealized vortex flow. It remains to be investigated whether similar models could be derived from other vortex-dominated flows such as blood flow in the left ventricle. One challenge in the application of SINDy is that the matrix Θ becomes ill-conditioned once the trajectories are highly correlated, which will compromise the algorithm results. A reweighted ℓ₁-regularized least-squares technique has been proposed to overcome this issue [105]. Another challenge is in building the library of candidate functions. In the examples considered, omitting an important active term from the library resulted in a dense solution where most terms became active. It should be noted that depending on the complexity of the system studied, it may not always be possible to represent a dynamical system sparsely. Finally, SINDy has been extended to partial differential equations (PDEs) to identify active terms and coefficients in spatiotemporal PDE models [106].

In this section, we briefly discuss some of the recent trends in data-driven modeling and data science and their potential in cardiovascular flow modeling.
Machine learning and neural networks
Neural networks are a powerful class of machine learning techniques and may be thought of as approximators of complex nonlinear functions. A popular application in cardiovascular research is learning clinical outcomes based on different input parameters derived from patient-specific simulations [107, 108]. We may think of this as an advanced multi-parameter regression framework where we have several input parameters, different clinical outcomes, and large databases, and we are interested in learning/fitting a statistical relationship between the data, which is achieved through the learning process. Another application area is learning models from complex large datasets. An active area of research related to this is learning turbulence models based on direct numerical simulation (DNS) databases [109]. Given the unique physics of turbulence in cardiovascular flows [110, 111], learning customized Reynolds-averaged Navier-Stokes (RANS) or large-eddy simulation (LES) models based on high-fidelity DNS data of turbulent pulsatile blood flow seems to be a promising area of future research. A common question for machine learning is whether it can replace physics-based cardiovascular flow simulations and therefore provide a very fast patient-specific estimation of hemodynamics once the one-time expensive training process is completed. Currently, this paradigm has been successfully used to estimate CFD-based fractional flow reserve (FFR) in coronary artery stenosis [112]. However, FFR is a single variable that measures the reduction in flow rate (pressure drop) due to a stenosis. Using a similar paradigm for estimating variables that exhibit spatial and temporal variation (e.g., velocity or wall shear stress) [113] is a much more involved process, in which extremely large datasets are required, and it is not clear if it could be achieved in the near future.
In fact, even large datasets are likely not "large" enough when it comes to systems that are highly sensitive to variations in input data [114]. The recent physics-informed neural networks (PINN) paradigm [115] considers the governing equations in the learning process and could ultimately reduce some of the costs associated with traditional CFD modeling, such as mesh creation. PINN has been used for an example inverse modeling problem in cardiovascular flows where one is interested in estimating the velocity field based on concentration data [116]. This approach seems to pave the way to a new in-vivo framework for measuring velocity based on time-resolved concentration measured using medical imaging (e.g., dynamic contrast-enhanced imaging).
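The "physics loss" idea behind PINN can be sketched as penalizing the residual of the governing equations at collocation points. In the minimal illustration below, the 1D heat equation stands in for the flow equations and an exact solution plays the role of the candidate field, so the residual is near zero; in an actual PINN the derivatives come from automatic differentiation of the network, not finite differences, and the assumed grid and solution here are illustrative only.

```python
import numpy as np

# Physics residual of a candidate field for the 1D heat equation
# u_t = u_xx, evaluated at interior collocation points. The candidate
# u(x, t) = exp(-t) sin(x) is the exact solution, so the mean squared
# residual (the "physics loss") should be near zero.
x = np.linspace(0.0, np.pi, 201)
t = np.linspace(0.0, 1.0, 201)
dx, dt = x[1] - x[0], t[1] - t[0]
T, Xg = np.meshgrid(t, x, indexing="ij")
u = np.exp(-T) * np.sin(Xg)

u_t = np.gradient(u, dt, axis=0)                            # du/dt
u_xx = np.gradient(np.gradient(u, dx, axis=1), dx, axis=1)  # d2u/dx2
residual = u_t - u_xx
physics_loss = np.mean(residual[1:-1, 1:-1] ** 2)           # interior points
```

In PINN training this residual term is added to the data-misfit loss, which is how concentration measurements and the governing equations are combined in the inverse problem described above.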
Parametric reduced-order models
The reduced-order models that we discussed in Sec. 6 did not consider parameter variation. Namely, we approximated them given spatiotemporal velocity data for a fixed parameter. Sometimes in computational modeling, we are interested in a range of parameters. Reduced-order modeling tools such as Galerkin-POD learn a projected space and basis, which they use to evolve the solution and therefore provide a computationally efficient prediction of the solution. These methods are trained based on fixed parameters and therefore cannot be used when parameters in the model change. Parametric reduced-order models have been developed to address this challenge [117, 97]. These reduced-order modeling techniques could be valuable in multiscale modeling of cardiovascular disease growth, where parameters in the blood flow or structural mechanics models can change during disease growth simulation.
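One simple flavor of parametric reduced-order modeling can be sketched as follows: extract a POD basis from snapshots at training parameter values and interpolate the POD coefficients to an unseen parameter. The snapshot family, parameter values, and interpolation strategy below are illustrative assumptions, not a method from the references.

```python
import numpy as np

# Minimal parametric ROM sketch: POD basis from training snapshots plus
# interpolation of POD coefficients across the parameter mu. The synthetic
# snapshot family u(x; mu) stands in for parameterized flow solutions.
x = np.linspace(0.0, 1.0, 100)
mus_train = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
S = np.stack([np.exp(-mu * x) * np.sin(2 * np.pi * x) for mu in mus_train],
             axis=1)                               # snapshot matrix (space x mu)

U, s, Vt = np.linalg.svd(S, full_matrices=False)
r = 3
Phi = U[:, :r]                                     # POD basis
A = Phi.T @ S                                      # training POD coefficients

mu_test = 1.25                                     # unseen parameter value
a = np.array([np.interp(mu_test, mus_train, A[k]) for k in range(r)])
u_rom = Phi @ a                                    # ROM prediction
u_true = np.exp(-mu_test * x) * np.sin(2 * np.pi * x)
rel_err = np.linalg.norm(u_rom - u_true) / np.linalg.norm(u_true)
```

Because the snapshot family varies smoothly with the parameter, the interpolated-coefficient prediction stays within a few percent of the true field; more sophisticated parametric ROMs replace this interpolation with certified reduced-basis or regression approaches [117, 97].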
Data-driven multiscale modeling
Multiscale modeling of cardiovascular disease is challenging as such models need to connect different models across different spatial and temporal scales. Data-driven modeling and machine learning have the potential to transform multiscale modeling of cardiovascular disease [118]. Reduced-order models can be used to accelerate computationally expensive multiscale models. We may use the data generated from expensive multiscale simulations or experimental observations to derive a simplified approximation of the underlying physics (e.g., using SINDy). Complete collection of biological data across multiple scales represents a major challenge in calibrating parameters in multiscale models. Developing data-driven multiscale models that can leverage sparsity or are robust to corrupt and incomplete data is a promising topic of future research. In prior sections, we discussed data-driven modeling with multi-modality data. Developing multiscale models that can appropriately leverage multi-modality data is another interesting topic [119]. Deep neural networks provide a promising framework for multiphysics and multiscale modeling [120]. Combining data-driven modeling and machine learning with physics-based simulations provides a hybrid modeling strategy [121] with high potential in multiscale modeling of biological systems [118]. Advancements in each of these fields and their coupling will produce predictive digital twins [122] that could facilitate treatment planning and patient management, and ultimately transform personalized cardiovascular medicine [123].
11 Conclusion
Recent advances in computational power, data science, and the abundance of data are starting to revolutionize different fields of science and engineering. Data-driven cardiovascular flow modeling is currently in its infancy, yet there are various opportunities to develop customized data-driven models that can transform cardiovascular biomechanics research. Multi-modality and multi-fidelity data assimilation, overcoming poor data quality, parameter identification, super-resolution flow reconstruction, and reduced-order modeling are some examples. We hope that this manuscript inspires researchers to develop data-driven models that can address outstanding challenges in the field of cardiovascular biomechanics.
Conflict of interest
The authors declare no conflict of interest.
Acknowledgement
A. A. acknowledges funding from NSF OAC grant No. 1947559. The authors would like to thank Milad Habibi for assistance in generating the POD and DMD data in Sec. 6 and Dr. Roshan M. D'Souza for fruitful discussions related to the topic of the manuscript. We are also thankful to Dr. Steven Brunton and Dr. Nathan Kutz for permission to use their codes in several examples in this manuscript.
Data Availability
The codes and data used to generate the results in the manuscript will be made publicly available after peer review.

References

[1] J. R. Womersley. Method for the calculation of velocity, rate of flow and viscous drag in arteries when the pressure gradient is known.
The Journal of Physiology , 127(3):553–563,1955.[2] C. A. Taylor, T. A. Fonte, and J. K. Min. Computational fluid dynamics applied to cardiaccomputed tomography for noninvasive quantification of fractional flow reserve: scientific basis.
Journal of the American College of Cardiology , 61(22):2233–2241, 2013.[3] K. Pearson. On lines and planes of closest fit to systems of points in space.
The London,Edinburgh, and Dublin Philosophical Magazine and Journal of Science , 2(11):559–572, 1901.[4] J. L. Lumley. The structure of inhomogeneous turbulent flows. In A. M. Yaglam and V. I.Tatarsky, editors,
Proceedings of the International Colloquium on the Fine Scale Structureof the Atmosphere and its Influence on Radio Wave Propagation . Doklady Akademii NaukSSSR, Moscow, Nauka, 1967.[5] P. Holmes, J. L. Lumley, G. Berkooz, and C. W. Rowley.
Turbulence, coherent structures, dynamical systems and symmetry. Cambridge University Press, 2012.
[6] P. J. Schmid. Dynamic mode decomposition of numerical and experimental data.
Journal ofFluid Mechanics , 656:5–28, 2010.[7] S. L. Brunton, J. L. Proctor, and J. N. Kutz. Discovering governing equations from data bysparse identification of nonlinear dynamical systems.
Proceedings of the National Academy of Sciences, 113(15):3932–3937, 2016.
[8] E. J. Candès. Compressive sampling. In Proceedings of the International Congress of Mathematicians, volume 3, pages 1433–1452. Madrid, Spain, 2006.
[9] D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.
[10] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, 2006.
[11] M. Barrault, Y. Maday, N. C. Nguyen, and A. T. Patera. An 'empirical interpolation' method: application to efficient reduced-basis discretization of partial differential equations.
ComptesRendus Mathematique , 339(9):667–672, 2004.[12] S. Chaturantabut and D. C. Sorensen. Nonlinear model reduction via discrete empiricalinterpolation.
SIAM Journal on Scientific Computing , 32(5):2737–2764, 2010.[13] K. Manohar, B. W. Brunton, J. N. Kutz, and S. L. Brunton. Data-driven sparse sensorplacement for reconstruction: Demonstrating the benefits of exploiting known patterns.
IEEEControl Systems Magazine , 38(3):63–86, 2018.[14] S. L. Brunton, B. R. Noack, and P. Koumoutsakos. Machine learning for fluid mechanics.
Annual Review of Fluid Mechanics, 52:477–508, 2020.
[15] K. Taira, S. L. Brunton, S. T. M. Dawson, C. W. Rowley, T. Colonius, B. J. McKeon, O. T. Schmidt, S. Gordeyev, V. Theofilis, and L. S. Ukeiley. Modal analysis of fluid flows: An overview.
AIAA Journal , pages 4013–4041, 2017.[16] M. Udell and A. Townsend. Why are big data matrices approximately low rank?
SIAMJournal on Mathematics of Data Science , 1(1):144–160, 2019.[17] G. Strang.
Linear algebra and learning from data . Wellesley-Cambridge Press, 2019.[18] S. Boyd and L. Vandenberghe.
Introduction to applied linear algebra: vectors, matrices, andleast squares . Cambridge university press, 2018.[19] L. N. Trefethen and D. Bau III.
Numerical linear algebra , volume 50. Siam, 1997.[20] P. Bruce, A. Bruce, and P. Gedeck.
Practical Statistics for Data Scientists: 50+ EssentialConcepts Using R and Python . O’Reilly Media, 2020.[21] M. C. Meyer.
Probability and Mathematical Statistics: Theory, Applications, and Practice inR , volume 162. SIAM, 2019.[22] S. P Boyd and L. Vandenberghe.
Convex optimization . Cambridge university press, 2004.[23] L. Perko.
Differential equations and dynamical systems . Springer Science & Business Media,2013.[24] S. H. Strogatz.
Nonlinear dynamics and chaos: With applications to physics, biology, chem-istry, and engineering . CRC press, 2018.[25] M. Asch, M. Bocquet, and M. Nodet.
Data assimilation: methods, algorithms, and applica-tions . SIAM, 2016.[26] K. Law, A. Stuart, and K. Zygalakis. Data assimilation: A mathematical introduction.Technical report, 2015.[27] A. Majumdar.
Compressed sensing for engineers . CRC Press, 2018.[28] I. Rish and G. Grabarnik.
Sparse modeling: theory, algorithms, and applications . CRC press,2014.[29] A. Burkov.
The hundred-page machine learning book , volume 1. Andriy Burkov Quebec City,Can., 2019.[30] G. James, D. Witten, T. Hastie, and R. Tibshirani.
An introduction to statistical learning ,volume 112. Springer, 2013.[31] L. Eld´en.
Matrix methods in data mining and pattern recognition . SIAM, 2007.[32] A. Quarteroni, A. Manzoni, and F. Negri.
Reduced basis methods for partial differentialequations: an introduction , volume 92. Springer, 2015.[33] J. N. Kutz, S. L. Brunton, B. W. Brunton, and J. L. Proctor.
Dynamic mode decomposition: data-driven modeling of complex systems. SIAM, 2016.
[34] J. N. Kutz.
Data-driven modeling & scientific computation: methods for complex systems &big data . Oxford University Press, 2013.[35] S. L. Brunton and J. N. Kutz.
Data-driven science and engineering: Machine learning,dynamical systems, and control . Cambridge University Press, 2019.[36] S. Wold, K. Esbensen, and P. Geladi. Principal component analysis.
Chemometrics andIntelligent Laboratory Systems , 2(1-3):37–52, 1987.[37] H. Hotelling. Analysis of a complex of statistical variables into principal components.
Journalof Educational Psychology , 24(6):417, 1933.[38] J. L. Lumley.
Stochastic tools in turbulence . Courier Dover Publications, 2007.[39] M. Habibi, S.T.M. Dawson, and A. Arzani. Data-driven pulsatile blood flow physics withdynamic mode decomposition.
Fluids , 5(3):111, 2020.[40] A. Arzani, G. Y. Suh, R. L. Dalman, and S. C. Shadden. A longitudinal comparison of hemo-dynamics and intraluminal thrombus deposition in abdominal aortic aneurysms.
AmericanJournal of Physiology-Heart and Circulatory Physiology , 307(12):H1786–H1795, 2014.[41] A. Arzani and S. C. Shadden. Characterization of the transport topology in patient-specificabdominal aortic aneurysm models.
Physics of Fluids , 24(8):1901, 2012.[42] C. Karmonik, S. Partovi, M. Loebe, B. Schmack, A. Weymann, A. B. Lumsden, M. Karck, andA. Ruhparwar. Computational fluid dynamics in patients with continuous-flow left ventricularassist device support show hemodynamic alterations in the ascending aorta.
The Journal ofThoracic and Cardiovascular Surgery , 147(4):1326–1333, 2014.[43] T. Passerini, L. M. Sangalli, S. Vantini, M. Piccinelli, S. Bacigaluppi, L. Antiga, E. Boccardi,P. Secchi, and A. Veneziani. An integrated statistical investigation of internal carotid arter-ies of patients affected by cerebral aneurysms.
Cardiovascular Engineering and Technology ,3(1):26–40, 2012.[44] F. Mut, S. Wright, G. A. Ascoli, and J. R. Cebral. Morphometric, geographic, and territo-rial characterization of brain arterial trees.
International Journal for Numerical Methods inBiomedical Engineering , 30(7):755–766, 2014.[45] D. S. Molony, J. Park, L. Zhou, C. C. Fleischer, H. Y. Sun, X. P. Hu, J. N. Oshinski,H. Samady, D. P. Giddens, and A. Rezvan. Bulk flow and near wall hemodynamics of therabbit aortic arch and descending thoracic aorta: A 4D PC-MRI derived computational fluiddynamics study.
Journal of Biomechanical Engineering , 141(1), 2019.[46] F. Cosentino, G. M. Raffa, G. Gentile, V. Agnese, D. Bellavia, M. Pilato, and S. Pasta.Statistical shape analysis of ascending thoracic aortic aneurysm: Correlation between shapeand biomechanical descriptors.
Journal of Personalized Medicine , 10(2):28, 2020.[47] C. G. Fonseca, M. Backhaus, D. A. Bluemke, R. D. Britten, J. D. Chung, et al. The cardiacatlas project–an imaging database for computational modeling and statistical atlases of theheart.
Bioinformatics, 27(16):2288–2295, 2011.
[48] K. Valen-Sendstad, K. A. Mardal, M. Mortensen, B. A. P. Reif, and H. P. Langtangen. Direct numerical simulation of transitional flow in a patient-specific intracranial aneurysm.
Journalof Biomechanics , 44(16):2826–2832, 2011.[49] E. J. Cand`es, X. Li, Y. Ma, and J. Wright. Robust principal component analysis?
Journalof the ACM (JACM) , 58(3):1–37, 2011.[50] I. Scherl, B. Strom, J. K. Shang, O. Williams, B. L. Polagye, and S. L. Brunton. Robustprincipal component analysis for modal decomposition of corrupt fluid flows.
Physical ReviewFluids , 5(5):054401, 2020.[51] Z. Lin, M. Chen, and Y. Ma. The augmented lagrange multiplier method for exact recoveryof corrupted low-rank matrices. arXiv preprint arXiv:1009.5055 , 2010.[52] R. Tibshirani. Regression shrinkage and selection via the lasso.
Journal of the Royal StatisticalSociety: Series B (Methodological) , 58(1):267–288, 1996.[53] A. Arzani. Accounting for residence-time in blood rheology models: do we really need non-newtonian blood flow modelling in large arteries?
Journal of The Royal Society Interface ,15(146):20180486, 2018.[54] C. E. Shannon. A mathematical theory of communication.
The Bell system technical journal ,27(3):379–423, 1948.[55] G. H. Chen, J. Tang, and S. Leng. Prior image constrained compressed sensing (PICCS): amethod to accurately reconstruct dynamic CT images from highly undersampled projectiondata sets.
Medical physics , 35(2):660–663, 2008.[56] H. Jung, K. Sung, K. S. Nayak, E. Y. Kim, and J. C. Ye. k-t FOCUSS: a general com-pressed sensing framework for high resolution dynamic MRI.
Magnetic Resonance in Medicine ,61(1):103–116, 2009.[57] L. E. Ma, M. Markl, K. Chow, H. Huh, C. Forman, A. Vali, A. Greiser, J. Carr, S. Schnell, A. J.Barker, and N. Jin. Aortic 4D flow MRI in 2 minutes using compressed sensing, respiratorycontrolled adaptive k-space reordering, and inline reconstruction.
Magnetic Resonance inMedicine , 2019.[58] R. Iba˜nez, E. Abisset-Chavanne, E. Cueto, A. Ammar, J. L. Duval, and F. Chinesta. Someapplications of compressed sensing in computational mechanics: model order reduction, mani-fold learning, data-driven applications and nonlinear dimensionality reduction.
ComputationalMechanics , 64(5):1259–1271, 2019.[59] E. J. Cand`es and M. B. Wakin. An introduction to compressive sampling.
IEEE signalprocessing magazine , 25(2):21–30, 2008.[60] G. Evensen.
Data assimilation: the ensemble Kalman filter . Springer Science & BusinessMedia, 2009.[61] A. Falahatpisheh, G. Pedrizzetti, and A. Kheradvar. Three-dimensional reconstruction ofcardiac flows based on multi-planar velocity fields.
Experiments in Fluids, 55(11):1848, 2014.
[62] A. Falahatpisheh and A. Kheradvar. A measure of axisymmetry for vortex rings.
EuropeanJournal of Mechanics-B/Fluids , 49:264–271, 2015.[63] F. Gaidzik, D. Stucht, C. Roloff, O. Speck, D. Th´evenin, and G. Janiga. Transient flowprediction in an idealized aneurysm geometry using data assimilation.
Computers in Biologyand Medicine , 115:103507, 2019.[64] K. DeVault, P. A. Gremaud, V. Novak, M. S. Olufsen, G. Vernieres, and P. Zhao. Bloodflow in the circle of Willis: modeling and calibration.
Multiscale Modeling & Simulation ,7(2):888–909, 2008.[65] D. Canuto, J. L. Pantoja, J. Han, E. P. Dutson, and J. D. Eldredge. An ensemble Kalmanfilter approach to parameter estimation for patient-specific cardiovascular flow modeling.
The-oretical and Computational Fluid Dynamics , pages 1–24, 2020.[66] S. Pant, B. Fabr`eges, J. F. Gerbeau, and I. E. Vignon-Clementel. A methodological paradigmfor patient-specific multi-scale CFD simulations: from clinical measurements to parameterestimates for individual analysis.
International Journal for Numerical Methods in BiomedicalEngineering , 30(12):1614–1648, 2014.[67] A. Arnold, C. Battista, D. Bia, Y. Z. German, R. L. Armentano, H. Tran, and M. S.Olufsen. Uncertainty quantification in a patient-specific one-dimensional arterial networkmodel: EnKF-based inflow estimator.
Journal of Verification, Validation and UncertaintyQuantification , 2(1), 2017.[68] C. Bertoglio, P. Moireau, and J. F. Gerbeau. Sequential parameter estimation for fluid–structure problems: Application to hemodynamics.
International Journal for NumericalMethods in Biomedical Engineering , 28(4):434–455, 2012.[69] S. W. Funke, M. Nordaas, Ø. Evju, M. S. Alnæs, and K. A. Mardal. Variational data assim-ilation for transient blood flow simulations: Cerebral aneurysms as an illustrative example.
International Journal for Numerical Methods in Biomedical Engineering , 35(1):e3152, 2019.[70] M. F. Fathi, A. Bakhshinejad, A. Baghaie, D. Saloner, R. H. Sacho, V. L. Rayz, and R. M.D’Souza. Denoising and spatial resolution enhancement of 4D flow MRI using proper orthog-onal decomposition and lasso regularization.
Computerized Medical Imaging and Graphics ,70:165–172, 2018.[71] T. S. Koltukluo˘glu and P. J. Blanco. Boundary control in computational haemodynamics.
Journal of Fluid Mechanics , 847:329–364, 2018.[72] E. Ferdian, A. Suinesiaputra, D. J. Dubowitz, D. Zhao, A. Wang, B. Cowan, and A. A.Young. 4DFlowNet: Super-resolution 4D Flow MRI using deep learning and computationalfluid dynamics.
Frontiers in Physics , 8:138, 2020.[73] G. Berkooz, P. Holmes, and J. L. Lumley. The proper orthogonal decomposition in the analysisof turbulent flows.
Annual Review of Fluid Mechanics , 25(1):539–575, 1993.[74] B. R. Noack. From snapshots to modal expansions–bridging low residuals and pure frequencies.
Journal of Fluid Mechanics, 802:1–4, 2016.
[75] C. W. Rowley, I. Mezić, S. Bagheri, P. Schlatter, and D. S. Henningson. Spectral analysis of nonlinear flows.
Journal of Fluid Mechanics , 641:115–127, 2009.[76] J. H. Tu, Clarence W. Rowley, D. M. Luchtenburg, S. L. Brunton, and J. N. Kutz. Ondynamic mode decomposition: Theory and applications.
Journal of Computational Dynamics ,1(2):391–421, 2014.[77] L. Sirovich. Turbulence and the dynamics of coherent structures, parts I–III.
Quarterly ofApplied Mathematics , XLV(3):561–590, October 1987.[78] A. Wynn, D. Pearson, B. Ganapathisubramani, and P. Goulart. Optimal mode decompositionfor unsteady flows.
Journal of Fluid Mechanics , 733:473–503, 2013.[79] P. Sashittal and D. J. Bodony. Reduced-order control using low-rank dynamic mode decom-position.
Theoretical and Computational Fluid Dynamics , 33(6):603–623, 2019.[80] M. R. Jovanovi´c, P. J. Schmid, and J. W. Nichols. Sparsity-promoting dynamic mode decom-position.
Physics of Fluids , 26(2):024103, 2014.[81] Z. Drmac, I. Mezic, and R. Mohr. Data driven modal decompositions: analysis and enhance-ments.
SIAM Journal on Scientific Computing , 40(4):A2253–A2285, 2018.[82] H. Zhang, S. T. M. Dawson, C. W. Rowley, E. A. Deem, and L. N. Cattafesta. Evaluating theaccuracy of the dynamic mode decomposition.
Journal of Computational Dynamics , 7(1):35,2020.[83] C. W. Rowley and S. T. M. Dawson. Model reduction for flow analysis and control.
AnnualReview of Fluid Mechanics , 49:387–417, 2017.[84] J. Proctor, S. Brunton, and J. Kutz. Dynamic mode decomposition with control.
SIAMJournal on Applied Dynamical Systems , 15(1):142–161, 2016.[85] M. Williams, I. Kevrekidis, and C. Rowley. A data–driven approximation of the Koopmanoperator: Extending dynamic mode decomposition.
Journal of Nonlinear Science , 25(6):1307–1346, 2015.[86] S. T. M. Dawson, M. S. Hemati, M. O. Williams, and C. W. Rowley. Characterizing andcorrecting for the effect of sensor noise in the dynamic mode decomposition.
Experiments inFluids , 57(3):42, 2016.[87] M. S. Hemati, C. W. Rowley, E. A. Deem, and L. N. Cattafesta. De-biasing the dynamicmode decomposition for applied Koopman spectral analysis of noisy datasets.
Theoretical andComputational Fluid Dynamics , 31(4):349–368, 2017.[88] T. Askham and J. N. Kutz. Variable projection methods for an optimized dynamic modedecomposition.
SIAM Journal on Applied Dynamical Systems , 17(1):380–416, 2018.[89] A. Logg, K. A. Mardal, and G. Wells.
Automated solution of differential equations by thefinite element method , volume 84. Springer, Berlin, Heidelberg, 2012.[90] L. Grinberg, A. Yakhot, and G. Karniadakis. Analyzing transient turbulence in a stenosedcarotid artery by proper orthogonal decomposition.
Annals of Biomedical Engineering, 37(11):2200–2217, 2009.
[91] S. Kefayati and T. L. Poepping. Transitional flow analysis in the carotid artery bifurcation by proper orthogonal decomposition and particle image velocimetry.
Medical Engineering &Physics , 35(7):898–909, 2013.[92] G. Di Labbio and L. Kadem. Reduced-order modeling of left ventricular flow subject to aorticvalve regurgitation.
Physics of Fluids , 31(3):031901, 2019.[93] I. Bright, G. Lin, and J. N. Kutz. Compressive sensing based machine learning strategy forcharacterizing the flow around a cylinder with limited pressure measurements.
Physics ofFluids , 25(12):127102, 2013.[94] Z. Drmac and S. Gugercin. A new selection operator for the discrete empirical interpola-tion method—improved a priori error bound and extensions.
SIAM Journal on ScientificComputing , 38(2):A631–A648, 2016.[95] Y. I. Cho, M. P. Mooney, and D. J. Cho. Hemorheological disorders in diabetes mellitus.
Journal of Diabetes Science and Technology , 2(6):1130–1138, 2008.[96] T. J. R. Hughes, J. A. Cottrell, and Y. Bazilevs. Isogeometric analysis: CAD, finite elements,NURBS, exact geometry and mesh refinement.
Computer Methods in Applied Mechanics andEngineering , 194(39-41):4135–4195, 2005.[97] J. S. Hesthaven, G. Rozza, and B. Stamm.
Certified reduced basis methods for parametrizedpartial differential equations , volume 590. Springer, 2016.[98] A. Zare, T. T. Georgiou, and M. R. Jovanovi´c. Stochastic dynamical modeling of turbulentflows.
Annual Review of Control, Robotics, and Autonomous Systems , 3:195–219, 2020.[99] K. B. Hansen and S. C. Shadden. Automated reduction of blood coagulation models.
Inter-national Journal for Numerical Methods in Biomedical Engineering , 35(10):e3220, 2019.[100] K. P. Papadopoulos, M. Gavaises, and C. Atkin. A simplified mathematical model for throm-bin generation.
Medical Engineering & Physics , 36(2):196–204, 2014.[101] K. Papadopoulos.
Flow effect on thrombus formation in stenosed coronary arteries: a com-putational study . PhD thesis, City University London, 2015.[102] S. L. Brunton, J. L. Proctor, and J. N. Kutz. Sparse identification of nonlinear dynamics withcontrol (SINDYc).
IFAC-PapersOnLine , 49(18):710–715, 2016.[103] A. Arzani, K. S. Masters, and M. R. K. Mofrad. Multiscale systems biology model of calcificaortic valve disease progression.
ACS Biomaterials Science & Engineering , 3(11):2922–2933,2017.[104] V. D. Sree and A. B. Tepole. Computational systems mechanobiology of growth and remod-eling: Integration of tissue mechanics and cell regulatory network dynamics.
Current Opinionin Biomedical Engineering , 15:75–80, 2020.[105] A. Cortiella, K. C. Park, and A. Doostan. Sparse identification of nonlinear dynamical systemsvia reweighted (cid:96) -regularized least squares. arXiv preprint arXiv:2005.13232 , 2020.[106] S. H. Rudy, S. L. Brunton, J. L. Proctor, and J. N. Kutz. Data-driven discovery of partialdifferential equations. Science Advances , 3(4):e1602614, 2017.p. 37107] F. J. Detmer, D. L¨uckehe, F. Mut, M. Slawski, S. Hirsch, P. Bijlenga, G. von Voigt, andJ. R. Cebral. Comparison of statistical learning approaches for cerebral aneurysm ruptureassessment.
International Journal of Computer Assisted Radiology and Surgery , 15(1):141–150, 2020.[108] Z. Jiang, H. N. Do, J. Choi, W. Lee, and S. Baek. A deep learning approach to predictabdominal aortic aneurysm expansion using longitudinal data.
Frontiers in Physics , 2020.[109] K. Duraisamy, G. Iaccarino, and H. Xiao. Turbulence modeling in the age of data.
AnnualReview of Fluid Mechanics , 51:357–377, 2019.[110] L. Antiga and D. A. Steinman. Rethinking turbulence in blood.
Biorheology , 46(2):77–81,2009.[111] D. Xu, A. Varshney, X. Ma, B. Song, M. Riedl, M. Avila, and B. Hof. Nonlinear hydrodynamicinstability and turbulence in pulsatile flow.
Proceedings of the National Academy of Sciences ,117(21):11233–11239, 2020.[112] A. Coenen, Y. H. Kim, M. Kruk, C. Tesche, et al. Diagnostic accuracy of a machine-learningapproach to coronary computed tomographic angiography–based fractional flow reserve: resultfrom the MACHINE consortium.
Circulation: Cardiovascular Imaging , 11(6):e007217, 2018.[113] A. Sarrami-Foroushani, T. Lassila, J. M. Pozo, A. Gooya, and A. F. Frangi. Direct estimationof wall shear stress from aneurysmal morphology: A statistical approach. In
InternationalConference on Medical Image Computing and Computer-Assisted Intervention , pages 201–209.Springer, 2016.[114] S. Succi and P. V. Coveney. Big data: the end of the scientific method?
PhilosophicalTransactions of the Royal Society A , 377(2142):20180145, 2019.[115] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deeplearning framework for solving forward and inverse problems involving nonlinear partial dif-ferential equations.
Journal of Computational Physics , 378:686–707, 2019.[116] M. Raissi, A. Yazdani, and G. E. Karniadakis. Hidden fluid mechanics: Learning velocity andpressure fields from flow visualizations.
Science , 367(6481):1026–1030, 2020.[117] P. Benner, S. Gugercin, and K. Willcox. A survey of projection-based model reduction meth-ods for parametric dynamical systems.
SIAM Review , 57(4):483–531, 2015.[118] G. C. Y. Peng, M. Alber, A. B. Tepole, W. R. Cannon, S. De, S. Dura-Bernal, K. Garikipati,G. Karniadakis, W. W. Lytton, P. Perdikaris, L. Petzold, and E. Kuhl. Multiscale model-ing meets machine learning: What can we learn?
Archives of Computational Methods inEngineering , pages 1–21, 2020.[119] P. Perdikaris, M. Raissi, A. Damianou, N. D. Lawrence, and G. E Karniadakis. Nonlinearinformation fusion algorithms for robust multi-fidelity modeling.
Proceedings of the RoyalSociety A: Mathematical, Physical, and Engineering Sciences , 473:2198, 2016.[120] S. Cai, Z. Wang, L. Lu, T. A. Zaki, and G. E. Karniadakis. DeepM&Mnet: Inferring theelectroconvection multiphysics fields based on operator approximation by neural networks. arXiv preprint arXiv:2009.12935 , 2020. p. 38121] L. von Rueden, S. Mayer, R. Sifa, C. Bauckhage, and J. Garcke. Combining machine learningand simulation to a hybrid modelling approach: Current and future directions. In
Interna-tional Symposium on Intelligent Data Analysis , pages 548–560. Springer, 2020.[122] M. G. Kapteyn and K. E. Willcox. From physics-based models to predictive digital twins viainterpretable machine learning. arXiv preprint arXiv:2004.11356 , 2020.[123] D. R. Hose, P. V. Lawford, W. Huberts, L. R. Hellevik, S. W. Omholt, and F. N. van deVosse. Cardiovascular models for personalised medicine: Where now and where next?