[PDF] Biologically-informed neural networks guide mechanistic modeling from sparse experimental data

Abstract

Biologically-informed neural networks (BINNs), an extension of physics-informed neural networks [1], are introduced and used to discover the underlying dynamics of biological systems from sparse experimental data. In the present work, BINNs are trained in a supervised learning framework to approximate in vitro cell biology assay experiments while respecting a generalized form of the governing reaction-diffusion partial differential equation (PDE). By allowing the diffusion and reaction terms to be multilayer perceptrons (MLPs), the nonlinear forms of these terms can be learned while simultaneously converging to the solution of the governing PDE. Further, the trained MLPs are used to guide the selection of biologically interpretable mechanistic forms of the PDE terms which provides new insights into the biological and physical mechanisms that govern the dynamics of the observed system. The method is evaluated on sparse real-world data from wound healing assays with varying initial cell densities [2].

Full PDF

BBiologically-informed neural networks guide mechanisticmodeling from sparse experimental data

John H. Lagergren , John T. Nardini , Ruth E. Baker , Matthew J. Simpson ,Kevin B. Flores Department of Mathematics, North Carolina State University, Raleigh, NorthCarolina, USA Center for Research and Scientific Computation, North Carolina State University,Raleigh, North Carolina, USA Statistical and Applied Mathematical Sciences Institute, Durham, North Carolina,USA Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK School of Mathematical Sciences, Queensland University of Technology, Brisbane,Queensland 4001, Australia*[email protected], [email protected]

Abstract

Biologically-informed neural networks (BINNs), an extension of physics-informed neuralnetworks [1], are introduced and used to discover the underlying dynamics of biologicalsystems from sparse experimental data. In the present work, BINNs are trained in asupervised learning framework to approximate in vitro cell biology assay experimentswhile respecting a generalized form of the governing reaction-diffusion partialdifferential equation (PDE). By allowing the diffusion and reaction terms to bemultilayer perceptrons (MLPs), the nonlinear forms of these terms can be learned whilesimultaneously converging to the solution of the governing PDE. Further, the trainedMLPs are used to guide the selection of biologically interpretable mechanistic forms ofthe PDE terms which provides new insights into the biological and physical mechanismsthat govern the dynamics of the observed system. The method is evaluated on sparsereal-world data from wound healing assays with varying initial cell densities [2].

Author summary

In this work we extend equation learning methods to be feasible for biologicalapplications with nonlinear dynamics and where data are often sparse and noisy.Physics-informed neural networks have recently been shown to approximate solutions ofPDEs from simulated noisy data while simultaneously optimizing the PDE parameters.However, the success of this method requires the correct specification of the governingPDE, which may not be known in practice. Here, we present an extension of thealgorithm that allows neural networks to learn the nonlinear terms of the governingsystem without the need to specify the mechanistic form of the PDE. Our method isdemonstrated on real-world biological data from scratch assay experiments and used todiscover a previously unconsidered biological mechanism that describes delayedpopulation response to the scratch.May 28, 2020 1/34 a r X i v : . [ q - b i o . Q M ] M a y ntroduction Collective migration refers to the coordinated migration of a group of individuals [3, 4].This process arises in a variety of biological and social contexts, including pedestriandynamics [5], tumor progression [6], and animal development [7]. In the presence ofmany individuals, differential equation models provide a flexible framework toinvestigate collective behavior as a continuum [8–12]. A challenge for mathematiciansand scientists is to use mathematical models together with spatiotemporal data ofcollective migration to validate assumptions about the underlying physical andbiological laws that govern the observed dynamics. Several factors contribute to thedifficulty of this task, even for simple systems/data, some of which include biologicalforms and levels of noise in the observation process, poor understanding of theunderlying dynamics, a large number of candidate mathematical models,implementation of computationally expensive numerical solvers, etc. This work providesa data-driven tool which can alleviate many of these problems by enabling the rapiddevelopment and validation of mathematical models from sparse noisy data. Themethodology is demonstrated using a case study of scratch assay experiments.Scratch assays are a widely adopted experiment in cellular biology used to studycollective cell migration in vitro as cell populations re-colonize empty spatial regions.These experiments have been used previously to observe population-wide behavior inmany different contexts, including wound healing [13–16] and cancer progression [17].Mathematical modeling of scratch assays plays an important role in the quantificationand analysis of population dynamics. This is because (i) the equations and parameterscomprising mathematical models are interpretable , providing information about theunderlying physical and biological mechanics that drive the observed system, and (ii)when properly calibrated, they are generalizable , affording the ability to make accuratepredictions beyond the data set used for calibration.Reaction-diffusion partial differential equations (PDEs) are frequently used to modelscratch assay experiments [2, 8, 11, 18, 19]. The general one-dimensionalreaction-diffusion equation that describes the rate of change of a quantity of interest u ( x, t ) (e.g. cell density) is u t = ( D u x ) x + G u, x ∈ [ x , x f ] , t ∈ [ t , t f ] , (1)in which the rate of change of u (i.e. u t ) is a function of diffusion, modeled by thefunction D , and reaction or growth, modeled by the function G . Note that D and G depend on the application, and choosing the correct/optimal mechanistic models forthese terms is the focus of many current research efforts and remains an open question.The classical Fisher–Kolmogorov–Petrovsky–Piskunov (FKPP) equation is areaction-diffusion equation that has been used to model a wide spectrum of growth andtransport of biological processes. In particular, the FKPP model assumes a scalardiffusivity function D = D and logistic growth function G = r (1 − u/K ) with intrinsicgrowth rate r and carrying capacity K [19, 20]. Variants of the reaction-diffusionequation have also been used to account for different types of cell interactions duringscratch assay experiments. For example, the nonlinear diffusivity function D = 1 − α / + 3 α ( u/K − / ) with cell-to-cell adhesion coefficient α was used tomodel dynamics in which neighboring cells prevent other cells from migrating [21].Alternatively, a diffusivity function of the form D = D (1 + α ( u/K ) ) can be used tomodel dynamics in which cells promote the migration of others [11]. Additional variantsof reaction-diffusion equation models have captured cell migration in the presence ofgrowth factors [22], during melanoma progression [17], and in response to different drugtreatments [18].A recent study quantitatively investigated the role of initial cell density byconducting a suite of scratch assay experiments on PC-3 prostate cancer cells withMay 28, 2020 2/34ystematically varying initial cell densities [2]. The experimental data was used tocalibrate the FKPP equation as well as a variant model known as the GeneralizedPorous-FKPP equation, which assumes that diffusivity increases with cell density u byusing a diffusivity function D = D ( u/K ) m with diffusion coefficient D , carryingcapacity K , and exponent m . Like the FKPP equation, the growth term is alsodescribed by the logistic growth function G = r (1 − u/K ). While the calibrated modelsapproximated the experimental data well in many cases in [2], the presence ofsystematic biases between the model solutions and experimental data indicate theexistence of additional governing mechanisms that may not be accounted for in thesemathematical models. However, the existence of a large number of possible biophysicalmechanisms that could play a role in scratch assay dynamics makes the testing ofmathematical models against these experimental data computationally challenging.Thereby, this scenario motivates the use of equation learning methods to discover thediffusion and reaction terms directly from the experimental data.Enabled by advances in computing power, algorithms, and the amount of availabledata, the field of equation learning has recently emerged as a powerful tool for theautomated identification of underlying physical laws governing a set of observation data.The basic assumption in this field is that measured data arise from some unknown n -dimensional dynamical system of the form u t = F ( x, t, u, u x , u xx , . . . ; θ ) , x ∈ [ x , x f ] , t ∈ [ t , t f ] , (2)with quantity of interest u = u ( x, t ), parameter vector θ ∈ R k , and appropriate initialand boundary conditions. An example quantity of interest for modeling cell migrationdynamics is the cell density ( cells / mm ) at location x and time t . The measured data { u i,j } M,Ni,j =1 for a set of spatial points x i , i = 1 , . . . , M , and set of time points t j , j = 1 , . . . , N , are assumed to be corrupted by some form of observation error that maybe known or unknown in practice. The goal of equation learning methods is to identifythe closed form of F in Eq (2) directly from the noisy measurements u i,j . Note that, inorder to simulate the learned equation, either the noisy or a denoised version of theinitial condition can be used along with an assumed boundary condition (e.g. no-flux)that describes the biological process generating the data.Two primary sets of methodology have been used in field of equation learning todate: sparse regression [23, 24] and theory-informed neural networks [1, 25]. In thesparse regression framework, numerical methods (e.g. finite differences or polynomialsplines) are used to denoise u and approximate the partial derivatives u t , u x , u xx , etc.from a set of data. The approximations are then used to construct a library of nonlinearcandidate terms (e.g. 1, u , u , u x , . . . , u x u xx , etc.) thought to comprise the governingsystem of ordinary differential equations (ODEs) or PDEs. The data relating u t to allpossible model terms inside the library are formulated as a linear regression problem inwhich sparsity promoting techniques are used to select a small subset of library termsthat produce the most parsimonious model. While the sparse regression framework hasbeen successfully demonstrated to circumvent searching through a combinatorially largespace of possible candidate models, it can require large amounts of training data andthe numerical methods used for denoising and differentiation are not robust tobiologically realistic forms and levels of noise, leading to inaccuracies in both theconstructed library and learned equations [26]. Further, the method assumes theunknown function F in Eq (2) can be written as a linear combination of nonlinearcandidate terms, which may not be true in practice.An alternative approach uses function-approximating deep neural networks, i.e.,multilayer perceptrons (MLPs), as surrogate models u MLP ( x, t ) for the solution of thegoverning dynamical system [1, 25]. In this approach, the assumed mechanistic form of F in Eq (2) is pre-specified and then used as a form of regularization in the neuralMay 28, 2020 3/34etwork objective function. The parameters of F are allowed to be “learnable,”meaning that the parameters of the governing PDE are calibrated while the neuralnetwork is trained to minimize the error between u MLP ( x i , t j ) and the data u i,j . Thismethodology ensures that the neural network solution satisfies the physical lawsdescribed by F while simultaneously fitting the spatiotemporal data. Theory-informedneural networks have been demonstrated with smaller amounts of data in the presenceof noise, however, they have so far only been applied to problems where the governingmechanistic PDE is known a priori .Hybrid approaches that combine neural networks and sparse regression have alsobeen suggested to address some of the issues surrounding the above methods [26, 27]. Inthese approaches, neural networks are used as surrogate models for u ( x, t ) and thenused to construct the library of candidate terms for sparse regression using automaticdifferentiation. These methods have been shown to accurately learn the governingsystem of equations for a variety of reaction-diffusion models from spatiotemporal datawith biologically realistic levels of noise [26].All three approaches (i.e. sparse regression, theory-informed neural networks, andhybrids) however, suffer from the model specification problem , in which the governingODE/PDE model must be specified a priori either explicitly or as a library ofcandidate terms. Thus, (i) if the true dynamical system contains terms that are notincluded in the regularization term for theory-informed neural networks, or (ii) if thetrue terms cannot be represented as a linear combination of nonlinear candidate termsfor sparse regression, then these methods will ultimately fail to recover the true system.Further, detecting this issue when determining what the “true” system is in real-usecases is an open question. Where systems with scalar or linear dynamics may besuitable for these approaches, biological systems pose a particular challenge in thisrespect, since many of the underlying mechanics driving these systems are nonlinear.For example, the Generalized Porous-FKPP model contains a nonlinear diffusivityfunction D = D ( u/K ) m ) with unknown exponent m . These issues help explain why, tothe best of our knowledge, equation learning methods have not yet been successfullyapplied to real-world biological population-level data.In this work, biologically-informed neural networks (BINNs), an extension ofphysics-informed neural networks (PINNs) [1], are presented as a solution to the libraryspecification problem for systems with biological/physical constraints. In thisframework, the right-hand-side function F of the PDE in Eq (2) is assumed to be acombination of biologically relevant terms. For example, the general form ofreaction-diffusion models can be described by the two right-hand-side terms in Eq (1)meaning that the equation learning problem is transformed from learning F to learningthe diffusivity and growth functions D and G . Rather than assigning mechanistic formsto each function as in previous equation learning studies, each function is replaced witha separate neural network. This approach leverages the ability of deep neural networksto approximate continuous functions arbitrarily well [28]. Importantly, the form of eachlearned neural network function can be visualized, thereby enabling a data-driven toolfor user-guided conjecture of new mathematical equations that describe each separateterm in F . Moreover, formulating the equation learning task within the BINNsframework enables the modeler to use domain expertise to include qualitativeconstraints on the parameter networks (e.g. specifying nonlinear functions that arenon-negative, monotone increasing/decreasing, etc.) by selecting appropriate activationfunctions and loss terms for the optimization.While BINNs can be used to discover a wide range of governing equations across thebiological and physical sciences, including systems of ODEs and PDEs, in this workthey are demonstrated using reaction-diffusion PDEs. The BINNs methodology is firsttested using synthetic data and then demonstrated on experimental data from scratchMay 28, 2020 4/34ssay experiments with variable initial cell densities [2]. Notably, each data set is noisyand sparse, containing only five time measurements across 38 spatial locations. BINNsare used to discover the nonlinear forms of the diffusivity function and growth term ofthe governing reaction-diffusion equation. Persistent model discrepancy is used tomotivate the incorporation of a novel delay term which may have importantimplications for the reproducibility and modeling of scratch assays. The learnednonlinear forms of the diffusion, growth, and delay terms are used to guide the selectionof a mechanistic model with biologically interpretable parameters that remove virtuallyall of the model discrepancy. Scratch assay data

Biologically-informed neural networks (BINNs) are evaluated on experimental scratchassay data from [2]. A typical scratch assay involves (i) growing a cell monolayer up tosome desired initial cell density, (ii) creating a “scratch” in the interior of the monolayerto produce an empty region, and (iii) recording longitudinal measurements of the celldensity during re-colonization of the area. One-dimensional cell density profiles areobtained by manually counting the cells within vertical columns of the two-dimensionalimage data. See Fig S1 for a visualization of the experiment. For these data, the celldensity profiles were reported for six varying initial cell density levels (i.e. 10,000,12,000, 14,000, 16,000, 18,000, and 20,000 cells per well). To make the cell densityprofiles compatible with neural network training, the data are pre-processed by rescalingthe x and t variables to the scales of millimeters (mm) and days, respectively (seeMethods Section for more details). Further, the cell density profile at the left boundaryis removed from the data because it was identified as an outlier across each of the sixdata sets. The resulting pre-processed cell densities at 37 spatial points and five timepoints are shown in Fig 1. C e ll d e n s i t y ( c e ll s / mm ) Initial cell density: 10,000 cells per well

Initial cell density: 12,000 cells per well C e ll d e n s i t y ( c e ll s / mm ) Initial cell density: 14,000 cells per well

Initial cell density: 16,000 cells per well C e ll d e n s i t y ( c e ll s / mm ) Initial cell density: 18,000 cells per well

Initial cell density: 20,000 cells per well

Fig 1. Experimental scratch assay data.

Pre-processed cell density profiles fromscratch assay experiments with varying initial cell densities [2]. Each subplotcorresponds to an experiment with a different initial cell density (i.e. 10,000, 12,000,14,000, 16,000, 18,000, and 20,000 cells per well). The cell densities are reported at 37equally-spaced positions and five equally-spaced time points.May 28, 2020 5/34 iologically-informed neural networks

BINNs are centered around a function-approximating deep neural network, or MLP,denoted by u MLP ( x, t ) which acts as a surrogate model that approximates the solutionto the governing equation described by Eq (2) (Fig 2A). In this work, the governingPDE is assumed to contain two terms, D and G , that describe the generalreaction-diffusion model in Eq (1). Since the true forms of the diffusivity and growthfunctions are unknown, they are approximated by neural networks D = D MLP ( u ) and G = G MLP ( u ) (Fig 2B). Both D MLP and G MLP are continuously differentiable functionsthat input the predicted cell density u MLP ( x, t ) and output the corresponding diffusivityor growth value. The advantage of using MLPs for the terms of the governing PDE isthat the nonlinear forms of these terms can be learned without specifying themexplicitly (or as a library of candidate terms), thus circumventing the modelspecification problem. Automatic differentiation (Fig 2C) is used to numericallydifferentiate compositions of u MLP , D MLP , and G MLP in order to construct the generalreaction-diffusion model in Eq (1). The resulting PDE (Fig 2D) is used to regularize u MLP during training so that u MLP not only fits the data u i,j but also satisfies thegoverning reaction-diffusion system. 𝜕𝑢𝜕𝑡 = 𝜕𝜕𝑥 𝐷 ’ 𝜕𝑢𝜕𝑥 + 𝐺 ’ 𝑢 𝑥 𝑡 𝑢+𝑢+ 𝑢 𝐷 𝐺 Biologically-informed neural network Parameter networks

𝐼 𝜕𝑥 𝜕𝑡 𝜕𝑥

Auto. differentiation Governing dynamical system

A B C D

Fig 2. Biologically-informed neural networks for reaction-diffusion models. (A) BINNs are deep neural networks that approximate the solution of a governingdynamical system. (B) By allowing the terms of the dynamical system (e.g. diffusivityfunction D and growth function G ) to be function-approximating deep neural networks,the nonlinear forms of these terms can be learned without the need to specify amechanistic model or library of candidate terms. (C) Automatic differentiation is usedon compositions of the different neural network models (e.g. u , D , and G ) to constructthe PDE that describes governing dynamical system. (D) The governing system is usedin the neural network objective function to jointly learn and satisfy the governing PDEwhile minimizing the error between the network outputs and noisy observations.To ensure that the fit to the data and the fidelity to the governing PDE aresimultaneously optimized, the BINNs are trained with gradient-based methods using thefollowing multi-part objective function: L Total = L GLS + L PDE + L Constr . (3)The first term L GLS concerns the generalized least squares (GLS) distance between u MLP ( x i , t j ) and the corresponding observed data u i,j . The observation process isassumed to be described by a statistical error model of the form u i,j = u ( x i , t j ) + w i,j (cid:12) ε i,j , (4)in which the measured data u i,j are a combination of the underlying dynamical system u ( x i , t j ) and some random variable w i,j (cid:12) ε i,j where (cid:12) represents element-wisemultiplication [29]. In general, the independent and identically distributed (i.i.d.)May 28, 2020 6/34andom variable ε i,j is modeled by an n -dimensional normal distribution with meanzero and variance one that is weighted by w i,j = (cid:2) ω u γ ( x i , t j ) . . . ω n u γn ( x i , t j ) (cid:3) T , (5)for γ ≥ ω , . . . , ω n ∈ R where n is the dimensionality of the system. Note that (i)noiseless data are modeled by letting ω , . . . , ω n = 0, (ii) constant-variance error used inordinary least squares is modeled by letting γ = 0, ω , . . . , ω n = 1, and (iii)non-constant-variance error (e.g. proportional error) used in generalized least squares ismodeled by letting γ > ω , . . . , ω n (cid:54) = 0. Therefore, to account for the statistical errormodel in Eq (4), the GLS objective function L GLS = 1

M N

M,N (cid:88) i =1 ,j =1 (cid:20) u MLP ( x i , t j ) − u i,j | u MLP ( x i , t j ) | γ (cid:21) , (6)is used with proportionality constant γ = 0 .

2. Note that γ was tuned numericallyfollowing the methodology suggested in [26] (see Methods Section for more details).The next term L PDE ensures u MLP satisfies the solution of the governing PDE. Forease of notation, let ˆ u i,j ≡ u MLP ( x i , t j ), ˆ D i,j ≡ D MLP ( u MLP ( x i , t j )), andˆ G i,j ≡ G MLP ( u MLP ( x i , t j )). Then for the reaction-diffusion equation, the error termtakes the following form: L PDE = 1

M N

M,N (cid:88) i =1 ,j =1 (cid:34) ∂ ˆ u i,j ∂t (cid:124) (cid:123)(cid:122) (cid:125) LHS − (cid:18) ∂∂x (cid:18) ˆ D i,j ∂ ˆ u i,j ∂x (cid:19) + ˆ G i,j ˆ u i,j (cid:19)(cid:124) (cid:123)(cid:122) (cid:125) RHS (cid:35) , (7)where LHS and RHS denote the left-hand- and right-hand-sides of the governing PDE,respectively. Thus, by driving L PDE to zero, the RHS is trained to match the LHS.Through this process, the nonlinear forms of D MLP and G MLP are learned despite notbeing directly observed. See the Methods Section for additional implementation details,including a random sampling procedure that enforces this PDE constraint everywhere inthe input domain during training.Biological information and domain expertise are incorporated into the BINNsframework by adding penalties in the loss term L Constr . For the reaction-diffusionequation, the diffusivity and growth rates are assumed to be within biologically feasibleranges [ D min , D max ] and [ G min , G max ], respectively. Further, diffusion is also assumedto be non-decreasing and growth to be non-increasing with respect to cell density. Thecorresponding constraints take the form: L Constr = 1

M N (cid:34)

M,N (cid:88) i =1 ,j =1ˆ DD max (cid:16) ˆ D i,j (cid:17) + M,N (cid:88) i =1 ,j =1 ∂ ˆ D/∂ ˆ u< (cid:32) ∂ ˆ D i,j ∂ ˆ u i,j (cid:33) (8)+ M,N (cid:88) i =1 ,j =1ˆ GG max (cid:16) ˆ G i,j (cid:17) + M,N (cid:88) i =1 ,j =1 ∂ ˆ G/∂ ˆ u> (cid:32) ∂ ˆ G i,j ∂ ˆ u i,j (cid:33) (cid:35) . The maximum and minimum diffusivity and growth rates considered in [2] were used toforce D MLP and G MLP to stay within biologically realistic ranges. The constraints on D MLP and G MLP shown in Eq (8) were used for all computational results in this work.See the Methods Section for biological motivations and numerical implementationdetails of these constraints.May 28, 2020 7/34 valuation procedure

Because the model prediction u MLP ( x, t ) is only a surrogate model for the dynamicalsystem u ( x, t ), it is possible that this approximation may contain errors, particularly inareas where the PDE constraint given by Eq (7) is not satisfied. To ensure that theinferred diffusion and growth terms lead to biologically realistic dynamics, thereaction-diffusion equation given by Eq (1) is solved numerically with a method-of-linesapproach using D = D MLP and G = G MLP . Note that this model is well-defined because D MLP and G MLP are continuously differentiable functions of the cell density, u . Further,BINNs are retrained multiple times for each data set in which the forward simulationusing the learned PDE terms that yields the smallest GLS error (Eq (6)) is saved. Allfits to the data shown in the Results Section are numerical solutions to the PDE inEq (1) using the learned diffusivity and growth functions. See the Methods Section fornumerical implementation details of the PDE forward solver. Results

Simulation case study

Since the diffusivity and growth terms are inferred by BINNs through learning D MLP and G MLP , respectively, the ability of BINNs to learn biologically accuraterepresentations of these terms must first be tested. To investigate this, data weresimulated using the classical FKPP and Generalized Porous-FKPP equations withparameter values from [2] for the scratch assay data with initial cell density 20,000 cellsper well. Additionally, the simulated data were obscured with artificial observation errorusing the statistical model in Eq (4) with γ = 0 .

2. Each simulation used the initialcondition from the scratch assay data with initial cell density 20,000 cells per well.Using the same level of sparsity (i.e. 37 spatial points and five time points), the BINNsframework was shown to (i) approximate the dynamical system accurately and (ii)approximate the general forms of the diffusivity and growth terms. See Fig S2 and FigS3 for the model and parameter fits, respectively. This case study demonstrates thatBINNs are able to learn accurate representations of the diffusivity and growth functionsfrom biologically realistic noisy sparse data, however, further analysis, like modelselection and comparison, is omitted here and instead explored using experimental data.

Reaction-diffusion BINNs for experimental data

As described in the previous sections, the diffusivity and growth functions areapproximated by deep neural networks, D = D MLP ( u ) and G = G MLP ( u ), resulting in agoverning PDE of the form u t = ( D MLP ( u ) u x ) x + G MLP ( u ) u, (9)where D MLP and G MLP are functions of the cell density u . A BINN was trained foreach data set with varying initial cell density. The resulting numerical PDE solutionsusing the trained D MLP and G MLP are shown in Fig 3.While the model fits shown in Fig 3 are excellent for lower initial cell densities, therestill remains a significant amount of model discrepancy at higher initial cell densities.GLS residual errors were computed to provide an additional way of visualizing themodel discrepancy (see Fig S4) in which non-i.i.d. residuals are clearly present at higherinitial cell densities. To investigate the specific form of the model discrepancy, Fig 4shows the learned diffusivity and growth functions with the corresponding model fit forthe data set with an initial cell density of 20,000 cells per well.May 28, 2020 8/34 .00 0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.000500100015002000 C e ll d e n s i t y ( c e ll s / mm ) Initial cell density: 10,000 cells per well

Initial cell density: 12,000 cells per well C e ll d e n s i t y ( c e ll s / mm ) Initial cell density: 14,000 cells per well

Initial cell density: 16,000 cells per well C e ll d e n s i t y ( c e ll s / mm ) Initial cell density: 18,000 cells per well

Initial cell density: 20,000 cells per well

Fig 3. Reaction-diffusion BINN solutions.

Predicted cell density profiles usingBINNs with the governing reaction-diffusion PDE in Eq (9). Each subplot correspondsto an experiment with a different initial cell density (i.e. 10,000, 12,000, 14,000, 16,000,18,000, and 20,000 cells per well). Solid lines represent the numerical solution to Eq (9)using D MLP and G MLP . The markers represent the experimental scratch assay data.

Fig 4. Reaction-diffusion BINN terms and discrepancy.

Left: learneddiffusivity and growth functions, D MLP and G MLP , evaluated over cell density, u . Right:Predicted cell density profiles using BINNs with the governing reaction-diffusion PDEin Eq (9) for data with initial cell density 20,000 cells per well. Solid lines represent thenumerical solution to Eq (9) using D MLP and G MLP . The markers represent theexperimental scratch assay data.Fig 4 reveals clear model discrepancy in two main areas: (i) at high cell densities (i.e. x ∈ [0 , .

25] mm and x ∈ [1 . , .

0] mm for t ∈ [0 ,

1] days) where diffusion is negligibleand the dynamics are governed primarily by growth; and (ii) at low cell densities (i.e. x ∈ [0 . , .

5] mm for t ∈ [0 ,

1] days) where growth is negligible and the dynamics areprimarily governed by diffusion. In particular, the discrepancy is largest for early timepoints where the diffusion and growth dynamics appear too rapid. The solutions of D MLP and G MLP are also qualitatively similar to the classical FKPP equation in whichthe learned diffusivity function is relatively constant while the learned growth functionis approximately linearly decreasing with cell density, u . However, despite D MLP andMay 28, 2020 9/34

MLP learning biologically realistic functions for the diffusivity and growth, thepersistent model discrepancy observed across multiple data sets with high initial celldensities (see Fig 3) suggests that the reaction-diffusion equation described in Eq (9)may be insufficient to fully capture the underlying dynamics of cell migration for thesedata. From a mathematical modeling perspective, the model discrepancy at early timepoints suggests the existence of a time delay that scales the magnitude of thedensity-dependent diffusion and growth rates. Biological reasons behind thisphenomenon may include cell damage from the scratch assay protocol or changes in cellfunctions where more cells become immobile/non-proliferative as the cell densityapproaches carrying capacity [30–32]. See the Discussion Section for more details.

Delay-reaction-diffusion BINNs for experimental data

Motivated by the model discrepancy for data sets with high initial cell density, thereaction-diffusion equation in Eq (9) was modified by including a time delay describedby an additional neural network function T MLP ( t ). The new term T MLP ( t ) is acontinuously differentiable function of time that is constrained to be non-decreasing andoutput values between 0 and 1. In this way T MLP can scale the strength of thedensity-dependent diffusivity and growth terms in time. Letting the diffusivity, D , andgrowth, G , terms of the governing PDE be functions of u and t , they are replaced with D = T MLP ( t ) D MLP ( u ) and G = T MLP ( t ) G MLP ( u ). This results in a governing PDE ofthe form u t = (cid:0) T MLP ( t ) D MLP ( u ) u x (cid:1) x + T MLP ( t ) G MLP ( u ) u, which simplifies to u t = T MLP ( t ) (cid:16) ( D MLP ( u ) u x ) x + G MLP ( u ) u (cid:17) , (10)where D MLP and G MLP are functions of the cell density u and T MLP is a function oftime t . Note that T MLP was chosen to be separable from D MLP and G MLP since thedensity-dependent dynamics of diffusion and growth are assumed to be consistentthroughout time. Further, it was assumed that both D MLP and G MLP are scaled by thesame time delay; see the Discussion Section for more details. BINNs governed by thePDE in Eq (10) were trained for each data set with varying initial cell density. Theresulting forward simulations using the trained T MLP , D MLP , and G MLP networks areshown in Fig 5.The model fits shown in Fig 5 demonstrate that virtually all of the modeldiscrepancy across each initial cell density was removed by including a time delay. Thisis confirmed further using GLS residual errors (see Fig S5) where the residuals areapproximately i.i.d. even at higher initial cell densities. Similar to the reaction-diffusioncase, Fig 6 shows the learned diffusivity, growth, and delay functions with thecorresponding model fit for the data set with initial cell density of 20,000 cells per well.Fig 6 shows that the model discrepancy in areas with high and low cell densities atearly time points has been practically eliminated. This is most clearly seen in thedelay-reaction-diffusion model solution at the second time point (i.e. t = 0 . D MLP and G MLP for thedelay-reaction-diffusion BINN learned similar forms of the diffusivity and growthcompared to the reaction-diffusion case. However, the delay term T MLP reveals that thediffusion and growth dynamics described by D MLP and G MLP are scaled down for earlytime points (i.e. t <

1) before T MLP converges to 1, allowing D MLP and G MLP to comeinto full effect. This observation is of particular importance since the majority ofMay 28, 2020 10/34 .00 0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.000500100015002000 C e ll d e n s i t y ( c e ll s / mm ) Initial cell density: 10,000 cells per well

Initial cell density: 12,000 cells per well C e ll d e n s i t y ( c e ll s / mm ) Initial cell density: 14,000 cells per well

Initial cell density: 16,000 cells per well C e ll d e n s i t y ( c e ll s / mm ) Initial cell density: 18,000 cells per well

Initial cell density: 20,000 cells per well

Fig 5. Delay-reaction-diffusion BINN solutions.

Predicted cell density profilesusing BINNs with the governing delay-reaction-diffusion PDE in Eq (10). Each subplotcorresponds to an experiment with a different initial cell density (i.e. 10,000, 12,000,14,000, 16,000, 18,000, and 20,000 cells per well). Solid lines represent the numericalsolution to Eq (10) using T MLP , D MLP , and G MLP . The markers represent theexperimental scratch assay data.

Fig 6. Delay-reaction-diffusion BINN terms and discrepancy.

Left: learneddiffusivity and growth functions, D MLP and G MLP , evaluated over cell density, u , anddelay function, T MLP , evaluated over time, t . Right: Predicted cell density profiles usingBINNs with the governing delay-reaction-diffusion PDE in Eq (10) for data with initialcell density 20,000 cells per well. Solid lines represent the numerical solution to Eq (10)using D MLP , G MLP , and T MLP . The markers represent the experimental scratch assaydata.scratch assay data are reported within this time delay region (i.e. 4, 6, 12, or 24hrs) [8, 13, 15, 33]. Importantly, not accounting for a time delay within this region maypotentially explain why scratch assay experiments are notoriously difficult toreproduce [2].May 28, 2020 11/34 uided mechanistic model selection

The diffusion, growth, and delay networks D MLP , G MLP , and T MLP were used to guidethe selection of biologically realistic mechanistic models for downstream use in atraditional mathematical modeling framework. Each network solution corresponding tothe six scratch assay data sets is shown in Fig 7. D i ff u s i v i t y ( mm / d a y ) D MLP G r o w t h ( / d a y ) G MLP S c a l e ( un i t l e ss ) T MLP

Fig 7. Delay-reaction-diffusion BINN terms.

The learned diffusivity, D MLP ,growth, G MLP , and delay, T MLP , functions extracted from the corresponding BINNswith governing delay-reaction-diffusion PDE in Eq (10). Each line corresponds to anexperiment with a different initial cell density (i.e. 10,000, 12,000, 14,000, 16,000,18,000, and 20,000 cells per well). Note that D MLP and G MLP have different lengthssince they are evaluated between the minimum and maximum observed cell densitiescorresponding to each data set.From Fig 7, the learned diffusivities for each experiment with different initial celldensity are non-zero when u = 0 and appear increasing and concave up (like power laws)with respect to the cell density u . On the other hand, the learned growth terms areapproximately linear, which is consistent with logistic models, and the learned delayterms all exhibit sigmoidal dynamics. Note that the outlying G MLP solution for thescratch assay data set with 10,000 initial cells per well is likely an artifact of theobserved cell densities in that experiment not approaching the carrying capacity, andtherefore leading to unrealistic learned dynamics. Based on qualitative analysis of theseplots, the following mechanistic delay-reaction-diffusion equation is proposed to satisfyeach scratch assay data set: u t = T ( t ) (cid:16) ( D ( u ) u x ) x + G ( u ) u (cid:17) , (11)with diffusivity, growth, and delay functions D = D + D (cid:16) uK (cid:17) m , (12a) G = ru (cid:16) − uK (cid:17) , (12b) T = 11 + e − ( β t + β ) , (12c)respectively. The diffusivity function D in Eq (12a) is a combination of the classicalFKPP and Generalized Porous-FKPP diffusivity function, with baseline cell diffusivity D , diffusion coefficient D , and exponent m . The growth function G in Eq (12b) ischosen to be the logistic growth function with intrinsic growth rate r and carryingcapacity K . The delay function T in Eq (12c) is represented by the logistic regressionfunction with parameters β and β . One advantage of using a mathematical modelwith specified functional forms and parameters described by Eq (12a)-(12c) is thatstandard parameter estimation techniques can now be used. This enables a comparisonof the BINN-guided model in Eq (11) to other mechanistic models, namely, the classicalFKPP and Generalized Porous-FKPP equations.May 28, 2020 12/34 odel comparison The BINN-guided delay-reaction-diffusion model in Eq (11) was compared to theclassical FKPP equation u t = ( Du x ) x + ru (cid:0) − uK (cid:1) , (13)with diffusion coefficient D , intrinsic growth rate r , and carrying capacity K andGeneralized Porous-FKPP equation u t = ( D ( uK ) m u x ) x + ru (cid:0) − uK (cid:1) , (14)with additional exponent m . These models were used as a baseline for comparison sincethey have been identified as the current state-of-the-art in modeling these data [2, 34].The parameters of each model were optimized numerically using the generalized leastsquares error function in Eq (6) with the adjusted statistical error model in Eq (4) with γ = 0 .

2. Note that the carrying capacity was fixed at K = 1 . × / mm and notoptimized because it was empirically validated in [2]. The resulting model fits andparameter values for the classical FKPP and Generalized Porous-FKPP models areshown in Fig S6, Fig S7, Tab S1, and Tab S2. The solutions of the BINN-guideddelay-reaction-diffusion model in Eq (11) to each data set are shown in Fig 8. C e ll d e n s i t y ( c e ll s / mm ) Initial cell density: 10,000 cells per well

Initial cell density: 12,000 cells per well C e ll d e n s i t y ( c e ll s / mm ) Initial cell density: 14,000 cells per well

Initial cell density: 16,000 cells per well C e ll d e n s i t y ( c e ll s / mm ) Initial cell density: 18,000 cells per well

Initial cell density: 20,000 cells per well

Fig 8. BINN-guided delay-reaction-diffusion model solutions.

Predicted celldensity profiles using the delay-reaction-diffusion model in Eq (11). Each subplotcorresponds to an experiment with a different initial cell density (i.e. 10,000, 12,000,14,000, 16,000, 18,000, and 20,000 cells per well). Solid lines represent the numericalsolution to Eq (11) using the parameters that minimize L GLS in Eq (6). The markersrepresent the experimental scratch assay data.The predicted cell density profiles in Fig 8 closely matched the scratch assay datawhich suggests that the proposed model in Eq (11) with Eqs (12a)-(12c) successfullycaptured the learned dynamics from T MLP , D MLP , and G MLP . The optimizedparameter values across each data set are shown in Table 1. Note that the parameterswere rescaled to µ m and hours (hr) for comparison with [2] and [34].Table 1 reveals that many of the parameters relating to density-dependent diffusionand growth show trends (e.g. D and r increasing) with initial cell density similar to [2].The implications of this observation are considered in the Discussion Section. ToMay 28, 2020 13/34 able 1. BINN-guided delay-reaction-diffusion model parameters. Initial cell densityParameter 10,000 12,000 14,000 16,000 18,000 20,000 D ( µ m / hr ) 95.7 353.3 482.1 604.3 804.0 675.8 D ( µ m / hr ) 3987.1 3166.4 3775.0 3773.8 2201.8 1954.9 m (unitless) 1.5976 3.4708 1.9060 3.5173 3.2204 0.9876 r ( / hr ) 0.0525 0.0714 0.0742 0.0798 0.0772 0.0951 β (unitless) -1.0292 -3.3013 -3.1953 -2.9660 -1.2695 -4.0651 β ( / hr ) 0.2110 0.2293 0.2761 0.2180 0.1509 0.4166Table of model parameters for Eq (11) calibrated for each scratch assay data set. Eachcolumn corresponds to an experiment with different initial cell density (i.e. 10,000,12,000, 14,000, 16,000, 18,000, and 20,000 cells per well). Table 2. Generalized least squares (GLS) errors.

Initial cell densityModel 10,000 12,000 14,000 16,000 18,000 20,000classical FKPP 786.80 557.28 616.76 619.12 685.17 964.19Porous-FKPP 681.18 540.29 418.57 566.89 744.44 928.38BINN-guided model

Table of GLS errors between the model solutions and scratch assay data. Each columncorresponds to an experiment with different initial cell density (i.e. 10,000, 12,000,14,000, 16,000, 18,000, and 20,000 cells per well). Bold numbers represent the minimumGLS error across the three models.compare the three models quantitatively, the generalized least squares (GLS) errorswere computed for each model and data set and reported in Table 2.The results in Table 2 showed that Eq (11) with Eqs (12a)-(12c) fit each data setmore accurately than the classical FKPP or Generalized Porous-FKPP models. Thisbehavior is not surprising given that the BINN-guided model is more complex.Therefore, model selection methods, which balance model accuracy with modelcomplexity, were also used to compare the quality of each model relative to the others.In particular, the modified Akaike Information Criterion (AIC) from [35] was used toaccount for the statistical error model in Eq (4). See Table 3 for the AIC scores acrosseach model and data set.

Table 3. Akaike Information Criterion (AIC) scores.

Initial cell densityModel 10,000 12,000 14,000 16,000 18,000 20,000classical FKPP 1239.6 1175.8 1194.5 1195.2 1214.0 1277.2Porous-FKPP 1214.9 1172.0

Table of AIC scores for each model and scratch assay data set. Each columncorresponds to an experiment with different initial cell density (i.e. 10,000, 12,000,14,000, 16,000, 18,000, and 20,000 cells per well). Bold numbers represent the minimumAIC score across the three models.The results in Table 3 showed that the BINN-guided delay-reaction-diffusion modeloutperforms the classical FKPP and Generalized Porous-FKPP models across all dataMay 28, 2020 14/34ets except with initial cell density 14,000 cells per well. This discrepancy follows fromTable (6) where the additional parameters in Eq (11) only slightly decreased the GLSerror for the data set with initial density of 14,000. Finally, to quantify the “value” ofadding the novel delay term in Eq (12c) the differences between AIC scores for eachmodel and the minimum AIC score, denoted by ∆AIC, are shown in Table 4.

Table 4. Difference Akaike Information Criterion ( ∆ AIC) scores.

Initial cell densityModel 10,000 12,000 14,000 16,000 18,000 20,000classical FKPP 55.90 96.26 69.71 76.01 140.08 161.11Porous-FKPP 31.23 92.54 0.00 61.70 157.42 156.11BINN-guided model 0.00 0.00 2.53 0.00 0.00 0.00Table of AIC differences (∆AIC) between each model and scratch assay data set. Eachcolumn corresponds to an experiment with different initial cell density (i.e. 10,000,12,000, 14,000, 16,000, 18,000, and 20,000 cells per well). Each ∆AIC score representsthe difference between a model’s AIC score and the minimum recorded AIC score forthat data set.Table 4 suggests that the delay term is most impactful for data sets with large initialdensity (i.e. 18,000 and 20,000 cells per well) since the ∆AIC scores are significantlylarger for these data sets. Biological analysis and explanations for these results areconsidered in the following Discussion Section.

Discussion

In this work, biologically-informed neural networks (BINNs) were introduced as aflexible and robust equation learning method for real-world biological applications. TheBINNs framework was demonstrated using experimental biological data from scratchassays [2] and used to discover a delay term that had not yet been considered in themodeling of these data. The trained diffusivity, growth, and delay networks were usedto guide the selection of the mechanistic model in Eq (11) with Eqs (12a)-(12c), whichwas shown to model the data more accurately than the current state-of-the-art models(i.e. classical FKPP and Generalized Porous-FKPP equations). The results shown inthis work suggest that the BINNs framework can be successfully applied to a wide rangeof biological and physical problems where the data are sparse and the governingdynamics are unknown. The biological motivations for various aspects of the BINNsframework and significance of the results are discussed in the following paragraphs.The model solutions in Fig 3 and Fig 4 indicated that using only density-dependentdiffusivity and growth functions D ( u ) and G ( u ) was not sufficient to fully capture thescratch assay dynamics. Fig 4 highlighted this discrepancy at the second timemeasurement ( t = 0 . D and G be universal function-approximating neuralnetworks. In particular, the model solutions in the areas of high cell density (i.e. x ∈ [0 . , .

5] and x ∈ [1 . , . t ≥ t < x ∈ [0 . , . D = D MLP ( u, t ) and G = G MLP ( u, t ), respectively. However, sincethe dynamics of diffusion and growth are assumed to be consistent throughout time, thediffusion and growth terms were chosen to be separable functions composed ofdiffusivity D ( u ), growth G ( u ), and delay T ( t ). Additionally, it was assumed that bothdiffusion and growth were scaled by the same time delay T ( t ) as opposed to a diffusiondelay T D ( t ) and growth delay T G ( t ). This assumption may not be accurate if the timedelay is a result of density-dependent changes in cell function where cells become mobileand proliferative at different rates. In particular, since migration and proliferation havevery different timescales, it might be natural to expect that the delays would also havedifferent timescales. However, since the numerical solutions using T ( t ) matched thedata sufficiently accurately, this question is left for future work. Finally, T ( t ) wasconstrained to output values between 0 and 1 and forced to be increasing with time.These constraints were chosen to ensure that the delay term modeled thetime-dependent changes in cell dynamics for early time points but converged to unity bylater time points.In this work, BINNs revealed that the reaction-diffusion system in Eq (1) with celldensity-dependent diffusivity and growth functions was insufficient to capture the datadynamics. However, the model discrepancy for data sets with large initial cell densitymotivated the development of a time delay which significantly improved the modelaccuracy and resolved the observed discrepancy. The diffusivity, growth, and delaynetworks were used to posit a mechanistic model (i.e. Eq (11) with Eqs (12a)-(12c)).Using the logistic growth model (Eq (12b)) for the growth function and logisticregression (Eq (12c)) for the delay function followed straightforwardly from theparameter network solutions in Fig 7, however, the diffusivity function in Eq (12a)warrants further discussion.Opinions vary between the biological validity of (i) the classical FKPP and (ii) theGeneralized Porous-FKPP diffusivity functions. For example, one study compared (i)and (ii) using experimental wound size data and found that (ii) with m = 4 providedthe best fit to the data [36]. Another study fit (i) and (ii) to experimental cell migrationdata with different cell populations and found that one population was best describedby constant diffusivity in (i) and the other by nonlinear diffusivity with m = 1 in(ii) [37]. These studies do not reveal which approach is best, but they demonstrate thatcare is warranted. The diffusivity network ( D MLP ) solutions showed significantvariability across the scratch assay data sets, so the posited mechanistic model waschosen to respect the observed variability while also being as simple as possible.Therefore, Eq (12a) was chosen to be a combination of the diffusivity functions in (i)and (ii). This way both (i) and (ii) can be seen as nested models of the positeddiffusivity function by setting either D = 0 or D = 0. Yet the posited diffusivity is stillsimple, as it only increases the number of parameters with respect to (ii) by one. It maybe the case that the true diffusivity function is even more complex, such as a linearcombination of powers: D = D + D (cid:16) uK (cid:17) m + D (cid:16) uK (cid:17) m , May 28, 2020 16/34ith baseline diffusivity D , diffusion rates D and D , carrying capacity K , andexponents m and m . However, these considerations are beyond the scope of thepresent work and left for future work.The parameters of (i) the classical FKPP in Eq (13), (ii) the GeneralizedPorous-FKPP in Eq (14), and (iii) the BINN-guided model in Eq (11) withEqs (12a)-(12c) were optimized numerically for each scratch assay data set. Theoptimized parameters for (i) in Tab S1 all fall within the ranges reported in [2].However, this is not the case for any set of parameter values for (ii) as shown in Tab S2.This is likely due to the parameter optimization being conducted using the adjustedstatistical error model in Eq (4) with γ = 0 . m in thePorous-FKPP diffusivity function was not fixed at m = 1 as in [2]. However, in both (i)and (ii), the diffusion coefficient, D , and intrinsic growth rate, r , showed variabilitywith initial cell density, similar to the conclusions drawn in [2]. Therefore, in theory, ifthe delay term in Eq (12c) accounts for the time it takes for density-dependent growthand diffusion to become active in the system, which may be a function of initial celldensity, then the variability among diffusion coefficients and intrinsic growth rates forthe BINN-guided delay-reaction-diffusion model should be reduced across the scratchassay experiments. However, from the optimized parameter values in Table 1, thebaseline diffusion rate D and intrinsic growth rate r generally increase with initial celldensity and the diffusion coefficient D generally decreases with initial cell density. Thisobservation may indicate (i) practical identifiability issues between the diffusion, growth,and delay terms or (ii) the existence additional mechanisms that are not accounted forin the model. To confirm this, a Bayesian parameter estimation framework can be usedto examine practical identifiability of parameters [44, 45]. Then, a possible strategy tomitigate this issue would be to optimize the parameters of Eq (11) with Eqs (12a) and(12b) jointly across each scratch assay data set while allowing the delay parameters inEq (12c) to be tuned separately for each set. This exploration is left for future work.The BINN-guided delay-reaction-diffusion model was compared to the baselineclassical FKPP and Generalized Porous-FKPP models using both GLS errors andmodified AIC scores. The GLS errors in Table 2 showed that the BINN-guided modelfits the data more accurately than the baseline models across each scratch assay dataset. However, this improvement in accuracy is due to the increased model complexity(i.e. number of parameters and PDE terms) in the BINN-guided model. Therefore, torank the quality of each model, AIC scores were also computed since they balancemodel accuracy with model complexity. The AIC scores reported in Table 3 indicatethat the BINN-guided model also exceeds the baseline models in terms of relativequality across each scratch assay data set except with initial cell density 14,000 cells perwell, in which the Generalized Porous-FKPP model has a slightly smaller AIC score. Inother words, Tables 2 and 3 indicate that the BINN-guided model performs as well orbetter than the state-of-the-art in modeling the suite of scratch assay experimentsfrom [2]. In particular, this advantage is afforded by including the delay term inEq (12c). To quantify the relative value of adding the delay term, the AIC scores fromTable 3 are used to compute difference AIC (∆AIC) scores in Table 4 in which the∆AIC score for a fixed model and data set is given by the difference between thecorresponding AIC score and the minimum AIC score across all models for the givendata set. The ∆AIC scores in Table 4 indicate that the relative value of the delay termis largest for data sets with initial cell density 18,000 and 20,000 cells per well. Thisobservation is supported by the relevant biology discussed at the beginning of thissection, in which large initial cell densities either (i) result in more damaged cells nearthe borders of the scratch, (ii) cause more cells in the population to have terminallydifferentiated away from mobile/proliferative cell functions, or (iii) some combination of(i) and (ii) and other potentially unconsidered biological sources, all of which increaseMay 28, 2020 17/34he potential time delay before the density-dependent diffusion and growth dynamicsbecome the primary drivers of the temporal evolution of the system. Conclusions and future work

BINNs, a robust and flexible framework for equation learning with sparse and noisydata, was demonstrated and used to posit a mechanistic equation that outperforms thestate-of-the-art in modeling experimental scratch assay data. The development, training,and evaluation of BINNs and the resulting model selection and analysis were reportedto justify these claims. The discovered time delay term may have importantimplications for the reproducibility and modeling of scratch assays, since the majority ofthe reported data fall within the time delay region. Some of the drawbacks of theBINNs method and opportunities for future work and development are discussed below.Since BINNs rely on multilayer perceptrons (MLPs), the learned dynamics may notgeneralize well outside the training domain. For example, in the present work, if theobserved cell densities for a particular experiment do not approach the carrying capacity(e.g. the scratch assay data set with 10,000 initial cells per well) then the learneddynamics given by D MLP and G MLP may lead to biologically unrealistic behavior (see G MLP solutions in Fig 7). Further, since none of the scratch assay data reported valuesthat significantly exceeded the empirically set carrying capacity, G MLP would likely notgeneralize well to a scenario with exceedingly large observed cell densities. Options formitigating this issue include (i) replacing unrealistic MLP terms with mechanisticmodels (e.g. logistic growth instead of G MLP ) if the particular dynamics are known apriori , or (ii) adding additional constraints which force the MLP terms to satisfyspecific values (e.g. G MLP ( u = K ) = 0).An opportunity for future development is quantifying the uncertainty of both theapproximate solution u MLP and the parameter networks D MLP , G MLP , and T MLP .From the frequentist perspective, so called “subagging” (i.e. subsample aggregating) canbe used to build posterior distributions of the model solutions and parameternetworks [38]. In this framework, one simply samples N training/validation splits andtrains a BINN for each split. Then kernel density estimation or some other equivalentmethodology can be used to build distributions from the N number of trained BINNs.Alternatively, from the Bayesian perspective, physics-informed neural networks wererecently extended to Bayesian physics-informed neural networks (B-PINNs) [25]. In thisframework, Bayesian neural networks are substituted for u MLP and regularized using apre-specified governing PDE. In the BINNs framework, Bayesian neural networks couldalso be substituted for D MLP , G MLP , and T MLP to quantify the uncertainty of the PDEterms in addition to the model solution.While BINNs were demonstrated using one-dimensional reaction-diffusion PDEs forscratch assay data in this work, they can be applied on a wide spectrum of physical andbiological problems (for both ODE and PDE systems) in which the governing dynamicsare unknown and highly nonlinear. A straightforward next step for this work would beto evaluate BINNs on the two-dimensional scratch assay image data that were used toconstruct the one-dimensional cell density profiles in [2]. Further, more complicated celldynamics could be incorporated into the governing system in the present work byincluding PDE terms that describe cell population heterogeneity or additional biologicalmechanisms for damaged (but not dead) cells at the borders of the scratch.BINNs were used to address a canonical problem in the field of collective cellmigration by analyzing how the combination of density-dependent cell motility andproliferation drive the temporal dynamics of cell invasion during an experimentalscratch assay. This novel framework revealed new mechanistic and biological insightsinto this process by guiding the derivation of a mathematical model that has not beenconsidered previously using traditional mathematical modeling approaches. TheMay 28, 2020 18/34lassical FKPP and Generalized Porous-FKPP models are ubiquitous in modeling cellmigration and proliferation, yet the BINNs methodology presented here revealed thatthese models may fail to incorporate all of the relevant mechanisms underlying thisprocess. These results suggest that new models incorporating a time delay may benecessary to accurately capture the dynamics within the first day of a scratch assay, i.e.,just after the scratch is introduced. Based on the success of this work, BINNs establisha new paradigm for data-driven equation learning from sparse and noisy data that couldenable the rapid development and validation of mathematical models for a broad rangeof real-world applications throughout biology including ecology, epidemiology, and cellbiology.

Methods

All methods herein were implemented in Python 3.6.8 using the PyTorch 1.2.0 deeplearning library. All data and code are made publicly available at https://github.com/jlager . The following section is intended to make BINNsfeasible for a wide range of biological applications. In particular, this section covers (i)the importance of data pre-processing, (ii) strategies for using real-world knowledge todesign effective neural network models, (iii) the complete training protocol ranging fromselecting appropriate statistical error models and hyperparameters to balancing themulti-objective error function, and (iv) numerical implementation details for forwardsolving BINNs-guided PDEs.

Data pre-processing

Input and output standardization are common practice to stabilize neural networktraining [39]. Since the scratch assay data in [2] reported cell densities on the order of u = O (10 − ) cells / µ m at spatial locations on the order of x = O (10 ) µ m for time pointson the order of t = O (10) hours, these variables needed to be standardized. Withoutstandardization, the neural network models failed to converge for these data because (i)the network inputs ( x and t ) differed by several orders of magnitude from each otherand (ii) the network inputs ( x and t ) and outputs ( u ) also differed by several orders ofmagnitude. By rescaling x and t to millimeters (mm) and days, respectively, theadjusted variables ranged from x = O (1) mm, t = O (1) days, and cell density u = O (10 ) cells / mm . Standardizing x and t addressed (i) while (ii) is addressed byusing scaling factors discussed in the following section. The cell density profile at theleft boundary was removed since it was consistently larger than the remaining celldensities across all six data sets. Network design

BINNs are centered around u MLP , a function-approximating multilayer perceptron(MLP) (also known as an artificial neural network). MLPs, like polynomials [40], are inthe class of universal function approximators , meaning that they can approximate anycontinuous bounded functions on a closed interval arbitrarily well under somereasonable assumptions [28]. For the scratch assay data in the present work, u MLP inputs spatiotemporal vectors x = (cid:2) x, t (cid:3) and outputs the corresponding approximationsto the cell density u . To give u MLP sufficient capacity to approximate the solution tothe governing PDE, the network is chosen to have three hidden layers with 128 neuronsin each layer, resulting in a model with approximately 30,000 total parameters.Concretely, u MLP takes the form u MLP ( x ) = α · φ (cid:16) σ (cid:16) σ (cid:0) σ ( x W + b ) W + b (cid:1) W + b (cid:17) W + b (cid:17) , (15)May 28, 2020 19/34here the trainable parameters W i and b i denote weight matrices and bias vectors forthe i th layer, σ ( · ) and φ ( · ) denote nonlinear activation functions, and α denotes ascaling factor. Each hidden layer uses a “sigmoid” activation function (i.e. σ ( x ) = 1 / (1 + e − x )) while the output layer uses a “softplus” activation function (i.e. φ ( x ) = ln(1 + e x )). The softplus activation function is a particular design choice since itis a continuously differentiable function that forces the predicted cell densities to benon-negative, and has been previously shown to be well-suited for biological transportmodels [26]. Finally, to account for the difference in scale between the inputs( x, t = O (1)) and outputs ( u = O (10 )), the MLP outputs are post-multiplied by theexperimentally validated carrying capacity (i.e. α = 1 . × ) from [2]. Note that inpractice, if values like this are unknown, one can simply let α be the maximum observedcell density or some other similar quantity. The key here is to ensure the orders ofmagnitude between the network inputs and outputs are similar so that the parametersof the MLP do not have to account for the change of scale [39].The diffusivity, growth, and delay functions of the governing PDEs are modeled withneural networks D MLP ( u MLP ) and G MLP ( u MLP ), and T MLP ( t ). All three MLPs sharethe same number of layers as u MLP but use 32 neurons per layer. These networks arechosen to be smaller for both computational efficiency and because the parameterdynamics are assumed to be simpler than the cell density dynamics u . The hiddenlayers use sigmoid activation functions. The output layer for D MLP uses a softplusactivation because diffusion is assumed to be non-negative for all cell densities. Sincethe growth term can be negative (e.g. logistic growth when the cell density exceeds thecarrying capacity), a linear output (i.e. no activation function) is used in the final layerfor G MLP . The output layer for T MLP uses the sigmoid function to constrain theoutputs to (0 , u MLP , the inputs and outputs of D MLP and G MLP arealso standardized. In particular, the inputs of both networks (i.e. u MLP ) are divided bythe carrying capacity K = 1 . × while the outputs of D MLP are multiplied by 0.096 mm / day and the outputs of G MLP are multiplied by 2.4 / day . These values were themaximum diffusion and growth values considered in [2]. Similar to u MLP , the input andoutput scaling factors ensure the MLP parameters do not have to account for changes inscale. No standardization was used for T MLP since its inputs and outputs are of thesame order (i.e. O (1)). Training procedure

The BINN parameters (i.e. weights and biases of u MLP , D MLP , G MLP , and T MLP ) areoptimized using the first-order gradient-based Adam optimizer [41] with defaulthyper-parameters and minibatch-optimization. To prevent over-fitting, the scratch assaydata were randomly partitioned into 80%/20% training and validation sets. Thenetwork parameters were updated iteratively to minimize L Total in Eq (3) on thetraining set and saved on relative improvement in validation error. In other words, themodel parameters were saved if the relative difference between (i) the validation error inthe current iteration and (ii) the smallest recorded validation error exceeded 5%.Finally, since the parameters of each BINN are randomly initialized and applied todifferent data sets, early stopping of 5,000 (i.e. training was stopped if the relativevalidation error did improve for 5,000 consecutive epochs) was used to guarantee theconvergence of each BINN independently. The implementation details of each term in L Total (i.e. L GLS , L PDE , and L Constr ) are discussed in more detail below.The first term L GLS in Eq (6) corresponds to the generalized least squares (GLS)distance between u MLP and the observation data u i,j . Since the error process isassumed to be i.i.d., the parameters of the statistical model in Eq (4) (i.e. γ ) must firstbe calibrated. Following [26], u MLP is trained using L GLS as an objective function for γ = 0 . , . , . , . γ = 0 . γ = 0 . L GLS is evaluated at eachtraining iteration using mini-batches (i.e. randomly selected subsets) of input/outputdata. In general, using a small batch size acts as an additional form of regularizationthat helps neural networks escape local minima during training and allows for bettergeneralization [42]. However, this significantly increases the computational cost oftraining due to the increased number of training iterations needed to converge.Therefore, BINNs were trained using mini-batches of size 37 (i.e. 1 / u MLP satisfies the solution of the governing PDE, the terms L PDE inEq (7) and L Constr in Eq (8) are included in L Total as a form of regularization. However,since the scratch assay data are sparse, simply training u MLP using L Total at theobserved data locations can result in unrealistic dynamics in between data points.Therefore, to ensure u MLP satisfies the solution of a governing PDE everywhere in theinput domain, L PDE and L Constr are evaluated at 10,000 uniformly randomly sampledpoints x i ∈ [ x min , x max ] and t j ∈ [ t min , t max ] at each training iteration. Without therandom sampling procedure, u MLP can severely overfit to the data. To illustrate theimportance of the random sampling procedure, the model fits, GLS errors, and PDEerrors are shown in Fig S10 for three cases in which (i) no PDE regularization is used,(ii) PDE regularization is used at the data locations, and (iii) PDE regularization isused at 10,000 randomly sampled points. In particular, Fig S10 shows that in option (i) u MLP overfits the data practically everywhere in the input domain, (ii) u MLP overfitseverywhere except at the data locations (see vertical lines in third subplot of row b),and (iii) the random sampling procedure results in the smallest amount of PDE errorand the largest amount of GLS error. The desired behavior is shown in option (iii) since u MLP fits the data as accurately as allowed by the governing PDE.The third error term L Constr constrains D MLP , G MLP , and T MLP to exhibitbiologically realistic values and dynamics. Choosing appropriate constraints can beambiguous when the relevant literature gives conflicting suggestions. For example, whendesigning a derivative constraint for the diffusivity network D MLP , [21] suggest thatdiffusion should decrease with cell density due to cell-to-cell adhesion whereas [11]suggest the opposite in which cells promote the migration of others. To mitigate this,BINNs were trained without any constraints on D MLP and G MLP in order to visualizethe collective behavior of the parameter networks (see Fig S8). Note that T MLP was stillforced to be non-decreasing. The network evaluations in Fig S8 showed unrealisticparameter dynamics for some data sets, but their collective behavior was used to designderivative constraints that forced D MLP to increase as a function of cell density and G MLP to decrease with cell density for the set of scratch assay data considered in thiswork. Concretely, the diffusion term D MLP was constrained to values between 0.0 and0.096 mm / day and the growth term G MLP to values between − .

48 and 2.4 / day . Themaximum and minimum diffusion values and maximum growth value were chosen basedon values used in [2]. The minimum growth value was chosen to be negative 20% of themaximum growth value to allow G MLP to output negative values for cell densities nearthe carrying capacity if needed. The sigmoid output activation function for the delayterm T MLP constrained its outputs to between 0 and 1. Derivative terms were used in L Constr to constrain D MLP and T MLP to be non-decreasing and G MLP to benon-increasing. For ease of notation, let ˆ u i,j ≡ u MLP ( x i , t j ),ˆ D i,j ≡ D MLP ( u MLP ( x i , t j )), ˆ G i,j ≡ G MLP ( u MLP ( x i , t j )), and ˆ T i,j ≡ T MLP ( t j ), then theMay 28, 2020 21/34onstraint term can be written concretely as L Constr = 1

M N (cid:34) α M,N (cid:88) i =1 ,j =1ˆ D< . D> . (cid:16) ˆ D i,j (cid:17) + α M,N (cid:88) i =1 ,j =1 ∂ ˆ D/∂ ˆ u< (cid:32) ∂ ˆ D i,j ∂ ˆ u i,j (cid:33) + (16) α M,N (cid:88) i =1 ,j =1ˆ G< − . G> . (cid:16) ˆ G i,j (cid:17) + α M,N (cid:88) i =1 ,j =1 ∂ ˆ G/∂ ˆ u< (cid:32) ∂ ˆ G i,j ∂ ˆ u i,j (cid:33) + α M,N (cid:88) i =1 ,j =1 ∂ ˆ T /∂ ˆ t< (cid:32) ∂ ˆ T i,j ∂ ˆ u i,j (cid:33) (cid:35) . Since the parameter networks and their derivatives occur at different scales with respectto each other and with respect to the error terms L GLS and L PDE , each term of Eq (16)is weighted by a factor α i . In particular, each constraint is weighted based on theinput/output scaling factors of the corresponding neural network (see Network Designsubsection). Concretely, the terms in Eq (16) are weighted by α = / . × , α = K / . × , α = / . × , α = K / . × , and α = 10 . Note that theweight factors for the derivative constraints on D MLP and G MLP (i.e. α and α )include the carrying capacity K = 1 . × since K was used as an input scaling factorfor these networks. The factor 10 was chosen large enough to guarantee that D MLP , G MLP , and T MLP exhibited the desired behavior. Boundary conditions can also beincluded in the L Constr term, however, since they were unknown for the scratch assaydata considered in this work, no boundary conditions were used to train u MLP .Finally, the GLS errors at the initial condition (i.e. data locations where t = 0) wereweighted by a factor of 10 during training. This was found to improve the generalizationaccuracy of D MLP , G MLP , and T MLP when evaluated using a numerical PDE solver.The reason for this is because the cell density at t = 0 may not satisfy a governingdynamical system since the measurement is taken directly after the scratch assayprotocol is performed [2]. However, the initial condition “sets the stage” for thegoverning dynamics to drive the temporal evolution of the system. Therefore, byweighting the initial condition more heavily in L GLS , the PDE error term L PDE mustconform u MLP to satisfy the governing system for t > u MLP at t = 0.This step forced D MLP , G MLP , and T MLP to learn more generalizable representations ofthe diffusivity, growth, and delay functions, respectively. The weighting factor wasnumerically validated using the mean GLS error across each scratch assay experimentfor weighting factors 1, 10, and 10 . Note that this weighting factor makes BINNssensitive to the random choice of training/validation split, since some data points in theinitial condition may be more informative than others for equation learning andultimate model generalizability. This observation was also noted in a recent equationlearning study in which the random split of training and validation sets was found toinfluence the structure of the learned equation [26]. Adopting a strategy similar to thisprevious study, BINNs were trained 20 times for each data set (using different randomtraining/validation splits). The BINN for which the numerical simulations resulted inthe smallest GLS error was saved as the best model. PDE Forward Solver

The numerical implementation details are provided for systems describing quantity ofinterest u ( x, t ) that are governed by the following equation: u t = (cid:0) Q ( u, u x , t ) (cid:1) x + F ( u ) , (17) u ( x, t ) = φ ( x ) ,u x ( x , t ) = u x ( x f , t ) = 0 , May 28, 2020 22/34or x ∈ [ x , x f ], and t ∈ [ t , t f ]. Note that the reaction-diffusion model in Eq (1) is anexample of Eq (17) where Q ( u, u x , t ) = D ( u, t ) u x and F ( u ) = G ( u, t ) u . In Eq (17), theinitial condition is denoted by φ ( x ) and the boundary conditions are assumed to beno-flux boundary conditions. Note that the no-flux condition represents a zero net fluxboundary condition which does not preclude cells moving across the boundary, butinstead reflects the situation in which the flux in the positive and negative x -directionsare equal, giving rise to zero total flux. The spatial and temporal domains arediscretized into equispaced grids as: x i = i ∆ x, t j = j ∆ t, (18)for i = 0 , . . . ,

200 and j = 0 , . . . , , u i ( t ) = u ( x i , t ).Then, the method-of-lines approach is used to solve Eq (17) with the numericaldiscretization from [43] that is given by (cid:0) Q ( u, u x , t ) (cid:1) x ≈ P i +1 / ( t ) − P i − / ( t )∆ x , (19)where P i +1 / ( t ) is an estimate for the rightwards diffusive flux at location x i that isgiven by P i +1 / ( t ) = 12 (cid:20) Q (cid:18) u i ( t ) , u i +1 ( t ) − u i ( t )∆ x , t (cid:19) + Q (cid:18) u i +1 ( t ) , u i +1 ( t ) − u i ( t )∆ x , t (cid:19)(cid:21) . (20)The no-flux boundary conditions at x and x are implemented by incorporating theghost points x − and x satisfying u − ( t ) = u ( t ) and u ( t ) = u ( t ). The Scipyintegration subpackage (version 1.4.1) is used to integrate Eq (17) over time using anexplicit fourth order Runge-Kutta Method. Parameter estimation

The parameters of each mechanistic model were optimized using the Limited-memoryBFGS algorithm with bound constraints (L-BFGS-B) in Python’s Scipy package withdefault tolerance values to minimize the generalized least squares error function inEq (6) with the adjusted statistical error model in Eq (4) with γ = 0 .

2. The parametersfor Eqs (13) and (14) were initialized using the values from [2]. The parameters forEq (11) were initialized by fitting each PDE term in Eqs (12a)-(12c) to thecorresponding parameter network solutions in Fig 7 using ordinary least squares.Finally, the diffusivity and growth function parameters were bounded using D min = 0 mm / day , D max = 0 . mm / day , m min = 0, m max = 4, r min = 0 / day , and r max = 2 . / day (all of which come from [2]), while the delay function parameters β and β werebounded by [ − , Acknowledgments

This material was based upon work partially supported by the National ScienceFoundation under Grant DMS-1638521 to the Statistical and Applied MathematicalSciences Institute and IOS-1838314 to KBF, and in part by National Institute of Aginggrant R21AG059099 to KBF. Any opinions, findings, and conclusions orrecommendations expressed in this material are those of the authors and do notnecessarily reflect the views of the National Science Foundation. REB is a Royal SocietyWolfson Research Merit Award holder and also acknowledges the Biotechnology andBiological Sciences Research Council for funding via grant no. BB/R000816/1. MJSacknowledges the Australian Research Council (DP200100177).May 28, 2020 23/34 eferences

1. Raissi M, Perdikaris P, Karniadakis GE. Physics-informed neural networks: Adeep learning framework for solving forward and inverse problems involvingnonlinear partial differential equations. Journal of Computational Physics.2019;378:686 – 707. doi:https://doi.org/10.1016/j.jcp.2018.10.045.2. Jin W, Shah ET, Penington CJ, McCue SW, Chopin LK, Simpson MJ.Reproducibility of scratch assays is affected by the initial degree of confluence:Experiments, modelling and model selection. Journal of Theoretical Biology.2016;390:136 – 145. doi:https://doi.org/10.1016/j.jtbi.2015.10.040.3. Friedl P, Gilmour D. Collective cell migration in morphogenesis, regeneration andcancer. Nature Reviews Molecular Cell Biology. 2009;10(7):445–457.doi:10.1038/nrm2720.4. Vicsek T, Czirok A, Ben-Jacob E, Cohen I, Sochet O. Novel type of phasetransition in a system of self-driven particles. Physical Review Letters.1995;75(6):1226–1229. doi:10.1103/PhysRevLett.75.1226.5. Helbing D, Moln´ar P. Social force model for pedestrian dynamics. PhysicalReview E. 1995;51(5):4282–4286. doi:10.1103/PhysRevE.51.4282.6. Gallaher J, Anderson ARA. Evolution of intratumoral phenotypic heterogeneity:the role of trait inheritance. Interface Focus. 2013;3(4):20130016.doi:10.1098/rsfs.2013.0016.7. McLennan R, Schumacher LJ, Morrison JA, Teddy JM, Ridenour DA, Box AC,et al. Neural crest migration is driven by a few trailblazer cells with a uniquemolecular signature narrowly confined to the invasive front. Development.2015;142(11):2014–2025. doi:10.1242/dev.117507.8. Arciero JC, Mi Q, Branca MF, Hackam DJ, Swigon D. Continuum model ofcollective cell migration in wound healing and colony expansion. BiophysicalJournal. 2011;100(3):535–543. doi:10.1016/j.bpj.2010.11.083.9. Dyson L, Baker RE. The importance of volume exclusion in modelling cellularmigration. Journal of Mathematical Biology. 2015;71(3):691–711.doi:10.1007/s00285-014-0829-0.10. Johnston ST, Simpson MJ, Baker RE. Mean-field descriptions of collectivemigration with strong adhesion. Physical Review E. 2012;85(5):051922.doi:10.1103/PhysRevE.85.051922.11. Nardini JT, Chapnick DA, Liu X, Bortz DM. Modeling keratinocyte woundhealing: cell-cell adhesions promote sustained migration. Journal of TheoreticalBiology. 2016;400:103–117. doi:10.1016/j.jtbi.2016.04.015.12. Topaz CM, D’Orsogna MR, Edelstein-Keshet L, Bernoff AJ. Locust dynamics:behavioral phase change and swarming. PLOS Computational Biology.2012;8(8):e1002642. doi:10.1371/journal.pcbi.1002642.13. Aoki K, Kondo Y, Naoki H, Hiratsuka T, Itoh RE, Matsuda M. Propagatingwave of ERK activation orients collective cell migration. Developmental Cell.2017;43(3):305–317.e5. doi:10.1016/j.devcel.2017.10.016.May 28, 2020 24/344. Chapnick DA, Liu X. Leader cell positioning drives wound-directed collectivemigration in TGF beta-stimulated epithelial sheets. Molecular Biology of the Cell.2014;25(10):1586–1593. doi:10.1091/mbc.E14-01-0697.15. Matsubayashi Y, Ebisuya M, Honjoh S, Nishida E. ERK activation propagates inepithelial cell sheets and regulates their migration during wound healing. CurrentBiology. 2004;14(8):731–735. doi:10.1016/j.cub.2004.03.060.16. Nikolic DL, Boettiger AN, Bar-Sagi D, Carbeck JD, Shvartsman SY. Role ofboundary conditions in an experimental model of epithelial wound healing.American Journal of Physiology - Cell Physiology. 2006;291(1):C68–C75.doi:10.1152/ajpcell.00411.2005.17. Haridas P, Penington CJ, McGovern JA, McElwain DLS, Simpson MJ.Quantifying rates of cell migration and cell proliferation in co-culture barrierassays reveals how skin and melanoma cells interact during melanoma spreadingand invasion. Journal of Theoretical Biology. 2017;423:13–25.doi:10.1016/j.jtbi.2017.04.017.18. Johnston ST, Shah ET, Chopin LK, Sean McElwain DL, Simpson MJ.Estimating cell diffusivity and cell proliferation rate by interpreting IncuCyteZOOM assay data using the Fisher-Kolmogorov model. BMC Systems Biology.2015;9(38). doi:10.1186/s12918-015-0182-y.19. Maini PK, McElwain DLS, Leavesley DI. Travelling waves in a wound healingassay. Applied Mathematics Letters. 2004;17:575–580.doi:10.1016/j.aml.2003.01.011.20. Baldock AL, Ahn S, Rockne R, Johnston S, Neal M, Corwin D, et al.Patient-specific metrics of invasiveness reveal significant prognostic benefit ofresection in a predictable subset of gliomas. PLOS ONE. 2014;9(10):e99057.doi:10.1371/journal.pone.0099057.21. Anguige K, Schmeiser C. A one-dimensional model of cell diffusion andaggregation, incorporating volume filling and cell-to-cell adhesion. Journal ofMathematical Biology. 2009;58(3):395. doi:10.1007/s00285-008-0197-8.22. Dale PD, Olsen L, Maini PK, Sherratt JA. Travelling waves in wound healing.FORMA. 1995;10(3):205–222.23. Brunton SL, Proctor JL, Kutz JN. Discovering governing equations from data bysparse identification of nonlinear dynamical systems. Proceedings of the NationalAcademy of Sciences. 2016;113(15):3932–3937. doi:10.1073/pnas.1517384113.24. Rudy SH, Brunton SL, Proctor JL, Kutz JN. Data-driven discovery of partialdifferential equations. Science Advances. 2017;3(4). doi:10.1126/sciadv.1602614.25. Yang L, Meng X, Karniadakis GE. B-PINNs: Bayesian Physics-Informed NeuralNetworks for Forward and Inverse PDE Problems with Noisy Data; 2020.26. Lagergren JH, Nardini JT, Lavigne GM, Rutter EM, Flores KB. Learning partialdifferential equations for biological transport models from noisy spatio-temporaldata. Proceedings of the Royal Society A: Mathematical, Physical andEngineering Sciences. 2020;476(2234):20190800. doi:10.1098/rspa.2019.0800.27. Both GJ, Choudhury S, Sens P, Kusters R. DeepMoD: Deep learning for modeldiscovery in noisy data; 2019.May 28, 2020 25/348. Hornik K. Approximation capabilities of multilayer feedforward networks. NeuralNetworks. 1991;4(2):251 – 257.doi:https://doi.org/10.1016/0893-6080(91)90009-T.29. Banks HT, Hu S, Thompson WC. Modeling and inverse problems in the presenceof uncertainty. Chapman and Hall/CRC; 2014.30. Dydowiczova A, Brozman O, Babica P, Sovadinova I. Improved multiparametricscrape loading-dye transfer assay for a simultaneous high-throughput analysis ofgap junctional intercellular communication, cell density and viability. ScientificReports. 2020;10.31. Poumay Y, Pittelkow MR. Cell density and culture factors regulate keratinocytecommitment to differentiation and expression of suprabasal K1/K10 keratins.Journal of Investigative Dermatology. 1995;104(2):271 – 276.doi:https://doi.org/10.1111/1523-1747.ep12612810.32. Neurohr GE, Amon A. Relevance and regulation of cell density. Trends in CellBiology. 2020;30(3):213 – 225. doi:https://doi.org/10.1016/j.tcb.2019.12.006.33. Bindschadler M, McGrath JL. Sheet migration by wounded monolayers as anemergent property of single-cell dynamics. Journal of Cell Science.2007;120(5):876–884. doi:10.1242/jcs.03395.34. Warne DJ, Baker RE, Simpson MJ. Using experimental data and informationcriteria to guide model selection for reaction–diffusion problems in mathematicalbiology. Bulletin of Mathematical Biology. 2019;81(6):1760–1804.doi:10.1007/s11538-019-00589-x.35. Banks HT, Joyner ML. AIC under the framework of least squares estimation.Applied Mathematics Letters. 2017;74:33 – 45.doi:https://doi.org/10.1016/j.aml.2017.05.005.36. Sherratt JA, Murray JD, Clarke BC. Models of epidermal wound healing.Proceedings of the Royal Society of London Series B: Biological Sciences.1990;241(1300):29–36. doi:10.1098/rspb.1990.0061.37. Sengers BG, Please CP, Oreffo ROC. Experimental characterization andcomputational modelling of two-dimensional cell spreading for skeletalregeneration. Journal of the Royal Society, Interface. 2007;4(17):1107–1117.38. Buhlmann P. Bagging, boosting and ensemble methods. Handbook ofComputational Statistics. 2012;doi:10.1007/978-3-642-21551-3-33.39. Theodoridis S, Koutroumbas K. Chapter 5 - Feature selection. In: Theodoridis S,Koutroumbas K, editors. Pattern Recognition (Fourth Edition). fourth edition ed.Boston: Academic Press; 2009. p. 261 – 322. Available from: .40. Stone MH. The generalized Weierstrass approximation theorem. MathematicsMagazine. 1948;21(5):237. doi:10.2307/3029337.41. Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv:14126980[cs]. 2017;.42. Keskar NS, Mudigere D, Nocedal J, Smelyanskiy M, Tang PTP. On large-batchtraining for deep learning: generalization gap and sharp minima.arXiv:160904836 [cs, math]. 2017;.May 28, 2020 26/343. Kurganov A, Tadmor E. New high-resolution central schemes for nonlinearconservation laws and convection–diffusion equations. Journal of ComputationalPhysics. 2000;160(1):241–282. doi:10.1006/jcph.2000.6459.44. Lagergren J, Reeder A, Hamilton F, Smith RC, Flores KB. Forecasting anduncertainty quantification using a hybrid of mechanistic and non-mechanisticmodels for an age-structured population model. Bulletin of Mathematical Biology.2018;80(6):1578–1595. doi:10.1007/s11538-018-0421-7.45. Adoteye K, Baraldi R, Flores K, Nardini J, Banks HT, Thompson WC.Correlation of parameter estimators for models admitting multipleparameterizations. International Journal of Pure and Applied Mathematics.2015;105(3):497–522.May 28, 2020 27/34 upporting information

Fig S1 Scratch assay experiment. (a) An illustration of an experiment with theIncuCyte ZOOM ™ system (Essen BioScience, MI USA). Full details of the experimentand image processing can be found in [2]. Cells are seeded uniformly within each well ina 96-well plate at a pre-specified density of between 10,000 and 20,000 cells per well. AWoundMaker ™ (Essen BioScience) is used to create a uniform vertical scratch along themiddle of the well. (b) Microscopy images are collected from a rectangular region of thewell. (c) Example images corresponding to experiments initiated with 12,000, 16,000, or20,000 cells per well. A PC-3 prostate cancer cell line was used. The image recordingtime is indicated on each subfigure and the scale bar corresponds to 300 µ m. The greendashed lines in the images in the top row show the approximate location of the leadingedge created by the scratch. Each image is divided into equally-spaced vertical columns,and the number of cells in each column divided by the column area is calculated to yieldan estimate of the 1-D cell density. ab c May 28, 2020 28/34 ig S2 Simulation model fits.

Predicted cell density profiles using BINNs withthe governing reaction-diffusion PDE in Eq (9). The left subplot corresponds to the setof simulated data using the classical FKPP equation and the right subplot correspondsto the Generalized Porous-FKPP equation. Solid lines represent the numerical solutionto Eq (9) using D MLP , and G MLP . Dashed lines represent the noiseless numericalsimulations of the classical FKPP and Generalized Porous-FKPP equations. Themarkers represent the numerical simulations of the classical FKPP and GeneralizedPorous-FKPP equations with artificial noise generated by the statistical error model inEq (4).

Position (mm) C e ll d e n s i t y ( c e ll s / mm ) Classical FKPP Solution

Position (mm) C e ll d e n s i t y ( c e ll s / mm ) Porous-FKPP Solution

Fig S3 Simulation parameter fits.

The learned diffusivity and growth functions D MLP and G MLP evaluated over cell density u . Starting from the left, the first twosubplots correspond to the learned diffusivity and growth functions from simulated datausing the classical FKPP equation. The last two subplots correspond to the learneddiffusivity and growth functions from simulated data using the GeneralizedPorous-FKPP equation. Solid lines represent the parameter networks D MLP and G MLP and dashed lines represent the true diffusivity and growth functions used to simulatethe data.

Cell density (cells/mm ) D i ff u s i v i t y ( mm / d a y ) Classical FKPP Diffusion

PredTrue

Cell density (cells/mm ) G r o w t h ( / d a y ) Classical FKPP Growth

PredTrue

Cell density (cells/mm ) D i ff u s i v i t y ( mm / d a y ) Porous-FKPP Diffusion

PredTrue

Cell density (cells/mm ) G r o w t h ( / d a y ) Porous-FKPP Growth

PredTrue

May 28, 2020 29/34 ig S4 Reaction-diffusion BINN residuals.

Modified residuals using BINNswith the governing reaction-diffusion PDE in Eq (9). Each subplot corresponds to anexperiment with a different initial cell density (i.e. 10,000, 12,000, 14,000, 16,000,18,000, and 20,000 cells per well). M o d i f i e d r e s i d u a l s Initial cell density: 10,000 cells per well

Initial cell density: 12,000 cells per well M o d i f i e d r e s i d u a l s Initial cell density: 14,000 cells per well

Initial cell density: 16,000 cells per well )10050050100 M o d i f i e d r e s i d u a l s Initial cell density: 18,000 cells per well )10050050100 Initial cell density: 20,000 cells per well

Fig S5 Delay-reaction-diffusion BINN residuals.

Modified residuals usingBINNs with the governing delay-reaction-diffusion PDE in Eq (10). Each subplotcorresponds to an experiment with a different initial cell density (i.e. 10,000, 12,000,14,000, 16,000, 18,000, and 20,000 cells per well). M o d i f i e d r e s i d u a l s Initial cell density: 10,000 cells per well

Initial cell density: 12,000 cells per well M o d i f i e d r e s i d u a l s Initial cell density: 14,000 cells per well

Initial cell density: 16,000 cells per well )10050050100 M o d i f i e d r e s i d u a l s Initial cell density: 18,000 cells per well )10050050100 Initial cell density: 20,000 cells per well

May 28, 2020 30/34 ig S6 Classical FKPP model solutions.

Predicted cell density profiles using theclassical FKPP model in Eq (13). Each subplot corresponds to an experiment with adifferent initial cell density (i.e. 10,000, 12,000, 14,000, 16,000, 18,000, and 20,000 cellsper well). Solid lines represent the numerical solution to Eq (13) using the parametersthat minimize L GLS in Eq (6). The markers represent the experimental scratch assaydata. C e ll d e n s i t y ( c e ll s / mm ) Initial cell density: 10,000 cells per well

Initial cell density: 12,000 cells per well C e ll d e n s i t y ( c e ll s / mm ) Initial cell density: 14,000 cells per well

Initial cell density: 16,000 cells per well C e ll d e n s i t y ( c e ll s / mm ) Initial cell density: 18,000 cells per well

Initial cell density: 20,000 cells per well

Fig S7 Generalized Porous-FKPP model solutions.

Predicted cell densityprofiles using the Generalized Porous-FKPP model in Eq (14). Each subplotcorresponds to an experiment with a different initial cell density (i.e. 10,000, 12,000,14,000, 16,000, 18,000, and 20,000 cells per well). Solid lines represent the numericalsolution to Eq (14) using the parameters that minimize L GLS in Eq (6). The markersrepresent the experimental scratch assay data. C e ll d e n s i t y ( c e ll s / mm ) Initial cell density: 10,000 cells per well

Initial cell density: 12,000 cells per well C e ll d e n s i t y ( c e ll s / mm ) Initial cell density: 14,000 cells per well

Initial cell density: 16,000 cells per well C e ll d e n s i t y ( c e ll s / mm ) Initial cell density: 18,000 cells per well

Initial cell density: 20,000 cells per well

May 28, 2020 31/34 ab S1 Classical FKPP parameter values.

Each column corresponds to anexperiment with different initial cell density (i.e. 10,000, 12,000, 14,000, 16,000, 18,000,and 20,000 cells per well). Initial cell densityParameter 10,000 12,000 14,000 16,000 18,000 20,000 D ( µ m / hr ) 309.7 253.8 681.8 540.9 735.7 978.5 r ( / hr ) 0.0437 0.0438 0.0483 0.0490 0.0540 0.0649 Tab S2 Generalized Porous-FKPP parameter values.

Each columncorresponds to an experiment with different initial cell density (i.e. 10,000, 12,000,14,000, 16,000, 18,000, and 20,000 cells per well).Initial cell densityParameter 10,000 12,000 14,000 16,000 18,000 20,000 D ( µ m / hr ) 1851.8 465.0 2993.4 2371.0 2377.8 2017.1 m (unitless) 0.9704 0.3001 0.9923 0.9879 0.8265 0.6235 r ( / hr ) 0.0435 0.0436 0.0481 0.0484 0.0526 0.0639 Fig S8 Unconstrained BINN terms.

The learned diffusivity D MLP , growth G MLP , and delay T MLP functions extracted from the corresponding BINNs withgoverning reaction-diffusion PDE in Eq (9) (first row) and delay-reaction-diffusion PDEin Eq (10) (second row). Each line corresponds to an experiment with a different initialcell density (i.e. 10,000, 12,000, 14,000, 16,000, 18,000, and 20,000 cells per well). Notethat D MLP and G MLP have different lengths since they are evaluated between theminimum and maximum observed cell densities corresponding to each data set. D i ff u s i v i t y ( mm / d a y ) D MLP G r o w t h ( / d a y ) G MLP D i ff u s i v i t y ( mm / d a y ) D MLP G r o w t h ( / d a y ) G MLP S c a l e ( un i t l e ss ) T MLP

May 28, 2020 32/34 ig S9 Statistical error model selection.

The function-approximating deepneural network u MLP is trained using L GLS for different values of γ across each data set.Each subplot shows the modified residuals (see Eq (6)) as a function of the predictedcell density u . The columns correspond to different levels of proportionality (i.e. γ = 0 . , . , . , .

6) where γ = 0 . M o d i f i e d R e s i d u a l s a = 0.0 = 0.2 = 0.4 = 0.6 M o d i f i e d R e s i d u a l s b M o d i f i e d R e s i d u a l s c M o d i f i e d R e s i d u a l s d M o d i f i e d R e s i d u a l s e M o d i f i e d R e s i d u a l s f May 28, 2020 33/34 ig S10 PDE random sampling validation.

The BINNs framework is trainedusing L Total with three ways of including the PDE error term L Total : (a) no PDEregularization, (b) PDE regularization at the data locations, and (c) PDE regularizationat 10,000 randomly sampled points at each training iteration. The first column showsthe scratch assay data with initial cell density 20,000 cells per well (black dots) with thecorresponding BINNs approximation to the governing PDE u MLP (surface plot). Thesecond column shows heatmaps of the modified residual errors (see Eq (6)) at each datapoint. The third column shows heatmaps of the PDE errors (see Eq (7)) evaluated on a100 ××