Deep Neural Network Based Differential Equation Solver for HIV Enzyme Kinetics
Joseph Stember∗    Parvathy Jayan∗    Hrithwik Shalu

February 18, 2021

Memorial Sloan Kettering Cancer Center, New York, NY, US, 10065
Indian Institute of Science Education and Research, Tirupati, India, 517507
Indian Institute of Technology, Madras, Chennai, India, 600036

[email protected]  [email protected]  [email protected]

Abstract
Purpose
We seek to use neural networks (NNs) to solve a well-known system of differential equations describing the balance between T cells and HIV viral burden.
Materials and Methods
In this paper we employ a 3-input parallel NN to approximate solutions for the system of first order ordinary differential equations describing the above biochemical relationship.
Results
The numerical results obtained by the NN are very similar to a host of numerical approximations from the literature.
Conclusion
We have demonstrated use of NN integration of a well-known and medically important system of first order coupled ordinary differential equations. Our trial-and-error approach counteracts the system's inherent scale imbalance. However, it highlights the need to address scale imbalance more substantively in future work. Doing so will allow more automated solutions to larger systems of equations, which could describe increasingly complex and biologically interesting systems.

∗Equal contribution

Introduction

Differential equations underpin essentially all of science, engineering, and finance. They describe how the behavior of dynamical systems unfolds over time. Most differential equations cannot be solved symbolically, i.e., a closed form solution does not exist. Almost all systems that are of interest for studying practical applications must therefore be approximated numerically. Studying these systems centers on devising the appropriate numerical approximation scheme, balancing the benefit of being able to obtain a solution against the deviation from a "true" answer that the approximation introduces.

With the recently surging popularity of deep learning and neural networks (NNs), new interest has arisen in solving differential equations with NNs. In the 1990s, Lagaris et al. proposed and implemented as proof-of-principle NN-based solutions to some illustrative ordinary and partial differential equations [4]. Dockhorn [3] applied the approach to solving the Poisson and Navier-Stokes Equations. Liu et al. [5] applied a NN approach to solving the Laplace Equation. In quantum mechanics, Sehanobish et al. [10] used NNs to compute solutions for the potential energy function from Schrödinger's Equation. In classical mechanics, Mattheakis et al.
[6] solved Hamilton's Equations for positions and momenta, besting the fidelity of numerical solution phase space diagrams for both periodic and chaotic dynamical systems.

Chemical kinetics describes the rate of chemical conversions. It is an integral component in systems biology, in which large biochemical systems are studied quantitatively in order to understand biological behavior and disease processes, also forming an important tool in pharmacology. This is a rich field with meaningful applications in basic science and health care. As such, we seek to apply NNs to solve the differential equations that describe chemical kinetics.

We endeavor specifically to do so for a particularly well known chemical kinetics system, a model for the interplay of human immunodeficiency virus (HIV) and immune system CD4+ T cells, which the virus invades and attacks. This is the mechanism by which HIV degrades the human immune system. Particularly before the widespread use of highly active anti-retroviral therapy (HAART), this rendered victims susceptible to opportunistic infections that would normally be asymptomatic in the presence of a healthy immune system.

Methods
Model and parameters
The HIV kinetic model seeks to quantify the relative amounts/concentrations of the uninfected but susceptible T cells (not all T cells are susceptible to infection), infected T cells, and HIV virus particles over time. These are denoted by T(t), I(t), and V(t), respectively. The system of ordinary first order differential equations connecting these variables can be formulated as [2, 9]:

\frac{dT}{dt} = p - \alpha T + rT\left(1 - \frac{T + I}{T_{\max}}\right) - kVT
\frac{dI}{dt} = kVT - \beta I
\frac{dV}{dt} = N\beta I - \gamma V    (1)

subject to the initial conditions:

T(0) = T_0, \quad I(0) = I_0, \quad V(0) = V_0    (2)

Regarding the parameters in Equation 1:
• p is the rate of T cell production in the bone marrow and thymus
• α is the natural turnover rate of uninfected T cells
• r is the rate of T cell mitosis, or division
• T_max is the maximum concentration of T cells in the bloodstream
• k is the rate constant for infection by the HIV virus
• N is the number of infectious free viral particles (virions) produced per infected T cell [8]
• β is the natural turnover rate of infected T cells
• γ is the natural turnover rate of virus particles

We seek to solve Equation 1 using the experimentally known quantities [2, 7, 1, 11]:

T_0 = 0.1, \quad I_0 = 0, \quad V_0 = 0.1, \quad p = 0.1,
\alpha = 0.02, \quad \beta = 0.3, \quad \gamma = 2.4, \quad r = 3,
k = 0.0027, \quad T_{\max} = 1500, \quad N = 10    (3)

We solve via NN loss minimization. In order to do this, we first subtract off the right hand sides of Equation 1:

\frac{dT}{dt} - \left(p - \alpha T + rT\left(1 - \frac{T + I}{T_{\max}}\right) - kVT\right) = 0
\frac{dI}{dt} - (kVT - \beta I) = 0
\frac{dV}{dt} - (N\beta I - \gamma V) = 0    (4)

Addressing scale imbalance
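As a concrete reference for the size scales involved, the system of Equation 1 can be integrated with a simple forward-Euler sketch. The parameter values below follow the standard set used in the comparison literature for this model and should be read as assumptions, and forward Euler merely stands in for the more careful schemes cited later:

```python
# Forward-Euler sketch of the HIV kinetic model (Equation 1).
# Parameter defaults are assumed literature values, not taken verbatim
# from this paper's text.

def rhs(T, I, V, p=0.1, alpha=0.02, r=3.0, T_max=1500.0,
        k=0.0027, beta=0.3, N=10.0, gamma=2.4):
    dT = p - alpha * T + r * T * (1 - (T + I) / T_max) - k * V * T
    dI = k * V * T - beta * I
    dV = N * beta * I - gamma * V
    return dT, dI, dV

def euler(T0=0.1, I0=0.0, V0=0.1, t_end=1.0, dt=1e-4):
    # Forward-Euler integration from the assumed initial conditions.
    T, I, V = T0, I0, V0
    for _ in range(int(t_end / dt)):
        dT, dI, dV = rhs(T, I, V)
        T, I, V = T + dt * dT, I + dt * dI, V + dt * dV
    return T, I, V
```

With these assumed parameters, the three integrated values at t = 1 sit several orders of magnitude apart, which is exactly the imbalance discussed in this section.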
As seen in Figure 3 as well as earlier numerical results (Tables 1–3), in general, T(t) ≫ V(t) ≫ I(t). More specifically, T(t) values are on the order of 10^{-1}–10^0, whereas V(t) takes values on the order of 10^{-3}–10^{-1}, with I(t) much lower at around 10^{-6}–10^{-5}. In general, the order of magnitude of a function being approximated by a NN is reflected in the network's loss.

Hence, a loss value that is significant enough to update weights for the NN approximating I(t) would have a negligible effect upon that for V(t), and the latter would fail to train/update, producing the vanishing gradient problem. On the other hand, a loss significant enough to update V(t) would have an outsized effect on the weights for I(t), engendering the exploding gradient problem.

In order to ameliorate this problem, we multiply the dI/dt and dV/dt equations by scaling factors in order to bring all quantities toward the same order of magnitude. Based on some trial and error, as well as the relative typical size scale of the values as mentioned above, the modified set of differential equations that we solve with the NN from Figure 1 is given by:

\frac{dT}{dt} - \left(p - \alpha T + rT\left(1 - \frac{T + I}{T_{\max}}\right) - kVT\right) = 0
10{,}000 \times \left(\frac{dI}{dt} - (kVT - \beta I)\right) = 0
10 \times \left(\frac{dV}{dt} - (N\beta I - \gamma V)\right) = 0    (5)

Two major drawbacks to this approach are:
1. It requires a process of trial and error that cannot be automated. Furthermore, this would soon become infeasible for even slightly larger systems of differential equations.
2. Equation 1 represents a coupled system of differential equations. For example, knowledge of both I(t) and V(t) is required to calculate T(t). This interdependence means that we cannot rescale (via multiplication) any one line of Equation 1 without altering all of the functional values. Hence, although it works in this example, the approach of Equation 5 is not in general a reliable one for bringing functions to the same general order of magnitude.
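A minimal sketch of the resulting loss computation, combining the per-equation residuals by mean absolute error; the specific scale factors below (10,000 for the dI/dt residual and 10 for the dV/dt residual) are illustrative assumptions:

```python
# Mean-absolute-error combination of the three equation residuals,
# with per-equation scale factors (assumed values for illustration)
# that push all residuals toward a common order of magnitude.

def mae(residuals, scale):
    # Scaled mean absolute error over a batch of pointwise residuals.
    return sum(abs(r) * scale for r in residuals) / len(residuals)

def scaled_residual_loss(res_T, res_I, res_V):
    # res_* hold the pointwise residuals d(.)/dt - RHS at sampled times.
    return mae(res_T, 1.0) + mae(res_I, 10_000.0) + mae(res_V, 10.0)
```

Minimizing this sum drives all three residuals toward zero while giving the small-magnitude I(t) residual a gradient comparable to the others.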
Neural network

Having made the above adjustments, our loss will be the left hand side of Equation 5. We can enforce the equality to arbitrary accuracy by minimizing the loss to within a particular threshold. For the time derivative terms, we substitute the finite difference numerical approximations:

\frac{dT}{dt} \approx \frac{T(t_i + \Delta t) - T(t_i)}{\Delta t}
\frac{dI}{dt} \approx \frac{I(t_i + \Delta t) - I(t_i)}{\Delta t}
\frac{dV}{dt} \approx \frac{V(t_i + \Delta t) - V(t_i)}{\Delta t}    (6)

Figure 1: Deep neural network architecture.

where t_i is the time value with which we estimate the slope and Δt is a small time increment. We note that Equation 6 becomes a set of strict equalities as Δt → 0. Hence, the smaller the value we choose for Δt, the better the approximation.

The one input into our neural network is time t. The domain on which the function is defined is given by the span of t values on which the network is trained. The architecture of the relatively simple NN we employ here is shown in Figure 1. The input t is passed separately through a parallel set of network branches. We use the three parallel branches/multi-input structure because we are approximating three different functions.

For the T(t) branch, this consists of two fully connected layers of 32 nodes, each with sine activation. The sine function normalizes to between -1 and 1, ensuring that network weights do not explode or vanish. It is used for the T(t) branch because T(t) is on a higher scale of magnitude compared to I(t) and V(t). Since I(t) and V(t) take much lower values, their NN branches use ReLU activation. ReLU could, given sizable input values, produce large values in the intermediate layers or loss.
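The branch structure described here can be sketched in plain Python as a conceptual stand-in for the PyTorch implementation; the uniform weight initialization, the hidden widths of the I(t) and V(t) branches, and the linear output layers are assumptions:

```python
import math
import random

random.seed(0)

def dense(n_in, n_out):
    # Hypothetical small uniform initialization (an assumption).
    return [[random.uniform(-0.5, 0.5) for _ in range(n_in)]
            for _ in range(n_out)]

def apply(W, x, act):
    # One fully connected layer followed by an activation.
    return [act(sum(w * xi for w, xi in zip(row, x))) for row in W]

relu = lambda z: max(z, 0.0)
linear = lambda z: z

# T(t) branch: two 32-node layers with sine activation (bounded in [-1, 1]).
T_branch = [(dense(1, 32), math.sin), (dense(32, 32), math.sin),
            (dense(32, 1), linear)]
# I(t) and V(t) branches: ReLU layers plus an extra 16-node layer.
I_branch = [(dense(1, 32), relu), (dense(32, 32), relu),
            (dense(32, 16), relu), (dense(16, 1), linear)]
V_branch = [(dense(1, 32), relu), (dense(32, 32), relu),
            (dense(32, 16), relu), (dense(16, 1), linear)]

def forward(branch, t):
    x = [t]
    for W, act in branch:
        x = apply(W, x, act)
    return x[0]  # scalar function value

def model(t):
    # One time input fans out to three parallel branches: a 3-node output.
    return forward(T_branch, t), forward(I_branch, t), forward(V_branch, t)
```

Because the sine branch's hidden activations lie in [-1, 1], its output is bounded by the magnitude of the output-layer weights; the ReLU branches carry no such bound, which is the concern raised just above.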
However, this is not of concern given the typically low values of I(t) and V(t).

As they tend to be of lower magnitude, I(t) and V(t) require more approximation power, and thus have an additional fully connected layer of 16 nodes before producing the single node outputs (representing the scalar function values I(t) and V(t)).

Hence, the three branches together produce a 3-node output. Each of the output nodes, T(t), I(t), and V(t), is compared to the numerical values of Equation 5 using the approximations from Equation 6. We use mean absolute error for our loss function.

We employ the following parameters/hyperparameters in training our NN:
• Adam optimizer
• Learning rate: × −
• Training time: 3,000 epochs
• Training set size: 256 time values
• Testing set size: 128 time values

The input time values t that the network samples span from t_min = 0 to t_max = 1. Outside of the interval [0, 1], the NN will not learn how to fit the functions. We sampled within this interval with distinct training and validation sets as follows:
• Training set: t values selected from a uniform random distribution.
• Testing set: t values selected from a uniformly spaced distribution; all t values generated in the same epoch form a grid in which each t is equally spaced.

All calculations were executed in Google Colab Pro, which runs in the Python language. We made use of the PyTorch module for NN design and training, also using the neurodiffeq library, which is specifically designed for solving differential equations with NNs.

Results
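The comparisons below read the solved functions off at fixed time points; given a discrete grid of network outputs, this reduces to interpolation. A minimal linear-interpolation sketch, where the grid values are placeholders rather than the paper's actual outputs:

```python
def interpolate(ts, ys, t):
    # Piecewise-linear interpolation of tabulated values; ts must be ascending.
    for i in range(len(ts) - 1):
        if ts[i] <= t <= ts[i + 1]:
            w = (t - ts[i]) / (ts[i + 1] - ts[i])
            return (1 - w) * ys[i] + w * ys[i + 1]
    raise ValueError("t lies outside the sampled interval")

# Placeholder grid standing in for sampled network outputs on [0, 1].
ts = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [0.10, 0.26, 0.55, 1.20, 2.57]
```

A call such as interpolate(ts, ys, 0.2) then lines up against the t = 0.2 rows of the comparison tables.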
The loss during training is shown in Figure 2. Effective training is manifested by monotonically decreasing loss for both the training and validation sets.

Figure 2: Loss of the neural network as a function of training time (epochs).

The computed values for T(t), I(t), and V(t) are shown in Figure 3. The expected trend of T(t) increasing over time and being of a higher order of magnitude compared to I(t) and V(t) is apparent. The overall trend matches well with previously published results [1].

Figure 3: Plots of the solved concentrations over time.

By identifying the interpolated values of T(t) at particular time values, we can compare with numerical approximation methods previously used to approximate the solution to Equation 1. The NN values are tabulated in the second-to-left-most column in Table 1. They are juxtaposed with the corresponding values from various leading numerical approximation methods, and we can see that the values are very close. It should be noted that, despite the apparent increase in T cells implied by Figure 3, this represents a sub-population of T cells on a relatively short time scale. This is in fact bounded by overall T cell depletion at a longer time scale due to viral destruction [8], as would be expected, noting in fact that one definition of AIDS is a total CD4 count below 200 cells/mm³.

Conclusions
If we were to use the numerical approximations as gold standards, the above would be a demonstration of high accuracy. We know from prior work that NN-based integration is in fact more accurate than approximate numerical solutions and is often indistinguishable from the results of analytical integration in cases where a closed form solution is possible [6].

An important limitation of the current approach is the aforementioned issue of scale imbalance. Although we achieved good results for this system with a trial and error approach that involved multiplying two of the equations by scaling factors, this is not guaranteed to work on other systems. In fact, the lack of a systematic method to achieve scale balance would be expected to hinder integration for larger, more complex systems that would be of interest in, for instance, systems biology. Future work will focus on new techniques to overcome this challenge.

t     Proposed Method  HDM [2]     LADM-Padé [7]  Runge-Kutta  MVIM [1]    VIM [1]     BCM [11]
0     0.1              0.1         0.1            0.1          0.1         0.1         0.1
0.2   0.20305557       0.20880727  0.20880727     0.20880808   0.20880808  0.20880732  0.20386165
0.4   0.39913343       0.40610526  0.40610526     0.40624053   0.40624079  0.40613465  0.38033093
0.6   0.75553419       0.76114677  0.76114677     0.76442388   0.76442872  0.76245303  0.69546237
0.8   1.39635758       1.37731985  1.37731985     1.41404683   1.41409417  1.39788058  1.27596244
1     2.56711673       2.32916976  2.32916976     2.59159480   0.20880808  2.50674666  2.38322774

Table 1: Numerical comparison of T(t)

t     Proposed Method  HDM [2]     LADM-Padé [7]  Runge-Kutta  MVIM [1]    VIM [1]     BCM [11]
0     0.1              0.1         0.1            0.1          0.1         0.1         0.1
0.2   0.06334023       0.06187996  0.06187996     0.06187984   0.06187990  0.06187995  0.06187991
0.4   0.03875031       0.03831324  0.03831324     0.03829488   0.03829595  0.03830820  0.03829493
0.6   0.02408340       0.02439174  0.02439174     0.02370455   0.02371029  0.02392029  0.02370431
0.8   0.01513445       0.00996721  0.00996721     0.01468036   0.01470041  0.01621704  0.01467956
1     0.00962319       0.00330507  0.00330507     0.00910084   0.00915723  0.01608418  0.02370431

Table 2: Numerical comparison of V(t)
Table 3: Numerical comparison of I(t) (values on the order of 10^{-6}–10^{-5})

References

[1] Abdon Atangana and Ernestine Alabaraoye. "Solving a system of fractional partial differential equations arising in the model of HIV infection of CD4+ cells and attractor one-dimensional Keller-Segel equations". In: Advances in Difference Equations.
[2] In: BioMed Research International.
[3] Tim Dockhorn. A Discussion on Solving Partial Differential Equations using Neural Networks. 2019. arXiv preprint.
[4] I.E. Lagaris, A. Likas, and D.I. Fotiadis. "Artificial neural networks for solving ordinary and partial differential equations". In: IEEE Transactions on Neural Networks. issn: 1045-9227. doi: 10.1109/72.712178. url: http://dx.doi.org/10.1109/72.712178.
[5] Zeyu Liu, Yantao Yang, and Qingdong Cai. "Neural network as a function approximator and its application in solving differential equations". In: Applied Mathematics and Mechanics.
[6] Marios Mattheakis et al. "Hamiltonian neural networks for solving equations of motion". In: arXiv preprint arXiv:2001.11107 (2020).
[7] Mevlüde Yakit Ongun. "The Laplace Adomian decomposition method for solving a model for HIV infection of CD4+ T cells". In: Mathematical and Computer Modelling.
[8] In: Mathematical Biosciences.
[9] In: SIAM Review.
[10] Arijit Sehanobish et al. Learning Potentials of Quantum Systems using Deep Neural Networks. 2021. arXiv preprint.
[11] Şuayip Yüzbaşı. "A numerical approach to solve the model for HIV infection of CD4+ T cells".