A Faster DiSH: Hardware Implementation of a Discrete Cell Signaling Network Simulator
Kevin Gilboy, Khaled Sayed, Niteesh Sundaram, Kara Bocan, Natasa Miskov-Zivanov
AA Faster DiSH: Hardware Implementation of aDiscrete Cell Signaling Network Simulator
Kevin Gilboy, Khaled Sayed, Niteesh Sundaram, Kara Bocan, and Natasa Miskov-Zivanov
University of PittsburghPittsburgh, PA, USA, 15213Email:[email protected]
Abstract —Development of fast methods to conduct in silico experiments using computational models of cellular signaling is apromising approach toward advances in personalized medicine.However, software-based cellular network simulation has run-times plagued by wasted CPU cycles and unnecessary processes.Hardware-based simulation affords substantial speedup, butprior attempts at hardware-based biological simulation have beenlimited in scope and have suffered from inaccuracies due topoor random number generation. In this work, we propose sev-eral hardware-based simulation schemes utilizing novel randomupdate index generation techniques for step-based and round-based stochastic simulations of cellular networks. Our resultsshow improved runtimes while maintaining simulation accuracycompared to software implementations.
I. I
NTRODUCTION
Biological systems are stochastic in nature; biochemicalreactions have a certain probability of occurring, even whenthe concentrations of reactants are known, and the environ-mental conditions are controlled. This is in direct contrast to adeterministic system, in which a given set of inputs will alwaysproduce the same set of outputs. Computational models canmimic the randomness of biological systems to gain insightinto the nature of particular biological mechanisms, such asthe role a specific ligand plays in a signaling network, or thesubstance concentration needed to initiate a reaction. In addi-tion, computational models have been shown to be invaluableas researchers can run a large number of in silico experimentsthat would be inefficient or impractical if attempted in vivo or in vitro . Such models allow for predicting wet-lab outcomesand for shedding insights into the nature of biological systems.As computational models are versatile and can be constantlyupdated via input from the user, they have the potential to beused in clinical applications such as personalized medicine. Amodel of a physiological system could be created and thenmodified in real time to more accurately reflect the uniquecharacteristics of a specific person. The model could then beused to predict patient outcomes and inform treatments.Although many such personalized medicine software (SW)models exist, they are slower and less efficient than hardware(HW) approaches [1], [2], [3], [4]. Implementation in FPGAsallows running concurrent models at a speed superior tocurrent SW models, while still maintaining accuracy andreliability. In this paper, we describe HW emulation of adiscrete model of T cell differentiation in SystemVerilog, andcompare the accuracy and runtime with SW simulation for different simulation schemes and scenarios. These simula-tion schemes and scenarios represent stochasticity in cellularsignaling networks and different component conditions andchanges that occur in these networks. Selecting elements forupdate in such simulations is a non-trivial task, thereforewe also discuss and implement novel random update indexgeneration techniques for stochastic simulation schemes.II. M ETHODS
A T cell differentiation model was used in this work tocompare HW and SW simulation. The model is described in[5], where it was analyzed using the SW simulator describedin [6]. In this work, we compared HW and SW simulationof this model using simulation schemes described in [7],where an executable model consists of elements or elementgroups that are updated simultaneously within the group. Eachelement has an associated update rule, and elements/groupshave a certain probability of their rules being executed in agiven simulation step or round. Here we implemented: the si-multaneous (SMLN) scheme; random-order sequential (RSQ)scheme, both round-based (RB-RSQ) and step-based (SB-RSQ); and grouped random-order sequential (RSQ-g) scheme,both round-based (RB-RSQ-g) and step-based (SB-RSQ-g).The HW framework and simulation schemes were imple-mented in SystemVerilog and simulated using ModelSim. Forthe SW comparison, we used the DiSH (Discrete, Stochastic,Heterogeneous) SW simulator, previously described in [7].
A. HW Framework
A schematic of the HW simulator is shown in Figure 1.The simulation schemes are implemented and selected withinthe
Rule Selector block, and each simulation is run for aspecified number of steps or rounds. In addition, the model canbe initialized with different element values at the beginning ofeach simulation.A
Start signal to the
Control Path module promptsthe
Rule Selector to randomly generate an index viathe
Enable pin. If the
Valid Rule pin indicates avalid index,
Control Path asserts a high
Load sig-nal to the
Inhibitor , Updated , and
Current StateRegisters , allowing their values to be modified on thenext clock cycle. The index is used as a
Select inputfor the
Current State Register , which keeps trackof the state of each element in the model. The output a r X i v : . [ q - b i o . M N ] N ov ontrol Path Load Next StateEnable RNGValid RuleStart
Rule Selector
Valid RuleSeed RuleEnableIs steady state? Steady
RNG Seed
Current State Register
Data InInitial State LoadSelect Data Out
Updated Register
Data InInitial State LoadSelect Data Out
Inhibitor Register
Data InInitial State LoadSelect Data Out
Select Inhibitor Load InhibitorNumRulesInitial State
Comparator
Input 1Input 2Equal?
Comparator
Input 1Input 2Equal?
Previous StateRegister
Data InLoadData Out
NumRules
Network Logic
Next StateCurrent StateLoad UpdatedLoad Last State
Fig. 1. Hardware framework for simulating cellular networks with differentsimulation schemes and random update index generation. ( Data Out ) of the
Current State Register is thenbitmasked with output (
Data Out ) from the
InhibitorRegister , where the inhibitor is chosen by the initialconditions (
Select Inhibitor ). The result of the com-bined current state and inhibition is stored in the
PreviousState Register and passed into the
Network Logic module, which contains the update rules of the model. Thecurrent state is passed on the
Current State pin, and the
Network Logic module outputs the next state of the systemon the
Next State pin. The result is stored in the
CurrentState Register . The circuit is “steady” once all ruleshave been run, as tracked by the
Updated Register .The upper
Comparator module compares the updated ele-ments/groups with the complete list of elements/groups, assert-ing the
Steady pin on the
Control Path if they are equal.The circuit is said to be in “steady state” if the current state isequal to the previous state. The lower
Comparator modulecompares the output of the
Current State Register to that of the
Previous State Register and asserts the
Is steady state? pin on the
Control Path if theyare equal.
B. RNG Algorithms
While the SMLN scheme is deterministic, all other schemesrequire random number generation (RNG) to generate a ran-dom rule index that selects elements/groups and executes theirupdate rules in each simulation round/step. We describe heretwo novel HW-based algorithms for RNG that we created andimplemented within the
Rule Selector block in Figure 1.
1) Round-based RNG:
In the round-based simulationschemes, each element/group is updated at least once in agiven round; therefore each round consists of a number of stepsequal to the number of elements/groups in the model. Ourround-based RNG algorithm allows linear runtime with respectto the number of elements/groups in the model by eliminatingthe possibility of duplicate or invalid rule index generation.The operation is illustrated in Figure 2. Two parallel registerarrays, A and B, are utilized as stacks for storing itemsconsisting of a rule index (“Priority”) and randomly generatednumber (“Value”). Stack A always starts empty and growsupward while Stack B starts out full and recedes downward.With each step, a new item is pushed up Stack A. If the newitem’s Value is greater than an existing Value, the Priority ofthe new item is increased by one. Otherwise, the Priority ofthe existing item is increased by one. If there are no otherValues on the stack, the rule takes a default priority of 0.Simultaneously to the generation of Stack A, an item is poppedoff of B, and its Priority determines which element/group isupdated in the current step. When a round is completed, StackB is empty and Stack A is full. The items from A are thenloaded into B, Stack A is flushed, and the parallel generationand popping of items repeats. At the start of the simulation,where both stacks are empty, items must be generated to fillStack B before a round can begin. After the initial generationof Stack B, the coincident generation and popping of itemsallows linear runtime without duplicates or invalid rules.
2) Step-based RNG:
A RNG algorithm was also developedfor the step-based scheme, where the same element/groupcan potentially be updated multiple times before other ele-ments/groups are updated. Directly using the RNG numberas the rule index is highly inefficient for a model that has anumber of elements/groups not equal to a power of two: amodel with 37 elements would need 6 bits to represent eachof the potential element updates, leading to −
37 = 27
RNG outputs (42%) counting as a costly “miss”. Here, wedetermine the rule index ( I ) using Equation 1, where X is theRNG number and E is the number of elements/groups. Wealso use 10 RNG bits ( n = 10) to approach uniform probabilityof generating any index, calculated by setting a threshold of (2 n mod E ) / n < . I = X mod E (1) ig. 2. The RB-RSQ rule selection scheme implementation for a sample 16-rule system, utilizing a linear feedback shift register (LFSR). A 16-element/groupsystem as diagrammed here would be = 8 bits wide and 16 registers tall. Each register in the array could be thought of as being logically dividedin half, with the least significant bits corresponding to a Value, and the most significant bits indicating the Priority. C. HW and SW Comparison
We simulated eight scenarios representing different inputvalue configurations, and compared the SW and HW simula-tors by studying the responses of FOXP3 and IL2 (markersof regulatory or helper T cell phenotypes, respectively). TableI shows the initial values for input signals that were variedfor the eight scenarios: antigen dose affecting T cell receptor(TCR) signal strength, transforming growth factor beta ligand(TGF), and the inhibitor of protein kinase B (AKT off). Thepercentages listed for the Toggle scenarios in Table I representthe percentage of simulation steps/rounds that were completedbefore the protein’s value was toggled. Of the variables notlisted in Table I, CD28, PTEN, TSC, CD122, and CD132 wereinitialized to 1, and all the other variables were initialized to 0,to mimic the na¨ıve T cell phenotype at the beginning of eachsimulation. We ran all round-based simulations for 30 rounds,and all step-based and SMLN simulations for 2000 steps [5].We ran each simulation scheme 200 times from the initial tosteady state and calculated average trajectories, according tomethodology from [8]. III. R
ESULTS
Figures 3 and 4 show the difference between averagetrajectories obtained with the HW simulator and the SWsimulator for FOXP3 and IL2 and each simulation scheme.Average root-mean-square error (RMSE) across all scenariosfor each simulation scheme is shown in Table II.Runtime comparisons are shown in Table III for eachsimulation scheme. For HW runtimes, the number of clockcycles used as reported by ModelSim was divided by 50 MHz,a common FPGA base clock frequency. This was comparedto SW runtime as performed on a 2015 Apple MacBook Pro
TABLE IS
IMULATION SCENARIOS AND INITIAL VALUES
Scenario TCR high TCR low TGFbeta AKT off Toggle1 1 0 0 0 -2 0 1 0 0 -3 1 0 1 0 -4 0 1 1 0 -5 1 0 0 1 -6 1* 0 0 0 20.00%7 1* 0 0 0 26.67%8 1* 0 0 0 33.33%* Initial value before toggle TABLE IIRMSE
AVERAGED ACROSS ALL SIMULATION SCENARIOS
Scheme% Error RB-RSQ SB-RSQ RB-RSQ-g SB-RSQ-g SMLNFOXP3 1.13 11.18 10.86 6.22 14.99IL2 0.99 10.79 13.00 5.41 6.23 (3.1 GHz Intel i7 processor). The HW implementation had amedian speedup of 54.4X for round-based simulations, and426.7X for step-based simulations. The greatest speedup wasprovided for the SMLN scheme, followed by RSQ-g and thenRSQ. IV. D
ISCUSSION
Our results show that all five simulation schemes in HWproduced results comparable to the SW simulator, with all ig. 3. Difference between HW and SW simulator output for FOXP3 andIL2 with the RSQ scheme (top), and the RSQ-g scheme (bottom).Fig. 4. Difference between HW and SW simulator output for FOXP3 andIL2 with the SMLN scheme.
RMSEs under 15%. The round-based implementation hadthe smallest RMSE, but with the smallest runtime speedup,making it a Las Vegas type algorithm. Conversely, the SMLNscheme showed the greatest runtime speedup but greatestRMSE, making it a Monte Carlo type algorithm. The groupedstep-based implementation had the second smallest RMSE andthe fastest runtime speedup for all the RSQ methodologies.The step-based HW implementation was expected to providegreater speedup than the round-based implementation becausethe round-based RNG relies on pushing numbers onto astack at linear runtime. Similarly, the SMLN schemes showmuch greater speedup than the other schemes, because everyelement’s value is updated in each step without the need forRNG. However, the RSQ schemes are more desirable formodeling stochasticity in biological networks [1], [2], [3], [4].The speedups for RSQ simulation schemes were similar toprevious stochastic simulation work by Yoshimi et al. [2], andHW runtimes were similar to those reported in [4], despitedifferences in simulator architectures.All non-Toggle HW simulations approached zero differencefrom the SW simulator at steady state. For the Toggle scenar-ios, a large deviation occurred subsequent to the toggle ofthe antigen dose, and the steady state values were differentbetween the SW and HW simulators, particularly for the
TABLE IIIR
UNTIME COMPARISON FOR EACH SIMULATION SCHEME .Runtime [ms]Scheme Steps/Rounds SW HW Speedup
Round-Based Schemes
RB-RSQ 1929500 1062.5 38.6 27.5XRB-RSQ-g 1216500 1253.6 24.3 51.5XSMLN 11300 645.8 0.2 2857.5XRB-RSQ Toggle 1929500 1027.6 38.6 26.6XRB-RSQ-g Toggle 1216500 1394.7 24.3 57.3XSMLN Toggle 11300 688.9 0.2 3048.3X
Step-Based Schemes
SB-RSQ 401300 3225.1 8.0 401.8XSB-RSQ-g 303537 2745.0 6.1 452.2XSB-RSQ Toggle 401300 3036.9 8.0 378.3XSB-RSQ-g Toggle 303537 2741.7 6.1 451.6X grouped schemes. This can be likened to a logic circuit whoseinputs change before the output has stabilized.The most likely source of error in the RSQ methodologieslies in comparing the randomness of the HW RNG and therandomized SW update rule selection. This is evidenced by thegreater differences in transient behavior compared to steadystate. Comparing the HW and SW simulators with the samesequence of element/group updates could provide more similartransient results, and could be investigated in future work.The SMLN scheme implementations showed differences atthe start of the simulation, but reached zero difference at steadystate. In the SMLN scheme, each element not governed bya forced initial condition was given a random initial state,leading to possible initial states. Since only 200 of theseinitial states were tested, and the 200 states used in the HWmodel most likely differed from the 200 states used in the SWmodel, deviations are expected especially at the start of thesimulation due to different initial conditions. In a simulationwhere the HW and SW models utilize the same initial states,the HW and SW simulation results are the same.V. C ONCLUSION
In this work, we designed and emulated five HW-based sim-ulation schemes and two random update index generation tech-niques that were accurate in predicting phenotype decisionsin T cells and afforded orders of magnitude runtime speedupcompared to SW simulation. For the T cell model in this work,the round-based implementation was most accurate to SWbut provided the smallest runtime speedup, while the SMLNimplementation provided the fastest result although with loweraccuracy to SW. Future work includes parallelization, as wellas improving the random update index generation to increasethe accuracy without the cost of runtime.A
CKNOWLEDGMENT
This work is supported in part by DARPA Big Mechanismaward W911NF-14-1-0422.
EFERENCES[1] L. Salwinski and D. Eisenberg, “In silico simulation of biologicalnetwork dynamics,”
Nature Biotechnology , vol. 22, no. 8, pp. 1017–1019,2004. [Online]. Available: http://dx.doi.org/10.1038/nbt991[2] M. Yoshimi, Y. Iwaoka, Y. Nishikawa, T. Kojima, Y. Osana, A. Funahashi,N. Hiroi, Y. Shibata, N. Iwanaga, and H. Yamada, “FPGA implementationof a data-driven stochastic biochemical simulator with the next reactionmethod,” in . IEEE, 2007, Conference Proceedings, pp. 254–259.[3] N. Miskov-Zivanov, A. Bresticker, D. Krishnaswamy, S. Venkatakrishnan,P. Kashinkunti, D. Marculescu, and J. R. Faeder, “Regulatory networkanalysis acceleration with reconfigurable hardware,” in . IEEE, 2011, Conference Proceedings, pp. 149–152.[4] N. Miskov-Zivanov, A. Bresticker, D. Krishnaswamy, S. Venkatakrishnan,D. Marculescu, and J. R. Faeder, “Emulation of biological networks inreconfigurable hardware,” in
Proceedings of the 2nd ACM Conferenceon Bioinformatics, Computational Biology and Biomedicine . 2147893:ACM, 2011, Conference Proceedings, pp. 536–540.[5] N. Miskov-Zivanov, M. S. Turner, L. P. Kane, P. A. Morel, and J. R.Faeder, “The duration of T cell stimulation is a critical determinant ofcell fate and plasticity,”
Science Signaling , vol. 6, no. 300, p. ra97, 2013.[6] I. Albert, J. Thakar, S. Li, R. Zhang, and R. Albert, “Boolean networksimulations for life scientists,”
Source Code for Biology and Medicine ,vol. 3, p. 16, 2008.[7] K. Sayed, Y. H. Kuo, A. Kulkarni, and N. Miskov-Zivanov, “DiSHsimulator: Capturing dynamics of cellular signaling with heterogeneousknowledge,” in , Dec 2017,pp. 896–907.[8] K. Sayed, C. A. Telmer, and N. Miskov-Zivanov, “Motif modeling forcell signaling networks,” in2016 8th Cairo International BiomedicalEngineering Conference (CIBEC)