[PDF] A Faster DiSH: Hardware Implementation of a Discrete Cell Signaling Network Simulator

Abstract

Development of fast methods to conduct in silico experiments using computational models of cellular signaling is a promising approach toward advances in personalized medicine. However, software-based cellular network simulation has run-times plagued by wasted CPU cycles and unnecessary processes. Hardware-based simulation affords substantial speedup, but prior attempts at hardware-based biological simulation have been limited in scope and have suffered from inaccuracies due to poor random number generation. In this work, we propose several hardware-based simulation schemes utilizing novel random update index generation techniques for step-based and round-based stochastic simulations of cellular networks. Our results show improved runtimes while maintaining simulation accuracy compared to software implementations.

Full PDF

AA Faster DiSH: Hardware Implementation of aDiscrete Cell Signaling Network Simulator

Kevin Gilboy, Khaled Sayed, Niteesh Sundaram, Kara Bocan, and Natasa Miskov-Zivanov

University of PittsburghPittsburgh, PA, USA, 15213Email:[email protected]

Abstract —Development of fast methods to conduct in silico experiments using computational models of cellular signaling is apromising approach toward advances in personalized medicine.However, software-based cellular network simulation has run-times plagued by wasted CPU cycles and unnecessary processes.Hardware-based simulation affords substantial speedup, butprior attempts at hardware-based biological simulation have beenlimited in scope and have suffered from inaccuracies due topoor random number generation. In this work, we propose sev-eral hardware-based simulation schemes utilizing novel randomupdate index generation techniques for step-based and round-based stochastic simulations of cellular networks. Our resultsshow improved runtimes while maintaining simulation accuracycompared to software implementations.

I. I

NTRODUCTION

Biological systems are stochastic in nature; biochemicalreactions have a certain probability of occurring, even whenthe concentrations of reactants are known, and the environ-mental conditions are controlled. This is in direct contrast to adeterministic system, in which a given set of inputs will alwaysproduce the same set of outputs. Computational models canmimic the randomness of biological systems to gain insightinto the nature of particular biological mechanisms, such asthe role a speciﬁc ligand plays in a signaling network, or thesubstance concentration needed to initiate a reaction. In addi-tion, computational models have been shown to be invaluableas researchers can run a large number of in silico experimentsthat would be inefﬁcient or impractical if attempted in vivo or in vitro . Such models allow for predicting wet-lab outcomesand for shedding insights into the nature of biological systems.As computational models are versatile and can be constantlyupdated via input from the user, they have the potential to beused in clinical applications such as personalized medicine. Amodel of a physiological system could be created and thenmodiﬁed in real time to more accurately reﬂect the uniquecharacteristics of a speciﬁc person. The model could then beused to predict patient outcomes and inform treatments.Although many such personalized medicine software (SW)models exist, they are slower and less efﬁcient than hardware(HW) approaches [1], [2], [3], [4]. Implementation in FPGAsallows running concurrent models at a speed superior tocurrent SW models, while still maintaining accuracy andreliability. In this paper, we describe HW emulation of adiscrete model of T cell differentiation in SystemVerilog, andcompare the accuracy and runtime with SW simulation for different simulation schemes and scenarios. These simula-tion schemes and scenarios represent stochasticity in cellularsignaling networks and different component conditions andchanges that occur in these networks. Selecting elements forupdate in such simulations is a non-trivial task, thereforewe also discuss and implement novel random update indexgeneration techniques for stochastic simulation schemes.II. M ETHODS

A T cell differentiation model was used in this work tocompare HW and SW simulation. The model is described in[5], where it was analyzed using the SW simulator describedin [6]. In this work, we compared HW and SW simulationof this model using simulation schemes described in [7],where an executable model consists of elements or elementgroups that are updated simultaneously within the group. Eachelement has an associated update rule, and elements/groupshave a certain probability of their rules being executed in agiven simulation step or round. Here we implemented: the si-multaneous (SMLN) scheme; random-order sequential (RSQ)scheme, both round-based (RB-RSQ) and step-based (SB-RSQ); and grouped random-order sequential (RSQ-g) scheme,both round-based (RB-RSQ-g) and step-based (SB-RSQ-g).The HW framework and simulation schemes were imple-mented in SystemVerilog and simulated using ModelSim. Forthe SW comparison, we used the DiSH (Discrete, Stochastic,Heterogeneous) SW simulator, previously described in [7].

A. HW Framework

A schematic of the HW simulator is shown in Figure 1.The simulation schemes are implemented and selected withinthe

Rule Selector block, and each simulation is run for aspeciﬁed number of steps or rounds. In addition, the model canbe initialized with different element values at the beginning ofeach simulation.A

Start signal to the

Control Path module promptsthe

Rule Selector to randomly generate an index viathe

Enable pin. If the

Valid Rule pin indicates avalid index,

Control Path asserts a high

Load sig-nal to the

Inhibitor , Updated , and

Current StateRegisters , allowing their values to be modiﬁed on thenext clock cycle. The index is used as a

Select inputfor the

Current State Register , which keeps trackof the state of each element in the model. The output a r X i v : . [ q - b i o . M N ] N ov ontrol Path Load Next StateEnable RNGValid RuleStart

Rule Selector

Valid RuleSeed RuleEnableIs steady state? Steady

RNG Seed

Current State Register

Data InInitial State LoadSelect Data Out

Updated Register

Data InInitial State LoadSelect Data Out

Inhibitor Register

Data InInitial State LoadSelect Data Out

Select Inhibitor Load InhibitorNumRulesInitial State

Comparator

Input 1Input 2Equal?

Comparator

Input 1Input 2Equal?

Previous StateRegister

Data InLoadData Out

NumRules

Network Logic

Next StateCurrent StateLoad UpdatedLoad Last State

Fig. 1. Hardware framework for simulating cellular networks with differentsimulation schemes and random update index generation. ( Data Out ) of the

Current State Register is thenbitmasked with output (

Data Out ) from the

InhibitorRegister , where the inhibitor is chosen by the initialconditions (

Select Inhibitor ). The result of the com-bined current state and inhibition is stored in the

PreviousState Register and passed into the

Network Logic module, which contains the update rules of the model. Thecurrent state is passed on the

Current State pin, and the

Network Logic module outputs the next state of the systemon the

Next State pin. The result is stored in the

CurrentState Register . The circuit is “steady” once all ruleshave been run, as tracked by the

Updated Register .The upper

Comparator module compares the updated ele-ments/groups with the complete list of elements/groups, assert-ing the

Steady pin on the

Control Path if they are equal.The circuit is said to be in “steady state” if the current state isequal to the previous state. The lower

Comparator modulecompares the output of the

Current State Register to that of the

Previous State Register and asserts the

Is steady state? pin on the

Control Path if theyare equal.

B. RNG Algorithms

While the SMLN scheme is deterministic, all other schemesrequire random number generation (RNG) to generate a ran-dom rule index that selects elements/groups and executes theirupdate rules in each simulation round/step. We describe heretwo novel HW-based algorithms for RNG that we created andimplemented within the

Rule Selector block in Figure 1.

1) Round-based RNG:

In the round-based simulationschemes, each element/group is updated at least once in agiven round; therefore each round consists of a number of stepsequal to the number of elements/groups in the model. Ourround-based RNG algorithm allows linear runtime with respectto the number of elements/groups in the model by eliminatingthe possibility of duplicate or invalid rule index generation.The operation is illustrated in Figure 2. Two parallel registerarrays, A and B, are utilized as stacks for storing itemsconsisting of a rule index (“Priority”) and randomly generatednumber (“Value”). Stack A always starts empty and growsupward while Stack B starts out full and recedes downward.With each step, a new item is pushed up Stack A. If the newitem’s Value is greater than an existing Value, the Priority ofthe new item is increased by one. Otherwise, the Priority ofthe existing item is increased by one. If there are no otherValues on the stack, the rule takes a default priority of 0.Simultaneously to the generation of Stack A, an item is poppedoff of B, and its Priority determines which element/group isupdated in the current step. When a round is completed, StackB is empty and Stack A is full. The items from A are thenloaded into B, Stack A is ﬂushed, and the parallel generationand popping of items repeats. At the start of the simulation,where both stacks are empty, items must be generated to ﬁllStack B before a round can begin. After the initial generationof Stack B, the coincident generation and popping of itemsallows linear runtime without duplicates or invalid rules.

2) Step-based RNG:

A RNG algorithm was also developedfor the step-based scheme, where the same element/groupcan potentially be updated multiple times before other ele-ments/groups are updated. Directly using the RNG numberas the rule index is highly inefﬁcient for a model that has anumber of elements/groups not equal to a power of two: amodel with 37 elements would need 6 bits to represent eachof the potential element updates, leading to −

37 = 27

RNG outputs (42%) counting as a costly “miss”. Here, wedetermine the rule index ( I ) using Equation 1, where X is theRNG number and E is the number of elements/groups. Wealso use 10 RNG bits ( n = 10) to approach uniform probabilityof generating any index, calculated by setting a threshold of (2 n mod E ) / n < . I = X mod E (1) ig. 2. The RB-RSQ rule selection scheme implementation for a sample 16-rule system, utilizing a linear feedback shift register (LFSR). A 16-element/groupsystem as diagrammed here would be = 8 bits wide and 16 registers tall. Each register in the array could be thought of as being logically dividedin half, with the least signiﬁcant bits corresponding to a Value, and the most signiﬁcant bits indicating the Priority. C. HW and SW Comparison

We simulated eight scenarios representing different inputvalue conﬁgurations, and compared the SW and HW simula-tors by studying the responses of FOXP3 and IL2 (markersof regulatory or helper T cell phenotypes, respectively). TableI shows the initial values for input signals that were variedfor the eight scenarios: antigen dose affecting T cell receptor(TCR) signal strength, transforming growth factor beta ligand(TGF), and the inhibitor of protein kinase B (AKT off). Thepercentages listed for the Toggle scenarios in Table I representthe percentage of simulation steps/rounds that were completedbefore the protein’s value was toggled. Of the variables notlisted in Table I, CD28, PTEN, TSC, CD122, and CD132 wereinitialized to 1, and all the other variables were initialized to 0,to mimic the na¨ıve T cell phenotype at the beginning of eachsimulation. We ran all round-based simulations for 30 rounds,and all step-based and SMLN simulations for 2000 steps [5].We ran each simulation scheme 200 times from the initial tosteady state and calculated average trajectories, according tomethodology from [8]. III. R

ESULTS

Figures 3 and 4 show the difference between averagetrajectories obtained with the HW simulator and the SWsimulator for FOXP3 and IL2 and each simulation scheme.Average root-mean-square error (RMSE) across all scenariosfor each simulation scheme is shown in Table II.Runtime comparisons are shown in Table III for eachsimulation scheme. For HW runtimes, the number of clockcycles used as reported by ModelSim was divided by 50 MHz,a common FPGA base clock frequency. This was comparedto SW runtime as performed on a 2015 Apple MacBook Pro

TABLE IS

IMULATION SCENARIOS AND INITIAL VALUES

Scenario TCR high TCR low TGFbeta AKT off Toggle1 1 0 0 0 -2 0 1 0 0 -3 1 0 1 0 -4 0 1 1 0 -5 1 0 0 1 -6 1* 0 0 0 20.00%7 1* 0 0 0 26.67%8 1* 0 0 0 33.33%* Initial value before toggle TABLE IIRMSE

AVERAGED ACROSS ALL SIMULATION SCENARIOS

Scheme% Error RB-RSQ SB-RSQ RB-RSQ-g SB-RSQ-g SMLNFOXP3 1.13 11.18 10.86 6.22 14.99IL2 0.99 10.79 13.00 5.41 6.23 (3.1 GHz Intel i7 processor). The HW implementation had amedian speedup of 54.4X for round-based simulations, and426.7X for step-based simulations. The greatest speedup wasprovided for the SMLN scheme, followed by RSQ-g and thenRSQ. IV. D

ISCUSSION

Our results show that all ﬁve simulation schemes in HWproduced results comparable to the SW simulator, with all ig. 3. Difference between HW and SW simulator output for FOXP3 andIL2 with the RSQ scheme (top), and the RSQ-g scheme (bottom).Fig. 4. Difference between HW and SW simulator output for FOXP3 andIL2 with the SMLN scheme.

RMSEs under 15%. The round-based implementation hadthe smallest RMSE, but with the smallest runtime speedup,making it a Las Vegas type algorithm. Conversely, the SMLNscheme showed the greatest runtime speedup but greatestRMSE, making it a Monte Carlo type algorithm. The groupedstep-based implementation had the second smallest RMSE andthe fastest runtime speedup for all the RSQ methodologies.The step-based HW implementation was expected to providegreater speedup than the round-based implementation becausethe round-based RNG relies on pushing numbers onto astack at linear runtime. Similarly, the SMLN schemes showmuch greater speedup than the other schemes, because everyelement’s value is updated in each step without the need forRNG. However, the RSQ schemes are more desirable formodeling stochasticity in biological networks [1], [2], [3], [4].The speedups for RSQ simulation schemes were similar toprevious stochastic simulation work by Yoshimi et al. [2], andHW runtimes were similar to those reported in [4], despitedifferences in simulator architectures.All non-Toggle HW simulations approached zero differencefrom the SW simulator at steady state. For the Toggle scenar-ios, a large deviation occurred subsequent to the toggle ofthe antigen dose, and the steady state values were differentbetween the SW and HW simulators, particularly for the

TABLE IIIR

UNTIME COMPARISON FOR EACH SIMULATION SCHEME .Runtime [ms]Scheme Steps/Rounds SW HW Speedup

Round-Based Schemes

RB-RSQ 1929500 1062.5 38.6 27.5XRB-RSQ-g 1216500 1253.6 24.3 51.5XSMLN 11300 645.8 0.2 2857.5XRB-RSQ Toggle 1929500 1027.6 38.6 26.6XRB-RSQ-g Toggle 1216500 1394.7 24.3 57.3XSMLN Toggle 11300 688.9 0.2 3048.3X

Step-Based Schemes

SB-RSQ 401300 3225.1 8.0 401.8XSB-RSQ-g 303537 2745.0 6.1 452.2XSB-RSQ Toggle 401300 3036.9 8.0 378.3XSB-RSQ-g Toggle 303537 2741.7 6.1 451.6X grouped schemes. This can be likened to a logic circuit whoseinputs change before the output has stabilized.The most likely source of error in the RSQ methodologieslies in comparing the randomness of the HW RNG and therandomized SW update rule selection. This is evidenced by thegreater differences in transient behavior compared to steadystate. Comparing the HW and SW simulators with the samesequence of element/group updates could provide more similartransient results, and could be investigated in future work.The SMLN scheme implementations showed differences atthe start of the simulation, but reached zero difference at steadystate. In the SMLN scheme, each element not governed bya forced initial condition was given a random initial state,leading to possible initial states. Since only 200 of theseinitial states were tested, and the 200 states used in the HWmodel most likely differed from the 200 states used in the SWmodel, deviations are expected especially at the start of thesimulation due to different initial conditions. In a simulationwhere the HW and SW models utilize the same initial states,the HW and SW simulation results are the same.V. C ONCLUSION

In this work, we designed and emulated ﬁve HW-based sim-ulation schemes and two random update index generation tech-niques that were accurate in predicting phenotype decisionsin T cells and afforded orders of magnitude runtime speedupcompared to SW simulation. For the T cell model in this work,the round-based implementation was most accurate to SWbut provided the smallest runtime speedup, while the SMLNimplementation provided the fastest result although with loweraccuracy to SW. Future work includes parallelization, as wellas improving the random update index generation to increasethe accuracy without the cost of runtime.A

CKNOWLEDGMENT

This work is supported in part by DARPA Big Mechanismaward W911NF-14-1-0422.

EFERENCES[1] L. Salwinski and D. Eisenberg, “In silico simulation of biologicalnetwork dynamics,”

Nature Biotechnology , vol. 22, no. 8, pp. 1017–1019,2004. [Online]. Available: http://dx.doi.org/10.1038/nbt991[2] M. Yoshimi, Y. Iwaoka, Y. Nishikawa, T. Kojima, Y. Osana, A. Funahashi,N. Hiroi, Y. Shibata, N. Iwanaga, and H. Yamada, “FPGA implementationof a data-driven stochastic biochemical simulator with the next reactionmethod,” in . IEEE, 2007, Conference Proceedings, pp. 254–259.[3] N. Miskov-Zivanov, A. Bresticker, D. Krishnaswamy, S. Venkatakrishnan,P. Kashinkunti, D. Marculescu, and J. R. Faeder, “Regulatory networkanalysis acceleration with reconﬁgurable hardware,” in . IEEE, 2011, Conference Proceedings, pp. 149–152.[4] N. Miskov-Zivanov, A. Bresticker, D. Krishnaswamy, S. Venkatakrishnan,D. Marculescu, and J. R. Faeder, “Emulation of biological networks inreconﬁgurable hardware,” in

Proceedings of the 2nd ACM Conferenceon Bioinformatics, Computational Biology and Biomedicine . 2147893:ACM, 2011, Conference Proceedings, pp. 536–540.[5] N. Miskov-Zivanov, M. S. Turner, L. P. Kane, P. A. Morel, and J. R.Faeder, “The duration of T cell stimulation is a critical determinant ofcell fate and plasticity,”

Science Signaling , vol. 6, no. 300, p. ra97, 2013.[6] I. Albert, J. Thakar, S. Li, R. Zhang, and R. Albert, “Boolean networksimulations for life scientists,”

Source Code for Biology and Medicine ,vol. 3, p. 16, 2008.[7] K. Sayed, Y. H. Kuo, A. Kulkarni, and N. Miskov-Zivanov, “DiSHsimulator: Capturing dynamics of cellular signaling with heterogeneousknowledge,” in , Dec 2017,pp. 896–907.[8] K. Sayed, C. A. Telmer, and N. Miskov-Zivanov, “Motif modeling forcell signaling networks,” in2016 8th Cairo International BiomedicalEngineering Conference (CIBEC)