A system architecture for parallel analysis of flux-balanced metabolic pathways
Mona Arabzadeh, Mehdi Sedighi, Morteza Saheb Zamani, Sayed-Amir Marashi
AA system architecture for parallel analysis of flux-balancedmetabolic pathways
Mona Arabzadeh , Mehdi Sedighi , Morteza Saheb Zamani ,and Sayed-Amir Marashi Department of Computer Engineering and Information Technology,Amirkabir University of Technology, Tehran, Iran and Department of Biotechnology, College of Science, University of Tehran, Tehran, Iran. { m.arabzadeh,msedighi,szamani } @aut.ac.ir, [email protected] Abstract
In this paper, a system architecture is proposed that approximatelymodels the functionality of metabolic networks. The AND/OR graphmodel is used to represent the metabolic network and each processingelement in the system emulates the functionality of a metabolite. Thesystem is implemented on a graphics processing unit (GPU) as the hard-ware platform using CUDA environment. The proposed architecture takesadvantage of the inherent parallelism in the network structure in terms ofboth pathway and metabolite traversal. The function of each element isdefined such that it can find flux-balanced pathways. Pathways in bothsmall and large metabolic networks are applied to the proposed architec-ture and the results are discussed.
Keywords:
Elementary Flux Mode (EFM); Graph Data Model; Graphics Process-ing Unit (GPU); Metabolic Pathways
Introduction
System architecture is defined as a generic discipline to handle “systems” consid-ered as existing or to be created objects [1]. The main purpose of this modelingis to support reasoning about the structural behavior and properties of the ob-jects. System architecture helps to consistently describe and efficiently designcomplex systems such as an industrial system or an organization which can becomprise of smaller parts called “subsystems” as shown in Fig 1-a. In the pro-posed method, each subsystem is considered as a metabolite. The approach isextensively described in the
Materials and methods section.
Systems Biology , known as the system-level integration of experimental andcomputational studies in biology, plays a significant role in understanding com-plex network systems [2]. In biological systems, functions emerge both from theelements and the network interconnections. This underscores the importance ofcomputational analysis to extract useful information from biological data.Reconstruction of genome-scale metabolic network models, which are amongthe best-studied models in biotechnology, is possible by collecting the gene-protein-reaction information from related biochemical databases, genome anno-tations and literature [3]. Imposing constraints on the fluxes of a reconstructed1 a r X i v : . [ q - b i o . M N ] N ov iochemical network results in the definition of achievable cellular metabolicfunctions [4]. Several mathematical representations of constraints are typicallyused for constraint-based modeling of metabolic networks, including flux-balance constraints (e.g., conservation of metabolic fluxes) and flux bounds . The formermeans the network should be at the steady-state condition and the latter limitsthe numerical ranges of network parameters such as the minimum and maximumrange of fluxes for each reaction.Elementary Flux Mode (EFM) analysis is a well-studied method in constraint-based modeling of metabolic networks. In EFM analysis, a network is decom-posed into minimal functional pathways [5] based on the assumption of balancedmetabolic fluxes.The use of EFM analysis [6, 7] is of interest in several biological applications.Bioengineering [8], phenotypic characterization [9], drug target prediction [10]and strain design [11] can be mentioned among these applications.Double-description is a technique to enumerate all extreme rays of a poly-hedral cone that is widely used in finding EFMs. Some of these methods usethe stoichiometric matrix itself [6] and some others use the null-space of thestoichiometric matrix to generate EFM candidates [12, 13, 14]. Computationalapproaches have been proposed to speed up double-description-based methodsto compute EFMs [15, 16] and some of them led to the development of compu-tational tools such as Metatool [17] and
EFMtool [16]. Some recent methods,such as [18], try to bring the insights of graph-theory to generating EFM candi-dates. Hardware platforms such as GPUs are also used as accelerators to speedup the process of generating candidates [19]. Besides, methods that explore aset of EFMs with specific properties, such as K -shortest EFMs [20], or EFMswith a given set of target reactions [21], have been proposed based on linearprogramming.The main focus of this paper is to propose a system architecture based onthe AND/OR graph data model [22]. In the AND/OR graph representation ofa metabolic network, metabolites are considered as graph nodes and connectedto each other through hyperarcs which model as reactions. The input of thissystem is the stoichiometric matrix of the given metabolic network. Its outputis the set of EFMs of the network as shown in Fig 1-b. The term systems bi-ology should not be confused with system architecture . “Systems biology” is ageneral term which is used for describing the holistic view of a biological en-tity. In systems biology, finding EFMs is a classical problem which can help inunderstanding the global behavior of metabolism. Here, the problem of findingEFMs in the context of systems biology is modeled as a system architecture. Thesmallest repetitive part of the system is considered as a module that emulatesthe function of a metabolite to find the set of minimal flux-balanced pathwaysknown as EFMs. The system has a network topology that enables a paralleliz-able computational scheme. Designing a model to map a biological network to ahardware platform to take advantage of the potential multicore computationalpower, and not necessarily just a hardware accelerator, is the main contribu-tion of this paper. The underlying innovation in our proposed method is notin a parallel computing implementation of the existing algorithms. Instead, we2onsider a metabolic network as a system with seemingly independent subsys-tems that have very intricate relationships with their surroundings. While eachsubsystem is acting in parallel with the other ones, it is tightly coupled withthe rest in terms of the inputs it receives and outputs it generates. We call thisarchitectural or topology-based parallelism. Inspired by this observation, wedefine a model that closely resembles a real metabolic network. In this modeleach metabolite is represented by an “independent” processing element that isbusy performing its own reactions by processing its inputs and producing itsoutputs. Once this model is established, GPU seems like a natural choice toimplement the notion of topology-based parallelism. Complex SystemSubsystem A Subsystem C Subsystem B Input Output Metabolic NetworkMetabolite B EFMsStoichiometric Matrix Metabolite A Metabolite C (a) (b) Figure 1:
System Architecture. (a) Designing a distributed and analyzablestructure for a complex system consider as a system architecture. (b) Systemarchitecture view of the proposed method.The rest of the paper is organized as follows. Preliminaries are provided inSection
Preliminaries . The main system design along with required definitionsare discussed in Section
Materials and methods . Section
Implementation andanalysis is devoted to implementation and results. Section
Discussion discussesdesign metrics and specifications and, finally, Section
Conclusion concludes thepaper.
Preliminaries
In this section, some basic concepts of metabolic networks and their counterpartgraph data model, formal definition of EFMs and a brief introduction to parallelcomputing are provided.
Metabolic networks
Metabolic networks model the metabolism of living cells in terms of a set of bio-chemical reactions. The biochemical reactions can be irreversible or reversiblewhich means the reaction can be active only in one direction, or can be active inboth directions, respectively. The contributing metabolites in a reaction can beeither substrates or products. Substrates are consumed and products are pro-duced during the operation of a reaction. The topology of a metabolic network3s characterized by its m ˆ n stoichiometric matrix, S , where m and n corre-spond to the number of metabolites and the number of reactions, respectively.The value S ij represents the stoichiometric coefficient of the metabolite i in thereaction j . S ij is positive/negative if the metabolite i is produced/consumed.If this coefficient is zero it means that the metabolite i does not contribute tothe reaction j . The network is considered in the steady-state if for each internalmetabolite, the rates of consumption and production are equal. The reactionsconnected to the external metabolites are called Boundary reactions.
Definition 1. Flux Mode. A flux mode v P R n illustrates flux distributionsof a set of reactions in a given metabolic network. Non-zero values in v representreaction fluxes. Definition 2. Flux-Balanced Mode.
A flux mode v P R n and v ‰ iscalled flux-balance , if it meets the following conditions: • v i ě i P irreversible reactions (thermodynamic constraint) and • S . v “ , i.e., the rates of consumption and production of internal metabo-lites are equal (steady-state condition). The low-dot operator simply isthe matrix inner product. Definition 3. Elementary Flux Mode.
A flux mode v P R n and v ‰ isconsidered an EFM, if it meets the following conditions: • v is flux-balanced based on Definition 2, • there is no v P R n with supp p v q Ă supp p v q , where support of a modeis defined as supp p v q “ t i | v i ‰ u , (minimality limitation).In Fig 2, an example of a metabolic network is illustrated. Metabolic network data model
The used data model in this paper is derived from [22]. The model is basedon the conventional AND/OR graph in computer science with additional fea-tures to make the model appropriate for object-oriented methods consideringthe metabolites as objects. In this model, the coefficients of the metabolitesin reactions are embedded in the graph structure as attributes of each node(defined in Definition 6, below). The information of the candidate pathways isalso embedded in the graph structure.
Definition 4.
Modified Graph for representing a metabolic network, denotedas MG , is defined as a set of Nodes , i.e., MG = t N i | ď i ď M ´ u , where M is the number of internal metabolites and each N i represents a metabolitein the network. Definition 5.
Each node N i in MG is a 3-tuple N i = ( i , I , O ) where • i is the tag of a metabolite, 4 BC r r r r r −⎡ ⎤⎢ ⎥−⎢ ⎥⎢ ⎥−⎣ ⎦ r r r r r ABC
S = ⎧ ⎫⎡ ⎤ ⎡ ⎤⎪ ⎪⎢ ⎥ ⎢ ⎥⎪ ⎪⎢ ⎥ ⎢ ⎥⎪ ⎪⎢ ⎥ ⎢ ⎥⎨ ⎬⎢ ⎥ ⎢ ⎥⎪ ⎪⎢ ⎥ ⎢ ⎥⎪ ⎪⎢ ⎥ ⎢ ⎥⎪ ⎪⎣ ⎦ ⎣ ⎦⎩ ⎭
EFMs = v v S v =0S v =0(a) (b)(c) Figure 2:
An example of a metabolic network. (a) Stoichiometric matrixof the network. (b) Graph representation of the network. (c) The set of EFMsof the network. • I is an array of input reactions that produce the metabolite and • O is an array of output reactions that consume the metabolite. Definition 6.
Each I / O in Definition 5 contains the following data: • The reaction j , 0 ď j ď r ´
1, where r is the number of reactions consum-ing/producing the metabolite i , • I M j / O M j , an array of the metabolites consumed/produced by reaction j .In other words, I M j / O M j = t m kj | ď k ď m ´ u , where m is the numberof consumed/produced metabolites by the reaction j , • I " M j / O " M j , an array of the metabolites produced/consumed by reaction j . In other words, I " M j / O " M j = t m kj | ď k ď m ´ u , where m is thenumber of produced/consumed metabolites by the reaction j excludingthe metabolite i itself, • The direction of the reaction for reversible reactions, • I c ij / O c ij , the coefficient of the reaction j for the produced/consumedmetabolite i in the stoichiometric matrix S and • I f ijp / O f ijp , the flux of the input/output reaction j for a certain path p .For bidirectional reactions, the term “direction” is used to clarify that eitherthe reactants react to form the products, or the products react together toproduce the reactants in the backward direction. A hyperarc links a set of5odes to another set in one connection. The incoming hyperarcs (i.e., I inDefinition 5) in each N i produce the metabolite i and the outgoing hyperarcs(i.e., O in Definition 5) consume it. Therefore, consumed metabolites, m i , andproduced metabolites, m i , contributing to reaction j as shown in Eq 1, are ashyperarcs between N i nodes each associated to one metabolite. I hyperarcs and O hyperarcs in N i nodes are related to each other by reaction tags in Definition6. The set of metabolites in m i , m i , are AND-related . The incoming I (outgoing O ) hyperarcs for each N i are OR-related . r j : m ` m ` ... ` m i é m ` m ` ... ` m i . (1) Definition 7. Pathway.
A pathway in a graph data model is defined as achain of adjacent N i nodes such that they form a hyperpath from a source nodeto a sink node. Adjacent nodes represent the metabolites which contribute in acommon reaction.A pathway can be seen as a subgraph of the AND/OR representation of ametabolic network. In addition, a pathway can be represented as a vector in aconvex flux cone. It should be noted here that these two definitions are shownto be equivalent [23]. Parallel computing and programming
Parallel computing is a type of computation in which independent executableelements of a system can run simultaneously. This is based on the assumptionthat large problems can often be divided into smaller ones, which can then besolved at the same time. Solving complex scientific and engineering problemsrequire huge computational power. Processing power in high performance com-puting (HPC) is measured by floating point operations per second, or
FLOP/s and the size of data in
Bytes considering that a double precision floating pointnumber takes 8 bytes of memory. Memory and timing constraints are amongprincipal reasons that move the trend to parallel computing and programmingframeworks. Each memory access can take several CPU cycles and increasingclock frequency does not readily improve performance anymore.There are several forms of parallel computing such as instruction-level andthread-level schemes. Instruction-Level Parallelism (ILP) is defined as a par-allelism among individual instructions executed by a microprocessor. Parallelparts are automatically extracted by the microprocessor. This type of paral-lelism is limited by the pipeline depth and instruction dependencies. On theother hand, Thread-Level Parallelism (TLP) happens when a set of multipleconcurrent tasks are running and the programmer ' s involvement is required toextract the parallelism as done in our model.Considering data along with instructions, two models are defined. In Sin-gle Instruction Multiple Data (SIMD) model, a large number of (usually small)processing elements apply a single instruction on multiple data sets. In otherwords, a single controlling processor issues each instruction and each process-ing element executes the same instruction. Multiple Instruction Multiple Data(MIMD) is another model in which each processor executes its own sequence of6nstructions independently. An extended model of SIMD, Single Thread Multi-ple Data (STMD), is used in the proposed model and will be explained later.A programming model is a bridge between a system developer ' s naturalmodel of an application and an implementation of that application on the tar-get hardware. A programming model must allow the programmer to balancethe competing goals of productivity, in terms of time and resource, and imple-mentation efficiency [24].Different hardware platforms and programming environments are introducedin the recent years to address the parallel computing requirements. The pro-posed system independent of the underlying hardware is presented in Section Materials and methods and the appropriate hardware for our proposed modeland its programming requirements are discussed in Section
Implementation andanalysis . Materials and methods
In this section, the proposed system architecture is introduced and then therelationship between the system and the applied data model is discussed. Themain target of the system is to model the function of each metabolite and createminimal pathways along these functional systems. To better describe the model,a set of terms are defined in the following section.
Definitions
Definition 8. Pathway Creation.
Pathway creation in a graph data model isdefined as the process of starting from a boundary reaction (i.e., an external arc)and following the chain of reaction-metabolites, recursively adding the AND-related nodes to a list. When a reaction adds a metabolite to the list of thepathway, the metabolite is called to be triggered by the reaction.
Definition 9. Forward/Backward Flow.
The term f orward is appliedto a flow when the process of pathway creation is in a forward direction for aparticular metabolite, which means that the metabolite is produced accordingto the flux changes of the input/output reactions of that metabolite. The term backward is applied when the flow is in a backward direction, i.e, the metaboliteis consumed according to the flux changes.
Definition 10. Primary and Secondary Reactions.
On each pathway,each node (metabolite) has a primary input (PI) and a primary output (PO).In forward/backward flow, a PI/PO is the first reaction which enters a node anda PO/PI is the first reaction which directs the flow of the pathway and tags thenode as visited . When an edge (i.e., a reaction) enters an already visited nodein that pathway in a forward/backward flow as an input/output, that reactionis tagged as a secondary input/output (SI/SO) of that node.7or example, when metabolites A and B are AND-related, if A is con-sumed during the reaction A + B Ñ C + D , then B should also be present in themedium which means A and B should be consumed simultaneously. The for-ward/backward flow is designed for the produced/consumed metabolites [22].In this example, the metabolites A and B are consumed while C and D areproduced. The order of adding nodes to the list in the pathway creation processin the forward/backward flow names reactions as primary or secondary. Anexample of primary and secondary reactions is illustrated in the graph model ofFig 3. r r r r r r Figure 3:
Primary and Secondary Reactions.
The pairs (r0,r1), (r1,r3),(r3,r5), (r3,r2), (r3,r4) act as primary reactions (input, output) for the five nodessince they tag the nodes as visited in the pathway creation process. The twonodes with dashed reactions are already visited when r2 and r4 enter, so thereactions are tagged as secondary reactions.
Definition 11. Flux-Dependent Reactions.
A primary output and a sec-ondary input in a forward flow (or a primary input and a secondary outputin a backward flow) are called flux-dependent reactions if the pathway whichgoes through the primary reaction, extends to the secondary reaction. In thiscase, their fluxes are linearly dependent over the target node. The dependencyis shown by the sign in the rest of the paper. The two dependent types areillustrated in Fig 4. The proposed architecture
In this section, the main architecture of the system to model and analyzemetabolic networks is provided. The main goal of the system is to create mini-mal flux-balanced metabolic pathways. In the proposed architecture, each nodein an MG graph is considered as a processing element or module with a lo-cal memory. The whole system has a global memory used by the processingelements to synchronize their tasks. The system architecture is illustrated inFig 5. It consists of m META modules, each corresponding to one node, and8 a) (b) R PI R SI R PO R PI R PO R SO N i N i Figure 4:
Flux dependencies.
Reactions with flux dependencies with respectto metabolite N i . (a) R P O (primary output) and R SI (secondary input) arerecognized as flux-dependent reactions in a forward flow, R P O R SI . (b) R P I (primary input) and R SO (secondary output) are recognized as flux-dependentreactions in a backward flow, R P I R SO .an arbiter which evaluates the constructed pathways and decides whether theybelong to a subset of the desired pathways or not. The modules are introducedin further detail below. InputsOutputs
META ARBITER
META x InputsOutputs
META InputsOutputs
META InputsOutputs
META M CONTROL AND ARITHMETIC UNIT MEMORY
ARBITER
Number of ReactionsNumber of EFMs
ARBITER
Figure 5:
Main architecture.
The main architecture of the hardware modelof a metabolic network. 9 odules: META x Each META module with the index x consists of a control/arithmetic unit and amemory unit. The task of the module is to keep the information of the pathways(i.e., EFM candidates) and to execute the required instructions based on theobtained information from the neighboring META modules. Two modules areneighbors if they are data dependent (i.e., have shared reactions). The twointernal units of the META x module are discussed below. Control and Arithmetic Unit.
Fig 6 illustrates the flowchart of theoperation of the control and arithmetic unit. The task of all boxes labeled asPROC in the figure are summarized in Table 1.
STARTSTOP
PROC2'
PROC PROC PROC PROC PROC Flux<0
Primary Input Primary Output
PROC MEMORY
Control and Arithmetic Unit
PROC 1
Non-VisitedMetabolite
PROC 1
PROC Flux>0
PROC 1PROC 1
PROC Non-VisitedMetaboliteVisitedMetaboliteVisitedMetabolite Flux = 0
Primary InputPrimary Output
Figure 6:
Control and arithmetic unit.
The flowchart of the control andarithmetic unit of the META x module.The thread inside the unit is running continuously until the arbiter closesthe processing element or the maximum times a node is visited, representedby MAX LOOP, is reached. In the beginning, the metabolite balance of eachpathway, is calculated in PROC based on Eq 2. EF “ ÿ I c ij I f ijp ´ ÿ O c ij O f ijp . (2)In this equation, EF indicates the extra flux. Depending on the value of EF , one of the three routes in the chart should be selected. If the metabolite10able 1: The task of the processes in Fig 6. Process Name TaskPROC Calculate metabolite balance for each pathway using Eq. (2)PROC p For each output/input, start a new pathway using Eq. (3) and go to PROC ;Tag P rimary input/output reactionsPROC p Check reaction dependency statusPROC p Use Eq. (4) to balance the flux and go to PROC or go to STOP if MAX LOOP is reachedPROC p Use Eq. (5) to balance the flux and go to PROC or go to STOP if MAX LOOP is reached is visited for the first time, new pathways are created for each output/input inPROC /PROC using Eq 3. O f ijp “ I fijp I cij O cij , EF ą p Forward q I f ijp “ O fijp O cij I cij , EF ă p Backward q . (3)Otherwise in PROC /PROC the reaction dependency is checked betweenthe input/output flux-changed reactions. Based on the results of PROC /PROC ,if reactions are independent, Eq 4 is used in PROC /PROC to pass the fluxto PO/PI reactions. I f p n q ijp “ I f p o q ijp I cij ` O fiep I cie I cij ,O f p n q ijp “ O f p o q ijp O cij ` O fiep I cie O cij . (4)Otherwise, Eq 5 is used and fluxes are passed to PI/PO reactions. In bothPROC /PROC and PROC /PROC , a variable which keeps track of the num-ber of times a metabolite is visited is checked against a constant MAX LOOPto avoid the loop over a node forever. I f p n q ijp “ I f p o q ijp I cij ´ I fiep I cie I cij ,O f p n q ijp “ O f p o q ijp O cij ´ I fiep I cie O cij . (5) Memory Unit.
The data stored in the memory unit of the META x can bedivided into two categories. • Metabolite Local : Consists of static and dynamic information. Static infor-mation of each metabolite includes input/output stoichiometry coefficientsand dynamic information includes the metabolite status in each pathwaywhich is changed over time. • Path Local : Consists of dynamic information of a pathway.11tatic and dynamic terms indicate Read-Only and Read/Write memory types,respectively. Fig 7 illustrates the structure of the memory unit.
MEMORY of META x Reaction Direction Reaction FluxReaction FluxReaction FluxReaction CoefReaction CoefReaction Coef Reaction DirectionReaction DirectionReaction SeenReaction SeenReaction Seen Reaction Direction Reaction FluxReaction FluxReaction FluxReaction DirectionReaction DirectionPathway IDMetabolite Status Reaction Direction Reaction FluxReaction FluxReaction FluxReaction DirectionReaction DirectionReaction StatusReaction StatusReaction Status
Dynamic for each
Pathway
Dynamic for each
Metabolite
Reaction CoefReaction CoefReaction Coef O U T P U T S I N P U T S Static for each
Metabolite
Figure 7:
Memory unit.
The structure of the memory unit of META x .. Modules: Arbiter
Based on the pathway information stored in META x modules, the arbiter de-cides which pathway candidates are balanced and to be considered as EFMs.When all META x modules are done with the analysis of all in-process pathways,the arbiter reports the result. The function of the arbiter is considered as a partof META x function in Algorithm 3. The decision is made based on the statusof each pathway specified in the algorithm as discussed later. Implementation and analysis
The hardware platform introduced in Section
Materials and methods is imple-mented on a Graphic Processor Unit (GPU). To demonstrate the operation ofthis platform, a metabolic network is considered for experiments and the re-sults are provided in the remainder of this section. The platform consists ofa set of parallel threads of STMD type where each META x module runs thesame thread for various data sets. Complication arises because of the inevitabledependencies between the modules ' s data. General-purpose graphics processor units
GPUs were originally developed for computationally intensive graphics of gamesand image processing applications in computer systems. But recently, their vastcomputational power has been utilized in general purpose computing.There are several reasons to consider GPUs as the hardware platform toemulate metabolic networks in this research. Firstly, a suitable environment12or software development based on C/C++ is available for high-level and easyprogramming which is convenient to try alternative options for running testroutines. Secondly, a large number of floating point units (FPUs) in GPUs canoperate in parallel to accelerate our analysis.A general architecture of a GPU is shown in Fig 8-a in which the mainprogram is running on a CPU. To run a process on the GPU, first the data inthe CPU memory is copied to the GPU memory. When the process is doneon the GPU, the result is copied from the GPU memory to the CPU memory.The parallel functional units of the GPU are blocks , each consisting of a set of threads . All threads in all blocks perform the same function in one access tothe GPU. In other words, the main stream of the program is running on theCPU and function calls from the CPU initialize a process on all GPU threads.By the term access , we mean the function call that is invoked by the CPU tosetup the GPU. Therefore, all threads run simultaneously from a programmer ' spoint of view. A typical GPU has three memory levels: a local memory for eachthread, a shared memory between threads of a block and a global memory.The correspondence between the system architecture and the GPU platformis illustrated in Fig 8-b. Each META x element stands on a thread and eachblock is dedicated to a candidate EFM pathway which means that differentpathways are processed by different blocks. CPU GPU
Global MemoryShared MemoryLocal Memory
Copy Data to GPUCopy Data to CPU
Parallel Elements
BlockThread (a) (b)
Number ofEFMCandidatesNumber of Metabolites
Figure 8:
The proposed system architecture on the GPU platform. (a)GPU general architecture and the way GPU and CPU communicate. (b) Howthe proposed system architecture is mapped onto the GPU.Algorithm 1 shows the main function running on the CPU. Three accesses,called kernel functions, are made from CPU to GPU as shown in Fig 9. Detailsof each function are discussed below.
Finding Dependent Reactions over All Metabolites.
In the flowchartof the control/arithmetic unit (Fig 6), flux-dependent reactions of a metabo-13
LGORITHM 1:
Main body of the algorithm.
Input : The stoichiometric matrix of a given metabolic network.
Output : A set of minimal flux-balanced pathways of the network. M R C L r max m max B , T )Convert the given matrix to the MG graph structure.Copy MG structure from CPU to GPU global memory. /* Finding dependent reactions with depth L over all metabolites. */ Set ( B , T ) as ( M , r maxL ) to be run on GPU.Run DEPTHy() on each thread. /* Initializing selected reactions with a nonzero value in all Blocks . */
Set ( B , T ) as ( C , R ) to be run on GPU.Run INIT() on each thread. /* Calculating balanced pathways by keeping all metabolites in steady-state. */ Set ( B , T ) as ( C , M ) to be run on GPU.Run METAx() on each thread.Copy MG pathways information from GPU to CPU. lite should be identified to make better decisions. Algorithm 2 describes theDEPTH y function which starts a path through a metabolite N i and continuesuntil the path length (i.e., the number of involved reactions on the path) isequal to L . If N i is reached again in the created path with length ď L , theinvolved reactions of N i are characterized as flux-dependent. To do this usingthe GPU, one block is dedicated to each metabolite. Each thread of each blockis dedicated to a path starting from N i . Therefore, DEPTH y function is exe-cuted on each thread of each block to traverse paths and resolve dependencies.Each thread of a block is identified by a block ID and a thread ID. The blockID is set to i for the metabolite N i . The thread ID should specify a uniquepath. To do this, the thread ID is used to route the path and to reveal theroute. We propose a static path-generator method in which the thread ID j as a decimal number is converted to a number in base r max and length L as j = p L ď l ă L q r max = p L ´ , L ´ , ..., l, ..., , q r max . r max is the maximum numberof output reactions of a node N i . So, the pair ( M , r maxL ) and DEPTH y is set up on each. We start from a node N i .The index l counts the number of nodes traversed on the graph, that is, theparameter depth . The digit in position l represents the order of the reactionwhich should be selected in depth l . Consider that the node in depth l in thequeue Q is represented by Q l . The number of outputs of Q l is considered as r which is the size of array O of the node. If l ă r , the process goes on, otherwisethe path is undefined. To continue the process, check to see if the node Q l is14 nitialization of Target ReactionsCalculation of Balanced PathwaysFinding Dependent Reactions over all MetabolitesNetwork Structure Data E F M C a nd i d a t e D a t a E F M C a nd i d a t e D a t a E F M C a nd i d a t e D a t a L T (cid:206) (000)3
000 01 T (cid:206) (010)3 L T (cid:206) (000)3
000 01 T (cid:206) (010)3 L T (cid:206) (000)3
000 01 T (cid:206) (010)3 Figure 9:
Kernel functions.
Three kernel functions are illustrated in thefigure. First, dependent reactions are explored. Each pathway is assigned toone thread and the thread number in base r max in L digits shows the order ofthe reactions. Two example for thread IDs 0 and 3 are shown in the figure. r max and L are both considered as three here. The conversion of 0 and 3 to thebase r max gives us p q and p q which specify the route of the pathway inthe assumed thread. Each yellow box is considered as a block and the processrepeats independently for all metabolites each of which dedicated to one block.After initializing the target reactions in one pass, each block is assigned toan EFM candidate and each thread performs the META x function. The bluerectangular shapes are memory elements showing the hierarchical memory levelsin GPU. 15 LGORITHM 2:
DEPTHy: Finding dependent reactions with depth L overmetabolite N i on thread T j in a given MG . Input : The MG graph of a metabolic network derived from its given stoichiometric matrix;Metabolite N i ; Thread ID T j ; Maximum number of input/output reactions of ametabolite r max . Output : Tagged dependent reactions of N i .Convert the decimal thread ID j to an L -digit number in base r max ; i.e., j = p L ď l ă L q rmax = p L ´ , L ´ , ..., , q rmax Store N i in the queue Q .Set l to 0.Set Start reaction as an input reaction of N i . while Q ‰ H and l ď L do Select the output reaction at level l based on L l .Check if the output is valid according to the size of O of Q l . if Q l P O MLl then
Set O as Last reaction.Tag the
Start reaction and the
Last reaction of N i as dependent reactions.Exit while . else Push back O MLl to Q . endend in the set of output metabolites of reaction L l shown by O M Ll (see Definition6). If the node is in the array, the process ends. Otherwise, it continues until l reaches L ´ Initialization of Target Reactions.
In this function, the target reactionswhich are selected to contribute in the pathway are initialized with a defaultvalue (“1” in our implementation). These reactions can be boundary reactionsor any other set of reactions. Connected metabolites to the initialized reactionsare triggered by them. To do this using the GPU, the pair ( C , R ) where C is the number of EFM candidate pathwaysand R is the number of reactions of the network. This is done on the GPU inone pass. Each thread sets/resets the initial value of a reaction on each blockwhich is dedicated to a candidate pathway. Calculation of Balanced Pathways.
In META x function in Algorithm 1,the pair ( C , M ). Each block is dedicatedto an EFM candidate pathway and each thread of the block is dedicated toa metabolite. The META x function is set up on all threads. Algorithm 3shows the function as sketched in the system design flow in Fig 5. In thisfunction, each pathway is created using randomly selected output/input in theforward/backward flow. EFM candidates at the end of the algorithm changeto either Done or notEFM status. The status remains as notDone as long asthere are still unbalanced metabolites on the pathway. Each thread executes a while(true) loop. The while loop is broken when all candidates take a Done or notEFM status. The function of the arbiter module in Fig 5 is embedded inthe META modules performing META x functions. This is because all threadsshould run the same function in each GPU access.16 LGORITHM 3:
METAx: on Thread T i of Block B j . Input : The MG graph of a metabolic network derived from its given stoichiometric matrix;Thread ID T i ; Block ID B j . Output : minimal flux-balanced pathways. while true doforall the
Inputs and Outputs of N Ti do Multiply the input/output coefficient to its flux value (i.e., I cij ˆ I fijp and O cij ˆ O fijp where p is assigned to B j ) and add the value to ExtraFlux as stated in Eq. (2). endif
ExtraFlux “ then Tag N Ti as a Stable metabolite. else if
ExtraFlux ą thenif N Ti is not visited then Select a random output reaction.Calculate the flux using Eq. (3).Save
Primary input, R PI , and Primary output, R PO , for N Ti .Tag the metabolite as visited . elseif R PI R PO is false and MAX LOOP is not reached then
Select output reaction R PO and use Eq. (4). else if R PI R PI is true and MAX LOOP is not reached then
Select input reaction R PI and use Eq. (4). endelse if MAX LOOP is reached then
Tag the EFM candidate as notEFM . endendendelse if ExtraFlux ă thenif N Ti is not visited then Select a random input reaction.Calculate the flux using Eq. (3).Save
Primary output and input for N Ti .Tag the metabolite as visited . elseif R PI R PO is false and MAX LOOP is not reached then
Select input reaction R PI and use Eq. (4). else if R PI R PI is true and MAX LOOP is not reached then
Select output reaction R PO and use Eq. (4). endelse if MAX LOOP is reached then
Tag the EFM candidate as notEFM . endendendforall the Threads (Metabolites) in the Block B j doif All metabolites are
Stable then
Tag the candidate as
Done . endendforall the Blocks (EFM candidates) doif
All candidates tagged as
Done or notEFM then Exit while . endendend L “ r max “
4. The columns
Local M and
Shared M reportthe memory usage in bytes in each thread and each block, respectively. The lastcolumn shows the number of registers allocated for each thread.
Function Name µs ) Local M/Thread Shared M/Block Registers/ThreadDEPTHy 12 256 149.7 816 24 23INIT 48 18 17.9 0 32 12METAx 48 12 15583.2 60 40 27 Application on biologically relevant metabolic networks
CHO cell metabolism
To prove the proposed concept on metabolic networks, a core model of Chi-nese Hamster Ovary (CHO) cell metabolism was chosen from [25, 26]. CHOderived cell lines are the preferred host cells for the production of therapeuticproteins [27]. The stoichiometric matrix of this network is of size 12 ˆ
18 withtwo boundary reactions r and r . An NVIDIA GeForce GT330M GPU andCUDA version 5.5 platform were chosen. An Intel Core-i5 CPU with 4 GBRAM was used to run the test. The network has 7 EFMs. Using random path-way creation, five EFMs were obtained. Table 2 summarizes the results afterattempting 48 EFM candidates with depth L “ r max “
4. The globalmemory used to store the network on GPU was approximately 2 MBytes. Totest that all EFMs are computable by the approach, contributing reactions ofeach EFM were selected intentionally in the algorithm and their pathway werecreated. This test was passed and all EFMs were observed. Considering enoughtime and resource, the approach can lead to the calculation of all EFMs, asproved by Theorem 1.Fig 10 illustrates the steps for creating one of the EFMs in the metabolicnetwork of CHO cell. The table in the top-right corner of this figure showsthe flux of reactions from the point of view of the in-process metabolites stepby step. There are six steps to balance the pathways and Step 7 shows thefluxes of the balanced pathway. The in-process metabolites in each step areshown by big-circles. In each step, fluxes are not equal in active threads. Thishappens due to the accessibility of threads to the shared data of the reactions.To overcome the conflict, a flag is set when a reaction flux is read in Eq (1) andreset after write. If another thread reads the flag as set, it should wait for theflag to reset.Table 3 reports some observations when different sets of reactions are ini-tialized. It should be mentioned that different starting reaction sets may poten-tially lead to different EFMs. For each metabolite, the unbalanced status of themetabolite is counted. The number of conflicts is also reported. Subtracting thenumber of conflicts from the number of steps gives the actual number of parallelsteps. The
Sum column shows the total number of times that all threads wereactivated and the
Ave column shows the average number of a thread is active,that is, the thread has the computational load of updating fluxes. The column18igure 10:
An EFM creation in CHO metabolic network.
The steps ofcreating one of the EFMs of the metabolic network of CHO cell. Six steps arerequired to balance the fluxes of the EFM. In each step, from top to down, in-process metabolites are shown with big-circles and stable metabolites are shownwith smaller ones. The fluxes are shown in the table from the point of view ofthe in-process metabolites in each step. The last column, Step 7, is the finalfluxes of the EFM identical from the view of all metabolites. Paths are depictedusing Escher [28]
Meta/Step shows the average number of in-process metabolites at each step. Ascan be derived from the table, by initializing internal reaction(s), more stepsare taken to make the pathway stable.
E.coli model iAF1260
To analyze the applicability of the proposed architecture on other metabolic net-works, the model of
E.coli network with 1668 metabolites and 2382 reactionswas chosen from [29]. ATP, AMP, ADP, H, Pi, CO , H O, COA, NAD, NADP,NADH and NADPH are considered as cofactors. As stated in the literature [20],some simplifications to the system ' s model can be performed in order to reducethe complexity of the problem, such as setting currency metabolites like cofac-tors and energy metabolites to external. According to [20], for energy currencymetabolites like ATP, NADH and FADH, since their concentration is assumed19able 3: The operation of the system when different reaction sets are initialized.Three types of sets with one, two and three reactions are selected, in which onlyreaction 0 is a boundary reaction. Selected reactions Steps( « ) Conflict Step-Conf Sum Ave Meta/Step m m m m m m m m m r0 8 1 7 16 1.8 2.3 2 1 2 2 2 2 2 1 2r1 8 1 7 16 1.8 2.3 2 1 2 2 2 2 2 1 2r7 60 14 46 67 7.4 1.5 4 3 6 4 6 14 20 4 6r9 61 14 47 77 8.6 1.6 4 3 6 4 6 14 19 15 6r1,10 57 14 43 77 8.6 1.8 4 3 6 4 6 14 19 14 7r4,11 59 14 45 77 8.6 1.7 4 3 6 4 6 15 19 14 6r1,7,9 55 16 39 78 8.7 2.0 4 3 7 4 7 13 19 15 6r0,3,8 56 15 41 78 8.7 1.9 5 3 6 4 6 14 20 14 6 to be constant, they are not required to be balanced by an EFM. Notations aretaken of the COBRA model from [29].The set of shortest EFMs producing L-Lysine calculated by [20] was ex-tracted. The observability of 7 shortest EFMs with length 27 and 28 was stud-ied on the proposed architecture. Fig 11 illustrates the steps of constructing theEFM depicted in the figure. At each step, the set of unbalanced metabolites arespecified and the required changes on the reaction set are illustrated. Reactionfluxes are changed so as to achieve the least required steps to create and balancethe targeting EFM.The same information for all 7 paths are provided in supplementary materialFile 1 (S1.exe) and the resulting EFMs are depicted in supplementary materialFile 2 (S2.pdf). Summary of the results are provided in Fig 12. As shown inFig 12-a, the number of unstable metabolites tends to match a normal curvewhich means that the number of unbalanced metabolites is increased in theintermediate steps. The box plots are also depicted in Fig 12-b for each path toshow the distribution and the average of the unbalanced metabolites.This study shows the potentiality of the proposed architecture for findingEFMs on large-scale metabolic networks. Three sets of variables influence thegeneration of an EFM, specially on large-scale metabolic networks: (1) a setof first-initialized reactions, (2) the primary input and output selection at first-visited metabolites, (3) the decision on how to pass the flow on the metaboliteswhich are not first-visited. In this test, the external input reaction is initializedand for the rest, the best decisions were taken to pass the flow through oneof the reactions of the in-process metabolite. In this example, an exhaustivesearch is applied to find the best selection to pass the flow. However, to makethe approach automated for large-scale networks, further studies are required.The order of metabolites and their accessability to the reaction ' s shared dataand the decision on how to pass a flow to balance a pathway or report it asan unstable path is important. The convergence of the solution is discussed inSection Theoretical analysis . However, a practical implementation to reach thedesired solution in a reasonable time can be proposed by further research.20igure 11:
Producing an EFM in
E.coli metabolic network.
Elevensteps are required to balance EFM fluxes. At each step, the set of unbalancedmetabolites are specified by blue cells. The process of these cells can run si-multaneously. Flux changes for reactions are shown at each step. The last rowstates that there is no unbalanced metabolite and the flux set indicates the EFMfluxes in the steady state. The EFM is depicted using Escher [28]. Metaboliteand reaction notations are taken from the model of [29].
Discussion
Design parameters
In parallel computing, several design parameters should be considered to makethe system work efficiently on the hardware platform. Some of these parametersand the way they are treated in the proposed design are discussed.
Granularity.
The level of details considered in a model or decision makingprocess is considered as
Granularity . The coarser granularity results in thedeeper level of details. Granularity in parallel systems is defined as the size ofa part of a system which is selected to work independently. In the proposeddesign, the granularity is chosen as the size of a metabolite according to thedata stored in each metabolite and the instructions to be executed in each ofthem. Besides, the function of a metabolite is the smallest repeatable part ofthe system. 21igure 12:
Parallel simulation of shortest EFMs of
E.coli iAF1260. (a)The number of unbalanced metabolites at each step for 7 pathways. (b) Thedistribution of unbalanced metabolites at each step for each pathway.
Coordination and Synchronization.
The memory and timing cost ofdata communication and resolving inconsistencies, which may occur because ofdata sharing, should be considered in the design. Uniform shared memory overdifferent blocks and message passing are among data coordination methods. Inthe proposed design, the hierarchical memory structure with different sharinglevels is used for data communication. To overcome the conflicts, solutions areproposed as discussed earlier.
Topology-based parallelism analysis
In the proposed model, two levels of parallelism are used as explained below.
Block-level (pathway-level) parallelism
This level of parallelism is based on the independence of different pathways inthe network. In order words, distinct pathways are created from different out-puts of a metabolite. Each new pathway is independent of the others. In ourimplementation, each block is dedicated to a pathway which executes indepen-dently. The idea is shown in Fig 13-a for a metabolite with two reactions.
Thread-level (metabolite-level) parallelism
This level of parallelism is based on the node structure. When a reaction takesan updated flux, all connected nodes to that reaction, except the one which wasin-process, are triggered. In the proposed implementation, the metabolite-levelparallelism is applied by performing a META x function on different threads22f a block. The idea is shown in Fig 13-b for a reaction with four connectedmetabolites. (a) (b) Figure 13:
Parallelism levels. (a) Block-level parallelism. The two path-ways can be traversed independently. (b) Thread-level parallelism. The threemetabolites can be processed simultaneously. Big circles are nodes which aregetting active concurrently. Thick arrows represent simultaneous active paths.
Performance modeling
It is usually hard to model the performance of a parallel system. Modeling theperformance of the proposed design is even harder since each thread executes ina while loop either to make a pathway flux-balanced or discard it. Z n functionin Eq 6 is defined to model the number of active parallel metabolites (threads)at Step n of the system. Z “ ,Z n “ Z n ´ ˆ P rM p rM n q ˆ P mR p mR n ´ q . (6)In this equation, rM n is the number of input or output reactions at eachstep and mR n is the number of metabolites involved in a reaction. At step n ,for each metabolite there are rM n reactions and each reaction has mR n ´ P rM and P mR functions are used in the model to consider thegraph dependencies in terms of pathway and metabolite, respectively. P rM and P mR are probability functions which are estimated to have high probabilitiesat first and then converge to zero. These functions can be modeled by moreexperiments on metabolic networks. A CPU-GPU comparison analysis based onthis performance model is provided in supplementary material File 3 (S3.pdf). Theoretical analysis
In this section, the proposed system architecture for finding EFMs is discussedtheoretically on the graph model. 23 roposition 1.
By initializing a selected set of reactions R in = t r k | k P indices u with a default flux value, all minimal pathways which are including R in are ex-plored in the graph MG of a given metabolic network; the set “indices” indicatesa subset of reactions in MG . A set of output and and-related metabolites of the reactions in the set R in aretriggered at each step s by those reactions. From each first-visited metabolite,new pathways are created through outputs/inputs in forward/backward flowgetting primary tags. Recursively, all pathways including the set R in are createduntil there are no metabolites to be triggered for each path.Since an arbitrarily selected set of reactions can be given, disconnected sub-graphs might be generated and the given selection may violate the elementarityof the generated path. Therefore, undesired paths should be removed from theresults. Proposition 2.
Starting with an initialized set of reactions with certain fluxes,all nodes are traversed and all reactions in a pathway are assigned a flux usingEquation 2, as each node has a primary input and a primary output to be used inthis equation. Equation 2 states that the amount of the incoming and outgoingfluxes in N i should be equal. Proposition 3.
Flux dependencies can prevent the rate of the production/consumptionof an internal metabolite in a given topology of a pathway from being zero. Equa-tion 3 is used to calculate an update for the consumption/production rate of thenode. In the following cases, the pathway is discarded: • I f p n q ijp “ or • O f p n q ijp “ , • a loop is repeated over a node (i.e., getting back to a node from the samereaction multiple times), while trying to find a way out in a subgraph ofthe pathway. Lemma 1.
If a pathway is EFM, its graph model is reachable by graph traversalof Proposition 1.Proof.
An EFM is a flux-balanced pathway with no flux-balanced subset. Node-dependent pathways are referred to the pathways in which all related nodes ofa selected reaction are traversed. All node-dependent pathways are producedaccording to Proposition 1 through different reaction choices. Therefore, either aflux-balanced pathway has a flux-balanced subset (which is node-dependent andreachable by Proposition 1) or it is an EFM itself and reachable by Proposition1 as well.
Lemma 2.
In a consistent network, for two types of pathways there exists a flux-balanced solution.
Type I pathways are defined as pathways with no hyperarcs;either cycles or pathways from an input to an output with one input and oneoutput for each node.
Type II pathways are those with nodes with more thanone input or more than one output but with no flux-dependent reactions. roof. For type I pathways, all nodes have one primary input and one primaryoutput. Using Equation 2 subsequently, as stated in Proposition 2, fluxes areassigned to all reactions such that consumption/production of metabolites arebalanced. For type II pathways, using Equation 3, as stated in Proposition 3, forall visited nodes there is a sequence of reactions through primary inputs/outputsto balance production/consumption of metabolites.To analyze pathways with flux-dependent nodes, an optimization problem isdefined in Lemma 3.
Lemma 3.
Finding flux values for potentially flux-balanced pathways in theproposed architecture can be defined as an optimization problem.Proof.
Function f for node i is defined as f p i q “ ř I c ij I f ijp ´ ř O c ij O f ijp . Theproblem of finding flux-balanced pathways, considering that the function of eachmetabolite i is independent, is defined as follows:Minimize: g p i q “ ř f p i q , i “ M , in which M is the number of active metabo-lites in the candidate pathway p .Constraints: For each node i , the outgoing/ingoing output/input reaction shouldbe selected such that g p i q “
0. The selected reaction carries the flux from theset of active reactions j in p as stated in propositions 2 and 3. Theorem 1.
Sampling parallel pathways on the proposed system architecture,leads to exploration of a set of EFMs which includes all type I and type II
EFMs.Proof.
Based on the result of coupon collector ' s problem in probability theory,the expected number of picks required to choose all the elements of a set is n n ř k “ k which for large n is approximately n log n . Consider the length of apathway as the number of its contributed metabolites. For a pathway withlength l , for each metabolite, r max selections are possible, in which r max is themaximum size of output/input array in the forward/backward flow. Therefore, p r max log r max q l efforts for a pathway with length l are required to create allpossible routes. Number of efforts reflects number of parallel pathways, consid-ered as distinct blocks on the GPU platform. The result of Lemma 1 is usedto show that by selecting all outputs/inputs (in forward/backward flow), allpathways are produced. The results of Lemma 2 and Lemma 3 are used toshow that the function running on each GPU thread keeps metabolite produc-tion/consumption rates balanced and leads to the construction of type I and typeII flux-balanced pathways. With a proper solution for the optimization problemof Lemma 3, a balanced solution for other pathways can be explored. Discussion summary
The proposed model traverses the graph and keeps both the stoichiometry in-formation and the minimality of the path with the opportunity of not exploring25he whole solution space. Our approach tries to decide if a certain “path” isEFM or not, merely based on topology rules (instead of recognizing the elemen-tarity through rank test or comparing new candidates with produced ones). Thegraph structure makes a good track of unbalanced metabolites. Each metaboliteis considered as an object and the graph structure makes it easier to set-up rulesfor paths and explore the intended solution space via rules; e.g., initializing aset of reactions to find their including EFMs.A hypergraph is a simplified type of the modified AND/OR graph. In theintroduced model, each arc has two different input and output coefficients. Be-sides, for each arc, a dynamic label for each pathway is defined, that is, the fluxof the reaction represented by that arc. The complication arises since fluxes ofcontributing input and output reactions over a node may be related. This re-sults to the non-linear property of fluxes. The non-linearity has been consideredin our flux calculation procedure (see Eq. 2 and Eq. 3).In comparison to the method of [20], both methods are exploring the non-steady-state solution space and try to apply additional constraints to reduce thecomplexity. However, in this approach instead of using the support of LP-basedtools, we used an inherent parallel structure of the network and developed amodel on hardware. However, while the basic framework is sketched and thepotential of the method on pathways belongs to large networks is shown, furtherstudied are required to automatize the approach to get desired output EFMs.The first introduced double-description method for finding EFMs exploresthe whole set of pathways to find pathways that are in the steady-state solu-tion space and then selects EFMs by comparing the reaction subset of path-ways [6]. In the improved double-description versions of the method which usenull-space of the stoichiometry matrix as an input, different combinations ofpathways in the null-space are calculated and then the reaction subset of path-ways are compared [12, 13, 14]. In [22], GB-EFM method first calculates thereaction-dependent pathways in the solution space and then checks to see ifthese pathways can be in the steady-state by using some rules on the topologyof the pathway. In the proposed model the procedure of constructing the pathsand balancing them are combined together to introduce a hardware independentcore so as to combine the results of all these cores with the same decision tableand distributed data on the network to calculate EFMs.
Conclusion
In this paper, a modular system architecture was proposed to calculate minimalflux-balanced metabolic pathways. The architecture is based on the AND/ORgraph model. Each META x module was designed in order to emulate the inter-nal function of a metabolite for finding EFMs. The proposed architecture wasimplemented on a GPU platform to take advantage of the parallel architectureprovided in the GPU based on multiple cores and hierarchical memory. Thememory levels of the GPU are used to illustrate the memory hierarchy in thesystem. The topology-based parallelism obtained by the system was the main26chievement of the model. Additionally, the simplified metabolic network ofthe CHO cell was studied to prove the concept of the design on metabolic net-works to find EFMs. Besides, the potential of the model was studied on shortestpathways of the E.coli model.Studying genome-scale models and finding biologically meaningful path-ways [30, 31] by setting rules in the module ' s function are considered as ourfuture research. In this paper, the static structure of the GPU was used. Usingdynamic thread activation, as provided in recent GPU architectures, and storeaccurate pathway information according to the available memory space are theideas to make the model appropriate for genome-scale analysis. In addition,core function decisions can be improved by randomized decisions while keepinga global cost function to manage the moves, which is considered to investigate inour future research. Besides, partitioning and compression preprocessing meth-ods can be used further to overcome the limitation of the hardware platforms. Supporting information
S1 excel file. Applying the proposed model on
E.coli shortest paths.
Complementary data are provided in the excel file.
S2 pdf file. Applying the proposed model on
E.coli shortest paths.
Complementary figures are provided in the pdf file.
S3 pdf file. CPU-GPU comparison model.
Complementary figure isprovided in the pdf file.
Acknowledgments
Authors would like to thank Dr. Nathan Lewis (UCSD) for helpful discussionson the concept.