[PDF] A system architecture for parallel analysis of flux-balanced metabolic pathways

Abstract

In this paper, a system architecture is proposed that approximately models the functionality of metabolic networks. The AND/OR graph model is used to represent the metabolic network and each processing element in the system emulates the functionality of a metabolite. The system is implemented on a graphics processing unit (GPU) as the hardware platform using CUDA environment. The proposed architecture takes advantage of the inherent parallelism in the network structure in terms of both pathway and metabolite traversal. The function of each element is defined such that it can find flux-balanced pathways. Pathways in both small and large metabolic networks are applied to the proposed architecture and the results are discussed.

Full PDF

AA system architecture for parallel analysis of ﬂux-balancedmetabolic pathways

Mona Arabzadeh , Mehdi Sedighi , Morteza Saheb Zamani ,and Sayed-Amir Marashi Department of Computer Engineering and Information Technology,Amirkabir University of Technology, Tehran, Iran and Department of Biotechnology, College of Science, University of Tehran, Tehran, Iran. { m.arabzadeh,msedighi,szamani } @aut.ac.ir, [email protected] Abstract

In this paper, a system architecture is proposed that approximatelymodels the functionality of metabolic networks. The AND/OR graphmodel is used to represent the metabolic network and each processingelement in the system emulates the functionality of a metabolite. Thesystem is implemented on a graphics processing unit (GPU) as the hard-ware platform using CUDA environment. The proposed architecture takesadvantage of the inherent parallelism in the network structure in terms ofboth pathway and metabolite traversal. The function of each element isdeﬁned such that it can ﬁnd ﬂux-balanced pathways. Pathways in bothsmall and large metabolic networks are applied to the proposed architec-ture and the results are discussed.

Keywords:

Elementary Flux Mode (EFM); Graph Data Model; Graphics Process-ing Unit (GPU); Metabolic Pathways

Introduction

System architecture is deﬁned as a generic discipline to handle “systems” consid-ered as existing or to be created objects [1]. The main purpose of this modelingis to support reasoning about the structural behavior and properties of the ob-jects. System architecture helps to consistently describe and eﬃciently designcomplex systems such as an industrial system or an organization which can becomprise of smaller parts called “subsystems” as shown in Fig 1-a. In the pro-posed method, each subsystem is considered as a metabolite. The approach isextensively described in the

Materials and methods section.

Systems Biology , known as the system-level integration of experimental andcomputational studies in biology, plays a signiﬁcant role in understanding com-plex network systems [2]. In biological systems, functions emerge both from theelements and the network interconnections. This underscores the importance ofcomputational analysis to extract useful information from biological data.Reconstruction of genome-scale metabolic network models, which are amongthe best-studied models in biotechnology, is possible by collecting the gene-protein-reaction information from related biochemical databases, genome anno-tations and literature [3]. Imposing constraints on the ﬂuxes of a reconstructed1 a r X i v : . [ q - b i o . M N ] N ov iochemical network results in the deﬁnition of achievable cellular metabolicfunctions [4]. Several mathematical representations of constraints are typicallyused for constraint-based modeling of metabolic networks, including ﬂux-balance constraints (e.g., conservation of metabolic ﬂuxes) and ﬂux bounds . The formermeans the network should be at the steady-state condition and the latter limitsthe numerical ranges of network parameters such as the minimum and maximumrange of ﬂuxes for each reaction.Elementary Flux Mode (EFM) analysis is a well-studied method in constraint-based modeling of metabolic networks. In EFM analysis, a network is decom-posed into minimal functional pathways [5] based on the assumption of balancedmetabolic ﬂuxes.The use of EFM analysis [6, 7] is of interest in several biological applications.Bioengineering [8], phenotypic characterization [9], drug target prediction [10]and strain design [11] can be mentioned among these applications.Double-description is a technique to enumerate all extreme rays of a poly-hedral cone that is widely used in ﬁnding EFMs. Some of these methods usethe stoichiometric matrix itself [6] and some others use the null-space of thestoichiometric matrix to generate EFM candidates [12, 13, 14]. Computationalapproaches have been proposed to speed up double-description-based methodsto compute EFMs [15, 16] and some of them led to the development of compu-tational tools such as Metatool [17] and

EFMtool [16]. Some recent methods,such as [18], try to bring the insights of graph-theory to generating EFM candi-dates. Hardware platforms such as GPUs are also used as accelerators to speedup the process of generating candidates [19]. Besides, methods that explore aset of EFMs with speciﬁc properties, such as K -shortest EFMs [20], or EFMswith a given set of target reactions [21], have been proposed based on linearprogramming.The main focus of this paper is to propose a system architecture based onthe AND/OR graph data model [22]. In the AND/OR graph representation ofa metabolic network, metabolites are considered as graph nodes and connectedto each other through hyperarcs which model as reactions. The input of thissystem is the stoichiometric matrix of the given metabolic network. Its outputis the set of EFMs of the network as shown in Fig 1-b. The term systems bi-ology should not be confused with system architecture . “Systems biology” is ageneral term which is used for describing the holistic view of a biological en-tity. In systems biology, ﬁnding EFMs is a classical problem which can help inunderstanding the global behavior of metabolism. Here, the problem of ﬁndingEFMs in the context of systems biology is modeled as a system architecture. Thesmallest repetitive part of the system is considered as a module that emulatesthe function of a metabolite to ﬁnd the set of minimal ﬂux-balanced pathwaysknown as EFMs. The system has a network topology that enables a paralleliz-able computational scheme. Designing a model to map a biological network to ahardware platform to take advantage of the potential multicore computationalpower, and not necessarily just a hardware accelerator, is the main contribu-tion of this paper. The underlying innovation in our proposed method is notin a parallel computing implementation of the existing algorithms. Instead, we2onsider a metabolic network as a system with seemingly independent subsys-tems that have very intricate relationships with their surroundings. While eachsubsystem is acting in parallel with the other ones, it is tightly coupled withthe rest in terms of the inputs it receives and outputs it generates. We call thisarchitectural or topology-based parallelism. Inspired by this observation, wedeﬁne a model that closely resembles a real metabolic network. In this modeleach metabolite is represented by an “independent” processing element that isbusy performing its own reactions by processing its inputs and producing itsoutputs. Once this model is established, GPU seems like a natural choice toimplement the notion of topology-based parallelism. Complex SystemSubsystem A Subsystem C Subsystem B Input Output Metabolic NetworkMetabolite B EFMsStoichiometric Matrix Metabolite A Metabolite C (a) (b) Figure 1:

System Architecture. (a) Designing a distributed and analyzablestructure for a complex system consider as a system architecture. (b) Systemarchitecture view of the proposed method.The rest of the paper is organized as follows. Preliminaries are provided inSection

Preliminaries . The main system design along with required deﬁnitionsare discussed in Section

Materials and methods . Section

Implementation andanalysis is devoted to implementation and results. Section

Discussion discussesdesign metrics and speciﬁcations and, ﬁnally, Section

Conclusion concludes thepaper.

Preliminaries

In this section, some basic concepts of metabolic networks and their counterpartgraph data model, formal deﬁnition of EFMs and a brief introduction to parallelcomputing are provided.

Metabolic networks

Metabolic networks model the metabolism of living cells in terms of a set of bio-chemical reactions. The biochemical reactions can be irreversible or reversiblewhich means the reaction can be active only in one direction, or can be active inboth directions, respectively. The contributing metabolites in a reaction can beeither substrates or products. Substrates are consumed and products are pro-duced during the operation of a reaction. The topology of a metabolic network3s characterized by its m ˆ n stoichiometric matrix, S , where m and n corre-spond to the number of metabolites and the number of reactions, respectively.The value S ij represents the stoichiometric coeﬃcient of the metabolite i in thereaction j . S ij is positive/negative if the metabolite i is produced/consumed.If this coeﬃcient is zero it means that the metabolite i does not contribute tothe reaction j . The network is considered in the steady-state if for each internalmetabolite, the rates of consumption and production are equal. The reactionsconnected to the external metabolites are called Boundary reactions.

Deﬁnition 1. Flux Mode. A ﬂux mode v P R n illustrates ﬂux distributionsof a set of reactions in a given metabolic network. Non-zero values in v representreaction ﬂuxes. Deﬁnition 2. Flux-Balanced Mode.

A ﬂux mode v P R n and v ‰ iscalled ﬂux-balance , if it meets the following conditions: • v i ě i P irreversible reactions (thermodynamic constraint) and • S . v “ , i.e., the rates of consumption and production of internal metabo-lites are equal (steady-state condition). The low-dot operator simply isthe matrix inner product. Deﬁnition 3. Elementary Flux Mode.

A ﬂux mode v P R n and v ‰ isconsidered an EFM, if it meets the following conditions: • v is ﬂux-balanced based on Deﬁnition 2, • there is no v P R n with supp p v q Ă supp p v q , where support of a modeis deﬁned as supp p v q “ t i | v i ‰ u , (minimality limitation).In Fig 2, an example of a metabolic network is illustrated. Metabolic network data model

The used data model in this paper is derived from [22]. The model is basedon the conventional AND/OR graph in computer science with additional fea-tures to make the model appropriate for object-oriented methods consideringthe metabolites as objects. In this model, the coeﬃcients of the metabolitesin reactions are embedded in the graph structure as attributes of each node(deﬁned in Deﬁnition 6, below). The information of the candidate pathways isalso embedded in the graph structure.

Deﬁnition 4.

Modiﬁed Graph for representing a metabolic network, denotedas MG , is deﬁned as a set of Nodes , i.e., MG = t N i | ď i ď M ´ u , where M is the number of internal metabolites and each N i represents a metabolitein the network. Deﬁnition 5.

Each node N i in MG is a 3-tuple N i = ( i , I , O ) where • i is the tag of a metabolite, 4 BC r r r r r −⎡ ⎤⎢ ⎥−⎢ ⎥⎢ ⎥−⎣ ⎦ r r r r r ABC

S = ⎧ ⎫⎡ ⎤ ⎡ ⎤⎪ ⎪⎢ ⎥ ⎢ ⎥⎪ ⎪⎢ ⎥ ⎢ ⎥⎪ ⎪⎢ ⎥ ⎢ ⎥⎨ ⎬⎢ ⎥ ⎢ ⎥⎪ ⎪⎢ ⎥ ⎢ ⎥⎪ ⎪⎢ ⎥ ⎢ ⎥⎪ ⎪⎣ ⎦ ⎣ ⎦⎩ ⎭

EFMs = v v S v =0S v =0(a) (b)(c) Figure 2:

An example of a metabolic network. (a) Stoichiometric matrixof the network. (b) Graph representation of the network. (c) The set of EFMsof the network. • I is an array of input reactions that produce the metabolite and • O is an array of output reactions that consume the metabolite. Deﬁnition 6.

Each I / O in Deﬁnition 5 contains the following data: • The reaction j , 0 ď j ď r ´

1, where r is the number of reactions consum-ing/producing the metabolite i , • I M j / O M j , an array of the metabolites consumed/produced by reaction j .In other words, I M j / O M j = t m kj | ď k ď m ´ u , where m is the numberof consumed/produced metabolites by the reaction j , • I " M j / O " M j , an array of the metabolites produced/consumed by reaction j . In other words, I " M j / O " M j = t m kj | ď k ď m ´ u , where m is thenumber of produced/consumed metabolites by the reaction j excludingthe metabolite i itself, • The direction of the reaction for reversible reactions, • I c ij / O c ij , the coeﬃcient of the reaction j for the produced/consumedmetabolite i in the stoichiometric matrix S and • I f ijp / O f ijp , the ﬂux of the input/output reaction j for a certain path p .For bidirectional reactions, the term “direction” is used to clarify that eitherthe reactants react to form the products, or the products react together toproduce the reactants in the backward direction. A hyperarc links a set of5odes to another set in one connection. The incoming hyperarcs (i.e., I inDeﬁnition 5) in each N i produce the metabolite i and the outgoing hyperarcs(i.e., O in Deﬁnition 5) consume it. Therefore, consumed metabolites, m i , andproduced metabolites, m i , contributing to reaction j as shown in Eq 1, are ashyperarcs between N i nodes each associated to one metabolite. I hyperarcs and O hyperarcs in N i nodes are related to each other by reaction tags in Deﬁnition6. The set of metabolites in m i , m i , are AND-related . The incoming I (outgoing O ) hyperarcs for each N i are OR-related . r j : m ` m ` ... ` m i é m ` m ` ... ` m i . (1) Deﬁnition 7. Pathway.

A pathway in a graph data model is deﬁned as achain of adjacent N i nodes such that they form a hyperpath from a source nodeto a sink node. Adjacent nodes represent the metabolites which contribute in acommon reaction.A pathway can be seen as a subgraph of the AND/OR representation of ametabolic network. In addition, a pathway can be represented as a vector in aconvex ﬂux cone. It should be noted here that these two deﬁnitions are shownto be equivalent [23]. Parallel computing and programming

Parallel computing is a type of computation in which independent executableelements of a system can run simultaneously. This is based on the assumptionthat large problems can often be divided into smaller ones, which can then besolved at the same time. Solving complex scientiﬁc and engineering problemsrequire huge computational power. Processing power in high performance com-puting (HPC) is measured by ﬂoating point operations per second, or

FLOP/s and the size of data in

Bytes considering that a double precision ﬂoating pointnumber takes 8 bytes of memory. Memory and timing constraints are amongprincipal reasons that move the trend to parallel computing and programmingframeworks. Each memory access can take several CPU cycles and increasingclock frequency does not readily improve performance anymore.There are several forms of parallel computing such as instruction-level andthread-level schemes. Instruction-Level Parallelism (ILP) is deﬁned as a par-allelism among individual instructions executed by a microprocessor. Parallelparts are automatically extracted by the microprocessor. This type of paral-lelism is limited by the pipeline depth and instruction dependencies. On theother hand, Thread-Level Parallelism (TLP) happens when a set of multipleconcurrent tasks are running and the programmer ' s involvement is required toextract the parallelism as done in our model.Considering data along with instructions, two models are deﬁned. In Sin-gle Instruction Multiple Data (SIMD) model, a large number of (usually small)processing elements apply a single instruction on multiple data sets. In otherwords, a single controlling processor issues each instruction and each process-ing element executes the same instruction. Multiple Instruction Multiple Data(MIMD) is another model in which each processor executes its own sequence of6nstructions independently. An extended model of SIMD, Single Thread Multi-ple Data (STMD), is used in the proposed model and will be explained later.A programming model is a bridge between a system developer ' s naturalmodel of an application and an implementation of that application on the tar-get hardware. A programming model must allow the programmer to balancethe competing goals of productivity, in terms of time and resource, and imple-mentation eﬃciency [24].Diﬀerent hardware platforms and programming environments are introducedin the recent years to address the parallel computing requirements. The pro-posed system independent of the underlying hardware is presented in Section Materials and methods and the appropriate hardware for our proposed modeland its programming requirements are discussed in Section

Implementation andanalysis . Materials and methods

In this section, the proposed system architecture is introduced and then therelationship between the system and the applied data model is discussed. Themain target of the system is to model the function of each metabolite and createminimal pathways along these functional systems. To better describe the model,a set of terms are deﬁned in the following section.

Deﬁnitions

Deﬁnition 8. Pathway Creation.

Pathway creation in a graph data model isdeﬁned as the process of starting from a boundary reaction (i.e., an external arc)and following the chain of reaction-metabolites, recursively adding the AND-related nodes to a list. When a reaction adds a metabolite to the list of thepathway, the metabolite is called to be triggered by the reaction.

Deﬁnition 9. Forward/Backward Flow.

The term f orward is appliedto a ﬂow when the process of pathway creation is in a forward direction for aparticular metabolite, which means that the metabolite is produced accordingto the ﬂux changes of the input/output reactions of that metabolite. The term backward is applied when the ﬂow is in a backward direction, i.e, the metaboliteis consumed according to the ﬂux changes.

Deﬁnition 10. Primary and Secondary Reactions.

On each pathway,each node (metabolite) has a primary input (PI) and a primary output (PO).In forward/backward ﬂow, a PI/PO is the ﬁrst reaction which enters a node anda PO/PI is the ﬁrst reaction which directs the ﬂow of the pathway and tags thenode as visited . When an edge (i.e., a reaction) enters an already visited nodein that pathway in a forward/backward ﬂow as an input/output, that reactionis tagged as a secondary input/output (SI/SO) of that node.7or example, when metabolites A and B are AND-related, if A is con-sumed during the reaction A + B Ñ C + D , then B should also be present in themedium which means A and B should be consumed simultaneously. The for-ward/backward ﬂow is designed for the produced/consumed metabolites [22].In this example, the metabolites A and B are consumed while C and D areproduced. The order of adding nodes to the list in the pathway creation processin the forward/backward ﬂow names reactions as primary or secondary. Anexample of primary and secondary reactions is illustrated in the graph model ofFig 3. r r r r r r Figure 3:

Primary and Secondary Reactions.

The pairs (r0,r1), (r1,r3),(r3,r5), (r3,r2), (r3,r4) act as primary reactions (input, output) for the ﬁve nodessince they tag the nodes as visited in the pathway creation process. The twonodes with dashed reactions are already visited when r2 and r4 enter, so thereactions are tagged as secondary reactions.

Deﬁnition 11. Flux-Dependent Reactions.

A primary output and a sec-ondary input in a forward ﬂow (or a primary input and a secondary outputin a backward ﬂow) are called ﬂux-dependent reactions if the pathway whichgoes through the primary reaction, extends to the secondary reaction. In thiscase, their ﬂuxes are linearly dependent over the target node. The dependencyis shown by the sign in the rest of the paper. The two dependent types areillustrated in Fig 4. The proposed architecture

In this section, the main architecture of the system to model and analyzemetabolic networks is provided. The main goal of the system is to create mini-mal ﬂux-balanced metabolic pathways. In the proposed architecture, each nodein an MG graph is considered as a processing element or module with a lo-cal memory. The whole system has a global memory used by the processingelements to synchronize their tasks. The system architecture is illustrated inFig 5. It consists of m META modules, each corresponding to one node, and8 a) (b) R PI R SI R PO R PI R PO R SO N i N i Figure 4:

Flux dependencies.

Reactions with ﬂux dependencies with respectto metabolite N i . (a) R P O (primary output) and R SI (secondary input) arerecognized as ﬂux-dependent reactions in a forward ﬂow, R P O R SI . (b) R P I (primary input) and R SO (secondary output) are recognized as ﬂux-dependentreactions in a backward ﬂow, R P I R SO .an arbiter which evaluates the constructed pathways and decides whether theybelong to a subset of the desired pathways or not. The modules are introducedin further detail below. InputsOutputs

META ARBITER

META x InputsOutputs

META InputsOutputs

META M CONTROL AND ARITHMETIC UNIT MEMORY

ARBITER

Number of ReactionsNumber of EFMs

ARBITER

Figure 5:

Main architecture.

The main architecture of the hardware modelof a metabolic network. 9 odules: META x Each META module with the index x consists of a control/arithmetic unit and amemory unit. The task of the module is to keep the information of the pathways(i.e., EFM candidates) and to execute the required instructions based on theobtained information from the neighboring META modules. Two modules areneighbors if they are data dependent (i.e., have shared reactions). The twointernal units of the META x module are discussed below. Control and Arithmetic Unit.

Fig 6 illustrates the ﬂowchart of theoperation of the control and arithmetic unit. The task of all boxes labeled asPROC in the ﬁgure are summarized in Table 1.

STARTSTOP

PROC2'

PROC PROC PROC PROC PROC Flux<0

Primary Input Primary Output

PROC MEMORY

Control and Arithmetic Unit

PROC 1

Non-VisitedMetabolite

PROC 1

PROC Flux>0

PROC 1PROC 1

PROC Non-VisitedMetaboliteVisitedMetaboliteVisitedMetabolite Flux = 0

Primary InputPrimary Output

Figure 6:

Control and arithmetic unit.

The ﬂowchart of the control andarithmetic unit of the META x module.The thread inside the unit is running continuously until the arbiter closesthe processing element or the maximum times a node is visited, representedby MAX LOOP, is reached. In the beginning, the metabolite balance of eachpathway, is calculated in PROC based on Eq 2. EF “ ÿ I c ij I f ijp ´ ÿ O c ij O f ijp . (2)In this equation, EF indicates the extra ﬂux. Depending on the value of EF , one of the three routes in the chart should be selected. If the metabolite10able 1: The task of the processes in Fig 6. Process Name TaskPROC Calculate metabolite balance for each pathway using Eq. (2)PROC p For each output/input, start a new pathway using Eq. (3) and go to PROC ;Tag P rimary input/output reactionsPROC p Check reaction dependency statusPROC p Use Eq. (4) to balance the ﬂux and go to PROC or go to STOP if MAX LOOP is reachedPROC p Use Eq. (5) to balance the ﬂux and go to PROC or go to STOP if MAX LOOP is reached is visited for the ﬁrst time, new pathways are created for each output/input inPROC /PROC using Eq 3. O f ijp “ I fijp I cij O cij , EF ą p Forward q I f ijp “ O fijp O cij I cij , EF ă p Backward q . (3)Otherwise in PROC /PROC the reaction dependency is checked betweenthe input/output ﬂux-changed reactions. Based on the results of PROC /PROC ,if reactions are independent, Eq 4 is used in PROC /PROC to pass the ﬂuxto PO/PI reactions. I f p n q ijp “ I f p o q ijp I cij ` O fiep I cie I cij ,O f p n q ijp “ O f p o q ijp O cij ` O fiep I cie O cij . (4)Otherwise, Eq 5 is used and ﬂuxes are passed to PI/PO reactions. In bothPROC /PROC and PROC /PROC , a variable which keeps track of the num-ber of times a metabolite is visited is checked against a constant MAX LOOPto avoid the loop over a node forever. I f p n q ijp “ I f p o q ijp I cij ´ I fiep I cie I cij ,O f p n q ijp “ O f p o q ijp O cij ´ I fiep I cie O cij . (5) Memory Unit.

The data stored in the memory unit of the META x can bedivided into two categories. • Metabolite Local : Consists of static and dynamic information. Static infor-mation of each metabolite includes input/output stoichiometry coeﬃcientsand dynamic information includes the metabolite status in each pathwaywhich is changed over time. • Path Local : Consists of dynamic information of a pathway.11tatic and dynamic terms indicate Read-Only and Read/Write memory types,respectively. Fig 7 illustrates the structure of the memory unit.

MEMORY of META x Reaction Direction Reaction FluxReaction FluxReaction FluxReaction CoefReaction CoefReaction Coef Reaction DirectionReaction DirectionReaction SeenReaction SeenReaction Seen Reaction Direction Reaction FluxReaction FluxReaction FluxReaction DirectionReaction DirectionPathway IDMetabolite Status Reaction Direction Reaction FluxReaction FluxReaction FluxReaction DirectionReaction DirectionReaction StatusReaction StatusReaction Status

Dynamic for each

Pathway

Dynamic for each

Metabolite

Reaction CoefReaction CoefReaction Coef O U T P U T S I N P U T S Static for each

Metabolite

Figure 7:

Memory unit.

The structure of the memory unit of META x .. Modules: Arbiter

Based on the pathway information stored in META x modules, the arbiter de-cides which pathway candidates are balanced and to be considered as EFMs.When all META x modules are done with the analysis of all in-process pathways,the arbiter reports the result. The function of the arbiter is considered as a partof META x function in Algorithm 3. The decision is made based on the statusof each pathway speciﬁed in the algorithm as discussed later. Implementation and analysis

The hardware platform introduced in Section

Materials and methods is imple-mented on a Graphic Processor Unit (GPU). To demonstrate the operation ofthis platform, a metabolic network is considered for experiments and the re-sults are provided in the remainder of this section. The platform consists ofa set of parallel threads of STMD type where each META x module runs thesame thread for various data sets. Complication arises because of the inevitabledependencies between the modules ' s data. General-purpose graphics processor units

GPUs were originally developed for computationally intensive graphics of gamesand image processing applications in computer systems. But recently, their vastcomputational power has been utilized in general purpose computing.There are several reasons to consider GPUs as the hardware platform toemulate metabolic networks in this research. Firstly, a suitable environment12or software development based on C/C++ is available for high-level and easyprogramming which is convenient to try alternative options for running testroutines. Secondly, a large number of ﬂoating point units (FPUs) in GPUs canoperate in parallel to accelerate our analysis.A general architecture of a GPU is shown in Fig 8-a in which the mainprogram is running on a CPU. To run a process on the GPU, ﬁrst the data inthe CPU memory is copied to the GPU memory. When the process is doneon the GPU, the result is copied from the GPU memory to the CPU memory.The parallel functional units of the GPU are blocks , each consisting of a set of threads . All threads in all blocks perform the same function in one access tothe GPU. In other words, the main stream of the program is running on theCPU and function calls from the CPU initialize a process on all GPU threads.By the term access , we mean the function call that is invoked by the CPU tosetup the GPU. Therefore, all threads run simultaneously from a programmer ' spoint of view. A typical GPU has three memory levels: a local memory for eachthread, a shared memory between threads of a block and a global memory.The correspondence between the system architecture and the GPU platformis illustrated in Fig 8-b. Each META x element stands on a thread and eachblock is dedicated to a candidate EFM pathway which means that diﬀerentpathways are processed by diﬀerent blocks. CPU GPU

Global MemoryShared MemoryLocal Memory

Copy Data to GPUCopy Data to CPU

Parallel Elements

BlockThread (a) (b)

Number ofEFMCandidatesNumber of Metabolites

Figure 8:

The proposed system architecture on the GPU platform. (a)GPU general architecture and the way GPU and CPU communicate. (b) Howthe proposed system architecture is mapped onto the GPU.Algorithm 1 shows the main function running on the CPU. Three accesses,called kernel functions, are made from CPU to GPU as shown in Fig 9. Detailsof each function are discussed below.

Finding Dependent Reactions over All Metabolites.

In the ﬂowchartof the control/arithmetic unit (Fig 6), ﬂux-dependent reactions of a metabo-13

LGORITHM 1:

Main body of the algorithm.

Input : The stoichiometric matrix of a given metabolic network.

Output : A set of minimal ﬂux-balanced pathways of the network. M R C L r max m max B , T )Convert the given matrix to the MG graph structure.Copy MG structure from CPU to GPU global memory. /* Finding dependent reactions with depth L over all metabolites. */ Set ( B , T ) as ( M , r maxL ) to be run on GPU.Run DEPTHy() on each thread. /* Initializing selected reactions with a nonzero value in all Blocks . */

Set ( B , T ) as ( C , R ) to be run on GPU.Run INIT() on each thread. /* Calculating balanced pathways by keeping all metabolites in steady-state. */ Set ( B , T ) as ( C , M ) to be run on GPU.Run METAx() on each thread.Copy MG pathways information from GPU to CPU. lite should be identiﬁed to make better decisions. Algorithm 2 describes theDEPTH y function which starts a path through a metabolite N i and continuesuntil the path length (i.e., the number of involved reactions on the path) isequal to L . If N i is reached again in the created path with length ď L , theinvolved reactions of N i are characterized as ﬂux-dependent. To do this usingthe GPU, one block is dedicated to each metabolite. Each thread of each blockis dedicated to a path starting from N i . Therefore, DEPTH y function is exe-cuted on each thread of each block to traverse paths and resolve dependencies.Each thread of a block is identiﬁed by a block ID and a thread ID. The blockID is set to i for the metabolite N i . The thread ID should specify a uniquepath. To do this, the thread ID is used to route the path and to reveal theroute. We propose a static path-generator method in which the thread ID j as a decimal number is converted to a number in base r max and length L as j = p L ď l ă L q r max = p L ´ , L ´ , ..., l, ..., , q r max . r max is the maximum numberof output reactions of a node N i . So, the pair ( M , r maxL ) and DEPTH y is set up on each. We start from a node N i .The index l counts the number of nodes traversed on the graph, that is, theparameter depth . The digit in position l represents the order of the reactionwhich should be selected in depth l . Consider that the node in depth l in thequeue Q is represented by Q l . The number of outputs of Q l is considered as r which is the size of array O of the node. If l ă r , the process goes on, otherwisethe path is undeﬁned. To continue the process, check to see if the node Q l is14 nitialization of Target ReactionsCalculation of Balanced PathwaysFinding Dependent Reactions over all MetabolitesNetwork Structure Data E F M C a nd i d a t e D a t a E F M C a nd i d a t e D a t a E F M C a nd i d a t e D a t a L T (cid:206) (000)3

000 01 T (cid:206) (010)3 L T (cid:206) (000)3

000 01 T (cid:206) (010)3 Figure 9:

Kernel functions.

Three kernel functions are illustrated in theﬁgure. First, dependent reactions are explored. Each pathway is assigned toone thread and the thread number in base r max in L digits shows the order ofthe reactions. Two example for thread IDs 0 and 3 are shown in the ﬁgure. r max and L are both considered as three here. The conversion of 0 and 3 to thebase r max gives us p q and p q which specify the route of the pathway inthe assumed thread. Each yellow box is considered as a block and the processrepeats independently for all metabolites each of which dedicated to one block.After initializing the target reactions in one pass, each block is assigned toan EFM candidate and each thread performs the META x function. The bluerectangular shapes are memory elements showing the hierarchical memory levelsin GPU. 15 LGORITHM 2:

DEPTHy: Finding dependent reactions with depth L overmetabolite N i on thread T j in a given MG . Input : The MG graph of a metabolic network derived from its given stoichiometric matrix;Metabolite N i ; Thread ID T j ; Maximum number of input/output reactions of ametabolite r max . Output : Tagged dependent reactions of N i .Convert the decimal thread ID j to an L -digit number in base r max ; i.e., j = p L ď l ă L q rmax = p L ´ , L ´ , ..., , q rmax Store N i in the queue Q .Set l to 0.Set Start reaction as an input reaction of N i . while Q ‰ H and l ď L do Select the output reaction at level l based on L l .Check if the output is valid according to the size of O of Q l . if Q l P O MLl then

Set O as Last reaction.Tag the

Start reaction and the

Last reaction of N i as dependent reactions.Exit while . else Push back O MLl to Q . endend in the set of output metabolites of reaction L l shown by O M Ll (see Deﬁnition6). If the node is in the array, the process ends. Otherwise, it continues until l reaches L ´ Initialization of Target Reactions.

In this function, the target reactionswhich are selected to contribute in the pathway are initialized with a defaultvalue (“1” in our implementation). These reactions can be boundary reactionsor any other set of reactions. Connected metabolites to the initialized reactionsare triggered by them. To do this using the GPU, the pair ( C , R ) where C is the number of EFM candidate pathwaysand R is the number of reactions of the network. This is done on the GPU inone pass. Each thread sets/resets the initial value of a reaction on each blockwhich is dedicated to a candidate pathway. Calculation of Balanced Pathways.

In META x function in Algorithm 1,the pair ( C , M ). Each block is dedicatedto an EFM candidate pathway and each thread of the block is dedicated toa metabolite. The META x function is set up on all threads. Algorithm 3shows the function as sketched in the system design ﬂow in Fig 5. In thisfunction, each pathway is created using randomly selected output/input in theforward/backward ﬂow. EFM candidates at the end of the algorithm changeto either Done or notEFM status. The status remains as notDone as long asthere are still unbalanced metabolites on the pathway. Each thread executes a while(true) loop. The while loop is broken when all candidates take a Done or notEFM status. The function of the arbiter module in Fig 5 is embedded inthe META modules performing META x functions. This is because all threadsshould run the same function in each GPU access.16 LGORITHM 3:

METAx: on Thread T i of Block B j . Input : The MG graph of a metabolic network derived from its given stoichiometric matrix;Thread ID T i ; Block ID B j . Output : minimal ﬂux-balanced pathways. while true doforall the

Inputs and Outputs of N Ti do Multiply the input/output coeﬃcient to its ﬂux value (i.e., I cij ˆ I fijp and O cij ˆ O fijp where p is assigned to B j ) and add the value to ExtraFlux as stated in Eq. (2). endif

ExtraFlux “ then Tag N Ti as a Stable metabolite. else if

ExtraFlux ą thenif N Ti is not visited then Select a random output reaction.Calculate the ﬂux using Eq. (3).Save

Primary input, R PI , and Primary output, R PO , for N Ti .Tag the metabolite as visited . elseif R PI R PO is false and MAX LOOP is not reached then

Select output reaction R PO and use Eq. (4). else if R PI R PI is true and MAX LOOP is not reached then

Select input reaction R PI and use Eq. (4). endelse if MAX LOOP is reached then

Tag the EFM candidate as notEFM . endendendelse if ExtraFlux ă thenif N Ti is not visited then Select a random input reaction.Calculate the ﬂux using Eq. (3).Save

Primary output and input for N Ti .Tag the metabolite as visited . elseif R PI R PO is false and MAX LOOP is not reached then

Select input reaction R PI and use Eq. (4). else if R PI R PI is true and MAX LOOP is not reached then

Select output reaction R PO and use Eq. (4). endelse if MAX LOOP is reached then

Tag the EFM candidate as notEFM . endendendforall the Threads (Metabolites) in the Block B j doif All metabolites are

Stable then

Tag the candidate as

Done . endendforall the Blocks (EFM candidates) doif

All candidates tagged as

Done or notEFM then Exit while . endendend L “ r max “

4. The columns

Local M and

Shared M reportthe memory usage in bytes in each thread and each block, respectively. The lastcolumn shows the number of registers allocated for each thread.

Function Name µs ) Local M/Thread Shared M/Block Registers/ThreadDEPTHy 12 256 149.7 816 24 23INIT 48 18 17.9 0 32 12METAx 48 12 15583.2 60 40 27 Application on biologically relevant metabolic networks

CHO cell metabolism

To prove the proposed concept on metabolic networks, a core model of Chi-nese Hamster Ovary (CHO) cell metabolism was chosen from [25, 26]. CHOderived cell lines are the preferred host cells for the production of therapeuticproteins [27]. The stoichiometric matrix of this network is of size 12 ˆ

18 withtwo boundary reactions r and r . An NVIDIA GeForce GT330M GPU andCUDA version 5.5 platform were chosen. An Intel Core-i5 CPU with 4 GBRAM was used to run the test. The network has 7 EFMs. Using random path-way creation, ﬁve EFMs were obtained. Table 2 summarizes the results afterattempting 48 EFM candidates with depth L “ r max “

4. The globalmemory used to store the network on GPU was approximately 2 MBytes. Totest that all EFMs are computable by the approach, contributing reactions ofeach EFM were selected intentionally in the algorithm and their pathway werecreated. This test was passed and all EFMs were observed. Considering enoughtime and resource, the approach can lead to the calculation of all EFMs, asproved by Theorem 1.Fig 10 illustrates the steps for creating one of the EFMs in the metabolicnetwork of CHO cell. The table in the top-right corner of this ﬁgure showsthe ﬂux of reactions from the point of view of the in-process metabolites stepby step. There are six steps to balance the pathways and Step 7 shows theﬂuxes of the balanced pathway. The in-process metabolites in each step areshown by big-circles. In each step, ﬂuxes are not equal in active threads. Thishappens due to the accessibility of threads to the shared data of the reactions.To overcome the conﬂict, a ﬂag is set when a reaction ﬂux is read in Eq (1) andreset after write. If another thread reads the ﬂag as set, it should wait for theﬂag to reset.Table 3 reports some observations when diﬀerent sets of reactions are ini-tialized. It should be mentioned that diﬀerent starting reaction sets may poten-tially lead to diﬀerent EFMs. For each metabolite, the unbalanced status of themetabolite is counted. The number of conﬂicts is also reported. Subtracting thenumber of conﬂicts from the number of steps gives the actual number of parallelsteps. The

Sum column shows the total number of times that all threads wereactivated and the

Ave column shows the average number of a thread is active,that is, the thread has the computational load of updating ﬂuxes. The column18igure 10:

An EFM creation in CHO metabolic network.

The steps ofcreating one of the EFMs of the metabolic network of CHO cell. Six steps arerequired to balance the ﬂuxes of the EFM. In each step, from top to down, in-process metabolites are shown with big-circles and stable metabolites are shownwith smaller ones. The ﬂuxes are shown in the table from the point of view ofthe in-process metabolites in each step. The last column, Step 7, is the ﬁnalﬂuxes of the EFM identical from the view of all metabolites. Paths are depictedusing Escher [28]

Meta/Step shows the average number of in-process metabolites at each step. Ascan be derived from the table, by initializing internal reaction(s), more stepsare taken to make the pathway stable.

E.coli model iAF1260

To analyze the applicability of the proposed architecture on other metabolic net-works, the model of

E.coli network with 1668 metabolites and 2382 reactionswas chosen from [29]. ATP, AMP, ADP, H, Pi, CO , H O, COA, NAD, NADP,NADH and NADPH are considered as cofactors. As stated in the literature [20],some simpliﬁcations to the system ' s model can be performed in order to reducethe complexity of the problem, such as setting currency metabolites like cofac-tors and energy metabolites to external. According to [20], for energy currencymetabolites like ATP, NADH and FADH, since their concentration is assumed19able 3: The operation of the system when diﬀerent reaction sets are initialized.Three types of sets with one, two and three reactions are selected, in which onlyreaction 0 is a boundary reaction. Selected reactions Steps( « ) Conﬂict Step-Conf Sum Ave Meta/Step m m m m m m m m m r0 8 1 7 16 1.8 2.3 2 1 2 2 2 2 2 1 2r1 8 1 7 16 1.8 2.3 2 1 2 2 2 2 2 1 2r7 60 14 46 67 7.4 1.5 4 3 6 4 6 14 20 4 6r9 61 14 47 77 8.6 1.6 4 3 6 4 6 14 19 15 6r1,10 57 14 43 77 8.6 1.8 4 3 6 4 6 14 19 14 7r4,11 59 14 45 77 8.6 1.7 4 3 6 4 6 15 19 14 6r1,7,9 55 16 39 78 8.7 2.0 4 3 7 4 7 13 19 15 6r0,3,8 56 15 41 78 8.7 1.9 5 3 6 4 6 14 20 14 6 to be constant, they are not required to be balanced by an EFM. Notations aretaken of the COBRA model from [29].The set of shortest EFMs producing L-Lysine calculated by [20] was ex-tracted. The observability of 7 shortest EFMs with length 27 and 28 was stud-ied on the proposed architecture. Fig 11 illustrates the steps of constructing theEFM depicted in the ﬁgure. At each step, the set of unbalanced metabolites arespeciﬁed and the required changes on the reaction set are illustrated. Reactionﬂuxes are changed so as to achieve the least required steps to create and balancethe targeting EFM.The same information for all 7 paths are provided in supplementary materialFile 1 (S1.exe) and the resulting EFMs are depicted in supplementary materialFile 2 (S2.pdf). Summary of the results are provided in Fig 12. As shown inFig 12-a, the number of unstable metabolites tends to match a normal curvewhich means that the number of unbalanced metabolites is increased in theintermediate steps. The box plots are also depicted in Fig 12-b for each path toshow the distribution and the average of the unbalanced metabolites.This study shows the potentiality of the proposed architecture for ﬁndingEFMs on large-scale metabolic networks. Three sets of variables inﬂuence thegeneration of an EFM, specially on large-scale metabolic networks: (1) a setof ﬁrst-initialized reactions, (2) the primary input and output selection at ﬁrst-visited metabolites, (3) the decision on how to pass the ﬂow on the metaboliteswhich are not ﬁrst-visited. In this test, the external input reaction is initializedand for the rest, the best decisions were taken to pass the ﬂow through oneof the reactions of the in-process metabolite. In this example, an exhaustivesearch is applied to ﬁnd the best selection to pass the ﬂow. However, to makethe approach automated for large-scale networks, further studies are required.The order of metabolites and their accessability to the reaction ' s shared dataand the decision on how to pass a ﬂow to balance a pathway or report it asan unstable path is important. The convergence of the solution is discussed inSection Theoretical analysis . However, a practical implementation to reach thedesired solution in a reasonable time can be proposed by further research.20igure 11:

Producing an EFM in

E.coli metabolic network.

Elevensteps are required to balance EFM ﬂuxes. At each step, the set of unbalancedmetabolites are speciﬁed by blue cells. The process of these cells can run si-multaneously. Flux changes for reactions are shown at each step. The last rowstates that there is no unbalanced metabolite and the ﬂux set indicates the EFMﬂuxes in the steady state. The EFM is depicted using Escher [28]. Metaboliteand reaction notations are taken from the model of [29].

Discussion

Design parameters

In parallel computing, several design parameters should be considered to makethe system work eﬃciently on the hardware platform. Some of these parametersand the way they are treated in the proposed design are discussed.

Granularity.

The level of details considered in a model or decision makingprocess is considered as

Granularity . The coarser granularity results in thedeeper level of details. Granularity in parallel systems is deﬁned as the size ofa part of a system which is selected to work independently. In the proposeddesign, the granularity is chosen as the size of a metabolite according to thedata stored in each metabolite and the instructions to be executed in each ofthem. Besides, the function of a metabolite is the smallest repeatable part ofthe system. 21igure 12:

Parallel simulation of shortest EFMs of

E.coli iAF1260. (a)The number of unbalanced metabolites at each step for 7 pathways. (b) Thedistribution of unbalanced metabolites at each step for each pathway.

Coordination and Synchronization.

The memory and timing cost ofdata communication and resolving inconsistencies, which may occur because ofdata sharing, should be considered in the design. Uniform shared memory overdiﬀerent blocks and message passing are among data coordination methods. Inthe proposed design, the hierarchical memory structure with diﬀerent sharinglevels is used for data communication. To overcome the conﬂicts, solutions areproposed as discussed earlier.

Topology-based parallelism analysis

In the proposed model, two levels of parallelism are used as explained below.

Block-level (pathway-level) parallelism

This level of parallelism is based on the independence of diﬀerent pathways inthe network. In order words, distinct pathways are created from diﬀerent out-puts of a metabolite. Each new pathway is independent of the others. In ourimplementation, each block is dedicated to a pathway which executes indepen-dently. The idea is shown in Fig 13-a for a metabolite with two reactions.

Thread-level (metabolite-level) parallelism

This level of parallelism is based on the node structure. When a reaction takesan updated ﬂux, all connected nodes to that reaction, except the one which wasin-process, are triggered. In the proposed implementation, the metabolite-levelparallelism is applied by performing a META x function on diﬀerent threads22f a block. The idea is shown in Fig 13-b for a reaction with four connectedmetabolites. (a) (b) Figure 13:

Parallelism levels. (a) Block-level parallelism. The two path-ways can be traversed independently. (b) Thread-level parallelism. The threemetabolites can be processed simultaneously. Big circles are nodes which aregetting active concurrently. Thick arrows represent simultaneous active paths.

Performance modeling

It is usually hard to model the performance of a parallel system. Modeling theperformance of the proposed design is even harder since each thread executes ina while loop either to make a pathway ﬂux-balanced or discard it. Z n functionin Eq 6 is deﬁned to model the number of active parallel metabolites (threads)at Step n of the system. Z “ ,Z n “ Z n ´ ˆ P rM p rM n q ˆ P mR p mR n ´ q . (6)In this equation, rM n is the number of input or output reactions at eachstep and mR n is the number of metabolites involved in a reaction. At step n ,for each metabolite there are rM n reactions and each reaction has mR n ´ P rM and P mR functions are used in the model to consider thegraph dependencies in terms of pathway and metabolite, respectively. P rM and P mR are probability functions which are estimated to have high probabilitiesat ﬁrst and then converge to zero. These functions can be modeled by moreexperiments on metabolic networks. A CPU-GPU comparison analysis based onthis performance model is provided in supplementary material File 3 (S3.pdf). Theoretical analysis

In this section, the proposed system architecture for ﬁnding EFMs is discussedtheoretically on the graph model. 23 roposition 1.

By initializing a selected set of reactions R in = t r k | k P indices u with a default ﬂux value, all minimal pathways which are including R in are ex-plored in the graph MG of a given metabolic network; the set “indices” indicatesa subset of reactions in MG . A set of output and and-related metabolites of the reactions in the set R in aretriggered at each step s by those reactions. From each ﬁrst-visited metabolite,new pathways are created through outputs/inputs in forward/backward ﬂowgetting primary tags. Recursively, all pathways including the set R in are createduntil there are no metabolites to be triggered for each path.Since an arbitrarily selected set of reactions can be given, disconnected sub-graphs might be generated and the given selection may violate the elementarityof the generated path. Therefore, undesired paths should be removed from theresults. Proposition 2.

Starting with an initialized set of reactions with certain ﬂuxes,all nodes are traversed and all reactions in a pathway are assigned a ﬂux usingEquation 2, as each node has a primary input and a primary output to be used inthis equation. Equation 2 states that the amount of the incoming and outgoingﬂuxes in N i should be equal. Proposition 3.

Flux dependencies can prevent the rate of the production/consumptionof an internal metabolite in a given topology of a pathway from being zero. Equa-tion 3 is used to calculate an update for the consumption/production rate of thenode. In the following cases, the pathway is discarded: • I f p n q ijp “ or • O f p n q ijp “ , • a loop is repeated over a node (i.e., getting back to a node from the samereaction multiple times), while trying to ﬁnd a way out in a subgraph ofthe pathway. Lemma 1.

If a pathway is EFM, its graph model is reachable by graph traversalof Proposition 1.Proof.

An EFM is a ﬂux-balanced pathway with no ﬂux-balanced subset. Node-dependent pathways are referred to the pathways in which all related nodes ofa selected reaction are traversed. All node-dependent pathways are producedaccording to Proposition 1 through diﬀerent reaction choices. Therefore, either aﬂux-balanced pathway has a ﬂux-balanced subset (which is node-dependent andreachable by Proposition 1) or it is an EFM itself and reachable by Proposition1 as well.

Lemma 2.

In a consistent network, for two types of pathways there exists a ﬂux-balanced solution.

Type I pathways are deﬁned as pathways with no hyperarcs;either cycles or pathways from an input to an output with one input and oneoutput for each node.

Type II pathways are those with nodes with more thanone input or more than one output but with no ﬂux-dependent reactions. roof. For type I pathways, all nodes have one primary input and one primaryoutput. Using Equation 2 subsequently, as stated in Proposition 2, ﬂuxes areassigned to all reactions such that consumption/production of metabolites arebalanced. For type II pathways, using Equation 3, as stated in Proposition 3, forall visited nodes there is a sequence of reactions through primary inputs/outputsto balance production/consumption of metabolites.To analyze pathways with ﬂux-dependent nodes, an optimization problem isdeﬁned in Lemma 3.

Lemma 3.

Finding ﬂux values for potentially ﬂux-balanced pathways in theproposed architecture can be deﬁned as an optimization problem.Proof.

Function f for node i is deﬁned as f p i q “ ř I c ij I f ijp ´ ř O c ij O f ijp . Theproblem of ﬁnding ﬂux-balanced pathways, considering that the function of eachmetabolite i is independent, is deﬁned as follows:Minimize: g p i q “ ř f p i q , i “ M , in which M is the number of active metabo-lites in the candidate pathway p .Constraints: For each node i , the outgoing/ingoing output/input reaction shouldbe selected such that g p i q “

0. The selected reaction carries the ﬂux from theset of active reactions j in p as stated in propositions 2 and 3. Theorem 1.

Sampling parallel pathways on the proposed system architecture,leads to exploration of a set of EFMs which includes all type I and type II

EFMs.Proof.

Based on the result of coupon collector ' s problem in probability theory,the expected number of picks required to choose all the elements of a set is n n ř k “ k which for large n is approximately n log n . Consider the length of apathway as the number of its contributed metabolites. For a pathway withlength l , for each metabolite, r max selections are possible, in which r max is themaximum size of output/input array in the forward/backward ﬂow. Therefore, p r max log r max q l eﬀorts for a pathway with length l are required to create allpossible routes. Number of eﬀorts reﬂects number of parallel pathways, consid-ered as distinct blocks on the GPU platform. The result of Lemma 1 is usedto show that by selecting all outputs/inputs (in forward/backward ﬂow), allpathways are produced. The results of Lemma 2 and Lemma 3 are used toshow that the function running on each GPU thread keeps metabolite produc-tion/consumption rates balanced and leads to the construction of type I and typeII ﬂux-balanced pathways. With a proper solution for the optimization problemof Lemma 3, a balanced solution for other pathways can be explored. Discussion summary

The proposed model traverses the graph and keeps both the stoichiometry in-formation and the minimality of the path with the opportunity of not exploring25he whole solution space. Our approach tries to decide if a certain “path” isEFM or not, merely based on topology rules (instead of recognizing the elemen-tarity through rank test or comparing new candidates with produced ones). Thegraph structure makes a good track of unbalanced metabolites. Each metaboliteis considered as an object and the graph structure makes it easier to set-up rulesfor paths and explore the intended solution space via rules; e.g., initializing aset of reactions to ﬁnd their including EFMs.A hypergraph is a simpliﬁed type of the modiﬁed AND/OR graph. In theintroduced model, each arc has two diﬀerent input and output coeﬃcients. Be-sides, for each arc, a dynamic label for each pathway is deﬁned, that is, the ﬂuxof the reaction represented by that arc. The complication arises since ﬂuxes ofcontributing input and output reactions over a node may be related. This re-sults to the non-linear property of ﬂuxes. The non-linearity has been consideredin our ﬂux calculation procedure (see Eq. 2 and Eq. 3).In comparison to the method of [20], both methods are exploring the non-steady-state solution space and try to apply additional constraints to reduce thecomplexity. However, in this approach instead of using the support of LP-basedtools, we used an inherent parallel structure of the network and developed amodel on hardware. However, while the basic framework is sketched and thepotential of the method on pathways belongs to large networks is shown, furtherstudied are required to automatize the approach to get desired output EFMs.The ﬁrst introduced double-description method for ﬁnding EFMs exploresthe whole set of pathways to ﬁnd pathways that are in the steady-state solu-tion space and then selects EFMs by comparing the reaction subset of path-ways [6]. In the improved double-description versions of the method which usenull-space of the stoichiometry matrix as an input, diﬀerent combinations ofpathways in the null-space are calculated and then the reaction subset of path-ways are compared [12, 13, 14]. In [22], GB-EFM method ﬁrst calculates thereaction-dependent pathways in the solution space and then checks to see ifthese pathways can be in the steady-state by using some rules on the topologyof the pathway. In the proposed model the procedure of constructing the pathsand balancing them are combined together to introduce a hardware independentcore so as to combine the results of all these cores with the same decision tableand distributed data on the network to calculate EFMs.

Conclusion

In this paper, a modular system architecture was proposed to calculate minimalﬂux-balanced metabolic pathways. The architecture is based on the AND/ORgraph model. Each META x module was designed in order to emulate the inter-nal function of a metabolite for ﬁnding EFMs. The proposed architecture wasimplemented on a GPU platform to take advantage of the parallel architectureprovided in the GPU based on multiple cores and hierarchical memory. Thememory levels of the GPU are used to illustrate the memory hierarchy in thesystem. The topology-based parallelism obtained by the system was the main26chievement of the model. Additionally, the simpliﬁed metabolic network ofthe CHO cell was studied to prove the concept of the design on metabolic net-works to ﬁnd EFMs. Besides, the potential of the model was studied on shortestpathways of the E.coli model.Studying genome-scale models and ﬁnding biologically meaningful path-ways [30, 31] by setting rules in the module ' s function are considered as ourfuture research. In this paper, the static structure of the GPU was used. Usingdynamic thread activation, as provided in recent GPU architectures, and storeaccurate pathway information according to the available memory space are theideas to make the model appropriate for genome-scale analysis. In addition,core function decisions can be improved by randomized decisions while keepinga global cost function to manage the moves, which is considered to investigate inour future research. Besides, partitioning and compression preprocessing meth-ods can be used further to overcome the limitation of the hardware platforms. Supporting information

S1 excel ﬁle. Applying the proposed model on

E.coli shortest paths.

Complementary data are provided in the excel ﬁle.

S2 pdf ﬁle. Applying the proposed model on

E.coli shortest paths.

Complementary ﬁgures are provided in the pdf ﬁle.

S3 pdf ﬁle. CPU-GPU comparison model.

Complementary ﬁgure isprovided in the pdf ﬁle.

Acknowledgments

Authors would like to thank Dr. Nathan Lewis (UCSD) for helpful discussionson the concept.