Workflow Management on BFT Blockchains
SSMaRt Blockchain Distributed Workflow Management
Joerg Evermann, Memorial University of Newfoundland, St. John’s, Canada
Henry Kim, York University, Toronto, Canada
Abstract.
Blockchain technology has been proposed as a new infrastructure technology for a wide variety of novel applications. Blockchains provide an immutable record of transactions, making them useful when business actors do not trust each other. Their distributed nature makes them suitable for inter-organizational applications. However, proof-of-work based blockchains are computationally inefficient and do not provide final consensus, although they scale well to large networks. In contrast, blockchains built around Byzantine Fault Tolerance (BFT) algorithms are more efficient and provide immediate and final consensus, but do not scale well to large networks. We argue that this makes them well-suited for workflow management applications that typically include no more than a few dozen participants but require final consensus. In this paper, we discuss architectural options and present a prototype implementation of a BFT-blockchain-based workflow management system (WfMS).
Keywords:
Byzantine fault tolerance · blockchain · workflow management · interorganizational workflow · distributed workflow

arXiv [cs.DC], Aug

Inter-enterprise business processes may include stakeholders in adversarial relationships who nonetheless have to jointly complete process instances. Trust in the current state of a process instance and in the correct execution of activities by other stakeholders may be lacking. Blockchain technology can help in such situations by providing a trusted, distributed workflow execution infrastructure.

A blockchain cryptographically signs a series of blocks, containing transactions, so that it is difficult or impossible to alter earlier blocks in the chain. In a distributed blockchain, actors independently validate transactions, add them to the blockchain, and replicate the chain across different nodes. The independent and distributed nature of actors requires finding a consensus regarding the validity and order of transactions and blocks. In workflow execution, it is important that actors agree on the "state of work" as this determines the set of next valid activities in the process. Hence, it is natural to use blockchain transactions to describe workflow activities or workflow states.
In contrast to prior work, which has focused on transaction ordering on proof-of-work blockchains, we examine the use of consensus protocols based on algorithms for Byzantine Fault Tolerance (BFT). Furthermore, we explore the architecture of a blockchain-based WfMS without smart contracts. We motivate both of these choices later in the paper. Even without the use of smart contracts, the blockchain remains essential as it provides independent validation of workflow activities, distribution, replication, and tamper-proofing to workflow execution.

Blockchain technology admits many different system designs, and WfMS can be implemented in many different ways on blockchain infrastructure. In this paper, we focus on the interface between the blockchain and the workflow engine and the architectural options available for the design of the system.
Contributions
We describe a prototype WfMS as a proof-of-concept implementation for an architecture that has not yet seen attention in the literature. First, in contrast to earlier work (Sec. 2), we do not use smart contracts to implement model-specific workflow engines. We show that generic or existing workflow engines can be readily adapted to fit onto a blockchain infrastructure and that smart contracts are not required. Second, as recommended, but not implemented, by [22], we show how a BFT-based blockchain can be used as workflow management infrastructure. We describe the implementation of a blockchain-based WfMS that has served as our tool to investigate design choices, problems and solutions in this research area. While our prototype is an important demonstration of feasibility, our main contribution is the identification and discussion of the different architectural choices, and highlighting the existence and feasibility of alternatives to smart contracts on proof-of-work blockchains in this research area.

The remainder of the paper is structured as follows. Section 2 reviews related work on blockchain-based WfMS. We then describe the principles of distributed blockchains with a focus on BFT-based consensus (Sec. 3). Section 4 describes the architecture of our system and discusses design choices. Section 5 presents our prototype implementation. Finally, Sec. 6 discusses implications of BFT-based blockchain technology for WfMS and gives an outlook on future work.
This section discusses existing work in two research areas. The first subsection focuses on blockchain technology applied to workflow management; the second subsection focuses on blockchains that apply a BFT ordering mechanism.
Blockchain-based workflow management has only recently received research attention [15]. The main research challenges are around integration of blockchain infrastructure into WfMS and ensuring correctness and security of the workflow execution [15]. A number of prototype implementations have been presented, focusing on the use of "smart contracts". A smart contract is a software application that is recorded and executed on the blockchain. This application "listens" for relevant transactions sent to it and executes application logic upon receipt of a transaction. For example, the widely used Ethereum blockchain has a Turing-complete virtual machine (VM) for smart contracts and compilers for different programming languages.

In a project driven by a financial institution, a prototype workflow implementation using smart contracts on the Ethereum blockchain offers digital document flow in the import/export domain [8,7]. The project demonstrates significantly lowered process cost, as well as increased transparency and trust among trading partners.

A blockchain-based workflow project in the real-estate domain [13], also using the Ethereum blockchain and smart contracts, notes that the decentralized nature of blockchains and the lack of a central agency will make it difficult for regulators to enforce obligations and responsibilities of trading partners.

A complete WfMS, including collaborative workflow modelling and model instantiation, uses models as contracts between collaborators [10]. The system allows distributed, versioned modelling of private and public workflows, consensus building on versions to be instantiated, and tracking of instance states on the blockchain. The blockchain provides integrity assurance for models and instance states.
The authors note that the usefulness of the approach is limited by block size limits on the blockchain and the latency of new blocks [10].

Another implementation of blockchain-based workflow execution [24,25] uses smart contracts on the Ethereum blockchain either as a choreography monitor, where the smart contract monitors execution status and validity of workflow messages against a process model, or as an active mediator, where the smart contract "drives" the process by sending and receiving messages according to a process model. BPMN models are translated into smart contracts. Local Ethereum nodes monitor the blockchain for relevant messages from the smart contract and create messages for the smart contract. Transaction cost and latency are recognized as important considerations in the evaluation of the approach.

A comparison between the public Ethereum blockchain and the Amazon Simple Workflow Service cloud-based environment shows blockchain-based costs to be two orders of magnitude higher than a traditional infrastructure [18]. Hence, optimizing the space and computational requirements for smart contracts is important [9]. BPMN models are first translated to Petri nets, for which minimizing algorithms are known. The minimized Petri nets are then compiled into smart contracts, achieving up to 25% reduction in transaction cost [24,25], while also significantly improving the throughput time.

Building on lessons learned from [24,25], Caterpillar is an open-source blockchain-based business process management system [14]. Developed in Node.js, it uses standard Ethereum tools, like the Solidity compiler solc and the Ethereum client geth, to provide a distributed execution environment for BPMN-based process models. Lorikeet is a similar system [6], also based on BPMN models that are translated to smart contracts for the Ethereum chain.

(Footnote: https://ethereum.github.io/yellowpaper/paper.pdf)
Also working with Ethereum and Solidity, [21] presents a system that focuses on resource management in addition to control flow considerations and extends the smart contracts to manage a variety of resource allocation patterns.

The replicated nature of blockchains means that information is available to all participants. One approach to address this privacy issue in the context of workflow management is the use of access control lists and their enforcement in smart contracts [17].

After examining different blockchain consensus mechanisms in terms of termination time and fault tolerance, [22] recommends BFT-based consensus for business process executions.

Solving the ordering and consensus problems not with expensive proof-of-work approaches, but with efficient and provably correct and live algorithms, is an important motivator for many recent blockchain projects. The Hyperledger project of the Linux Foundation is the umbrella for a number of BFT-based blockchain implementations at various stages of maturity. Hyperledger Burrow is a blockchain that can execute Ethereum virtual machine code but is based on the Tendermint BFT-based consensus algorithm. Hyperledger Iroha is based on "YAC", a proprietary BFT-based consensus protocol, but does not provide smart contracts. Hyperledger Indy is a blockchain implementation for decentralized identity management, based on redundant byzantine fault tolerance (RBFT) [2]. Hyperledger Fabric is a generic blockchain implementation that provides smart contracts, called "chaincode", which can be written in Go or Node.js. Early implementations used the BFT-SMART ordering protocol [19], while recent versions have moved to the simpler, crash-fault tolerant (CFT) Raft algorithm [16].

A blockchain records transactions in contiguous blocks. A transaction can be any kind of content. Information integrity is maintained by applying a hash function to the content of each block, which also contains the hash of the previous block in the chain.
Hence, altering a block requires changing all following blocks. In a typical blockchain, nodes are connected using a peer-to-peer network topology. New transactions may originate on any peer and must be recorded in new blocks. Blocks are generally distributed to each peer for independent validation and replicated storage. The key challenge is to achieve a consensus on the validity and order of transactions and blocks, despite peers that are characterized by "byzantine faults": they may not respond correctly, may respond unpredictably, or may become altogether unresponsive.

(Footnote: https://tendermint.com/)

Blockchains may be either public or permissioned ("consortium"). Public blockchains typically have no access control or identity management. Hence, no node can be assumed to be trustworthy. In contrast, a permissioned blockchain has access controls, node operators are generally known and invited to participate, and (some) node operators may be implicitly trusted. The distinction between public and permissioned is not binary, but a continuum [22].

Public chains are typically created to serve a large number of anonymous participants. Their advantages include anonymity, universal access, and generally a high trustworthiness as a large number of nodes provide independent transaction validation. On the other hand, public chains require incentives for validation, often in the form of a cryptocurrency, which increases transaction costs. Public chains also provide little flexibility to adapt to special use cases.

In contrast, permissioned chains are typically created for a specific use case with a small number of known institutional participants. Advantages of permissioned chains include low transaction costs, high flexibility to adapt to special use cases, identifiability of transaction originators, and access controls.
Disadvantages may include relatively lower trustworthiness due to the smaller number of validating nodes.

Workflow management is typically the domain of a small number of institutional collaborators, rather than a large number of anonymous participants. As such, it is a good fit with permissioned blockchains.

While the blockchain technology used for public blockchains may also be used for permissioned blockchains, the different characteristics of the latter may permit or favour the use of technology options that would not be suitable for public blockchains, such as communication-intensive BFT-based systems.
Smart contracts allow code execution as part of transactions on the blockchain. Advantages include code integrity, as code is part of the blockchain, and a tight integration of application logic with transaction validation. Disadvantages may be limitations of the smart contract language instruction set and the need to re-develop existing application logic.

In contrast, implementing application logic off-chain means that existing applications do not need to be ported, and developers have access to familiar programming languages, code libraries and development tools. On the other hand, transaction validation must call back to the application logic.

Smart contracts ensure that all nodes provide the same validation results, whereas performing validation in off-chain logic places the onus on the developers to ensure identical results for all nodes. On the other hand, it allows developers to develop against a behavioural specification without specifying the exact algorithms or implementation to be used. For the WfMS case in this article, that means that transparency is lost about the specific details of the workflow implementation, but what is gained is that different workflow systems can interoperate as long as all obey the same workflow semantics.

Smart contracts have great potential in the context of workflow management, as witnessed by the Caterpillar and Lorikeet approaches [6]. However, neither Caterpillar nor Lorikeet provides a BPMN-based generic workflow engine as a smart contract. Instead, both systems compile individual BPMN models to specific smart contracts. Given the extensive investment in WfMS by researchers and practitioners, we believe that investigating how standard WfMS can be implemented on blockchain infrastructure without re-implementation in smart contract languages is worthwhile.
Bitcoin popularized the proof-of-work mechanism for consensus finding and securing the blockchain. New transactions are distributed to all peers, validated and added to a transaction pool. Validation is based on transactions that exist in the chain as well as others already in the transaction pool. Each peer can independently propose new blocks based on its latest block and distribute these to other peers. Depending on network connectivity, speeds, and topology, each peer may have a different set of blocks and transactions, and hence may propose different blocks, leading to side branches. Each peer considers the longest branch as the current main branch and proposes new blocks based on this. Transactions in side branches are not considered valid and are not considered when validating new transactions or blocks. When a side branch becomes longer than the current main branch, the chain undergoes a reorganization. What was the side branch is validated and becomes the main branch. What was the main branch is considered invalid and becomes a side branch. Transactions no longer in the main branch are added back to the transaction pool to be included in other blocks. As a consequence, different peers can at times consider different blocks and transactions as valid. As proposed blocks are distributed across the network, peers will eventually converge on a consensus regarding the valid blocks and transactions and their order in the main branch of the chain.

To limit the rate of new block proposals and to secure the blockchain against attacks, proof-of-work consensus requires block proposers to solve a hard problem ("proof-of-work", "mining"). Typically, this is to require the block hash to be less than a certain value. A limited block rate allows nodes to achieve eventual consensus, and a hard problem prevents attackers from "overtaking" the creation of legitimate blocks with fraudulent ones.
Assuming equal processing power for each node, the network needs 2f + 1 total nodes to tolerate f faulty or malicious nodes.

The probability that a transaction in the main branch of the blockchain becomes invalid decreases with each block that is "mined" on top of it, although in principle it is always possible that a block becomes invalidated. Blockchain communities use rules of thumb for the number of additional blocks that is considered to make a transaction "safe" enough to act on. In addition to the lack of finality of consensus, this approach induces significant latency as applications must wait not only for one block but many to be created. Furthermore, applications must actively monitor the status of all transactions of interest, must react to chain reorganizations, and communicate these aspects to the user.
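To make the "block hash less than a certain value" condition concrete, the following is a minimal sketch of proof-of-work mining. It is an illustration only and does not reproduce the block format of Bitcoin or any other chain: the JSON block layout, the use of SHA-256, and the names `block_hash`, `mine` and `difficulty_bits` are assumptions made for this example.

```python
import hashlib
import json


def block_hash(prev_hash: str, transactions: list, nonce: int) -> str:
    """Hash a block's content together with the previous block's hash."""
    payload = json.dumps(
        {"prev": prev_hash, "txs": transactions, "nonce": nonce},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def mine(prev_hash: str, transactions: list, difficulty_bits: int) -> tuple:
    """Search for a nonce whose block hash is below the target value."""
    # Assumed difficulty encoding: a smaller target means a harder problem.
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = block_hash(prev_hash, transactions, nonce)
        if int(digest, 16) < target:
            return nonce, digest
        nonce += 1


# Mine one block on top of an all-zero "genesis" hash.
nonce, digest = mine("0" * 64, ["tx-a", "tx-b"], difficulty_bits=12)
print(nonce, digest)
```

Finding the nonce takes many hash evaluations, while checking it takes one. Each additional difficulty bit doubles the expected search effort, which is how the block rate is limited.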
In response to the drawbacks of proof-of-work consensus, i.e. latencies, no finality of consensus, and required processing power, provably correct ordering algorithms, based on distributed systems research, have seen a resurgence in interest. Most of the ongoing research can be traced back to a practical method for achieving byzantine fault tolerance (PBFT) [5]. PBFT orders client requests using a set of nodes that are fully connected by reliable messaging. Every ordering consensus is established by a specific set of nodes ("view"), with a leader or primary node. Tolerating up to f faulty nodes requires 3f + 1 total nodes.

Protocol
PBFT is a three-stage protocol. A client sends a request to all nodes. The leader proposes a sequence number for the request and broadcasts a pre-prepare message. Upon receipt of a pre-prepare message, a node broadcasts a corresponding prepare message if it has itself received the request, has not already received another pre-prepare message for the same sequence number, and is in the current view. This indicates the node is prepared to accept the proposed sequence number. Nodes then wait to receive 2f matching prepare messages, indicating that 2f + 1 nodes are prepared to accept the proposed sequence number for the request. When a node has received 2f identical prepare messages, it broadcasts a commit message to all nodes. Each node then waits to receive 2f identical commit messages, indicating that 2f + 1 nodes have accepted the proposed sequence number for the request. Upon committing, the node executes the request and sends a reply message to the client. The client in turn waits for 2f + 1 identical replies, which indicates that a consensus has been reached on the sequence number of the request.

In case the leader fails to propose a sequence number, nodes first forward requests to the leader. When the leader continues failing to act on requests or proposes sequence numbers that are too high or too low, nodes trigger a view change. The view change uses a three-stage protocol, similar to the one used in normal operation, to determine a new leader.

Consensus about request sequencing is closely related to state machine replication (SMR). Each node maintains a state that can be changed by client requests. When every node begins with the same state and executes requests in the same order, the state machine is replicated.

BFT SMART
BFT-SMART [4] is a software library built around the PBFT ordering protocol. It adds dynamic view reconfiguration, allowing nodes to join and leave views, and the MOD-SMART [20] state transfer system.
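The quorum sizes in the PBFT description above reduce to simple arithmetic. The sketch below only restates the numbers given in the text; it is not code from PBFT or BFT-SMART, and the function names are hypothetical.

```python
def pbft_sizes(f: int) -> dict:
    """Quorum arithmetic for tolerating f faulty nodes, as stated in the text."""
    return {
        "total_nodes": 3 * f + 1,
        "matching_prepares_to_wait_for": 2 * f,
        "matching_commits_to_wait_for": 2 * f,
        # The 2f matching messages received, plus the receiving node itself.
        "nodes_in_agreement": 2 * f + 1,
    }


def max_faults(total_nodes: int) -> int:
    """Largest f such that 3f + 1 <= total_nodes."""
    return (total_nodes - 1) // 3


def prepared(prepare_senders: set, f: int) -> bool:
    """True once matching prepare messages from 2f distinct nodes have arrived."""
    return len(prepare_senders) >= 2 * f


print(pbft_sizes(1))
print(max_faults(4), max_faults(10))
```

Under these formulas, four nodes tolerate one faulty node and ten nodes tolerate three.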
Collaborative state transfer is useful when nodes create state checkpoints at different times ("sequential checkpointing"). Due to the lack of multiple identical checkpoints, a simple quorum protocol cannot be used. Instead, "collaborative state transfer" [3] provides checkpoint and log information from multiple nodes in a way that allows a new node to verify its correctness.

BFT-SMART provides a simple programming interface. The client-side interface exposes the ability to submit requests for ordered or unordered operations. State-changing operations should be ordered, while read-only operations may be unordered. Applications implement a server-side interface, encapsulating the state machine, that receives ordered and unordered operation requests in consensus sequence from the BFT-SMART library for execution. Any replies are sent back to the requesting client. Operation requests are opaque to the library and are simple byte arrays. It is the responsibility of the client-side and server-side application to serialize and deserialize these in a meaningful way. View reconfigurations (adding or removing a node, or changing the level of byzantine fault tolerance) are special types of ordered requests but are treated as any other ordered request for ordering and consensus purposes.

For state management, the server-side application implements methods to fetch and set a state snapshot or checkpoint, also serialized as a byte array. State changes (ordered operations) are logged and the state is periodically checkpointed (sequential checkpointing). When a node joins a view, it is sent the latest checkpointed state (collaborative state transfer), which it sets for the server-side application, and any ordered operations after that checkpoint are then replayed, allowing the server state to catch up to the consensus state.

BFT-SMART has been proven to be correct and live, i.e. it will provide the same sequence of operations to all nodes and will not deadlock [4].
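The programming model described above (opaque byte-array requests, ordered and unordered operations, snapshot plus replay) can be illustrated with a toy state machine. This is a sketch of the idea only. BFT-SMART is a Java library, and the class and method names below are hypothetical and do not correspond to its actual API.

```python
import json


class CounterReplica:
    """Hypothetical server-side state machine; not the BFT-SMART API."""

    def __init__(self):
        self.state = {"count": 0}

    def execute_ordered(self, request: bytes) -> bytes:
        """State-changing operation, delivered in consensus order."""
        op = json.loads(request.decode("utf-8"))
        self.state["count"] += op["add"]
        return json.dumps(self.state).encode("utf-8")

    def execute_unordered(self, request: bytes) -> bytes:
        """Read-only operation; does not modify the state."""
        return json.dumps(self.state).encode("utf-8")

    def get_snapshot(self) -> bytes:
        """Serialize the state as a checkpoint."""
        return json.dumps(self.state).encode("utf-8")

    def set_snapshot(self, snapshot: bytes) -> None:
        """Install a checkpoint received during state transfer."""
        self.state = json.loads(snapshot.decode("utf-8"))


def encode(amount: int) -> bytes:
    """Client-side serialization of a request into an opaque byte array."""
    return json.dumps({"add": amount}).encode("utf-8")


# An established replica executes three ordered requests, checkpointing after two.
old = CounterReplica()
log = [encode(1), encode(2), encode(3)]
old.execute_ordered(log[0])
old.execute_ordered(log[1])
checkpoint = old.get_snapshot()
old.execute_ordered(log[2])

# A joining replica installs the checkpoint, then replays the operations after it.
new = CounterReplica()
new.set_snapshot(checkpoint)
for request in log[2:]:
    new.execute_ordered(request)

print(old.state, new.state)
```

After replay, both replicas hold the same state, which is the property that state machine replication relies on.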
In terms of throughput, a BFT-SMART system with four nodes (f = 1) supports more than 15,000 operation requests (1 kB size) per second with latencies around 10 milliseconds on a local network. BFT-SMART’s performance decreases linearly as fault tolerance (and hence the number of nodes) increases: A system with 10 nodes (f = 3) still supports more than 10,000 operations per second [4].

Summary
PBFT-based ordering, as implemented in BFT-SMART, avoids the latency, lack of finality and processing requirements of proof-of-work consensus. On the other hand, its three-stage protocol imposes significant communication overhead and requires fully-connected nodes. Fault tolerance in PBFT-derived methods increases linearly with the number of nodes, but performance tends to decrease due to additional communication. [23] presents a comparison of proof-of-work and BFT consensus, shown in Table 1.
The different strengths and weaknesses of the two consensus mechanisms suggest that BFT-based ordering is a good fit with small, permissioned blockchains in the workflow management context.

Note that while throughput (transactions per second) is a key performance metric for many blockchains and consensus algorithms, it is not important in the workflow management context: Even the largest organizations are unlikely to have production workflow systems that need to sustain tens of thousands of workflow actions per second.

                                Proof-of-work       BFT ordering
Node identity                   open, anonymous     permissioned, nodes know other nodes
Consensus finality              no                  yes
Scalability (ordering nodes)    excellent           limited
Scalability (clients)           excellent           excellent
Throughput                      limited             excellent
Latency                         high                low
Correctness proof               no                  yes

Table 1. Comparison between proof-of-work and BFT-based blockchains, adapted from [23]
The main component of a WfMS is the workflow engine, which interprets the workflow model and enables work items for manual execution or execution by external applications [12]. The engine maintains workflow state information and case data. It may be supported by, or include, services for organizational data management and role resolution, worklist management, document storage, etc. Designing a WfMS architecture requires choosing where to locate and how to implement the workflow engine and other services.

Existing work on blockchain-based workflow management (Sec. 2) has deployed the workflow engine on the blockchain itself. However, by compiling a workflow model to a smart contract, the contract forms a workflow engine for only that workflow model. Alternatively, blockchains can be treated as a trusted infrastructure layer for generic workflow engines, using the blockchain only for storing and sharing the state of work and achieving consensus on that state. To our knowledge, there has been no such implementation using PBFT-derived, or any other, ordering mechanisms.

Ordering, block management, and the workflow engine are the three main services in our system architecture. Fig. 1 shows the architecture of our system.
Ordering Service
The ordering service in our prototype is implemented based on the BFT-SMART library [4]. It can receive transactions to add to the blockchain, which is an ordered (state-changing) type of request it supports. As its state, the ordering service maintains a record of the latest block hash and block number, as well as a queue of transactions that have been added. When a sufficient number of transactions has been collected, the ordering service creates a new block and clears the transaction queue. The ordering service returns the latest block hash and the hash of the set of queued transactions as a result to clients, allowing clients to detect absence of consensus. Clients can also request the latest block hash.
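The behaviour described above can be sketched as a small deterministic state machine. This is not the prototype's code. The class name `OrderingState`, the JSON block layout, the use of SHA-256, and the fixed `block_size` threshold are all assumptions made for illustration.

```python
import hashlib
import json


def sha256_hex(obj) -> str:
    """Hash any JSON-serializable object deterministically."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


class OrderingState:
    """Hypothetical replicated state: last hash, last block number, queued transactions."""

    def __init__(self, block_size: int = 3):
        self.last_hash = "0" * 64
        self.last_block = 0
        self.queue = []
        self.block_size = block_size  # assumed threshold for "sufficient" transactions
        self.new_blocks = []          # stands in for handing blocks to the block service

    def add_transaction(self, tx: dict) -> tuple:
        """Ordered request: queue tx, cut a block when full, return both hashes."""
        self.queue.append(tx)
        if len(self.queue) >= self.block_size:
            block = {
                "number": self.last_block + 1,
                "prev_hash": self.last_hash,
                "transactions": self.queue,
            }
            self.last_hash = sha256_hex(block)
            self.last_block = block["number"]
            self.new_blocks.append(block)
            self.queue = []
        return self.last_hash, sha256_hex(self.queue)


# Two replicas that receive the same ordered sequence return identical replies.
txs = [{"case": 1, "activity": name} for name in "ABCD"]
a, b = OrderingState(), OrderingState()
replies_a = [a.add_transaction(tx) for tx in txs]
replies_b = [b.add_transaction(tx) for tx in txs]
print(replies_a == replies_b, a.last_block, len(a.queue))
```

Because the reply depends on the whole sequence of ordered requests, a client that receives differing replies from different nodes can conclude that those nodes did not process the same sequence.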
Block Service
The block service stores the blockchain, may exchange blocks with other nodes, and verifies the integrity of the blockchain.

The block service uses a peer-to-peer network for block exchange with new and recovering nodes. This network is distinct from the network layer of BFT-SMART and is not fully connected. Block exchange is required only when a node begins operation and enters an ordering view. At that point, the ordering service state is first updated through the BFT-SMART state replication mechanisms. The block service then compares its latest block to the latest hash from the ordering service. The latter is assumed to be authoritative. Verification of the blockchain then proceeds backwards from the head of the chain, i.e. the block with the latest hash. Any missing blocks are requested from other peers and verified prior to adding them.
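Backwards verification from an authoritative head hash can be sketched as follows. This is an illustrative sketch and not the prototype's implementation. The block layout, the all-zero genesis marker, and the function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed marker for "no previous block"


def sha256_hex(obj) -> str:
    """Hash any JSON-serializable object deterministically."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


def make_chain(batches: list) -> list:
    """Build a hash-linked list of blocks for the example."""
    chain, prev = [], GENESIS
    for number, txs in enumerate(batches, start=1):
        block = {"number": number, "prev_hash": prev, "transactions": txs}
        chain.append(block)
        prev = sha256_hex(block)
    return chain


def verify_backwards(blocks_by_hash: dict, latest_hash: str) -> tuple:
    """Walk from the authoritative head hash towards the genesis block.

    Returns (verified block numbers, hash of the first missing or altered
    block, or None when the whole chain verifies).
    """
    verified, wanted = [], latest_hash
    while wanted != GENESIS:
        block = blocks_by_hash.get(wanted)
        if block is None or sha256_hex(block) != wanted:
            return verified, wanted  # this block must be requested from peers
        verified.append(block["number"])
        wanted = block["prev_hash"]
    return verified, None


chain = make_chain([["t1"], ["t2"], ["t3"]])
head = sha256_hex(chain[-1])  # stands in for the hash held by the ordering service
store = {sha256_hex(block): block for block in chain}
print(verify_backwards(store, head))
```

Starting from the trusted head hash means that an altered earlier block no longer matches the hash recorded in its successor, so it is treated like a missing block and requested again.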
Workflow Engine
The workflow engine maintains information about workflow instances (cases) and workflow model definitions. It receives workflow transactions of new blocks that are added to the chain, updating the state of each process instance and creating work items accordingly. Through the worklist, it manages user interactions with work items and execution of external functions by work items.
[Figure: two nodes, Node A and Node B. Each node comprises an ordering service client with adapter, an ordering service server (last hash, last block), a block service with blocks, and a workflow engine with cases, worklist UI and work items, used by a user. The nodes are connected by the BFT-SMART ordering service and the P2P block exchange service.]

Fig. 1. Architecture overview, transaction flow, and block exchange
The red arrows labelled with numerals in Fig. 1 indicate the steps of handling a workflow transaction in our system:

1. User completes work item in worklist
2. Transaction is created and passed to ordering service client for submission to ordering service
3. Transaction is submitted to ordering service
4. Transaction is passed in order to the ordering service server of all nodes
5. Ordering service servers validate ordered transaction with their workflow engine
6. When transaction pool contains a sufficient number of transactions, a new block is created and passed to block service
7. Block service notifies workflow engine of new block and transactions
8. Workflow engine updates state of running cases and creates new work items for local worklist

The green arrows labelled with letters in Fig. 1 indicate the block exchange mechanism when a peer node is started.

A. Block service queries ordering service server for latest hash and transaction number
B. If block service determines it is missing blocks, it broadcasts a block request to all other nodes
C. Block services receive block requests
D. Block services assemble blocks into response message
E. Block service receives requested blocks and verifies block chain

In step A, note that the ordering service is started before the block service and receives the latest hash and transaction number through state exchange from other nodes. Furthermore, the block request contains the lower and upper block numbers required by the node. In step B, the block service begins by querying one random peer. When it receives no response, it queries an increasingly larger number of peers for blocks. In step D, other nodes only respond if they can satisfy at least the upper block number. In step E, if the block chain contains the most recent block but is missing individual earlier blocks, the block service will successively request these blocks from the peer it has most recently received blocks from. If this fails, it will again broadcast a query for a specific block. As fragments of the blockchain and individual blocks are added, the block service successively verifies the chain integrity beginning with the latest block and the last hash received from the ordering service.

Next, we discuss the architectural options that we considered when designing our prototype system. These affect performance, ease of implementation, and resilience.
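The escalating peer query described for step B above can be sketched as follows. The text only states that an "increasingly larger number of peers" is queried. The doubling schedule, the function name `request_blocks`, and the `ask` transport callback are assumptions for illustration and are not taken from the prototype.

```python
import random


def request_blocks(peers: list, lower: int, upper: int, ask, rng=random) -> list:
    """Query one random peer first, then increasingly larger sets of peers.

    `ask(peer, lower, upper)` is a hypothetical transport call. It returns a
    list of blocks, or None when the peer cannot satisfy at least the upper
    block number (step D).
    """
    fanout = 1
    while True:
        for peer in rng.sample(peers, min(fanout, len(peers))):
            blocks = ask(peer, lower, upper)
            if blocks:
                return blocks
        if fanout >= len(peers):
            return []  # every peer was asked in the last round
        fanout *= 2  # assumed escalation schedule


# Simulated peers, each knowing the chain up to a certain block number.
heights = {"B": 2, "C": 5, "D": 1}


def ask(peer: str, lower: int, upper: int):
    """Simulated peer response; block numbers stand in for the blocks."""
    if heights[peer] >= upper:
        return list(range(lower, upper + 1))
    return None


print(request_blocks(list(heights), 3, 5, ask, random.Random(0)))
```

The final round always covers all peers, so a request that any peer can satisfy is eventually answered, while early rounds keep traffic on the peer-to-peer network low.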
Because BFT-SMART provides exchange of state information with new and recovering nodes, one architectural option is to employ this method also for the blocks of the blockchain. This means that the entire blockchain is part of the replicated state in BFT-SMART, effectively removing the need for a separate block service with its peer-to-peer network and block exchange protocol. While easy to implement by serializing the blockchain into the BFT-SMART state snapshot, this model becomes infeasible as the blockchain becomes too large to be rapidly exchanged with other nodes using the complex and communication-intensive collaborative state transfer mechanism in BFT-SMART. As an alternative, it is sufficient for the state to only contain the hash of the last block, the number of the last created block, and the queue of transactions waiting to be collected into new blocks.
As noted above, blocks are created by the ordering service. One design option is to pass new blocks as replies from the ordering service operation back to the node that requested the add-transaction operation that triggered the block creation. That node’s block service is then responsible for exchanging the block with other nodes using the peer-to-peer network. This creates significant traffic on that network and may also lead to delays in new block distribution.

A second design option, implemented in our system, is to have the ordering service server-side application that creates the new block pass the new block directly to the block service on its node. This tighter coupling between ordering service and block service reduces the communication overhead for the peer-to-peer network and latencies due to the block exchange. The peer-to-peer network is still required for block exchange with new or recovering nodes.
One option is for workflow engine and block service to always be present together on each node, as we have done for our system. The block service can then notify the engine of new blocks, and the engine can validate transactions for the ordering service, using local method calls.

While there is little to be gained by separating block service and workflow engine and running multiples of each, a second option is to operate only a single block service with multiple, distributed workflow engines. This eliminates the peer-to-peer network and block exchange communication. Blockchain integrity can still be verified from the latest hash of the ordering service nodes. However, this design eliminates the redundant storage that is an advantage of a replicated blockchain. On the other hand, redundancy can be achieved by a replicated storage layer within the block service, e.g. a distributed file system or database.
A transaction may represent workflow operations such as defining a new workflow model, launching a new case, executing an activity, aborting or cancelling a case, or removing a workflow model. Activity execution information includes the activity name and case ID, as well as input and output data values. Alternatively, a transaction can represent a workflow instance state, i.e. data values and enabled activities, without capturing activity execution itself.

The first option requires the engine to maintain its own state of the workflow (i.e. information about workflow models, running instances, data values and enabled activities). Constructing this state means reading the blockchain forwards from the genesis block and replaying all transactions. State updates are done by executing transactions in new blocks. While reducing the amount of information stored on the blockchain, as only changed information is recorded, this option requires significant effort in managing the separate state and ensuring it is consistent with the blockchain record. In contrast, the second option makes the workflow state available by reading the blockchain backwards from the head to identify the latest state for each process instance. State updates are done simply by copying workflow states from blockchain transactions as new blocks are presented. Not maintaining separate state significantly simplifies the workflow engine design but leads to more information being stored on the chain.

The first option provides activity information in each transaction. Hence, data constraints can be specified as post-execution constraints and checked when validating the transaction. The second option does not provide information about activity execution in a transaction. Hence, only global case data constraints can be specified and checked as part of transaction validation.

Finally, while transactions are waiting to be included in a block, users can be made aware of such pending transactions. For the first option, transactions are informative as they inform the user about pending workflow activities. In the second option, such transactions are less informative to the user, as they do not contain activity execution information.
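For the second option, recovering the workflow state by reading the blockchain backwards from the head can be sketched as follows. This is a simplified Java illustration under stated assumptions: an instance state is reduced to a case ID and a marking string, and the class, record and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rebuilding the latest state of every case from a
// chain whose transactions are complete instance states.
class LatestStateSketch {

    // Simplified instance state: the case it belongs to and its marking.
    record InstanceState(String caseId, String marking) {}

    // Simplified block: a number and the instance states it contains, in order.
    record Block(int number, List<InstanceState> transactions) {}

    // Walks from the head of the chain towards the genesis block. The first
    // state encountered for a case is its most recent one, so older states
    // of the same case are skipped.
    static Map<String, InstanceState> latestStates(List<Block> chain) {
        Map<String, InstanceState> latest = new HashMap<>();
        for (int b = chain.size() - 1; b >= 0; b--) {
            List<InstanceState> txs = chain.get(b).transactions();
            for (int t = txs.size() - 1; t >= 0; t--) {
                InstanceState state = txs.get(t);
                latest.putIfAbsent(state.caseId(), state);
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Block> chain = List.of(
                new Block(0, List.of(new InstanceState("case-1", "p1"),
                                     new InstanceState("case-2", "p1"))),
                new Block(1, List.of(new InstanceState("case-1", "p2"))));
        System.out.println(latestStates(chain));
    }
}
```

No transaction needs to be replayed: the engine only copies the newest state per case, which is the simplification argued for above, at the cost of storing full states on the chain.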
In proof-of-work blockchains, blocks contain multiple transactions. The block size is a trade-off among transaction arrival rate, available hashing power, desired block creation rate, available network bandwidth, and tolerance for latency. A transaction may be "pending" for some time until it is included in a block and at a "safe" depth. In contrast, in BFT-based systems, there is no reason to prevent blocks from containing only one transaction, i.e. the blockchain becomes a chain of transactions.

Moving to a chain of transactions has another advantage. Proof-of-work systems order transactions between different blocks, but the order of transactions within a block is not defined: transactions may be included in the same block as long as they are not mutually contradictory. Block miners ultimately impose an order, but this order is arbitrary. This means that as pending transactions are collected, they must be validated against the entire set of pending transactions to ensure they are not mutually conflicting. In a chain of transactions, a new transaction must be validated only against the immediately prior one.
The ordering and block services (the latter always together with a workflow engine) can be coupled to varying degrees. At one extreme, block management is part of the ordering service, as discussed in Sec. 4.1.

In the less integrated architecture implemented in our system, every block service and workflow engine node is also an ordering node and vice versa, but block management is distinct from ordering and implements its own peer-to-peer network infrastructure. This allows each ordering node to quickly validate transactions using the local workflow engine. The drawback of this design is that the number of ordering nodes should be determined by the desired level of fault tolerance, whereas the number of workflow nodes should be determined based on the business process and/or application. An application requiring more ordering than workflow nodes is not a problem as the additional nodes are simply not assigned any workflow tasks. On the other hand, when an application requires more workflow nodes than ordering nodes, the excess ordering nodes decrease performance due to the communication overhead.

Both types of coupling have the problem that a faulty ordering service also compromises the block service and with that the workflow engine on that node. However, workflow engine and block service can detect their local node's faults when adding a transaction by comparing the consensus result of the ordering service to the result of the local ordering service. If a detected fault is accidental, the node can be reset and synchronized with the consensus ordering view and receive valid blocks from peers. If the faulty behaviour is malicious, it is the intention of the process participant that controls the entire node, so the block service and workflow service are also compromised intentionally.

At the other extreme, in a very loosely coupled architecture, ordering nodes and block service / workflow nodes are separated. Newly created blocks are passed to the block server as replies from BFT operations and are communicated using the block service peer-to-peer network. However, because the ordering service validates transactions after ordering but before accepting them, each ordering node would require a reliable connection to at least one workflow engine. Managing these connections as workflow engines join and leave the network, and managing the additional communication, adds significant complexity and introduces additional latency in validating transactions.
Given the architectural design options discussed in the previous section and their advantages and disadvantages, we chose to implement our initial prototype by storing only the latest block hash, block number, and transaction pool as BFT-SMART state (Sec. 4.1). The workflow engine and block service are always present together at each node (Sec. 4.3) and both are always co-located with an ordering service node (Sec. 4.6). The ordering service passes new blocks directly to the local block service upon block creation, but a peer-to-peer network supports block exchange with new or recovering nodes (Sec. 4.2). We store workflow states on the blockchain, instead of workflow operations (Sec. 4.4), so that the engine does not have to maintain a separate workflow state. The block size is user configurable (Sec. 4.5). We developed the prototype in Java. Source code (https://joerg.evermann.ca/software.html) and a video demonstration (https://joerg.evermann.ca/BlockchainDemo.html) are available. Fig. 2 shows a screenshot of our prototype.

Fig. 2. Screenshot of prototype

We implemented a permissioned peer-to-peer infrastructure with a pre-defined list of participating actors. To keep our prototype simple, actors are identified by their internet address rather than their public keys, so that we can omit an address resolution layer. The P2P layer is implemented using Java sockets and serialization. Each P2P node has an outbound server that establishes connections to other peers, and an inbound server that accepts and verifies connection requests from peers. Each connection is served by a peer-connection thread, which in turn uses inbound and outbound queue handler threads to receive and send messages. Incoming messages are submitted to the inbound message handler, which passes them to the appropriate service. Nodes can join and leave the peer-to-peer network at will. When a node joins, it tries to open connections to running peers. The first peer to be contacted will initiate a view change in the BFT-SMART ordering service to include the new peer on that level as well.

Upon starting of a node, the BFT-SMART layer will first update state information from other nodes in the view. Next, the block service will identify missing blocks and request them from peers. Once the blockchain is complete and verified, the workflow engine reads the blockchain to get the latest state for each workflow instance. Peer-to-peer messages are cryptographically signed and verified upon receipt. Table 2 lists the message types on our peer-to-peer network.
BlockRequest        Requests a block with a specific hash from one or more peers
BlockSend           Sends a block to one or more peers
BlockChainRequest   Requests multiple blocks within a hash range from one or more peers
BlockChainSend      Sends multiple blocks to one or more peers

Table 2. Message types
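Signing peer-to-peer messages and verifying them upon receipt can be sketched with the standard Java security API. The signature scheme is not specified here, so the use of RSA keys with "SHA256withRSA" is an assumption, as are the class and method names.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical sketch: signing and verifying a serialized P2P message.
// The algorithm choice (RSA with SHA-256) is an assumption.
class SignedMessageSketch {

    // Creates a fresh RSA key pair for a peer.
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Sender side: signs the serialized message with the sender's private key.
    static byte[] sign(byte[] payload, PrivateKey key) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(key);
            signer.update(payload);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Receiver side: checks the signature against the sender's public key.
    // Malformed signatures are treated as verification failures.
    static boolean verify(byte[] payload, byte[] signature, PublicKey key) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(key);
            verifier.update(payload);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        KeyPair keys = newKeyPair();
        byte[] message = "BlockRequest:hash=abc123".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(message, keys.getPrivate());
        System.out.println(verify(message, signature, keys.getPublic()));
    }
}
```

In a permissioned setting with a pre-defined list of actors, each node would hold the public keys of all other actors in advance, so no key exchange protocol is needed for verification.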
Our blockchain has two transaction types. A ModelUpdate transaction installs a new workflow model definition. An InstanceState transaction contains a state of a workflow instance. It is submitted after a new case has been launched or an activity instance has been executed. Extensions to terminate cases and invalidate model definitions are readily possible.

To keep our prototype simple, our workflow models are based on plain Petri nets [1]. Each Petri net transition specifies a workflow activity. The workflow engine keeps track of the Petri net markings and case data, and can detect deadlocked and finished cases to remove them from the worklist.

Each activity is associated with a single node. This partitioning of the process to different nodes does not form the resource perspective of the workflow but is used only to signal each node whether to act on a transaction. Each node can provide its own resource management by defining roles or other organizational concepts and performing further work item allocation within each node. Our models allow the process designer to specify this information.

External method calls are specified as calls to static Java methods, and are performed synchronously by the workflow engine on work item enablement.

The data perspective is implemented as a key–value store. We currently admit only simple Java types as we implement a GUI for these; an extension to arbitrary types is readily possible. Each workflow instance has a set of data variables. When a transition is enabled, an activity instance (work item) is created for it and its input values are filled from the values of the workflow instance. The activity instance is then added to the local worklist or externally executed. After an activity instance is completed (manually or through execution of an external application), output values are written back to the workflow instance, which is then submitted as an InstanceState transaction to the ordering service.

We emphasize that our implementation is not meant to be a fully-featured WfMS. Instead, it serves only to illustrate generic WfMS functionality and its interplay with blockchain infrastructure components. The WfMS features themselves are not the focus of this research.

The ordering service, workflow engine and the block service have a simple interface (Table 3). The ordering and block services can call on the workflow engine to validate transactions against the current workflow state and, optionally, against the most recent pending transaction. Validation checks that a transaction's instance marking is reachable from the marking of the current workflow instance state or that of the pending transaction. It also checks for data constraint violations. The block service receives new blocks from the ordering service and passes them to the workflow engine. In the other direction, the workflow engine can submit new transactions to the ordering service after a work item has been completed. Finally, the block service can request the latest hash from the ordering service on joining the network or recovering from a fault.

→ validateTransaction(tx[, pendingTx])   Ordering service asks workflow engine to validate a transaction, given the current workflow state and optionally the most recent pending transaction (cf. arrow 5 in Fig. 1)
→ receiveBlock(block)                    Block service receives a new block and passes relevant transactions to the workflow engine (cf. arrow 6 in Fig. 1)
← addTransaction(tx)                     Workflow engine submits a new transaction to the ordering service (cf. arrow 2 in Fig. 1)
← getLatestHash()                        Block service requests the latest hash from the ordering service (cf. arrow A in Fig. 1)

Table 3. Interfaces between ordering service, block service and workflow engine (directions from the perspective of the ordering service)
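The marking check performed during transaction validation can be illustrated with a small Java sketch for plain Petri nets. It makes a simplifying assumption that is not stated above: the proposed marking must result from firing exactly one enabled transition in the current marking. All class, record and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one-step reachability check on Petri net markings.
class MarkingValidationSketch {

    // A transition with the tokens it consumes from and produces into places.
    record Transition(String name, Map<String, Integer> consumes, Map<String, Integer> produces) {}

    // A transition is enabled if every input place holds enough tokens.
    static boolean isEnabled(Map<String, Integer> marking, Transition t) {
        for (Map.Entry<String, Integer> arc : t.consumes().entrySet()) {
            if (marking.getOrDefault(arc.getKey(), 0) < arc.getValue()) {
                return false;
            }
        }
        return true;
    }

    // Fires an enabled transition and returns the resulting marking.
    // Places with zero tokens are dropped so that markings compare cleanly.
    static Map<String, Integer> fire(Map<String, Integer> marking, Transition t) {
        Map<String, Integer> next = new HashMap<>(marking);
        t.consumes().forEach((place, n) -> next.merge(place, -n, Integer::sum));
        t.produces().forEach((place, n) -> next.merge(place, n, Integer::sum));
        next.values().removeIf(tokens -> tokens == 0);
        return next;
    }

    // Accepts the proposed marking only if firing a single enabled
    // transition in the current marking yields exactly that marking.
    static boolean reachableInOneStep(Map<String, Integer> current,
                                      Map<String, Integer> proposed,
                                      List<Transition> net) {
        Map<String, Integer> target = new HashMap<>(proposed);
        target.values().removeIf(tokens -> tokens == 0);
        for (Transition t : net) {
            if (isEnabled(current, t) && fire(current, t).equals(target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Transition> net = List.of(
                new Transition("approve", Map.of("submitted", 1), Map.of("approved", 1)));
        System.out.println(reachableInOneStep(
                Map.of("submitted", 1), Map.of("approved", 1), net));
    }
}
```

A validateTransaction implementation along these lines would run the check against the current instance state or, if one is supplied, against the most recent pending transaction, and would then evaluate the data constraints.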
Previous work on blockchain-based WfMS has focused on creating smart contracts to represent specific workflow models. In particular, the Ethereum proof-of-work-based blockchain is widely used. However, proof-of-work-based systems have significant drawbacks in terms of processing power requirements, latency, and the lack of final consensus. In this work, we have shown that a PBFT-derived ordering and consensus method is a suitable WfMS infrastructure. While we do not use smart contracts of modern blockchains, the use of a blockchain remains essential, as it provides independent validation of workflow actions, distribution, replication, and tamper-proofing to workflow management systems.

Through the development of our prototype, we have identified architectural design options with their advantages and disadvantages. Our chosen design, in which we integrate ordering service, block service, and workflow engine on every node, strikes a balance between architectural and implementation simplicity on the one hand, and performance and scalability on the other.
A limitation in our chosen model is that the number of nodes must strike a balance between the requirements of the workflow (the number of actors involved), the desired level of fault tolerance, and the performance of the system. The major advantages are the low communication overhead on the P2P block exchange and the ability of local workflow engines to validate transactions quickly.

While our approach has lower resilience against faults and malicious attacks than proof-of-work chains, it also has lower latency and higher throughput. Unlike proof-of-work chains, the PBFT-based approach does not scale to a very large number of nodes. Given these characteristics, systems such as ours are suitable for permissioned blockchain applications. The low latency makes them suitable for fast-moving processes, where activities are of short duration or must follow each other quickly. Our system is cheaper to operate than public proof-of-work blockchains that incentivize block mining through cryptocurrencies. While one can implement permissioned proof-of-work chains, these lose their resilience against attacks in small networks as it is easy for a single actor to acquire the majority of processing power in a single high-performance node. Attacking a PBFT-based system cannot be done by concentrating computational power but requires control of more than 1/3 of the nodes.

– We currently assign single peer nodes statically to workflow activities. In the future, we will extend this to dynamic peer node assignment and integrate this with the workflow's resource perspective.
– Porting existing feature-complete workflow engines, such as the open-source YAWL [11] or Bonita systems, to blockchain infrastructure allows a richer workflow language and leverages existing implementations.

To conclude, this paper has described a prototype implementation for an architecture that has not yet seen any attention in the blockchain-based workflow literature. We have implemented a PBFT-based system as recommended by [22] and shown that this infrastructure is suitable for WfMS. We have shown how generic workflow engines can be readily adapted to fit onto a blockchain infrastructure without implementing these as smart contracts. The interfaces between components are quite simple. In contrast to [15], who suggest that blockchain-specific modelling languages need to be developed, our work shows that workflow engines do not need to be implemented using smart contracts, as done by [24,25], but that traditional workflow engines can be easily adapted to use blockchains as infrastructure for communication, persistence, replication, and trust building.

References
1. van der Aalst, W.M.P.: The application of petri nets to workflow management. Journal of Circuits, Systems, and Computers (4), 398–461 (2002), https://doi.org/10.1145/571637.571640
6. Ciccio, C.D., Cecconi, A., Dumas, M., Garcia-Banuelos, L., Lopez-Pintado, O., Lu, Q., Mendling, J., Ponomarev, A., Tran, A.B., Weber, I.: Blockchain support for collaborative business processes. Informatik Spektrum (3), 182–190 (2019), https://doi.org/10.1007/s00287-019-01178-x
7. Fridgen, G., Radszuwill, S., Urbach, N., Utz, L.: Cross-organizational workflow management using blockchain technology - towards applicability, auditability, and automation. In: 51st Hawaii International Conference on System Sciences HICSS. AIS Electronic Library (2018), http://aisel.aisnet.org/hicss-51/in/blockchain/6
8. Fridgen, G., Sablowsky, B., Urbach, N.: Implementation of a blockchain workflow management prototype. ERCIM News100