An In-Depth Investigation of Performance Characteristics of Hyperledger Fabric
Tobias Guggenberger, Johannes Sedlmeir, Gilbert Fridgen, André Luckow
TOBIAS GUGGENBERGER∗, Project Group Business & Information Systems Engineering of the Fraunhofer FIT; FIM Research Center, University of Bayreuth, Germany
JOHANNES SEDLMEIR∗, Project Group Business & Information Systems Engineering of the Fraunhofer FIT; FIM Research Center, University of Bayreuth, Germany
GILBERT FRIDGEN, SnT - Interdisciplinary Center for Security, Reliability and Trust, University of Luxembourg, Luxembourg
ANDRÉ LUCKOW, BMW Group, Germany
[Teaser figure: charts summarizing all measurements (maximum throughput in tx/s for better hardware, read, CPU-heavy, I/O-heavy, and other workloads; simple public vs. simple private transactions; measurements with homogeneous hardware and transactions (simple write operations); blockchain CPU usage for CouchDB vs. LevelDB across 4 to 64 peers).] This chart illustrates the impact of the most influential experimental parameters on Hyperledger Fabric performance. We show that manifold parameters, such as the choice of hardware, transaction privacy, database, and network size, significantly impact performance metrics, such as throughput and latency. Our results stem from setting up more than 1,500 Hyperledger Fabric networks and more than 200 million transactions in experiments that lasted for more than 2,000 hours. We validate and extend previous research by evaluating more than 15 new network- and transaction-related parameters, e.g., different network sizes of 5 to 128 nodes, transaction payloads, intercontinental network delays, and node crashes.

∗Both authors contributed equally to this research.

Authors' addresses: Tobias Guggenberger, [email protected], Project Group Business & Information Systems Engineering of the Fraunhofer FIT; FIM Research Center, University of Bayreuth, Bayreuth, Bavaria, Germany; Johannes Sedlmeir, [email protected], Project Group Business & Information Systems Engineering of the Fraunhofer FIT; FIM Research Center, University of Bayreuth, Wittelsbacherring 10, Bayreuth, Bavaria, 95447, Germany; Gilbert Fridgen, [email protected], SnT - Interdisciplinary Center for Security, Reliability and Trust, University of Luxembourg, Luxembourg, Luxembourg; André Luckow, [email protected], BMW Group, Munich, Germany.
Private permissioned blockchains, such as Hyperledger Fabric, are widely deployed across industry to facilitate cross-organizational processes and promise improved performance compared to their public counterparts. However, the lack of empirical and theoretical results prevents precise predictions of real-world performance. We address this gap by conducting an in-depth performance analysis of Hyperledger Fabric. The paper presents a detailed compilation of various performance characteristics using an enhanced version of the Distributed Ledger Performance Scan (DLPS). Researchers and practitioners alike can use our results as guidelines to better configure and implement their blockchains and utilize the DLPS framework to conduct their own measurements.

CCS Concepts: •
Computer systems organization → Peer-to-peer architectures; • Information systems → Distributed storage; Data centers; • Hardware → Testing with distributed and parallel systems; • Applied computing → Enterprise computing; Data centers.

Additional Key Words and Phrases: blockchain, DLPS, DLT, distributed ledger, Hyperledger Fabric, performance, scalability, throughput
ACM Reference Format:
Tobias Guggenberger, Johannes Sedlmeir, Gilbert Fridgen, and André Luckow. 2020. An In-Depth Investigation of Performance Characteristics of Hyperledger Fabric. 1, 1 (February 2020), 32 pages. https://doi.org/tbd

© 2020 Association for Computing Machinery. Manuscript submitted to ACM
Bitcoin marks the first form of a blockchain system, developed primarily for the decentralization of financial systems. In particular, it created a new virtual currency that allows transfers without distinct intermediaries such as banks [33]. Based on the general concept of Bitcoin, Buterin et al. [7] extended the scope of blockchain technology toward a broader field of applications. Ethereum improved the versatility of blockchains, which so far could only provide limited programming logic to users, by introducing a programming language and an associated runtime environment (the "Ethereum virtual machine") to execute smart contracts. Smart contracts were initially conceptualized by Szabo [40] and allow the execution of highly customizable program code in a peer-to-peer (P2P) environment without relying on a distinct intermediary. The advancement of blockchain technology has fostered the development of decentralized applications in business and the public sector, going far beyond the initial use cases within the financial sector [10, 15, 17, 21, 27, 30, 31, 38]. However, despite the large potential benefits of distributed ledgers to enterprises, such as consolidating audit and production data in an unimpeachable distributed database, public blockchains suffer from many limitations, for example, high transaction fees, low throughput, high latency, a lack of finality, high energy consumption, and a lack of transaction confidentiality [22].

To meet various industries' rising demand for enterprise-level blockchain applications, developers introduced new frameworks. Modified blockchain architectures address the shortcomings of public permissionless blockchains and adapt them to the needs of enterprises [22]. To achieve these goals, frameworks that implement private permissioned blockchains, which restrict participation in the blockchain and in consensus to a consortium, were developed [4].
Among these implementations, Hyperledger Fabric (Fabric) has become the leading solution for many applications. In particular, a wide range of use cases builds on Fabric's properties, as the framework provides high security and performance as well as flexible tools for access management, privacy, and implementing business logic [1, 22]. Currently, several major projects based on Fabric are transitioning from tests and minimum viable products with limited scope to production-ready systems, which results in a growing number of participating parties and operations in these projects [19, 32]. However, the requirements regarding private or public transactions, the complexity of smart contracts, and the need to support and adapt topologies differ heavily between these projects [22]. Fabric offers various configurations to adapt to a wide range of use case requirements [24]. The choice of various architectural parameters, such as network size, choice of hardware, internet connection speed, and complexity of operations (i.e., smart contract methods), is known to have a large impact on blockchains' performance. Consequently, trade-offs between security, network size, privacy, and performance must be considered when designing a system with high performance and specific reliability requirements [22].

Our literature review in Section 3 identifies two important gaps in the current body of knowledge that have not been addressed by previous research [39]. First, existing studies focus on particular variables without allowing for a holistic view. This drawback is mainly due to current studies conducting their measurements with different non-standardized tools. Additionally, many do not provide full transparency on how they define their key metrics and arrive at their results. Thus, observations of certain variables lack replicability and generalizability. However, these criteria are essential to allow a holistic view of the performance of Fabric.
Second, the development of Fabric progresses fast and frequently introduces changes, offering new configuration options and features that impact performance and are not yet covered by the literature. For example, private data collections effectively implement the necessary level of privacy in a cross-enterprise system and are hence essential for many enterprise-level applications [24, 45]. The private data transaction process
comprises increased complexity compared to the conventional transaction process, as it introduces additional gossip routines. These protocol changes make it hard to predict the performance of private data transactions compared to conventional transactions. However, to the best of our knowledge, the impact of using private data collections on performance has not been studied in the academic literature.

In this paper, we address the identified research gap by studying a wide variety of performance characteristics of Fabric. We present an in-depth performance analysis of Fabric from the perspective of both researchers and architects of large-scale enterprise and public sector projects. Our measurements significantly extend the range of performance characteristics studied before. They include scenarios that are highly relevant for real-world blockchain deployments by industry and the public sector, such as the need for confidentiality, cross-data-center and intercontinental deployments, and availability and resilience. Following Kannengießer et al. [23], the right balance of these factors is essential for allowing blockchain to create value effectively. Thus, our research objective is to develop a list of relevant variables, measure their specific impact on different Fabric implementations, and demonstrate the potential of Fabric in different scenarios. Our results aim to contribute to a better understanding of enterprise blockchains as an example of a highly complex fault-tolerant distributed system in real-world settings as enterprises would need them. For example, we show that Fabric scales very well with CPU-heavy transactions but struggles with payloads larger than 100 kB. Another important finding is that Fabric is well suited for intercontinental networks, but private transactions in particular suffer from the resulting high latency.
Besides demonstrating our results, we also contribute an extended version of the Distributed Ledger Performance Scan (DLPS) [39], a blockchain benchmarking framework, to investigate many of the identified knowledge gaps regarding performance characteristics. The DLPS provides clear definitions of key performance metrics and offers an end-to-end description of their setup and measurement, allowing for full transparency and repeatability. We provide our extension of the DLPS in the open-source repository [13], as well as the results of our experiments, for researchers to repeat our measurements or easily investigate new configurations.

The remainder of this paper is structured as follows: Section 2 gives an overview of essential background concepts and the architecture of Fabric. Section 3 provides a literature review of existing work on benchmarking Fabric. We then use the findings of the literature review and derive the main shortcomings that we want to address in Section 4, where we also provide details on the measurement process with the DLPS. Afterward, Section 5 presents the main findings of this paper by demonstrating our benchmarking results for a wide range of variables employed in the benchmark tests. In Section 6, we discuss our findings, provide implications for real-world applications, and give design guidelines. Finally, in Section 7, we conclude this paper and provide suggestions for future research opportunities.
Since version 1.0, Fabric follows a paradigm that fundamentally differs from most blockchains to offer improved performance, flexibility, and privacy features [24]. Instead of relying on an order-execute architecture, Fabric uses an execute-order-validate paradigm (see Figure 1). Order-execute means, first, that the consensus mechanism is responsible for ordering and then broadcasting new transactions and, second, that all peers execute these transactions sequentially. In contrast, execute-order-validate implies that Fabric separates execution and validation from ordering [1]. The changed replication process requires a new system architecture. A Fabric node can take on one of the following three roles [1]:
[Figure 1 content: In the order-execute architecture of replicated services, transactions pass through Order (consensus or atomic broadcast), Execute (deterministic execution), and Update state (persist state on all peers). In Fabric's execute-order-validate architecture, transactions pass through Execute (simulate transaction and endorse, create read-write set, collect endorsements), Order (order read-write sets, atomic broadcast/consensus, stateless ordering service), Validate (validate endorsements and read-write sets, mark invalid and conflicting transactions), and Update state (persist state on all peers).]
Fig. 1. The execute-order-validate paradigm in Fabric compared to the order-execute architecture that most blockchains exhibit.

• Clients are responsible for submitting a transaction proposal to the peers and finally broadcasting transactions in the form of a bundled endorsement response for ordering [1].
• Peers receive the transaction proposals from the clients, simulate them, and send the signed results back to the clients. Moreover, they eventually validate transactions. All peers maintain a ledger consisting of an append-only data structure (the blockchain) of all previous transactions and a structure that represents the latest world state of the ledger. For storing the ledger state, peers can make use of conventional databases; currently, Fabric v2.0 supports LevelDB and CouchDB. Due to the execute-order-validate paradigm, Fabric does not require all peers to execute all transaction proposals, a design choice that makes Fabric quite special compared to most other blockchains, both public permissionless and private permissioned. By means of an endorsement policy, one can specify the subset of peers that is required to execute the transaction proposal for each smart contract method individually. These peers are also called endorsers or endorsing peers [1].
• Ordering Service Nodes, also called orderers, jointly form the ordering service. The ordering service is responsible for creating the total order of all transactions. There are different ways to implement the ordering service, ranging from a (by now deprecated) solo orderer to distributed protocols, such as Raft [36] and Kafka [25], that address different levels of fault tolerance [1]. In the future, the developers want to introduce additional ordering services that also tolerate Byzantine faults [28].

Clients, peers, and orderers are further grouped into organizations (abbreviated as orgs), typically representing companies or wider groups of participants.
Based on their organizational affiliation, these entities have different rights, including the permission to join a blockchain channel, which represents a private subnet of communication between two or more network participants, including a corresponding ledger. A peer can join one or multiple channels. Opting for a larger number of peers in one organization provides additional redundancy and, to some extent, improved performance through the distribution of the simulation workload [41]. This approach allows parallel endorsements of transactions.
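The endorsement policies mentioned above can be thought of as Boolean expressions over organizations. The following is a minimal sketch of how a client might check whether the endorsements it collected satisfy such a policy; the nested-tuple policy representation and the MSP names are illustrative (Fabric encodes policies as signature policy envelopes, not as Python tuples):

```python
# Illustrative sketch: evaluating an endorsement policy as a Boolean
# expression over organizations. The tuple-based policy format and the
# org names are hypothetical; only the evaluation logic is the point.

def satisfies(policy, endorsing_orgs):
    """Return True if the set of endorsing organizations fulfills the policy."""
    op, *operands = policy
    if op == "ORG":
        return operands[0] in endorsing_orgs
    if op == "AND":
        return all(satisfies(sub, endorsing_orgs) for sub in operands)
    if op == "OR":
        return any(satisfies(sub, endorsing_orgs) for sub in operands)
    raise ValueError(f"unknown operator: {op}")

# Endorsements required from Org1 and from at least one of Org2 and Org3:
policy = ("AND", ("ORG", "Org1MSP"),
                 ("OR", ("ORG", "Org2MSP"), ("ORG", "Org3MSP")))

print(satisfies(policy, {"Org1MSP", "Org3MSP"}))  # True
print(satisfies(policy, {"Org2MSP", "Org3MSP"}))  # False: Org1MSP is missing
```

Such a policy directly determines how many peers must simulate each proposal, which is why the choice of endorsement policy is one of the performance-relevant parameters studied later in this paper.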
Fabric's execute-order-validate paradigm separates the transaction flow into three parts: execution (sometimes also referred to as simulation) of a transaction and checking its correctness by comparing the signed results of redundant execution on different peers, which is also called endorsement; ordering by means of a consensus protocol, regardless of the semantics of a transaction; and transaction validation, ensuring endorsement policy compliance and state consistency [1]. Figure 2 gives an overview of the transaction flow. In detail, Androulaki et al. [1] describe the three phases as follows:

(i) Execution Phase:
A client sends a cryptographically signed transaction proposal to one or more endorsing peers for execution (simulation). The peers do not yet update their ledger but only generate a read set and a write set (1). The write set consists of all key updates resulting from the simulation, and the read set contains all keys that the peers read during the simulation. The endorsers then create a cryptographically signed endorsement, including the read and write set, and send it back to the client in a proposal response. The client collects endorsements until the requirements set by the endorsement policy are met (2). This step also ensures that all endorsers produce the same execution result and, thus, respond with the same read-write set [1].

(ii) Ordering Phase:
Once the client has received enough endorsements, it bundles them, creates a signed transaction, and sends it to the ordering service (3). The ordering service uses consensus to establish a total order of all transactions. In addition, the ordering service batches the transactions into blocks and signs them [1].

(iii) Validation Phase:
Blocks can be delivered either directly by the ordering service or by other peers through a gossip protocol (4). When a new block arrives at a peer, it enters the validation phase (5), which involves the following three sequential steps [1]:
a. The peer checks whether every transaction fulfills the endorsement requirements. If a transaction is invalid, the peer marks it accordingly and ignores its effect.
b. The peer checks each transaction sequentially for read-write set conflicts. Hence, it compares the key, value, and version of the transaction with the current state of the ledger and ensures they are still the same. If a transaction is invalid, the peer marks it accordingly and ignores its effect.
Fig. 2. Fabric high-level transaction flow, adapted from Androulaki et al. [1].
c. The peer enters the ledger update phase and appends the block to the locally stored ledger. For each transaction that is not marked as invalid, the peer also writes all key-value pairs of the write set to the local state. Fabric consequently records invalid transactions even though they do not affect the state [1].
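The read-write set mechanics described above can be sketched in a few lines. The following is a simplified model, not Fabric's actual versioned key-value store: each state entry carries a version, the execution phase records the versions it read, and validation rejects a transaction whose read versions have changed in the meantime, which is why two transactions simulated concurrently against the same key cannot both commit.

```python
# Simplified model of Fabric's read-write set validation (MVCC): each key in
# the world state carries a version; a transaction is valid only if the
# versions it read during simulation are unchanged at commit time.

world_state = {"balance_alice": ("100", 1)}  # key -> (value, version); illustrative data

def simulate(keys_read, writes):
    """Execution phase: record the versions of all keys read, plus proposed writes."""
    read_set = {k: world_state[k][1] for k in keys_read}
    return {"reads": read_set, "writes": writes}

def validate_and_commit(rw_set):
    """Validation phase: check read versions; apply writes only if still current."""
    for key, version in rw_set["reads"].items():
        if world_state[key][1] != version:
            return False  # read-write conflict: transaction marked invalid
    for key, value in rw_set["writes"].items():
        old_version = world_state[key][1] if key in world_state else 0
        world_state[key] = (value, old_version + 1)
    return True

tx1 = simulate(["balance_alice"], {"balance_alice": "90"})
tx2 = simulate(["balance_alice"], {"balance_alice": "80"})  # simulated concurrently

print(validate_and_commit(tx1))  # True: committed, version of the key becomes 2
print(validate_and_commit(tx2))  # False: tx2 read version 1, which is now stale
```

This conflict behavior is also what makes workload design (e.g., how many transactions touch the same keys) a performance-relevant parameter in the benchmarks presented later.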
Since version 1.2, Fabric also supports private data through private data collections. A private data collection represents a privacy policy, defining which peers should process and store the related data and which organizations should be able to access it [29]. In particular, private data is a feature that is primarily made possible by the execute-order-validate paradigm and by endorsement policies that do not require every peer to recompute every transaction to validate it. This feature allows conducting transactions in which only a subset of the organizations that participate in the Fabric blockchain stores the actual data, while the remaining organizations only see the transaction hash, without relying on complex cryptographic techniques such as zero-knowledge proofs or homomorphic encryption [20]. Private data mainly makes use of the standard protocol described in Section 2.2 but differs at some stages to address confidentiality in the three phases of execution, ordering, and validation (see Figure 3).

(i) Execution Phase:
The client sends a proposal request, including the confidential data, to the designated endorsers of the authorized organizations. Based on the collection policy, which defines the organizations that should be able to access the private data, the endorsing peers distribute the private data to other authorized peers via a gossip protocol. All peers receiving the private data store it in a transient data store. Similar to the general transaction flow, the endorsers generate a read-write set and send it to the client in the form of an endorsement (1). However, the read-write sets do not contain any confidential data but rather a hash of the private data keys and values. As soon as the client has received enough endorsements (2), it sends a transaction to the ordering service (3), which is responsible for the total order of transactions [29].

(ii) Ordering Phase:
The ordering phase works similarly to the general transaction flow. On consensus, the orderers include the transactions in a block and distribute it to all peers (4). Thereby, all peers receive the hashes of the private data, allowing for later validation [29].
Fig. 3. Fabric private data high-level transaction flow, adapted from Androulaki et al. [1].
(iii) Validation Phase:
All peers store the transaction in their ledger and update the read-write set with the associated hash values. Additionally, if a peer is authorized to access the private data related to the transaction, it checks its transient data store for the private data. If the peer has not received the private data in the execution phase, it tries to pull the private data from other authorized peers. Then, the peer uses the hash values of the transaction to validate the private data (5) and eventually saves it to the private state database [29]. In general, the peer makes use of an additional table of the regular state database. While private data provides an approach to transfer data confidentially between specific organizations, the required certificates are still used in plain text for verifying permissions. Even without insights into the content of a transaction, this alone leads to severe confidentiality issues, as it is apparent who is issuing new transactions. IBM therefore introduced an implementation of the identity mixer protocol [5, 9] to hide the identity of the issuing client certificate. Besides being highly experimental, this feature is currently only supported in the Java implementation of the Fabric client.
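The split between public hashes and confidential values described in the private data flow can be illustrated as follows. This is a simplified sketch (Fabric uses its own serialization and hashing pipeline, and the key/value shown are invented for illustration): the public read-write set carries only hashes, while authorized peers hold the plain data and can verify it against the on-chain hash during validation.

```python
import hashlib

# Sketch: what authorized vs. non-authorized peers see for a private write.
# The concatenation scheme below is illustrative, not Fabric's actual
# serialization; it only shows that the public ledger carries hashes,
# never the confidential values themselves.

private_write = {"order_4711_price": "1250 EUR"}  # hypothetical private key-value pair

def hash_entry(key, value):
    """Digest of a private key-value pair as it would appear on the public ledger."""
    return hashlib.sha256(f"{key}:{value}".encode()).hexdigest()

# Public read-write set: hashes only, replicated to every peer on the channel.
public_rw_set = {k: hash_entry(k, v) for k, v in private_write.items()}

# Authorized peers additionally hold the plain data (via gossip / transient
# store) and can check it against the on-chain hash in validation step (5).
def verify(key, value, rw_set):
    return rw_set.get(key) == hash_entry(key, value)

print(verify("order_4711_price", "1250 EUR", public_rw_set))  # True
print(verify("order_4711_price", "9999 EUR", public_rw_set))  # False
```

Note that hashing low-entropy values in this plain way would be open to dictionary attacks; this sketch ignores such hardening measures entirely.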
While the performance of blockchains is often considered crucial when working toward production-level systems [41], research in the field of systematic benchmarking is still scarce at the time of writing. To gain a better understanding of existing research on the performance of Fabric, we conducted a structured literature review. To ensure that we include all relevant papers, we first defined Hyperledger AND Fabric as a search string. We then used this string for queries in the ACM Digital Library, Web of Science, IEEE Xplore, and arXiv. The initial set contained 552 papers. After a first screening of titles and abstracts, we excluded 533 publications. Based on a full-text analysis, our final set includes fourteen articles that analyzed the performance and scalability of Fabric. In the full-text analysis, we removed any paper that performed benchmarking on a heavily modified version of Fabric, as related findings are highly theoretical and not transferable to the publicly available versions of the system. Furthermore, we excluded studies that only test very small network sizes (fewer than six peers), as the generalizability of these results is very limited. Table 1 depicts the final set of papers that we analyzed thoroughly.

Pongnumkul et al. [37] marks the first publication on the performance of Fabric. Comparing the Go implementation of the Ethereum client (Geth) to Fabric, the authors demonstrated the potential performance benefits of using a private permissioned blockchain. Later, Dinh et al. [11] laid another vital foundation for the analysis of the performance of private, permissioned blockchains by introducing the first systematic benchmarking framework, called Blockbench [6]. Blockbench makes heavy use of the Yahoo! Cloud Serving Benchmark and Smallbank, which are both benchmarking frameworks for conventional IT systems focusing on centralized databases. The authors eventually compared Fabric to Geth and Parity. Dinh et al.
[11] decided not to opt for the re-architected v1.0 of Fabric but used v0.6 for the comparison, as they obtained much better performance results with the older release. Later work almost exclusively uses Fabric > v1.0. In comparison to the findings of Dinh et al. [11], who reached around 1,000 transactions per second, Androulaki et al. [1] obtained much higher performance figures with the newly introduced architecture of v1.x. The authors provided an extensive analysis of the preview version of v1.1. They demonstrated that Fabric is potentially able to cope with over 100 peers and, under the right circumstances, perform more than 3,500 transactions per second. Nevertheless, the performance results of Baliga et al. [3] were again significantly lower than those of Androulaki et al. [1], illustrating that the achievable performance depends on various factors, such as the benchmarking framework, the release version of Fabric, and the employed hardware. Later research therefore extended testing on Fabric by introducing even more parameters and newer release versions of
Source | Detailed content
Pongnumkul et al. [37] | This article presents a methodology for evaluating the performance of Ethereum and Fabric. The research team eventually derives performance figures for execution time, latency, and throughput, also considering different workloads.
Androulaki et al. [1] | This paper presents the execute-order-validate blockchain architecture of Fabric v1.1.0. The research team examines throughput and latency under consideration of various parameters, such as block size, number of vCPUs, and number of peers.
Baliga et al. [3] | This research makes use of Caliper to examine the performance of Fabric v1.0. The authors consider various impacting factors, such as the number of nodes, endorsement policy, block size, and transaction size.
Dinh et al. [11] | In this research paper, the authors present the first systematic benchmarking framework for permissioned blockchains, called Blockbench. It builds on the established YCSB and Smallbank frameworks and allows benchmarking of private Ethereum (Geth, Parity), Fabric, and Quorum. Based on the framework, the authors compare the performance of Fabric to Ethereum.
Hao et al. [18] | This article presents a method for evaluating the performance of consensus algorithms in Ethereum and Fabric. The authors eventually derive performance figures for latency and throughput, also taking varying workloads into consideration.
Nasir et al. [34] | The authors of this article demonstrate the performance of Fabric v1.0 in comparison to v0.6. Besides analyzing execution time, latency, and throughput, their study also varies the number of nodes to examine the scalability of the two implementations.
Thakkar et al. [42] | In this research article, the researchers examine the impact of various factors such as block size, endorsement policy, channels, and state database choice on Fabric v1.1. They eventually identify performance bottlenecks and propose optimizations that were included in later Fabric versions.
Kuzlu et al. [26] | Making use of Caliper, this research examines the performance of Fabric regarding throughput, response time, and simultaneous transactions.
Nguyen et al. [35] | The authors employ a customized version of the Hyperledger Caliper benchmarking framework to examine the effect of sub-second network delays on the performance of Fabric. The test setup used two cloud instances, one in Germany and one in France, to create the Fabric network.
Dreyer et al. [14] | This article evaluates the performance of Fabric. The authors create various network configurations and measure throughput, latency, and error rate, along with the overall scalability of the Fabric platform. The authors put the results of their research in context with older versions of Fabric.
Geneiatakis et al. [16] | The authors strongly motivate their research on the application of blockchain in the field of cross-border e-government services. However, in doing so, they also address the performance of Fabric in a dedicated manner. Among other variables, they address network delay as an important factor for the performance of Fabric.
Thakkar and Nathan [41] | This paper examines the performance of Fabric v1.4 considering horizontal scaling (e.g., by adding more nodes) and vertical scaling (e.g., by varying the number of CPUs per node). Based on these observations, the authors propose an optimization of the Fabric architecture, including pipelined execution of the validation and commit phases.
Wang and Chu [43] | This article goes into detail about the performance of Fabric and especially shows the performance of different ordering services. For this purpose, a network with 20 machines is used, and the different phases of the transaction flow and endorsement policies are considered.
Xu et al. [44] | In this paper, the authors develop a theoretical analysis framework to study the performance of Fabric, considering the execute-order-validate logic in Fabric v1.4. A series of experiments were conducted to compare the results with the simulations, thus verifying the theoretical model.
Table 1. Existing literature on performance investigations of Fabric.
Fabric. Thakkar and Nathan [41] and Kuzlu et al. [26] performed their analyses on v1.4 of Fabric, further revealing the complexity of performing performance tests of blockchain systems. In particular, Kuzlu et al. [26] conclude that besides the specific infrastructure the blockchain resides on, the design of the transactions, e.g., their type and number, also profoundly impacts performance. Recently, Dreyer et al. [14] published their results, showing first measurements of the performance of Fabric v2.0. According to the authors, the performance of Fabric v2.0 improved significantly in comparison to older versions of the blockchain framework.
Source | Fabric version | Vertical scaling | Horizontal scaling | Database | Private data | Multiple workloads | Network delays | Crashing nodes
This paper | 2.0 (1.4) | ✓ | ✓ | both | ✓ | ✓ | ✓ | ✓
Pongnumkul et al. [37] | 0.6 | ✗ | ✗ | LevelDB | ✗ | ✗ | ✗ | ✗
Androulaki et al. [1] | 1.1 | ✓ | ✓ | LevelDB | ✗ | ✓ | ✓ | ✗
Baliga et al. [3] | 1.0 | ✗ | ✗ | LevelDB | ✗ | ✓ | ✗ | ✗
Dinh et al. [11] | 0.6 | ✗ | ✓ | LevelDB | ✗ | ✗ | ✗ | ✗
Hao et al. [18] | 1.0 | ✗ | ✗ | n/a | ✗ | ✗ | ✗ | ✗
Nasir et al. [34] | 1.0 (0.6) | ✗ | ✓ | both | ✗ | ✗ | ✗ | ✗
Thakkar et al. [42] | 1.1 | ✓ | ✓ | both | ✗ | ✓ | ✗ | ✗
Kuzlu et al. [26] | 1.4 | ✗ | ✗ | CouchDB | ✗ | ✓ | ✗ | ✗
Nguyen et al. [35] | 1.2 | ✗ | ✗ | n/a | ✗ | ✗ | ✓ | ✗
Dreyer et al. [14] | 2.0 (0.6/1.0) | ✗ | ✓ | n/a | ✗ | ✓ | ✗ | ✗
Geneiatakis et al. [16] | 1.1 | ✗ | ✓ | CouchDB | ✗ | ✓ | ✓ | ✗
Thakkar and Nathan [41] | 1.4 | ✓ | ✓ | n/a | ✗ | ✓ | ✗ | ✗
Wang and Chu [43] | 1.4 | ✗ | ✓ | n/a | ✗ | ✓ | ✗ | ✗
Xu et al. [44] | 1.4 | ✗ | ✓ | n/a | ✗ | ✓ | ✓ | ✗

Table notes: (a) The authors investigated only delays of 1,000 ms and more, which is much more than the delays that typically occur in a worldwide distributed system. (b) The authors considered network delay only in a single setting, without stating the actual delay between the data centers involved and without further analysis of the impact of different delays.
Table 2. Evaluation of the measurements conducted by the research papers in our literature review.
While later work introduced additional influencing factors, the results of Androulaki et al. [1] and Thakkar et al. [42] still remain among the most complete presentations. Table 2 demonstrates that later work primarily focuses on specific characteristics, such as a sole analysis of the effect of very high network delays. Nevertheless, due to the particular dependency on a wide range of other factors, including different benchmarking tools and definitions of key metrics [39], related findings give an initial impression but are hard to integrate into the results of other researchers.

In summary, the existing literature offers first important insights into the properties of Fabric, but still has a narrow parameter space. Furthermore, reproducibility is limited due to the often minimal description of the methodology used, which, finally, still leaves considerable room for improving the generalizability of the results.
To further expand our understanding of private permissioned blockchains and specifically to analyze the various potentially influential variables and the performance of Fabric, we perform standardized benchmarking. The available tools for blockchain benchmarking that apply to Fabric are Blockbench [6], Caliper [8], and the DLPS [39]. Blockbench and Caliper do not clearly define how they determine the key performance metrics, particularly throughput and latency; the algorithm by which these are determined remained unclear. Thus, we chose the open-source framework DLPS. Furthermore, the DLPS allows for sophisticated network deployment using cloud services, which enabled us to test a wide variety of configurations.

Our benchmarking covers all variables that we identified through reviewing the related work (see Table 2). Throughout our testing, we furthermore identified seven additional variables that potentially affect the performance of Fabric. In fact, the DLPS did not yet cover all the particularities of Fabric. First, we upgraded the Fabric version supported by the DLPS to Fabric 2.0 and included multi-channel setups. Second, we added support for private transactions and complex queries. Third, we extended the supported architectural parameters by allowing the CouchDB and ordering node Docker containers to run either on the same node as the peers or on separate nodes; splitting tasks across multiple machines or joining them to reduce cross-instance latencies might help to increase performance. Fourth, we added support both for simulating network delays and for multi-datacenter deployments. Finally, we refined the overall benchmarking process, evaluated single-core CPU usage and traffic stats, and added capabilities to trigger automatic crashes of orderers and peers, which, e.g., required the dynamic identification of the current leader in the RAFT ordering service. Hence, the final framework allows testing for all previously mentioned variables that we found in existing research and extends them by new unique features that, according to our literature review, have not been investigated before. Table 3 provides a description of all variables considered for the benchmarking. With the publication of this paper, we make our enhancements to the DLPS, as well as the configurations and results of all the experiments that we conducted for this publication, available on the DLPS GitHub repository [13].

Group | Design choices | Answered questions
Architecture (Sec. 5.2) | Number of organizations, peers, and orderers | How does the configuration of organizations with regard to the number of peers and orderers influence the performance of Fabric?
 | Endorsement policy | What is the impact of different endorsement policies?
 | Number of channels | Does changing the number of channels have an impact on performance?
 | Database location | Does the separation of the database from the peer core functions improve the performance of Fabric?
Setup (Sec. 5.3) | Hardware | What is the impact of different computer specifications, particularly the CPU?
 | Database type | How do different databases for the ledger state, such as CouchDB or LevelDB, impact the performance of Fabric?
 | Block parameters | How does the choice of blocksize and blocktime influence the performance?
Business logic (Sec. 5.4) | Private data | How does using private transactions influence the performance of Fabric?
 | I/O-heavy workload | How do transactions that trigger I/O-heavy chaincode impact the performance of Fabric?
 | CPU-heavy workload | How do transactions that trigger CPU-heavy chaincode impact the performance of Fabric?
 | Reading vs. writing | What are the essential differences between read and write performance?
Network (Sec. 5.5) | Delays | To what extent do network delays impact performance?
 | Bandwidth | What are the bandwidth requirements for different architectures and throughputs?
Robustness (Sec. 5.6) | Node crashes | How do crashing nodes impact the performance of Hyperledger Fabric?
 | Temporal distribution of requests | How do changes in the temporal request distribution affect the performance of Fabric?

Table 3. Design choices and network specifics for which a need for detailed investigations was determined.

We performed the testing in an incremental way to increase the reliability of our results. The left chart in Figure 4 describes a single benchmarking run. We used a series of these runs to create a benchmarking ramping series (see the right chart of Figure 4). We create a configuration file that specifies all particularities of the Fabric network. The DLPS uses this file and automatically sets up a blockchain and client network in Amazon Web Services (AWS) before the benchmarking process starts.
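The network-delay simulation mentioned above typically relies on standard Linux traffic control. The following sketch illustrates the idea of injecting an artificial delay on a node via `tc netem`; the device name `eth0` and the wrapper functions are our own illustrative assumptions, not the DLPS implementation:

```python
import subprocess

def netem_delay_cmd(delay_ms: int, device: str = "eth0", remove: bool = False) -> list[str]:
    """Build a `tc netem` command that adds (or removes) an artificial
    outbound delay on the given network device."""
    if remove:
        return ["tc", "qdisc", "del", "dev", device, "root", "netem"]
    return ["tc", "qdisc", "add", "dev", device, "root", "netem",
            "delay", f"{delay_ms}ms"]

def apply_delay(delay_ms: int, device: str = "eth0") -> None:
    # Requires root privileges on the target instance.
    subprocess.run(netem_delay_cmd(delay_ms, device), check=True)
```

Running such a command on each node, and removing the qdisc again after the run, approximates intercontinental latencies without actually deploying across data centers.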
A single benchmarking run in the DLPS involves sending requests from clients to the network for a specific duration at a specific rate f_req, namely the slope of the requests curve (orange in Figure 4). The arrival of confirmations that the transactions have been processed successfully is illustrated by the responses curve (green in Figure 4). In this curve, one can see distinguished blocks, as they mark a quasi-simultaneous confirmation of the included transactions, resulting in steps. The slope of the linear regression of the response curve corresponds to the average rate of responses. The average time between sending and receiving confirmation of a specific transaction, or, equivalently (as long as the linear regressions remain parallel), the distance between the intersections of the regressions for the request and response curves with the x-axis, marks the latency. By starting at a low request rate and repeating at an increased request rate whenever the network can process requests at the given rate (see the x-axis of the right chart in Figure 4), we can localize the maximum throughput, at which an increase in the request rate does not further improve, or even decreases, the response rate (due to queueing or overstress). In Figure 4, this behavior can be seen in the right chart at a request rate of approximately 450 tx/s. In all measurements, we monitored the effectivity, i.e., the rate of transactions that were finally successfully processed, and collected resource stats such as CPU usage and network traffic to gather additional information that might help to find the bottleneck. More information about the DLPS can be found on GitHub [13] and in the associated paper [39].

[Figure 4: Exemplary benchmark run charts. Left: a single benchmarking run (total requests/responses, peer CPU (min, max), and peer traffic in/out (max) over time in seconds). Right: a benchmarking ramping series (throughput (response rate) f_resp, latency, effectivity, peer CPU (avg), and peer traffic in/out (avg) over the request rate f_req in tx/s).]
Fig. 4. Exemplary benchmark run charts.

By default, the deployments and tests with the DLPS are highly homogeneous and symmetric. We associate each client with one blockchain node, and each blockchain node is associated with the same number of clients. We use this to be able to send requests completely uniformly, i.e., at f transactions per second, a transaction is sent every 1/f seconds, again uniformly distributed among the clients. For example, if we have a 10-node network, 20 clients, and a request rate of 100 tx/s, every client sends requests at 5 tx/s, and we also make sure that there is a uniform offset between the clients.
In case a client has multiple cores and we use multiple workers for multi-threading, we make sure that there is a homogeneous offset as well. At high request rates, the offset is harder to enforce, but it is also much less relevant; a high degree of uniformity matters for measuring maximum throughput correctly when the request rate is low, as it ensures that there are no spikes in the nodes' workload.

We used instances from the m5 series in AWS, since they offer a good balance between computation, networking, and disk operations, all of which are necessary for blockchain nodes. Details with respect to the instances are displayed in Table 4. The fully automatic setup with the DLPS takes at least about 10 minutes, and a reasonable test takes about an hour; reducing the duration of a test increases the variance of the results and tended to overestimate the true performance in our tests. Modifying network parameters generally requires restarting the blockchain completely. Consequently, we decided to use a small network (4 orgs, each with 2 peers, one orderer, and 4 clients) with AWS m5.large instances as the default, and to vary individual or small subsets of parameters starting from this default to keep costs and time bounded. Figure 5 illustrates the most important remaining default parameters for the Fabric architecture, and Figure 6 gives an overview of the benchmarking settings. Thus, our default configuration comprises 8 peers and 4 orderers, as well as 16 clients, in a one-channel network with RAFT consensus. At the start of our experiments, the latest Fabric version was v2.0, so we conducted all experiments with this version. However, we also made some spot checks when v2.2 was released, noticing no significant performance changes.
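The regression-based derivation of throughput and latency from the request and response curves can be sketched as follows; this is a simplified illustration of the idea with our own function names, not the DLPS code:

```python
import numpy as np

def fit_line(times, counts):
    """Linear regression count = m * t + b over a (time, cumulative count) curve."""
    m, b = np.polyfit(times, counts, 1)
    return m, b

def throughput_and_latency(req_times, resp_times):
    """Throughput is the slope of the response-curve regression; latency is the
    horizontal distance between the x-intercepts of the request and response
    regressions (valid as long as the two regressions stay roughly parallel)."""
    counts = np.arange(1, len(req_times) + 1)  # cumulative count of transactions
    m_req, b_req = fit_line(req_times, counts)
    m_resp, b_resp = fit_line(resp_times, counts)
    latency = (-b_resp / m_resp) - (-b_req / m_req)
    return m_resp, latency
```

For requests sent uniformly at 100 tx/s with confirmations arriving 2 s later, this yields a throughput of roughly 100 tx/s and a latency of roughly 2 s.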
The remaining parameters are described in detail in the dedicated DLPS repository [13]. In total, our experiments involve approximately 2,000 hours of testing, setting up approximately 1,500 Fabric networks with a total of around 20,000 nodes and 40,000 clients, and sending more than 200 million transactions. In this process, we also collected 100 GB of log files, including sending and response times of each transaction, as well as resource stats such as CPU, memory, disk usage, ping, and traffic for each node and client.
Name | vCPUs | Memory (GiB) | Network (Gbps) | Storage (Mbps)
m5.large | 2 | 8 | Up to 10 | Up to 4,750
m5.xlarge | 4 | 16 | Up to 10 | Up to 4,750
m5.2xlarge | 8 | 32 | Up to 10 | Up to 4,750
m5.4xlarge | 16 | 64 | Up to 10 | 4,750
Table 4. Used instance types in the AWS m5 series. They are all based on the Intel Xeon® Platinum 8175M processor (up to 3.1 GHz), and we added 16 GB of SSD storage. As the operating system, we used Ubuntu 18.04 LTS. Source: [2]

{
  "node_type": "m5.large",
  "fabric_version": "2.0.0",
  "fabric_ca_version": "1.4.4",
  "thirdparty_version": "0.4.18",
  "channel_count": 1,
  "database": "CouchDB/LevelDB",
  "external_database": "False",
  "internal_orderer": "False",
  "org_count": 4,
  "peer_count": 2,
  "orderer_type": "RAFT",
  "orderer_count": 4,
  "batch_timeout": 0.5,
  "max_message_count": 1000,
  "absolute_max_bytes": 10,
  "preferred_max_bytes": 4096,
  "tls_enabled": "True",
  "endorsement": "OutOf(2, 4)",
  "private_fors": 2,
  "log_level": "Warning",
  "client_type": "m5.large",
  "client_count": 4
}

Fig. 5. Default settings for the Fabric network architecture.

{
  "duration": 20,
  "localization_runs": 2,
  "repetition_runs": 0,
  "method": "writeData",
  "mode": "public",
  "shape": "smooth",
  "delay": 0,
  "r2_bound": 0.9,
  "frequency_bound": 100,
  "latency_bound": 10000,
  "delta_send": 0.5,
  "delta_receive": 0.5,
  "success_bound": 0.8,
  "retry_limit": 2,
  "ramp_bound": 2,
  "success_base_rate": 0.8,
  "success_step_rate": 0.04,
  "failure_base_rate": 0.8,
  "failure_step_rate": 0.04,
  "delta_max_time": 10
}

Fig. 6. Default settings for the benchmarking logic.
We first compared different modifications of the default architecture with respect to parameters that we considered relevant based on the related work from our literature review and on our experience working with the DLPS (see Figure 7). As the error bars indicate, which represent the standard deviation obtained from conducting every experiment three times, the results are very consistent and reproducible.

Like Thakkar et al. [42], we found that write throughput for the default setup with LevelDB is around three times the maximum throughput with CouchDB, and we can even extend this result to private transactions (see Figure 7). We observe that the following modifications have only an insignificant impact on throughput: doubling the number of clients to distribute the client workload to more workers (0 % impact on performance with public transactions and a 4 % increase with private transactions), doubling the number of channels (a 2 % decrease with public transactions and a 13 % increase with private transactions), and deactivating TLS while switching to a centralized ("solo") orderer (a 13 % increase with public transactions and a 3 % increase with private transactions). This suggests that neither the number of clients and channels nor the ordering service and TLS is the bottleneck for the default architecture in our setup. While the results of [42] support that the ordering service is not a bottleneck in a similar architecture, they find that doubling the number of channels increases CPU utilization and hence also throughput considerably. However, in our case, the peers' CPU utilization is already very close to the maximum on all virtual cores with one channel. This observation indicates why the two-channel configuration does not exhibit a notably higher throughput.
In our case, the difference between the single-channel and dual-channel setup was marginal, with a variation of only 9 % for private transactions, while public transactions did not show any significant impact. Note that these numbers represent the results with CouchDB; with LevelDB, the relative deviations were largely even smaller.

Performance benchmarks with older versions of Fabric, particularly by Pongnumkul et al. [37], Dinh et al. [12], and Nasir et al. [34], generally yield lower throughput (a few hundred tx/s with LevelDB) on considerably better hardware, indicating that the development of Fabric has already led to considerable performance improvements. Surprisingly, we noticed a slight decrease in performance of v2.0 compared to the previous version v1.4.4 for CouchDB, as opposed to the results of [14]. As such, v1.4.4 using CouchDB was about 26 % faster with public transactions and 68 % faster with private transactions than v2.0. With LevelDB, the difference for private transactions dropped to only 11 %, and with public transactions, v2.0 was even 5 % faster than v1.4.4. We argue that the discrepancy between our results and those of Dreyer et al. [14] is due to how they drew their conclusion: the authors compare their results for v2.0 with the results of Nasir et al. [34] for v0.6 and v1.4. However, the studies' testing environments differed, and Dreyer et al. [14] used stronger machines with more computing power, probably resulting in the better performance of their v2.0 measurements. In contrast, our comparison of v1.4.4 and v2.0 was conducted ceteris paribus.
[Figure 7: Maximum sustainable throughput f (tx/s) for the setup modifications Default, 32 clients, 2 channels, Solo/no TLS, and v1.4.4, shown for CouchDB (left) and LevelDB (right), each with simple public and simple private transactions.]
Fig. 7. Different architectures in comparison. The configuration for the default setup is described at the end of Section 4.
The endorsement policy, which we describe in Section 2, is an important setting, as it drastically changes the level of redundancy: more endorsers mean more overhead but also higher robustness. As we illustrate in Figure 8, increasing the number of endorsers, i.e., the degree of redundancy of simulation, decreases throughput as expected. In absolute and relative numbers, LevelDB suffers from a much higher performance decrease with a higher number of endorsers than CouchDB. For example, maximum throughput for simple public transactions with LevelDB decreases by 24 % and 54 %, respectively, when switching from only one endorser (and thus no cross-checks of correct chaincode execution) to two or four endorsements. For CouchDB, the degradation is 14 % and 41 %, respectively.

For private transactions, we looked at pairwise private collections, i.e., private transactions between two orgs. Here, going from 2 to 4 endorsers results in a loss of 14 % (CouchDB) and 30 % (LevelDB). These numbers are notably lower than for public transactions (31 % and 42 %). Thus, in general, performance decreases more heavily for LevelDB, both relatively and absolutely, when more endorsements are necessary. Surprisingly, for only one endorser in the case of private transactions, we noticed anomalous behavior of Fabric that resulted in throughput in the one-digit range as soon as multiple clients were requesting transactions from different peers. However, so far, we could not determine the underlying reason.
[Figure 8: Maximum sustainable throughput f (tx/s) over the number of endorsers for CouchDB (left) and LevelDB (right), each with simple public and simple private transactions.]
Fig. 8. Varying the endorsement policy.
Initially, we see an increase in maximum throughput when increasing the number of peers per org while keeping the number of endorsers constant (see Figure 9). Likewise, increasing the number of orgs while keeping the number of peers per org and the number of endorsers constant increases maximum throughput. However, we also notice that maximum throughput decreases again for large network sizes, so there generally seems to be an optimum. For the given setup, this optimum is at 8 peers per org and an endorsement policy of 2. Hence, we could improve public transaction performance by up to 32 % by choosing the right number of peers. With private transactions, the numbers are still about 21 % better with 8 peers per org than with 2 peers per org.

Scaling the number of orgs and the endorsement policy equally only slightly reduces throughput for smaller networks, a potential reason being that the endorsement workload for each peer remains constant and the other operations, like networking and committing, are not yet the bottleneck in this regime. Nevertheless, for larger network sizes, the throughput degrades considerably, and we also see that the difference in throughput between having one and two peers per org becomes negligible. This makes sense, as networking becomes the bottleneck in this regime, and splitting the endorsement workload no longer gives a considerable advantage to the networks with more peers per org.

For scaling the number of RAFT orderers, we expected a performance decrease but could not yet observe one in our chosen scenarios. Seemingly, in the regime below 1,500 tx/s, the ordering service is not a bottleneck for up to 64 orderers. Nevertheless, the ordering service might become a bottleneck for even larger ordering services. However, using a RAFT ordering service with up to 64 nodes should be sufficient in practically any scenario, since this would allow a total of 31 crashed orderers while still ensuring the network's functionality.
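The crash-fault tolerance of a RAFT ordering service follows directly from its majority-quorum requirement; the following helper (our own, not part of Fabric) makes the arithmetic behind the 31-crash figure explicit:

```python
def raft_crash_tolerance(n_orderers: int) -> int:
    """A RAFT cluster of n nodes stays available as long as a majority of
    nodes is alive, i.e., it tolerates floor((n - 1) / 2) crashed nodes."""
    return (n_orderers - 1) // 2
```

With 64 orderers, this gives 31 tolerable crashes, matching the statement above.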
[Figure 9: Maximum sustainable throughput f (tx/s) when scaling the number of peers (x-axis: total number of peers), the number of orgs, the number of ordering nodes, and the number of orgs/endorsers; curves for CouchDB and LevelDB with public and private transactions.]
Fig. 9. Different scalability parameters in comparison.
By deploying databases, orderers, and peers onto separate systems, one can gain a small performance boost (see Figure 10). In our default scenario, the database runs on the peer node (which is obligatory for LevelDB), and the orderers run on separate nodes. We see that running both the orderer and the peer on the same node decreases performance only slightly: in particular, we observed a decrease of only 15 % in the case of the m5.large machines and 6 % in the case of the m5.2xlarge machines. Disregarding other important factors, separating an org's Fabric components across several computers is therefore notably less efficient. Running CouchDB on a separate node has a considerable effect on weaker hardware (an increase of 23 % with m5.large). The throughput improvement is similar for private transactions on m5.large and m5.2xlarge hardware, amounting to 22 % and 18 %, respectively. Still, it becomes smaller (in relative terms) as soon as better equipment comes into play (an increase of 12 % with m5.2xlarge).
[Figure 10: Maximum sustainable throughput f (tx/s) for the setup modifications Internal orderer, Default, and External CouchDB, on m5.large (left) and m5.2xlarge (right) with CouchDB, each with simple public and simple private transactions.]
Fig. 10. The effect of separating the ordering nodes and the database for CouchDB.
Already very early in our experiments, we realized that the performance of Fabric is highly sensitive to the choice of database. On average, Fabric was two to three times faster with LevelDB than with CouchDB. The difference was even more noticeable with private data: with private data and m5.large machines, Fabric was 272 % faster with LevelDB than with CouchDB in the default setup.
It is important to determine the correlation between machine strength and performance, since systems should scale with better hardware (and a better network). As long as the number of vCPUs is small, an increase in their number improves performance notably (see Figure 11). For example, the performance increase for private transactions with CouchDB is 97 % when moving from m5.large to m5.xlarge instances and 62 % when moving from m5.xlarge to m5.2xlarge instances. This observation holds similarly for CouchDB and LevelDB and for both public and private transactions. However, the improvement when moving from m5.2xlarge (8 vCPUs) to m5.4xlarge (16 vCPUs) is already small (less than 25 % for CouchDB and less than 20 % for LevelDB for both public and private transactions), especially when taking into account that this also implies twice the costs for hardware or cloud services. We also measured with m5.8xlarge instances. However, we noted that this yielded at best moderate performance improvements (even less than moving from m5.2xlarge to m5.4xlarge). Besides, crashes of peers became quite frequent (particularly for LevelDB), which led to our results being even worse than for m5.4xlarge. We think that searching for the reason for this behavior might be an interesting starting point for future improvements of Fabric and could yield better scaling with hardware.

Like Thakkar and Nathan [41], we also observed that CPU utilization drops for hardware with many cores. Thakkar and Nathan [41] also argue that throughput can be increased by using more peers on multiple channels; however, this basically corresponds to running multiple blockchains instead of one, and currently, only cross-chain read operations between the blockchains (channels) are supported.
Our experiments also suggested that for hardware with many cores, the CPUs cannot be fully utilized, and there is also not a single core that reaches more than 90 % CPU utilization. The computational tasks hence seem well parallelized, suggesting that, in the end, writing to disk is the bottleneck. However, we wanted to check whether using multiple channels could leverage additional, previously unparallelizable resources. This does indeed seem to be the case, but only to a small extent. Our results (Figure 12) confirm their observations that increasing the number of channels has only a small impact, an average of 12 % (regardless of the database type), when going from a single-channel to a dual-channel setup. Any additional channel shows no noticeable further improvement in maximum throughput.

[Figure 11: Maximum sustainable throughput f (tx/s) across the AWS instance types m5.large, m5.xlarge, m5.2xlarge, and m5.4xlarge for CouchDB (left) and LevelDB (right), each with simple public and simple private transactions.]
Fig. 11. Different instance types in comparison for simple public and private transactions with CouchDB and LevelDB.

[Figure 12: Maximum sustainable throughput f (tx/s) across the AWS instance types m5.large, m5.xlarge, m5.2xlarge, and m5.4xlarge with one, two, and four channels, for CouchDB (left) and LevelDB (right).]
Fig. 12. Using multiple channels with varying hardware for simple public transactions with CouchDB and LevelDB.
New blocks are generated by the ordering service whenever the maximum blocksize is reached or the time that has passed since the generation of the last block exceeds the blocktime. Varying the blocktime (with a fixed maximum blocksize of 1,000 tx) keeps the maximum throughput below 500 tx/s, so it is always the blocktime that triggers a new block: as long as the maximum blocktime is less than 2 s, no more than 1,000 tx arrive within the maximum blocktime. A small maximum blocktime (less than 0.25 s) implies low throughput, since there is considerable overhead involved in creating, sending, and validating a new block. A positive correlation between blocksize and maximum throughput has already been observed by Thakkar et al. [42]. For larger blocktimes, the workload related to transactions dominates and hence makes performance largely insensitive to the blocktime. However, latency grows with the blocktime, which makes perfect sense, as it is always the associated timeout that triggers the creation of new blocks. This also makes a block timeout of around 0.5 s a sweet spot: increasing it does not further improve performance but increases latency, and decreasing the maximum blocktime, though decreasing latency, also heavily decreases throughput. For varying the blocksize, we get the same results, but with a "cutoff". This is because we used the default maximum blocktime of 0.5 s, which, considering that maximum throughput is around 500 tx/s when blocks become sufficiently large, becomes the actual trigger as soon as the maximum blocksize is higher than 0.5 s · 500 tx/s = 250 tx. For the low-throughput tests on latency, i.e., at 50 tx/s for public transactions and a blocktime of 500 ms, blocks never get bigger than 25 tx, so for the latency chart, we see no changes in latency beyond 50 ms. See Figure 13 for an overview of the results.

[Figure 13: Maximum sustainable throughput f (tx/s) and latency at low throughput (s) over the maximum blocktime (s) and the maximum blocksize (tx) for CouchDB, each with simple public and simple private transactions.]
Fig. 13. Comparison of different block times and block sizes.
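The interplay of blocktime and blocksize can be made explicit in a small model of the block-cutting rule (our own illustrative helper; the parameter names follow the configuration shown in Figure 5):

```python
def block_trigger(batch_timeout_s: float, max_message_count: int,
                  arrival_rate_tx_s: float):
    """Return which limit cuts a block first at a given transaction arrival
    rate, together with the resulting expected block size in transactions."""
    tx_within_timeout = batch_timeout_s * arrival_rate_tx_s
    if tx_within_timeout >= max_message_count:
        # Enough transactions arrive before the timeout fires.
        return "blocksize", max_message_count
    # Otherwise the timeout fires first, with a partially filled block.
    return "blocktime", tx_within_timeout
```

At the default 0.5 s timeout and roughly 500 tx/s, blocks are cut by the timeout at about 0.5 s · 500 tx/s = 250 tx, so raising the maximum blocksize beyond 250 tx has no further effect.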
First, we checked for the impact of maintaining larger data sets in terms of the state databases' keyspace size. We did not observe any relevant dependence on the keyspace's size for less than 10 keys (see Figure 14). Performance implications of very large keyspace sizes for LevelDB are given, e.g., in [3, 12]; due to space restrictions, we consider this rather a property of the databases themselves than of the Fabric network.
[Figure 14: Maximum sustainable throughput f (tx/s) over the size of data per transaction (10 B to 1 MB) for CouchDB (left) and LevelDB (right); curves for public and private transactions with data created on the peer or sent from the client.]
Fig. 14. Comparison of performance with varying transaction size.
Furthermore, we checked the sensitivity to the size of data written in a single transaction, both when the data is communicated via the client (data sent from the client) and when it is already present on the peer (i.e., data created on the peer as a result of executing a smart contract). One critical observation is that, for adequate bandwidth as within a cloud data center, it is not crucial whether large amounts of data to be processed are already available on the peer or sent via the client. Transactions with 10 bytes have around the same throughput as the simple (public/private) transactions benchmarked before. Moving from 10 bytes to 1 kB only yields degradations of less than 10 % for CouchDB and less than 20 % for LevelDB. However, moving from 10 bytes to 100 kB degrades throughput by more than 85 % for public transactions (even 95 % in case the data is generated on the client, leading us to the conclusion that networking is particularly resource-intensive) and by 75 % to 95 % for private transactions. Notably, the degradation of CouchDB and LevelDB is similar, except for private transactions with CouchDB, where throughput is already rather low for 10 kB. Consequently, while there is no significant difference between the creation of the data on the client (networking-intensive) and on the peer (no additional networking) for 10 bytes, the difference is 3× for 100 kB for LevelDB public and private as well as CouchDB public transactions. Only for CouchDB private transactions, the difference is merely 30 to 50 %. For 1 MB, throughput is less than 10 tx/s for LevelDB and less than 3 tx/s for CouchDB. The maximum throughput in terms of data is at around 14 MB/s for the run with 100 kB packages.
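The aggregate data rate follows directly from transaction throughput and payload size; the 14 MB/s figure above corresponds to roughly 140 tx/s at 100 kB per transaction, as the following back-of-the-envelope helper (our own) illustrates:

```python
def data_rate_mb_s(throughput_tx_s: float, payload_bytes: int) -> float:
    """Aggregate payload data rate in MB/s for a given transaction
    throughput and per-transaction payload size."""
    return throughput_tx_s * payload_bytes / 1e6
```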
Again, we first checked that the keyspace size has no impact for less than 10 keys. Reading speed is only meaningful on a per-peer basis, since no other node is involved in a read operation (except for cross-checks in case the client does not trust its peer). For simple key-based queries on m5.large instances, we obtained around 400 reads per second on CouchDB (150 reads per second with complex queries) and around 750 reads per second on LevelDB. We used non-invoked queries, which do not trigger the Fabric transaction flow. Again, we used the standard configuration, consisting of 4 clients and 2 peers; consequently, the clients distribute requests equally between the peers. Complex queries are only feasible on CouchDB. Here, we could observe a massive difference between no indexing (which performs approximately as well as querying the total database and searching the value space afterward, resulting in a low one-digit number of successful queries per peer and second) and indexing (which still allows approximately 150 reads per second per peer). Note that networks with high performance requirements on reading processes should either opt for multiple peers for scaling benefits or consider fetching the peers' data and maintaining another database.
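The indexing that makes complex CouchDB queries feasible is configured through JSON index definitions packaged with the chaincode (Fabric conventionally picks them up from META-INF/statedb/couchdb/indexes in the chaincode package). A sketch of generating such a definition; the field names `docType` and `owner` and the helper itself are illustrative assumptions, not taken from our chaincode:

```python
import json
import os

def write_couchdb_index(chaincode_dir: str, fields: list[str], name: str) -> str:
    """Write a CouchDB JSON index definition into the chaincode's META-INF
    directory so that Fabric deploys it together with the chaincode."""
    index = {
        "index": {"fields": fields},  # fields to index, in query order
        "ddoc": f"{name}Doc",         # design document to hold the index
        "name": name,
        "type": "json",
    }
    path = os.path.join(chaincode_dir, "META-INF", "statedb", "couchdb", "indexes")
    os.makedirs(path, exist_ok=True)
    file_path = os.path.join(path, f"{name}.json")
    with open(file_path, "w") as f:
        json.dump(index, f, indent=2)
    return file_path
```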
Fig. 15. Different task difficulties via matrix multiplication: maximum sustainable throughput f (tx/s) and number of simple operations over the number of rows/columns (LevelDB; matrixMultiplication with 2 vs. 4 endorsements).
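As context for the results discussed below, a minimal sketch of the CPU-heavy workload we benchmarked (naive nested-loop matrix multiplication); shown here in Python for illustration only, whereas the benchmarked implementation ran as Fabric chaincode:

```python
def matmul(a, b):
    """Naive O(n^3) matrix multiplication via three nested loops."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            for j in range(p):
                c[i][j] += a[i][k] * b[k][j]
    return c

# For n x n inputs, roughly 2 n^3 additions/multiplications are performed,
# so for large n the endorsement cost grows like n^3 and throughput like 1/n^3.
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```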
To test Fabric's performance on CPU-heavy operations, we conducted matrix multiplications, implemented through simple nested loops, with different matrix sizes, because this allows for quantitative control of the complexity. Please refer to Figure 15 for an overview of our findings. Multiplying two n × n matrices requires O(n^3) simple operations (additions and multiplications) in our nested-loop implementation, so for large n we expect the throughput to scale as 1/n^3. Indeed, we see that the total number of operations per second approaches a saturation curve for large n, since for small n the Fabric-related overhead also matters. For n = 300, the performance of the network is still around 30 tx/s resp. 15 tx/s for two resp. four endorsements. When comparing this to a matrix multiplication on a standalone Ethereum node, we found that the Ethereum Virtual Machine could not deal with the multiplication of a 90 × 90 matrix, and already multiplying a 30 × 30 matrix took almost one second. This emphasizes the significant performance improvements of Fabric when executing CPU-intensive tasks.

We also checked that, as one would expect, there is no difference between public and private transactions, since there are no database operations. Moreover, for a stricter endorsement policy (4 out of 8), performance is approximately half of that for a weaker endorsement policy (2 out of 8). This is because there are twice as many computations in total for a single transaction. The ratio of maximum throughput between the four-endorsement and the two-endorsement case is 40 % for multiplying 100 × 100 matrices and 50 %, and hence the expected asymptotic value, for 300 × 300 matrices.

To investigate the impact of network delays in a real-world but still general scenario, we defined groups within our default architecture, where each group represents an enterprise and consists of two peers, one orderer, and four clients. Within a group, we assume minimal network delays. This assumption is certainly optimistic for global enterprises, but in a large network one might in fact choose the nearby peers within an organization for endorsement if speed matters. In a first attempt, we used the standard traffic-control (tc) tool under Ubuntu to set an artificial delay for any communication between the members of different groups. However, we noticed that the results obtained by imposing artificial delays became highly unreliable at high throughput, suggesting that when CPU
usage or network traffic is high, tc does not operate correctly. Therefore, we started using deployments over multiple data centers and set up a cross-European and a global network. In particular, we set up groups located in Germany, Ireland, Italy, and Sweden for the European case with moderate network delays, and in US East, Germany, Brazil, and Singapore for the intercontinental case with high network delays. Latency increases by 30–50 % from the single-datacenter to the cross-European setting (one-way delays of up to 40 ms) and by more than 3x from a single datacenter to an intercontinentally distributed system (up to 330 ms delay). In the intercontinental case, already at low throughput, transactions on average take 1.2 seconds (public) to more than 1.7 seconds (private). Once the throughput approaches the maximum sustainable throughput, the latencies become even higher. A detailed topology of the network, including the network delays that we measured between each pair of data centers, is displayed in Figure 16.

Fig. 16. Network topologies and corresponding network delays (one-way) used for determining the impact of network topology on maximum throughput. Intercontinental setup: Ashburn (VA), Frankfurt, Sao Paolo, and Singapore (one-way delays of 90–330 ms); European setup: Dublin, Frankfurt, Milan, and Stockholm (one-way delays of 10–40 ms).

While our initial simulations with artificial network delays imposed using tc suggested a decrease in performance by approx. 50 % for CouchDB and 70 % for LevelDB (with significant standard deviations) for delays of 50 ms, using the actual cross-datacenter deployments with real-world delays, we find that for both LevelDB and CouchDB and for both public and private transactions, performance does not degrade that significantly in the intercontinental case (see Figure 17). This refines and confirms a statement in [1], according to which a cross-datacenter deployment of 100 nodes across five different data centers (with unknown network delays), using LevelDB and public transactions, still offers high performance.
For public transactions with CouchDB, for example, we find that maximum throughput drops from 426 tx/s in the single-datacenter case to 376 tx/s in the cross-European case and 358 tx/s in the intercontinental case. This corresponds to drops of 12 % and 16 %, respectively. We can also readily observe that the performance decrease is less significant for LevelDB in the cross-European case, whereas for both CouchDB and LevelDB, the performance decrease of private transactions in the intercontinental case is considerable: for CouchDB, we observe a decrease in maximum throughput of 39 % compared to the single-datacenter case, and for LevelDB, we see a drop by 26 %.

For a systematic investigation of the relationship between performance metrics and network delays, particularly for an analysis of latency, which we found to be (intuitively and empirically) much more sensitive to network delays than throughput, we had to adapt our benchmarking procedure. While the real-world deployments make it hard to vary network delays continuously, we found that the latencies in the real-world deployment are similar to the latencies that we observed using tc when staying well below maximum throughput in our measurements. By conducting
Fig. 17. Maximum throughput for single-datacenter, cross-European, and intercontinental Fabric networks (CouchDB and LevelDB; simple public and private transactions).
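The relative throughput drops quoted above follow directly from the measured maxima for public transactions with CouchDB; a quick recomputation:

```python
def drop_pct(baseline: float, degraded: float) -> float:
    """Relative throughput drop in percent."""
    return 100 * (1 - degraded / baseline)

# Maximum throughput for public transactions with CouchDB (tx/s).
single, europe, intercontinental = 426, 376, 358
print(f"cross-European: {drop_pct(single, europe):.0f} %")              # ~12 %
print(f"intercontinental: {drop_pct(single, intercontinental):.0f} %")  # ~16 %
```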
Fig. 18. Latency for different (one-way) network delays (CouchDB and LevelDB; simple public and private transactions; single datacenter, Europe, and intercontinental).

corresponding experiments with artificial network delays imposed through the tc tool, we found that transaction latency seems to grow approximately linearly with the network delay; interestingly, the average slope in Figure 18 is approximately 15, which suggests that around 15 communications between different nodes contribute to the observed transaction latency. It is apparent that avoiding communication paths that exhibit notable network delays is important for Fabric networks that operate under high performance requirements. This can be achieved, for example, by weakening endorsement policies and preferring endorsers with low latency, or by avoiding particularly large distances between ordering nodes. Close proximity between nodes, on the other hand, comes at the cost of availability ("liveness"), because stronger geographic localization increases the threat of correlated crash failures, e.g., caused by blackouts.
We investigated the bandwidth that the different roles in the Fabric network, i.e., orderers, peers, and clients, require. We first noted that within the roles of peers and clients, inbound traffic is distributed very uniformly. Moreover, the maximum requirement on download speed for peers, orderers, and clients is very homogeneous within each of these roles. The maximum values that we observed were at most as large as the respective maximum on
outbound traffic. Since, additionally, upload speed is more likely a bottleneck than download speed, we will only discuss the requirements on upload speed in detail here. Figure 19 illustrates the dependence of outbound traffic on throughput for all roles in the network and different architectures. As expected, there is a general linear correlation between throughput and outbound traffic for all roles. Thakkar et al. [42] already measured the upload rate of a peer to be approximately 2.5 MB/s (and the download rate 0.5 MB/s) in their Fabric network. Regarding the upload rate of peers, we arrive at a similar order of magnitude for equally high throughput.

By contrast, we found the upload rate of orderers to be more heterogeneous, and it can become very large. More precisely, the RAFT leader requires a very high upload speed when the ordering service has many nodes. For n = 64 orderers, for example, we observed an upload rate of more than 350 MB/s (recall that the maximum performance was independent of the number of orderers for up to 64 orderers, so upload is still not the bottleneck, at least for deployments within a single datacenter with high networking capabilities). This is plausible, as RAFT, the crash fault-tolerant consensus mechanism that Fabric uses for the ordering service, has a two-phase commit: the complexity of network traffic, i.e., the number of sent messages, is in the order of n(n-1), and the leader is involved in each of these messages. For the other orderers, the outbound traffic is one order of magnitude smaller. The charts in the second row of Figure 19 illustrate that the upload speed of non-leading ordering nodes mainly depends on the number of peers in the network, as well as on the number of endorsers per transaction; both make sense because the orderers need to distribute new blocks to the peers, and transactions are larger when more endorsements (signatures) need to be collected.

The third and fourth rows of Figure 19 suggest that this observation also remains true for the upload requirements of peers and clients. Moreover, for the clients, the linear interrelation between outbound traffic and maximum throughput is clear. The non-leading orderers' upload speed requirements are often about twice those of the peers, which makes sense because, in our default scenario, there were twice as many peers as orderers. Moreover, the clients have only a very small requirement on outbound network speed. Please refer to Figure 19 for an overview of the results.
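The leader's outbound requirement can be sketched with a simple replication model: the leader streams every block to each of the n - 1 followers, so its upload scales linearly in the size of the ordering service. The per-follower rate below is a hypothetical placeholder for illustration, not a measured value:

```python
def leader_upload_mb_per_s(n_orderers: int, per_follower_mb_per_s: float) -> float:
    """RAFT leader replicates the block stream to each of the n - 1 followers."""
    return (n_orderers - 1) * per_follower_mb_per_s

# Hypothetical per-follower block stream of 5.5 MB/s, for illustration only.
for n in (4, 8, 16, 32, 64):
    print(f"n = {n:>2}: ~{leader_upload_mb_per_s(n, 5.5):.0f} MB/s leader upload")
```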
Fig. 19. Required bandwidth (maximum outbound traffic in MB/s over maximum throughput in tx/s) for the different roles (leading orderer, non-leading orderer, peer, client) and architectures.
We tested different temporal distributions of the requests (i.e., jitter). As illustrated in Section 4, the DLPS sends transaction requests highly uniformly by default. We modified this to a step-shaped distribution to check the sensitivity and efficiency of the queuing system: clients send transactions at the beginning of each second, at a fluctuating rate with notably more or fewer transactions per second (Δ ≤ f). In this scenario, we did not notice a considerable deterioration of maximum throughput or latency. This suggests that, as long as queues do not become too large, the queuing process of Fabric is efficient.

As soon as a system transitions from testing to productive usage, its resistance and resilience against failures become extremely relevant. By operating multiple peers within one organization on physically separated nodes, and by using a blockchain per se, the negative impact of crashes and attacks in terms of data loss is already notably mitigated. Naturally, however, the impact of single nodes' failures on overall performance is also very important, since it might take some time until a failed node is reset and re-synchronized. Given the different roles in the system, we expect different consequences of failures. We only look into crashes here, because malicious attacks need sophisticated and specialized implementations and, since we are in a private permissioned network, can also be traced back to the responsible parties and therefore disincentivized; we recommend this topic for future work. Moreover, since we always used enough clients to saturate the system and clients can easily be replaced on short notice (no need for synchronization), we do not look further into client crashes. What remains, therefore, is looking at crashes of orderers and peers. Since the recommended ordering service is currently RAFT, which is crash fault-tolerant, we expected that crashing a single orderer does not significantly impact the performance.
Figure 20 depicts the impact of crashing various node types.
Fig. 20. Impact of crashing leading or non-leading orderer nodes and peers (at t = 30 s) on performance. Shown: total requests/responses over time, peer CPU usage (min, max), and peer traffic in/out (max).
To check this, we put the network under stress at 400 tx/s, which is close to maximum throughput (and hence maximum CPU utilization). After 30 seconds of sending transactions, we crashed a single orderer and continued sending requests at the same rate for an additional 30 seconds. We see that, overall, the impact of crashing an orderer is indeed limited. However, it makes a considerable difference whether the crash affects the current RAFT leader or a non-leading orderer: in the case of crashing the leading ordering node, the ordering service stops distributing new blocks for around 5 seconds and resumes at the previous speed thereafter with a newly elected leader (Figure 20, chart on the left). If a non-leading orderer crashes, the impact on performance is negligible (Figure 20, chart in the center). If a single peer crashes, the performance drops by the rate of transactions that needed the respective peer as an endorser. However, this is only the case because we restricted clients to requesting endorsements from a fixed set of peers, which contains exactly as many peers as the endorsement policy requires. In this case, we used our default configuration with 4 organizations, each associated with 2 peers, and an endorsement policy requiring 2 endorsements for every transaction. Consequently, every peer participates in a quarter of all transactions, which explains the drop of throughput by 25 % after t = 30 seconds. In a production-grade Fabric network, one would likely provide at least a few more peers to each client to compensate for crashes, so that transactions would not fail. However, the shift of the endorsement workload to other peers might decrease maximum throughput accordingly, namely to that of a Fabric network without the crashed peer.
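The 25 % figure follows from the endorsement policy: with clients pinned to minimal endorser sets, each peer serves a fraction of endorsements / peers of all transactions. A small sketch:

```python
def endorsement_share(endorsements_required: int, total_peers: int) -> float:
    """Fraction of transactions each peer endorses, assuming transactions are
    assigned uniformly to minimal endorser sets."""
    return endorsements_required / total_peers

# Default setup: 4 orgs x 2 peers = 8 peers, 2 endorsements per transaction.
share = endorsement_share(2, 8)
print(f"Each peer endorses {share:.0%} of transactions")  # 25%
```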
Fabric is a highly customizable permissioned blockchain framework, allowing enterprises to adjust the network architecture to the requirements of their use case. While this flexibility allows for many optimizations, it also leads to complexity and requires in-depth knowledge of Fabric's design options and parameters. Together with existing research, this paper should help to better understand which metrics are particularly important when setting up a Fabric-based application. In general, we were able to reproduce many results from existing work. We extended our understanding of Fabric by using a benchmarking framework that is built on precise definitions of key metrics and by testing many yet unexplored additional settings in a structured way. For example, we build upon the findings of Androulaki et al. [1] regarding the effect of network delays on the throughput of Fabric, but extend their result by comparing three different setups (no delay, continental, and intercontinental). Consequently, we showed how DLPS
Architecture
- Number of organizations, peers, and orderers: The number of orderers does not influence overall performance in the regime of 1,000 tx/s and below. Adding peers to small networks while keeping the endorsement policy constant improves performance. The number of organizations has only a limited impact on small networks (≤ 32 orgs). However, its effect increases with bigger networks due to how gossip dissemination works.
- Endorsement policy: A stricter endorsement policy (higher number of endorsers per org), ceteris paribus, reduces total throughput. It is possible to balance a stricter endorsement policy by introducing additional peers to keep the throughput stable.
- Number of channels: The number of channels has only a minimal effect on the performance of the system.
- Database location: The database location has a very limited effect on the performance of the system.

Setup
- Hardware: Performance scales well with better hardware for fewer than 8 vCPUs. However, its impact diminishes for significantly larger numbers of vCPUs.
- Database type: The database type has a high impact on the performance of the system. Depending on the actual setup, LevelDB is up to three times faster than CouchDB.
- Block parameters: A block time of around 0.5 s yields a particular sweet spot. Any increase of block time or block size, respectively, has only limited performance benefits but increases block latency. Below 0.5 s, the throughput decreases considerably.

Business Logic
- Private data: Public transactions are faster than private ones by a factor of around three for CouchDB and around two for LevelDB.
- I/O-heavy workload: Once the transaction payload is beyond 1 kB, performance decreases rapidly.
- CPU-heavy workload: CPU-heavy node.js smart contracts work as fast as native implementations. The endorsement policy, however, significantly influences performance, as the redundancy of computation rises.
- Reading vs. writing: Reading scales linearly with the number of peers, provided that the client trusts the peer and no endorsement is needed. The performance of reading and writing commands does not depend on the index size.

Network
- Delays: The impact of network delays is very low, even for an intercontinental network. The influence on private data is marginally stronger than on public data.
- Bandwidth: The bandwidth requirements rise proportionally to the number of nodes. In the RAFT setup, the leader node demands comparatively high upload bandwidth with an increasing number of orderers.

Robustness
- Node crashes: Fabric is very robust with regard to crashes. A crashing peer does not influence the overall network, despite its loss in endorsement power. In case a RAFT leader crashes, it takes about 5 s for the system to re-elect a new leader and continue normal operations.
- Temporal distribution of requests: Small deviations in the request distribution do not impact the performance of the system. Peaks beyond the maximum sustainable throughput can lead to undesirable congestion effects.

Table 5. Results of the benchmarking efforts by impacting factor.
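The block parameters summarized above map to the ordering service's batching settings in Fabric's configtx.yaml; a hedged sketch of a configuration near the reported sweet spot (the concrete message count and byte limits are illustrative placeholders, not values from our experiments):

```yaml
# Excerpt from a hypothetical configtx.yaml (ordering service section).
Orderer:
  OrdererType: etcdraft
  BatchTimeout: 500ms        # block time near the ~0.5 s sweet spot reported above
  BatchSize:
    MaxMessageCount: 500     # cut a block early once 500 transactions queue up
    AbsoluteMaxBytes: 10 MB  # hard upper bound on block size
    PreferredMaxBytes: 2 MB  # soft target for block size
```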
Figure 21 depicts the summary of our measurement results. We see that the maximum throughput heavily depends on the type of transactions (reading operations, CPU-heavy transactions, I/O-heavy transactions, and simple write transactions) and the type of hardware. For homogeneous hardware (m5.large, on which we conducted most experiments), there is a clear correlation between maximum throughput and CPU utilization across the overall highly heterogeneous deployments. We see that both depend heavily on the kind of database used (LevelDB achieves higher throughput), the
type of transactions (private transactions achieve lower throughput), and the network size (large Fabric networks have lower throughput). Therefore, these parameters should receive particular attention whenever conceptualizing the network architecture for a use case with higher performance requirements.

Fig. 21. Summary of all measurements and the overall most important design parameters.

Kannengießer et al. [22] describe various trade-offs that developers face when employing blockchain systems. Our article investigates some of the described trade-offs and provides additional metrics to quantify them in the case of Fabric. In particular, our measurements of different smart contract methods, e.g., varying the complexity of matrix multiplications and the size of transactions, quantify the trade-off between transaction validation speed and complexity of operations. Similarly, by investigating private and public transactions in Fabric, we also quantify the trade-off between confidentiality and performance derived in that paper. Finally, our various performance measurements on different network sizes and topologies and with varying endorsement policies quantify the dependency of performance on the degree of decentralization and, thus, on security and availability. Since our experiments demonstrate that the ordering service is not the bottleneck in the investigated architecture, the trade-off between performance and security was hardly present: the solo orderer, which lacks any crash or byzantine fault tolerance, provided about the same overall performance as the crash fault-tolerant RAFT ordering service. It will be interesting to see whether using a byzantine fault-tolerant ordering service, which will be provided in the future, will have any impact.

We focused on a subset of interesting factorial combinations, as the tremendously high degree of freedom of a Fabric network makes it infeasible to test all possibilities.
We settled on a standard configuration and then changed single parameters to identify their influence on performance. This approach entails the restriction that, when moving too far away from our testing scenario, the results might differ from ours. For example, Thakkar and Nathan [41] suggest that some characteristics might change for very strong servers. Therefore, this work is to be understood as an orientation regarding the potential of Fabric, but not as a strict reference for all possible cases. Hence, we suggest conducting specific evaluations.
This paper examines the performance of Fabric. It provides an in-depth analysis of the system, covering a total of 15 variables related to architecture, setup configuration, business logic, network, and robustness, guiding system architects and infrastructure engineers when designing their Fabric-based infrastructure and applications.

This paper makes various theoretical and practical contributions. From an academic perspective, it contributes to understanding how to design blockchain systems, laying out further insights into the potential of private, permissioned blockchains. Future research can also take the extended list of influencing factors as a baseline for performance analyses of Fabric and other blockchains. From a practitioner's point of view, the demonstration of various parameters' impact might help optimize existing applications further to allow for higher network performance. Finally, by discussing the potential of Fabric, we provide a baseline for practitioners to understand whether blockchain might be up to the requirements of potential operation-level applications.

Our results demonstrate that Fabric is suitable to support the needs of many real-world applications, providing sufficient scalability while ensuring critical properties of the system, such as being manipulation-proof. While this study covers a comprehensive list of variables, it still leaves certain areas unexplored. For example, it might be beneficial to compare the different supported programming languages, such as Go, Node, and Java. Therefore, we hope that future research uses the extended DLPS framework to examine additional implementations. Additionally, while this paper considers a recent version of Fabric, evaluating features such as private data collections, the development of the blockchain framework is still comparatively fast, with developers focusing on further improving the overall performance as well as on new features, e.g., zero-knowledge proofs. Therefore, the presented analysis is only valid for the particular release version, requiring additional testing once an update is introduced. Nevertheless, the list of influential parameters and the functionalities by which we extended the DLPS can also support analyses of future updates of the Fabric code. Additionally, the DLPS supports other enterprise blockchains, such as Quorum, that offer similar functionalities (smart contracts and private transactions) and parameters (such as block time) in deployments with different network architectures and hardware.

Furthermore, even though researchers and practitioners focus on Fabric, we would like to stress the need to investigate the potential of other blockchain implementations. While the architectures of different blockchain systems differ in specific details, the core architectural principles remain the same. Considering architecture, setup, business logic, network, and robustness when benchmarking different blockchain implementations ultimately allows for a better comparison between the results.

We think that blockchain has come a long way. Especially with the newest release of Fabric, blockchain implementations have come closer to a production-ready state than ever before. Nevertheless, our understanding of this technology, its performance, and its scalability in general is still very limited. We have just started to identify the comprehensive set of factors influencing this kind of distributed system. Gaining further insights will allow for better design of blockchain infrastructures and applications, pushing the technological boundaries toward more advanced systems.
ACKNOWLEDGMENTS
We thank Colin Glass, Marius Poke, and Orestis Papageorgiou for their valuable comments. We also gratefully acknowledge the Luxembourg National Research Fund (FNR) and PayPal for their support of the PEARL project "P17/IS/13342933/PayPal-FNR/Chair in DFS/Gilbert Fridgen" that made this paper possible.
REFERENCES
[1] Elli Androulaki, Artem Barger, Vita Bortnikov, Christian Cachin, Konstantinos Christidis, Angelo De Caro, David Enyeart, Christopher Ferris, Gennady Laventman, Yacov Manevich, et al. 2018. Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains. In Proceedings of the Thirteenth EuroSys Conference. IEEE, New York, NY, United States, 15.
[2] AWS. 2021. Amazon EC2 Pricing. Amazon Web Services. Retrieved 2021-07-02 from https://aws.amazon.com/ec2/pricing/on-demand/?nc1=h_ls
[3] Arati Baliga, Nitesh Solanki, Shubham Verekar, Amol Pednekar, Pandurang Kamat, and Siddhartha Chatterjee. 2018. Performance Characterization of Hyperledger Fabric. In Crypto Valley Conference on Blockchain Technology. IEEE, Zug, Switzerland, 65–74.
[4] Roman Beck, Christoph Müller-Bloch, and John Leslie King. 2018. Governance in the Blockchain Economy: A Framework and Research Agenda. Journal of the Association for Information Systems 19, 10 (2018), 1020–1034.
[5] Patrik Bichsel, Carl Binding, Jan Camenisch, Thomas Groß, Tom Heydt-Benjamin, Dieter Sommer, and Greg Zaverucha. 2009. Cryptographic Protocols of the Identity Mixer Library. IBM. Retrieved 2021-02-14 from http://patrik.biche.ch/pub/rz3730.pdf
[6] Blockbench. 2021. Blockbench Repository. Retrieved 2021-02-14 from https://github.com/ooibc88/blockbench
[7] Vitalik Buterin et al. 2014. A Next-Generation Smart Contract and Decentralized Application Platform. Retrieved 2021-02-14 from https://ethereum.org/en/whitepaper/
[8] Caliper. 2021. Hyperledger Caliper Repository. Retrieved 2021-02-14 from https://github.com/hyperledger/caliper
[9] Jan Camenisch, Manu Drijvers, and Maria Dubovitskaya. 2017. Practical UC-Secure Delegatable Credentials with Attributes and their Application to Blockchain. In Proceedings of the SIGSAC Conference on Computer and Communications Security. ACM, Dallas, Texas, USA, 683–699.
[10] Fran Casino, Thomas K. Dasaklis, and Constantinos Patsakis. 2019. A Systematic Literature Review of Blockchain-based Applications: Current Status, Classification and Open Issues. Telematics and Informatics 36 (2019), 55–81.
[11] Tien Tuan Anh Dinh, Rui Liu, Meihui Zhang, Gang Chen, Beng Chin Ooi, and Ji Wang. 2018. Untangling Blockchain: A Data Processing View of Blockchain Systems. Transactions on Knowledge and Data Engineering 30, 7 (2018), 1366–1385.
[12] Tien Tuan Anh Dinh, Ji Wang, Gang Chen, Rui Liu, Beng Chin Ooi, and Kian-Lee Tan. 2017. Blockbench: A Framework for Analyzing Private Blockchains. In Proceedings of the International Conference on Management of Data. ACM, Chicago, Illinois, USA, 1085–1100.
[13] DLPS. 2021. DLPS Repository. Retrieved 2021-02-14 from https://github.com/DLPS-Framework
[14] Julian Dreyer, Marten Fischer, and Ralf Tönjes. 2020. Performance Analysis of Hyperledger Fabric 2.0 Blockchain Platform. Association for Computing Machinery, New York, NY, USA, 32–38.
[15] Gilbert Fridgen, Sven Radszuwill, Nils Urbach, and Lena Utz. 2018. Cross-organizational Workflow Management using Blockchain Technology – Towards Applicability, Auditability, and Automation. In Proceedings of the 51st Hawaii International Conference on System Sciences. IEEE, Wailea, Maui, Hawaii, USA, 3507–3516.
[16] D. Geneiatakis, Y. Soupionis, G. Steri, I. Kounelis, R. Neisse, and I. Nai-Fovino. 2020. Blockchain Performance Analysis for Supporting Cross-Border E-Government Services. IEEE Transactions on Engineering Management 67, 4 (2020), 1310–1322.
[17] Tobias Guggenberger, André Schweizer, and Nils Urbach. 2020. Improving Interorganizational Information Sharing for Vendor Managed Inventory: Toward a Decentralized Information Hub Using Blockchain Technology. IEEE Transactions on Engineering Management 67, 4 (2020), 1074–1085.
[18] Yue Hao, Yi Li, Xinghua Dong, Li Fang, and Ping Chen. 2018. Performance Analysis of Consensus Algorithm in Private Blockchain. In Intelligent Vehicles Symposium.
[20] Private Data Collections on Hyperledger Fabric. IBM. Retrieved 2021-02-14 from https://github.com/IBM/private-data-collections-on-fabric
[21] Thomas Jensen, Jonas Hedman, and Stefan Henningsson. 2019. How TradeLens Delivers Business Value With Blockchain Technology. MIS Quarterly Executive 18, 4 (2019), 221–243.
[22] Niclas Kannengießer, Sebastian Lins, Tobias Dehling, and Ali Sunyaev. 2020. Trade-Offs between Distributed Ledger Technology Characteristics. Comput. Surveys 53, 2 (2020), 37.
[23] Niclas Kannengießer, Sebastian Lins, Tobias Dehling, and Ali Sunyaev. 2020. Trade-Offs between Distributed Ledger Technology Characteristics. ACM Comput. Surv. 53, 2 (2020), 37.
[24] John Kolb, Moustafa AbdelBaky, Randy H. Katz, and David E. Culler. 2020. Core Concepts, Challenges, and Future Directions in Blockchain: A Centralized Tutorial. ACM Comput. Surv. 53, 1 (2020), 39.
[25] Jay Kreps, Neha Narkhede, Jun Rao, et al. 2011. Kafka: A Distributed Messaging System for Log Processing. In Proceedings of the NetDB, Vol. 11. ACM, Athens, Greece, 7.
[26] Murat Kuzlu, Manisa Pipattanasomporn, Levent Gurses, and Saifur Rahman. 2019. Performance Analysis of a Hyperledger Fabric Blockchain Framework: Throughput, Latency and Scalability. In IEEE International Conference on Blockchain. IEEE, Atlanta, Georgia, USA, 536–540.
[27] Olga Labazova, Tobias Dehling, and Ali Sunyaev. 2019. From Hype to Reality: A Taxonomy of Blockchain Applications. In Proceedings of the 52nd Hawaii International Conference on System Sciences. IEEE, Wailea, Maui, Hawaii, USA, 10.
[28] Leslie Lamport, Robert Shostak, and Marshall Pease. 1982. The Byzantine Generals Problem. ACM Transactions on Programming Languages and Systems (TOPLAS) 4, 3 (1982), 382–401.
[29] Chaoqun Ma, Xiaolin Kong, Qiujun Lan, and Zhongding Zhou. 2019. The Privacy Protection Mechanism of Hyperledger Fabric and its Application in Supply Chain Finance. Cybersecurity 2, 1 (2019), 9.
[30] Jens Mattke, Christian Maier, and Axel Hund. 2019. How an Enterprise Blockchain Application in the U.S. Pharmaceuticals Supply Chain is Saving Lives. MIS Quarterly Executive 18, 4 (2019), 246–261.
[31] Daniel Miehle, Dominic Henze, Andreas Seitz, Andre Luckow, and Bernd Bruegge. 2019. PartChain: A Decentralized Traceability Application for Multi-Tier Supply Chain Networks in the Automotive Industry. In IEEE International Conference on Decentralized Applications and Infrastructures. IEEE, Newark, California, USA, 140–145.
[32] D. Miehle, D. Henze, A. Seitz, A. Luckow, and B. Bruegge. 2019. PartChain: A Decentralized Traceability Application for Multi-Tier Supply Chain Networks in the Automotive Industry. In . 140–145.
[33] Satoshi Nakamoto. 2008. Bitcoin: A Peer-to-Peer Electronic Cash System. Retrieved 2021-02-14 from http://bitcoin.org/bitcoin.pdf
[34] Qassim Nasir, Ilham A. Qasse, Manar Abu Talib, and Ali Bou Nassif. 2018. Performance Analysis of Hyperledger Fabric Platforms. Security and Communication Networks 1, 2018 (2018), 14.
[35] Thanh Son Lam Nguyen, Guillaume Jourjon, Maria Potop-Butucaru, and Kim Loan Thai. 2019. Impact of Network Delays on Hyperledger Fabric. In
INFOCOM 2019 – Conference on Computer Communications Workshops . IEEE, Paris, France, 222–227.[36] Diego Ongaro and John Ousterhout. 2014. In Search of an Understandable Consensus Algorithm. In { USENIX } Annual Technical Conference . USENIXAssociation, Philadelphia, Philadelphia, USA, 305–319.[37] Suporn Pongnumkul, Chaiyaphum Siripanpornchana, and Suttipong Thajchayapong. 2017. Performance Analysis of Private Blockchain Platformsin Varying Workloads. In . IEEE, Vancouver, Canada, 6.[38] Alexander Rieger, Jannik Lockl, Nils Urbach, Florian Guggenmos, and Gilbert Fridgen. 2019. Building a Blockchain Application that Complies withthe EU General Data Protection Regulation.
MIS Quarterly Executive
18, 4 (2019), 263–279.[39] Johannes Sedlmeir, Philipp Ross, André Luckow, Jannik Lockl, Daniel Miehle, and Gilbert Fridgen. 2021. The DLPS: A Framework for BenchmarkingBlockchains. In
Proceedings of the 54th Hawaii International Conference in System Sciences . IEEE, Wailea, Maui, Hawaii, USA, 6855–6864.[40] Nick Szabo. 1997. Formalizing and Securing Relationships on Public Networks. Retrieved 2021-02-14 from https://firstmonday.org/ojs/index.php/fm/article/view/548/469[41] Parth Thakkar and Senthil Nathan. 2020. Scaling Hyperledger Fabric Using Pipelined Execution and Sparse Peers. Retrieved 2021-02-14 fromhttps://arxiv.org/pdf/2003.05113.pdf[42] Parth Thakkar, Senthil Nathan, and Balaji Viswanathan. 2018. Performance Benchmarking and Optimizing Hyperledger Fabric Blockchain Platform.In
IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems . IEEE, Banff, Alberta,Canada, 264–276.[43] Canhui Wang and Xiaowen Chu. 2020. Performance Characterization and Bottleneck Analysis of Hyperledger Fabric. Retrieved 2021-02-14 fromhttps://arxiv.org/pdf/2008.05946[44] Xiaoqiong Xu, Gang Sun, Long Luo, Huilong Cao, Hongfang Yu, and Athanasios V. Vasilakos. 2021. Latency Performance Modeling and Analysis forHyperledger Fabric Blockchain Network.
Information Processing & Management
58, 1 (2021), 13.[45] Rui Zhang, Rui Xue, and Ling Liu. 2019. Security and Privacy on Blockchain.