[PDF] SonicChain: A Wait-free, Pseudo-Static Approach Toward Concurrency in Blockchains

Abstract

Blockchains have a two-sided reputation: they are praised for disrupting some of our institutions through innovative technology for good, yet notorious for being slow and expensive to use. In this work, we tackle this issue with concurrency, yet we aim to take a radically different approach by valuing simplicity. We embrace the simplicity through two steps: first, we formulate a simple runtime mechanism to deal with conflicts called concurrency delegation. This method is much simpler and has less overhead, particularly in scenarios where conflicting transactions are relatively rare. Moreover, to further reduce the number of conflicting transactions, we propose using static annotations attached to each transaction, provided by the programmer. These annotations are pseudo-static: they are static with respect to the lifetime of the transaction, and therefore are free to use information such as the origin and parameters of the transaction. We propose a distributor component that can use the output of this pseudo-static annotations and use them to effectively distribute transactions between threads in the least-conflicting way. We name the outcome of a system combining concurrency delegation and pseudo-static annotations as SonicChain. We evaluate SonicChain for both validation and authoring tasks against a common workload in blockchains, namely, balance transfers, and observe that it performs expectedly well while introducing very little overhead and additional complexity to the system.

Full PDF

VVrije Universiteit Amsterdam Universiteit van Amsterdam

Master’s Thesis

SonicChain: A Wait-free, Pseudo-StaticApproach Toward Concurrency inBlockchains

Author:

Kian Paimani (2609260) dr. ir. A.L. Varbanescu dr. F. Regazzoni

A thesis submitted in fulﬁllment of the requirements forthe Master of Science degree in Parallel and Distributed Computer Systems

February 19, 2021 a r X i v : . [ c s . D C ] F e b In the future, trusting an opaque institution, a middleman, merchant or intermediarywith our interest, would be as archaic a concept, as reckoning on abacuses today”– Dr. Gavin Wood ii relude: Web 3.0 My wife asked me why I spoke so softly in the house. I said I was afraid MarkZuckerberg was listening!She laughed. I laughed. Alexa laughed. Siri laughed. – Web3 Summit 2019 - BerlinI want to brieﬂy recall how I got into the blockchain ecosystem, and how it turned outto be much more serious and important than what I thought, and why I think you shouldthink the same way. I will be brief though, there is a lot more to read in this thesis.I started developing blockchains in 2019, and at ﬁrst I always admittedly said to myfriends and colleagues that I am interested in it from an engineering perspective. The onlyother facade that I knew to blockchains was its monetary aspect. In short, blockchainswere two things to me: Complex distributed systems, and getting rich overnight. I triedto convince myself that I am only interested in the former.Soon after, through the works, papers, and talks of visionary pioneers of the blockchainecosystem, I learned about the third, perhaps the most important facade of blockchains:That it is merely a tool in a much larger ecosystem of ideas and beliefs about how wecollaborate, perform logistics, and live as a society. Diﬀerent people have drawn diﬀerentideas here; I will stick to just one that I ﬁnd the most relevant: Web 3.0.Let us look back: web 1.0 was the static web of the ’90s and the early years of the 20thcentury. It had limited interaction, and only served to show static information. The earlyweb and its underlying protocols were designed for government and educational purposes.

Trust is an absolutely vital asset in web 1.0.Then came along Mark Zucker... I mean web 2.0. It was dynamic, it could do much moreand started oﬀering way more. We starter building ﬁnancial and institutional systems ontop of it. Except, we forgot to update any of the protocols. Web 2.0 expanded very quickand very fast, and brought us here: We can do many great things with web 2.0, it is great.But the data, privacy, and sovereignty of the user is entirely at the mercy of centralized,giant tech corporations. In a sense, we let web 2.0 expand exponentially, without thinkingiiihat its underlying protocols – the trust – is still there, and it is becoming more dangerousevery day.It was quite a shock to me to learn that blockchain-enthusiasts are as excited to talkabout Satoshi Nakatomo and the daily bitcoin price, as they are to talk about the Snowdenrevolution, privacy, advertising, and democracy. And this is where web 3.0 comes into play.Web 3.0, essentially, is supposed to eliminate the trust embedded in the web 2.0 protocols,allowing users to interact with one another in a trust-less, decentralized, and more securemanner, using protocols that are governed transparently, rather than being owned by abig cooperation. And in this picture, blockchain is really just the tip of the iceberg, justone piece of the puzzle. A much wider army of tools, protocols and science is neededto develop and build web 3.0. This ranges from bare-bone sciences such as probabilityand cryptography, to social matters such as privacy, identity, and transparent governanceof protocols, all the way to the typical stuﬀ: user-friendly interfaces and APIs for otherdevelopers to build upon.I learned over time that blockchains are not as impressive as they seem. Rather it is theunderlying aspiration that makes it truly exciting. I hope you feel similar someday, andthis thesis might help you learn some of the basics of blockchain along the way.iv bstract

Blockchains have a two-sided reputation: they are praised for disrupting some ofour institutions through innovative technology for good, yet notorious for beingslow and expensive to use. In this work, we tackle this issue with concurrency,yet we aim to take a radically diﬀerent approach by valuing simplicity . Weembrace the simplicity through two steps: ﬁrst, we formulate a simple runtimemechanism to deal with conﬂicts called concurrency delegation . This methodis much simpler and has less overhead, particularly in scenarios where conﬂict-ing transactions are relatively rare. Moreover, to further reduce the number ofconﬂicting transactions, we propose using static annotations attached to eachtransaction, provided by the programmer. These annotations are pseudo-static:they are static with respect to the lifetime of the transaction, and therefore arefree to use information such as the origin and parameters of the transaction.We propose a distributor component that can use the output of this pseudo-static annotations and use them to eﬀectively distribute transactions betweenthreads in the least -conﬂicting way. We name the outcome of a system combin-ing concurrency delegation and pseudo-static annotations as

SonicChain . Weevaluate SonicChain for both validation and authoring tasks against a com-mon workload in blockchains, namely, balance transfers, and observe that itperforms expectedly well while introducing very little overhead and additionalcomplexity to the system. cknowledgements

I learned all of the engineering and blockchain knowledge needed to conductthis research through Parity Technologies, a pioneering company developing awide variety of technologies for the decentralized future. I joined parity withalmost no experience in such ﬁelds, and ﬁnd myself extremely lucky that I wasgiven the opportunity to learn from their engineers, make mistakes and growalong the way. My work with Parity will continue after this thesis is over, andI hope for it to last for many more years.I took two courses in 2018 at UvA (even though I technically study at VU)about concurrency, both supervised by Dr. Ana Varbanescu. Both were soenticing that they sparked my interest in concurrency and performance engi-neering, eventually giving me the idea to mix it with blockchains for my ﬁnalthesis. I recall the ﬁrst meeting that I had with her about my thesis, and tothis day I am grateful that she accepted and supported my work from that veryﬁrst day. Looking back at it in hindsight, I probably wouldn’t have acceptedmyself back then if I was in her shoe, give that I had a very vague idea of whatmy main idea is, and the rather unorthodox domain of the thesis. I can onlyhope to continue learning from Dr. Ana Varbanescu and enjoy it, as I alwayshave. ontents List of Figures ivList of Tables vGlossary vi1 Introduction 1

ONTENTS

References 74 iii ist of Figures ist of Tables lossary

Account Nonce

An integer linked to each account and incremented per transaction. Used to preventreplay attacks, page 22

Block

A bundle of transaction that can be appended to the chain, page 14

Block Authoring

The sensitive task of preparing a new block and propagating it to the network, page 15

Blockchain Network

The set of entire nodes holding (a partial) copy of a particular ledger, page 5

Canonical Chain

The current based chain that most of the nodes agree upon (particularly in the contextof a fork), page 18

Concurrency Avoidance

A scheme in contrast to concurrency control in which no synchronization isused between the threads, page 37

Genesis Block

The ﬁrst block of each chain, and it has no parent hash, page 14

Node

A single entity in a network, page 5

Origin

The source (i.e. the sender) of a transaction. Usually Provided through public-key cryptography,page 12

Parent Hash

A hash of the previous block, mentioned in the header of all blocks, page 14

Runtime

The portion of the blockchain’s code that executes transactions, page 21

State root

The hash of the entire state, when represented as a Merkle tree, page 19

Taintable State

Special HashMap-like data structure that assigns a taint to each key, page 39

Transaction

Arbitrary data provided by the outer world that can update the blockchain’s state, page 12 vi Introduction “If Bitcoin was the calculator, Ethereum was the ENIAC . It is expensive, slow,and hard to work with. The challenge of today is to build the commodity, ac-cessible and performant computer.” – Unknown.Blockchains are indeed an interesting topic in recent and coming years. Many believethat they are a revolutionary technology that will shape our future societies, much like theinternet and how it has impacted many aspects of how we live in the last few decades (1).Moreover, they are highly sophisticated and inter-disciplinary software artifacts, achievinghigh levels of decentralization and security, which was deemed impossible so far. To thecontrary, some people skeptically see them as controversial, or merely a "hyped hoax", anddoubt that they will ever deliver much real value to the world. Nonetheless, through therest of this chapter and this work overall, we provide ample reasoning to justify why wethink otherwise.In very broad terms, a blockchain is a tamper-proof, append-only ledger that is beingmaintained in a decentralized fashion and can only be updated once everyone agrees uponthat change as a bundle of transactions. This bundle of transactions is called a block . Oncethis block is agreed upon, it is appended (aka. chained ) to the ledger, hence the term block - chain . Moreover, the ledger itself is public and openly accessible to anyone. This meansthat everyone can verify and check the ﬁnal state of the ledger, and all the transactions andblocks in its past that lead to this particular ledger state, to verify everything. At the sametime, asymmetric cryptography is the core identity indicator that one needs to interactwith the chain, meaning that one’s personal identity can stay private in principle, if thatis desired based on the circumstances. For example, one’s public key is not revealing their the ﬁrst generation computer developed in 1944. It ﬁlls a 20-foot by 40-foot room and has 18,000vacuum tubes. .1 Research Questions personal identity, whilst using names and email addresses does, like in many traditionalsystems.In some sense, blockchains are revolutionary because they remove the need for trust ,and release it from the control of one entity (e.g. a single bank, an institute, or simplyGoogle), by encoding it as a self-sovereign decentralized software. Our institutions arebuilt upon the idea that they manage people’s assets, matters and belongings, and theyensure veracity, because we trust in them . In short, they have authority . Of course, thismodel could work well in principle, but it suﬀers from the same problem as some softwaredo: it is a single point of failure . A corrupt authority is just as destructive as a ﬂawedsingle point of failure in a software is. Blockchain attempts to resolve this by deployingsoftware (i.e. itself) in a transparent and decentralized manner, in which everyone’s privacyis respected, whilst at the same time everyone can still check the veracity of the ledger,and the software itself. In other words, no single entity should be able to have full controlover the system.Now, all of these great properties do not come cheap. Blockchains are extremely compli-cated pieces of software, and they require a great deal of expertise to be written. Moreover,many of the machinery used to create this decentralized and public append-only ledgerrequires synchronization, serialization, or generally other procedures, that are likely to de-crease the throughput at which the chain can process transactions. This, to some extent,contributes to the skepticism about blockchains’ feasibility. For example, Bitcoin, one ofthe famously deployed blockchains to date, consumes a lot of resources to operate, andcannot execute more than around half a dozen transactions per second (2).Therefore, it is a useful goal to investigate the possibilities through which a blockchainsystem can be enhanced to operate faster , eﬀectively delivering a higher throughput oftransactions per some unit of time. We have seen that blockchains are promising in their technology, and unique traits thatthey can deliver. Yet, they are notoriously slow. Therefore, we pursue the goal of improvingthe throughput of a blockchain system. By throughput, we mean the number of successfultransactions that can be processed per some unit of time. There are numerous ways toachieve this goal, ranging from redesigning internal protocols within the blockchain toapplying concurrency. In this thesis, we precisely focus on the latter, enabling concurrencywithin transactions that are processed and then appended to the ledger. Moreover, we do2 .2 Thesis Outline so by leveraging, and mixing the best attributes of two diﬀerent realms of concurrency,namely static analysis and runtime conﬂict detection . This approach is better comparedwith other alternatives in section 3.1.Based on this, we formulate the following as our research questions: RQ1

What approaches exist to achieve concurrent execution of transactions within ablockchain system, and improve the throughput?

RQ2

How could both static analysis and runtime approaches be combined together toachieve a new approach with minimum overhead and measurable beneﬁts?

RQ3

How would such an approach be evaluated against and compared to others?

The rest of this thesis is organized as follows: In chapter 2 we provide a comprehensivebackground on both blockchains and concurrency, the two pillars of knowledge that wewill build upon. Chapter 3 starts by deﬁning the requirements of a hypothetical systemof interest. Then, some of the contemporary literature and the ways that they fulﬁll therequirements are mentioned, eﬀectively acting as our mini "related work" section. Finally,we introduce our own approach and place it next to other approaches for comparison.Chapter 4 acts as a mini detour into the implementation of our system design. In chapter5 we evaluate our design and implementation, ﬁnally bringing us to a conclusion and futurework in chapter 6. By static we mean generally anything which is known at the compile phase, and by runtime the execution phase Background “The use of credit cards today is an act of faith on the part of all concerned. Eachparty is vulnerable to fraud by the others, and the cardholder, in particular, hasno protection against surveillance.” – David Chum et. al. - 1990In this chapter, we dive into the background knowledge needed for the rest of this work.Two primary pillars of knowledge need to be covered: blockchains and distributed systemsin section 2.1 and concurrency, upon which our solution will be articulated, in section 2.2.

In this section, we provide an overview of the basics of distributed systems, blockchains,and their underlying technologies. By the end of this section, it is expected that an averagereader will know enough about blockchain systems to be able to follow the rest of our work,and understand the approach proposed in chapter 3 and onwards.

An introduction to blockchain is always entangled with distributed and decentralized sys-tems.A distributed system is a system in which a group of nodes (each having their ownprocessor and memory) cooperate and coordinate for a common outcome. From the per-spective of an outside user, most often the distributed nature of the system is transparent,and all the nodes can be seen and interacted with, as if they were one cohesive system (3).Indeed, some details diﬀer between the distributed systems and blockchains, yet theunderlying concepts resonate in many ways (4), and blockchains can be seen as a form ofdistributed systems. Like a distributed system, a blockchain also consists of many nodes,4 .1 Blockchains And Distributed Ledger Technology operated either by organizations, or by normal people with their commodity computers.Similarly, this distribution trait is transparent to the end-user when they want to interactwith the blockchain, and they indeed see the system as one cohesive unit. All of thesenodes together form the blockchain network .Blockchains are also decentralized . This term was ﬁrst introduced in a revolutionarypaper in 1964 as a middle ground between purely centralized systems that have a singlepoint of failure, and 100% distributed systems, which are like a mesh (all nodes having linksto many other nodes (5) ). A decentralized system falls somewhere in between, where nosingle node’s failure can irrecoverably damage the system, and communication is somewhatdistributed, where some nodes might act as hops between diﬀerent sub-networks.Blockchains, depending on the implementation, can resonate more with either of theabove terms. Most often, from a networking perspective, they are much closer to theideals of a distributed system. From an operational and economical perspective, they canbe seen more as decentralized, where the operational power (i.e. the authority ) falls intothe hands of no single entity, yet a large enough group of authorities. Figure 2.1: Types of Networks. - From left to right: Centralized, Decentralized, andDistributed.

While most people associate the rise of blockchains with Bitcoin, it is indeed incorrect,because the basic ideas of blockchains were mentioned decades earlier. The ﬁrst relevantresearch paper was already mentioned in 2.1.1. Namely, in (5), besides the deﬁnition ofa decentralized system, the paper also describes many metrics regarding how secure anetwork should be, under certain attacks. The design of Paul Baran, author of (5), was ﬁrst proposed, like many other internet-related tech-nologies, in a military context. His paper was a solution to the USA’s concern about communication linksin the aftermath of a nuclear attack in the midst of the cold war (6). .1 Blockchains And Distributed Ledger Technology Next, (7) famously introduced what is know as Diﬃe-Hellman Key Exchange, which isthe backbone of public-key encryption. Moreover, this key exchange is heavily inspiredby (8), which depicts more ways in which cryptography can be used to secure onlinecommunication. Together, these papers form the digital signature scheme , which is heavilyused in all blockchain systems .Moreover, the idea of blockchain itself predates Bitcoin. The idea of chaining datatogether, whilst placing some digest of the previous piece (i.e. a hash thereof) in theheader of the next one was ﬁrst introduced in (9). This, in fact, is exactly the underlyingreason that a blockchain, as a data structure, can be seen as an append-only, tamper-proofledger. Any change to previous blocks will break the hash chain and cause the hash of thelatest block to become diﬀerent, making any changes to the history of the data structureidentiﬁable, hence tamper-proof .Finally, (10) introduced the idea of using digital computers as a means of currency in1990, as an alternative to the rise of credit cards at the time. There were a number ofproblems with this approach, including the famous double-spend problem, in which an en-tity can spend one unit of currency numerous times. Finally, in 2008, an unknown scientistwho used the name Satoshi Nakatomo released the ﬁrst draft of the Bitcoin whitepaper.In his work, he proposed Proof of Work as a means of solving the double-spend problem,among other details and improvements (11). Note that the idea of Proof of Work itselfgoes back, yet again, to 1993. This concept was ﬁrst introduced in (12), as means of spamprotection in early email services. Before going any further, we provide this, so-called, 1000 feet view of the blockchain. Thissmall section provides the big picture in the blockchain world, while the upcoming sectionsgo into the ﬁne depth thereof. This section introduces a lot of jargon all at once, but willhelp comprehend the rest of the chapter.Figure 2.2 shows a very broad overview of a blockchain network and the processes withinit. Each red circle is a node in the decentralized network. We zoom into only one of them.Each node holds two very important components: A runtime, and a ledger (dashed bluebox).The ledger is composed of the chain of blocks, and some auxiliary state for each block.These two are basically the entire view of the world. The blocks are linked together by a Many of these works were deemed military applications at the time, hence the release dates are whatis referred to as the "public dates", not the original, potentially concealed dates of their discovery. .1 Blockchains And Distributed Ledger Technology Figure 2.2: A 100ft View of the Blockchain Network - Nodes (red circles), Ledgerstate (dashed green box), Transaction (green diamonds). .1 Blockchains And Distributed Ledger Technology hash chain. We see a block outside of the state (black block). This is a candidate block,one that this node might propose to the rest of the network at some point to be appended.The runtime is simply the core logic of the chain. This core logic, the runtime, has theresponsibility of updating the ledger based on transactions . Transactions (green diamonds)are sent to a node from the outer world, potentially from end-users. The transactions arekept in a separate data structure (transaction queue) until getting a chance at being putinto a block. Having known where the blockchain’s idea originates from, and which ﬁelds of previousknowledge in the last half a century it aggregates, we can now have a closer look atthese technologies and eventually, build up a clear and concrete understanding of what ablockchain is and how it works.

We mentioned the Diﬃe-Hellman key exchange scheme in section 2.1.2. Key exchangeis basically a mechanism to establish a symmetric key, using only asymmetric data. Inother words, two participants can come up with a common shared symmetric key (used forencrypting data) without ever sharing it over the network . Indeed, while the underlyingprinciples are the same, for better performance, most modern distributed systems work withanother mechanism that is the more advanced variant of Diﬃe-Hellman, namely EllipticCurve Cryptography (ECC). Elliptic Curves oﬀer the same properties as Diﬃe-Hellman,with similar security measures, whilst being faster to compute and needing smaller keysizes. A key exchange, in short, allows for asymmetric cryptography , the variant ofcryptography that needs no secrete medium to exchange initial keys, and therefore is trulyapplicable to distributed systems. In asymmetric cryptography, a key pair is generatedat each entity. A public key, which can be, as the name suggests, publicly shared withanyone, and a private key that must be kept secret. Any data signed with the private keycan be veriﬁed using the public key. This allows for integrity checks, and allows anyone toverify the origin of a message. Hence, the private key is also referred to as the signature .Moreover, any data encrypted with the public key can only be decrypted with the privatekey. This allows conﬁdentiality. Readers may refer to (7) for more information about the details of how this mechanism works. .1 Blockchains And Distributed Ledger Technology Many useful properties can be achieved using asymmetric cryptography, and many mas-sively useful applications adopt it . For blockchains, we are particularly interested in signatures . Signatures allow entities to verify the integrity and the origin of any message.Moreover, the public portion of a key, i.e. a public key, can be used as an identiﬁer for anentity.For example, in the context of banking, a public key can be seen as an account number.It is public and known to everyone, and knowing it does not grant anyone the authorityto withdraw money from an account. The private key is the piece that gives one entity authority over an account, much like your physical presence at the bank and signing apaper, in a traditional banking system. This is a very common pattern in almost allblockchain and distributed systems: using private keys to sign messages, and using publickeys as identities.RSA and DSA are both non-elliptic signature schemes that are commonly known to date.ECDSA, short for E lliptic C ureve DSA, is the Elliptic Curve variant of the latter. Albeit,ECDSA is a subject of debate, due to its proven insecurities (13), and its performance.Hence, more recent, non-patented and open standard curves, such as EdDSA, are the mostcommonly used. EdDSA, short for Edwards-curve Digital Signature Algorithm is basedon the open standard Edward-curve and its reference, parameters, and implementation areall public domain.All in all, cryptography, and speciﬁcally digital signatures, play an integral role in theblockchain technology, and allow it to operate in a distributed way, where external messagescan be veriﬁed from their signatures. Hash functions, similar to elliptic curve cryptography, are among the mathematical back-bones of blockchains. A hash function is basically a function that takes some bits of dataas input and returns some bits of output in return. All hash functions have an importantproperty: they produce a ﬁxed sized output , regardless of the input size. Also, a hashfunction ensures that changing anything in the input, as small as one bit, does result inan entirely diﬀerent output. The device that you are using to read this line of text has probably already done at least one operationrelated to asymmetric cryptography since you started reading this footnote. This is how relevant they really are. Unlike ECDSA which is developed and patented by NIST, which in fact is the reason why manypeople doubt its security. .1 Blockchains And Distributed Ledger Technology Given these properties, one can assume that the hash of some piece of data can be seenas its digest . If the hash of two arbitrarily large pieces of data is the same, you can assumethat their underlying data are indeed the same. This is quite helpful to ensure that somecloned data is not tampered with, in a distributed environment. If we only distribute thehash of the original copy in a secure way, everyone can verify that they have a correctclone, without the need to check anything else.Albeit, a secure hash function needs to provide more properties. First, the hash functionneeds to ensure that no two diﬀerent inputs can lead to the same hash. This - the situationthat diﬀerent inputs generate the same hash - is called a collision , and the probability ofcollision in a hash function should be suﬃciently low for it to be secure. Moreover, a hashfunction must be a one way function, meaning that it cannot be reversed in a feasiblemanner. By feasible we eﬀectively mean timely : if reversing a function takes a fewmillion years, it is possible , but not feasible . The entire security of hash functions anddigital signatures is based on the fact that breaking them is not feasible . So, given somehash output, one cannot know the input that leads to that hash. Hash functions thathave this property are typically called cryptographic hash functions. Cryptographic hashfunctions are commonly used, next to asymmetric cryptography, for authentication andintegrity checks, where the sender can sign only a hash of a message and send it over thenetwork, such as in the common M essage A uthentication C ode, pattern (14) (MAC). From a networking perspective, a blockchain is a purely peer to peer distributed network.A peer to peer network is one in which many nodes form a mesh of connections betweenthem, and they are more or less of the same role and privilege.A peer to peer network is the architectural equivalent of what was explained as a dis-tributed network earlier in this chapter. The opposing architecture is what is knownas the client-server model, in which one node is the server and everyone else is a client.In other words, the client-server model is centralized , in the sense that only the servercontains valuable resources (whatever that resource might be: computation power, data,etc.) and serves it to all other clients.Unlike a client-server model, a peer to peer network does not have a single point offailure: there is no notion of client and server, and all of the entities have the same role,being simply called nodes . Having no servers to serve some data, it is straightforward tosay that peer to peer networks are collaborative . A node can consume some resources from And, yes, we are aware of quantum computing, but that is a story for another day. .1 Blockchains And Distributed Ledger Technology another node by requesting data from it, whilst being the producer for another node byserving data to it. This is radically diﬀerent from the traditional client-server model, inwhich the server is always the producer and clients are only consumers, and eﬀectivelyhave no control over the data that they are being served.Each node in a peer to peer network is constantly talking to other neighboring nodes. Forthis, they establish communication links between one another. Regardless of the transportprotocol (TCP, QUIC (15), etc.), these connections must be secure and encrypted for thenetwork to be resilient. Both elliptic curve cryptography and hash functions explained inthe previous sections, provide the technology needed to achieve this.In the rest of this work, we are particularly interested in the fact that in a blockchainsystem, the networking layer provides gossip capability. The gossip protocol is an epidemicprocedure to disseminate data to all neighboring nodes, to eventually reach the entirenetwork. In a nutshell, it is an eventually consistent protocol to ensure that some messagesare being constantly gossiped around, until eventually everyone sees them. Blockchainsuse the gossip protocol to propagate the messages that they receive from the end-user (themost important of which being transactions, explained in 2.1.4.5).A distributed system must be seen as a cohesive system from outside. Thus, a transactionthat a user submits to one node of the network should have the same chance of beingappended to the ledger by any of the nodes in the future. Therefore, it must be gossipedaround. This becomes more clear when we discuss block authoring in section 2.1.4.7. Shifting perspective, a blockchain is akin to a distributed database with very high redun-dancy, namely one copy per node. One might argue that this is too simplistic, but eventhe brief description that we have already provided (see 2.1.3) commensurates with this.Transactions can be submitted to a blockchain. These transactions are then added to abundle, called a block, which is then chained with all the previous blocks, forming a chainof blocks. All nodes maintain their view of this chain of blocks, and basically, that is whatthe blockchain is: a database for storing some chain of blocks.Now we can explain why in ﬁgure 2.2 each block was linked with some auxiliary data.Next to the block database, most blockchains store other data as well, to facilitate morecomplex logic. For example, in Bitcoin, that logic needs to maintain a list of accounts andbalances, and perform basic math on top of them . To know an account’s balance, it is In reality, Bitcoin does something slightly diﬀerent, which is known as the UTXO model, which weomit to explain here for simplicity (16). .1 Blockchains And Distributed Ledger Technology infeasible to re-calculate it every time from the known history of previous transactions.That would be equivalent to an ATM machine re-executing all your previous transactionsto know your current balance every time you use it. Thus, we need some sort of databaseas well, to store the auxiliary data that the blockchain logic needs - like, the list of accountsand their balances in our example. This auxiliary data is called the state , and is usuallyimplemented in the form of a key-value database.A key-value database is a database that can be queried similar to a map . Any valueinserted in the database needs to be linked with a key . This value is then placed inconjunction with that key. The same key can be used to retrieve, update, or delete thevalue. For example, in a bitcoin-like system, the keys are account identiﬁers (which wealready mentioned are most often just public cryptographic keys), and the values are simplythe account balances, some numeric value.Indeed, a more complicated blockchain, that does more than simple accounting, willhave a more complicated state layout. Even more, chains that support the execution ofarbitrary code (e.g. Smart Contracts (17)), like Ethereum, allow any key-value data pairto be inserted into the state.One challenge for nodes in a blockchain network is to keep a persistent view of thestate. For example, Alice’s view of how much money Bob owns needs to be the same aseveryone else’s view. But, before we dive into this aspect, let us ﬁrst formalize the meansof updating the state : the transactions . Transactions are pieces of information submitted to the system, that are eventually ap-pended to the blockchain in the form of a new block. And, as mentioned, everyone keepsthe history of all blocks, essentially having the ability to replay the history and make surethat an account claiming to have a certain number of tokens does indeed own it.The concept of state is the main reason why transactions exist. Transactions most oftencause some sort of update to happen in the state. Moreover, transactions are accountable,meaning that they most often contain a signature of their entire payload, to ensure bothintegrity and accountability. For example, if Alice wants to issue a transfer transactionto send some tokens to Bob, the chain will only accept this transaction if is signed withby Alice’s private key. Consequently, if there is a fee associated with this transfer, it is de-ducted from Alice’s account. This is where the link between identiﬁers and public keys alsobecomes more important. Each transaction has an origin , which is basically the identiﬁer Tokens are the equivalent of a monetary unit of currency, like a coin in the jargon of digital money. .1 Blockchains And Distributed Ledger Technology of the entity which sent that transaction. Each transaction also has a signature, which isbasically the entire (or only the sensitive parts of the) payload of the transaction, signedwith the private key associated with the aforementioned origin. Indeed, a transaction isvalid only if the signature and the public key (i.e., the origin ) match.This is a radically new usage of public-key cryptography in blockchains, where one cangenerate a private key using the computational power of a personal machine, and storesome tokens linked to it on a network operated by many decentralized nodes; that privatekey is the one and only key that can unlock those tokens and spend them. Although in thischapter we mostly use examples of a cryptocurrency (e.g. Bitcoin), we should note that anonline currency and token transfer is among the simplest forms of blockchain transactions.Depending on the functionality of a particular blockchain, its transactions can have speciﬁclogic and complexity. Nonetheless, containing a signature and some notion of origin is verycommon for most use cases.Let us recap some facts from the previous sections:• A blockchain is a peer to peer network in which a transaction received by one nodewill eventually reach other nodes.• Nodes apply transaction to update some state .• Nodes need to keep a persistent view of the state.A system with the combination of the above characteristics can easily come to a raceconditions. One ramiﬁcation of this race condition is the double-spend problem. ImagineEve owns 50 tokens. She sends one transaction to Alice, spending 40 tokens. Alice checksthat Eve has enough tokens to make this spend, updates Eve’s account balance to 10,basically updating her own view of the state. Now, if Eve sends the exact same transactionat the same time to Bob, it will also succeed, if Alice and Bob have not yet had time togossip their local transactions to one another.To solve this, blockchains agree on a contract: the state can only be updated via ap-pending a new block to the known chain of blocks, not one single transaction at a time.This allows to compensate for potential gossip delays to some extent, and is explained inmore detail in the next section. 13 .1 Blockchains And Distributed Ledger Technology2.1.4.6 Blocks Blocks are nothing but bundles of transactions, and they allow for ordering, which some-what relaxes the problem explained in the previous section, namely the race conditionbetween transactions. To do so, blocks allow nodes to agree on some particular order toapply transactions. Speciﬁcally, a node, instead of trying to apply transactions that it hasreceived via the gossip protocol, in some random order , will wait to receive a block fromother nodes, and then apply the transactions therein in the same order as stated in theblock. This is called block import .Transactions inside a block are ordered, and applying them sequentially is fully deter-ministic: it will always lead to the same result. Moreover, in the example of the previoussection, it is no longer possible for Eve to spend some tokens twice, because a block willeventually force some ordering of her transactions, meaning that whichever appears secondwill indeed fail, because the eﬀects of the ﬁrst one are already apparent and persistent inthe state of any node that is executing the block. In other words, as long as a node im-ports blocks, the transactions within it are ordered and the aforementioned race conditioncannot happen.A block also contains a small, yet very important piece of data called parent hash . Thisis basically a hash of the entire content of the last know block of the chain. There isexactly one block in each chain that has no parent, the ﬁrst block. This is a special case,and is called the genesis block . This, combined with the properties of the hash functionexplained in 2.1.4.2, brings about the tamper-proof-ness of all the blocks. In other words,the history of operations cannot be mutated. For example, if everyone in the networkalready knows that the parent hash of the last known block is H , it is impossible for Eveto inject, remove, or change any of the previous transaction, because this will inevitablycause the ﬁnal hash to be some other value, H , which in principle should be very diﬀerentfrom H .All in all, blocks make the blockchain more tamper-proof (at least the history of trans-actions), and bring some ordering constraints regarding the order in which transactionsneed to be applied. Nonetheless, with a bit of contemplation, one soon realizes that thisis not really solving the race condition, but rather just changing its granularity . Insteadof the question of which transaction to apply next, we now have the problem of which In principle, the probability of collision (the hash of some tampered chain of blocks being the sameas the valid one) is not absolute zero, but it is so small that it is commonly referred to astronomicallysmall , meaning that it will probably take millions of years for a collision to happen. As a malicious user,you most often don’t want to wait that long. .1 Blockchains And Distributed Ledger Technology block to append next. This is because, intentionally, we haven’t yet mentioned who canpropose new blocks to be appended, and when . We have only assumed that we somehow receive blocks over the network. This brings us to the consensus and authorship of blocks,explained in the next section. The consensus protocol in a blockchain consists of a set of algorithms that ensure all nodesin the network maintain an eventually consistent view of the ledger state (both the chainitself, and the state). The protocol needs to address problems such as emerging networkpartition, software failures, and the Byzantine General Problem (18), the sate in which aportion of the nodes in the network intentionally misbehave.For brevity, we only focus on one aspect of the consensus which is more relevant to ourwork, namely, the decision of block authoring : deciding who can author blocks, and when.Recall that nodes in a blockchain network form a distributed system. Therefore, eachnode could have a diﬀerent view of the blockchain, and each node might also have a diﬀerentset of transactions to build a new block out of (due to the fact that the underlying gossipmight have delivered diﬀerent transactions to diﬀerent nodes at a certain point in time).In principle, any of these nodes can bundle some transactions in a block and propagate itover the network, claiming that it should be appended to the blockchain. This will indeedlead to chaos. To avoid this, block authoring is a mechanism to dictate who can authorthe next block . This decision must be solved in a decentralized and provable manner.For example, in Proof of Work, each block must be hashed together with a variable suchthat the ﬁnal of the block hash has a certain number of leading zeros. This is hard tocompute, hence the system is resilient against an invalid block, namely those that wereauthored without respecting the authoring rules of the consensus protocol. Moreover, thisis provable: any node that receives a candidate block can hash it again and ensure thatthe block is valid with respect to Proof of Work. In this thesis, the terms "block author"and "validators" are used to refer to the entity that proposes the candidate block, and allthe other nodes that validate it, respectively. Deﬁnition 2.1.1.

Authoring and Validating . Author : the network entity that proposes a new candidate block. This task is calledblock authoring . Validators : All other nodes who receive this block and ensure its veracity. The act ofensuring veracity is called validating or importing a block. In some sense, if blockchains are a democratic system, block authoring is a protocol to chose a temporary dictator. .1 Blockchains And Distributed Ledger Technology In a Proof of Work scheme, the next author is basically whoever manages to solve theProof of Work puzzle faster.

Deﬁnition 2.1.2.

Given the adjustable parameter d and a candidate block data b , solvingthe Proof of Work puzzle is the process of ﬁnding a number n such that: Hash ( b || n ) < = d (2.1)Here, d is usually some power of 2 which is equal to a certain number of leading zerosin the output.Indeed, this is very slow and ineﬃcient, to the point that many have raised concerns evenabout the climate impact of the Bitcoin network . There are other consensus schemes,such as Proof of Stake, combined with veriﬁable random functions (20), that solve thesame problem, without wasting a lot of electricity.Nonetheless, we can see how this solves the synchronization issue in blockchains. A blockserializes a bundle of transactions. The consensus protocol, namely its block authoringprotocol, regulates the block production, so that not everyone can propose candidate blocksat the same time. So far, we have only talked about permissionless blockchains, which are the focus of thiswork. Nonetheless, now is a good time to mention that a permissionless blockchain is onlyone out of three categories of blockchains:•

Permissionless blockchains: A type of blockchain in which no single entity has anypower over the network. Such networks are called permissionless because one needsno permission from any authority to perform an action. For example, as long as onepays the corresponding fee, one can always submit a transaction to a permissionlessnetwork, i.e. one cannot be banned by some authority. Or, one can always decide tobe a candidate for block authoring, if one wishes to do so. One might not succeed indoing so (e.g. in a Proof of Work, weak hardware always fails to ﬁnd a proper hashin time and essentially waste power with no gain), but one has the freedom to doall of these actions. Such blockchains truly adhere to the decentralized goals of theblockchain ecosystem. Some estimates show the annual carbon emission of the Bitcoin network is more than that ofSwitzerland(19). .1 Blockchains And Distributed Ledger Technology • Consortium blockchains: In this type of blockchains, users can still interact withthe chain freely, but most consensus critical actions are not permissionless. Forexample, a chain might decide to delegate the task of block authoring to a ﬁxednumber of trusted nodes. In such a scenario, none of the mentioned Proof of Workschemes is needed, and the authoring rules can be simpliﬁed to a round-robin blockauthoring. Albeit, such chains are questionable because they don’t really solve themain problem of making systems trustless. Such chains are called Proof of Authority,meaning that a node can author a block by the virtue of being a member of a ﬁxedset of authorities (21). And from the perspective of the end-user, one must still trust in the honesty and goodwill of these authorities.•

Private blockchains: these blockchains use the same technology as Permissionlessblockchains to establish trust between organizations, but they are not open to thepublic. A common example would be a chain that maintains government recordsbetween diﬀerent ministries.It is important to note that many aspects of the consensus protocol, including its com-plexity, change based on the above taxonomy. The permissionless chains will typicallyhave the most diﬃcult type of consensus, because ensuring veracity is quite hard in adecentralized environment where anyone might misbehave. Albeit, the rationale of thedecentralization advocates is that by making the system transparent and open to the pub-lic, we actually gain more security comparing to hiding it behind servers and ﬁrewalls ,because we can also attract more honest participants, who can check the system and makesure it behaves correctly. Table 2.1:

Types of blockchain based on consensus.

Blockchain Type

Public Consortium Private

Permissionless?

Yes No No

Read?

Anyone Depends Invite Only

Write?

Anyone Trusted Authorities Invite Only

Owner

Nobody Multiple Entities Single Entity

Transaction Speed

Slow Fast FastDue to all this complexity, consensus remains a cutting-edge ﬁeld of research in theblockchain ecosystem. In table 2.1, we show how the consensus is also a major factor One reasonably might see this concept resonating with the Open Source Software movement, whereopen-source software is claimed to be more secure than a closed source one. .1 Blockchains And Distributed Ledger Technology in the throughput of the blockchain, which is our metric of interest in this work. Thiscorrelation is later explained in 3.1. Coming back to the permissionless block authoring schemes mentioned in 2.1.4.7, it turnsout that a perfect consensus cannot exist in a permissionless network (22). Aside fromproblems such as a node being malicious and network partitions, there could be other non-malicious scenarios in which everything in the network is seemingly ﬁne, yet nodes end upwith diﬀerent blockchain views. A simple scenario that can lead to this is if, by chance,two nodes manage to solve the Proof of Work puzzle almost at the same time. They bothcreate a completely valid block candidate and propagate it to the network. Some nodesmight see one of the candidates ﬁrst, while the others might see another one ﬁrst. Such ascenario is called a

Fork : a state in which nodes have been partitioned into smaller groups,each having their own blockchain views. Most consensus protocols solve this by adopting a longest chain rule. Eventually, once all block candidates have been propagated, each nodechooses the longest chain that they can build, and that is the accepted one. This chain iscalled the canonical chain , and the last block in it is called the best-block or the head ofthe blockchain. Based on the canonical chain, the state can also be re-created and stored.Aside from malicious forks (that we do not cover here), and forks due to decentralizationsuch as the example above, there could be federated forks as well. For example, if a groupof nodes in a blockchain network decide to make a particular change in the history, andthey all agree on it, they can simply fork from the original chain and make their newchain. This new chain has some common preﬁx with the original one, but it diverges atsome point. A very famous example of this is the Ethereum Classic fork from Ethereumnetwork (23). After a hack due to a software bug, a lot of funds got frozen in the mainEthereum network. A group of network participants decided to revert the hack. This wasnot widely accepted in the Ethereum ecosystem and thus, a fork happened, giving birthto the Ethereum Classic network.

Recall from 2.1.4.4 that blockchains store some sort of state next to their history of blocksas well. We further explain the reasons for this design in this section. To recap, the state isa key-value database that represents the state of the world, i.e., all the data that is storedbeside the history of blocks. States are mapped with block numbers. With each block, the After all, it deﬁes all the immutability properties of a blockchain. .1 Blockchains And Distributed Ledger Technology Figure 2.3: Forks - The cannon chain and the forked chain both have a common preﬁx, yethave diﬀerent best-blocks. transactions within it could potentially alter the state. Hence, we can interpret this term:"state at block n ": it means the state, given all the blocks from genesis up to n beingexecuted.First, it is important to acknowledge that maintaining the state seems optional, and itis indeed the case. In principle, a node can decide not to maintain the state, and whenevera state value needs to be looked up at a block n , all the blocks from genesis up to n needto be re-executed. This is indeed ineﬃcient. On the contrary, maintaining a copy of thestate for all the blocks also soon becomes a storage bottleneck. In practice, many chainsadopt a middle-ground, in which normal nodes store only the state associated with the last k blocks.Without getting into too all the details, we continue with a problem statement: in such adatabase, it is very expensive for two nodes to compare their state views with one another.In essence, they would have to compare each and every key-value pair individually. Tobe able to use this comparison more eﬃciently, blockchains use a data structure called aMerkle tree (24). A Merkle tree is a tree in which all leaf nodes contain some data, andall non-leaf nodes contain the hash of their child nodes.There are numerous ways to abstract a key-value database with a Merkle tree. Forexample, one could hash the keys in the database to get a ﬁxed size, base 16, string. Then,each value will be stored at a radix-16 tree leaf, which can be traversed by this base 16hash string.In such a data structure, we can clearly see that the root of the Merkle tree has a veryimportant property: it is the ﬁngerprint of the entire data . This piece of data is veryimportant in blockchains, and is usually referred to as state root . In essence, if two nodescompute their individual state roots and compare them, this comparison would conﬁdently Sometimes referred to as "Trie" as well. Named after Ralph Merkle, who also contributed to the foundation of cryptography in (8). .1 Blockchains And Distributed Ledger Technology Figure 2.4: Merkle Tree - A binary Merkel Tree. The root hash contains a digest of allthe 4 data nodes. show if they have the same state or not. This is very similar to how the existence of theparent hash in each block ensures that all nodes have the same chain of blocks: changingonly a bit in a previous block, or a state value in this case, will cause the hashes to nolonger match. Similarly, changing only one value in the entire key-value database will causethe state roots to mismatch.Recalling the deﬁnition of author and validator from 2.1.1, we can now elaborate moreon what a validator exactly does. A validator node, upon receiving a block, should checkthat the block’s author is valid (for example check the proof of work puzzle), and then itre-executes all the transactions in the block, to compute a new state root. Finally, thisstate root is compared with the state root that the block author proposed in the block,and if they match, the block is valid.We can now summarize all the common data that are usually present in a block’s header:• Block number: A numeric representation of the block count, also known as blockchain height .• Parent hash: This is the signature of the blockchain preﬁx.• State root: It is common for a block to also name the state root that should becomputed, if the transactions inside the block body are executed on top of the afore-mentioned parent hash block’s state.20 .1 Blockchains And Distributed Ledger Technology

Our deﬁnition of the basic concepts of blockchain almost ends here. In the next sections,we brieﬂy explain concepts that are more relevant to the implementation of a blockchain,than to the protocol itself.

We coin the term

Runtime as the piece of logic in the blockchain that is responsible for updating the state . To be more speciﬁc, the runtime of any blockchain can be simpliﬁedas a function that takes a transaction as input, has access to read the state, and (asoutput) generates a set of new key-value pairs that need to be updated in the state (or theruntime itself can update the state directly, depending on the design of the system). Thisabstraction will be further used in chapter 3.

Deﬁnition 2.1.3.

Generic Runtime. runtime = f n ( transaction ) → state By deﬁning the transaction queue, we distinguish between transactions that are included in any block, and those that are not. As mentioned, a blockchain node might constantlyreceive transactions, either directly from end-users, or from other nodes, as their role insome sort of gossip protocol. These transactions are all pending , and their existence does not imply anything about the state of the blockchain. Only when, by some means ofconsensus, everyone agrees to append a block to the chain, the transactions within thatblock are included in the chain. Thus, a transaction can be categorized as either included and pending .The transaction queue is the place where all the pending transactions live in. Its im-plementation details are outside the scope of this work, and depend on the needs of theparticular chain. Nonetheless, we highlight the fact that the transaction queue is a com-ponent that sits next to the block authoring process. Once a node wants to author a block(or it just tries to do so, in cases such as Bitcoin, where some Proof of Work puzzle needsto be solved ﬁrst), it will use the transactions that it has received and have been stored inthe transaction queue as a source of block building.

Remark.

The transaction queue is also sometimes called transaction pool . We prefer theterm queue in this work because later on, we depend on the fact that a queue implies orderwhile a pool implies no order. 21 .1 Blockchains And Distributed Ledger Technology2.1.4.13 Transaction Validation

Usually, a transaction needs to pass some bare minimum checks to even be included inthe queue, not to mention being included in the canonical chain. Usually, checks that aremandatory, persistent, and rather cheap to compute can happen right when a transactionis being inserted in the queue. For example, the signature of a transaction must always bevalid, and its validity status persists over time. In other words, if the signature is correct,it will stay correct over time. On the contrary, state-dependent checks usually need tobe performed when a transaction is being included , not when it is being inserted into thequeue. The reason for this is subtle, yet very important. If a transaction is asserting totransfer some tokens from Alice to Bob, the state-dependent check is to make sure Alicehas enough tokens. In principle, it is wrong to check Alice’s account balance at the timeof inserting the transaction into the queue, since we do not know when this transaction isgoing to be included . What matters is that at the block in which this transaction is beingincluded , Alice must have enough tokens.That being said, an implementation could optimize the read from the state in someparticular way to allow more checks to happen in the transaction queue layer (one of whichis explained in the next section, 2.1.4.14). Although, it should be noted that transactionsin the queue are not yet accountable , since they are not executed. In other words, a userdoes not pay any fees to have their transaction live in the queue. But, they do pay to havetheir transaction included in the chain. Therefore, if the queue spends too much time onvalidation, this can easily turn into a

Denial of Service attack (DoS).

We mentioned that signatures allow transactions to be signed only by an entity that ownsa private key associated with the account. This allows anyone to verify a transaction thatclaims to spend some funds from an account. Nonetheless, given that the block history ispublic, this pattern is vulnerable to replay attacks . A replay attack is an attack in which amalicious user will submit some (potentially signed) data twice. In the case of a blockchain,Eve can simply look up a transaction that transfers some money out of Alice’s account,and re-submit it back to the chain numerous times. This is an entirely valid operationby itself, since the transaction that Eve is submitting again indeed does contain a validsignature from Alice’s private key.To solve this, blockchains that rely on state usually introduce the concept of nonce: acounter that is associated with each account in state, initially set to zero for every potential22 .1 Blockchains And Distributed Ledger Technology account. A transaction is only valid if, in its signed payload, it provides the nonce valueassociated with the origin, incremented by one. Once the transaction is included, the nonceof the account is incremented by one. This eﬀectively alleviates the vulnerability of replayattacks. Any transaction that Alice signs, once submitted to the chain and upon being included , is no longer valid for re-submission.

We close our introduction to blockchains by providing a ﬁnal perspective on their nature.First, we enumerate some of the lenses through which we have seen blockchains:• A distributed peer to peer network of nodes.• A distributed database of blocks and states.• A decentralized trustless transaction processing unit.We can put all of this together into one frame by representing blockchains as statemachines . This concept resonates well with our notion of state as well. A blockchain is a decentralized state machine. It is a state machine because the state-root hash at the endof each block is one potential state, and blocks allow for transition between states. Dueto forks, one might have to revert to a previous state. It is decentralized because thereis no single entity that can enforce transition from one state to another one. In fact, anyparticipant can propose transitions by authoring a block candidate, but it will only everbe considered canon if it is agreed upon by everyone, through the consensus mechanism.Moreover, each participant stores a copy of the state. If a single node crashes, goes oﬄine,or decides to misbehave, the integrity of the system is maintained, as long as there areenough honest participants.

Before continuing with the chapter, we brieﬂy address the issue of technology context . So farin this chapter, we have used simple examples from the banking world, since it is similar toBitcoin, which is a well-known system, and it is easy to explain. Nonetheless, a reader whomay have previously had some background knowledge with some other blockchain project X might soon ﬁnd some details that we named here to be less than 100% compatiblewith project X . Moreover, we have even admitted throughout the text that some of ourexamples are not even exactly similar to Bitcoin (such as the state model, as opposed tothe UTXO model). 23 .2 Concurrency Such perceived incompatibilities/inaccuracies are predictable to happen, as blockchainsystems are a rapidly evolving ﬁeld of science and engineering at the moment. Diﬀerentprojects diverge from one another, even in radical concepts, and experiment with newpatterns. Nonetheless, we make the following assertions about our assumptions in thiswork:• Whenever we build up a simple example (mostly with Alice and Bob) in this work,we do not tie it to any particular blockchain project. Instead, these examples are tobe interpreted completely independently, and solely based on the relevant conceptsexplained.• In this entire work, we aim to see blockchains in the most generic form that they canbe seen. That is to say, we interpret blockchains exactly as we deﬁned in 2.1.5: adecentralized state machine that can be transitioned through means of any form ofopaque transaction. An example of this is the key-value state model that we used ismore generic than the UTXO model .To summarize, we have explained in this section only what we have deemed to be the mostfundamental concepts of blockchains, and we noted whenever a detail could potentially beimplementation-speciﬁc. This approach will persist throughout the rest of this work. In this section, we introduce relevant concepts from the ﬁeld of concurrency. As mentioned,the crux of our idea is to deploy concurrency in a blockchain system to gain throughput.Concurrency is the ability of a software artifact to execute units of its logic (i.e. a task ) out-of-order , without causing the outcome to be invalid (25). If done correctly, this out-of-order execution can be mapped to diﬀerent hardware units and improve performance.The opposite viewpoint is a purely in-order execution, namely sequential.We link this directly to our example of interest: a node in a blockchain system has aprocess that is responsible for executing blocks. By default, this process is purely sequen-tial: all of the transactions in the block are executed in order, namely the same order asthey appear in the block. Deploying concurrency should allow this process to divide thetransactions within the block into a number of smaller groups. Then, these groups can beexecuted concurrently, without causing an invalid outcome. We have not explained UTXO in this work yet, but suﬃce to say that you can easily implementUTXO with key-value but not the other way around. .2 Concurrency The outcome of interest, of course, is only the aforementioned state database after allof the transactions of the block are applied. Speciﬁcally, as nodes in the network receiveblocks and apply them to a common state, the only acceptable outcome is for all of thenodes to reach the same state after applying the block . This comparison is done by meansof the state root. Deﬁnition 2.2.1.

Valid Block : A block is valid only if its execution is deterministic amongall of the nodes in the network, and leads to the same state root S (cid:48) , if applied on top of aknown previous state S . Remark.

The deterministic replayability of the transactions is in fact the property thatensures that the state is, in principle, optional to keep around, because it is deterministicallyreproducible from the chain of blocks.Thus, the need for determinism is absolute in a blockchain’s runtime environment.This is in fact why blockchains are designed to work sequentially by default: because it iseasy to ensure determinism when all the transactions are applied sequentially.Given the execution model of a block, we can reduce the problem to a single node’shardware. Assume a node is attempting to execute a block, and it has the entire stateloaded in memory. If a single thread executes the transactions within the block, theoutcome is deterministic by deﬁnition. On the other hand, if multiple threads try toexecute the transactions concurrently and access the state as they move forward, theresult is moot. This is because threads will have to compete over access to the blockchainstate and a typical race condition happens (26). The challenge is to allow these threadsto cooperate and achieve concurrency, while still maintaining determinism. Therefore, wehave translated our blockchain scenario to a typical shared-state concurrency problem . Insuch a setup, multiple threads are competing for access to some shared data (the state),and the runtime environment needs to resolve the race conditions between the threads.In the next sections, we present practical ways to use concurrency over a shared state ,while still generating valid results. Generally, mechanisms that provide more control overthe behavior of concurrent programs are referred to as concurrency control . Among those,we are essentially looking for those that will allow a valid block to be authored and re-executed, as deﬁned in 2.2.1, most notably deterministically. Note that this is only the case of block validation (block import ). There is also the task of blockauthoring, which is actually more complicated, but irrelevant to the discussion of this section – see 2.1.1. We note that the determinism requirement is somewhat special to our use case and some systemsmight not require it. .2 Concurrency Locks are a common and intuitive mechanism for concurrency control. A lock is anabstract marker applied to a shared object (or, generally, to any memory region) to limitthe number of threads that can access it at the same time. The idea of using locks indatabase systems that want to achieve higher throughput by using multiple threads goesback many decades ago (27, 28), among other ﬁelds.The simplest form of a lock will not distinguish between reads and writes and the onlyoperations on it are acquire and release . To access the data protected by the lock, ﬁrst,the lock itself needs to be acquired. Once a thread acquires a lock, no other threads can:they have to wait for it. Once the holding thread is done with the lock (more accurately,done with the data protected by the lock), they release the lock. Upon being released,the lock is acquired by one of the waiting threads, and the process restarts. This processcan easily ensure that some data is never accessed by multiple threads at the same time.The processor usually ensures that these primitive operations ( acquire and release ) aredone atomically between threads. Such locks that do not distinguish between reading andwriting are called a Mutex , short for mutual exclusion (29).A more elaborate variant of Mutex is a read-write lock (RW-lock). Such locks leveragethe fact that multiple reads from the same data are almost always harmless, and shouldbe allowed - hence the read and write distinction. In an RW-lock, at any given time, therecan be only one writer, but multiple concurrent readers are allowed.

Remark.

We use the Rust programming language for the implementation of this work. Rustprovides some of the ﬁnest compile-time memory safety guarantees among all programminglanguages (30). To achieve this, Rust references (i.e., addresses to memory locations) havethe exact same aliasing rule: multiple immutable references to a data can co-exit in a scope,but only one mutable reference is allowed (31).Locks are easy to understand, but notoriously hard to use correctly . A programmerneeds to think about every single critical memory access, and acquire and release locksfrom diﬀerent threads to prevent wrong outcomes. Even worse, immature use of locksoften leads the programs to deadlock, i.e., reach a state in which all threads are inﬁnitelywaiting for a lock acquired (and never released) by another thread. These issues arecommon programming errors, but they remain very hard to detect and resolve (32).Moreover, locks are a pessimistic mechanism for concurrency control: they assume thatif two threads want to acquire the same (write) lock, their logic will cause a conﬂict tohappen. Based on the granularity of the lock and the internal logic of each thread, a Sadly, as we will see, the most intuitive way is not always the easiest to use. .2 Concurrency conﬂict might not be the case all the time. This is exactly what the next section willaddress. Transactional memory is the opposite of locking when it comes to waiting. In locking, thethreads often need to wait for one another. If a thread is writing, then all the readers andwriters need to wait. This is based on the assumption that mutual acquirement of lockswill always lead to conﬂicts, so it needs to be prevented in any case. Transactional memorytakes the opposite approach, and assumes that mutual data accesses will not conﬂict bydefault. In other words, threads do not need to wait for one another, thus transactionalmemory is coined "lock-free" or sometimes "wait-free" (33).In the context of transactional memory, a thread’s execution is divided into smallerpieces of logic called transactions. A transaction attempts to apply one or more updatesto some data, without waiting, and then commits the changes. Before a commit, thechanges by any transaction are not visible to any other transaction. Once a commit isabout to happen, the runtime must check if these changes are conﬂicting or not, based onthe previous commits. If committed successfully, the changes are then visible to all othertransactions. Else, none of the changes become visible, the transaction aborts , and thechanges are reverted. The great advantage of this model is that if two transactions accessthe same memory region, but in a non-conﬂicting way , then there is a lot less waiting. Inessence, there is no waiting in the execution of transactions, at the cost of some runtimeoverhead when they want to commit.Transactional memory can exist either via specialized hardware or simulated in the soft-ware, referred to as software transactional memory, or STM (34). If implemented in soft-ware, transactional memory does incur a runtime overhead. Nonetheless, its programminginterface is much easier and less error-prone than that of locks, because the programmerdoes not need to manually acquire and release locks. (35).Transactional memory is likely to lessen the waiting time and conﬂicts. Nonetheless, itstill allows threads to operate over the same data structure. This implies complicationsabout commits, aborts, and coherence . A radically diﬀerent approach is to try and preventthese complications from the get-go, by disallowing shared data to exist; this mechanism Textbook example: access two diﬀerent keys of a concurrent hash map. The question of which changes from a local thread’s transaction become visible to other threads andwhen, and under which conditions. We have barely touched this issue, which in itself deserves a thesis tobe fully understood. .2 Concurrency is described in the next section. But ﬁrst, we brieﬂy note a common trait of locking andtransactional memory. Note About Determinism

It is very important to note that both locking and transactional memory are non-deterministic.This means that executing the same workload multiple times may or may not lead to thesame output. It is easy to demonstrate why: ﬁrst, consider locking. Imagine two threadswill soon attempt to compete for a lock, and each will try and write a diﬀerent value to aprotected memory address. Based on the fairness rules of the underlying operating system,either of them could be the ﬁrst one. While the output of the program in both cases is correct , it is not deterministic .The same can be said about transactional memory: if two transactions have both alteredthe same data in a conﬂicting way, one of them is doomed to fail upon trying to commit,and it is just a matter of which do it ﬁrst. Again, both outputs are correct, yet the programis not deterministic.

Remark.

A reader can, at this point, link our use of words "correct" and "determinism" to2.1.4.7 and block authoring speciﬁcally. From the perspective of a block author, it does notmatter what the outcome of a block is (i.e. which transactions are within it, which succeed,and which fail). All such blocks are probably correct . The ﬁrst and foremost importanceis for all validators to deterministically come to the same state root, once having receivedthe block later. This is in stark contrast with the non-determinism nature of locking.

There is a great quote from the documentation of the Go programming language, whicheloquently explains the point of this section: "Don’t communicate by sharing memory;share memory by communicating." (36). This introduces a radical new approach to con-currency, in which threads are either stateless or pure functions (37), where their state isprivate. All synchronization is then achieved by the means of message passing. Like this,threads do not need to share any common state or data. If threads need to manipulatethe same data, they can send references to the data to one another. In many cases, thispattern is advantageous compared to locking, both in terms of concurrency degree and theprogramming ease.Nonetheless, we know that our use-case exactly needs some executor threads to have ashared state while executing the blockchain transactions. Therefore, we do not directlyapply the message-passing paradigm, but we use it as inspiration, and take the possibilityof message passing into account. We revisit this possibility in chapter 3.28 .3 Recap: Splicing Concurrency and Blockchain

As mentioned in 2.2.2, transactional memory attempts to reduce the waiting time by as-suming that conﬂicts are rare. This could bring about two downsides: reverts, in case aconﬂict happens, and the general runtime overhead that the system needs to tolerate (forall the extra machinery needed for transactional memory ). An interesting approach tocounter these limitations is static analysis. Static refers to an action done at compile-time,contrary to runtime. The goal of static analysis is to somehow improve the concurrencydegree by leveraging only compile-time information. In the case of transactional memory,this could be achieved by using static analysis to predict and reduce aborts(38). Simi-larly, other studies have tried to use static analysis to improve the usage of locking byautomatically inserting the lock commands into the program’s source code at compile time(39). This can greatly ease the user experience of programmers using locks, and reduce thechance of human errors to emerge. Similar to message passing, we take inspiration fromthe concept of static analysis in our design later in chapter 3. Which is even more if it is being emulated in the software. A Novel Approaches TowardConcurrency in Blockchains

Don’t share memory to communicate, communicate to share memory. – Oﬃcial Go Programming Language Documentation.In this chapter, we build up all the details and arguments needed to introduce ourapproach toward concurrency within blockchains. We start with an interlude, enumeratingdiﬀerent ways to improve blockchains’ throughput from an end-to-end perspective, andhighlight concurrency as our method of choice.

Blockchains can be seen, in a very broad way, as decentralized state machines that tran-sition by means of transactions . The throughput of a blockchain network, measured intransactions per second, is a function of numerous components, and can be analyzed fromdiﬀerent points of view. While in this work we focus mainly on one aspect, it is helpful toenumerate diﬀerent viewpoints and see how each of them aﬀects the overall throughput . We discussed how the consensus protocol provides the means of ensuring that all nodeshave a persistent view of the state (see Section 2.1.4.7), and it can heavily contribute tothe throughput of the system. Take, for example, two common consensus protocols: Proofof

Work and Proof of

Stake . They use the computation power ( work ) and an amountof bonded tokens ( stake ), respectively, as their guarantees that an entity has authority to This categorization is by no means exhaustive. We are naming only a handful. .1 Prelude: Speeding up a Blockchain - An Out-of-The-Box Overview perform some operation, such as authoring a block. It is important to note that each ofthese consensus protocols has inherently diﬀerent throughput characteristics (40). Proofof work, as the name suggests, requires the author to prove their legitimacy by providingproof that they have solved a particular hashing puzzle. This is slow by nature, and wastesa lot of computation power on each node that wants to produce blocks, which in turnhas a negative impact on the frequency of blocks, which directly impacts the transactionthroughput. Improving the throughput of Proof of Work requires the network to agreeon an easier puzzle, that can, in turn, make the system less secure (2) (further details ofwhich are outside the scope of this work).On the contrary, Proof of Stake does not need puzzle solving, which is beneﬁcial interms of computation resources. Moreover, since the chance of any node being the authoris determined by their stake , more frequent blocks do not impact the security of the chainas much as Proof of Work does. Recently, we are seeing blockchains turning to veriﬁablerandom functions (20) for block authoring, and deploying a traditional byzantine faulttolerance voting scheme on top of it to ensure ﬁnality (41, 42). This further decouplesblock production and ﬁnality , allowing production to proceed faster and with even less dragfrom the rest of the consensus system.All in all, one general approach towards increasing the throughput of a blockchain isto re-think the consensus and block authoring mechanisms that dictate when blocks areadded to the chain - speciﬁcally, at which frequency. It is crucially important to note thatany approach in this domain falls somewhere in the spectrum of centralized-decentralized,where most often approaches that are more centralized are more capable of delivering betterthroughput, yet they may not have some of the security and immutability guarantees of ablockchain. A prime example of this was mentioned in table 2.1 of chapter 2.1.4.8, whereprivate blockchains are named as being always the fastest. An interesting consensus-related optimization that is gaining a lot of relevance in recentyears is a technique called sharding , borrowed from the databases ﬁeld. Shards are slices ofdata (in the database jargon) that are maintained on diﬀerent nodes. In a blockchain sys-tem, shards refer to sub-chains that are maintained by sub-networks of the whole system.In essence, instead of keeping all the nodes in the entire system synchronized at all times,sharded blockchains consist of multiple smaller networks, each maintaining their own can-non chain. Albeit, most of the time these sub-chains all have the same preﬁx and only Using some hypothetical election algorithm which is irrelevant to this work. .1 Prelude: Speeding up a Blockchain - An Out-of-The-Box Overview diﬀer in the most recent blocks. At ﬁxed intervals, sub-networks come to an agreementand synchronize their shards with one another. In some sense, sharding allows smallersub-networks to progress faster(43, 44, 45). Another approach to improve the throughput is changing the nature of the chain itself.A classic blockchain is theoretically limited due to its shape: a chain has only one head,thus only one new block can be added at each point in time. This property brings extrasecurity, and makes the chain state easier to reason about (i.e. there is only one canonicalchain). A radical approach is to question this property and allow diﬀerent blocks (orindividual transactions) to be created at the same time. Consequently, this approachturns a blockchain from a literal chain of blocks into a graph of transactions (46). Mostoften, such technologies are referred to as Directed Acyclic Graphs (DAG) solutions. Aprominent example of this is the IOTA project(47).Allowing the chain to grow from diﬀerent heads (i.e. seeing it as a graph) allows trueparallelism at the block layer , eﬀectively increasing the throughput. Nonetheless, the se-curity of such approaches is still an active area of research, and achieving decentralizationwith such loose authoring constraints has proven to be challenging (48).Altering chain topology brings even more radical changes to the original idea of blockchain.While being very promising for some ﬁelds such as massively large user applications (i.e."Internet of Things", micro-payments), we do not consider DAGs in this work. We chooseto adhere to the deﬁnitions provided in chapter 2 as our baseline of what a blockchain is.

Finally, we can focus on the transaction-processing view of the blockchain, and try anddeploy concurrency on top of it, leaving the other aspects, such as consensus, unchangedand, more importantly, generic . This is very important, as it allows our approach (to beexplained further in this chapter) to be deployed on many chains, because it is independentof any chain-speciﬁc detail. Any chain will eventually come to a point where it mustexecute some transactions, be it in the form of a chain, or a DAG, with any consensus.Thus, concurrency is a viable solution as long as the notions of transactions and blocksexist.Our work speciﬁcally focuses on this aspect of blockchain systems, and proposes a novelapproach to achieve concurrency within each block’s execution, both in the authoring phaseand in the validation phase, thereby increasing the throughput.32 .2 Concurrency within Block Production and Validation

Each of these methods brings about an improvement in the throughput (in terms of trans-actions processed per second) of the system in their own unique way. The question of which one will be the dominant one is too speciﬁc to a particular chain’s assumptions, andoutside the scope of what we are trying to tackle in this work. As an example, for someconsortium chains where consensus is less of an issue, altering the consensus is likely toresult in a signiﬁcant throughput gain, without using any of the other methods such assharding and concurrency.Instead, we acknowledge that each of these approaches has its own merit and could beuseful in a certain scenario. Therefore, we put our focus on concurrency for the rest of thiswork. We see concurrency as a universal improvement that can be applied to any chain.Moreover, It is worth noting that having optimal hardware utilization (to reduce costs) isan important factor in the blockchain industry, as many chains are run by people who aremaking a proﬁt out of running validators and miners.Finally, by seeing these broad options, we can clarify our usage of the word "throughput".One might notice that the ﬁrst two options mentioned in this section (consensus, chaintopology) can increase the throughput at the block level: more blocks can be added,thus more transaction throughput. This is in contrast to what concurrency can do. Theconcurrency explained in 3.1.3 is the matter of what happens within a block. Henceforth,by throughput, we mean throughput of transactions that are being executed within a( single ) block. Similarly, by concurrent, we mean concurrent within the transactions of a( single ) block, not concurrency between the blocks themselves. In this section, we explain in detail how a concurrent blockchain should function. Mostnotably, we deﬁne how a concurrent author and a concurrent validator diﬀer from theirsequential counterparts by deﬁning their requirements. Note that these requirements aremandatory and any approach toward concurrency in blockchains must respect them, aslong as it has a common deﬁnition of a blockchain as we named in chapter 2. As a readermight expect based on previous explanations, all of them boil down to one radical property: determinism .To recap, the block author is the elected entity that proposes a new block consistingof transactions. The block author must have already executed these transactions in someprotocol-speciﬁc order (e.g., sequentially), and noted the correct state root of the block33 .2 Concurrency within Block Production and Validation in its header. This block is then propagated over the network. All other nodes validatethis block and, if they all come to the same state root, they append it to their local chain.An author that successfully creates a block gets rewarded for their work by the system.

We begin in a chronologically sensible way, with block authoring. Before anything inter-esting can happen in a blockchain, someone has to author and propose a new block. Else,no state transition happens.A concurrent author has access to a pool of transactions that have been received overthe network, most often via the gossip protocol. From a consensus point of view, it isabsolutely irrelevant to ensure all nodes have the same transactions in their local pool.In other words, from a consensus point of view, there is no consensus in the transactionpool layer. All that matters is that any node, once chosen to be the author, has a poolfrom which it can choose transactions. Then, the author has a limited amount of time toprepare the block and propagate it.A number of ambiguities arise here. We dismiss them all in the following enumerationto be able to only focus on the concurrency aspect.• Typically, the author needs some way to prefer a subset of the transactions pool, asmost often all of it cannot be ﬁt into the block. For this work, we leave this detailgeneric and assume that each author has ﬁrst ﬁltered out its pool into a new queue of transactions (noting that the former is unordered and the latter is ordered ) thatshe prefers to include in the block. In reality, a common strategy here is to prioritizethe transactions that will pay oﬀ the most fee, as this will beneﬁt the block author.• The fact that the authoring time is limited is the whole reason why the author isincentivized to use concurrency: the more transaction that can be ﬁtted into theblock in a limited amount of time, the more the sum of transaction fees, thus morereward for the author.• A block must have some chain-speciﬁc resource (e.g. computation, state read/write,byte length of transactions) limit. For simplicity, we assume that each block can ﬁt aﬁxed maximum number of transactions, but this limit is so high that the bottleneckis not the transaction limit itself, but rather the amount of time that that author hasto prepare the block, bolstering the importance of high throughput. In reality, some Also referred to as queue in 2.1.4.12, and sometimes called mempool in the industry. .2 Concurrency within Block Production and Validation blockchains have adopted a similar approach (i.e., a cap on number of transactions),or have limited the size of the block. Complex chains that support arbitrary codeexecution even go further and limit the computation cost of the transactions - see,for example, Ethereum’s gas metering(49).Having all these parameters ﬁxed, we can focus on the block building part, namelyexecuting each transaction and placing it in the block.A sequential author would simply execute all the transactions one by one (in someorder of preference) up until the time limit, and calculate the new state root. Thesetransactions are then structured as a block. Concatenated with a header that notes thestate root, the block is ready to be propagated. The created block is an ordered containerfor transactions, therefore it can be trivially re-executed deterministically by validators, aslong as everyone does it sequentially.A concurrent author’s goal is to execute these transactions in a concurrent way, hopingto ﬁt more of them in the same limited time , while still allowing the validators to cometo the same state root. This is challenging, because, most often, concurrency is non-deterministic. Therefore, the author is expected to piggy-back some auxiliary informationto its block, to allow validators to execute it deterministically. Maintaining determinismis the ﬁrst and foremost criterion of the concurrent author.Moreover, the second criterion is a net positive gain in throughput. The concurrentauthor prefers to be able to execute more transactions within the ﬁxed time frame thatshe has for authoring, for she will then be rewarded with more transaction fees. A validator’s role is simpler in both the sequential and concurrent fashion. Recall that ablock is an ordered container for transactions. Then, the sequential validator has a trivialrole: re-execute the transactions sequentially and compare state roots. The concurrentvalidator, however, is likely to need to do more.More speciﬁcally, the concurrent validator knows that a concurrent author must have ex-ecuted all or some of the transactions within the block concurrently. Therefore, conﬂictingtransactions must have preceded one another in some way. The goal of the concurrentvalidator is to reproduce the same precedence in an eﬃcient manner, and thus arrive atthe same state root.For example, assume the author uses a simple Mutex to perform concurrency. In thiscase, some transactions inevitably have to wait for other transactions that accessed the35 .3 Existing Approaches same Mutex earlier. This leads to an implicit precedence between conﬂicting transactions.The author needs to somehow transfer the precedence information to the validator, andthe validator must respect this order to arrive at the same state root.

In this section, we look at some of the already existing approaches toward concurrencyin blockchains. These approaches are essentially a practical resemblance of what wasexplained in the previous section. While doing so, we denote their deﬁciencies and buildupon them to introduce our approach.

Concurrency Control

Every tool that we named for concurrency in chapter 2 can essentially be used in blockchainsas well, yet each takes a speciﬁc toll on the system in order to be feasible. All of theseapproaches fall within the category of concurrency control . We begin with the simplestone: locking.A locking approach would divide the transactions into multiple threads . Each trans-action within the thread, when attempting to access any key in the state (Recall from2.1.4.4 that the state is a key-value database), has to acquire a lock for it. Once acquired,the transaction can access the key. This process is not deterministic. Therefore, the run-time needs to keep track of which locks were requested by which thread, and the order in which they were granted. This information builds, in essence, a dependency graph.This dependency graph needs to be sent to the validators as well. The validators parsethe dependency graph and, based on that information, spawn the required number ofthreads, and distribute the transactions among them. (50) is among the earliest works onconcurrency within a blockchain, and adopts such an approach.The details of generating the dependency graph with minimum size, encoding it in theblock in an eﬃcient way, and parsing it in the validator are being highly simpliﬁed here.These steps are critical, as they are the main overheads of this approach. The size ofthis graph needs to be small, as it needs to be added to the block and increases thenetwork overhead. Moreover, the overhead of this extra processing must be worthwhilefor the author, as otherwise, it would be in contrast to the whole objective of deployingconcurrency. There are some works that only focus on the "dependency graph generating A 1:1 relation between threads and transactions is also possible, given that the programming languagesupports green threads. .3 Existing Approaches and processing" aspect of the process. They assume some means exist through whichthe read and write set of each transaction can be computed (i.e., by monitoring the lockrequests that each thread sends at runtime). On top of this, they provide eﬃcient ways tobuild the dependency graph, and use it at the validator’s end (51).The next step of this progression is to utilize transactional memory. This line of researchfollows the same pattern. More recent works use software transactional memory (STM) toreduce the waiting time and conﬂict rates. Similar to the locking approach, the runtimeneeds to keep track of the dependencies and build a graph that encodes this information.Diﬀerent ﬂavours of STMs are used and compared in this line of research, such as Read-Write STMs, Single-Version Object-based STMs, and Multi-Version Object-based STMs(52, 53). Nonetheless, the underlying procedure stays the same: some means of concurrencycontrol to handle conﬂicts, track dependency, and use it to encode precedence, then re-create the same precedence in the validator. Concurrency Avoidance

Next, we name an out-of-the-box work that takes a rather diﬀerent path. Many studiesin the blockchain literature use datasets from the database industry as their reference.Such datasets might have unrealistically high rates of contention. (54) is an empiricalstudy that tries to determine the conﬂict rates within Ethereum transactions, a live , andarguably well-adopted network. While doing so, the study demonstrates a diﬀerent, wait-free approach. In the concurrent simulator of this work, all transactions are executedin parallel, with the assumption that they do not conﬂict. If a conﬂict happens and atransaction aborts, it is discarded, and re-executed again at a later phase, sequentially.This essentially clusters transactions into two groups: concurrent and sequential. All ofthe concurrent transactions are guaranteed not to conﬂict. The sequential transactionsdo not matter as they are executed sequentially. Aside from their ﬁndings of the conﬂictrates in diﬀerent periods of time in Ethereum, they also report speedups in some cases,not being too shy of the speedup amounts reported by (50), which uses locking.This is an inspiring ﬁnding, implying that perhaps complicated concurrency controlmight not be needed after all for many of the transactions in some period of time, basedon the contention of the transactions. In some sense, this work adopts a technique that wecoin as concurrency avoidance , instead of concurrency control . As a consequence,the system need not deal with conﬂicts in any way, because they are rejected and dealtwith separately in the sequential phase. 37 .4 Our Approach

Static Analysis

Finally, we note that there has also been some work on pure static analysis in the ﬁeldof blockchains, yet all of those that we have found require fundamental changes to theprogrammable language of the target chain. For example, (55) provides an extension tothe Ethereum’s smart contract language, Solidity, that allows it to be executed in a trulyconcurrent manner (by essentially limiting the features of the language). Similarly, RChainis an industrial example of a chain that has a programming model that is fundamentallyconcurrent(56), namely pi-calculus(57). Such approaches are also inspiring, yet we preferdevising an approach which does not need to alter such fundamental assumptions aboutthe programming model, in favor of easy adoption and outreach.The aforementioned 3 broad approaches (concurrency control, concurrency avoidance,static analysis) are essentially the taxonomy that we have found to answer the ﬁrst researchquestion, namely the diﬀerent ways to utilize concurrency within a blockchain. We closethis section with a remark about the scope of the referenced works in this chapter

Remark.

Most of the surveyed related work name their work as approaches toward con-currency for smart contracts . At this point, it would be helpful to clarify that. Thedetails of smart contract chains are well beyond the scope of this work. But, it is worthnoting that a a smart-contract chain is a ﬁxed chain that has a ﬁxed state transition logic,and a part of that logic is to store codes (smart contracts) and execute them upon beingdispatched. Moreover, since Ethereum is the prominent smart contract chain, all of theseworks present themselves with simulators that can hypothetically be implemented in theEthereum node. To the contrary, we do not limit ourselves to smart contracts or anyspeciﬁc chain in this work; instead, we build upon the idea that the future of blockchainswill not be a single chain (chain maximalism), but rather an abundance of domain-speciﬁcchains interoperating with one another. To achieve this, one needs to think in the contextof a framework for building blockchains, not a particular blockchain per se.

In this section, we describe our approach towards concurrency in a blockchain runtime,both in the authoring phase and in the validation phase. First, we begin by drawing aconclusion from the surveyed studies in 3.3.

We begin by pointing out that all of the mentioned works, regardless of their outcome,produce a sizable amount of overhead. We think these overheads are preventable and arguethat perhaps a diﬀerent approach can prevent them altogether.38 .4 Our Approach

Both locking and transactional memory results in a sizeable overhead while authoring.This is mostly hidden in some sort of runtime overhead, for example, the need to keeptrack of locking order, and consequently to parse it into a dependency graph. Moreover,this dependency graph inevitably increases the block size, because the dependency graphneeds to be propagated to all other nodes. Finally, the validator also needs to tolerate theoverhead of parsing the dependency graph and making informed, potentially complicateddecisions based upon it. These are all overheads compared to the basic sequential model.In essence, we express skepticism toward these complex runtime machinery to deal withconﬂicts, and record precedence.On the other hand, the pure concurrency avoidance model is likely to fail under anyworkload with some non-negligible degree of contention, because it basically falls back tothe sequential model where most transactions are aborted and moved to the sequentialmodel. Results from (54) show the same trend.On the contrary, we aim to minimize these overheads by ﬁnding a new balance betweenthe "concurrency control" and "concurrency avoidance" models. Moreover, we ﬁnd a newbalance between "runtime" and "static" as well. While some runtime apparatus is neededto orchestrate the execution and prevent chaos, tracking all dependencies is likely to be toomuch. Similarly, while a purely static approach toward concurrency is a radical change tothe programmable language of a chain, we claim that some static information could nudgethe runtime to the right path.

Our approach is based on three key pivotal ideas, explained in the next sections.

We have already seen that locking is a common primitive to achieve shared-state concur-rency. In our approach, we relax this primitive such that any access to a shared state by athread does not incur long waiting times , but instead, it might immediately fail. Todo so, we link each key in the state database with a taint value. If a key has never beenaccessed before, it is untainted. Once it is accessed by any thread (regardless of the typeof operation being read or write), it is tainted by the identiﬁer of that accessor thread.Henceforth, any access to this key by any other thread fails, returning the identiﬁer of theoriginal tainter (aka, owner ) of the key. As we shall see in the implementation (see 4),39 .4 Our Approach this approach is almost wait-free, meaning that threads almost always proceed immedi-ately with any state operation. Indeed, a thread can always freely access keys that it hasalready tainted before.

If a thread, in the process of executing a transaction, tries to access a tainted state key,it forwards this transaction to the owner of that key. By doing so, a thread basically delegates the task of executing a transaction concurrently to another thread because itcannot meet the state-access requirements of the transactions itself; based on the availableerror information, the recipient (i.e., the owner of the failing state key) is more likely tobe capable of doing that. This is the middle ground between concurrency avoidance and concurrency control , which we have coined concurrency delegation .Compared to concurrency control, threads in the concurrency delegation model do not tryto resolve contention in any sophisticated way. Instead, they simply delegate (aka, forward )the transactions to whomever they think might be able to execute them successfully. Thereis no waiting involved, and no record is kept about access precedence.On the other hand, compared with concurrency avoidance, concurrency delegation doesnot automatically assume that just because a transaction’s state operation has failed, thetransaction cannot be executed in any concurrent way. Instead, the transaction might beexecutable by another thread, which is known to be the owner of the problematic state key.Therefore, instead of immediately being discarded (and potentially executed sequentiallyat the end), each transaction is given a second chance to succeed in a concurrent fashion.Finally, if a transaction is already forwarded and still cannot be executed, (only) thenit is forwarded to a sequential queue to be dealt with later. We name such transactions orphans . This implies that each transaction is at most forwarded twice: once to anotherpotential recipient, and next, when needed, to be declared as an orphan.Why is "a failing transaction that has already been forwarded once" considered an or-phan? We present an example to make this clearer: assume thread T receives a forwardedtransaction from T . Based on the protocol of delegation, we know that this transactionmust have at least one key which is owned by T , because T failed to access it and thusforwarded it to T . Moreover, if the transaction still fails, it means that it has at leastone other key that is owned by some other thread T . This implies that T is not able toexecute this transaction, and any further recipient of this transaction, including T , willalso fail to execute it because the transaction has at least one key owned by T .40 .4 Our Approach Therefore, no transaction ever needs to be forwarded more than once. A transaction canbe forwarded at most once, and thereafter it is considered to be an orphan.

We predict that concurrency delegation works well when threads need to seldom forwardtransactions to one another. In the delegation model, the transactions need to be initially distributed between the threads in some educated and eﬀective way to minimize forward-ing. This is where we turn to pseudo-static heuristics in the form of a hint. We use theterm " pseudo-static " because we do not mean information that is necessarily available atcompile-time, but is rather known before a particular transaction is executed. In otherwords, the information is static with respect to the lifetime of a transaction and can beinferred without the need to execute the transaction, but rather by just inspecting it.Our approach is based upon these key ideas: taintable state, concurrency delegation,and pseudo-static hints. In the next section, we connect these ideas and depict how theywork together in the form of a uniﬁed algorithm.

We ﬁrst look into the baseline algorithm, which essentially combines the concurrency del-egation and the taintable state, without any static heuristics. We consider a single masterthread (henceforth called "master"), and worker threads (henceforth called "workers"),each having the ability to send messages over channels to one another. The master hasaccess to a potentially unbounded queue of ( ordered , ready to execute) transactions.Moreover, it has an (initially) empty queue of orphan transactions, to which workers canforward transactions. Additionally, each worker has a local queue, to which the master andother workers can send transactions. The master and all workers share a reference to thesame state database, S , which follows the logic of a taintable state as explained in 3.4.2.1. Remark.

As we will see, it is crucial to remember that both the transaction queue andthe block are ordered containers for transactions. In other words, they both act like anordered list / array / vector of transactions. The order of transactions in the queue willend up being used to order the transaction in the ﬁnal block as well.The master’s (simpliﬁed) execution logic during authoring is as follows: Needless to say, all of the arguments in this section are applicable to virtually any number of threads. Recall that each node has an unordered pool of transactions, from which they chose an ordered queue based on some arbitrary preference. .4 Our Approach Distribution phase : The master starts distributing transactions between workersby some arbitrary function F . In essence, F is a f n ( transaction ) → identif ier ,meaning that, for each transaction, it outputs one thread identiﬁer. Once the distri-bution is done, each transaction in the queue is tagged by the identiﬁer of one workerthread. The distribution phase ends with the master sending each transaction to itscorresponding worker’s local queue.2. Collection phase : The master then waits for reports from all workers, indicatingthat they are done with executing all of the transactions that they have receivedearlier. During this phase, threads might forward transactions to one another, andmight forward transactions back to the master, if deemed to be orphan, exactly asexplained in the concurrency delegation model. Both of these events are reported tothe master and the tag of each transaction might change in the initial queue. Oncetermination is detected by the master thread, a message is sent to all workers to shutthem down.3.

Orphan phase : Once all worker threads are done, the master executes any trans-actions that it has received in its orphan queue. At this point, the master thread issure that there are no other active threads in the system, thus accessing S withoutworrying about the taint is safe. The transactions are executed sequentially on topof S , and their tag is changed to a special identiﬁer for orphan transactions.Then, the master is ready to ﬁnalize the block. By this point, each transaction is eithertagged to be orphan, or with the identiﬁer of one of the worker threads. In essence, wehave clustered transactions into ordered groups. The validator who receives this blockrespects this clustering and executes transactions with the same tag in the same thread.In the ﬁrst groups, the transactions within a group might conﬂict with one another (e.g.attempt to write to the same key), but they are ordered and are known the be executedby the same thread, thus deterministic. The transactions in the last group, namely theorphan group, are executed sequentially and in isolation, thus deterministic.The worker side of the processing (with slight simpliﬁcations) is as follows:1. Depleting local queue : Having received a number of transactions from the master(after the "

Distribution phase "), each worker then tries to deplete its local queue.For each transaction

T x , the logic is as follows:If

T x is executed successfully, nothing is done or reported. This is because themaster is already assuming that

T x is executed by the current worker, therefore42 .4 Our Approach nothing needs to be reported. If

T x fails due to a taint error, it is forward to theowner of the state key that caused the failure. Note that at this point we know that

T x has not been forwarded before, because it is being retrieved from the initial localqueue.At the end of this phase, the worker sends an overall report to the master, notinghow many of the transactions it could execute successfully, and how many ended upbeing forwarded. This data is then used at the master to detect termination.2.

Termination phase : Once done with their local queue, the workers listen for twotypes of messages, namely termination or forwarded transactions from other threads.Termination is the message from the master to shut down the worker. Forwardedtransactions are those that another worker is delegating / forwarding to the currentworker because of a taint error. The forwarded transaction is then executed locallyand, if it is successful, the result is reported to the master. If the execution fails againdue to a taint error, then the transaction is forwarded to the master as an orphan.Note that in this case, reporting is vital, because a worker is ending up executing atransaction and the master is not aware of it, because the worker who ﬁnally executesthe transaction is not the same as assigned in the "Distribution phase" of the master.This reporting is also needed for the termination detection of this phase. A number of noteworthy remarks exist on the baseline algorithm:

Termination Detection . We intentionally did not describe how the master detects thetermination of the collection phase, because, if we wanted to describe it, we needed furtherinformation from the worker’s logic. Recall that the master knows how many transactionsit has initially distributed between all the workers. Moreover, from the reports sent by theworker at the end of "Depleting local queue" phase, it knows how many of them executed intheir designated thread , and how many of them ended up being forwarded . Also, recall thateach forwarded transaction, upon being executed successfully, is reported to the master.Similarly, each forwarded transaction that fails is also reported to the master (by beingforwarded to the orphan queue residing in the master thread). Thus, the master can safelyassert that

Termination is achieved once the sum of "all locally-executed transactions atworkers", "forwarded and successfully executed transactions" and "orphan transactions"is equal to the initial count of transactions in the queue . At this point, the terminationmessage is created and broadcasted to all workers.43 .4 Our ApproachDeﬁnition 3.4.1. Termination of the master thread’s collection phase.

Assume N initial transaction. Each worker thread, upon ﬁnishing the "deplete localqueue" phase, reports back { r , r , r , r } respectively, indicating the number of transac-tions that each executed locally. Given D as the number of reports of transactions beingforwarded, and O as the size of the orphan queue, termination of the collection phase isachieved iﬀ (cid:88) t =1 r t + D + O == N (3.1) Maintaining Order : Aside from termination detection, it is also vital for determinismthat the master takes action upon the report of a transaction being forwarded. This isbecause once the tag of a transaction changes, it is likely that its order mustalso change within the queue . For example, if a transaction, initially assigned to T is known to be forwarded and executed by T , it is important to re-order it in the initialtransaction queue such that it is placed after all the transactions initially assigned to T .This is because, in reality, T ﬁrst executed all of its designated transactions and then executes any forwarded transactions. Recall that the queue is an ordered container fortransactions and its order will eventually end up building the order of the transaction inthe block. Orphan Transactions . An orphan transaction is a transaction that has already beenforwarded and still fails to execute at its current host thread, due to a taint error. We nowrepresent this from a diﬀerent perspective. In our concurrency delegation scheme, threadsrace to access state keys and, upon successful access, they taint them. Any transactionhas a number of state keys that it needs to access in order to be processed.

An orphantransaction is one that has state keys being tainted by at least two diﬀerent threads . Forexample, assume a transaction needs to access keys K and K . Assume thread T isexecuting this transaction. If K is already tainted by a T and K by a T , then thistransaction will inevitably end up being in the orphan queue. The transaction is ﬁrstforwarded from T to T , where it can successfully access K , but still fails to access K and thus orphaned . Minimal Overhead . Our approach incurs minimal overhead to the block. In fact,the only additional data needed is one identiﬁer attached to each transaction, indicatingwhich thread must execute it (and the special case thereof, orphan transaction), namelythe tag . This can be as small as a single byte per transaction, which is negligible. Note thattransactions within the block still maintain partial order: the transactions of a particular An interesting optimization can be applied on top of this logic, which is explained further in 4.4. .4 Our Approach tag are sorted within their tag. Only the relative order of transactions from diﬀerent tagsis lost, which is not signiﬁcant, because they are guaranteed by the author to not conﬂict. Validation . We can now consider validation as well. As expected, due to the minimaloverhead and the simplicity of the baseline algorithm, the validation logic is fairly simple.Each block is received with all of its transactions having a tag. The ones tagged to beorphans are set aside for later execution. Then, one worker thread is spawned per tag, andtransactions are assigned to threads based on their tags, and in the same relative order.The workers can then execute concurrently, without the need for any concurrency control ,because they eﬀectively know that all contentious transactions already have the same tags,and thus are ordered sequentially within that tag/thread . In essence, the validation is fully parallelizable and does not need any synchronization. Once all threads are ﬁnished, theorphan transactions are executed sequentially and validation comes to an end.

Finally, we must address the most important requirement:

Determinism . We prove deter-minism by showing that the validation and authoring both have the exact same executionenvironment, and the transactions are executed in the exact same order in both phases.In more detail, both the author and the validator spawn the same number of threads, andall of the transactions executed by any thread during authoring is re-executed by a singlethread in the validation phase as well.First, consider all transactions that are executed in their initially designated worker,based on the aforementioned distributor function F . All of these transactions are assigneda tag, and have some partial order within the tag. By design, they are also placed in theirdesignated worker thread’s queue with the same order, and thus executed in the sameorder. Consequently, they are, yet again, placed in the ﬁnal block with the same orderwithin the tag. Therefore, the worker thread in the authoring phase and the worker threadin the validation phase will execute exactly the same transactions, in the exact same order.Next, consider the transactions that ended up being forwarded. These transactions areexecuted after the designated transactions of their ﬁnal host thread. The master thread,responsible for building the ﬁnal block, has to note this change and ensure that this partialorder is maintained in the ﬁnal block. The ﬁrst step for the master is to change the tagof this forwarded transaction to the tag of its newly designated thread. Then, to ensurethat the order within the tag is maintained, the master simply places this transaction atthe end of the queue. This ensures that this transaction will be placed in the ﬁnal block insuch a way that it is executed after all the transactions that have the same tag. Then, we45 .4 Our Approach can apply the same logic as the previous paragraph and assert that the order of executionof all transactions with a speciﬁc tag stays the same within one thread in both authoringand validation, thus deterministic.Last but not least, the orphan transactions also need the same property to be executeddeterministically: maintaining order. The master needs to make sure that all the trans-actions that are tagged as Orphan within the block have the same order as they wereexecuted.Given the deterministic execution of all types of transactions in our concurrent system,we conclude that our approach is fully deterministic, with minimal additional eﬀort .Indeed, the only subtlety that needs to be taken care of is the re-ordering of a transactionin the queue (and consequently the block) when it is successfully forwarded.

The previous section is a complete description of a concurrent system that is deterministicand can be deployed as-is, without any further requirements. Nonetheless, one can arguethat this system might not be eﬃcient in the throughput gain that it can deliver. This isbecause we said nothing about the distribution function of the master, namely F . Whilewe keep this distribution function generic in this entire work, we make one important claimabout it: using static information of the transaction can be very beneﬁcial, and is key tohigh performance.Static information is anything that can be known about a transaction, before it hitsthe runtime and gets executed. In other words, we are interested in any information thatcan be inferred from the transaction, without needing to execute it. A clear example ofsuch information is the origin of the transaction (i.e., the sender account). Because ofpublic-key cryptography usage, all transactions carry a signature and the origin accountas a part of their payload. Therefore, using the origin as static information is permitted,because it is known even before the transaction is executed. Similar reasoning applies to thearguments of the transaction as well. These are information that is encoded in the payloadof the transaction, and the runtime of the master thread (before starting authoring) caneﬀectively use them to optimize F . On the contrary, consider the return value of thetransaction. This is a piece of information that is not considered static with respect to thetransaction, because it can only be known by executing the transaction. Remark.

Given the above paragraph, we denote that our deﬁnition of static is diﬀerentfrom the term which is usually referred to as compile-time information. Therefore, we usedthe term pseudo-static in some places to delineate the diﬀerence.46 .4 Our Approach

We use this pseudo-static information to our beneﬁt, by proposing a static annotationto be added by the programmer to each transaction. Recall that the underlying state ofthe blockchain runtime is a key-value database, so each state access is linked to a key.Moreover, if F can know the list of keys accessed by each transaction, it could create aperfect distribution where no transaction is ever forwarded or orphaned, because no threadever reaches a taint error while accessing the state. Of course, things are not simple. Inpractice, it is impossible to know the execution path of a complex transaction withoutexecuting it , therefore knowing the exact state keys that it must access is impossible.The important point is to remember that the annotation can still be reasonably accuratewithout the need for executing the transaction. This annotation can state, in a best-eﬀort manner, which state keys are likely to be accessed by this transaction, based on,for example, one of the common execution paths of the transaction. Listing 3.1 shows anabstract example of these static annotations . Listing 3.1:

Example of Static Hints fn transaction(origin, arg1,arg2) {read(origin); if condition {// more probable branch! read(arg1)} else {// less probablebranch! read(arg2)}} We observe a transaction, embodied as a function named transaction , which has 3arguments: the origin and two auxiliary ones. Furthermore, we observe a macro thatis providing the state keys that might be accessed by this transaction. In this case, thesecond argument is not relevant, and seemingly only some state key of the origin and the arg1 might be accessed.This macro syntax is just one example - its speciﬁcation is irrelevant at this point.What matters is that the transaction can provide the runtime with some easy-to-compute,pseudo-static information about the state access of the transaction (therefore we called themacro in listing 3.1). The runtime can then eﬀectively use this information inthe transaction distribution phase to come up with a better distribution that leads to lesscontention, and, consequently, to less forwarding and fewer orphans. Remark.

One might question: why is the system limited by the halting problem and cannotpre-execute all transactions in the ﬁrst place? This is partially answered in the transactionvalidation section (see 2.1.4.13). The more detailed reason is that this static hinting mustﬁt into the transaction validation pipeline, because the transactions for which we are keen An interested reader can refer to the "Halting problem" for a formal representation of this issue (58). This style refers to the Rust programming language, but is applicable to any other compiled languageas well. .4 Our Approach to know the access keys are not yet included in any block, thus not accountable. Inshort, executing all transactions in-order pool is an easy Denial Of Service attack vector,especially in chains that allow arbitrary code execution by the user. In this section, we bring 3.4.3 and 3.4.4 into one picture, and explain the entire systemarchitecture. The all-in-one system architecture is presented in Figure 3.1. In the followingparagraph, we explain all its components, as indicated by the labels in the ﬁgure, ﬁnallyconnecting all the dots of our concurrent system as we move forward.

Figure 3.1: Overall System Architecture. - • shows the taintable state. We can see that each cell is not merely a key-value pair,but rather a (taint, key, value) triplet. Some keys are not tainted ( None ), whilesome are already tainted by a thread because they have been accessed by it. Accessto each of the keys by a thread succeeds if the key is untainted, or is tainted by theaccessor thread. Else, an error is returned, notifying the identiﬁer of the thread thatowns the key. 48 .4 Our Approach • emphasizes the need of all threads to access the state; ideally, this access is wait-free. Any access to a key in the state either immediately succeeds, in which case thekey is tainted. Else, it is immediately rejected because of a taint error.• shows the communication channels between the master and workers. These chan-nels allow all of the threads to communicate by means of sending messages to oneanother. Most often these messages are simply a transaction to be forwarded. It isworth noting that worker threads can also communicate in the same manner.• indicates the local queue of each thread. This is where the transactions that themaster thread designates to each thread live until they are executed. Needless to say,this queue is also ordered.• identiﬁes the orphan queue, where the master thread maintains an ordered queueof transactions that are essentially rejected by the workers and need to be executedin a second sequential phase. Once all the workers are done accessing the state,the master thread exclusively starts using the state (essentially ignoring all the taintvalues) and executes all the orphan transactions sequentially.• importantly depicts the transaction distributor component of the master thread.This component is invoked prior to the process of authoring to tag all the transactionsthat are ready to be executed in the transaction queue. This eﬀectively determineswhich thread gets to execute which transactions.• is the transaction queue, where an ordered subset of the transaction pool is veriﬁedand awaiting to be delegated to worker threads.With this overall blueprint of the system’s architecture, we conclude the design of Sonic-Chain and move forward to the next chapter, where we cover some of the implementationdetails of our prototype. 49 Implementation

Rust is like a futuristic laser gun with an almost AI-like foot detector that turnsthe safety on when it recognizes your foot. – u/goofbe on redditIn this chapter, we bring the system architecture mentioned at the end of the chapter 3closer to a running prototype. This chapter is by no means extensive since there are manymany implementation details that could be worth noting. Nonetheless, in favor of brevity,we minimize the details to only those that:• Are important with regards to the evaluation of the system.• Impose a particular practical challenge that we ﬁnd interesting.The entire source code of the prototype is available as free and open-source software(59).For the implementation, we use the Rust programming language (60). Being backedby Mozilla, Rust has a lot to oﬀer in the domain of system programming and low-levelapplications, such as a blockchain runtime. Rust is a unique language, among the few rareones in which one can claim that the learning curve is indeed steep, as mastering it is not just a matter of learning the new syntax. One of the reasons for this learning curve is Rust’scompile-time memory management system, which means all allocations and de-allocationsare inferred and checked at compile time . This ensures that the program is memory-safe,even in a multi-threaded context , whilst having no garbage collector at runtime. Rustdelivers, in some sense, the performance of C, combined with the abstractions and safetyfeatures of Java or C The Rust community often uses the term "Fearless Concurrency" for this combination – memorysafety and concurrency (61) .1 The Rust Standard Library Remark.

We assume some basic Rust knowledge in the rest of this chapter. An interestedreader without any Rust experience can also follow, yet they are likely to have to look upsome types and concepts in the documentation.Lastly, it is worth mentioning that our choice is not merely out of interest. In the lastfew years, Rust has been heavily invested in, by big blockchains companies and in theirresearch (62).

First, we explain some of the primitive types available in Rust’s standard library thatwe use in our implementation. Note that these types are merely our choice for this im-plementation. Although similar data types from other libraries (or crates , in the Rustjargon) are also acceptable, we prefer to limit ourselves to the standard library and remaindependency-free in our proof-of-concept implementation.

For the taintable state, we need a data type that is similar to a typical concurrent

HashMap (63), yet has slight diﬀerences. Rust does not provide a concurrent HashMapin the standard library, so the way to go is to implement our own custom data structure.To implement this, we use a

HashMap and a

RwLock , both of which are provided by thestandard library. The HashMap behaves just like a typical

HashMap in any programminglanguage. The

RwLock is a locking primitive with read-write distinction, where multiple read requests can be done at the same time, while a write request will block access ofall-but-one requests.Like most data types in languages that support generics, Rust’s default

HashMap isgeneric over both the key and value type that it uses. The ﬁnal

HashMap is using opaquebyte arrays as both the key and value type. For the keys, we do our own hashing andconcatenation to compute the location (i.e., the ﬁnal key) of any given state variable. Forexample, to compute the key of where the balance of an account is stored, we compute: "balances:balance_of".hash() + accountId.hash() and use the ﬁnal byte array as thekey. As for the values, to be able to store values of diﬀerent types in the same map, weencode all values to a binary format, thus, a byte array is used.

Listing 4.1:

Key and Value Types (with slight simpliﬁcation – see 4.8). /// The key type. type Key = Vec; /// The value type. .1 The Rust Standard Library type Value = Vec; /// Final State layout. type State = HashMap; Listing 4.1 shows how the ﬁnal state type is created. Namely, we alias the state to be ahashmap with both key and values being opaque byte arrays.

Remark.

A careful reader might notice at this point that we explained how we build the key byte array (hashing and concatenation), yet we have not yet discussed how the valuebyte array is built. This will be explained in 4.5.

In short, Rust’s threading model is : each thread will be mapped to an operating systemthread (which sometimes directly maps to a hardware thread). Naturally, this means thatthe overhead of spawning new threads is not negligible, but there are no additional runtimeoverheads. This justiﬁes using a small number of threads and assigning a large number oftransactions to each, rather than, for example, creating one thread per transaction.The main purpose of this decision is for Rust to remain a runtime-free programminglanguage(64), meaning that there is zero-to-minimal runtime in the ﬁnal binary. Theopposite of the threading model is the

N:M model, also known as green threads . Suchthreads are lightweight, and do not map to a hardware/operating-system thread in anyway. Instead, they are a software abstraction and are handled by a runtime . Therefore,a language that wants to support green threads needs some sizeable runtime machinery tobe able to handle that. For communication, we use multi-producer-single-consumer(66) ( mpsc for short) channels- the only type available in the standard library. 3.1 depicted the communication mediumfor the threads as something similar to a bus, where all threads can directly communicatewith all other threads. In reality, the layout of the channels is a bit more complicated.The process of using Rust’s mpsc channels is such that each channel has one receiver andone producer handle, and the producer handle can be freely copied into diﬀerent threads(while the the receiver cannot be copied around – this is ensured by Rust’s compile-timechecks). With this approach, each thread has an mpsc channel and keeps the receivinghandle to itself, while giving a copy of the producer handle to all other threads. tokio is one of the best known such runtimes in the Rust ecosystem(65). .2 Example Runtime: Balances Listing 4.2 demonstrates this process. A producer and receiver pair is created at line 2.Further down, two threads are created, where each receives a clone() of the producer. Thereceiver will stay in the starting thread, and it is used at the end to checks for incomingmessages.

Listing 4.2:

How Channels Allow Communication Between Threads (with slight simpliﬁca-tion) // In the local thread. let (producer, receiver) = std::mpsc::channel(); // spawn a new thread. std::thread::spawn(|| { // this scope is local to a new thread. let local_producer = producer.clone(); // note the cloned handle ----^^^^^^^ local_producer.send("thread1"); }); // spawn another new thread. std::thread::spawn(|| { // this scope is local to a new thread. let another_local_producer = producer.clone(); // note the cloned handle -----------^^^^^^^ another_local_producer.send("thread2"); }); // check incoming messages. while let Ok(msg) = receiver.rcv() { // do something with ‘msg‘. } Next, we demonstrate an example runtime to help the readers get familiar with the con-text of the implementation and how the ﬁnal outcome looks like. Recall from 2.1.4.11that a runtime is the core state transition logic of each chain. Our ﬁnal implementationsupports multiple runtime modules within, each having their own speciﬁc business logic.The simplest example of such a module is a balances module that takes care of storingthe balance of some accounts and allows the transfer of tokens between them. We nowenumerate some of the important bits of code involved in this module.First, there needs to be a struct to store the balance of a single account, which wenamed

AccountBalance - see listing 4.3 for the full deﬁnition.53 .2 Example Runtime: Balances

Listing 4.3:

Balance Strict /// The amount of balance that a certain account. pub struct AccountBalance { /// The amount that is free and allowed to be transferred out. free: u128, /// The amount that is reserved, potentially because of the balance beingused in other modules. reserved: u128, } The state layout of these modules is simple: there needs to be one mapping, with thekey being an account identiﬁer and the value being

AccountBalance , as in listing 4.3.

Listing 4.4:

State Layout of the Balances Module decl_storage_map!( // Auxillary name assigned to the storage struct. BalanceOf, // Auxillary name used in key hashing. "balance_of", // Key type. AccountId, // Value type. AccountBalance, ); Note that the statement in listing 4.4 is a macro (denoted by the ! notation), mean-ing that it generates a substantial amount of code at compile time. Most of this codedeals with functions for generating the key of a speciﬁc account’s balance, namely thehashing and concatenation method explained in 4.1.1. To recap, the code generated bythis macro means: the ﬁnal state key of an any account in the storage is computed as: "balances:balance_of".hash() + account.hash() .Overall, the inline documentation of the listing should make it clear that: there ex-ists a state mapping from AccountId to AccountBalance ". We have already seen what

AccountBalance is, and

AccountId is an alias for –no surprise– a public key.Finally, we can look at the only public transaction that can be executed in this module:a transfer of some tokens from one account to another one, presented in listing 4.5.

Listing 4.5:

Transfer Transaction >::key_for(origin), >::key_for(dest) ] )] fn transfer(runtime, origin, dest: AccountId, value: Balance) { .3 Generic Distributor // read the balance of the origin. let mut old_balance = BalanceOf::read(runtime, origin).or_forward()?; if let Some(remaining) = old_balance.free.checked_sub(value) { // origin has enough balance. Continue. old_balance.free = remaining; // new balance of the origin. BalanceOf::write(runtime, origin, old_balance).unwrap() // new balance of the destination. BalanceOf::mutate(runtime, dest, |old| old.free += value).or_orphan()?;

Ok(()) } else { Err(DispatchError::LogicError("Does not have enough funds."))} } The logic of the transaction should be straightforward: (1) read the origin’s balance;(2) if they have enough free balance, update the balance of the origin and destination, (3)Else, return an error.The more interesting bit is in lines 1-4 of listing 4.5, where our long promised access macro is being used in action. The interpretation of the macro is basically as follows:this transaction will (most likely ) access two state keys, the balance of the origin and thebalance of the destination. Another noteworthy detail of the implementation is the distributor component. Recallthat the distributor is responsible for tagging each transaction in the transaction queuewith the identiﬁer of one thread. We emphasize that we leave this detail generic in ourimplementation, similar to its position in chapter 3. This means that there is no concreteimplementation of a distributor in our system. Instead, any function that satisﬁes a certainrequirement can be plugged in and used as the distributor. The main detail to rememberis that we prefer the distributor to use the hints provided by the access macro. because(see chapter 3) such a distributor will be a lot more eﬀective at preventing transactionforwarding.Two examples of distributor implementation are as follows: Recall that the access macro was supposed to be a best-eﬀort guess . .4 Bonus: Optimizing Orphans • Round Robin : This distributor simply ignores the access macro and assigns trans-actions to threads one at a time, in a sequence.•

Connected Components (67): This graph processing algorithm is the exact oppo-site end of the spectrum compared to Round Robin, meaning that it heavily takesthe access macro into account.Speciﬁcally: all transactions provide a list of state keys that they might access dur-ing their execution, via the access macro. This distributor builds a bipartite graphof transactions and state keys, where each transaction has an edge to all the keysthat it might access. Therefore, two transactions that are likely to access the samestate key end up being connected , because they both have an edge to that key. Theconnected component, as the name suggests, identiﬁes these connected transactions.Every group of transactions that access a common set of state keys is grouped as acomponent. Once all components are identiﬁed, they are distributed among threadsas evenly as possible . In essence, keeping a full component as a unit of work dis-tribution ensures that transactions that might conﬂict will end up being sent to thesame thread, eﬀectively minimizing forwarded and orphaned transactions. A closer look at 4.5 reveals that the ﬁrst state access error is being handled by .or_forward()? and the second one with .or_orphan()? . The reason for this is quite interesting: Recallfrom 3.4.3.1 that an orphan transaction is basically one that needs access to state keys thatare tainted by at least two diﬀerent threads. From this, we can realize: if a transactionsuccessfully accesses any state key, this means that no other thread can execute this trans-action, because the key associated with that ﬁrst state access is already tainted. Therefore,we can conclude a massive simpliﬁcation: Only if the ﬁrst state access of a transaction failsit will be forwarded. Any further failure is simply an orphan right oﬀ the bat.

So far, all the mentioned details of this chapter are somewhat necessary to be able tocomprehend the evaluation in chapter 5. Conversely, this last section is optional, andexplains some of the details of the Taintable state implementation. This explanation doesassume even more Rust knowledge from the reader. Using a simple greedy algorithm that we do not describe here. .5 Bonus: Taintable State Let us recap the situation in the state

HashMap . We know that the state is basically a

HashMap, Vec

AccountId to an

AccountBalance . How exactly is the

AccountBalance encoded to

Vec ?• We claimed that the state is almost wait-free. How can the underlying implementa-tion support this? As noted, our tainting logic is very special to our use case, andthe implementation of a typical concurrent

HashMap will not be a good inspirationfor us, because it would have diﬀerent waiting semantics.The answers are actually connected and are provided together.Recall that our initial proposal was for the state to be wait-free, meaning that any accessto the state would succeed or fail immediately. We can solidify this idea as such: only theﬁrst access to each key of the state might incur some waiting for other threads; thereafter,all operations are wait-free. It should be clear why we can obtain this property: if a keyis accessed just once, it is tainted, and, therefore, all further accesses can be immediatelyexecuted based on that taint value. Speciﬁcally, following access from the owner threadsucceeds, while a following access from another thread fails. It is, unfortunately, impossibleto allow all accesses to be wait-free, because a write operation to a key that had not existedbefore could trigger a resize of the HashMap. Now it is clear why we titled SonicChain almost wait-free.Next, we discuss how this behavior is implemented. First, we explain what the actualvalue stored in the state is. Listing 4.6 shows the data type stored in the state. Note thatthe generic struct has separate taint and data ﬁelds. Moreover, the taint is optional ,meaning that it may or may not exist , as denoted by

Option<_> in listing 4.6.

Listing 4.6:

The state value type pub struct StateValue { /// The data itself. data: RefCell, /// The taint associated with the data. taint: Option, } The ﬁnal state type is deﬁned in listing 4.7. Note that the key type, the value type, andthe taint type are all left out generic. 57 .5 Bonus: Taintable State

Listing 4.7:

The ﬁnal generic state type pub type StateType = HashMap>; Now we can re-iterate on listing 4.1 and correct it, leading to the implementation inlisting 4.8.

Listing 4.8:

The ﬁnal ( concrete ) state type pub type Key = Vec; pub type Value = Vec; pub type ThreadId = u8; pub type State = StateType; To recap, the

Key = Vec is computed from hashing and concatenation, and the

Value = Vec is the binary encoding of any data type that we may store in the state- for example u128 for the case of balances (as seen in 4.3).Finally, we can demonstrate the locking procedure. To implement the state, we wrap a

StateType in a

RwLock inside a new struct . This struct will then implement appropriatemethods to allow the runtime to access the state - see listing 4.9.

Listing 4.9:

The wrapper for

StateType /// Public interface of a state database. pub trait GenericState { fn read(&self, key: &K, current: T) -> Result; fn write(&self, key: &K, value: V, current: T) -> Result<(), T>; fn mutate(&self, key: &K, update: impl Fn(&mut V) -> (), current: T) ->Result<(), T>; } /// A struct that implements ‘GenericState‘. /// /// This implements the taintable struct. Each access will try and taint thatstate key. Any further /// access from other threads will not be allowed. /// /// This is a highly concurrent implementation. Locking is scarce. pub struct TaintState { backend: RwLock>, } impl GenericState for TaintState { // implementation } The most interesting piece of code omitted in 4.9 is the // implementation part of58 .5 Bonus: Taintable State

GenericState for

TaintState : this where we deﬁne how and when we use the

RwLock of backend in TaintState .Speciﬁcally, each access to the backend will ﬁrst acquire a read lock (which is not block-ing, and other threads can also access it at the same time). Two possible outcomes exist:• If the key is already tainted, the state operations will fail or succeed trivially: allnon-owner threads will receive an error with the thread identiﬁer of the owner, andall owner operations will succeed. Moreover, note that the inner data in the mapitself is wrapped in a

RefCell , which allows interior mutability(68). In essence, thismeans that the owner thread can manipulate the data only by acquiring a read lock.Our tainting logic and rules ensure that this will not lead to any race conditions.• If a key is not already tainted (or non-existent in the inner

HashMap ), then the threadproceeds by trying to acquire a write lock. This is needed, because this operationis going to alter the taint ﬁeld of a

StateValue , and all other threads need to beblocked. Once the taint has been updated, the write access is immediately dropped,so that all other threads can proceed.One ﬁnal important remark is of interest. We mentioned Rust’s concurrent memorysafety with much conﬁdence in earlier sections of this chapter, claiming that it can preventmemory errors in compile time. Nonetheless, we see that using a

RefCell , we can altersome data, even when shared between threads, without any synchronization. How is thispossible?The key is that Rust does allow such operations in unsafe mode. The "unsafe mode"of Rust, sometimes called the "wild west" of Rust, is based on a contract between theprogrammer and the compiler, where the programmer manually testiﬁes that certain op-erations are memory safe, and need not be checked by the compiler. In some sense, unsafeRust is like a fallback to C , where memory can be arbitrarily accessed, allocated, de-allocated, and such.In our example, we have such a contract with the compiler as well. We know that:• If all threads access the taint with a write lock, and• If all threads access the data after checking the taint, via a read lock.All race conditions are resolved, thus no compiler checks are needed.Indeed, this contract needs to be delivered to the compiler by a single unsafe statementin our state implementation - listing 4.10. Here, Sync is a trait to mark data types that are59 .5 Bonus: Taintable State safe to be shared between threads. As the name recommends, it is a marker trait with nofunctions.

RefCell is not

Sync by default, because it allows arbitrary interior mutability.Because our

StateValue contains a

RefCell , it is also not

Sync be default. The statementin listing 4.10, which needs to be unsafe , tells the compiler to make an exception in thiscase and allow

StateValue to be used in a multi-threaded context. This allows us to wrapthe

TaintState in an

Arc (atomic reference counted pointer) and share it between threads.

Listing 4.10:

Unsafe Implementation in State unsafe impl Sync for StateValue {} Benchmark and Analysis

Rules of Optimization: Rule 1: Don’t do it. Rule 2 (for experts only): Don’t doit yet. – Michael A. Jackson4.2 introduced an example runtime module in our implementation, namely a balancesmodule that can store the balance of diﬀerent accounts and initiate transfers betweenthem. In this chapter, we build upon this module and provide benchmarks to evaluateSonicChain, as described in 3.4. First, we begin by explaining the details of the bench-marking environment, including the data set.

All experiments are executed on a personal laptop with and

32 GB 2400 MHz DDR4 RAM . We keep the machine connected to power for consis-tent results, and run no additional resource-intensive software while taking measurements.We measure the execution time of both the authoring and validation tasks. From thecomputed time, we derive the throughput in transactions per second . Recall that authoringis the process of creating a block, and validating is the task of re-importing it to ensureveracity; these tasks are performed by the author and validator , respectively. Moreover,recall that in our concurrency delegation model, by the end of the authoring phase, alltransactions are tagged with the identiﬁer of the thread that should execute them. There-fore, the validation task is fairly simpler. Furthermore, we set the access macro of thetransfer transaction to point to the account balance key of the origin and destination account, as demonstrated in listing 5.1. 61 .1 The Benchmarking Environment

Listing 5.1:

Signature of the Transfer and its Access Hints >::key_for(origin), >::key_for(dest) ] )] fn transfer(runtime, origin, dest: AccountId, value: Balance) { /*implementation */ } Then, we use this information to spawn two benchmarks, one with connected compo-nents and one with a round robin distributor. Indeed, we also use a sequential versionas baseline.The dataset is composed of two parts: the initial state and the transactions .The initial state is the state of the world before any transactions are executed. In ourcase, this maps to a number of initial accounts and an initial balance in each of them. Theinitial balance can be parameterized. The second parameter is the number of transactionsbetween the accounts. Recall that the only transaction in our balances module is transfer .For example, assuming 100 accounts and 50 transactions, each of the 50 transactions isgenerated by picking two random accounts (e.g., Alice and Bob) from the entire set of 100accounts and creating a transfer(Alice, Bob, Amount) with a ﬁxed transfer amount.In this chapter, we only focus on a variation of this dataset that we call the millionaire’splayground . This is because we assign a very large amount of initial balance to eachaccount, ensuring that it is many times larger than the transfer amount. Consequently, alltransfers will succeed .Note that despite the transfer being a dead-simple transaction, the success or failurebranches have diﬀerent state access requirements . Namely, a succeeding transactionaccesses the balance of both the origin and the destination of the transfer, while a failingone only accesses the former. This, next to the value placed as a hint in the access macrocan lead to interesting combinations that are outside the scope of this chapter, but will bediscussed further in chapter 6.We assume that there are no time limits imposed on the author of the block, and allowit to ﬁnish executing all transactions. Of course, as we already delineated earlier, this isnot how things work in reality (see 2.1.4.7). Nonetheless, this model allows us to be able toclearly see the throughput diﬀerence between the sequential execution and the concurrentones.From within authoring, we do not measure the execution time of the generic distributor.A critical reader might think this is a way for us to avoid taking the execution timeof connected components into account. We strongly assert otherwise. In most modern62 .2 Benchmark Results blockchains, authors know, well in advance, if and when they author blocks. Therefore,it is very sensible for them run the distributor procedure (be it the expensive connectedcomponents, or the cheap round-robin, or anything else) over their pool of transactions inadvance, as a form of pre-processing. Even if they do not know when they might authora block, it is rather straightforward to periodically/regularly run the distributor on theirlocal pool . For the simpler task of validation, we measure the entire process of parsingthe block and executing it.We must also address the state storage issue. In a real-life scenario, the state databaseis likely to be kept on high latency storage, and therefore access to new keys might beorders of magnitude slower (particularly the ﬁrst one, depending on the caching) than anycomputation. Our implementation keeps the entire state in an in-memory HashMap . Weacknowledge that this is likely to be too simplistic and to compensate, we artiﬁcially insert sleep operations into the read and write operations of the ﬁnal state implementation.Lastly, we assume that both the authoring and validation uses the same degree of con-currency, meaning that everyone has the same number of worker threads, ready to receivetasks and transactions.

For the ﬁrst demonstration, we ﬁx the number of accounts and gradually create more(transfer) transactions between them. For all 3 classes of executions (sequential, round-robin, connected components) we generate 1000 members and increase the number oftransactions from 250 to 2000. Both the validation and authoring times are measured. Allexecutions utilize 4 worker threads. Table 5.1 presents all the results in one picture.Despite our ﬁrst benchmark being a simple one, it already unravels plenty of details andhidden traits about the system’s behavior. Thus, we make the following observations:

Sequential

The sequential execution is nothing special. As expected, the execution time of both tasksincreases linearly as the number of transactions increases. The throughput, as expected,stays more or less the same. Lastly, we anecdotally realized that running connected components is not a real bottleneck in graphswith sizes in the order of a few thousand transactions, and with our experimental setup. .2 Benchmark Results Table 5.1:

Benchmarking results with the millionaire’s playground data set. All times are inms. RR and CC stand from round robin and connected components distributors, respectively."4" refers to the number of workers. type members transactions authoring (ms) authoring tps validation (ms) validation tpsSequential 1000 250 1899

Sequential 1000 500 3639

Sequential 1000 1000 7548

Sequential 1000 2000 15053

Concurrent(RR-4) 1000 250 898

Concurrent(RR-4) 1000 500 2161

Concurrent(RR-4) 1000 1000 5517

Concurrent(RR-4) 1000 2000 12788

Concurrent(CC-4) 1000 250 625

Concurrent(CC-4) 1000 500 1210

Concurrent(CC-4) 1000 1000 9510

Concurrent(CC-4) 1000 2000 19698

Round Robin

The behavior of the round-robin distributor degrades over time. With a small number oftransactions (e.g. 250), the throughput increase in authoring is more than 100%. As thenumber of transactions grows more toward 1000, the increase drops. At 1000 transactions,authoring is only a smidgen 20% more than the sequential throughput.The reason for this behavior is of interest. Recall that round-robin will distribute thetransactions between the threads with no particular knowledge. This is expected to causea fairly large number of transactions to become forwarded or orphans. Our executionlogs clearly demonstrate this behavior. For example, we have the following lines from theexecution log of round-robin with 250 transactions. [Worker [Worker [Worker [Worker [Master ] - Finishing Collection phase with [179 executed][32 forwarded][39orphaned] Lines 1-4 contain the

AuthoringReport message sent from the worker to master. Thetwo numeric ﬁelds of this message are the number of transactions that got executed andforwarded respectively. As seen in the log, each worker notiﬁed the master that it failed toexecute a portion of their designated transactions. Line 5 is the aggregate information logof the master thread once all the transactions, except the orphans, are executed. As seenin the log, from the 250 transactions, 179 were executed in their designated thread, 32 were64 .2 Benchmark Results forwarded and executed, and ﬁnally, 39 were orphaned. This clearly shows the consequenceof round robin’s blindness toward the access macro. The amount of transactions that gotforwarded and orphaned directly contributes to a reduction in throughput.Now, we can see the equivalent log in the execution with 1000 transactions. [Master] - Finishing Collection phase with [389 executed][146 forwarded][465orphaned] With 1000 transactions, more than half of them failed to execute in their designatedthread, and ended up being forwarded or orphaned. This further demonstrates why thethroughput drops from 250 to 1000 in round-robin.As for validation, we see that the validation throughput is analogous to that of authoringin each row, but slightly better. Moreover, similar to authoring, the throughput dropsas we increase the transactions. It is very important to understand why. Recall thatduring validation, the validator simply uses the tags provided by the author to knowwhich transaction needs to be executed where. Now, let us analyze the destiny of theimperfect transactions, namely forwarded ones and orphans. The forwarded transactionswill be treated by the validator as if they were not forwarded. These transactions are stillsimply assigned to a thread, and the validator can eﬀectively save time by executing themconcurrently, in multiple threads. On the other hand, the orphan transactions are a loss of throughput for both the author and the transaction. If a transaction is declared as anorphan, not only the author, but also the validator both lose any throughput gains forthat transaction. This clariﬁes the throughput trend of the validation in round-robin, andhow it drops as we increase the transactions. The underlying reason is, in fact, that asthe number of transactions grows, the number of orphans also increases in the round-robinbenchmarks. Therefore, a decline in the throughput of the validator is also expected.

Connected Components

The outcome of the connected components is even more interesting. For transaction counts250 and 500 we see much better throughput than both sequential and round-robin. Thistrend applies to both authoring and validation. Nonetheless, for transaction counts 1000and 2000 we observe the throughput plummets down to rates lower than the sequentialthroughput.The reason for this can also be explained by examining the logs. First, let us look at thesame log lines for one of the good execution, namely 500 transactions.65 .2 Benchmark Results [Worker [Worker [Worker [Worker [Master ] - Finishing Collection phase with [500 executed][0 forwarded][0orphaned] Interestingly, this time all worker threads executed all of their designated transactionswith no error, leaving no particular leftover work to do for the master thread. Such caseslead to slightly less than 4-fold throughput gain in authoring and a perfect 4-fold gain invalidation.Now let us examine what goes wrong in the case of 1000 transactions. [Worker [Worker [Worker [Worker [Master ] - Finishing Collection phase with [1000 executed][0 forwarded][0orphaned] In this interesting case, jumping into line 5 of the logs that gives an overview actuallyunravels absolutely nothing: It still seems like all the transactions are executed in theirdesignated thread. The problem is hidden in the number of transactions that each threadreceived. As seen in lines 1-4, the ﬁrst 3 threads received a total of 24 transactions, whileall the remaining transactions were given to the last worker thread. The reason for this isrooted in the number of accounts and transactions. Given 1000 accounts, generating 1000transfers from them is likely to create a large blob of interconnected transactions. Thisis something that the connected components cannot deal well with. The outcome is thata very large chunk of transactions will be assigned to one giant component, consequentlygiven to one worker thread to execute. This is, essentially, a typical work imbalanceproblem that can be seen in diﬀerent ﬁelds of parallel and concurrent computing. Giventhis, the connected component execution, in this case, is basically sequential, and it has awhole lot of overhead from message passing. Therefore, the throughput drops signiﬁcantly,even below the sequential one. 66 .2 Benchmark Results

As for the validation, we can use the conclusions from the round-robin section to reasonabout its behavior. Recall that forwarded transactions are not an overhead for the valida-tor, and only orphan transactions can cause an overhead. In the case of 1000 and 2000transactions and connected components, the validation throughput is almost the same asthat of the sequential execution. This is because the distributor essentially linearizes thetransactions into a sequential group. Moreover, all of the overhead is for the author. There-fore, it is expected that the throughput of the validator is almost the same as sequentialexecution.The results indicate diﬀerent traits and characteristics of the entire system as a whole,and our two distributor components of choice. Round robin is an example of a system inwhich we use concurrency but without any pseudo-static hints. On the contrary, connectedcomponents is a prime example of scenarios where the decision of transaction distribution isentirely based upon pseudo-static hints. While the results are in favor of the latter, we alsoobserved that in some niche scenarios, connected components is inﬂexible and thereforeturns out to not be the best option. In the next chapter, we summarize these details intothe conclusion of our work. 67

Conclusion

Simplicity is a prerequisite for reliability. – Edsger DijkstraWe have embarked on a long journey to reach this stage of our work. To make theconclusion more comprehensible, we ﬁrst brieﬂy recap what we have done so far, and whatwe have observed in the benchmarks of chapter 5. We then enumerate our conclusiveobservations in 6.2. We then answer the research questions in 6.3. Finally, we mentionsome of the future work to be done in 6.4.

We began by making minimal, yet important assumptions about what a blockchain sys-tem should look like, whilst explaining the details thereof in chapter 2. Looking back inhindsight, the most important assumption that we have is the key-value based state im-plementation. With the state, we can analogize a runtime executing transaction over a state to a thread in a multi-core CPU, trying to access memory as it executes code. Weare then faced with a problem with a setup very similar to that of the shared state con-currency, except the absolute need for determinism. In essence, determinism is the mainblockchain-speciﬁc challenged imposed on this problem.We identiﬁed the de-facto way of solving this challenge in contemporary literature tobe building dependency graphs and piggy-backing them into the block. With furtherinvestigation, we decided to not proceed further down this path: instead, we consideredthat a much simpler model can be suﬃcient, if we utilize all of the information available.In chapter 3 we declared " a much simpler model " to be the concurrency delegation model, where conﬂicting transactions can be forwarded between threads when possible,or executed sequentially, at the end of the processing "cycle", otherwise. In essence, we68 .2 Discussion and observations remove the chance of non-determinism by allowing threads to only execute non-conﬂictingtransactions. In other words: intra-thread transactions can conﬂict as much as they want,but inter-thread transactions must not conﬂict at all. All transactions that generate inter-thread conﬂicts are deemed to be orphan, and will be executed sequentially. The mainadvantage of the delegation model is its ﬁnal outcome: because of the absence of inter-thread conﬂicts, the only bit of data that needs to be added to each transaction is one ﬁnalidentiﬁer about which thread should execute it. The validator is guaranteed to be able todeterministically re-execute the block, whilst gaining potential throughput beneﬁts.Furthermore, we make sure we " utilize all of the information available " by adding somepseudo-static hints on top of each transaction. These hints need to be provided by theprogrammer, and denote the state keys that are likely to be accessed by the transaction.These hints do not need to be highly accurate. Moreover, we restricted our hints to datathat is cheap to know prior to the execution of the transaction. Basically, the dilemmais to, not solve, but rather ﬁnd a way around the halting problem(58). To do so, weacknowledge that, most often, transactions are simple units of logic, with speciﬁc behaviorgiven diﬀerent inputs . Given this, it is likely to be easy for the programmer to providesomewhat accurate hints about the behavior of a transaction.We combined these features in a concurrent system for high-throughput transactionprocessing called SonicChain. Furthermore, we provided a prototype implementation of thesystem, where we included the access macro and the distributors - see Chapter 4. We usedthis prototype to evaluate an example application called "the millionaires’ playground". Wefurther benchmarked the performance of this example application. The results, presentedin chapter 5, show throughput improvements for both distributor types, when executedagainst a synthetic balance transfer workload. Our work has lead to the following observations.

The transactions are key . Needless to say, the type of transactions that are beingexecuted, and the amount of inter-transaction state dependencies between them, is the ﬁrstand foremost factor of throughput improvement. This factor trumps the underlying systemas well. With a workload that is made almost entirely from interdependent transactions,even the best of systems probably fails to deliver high throughput. Moreover, with a Blockchain transactions are expensive pieces of code to execute, because they need to be re-executedby hundreds, if not thousands of other nodes. They are not designed to execute complex arbitrary logic,and are unlikely to evolve in this direction in the foreseeable future. .3 Research Questions and Answers workload with almost no inter-transaction dependencies, probably most systems performwell. We showed, in our experiments, that our system is no diﬀerent in that regard.Nonetheless, the important point is that we managed to achieve ideal throughput gains byeﬀectively leveraging the static hints of the transactions. Simplicity . Our work is, in some sense, a retaliatory demonstration against complicatedconcurrency control mechanisms deployed (mostly in academia) on blockchains. Our maingoal was not to trump their results and prove that our system is better per-se. Rather, weshowed that for certain workloads (that are –reasonably speculating– not far from reality),a much simpler system will also be enough. Moreover, the new and simpler system isactually beneﬁcial upon a certain axis, for example, smaller runtime and block overhead.

Generic Design . This work is generic at two diﬀerent levels.

First , our access macrois merely one example of how the static data that a transaction carries can be used.Implementations could vary, or diﬀerent means of analysis can be used. The only pointis to remain aware that the goal is to be able to infer the state access requirements oftransactions without executing them . Second , with or without the access macro, weleave the decision of how to use the output of static hints to be generic in the form ofcomponent that we called the distributor. We do provide two such distributors, merely toillustrate the diﬀerence in complexity, and discuss their processing cost, but many othervariants can be envisioned.

Based on this work, the answers to our research questions (formulated in section 1.1) areas follows.

RQ1

What approaches exist to achieve concurrent execution of transactions within ablockchain system, and improve the throughput?

Answer

We identiﬁed the current usages of concurrency within blockchains to within one ofthe two categories that we depicted: concurrency control or concurrency avoidance.Both do result in a throughput gain. Full answer in 3.3.

RQ2

How could both static analysis and runtime approaches be combined together toachieve a new approach with minimum overhead and measurable beneﬁts? As before, by static we mean static with respect to the execution of the transaction. .4 Future WorkAnswer We basically opted to reduce the runtime apparatus (coined: concurrency delega-tion) and instead compensate with pseudo-static advisory data that can complementthe runtime to achieve an equally good result as concurrency control but with lessoverhead in validation. Full answer in 3.4.

RQ3

How would such an approach be evaluated against and compared to others?

Answer

We used an empirical evaluation approach to measure the throughput gain. A real-istically synthesized data set was created for this purpose. We concluded that ourapproach is suﬃcient to achieve ideal throughput gains (given the number of threads)under certain circumstances, which supports our claim that a simpler system couldbe well enough.Summarizing, this work makes the following contributions:1. A new runtime model for concurrency with minimal overhead, namely concurrencyDelegation2. A proposal to couple runtime machinery with pseudo-static information for betterresult.3. A ﬁrst prototype of such a system, namely SonicChain, implemented in the Rustprogramming language.4. An empirical analysis of the system against realistic workloads.

Our discussion and observations already hint at some of the future work that we proposeto be pursued. Here, we enumerate such future work directions in more detail.

Inaccurate hints . We boldly mention that static hints do not need to be accurate forthe system to work properly. Of course, they should not be totally unrelated. A goodstarting point to reason about hints is to look at the transaction’s logic and proceed withthe control ﬂow optimistically or pessimistically. Which ﬂow has the most state access?Which one has the least? The interesting follow-up question is how does the behavior ofthe system change when we move between diﬀerent access macros? One might recall, thetransfer transaction that we used before had a pessimistic access macro: we assumed thebalance of both the origin and the destination will be accessed. This is not alwaysthe case. If the origin does not have enough funds, then the transaction ﬁnishes with an71 .4 Future Work error, and only the balance key of the origin is accessed (i.e. tainted ). We excluded suchexperiments from our work here for the sake of brevity.

Probabilistic Hints . An interesting addition to the previous point is the mixing ofprobability theorems with the access macro. In essence, instead of relying on the program-mer to provide the hints, the system could use benchmarks and previous data to buildprobabilistic models to accurately predict the access requirements of a transaction givenspeciﬁc inputs. This basically transforms the access macro from being (pseudo) static to probabilistic . At the very extreme end of this spectrum, one could even see the utilizationof artiﬁcial intelligence to accurately generate the access requirements of a transaction. Read-Write Taint . A comment on our work that can fundamentally question ourapproach is: why is there no distinction between read and write operations? This is a faircomment, and we agree that it is an unorthodox approach compared to the rest of theliterature in concurrent computing. We did not opt for this approach in our work becauseof our strong preference for simplicity. There are many paths to follow along the line ofread-write distinction. Most would make our system quite similar to a cache coherencemodel, where a write operation needs to be notiﬁed to others and potentially invalidateother previous read operations. This is already a red line for us, as we do not want to carrythe burden of rollbacks, both because of their overhead and inherent non-determinism. Aﬁnal interesting comment about this aspect would be that read-write taint is also likelyto introduce read-only access macro hints. We foresee that it will be very useful for thedistributor to be able to know if an access hint is read-only or not. All in all, we foreseeinteresting outcomes if the path of read-write taints is pursued, but express worry abouthow many complications it will add to the system, compared to the actual throughputgain. The obvious beneﬁt of this addition to the work would be that threads that only read certain data will not block other threads.

Continuous Analysis of Real Chains . After all, everything that we have donehere is in some sense hypothetical, because we do not use real transaction from a realchain. It is of great value for further studies to analyze diﬀerent, domain-speciﬁc (e.g.Bitcoin) and general-purpose chains (e.g. Ethereum), and derive characteristics from theirtransactions (as it was already done in (54)). For example, an easy study is to applyconnected components to diﬀerent blocks of the Ethereum network and see how well theycan be clustered into disjoint components with a balanced size. Will we face the sameimbalance problem as with our experiments in chapter 5?

Hybrid distributor.

In our work we did not explore the possibility of hybrid distrib-utors. This can also be a fairly simple addition to the system, making it more versatile.72 .4 Future Work

Even only within the two distributors that we discussed, we could see that it was not thecase that one distributor is the best in all cases. Round robin had limited throughputgains, but it was still better than the ideal connected components in data sets that hadhighly intertwined transactions. We propose building hybrid distributors that potentiallychose the best distribution function based on particular criteria, speciﬁc to the applicationand/or dataset at hand.

Nonce-aware execution . We explained what a nonce is in 2.1.4.14. A nonce is ofimportance to us because it is very likely to be the common key between many transactionsthat cause them to become interdependent. For example, imagine an account

Alice thathas state keys in diﬀerent runtime modules. One for balance, and one for diﬀerent eachmodule. These state keys can be accessed independently, but bringing the logic of nonceinto the picture, things get a wee bit more complicated. Now, all of these transactionsneed to write to the key that stores the nonce of Alice, and they are all dependent on oneanother. Things can get even worse. There could be special state keys that need to beaccessed by all transactions (for example: a counter for all transaction in the block thatis kept in the state). This forcefully makes all of the transactions in the block sequentialif no special treatment is in place for them. And indeed, this is our main point: we omitsuch special cases from our work because they should be treated specially in order for anyconcurrency model to be sensible. 73 eferences [1]

Craig Pirrong . Will Blockchain Be a Big Deal? Reasons for Caution . J.Appl. Corp. Finance , (4):98–104, 2019. 1[2] Arthur Gervais, Ghassan O. Karame, Karl Wüst, Vasileios Glykantzis,Hubert Ritzdorf, and Srdjan Capkun . On the Security and Performanceof Proof of Work Blockchains . In

Proceedings of the 2016 ACM SIGSAC Con-ference on Computer and Communications Security , CCS ’16, pages 3–16, Vienna,Austria, October 2016. Association for Computing Machinery. 2, 31[3]

IMRAN BASHIR . MASTERING BLOCKCHAIN: Distributed Ledger Technology,Decentralization, and Smart Contracts Explained, 2nd Edition;Distributed Ledger.

PACKT Publishing, Place of publication not identiﬁed, 2018. 4[4]

Maurice Herlihy . Blockchains from a Distributed Computing Perspective . Commun. ACM , (2):78–85, January 2019. 4[5] P. Baran . On Distributed Communications Networks . IEEE Trans. Commun.Syst. , (1):1–9, March 1964. 5[6] . Paul Baran and the Origins of the Internet

W. Diffie and M. Hellman . New Directions in Cryptography . IEEE Trans.Inf. Theory , (6):644–654, November 1976. 6, 8[8] Ralph C. Merkle . Secure Communications over Insecure Channels . Com-mun. ACM , (4):294–299, April 1978. 6, 19[9] Stuart Haber and W. Scott Stornetta . How to Time-Stamp a DigitalDocument . J. Cryptology , (2):99–111, January 1991. 674 EFERENCES [10]

David Chaum, Amos Fiat, and Moni Naor . Untraceable Electronic Cash . In

Shafi Goldwasser , editor,

Advances in Cryptology — CRYPTO’ 88 , Lecture Notesin Computer Science, pages 319–327, New York, NY, 1990. Springer. 6[11]

Satoshi Nakamoto . Bitcoin: A Peer-to-Peer Electronic Cash System .page 9. 6[12]

Cynthia Dwork and Moni Naor . Pricing via Processing or CombattingJunk Mail . In

Ernest F. Brickell , editor,

Advances in Cryptology — CRYPTO’92 , Lecture Notes in Computer Science, pages 139–147, Berlin, Heidelberg, 1993.Springer. 6[13]

Billy Bob Brumley and Nicola Tuveri . Remote Timing Attacks Are StillPractical . Technical Report 232, 2011. 9[14]

Mihir Bellare, Ran Canetti, and Hugo Krawczyk . Keying Hash Func-tions for Message Authentication . In

Neal Koblitz , editor,

Advances in Cryp-tology — CRYPTO ’96 , Lecture Notes in Computer Science, pages 1–15, Berlin, Hei-delberg, 1996. Springer. 10[15]

Gaetano Carlucci, Luca De Cicco, and Saverio Mascolo . HTTP overUDP: An Experimental Investigation of QUIC . In

Proceedings of the 30thAnnual ACM Symposium on Applied Computing , SAC ’15, pages 609–614, Salamanca,Spain, April 2015. Association for Computing Machinery. 11[16]

Sergi Delgado-Segura, Cristina Pérez-Solà, Guillermo Navarro-Arribas, and Jordi Herrera-Joancomartí . Analysis of the Bitcoin UTXOSet . Technical Report 1095, 2017. 11[17]

Ethereum Whitepaper . https://ethereum.org. 12[18]

Leslie Lamport, Robert Shostak, and Marshall Pease . The ByzantineGenerals Problem . ACM Trans. Program. Lang. Syst. , (3):382–401, July 1982. 15[19] Christian Stoll, Lena Klaaßen, and Ulrich Gallersdörfer . The CarbonFootprint of Bitcoin . Joule , (7):1647–1661, July 2019. 16[20] Yevgeniy Dodis and Aleksandr Yampolskiy . A Veriﬁable Random Func-tion with Short Proofs and Keys . In

Serge Vaudenay , editor,

Public KeyCryptography - PKC 2005 , Lecture Notes in Computer Science, pages 416–431, Berlin,Heidelberg, 2005. Springer. 16, 31 75

EFERENCES [21]

Stefano De Angelis, Leonardo Aniello, Roberto Baldoni, Federico Lom-bardi, Andrea Margheri, and Vladimiro Sassone . PBFT vs Proof-of-Authority: Applying the CAP Theorem to Permissioned Blockchain . In

Italian Conference on Cyber Security (06/02/18) , January 2018. 17[22]

Wenbo Wang, Dinh Thai Hoang, Peizhao Hu, Zehui Xiong, Dusit Niyato,Ping Wang, Yonggang Wen, and Dong In Kim . A Survey on ConsensusMechanisms and Mining Strategy Management in Blockchain Networks . IEEE Access , :22328–22370, 2019. 18[23] Paul Vigna . The Great Digital-Currency Debate: ‘New’ Ethereum Vs.Ethereum ‘Classic’ , August 2016. 18[24]

Ralph C. Merkle . A Digital Signature Based on a Conventional EncryptionFunction . In

Carl Pomerance , editor,

Advances in Cryptology — CRYPTO ’87 ,Lecture Notes in Computer Science, pages 369–378, Berlin, Heidelberg, 1988. Springer.19[25]

Leslie Lamport . Time, Clocks, and the Ordering of Events in a DistributedSystem . Commun. ACM , (7):558–565, July 1978. 24[26] . ISO/IEC 9899:2011

Zvi Kedem and Abraham Silberschatz . Controlling Concurrency UsingLocking Protocols . In , pages 274–285, October 1979. 26[28]

R. J. T Morris and W. S Wong . Performance Analysis of Locking and Opti-mistic Concurrency Control Algorithms . Performance Evaluation , (2):105–118,May 1985. 26[29] Rachid Guerraoui, Hugo Guiroux, Renaud Lachaize, Vivien Quéma, andVasileios Trigonakis . Lock& . ACM Trans. Comput. Syst. , (1):1:1–1:149, March 2019. 26[30] Ralf Jung, Jacques-Henri Jourdan, Robbert Krebbers, and DerekDreyer . RustBelt: Securing the Foundations of the Rust Programming EFERENCESLanguage . Proc. ACM Program. Lang. , (POPL):66:1–66:34, December 2017. 26,50[31] Aaron Weiss, Olek Gierczak, Daniel Patterson, Nicholas D. Matsakis,and Amal Ahmed . Oxide: The Essence of Rust . ArXiv190300982 Cs , August2020. 26[32]

Maurice Herlihy and Nir Shavit . The Art of Multiprocessor Programming, Re-vised Reprint . Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1st edi-tion, 2012. 26[33]

Tom Knight . An Architecture for Mostly Functional Languages . In

Pro-ceedings of the 1986 ACM Conference on LISP and Functional Programming , LFP’86, pages 105–112, New York, NY, USA, August 1986. Association for ComputingMachinery. 27[34]

Lance Hammond, Vicky Wong, Mike Chen, Brian D. Carlstrom, John D.Davis, Ben Hertzberg, Manohar K. Prabhu, Honggo Wijaya, ChristosKozyrakis, and Kunle Olukotun . Transactional Memory Coherence andConsistency . In

Proceedings of the 31st Annual International Symposium on Com-puter Architecture , ISCA ’04, page 102, USA, March 2004. IEEE Computer Society.27[35]

Maurice Herlihy and J. Eliot B. Moss . Transactional Memory: Archi-tectural Support for Lock-Free Data Structures . SIGARCH Comput. Archit.News , (2):289–300, May 1993. 27[36] Share Memory By Communicating - The Go Blog .https://blog.golang.org/codelab-share. 28[37]

MostlyAdequate/Mostly-Adequate-Guide . https://github.com/MostlyAdequate/mostly-adequate-guide. 28[38]

Ricardo J. Dias, João M. Lourenço, and Nuno M. Preguiça . Eﬃcient andCorrect Transactional Memory Programs Combining Snapshot Isolation and StaticAnalysis . 29[39]

Sigmund Cherem, Trishul Chilimbi, and Sumit Gulwani . Inferring Locksfor Atomic Sections . August 2007. 2977

EFERENCES [40]

Alessio Meneghetti, Tommaso Parise, Massimiliano Sala, and DanieleTaufer . A Survey on Eﬃcient Parallelization of Blockchain-Based SmartContracts . ArXiv190400731 Cs , February 2019. 31[41]

Vitalik Buterin and Virgil Griffith . Casper the Friendly Finality Gadget . ArXiv171009437 Cs , January 2019. 31[42]

Alistair Stewart . Poster: GRANDPA Finality Gadget . In

Proceedings of the2019 ACM SIGSAC Conference on Computer and Communications Security , CCS ’19,pages 2649–2651, New York, NY, USA, November 2019. Association for ComputingMachinery. 31[43]

Sébastien Forestier, Damir Vodenicarevic, and Adrien Laversanne-Finot . Blockclique: Scaling Blockchains through Transaction Sharding ina Multithreaded Block Graph . ArXiv180309029 Cs , September 2019. 32[44]

Mustafa Al-Bassam, Alberto Sonnino, Shehar Bano, Dave Hrycyszyn,and George Danezis . Chainspace: A Sharded Smart Contracts Platform . ArXiv170803778 Cs , August 2017. 32[45]

Baheti Shrey, Anjana Parwat Singh, Peri Sathya, and Simmhan Yogesh . DiPETrans: A Framework for Distributed Parallel Execution of Transac-tions of Blocks in Blockchain . ArXiv190611721 Cs , June 2019. 32[46]

Huma Pervez, Muhammad Muneeb, Muhammad Usama Irfan, and Irfan UlHaq . A Comparative Analysis of DAG-Based Blockchain Architectures .In , pages 27–34, December 2018. 32[47]

Divya M and Nagaveni B. Biradar . IOTA-Next Generation Block Chain . Int. J. Eng. Comput. Sci. , (04):23823–23826, April 2018. 32[48] Yonatan Sompolinsky, Yoad Lewenberg, and Aviv Zohar . SPECTRE: AFast and Scalable Cryptocurrency Protocol . Technical Report 1159, 2016. 32[49]

Daniel Perez and Benjamin Livshits . Broken Metre: Attacking ResourceMetering in EVM . ArXiv190907220 Cs , March 2020. 35[50]

Thomas Dickerson, Paul Gazzillo, Maurice Herlihy, and Eric Koskinen . [SmartLocks] Adding Concurrency to Smart Contracts . ArXiv170204467 Cs ,February 2017. 36, 37 78

EFERENCES [51]

Enabling Concurrency on Smart Contracts Using Multiversion Ordering . SpringerBerlin Heidelberg, New York, NY, 2018. 37[52]

Parwat Singh Anjana, Sweta Kumari, Sathya Peri, and Archit Somani . Eﬃcient Concurrent Execution of Smart Contracts in Blockchains UsingObject-Based Transactional Memory . ArXiv190400358 Cs , August 2019. 37[53]

Parwat Singh Anjana, Sweta Kumari, Sathya Peri, Sachin Rathor, andArchit Somani . An Eﬃcient Framework for Optimistic Concurrent Execu-tion of Smart Contracts . ArXiv180901326 Cs , January 2019. 37[54]

Vikram Saraph and Maurice Herlihy . An Empirical Study of SpeculativeConcurrency in Ethereum Smart Contracts . ArXiv190101376 Cs , January 2019.37, 39, 72[55]

Massimo Bartoletti, Letterio Galletta, and Maurizio Murgia . A TrueConcurrent Model of Smart Contracts Executions . ArXiv190504366 Cs , May2019. 38[56] darryl . RCast 21: The Currency of Concurrency , March 2019. 38[57]

David Turner . The Polymorphic Pi-Calculus: Theory and Implementation .July 1996. 38[58]

L Burkholder . The Halting Problem . SIGACT News , (3):48–60, April 1987.47, 69[59] Kian Paimani . Kianenigma/SubSonic , September 2020. 50[60]

Steve Klabnik and Carol Nichols . The Rust Programming Language (CoversRust 2018) . No Starch Press, September 2019. 50[61]

Fearless Concurrency - The Rust Programming Language . https://doc.rust-lang.org/book/ch16-00-concurrency.html. 50[62]

Rust in Blockchain . https://rustinblockchain.org/. 51[63]

J. Barnat, P. Ročkai, V. Štill, and J. Weiser . Fast, Dynamically-SizedConcurrent Hash Table . In

Bernd Fischer and Jaco Geldenhuys , editors,

Model Checking Software , Lecture Notes in Computer Science, pages 49–65, Cham,2015. Springer International Publishing. 5179

EFERENCES [64]

Rust’s Journey to Async/Await

Tokio - Rust . https://docs.rs/tokio/0.3.2/tokio/. 52[66]

Std::Sync::Mpsc - Rust . https://doc.rust-lang.org/std/sync/mpsc/. 52[67]

Esko Nuutila and Eljas Soisalon-Soininen . On Finding the StronglyConnected Components in a Directed Graph . Information Processing Letters , (1):9–14, January 1994. 56[68] RefCell < T > and the Interior Mutability Pattern - The Rust ProgrammingLanguageand the Interior Mutability Pattern - The Rust ProgrammingLanguage