Automated Test-Case Generation for Solidity Smart Contracts: the AGSolT Approach and its Evaluation
Stefan Driessen∗, Dario Di Nucci∗, Geert Monsieur†, Damian A. Tamburri†, and Willem-Jan van den Heuvel∗
∗ Tilburg University, Jheronimus Academy of Data Science, Netherlands
† Eindhoven University of Technology, Jheronimus Academy of Data Science, Netherlands

THIS WORK HAS BEEN SUBMITTED TO THE IEEE FOR POSSIBLE PUBLICATION. COPYRIGHT MAY BE TRANSFERRED WITHOUT NOTICE, AFTER WHICH THIS VERSION MAY NO LONGER BE ACCESSIBLE.
Abstract—Blockchain and smart contract technology are novel approaches to data and code management that facilitate trusted computing by allowing for development in a distributed and decentralized manner. Testing smart contracts comes with its own set of challenges which have not yet been fully identified and explored. Although existing tools can identify and discover known vulnerabilities and their interactions on the Ethereum blockchain through random search or symbolic execution, no framework exists for applying advanced, multi-objective algorithms to create test suites for such smart contracts. In this paper, we present AGSolT (Automated Generator of Solidity Test Suites). We demonstrate its efficiency by implementing two search algorithms to automatically generate test suites for stand-alone Solidity smart contracts, taking into account some of the blockchain-specific challenges. To test AGSolT, we compared a random search algorithm and a genetic algorithm on a set of 36 real-world smart contracts. We found that AGSolT is capable of achieving high branch coverage with both approaches and even discovered some errors in some of the most popular Solidity smart contracts on GitHub.
Index Terms—Automated Test Case Generation; Smart Contracts; Blockchain; Search Algorithms.
1 INTRODUCTION
Blockchain and smart contract technologies are novel approaches to data and code management. They facilitate trusted computing by gracefully allowing for development in a distributed and decentralized manner. Smart contracts are capsules of code, similar to classes in object-oriented programming languages such as Java and Python, which are deployed on distributed systems such as blockchains.

Smart contracts and blockchains have seen a major rise in popularity in recent years [1], [2]. In large part, this is due to the inherent qualities of blockchains, such as immutability of data and ease of access to the data stored, which renders extensive testing of critical importance, especially before code deployment. So far, research on testing smart contracts has focused primarily on identifying smart contract and blockchain vulnerabilities [3], [1], [2], [4], and applying rather basic techniques such as fuzzing to detect these vulnerabilities [5], [6], [7].

Forays into the field of automated test case generation for smart contracts such as Oyente [2] and ContractFuzzer [5] are certainly promising but unfortunately do not facilitate unit testing of smart contracts. Additionally, existing literature has suggested that these approaches run the serious risk of being too simplistic to fully capture more complex applications [8], [9].

This paper analyzes some of the challenges of designing a tool for automated test case generation for smart contracts. We find that the challenges previously identified [10] for popular programming languages such as Java and C still hold, but additional qualities are desirable. To handle these challenges, we present AGSolT (Automated Generator of Solidity Test Suites), an automated test case generation tool for unit testing the smart contract programming language Solidity on the Ethereum blockchain. AGSolT creates concise test suites for individual smart contracts while aiming to achieve a high level of branch coverage.
AGSolT could lead the way to creating higher-quality test cases that exercise more in-depth features of smart contracts, as it facilitates the application of metaheuristic techniques for automated test-case generation for Solidity smart contracts.

AGSolT implements two common approaches for automated test case generation: (1) fuzzing, which is a random testing approach, and (2) genetic algorithms, which are a search-based testing approach. On the one hand, fuzzing generates test cases randomly; on the other hand, genetic algorithms iteratively improve an initially random set of test cases through a search guided by one or more objective functions. Previous research [11], [8] has shown that both approaches can be equally effective when generating test suites, which makes them both valid approaches to an automated test case generation problem.

We conducted an empirical study on 36 real-world smart contracts to assess the effectiveness of AGSolT and take a closer look at how the two approaches compare for Solidity smart contracts. As far as the authors are aware, this is the first comparison of the sort in the domain of smart contracts. We find that AGSolT achieves good branch coverage on a variety of smart contracts and can detect errors in some of the most popular smart contracts on GitHub. Both approaches show promise for future investigation,
although genetic algorithms might be slightly more suitable for achieving branch coverage on specific types of smart contracts. Although neither approach is significantly faster than the other, our experiments seem to indicate that a guided search that prefers smaller test cases might be better at reducing the time spent running the tests on a blockchain implementation.

In sum, this paper contributes to the state of the art by:
1) Proposing a set of challenges specific to the blockchain domain that any automated test case generation tool should aim to overcome.
2) Introducing AGSolT, an automated test case generation tool, capable of:
   a) Creating small, human-readable test suites that are optimized for branch coverage.
   b) Allowing for the implementation of different types of algorithms, such as random testing and search-based testing.
   c) Being easily adapted to allow for different types of objectives, such as mutation coverage or statement coverage.
3) Providing the first comparison between a guided search and a random search in the domain of automated test case generation for smart contracts.

The rest of this paper is organized as follows: Section 2 introduces the concept of smart contracts in the context of the Ethereum blockchain and discusses existing ATG tools for these smart contracts. Section 3 formalizes the challenges that we identify for creating an ATG tool for smart contracts. In Section 4, the AGSolT tool is introduced, and its workings are explained.

1. Available online at https://github.com/AGSolT/AGSolT-2020-Submission
Section 5 describes the design and the results of the empirical study we conducted to evaluate AGSolT and compare the search-based and random algorithms, while Section 6 discusses its threats to validity. Finally, Section 7 discusses the results of these experiments and introduces potential future work.
2 BACKGROUND
This section provides an overview concerning blockchain, smart contracts, and their testing.
A blockchain [12], [13] can be viewed as a decentralized, distributed digital ledger: an ordered list of blocks, which themselves contain an ordered list of transactions (see also Figure 1). New blocks are added by miners, who follow a consensus protocol that dictates the rules of the blockchain, including how to add new data and how to deal with conflicting versions of the blockchain. On the Ethereum blockchain, transactions can be used to transfer Ether (ETH) cryptocurrency from one address to another and to deploy and interact with smart contracts. Ether is also used to compensate the miner, who receives a small fee (called Gas) from the transaction sender for registering a transaction on the blockchain.

Because of its inner workings, the Ethereum blockchain
Fig. 1. A blockchain as a ledger: an ordered list of blocks, each of which contains an ordered list of transactions.

can store (almost) any data type, so long as modifications are made in a transaction-based manner. Its creators have leveraged this property to store (compiled) pieces of code, called smart contracts, on the blockchain, which can be used as follows. Each transaction has a “Data” field where a transaction sender can store bytecode. When sending to a previously unused address, this bytecode can be interpreted by miners that use the
Ethereum Virtual Machine (EVM) to create new smart contracts whose bytecode is stored on the blockchain at the new address.

After a contract has been deployed, the transactions sent to its address can invoke the execution of the code stored on the blockchain by including the method to be invoked and any input parameters in the Data field of the transaction. The EVM specifies how to alter the state of the system based on the Data field of the transaction and the code stored at the specified address [14]. If a transaction is issued without a recognizable method in its “Data” field, a special function called
FALLBACK is invoked.

Since writing bytecode by hand is impractical, several high-level programming languages have been created, the most popular of which is
Solidity, which is inspired by Python, C++, and JavaScript [15]. Smart contracts in Solidity are similar to classes in object-oriented programming and behave similarly to objects: the smart contract code serves as a blueprint to deploy many instances on the blockchain, each with their own address and internal state. Similarly, Solidity smart contracts have both public functions and variables that can be accessed from outside the smart contract and private functions and variables that can only be interacted with by the contract itself.
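As an aside, the dispatch-plus-fallback behaviour described above can be mimicked with a small Python stub (Python being the language AGSolT itself is mostly written in). This is a loose analogy for illustration only: the real EVM dispatches on a 4-byte function selector in the Data field, not on method names.

```python
class ContractStub:
    """Toy analogue of EVM method dispatch on a transaction's Data field."""

    def __init__(self):
        self.balance = 0
        self.fallback_called = False

    def handle_transaction(self, data, value=0):
        # Look up the method named in the Data field; if nothing
        # matches, the special fallback function is invoked instead.
        method = getattr(self, data, None)
        if callable(method) and data not in ("handle_transaction", "fallback"):
            return method(value)
        return self.fallback(value)

    def deposit(self, value):
        self.balance += value
        return self.balance

    def fallback(self, value):
        # Runs for unrecognized methods and plain Ether transfers.
        self.fallback_called = True
        self.balance += value
        return self.balance


contract = ContractStub()
contract.handle_transaction("deposit", value=5)     # recognized method
contract.handle_transaction("no_such_fn", value=2)  # triggers fallback
```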
Detecting vulnerabilities in smart contracts has been a hot research topic in recent years, especially since the infamous DAO (Distributed Autonomous Organisation) attack in 2016, where roughly 60 million dollars worth of Ether was stolen because of an unforeseen exploit in a published smart contract [16]. Due to specific blockchain properties, such as the immutability of committed blocks and its distributed and decentralized nature, the proper implementation of smart contracts is particularly challenging. For one of the most extensive analyses of these challenges, we suggest the reader look at Butijn et al. [17]. Below we briefly discuss some of the relevant work in the field of testing smart contracts.

Delmolino et al. [3] found that when teaching undergraduate students to create smart contracts, even simple
2. Distributed in this context implies that anyone can access the bytecode of a smart contract; decentralized means that anyone can interact with a deployed contract.
implementations lead to a multitude of non-trivial problems. Often, such problems do not prevent compilation but leave the contract vulnerable to exploitation or unintended behaviors. Anderson et al. [1], Luu et al. [2], and Atzei et al. [4] investigated already published contracts and highlighted that some of them present design flaws although already published on the blockchain. Recently, Zou et al. [18] investigated the challenges related to smart contract testing and confirmed that almost half of all developers desire tools to verify code correctness.

The above studies and the previously mentioned DAO attack motivated the introduction of new development tools to develop and test safe smart contracts effectively [19]. Several such tools are introduced briefly below. SOLIDITY-COVERAGE [20] measures the quality of an existing test suite by checking whether branch coverage [21] has been achieved (i.e., whether all possible paths through the code have been executed). SOLIDITYCHECK [22] checks Solidity code for patterns that are known to lead to vulnerabilities and warns the user about them. Wu et al. [23] have designed 15 mutation operators for Solidity smart contracts and use these to detect defects in 26 real-world smart contracts. OYENTE [2] creates a control-flow graph for a given smart contract and uses symbolic execution to check its branch feasibility (i.e., whether each part of the code is theoretically reachable), as well as whether vulnerabilities are present. SMARTSHIELD [24] is a bytecode rectification tool that automatically fixes three types of security bugs. ADF-GA [25] uses control-flow graphs with dup-based covering criteria but only tests on a small set of smart contracts that only use integers and unsigned integers. A recent addition by Liu et al. is MODCON, which relies on user-defined models to impose model testing on smart contracts [26].

Finally, fuzzers [27] automatically create test cases for smart contracts by generating random (within a specified range) inputs for contract functions to detect errors. SOLFUZZER [6] has been created by the Solidity developers to detect internal compilation errors and segmentation faults. The commercial ECHIDNA [7] tries to break user-defined invariants, while the academic CONTRACTFUZZER [5] checks for both coding errors and the vulnerabilities mentioned by Luu et al. [2] and Bartoletti et al. [28].

When it comes to automated unit testing of Solidity smart contracts on the Ethereum blockchain, each of the approaches mentioned above comes with its limitations: (i) SOLIDITY-COVERAGE [20], SOLIDITYCHECK [22], and OYENTE [2] do not produce test suites; (ii) SMARTSHIELD [24] focuses on only three vulnerabilities; (iii) ECHIDNA [7] and MODCON [26] require the user to define invariants or models of their code; and (iv) ADF-GA [25] is tested on a small subset of possible smart contracts. CONTRACTFUZZER [5] is perhaps the most complete approach out there because it creates test suites fully automatically and works on a variety of smart contracts. However, the tool creators do not make explicit their approach to dealing with the blockchain and smart contract aspects of their automated testing. Additionally, fuzzing tools have been theorized to be less effective at covering complex functionality than guided search algorithms such as genetic algorithms.

For this reason, we introduce AGSolT (Automated Generator of Solidity Test Suites). This tool can easily leverage different search algorithms to automatically generate test suites for Solidity smart contracts that aim to achieve branch coverage. In the next sections, we first introduce the challenges that any tool or framework that sets out to achieve this goal will meet and then discuss how AGSolT aims to overcome these challenges. Finally, we demonstrate the effectiveness and efficiency of the tool by experimenting on 36 real-world smart contracts.
3 SMART CONTRACT TESTING
This section focuses on the challenges and discusses some of the qualities that an effective automated test case generation tool for (Solidity) smart contracts should possess. To achieve high branch coverage with good efficiency, such a tool should handle the same issues that similar tools have, such as creating control dependency graphs and measuring branch distances. Furthermore, several challenges are specific to the deployment, execution, and testing of (Solidity) smart contracts on a blockchain.
The only way to change the state of a smart contract is by sending a transaction to the contract's address and invoking one of its functions. Besides the function and parameter specification, every interaction with a smart contract has to provide a sender, which is the address from which the transaction was sent, a value, which is the amount of Ether sent in the transaction, and an amount of gas, which is the fee that the sender has to pay to the miner for the computational power involved in adding this transaction to the block. These transaction properties can be accessed by the smart contract receiving the transaction and influence its inner workings, affecting which branches are traversed.

 1  pragma solidity ^0.5.0;
 2
 3  contract Auction {
 4      address payable public Seller;
 5      address payable public Frontrunner;
 6      uint public HighBid;
 7      uint public CloseTime;
 8
 9      constructor(uint _CloseTime) payable public {
10          Seller = msg.sender;
11          Frontrunner = msg.sender;
12          HighBid = msg.value;
13          CloseTime = _CloseTime;
14      }
15
16      function Bid() payable external {
17          require(msg.value > HighBid);
18          Frontrunner.send(HighBid);
19          HighBid = msg.value;
20          Frontrunner = msg.sender;
21      }
22
23      function Claim() external {
24          require(block.timestamp > CloseTime);
25          // Implement ownership transfer
26          selfdestruct(Seller);
27      }
28  }

Smart Contract 1. An example of Ethereum-specific properties.
3. Many blockchain interaction platforms do not require a value be specified, in which case this defaults to zero.
As an illustration, Smart Contract 1 shows an example of a simple auction on the Ethereum blockchain. When the contract is initiated, the constructor is executed, which instantiates the Seller, Frontrunner, HighBid, and CloseTime variables. Afterwards, anyone can make a bid by calling the Bid() function (lines 16-21). This function first checks whether the transaction property msg.value (the new bid) is higher than the current highest bid and, if it is, refunds the previous highest bidder (Frontrunner) and changes the highest bid and frontrunner based on the transaction information msg.value and msg.sender respectively.

Any automated test-case generation tool for smart contracts should generate test cases containing transactions from different accounts to test sender-dependent functionality. Similarly, the tool should vary the amount of Ether sent with a transaction and evolve it, either as if it were an input variable or chosen for this purpose.
Besides transaction properties, a smart contract has access to additional information from the blockchain environment on which it is deployed, such as the address of the miner of the current block, the gas limit of the current block (i.e., the maximum amount of computation in a block), the hash of recently added blocks, and the time and block number of the current block. Moreover, because each smart contract has an address, it has a balance in Ether associated with it, which affects its ability to send Ether. An example of this is given by the Claim() function in Smart Contract 1, which compares the blockchain property block.timestamp (which gives the time since the Unix epoch for this block) with the user-specified CloseTime before the auction can be closed. If the specified time has been reached, the smart contract removes itself from the blockchain and sends its entire balance to the seller. These blockchain properties can be manipulated (within certain limitations) by the test environments. A useful test case generation tool should vary some or all of these blockchain properties for better testing while at the same time respecting the logical rules of the blockchain, such as that block numbers and time must always increase between different blocks.
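For example, a mock block-environment generator can vary these properties while enforcing the monotonicity rules. The sketch below uses hypothetical helper names, not AGSolT's actual API:

```python
import random

def next_block_env(prev=None):
    """Produce randomized blockchain properties for the next block,
    respecting the rule that block number and timestamp must strictly
    increase between blocks."""
    if prev is None:
        return {"number": 1, "timestamp": 1_600_000_000, "gaslimit": 8_000_000}
    return {
        "number": prev["number"] + 1,
        # advance time by a random positive amount
        "timestamp": prev["timestamp"] + random.randint(1, 3600),
        "gaslimit": 8_000_000,  # held constant in this sketch
    }

envs = [next_block_env()]
for _ in range(4):
    envs.append(next_block_env(envs[-1]))
```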
Similarly to how Java classes can instantiate and interact with other classes, smart contracts on the Ethereum blockchain can instantiate and send transactions, such as method invocations or Ether transfers, to other smart contracts. However, there are two essential differences. First, smart contracts can send transactions to any address on the blockchain, allowing them to transfer Ether to a wallet or call functions of any smart contract on the same blockchain, as long as that contract's address is passed as a variable to the calling smart contract. A special case occurs when the contract sends a transaction to the so-called zero-address (0x0). Second, other parties can send transactions to smart contracts to invoke the contract's functionality or purely transfer currency.

4. Any of the last 256 blocks.

At line 18 of the Bid() function in Smart Contract 1, the previous Frontrunner is refunded the bid. If such Frontrunner is a smart contract, this transaction invokes the fallback function of that smart contract, which could, in turn, call one of the functions in the Auction smart contract.

Automated test case generation tools for smart contracts should be aware of both existing addresses, as well as non-existing addresses and the zero-address, which might cause errors in the smart contract. Additionally, they should anticipate interaction with smart contracts outside the programmer's control.

4 AGSOLT: AUTOMATED GENERATOR OF SOLIDITY TEST SUITES
This section describes the design choices and algorithmic procedures that make up the main workings of AGSolT. Figure 2 shows its high-level workings, which are composed of an initialization phase and a testing loop. In the initialization phase, relevant properties of the smart contract(s) under examination are extracted, which are required during the testing loop. During the testing loop, test cases are run on a blockchain implementation and their performance is evaluated using the branch distance. AGSolT is mostly implemented in Python, except the instrumentation of the blockchain, which is done through the WEB3 library in JavaScript.

During the initialization phase, AGSolT extracts several characteristics of the smart contract under investigation to create the first generation of test cases, which can be improved in the testing loop. When Solidity code is compiled into bytecode, an Application Binary Interface (ABI) file is created. This file contains the basic information necessary for test case generation (i.e., function names and input types). Additionally, during this step, hard-coded values of the contract are scraped to be used for seeding the method invocations. Seeding is a common technique used in automated test case generation, which involves including certain values with higher probability when randomly selecting input variables [29]. In AGSolT, whenever a random input variable, ETH value, or account is selected, a check is first performed on whether one or more hard-coded values of the corresponding type were present in the smart contract (and consequently scraped). If there are such values, 50% of the time a random scraped value is picked instead of a completely random value. This allows AGSolT to automatically leverage information inside the smart contract to reach branch coverage quicker.
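The seeding rule can be sketched as follows (the function names are illustrative, not AGSolT's actual code):

```python
import random

def pick_input(random_value_fn, scraped_values, seed_prob=0.5):
    """Return a random input value, but with probability `seed_prob`
    reuse a constant scraped from the contract source (seeding)."""
    if scraped_values and random.random() < seed_prob:
        return random.choice(scraped_values)
    return random_value_fn()

# e.g. a uint input for a contract whose source hard-codes the constant 42
scraped_uints = [42]
value = pick_input(lambda: random.randrange(2**256), scraped_uints)
```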
5. We use GANACHE.
Fig. 2. The flowchart of AGSolT; each step is explained in a corresponding section.

TABLE 1
EVM opcodes needed to generate test cases

Hex | Opcode   | Stack Input | Stack Output
56  | JUMP     | dest        |
57  | JUMPI    | dest, bool  |
5B  | JUMPDEST |             |
10  | LT       | a, b        | a < b
11  | GT       | a, b        | a > b
12  | SLT      | a, b        | a < b
13  | SGT      | a, b        | a > b
14  | EQ       | a, b        | a = b
15  | ISZERO   | a           | a = 0
To keep track of the branches to be traversed and those already covered, AGSolT extracts the control dependency graph [30] of the smart contract. To this end, the Control Flow Graph (CFG) is distilled from the bytecode. The CFG is created from the bytecode to retrieve the values on the stack when the EVM evaluates a predicate controlling a branch. These values are needed later for deciding how close a test case is to satisfying a predicate and, consequently, traversing the branch it controls. Table 1 shows the 9 opcodes that are relevant for identifying nodes and branches: the Opcode column gives their mnemonic, alongside their hex value as it appears in bytecode, the argument(s) they consume from the stack, and the output value they push onto the stack. The JUMP opcode is used to jump to a different part of the bytecode for execution (indicated by the destination value). The JUMPI opcode works similarly to JUMP, except that execution continues from the destination only if the consumed bool is true; this is what creates branches in the CFG. Finally, the other opcodes shown in Table 1 correspond to the predicates that can control a branch: <, >, == and ¬. Note that ≤, ≥ and ≠ can be represented with ¬>, ¬< and ¬== respectively. For each branching node, AGSolT identifies the opcode that corresponds to the controlling predicate to compute the branch distance for the outgoing branches.
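The pairing of comparison opcodes with the JUMPI they feed can be sketched with a simple linear scan. This is a simplification of a real CFG builder, not AGSolT's actual implementation; note that PUSH1 through PUSH32 (opcodes 0x60-0x7f) carry 1-32 immediate bytes that a scanner must skip:

```python
# Branch-relevant opcodes from Table 1 (hex value -> mnemonic)
PREDICATES = {0x10: "LT", 0x11: "GT", 0x12: "SLT",
              0x13: "SGT", 0x14: "EQ", 0x15: "ISZERO"}
JUMPI = 0x57

def find_branches(bytecode):
    """Pair each JUMPI with the most recent comparison opcode,
    skipping the immediate data carried by PUSH instructions."""
    branches, last_pred, pc = [], None, 0
    while pc < len(bytecode):
        op = bytecode[pc]
        if op in PREDICATES:
            last_pred = PREDICATES[op]
        elif op == JUMPI:
            branches.append((pc, last_pred))
            last_pred = None
        if 0x60 <= op <= 0x7F:   # PUSHn: skip its n immediate bytes
            pc += op - 0x5F
        pc += 1
    return branches

# toy fragment: PUSH1 0x05, PUSH1 0x03, LT, PUSH1 0x0B, JUMPI
code = bytes([0x60, 0x05, 0x60, 0x03, 0x10, 0x60, 0x0B, 0x57])
```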
Algorithm 1 COMPACTIFYCFG
Input: N ▷ The set of all nodes in the CFG.
Output: N′ ▷ The set of nodes where nodes with superfluous branches have been merged.
1: procedure COMPACTIFYCFG
2:   UN ← N ▷ Initialise the unmerged nodes.
3:   MN ← ∅ ▷ Initialise the merged nodes.
4:   while UN ≠ ∅ do
5:     Node ← any n ∈ UN | n.incoming_nodes ∩ UN = ∅
6:     UN ← UN − Node
7:     MN ← MN ∪ {Node}
8:     UN, MN ← COMPACTIFY(Node, UN, MN)
9:   end while
10:  return MN
11: end procedure

Algorithm 2 COMPACTIFY
Input: Node ▷ A node to be compactified.
       UN ▷ The set of unmerged nodes.
       MN ▷ The set of merged nodes.
Output: UN′ ▷ The updated set of unmerged nodes.
        MN′ ▷ The updated set of merged nodes.
1: procedure COMPACTIFY
2:   if Node.outgoing_nodes ≠ 1 then
3:     return UN, MN
4:   else if Node.incoming_nodes ≠ 1 then
5:     return UN, MN
6:   else
7:     nextNode ← Node.outgoing_node
8:     UN ← UN − nextNode
9:     MN ← MN ∪ {nextNode}
10:    Node ← Node ⊕ nextNode
11:  end if
12:  return COMPACTIFY(Node, UN, MN)
13: end procedure

Some edges of the graph lead to superfluous nodes whose execution neither leads to nor depends on any predicate and could waste part of the search budget. Therefore, they are eliminated by running the COMPACTIFYCFG algorithm shown in Algorithm 1, which uses the COMPACTIFY procedure in Algorithm 2. Finally, AGSolT uses the algorithm proposed by Lengauer and Tarjan [31] to determine the control dependencies between the nodes and distil the Control Dependency Graph from the
Control Flow Graph.

Considering some specific characteristics of the Ethereum blockchain, the graph can still be optimized by removing some nodes that are not relevant for test case generation. These nodes and edges belong to the following patterns:

• Dispatcher Nodes and Edges. The Ethereum bytecode contains a dispatcher function that handles the transactions to the smart contract. Since AGSolT invokes all (public) methods, there is no need to calculate the branch distance for these edges.

• Empty Fallback.
An empty fallback function is initialized when the user does not explicitly define one. However, such a function can be safely ignored as it does not change the semantics.

• State Variables. Public variables are accessed as functions through the contract dispatcher. Since calling these variables does not help cover new branches, the corresponding nodes and edges in the CDG can be ignored.

An example of these patterns is shown in Fig. 3: the control dependency graph of Smart Contract 1 starts with dispatcher nodes (even nodes), which are used to identify the method or state variable (starting at uneven nodes) that is called. If none of the state variables or methods was passed in the transaction, the fallback function (starting at node 12) is invoked. Since no fallback function was specified in Smart Contract 1, this method is empty, and neither invoking it nor any of the state variables is particularly interesting for testing purposes. For that reason, AGSolT removes the edges and nodes corresponding to the dispatcher, state variables, and empty fallback functions (shown dotted in Fig. 3) and creates new edges to the relevant methods (Bid and Claim), which are shown in bold in Fig. 3.
Fig. 3. CDG of Smart Contract 1.

• Payable Check. A Solidity function can be declared as payable if it accepts transactions that have an associated Ether value. When a function is not declared as payable, the Ethereum compiler makes sure that the EVM reverts such transactions. Since our goal is to test only the functionalities implemented by the developer, AGSolT ignores such branches and simply does not send Ether to non-payable functions.

As an example, Figure 4 shows the CDG of the Claim function reported in Smart Contract 1. Before going from line 23 to 24, the EVM verifies if the transaction has an Ether value and, if it does, reverts the transaction. AGSolT trims the dashed nodes and edges and merges the start node with node 3.
Fig. 4. CDG of the Claim function in Smart Contract 1.
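The node-merging of Algorithms 1 and 2 can be approximated on a toy graph encoding. This is a sketch under simplifying assumptions: AGSolT's real implementation operates on bytecode-level CFG nodes, whereas here merging is modeled by concatenating node labels:

```python
def compactify_cfg(succ):
    """Merge chains of single-successor nodes whose successor has a
    single predecessor (cf. Algorithms 1 and 2), returning merged
    node labels. `succ` maps each node to its list of successors."""
    pred = {n: [] for n in succ}
    for n, outs in succ.items():
        for m in outs:
            pred[m].append(n)

    merged, seen = [], set()
    for n in list(succ):          # simplified processing order
        if n in seen:
            continue
        chain = [n]
        seen.add(n)
        # follow the chain while the merge condition holds
        while (len(succ[chain[-1]]) == 1
               and len(pred[succ[chain[-1]][0]]) == 1):
            nxt = succ[chain[-1]][0]
            if nxt in seen:
                break
            chain.append(nxt)
            seen.add(nxt)
        merged.append("+".join(chain))
    return merged

# A -> B -> C, where C branches to D and E; A, B, C form a mergeable chain
g = {"A": ["B"], "B": ["C"], "C": ["D", "E"], "D": [], "E": []}
```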
During the testing loop, the actual search for optimal test cases is performed until the budget is consumed. We first discuss the difference between a random and a search-based algorithm, followed by the general steps.
After extracting the required information, the population of test cases is initialized through a random algorithm. As in previous work, each test case is a sequence of statements t = ⟨s1, s2, ..., sn⟩ [32], [33], [34]. AGSolT relies on two types of statements:

• Constructor statements are used to deploy smart contracts on the blockchain. Such statements are used as the first statement s1 of each test case t to ensure that a fresh instance of the smart contract is instantiated for each test case, on which the function statements can be called. This statement type contains the information required to deploy an instance of the relevant smart contract on the blockchain, including the input variables required by the smart contract constructor and the transaction metadata, such as the amount of ETH sent with the transaction and the account from which the transaction is sent.

• Function statements are used to create transactions that invoke functions in the deployed smart contracts. Indeed, the only way to interact with a smart contract in Ethereum is by sending a transaction to its address. All the statements but the first (i.e., the constructor statement) in a test case are function statements, which are responsible for traversing the branches of the smart contract. This statement type contains a reference to the function to cover, its input variables, and the transaction metadata.

A set of test cases is initialized by creating N random test cases, where N is the population size, i.e., the number of test cases in any generation. When AGSolT relies on random search, test cases are generated by performing only this step. The search keeps running until either full branch coverage is achieved or the specified budget is consumed. At this point, the final population (i.e., the archive) is presented as the solution.
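Concretely, a test case under this representation might look like the following sketch (illustrative dataclasses, not AGSolT's actual types; the values reference Smart Contract 1):

```python
from dataclasses import dataclass, field

@dataclass
class Statement:
    """One transaction in a test case: the call target plus metadata."""
    name: str                      # "constructor" or a contract function name
    inputs: list = field(default_factory=list)
    value: int = 0                 # ETH (in wei) sent with the transaction
    sender: int = 0                # index of the sending test account

@dataclass
class TestCase:
    """t = <s1, s2, ..., sn>: a constructor statement followed by
    function statements, run against a fresh contract instance."""
    statements: list

auction_test = TestCase(statements=[
    Statement("constructor", inputs=[1_700_000_000], value=10),
    Statement("Bid", value=25, sender=1),
    Statement("Claim", sender=0),
])
```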
As shown in Figure 2, to improve the generated test cases, AGSolT can perform a guided search and a random search. For the former, we integrated DYNAMOSA (i.e., Many-Objective Sorting Algorithm with Dynamic target selection), the genetic algorithm proposed by Panichella et al. [35].
Genetic Algorithms are inspired by biological evolution:they work with a population of (candidate) solutions or chromosomes from which they derive a next generation of so-lutions by iteratively applying evaluation , selection , crossover ,and mutation . Mitchell [36] and Lucken et al. [37] providemore details on genetic algorithms for multi-objective prob-lems.D YNA
MOSA [35] is a state-of-the-art algorithm specifi-cally designed for automated test case generation. It facili-tates the creation of a small and effective test suite throughmulti-objective optimization inspired by NSGA-II [38]. Dy-naMOSA has been shown to significantly outperform othertest case generation algorithms, such as the Whole-SuiteApproach [32] and LIPS [39], [40] in terms of branch andmutation coverage on an extensive set of Java classes.
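The evaluate-select-crossover-mutate cycle can be sketched as a generic generational loop. The operators below are deliberately simplistic placeholders (elitist selection, one-point crossover, bit-flip mutation on a toy bit-string problem), not DynaMOSA's actual operators:

```python
import random

def evolve(population, fitness, crossover, mutate, generations=100, elite=10):
    """Generic genetic-algorithm loop: evaluate, select, recombine, mutate.

    fitness maps an individual to a score; lower is better, as with branch distances.
    """
    for _ in range(generations):
        # Evaluation: rank every individual by fitness.
        scored = sorted(population, key=fitness)
        # Selection: keep the best individuals as parents (elitism).
        parents = scored[:elite]
        # Crossover + mutation: refill the population from the parents.
        children = []
        while len(children) < len(population):
            a, b = random.sample(parents, 2)
            children.append(mutate(crossover(a, b)))
        population = parents + children[: len(population) - elite]
    return min(population, key=fitness)

# Toy usage: evolve 16-bit strings toward all-ones.
target = [1] * 16
fit = lambda ind: sum(t != i for t, i in zip(target, ind))        # distance to target
cross = lambda a, b: a[:8] + b[8:]                                 # one-point crossover
mut = lambda ind: [bit ^ (random.random() < 0.05) for bit in ind]  # bit-flip mutation
best = evolve([[random.randint(0, 1) for _ in range(16)] for _ in range(30)],
              fit, cross, mut)
```

Because the parents are carried over unchanged, the best fitness in the population never worsens between generations.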
Fitness Function.
The search algorithm is guided by the normalized branch distance, as defined by Arcuri et al. [32]. The normalized branch distance for a test case t and a branch b with controlling predicate p_b is given by:

d(t, b) = 0, if t satisfies p_b
d(t, b) = f_{p_b}(t, b) / (f_{p_b}(t, b) + 1), if p_b has been reached but not satisfied
d(t, b) = 1, otherwise    (1)

where f_{p_b}(t, b) is given by Korel's objective function for relational predicates, as shown in Table 2 [41]. Test cases with a smaller normalized branch distance are closer to covering the corresponding branch and are thus more desirable.

TABLE 2
Relational predicates and objective functions [41]

Relational predicate | Objective function f_p
a > b | b − a
a ≥ b | b − a
a < b | a − b
a ≤ b | a − b
a = b | abs(a − b)
a ≠ b | −abs(a − b)

Because we aim at covering all branches simultaneously, the goal of the search becomes the following, similar to what was previously formulated by Panichella et al. [35]:

Definition 1 (Fitness Function). Let B = {b_1, b_2, ..., b_k} be the set of branches in a smart contract. Find a test suite T = {t_1, t_2, ..., t_n} consisting of non-dominated test cases t that simultaneously minimize the fitness function for each branch b ∈ B, i.e., minimizing the following k objective functions:

f_1(t) = al(t, b_1) + d(t, b_1)
f_2(t) = al(t, b_2) + d(t, b_2)
...
f_k(t) = al(t, b_k) + d(t, b_k)    (2)

where al(t, b_i) is the approach level of t to b_i (i.e., the number of predicates between the closest branch executed by t and b_i) and d(t, b_i) is the minimal normalized branch distance of t to branch b_i ∈ B as defined in Equation (1). Note that in this multi-objective approach a distance is calculated for each objective (branch), so that rather than using a single distance to describe the fitness of a test case t, a distance vector d_t = <d(t, b_1), d(t, b_2), ..., d(t, b_k)> is used.

Selection Operation.
After randomly initializing the first generation of test cases and measuring the branch distances, the test cases are ranked using their Pareto fronts [38] as the primary criterion. Although in Multi-Objective Optimization having solutions that make trade-offs between objectives is usually desirable, this is not the case for Automated Test Case Generation. Indeed, only fully covered branches are relevant for the branch coverage, whereas a test that almost covers one (or more) uncovered branches does not add any value to the final test suite. When ranking test cases in the first Pareto front, DynaMOSA uses a preference criterion that generalizes this idea by determining, for each branch b ∈ B, the non-dominated test cases closest to covering b and (if there is more than one) the shortest one among those. More formally, as defined by Panichella et al. [35], the preference criterion is the following:

Definition 2 (Preference Criterion). Given a branch b_i with corresponding objective function d_i = d(t, b_i), a test case t is preferred over another test case t′ (written as t ≺_{b_i} t′) iff

d_i(t) < d_i(t′) OR (d_i(t) = d_i(t′) ∧ size(t) < size(t′))    (3)

where size is a function that gives the length (e.g., number of statements) of a given test case. Size is considered a secondary criterion to prioritize solutions because shorter solutions reduce the oracle cost for humans [42], [32].

To compare the test cases that are not in the same Pareto front and are not preferred by the preference criterion, the sub-vector-distance-assignment algorithm introduced by Köppen and Yoshida [43] is used as a secondary selection criterion. Its goal is to select the most diverse possible subset of solutions from the last Pareto front for the next generation.

Before evaluating and selecting the best test cases for the next generation, each test case runs on an Ethereum blockchain environment. As mentioned in Section 4.2.1, each test case starts with a constructor statement, which is used to deploy a new instance of the smart contract to the blockchain instance. By looking at the receipt of the
transaction, AGSolT instantiates the new smart contract and extracts its address on the blockchain. Afterward, each method call is executed by sending a transaction to the instance's address. The hash codes of the transactions, which identify each transaction on the blockchain, are stored for the next step.
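The branch distance of Equation (1) and Korel's objective functions from Table 2 can be sketched as follows. This is a simplified illustration over Python numbers, not AGSolT's implementation, which operates on EVM stack values; the constant-K adjustment for a zero objective value is a common convention assumed here:

```python
def korel_distance(op, a, b):
    """Korel's objective function for a relational predicate `a op b` (Table 2)."""
    return {
        ">":  b - a,
        ">=": b - a,
        "<":  a - b,
        "<=": a - b,
        "==": abs(a - b),
        "!=": -abs(a - b),
    }[op]

PREDICATES = {
    ">":  lambda a, b: a > b,
    ">=": lambda a, b: a >= b,
    "<":  lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
}

def normalized_branch_distance(reached, op, a, b):
    """Normalized branch distance d(t, b) of Equation (1).

    reached: whether the controlling predicate p_b was reached at all.
    """
    if not reached:
        return 1.0            # predicate never reached
    if PREDICATES[op](a, b):
        return 0.0            # predicate satisfied: branch covered
    f = korel_distance(op, a, b)
    if f <= 0:                # e.g. an unsatisfied strict predicate with equal operands:
        f = 1                 # add a constant K = 1 so the distance stays positive (assumption)
    return f / (f + 1.0)      # normalize into (0, 1)
```

For example, with the predicate `a > b`, the inputs (3, 5) are closer to covering the branch than (3, 10), and the normalized distances reflect that ordering.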
To compute the branch coverage for a test case as defined in Equation (1), two types of information are required: (i) the parts of the code covered by the test case and (ii) the values that are on the stack when a branch-controlling predicate is evaluated. AGSolT extracts this information through a slightly modified functionality of the JavaScript web3 debug module called getTransactionTrace. This module takes the transaction of a method call, recreates the blockchain state when the transaction was executed, and writes the executed opcodes and the stack evolution in a file used for the next evaluations.

Algorithm 3 Evaluate Test Case
Input:
  methodCalls ▷ The list of methods called by the test case
  Opcodelists ▷ The lists of opcodes executed by each Methodcall
  Callstacklists ▷ The lists of all the items on the stack when each opcode was executed
  Edges = {E_1, E_2, ..., E_n} ▷ The (ordered) list of Edges of the smart contract
  Nodes = {N_1, N_2, ..., N_m} ▷ The (ordered) list of Nodes of the smart contract
Result: a distance vector which contains the test case's distance to each branch

 1: procedure SetDistances
 2:   test_scores = [∞, ∞, ..., ∞]    ▷ Distance to each Edge
 3:   traversed = ∅                   ▷ The set of traversed edges
 4:   for each Methodcall, Opcodelist, Callstacklist do
 5:     curNode = startNode
 6:     while curNode ≠ endNode do
 7:       nextNode = FindNextNode(curNode, Opcodelist)
 8:       for each E_i ∈ Edges do
 9:         if E_i.startNode == curNode then
10:           test_scores[i] = min(test_scores[i], BranchDist(Opcodelist, Callstacklist, E_i))
11:         end if
12:         if E_i.endNode == nextNode then
13:           traversed = traversed ∪ {E_i}
14:         end if
15:       end for
16:       curNode = nextNode
17:     end while
18:   end for
19:   for each E_i ∈ Edges do
20:     if test_scores[i] == ∞ then
21:       test_scores[i] = ApproachLevel(E_i, traversed)
22:     end if
23:   end for
24: end procedure

After executing all the test cases and retrieving the necessary information, test cases are evaluated, as shown in Algorithm 3, to produce the distance vector, test_scores, describing the test case's fitness. For each test case, its distance to all branches is initialized as infinite (line 2). Additionally, AGSolT keeps track
of all traversed edges (initialized at line 3) to calculate the approach levels for those edges whose starting nodes are not reached during the execution. For every method call in the test case, AGSolT takes the corresponding list of executed opcodes and a list of lists containing all the values on the stack when executing each opcode (line 4). The first node in the CDG of any method is always the same (line 5), while its end is only reached when a node has no outgoing edges (line 6). Finding the next node using the FindNextNode method (line 7) means looking at the first opcodes executed after leaving the current node and comparing them to the opcodes of the nodes with an incoming edge from the current node. For each reached node, AGSolT analyzes the outgoing edges (lines 8-9) and updates the test_scores if the normalized branch distance from Equation (1) is smaller than the smallest distance found so far in the test case (lines 10-11). After identifying all traversed edges, AGSolT calculates, for each branch not covered, the approach level, i.e., the number of edges that would need to be traversed before the node controlling the branch can be reached (lines 19-23). Finally, if a test case outperforms the best test case found so far for a particular branch, it is stored in an archive, which keeps track of the best test case for each branch.

7. https://web3js.readthedocs.io/

It is important to note that (as can be seen in Fig. 2) both the random testing approach and the genetic approach go through steps 4.2.3 through 4.2.5. The key difference between these approaches is that genetic algorithms use selection, crossover, and mutation to create the next generation of test cases, while random testing creates a new set of randomly initialized test cases.

To deal with the challenges that arise from transaction properties, blockchain properties, and interactive properties mentioned in Section 3, AGSolT provides configuration options that deal with the Ethereum and Solidity blockchain and smart contract environment:

• Transaction Properties.
AGSolT extracts all the accounts of the blockchain environment and uses them both as senders of transactions and as input variables whenever an address type is required. Similarly, it keeps track of whether a function is payable and, if it is, will send an amount (between a configurable maximum and minimum) of Ether with the transaction. Both addresses and values can be evolved by the genetic algorithm as though they were input variables.

• Blockchain Properties. AGSolT allows the user to include a PassBlocks or PassTime method call in test cases, which instructs the blockchain environment to update the latest block number or the time rather than invoke smart contract functions (assuming the chosen blockchain environment allows these manipulations). Both the block number and the time can only increase, similarly to real-world Ethereum implementations. The miner configurations can be set in the blockchain environment; therefore, they are not manipulated in AGSolT.

• Interactive Properties. In addition to using the extracted accounts as input variables, AGSolT has an option to include specific non-existent accounts, which can trigger specific errors. This feature also allows the users to indicate a new contract creation through the zero-address (0x0). Users can also provide the addresses of other smart contracts to AGSolT as address input variables to test the interaction between the contracts; however, this functionality is not entirely supported. Additionally, the user can use this functionality to provide addresses that do not exist on the blockchain as input variables for the contract functions, to test the behavior of the contracts when the sent transactions fail.

At the end of the procedure shown in Fig. 2, AGSolT outputs a text file that gives information about the test suite and the test process. In particular, it includes the number of branches found and covered, the number of iterations through the loop before stopping, the total time spent testing, and the time spent running the tests on the blockchain. Afterward, the test cases are provided as constructor statements and method calls with the relevant input and transaction arguments. The test suite is easily interpretable for humans and can easily be automatically transformed into input for the user's preferred testing environment. In addition to the test suite, AGSolT writes out the same meta information of all tested contracts in a CSV file for easy comparison.
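With a Ganache-style development chain, PassTime and PassBlocks manipulations can be issued over JSON-RPC. The helpers below only build the request payloads; `evm_increaseTime` and `evm_mine` are Ganache-specific methods, and the provider wiring in the note afterwards is an assumption about how one might send them:

```python
def pass_time_request(seconds):
    """JSON-RPC payload asking a Ganache-style node to advance its clock."""
    assert seconds > 0, "time can only move forward, as on the real chain"
    return {"jsonrpc": "2.0", "method": "evm_increaseTime",
            "params": [seconds], "id": 1}

def pass_blocks_requests(n_blocks):
    """JSON-RPC payloads asking the node to mine n empty blocks."""
    assert n_blocks > 0, "the block number can only increase"
    return [{"jsonrpc": "2.0", "method": "evm_mine", "params": [], "id": i + 1}
            for i in range(n_blocks)]
```

With web3.py, such a payload could be sent against a running Ganache instance via `w3.provider.make_request(payload["method"], payload["params"])`.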
EMPIRICAL EVALUATION
This section reports the empirical study that we performed to compare the effectiveness, efficiency, and test case length of the two algorithms for test case generation implemented in AGSolT: namely, a fuzzer and DynaMOSA [35].
In an attempt to test on real-world smart contracts for our experiment, we scraped GitHub to obtain the most starred projects containing Solidity files. We selected the smart contracts that adhered to the following criteria: (i) being stand-alone, meaning they do not call other smart contracts during run-time (although they can inherit functionality from other smart contracts), (ii) coming from different application domains, (iii) not having any user-defined inputs for their functions. We retrieved 36 Solidity smart contracts from 17 different repositories, which is comparable to existing studies [25], [26]. To confirm that the contracts were used in the real world, we manually inspected them, and we found that at least 17 of the smart contracts have also been deployed on either the main Ethereum network or on a test network. Table 3 shows the characteristics of the identified smart contracts, including their domain, whether they were found online, their number of statements, and the number of branches in their CDG.

8. The current version of AGSolT is designed for unit testing of smart contracts.
Additionally, Table 3 highlights the presence of the blockchain-specific qualities that AGSolT can handle. The sender dependence and value dependence columns indicate whether the functionality of the smart contract depends on the transaction sender and transaction value; these fall into the transaction properties discussed in Section 3 and Section 4.3. Block dependence and time dependence indicate whether the contract relies on the block number or the blockchain time for its functionality, which falls into the blockchain properties discussed in Section 3 and Section 4.3. Finally, accounts as variables, non-existing account dependence, and zero account dependence indicate the presence of interaction within the smart contract that depends on the accounts passed as input variables; these fall into the interaction properties discussed in Section 3 and Section 4.3.

The entire data set, including the addresses of the deployed smart contracts, along with the tool and the results, is available in our online appendix (footnote 9). The smart contracts are spread out over ten application domains. They vary in terms of the number of source code statements and branches in the CDG of the corresponding bytecode. We found that the transaction properties we identified occurred most frequently (28 sender dependencies and 26 value dependencies), followed by the interaction properties (29 variable dependencies, four non-existent account dependencies, and two zero-account dependencies). Interestingly, only three smart contracts exhibited blockchain properties (two time dependencies and one block dependency). This is plausible because block and time information is inconsistent (each miner might have different information), and developers should rely on it as little as possible. Importantly, only four smart contracts do not rely on any of the properties we identified. Since the presence of these dependencies was not part of the search protocol, this demonstrates the necessity for our tool (and others like it) to consider the blockchain-specific properties identified in Section 3.

AGSolT Evaluation
To evaluate the effectiveness of AGSolT, as well as compare the effectiveness of our random search and guided search, we perform an empirical study steered by the following research questions.

• RQ1 (Effectiveness). What is the coverage of the genetic algorithm approach compared to the random approach when generating test cases for Solidity smart contracts?

• RQ2 (Efficiency). What is the execution time of the genetic algorithm approach compared to the random approach when generating test cases for Solidity smart contracts?

• RQ3 (Test Case Length). What is the average number of statements in a test case for the genetic algorithm approach compared to the random approach when generating test cases for Solidity smart contracts?
The first two research questions are selected because they give insight into the performance of the two approaches, as well as the general performance of AGSolT.
9. https://github.com/AGSolT/AGSolT-2020-Submission
TABLE 3
The smart contracts used for evaluating AGSolT and their characteristics.

Contract Name | Domain | Statements | Branches | Found | Sender Dep. | Value Dep. | Acc. as Vars | NE Acc. Dep. | Zero Acc. Dep. | Block Dep. | Time Dep.
AddressBook | Communication | 19 | 54 | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗
array-utils | Storage | 144 | 257 | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗
BadAuction | Token | 7 | 7 | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗
BasicToken | Token | 11 | 8 | ✗ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗
Casino | Exploit | 38 | 29 | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓
DateTime | Time | 90 | 143 | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗
DosAuction | Exploit | 7 | 7 | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗
EIP20StandardToken | Token | 24 | 13 | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗
EasyPayAndWithDraw | Token | 7 | 8 | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗
EtherBank | Exploit | 13 | 17 | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗
EzToken | Token | 31 | 11 | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗
FixedSupplyToken | Token | 39 | 22 | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗
FundRaising | Finance | 23 | 21 | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓
Gift_1_ETH | Exploit | 18 | 18 | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗
Greeter | Communication | 15 | 81 | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗
Greeter2 | Communication | 13 | 60 | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗
Greeter3 | Communication | 15 | 73 | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗
GuardCheck | Finance | 10 | 14 | ✗ | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ | ✗
GuessTheNumberChallenge | Exploit | 6 | 8 | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗
Identity | Identity | 53 | 131 | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗
IdentityManager | Identity | 49 | 90 | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗
LotteryFor10 | Betting | 45 | 44 | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ | ✗
LotteryMultipleWinners | Betting | 31 | 45 | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗
MultiSigWallet (1) | Wallet | 56 | 70 | ✗ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗
MultiSigWallet (2) | Wallet | 59 | 83 | ✗ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗
MyAdvancedToken | Token | 53 | 3 | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗
OpenAddressLottery | Betting | 30 | 34 | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗
PermissionGroups | Identity | 58 | 86 | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗
Prover | Communication | 27 | 17 | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗
Randomness | Betting | 22 | 17 | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗
Reentrance | Exploit | 9 | 14 | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗
Rubixi | Exploit | 56 | 102 | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗
SecureAuction | Finance | 11 | 6 | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗
TestDateTime | Time | 160 | 252 | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗
theRun | Exploit | 62 | 83 | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗
VulnerableTwoStep | Exploit | 11 | 10 | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗
The third research question is included because creating small, “human-readable” test cases is a secondary objective of DynaMOSA [35].

To answer the research questions, we run AGSolT for each smart contract in Table 3 to generate a test suite until either a) full branch coverage is achieved or b) the tool has gone back to the start of the search loop in Figure 2 100 times. We repeated the process ten times for each smart contract to account for the inherent randomness of both approaches. We set the population size to 50 individuals for both approaches; therefore, the search budget consists of 5000 test case evaluations, or up to 200,000 method evaluations per smart contract. Our parameter settings for the genetic algorithm are the same as those used for evaluating DynaMOSA [35], and the configurable options discussed in Section 4.3 were appropriately set whenever possible to constrain the search.

As previously mentioned, we used Ganache to simulate the Ethereum blockchain, as it is much faster than a decentralized blockchain implementation. The execution was run on virtual machines running Ubuntu Server with 16GB of RAM. For each generated test case, we measure its branch coverage, the time spent running tests on the blockchain, the total time, and the number of statements. Additionally, we compute the statistical significance of the difference between the two approaches using Wilcoxon's test [44] with a p-value threshold of 0.05, as well as the Vargha-Delaney statistic (Â) [45], which is used to measure the magnitude of the difference.

First, all tables miss the results for three contracts, which returned an error. We found that invoking some functions of DateTime and Identity could cost more Gas than the block limit and that calling a function in Identity with AGSolT can produce an out-of-bounds error; therefore, we excluded them from the performance evaluation. However, we included these smart contracts in Table 3 since they
demonstrate the usefulness of AGSolT as a tool capable of detecting errors in popular real-world smart contracts.

TABLE 4
Comparison of the achieved branch coverage for the genetic search algorithm and the fuzzing algorithm.

Name | Branches | Mean Cov. Gen. | % | Mean Cov. Fuz. | % | p-val | Â | Effect Size
AddressBook | 54 | 54.0 | 1.00 | 54.0 | 1.00 | 1.00 | 0.50 | negligible
BadAuction | 7 | 7.00 | 1.00 | 7.00 | 1.00 | 1.00 | 0.50 | negligible
BasicToken | 8 | 8.00 | 1.00 | 8.00 | 1.00 | 1.00 | 0.50 | negligible
Casino | 29 | 25.0 | 0.86 | 24.1 | 0.83 | | | large
DosAuction | 7 | 7.00 | 1.00 | 7.00 | 1.00 | 1.00 | 0.50 | negligible
EIP20StandardToken | 13 | 13.0 | 1.00 | 13.0 | 1.00 | 1.00 | 0.50 | negligible
EasyPayAndWithDraw | 8 | 8.00 | 1.00 | 6.00 | 0.75 | | | large
EtherBank | 17 | 14.0 | 0.82 | 14.0 | 0.82 | 1.00 | 0.50 | negligible
EzToken | 11 | 11.0 | 1.00 | 11.0 | 1.00 | 1.00 | 0.50 | negligible
FixedSupplyToken | 22 | 21.7 | 0.99 | 22.0 | 1.00 | 0.08 | 0.35 | small
FundRaising | 21 | 21.0 | 1.00 | 21.0 | 1.00 | 1.00 | 0.50 | negligible
Gift_1_ETH | 18 | 14.0 | 0.78 | 14.0 | 0.78 | 1.00 | 0.50 | negligible
Greeter | 81 | 81.0 | 1.00 | 81.0 | 1.00 | 1.00 | 0.50 | negligible
Greeter2 | 60 | 60.0 | 1.00 | 60.0 | 1.00 | 1.00 | 0.50 | negligible
Greeter3 | 73 | 73.0 | 1.00 | 73.0 | 1.00 | 1.00 | 0.50 | negligible
GuardCheck | 14 | 14.0 | 1.00 | 14.0 | 1.00 | 1.00 | 0.50 | negligible
GuessTheNumberChallenge | 8 | 8.00 | 1.00 | 8.00 | 1.00 | 1.00 | 0.50 | negligible
IdentityManager | 90 | 73.6 | 0.82 | 55.0 | 0.61 | | | large
LotteryFor10 | 44 | 43.0 | 0.98 | 43.0 | 0.98 | 1.00 | 0.50 | negligible
LotteryMultipleWinners | 45 | 44.7 | 0.99 | 43.4 | 0.96 | | | large
MultiSigWallet (1) | 70 | 62.0 | 0.89 | 62.7 | 0.90 | 0.44 | 0.33 | medium
MultiSigWallet (2) | 83 | 76.2 | 0.92 | 74.5 | 0.90 | 0.33 | 0.69 | medium
MyAdvancedToken | 3 | 3.00 | 1.00 | 3.00 | 1.00 | 1.00 | 0.50 | negligible
OpenAddressLottery | 34 | 32.0 | 0.94 | 32.0 | 0.94 | 1.00 | 0.50 | negligible
PermissionGroups | 86 | 85.7 | 0.997 | 83.7 | 0.97 | | | large
Prover | 17 | 17.0 | 1.00 | 17.0 | 1.00 | 1.00 | 0.50 | negligible
Randomness | 17 | 16.0 | 0.94 | 16.0 | 0.94 | 1.00 | 0.50 | negligible
Reentrance | 14 | 13.0 | 0.93 | 13.0 | 0.93 | 1.00 | 0.50 | negligible
Rubixi | 102 | 67.0 | 0.66 | 69.0 | 0.68 | | | large
SecureAuction | 6 | 6.00 | 1.00 | 6.00 | 1.00 | 1.00 | 0.50 | negligible
TestDateTime | 252 | 243 | 0.96 | 240 | 0.95 | | | large
theRun | 83 | 34.0 | 0.41 | 34.0 | 0.41 | 1.00 | 0.50 | negligible
VulnerableTwoStep | 10 | 10.0 | 1.00 | 10.0 | 1.00 | 1.00 | 0.50 | negligible

TABLE 5
How often the two approaches managed to achieve full branch coverage.

Name | Full Cov. Gen. | Full Cov. Fuz.
AddressBook | 10 | 10
BadAuction | 10 | 10
BasicToken | 10 | 10
DosAuction | 10 | 10
EIP20StandardToken | 10 | 10
EasyPayAndWithDraw | 10 | -
EzToken | 10 | 10
FixedSupplyToken | 7 | 10
FundRaising | 10 | 10
Greeter | 10 | 10
Greeter2 | 10 | 10
Greeter3 | 10 | 10
GuardCheck | 10 | 10
GuessTheNumberChallenge | 10 | 10
LotteryMultipleWinners | 7 | 2
MultiSigWallet (1) | 1 | -
MyAdvancedToken | 10 | 10
PermissionGroups | 8 | -
Prover | 10 | 10
SecureAuction | 10 | 10
VulnerableTwoStep | 10 | 10
Table 4 shows the mean branch coverage, in terms of branches covered and the percentage of total branches covered, for both DynaMOSA [35] and the fuzzer approach. Overall, both approaches achieved good branch coverage: Table 5 shows that DynaMOSA managed to achieve full branch coverage for 21 smart contracts, while the fuzzer achieved full branch coverage for 18 smart contracts. One outlier on which both approaches perform poorly is the “theRun” contract, which relies on the block hash to simulate randomness, something that cannot be manipulated by AGSolT.

Table 4 also reports p-values from a Wilcoxon test, as well as the Â and effect size from a Vargha-Delaney test, comparing the distributions of the branch coverages (in percentages) achieved by applying the genetic and fuzzing approach ten times each per smart contract. Looking closer at the p-values and the Vargha-Delaney statistic, we see that DynaMOSA achieves significantly higher coverage (p ≤ 0.05) than the fuzzer in six cases, each with a large effect size. In contrast, the fuzzer significantly outperformed DynaMOSA only once, also with a large effect size. Additionally, when DynaMOSA outperforms the fuzzer, the average branch coverage increases between 1% and 25%, while the fuzzer only achieves a 2% (2 branches) increase. This observation is in line with existing literature [8], [9] that suggests that genetic algorithms could prove beneficial when compared to a random testing approach for exercising deeper functionalities in code.

TABLE 6
Comparison of the time spent on creating tests for the genetic search algorithm and the fuzzing algorithm.

Name | Generations (Gen. / Fuz.) | Time/Generation (Gen. / Fuz.) | Total Time (s) (Gen. / Fuz.) | Chain Time (%) (Gen. / Fuz.) | p-value | Â | Effect Size
AddressBook | 3.50 / 1.90 | 122 / 277 | 428 / 527 | 0.72 / 0.71 | 0.28 | 0.40 | small
BadAuction | 1.00 / 1.00 | 85.9 / 86.5 | 85.9 / 86.5 | 0.84 / 0.84 | 0.96 | 0.50 | negligible
BasicToken | 1.00 / 1.00 | 92.7 / 95.2 | 92.7 / 95.2 | 0.76 / 0.77 | 0.39 | 0.39 | small
Casino | 101 / 101 | 80.3 / 133 | 8115 / 13432 | 0.82 / 0.82 | | | large
DosAuction | 1.00 / 1.00 | 69.2 / 70.9 | 69.2 / 70.9 | 0.85 / 0.85 | 0.58 | 0.43 | negligible
EIP20StandardToken | 1.00 / 1.00 | 101 / 110 | 101 / 110 | 0.76 / 0.75 | | |
EasyPayAndWithDraw | 3.50 / 101 | 122 / 78.1 | 425 / 7888 | 0.84 / 0.86 | | | large
EtherBank | 101 / 101 | 67.8 / 19.5 | 6848 / 1970 | 0.84 / 0.88 | | | large
EzToken | 1.00 / 1.00 | 137 / 146 | 137 / 146 | 0.72 / 0.73 | | |
FixedSupplyToken | 34.7 / 4.10 | 79.5 / 172 | 2758 / 704 | 0.79 / 0.77 | 0.33 | 0.60 | small
FundRaising | 1.00 / 1.00 | 81.1 / 79.7 | 81.1 / 79.7 | 0.77 / 0.77 | 0.80 | 0.52 | negligible
Gift_1_ETH | 101 / 101 | 66.4 / 93.1 | 67078 / 9399 | 0.83 / 0.82 | | | large
Greeter | 2.70 / 1.30 | 172 / 162 | 463 / 210 | 0.70 / 0.69 | 0.33 | 0.63 | small
Greeter2 | 1.00 / 1.20 | 183 / 182 | 183 / 218 | 0.68 / 0.69 | | | medium
Greeter3 | 2.10 / 1.60 | 172 / 250 | 361 / 400 | 0.71 / 0.68 | 0.28 | 0.20 | large
GuardCheck | 1.00 / 1.00 | 86.2 / 74.0 | 86.2 / 74.0 | 0.82 / 0.83 | | | large
GuessTheNumberChallenge | 1.90 / 1.30 | 52.3 / 35.5 | 99.5 / 46.2 | 0.84 / 0.75 | 0.33 | 0.23 | large
IdentityManager | 101 / 101 | 141 / 113 | 14196 / 11430 | 0.73 / 0.76 | 0.09 | 0.70 | medium
LotteryFor10 | 101 / 101 | 149 / 113 | 15039 / 11389 | 0.73 / 0.79 | | | large
LotteryMultipleWinners | 65.9 / 91.7 | 174 / 95.1 | 11472 / 8721 | 0.76 / 0.79 | 0.28 | 0.72 | medium
MultiSigWallet (1) | 97.9 / 101 | 149 / 96.8 | 14600 / 9778 | 0.75 / 0.78 | | | large
MultiSigWallet (2) | 101 / 101 | 190 / 96.6 | 19210 / 9760 | 0.74 / 0.79 | | | large
MyAdvancedToken | 1.00 / 1.00 | 135 / 139 | 135 / 139 | 0.71 / 0.71 | 0.72 | 0.39 | small
OpenAddressLottery | 101 / 101 | 245 / 101 | 24729 / 10159 | 0.79 / 0.81 | | | large
PermissionGroups | 66.7 / 101 | 132 / 157 | 8827 / 15878 | 0.74 / 0.78 | | | large
Prover | 1.00 / 1.00 | 200 / 193 | 200 / 193 | 0.71 / 0.69 | 0.28 | 0.62 | small
Randomness | 101 / 101 | 86.3 / 86.5 | 8720 / 8740 | 0.82 / 0.83 | 0.80 | 0.51 | negligible
Reentrance | 101 / 101 | 59.6 / 93.0 | 6022 / 9391 | 0.84 / 0.83 | | | large
Rubixi | 101 / 101 | 53.4 / 121 | 5393 / 12257 | 0.70 / 0.71 | | | large
SecureAuction | 1.00 / 1.00 | 90.3 / 87.0 | 90.3 / 87.0 | 0.81 / 0.81 | 0.09 | 0.63 | small
TestDateTime | 101 / 101 | 211 / 345 | 21276 / 34882 | 0.62 / 0.62 | | | large
theRun | 101 / 101 | 104 / 91.8 | 10538 / 9269 | 0.75 / 0.80 | | | large
VulnerableTwoStep | 1.00 / 1.00 | 71.3 / 73.0 | 71.3 / 72.6 | 0.84 / 0.83 | 0.65 | 0.48 | negligible
Mean | | | | | | |
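The Vargha-Delaney statistic reported in the tables above can be computed as below. This is a standard Â12 implementation with the conventional magnitude thresholds, not the authors' exact analysis script:

```python
def a12(group_a, group_b):
    """Vargha-Delaney Â12: probability that a value drawn from group_a
    is larger than one drawn from group_b (ties count half)."""
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in group_a for y in group_b)
    return wins / (len(group_a) * len(group_b))

def magnitude(a12_value):
    """Conventional Vargha-Delaney magnitude labels (0.56/0.64/0.71 thresholds)."""
    d = abs(a12_value - 0.5)
    if d < 0.06:
        return "negligible"
    if d < 0.14:
        return "small"
    if d < 0.21:
        return "medium"
    return "large"
```

Two identical coverage distributions give Â12 = 0.5 and a "negligible" magnitude, matching the many 0.50 entries in Table 4.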
The genetic algorithm (i.e., DynaMOSA) significantly outperformed the fuzzing algorithm when generating test cases for six Solidity smart contracts. The fuzzer outperformed the genetic algorithm significantly only once.
Table 6 shows the average number of generations (including the first, random initialization) for both approaches, as well as the mean total time spent and the average time per generation. Additionally, the Chain Time column shows the average percentage of time that was spent running the tests on our blockchain implementation (as opposed to evaluating and generating new test cases). Interestingly, on average, both approaches are more or less equally fast, with DynaMOSA taking a comparable number of generations and a comparable total time to the fuzzer. This is surprising because the DynaMOSA algorithm follows the additional selection, crossover, and mutation steps described in Section 4. One possible explanation for this is the preference criterion (Definition 2), which guides the search towards smaller test cases. Smaller test cases, in turn, take up less time, especially since Table 6 shows that most of the time in our experiments was used running the test cases on the blockchain. In order to properly compare the results for the two implementations, we performed Wilcoxon tests and Vargha-Delaney tests comparing the distributions of average run times for the smart contracts for each approach; the results are shown in the final three columns of Table 6.

There are ten smart contracts for which DynaMOSA significantly (p ≤ 0.05) outperformed the fuzzer (nine with a large and one with a medium effect size). The faster performance for “EasyPayAndWithDraw” and “PermissionGroups” can be attributed to the fact that, for these smart contracts, the genetic approach manages to regularly achieve full branch coverage before the budget is consumed, whereas the fuzzer does not. For the other smart contracts, we speculate that the preference criterion (Definition 2) in DynaMOSA, which guides the search to smaller test cases, saves time when running the tests in the blockchain environment and evaluating their performance, as described in Sections 4.2.3 through 4.2.5.

There are seven smart contracts for which the fuzzing approach significantly outperformed the genetic search (each with a large effect size). For each of these, we see that the fuzzer spends a smaller percentage of time off-chain compared to DynaMOSA. This makes sense, as the fuzzer bypasses the (computationally intensive) selection, crossover, and mutation steps described in Section 4.2.2.

DynaMOSA was significantly faster than the fuzzing algorithm on ten smart contracts. The fuzzing algorithm was significantly faster than DynaMOSA seven times.
Table 7 shows the average test case length (in number of statements) of the final solution presented by both the genetic algorithm and the fuzzing algorithm. Note that this solution is an archive, which stores, for each branch to be covered, the shortest test case that covers it. Interestingly, even though the creators of DYNAMOSA cite the use of a preference criterion as a means of reducing the size of the test cases in the final test suite, in this experiment simply implementing an archive yielded results that are, at first glance, fairly similar, with DYNAMOSA averaging 4.96 statements and the fuzzer averaging 5.03 statements.

To better compare the results of the two approaches, a Wilcoxon test and a Vargha-Delaney test were performed comparing the distributions of the average test case lengths of the final test suites for each smart contract. DYNAMOSA produced significantly shorter (p ≤ 0.05) test cases for five smart contracts (four with large effect size and one with medium effect size), for each of which it was also significantly faster, as shown in Table 6. This supports the theory that the smaller test cases found by the guided search can lead to an increase in efficiency when compared to a random search.

At the same time, the fuzzing approach yielded significantly smaller test cases in the final test suite for four smart contracts. For the "EasyPayAndWithDraw" smart contract, this can be explained by the fact that the guided search achieves full branch coverage fairly quickly, whereas the fuzzer consumes the full budget and thus has many more opportunities to generate smaller test cases. For the other smart contracts (two with large effect size and one with medium effect size), the improvement is very minor, ranging from 0.15 to 0.42 statements on average.

If we instead look only at those smart contracts for which the fuzzing approach and the genetic approach complete in the same number of generations, the average test case length for DYNAMOSA becomes 3.70 statements and that of the fuzzer becomes 3.96, which is slightly larger.
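The archive described above is simple to sketch. The code below is a hypothetical illustration, not AGSolT's code: test cases are represented as lists of statement strings (so `len()` gives the test case length in statements), and branch identifiers are opaque labels.

```python
# Hypothetical sketch of the test-case archive described above: for each
# branch, keep only the shortest test case seen so far that covers it.
# Names (branch ids, statement strings) are illustrative, not AGSolT's API.

class Archive:
    def __init__(self):
        self._best = {}  # branch id -> shortest covering test case

    def update(self, test_case, covered_branches):
        """Record test_case for every branch it covers, keeping the
        shorter of the stored and the new test case."""
        for branch in covered_branches:
            current = self._best.get(branch)
            if current is None or len(test_case) < len(current):
                self._best[branch] = test_case

    def final_suite(self):
        """Deduplicated list of the shortest covering test cases."""
        suite = []
        for tc in self._best.values():
            if tc not in suite:
                suite.append(tc)
        return suite

# Usage: the second update replaces the stored test case for branch "b2"
# because the new covering test case is shorter.
archive = Archive()
archive.update(["deposit(1)", "withdraw(1)"], covered_branches={"b1", "b2"})
archive.update(["withdraw(1)"], covered_branches={"b2"})
print(archive.final_suite())
```

Keeping only the shortest covering test case per branch is, by itself, a length-minimizing mechanism, which may explain why the fuzzer's final suites in Table 7 are nearly as short as DYNAMOSA's despite the fuzzer having no preference criterion.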
The genetic algorithm (i.e., DYNAMOSA) produced significantly smaller test cases in the final test suites when compared to the fuzzing algorithm for five smart contracts. The fuzzing algorithm produced significantly shorter test cases in the final test suites when compared to DYNAMOSA for three smart contracts.
THREATS TO VALIDITY
In this section, we discuss the threats to the validity of ourexperiment.
Construct Validity. We demonstrated that transaction properties, blockchain properties, and interactive properties are present in some of the most popular Solidity smart contracts on GitHub. Additionally, we showed the effectiveness and efficiency of AGSolT by comparing a search-based testing approach with a random testing one in terms of branch coverage, execution time, and test case length. Both approaches were implemented in the same tool (i.e., AGSolT) and executed on the same hardware environment to make the comparison as fair as possible. We acknowledge that implementation issues could negatively impact the final results. However, please consider that we strictly followed the definition of the algorithm provided by Panichella et al. [35] and that our implementation is publicly available to allow other researchers to replicate our study.

TABLE 7
Comparison between the average test case length for the genetic search algorithm and the fuzzing algorithm.

Name                      Gen.   Fuz.   p-value  Â     Effect Size
AddressBook               9.39   10.1   0.39     0.44  negligible
BadAuction                2.94   3.09   0.34     0.45  negligible
BasicToken                3.67   3.54   0.61     0.50  negligible
Casino                    5.88   9.1    0
DosAuction                3.66   3.79   0.84     0.46  negligible
EIP20StandardToken        5.71   5.89   0.58     0.42  small
EasyPayAndWithDraw        8.1    2.0
EtherBank                 2.37   2.66
EzToken                   4.47   4.66   0.61     0.41  small
FixedSupplyToken          4.15   4.85   0.28     0.41  small
FundRaising               6.55   6.61   0.88     0.48  negligible
Gift_1_ETH                2.02   2.01   0.32     0.60  small
Greeter                   7.62   7.51   0.96     0.52  negligible
Greeter2                  7.01   7.34   0.72     0.44  negligible
Greeter3                  8.22   9.16   0.09     0.26  large
GuardCheck                4.46   4.71   0.44     0.40  small
GuessTheNumberChallenge   14.46  13.69  0.54     0.50  negligible
IdentityManager           3.44   3.24   0.44     0.44  negligible
LotteryFor10              4.49   4.46   0.72     0.48  negligible
LotteryMultipleWinners    8.33   7.15   0.07     0.71  medium
MultiSigWallet (1)        3.88   6.59
MultiSigWallet (2)        3.85   6.81
MyAdvancedToken           2.70   2.93   0.51     0.42  small
OpenAddressLottery        2.51   2.09
PermissionGroups          7.16   6.61   0.44     0.58  small
Prover                    4.41   4.42   0.57     0.56  negligible
Randomness                2.46   2.58   0.07     0.28  medium
Reentrance                2.15   2.0
Rubixi                    3.73   3.52   0.58     0.50  negligible
SecureAuction             3.87   3.58   0.24     0.65  small
TestDateTime              2.42   2.14
theRun                    2.1    2.17
VulnerableTwoStep         5.33   5.15   0.80     0.52  negligible
Mean                      4.96   5.03
Internal Validity. All the experiments were executed ten times to address the inherent randomness of both approaches. Fine-tuning the parameters of the DynaMOSA algorithm [35] could also have affected the internal validity of the experiments; since setting these parameters is challenging [46], we used the default values suggested by the creators of the algorithm [35].
External Validity. We tested AGSolT on a set of real-world smart contracts from a wide variety of developers. We also ensured that each basic variable type and arrays in Solidity were included in the data set. Although these contracts exhibit each of the properties that are indicative of the identified blockchain-specific challenges, it is still possible that our data set is not representative of Solidity smart contracts in general. Future experimentation with a larger data set is desirable. AGSolT cannot yet handle user-defined input variable types nor smart contracts that rely on previously deployed smart contracts for their initialization. Adding this feature is part of our research agenda. Our conclusions are derived from the results obtained with only one genetic algorithm, namely DynaMOSA [35]. Our research agenda includes experimentation with a broader set of search algorithms. We did not run the test cases on a distributed blockchain; instead, we relied on Ganache, a framework to run tests, execute commands, and inspect smart contracts. However, please consider that the resulting test suites are presented conveniently and can be easily used in any test network (e.g., Ropsten).

Conclusion Validity. The results were obtained by repeating the experiments enough times and adopting appropriate statistical tests to draw valid conclusions. Specifically, we used the Wilcoxon test [44] to test the significance of the differences and the Vargha-Delaney statistic [45] to estimate the effect size of the observed differences.
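The Vargha-Delaney statistic is simple enough to compute directly. The sketch below is plain Python, using the magnitude thresholds commonly attributed to Vargha and Delaney (0.06, 0.14, and 0.21 on |Â − 0.5|) to derive effect-size labels like those in Table 7; the run-time samples are made up for illustration, and in practice a Wilcoxon test (e.g. `scipy.stats.wilcoxon`) would supply the accompanying p-values.

```python
from itertools import product

def vargha_delaney_a12(xs, ys):
    """Vargha-Delaney A12: probability that a value drawn from xs is
    larger than one drawn from ys (ties count as half a win)."""
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x, y in product(xs, ys))
    return wins / (len(xs) * len(ys))

def effect_size_label(a12):
    """Map A12 to a qualitative label using the magnitude thresholds
    commonly attributed to Vargha and Delaney."""
    d = abs(a12 - 0.5)
    if d < 0.06:
        return "negligible"
    if d < 0.14:
        return "small"
    if d < 0.21:
        return "medium"
    return "large"

# Illustrative (fabricated) run times in seconds over ten repetitions:
dynamosa = [12.1, 11.8, 12.5, 11.9, 12.0, 12.3, 11.7, 12.2, 12.4, 11.6]
fuzzer   = [13.0, 13.4, 12.9, 13.1, 13.5, 12.8, 13.2, 13.3, 12.7, 13.6]
a = vargha_delaney_a12(dynamosa, fuzzer)
print(a, effect_size_label(a))  # prints: 0.0 large
```

Here Â = 0.0 means every DYNAMOSA run was faster than every fuzzer run, i.e. a large effect in DYNAMOSA's favour for lower-is-better run times.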
CONCLUSION
This paper discussed the challenges that arise when applying automated test case generation in a blockchain environment, identifying three different categories: transaction properties, blockchain properties, and interactive properties. We presented, explored, and partially validated AGSolT, a tool that addresses these challenges and creates test suites that aim to achieve branch coverage for Solidity smart contract unit testing.

AGSolT works with both a random testing approach (i.e., a fuzzer) and a guided-search approach (i.e., the DYNAMOSA genetic algorithm [35]). We gathered a data set consisting of real-world smart contracts from GitHub. On the one hand, we demonstrated that many of these contracts exhibit behaviors that align with the challenges we identified. Additionally, we have shown the effectiveness and efficiency of AGSolT by achieving good branch coverage with both approaches. In doing so, we presented the first comparison between a guided search and a random search in the domain of automated test case generation for smart contracts.

We found that the DYNAMOSA algorithm outperformed our fuzzer in achieving branch coverage, but ascertained that neither approach is significantly faster or produces significantly smaller test cases for the final test suite. The fact that the fuzzer was not faster, despite not going through the extra steps of selection, crossover, and mutation, is interesting and deserves further investigation. We hypothesize that this could be due to the preference criterion of DYNAMOSA, which should, in theory, result in less time spent on the execution and evaluation steps of the testing procedure. Finally, remarkably, we have shown that three of the most prevalent smart contracts on GitHub suffer from critical failures (crashes) that emerged during our tests, demonstrating the potential real-world value of AGSolT.

In our future agenda, we plan to extend our current baseline to a larger set of commercial smart contracts. Moreover, we intend to leverage the parameterization of our testing approach with more search algorithms. Additionally, we will expand AGSolT to test inter-contract dependencies, with the final goal of creating test cases in blockchain environments where multiple smart contracts interact.

REFERENCES

[1] L. Anderson, R. Holz, A. Ponomarev, P. Rimba, and I. Weber, "New kids on the block: an analysis of modern blockchains," arXiv preprint arXiv:1606.06530, 2016.

10. https://ethereum.org/en/developers/docs/networks/
[2] Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016, pp. 254–269.
[3] K. Delmolino, M. Arnett, A. Kosba, A. Miller, and E. Shi, "Lab: Step by Step towards Programming a Safe Smart Contract," 2015.
[4] N. Atzei, M. Bartoletti, and T. Cimoli, "A survey of attacks on Ethereum smart contracts," IACR Cryptology ePrint Archive, vol. 2016, p. 1007, 2016.
[5] B. Jiang, Y. Liu, and W. Chan, "ContractFuzzer: Fuzzing smart contracts for vulnerability detection," in Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, 2018, pp. 259–269.
[6] "The official Solidity fuzzer: Solfuzzer." [Online]. Available: https://solidity.readthedocs.io/en/develop/contributing.html
Software Testing, Verification and Reliability, vol. 28, no. 4, pp. 1367–1374, 2018.
[9] M. Harman and P. McMinn, "A theoretical and empirical study of search-based testing: Local, global, and hybrid search," IEEE Transactions on Software Engineering, vol. 36, pp. 226–247, 2010.
[10] M. Harman, Y. Jia, and Y. Zhang, "Achievements, open problems and challenges for search based software testing," in , 2015, pp. 1–12.
[11] A. Arcuri, M. Z. Iqbal, and L. Briand, "Black-box system testing of real-time embedded systems using random and search-based testing," Lecture Notes in Computer Science, vol. 6435 LNCS, pp. 95–110, 2010.
[12] S. Nakamoto, "Bitcoin: A peer-to-peer electronic cash system," 2008. [Online]. Available: https://bitcoin.org/bitcoin.pdf
[13] V. Buterin, "Ethereum: a next generation smart contract and decentralized application platform," 2013. [Online]. Available: https://github.com/ethereum/wiki/wiki/White-Paper
[14] G. Wood et al., "Ethereum: A secure decentralised generalised transaction ledger," Ethereum project yellow paper.
ACM Computing Surveys (CSUR), vol. 53, no. 3, pp. 1–37, 2020.
[18] W. Zou, D. Lo, P. S. Kochhar, X. D. Le, X. Xia, Y. Feng, Z. Chen, and B. Xu, "Smart contract development: Challenges and opportunities," IEEE Transactions on Software Engineering, pp. 1–1, 2019.
[19] "Ethereum wiki: Ethereum contract security techniques and tips." [Online]. Available: https://github.com/ethereum/wiki/wiki/Safety
et al., "An orchestrated survey of methodologies for automated software test case generation," Journal of Systems and Software, vol. 86, no. 8, pp. 1978–2001, 2013.
[22] P. Zhang, F. Xiao, and X. Luo, "SolidityCheck: Quickly detecting smart contract problems through regular expressions," arXiv preprint arXiv:1911.09425, 2019.
[23] H. Wu, X. Wang, J. Xu, W. Zou, L. Zhang, and Z. Chen, "Mutation testing for Ethereum smart contract," arXiv preprint arXiv:1908.03707, 2019.
[24] Y. Zhang, S. Ma, J. Li, K. Li, S. Nepal, and D. Gu, "SmartShield: Automatic smart contract protection made easy," in , 2020, pp. 23–34.
[25] P. Zhang, J. Yu, and S. Ji, "ADF-GA: Data flow criterion based test case generation for Ethereum smart contracts," arXiv preprint arXiv:2003.00257, 2020.
[26] Y. Liu, Y. Li, S.-W. Lin, and Q. Yan, "ModCon: A model-based testing platform for smart contracts," in Proceedings of the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (FSE), Nov. 2020.
[27] M. Sutton, A. Greene, and P. Amini, Fuzzing: Brute Force Vulnerability Discovery. Pearson Education, 2007.
[28] M. Bartoletti and L. Pompianu, "An empirical analysis of smart contracts: platforms, applications, and design patterns," in International Conference on Financial Cryptography and Data Security. Springer, 2017, pp. 494–509.
[29] G. Fraser and A. Arcuri, "The seed is strong: Seeding strategies in search-based software testing," in . IEEE, 2012, pp. 121–130.
[30] J. Ferrante, K. J. Ottenstein, and J. D. Warren, "The program dependence graph and its use in optimization," ACM Transactions on Programming Languages and Systems (TOPLAS), vol. 9, no. 3, pp. 319–349, 1987.
[31] T. Lengauer and R. E. Tarjan, "A fast algorithm for finding dominators in a flowgraph," ACM Transactions on Programming Languages and Systems (TOPLAS), vol. 1, no. 1, pp. 121–141, 1979.
[32] G. Fraser and A. Arcuri, "Whole test suite generation," IEEE Transactions on Software Engineering, vol. 39, no. 2, pp. 276–291, 2012.
[33] G. Fraser and A. Zeller, "Mutation-driven generation of unit tests and oracles," IEEE Transactions on Software Engineering, vol. 38, no. 2, pp. 278–292, 2011.
[34] P. Tonella, "Evolutionary testing of classes," in ACM SIGSOFT Software Engineering Notes, vol. 29, no. 4. ACM, 2004, pp. 119–128.
[35] A. Panichella, F. M. Kifetew, and P. Tonella, "Automated test case generation as a many-objective optimisation problem with dynamic selection of the targets," IEEE Transactions on Software Engineering, vol. 44, no. 2, pp. 122–158, 2017.
[36] M. Mitchell, An Introduction to Genetic Algorithms. MIT Press, 1998.
[37] C. Von Lücken, B. Barán, and C. Brizuela, "A survey on multi-objective evolutionary algorithms for many-objective problems," Computational Optimization and Applications, vol. 58, no. 3, pp. 707–756, 2014.
[38] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, "A fast and elitist multiobjective genetic algorithm: NSGA-II," IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, 2002.
[39] S. Scalabrino, G. Grano, D. Di Nucci, R. Oliveto, and A. De Lucia, "Search-based testing of procedural programs: Iterative single-target or multi-target approach?" in International Symposium on Search Based Software Engineering. Springer, 2016, pp. 64–79.
[40] A. Panichella, F. M. Kifetew, and P. Tonella, "LIPS vs MOSA: A replicated empirical study on automated test case generation," in International Symposium on Search Based Software Engineering. Springer, 2017, pp. 83–98.
[41] B. Korel, "Automated software test data generation," IEEE Transactions on Software Engineering, vol. 16, no. 8, pp. 870–879, 1990.
[42] L. Baresi and M. Miraz, "TestFul: Automatic unit-test generation for Java classes," in , vol. 2. IEEE, 2010, pp. 281–284.
[43] M. Köppen and K. Yoshida, "Substitute distance assignments in NSGA-II for handling many-objective optimization problems," in International Conference on Evolutionary Multi-Criterion Optimization, 2007, pp. 727–741.
[44] W. J. Conover, Practical Nonparametric Statistics. Wiley, New York, 1980.
[45] A. Vargha and H. D. Delaney, "A critique and improvement of the CL common language effect size statistics of McGraw and Wong," Journal of Educational and Behavioral Statistics, vol. 25, no. 2, pp. 101–132, 2000.
[46] A. Arcuri and G. Fraser, "Parameter tuning or default values? An empirical investigation in search-based software engineering,"