Blockchain Superoptimizer ⋆
Julian Nagele (Queen Mary University of London, UK, [email protected]) and Maria A Schett (University College London, UK, [email protected])
Abstract.
In the blockchain-based, distributed computing platform Ethereum, programs called smart contracts are compiled to bytecode and executed on the Ethereum Virtual Machine (EVM). Executing EVM bytecode is subject to monetary fees: a clear optimization target. Our aim is to superoptimize EVM bytecode by encoding the operational semantics of EVM instructions as SMT formulas and leveraging a constraint solver to automatically find cheaper bytecode. We implement this approach in our EVM Bytecode SuperOptimizer ebso and perform two large-scale evaluations on real-world data sets.
Keywords:
Superoptimization, Ethereum, Smart Contracts, SMT
Ethereum is a blockchain-based, distributed computing platform featuring a quasi-Turing-complete programming language. In Ethereum, programs are called smart contracts; they are compiled to bytecode and executed on the Ethereum Virtual Machine (EVM). To avoid network spam and to ensure termination, execution is subject to monetary fees. These fees are specified in units of gas, i.e., any instruction executed on the EVM has a cost in terms of gas, possibly depending on its input and the execution state.
Example 1.
Consider the expression 3 + (0 − x), which corresponds to the program PUSH 0 SUB PUSH 3 ADD. The EVM is a stack-based machine, so this program takes an argument x from the stack to compute the expression above. However, clearly one can save the ADD instruction and instead compute 3 − x, i.e., optimize the program to PUSH 3 SUB. The first program costs 12 g to execute on the EVM, while the second costs only 6 g.

We build a tool that automatically finds this optimization and similar others that are missed by state-of-the-art smart contract compilers: the EVM bytecode superoptimizer ebso. The use of ebso for Example 1 is sketched in Figure 1. To find these optimizations, ebso implements superoptimization. Superoptimization is often considered too slow to use during software development except for special circumstances. We argue that compiling smart contracts is such a circumstance.

⋆ This research is supported by the UK Research Institute in Verified Trustworthy Software Systems and partially supported by funding from Google.

Fig. 1: Overview of ebso: the EVM executes the source PUSH 0 SUB PUSH 3 ADD for 12 g; ebso finds the equivalent PUSH 3 SUB, which executes for only 6 g.

Since bytecode, once it has been deployed to the blockchain, cannot change again, spending extra time optimizing a program that may be called many times might well be worth it. Especially since it is very clear what "worth it" means: the clear cost model of gas makes it easy to define optimality. Our main contributions are: (i) an SMT encoding of a subset of
EVM bytecode semantics (Section 4), (ii) an implementation of two flavors of superoptimization: basic, where the constraint solver is used to check equivalence of enumerated candidate instruction sequences, and unbounded, where also the enumeration itself is shifted to the constraint solver (Section 5), and (iii) two large-scale evaluations (Section 6). First, we run ebso on a collection of smart contracts from a programming competition aimed at producing the cheapest EVM bytecode for given programming challenges. Even in this already highly optimized data set ebso still finds 19 optimizations. In the second evaluation we compare the performance of basic and unbounded superoptimization on the 2500 most called smart contracts from the Ethereum blockchain and find that, in our setting, unbounded superoptimization outperforms basic superoptimization.
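The gas arithmetic behind Example 1 can be replayed with a small stack-machine sketch. This is a hypothetical, heavily simplified interpreter for just the three instructions involved, not ebso itself; the uniform cost of 3 g per instruction matches the costs quoted above.

```python
# Toy model of the EVM stack (top of stack = end of the list); words are
# 256-bit, so all arithmetic is taken modulo 2**256.
MOD = 2 ** 256

def run(program, stack):
    """Execute a list of (opcode, argument) pairs; return (stack, gas used)."""
    stack, gas = list(stack), 0
    for op, arg in program:
        gas += 3  # PUSH, ADD and SUB each cost 3 g
        if op == "PUSH":
            stack.append(arg % MOD)
        elif op == "ADD":
            a, b = stack.pop(), stack.pop()
            stack.append((a + b) % MOD)
        elif op == "SUB":
            a, b = stack.pop(), stack.pop()  # a is the top word
            stack.append((a - b) % MOD)
    return stack, gas

# 3 + (0 - x) versus the optimized 3 - x, both on initial stack [x]
source = [("PUSH", 0), ("SUB", None), ("PUSH", 3), ("ADD", None)]
target = [("PUSH", 3), ("SUB", None)]
```

For every input x the two programs leave the same single word on the stack, while the gas counters come out as 12 g and 6 g respectively.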
Smart contracts in Ethereum are usually written in a specialized high-level language such as Solidity or Vyper and then compiled into bytecode, which is executed on the EVM. The EVM is a virtual machine formally defined in the Ethereum yellow paper [14]. It is based on a stack, which holds words, i.e., bit vectors of size 256. (This word size was chosen to facilitate the cryptographic computations, such as hashing, that are often performed in the EVM.) The maximal stack size is set to 1024. Pushing words onto a full stack leads to a stack overflow, while removing words from the empty stack leads to a stack underflow. Both lead the EVM to enter an exceptional halting state. The EVM also features a volatile memory, a word-addressed byte array, and a persistent key-value storage, a word-addressed word array, whose contents are stored on the Ethereum blockchain. (Setting the gas price of individual instructions such that it accurately reflects the computational cost is hard, and has been a problem in the past; see e.g. news.ycombinator.com/item?id=12557372.)

The bytecode directly corresponds to more human-friendly instructions. For example, a few bytes of EVM bytecode encode the sequence of instructions PUSH 41 PUSH 1 ADD. Instructions can be classified into different categories, such as arithmetic operations, e.g.
ADD and SUB for addition and subtraction, comparisons, e.g. SLT for signed less-than, and bitwise operations, like AND and NOT. The instruction PUSH pushes a word onto the stack, while POP removes the top word. Words on the stack can be duplicated using DUPi and swapped using SWAPi for 1 ≤ i ≤ 16, where i refers to the i-th word below the top. Some instructions are specific to the blockchain domain, like BLOCKHASH, which returns the hash of a recently mined block, or ADDRESS, which returns the address of the currently executing account. Instructions for control flow include e.g. JUMP, JUMPDEST, and STOP.

We write δ(ι) for the number of words that instruction ι takes from the stack, and α(ι) for the number of words ι adds onto the stack. A program p is a finite sequence of instructions. We define the size |p| of a program as the number of its instructions. To execute a program on the Ethereum blockchain, the caller has to pay gas. The amount to be paid depends on both the instructions of the program and the input: every instruction comes with a gas cost. For example,
PUSH and ADD currently cost 3 g, and therefore executing the program above costs 9 g. (We gloss over the 32 different PUSH instructions, which differ in the size of the word to be pushed.) Most instructions have a fixed cost, but some take the current state of the execution into account. A prominent example of this behavior is storage. Writing to a zero-valued key conceptually allocates new storage and thus is more expensive than writing to a key that is already in use, i.e., holds a non-zero value. The gas prices of all instructions are specified in the yellow paper [14].

Given a source program p, superoptimization tries to generate a target program p′ such that (i) p′ is equivalent to p, and (ii) the cost of p′ is minimal with respect to a given cost function C. This problem arises in several contexts with different source and target languages. In our case, i.e., for a binary recompiler, both source and target are EVM bytecode.

A standard approach to superoptimization and synthesis [4, 9, 12, 13] is to search through the space of candidate instruction sequences of increasing cost and use a constraint solver to check whether a candidate correctly implements the source program. The solver of choice is usually a Satisfiability Modulo Theories (SMT) solver, which operates on first-order formulas in combination with background theories, such as the theory of bit vectors or arrays. Modern SMT solvers are highly optimized and implement techniques to handle arbitrary first-order formulas, such as E-matching. With increasing cost of the candidate sequence, the search space dramatically increases. To deal with this explosion one idea is to hand some of the search to the solver, by using templates [4, 13]. Templates leave holes in the target program, e.g. for immediate arguments of instructions, that the solver must then fill. A candidate program is correct if the encoding is satisfiable, i.e., if the solver finds a model. Constructing the target program then amounts to obtaining the values for the templates from the model. This approach is shown in Algorithm 2(a).

1: function BasicSo(p_s, C)
2:   n ← 0
3:   while true do
4:     for all p_t ∈ {p | C(p) = n} do
5:       χ ← EncodeBso(p_s, p_t)
6:       if Satisfiable(χ) then
7:         m ← GetModel(χ)
8:         p_t ← DecodeBso(m)
9:         return p_t
10:    n ← n + 1
(a) Basic Superoptimization.

1: function UnboundedSo(p_s, C)
2:   p_t ← p_s
3:   χ ← EncodeUso(p_t) ∧ Bound(p_t, C)
4:   while Satisfiable(χ) do
5:     m ← GetModel(χ)
6:     p_t ← DecodeUso(m)
7:     χ ← χ ∧ Bound(p_t, C)
8:   return p_t
(b) Unbounded Superoptimization.

Alg. 2: Superoptimization.
Unbounded superoptimization [5, 6] pushes this idea further. Instead of searching through candidate programs and calling the SMT solver on them, it shifts the search into the solver, i.e., the encoding expresses all candidate instruction sequences of any length that correctly implement the source program. This approach is shown in Algorithm 2(b): if the solver returns satisfiable then there is an instruction sequence that correctly implements the source program. Again, this target program is reconstructed from the model. If successful, a constraint asking for a cheaper program is added and the solver is called again. Note that this also means that unbounded superoptimization can stop with a correct, but possibly non-optimal solution. In contrast, basic superoptimization cannot return a correct solution until it has finished.

The main ingredients of superoptimization in Algorithm 2 are EncodeBso/Uso, producing the SMT encoding, and DecodeBso/Uso, reconstructing the target program from a model. We present our encodings for the semantics of EVM bytecode in the following section.
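Algorithm 2(a) can be sketched concretely. The following toy version is a hypothetical illustration: the opcode names, the 2-bit word size, and the uniform cost model are all assumptions made here, and exhaustive testing over the tiny word domain stands in for the SMT equivalence check. It enumerates candidate programs of increasing cost and returns the first equivalent one.

```python
from itertools import product

WORD = 4  # 2-bit words: small enough to test all inputs exhaustively
OPS = ["PUSH0", "PUSH3", "ADD", "SUB"]  # every opcode costs 3 g here

def run(prog, x):
    """Execute prog on initial stack [x]; return the final stack."""
    st = [x]
    for op in prog:
        if op == "PUSH0": st.append(0)
        elif op == "PUSH3": st.append(3 % WORD)
        elif op == "ADD": a, b = st.pop(), st.pop(); st.append((a + b) % WORD)
        elif op == "SUB": a, b = st.pop(), st.pop(); st.append((a - b) % WORD)
    return st

def equivalent(p, q):
    """Stand-in for the solver: compare final stacks on every input."""
    for x in range(WORD):
        try:
            if run(p, x) != run(q, x):
                return False
        except IndexError:  # a candidate underflowed the stack
            return False
    return True

def basic_so(source):
    """Enumerate candidates by length (= cost, since all costs are equal)."""
    length = 0
    while True:
        for cand in product(OPS, repeat=length):
            if equivalent(source, list(cand)):
                return list(cand)
        length += 1
```

On the running example this toy loop rediscovers the optimization from Example 1: basic_so(["PUSH0", "SUB", "PUSH3", "ADD"]) returns ["PUSH3", "SUB"].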
We start by encoding three parts of the EVM execution state: (i) the stack, (ii) gas consumption, and (iii) whether the execution is in an exceptional halting state. We model the stack as an uninterpreted function together with a counter, which points to the next free position on the stack.

Definition 1. A state σ = ⟨st, c, hlt, g⟩ consists of
(i) a function st(V, j, n) that, after the program has executed j instructions on input variables from V, returns the word from position n in the stack,
(ii) a function c(j) that returns the number of words on the stack after executing j instructions; hence st(V, j, c(j) − 1) returns the top of the stack,
(iii) a function hlt(j) that returns true (⊤) if exceptional halting has occurred after executing j instructions, and false (⊥) otherwise,
(iv) a function g(V, j) that returns the amount of gas consumed after executing j instructions.

Here the functions in σ represent all execution states of a program, indexed by the variable j.

Example 2.
Symbolically executing the program
PUSH 41 PUSH 1 ADD using our representation above, we have

g(0) = 0   g(1) = 3   g(2) = 6   g(3) = 9
c(0) = 0   c(1) = 1   c(2) = 2   c(3) = 1
st(1, 0) = 41   st(2, 0) = 41   st(2, 1) = 1   st(3, 0) = 42

and hlt(0) = hlt(1) = hlt(2) = hlt(3) = ⊥.

Note that this program does not consume any words that were already on the stack. This is not the case in general. For instance we might be dealing with the body of a function, which takes its arguments from the stack. Hence we need to ensure that at the beginning of the execution sufficiently many words are on the stack. To this end we first compute the depth δ̂(p) of the program p, i.e., the number of words a program p consumes. Then we take variables x_0, ..., x_{δ̂(p)−1} that represent the input to the program and initialize our functions accordingly.

Definition 2.
For a program with δ̂(p) = d we initialize the state σ using

g_σ(0) = 0 ∧ hlt_σ(0) = ⊥ ∧ c_σ(0) = d ∧ ⋀_{0 ≤ ℓ < d} st_σ(V, 0, ℓ) = x_ℓ

For example, for the single-instruction program ADD we set c(0) = 2, and st({x_0, x_1}, 0, 0) = x_0 and st({x_0, x_1}, 0, 1) = x_1. We then have st({x_0, x_1}, 1, 0) = x_0 + x_1.

To encode the effect of EVM instructions we build SMT formulas to capture their operational semantics. That is, for an instruction ι and a state σ we give a formula τ(ι, σ, j) that defines the effect on state σ if ι is the j-th instruction that is executed. Since large parts of these formulas are similar for every instruction and only depend on δ and α, we build them from smaller building blocks.

Definition 3. For an instruction ι and state σ we define:

τ_g(ι, σ, j) ≡ g_σ(V, j + 1) = g_σ(V, j) + C(σ, j, ι)
τ_c(ι, σ, j) ≡ c_σ(j + 1) = c_σ(j) + α(ι) − δ(ι)
τ_pres(ι, σ, j) ≡ ∀n. n < c_σ(j) − δ(ι) → st_σ(V, j + 1, n) = st_σ(V, j, n)
τ_hlt(ι, σ, j) ≡ hlt_σ(j + 1) = (hlt_σ(j) ∨ c_σ(j) − δ(ι) < 0 ∨ c_σ(j) − δ(ι) + α(ι) > 1024)

Here C(σ, j, ι) is the gas cost of executing instruction ι on state σ after j steps. The formula τ_g adds the cost of ι to the gas cost incurred so far. The formula τ_c updates the counter for the number of words on the stack according to δ and α. The formula τ_pres expresses that all words on the stack below c_σ(j) − δ(ι) are preserved. Finally, τ_hlt captures that exceptions relevant to the stack can occur through either an underflow or an overflow, and that once it has occurred an exceptional halt state persists. For now the only other component we need is how the instructions affect the stack st, i.e., a formula τ_st(ι, σ, j). Here we only give an example and refer to our implementation or the yellow paper [14] for details. We have

τ_st(ADD, σ, j) ≡ st_σ(V, j + 1, c_σ(j + 1) − 1) = st_σ(V, j, c_σ(j) − 1) + st_σ(V, j, c_σ(j) − 2)

Definition 4.
For an instruction ι and state σ we define

τ(ι, σ, j) ≡ τ_st(ι, σ, j) ∧ τ_c(ι, σ, j) ∧ τ_g(ι, σ, j) ∧ τ_hlt(ι, σ, j) ∧ τ_pres(ι, σ, j)

Then to encode the semantics of a program p all we need to do is to apply τ to the instructions of p.

Definition 5. For a program p = ι_0 ⋯ ι_n we set τ(p, σ) ≡ ⋀_{0 ≤ j ≤ n} τ(ι_j, σ, j).

Before building an encoding for superoptimization we consider another aspect of the EVM for our state representation: storage and memory. The gas cost for storing words depends on the words that are currently stored. Similarly, the cost for using memory depends on the number of bytes currently used. This is why the cost C(σ, j, ι) of an instruction depends on the state, and the function g_σ accumulating gas cost depends on V.

To add support for storage and memory to our encoding there are two natural choices: the theory of arrays or an Ackermann encoding. However, since we have not used arrays so far, they would require the solver to deal with an additional theory. For an Ackermann encoding we only need uninterpreted functions, which we have used already. Hence, to represent storage in our encoding we extend states with an uninterpreted function str(V, j, k), which returns the word at key k after the program has executed j instructions. Similarly to how we set up the initial stack we need to deal with the values held by the storage before the program is executed. Thus, to initialize str we introduce fresh variables to represent the initial contents of the storage. More precisely, for all SLOAD and SSTORE instructions occurring at positions j_1, ..., j_ℓ in the source program, we introduce fresh variables s_1, ..., s_ℓ and add them to V. Then for a state σ we initialize str_σ by adding the following conjunct to the initialization constraint from Definition 2:

∀w. str_σ(V, 0, w) = ite(w = a_{j_1}, s_1, ite(w = a_{j_2}, s_2, ..., ite(w = a_{j_ℓ}, s_ℓ, w_⊥)))

where a_j = st_σ(V, j, c(j) − 1) and w_⊥ is the default value for words in the storage.

The effect of the two storage instructions SLOAD and SSTORE can then be encoded as follows:

τ_st(SLOAD, σ, j) ≡ st_σ(V, j + 1, c_σ(j + 1) − 1) = str_σ(V, j, st_σ(V, j, c_σ(j) − 1))
τ_str(SSTORE, σ, j) ≡ ∀w. str_σ(V, j + 1, w) = ite(w = st_σ(V, j, c_σ(j) − 1), st_σ(V, j, c_σ(j) − 2), str_σ(V, j, w))

Moreover all instructions except SSTORE preserve the storage, that is, for ι ≠ SSTORE we add the following conjunct to τ_pres(ι, σ, j):

∀w. str_σ(V, j + 1, w) = str_σ(V, j, w)

To encode memory a similar strategy is an obvious way to go. However, we first want to evaluate the solver's performance on the encodings obtained when using stack and storage. Since the solver already struggled, due to the size of the programs and the number of universally quantified variables (see Section 6), we have not yet added an encoding of memory.

Finally, to use our encoding for superoptimization we need an encoding of equality for two states after a certain number of instructions, either to ensure that two programs are equivalent (they start and end in equal states) or different (they start in equal states, but end in different ones). The following formula captures this constraint.

Definition 6. For states σ_1 and σ_2 and program locations j_1 and j_2 we define

ε(σ_1, σ_2, j_1, j_2) ≡ c_{σ_1}(j_1) = c_{σ_2}(j_2) ∧ hlt_{σ_1}(j_1) = hlt_{σ_2}(j_2)
  ∧ ∀n. n < c_{σ_1}(j_1) → st_{σ_1}(V, j_1, n) = st_{σ_2}(V, j_2, n)
  ∧ ∀w. str_{σ_1}(V, j_1, w) = str_{σ_2}(V, j_2, w)

Since we aim to improve gas consumption, we do not demand equality for g. We now have all ingredients needed to implement basic superoptimization: simply enumerate all possible programs ordered by gas cost and use the encodings to check equivalence.
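The storage treatment above can be mirrored by a small functional sketch (hypothetical, illustrative names; this models the semantics of the encoding, not the SMT formulas themselves): the initial storage is the nested ite chain with default word w⊥, and SSTORE is a point update that preserves all other keys.

```python
W_BOT = 0  # default word w⊥ for keys never written

def init_storage(keys, fresh):
    """str(V, 0, ·): the i-th accessed key maps to the fresh variable s_i,
    every other key to the default word; first match wins, like nested ite."""
    def str0(w):
        for a, s in zip(keys, fresh):
            if w == a:
                return s
        return W_BOT
    return str0

def sstore(str_j, key, value):
    """SSTORE: ite(w = key, value, str_j(w)), a point update."""
    return lambda w: value if w == key else str_j(w)

def sload(str_j, key):
    """SLOAD simply reads the current storage function."""
    return str_j(key)
```

For instance, storing 42 at key 4 changes only that key; reading any other key still yields its initial (symbolic) content, and earlier storage states stay untouched.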
However, since already for one PUSH there are 2^256 possible arguments, this will not produce results in a reasonable amount of time. Hence we use templates as described in Section 3. We introduce an uninterpreted function a(j) that maps a program location j to a word, which will be the argument of PUSH. The solver then fills these templates and we can get the values from the model. This is a step forward, but since we have 80 encoded instructions, enumerating all permutations still yields too large a search space. Hence we use an encoding similar to the CEGIS algorithm [4]. Given a collection of instructions, we formulate a constraint representing all possible permutations of these instructions. It is satisfiable if there is a way to connect the instructions into a target program that is equivalent to the source program. The order of the instructions can again be reconstructed from the model provided by the solver. More precisely, given a source program p and a list of candidate instructions ι_1, ..., ι_n, EncodeBso from Algorithm 2(a) takes variables j_1, ..., j_n and two states σ and σ′ and builds the following formula:

∀V. ε(σ, σ′, 0, 0) ∧ ε(σ, σ′, |p|, n) ∧ τ(p, σ) ∧ ⋀_{1 ≤ ℓ ≤ n} τ(ι_ℓ, σ′, j_ℓ) ∧ ⋀_{1 ≤ ℓ < ℓ′ ≤ n} j_ℓ ≠ j_{ℓ′}

Given a set of instructions CI we define the formula ρ(σ, n) as

∀j. j ≥ 0 ∧ j < n → ⋀_{ι ∈ CI} (instr(j) = ι → τ(ι, σ, j)) ∧ ⋁_{ι ∈ CI} instr(j) = ι

Finally, the constraint produced by EncodeUso from Algorithm 2(b) is

∀V. τ(p, σ) ∧ ρ(σ′, n) ∧ ε(σ, σ′, 0, 0) ∧ ε(σ, σ′, |p|, n) ∧ g_σ(V, |p|) > g_{σ′}(V, n)

During our experiments we observed that the solver struggles to show that the formula is unsatisfiable when p is already optimal.
To help in these cases we additionally add a bound on n: since the cheapest EVM instruction has gas cost 1, the target program cannot use more instructions than the gas cost of p, i.e., we add n ≤ g_σ(V, |p|).

In our application domain there are many instructions that fetch information from the outside world. For instance, ADDRESS gets the Ethereum address of the account currently executing the bytecode of this smart contract. Since it is not possible to know these values at compile time we cannot encode their full semantics. However, we would still like to take advantage of structural optimizations where these instructions are involved, e.g., via DUP and SWAP.

Example 3. Consider the program ADDRESS DUP1. The same effect can be achieved by simply calling ADDRESS ADDRESS. Duplicating words on the stack, if they are used multiple times, is an intuitive approach. However, because executing ADDRESS costs 2 g and DUP1 costs 3 g, perhaps unexpectedly, the second program is cheaper.

To find such optimizations we need a way to encode ADDRESS and similar instructions. For our purposes, these instructions have in common that they put arbitrary but fixed words onto the stack. Analogous to uninterpreted functions, we call them uninterpreted instructions and collect them in the set UI. To represent their output we use universally quantified variables, similar to input variables. To encode the effect uninterpreted instructions have on the stack, i.e., τ_st, we distinguish between constant and non-constant uninterpreted instructions.

Let ui_c(p) be the set of constant uninterpreted instructions in p, i.e., ui_c(p) = {ι ∈ p | ι ∈ UI ∧ δ(ι) = 0}. Then for ui_c(p) = {ι_1, ..., ι_k} we take variables u_{ι_1}, ..., u_{ι_k} and add them to V, and thus to the arguments of the state function st. The formula τ_st can then use these variables to represent the unknown word produced by the uninterpreted instruction, i.e.
, for ι ∈ ui_c(p) with the corresponding variable u_ι in V, we set τ_st(ι, σ, j) ≡ st_σ(V, j + 1, c_σ(j)) = u_ι.

For a non-constant instruction ι, such as BLOCKHASH or BALANCE, the word put onto the stack by ι depends on the top δ(ι) words of the stack. We again model this dependency using an uninterpreted function. That is, for every non-constant uninterpreted instruction ι in the source program p, ui_n(p) = {ι ∈ p | ι ∈ UI ∧ δ(ι) > 0}, we use an uninterpreted function f_ι. Conceptually, we can think of f_ι as a read-only memory initialized with the values that the calls to ι produce.

Example 4. The instruction BLOCKHASH gets the hash of a given block b. Thus optimizing the program PUSH b_1 BLOCKHASH PUSH b_2 BLOCKHASH depends on the values b_1 and b_2. If b_1 = b_2 then the cheaper program PUSH b_1 BLOCKHASH DUP1 yields the same state as the original program.

To capture this behaviour, we need to associate the arguments b_1 and b_2 of BLOCKHASH with the two different results they may produce. As with constant uninterpreted instructions, to model arbitrary but fixed results, we add fresh variables to V. However, to account for different results produced by ℓ invocations of ι in p we have to add ℓ variables. Let p be a program and ι ∈ ui_n(p) a unary instruction which appears ℓ times at positions j_1, ..., j_ℓ in p. For variables u_1, ..., u_ℓ, we initialize f_ι as follows:

∀w. f_ι(V, w) = ite(w = a_{j_1}, u_1, ite(w = a_{j_2}, u_2, ..., ite(w = a_{j_ℓ}, u_ℓ, w_⊥)))

where a_j is the word on the stack after j instructions in p, that is a_j = st_σ(V, j, c(j) − 1), and w_⊥ is a default word.

This approach straightforwardly extends to instructions with more than one argument. Here we assume that uninterpreted instructions put exactly one word onto the stack, i.e., α(ι) = 1 for all ι ∈ UI. This assumption is easily verified for the EVM: the only instructions with α(ι) > 1 are DUP and SWAP.
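The effect of treating ADDRESS as an arbitrary but fixed word can be replayed directly. The following is a hypothetical mini-interpreter using the gas costs quoted in Example 3, not the SMT encoding itself: whatever the unknown word u is, ADDRESS DUP1 and ADDRESS ADDRESS end in the same stack, but the latter is 1 g cheaper.

```python
GAS = {"ADDRESS": 2, "DUP1": 3}

def run(program, u):
    """Execute on an empty stack; ADDRESS pushes the arbitrary fixed word u."""
    stack, gas = [], 0
    for op in program:
        gas += GAS[op]
        if op == "ADDRESS":
            stack.append(u)
        elif op == "DUP1":
            stack.append(stack[-1])  # duplicate the top word
    return stack, gas
```

Checking a few sample values of u stands in for the universally quantified variable of the encoding.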
Finally we set the effect a non-constant uninterpreted instruction ι with associated function f_ι has on the stack:

τ_st(ι, σ, j) ≡ st_σ(V, j + 1, c_σ(j + 1) − 1) = f_ι(V, st_σ(V, j, c_σ(j) − 1))

(BLOCKHASH returns 0 if it is called for a block number greater than the current block number. While the current block number is not known at compile time, the instruction NUMBER does return it. Encoding this interplay between BLOCKHASH and NUMBER could potentially be exploited for finding optimizations.)

We implemented basic and unbounded superoptimization in our tool ebso, which is available under the Apache-2.0 license: github.com/juliannagele/ebso. The encoding employed by ebso uses several background theories: (i) uninterpreted functions (UF) for encoding the state of the EVM, for templates, and for encoding uninterpreted instructions, (ii) bit vector arithmetic (BV) for operations on words, (iii) quantifiers for initial words on the stack and in the storage, and for the results of uninterpreted instructions, and (iv) linear integer arithmetic (LIA) for the instruction counter. Hence, following the SMT-LIB classification, ebso's constraints fall under the logic UFBVLIA. As SMT solver we chose Z3 [3], version 4.7.1, which we call with default configurations. In particular, Z3 performed well for the theory of quantified bit vectors and uninterpreted functions in the last SMT competition (albeit non-competing).

The aim of our implementation is to provide a prototype without relying on heavy engineering and optimizations such as exploiting parallelism or tweaking Z3 strategies. But without any optimization, for the full word size of the EVM (256 bit) ebso did not handle the simple program PUSH ADD POP within a reasonable amount of time. Thus we need techniques to make ebso viable.
By investigating the models generated by Z3 run with the default configuration, we believe that the problem lies with the leading universally quantified variables. And we have plenty of them: for the input on the stack, for the storage, and for uninterpreted instructions. By reducing the word size to a small k, we can reduce the search space for universally quantified variables from 2^256 to some significantly smaller 2^k. But then we need to check any target program found with a smaller word size.

Example 5. The program PUSH 0 SUB PUSH 3 ADD from Example 1 optimizes to NOT for word size 2 bit, because then the binary representation of 3 is all ones. When using word size 256 bit this optimization is not correct.

To ensure that the target program has the same semantics for word size 256 bit, we use translation validation: we ask the solver to find inputs which distinguish the source and target programs, i.e., where both programs start in equivalent states, but their final state is different. Using our existing machinery this formula is easy to build:

Definition 8. Two programs p and p′ are equivalent if

ν(p, p′, σ, σ′) ≡ ∃V. τ(p, σ) ∧ τ(p′, σ′) ∧ ε(σ, σ′, 0, 0) ∧ ¬ε(σ, σ′, |p|, |p′|)

is unsatisfiable. Otherwise, p and p′ are different, and the values for the variables in V from the model are a corresponding witness.

smtlib.cs.uiowa.edu/logics.shtml
smt-comp.github.io/2019/results/ufbv-single-query
This approach also allows for other over-approximations. For instance, we tried using integers instead of bit vectors, which performed worse.

A subtle problem remains: how can we represent a program PUSH a with word size k bit if the argument a does not fit into k bits? Our solution is to replace arguments a_1, ..., a_m of PUSH where a_i ≥ 2^k with fresh, universally quantified variables c_1, ..., c_m.
If a target program is found, we replace c_i by the original value a_i, and check with translation validation whether this target program is correct. A drawback of this approach is that we might lose potential optimizations.

Example 6. The program PUSH w AND, where w is the all-ones word, optimizes to the empty program. But abstracting the argument of PUSH translates the program to PUSH c_i AND, which does not allow the same optimization.

Like many compiler optimizations, ebso optimizes basic blocks. Therefore we split EVM bytecode along instructions that change the control flow, e.g. JUMPI or SELFDESTRUCT. Similarly we further split basic blocks into (ebso) blocks so that they contain only encoded instructions. Instructions which we do not encode, or which are not encodable, include instructions that write to memory, e.g. MSTORE, or the log instructions LOG.

Lemma 1. If program p superoptimizes to program t then in any program we can replace p by t.

Proof. We show the statement by induction on the program context (c_1, c_2) of the program c_1 p c_2. By assumption, the statement holds for the base case ([], []). For the step case (ι c_1, c_2), we observe that every instruction ι is deterministic, i.e., executing ι starting from a state σ leads to a deterministic state σ′. By the induction hypothesis, executing c_1 p c_2 and c_1 t c_2 from a state σ′ leads to the same state σ′′, and therefore we can replace ι c_1 p c_2 by ι c_1 t c_2. We can reason analogously for (c_1, c_2 ι).

We evaluated ebso on two real-world data sets: (i) an already highly optimized data set in Section 6.1, and (ii) a large-scale data set from the Ethereum blockchain to compare basic and unbounded superoptimization in Section 6.2. We use ebso to extract ebso blocks from our data sets. From the extracted blocks (i) we remove duplicate blocks, and (ii) we remove blocks which are only different in the arguments of PUSH by abstracting to word size 4 bit.
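The interplay of word-size reduction and translation validation from Example 5 can be checked numerically. This is a hypothetical sketch: the two functions below just compute the final stack word of the source program and of the NOT candidate at a given word size.

```python
def source(x, bits):
    """3 + (0 - x), i.e. the program PUSH 0 SUB PUSH 3 ADD, modulo 2**bits."""
    m = 2 ** bits
    return (3 + (0 - x) % m) % m

def candidate(x, bits):
    """NOT x, the candidate found at the reduced word size."""
    m = 2 ** bits
    return ~x % m
```

At 2 bit the two agree on all four inputs, since there 3 is the all-ones word; translation validation at 256 bit then rejects the candidate, e.g. on input x = 0, where the source yields 3 but NOT 0 is the all-ones word.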
We run both evaluations on a cluster [7] consisting of nodes running Intel Xeon E5645 processors at 2.40 GHz, with one core and 1 GiB of memory per instance.

We successfully validated all optimizations found by ebso by running a reference implementation of the EVM on pseudo-random input: we run the bytecode of the original input block and the optimized bytecode and observe that both produce the same final state. The EVM implementation we use is go-ethereum (github.com/ethereum/go-ethereum).

Table 1: Aggregated results of running ebso on GG.
optimized (proved optimal): 19 (10), 0.69% (0.36%)
proved optimal: 481, 17.54%
time-out (trans. val. failed): 2243 (196), 81.77% (7.15%)

This evaluation tests ebso against human intelligence. Underlying our data set are 200 Solidity contracts (GG_raw) we collected from the Gas Golfing Contest. In that contest competitors had to write the most gas-efficient Solidity code for five given challenges: (i) integer sorting, (ii) implementing an interpreter, (iii) hex decoding, (iv) string searching, and (v) removing duplicate elements. Every challenge had two categories: standard and wild. For wild, any Solidity feature is allowed, even inlining EVM bytecode. The winner of each track received 1 Ether. The Gas Golfing Contest provides a very high-quality data set: the EVM bytecode was not only optimized by the solc compiler, but also by humans leveraging these compiler optimizations and writing inline code themselves. To collect our data set GG, we first compiled the Solidity contracts in GG_raw with the same set-up as in the contest. One contract in the wild category failed to compile and was thus excluded from GG_raw. From the generated .bin-runtime files, we extracted our final data set GG of 2743 distinct blocks.

For this evaluation, we run ebso in its default mode: unbounded superoptimization. We run unbounded superoptimization because, as can be seen in Section 6.2, in our context unbounded superoptimization outperformed basic superoptimization.
As time-out for this challenging data set, we estimated 1 h as reasonable.

Table 1 shows the aggregated results of running ebso on GG. In total, ebso optimizes 19 blocks out of 2743, 10 of which are shown to be optimal. Moreover, ebso can prove for more than 17% of blocks in GG that they are already optimal. It is encouraging that ebso even finds optimizations in this already highly optimized data set. The quality of the data set is supported by the high percentage of blocks being proved optimal by ebso. Next we examine three found optimizations more closely. Our favorite optimization concerns the block POP PUSH SWAP1 POP PUSH SLT DUP1 EQ PUSH SLT DUP1 EQ, which is, in fact, a round-about and optimizable way to pop two words from the stack and push 1 on the stack. Some optimizations follow clear patterns. The optimizations CALLVALUE DUP1 ISZERO PUSH 81 to CALLVALUE CALLVALUE ISZERO PUSH 81 and CALLVALUE DUP1 ISZERO PUSH 364 to CALLVALUE CALLVALUE ISZERO PUSH 364 are both based on the fact that CALLVALUE is cheaper than DUP1. Finding such patterns and generalizing them into peephole optimization rules could be interesting future work. (The contest is at g.solidity.cc; we compiled with $ solc --optimize --bin-runtime --optimize-runs 200, solc compiler version 0.4.24, available at github.com/ethereum/solidity/tree/v0.4.24.)

Unfortunately, ebso hit a time-out in nearly 82% of all cases, where we count a failed translation validation as part of the time-outs, since in that case ebso continues to search for optimizations after increasing the word size.

Unbounded vs. Basic Superoptimization

Table 2: Aggregated results of running ebso with uso and bso on EthBC.
optimized (proved optimal): uso 943 (393), 1.54% (0.64%); bso 184, 0.30%
proved optimal: uso 3882, 6.34%; bso 348, 0.57%
time-out (trans. val. failed): uso 56 392 (1467), 92.12% (2.40%); bso 60 685, 99.13%

In this evaluation we compare unbounded and basic superoptimization, which we will abbreviate with uso and bso, respectively. To compare uso and bso, we want a considerably larger data set.
Fortunately, there is a rich source of EVM bytecode accessible: contracts deployed on the Ethereum blockchain. Assuming that contracts that are called more often are well constructed, we queried the 2500 most called contracts using Google BigQuery (cloud.google.com/blog/products/data-analytics/ethereum-bigquery-public-dataset-smart-contract-analytics), considering contracts deployed up to block number 7 300 000, on Mar-04-2019 01:22:15 AM +UTC. From these contracts we extracted our data set EthBC of 61 217 distinct blocks. For this considerably larger data set, we estimated a cut-off point of 15 min as reasonable. One limitation is that, due to the high volume, we run the full evaluation only once.

Table 2 shows the aggregated results of running ebso on EthBC. Out of the 61 217 blocks in EthBC, ebso finds 943 optimizations using uso, out of which it proves 393 to be optimal. Using bso, 184 optimizations are found. Some blocks were shown to be optimal by both approaches. Both approaches also time out in the majority of cases: uso in more than 92 % and bso in more than 99 %. Over all 61 217 blocks, the total amount of gas saved is 17 871 g for uso and 6903 g for bso. For all blocks where an optimization is found, the average gas saving per block is 29.63 % for uso and more than 46 % for bso. The higher average for bso can be explained by (i) bso's bias towards smaller blocks, where relative savings are naturally higher, and (ii) bso only producing optimal results, whereas uso may find intermediate, non-optimal results. The optimization with the largest gain is one we did not necessarily expect to find in a deployed contract: a redundant storage access. Storage is expensive, and hence optimized for in deployed contracts, but uso and bso both found PUSH PUSH SLOAD SUB PUSH DUP2 SWAP1 SSTORE POP, which optimizes to the empty program, because the program merely loads the value from key 4 only to store it back to that same key.
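The gas saved by removing this block can be tallied from the yellow paper's fee schedule. A sketch of the accounting (Byzantium-era costs; the block's PUSH immediates do not affect the gas computation):

```python
# Per-instruction gas costs (yellow paper fee schedule, Byzantium).
GAS = {"PUSH": 3, "SLOAD": 200, "SUB": 3, "DUP2": 3, "SWAP1": 3, "POP": 2}
SSTORE_RESET = 5_000   # storing to a slot whose old value is nonzero, or storing zero
SSTORE_SET = 20_000    # storing a nonzero value to a slot whose old value is zero

block = ["PUSH", "PUSH", "SLOAD", "SUB", "PUSH", "DUP2", "SWAP1", "SSTORE", "POP"]
base = sum(GAS[op] for op in block if op != "SSTORE")  # 220 g

# Removing the block saves its entire cost: the two SSTORE cases give
# the lower and upper bounds on the savings.
assert base + SSTORE_RESET == 5_220
assert base + SSTORE_SET == 20_220
```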
This optimization saves at least 5220 g, and up to 20 220 g. From Table 2 we see that on EthBC, uso outperforms bso by roughly a factor of five on found optimizations; more than ten times as many blocks are proved optimal by uso than by bso. As we expected, most optimizations found by bso were also found by uso, but, surprisingly, bso found 21 optimizations on which uso failed. We found that nearly all of the 21 source programs are fairly complicated, but have a short optimization of two or three instructions. To pick an example, bso optimized the block PUSH PUSH SLOAD LT ISZERO ISZERO ISZERO to PUSH PUSH PUSH. Additionally, all 21 optimizations are cheap: all cost less than 10 g. We would have expected at least some of these optimizations to also be found by uso. We believe an unfortunate, non-deterministic choice within the solver to be the reason that they were not.

Summary. We developed ebso, a superoptimizer for EVM bytecode, implementing two different superoptimization approaches, and compared them on a large set of real-world smart contracts. Our experiments show that relying on the heavily optimized search heuristics of a modern SMT solver is a feasible approach to superoptimizing EVM bytecode.

Related Work. Superoptimization [9] has been explored in a variety of contexts [5, 6, 10, 12], including binary translation [1] and synthesizing compiler optimizations [11]. To our knowledge, ebso is the first application of superoptimization to smart contracts. Chen et al. [2] also aim to save gas by optimizing EVM bytecode. They identified 24 anti-patterns by manual inspection. Building on their work, we ran ebso on their patterns. For 19 instances, ebso found the same optimizations. For 2 patterns, ebso lacks an encoding of the instructions involved (STOP, JUMP), and for 2 patterns ebso times out on a local machine. Due to the repeated exploitation of flaws in smart contracts, various formal approaches for analyzing EVM bytecode have been proposed.
For instance, Oyente [8] performs control flow analysis in order to detect security defects such as reentrancy bugs.

Outlook. There is ample opportunity for future work. We do not yet support the EVM's memory. While conceptually this would be a straightforward extension, the number of universally quantified variables and the size of blocks already pose challenges for performance, as we identified by analyzing the optimizations found by ebso. Thus, it would be interesting to use the SMT benchmarks obtained from ebso's superoptimization encoding to evaluate different solvers, e.g. CVC4 (cvc4.cs.stanford.edu/web) or Vampire. The basis for this is already in place: ebso can export the generated constraints in SMT-LIB format. Accordingly, we plan to generate new SMT benchmarks and submit them to one of the suitable categories of SMT-LIB. In order to ease the burden on developers, ebso could benefit from caching common optimization patterns [11] to speed up optimization times. Another fruitful approach could be to extract the optimization patterns and generalize them into peephole optimizations and rewrite rules.

References

1. Bansal, S., Aiken, A.: Binary translation using peephole superoptimizers. In: Proc. 8th OSDI. pp. 177-192. USENIX (2008)
2. Chen, T., Li, Z., Zhou, H., Chen, J., Luo, X., Li, X., Zhang, X.: Towards saving money in using smart contracts. In: Proc. 40th ICSE-NIER. pp. 81-84. ACM (2018). https://doi.org/10.1145/3183399.3183420
3. De Moura, L., Bjørner, N.: Z3: An efficient SMT solver. In: Proc. 14th TACAS. LNCS, vol. 4963, pp. 337-340. Springer (2008)
4. Gulwani, S., Jha, S., Tiwari, A., Venkatesan, R.: Synthesis of loop-free programs. In: Proc. 32nd PLDI. pp. 62-73. ACM (2011). https://doi.org/10.1145/1993498.1993506
5. Jangda, A., Yorsh, G.: Unbounded superoptimization. In: Proc. Onward! 2017. pp. 78-88. ACM (2017). https://doi.org/10.1145/3133850.3133856
6. Joshi, R., Nelson, G., Randall, K.H.: Denali: A goal-directed superoptimizer. In: Proc. 23rd PLDI. pp. 304-314. ACM (2002). https://doi.org/10.1145/512529.512566
7. King, T., Butcher, S., Zalewski, L.: Apocrita - High Performance Computing Cluster for Queen Mary University of London (Mar 2017). https://doi.org/10.5281/zenodo.438045
8. Luu, L., Chu, D.H., Olickel, H., Saxena, P., Hobor, A.: Making smart contracts smarter. In: Proc. 23rd CCS. pp. 254-269. ACM (2016). https://doi.org/10.1145/2976749.2978309
9. Massalin, H.: Superoptimizer: A look at the smallest program. In: Proc. 2nd ASPLOS. pp. 122-126. IEEE (1987). https://doi.org/10.1145/36206.36194
10. Phothilimthana, P.M., Thakur, A., Bodík, R., Dhurjati, D.: Scaling up superoptimization. In: Proc. 21st ASPLOS. pp. 297-310. ACM (2016). https://doi.org/10.1145/2872362.2872387
11. Sasnauskas, R., Chen, Y., Collingbourne, P., Ketema, J., Taneja, J., Regehr, J.: Souper: A synthesizing superoptimizer. CoRR abs/1711.04422 (2017), http://arxiv.org/abs/1711.04422
12. Schkufza, E., Sharma, R., Aiken, A.: Stochastic superoptimization. In: Proc. 18th ASPLOS. pp. 305-316. ACM (2013). https://doi.org/10.1145/2451116.2451150
13. Srinivasan, V., Reps, T.: Synthesis of machine code from semantics. In: Proc. 36th PLDI. pp. 596-607. ACM (2015). https://doi.org/10.1145/2737924.2737960
14. Wood, G.: Ethereum: A secure decentralised generalised transaction ledger. Tech. Rep. Byzantium Version e94ebda (2018), https://ethereum.github.io/yellowpaper/paper.pdf