Blockchain Superoptimizer ⋆
Julian Nagele (Queen Mary University of London, UK, [email protected]) and Maria A Schett (University College London, UK, [email protected])
Abstract.
In the blockchain-based, distributed computing platform Ethereum, programs called smart contracts are compiled to bytecode and executed on the Ethereum Virtual Machine (EVM). Executing EVM bytecode is subject to monetary fees: a clear optimization target. Our aim is to superoptimize EVM bytecode by encoding the operational semantics of EVM instructions as SMT formulas and leveraging a constraint solver to automatically find cheaper bytecode. We implement this approach in our EVM Bytecode SuperOptimizer ebso and perform two large-scale evaluations on real-world data sets.
Keywords:
Superoptimization, Ethereum, Smart Contracts, SMT
Ethereum is a blockchain-based, distributed computing platform featuring a quasi-Turing-complete programming language. In Ethereum, programs are called smart contracts; they are compiled to bytecode and executed on the Ethereum Virtual Machine (EVM). To avoid network spam and to ensure termination, execution is subject to monetary fees. These fees are specified in units of gas, i.e., any instruction executed on the EVM has a cost in terms of gas, possibly depending on its input and the execution state.
Example 1.
Consider the expression 3 + (0 − x), which corresponds to the program PUSH 0 SUB PUSH 3 ADD. The EVM is a stack-based machine, so this program takes an argument x from the stack to compute the expression above. However, clearly one can save the ADD instruction and instead compute 3 − x, i.e., optimize the program to PUSH 3 SUB. The first program costs 12 g to execute on the EVM, while the second costs only 6 g.

We build a tool that automatically finds this optimization and similar others that are missed by state-of-the-art smart contract compilers: the EVM bytecode superoptimizer ebso. The use of ebso for Example 1 is sketched in Figure 1. To find these optimizations, ebso implements superoptimization. Superoptimization is often considered too slow to use during software development except for special circumstances. We argue that compiling smart contracts is such a circumstance.

⋆ This research is supported by the UK Research Institute in Verified Trustworthy Software Systems and partially supported by funding from Google.

Fig. 1: Overview of ebso: the EVM executes the source PUSH 0 SUB PUSH 3 ADD for 12 g; ebso finds the equivalent PUSH 3 SUB, which executes for only 6 g.

Since bytecode, once it has been deployed to the blockchain, cannot change again, spending extra time optimizing a program that may be called many times might well be worth it. Especially since it is very clear what "worth it" means: the clear cost model of gas makes it easy to define optimality. Our main contributions are: (i) an SMT encoding of a subset of
EVM bytecode semantics (Section 4), (ii) an implementation of two flavors of superoptimization: basic, where the constraint solver is used to check equivalence of enumerated candidate instruction sequences, and unbounded, where also the enumeration itself is shifted to the constraint solver (Section 5), and (iii) two large-scale evaluations (Section 6). First, we run ebso on a collection of smart contracts from a programming competition aimed at producing the cheapest EVM bytecode for given programming challenges. Even in this already highly optimized data set ebso still finds 19 optimizations. In the second evaluation we compare the performance of basic and unbounded superoptimization on the 2500 most called smart contracts from the Ethereum blockchain and find that, in our setting, unbounded superoptimization outperforms basic superoptimization.
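The gas arithmetic behind Example 1 can be replayed with a small stack-machine sketch. This is a hypothetical, heavily simplified interpreter for just the three instructions involved, not ebso itself; the uniform cost of 3 g per instruction matches the costs quoted above.

```python
# Toy model of the EVM stack (top of stack = end of the list); words are
# 256-bit, so all arithmetic is taken modulo 2**256.
MOD = 2 ** 256

def run(program, stack):
    """Execute a list of (opcode, argument) pairs; return (stack, gas used)."""
    stack, gas = list(stack), 0
    for op, arg in program:
        gas += 3  # PUSH, ADD and SUB each cost 3 g
        if op == "PUSH":
            stack.append(arg % MOD)
        elif op == "ADD":
            a, b = stack.pop(), stack.pop()
            stack.append((a + b) % MOD)
        elif op == "SUB":
            a, b = stack.pop(), stack.pop()  # a is the top word
            stack.append((a - b) % MOD)
    return stack, gas

# 3 + (0 - x) versus the optimized 3 - x, both on initial stack [x]
source = [("PUSH", 0), ("SUB", None), ("PUSH", 3), ("ADD", None)]
target = [("PUSH", 3), ("SUB", None)]
```

For every input x the two programs leave the same single word on the stack, while the gas counters come out as 12 g and 6 g respectively.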
Smart contracts in Ethereum are usually written in a specialized high-level language such as Solidity or Vyper and then compiled into bytecode, which is executed on the EVM. The EVM is a virtual machine formally defined in the Ethereum yellow paper [14]. It is based on a stack, which holds words, i.e., bit vectors of size 256. (This word size was chosen to facilitate the cryptographic computations, such as hashing, that are often performed in the EVM.) The maximal stack size is set to 1024. Pushing words onto a full stack leads to a stack overflow, while removing words from the empty stack leads to a stack underflow. Both lead the EVM to enter an exceptional halting state. The EVM also features a volatile memory, a word-addressed byte array, and a persistent key-value storage, a word-addressed word array, whose contents are stored on the Ethereum blockchain. (Setting the gas price of individual instructions such that it accurately reflects the computational cost is hard, and has been a problem in the past; see e.g. news.ycombinator.com/item?id=12557372.)

The bytecode directly corresponds to more human-friendly instructions. For example, a few bytes of EVM bytecode encode the sequence of instructions PUSH 41 PUSH 1 ADD. Instructions can be classified into different categories, such as arithmetic operations, e.g.
ADD and SUB for addition and subtraction, comparisons, e.g. SLT for signed less-than, and bitwise operations, like AND and NOT. The instruction PUSH pushes a word onto the stack, while POP removes the top word. Words on the stack can be duplicated using DUPi and swapped using SWAPi for 1 ≤ i ≤ 16, where i refers to the i-th word below the top. Some instructions are specific to the blockchain domain, like BLOCKHASH, which returns the hash of a recently mined block, or ADDRESS, which returns the address of the currently executing account. Instructions for control flow include e.g. JUMP, JUMPDEST, and STOP.

We write δ(ι) for the number of words that instruction ι takes from the stack, and α(ι) for the number of words ι adds onto the stack. A program p is a finite sequence of instructions. We define the size |p| of a program as the number of its instructions. To execute a program on the Ethereum blockchain, the caller has to pay gas. The amount to be paid depends on both the instructions of the program and the input: every instruction comes with a gas cost. For example,
PUSH and ADD currently cost 3 g, and therefore executing the program above costs 9 g. (We gloss over the 32 different PUSH instructions, which differ in the size of the word to be pushed.) Most instructions have a fixed cost, but some take the current state of the execution into account. A prominent example of this behavior is storage. Writing to a zero-valued key conceptually allocates new storage and thus is more expensive than writing to a key that is already in use, i.e., holds a non-zero value. The gas prices of all instructions are specified in the yellow paper [14].

Given a source program p, superoptimization tries to generate a target program p′ such that (i) p′ is equivalent to p, and (ii) the cost of p′ is minimal with respect to a given cost function C. This problem arises in several contexts with different source and target languages. In our case, i.e., for a binary recompiler, both source and target are EVM bytecode.

A standard approach to superoptimization and synthesis [4, 9, 12, 13] is to search through the space of candidate instruction sequences of increasing cost and use a constraint solver to check whether a candidate correctly implements the source program. The solver of choice is usually a Satisfiability Modulo Theories (SMT) solver, which operates on first-order formulas in combination with background theories, such as the theory of bit vectors or arrays. Modern SMT solvers are highly optimized and implement techniques to handle arbitrary first-order formulas, such as E-matching. With increasing cost of the candidate sequence, the search space dramatically increases. To deal with this explosion one idea is to hand some of the search to the solver, by using templates [4, 13]. Templates leave holes in the target program, e.g. for immediate arguments of instructions, that the solver must then fill. A candidate program is correct if the encoding is satisfiable, i.e., if the solver finds a model. Constructing the target program then amounts to obtaining the values for the templates from the model. This approach is shown in Algorithm 2(a).

1: function BasicSo(p_s, C)
2:   n ← 0
3:   while true do
4:     for all p_t ∈ {p | C(p) = n} do
5:       χ ← EncodeBso(p_s, p_t)
6:       if Satisfiable(χ) then
7:         m ← GetModel(χ)
8:         p_t ← DecodeBso(m)
9:         return p_t
10:    n ← n + 1
(a) Basic Superoptimization.

1: function UnboundedSo(p_s, C)
2:   p_t ← p_s
3:   χ ← EncodeUso(p_t) ∧ Bound(p_t, C)
4:   while Satisfiable(χ) do
5:     m ← GetModel(χ)
6:     p_t ← DecodeUso(m)
7:     χ ← χ ∧ Bound(p_t, C)
8:   return p_t
(b) Unbounded Superoptimization.

Alg. 2: Superoptimization.
Unbounded superoptimization [5, 6] pushes this idea further. Instead of searching through candidate programs and calling the SMT solver on them, it shifts the search into the solver, i.e., the encoding expresses all candidate instruction sequences of any length that correctly implement the source program. This approach is shown in Algorithm 2(b): if the solver returns satisfiable then there is an instruction sequence that correctly implements the source program. Again, this target program is reconstructed from the model. If successful, a constraint asking for a cheaper program is added and the solver is called again. Note that this also means that unbounded superoptimization can stop with a correct, but possibly non-optimal solution. In contrast, basic superoptimization cannot return a correct solution until it has finished.

The main ingredients of superoptimization in Algorithm 2 are EncodeBso/Uso, producing the SMT encoding, and DecodeBso/Uso, reconstructing the target program from a model. We present our encodings for the semantics of EVM bytecode in the following section.
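Algorithm 2(a) can be sketched concretely. The following toy version is a hypothetical illustration: the opcode names, the 2-bit word size, and the uniform cost model are all assumptions made here, and exhaustive testing over the tiny word domain stands in for the SMT equivalence check. It enumerates candidate programs of increasing cost and returns the first equivalent one.

```python
from itertools import product

WORD = 4  # 2-bit words: small enough to test all inputs exhaustively
OPS = ["PUSH0", "PUSH3", "ADD", "SUB"]  # every opcode costs 3 g here

def run(prog, x):
    """Execute prog on initial stack [x]; return the final stack."""
    st = [x]
    for op in prog:
        if op == "PUSH0": st.append(0)
        elif op == "PUSH3": st.append(3 % WORD)
        elif op == "ADD": a, b = st.pop(), st.pop(); st.append((a + b) % WORD)
        elif op == "SUB": a, b = st.pop(), st.pop(); st.append((a - b) % WORD)
    return st

def equivalent(p, q):
    """Stand-in for the solver: compare final stacks on every input."""
    for x in range(WORD):
        try:
            if run(p, x) != run(q, x):
                return False
        except IndexError:  # a candidate underflowed the stack
            return False
    return True

def basic_so(source):
    """Enumerate candidates by length (= cost, since all costs are equal)."""
    length = 0
    while True:
        for cand in product(OPS, repeat=length):
            if equivalent(source, list(cand)):
                return list(cand)
        length += 1
```

On the running example this toy loop rediscovers the optimization from Example 1: basic_so(["PUSH0", "SUB", "PUSH3", "ADD"]) returns ["PUSH3", "SUB"].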
We start by encoding three parts of the EVM execution state: (i) the stack, (ii) gas consumption, and (iii) whether the execution is in an exceptional halting state. We model the stack as an uninterpreted function together with a counter, which points to the next free position on the stack.

Definition 1. A state σ = ⟨st, c, hlt, g⟩ consists of
(i) a function st(V, j, n) that, after the program has executed j instructions on input variables from V, returns the word from position n in the stack,
(ii) a function c(j) that returns the number of words on the stack after executing j instructions; hence st(V, j, c(j) − 1) returns the top of the stack,
(iii) a function hlt(j) that returns true (⊤) if exceptional halting has occurred after executing j instructions, and false (⊥) otherwise,
(iv) a function g(V, j) that returns the amount of gas consumed after executing j instructions.

Here the functions in σ represent all execution states of a program, indexed by the variable j.

Example 2.
Symbolically executing the program
PUSH 41 PUSH 1 ADD using our representation above, we have

g(0) = 0   g(1) = 3   g(2) = 6   g(3) = 9
c(0) = 0   c(1) = 1   c(2) = 2   c(3) = 1
st(1, 0) = 41   st(2, 0) = 41   st(2, 1) = 1   st(3, 0) = 42

and hlt(0) = hlt(1) = hlt(2) = hlt(3) = ⊥.

Note that this program does not consume any words that were already on the stack. This is not the case in general. For instance we might be dealing with the body of a function, which takes its arguments from the stack. Hence we need to ensure that at the beginning of the execution sufficiently many words are on the stack. To this end we first compute the depth δ̂(p) of the program p, i.e., the number of words a program p consumes. Then we take variables x_0, ..., x_{δ̂(p)−1} that represent the input to the program and initialize our functions accordingly.

Definition 2.
For a program with δ̂(p) = d we initialize the state σ using

g_σ(0) = 0 ∧ hlt_σ(0) = ⊥ ∧ c_σ(0) = d ∧ ⋀_{0 ≤ ℓ < d} st_σ(V, 0, ℓ) = x_ℓ

For example, for the single-instruction program ADD we set c(0) = 2, and st({x_0, x_1}, 0, 0) = x_0 and st({x_0, x_1}, 0, 1) = x_1. We then have st({x_0, x_1}, 1, 0) = x_0 + x_1.

To encode the effect of EVM instructions we build SMT formulas to capture their operational semantics. That is, for an instruction ι and a state σ we give a formula τ(ι, σ, j) that defines the effect on state σ if ι is the j-th instruction that is executed. Since large parts of these formulas are similar for every instruction and only depend on δ and α, we build them from smaller building blocks.

Definition 3. For an instruction ι and state σ we define:

τ_g(ι, σ, j) ≡ g_σ(V, j + 1) = g_σ(V, j) + C(σ, j, ι)
τ_c(ι, σ, j) ≡ c_σ(j + 1) = c_σ(j) + α(ι) − δ(ι)
τ_pres(ι, σ, j) ≡ ∀n. n < c_σ(j) − δ(ι) → st_σ(V, j + 1, n) = st_σ(V, j, n)
τ_hlt(ι, σ, j) ≡ hlt_σ(j + 1) = (hlt_σ(j) ∨ c_σ(j) − δ(ι) < 0 ∨ c_σ(j) − δ(ι) + α(ι) > 1024)

Here C(σ, j, ι) is the gas cost of executing instruction ι on state σ after j steps. The formula τ_g adds the cost of ι to the gas cost incurred so far. The formula τ_c updates the counter for the number of words on the stack according to δ and α. The formula τ_pres expresses that all words on the stack below c_σ(j) − δ(ι) are preserved. Finally, τ_hlt captures that exceptions relevant to the stack can occur through either an underflow or an overflow, and that once it has occurred an exceptional halt state persists. For now the only other component we need is how the instructions affect the stack st, i.e., a formula τ_st(ι, σ, j). Here we only give an example and refer to our implementation or the yellow paper [14] for details. We have

τ_st(ADD, σ, j) ≡ st_σ(V, j + 1, c_σ(j + 1) − 1) = st_σ(V, j, c_σ(j) − 1) + st_σ(V, j, c_σ(j) − 2)

Definition 4.
For an instruction ι and state σ we define

τ(ι, σ, j) ≡ τ_st(ι, σ, j) ∧ τ_c(ι, σ, j) ∧ τ_g(ι, σ, j) ∧ τ_hlt(ι, σ, j) ∧ τ_pres(ι, σ, j)

Then to encode the semantics of a program p all we need to do is to apply τ to the instructions of p.

Definition 5. For a program p = ι_0 ⋯ ι_n we set τ(p, σ) ≡ ⋀_{0 ≤ j ≤ n} τ(ι_j, σ, j).

Before building an encoding for superoptimization we consider another aspect of the EVM for our state representation: storage and memory. The gas cost for storing words depends on the words that are currently stored. Similarly, the cost for using memory depends on the number of bytes currently used. This is why the cost C(σ, j, ι) of an instruction depends on the state, and the function g_σ accumulating gas cost depends on V.

To add support for storage and memory to our encoding there are two natural choices: the theory of arrays or an Ackermann encoding. However, since we have not used arrays so far, they would require the solver to deal with an additional theory. For an Ackermann encoding we only need uninterpreted functions, which we have used already. Hence, to represent storage in our encoding we extend states with an uninterpreted function str(V, j, k), which returns the word at key k after the program has executed j instructions. Similarly to how we set up the initial stack we need to deal with the values held by the storage before the program is executed. Thus, to initialize str we introduce fresh variables to represent the initial contents of the storage. More precisely, for all SLOAD and SSTORE instructions occurring at positions j_1, ..., j_ℓ in the source program, we introduce fresh variables s_1, ..., s_ℓ and add them to V. Then for a state σ we initialize str_σ by adding the following conjunct to the initialization constraint from Definition 2:

∀w. str_σ(V, 0, w) = ite(w = a_{j_1}, s_1, ite(w = a_{j_2}, s_2, ..., ite(w = a_{j_ℓ}, s_ℓ, w_⊥)))

where a_j = st_σ(V, j, c(j) − 1) and w_⊥ is the default value for words in the storage.

The effect of the two storage instructions SLOAD and SSTORE can then be encoded as follows:

τ_st(SLOAD, σ, j) ≡ st_σ(V, j + 1, c_σ(j + 1) − 1) = str_σ(V, j, st_σ(V, j, c_σ(j) − 1))
τ_str(SSTORE, σ, j) ≡ ∀w. str_σ(V, j + 1, w) = ite(w = st_σ(V, j, c_σ(j) − 1), st_σ(V, j, c_σ(j) − 2), str_σ(V, j, w))

Moreover all instructions except SSTORE preserve the storage, that is, for ι ≠ SSTORE we add the following conjunct to τ_pres(ι, σ, j):

∀w. str_σ(V, j + 1, w) = str_σ(V, j, w)

To encode memory a similar strategy is an obvious way to go. However, we first want to evaluate the solver's performance on the encodings obtained when using stack and storage. Since the solver already struggled, due to the size of the programs and the number of universally quantified variables (see Section 6), we have not yet added an encoding of memory.

Finally, to use our encoding for superoptimization we need an encoding of equality for two states after a certain number of instructions, either to ensure that two programs are equivalent (they start and end in equal states) or different (they start in equal states, but end in different ones). The following formula captures this constraint.

Definition 6. For states σ_1 and σ_2 and program locations j_1 and j_2 we define

ε(σ_1, σ_2, j_1, j_2) ≡ c_{σ_1}(j_1) = c_{σ_2}(j_2) ∧ hlt_{σ_1}(j_1) = hlt_{σ_2}(j_2)
  ∧ ∀n. n < c_{σ_1}(j_1) → st_{σ_1}(V, j_1, n) = st_{σ_2}(V, j_2, n)
  ∧ ∀w. str_{σ_1}(V, j_1, w) = str_{σ_2}(V, j_2, w)

Since we aim to improve gas consumption, we do not demand equality for g. We now have all ingredients needed to implement basic superoptimization: simply enumerate all possible programs ordered by gas cost and use the encodings to check equivalence.
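The storage treatment above can be mirrored by a small functional sketch (hypothetical, illustrative names; this models the semantics of the encoding, not the SMT formulas themselves): the initial storage is the nested ite chain with default word w⊥, and SSTORE is a point update that preserves all other keys.

```python
W_BOT = 0  # default word w⊥ for keys never written

def init_storage(keys, fresh):
    """str(V, 0, ·): the i-th accessed key maps to the fresh variable s_i,
    every other key to the default word; first match wins, like nested ite."""
    def str0(w):
        for a, s in zip(keys, fresh):
            if w == a:
                return s
        return W_BOT
    return str0

def sstore(str_j, key, value):
    """SSTORE: ite(w = key, value, str_j(w)), a point update."""
    return lambda w: value if w == key else str_j(w)

def sload(str_j, key):
    """SLOAD simply reads the current storage function."""
    return str_j(key)
```

For instance, storing 42 at key 4 changes only that key; reading any other key still yields its initial (symbolic) content, and earlier storage states stay untouched.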
However, since already for one PUSH there are 2^256 possible arguments, this will not produce results in a reasonable amount of time. Hence we use templates as described in Section 3. We introduce an uninterpreted function a(j) that maps a program location j to a word, which will be the argument of PUSH. The solver then fills these templates and we can get the values from the model. This is a step forward, but since we have 80 encoded instructions, enumerating all permutations still yields too large a search space. Hence we use an encoding similar to the CEGIS algorithm [4]. Given a collection of instructions, we formulate a constraint representing all possible permutations of these instructions. It is satisfiable if there is a way to connect the instructions into a target program that is equivalent to the source program. The order of the instructions can again be reconstructed from the model provided by the solver. More precisely, given a source program p and a list of candidate instructions ι_1, ..., ι_n, EncodeBso from Algorithm 2(a) takes variables j_1, ..., j_n and two states σ and σ′ and builds the following formula:

∀V. ε(σ, σ′, 0, 0) ∧ ε(σ, σ′, |p|, n) ∧ τ(p, σ) ∧ ⋀_{1 ≤ ℓ ≤ n} τ(ι_ℓ, σ′, j_ℓ) ∧ ⋀_{1 ≤ ℓ < ℓ′ ≤ n} j_ℓ ≠ j_{ℓ′}

Given a set of instructions CI we define the formula ρ(σ, n) as

∀j. j ≥ 0 ∧ j < n → ⋀_{ι ∈ CI} (instr(j) = ι → τ(ι, σ, j)) ∧ ⋁_{ι ∈ CI} instr(j) = ι

Finally, the constraint produced by EncodeUso from Algorithm 2(b) is

∀V. τ(p, σ) ∧ ρ(σ′, n) ∧ ε(σ, σ′, 0, 0) ∧ ε(σ, σ′, |p|, n) ∧ g_σ(V, |p|) > g_{σ′}(V, n)

During our experiments we observed that the solver struggles to show that the formula is unsatisfiable when p is already optimal.
To help in these cases we additionally add a bound on n: since the cheapest EVM instruction has gas cost 1, the target program cannot use more instructions than the gas cost of p, i.e., we add n ≤ g_σ(V, |p|).

In our application domain there are many instructions that fetch information from the outside world. For instance, ADDRESS gets the Ethereum address of the account currently executing the bytecode of this smart contract. Since it is not possible to know these values at compile time we cannot encode their full semantics. However, we would still like to take advantage of structural optimizations where these instructions are involved, e.g., via DUP and SWAP.

Example 3. Consider the program ADDRESS DUP1. The same effect can be achieved by simply calling ADDRESS ADDRESS. Duplicating words on the stack, if they are used multiple times, is an intuitive approach. However, because executing ADDRESS costs 2 g and DUP1 costs 3 g, perhaps unexpectedly, the second program is cheaper.

To find such optimizations we need a way to encode ADDRESS and similar instructions. For our purposes, these instructions have in common that they put arbitrary but fixed words onto the stack. Analogous to uninterpreted functions, we call them uninterpreted instructions and collect them in the set UI. To represent their output we use universally quantified variables, similar to input variables. To encode the effect uninterpreted instructions have on the stack, i.e., τ_st, we distinguish between constant and non-constant uninterpreted instructions.

Let ui_c(p) be the set of constant uninterpreted instructions in p, i.e., ui_c(p) = {ι ∈ p | ι ∈ UI ∧ δ(ι) = 0}. Then for ui_c(p) = {ι_1, ..., ι_k} we take variables u_{ι_1}, ..., u_{ι_k} and add them to V, and thus to the arguments of the state function st. The formula τ_st can then use these variables to represent the unknown word produced by the uninterpreted instruction, i.e.
, for ι ∈ ui_c(p) with the corresponding variable u_ι in V, we set τ_st(ι, σ, j) ≡ st_σ(V, j + 1, c_σ(j)) = u_ι.

For a non-constant instruction ι, such as BLOCKHASH or BALANCE, the word put onto the stack by ι depends on the top δ(ι) words of the stack. We again model this dependency using an uninterpreted function. That is, for every non-constant uninterpreted instruction ι in the source program p, ui_n(p) = {ι ∈ p | ι ∈ UI ∧ δ(ι) > 0}, we use an uninterpreted function f_ι. Conceptually, we can think of f_ι as a read-only memory initialized with the values that the calls to ι produce.

Example 4. The instruction BLOCKHASH gets the hash of a given block b. Thus optimizing the program PUSH b_1 BLOCKHASH PUSH b_2 BLOCKHASH depends on the values b_1 and b_2. If b_1 = b_2 then the cheaper program PUSH b_1 BLOCKHASH DUP1 yields the same state as the original program.

To capture this behaviour, we need to associate the arguments b_1 and b_2 of BLOCKHASH with the two different results they may produce. As with constant uninterpreted instructions, to model arbitrary but fixed results, we add fresh variables to V. However, to account for different results produced by ℓ invocations of ι in p we have to add ℓ variables. Let p be a program and ι ∈ ui_n(p) a unary instruction which appears ℓ times at positions j_1, ..., j_ℓ in p. For variables u_1, ..., u_ℓ, we initialize f_ι as follows:

∀w. f_ι(V, w) = ite(w = a_{j_1}, u_1, ite(w = a_{j_2}, u_2, ..., ite(w = a_{j_ℓ}, u_ℓ, w_⊥)))

where a_j is the word on the stack after j instructions in p, that is a_j = st_σ(V, j, c(j) − 1), and w_⊥ is a default word.

This approach straightforwardly extends to instructions with more than one argument. Here we assume that uninterpreted instructions put exactly one word onto the stack, i.e., α(ι) = 1 for all ι ∈ UI. This assumption is easily verified for the EVM: the only instructions with α(ι) > 1 are DUP and SWAP.
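The effect of treating ADDRESS as an arbitrary but fixed word can be replayed directly. The following is a hypothetical mini-interpreter using the gas costs quoted in Example 3, not the SMT encoding itself: whatever the unknown word u is, ADDRESS DUP1 and ADDRESS ADDRESS end in the same stack, but the latter is 1 g cheaper.

```python
GAS = {"ADDRESS": 2, "DUP1": 3}

def run(program, u):
    """Execute on an empty stack; ADDRESS pushes the arbitrary fixed word u."""
    stack, gas = [], 0
    for op in program:
        gas += GAS[op]
        if op == "ADDRESS":
            stack.append(u)
        elif op == "DUP1":
            stack.append(stack[-1])  # duplicate the top word
    return stack, gas
```

Checking a few sample values of u stands in for the universally quantified variable of the encoding.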
Finally we set the effect a non-constant uninterpreted instruction ι with associated function f_ι has on the stack:

τ_st(ι, σ, j) ≡ st_σ(V, j + 1, c_σ(j + 1) − 1) = f_ι(V, st_σ(V, j, c_σ(j) − 1))

(BLOCKHASH returns 0 if it is called for a block number greater than the current block number. While the current block number is not known at compile time, the instruction NUMBER does return it. Encoding this interplay between BLOCKHASH and NUMBER could potentially be exploited for finding optimizations.)

We implemented basic and unbounded superoptimization in our tool ebso, which is available under the Apache-2.0 license: github.com/juliannagele/ebso. The encoding employed by ebso uses several background theories: (i) uninterpreted functions (UF) for encoding the state of the EVM, for templates, and for encoding uninterpreted instructions, (ii) bit vector arithmetic (BV) for operations on words, (iii) quantifiers for initial words on the stack and in the storage, and for the results of uninterpreted instructions, and (iv) linear integer arithmetic (LIA) for the instruction counter. Hence, following the SMT-LIB classification, ebso's constraints fall under the logic UFBVLIA. As SMT solver we chose Z3 [3], version 4.7.1, which we call with default configurations. In particular, Z3 performed well for the theory of quantified bit vectors and uninterpreted functions in the last SMT competition (albeit non-competing).

The aim of our implementation is to provide a prototype without relying on heavy engineering and optimizations such as exploiting parallelism or tweaking Z3 strategies. But without any optimization, for the full word size of the EVM (256 bit) ebso did not handle the simple program PUSH ADD POP within a reasonable amount of time. Thus we need techniques to make ebso viable.
By investigating the models generated by Z3 run with the default configuration, we believe that the problem lies with the leading universally quantified variables. And we have plenty of them: for the input on the stack, for the storage, and for uninterpreted instructions. By reducing the word size to a small k, we can reduce the search space for universally quantified variables from 2^256 to some significantly smaller 2^k. But then we need to check any target program found with a smaller word size.

Example 5. The program PUSH 0 SUB PUSH 3 ADD from Example 1 optimizes to NOT for word size 2 bit, because then the binary representation of 3 is all ones. When using word size 256 bit this optimization is not correct.

To ensure that the target program has the same semantics for word size 256 bit, we use translation validation: we ask the solver to find inputs which distinguish the source and target programs, i.e., where both programs start in equivalent states, but their final state is different. Using our existing machinery this formula is easy to build:

Definition 8. Two programs p and p′ are equivalent if

ν(p, p′, σ, σ′) ≡ ∃V. τ(p, σ) ∧ τ(p′, σ′) ∧ ε(σ, σ′, 0, 0) ∧ ¬ε(σ, σ′, |p|, |p′|)

is unsatisfiable. Otherwise, p and p′ are different, and the values for the variables in V from the model are a corresponding witness.

smtlib.cs.uiowa.edu/logics.shtml
smt-comp.github.io/2019/results/ufbv-single-query
This approach also allows for other over-approximations. For instance, we tried using integers instead of bit vectors, which performed worse.

A subtle problem remains: how can we represent a program PUSH a with word size k bit if the argument a does not fit into k bits? Our solution is to replace arguments a_1, ..., a_m of PUSH where a_i ≥ 2^k with fresh, universally quantified variables c_1, ..., c_m.
If a target program is found, we replace c_i by the original value a_i, and check with translation validation whether this target program is correct. A drawback of this approach is that we might lose potential optimizations.

Example 6. The program PUSH w AND, where w is the all-ones word, optimizes to the empty program. But abstracting the argument of PUSH translates the program to PUSH c_i AND, which does not allow the same optimization.

Like many compiler optimizations, ebso optimizes basic blocks. Therefore we split EVM bytecode along instructions that change the control flow, e.g. JUMPI or SELFDESTRUCT. Similarly we further split basic blocks into (ebso) blocks so that they contain only encoded instructions. Instructions which we do not encode, or which are not encodable, include instructions that write to memory, e.g. MSTORE, or the log instructions LOG.

Lemma 1. If program p superoptimizes to program t then in any program we can replace p by t.

Proof. We show the statement by induction on the program context (c_1, c_2) of the program c_1 p c_2. By assumption, the statement holds for the base case ([], []). For the step case (ι c_1, c_2), we observe that every instruction ι is deterministic, i.e., executing ι starting from a state σ leads to a deterministic state σ′. By the induction hypothesis, executing c_1 p c_2 and c_1 t c_2 from a state σ′ leads to the same state σ′′, and therefore we can replace ι c_1 p c_2 by ι c_1 t c_2. We can reason analogously for (c_1, c_2 ι).

We evaluated ebso on two real-world data sets: (i) an already highly optimized data set in Section 6.1, and (ii) a large-scale data set from the Ethereum blockchain to compare basic and unbounded superoptimization in Section 6.2. We use ebso to extract ebso blocks from our data sets. From the extracted blocks (i) we remove duplicate blocks, and (ii) we remove blocks which are only different in the arguments of PUSH by abstracting to word size 4 bit.
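The interplay of word-size reduction and translation validation from Example 5 can be checked numerically. This is a hypothetical sketch: the two functions below just compute the final stack word of the source program and of the NOT candidate at a given word size.

```python
def source(x, bits):
    """3 + (0 - x), i.e. the program PUSH 0 SUB PUSH 3 ADD, modulo 2**bits."""
    m = 2 ** bits
    return (3 + (0 - x) % m) % m

def candidate(x, bits):
    """NOT x, the candidate found at the reduced word size."""
    m = 2 ** bits
    return ~x % m
```

At 2 bit the two agree on all four inputs, since there 3 is the all-ones word; translation validation at 256 bit then rejects the candidate, e.g. on input x = 0, where the source yields 3 but NOT 0 is the all-ones word.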
We run both evaluations on a cluster [7] consisting of nodes running Intel Xeon E5645 processors at 2.40 GHz, with one core and 1 GiB of memory per instance.

We successfully validated all optimizations found by ebso by running a reference implementation of the EVM on pseudo-random input: we run the bytecode of the original input block and the optimized bytecode and observe that both produce the same final state. The EVM implementation we use is go-ethereum (github.com/ethereum/go-ethereum).

Table 1: Aggregated results of running ebso on GG.
optimized (proved optimal): 19 (10), 0.69% (0.36%)
proved optimal: 481, 17.54%
time-out (trans. val. failed): 2243 (196), 81.77% (7.15%)

This evaluation tests ebso against human intelligence. Underlying our data set are 200 Solidity contracts (GG_raw) we collected from the Gas Golfing Contest. In that contest competitors had to write the most gas-efficient Solidity code for five given challenges: (i) integer sorting, (ii) implementing an interpreter, (iii) hex decoding, (iv) string searching, and (v) removing duplicate elements. Every challenge had two categories: standard and wild. For wild, any Solidity feature is allowed, even inlining EVM bytecode. The winner of each track received 1 Ether. The Gas Golfing Contest provides a very high-quality data set: the EVM bytecode was not only optimized by the solc compiler, but also by humans leveraging these compiler optimizations and writing inline code themselves. To collect our data set GG, we first compiled the Solidity contracts in GG_raw with the same set-up as in the contest. One contract in the wild category failed to compile and was thus excluded from GG_raw. From the generated .bin-runtime files, we extracted our final data set GG of 2743 distinct blocks.

For this evaluation, we run ebso in its default mode: unbounded superoptimization. We run unbounded superoptimization because, as can be seen in Section 6.2, in our context unbounded superoptimization outperformed basic superoptimization.
As time-out for this challenging data set, we estimated 1 h as reasonable.

Table 1 shows the aggregated results of running ebso on GG. In total, ebso optimizes 19 blocks out of 2743, 10 of which are shown to be optimal. Moreover, ebso can prove for more than 17% of blocks in GG that they are already optimal. It is encouraging that ebso even finds optimizations in this already highly optimized data set. The quality of the data set is supported by the high percentage of blocks being proved optimal by ebso. Next we examine three found optimizations more closely. Our favorite optimization concerns the block POP PUSH SWAP1 POP PUSH SLT DUP1 EQ PUSH SLT DUP1 EQ, which is, in fact, a round-about and optimizable way to pop two words from the stack and push 1 on the stack. Some optimizations follow clear patterns. The optimizations CALLVALUE DUP1 ISZERO PUSH 81 to CALLVALUE CALLVALUE ISZERO PUSH 81 and CALLVALUE DUP1 ISZERO PUSH 364 to CALLVALUE CALLVALUE ISZERO PUSH 364 are both based on the fact that CALLVALUE is cheaper than DUP1. Finding such patterns and generalizing them into peephole optimization rules could be interesting future work. (The contest is at g.solidity.cc; we compiled with $ solc --optimize --bin-runtime --optimize-runs 200, solc compiler version 0.4.24, available at github.com/ethereum/solidity/tree/v0.4.24.)

Unfortunately, ebso hit a time-out in nearly 82% of all cases, where we count a failed translation validation as part of the time-outs, since in that case ebso continues to search for optimizations after increasing the word size.

Unbounded vs. Basic Superoptimization

Table 2: Aggregated results of running ebso with uso and bso on EthBC.
optimized (proved optimal): uso 943 (393), 1.54% (0.64%); bso 184, 0.30%
proved optimal: uso 3882, 6.34%; bso 348, 0.57%
time-out (trans. val. failed): uso 56 392 (1467), 92.12% (2.40%); bso 60 685, 99.13%

In this evaluation we compare unbounded and basic superoptimization, which we will abbreviate with uso and bso, respectively. To compare uso and bso, we want a considerably larger data set.
Fortunately, there is a rich source of EVM bytecode accessible: contracts deployed on the Ethereum blockchain. Assuming that contracts that are called more often are well constructed, we queried the 2500 most called contracts using Google BigQuery (cloud.google.com/blog/products/data-analytics/ethereum-bigquery-public-dataset-smart-contract-analytics), considering contracts deployed up to block number 7 300 000, on Mar-04-2019 01:22:15 AM +UTC. From these contracts we extracted our data set EthBC of 61 217 distinct blocks. For this considerably larger data set, we estimated a cut-off point of 15 min as reasonable. One limitation is that, due to the high volume, we run the full evaluation only once.

Table 2 shows the aggregated results of running ebso on EthBC. Out of the 61 217 blocks in EthBC, ebso finds 943 optimizations using uso, out of which it proves 393 to be optimal. Using bso, 184 optimizations are found. Some blocks were shown to be optimal by both approaches. Both approaches also time out in the majority of cases: uso in more than 92 % and bso in more than 99 %. Over all 61 217 blocks, the total amount of gas saved is 17 871 g for uso and 6903 g for bso. For all blocks where an optimization is found, the average gas saving per block is 29.63 % for uso and more than 46 % for bso. The higher average for bso can be explained by (i) bso's bias towards smaller blocks, where relative savings are naturally higher, and (ii) bso only producing optimal results, whereas uso may find intermediate, non-optimal results. The optimization with the largest gain is one we did not necessarily expect to find in a deployed contract: a redundant storage access. Storage is expensive, and hence optimized for in deployed contracts, but uso and bso both found PUSH PUSH SLOAD SUB PUSH DUP2 SWAP1 SSTORE POP, which optimizes to the empty program, because the program merely loads the value from key 4 only to store it back to that same key.
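The gas saved by removing this block can be tallied from the yellow paper's fee schedule. A sketch of the accounting (Byzantium-era costs; the block's PUSH immediates do not affect the gas computation):

```python
# Per-instruction gas costs (yellow paper fee schedule, Byzantium).
GAS = {"PUSH": 3, "SLOAD": 200, "SUB": 3, "DUP2": 3, "SWAP1": 3, "POP": 2}
SSTORE_RESET = 5_000   # storing to a slot whose old value is nonzero, or storing zero
SSTORE_SET = 20_000    # storing a nonzero value to a slot whose old value is zero

block = ["PUSH", "PUSH", "SLOAD", "SUB", "PUSH", "DUP2", "SWAP1", "SSTORE", "POP"]
base = sum(GAS[op] for op in block if op != "SSTORE")  # 220 g

# Removing the block saves its entire cost: the two SSTORE cases give
# the lower and upper bounds on the savings.
assert base + SSTORE_RESET == 5_220
assert base + SSTORE_SET == 20_220
```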
This optimization saves at least 5220 g, and up to 20 220 g. From Table 2 we see that on EthBC, uso outperforms bso by roughly a factor of five on found optimizations; more than ten times as many blocks are proved optimal by uso than by bso. As we expected, most optimizations found by bso were also found by uso, but, surprisingly, bso found 21 optimizations on which uso failed. We found that nearly all of the 21 source programs are fairly complicated, but have a short optimization of two or three instructions. To pick an example, bso optimized the block PUSH PUSH SLOAD LT ISZERO ISZERO ISZERO to PUSH PUSH PUSH. Additionally, all 21 optimizations are cheap: all cost less than 10 g. We would have expected at least some of these optimizations to also be found by uso. We believe an unfortunate, non-deterministic choice within the solver to be the reason that they were not.

Summary. We developed ebso, a superoptimizer for EVM bytecode, implementing two different superoptimization approaches, and compared them on a large set of real-world smart contracts. Our experiments show that relying on the heavily optimized search heuristics of a modern SMT solver is a feasible approach to superoptimizing EVM bytecode.

Related Work. Superoptimization [9] has been explored in a variety of contexts [5, 6, 10, 12], including binary translation [1] and synthesizing compiler optimizations [11]. To our knowledge, ebso is the first application of superoptimization to smart contracts. Chen et al. [2] also aim to save gas by optimizing EVM bytecode. They identified 24 anti-patterns by manual inspection. Building on their work, we ran ebso on their patterns. For 19 instances, ebso found the same optimizations. For 2 patterns, ebso lacks an encoding of the instructions involved (STOP, JUMP), and for 2 patterns ebso times out on a local machine. Due to the repeated exploitation of flaws in smart contracts, various formal approaches for analyzing EVM bytecode have been proposed.
For instance, Oyente [8] performs control flow analysis in order to detect security defects such as reentrancy bugs.

Outlook. There is ample opportunity for future work. We do not yet support the EVM's memory. While conceptually this would be a straightforward extension, the number of universally quantified variables and the size of blocks already pose challenges for performance, as we identified by analyzing the optimizations found by ebso. Thus, it would be interesting to use the SMT benchmarks obtained from ebso's superoptimization encoding to evaluate different solvers, e.g. CVC4 (cvc4.cs.stanford.edu/web) or Vampire. The basis for this is already in place: ebso can export the generated constraints in SMT-LIB format. Accordingly, we plan to generate new SMT benchmarks and submit them to one of the suitable categories of SMT-LIB. In order to ease the burden on developers, ebso could benefit from caching common optimization patterns [11] to speed up optimization times. Another fruitful approach could be to extract the optimization patterns and generalize them into peephole optimizations and rewrite rules.

References

1. Bansal, S., Aiken, A.: Binary translation using peephole superoptimizers. In: Proc. 8th OSDI. pp. 177-192. USENIX (2008)
2. Chen, T., Li, Z., Zhou, H., Chen, J., Luo, X., Li, X., Zhang, X.: Towards saving money in using smart contracts. In: Proc. 40th ICSE-NIER. pp. 81-84. ACM (2018). https://doi.org/10.1145/3183399.3183420
3. De Moura, L., Bjørner, N.: Z3: An efficient SMT solver. In: Proc. 14th TACAS. LNCS, vol. 4963, pp. 337-340. Springer (2008)
4. Gulwani, S., Jha, S., Tiwari, A., Venkatesan, R.: Synthesis of loop-free programs. In: Proc. 32nd PLDI. pp. 62-73. ACM (2011). https://doi.org/10.1145/1993498.1993506
5. Jangda, A., Yorsh, G.: Unbounded superoptimization. In: Proc. Onward! 2017. pp. 78-88. ACM (2017). https://doi.org/10.1145/3133850.3133856
6. Joshi, R., Nelson, G., Randall, K.H.: Denali: A goal-directed superoptimizer. In: Proc. 23rd PLDI. pp. 304-314. ACM (2002). https://doi.org/10.1145/512529.512566
7. King, T., Butcher, S., Zalewski, L.: Apocrita - High Performance Computing Cluster for Queen Mary University of London (Mar 2017). https://doi.org/10.5281/zenodo.438045
8. Luu, L., Chu, D.H., Olickel, H., Saxena, P., Hobor, A.: Making smart contracts smarter. In: Proc. 23rd CCS. pp. 254-269. ACM (2016). https://doi.org/10.1145/2976749.2978309
9. Massalin, H.: Superoptimizer: A look at the smallest program. In: Proc. 2nd ASPLOS. pp. 122-126. IEEE (1987). https://doi.org/10.1145/36206.36194
10. Phothilimthana, P.M., Thakur, A., Bodík, R., Dhurjati, D.: Scaling up superoptimization. In: Proc. 21st ASPLOS. pp. 297-310. ACM (2016). https://doi.org/10.1145/2872362.2872387
11. Sasnauskas, R., Chen, Y., Collingbourne, P., Ketema, J., Taneja, J., Regehr, J.: Souper: A synthesizing superoptimizer. CoRR abs/1711.04422 (2017), http://arxiv.org/abs/1711.04422
12. Schkufza, E., Sharma, R., Aiken, A.: Stochastic superoptimization. In: Proc. 18th ASPLOS. pp. 305-316. ACM (2013). https://doi.org/10.1145/2451116.2451150
13. Srinivasan, V., Reps, T.: Synthesis of machine code from semantics. In: Proc. 36th PLDI. pp. 596-607. ACM (2015). https://doi.org/10.1145/2737924.2737960
14. Wood, G.: Ethereum: A secure decentralised generalised transaction ledger. Tech. Rep. Byzantium Version e94ebda (2018), https://ethereum.github.io/yellowpaper/paper.pdf