Analyzing Smart Contracts: From EVM to a sound Control-Flow Graph
Elvira Albert, Jesús Correas, Pablo Gordillo, Alejandro Hernández-Cerezo, Guillermo Román-Díez, and Albert Rubio
Instituto de Tecnología del Conocimiento, Spain; Complutense University of Madrid, Spain; Universidad Politécnica de Madrid, Spain
Abstract. The EVM language is a simple stack-based language with words of 256 bits. One significant difference between the EVM and other virtual machine languages (like Java bytecode or CLI for .NET programs) is the use of the stack for saving the jump addresses, instead of having them explicit in the code of the jumping instructions. Static analyzers need the complete control flow graph (CFG) of the EVM program in order to be able to represent all its execution paths. This report addresses the problem of obtaining a precise and complete stack-sensitive CFG by means of a static analysis, cloning the blocks that might be executed using different states of the execution stack. The soundness of the analysis presented is proved.

EVM Language

The EVM language is a simple stack-based language with words of 256 bits, with a local volatile memory that behaves as a simple word-addressed array of bytes, and a persistent storage that is part of the blockchain state. A more detailed description of the language and the complete set of operation codes can be found in [6]. In this section, we focus only on the characteristics of the EVM that are needed for describing our work.
Example 1. In order to describe our techniques, we use as running example a simplified version (without calls to the external service Oraclize and the authenticity proof verifier) of the contract [1] that implements a lottery system. During a game, players call a method joinPot to buy lottery tickets; each player's address is appended to an array addresses of current players, and the number of tickets is appended to an array slots, both having variable length. After some time has elapsed, anyone can call rewardWinner, which calls the Oraclize service to obtain a random number for the winning ticket. If all goes according to plan, the Oraclize service then responds by calling the __callback method with this random number and the authenticity proof as arguments. A new instance of the game is then started, and the winner is allowed to withdraw her balance using a withdraw method. Figure 2 shows an excerpt of the Solidity code (including the public function findWinner) and a fragment of the EVM code produced by the compiler. The Solidity source code is shown for readability, as our analysis works directly on the EVM code. To the right of Figure 2 we show a fragment of the EVM code of method findWinner. It can be seen that the EVM has instructions for operating with the stack contents, like DUPx or SWAPx; for comparisons, like LT, GT; for accessing the storage (resp. memory) of the contract, like SSTORE, SLOAD (resp. MLOAD, MSTORE); for adding/removing elements to/from the stack, like PUSHx/POP; and many others (we again refer to [6] for details). Some instructions increment the program counter by several units (e.g., PUSHx Y adds a word with the constant Y of x bytes to the stack and increments the program counter by x + 1). In what follows, we use size(b) to refer to the number of units that instruction b increments the value of the program counter. For instance, size(POP) = 1, size(PUSH1) = 2 or size(PUSH3) = 4. □
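As a concrete illustration, the size function can be sketched directly from the opcode layout (a sketch for illustration, not part of the paper's formalization; the opcode values used are the standard EVM ones, where PUSH1..PUSH32 occupy 0x60..0x7F):

```python
# Sketch of size(b): the number of units an instruction advances the
# program counter, i.e., the byte size of the instruction.

def size(opcode: int) -> int:
    """Return the byte size of the instruction with the given opcode."""
    if 0x60 <= opcode <= 0x7F:      # PUSHx carries an x-byte immediate
        x = opcode - 0x60 + 1       # PUSH1 -> 1, ..., PUSH32 -> 32
        return x + 1                # one extra byte for the opcode itself
    return 1                        # every other instruction is 1 byte

POP, PUSH1, PUSH3 = 0x50, 0x60, 0x62
assert size(POP) == 1 and size(PUSH1) == 2 and size(PUSH3) == 4
```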
One significant difference between the EVM and other virtual machine languages (like Java bytecode or CLI for .NET programs) is the use of the stack for saving the jump addresses, instead of having them explicit in the code of the jumping instructions. In EVM, instructions JUMP and JUMPI jump, unconditionally and conditionally respectively, to the program counter stored at the top of the execution stack. In order to obtain the control flow graph of the program, this feature of the EVM requires keeping track of the information stored in the stack. Let us illustrate it with an example.
Example 2. In the EVM code to the right of Figure 2 we can see two jump instructions whose jump addresses are stored by the PUSH instruction executed immediately before them. The destinations of the jumps are marked in the code by JUMPDEST instructions. □

[Fig. 1. Excerpt of Solidity code for EthereumPot contract (left), and fragment of EVM code for function findWinner (right).]
We start our analysis by defining the set J, which contains all possible jump destinations in an EVM program P ≡ b_0, ..., b_p:

J(P) = { pc | b_pc ∈ P ∧ b_pc ≡ JUMPDEST }

We use b_pc ∈ P to refer to the instruction at program counter pc in the EVM program P. In what follows, we omit P from definitions when it is clear from the context, e.g., we use J to refer to J(P).

Example 3. Given the EVM code that corresponds to function findWinner, we get the set J containing the program counters of all its JUMPDEST instructions. □

The first step in the computation of the CFG is to define the notion of block. In general [2], given a program P, a block is a maximal sequence of straight-line consecutive code in the program, with the properties that the flow of control can only enter the block through the first instruction in the block, and can only leave the block at the last instruction. Let us define the concept of block in an EVM program:
Definition 1 (blocks). Given an EVM program P ≡ b_0, ..., b_p, we define

blocks(P) = { B_i ≡ b_i, ..., b_j | (∀k. i < k < j, b_k ∉ Jump ∪ End ∪ {JUMPDEST}) ∧
                                    (i = 0 ∨ b_i ≡ JUMPDEST ∨ b_{i−1} ≡ JUMPI) ∧
                                    (j = p ∨ b_j ∈ Jump ∨ b_j ∈ End ∨ b_{j+1} ≡ JUMPDEST) }

where Jump = {JUMP, JUMPI} and End = {REVERT, STOP, INVALID}.

Example 4.
Figure 3 shows the blocks (nodes) obtained for findWinner and their corresponding jump invocations. Solid and dashed edges represent the two possible execution paths depending on the entry block: solid edges represent the path that starts from one entry block, and dashed edges the path that starts from the other.

    contract EthereumPot {
      address[] public addresses;
      address public winnerAddress;
      uint[] public slots;

      function __callback(bytes32 queryId, string result, bytes proof) {
        if (msg.sender != oraclize_cbAddress()) throw;
        random_number = uint(sha3(result));
        winnerAddress = findWinner(random_number);
        amountWon = this.balance * 98 / 100;
        winnerAnnounced(winnerAddress, amountWon);
        if (winnerAddress.send(amountWon)) {
          if (owner.send(this.balance)) {
            openPot();
          }
        }
      }

      function findWinner(uint random) constant returns (address winner) {
        for (uint i = 0; i < slots.length; i++) {
          if (random <= slots[i]) {
            return addresses[i];
          }
        }
      }
      // Other functions
    }

Fig. 2. Excerpt of Solidity code for EthereumPot contract (left), and fragment of EVM code for function findWinner (right)

Note that most of the blocks start with a JUMPDEST instruction; the rest of the blocks start with instructions that come right after a JUMPI instruction. Analogously, most blocks end in a JUMP, JUMPI or RETURN instruction, or in the instruction that precedes a JUMPDEST. □

Observing the blocks in Figure 3, we can see that most JUMP instructions use the address introduced by the PUSH instruction executed immediately before the JUMP. However, in general, in EVM code it is possible to find a JUMP whose address has been stored in a different block. This happens, for instance, when a public function is invoked privately from other methods of the same contract: the returning program counter is introduced by the invokers at different program points, and it is used in a unique JUMP instruction when the invoked method finishes, in order to return to the particular caller that invoked that function.
Example 5. In Figure 3, at block 6D1 we have a JUMP (marked with ▶) whose address is not pushed in the same block. This JUMP takes the return address of function findWinner. If findWinner is publicly invoked, it jumps to address 142 (pushed at block 123, at ⋆), and if it is invoked from __callback it jumps to 954 (pushed at block 941, at ⋆). □

EVM to a complete CFG
As we have seen in the previous section, the addresses used by the jumping instructions are stored in the execution stack. In EVM, blocks can be reached with different stack sizes and contents. As is done in other tools [4,3,5], to precisely infer the possible addresses at jumping program points, we need a context-sensitive static analysis that analyzes each block separately for each possible stack that can reach it (only considering the addresses stored in the stack). This section presents an address analysis of EVM programs which allows us to compute a complete CFG of the EVM code. To compute the addresses involved in the jumping instructions, we define a static analysis which soundly infers all possible addresses that a JUMP instruction could use.
[Fig. 3. Fragment of the CFG of findWinner: blocks 123-141, 142-183, 64B-652, 653-660, 661-66D, 66E, 66F-682, 683-68F, 690, 691-6C2, 6C3-6CF, 6D0, 6D1-6D6, 941-953 and 954-9B8, with their EVM instructions and the jump edges between them.]
In our address analysis we aim at having the stack represented by explicit variables. Given the characteristics of EVM programs, the execution stack of EVM programs produced from Solidity programs without recursion can be flattened. Besides, as the size of the stack of the Ethereum Virtual Machine is bounded to 1024 elements (see [6]), the number of stack variables is limited. We use V to represent the set of all possible stack variables that may be used in the program. The first element we define for our analysis is its abstract state.

The abstract state. Our analysis uses a partial representation of the execution stack as its basic element. To this end, we use the notion of stack state as a pair ⟨n, σ⟩, where n is the number of elements in the stack, and σ is a partial mapping that relates some stack positions with a set of jump destinations. A position in the stack is referred to as s_i with 0 ≤ i < n, and s_{n−1} is the position at the top of the stack. The abstract state of the analysis is defined on the set of all stack states

S = { ⟨n, σ⟩ | 0 ≤ n ≤ |V| ∧ σ ∈ Σ_n }

where Σ_n is the set of all mappings using n stack variables, defined recursively as follows: Σ_i = Σ_{i−1} ∪ { σ[s_{i−1} ↦ j] | σ ∈ Σ_{i−1} ∧ j ⊆ J }; Σ_0 = {σ_∅}, where σ_∅ is the empty mapping.

Definition 2 (abstract state). The abstract state is a partial mapping π of the form S ↦ P(S).

The application of σ to an element s_i, that is, σ(s_i), corresponds to the set of jump destinations that a stack variable s_i can contain. The first element of the tuple, that is, n, stores the size of the stack in the different abstract states. The abstract domain is the lattice ⟨AS, π_⊤, π_⊥, ⊔, ⊑⟩, where AS is the set of abstract states and π_⊤ is the top of the lattice, defined as the mapping such that ∀s ∈ S, π_⊤(s) = S. The bottom element of the lattice, π_⊥, is the empty mapping. Now, to define ⊔ and ⊑, we first define the function img(π, s) as π(s) if s ∈ dom(π), and ∅ otherwise. Given two abstract states π_1 and π_2, we use π = π_1 ⊔ π_2 to denote that π is the least upper bound, defined as follows: ∀s ∈ dom(π_1) ∪ dom(π_2), π(s) = img(π_1, s) ∪ img(π_2, s). At this point, π_1 ⊑ π_2 holds iff dom(π_1) ⊆ dom(π_2) and ∀s ∈ dom(π_1), π_1(s) ⊆ π_2(s).

Transfer function.
One of the ingredients of our analysis is a transfer function that models the effect of each EVM instruction on the abstract state. Given a stack state s of the form ⟨n, σ⟩, Figure 4 defines the updating function λ(b^{δ,α}_pc, s), where b corresponds to the EVM instruction to be applied, pc corresponds to the program counter of the instruction, and α and δ to the number of elements added to and removed from the EVM stack when executing b, respectively. Given a map m, we will use m[x ↦ y] to indicate the result of updating m by making m(x) = y while m stays the same for all locations different from x, and we will use m\[x] to refer to a partial mapping that stays the same for all locations different from x, and where m(x) is undefined. By means of λ, we can now define the transfer function of our analysis.

Definition 3 (transfer function). Given the set of abstract states AS and the set of EVM instructions
Given the set of abstract states AS and the set of EVM instructions
Ins , the transfer function τ is defined as a mapping of the form τ : Ins × AS (cid:55)→ AS is defined as follows: τ ( b, π ) = π (cid:48) where ∀ s ∈ dom ( π ) , π (cid:48) ( s ) = λ ( b, π ( s )) b δ,α λ ( b, (cid:104) n, σ (cid:105) )(1) PUSH x v (cid:104) n + 1 , σ [ s n (cid:55)→ { v } ] (cid:105) when v ∈ J(cid:104) n + 1 , σ (cid:105) when v (cid:54)∈ J (2) DUP x (cid:104) n + 1 , σ (cid:105) when s n − x (cid:54)∈ dom ( σ ) (cid:104) n + 1 , σ [ s n (cid:55)→ σ ( s n − x )] (cid:105) when s n − x ∈ dom ( σ )(3) SWAP x (cid:104) n, σ (cid:105) when s n − (cid:54)∈ dom ( σ ) ∧ s n − x − (cid:54)∈ dom ( σ ) (cid:104) n, σ [ s n − x − (cid:55)→ σ ( s n − ) , s n − (cid:55)→ σ ( s n − x − )] (cid:105) when s n − ∈ dom ( σ ) ∧ s n − x − ∈ dom ( σ ) (cid:104) n, σ [ s n − (cid:55)→ σ ( s n − x − )] \ σ [ s n − x − ] (cid:105) when s n − (cid:54)∈ dom ( σ ) ∧ s n − x − ∈ dom ( σ ) (cid:104) n, σ [ s n − x − (cid:55)→ σ ( s n − )] \ σ [ s n − ] (cid:105) when s n − ∈ dom ( σ ) ∧ s n − x − (cid:54)∈ dom ( σ )(4) otherwise (cid:104) n − δ + α, σ \ [ s n − , . . . , s n − δ ] (cid:105) Fig. 4.
Updating function
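The updating function of Figure 4 can be sketched as follows, representing a stack state as a pair (n, sigma) with sigma a dict from stack positions to sets of jump destinations (a simplified sketch; the dispatch on mnemonic families and the parameter names are our assumptions):

```python
# Sketch of λ from Figure 4. J is the set of valid jump destinations of
# the program; delta/alpha are the stack elements removed/added by `op`.

def update(op, x, delta, alpha, value, state, J):
    """Apply instruction `op` (PUSHx/DUPx/SWAPx/other) to a stack state."""
    n, sigma = state
    sigma = dict(sigma)                    # keep the input state immutable
    if op == "PUSH":                       # case (1): track only addresses in J
        if value in J:
            sigma[n] = {value}
        return (n + 1, sigma)
    if op == "DUP":                        # case (2): duplicate s_{n-x} on top
        if n - x in sigma:
            sigma[n] = sigma[n - x]
        return (n + 1, sigma)
    if op == "SWAP":                       # case (3): swap s_{n-1} and s_{n-x-1}
        top, deep = sigma.pop(n - 1, None), sigma.pop(n - x - 1, None)
        if top is not None:
            sigma[n - x - 1] = top
        if deep is not None:
            sigma[n - 1] = deep
        return (n, sigma)
    # case (4): drop the delta consumed positions; the alpha results are unknown
    for i in range(1, delta + 1):
        sigma.pop(n - i, None)
    return (n - delta + alpha, sigma)
```

For example, pushing a valid destination records it at the new top, a DUP copies the tracked set, and a SWAP exchanges (or removes) the tracked entries of the two affected positions.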
Example 6. Consider the block that starts at EVM instruction 941 (shown in Figure 3), whose instructions are JUMPDEST, MOD, ADD, PUSH1 0A, DUP2, SWAP1, SSTORE, POP, PUSH2 0954, PUSH1 0A, SLOAD, PUSH2 064B and JUMP. Applying the transfer function τ instruction by instruction from the initial abstract state of the block updates the stack state at each step: PUSH2 0954 introduces the return address 954 into the mapping of the stack state, PUSH2 064B introduces the address 64B on top of it, and the final JUMP consumes 64B, leaving 954 in the stack state. The same process applied to block 123 introduces the return address 142 (PUSH2 0142) and the jump address 64B (PUSH2 064B), the latter being consumed by the JUMP at the end of the block. □

Addresses equations system.
The next step consists in defining, by means of the transfer and the updating functions, a constraint equation system to represent all possible jumping addresses that could be valid for executing a jump instruction in the program.

Definition 4 (addresses equations system). Given an EVM program P of the form b_0, ..., b_p, its addresses equations system E(P) includes the following equations for every EVM bytecode instruction b_pc ∈ P:

(1) b_pc ≡ JUMP:
    X_{σ(s_{n−1})} ⊒ idmap(λ(b_pc, ⟨n, σ⟩))      ∀s ∈ dom(X_pc), ⟨n, σ⟩ ∈ X_pc(s)
(2) b_pc ≡ JUMPI:
    X_{σ(s_{n−1})} ⊒ idmap(λ(b_pc, ⟨n, σ⟩))      ∀s ∈ dom(X_pc), ⟨n, σ⟩ ∈ X_pc(s)
    X_{pc+1} ⊒ idmap(λ(b_pc, ⟨n, σ⟩))            ∀s ∈ dom(X_pc), ⟨n, σ⟩ ∈ X_pc(s)
(3) b_pc ∉ End ∧ b_{pc+size(b_pc)} ≡ JUMPDEST:
    X_{pc+size(b_pc)} ⊒ idmap(λ(b_pc, ⟨n, σ⟩))   ∀s ∈ dom(X_pc), ⟨n, σ⟩ ∈ X_pc(s)
(4) otherwise, when b_pc ∉ End:
    X_{pc+size(b_pc)} ⊒ τ(b_pc, X_pc)

where idmap(s) returns a map π such that dom(π) = {s} and π(s) = {s}, and size(b_pc) returns the number of bytes of the instruction b_pc.

Observe that the addresses equations system has equations for all program points of the program. Concretely, variables of the form X_pc store the jumping addresses saved in the stack after executing b_pc for all possible entry stacks. This information will be used for computing all possible jump destinations when executing JUMP or JUMPI instructions. For computing the system, most instructions, case (4), just apply the transfer function τ to compute the possible stack states of the subsequent instruction. Note that the expression pc + size(b_pc) just computes the position of the next instruction in the EVM program. Jumping instructions, cases (1) and (2), compute the initial state of the invoked blocks, and thus produce a map with all possible input stack states that can reach one block. JUMP and JUMPI instructions produce, for each stack state, one equation by taking the jump address from the previous stack state, X_{σ(s_{n−1})}. JUMPI, case (2), produces an extra equation on X_{pc+1} to capture the possibility of continuing to the next instruction instead of jumping to the destination address. Additionally, the instructions placed before a JUMPDEST, case (3), produce initial states for the block that starts at the JUMPDEST. When the constraint equation system is solved, the constraint variables over-approximate the jumping information for the program.
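As an illustration of case (1) of Definition 4, the following sketch enumerates the constraints contributed by a single JUMP instruction. The hashable encoding of stack states as nested tuples and all helper names are our assumptions, not the paper's:

```python
# For every stack state reaching a JUMP, produce one constraint per jump
# target d in σ(s_{n-1}), seeded with idmap of the state after the JUMP
# pops the address (case (4) of Fig. 4 with delta=1, alpha=0).

def idmap(state):
    """idmap(s): a map whose domain is {s} and whose image is {s}."""
    return {state: {state}}

def jump_constraints(x_pc):
    """x_pc maps entry stack states to the sets of stack states at the JUMP.
    A stack state is (n, sigma) with sigma a tuple of (position, dests)."""
    constraints = []   # pairs (target pc, contributed abstract state)
    for states in x_pc.values():
        for (n, sigma) in states:
            targets = dict(sigma).get(n - 1, set())          # σ(s_{n-1})
            # λ for JUMP: drop the consumed address from the stack state
            popped = (n - 1, tuple((i, d) for i, d in sigma if i != n - 1))
            for dest in targets:
                constraints.append((dest, idmap(popped)))
    return constraints
```

A stack state of size 2 whose top tracks the single destination 64B thus contributes exactly one constraint, on X_64B.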
Example 7. As can be seen in Figure 3, we can jump to block 64B from two different blocks, 941 and 123. The computation of the addresses equations system therefore produces, for the entry program point of block 64B, one equation from the JUMP at the end of block 941 and another from the JUMP at the end of block 123. Since two different stack contents reach the same program point, the equation that must hold for X_64B is produced by the application of the least upper bound operation ⊔ to these two contributions. Note that the application of the transfer function τ to all instructions of block 64B (JUMPDEST, PUSH1 00, DUP1, PUSH1 00, SWAP1, POP) applies the function λ to all elements in the abstract state and updates the stack states accordingly. □

Solving the addresses equations system of a program P can be done iteratively.
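This iterative resolution can be sketched as a generic chaotic-iteration loop: the constraint list, join (⊔), ordering (⊑) and bottom element are parameters, instantiated below with a toy set lattice (all names here are illustrative, not the paper's):

```python
# Generic solver for a constraint system of the form X_k ⊒ f_k(env):
# iterate, joining each violated constraint's value into its target,
# until every constraint holds.

def solve(variables, equations, join, leq, bottom):
    """equations: list of (target, rhs) with rhs a function of the env."""
    env = {v: bottom() for v in variables}
    changed = True
    while changed:          # terminates: no infinite ascending chains
        changed = False
        for target, rhs in equations:
            value = rhs(env)
            if not leq(value, env[target]):
                env[target] = join(env[target], value)
                changed = True
    return env

# Toy instantiation with sets: X0 ⊒ {1}, X1 ⊒ X0, X1 ⊒ {2}, X0 ⊒ X1.
eqs = [("X0", lambda e: {1}), ("X1", lambda e: set(e["X0"])),
       ("X1", lambda e: {2}), ("X0", lambda e: set(e["X1"]))]
env = solve(["X0", "X1"], eqs, lambda a, b: a | b,
            lambda a, b: a <= b, set)
assert env["X0"] == {1, 2} and env["X1"] == {1, 2}
```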
A naïve algorithm consists in first creating one constraint X_0 ⊒ π_∅[⟨0, σ_∅⟩ ↦ {⟨0, σ_∅⟩}], where π_∅ and σ_∅ are the empty mappings, and X_pc ⊒ π_⊥ for all pc ∈ P, pc ≠ 0, and then iteratively refining the values of these variables as follows:

[Fig. 5. Jumps equations system of the __callback function: the equations produced for the first and last instructions of the blocks shown in Figure 3.]
1. Substitute the current values of the constraint variables in the right-hand side of each constraint, and then evaluate the right-hand side if needed.
2. If each constraint X ⊒ E holds, where E is the value of the evaluation of the right-hand side in the previous step, then the process finishes; otherwise,
3. for each X ⊒ E which does not hold, let E′ be the current value of X. Then update the current value of X to E ⊔ E′. Once all these updates are (iteratively) applied, we repeat the process at step 1.

Termination is guaranteed since the abstract domain does not have infinitely ascending chains, as the number of jump destinations and the stack size are finite.

Example 8.
Figure 5 shows the equations produced by Definition 4 for the first and the last instruction of all blocks shown in Figure 3. Observe that the application of τ stores the jumping addresses in the corresponding abstract states after PUSH instructions. Such addresses are then used to produce the equations at the JUMP or JUMPI instructions. In the case of JUMP, as the jump is unconditional, it only produces one equation, computing the input abstract state of the destination block. JUMPI instructions produce two different equations: (1) one equation which corresponds to the jumping address stored in the stack; and (2) one equation which corresponds to the next instruction. Finally, another point to highlight occurs when two possible jumping addresses are in the stack of a block and both can be used by the JUMP at the end of that block: we produce two inputs, one for each possible jumping address, capturing the two possible branches (see Figure 3). □

Theorem 1 (Soundness).
Let P ≡ b_0, ..., b_p be a program, X_0, ..., X_p the solution of the addresses equations system of P, and pc the program counter of a jump instruction. Then, for any execution of P, there exists s ∈ dom(X_pc) such that ⟨n, σ⟩ ∈ X_pc(s) and σ(s_{n−1}) contains all jump addresses that instruction b_pc might jump to during the execution of P.

Control Flow Graph. At this point, by means of the solution of the addresses equations system, we compute the control flow graph of the program. In order to simplify the notation, given a block B_i, we define the function getId(i, ⟨n, σ⟩), which receives the block identifier i and an abstract stack ⟨n, σ⟩ and returns a unique identifier for the abstract stack ⟨n, σ⟩ ∈ dom(X_i). Similarly, getStack(i, id) returns the abstract stack ⟨n, σ⟩ that corresponds to the identifier id of block B_i. Besides, we define the function getSize(pc, id) that, given a program point pc ∈ B_i and a unique identifier id for B_i, returns the value n′ such that ⟨n, σ⟩ = getStack(i, id) and X_pc(⟨n, σ⟩) = ⟨n′, σ′⟩.

Example 9.
Given an equation X whose domain contains two different abstract input stacks, each mapped to its corresponding output stack, if we compute the functions getId and getSize we have that getId assigns the identifier 1 to the first abstract input stack and the identifier 2 to the second one. Analogously, getSize returns the sizes of the respective output stacks, 7 and 3. □
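The bookkeeping behind getId, getStack and getSize can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the tool's implementation: an abstract stack ⟨n, σ⟩ is modelled as a pair (n, sigma) with sigma a dict from stack positions to sets of possible jump addresses, and the block identifier 31 used below is hypothetical.

```python
def freeze(stack):
    """Make an abstract stack <n, sigma> hashable so it can key a dict."""
    n, sigma = stack
    return (n, frozenset((pos, frozenset(addrs)) for pos, addrs in sigma.items()))

class BlockIds:
    """Per-block table assigning consecutive unique ids to the distinct
    abstract input stacks a block is invoked with (getId / getStack)."""
    def __init__(self):
        self.ids = {}      # (block, frozen stack) -> id
        self.stacks = {}   # (block, id) -> abstract stack

    def get_id(self, block, stack):          # getId(i, <n, sigma>)
        key = (block, freeze(stack))
        if key not in self.ids:
            new_id = len([b for b, _ in self.ids if b == block]) + 1
            self.ids[key] = new_id
            self.stacks[(block, new_id)] = stack
        return self.ids[key]

    def get_stack(self, block, ident):       # getStack(i, id)
        return self.stacks[(block, ident)]

def get_size(X_pc, table, block, ident):
    """getSize(pc, id): size n' of the output stack X_pc(<n, sigma>)."""
    n_out, _ = X_pc[freeze(table.get_stack(block, ident))]
    return n_out
```

Replaying Example 9 under this encoding, two calls to `get_id` on distinct abstract stacks of the same block return 1 and 2, and `get_size` recovers the output stack sizes stored in X_pc.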
Definition 5 (control flow graph). Given an EVM program P, its blocks B_i ≡ b_i … b_j ∈ blocks(P) and its flow analysis results, provided by a set of variables of the form X_pc for all pc ∈ P, we define the control flow graph of P as a directed graph CFG = ⟨V, E⟩ with a set of vertices

V = { B_i:id | B_i ∈ blocks(P) ∧ ⟨n, σ⟩ ∈ dom(X_i) ∧ id = getId(i, ⟨n, σ⟩) }

and a set of edges E = E_jump ∪ E_next such that:

E_jump = { B_i:id → B_d:id′ | b_j ∈ Jump ∧ ⟨n, σ⟩ ∈ dom(X_j) ∧ id = getId(i, ⟨n, σ⟩) ∧ ⟨n′, σ′⟩ ∈ X_j(⟨n, σ⟩) ∧ d = σ′(s_{n′−1}) ∧ ⟨n″, σ″⟩ = λ(b_j, ⟨n′, σ′⟩) ∧ id′ = getId(d, ⟨n″, σ″⟩) }

E_next = { B_i:id → B_d:id′ | b_j ≠ JUMP ∧ b_j ∉ End ∧ ⟨n, σ⟩ ∈ dom(X_j) ∧ id = getId(i, ⟨n, σ⟩) ∧ ⟨n′, σ′⟩ ∈ X_j(⟨n, σ⟩) ∧ d = j + size(b_j) ∧ ⟨n″, σ″⟩ = λ(b_j, ⟨n′, σ′⟩) ∧ id′ = getId(d, ⟨n″, σ″⟩) }

The first relevant point of the control flow graph (CFG) we produce is that, to build the set of vertices V, we replicate each block for each different stack state with which it may be invoked (gray nodes in Figure 3 are replicated in the CFG). Analogously, the different entry stack states are also used to produce different edges depending on the corresponding replicated blocks. Note that the definition distinguishes between two kinds of edges:
(1) edges produced by JUMP or JUMPI instructions at the end of the blocks, whose destination is taken from the values stored in the stack state of the instruction before the jump, with d = σ′(s_{n′−1}); and (2) edges produced by continuations to the next instruction, whose destination is computed with d = j + size(b_j). In both kinds of edges, as blocks may be replicated, we apply the function λ and take the identifier of the resulting state to compute the identifier of the destination: ⟨n″, σ″⟩ = λ(b_j, ⟨n′, σ′⟩) ∧ id′ = getId(d, ⟨n″, σ″⟩). Example 10.
Considering the blocks shown in Figure 3 and the equations shown in Figure 5, the CFG of the program includes non-replicated nodes for those blocks that receive only one possible stack state (white nodes in Figure 3), whereas the nodes that can be reached with two different stack states (gray nodes in Figure 3) are replicated in the CFG. Analogously, our CFG replicates the edges according to the replicated nodes (solid and dashed edges in Figure 3). Note that, in Figure 3, we distinguish dashed and solid edges just to remark that there are two possible execution paths: if the call to findWinner comes from the internal invocation block, it returns to the corresponding continuation block, and if the execution comes from a public invocation, it returns to the other continuation block. □

The proof sketch follows these steps:
1. We first define an EVM operational semantics that describes how EVM programs handle jump addresses on the stack.
2. Then we define an EVM collecting semantics for the operational semantics. This collecting semantics gathers all transitions that can be produced by the execution of a program P.
3. We continue by defining the jumps-to property as a property of this collecting semantics.
4. Then we prove a lemma stating that the least solution of the set of constraints generated as described in Definition 4 is a safe approximation of the EVM collecting semantics w.r.t. the jumps-to property.
5. Finally, we rewrite Theorem 1 in terms of the operational semantics and prove it.

Figure 6 shows the semantics of some instructions involved in the computation of the values stored in the stack for handling jumps. The state of the program S is a tuple ⟨pc, ⟨n, σ⟩⟩ where pc is the value of the program counter with the index of the next instruction to be executed, and ⟨n, σ⟩ is a stack state as defined in Section 2. The interesting rules are the ones that deal with jump destination addresses on the stack: Rule (4) adds a new address to the stack, and Rules (6) and (8)-(10) copy or exchange existing addresses near the top of the stack, respectively. Rules (1) to (3) perform a jump in the program and therefore consume the address placed on top of the stack, plus an additional word in the case of JUMPI. If the instructions considered in this simplified semantics do not handle jump addresses, the corresponding rules just remove some values from the stack in the program state S (Rules (5), (7) and (11)). The remaining EVM instructions not explicitly considered in this simplified semantics are generically represented by Rule (12) with b^{δ,α}_pc, where δ is the number of items removed from the stack when b_pc is executed, and α is the number of additional items placed on the stack. Complete executions are traces of the form S_0 ⇒ S_1 ⇒ … ⇒ S_n where S_0 ≡ ⟨0, ⟨0, σ_∅⟩⟩ is the initial state, σ_∅ is the empty mapping, and S_n corresponds to the last state.
There are no infinite traces, as any transaction that executes EVM code has a finite gas limit and every executed instruction consumes some amount of gas. When the gas limit is exceeded, an out-of-gas exception occurs and the program halts immediately.
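As an illustration, the jump-handling rules of Figure 6 can be sketched as a successor function on states ⟨pc, ⟨n, σ⟩⟩. This is a minimal Python sketch under our own encoding (σ as a dict from stack positions to sets of addresses, instructions as (opcode, argument) pairs, `size` the byte size of the instruction, J the set of valid jump destinations); it is not the formal semantics itself and omits gas accounting, and Rule (12) is shown only for the δ = α = 0 case.

```python
def step(pc, n, sigma, instr, size, J):
    """One step of the simplified semantics; returns the list of successor
    states (pc', n', sigma'). JUMP/JUMPI yield one successor per address."""
    op, arg = instr
    nxt = pc + size
    if op == "JUMP":                      # Rule (1): consume the top address
        s = {k: v for k, v in sigma.items() if k != n - 1}
        return [(d, n - 1, s) for d in sorted(sigma[n - 1])]
    if op == "JUMPI":                     # Rules (2)-(3): jump and fall-through
        s = {k: v for k, v in sigma.items() if k not in (n - 1, n - 2)}
        return [(d, n - 2, s) for d in sorted(sigma[n - 1])] + [(nxt, n - 2, s)]
    if op == "PUSH":                      # Rules (4)-(5): track addresses in J only
        s = dict(sigma)
        if arg in J:
            s[n] = {arg}
        return [(nxt, n + 1, s)]
    if op == "DUP":                       # Rules (6)-(7): copy a tracked address
        s = dict(sigma)
        if n - arg in sigma:
            s[n] = sigma[n - arg]
        return [(nxt, n + 1, s)]
    if op == "SWAP":                      # Rules (8)-(11): exchange positions
        s = {k: v for k, v in sigma.items() if k not in (n - 1, n - arg - 1)}
        if n - arg - 1 in sigma:
            s[n - 1] = sigma[n - arg - 1]
        if n - 1 in sigma:
            s[n - arg - 1] = sigma[n - 1]
        return [(nxt, n, s)]
    return [(nxt, n, sigma)]              # Rule (12), sketched for delta = alpha = 0
```

For instance, a JUMPI whose top-of-stack abstraction holds one address produces exactly two successors, matching the two equations generated for conditional jumps.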
Definition 6 (EVM collecting semantics). Given an EVM program P, the EVM collecting semantics operator C_P is defined as follows:

C_P(X) = { ⟨S, S′⟩ | ⟨_, S⟩ ∈ X ∧ S ⇒ S′ }

The EVM semantics is defined as ξ_P = ⋃_{n>0} C^n_P(X_0), where X_0 ≡ {⟨0, ⟨0, σ_∅⟩⟩} is the initial configuration.
Definition 7 (jumps-to property). Let P be an EVM program, ξ_P = ⋃_{n>0} C^n_P(X_0), and b an instruction at program point pc. Then we say that ξ_P ⊨_pc T if T = { ⟨n, σ⟩ | ⟨S, S′⟩ ∈ ξ_P ∧ S′ = ⟨pc, ⟨n, σ⟩⟩ }.

The following lemma states that the least solution of the constraint equation system defined in Definition 2 is a safe approximation of ξ_P: Lemma 1 (soundness).
Let P ≡ b_0, …, b_p be a program, pc a program point and X_0, …, X_p the least solution of the constraint equation system as defined in Section 2. The following holds: if ξ_P ⊨_pc T, then for all ⟨n, σ⟩ ∈ T, there exists s ∈ dom(X_pc) such that ⟨n, σ⟩ ∈ X_pc(s).

(1) b_pc = JUMP:
⟨pc, ⟨n, σ⟩⟩ ⇒ ⟨σ(s_{n−1}), ⟨n−1, σ\[s_{n−1}]⟩⟩

(2) b_pc = JUMPI:
⟨pc, ⟨n, σ⟩⟩ ⇒ ⟨σ(s_{n−1}), ⟨n−2, σ\[s_{n−1}, s_{n−2}]⟩⟩

(3) b_pc = JUMPI:
⟨pc, ⟨n, σ⟩⟩ ⇒ ⟨pc + size(b_pc), ⟨n−2, σ\[s_{n−1}, s_{n−2}]⟩⟩

(4) b_pc = PUSHx v, v ∈ J:
⟨pc, ⟨n, σ⟩⟩ ⇒ ⟨pc + size(b_pc), ⟨n+1, σ[s_n ↦ {v}]⟩⟩

(5) b_pc = PUSHx v, v ∉ J:
⟨pc, ⟨n, σ⟩⟩ ⇒ ⟨pc + size(b_pc), ⟨n+1, σ⟩⟩

(6) b_pc = DUPx, s_{n−x} ∈ dom(σ):
⟨pc, ⟨n, σ⟩⟩ ⇒ ⟨pc + size(b_pc), ⟨n+1, σ[s_n ↦ σ(s_{n−x})]⟩⟩

(7) b_pc = DUPx, s_{n−x} ∉ dom(σ):
⟨pc, ⟨n, σ⟩⟩ ⇒ ⟨pc + size(b_pc), ⟨n+1, σ⟩⟩

(8) b_pc = SWAPx, s_{n−1} ∈ dom(σ), s_{n−x−1} ∈ dom(σ):
⟨pc, ⟨n, σ⟩⟩ ⇒ ⟨pc + size(b_pc), ⟨n, σ[s_{n−x−1} ↦ σ(s_{n−1}), s_{n−1} ↦ σ(s_{n−x−1})]⟩⟩

(9) b_pc = SWAPx, s_{n−1} ∈ dom(σ), s_{n−x−1} ∉ dom(σ):
⟨pc, ⟨n, σ⟩⟩ ⇒ ⟨pc + size(b_pc), ⟨n, σ[s_{n−x−1} ↦ σ(s_{n−1})]\[s_{n−1}]⟩⟩

(10) b_pc = SWAPx, s_{n−1} ∉ dom(σ), s_{n−x−1} ∈ dom(σ):
⟨pc, ⟨n, σ⟩⟩ ⇒ ⟨pc + size(b_pc), ⟨n, σ[s_{n−1} ↦ σ(s_{n−x−1})]\[s_{n−x−1}]⟩⟩

(11) b_pc = SWAPx, s_{n−1} ∉ dom(σ), s_{n−x−1} ∉ dom(σ):
⟨pc, ⟨n, σ⟩⟩ ⇒ ⟨pc + size(b_pc), ⟨n, σ\[s_{n−1}, s_{n−x−1}]⟩⟩

(12) b^{δ,α}_pc otherwise, b_pc ∉ End:
⟨pc, ⟨n, σ⟩⟩ ⇒ ⟨pc + size(b_pc), ⟨n − δ + α, σ\[s_{n−1}, …, s_{n−δ}]⟩⟩

Fig. 6. Simplified EVM semantics for handling jumps.
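The collecting semantics of Definition 6 can be sketched as exhaustively accumulating transition pairs ⟨S, S′⟩ from the initial state. This is a minimal Python sketch under our own encoding; the successor function `step` is a hypothetical parameter (for instance, one implementing the Figure 6 rules), and termination mirrors the gas argument: only finitely many states are reachable.

```python
def collecting_semantics(step):
    """Accumulate all transition pairs <S, S'> reachable from the initial
    state <0, <0, sigma_empty>>, i.e. a worklist rendering of U_n C_P^n(X0).
    `step(state)` must return the list of =>-successors of a state."""
    init = (0, (0, ()))            # <pc, <n, sigma>>, sigma encoded as ()
    xi, frontier = set(), {init}
    while frontier:                # terminates: the state space is finite
        nxt = set()
        for s in frontier:
            for s2 in step(s):
                if (s, s2) not in xi:
                    xi.add((s, s2))
                    nxt.add(s2)
        frontier = nxt
    return xi
```

On a toy two-step successor relation, the result contains exactly the two transitions of the single trace, as expected.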
Proof.
We use X^m_pc to refer to the value obtained for X_pc after m iterations of the algorithm for solving the equation system depicted in Section 2. We say that X_pc covers ⟨n, σ⟩ in C^m_P(X_0) at program point pc when this lemma holds for the result of computing C^m_P(X_0). In order to prove this lemma, we reason by induction on the value of m, the length of the traces S_0 ⇒^m S_m considered in C^m_P(X_0).

Base case: if m = 0, then S_0 = ⟨0, ⟨0, σ_∅⟩⟩ and the lemma trivially holds, as ⟨0, σ_∅⟩ ∈ X_0(⟨0, σ_∅⟩).

Induction hypothesis: we assume that Lemma 1 holds for all traces of length m ≥ 0.

Inductive case: let us consider traces of length m+1, which are of the form S_0 ⇒^m S_m ⇒ S_{m+1}, where S_m is a program state of the form S_m = ⟨pc, ⟨n, σ⟩⟩. We can apply the induction hypothesis to S_m: there exists some s ∈ dom(X^m_pc) such that ⟨n, σ⟩ ∈ X^m_pc(s). To extend the lemma, we reason over all possible rules of the simplified EVM semantics (Fig. 6) that may be applied from S_m to S_{m+1}:
– Rule (1): After executing a
JUMP instruction, S_{m+1} is of the form ⟨σ(s_{n−1}), ⟨n−1, σ\[s_{n−1}]⟩⟩. In iteration m+1, the following set of equations corresponding to b_pc is evaluated:

X_{σ(s_{n−1})} ⊒ idmap(λ(b_pc, ⟨n′, σ′⟩)) for all s′ ∈ dom(X_pc), ⟨n′, σ′⟩ ∈ X_pc(s′)

where idmap(λ(b_pc, ⟨n′, σ′⟩)) = π_⊥[⟨n′−1, σ′\[s_{n′−1}]⟩ ↦ {⟨n′−1, σ′\[s_{n′−1}]⟩}] (Case (4) in Fig. 4). The induction hypothesis guarantees that there exists some s″ ∈ dom(X^m_pc) such that ⟨n, σ⟩ ∈ X^m_pc(s″), where S_m = ⟨pc, ⟨n, σ⟩⟩. Therefore, at iteration m+1, the following must hold:

X^{m+1}_{σ(s_{n−1})} ⊒ π_⊥[⟨n−1, σ\[s_{n−1}]⟩ ↦ {⟨n−1, σ\[s_{n−1}]⟩}]

so ⟨n−1, σ\[s_{n−1}]⟩ ∈ X^{m+1}_{σ(s_{n−1})}(⟨n−1, σ\[s_{n−1}]⟩) and thus Lemma 1 holds.
– Rules (2) and (3): After executing a
JUMPI instruction, S_{m+1} is either ⟨σ(s_{n−1}), ⟨n−2, σ\[s_{n−1}, s_{n−2}]⟩⟩ or ⟨pc + size(b_pc), ⟨n−2, σ\[s_{n−1}, s_{n−2}]⟩⟩, respectively. In either case the following sets of equations are evaluated:

X_{σ(s_{n−1})} ⊒ idmap(λ(JUMPI, ⟨n′, σ′⟩)) for all s′ ∈ dom(X_pc), ⟨n′, σ′⟩ ∈ X_pc(s′)
X_{pc+1} ⊒ idmap(λ(JUMPI, ⟨n′, σ′⟩)) for all s′ ∈ dom(X_pc), ⟨n′, σ′⟩ ∈ X_pc(s′)

where idmap(λ(b_pc, ⟨n′, σ′⟩)) = π_⊥[⟨n′−2, σ′\[s_{n′−1}, s_{n′−2}]⟩ ↦ {⟨n′−2, σ′\[s_{n′−1}, s_{n′−2}]⟩}] (Case (4) of the definition of the update function λ in Fig. 4). As in the previous case, the induction hypothesis guarantees that at iteration m there exists s″ ∈ dom(X^m_pc) such that ⟨n, σ⟩ ∈ X^m_pc(s″). Therefore, in iteration m+1, the following must hold:

X^{m+1}_{σ(s_{n−1})} ⊒ π_⊥[⟨n−2, σ\[s_{n−1}, s_{n−2}]⟩ ↦ {⟨n−2, σ\[s_{n−1}, s_{n−2}]⟩}]
X^{m+1}_{pc+1} ⊒ π_⊥[⟨n−2, σ\[s_{n−1}, s_{n−2}]⟩ ↦ {⟨n−2, σ\[s_{n−1}, s_{n−2}]⟩}]

and thus Lemma 1 holds for these cases as well.
– Rules (4)-(12): We first consider the case in which any of these rules corresponds to an
EVM instruction followed by an instruction different from JUMPDEST. All rules are similar, as they all use the set of equations generated by Case (4) in Definition 4. We detail Rule (4). After executing a PUSHx v instruction, S_{m+1} is ⟨pc + size(b_pc), ⟨n+1, σ[s_n ↦ {v}]⟩⟩. We have to prove that there exists some s ∈ dom(X_{pc+size(b_pc)}) such that ⟨n+1, σ[s_n ↦ {v}]⟩ ∈ X_{pc+size(b_pc)}(s). The following set of equations is evaluated:

X_{pc+size(b_pc)} ⊒ τ(PUSHx, X_pc)   (1)

By Definition 3, τ(PUSHx, X_pc) = π′, where ∀ s′ ∈ dom(π), π′(s′) = λ(PUSHx, X_pc(s′)). By Case (1) of the definition of the update function λ, we have that:

∀ ⟨n″, σ″⟩ ∈ dom(X_pc), π′(⟨n″, σ″⟩) = ⟨n″+1, σ″[s_{n″} ↦ {v}]⟩   (2)

By the induction hypothesis, at iteration m there exists some s ∈ dom(X^m_pc) such that ⟨n, σ⟩ ∈ X^m_pc(s). Therefore, by (1) and (2), at iteration m+1 the following holds: s ∈ dom(X^{m+1}_{pc+size(b_pc)}) and ⟨n+1, σ[s_n ↦ {v}]⟩ ∈ X^{m+1}_{pc+size(b_pc)}(s), and thus Lemma 1 holds for Rule (4).
– Rules (4)-(12), followed by a
JUMPDEST instruction. After executing any of these instructions, S_{m+1} is ⟨pc + size(b_pc), ⟨n‴, σ‴⟩⟩, where ⟨n‴, σ‴⟩ is obtained according to the corresponding rule from Figure 6. We have to prove that there exists some s ∈ dom(X_{pc+size(b_pc)}) such that ⟨n‴, σ‴⟩ ∈ X_{pc+size(b_pc)}(s). The following set of equations is evaluated:

X_{pc+size(b_pc)} ⊒ idmap(λ(b_pc, ⟨n′, σ′⟩)) for all s′ ∈ dom(X_pc), ⟨n′, σ′⟩ ∈ X_pc(s′)   (3)

where idmap(λ(b_pc, ⟨n′, σ′⟩)) = π_⊥[⟨n″, σ″⟩ ↦ {⟨n″, σ″⟩}], and n″ and σ″ are obtained according to the cases of the update function λ detailed in Figure 4. We can see that ⟨n″, σ″⟩ matches the modification made to the state S_{m+1} by the corresponding rule of the semantics. Therefore, at iteration m+1 there exists an s = ⟨n″, σ″⟩ such that ⟨n″, σ″⟩ ∈ X^{m+1}_{pc+size(b_pc)}(s), and Lemma 1 also holds.

When the algorithm stops, Lemma 1 holds, as X^{m+1}_pc ⊒ X^m_pc for any pc at each iteration of the algorithm for solving the equation system of Section 2.

Now we rewrite Theorem 1 in terms of the operational semantics of Figure 6. This rewriting is actually stronger than Theorem 1, as it guarantees the correctness of the stack states obtained from the jumps equation system at any step of the execution. Theorem 2 (Soundness).
Let P ≡ b_0, …, b_p be a program, S_0 = ⟨0, ⟨0, σ_∅⟩⟩ the initial program state, and X_0, …, X_n the solution of the jumps equation system of P. Then, for any trace S_0 ⇝* S_m, where S_m = ⟨pc, ⟨n, σ⟩⟩, there exists s ∈ dom(X_pc) such that ⟨n, σ⟩ ∈ X_pc(s). Proof. Straightforward from Lemma 1, as the
EVM collecting semantics takes into account all possible traces of the operational semantics. □

References
1. The EthereumPot contract, 2017. https://etherscan.io/address/0x5a13caa82851342e14cd2ad0257707cddb8a31b7.
2. A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2nd edition, 2006.
3. Lexi Brent, Anton Jurisevic, Michael Kong, Eric Liu, Francois Gauthier, Vincent Gramoli, Ralph Holz, and Bernhard Scholz. Vandal: A Scalable Security Analysis Framework for Smart Contracts, 2018. arXiv:1809.03981.
4. Neville Grech, Lexi Brent, Bernhard Scholz, and Yannis Smaragdakis. Gigahorse: thorough, declarative decompilation of smart contracts. In Joanne M. Atlee, Tevfik Bultan, and Jon Whittle, editors, Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, Montreal, QC, Canada, May 25-31, 2019, pages 1176–1186. IEEE / ACM, 2019.
5. Neville Grech, Michael Kong, Anton Jurisevic, Lexi Brent, Bernhard Scholz, and Yannis Smaragdakis. Madmax: surviving out-of-gas conditions in ethereum smart contracts.