Automatic Verification of LLVM Code
Axel Legay (UCLouvain, Belgium), Dirk Nowotka (Kiel University, Germany), and Danny Bøgsted Poulsen (Aalborg University, Denmark)

June 5, 2020
Abstract
In this work we present Lodin, a software verification tool for LLVM code that incorporates explicit-state model checking, statistical model checking and symbolic-state model checking algorithms.
Formal Methods, in particular Model Checking [1], have for many years promised to revolutionise the way we assert software correctness. They have gained a large following in the hardware design industry, but have yet to become mainstream in the software development industry, despite software being used in a large array of safety-critical components in e.g. cars and airplanes. Nowadays, any non-trivial component of any system is controlled by an embedded microprocessor with a control program, making software quality assurance more important than ever. Many case studies have shown that formal methods are a valuable tool, even in industrial contexts, but most successful applications have been conducted by academic researchers exploring the usefulness of formal methods.

One of the reasons that formal methods have not penetrated the software industry is that they require a translation of the source code into a formal model (e.g. Petri nets or automata), with the analysis conducted on these formal models. This is problematic as it requires industry engineers to invest quite some effort into understanding the formal modelling language and its associated tool. The diagnostic output of formal tools is also hard to understand without being an expert in formal methods. As a result, industry quality assurance relies on extensive testing, which will have to be done even after applying formal methods, and code reviews. Another complicating factor in applying the above-mentioned workflow is that sometimes the engineers do not know the source code intimately: parts of it might have been auto-generated and some of it might be legacy code. Attempting to translate code one has not developed into a formal model is very difficult and error-prone.

In summary, the learning curve of formal methods is steep, so industry engineers rely on other methods, and translating code to formal models is very hard and close to impossible.
Formal tools are needed that understand the source code industry already uses, to ease the adoption of formal tools in industry. Academics have developed tools accepting pure code as input [2, 5, 13, 14]. A major breakthrough was achieved by tools such as Blast [5] and SLAM [2], based around Counter-Example-Guided Abstraction Refinement (CEGAR) [9], where a program text is explored symbolically based on a predicate abstraction of the program. The predicates are continuously refined to make the abstraction as detailed as needed. Another approach, pioneered by the tool CBMC [16], is bounded model checking [6]. Here the program transition system is unrolled a number of times (in practice by unrolling loops and inlining function calls) and encoded into a constraint system. During encoding, assertions can be added that have to hold along any execution (e.g. that a divisor is never zero). If the resulting constraint system has a solution in which an assertion is violated, then the system is not safe. CEGAR and bounded model checking are incomplete, but both are nevertheless very successful in locating errors.

Nowadays the most successful software verification tools are CBMC [16] (a bounded model checker) and CPAChecker [4] (a CEGAR-based tool and direct successor of Blast). These tools are among the dominating tools in software verification competitions (https://sv-comp.sosy-lab.org). CBMC and CPAChecker are both tied to one source language, so major parts of the tools have to be reimplemented for each language they want to support. A better idea may be to base the analyses on an intermediate format that can capture the semantics of many high-level languages. One such intermediate format is LLVM [17], which at least four tools are using:

1. LLBMC [13] follows in the footsteps of CBMC and performs bounded model checking on LLVM,
2. SeaHorn [15] has the objective of making a verification platform for LLVM code; it seems to employ mostly CEGAR-based approaches,
3. Klee [7] is a symbolic execution engine performing a symbolic exploration of the state space in order to find good test cases for testing, and
4. Divine [3] is an explicit-state model checker for LLVM code.

Although the previously mentioned tools have paved the way for formal methods entering industry, they are not without flaws. A lot of them primarily focus on single-threaded programs, which is a problem because industry is moving to multi-core architectures and verification thus needs to take interleaving into account. This interleaving is the cause of the state space explosion problem, a problem that the symbolic representations of LLBMC, CBMC and CPAChecker cannot avoid. Although there has been some work in adapting at least CBMC to concurrent code, it is still an open problem how to verify concurrent programs efficiently.

In this paper we present the tool Lodin, a fairly new tool [18] offering a range of verification techniques for LLVM. For concurrent programs it implements explicit-state reachability. Realising that an exhaustive state space search will not scale for large programs, it also implements under-approximate state space searches through simulation. For single-threaded programs Lodin implements symbolic exploration akin to CBMC and LLBMC. In this way, Lodin distinguishes itself from existing tools by implementing several techniques in a joint framework.
Lodin achieves its ability to implement different techniques through its flexible architecture. Another feature of Lodin that sets it apart from other formal tools is its extensibility through platform plugins: the core of Lodin implements only the bare minimum semantics of LLVM and has no knowledge of the runtime environment of the program. In real-life programs, the executing program may call into the runtime environment, which Lodin must know about in order to provide correct verification results. The platform plugins serve as a way to provide these implementations.
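A plugin mechanism of this kind can be pictured as a registry mapping external function names to host-side implementations, with unknown externals falling back to a nondeterministic result. The following is a minimal sketch under our own naming; it is illustrative and not Lodin's actual API.

```python
# Illustrative sketch of a platform-plugin registry (names are ours,
# not Lodin's actual API). A plugin supplies host-side implementations
# for external functions; unknown externals fall back to a
# nondeterministic set of possible results.

class PlatformPlugin:
    """Maps external function names to Python callables."""
    def __init__(self, name, functions):
        self.name = name
        self.functions = functions  # dict: function name -> callable

class Interpreter:
    def __init__(self, plugins):
        self.plugins = plugins

    def call_external(self, fname, args, nondet_choices):
        # Try each plugin in order; the first one defining fname wins.
        for plugin in self.plugins:
            if fname in plugin.functions:
                return plugin.functions[fname](*args)
        # No plugin knows the function: model it nondeterministically.
        return nondet_choices

# A toy "libc" plugin implementing abs().
libc = PlatformPlugin("libc", {"@abs": lambda x: x if x >= 0 else -x})
interp = Interpreter([libc])

print(interp.call_external("@abs", [-3], None))          # known: 3
print(interp.call_external("@getchar", [], range(256)))  # unknown: all results
```

The fallback corresponds to replacing an unmodelled external call by a nondeterministic value, as discussed later for the lodin nd instruction.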
Although the focus of this paper is not to describe the LLVM [17] language itself, we spend some time on presenting a simplified version of the LLVM instruction set and its semantics. The full LLVM language description is available online [12]. The description we provide is closely linked to the implementation inside Lodin.

    ; Function Attrs: nounwind uwtable
    define void @main () {
    init:
      br label %blk
    blk:
      %x = phi i32 [ %z, %blk ], [ 0, %init ]
      %z = phi i32 [ %x, %blk ], [ 1, %init ]
      %b = icmp eq i32 %x, %z
      br i1 %b, label %succ, label %blk
    succ:
      %y = add i32 0, 1
      ret i32 1
    }

LLVM-Listing 1: An example LLVM module with a single entry point @main.

An LLVM module consists of functions, of which some may be entry point functions which are starting points for an LLVM process. Functions are divided into Basic Blocks, where a Basic Block is a sequence of instructions executed in a linear fashion. Basic blocks are named by labels, so that instructions can direct control to the basic block. Individual instructions within a basic block can be pure arithmetic operations, memory allocations, memory accesses, function calls or instructions that pass control to other basic blocks. Basic blocks are always terminated by the latter class, and these are thus called terminator instructions. Operands to the instructions of an LLVM program are kept in so-called registers, and a syntactical requirement for an LLVM program is that it must be in single-static-assignment form, i.e. each register is only assigned once.

LLVM-Listing 1 shows a very short LLVM program. The program consists of a single function @main (which is also the entry point) that consists of three basic blocks: init, blk and succ. The blocks cover lines 4−5, 7−10 and 12−13 respectively. The terminating instruction links init to block blk, and links blk to succ and blk. We refer to Figure 1 for a graphical depiction of how the basic blocks are linked together.

LLVM Types
All operations in LLVM are typed, either with an arbitrary-width bitvector, a compound datatype or a memory pointer.

[Figure 1: Control Flow Graph of LLVM-Listing 1.]

The bitvector type is denoted i n where n is the width. For our discussion, we restrict ourselves to bitvectors whose widths are multiples of bytes, thus we let T_int = { i n | n a multiple of 8 } be the set of all integer types in LLVM. If ty_1, ..., ty_n are LLVM types then ⟨ty_1, ..., ty_n⟩ is a compound type (like C-style structs). We denote by T_comp all compound LLVM types. For a type ⟨ty_1, ..., ty_n⟩ and a sequence of integers i_1, ..., i_k we let

    T_{i_1,...,i_k}(⟨ty_1, ..., ty_n⟩) = T_{i_2,...,i_k}(ty_{i_1})
    T_ε(ty) = ty,

i.e. T selects the type reached by successively indexing into a compound type. A memory pointer type to a type ty is denoted ty*. LLVM leaves the bitwidth of pointer types unspecified; for the remainder of this paper we assume it is 64 bits. As is customary in C-style languages, LLVM includes the void type used to signify that a function does not return a value.

It will often be convenient to talk about the byte-size of a type:

    BSize(ty) = n/8                          if ty = i n
    BSize(ty) = Σ_{i=1}^{n} BSize(ty_i)      if ty = ⟨ty_1, ..., ty_n⟩
    BSize(ty) = 8                            if ty = ty'*

We let T denote the set of all types in LLVM.

LLVM instructions
Let R be a set of registers, BL be a finite set of basic block labels and let Fs be a finite set of function names; then Table 1 displays the instruction set used in our discussion of LLVM. In the table, BInst(R) = Arith(R) ∪ Log(R) ∪ Mem(R) ∪ Cmp(R) ∪ Intrin(R) are the basic instructions, while Term(R, BL) are instructions terminating a basic block (e.g. jumps). A short description of the intended meaning of the instruction classes may be in order:

Arith(R) Instructions in this class are arithmetic instructions that take two registers (%inp1 and %inp2), perform the mathematical operation and store the result in %res. It is worth noting that since LLVM has no signed and unsigned types, it instead has signed and unsigned versions of some instructions. Prime examples of this are the remainder (rem) and the division (div) instructions. Signed and unsigned versions are distinguished by the prefixes 's' and 'u'.

Log(R) This class consists of instructions performing bitwise operations. It might be worth mentioning the bit shift operations. Shifting to the left, shl, is performed by moving the bit pattern towards the most significant bit and padding with zeros. For shifting to the right, LLVM has two operations, lshr and ashr. The lshr is similar to left shifting, with the difference that the pattern is shifted towards the least significant bit; it is called a logical shift. The ashr is on the other hand an arithmetic right shift, which preserves the sign bit of the pattern.

Mem(R) This instruction class has instructions for allocating memory, loading a value from a memory address and storing a value at a memory address. A special instruction in this class is the getelementptr instruction, indexing into a compound type stored in memory. It can be thought of as the dereferencing operator in C.

Cmp(R) This class of instructions is used for comparing the values of registers. As an example, %res = cmp ule i32 %inp1, %inp2 compares whether %inp1 is less than or equal to %inp2 while interpreting %inp1 and %inp2 as unsigned integers.

Term(R, BL) This class consists of instructions terminating a block. A terminating action can either be a jump to another block or a return from a function. For jumping there are two different versions: the unconditional version br label %block that jumps to the specified block no matter what, and the conditional br i1 %cond, label %ttblock, label %ffblock that jumps to %ttblock if the pattern in %cond corresponds to true and to %ffblock otherwise. There are also two return instructions: an instruction (ret void) that does not return a value and one that does (ret ty %res).

CInst(R, Fs) Instructions for calling other functions. The instruction for calling a function with name @func is %res = call ret @func (ty1 %p1, ..., tyn %pn). As one would expect, this passes control to the function @func, passes %p1, ..., %pn as parameters and stores the result of the function call into %res.

Phi(R, BL) The instruction class Phi(R, BL) consists of instructions selecting a value based on which basic block control flowed from. The instructions are needed because LLVM programs are in single-static-assignment form. The instructions are only allowed at the start of a basic block and must be executed simultaneously, i.e. the evaluation of one phi-instruction cannot affect the result of another in the same block.

Intrin(R) This class is a set of "extension instructions" used by Lodin. Currently it only consists of instructions that return a non-deterministic value.

    define dso_local i32 @main () {
    init:
      %1 = call i32 (...) @__VERIFIER_nondet_int()
      %2 = icmp ne i32 %1, 0
      br i1 %2, label %branch, label %end
    branch:
      %4 = add nsw i32 %1, 1
      br label %end
    end:
      %.0 = phi i32 [ %4, %branch ], [ %1, %init ]
      ret i32 %.0
    }

LLVM-Listing 2: Example program for using phi i32
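The simultaneous-evaluation requirement for phi instructions can be made concrete with a small sketch (plain Python standing in for the semantics; the function and data layout are ours, not Lodin's): the two phi instructions of block blk in LLVM-Listing 1 must both read the register values as they were on block entry, so %x and %z swap.

```python
# Illustrative sketch of simultaneous phi evaluation.
# In block blk of LLVM-Listing 1:
#   %x = phi i32 [ %z, %blk ], [ 0, %init ]
#   %z = phi i32 [ %x, %blk ], [ 1, %init ]
# Both phis read the *old* register file, so %x and %z swap.

def eval_phis(phis, regs, prev_block):
    """phis: list of (dest, {predecessor label: source}) pairs.
    A source is either a register name (str) or a constant."""
    snapshot = dict(regs)  # values on block entry, read-only
    for dest, choices in phis:
        src = choices[prev_block]
        regs[dest] = snapshot[src] if isinstance(src, str) else src
    return regs

phis = [("%x", {"%init": 0, "%blk": "%z"}),
        ("%z", {"%init": 1, "%blk": "%x"})]

regs = eval_phis(phis, {}, "%init")   # first entry, coming from init
print(regs)                           # {'%x': 0, '%z': 1}
regs = eval_phis(phis, regs, "%blk")  # looping back from blk: swap
print(regs)                           # {'%x': 1, '%z': 0}
```

Evaluating the phis one after another against the live register file would instead propagate the freshly written %x into %z, which is exactly what the simultaneity requirement forbids.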
Remark 1. All instructions in Table 1 can take constants as parameters in addition to real registers. For ease of exposition we will, however, treat constants as standard registers.
Formal Definitions of LLVM Modules  In the introduction to this section, we mentioned that LLVM programs consist of functions (of which some may be program entry points) and that functions consist of basic blocks. We are now turning towards giving proper formal definitions of these concepts.
Definition 1 (Basic Block). Let BL be a set of labels, Fs be a set of function names and R be a set of registers; then a basic block, B, is a finite sequence I_1 I_2 ... I_n of instructions where
• for all i < n, I_i ∈ BInst(R) ∪ CInst(R, Fs) ∪ Phi(R, BL),
• I_n ∈ Term(R, BL) and
• if I_i ∈ Phi(R, BL) then ∀ j < i, I_j ∈ Phi(R, BL).
We denote the set of all possible basic blocks over BL, R and Fs by BB(R, BL, Fs). As a convention, if B = I_1 I_2 ... I_n is a basic block then we write |B| = n for its length and we let B[i] = I_i.

Definition 2 (Function). A function F with n parameters over the function names Fs is a tuple (@N, R, P, BL, BBs, Bm, ret) where
• @N ∈ Fs is the function's name,
• R is a set of registers,
• P = p_1, ..., p_n, where for all i, p_i ∈ R, is a sequence of registers used as parameters,
• BL is a finite set of labels with the requirement that init ∈ BL,
• BBs ⊆ BB(R, BL, Fs) is a finite set of blocks,
• Bm : BL → BBs assigns each block label a basic block and
• ret ∈ T is the return type of the function.

Definition 3 (Program Entry Point). A program entry point is a function (@N, R, ∅, BL, BBs, Bm, void).

Definition 4 (Module). An LLVM module M is a tuple (F, E) where
• F = {F_1, ..., F_n} is a collection of functions where ∀ i, F_i = (@N_i, R_i, P_i, BL_i, BBs_i, Bm_i, ret_i), and for all k ≠ j, R_k ∩ R_j = ∅, and
• E = k_1, ..., k_m is a list of indices defining the entry functions, i.e. ∀ 1 ≤ i ≤ m, F_{k_i} is an entry point function.

For a module M = (F, E) we abuse notation slightly and allow writing F ∈ M whenever F ∈ F.

Well-typedness
To each register %r ∈ R we assign a type t ∈ T and write %r : t to denote that %r has type t. If registers %r_1, ..., %r_n all have the same type ty, we write %r_1, ..., %r_n : ty. Generalising this notation to an instruction Inst, we write Inst : ty to denote that Inst is well-typed with type ty. Figure 2 shows the type rules of LLVM instructions. For a function F = (@N, R, P, BL, BBs, Bm, retty) we write Rets(F) to get all return instructions within that function's basic blocks. Given this, we say that F is well-typed (F : retty) if for all Inst ∈ Rets(F), Inst : retty, and all other instructions are well-typed.

Arith(R):
    %res = add ty %inp1, %inp2
    %res = sub ty %inp1, %inp2
    %res = mul ty %inp1, %inp2
    %res = udiv ty %inp1, %inp2
    %res = sdiv ty %inp1, %inp2
    %res = urem ty %inp1, %inp2
    %res = srem ty %inp1, %inp2

Log(R):
    %res = shl ty %inp1, %inp2
    %res = lshr ty %inp1, %inp2
    %res = ashr ty %inp1, %inp2
    %res = and ty %inp1, %inp2
    %res = or ty %inp1, %inp2
    %res = xor ty %inp1, %inp2

Mem(R):
    %res = alloca ty
    %res = getelementptr ty, ty* %ptr, ty1 ind1, ..., tyn indn
    %res = load ty, ty* %addr
    store ty %val, ty* %addr

Cmp(R):
    %res = cmp eq ty %inp1, %inp2
    %res = cmp ne ty %inp1, %inp2
    %res = cmp uge ty %inp1, %inp2
    %res = cmp ugt ty %inp1, %inp2
    %res = cmp ule ty %inp1, %inp2
    %res = cmp ult ty %inp1, %inp2
    %res = cmp sge ty %inp1, %inp2
    %res = cmp sgt ty %inp1, %inp2
    %res = cmp sle ty %inp1, %inp2
    %res = cmp slt ty %inp1, %inp2

Term(R, BL):
    ret void
    ret ty %res
    br label %block
    br i1 %cond, label %ttblock, label %ffblock

Phi(R, BL):
    %res = phi ty [ %inp1, %lab1 ] ... [ %inpn, %labn ]

CInst(R, Fs):
    %res = call ret @func (ty1 %p1, ..., tyn %pn)

Intrin(R):
    %res = lodin nd ty

Table 1: Basic instructions over a set of registers R and basic block names BL, where %cond, %res, %inp1, ..., %inpn ∈ R, block, ttblock, ffblock, lab1, ..., labn ∈ BL, @func ∈ Fs and for all i, indi ∈ Z.

Modelling External Dependencies
A common problem in software verification is that the system we want to verify depends on external library functions (e.g. libc), or functions interacting directly with the operating system (e.g. pthread). In principle we could extend the LLVM language with implementations for all these external function calls, but it would unnecessarily inflate the semantics, and the semantics would have to be redefined for each external library and operating system.

Lodin combats this problem in two ways:
1. Lodin extends the LLVM language with the %res = lodin nd ty instruction that returns non-deterministic values, allowing a programmer to replace external function calls with %res = lodin nd ty and thereby explore all possible results of external function calls, and
2. Lodin allows programmers to extend the Lodin interpreter through platform plugins that provide implementations of external functions. Calls to external functions are syntactically indistinguishable from calls to functions defined in the LLVM module itself.
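The effect of replacing an external call by a nondeterministic value can be sketched as follows (our own toy harness, not Lodin code): a verifier that enumerates every possible result of the call and collects the states violating a check.

```python
# Illustrative sketch: model an external call as a nondeterministic
# value and explore every possible result (harness names are ours).

def explore(nondet_domain, step, check):
    """Run `step` once per possible nondeterministic result and
    collect every resulting state that violates `check`."""
    violations = []
    for v in nondet_domain:
        state = step(v)
        if not check(state):
            violations.append(state)
    return violations

# Program under test:  x = external();  y = 100 / (x - 7)
# Replacing external() by "lodin nd i8" means trying every 8-bit value.
def step(x):
    return {"x": x, "crash": (x - 7) == 0}

bad = explore(range(256), step, lambda s: not s["crash"])
print(bad)  # [{'x': 7, 'crash': True}]  -- division by zero found at x == 7
```

Exhaustively enumerating the domain is only feasible for small types; for wider types this is where the symbolic engine pays off, since it can represent the whole domain as a constraint instead.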
Lodin has been developed with reusability in mind, allowing core components to be used for both explicit-state analysis and symbolic-state analysis. The semantics we present in the following reflect this reusability by defining the core semantics in terms of a context. The context is responsible for representing the register values, for how memory is represented and for implementing operations on registers. The core semantics "just" translate the LLVM instruction set to operations on context states and keep track of the control flow. In some sense one could consider the context to be a "virtual machine".

Binary:    ty ∈ T_int    %res, %inp1, %inp2 : ty
           ⟹  (%res = inst ty %inp1, %inp2) : ty

Compare:   %res : i1    %inp1, %inp2 : ty    ty ∈ T_int
           ⟹  (%res = cmp cc ty %inp1, %inp2) : i1

Alloca:    %res : ty*
           ⟹  (%res = alloca ty) : ty*

Load:      %addr : ty*    %res : ty
           ⟹  (%res = load ty, ty* %addr) : ty

Store:     %val : ty    %addr : ty*
           ⟹  (store ty %val, ty* %addr) : void

Phi:       %res, %inp1, ..., %inpn : ty
           ⟹  (%res = phi ty [%inp1, %lab1] ... [%inpn, %labn]) : ty

Ret1:      ⟹  (ret void) : void

Ret2:      %res : ty
           ⟹  (ret ty %res) : ty

Branch1:   ⟹  (br label %block) : void

Branch2:   %cond : i1
           ⟹  (br i1 %cond, label %ttblock, label %ffblock) : void

NonDet:    %res : ty
           ⟹  (%res = lodin nd ty) : ty

Call:      %res : ret    [ pi, %pi : tyi ]_{i=1...n}
           ⟹  (%res = call ret @func (ty1 %p1, ..., tyn %pn)) : ret

GEP:       %res : res*    res = T_{ind2,...,indn}(ty)
           ⟹  (%res = getelementptr ty, ty* %ptr, ty1 ind1, ..., tyn indn) : res*

Figure 2: Type rules for LLVM, where (%res = inst ty %inp1, %inp2) ∈ Arith(R) ∪ Log(R) and (%res = cmp cc ty %inp1, %inp2) ∈ Cmp(R).

A context provides the LLVM program with an infinite set of register variables which the context maps to actual values. The intention is that an LLVM program maps LLVM registers to context register variables, i.e. uses a redirection table to obtain the values of the LLVM registers. This does end up complicating the semantics slightly, but allows calling a function twice in the LLVM program, i.e. enables recursion.
Definition 5 (Context). A context is a tuple A = (S_A, s_A^init, dom_A, R, ff_A) where
• S_A is a set of configuration states for the context,
• s_A^init ∈ S_A is the initial context state,
• dom_A assigns to each ty ∈ T a range of values that the type can attain values within,
• R is an infinite set of register variables, and
• ff_A ∈ dom_A(i1) is a representation of "false".
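One concrete instantiation of such a context is an explicit-state one, where register variables and memory map to actual values. The following sketch (our own Python rendering, not Lodin's actual implementation) shows a few of the operations discussed next; note that every operation returns a new context state rather than mutating the old one.

```python
# A minimal explicit-state context, sketched for intuition (illustrative;
# not Lodin's implementation). States are plain dicts, and each operation
# returns a *new* state, matching the functional style of the semantics.

class ExplicitContext:
    def __init__(self):
        self.initial = {"regs": {}, "mem": {}, "next_reg": 0, "next_addr": 1}

    def mReg(self, s):
        """Return (new state, fresh register variable)."""
        r = s["next_reg"]
        return {**s, "next_reg": r + 1}, r

    def Eval(self, s, r):
        return s["regs"][r]

    def Set(self, s, r, v):
        return {**s, "regs": {**s["regs"], r: v}}

    def alloc(self, s, nbytes):
        """Return (new state, fresh address) for nbytes of storage."""
        a = s["next_addr"]
        return {**s, "next_addr": a + nbytes}, a

    def load(self, s, addr):
        return s["mem"][addr]

    def store(self, s, v, addr):
        return {**s, "mem": {**s["mem"], addr: v}}

ctx = ExplicitContext()
s0 = ctx.initial
s1, r = ctx.mReg(s0)
s2 = ctx.Set(s1, r, 42)
print(ctx.Eval(s2, r))   # 42
print(r in s0["regs"])   # False: the old state is untouched
```

A symbolic context would keep the same interface but map register variables to terms over a constraint store instead of concrete values, which is exactly what lets the core semantics be shared between the engines.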
LLVM program to manipulate the states of a context. Mostof these operations are just semantical functions for
LLVM instructions (see Table 2). Instead of writing ◦ ( S, t , t ) = R when applying an operator, we usean infix notation J t ◦ t K S = R . Besides the instruc-tions in Table 2 we need instructions for creatingnew register variables ( mReg ) , evaluate the valueof a register variable ( Eval ty A ), loading ( load ty A ) andstoring ( store ty A ) values from/to memory, allocatingmemory ( alloc ty A ) and free’ing memory ( free ) . Wediscuss them briefly in the following from a usage-perspetice: mReg A : S A × R → S A × R This function takesa context state s A and a register % r , where % r : ty . It returns a register variable r ∈ R that can beused to store values of ty and a new context state s .Naturally, the context must ensure that the registervariable r is not already used in s A .7nstruction Operator SignatureAddition add + ty A S A × dom A ( ty ) × dom A ( ty ) → dom A ( ty ) Subtraction sub − ty A S A × dom A ( ty ) × dom A ( ty ) → dom A ( ty ) Multiplication mul · ty A S A × dom A ( ty ) × dom A ( ty ) → dom A ( ty ) Unsigned Division div / ty u A S A × dom A ( ty ) × dom A ( ty ) → dom A ( ty ) Signed Division sdiv / ty s A S A × dom A ( ty ) × dom A ( ty ) → dom A ( ty ) Signed Remainder rem % ty s A S A × dom A ( ty ) × dom A ( ty ) → dom A ( ty ) Unsigned Modulo srem % ty u A S A × dom A ( ty ) × dom A ( ty ) → dom A ( ty ) Shift left shl << ty A S A × dom A ( ty ) × dom A ( ty ) → dom A ( ty ) Logical Shift right lshr >> ty a A S A × dom A ( ty ) × dom A ( ty ) → dom A ( ty ) Arithmetic shift right ashr >> ty a A S A × dom A ( ty ) × dom A ( ty ) → dom A ( ty ) Bitwise and and & ty A S A × dom A ( ty ) × dom A ( ty ) → dom A ( ty ) Bitwise or or | ty A S A × dom A ( ty ) × dom A ( ty ) → dom A ( ty ) Bitwise xor xor & ty A S A × dom A ( ty ) × dom A ( ty ) → dom A ( ty ) Equality cmp eq == ty A S A × dom A ( ty ) × dom A ( ty ) → S A × dom A ( i ) × {⊤ , ⊥} Non-equality cmp ne = ty A S A × dom A ( ty ) × dom A ( ty ) → S A × dom A ( i ) × {⊤ , ⊥} 
Signed Greater than cmp sgt > ty s A S A × dom A ( ty ) × dom A ( ty ) → S A × dom A ( i ) × {⊤ , ⊥} Signed Greater than or equal cmp sge > ty s A S A × dom A ( ty ) × dom A ( ty ) → S A × dom A ( i ) × {⊤ , ⊥} Signed Lessr than or equal cmp sle ≤ ty s A S A × dom A ( ty ) × dom A ( ty ) → S A × dom A ( i ) × {⊤ , ⊥} Signed Less than cmp slt < ty s A S A × dom A ( ty ) × dom A ( ty ) → S A × dom A ( i ) × {⊤ , ⊥} Unsigned Greater than cmp ugt > ty u A S A × dom A ( ty ) × dom A ( ty ) → S A × dom A ( i ) × {⊤ , ⊥} Unsigned Greater than or equal cmp uge > ty u A S A × dom A ( i ) × dom A ( ty ) → S A × dom A ( ty ) × {⊤ , ⊥} Unsigned Less than or equal cmp ule ≤ ty u A S A × dom A ( ty ) × dom A ( ty ) → S A × dom A ( i ) × {⊤ , ⊥} Unsigned Less than cmp ult < ty u A S A × dom A ( ty ) × dom A ( ty ) → S A × dom A ( i ) × {⊤ , ⊥} Table 2: Operations for a context A = ( S A , s init , dom A , R ). They each take as input a context state andoperands and returns a new contet states and a return value. The compare instructions also return a valuein {⊤ , ⊥} . Eval ty A : S A × R → dom A ( ty ) This function takesa context state s and register variable r ∈ R , andreturns a value in dom A ( ty ). Set ty A : S A × R × dom A ( ty ) → S A This functiontakes a context state s , register variable r ∈ R withtype ty and a value v ∈ dom A ( ty ). It returns a newcontext state s ′ with r bound to the value v . load ty A : S A × dom A ( ty ∗ ) → dom A ( ty ) This functiontakes a context state s and a memory address in dom A ( ty ∗ ) and returns a subset of dom A ( ty ). store ty A : S A × dom A ( ty ) × dom A ( ty ∗ ) → S A Thisfunction takes a context state s and values v ∈ dom A ( ty ) and a ∈ dom A ( ty ∗ ) . It returns a new s ′ where the value the memory address a has been up-dated to the value v . 
alloc ty A : S A → S A × dom A ( ty ∗ ) This function takesa context state s and returns a tuple ( s ′ , t ) where t ∈ ty ∗ is a newly allocated memory address withspace for a type ty , and s ′ is a new context stateupdated with information that t is no longer freefor allocation. free A : S A × S ty ∈ T dom A ( ty ∗ ) → S A This func-tion takes a context state s and a value in k ∈ i ∈ B dom A ( i i ∗ ). It returns a new context state s ′ where the memory pointed to by k has been re-leased. NonDet ty A : S A → S A × dom A ( ty ) This functiontakes a context state s and returns a subset of dom A ( ty ) and a new context state. PtrAdd A ty ∗ A : dom A ( ty ∗ ) × Z → dom A ( ty ∗ ) Thisfunction takes a pointer p and natural number b andreturns a pointer new pointer after adding b bytesto p . Core Semantics
We are now ready to define the core semantics for a single LLVM process relative to a given context. The state of a single process (e.g. the instruction to be executed, what function it is executing, which block was previously executed, the mapping of the function's registers to context register variables) is kept in an activation record. The activation record also has a list of memory addresses that must be deallocated when control leaves the currently executing function. If a function calls another function, an activation record is pushed in front of the current one, thus forming a stack of activation records.

Remark 2. An activation record roughly corresponds to the well-known concept of a stack frame. LLVM does, however, not assume the existence of a stack; instead the activation record keeps a set of memory addresses that must be released when removing the activation record (corresponding to popping the stack frame in stack-based systems).
Definition 6 (Activation Record). An activation record, relative to a context (S_A, s_A^init, dom_A, R, ff_A), is a tuple (F, prev, cur, pc, π, Free) where
• F = (@N, R, P, BL, BBs, Bm, ret) is the LLVM function currently being executed,
• prev ∈ BL is the label of the block executed before the current one,
• cur ∈ BL is the label of the currently executed basic block,
• pc ∈ N is a pointer into the current basic block locating the next instruction to be executed,
• π : R → R maps registers to register variables of the context and
• Free is a set of memory addresses that must be deleted when removing this activation record.
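The activation-record stack can be rendered as follows (an illustrative Python sketch; the field names follow Definition 6, everything else is ours). Note how two records can map the same LLVM register to different context register variables via π, which is what enables recursion.

```python
from dataclasses import dataclass, field

# Sketch of activation records and the call stack of Definition 6
# (illustrative rendering, not Lodin's implementation).

@dataclass
class ActivationRecord:
    function: str   # @N of the executing function
    prev: str       # label of the previously executed block
    cur: str        # label of the current block
    pc: int         # index of the next instruction in cur
    pi: dict        # redirection table: LLVM register -> context register
    free: set = field(default_factory=set)  # addresses to release on return

def call(stack, callee, pi):
    """Push a fresh record: execution starts at block 'init', pc 0."""
    return [ActivationRecord(callee, "init", "init", 0, pi)] + stack

def ret(stack):
    """Pop the top record; its 'free' set would be released in the context."""
    return stack[0].free, stack[1:]

stack = call([], "@main", {"%x": 0})
stack = call(stack, "@helper", {"%x": 1})  # same LLVM register %x, fresh variable
freed, stack = ret(stack)
print(stack[0].function)  # @main
```

Each recursive call gets a fresh π via mReg_A, so the single-static-assignment restriction on LLVM registers never collides across invocations.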
Remark 3. Intuitively, an activation record is split into two parts: 1. a static part that indicates the instruction to be executed, given by F, prev, cur and pc, and 2. a dynamic part that links the process to the memory model of the context, given by π and Free.

A stack of activation records is a structure s_1 : s_2 : ... : s_n where each s_i is an activation record. The empty stack is denoted by ε. In the transition rules in Figures 3-7, we usually use the notation s : SL, meaning that s is the head of the stack and SL is the remaining part of the stack. We also write Inst ≝ (EXPR) to denote that Inst is syntactically equivalent to EXPR.

The transition rules are defined relative to a context state and a module. Given a context state s_A and a module M, the rules define how to execute an instruction Inst from a state (s, SL), where s is an activation record and SL is a stack, to produce the tuple ((s′, SL′), s_A′), where (s′, SL′) is a new state and s_A′ is a new context state. We write this as

    s_A, M ⊢ (s, SL) --Inst--> ((s′, SL′), s_A′).

The rules may look intimidating but most of them are fairly straightforward. As an example let us briefly consider the rule for binary operators (that are not comparisons):
Binary
Inst = Bm ( cur )[ pc ] v ∈ J Eval ty A ( s , r )) ◦ ( inst ) Eval ty A ( s , r ) K s s , M ⊢ ( s, SL ) Inst −−−→ (( F , prev , cur , pc + 1 ,π, Free ) , SL ) , Set ty A ( s , r res ,v ) , s =( F , prev , cur , pc ,π, Free ) F =(@ N , R , P , BL , BBs , Bm , ret ) , Inst def =(% res = instty % inp1 , % inp2 ) r = π (% inp1 ) r = π (% inp2 ) r res = π (% res ) ) This rule says, that in order to execute an in-struction % res = inst ty % inp1 , % inp2 we first figureout which register variables in s that contain thevalues of % inp1 , % inp , % res . This look up is done with9 lloc Inst = Bm ( cur )[ pc ] alloc ty A ( s ) = s ′ ,v s , M ⊢ ( s, SL ) Inst −−−→ (( F , prev , cur , pc + 1 ,π, Free ∪ { m } ) , SL ) , Set ty ∗A ( s ′ , r res ,v ) , s =( F , prev , cur , pc ,π, Free ) F =(@ N , R , P , BL , BBs , Bm , ret ) , Inst def =(% res = allocaty ) r res = π (% res ) Load
Inst = Bm ( cur )[ pc ] v ∈ load ty A ( s , Eval ty ∗A ( s , r addr )) s , M ⊢ ( s, SL ) Inst −−−→ (( F , prev , cur , pc + 1 ,π, Free ) , SL ) , Set ty A ( s , r res ,v ) , s =( F , prev , cur , pc ,π, Free ) F =(@ N , R , P , BL , BBs , Bm , ret ) , Inst def =(% res = loadty , ty ∗ % add ) r addr = π (% addr ) r res = π (% res ) Store
Inst = Bm ( cur )[ pc ] store ty A ( s , Eval ty A ( s , r val ) , Eval ty ∗A ( s , r addr )) = s ′ s , M ⊢ ( s, SL ) Inst −−−→ (( F , prev , cur , pc + 1 ,π, Free ) , SL ) , s ′ , s =( F , prev , cur , pc ,π, Free ) F =(@ N , R , P , BL , BBs , Bm , ret ) , Inst def =( storety % val , ty ∗ % addr ) r val = π (% val ) r addr = π (% addr ) GEP
Inst = Bm ( cur )[ pc ] s , M ⊢ ( s ′ , SL ) Inst −−−→ ( s ′′ , SL ) , s s , M ⊢ ( s, SL ) Inst −−−→ ( s ′′ , SL ) , Set ty A ( s , r res , PtrAdd A ( Eval ty ∗A ( s , r ptr )) ,k ) , s =( F , prev , cur , pc ,π, Free ) ,s ′ =( F , prev , cur , pc +1 ,π, Free ) , F =(@ func , R , P , BL , BBs , Bm , ret ) Inst def =(% res = getelementptrty , ty ∗ % ptr , ty1ind1 ..., tynindn ) r ptr = π (% ptr ) r res = π (% res ) k = T ind2 ,..., indn ( ty )+ ind1 · BSize ( ty ) Figure 3: Transition Rules for memory instructionscalls to π and results kept in r , r , r res . Then we eval-uate the value of r and r in s via calls to Eval ty A ,and the operation corresponding to inst is looked upwith ◦ (see Table 2 for this mapping) and applied ( J Eval ty A ( s , r )) ◦ ( inst ) Eval ty ( s , r ) K s ) giving a newcontext state ( s ′ ), and the value of the operation( v ). Set ty ( s ′ , r res , v ) stores this new value in r res andreturns the new context state. Finally we updatethe program counter ( pc + 1).In the rules special care has to be taken for the phi ty instructoins. All of these must be evaluated si-multaneously. We therefore evaluate the them in abig-step fashion where the evaluation of one instruc-tion also result in evaluating the next instruction (ifit is also a phi ty instruction). For the getelementptr rule, we use the auxillary function T i ,...i n ( h ty1 , . . . , tyn i ) = i − X k =1 BSize ( tyk ) + T i ,...i n ( tyi ) T ǫ ( ty ) = 0to calculate the offset needed to access the correctelement of the designated type. Remark 4. If Lodin has some functions definedin a platform plugin, the call rule in Figure 7 is re-placed by the implementation described in that mod-ule instead. Platform functions are executed atomi-cally in
Lodin.

Branch Unconditional: If Inst = Bm(cur)[pc] with Inst = (br label %block), s = (F, prev, cur, pc, π, Free) and F = (@N, R, P, BL, BBs, Bm, ret), then

s_A, M ⊢ (s, SL) --Inst--> ((F, cur, block, 0, π, Free), SL), s_A.

Branch Conditional True: If Inst = Bm(cur)[pc] with Inst = (br i1 %cond, label %ttblock, label %ffblock), s = (F, prev, cur, pc, π, Free), F = (@N, R, P, BL, BBs, Bm, ret) and [[ Eval^{i1}_A(s_A, π(%cond)) ≠^{i1}_A ff_A ]] s_A = s'_A, _, ⊤, then

s_A, M ⊢ (s, SL) --Inst--> ((F, cur, ttblock, 0, π, Free), SL), s'_A.

Branch Conditional False: If Inst = Bm(cur)[pc] with Inst = (br i1 %cond, label %ttblock, label %ffblock), s = (F, prev, cur, pc, π, Free), F = (@N, R, P, BL, BBs, Bm, ret) and [[ Eval^{i1}_A(s_A, π(%cond)) ==^{i1}_A ff_A ]] s_A = s'_A, _, ⊤, then

s_A, M ⊢ (s, SL) --Inst--> ((F, cur, ffblock, 0, π, Free), SL), s'_A.

Return Void: If Inst = Bm(cur)[pc] with Inst = (ret void), s = (F, prev, cur, pc, π, Free), F = (@N, R, P, BL, BBs, Bm, ret), Free = {f_1, f_2, ..., f_n}, SL = s' : SL' and [s_i = free_A(s_{i−1}, f_i)]_{i=1...n} with s_0 = s_A, then

s_A, M ⊢ (s, SL) --Inst--> (s', SL'), s_n.

Return Value: If Inst = Bm(cur)[pc] with Inst = (ret ty %val), s = (F, prev, cur, pc, π, Free), F = (@N, R, P, BL, BBs, Bm, ty), Free = {f_1, f_2, ..., f_n}, SL = s' : SL' with s' = (F', prev', cur', pc', π', Free'), F' = (@func', R', P', BL', BBs', Bm', ret'), where the calling instruction Bm'(cur')[pc' − 1] = Inst_c = (%res = call ty @N(ty_1 %p1 ... ty_n %pn)) and r_v = π'(%res), and [s_i = free_A(s_{i−1}, f_i)]_{i=1...n} with s_0 = s_A, then

s_A, M ⊢ (s, SL) --Inst--> (s', SL'), Set^ty_A(s_n, r_v, Eval^ty_A(s_A, π(%val))).

Figure 4: Transition rules for terminator instructions.
Phi: If Inst = Bm(cur)[pc] with Inst = (%res = phi ty [%inp1, %lab1] ... [%inpn, %labn]), Bm(cur)[pc + 1] ∈ Phi(R, BL), ∃i. lab_i = prev, r_inp = π(%inp_i), r_res = π(%res), s = (F, prev, cur, pc, π, Free), s' = (F, prev, cur, pc + 1, π, Free), F = (@func, R, P, BL, BBs, Bm, ret) and s_A, M ⊢ (s', SL) --Inst--> (s'', SL), s'_A, then

s_A, M ⊢ (s, SL) --Inst--> (s'', SL), Set^ty_A(s'_A, r_res, Eval^ty_A(s_A, r_inp)).

Phi2: If Inst = Bm(cur)[pc] with Inst = (%res = phi ty [%inp1, %lab1] ... [%inpn, %labn]), Bm(cur)[pc + 1] ∉ Phi(R, BL), ∃i. lab_i = prev, r_inp = π(%inp_i), r_res = π(%res), s = (F, prev, cur, pc, π, Free), s' = (F, prev, cur, pc + 1, π, Free) and F = (@func, R, P, BL, BBs, Bm, ret), then

s_A, M ⊢ (s, SL) --Inst--> (s', SL), Set^ty_A(s_A, r_res, Eval^ty_A(s_A, r_inp)).

Figure 5: Transition rules for phi instructions.
Compare: If Inst = Bm(cur)[pc] with Inst = (%res = cmp cond ty %inp1, %inp2), γ_ty(cond) = (op, ...), r_1 = π(%inp1), r_2 = π(%inp2), s = (F, prev, cur, pc, π, Free), F = (@N, R, P, BL, BBs, Bm, ret) and [[ Eval^ty_A(s_A, r_1) op^ty Eval^ty_A(s_A, r_2) ]] s_A = s'_A, v, _, then

s_A, M ⊢ (s, SL) --Inst--> ((F, prev, cur, pc + 1, π, Free), SL), Set^ty_A(s'_A, π(%res), v).

Figure 6: Transition rules for comparison instructions.

Binary: If Inst = Bm(cur)[pc] with Inst = (%res = inst ty %inp1, %inp2), r_1 = π(%inp1), r_2 = π(%inp2), r_res = π(%res), s = (F, prev, cur, pc, π, Free), F = (@N, R, P, BL, BBs, Bm, ret) and v ∈ [[ Eval^ty_A(s_A, r_1) ◦(inst) Eval^ty_A(s_A, r_2) ]] s_A, then

s_A, M ⊢ (s, SL) --Inst--> ((F, prev, cur, pc + 1, π, Free), SL), Set^ty_A(s_A, r_res, v).

Call Function
If Inst = Bm(cur)[pc] with Inst = (%res = call ret @func(ty_1 %p1 ... ty_n %pn)), s = (F, prev, cur, pc, π_old, Free) and F' = (@func, R, P, BL, BBs, Bm, ret) ∈ M with P = {p_0, ..., p_{n−1}} and R = {r_1, ..., r_m}, then fresh registers are allocated by [s_i, g_i = mReg_A(s_{i−1}, r_i) ∧ π_i = π_{i−1}[r_i ↦ g_i]]_{i=1...m} (with s_0 = s_A), the arguments are evaluated as v_i = Eval^{ty_i}_A(s_A, π_old(%p_i)) and stored by [s_{m+i+1} = Set^{ty_i}_A(s_{m+i}, π_m(p_i), v_i)]_{i=0...n−1}, and, with s' = (F, prev, cur, pc + 1, π_old, Free),

s_A, M ⊢ (s, SL) --Inst--> ((F', init, init, 0, π_m, ∅), s' : SL), s_{m+n}.

NonDet: If Inst = Bm(cur)[pc] with Inst = (%res = lodin_nd ty), r_res = π(%res), s = (F, prev, cur, pc, π, Free), F = (@N, R, P, BL, BBs, Bm, ret), NonDet^ty_A(s_A) = (V, s'_A) and v ∈ V, then

s_A, M ⊢ (s, SL) --Inst--> ((F, prev, cur, pc + 1, π, Free), SL), Set^ty_A(s'_A, r_res, v).

Figure 7: Miscellaneous rules.

define void @stub () {
init:
  call void @N ()
  br label %loop
loop:
  br label %loop
}

LLVM-Listing 3: Stub function (stub_F) for instantiating an entry point F = (@N, R, P, BL, BBs, Bm, void).

Network of Processes
Let M = (F, E) be an LLVM module where F = {F_1, ..., F_n} with F_i = (@N_i, R_i, P_i, BL_i, BBs_i, Bm_i, ret_i) and E = {k_1, ..., k_m}, and let A = (S_A, s_init, dom_A, R, tt_A, ff_A) be a context. We define the transition system L^A_M = (N, n_0, −→_A), where a state n ∈ N is a tuple n = (s_1, s_2, ..., s_m, s_A, M) in which each s_i is a state of a process and s_A ∈ S_A. A state n = (s_1, ..., s_i, ..., s_m, s_A, M) may transit to a state n' = (s_1, ..., s'_i, ..., s_m, s'_A, M) via the ith component performing an instruction Inst if s_A, M ⊢ s_i --Inst--> s'_i, s'_A. We write this as n --Inst-->^i_A n'. The initial state n_0 is ((κ_1, ε), ..., (κ_m, ε), s_init, M), where κ_i = (stub_{F_{k_i}}, init, init, 0, π_i, ∅) and stub_{F_{k_i}} is the special stub function shown in LLVM-Listing 3.
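The interleaving in the definition above can be sketched as nondeterministic scheduling: any process that can step advances together with the shared context state. The following is a toy model of the successor relation, not Lodin's engine; the local `step` relation and state shapes are illustrative assumptions.

```python
def network_successors(state, step):
    """All successor states of (procs, ctx): any one process may move."""
    procs, ctx = state
    out = []
    for i, p in enumerate(procs):
        for p2, ctx2 in step(p, ctx):  # local small-step relation of process i
            out.append((procs[:i] + (p2,) + procs[i + 1:], ctx2))
    return out

# toy processes: each is a program counter 0..2; the context counts steps
step = lambda pc, ctx: [(pc + 1, ctx + 1)] if pc < 2 else []
succs = network_successors(((0, 0), 0), step)
assert len(succs) == 2                     # either process may move first
assert ((1, 0), 1) in succs and ((0, 1), 1) in succs
```

Exploring the whole transition system then amounts to iterating this successor function from the initial state.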
In the preceding section we developed the semantics of LLVM programs abstractly, i.e. we defined an "interface" to a context of the semantics, allowing different semantics to be obtained by changing the instantiation of this interface. In this section we develop two instantiations (E and S) of the interface. The resulting transition semantics for a module M, L^E_M (L^S_M), we call the explicit (symbolic) semantics.

Bitvectors
Let B = {0, 1}; a bitvector of width n is an element of B^n. Two special bitvectors are ~0_n = (0, 0, ..., 0) ∈ B^n and ~1_n = (1, 1, ..., 1) ∈ B^n. If b = (b_0, b_1, ..., b_{n−1}) ∈ B^n is a bitvector, then we can access individual bits by indexing into b, i.e. b[i] = b_i. We also allow extracting the sub-vector (b_i, ..., b_j) by b[i : j + 1]. If b = (b_0, b_1, ..., b_{n−1}) ∈ B^n, c = (c_0, ..., c_{i−1}) ∈ B^i, k ∈ {0, ..., n − 1} and k + i ≤ n, then we let

b[k : k + i / c] = (b_0, b_1, ..., b_{k−1}, c_0, ..., c_{i−1}, b_{k+i}, ..., b_{n−1}).

Let b = (b_0, b_1, ..., b_{n−1}) ∈ B^n be a bitvector; we can interpret it as either an unsigned or a signed integer. In the former case we use the standard binary encoding and define ⟨b⟩ = Σ_{i=0}^{n−1} b_i · 2^i. In the latter case we use 2's-complement encoding and let ⟨·b·⟩ = −b_{n−1} · 2^{n−1} + Σ_{i=0}^{n−2} b_i · 2^i. To encode a number k ∈ N in binary or 2's complement we write ⟨k⟩^{−1} and ⟨·k·⟩^{−1} respectively.

The classic bitwise operators and, or, xor and negation between vectors b_1, b_2 ∈ B^n are defined as usual and denoted (b_1 and b_2), (b_1 or b_2), (b_1 xor b_2) and (neg b_1) respectively. If b ∈ B^n is a bitvector, d ∈ N is a number and d < n, then we define the bit-shifting operations as

b lshl d = ~0_n[d : n / b[0 : n − d]],
b lshr d = ~0_n[0 : n − d / b[d : n]],
b ashr d = ~0_n[0 : n − d / b[d : n]] if b[n − 1] = 0, and ~1_n[0 : n − d / b[d : n]] if b[n − 1] = 1.

The lshl (lshr) operator is a logical left (right) bit shift, i.e. it shifts all bits to the left (right) and pads with zeros. The ashr operator is an arithmetic right shift where, instead of padding with zeros, the bitvector is padded with the original value of the most significant bit.

Memory Modelling
In the explicit semantics we model the memory state of a computer as a (possibly) infinite-length array of memory blocks. Memory blocks are tagged with their size and the actual content of the block. Formally, the memory state of a program is a function M : N → (N × (∪_{i∈N} B^i)) ∪ {⊥}. An entry M(i) = ⊥ means that block i of the memory has not been used. If M(i) = (k, b) and b ∈ B^k then we say that block i is consistent, has size k, and that b is the content of that block.

To modify and read from memory, we define the functions:

• new(M, i) = (M[n ↦ (i, ~0_i)], n) where n = min({g | M(g) = ⊥}),
• free(M, i) = M[i ↦ ⊥],
• read(M, b, f, len) = b'[f : f + len] where M(b) = (i, b') and f + len ≤ i,

Figure 8: Memory representation in
Lodin. Pointers are 64-bit integers split into a 32-bit base (block) and a 32-bit offset. Lodin uses a redirection table (M) that stores memory blocks; block indexes into this table, while offset indexes into the memory block itself. The symbol ⊥ indicates an entry in M is unused.

• write(M, b, f, c, len) = M[b ↦ b'[f : f + len / c]] where M(b) = (i, b') and f + len ≤ i.

The initial state of the memory is the function M_init where, for all i, M_init(i) = ⊥.

Given both a representation of the register values and the memory, we can now define the explicit context. In the explicit context, we assign to a type i_n the domain B^n, and any pointer type is assigned the domain B^64. Using a 64-bit bitvector for representing pointers allows us to use the 32 most significant bits for indexing into M and the 32 least significant bits for indexing into the actual block. For a pointer p ∈ B^64 we let block(p) = p[32 : 64] and offset(p) = p[0 : 32]. See Figure 8 for a graphical depiction of how this works.

Definition 7 (Explicit Context). The explicit context is the tuple E = (S_E, s^E_init, dom_E, N, tt_E, ff_E) where

• S_E = {(M, N, F) | M is a memory state ∧ N ⊂ N ∧ F : N → (∪_{i∈N} B^i) ∪ {⊥}},
• s^E_init = (M_init, ∅, F) where for all i, F(i) = ⊥,
• dom_E(t) = B^{8 · BSize(t)},
• ff_E = ⟨0⟩^{−1}.

Reg_E((M, N, F), %r) = ((M, N ∪ {i}, F), i) where i = min(ℕ \ N)

Eval^ty_E((M, N, F), i) = F(i) if F(i) ∈ dom_E(ty), and Error
otherwise.

alloc^ty_E((M, N, F)) = ((M', N, F), i) if ty = i_n and new(M, BSize(ty)) = (M', i)

free_E((M, N, F), i) = (free(M, k), N, F) if i ∈ B^64, k = ⟨i[32:64]⟩ is an allocated block and ⟨i[0:32]⟩ = 0, and Error otherwise

load^ty_E((M, N, F), i) = {((M, N, F), read(M, k, o, m))} if k = ⟨i[32:64]⟩ is allocated, dom_E(ty) = B^m, o = ⟨i[0:32]⟩ and o + m does not exceed the size of block k

For the comparison operators we show >^ty_uE and >^ty_sE below, and note that the remaining comparison operators are easily generalised from these. In the rules we let tt_E ∈ dom_E(i1) and require tt_E ≠ ff_E.

>^ty_uE(s, r_1, r_2) = (s, tt_E, ⊤) if ⟨r_1⟩ > ⟨r_2⟩, and (s, ff_E, ⊥) otherwise
>^ty_sE(s, r_1, r_2) = (s, tt_E, ⊤) if ⟨·r_1·⟩ > ⟨·r_2·⟩, and (s, ff_E, ⊥) otherwise

Remark 5. Instantiating a model with the explicit context as described so far results in a possibly infinite state space. As a result, an exhaustive enumeration of all possible states may not terminate.
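The block/offset pointer scheme of Figure 8 can be illustrated with a small sketch (hypothetical helper names, not Lodin's code): a pointer packs a 32-bit block index into its high bits and a 32-bit offset into its low bits, and a redirection table maps block indexes to block contents.

```python
class Memory:
    def __init__(self):
        self.table = {}   # redirection table: block index -> bytearray
        self.next = 1     # smallest unused block index

    def new(self, size):
        b = self.next
        self.next += 1
        self.table[b] = bytearray(size)
        return b << 32    # pointer: block in the 32 high bits, offset 0

    def free(self, ptr):
        assert ptr & 0xFFFFFFFF == 0   # only base pointers may be freed
        del self.table[ptr >> 32]

    def read(self, ptr, length):
        block, off = ptr >> 32, ptr & 0xFFFFFFFF
        buf = self.table[block]
        assert off + length <= len(buf)   # out-of-bounds check
        return bytes(buf[off:off + length])

    def write(self, ptr, data):
        block, off = ptr >> 32, ptr & 0xFFFFFFFF
        buf = self.table[block]
        assert off + len(data) <= len(buf)
        buf[off:off + len(data)] = data

m = Memory()
p = m.new(4)             # allocate a 4-byte block
m.write(p + 1, b"\x2a")  # adding to a pointer moves the offset (cf. PtrAdd)
assert m.read(p, 4) == b"\x00\x2a\x00\x00"
```

Note how the bounds check falls out of the representation: every access knows the size of the block it targets.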
We have already mentioned that an explicit representation of values in a program will explode (even without concurrency) in the presence of non-deterministic values. As an example, consider LLVM-Listing 4, which calls the function @error if and only if %2 is set to 5. It is easy for a human to realise that @error can be called, but a computer with an explicit representation has to enumerate all 2^32 possible values of %2. To combat this, Lodin provides a symbolic context representation. Instead of representing values explicitly, the symbolic context gathers all operations performed during exploration into one large logical formula, known as the path formula, that can then be passed to an SMT solver. The SMT solver can determine whether the formula is satisfiable and thus whether the explored path is feasible.
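As a toy illustration of the path-formula idea (hand-written Python, not Lodin's encoding), the constraints collected along the path of LLVM-Listing 4 that reaches @error reduce to a single condition on the non-deterministic input. A solver asks whether some input satisfies that condition; here we mimic the question by brute force over a small range, which is exactly the enumeration an SMT solver avoids.

```python
def path_condition(nd):
    # constraints collected along init -> %call in LLVM-Listing 4,
    # with the lodin_nd result kept symbolic as `nd`:
    mem = nd           # store i32 %2 ; %3 = load i32
    return mem == 5    # %4 = icmp eq i32 %3, 5 ; br i1 %4 taken to %call

# feasibility: does some input make the path condition true?
assert any(path_condition(v) for v in range(256))
assert [v for v in range(256) if path_condition(v)] == [5]
```

A satisfying assignment (here the single value 5) doubles as a concrete witness input driving the program to @error.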
An SMT instance is principally a first-order logic formula where some predicates and functions have special interpretations. These special interpretations are encapsulated into what are called theories. An SMT instance over a theory T can be determined to be satisfiable or unsatisfiable by an SMT solver supporting T. We will not invest too much time here in discussing how SMT solvers work, but will rather informally present the theories we need.

define void @main () {
init:
  %1 = alloca i32 , align 4
  %2 = lodin_nd i32
  store i32 %2, i32* %1, align 4
  %3 = load i32 , i32* %1, align 4
  %4 = icmp eq i32 %3, 5
  br i1 %4, label %call , label %done
call:                                  ; preds = %init
  call void (...) @error ()
  br label %done
done:                                  ; preds = %call , %init
  ret void
}

LLVM-Listing 4: Example of why a symbolic representation is necessary.

Theory of Bitvectors
In the theory of bitvectors, variables are given a bitvector type i_n. The operations that can be performed on bitvectors are

• the classic bitwise operations, i.e. and, or, neg, xor, lshl, lshr and ashr,
• arithmetic operations (modulo 2^n), i.e. add, sub, div_u, div_s, mul, rem_u, rem_s; as in the LLVM discussion we need both signed and unsigned versions of some operations (indexed by u and s),
• comparisons, e.g. = and ≤,
• boolean operations, e.g. ∧, ∨, ¬,
• concatenation of bitvectors, ◦.

Note we reuse the type name from
LLVM.

The arithmetic operations of the explicit context are defined as follows (division, remainder or shift with an out-of-range second operand yields an arbitrary value):

+^ty_E(S, b_1, b_2) = {⟨⟨b_1⟩ + ⟨b_2⟩⟩^{−1}} if ⟨b_1⟩ + ⟨b_2⟩ ≤ ⟨~1_m⟩, and {⟨(⟨b_1⟩ + ⟨b_2⟩) mod 2^m⟩^{−1}} otherwise
−^ty_E(S, b_1, b_2) = {⟨k_1 + k_2⟩^{−1}} if k_1 = ⟨b_1⟩, k_2 = ⟨neg b_2⟩ + 1 and k_1 + k_2 ≤ ⟨~1_m⟩, and {⟨(k_1 + k_2) mod 2^m⟩^{−1}} otherwise
·^ty_E(S, b_1, b_2) = {⟨⟨b_1⟩ · ⟨b_2⟩⟩^{−1}} if ⟨b_1⟩ · ⟨b_2⟩ ≤ ⟨~1_m⟩, and {⟨(⟨b_1⟩ · ⟨b_2⟩) mod 2^m⟩^{−1}} otherwise
/^ty_uE(S, b_1, b_2) = {⟨⌊⟨b_1⟩ / ⟨b_2⟩⌋⟩^{−1}} if ⟨b_2⟩ ≠ 0, and {c | c ∈ B^m} otherwise
/^ty_sE(S, b_1, b_2) = {⟨· trunc(⟨·b_1·⟩ / ⟨·b_2·⟩) ·⟩^{−1}} if ⟨·b_2·⟩ ≠ 0, and {c | c ∈ B^m} otherwise
%^ty_uE(S, b_1, b_2) = {⟨⟨b_1⟩ − ⟨b_2⟩ · ⌊⟨b_1⟩ / ⟨b_2⟩⌋⟩^{−1}} if ⟨b_2⟩ ≠ 0, and {c | c ∈ B^m} otherwise
%^ty_sE(S, b_1, b_2) = {⟨· ⟨·b_1·⟩ − ⟨·b_2·⟩ · trunc(⟨·b_1·⟩ / ⟨·b_2·⟩) ·⟩^{−1}} if ⟨·b_2·⟩ ≠ 0, and {c | c ∈ B^m} otherwise
<<^ty_E(S, b_1, b_2) = {b_1 lshl ⟨b_2⟩} if ⟨b_2⟩ < m, and {c | c ∈ B^m} otherwise
PtrAdd_E(b, k) = b' where block(b') = block(b) and offset(b') = offset(b) + k

We reuse the operations from our discussion of bitvectors in subsection 3.1, and require that the SMT solver implements the semantics of the operations as described there. Likewise, we write constant bitvectors using the notation from subsection 3.1.

Theory of Arrays

In this theory an array is a mapping between elements. Elements of an array can be read using a select function, and an element stored in an array using a store function. We introduce the array type {i_n} → {i_m}, mapping elements from i_n to i_m. If v : {i_n} → {i_m}, v_1 : i_n and v_2 : i_m, then we write store(v, v_1, v_2) to create a new array that is equal to v except that v_1 now maps to the value of v_2. We write v_2 = select(v, v_1) to set v_2 equal to the value kept at position v_1.

In the following we use V to denote an infinite set of SMT variables. We also use the restricted sets V_ty = {v ∈ V | v : ty}.
Similarly, we refer by W to all SMT expressions over V, and by W_ty to all SMT expressions with type ty.

The Symbolic Context

The symbolic context in Lodin maps its register variables to SMT variables and uses a so-called path formula to capture all constraints (assignments and comparisons) encountered during a program execution. Memory is represented using an SMT array, and an SMT variable points to the first place in memory that is free for allocation.

Definition 8 (Symbolic Context). The symbolic context for the symbolic semantics is the tuple S = (S_S, s_init, dom_S, N, tt_S, ff_S) where

• S_S are tuples (v_M, v_f, N, F, ψ, used) where
  – v_M : {i64} → {i8} is an array representing the memory state of the program,
  – v_f : i64 is a pointer into memory,
  – N ⊆ N is a set of used register variables,
  – F : N → V ∪ {⊥},
  – ψ is an SMT formula, the path formula, encoding the constraints that an explored path has to satisfy, and
  – used ⊆ V is a set of used SMT variables.
• s_init = (v_M, 0, ∅, F, ff_S == ff_S, ∅) where for all n ∈ N, F(n) = ⊥,
• dom_S(i_n) = W_{i_n}, dom_S(ty∗) = W_{i64}, and dom_S(⟨ty_1, ..., ty_n⟩) = W_{8 · BSize(⟨ty_1, ..., ty_n⟩)},
• ff_S = ~0.

The arithmetic instructions (e.g. +^ty_S(s, v_1, v_2)) that we need to implement for the context are straightforward to represent: all we need to do is create an SMT expression corresponding to the operation. Below we give a generalised definition of the rule:

∼^ty_S((v_M, v_f, N, F, ψ, used), v_1, v_2) = v_1 SMTOp v_2

For the mapping between ∼^ty_S and SMTOp we refer to Table 3.

The comparison operators are very similar to the binary operators; below we provide an example for the >^ty_uS(s, v_1, v_2) function where s = (v_M, v_f, N, F, ψ, used):

>^ty_uS(s, v_1, v_2) = ((v_M, v_f, N, F, ψ ∧ (v_1 >_u v_2), used), v_1 >_u v_2, ⊤)

For the remainder of the operations we refer the reader to Figure 11 and Figure 12.

Example 1.
We briefly return to the module (M) in LLVM-Listing 4 and consider how we can use the symbolic representation of Lodin to determine whether the function @error can be called. We simply instantiate the symbolic transition system L^S_M = (N, n^S_0, −→_S) and generate symbolic states from n^S_0 until we reach a state n_f = (s : s_1 ··· : ε, s_S, M)

Reg_S((v_M, v_f, N, F, ψ, used), %r) = ((v_M, v_f, N ∪ {i}, F, ψ, used), i) where i = min(ℕ \ N)
Eval^ty_S((v_M, v_f, N, F, ψ, used), i) = F(i) if F(i) ∈ dom_S(ty), and Error otherwise
Set^ty_S((v_M, v_f, N, F, ψ, used), l, v) = (v_M, v_f, N, F, ψ ∧ (F(l) = v), used) if l ∈ N ∧ v ∈ dom_S(ty), and Error otherwise
alloc^ty_S((v_M, v_f, N, F, ψ, used)) = ((v_M, v_f', N, F, ψ ∧ (v_f' = v_f add n), used ∪ {v_f'}), v_f) if ty = i_n
PtrAdd_S(v_b, k) = v_b add k

Figure 11: Evaluating and setting registers in the symbolic context.

load^{i_n}_S((v_M, v_f, N, F, ψ, used), i) = SymbLoad_{i_n}(v_M, F(i))
SymbLoad_{i_n}(v_M, v_a) = select(v_M, v_a) if i_n = i8, and select(v_M, v_a) ◦ SymbLoad_{i_{n−8}}(v_M, v_a add 1) otherwise
store^ty_S((v_M, v_f, N, F, ψ, used), v_v, v_p) = (v_M', v_f, N, F, ψ ∧ (v_M' = SymbStore_ty(v_M, v_v, v_p)), used ∪ {v_M'})
SymbStore_{i_n}(v_M, v_v, v_p) = store(v_M, v_p, v_v) if i_n = i8, and SymbStore_{i_{n−8}}(store(v_M, v_p, v_v[0 : 8]), v_p add 1, v_v[8 : n]) otherwise

Figure 12: Store and load operations in the symbolic context.

where s = (@main, prev, call, pc, π, Free) and s_S = (v_M, v_f, N, F, ψ, used). Reaching n_f reveals that there is a path in the control flow graph of @main that reaches the call block (and thereby the call instruction), but not that it is feasible. To ensure feasibility, we invoke an SMT solver and check whether ψ is satisfiable.
If this is the case, we can read the values of all registers used along that path from the satisfying assignment of the SMT solver.

Remark 7. The symbolic context assigns each register of an LLVM program a single SMT variable, and gathers constraints over these SMT variables in a path formula. Assignments to LLVM registers are captured by equalities between SMT variables and SMT expressions. As a result, the symbolic context does not support assigning to the same register multiple times; it is thus only applicable to programs without loops in their control flow graph.

Merging Symbolic States

It is often convenient to merge symbolic context states into one state. This allows exploring several computational paths simultaneously and helps combat the path-explosion problem, which is a big problem for symbolic execution engines such as Klee.

∼^ty_S    SMTOp
+^ty_S    add
−^ty_S    sub
·^ty_S    mul
/^ty_uS   div_u
/^ty_sS   div_s
%^ty_uS   rem_u
%^ty_sS   rem_s
<<^ty_S   lshl
>>^ty_lS  lshr
>>^ty_aS  ashr
&^ty_S    and
|^ty_S    or
⊕^ty_S    xor

Table 3: Mapping between semantic operators and SMT operators.

For merging context states s_S = (v_M, v_f, N, F, ψ, used) and s'_S = (v'_M, v'_f, N', F', ψ', used'), where for all n ∈ N ∩ N' it is the case that F(n) = F'(n), we introduce the function merge : S_S × S_S → S_S defined as

merge(s_S, s'_S) = (v''_M, v''_f, N ∪ N', F'', (ψ ∨ ψ') ∧ ψ'' ∧ ψ''', used ∪ used' ∪ {v''_M, v''_f, v_P})

where
• F''(n) = F(n) if n ∈ N, and F'(n) if n ∈ N',
• v''_M, v''_f, v_P ∉ used ∪ used',
• ψ'' = (v''_M = ite(v_P, v_M, v'_M)),
• ψ''' = (v''_f = ite(v_P, v_f, v'_f)).

Here v_P, with type i1, is a fresh SMT variable, and ite(v_P, v, v') evaluates to v' if v_P = ff_S and to v otherwise.

Model Checking [1, 8] is a technique widely used in academia for validating that a formal model of a program behaves correctly, according to a specification given by a logical formula.
A basic specification is a reachability specification, where we are interested in finding a state where a given proposition is true. This is the main focus in Lodin, and thus we limit our discussion to this setting. (We define the exact propositions of Lodin in a short while.)

At the core of any reachability checking algorithm is a transition system to search and a set of atomic propositions. In the case of Lodin, the state space we search is L^E_M = (N, n_0, −→_E). Atomic propositions of a program are elements that may be true or false in a state (for instance whether x == 5 holds, or whether a state has a DataRace). An interpretation (over states N) of an atomic proposition p is a function P_p : N → {tt, ff}, where tt indicates p is true and ff that it is false. Atomic propositions may be combined with the classical boolean operators ∧, ∨ and ¬. The interpretations of these combined propositions are defined recursively as

• P_{ψ1 ∧ ψ2}(n) = P_{ψ1}(n) ∧ P_{ψ2}(n),
• P_{ψ1 ∨ ψ2}(n) = P_{ψ1}(n) ∨ P_{ψ2}(n),
• P_{¬ψ}(n) = ¬P_ψ(n),

where ψ1 and ψ2 are combined propositions themselves. Checking reachability for the proposition ψ is now to check whether we can, from the initial state, reach a state n where P_ψ(n) = tt. The classical approach for such a search is the fix-point algorithm in Algorithm 1.

Data: Property φ
Data: Initial state n_0
Result: ⊤ or ⊥
Passed := ∅;
Waiting := {n_0};
while Waiting ≠ ∅ do
    Let n_c ∈ Waiting;
    Waiting := Waiting \ {n_c};
    if P_φ(n_c) then
        return ⊤
    end
    Passed := Passed ∪ {n_c};
    Waiting := Waiting ∪ {n' | ∃ i, Inst s.t. n_c --Inst-->^i_E n'};
    Waiting := Waiting \ Passed
end
return ⊥

Algorithm 1: The classic reachability algorithm. States that have been found but not yet explored are kept in the set Waiting, and states that have already been processed are kept in Passed.

For a finite-state system, Algorithm 1 obviously terminates, as Passed eventually contains the entire reachable state space, after which no further states can be put into Waiting and Waiting therefore eventually becomes ∅. It is equally straightforward to realise that Algorithm 1 produces correct results. Algorithm 1 is non-deterministic in selecting an element from Waiting and in generating the successors of the currently considered state. The latter can easily be determinised by generating states in a fixed order, while the former can be determinised in different ways: the two usual ones are to keep the elements of Waiting in a stack or in a queue and let the order induced by these define the search order.

Remark 8. As mentioned earlier, the explicit state space may in fact be infinite, thus Algorithm 1 may not terminate. In Lodin we have added options for terminating any verification after a user-defined time or after a user-defined amount of memory has been used.

LLVM Propositions

Lodin has support for propositions specifying classic programming errors (division by zero, data races, out-of-bounds errors, etc.). Furthermore, it is possible to compare registers and to check whether a specific function is called by a process. The use case for the latter is that the user can modify the verified program to call an error function and check if that function is called.

⟨Prop⟩ ::= ⟨Compare⟩ | ⟨Simple⟩
⟨Compare⟩ ::= ( ⟨Comparand⟩ ⟨OP⟩ ⟨Comparand⟩ )
⟨Comparand⟩ ::= ⟨Number⟩ | ⟨Register⟩
⟨Number⟩ ::= ⟨Integer⟩ ; ⟨Type⟩
⟨Register⟩ ::= @⟨Integer⟩.⟨String⟩.%⟨String⟩ ; ⟨Type⟩
⟨Type⟩ ::= ⟨us⟩8 | ⟨us⟩16 | ⟨us⟩32 | ⟨us⟩64
⟨us⟩ ::= ui | si
⟨OP⟩ ::= < | <= | >= | > | == | !=
⟨Simple⟩ ::= DataRace | DivZero | OverFlows | [ ⟨Integer⟩.⟨String⟩ ]

Figure 13: Grammar generating the verification queries of Lodin.
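The recursive interpretation of combined propositions can be sketched directly; the following toy evaluator (illustrative, not Lodin's query engine) represents atomic propositions as callables over a state and combinations as nested tuples.

```python
def interpret(prop, state):
    """Evaluate a (possibly combined) proposition over a state.

    Atomic propositions are callables; combinations are nested tuples
    ('and', p, q), ('or', p, q) and ('not', p).
    """
    if callable(prop):
        return prop(state)
    op = prop[0]
    if op == 'and':
        return interpret(prop[1], state) and interpret(prop[2], state)
    if op == 'or':
        return interpret(prop[1], state) or interpret(prop[2], state)
    if op == 'not':
        return not interpret(prop[1], state)
    raise ValueError(op)

# e.g. "x == 5 and no data race", over toy dictionary states
x_is_5 = lambda s: s['x'] == 5
race = lambda s: s['race']
prop = ('and', x_is_5, ('not', race))
assert interpret(prop, {'x': 5, 'race': False})
assert not interpret(prop, {'x': 5, 'race': True})
```

Reachability checking then asks whether any reachable state makes `interpret` return true.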
(This modification could even be done at compile time, by replacing the implementation of the commonly used assert function.) In Lodin's propositional language, registers and numbers are typed as signed or unsigned bitvectors with the suffixes ui_n and si_n, where n ∈ {8, 16, 32, 64}. For any production rule R in Figure 13 we write Ψ(R) for the language generated by that rule. An expression like @0.main.%tmp3;ui32 == 3;ui32 means: take register %tmp3 in the function @main of the 0th process, interpret it as an unsigned 32-bit integer, and compare it for equality with 3, also interpreted as an unsigned 32-bit integer. For comparisons to make sense, the two expressions being compared must naturally have the same type.

For evaluating the value of a register in a state (n = (s_1, ..., s_n, s_E, M)), we define

A_{@k.F.%tmp;ui_n}(n) = ⟨Eval^{i_n}_E(s_E, r)⟩ if s_k = ((@F, prev, cur, pc, π, Free), SL), tmp : i_n and r = π(tmp), and ⟨~0_n⟩ otherwise

A_{@k.F.%tmp;si_n}(n) = ⟨·Eval^{i_n}_E(s_E, r)·⟩ if s_k = ((@F, prev, cur, pc, π, Free), SL), tmp : i_n and r = π(tmp), and ⟨·~0_n·⟩ otherwise

For a number (e.g. 3;ui32) we write A_{3;ui32}(n), with the obvious implementation. Given these notations, we define how propositions are evaluated within Lodin in Figure 14. A short discussion may be in order about the evaluations in Figure 14.

• Division by zero (DivZero) is determined in the obvious manner: we simply check whether any process executes an instruction involving a division and whether the second operand is zero.
• Buffer overflows (OverFlows) are likewise easily checked: for each process that accesses memory, we check whether its read/write exceeds the length of the buffer it is reading from or writing into.
• For checking whether a specific process number i can call a function func ([i.
func]), we first check whether process i performs a call instruction and, if so, whether the function being called matches func.
• The most difficult proposition to check is without a doubt DataRace. For evaluating it, we iterate over all processes and find pairs of read/write or write/write accesses to the same pointer base. Afterwards we check whether their offset + length ranges overlap.

Example 2. As a short example of using Lodin for reachability checking, let us consider LLVM-Listing 1 and ask whether %x and %z can ever be equal. Notice that since all phi instructions should be executed atomically at the beginning of a block, this should never be possible, and checking it with Lodin actually

P_{A_1 ⊲⊳ A_2}(n) = A_{A_1}(n) ⊲⊳ A_{A_2}(n)

P_DivZero(n) = ⊤ if ∃ s_i = ((F, prev, cur, pc, π, Free), SL) with F = (@F, R, P, BL, BBs, Bm, ret), Bm(cur)[pc] = %res = DIV ty %inp1, %inp2 for DIV ∈ {udiv, sdiv, urem, srem}, r = π(%inp2) and ⟨Eval^ty_E(s_E, r)⟩ = 0, and ⊥ otherwise

P_OverFlows(n) = ⊤ if ∃ s_i = ((F, prev, cur, pc, π, Free), SL) with F = (@F, R, P, BL, BBs, Bm, ret), Bm(cur)[pc] = store ty %inp1, ty* %inp2, r = π(%inp2), π(%inp1) ∈ B^l, (len, v) = M(block(r)) and offset(r) + l > len; ⊤ if the same premises hold except that M(block(r)) = ⊥; and ⊥ otherwise

P_{[i.func]}(n) = ⊤ if s_i = ((F, prev, cur, pc, π, Free), SL) with F = (@F, R, P, BL, BBs, Bm, ret) and Bm(cur)[pc] = %res = call ret @func(ty_1 %p1 ...
ty_n %pn), and ⊥ otherwise

P_DataRace(n) = ⊤ if there exist s_i = ((F_i, prev_i, cur_i, pc_i, π_i, Free_i), SL_i) with F_i = (@F_i, R_i, P_i, BL_i, BBs_i, Bm_i, ret_i), Bm_i(cur_i)[pc_i] = %res = load ty_i, ty_i* ptr_i and p_i = Eval^{ty_i*}_E(s_E, π_i(ptr_i)), and s_j = ((F_j, prev_j, cur_j, pc_j, π_j, Free_j), SL_j) with F_j = (@F_j, R_j, P_j, BL_j, BBs_j, Bm_j, ret_j), Bm_j(cur_j)[pc_j] = store ty_j val_j, ty_j* ptr_j and p_j = Eval^{ty_j*}_E(s_E, π_j(ptr_j)), such that block(p_i) = block(p_j) and {offset(p_i), ..., offset(p_i) + BSize(ty_i)} ∩ {offset(p_j), ..., offset(p_j) + BSize(ty_j)} ≠ ∅, and ⊥ otherwise

Figure 14: Evaluation of propositions in Lodin, where A_1, A_2 ∈ Ψ(Register) ∪ Ψ(Number), ⊲⊳ ∈ Ψ(OP), n = (s_1, ..., s_n, s_E, M) and s_E = (M, N, F). For OverFlows we have only shown the rule for overflows at writes, but naturally there is an equivalent rule for reads.

Lodin example.ll example2.q
Lodin 0.3 (Jul 8 2019)
Revision : 0.2-802-ga42644cf
Importance Ratio: double
LLVM: 8.0.0
LLVM module modifications:
  Remove Unused instructions
Warning : No entry-point specified. Assuming main.
Random seed: 1562587068
System : NaiveGraph-explicit
Platform : PThread
Storage : SharedMem Storage
Successor: Standard
Prob-Successor: Standard
Passed-Waiting : Standard
SMT-Backend : Boolector 3.0.0
Verifying: E<>((0.main.b == 1))
Warning : Casting register main.b to integer type UI8 - can't guarantee LLVM uses this register as such
Not Satisfied

Lodin-Output 1: Output from Lodin.

checks whether Lodin implements the phi instruction behaviour correctly. In Lodin we can check the property by asking the query E<> (@0.main.%x;ui32 == @0.main.%z;ui32). Unfortunately, Lodin reports that this is indeed possible, even though it should not be. There is a logical explanation: both registers are initialised by Lodin to 0, thus in the initial state they are equal.
For this reason, it is more reasonable to use the %b register for our check; thus we check the query E<> (@0.main.%b;ui8 == 1;ui8) and get the result in Lodin-Output 1, indicating it is indeed not possible.

A well-known problem for explicit-state reachability checking of parallel systems is the notorious state-space explosion problem, i.e. that the combined state space grows exponentially while each process of the system grows linearly. This is a huge problem when considering high-level programs, and it is exacerbated when using LLVM as input, because LLVM programs have more instructions per process. To make explicit-state reachability checking feasible, we thus need ways of limiting the size of the state space. A first realisation towards reducing the state space is that processes can only influence each other's behaviour at predefined points, namely when accessing memory. Because our specification language allows querying whether functions can be called, we also consider call instructions to affect the external behaviour of a process. We say that an instruction Inst is internal if it is not a load, store or call instruction. We denote the set of all internal instructions by Internal(R). In the following we describe the two state-space reductions implemented inside Lodin. They both define a new transition relation that can directly replace −→_E.

e: Our first state-space reduction is based on the idea that when a process performs a transition step, it will also perform all following transitions that execute internal instructions. More formally, we replace the transition relation −→_E with −→_e, where −→_e is defined according to the rule

[n_{k−1} --Inst_k-->^i_E n_k]_{k=1...n}  implies  n_0 --Inst_1...Inst_n-->^i_e n_n,  provided ∀k > 1, Inst_k ∈ Internal(R).

Notice that there is no lower bound on the length of the sequence Inst_1, ..., Inst_n. To achieve the largest reduction, Lodin always uses the longest possible sequence.
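The e-reduction can be sketched as a collapsed successor function: take one arbitrary step, then greedily follow internal steps as long as any exist. This toy version (not Lodin's implementation) assumes, for simplicity, that each state has at most one internal continuation.

```python
def collapsed_successors(state, successors):
    """-->_e: one step, then the longest run of internal steps.

    `successors(s)` yields pairs (next_state, is_internal).
    """
    out = set()
    for nxt, _ in successors(state):       # first step may be anything
        while True:
            follow = [m for m, internal in successors(nxt) if internal]
            if not follow:
                break
            nxt = follow[0]                # assume a deterministic internal chain
        out.add(nxt)
    return out

# toy chain 0 -> 1 -> 2 -> 3 -> 4 -> 5 where only the step into 3 is visible
succ = lambda s: [(s + 1, s + 1 != 3)] if s < 5 else []
assert collapsed_successors(0, succ) == {2}   # stop before the visible step
assert collapsed_successors(2, succ) == {5}   # visible step, then internal run
```

Only the states at the ends of internal runs are stored, which is where the reduction in Table 4 comes from.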
I: In this state-space reduction, all processes that perform internal instructions execute simultaneously, while all other processes execute independently. The transition relation −→_I is defined by two rules:

[n_{k−1} --Inst_k-->^{i_k}_E n_k]_{k=1...n}  implies  n_0 --Inst_1...Inst_n-->^{i_1,...,i_n}_I n_n,  provided ∀k, Inst_k ∈ Internal(R)

n --Inst-->^i_E n'  implies  n --Inst-->^i_I n',  provided Inst ∉ Internal(R)

Figure 15: Lodin state-space reductions ((a) E, (b) e, (c) I). Transitions going left originate from one process while transitions going to the right correspond to another. Dashed arrows indicate visible actions.

In Figure 15 we provide a graphical overview of how these reductions modify the state space.

Example 3. As an example of the reductions that I and e respectively achieve, consider the C program in Figure 16, which executes Peterson's mutual exclusion algorithm. To use this program with Lodin, it must first be compiled to an .ll file using clang:

clang -S -c -emit-llvm file.c

After this step we can inspect the state-space reductions achieved by asking Lodin the query EnumStates on the resulting .ll file under the different state-space reductions.

State Generator | States | DataRace States
E | |
e | |
I | |

Table 4: Number of states reported by EnumStates under the different state-space reductions.

In Table 4 we see the reported number of states, along with how many states with data races were encountered. Notice that I in this case achieves the largest reduction.

Although the above state-space reductions can dramatically reduce the state space due to interleavings, they cannot reduce the number of states caused by non-deterministic input. A program with just one non-deterministic 32-bit value will end up having over 2^32 states.

In the preceding section we saw how Lodin can be used to perform an exhaustive state-space search under an explicit context. We also realised that the state-space explosion problem poses a problem for any exhaustive search, and showed how Lodin can reduce this explosion through state-space reductions.
The state space reductions also have their limits, so we need other strategies for handling this explosion. Lodin proposes to use a simulation-based technique, where random (step-bounded) traces are drawn from the program and inspected for satisfaction of the property at hand. At the heart of any simulation-based technique is an underlying simulation distribution. The simulation distribution may stem from actual knowledge of how the system behaves, in which case simulations can be used to calculate actual probabilities of the system satisfying the property using statistical methods; hence the name statistical model checking [21]. In case the simulation distribution is "arbitrary", the estimated probabilities are meaningless for the system itself, but serve as a way to predict how likely it is that a continued search will find the property searched for. In this case the technique is called Monte Carlo model checking.

    int flags[2] = {0, 0};
    int turn = 0;

    void crit() {}

    typedef struct {
        int *mflag;
        int *oflag;
        int *turn;
    } Options;

    void* petersons1() {
        Options opt;
        opt.mflag = &flags[0];
        opt.oflag = &flags[1];
        opt.turn  = &turn;
        *(opt.mflag) = 1;
        *(opt.turn)  = 1;
        while (*(opt.oflag) && *(opt.turn) == 1) {
            // busy wait
        }
        // critical section
        crit();
        // end of critical section
        *(opt.mflag) = 0;
        return 0;
    }

    void* petersons2() {
        Options opt;
        opt.mflag = &flags[1];
        opt.oflag = &flags[0];
        opt.turn  = &turn;
        *(opt.mflag) = 1;
        *(opt.turn)  = 0;
        while (*(opt.oflag) && *(opt.turn) == 0) {
            // busy wait
        }
        // critical section
        crit();
        // end of critical section
        *(opt.mflag) = 0;
        return 0;
    }

Figure 16: Peterson's mutual exclusion protocol.

In Lodin each state n of the state space L_E^M = (N, n_0, −→E) is assigned a probability distribution γ_n : ℕ → [0, 1] over process identifiers. γ_n should obviously only assign probability mass to a process if that process can perform a transition; thus we require that γ_n(i) ≠ 0 implies n −Inst→_i^E n′ for some n′.
Having selected the process i that should perform an action, we also need a probability function for the result of that choice. We do this by assuming a δ_{n,i} : N → [0, 1], where N is the set of all states. The requirement on this function is that it should only assign probability to states that can be reached by the i-th process performing a transition from n, i.e. δ_{n,i}(n′) ≠ 0 implies n −Inst→_i^E n′ for some instruction Inst.

Given these two probability mass functions, the probability that the system generates the finite transition sequence ω = n_0 −Inst_1→_{i_1}^E n_1 −Inst_2→_{i_2}^E ... −Inst_n→_{i_n}^E n_n, where n_0 is the initial state, is given by

    P(ω) = Π_{k=1}^{n} γ_{n_{k−1}}(i_k) · δ_{n_{k−1},i_k}(n_k).

For such a transition sequence ω we let |ω| = n be its length and ω[i] = n_i. We also let Ω_{m,M} be the set of all transition sequences ω with |ω| = m of LLVM module M. Let p be a proposition and ω ∈ Ω_{m,M}; then we define the indicator function I_p(ω) to be 1 if there exists an i such that P_p(ω[i]) = tt, i.e. ω at some point satisfies p, and 0 otherwise. With this at hand, we define the probability that an execution trace of a program M satisfies a proposition p within m steps as

    Pr_{M,m}(p) = Σ_{ω ∈ Ω_{m,M}} I_p(ω) · P(ω).

As the probability only depends on the states, we usually project out the transitions and only generate the states. An algorithm for generating a sequence of states from n_0 according to the probability distribution can be seen in Algorithm 2. In the algorithm we use k ∼ P to mean that k is distributed according to the probability mass function P.

    Data: Initial state: n_0
    Data: Length: n
    ω = n_0;
    for i ∈ {1, ..., n} do
        k ∼ γ_{n_{i−1}};
        n_i ∼ δ_{n_{i−1},k};
        ω = ω n_i;
    end
    return ω
Algorithm 2: Generating random traces in Lodin.

    n       States   Data-race states
    1       77       1
    100     1840     4
    1000    3579     11
    10000   4714     14
Table 5: States encountered with SMC. The used query is EnumStatesSMC <=5000 n.

Example 4.
Before dwelling on how to use simulation for verification, let us briefly consider what kind of coverage of the state space we can expect from simulations. To this end, we have implemented the query EnumStatesSMC <=l n. This query simply generates n traces, each of length l, and keeps track of how many different states it has visited in total. We show the results of running this query on the program of Figure 16 in Table 5; recall the total number of states reported by the exhaustive search previously.

Statistical model checking tries to answer two questions: 1. a quantitative one, "What is the probability θ of reaching p?", and 2. a qualitative one, "Is the probability θ greater than θ_t?". Both questions are answered by generating a number of samples and using statistical techniques to infer the answer with a user-specified confidence.

Quantitative. Here we repeatedly generate runs and construct an interval [θ_l, θ_u] in which we are confident that the probability θ is contained. For the following we assume we are provided with ε, the wanted width of the interval, and an α ∈ [0, 1] indicating the confidence (1 − α) we want in the interval.

Consider that we have generated a sequence of samples ω_1, ω_2, ..., and let x_1, ..., x_m be random variables such that x_i = I_p(ω_i). Then each variable x_i has a Bernoulli distribution with success probability θ, and the sum X_m = Σ_{i=1}^{m} x_i is binomially distributed. We construct a confidence interval using the exact confidence interval by Clopper and Pearson [10]: if we have m samples, then a Clopper-Pearson interval with confidence 1 − α is given as the intersection S≤ ∩ S≥, where

    S≤ = { ψ | B_{m,ψ}(X_m) > α/2 }
    S≥ = { ψ | 1 − B_{m,ψ}(X_m) > α/2 }

and B_{m,ψ} is the cumulative distribution function for a binomial distribution with m samples and success parameter ψ.
Notice that we are not in control of the resulting width of this interval; more samples will, however, shrink the width, and thus we simply produce samples iteratively until we get the desired width ε.

Example 5. Let us consider the program in Figure 16 again and assess the probability that a data race is encountered. We can assess this with the query Pr[<=5000] (<> DataRace), where 5000 is the length of the runs. See Lodin-Output 2 for the output.

    Lodin 0.3 (Jul 8 2019)
    Revision: 0.2-802-ga42644cf
    Importance Ratio: double
    LLVM: 8.0.0
    LLVM module modifications:
      Remove Unused instructions
    Warning: Function signature of entry point petersons1 (Pointer ())
             does not match on return type by platform (UI32 ())
    Warning: Function signature of entry point petersons2 (Pointer ())
             does not match on return type by platform (UI32 ())
    Random seed: 1562589004
    System: NaiveGraph-explicit
    Platform: PThread
    Storage: SharedMem Storage
    Successor: Standard
    Prob-Successor: Standard
    Passed-Waiting: Standard
    SMT-Backend: Boolector 3.0.0
    Verifying: Pr[<=5000](<>DataRace)
    Result: [0.285738, 0.295738] with confidence 0.95
    Total Runs: 31883, Satisfying Runs: 9269
    Histogram: Satisfying Runs
    Max Frequency: 0.504262
    Values in [28, 103] in steps of 1
    [4674, 2057, 0, 0, 0, 0, 0, 0, 0, 480, 0, 0, 0, 0, 0, 0, 0, 615,
     0, 0, 0, 0, 84, 0, 0, 0, 0, 0, 0, 0, 20, 0, 0, 0, 0, 1167, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 152, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 18, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
Lodin-Output 2: Lodin output for the query Pr[<=5000] (<> DataRace).

From the output we can see that Lodin estimates the probability to lie in the interval [0.285738, 0.295738]. The last part provides a histogram over the lengths of the satisfying runs. Lodin runs by default with α = 0.05 and ε = 0.01. These parameters can be tweaked by suffixing the query with { Alpha = Float, Epsilon = Float }, where Float are numbers in [0, 1]. Running the query Pr[<=5000] (<> DataRace) { Alpha = 0.01, Epsilon = 0.05 } for instance gives an interval of width 0.05 with confidence 0.99.

Qualitative.
Checking whether the probability Pr_{M,m}(p) exceeds a threshold θ can be answered by hypothesis testing. We test the hypothesis H_0 : Pr_{M,m}(p) ≥ θ against H_1 : Pr_{M,m}(p) < θ. In advance, we define two parameters, α (significance level) and β (power level), that signify how willing we are to reject a true hypothesis and how willing we are to accept a false hypothesis. In practice we want a test for which the probability of rejecting H_0 while H_0 is true is less than α, while the probability of accepting H_0 while H_1 is true is less than β. Realising that achieving both of these requirements is close to impossible in general [22], we introduce an indifference region of width 2 · δ around θ and test instead the hypothesis H′_0 : Pr_M(p) ≥ θ + δ against H′_1 : Pr_M(p) < θ − δ. Wald [20] developed a sequential hypothesis testing algorithm, see Algorithm 3, for exactly this case; the idea is to iteratively generate runs and, based on these, calculate a value r. Eventually this value will cross log(β/(1 − α)) or log((1 − β)/α), and H′_0 is either accepted or rejected.

In previous sections we described the symbolic representation of states used within Lodin, and we saw in an example how this representation could be used to explore many register values simultaneously.
We, however, did not give a structured way of using this symbolic representation in a verification framework. We make up for that in this section.

    Data: Initial state: s
    Data: Property: Pr_{M,m}(p) ≥ θ
    Data: Indifference region: 2 · δ
    Data: Significance level: α
    Data: Power level: β
    Result: ⊤ or ⊥
    p_0 = θ + δ;
    p_1 = θ − δ;
    r = 0;
    while true do
        ω = generateRun(s, m);
        x = I_p(ω);
        r = r + x · log(p_1/p_0) + (1 − x) · log((1 − p_1)/(1 − p_0));
        if r ≤ log(β/(1 − α)) then return ⊤ end
        if r ≥ log((1 − β)/α) then return ⊥ end
    end
Algorithm 3: Testing whether the probability is larger than θ.

In this section we show how Lodin uses its symbolic representation to analyse single-threaded programs without loops. For now, we also restrict our attention to verifying whether a given function can be called at any time, e.g. propositions such as [0.@error]. Before going into details about the algorithm, we set up some convenient notation to make the algorithm more readable.

A key concept we will need in the algorithm for analysing loop-free programs is converging and diverging basic blocks: for an LLVM function (@N, R, P, BL, BBs, Bm, ret), we say that a block B ∈ BBs diverges control flow if B[|B|] def= (br i1 c, label %trueb, label %falseb). For a block B ∈ BBs where Bm(con) = B for some con, we define the set In(con) of all blocks that jump to B below.

    Data: Property: φ
    Data: Initial state: n_0
    Result: ⊤ or ⊥
    Mergees := ∅;
    Waiting := { n_0 };
    while Waiting ≠ ∅ do
        Let n_c ∈ Waiting;
        Waiting := Waiting \ { n_c };
        if P_{[i.@func]}(n_c) then
            return ⊤
        end
        foreach n_n ∈ { n | ∃ i, Inst s.t.
 n_c −Inst→_i^S n } do
            if ¬Mergeable(n_n) then
                Waiting := Waiting ∪ { n_n };
            else
                Let n_n = ((F, prev, cur, pc, π, Free) : S, s_S);
                if ∃ (cur, n_o, c) ∈ Mergees then
                    if c = 1 then
                        Mergees := Mergees \ { (cur, n_o, c) };
                        Waiting := Waiting ∪ { merge(n_o, n_n) };
                    else
                        Mergees := Mergees \ { (cur, n_o, c) }
                                   ∪ { (cur, merge(n_o, n_n), c − 1) };
                    end
                else
                    Mergees := Mergees ∪ { (cur, n_n, |In(n_n)| − 1) };
                end
            end
        end
    end
    return ⊥
Algorithm 4: The symbolic reachability algorithm.

The set of blocks jumping to a block labelled con is

    In(con) = { B′ ∈ BBs | B′[|B′|] def= (br i1 r, label %con, label %f) }
            ∪ { B′ ∈ BBs | B′[|B′|] def= (br i1 r, label %t, label %con) }
            ∪ { B′ ∈ BBs | B′[|B′|] def= (br label %con) },

and we say that con labels a converging block if |In(con)| > 1. For ease of writing we will say that con is a converging block. The definition of In we lift to states of L_S^M = (N, n_S, −→S) as follows: if n = (s : S, s_S) and s = (F, prev, cur, pc, π, Free), then In(n) = In(cur).

In the discussion of the symbolic context, we defined how to merge symbolic context states. Here we wish to lift merging to states n, n′ ∈ N. A state n = ((F, prev, cur, pc, π, Free) : S, s_S) is considered mergeable (written Mergeable(n)) if pc is not a phi i32 instruction and |In(n)| > 1. It can be merged with another state n′ = ((F, prev′, cur, pc, π, Free′) : S, s′_S) if s_S and s′_S can be merged. The merge of n, n′ is defined as

    merge(n, n′) = ((F, prev, cur, pc, π, Free ∪ Free′) : S, merge(s_S, s′_S)).

After these preliminary setups, we are ready to show the algorithm in Algorithm 4. To a large extent it is the classic reachability algorithm, where unexplored states are kept in a Waiting list, and a state pulled from Waiting is immediately checked against the property at hand. Checking whether the property [i.@func] is true involves 1.
checking whether the function @func is being called by the i-th process (a check that does not depend on the LLVM registers), and 2. checking whether the path formula of the state is satisfiable. If the property is not satisfied, then all possible successors are generated and either put into Waiting (if not a mergeable state) or merged, when possible, with a state already in the Mergees queue.

Handling Loops. Any nontrivial program will have loops, and as such verification techniques must cope with them. Lodin can verify programs with loops, but relies on syntactically unrolling the loops before verification. In case the loop unroll is complete, the verification is complete; otherwise the verification is only sound.

Lodin is built around LLVM bitcode and uses the LLVM libraries for parsing the input files and performing some LLVM modifications at load time. Lodin does, however, not use the infrastructure of LLVM for performing analyses. Instead it builds its own internal representation of the loaded LLVM module and implements its own state space successor generator.

At load time Lodin can perform a number of modifications of the LLVM program; some of the modifications are enabled by default, and some are force-enabled by others. In the following we briefly discuss these modifications.

Naming Instructions. LLVM bitcode files do not necessarily contain names for the registers. At load time Lodin therefore gives names to all unnamed registers in the program. This simplifies providing error messages internally.

Constant Removal. LLVM bitcode instructions can contain constant expressions which the interpreter of Lodin would have to evaluate at run time. We replace these constant expressions with LLVM instructions, thus simplifying the subset of LLVM that our interpreter needs to understand.
Simplify CFG. This is a standard LLVM modification that attempts to simplify the control flow graph. Lodin provides an option for running this simplification, but does not run it by default, as it modifies the program drastically and the specifications of the user may therefore no longer be "valid". The modification can be enabled by the user or forced by other modifications. To help the user, the modified program can also be output at load time.

Eliminate Dead Code. As the name suggests, this modification removes code that can statically be determined to be unreachable. This is a standard LLVM modification that has to be enabled by the user.

Constant Propagation. This is a standard LLVM modification that forwards constants in the LLVM code and thereby reduces the number of instructions in the LLVM code.

Mem2Reg. This modification tries to promote memory operations to register operations. This is useful as it makes operations easier for some of the other modifications. The modification can be enabled by the user or forced by other modifications.

Loop Unrolling. This is the only modification that requires a user-specified input n. The modification unrolls all detected loops in the program at most n times. If it can be determined that a loop will only execute m < n times, it is of course only unrolled m times. The unrolling is implemented inside Lodin but borrows the unrolling strategy from the LLVM library. The reason the loop unrolling does not use the default LLVM unrolling method is that Lodin needs more control of the unrolling than the interface offers. Enabling loop unrolling force-enables Mem2Reg and Simplify CFG. The main usage of loop unrolling is to support the unrolling needed by bounded model checking.
Lodin employs a layered architecture (see Figure 17) where high-level algorithms, as detailed in the previous sections, can be implemented without knowledge of low-level considerations such as how states are represented. The algorithms depend on state generators implementing the state space reductions or the probabilistic semantics. The generators in turn depend on a joint interpreter-platform unit, which interacts with an interface to a state representation (how activation records are stored, etc.). The state representation then depends on a context-memory unit which performs the operations requested by the interpreter. At the lowest level of the architecture is the storage unit, which is responsible for storing and saving states (used by the implementation of the Passed/Waiting sets in Algorithm 4).

Figure 17: Architecture of Lodin: Algorithms on top, then Generators and Prob-Generators, the joint Interpreter and Platforms unit, the State Rep, the Context and Memory units, and Storage at the bottom.

SMT Solvers. Lodin uses external SMT solvers for solving the constraints gathered by the symbolic context implementation. The constraints are represented in a solver-independent format and only converted to SMT-solver specifics at the last minute. This allows easily interchanging the used solver: currently Lodin is linked against Z3 [11] and Boolector [19], and uses Boolector by default.

Conclusion

We presented the fairly new tool Lodin. Lodin implements explicit-state model checking of LLVM programs with concurrent processes. To combat the state space explosion problem, Lodin supplements explicit-state model checking techniques with simulation-based techniques. For single-threaded programs Lodin implements a symbolic state space representation, allowing it to verify programs with non-deterministic input precisely. The symbolic engine of Lodin uses off-the-shelf SMT solvers, presently Boolector and Z3.

References

[1] Christel Baier and Joost-Pieter Katoen. Principles of Model Checking. MIT Press, 2008. ISBN 978-0-262-02649-9.

[2] Thomas Ball and Sriram K. Rajamani.
The SLAM toolkit. In Gérard Berry, Hubert Comon, and Alain Finkel, editors, Computer Aided Verification, 13th International Conference, CAV 2001, Paris, France, July 18-22, 2001, Proceedings, volume 2102 of Lecture Notes in Computer Science, pages 260-264. Springer, 2001. doi: 10.1007/3-540-44585-4_25.

[3] Zuzana Baranová, Jiří Barnat, Katarína Kejstová, Tadeáš Kučera, Henrich Lauko, Jan Mrázek, Petr Ročkai, and Vladimír Štill. Model checking of C and C++ with DIVINE 4. In Automated Technology for Verification and Analysis (ATVA 2017), volume 10482 of LNCS, pages 201-207. Springer, 2017.

[4] Dirk Beyer and M. Erkan Keremoglu. CPAchecker: A tool for configurable software verification. In Ganesh Gopalakrishnan and Shaz Qadeer, editors, Computer Aided Verification - 23rd International Conference, CAV 2011, Snowbird, UT, USA, July 14-20, 2011. Proceedings, volume 6806 of Lecture Notes in Computer Science, pages 184-190. Springer, 2011. doi: 10.1007/978-3-642-22110-1_16.

[5] Dirk Beyer, Thomas A. Henzinger, Ranjit Jhala, and Rupak Majumdar. The software model checker Blast. STTT, 9(5-6):505-525, 2007. doi: 10.1007/s10009-007-0044-z.

[6] Armin Biere, Alessandro Cimatti, Edmund M. Clarke, Ofer Strichman, and Yunshan Zhu. Bounded model checking. Advances in Computers, 58:117-148, 2003. doi: 10.1016/S0065-2458(03)58003-2.

[7] Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In Richard Draves and Robbert van Renesse, editors, pages 209-224. USENIX Association, 2008.

[8] Edmund Clarke, Orna Grumberg, and Doron Peled. Model Checking. MIT Press, 1999.

[9] Edmund M. Clarke, Orna Grumberg, Somesh Jha, Yuan Lu, and Helmut Veith.
Counterexample-guided abstraction refinement for symbolic model checking. J. ACM, 50(5):752-794, 2003. doi: 10.1145/876638.876643.

[10] Charles J. Clopper and Egon S. Pearson. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26(4):404-413, 1934.

[11] Leonardo Mendonça de Moura and Nikolaj Bjørner. Z3: an efficient SMT solver. In C. R. Ramakrishnan and Jakob Rehof, editors, Tools and Algorithms for the Construction and Analysis of Systems, 14th International Conference, TACAS 2008, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2008, Budapest, Hungary, March 29-April 6, 2008. Proceedings, volume 4963 of Lecture Notes in Computer Science, pages 337-340. Springer, 2008. doi: 10.1007/978-3-540-78800-3_24.

[12] LLVM Developers. LLVM language reference manual. https://llvm.org/docs/LangRef.html, 2018.

[13] Stephan Falke, Florian Merz, and Carsten Sinz. LLBMC: improved bounded model checking of C programs using LLVM (competition contribution). In Nir Piterman and Scott A. Smolka, editors, Tools and Algorithms for the Construction and Analysis of Systems - 19th International Conference, TACAS 2013, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2013, Rome, Italy, March 16-24, 2013. Proceedings, volume 7795 of Lecture Notes in Computer Science, pages 623-626. Springer, 2013. doi: 10.1007/978-3-642-36742-7_48.

[14] Patrice Godefroid. VeriSoft: A tool for the automatic analysis of concurrent reactive software. In Orna Grumberg, editor, Computer Aided Verification, 9th International Conference, CAV '97, Haifa, Israel, June 22-25, 1997, Proceedings, volume 1254 of Lecture Notes in Computer Science, pages 476-479. Springer, 1997. doi: 10.1007/3-540-63166-6_52.
[15] Arie Gurfinkel, Temesghen Kahsai, Anvesh Komuravelli, and Jorge A. Navas. The SeaHorn verification framework. In Daniel Kroening and Corina S. Pasareanu, editors, Computer Aided Verification - 27th International Conference, CAV 2015, San Francisco, CA, USA, July 18-24, 2015, Proceedings, Part I, volume 9206 of Lecture Notes in Computer Science, pages 343-361. Springer, 2015. doi: 10.1007/978-3-319-21690-4_20.

[16] Daniel Kroening and Michael Tautschnig. CBMC - C bounded model checker (competition contribution). In Erika Ábrahám and Klaus Havelund, editors, Tools and Algorithms for the Construction and Analysis of Systems - 20th International Conference, TACAS 2014, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2014, Grenoble, France, April 5-13, 2014. Proceedings, volume 8413 of Lecture Notes in Computer Science, pages 389-391. Springer, 2014. doi: 10.1007/978-3-642-54862-8_26.

[17] Chris Lattner and Vikram S. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In , pages 75-88. IEEE Computer Society, 2004. doi: 10.1109/CGO.2004.1281665.

[18] Axel Legay, Dirk Nowotka, Danny Bøgsted Poulsen, and Louis-Marie Traonouez. Statistical model checking of LLVM code. In Klaus Havelund, Jan Peleska, Bill Roscoe, and Erik P. de Vink, editors, Formal Methods - 22nd International Symposium, FM 2018, Held as Part of the Federated Logic Conference, FloC 2018, Oxford, UK, July 15-17, 2018, Proceedings, volume 10951 of Lecture Notes in Computer Science, pages 542-549. Springer, 2018. doi: 10.1007/978-3-319-95582-7_32.
[19] Aina Niemetz, Mathias Preiner, Clifford Wolf, and Armin Biere. Btor2, BtorMC and Boolector 3.0. In Hana Chockler and Georg Weissenbacher, editors, Computer Aided Verification - 30th International Conference, CAV 2018, Held as Part of the Federated Logic Conference, FloC 2018, Oxford, UK, July 14-17, 2018, Proceedings, Part I, volume 10981 of Lecture Notes in Computer Science, pages 587-595. Springer, 2018. doi: 10.1007/978-3-319-96145-3_32.

[20] Abraham Wald. Sequential Analysis. Courier Corporation, 1973.

[21] Håkan L. S. Younes, Marta Z. Kwiatkowska, Gethin Norman, and David Parker. Numerical vs. statistical probabilistic model checking. STTT, 8(3):216-228, 2006.

[22] Håkan L. S. Younes.