LTL Model Checking of Self Modifying Code

Abstract

Self modifying code is code that can modify its own instructions during the execution of the program. It is extensively used by malware writers to obfuscate their malicious code. Thus, analysing self modifying code is nowadays a big challenge. In this paper, we consider the LTL model-checking problem of self modifying code. We model such programs using self-modifying pushdown systems (SM-PDS), an extension of pushdown systems that can modify its own set of transitions during execution. We reduce the LTL model-checking problem to the emptiness problem of self-modifying Büchi pushdown systems (SM-BPDS). We implemented our techniques in a tool that we successfully applied for the detection of several self-modifying malware. Our tool was also able to detect several malwares that well-known antiviruses such as BitDefender, Kinsoft, Avira, eScan, Kaspersky, Qihoo-360, Baidu, Avast, and Symantec failed to detect.


Tayssir Touili and Xin Ye
CNRS, LIPN and University Paris 13
East China Normal University, Shanghai, China


Binary code presents several complex aspects that cannot be encountered in source code. One of these aspects is self-modifying code, i.e., code that can modify its own instructions during the execution of the program. Self-modifying code makes reverse code engineering harder. Thus, it is extensively used to protect software intellectual property. It is also heavily used by malware writers in order to make their malwares hard to analyse and detect by static analysers and anti-viruses. Thus, it is crucial to be able to analyse self-modifying code.

There are several kinds of self-modifying code. In this work, we consider self-modifying code caused by self-modifying instructions. These kinds of instructions treat code as data. This allows them to read and write into code, leading to self-modifying instructions. These self-modifying instructions are usually mov instructions, since mov allows one to access memory and to read and write into it.

Let us consider the example shown in Fig. 1. For simplicity, the addresses' length is assumed to be 1 byte. In the right box, we give, respectively, the binary code, the addresses of the different instructions, and the corresponding assembly code, obtained by translating syntactically the binary code at each address. For example, 0c is the binary code of the jump jmp. Thus, 0c 02 is translated to jmp 0x2 (jump to address 0x2). The second line is translated to push 0x9, since ff is the binary code of the instruction push. The third instruction mov 0x2 0xc will replace the first byte at address 0x2 by 0xc. Thus, at address 0x2, ff 09 is replaced by 0c 09. This means the instruction push 0x9 is replaced by the jump instruction jmp 0x9 (jump to address 0x9), etc. Therefore, this code is self-modifying: the mov instruction was able to modify the instructions of the program via its ability to read and write the memory. If we study this code without looking at the semantics of the self-modifying instructions, we will extract from it the Control Flow Graph CFG a that is in the left of the figure, and we will reach the conclusion that the call to the API function CopyFileA cannot be made. However, the correct CFG is the one on the right hand side, CFG b, where the call to the API function CopyFileA can be reached. Thus, it is very important to be able to take into account the semantics of the self-modifying instructions in binary code.
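The effect of the mov instruction in this example can be illustrated with a minimal sketch. It assumes a toy encoding in which every instruction is two bytes (opcode, operand) and uses only the two opcodes named in the text (0c for jmp, ff for push); the `decode` helper and the `memory` layout are hypothetical and are not part of the authors' tool.

```python
# Toy opcode table taken from the running example: 0c = jmp, ff = push.
OPCODES = {0x0C: "jmp", 0xFF: "push"}

def decode(memory, address):
    """Decode the (assumed) two-byte instruction stored at `address`."""
    opcode, operand = memory[address], memory[address + 1]
    return f"{OPCODES[opcode]} {operand:#x}"

# Bytes at addresses 0x0..0x3 of the example: jmp 0x2 ; push 0x9
memory = bytearray([0x0C, 0x02, 0xFF, 0x09])
before = decode(memory, 0x2)

# Effect of `mov 0x2 0xc`: overwrite the first byte at address 0x2 with 0x0c.
memory[0x2] = 0x0C
after = decode(memory, 0x2)

print(before, "->", after)
```

A purely syntactic disassembly sees only `before`; an analysis that models the write performed by mov also sees `after`, which is what changes the control flow graph.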

Fig. 1: An Example of a Self-modifying Code (panels: CFG a and CFG b; binary codes, addresses and assembly codes, before and after the execution of mov 0x2 0xc)

In this paper, we consider the LTL model-checking problem of self-modifying code. To this aim, we use Self-Modifying Pushdown Systems (SM-PDSs) [29] to model self-modifying code. Indeed, SM-PDSs were shown in [29] to be an adequate model for self-modifying code since they allow one to mimic the program's stack while taking into account the self-modifying semantics of the transitions. This is very important for binary code analysis and malware detection, since malwares are based on calls to API functions of the operating system. Thus, antiviruses check the API calls to determine whether a program is malicious or not. Therefore, to evade these antiviruses, malware writers try to hide the API calls they make by replacing calls by push and jump instructions. Thus, to be able to analyse such malwares, it is crucial to be able to analyse the program's stack. Hence the need for a model like pushdown systems and self-modifying pushdown systems, since they allow one to mimic the program's stack.

Intuitively, a SM-PDS is a pushdown system (PDS) with self-modifying rules, i.e., with rules that allow one to modify the current set of transitions during execution. This model was introduced in [29] in order to represent self-modifying code. In [29], the authors have proposed algorithms to compute finite automata that accept the forward and backward reachability sets of SM-PDSs. In this work, we tackle the problem of LTL model-checking of SM-PDSs. Since SM-PDSs are equivalent to PDSs [29], one possible approach for LTL model checking of SM-PDSs is to translate the SM-PDS to a standard PDS and then run the LTL model checking algorithm on the equivalent PDS [2,10]. But the translation from a SM-PDS to a standard PDS is exponential. Thus, performing the LTL model checking on the equivalent PDS is not efficient.

To overcome this limitation, we propose a direct LTL model checking algorithm for SM-PDSs. Our algorithm is based on reducing the LTL model checking problem to the emptiness problem of Self Modifying Büchi Pushdown Systems (SM-BPDS). Intuitively, we obtain this SM-BPDS by taking the product of the SM-PDS with a Büchi automaton accepting an LTL formula $\varphi$. Then, we solve the emptiness problem of an SM-BPDS by computing its repeating heads. This computation is based on computing labelled $pre^*$ configurations by applying a saturation procedure on labelled finite automata.

We implemented our algorithm in a tool. Our experiments show that our direct techniques are much more efficient than translating the SM-PDS to an equivalent PDS and then applying the standard LTL model checking for PDSs [2,10]. Moreover, we successfully applied our tool to the analysis of 892 self-modifying malwares. Our tool was also able to detect several self-modifying malwares that well-known antiviruses like BitDefender, Kinsoft, Avira, eScan, Kaspersky, Qihoo-360, Baidu, Avast, and Symantec were not able to detect.

Related Work.

Model checking and static analysis approaches have been widely used to analyze binary programs, for instance, in [9,5,23,11,3]. Temporal logics were chosen to describe malicious behaviors in [20,11,3,4,8]. However, these works cannot deal with self-modifying code.

POMMADE [3,4] is a malware detector based on LTL and CTL model-checking of PDSs. STAMAD [15,16,14] is a malware detector based on PDSs and machine learning. However, POMMADE and STAMAD cannot deal with self-modifying code.

Cai et al. [7] use local reasoning and separation logic to describe self-modifying code and treat program code uniformly as regular data structure. However, [7] requires programs to be manually annotated with invariants. In [26], the authors propose a formal semantics for self-modifying codes, and use that to represent self-unpacking code. This work only deals with packing and unpacking behaviours. Bonfante et al. [6] provide an operational semantics for self-modifying programs and show that they can be constructively rewritten to a non-modifying program. However, all these specifications [6,7,26] are too abstract to be used in practice.

In [1], the authors propose a new representation of self-modifying code named State Enhanced-Control Flow Graph (SE-CFG). SE-CFG extends standard control flow graphs with a new data structure, keeping track of the possible states programs can reach, and with edges that can be conditional on the state of the target memory location. It is not easy to analyse a binary program using only its SE-CFG, especially since this representation does not allow one to take into account the stack of the program.

[24] propose abstract interpretation techniques to compute an over-approximation of the set of reachable states of a self-modifying program, where for each control point of the program, an over-approximation of the memory state at this control point is provided. [18] combine static and dynamic analysis techniques to analyse self-modifying programs. Unlike our approach, these techniques [24,18] cannot handle the program's stack.

Unpacking binary code is also considered in [13,17,22,26]. These works do not consider self-modifying mov instructions.

Outline.

The rest of the paper is structured as follows: Section 2 recalls the definition of Self Modifying pushdown systems. LTL model checking and SM-BPDSs are defined in Section 3. Section 4 solves the emptiness problem of SM-BPDS. Finally, the experiments are reported in Section 5.

We recall in this section the definition of Self-modifying Pushdown Systems [29].

Definition 1.

A Self-modifying Pushdown System (SM-PDS) is a tuple $\mathcal{P} = (P, \Gamma, \Delta, \Delta_c)$, where $P$ is a finite set of control points, $\Gamma$ is a finite set of stack symbols, $\Delta \subseteq (P \times \Gamma) \times (P \times \Gamma^*)$ is a finite set of transition rules, and $\Delta_c \subseteq P \times \Delta \times \Delta \times P$ is a finite set of modifying transition rules. If $((p, \gamma), (p', w)) \in \Delta$, we also write $\langle p, \gamma \rangle \hookrightarrow \langle p', w \rangle \in \Delta$. If $(p, r_1, r_2, p') \in \Delta_c$, we also write $p \stackrel{(r_1, r_2)}{\hookrightarrow} p' \in \Delta_c$. A Pushdown System (PDS) is a SM-PDS where $\Delta_c = \emptyset$.

Intuitively, a Self-modifying Pushdown System is a Pushdown System that can dynamically modify its set of rules during the execution time: rules $\Delta$ are standard PDS transition rules, while rules $\Delta_c$ modify the current set of transition rules. $\langle p, \gamma \rangle \hookrightarrow \langle p', w \rangle \in \Delta$ expresses that if the SM-PDS is in control point $p$ and has $\gamma$ on top of its stack, then it can move to control point $p'$, pop $\gamma$ and push $w$ onto the stack, while $p \stackrel{(r_1, r_2)}{\hookrightarrow} p' \in \Delta_c$ expresses that when the SM-PDS is in control point $p$, then it can move to control point $p'$, remove the rule $r_1$ from its current set of transition rules, and add the rule $r_2$.

Formally, a configuration of a SM-PDS is a tuple $c = (\langle p, w \rangle, \theta)$ where $p \in P$ is the control point, $w \in \Gamma^*$ is the stack content, and $\theta \subseteq \Delta \cup \Delta_c$ is the current set of transition rules of the SM-PDS. $\theta$ is called the current phase of the SM-PDS. When the SM-PDS is a PDS, i.e., when $\Delta_c = \emptyset$, a configuration is a tuple $c = (\langle p, w \rangle, \Delta)$, since there is no changing rule, so there is only one possible phase. In this case, we can also write $c = \langle p, w \rangle$. Let $\mathcal{C}$ be the set of configurations of a SM-PDS. A SM-PDS defines a transition relation $\Rightarrow_{\mathcal{P}}$ between configurations as follows: Let $c = (\langle p, w \rangle, \theta)$ be a configuration, and let $r$ be a rule in $\theta$, then:

1. if $r \in \Delta_c$ is of the form $r = p \stackrel{(r_1, r_2)}{\hookrightarrow} p'$, such that $r_1 \in \theta$, then $(\langle p, w \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', w \rangle, \theta')$, where $\theta' = (\theta \setminus \{r_1\}) \cup \{r_2\}$. In other words, the transition rule $r$ updates the current set of transition rules $\theta$ by removing $r_1$ from it and adding $r_2$ to it.
2. if $r \in \Delta$ is of the form $r = \langle p, \gamma \rangle \hookrightarrow \langle p', w' \rangle$, then $(\langle p, \gamma w \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', w' w \rangle, \theta)$. In other words, the transition rule $r$ moves the control point from $p$ to $p'$, pops $\gamma$ from the stack and pushes $w'$ onto the stack. This transition keeps the current set of transition rules $\theta$ unchanged.

Let $\Rightarrow^*_{\mathcal{P}}$ be the transitive, reflexive closure of $\Rightarrow_{\mathcal{P}}$ and $\Rightarrow^+_{\mathcal{P}}$ be its transitive closure. An execution (a run) of $\mathcal{P}$ is a sequence of configurations $\pi = c_0 c_1 \ldots$ s.t. $c_i \Rightarrow_{\mathcal{P}} c_{i+1}$ for every $i \geq 0$. Given a configuration $c$, the set of immediate predecessors (resp. successors) of $c$ is $pre_{\mathcal{P}}(c) = \{c' \in \mathcal{C} : c' \Rightarrow_{\mathcal{P}} c\}$ (resp. $post_{\mathcal{P}}(c) = \{c' \in \mathcal{C} : c \Rightarrow_{\mathcal{P}} c'\}$). These notations can be generalized straightforwardly to sets of configurations. Let $pre^*_{\mathcal{P}}$ (resp. $post^*_{\mathcal{P}}$) denote the reflexive-transitive closure of $pre_{\mathcal{P}}$ (resp. $post_{\mathcal{P}}$). We remove the subscript $\mathcal{P}$ when it is clear from the context.

We suppose w.l.o.g. that rules in $\Delta$ are of the form $\langle p, \gamma \rangle \hookrightarrow \langle p', w \rangle$ such that $|w| \leq 2$, and that the self-modifying rules $r = p \stackrel{(r_1, r_2)}{\hookrightarrow} p'$ in $\Delta_c$ are such that $r \neq r_1$. Note that this is not a restriction, since for a given SM-PDS, one can compute an equivalent SM-PDS that satisfies these conditions [29].

Let $\mathcal{P} = (P, \Gamma, \Delta, \Delta_c)$ be a SM-PDS. It was shown in [29] that:

1. $\mathcal{P}$ can be described by an equivalent pushdown system (PDS). Indeed, since the number of phases is finite, we can encode phases in the control point of the PDS. However, this translation is not efficient since the number of control points of the equivalent PDS is $|P| \cdot 2^{O(|\Delta| + |\Delta_c|)}$.
2. $\mathcal{P}$ can also be described by an equivalent Symbolic pushdown system [27], where each SM-PDS rule is represented by a single, symbolic transition, and where the different values of the phases are encoded in a symbolic way using relations between phases. This translation is not efficient either since the size of the relations used in the symbolic transitions is $2^{O(|\Delta| + |\Delta_c|)}$.

It is shown in [29] how to describe a self-modifying binary code using a SM-PDS. The basic idea is that the control locations of the SM-PDS store the control points of the binary program and the stack mimics the program's stack. Our translation relies on the disassembler Jakstab [12] to disassemble binary code, construct the control flow graph (CFG), determine indirect jumps, and compute the possible values of used variables, registers and memory locations at each control point of the program. After getting the control flow graph whose edges are equipped with disassembled instructions, we translate the CFG into a SM-PDS as described in [29]. The non self-modifying instructions of the program define the rules $\Delta$ of the SM-PDS (which are standard PDS rules), and can be obtained following the translation of [3] that models non self-modifying instructions of the program by a PDS. Self-modifying instructions are represented using self-modifying transitions $\Delta_c$ of the SM-PDS. For more details, we refer the reader to [29].
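The one-step transition relation of Definition 1 can be sketched in a few lines of Python. This is only an illustration under assumed encodings: the tuple shapes `("std", ...)` and `("mod", ...)`, the function name `successors`, and the example rules are hypothetical and do not come from [29] or from the authors' tool.

```python
# Hypothetical encodings (not from the paper):
#   standard rule   ("std", p, gamma, p2, w)  for <p, gamma> -> <p2, w>, w a tuple
#   modifying rule  ("mod", p, r1, r2, p2)    for p -(r1, r2)-> p2
# A configuration is (control point, stack as a tuple with top first, phase theta).

def successors(config):
    """Return every configuration reachable in one step from `config`."""
    p, stack, theta = config
    result = []
    for rule in theta:
        if rule[0] == "std":
            _, src, gamma, dst, w = rule
            # Pop gamma and push w; the phase theta is unchanged.
            if src == p and stack and stack[0] == gamma:
                result.append((dst, w + stack[1:], theta))
        else:
            _, src, r1, r2, dst = rule
            # Replace r1 by r2 in the phase; the stack is unchanged.
            if src == p and r1 in theta:
                result.append((dst, stack, (theta - {r1}) | {r2}))
    return result

# Example: the modifying rule rc swaps r1 for r2 before p1 is reached.
r1 = ("std", "p1", "a", "p2", ("b",))
r2 = ("std", "p1", "a", "p3", ())
rc = ("mod", "p0", r1, r2, "p1")
c0 = ("p0", ("a",), frozenset({r1, rc}))
step1 = successors(c0)
step2 = successors(step1[0])
```

In this example the only run from `c0` first rewrites the phase and then fires `r2`, so the rule `r1` that a purely static reading would expect at `p1` is never applied.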
Let $At$ be a finite set of atomic propositions. LTL formulas are defined as follows (where $A \in At$):

$$\varphi := A \mid \neg \varphi \mid \varphi_1 \vee \varphi_2 \mid X\varphi \mid \varphi_1 U \varphi_2$$

Formulae are interpreted on infinite words over $2^{At}$. Let $\omega = \omega^0 \omega^1 \ldots$ be an infinite word over $2^{At}$. We write $\omega_i$ for the suffix of $\omega$ starting at $\omega^i$. We denote $\omega \models \varphi$ to express that $\omega$ satisfies a formula $\varphi$:

$$\begin{array}{lcl}
\omega \models A & \iff & A \in \omega^0 \\
\omega \models \neg \varphi & \iff & \omega \not\models \varphi \\
\omega \models \varphi_1 \vee \varphi_2 & \iff & \omega \models \varphi_1 \text{ or } \omega \models \varphi_2 \\
\omega \models X\varphi & \iff & \omega_1 \models \varphi \\
\omega \models \varphi_1 U \varphi_2 & \iff & \exists i \geq 0,\ \omega_i \models \varphi_2 \text{ and } \forall\, 0 \leq j < i,\ \omega_j \models \varphi_1
\end{array}$$

The temporal operators $G$ (globally) and $F$ (eventually) are defined as follows: $F\varphi = (A \vee \neg A) U \varphi$ and $G\varphi = \neg F \neg \varphi$. Let $W(\varphi)$ be the set of infinite words that satisfy an LTL formula $\varphi$. It is well known that $W(\varphi)$ can be accepted by Büchi automata:

Definition 2.

A Büchi automaton $\mathcal{B}$ is a quintuple $(Q, \Gamma, \eta, q_0, F)$ where $Q$ is a finite set of states, $\Gamma$ is a finite input alphabet, $\eta \subseteq (Q \times \Gamma \times Q)$ is a set of transitions, $q_0 \in Q$ is the initial state and $F \subseteq Q$ is the set of accepting states. A run of $\mathcal{B}$ on a word $\gamma_0 \gamma_1 \ldots \in \Gamma^\omega$ is a sequence of states $q_0 q_1 q_2 \ldots$ s.t. $\forall i \geq 0$, $(q_i, \gamma_i, q_{i+1}) \in \eta$. An infinite word $\omega$ is accepted by $\mathcal{B}$ if $\mathcal{B}$ has a run on $\omega$ that starts at $q_0$ and visits accepting states from $F$ infinitely often.

Theorem. [19] Given an LTL formula $\varphi$, one can effectively construct a Büchi automaton $\mathcal{B}_\varphi$ which accepts $W(\varphi)$.

Definition 3.

A Self Modifying Büchi Pushdown System (SM-BPDS) is a tuple $\mathcal{BP} = (P, \Gamma, \Delta, \Delta_c, G)$ where $P$ is a set of control locations, $G \subseteq P$ is a set of accepting control locations, $\Delta \subseteq (P \times \Gamma) \times (P \times \Gamma^*)$ is a finite set of transition rules, and $\Delta_c \subseteq P \times 2^{\Delta \cup \Delta_c} \times 2^{\Delta \cup \Delta_c} \times P$ is a finite set of modifying transition rules in the form $p \stackrel{(\sigma, \sigma')}{\hookrightarrow} p'$ where $\sigma, \sigma' \subseteq \Delta \cup \Delta_c$.

Let $\Rightarrow_{\mathcal{BP}}$ be the transition relation between configurations defined as follows: Let $\theta \subseteq \Delta \cup \Delta_c$, $\gamma \in \Gamma$, $w \in \Gamma^*$, and $p \in P$, then

1. If $r : \langle p, \gamma \rangle \hookrightarrow \langle p', w' \rangle \in \Delta$ and $r \in \theta$, then $(\langle p, \gamma w \rangle, \theta) \Rightarrow_{\mathcal{BP}} (\langle p', w' w \rangle, \theta)$.
2. If $r : p \stackrel{(\sigma, \sigma')}{\hookrightarrow} p' \in \Delta_c$, $\sigma \cap \theta \neq \emptyset$ and $r \in \theta$, then $(\langle p, \gamma w \rangle, \theta) \Rightarrow_{\mathcal{BP}} (\langle p', \gamma w \rangle, \theta')$ where $\theta' = (\theta \setminus \sigma) \cup \sigma'$.

A run $\pi$ of $\mathcal{BP}$ is a sequence of configurations $\pi = c_0 c_1 \ldots$ s.t. $c_i \Rightarrow_{\mathcal{BP}} c_{i+1}$ for every $i \geq 0$. $\pi$ is accepting iff it infinitely often visits configurations having control locations in $G$.

Let $c$ and $c'$ be two configurations of the SM-BPDS $\mathcal{BP}$. The relation $\Rightarrow^r_{\mathcal{BP}}$ is defined as follows: $c \Rightarrow^r_{\mathcal{BP}} c'$ iff there exists a configuration $(\langle g, u \rangle, \theta)$, $g \in G$, s.t. $c \Rightarrow^*_{\mathcal{BP}} (\langle g, u \rangle, \theta) \Rightarrow^+_{\mathcal{BP}} c'$.
We remove the subscript $\mathcal{BP}$ when it is clear from the context. We define $\stackrel{i}{\Rightarrow}$ as follows: $c \stackrel{i}{\Rightarrow} c'$ iff there exists a sequence of configurations $c_0 \Rightarrow_{\mathcal{BP}} c_1 \Rightarrow_{\mathcal{BP}} \ldots \Rightarrow_{\mathcal{BP}} c_i$ s.t. $c_0 = c$ and $c_i = c'$.

A head of a SM-BPDS is a tuple $(\langle p, \gamma \rangle, \theta)$ where $p \in P$, $\gamma \in \Gamma$ and $\theta \subseteq \Delta \cup \Delta_c$. A head $((p, \gamma), \theta)$ is repeating if there exists $v \in \Gamma^*$ such that $(\langle p, \gamma \rangle, \theta) \Rightarrow^r_{\mathcal{BP}} (\langle p, \gamma v \rangle, \theta)$. The set of repeating heads of the SM-BPDS is called $Rep_{\mathcal{BP}}$. We assume w.l.o.g. that for every rule in $\Delta_c$ of the form $r : p \stackrel{(\sigma, \sigma')}{\hookrightarrow} p'$, $r \notin \sigma$.

Let $\mathcal{P} = (P, \Gamma, \Delta, \Delta_c)$ be a self modifying pushdown system. Let $At$ be a set of atomic propositions. Let $\nu : P \rightarrow 2^{At}$ be a labelling function. Let $\pi = (\langle p_0, w_0 \rangle, \theta_0)(\langle p_1, w_1 \rangle, \theta_1) \ldots$ be an execution of the SM-PDS $\mathcal{P}$. Let $\varphi$ be an LTL formula over the set of atomic propositions $At$. We say that $\pi \models_\nu \varphi$ iff $\nu(p_0)\nu(p_1) \cdots \models \varphi$. Let $(\langle p, w \rangle, \theta)$ be a configuration of $\mathcal{P}$. We say that $(\langle p, w \rangle, \theta) \models_\nu \varphi$ iff $\mathcal{P}$ has a path $\pi$ starting at $(\langle p, w \rangle, \theta)$ such that $\pi \models_\nu \varphi$.

Our goal in this paper is to perform LTL model-checking for self-modifying pushdown systems. Since SM-PDSs can be translated to standard (symbolic) pushdown systems, one way to solve this LTL model-checking problem is to compute the (symbolic) pushdown system that is equivalent to the SM-PDS (see section 2.2), and then apply the standard LTL model-checking algorithms on standard PDSs [27]. However, this approach is not efficient (as will be witnessed later in the experiments). Thus, we need a direct approach that performs LTL model-checking on the SM-PDS, without translating it to an equivalent PDS. Let $\mathcal{B}_\varphi = (Q, 2^{At}, \eta, q_0, F)$ be a Büchi automaton that accepts $W(\varphi)$.
We compute the SM-BPDS $\mathcal{BP}_\varphi = (P \times Q, \Gamma, \Delta', \Delta'_c, G)$ by performing a kind of product between the SM-PDS $\mathcal{P}$ and the Büchi automaton $\mathcal{B}_\varphi$ as follows:

1. if $r = \langle p, \gamma \rangle \hookrightarrow \langle p', w \rangle \in \Delta$ and $(q, \nu(p), q') \in \eta$, then $\langle (p, q), \gamma \rangle \hookrightarrow \langle (p', q'), w \rangle \in \Delta'$. Let $prod(r)$ be the set of rules of $\Delta'$ obtained from the rule $r$, i.e., rules of $\Delta'$ of the form $\langle (p, q), \gamma \rangle \hookrightarrow \langle (p', q'), w \rangle$.
2. if a rule $r = p \stackrel{(r_1, r_2)}{\hookrightarrow} p' \in \Delta_c$ and $(q, \nu(p), q') \in \eta$, then $(p, q) \stackrel{(\sigma, \sigma')}{\hookrightarrow} (p', q') \in \Delta'_c$ where $\sigma = prod(r_1)$, $\sigma' = prod(r_2)$. Let $prod(r)$ be the set of rules of $\Delta'_c$ obtained from the rule $r$, i.e., rules of $\Delta'_c$ of the form $(p, q) \stackrel{(\sigma, \sigma')}{\hookrightarrow} (p', q')$.
3. $G = P \times F$.

We can show that:

Theorem 1.

Let $(\langle p, w \rangle, \theta)$ be a configuration of the SM-PDS $\mathcal{P}$. $(\langle p, w \rangle, \theta) \models_\nu \varphi$ iff $\mathcal{BP}_\varphi$ has an accepting run from $(\langle (p, q_0), w \rangle, prod(\theta))$ where $prod(\theta)$ is the set of rules of $\Delta' \cup \Delta'_c$ obtained from the rules of $\theta$ as described above.

Thus, LTL model-checking for SM-PDSs can be reduced to checking whether a SM-BPDS has an accepting run. The rest of the paper is devoted to this problem.
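The product construction above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the rule encodings `("std", ...)` and `("mod", ...)`, the helper names `prod_std`, `prod_mod` and `product`, and the small example system are all assumptions made for the sketch. Büchi transitions are triples `(q, letter, q2)` whose letter is a frozenset of atomic propositions, and `nu` is a dictionary from control points to such letters.

```python
# Hypothetical encodings: ("std", p, gamma, p2, w) and ("mod", p, r1, r2, p2).

def prod_std(rule, eta, nu):
    """prod(r) for a standard rule: pair it with every Buchi move reading nu(p)."""
    _, p, gamma, p2, w = rule
    return frozenset(("std", (p, q), gamma, (p2, q2), w)
                     for (q, letter, q2) in eta if letter == nu[p])

def prod_mod(rule, eta, nu):
    """prod(r) for a modifying rule: sigma = prod(r1), sigma' = prod(r2)."""
    _, p, r1, r2, p2 = rule
    sigma, sigma2 = prod_std(r1, eta, nu), prod_std(r2, eta, nu)
    return frozenset(("mod", (p, q), sigma, sigma2, (p2, q2))
                     for (q, letter, q2) in eta if letter == nu[p])

def product(delta, delta_c, eta, nu, control_points, final_states):
    """Return (Delta', Delta'_c, G) of the product SM-BPDS."""
    delta_p = frozenset().union(*(prod_std(r, eta, nu) for r in delta))
    delta_cp = frozenset().union(*(prod_mod(r, eta, nu) for r in delta_c))
    accepting = {(p, f) for p in control_points for f in final_states}
    return delta_p, delta_cp, accepting

# Small example: two control points, one modifying rule, a two-state automaton.
r1 = ("std", "p1", "x", "p1", ("x",))
r2 = ("std", "p1", "x", "p0", ("x",))
rc = ("mod", "p0", r1, r2, "p1")
nu = {"p0": frozenset({"a"}), "p1": frozenset()}
eta = {("q0", frozenset({"a"}), "q1"), ("q1", frozenset(), "q1")}
delta_p, delta_cp, accepting = product({r1, r2}, {rc}, eta, nu, {"p0", "p1"}, {"q1"})
```

The sketch builds every syntactically possible product rule; a practical implementation would likely restrict itself to the pairs of control point and automaton state that are reachable from the initial configuration.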

From now on, we fix a SM-BPDS $\mathcal{BP} = (P, \Gamma, \Delta, \Delta_c, G)$. We can show that $\mathcal{BP}$ has an accepting run starting from a configuration $c$ if and only if from $c$, it can reach a configuration with a repeating head:

Proposition 1.

A SM-BPDS $\mathcal{BP}$ has an accepting run starting from a configuration $c$ if and only if there exists a repeating head $((p, \gamma), \theta)$ such that $c \Rightarrow^*_{\mathcal{BP}} (\langle p, \gamma w \rangle, \theta)$ for some $w \in \Gamma^*$.

Proof: "$\Rightarrow$": Let $\sigma = c_0 c_1 \ldots$ be an accepting run starting at configuration $c$ where $c_0 = c$ and $c_i = (\langle p_i, w_i \rangle, \theta_i)$. We construct an increasing sequence of indices $i_0, i_1, \ldots$ with the property that once any of the configurations $c_{i_k}$ is reached, the rest of the run never changes the bottom $|w_{i_k}| - 1$ elements of the stack:

$$|w_{i_0}| = \min\{|w_j| \mid j \geq 0\}, \qquad |w_{i_k}| = \min\{|w_j| \mid j > i_{k-1}\},\ k \geq 1$$

Since $\mathcal{BP}$ has only finitely many different heads, there must be a head $(\langle p, \gamma \rangle, \theta)$ which occurs infinitely often as a head in the sequence $c_{i_0} c_{i_1} \ldots$. Moreover, as some $g \in G$ becomes a control location infinitely often, we can find a subsequence of indices $i_{j_0}, i_{j_1}, \ldots$ with the following property: for every $k \geq 0$, there exist $v, w \in \Gamma^*$ such that

$$c_{i_{j_k}} = (\langle p, \gamma w \rangle, \theta) \Rightarrow^r (\langle p, \gamma v w \rangle, \theta) = c_{i_{j_{k+1}}}$$

Because $w$ is never looked at or changed in this path, we have $(\langle p, \gamma \rangle, \theta) \Rightarrow^r (\langle p, \gamma v \rangle, \theta)$. This proves this direction of the proposition.

"$\Leftarrow$": Because $(\langle p, \gamma \rangle, \theta)$ is a repeating head, we can construct the following run for some $u, v, w \in \Gamma^*$, $\theta' \subseteq (\Delta \cup \Delta_c)$ and $g \in G$:

$$c \Rightarrow^* (\langle p, \gamma w \rangle, \theta) \Rightarrow^* (\langle g, u w \rangle, \theta') \Rightarrow^+ (\langle p, \gamma v w \rangle, \theta) \Rightarrow^* (\langle g, u v w \rangle, \theta') \Rightarrow^+ (\langle p, \gamma v v w \rangle, \theta) \Rightarrow^* \ldots$$

Since $g$ occurs infinitely often, the run is accepting. $\Box$

Thus, since there exists an efficient algorithm to compute the $pre^*$ of SM-PDSs [29], the emptiness problem of a SM-BPDS can be reduced to computing its repeating heads.

Our goal is to compute the set of repeating heads $Rep_{\mathcal{BP}}$, i.e., the set of heads $(\langle p, \gamma \rangle, \theta)$ such that there exists $v \in \Gamma^*$ with $(\langle p, \gamma \rangle, \theta) \Rightarrow^r (\langle p, \gamma v \rangle, \theta)$, i.e., $(\langle p, \gamma \rangle, \theta) \Rightarrow^* (\langle p, \gamma v \rangle, \theta)$ s.t. this path goes through an accepting location in $G$. To this aim, we will compute a finite graph $\mathcal{G}$ whose nodes are the heads of $\mathcal{BP}$ of the form $((p, \gamma), \theta)$, where $p \in P$, $\gamma \in \Gamma$ and $\theta \subseteq \Delta \cup \Delta_c$, and whose edges encode the reachability relation between these heads. More precisely, given two heads $((p, \gamma), \theta)$ and $((p', \gamma'), \theta')$, $((p, \gamma), \theta) \xrightarrow{b} ((p', \gamma'), \theta')$ is an edge of the graph $\mathcal{G}$ means that the configuration $(\langle p, \gamma \rangle, \theta)$ can reach a configuration having $(\langle p', \gamma' \rangle, \theta')$ as head, i.e., it means that there exists $v \in \Gamma^*$ s.t. $(\langle p, \gamma \rangle, \theta) \Rightarrow^* (\langle p', \gamma' v \rangle, \theta')$. Moreover, we need to keep the information whether this path visits an accepting location in $G$ or not. This information is recorded in the label of the edge $b$: $b = 1$ means that the path visits an accepting location in $G$, i.e., that $(\langle p, \gamma \rangle, \theta) \Rightarrow^r (\langle p', \gamma' v \rangle, \theta')$. Otherwise, $b = 0$. Therefore, if the graph $\mathcal{G}$ contains a loop from a head $((p, \gamma), \theta)$ to itself such that this loop goes through an edge labelled by 1, then $((p, \gamma), \theta)$ is a repeating head. Thus, computing $Rep_{\mathcal{BP}}$ can be reduced to computing the graph $\mathcal{G}$ and finding 1-labelled loops in this graph.

More precisely, we define the head reachability graph $\mathcal{G}$ as follows:

Definition 4.

The head reachability graph $\mathcal{G}$ is a tuple $(P \times \Gamma \times 2^{\Delta \cup \Delta_c}, \{0, 1\}, \delta)$ such that $((p, \gamma), \theta) \xrightarrow{b} ((p', \gamma'), \theta')$ is an edge of $\delta$ iff:

1. there exists a transition $r_c : p \stackrel{(\sigma, \sigma')}{\hookrightarrow} p' \in \theta \cap \Delta_c$, $\gamma = \gamma'$, $\theta' = (\theta \setminus \sigma) \cup \sigma'$, and $b = 1$ iff $p \in G$;
2. there exists a transition $\langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \rangle \in \theta \cap \Delta$, $\theta = \theta'$ and $b = 1$ iff $p \in G$;
3. there exists a transition $\langle p, \gamma \rangle \hookrightarrow \langle p'', \gamma_1 \gamma' \rangle \in \theta \cap \Delta$, for $\gamma_1 \in \Gamma$, $p'' \in P$, s.t. $(\langle p'', \gamma_1 \rangle, \theta) \Rightarrow^*_{\mathcal{BP}} (\langle p', \epsilon \rangle, \theta')$, and $b = 1$ iff $p \in G$ or $(\langle p'', \gamma_1 \rangle, \theta) \Rightarrow^r_{\mathcal{BP}} (\langle p', \epsilon \rangle, \theta')$.

Let $\mathcal{G}$ be the head reachability graph. We define $\rightarrow_i$ as follows: let $((p, \gamma), \theta)$ and $((p', \gamma'), \theta')$ be two heads of $\mathcal{BP}$. We write $((p, \gamma), \theta) \rightarrow_i ((p', \gamma'), \theta')$ iff there exist booleans $b_1, b_2, \ldots, b_i \in \{0, 1\}$ and heads $((p_j, \gamma_j), \theta_j)$, $0 \leq j \leq i$, s.t. $\mathcal{G}$ contains the following path:

$$((p_0, \gamma_0), \theta_0) \xrightarrow{b_1} ((p_1, \gamma_1), \theta_1) \xrightarrow{b_2} \ldots \xrightarrow{b_i} ((p_i, \gamma_i), \theta_i)$$

where $((p_0, \gamma_0), \theta_0) = ((p, \gamma), \theta)$ and $((p_i, \gamma_i), \theta_i) = ((p', \gamma'), \theta')$.

Let $\rightarrow^*$ be the reflexive transitive closure of the graph relation $\xrightarrow{b}$, and let $\rightarrow^r$ be defined as follows: Given two heads $((p, \gamma), \theta)$ and $((p', \gamma'), \theta')$, $((p, \gamma), \theta) \rightarrow^r ((p', \gamma'), \theta')$ iff there is in $\mathcal{G}$ a path between $((p, \gamma), \theta)$ and $((p', \gamma'), \theta')$ that goes through a 1-labelled edge, i.e., iff there exist heads $((p_1, \gamma_1), \theta_1)$ and $((p_2, \gamma_2), \theta_2)$ s.t. $((p, \gamma), \theta) \rightarrow^* ((p_1, \gamma_1), \theta_1) \xrightarrow{1} ((p_2, \gamma_2), \theta_2) \rightarrow^* ((p', \gamma'), \theta')$. We can show that:

Theorem 2.

Let $\mathcal{BP} = (P, \Gamma, \Delta, \Delta_c, G)$ be a self-modifying Büchi pushdown system, and let $\mathcal{G}$ be its corresponding head reachability graph. A head $((p, \gamma), \theta)$ of $\mathcal{BP}$ is repeating iff $\mathcal{G}$ has a loop on the node $((p, \gamma), \theta)$ that goes through a 1-labelled edge.

To prove this theorem, we first need to prove the following lemma:

Lemma 1.

The relations $\rightarrow^*$ and $\rightarrow^r$ have the following properties: For any heads $((p, \gamma), \theta)$ and $((p', \gamma'), \theta)$:

(a) $((p, \gamma), \theta) \rightarrow^* ((p', \gamma'), \theta)$ iff $(\langle p, \gamma \rangle, \theta) \Rightarrow^* (\langle p', \gamma' v \rangle, \theta)$ for some $v \in \Gamma^*$.
(b) $((p, \gamma), \theta) \rightarrow^r ((p', \gamma'), \theta)$ iff $(\langle p, \gamma \rangle, \theta) \Rightarrow^r (\langle p', \gamma' v \rangle, \theta)$ for some $v \in \Gamma^*$.

Proof: "$\Rightarrow$": Assume $((p, \gamma), \theta) \rightarrow_i ((p', \gamma'), \theta)$. We proceed by induction on $i$.

(a) Basis. $i = 0$. In this case, $((p, \gamma), \theta) = ((p', \gamma'), \theta)$, then we can get $(\langle p, \gamma \rangle, \theta) \Rightarrow^* (\langle p, \gamma \rangle, \theta) = (\langle p', \gamma' \rangle, \theta)$.

Step. $i > 0$. Then there exist $p_1 \in P$, $\gamma'' \in \Gamma$ and $\theta' \subseteq \Delta \cup \Delta_c$ such that $((p, \gamma), \theta) \rightarrow_1 ((p_1, \gamma''), \theta') \rightarrow_{i-1} ((p', \gamma'), \theta)$. From the induction hypothesis, there exists $u \in \Gamma^*$ such that $(\langle p_1, \gamma'' \rangle, \theta') \Rightarrow^* (\langle p', \gamma' u \rangle, \theta)$. Since $((p, \gamma), \theta) \rightarrow_1 ((p_1, \gamma''), \theta')$, we have $(\langle p, \gamma \rangle, \theta) \Rightarrow^* (\langle p_1, \gamma'' w \rangle, \theta')$ for some $w \in \Gamma^*$, hence $(\langle p, \gamma \rangle, \theta) \Rightarrow^* (\langle p', \gamma' u w \rangle, \theta)$. The property holds.

(b) $((p, \gamma), \theta) \rightarrow^r ((p, \gamma), \theta)$ cannot hold for the case $i = 0$.

Basis. $i = 1$. In this case, $((p, \gamma), \theta) \rightarrow^r ((p', \gamma'), \theta)$, then we can get $p \in G$ and $(\langle p, \gamma \rangle, \theta) \Rightarrow^r (\langle p', \gamma' \rangle, \theta)$. The property holds.

Step. $i > 1$. As done in the proof of part (a) of this lemma, there exist $p_1 \in P$, $\gamma'' \in \Gamma$, $\theta' \subseteq \Delta \cup \Delta_c$ s.t. $((p, \gamma), \theta) \rightarrow_1 ((p_1, \gamma''), \theta') \rightarrow_{i-1} ((p', \gamma'), \theta)$. Then if $((p, \gamma), \theta) \rightarrow^r ((p', \gamma'), \theta)$, either $((p_1, \gamma''), \theta') \rightarrow^r ((p', \gamma'), \theta)$ or $((p, \gamma), \theta) \xrightarrow{1} ((p_1, \gamma''), \theta')$ holds.

In the first case, i.e., $((p_1, \gamma''), \theta') \rightarrow^r ((p', \gamma'), \theta)$, by the induction hypothesis, we have $(\langle p_1, \gamma'' \rangle, \theta') \Rightarrow^r (\langle p', \gamma' u \rangle, \theta)$, hence $(\langle p, \gamma \rangle, \theta) \Rightarrow^r (\langle p', \gamma' u \rangle, \theta)$ holds.

The second case depends on the rule applied to get $((p, \gamma), \theta) \xrightarrow{1} ((p_1, \gamma''), \theta')$ according to Definition 4.

- If this edge corresponds to a transition $r_c : p \stackrel{(\sigma, \sigma')}{\hookrightarrow} p_1 \in \theta$, then $\gamma = \gamma''$, $\theta' = (\theta \setminus \sigma) \cup \sigma'$ and $p \in G$. Since we can obtain $(\langle p, \gamma \rangle, \theta) \Rightarrow_{\mathcal{BP}} (\langle p_1, \gamma \rangle, \theta') \Rightarrow^* (\langle p', \gamma' u w \rangle, \theta)$ from part (a) and $p \in G$, then $(\langle p, \gamma \rangle, \theta) \Rightarrow^r (\langle p_1, \gamma \rangle, \theta') \Rightarrow^* (\langle p', \gamma' u w \rangle, \theta)$. This implies that $(\langle p, \gamma \rangle, \theta) \Rightarrow^r (\langle p', \gamma' v \rangle, \theta)$ for some $v \in \Gamma^*$.
- If this edge corresponds to a transition $r : \langle p, \gamma \rangle \hookrightarrow \langle p_1, \gamma'' \rangle \in \theta \cap \Delta$, then $\theta' = \theta$ and $p \in G$. Since we can obtain $(\langle p, \gamma \rangle, \theta) \Rightarrow_{\mathcal{BP}} (\langle p_1, \gamma'' \rangle, \theta) \Rightarrow^* (\langle p', \gamma' u w \rangle, \theta)$ from part (a) and $p \in G$, then $(\langle p, \gamma \rangle, \theta) \Rightarrow^r (\langle p_1, \gamma'' \rangle, \theta) \Rightarrow^* (\langle p', \gamma' u w \rangle, \theta)$. This implies that $(\langle p, \gamma \rangle, \theta) \Rightarrow^r (\langle p', \gamma' v \rangle, \theta)$ for some $v \in \Gamma^*$.
- If this edge corresponds to a transition $r : \langle p, \gamma \rangle \hookrightarrow \langle p'', \gamma_1 \gamma'' \rangle \in \theta$, then either $p \in G$ or $(\langle p'', \gamma_1 \rangle, \theta) \Rightarrow^r (\langle p_1, \epsilon \rangle, \theta')$ holds. If $p \in G$, then we have $(\langle p, \gamma \rangle, \theta) \Rightarrow^r (\langle p'', \gamma_1 \gamma'' \rangle, \theta)$. Otherwise, $(\langle p'', \gamma_1 \gamma'' w \rangle, \theta) \Rightarrow^r (\langle p_1, \gamma'' w \rangle, \theta')$. In both cases, since we can obtain $(\langle p_1, \gamma'' \rangle, \theta') \Rightarrow^* (\langle p', \gamma' u \rangle, \theta)$ from part (a), we get $(\langle p, \gamma \rangle, \theta) \Rightarrow^r (\langle p_1, \gamma'' \rangle, \theta') \Rightarrow^* (\langle p', \gamma' u \rangle, \theta)$. This implies that $(\langle p, \gamma \rangle, \theta) \Rightarrow^r (\langle p', \gamma' v \rangle, \theta)$ for some $v \in \Gamma^*$.

"$\Leftarrow$": Assume $(\langle p, \gamma \rangle, \theta) \stackrel{i}{\Rightarrow} (\langle p', \gamma' v \rangle, \theta)$. We proceed by induction on $i$.

(a) Basis. $i = 0$. In this case, $v = \epsilon$ and $(\langle p, \gamma \rangle, \theta) = (\langle p', \gamma' \rangle, \theta)$, then $((p, \gamma), \theta) \rightarrow^* ((p', \gamma'), \theta)$ holds.

Step. $i > 0$. Then there exist $p_1 \in P$, $u \in \Gamma^*$ and $\theta' \subseteq \Delta \cup \Delta_c$ such that $(\langle p, \gamma \rangle, \theta) \Rightarrow (\langle p_1, u \rangle, \theta') \stackrel{i-1}{\Rightarrow} (\langle p', \gamma' v \rangle, \theta)$. There are 2 cases:

1. Case $\theta' = \theta$: There must exist a rule $r : \langle p, \gamma \rangle \hookrightarrow \langle p_1, u \rangle \in \Delta$ such that $r \in \theta'$ and $|u| \geq 1$. Let $l$ denote the minimal length of the stack on the path from $(\langle p_1, u \rangle, \theta)$ to $(\langle p', \gamma' v \rangle, \theta)$. Then $u$ can be written as $u'' \gamma_1 u'$ where $|u'| = l - 1$ ($u'$ will remain on the stack for the path). Furthermore, there exists $p'''$ such that $(\langle p_1, u'' \rangle, \theta) \Rightarrow^* (\langle p''', \epsilon \rangle, \theta'')$ for some $\theta'' \subseteq (\Delta_c \cup \Delta)$. We have $(\langle p, \gamma \rangle, \theta) \stackrel{k}{\Rightarrow} (\langle p''', \gamma_1 u' \rangle, \theta'')$ for $k < i$. By the induction on $i$, we have $((p, \gamma), \theta) \rightarrow^* ((p''', \gamma_1), \theta'')$. Because $u'$ has to remain on the stack for the rest of the path, $v$ is of the form $v' u'$ for some $v' \in \Gamma^*$. That means $(\langle p''', \gamma_1 \rangle, \theta'') \stackrel{j}{\Rightarrow} (\langle p', \gamma' v' \rangle, \theta)$ for $j < i$. By the induction hypothesis, $((p''', \gamma_1), \theta'') \rightarrow^* ((p', \gamma'), \theta)$ holds. Moreover, we have $((p, \gamma), \theta) \rightarrow^* ((p''', \gamma_1), \theta'')$, hence $((p, \gamma), \theta) \rightarrow^* ((p', \gamma'), \theta)$.
2. Case $\theta' \neq \theta$: There must be a rule $r_c : p \stackrel{(\sigma, \sigma')}{\hookrightarrow} p_1 \in \Delta_c$ such that $r_c \in \theta$ and $\sigma \cap \theta \neq \emptyset$, then $\theta' = (\theta \setminus \sigma) \cup \sigma'$. After the execution of $r_c$, the content of the stack will remain the same, thus $u = \gamma$. Then $(\langle p, \gamma \rangle, \theta) \Rightarrow (\langle p_1, \gamma \rangle, \theta') \stackrel{i-1}{\Rightarrow} (\langle p', \gamma' v \rangle, \theta)$.
By the induction hypothe-sis to ( (cid:104) p , γ (cid:105) , θ (cid:48) ) i − ⇒ ( (cid:104) p (cid:48) , γ (cid:48) v (cid:105) , θ ), we can obtain that (( p , γ ) , θ (cid:48) ) → ∗ (( p (cid:48) , γ (cid:48) ) , θ ). Since ( (cid:104) p, γ (cid:105) , θ ) ⇒ ( (cid:104) p , γ (cid:105) , θ (cid:48) ), then we can have a path(( p, γ ) , θ ) → (( p , γ ) , θ (cid:48) ) → ∗ (( p (cid:48) , γ (cid:48) ) , θ ) that implies (( p, γ ) , θ ) → ∗ (( p (cid:48) , γ (cid:48) ) , θ ). The property holds.(b) ( (cid:104) p, γ (cid:105) , θ ) ⇒ r ( (cid:104) p, γ (cid:48) v (cid:105) , θ ) is impossible in 0 steps. Basis. i = 1. ( (cid:104) p, γ (cid:105) , θ ) ⇒ r ( (cid:104) p, γ (cid:105) , θ ), then p ∈ G . Thus, (( p, γ ) , θ ) → r (( p, γ ) , θ ) holds. Step. i >

1. ( (cid:104) p, γ (cid:105) , θ ) ⇒ r ( (cid:104) p (cid:48) , γ (cid:48) v (cid:105) , θ ) holds, then there exist p ∈ P, u ∈ Γ ∗ and θ (cid:48) ⊆ ∆ ∪ ∆ c such that ( (cid:104) p, γ (cid:105) , θ ) ⇒ ( (cid:104) p , u (cid:105) , θ (cid:48) ) i − ⇒ ( (cid:104) p (cid:48) , γ (cid:48) v (cid:105) , θ ).Thus, either ( (cid:104) p, γ (cid:105) , θ ) ⇒ r ( (cid:104) p , u (cid:105) , θ (cid:48) ) or ( (cid:104) p , u (cid:105) , θ (cid:48) ) ⇒ r ( (cid:104) p (cid:48) , γ (cid:48) v (cid:105) , θ ) holds.The ﬁrst case implies p ∈ G. There are 2 cases:1. Case θ (cid:48) = θ : then as in the previous proof of part (a), we can have apath (( p, γ ) , θ ) → ∗ (( p (cid:48)(cid:48)(cid:48) , γ ) , θ (cid:48)(cid:48) ) → ∗ (( p (cid:48) , γ (cid:48) ) , θ ). Since p ∈ G , we getby Deﬁnition 4 (( p, γ ) , θ ) → ∗ (( p (cid:48)(cid:48)(cid:48) , γ ) , θ (cid:48)(cid:48) ) → ∗ (( p (cid:48) , γ (cid:48) ) , θ ). Thus, wehave that (( p, γ ) , θ ) → r (( p (cid:48) , γ (cid:48) ) , θ ). The property holds.2. Case θ (cid:48) (cid:54) = θ : then as in the previous proof of part (a), we can havea path (( p, γ ) , θ ) → (( p , γ ) , θ (cid:48) ) → ∗ (( p (cid:48) , γ (cid:48) ) , θ ). Since p ∈ G , we get(( p, γ ) , θ ) −→ (( p , γ ) , θ (cid:48) ) → ∗ (( p (cid:48) , γ (cid:48) ) , θ ). Thus, we have that (( p, γ ) , θ ) → r (( p (cid:48) , γ (cid:48) ) , θ ). The property holds.In the second case, ( (cid:104) p , u (cid:105) , θ (cid:48) ) ⇒ r ( (cid:104) p (cid:48) , γ (cid:48) v (cid:105) , θ ) holds. As previously, thereare 2 cases:1. Case θ (cid:48) = θ : then as in case (a) we have ( (cid:104) p , u (cid:105) , θ ) ⇒ ∗ ( (cid:104) p (cid:48)(cid:48)(cid:48) , γ u (cid:48) (cid:105) , θ (cid:48)(cid:48) )and ( (cid:104) p (cid:48)(cid:48)(cid:48) , γ (cid:105) , θ (cid:48)(cid:48) ) ⇒ ∗ ( (cid:104) p (cid:48) , γ (cid:48) v (cid:48) (cid:105) , θ ). 
If ( (cid:104) p , u (cid:105) , θ ) ⇒ r ( (cid:104) p (cid:48) , γ (cid:48) v (cid:105) , θ ),then either ( (cid:104) p , u (cid:105) , θ ) ⇒ r ( (cid:104) p (cid:48)(cid:48)(cid:48) , γ u (cid:48) (cid:105) , θ (cid:48)(cid:48) ) or ( (cid:104) p (cid:48)(cid:48)(cid:48) , γ (cid:105) , θ (cid:48)(cid:48) ) ⇒ r ( (cid:104) p (cid:48) , γ (cid:48) v (cid:48) (cid:105) , θ ).- If ( (cid:104) p , u (cid:105) , θ ) ⇒ r ( (cid:104) p (cid:48)(cid:48)(cid:48) , γ u (cid:48) (cid:105) , θ (cid:48)(cid:48) ), let u (cid:48)(cid:48) ∈ Γ ∗ s.t. u = u (cid:48)(cid:48) γ u (cid:48) and ( (cid:104) p , u (cid:48)(cid:48) (cid:105) , θ ) ⇒ r ( (cid:104) p (cid:48)(cid:48)(cid:48) , (cid:15) (cid:105) , θ (cid:48)(cid:48) ), then, we have (( p, γ ) , θ ) → r (( p (cid:48)(cid:48)(cid:48) , γ ) , θ (cid:48)(cid:48) ). We have ( (cid:104) p, γ (cid:105) , θ ) k ⇒ ( (cid:104) p (cid:48)(cid:48)(cid:48) , γ u (cid:48) (cid:105) , θ (cid:48)(cid:48) ) for k < i . Bythe induction on i , we have (( p, γ ) , θ ) → ∗ (( p (cid:48)(cid:48)(cid:48) , γ ) , θ (cid:48)(cid:48) ). Because u (cid:48) has to remain on the stack for the rest of the path, v is of the form v (cid:48) u (cid:48) for some v (cid:48) ∈ Γ ∗ . That means ( (cid:104) p (cid:48)(cid:48)(cid:48) , γ (cid:105) , θ (cid:48)(cid:48) ) j ⇒ ( (cid:104) p (cid:48) , γ (cid:48) v (cid:48) (cid:105) , θ ) for j

We can now prove Theorem 2.

Proof:

Let $((p,\gamma),\theta)$ be a repeating head, then there exist some $v \in \Gamma^{*}$, $\theta \subseteq \Delta_c \cup \Delta$ such that $(\langle p,\gamma\rangle,\theta) \Rightarrow^{r} (\langle p,\gamma v\rangle,\theta)$. By Lemma 1, this is the case if and only if $((p,\gamma),\theta) \to^{r} ((p,\gamma),\theta)$. From the definition of $\to^{r}$, that means that there exist heads $((p_1,\gamma_1),\theta')$ and $((p_2,\gamma_2),\theta'')$ such that $((p,\gamma),\theta) \to^{*} ((p_1,\gamma_1),\theta') \xrightarrow{1} ((p_2,\gamma_2),\theta'') \to^{*} ((p,\gamma),\theta)$. Then $((p,\gamma),\theta)$, $((p_1,\gamma_1),\theta')$ and $((p_2,\gamma_2),\theta'')$ are all in the same loop with a 1-labelled edge. Conversely, whenever $((p,\gamma),\theta)$ is in a component with such an edge, $((p,\gamma),\theta) \to^{r} ((p,\gamma),\theta)$ holds, then Lemma 1 implies that $(\langle p,\gamma\rangle,\theta) \Rightarrow^{r} (\langle p,\gamma v\rangle,\theta)$, which means that $((p,\gamma),\theta)$ is a repeating head. $\Box$

BP-automata

To compute $G$, we need to be able to compute predecessors of configurations of the form $(\langle p',\epsilon\rangle,\theta')$, and to determine whether these predecessors were backward-reachable using some control points in $G$ (item 3 in Definition 4). To solve this question, we will label configurations $(\langle p'',w\rangle,\theta)$ s.t. $(\langle p'',w\rangle,\theta) \Rightarrow^{*} (\langle p',\epsilon\rangle,\theta')$ by 1 if this path went through an accepting location in $G$, i.e., if $(\langle p'',w\rangle,\theta) \Rightarrow^{r} (\langle p',\epsilon\rangle,\theta')$, and by 0 if not. To this aim, we define a labelled configuration as a tuple $[(\langle p,w\rangle,\theta),b]$, s.t. $(\langle p,w\rangle,\theta)$ is a configuration and $b \in \{0,1\}$.

Multi-automata were introduced in [2,10] to finitely represent regular infinite sets of configurations of a PDS. A labelled configuration $c = [(\langle p,w\rangle,\theta),b]$ of a SM-PDS involves a PDS configuration $\langle p,w\rangle$, together with the current set of transition rules (phase) $\theta$, and a boolean $b$. In order to take into account the phases $\theta$ and these new 0/1 labels, we define labelled BP-automata as follows:

Definition 5.

Let $BP = (P, \Gamma, \Delta, \Delta_c, G)$ be a SM-BPDS. A labelled BP-automaton is a tuple $A = (Q, \Gamma, T, I, F)$ where $\Gamma$ is the automaton alphabet, $Q$ is a finite set of states, $I \subseteq P \times 2^{\Delta \cup \Delta_c} \subseteq Q$ is the set of initial states, $T \subset Q \times \big((\Gamma \cup \{\epsilon\}) \times \{0,1\}\big) \times Q$ is the set of transitions, and $F \subseteq Q$ is the set of final states. If $\big(q,[\gamma,b],q'\big) \in T$, we write $q \xrightarrow{[\gamma,b]}_T q'$. We extend this notation in the obvious way to sequences of symbols:

(1) $\forall q \in Q$, $q \xrightarrow{[\epsilon,0]}_T q$, and

(2) $\forall q, q' \in Q$, $\forall b \in \{0,1\}$, $\forall w \in \Gamma^{*}$ with $w = \gamma_1 \ldots \gamma_{n+1}$, $q \xrightarrow{[w,b]}_T q'$ iff $\exists q_1, \ldots, q_n \in Q$, $b_1, \ldots, b_{n+1} \in \{0,1\}$ such that $b = b_1 \vee b_2 \vee \ldots \vee b_{n+1}$ and $q \xrightarrow{[\gamma_1,b_1]}_T q_1 \xrightarrow{[\gamma_2,b_2]}_T q_2 \cdots q_n \xrightarrow{[\gamma_{n+1},b_{n+1}]}_T q'$.

If $q \xrightarrow{[w,b]}_T q'$ holds, we say that $q \xrightarrow{[w,b]}_T q'$ and $q \xrightarrow{[\gamma_1,b_1]}_T q_1 \xrightarrow{[\gamma_2,b_2]}_T q_2 \cdots q_n \xrightarrow{[\gamma_{n+1},b_{n+1}]}_T q'$ is a path of $A$.

A labelled configuration $[(\langle p,w\rangle,\theta),b]$ is accepted by the automaton $A$ iff there exists a path $(p,\theta) \xrightarrow{[\gamma_0,b_0]}_T q_1 \xrightarrow{[\gamma_1,b_1]}_T q_2 \cdots q_n \xrightarrow{[\gamma_n,b_n]}_T q_{n+1}$ in $A$ such that $w = \gamma_0\gamma_1 \cdots \gamma_n$, $b = b_0 \vee b_1 \vee \ldots \vee b_n$, $(p,\theta) \in I$, and $q_{n+1} \in F$. Let $L(A)$ be the set of labelled configurations accepted by $A$.
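The acceptance condition of Definition 5 can be illustrated with a minimal Python sketch. The encoding is an assumption made for illustration only: a transition is a tuple `(q, gamma, b, q2)`, a state is any hashable value (initial states are pairs `(p, theta)`), and $\epsilon$-transitions are not handled. The function name `accepted_labels` is hypothetical and is not taken from the authors' tool.

```python
def accepted_labels(transitions, initial, final, start, word):
    """Return every label b such that [(<p, word>, theta), b] is accepted,
    where start = (p, theta). Assumes no epsilon-transitions."""
    if start not in initial:
        return set()
    # Each frontier entry pairs a state with the disjunction of the labels read so far.
    frontier = {(start, 0)}
    for gamma in word:
        frontier = {
            (q2, acc | b)
            for (q, acc) in frontier
            for (q1, g, b, q2) in transitions
            if q1 == q and g == gamma
        }
    return {acc for (q, acc) in frontier if q in final}


# Tiny illustrative automaton (all names are made up for the example).
s = ("p", frozenset({"r1"}))
T = {(s, "a", 0, "m"), ("m", "b", 1, "qf"), (s, "a", 0, "qf")}
print(accepted_labels(T, {s}, {"qf"}, s, ["a", "b"]))
```

The disjunction `acc | b` mirrors $b = b_0 \vee b_1 \vee \ldots \vee b_n$: a path is labelled 1 as soon as one of its transitions is.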
$pre^{*}\big((\langle p',\epsilon\rangle,\theta')\big)$

Given a configuration of the form $(\langle p',\epsilon\rangle,\theta')$, our goal is to compute a labelled BP-automaton $A_{pre^{*}}\big((\langle p',\epsilon\rangle,\theta')\big)$ that accepts labelled configurations of the form $[c,b]$ where $c$ is a configuration and $b \in \{0,1\}$ such that $c \Rightarrow^{*} (\langle p',\epsilon\rangle,\theta')$ (i.e., $c \in pre^{*}\big((\langle p',\epsilon\rangle,\theta')\big)$) and $b = 1$ iff this path went through final control points, i.e., $c \Rightarrow^{r} (\langle p',\epsilon\rangle,\theta')$. Otherwise, $b = 0$.

Let $p \in P$, we define $B(p) = 1$ if $p \in G$ and $B(p) = 0$ otherwise. $A_{pre^{*}}\big((\langle p',\epsilon\rangle,\theta')\big) = (Q, \Gamma, T, I, F)$ is computed as follows: Initially, $Q = I = F = \{(p',\theta')\}$ and $T = \emptyset$. We add to $T$ transitions as follows:

$\alpha_1$: If $r = \langle p,\gamma\rangle \hookrightarrow \langle p_1,w\rangle \in \Delta$ and there exists in $T$ a path $(p_1,\theta) \xrightarrow{[w,b]}_T q$ (in case $|w| = 0$, we have $w = \epsilon$) with $r \in \theta$, then add $(p,\theta)$ to $I$, and $\big((p,\theta),[\gamma,B(p) \vee b],q\big)$ to $T$.

$\alpha_2$: If $r = p \xhookrightarrow{(\sigma,\sigma')} p_1 \in \Delta_c$ and there exists in $T$ a transition $(p_1,\theta) \xrightarrow{[\gamma,b]}_T q$ with $r \in \theta$, where $\gamma \in \Gamma$, then add $(p,\theta')$ to $I$, and $\big((p,\theta'),[\gamma,B(p) \vee b],q\big)$ to $T$, for $\theta'$ such that $\theta = \theta' \setminus \sigma \cup \sigma'$.

The procedure above terminates since there is a finite number of states and phases. Note that by construction, $F = \{(p',\theta')\}$, and, since initially $Q = \{(p',\theta')\}$, states of $A_{pre^{*}}\big((\langle p',\epsilon\rangle,\theta')\big)$ are all of the form $(p,\theta)$ for $p \in P$ and $\theta \subseteq \Delta \cup \Delta_c$.

Let us explain the intuition behind rule ($\alpha_1$). Let $r = \langle p,\gamma\rangle \hookrightarrow \langle p_1,w\rangle \in \Delta$. Let $c = (\langle p_1,ww'\rangle,\theta)$ and $c' = (\langle p,\gamma w'\rangle,\theta)$. Then, if $c \Rightarrow^{*} (\langle p',\epsilon\rangle,\theta')$, then necessarily, $c' \Rightarrow^{*} (\langle p',\epsilon\rangle,\theta')$. Moreover, $c' \Rightarrow^{r} (\langle p',\epsilon\rangle,\theta')$ iff either $c \Rightarrow^{r} (\langle p',\epsilon\rangle,\theta')$ or $p \in G$ (i.e. $B(p) = 1$). Thus, we would like that if the automaton $A_{pre^{*}}\big((\langle p',\epsilon\rangle,\theta')\big)$ accepts the labelled configuration $[c,b]$ (where $b = 1$ means $c \Rightarrow^{r} (\langle p',\epsilon\rangle,\theta')$), then it should also accept the labelled configuration $[c',b \vee B(p)]$ ($b \vee B(p) = 1$ means $c' \Rightarrow^{r} (\langle p',\epsilon\rangle,\theta')$). Thus, if the automaton $A_{pre^{*}}\big((\langle p',\epsilon\rangle,\theta')\big)$ contains a path of the form $\pi = (p_1,\theta) \xrightarrow{[w,b_1]}_T q \xrightarrow{[w',b_2]}_T q_f$ where $q_f \in F$ that accepts the labelled configuration $[c,b]$, then the automaton should also accept the labelled configuration $[c',b \vee B(p)]$. This configuration is accepted by the run $(p,\theta) \xrightarrow{[\gamma,B(p) \vee b_1]}_T q \xrightarrow{[w',b_2]}_T q_f$ added by rule ($\alpha_1$).

Rule ($\alpha_2$) deals with modifying rules: Let $r = p \xhookrightarrow{(r_1,r_2)} p_1 \in \Delta_c$. Let $c = (\langle p_1,\gamma w'\rangle,\theta)$ and $c' = (\langle p,\gamma w'\rangle,\theta'')$ s.t. $\theta = \theta'' \setminus \{r_1\} \cup \{r_2\}$. Then, if $c \Rightarrow^{*} (\langle p',\epsilon\rangle,\theta')$, then necessarily, $c' \Rightarrow^{*} (\langle p',\epsilon\rangle,\theta')$. Moreover, $c' \Rightarrow^{r} (\langle p',\epsilon\rangle,\theta')$ iff either $c \Rightarrow^{r} (\langle p',\epsilon\rangle,\theta')$ or $p \in G$ (i.e. $B(p) = 1$). Thus, we need to impose that if the automaton $A_{pre^{*}}\big((\langle p',\epsilon\rangle,\theta')\big)$ contains a path of the form $(p_1,\theta) \xrightarrow{[\gamma,b_1]}_T q \xrightarrow{[w',b_2]}_T q_f$ (where $q_f \in F$) that accepts the labelled configuration $[c,b]$, $b = b_1 \vee b_2$ ($b = 1$ means $c \Rightarrow^{r} (\langle p',\epsilon\rangle,\theta')$), then necessarily, the automaton $A_{pre^{*}}\big((\langle p',\epsilon\rangle,\theta')\big)$ should also accept the labelled configuration $[c',b \vee B(p)]$. This configuration is accepted by the run $(p,\theta'') \xrightarrow{[\gamma,B(p) \vee b_1]}_T q \xrightarrow{[w',b_2]}_T q_f$ added by rule ($\alpha_2$).

Before proving that our construction is correct, we introduce the following definition:

Definition 6.

Let $A_{pre^{*}}\big((\langle p',\epsilon\rangle,\theta')\big) = (Q, \Gamma, T, I, F)$ be the labelled BP-automaton computed by the saturation procedure above. In this section, we use $\to^{i}_T$ to denote the transition relation of $A_{pre^{*}}\big((\langle p',\epsilon\rangle,\theta')\big)$ obtained after adding $i$ transitions using the saturation procedure above. Note that initially $Q = \{(p',\theta')\}$, and that rules ($\alpha_1$) and ($\alpha_2$) add at step $i$ only transitions of the form $(p,\theta) \xrightarrow{\gamma}_T q$ for a state $q$ that is already in the automaton at step $i - 1$. Hence, states of $A_{pre^{*}}\big((\langle p',\epsilon\rangle,\theta')\big)$ are all of the form $(p,\theta)$ for $p \in P$ and $\theta \subseteq \Delta \cup \Delta_c$. We can show that:

Lemma 2.

Let $p, p'' \in P$ and $\theta, \theta'' \subseteq \Delta \cup \Delta_c$. Let $w \in \Gamma^{*}$ and $b \in \{0,1\}$. If a path $(p,\theta) \xrightarrow{[w,b]}_T (p'',\theta'')$ is in $A_{pre^{*}}\big((\langle p',\epsilon\rangle,\theta')\big)$, then $(\langle p,w\rangle,\theta) \Rightarrow^{*} (\langle p'',\epsilon\rangle,\theta'')$. Moreover, if $b = 1$, then $(\langle p,w\rangle,\theta) \Rightarrow^{r} (\langle p'',\epsilon\rangle,\theta'')$.

Proof:

Initially, the automaton contains no transitions. Let $i$ be an index such that $(p,\theta) \xrightarrow{[w,b]}{}^{i}_T (p'',\theta'')$ holds. We proceed by induction on $i$.

Basis. $i = 0$, then $(p'',\theta'') \xrightarrow{[\epsilon,0]}_T (p'',\theta'')$. This means $p'' = p'$, $\theta'' = \theta'$. Since initially $Q = \{(p',\theta')\}$, then $(\langle p'',\epsilon\rangle,\theta'') \Rightarrow^{*} (\langle p'',\epsilon\rangle,\theta'')$ always holds.

Step. $i > 0$. Let $t = \big((p_1,\theta_1),[\gamma,b_1],(p_0,\theta_0)\big)$ be the $i$-th transition added to $A_{pre^{*}}$ and $j$ be the number of times that $t$ is used in the path $(p,\theta) \xrightarrow{[w,b]}{}^{i}_T (p'',\theta'')$. The proof is by induction on $j$. If $j = 0$, then we have $(p,\theta) \xrightarrow{[w,b]}{}^{i-1}_T (p'',\theta'')$ in the automaton, and applying the induction hypothesis (induction on $i$) we obtain $(\langle p,w\rangle,\theta) \Rightarrow^{*} (\langle p'',\epsilon\rangle,\theta'')$. So assume that $j > 0$. Then, there exist $u, v \in \Gamma^{*}$, $b', b'' \in \{0,1\}$ such that $w = u\gamma v$, $b = b' \vee b_1 \vee b''$ and

$(p,\theta) \xrightarrow{[u,b']}{}^{i-1}_T (p_1,\theta_1) \xrightarrow{[\gamma,b_1]}{}^{i}_T (p_0,\theta_0) \xrightarrow{[v,b'']}{}^{i}_T (p'',\theta'')$  (1)

The application of the induction hypothesis (induction on $i$) to $(p,\theta) \xrightarrow{[u,b']}{}^{i-1}_T (p_1,\theta_1)$ gives that

$(\langle p,u\rangle,\theta) \Rightarrow^{*} (\langle p_1,\epsilon\rangle,\theta_1)$; moreover, if $b' = 1$, $(\langle p,u\rangle,\theta) \Rightarrow^{r} (\langle p_1,\epsilon\rangle,\theta_1)$  (2)

There are 2 cases depending on whether transition $t$ was added by saturation rule $\alpha_1$ or $\alpha_2$.

1. Case $t$ was added by rule $\alpha_1$: There exist $p_2 \in P$ and $w_2 \in \Gamma^{*}$ such that

$r = \langle p_1,\gamma\rangle \hookrightarrow \langle p_2,w_2\rangle \in \Delta \cap \theta_1$  (3)

and $A_{pre^{*}}$ contains the following path:

$\pi' = (p_2,\theta_1) \xrightarrow{[w_2,b_2]}{}^{i-1}_T (p_0,\theta_0) \xrightarrow{[v,b'']}{}^{i}_T (p'',\theta'')$, $b_1 = b_2 \vee B(p_1)$  (4)

Applying the transition rule $r$, we get that

$(\langle p_1,\gamma v\rangle,\theta_1) \Rightarrow (\langle p_2,w_2 v\rangle,\theta_1)$  (5)

By induction on $j$ (since transition $t$ is used $j - 1$ times in $\pi'$), we get from (4) that

$(\langle p_2,w_2 v\rangle,\theta_1) \Rightarrow^{*} (\langle p'',\epsilon\rangle,\theta'')$; moreover, if $b_2 \vee b'' = 1$, $(\langle p_2,w_2 v\rangle,\theta_1) \Rightarrow^{r} (\langle p'',\epsilon\rangle,\theta'')$  (6)

Putting (2), (5) and (6) together, we can obtain that

$(\langle p,w\rangle,\theta) = (\langle p,u\gamma v\rangle,\theta) \Rightarrow^{*} (\langle p_1,\gamma v\rangle,\theta_1) \Rightarrow (\langle p_2,w_2 v\rangle,\theta_1) \Rightarrow^{*} (\langle p'',\epsilon\rangle,\theta'')$

Furthermore, if $b = b' \vee b_1 \vee b'' = 1$, then $b' = 1$ or $b_1 \vee b'' = 1$.

For the first case, $b' = 1$, we have $(\langle p,u\rangle,\theta) \Rightarrow^{r} (\langle p_1,\epsilon\rangle,\theta_1)$ from (2). Thus, we can obtain that $(\langle p,u\gamma v\rangle,\theta) \Rightarrow^{r} (\langle p_1,\gamma v\rangle,\theta_1) \Rightarrow^{*} (\langle p'',\epsilon\rangle,\theta'')$, i.e. $(\langle p,w\rangle,\theta) \Rightarrow^{r} (\langle p'',\epsilon\rangle,\theta'')$.

The second case $b_1 \vee b'' = 1$, i.e. $B(p_1) \vee b_2 \vee b'' = 1$, implies that $B(p_1) = 1$ (that means $p_1 \in G$ and $(\langle p_1,\gamma v\rangle,\theta_1) \Rightarrow^{r} (\langle p'',\epsilon\rangle,\theta'')$) or $b_2 \vee b'' = 1$ (that implies $(\langle p_2,w_2 v\rangle,\theta_1) \Rightarrow^{r} (\langle p'',\epsilon\rangle,\theta'')$ from (6)). Therefore, $(\langle p,w\rangle,\theta) \Rightarrow^{r} (\langle p'',\epsilon\rangle,\theta'')$.

2. Case $t$ was added by rule $\alpha_2$: there exist $p_2 \in P$ and $\theta_2 \subseteq \Delta \cup \Delta_c$ such that

$r = p_1 \xhookrightarrow{(\sigma,\sigma')} p_2 \in \Delta_c \cap \theta_1$, $\theta_2 = (\theta_1 \setminus \sigma) \cup \sigma'$  (7)

and the following path is in the current automaton (a self-modifying rule does not change the stack) with $r \in \theta_1$:

$(p_2,\theta_2) \xrightarrow{[\gamma,b_1']}{}^{i-1}_T (p_0,\theta_0) \xrightarrow{[v,b'']}{}^{i}_T (p'',\theta'')$, $b_1 = B(p_1) \vee b_1'$  (8)

Applying the transition rule, we can get from (7) that

$(\langle p_1,\gamma v\rangle,\theta_1) \Rightarrow (\langle p_2,\gamma v\rangle,\theta_2)$  (9)

We can apply the induction hypothesis (on $j$) to (8), and obtain

$(\langle p_2,\gamma v\rangle,\theta_2) \Rightarrow^{*} (\langle p'',\epsilon\rangle,\theta'')$; moreover, if $b_1' \vee b'' = 1$, $(\langle p_2,\gamma v\rangle,\theta_2) \Rightarrow^{r} (\langle p'',\epsilon\rangle,\theta'')$  (10)

From (2), (9) and (10), we get

$(\langle p,w\rangle,\theta) = (\langle p,u\gamma v\rangle,\theta) \Rightarrow^{*} (\langle p_1,\gamma v\rangle,\theta_1) \Rightarrow (\langle p_2,\gamma v\rangle,\theta_2) \Rightarrow^{*} (\langle p'',\epsilon\rangle,\theta'')$

Furthermore, if $b = b' \vee b_1 \vee b'' = 1$, then $b' = 1$ or $b_1 \vee b'' = 1$.

For the first case, $b' = 1$, we have $(\langle p,u\rangle,\theta) \Rightarrow^{r} (\langle p_1,\epsilon\rangle,\theta_1)$ from (2). Thus, we can obtain that $(\langle p,u\gamma v\rangle,\theta) \Rightarrow^{r} (\langle p_1,\gamma v\rangle,\theta_1) \Rightarrow^{*} (\langle p'',\epsilon\rangle,\theta'')$, i.e. $(\langle p,w\rangle,\theta) \Rightarrow^{r} (\langle p'',\epsilon\rangle,\theta'')$.

The second case $b_1 \vee b'' = 1$, i.e. $B(p_1) \vee b_1' \vee b'' = 1$, implies that $B(p_1) = 1$ (that means $p_1 \in G$ and $(\langle p_1,\gamma v\rangle,\theta_1) \Rightarrow^{r} (\langle p'',\epsilon\rangle,\theta'')$) or $b_1' \vee b'' = 1$ (that implies $(\langle p_2,\gamma v\rangle,\theta_2) \Rightarrow^{r} (\langle p'',\epsilon\rangle,\theta'')$ from (10)), i.e. $(\langle p,w\rangle,\theta) \Rightarrow^{r} (\langle p'',\epsilon\rangle,\theta'')$. Therefore, we can get that if $b = 1$, then $(\langle p,w\rangle,\theta) \Rightarrow^{r} (\langle p'',\epsilon\rangle,\theta'')$. $\Box$

Lemma 3.

If there is a labelled configuration $[(\langle p,w\rangle,\theta),b]$ such that $(\langle p,w\rangle,\theta) \Rightarrow^{*} (\langle p',\epsilon\rangle,\theta')$, then there is a path $(p,\theta) \xrightarrow{[w,b]}_T (p',\theta')$ in $A_{pre^{*}}\big((\langle p',\epsilon\rangle,\theta')\big)$. Moreover, if $(\langle p,w\rangle,\theta) \Rightarrow^{r} (\langle p',\epsilon\rangle,\theta')$, then $b = 1$.

Proof:

Assume $(\langle p,w\rangle,\theta) \Rightarrow^{i} (\langle p',\epsilon\rangle,\theta')$. We proceed by induction on $i$.

Basis. $i = 0$. Then $\theta = \theta'$, $p' = p$ and $w = \epsilon$. Initially, we have that $Q = \{(p',\theta')\}$, therefore, by the definition of $\to_T$, we have $(p',\theta') \xrightarrow{\epsilon}_T (p',\theta')$. We cannot have $(\langle p',\epsilon\rangle,\theta') \Rightarrow^{r} (\langle p',\epsilon\rangle,\theta')$ in 0 steps.

Step. $i > 0$. Then there exists a configuration $(\langle p'',u\rangle,\theta'')$ such that

$(\langle p,w\rangle,\theta) \Rightarrow (\langle p'',u\rangle,\theta'') \Rightarrow^{i-1} (\langle p',\epsilon\rangle,\theta')$

We apply the induction hypothesis to $(\langle p'',u\rangle,\theta'') \Rightarrow^{i-1} (\langle p',\epsilon\rangle,\theta')$, and obtain that there exists in $A_{pre^{*}}\big((\langle p',\epsilon\rangle,\theta')\big)$ a path $(p'',\theta'') \xrightarrow{[u,b'']}_T (p',\theta')$. If $(\langle p'',u\rangle,\theta'') \Rightarrow^{r} (\langle p',\epsilon\rangle,\theta')$, then $b'' = 1$.

Let $(p_0,\theta_0)$ be a state of $A_{pre^{*}}$. Let $w_1, u_1 \in \Gamma^{*}$, $\gamma \in \Gamma$, $b_1'', b_2'' \in \{0,1\}$ be such that $w = \gamma w_1$, $u = u_1 w_1$, $b'' = b_1'' \vee b_2''$ and

$(p'',\theta'') \xrightarrow{[u_1,b_1'']}_T (p_0,\theta_0) \xrightarrow{[w_1,b_2'']}_T (p',\theta')$  (1)

There are two cases depending on which rule is applied to get $(\langle p,w\rangle,\theta) \Rightarrow (\langle p'',u\rangle,\theta'')$.

1. Case $(\langle p,w\rangle,\theta) \Rightarrow (\langle p'',u\rangle,\theta'')$ is obtained by a rule of the form $\langle p,\gamma\rangle \hookrightarrow \langle p'',u_1\rangle \in \Delta$. In this case, $\theta'' = \theta$. By the saturation rule $\alpha_1$, we have

$(p,\theta'') \xrightarrow{[\gamma,b_0]}_T (p_0,\theta_0)$, $b_0 = B(p) \vee b_1''$  (2)

Putting (1) and (2) together, we can obtain that

$\pi = (p,\theta'') \xrightarrow{[\gamma,b_0]}_T (p_0,\theta_0) \xrightarrow{[w_1,b_2'']}_T (p',\theta')$  (3)

Thus, $(p,\theta'') \xrightarrow{[\gamma w_1,\, b_0 \vee b_2'']}_T (p',\theta')$, i.e. $(p,\theta) \xrightarrow{[w,b]}_T (p',\theta')$ where $b = b_0 \vee b_2''$.

2. Case $(\langle p,w\rangle,\theta) \Rightarrow (\langle p'',u\rangle,\theta'')$ is obtained by a rule of the form $p \xhookrightarrow{(\sigma,\sigma')} p'' \in \Delta_c$, i.e. $\theta'' \neq \theta$. In this case, $u_1 = \gamma$. By the saturation rule $\alpha_2$, we obtain that

$(p,\theta) \xrightarrow{[\gamma,b_0]}_T (p_0,\theta_0)$ where $\theta'' = \theta \setminus \{r_1\} \cup \{r_2\}$, $b_0 = B(p) \vee b_1''$.  (4)

Putting (1) and (4) together, we have the following path

$(p,\theta) \xrightarrow{[\gamma,b_0]}_T (p_0,\theta_0) \xrightarrow{[w_1,b_2'']}_T (p',\theta')$, i.e. $(p,\theta) \xrightarrow{[w,b]}_T (p',\theta')$ where $b = b_0 \vee b_2''$  (5)

Furthermore, if $(\langle p,w\rangle,\theta) \Rightarrow^{r} (\langle p',\epsilon\rangle,\theta')$, then $(\langle p,w\rangle,\theta) \Rightarrow^{r} (\langle p'',u\rangle,\theta'')$ or $(\langle p'',u\rangle,\theta'') \Rightarrow^{r} (\langle p',\epsilon\rangle,\theta')$.

For the first case, $(\langle p,w\rangle,\theta) \Rightarrow^{r} (\langle p'',u\rangle,\theta'')$, then $p \in G$, i.e. $B(p) = 1$. For the second case, $(\langle p'',u\rangle,\theta'') \Rightarrow^{r} (\langle p',\epsilon\rangle,\theta')$, we can get $b'' = 1$ (from the induction hypothesis). Thus, $b = b_0 \vee b_2'' = B(p) \vee b_1'' \vee b_2'' = B(p) \vee b'' = 1$. Therefore, if $(\langle p,w\rangle,\theta) \Rightarrow^{r} (\langle p',\epsilon\rangle,\theta')$, then we can obtain $b = 1$. $\Box$

From these two lemmas, we get:

Theorem 3.

Let $[c,b]$ be a labelled configuration. Then $[c,b]$ is in $L\big(A_{pre^{*}}\big((\langle p',\epsilon\rangle,\theta')\big)\big)$ iff $c \in pre^{*}\big((\langle p',\epsilon\rangle,\theta')\big)$. Moreover, $c \Rightarrow^{r} (\langle p',\epsilon\rangle,\theta')$ iff $b = 1$.

Proof:

Let $[(\langle p,w\rangle,\theta),b]$ be a labelled configuration such that $(\langle p,w\rangle,\theta) \in pre^{*}\big((\langle p',\epsilon\rangle,\theta')\big)$. Then $(\langle p,w\rangle,\theta) \Rightarrow^{*} (\langle p',\epsilon\rangle,\theta')$. By Lemma 3, we can obtain that there exists a path $(p,\theta) \xrightarrow{[w,b]}_T (p',\theta')$ in $A_{pre^{*}}\big((\langle p',\epsilon\rangle,\theta')\big)$. So $[(\langle p,w\rangle,\theta),b]$ is in $L\big(A_{pre^{*}}\big((\langle p',\epsilon\rangle,\theta')\big)\big)$. Moreover, if $(\langle p,w\rangle,\theta) \Rightarrow^{r} (\langle p',\epsilon\rangle,\theta')$, then $b = 1$.

Conversely, let $[(\langle p,w\rangle,\theta),b]$ be a configuration accepted by $A_{pre^{*}}\big((\langle p',\epsilon\rangle,\theta')\big)$, i.e. there exists a path $(p,\theta) \xrightarrow{[w,b]}_T (p',\theta')$ in $A_{pre^{*}}\big((\langle p',\epsilon\rangle,\theta')\big)$. By Lemma 2, $(\langle p,w\rangle,\theta) \Rightarrow^{*} (\langle p',\epsilon\rangle,\theta')$, i.e. $(\langle p,w\rangle,\theta) \in pre^{*}\big((\langle p',\epsilon\rangle,\theta')\big)$. Moreover, if $b = 1$, $(\langle p,w\rangle,\theta) \Rightarrow^{r} (\langle p',\epsilon\rangle,\theta')$. $\Box$

$G$

Based on the definition of the Head Reachability Graph $G$, and on Theorem 3, we can compute $G$ as follows. Initially, $G$ has no edges.
$\alpha'_1$: if $r_c : p \xhookrightarrow{(\sigma,\sigma')} p' \in \Delta_c$, then for every phase $\theta$ such that $r_c \in \theta$ and every $\gamma \in \Gamma$, we add the edge $((p,\gamma),\theta) \xrightarrow{B(p)} ((p',\gamma),\theta_1)$ to the graph $G$, where $\theta_1 = \theta \setminus \sigma \cup \sigma'$.

$\alpha'_2$: if $r : \langle p,\gamma\rangle \hookrightarrow \langle p_1,\gamma_1\rangle \in \Delta$, then for every phase $\theta$ such that $r \in \theta$, we add the edge $((p,\gamma),\theta) \xrightarrow{B(p)} ((p_1,\gamma_1),\theta)$ to the graph $G$.

$\alpha'_3$: if $r : \langle p,\gamma\rangle \hookrightarrow \langle p_1,\gamma_1\gamma'\rangle \in \Delta$, then for every phase $\theta$ such that $r \in \theta$, we add to the graph $G$ the edge $((p,\gamma),\theta) \xrightarrow{B(p)} ((p_1,\gamma_1),\theta)$. Moreover, for every control point $p' \in P$ and phase $\theta'$ such that $A_{pre^{*}}\big((\langle p',\epsilon\rangle,\theta')\big)$ contains a transition of the form $t = (p_1,\theta) \xrightarrow{[\gamma_1,b]}_T (p',\theta')$, we add to the graph $G$ the edge $((p,\gamma),\theta) \xrightarrow{b \vee B(p)} ((p',\gamma'),\theta')$.

Items $\alpha'_1$ and $\alpha'_2$ are obvious. They respectively correspond to item 1 and item 2 of Definition 4 (since $B(p) = 1$ iff $p \in G$). Item $\alpha'_3$ is based on Lemma 1 and on item 3 of Definition 4. Indeed, it follows from Lemma 2 that if $A_{pre^{*}}\big((\langle p',\epsilon\rangle,\theta')\big)$ contains a transition of the form $(p_1,\theta) \xrightarrow{[\gamma_1,b]}_T (p',\theta')$, then $(\langle p_1,\gamma_1\rangle,\theta) \Rightarrow^{*} (\langle p',\epsilon\rangle,\theta')$, and if $b = 1$, then $(\langle p_1,\gamma_1\rangle,\theta) \Rightarrow^{r} (\langle p',\epsilon\rangle,\theta')$. Thus, in this case, the edge $((p,\gamma),\theta) \xrightarrow{b \vee B(p)} ((p',\gamma'),\theta')$ is added to $G$ (item 3 of Definition 4) since $\langle p,\gamma\rangle \hookrightarrow \langle p_1,\gamma_1\gamma'\rangle \in \Delta$.
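The two computational ingredients of this section, the saturation producing $A_{pre^{*}}$ (rules $\alpha_1$, $\alpha_2$) and the search for repeating heads in $G$ (heads on a loop with a 1-labelled edge, Theorem 2), can be sketched in Python as follows. This is only an illustrative sketch under several assumptions: rules are identified by names, a phase is a `frozenset` of rule names, the candidate phases tried in rule $\alpha_2$ are passed explicitly, and a modifying rule is required to belong to the phase it is fired from. The function names `paths`, `saturate` and `repeating_heads` are hypothetical and do not come from the authors' tool.

```python
from collections import defaultdict


def paths(T, start, word):
    """All (state, label) pairs reached from `start` by reading `word` in T."""
    frontier = {(start, 0)}
    for gamma in word:
        frontier = {(q2, acc | b) for (q, acc) in frontier
                    for (q1, g, b, q2) in T if q1 == q and g == gamma}
    return frontier


def saturate(target, delta, delta_c, phases, G):
    """Saturation sketch; target = (p', theta').
    delta:   name -> (p, gamma, p1, w), i.e. <p, gamma> -> <p1, w> (w a tuple)
    delta_c: name -> (p, sigma, sigma2, p1), sigma/sigma2 frozensets of rule names
    phases:  candidate phases tried as the source phase in rule alpha_2
    G:       set of accepting control points"""
    B = lambda p: 1 if p in G else 0
    Q, T = {target}, set()
    changed = True
    while changed:
        changed = False
        new = set()
        # Rule alpha_1: non-modifying rules, the phase is unchanged.
        for name, (p, gamma, p1, w) in delta.items():
            for state in Q:
                if state[0] == p1 and name in state[1]:
                    for q, b in paths(T, state, w):
                        new.add(((p, state[1]), gamma, B(p) | b, q))
        # Rule alpha_2: modifying rules, the stack is unchanged.
        for name, (p, sigma, sigma2, p1) in delta_c.items():
            for (src, gamma, b, q) in T:
                if src[0] != p1:
                    continue
                for prev in phases:
                    if name in prev and (prev - sigma) | sigma2 == src[1]:
                        new.add(((p, prev), gamma, B(p) | b, q))
        if not new <= T:
            T |= new
            Q |= {t[0] for t in new}
            changed = True
    return Q, T


def repeating_heads(edges):
    """Heads lying on a cycle of the graph that contains a 1-labelled edge.
    edges: iterable of (head, label, head) with label in {0, 1}."""
    edges = list(edges)
    succ, nodes = defaultdict(set), set()
    for u, _, v in edges:
        succ[u].add(v)
        nodes |= {u, v}

    def reachable(src):
        seen, stack = {src}, [src]
        while stack:
            n = stack.pop()
            for m in succ[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        return seen

    reach = {n: reachable(n) for n in nodes}
    return {x for x in nodes for (u, lab, v) in edges
            if lab == 1 and u in reach[x] and x in reach[v]}
```

The reachability-based search is quadratic and is chosen here for brevity; a strongly-connected-component algorithm would be the usual choice for large graphs.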
We implemented our approach in a tool and we compared its performance against the approaches that consist in translating the SM-PDS to an equivalent standard (or symbolic) PDS, and then applying the standard LTL model checking algorithms implemented in the PDS model-checker tool Moped [27]. All our experiments were run on Ubuntu 16.04 with a 2.7 GHz CPU and 2 GB of memory. To perform the comparison, we randomly generated several SM-PDSs and LTL formulas of different sizes. The results (CPU execution time) are shown in Table 1.

Column Size is the size of the SM-PDS ($S_1$ for non-modifying transitions $\Delta$ and $S_2$ for modifying transitions $\Delta_c$). Column LTL gives the size of the transitions of the Büchi automaton generated from the LTL formula (using the tool LTL2BA [21]). Column SM-PDS gives the cost of our direct algorithm presented in this paper. Column PDS shows the cost it takes to get the equivalent PDS from the SM-PDS. Column Result reports the cost it takes to run the LTL PDS model-checker Moped [27] for the PDS we got. Column Total is the total cost it takes to translate the SM-PDS into a PDS and then apply the standard LTL model checking algorithm of Moped (Total = PDS + Result). Column Symbolic PDS reports the cost it takes to get the equivalent Symbolic PDS from the SM-PDS. Column Result is the cost to run the Symbolic PDS LTL model-checker Moped. Column Total is the total cost it takes to translate the SM-PDS into a symbolic PDS and then apply the standard LTL model checking algorithm of Moped.

Our direct algorithm (Column SM-PDS) is much more efficient than translating the SM-PDS to an equivalent (symbolic) PDS and then running the standard LTL model-checker Moped. Translating the SM-PDS to a standard PDS may take more than 20 days, whereas our direct algorithm takes only a few seconds. Moreover, since the obtained standard (symbolic) PDS is huge, Moped failed to handle several cases (the time limit that we set for Moped is 20 minutes), whereas our tool was able to deal with all the cases in only a few seconds.

As described in [4], several malicious behaviors can be described by LTL formulas. We give in what follows four examples of such malicious behaviors and show how they can be described by LTL formulas:

Registry Key Injecting:

In order to get started at boot time, many malwares add themselves into the registry key listing. This behavior is typically implemented by first calling the API function GetModuleFileNameA to retrieve the path of the malware's executable file. Then, the API function RegSetValueExA is called to add the file path into the registry key listing. This malicious behavior can be described in LTL as follows:

$\phi_{rk} = F\big(\text{call } GetModuleFileNameA \wedge F(\text{call } RegSetValueExA)\big)$

This formula expresses that if a call to the API function GetModuleFileNameA is followed by a call to the API function RegSetValueExA, then probably a malware is trying to add itself into the registry key listing.
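To make the shape of this formula concrete, the sketch below checks the pattern F(c1 ∧ F(c2 ∧ ...)) on a single finite sequence of API-call names. This is a deliberate simplification and an assumption of this example: the approach of this paper model-checks SM-PDSs against Büchi automata over infinite runs, whereas the hypothetical helper `eventually_chain` only scans one recorded trace with exactly one call per step.

```python
from typing import Sequence


def eventually_chain(trace: Sequence[str], calls: Sequence[str]) -> bool:
    """Return True if `calls` occur in `trace` in the given order.

    This mirrors F(c1 and F(c2 and ...)) on a finite trace. F is reflexive,
    so the search for the next call restarts at the current position
    rather than one step later.
    """
    pos = 0
    for name in calls:
        # Advance to the first position >= pos where `name` is called.
        while pos < len(trace) and trace[pos] != name:
            pos += 1
        if pos == len(trace):
            return False
    return True


# The registry key injecting pattern as an ordered chain of API calls.
PHI_RK = ["GetModuleFileNameA", "RegSetValueExA"]

suspicious = ["GetModuleFileNameA", "CreateFileA", "RegSetValueExA"]
harmless = ["RegSetValueExA", "GetModuleFileNameA"]

flag_suspicious = eventually_chain(suspicious, PHI_RK)
flag_harmless = eventually_chain(harmless, PHI_RK)
```

Taking the earliest match for each call is sufficient here: if any ordered occurrence exists, the greedy scan finds one. The helper accepts chains of any length, so longer nested eventualities of the same shape can be checked the same way.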

Data-Stealing:

Stealing data from the host is a popular malicious behavior that aims to steal any valuable information, including passwords, software codes, bank information, etc. To do this, the malware needs to scan the disk to find the interesting file that it wants to steal. After finding the file, the malware needs to locate it. To this aim, the malware first calls the API function GetModuleHandleA to get a base address to search for a location of the file. Then the malware starts looking for the interesting file by calling the API function FindFirstFileA. Then the API functions CreateFileMappingA and MapViewOfFile are called to access the file. Finally, the specific file can be copied by calling the API function CopyFileA. Thus, this data-stealing malicious behavior can be described by the following LTL formula:

$\phi_{ds} = F\big(\text{call } GetModuleHandleA \wedge F(\text{call } FindFirstFileA \wedge F(\text{call } CreateFileMappingA \wedge F(\text{call } MapViewOfFile \wedge F\ \text{call } CopyFileA)))\big)$

Table 1: Our approach vs. standard LTL for PDSs. Columns: Size, LTL, SM-PDS, PDS, Result, Total, Symbolic PDS, Result, Total. Only the Size and LTL entries are recoverable; the timing entries are missing.

Size                      LTL
S1: 5, S2: 2              |δ|: 15
S1: 5, S2: 3              |δ|: 8
S1: 11, S2: 4             |δ|: 8
S1: 5, S2: 3              |δ|: 10
S1: 110, S2: 4            |δ|: 8
S1: 255, S2: 8            |δ|: 8
S1: 255, S2: 8            |δ|: 10
S1: 110, S2: 4            |δ|: 15
S1: 255, S2: 8            |δ|: 15
S1: 110, S2: 4            |δ|: 20
S1: 255, S2: 8            |δ|: 20
S1: 255, S2: 8            |δ|: 25
S1: 2059, S2: 7           |δ|: 8
S1: 2059, S2: 9           |δ|: 8
S1: 2059, S2: 11          |δ|: 8
S1: 2059, S2: 11          |δ|: 28
S1: 3050, S2: 10          |δ|: 8
S1: 3090, S2: 10          |δ|: 8
S1: 3050, S2: 10          |δ|: 20
S1: 3090, S2: 10          |δ|: 30
S1: 3090, S2: 10          |δ|: 25
S1: 4050, S2: 10          |δ|: 8
S1: 4050, S2: 10          |δ|: 28
S1: 4058, S2: 11          |δ|: 8
S1: 4058, S2: 11          |δ|: 25
S1: 5050, S2: 11          |δ|: 8
S1: 5090, S2: 11          |δ|: 8
S1: 5090, S2: 11          |δ|: 10
S1: 6090, S2: 11          |δ|: 8
S1: 6090, S2: 11          |δ|: 10
S1: 6090, S2: 11          |δ|: 40
S1: 7090, S2: 11          |δ|: 25
S1: 7090, S2: 11          |δ|: 30
S1: 9090, S2: 11          |δ|: 8
S1: 9090, S2: 11          |δ|: 20
S1: 10050, S2: 12         |δ|: 8
S1: 10050, S2: 12         |δ|: 25
S1: 10050, S2: 12         |δ|: 30
S1: 10150, S2: 12         |δ|: 35
S1: 10150, S2: 14         |δ|: 8
S1: 10150, S2: 14         |δ|: 40
S1: 10150, S2: 12         |δ|: 40
S1: 10150, S2: 16         |δ|: 45
S1: 10150, S2: 12         |δ|: 60
S1: 10150, S2: 12         |δ|: 65
S1: 10150, S2: 16         |δ|: 65
S1: 10180, S2: 16         |δ|: 65
S1: 10180, S2: 16         |δ|: 78

Spy-Worm:

A spy worm is a malware that can record data and send it using the Socket API functions. For example, Keylogger is a spy worm that can record the keyboard states by calling the API functions GetAsyncKeyState and GetKeyState and send them to a specific server by calling the socket function sendto. Another spy worm can also spy on an I/O device other than the keyboard. For this, it can use the API function GetRawInputData to obtain input from the specified device, and then send this input by calling the socket functions send or sendto. Thus, this malicious behavior can be described by the following LTL formula:

$\phi_{sw} = F\big((\text{call } GetAsyncKeyState \vee \text{call } GetRawInputData) \wedge F(\text{call } sendto \vee \text{call } send)\big)$

Appending virus:

An appending virus is a virus that inserts a copy of its code at the end of the target file. To achieve this, since the real OFFSET of the virus' variables depends on the size of the infected file, the virus has to first compute its real absolute address in the memory. To perform this, the virus has to call the sequence of instructions: $l_1$: call $f$; $l_2$: ....; $f$: pop eax;. The instruction call $f$ will push the return address $l_2$ onto the stack. Then, the pop instruction in $f$ will put the value of this address into the register eax. Thus, the virus can get its real absolute address from the register eax. This malicious behavior can be described by the following LTL formula:

$\phi_{av} = \bigvee F\Big(\text{call} \wedge X(\text{top-of-stack} = a) \wedge G\,\neg\big(\text{ret} \wedge (\text{top-of-stack} = a)\big)\Big)$

where the $\bigvee$ is taken over all possible return addresses $a$, and top-of-stack = $a$ is a predicate that indicates that the top of the stack is $a$. The subformula call $\wedge$ X(top-of-stack = $a$) means that there exists a procedure call having $a$ as return address. Indeed, when a procedure call is made, the program pushes its corresponding return address $a$ onto the stack. Thus, at the next step, $a$ will be on the top of the stack. Therefore, the formula above expresses that there exists a procedure call having $a$ as return address, such that there is no ret instruction which will return to $a$.

Note that this formula uses predicates that indicate that the top of the stack is $a$. Our techniques work for this case as well: it suffices to encode the top of the stack in the control points of the SM-PDS. Our implementation handles this case and can therefore detect appending viruses.

Applying our tool for malware detection.

We applied our tool to detect several malwares. We use the unpack tool unpacker [28] to handle packers like …

[Table 2: Multiple pre* vs. our direct LTL model-checking algorithm. Columns: Example, Size, LTL, Multiple pre*. Table body not recoverable.]

[Table 3: Experimental Results. Columns: Example, Size, Result, cost. Table body not recoverable.]

In Table 3, Column Size is the number of control locations, Column Result gives the result of our algorithm (Yes means malicious and No means benign), and Column cost gives the cost of applying our LTL model checker to check one of the LTL properties described above.

2. Second, we abstract away the self-modifying instructions and proceed as if these instructions were not self-modifying. In this case, we translate the binary codes to standard pushdown systems as described in [3]. By using PDSs as models, none of the malwares that we consider was detected as malicious, whereas, as reported in Table 3, using self-modifying PDSs as models and applying our LTL model-checking algorithm allowed us to detect all the 892 malwares that we considered.

Note that checking the formulas $\phi_{rk}$, $\phi_{ds}$, and $\phi_{sw}$ could be done using multiple pre* queries on SM-PDSs using the pre* algorithm of [29]. However, this would be less efficient than performing our direct LTL model-checking algorithm, as shown in Table 2, where Column Size gives the number of control locations, Column LTL gives the time of applying our LTL model-checking algorithm, and Column Multiple pre* gives the cost of applying multiple pre* on SM-PDSs to check the properties $\phi_{rk}$, $\phi_{ds}$, and $\phi_{sw}$. It can be seen that applying our direct LTL model-checking algorithm is more efficient. Furthermore, the appending virus formula $\phi_{av}$ cannot be solved using multiple pre* queries. Our direct LTL model-checking algorithm is needed in this case. Note that some of the malwares we considered in our experiments are appending viruses. Thus, our algorithm and our implementation are crucial to be able to detect these malwares.

[Table 4: Detection rate: Our tool vs. well-known antiviruses. Columns: our tool, McAfee, Norman, BitDefender, Kinsoft, Avira, eScan, Kaspersky, Qihoo360, Baidu, Avast, Symantec. Table body not recoverable.]

Comparison with well-known antiviruses.

We compare our tool against well-known and widely used antiviruses. Since known antiviruses update their signature database as soon as a new malware is known, in order to have a fair comparison with these antiviruses, we need to consider new malwares. We use the sophisticated malware generator NGVCK available at VX Heavens [31] to generate 205 malwares. We obfuscate these malwares with self-modifying code, and we feed them to our tool and to well-known antiviruses such as BitDefender, Kinsoft, Avira, eScan, Kaspersky, Qihoo-360, Baidu, Avast, and Symantec. Our tool was able to detect all these programs as malicious, whereas none of the well-known antiviruses was able to detect all these malwares. Table 4 reports the detection rates of our tool and the well-known antiviruses.

References

1. A.Bertrand, M.Matias, and D.Koen. A model for self-modifying code. In IHMM-Sec, 2006.
2. A.Bouajjani, J.Esparza, and O.Maler. Reachability Analysis of Pushdown Automata: Application to Model Checking. In CONCUR'97, 1997.
3. F.Song and T.Touili. Efficient malware detection using model-checking. In FM, 2012.
4. F.Song and T.Touili. LTL model-checking for malware detection. In TACAS, 2013.
5. G.Balakrishnan, T.W.Reps, N.Kidd, A.Lal, J.Lim, et al. Model checking x86 executables with CodeSurfer/x86 and WPDS++. In CAV, 2005.
6. G.Bonfante, J.Marion, and D.Reynaud-Plantey. A computability perspective on self-modifying programs. In SEFM, 2009.
7. H.Cai, Z.Shao, and A.Vaynberg. Certified self-modifying code. ACM SIGPLAN Notices, 42(6), 2007.
8. H.Nguyen and T.Touili. CARET model checking for malware detection. In SPIN, 2017.
9. J.Bergeron, M.Debbabi, et al. Static detection of malicious code in executable programs. Int. J. of Req. Eng, 2001(184-189), 2001.
10. J.Esparza, D.Hansel, P.Rossmanith, and S.Schwoon. Efficient algorithms for model checking pushdown systems. In CAV, 2000.
11. J.Kinder, S.Katzenbeisser, C.Schallhart, and H.Veith. Detecting malicious code by model checking. In DIMVA, 2005.
12. J.Kinder and H.Veith. Jakstab: A static analysis platform for binaries. In CAV, 2008.
13. K.Coogan, S.Debray, T.Kaochar, and G.Townsend. Automatic static unpacking of malware binaries. In WCRE'09, 2009.
14. K.Dam and T.Touili. Malware detection based on graph classification. In ICISSP, 2017.
15. K.Dam and T.Touili. Learning malware using generalized graph kernels. In ARES, 2018.
16. K.Dam and T.Touili. Precise extraction of malicious behaviors. In COMPSAC, 2018.
17. K.Gyung et al. Renovo: A hidden code extractor for packed executables. In WORM, 2007.
18. K.Roundy and B.Miller. Hybrid analysis and control of malware. In RAID, 2010.
19. M.Vardi and P.Wolper. Reasoning about infinite computations. Inf. Comput., 115(1), 1994.
20. P.Beaucamps, I.Gnaedig, and J.Marion. Behavior abstraction in malware analysis. In Runtime Verification, 2010.
21. P.Gastin and D.Oddoux. Fast LTL to Büchi automata translation. In CAV, 2001.
22. P.Royal, M.Halpin, et al. Polyunpack: Automating the hidden-code extraction of unpack-executing malware. In ACSAC, 2006.
23. P.Singh and A.Lakhotia. Static verification of worm and virus behavior in binary executables using model checking. In IAW, 2003.
24. S.Blazy, V.Laporte, and D.Pichardie. Verified abstract interpretation techniques for disassembling low-level self-modifying code. JAR, 56(3), 2016.
25. S.Cutler. malshare. https://malshare.com
26. S.Debray, K.Coogan, and G.Townsend. On the semantics of self-unpacking malware code. Tech. rep., University of Arizona, Computer Science, 2008.
27. S.Schwoon. Model-checking pushdown systems. PhD thesis, Technische Universität München, Universitätsbibliothek, 2002.
28. Unpacker Tool. Automated unpacking: A behaviour based approach. https://github.com/malwaremusings/unpacker
29. T.Touili and X.Ye. Reachability analysis of self modifying code. In ICECCS, 2017.
30. T.Touili and X.Ye. LTL model checking of self modifying code. In ICECCS, 2019.
31. V.Heaven. V.heavens. http://vxer.org/lib/
32. VirusShare. vxshare. https://virusshare.com


Submitted on 27 Sep 2019
