Reachability Analysis of Self Modifying Code
Tayssir Touili, LIPN, CNRS and University Paris 13
Xin Ye, Shanghai Key Lab. of Trustworthy Comput., ECNU and LIPN, CNRS and University Paris 13
Abstract: Self-modifying code is code that modifies its own instructions during execution. It is nowadays widely used, especially in malware, to make the code hard to analyse and hard for anti-viruses to detect. Thus, the analysis of such self-modifying programs is a big challenge. Pushdown systems (PDSs) are a natural model that is extensively used for the analysis of sequential programs because they make it possible to accurately model procedure calls and mimic the program's stack. In this work, we propose to extend the PushDown System model with self-modifying rules. We call the new model Self-Modifying PushDown System (SM-PDS). An SM-PDS is a PDS that can modify its own set of transitions during execution. We show how SM-PDSs can be used to naturally represent self-modifying programs and provide efficient algorithms to compute the backward and forward reachable configurations of SM-PDSs. We implemented our techniques in a tool and obtained encouraging results. In particular, we successfully applied our tool to the detection of self-modifying malware.
I. INTRODUCTION
Self-modifying code is code that modifies its own instructions during execution. It is nowadays widely used, mainly to make programs hard to understand. For example, self-modifying code is extensively used to protect software intellectual property, since it makes reverse engineering harder. It is also abundantly used by malware writers in order to obfuscate their malicious code and make it hard to analyse by static analysers and anti-viruses.
There are several ways of implementing self-modifying code.
Packing [CT08] consists in applying compression techniques to make the size of the executable file smaller. This converts the executable file to a form where the executable content is hidden. Then, the code is "unpacked" at runtime before execution. Such packed code is self-modifying.
Encryption is another technique to hide the code. It uses some kind of invertible operation to hide the executable code with an encryption key. Then, the code is "decrypted" at runtime prior to execution. Encrypted programs are self-modifying. These two forms of self-modifying code have been well studied in the literature and can be handled by several unpacking tools such as [Tea], [Kar].
In this work, we consider another kind of self-modifying code, caused by self-modifying instructions, where code is treated as data that can thus be read and written by these instructions. Self-modifying instructions are usually mov instructions, since mov can access memory and read and write to it. For example, consider the program shown in Fig. 1. For simplicity, we suppose that addresses are 1 byte long. The binary code is given on the left side, while on the right side we give its corresponding assembly code, obtained by syntactically translating the binary code at each address. For example, ff is the binary code of the instruction push; thus, the first line is translated to push 0x3, the second line to push 0b, etc. Let us execute this code. First, we execute push 0x3, then push 0b, then mov 0x2 0xc. This last instruction replaces the first byte at address 0x2 by 0xc. Thus, at address 0x2, ff 0b is replaced by 0c 0b. Since 0c is the binary code of jmp, this means the instruction push 0b is replaced by jmp 0xb. Therefore, this code is self-modifying. If we treat it blindly, without looking at the semantics of the different instructions, we will extract from it the Control Flow Graph CFG a, whereas its correct Control Flow Graph is CFG b. The mov instruction was thus able to modify the instructions of the program through its ability to read and write the memory.
In this paper, we consider the reachability analysis of self-modifying programs where the code is modified by mov instructions. To this aim, we first need to find an adequate model for such programs. PushDown Systems (PDSs) are known to be a natural model for sequential programs [Sch02], as they make it possible to track the contexts of the different calls in the program. Moreover, PushDown Systems can record and mimic the program's stack, which is very important for malware detection. Indeed, to check whether a program is malicious, anti-viruses start by identifying the calls it makes to the API functions. To evade these checks, malware writers try to obfuscate the calls they make to the Operating System by using pushes and jumps. Thus, it is important to be able to track the stack to detect such obfuscated calls. This is why PushDown Systems were used in [ST12], [ST13] to model binary programs in order to perform malware detection. However, these works do not consider malware that uses self-modifying code, as PushDown Systems are not able to model self-modifying instructions.
To overcome this limitation, we propose in this work to extend the PushDown System model with self-modifying rules. We call the new model Self-Modifying PushDown System (SM-PDS). Roughly speaking, an SM-PDS is a PDS that can modify its own set of transitions during execution. We show how SM-PDSs can be used to naturally represent self-modifying programs. It turns out that SM-PDSs are equivalent to standard PDSs. We show how to translate an SM-PDS to a standard PDS. This translation is exponential. Thus, performing the reachability analysis on the equivalent PDS is not efficient. We then propose direct algorithms to compute the forward ($post^*$) and backward ($pre^*$) reachability sets for SM-PDSs.
This makes it possible to efficiently perform reachability analysis for self-modifying programs. Our algorithms are based on (1) representing regular (potentially infinite) sets of configurations of SM-PDSs using finite state automata, and (2) applying saturation procedures on the finite state automata in order to take into account the effect of applying the rules of the SM-PDS. We implemented our algorithms in a tool that takes as input either an SM-PDS or a self-modifying program. Our experiments show that our direct techniques are much more efficient than translating the SM-PDS to an equivalent PDS and then applying the standard reachability algorithms for PDSs [BEM97], [EHRS00], [Sch02]. Moreover, we successfully applied our tool to the analysis of several self-modifying malware samples.
[Fig. 1: A Simple Example of Self-modifying Codes. Panels: binary codes and assembly codes by address (0x0 push 0x3; 0x2 push 0b; 0x4 mov 0x2 0xc; 0x7 push %ebx), with the CFGs before and after execution of mov 0x2 0xc.]
This paper is an expanded version of the conference paper [TY17]. Compared to [TY17], in this expanded version, we propose new algorithms for computing the forward and backward reachable configurations for SM-PDSs, and we provide the detailed proofs that show the correctness of our constructions.
Outline.
The rest of the paper is structured as follows: Section 2 introduces our new model and shows how to translate an SM-PDS to an equivalent pushdown system. In Section 3, we give the translation from binary code to an SM-PDS. In Section 4, we define finite automata to represent regular (potentially infinite) sets of configurations of SM-PDSs. Sections 5 and 6 give our algorithms to compute the backward and forward reachability sets of SM-PDSs. Section 7 describes our experiments.
Related Work.
Reachability analysis of pushdown systems was considered in [BEM97], [EHRS00]. Our algorithms are extensions of the saturation approach of these works.
Model checking and static analysis approaches have been widely used to analyze binary programs, for instance, in [BDD + + + + + mov instructions.
II. SELF-MODIFYING PUSHDOWN SYSTEMS
A. Definition
We introduce in this section our new model: Self-modifying Pushdown Systems.
Definition 1.
A Self-modifying Pushdown System (SM-PDS) is a tuple $\mathcal{P} = (P, \Gamma, \Delta, \Delta_c)$, where $P$ is a finite set of control points, $\Gamma$ is a finite set of stack symbols, $\Delta \subseteq (P \times \Gamma) \times (P \times \Gamma^*)$ is a finite set of transition rules, and $\Delta_c \subseteq P \times (\Delta \cup \Delta_c) \times (\Delta \cup \Delta_c) \times P$ is a finite set of modifying transition rules. If $((p, \gamma), (p', w)) \in \Delta$, we also write $\langle p, \gamma \rangle \hookrightarrow \langle p', w \rangle \in \Delta$. If $(p, r_1, r_2, p') \in \Delta_c$, we also write $p \overset{(r_1, r_2)}{\hookrightarrow} p' \in \Delta_c$. A Pushdown System (PDS) is an SM-PDS where $\Delta_c = \emptyset$.
Intuitively, a Self-modifying Pushdown System is a Pushdown System that can dynamically modify its set of rules during execution: rules in $\Delta$ are standard PDS transition rules, while rules in $\Delta_c$ modify the current set of transition rules. The rule $\langle p, \gamma \rangle \hookrightarrow \langle p', w \rangle \in \Delta$ expresses that if the SM-PDS is in control point $p$ and has $\gamma$ on top of its stack, then it can move to control point $p'$, pop $\gamma$ and push $w$ onto the stack, while $p \overset{(r_1, r_2)}{\hookrightarrow} p' \in \Delta_c$ expresses that when the SM-PDS is in control point $p$, then it can move to control point $p'$, remove the rule $r_1$ from its current set of transition rules, and add the rule $r_2$. Formally, a configuration of an SM-PDS is a tuple $c = (\langle p, w \rangle, \theta)$ where $p \in P$ is the control point, $w \in \Gamma^*$ is the stack content, and $\theta \subseteq \Delta \cup \Delta_c$ is the current set of transition rules of the SM-PDS. $\theta$ is called the current phase of the SM-PDS. When the SM-PDS is a PDS, i.e., when $\Delta_c = \emptyset$, a configuration is a tuple $c = (\langle p, w \rangle, \Delta)$: since there is no changing rule, there is only one possible phase. In this case, we can also write $c = \langle p, w \rangle$. Let $\mathcal{C}$ be the set of configurations of an SM-PDS.
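The intuition above can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: the class names `Rule` and `ModRule`, the function `post`, and the rules `r1`, `r2`, `r3`, `m` are hypothetical and are not taken from the paper's examples or tool. It assumes that a modifying rule can fire only when both it and the rule $r_1$ it removes belong to the current phase.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rule:
    """Standard rule <p, gamma> -> <p2, w> of Delta."""
    p: str
    gamma: str
    p2: str
    w: Tuple[str, ...]


@dataclass(frozen=True)
class ModRule:
    """Modifying rule p -(r1, r2)-> p2 of Delta_c."""
    p: str
    r1: object
    r2: object
    p2: str


def post(config):
    """Immediate successors of a configuration (p, stack, theta)."""
    p, stack, theta = config
    successors = set()
    for r in theta:
        if r.p != p:
            continue
        if isinstance(r, Rule):
            # Pop gamma, push w, keep the phase unchanged.
            if stack and stack[0] == r.gamma:
                successors.add((r.p2, r.w + stack[1:], theta))
        elif r.r1 in theta:
            # Assumption: r1 must be in the current phase; swap r1 for r2.
            successors.add((r.p2, stack, (theta - {r.r1}) | {r.r2}))
    return successors


# Hypothetical rules, chosen only for illustration.
r1 = Rule("p0", "a", "p1", ("b", "a"))
r2 = Rule("p1", "b", "p0", ())
r3 = Rule("p0", "a", "p1", ("c", "a"))
m = ModRule("p0", r1, r3, "p0")

theta0 = frozenset({r1, r2, m})
theta1 = frozenset({r2, r3, m})
successors = post(("p0", ("a",), theta0))
```

Here the configuration at `p0` with stack `a` has two successors: one obtained by the standard rule `r1` (same phase), and one obtained by the modifying rule `m`, which leaves the stack untouched and moves to the phase where `r1` has been replaced by `r3`.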
An SM-PDS defines a transition relation $\Rightarrow_{\mathcal{P}}$ between configurations as follows: Let $c = (\langle p, w \rangle, \theta)$ be a configuration, and let $r$ be a rule in $\theta$, then:
1) if $r \in \Delta_c$ is of the form $r = p \overset{(r_1, r_2)}{\hookrightarrow} p'$, such that $r_1 \in \theta$, then $(\langle p, w \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', w \rangle, \theta')$, where $\theta' = (\theta \setminus \{r_1\}) \cup \{r_2\}$. In other words, the transition rule $r$ updates the current set of transition rules $\theta$ by removing $r_1$ from it and adding $r_2$ to it.
2) if $r \in \Delta$ is of the form $r = \langle p, \gamma \rangle \hookrightarrow \langle p', w' \rangle$, then $(\langle p, \gamma w \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', w' w \rangle, \theta)$. In other words, the transition rule $r$ moves the control point from $p$ to $p'$, pops $\gamma$ from the stack and pushes $w'$ onto the stack. This transition keeps the current set of transition rules $\theta$ unchanged.
Let $\Rightarrow^*_{\mathcal{P}}$ be the transitive, reflexive closure of $\Rightarrow_{\mathcal{P}}$. We define $\overset{i}{\Rightarrow}$ as follows: $c \overset{i}{\Rightarrow} c'$ iff there exists a sequence of configurations $c_0 \Rightarrow_{\mathcal{P}} c_1 \Rightarrow_{\mathcal{P}} \cdots \Rightarrow_{\mathcal{P}} c_i$ s.t. $c_0 = c$ and $c_i = c'$. Given a configuration $c$, the set of immediate predecessors (resp. successors) of $c$ is $pre_{\mathcal{P}}(c) = \{c' \in \mathcal{C} : c' \Rightarrow_{\mathcal{P}} c\}$ (resp. $post_{\mathcal{P}}(c) = \{c' \in \mathcal{C} : c \Rightarrow_{\mathcal{P}} c'\}$). These notations can be generalized straightforwardly to sets of configurations. Let $pre^*_{\mathcal{P}}$ (resp. $post^*_{\mathcal{P}}$) denote the reflexive-transitive closure of $pre_{\mathcal{P}}$ (resp. $post_{\mathcal{P}}$). We omit the subscript $\mathcal{P}$ when it is understood from the context.
Example 1.
Let $\mathcal{P} = (P, \Gamma, \Delta, \Delta_c)$ be an SM-PDS where $P = \{p, p, p, p\}$, $\Gamma = \{\gamma, \gamma, \gamma\}$, $\Delta = \{r : \langle p, \gamma \rangle \hookrightarrow \langle p, \gamma\gamma \rangle,\ r : \langle p, \gamma \rangle \hookrightarrow \langle p, \epsilon \rangle,\ r : \langle p, \gamma \rangle \hookrightarrow \langle p, \gamma\gamma \rangle\}$, $\Delta_c = \{r' : p \overset{(r, r)}{\hookrightarrow} p\}$. Let $c = (\langle p, \gamma\gamma \rangle, \theta)$ where $\theta = \{r, r, r'\}$. Applying rule $r$, we get $(\langle p, \gamma\gamma \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p, \gamma\gamma\gamma \rangle, \theta)$. Then, applying rule $r$, we get $(\langle p, \gamma\gamma\gamma \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p, \gamma\gamma \rangle, \theta)$. Then, applying rule $r'$, we get $(\langle p, \gamma\gamma \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p, \gamma\gamma \rangle, \theta)$, where $r'$ is self-modifying; thus, it leads the SM-PDS from phase $\theta = \{r, r, r'\}$ to phase $\theta = (\theta \setminus \{r\}) \cup \{r\} = \{r, r, r'\}$. Then, applying rule $r$, we get $(\langle p, \gamma\gamma \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p, \gamma\gamma\gamma \rangle, \theta)$. Then, applying rule $r$ again, we get $(\langle p, \gamma\gamma\gamma \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p, \gamma\gamma \rangle, \theta)$.
B. From SM-PDSs to PDSs
An SM-PDS can be described by a PDS. This is due to the fact that the number of phases is finite; thus, we can encode phases in the control points of the PDS. Let $\mathcal{P} = (P, \Gamma, \Delta, \Delta_c)$ be an SM-PDS. We compute the PDS $\mathcal{P}' = (P', \Gamma, \Delta')$ as follows: $P' = P \times 2^{\Delta \cup \Delta_c}$. Initially, $\Delta' = \emptyset$. For every $\theta \subseteq \Delta \cup \Delta_c$ and every $r \in \theta$:
1) If $r = \langle p, \gamma \rangle \hookrightarrow \langle p', w \rangle \in \Delta$, we add $\langle (p, \theta), \gamma \rangle \hookrightarrow \langle (p', \theta), w \rangle$ to $\Delta'$.
2) If $r = p \overset{(r_1, r_2)}{\hookrightarrow} p' \in \Delta_c$, then for every $\gamma \in \Gamma$, we add $\langle (p, \theta), \gamma \rangle \hookrightarrow \langle (p', \theta'), \gamma \rangle$ to $\Delta'$, where $\theta' = (\theta \setminus \{r_1\}) \cup \{r_2\}$.
It is easy to see that:
Proposition 1. $(\langle p, w \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', w' \rangle, \theta')$ iff $\langle (p, \theta), w \rangle \Rightarrow_{\mathcal{P}'} \langle (p', \theta'), w' \rangle$.
Proof:
$\Rightarrow$: We will show that if $(\langle p, w \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', w' \rangle, \theta')$, then we have $\langle (p, \theta), w \rangle \Rightarrow_{\mathcal{P}'} \langle (p', \theta'), w' \rangle$. There are two cases depending on the form of the rule that led to this transition.
• Case $\theta = \theta'$: the transition does not correspond to a self-modifying transition rule. Thus there is a rule $r \in \theta$ of the form $r = \langle p, \gamma \rangle \hookrightarrow \langle p', u' \rangle$ that led to this transition. Let $u$ be such that $w = \gamma u$, $w' = u' u$. By the construction rule of the PDS $\mathcal{P}'$, we have $\langle (p, \theta), \gamma \rangle \hookrightarrow \langle (p', \theta), u' \rangle \in \Delta'$. Therefore, $\langle (p, \theta), \gamma u \rangle \Rightarrow_{\mathcal{P}'} \langle (p', \theta), u' u \rangle$ holds. This implies that $\langle (p, \theta), w \rangle \Rightarrow_{\mathcal{P}'} \langle (p', \theta), w' \rangle$.
• Case $\theta \neq \theta'$: the transition corresponds to a self-modifying transition rule. Thus there is a rule $r \in \theta$ of the form $p \overset{(r_1, r_2)}{\hookrightarrow} p'$ that led to this transition. Let $u$ be such that $w = \gamma u$, $w' = \gamma u$. By the construction rule of the PDS $\mathcal{P}'$, we have $\langle (p, \theta), \gamma \rangle \hookrightarrow \langle (p', \theta'), \gamma \rangle \in \Delta'$ where $\theta' = (\theta \setminus \{r_1\}) \cup \{r_2\}$. Therefore, $\langle (p, \theta), \gamma u \rangle \Rightarrow_{\mathcal{P}'} \langle (p', \theta'), \gamma u \rangle$ holds. This implies that $\langle (p, \theta), w \rangle \Rightarrow_{\mathcal{P}'} \langle (p', \theta'), w' \rangle$.
$\Leftarrow$: We will show that if $\langle (p, \theta), w \rangle \Rightarrow_{\mathcal{P}'} \langle (p', \theta'), w' \rangle$, then $(\langle p, w \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', w' \rangle, \theta')$. Let $\gamma \in \Gamma$, $u, u' \in \Gamma^*$ be such that $w = \gamma u$, $w' = u' u$. There are two cases.
• Case $\theta = \theta'$. Let $\langle (p, \theta), \gamma \rangle \hookrightarrow \langle (p', \theta), u' \rangle \in \Delta'$ be the rule that led to the transition. By the construction of the PDS $\mathcal{P}'$, there must exist a rule $r \in \theta$ such that $r = \langle p, \gamma \rangle \hookrightarrow \langle p', u' \rangle$. Therefore, $(\langle p, \gamma u \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', u' u \rangle, \theta)$ holds. This implies that $(\langle p, w \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', w' \rangle, \theta')$.
• Case $\theta \neq \theta'$. Let $\langle (p, \theta), \gamma \rangle \hookrightarrow \langle (p', \theta'), \gamma \rangle \in \Delta'$ be the rule leading to the transition, with $u' = \gamma$. By the construction of the PDS $\mathcal{P}'$, there must exist a rule $r \in \theta$ such that $r = p \overset{(r_1, r_2)}{\hookrightarrow} p'$ where $\theta' = (\theta \setminus \{r_1\}) \cup \{r_2\}$. Therefore, $(\langle p, \gamma u \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', \gamma u \rangle, \theta')$ holds. This implies that $(\langle p, w \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', w' \rangle, \theta')$. $\Box$
Thus, we get:
Theorem 1.
Let $\mathcal{P} = (P, \Gamma, \Delta, \Delta_c)$ be an SM-PDS. We can compute an equivalent PDS $\mathcal{P}' = (P', \Gamma, \Delta')$ such that $|\Delta'| = (|\Delta| + |\Delta_c| \cdot |\Gamma|) \cdot 2^{O(|\Delta| + |\Delta_c|)}$ and $|P'| = |P| \cdot 2^{O(|\Delta| + |\Delta_c|)}$.
C. From SM-PDSs to Symbolic PDSs
Instead of recording the phases $\theta$ of the SM-PDS in the control points of the equivalent PDS, we can have a more compact translation from SM-PDSs to symbolic PDSs [Sch02], where each SM-PDS rule is represented by a single symbolic transition, and where the different values of the phases are encoded symbolically using relations between phases:
Definition 2.
A symbolic pushdown system is a tuple $\mathcal{P} = (P, \Gamma, \delta)$, where $P$ is a set of control points, $\Gamma$ is the stack alphabet, and $\delta$ is a set of symbolic rules of the form $\langle p, \gamma \rangle \overset{R}{\hookrightarrow} \langle p', w \rangle$, where $R \subseteq 2^{\Delta \cup \Delta_c} \times 2^{\Delta \cup \Delta_c}$ is a relation.
A symbolic PDS defines a transition relation $\leadsto_{\mathcal{P}}$ between SM-PDS configurations as follows: Let $c = (\langle p, \gamma w' \rangle, \theta)$ be a configuration and let $\langle p, \gamma \rangle \overset{R}{\hookrightarrow} \langle p', w \rangle$ be a rule in $\delta$, then $(\langle p, \gamma w' \rangle, \theta) \leadsto_{\mathcal{P}} (\langle p', w w' \rangle, \theta')$ for $(\theta, \theta') \in R$. Let $\leadsto^*_{\mathcal{P}}$ be the transitive, reflexive closure of $\leadsto_{\mathcal{P}}$. Then, given an SM-PDS $\mathcal{P} = (P, \Gamma, \Delta, \Delta_c)$, we can compute an equivalent symbolic PDS $\mathcal{P}' = (P, \Gamma, \Delta')$ as follows. Initially, $\Delta' = \emptyset$;
• For every $\langle p, \gamma \rangle \hookrightarrow \langle p', w \rangle \in \Delta$, add $\langle p, \gamma \rangle \overset{R_{id}}{\hookrightarrow} \langle p', w \rangle$ to $\Delta'$, where $R_{id}$ is the identity relation.
• For every $r = p \overset{(r_1, r_2)}{\hookrightarrow} p' \in \Delta_c$ and every $\gamma \in \Gamma$, add $\langle p, \gamma \rangle \overset{R}{\hookrightarrow} \langle p', \gamma \rangle$ to $\Delta'$, where $R = \{(\theta_1, \theta_2) \in 2^{\Delta \cup \Delta_c} \times 2^{\Delta \cup \Delta_c} \mid r \in \theta_1 \text{ and } \theta_2 = (\theta_1 \setminus \{r_1\}) \cup \{r_2\}\}$.
It is easy to see that:
Proposition 2. $(\langle p, w \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', w' \rangle, \theta')$ iff $(\langle p, w \rangle, \theta) \leadsto_{\mathcal{P}'} (\langle p', w' \rangle, \theta')$.
Proof:
$\Rightarrow$: We will show that if $(\langle p, w \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', w' \rangle, \theta')$, then $(\langle p, w \rangle, \theta) \leadsto_{\mathcal{P}'} (\langle p', w' \rangle, \theta')$. There are two cases depending on the form of the rule that led to this transition.
• Case $\theta = \theta'$: the transition does not correspond to a self-modifying transition rule. Thus there is a rule $r \in \theta$ of the form $r = \langle p, \gamma \rangle \hookrightarrow \langle p', u' \rangle$ that led to this transition. Let $u$ be such that $w = \gamma u$, $w' = u' u$. By construction of the symbolic pushdown system $\mathcal{P}'$, $\langle p, \gamma \rangle \overset{R_{id}}{\hookrightarrow} \langle p', u' \rangle \in \Delta'$; therefore, $(\langle p, \gamma u \rangle, \theta) \leadsto_{\mathcal{P}'} (\langle p', u' u \rangle, \theta)$ holds. This implies that $(\langle p, w \rangle, \theta) \leadsto_{\mathcal{P}'} (\langle p', w' \rangle, \theta')$.
• Case $\theta \neq \theta'$: the transition corresponds to a self-modifying transition rule. Thus there is a rule $r \in \theta$ of the form $r = p \overset{(r_1, r_2)}{\hookrightarrow} p'$ that led to this transition, and $\theta' = (\theta \setminus \{r_1\}) \cup \{r_2\}$. Let $u$ be such that $w = \gamma u$, $w' = \gamma u$. By construction of the symbolic pushdown system $\mathcal{P}'$, $\langle p, \gamma \rangle \overset{R}{\hookrightarrow} \langle p', \gamma \rangle \in \Delta'$ with $R = \{(\theta, \theta') \in 2^{\Delta \cup \Delta_c} \times 2^{\Delta \cup \Delta_c} \mid r \in \theta \text{ and } \theta' = (\theta \setminus \{r_1\}) \cup \{r_2\}\}$; therefore, $(\langle p, \gamma u \rangle, \theta) \leadsto_{\mathcal{P}'} (\langle p', \gamma u \rangle, \theta')$ holds. This implies that $(\langle p, w \rangle, \theta) \leadsto_{\mathcal{P}'} (\langle p', w' \rangle, \theta')$.
$\Leftarrow$: We will show that if $(\langle p, w \rangle, \theta) \leadsto_{\mathcal{P}'} (\langle p', w' \rangle, \theta')$, then $(\langle p, w \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', w' \rangle, \theta')$. Let $\gamma \in \Gamma$, $u, u' \in \Gamma^*$ be such that $w = \gamma u$, $w' = u' u$. There are two cases.
• Case $\theta = \theta'$. Let $\langle p, \gamma \rangle \overset{R_{id}}{\hookrightarrow} \langle p', u' \rangle \in \Delta'$ be the rule applied in this transition. By the construction of the symbolic pushdown system $\mathcal{P}'$, there must exist a rule $r \in \theta$ s.t. $r = \langle p, \gamma \rangle \hookrightarrow \langle p', u' \rangle \in \Delta$. Therefore, $(\langle p, \gamma u \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', u' u \rangle, \theta)$ holds. This implies that $(\langle p, w \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', w' \rangle, \theta')$.
• Case $\theta \neq \theta'$. Let $\langle p, \gamma \rangle \overset{R}{\hookrightarrow} \langle p', \gamma \rangle \in \Delta'$ be the rule applied in this transition, with $w' = \gamma u$. By the construction of the symbolic pushdown system $\mathcal{P}'$, there must exist a rule $r \in \theta$ of the form $r = p \overset{(r_1, r_2)}{\hookrightarrow} p' \in \Delta_c$ s.t. $R = \{(\theta_1, \theta_2) \in 2^{\Delta \cup \Delta_c} \times 2^{\Delta \cup \Delta_c} \mid r \in \theta_1 \text{ and } \theta_2 = (\theta_1 \setminus \{r_1\}) \cup \{r_2\}\}$. Therefore, $\theta' = (\theta \setminus \{r_1\}) \cup \{r_2\}$ and $(\langle p, \gamma u \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', \gamma u \rangle, \theta')$ hold. This implies that $(\langle p, w \rangle, \theta) \Rightarrow_{\mathcal{P}} (\langle p', w' \rangle, \theta')$. $\Box$
Thus, we get:
Theorem 2.
Let $\mathcal{P} = (P, \Gamma, \Delta, \Delta_c)$ be an SM-PDS. We can compute an equivalent symbolic PDS $\mathcal{P}' = (P', \Gamma, \Delta')$ such that $|P'| = |P|$, $|\Delta'| = |\Delta| + |\Delta_c| \cdot |\Gamma|$, and the size of the relations used in the symbolic transitions is $2^{O(|\Delta| + |\Delta_c|)}$.
III. MODELING SELF-MODIFYING CODE WITH SM-PDSS
A. Self-modifying instructions
There are different techniques to implement self-modifying code. We consider in this work code that uses self-modifying instructions. These are instructions that can access memory locations and write onto them, thus changing the instructions that are stored in these locations. In assembly, the only instructions that can do this are the mov instructions. In this case, the self-modifying instructions are of the form mov $l$ $v$, where $l$ is a location of the program that stores executable data and $v$ is a value. This instruction replaces the value at location $l$ (in the binary code) with the value $v$. This means that if at location $l$ there is a binary value $v'$ that is involved in an assembly instruction $i_1$, and if by replacing $v'$ by $v$ we obtain a new assembly instruction $i_2$, then the instruction $i_1$ is replaced by $i_2$. E.g., ff is the binary code of push, 40 is the binary code of inc, 0c is the binary code of jmp, c6 is the binary code of mov, etc. Thus, if we have mov $l$ ff, and if at location $l$ there was initially the value 40 01 (which corresponds to the assembly instruction inc %edx), then 40 is replaced by ff, which means the instruction inc %edx is replaced by push 01. If at location $l$ there was initially the value c6 01 02 (which corresponds to the assembly instruction mov edx 0x2), then c6 is replaced by ff, which means the instruction mov edx 0x2 is replaced by push 02.
Note that if the instructions $i_1$ and $i_2$ do not have the same number of operands, then mov $l$ $v$ will, in addition to replacing $i_1$ by $i_2$, change several other instructions that follow $i_1$. Currently, we cannot handle this case; thus we assume that $i_1$ and $i_2$ have the same number of operands.
Note also that mov $l$ $v$ is self-modifying only if $l$ is a location of the program that stores executable data; e.g., mov eax $v$ does not change the instructions of the program, it just writes the value $v$ to the register eax. Thus, from now on, by self-modifying instruction, we mean an instruction of the form mov $l$ $v$, where $l$ is a location of the program that stores executable data. Moreover, to ensure that only one instruction is modified, we assume that the corresponding instructions $i_1$ and $i_2$ have the same number of operands.
B. From self-modifying code to SM-PDS
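To fix ideas, the effect of a self-modifying mov $l$ $v$ can be sketched on a toy encoding. The fragment below is only an illustration under stated assumptions: it reuses the simplified one-byte opcodes quoted in the text (ff for push, 40 for inc), assumes every instruction is one opcode byte followed by one operand byte, and prints operands as raw bytes. The names `decode`, `apply_mov` and `changing_rule` are hypothetical, and the instruction strings merely stand in for the SM-PDS rules $r_1$ and $r_2$.

```python
# Toy opcode table, following the simplified examples in the text.
OPCODES = {0xFF: "push", 0x40: "inc"}


def decode(memory, addr):
    """Decode a toy instruction: one opcode byte, then one operand byte."""
    return f"{OPCODES[memory[addr]]} {memory[addr + 1]:02x}"


def apply_mov(memory, l, v):
    """Effect of `mov l v` on the binary code: overwrite the byte at l."""
    patched = bytearray(memory)
    patched[l] = v
    return bytes(patched)


def changing_rule(n, n_next, memory, l, v):
    """Tuple (n, i1, i2, n') mirroring a rule n -(r1, r2)-> n' of Delta_c."""
    i1 = decode(memory, l)
    i2 = decode(apply_mov(memory, l, v), l)
    return (n, i1, i2, n_next)


memory = bytes([0x40, 0x01])          # toy encoding of "inc" with operand 01
rule = changing_rule("n", "n'", memory, 0, 0xFF)
```

With this toy memory, `rule` pairs the instruction found at location 0 before the write (`inc 01`) with the one found after it (`push 01`), which is the information a changing rule needs.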
We show in what follows how to build an SM-PDS from a binary program. We suppose we are given an oracle $O$ that extracts from the binary code a corresponding assembly program, together with information about the values of the registers and the memory locations at each control point of the program. In our implementation, we use Jakstab [Kin08] to get this oracle. We translate the assembly program into a self-modifying pushdown system where the control locations store the control points of the binary program and the stack mimics the program's stack. The non self-modifying instructions of the program define the rules $\Delta$ of the SM-PDS (which are standard PDS rules), and can be obtained following the translation of [ST12], which models the non self-modifying instructions of the program by a PDS.
As for the self-modifying instructions of the program, they define the set of changing rules $\Delta_c$. As explained above, these are instructions of the form mov $l$ $v$, where $l$ is a location of the program that stores executable data. This instruction replaces the value at location $l$ (in the binary code) with the value $v$. Let $i_1$ be the initial instruction involving the location $l$, and let $i_2$ be the new instruction involving the location $l$ after applying the mov $l$ $v$ instruction. As mentioned previously, we assume that $i_1$ and $i_2$ have the same number of operands (to ensure that only one instruction is modified). Let $r_1$ (resp. $r_2$) be the SM-PDS rule corresponding to the instruction $i_1$ (resp. $i_2$). Suppose that from control point $n$ to $n'$ we have this mov $l$ $v$ instruction; then we add $n \overset{(r_1, r_2)}{\hookrightarrow} n'$ to $\Delta_c$. This is the SM-PDS rule corresponding to the instruction mov $l$ $v$ at control point $n$.
IV. REPRESENTING INFINITE SETS OF CONFIGURATIONS OF A SM-PDS
Multi-automata were introduced in [BEM97], [EHRS00] to finitely represent regular infinite sets of configurations of a PDS. A configuration $c = (\langle p, w \rangle, \theta)$ of an SM-PDS involves a PDS configuration $\langle p, w \rangle$, together with the current set of transition rules (phase) $\theta$. To finitely represent regular infinite sets of such configurations, we extend multi-automata in order to take into account the phases $\theta$:
Definition 3.
Let $\mathcal{P} = (P, \Gamma, \Delta, \Delta_c)$ be an SM-PDS. A $\mathcal{P}$-automaton is a tuple $\mathcal{A} = (Q, \Gamma, T, P, F)$ where $\Gamma$ is the automaton alphabet, $Q$ is a finite set of states, $P \times 2^{\Delta \cup \Delta_c} \subseteq Q$ is the set of initial states, $T \subset Q \times (\Gamma \cup \{\epsilon\}) \times Q$ is the set of transitions and $F \subseteq Q$ is the set of final states.
If $(q, \gamma, q') \in T$, we write $q \xrightarrow{\gamma}_T q'$. We extend this notation in the obvious manner to sequences of symbols: (1) $\forall q \in Q$, $q \xrightarrow{\epsilon}_T q$, and (2) $\forall q, q' \in Q$, $\forall \gamma \in \Gamma \cup \{\epsilon\}$, $\forall w \in \Gamma^*$, $q \xrightarrow{\gamma w}_T q'$ iff $\exists q'' \in Q$, $q \xrightarrow{\gamma}_T q''$ and $q'' \xrightarrow{w}_T q'$. If $q \xrightarrow{w}_T q'$ holds, we say that $q \xrightarrow{w}_T q'$ is a path of $\mathcal{A}$. A configuration $(\langle p, w \rangle, \theta)$ with $w = \gamma_0 \gamma_1 \cdots \gamma_n$ is accepted by $\mathcal{A}$ iff $\mathcal{A}$ contains a path $(p, \theta) \xrightarrow{\gamma_0}_T q_1 \xrightarrow{\gamma_1}_T q_2 \cdots q_n \xrightarrow{\gamma_n}_T q_{n+1}$ where $q_{n+1} \in F$. Let $L(\mathcal{A})$ be the set of configurations accepted by $\mathcal{A}$. Let $C$ be a set of configurations of the SM-PDS $\mathcal{P}$. $C$ is regular if there exists a $\mathcal{P}$-automaton $\mathcal{A}$ such that $C = L(\mathcal{A})$.
V. EFFICIENT COMPUTATION OF $pre^*$ IMAGES
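Before turning to the algorithm, the acceptance condition of Definition 3 can be sketched as a membership check. The fragment below is a minimal sketch under stated assumptions: the function name `accepts`, the encoding of transitions as triples with `None` standing for $\epsilon$, the rule names `"r1"`, `"r2"`, and the states `"s1"`, `"s2"` are all hypothetical. Initial states are encoded as pairs `(p, theta)` with `theta` a `frozenset`.

```python
def accepts(transitions, finals, p, theta, w):
    """Check whether the configuration (<p, w>, theta) is accepted.

    transitions: set of triples (q, symbol, q2); symbol None means epsilon.
    finals: set of final states.
    """
    def eps_closure(states):
        # Add every state reachable through epsilon transitions.
        todo, seen = list(states), set(states)
        while todo:
            q = todo.pop()
            for (q1, a, q2) in transitions:
                if q1 == q and a is None and q2 not in seen:
                    seen.add(q2)
                    todo.append(q2)
        return seen

    current = eps_closure({(p, theta)})
    for gamma in w:
        step = {q2 for (q1, a, q2) in transitions
                if q1 in current and a == gamma}
        current = eps_closure(step)
    return bool(current & finals)


# Hypothetical automaton accepting the single configuration (<p0, a b>, theta).
theta = frozenset({"r1", "r2"})
transitions = {(("p0", theta), "a", "s1"), ("s1", "b", "s2")}
finals = {"s2"}
accepted = accepts(transitions, finals, "p0", theta, ("a", "b"))
```

The phase is part of the initial state, so the same stack content read from a different phase, for instance `frozenset({"r1"})`, is rejected by this automaton.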
Let $\mathcal{P} = (P, \Gamma, \Delta, \Delta_c)$ be an SM-PDS, and let $\mathcal{A} = (Q, \Gamma, T, P, F)$ be a $\mathcal{P}$-automaton that represents a regular set of configurations $\mathcal{C}$ ($\mathcal{C} = L(\mathcal{A})$). To compute $pre^*(\mathcal{C})$, one can use the translation of Section II-B to compute an equivalent PDS, and then apply the algorithms of [BEM97], [EHRS00]. This procedure is too costly, since the obtained PDS is huge. One can also use the translation of Section II-C to compute an equivalent symbolic PDS, and then use the algorithms of [Sch02]. However, this procedure is not optimal either, since the relations attached to the rules of the symbolic PDS contain a huge number of elements. We present in this section a direct and more efficient algorithm that computes $pre^*(\mathcal{C})$ without translating the SM-PDS to an equivalent PDS or symbolic PDS.

We assume w.l.o.g. that $\mathcal{A}$ has no transitions leading to an initial state. We also assume that the self-modifying rules $r = p \stackrel{(r_1, r_2)}{\hookrightarrow} p'$ in $\Delta_c$ are such that $r \neq r_1$. This is not a restriction, since a rule of the form $r = p \stackrel{(r, r_2)}{\hookrightarrow} p'$ can be replaced by the following rules, which meet this constraint: $r = p \stackrel{(r_\bot, r_\bot)}{\hookrightarrow} p_i$ and $p_i \stackrel{(r, r_2)}{\hookrightarrow} p'$, where $r_\bot$ is a new fake rule that we can add to all phases.

The construction of $\mathcal{A}_{pre^*}$ follows the same idea as for standard pushdown systems (see [BEM97], [EHRS00]). It consists in iteratively adding new transitions to the automaton $\mathcal{A}$ according to saturation rules (reflecting the backward application of the transition rules of the system), while the set of states remains unchanged. Therefore, let $\mathcal{A}_{pre^*}$ be the $\mathcal{P}$-automaton $(Q, \Gamma, T', P, F)$, where $T'$ is computed using the following saturation rules; initially $T' = T$.

($\alpha_1$): If $r = \langle p, \gamma \rangle \hookrightarrow \langle p_1, w \rangle \in \Delta$, where $w \in \Gamma^*$, then for every $\theta \subseteq \Delta \cup \Delta_c$ s.t. $r \in \theta$, if there exists in $T'$ a path $\pi = (p_1, \theta) \xrightarrow{w}_{T'} q$, then add $((p, \theta), \gamma, q)$ to $T'$.

($\alpha_2$): If $r = p \stackrel{(r_1, r_2)}{\hookrightarrow} p_1 \in \Delta_c$, then for every $\theta \subseteq \Delta \cup \Delta_c$ s.t. $r \in \theta$, $r_2 \in \theta$, and for every $\gamma \in \Gamma$, if there exists in $T'$ a transition $t = (p_1, \theta) \xrightarrow{\gamma}_{T'} q$, then add $((p, \theta'), \gamma, q)$ to $T'$, where $\theta = (\theta' \setminus \{r_1\}) \cup \{r_2\}$.

Fig. 2: The automata $\mathcal{A}$ (left) and $\mathcal{A}_{pre^*}$ (right)

The procedure above terminates since there is a finite number of states and phases.

Let us explain intuitively the role of the saturation rule ($\alpha_1$). Let $r = \langle p, \gamma \rangle \hookrightarrow \langle p', w \rangle \in \Delta$. Consider a path in the automaton of the form $(p', \theta') \xrightarrow{w}_{T'} q \xrightarrow{w'}_{T'} q_F$, where $q_F \in F$. This means, by definition of $\mathcal{P}$-automata, that the configuration $c = (\langle p', ww' \rangle, \theta')$ is accepted by $\mathcal{A}_{pre^*}$. If $r$ is in $\theta'$, then the configuration $c' = (\langle p, \gamma w' \rangle, \theta')$ is a predecessor of $c$. Therefore, it should be added to $\mathcal{A}_{pre^*}$. This configuration is accepted by the run $(p, \theta') \xrightarrow{\gamma}_{T'} q \xrightarrow{w'}_{T'} q_F$ added by rule ($\alpha_1$).

Rule ($\alpha_2$) deals with modifying rules. Let $r = p \stackrel{(r_1, r_2)}{\hookrightarrow} p' \in \Delta_c$. Consider a path in the automaton of the form $(p', \theta') \xrightarrow{\gamma}_{T'} q \xrightarrow{w'}_{T'} q_F$, where $q_F \in F$. This means, by definition of $\mathcal{P}$-automata, that the configuration $c = (\langle p', \gamma w' \rangle, \theta')$ is accepted by $\mathcal{A}_{pre^*}$. If $r$ and $r_2$ are in $\theta'$, then the configuration $c' = (\langle p, \gamma w' \rangle, \theta)$ is a predecessor of $c$, where $\theta' = (\theta \setminus \{r_1\}) \cup \{r_2\}$. Therefore, it should be added to $\mathcal{A}_{pre^*}$. This configuration is accepted by the run $\pi' = (p, \theta) \xrightarrow{\gamma}_{T'} q \xrightarrow{w'}_{T'} q_F$ added by rule ($\alpha_2$).

Thus, we can show that:

Theorem 3. $\mathcal{A}_{pre^*}$ recognizes $pre^*(L(\mathcal{A}))$.

Before proving this theorem, let us illustrate the construction on two examples.
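The two saturation rules ($\alpha_1$) and ($\alpha_2$) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation of the tool mentioned in the paper: rules are identified by hypothetical string names, a phase is a `frozenset` of rule names, the caller supplies the finite list `phases` of candidate phases instead of enumerating every subset of $\Delta \cup \Delta_c$, and the toy system at the end is invented for illustration. Rule ($\alpha_2$) is applied by iterating over source phases that contain both $r$ and $r_1$ and computing the resulting phase.

```python
def read(trans, state, word):
    """Return the states reachable from `state` by reading `word`."""
    current = {state}
    for sym in word:
        current = {q2 for (q1, a, q2) in trans if q1 in current and a == sym}
    return current


def pre_star(delta, delta_c, phases, trans):
    """Saturate `trans` with rules (alpha_1) and (alpha_2).

    delta:   {name: (p, gamma, p1, w)} for <p, gamma> -> <p1, w>, w a tuple
    delta_c: {name: (p, r1, r2, p1)}   for p -(r1, r2)-> p1
    phases:  candidate phases (assumption: supplied by the caller)
    trans:   set of (state, symbol, state); initial states are (p, theta)
    """
    trans = set(trans)
    phases = [frozenset(t) for t in phases]
    changed = True
    while changed:
        changed = False
        new = set()
        # (alpha_1): undo a pushdown rule r that is active in phase theta.
        for r, (p, gamma, p1, w) in delta.items():
            for theta in phases:
                if r in theta:
                    for q in read(trans, (p1, theta), w):
                        new.add(((p, theta), gamma, q))
        # (alpha_2): undo a self-modifying rule; the stack is unchanged.
        for r, (p, r1, r2, p1) in delta_c.items():
            for theta_src in phases:
                if r in theta_src and r1 in theta_src:
                    theta = (theta_src - {r1}) | {r2}
                    for (q1, a, q2) in trans:
                        if q1 == (p1, theta):
                            new.add(((p, theta_src), a, q2))
        if not new <= trans:
            trans |= new
            changed = True
    return trans


def accepts(trans, final, p, word, theta):
    """Check whether the configuration (<p, word>, theta) is accepted."""
    return bool(read(trans, (p, frozenset(theta)), word) & set(final))


# Hypothetical toy system (not one of the paper's examples).
theta0 = frozenset({"ra", "rc"})
theta1 = frozenset({"rb", "rc"})
delta = {"ra": ("p1", "a", "p2", ()), "rb": ("p1", "a", "p2", ("a", "a"))}
delta_c = {"rc": ("p0", "ra", "rb", "p1")}
# Target set C = {(<p2, a a>, theta1)}.
target = {(("p2", theta1), "a", "s1"), ("s1", "a", "sf")}
result = pre_star(delta, delta_c, [theta0, theta1], target)
print(accepts(result, {"sf"}, "p0", ("a",), theta0))
```

In the toy system, the self-modifying rule `rc` replaces the pop rule `ra` by the push rule `rb`, so the configuration with control point `p0` can only reach the target after the phase change. Recomputing all candidate transitions in every round keeps the sketch short; a worklist would be the usual choice for efficiency.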
Example 2.
Let us illustrate the procedure by an ex-ample. Consider the SM-PDS with control points P = { p , p , p , p , p , p } and ∆ , ∆ c as shown in the left halfof Fig. 2. Let A be the automaton that accepts the set C = { ( (cid:104) p , γ γ (cid:105) , θ ) } , also shown on the left where ( p , θ ) is the initial state and s is the final state. The result of thealgorithm is shown in the right half of Fig. 2. The result isobtained through the following steps: First, we note that ( p , θ ) (cid:15) −→ T (cid:48) ( p , θ ) holds. Since (cid:104) p , (cid:15) (cid:105) occurs on the right hand side of rule r and r ∈ θ , then Rule ( α ) adds the transition ( p , θ ) γ −→ ( p , θ ) to T (cid:48) . Now that we have ( p , θ ) γ −→ T (cid:48) ( p , θ ) , since r ∈ θ ,Rule ( α ) adds ( p , θ ) γ −→ ( p , θ ) to T (cid:48) . Since we have ( p , θ ) γ −→ T (cid:48) ( p , θ ) , the self-modifyingtransition r (cid:48) ∈ θ can be applied. Thus, Rule ( α ) adds ( p , θ ) γ −→ ( p , θ ) to T (cid:48) where θ = ( θ \ { r } ) ∪{ r } = { r , r , r , r , r (cid:48) } . Since ( p , θ ) (cid:15) −→ ( p , θ ) and r ∈ θ , Rule ( α ) adds ( p , θ ) γ −→ ( p , θ ) to T (cid:48) . Then, there is a path ( p , θ ) γ −→ T (cid:48) ( p , θ ) γ −→ T (cid:48) ( p , θ ) . Since (cid:104) p , γ γ (cid:105) occurs on the right hand sideof r and r ∈ θ , then Rule ( α ) adds the transition ( p , θ ) γ −→ ( p , θ ) to T (cid:48) . No further additions are possible. Thus, the procedureterminates.
Example 3.
Let us give another example. Consider the SM-PDS with control points P = { p , p , p , p , p } and ∆ , ∆ c as shown in the left half of Fig. 3. Let A be the automatonthat accepts the set C = { ( (cid:104) p , γ γ (cid:105) , θ ) } where ( p , θ ) isthe initial state and s is the final state as shown on the left.The result A pre ∗ of the algorithm is on the right half of Fig.3. The result is obtained through the following steps: Since ( p , θ ) γ −→ T (cid:48) s and r ∈ θ , then Rule ( α ) adds ( p , θ ) γ −→ s to T (cid:48) . Since ( p , θ ) γ −→ T (cid:48) s and r ∈ θ , Rule ( α ) adds thetransition ( p , θ ) γ −→ s to T (cid:48) . Since ( p , θ ) (cid:15) −→ T (cid:48) ( p , θ ) and r ∈ θ , Rule ( α ) addsthe transition ( p , θ ) γ −→ ( p , θ ) to T (cid:48) . Then, there is a path ( p , θ ) γ −→ T (cid:48) ( p , θ ) γ −→ T (cid:48) s and r ∈ θ , Rule α adds the transition ( p , θ ) γ −→ s to T (cid:48) . Because ( p , θ ) (cid:15) −→ T (cid:48) ( p , θ ) and r ∈ θ , Rule ( α ) adds the transition ( p , θ ) γ −→ ( p , θ ) to T (cid:48) . Since ( p , θ ) γ −→ T (cid:48) ( p , θ ) and r ∈ θ , Rule ( α ) adds the transition ( p , θ ) γ −→ ( p , θ ) to T (cid:48) . Then,since r ∈ θ , Rule ( α ) adds the transition ( p , θ ) γ −→ ( p , θ ) to T (cid:48) . Since there is a path ( p , θ ) γ −→ T (cid:48) ( p , θ ) γ −→ T (cid:48) ( p , θ ) and r ∈ θ , Rule ( α ) adds ( p , θ ) γ −→ ( p , θ ) to T (cid:48) . Since ( p , θ ) γ −→ T (cid:48) s , ( p , θ ) γ −→ T (cid:48) ( p , θ ) and r c , r ∈ θ , Rule ( α ) adds ( p , θ ) γ −→ s and ( p , θ ) γ −→ ( p , θ ) to T (cid:48) where θ = ( θ \{ r } ) ∪ r = { r , r , r , r , r , r c , r c , r c , r c } . For the same reason,since ( p , θ ) γ −→ T (cid:48) ( p , θ ) , ( p , θ ) γ −→ T (cid:48) s and r , ∈ θ , r c ∈ θ , Rule ( α ) adds the transitions ( p , θ ) γ −→ ( p , θ ) and ( p , θ ) γ −→ s to T (cid:48) where θ = ( θ \{ r } ) ∪ { r } = { r , r , r , r , r , r c , r c , r c , r c } . 
Since ( p , θ ) γ −→ T (cid:48) s , ( p , θ ) γ −→ T (cid:48) ( p , θ ) and r ∈ θ , Rule ( α ) adds the transitions ( p , θ ) γ −→ s and ( p , θ ) γ −→ ( p , θ ) to T (cid:48) . Since ( p , θ ) (cid:15) −→ T (cid:48) ( p , θ ) and r ∈ θ , Rule ( α ) adds ( p , θ ) γ −→ ( p , θ ) . Because there are paths ( p , θ ) γ −→ T (cid:48) ( p , θ ) γ −→ T (cid:48) ( p , θ ) and ( p , θ ) γ −→ T (cid:48) ( p , θ ) γ −→ T (cid:48) s , Rule ( α ) adds the transitions ( p , θ ) γ −→ ( p , θ ) and ( p , θ ) γ −→ s to T (cid:48) . Since ( p , θ ) (cid:15) −→ T (cid:48) ( p , θ ) and r ∈ θ , Rule ( α ) adds ( p , θ ) γ −→ ( p , θ ) . Now we have ( p , θ ) γ −→ T (cid:48) ( p , θ ) and r c , r ∈ θ , , θ p , θ s s s s γ γ γ γ Δ :Δ : r : ⟨ p , γ ⟩ ↪ ⟨ p , γ ⟩ r : ⟨ p , γ ⟩ ↪ ⟨ p , γ ⟩ r : ⟨ p , γ ⟩ ↪ ⟨ p , γ ⟩ r : ⟨ p , γ ⟩ ↪ ⟨ p , γ ⟩ r : ⟨ p , γ ⟩ ↪ ⟨ p , ϵ ⟩ r : ⟨ p , γ ⟩ ↪ ⟨ p , ϵ ⟩ r : ⟨ p , γ ⟩ ↪ ⟨ p , γ γ ⟩ r : ⟨ p , γ ⟩ ↪ ⟨ p , γ γ ⟩ r : ⟨ p , γ ⟩ ↪ ⟨ p , ϵ ⟩ r : ⟨ p , γ ⟩ ↪ ⟨ p , ϵ ⟩ r : ⟨ p , γ ⟩ ↪ ⟨ p , γ ⟩ r : ⟨ p , γ ⟩ ↪ ⟨ p , γ ⟩ r : ⟨ p , γ ⟩ ↪ ⟨ p , ϵ ⟩ r : ⟨ p , γ ⟩ ↪ ⟨ p , ϵ ⟩ Δ c :Δ c : r c : p r c : p r c : p r c : p r c : p r c : p r c : p r c : p θ = { r , r , r , r , r , r c , r c , r c , r c } θ = { r , r , r , r , r , r c , r c , r c , r c } p , θ p , θ s s s s p , θ p , θ γ p , θ p , θ γ γ p , θ p , θ γ p , θ p , θ γ γ θ = { r , r , r , r , r , r c , r c , r c , r c } θ = { r , r , r , r , r , r c , r c , r c , r c } θ = { r , r , r , r , r , r c , r c , r c , r c } p , θ p , θ γ γ γ p , θ p , θ γ p , θ p , θ p , θ p , θ γ γ p , θ p , θ γ p , θ p , θ γ γ γ γ γ γ θ = { r , r , r , r , r , r c , r c , r c , r c } p , θ p , θ γ p , θ p , θ γ γ p , θ p , θ γ p , θ p , θ γ p , θ p , θ γ p , θ p , θ γ γ p , θ p , θ γ p , θ p , θ γ p , θ p , θ γ γ γ γ γ γ γ ↪↪ p ↪↪ ( r , r )( r , r )( r , r )( r , r ) p p p Fig. 
3: The automata A (left) and A pre ∗ (right) Rule ( α ) adds the transition ( p , θ ) γ −→ ( p , θ ) to T (cid:48) where θ = { r , r , r , r , r , r c , r c , r c , r c } . For thesame reason, since ( p , θ ) γ −→ ( p , θ ) and r c , r ∈ θ ,Rule α adds the transition ( p , θ ) γ −→ ( p , θ ) to T (cid:48) because θ = ( θ \ { r } ) ∪ { r } . Since ( p , θ ) (cid:15) −→ T (cid:48) ( p , θ ) and r ∈ θ , Rule ( α ) addsthe transition ( p , θ ) γ −→ ( p , θ ) to T (cid:48) . Because ( p , θ ) (cid:15) −→ T (cid:48) ( p , θ ) and r ∈ θ , Rule ( α )adds the transition ( p , θ ) γ −→ ( p , θ ) to T (cid:48) . Then,since there is a path ( p , θ ) γ −→ T (cid:48) ( p , θ ) γ −→ T (cid:48) ( p , θ ) and r ∈ θ , Rule ( α ) adds the transition ( p , θ ) γ −→ ( p , θ ) to T (cid:48) . Then, since r ∈ θ , Rule ( α ) adds the transition ( p , θ ) γ −→ ( p , θ ) to T (cid:48) . Since ( p , θ ) γ −→ T (cid:48) ( p , θ ) and r c ∈ θ , Rule ( α ) adds the transition ( p , θ ) γ −→ T (cid:48) ( p , θ ) to T (cid:48) where ( θ \ { r } ) ∪ { r } = θ . Meanwhile, since ( p , θ ) γ −→ ( p , θ ) and r c , r ∈ θ , Rule ( α ) adds the transition ( p , θ ) γ −→ ( p , θ ) to T (cid:48) where ( θ \{ r } ) ∪{ r } = θ . Because r ∈ θ and ( p , θ ) (cid:15) −→ T (cid:48) ( p , θ ) , Rule ( α )adds the transition ( p , θ ) γ −→ ( p , θ ) to T (cid:48) . Since ( p , θ ) (cid:15) −→ T (cid:48) ( p , θ ) and r ∈ θ , Rule ( α ) adds ( p , θ ) γ −→ T (cid:48) ( p , θ ) to T (cid:48) . Then, there is a path ( p , θ ) γ γ −−−→ ∗ T (cid:48) ( p , θ ) , since r ∈ θ , Rule ( α ) addsthe transition ( p , θ ) γ −→ T (cid:48) ( p , θ ) to T (cid:48) . Then, since ( p , θ ) (cid:15) −→ T (cid:48) ( p , θ ) and r ∈ θ , Rule ( α ) adds thetransition ( p , θ ) γ −→ ( p , θ ) to T (cid:48) . 
Now we have ( p , θ ) γ −→ T (cid:48) ( p , θ ) and ( p , θ ) γ −→ T (cid:48) ( p , θ ) , since r c , r ∈ θ , Rule α adds the transitions ( p , θ ) γ −→ ( p , θ ) and ( p , θ ) γ −→ ( p , θ ) to T (cid:48) where ( θ \ { r }∪ ) { r } = θ . Since ( p , θ ) γ −→ T (cid:48) ( p , θ ) and r , r c ∈ θ , Rule ( α ) adds the transition ( p , θ ) γ −→ ( p , θ ) to T (cid:48) where ( θ \ { r } ) ∪ { r } = θ . Since ( p , θ ) γ −→ T (cid:48) ( p , θ ) and r ∈ θ , Rule ( α ) adds the transition ( p , θ ) γ −→ ( p , θ ) to T (cid:48) . No further additions are possible, so the procedureterminates.A. Proof of Theorem 3
Let us now prove Theorem 3. To do so, we first introduce the following lemma.
Lemma 1.
For every configuration $(\langle p, w \rangle, \theta) \in L(\mathcal{A})$, if $(\langle p', w' \rangle, \theta') \Rightarrow^*_{\mathcal{P}} (\langle p, w \rangle, \theta)$, then $(p', \theta') \xrightarrow{w'}_{T'} q$ for some final state $q$ of $\mathcal{A}_{pre^*}$.

Proof:
Assume $(\langle p', w' \rangle, \theta') \Rightarrow^{i}_{\mathcal{P}} (\langle p, w \rangle, \theta)$. We proceed by induction on $i$.

Basis. $i = 0$. Then $\theta' = \theta$, $p' = p$ and $w = w'$. Since $(\langle p, w \rangle, \theta) \in L(\mathcal{A})$, $(p, \theta) \xrightarrow{w}_{T'} q$ holds for some final state $q$, i.e. $(p', \theta') \xrightarrow{w'}_{T'} q$ holds.

Step. $i > 0$. Then there exists a configuration $(\langle p'', u \rangle, \theta'')$ such that

$(\langle p', w' \rangle, \theta') \Rightarrow_{\mathcal{P}} (\langle p'', u \rangle, \theta'') \Rightarrow^{i-1}_{\mathcal{P}} (\langle p, w \rangle, \theta)$

We apply the induction hypothesis to $(\langle p'', u \rangle, \theta'') \Rightarrow^{i-1}_{\mathcal{P}} (\langle p, w \rangle, \theta)$, and obtain $(p'', \theta'') \xrightarrow{u}_{T'} q$ for $q \in F$.

Let $w_1, u_1 \in \Gamma^*$, $\gamma' \in \Gamma$ be such that $w' = \gamma' w_1$, $u = u_1 w_1$. Let $q'$ be a state of $\mathcal{A}_{pre^*}$ s.t.

$(p'', \theta'') \xrightarrow{u_1}_{T'} q' \xrightarrow{w_1}_{T'} q$ (1)

There are two cases depending on which rule is applied to get $(\langle p', w' \rangle, \theta') \Rightarrow (\langle p'', u \rangle, \theta'')$.

1) Case $(\langle p', w' \rangle, \theta') \Rightarrow (\langle p'', u \rangle, \theta'')$ is obtained by a rule of the form $\langle p', \gamma' \rangle \hookrightarrow \langle p'', u_1 \rangle \in \Delta$. In this case, $\theta'' = \theta'$. By the saturation rule ($\alpha_1$), we have

$(p', \theta'') \xrightarrow{\gamma'}_{T'} q'$ (2)

Putting (1) and (2) together, we obtain

$\pi = (p', \theta'') \xrightarrow{\gamma'}_{T'} q' \xrightarrow{w_1}_{T'} q$ (3)

Thus, $(p', \theta'') \xrightarrow{\gamma' w_1}_{T'} q$, i.e. $(p', \theta') \xrightarrow{w'}_{T'} q$ for some final state $q \in F$.

2) Case $(\langle p', w' \rangle, \theta') \Rightarrow (\langle p'', u \rangle, \theta'')$ is obtained by a rule of the form $p' \stackrel{(r_1, r_2)}{\hookrightarrow} p'' \in \Delta_c$, i.e. $\theta'' \neq \theta'$. In this case, $u_1 = \gamma'$. By the saturation rule ($\alpha_2$), we obtain

$(p', \theta') \xrightarrow{\gamma'}_{T'} q'$ where $\theta'' = (\theta' \setminus \{r_1\}) \cup \{r_2\}$ (4)

Putting (1) and (4) together, we have the following path:

$(p', \theta') \xrightarrow{\gamma'}_{T'} q' \xrightarrow{w_1}_{T'} q$, i.e. $(p', \theta') \xrightarrow{w'}_{T'} q$ for $q \in F$ (5)

Lemma 2.
If a path $\pi = (p, \theta) \xrightarrow{w}_{T'} q$ for $\theta \subseteq \Delta \cup \Delta_c$ is in $\mathcal{A}_{pre^*}$, then (I) $(\langle p, w \rangle, \theta) \Rightarrow^* (\langle p', w' \rangle, \theta')$ holds for a configuration $(\langle p', w' \rangle, \theta')$ s.t. $(p', \theta') \xrightarrow{w'}_{T} q$ in the initial $\mathcal{P}$-automaton $\mathcal{A}$; (II) moreover, if $q$ is an initial state, i.e. of the form $(p', \theta')$, then $w' = \epsilon$.

Proof:
Let $\mathcal{A}_{pre^*} = (Q, \Gamma, T', P, F)$ be the $\mathcal{P}$-automaton computed by the saturation procedure. In this proof, we use $\xrightarrow{}^{i}_{T'}$ to denote the transition relation of $\mathcal{A}_{pre^*}$ obtained after adding $i$ transitions using the saturation procedure. In particular, since initially $\mathcal{A}_{pre^*} = \mathcal{A}$, if $\mathcal{A}$ contains the path $(p', \theta') \xrightarrow{w'}_{T} q$ where $(\langle p', w' \rangle, \theta') \in L(\mathcal{A})$, then we write $(p', \theta') \xrightarrow{w'}{}^{0}_{T'} q$.

Let $i$ be an index such that $\pi = (p, \theta) \xrightarrow{w}{}^{i}_{T'} q$ holds. We shall prove (I) by induction on $i$. Statement (II) then follows immediately from the fact that initial states have no incoming transitions in $\mathcal{A}$.

Basis. $i = 0$. Since $(\langle p, w \rangle, \theta) \Rightarrow^* (\langle p, w \rangle, \theta)$ always holds, take $p' = p$, $w' = w$ and $\theta' = \theta$.

Step. $i > 0$. Let $t = ((p_1, \theta_1), \gamma, q')$ be the $i$-th transition added to $\mathcal{A}_{pre^*}$ and $j$ be the number of times that $t$ is used in the path $(p, \theta) \xrightarrow{w}{}^{i}_{T'} q$. The proof is by induction on $j$. If $j = 0$, then we have $(p, \theta) \xrightarrow{w}{}^{i-1}_{T'} q$ in the automaton, and applying the induction hypothesis (induction on $i$) we obtain $(\langle p, w \rangle, \theta) \Rightarrow^* (\langle p', w' \rangle, \theta')$ for a configuration $(\langle p', w' \rangle, \theta')$ s.t. $(p', \theta') \xrightarrow{w'}_{T} q$ in the initial $\mathcal{P}$-automaton $\mathcal{A}$. So assume that $j > 0$. Then, there exist $u$ and $v$ such that $w = u \gamma v$ and

$(p, \theta) \xrightarrow{u}{}^{i-1}_{T'} (p_1, \theta_1) \xrightarrow{\gamma}{}^{i}_{T'} q' \xrightarrow{v}{}^{i}_{T'} q$ (1)

The application of the induction hypothesis (induction on $i$) to $(p, \theta) \xrightarrow{u}{}^{i-1}_{T'} (p_1, \theta_1)$ (notice that $(p_1, \theta_1)$ is an initial state) gives

$(\langle p, u \rangle, \theta) \Rightarrow^* (\langle p_1, \epsilon \rangle, \theta_1)$ (2)

There are two cases depending on whether transition $t$ was added by saturation rule ($\alpha_1$) or ($\alpha_2$).

1) Case $t$ was added by rule ($\alpha_1$): there exist $p_2 \in P$ and $w_2 \in \Gamma^*$ such that

$r = \langle p_1, \gamma \rangle \hookrightarrow \langle p_2, w_2 \rangle \in \Delta \cap \theta_1$ (3)

and $\mathcal{A}_{pre^*}$ contains the following path:

$\pi' = (p_2, \theta_1) \xrightarrow{w_2}{}^{i-1}_{T'} q' \xrightarrow{v}{}^{i}_{T'} q$ (4)

Applying the transition rule $r$ gives

$(\langle p_1, \gamma v \rangle, \theta_1) \Rightarrow (\langle p_2, w_2 v \rangle, \theta_1)$ (5)

By induction on $j$ (since transition $t$ is used $j - 1$ times in $\pi'$), we get from (4) that

$(\langle p_2, w_2 v \rangle, \theta_1) \Rightarrow^* (\langle p', w' \rangle, \theta')$ s.t. $(p', \theta') \xrightarrow{w'}_{T} q$ in the initial $\mathcal{P}$-automaton $\mathcal{A}$ (6)

Putting (2), (5) and (6) together, we obtain

$(\langle p, w \rangle, \theta) = (\langle p, u \gamma v \rangle, \theta) \Rightarrow^* (\langle p_1, \gamma v \rangle, \theta_1) \Rightarrow (\langle p_2, w_2 v \rangle, \theta_1) \Rightarrow^* (\langle p', w' \rangle, \theta')$

such that $(p', \theta') \xrightarrow{w'}_{T} q$ in the initial $\mathcal{P}$-automaton $\mathcal{A}$.

2) Case $t$ was added by rule ($\alpha_2$): there exist $p_2 \in P$ and $\theta'' \subseteq \Delta \cup \Delta_c$ such that

$r = p_1 \stackrel{(r_1, r_2)}{\hookrightarrow} p_2 \in \Delta_c \cap \theta''$, $\theta'' = (\theta_1 \setminus \{r_1\}) \cup \{r_2\}$ (7)

and the following path in the current automaton (a self-modifying rule does not change the stack), with $r \in \theta''$:

$(p_2, \theta'') \xrightarrow{\gamma}{}^{i-1}_{T'} q' \xrightarrow{v}{}^{i}_{T'} q$ (8)

Applying the transition rule, we get from (7) that

$(\langle p_1, \gamma v \rangle, \theta_1) \Rightarrow (\langle p_2, \gamma v \rangle, \theta'')$ (9)

We can apply the induction hypothesis (on $j$) to (8), and obtain

$(\langle p_2, \gamma v \rangle, \theta'') \Rightarrow^* (\langle p', w' \rangle, \theta')$ s.t. $(p', \theta') \xrightarrow{w'}_{T} q$ in the initial $\mathcal{P}$-automaton $\mathcal{A}$ (10)

From (2), (9) and (10), we get

$(\langle p, w \rangle, \theta) = (\langle p, u \gamma v \rangle, \theta) \Rightarrow^* (\langle p_1, \gamma v \rangle, \theta_1) \Rightarrow (\langle p_2, \gamma v \rangle, \theta'') \Rightarrow^* (\langle p', w' \rangle, \theta')$

such that $(p', \theta') \xrightarrow{w'}_{T} q$ in the initial $\mathcal{P}$-automaton $\mathcal{A}$. $\Box$

Then, we can prove Theorem 3:
Proof:
Let $(\langle p, w \rangle, \theta)$ be a configuration of $pre^*(L(\mathcal{A}))$. Then $(\langle p, w \rangle, \theta) \Rightarrow^* (\langle p', w' \rangle, \theta')$ for a configuration $(\langle p', w' \rangle, \theta')$ s.t. $(p', \theta') \xrightarrow{w'}_{T} q$ is a path in $\mathcal{A}$ for $q \in F$. By Lemma 1, there exists a path $(p, \theta) \xrightarrow{w}_{T'} q$ for some final state $q$ of $\mathcal{A}_{pre^*}$. So $(\langle p, w \rangle, \theta)$ is recognized by $\mathcal{A}_{pre^*}$.

Conversely, let $(\langle p, w \rangle, \theta)$ be a configuration accepted by $\mathcal{A}_{pre^*}$, i.e. there exists a path $(p, \theta) \xrightarrow{w}_{T'} q$ in $\mathcal{A}_{pre^*}$ for some final state $q \in F$. By Lemma 2, there exists a configuration $(\langle p', w' \rangle, \theta')$ s.t. there exists a path $(p', \theta') \xrightarrow{w'}_{T} q$ in the initial automaton $\mathcal{A}$ and $(\langle p, w \rangle, \theta) \Rightarrow^* (\langle p', w' \rangle, \theta')$. Because $q$ is a final state, we have $(\langle p', w' \rangle, \theta') \in L(\mathcal{A})$, i.e. $(\langle p, w \rangle, \theta) \in pre^*(L(\mathcal{A}))$. $\Box$

EFFICIENT COMPUTATION OF $post^*$ IMAGES
Let $\mathcal{P} = (P, \Gamma, \Delta, \Delta_c)$ be an SM-PDS, and let $\mathcal{A} = (Q, \Gamma, T, P, F)$ be a $\mathcal{P}$-automaton that represents a regular set of configurations $\mathcal{C}$ ($\mathcal{C} = L(\mathcal{A})$). As for $pre^*$, it is not optimal to compute $post^*(\mathcal{C})$ by using the translations of Sections II-B and II-C to compute equivalent PDSs or symbolic PDSs, and then applying the algorithms of [EHRS00], [Sch02]. We present in this section a direct and efficient algorithm that computes $post^*(\mathcal{C})$.

We assume w.l.o.g. that $\mathcal{A}$ has no transitions leading to an initial state. Moreover, we assume that the rules of $\Delta$ are of the form $\langle p, \gamma \rangle \hookrightarrow \langle p', w \rangle$, where $|w| \leq 2$. This is not a restriction. Indeed, a rule of the form $\langle p, \gamma \rangle \hookrightarrow \langle p', \gamma_1 \cdots \gamma_n \rangle$, $n > 2$, can be replaced by the following rules, where the $p_i$ and $a_i$ are fresh control points and stack symbols:
• $\langle p, \gamma \rangle \hookrightarrow \langle p_1, a_1 \gamma_n \rangle$
• $\langle p_1, a_1 \rangle \hookrightarrow \langle p_2, a_2 \gamma_{n-1} \rangle$
• $\langle p_2, a_2 \rangle \hookrightarrow \langle p_3, a_3 \gamma_{n-2} \rangle$
• $\cdots$
• $\langle p_{n-2}, a_{n-2} \rangle \hookrightarrow \langle p', \gamma_1 \gamma_2 \rangle$

As previously, the construction of $\mathcal{A}_{post^*}$ consists in iteratively adding new transitions to the automaton $\mathcal{A}$ according to saturation rules (reflecting the forward application of the transition rules of the system). We define $\mathcal{A}_{post^*}$ to be the $\mathcal{P}$-automaton $(Q', \Gamma, T', P, F)$, where $T'$ is computed using the following saturation rules and $Q'$ is the smallest set s.t. $Q \subseteq Q'$ and, for every $r = \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma_1 \gamma_2 \rangle \in \Delta$ and every phase $\theta$ with $r \in \theta$, $q^{\theta}_{p' \gamma_1} \in Q'$, where $q^{\theta}_{p' \gamma_1}$ is the new state labelled with $p'$, $\gamma_1$ and $\theta$. Initially $T' = T$.

($\beta_1$): If $r = \langle p, \gamma \rangle \hookrightarrow \langle p', \epsilon \rangle \in \Delta$ and there exists in $T'$ a path $\pi = (p, \theta) \xrightarrow{\gamma}_{T'} q$ with $r \in \theta$, then add $((p', \theta), \epsilon, q)$ to $T'$.

($\beta_2$): If $r = \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \rangle \in \Delta$ and there exists in $T'$ a path $\pi = (p, \theta) \xrightarrow{\gamma}_{T'} q$ with $r \in \theta$, then add $((p', \theta), \gamma', q)$ to $T'$.

($\beta_3$): If $r = \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma_1 \gamma_2 \rangle \in \Delta$ and there exists in $T'$ a path $\pi = (p, \theta) \xrightarrow{\gamma}_{T'} q$ with $r \in \theta$, then add $t' = ((p', \theta), \gamma_1, q^{\theta}_{p' \gamma_1})$ and $t'' = (q^{\theta}_{p' \gamma_1}, \gamma_2, q)$ to $T'$.

($\beta_4$): If $r = p \stackrel{(r_1, r_2)}{\hookrightarrow} p' \in \Delta_c$ and there exists in $T'$ a path $\pi = (p, \theta) \xrightarrow{\gamma}_{T'} q$, where $\gamma \in \Gamma$, with $r \in \theta$ and $r_1 \in \theta$, then add $t' = ((p', \theta'), \gamma, q)$ to $T'$, where $\theta' = (\theta \setminus \{r_1\}) \cup \{r_2\}$.

The procedure above terminates since there is a finite number of states and phases.

Let us explain intuitively the role of the saturation rules above. Consider a path in the automaton of the form $(p, \theta) \xrightarrow{\gamma}_{T'} q \xrightarrow{w'}_{T'} q_F$, where $q_F \in F$. This means, by definition of $\mathcal{P}$-automata, that the configuration $c = (\langle p, \gamma w' \rangle, \theta)$ is accepted by $\mathcal{A}_{post^*}$.

Let $r = \langle p, \gamma \rangle \hookrightarrow \langle p', \epsilon \rangle \in \Delta$. If $r$ is in $\theta$, then the configuration $c' = (\langle p', w' \rangle, \theta)$ is a successor of $c$. Therefore, it should be added to $\mathcal{A}_{post^*}$. This configuration is accepted by the run $(p', \theta) \xrightarrow{\epsilon}_{T'} q \xrightarrow{w'}_{T'} q_F$ added by rule ($\beta_1$).

If $\theta$ contains the rule $r = \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \rangle \in \Delta$, then the configuration $c' = (\langle p', \gamma' w' \rangle, \theta)$ is a successor of $c$. Therefore, it should be added to $\mathcal{A}_{post^*}$. This configuration is accepted by the run $(p', \theta) \xrightarrow{\gamma'}_{T'} q \xrightarrow{w'}_{T'} q_F$ added by rule ($\beta_2$).

If $r = \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma_1 \gamma_2 \rangle \in \Delta$ is in $\theta$, then the configuration $c' = (\langle p', \gamma_1 \gamma_2 w' \rangle, \theta)$ is a successor of $c$. Therefore, it should be added to $\mathcal{A}_{post^*}$. This configuration is accepted by the run $(p', \theta) \xrightarrow{\gamma_1}_{T'} q^{\theta}_{p' \gamma_1} \xrightarrow{\gamma_2}_{T'} q \xrightarrow{w'}_{T'} q_F$ added by rule ($\beta_3$).

Rule ($\beta_4$) deals with modifying rules. Let $r = p \stackrel{(r_1, r_2)}{\hookrightarrow} p' \in \Delta_c$. If $r$ and $r_1$ are in $\theta$, then the configuration $c' = (\langle p', \gamma w' \rangle, \theta')$ is a successor of $c$, where $\theta' = (\theta \setminus \{r_1\}) \cup \{r_2\}$. Therefore, it should be added to $\mathcal{A}_{post^*}$. This configuration is accepted by the run $(p', \theta') \xrightarrow{\gamma}_{T'} q \xrightarrow{w'}_{T'} q_F$ added by rule ($\beta_4$).

Thus, we can show that:

Theorem 4. $\mathcal{A}_{post^*}$ recognizes the set $post^*(L(\mathcal{A}))$.

Before proving this theorem, let us illustrate the construction on two examples.
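The saturation rules ($\beta_1$) to ($\beta_4$) can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the paper's implementation: rule names are hypothetical strings, a phase is a `frozenset` of rule names, `None` stands for $\epsilon$, initial states are pairs `(p, theta)`, and the new state $q^{\theta}_{p' \gamma_1}$ is encoded as the tuple `("mid", p1, gamma1, theta)`. The toy system at the end is invented for illustration.

```python
EPS = None  # label of epsilon transitions (assumption: never a stack symbol)


def eps_closure(trans, states):
    """Close a set of states under epsilon transitions."""
    seen, todo = set(states), list(states)
    while todo:
        s = todo.pop()
        for (q1, a, q2) in trans:
            if q1 == s and a is EPS and q2 not in seen:
                seen.add(q2)
                todo.append(q2)
    return seen


def step(trans, states, sym):
    """States reachable by a path labelled `sym`, epsilon moves allowed."""
    src = eps_closure(trans, states)
    dst = {q2 for (q1, a, q2) in trans if q1 in src and a == sym}
    return eps_closure(trans, dst)


def is_initial(state):
    return (isinstance(state, tuple) and len(state) == 2
            and isinstance(state[1], frozenset))


def post_star(delta, delta_c, trans):
    """Saturate `trans` with rules (beta_1)..(beta_4).

    delta:   {name: (p, gamma, p1, w)} with len(w) <= 2
    delta_c: {name: (p, r1, r2, p1)}   for p -(r1, r2)-> p1
    """
    trans = set(trans)
    while True:
        new = set()
        labels = {a for (_, a, _) in trans if a is not EPS}
        sources = {s for (s, _, _) in trans if is_initial(s)}
        for (p, theta) in sources:
            for gamma in labels:
                targets = step(trans, {(p, theta)}, gamma)
                for r, (rp, rg, p1, w) in delta.items():
                    if r not in theta or (rp, rg) != (p, gamma):
                        continue
                    for q in targets:
                        if len(w) == 0:      # (beta_1)
                            new.add(((p1, theta), EPS, q))
                        elif len(w) == 1:    # (beta_2)
                            new.add(((p1, theta), w[0], q))
                        else:                # (beta_3)
                            mid = ("mid", p1, w[0], theta)
                            new.add(((p1, theta), w[0], mid))
                            new.add((mid, w[1], q))
                for r, (rp, r1, r2, p1) in delta_c.items():
                    if rp == p and r in theta and r1 in theta:  # (beta_4)
                        theta2 = (theta - {r1}) | {r2}
                        for q in targets:
                            new.add(((p1, theta2), gamma, q))
        if new <= trans:
            return trans
        trans |= new


def accepts(trans, final, p, word, theta):
    """Check whether the configuration (<p, word>, theta) is accepted."""
    states = eps_closure(trans, {(p, frozenset(theta))})
    for sym in word:
        states = step(trans, states, sym)
    return bool(states & set(final))


# Hypothetical toy system (not one of the paper's examples).
theta0 = frozenset({"ra", "rc"})
theta1 = frozenset({"rb", "rc"})
delta = {"ra": ("p1", "a", "p2", ()), "rb": ("p1", "a", "p2", ("a", "a"))}
delta_c = {"rc": ("p0", "ra", "rb", "p1")}
start = {(("p0", theta0), "a", "sf")}  # C = {(<p0, a>, theta0)}
result = post_star(delta, delta_c, start)
print(accepts(result, {"sf"}, "p2", ("a", "a"), theta1))
```

In the toy system, the self-modifying rule `rc` swaps the pop rule `ra` for the push rule `rb` before control reaches `p1`, so the only way forward from `p1` is the push.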
Fig. 4: The automata $\mathcal{A}$ (left) and $\mathcal{A}_{post^*}$ (right)

Example 4.
Let us illustrate this procedure by an exam-ple. Consider the SM-PDS shown in the left half of Fig.4 and the automaton A from Fig. 4 that accepts the set C = { ( (cid:104) p , γ γ (cid:105) , θ ) } where ( p , θ ) is the initial state and s is the final state. Then the result A post ∗ of the algorithm isshown in the right half of Fig. 4. The result is derived throughthe following steps: First, since ( p , θ ) γ −→ T (cid:48) s and r ∈ θ , Rule ( β ) generates a new state q θ p γ and adds the two transitions: ( p , θ ) γ −→ q θ p γ and q θ p γ γ −→ s to T (cid:48) . Since ( p , θ ) γ −→ T (cid:48) q θ p γ and r ∈ θ , Rule ( β ) generates a new state q θ p γ and adds two transitions: ( p , θ ) γ −→ q θ p γ and q θ p γ γ −→ q θ p γ to T (cid:48) . Because ( p , θ ) γ −→ T (cid:48) q θ p γ and r ∈ θ , Rule ( β ) adds the transition ( p , θ ) γ −→ q θ p γ to T (cid:48) . Since ( p , θ ) γ −→ T (cid:48) q θ p γ and r (cid:48) ∈ θ , Rule ( β ) addsthe transition ( p , θ ) γ −→ q θ p γ to T (cid:48) where θ = ( θ \{ r } ) ∪ { r } = { r , r , r , r , r (cid:48) } . Since ( p , θ ) γ −→ T (cid:48) q θ p γ and r ∈ θ , Rule ( β ) addsthe transition ( p , θ ) (cid:15) −→ q θ p γ to T (cid:48) . Then, since there is a path ( p , θ ) γ −→ ∗ T (cid:48) q θ p γ and r ∈ θ , Rule ( β ) generates new state q θ p ,γ and adds wo transitions ( p , θ ) γ −→ q θ p ,γ and q θ p ,γ γ −→ q θ p γ to T (cid:48) . Since ( p , θ ) γ −→ T (cid:48) q θ p ,γ and r ∈ θ , Rule ( β ) addsthe transition ( p , θ ) γ −→ q θ p ,γ to T (cid:48) . No unprocessed matches remain. The procedure termi-nates.
Example 5.
Let us illustrate this procedure by another ex-ample. Consider the SM-PDS shown in the left half of Fig. 5where ( p , θ ) is the initial state and s is the final state. Theresult A post ∗ of the algorithm is shown in the right half ofFig. 5 obtained as follows: First, since ( p , θ ) γ −→ T (cid:48) s and r ∈ θ , Rule ( β ) generates a new state q θ p γ and adds two transitions: ( p , θ ) γ −→ q θ p γ and q θ p γ γ −→ s to T (cid:48) . Since ( p , θ ) γ −→ T (cid:48) q θ p γ and r ∈ θ , Rule ( β ) generates a new state q θ p γ and adds two transitions: ( p , θ ) γ −→ q θ p γ and q θ p γ γ −→ q θ p γ to T (cid:48) . Because ( p , θ ) γ −→ T (cid:48) q θ p γ and r ∈ θ , Rule ( β ) adds ( p , θ ) γ −→ q θ p γ to T (cid:48) . Since ( p , θ ) γ −→ T (cid:48) q θ p γ and r c , r ∈ θ , Rule ( β ) adds the transition ( p , θ ) γ −→ q θ p γ to T (cid:48) where θ =( θ \ { r } ) ∪ { r } . Since ( p , θ ) γ −→ T (cid:48) q θ p γ and r ∈ θ , Rule ( β ) addsthe transition ( p , θ ) (cid:15) −→ q θ p γ to T (cid:48) . Then there is apath ( p , θ ) γ −→ ∗ T (cid:48) q θ p γ , since r ∈ θ , Rule ( β ) addsthe transition ( p , θ ) γ −→ q θ p γ to T (cid:48) . Since ( p , θ ) γ −→ T (cid:48) q θ p γ and r c , r ∈ θ , Rule ( β ) adds the transition ( p , θ ) γ −→ q θ p γ to T (cid:48) where ( θ \{ r } ) ∪ { r } = θ . Since ( p , θ ) γ −→ T (cid:48) q θ p γ and r ∈ θ , Rule ( β ) addsthe transitions ( p , θ ) γ −→ q θ p γ and q θ p γ γ −→ q θ p γ to T (cid:48) . Because ( p , θ ) γ −→ T (cid:48) q θ p γ and r c ∈ θ , Rule ( β ) adds the transition ( p , θ ) γ −→ q θ p γ to T (cid:48) . Since ( p , θ ) γ −→ T (cid:48) q θ p γ and r ∈ θ , Rule ( β ) addsthe transition ( p , θ ) γ −→ q θ p γ to T (cid:48) . Since p , θ γ −→ T (cid:48) q θ p γ holds and r , r c ∈ θ , Rule ( β ) adds the transition ( p , θ ) γ −→ q θ p γ to T (cid:48) . Then, since ( p , θ ) γ −→ T (cid:48) q θ p γ and r c ∈ θ , Rule ( β ) adds the transition ( p , θ ) γ −→ q θ p γ to T (cid:48) . 
Since r ∈ θ and ( p , θ ) γ −→ T (cid:48) q θ p γ , Rule ( β ) addstwo transitions: ( p , θ ) γ −→ q θ p γ and q θ p γ γ −→ q θ p γ to T (cid:48) . No more rules can be applied. Thus, the procedureterminates.A. Proof of Theorem 4
Let us now prove Theorem 4. To do so, we first show the following lemma:
Lemma 3.
For every configuration $(\langle p, w \rangle, \theta) \in L(\mathcal{A})$, if $(\langle p, w \rangle, \theta) \Rightarrow^* (\langle p', w' \rangle, \theta')$, then we have a path $\pi = (p', \theta') \xrightarrow{w'}_{T'} q$ for some final state $q$ of $\mathcal{A}_{post^*}$.

Proof:
Let $i$ be the index s.t. $(\langle p, w \rangle, \theta) \Rightarrow^{i} (\langle p', w' \rangle, \theta')$ holds. We proceed by induction on $i$.

Basis. $i = 0$. Then $p' = p$, $w = w'$ and $\theta' = \theta$. Since $(\langle p, w \rangle, \theta) \in L(\mathcal{A})$, we have $(p, \theta) \xrightarrow{w}_{T'} q$ for some final state $q$, which implies that $\pi = (p', \theta') \xrightarrow{w'}_{T'} q$ is a path of $\mathcal{A}_{post^*}$.

Step. $i > 0$. Then there exists a configuration $(\langle p'', u \rangle, \theta'')$ with

$(\langle p, w \rangle, \theta) \Rightarrow^{i-1} (\langle p'', u \rangle, \theta'') \Rightarrow (\langle p', w' \rangle, \theta')$

By applying the induction hypothesis (induction on $i$), we get

$(p'', \theta'') \xrightarrow{u}_{T'} q$ for some $q \in F$ (1)

Then, let $\gamma \in \Gamma$, $u_1, w_1 \in \Gamma^*$ be such that $u = \gamma u_1$, $w' = w_1 u_1$. Let $q_1$ be a state of $\mathcal{A}_{post^*}$ s.t. we have the following path in $\mathcal{A}_{post^*}$:

$(p'', \theta'') \xrightarrow{\gamma}_{T'} q_1 \xrightarrow{u_1}_{T'} q$ (2)

There are two cases depending on whether the step $(\langle p'', u \rangle, \theta'') \Rightarrow (\langle p', w' \rangle, \theta')$ leaves the phase unchanged ($\theta'' = \theta'$) or corresponds to a self-modifying transition ($\theta'' \neq \theta'$).

1) Case $\theta'' = \theta'$. Then there exists a transition rule $r : \langle p'', \gamma \rangle \hookrightarrow \langle p', w_1 \rangle \in \Delta$ s.t. $r \in \theta'$. There are three possible cases depending on the length of $w_1$:

- Case $|w_1| = 0$, i.e. $w_1 = \epsilon$. By applying the saturation rule ($\beta_1$), we get

$(p', \theta') \xrightarrow{\epsilon}_{T'} q_1$ (3)

Putting (2) and (3) together, we have $(p', \theta') \xrightarrow{\epsilon}_{T'} q_1 \xrightarrow{u_1}_{T'} q$, i.e. $(p', \theta') \xrightarrow{w'}_{T'} q$ for some final state $q$ of $\mathcal{A}_{post^*}$.

- Case $|w_1| = 1$. Let $\gamma' \in \Gamma$ be s.t. $w_1 = \gamma'$. By applying the saturation rule ($\beta_2$), we get

$(p', \theta') \xrightarrow{\gamma'}_{T'} q_1$ (4)

Putting (2) and (4) together, we have $(p', \theta') \xrightarrow{\gamma'}_{T'} q_1 \xrightarrow{u_1}_{T'} q$, i.e. $(p', \theta') \xrightarrow{w'}_{T'} q$ for some final state $q$ of $\mathcal{A}_{post^*}$.

- Case $|w_1| = 2$. Let $\gamma'_1, \gamma'_2 \in \Gamma$ be such that $w_1 = \gamma'_1 \gamma'_2$. By applying the saturation rule ($\beta_3$), we get

$(p', \theta') \xrightarrow{\gamma'_1}_{T'} q^{\theta'}_{p' \gamma'_1} \xrightarrow{\gamma'_2}_{T'} q_1$ (5)

Putting (2) and (5) together, we have a path $(p', \theta') \xrightarrow{\gamma'_1}_{T'} q^{\theta'}_{p' \gamma'_1} \xrightarrow{\gamma'_2}_{T'} q_1 \xrightarrow{u_1}_{T'} q$, i.e. $(p', \theta') \xrightarrow{w'}_{T'} q$ for some final state $q$ of $\mathcal{A}_{post^*}$.

2) Case $\theta'' \neq \theta'$. Then there exists a self-modifying transition rule $r : p'' \stackrel{(r_1, r_2)}{\hookrightarrow} p' \in \Delta_c \cap \theta''$ s.t. $\gamma = w_1$ and $\theta' = (\theta'' \setminus \{r_1\}) \cup \{r_2\}$. By applying rule ($\beta_4$) to (2), we have the following path in the automaton:

$(p', \theta') \xrightarrow{\gamma}_{T'} q_1 \xrightarrow{u_1}_{T'} q$ (6)

i.e. $(p', \theta') \xrightarrow{w'}_{T'} q$ for some final state $q$ of $\mathcal{A}_{post^*}$. $\Box$

Fig. 5: The automata $\mathcal{A}$ (left) and $\mathcal{A}_{post^*}$ (right)

Lemma 4.
If a path $\pi = (p, \theta) \xrightarrow{w}_{T'} q$ is in $\mathcal{A}_{post^*}$, then the following holds: (I) if $q$ is a state of $\mathcal{A}$, then $(\langle p', w' \rangle, \theta') \Rightarrow^* (\langle p, w \rangle, \theta)$ for a configuration $(\langle p', w' \rangle, \theta')$ such that $(p', \theta') \xrightarrow{w'}_{T} q$ is a path in the initial $\mathcal{P}$-automaton $\mathcal{A}$; (II) if $q$ is a new state of the form $q = q^{\theta_1}_{p_1 \gamma_1}$, then $(\langle p_1, \gamma_1 \rangle, \theta_1) \Rightarrow^* (\langle p, w \rangle, \theta)$.

Proof:
Let A post ∗ = ( Q (cid:48) , Γ , T (cid:48) , P, F ) be the P -automatoncomputed by the saturation procedure. In this proof, we use −−→ i T (cid:48) to denote the transition relation → T (cid:48) of A post ∗ obtainedafter adding i transitions using the saturation procedure.Let i be an index such that ( p, θ ) w −−→ i T (cid:48) q holds. We proveboth parts of the lemma by induction on i . Basis. i = 0 . Only (I) applies. Thus, p (cid:48) = p , θ = θ and w = w (cid:48) . ( (cid:104) p (cid:48) , w (cid:48) (cid:105) , θ ) ⇒ ∗ ( (cid:104) p (cid:48) , w (cid:48) (cid:105) , θ ) always holds. Step. i > . Let t be the i -th transition added to the automaton.Let j be the number of times that t is used in ( p, θ ) w −−→ i T (cid:48) q . A has no transitions leading to initial states, and the algorithmdoes not add any such transitions; therefore, if t starts in aninitial state, t can only be used at the start of the path.The proof is by induction on j . If j = 0 , then wehave ( p, θ ) w −−→ i − T (cid:48) q . We apply the induction hypothesis(induction on i ) then we obtain that there exists a config-uration ( (cid:104) p (cid:48) , w (cid:48) (cid:105) , θ ) s.t. ( (cid:104) p (cid:48) , w (cid:48) (cid:105) , θ ) ⇒ ∗ ( (cid:104) p, w (cid:105) , θ ) and ( p (cid:48) , θ ) w (cid:48) −→ T q is a path of initial P -automaton A . So assumethat j > . We distinguish three possible cases: 1) If t was added by the rule β , β or β , then t =(( p , θ ) , v, q ) , where v = (cid:15) or v = γ . 
Then, nec-essarily, j = 1 and there exists the following path in thecurrent automaton: ( p, θ ) = ( p , θ ) v −−→ i T (cid:48) q w −−→ i − T (cid:48) q (1)There are 2 cases depending on whether transition t wasadded by rule β or not.- Case t was added by rule β : there exists a self-modifying transition rule such that r = p r ,r ) (cid:44) −−−−→ p ∈ ∆ c , and there exists the following path in thecurrent automaton: ( p , θ ) v −−→ i − T (cid:48) q w −−→ i − T (cid:48) q, θ = θ \{ r } ∪ { r } (2)By induction on ( i ) , we get from (2) that there ex-ists a configuration ( (cid:104) p (cid:48) , w (cid:48) (cid:105) , θ ) s.t. ( p (cid:48) , θ ) w (cid:48) −→ T q is a path in the initial P -automaton A : ( (cid:104) p (cid:48) , w (cid:48) (cid:105) , θ ) ⇒ ∗ ( (cid:104) p , vw (cid:105) , θ ) (3)By applying the rule p r ,r ) (cid:44) −−−−→ p , we get that ( (cid:104) p , vw (cid:105) , θ ) ⇒ ( (cid:104) p , vw (cid:105) , θ ) (4)Thus, putting (3) and (4) together, we getthat there exists a configuration ( (cid:104) p (cid:48) , w (cid:48) (cid:105) , θ ) s.t. ( p (cid:48) , θ ) w (cid:48) −→ T q is a path in the initial P -automaton A and: ( (cid:104) p (cid:48) , w (cid:48) (cid:105) , θ ) ⇒ ∗ ( (cid:104) p , vw (cid:105) , θ ) ⇒ ( (cid:104) p , w (cid:105) , θ ) = ( (cid:104) p, w (cid:105) , θ ) (5)- Case t is added by β or β : then there exists p ∈ P , γ ∈ Γ such that r = (cid:104) p , γ (cid:105) (cid:44) → (cid:104) p , v (cid:105) ∈ ∆ (6)and A post ∗ contains the following path:11 p , θ ) γ −−→ i − T (cid:48) q w −−→ i − T (cid:48) q (7)By induction on ( i ) , We can get from (7)that there exists a configuration ( (cid:104) p (cid:48) , w (cid:48) (cid:105) , θ ) s.t. 
( p (cid:48) , θ ) w (cid:48) −→ T q is a path in the initial P -automaton A and: ( (cid:104) p (cid:48) , w (cid:48) (cid:105) , θ ) ⇒ ∗ ( (cid:104) p , γ w (cid:105) , θ ) (8)Thus, putting (6) and (8) together, we havethat there exists a configuration ( (cid:104) p (cid:48) , w (cid:48) (cid:105) , θ ) s.t. ( p (cid:48) , θ ) w (cid:48) −→ T q is a path in the initial P -automaton A and: ( (cid:104) p (cid:48) , w (cid:48) (cid:105) , θ ) ⇒ ∗ ( (cid:104) p , γ w (cid:105) , θ ) ⇒ ( (cid:104) p , w (cid:105) , θ ) = ( (cid:104) p, w (cid:105) , θ ) (9)2) If t is the first transition added by rule β i.e. t is inthe form of (( p , θ (cid:48)(cid:48) ) , γ , q θ p γ ) . If this transition is new,then there are no transitions outgoing from q θ p γ . So theonly path using t is ( p , θ (cid:48)(cid:48) ) γ −−→ i T (cid:48) q θ p γ . For this path,we only need to prove part (II), and ( (cid:104) p , γ (cid:105) , θ ) ⇒ ∗ ( (cid:104) p , γ (cid:105) , θ ) holds trivially.3) Let t = ( q θ p γ , γ (cid:48)(cid:48) , q (cid:48) ) be the second transition added bysaturation rule β . Then there exist u , v ∈ Γ ∗ s.t. w = uγ (cid:48)(cid:48) v and the current automaton contains the followingpath: ( p, θ ) u −−→ i − T (cid:48) q θ p γ γ (cid:48)(cid:48) −−→ i T (cid:48) q (cid:48) v −−→ i T (cid:48) q (10)Because t was added via the saturation rule, then thereexist p ∈ P , γ ∈ Γ and a rule of the form (cid:104) p , γ (cid:105) (cid:44) → (cid:104) p , γ γ (cid:48)(cid:48) (cid:105) ∈ ∆ ∩ θ (11)and A post ∗ contains the following path: ( p , θ ) γ −−→ i − T (cid:48) q (cid:48) v −−→ i T (cid:48) q (12)We apply the induction hypothesis on i and obtain that ( (cid:104) p , γ (cid:105) , θ ) ⇒ ∗ ( (cid:104) p, u (cid:105) , θ ) (13)We apply the induction hypothesis on i to ob-tain that there exists a configuration ( (cid:104) p (cid:48) , w (cid:48) (cid:105) , θ ) s.t. 
( p (cid:48) , θ ) w (cid:48) −→ T q is a path in the initial P -automaton A and: ( (cid:104) p (cid:48) , w (cid:48) (cid:105) , θ ) ⇒ ∗ ( (cid:104) p , γ v (cid:105) , θ ) (14)Thus, putting (11) (13) and (14) together, we havethat there exists a configuration ( (cid:104) p (cid:48) , w (cid:48) (cid:105) , θ ) s.t. ( p (cid:48) , θ ) w (cid:48) −→ T q is a path in the initial P -automaton A and: ( (cid:104) p (cid:48) , w (cid:48) (cid:105) , θ ) ⇒ ∗ ( (cid:104) p , γ v (cid:105) , θ ) ⇒ ( (cid:104) p , γ γ (cid:48)(cid:48) v (cid:105) , θ ) ⇒ ∗ ( (cid:104) p, uγ (cid:48)(cid:48) v (cid:105) , θ ) = ( (cid:104) p, w (cid:105) , θ ) (15) (cid:50) Then we continue to prove Theorem 4:
Proof: Let $(\langle p',w'\rangle,\theta)$ be a configuration of $post^*(L(\mathcal{A}))$. Then there exists a configuration $(\langle p,w\rangle,\theta_0)$ such that there exists a path $(p,\theta_0) \xrightarrow{w}_{T} q$ in the initial automaton $\mathcal{A}$ and $(\langle p,w\rangle,\theta_0) \Rightarrow^* (\langle p',w'\rangle,\theta)$. By Lemma 3, we have $(p',\theta) \xrightarrow{w'}_{T'} q$ for some final state $q$ of $\mathcal{A}_{post^*}$. So $(\langle p',w'\rangle,\theta)$ is recognized by $\mathcal{A}_{post^*}$.

Conversely, let $(\langle p',w'\rangle,\theta)$ be a configuration recognized by $\mathcal{A}_{post^*}$. Then there exists a path $(p',\theta) \xrightarrow{w'}_{T'} q$ in $\mathcal{A}_{post^*}$ for some final state $q$. By Lemma 4, since $q$ is a final state, there exists a configuration $(\langle p,w\rangle,\theta_0)$ s.t. $(\langle p,w\rangle,\theta_0) \Rightarrow^*_{\mathcal{P}} (\langle p',w'\rangle,\theta)$ and $(p,\theta_0) \xrightarrow{w}_{T} q$ is a path in the initial automaton $\mathcal{A}$, i.e., $(\langle p,w\rangle,\theta_0) \in L(\mathcal{A})$. Therefore, $(\langle p',w'\rangle,\theta) \in post^*(L(\mathcal{A}))$. $\Box$

VII. EXPERIMENTS
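The baseline in the comparison that follows translates an SM-PDS into a plain PDS. A rough way to see why that can be costly is to count how many distinct rule sets (phases) the self-modifying rules can produce, since a translation that records the current phase in the control state has to distinguish all of them. The sketch below is only an illustration under assumptions: the tuple encoding of rules, the function names and the toy "toggling" family are hypothetical and are not taken from the tool, and the enumeration ignores control states and stacks, so it over-approximates the reachable phases.

```python
# Hypothetical encoding (an assumption of this sketch, not the tool's format):
#   ("pds", p, gamma, p2, w) : <p, gamma> -> <p2, w>, w a tuple of stack symbols
#   ("sm",  p, r1, r2, p2)   : p -(r1, r2)-> p2, replaces rule r1 by r2 in the phase


def reachable_phases(initial_phase):
    """Over-approximate the phases reachable from initial_phase.

    Control states and stacks are ignored: a self-modifying rule is fired
    whenever it belongs to the current phase.
    """
    seen = {initial_phase}
    todo = [initial_phase]
    while todo:
        phase = todo.pop()
        for rule in phase:
            if rule[0] != "sm":
                continue
            _, _, r1, r2, _ = rule
            nxt = (phase - {r1}) | {r2}  # theta' = theta \ {r1} U {r2}
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def toggling_phase(n):
    """Toy initial phase with n independent self-modifying rules."""
    phase = set()
    for i in range(n):
        old = ("pds", "p", f"a{i}", "p", ())
        new = ("pds", "p", f"b{i}", "p", ())
        phase.add(old)
        phase.add(("sm", "p", old, new, "p"))
    return frozenset(phase)


for n in (1, 2, 3, 10):
    phases = reachable_phases(toggling_phase(n))
    print(f"{n} self-modifying rules -> {len(phases)} phases")
```

In this toy family each self-modifying rule independently swaps one rule for another, so the number of phases doubles with every added rule; real programs need not be this bad, but the example shows where a blow-up can come from.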
A. Our algorithms vs. standard $pre^*$ and $post^*$ algorithms of PDSs

We implemented our algorithms in a tool. To compare the performance of our algorithms against the approach that consists in translating the SM-PDS into an equivalent PDS or symbolic PDS and then applying the standard $post^*$ and $pre^*$ algorithms for PDSs and symbolic PDSs [EHRS00], [Sch02], we first applied our tool on randomly generated SM-PDSs of various sizes. The results of the comparison using the $pre^*$ (resp. $post^*$) algorithms are reported in Table I (resp. Table II).

In Table I, Column $|\Delta| + |\Delta_c|$ is the number of transitions of the SM-PDS (changing and non-changing rules). Column SM-PDS gives the cost of applying our direct algorithm to compute $pre^*$ for the given SM-PDS. Column PDS shows the cost of obtaining the equivalent PDS from the SM-PDS. Column Symbolic PDS reports the cost of obtaining the equivalent symbolic PDS from the SM-PDS. Column Result1 reports the cost of the $pre^*$ analysis of Moped [Sch02] on the PDS obtained. Column Total1 is the total cost of translating the SM-PDS into a PDS and then applying the standard $pre^*$ algorithm of Moped (Total1 = PDS + Result1). Column Result2 reports the cost of the $pre^*$ analysis of Moped on the symbolic PDS obtained. Column Total2 is the total cost of translating the SM-PDS into a symbolic PDS and then applying the standard $pre^*$ algorithm of Moped (Total2 = Symbolic PDS + Result2). "error" in the table means failure of Moped, because the size of the relations involved in the symbolic transitions is huge; in that case, we mark "-" for the total execution time. The table shows that our direct algorithm (Column SM-PDS) is much more efficient.

Table II shows the performance of our $post^*$ algorithm. The meaning of the columns is exactly the same as for the $pre^*$ case, but using the $post^*$ algorithms instead. The table shows that applying our direct $post^*$ algorithm on the SM-PDS is much better than translating the SM-PDS to an equivalent PDS or symbolic PDS and then applying the standard $post^*$ algorithms of Moped. Going through PDSs or symbolic PDSs is less efficient and runs out of memory in several cases.

|Δ| + |Δc| | SM-PDS | PDS | Result1 | Total1 | Symbolic PDS | Result2 | Total2
10 + 3
13 + 3
13 + 3
43 + 7
110 + 10
120 + 10
255 + 8
… | …B | 295.41s & 86MB | 0.05s | 295.46s | 21.41s & 76MB | 0.02s | 21.43s
TABLE I: Our direct $pre^*$ algorithm vs. standard $pre^*$ algorithms of PDSs

|Δ| + |Δc| | SM-PDS | PDS | Result1 | Total1 | Symbolic PDS | Result2 | Total2
10 + 3
13 + 3
43 + 7
110 + 10
120 + 10
255 + 8
TABLE II: Our direct $post^*$ algorithm vs. standard $post^*$ algorithms of PDSs

B. Malware Detection
Self-modifying code is widely used as an obfuscation technique by malware writers. Thus, we applied our tool to malware detection.
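Why a model that ignores self-modification can miss a behaviour is easy to see on a toy example. The sketch below is a minimal illustration, not the tool's algorithm: it explores configurations explicitly with a bounded stack instead of using the saturation procedures, and the tuple encoding of rules, the state names (`p0`, `p1`, `exit`, `payload`) and the `max_stack` bound are all assumptions made for the example.

```python
from collections import deque

# Hypothetical encoding (an assumption of this sketch):
#   ("pds", p, gamma, p2, w) : <p, gamma> -> <p2, w>, w a tuple of stack symbols
#   ("sm",  p, r1, r2, p2)   : p -(r1, r2)-> p2, replaces rule r1 by r2 in the phase


def successors(state, stack, phase):
    """One-step successors of the configuration (<state, stack>, phase)."""
    for rule in phase:
        if rule[0] == "sm":
            _, p, r1, r2, p2 = rule
            if p == state:
                # Control moves to p2 and the phase is rewritten.
                yield p2, stack, (phase - {r1}) | {r2}
        else:
            _, p, gamma, p2, w = rule
            if p == state and stack and stack[0] == gamma:
                # Replace the top stack symbol by w; the phase is unchanged.
                yield p2, w + stack[1:], phase


def can_reach(start, target_state, max_stack=8):
    """Bounded breadth-first search for a configuration in target_state."""
    seen = {start}
    queue = deque([start])
    while queue:
        state, stack, phase = queue.popleft()
        if state == target_state:
            return True
        for nxt in successors(state, stack, phase):
            if len(nxt[1]) <= max_stack and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# The instruction at p1 originally jumps to a harmless exit ...
jump_exit = ("pds", "p1", "a", "exit", ("a",))
# ... but the code at p0 patches it into a jump to the payload.
jump_payload = ("pds", "p1", "a", "payload", ("a",))
patch = ("sm", "p0", jump_exit, jump_payload, "p1")
# Same step at p0, modelled as an ordinary instruction with no patching effect.
plain_step = ("pds", "p0", "a", "p1", ("a",))

with_sm = ("p0", ("a",), frozenset({jump_exit, patch}))
without_sm = ("p0", ("a",), frozenset({jump_exit, plain_step}))

print("self-modifying model reaches payload:", can_reach(with_sm, "payload"))
print("plain model reaches payload:", can_reach(without_sm, "payload"))
```

On this example the first search succeeds and the second fails, which mirrors the Y/N pattern one would expect when the malicious block is only reachable after the code has been rewritten.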
Example                        SM-PDS  PDS
Email-Worm.Win32.Klez.b        Y       N
Backdoor.Win32.Allaple.b       Y       N
Email-Worm.Win32.Avron.a       Y       N
Email-Worm.Win32.Anar.a        Y       N
Email-Worm.Win32.Anar.b        Y       N
Email-Worm.Win32.Bagle.a       Y       N
Email-Worm.Win32.Bagle.am      Y       N
Email-Worm.Win32.Bagle.ao      Y       N
Email-Worm.Win32.Bagle.ap      Y       N
Email-Worm.Win32.Ardurk.d      Y       N
Email-Worm.Win32.Atak.k        Y       N
Email-Worm.Win32.Atak.g        Y       N
Email-Worm.Win32.Hanged        Y       N
TABLE III: Malware Detection

We consider self-modifying versions of 13 well-known malware samples. In these versions, the malicious behaviors are unreachable if one does not take into account that the self-modifying piece of code will change the malware code: if the code does not change, the part that contains the malicious behavior cannot be reached; after executing the self-modifying code, the control point jumps to the part containing the malicious behavior.

We model such malware in two ways: (1) first, we take into account the self-modifying piece of code and use SM-PDSs to represent these programs as discussed in Section III-B; (2) second, we do not take into account that this part of the code is self-modifying and we treat it like all the other instructions of the program. In this case, we model these programs by a standard PDS following the translation of [ST12].

The results are reported in Table III. Column Example reports the name of the worm. Column SM-PDS shows the result obtained by applying our method to check the reachability of the entry point of the malicious block. Column PDS gives the result obtained by applying the traditional PDS translation of programs (without taking into account the semantics of self-modifying code) to check the reachability of the entry point of the malicious block. Y stands for yes (the program is malicious) and N stands for no (the program is benign). As can be seen, our technique, which goes through SM-PDSs to model self-modifying code, is able to conclude that the entry point of the malicious block is reachable, whereas the standard PDS translation of programs fails to reach this conclusion.

REFERENCES

[AMDB06] Bertrand Anckaert, Matias Madou, and Koen De Bosschere. A model for self-modifying code. In IH, pages 232–248, 2006.
[BDD+01] Jean Bergeron, Mourad Debbabi, Jules Desharnais, Mourad M Erhioui, Yvan Lavoie, Nadia Tawbi, et al. Static detection of malicious code in executable programs. Int. J. of Req. Eng, 2001(184-189):79, 2001.
[BEM97] A. Bouajjani, J. Esparza, and O. Maler. Reachability Analysis of Pushdown Automata: Application to Model Checking. In CONCUR, pages 135–150, 1997.
[BLP16] Sandrine Blazy, Vincent Laporte, and David Pichardie. Verified abstract interpretation techniques for disassembling low-level self-modifying code. Journal of Automated Reasoning, 56(3):283–308, 2016.
[BMRP09] Guillaume Bonfante, Jean-Yves Marion, and Daniel Reynaud-Plantey. A computability perspective on self-modifying programs. In SEFM, pages 231–239, 2009.
[BRK+05] Gogul Balakrishnan, Thomas W. Reps, Nicholas Kidd, Akash Lal, Junghee Lim, David Melski, Radu Gruian, Suan Hsi Yong, Chi-Hua Chen, and Tim Teitelbaum. Model checking x86 executables with codesurfer/x86 and WPDS++. In CAV, pages 158–163, 2005.
[CDKT09] Kevin Coogan, Saumya Debray, Tasneem Kaochar, and Gregg Townsend. Automatic static unpacking of malware binaries. In WCRE, pages 167–176, 2009.
[CJS+05] Mihai Christodorescu, Somesh Jha, Sanjit A Seshia, Dawn Song, and Randal E Bryant. Semantics-aware malware detection. In S&P, pages 32–46, 2005.
[CSV07] Hongxu Cai, Zhong Shao, and Alexander Vaynberg. Certified self-modifying code. ACM SIGPLAN Notices, 42(6):66–77, 2007.
[CT08] Saumya K Debray, Kevin P Coogan, and Gregg M Townsend. On the semantics of self-unpacking malware code. Technical report, Citeseer, 2008.
[EHRS00] Javier Esparza, David Hansel, Peter Rossmanith, and Stefan Schwoon. Efficient algorithms for model checking pushdown systems. In CAV, pages 232–247, 2000.
[Kar] Karl. Automated unpacking: A behaviour based approach. https://github.com/malwaremusings/unpacker.
[Kin08] Veith H. Kinder.J. Jakstab: A static analysis platform for binaries. In CAV, pages 423–427. Springer, 2008.
[KKSV05] Johannes Kinder, Stefan Katzenbeisser, Christian Schallhart, and Helmut Veith. Detecting malicious code by model checking. In DIMVA, pages 174–187, 2005.
[KPY07] Min Gyung Kang, Pongsin Poosankam, and Heng Yin. Renovo: A hidden code extractor for packed executables. In WORM, pages 46–53. ACM, 2007.
[RHD+06] Paul Royal, Mitch Halpin, David Dagon, Robert Edmonds, and Wenke Lee. Polyunpack: Automating the hidden-code extraction of unpack-executing malware. In ACSAC, pages 289–300, 2006.
[RM10] Kevin A Roundy and Barton P Miller. Hybrid analysis and control of malware. In International Workshop on Recent Advances in Intrusion Detection, pages 317–338. Springer, 2010.
[Sch02] Stefan Schwoon. Model-checking pushdown systems. PhD thesis, Technische Universität München, Universitätsbibliothek, 2002.
[SL03] Prabhat K Singh and Arun Lakhotia. Static verification of worm and virus behavior in binary executables using model checking. In IAW, pages 298–300, 2003.
[ST12] Fu Song and Tayssir Touili. Efficient malware detection using model-checking. In FM, pages 418–433, 2012.
[ST13] Fu Song and Tayssir Touili. LTL model-checking for malware detection. In TACAS, pages 416–431. Springer, 2013.
[Tea] EnSilo Research Team. Self-modifying code unpacking tool using dynamorio. https://github.com/BreakingMalware/Selfie.
[TY17] Tayssir Touili and Xin Ye. Reachability analysis of self modifying code. Pages 120–127. IEEE, 2017.