A Myhill-Nerode Theorem for Register Automata and Symbolic Trace Languages
Frits Vaandrager and Abhisek Midya
Institute for Computing and Information Sciences, Radboud University, Nijmegen, the Netherlands
Abstract.
We propose a new symbolic trace semantics for register automata (extended finite state machines) which records both the sequence of input symbols that occur during a run as well as the constraints on input parameters that are imposed by this run. Our main result is a generalization of the classical Myhill-Nerode theorem to this symbolic setting. Our generalization requires the use of three relations to capture the additional structure of register automata. Location equivalence ≡_l captures that symbolic traces end in the same location, transition equivalence ≡_t captures that they share the same final transition, and a partial equivalence relation ≡_r captures that symbolic values v and v′ are stored in the same register after symbolic traces w and w′, respectively. A symbolic language is defined to be regular if relations ≡_l, ≡_t and ≡_r exist that satisfy certain conditions, in particular, they all have finite index. We show that the symbolic language associated to a register automaton is regular, and we construct, for each regular symbolic language, a register automaton that accepts this language. Our result provides a foundation for grey-box learning algorithms in settings where the constraints on data parameters can be extracted from code using e.g. tools for symbolic/concolic execution or tainting. We believe that moving to a grey-box setting is essential to overcome the scalability problems of state-of-the-art black-box learning algorithms.

Supported by NWO TOP project 612.001.852 Grey-box learning of Interfaces for Refactoring Legacy Software (GIRLS).

Model learning (a.k.a. active automata learning) is a black-box technique which constructs state machine models of software and hardware components from information obtained by providing inputs and observing the resulting outputs. Model learning has been successfully used in numerous applications, for instance for generating conformance test suites of software components [20], finding mistakes in implementations of security-critical protocols [13,15,14], learning interfaces of classes in software libraries [23], and checking that a legacy component and a refactored implementation have the same behavior [31]. We refer to [33,26] for surveys and further references.

Myhill-Nerode theorems [29,21] are of pivotal importance for model learning algorithms. Angluin’s classical L* algorithm [3] for active learning of regular languages, as well as improvements such as [30,27,32], use an observation table to approximate the Nerode congruence. Maler and Staiger [28] established a Myhill-Nerode theorem for ω-languages that serves as a basis for a learning algorithm described in [4]. The SL* algorithm for active learning of register automata of Cassel et al. [11] is directly based on a generalization of the classical Myhill-Nerode theorem to a setting of data languages and register automata (extended finite state machines). Francez and Kaminski [16], Benedikt et al. [5] and Bojańczyk et al. [6] all present Myhill-Nerode theorems for data languages.

Despite the convincing applications of black-box model learning, it is fair to say that existing algorithms do not scale very well.
In order to learn models of realistic applications in which inputs and outputs carry data parameters, state-of-the-art techniques either rely on manually constructed mappers that abstract the data parameters of inputs and outputs into a finite alphabet [2], or otherwise infer guards and assignments from black-box observations of test outputs [11,1]. The latter can be costly, especially for models where the control flow depends on data parameters in the input. Thus, for instance, the RALib tool [9], an implementation of the SL* algorithm, needed more than two hundred thousand input/reset events to learn register automata with just 6 to 8 locations for TCP client implementations of Linux, FreeBSD and Windows [14]. Existing black-box model learning algorithms also face severe restrictions on the operations and predicates on data that are supported (typically, only equality/inequality predicates and constants).

A natural way to address these limitations is to augment learning algorithms with white-box information extraction methods, which are able to obtain information about the system under learning at lower cost than black-box techniques [25]. Constraints on data parameters can be extracted from the code using e.g. tools for symbolic execution [8], concolic execution [19], or tainting [22]. Several researchers have successfully explored this idea, see for instance [18,12,7,24]. Recently, we showed how constraints on data parameters can be extracted from Python programs using tainting, and used to boost the performance of RALib by almost two orders of magnitude [17]. Nevertheless, all these approaches are rather ad hoc, and what is missing is a Myhill-Nerode theorem for this enriched setting that may serve as a foundation for grey-box model learning algorithms for a general class of register automata.
In this article, we present such a theorem.

More specifically, we propose a new symbolic trace semantics for register automata which records both the sequence of input symbols that occur during a run as well as the constraints on input parameters that are imposed by this run. Our main result is a Myhill-Nerode theorem for symbolic trace languages. Whereas the original Myhill-Nerode theorem refers to a single equivalence relation ≡ on words, and constructs a DFA in which states are equivalence classes of ≡, our generalization requires the use of three relations to capture the additional structure of register automata. Location equivalence ≡_l captures that symbolic traces end in the same location, transition equivalence ≡_t captures that they share the same final transition, and a partial equivalence relation ≡_r captures that symbolic values v and v′ are stored in the same register after symbolic traces w and w′, respectively. A symbolic language is defined to be regular if relations ≡_l, ≡_t and ≡_r exist that satisfy certain conditions, in particular, they all have finite index. Whereas in the classical case of regular languages the Nerode equivalence ≡ is uniquely determined, different relations ≡_l, ≡_t and ≡_r may exist that satisfy the conditions for regularity for symbolic languages. We show that the symbolic language associated to a register automaton is regular, and we construct, for each regular symbolic language, a register automaton that accepts this language. In this automaton, the locations are equivalence classes of ≡_l, the transitions are equivalence classes of ≡_t, and the registers are equivalence classes of ≡_r. In this way, we obtain a natural generalization of the classical Myhill-Nerode theorem for symbolic languages and register automata. Unlike Cassel et al. [11], we need no restrictions on the allowed data predicates to prove our result, which drastically increases the range of potential applications.
Our result paves the way for efficient grey-box learning algorithms in settings where the constraints on data parameters can be extracted from the code.

In this section, we fix some basic vocabulary for (partial) functions, languages, and logical formulas.
We write f : X ⇀ Y to denote that f is a partial function from set X to set Y. For x ∈ X, we write f(x)↓ if there exists a y ∈ Y such that f(x) = y, i.e., the result is defined, and f(x)↑ if the result is undefined. We write domain(f) = {x ∈ X | f(x)↓} and range(f) = {f(x) ∈ Y | x ∈ domain(f)}. We often identify a partial function f with the set of pairs {(x, y) ∈ X × Y | f(x) = y}. As usual, we write f : X → Y to denote that f is a total function from X to Y, that is, f : X ⇀ Y and domain(f) = X.

Let Σ be a set of symbols. A word u = a_1 … a_n over Σ is a finite sequence of symbols from Σ. The length of a word u, denoted |u|, is the number of symbols occurring in it. The empty word is denoted ε. We denote by Σ* the set of all words over Σ, and by Σ⁺ the set of all nonempty words over Σ (i.e. Σ* = Σ⁺ ∪ {ε}). Given two words u and w, we denote by u · w the concatenation of u and w. When the context allows it, u · w shall be simply written uw. We say that u is a prefix of w iff there exists a word u′ such that u · u′ = w. Similarly, u is a suffix of w iff there exists a word u′ such that u′ · u = w. A language L over Σ is any set of words over Σ, and therefore a subset of Σ*. We say that L is prefix closed if, for each w ∈ L and each prefix u of w, u ∈ L as well.

2.3 Guards

We postulate a countably infinite set 𝒱 = {v_1, v_2, …} of variables. In addition, there is also a variable p ∉ 𝒱 that will play a special role as formal parameter of input symbols; we write 𝒱⁺ = 𝒱 ∪ {p}. Our framework is parametrized by a set R of relation symbols. Elements of R are assigned finite arities. A guard is a Boolean combination of relation symbols from R over variables. Formally, the set of guards is inductively defined as follows:
– If r ∈ R is an n-ary relation symbol and x_1, …, x_n are variables from 𝒱⁺, then r(x_1, …, x_n) is a guard.
– If g is a guard then ¬g is a guard.
– If g_1 and g_2 are guards then g_1 ∧ g_2 is a guard.
We use standard abbreviations from propositional logic such as ⊤ and g_1 ∨ g_2. We write Var(g) for the set of variables that occur in a guard g. We say that g is a guard over set of variables X if Var(g) ⊆ X. We write G(X) for the set of guards over X, and use symbol ≡ to denote syntactic equality of guards.

We postulate a structure ℛ consisting of a set D of data values and a distinguished n-ary relation r^ℛ ⊆ D^n for each n-ary relation symbol r ∈ R. In a trivial example of a structure ℛ, R consists of the binary symbol ‘=’, D is the set of natural numbers, and =^ℛ is the equality predicate on numbers. An n-ary operation f : D^n → D can be modelled in our framework as an (n+1)-ary predicate. We may for instance extend structure ℛ with a ternary predicate symbol +, where (d_1, d_2, d_3) ∈ +^ℛ iff the sum of d_1 and d_2 equals d_3. Constants like 0 and 1 can be added to ℛ as unary predicates.

A valuation is a partial function ξ : 𝒱⁺ ⇀ D that assigns data values to variables. If Var(g) ⊆ domain(ξ), then ξ ⊨ g is defined inductively by:
– ξ ⊨ r(x_1, …, x_n) iff (ξ(x_1), …, ξ(x_n)) ∈ r^ℛ
– ξ ⊨ ¬g iff not ξ ⊨ g
– ξ ⊨ g_1 ∧ g_2 iff ξ ⊨ g_1 and ξ ⊨ g_2
If ξ ⊨ g then we say valuation ξ satisfies guard g. We call g satisfiable, and write Sat(g), if there exists a valuation ξ such that ξ ⊨ g.
Guard g is a tautology if ξ ⊨ g for all valuations ξ with Var(g) ⊆ domain(ξ).

A variable renaming is a partial function σ : 𝒱⁺ ⇀ 𝒱⁺. If g is a guard with Var(g) ⊆ domain(σ) then g[σ] is the guard obtained by replacing each occurrence of a variable x in g by variable σ(x). The following lemma is easily proved by induction.

Lemma 1. ξ ∘ σ ⊨ g iff ξ ⊨ g[σ].

Proof. By induction on the structure of g:
– g ≡ r(x_1, …, x_n):
  ξ ∘ σ ⊨ g ⇔ ((ξ ∘ σ)(x_1), …, (ξ ∘ σ)(x_n)) ∈ r^ℛ
  ⇔ (ξ(σ(x_1)), …, ξ(σ(x_n))) ∈ r^ℛ
  ⇔ ξ ⊨ r(σ(x_1), …, σ(x_n))
  ⇔ ξ ⊨ g[σ]
– g ≡ ¬g′:
  ξ ∘ σ ⊨ g ⇔ not ξ ∘ σ ⊨ g′
  ⇔ not ξ ⊨ g′[σ] (by induction hypothesis)
  ⇔ ξ ⊨ ¬g′[σ]
  ⇔ ξ ⊨ g[σ]
– g ≡ g_1 ∧ g_2:
  ξ ∘ σ ⊨ g ⇔ ξ ∘ σ ⊨ g_1 and ξ ∘ σ ⊨ g_2
  ⇔ ξ ⊨ g_1[σ] and ξ ⊨ g_2[σ] (by induction hypothesis)
  ⇔ ξ ⊨ g_1[σ] ∧ g_2[σ]
  ⇔ ξ ⊨ g[σ]

In this section, we introduce register automata and show how they may be used as recognizers for both data languages and symbolic languages.
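As an illustration of guards, valuations and variable renamings, the following Python sketch checks Lemma 1 (ξ ∘ σ ⊨ g iff ξ ⊨ g[σ]) on one concrete guard. The tuple encoding of guards, the helper names (`rel`, `neg`, `conj`, `satisfies`, `rename`, `compose`) and the choice of structure (integers with `=` and `<`) are assumptions made for this example only; they are not part of the formal development.

```python
from typing import Dict

# Guards as nested tuples: ("rel", name, vars), ("not", g), ("and", g1, g2).
# The structure below (integers with "=" and "<") is a hypothetical choice.
RELATIONS = {"=": lambda a, b: a == b, "<": lambda a, b: a < b}

def rel(name, *xs):
    return ("rel", name, tuple(xs))

def neg(g):
    return ("not", g)

def conj(g1, g2):
    return ("and", g1, g2)

def variables(g):
    """Var(g): the set of variables occurring in guard g."""
    if g[0] == "rel":
        return set(g[2])
    if g[0] == "not":
        return variables(g[1])
    return variables(g[1]) | variables(g[2])

def satisfies(xi: Dict[str, int], g) -> bool:
    """xi |= g, defined only when Var(g) is contained in domain(xi)."""
    if not variables(g).issubset(xi):
        raise ValueError("valuation does not cover Var(g)")
    if g[0] == "rel":
        return RELATIONS[g[1]](*(xi[x] for x in g[2]))
    if g[0] == "not":
        return not satisfies(xi, g[1])
    return satisfies(xi, g[1]) and satisfies(xi, g[2])

def rename(g, sigma: Dict[str, str]):
    """g[sigma]: replace every variable x in g by sigma(x)."""
    if g[0] == "rel":
        return ("rel", g[1], tuple(sigma[x] for x in g[2]))
    if g[0] == "not":
        return ("not", rename(g[1], sigma))
    return ("and", rename(g[1], sigma), rename(g[2], sigma))

def compose(xi, sigma):
    """(xi ∘ sigma)(x) = xi(sigma(x)), wherever both sides are defined."""
    return {x: xi[y] for x, y in sigma.items() if y in xi}

# Guard (x < p) ∧ ¬(x = p), renamed by sigma = {x ↦ v1, p ↦ v2}.
g = conj(rel("<", "x", "p"), neg(rel("=", "x", "p")))
sigma = {"x": "v1", "p": "v2"}
for xi in ({"v1": 1, "v2": 4}, {"v1": 4, "v2": 0}):
    # Both sides of Lemma 1 agree on each sample valuation.
    assert satisfies(compose(xi, sigma), g) == satisfies(xi, rename(g, sigma))
```

The check covers only two sample valuations, so it illustrates the statement rather than proving it; the inductive proof above is what establishes the lemma.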
A register automaton comprises a set of locations with transitions between them, and a set of registers which can store data values that are received as inputs. Transitions contain guards over the registers and the current input, and may assign new values to registers.
Definition 1. A register automaton is a tuple A = (Σ, Q, q_0, F, V, Γ), where
– Σ is a finite set of input symbols,
– Q is a finite set of locations, with q_0 ∈ Q the initial location, and F ⊆ Q a set of accepting locations,
– V ⊂ 𝒱 is a finite set of registers, and
– Γ is a finite set of transitions, each of form ⟨q, α, g, ϱ, q′⟩ where
  • q, q′ ∈ Q are the source and target locations, respectively; we require that q′ ∈ F ⇒ q ∈ F,
  • α ∈ Σ is an input symbol,
  • g ∈ G(V ∪ {p}) is a guard, and
  • ϱ : V ⇀ V ∪ {p} is an assignment; we require that ϱ is injective.
Register automata are required to be completely specified in the sense that for each location q ∈ Q and each input symbol α ∈ Σ, the disjunction of the guards on the α-transitions with source q is a tautology. Register automata are also required to be deterministic in the sense that for each location q ∈ Q and input symbol α ∈ Σ, the conjunction of the guards of any pair of distinct α-transitions with source q is not satisfiable. We write q −[α, g, ϱ]→ q′ if ⟨q, α, g, ϱ, q′⟩ ∈ Γ.

Example 1. Figure 1 shows a register automaton A = (Σ, Q, q_0, F, V, Γ) with a single input symbol a and four locations q_0, q_1, q_2 and q_3, with q_0, q_1, q_2 accepting and q_3 non-accepting. The initial location q_0 is marked by an arrow “start” and accepting locations are indicated by a double circle. There is just a single register x. Set Γ contains six transitions, which are indicated in the diagram. All transitions are labeled with input symbol a, a guard over formal parameter p and the registers, and an assignment. Guards represent conditions on data values. For example, the guard on the transition from q_1 to q_2 expresses that the data value of action a must be smaller than the data value currently stored in register x. We write x := p to denote the assignment that stores the data parameter p in register x, that is, the function ϱ satisfying ϱ(x) = p. Trivial guards (⊤) and assignments (empty domain) are omitted. Note that location q_3 is actually a sink location, i.e., there is no way to get into an accepting state from q_3. Thus the register automaton satisfies the condition that for each transition either the source location is accepting or the target location is not accepting. When drawing register automata, we often only depict the accepting locations, and leave a non-accepting sink location and the transitions leading to it implicit. Note that in locations q_1 and q_2, which have more than one outgoing transition, the disjunction of the guards of these transitions is equivalent to true, whereas the conjunction is equivalent to false.

The semantics of a register automaton is defined in terms of the set of data words that it accepts.

Definition 2.
Let Σ be a finite alphabet. A data symbol over Σ is a pair α(d) with α ∈ Σ and d ∈ D. A data word over Σ is a finite sequence of data symbols, i.e., a word over Σ × D. A data language over Σ is a set of data words over Σ.

We associate a data language to each register automaton as follows.
Definition 3.
Let A = (Σ, Q, q_0, F, V, Γ) be a register automaton. A configuration of A is a pair (q, ξ), where q ∈ Q and ξ : V ⇀ D. A run of A over a data word w = α_1(d_1) ⋯ α_n(d_n) is a sequence

γ = (q_0, ξ_0) −[α_1(d_1)]→ (q_1, ξ_1) … (q_{n−1}, ξ_{n−1}) −[α_n(d_n)]→ (q_n, ξ_n),

where, for 0 ≤ i ≤ n, (q_i, ξ_i) is a configuration of A, domain(ξ_0) = ∅, and for 0 < i ≤ n, Γ contains a transition q_{i−1} −[α_i, g_i, ϱ_i]→ q_i such that
– ι_i ⊨ g_i, where ι_i = ξ_{i−1} ∪ {(p, d_i)}, and
– ξ_i = ι_i ∘ ϱ_i.
We say that run γ is accepting if q_n ∈ F and rejecting if q_n ∉ F. We call w the trace of γ, notation trace(γ) = w. Data word w is accepted (rejected) if A has an accepting (rejecting) run over w. The data language of A, notation L(A), is the set of all data words that are accepted by A. Two register automata over the same alphabet Σ are trace equivalent if they accept the same data language.

Fig. 1: Register automaton. (Diagram not reproduced; locations q_0 (start), q_1, q_2, q_3; edge labels: a, x := p; a, x ≤ p, x := p; a, p < x, x := p; a, x ≤ p, x := p; a, p < x; a.)

Example 2. Consider the register automaton of Figure 1. This automaton accepts the data word a(1) a(4) a(0) a(7) since the following sequence of steps is a run (here ξ_0 is the trivial function with empty domain):

(q_0, ξ_0) −[a(1)]→ (q_1, x ↦ 1) −[a(4)]→ (q_1, x ↦ 4) −[a(0)]→ (q_2, x ↦ 0) −[a(7)]→ (q_1, x ↦ 7).

Note that the final location q_1 of this run is accepting. Upon receiving the first input a(1), the automaton jumps to q_1 and stores data value 1 in the register x. Since 4 is bigger than 1, the automaton takes the self loop upon receiving the second input a(4) and stores 4. Since 0 is less than 4, it moves to q_2 upon receipt of the third input a(0) and updates x to 0. Finally, the automaton gets back to q_1 as 7 is bigger than 0.

Suppose that in the register automaton of Figure 1 we replace the guard on the transition from q_0 to q_1 by x ≤ p. Since initial valuation ξ_0 does not assign a value to x, this means that it is not defined whether ξ_0 satisfies guard x ≤ p. Automata in which such “runtime errors” do not occur are called well-formed.

Definition 4.
Let A be a register automaton. We say that a configuration (q, ξ) of A is reachable if there is a run of A that ends with (q, ξ). We call A well-formed if, for each reachable configuration (q, ξ), ξ assigns a value to all variables from V that occur in guards of outgoing transitions of q, that is,

(q, ξ) reachable ∧ q −[α, g, ϱ]→ q′ ⇒ Var(g) ⊆ domain(ξ) ∪ {p}.

As soon as the set of data values and the collection of predicates becomes nontrivial, well-formedness of register automata becomes undecidable. However, it is easy to come up with a sufficient condition for well-formedness, based on a syntactic analysis of A, which covers the cases that occur in practice. In the remainder of this article, we will restrict our attention to well-formed register automata. In particular, the register automata that are constructed from regular symbolic trace languages in our Myhill-Nerode theorem will be well-formed.

Our definition of a register automaton is different from the one used in the SL* algorithm [11] and its implementation in RALib [9]. It is instructive to compare the two definitions. In order to establish a Myhill-Nerode theorem, [11] requires that structure ℛ, which is a parameter of the SL* algorithm, is weakly extendable. This technical restriction excludes many data types that are commonly used in practice. For instance, the set of integers with constants 0 and 1, an addition operator +, and a less-than predicate < is not weakly extendable. In our approach, no restrictions on ℛ are needed. Unlike [11], we do not associate a fixed set of variables to each location. Our definition is slightly more general, which simplifies some technicalities. However, we require assignments to be injective, a restriction that is not imposed by [11]. But note that the register automata that are actually constructed by SL* are right-invariant [10]. In a right-invariant register automaton, two values can only be tested for equality if one of them is the current input symbol. Right-invariance, as defined in [10], implies that assignments are injective. As illustrated by the example of Figure 2, our register automata can be exponentially more succinct than the right-invariant register automata constructed by SL*. As pointed out in [10], right-invariant register automata in turn are more succinct than the automata of [16,5]. Our definition assumes that, for any transition from q to q′, q′ ∈ F ⇒ q ∈ F. As a consequence of this assumption, which is not required in [11], the data language accepted by a register automaton is prefix closed. We need this property for technical reasons, but for models of reactive systems it is actually quite natural. The RALib tool [9] also assumes that data languages are prefix closed.

Fig. 2: For each n > 0, A_n is a register automaton that first accepts 2n input symbols a, storing all the data values that it receives, and then accepts input symbol b when two consecutive values in the first half of the input are equal iff the corresponding consecutive values in the second half of the input are equal. The number of locations and transitions of A_n grows linearly with n. There exist right-invariant register automata B_n that accept the same data languages, but their size grows exponentially with n. (Diagram not reproduced; locations q_0 (start), q_1, q_2, …, q_{2n−1}, q_{2n}, q_ok; edge labels: a, x_1 := p; a, x_2 := p; …; a, x_{2n} := p; b, ⋀_{i=1}^{n−1} (x_i = x_{i+1} ↔ x_{n+i} = x_{n+i+1}).)

For readers familiar with [11]: A structure (called theory in [11]) is weakly extendable if for all natural numbers k and data words u, there exists a u′ with u′ ≈_ℛ u which is k-extendable. Intuitively, u′ ≈_ℛ u if data words u′ and u have the same sequences of actions and cannot be distinguished by the relations in ℛ. Let u = α(0) α(1) α(2) α(4) α(8) α(16) α(11). Then there exists just one u′ different from u with u′ ≈_ℛ u, namely u′ = α(0) α(1) α(2) α(4) α(8) α(16) α(13). Now neither u nor u′ is even 1-extendable: if we extend u with α(3), we cannot find a matching extension α(d′) of u′ such that u α(3) ≈_ℛ u′ α(d′), and if we extend u′ with α(5) we cannot find a matching extension α(d) of u such that u α(d) ≈_ℛ u′ α(5).

From each run γ of a register automaton A we can trivially extract a data word trace(γ) by forgetting all information except the data symbols. Conversely, for each data word w that is accepted by A, there exists a corresponding accepting run γ, which is uniquely determined by the data word since from each configuration (q, ξ) and data symbol α(d), exactly one transition will be enabled.

Lemma 2.
Suppose γ and γ′ are runs of a register automaton A such that trace(γ) = trace(γ′). Then γ = γ′.

Proof. We prove the lemma by contradiction. Suppose γ ≠ γ′. All runs of A share at least the initial configuration (q_0, ξ_0). Let γ be as in Definition 3, and let (q_{i−1}, ξ_{i−1}) be the last point where γ and γ′ coincide. From this point, γ continues its course with a step

(q_{i−1}, ξ_{i−1}) −[α_i(d_i)]→ (q_i, ξ_i),

whereas γ′ continues with a different step

(q_{i−1}, ξ_{i−1}) −[α_i(d_i)]→ (q′_i, ξ′_i).

Note that both steps carry the same data symbol as trace(γ) = trace(γ′). Then Γ contains a transition q_{i−1} −[α_i, g_i, ϱ_i]→ q_i such that ι_i ⊨ g_i, where ι_i = ξ_{i−1} ∪ {(p, d_i)}. In addition, Γ contains a transition q_{i−1} −[α_i, g′_i, ϱ′_i]→ q′_i such that ι_i ⊨ g′_i. Since ι_i ⊨ g_i and ι_i ⊨ g′_i, we may conclude that g_i ∧ g′_i is satisfiable. Therefore, as A is deterministic, both transitions are the same, that is, g_i ≡ g′_i, ϱ_i = ϱ′_i and q_i = q′_i. But then also ξ_i = ι_i ∘ ϱ_i = ι_i ∘ ϱ′_i = ξ′_i, which means that the two outgoing steps of configuration (q_{i−1}, ξ_{i−1}) are the same. Contradiction.

We will now introduce an alternative trace semantics for register automata, which records both the sequence of input symbols that occur during a run as well as the constraints on input parameters that are imposed by this run. We will explore some basic properties of this semantics, and show that the equivalence induced by symbolic traces is finer than data equivalence.

A symbolic language consists of words in which input symbols and guards alternate.
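To make the concrete (data word) semantics of Definition 3 and the uniqueness of runs stated in Lemma 2 tangible, here is a small Python sketch that executes the automaton of Example 1 on a data word. The transition table is a reading of Figure 1 and Example 2 and should be treated as an assumption; guards are encoded as Python predicates over ι = ξ ∪ {(p, d)}, and all identifiers (`TRANSITIONS`, `run`, `accepts`) are hypothetical names chosen for this illustration.

```python
from typing import Callable, Dict, List, Tuple

Valuation = Dict[str, int]

# (source, input symbol, guard over iota, assignment rho, target).
# Assumed reading of Figure 1: q3 is the non-accepting sink.
TRANSITIONS: List[Tuple[str, str, Callable[[Valuation], bool], Dict[str, str], str]] = [
    ("q0", "a", lambda i: True,             {"x": "p"}, "q1"),
    ("q1", "a", lambda i: i["x"] <= i["p"], {"x": "p"}, "q1"),
    ("q1", "a", lambda i: i["p"] < i["x"],  {"x": "p"}, "q2"),
    ("q2", "a", lambda i: i["x"] <= i["p"], {"x": "p"}, "q1"),
    ("q2", "a", lambda i: i["p"] < i["x"],  {},         "q3"),
    ("q3", "a", lambda i: True,             {},         "q3"),
]
ACCEPTING = {"q0", "q1", "q2"}

def run(word):
    """Return the list of configurations (q_i, xi_i) of the run over `word`."""
    q, xi = "q0", {}
    configs = [(q, dict(xi))]
    for alpha, d in word:
        iota = {**xi, "p": d}                      # iota_i = xi_{i-1} ∪ {(p, d_i)}
        enabled = [t for t in TRANSITIONS
                   if t[0] == q and t[1] == alpha and t[2](iota)]
        # Completeness and determinism: exactly one transition is enabled.
        assert len(enabled) == 1
        _, _, _, rho, q = enabled[0]
        xi = {x: iota[y] for x, y in rho.items()}  # xi_i = iota_i ∘ rho_i
        configs.append((q, dict(xi)))
    return configs

def accepts(word) -> bool:
    """A data word is accepted iff its unique run ends in an accepting location."""
    return run(word)[-1][0] in ACCEPTING

word = [("a", 1), ("a", 4), ("a", 0), ("a", 7)]
print(run(word))
print(accepts(word))
```

Because ξ_i = ι_i ∘ ϱ_i, a register that is not in the domain of the assignment is forgotten after the step; this is why the sketch rebuilds `xi` from `rho` alone instead of updating the previous valuation.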
Definition 5.
Let Σ be a finite alphabet. A symbolic word over Σ is a finite alternating sequence w = α_1 G_1 ⋯ α_n G_n of input symbols from Σ and guards. A symbolic language over Σ is a set of symbolic words over Σ.

A symbolic run is just a run, except that the valuations do not return concrete data values, but markers (variables) that record the exact place in the run where the input occurred. Using these symbolic valuations (variable renamings, actually) it is straightforward to compute the constraints on the input parameters from the guards occurring in the run.

Definition 6.
Let A = (Σ, Q, q_0, F, V, Γ) be a register automaton. A symbolic run of A is a sequence

δ = (q_0, ζ_0) −[α_1, g_1, ϱ_1]→ (q_1, ζ_1) … −[α_n, g_n, ϱ_n]→ (q_n, ζ_n),

where ζ_0 is the trivial variable renaming with empty domain and, for 0 < i ≤ n,
– q_{i−1} −[α_i, g_i, ϱ_i]→ q_i is a transition in Γ,
– ζ_i is a variable renaming with domain(ζ_i) ⊆ V,
– ζ_i = ι_i ∘ ϱ_i, where ι_i = ζ_{i−1} ∪ {(p, v_i)}, and
– guard G_1 ∧ ⋯ ∧ G_n is satisfiable, where G_i ≡ g_i[ι_i].
We say that symbolic run δ is accepting if q_n ∈ F and rejecting if q_n ∉ F. The symbolic trace of δ is the symbolic word strace(δ) = α_1 G_1 ⋯ α_n G_n. Symbolic word w is accepted (rejected) if A has an accepting (rejecting) symbolic run δ with strace(δ) = w. The symbolic language of A, notation L_s(A), is the set of all symbolic words accepted by A. Two register automata over the same alphabet Σ are symbolic trace equivalent if they accept the same symbolic language.

Example 3. Consider the register automaton of Figure 1. The following sequence of tainted steps constitutes a symbolic run:

(q_0, ζ_0) −[a, ⊤, ϱ]→ (q_1, x ↦ v_1) −[a, x ≤ p, ϱ]→ (q_1, x ↦ v_2) −[a, p

Lemma 3. Let δ be a symbolic run of A, as in Definition 6. Then range(ζ_i) ⊆ {v_1, …, v_i}, for i ∈ {0, …, n}, and range(ι_i) ⊆ {v_1, …, v_i}, for i ∈ {1, …, n}.

Proof. By induction on i:
– Base. Suppose i = 0. Then the lemma holds trivially since range(ζ_0) = ∅.
– Induction step. Suppose i > 0. Then
  range(ζ_i) = range(ι_i ∘ ϱ_i)
  ⊆ range(ι_i)
  = range(ζ_{i−1} ∪ {(p, v_i)})
  = range(ζ_{i−1}) ∪ {v_i}
  ⊆ {v_1, …, v_{i−1}} ∪ {v_i} (by induction hypothesis)
  = {v_1, …, v_i}.

As a consequence of our assumption that assignments in a register automaton are injective, all the variable renamings in a symbolic run are injective as well.

Lemma 4. Let δ be a symbolic run of A, as in Definition 6. Then, for each i ∈ {0, …, n}, ζ_i is injective, and for each i ∈ {1, …, n}, ι_i is injective.

Proof. By induction on i:
– Base. Suppose i = 0. Then the lemma holds trivially since range(ζ_0) = ∅.
– Induction step. Suppose i > 0. By Lemma 3, range(ζ_{i−1}) ⊆ {v_1, …, v_{i−1}}. By the induction hypothesis, ζ_{i−1} is injective. From this we conclude that ζ_{i−1} ∪ {(p, v_i)} is injective, which means ι_i is injective. Since the composition of injective functions is injective, ζ_i = ι_i ∘ ϱ_i is injective.

All symbolic words accepted by a register automaton satisfy some basic sanity properties: guards may only refer to the markers for values received thus far, and the conjunction of all the guards is satisfiable. We call symbolic words that satisfy these properties feasible. Note that if a symbolic word is feasible, any prefix is feasible as well.

Definition 7 (Feasible). Let w = α_1 G_1 ⋯ α_n G_n be a symbolic word. We write length(w) = n and guard(w) = G_1 ∧ ⋯ ∧ G_n. Word w is feasible if guard(w) is satisfiable and Var(G_i) ⊆ {v_1, …, v_i}, for each i ∈ {1, …, n}. A symbolic language is feasible if it is prefix closed and consists of feasible symbolic words.

Lemma 5. L_s(A) is feasible.

Proof. Since for any transition ⟨q, α, g, ϱ, q′⟩ of A, q′ ∈ F implies q ∈ F, L_s(A) is prefix closed. Suppose w = α_1 G_1 ⋯ α_n G_n is a symbolic word of A. It suffices to show that w is feasible. Consider a symbolic run δ for w, as in Definition 6. By Lemma 3, Var(G_i) = Var(g_i[ι_i]) ⊆ range(ι_i) ⊆ {v_1, …, v_i}, for i ∈ {1, …, n}. By definition of δ, guard(w) = G_1 ∧ ⋯ ∧ G_n is satisfiable.

Since register automata are deterministic, each symbolic trace of A corresponds to a unique symbolic run of A.

Lemma 6. Suppose δ and δ′ are symbolic runs of a register automaton A such that strace(δ) = strace(δ′). Then δ = δ′.

Proof. We prove the lemma by contradiction. Suppose δ ≠ δ′. All symbolic runs of A share at least the initial configuration (q_0, ζ_0).
Let δ be as in Definition 6, and let (q_{i−1}, ζ_{i−1}) be the last point where δ and δ′ coincide. From this point, δ continues its course with a step

(q_{i−1}, ζ_{i−1}) −[α_i, g_i, ϱ_i]→ (q_i, ζ_i),

whereas δ′ continues with a different step

(q_{i−1}, ζ_{i−1}) −[α′_i, g′_i, ϱ′_i]→ (q′_i, ζ′_i).

Since strace(δ) = strace(δ′), α_i = α′_i and g_i[ι_i] ≡ g′_i[ι_i], where ι_i = ζ_{i−1} ∪ {(p, v_i)}. By Lemma 4, variable renaming ι_i is injective, which implies that g_i ≡ g′_i. The underlying transitions q_{i−1} −[α_i, g_i, ϱ_i]→ q_i and q_{i−1} −[α′_i, g′_i, ϱ′_i]→ q′_i of A must be different, because otherwise also ζ_i and ζ′_i would be equal. Therefore, since A is deterministic, g_i ∧ g′_i is not satisfiable. Since g_i ≡ g′_i, this means that g_i is not satisfiable. But since δ is a symbolic run, G_i = g_i[ι_i] is satisfiable, and thus there exists a valuation ξ such that ξ ⊨ G_i. But now Lemma 1 gives ξ ∘ ι_i ⊨ g_i. This means that g_i is satisfiable and we have derived a contradiction.

Definition 8. Let A be a register automaton and w ∈ L_s(A). Then we write symb(w) for the unique symbolic run δ of A with strace(δ) = w.

There exists a one-to-one correspondence between runs of A and pairs consisting of a symbolic run of A and a satisfying assignment for the guards from its symbolic trace.

Lemma 7. Let δ be a symbolic run of A, as in Definition 6, and ξ : {v_1, …, v_n} → D a valuation such that ξ ⊨ G_1 ∧ ⋯ ∧ G_n. Let run_A(δ, ξ) be the sequence obtained from δ by (a) replacing each input α_i by data symbol α_i(ξ(v_i)) (for 0 < i ≤ n), (b) removing guards g_i and assignments ϱ_i, and (c) replacing valuations ζ_i by ξ_i = ξ ∘ ζ_i (for 0 ≤ i ≤ n). Then run_A(δ, ξ) is a run of A.

Proof. Write d_i = ξ(v_i). It suffices to show, for 0 < i ≤ n, that κ_i ⊨ g_i, where κ_i = ξ_{i−1} ∪ {(p, d_i)}, and that ξ_i = κ_i ∘ ϱ_i.
We derive

ξ ∘ ι_i = ξ ∘ (ζ_{i−1} ∪ {(p, v_i)}) = ξ ∘ ζ_{i−1} ∪ {(p, d_i)} = ξ_{i−1} ∪ {(p, d_i)} = κ_i.

By assumption, ξ ⊨ G_i ≡ g_i[ι_i]. By Lemma 1, ξ ∘ ι_i ⊨ g_i. Hence, by the above derivation, κ_i ⊨ g_i, as required. We derive

ξ_i = ξ ∘ ζ_i = ξ ∘ (ι_i ∘ ϱ_i) = (ξ ∘ ι_i) ∘ ϱ_i = κ_i ∘ ϱ_i.

Thus ξ_i = κ_i ∘ ϱ_i, as required.

Lemma 8. Let γ be a run of register automaton A. Then there exist a valuation ξ and symbolic run δ such that run_A(δ, ξ) = γ.

Proof. Let γ be as in Definition 3:

γ = (q_0, ξ_0) −[α_1(d_1)]→ (q_1, ξ_1) … (q_{n−1}, ξ_{n−1}) −[α_n(d_n)]→ (q_n, ξ_n),

where, for 0 ≤ i ≤ n, (q_i, ξ_i) is a configuration of A, domain(ξ_0) = ∅, and for 0 < i ≤ n, Γ contains a transition q_{i−1} −[α_i, g_i, ϱ_i]→ q_i such that
– ι′_i ⊨ g_i, where ι′_i = ξ_{i−1} ∪ {(p, d_i)}, and
– ξ_i = ι′_i ∘ ϱ_i.
Since A is deterministic, the transitions q_{i−1} −[α_i, g_i, ϱ_i]→ q_i are uniquely determined. Let ζ_0 be the trivial variable renaming with empty domain and, for 0 < i ≤ n, define ζ_i inductively by ζ_i = ι_i ∘ ϱ_i and ι_i = ζ_{i−1} ∪ {(p, v_i)}. Let ξ : {v_1, …, v_n} → D be given by ξ(v_i) = d_i, for 1 ≤ i ≤ n, and let δ be the sequence

δ = (q_0, ζ_0) −[α_1, g_1, ϱ_1]→ (q_1, ζ_1) … −[α_n, g_n, ϱ_n]→ (q_n, ζ_n).

We claim that δ is a symbolic run of A. For this, it suffices to show that ξ ⊨ G_1 ∧ ⋯ ∧ G_n, where G_i ≡ g_i[ι_i]. By induction on i we show

ι′_i = ξ ∘ ι_i for 0 < i ≤ n,
ξ_i = ξ ∘ ζ_i for 0 ≤ i ≤ n.

1. Base. ξ_0 = ξ ∘ ζ_0, as both ξ_0 and ζ_0 have empty domain.
2. Induction step.
   ι′_i = ξ_{i−1} ∪ {(p, d_i)} = (ξ ∘ ζ_{i−1}) ∪ {(p, d_i)} (by induction hypothesis) = ξ ∘ (ζ_{i−1} ∪ {(p, v_i)}) = ξ ∘ ι_i,
   ξ_i = ι′_i ∘ ϱ_i = (ξ ∘ ι_i) ∘ ϱ_i = ξ ∘ (ι_i ∘ ϱ_i) = ξ ∘ ζ_i.

Let 0 < i ≤ n. Since γ is a run, ι′_i ⊨ g_i. By the identity we just derived, ξ ∘ ι_i ⊨ g_i. By Lemma 1, ξ ⊨ g_i[ι_i] ≡ G_i.
Hence ξ ⊨ G_1 ∧ ⋯ ∧ G_n, which proves the claim that δ is a symbolic run of A. It is easy to verify that γ = run_A(δ, ξ).

Using the above lemmas, we can prove that whenever two register automata accept the same symbolic language, they also accept the same data language.

Theorem 1. Suppose A and B are register automata with L_s(A) = L_s(B). Then L(A) = L(B).

Proof. We will prove L(A) ⊆ L(B). The proof of the inclusion L(B) ⊆ L(A) is symmetric. Suppose w ∈ L(A). Then there exists an accepting run γ of A with trace(γ) = w. By Lemma 8, there exist a valuation ξ and symbolic run δ of A such that run_A(δ, ξ) = γ. Let u = strace(δ). Then u ∈ L_s(A) and, since L_s(A) = L_s(B), u ∈ L_s(B). Let δ′ be a symbolic run of B such that strace(δ′) = u. Let γ′ = run_B(δ′, ξ). By Lemma 7, γ′ is a run of B. Let w′ = trace(γ′). Then w′ ∈ L(B). Note that w and w′ share the same sequence of data values, as given by valuation ξ. Also note that w, γ, δ, u, δ′, γ′ and w′ all share the same sequence of input symbols. Thus w = w′ and w ∈ L(B), as required.

Fig. 3: Trace equivalent but not symbolic trace equivalent. (Diagram not reproduced.)

Example 4. The converse of Theorem 1 does not hold. Figure 3 gives a trivial example of two register automata with the same data language but a different symbolic language.

Lemma 7 allows us to rephrase the well-formedness condition of register automata in terms of symbolic runs.

Corollary 1. Register automaton A is well-formed iff, for each symbolic run δ that ends with (q, ζ), q −[α, g, ϱ]→ q′ ⇒ Var(g) ⊆ domain(ζ) ∪ {p}.

Proof. “⇒” Suppose symbolic run δ is defined as in Definition 6. Let ξ : {v_1, …, v_n} → D be a valuation such that ξ ⊨ G_1 ∧ ⋯ ∧ G_n. (Such a valuation ξ exists since, by definition of a symbolic run, G_1 ∧ ⋯ ∧ G_n is satisfiable.) By Lemma 7, γ = run_A(δ, ξ) is a run of A.
By construction of $\gamma$, $\gamma$ ends with a reachable configuration $(q, \xi)$, where $\mathit{domain}(\xi) = \mathit{domain}(\zeta)$. Now we may apply the definition of well-formedness to conclude $\mathit{Var}(g) \subseteq \mathit{domain}(\zeta) \cup \{p\}$.

"$\Leftarrow$" Suppose that for each symbolic run $\delta$ that ends with $(q, \zeta)$, we have $q \xrightarrow{\alpha, g, \varrho} q' \Rightarrow \mathit{Var}(g) \subseteq \mathit{domain}(\zeta) \cup \{p\}$. Let $(q, \xi)$ be the final configuration of a run $\gamma$ of $A$ with $q \xrightarrow{\alpha, g, \varrho} q'$. By Lemma 8, there exist a valuation $\xi'$ and symbolic run $\delta$ such that $\mathit{run}_A(\delta, \xi') = \gamma$. Let $(q, \zeta)$ be the final configuration of symbolic run $\delta$. Then by the assumption, $\mathit{Var}(g) \subseteq \mathit{domain}(\zeta) \cup \{p\}$. Therefore, since $\mathit{domain}(\xi) = \mathit{domain}(\zeta)$, $\mathit{Var}(g) \subseteq \mathit{domain}(\xi) \cup \{p\}$ and we may conclude that $A$ is well-formed.

The Myhill-Nerode equivalence [29,21] deems two words $w$ and $w'$ of a language $L$ equivalent if there does not exist a suffix $u$ that distinguishes them, that is, only one of the words $w \cdot u$ and $w' \cdot u$ is in $L$. The Myhill-Nerode theorem states that $L$ is regular if and only if this equivalence relation has a finite index, and moreover that the number of states in the smallest deterministic finite automaton (DFA) recognizing $L$ is equal to the number of equivalence classes. In this section, we present a Myhill-Nerode theorem for symbolic languages and register automata. We need three relations $\equiv_l$, $\equiv_t$ and $\equiv_r$ on symbolic words to capture the structure of register automata. Intuitively, two symbolic words $w$ and $w'$ are location equivalent, notation $w \equiv_l w'$, if they lead to the same location, and transition equivalent, notation $w \equiv_t w'$, if they share the same final transition. Marker $v$ of $w$ and marker $v'$ of $w'$ are register equivalent, notation $(w, v) \equiv_r (w', v')$, when they are stored in the same register after occurrence of words $w$ and $w'$. Whereas $\equiv_l$ and $\equiv_t$ are equivalence relations, $\equiv_r$ is a partial equivalence relation (PER), that is, a relation that is symmetric and transitive.
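As a small illustration of the notion of a partial equivalence relation, the following Python sketch checks symmetry and transitivity for a finite relation and computes the subset on which the relation is reflexive. The encoding of pairs $(w, v)$ as tuples of strings, the function names, and the toy relation are assumptions made for this example only; they are not taken from the paper.

```python
from itertools import product


def is_per(pairs):
    """Return True if the finite relation `pairs` is symmetric and transitive."""
    symmetric = all((b, a) in pairs for (a, b) in pairs)
    transitive = all(
        (a, d) in pairs
        for (a, b), (c, d) in product(pairs, repeat=2)
        if b == c
    )
    return symmetric and transitive


def per_domain(pairs):
    """Elements related to themselves; a PER is an equivalence relation on this set."""
    return {a for (a, b) in pairs if a == b}


# Hypothetical toy data: after word "w" only marker v1 is stored,
# after word "u" only marker v2 is stored, and both sit in the same register.
rel = {
    (("w", "v1"), ("w", "v1")),
    (("u", "v2"), ("u", "v2")),
    (("w", "v1"), ("u", "v2")),
    (("u", "v2"), ("w", "v1")),
}

print(is_per(rel))                      # symmetric and transitive
print(("w", "v2") in per_domain(rel))   # (w, v2) is not related to itself
```

In this toy relation the pair `("w", "v2")` lies outside the reflexive part, which mirrors the situation where a word does not store a given marker.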
Relation $\equiv_r$ is not necessarily reflexive, as $(w, v) \equiv_r (w, v)$ only holds when marker $v$ is stored after symbolic trace $w$. Since a register automaton has finitely many locations, finitely many transitions, and finitely many registers, the equivalences $\equiv_l$ and $\equiv_t$, and the equivalence induced by $\equiv_r$, are all required to have finite index.

Definition 9. A feasible symbolic language $L$ over $\Sigma$ is regular iff there exist three relations:
– an equivalence relation $\equiv_l$ on $L$, called location equivalence,
– an equivalence relation $\equiv_t$ on $L \setminus \{\epsilon\}$, called transition equivalence, and
– a partial equivalence relation $\equiv_r$ on $\{(w, v_i) \in L \times V \mid i \le \mathit{length}(w)\}$, called register equivalence. We say that $w$ stores $v$ if $(w, v) \equiv_r (w, v)$.
We require that equivalences $\equiv_l$ and $\equiv_t$, as well as the equivalence relation obtained by restricting $\equiv_r$ to $\{(w, v) \in L \times V \mid w \text{ stores } v\}$, have finite index. We also require that relations $\equiv_l$, $\equiv_t$ and $\equiv_r$ satisfy the conditions of Table 1, for $w, w', u, u' \in L$, $\mathit{length}(w) = m$, $\mathit{length}(w') = n$, $\alpha, \alpha' \in \Sigma$, $G, G'$ guards, $v, v' \in V$, and $\sigma : V \rightharpoonup V$. Condition 1 implies that, given $w$, $w'$ and $v$, there is at most one $v'$ such that $(w, v) \equiv_r (w', v')$. Therefore, we may define $\mathit{matching}(w, w')$ as the variable renaming $\sigma$ satisfying:
\[ \sigma(v) = \begin{cases} v' & \text{if } (w, v) \equiv_r (w', v') \\ v_{n+1} & \text{if } v = v_{m+1} \\ \text{undefined} & \text{otherwise} \end{cases} \]

Intuitively, the first condition captures that a register can store at most a single value at a time. When $w \alpha G$ and $w' \alpha' G'$ share the same final transition, then in particular $w$ and $w'$ share the same final location (Condition 2), input symbols $\alpha$ and $\alpha'$ are equal (Condition 3), $G'$ is just a renaming of $G$ (Condition 4), and $w \alpha G$ and $w' \alpha' G'$ share the same final location (Condition 5) and final assignment (Conditions 6, 7 and 8). Condition 6 says that the parameters of the final input end up in the same register when they are stored.
Condition 7 says that when two values are stored in the same register, they will stay in the same register for the rest of their life (this condition can be viewed as a right invariance condition for registers). Conversely, if two values are stored in the same register after a transition, and they do not correspond to the final input, they were already stored in the same register before the transition (Condition 8). Condition 9 captures the well-formedness assumption for register automata. As a consequence of Condition 9, $G[\sigma]$ is defined in Conditions 4, 10 and 11, since $\mathit{Var}(G) \subseteq \mathit{domain}(\sigma)$.

(1) $(w, v) \equiv_r (w, v') \Rightarrow v = v'$
(2) $w \alpha G \equiv_t w' \alpha' G' \Rightarrow w \equiv_l w'$
(3) $w \alpha G \equiv_t w' \alpha' G' \Rightarrow \alpha = \alpha'$
(4) $w \alpha G \equiv_t w' \alpha G' \wedge \sigma = \mathit{matching}(w, w') \Rightarrow G[\sigma] \equiv G'$
(5) $w \equiv_t w' \Rightarrow w \equiv_l w'$
(6) $w \equiv_t w' \wedge w \text{ stores } v_m \Rightarrow (w, v_m) \equiv_r (w', v_n)$
(7) $u \equiv_t u' \wedge u = w \alpha G \wedge u' = w' \alpha G' \wedge (w, v) \equiv_r (w', v') \wedge u \text{ stores } v \Rightarrow (u, v) \equiv_r (u', v')$
(8) $u \equiv_t u' \wedge u = w \alpha G \wedge u' = w' \alpha G' \wedge (u, v) \equiv_r (u', v') \wedge v \neq v_{m+1} \Rightarrow (w, v) \equiv_r (w', v')$
(9) $w \equiv_l w' \wedge w \alpha G \in L \wedge v \in \mathit{Var}(G) \setminus \{v_{m+1}\} \Rightarrow \exists v' : (w, v) \equiv_r (w', v')$
(10) $w \equiv_l w' \wedge w \alpha G \in L \wedge \sigma = \mathit{matching}(w, w') \wedge \mathit{Sat}(\mathit{guard}(w') \wedge G[\sigma]) \Rightarrow w' \alpha G[\sigma] \in L$
(11) $w \equiv_l w' \wedge w \alpha G \in L \wedge w' \alpha G' \in L \wedge \sigma = \mathit{matching}(w, w') \wedge \mathit{Sat}(G[\sigma] \wedge G') \Rightarrow w \alpha G \equiv_t w' \alpha G'$
Table 1: Conditions for regularity of symbolic languages.

Condition 10 is the equivalent for symbolic languages of the well-known right invariance condition for regular languages. For symbolic languages a right invariance condition
\[ w \equiv_l w' \wedge w \alpha G \in L \wedge \sigma = \mathit{matching}(w, w') \Rightarrow w' \alpha G[\sigma] \in L \]
would be too strong: even though $w$ and $w'$ lead to the same location, the values stored in the registers may be different, and therefore they will not necessarily enable the same transitions. However, when in addition $\mathit{guard}(w') \wedge G[\sigma]$ is satisfiable, we may conclude that $w' \alpha G[\sigma] \in L$.
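To make the renaming $\sigma = \mathit{matching}(w, w')$ and the renamed guard $G[\sigma]$ more concrete, here is a minimal Python sketch. It assumes that register equivalence is induced by register contents, that is, two markers are related when the same register holds them after $w$ and $w'$ respectively (the situation of a language coming from a register automaton). The encoding of markers as integers, of guards as nested tuples, and the names `matching` and `rename` as Python functions are assumptions of this sketch. The satisfiability test $\mathit{Sat}$ of Condition 10 is not modelled; it would require a constraint solver.

```python
def matching(zeta_w, zeta_w2, m, n):
    """Build sigma = matching(w, w') as a dict on marker indices.

    zeta_w, zeta_w2: dicts mapping register names to the index i of the
    marker v_i they hold after w and w' (hypothetical encoding).
    m, n: lengths of w and w'.
    """
    # Markers held by the same register after w and w' are matched.
    sigma = {zeta_w[x]: zeta_w2[x] for x in zeta_w if x in zeta_w2}
    # The marker of the next input of w maps to the next input of w'.
    sigma[m + 1] = n + 1
    return sigma


def rename(guard, sigma):
    """Apply sigma simultaneously to a guard given as a nested tuple.

    Markers are encoded as ('v', i); any other tuple is (operator, args...).
    Raises KeyError when sigma is undefined on a marker of the guard.
    """
    if isinstance(guard, tuple) and len(guard) == 2 and guard[0] == "v":
        return ("v", sigma[guard[1]])
    if isinstance(guard, tuple):
        return (guard[0],) + tuple(rename(g, sigma) for g in guard[1:])
    return guard  # constants are left unchanged


# Toy data: after w (length 2) register x holds v1;
# after w' (length 3) register x holds v3.
sigma = matching({"x": 1}, {"x": 3}, m=2, n=3)
G = (">", ("v", 3), ("v", 1))   # guard v3 > v1 on the next input of w
print(sigma)
print(rename(G, sigma))         # the guard G[sigma] over markers of w'
```

The `KeyError` raised by `rename` corresponds to the case where $G[\sigma]$ is undefined; Condition 9 is what rules this case out in the paper's setting.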
Condition 11, finally, asserts that $L$ only allows deterministic behavior.

The simple lemma below asserts that, due to the determinism imposed by Condition 11, the converse of Conditions 2, 3 and 4 also holds. This means that $\equiv_t$ can be expressed in terms of $\equiv_l$ and $\equiv_r$, that is, once we have fixed $\equiv_l$ and $\equiv_r$, relation $\equiv_t$ is fully determined.

Lemma 9. Suppose symbolic language $L$ over $\Sigma$ is regular, and relations $\equiv_l$, $\equiv_t$ and $\equiv_r$ satisfy the conditions of Definition 9. Then
\[ w \equiv_l w' \wedge w \alpha G \in L \wedge w' \alpha G' \in L \wedge \sigma = \mathit{matching}(w, w') \wedge G' \equiv G[\sigma] \Rightarrow w \alpha G \equiv_t w' \alpha G' . \]

Proof. Suppose the left hand side of the above implication holds. Since $L$ is regular, it is in particular feasible, and therefore $G'$ is satisfiable. But then, since $G' \equiv G[\sigma]$, also $G[\sigma] \wedge G'$ is satisfiable. Therefore, Condition 11 implies that the right hand side of the implication holds.

We can now state and prove our "symbolic" version of the celebrated result of Myhill & Nerode. First we prove that the symbolic language of any register automaton is regular (Theorem 2), and then we establish that any regular symbolic language can be obtained as the symbolic language of some register automaton (Theorem 3).

Theorem 2. Suppose $A$ is a register automaton. Then $L_s(A)$ is regular.

Proof. Let $L = L_s(A)$. Then, by Lemma 5, $L$ is feasible. Define relations $\equiv_l$, $\equiv_t$ and $\equiv_r$ as follows:
– For $w, w' \in L$, $w \equiv_l w'$ iff $\mathit{symb}(w)$ and $\mathit{symb}(w')$ share the same final location.
– For $w, w' \in L \setminus \{\epsilon\}$, $w \equiv_t w'$ iff $\mathit{symb}(w)$ and $\mathit{symb}(w')$ share the same final transition.
– For $w, w' \in L$ and $v, v' \in V$, $(w, v) \equiv_r (w', v')$ iff there is a register $x \in V$ such that the final valuation $\zeta$ of $\mathit{symb}(w)$ stores $v$ in $x$, and the final valuation $\zeta'$ of $\mathit{symb}(w')$ stores $v'$ in $x$, that is, $\zeta(x) = v$ and $\zeta'(x) = v'$. (Note that, by Lemma 3, $\mathit{range}(\zeta) \subseteq \{v_1, \ldots, v_m\}$, for $m = \mathit{length}(w)$, and $\mathit{range}(\zeta') \subseteq \{v_1, \ldots, v_n\}$, for $n = \mathit{length}(w')$.)
Then $\equiv_l$ has finite index since $A$ has a finite number of locations, $\equiv_t$ has finite index since $A$ has a finite number of transitions, and the equivalence induced by $\equiv_r$ has finite index since $A$ has a finite number of registers.

Assume $w, w' \in L$, where $w$ contains $m$ input symbols and $w'$ contains $n$ input symbols. Let
\[ \mathit{symb}(w) = (q_0, \zeta_0) \xrightarrow{\alpha_1, g_1, \varrho_1} (q_1, \zeta_1) \;\ldots\; (q_{m-1}, \zeta_{m-1}) \xrightarrow{\alpha_m, g_m, \varrho_m} (q_m, \zeta_m), \]
\[ \mathit{symb}(w') = (q'_0, \zeta'_0) \xrightarrow{\alpha'_1, g'_1, \varrho'_1} (q'_1, \zeta'_1) \;\ldots\; (q'_{n-1}, \zeta'_{n-1}) \xrightarrow{\alpha'_n, g'_n, \varrho'_n} (q'_n, \zeta'_n), \]
as in Definition 6. We show that all 11 conditions of Table 1 hold:
– Condition 1. If $(w, v) \equiv_r (w, v')$ then $v$ and $v'$ are stored in the same register $x$ in the final valuation $\zeta_m$ of $\mathit{symb}(w)$. Thus $v = \zeta_m(x) = v'$.
– Condition 2. If $\mathit{symb}(w \alpha G)$ and $\mathit{symb}(w' \alpha' G')$ share the same final transition, then $\mathit{symb}(w)$ and $\mathit{symb}(w')$ certainly share the same final location.
– Condition 3. If $\mathit{symb}(w \alpha G)$ and $\mathit{symb}(w' \alpha' G')$ share the same final transition, then $\alpha$ and $\alpha'$ must be equal to the input symbols of this final transition, and thus equal to each other.
– Condition 4. Assume $w \alpha G \equiv_t w' \alpha G'$ and $\sigma = \mathit{matching}(w, w')$. Let $\mathit{symb}(w \alpha G)$ and $\mathit{symb}(w' \alpha G')$ be obtained by appending transitions
\[ (q_m, \zeta_m) \xrightarrow{\alpha, g, \varrho} (q, \zeta) \quad\text{and}\quad (q'_n, \zeta'_n) \xrightarrow{\alpha, g', \varrho'} (q', \zeta') \]
to $\mathit{symb}(w)$ and $\mathit{symb}(w')$, respectively. Then $q_m = q'_n$, $g \equiv g'$, $\varrho = \varrho'$, $q = q'$, $G \equiv g[\iota]$, where $\iota = \zeta_m \cup \{(p, v_{m+1})\}$, and $G' \equiv g'[\iota']$, where $\iota' = \zeta'_n \cup \{(p, v_{n+1})\}$. We have to show that $G' \equiv G[\sigma]$, or equivalently $g[\sigma \circ \iota] = g[\iota']$. Suppose $x \in \mathit{Var}(g)$.
• If $x = p$ then $\sigma \circ \iota(x) = \sigma \circ \iota(p) = \sigma(v_{m+1}) = v_{n+1} = \iota'(p) = \iota'(x)$.
• If $x \neq p$ then, by Corollary 1, $x \in \mathit{domain}(\zeta_m)$ and $x \in \mathit{domain}(\zeta'_n)$. Let $v = \zeta_m(x)$ and $v' = \zeta'_n(x)$.
Then, by definition of $\equiv_r$, $(w, v) \equiv_r (w', v')$ and thus $\sigma(v) = v'$. Hence $\sigma \circ \iota(x) = \sigma \circ \zeta_m(x) = \sigma(v) = v' = \zeta'_n(x) = \iota'(x)$.
– Condition 5. If $\mathit{symb}(w)$ and $\mathit{symb}(w')$ share the same final transition, they certainly share the same final location.
– Condition 6. Assume $w \equiv_t w'$ and $w$ stores $v_m$. Then there exists a variable $x$ such that $\zeta_m(x) = v_m$. By the definition of symbolic runs, $\zeta_m = \iota_m \circ \varrho_m$, where $\iota_m = \zeta_{m-1} \cup \{(p, v_m)\}$. By Lemma 3, $\mathit{range}(\zeta_{m-1}) \subseteq \{v_1, \ldots, v_{m-1}\}$. We conclude that $\varrho_m(x) = p$. Again by the definition of symbolic runs, $\zeta'_n = \iota'_n \circ \varrho'_n$, where $\iota'_n = \zeta'_{n-1} \cup \{(p, v_n)\}$. Since $w \equiv_t w'$, we know $\varrho_m = \varrho'_n$. Therefore $\zeta'_n(x) = \iota'_n \circ \varrho'_n(x) = \iota'_n \circ \varrho_m(x) = \iota'_n(p) = v_n$. This implies $(w, v_m) \equiv_r (w', v_n)$, as required.
– Condition 7. Assume that $u \equiv_t u'$, $u = w \alpha G$, $u' = w' \alpha G'$, $u$ stores $v$, and $(w, v) \equiv_r (w', v')$. Let $\mathit{symb}(w \alpha G)$ and $\mathit{symb}(w' \alpha G')$ be obtained by appending transitions
\[ (q_m, \zeta_m) \xrightarrow{\alpha, g, \varrho} (q, \zeta) \quad\text{and}\quad (q'_n, \zeta'_n) \xrightarrow{\alpha, g', \varrho'} (q', \zeta') \]
to $\mathit{symb}(w)$ and $\mathit{symb}(w')$, respectively. Then $\varrho = \varrho'$, $\zeta = \iota \circ \varrho$, where $\iota = \zeta_m \cup \{(p, v_{m+1})\}$, and $\zeta' = \iota' \circ \varrho$, where $\iota' = \zeta'_n \cup \{(p, v_{n+1})\}$. Since $(w, v) \equiv_r (w', v')$, there exists an $x \in V$ such that $\zeta_m(x) = v$ and $\zeta'_n(x) = v'$. Thus also $\iota(x) = v$ and $\iota'(x) = v'$. Since $u$ stores $v$, there exists a $y \in V$ such that $\zeta(y) = v$. By Lemma 4, $\iota$ is injective. Thus $\iota(\varrho(y)) = v$ and $\iota(x) = v$ implies $\varrho(y) = x$. But this means $\zeta'(y) = \iota' \circ \varrho(y) = \iota'(x) = v'$. Therefore $(u, v) \equiv_r (u', v')$.
– Condition 8. Assume $u \equiv_t u'$, $u = w \alpha G$, $u' = w' \alpha G'$, $v \neq v_{m+1}$ and $(u, v) \equiv_r (u', v')$.
Let $\mathit{symb}(w \alpha G)$ and $\mathit{symb}(w' \alpha G')$ be obtained by appending transitions
\[ (q_m, \zeta_m) \xrightarrow{\alpha, g, \varrho} (q, \zeta) \quad\text{and}\quad (q'_n, \zeta'_n) \xrightarrow{\alpha, g', \varrho'} (q', \zeta') \]
to $\mathit{symb}(w)$ and $\mathit{symb}(w')$, respectively. Then $\varrho = \varrho'$, $\zeta = \iota \circ \varrho$, where $\iota = \zeta_m \cup \{(p, v_{m+1})\}$, and $\zeta' = \iota' \circ \varrho$, where $\iota' = \zeta'_n \cup \{(p, v_{n+1})\}$. Since $(u, v) \equiv_r (u', v')$, there exists an $x \in V$ such that $\zeta(x) = v$ and $\zeta'(x) = v'$. Using $v \neq v_{m+1}$, we infer that there exists a $y \in V$ such that $\varrho(x) = y$ and $\zeta_m(y) = v$. Now we derive $\zeta'_n(y) = \iota'(y) = \iota' \circ \varrho(x) = \zeta'(x) = v'$. Therefore $(w, v) \equiv_r (w', v')$.
– Condition 9. Assume $w \equiv_l w'$, $w \alpha G \in L$ and $v \in \mathit{Var}(G) \setminus \{v_{m+1}\}$. Let $\mathit{symb}(w \alpha G)$ be obtained by appending transition
\[ (q_m, \zeta_m) \xrightarrow{\alpha, g, \varrho} (q, \zeta) \]
to $\mathit{symb}(w)$. Then $q_m = q'_n$ and $G \equiv g[\iota]$, where $\iota = \zeta_m \cup \{(p, v_{m+1})\}$. Since $v \in \mathit{Var}(G) \setminus \{v_{m+1}\}$, there exists a variable $x \in \mathit{Var}(g) \setminus \{p\}$ with $\zeta_m(x) = v$. By Corollary 1, $\mathit{Var}(g) \subseteq \mathit{domain}(\zeta'_n) \cup \{p\}$, and thus $x \in \mathit{domain}(\zeta'_n)$. Let $v' = \zeta'_n(x)$. Then $(w, v) \equiv_r (w', v')$.
– Condition 10. Assume that $w \equiv_l w'$, $w \alpha G \in L$, $\sigma = \mathit{matching}(w, w')$ and $\mathit{Sat}(\mathit{guard}(w') \wedge G[\sigma])$. Since $w \alpha G \in L$, $\mathit{symb}(w \alpha G)$ can be obtained by appending a transition
\[ (q_m, \zeta_m) \xrightarrow{\alpha, g, \varrho} (q, \zeta) \]
to $\mathit{symb}(w)$, with $G \equiv g[\iota]$, where $\iota = \zeta_m \cup \{(p, v_{m+1})\}$. Since $w \equiv_l w'$, $q_m = q'_n$. Now consider the sequence $\delta'$ obtained by appending a transition
\[ (q'_n, \zeta'_n) \xrightarrow{\alpha, g, \varrho} (q, \zeta') \]
to $\mathit{symb}(w')$, with $\zeta' = \iota' \circ \varrho$, where $\iota' = \zeta'_n \cup \{(p, v_{n+1})\}$. Since $\mathit{guard}(w') \wedge G[\sigma]$ is satisfiable, we may conclude that $\delta'$ is a symbolic execution if we can prove $G[\sigma] \equiv g[\iota']$, or equivalently $g[\sigma \circ \iota] = g[\iota']$. Suppose $x \in \mathit{Var}(g)$.
• If $x = p$ then $\sigma \circ \iota(x) = \sigma \circ \iota(p) = \sigma(v_{m+1}) = v_{n+1} = \iota'(p) = \iota'(x)$.
• If $x \neq p$ then, by Corollary 1, $x \in \mathit{domain}(\zeta_m)$ and $x \in \mathit{domain}(\zeta'_n)$.
Let $v = \zeta_m(x)$ and $v' = \zeta'_n(x)$. Then, by definition of $\equiv_r$, $(w, v) \equiv_r (w', v')$ and thus $\sigma(v) = v'$. Hence $\sigma \circ \iota(x) = \sigma \circ \zeta_m(x) = \sigma(v) = v' = \zeta'_n(x) = \iota'(x)$.
Hence $g[\sigma \circ \iota] = g[\iota']$ and $\delta'$ is a symbolic run for $w' \alpha G[\sigma]$. Since $q \in F$, we conclude $w' \alpha G[\sigma] \in L$.
– Condition 11. Suppose $w \equiv_l w'$, $w \alpha G \in L$, $w' \alpha G' \in L$, $\sigma = \mathit{matching}(w, w')$ and $G[\sigma] \wedge G'$ is satisfiable. Let $\delta = \mathit{symb}(w \alpha G)$ and $\delta' = \mathit{symb}(w' \alpha G')$ be obtained by appending transitions
\[ (q_m, \zeta_m) \xrightarrow{\alpha, g, \varrho} (q, \zeta) \quad\text{and}\quad (q'_n, \zeta'_n) \xrightarrow{\alpha, g', \varrho'} (q', \zeta') \]
to $\mathit{symb}(w)$ and $\mathit{symb}(w')$, respectively. Then $G \equiv g[\iota]$, where $\iota = \zeta_m \cup \{(p, v_{m+1})\}$, and $G' \equiv g'[\iota']$, where $\iota' = \zeta'_n \cup \{(p, v_{n+1})\}$. Since $G[\sigma] \wedge G'$ is satisfiable, there exists a valuation $\xi$ such that $\xi \models G[\sigma] \wedge G'$. Define variable renaming $\sigma'$ as follows:
\[ \sigma'(x) = \begin{cases} \iota'(x) & \text{if } x \in \mathit{Var}(g') \\ \sigma \circ \iota(x) & \text{otherwise} \end{cases} \]
Then clearly $G' \equiv g'[\iota'] \equiv g'[\sigma']$. We verify that $G[\sigma] \equiv g[\sigma \circ \iota] \equiv g[\sigma']$. Let $x \in \mathit{Var}(g)$. Then
• If $x = p$ then $\sigma \circ \iota(x) = \sigma \circ \iota(p) = \sigma(v_{m+1}) = v_{n+1} = \iota'(p) = \iota'(x)$.
• If $x \in \mathit{Var}(g') \setminus \{p\}$ then, by Corollary 1, $x \in \mathit{domain}(\zeta_m)$ and $x \in \mathit{domain}(\zeta'_n)$. Let $\zeta_m(x) = v$ and $\zeta'_n(x) = v'$. Then $(w, v) \equiv_r (w', v')$ and thus $\sigma(v) = v'$. Hence $\sigma \circ \iota(x) = \sigma \circ \zeta_m(x) = \sigma(v) = v' = \zeta'_n(x) = \iota'(x)$.
• If $x \notin \mathit{Var}(g')$ then, by definition of $\sigma'$, $\sigma \circ \iota(x) = \sigma'(x)$.
Thus
\[ G[\sigma] \wedge G' \equiv (g \wedge g')[\sigma'] . \]
Therefore $\xi \models (g \wedge g')[\sigma']$ and, by Lemma 1, $\xi \circ \sigma' \models g \wedge g'$. This means that $g \wedge g'$ is satisfiable. Since $w \equiv_l w'$, $q_m = q'_n$. Because $A$ is required to be deterministic, the conjunction of the guards of any pair of distinct $\alpha$-transitions from $q_m = q'_n$ is not satisfiable. Therefore the final transitions of $\delta$ and $\delta'$ must be equal.
This implies $w \alpha G \equiv_t w' \alpha G'$.

The following example shows that in general there is no coarsest location equivalence that satisfies all conditions of Table 1. So whereas for regular languages a unique Nerode congruence exists, this is not always true for symbolic languages.

Example 5. Consider the symbolic language $L$ that consists of the following three symbolic words and their prefixes:
\[ w = a \; v_1 > 0 \;\; a \; v_1 > 0 \;\; b \; \top \]
\[ u = a \; v_1 = 0 \;\; a \; v_1 = 0 \;\; b \; \top \]
\[ z = a \; v_1 < 0 \;\; c \; v_1 + v_2 = 0 \;\; a \; v_2 > 0 \;\; c \; \top \]
Symbolic language $L$ is accepted by both automata displayed in Figure 4. Thus, by Theorem 2, $L$ is regular. Let $w_i$, $u_i$ and $z_i$ denote the prefixes of $w$, $u$ and $z$, respectively, of length $i$. Then, according to the location equivalence induced by the first automaton, $w_1 \equiv_l u_1$, and according to the location equivalence induced by the second automaton, $u_1 \equiv_l z_2$. Therefore, if a coarsest location equivalence relation would exist, $w_1 \equiv_l z_2$ should hold. Then, by Condition 9, $(w_1, v_1) \equiv_r (z_2, v_2)$. Thus, by Lemma 9, $w_2 \equiv_t z_3$, and therefore, by Condition 5, $w_2 \equiv_l z_3$. But now Condition 10 implies $a \; v_1 > 0 \;\; a \; v_1 > 0 \;\; c \; \top \in L$, which is a contradiction.

Theorem 3. Suppose $L$ is a regular symbolic language over $\Sigma$. Then there exists a register automaton $A$ such that $L = L_s(A)$.

Proof. Since any register automaton without accepting locations accepts the empty symbolic language, we may assume without loss of generality that $L$ is nonempty. Let $\equiv_l$, $\equiv_t$, $\equiv_r$ be relations satisfying the properties stated in Definition 9. We define register automaton $A = (\Sigma, Q, q_0, F, V, \Gamma)$ as follows:
– $Q = \{[w]_l \mid w \in L\} \cup \{q_{\mathit{sink}}\}$, where $q_{\mathit{sink}}$ is a special sink location. (Since $L$ is regular, $\equiv_l$ has finite index, and so $Q$ is finite, as required.)
– $q_0 = [\epsilon]_l$. (Since $L$ is regular, it is feasible, and thus prefix closed. Therefore, since we assume that $L$ is nonempty, $\epsilon \in L$.)
– $F = \{[w]_l \mid w \in L\}$.
[Fig. 4: There is no unique, coarsest location equivalence.]
– $V = \{[(w, v)]_r \mid w \in L \wedge v \in V \wedge w \text{ stores } v\}$. (Since $L$ is regular, the equivalence induced by $\equiv_r$ has finite index, and so $V$ is finite, as required. Note that registers are supposed to be elements of $V$, and equivalence classes of $\equiv_r$ are not. Thus, strictly speaking, we should associate a unique register of $V$ to each equivalence class of $\equiv_r$, and define $V$ in terms of those registers.)
– $\Gamma$ contains a transition $\langle q, \alpha, g, \varrho, q' \rangle$ for each equivalence class $[w \alpha G]_t$, where
• $q = [w]_l$. (Condition 2 ensures that the definition of $q$ is independent from the choice of representative $w \alpha G$.)
• (Condition 3 ensures that input symbol $\alpha$ is independent from the choice of representative $w \alpha G$.)
• $g \equiv G[\tau]$ where $\tau$ is a variable renaming that satisfies, for $v \in \mathit{Var}(G)$,
\[ \tau(v) = \begin{cases} [(w, v)]_r & \text{if } w \text{ stores } v \\ p & \text{if } v = v_{m+1} \wedge m = \mathit{length}(w) \end{cases} \]
(By Condition 9, $w$ stores $v$, for any $v \in \mathit{Var}(G) \setminus \{v_{m+1}\}$, so $G[\tau]$ is well-defined. Condition 4 ensures that the definition of $g$ is independent from the choice of representative $w \alpha G$. Also note that, by Condition 1, $\tau$ is injective.)
• $\varrho$ is defined for each equivalence class $[(w' \alpha G', v')]_r$ with $w' \alpha G' \equiv_t w \alpha G$ and $w' \alpha G'$ stores $v'$. Let $n = \mathit{length}(w')$. Then
\[ \varrho([(w' \alpha G', v')]_r) = \begin{cases} [(w', v')]_r & \text{if } w' \text{ stores } v' \\ p & \text{if } v' = v_{n+1} \end{cases} \]
(By Condition 8, either $v' = v_{n+1}$ or $w'$ stores $v'$, so $\varrho([(w' \alpha G', v')]_r)$ is well-defined. Also by Condition 8, the definition of $\varrho$ does not depend on the choice of representative $w' \alpha G'$. By Conditions 6 and 7, assignment $\varrho$ is injective.)
• $q' = [w \alpha G]_l$. (Condition 5 ensures that the definition of $q'$ is independent from the choice of representative $w \alpha G$.)
In order to ensure that $A$ is completely specified, we add transitions to the sink location $q_{\mathit{sink}}$. More specifically, if $q \in Q$ is a location with outgoing $\alpha$-transitions with guards $g_1, \ldots, g_m$, then we add a transition $\langle q, \alpha, \neg(g_1 \vee \cdots \vee g_m), \varrho, q_{\mathit{sink}} \rangle$ to $\Gamma$, for $\varrho$ the trivial assignment with empty domain. Finally, we add, for each $\alpha \in \Sigma$, a self loop $\langle q_{\mathit{sink}}, \alpha, \top, \varrho, q_{\mathit{sink}} \rangle$ to $\Gamma$. Since $L$ is regular, $\equiv_t$ has finite index and therefore $\Gamma$ is finite, as required.

Note that in fact there exists a one-to-one correspondence between equivalence classes of $\equiv_t$ and the transitions in $\Gamma$ that do not lead to $q_{\mathit{sink}}$. Because suppose $w \alpha G \in L$ and $w' \alpha' G' \in L$ induce the same transition $\langle q, \alpha'', g, \varrho, q' \rangle$. Then $q = [w]_l = [w']_l$ and thus $w \equiv_l w'$. Also $\alpha = \alpha'' = \alpha'$ and thus $\alpha = \alpha'$. Moreover, $G[\tau] \equiv G'[\tau']$ (with $\tau'$ defined as expected). Now observe that $G[\tau] \equiv G[\sigma][\tau']$, for $\sigma = \mathit{matching}(w, w')$. Thus we have $G[\sigma][\tau'] \equiv G'[\tau']$. Since $\tau'$ is injective, this implies $G[\sigma] \equiv G'$. Now Lemma 9 implies $w \alpha G \equiv_t w' \alpha' G'$. So each transition of $\Gamma$ corresponds to exactly one equivalence class of $\equiv_t$.

We claim that $A$ is deterministic and prove this by contradiction. Suppose $\langle q, \alpha, g', \varrho', q' \rangle$ and $\langle q, \alpha, g'', \varrho'', q'' \rangle$ are two distinct $\alpha$-transitions in $\Gamma$ with $g' \wedge g''$ satisfiable. Then there exists a valuation $\xi$ such that $\xi \models g' \wedge g''$. Note that $q \neq q_{\mathit{sink}}$, $q' \neq q_{\mathit{sink}}$ and $q'' \neq q_{\mathit{sink}}$. Let the two transitions correspond to (distinct) equivalence classes $[w' \alpha G']_t$ and $[w'' \alpha G'']_t$, respectively. Then $g' = G'[\tau']$ and $g'' = G''[\tau'']$, with $\tau'$ and $\tau''$ defined as above. Now observe that $G'[\tau'] \equiv G'[\sigma][\tau'']$, for $\sigma = \mathit{matching}(w', w'')$.
Using Lemma 4, we derive
\[ \xi \models g' \wedge g'' \;\Leftrightarrow\; \xi \models G'[\sigma][\tau''] \wedge G''[\tau''] \;\Leftrightarrow\; \xi \models (G'[\sigma] \wedge G'')[\tau''] \;\Leftrightarrow\; \xi \circ \tau'' \models G'[\sigma] \wedge G'' . \]
Thus $G'[\sigma] \wedge G''$ is satisfiable and we may apply Condition 11 to conclude $w' \alpha G' \equiv_t w'' \alpha G''$. Contradiction.

So using the assumption that $L$ is regular, we established that $A$ is a register automaton. Note that for this we essentially use that equivalences $\equiv_l$, $\equiv_t$ and $\equiv_r$ have finite index, as well as all the conditions, except Condition 10.

It remains to prove $L = L_s(A)$. First, we show that $L \subseteq L_s(A)$. For this, suppose that $w = \alpha_1 G_1 \cdots \alpha_n G_n \in L$. We need to prove $w \in L_s(A)$. Consider the following sequence
\[ \delta = (q_0, \zeta_0) \xrightarrow{\alpha_1, g_1, \varrho_1} (q_1, \zeta_1) \;\ldots\; \xrightarrow{\alpha_n, g_n, \varrho_n} (q_n, \zeta_n), \]
where $q_0 = [w_0]_l$, $w_0 = \epsilon$, $\mathit{domain}(\zeta_0) = \emptyset$ and, for $1 \le i \le n$,
– $q_i = [w_i]_l$, where $w_i = \alpha_1 G_1 \cdots \alpha_i G_i$,
– $\langle q_{i-1}, \alpha_i, g_i, \varrho_i, q_i \rangle$ is the transition associated to $[w_i]_t$,
– $\zeta_i = \iota_i \circ \varrho_i$, where $\iota_i = \zeta_{i-1} \cup \{(p, v_i)\}$.
Since $L$ is feasible, $G_1 \wedge \cdots \wedge G_n$ is satisfiable. Therefore, in order to prove that $\delta$ is a symbolic run of $A$, it suffices to show, for $1 \le i \le n$, $G_i \equiv g_i[\iota_i]$. Suppose $w_i$ stores $v$. Then, for $i > 0$,
\[ \varrho_i([(w_i, v)]_r) = \begin{cases} [(w_{i-1}, v)]_r & \text{if } w_{i-1} \text{ stores } v \\ p & \text{if } v = v_i \end{cases} \]
By induction on $i$ we prove that $w_i$ stores $v$ $\Rightarrow$ $\zeta_i([(w_i, v)]_r) = v$.
– Base $i = 0$. Trivial since $w_0$ does not store any $v$.
– Induction step. Assume $i > 0$ and $w_i$ stores $v$. We consider two cases:
• $v = v_i$. Then $\zeta_i([(w_i, v)]_r) = \iota_i \circ \varrho_i([(w_i, v_i)]_r) = \iota_i(p) = v_i = v$.
• $w_{i-1}$ stores $v$. Then, by the induction hypothesis, $\zeta_i([(w_i, v)]_r) = \iota_i \circ \varrho_i([(w_i, v)]_r) = \iota_i([(w_{i-1}, v)]_r) = \zeta_{i-1}([(w_{i-1}, v)]_r) = v$.
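The inductive definition $\zeta_i = \iota_i \circ \varrho_i$ with $\iota_i = \zeta_{i-1} \cup \{(p, v_i)\}$ can be traced mechanically. The sketch below is an illustration under assumed encodings: markers $v_i$ are represented by the integer $i$, an assignment $\varrho_i$ is a dict from target register to its source (a register name or the parameter `"p"`), and the register names and the three example assignments are hypothetical.

```python
def symbolic_valuations(assignments):
    """Return [zeta_0, zeta_1, ..., zeta_n] for a sequence of assignments.

    assignments[i-1] is rho_i, a dict mapping each register that is
    (re)assigned in step i to its source: a register name or "p".
    Marker v_i is encoded as the integer i.
    """
    zeta = {}                       # zeta_0 has empty domain
    history = [zeta]
    for i, rho in enumerate(assignments, start=1):
        iota = {**zeta, "p": i}     # iota_i = zeta_{i-1} + {(p, v_i)}
        # zeta_i(x) = iota_i(rho_i(x)); a KeyError here means rho_i reads
        # a register that holds no value.
        zeta = {x: iota[src] for x, src in rho.items()}
        history.append(zeta)
    return history


# Hypothetical run: store input 1 in x; keep x and store input 2 in y;
# finally move x into y and forget the rest.
steps = [{"x": "p"}, {"x": "x", "y": "p"}, {"y": "x"}]
for i, zeta in enumerate(symbolic_valuations(steps)):
    print(i, zeta)
```

In every step the markers in the range of the computed valuation have index at most $i$, in line with the range property that the text attributes to Lemma 3.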
By definition $g_i \equiv G_i[\tau_i]$, where for $v \in \mathit{Var}(G_i)$,
\[ \tau_i(v) = \begin{cases} [(w_{i-1}, v)]_r & \text{if } w_{i-1} \text{ stores } v \\ p & \text{if } v = v_i \end{cases} \]
This means we need to prove $G_i \equiv G_i[\tau_i][\iota_i]$, that is, we must show, for $v \in \mathit{Var}(G_i)$, that $\iota_i(\tau_i(v)) = v$. There are two cases:
– If $v = v_i$ then $\iota_i(\tau_i(v)) = \iota_i(p) = v_i = v$.
– If $w_{i-1}$ stores $v$ then $\iota_i(\tau_i(v)) = \iota_i([(w_{i-1}, v)]_r) = \zeta_{i-1}([(w_{i-1}, v)]_r) = v$.
We conclude that $\delta$ is a symbolic run with $\mathit{strace}(\delta) = w$. Since $w \in L$, $q_n = [w]_l \in F$, so symbolic run $\delta$ is accepting, and thus $w \in L_s(A)$, as required.

Next we need to show that $L_s(A) \subseteq L$. For this, suppose $w = \alpha_1 G_1 \cdots \alpha_n G_n \in L_s(A)$. We need to prove $w \in L$. Let
\[ \delta = (q_0, \zeta_0) \xrightarrow{\alpha_1, g_1, \varrho_1} (q_1, \zeta_1) \;\ldots\; \xrightarrow{\alpha_n, g_n, \varrho_n} (q_n, \zeta_n) \]
be a symbolic run of $A$, as in Definition 6, with $\mathit{strace}(\delta) = w$. For $0 < i \le n$, suppose transition $\langle q_{i-1}, \alpha_i, g_i, \varrho_i, q_i \rangle$ corresponds to equivalence class $[u_{i-1} \alpha_i G'_i]_t$. For $0 \le i \le n$, let $w_i = \alpha_1 G_1 \cdots \alpha_i G_i$. We prove by induction that $q_i = [w_i]_l$ and $w_i$ stores $v$ $\Rightarrow$ $\zeta_i([(w_i, v)]_r) = v$.
– Base $i = 0$. Trivial, since $q_0 = [\epsilon]_l = [w_0]_l$ and $\mathit{domain}(\zeta_0) = \emptyset$ by definition.
– Induction step. Assume $i > 0$. Since transition $q_{i-1} \xrightarrow{\alpha_i, g_i, \varrho_i} q_i$ corresponds to equivalence class $[u_{i-1} \alpha_i G'_i]_t$, $q_{i-1} = [u_{i-1}]_l$. Therefore, by induction hypothesis, $u_{i-1} \equiv_l w_{i-1}$. By Definition 6, $G_i \equiv g_i[\iota_i]$ and by definition of $A$, $g_i \equiv G'_i[\tau]$, where for each $v \in \mathit{Var}(G'_i)$,
\[ \tau(v) = \begin{cases} [(u_{i-1}, v)]_r & \text{if } u_{i-1} \text{ stores } v \\ p & \text{if } v = v_{m+1} \end{cases} \]
where $m = \mathit{length}(u_{i-1})$. Thus $G_i \equiv G'_i[\iota_i \circ \tau]$. Let $\sigma = \mathit{matching}(u_{i-1}, w_{i-1})$. Then, for each $v \in \mathit{Var}(G'_i)$, $\iota_i \circ \tau(v) = \sigma(v)$:
• If $v = v_{m+1}$ then $\iota_i \circ \tau(v) = \iota_i(p) = v_i = \sigma(v)$.
• If $v \neq v_{m+1}$ then, by Condition 9, there exists a $v'$ such that $(u_{i-1}, v) \equiv_r (w_{i-1}, v')$.
Then, again by induction hypothesis,
\[ \iota_i \circ \tau(v) = \iota_i([(u_{i-1}, v)]_r) = \zeta_{i-1}([(u_{i-1}, v)]_r) = \zeta_{i-1}([(w_{i-1}, v')]_r) = v' = \sigma(v) . \]
Therefore $G_i \equiv G'_i[\sigma]$. Since $\delta$ is a symbolic run, $\mathit{guard}(w_{i-1}) \wedge G_i$ is satisfiable. Now we may use Condition 10 to conclude $w_i = w_{i-1} \alpha_i G_i \in L$. Then, by Lemma 9, $u_{i-1} \alpha_i G'_i \equiv_t w_i$, and thus, by Condition 5, $u_{i-1} \alpha_i G'_i \equiv_l w_i$. From this, we conclude $q_i = [w_i]_l$.
Suppose $w_i$ stores $v$. Since $u_{i-1} \alpha_i G'_i \equiv_t w_i$,
\[ \varrho_i([(w_i, v)]_r) = \begin{cases} [(w_{i-1}, v)]_r & \text{if } w_{i-1} \text{ stores } v \\ p & \text{if } v = v_i \end{cases} \]
We consider two cases:
• $v = v_i$. Then $\zeta_i([(w_i, v)]_r) = \iota_i \circ \varrho_i([(w_i, v_i)]_r) = \iota_i(p) = v_i = v$.
• $w_{i-1}$ stores $v$. Then, using the induction hypothesis, $\zeta_i([(w_i, v)]_r) = \iota_i \circ \varrho_i([(w_i, v)]_r) = \iota_i([(w_{i-1}, v)]_r) = \zeta_{i-1}([(w_{i-1}, v)]_r) = v$.
Thus in particular $q_n = [w_n]_l = [w]_l$. This implies $w \in L$, as required.

As a final note, we observe that $A$ is well-formed. Because suppose $\delta$ is a symbolic run that ends with $(q, \zeta)$ and suppose $q \xrightarrow{\alpha, g, \varrho} q'$. Let transition $q \xrightarrow{\alpha, g, \varrho} q'$ correspond to equivalence class $[w \alpha G]_t$. Suppose $x \in \mathit{Var}(g)$. Then, by construction of $A$, there is a variable $v \in \mathit{Var}(G)$ such that either $x = [(w, v)]_r$ and $w$ stores $v$, or $x = p$ and $v = v_{m+1}$, where $m = \mathit{length}(w)$. Let $w' = \mathit{strace}(\delta)$. By the above inductive proof, $q = [w']_l$ and $w'$ stores $v'$ $\Rightarrow$ $[(w', v')]_r \in \mathit{domain}(\zeta)$. Then $w \equiv_l w'$ and by Condition 9, either $v = v_{m+1}$ or there exists a $v'$ such that $(w, v) \equiv_r (w', v')$. This means that either $x = p$ or $x \in \mathit{domain}(\zeta)$. Hence we may conclude that $\mathit{Var}(g) \subseteq \mathit{domain}(\zeta) \cup \{p\}$ and thus $A$ is well-formed by Corollary 1.

References
1. F. Aarts, F. Heidarian, H. Kuppens, P. Olsen, and F.W. Vaandrager. Automata learning through counterexample-guided abstraction refinement. In D. Giannakopoulou and D.
Méry, editors, volume 7436 of Lecture Notes in Computer Science, pages 10–27. Springer, August 2012.
2. F. Aarts, B. Jonsson, J. Uijen, and F.W. Vaandrager. Generating models of infinite-state communication protocols using regular inference with abstraction. Formal Methods in System Design, 46(1):1–41, 2015.
3. D. Angluin. Learning regular sets from queries and counterexamples. Inf. Comput., 75(2):87–106, 1987.
4. Dana Angluin and Dana Fisman. Learning regular omega languages. Theor. Comput. Sci., 650:57–72, 2016.
5. Michael Benedikt, Clemens Ley, and Gabriele Puppis. What you must remember when processing data words. In Alberto H. F. Laender and Laks V. S. Lakshmanan, editors, Proceedings of the 4th Alberto Mendelzon International Workshop on Foundations of Data Management, Buenos Aires, Argentina, May 17-20, 2010, volume 619 of CEUR Workshop Proceedings. CEUR-WS.org, 2010.
6. Mikolaj Bojanczyk, Bartek Klin, and Slawomir Lasota. Automata with group actions. In Proceedings of the 26th Annual IEEE Symposium on Logic in Computer Science, LICS 2011, June 21-24, 2011, Toronto, Ontario, Canada, pages 355–364. IEEE Computer Society, 2011.
7. Matko Botinčan and Domagoj Babić. Sigma*: Symbolic learning of input-output specifications. In Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '13, pages 443–456, New York, NY, USA, 2013. ACM.
8. Cristian Cadar and Koushik Sen. Symbolic execution for software testing: Three decades later. Commun. ACM, 56(2):82–90, February 2013.
9. S. Cassel, F. Howar, and B. Jonsson. RALib: A LearnLib extension for inferring EFSMs. In DIFTS 15, Int. Workshop on Design and Implementation of Formal Tools and Systems, Austin, Texas, 2015. Available at .
10. S. Cassel, F. Howar, B. Jonsson, M. Merten, and B. Steffen. A succinct canonical register automaton model. J. Log. Algebr. Meth. Program., 84(1):54–66, 2015.
11. S. Cassel, F. Howar, B. Jonsson, and B. Steffen.
Active learning for extended finite state machines. Formal Asp. Comput., 28(2):233–263, 2016.
12. Chia Yuan Cho, Domagoj Babić, Pongsin Poosankam, Kevin Zhijie Chen, Edward XueJun Wu, and Dawn Song. Mace: Model-inference-assisted concolic exploration for protocol and vulnerability discovery. In Proceedings of the 20th USENIX Conference on Security, SEC'11, pages 10–10, Berkeley, CA, USA, 2011. USENIX Association.
13. P. Fiterău-Broştean, R. Janssen, and F.W. Vaandrager. Combining model learning and model checking to analyze TCP implementations. In S. Chaudhuri and A. Farzan, editors, Proceedings 28th International Conference on Computer Aided Verification (CAV'16), Toronto, Ontario, Canada, volume 9780 of Lecture Notes in Computer Science, pages 454–471. Springer, 2016.
14. Paul Fiterău-Broştean and Falk Howar. Learning-based testing the sliding window behavior of TCP implementations. In Laure Petrucci, Cristina Seceleanu, and Ana Cavalcanti, editors, Critical Systems: Formal Methods and Automated Verification - Joint 22nd International Workshop on Formal Methods for Industrial Critical Systems - and - 17th International Workshop on Automated Verification of Critical Systems, FMICS-AVoCS 2017, Turin, Italy, September 18-20, 2017, Proceedings, volume 10471 of Lecture Notes in Computer Science, pages 185–200. Springer, 2017.
15. Paul Fiterău-Broştean, Toon Lenaerts, Erik Poll, Joeri de Ruiter, Frits Vaandrager, and Patrick Verleg. Model learning and model checking of SSH implementations. In Proceedings of the 24th ACM SIGSOFT International SPIN Symposium on Model Checking of Software, SPIN 2017, pages 142–151, New York, NY, USA, 2017. ACM.
16. Nissim Francez and Michael Kaminski. An algebraic characterization of deterministic regular languages over infinite alphabets. Theor. Comput. Sci., 306(1-3):155–175, 2003.
17. Bharat Garhewal, Frits Vaandrager, Falk Howar, Timo Schrijvers, Toon Lenaerts, and Rob Smits. Grey-box learning of register automata, June 2020.
Under submission.
18. Dimitra Giannakopoulou, Zvonimir Rakamarić, and Vishwanath Raman. Symbolic learning of component interfaces. In Proceedings of the 19th International Conference on Static Analysis, SAS'12, pages 248–264, Berlin, Heidelberg, 2012. Springer-Verlag.
19. Patrice Godefroid, Nils Klarlund, and Koushik Sen. Dart: Directed automated random testing. SIGPLAN Not., 40(6):213–223, June 2005.
20. Andreas Hagerer, Tiziana Margaria, Oliver Niese, Bernhard Steffen, Georg Brune, and Hans-Dieter Ide. Efficient regression testing of CTI-systems: Testing a complex call-center solution. Annual review of communication, Int. Engineering Consortium (IEC), 55:1033–1040, 2001.
21. J.E. Hopcroft and J.D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, 1979.
22. Matthias Höschele and Andreas Zeller. Mining input grammars from dynamic taints. In David Lo, Sven Apel, and Sarfraz Khurshid, editors, Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, ASE 2016, Singapore, September 3-7, 2016, pages 720–725. ACM, 2016.
23. F. Howar, M. Isberner, B. Steffen, O. Bauer, and B. Jonsson. Inferring semantic interfaces of data structures. In ISoLA (1): Leveraging Applications of Formal Methods, Verification and Validation. Technologies for Mastering Change - 5th International Symposium, ISoLA 2012, Heraklion, Crete, Greece, October 15-18, 2012, Proceedings, Part I, volume 7609 of Lecture Notes in Computer Science, pages 554–571. Springer, 2012.
24. Falk Howar, Dimitra Giannakopoulou, and Zvonimir Rakamarić. Hybrid learning: Interface generation through static, dynamic, and symbolic analysis. In Proceedings of the 2013 International Symposium on Software Testing and Analysis, ISSTA 2013, pages 268–279, New York, NY, USA, 2013. ACM.
25. Falk Howar, Bengt Jonsson, and Frits W. Vaandrager. Combining black-box and white-box techniques for learning register automata. In Bernhard Steffen and Gerhard J.
Woeginger, editors, Computing and Software Science - State of theArt and Perspectives , volume 10000 of Lecture Notes in Computer Science , pages563–588. Springer, 2019.26. Falk Howar and Bernhard Steffen. Active automata learning in practice. In AmelBennaceur, Reiner H¨ahnle, and Karl Meinke, editors, Machine Learning for Dy-namic Software Analysis: Potentials and Limits: International Dagstuhl Seminar16172, Dagstuhl Castle, Germany, April 24-27, 2016, Revised Papers , pages 123–148. Springer International Publishing, 2018. 7. Malte Isberner, Falk Howar, and Bernhard Steffen. The TTT algorithm: Aredundancy-free approach to active automata learning. In Borzoo Bonakdarpourand Scott A. Smolka, editors, Runtime Verification: 5th International Conference,RV 2014, Toronto, ON, Canada, September 22-25, 2014. Proceedings , pages 307–322, Cham, 2014. Springer International Publishing.28. Oded Maler and Ludwig Staiger. On syntactic congruences for omega-languages. Theor. Comput. Sci. , 183(1):93–112, 1997.29. A. Nerode. Linear automaton transformations. Proceedings of the American Math-ematical Society , 9(4):541–544, 1958.30. R.L. Rivest and R.E. Schapire. Inference of finite automata using homing se-quences. Inf. Comput. , 103(2):299–347, 1993.31. M. Schuts, J. Hooman, and F.W. Vaandrager. Refactoring of legacy softwareusing model learning and equivalence checking: an industrial experience report. InE. ´Abrah´am and M. Huisman, editors, Proceedings 12th International Conferenceon integrated Formal Methods (iFM), Reykjavik, Iceland, June 1-3, volume 9681of Lecture Notes in Computer Science , pages 311–325, 2016.32. M. Shahbaz and R. Groz. Inferring mealy machines. In Ana Cavalcanti and DennisDams, editors, FM 2009: Formal Methods, Second World Congress, Eindhoven,The Netherlands, November 2-6, 2009. Proceedings , volume 5850 of Lecture Notesin Computer Science , pages 207–222. Springer, 2009.33. F.W. Vaandrager. Model learning. 
Communications of the ACM , 60(2):86–95,February 2017., 60(2):86–95,February 2017.