A Myhill-Nerode Theorem for Register Automata and Symbolic Trace Languages

Abstract

We propose a new symbolic trace semantics for register automata (extended finite state machines) which records both the sequence of input symbols that occur during a run as well as the constraints on input parameters that are imposed by this run. Our main result is a generalization of the classical Myhill-Nerode theorem to this symbolic setting. Our generalization requires the use of three relations to capture the additional structure of register automata. Location equivalence $\equiv_l$ captures that symbolic traces end in the same location, transition equivalence $\equiv_t$ captures that they share the same final transition, and a partial equivalence relation $\equiv_r$ captures that symbolic values $v$ and $v'$ are stored in the same register after symbolic traces $w$ and $w'$, respectively. A symbolic language is defined to be regular if relations $\equiv_l$, $\equiv_t$ and $\equiv_r$ exist that satisfy certain conditions, in particular, they all have finite index. We show that the symbolic language associated to a register automaton is regular, and we construct, for each regular symbolic language, a register automaton that accepts this language. Our result provides a foundation for grey-box learning algorithms in settings where the constraints on data parameters can be extracted from code using e.g. tools for symbolic/concolic execution or tainting. We believe that moving to a grey-box setting is essential to overcome the scalability problems of state-of-the-art black-box learning algorithms.


arXiv [cs.FL]

Frits Vaandrager and Abhisek Midya

Institute for Computing and Information Sciences, Radboud University, Nijmegen, the Netherlands

Model learning (a.k.a. active automata learning) is a black-box technique which constructs state machine models of software and hardware components from information obtained by providing inputs and observing the resulting outputs. Model learning has been successfully used in numerous applications, for instance for generating conformance test suites of software components [20], finding mistakes in implementations of security-critical protocols [13,15,14], learning interfaces of classes in software libraries [23], and checking that a legacy component and a refactored implementation have the same behavior [31]. We refer to [33,26] for surveys and further references.

This work was supported by NWO TOP project 612.001.852 Grey-box learning of Interfaces for Refactoring Legacy Software (GIRLS).

Myhill-Nerode theorems [29,21] are of pivotal importance for model learning algorithms. Angluin's classical $L^*$ algorithm [3] for active learning of regular languages, as well as improvements such as [30,27,32], use an observation table to approximate the Nerode congruence. Maler and Staiger [28] established a Myhill-Nerode theorem for $\omega$-languages that serves as a basis for a learning algorithm described in [4]. The $SL^*$ algorithm for active learning of register automata of Cassel et al. [11] is directly based on a generalization of the classical Myhill-Nerode theorem to a setting of data languages and register automata (extended finite state machines). Francez and Kaminski [16], Benedikt et al. [5] and Bojańczyk et al. [6] all present Myhill-Nerode theorems for data languages.

Despite the convincing applications of black-box model learning, it is fair to say that existing algorithms do not scale very well.
In order to learn models of realistic applications in which inputs and outputs carry data parameters, state-of-the-art techniques either rely on manually constructed mappers that abstract the data parameters of inputs and outputs into a finite alphabet [2], or otherwise infer guards and assignments from black-box observations of test outputs [11,1]. The latter can be costly, especially for models where the control flow depends on data parameters in the input. Thus, for instance, the RALib tool [9], an implementation of the $SL^*$ algorithm, needed more than two hundred thousand input/reset events to learn register automata with just 6 to 8 locations for TCP client implementations of Linux, FreeBSD and Windows [14]. Existing black-box model learning algorithms also face severe restrictions on the operations and predicates on data that are supported (typically, only equality/inequality predicates and constants).

A natural way to address these limitations is to augment learning algorithms with white-box information extraction methods, which are able to obtain information about the system under learning at lower cost than black-box techniques [25]. Constraints on data parameters can be extracted from the code using e.g. tools for symbolic execution [8], concolic execution [19], or tainting [22]. Several researchers have successfully explored this idea, see for instance [18,12,7,24]. Recently, we showed how constraints on data parameters can be extracted from Python programs using tainting, and used to boost the performance of RALib by almost two orders of magnitude [17]. Nevertheless, all these approaches are rather ad hoc, and what is missing is a Myhill-Nerode theorem for this enriched setting that may serve as a foundation for grey-box model learning algorithms for a general class of register automata.
In this article, we present such a theorem.

More specifically, we propose a new symbolic trace semantics for register automata which records both the sequence of input symbols that occur during a run as well as the constraints on input parameters that are imposed by this run. Our main result is a Myhill-Nerode theorem for symbolic trace languages. Whereas the original Myhill-Nerode theorem refers to a single equivalence relation $\equiv$ on words, and constructs a DFA in which states are equivalence classes of $\equiv$, our generalization requires the use of three relations to capture the additional structure of register automata. Location equivalence $\equiv_l$ captures that symbolic traces end in the same location, transition equivalence $\equiv_t$ captures that they share the same final transition, and a partial equivalence relation $\equiv_r$ captures that symbolic values $v$ and $v'$ are stored in the same register after symbolic traces $w$ and $w'$, respectively. A symbolic language is defined to be regular if relations $\equiv_l$, $\equiv_t$ and $\equiv_r$ exist that satisfy certain conditions, in particular, they all have finite index. Whereas in the classical case of regular languages the Nerode equivalence $\equiv$ is uniquely determined, different relations $\equiv_l$, $\equiv_t$ and $\equiv_r$ may exist that satisfy the conditions for regularity for symbolic languages. We show that the symbolic language associated to a register automaton is regular, and we construct, for each regular symbolic language, a register automaton that accepts this language. In this automaton, the locations are equivalence classes of $\equiv_l$, the transitions are equivalence classes of $\equiv_t$, and the registers are equivalence classes of $\equiv_r$. In this way, we obtain a natural generalization of the classical Myhill-Nerode theorem for symbolic languages and register automata. Unlike Cassel et al. [11], we need no restrictions on the allowed data predicates to prove our result, which drastically increases the range of potential applications.
Our result paves the way for efficient grey-box learning algorithms in settings where the constraints on data parameters can be extracted from the code.

In this section, we fix some basic vocabulary for (partial) functions, languages, and logical formulas.

We write $f : X \rightharpoonup Y$ to denote that $f$ is a partial function from set $X$ to set $Y$. For $x \in X$, we write $f(x)\downarrow$ if there exists a $y \in Y$ such that $f(x) = y$, i.e., the result is defined, and $f(x)\uparrow$ if the result is undefined. We write $\mathit{domain}(f) = \{x \in X \mid f(x)\downarrow\}$ and $\mathit{range}(f) = \{f(x) \in Y \mid x \in \mathit{domain}(f)\}$. We often identify a partial function $f$ with the set of pairs $\{(x, y) \in X \times Y \mid f(x) = y\}$. As usual, we write $f : X \to Y$ to denote that $f$ is a total function from $X$ to $Y$, that is, $f : X \rightharpoonup Y$ and $\mathit{domain}(f) = X$.

Let $\Sigma$ be a set of symbols. A word $u = a_1 \ldots a_n$ over $\Sigma$ is a finite sequence of symbols from $\Sigma$. The length of a word $u$, denoted $|u|$, is the number of symbols occurring in it. The empty word is denoted $\epsilon$. We denote by $\Sigma^*$ the set of all words over $\Sigma$, and by $\Sigma^+$ the set of all nonempty words over $\Sigma$ (i.e. $\Sigma^* = \Sigma^+ \cup \{\epsilon\}$). Given two words $u$ and $w$, we denote by $u \cdot w$ the concatenation of $u$ and $w$. When the context allows it, $u \cdot w$ shall be simply written $uw$. We say that $u$ is a prefix of $w$ iff there exists a word $u'$ such that $u \cdot u' = w$. Similarly, $u$ is a suffix of $w$ iff there exists a word $u'$ such that $u' \cdot u = w$. A language $L$ over $\Sigma$ is any set of words over $\Sigma$, and therefore a subset of $\Sigma^*$. We say that $L$ is prefix closed if, for each $w \in L$ and each prefix $u$ of $w$, $u \in L$ as well.

Guards

We postulate a countably infinite set $\mathcal{V} = \{v_1, v_2, \ldots\}$ of variables. In addition, there is also a variable $p \notin \mathcal{V}$ that will play a special role as formal parameter of input symbols; we write $\mathcal{V}^+ = \mathcal{V} \cup \{p\}$. Our framework is parametrized by a set $R$ of relation symbols. Elements of $R$ are assigned finite arities. A guard is a Boolean combination of relation symbols from $R$ over variables. Formally, the set of guards is inductively defined as follows:
– If $r \in R$ is an $n$-ary relation symbol and $x_1, \ldots, x_n$ are variables from $\mathcal{V}^+$, then $r(x_1, \ldots, x_n)$ is a guard.
– If $g$ is a guard then $\neg g$ is a guard.
– If $g_1$ and $g_2$ are guards then $g_1 \wedge g_2$ is a guard.

We use standard abbreviations from propositional logic such as $\top$ and $g_1 \vee g_2$. We write $\mathit{Var}(g)$ for the set of variables that occur in a guard $g$. We say that $g$ is a guard over set of variables $X$ if $\mathit{Var}(g) \subseteq X$. We write $\mathcal{G}(X)$ for the set of guards over $X$, and use symbol $\equiv$ to denote syntactic equality of guards.

We postulate a structure $\mathcal{R}$ consisting of a set $\mathcal{D}$ of data values and a distinguished $n$-ary relation $r^{\mathcal{R}} \subseteq \mathcal{D}^n$ for each $n$-ary relation symbol $r \in R$. In a trivial example of a structure $\mathcal{R}$, $R$ consists of the binary symbol '=', $\mathcal{D}$ is the set of natural numbers, and $=^{\mathcal{R}}$ is the equality predicate on numbers. An $n$-ary operation $f : \mathcal{D}^n \to \mathcal{D}$ can be modelled in our framework as an $(n+1)$-ary predicate. We may for instance extend structure $\mathcal{R}$ with a ternary predicate symbol $+$, where $(d_1, d_2, d_3) \in +^{\mathcal{R}}$ iff the sum of $d_1$ and $d_2$ equals $d_3$. Constants like 0 and 1 can be added to $\mathcal{R}$ as unary predicates.

A valuation is a partial function $\xi : \mathcal{V}^+ \rightharpoonup \mathcal{D}$ that assigns data values to variables. If $\mathit{Var}(g) \subseteq \mathit{domain}(\xi)$, then $\xi \models g$ is defined inductively by:
– $\xi \models r(x_1, \ldots, x_n)$ iff $(\xi(x_1), \ldots, \xi(x_n)) \in r^{\mathcal{R}}$
– $\xi \models \neg g$ iff not $\xi \models g$
– $\xi \models g_1 \wedge g_2$ iff $\xi \models g_1$ and $\xi \models g_2$

If $\xi \models g$ then we say valuation $\xi$ satisfies guard $g$. We call $g$ satisfiable, and write $\mathit{Sat}(g)$, if there exists a valuation $\xi$ such that $\xi \models g$.
Guard $g$ is a tautology if $\xi \models g$ for all valuations $\xi$ with $\mathit{Var}(g) \subseteq \mathit{domain}(\xi)$.

A variable renaming is a partial function $\sigma : \mathcal{V}^+ \rightharpoonup \mathcal{V}^+$. If $g$ is a guard with $\mathit{Var}(g) \subseteq \mathit{domain}(\sigma)$ then $g[\sigma]$ is the guard obtained by replacing each occurrence of a variable $x$ in $g$ by variable $\sigma(x)$. The following lemma is easily proved by induction.

Lemma 1. $\xi \circ \sigma \models g$ iff $\xi \models g[\sigma]$.
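The definitions of guards, valuations and renamings can be made concrete with a small sketch. The encoding below is an assumption made purely for illustration and is not taken from the paper: guards are nested tuples, valuations and renamings are Python dicts (partial functions), and the structure is the integers with a binary `<` and `=`. The final loop checks the statement of Lemma 1 on one guard and two valuations; it is a spot check, not a proof.

```python
# Hypothetical encoding (not from the paper): a guard is a nested tuple
#   ("rel", name, (x1, ..., xn)) | ("not", g) | ("and", g1, g2)
# Valuations and variable renamings are dicts, i.e. partial functions.

# Assumed structure: the integers with binary relations "<" and "=".
RELATIONS = {
    "<": lambda a, b: a < b,
    "=": lambda a, b: a == b,
}


def variables(g):
    """Var(g): the set of variables occurring in guard g."""
    if g[0] == "rel":
        return set(g[2])
    if g[0] == "not":
        return variables(g[1])
    return variables(g[1]) | variables(g[2])


def satisfies(xi, g):
    """xi |= g, defined only when Var(g) is contained in domain(xi)."""
    if not variables(g) <= set(xi):
        raise ValueError("valuation is undefined on a variable of the guard")
    if g[0] == "rel":
        return RELATIONS[g[1]](*(xi[x] for x in g[2]))
    if g[0] == "not":
        return not satisfies(xi, g[1])
    return satisfies(xi, g[1]) and satisfies(xi, g[2])


def rename(g, sigma):
    """g[sigma]: replace every variable x in g by sigma(x)."""
    if g[0] == "rel":
        return ("rel", g[1], tuple(sigma[x] for x in g[2]))
    if g[0] == "not":
        return ("not", rename(g[1], sigma))
    return ("and", rename(g[1], sigma), rename(g[2], sigma))


def compose(xi, sigma):
    """xi o sigma: maps x to xi(sigma(x)) wherever that is defined."""
    return {x: xi[y] for x, y in sigma.items() if y in xi}


# Guard (p < x) and not (x = p), renamed by x -> v1, p -> v2.
g = ("and", ("rel", "<", ("p", "x")), ("not", ("rel", "=", ("x", "p"))))
sigma = {"x": "v1", "p": "v2"}
for xi in ({"v1": 4, "v2": 0}, {"v1": 0, "v2": 4}):
    # Lemma 1: xi o sigma |= g  iff  xi |= g[sigma]
    assert satisfies(compose(xi, sigma), g) == satisfies(xi, rename(g, sigma))
```

Satisfiability and tautology checking are deliberately left out: for a general structure they would need a decision procedure or solver, which this sketch does not assume.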

A register automaton comprises a set of locations with transitions between them,and a set of registers which can store data values that are received as inputs.Transitions contain guards over the registers and the current input, and mayassign new values to registers.

Definition 1. A register automaton is a tuple $\mathcal{A} = (\Sigma, Q, q_0, F, V, \Gamma)$, where
– $\Sigma$ is a finite set of input symbols,
– $Q$ is a finite set of locations, with $q_0 \in Q$ the initial location, and $F \subseteq Q$ a set of accepting locations,
– $V \subset \mathcal{V}$ is a finite set of registers, and
– $\Gamma$ is a finite set of transitions, each of form $\langle q, \alpha, g, \varrho, q' \rangle$ where
  • $q, q' \in Q$ are the source and target locations, respectively; we require that $q' \in F \Rightarrow q \in F$,
  • $\alpha \in \Sigma$ is an input symbol,
  • $g \in \mathcal{G}(V \cup \{p\})$ is a guard, and
  • $\varrho : V \rightharpoonup V \cup \{p\}$ is an assignment; we require that $\varrho$ is injective.

Register automata are required to be completely specified in the sense that for each location $q \in Q$ and each input symbol $\alpha \in \Sigma$, the disjunction of the guards on the $\alpha$-transitions with source $q$ is a tautology. Register automata are also required to be deterministic in the sense that for each location $q \in Q$ and input symbol $\alpha \in \Sigma$, the conjunction of the guards of any pair of distinct $\alpha$-transitions with source $q$ is not satisfiable. We write $q \xrightarrow{\alpha, g, \varrho} q'$ if $\langle q, \alpha, g, \varrho, q' \rangle \in \Gamma$.

Example 1. Figure 1 shows a register automaton $\mathcal{A} = (\Sigma, Q, q_0, F, V, \Gamma)$ with a single input symbol $a$ and four locations $q_0$, $q_1$, $q_2$ and $q_3$, with $q_0$, $q_1$, $q_2$ accepting and $q_3$ non-accepting. The initial location $q_0$ is marked by an arrow "start" and accepting locations are indicated by a double circle. There is just a single register $x$. Set $\Gamma$ contains six transitions, which are indicated in the diagram. All transitions are labeled with input symbol $a$, a guard over formal parameter $p$ and the registers, and an assignment. Guards represent conditions on data values. For example, the guard on the transition from $q_1$ to $q_2$ expresses that the data value of action $a$ must be smaller than the data value currently stored in register $x$. We write $x := p$ to denote the assignment that stores the data parameter $p$ in register $x$, that is, the function $\varrho$ satisfying $\varrho(x) = p$. Trivial guards ($\top$) and assignments (empty domain) are omitted.
Note that location $q_3$ is actually a sink location, i.e., there is no way to get into an accepting state from $q_3$. Thus the register automaton satisfies the condition that for each transition either the source location is accepting or the target location is not accepting. When drawing register automata, we often only depict the accepting locations, and leave a non-accepting sink location and the transitions leading to it implicit. Note that in locations $q_1$ and $q_2$, which have more than one outgoing transition, the disjunction of the guards of these transitions is equivalent to true, whereas the conjunction is equivalent to false.

The semantics of a register automaton is defined in terms of the set of data words that it accepts.

Definition 2. Let $\Sigma$ be a finite alphabet. A data symbol over $\Sigma$ is a pair $\alpha(d)$ with $\alpha \in \Sigma$ and $d \in \mathcal{D}$. A data word over $\Sigma$ is a finite sequence of data symbols, i.e., a word over $\Sigma \times \mathcal{D}$. A data language over $\Sigma$ is a set of data words over $\Sigma$.

We associate a data language to each register automaton as follows.

Definition 3. Let $\mathcal{A} = (\Sigma, Q, q_0, F, V, \Gamma)$ be a register automaton. A configuration of $\mathcal{A}$ is a pair $(q, \xi)$, where $q \in Q$ and $\xi : V \rightharpoonup \mathcal{D}$. A run of $\mathcal{A}$ over a data word $w = \alpha_1(d_1) \cdots \alpha_n(d_n)$ is a sequence

$$\gamma = (q_0, \xi_0) \xrightarrow{\alpha_1(d_1)} (q_1, \xi_1) \ldots (q_{n-1}, \xi_{n-1}) \xrightarrow{\alpha_n(d_n)} (q_n, \xi_n),$$

where, for $0 \le i \le n$, $(q_i, \xi_i)$ is a configuration of $\mathcal{A}$, $\mathit{domain}(\xi_0) = \emptyset$, and for $0 < i \le n$, $\Gamma$ contains a transition $q_{i-1} \xrightarrow{\alpha_i, g_i, \varrho_i} q_i$ such that
– $\iota_i \models g_i$, where $\iota_i = \xi_{i-1} \cup \{(p, d_i)\}$, and
– $\xi_i = \iota_i \circ \varrho_i$.

We say that run $\gamma$ is accepting if $q_n \in F$ and rejecting if $q_n \notin F$. We call $w$ the trace of $\gamma$, notation $\mathit{trace}(\gamma) = w$. Data word $w$ is accepted (rejected) if $\mathcal{A}$ has an accepting (rejecting) run over $w$. The data language of $\mathcal{A}$, notation $L(\mathcal{A})$, is the set of all data words that are accepted by $\mathcal{A}$. Two register automata over the same alphabet $\Sigma$ are trace equivalent if they accept the same data language.

[Fig. 1: Register automaton. Locations $q_0$ (start), $q_1$, $q_2$, $q_3$; transition labels: $a, x := p$; $a, x \le p, x := p$; $a, p < x, x := p$; $a, x \le p, x := p$; $a, p < x$; $a$.]

Example 2. Consider the register automaton of Figure 1. This automaton accepts the data word $a(1)\, a(4)\, a(0)\, a(7)$ since the following sequence of steps is a run (here $\xi_0$ is the trivial function with empty domain):

$$(q_0, \xi_0) \xrightarrow{a(1)} (q_1, x \mapsto 1) \xrightarrow{a(4)} (q_1, x \mapsto 4) \xrightarrow{a(0)} (q_2, x \mapsto 0) \xrightarrow{a(7)} (q_1, x \mapsto 7).$$

Note that the final location $q_1$ of this run is accepting. Upon receiving the first input $a(1)$, the automaton jumps to $q_1$ and stores data value 1 in the register $x$. Since 4 is bigger than 1, the automaton takes the self loop upon receiving the second input $a(4)$ and stores 4. Since 0 is less than 4, it moves to $q_2$ upon receipt of the third input $a(0)$ and updates $x$ to 0. Finally, the automaton gets back to $q_1$ as 7 is bigger than 0.

Suppose that in the register automaton of Figure 1 we replace the guard on the transition from $q_0$ to $q_1$ by $x \le p$. Since initial valuation $\xi_0$ does not assign a value to $x$, this means that it is not defined whether $\xi_0$ satisfies guard $x \le p$. Automata in which such "runtime errors" do not occur are called well-formed.

Definition 4.

Let $\mathcal{A}$ be a register automaton. We say that a configuration $(q, \xi)$ of $\mathcal{A}$ is reachable if there is a run of $\mathcal{A}$ that ends with $(q, \xi)$. We call $\mathcal{A}$ well-formed if, for each reachable configuration $(q, \xi)$, $\xi$ assigns a value to all variables from $V$ that occur in guards of outgoing transitions of $q$, that is,

$$(q, \xi) \text{ reachable} \;\wedge\; q \xrightarrow{\alpha, g, \varrho} q' \;\Rightarrow\; \mathit{Var}(g) \subseteq \mathit{domain}(\xi) \cup \{p\}.$$

As soon as the set of data values and the collection of predicates becomes nontrivial, well-formedness of register automata becomes undecidable. However, it is easy to come up with a sufficient condition for well-formedness, based on a syntactic analysis of $\mathcal{A}$, which covers the cases that occur in practice. In the remainder of this article, we will restrict our attention to well-formed register automata. In particular, the register automata that are constructed from regular symbolic trace languages in our Myhill-Nerode theorem will be well-formed.

Our definition of a register automaton is different from the one used in the $SL^*$ algorithm [11] and its implementation in RALib [9]. It is instructive to compare the two definitions. In order to establish a Myhill-Nerode theorem, [11] requires that structure $\mathcal{R}$, which is a parameter of the $SL^*$ algorithm, is weakly extendable. This technical restriction excludes many data types that are commonly used in practice. For instance, the set of integers with constants 0 and 1, an addition operator $+$, and a less-than predicate $<$ is not weakly extendable. In our approach, no restrictions on $\mathcal{R}$ are needed. Unlike [11], we do not associate a fixed set of variables to each location. Our definition is slightly more general, which simplifies some technicalities. However, we require assignments to be injective, a restriction that is not imposed by [11]. But note that the register automata that are actually constructed by $SL^*$ are right-invariant [10]. In a right-invariant register automaton, two values can only be tested for equality if one of them is the current input symbol.
Right-invariance, as defined in [10], implies that assignments are injective. As illustrated by the example of Figure 2, our register automata are exponentially more succinct than the right-invariant register automata constructed by $SL^*$. As pointed out in [10], right-invariant register automata in turn are more succinct than the automata of [16,5]. Our definition assumes that, for any transition from $q$ to $q'$, $q' \in F \Rightarrow q \in F$. As a consequence of this assumption, which is not required in [11], the data language accepted by a register automaton is prefix closed. We need this property for technical reasons, but for models of reactive systems it is actually quite natural. The RALib tool [9] also assumes that data languages are prefix closed.

[Fig. 2: Locations $q_0$ (start), $q_1$, $\ldots$, $q_{2n-1}$, $q_{2n}$, $q_{ok}$; transition labels $a, x_1 := p$; $a, x_2 := p$; $\cdots$; $a, x_{2n} := p$; and $b, \bigwedge_{i=1}^{n-1} (x_i = x_{i+1} \leftrightarrow x_{n+i} = x_{n+i+1})$.] For each $n > 0$, $\mathcal{A}_n$ is a register automaton that first accepts $2n$ input symbols $a$, storing all the data values that it receives, and then accepts input symbol $b$ when two consecutive values in the first half of the input are equal iff the corresponding consecutive values in the second half of the input are equal. The number of locations and transitions of $\mathcal{A}_n$ grows linearly with $n$. There exist right-invariant register automata $\mathcal{B}_n$ that accept the same data languages, but their size grows exponentially with $n$.

For readers familiar with [11]: A structure (called theory in [11]) is weakly extendable if for all natural numbers $k$ and data words $u$, there exists a $u'$ with $u' \approx_{\mathcal{R}} u$ which is $k$-extendable. Intuitively, $u' \approx_{\mathcal{R}} u$ if data words $u'$ and $u$ have the same sequences of actions and cannot be distinguished by the relations in $\mathcal{R}$. Let $u = \alpha(0)\alpha(1)\alpha(2)\alpha(4)\alpha(8)\alpha(16)\alpha(11)$. Then there exists just one $u'$ different from $u$ with $u' \approx_{\mathcal{R}} u$, namely $u' = \alpha(0)\alpha(1)\alpha(2)\alpha(4)\alpha(8)\alpha(16)\alpha(13)$. Now both $u$ and $u'$ are not even 1-extendable: if we extend $u$ with $\alpha(3)$, we cannot find a matching extension $\alpha(d')$ of $u'$ such that $u\alpha(3) \approx_{\mathcal{R}} u'\alpha(d')$, and if we extend $u'$ with $\alpha(5)$ we cannot find a matching extension $\alpha(d)$ of $u$ such that $u\alpha(d) \approx_{\mathcal{R}} u'\alpha(5)$.

From each run $\gamma$ of a register automaton $\mathcal{A}$ we can trivially extract a data word $\mathit{trace}(\gamma)$ by forgetting all information except the data symbols. Conversely, for each data word $w$ that is accepted by $\mathcal{A}$, there exists a corresponding accepting run $\gamma$, which is uniquely determined by the data word since from each configuration $(q, \xi)$ and data symbol $\alpha(d)$, exactly one transition will be enabled.

Lemma 2.

Suppose $\gamma$ and $\gamma'$ are runs of a register automaton $\mathcal{A}$ such that $\mathit{trace}(\gamma) = \mathit{trace}(\gamma')$. Then $\gamma = \gamma'$.

Proof. We prove the lemma by contradiction. Suppose $\gamma \neq \gamma'$. All runs of $\mathcal{A}$ share at least the initial configuration $(q_0, \xi_0)$. Let $\gamma$ be as in Definition 3, and let $(q_{i-1}, \xi_{i-1})$ be the last point where $\gamma$ and $\gamma'$ coincide. From this point, $\gamma$ continues its course with a step

$$(q_{i-1}, \xi_{i-1}) \xrightarrow{\alpha_i(d_i)} (q_i, \xi_i),$$

whereas $\gamma'$ continues with a different step

$$(q_{i-1}, \xi_{i-1}) \xrightarrow{\alpha_i(d_i)} (q'_i, \xi'_i).$$

Note that both steps carry the same data symbol as $\mathit{trace}(\gamma) = \mathit{trace}(\gamma')$. Then $\Gamma$ contains a transition $q_{i-1} \xrightarrow{\alpha_i, g_i, \varrho_i} q_i$ such that $\iota_i \models g_i$, where $\iota_i = \xi_{i-1} \cup \{(p, d_i)\}$. In addition, $\Gamma$ contains a transition $q_{i-1} \xrightarrow{\alpha_i, g'_i, \varrho'_i} q'_i$ such that $\iota_i \models g'_i$. Since $\iota_i \models g_i$ and $\iota_i \models g'_i$, we may conclude that $g_i \wedge g'_i$ is satisfiable. Therefore, as $\mathcal{A}$ is deterministic, both transitions are the same, that is, $g_i \equiv g'_i$, $\varrho_i = \varrho'_i$ and $q_i = q'_i$. But then also $\xi_i = \iota_i \circ \varrho_i = \iota_i \circ \varrho'_i = \xi'_i$, which means that the two outgoing steps of configuration $(q_{i-1}, \xi_{i-1})$ are the same. Contradiction.

We will now introduce an alternative trace semantics for register automata, which records both the sequence of input symbols that occur during a run as well as the constraints on input parameters that are imposed by this run. We will explore some basic properties of this semantics, and show that the equivalence induced by symbolic traces is finer than data equivalence.

A symbolic language consists of words in which input symbols and guards alternate.

Definition 5. Let $\Sigma$ be a finite alphabet. A symbolic word over $\Sigma$ is a finite alternating sequence $w = \alpha_1 G_1 \cdots \alpha_n G_n$ of input symbols from $\Sigma$ and guards. A symbolic language over $\Sigma$ is a set of symbolic words over $\Sigma$.

A symbolic run is just a run, except that the valuations do not return concrete data values, but markers (variables) that record the exact place in the run where the input occurred. Using these symbolic valuations (variable renamings, actually) it is straightforward to compute the constraints on the input parameters from the guards occurring in the run.

Definition 6. Let $\mathcal{A} = (\Sigma, Q, q_0, F, V, \Gamma)$ be a register automaton. A symbolic run of $\mathcal{A}$ is a sequence

$$\delta = (q_0, \zeta_0) \xrightarrow{\alpha_1, g_1, \varrho_1} (q_1, \zeta_1) \ldots \xrightarrow{\alpha_n, g_n, \varrho_n} (q_n, \zeta_n),$$

where $\zeta_0$ is the trivial variable renaming with empty domain and, for $0 < i \le n$,
– $q_{i-1} \xrightarrow{\alpha_i, g_i, \varrho_i} q_i$ is a transition in $\Gamma$,
– $\zeta_i$ is a variable renaming with $\mathit{domain}(\zeta_i) \subseteq V$,
– $\zeta_i = \iota_i \circ \varrho_i$, where $\iota_i = \zeta_{i-1} \cup \{(p, v_i)\}$, and
– guard $G_1 \wedge \cdots \wedge G_n$ is satisfiable, where $G_i \equiv g_i[\iota_i]$.

We say that symbolic run $\delta$ is accepting if $q_n \in F$ and rejecting if $q_n \notin F$. The symbolic trace of $\delta$ is the symbolic word $\mathit{strace}(\delta) = \alpha_1 G_1 \cdots \alpha_n G_n$. Symbolic word $w$ is accepted (rejected) if $\mathcal{A}$ has an accepting (rejecting) symbolic run $\delta$ with $\mathit{strace}(\delta) = w$. The symbolic language of $\mathcal{A}$, notation $L_s(\mathcal{A})$, is the set of all symbolic words accepted by $\mathcal{A}$. Two register automata over the same alphabet $\Sigma$ are symbolic trace equivalent if they accept the same symbolic language.

Example 3. Consider the register automaton of Figure 1. The following sequence of tainted steps constitutes a symbolic run:

$$(q_0, \zeta_0) \xrightarrow{a, \top, \varrho} (q_1, x \mapsto v_1) \xrightarrow{a, x \le p, \varrho} (q_1, x \mapsto v_2) \xrightarrow{a,\, p\, \ldots} \cdots$$

Lemma 3. Let $\delta$ be a symbolic run of $\mathcal{A}$, as in Definition 6. Then $\mathit{range}(\zeta_i) \subseteq \{v_1, \ldots, v_i\}$, for $i \in \{0, \ldots, n\}$, and $\mathit{range}(\iota_i) \subseteq \{v_1, \ldots, v_i\}$, for $i \in \{1, \ldots, n\}$.

Proof. By induction on $i$:
– Base. Suppose $i = 0$. Then the lemma holds trivially since $\mathit{range}(\zeta_0) = \emptyset$.
– Induction step. Suppose $i > 0$. Then

$$\begin{aligned}
\mathit{range}(\zeta_i) &= \mathit{range}(\iota_i \circ \varrho_i) \\
&\subseteq \mathit{range}(\iota_i) \\
&= \mathit{range}(\zeta_{i-1} \cup \{(p, v_i)\}) \\
&= \mathit{range}(\zeta_{i-1}) \cup \{v_i\} \\
&\subseteq \{v_1, \ldots, v_{i-1}\} \cup \{v_i\} \quad \text{(by induction hypothesis)} \\
&= \{v_1, \ldots, v_i\}.
\end{aligned}$$

As a consequence of our assumption that assignments in a register automaton are injective, all the variable renamings in a symbolic run are injective as well.
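The bookkeeping of Definition 6 can be sketched in a few lines of Python. The encoding is an assumption made for illustration: a path is a list of `(symbol, guard, assignment)` triples, guards are tuples such as `("<=", "x", "p")`, and the path is the one that the Figure 1 automaton takes in Example 2. The sketch computes the renamings $\zeta_i$ and the renamed guards $G_i \equiv g_i[\iota_i]$, and then checks the range bound of Lemma 3 and injectivity on this one path. It does not check satisfiability of $G_1 \wedge \cdots \wedge G_n$, which Definition 6 also requires and which would need a solver for the underlying structure.

```python
def rename(g, iota):
    """g[iota] for guards encoded as ("true",) or (op, left, right)."""
    if g == ("true",):
        return g
    op, left, right = g
    return (op, iota[left], iota[right])


def symbolic_run(path):
    """Return the symbolic trace and the renamings zeta_0, ..., zeta_n."""
    zeta = {}                                  # zeta_0 has empty domain
    trace, renamings = [], [dict(zeta)]
    for i, (symbol, g, rho) in enumerate(path, start=1):
        iota = {**zeta, "p": f"v{i}"}          # iota_i = zeta_{i-1} + (p, v_i)
        trace.append((symbol, rename(g, iota)))  # G_i = g_i[iota_i]
        # zeta_i = iota_i o rho_i: only assigned registers stay defined
        zeta = {r: iota[src] for r, src in rho.items() if src in iota}
        renamings.append(dict(zeta))
    return trace, renamings


# Assumed path q0 -> q1 -> q1 -> q2 -> q1 through the Figure 1 automaton.
PATH = [
    ("a", ("true",),         {"x": "p"}),
    ("a", ("<=", "x", "p"),  {"x": "p"}),
    ("a", ("<", "p", "x"),   {"x": "p"}),
    ("a", ("<=", "x", "p"),  {"x": "p"}),
]

trace, renamings = symbolic_run(PATH)
for i, zeta in enumerate(renamings):
    allowed = {f"v{j}" for j in range(1, i + 1)}
    assert set(zeta.values()) <= allowed           # range bound of Lemma 3
    assert len(set(zeta.values())) == len(zeta)    # zeta_i is injective
print(trace)
```

Each marker `v1`, `v2`, ... stands for the input received at that position, so the renamed guards constrain inputs by position rather than by register name.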

Lemma 4. Let $\delta$ be a symbolic run of $\mathcal{A}$, as in Definition 6. Then, for each $i \in \{0, \ldots, n\}$, $\zeta_i$ is injective, and for each $i \in \{1, \ldots, n\}$, $\iota_i$ is injective.

Proof. By induction on $i$:
– Base. Suppose $i = 0$. Then the lemma holds trivially since $\mathit{range}(\zeta_0) = \emptyset$.
– Induction step. Suppose $i > 0$. By Lemma 3, $\mathit{range}(\zeta_{i-1}) \subseteq \{v_1, \ldots, v_{i-1}\}$, so $v_i \notin \mathit{range}(\zeta_{i-1})$. By the induction hypothesis, $\zeta_{i-1}$ is injective. From this we conclude that $\zeta_{i-1} \cup \{(p, v_i)\}$ is injective, which means $\iota_i$ is injective. Since $\varrho_i$ is injective by Definition 1 and the composition of injective functions is injective, $\zeta_i = \iota_i \circ \varrho_i$ is injective.

All symbolic words accepted by a register automaton satisfy some basic sanity properties: guards may only refer to the markers for values received thus far, and the conjunction of all the guards is satisfiable. We call symbolic words that satisfy these properties feasible. Note that if a symbolic word is feasible, any prefix is feasible as well.

Definition 7 (Feasible).

Let $w = \alpha_1 G_1 \cdots \alpha_n G_n$ be a symbolic word. We write $\mathit{length}(w) = n$ and $\mathit{guard}(w) = G_1 \wedge \cdots \wedge G_n$. Word $w$ is feasible if $\mathit{guard}(w)$ is satisfiable and $\mathit{Var}(G_i) \subseteq \{v_1, \ldots, v_i\}$, for each $i \in \{1, \ldots, n\}$. A symbolic language is feasible if it is prefix closed and consists of feasible symbolic words.

Lemma 5. $L_s(\mathcal{A})$ is feasible.

Proof. Since for any transition $\langle q, \alpha, g, \varrho, q' \rangle$ of $\mathcal{A}$, $q' \in F$ implies $q \in F$, $L_s(\mathcal{A})$ is prefix closed. Suppose $w = \alpha_1 G_1 \cdots \alpha_n G_n$ is a symbolic word in $L_s(\mathcal{A})$. It suffices to show that $w$ is feasible. Consider a symbolic run $\delta$ for $w$, as in Definition 6. By Lemma 3, $\mathit{Var}(G_i) = \mathit{Var}(g_i[\iota_i]) \subseteq \mathit{range}(\iota_i) \subseteq \{v_1, \ldots, v_i\}$, for $i \in \{1, \ldots, n\}$. By definition of $\delta$, $\mathit{guard}(w) = G_1 \wedge \cdots \wedge G_n$ is satisfiable.

Since register automata are deterministic, each symbolic trace of $\mathcal{A}$ corresponds to a unique symbolic run of $\mathcal{A}$.

Lemma 6.

Suppose $\delta$ and $\delta'$ are symbolic runs of a register automaton $\mathcal{A}$ such that $\mathit{strace}(\delta) = \mathit{strace}(\delta')$. Then $\delta = \delta'$.

Proof. We prove the lemma by contradiction. Suppose $\delta \neq \delta'$. All symbolic runs of $\mathcal{A}$ share at least the initial configuration $(q_0, \zeta_0)$. Let $\delta$ be as in Definition 6, and let $(q_{i-1}, \zeta_{i-1})$ be the last point where $\delta$ and $\delta'$ coincide. From this point, $\delta$ continues its course with a step

$$(q_{i-1}, \zeta_{i-1}) \xrightarrow{\alpha_i, g_i, \varrho_i} (q_i, \zeta_i),$$

whereas $\delta'$ continues with a different step

$$(q_{i-1}, \zeta_{i-1}) \xrightarrow{\alpha'_i, g'_i, \varrho'_i} (q'_i, \zeta'_i).$$

Since $\mathit{strace}(\delta) = \mathit{strace}(\delta')$, $\alpha_i = \alpha'_i$ and $g_i[\iota_i] \equiv g'_i[\iota_i]$, where $\iota_i = \zeta_{i-1} \cup \{(p, v_i)\}$. By Lemma 4, variable renaming $\iota_i$ is injective, which implies that $g_i \equiv g'_i$. The underlying transitions $q_{i-1} \xrightarrow{\alpha_i, g_i, \varrho_i} q_i$ and $q_{i-1} \xrightarrow{\alpha'_i, g'_i, \varrho'_i} q'_i$ of $\mathcal{A}$ must be different, because otherwise also $\zeta_i$ and $\zeta'_i$ would be equal. Therefore, since $\mathcal{A}$ is deterministic, $g_i \wedge g'_i$ is not satisfiable. Since $g_i \equiv g'_i$, this means that $g_i$ is not satisfiable. But since $\delta$ is a symbolic run, $G_i = g_i[\iota_i]$ is satisfiable, and thus there exists a valuation $\xi$ such that $\xi \models G_i$. But now Lemma 1 gives $\xi \circ \iota_i \models g_i$. This means that $g_i$ is satisfiable and we have derived a contradiction.

Definition 8.

Let $\mathcal{A}$ be a register automaton and $w \in L_s(\mathcal{A})$. Then we write $\mathit{symb}(w)$ for the unique symbolic run $\delta$ of $\mathcal{A}$ with $\mathit{strace}(\delta) = w$.

There exists a one-to-one correspondence between runs of $\mathcal{A}$ and pairs consisting of a symbolic run of $\mathcal{A}$ and a satisfying assignment for the guards from its symbolic trace.

Lemma 7.

Let $\delta$ be a symbolic run of $\mathcal{A}$, as in Definition 6, and $\xi : \{v_1, \ldots, v_n\} \to \mathcal{D}$ a valuation such that $\xi \models G_1 \wedge \cdots \wedge G_n$. Let $\mathit{run}_{\mathcal{A}}(\delta, \xi)$ be the sequence obtained from $\delta$ by (a) replacing each input $\alpha_i$ by data symbol $\alpha_i(\xi(v_i))$ (for $0 < i \le n$), (b) removing guards $g_i$ and assignments $\varrho_i$, and (c) replacing valuations $\zeta_i$ by $\xi_i = \xi \circ \zeta_i$ (for $0 \le i \le n$). Then $\mathit{run}_{\mathcal{A}}(\delta, \xi)$ is a run of $\mathcal{A}$.

Proof. Write $d_i = \xi(v_i)$. It suffices to show, for $0 < i \le n$, that $\kappa_i \models g_i$, where $\kappa_i = \xi_{i-1} \cup \{(p, d_i)\}$, and that $\xi_i = \kappa_i \circ \varrho_i$. We derive

$$\xi \circ \iota_i = \xi \circ (\zeta_{i-1} \cup \{(p, v_i)\}) = (\xi \circ \zeta_{i-1}) \cup \{(p, d_i)\} = \xi_{i-1} \cup \{(p, d_i)\} = \kappa_i.$$

By assumption, $\xi \models G_i \equiv g_i[\iota_i]$. By Lemma 1, $\xi \circ \iota_i \models g_i$. Hence, by the above derivation, $\kappa_i \models g_i$, as required. We derive

$$\xi_i = \xi \circ \zeta_i = \xi \circ (\iota_i \circ \varrho_i) = (\xi \circ \iota_i) \circ \varrho_i = \kappa_i \circ \varrho_i.$$

Thus $\xi_i = \kappa_i \circ \varrho_i$, as required.

Lemma 8.

Let γ be a run of register automaton A . Then there exist a valuation ξ and symbolic run δ such that run A ( δ, ξ ) = γ . roof. Let γ be as in Deﬁnition 3: γ = ( q , ξ ) α ( d ) −−−−→ ( q , ξ ) . . . ( q n − , ξ n − ) α n ( d n ) −−−−→ ( q n , ξ n ) , where, for 0 ≤ i ≤ n , ( q i , ξ i ) is a conﬁguration of A , domain ( ξ ) = ∅ , and for0 < i ≤ n , Γ contains a transition q i − α i ,g i ,̺ i −−−−−→ q i such that – ι ′ i | = g i , where ι ′ i = ξ i − ∪ { ( p, d i ) } , and – ξ i = ι ′ i ◦ ̺ i .Since A is deterministic, the transitions q i − α i ,g i ,̺ i −−−−−→ q i are uniquely determined.Let ζ be the trivial variable renaming with empty domain and, for 0 < i ≤ n ,deﬁne ζ i inductively by ζ i = ι i ◦ ̺ i and ι i = ζ i − ∪{ ( p, v i ) } . Let ξ : { v , . . . , v n } →D be given by ξ ( v i ) = d i , for 1 ≤ i ≤ n , and let δ be the sequence δ = ( q , ζ ) α ,g ,̺ −−−−−→ ( q , ζ ) . . . α n ,g n ,̺ n −−−−−−→ ( q n , ζ n ) . We claim that δ is a symbolic execution of A . For this, it suﬃces to show that ξ | = G ∧ · · · ∧ G n , where G i ≡ g i [ ι i ]. By induction on i we show ι ′ i = ξ ◦ ι i for 0 < i ≤ nξ i = ξ ◦ ζ i for 0 ≤ i ≤ n

1. Base. $\xi_0 = \xi \circ \zeta_0$, as both $\xi_0$ and $\zeta_0$ have empty domain.
2. Induction step.

$$\iota'_i = \xi_{i-1} \cup \{(p, d_i)\} = (\xi \circ \zeta_{i-1}) \cup \{(p, d_i)\} = \xi \circ (\zeta_{i-1} \cup \{(p, v_i)\}) = \xi \circ \iota_i,$$

where the second equality uses the induction hypothesis, and

$$\xi_i = \iota'_i \circ \varrho_i = (\xi \circ \iota_i) \circ \varrho_i = \xi \circ (\iota_i \circ \varrho_i) = \xi \circ \zeta_i.$$

Let $0 < i \le n$. Since $\gamma$ is a run, $\iota'_i \models g_i$. By the identity we just derived, $\xi \circ \iota_i \models g_i$. By Lemma 1, $\xi \models g_i[\iota_i] \equiv G_i$. Hence $\xi \models G_1 \wedge \cdots \wedge G_n$, which proves the claim that $\delta$ is a symbolic run of $A$. It is easy to verify that $\gamma = run_A(\delta, \xi)$.

Using the above lemmas, we can prove that whenever two register automata accept the same symbolic language, they also accept the same data language.

Theorem 1.

Suppose $A$ and $B$ are register automata with $L_s(A) = L_s(B)$. Then $L(A) = L(B)$.

Proof. We will prove $L(A) \subseteq L(B)$. The proof of the inclusion $L(B) \subseteq L(A)$ is symmetric. Suppose $w \in L(A)$. Then there exists a run $\gamma$ of $A$ with $trace(\gamma) = w$. By Lemma 8, there exist a valuation $\xi$ and symbolic run $\delta$ of $A$ such that $run_A(\delta, \xi) = \gamma$. Let $u = strace(\delta)$. Then $u \in L_s(A)$ and, since $L_s(A) = L_s(B)$, $u \in L_s(B)$. Let $\delta'$ be a symbolic run of $B$ such that $strace(\delta') = u$. Let $\gamma' = run_B(\delta', \xi)$. By Lemma 7, $\gamma'$ is a run of $B$. Let $w' = trace(\gamma')$. Then $w' \in L(B)$. Note that $w$ and $w'$ share the same sequence of data values, as given by valuation $\xi$. Also note that $w$, $\gamma$, $\delta$, $u$, $\delta'$, $\gamma'$ and $w'$ all share the same sequence of input symbols. Thus $w = w'$ and $w \in L(B)$, as required.

Fig. 3: Trace equivalent but not symbolic trace equivalent.
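The passage from a symbolic run to a run described in Lemma 7 can be illustrated with a small Python sketch. It is only an illustration under simplifying assumptions that are not part of the paper: guards are modelled as Python predicates over a dictionary of register values plus the parameter `p`, symbolic variables are the strings `"v1"`, `"v2"`, and so on, and the names `symbolic_valuations`, `concretize` and `steps` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Valuation = Dict[str, int]            # register name or "p" -> data value
Renaming = Dict[str, str]             # register name -> symbolic variable "v1", "v2", ...
Guard = Callable[[Valuation], bool]   # guard g_i, evaluated over registers and "p"
Assignment = Dict[str, str]           # rho_i: register -> register name or "p"
Step = Tuple[str, Guard, Assignment]  # (alpha_i, g_i, rho_i)


def symbolic_valuations(steps: List[Step]) -> List[Renaming]:
    """Compute zeta_0, ..., zeta_n, where zeta_i = iota_i o rho_i."""
    zeta: Renaming = {}                              # zeta_0 has empty domain
    zetas = [zeta]
    for i, (_alpha, _guard, rho) in enumerate(steps, start=1):
        iota = {**zeta, "p": f"v{i}"}                # iota_i = zeta_{i-1} U {(p, v_i)}
        zeta = {x: iota[y] for x, y in rho.items()}  # zeta_i(x) = iota_i(rho_i(x))
        zetas.append(zeta)
    return zetas


def concretize(steps: List[Step], xi: Dict[str, int]) -> List[Tuple[str, int, Valuation]]:
    """Sketch of run_A(delta, xi): instantiate symbolic variables with data values.

    Returns one (alpha_i, d_i, xi_i) triple per transition and raises
    ValueError if xi does not satisfy some guard.
    """
    zetas = symbolic_valuations(steps)
    run = []
    for i, (alpha, guard, _rho) in enumerate(steps, start=1):
        d_i = xi[f"v{i}"]
        # kappa_i = xi_{i-1} U {(p, d_i)}, with xi_{i-1} = xi o zeta_{i-1}
        kappa = {x: xi[v] for x, v in zetas[i - 1].items()}
        kappa["p"] = d_i
        if not guard(kappa):
            raise ValueError(f"valuation violates the guard of step {i}")
        xi_i = {x: xi[v] for x, v in zetas[i].items()}  # xi_i = xi o zeta_i
        run.append((alpha, d_i, xi_i))
    return run


# Hypothetical two-step symbolic run: store the first input in register x,
# then accept a second input only if it is larger than the stored value.
steps: List[Step] = [
    ("a", lambda k: True, {"x": "p"}),
    ("a", lambda k: k["x"] < k["p"], {"x": "p"}),
]
print(symbolic_valuations(steps))
print(concretize(steps, {"v1": 1, "v2": 5}))
```

In this sketch the guard check plays the role of the assumption $\xi \models G_1 \wedge \cdots \wedge G_n$: a valuation that violates a guard is rejected instead of producing a run.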

Example 4.

The converse of Theorem 1 does not hold. Figure 3 gives a trivial example of two register automata with the same data language but a different symbolic language.

Lemma 7 allows us to rephrase the well-formedness condition of register automata in terms of symbolic runs.

Corollary 1.

Register automaton $A$ is well-formed iff, for each symbolic run $\delta$ that ends with $(q, \zeta)$, $q \xrightarrow{\alpha, g, \varrho} q' \Rightarrow Var(g) \subseteq domain(\zeta) \cup \{p\}$.

Proof. "$\Rightarrow$" Suppose symbolic run $\delta$ is defined as in Definition 6. Let $\xi : \{v_1, \ldots, v_n\} \to D$ be a valuation such that $\xi \models G_1 \wedge \cdots \wedge G_n$. (Such a valuation $\xi$ exists since, by definition of a symbolic run, $G_1 \wedge \cdots \wedge G_n$ is satisfiable.) By Lemma 7, $\gamma = run_A(\delta, \xi)$ is a run of $A$. By construction of $\gamma$, $\gamma$ ends with a reachable configuration $(q, \xi_n)$, where $domain(\xi_n) = domain(\zeta)$. Now we may apply the definition of well-formedness to conclude $Var(g) \subseteq domain(\zeta) \cup \{p\}$.

"$\Leftarrow$" Suppose that for each symbolic run $\delta$ that ends with $(q, \zeta)$, we have $q \xrightarrow{\alpha, g, \varrho} q' \Rightarrow Var(g) \subseteq domain(\zeta) \cup \{p\}$. Let $(q, \xi)$ be the final configuration of a run $\gamma$ of $A$ with $q \xrightarrow{\alpha, g, \varrho} q'$. By Lemma 8, there exist a valuation $\xi'$ and symbolic run $\delta$ such that $run_A(\delta, \xi') = \gamma$. Let $(q, \zeta)$ be the final configuration of symbolic run $\delta$. Then by the assumption, $Var(g) \subseteq domain(\zeta) \cup \{p\}$. Therefore, since $domain(\xi) = domain(\zeta)$, $Var(g) \subseteq domain(\xi) \cup \{p\}$ and we may conclude that $A$ is well-formed.

The Myhill-Nerode equivalence [29,21] deems two words $w$ and $w'$ of a language $L$ equivalent if there does not exist a suffix $u$ that distinguishes them, that is, only one of the words $w \cdot u$ and $w' \cdot u$ is in $L$. The Myhill-Nerode theorem states that $L$ is regular if and only if this equivalence relation has a finite index, and moreover that the number of states in the smallest deterministic finite automaton (DFA) recognizing $L$ is equal to the number of equivalence classes. In this section, we present a Myhill-Nerode theorem for symbolic languages and register automata. We need three relations $\equiv_l$, $\equiv_t$ and $\equiv_r$ on symbolic words to capture the structure of register automata.
Intuitively, two symbolic words $w$ and $w'$ are location equivalent, notation $w \equiv_l w'$, if they lead to the same location, and transition equivalent, notation $w \equiv_t w'$, if they share the same final transition. Marker $v$ of $w$ and marker $v'$ of $w'$ are register equivalent, notation $(w, v) \equiv_r (w', v')$, when they are stored in the same register after occurrence of words $w$ and $w'$. Whereas $\equiv_l$ and $\equiv_t$ are equivalence relations, $\equiv_r$ is a partial equivalence relation (PER), that is, a relation that is symmetric and transitive. Relation $\equiv_r$ is not necessarily reflexive, as $(w, v) \equiv_r (w, v)$ only holds when marker $v$ is stored after symbolic trace $w$. Since a register automaton has finitely many locations, finitely many transitions, and finitely many registers, the equivalences $\equiv_l$ and $\equiv_t$, and the equivalence induced by $\equiv_r$, are all required to have finite index.

Definition 9.

A feasible symbolic language $L$ over $\Sigma$ is regular iff there exist three relations:
– an equivalence relation $\equiv_l$ on $L$, called location equivalence,
– an equivalence relation $\equiv_t$ on $L \setminus \{\epsilon\}$, called transition equivalence, and
– a partial equivalence relation $\equiv_r$ on $\{(w, v_i) \in L \times V \mid i \le length(w)\}$, called register equivalence. We say that $w$ stores $v$ if $(w, v) \equiv_r (w, v)$.

We require that equivalences $\equiv_l$ and $\equiv_t$, as well as the equivalence relation obtained by restricting $\equiv_r$ to $\{(w, v) \in L \times V \mid w \text{ stores } v\}$, have finite index. We also require that relations $\equiv_l$, $\equiv_t$ and $\equiv_r$ satisfy the conditions of Table 1, for $w, w', u, u' \in L$, $length(w) = m$, $length(w') = n$, $\alpha, \alpha' \in \Sigma$, $G, G'$ guards, $v, v' \in V$, and $\sigma : V \rightharpoonup V$. Condition 1 implies that, given $w$, $w'$ and $v$, there is at most one $v'$ s.t. $(w, v) \equiv_r (w', v')$. Therefore, we may define $matching(w, w')$ as the variable renaming $\sigma$ satisfying:

$$\sigma(v) = \begin{cases} v' & \text{if } (w, v) \equiv_r (w', v') \\ v_{n+1} & \text{if } v = v_{m+1} \\ \text{undefined} & \text{otherwise} \end{cases}$$

Intuitively, the first condition captures that a register can store at most a single value at a time. When $w\alpha G$ and $w'\alpha'G'$ share the same final transition, then in particular $w$ and $w'$ share the same final location (Condition 2), input symbols $\alpha$ and $\alpha'$ are equal (Condition 3), $G'$ is just a renaming of $G$ (Condition 4), and $w\alpha G$ and $w'\alpha'G'$ share the same final location (Condition 5) and final assignment (Conditions 6, 7 and 8). Condition 6 says that the parameters of the final input end up in the same register when they are stored. Condition 7 says that when two values are stored in the same register, they will stay in the same register for the rest of their life (this condition can be viewed as a right invariance condition for registers).
Conversely, if two values are stored in the same register after a transition, and they do not correspond to the final input, they were already stored in the same register before the transition (Condition 8). Condition 9 captures the well-formedness assumption for register automata. As a consequence of Condition 9, $G[\sigma]$ is defined in Conditions 4, 10 and 11, since $Var(G) \subseteq domain(\sigma)$.

(1) $(w, v) \equiv_r (w, v') \Rightarrow v = v'$
(2) $w\alpha G \equiv_t w'\alpha'G' \Rightarrow w \equiv_l w'$
(3) $w\alpha G \equiv_t w'\alpha'G' \Rightarrow \alpha = \alpha'$
(4) $w\alpha G \equiv_t w'\alpha G' \wedge \sigma = matching(w, w') \Rightarrow G[\sigma] \equiv G'$
(5) $w \equiv_t w' \Rightarrow w \equiv_l w'$
(6) $w \equiv_t w' \wedge w \text{ stores } v_m \Rightarrow (w, v_m) \equiv_r (w', v_n)$
(7) $u \equiv_t u' \wedge u = w\alpha G \wedge u' = w'\alpha G' \wedge (w, v) \equiv_r (w', v') \wedge u \text{ stores } v \Rightarrow (u, v) \equiv_r (u', v')$
(8) $u \equiv_t u' \wedge u = w\alpha G \wedge u' = w'\alpha G' \wedge (u, v) \equiv_r (u', v') \wedge v \neq v_{m+1} \Rightarrow (w, v) \equiv_r (w', v')$
(9) $w \equiv_l w' \wedge w\alpha G \in L \wedge v \in Var(G) \setminus \{v_{m+1}\} \Rightarrow \exists v' : (w, v) \equiv_r (w', v')$
(10) $w \equiv_l w' \wedge w\alpha G \in L \wedge \sigma = matching(w, w') \wedge Sat(guard(w') \wedge G[\sigma]) \Rightarrow w'\alpha G[\sigma] \in L$
(11) $w \equiv_l w' \wedge w\alpha G \in L \wedge w'\alpha G' \in L \wedge \sigma = matching(w, w') \wedge Sat(G[\sigma] \wedge G') \Rightarrow w\alpha G \equiv_t w'\alpha G'$

Table 1: Conditions for regularity of symbolic languages.

Condition 10 is the equivalent for symbolic languages of the well-known right invariance condition for regular languages. For symbolic languages a right invariance condition

$$w \equiv_l w' \wedge w\alpha G \in L \wedge \sigma = matching(w, w') \Rightarrow w'\alpha G[\sigma] \in L$$

would be too strong: even though $w$ and $w'$ lead to the same location, the values stored in the registers may be different, and therefore they will not necessarily enable the same transitions. However, when in addition $guard(w') \wedge G[\sigma]$ is satisfiable, we may conclude that $w'\alpha G[\sigma] \in L$. Condition 11, finally, asserts that $L$ only allows deterministic behavior.

The simple lemma below asserts that, due to the determinism imposed by Condition 11, the converse of Conditions 2, 3 and 4 also holds. This means that $\equiv_t$ can be expressed in terms of $\equiv_l$ and $\equiv_r$, that is, once we have fixed $\equiv_l$ and $\equiv_r$, relation $\equiv_t$ is fully determined.

Lemma 9.

Suppose symbolic language $L$ over $\Sigma$ is regular, and equivalences $\equiv_l$, $\equiv_t$ and $\equiv_r$ satisfy the conditions of Definition 9. Then

$$w \equiv_l w' \wedge w\alpha G \in L \wedge w'\alpha G' \in L \wedge \sigma = matching(w, w') \wedge G' \equiv G[\sigma] \Rightarrow w\alpha G \equiv_t w'\alpha G'.$$

Proof.

Suppose the left hand side of the above implication holds. Since $L$ is regular, it is in particular feasible, and therefore $G'$ is satisfiable. But then, since $G' \equiv G[\sigma]$, also $G[\sigma] \wedge G'$ is satisfiable. Therefore, Condition 11 implies that the right hand side of the implication holds.

We can now state and prove our "symbolic" version of the celebrated result of Myhill & Nerode. First we prove that the symbolic language of any register automaton is regular (Theorem 2), and then we establish that any regular symbolic language can be obtained as the symbolic language of some register automaton (Theorem 3).

Theorem 2.

Suppose $A$ is a register automaton. Then $L_s(A)$ is regular.

Proof. Let $L = L_s(A)$. Then, by Lemma 5, $L$ is feasible. Define equivalences $\equiv_l$, $\equiv_t$ and $\equiv_r$ as follows:
– For $w, w' \in L$, $w \equiv_l w'$ iff $symb(w)$ and $symb(w')$ share the same final location.
– For $w, w' \in L \setminus \{\epsilon\}$, $w \equiv_t w'$ iff $symb(w)$ and $symb(w')$ share the same final transition.
– For $w, w' \in L$ and $v, v' \in V$, $(w, v) \equiv_r (w', v')$ iff there is a register $x \in V$ such that the final valuation $\zeta$ of $symb(w)$ stores $v$ in $x$, and the final valuation $\zeta'$ of $symb(w')$ stores $v'$ in $x$, that is, $\zeta(x) = v$ and $\zeta'(x) = v'$. (Note that, by Lemma 3, $range(\zeta) \subseteq \{v_1, \ldots, v_m\}$, for $m = length(w)$, and $range(\zeta') \subseteq \{v_1, \ldots, v_n\}$, for $n = length(w')$.)

Then $\equiv_l$ has finite index since $A$ has a finite number of locations, $\equiv_t$ has finite index since $A$ has a finite number of transitions, and the equivalence induced by $\equiv_r$ has finite index since $A$ has a finite number of registers.

Assume $w, w' \in L$, where $w$ contains $m$ input symbols and $w'$ contains $n$ input symbols. Let

$$symb(w) = (q_0, \zeta_0) \xrightarrow{\alpha_1, g_1, \varrho_1} (q_1, \zeta_1) \cdots (q_{m-1}, \zeta_{m-1}) \xrightarrow{\alpha_m, g_m, \varrho_m} (q_m, \zeta_m),$$
$$symb(w') = (q'_0, \zeta'_0) \xrightarrow{\alpha'_1, g'_1, \varrho'_1} (q'_1, \zeta'_1) \cdots (q'_{n-1}, \zeta'_{n-1}) \xrightarrow{\alpha'_n, g'_n, \varrho'_n} (q'_n, \zeta'_n),$$

as in Definition 6. We show that all 11 conditions of Table 1 hold:
– Condition 1. If $(w, v) \equiv_r (w, v')$ then $v$ and $v'$ are stored in the same register $x$ in the final valuation $\zeta_m$ of $symb(w)$. Thus $v = \zeta_m(x) = v'$.
– Condition 2. If $symb(w\alpha G)$ and $symb(w'\alpha'G')$ share the same final transition, then $symb(w)$ and $symb(w')$ certainly share the same final location.
– Condition 3. If $symb(w\alpha G)$ and $symb(w'\alpha'G')$ share the same final transition, then $\alpha$ and $\alpha'$ must be equal to the input symbols of this final transition, and thus equal to each other.
– Condition 4. Assume $w\alpha G \equiv_t w'\alpha G'$ and $\sigma = matching(w, w')$.
Let $symb(w\alpha G)$ and $symb(w'\alpha G')$ be obtained by appending transitions $(q_m, \zeta_m) \xrightarrow{\alpha, g, \varrho} (q, \zeta)$ and $(q'_n, \zeta'_n) \xrightarrow{\alpha, g', \varrho'} (q', \zeta')$ to $symb(w)$ and $symb(w')$, respectively. Then $q_m = q'_n$, $g \equiv g'$, $\varrho = \varrho'$, $q = q'$, $G \equiv g[\iota]$, where $\iota = \zeta_m \cup \{(p, v_{m+1})\}$, and $G' \equiv g'[\iota']$, where $\iota' = \zeta'_n \cup \{(p, v_{n+1})\}$. We have to show that $G' \equiv G[\sigma]$, or equivalently $g[\sigma \circ \iota] = g[\iota']$. Suppose $x \in Var(g)$.
 • If $x = p$ then $\sigma \circ \iota(x) = \sigma \circ \iota(p) = \sigma(v_{m+1}) = v_{n+1} = \iota'(p) = \iota'(x)$.
 • If $x \neq p$ then, by Corollary 1, $x \in domain(\zeta_m)$ and $x \in domain(\zeta'_n)$. Let $v = \zeta_m(x)$ and $v' = \zeta'_n(x)$. Then, by definition of $\equiv_r$, $(w, v) \equiv_r (w', v')$ and thus $\sigma(v) = v'$. Hence $\sigma \circ \iota(x) = \sigma \circ \zeta_m(x) = \sigma(v) = v' = \zeta'_n(x) = \iota'(x)$.
– Condition 5. If $symb(w)$ and $symb(w')$ share the same final transition, they certainly share the same final location.
– Condition 6. Assume $w \equiv_t w'$ and $w$ stores $v_m$. Then there exists a variable $x$ such that $\zeta_m(x) = v_m$. By the definition of symbolic runs, $\zeta_m = \iota_m \circ \varrho_m$, where $\iota_m = \zeta_{m-1} \cup \{(p, v_m)\}$. By Lemma 3, $range(\zeta_{m-1}) \subseteq \{v_1, \ldots, v_{m-1}\}$. We conclude that $\varrho_m(x) = p$. Again by the definition of symbolic runs, $\zeta'_n = \iota'_n \circ \varrho'_n$, where $\iota'_n = \zeta'_{n-1} \cup \{(p, v_n)\}$. Since $w \equiv_t w'$, we know $\varrho_m = \varrho'_n$. Therefore $\zeta'_n(x) = \iota'_n \circ \varrho'_n(x) = \iota'_n \circ \varrho_m(x) = \iota'_n(p) = v_n$. This implies $(w, v_m) \equiv_r (w', v_n)$, as required.
– Condition 7. Assume that $u \equiv_t u'$, $u = w\alpha G$, $u' = w'\alpha G'$, $u$ stores $v$, and $(w, v) \equiv_r (w', v')$. Let $symb(w\alpha G)$ and $symb(w'\alpha G')$ be obtained by appending transitions $(q_m, \zeta_m) \xrightarrow{\alpha, g, \varrho} (q, \zeta)$ and $(q'_n, \zeta'_n) \xrightarrow{\alpha, g', \varrho'} (q', \zeta')$ to $symb(w)$ and $symb(w')$, respectively. Then $\varrho = \varrho'$, $\zeta = \iota \circ \varrho$, where $\iota = \zeta_m \cup \{(p, v_{m+1})\}$, and $\zeta' = \iota' \circ \varrho$, where $\iota' = \zeta'_n \cup \{(p, v_{n+1})\}$.
Since $(w, v) \equiv_r (w', v')$, there exists an $x \in V$ such that $\zeta_m(x) = v$ and $\zeta'_n(x) = v'$. Thus also $\iota(x) = v$ and $\iota'(x) = v'$. Since $u$ stores $v$, there exists a $y \in V$ such that $\zeta(y) = v$. By Lemma 4, $\iota$ is injective. Thus $\iota(\varrho(y)) = v$ and $\iota(x) = v$ implies $\varrho(y) = x$. But this means $\zeta'(y) = \iota' \circ \varrho(y) = \iota'(x) = v'$. Therefore $(u, v) \equiv_r (u', v')$.
– Condition 8. Assume $u \equiv_t u'$, $u = w\alpha G$, $u' = w'\alpha G'$, $v \neq v_{m+1}$ and $(u, v) \equiv_r (u', v')$. Let $symb(w\alpha G)$ and $symb(w'\alpha G')$ be obtained by appending transitions $(q_m, \zeta_m) \xrightarrow{\alpha, g, \varrho} (q, \zeta)$ and $(q'_n, \zeta'_n) \xrightarrow{\alpha, g', \varrho'} (q', \zeta')$ to $symb(w)$ and $symb(w')$, respectively. Then $\varrho = \varrho'$, $\zeta = \iota \circ \varrho$, where $\iota = \zeta_m \cup \{(p, v_{m+1})\}$, and $\zeta' = \iota' \circ \varrho$, where $\iota' = \zeta'_n \cup \{(p, v_{n+1})\}$. Since $(u, v) \equiv_r (u', v')$, there exists an $x \in V$ such that $\zeta(x) = v$ and $\zeta'(x) = v'$. Using $v \neq v_{m+1}$, we infer that there exists a $y \in V$ such that $\varrho(x) = y$ and $\zeta_m(y) = v$. Now we derive $\zeta'_n(y) = \iota'(y) = \iota' \circ \varrho(x) = \zeta'(x) = v'$. Therefore $(w, v) \equiv_r (w', v')$.
– Condition 9. Assume $w \equiv_l w'$, $w\alpha G \in L$ and $v \in Var(G) \setminus \{v_{m+1}\}$. Let $symb(w\alpha G)$ be obtained by appending transition $(q_m, \zeta_m) \xrightarrow{\alpha, g, \varrho} (q, \zeta)$ to $symb(w)$. Then $q_m = q'_n$ and $G \equiv g[\iota]$, where $\iota = \zeta_m \cup \{(p, v_{m+1})\}$. Since $v \in Var(G) \setminus \{v_{m+1}\}$, there exists a variable $x \in Var(g) \setminus \{p\}$ with $\zeta_m(x) = v$. By Corollary 1, $Var(g) \subseteq domain(\zeta'_n) \cup \{p\}$, and thus $x \in domain(\zeta'_n)$. Let $v' = \zeta'_n(x)$. Then $(w, v) \equiv_r (w', v')$.
– Condition 10. Assume that $w \equiv_l w'$, $w\alpha G \in L$, $\sigma = matching(w, w')$ and $Sat(guard(w') \wedge G[\sigma])$. Since $w\alpha G \in L$, $symb(w\alpha G)$ can be obtained by appending a transition $(q_m, \zeta_m) \xrightarrow{\alpha, g, \varrho} (q, \zeta)$ to $symb(w)$, with $G \equiv g[\iota]$, where $\iota = \zeta_m \cup \{(p, v_{m+1})\}$. Since $w \equiv_l w'$, $q_m = q'_n$.
Now consider the sequence $\delta'$ obtained by appending a transition $(q'_n, \zeta'_n) \xrightarrow{\alpha, g, \varrho} (q, \zeta')$ to $symb(w')$, with $\zeta' = \iota' \circ \varrho$, where $\iota' = \zeta'_n \cup \{(p, v_{n+1})\}$. Since $guard(w') \wedge G[\sigma]$ is satisfiable, we may conclude that $\delta'$ is a symbolic run if we can prove $G[\sigma] \equiv g[\iota']$, or equivalently $g[\sigma \circ \iota] = g[\iota']$. Suppose $x \in Var(g)$.
 • If $x = p$ then $\sigma \circ \iota(x) = \sigma \circ \iota(p) = \sigma(v_{m+1}) = v_{n+1} = \iota'(p) = \iota'(x)$.
 • If $x \neq p$ then, by Corollary 1, $x \in domain(\zeta_m)$ and $x \in domain(\zeta'_n)$. Let $v = \zeta_m(x)$ and $v' = \zeta'_n(x)$. Then, by definition of $\equiv_r$, $(w, v) \equiv_r (w', v')$ and thus $\sigma(v) = v'$. Hence $\sigma \circ \iota(x) = \sigma \circ \zeta_m(x) = \sigma(v) = v' = \zeta'_n(x) = \iota'(x)$.
Hence $g[\sigma \circ \iota] = g[\iota']$ and $\delta'$ is a symbolic run for $w'\alpha G[\sigma]$. Since $q \in F$, we conclude $w'\alpha G[\sigma] \in L$.
– Condition 11. Suppose $w \equiv_l w'$, $w\alpha G \in L$, $w'\alpha G' \in L$, $\sigma = matching(w, w')$ and $G[\sigma] \wedge G'$ is satisfiable. Let $\delta = symb(w\alpha G)$ and $\delta' = symb(w'\alpha G')$ be obtained by appending transitions $(q_m, \zeta_m) \xrightarrow{\alpha, g, \varrho} (q, \zeta)$ and $(q'_n, \zeta'_n) \xrightarrow{\alpha, g', \varrho'} (q', \zeta')$ to $symb(w)$ and $symb(w')$, respectively. Then $G \equiv g[\iota]$, where $\iota = \zeta_m \cup \{(p, v_{m+1})\}$, and $G' \equiv g'[\iota']$, where $\iota' = \zeta'_n \cup \{(p, v_{n+1})\}$. Since $G[\sigma] \wedge G'$ is satisfiable, there exists a valuation $\xi$ such that $\xi \models G[\sigma] \wedge G'$. Define variable renaming $\sigma'$ as follows:

$$\sigma'(x) = \begin{cases} \iota'(x) & \text{if } x \in Var(g') \\ \sigma \circ \iota(x) & \text{otherwise} \end{cases}$$

Then clearly $G' \equiv g'[\iota'] \equiv g'[\sigma']$. We verify that $G[\sigma] \equiv g[\sigma \circ \iota] \equiv g[\sigma']$. Let $x \in Var(g)$. Then
 • If $x = p$ then $\sigma \circ \iota(x) = \sigma \circ \iota(p) = \sigma(v_{m+1}) = v_{n+1} = \iota'(p) = \iota'(x)$.
 • If $x \in Var(g') \setminus \{p\}$ then, by Corollary 1, $x \in domain(\zeta_m)$ and $x \in domain(\zeta'_n)$. Let $\zeta_m(x) = v$ and $\zeta'_n(x) = v'$. Then $(w, v) \equiv_r (w', v')$ and thus $\sigma(v) = v'$.
Hence $\sigma \circ \iota(x) = \sigma \circ \zeta_m(x) = \sigma(v) = v' = \zeta'_n(x) = \iota'(x)$.
 • If $x \notin Var(g')$ then, by definition of $\sigma'$, $\sigma \circ \iota(x) = \sigma'(x)$.
Thus

$$G[\sigma] \wedge G' \equiv (g \wedge g')[\sigma'].$$

Therefore $\xi \models (g \wedge g')[\sigma']$ and, by Lemma 1, $\xi \circ \sigma' \models g \wedge g'$. This means that $g \wedge g'$ is satisfiable. Since $w \equiv_l w'$, $q_m = q'_n$. Because $A$ is required to be deterministic, the conjunction of the guards of any pair of distinct $\alpha$-transitions from $q_m = q'_n$ is not satisfiable. Therefore the final transitions of $\delta$ and $\delta'$ must be equal. This implies $w\alpha G \equiv_t w'\alpha G'$.

The following example shows that in general there is no coarsest location equivalence that satisfies all conditions of Table 1. So whereas for regular languages a unique Nerode congruence exists, this is not always true for symbolic languages.

Example 5.

Consider the symbolic language $L$ that consists of the following three symbolic words and their prefixes:

$$w = a\,(v_1 > 0)\;\; a\,(v_1 > 0)\;\; b\,\top$$
$$u = a\,(v_1 = 0)\;\; a\,(v_1 = 0)\;\; b\,\top$$
$$z = a\,(v_1 < 0)\;\; c\,(v_1 + v_2 = 0)\;\; a\,(v_2 > 0)\;\; c\,\top$$

Symbolic language $L$ is accepted by both automata displayed in Figure 4. Thus, by Theorem 2, $L$ is regular. Let $w_i$, $u_i$ and $z_i$ denote the prefixes of $w$, $u$ and $z$, respectively, of length $i$. Then, according to the location equivalence induced by the first automaton, $w_1 \equiv_l u_1$, and according to the location equivalence induced by the second automaton, $u_1 \equiv_l z_2$. Therefore, if a coarsest location equivalence relation existed, $w_1 \equiv_l z_2$ should hold. Then, by Condition 9, $(w_1, v_1) \equiv_r (z_2, v_2)$. Thus, by Lemma 9, $w_2 \equiv_t z_3$, and therefore, by Condition 5, $w_2 \equiv_l z_3$. But now Condition 10 implies $a\,(v_1 > 0)\; a\,(v_1 > 0)\; c\,\top \in L$, which is a contradiction.

Theorem 3.

Suppose $L$ is a regular symbolic language over $\Sigma$. Then there exists a register automaton $A$ such that $L = L_s(A)$.

Proof. Since any register automaton without accepting locations accepts the empty symbolic language, we may assume without loss of generality that $L$ is nonempty. Let $\equiv_l$, $\equiv_t$, $\equiv_r$ be relations satisfying the properties stated in Definition 9. We define register automaton $A = (\Sigma, Q, q_0, F, V, \Gamma)$ as follows:
– $Q = \{[w]_l \mid w \in L\} \cup \{q_{sink}\}$, where $q_{sink}$ is a special sink location. (Since $L$ is regular, $\equiv_l$ has finite index, and so $Q$ is finite, as required.)
– $q_0 = [\epsilon]_l$. (Since $L$ is regular, it is feasible, and thus prefix closed. Therefore, since we assume that $L$ is nonempty, $\epsilon \in L$.)
– $F = \{[w]_l \mid w \in L\}$.

Fig. 4: There is no unique, coarsest location equivalence.

– $V = \{[(w, v)]_r \mid w \in L \wedge v \in V \wedge w \text{ stores } v\}$. (Since $L$ is regular, the equivalence induced by $\equiv_r$ has finite index, and so $V$ is finite, as required. Note that registers are supposed to be elements of $V$, and equivalence classes of $\equiv_r$ are not. Thus, strictly speaking, we should associate a unique register of $V$ to each equivalence class of $\equiv_r$, and define $V$ in terms of those registers.)
– $\Gamma$ contains a transition $\langle q, \alpha, g, \varrho, q' \rangle$ for each equivalence class $[w\alpha G]_t$, where
 • $q = [w]_l$ (Condition 2 ensures that the definition of $q$ is independent from the choice of representative $w\alpha G$.)
 • (Condition 3 ensures that input symbol $\alpha$ is independent from the choice of representative $w\alpha G$.)
 • $g \equiv G[\tau]$ where $\tau$ is a variable renaming that satisfies, for $v \in Var(G)$,

$$\tau(v) = \begin{cases} [(w, v)]_r & \text{if } w \text{ stores } v \\ p & \text{if } v = v_{m+1} \wedge m = length(w) \end{cases}$$

 (By Condition 9, $w$ stores $v$, for any $v \in Var(G) \setminus \{v_{m+1}\}$, so $G[\tau]$ is well-defined.
Condition 4 ensures that the definition of $g$ is independent from the choice of representative $w\alpha G$. Also note that, by Condition 1, $\tau$ is injective.)
 • $\varrho$ is defined for each equivalence class $[(w'\alpha G', v')]_r$ with $w'\alpha G' \equiv_t w\alpha G$ and $w'\alpha G'$ stores $v'$. Let $n = length(w')$. Then

$$\varrho([(w'\alpha G', v')]_r) = \begin{cases} [(w', v')]_r & \text{if } w' \text{ stores } v' \\ p & \text{if } v' = v_{n+1} \end{cases}$$

 (By Condition 8, either $v' = v_{n+1}$ or $w'$ stores $v'$, so $\varrho([(w'\alpha G', v')]_r)$ is well-defined. Also by Condition 8, the definition of $\varrho$ does not depend on the choice of representative $w'\alpha G'$. By Conditions 6 and 7, assignment $\varrho$ is injective.)
 • $q' = [w\alpha G]_l$ (Condition 5 ensures that the definition of $q'$ is independent from the choice of representative $w\alpha G$.)

In order to ensure that $A$ is completely specified, we add transitions to the sink location $q_{sink}$. More specifically, if $q \in Q$ is a location with outgoing $\alpha$-transitions with guards $g_1, \ldots, g_m$, then we add a transition $\langle q, \alpha, \neg(g_1 \vee \cdots \vee g_m), \varrho_0, q_{sink} \rangle$ to $\Gamma$, for $\varrho_0$ the trivial assignment with empty domain. Finally, we add, for each $\alpha \in \Sigma$, a self loop $\langle q_{sink}, \alpha, \top, \varrho_0, q_{sink} \rangle$ to $\Gamma$. Since $L$ is regular, $\equiv_t$ has finite index and therefore $\Gamma$ is finite, as required.

Note that in fact there exists a one-to-one correspondence between equivalence classes of $\equiv_t$ and the transitions in $\Gamma$ that do not lead to $q_{sink}$. To see this, suppose $w\alpha G \in L$ and $w'\alpha'G' \in L$ induce the same transition $\langle q, \alpha'', g, \varrho, q' \rangle$. Then $q = [w]_l = [w']_l$ and thus $w \equiv_l w'$. Also $\alpha = \alpha'' = \alpha'$ and thus $\alpha = \alpha'$. Moreover, $G[\tau] \equiv G'[\tau']$ (with $\tau'$ defined as expected). Now observe that $G[\tau] \equiv G[\sigma][\tau']$, for $\sigma = matching(w, w')$. Thus we have $G[\sigma][\tau'] \equiv G'[\tau']$. Since $\tau'$ is injective, this implies $G[\sigma] \equiv G'$. Now Lemma 9 implies $w\alpha G \equiv_t w'\alpha'G'$. So each transition of $\Gamma$ corresponds to exactly one equivalence class of $\equiv_t$.

We claim that $A$ is deterministic and prove this by contradiction.
Suppose $\langle q, \alpha, g', \varrho', q' \rangle$ and $\langle q, \alpha, g'', \varrho'', q'' \rangle$ are two distinct $\alpha$-transitions in $\Gamma$ with $g' \wedge g''$ satisfiable. Then there exists a valuation $\xi$ such that $\xi \models g' \wedge g''$. Note that $q \neq q_{sink}$, $q' \neq q_{sink}$ and $q'' \neq q_{sink}$. Let the two transitions correspond to (distinct) equivalence classes $[w'\alpha G']_t$ and $[w''\alpha G'']_t$, respectively. Then $g' = G'[\tau']$ and $g'' = G''[\tau'']$, with $\tau'$ and $\tau''$ defined as above. Now observe that $G'[\tau'] \equiv G'[\sigma][\tau'']$, for $\sigma = matching(w', w'')$. Using Lemma 4, we derive

$$\xi \models g' \wedge g'' \Leftrightarrow \xi \models G'[\sigma][\tau''] \wedge G''[\tau''] \Leftrightarrow \xi \models (G'[\sigma] \wedge G'')[\tau''] \Leftrightarrow \xi \circ \tau'' \models G'[\sigma] \wedge G''.$$

Thus $G'[\sigma] \wedge G''$ is satisfiable and we may apply Condition 11 to conclude $w'\alpha G' \equiv_t w''\alpha G''$. Contradiction.

So using the assumption that $L$ is regular, we established that $A$ is a register automaton. Note that for this we essentially use that equivalences $\equiv_l$, $\equiv_t$ and $\equiv_r$ have finite index, as well as all the conditions, except Condition 10.

It remains to prove $L = L_s(A)$. First, we show that $L \subseteq L_s(A)$. For this, suppose that $w = \alpha_1 G_1 \cdots \alpha_n G_n \in L$. We need to prove $w \in L_s(A)$. Consider the following sequence

$$\delta = (q_0, \zeta_0) \xrightarrow{\alpha_1, g_1, \varrho_1} (q_1, \zeta_1) \cdots \xrightarrow{\alpha_n, g_n, \varrho_n} (q_n, \zeta_n),$$

where $q_0 = [w_0]_l$, $w_0 = \epsilon$, $domain(\zeta_0) = \emptyset$ and, for $1 \le i \le n$,
– $q_i = [w_i]_l$, where $w_i = \alpha_1 G_1 \cdots \alpha_i G_i$,
– $\langle q_{i-1}, \alpha_i, g_i, \varrho_i, q_i \rangle$ is the transition associated to $[w_i]_t$,
– $\zeta_i = \iota_i \circ \varrho_i$, where $\iota_i = \zeta_{i-1} \cup \{(p, v_i)\}$.

Since $L$ is feasible, $G_1 \wedge \cdots \wedge G_n$ is satisfiable. Therefore, in order to prove that $\delta$ is a symbolic run of $A$, it suffices to show, for $1 \le i \le n$, $G_i \equiv g_i[\iota_i]$. Suppose $w_i$ stores $v$. Then, for $i > 0$,

$$\varrho_i([(w_i, v)]_r) = \begin{cases} [(w_{i-1}, v)]_r & \text{if } w_{i-1} \text{ stores } v \\ p & \text{if } v = v_i \end{cases}$$

By induction on $i$ we prove that $w_i$ stores $v \Rightarrow \zeta_i([(w_i, v)]_r) = v$.
– Base $i = 0$. Trivial since $w_0$ does not store any $v$.
– Induction step.
Assume $i > 0$ and $w_i$ stores $v$. We consider two cases:
 • $v = v_i$. Then $\zeta_i([(w_i, v)]_r) = \iota_i \circ \varrho_i([(w_i, v_i)]_r) = \iota_i(p) = v_i = v$.
 • $w_{i-1}$ stores $v$. Then $\zeta_i([(w_i, v)]_r) = \iota_i \circ \varrho_i([(w_i, v)]_r) = \iota_i([(w_{i-1}, v)]_r) = \zeta_{i-1}([(w_{i-1}, v)]_r) = v$ (by induction hypothesis).

By definition $g_i \equiv G_i[\tau_i]$, where for $v \in Var(G_i)$,

$$\tau_i(v) = \begin{cases} [(w_{i-1}, v)]_r & \text{if } w_{i-1} \text{ stores } v \\ p & \text{if } v = v_i \end{cases}$$

This means we need to prove $G_i \equiv G_i[\tau_i][\iota_i]$, that is, we must show, for $v \in Var(G_i)$, that $\iota_i(\tau_i(v)) = v$. There are two cases:
– If $v = v_i$ then $\iota_i(\tau_i(v)) = \iota_i(p) = v_i = v$.
– If $w_{i-1}$ stores $v$ then $\iota_i(\tau_i(v)) = \iota_i([(w_{i-1}, v)]_r) = \zeta_{i-1}([(w_{i-1}, v)]_r) = v$.

We conclude that $\delta$ is a symbolic run with $strace(\delta) = w$. Since $w \in L$, $q_n = [w]_l \in F$, so symbolic run $\delta$ is accepting, and thus $w \in L_s(A)$, as required.

Next we need to show that $L_s(A) \subseteq L$. For this, suppose $w = \alpha_1 G_1 \cdots \alpha_n G_n \in L_s(A)$. We need to prove $w \in L$. Let

$$\delta = (q_0, \zeta_0) \xrightarrow{\alpha_1, g_1, \varrho_1} (q_1, \zeta_1) \cdots \xrightarrow{\alpha_n, g_n, \varrho_n} (q_n, \zeta_n)$$

be a symbolic run of $A$, as in Definition 6, with $strace(\delta) = w$. For $0 < i \le n$, suppose transition $\langle q_{i-1}, \alpha_i, g_i, \varrho_i, q_i \rangle$ corresponds to equivalence class $[u_{i-1}\alpha_i G'_i]_t$. For $0 \le i \le n$, let $w_i = \alpha_1 G_1 \cdots \alpha_i G_i$. We prove by induction that $q_i = [w_i]_l$ and $w_i$ stores $v \Rightarrow \zeta_i([(w_i, v)]_r) = v$.
– Base $i = 0$. Trivial, since $q_0 = [\epsilon]_l = [w_0]_l$ and $domain(\zeta_0) = \emptyset$ by definition.
– Induction step. Assume $i > 0$. Since transition $q_{i-1} \xrightarrow{\alpha_i, g_i, \varrho_i} q_i$ corresponds to equivalence class $[u_{i-1}\alpha_i G'_i]_t$, $q_{i-1} = [u_{i-1}]_l$. Therefore, by induction hypothesis, $u_{i-1} \equiv_l w_{i-1}$. By Definition 6, $G_i \equiv g_i[\iota_i]$ and by definition of $A$, $g_i \equiv G'_i[\tau]$, where for each $v \in Var(G'_i)$,

$$\tau(v) = \begin{cases} [(u_{i-1}, v)]_r & \text{if } u_{i-1} \text{ stores } v \\ p & \text{if } v = v_{m+1} \end{cases}$$

where $m = length(u_{i-1})$. Thus $G_i \equiv G'_i[\iota_i \circ \tau]$. Let $\sigma = matching(u_{i-1}, w_{i-1})$. Then, for each $v \in Var(G'_i)$, $\iota_i \circ \tau(v) = \sigma(v)$:
 • If $v = v_{m+1}$ then $\iota_i \circ \tau(v) = \iota_i(p) = v_i = \sigma(v)$.
 • If $v \neq v_{m+1}$ then, by Condition 9, there exists a $v'$ such that $(u_{i-1}, v) \equiv_r (w_{i-1}, v')$. Then, again by induction hypothesis, $\iota_i \circ \tau(v) = \iota_i([(u_{i-1}, v)]_r) = \zeta_{i-1}([(u_{i-1}, v)]_r) = \zeta_{i-1}([(w_{i-1}, v')]_r) = v' = \sigma(v)$.
Therefore $G_i \equiv G'_i[\sigma]$. Since $\delta$ is a symbolic run, $guard(w_{i-1}) \wedge G_i$ is satisfiable. Now we may use Condition 10 to conclude $w_i = w_{i-1}\alpha_i G_i \in L$. Then, by Lemma 9, $u_{i-1}\alpha_i G'_i \equiv_t w_i$, and thus, by Condition 5, $u_{i-1}\alpha_i G'_i \equiv_l w_i$. From this, we conclude $q_i = [w_i]_l$.
Suppose $w_i$ stores $v$. Since $u_{i-1}\alpha_i G'_i \equiv_t w_i$,

$$\varrho_i([(w_i, v)]_r) = \begin{cases} [(w_{i-1}, v)]_r & \text{if } w_{i-1} \text{ stores } v \\ p & \text{if } v = v_i \end{cases}$$

We consider two cases:
 • $v = v_i$. Then $\zeta_i([(w_i, v)]_r) = \iota_i \circ \varrho_i([(w_i, v_i)]_r) = \iota_i(p) = v_i = v$.
 • $w_{i-1}$ stores $v$. Then, using the induction hypothesis, $\zeta_i([(w_i, v)]_r) = \iota_i \circ \varrho_i([(w_i, v)]_r) = \iota_i([(w_{i-1}, v)]_r) = \zeta_{i-1}([(w_{i-1}, v)]_r) = v$.

Thus in particular $q_n = [w_n]_l = [w]_l$. This implies $w \in L$, as required.

As a final note, we observe that $A$ is well-formed. To see this, suppose $\delta$ is a symbolic run that ends with $(q, \zeta)$ and suppose $q \xrightarrow{\alpha, g, \varrho} q'$. Let transition $q \xrightarrow{\alpha, g, \varrho} q'$ correspond to equivalence class $[w\alpha G]_t$. Suppose $x \in Var(g)$.
Then, by construction of $A$, there is a variable $v \in Var(G)$ such that either $x = [(w, v)]_r$ and $w$ stores $v$, or $x = p$ and $v = v_{m+1}$, where $m = length(w)$. Let $w' = strace(\delta)$. By the above inductive proof, $q = [w']_l$ and $w'$ stores $v' \Rightarrow [(w', v')]_r \in domain(\zeta)$. Then $w \equiv_l w'$ and by Condition 9, either $v = v_{m+1}$ or there exists a $v'$ such that $(w, v) \equiv_r (w', v')$. This means that either $x = p$ or $x \in domain(\zeta)$. Hence we may conclude that $Var(g) \subseteq domain(\zeta) \cup \{p\}$ and thus $A$ is well-formed by Corollary 1.

References

1. F. Aarts, F. Heidarian, H. Kuppens, P. Olsen, and F.W. Vaandrager. Automata learning through counterexample-guided abstraction refinement. In D. Giannakopoulou and D. Méry, editors, volume 7436 of Lecture Notes in Computer Science, pages 10–27. Springer, August 2012.
2. F. Aarts, B. Jonsson, J. Uijen, and F.W. Vaandrager. Generating models of infinite-state communication protocols using regular inference with abstraction. Formal Methods in System Design, 46(1):1–41, 2015.
3. D. Angluin. Learning regular sets from queries and counterexamples. Inf. Comput., 75(2):87–106, 1987.
4. Dana Angluin and Dana Fisman. Learning regular omega languages. Theor. Comput. Sci., 650:57–72, 2016.
5. Michael Benedikt, Clemens Ley, and Gabriele Puppis. What you must remember when processing data words. In Alberto H. F. Laender and Laks V. S. Lakshmanan, editors, Proceedings of the 4th Alberto Mendelzon International Workshop on Foundations of Data Management, Buenos Aires, Argentina, May 17-20, 2010, volume 619 of CEUR Workshop Proceedings. CEUR-WS.org, 2010.
6. Mikolaj Bojanczyk, Bartek Klin, and Slawomir Lasota. Automata with group actions. In Proceedings of the 26th Annual IEEE Symposium on Logic in Computer Science, LICS 2011, June 21-24, 2011, Toronto, Ontario, Canada, pages 355–364. IEEE Computer Society, 2011.
7. Matko Botinčan and Domagoj Babić. Sigma*: Symbolic learning of input-output specifications. In Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '13, pages 443–456, New York, NY, USA, 2013. ACM.
8. Cristian Cadar and Koushik Sen. Symbolic execution for software testing: Three decades later. Commun. ACM, 56(2):82–90, February 2013.
9. S. Cassel, F. Howar, and B. Jonsson. RALib: A LearnLib extension for inferring EFSMs. In DIFTS 15, Int. Workshop on Design and Implementation of Formal Tools and Systems, Austin, Texas, 2015. Available at .
10. S. Cassel, F. Howar, B. Jonsson, M. Merten, and B. Steffen. A succinct canonical register automaton model. J. Log. Algebr. Meth. Program., 84(1):54–66, 2015.
11. S. Cassel, F. Howar, B. Jonsson, and B. Steffen. Active learning for extended finite state machines. Formal Asp. Comput., 28(2):233–263, 2016.
12. Chia Yuan Cho, Domagoj Babić, Pongsin Poosankam, Kevin Zhijie Chen, Edward XueJun Wu, and Dawn Song. Mace: Model-inference-assisted concolic exploration for protocol and vulnerability discovery. In Proceedings of the 20th USENIX Conference on Security, SEC'11, pages 10–10, Berkeley, CA, USA, 2011. USENIX Association.
13. P. Fiterău-Broştean, R. Janssen, and F.W. Vaandrager. Combining model learning and model checking to analyze TCP implementations. In S. Chaudhuri and A. Farzan, editors, Proceedings 28th International Conference on Computer Aided Verification (CAV'16), Toronto, Ontario, Canada, volume 9780 of Lecture Notes in Computer Science, pages 454–471. Springer, 2016.
14. Paul Fiterău-Broştean and Falk Howar. Learning-based testing the sliding window behavior of TCP implementations. In Laure Petrucci, Cristina Seceleanu, and Ana Cavalcanti, editors, Critical Systems: Formal Methods and Automated Verification - Joint 22nd International Workshop on Formal Methods for Industrial Critical Systems - and - 17th International Workshop on Automated Verification of Critical Systems, FMICS-AVoCS 2017, Turin, Italy, September 18-20, 2017, Proceedings, volume 10471 of Lecture Notes in Computer Science, pages 185–200. Springer, 2017.
15. Paul Fiterău-Broştean, Toon Lenaerts, Erik Poll, Joeri de Ruiter, Frits Vaandrager, and Patrick Verleg. Model learning and model checking of SSH implementations. In Proceedings of the 24th ACM SIGSOFT International SPIN Symposium on Model Checking of Software, SPIN 2017, pages 142–151, New York, NY, USA, 2017. ACM.
16. Nissim Francez and Michael Kaminski. An algebraic characterization of deterministic regular languages over infinite alphabets. Theor. Comput. Sci., 306(1-3):155–175, 2003.
17. Bharat Garhewal, Frits Vaandrager, Falk Howar, Timo Schrijvers, Toon Lenaerts, and Rob Smits. Grey-box learning of register automata, June 2020. Under submission.
18. Dimitra Giannakopoulou, Zvonimir Rakamarić, and Vishwanath Raman. Symbolic learning of component interfaces. In Proceedings of the 19th International Conference on Static Analysis, SAS'12, pages 248–264, Berlin, Heidelberg, 2012. Springer-Verlag.
19. Patrice Godefroid, Nils Klarlund, and Koushik Sen. Dart: Directed automated random testing. SIGPLAN Not., 40(6):213–223, June 2005.
20. Andreas Hagerer, Tiziana Margaria, Oliver Niese, Bernhard Steffen, Georg Brune, and Hans-Dieter Ide. Efficient regression testing of CTI-systems: Testing a complex call-center solution. Annual review of communication, Int. Engineering Consortium (IEC), 55:1033–1040, 2001.
21. J.E. Hopcroft and J.D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, 1979.
22. Matthias Höschele and Andreas Zeller. Mining input grammars from dynamic taints. In David Lo, Sven Apel, and Sarfraz Khurshid, editors, Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, ASE 2016, Singapore, September 3-7, 2016, pages 720–725. ACM, 2016.
23. F. Howar, M. Isberner, B. Steffen, O. Bauer, and B. Jonsson. Inferring semantic interfaces of data structures. In ISoLA (1): Leveraging Applications of Formal Methods, Verification and Validation. Technologies for Mastering Change - 5th International Symposium, ISoLA 2012, Heraklion, Crete, Greece, October 15-18, 2012, Proceedings, Part I, volume 7609 of

Lecture Notes in Computer Science ,pages 554–571. Springer, 2012.24. Falk Howar, Dimitra Giannakopoulou, and Zvonimir Rakamari´c. Hybrid learning:Interface generation through static, dynamic, and symbolic analysis. In

Proceedingsof the 2013 International Symposium on Software Testing and Analysis , ISSTA2013, pages 268–279, New York, NY, USA, 2013. ACM.25. Falk Howar, Bengt Jonsson, and Frits W. Vaandrager. Combining black-box andwhite-box techniques for learning register automata. In Bernhard Steﬀen andGerhard J. Woeginger, editors,

Computing and Software Science - State of theArt and Perspectives , volume 10000 of

Lecture Notes in Computer Science , pages563–588. Springer, 2019.26. Falk Howar and Bernhard Steﬀen. Active automata learning in practice. In AmelBennaceur, Reiner H¨ahnle, and Karl Meinke, editors,

Machine Learning for Dy-namic Software Analysis: Potentials and Limits: International Dagstuhl Seminar16172, Dagstuhl Castle, Germany, April 24-27, 2016, Revised Papers , pages 123–148. Springer International Publishing, 2018.

27. Malte Isberner, Falk Howar, and Bernhard Steffen. The TTT algorithm: A redundancy-free approach to active automata learning. In Borzoo Bonakdarpour and Scott A. Smolka, editors, Runtime Verification: 5th International Conference, RV 2014, Toronto, ON, Canada, September 22-25, 2014. Proceedings, pages 307–322, Cham, 2014. Springer International Publishing.

28. Oded Maler and Ludwig Staiger. On syntactic congruences for omega-languages. Theor. Comput. Sci., 183(1):93–112, 1997.

29. A. Nerode. Linear automaton transformations. Proceedings of the American Mathematical Society, 9(4):541–544, 1958.

30. R.L. Rivest and R.E. Schapire. Inference of finite automata using homing sequences. Inf. Comput., 103(2):299–347, 1993.

31. M. Schuts, J. Hooman, and F.W. Vaandrager. Refactoring of legacy software using model learning and equivalence checking: an industrial experience report. In E. Ábrahám and M. Huisman, editors, Proceedings 12th International Conference on integrated Formal Methods (iFM), Reykjavik, Iceland, June 1-3, volume 9681 of Lecture Notes in Computer Science, pages 311–325, 2016.

32. M. Shahbaz and R. Groz. Inferring Mealy machines. In Ana Cavalcanti and Dennis Dams, editors, FM 2009: Formal Methods, Second World Congress, Eindhoven, The Netherlands, November 2-6, 2009. Proceedings, volume 5850 of Lecture Notes in Computer Science, pages 207–222. Springer, 2009.

33. F.W. Vaandrager. Model learning. Communications of the ACM, 60(2):86–95, February 2017.