Abstract Transducers∗

ANDREAS STAHLBAUER, University of Passau

Several abstract machines that operate on symbolic input alphabets have been proposed in the last decade, for example, symbolic automata or lattice automata. Applications of these types of automata include software security analysis and natural language processing. While these models provide means to describe words over infinite input alphabets, there is no considerable work on symbolic output alphabets (as present in transducers), or even abstraction (widening) thereof. Furthermore, established approaches for transforming, for example, minimizing or reducing, finite-state machines that produce output on states or transitions are not applicable. A notion of equivalence of this type of machine is needed to make statements about whether or not transformations maintain the semantics.

We present abstract transducers as a new form of finite-state transducers. Both their input alphabet and their output alphabet are composed of abstract words, where one abstract word represents a set of concrete words. The mapping between these representations is described by abstract word domains. By using words instead of single letters, abstract transducers provide the possibility of lookaheads to decide on which state transitions to conduct. Since both the input symbol and the output symbol on each transition are abstract entities, abstraction techniques can be applied naturally.

We apply abstract transducers as the foundation for sharing task artifacts for reuse in the context of program analysis and verification, and describe task artifacts as abstract words. A task artifact is any entity that contributes to an analysis task and its solution, for example, candidate invariants or source code to weave.

CCS Concepts: • Theory of computation → Abstract machines; Automata over infinite objects; Abstraction; • Software and its engineering → Automated static analysis; System modeling languages.

∗This is a preliminary report. Please refer to the final version of this work if available.
Author’s address: Andreas Stahlbauer, [email protected], University of Passau.

[Figure 1: an abstract transducer shown as a state graph whose transitions are labeled v/w, including /{p}, {a}/{ϵ}, {b}/{s}, {ϵ}/{t}, {d}/{ϵ}, {e}/{y}, {ϵ}/{v}, {ϵ}/{u}, {ϵ}/{w}, and {c}/{x}, next to a table of input words and their corresponding output words:

Input | Output
ϵ     | p
a     | p
de    | py
ab    | ps(tuv)*w
abc   | ps(tuv)*wx ]

Fig. 1. Each transition of an abstract transducer is annotated with an abstract input word v and an abstract output word w. An abstract input word denotes a set [[v]] ⊆ Σ∗ of words over an input alphabet Σ; an abstract output word denotes a set [[w]] ⊆ Θ∞ of words over an output alphabet Θ; the illustration shows the sets of concrete words. Please note that already the activation of the initial state emits an abstract output word—in this case, the set {p} of concrete words. We use the abstract epsilon word vϵ, with [[vϵ]] = {ϵ}, with the semantics that is known from ϵ-NFAs [48, 70]. Transitions with the abstract input word vϵ are called ϵ-moves and can result in output words of infinite length, which result from ϵ-loops—for example, the ϵ-loop over three of the states in the illustration. The table on the right shows—for the transducer on the left—a set of input words and corresponding output words in the form of regular expressions.

We present abstract transducers as a new type of abstract machines that operate on an abstract input alphabet and an abstract output alphabet, and that have an inherent notion of abstraction. Both the input alphabet and the output alphabet are described based on abstract domains, which enables different forms of abstracting these transducers and allows for different forms of symbolic representations.
An abstract representation of words is essential for creating finite abstractions of possibly exponentially many and infinitely long output words, and abstraction of a transducer makes it possible to increase the sharing of its outputs, that is, one output becomes applicable to a wider set of input words. Different abstract domains, and respective lattices, have been proposed to represent and abstract states and behaviors of systems and their relationships [26]. An abstract domain provides means to map between abstract and concrete entities. Combining abstract domains and finite-state transducers results in a generic formalism that provides a unified view on different types of automata and transducers, and enables new applications in different areas, for example, in program analysis and verification. Figure 1 illustrates the working principle of abstract transducers.
Problems.
Abstract transducers address several problems: (1) In case alphabets consist of many, possibly exponentially many, symbols, traditional automata concepts with single concrete symbols per transition provide limited efficiency. Automata that employ a symbolic alphabet—where one symbol from the alphabet denotes a set of concrete symbols—solve this issue [74, 75]. Having a symbolic representation of alphabet symbols makes approaches for abstracting (or widening) finite-state machines—such as relational abstraction or alphabet abstraction [23, 65]—applicable. We use abstract domains, as known from abstract interpretation, for constructing symbolic representations, and for mapping between concrete and symbolic alphabets. This way, we can choose from a large variety of abstract domains to provide different symbolic and explicit mechanisms for representing data, for example, binary decision diagrams [21], predicates [7, 40], or polyhedra [69]. Abstraction is also essential for output words, which are produced by transducers, and has not yet received attention by researchers. (2) We allow the transducers to have ϵ-moves that are annotated with outputs, which can lead to output words of infinite length; here, a symbolic representation of sets of output words, based on corresponding abstract domains for the output alphabet, can help to provide a finite representation that represents or even overapproximates sets of exponentially many and infinitely long words.

[Figure 2: two state graphs with transitions labeled v/w. (a) Yarn Transducer, with labels including [[wϵ]] = { ϵ }, [[w]] = { $2 != 7 }, [[w]] = { $2 == 7 }, [[v]] = { enter() }, [[v]] = { leave() }, and [[v]] = { o ◦ alloc($1, $2) | o ∈ Θ }. (b) Precision Transducer, with [[wϵ]] = { true }, output words that denote congruences over a variable i, and [[v]] = { i = i + 1 }.]

Fig. 2. Two types of abstract transducers are illustrated: a Yarn transducer that emits code to weave, that is, it corresponds to an aspect in AOP, and a precision transducer, which is a means to share candidate invariants, or predicates, for (re-)use in predicate abstraction. Please note that the abstract input word v describes a lookahead, that is, it contains a word σ̄ ∈ [[v]] with |σ̄| > 1. The lookahead matches if the input that remains to be consumed after word v matched starts with alloc($1,$2), where $1 and $2 are parameters that bind the arguments that are given to the function alloc to internal variables, which are then used to produce the concrete output for the abstract output words.

By having a means for abstracting both the input alphabet and the output alphabet, we can implement further, more elaborate techniques with various applications. We abstract our transducers to increase the sharing of the output they emit. An abstract transducer might have been constructed to produce its output for a specific set of input words that can be found in a specific analysis task, that is, (3) the reuse of the output can be limited to a specific set of analysis tasks, while the output would also be applicable to a broader set of tasks. Sharing is increased if a given output word becomes produced for a larger set of input words—that is, we take advantage of the nondeterminism that abstraction introduces [4]. The alphabets from which these words can be composed can (in general) consist of arbitrarily complex entities (symbols), for example, tuples of concrete letters as used for multi-track automata [23]. (4) Nevertheless, also for these complex symbols, a means of abstraction is needed. Constructing complex alphabets, and words thereof, based on abstract product domains [25] addresses this issue. Applications.
We instantiate abstract transducers as task artifact transducers. A task artifact transducer is an abstract transducer that maps between a set of control paths of a given program to analyze and a set of task artifacts, which are intended to be shared for reuse. Task artifact transducers are a generic means to provide artifacts that contribute to an analysis task and its solution. These task artifact transducers aid in various analysis tasks for which task artifacts, for example, intermediate verification results, have to be provided at specific points and in specific contexts in the control flow. By using such transducers as a means for sharing artifacts for reuse, we gain precise control over the sharing process: We can precisely specify at which points and in which context (path prefix) of the control flow of a program certain artifacts should be shared for reuse. We use them both to construct the transition relation of the analysis task itself, and for constructing a state-space abstraction with a finite number of abstract states in an efficient and effective manner, that is, for sharing syntactic and semantic task artifacts. Syntactic task artifacts include, for example, components, aspects, or assertions to check [8, 51]. Semantic task artifacts include, for example, function summaries [68], invariants, or Craig interpolants [46, 47]. The goal of sharing task artifacts is to make the overall process of constructing syntactic and semantic task models more efficient and effective.

We present two forms of task artifact transducers based on abstract transducers in another work [72]: Yarn transducers and precision transducers. Figure 2 provides examples for these types of abstract transducers. A Yarn transducer can express aspects—source code, or labeled transition systems (LTSs) in general, to emit at specific points—to weave into a control-flow graph. Such aspects can, for example, provide the environment model or a specification. It must be possible to emit code to weave before any of the transitions that are processed as input: An initial transducer output is needed. For soundness, operations such as ϵ-elimination, union, or reduction must keep the semantics—including their temporal relationships, also concurrency—of these aspects. A precision transducer is annotated with sets of predicates (candidate invariants) to emit for reuse in different contexts of the transition system to construct (for example, a Kripke structure) in an analysis process. The shared predicates can be used to compute predicate abstractions (as used for software model checking [6, 40]); the number of CEGAR [17, 24] iterations can be reduced by abstracting these transducers, which increases sharing (the same predicate can be emitted in more contexts). Such precision transducers can also express the predicate sharing strategy of lazy abstraction [47]. Contributions.
This work presents the following contributions and shares most of the material with the author’s thesis [72]:

• Abstract Transducers. We introduce abstract transducers as a generic and unifying type of abstract machines that use abstract word domains to characterize both the input alphabet and the output alphabet, and that have an inherent notion of abstraction.

• Abstract Output Closure.
We present techniques for computing finite abstractions of the output of ϵ-closures with ϵ-loops, which are possible if ϵ-moves are allowed. These techniques make it possible to produce finite outputs from transducers with outputs that describe exponentially large sets of potentially infinitely long words, and they aid in eliminating the ϵ-moves.

• Transducer Abstraction.
We define precisely what it means to abstract (or overapproximate) an abstract transducer. Based on this notion of abstraction, we discuss different types of abstractions and define corresponding operators.

• Transducer Reduction.
After defining the notion of equivalence of abstract transducers, we discuss different transformations that maintain their semantics while reducing their number of control states and transitions. Such reduction techniques help to reduce the degree of non-determinism, which reduces the costs of executing abstract transducers.

• Transducer Analysis.
We present an abstract transducer analysis as a generic configurable program analysis for running different types of abstract transducers.

• Task Artifact Transducers.
We instantiate abstract transducers as task artifact transducers to have a generic means to share various artifacts that contribute to different concerns of an analysis task. Task artifact transducers foster the reuse of components of an analysis task and the intermediate analysis (reasoning) results that are produced while conducting an analysis.
Key Insights. (1) Abstract transducers have a fundamentally different semantics compared to other transducers with symbolic alphabets, such as symbolic transducers [31], which becomes obvious when comparing their notions of equivalence. (2) Existing algorithms for transforming finite-state transducers are not applicable for abstract transducers. (3) Abstracting abstract transducers is a means for systematically increasing the scope of sharing artifacts for reuse.
We start with preliminaries, including the notation: We denote sets A, B, . . . by upper-case letters or add a hat â, b̂, . . . to signal that an entity is a set. Set elements a, b, . . . are denoted by lower-case letters. We add a bar ā, B̄, . . . to denote lists (or sequences). Elements of sets are enclosed in curly brackets {a, b, c}, components of tuples are enclosed in round brackets (A, x, y), and elements of lists are enclosed in angle brackets ⟨a, b, c⟩. Languages and Words.
The set of all finite words over an alphabet Σ is denoted by Σ∗, which is a free monoid Σ∗ = (Σ, ◦, ϵ), also known as the Kleene star, where concatenation ◦ : Σ∗ × Σ∗ → Σ∗ is its binary operator and its neutral element ϵ is the empty word. A word σ̄ is a sequence ⟨σ1, . . . , σn⟩ of symbols from the alphabet. The length |σ̄| of a word σ̄ is its number of subsequent symbols; the empty word has length |ϵ| = |⟨⟩| = 0. Given two words σ̄ = ⟨σ1, . . . , σn⟩ and τ̄ = ⟨τ1, . . . , τm⟩, the concatenation σ̄ ◦ τ̄ results in the word c̄ = ⟨σ1, . . . , σn, τ1, . . . , τm⟩ with length |c̄| = |σ̄| + |τ̄|. A word σ̄a is a prefix of another word σ̄b, that is, it is an element (σ̄a, σ̄b) ∈ ⪯ of the prefix relation ⪯, if there exists a suffix σ̄s ∈ Σ∗ such that σ̄b = σ̄a ◦ σ̄s. While the finite words over an alphabet Σ are denoted by Σ∗, the infinitely long ones are denoted by Σω, and the set of all words is denoted by Σ∞ = Σ∗ ∪ Σω [55], with the infinite iteration ·ω and the finite iteration ·∗. The set of all words over an alphabet Σ that is described by a structure S and that are considered well-formed regarding certain production rules is called the language L(S) ⊆ Σ∗ of S. The empty word language {ϵ} consists of the empty word ϵ only; the empty language corresponds to the empty set ∅. Lattices.
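As a concrete illustration, the word operations just introduced—length, concatenation, and the prefix relation—can be sketched over Python tuples; the representation and the function names are our own illustration, not part of the formalism:

```python
# A minimal sketch of finite words over an alphabet, modeled as tuples.
# The names (concat, is_prefix) are illustrative only.

EPSILON = ()  # the empty word, with length 0

def concat(sigma, tau):
    """Concatenation: <s1,..,sn> o <t1,..,tm> = <s1,..,sn,t1,..,tm>."""
    return sigma + tau

def is_prefix(a, b):
    """a is a prefix of b iff there exists a suffix s with b = a o s."""
    return b[:len(a)] == a

w1 = ("a", "b")
w2 = ("c",)
assert concat(w1, w2) == ("a", "b", "c")
assert len(concat(w1, w2)) == len(w1) + len(w2)
assert is_prefix(w1, concat(w1, w2))
assert is_prefix(EPSILON, w1)  # the empty word is a prefix of every word
```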
A (complete) lattice [41] is a tuple 𝔼 = (E, ⊑, ⊓, ⊔, ⊤, ⊥), with a set of abstract elements E and a partial-order relation ⊑ ⊆ E × E. The operator meet is a function ⊓ : (E × E) → E that provides the greatest lower bound for a given pair (e1, e2) ∈ E × E of abstract elements. The operator join is a function ⊔ : E × E → E that provides the least upper bound for a given pair of abstract elements. The bottom element ⊥ is the least element in the partial order, that is, there exists no other element e ⊑ ⊥, with e ≠ ⊥. The top element ⊤ is the greatest element in the partial order, that is, there exists no other element ⊤ ⊑ e, with e ≠ ⊤. The operators ⊓ and ⊔ extend to sets naturally: The meet over a set of abstract elements is denoted by ⨅ : 2^E → E, and the join by ⨆ : 2^E → E; for example, ⊥ = ⨅E and ⊤ = ⨆E. A semi-lattice has either no meet or no join for all abstract elements. The relation ⊑ is also called the inclusion relation [77]. Partially ordered sets (posets) can be made semi-lattices, and semi-lattices can be made complete by adding additional abstract elements [38]. A lattice element ec ∈ E is called the complement of an element e ∈ E iff e ⊓ ec = ⊥ and e ⊔ ec = ⊤. We denote the complement of a lattice element e by ¬e. A lattice is called complemented iff there exists a complement for all its elements. Powerset Lattice.
The powerset lattice that describes a Hoare powertheory [2, 34, 44]—over a given lattice 𝔼 = (E, ⊑E, ⊓E, ⊔E, ⊤E, ⊥E)—is denoted by pw(𝔼) = (2^E, ⊑, ⊓, ⊔, ⊤, ⊥), where the set of elements is constituted by the set of all subsets 2^E of the set E. The inclusion relation ⊑ has the element (E1, E2) ∈ ⊑ if and only if ∀e1 ∈ E1 ∃e2 ∈ E2 : e1 ⊑E e2. The join ⊔(E1, E2) = E1 ∪ E2 is the union, and the meet ⊓(E1, E2) = E1 ∩ E2 is the intersection of two given sets E1, E2 ⊆ E. The bottom element ⊥ = ∅ is the empty set, and the top element ⊤ = E corresponds to the set with all elements. Any complemented distributive lattice is isomorphic to a Boolean algebra [49], which also follows from the Stone duality [73]; one example for such lattices are powerset lattices. Lattices generalize Boolean algebras by not requiring complement and distributivity in the first place.

Map Lattice. A map lattice ml(K, 𝕍) = (K → V, ⊑, ⊓, ⊔, ⊤, ⊥) is a lattice of elements that are maps, that is, the elements are functions that map from a set K of keys to a set V of values; the values of this map are elements of another lattice 𝕍 = (V, ⊑V, ⊓V, ⊔V, ⊤V, ⊥V). Such lattices are also known as function lattices [5, 32]. The inclusion relation ⊑ has the element (m1, m2) ∈ ⊑ if and only if ∀k ∈ K : m1(k)⊥ ⊑V m2(k)⊥. In the following, we rely on the function m(k)⊥, which equals m(k) if (k, ·) ∈ m and ⊥V otherwise, that is, it returns the value for a given key k from a map m, and the bottom element of the value lattice if no entry for the key is present. The meet ⊓ is defined by ⊓(m1, m2) = {(k, v1 ⊓V v2) | (k, v1) ∈ m1 ∧ v2 = m2(k)⊥}, the join ⊔ is defined by ⊔(m1, m2) = {(k, m1(k)⊥ ⊔V m2(k)⊥) | k ∈ keys(m1) ∪ keys(m2)}, the top element ⊤ is defined by ⊤ = {(k, ⊤V) | k ∈ K}, and the bottom element ⊥ is defined by ⊥ = {(k, ⊥V) | k ∈ K}. We define an image-join operator ⨆→ : 2^(K×E) → (K → E): Given a map M ⊆ K × E, with a set of keys K and a set of lattice elements E, the operator joins all tuples (k, e) ∈ M with the same key k into one tuple with a value that aggregates all value elements e, that is, ⨆→ M = {(k, ⨆{e | (k, e) ∈ M}) | k ∈ {k | (k, ·) ∈ M}}. Abstract Domains.
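As an aside, the map-lattice operations just defined—the bottom-defaulting lookup m(k)⊥, pointwise meet and join, and the image-join—can be sketched in Python; the dict-based representation, the choice of frozensets as the value lattice (join = union, meet = intersection, bottom = empty set), and all function names are our own illustration:

```python
# A sketch of the map-lattice operations over Python dicts, with
# frozensets as the value lattice. Names are illustrative only.

V_BOT = frozenset()  # bottom element of the value lattice

def lookup_bot(m, k):
    """m(k)_bot: the stored value, or the value-lattice bottom if absent."""
    return m.get(k, V_BOT)

def join(m1, m2):
    """Pointwise join over the union of both key sets."""
    return {k: lookup_bot(m1, k) | lookup_bot(m2, k)
            for k in m1.keys() | m2.keys()}

def meet(m1, m2):
    """Pointwise meet over the keys of m1; absent keys meet with bottom."""
    return {k: v & lookup_bot(m2, k) for k, v in m1.items()}

def image_join(pairs):
    """Image-join: fold all (key, element) tuples with the same key into
    one entry whose value joins all elements for that key."""
    out = {}
    for k, e in pairs:
        out[k] = out.get(k, V_BOT) | e
    return out

m1 = {"x": frozenset({1}), "y": frozenset({2})}
m2 = {"x": frozenset({1, 3})}
assert join(m1, m2) == {"x": frozenset({1, 3}), "y": frozenset({2})}
assert meet(m1, m2) == {"x": frozenset({1}), "y": frozenset()}
assert image_join([("k", frozenset({1})), ("k", frozenset({2}))]) \
    == {"k": frozenset({1, 2})}
```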
An abstract domain D = (ℂ, 𝔼, [[·]], ⟨⟨·⟩⟩) [27] is defined based on a tuple that consists of a lattice ℂ of the set of concrete elements C, a lattice 𝔼 of the set of abstract elements E, a denotation function [[·]], and an abstraction function ⟨⟨·⟩⟩. The set C consists of all possible interpretations of elements from the set of abstract elements E for a specific universe. The denotation [[e]] of an abstract element e, with [[·]] : E → 2^C, is the set of all its possible interpretations—as known from denotational semantics [2]. The abstraction ⟨⟨e⟩⟩ of an abstract element e results in a new abstract element e′ with [[e]] ⊆ [[e′]]. The abstraction ⟨⟨Ck⟩⟩ of a set of concrete elements Ck ⊆ C results in an abstract element e, with [[e]] = Ck. An abstraction ⟨⟨·⟩⟩π with widening produces an abstraction with an abstraction precision π, which can result in a widening. The abstraction precision π ∈ Π [59] defines the set of details that the resulting abstraction should maintain for sound reasoning. Two elements are called semantically equal, that is, e1 ≡ e2, if and only if [[e1]] = [[e2]] in the same universe. One element semantically implies another element, that is, e1 ⊨ e2, if and only if [[e1]] ⊆ [[e2]].

Before we present abstract transducers, we describe concepts to cope with sets of possibly exponentially many and infinitely long words symbolically. A word can express temporal or causal relationships between the letters of the word. We introduce concepts and techniques to deal with sets of words on an abstract level.
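To make the denotation and abstraction functions tangible, the following sketch instantiates a tiny abstract domain: a parity domain over the finite universe {0, …, 9}, with the lattice ⊥ ⊑ EVEN, ODD ⊑ ⊤. The domain, its element names, and the functions `denote` and `alpha` are our own illustration; note that here `alpha` returns the most precise covering element, so it may overapproximate:

```python
# A sketch of an abstract domain: a parity domain over a finite universe.
# All names (denote, alpha, EVEN, ...) are illustrative only.

UNIVERSE = set(range(10))
BOT, EVEN, ODD, TOP = "bot", "even", "odd", "top"

def denote(e):
    """[[e]]: the set of concrete interpretations of an abstract element."""
    return {BOT: set(),
            EVEN: {n for n in UNIVERSE if n % 2 == 0},
            ODD: {n for n in UNIVERSE if n % 2 == 1},
            TOP: set(UNIVERSE)}[e]

def alpha(concrete):
    """<<C_k>>: the most precise abstract element covering the set C_k."""
    for e in (BOT, EVEN, ODD, TOP):  # ordered from precise to coarse
        if concrete <= denote(e):
            return e

assert alpha({2, 4}) == EVEN
assert alpha(set()) == BOT
assert alpha({1, 2}) == TOP
# abstraction may lose precision: C_k is contained in [[alpha(C_k)]]
assert {1, 2} <= denote(alpha({1, 2}))
```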
We now discuss established terms that are relevant in the context of the terms that we introduce in the following sections. This helps to understand our terminology choices. Both the input alphabet and the output alphabet of an abstract transducer are characterized based on an abstract domain.
Abstract domains are a generic means for abstraction and provide various operations for manipulating and comparing abstract elements (entities) [26], and for mapping between concrete and abstract elements. Elements from a set Σ can be combined to form (possibly infinite) sequences σ̄ ∈ Σ∞ of those elements. We use the term word to denote sequences of elements that can be formed from other words by concatenation. Words are elements of a free monoid (semigroup) for which concatenation is the binary and associative operator, and the empty word (empty sequence) is the identity (neutral) element. A language is a set of words—and typically well-formed regarding some production rules. In a generic abstract domain, one abstract element maps to a set of concrete elements, which is reflected by the denotation (concretization) function [[·]]. That is, we can deduce that one abstract word represents a set of concrete words, and an abstract language maps to a set of concrete languages. A word, as mentioned earlier, establishes a temporal relationship between all its characters; each character has a semantic denotation on its own, that is, it maps to a set of entities. The expressiveness of words compared to their characters is dual to the expressiveness of linear temporal logic compared to propositional logic: A formula in propositional logic (interpreted for a specific universe) denotes a set of entities, whereas a formula in linear temporal logic denotes sequences of sets of entities (over time).
A set of words, that is, a language, provides sufficient expressiveness to describe a set of forks in words over time, for example, to describe a set of concurrent program executions, or for matching trees or (more general) graphs. That is, an abstract word, which maps to a set of concrete words, provides an abstraction with sufficient expressiveness to describe sets of linear-time concerns, and an abstract language, which represents a set of sets of words, provides expressiveness to describe sets of concerns that are expressible in branching-time logic. In the following, we restrict the discussion and presentation to abstract words and keep abstract languages for future work.

The foundation of abstract transducers is formed by the abstract word domain, a lattice-based abstract domain [26, 35] for mapping between abstract words and concrete words.
Definition 3.1 (Abstract Word). An abstract word v ∈ I is a symbolic representation of a set [[v]] ⊆ Σ∞ of concrete words over a concrete alphabet Σ, where the set I denotes all abstract words.

The relationship between an abstract word and the set of concrete words it represents, along with a means for abstraction, is defined by the abstract word domain:

Definition 3.2 (Abstract Word Domain). An abstract word domain is an abstract domain DW = (pw(𝕎), 𝕀, [[·]], ⟨⟨·⟩⟩) that has abstract words I as its abstract elements. The relationship between abstract words is defined based on the abstract word lattice 𝕀 = (I, ⊑, ⊓, ⊔, ⊤, ⊥). One abstract word v maps to a set of concrete words [[v]] ⊆ W, which is defined by the denotation function [[·]] : I → 2^W. The lattice of concrete words 𝕎 defines the relationship between elements from the set of concrete words W. Sets of concrete words are formed based on a powerset lattice pw(𝕎). The abstraction function ⟨⟨·⟩⟩ : 2^W → I transforms a given set of concrete words ŵ ⊆ W into an abstract word v, that is, v = ⟨⟨ŵ⟩⟩. The abstract epsilon word vϵ, with [[vϵ]] = {ϵ}, maps to the set that contains the empty word ϵ only.
The bottom element ⊥ of the abstract word lattice, also called the abstract bottom word, denotes an abstract word that maps to the empty set of concrete words, that is, [[⊥]] = ∅.

The abstraction mechanism that is provided by the abstract word domain is important for (1) constructing finite abstractions of collections with exponentially many or infinitely long words; it can be used to (2) check whether or not the analysis process ran into a fixed point, and (3) for increasing the sharing of the output that we produce based on abstract transducers. A problem that we have to deal with is the word coverage problem, that is, the question of whether or not a given abstract word va is covered by another abstract word vb, that is, if va ⊑ vb, where ⊑ is the inclusion relation of the abstract word lattice. The actual matching process, that is, the check for coverage, can be implemented based on quotienting: The abstract word domain must provide the possibility to compute left quotients [22] (Brzozowski derivatives) to match abstract words. Definition 3.3 (Left Quotient).
The left quotient [22] ·/· : I × I → I of an abstract word v ∈ I regarding an abstract word w ∈ I is defined as v/w = ⟨⟨{s̄ | p̄ ◦ s̄ ∈ [[v]] ∧ p̄ ∈ [[w]]}⟩⟩. It denotes the suffixes of v for which w contains prefixes.

Another fundamental operation when dealing with words is their concatenation, which is the binary operator of the free monoid Σ∗ that describes the set of words over an alphabet Σ. We extend this operator to abstract words, and with it to sets of words: Definition 3.4 (Concatenation).
The concatenation v1 ◦ v2 of a pair of abstract words results in an abstract word v∘ whose denotation [[v∘]] is the concatenation of all concrete finite words from the abstract word v1 with all (finite and infinite) concrete words from the abstract word v2. The concatenation σ̄1 ◦ σ̄2 of an infinite word σ̄1 with another word σ̄2 results in the infinite word σ̄1. That is, [[v1 ◦ v2]] = {σ̄1 ◦ σ̄2 | σ̄1 ∈ [[v1]] ∧ σ̄2 ∈ [[v2]]}.

To deal with abstract words, the notion of head and tail is important: Definition 3.5 (Head).
Given an abstract word v, the function head(v) : I → I denotes the head of an abstract word: The resulting abstract word represents the set of prefixes with length one, or formally [[head(v)]] = {h̄ | h̄ ◦ σ̄ ∈ [[v]] ∧ |h̄| = 1}. Definition 3.6 (Tail).
The tail of an abstract word is provided by the function tail(v) : I → I. A call v′ = tail(v) returns a new abstract word v′ that represents the set of suffixes that follow after the head. That is, [[tail(v)]] = {σ̄ | h̄ ◦ σ̄ ∈ [[v]] ∧ |h̄| = 1}, which equals tail(v) = v/head(v).

On several occasions, when reasoning about abstract words and their relationship, we need the full expressive power of a Boolean algebra. We can build on the duality between Boolean algebras, regular languages, and complemented and distributive lattices, which follows from the Stone duality [63, 73]. The abstract word lattice is dual to a Boolean algebra if and only if its meet ⊓ and join ⊔ are distributive over each other and if each element in the lattice has a complement within the lattice. One example of a lattice that is dual to a Boolean algebra is the powerset lattice; another one is the lattice of regular languages [20, 39]. Both lattices can describe sets of words and can thus be instantiated as the abstract word lattice of an abstract word domain. Given a lattice of regular expressions, the join ⊔ corresponds to the language union, the meet ⊓ to the language intersection, and the relation ⊑ describes language inclusion; the lattice is complemented since the complement of a regular language is still regular. Definition 3.7 (Abstract Word Complement).
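The left quotient, head, and tail operations defined above can be illustrated over a simple explicit abstract word domain in which an abstract word is a finite set of finite concrete words (tuples), so that the abstraction ⟨⟨·⟩⟩ is the identity; this explicit domain and the function names are our own illustration:

```python
# A sketch of left quotient, head, and tail over an explicit abstract
# word domain: an abstract word is a finite set of finite concrete
# words (tuples). Names are illustrative only.

def quotient(v, w):
    """Left quotient v/w: suffixes of words in v whose prefix is in w."""
    return {s[len(p):] for s in v for p in w if s[:len(p)] == p}

def head(v):
    """All length-one prefixes of the non-empty words in v."""
    return {s[:1] for s in v if len(s) >= 1}

def tail(v):
    """The suffixes that follow after the head; equals v / head(v)."""
    return quotient(v, head(v))

v = {("a", "b"), ("a", "c"), ("d",)}
assert head(v) == {("a",), ("d",)}
assert tail(v) == {("b",), ("c",), ()}
assert quotient(v, {("a",)}) == {("b",), ("c",)}
```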
Given an abstract word v, its complement ¬v denotes a set of concrete words such that ∀v ∈ I : ¬v ⊓ v = ⊥ and ∀v ∈ I : ¬v ⊔ v = ⊤, with ∀ā ∈ [[v]] : ∀b̄ ∈ [[¬v]] : ā ⊓c b̄ = ⊥c, where ⊓c and ⊥c are components of the concrete word lattice.

In case an abstract word lattice is dual to a Boolean algebra, the abstract words and their composition can also be described using Boolean operators, which have their duals in lattice theory: The join ⊔ corresponds to the logical disjunction ∨, the meet ⊓ corresponds to the logical conjunction ∧, and the complement corresponds to the logical negation ¬. A Boolean formula ϑ is equivalent to an abstract word v if and only if [[ϑ]] = [[v]].

An abstract word can be parameterized with a finite set of parameters β ⊆ B. A parameterized abstract word can take two roles: It (1) can capture (bind) values to the parameters during a matching process for a given input, and (2) values for the arguments can get passed explicitly (and act as a template). We use the term instantiation to denote the process of deriving an abstract word v′ from an abstract word v by assigning values to the parameters, with [[v′]] ⊂ [[v]]. Examples for different types of template words include invariant templates [53, 71]. The values that have been bound to the parameters of an abstract word are provided by the operator bounded : I → (B → V). We can bind values to parameters of an abstract word and derive a new abstract word with the operator bind : I × (B → V) → I. Binding of values to parameters (variable binding) was extensively studied in the past, for example, for rewriting systems [42, 60] and regular expressions [36].

This work introduces abstract transducers, a type of abstract machines that map between abstract input words and abstract output words.
Compared to established transducer concepts, intermediate languages are central (we still have a notion of accepted language): Informally speaking, the intermediate input language is the set of words for which the transducer can perform state transitions, and the set of words that are produced as output along these transitions is called the intermediate output language. To produce the intermediate output language, an abstract transducer operates presciently, that is, it can take a lookahead into account to decide whether to conduct a state transition or not—and with it produce an output. Words from the intermediate output language are intended to be used immediately, that is, as soon as they are produced while executing the transducer, which has several implications for the design of the algorithms that execute abstract transducers and that manipulate them—for example, to eliminate ϵ-moves. Both the input alphabet and the output alphabet are abstract and defined based on abstract word domains. One abstract word maps to a set of concrete words; the abstract domain provides means for mapping between these representations. This abstraction functionality enriches the possibilities to compute abstractions (widenings) of abstract transducers, which we use as a means of increasing the scope of sharing: one output is mapped to a larger set of inputs. Each transition of an abstract transducer is annotated with an abstract input word and an abstract output word—which correspond to the symbols of the input alphabet and the output alphabet of traditional transducers.
Consuming and producing abstract words instead of single concreteletters has several advantages that increase the generality of our approach: (1) it can be used forlookahead-matching, that is, instead of describing the input symbol to consume, also a sequenceof symbols that must follow can be described, (2) the abstract epsilon word v ϵ , with [[ v ϵ ]] = { ϵ } ,can be used to model the behavior of an ϵ -NFA [70] with a corresponding ϵ -closure and to modelautomata that do not produce outputs at all, and (3) relying on abstract words allows to produceand cope with output words of infinite length, which can be the result of ϵ -loops.Formally, we define an abstract transducer as: Definition 4.1 (Abstract Transducer). An abstract transducer T ∈ T is defined by following tuple: T = ( Q , D in , D out , ι , F , δ )• Control States Q . The finite set Q defines the control states in which the transducer can be in. • Abstract Input Domain D in . The abstract input domain is an abstract word domain thatmaps between abstract words I and concrete words over the concrete input alphabet Σ . Itprovides a denotation function [[·]] in : I → Σ ∗ to map between an abstract word and a setof concrete (and finite) words. We assume the lattice of abstract words to be distributiveand complemented, that is, to be dual [73] to a Boolean algebra . An abstract domain withlattice-valued regular expression [57] would be an example of an abstract input domain. • Abstract Output Domain D out . The abstract output domain is an abstract word domain thatdefines the abstract output words W and their relationship. Its denotation function [[·]] out : W → Θ ∞ maps between an abstract output word and the corresponding set of concreteoutput words over the concrete output alphabet Θ . An instance of an abstract output domaincould, for example, use antichains [1] for word inclusion checks. • Initial Transducer State ι ∈ Q → W . The (non-empty) map ι characterizes the initial transducerstate . 
The pairing of control states with outputs is needed since already the transitions that leave the initial state can be ϵ-moves that are annotated with an output, and it must be possible to eliminate those moves without affecting the semantics of the transducer. • Final Control States F ⊆ Q. The set F defines the final (accepting) control states. This set can be empty, for example, if the transducer is not intended to operate as a classical acceptor, that is, if the focus is on the intermediate languages. • Transition Relation δ ⊆ ∆. The transition relation defines the set of transitions that are possible between the different control states. Given a transducer transition (q, v, q′, w) ∈ δ, with ∆ = Q × I × Q × W, both the abstract transition input word v and the abstract transition output word w can be the abstract epsilon word, which is used to implement the functionality of an ϵ-NFA. The abstract input word v must never be the abstract bottom word, that is, [[v]] in ≠ ∅. Having the empty word as output signals that the matching process must stop for the given abstract input word—nevertheless, there can be another transition from the same state q that has an intersecting abstract input word which can cancel out this effect. The set of all transducers is denoted by T, with the subset T D in × D out ⊆ T of transducers that transduce from words from an abstract input domain D in to those from an abstract output domain D out. The set of control states Q of an abstract transducer implicitly contains two special states that are entered under certain conditions or are used by algorithms that operate on abstract transducers: Definition 4.2 (Trap State).
The trap state or inactivity-signaling state is a special control state qπ that can be entered to signal that the analysis should continue from that point on, but the transducer will no longer contribute to the analysis process. We assume that this state is implicitly present for each transducer, that is, qπ ∈ Q and (qπ, ⊤, qπ, wϵ) ∈ δ, with [[wϵ]] out = {ϵ}. The trap state is entered if no matching transitions are left, but the analysis should still continue from that point on. This state is important for configurations of analyses that track automata or transducers with a non-stuttering semantics, that is, that do not stay in the same state if no transition matches. We define another, similar, control state: Definition 4.3 (Bottom Control State). A bottom control state or unreachable control state is a special control state q⊥ ∈ Q that has no leaving transitions and is not an accepting state, that is, (q⊥, ·, ·, ·) ∉ δ and q⊥ ∉ F; we assume this state to be implicitly present in the set of control states Q of every transducer. The core of an abstract transducer is its transition relation, which defines the possible transitions between control states and the output to produce on these transitions. The result of a state transition is a new transducer state: Definition 4.4 (Transducer State). A transducer state ι ∈ J, with J = Q → W, is a map ι : Q → W from control states to abstract output words. Typically, a transducer state is the result of running the abstract transducer for a given input, starting in the initial transducer state ι ∈ J. We formalize abstract transducers as Mealy-style [56] finite-state machines. Nevertheless, a Moore-style [58] representation is also possible:
Definition 4.5 (Moore-style Abstract Transducer). A Moore-style abstract transducer is an abstract transducer that emits its outputs not on transitions between control states but on active control states. That is, it is defined by the tuple T Moore = (Q, D in, D out, Q0, F, δ, λ). This form of abstract transducer has a control transition relation δ ⊆ Q × I × Q and uses a state-output labeling function λ : Q → W to map abstract output words to control states. Furthermore, this style of abstract transducer has a set Q0 of initial control states. A Moore-style abstract transducer allows representing an abstract reachability graph easily. For this work, we prefer the Mealy-style formalization of abstract transducers because Mealy-style transducers require fewer states and fit well for sharing syntactic task artifacts (program fragments for weaving). After having defined the components of an abstract transducer, we continue in the following subsections with the description of their semantics. Annotating a transition of an abstract transducer with an abstract input word that maps to at least one concrete word that is longer than one letter specifies a lookahead. The possibility of conducting lookaheads is essential if a transition should produce a particular output only if the remaining word to process has a specific word as its prefix. Consider the following example:
Example 4.6.
Assume that the transducer is in control state q ∈ Q. Given a concrete input word ¯σ = ⟨σ1, . . . , σn⟩ ∈ Σ∗, a transducer transition (q, v, q′, w) ∈ δ, with [[v]] in = {⟨x, ‘e‘, ‘d‘⟩ | x ∈ Σ}, will only match if σ2 = ‘e‘ ∧ σ3 = ‘d‘ and will then produce the output w. We characterize the lookahead of a transducer transition by a number: Definition 4.7 (Transition Lookahead).
The lookahead ℓ(τ) ∈ N of a transition τ = (q, v, ·, ·) ∈ δ is ℓ(τ) = max{ |¯σ| | ¯σ ∈ [[v]] in } − 1. Definition 4.8 (Transducer Lookahead).
The lookahead of an abstract transducer ℓ(T) ∈ N is the maximal lookahead of any of its transitions. That is, ℓ(T) = max{ ℓ(τ) | τ ∈ δ }, where δ is the transition relation of transducer T.
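Definitions 4.7 and 4.8 can be sketched directly. This is an illustrative sketch under our own simplified encoding (abstract input words as finite sets of concrete words, written as strings); the helper names are ours.

```python
# Hedged sketch: the lookahead of a transition is the maximal length of a
# concrete word in the denotation of its abstract input word, minus one; the
# transducer lookahead is the maximum over all transitions.

def transition_lookahead(inp_denotation):
    # l(tau) = max{ |word| : word in [[v]]_in } - 1
    return max(len(word) for word in inp_denotation) - 1

def transducer_lookahead(delta):
    # l(T) = max{ l(tau) : tau in delta }
    return max(transition_lookahead(v) for (_, v, _, _) in delta)

delta = {
    ("q0", frozenset({"e"}), "q1", frozenset({"y"})),
    # any letter x followed by 'e', 'd': words of length 3, hence lookahead 2
    ("q0", frozenset({x + "ed" for x in "abc"}), "q2", frozenset({"z"})),
}
```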
Fig. 3. Matching
One can execute an abstract transducer on a rooted and directed graph instead of a particular input word—one word corresponds to a list or a sequence of letters. Each edge of the graph that we match is labeled with a letter. Words are formed by concatenating all letters on the graph edges that get traversed during the matching process, starting from the root node of the graph. Figure 3 provides an intuition of the matching process. In this work, we restrict the graph matching process to disjunctive tree matching, defined by:
Definition 4.9 (Disjunctive Tree Matching).
A tree matching procedure is called disjunctive if it does not require several input branches that follow from a particular point to satisfy specific criteria. That is, only one of the input words that follow (on which the lookahead is conducted) must satisfy a given criterion. To allow for matching based on the full expressiveness of regular tree expressions (where several of the input words might have to satisfy a specific criterion), the abstract transducer's abstract input domain has to be lifted from an abstract word domain to an abstract language domain—see Sect. 3.1. We keep this extension of abstract transducers for future work.
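The existential nature of disjunctive tree matching can be sketched as follows; the tree encoding (nested dictionaries mapping letters to subtrees) and the function name are our own illustrative choices.

```python
# Minimal sketch of disjunctive tree matching: a lookahead word matches if at
# least ONE branch from the current node spells it out -- an existential, not
# a universal, condition over the input branches.

def disjunctive_match(tree, word):
    if word == "":
        return True  # the whole lookahead word was found on some branch
    return any(letter == word[0] and disjunctive_match(subtree, word[1:])
               for letter, subtree in tree.items())

# a root with the branches a->b and a->c->d
tree = {"a": {"b": {}, "c": {"d": {}}}}
```

A conjunctive variant (all branches must satisfy a criterion) would replace `any` with `all`, which is exactly the expressiveness the text defers to an abstract language domain.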
An established practice [48, 70] in automata theory and its applications is to use automata with transitions that are annotated with an empty-word symbol ϵ. This was, first and foremost, introduced as a convenience feature to describe an automaton and its transition relation in a more concise fashion. Abstract transducers allow annotating transitions with the abstract epsilon word vϵ to provide similar semantics and convenience: Definition 4.10 (ϵ-Move). An ϵ-move (or ϵ-transition) is an automaton transition (or transducer transition) (q, v, q′, w) ∈ δ that is annotated with the abstract epsilon word vϵ as its input, that is, [[v]] in = [[vϵ]] in = {ϵ}. Some algorithms might not be able to deal with transducers that have ϵ-moves—or they might be more sophisticated in their presence—and operate only on transducers from which all ϵ-moves have been eliminated. We define abstract transducers without ϵ-moves as: Definition 4.11 (Input-ϵ-Free). An abstract transducer is said to be input-ϵ-free if it does not have any transition based on an ϵ-move, that is, (·, vϵ, ·, ·) ∉ δ, with [[vϵ]] in = {ϵ}. The presence of ϵ-moves can lead to loops thereof, which is vital for expressing complex outputs, for example, to describe the control flow of Turing-complete programs—assuming that each move emits a program operation to conduct as output. Definition 4.12 (ϵ-Loop). An ϵ-loop is any sequence of ϵ-moves that starts in a control state qk and could include this control state qk infinitely often. More formally, an ϵ-loop is a sequence ¯τ = ⟨τ1, . . . , τn⟩ ∈ ∆∞ of ϵ-moves that is well-founded in the transition relation δ and there exists a transducer transition τi = (q, ·, ·, ·) ∈ ¯τ for which the source state q is precisely the destination state q′ of a transducer transition τj = (·, ·, q′, ·) ∈ ¯τ, with i ≤ j. From the definition of ϵ-moves follows the definition of the ϵ-closure [70].
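Definitions 4.11 and 4.12 lend themselves to small executable checks. The sketch below uses our own simplified encoding (transitions as tuples, abstract words as finite string sets, the epsilon word denoting {""}); it is illustrative only.

```python
# Hedged sketch: check input-eps-freeness (Definition 4.11) and detect
# eps-loops (Definition 4.12) by a depth-first search over eps-edges only.

EPSILON = frozenset({""})  # denotation of the abstract epsilon word

def is_input_eps_free(delta):
    # no transition consumes the abstract epsilon word
    return all(v != EPSILON for (_, v, _, _) in delta)

def has_eps_loop(delta):
    # build the subgraph of eps-edges and look for a cycle in it
    eps_edges = {}
    for (q, v, succ, _) in delta:
        if v == EPSILON:
            eps_edges.setdefault(q, set()).add(succ)

    def dfs(q, on_stack, finished):
        on_stack.add(q)
        for succ in eps_edges.get(q, ()):
            if succ in on_stack:
                return True  # back edge: an eps-loop exists
            if succ not in finished and dfs(succ, on_stack, finished):
                return True
        on_stack.discard(q)
        finished.add(q)
        return False

    return any(dfs(q, set(), set()) for q in list(eps_edges))

delta = {
    ("q0", frozenset({"a"}), "q1", frozenset({"p"})),
    ("q1", EPSILON, "q2", frozenset({"t"})),  # eps-loop q1 -> q2 -> q3 -> q1
    ("q2", EPSILON, "q3", frozenset({"u"})),
    ("q3", EPSILON, "q1", frozenset({"v"})),
}
```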
Intuitively speaking, the ϵ-closure of a control state q is the set of control states that become instantly and simultaneously (in parallel) active if state q becomes active. Definition 4.13 (Epsilon Closure).
The epsilon closure epsclosure : Q → 2^Q of a state q ∈ Q is the set epsclosure(q) ⊆ Q of states that can be reached transitively from state q by only following ϵ-moves [70]. The bottom state q⊥ is added if the epsilon closure includes an ϵ-loop from which no control state with a leaving non-ϵ-move is reachable. The transition relation of an abstract transducer can contain sequences {(q1, vϵ, q2, w1), (q2, vϵ, q3, w2)} ⊆ δ of ϵ-moves, but not each control state that is reached within such a sequence might have non-ϵ-moves leaving in the transition relation. We therefore introduce the notion of closure termination states: Definition 4.14 (Closure Termination States).
The closure termination states closureterm : Q → 2^Q of a given state q are (1) the states in the epsilon closure epsclosure(q) from which no ϵ-move leaves and (2) the states within the closure that are accepting, that is, closureterm(q) = { q′ | q′ ∈ epsclosure(q) ∧ (q′, vϵ, ·, ·) ∉ δ } ∪ (epsclosure(q) ∩ F). Each transducer transition between the control states from an ϵ-closure can be mapped to a set of closure termination states: Definition 4.15 (Termination State Mapping).
The termination state mapping is a map ∆Ω : ∆ → 2^Q that maps a given transducer transition to the set of closure termination states that are reachable. Given a control state q ∈ Q, the result is the empty set ∅ if no ϵ-move leaves state q; it is the set {q⊥} if there is no other termination state. Since each transition within an epsilon closure can also produce an output, we introduce the notion of concrete language on termination. This notion reflects with which output words the different closure termination states can be reached:
Definition 4.16 (Concrete Language on Termination).
The concrete language on termination Ω : Q × Q → Θ∞ for a given pair (q, qΩ) describes the concrete output language (a set of concrete words) that can be produced starting in control state q and terminating in a closure termination state qΩ ∈ closureterm(q). More formally, let ˆτ = {¯τ1, . . .} ⊆ ∆∞ be the set of all well-founded sequences of transducer transitions between control state q and the termination state qΩ, with ¯τi = ⟨τ1, . . .⟩ and τi = (q, vi, q′, wi) ∈ δ. The concrete output language [[¯τi]] of a sequence ¯τi is the concatenation [[w1]] out ◦ . . . of the concretizations of all abstract output words wi that are emitted along it. That is, the concrete output language Ω(q, qΩ) is the union ⋃ { [[¯τi]] | ¯τi ∈ ˆτ }. Definition 4.17 (Concrete Closure Language).
The concrete closure language Ω(q) ⊆ Θ∞ of a given control state q and its ϵ-closure is the set of concrete output words that is produced while making transitions along the ϵ-moves between states in the closure. More precisely, it is the join of the concrete languages on termination, that is, Ω(q) = ⋃ { Ω(q, qΩ) | qΩ ∈ closureterm(q) }. In our applications of abstract transducers, we use the (anonymous) states and transitions in the epsilon closure as a tool for expressing relational outputs. Please note that also ϵ-moves that lead to a dead end are relevant and must not be eliminated—as is done for some applications [31]—because the output might be relevant for the analysis task, and for the soundness of the produced result, for which the transducer is executed. Fig. 4. A transducer with an ϵ-loop. Example 4.18.
Figure 4 illustrates an example transducer: the ϵ-closure of control state q0 is the set epsclosure(q0) = {q0, q1, q2, q3, q4, q⊥}; for state q4, the closure epsclosure(q4) = {q4} does not contain additional states. State q0 has the set of closure termination states closureterm(q0) = {q4, q⊥}, and state q1 has closureterm(q1) = {q⊥}, that is, no other termination state is reachable. The transitions between the states {q1, q2, q3} form an ϵ-loop. Given a control state q ∈ Q, the semantics of ϵ-moves implies that with reaching state q, actually all states in Qt = closureterm(q) are reached immediately. That is, also all output on the transitions from q to a state in Qt is produced immediately, resulting in—possibly exponentially many and infinitely long—words ⊆ Θ∞ over the output alphabet Θ. The previous section describes the epsilon closure of abstract transducers; in contrast to established transducer concepts, we also address ϵ-moves that are annotated with non-empty outputs, and use them as a tool to express complex output languages, with possibly exponentially many and infinitely long words, in a convenient fashion. When executing or reducing (minimizing) abstract transducers, means for collecting, aggregating, and possibly abstracting the output on these transitions are needed. Given a control state q ∈ Q, the goal of this summarization process is to provide an abstract output word wΩ ∈ W for each of its closure termination states qΩ ∈ closureterm(q) that overapproximates the concrete closure language, that is, Ω(q, qΩ) ⊆ [[wΩ]] out—which can lead to a loss of information. The computation of this closure is done in a corresponding operator: Definition 4.19 (Abstract Output Closure).
The abstract output closure of a given control state q ∈ Q is a finite overapproximation of the concrete closure language of each of its closure termination states; it is a map of closure termination states of q to abstract output words, which summarizes the corresponding closure output languages: abstclosure : (Q × W) → (Q → W). A call abstclosure(q, w), with an initial abstract output word w, returns a map {(qt, wt) | qt ∈ closureterm(q) ∧ [[wt]] out ⊇ [[w]] out ◦ Ω(q, qt)}. We extend the abstract output closure operator abstclosure to sets: Definition 4.20 (Abstract Output Closure).
The abstract output closure of a given set of transducer states, ˆabstclosure : 2^(Q×W) → (Q → W), is defined as ˆabstclosure(S) = ⨆ {(qΩ, wΩ) | (q, w) ∈ S ∧ (qΩ, wΩ) ∈ abstclosure(q, w)}, where the join ⨆ combines the resulting entries per termination state. Actual implementations of an abstract output closure operator can be provided, for example, based on abstract interpretation or based on techniques from automata theory. Even transducers can be used [66] to compute abstractions of languages, in our case, the concrete output languages that are produced in the ϵ-closure. We give two examples of implementations: Joining Closure.
The first abstract output closure operator abstclosure⊔ joins all abstract output words that can be found on transitions in the epsilon closure from control state q that are mapped to the same closure termination state. Let us assume that there is an operation closuretrans : Q × Q → 2^∆ that, given a pair of control states q, qΩ ∈ Q, returns all transitions from the transition relation δ that are in the epsilon closure epsclosure(q) and are mapped to the closure termination state qΩ. Then, we can define the closure operator as follows: abstclosure⊔(q, w) = {(qΩ, w ⊔out ⨆out { w′ | (·, ·, ·, w′) ∈ closuretrans(q, qΩ) }) | qΩ ∈ closureterm(q)}. This operator produces an overapproximation of the concrete output language. The resulting abstraction preserves neither information on the flow nor path information. Regular Closure.
Another example of an output closure operator is abstclosure∞. Here, we assume that the abstract output words can be described based on an abstract domain of ∞-regular languages [55], with a corresponding lattice thereof. Rules for transforming automata into regular expressions can be applied [55]: the result for the transducer in Fig. 4 is abstclosure∞(q0, w) = {(q4, w ◦ wb), (q⊥, w ◦ wa ◦ (wc ◦ wf ◦ wd)^ω)}. This type of output closure is lossless. Nevertheless, not all applications require this level of detail. We now define runs of abstract transducers and illustrate how they are conducted for given inputs. All runs of an abstract transducer start from the initial transducer state:
Definition 4.21 (Abstract Transducer Run). A run of an abstract transducer on a concrete input word ¯σ = ⟨σ1, . . . , σn⟩ ∈ Σ∗ and a lookahead ˆσ ⊆ Σ∗ is a sequence of transducer state transitions ι0 −v1/w1→ . . . −vn/wn→ ιn, also denoted by ⟨ι0, . . . , ιn⟩ in case the actual transducer transitions are irrelevant for the discussion. A run always starts in the initial transducer state ι0 ∈ J, is well-founded in the transition relation δ, and all transitions along the input match, that is, the quotient (⟨⟨{⟨σi, . . . , σn⟩} ◦ ˆσ⟩⟩ in)vi ≠ ⊥ does not result in the abstract bottom word. Before we continue to define feasible and accepting runs of an abstract transducer, we define the abstract output of a run: Definition 4.22 (Abstract Run Output).
The abstract output of a run ι0 −v1/w1→ . . . −vn/wn→ ιn is the concatenation of the subsequent abstract output words w◦ = w0 ◦ w1 ◦ . . . ◦ wn. The abstract output word w0 is one abstract output word from the initial transducer state, that is, there exists a pair (·, w0) ∈ ι0. The output of an abstract transducer run is essential for the definition of feasible transducer runs: Definition 4.23 (Feasible Run).
A run is called feasible if and only if its abstract output w◦ is not the bottom element ⊥, that is, if and only if [[w◦]] out ≠ ∅. The set of all concrete inputs (with lookaheads) that result in a feasible run on an abstract transducer T defines the function feasible T : Σ∗ × 2^(Σ∗) → B. Abstract transducers can also operate as acceptors and define a set of inputs to be accepted. We first define the notion of an accepting run and define the accepted input language later: Definition 4.24 (Accepting Run).
A run ⟨ι0, . . . , ιn⟩ is called accepting if it is feasible and its last transducer state contains an accepting (final) control state, that is, if and only if there is a pair (qn, wn) ∈ ιn, with qn ∈ F and wn ≠ ⊥. In general, an abstract transducer is a nondeterministic automaton; nevertheless, it can be deterministic if it satisfies the following criterion: Definition 4.25 (Deterministic Abstract Transducer).
We call an abstract transducer deterministic if and only if it does not allow a run ¯ι = ⟨ι0, . . . , ιn⟩ with a transducer state ιi ∈ ¯ι that consists of more than one element, that is, ∀ιi ∈ ¯ι : |ιi| ≤ 1. Given a concrete input word ¯σ ∈ Σ∗ based on the concrete input alphabet Σ and a set ˆσ ⊆ Σ∗ of words that can follow this word (used for the lookahead), which output does the transducer produce, and does processing the word terminate in an accepting control state? Since a concrete input word can be represented as an abstract word, and we consider this the more general case, we describe runs based on abstract input words: a given concrete input word ¯σ ∈ Σ∗ can be transformed into an abstract input word by applying the abstraction operator such that we end up with an abstract word v = ⟨⟨{¯σ}⟩⟩ in, with [[v]] in = {¯σ}. Definition 4.26 (Run).
The function run T : Q × W × I × I → (Q → W) conducts a run starting from a control state q ∈ Q, an initial abstract output word w ∈ W, an abstract input word v ∈ I, with v ≠ ⊥, and an abstract word vℓ ∈ I that describes the lookahead that must be satisfied: run T (q, w, v, vℓ) = {(q, w)} if v = vϵ, and otherwise run T (q, w, v, vℓ) = ⨆ { run T (q′′, w ◦ w′′, tail(v), vℓ) | (q, vτ, q′, w′) ∈ δ ∧ (q′′, w′′) ∈ abstclosure(q′, w′) ∧ (v ◦ vℓ)vτ ≠ ⊥ ∧ (head(v))head(vτ) ≠ ⊥ }. The function run terminates its recursion if the abstract input word is the bottom element. The recursive call to run is done for the tail of the abstract input word—which ensures termination—in case a transition that leaves the given control state q matched the input. We extend this function to ˆrun T : (Q → W) × I × I → (Q → W), which starts from a transducer state, and we define it as follows: ˆrun T (ι, v, vℓ) = ⨆ { run T (q, w, v, vℓ) | (q, w) ∈ ι }. The transducer state to start from is omitted if it is the abstract transducer's initial transducer state ι, that is, ˆrun T (v, vℓ) = ˆrun T (ι, v, vℓ). Given a concrete input word ¯σ ∈ Σ∗ and a corresponding set of concrete words ˆσ ⊆ Σ∗ for the lookahead, we write ˆrun T (¯σ, ˆσ) as an abbreviation for ˆrun T (ι, ¯σ, ˆσ), which is an abbreviation for ˆrun T (ι, ⟨⟨{¯σ}⟩⟩ in, ⟨⟨ˆσ⟩⟩ in). Contrary to other types of finite-state transducers [29], our abstract transducers distinguish between two types of input languages: the intermediate input language and the accepted input language.
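The recursion behind run T can be illustrated with a heavily simplified sketch. The simplifications are ours: no lookahead, no abstract quotienting, an input-ϵ-free transducer, and transition inputs that denote single letters only; abstract output words are finite string sets, so concatenation is elementwise.

```python
# Hedged sketch of run_T: consume one letter per step, accumulate outputs
# along all matching transitions, and return a transducer state, i.e., a map
# from reached control states to the accumulated sets of output words.

def run(delta, state, out, word):
    if word == "":
        return {state: set(out)}  # recursion ends: input fully consumed
    result = {}
    for (q, v, succ, w) in delta:
        if q == state and word[0] in v:  # v: set of single-letter strings
            sub = run(delta, succ, {a + b for a in out for b in w}, word[1:])
            for q2, w2 in sub.items():
                result.setdefault(q2, set()).update(w2)
    return result  # empty map: no feasible run for this input

delta = {
    ("q0", frozenset({"a"}), "q1", frozenset({"x"})),
    ("q1", frozenset({"b"}), "q2", frozenset({"y", "z"})),
}
```

Nondeterminism shows up as a result map with more than one entry, mirroring the determinism criterion |ιi| ≤ 1 of Definition 4.25.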
Definition 4.27 (Intermediate Input Language).
The intermediate input language L in(T) ⊆ Σ∗ × 2^(Σ∗) of an abstract transducer T is the set of concrete input words for which the transducer can conduct feasible runs starting from the initial transducer state ι: L in(T) = {(¯σ, ˆσ) | feasible T (¯σ, ˆσ) ∧ ¯σ ∈ Σ∗ ∧ ˆσ ⊆ Σ∗}. It follows that each prefix ¯σp ⪯ ¯σ of each word ¯σ ∈ L in(T) is also an element of the intermediate input language, that is, ¯σp ∈ L in(T). The accepted input language reflects the established notion of input language, which is based on the set of words that can reach a final control state: Definition 4.28 (Accepted Input Language).
The accepted input language L acc ⊆ L in is the subset of the intermediate input language for which an accepting control state q ∈ F is reached: L acc(T) = {(¯σ, ˆσ) ∈ L in(T) | (q, ·) ∈ ˆrun T (¯σ, ˆσ) ∧ q ∈ F}. Besides the accepted input language, another characteristic of an abstract transducer is its set of transductions and its set of accepting transductions:
Definition 4.29 (Transductions).
The set of transductions
T(T) ⊆ Σ∗ × 2^(Σ∗) × 2^(Θ∞) of an abstract transducer T characterizes both its concrete input language and the outputs that are produced for it. One element (¯σ, ˆσ, ¯Θ) ∈ T(T) from this set is a tuple that consists of a word prefix ¯σ that is consumed by a run of the transducer, a set of concrete words ˆσ ⊆ Σ∗ on which the lookahead is conducted and that remain to be consumed by the next transitions of the transducer, and the set of concrete output words ¯Θ ⊆ Θ∞ that are emitted with the consumption of word ¯σ—see the definition of ˆrun T for more details: T(T) = {(¯σ, ˆσ, [[w]] out) | (¯σ, ˆσ) ∈ L in(T) ∧ (q, w) ∈ ˆrun T (¯σ, ˆσ)}. Definition 4.30 (Accepting Transductions).
The set of accepting transductions T acc(T) ⊆ T(T) is the subset of the transductions of a given abstract transducer T that are produced by accepting runs: T acc(T) = {(¯σ, ˆσ, [[w]] out) | (¯σ, ˆσ) ∈ L in(T) ∧ (q, w) ∈ ˆrun T (¯σ, ˆσ) ∧ q ∈ F}. The number of accepting transductions is greater than or equal to the number of accepted input words, that is, |L acc(T)| ≤ |T acc(T)|, because there can be independent concrete output languages for one concrete input (¯σ, ˆσ) ∈ Σ∗ × 2^(Σ∗). In combination, the set of transductions and the set of accepting transductions determine whether two abstract transducers are equivalent to each other: Definition 4.31 (Equivalence).
Two abstract transducers T1, T2 ∈ T are called equivalent to each other, T1 ≡ T2, if and only if both have the same set of transductions and the same set of accepting transductions, that is, if and only if T(T1) = T(T2) and T acc(T1) = T acc(T2). Algorithm 1 elim(Tϵ) Input:
Abstract transducer T ϵ = ( Q , D in , D out , ι , F , δ ) ∈ T Output:
Abstract transducer T ∈ T, with Tϵ ≡ T
// Sentinel transitions for the initial transducer state, with vϵ ≠ vstart
δϵ = δ ∪ {(qs, vstart, q, w) | (q, w) ∈ ι}
// Shortcut ϵ-moves to their termination states
δ′ = {(q, v, q′′, w′′) | τ = (q, v, q′, w) ∈ δϵ ∧ v ≠ vϵ ∧ (q′′, w′′) ∈ abstclosure({(q′, w)})}
// Reconstruct a new initial transducer state
ι′ = {(q, w) | (·, v, q, w) ∈ δ′ ∧ v = vstart}
// Reassemble the components to a new abstract transducer
return (Q, D in, D out, ι′, F, δ′)
Based on the notion of equivalence, we can define different operations, for example, reduction or ϵ-elimination. We start by defining a more fundamental one: the union of two abstract transducers. The union is constructed similarly to the union of ϵ-NFAs, with the exception that no ϵ-moves are added; we take advantage of the fact that the initial transducer state is a set: Definition 4.32 (Union).
Given two abstract transducers T1, T2 ∈ T D in × D out that both have the same abstract input domain D in and the same abstract output domain D out, such that T1 = (Q1, D in, D out, ι1, F1, δ1) and T2 = (Q2, D in, D out, ι2, F2, δ2). The union ∪ : T × T → T of two abstract transducers results in a new abstract transducer T∪ = T1 ∪ T2 that maintains exactly the union of the sets of transductions and the union of the sets of accepting transductions, that is, T(T∪) = T(T1) ∪ T(T2) and T acc(T∪) = T acc(T1) ∪ T acc(T2). We define the union as T∪ = ∪(T1, T2) = (Q1 ∪ Q2, D in, D out, ι1 ∪ ι2, F1 ∪ F2, δ1 ∪ δ2). Eliminating ϵ-Moves. Since ϵ-moves are considered to be a convenience feature, eliminating them without losing any output must be possible—that is, without altering the semantics of the transducer. The ϵ-closure can allow sequences of state transitions of infinite length, that is, a means to encode this infinite information into one (finite) output symbol is needed. An algorithm for computing abstract output closures provides such a means. For the design of an ϵ-elimination algorithm, it is important to note that all states in the ϵ-closure of a control state become active when it is entered. This implies that the output that is produced along these ϵ-moves must then also be emitted: existing algorithms for ϵ-elimination are not applicable to abstract transducers. An algorithm for eliminating ϵ-moves from an abstract transducer Tϵ must ensure that the resulting transducer T is equivalent, Tϵ ≡ T. Please note that stuttering transitions must be made explicit and must be considered to allow a sound elimination of ϵ-moves. Algorithm 1 is our approach for eliminating ϵ-moves from an abstract transducer. The algorithm constructs a new transition relation, from which all ϵ-moves are removed by adding shortcuts to the closure termination states and concatenating the corresponding closure output language. Proposition 4.33.
Given an abstract transducer Tϵ, all its ϵ-moves can be eliminated without affecting its semantics, that is, without affecting either the set of transductions or the set of accepting transductions. The abstract transducer Tϵ can be transformed into an input-ϵ-free transducer T, with Tϵ ≡ T. Proof. We prove the proposition by providing an algorithm that conducts this transformation while maintaining the set of transductions and the set of accepting transductions: given an abstract transducer Tϵ that has ϵ-moves, Algorithm 1—which we implicitly parameterize with the output closure operator abstclosure∞—produces an abstract transducer T that is input-ϵ-free and satisfies T(Tϵ) = T(T) and T acc(Tϵ) = T acc(T). (1) The transition relation δ′, and with it the resulting transducer T, is input-ϵ-free because only non-ϵ-moves are added to the transition relation. (2) The set of closure termination states, for which abstclosure∞ provides a pairing with the corresponding output closure language, contains all accepting states (Definition 4.14), that is, all moves to accepting states are maintained, and with them the set of accepting transductions. (3) The set of transductions is maintained: the output from the epsilon closures, that is, the closure termination languages, is concatenated onto the transitions to the closure termination states. □
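The shortcut construction behind Algorithm 1 can be sketched as follows. This is a hedged, simplified sketch under our own encoding: it handles acyclic ϵ-closures only (ϵ-loops would require an ω-regular output domain, as with abstclosure∞), models abstract words as finite string sets, and represents a transducer as a plain tuple.

```python
EPS = frozenset({""})  # denotation of the abstract epsilon word

def eliminate_eps(states, iota, final, delta):
    # Sketch of elim(T_eps): shortcut every non-eps-move (and the initial
    # transducer state) to the closure termination states of its target,
    # concatenating the outputs collected along the eps-moves.
    def closure(q, w):
        # map: closure termination state -> accumulated concrete outputs
        out, stack = {}, [(q, frozenset(w))]
        while stack:
            cur, acc = stack.pop()
            eps_moves = [t for t in delta if t[0] == cur and t[1] == EPS]
            if not eps_moves or cur in final:  # termination state (Def. 4.14)
                out.setdefault(cur, set()).update(acc)
            for (_, _, dst, wout) in eps_moves:
                stack.append((dst, frozenset(x + y for x in acc for y in wout)))
        return out

    new_delta = set()
    for (q, v, succ, w) in delta:
        if v != EPS:  # keep only non-eps-moves, shortcut their targets
            for qt, wt in closure(succ, w).items():
                new_delta.add((q, v, qt, frozenset(wt)))
    new_iota = {}  # rebuild the initial transducer state the same way
    for q, w in iota.items():
        for qt, wt in closure(q, w).items():
            new_iota.setdefault(qt, set()).update(wt)
    return states, new_iota, final, new_delta

# q0 -eps/{"a"}-> q1 -"b"/{"c"}-> q2: the eps-move is folded into iota
delta = {
    ("q0", EPS, "q1", frozenset({"a"})),
    ("q1", frozenset({"b"}), "q2", frozenset({"c"})),
}
```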
Given the transducer in Fig. 4, Algorithm 1 proceeds as follows: First, we extend the transition relation with sentinels and get δϵ = {(q0, vϵ, q1, wa), (q1, vϵ, q2, wc), (q2, vϵ, q3, wf), (q3, vϵ, q1, wd), (q0, vϵ, q4, wb), (q4, v, q5, we), (qs, vstart, q0, vϵ)}. In the next step, ϵ-moves are eliminated by adding transitions to the closure termination states and concatenating the corresponding closure output languages; the result is a new transition relation δ′ = {(qs, vstart, q⊥, w∗), (qs, vstart, q4, wb), (q4, v, q5, we)}, with w∗ = wa ◦ (wc ◦ wf ◦ wd)^ω. Then, the initial transducer state is reconstructed from the relation δ′ and we get ι′ = {(q⊥, w∗), (q4, wb)}. Finally, the transducer is reassembled and we get the transducer shown in Fig. 3. A typical operation when dealing with finite-state machines is the transformation of a nondeterministic automaton into a deterministic one. This is not possible for abstract transducers in general: the control-flow structure of the state transitions within the ϵ-closure describes different information flows—that is, sets of output words that reach different closure termination states—as its semantics, which is not the case for classical automata and transducers. For example, a state-space splitting might be intended based on the information of the emitted output—different outputs for the same input that lead to different control states. That is, different closure termination states, which can be accepting states, can have different closure termination languages associated with them; this separation must be maintained—which is also reflected in our definition of transducer equivalence. Proposition 4.35. Not every nondeterministic abstract transducer T can be transformed into an equivalent deterministic transducer Td, with T ≡ Td. Proof. We prove the proposition by counterexample—assuming that all abstract transducers can be determinized.
Given an abstract transducer T1 with the set of initial transducer states ι = {(q1, w1), (q2, w2)} and the relation δ = {(q1, v1, q3, w3), (q2, v2, q4, w4)}, with ⟨a⟩ ∈ [[v1]] in and ⟨b⟩ ∈ [[v2]] in, it has the set of transductions T(T1) = {(ϵ, {ϵ}, [[w1]] out), (ϵ, {ϵ}, [[w2]] out), (⟨a⟩, {ϵ}, [[w1 ◦ w3]] out), (⟨b⟩, {ϵ}, [[w2 ◦ w4]] out)}. A determinized version would have an initial transducer state with only one element, that is, the initial transducer state can be either ι = {(q0, w1 ⊔ w2)} of a transducer T2 or ι = {(q0, ϵ)} of a transducer T3. Both are wrong since transducer T1 intended an initial state-space splitting with different output languages. Transducer T3 does not have the transduction (ϵ, {ϵ}, [[w1 ⊔ w2]] out) ∈ T(T2). The transductions of T2 are not equal to those of T1, since T(T2) = {(ϵ, {ϵ}, [[w1 ⊔ w2]] out), (⟨a⟩, {ϵ}, [[(w1 ⊔ w2) ◦ w3]] out), (⟨b⟩, {ϵ}, [[(w1 ⊔ w2) ◦ w4]] out)} ≠ T(T1). □ Proposition 4.36.
An abstract transducer needs a set of initial transducer states to allow for an elimination of ϵ-moves. That is, a set of initial transducer states with |ι| = 1 is not sufficient for all ϵ-input-free transducers while maintaining their semantics.

Fig. 5. Examples for different types of abstractions. Abstractions are applied step-wise from left to right: (a) we start with the unabstracted transducer, (b) we conduct a state abstraction by merging two states, (c) we abstract the input alphabet, (d) we abstract the output alphabet.

Proof. Implication of the proof for Proposition 4.35. □

Abstracting (widening) an abstract transducer is a means to provide its output for a larger set of input words, that is, a mechanism to increase sharing and with it the potential of reuse. That is, we explicitly rely on the fact that abstracting an automaton can widen its input language, and introduces non-determinism [4]. We discuss different types of abstractions that are relevant for this work; Fig. 5 provides examples for abstractions. Approaches for abstracting classical automata and symbolic automata have been presented in the past [23, 65], which can also be adopted for abstract transducers.

Given an abstract transducer T, the abstraction operator ⟨⟨·⟩⟩_π : T → T with widening, with the abstraction precision π as an implicit parameter that determines the level of abstraction to achieve, has to guarantee that the resulting abstraction overapproximates both the set of transductions and the set of accepting transductions:

Definition 5.1 (Overapproximation).
An abstract transducer T′ overapproximates another abstract transducer T, which we denote by T ⊨ T′, if and only if T′ overapproximates both the set of transductions and the set of accepting transductions of transducer T, that is, T ⊨ T′ if and only if ∀(σ̄, σ̂, θ̂) ∈ T_acc(T) : ∃(σ̄′, σ̂′, θ̂′) ∈ T_acc(T′) : σ̄ = σ̄′ ∧ σ̂ = σ̂′ ∧ θ̂ ⊆_C θ̂′ and ∀(σ̄, σ̂, θ̂) ∈ T(T) : ∃(σ̄′, σ̂′, θ̂′) ∈ T(T′) : σ̄ = σ̄′ ∧ σ̂ = σ̂′ ∧ θ̂ ⊆_C θ̂′. The relation ⊆_C denotes the inclusion relation of the concrete language lattice of the output language domain.

The classical approach to abstract an automaton is state abstraction, that is, to merge several control states into one [62]. Please note that this approach can also be used for abstracting output closures, which is the case if control states within an ϵ-closure are merged:

Definition 5.2 (Control State Merge). A state merge for a given abstract transducer T is conducted by merging a set of its control states Q_m ⊆ Q into one new state q_m, and results in a new abstract transducer T_m. We denote this process by the operator qmerge : T × 2^Q → T, that is, T_m = qmerge(T, Q_m). The actual definition of the operator qmerge is given by Algorithm 2.

Proposition 5.3. Given an abstract transducer T, a transformation T′ = qmerge(T, Q_m) results in a new abstract transducer T′, with T ⊨ T′, that is, transducer T′ overapproximates transducer T.

Proof. We have to show that (1) each input (σ̄, σ̂) ∈ Σ* × Σ* that leads to a feasible run ι = r̂un_T(σ̄, σ̂) on T also leads to a feasible run ι′ = r̂un_T′(σ̄, σ̂) on transducer T′, and for

Algorithm 2 qmerge(T, Q_m)
Input:
Abstract transducer T = (Q, D_in, D_out, ι, F, δ), set Q_m ⊆ Q of states to merge
Output:
Abstract transducer T′, with T ⊨ T′
Variables:
Control state q_m that is not in the set Q of transducer T

// Define the abstraction α
α = {(q, q′) | q ∈ Q ∧ (q′ = q_m if q ∈ Q_m else q′ = q)}
// New set of control states
Q′ = (Q \ Q_m) ∪ {q_m}
// New set of accepting states
F′ = {α(q) | q ∈ F}
// New initial transducer state
ι′ = {(α(q), M) | (q, M) ∈ ι}
// New transition relation
δ′ = {(α(q), v, α(q′), w) | (q, v, q′, w) ∈ δ}
// Compose the resulting transducer
return (Q′, D_in, D_out, ι′, F′, δ′)

each element (q, w) ∈ ι there exists an element (q′, w′) ∈ ι′, with [[w]] ⊆ [[w′]]. Furthermore, we have to show that (2) each input that leads to an accepting run on transducer T also leads to an accepting run on transducer T′. Given a run ῑ = ⟨ι_0, ..., ι_n⟩ that is feasible on transducer T for a given input (σ̄, σ̂) ∈ Σ* × Σ*. The same input will also produce a feasible run ῑ′ = ⟨ι′_0, ..., ι′_n⟩ on transducer T′. For each ι_i ∈ ῑ that contains a pair (q, ·) with q ∈ Q_m, the corresponding transducer state ι′_i ∈ ῑ′ will contain the merged control state q_m with a corresponding abstract output word, that is, (q_m, w) ∈ ι′_i. The definition of qmerge ensures that all transitions from a control state in Q_m are also possible from control state q_m: all transitions in δ from or to a control state in Q_m are replaced by corresponding transitions from or to control state q_m. In case a control state to merge is included in the initial transducer state ι, it is replaced by control state q_m in the initial transducer state ι′ of transducer T′. The non-deterministic nature of abstract transducers ensures that all transitions that match will also be taken: one control state can have a set of successor states for a given input.
The transformation of the set of accepting control states F to the set F′ ensures that if one of the states to merge was an accepting state, state q_m will also become an accepting state; states in F that are not included in Q_m stay accepting states in F′. That is, all transitions, and runs on them, that were possible from or to control states in the set Q_m are still possible (lead to feasible or accepting runs) in the new abstract transducer T′, but now start or end in control state q_m. □

Please note that abstracting abstract transducers by merging control states affects neither the number of transitions nor their labeling; both the input symbols and the output symbols on transitions stay the same, but output languages of epsilon closures can change.
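As an illustration, Algorithm 2 can be transcribed almost literally into executable form. The sketch below is a simplification, not the authoritative definition: a transducer is a plain tuple (Q, ι, F, δ) without the domain components, abstract output words are modeled as frozensets of concrete words, and initial-state entries that collapse onto the merged state are joined by set union.

```python
def qmerge(transducer, Qm, qm="q_m"):
    """Merge the control states in Qm into the fresh state qm.
    transducer = (Q, iota, F, delta); the domain components D_in and
    D_out are omitted. iota maps initial control states to abstract
    output words (frozensets); delta holds (q, v, q2, w) transitions."""
    Q, iota, F, delta = transducer
    # Define the abstraction alpha
    alpha = {q: (qm if q in Qm else q) for q in Q}
    # New set of control states
    Q2 = (Q - Qm) | {qm}
    # New set of accepting states
    F2 = {alpha[q] for q in F}
    # New initial transducer state; entries collapsing onto qm are joined
    iota2 = {}
    for q, w in iota.items():
        iota2[alpha[q]] = iota2.get(alpha[q], frozenset()) | w
    # New transition relation
    delta2 = {(alpha[q], v, alpha[q2], w) for (q, v, q2, w) in delta}
    return (Q2, iota2, F2, delta2)
```

Note that this rendering leaves the transition labels untouched, matching the observation above that only the endpoints of transitions change.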
Definition 5.4 (State Abstraction).
The state abstraction ⟨⟨T⟩⟩^π_Q of an abstract transducer T results in a new abstract transducer T′ that is computed based on an abstraction precision π, with T ⊨ T′. The abstraction precision determines which states to keep separated and which to combine into one state, which represents the corresponding equivalence class. The abstraction precision π = ⟨Q_1, ..., Q_n⟩ defines a list of disjoint sets of control states that should be combined. A state abstraction is conducted as follows: ⟨⟨T⟩⟩^π_Q = qmerge(T, Q_1) if π = ⟨Q_1⟩, and ⟨⟨T⟩⟩^π_Q = qmerge(⟨⟨T⟩⟩^⟨Q_2, ..., Q_n⟩_Q, Q_1) if |π| > 1 with π = ⟨Q_1, Q_2, ...⟩.

An abstraction approach that influences the abstract input words of the transitions is input alphabet abstraction, which is the process of changing the abstract input word v of a transducer transition τ = (q, v, q′, w) ∈ δ to a new abstract input word v′, with [[v]]_in ⊆ [[v′]]_in:

Definition 5.5 (Input Alphabet Abstraction). An input alphabet abstraction ⟨⟨T⟩⟩^π_I of an abstract transducer T results in a new abstract transducer where some of the abstract input words of its control transitions are widened based on the given abstraction precision π ∈ Π_I. The abstraction precision π for input alphabet abstraction maps an abstraction precision π_in that is applicable to the abstract input domain to each of the transducer's control transitions, that is, it is a left-total function π : ∆ → Π_in. The result is an abstract transducer with a widened transition relation: δ′ = {(q, ⟨⟨v⟩⟩^π_in_in, q′, w) | τ = (q, v, q′, w) ∈ δ ∧ (τ, π_in) ∈ π}.

Along with this work, we introduce an output alphabet abstraction, which adjusts the abstract output words of transitions. It denotes the process of changing the abstract output word w of a transducer transition τ = (q, v, q′, w) ∈ δ to a new abstract output word w′, with [[w]]_out ⊆ [[w′]]_out:

Definition 5.6 (Output Alphabet Abstraction).
An output alphabet abstraction ⟨⟨T⟩⟩^π_O of an abstract transducer T results in a new transducer where some of the abstract output words of its control transitions are widened based on the given abstraction precision π ∈ Π_O. The precision π for output alphabet abstraction maps an abstraction precision π_out that is applicable to the output domain to each of the transducer's transitions, that is, it is a left-total function π : ∆ → Π_out. The result is an abstract transducer with a widened transition relation: δ′ = {(q, v, q′, ⟨⟨w⟩⟩^π_out_out) | τ = (q, v, q′, w) ∈ δ ∧ (τ, π_out) ∈ π}.

Please note that the computation of the abstract output closure (see Sect. 4.5) also yields a form of output alphabet abstraction.
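Definitions 5.5 and 5.6 both widen transition labels pointwise. The following is a minimal sketch of the output variant under stated assumptions: abstract words are modeled as frozensets of concrete words, the precision is a map from transitions to precision values (left-total, as in the definition), and the widening operator is supplied by the caller.

```python
def widen_outputs(delta, precision, widen):
    """Output-alphabet abstraction: replace each output word w by
    widen(w, pi), where pi is the precision value that the precision
    map assigns to the transition. widen must guarantee that the
    concretization of w is included in that of widen(w, pi)."""
    return {(q, v, q2, widen(w, precision[(q, v, q2, w)]))
            for (q, v, q2, w) in delta}
```

With frozensets as abstract words, a widening that simply unions in additional words satisfies the inclusion requirement [[w]]_out ⊆ [[w′]]_out.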
Besides abstraction techniques, techniques for the reduction of abstract transducers are also important. Such techniques help to reduce the number of control states, the number of control transitions, and the degree of non-determinism of a given abstract transducer. That is, they help to reduce the costs of using and running abstract transducers for particular inputs, for example, to conduct a verification task. Minimization is related to reduction but aims at ending up with finite-state machines that have a minimal number of states, an optimum.

The number of control states of an abstract transducer is critical for the performance of its use in an analysis procedure. Since minimization is too expensive [18, 28, 50], we propose to adopt reduction techniques as known for NFAs to reduce the size and the degree of non-determinism of abstract transducers; a low degree of non-determinism is critical for efficient execution of non-deterministic finite-state machines [52].
Abstract transducers can be reduced by merging control states, or their transitions, as long as the set of transductions and the set of accepting transductions is preserved. Please note that we assume, if not stated otherwise, that ϵ-moves have been removed before applying the reduction techniques that we describe here.

Definition 6.1 (Operator reduce). The (generic) reduction operator reduce : T → T reduces a given abstract transducer T. Instances of this operator have to guarantee to produce an equivalent abstract transducer, that is, T ≡ reduce(T).

Before we continue to outline an algorithm for reducing abstract transducers by merging control states, we provide more definitions:
Definition 6.2 (Control State Equivalence).
Two control states q_1, q_2 ∈ Q of an abstract transducer T are called equivalent to each other, that is, q_1 ≡ q_2, if and only if they can be merged without affecting either the transducer's set of transductions or its set of accepting transductions, that is, if and only if T ≡ qmerge(T, {q_1, q_2}).

Based on the definition of control state equivalence, we define the equivalence of abstract transducer states:

Definition 6.3 (Transducer State Equivalence).
Two transducer states ι_1, ι_2 ∈ J are called equivalent if and only if they describe equivalent pairs of control states and abstract output words, that is, if and only if ∀(q, w) ∈ ι_1 : ∃(q′, w′) ∈ ι_2 : q ≡ q′ ∧ w ≡ w′ and ∀(q, w) ∈ ι_2 : ∃(q′, w′) ∈ ι_1 : q ≡ q′ ∧ w ≡ w′.

To determine whether merging two control states maintains the set of transductions, the notion of left transductions is essential:

Definition 6.4 (Left Transductions).
The set of left transductions ←T(T, q) ⊆ Σ* × Σ* × Θ^∞ to a given control state q ∈ Q, which belongs to a particular abstract transducer T ∈ T, is the set of all transductions that can be produced on paths that start in the initial transducer state ι and that reach the given control state q with a feasible run: ←T(T, q) = ⋃ {(σ̄, σ̂, [[w′]]_out) | (q, w′) ∈ r̂un_T(σ̄, σ̂) ∧ σ̄ ∈ Σ* ∧ σ̂ ⊆ Σ* ∧ w′ ≠ ⊥}.

Proposition 6.5.
A transformation T′ = qmerge(T, {q_1, q_2}) maintains both the set of transductions and the set of accepting transductions if the left transductions of the control states q_1 and q_2 are equal, that is, T ≡ qmerge(T, {q_1, q_2}) if ←T(T, q_1) = ←T(T, q_2).

Proof. Control state q_1 is reachable by runs that correspond to the set of left transductions ←T(T, q_1) and control state q_2 by runs that correspond to the set of left transductions ←T(T, q_2). The proposition states that if we merge control states q_1 and q_2, with ←T(T, q_1) = ←T(T, q_2), into a new state q_m of a new transducer T′ = qmerge(T, {q_1, q_2}), then this transducer is equivalent, T′ ≡ T, to the original one. (1) First, we show that control state q_m is reachable by all feasible runs that can also reach control state q_1 or q_2, and that there is no feasible run that can reach q_m but neither state q_1 nor q_2. That is, we show that ←T(T′, q_m) = ←T(T, q_1) = ←T(T, q_2): the operation qmerge ensures that all transitions that entered either state q_1 or state q_2 also enter state q_m; that is, all feasible runs that reached q_1 or q_2 now reach state q_m, and since q_m is a new state it is only reachable by these runs. (2) Next, we show that all runs that are feasible from control state q_1 or q_2 are also feasible from control state q_m, and that there is no feasible run from state q_m that is not feasible from control state q_1 or q_2: the construction process of q_m ensures that all transitions that leave states q_1 or q_2 also leave state q_m, and no other transitions get added to leave this state; that is, all feasible runs that start in control state q_m are also feasible runs if they start in control state q_1 or q_2.
(3) Finally, we have to show that all runs that are accepting from control state q_1 or q_2 are also accepting from state q_m, and that there is no accepting run from control state q_m that is not accepting from state q_1 or q_2: the operator qmerge merges states q_1 and q_2 into a state q_m, which becomes an accepting control state if state q_1 or state q_2 is an accepting control state. That is, the inputs {(σ̄, σ̂) | (σ̄, σ̂, ·) ∈ ←T(T, q_1)} become elements of the set of accepting transductions of transducer T′ if they were also accepted by transducer T. All inputs that get accepted by runs starting from control state q_1 or state q_2 are also accepted by runs that start from control state q_m. □

Statements about the result of manipulating an abstract transducer by merging control states are also possible based on the notion of right transductions:
Definition 6.6 (Right Transductions).
The set of right transductions →T(T, q, w) ⊆ Σ* × Σ* × Θ^∞ of a given control state q ∈ Q, which belongs to a specific abstract transducer T ∈ T, with initial abstract output word w, is the set of all transductions that can be produced on the feasible runs that start from the given transducer state (q, w): →T(T, q, w) = ⋃ {(σ̄, σ̂, [[w′]]_out) | (·, w′) ∈ r̂un_T({(q, w)}, σ̄, σ̂) ∧ σ̄ ∈ Σ* ∧ σ̂ ⊆ Σ* ∧ w′ ≠ ⊥}.

Definition 6.7 (Right Accepted Language).
The right accepted language of a given abstract transducer T for a given control state q is the set of pairs (σ̄, σ̂) ∈ Σ* × Σ* that lead to an accepting run if started from the given control state q: →L_acc(T, q) = {(σ̄, σ̂) ∈ L_in(T) | (q′, ·) ∈ r̂un_T({(q, w_ϵ)}, σ̄, σ̂) ∧ q′ ∈ F}.

Proposition 6.8.
Merging two control states q_1, q_2 ∈ Q of an abstract transducer T, which results in a new abstract transducer, maintains the set of transductions if their sets of right transductions are equal, that is, T(T) = T(qmerge(T, {q_1, q_2})) if →T(T, q_1) = →T(T, q_2). Please note that we do not make a proposition about the set of accepting transductions here.

Proof. Let the sets of left transductions of the two control states q_1 and q_2 be different from each other, that is, ←T(T, q_1) ≠ ←T(T, q_2). From Proposition 5.3, merging q_1 and q_2 leads to an overapproximation, that is, T ⊨ qmerge(T, {q_1, q_2}). It remains to be shown that the set of transductions is preserved if the right transductions of the two control states to merge are actually equal: T(T) = T(qmerge(T, {q_1, q_2})) if →T(T, q_1) = →T(T, q_2), that is, that the merge does not add additional transductions. To add additional transductions, it would be necessary that the set of right transductions of control state q_m overapproximates the union of the right transductions of control states q_1 and q_2. Nevertheless, since →T(T, q_1) is equivalent to →T(T, q_2), →T(T, q_m) does not add additional right transductions, that is, →T(T, q_1) = →T(T, q_2) = →T(T, q_m). □

Proposition 6.9.
Merging two control states q_1, q_2 ∈ Q of an abstract transducer T, which results in a new transducer, does not maintain the set of accepting transductions if their sets of left transductions are not equal to each other. That is, T_acc(T) ≠ T_acc(qmerge(T, {q_1, q_2})) if ←T(T, q_1) ≠ ←T(T, q_2).

Proof. Let q_1 and q_2 be two control states of an abstract transducer T, with q_1 ∈ F and q_2 ∉ F. Merging these states by qmerge(T, {q_1, q_2}) results in a new transducer T′ with a control state q_m into which q_1 and q_2 have been merged, and which became an accepting control state q_m ∈ F′. In case the left transductions ←T(T, q_1) and ←T(T, q_2) are different from each other, different inputs can reach states q_1 and q_2. Both the inputs that reached q_1 and those that reached q_2 can reach the control state q_m, and all these inputs now result in accepting runs since q_m ∈ F′; that is, runs for inputs that reached q_2 and that were not accepting before now reach the accepting control state q_m, resulting in an overapproximation of the set of accepting transductions. □

Definition 6.10 (Left Equivalent).
The left equivalence relation ≡_L ⊆ Q × Q describes the pairs of control states that are equivalent to each other and that have the same set of left transductions; it is a subset of the control state equivalence relation. That is, (q_1, q_2) ∈ ≡_L if q_1 ≡ q_2 and ←T(T, q_1) = ←T(T, q_2).

Proposition 6.11. Given an input-ϵ-free abstract transducer T, a set of two control states Q_m = {q_1, q_2} ⊆ Q of transducer T satisfies ←T(q_1) = ←T(q_2) if ∀(q, v, q′, w) ∈ entering(Q_m) : ∀(q′′, v′, q′′′, w′) ∈ entering(Q_m) : v ≡ v′ ∧ w ≡ w′ ∧ ←T(q) = ←T(q′′). We use the auxiliary function entering(Q) = {(q, v, q′, w) ∈ δ | q′ ∈ Q}.

Proof. Given the set of all control states Q_p ⊆ Q from which the control states in Q_m = {q_1, q_2} are directly reachable, that is, Q_p = {q | (q, ·, q′, ·) ∈ δ ∧ q′ ∈ Q_m}. If all states in the set Q_p have the same set of left transductions, then only the transitions from control states in the set Q_p to the control states in the set Q_m can affect whether or not the sets of left transductions of the states in Q_m are equal to each other. □

Definition 6.12 (Operator reduce_Left). The reduction operator reduce_Left : T → T reduces a given abstract transducer T by merging all control states that are left-equivalent to one another. The transformation satisfies reduce_Left(T) ≡ T.

Existing algorithms [28, 30] for reducing automata and symbolic transducers are not applicable because the set of transductions is not taken into account in their definition of equivalence.

We now present a generic and configurable program analysis that executes an abstract transducer. This abstract transducer analysis keeps track of the current transducer state while processing the input. The analysis can be configured, for example, to determine the extent to which the transducer states should be tracked in a path-sensitive manner; path sensitivity might be needed for particular analysis purposes only. Thus, we can mitigate the state-space explosion problem in some cases. The transducer analysis is the foundation for several analyses that we describe in this work, for example, for the Yarn transducer analysis and the precision transducer analysis.
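Looking back at the reduction operators, the sufficient condition from Proposition 6.11 that underlies reduce_Left can be sketched as follows. This is a simplification under stated assumptions: abstract words are plain values compared by equality, and the recursive requirement that the predecessors' left transductions be equal is conservatively collapsed to identity of the source states.

```python
def entering(delta, Qm):
    """Auxiliary function: transitions whose target lies in Qm."""
    return {(q, v, q2, w) for (q, v, q2, w) in delta if q2 in Qm}

def mergeable_left(delta, q1, q2):
    """Simplified check in the spirit of Prop. 6.11: all transitions
    entering {q1, q2} must pairwise agree on input word, output word,
    and (conservatively) on the source state itself."""
    ent = entering(delta, {q1, q2})
    return all(t1[0] == t2[0] and t1[1] == t2[1] and t1[3] == t2[3]
               for t1 in ent for t2 in ent)
```

A reduce_Left implementation could repeatedly merge (via qmerge) any pair of states for which this check succeeds.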
Our abstract transducer analysis is built on the concept of configurable program analysis (CPA) [14, 15]. The abstract transducer CPA D_T = (D_T, ⇝_T, ↓_T, merge_T, stop_T, prec_T, target_T) tracks a set of states of a given abstract transducer T = (Q, D_in, D_out, ι, F, δ). The CPA's behavior is configured by using different variants of its operators. For example, varying the operator merge_T can configure the analysis to operate path-sensitively, or only context-sensitively and flow-sensitively [14]. We rely on the strengthening operator ↓_T for instantiating parameterized outputs. Other program analyses, which run in parallel to the abstract transducer analysis, can read and use the output words for different purposes. The abstract transducer analysis D_T is composed of the following components:

Abstract Domain D_T. The abstract domain D_T = (C, E, [[·]], ⟨⟨·⟩⟩) is defined based on a map lattice E = (J, ⊤, ⊥, ⊑, ⊔, ⊓), with J = Q → W, where each element ι ∈ J of the lattice is an abstract transducer state. One transducer state ι = {(q, w), ...} ∈ J is a mapping ι : Q → W from control states to abstract output words. The analysis starts with the initial transducer state ι of the abstract transducer to conduct runs for.

Transfer Relation ⇝_T. The transfer relation ⇝_T ⊆ J × G × J × Π defines abstract successor states of an abstract state ι = {(q, w), ...} ∈ J for a given control-flow transition g ∈ G and abstraction precision π ∈ Π. We define this transition relation without implicit stuttering, that is, if there should be stuttering, the transducer must have corresponding transitions. The transfer relation is defined as follows: ι ⇝_T^g {{(q, w)} | (v, v_ℓ) = look(g, ℓ) ∧ (q, w) ∈ r̂un_T(ι, v, v_ℓ) ∧ q ≠ q_⊥}. Please note that the function r̂un is implicitly parameterized with an abstract closure operator abstclosure.
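The transfer relation above can be sketched operationally. The operators run and look are passed in as parameters, since their concrete signatures depend on the instantiation; the names below are placeholders for the operators r̂un and look from the text, and the encoding of transducer states as sets of pairs is an assumption of this sketch.

```python
def transfer(iota, g, run, look, ell, bottom="q_bot"):
    """Compute the abstract successor states of transducer state iota
    for the control-flow transition g: obtain the abstract input word
    and lookahead via look, conduct the run, and drop successors that
    ended in the bottom control state."""
    v, v_ell = look(g, ell)                 # input word plus lookahead
    return [{(q, w)} for (q, w) in run(iota, v, v_ell) if q != bottom]
```

Each surviving pair is wrapped as a singleton successor state, mirroring the fact that the merge operator decides later whether states are tracked separately.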
The operator look : G × ℕ → I × I maps the given control-flow transition g ∈ G to an abstract input word v and provides a bounded lookahead of length ℓ in the form of the abstract input word v_ℓ, which is derived from the control-flow transitions that follow transition g in the control-transition relation of the underlying analysis task.

The operator look does not only provide the lookahead but also translates between the alphabet of the graph that is traversed and the abstract input alphabet of the abstract transducer. That is, varying this operator provides different views on the given input; for example, a control-flow transition g ∈ G can be translated to the function to which the transition belongs, or to the successor control location that is reached by the control transition. The operator merge can decide later if states should be tracked separately or not.

Operator ↓_T. The strengthening [14] operator ↓_T : E_× × E_× × J → J is called after all analyses that run in parallel have provided an abstract successor state as components of the composite state e_× = (e_1, ..., e_n) ∈ E_×. At this point, the strengthening operator can access the information that is present in any of the component states e_i ∈ e_× and use it to strengthen its own (component) state. We instantiate parameterized output words during strengthening. Information of an analysis that runs in parallel can be used to support various instantiation and synthesis mechanisms. The strengthening ι′ = {(q′, w′)} = ↓_T(e_×, e′_×, ι) is conducted for a given transducer state ι = {(q, w)}, which is the result of conducting a transducer transition τ = (q, v, q′, w) ∈ δ for an input (σ̄, σ̂) ∈ Σ* × Σ*. Besides the information that can be found in the composite states e_× and e′_×, the values that were bound to the parameters of the abstract input word v can also be taken into account to instantiate the abstract output word w′.
A consistent binding of parameters among different transitions, that is, for the whole program trace, as it is used by some aspects and corresponding weavers [3], is not yet supported.

Operator merge_T. The merge operator merge_T : J × J × Π → J controls whether two transducer states should get combined, or whether they should be explored separately and separate the state space. The behavior of the operator can be controlled based on a given precision π ∈ Π. The default is to always separate two different abstract states, that is, merge_T = merge_sep [14], which ensures the path sensitivity of the analysis. Please note that the abstract transducer analysis is typically one of several analyses that run as components of a composite analysis: even if the analysis would conduct a merge, other component analyses might signal not to do so.

Operator stop_T. The coverage check operator stop_T : J × 2^J → 𝔹 decides whether a given abstract state is already covered by a state reached or not. As default, we use the inclusion relation of the lattice, that is, stop_T = stop_sep.

Operator prec_T. The precision adjustment operator prec_T could conduct further abstraction of a given abstract state. We do not abstract here: a call prec_T(ι, π, ·) returns the pair (ι, π) ∈ J × Π without adjustments.

Operator target_T. The target operator target_T : J → 2^S determines the set of properties for which a given abstract state is a target state. Each property is a task concern, that is, the set of properties S ⊂ H is a subset of the set H of task concerns. We assume that there is only one transition τ = (·, ·, q′, ·) ∈ δ for each accepting control state q′ ∈ F. We rely on a function ζ : ∆ → 2^H that maps each transducer transition to a set of task concerns. Given an abstract transducer state ι = {(q_1, ·), ...,
(q_n, ·)} ∈ J, the operator returns: target_T(ι) = ⋃ {ζ(t) | (q, ·) ∈ ι ∧ q ∈ F ∧ t = (·, ·, q, ·) ∈ δ}.

By relying on the CPA framework [14], the abstract transducer analysis is equipped with an inherent notion of configurability, and can be instantiated several times and in different ways within the framework, to conduct an analysis task in the most efficient and effective manner.
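The target operator above can be sketched as follows, assuming the transducer state is encoded as a set of (control state, output word) pairs and ζ is given as a finite map from transitions to sets of task concerns (a hypothetical encoding):

```python
def target(iota, F, delta, zeta):
    """Collect the task concerns of all accepting control states that
    occur in the abstract transducer state iota: for each accepting
    state, look up the concerns of the transitions entering it."""
    concerns = set()
    for (q, _w) in iota:
        if q in F:
            for t in delta:
                if t[2] == q:               # transition entering q
                    concerns |= zeta.get(t, set())
    return concerns
```

The assumption stated in the text, that each accepting control state has exactly one entering transition, makes the inner lookup unambiguous.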
Transducer Composition.
It might be necessary to execute several abstract transducers in parallel along with the state-space exploration for an analysis task. Given a list ⟨T_1, ..., T_n⟩ of abstract transducers to run, a list ⟨D_1, ..., D_m⟩ of analyses, with n ≥ m ≥ 1, has to be instantiated. We assume that these transducers have the same abstract input domain and the same abstract output domain, and consider the composition of transducers with different abstract output domains to be future work. The first approach (separation) is to instantiate one analysis for each abstract transducer (m = n), which fosters a clear separation of concerns. Each of the m instantiated analyses adds one component to the composite (product) state that is formed by the composite analysis; the number of CPA operators that are invoked transitively by the CPA algorithm increases. An alternative approach (union) is to construct the union T_∪ = T_1 ∪ ... ∪ T_n of the transducers to run and to run this single transducer T_∪ with one abstract transducer analysis. Also, hybrid approaches can be taken, that is, construct unions for subsets of the transducers, and run the others separately.

State-Set Composition.
One abstract state ι = {(q_1, ·), (q_2, ·), ...} ∈ J of the transducer analysis can contain several control states from the set Q of the abstract transducer to run for a given analysis task. The number of control states per abstract state can be controlled by the transducer analysis and its operators, for example, the operator merge, which decides whether or not to explore two abstract states separately. The decision to join two different control states into one state set of one abstract state of the transducer analysis can affect the path sensitivity of the analysis, that is, whether it is possible to determine the branch of the state space that has led to a given control state.

Abstract transducers combine different concepts and techniques from formal methods, automata theory, domain theory, and abstract interpretation, to end up with a generic type of abstract machine. We discuss the related work based on the different concepts that can be found in abstract transducers and explain the relationship and differences to existing work.
Symbolic Alphabet.
An abstract transducer can use arbitrarily composed abstract domains to define both its input and its output; for the input domain, we require that its lattice is dual to a Boolean algebra. We introduce a special class of abstract domain, the abstract word domain, to describe words of complex entities, such as program traces of concurrent systems. Symbolic finite automata and transducers [31] share the idea of using theories to describe sets of input and output symbols. Other types of automata describe their input symbols based on predicates [74] or as multi-valued input symbols [37, 54]. From the perspective of abstract transducers, trace partitioning domains [67], lattice automata [37], and regular expressions over lattice-based alphabets [57] are instances of abstract word domains.
Output Closure.
With abstract transducers, we also introduce means to deal with ϵ-loops that are annotated with outputs, that is, to compute and use finite symbolic representations of outputs that potentially consist of exponentially many and infinitely long words. Compared to existing work [31, 61], we also consider ϵ-moves that lead to dead ends as relevant, handle them in our algorithms, and do not consider them as candidates for removal.

Lookahead.
In each step of processing input, abstract transducers can conduct a lookahead on the remaining input to determine which transitions to take. Several other types of abstract machines provide the capability of lookaheads, for example, tree transducers, which have been extended to support regular lookaheads [33], or extended symbolic finite-state machines [29]. A labeling with words instead of letters is also conducted in the case of generalized finite automata [70], but they consume full words in a transition step, instead of just one letter as is the case for abstract transducers.
Transducer Abstraction.
By defining both the input alphabet and the output alphabet of abstract transducers based on an abstract domain, we can make use of the full range of abstraction mechanisms that were developed in the context of abstract interpretation for abstracting abstract transducers, that is, to widen their set of transductions. Approaches for abstracting classical automata have been presented in the past [23]. A more recent work [65] presented techniques for abstracting symbolic automata, which could also be adopted for abstract transducers. This work is the first that proposes to abstract a type of transducer for increasing its sharing, that is, to widen the set of words for which particular outputs are produced.
Running Transducers.
Running automata in parallel to a program analysis is an established concept in the fields of program analysis and verification [9, 12, 16]. Algorithmic aspects of how automata are executed, for example, how the current state of automata is represented in the state space of the analysis task, are in many cases [45] not discussed further, while the performance implications can be dramatic. Work in the context of configurable program analysis [10, 11, 13] is most transparent about this.
Transducers for Analysis and Verification.
Transducers are widely used in the context of program analysis and verification. They are used, for example, for synthesis [64], to describe the input-output relation of programs [19, 43, 64, 78], and for string manipulations [76, 78]. Automata that produce an output, which is then used in the analysis process, have been proposed in the form of assumption automata [13] for conditional model checking, error witness automata that output strengthening conditions [11] to narrow down the state space of the analysis process, and for correctness witnesses [10].
This work has introduced abstract transducers, a type of abstract machine that maps between an input language and an output language while taking a lookahead into account. In contrast to established finite-state transducers, abstract transducers have a strong focus on the intermediate language that they produce, which has several implications for the design of algorithms that operate on these machines. Both the input alphabet and the output alphabet of abstract transducers consist of abstract words, where one abstract word denotes a set of concrete words. Means for representing, constructing, and widening abstract words, and for describing their relationships, are provided by the corresponding abstract word domain. Building on these abstract alphabets allows for abstracting these transducers. We use techniques from abstract interpretation as the foundation for our abstraction mechanisms. The concept of abstract transducers enables several new applications: we discussed applications in the context of sharing task artifacts for reuse within program analysis tasks.
From the concept of abstract transducers, we instantiate the concept of task artifact transducers, which generalize a group of finite-state machines that are used in the context of program analysis and verification for reproducing and sharing information. These transducers provide information that contributes to an analysis task and its solution. The underlying graph structure of finite-state transducers allows us to capture the structure of information, share it, and enable its reuse. Task artifact transducers have several applications, and we outlined some of them: yarn transducers, which provide sequences of program operations to weave into a transition system, and precision transducers, which are a means to define the level of abstraction for different parts of the state space.
Other applications of task artifact transducers can be found in the context of providing and checking verification evidence, for example, transducers for error witnesses, which provide information that guides towards specification violations, or transducers for correctness witnesses, which provide certificates to check while traversing the control flow of programs.
REFERENCES
[1] P. A. Abdulla, Y.-F. Chen, L. Holík, R. Mayr, and T. Vojnar. 2010. When Simulation Meets Antichains. In Proc. TACAS (LNCS), Vol. 6015. Springer, 158–174.
[2] S. Abramsky, D. M. Gabbay, and T. S. E. Maibaum. 1994. Handbook of Logic in Computer Science, Semantic Structures. Vol. 3. Clarendon Press.
[3] C. Allan, P. Avgustinov, A. S. Christensen, L. J. Hendren, S. Kuzins, O. Lhoták, O. de Moor, D. Sereni, G. Sittampalam, and J. Tibble. 2005. Adding trace matching with free variables to AspectJ. In Proc. OOPSLA. ACM, 345–364.
[4] G. Avni and O. Kupferman. 2013. When does abstraction help? Inf. Process. Lett.
[5] R. J. R. Back and J. von Wright. 1990. Duality in specification languages: A lattice-theoretical approach. Acta Inf. 27, 7 (1990), 583–625.
[6] T. Ball, R. Majumdar, T. Millstein, and S. K. Rajamani. 2001. Automatic Predicate Abstraction of C Programs. In Proc. PLDI. ACM, 203–213.
[7] T. Ball, A. Podelski, and S. K. Rajamani. 2001. Boolean and Cartesian Abstraction for Model Checking C Programs. In Proc. TACAS. Springer, 268–283.
[8] T. Ball and S. K. Rajamani. 2002. SLIC: A Specification Language for Interface Checking (of C). Technical Report MSR-TR-2001-21. Microsoft Research.
[9] D. Beyer, A. Chlipala, T. A. Henzinger, R. Jhala, and R. Majumdar. 2004. Generating Tests from Counterexamples. In Proc. ICSE. 326–335.
[10] D. Beyer, M. Dangl, D. Dietsch, and M. Heizmann. 2016. Correctness witnesses: exchanging verification results between verifiers. In Proc. FSE. ACM, 326–337.
[11] D. Beyer, M. Dangl, D. Dietsch, M. Heizmann, and A. Stahlbauer. 2015. Witness Validation and Stepwise Testification across Software Verifiers. In Proc. ESEC/FSE. ACM, 721–733.
[12] D. Beyer, T. A. Henzinger, R. Jhala, and R. Majumdar. 2007. The software model checker Blast. STTT 9, 5-6 (2007), 505–525.
[13] D. Beyer, T. A. Henzinger, M. E. Keremoglu, and P. Wendler. 2012. Conditional model checking: A technique to pass information between verifiers. In Proc. FSE. ACM, 57.
[14] D. Beyer, T. A. Henzinger, and G. Théoduloz. 2007. Configurable Software Verification: Concretizing the Convergence of Model Checking and Program Analysis. In Proc. CAV. Springer, 504–518.
[15] D. Beyer, T. A. Henzinger, and G. Théoduloz. 2008. Program Analysis with Dynamic Precision Adjustment. In Proc. ASE. IEEE, 29–38.
[16] D. Beyer, A. Holzer, M. Tautschnig, and H. Veith. 2013. Information Reuse for Multi-goal Reachability Analyses. In Proc. ESOP (LNCS), Vol. 7792. Springer, 472–491.
[17] D. Beyer, S. Löwe, E. Novikov, A. Stahlbauer, and P. Wendler. 2013. Precision Reuse for Efficient Regression Verification. In Proc. ESEC/FSE. ACM, 389–399.
[18] H. Björklund and W. Martens. 2012. The tractability frontier for NFA minimization. J. Comput. Syst. Sci. 78, 1 (2012), 198–210.
[19] V. Botbol, E. Chailloux, and T. Le Gall. 2017. Static Analysis of Communicating Processes Using Symbolic Transducers. In Proc. VMCAI (LNCS), Vol. 10145. Springer, 73–90.
[20] M. J. J. Branco and J.-E. Pin. 2009. Equations Defining the Polynomial Closure of a Lattice of Regular Languages. In Proc. ICALP (2) (LNCS), Vol. 5556. Springer, 115–126.
[21] R. E. Bryant. 1992. Symbolic Boolean Manipulation with Ordered Binary-Decision Diagrams. ACM Comput. Surv. 24, 3 (1992), 293–318.
[22] J. A. Brzozowski. 1964. Derivatives of Regular Expressions. J. ACM 11, 4 (1964), 481–494.
[23] T. Bultan, F. Yu, M. Alkhalaf, and A. Aydin. 2017. String Analysis for Software Verification and Security. Springer.
[24] E. M. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith. 2000. Counterexample-Guided Abstraction Refinement. In Proc. CAV (LNCS 1855). Springer, 154–169.
[25] A. Cortesi, G. Costantini, and P. Ferrara. 2013. A Survey on Product Operators in Abstract Interpretation. In Festschrift for Dave Schmidt (EPTCS), Vol. 129. 325–336.
[26] P. Cousot and R. Cousot. 1977. Abstract interpretation: A unified lattice model for the static analysis of programs by construction or approximation of fixpoints. In Proc. POPL. ACM, 238–252.
[27] P. Cousot and R. Cousot. 1992. Abstract Interpretation Frameworks. J. Log. Comput. 2, 4 (1992), 511–547.
[28] L. D'Antoni and M. Veanes. 2014. Minimization of symbolic automata. In Proc. POPL. ACM, 541–554.
[29] L. D'Antoni and M. Veanes. 2015. Extended symbolic finite automata and transducers. Formal Methods in System Design 47, 1 (2015), 93–119.
[30] L. D'Antoni and M. Veanes. 2017. Forward Bisimulations for Nondeterministic Symbolic Finite Automata. In Proc. TACAS (LNCS 10205). 518–534.
[31] L. D'Antoni and M. Veanes. 2017. The Power of Symbolic Automata and Transducers. In Proc. CAV (LNCS 10426). Springer, 47–67.
[32] D. Duffus, B. Jónsson, and I. Rival. 1978. Structure Results for Function Lattices. Canadian Journal of Mathematics.
[33] J. Engelfriet. 1977. Top-down tree transducers with regular look-ahead. Mathematical Systems Theory 10 (1977), 289–303.
[34] A. Ferguson and J. Hughes. 1989. An Iterative Powerdomain Construction. In Functional Programming (Workshops in Computing). Springer, 41–55.
[35] G. Filé, R. Giacobazzi, and F. Ranzato. 1996. A Unifying View of Abstract Domain Design. ACM Comput. Surv. 28, 2 (1996), 333–336.
[36] D. D. Freydenberger. 2013. Extended Regular Expressions: Succinctness and Decidability. Theory Comput. Syst. 53, 2 (2013), 159–193.
[37] T. Le Gall and B. Jeannet. 2007. Lattice Automata: A Representation for Languages on Infinite Alphabets, and Some Applications to Verification. In Proc. SAS (LNCS), Vol. 4634. Springer, 52–68.
[38] V. K. Garg. 2015. Introduction to lattice theory with computer science applications. Wiley.
[39] M. Gehrke, S. Grigorieff, and J.-E. Pin. 2008. Duality and Equational Theory of Regular Languages. In Proc. ICALP (2) (LNCS), Vol. 5126. Springer, 246–257.
[40] S. Graf and H. Saïdi. 1997. Construction of Abstract State Graphs with PVS. In Proc. CAV. Springer, 72–83.
[41] G. Grätzer. 2011. Lattice Theory: Foundation. Birkhäuser.
[42] M. Hamana. 2003. Term rewriting with variable binding: an initial algebra approach. In Proc. PPDP. ACM, 148–159.
[43] R. Hamlet. 1978. Test reliability and software maintenance. In Proc. COMPSAC. IEEE, 315–320.
[44] J. B. Hart and C. Tsinakis. 2007. A concrete realization of the Hoare powerdomain. Soft Comput. 11, 11 (2007), 1059–1063.
[45] M. Heizmann, J. Hoenicke, and A. Podelski. 2013. Software Model Checking for People Who Love Automata. In Proc. CAV (LNCS), Vol. 8044. Springer, 36–52.
[46] T. A. Henzinger, R. Jhala, R. Majumdar, and K. L. McMillan. 2004. Abstractions from proofs. In Proc. POPL. ACM, 232–244.
[47] T. A. Henzinger, R. Jhala, R. Majumdar, and G. Sutre. 2002. Lazy abstraction. In Proc. POPL. ACM, 58–70.
[48] J. E. Hopcroft, R. Motwani, and J. D. Ullman. 2003. Introduction to automata theory, languages, and computation - international edition (2nd ed.). Addison-Wesley.
[49] E. V. Huntington. 1904. Sets of Independent Postulates for the Algebra of Logic. Trans. Amer. Math. Soc. 5, 3 (1904), 288–309.
[50] T. Jiang and B. Ravikumar. 1993. Minimal NFA Problems are Hard. SIAM J. Comput. 22, 6 (1993), 1117–1141.
[51] G. Kiczales, J. Lamping, A. Mendhekar, C. Maeda, C. V. Lopes, J.-M. Loingtier, and J. Irwin. 1997. Aspect-Oriented Programming. In Proc. ECOOP (LNCS 1241). Springer, 220–242.
[52] S.-K. Ko and Y.-S. Han. 2014. Left is Better than Right for Reducing Nondeterminism of NFAs. In Proc. CIAA (LNCS), Vol. 8587. Springer, 238–251.
[53] S. Kong, Y. Jung, C. David, B.-Y. Wang, and K. Yi. 2010. Automatically Inferring Quantified Loop Invariants by Algorithmic Learning from Simple Templates. In Proc. APLAS (LNCS 6461). Springer, 328–343.
[54] O. Kupferman and Y. Lustig. 2007. Lattice Automata. In Proc. VMCAI (LNCS), Vol. 4349. Springer, 199–213.
[55] C. Löding and A. Tollkötter. 2016. Transformation Between Regular Expressions and omega-Automata. In MFCS (LIPIcs), Vol. 58. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 88:1–88:13.
[56] G. H. Mealy. 1955. A method for synthesizing sequential circuits. The Bell System Technical Journal 34, 5 (1955), 1045–1079.
[57] J. Midtgaard, F. Nielson, and H. R. Nielson. 2016. A Parametric Abstract Domain for Lattice-Valued Regular Expressions. In Proc. SAS (LNCS), Vol. 9837. Springer, 338–360.
[58] E. F. Moore. 1956. Gedanken-experiments on sequential machines. Automata studies 34 (1956), 129–153.
[59] P. P. Nayak and A. Y. Levy. 1995. A Semantic Theory of Abstractions. In IJCAI. Morgan Kaufmann, 196–203.
[60] T. Nipkow and C. Prehofer. 1998. Higher-order rewriting and equational reasoning. Automated Deduction - A Basis for Applications.
[61] In Proc. ICASSP. IEEE, 4317–4320.
[62] S. Pinchinat and H. Marchand. 2000. Symbolic abstractions of automata. In Discrete Event Systems. Springer, 39–48.
[63] N. Pippenger. 1997. Regular Languages and Stone Duality. Theory Comput. Syst. 30, 2 (1997), 121–134.
[64] A. Pnueli and R. Rosner. 1989. On the Synthesis of a Reactive Module. In Proc. POPL. 179–190.
[65] M. D. Preda, R. Giacobazzi, A. Lakhotia, and I. Mastroeni. 2015. Abstract Symbolic Automata: Mixed syntactic/semantic similarity analysis of executables. In Proc. POPL. ACM, 329–341.
[66] M. D. Preda, R. Giacobazzi, and I. Mastroeni. 2016. Completeness in Approximate Transduction. In Proc. SAS (LNCS), Vol. 9837. Springer, 126–146.
[67] X. Rival and L. Mauborgne. 2007. The trace partitioning abstract domain. ACM Trans. Program. Lang. Syst. 29, 5 (2007), 26.
[68] O. Sery, G. Fedyukovich, and N. Sharygina. 2012. Incremental Upgrade Checking by Means of Interpolation-based Function Summaries. In Proc. FMCAD. FMCAD, 114–121.
[69] G. Singh, M. Püschel, and M. T. Vechev. 2017. Fast polyhedra abstract domain. In Proc. POPL. ACM, 46–59.
[70] M. Sipser. 1997. Introduction to the Theory of Computation. PWS Publishing Company.
[71] S. Srivastava and S. Gulwani. 2009. Program verification using templates over predicate abstraction. In Proc. PLDI. ACM, 223–234.
[72] A. Stahlbauer. Under submission (2019). Abstract Transducers for Program Analysis and Verification. Ph.D. Dissertation. University of Passau.
[73] M. H. Stone. 1936. The Theory of Representation for Boolean Algebras. Trans. Amer. Math. Soc. 40, 1 (1936), 37–111.
[74] G. van Noord and D. Gerdemann. 2001. Finite State Transducers with Predicates and Identities. Grammars 4, 3 (2001), 263–286.
[75] M. Veanes. 2013. Applications of Symbolic Finite Automata. In Proc. CIAA (LNCS 7982). Springer, 16–23.
[76] M. Veanes, P. Hooimeijer, B. Livshits, D. Molnar, and N. Bjørner. 2012. Symbolic finite state transducers: algorithms and applications. In Proc. POPL. ACM, 137–150.
[77] P. M. Whitman. 1946. Lattices, equivalence relations, and subgroups. Bull. Amer. Math. Soc. 52, 6 (1946), 507–522.
[78] F. Yu, T. Bultan, and O. H. Ibarra. 2011. Relational String Verification Using Multi-Track Automata.