Aperiodicity, Star-freeness, and First-order Definability of Structured Context-Free Languages
Dino Mandrioli, Matteo Pradella, Stefano Crespi Reghizzi
Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB), Politecnico di Milano, Piazza Leonardo Da Vinci 32, 20133 Milano, Italy
IEIIT, Consiglio Nazionale delle Ricerche, via Ponzio 34/5, 20133 Milano, Italy
{dino.mandrioli,matteo.pradella,stefano.crespireghizzi}@polimi.it
Abstract.
A classic result in formal language theory is the equivalence among noncounting, or aperiodic, regular languages, and languages defined through star-free regular expressions, or first-order logic. Together with the first-order completeness of linear temporal logic, these results constitute a theoretical foundation for model-checking algorithms. Extending these results to structured subclasses of context-free languages, such as tree languages, did not work as smoothly: for instance, W. Thomas showed that there are star-free tree languages that are counting. We show, instead, that investigating the same properties within the family of operator precedence languages leads to equivalences that perfectly match those on regular languages. The study of this old family of context-free languages has recently been resumed, not only to enhance parsing (the original motivation of its inventor R. Floyd) but also to exploit its algebraic and logic properties. We have been able to reproduce the classic results of regular languages for this much larger class by going back to string languages rather than tree languages. Since operator precedence languages strictly include other classes of structured languages such as visibly pushdown languages, the same results given in this paper hold as a trivial corollary for that family too.
Keywords:
Operator Precedence Languages, Aperiodicity, First-Order Logic, Star-Free Expressions, Visibly Pushdown Languages, Input-Driven Languages, Structured Languages
For a long time much research effort in the field of formal language theory has been devoted to extending as much as possible the nice algebraic and logic properties of regular languages to larger families of languages, typically the context-free ones or subfamilies thereof. Regular languages in fact are closed w.r.t. all basic algebraic operations and are also characterized in terms of classic monadic second-order (MSO) logic [9,19,39], but this is not so for general context-free languages.
A noticeable exception is provided by so-called structured context-free languages. With this term we mean those various families of languages whose typical tree structure is immediately visible in their sentences: two first historical and practically equivalent examples of such languages are parenthesis languages and tree languages, introduced respectively by McNaughton [30] and Thatcher [37]. More recently, input-driven languages (IDL) [8], later renamed visibly pushdown languages (VPL) [2], and height-deterministic languages [32] have also been shown to share many important properties of regular languages. In particular, tree languages and VPLs are closed w.r.t. boolean operations, concatenation, and Kleene *, and are characterized in terms of some MSO logic, although such operations and the adopted logic language are defined rather differently in the two cases. For a more complete analysis of structured languages and how they extend algebraic and logic properties of regular languages, see [28].
In this paper we are interested in an important subfamily of regular languages and its extension to various types of (structured) context-free languages, namely noncounting (NC) or aperiodic languages. Intuitively, aperiodicity is a property of a recognizing device that prevents it from separating strings that differ from each other only in the number of repetitions of some substring, e.g., odd versus even.
For instance, many hardware devices count sequences of bits modulo some positive number, whereas most of the lexical rules defining programming language identifiers are noncounting.
Aperiodicity has been thoroughly investigated within the family of regular languages. Many sophisticated techniques elaborated by various researchers discovered unexpected equivalences among subclasses of regular languages defined by means of different formalisms in apparently unrelated ways. Among them, the most relevant results are probably the equivalence of noncounting regular languages, the languages defined through star-free regular expressions, i.e., regular expressions made out of all boolean operations and concatenation but avoiding Kleene *, and languages defined through the first-order restriction of MSO, i.e., first-order (FO) logic. FO definability, in particular, has a tremendous impact on the success of model-checking algorithms, thanks to the first-order completeness of linear temporal logic. Within the rich literature on aperiodic regular languages and the various equivalent classes, a fairly comprehensive treatment is offered by [31].
Not surprisingly, various attempts have been made to extend the notion of aperiodicity beyond regular languages, specifically to some kind of structured context-free languages. The noncounting property, in fact, is perhaps even more important for context-free languages than for regular ones: whereas various hardware devices, e.g., count modulo some natural number, it is quite unlikely that a programming language, a data description language, or a natural language exhibits counting features such as forbidding an even number of nested loops or recursive procedure calls. We could claim that most, if not all, context-free languages of practical interest have an aperiodic structure.
So far, however, the investigation of aperiodic structured context-free languages achieved only partial results and left several critical questions still open.
Noncounting parenthesis languages were first introduced in [12]; then an equivalent definition in terms of tree languages was given in [38]. In that paper, however, the author showed that the same equivalences holding for regular languages do not extend to tree languages: e.g., there are counting star-free languages. The same work and further subsequent studies (e.g., [23,20,24,34]) provided partial results by investigating special subclasses of the various involved families, but the original simplicity and beauty of the regular language properties was irremediably lost.
Note: the first-order completeness of linear temporal logic is due to H.W. Kamp. From his thesis several simplified proofs have been derived, e.g., [35].
In this paper we show that exactly the same equivalences holding for aperiodic regular languages hold for a large and important class of context-free languages, namely operator precedence languages (OPL). OPLs were invented by Floyd to support efficient deterministic parsing [21]. We classify them as "structured but semivisible" languages since their structure is implicitly assigned by precedence relations between terminal characters, which were suggested to Floyd by the precedence rules between arithmetic operations: as an early intuition for readers who are not familiar with OPLs, the expression $a + b \ast c$ "hides" the parenthetic structure $(a + (b \ast c))$, which is implied by the fact that multiplicative operations should be applied before the additive ones.
It was also soon apparent that, thanks to such an implicit structure assigned to the strings by the precedence relations, these languages enjoy some closure properties typical of regular languages and other structured context-free ones [15], despite the fact that, unlike more traditional structured languages, they still require a typical parsing process to make their syntax trees explicit.
This fact accounts for a much wider application field, which includes most programming and data description languages.
More recently we resumed the study of this old family of languages and, besides applying their properties to support new parallel parsing algorithms [6], we discovered further fundamental algebraic and logic properties thereof, thus completing some typical extensions from regular to structured context-free languages.
Besides supplying an automata family recognizing OPLs, the Operator Precedence automata (OPA), we produced an MSO logic characterization thereof as a natural extension of the classic one for regular languages [27]. Furthermore we showed [14] that OPLs are a considerable generalization of VPLs, which in turn generalize parenthesis languages, which are a formalism equivalent to tree languages.
Apart from strict set-theoretic containment, OPLs significantly enlarge the scope of practical application of other structured languages, e.g., in terms of automatic property proof. For example, together with their corresponding MSO logic they allow one to specify and prove properties of systems where the typical LIFO policy of procedure calls and returns can be broken by unexpected events such as interrupts or exceptions [27,28], a feature that is not available in VPLs and their MSO logic [4].
In summary, to the best of our knowledge, OPLs are by far the largest language family that enjoys the main closure and decidability properties of regular languages, besides a logical characterization that is a natural extension of the classic one.
The coincidental remark that a subclass of OPLs is both noncounting [17] and FO logic definable [26] suggested taking up again the challenge of extending the equivalences of aperiodic, star-free, FO definable regular languages to a significant class of context-free languages. We achieved our goal for OPLs thanks to two key ideas:
1. We abandoned the "traditional" approach of extending regular languages as tree languages and we went back to string languages. This implied going back to the operation of string concatenation, which was replaced by the append operation in the case of tree languages (this fact is the origin of the counterexample given by Thomas [38] to show that there are counting star-free tree languages).
2. We adopted the same MSO logic extension that we exploited for our previous results [26], which in turn was inspired by similar extensions defined for general context-free languages [25] and for VPLs [2], and examined its restriction to the FO case. Such logics, again, are defined on strings rather than on trees.
The main results we present in this paper are therefore:
1. We define operator precedence expressions (OPE), which extend regular expressions by adding an operation that imposes a matching between two (hidden) parentheses. We show that OPEs define the OPL family. This is done in Section 3.
2. We show that star-free OPEs define exactly those OPLs that are definable through FO formulas (in Section 4), and that they define noncounting OPLs (in Section 5).
3. Finally, in Section 7, we show that every NC OPL can be defined by means of an FO formula. This last step requires a rather articulated procedure which exploits a regular language control theorem (in Section 6) which, informally, "splits" the logic formulas defining an OPL into a part devoted to describing its typical tree-like structure and another part that imposes a regular constraint on the strings derived from the grammar's nonterminal symbols. By means of several nontrivial transformations we show that such a control language can be made NC if the original OPL is in turn NC. Thanks to the fact that both parts of the logic formulas can be defined through FO formulas, we finally obtain the equivalences
OPL = OPE-languages = MSO-languages and
NC-OPL = SF-OPE-languages = FO-languages
exactly as for regular languages.
Thus, our results open the door to extending to OPLs the successful model-checking techniques typical of regular languages.
The next preliminary Section 2 provides the necessary terminology and background on OPLs, aperiodicity, parenthesis languages, and the MSO and FO logic characterization of languages.
For brevity, we just list our notations for the basic concepts we use from formal language and automata theory. The terminal alphabet is usually denoted by $\Sigma$, and the empty string is $\varepsilon$. For a string $x$, $|x|$ denotes the length of $x$. The character $\#$, not present in the terminal alphabet, is used as string delimiter, and we define the alphabet $\Sigma_\# = \Sigma \cup \{\#\}$.
Finite Automata
A finite automaton (FA) $\mathcal{A}$ is defined by a 5-tuple $(Q, \Sigma, \delta, I, F)$ where $Q$ is the set of states, $\delta$ the state-transition relation (or its graph denoted by $\to$), $\delta \subseteq Q \times \Sigma \times Q$; $I$ and $F$ are the nonempty subsets of $Q$ respectively comprising the initial and final states. If the tuple $(q, a, q')$ is in the relation, the edge $q \xrightarrow{a} q'$ is in the graph. The transitive closure of the relation is defined as usual. Thus, for a string $x \in \Sigma^*$ such that there is a path from state $q$ to $q'$ labeled with $x$, the notation $q \xrightarrow{x} q'$ is equivalent to $(q, x, q') \in \delta^*$; if $q \in I$ and $q' \in F$, then the string $x$ is accepted by $\mathcal{A}$. The language of the accepted strings is denoted by $L(\mathcal{A})$; it is called a regular language.
We also need two well-known extensions of the previous FA definition, neither of which affects the language family recognized. In the first extension, we permit an edge label to be the empty string; such an edge is called a spontaneous transition or step. In the second one, an edge label may be a string in $\Sigma^+$. These two classical extensions are formalized by letting $\delta \subseteq Q \times \Sigma^* \times Q$. An edge with a label in $\Sigma^*$ is called a macro transition or macrostep.
Non-counting or aperiodic regular languages
A regular language $L$ over $\Sigma$ is called noncounting (NC) or aperiodic if there exists an integer $n \geq 1$ such that for all $x, y, z \in \Sigma^*$, $x y^n z \in L$ iff $x y^{n+m} z \in L$, $\forall m \geq 0$.
Regular expressions and star-free languages
A regular expression (RE) over an alphabet $\Sigma$ is a well-formed formula made with the characters of $\Sigma$, $\emptyset$, $\varepsilon$, the Boolean operators $\cup$, $\neg$, $\cap$, the concatenation $\cdot$, and the Kleene star operator '$\ast$'. We may also use the operator '$+$'. When neither '$\ast$' nor '$+$' is used, the RE is called star-free (SF). An RE $E$ defines a language over $\Sigma$, denoted by $L(E)$.
Proposition 1.
Finite automata and regular expressions define the language family of regular (or rational) languages (REG). The family of aperiodic regular languages coincides with the family of languages defined by star-free REs.
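The noncounting property can be tested mechanically on a finite automaton. The following is a minimal sketch, not taken from the paper: it relies on the classical characterization (Schützenberger's theorem) that a regular language is aperiodic iff its syntactic monoid is aperiodic, and on the fact that the transition monoid of a minimal complete deterministic automaton is isomorphic to the syntactic monoid. The function names and the dictionary encoding of $\delta$ are illustrative assumptions, and both example automata are assumed to be minimal.

```python
def transition_monoid(n_states, alphabet, delta):
    """Monoid of state maps induced by all strings of a complete DFA.

    Assumption of this sketch: states are 0..n_states-1 and delta maps
    (state, letter) to a state. A map is stored as a tuple t with t[q] = q'.
    """
    identity = tuple(range(n_states))
    generators = [tuple(delta[(q, a)] for q in range(n_states)) for a in alphabet]
    monoid = {identity}
    frontier = [identity]
    while frontier:
        f = frontier.pop()
        for g in generators:
            h = tuple(g[f[q]] for q in range(n_states))  # read f's string, then one letter
            if h not in monoid:
                monoid.add(h)
                frontier.append(h)
    return monoid


def is_aperiodic(monoid):
    """True iff every element m satisfies m^n == m^(n+1) for some n."""
    for m in monoid:
        power, seen = m, set()
        while power not in seen:
            seen.add(power)
            nxt = tuple(m[power[q]] for q in range(len(m)))  # next power of m
            if nxt == power:
                break  # the powers of m reach a fixed point
            power = nxt
        else:
            return False  # the powers of m enter a cycle of length > 1
    return True


# (aa)*: strings with an even number of a's, a counting language.
EVEN_A = transition_monoid(2, "a", {(0, "a"): 1, (1, "a"): 0})

# a*b* with sink state 2, a noncounting language.
A_STAR_B_STAR = transition_monoid(
    3, "ab",
    {(0, "a"): 0, (0, "b"): 1, (1, "a"): 2, (1, "b"): 1, (2, "a"): 2, (2, "b"): 2},
)

print(is_aperiodic(EVEN_A), is_aperiodic(A_STAR_B_STAR))
```

If the automaton is not minimal, an aperiodic transition monoid still implies that the language is noncounting, but a negative answer is then inconclusive.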
A (context-free) grammar is a tuple $G = (\Sigma, V_N, P, S)$ where $\Sigma$ and $V_N$, with $\Sigma \cap V_N = \emptyset$, are resp. the terminal and the nonterminal alphabets, the total alphabet is $V = \Sigma \cup V_N$, $P \subseteq V_N \times V^*$ is the rule (or production) set, and $S \subseteq V_N$, $S \neq \emptyset$, is the axiom set. For a generic rule $B \to \alpha$, where $B$ and $\alpha$ are resp. called the left/right hand sides (lhs / rhs), the following forms are relevant:
axiomatic: $B \in S$
terminal: $\alpha \in \Sigma^+$
empty: $\alpha = \varepsilon$
renaming: $\alpha \in V_N$
operator: $\alpha \notin V^* V_N V_N V^*$, i.e., at least one terminal is interposed between any two nonterminals occurring in $\alpha$
parenthesized: $\alpha = \llparenthesis \beta \rrparenthesis$ where $\llparenthesis$ and $\rrparenthesis$ are new terminals not in $\Sigma$.
A grammar is called backward deterministic or a BD-grammar (or invertible) if ($B \to \alpha$, $C \to \alpha \in P$) implies $B = C$.
If all rules of a grammar are in operator form, the grammar is called an operator grammar or O-grammar.
A grammar
$$\tilde{G} = (\Sigma \cup \{\llparenthesis, \rrparenthesis\}, V_N, \tilde{P}, S) \qquad (1)$$
is a parenthesis grammar (Par-grammar) if the rhs of every rule is parenthesized. For a grammar $G = (\Sigma, V_N, P, S)$, the grammar (1) is called the parenthesized version of $G$ if $\tilde{P}$ consists of all rules $B \to \llparenthesis \beta \rrparenthesis$ such that $B \to \beta$ is in $P$.
For brevity we take for granted the usual definition of derivation, denoted by the symbols $\Rightarrow_G$ (immediate derivation), $\overset{*}{\Rightarrow}_G$ (reflexive and transitive closure of $\Rightarrow_G$), $\overset{h}{\Rightarrow}_G$ (derivation in $h$ steps); the subscript $G$ will be omitted whenever clear from the context. We also take for granted the notion of syntax tree and that a parenthesized string is an equivalent way to represent a syntax tree of a context-free grammar where internal nodes are unlabeled.
The language defined by a grammar starting from a nonterminal $X$ is
$$L_G(X) = \{ w \mid w \in \Sigma^*, X \overset{*}{\Rightarrow}_G w \}$$
We call $w$ a sentence if $X \in S$. The union of $L_G(X)$ for all $X \in S$ is the language $L(G)$ defined by $G$. The language generated by a Par-grammar is called a parenthesis language, and its sentences are well-parenthesized strings.
Two grammars defining the same language are equivalent. Two grammars such that their parenthesized versions are equivalent are structurally equivalent.
Any grammar can be transformed, preserving equivalence, into a BD-grammar, and also into an O-grammar [5,22] without renaming rules and without empty rules, except possibly for a single rule whose lhs is an axiom not otherwise occurring in any other production.
From now on, w.l.o.g., we exclusively deal with O-grammars without renaming rules and, if $\varepsilon$ is part of the language, with an axiomatic rule $B \to \varepsilon$, where $B$ does not appear in the rhs of any production.
Definition 3 (Backward deterministic reduced grammar [30,36]). A context over an alphabet $\Sigma$ is a string in $\Sigma^* \{-\} \Sigma^*$, where the character '$-$' $\notin \Sigma$ is called a blank. We denote by $\alpha[x]$ the context $\alpha$ with its blank replaced by the string $x$. Two nonterminals $B$ and $C$ of a grammar $G$ are termed equivalent if, for every context $\alpha$, $\alpha[B]$ is derivable from an axiom of $G$ iff so is $\alpha[C]$ (not necessarily from the same axiom).
A nonterminal $B$ is useless if there is no context $\alpha$ such that $\alpha[B]$ is derivable from an axiom, or if $B$ generates no terminal string. A terminal $b$ is useless if it does not appear in any sentence of $L(G)$.
A grammar is clean if it has no useless nonterminals and terminals. A grammar is reduced if it is clean and no two nonterminals are equivalent.
A BDR-grammar is both backward deterministic and reduced.
From [30], every parenthesis language is generated by a unique, up to an isomorphism of its nonterminal alphabet, Par-grammar that is BDR.
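The operator and backward-deterministic forms defined above are simple syntactic checks on the rule set. The sketch below is only an illustration: the encoding of rules as `(lhs, rhs)` pairs with tuple right-hand sides and the function names are assumptions of this example. The two grammars are the arithmetic grammar GAE and its BD version, both given in Example 6.

```python
def is_operator_form(rules, nonterminals):
    """True iff no rhs contains two adjacent nonterminals."""
    return all(
        not (x in nonterminals and y in nonterminals)
        for _, rhs in rules
        for x, y in zip(rhs, rhs[1:])
    )


def is_backward_deterministic(rules):
    """True iff no two rules with distinct lhs share the same rhs."""
    lhs_of = {}
    for lhs, rhs in rules:
        if lhs_of.setdefault(rhs, lhs) != lhs:
            return False
    return True


NONTERMINALS = {"E", "T", "F"}

# GAE: E -> E+T | T*F | e,  T -> T*F | e,  F -> e
GAE = [
    ("E", ("E", "+", "T")), ("E", ("T", "*", "F")), ("E", ("e",)),
    ("T", ("T", "*", "F")), ("T", ("e",)),
    ("F", ("e",)),
]

# BD version of GAE, as listed in Example 6.
GAE_BD = [
    ("E", ("E", "+", "T")), ("E", ("E", "+", "F")), ("E", ("T", "+", "T")),
    ("E", ("F", "+", "F")), ("E", ("F", "+", "T")), ("E", ("T", "+", "F")),
    ("T", ("T", "*", "F")), ("T", ("F", "*", "F")),
    ("F", ("e",)),
]

print(is_operator_form(GAE, NONTERMINALS), is_backward_deterministic(GAE))
print(is_operator_form(GAE_BD, NONTERMINALS), is_backward_deterministic(GAE_BD))
```

GAE fails the BD test because the rhs $e$ (and also $T \ast F$) belongs to rules with different left-hand sides.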
Operator precedence grammars
We define the operator precedence grammars (OPGs) following primarily [28].
Intuitively, operator precedence grammars are based on three precedence relations, called equal, yield and take, included in $\Sigma_\# \times \Sigma_\#$. A character $a$ is equal in precedence to $b$ iff some rhs of the grammar contains as substring $ab$ or a string $aBb$, where $B$ is a nonterminal; in fact, when evaluating the relations between terminal characters for OPG, nonterminals are, so to say, "transparent". A character $a$ yields precedence to $b$ iff $a$ can occur immediately to the left of a syntax subtree whose leftmost terminal character is $b$. Symmetrically, $a$ takes precedence over $b$ iff $a$ can occur as the rightmost terminal character of a subtree and $b$ is the immediately following terminal character.
Definition 4 (OP relations).
Let $G = (\Sigma, V_N, P, S)$ be an O-grammar. Let $a, b \in \Sigma$, $A, B \in V_N$, $C \in V_N \cup \{\varepsilon\}$, and let $\alpha, \beta$ range over $(V_N \cup \Sigma)^*$. For a nonterminal $A$, the left and right terminal sets are respectively:
$$\mathcal{L}_G(A) = \{ a \in \Sigma \mid A \overset{*}{\Rightarrow}_G C a \alpha \} \quad \text{and} \quad \mathcal{R}_G(A) = \{ a \in \Sigma \mid A \overset{*}{\Rightarrow}_G \alpha a C \}$$
(The grammar name will be omitted unless necessary to prevent confusion.)
The operator precedence relations are defined over $\Sigma_\# \times \Sigma_\#$ as follows:
equal in precedence: $a \doteq b \iff$ for some $A \to \alpha a C b \beta \in P$; $\# \doteq \#$
takes precedence: $a \gtrdot b \iff$ for some $A \to \alpha B b \beta \in P$, $a \in \mathcal{R}(B)$; $a \gtrdot \# \iff a \in \mathcal{R}(B)$ and $B \in S$ ($B$ is an axiom)
yields precedence: $a \lessdot b \iff$ for some $A \to \alpha a B \beta \in P$, $b \in \mathcal{L}(B)$; $\# \lessdot b \iff b \in \mathcal{L}(B)$ and $B \in S$.
The OP relations can be collected into a $|\Sigma_\#| \times |\Sigma_\#|$ array, called the operator precedence matrix of the grammar, $OPM(G)$: for each (ordered) pair $(a, b) \in \Sigma_\# \times \Sigma_\#$, $OPM_{a,b}(G)$ contains the OP relations holding between $a$ and $b$. More abstractly, consider a square matrix:
$$M = \{ M_{a,b} \subseteq \{\doteq, \lessdot, \gtrdot\} \mid a, b \in \Sigma_\# \} \qquad (2)$$
Such an OPM is called conflict-free iff $\forall a, b \in \Sigma_\#$, $0 \leq |M_{a,b}| \leq 1$. A conflict-free matrix is called total or complete iff $\forall a, b \in \Sigma_\#$, $|M_{a,b}| = 1$. A matrix is $\doteq$-acyclic if $\nexists a \in \Sigma$ such that $a \doteq \dots \doteq a$.
We extend the set inclusion relations and the boolean operations in the obvious cell-by-cell way to any two matrices having the same terminal alphabet. Two matrices are compatible iff their union is conflict-free.
Definition 5 (Operator precedence grammar).
A grammar $G$ is an operator precedence (or Floyd's) grammar, for short an OPG, iff the matrix $OPM(G)$ is conflict-free, i.e., the three OP relations are disjoint.
An OPG is $\doteq$-acyclic if $OPM(G)$ is so.
An operator precedence language (OPL) is a language generated by an OPG.
Remarks. If the relation $\doteq$ is acyclic, then the length of the rhs of any rule of $G$ is bounded by the length of the longest $\doteq$-chain in $OPM(G)$.
It is known that the family of OPLs is strictly included within the deterministic and reverse-deterministic context-free family. Moreover, any OPG that is BD has the LR(1) property.
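Definitions 4 and 5 translate directly into a fixpoint computation. The sketch below is only an illustration under stated assumptions: rules are `(lhs, rhs)` pairs with tuple right-hand sides, the grammar is an O-grammar, the ASCII characters `<`, `=`, `>` stand for $\lessdot$, $\doteq$, $\gtrdot$, the cell $\# \doteq \#$ is added unconditionally, and all function and variable names are hypothetical. It is applied to the grammar GAE of Example 6.

```python
def terminal_sets(rules, nonterminals):
    """Least fixpoint of the left and right terminal sets of Definition 4."""
    left = {A: set() for A in nonterminals}
    right = {A: set() for A in nonterminals}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            # The right sets are the left sets of the reversed rhs.
            for sets, seq in ((left, rhs), (right, rhs[::-1])):
                if not seq:
                    continue
                new = set()
                if seq[0] in nonterminals:
                    new |= sets[seq[0]]
                    if len(seq) > 1:      # operator form: a terminal follows
                        new.add(seq[1])
                else:
                    new.add(seq[0])
                if not new <= sets[lhs]:
                    sets[lhs] |= new
                    changed = True
    return left, right


def precedence_matrix(rules, nonterminals, axioms):
    """Map each pair (a, b) to the set of OP relations holding between a and b."""
    left, right = terminal_sets(rules, nonterminals)
    matrix = {}

    def add(a, rel, b):
        matrix.setdefault((a, b), set()).add(rel)

    for _, rhs in rules:
        for i, x in enumerate(rhs):
            if x in nonterminals:
                continue
            if i + 1 < len(rhs):
                y = rhs[i + 1]
                if y in nonterminals:
                    for b in left[y]:
                        add(x, "<", b)            # x yields to the subtree of y
                    if i + 2 < len(rhs):
                        add(x, "=", rhs[i + 2])   # x B z inside one rhs
                else:
                    add(x, "=", y)                # x y adjacent inside one rhs
            if i > 0 and rhs[i - 1] in nonterminals:
                for a in right[rhs[i - 1]]:
                    add(a, ">", x)                # the subtree on the left takes over x
    for B in axioms:
        for b in left[B]:
            add("#", "<", b)
        for a in right[B]:
            add(a, ">", "#")
    add("#", "=", "#")
    return matrix


NONTERMINALS = {"E", "T", "F"}
GAE = [
    ("E", ("E", "+", "T")), ("E", ("T", "*", "F")), ("E", ("e",)),
    ("T", ("T", "*", "F")), ("T", ("e",)),
    ("F", ("e",)),
]
LEFT, RIGHT = terminal_sets(GAE, NONTERMINALS)
OPM = precedence_matrix(GAE, NONTERMINALS, axioms=NONTERMINALS)
CONFLICT_FREE = all(len(rels) <= 1 for rels in OPM.values())
print(LEFT, RIGHT, CONFLICT_FREE)
```

A grammar is an OPG exactly when `CONFLICT_FREE` holds for its matrix; for GAE the cell $(e, e)$ stays empty, so the matrix is conflict-free but not total.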
Example 6. For the grammar GAE (see Figure 1), the left and right terminal sets of nonterminals $E$, $T$ and $F$ are, respectively: $\mathcal{L}(E) = \{+, \ast, e\}$, $\mathcal{L}(T) = \{\ast, e\}$, $\mathcal{L}(F) = \{e\}$, $\mathcal{R}(E) = \{+, \ast, e\}$, $\mathcal{R}(T) = \{\ast, e\}$, and $\mathcal{R}(F) = \{e\}$.
GAE: $S = \{E, T, F\}$
$E \to E + T \mid T \ast F \mid e$
$T \to T \ast F \mid e$
$F \to e$
$$\begin{array}{c|cccc}
 & + & \ast & e & \# \\ \hline
+ & \gtrdot & \lessdot & \lessdot & \gtrdot \\
\ast & \gtrdot & \gtrdot & \lessdot & \gtrdot \\
e & \gtrdot & \gtrdot & & \gtrdot \\
\# & \lessdot & \lessdot & \lessdot & \doteq
\end{array}$$
Fig. 1. GAE (left), and its OPM (right).
Figure 1 displays the conflict-free OPM associated with the grammar GAE; for instance, $OPM_{\ast,e} = \{\lessdot\}$ tells that $\ast$ yields precedence to $e$.
Notice that, unlike the arithmetic relations having similar typography, the OP relations do not enjoy any of the transitive, symmetric, reflexive properties.
A conflict-free matrix associates to every string at most one structure, i.e., a unique parenthesization (see Proposition 7, point 3). This aspect, paired with a way of deterministically choosing the rules' rhs to be reduced, is the basis of Floyd's natural bottom-up deterministic parsing algorithm.
For instance, the following BD version of GAE, paired with its OPM, which is not affected by the transformation, can unambiguously drive the bottom-up parsing of the string $e + e \ast e + e$ to build its unique associated parenthesized version $\llparenthesis \llparenthesis \llparenthesis e \rrparenthesis + \llparenthesis \llparenthesis e \rrparenthesis \ast \llparenthesis e \rrparenthesis \rrparenthesis \rrparenthesis + \llparenthesis e \rrparenthesis \rrparenthesis$:
$S = \{E, T, F\}$
$E \to E + T \mid E + F \mid T + T \mid F + F \mid F + T \mid T + F$
$T \to T \ast F \mid F \ast F$
$F \to e$
Various formal properties of OPGs and languages are documented in the literature, chiefly in [15,14,28]. For convenience, we just recall and collect the ones that are relevant for this article in the next proposition.
Proposition 7 (Further properties of OPLs/OPGs).
1. Let $M$ be a conflict-free OPM over $\Sigma_\# \times \Sigma_\#$. The classes of compatible OPGs and languages are:
$$C_M = \{ G \mid G \text{ is an OPG and } OPM(G) \subseteq M \}, \qquad \mathcal{L}_M = \{ L(G) \mid G \in C_M \}$$
2. Let $M$ be a conflict-free $\doteq$-acyclic OPM. The class $C_M$ contains a unique grammar, called the maxgrammar of $M$, denoted by $G_{max,M}$, such that for all grammars $G \in C_M$, the inclusions $L(G) \subseteq L(G_{max,M})$ and $L(\tilde{G}) \subseteq L(\tilde{G}_{max,M})$ hold, where $\tilde{G}$ and $\tilde{G}_{max,M}$ are the parenthesized versions of $G$ and $G_{max,M}$. If $M$ is total, then $L(G_{max,M}) = \Sigma^*$.
3. Let $M$ be a total conflict-free OPM over alphabet $\Sigma$. We define the function $M : \Sigma^* \to (\Sigma \cup \{\llparenthesis, \rrparenthesis\})^*$ as
$$M(x) = y, \text{ if } A \overset{*}{\Rightarrow}_{G_{max,M}} x,\ A \overset{*}{\Rightarrow}_{\tilde{G}_{max,M}} y \text{ are corresponding derivations.} \qquad (3)$$
E.g., with $M$ such that $a \lessdot a$, $a \doteq b$, $b \gtrdot b$, $M(aaaabbb) = \llparenthesis a \llparenthesis a \llparenthesis a \llparenthesis ab \rrparenthesis b \rrparenthesis b \rrparenthesis \rrparenthesis$.
4. Let $A \in V_N$. The profile of nonterminal $A$ is the pair of left/right terminal sets $(\mathcal{L}(A), \mathcal{R}(A))$. An OPG is called free iff, for any nonterminals $A, B \in V_N$, if the profiles of $A$ and $B$ are equal then $A = B$. The class of free grammars compatible with an OPM $M$ is a finite subset of the class $C_M$. The maxgrammar $G_{max,M}$ is free.
5. The closure properties of the family $\mathcal{L}_M$ of compatible OPLs are the following. Let $M$ be a total OPM.
– $\mathcal{L}_M$ is closed under union, intersection and set-difference, therefore also under complement.
– $\mathcal{L}_M$ is closed under concatenation.
– If matrix $M$ is $\doteq$-acyclic, $\mathcal{L}_M$ is closed under Kleene star.
Remark. Thanks to the fact that a conflict-free OPM assigns to each string at most one parenthesization (and exactly one if the OPM is complete), the above closure properties of OPLs w.r.t. boolean operations automatically extend to their parenthesized versions. In particular, any complete, conflict-free, $\doteq$-acyclic OPM defines a universal parenthesized language $L_{pU}$ such that its image under the homomorphism that erases parentheses is $\Sigma^*$, and the result of applying boolean operations to the parenthesized versions of some OPLs is the same as the result of parenthesizing the result of applying the same operations to the unparenthesized languages.
In the following we will assume that an OPM is $\doteq$-acyclic unless we explicitly point out the opposite. Such a hypothesis is stated for simplicity despite the fact that, rigorously speaking, it affects the expressive power of OPLs: it guarantees the closure w.r.t. Kleene star and therefore the possibility of generating $\Sigma^*$; this limitation, however, is not necessary if we define OPLs by means of automata [27]; neither would it be necessary if we adopted OPGs extended by the possibility of including regular expressions in production rhs [16], which however would require a much heavier notation. Technically, the only results requiring the $\doteq$-acyclicity hypothesis are those in Section 3. More comments about avoiding this hypothesis are given in the conclusion.
In [27] the traditional monadic second-order logic (MSO) characterization of regular languages by Büchi, Elgot, and Trakhtenbrot [9,19,39] is extended to the case of OPL. To deal with the typical tree structure of context-free languages, the original MSO syntax is augmented with the new binary relation $\curvearrowright$, based on the OPL precedence relations: informally, $x \curvearrowright y$ holds between the rightmost and leftmost positions of the context encompassing a subtree, i.e., respectively, of the character that yields precedence to the subtree's leftmost leaf, and of the one over which the subtree's rightmost leaf takes precedence.
Fig. 2. The string $e + e \ast e + e$, with relation $\curvearrowright$.
Unlike similar but simpler relations introduced, e.g., in [25] and [2], the $\curvearrowright$ relation is not one-to-one. For instance, Figure 2 displays the $\curvearrowright$ relation holding for the sentence $e + e \ast e + e$ generated by grammar GAE: numbering the positions of $\# e + e \ast e + e \#$ from 0 to 8, we have $0 \curvearrowright 2$, $2 \curvearrowright 4$, $4 \curvearrowright 6$, $2 \curvearrowright 6$, $0 \curvearrowright 6$, $6 \curvearrowright 8$, and $0 \curvearrowright 8$. Such pairs correspond to contexts where a reduce operation is executed during the parsing of the string (they are listed according to their execution order).
Formally, we define a countable infinite set of first-order variables $\mathbf{x}, \mathbf{y}, \dots$ and a countable infinite set of monadic second-order (set) variables $\mathbf{X}, \mathbf{Y}, \dots$. We adopt the convention to denote first and second-order variables in boldface font.
Definition 8 (Monadic Second-Order Logic over $(\Sigma, M)$). Let $\mathcal{V}_1$ be a set of first-order variables, and $\mathcal{V}_2$ be a set of second-order (or set) variables. The $MSO_{\Sigma,M}$ (monadic second-order logic over $(\Sigma, M)$) is defined by the following syntax (symbols $\Sigma, M$ will be omitted unless necessary to prevent confusion):
$$\varphi := c(\mathbf{x}) \mid \mathbf{x} \in \mathbf{X} \mid \mathbf{x} < \mathbf{y} \mid \mathbf{x} \curvearrowright \mathbf{y} \mid \neg\varphi \mid \varphi \lor \varphi \mid \exists \mathbf{x}.\varphi \mid \exists \mathbf{X}.\varphi$$
where $c \in \Sigma_\#$, $\mathbf{x}, \mathbf{y} \in \mathcal{V}_1$, and $\mathbf{X} \in \mathcal{V}_2$.
A MSO formula is interpreted over a $(\Sigma, M)$ string $w$, with respect to assignments $\nu_1 : \mathcal{V}_1 \to \{0, 1, \dots, |w| + 1\}$ and $\nu_2 : \mathcal{V}_2 \to \wp(\{0, 1, \dots, |w| + 1\})$, in this way:
– $\#w\#, M, \nu_1, \nu_2 \models c(\mathbf{x})$ iff $\#w\# = w_1 c w_2$ and $|w_1| = \nu_1(\mathbf{x})$.
– $\#w\#, M, \nu_1, \nu_2 \models \mathbf{x} \in \mathbf{X}$ iff $\nu_1(\mathbf{x}) \in \nu_2(\mathbf{X})$.
– $\#w\#, M, \nu_1, \nu_2 \models \mathbf{x} < \mathbf{y}$ iff $\nu_1(\mathbf{x}) < \nu_1(\mathbf{y})$.
– $\#w\#, M, \nu_1, \nu_2 \models \mathbf{x} \curvearrowright \mathbf{y}$ iff $\#w\# = w_1 a w_2 b w_3$, $|w_1| = \nu_1(\mathbf{x})$, $|w_1 a w_2| = \nu_1(\mathbf{y})$, and $w_2$ is the frontier of a subtree of the syntax tree of $w$.
– $\#w\#, M, \nu_1, \nu_2 \models \neg\varphi$ iff $\#w\#, M, \nu_1, \nu_2 \not\models \varphi$.
– $\#w\#, M, \nu_1, \nu_2 \models \varphi_1 \lor \varphi_2$ iff $\#w\#, M, \nu_1, \nu_2 \models \varphi_1$ or $\#w\#, M, \nu_1, \nu_2 \models \varphi_2$.
– $\#w\#, M, \nu_1, \nu_2 \models \exists \mathbf{x}.\varphi$ iff $\#w\#, M, \nu'_1, \nu_2 \models \varphi$, for some $\nu'_1$ with $\nu'_1(\mathbf{y}) = \nu_1(\mathbf{y})$ for all $\mathbf{y} \in \mathcal{V}_1 - \{\mathbf{x}\}$.
– $\#w\#, M, \nu_1, \nu_2 \models \exists \mathbf{X}.\varphi$ iff $\#w\#, M, \nu_1, \nu'_2 \models \varphi$, for some $\nu'_2$ with $\nu'_2(\mathbf{Y}) = \nu_2(\mathbf{Y})$ for all $\mathbf{Y} \in \mathcal{V}_2 - \{\mathbf{X}\}$.
Note on the Remark following Proposition 7: the extension of closure properties to parenthesized versions does not apply to the case of concatenation.
Note on Definition 8: this is the usual MSO over strings, augmented with the $\curvearrowright$ predicate.
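The $\curvearrowright$ pairs of a string and its parenthesization can both be obtained from a shift-reduce run driven by the OPM alone, since nonterminals are transparent to the precedence relations. The following is a hedged sketch of such a run, not the authors' implementation: ASCII `(`, `)` replace $\llparenthesis$, $\rrparenthesis$, the characters `<`, `=`, `>` encode $\lessdot$, $\doteq$, $\gtrdot$, the function name `op_parse` is hypothetical, and in the second matrix the relations involving `#` are assumptions added to complete the three relations quoted in Proposition 7, item 3.

```python
def op_parse(word, opm):
    """Return (parenthesized string, list of chain pairs) for word under opm.

    opm maps a pair of terminals to one of '<', '=', '>'. Positions are counted
    in '#' + word + '#', starting from 0. Raises ValueError on a missing relation.
    """
    text = "#" + word + "#"
    last = len(text) - 1
    # Stack entry: [terminal, position, relation it was pushed with,
    #               parenthesized subtree sitting immediately to its right].
    stack = [["#", 0, None, ""]]
    pairs = []
    j = 1
    while True:
        top, b = stack[-1], text[j]
        if len(stack) == 1 and j == last:
            return top[3], pairs
        rel = opm.get((top[0], b))
        if rel in ("<", "=") and j < last:
            stack.append([b, j, rel, ""])      # shift
            j += 1
        elif rel == ">":
            body = ""
            while True:                         # pop the handle back to its '<'
                sym, _, pushed, subtree = stack.pop()
                if pushed is None:
                    raise ValueError("malformed handle")
                body = sym + subtree + body
                if pushed == "<":
                    break
            stack[-1][3] = "(" + stack[-1][3] + body + ")"
            pairs.append((stack[-1][1], j))     # context of the reduced subtree
        else:
            raise ValueError(f"no usable relation between {top[0]!r} and {b!r}")


# OPM of Figure 1 (grammar GAE).
GAE_OPM = {
    ("+", "+"): ">", ("+", "*"): "<", ("+", "e"): "<", ("+", "#"): ">",
    ("*", "+"): ">", ("*", "*"): ">", ("*", "e"): "<", ("*", "#"): ">",
    ("e", "+"): ">", ("e", "*"): ">", ("e", "#"): ">",
    ("#", "+"): "<", ("#", "*"): "<", ("#", "e"): "<",
}
TREE, CHAINS = op_parse("e+e*e+e", GAE_OPM)

# a < a, a = b, b > b as in Proposition 7; the '#' entries are assumed here.
AB_OPM = {("a", "a"): "<", ("a", "b"): "=", ("b", "b"): ">",
          ("#", "a"): "<", ("a", "#"): ">", ("b", "#"): ">"}
AB_TREE, _ = op_parse("aaaabbb", AB_OPM)
print(TREE, CHAINS, AB_TREE)
```

Each reduction records the positions of the two terminals surrounding the reduced handle, which is how the pairs come out in execution order.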
To improve readability, we will drop $M$, $\nu_1$, $\nu_2$ and the delimiters $\#$ from the notation, and we will use standard abbreviations such as $\land$, $\forall$, $\mathbf{x} + 1$, $\mathbf{x} - 1$, $\mathbf{x} = \mathbf{y}$, $\mathbf{x} \leq \mathbf{y}$.
A sentence is a formula without free variables. The language of all strings $w \in \Sigma^*$ such that $w \models \varphi$ is $L(\varphi) = \{ w \in \Sigma^* \mid w \models \varphi \}$.
In [27] it is proved that the above MSO logic describes exactly the OPL family.
As usual, we denote the restriction of the MSO logic to the first order as FO. We also recall that the languages generated by free grammars (see Proposition 7, item 4) are FO definable [26].
Whenever we deal with the logic definition of languages we will implicitly exclude the empty string from such languages, in accordance with the traditional convention adopted in the literature (see, e.g., [31]); thus, when talking about MSO or FO definable languages we will exclude empty rules from their grammars.
In this section we recall the original definitions and properties of noncounting (NC) context-free languages [12] based on parenthesis grammars [30] and their relations with the OPL family.
In the following all Par-grammars will be assumed to be BDR, unless the opposite is explicitly stated.
Definition 9 (Noncounting parenthesis language and grammar [12]).
A parenthesis language $L$ is noncounting (NC) or aperiodic iff there exists an integer $n > 1$ such that, for all strings $x, u, w, v, y$ in $(\Sigma \cup \{\llparenthesis, \rrparenthesis\})^*$ where $w$ and $uwv$ are well parenthesized, $x u^n w v^n y \in L$ iff $x u^{n+m} w v^{n+m} y \in L$, $\forall m \geq 0$.
A derivation of a Par-grammar is counting iff it has the form $B \overset{*}{\Rightarrow} u^m B v^m$, with $m > 1$, and there is no derivation $B \overset{*}{\Rightarrow} u B v$.
A Par-grammar is noncounting iff none of its derivations is counting.
Theorem 10 (NC language and grammar (Th. 1 of [12])).
A parenthesis language isNC iff its BDR grammar has no counting derivation.
Definition 11 (NC OP languages and grammars).
For a given OPL $L$ with OPM $M$, $L_p$ is the language of the parenthesized strings $x_p$ uniquely associated to $L$'s strings $x$ by $M$. An OPL $L$ is NC iff its corresponding parenthesized language $L_p$ is NC.
A derivation of an OPG $G$ is counting iff the corresponding derivation of the associated Par-grammar $G_p$ is counting.
Thus, an OPL is NC iff its BDR OPG (unique up to an isomorphism of nonterminal alphabets) has no counting derivations.
In the following, unless parentheses are explicitly needed, we will refer to unparenthesized strings rather than to parenthesized ones, thanks to the one-to-one correspondence.
Note on the exclusion of the empty string from logic-defined languages: such a convention is due to the fact that the semantics of monadic logic formulas is given by referring to string positions.
It is also worth recalling [13] the following peculiar property of OPLs: such languagesare NC or not independently on their OPM, in other words, although the NC property isdefined for structured languages (parenthesis or tree languages [30,37]), in the case ofOPLs this property does not depend on the structure given to the sentences by the OPM.It is important to stress, however, that, despite the above peculiarity of OPLs, aperi-odicity remains a property that makes sense only with reference to the structured versionof languages. Consider, in fact, the following OPLs, with the same OPM consisting of { c (cid:108) c, c . = a, c . = b, a (cid:109) b, b (cid:109) a } besides the implicit relations w.r.t. : L = { c n ( ab ) n | n ≥ } , L = { ( ab ) + } They are both clearly NC and so is their concatenation L · L according to Defini-tion 11 (see also Theorem 22); however, if we applied Definition 9 to L · L withoutconsidering parentheses, we would obtain that, for every n , c n ( ab ) n ∈ L · L but notso for c n +1 ( ab ) n +1 . Next we introduce
Operator Precedence Expressions (OPE) as another formalism to define OPLs, equivalent to OPGs and MSO logic. An OPE uses the same operations on strings and languages as Kleene's REs, and just one additional operation, called fence, that selects from a language the strings that correspond to a well parenthesized string. In the past, regular expressions of different kinds have been proposed for string languages more general than the finite-state ones (e.g. the cap expressions for CF languages [40]) or for languages made of structures instead of strings, e.g., the tree languages or the picture languages. Our OPEs have little in common with any of them and, unlike regular expressions for tree languages [38], enjoy in the context of OPLs the same properties as regular expressions in the context of regular languages.
We recall that an OPM $M$ defines a function from unparenthesized strings to their parenthesized counterparts; such a function is exploited in the following definition. For convenience, we define the homomorphism (projection) $\eta : \Sigma \cup \{\#\} \to \Sigma \cup \{\varepsilon\}$ as: $\eta(a) = a$, for $a \in \Sigma$, and $\eta(\#) = \varepsilon$.
Definition 12 (OPE).
Given a complete OPM $M$, an OPE $E$ and its language $L_M(E) \subseteq \Sigma^*$ are defined as follows. The meta-alphabet of OPE uses the same symbols as regular expressions, together with the two symbols '[' and ']'. Let $E_1$ and $E_2$ be OPEs:
1. $a \in \Sigma$ is an OPE with $L_M(a) = a$.
2. $\neg E_1$ is an OPE with $L_M(\neg E_1) = \Sigma^* - L_M(E_1)$.
3. $a[E_1]b$, called the fence operation, i.e., we say $E_1$ in the fence $a, b$, is an OPE with
   if $a, b \in \Sigma$: $L_M(a[E_1]b) = a \cdot \{x \in L_M(E_1) \mid M(a \cdot x \cdot b) = \llparenthesis a \cdot M(x) \cdot b \rrparenthesis\} \cdot b$
   if $a = \#$, $b \in \Sigma$: $L_M(\#[E_1]b) = \{x \in L_M(E_1) \mid M(x \cdot b) = \llparenthesis M(x) \cdot b \rrparenthesis\} \cdot b$
   if $a \in \Sigma$, $b = \#$: $L_M(a[E_1]\#) = a \cdot \{x \in L_M(E_1) \mid M(a \cdot x) = \llparenthesis a \cdot M(x) \rrparenthesis\}$
   where $E_1$ must not contain $\#$.
4. $E_1 \cup E_2$ is an OPE with $L_M(E_1 \cup E_2) = L_M(E_1) \cup L_M(E_2)$.
5. $E_1 \cdot E_2$ is an OPE with $L_M(E_1 \cdot E_2) = L_M(E_1) \cdot L_M(E_2)$, where $E_1$ does not contain $a[E_3]\#$ and $E_2$ does not contain $\#[E_3]a$, for some OPE $E_3$, and $a \in \Sigma$.
6. $E_1^*$ is an OPE defined by $E_1^* := \bigcup_{n=0}^{\infty} E_1^n$, where $E_1^0 := \{\varepsilon\}$, $E_1^1 = E_1$, $E_1^n := E_1^{n-1} \cdot E_1$; $E_1^+ := \bigcup_{n=1}^{\infty} E_1^n$.
Among the operations defining OPEs, concatenation has the maximum precedence; set-theoretic operations have the usual precedences; the fence operation is dealt with as a normal parenthesis pair.
Similarly to the case of regular expressions, a star-free (SF) OPE is one that does not use the $*$ and $+$ operators. The conditions on $\#$ in the concatenation rule reflect the fact that $\#$ is not permitted within, say, the left factor $E_1$, because delimiters are necessarily positioned at the two ends of a string.
Besides the usual abbreviations for set operations (e.g., $\cap$ and $-$), we will also use the following derived operators:
– $a \Delta b := a[\Sigma^+]b$.
– $a \nabla b := \neg(a \Delta b) \cap a \cdot \Sigma^+ \cdot b$.
It is trivial to see that the identity $a[E]b = a \Delta b \cap a \cdot E \cdot b$ holds.
The fact that in Definition 12 matrix $M$ is complete is without loss of generality: to state that for two terminals $a$ and $b$, $M_{a,b} = \emptyset$ (i.e. that there should be a "hole" in the OPM for them), we can use the short notations
$\mathrm{hole}(a, b) := \neg(\Sigma^* (ab \cup a \Delta b) \Sigma^*)$, $\quad \mathrm{hole}(\#, b) := \neg(\# \Delta b \, \Sigma^*)$, $\quad \mathrm{hole}(a, \#) := \neg(\Sigma^* a \Delta \#)$
and intersect them with the OPE.
The following examples illustrate the meaning of the fence operation, the expressiveness of OPLs w.r.t. less powerful classes of context-free languages, and how OPEs naturally extend regular expressions to the OPL family.
Example 13.
Let $\Sigma$ be $\{a, b\}$, $\{a \lessdot a, a \doteq b, b \gtrdot b\} \subseteq M$. The OPE $a[a^* b^*]b$ defines the language $\{a^n b^n \mid n \geq 1\}$. In fact the fence operation imposes that any string $x \in a^* b^*$ embedded within the context $a, b$ be well-parenthesized according to $M$.
The OPEs $a[a^* b^*]\#$ and $a^+ a[a^* b^*]b \cup \{a^+\}$, instead, both define the language $\{a^n b^m \mid n > m \geq 0\}$, since the matrix $M$ allows for, e.g., the string $aaabb$ parenthesized as $\llparenthesis a \llparenthesis a \llparenthesis ab \rrparenthesis b \rrparenthesis \rrparenthesis$.
If instead $\Sigma = \{a, b, c\}$, with $\{a \lessdot a, a \doteq b, a \doteq c, b \gtrdot b, b \gtrdot c, c \gtrdot b\} \subseteq M$, then both $a[a^* (bc)^*]b$ and $a[(aa)^* (bc)^*]b$ define the language $\{a (a^{2n} (bc)^n) b \mid n \geq 0\}$.
It is also easy to define Dyck languages with OPEs, as their parenthesis structure is naturally encoded by the OPM. Consider $L_{Dyck}$, the Dyck language with two pairs of parentheses denoted by $a, a'$ and $b, b'$. This language can be described simply through an incomplete OPM, reported in Figure 3 (left). In other words it is $L_{Dyck} = L(G_{max,M})$ where $M$ is the matrix of the figure. Given that, for technical simplicity, we use only complete OPMs, we must refer to the one in Figure 3 (right), and state in the OPE that some OP relations are not wanted, such as $a, b'$, where the open and closed parentheses are of the wrong kind, or $a, \#$, i.e. an open $a$ must have a matching $a'$.

$$\begin{array}{c|ccccc} & a & a' & b & b' & \# \\ \hline a & \lessdot & \doteq & \lessdot & & \\ a' & \lessdot & \gtrdot & \lessdot & \gtrdot & \gtrdot \\ b & \lessdot & & \lessdot & \doteq & \\ b' & \lessdot & \gtrdot & \lessdot & \gtrdot & \gtrdot \\ \# & \lessdot & & \lessdot & & \doteq \end{array} \qquad \begin{array}{c|ccccc} & a & a' & b & b' & \# \\ \hline a & \lessdot & \doteq & \lessdot & \gtrdot & \gtrdot \\ a' & \lessdot & \gtrdot & \lessdot & \gtrdot & \gtrdot \\ b & \lessdot & \gtrdot & \lessdot & \doteq & \gtrdot \\ b' & \lessdot & \gtrdot & \lessdot & \gtrdot & \gtrdot \\ \# & \lessdot & \lessdot & \lessdot & \lessdot & \doteq \end{array}$$

Fig. 3. The incomplete OPM defining $L_{Dyck}$ (left), and a possible completion $M_{complete}$ (right).

The following OPE defines $L_{Dyck}$ by suitably restricting the "universe" $L(G_{max,M_{complete}})$:
$\mathrm{hole}(a, b') \cap \mathrm{hole}(b, a') \cap \mathrm{hole}(\#, a') \cap \mathrm{hole}(\#, b') \cap \mathrm{hole}(a, \#) \cap \mathrm{hole}(b, \#)$
Example 14.
For a more application-oriented case, consider the classical LIFO policy managing procedure calls and returns but assume also that interrupts may occur: in such a case the stack of pending calls is emptied and computation is resumed from scratch.

$$\begin{array}{c|cccc} & call & ret & int & \# \\ \hline call & \lessdot & \doteq & \gtrdot & \\ ret & \gtrdot & \gtrdot & \gtrdot & \gtrdot \\ int & \gtrdot & & \gtrdot & \gtrdot \\ \# & \lessdot & & \lessdot & \end{array}$$

Fig. 4. Incomplete OPM $M_{int}$ for the OPE describing an interrupt policy.

This policy is already formalized by the incomplete OPM of Figure 4, with $\Sigma = \{call, ret, int\}$ with the obvious meaning of symbols. For example, the string $call\ call\ ret\ call\ call\ int$ represents a run where only the second call returns, while the other ones are interrupted. On the contrary, $call\ call\ int\ ret$ is forbidden, because a return is not allowed when the stack is empty.
If we further want to say that there must be at least one procedure terminating regularly, we can use the OPE: $\Sigma^* \cdot call \Delta ret \cdot \Sigma^*$.
Another example is the following, where we state that the run must contain at least one sub-run where no procedures are interrupted: $\Sigma^* \cdot \mathrm{hole}(call, int) \cdot \Sigma^*$.
Notice that the language defined by the above OPE is not a VPL, since VPLs only allow for unmatched returns and calls at the beginning or at the end of a string, respectively.
Theorem 15.
For every OPE $E$ on an OPM $M$, there is an OPG $G$, compatible with $M$, such that $L_M(E) = L(G)$.
Proof. By induction on $E$'s structure. The operations $\cup$, $\neg$, $\cdot$, and $^*$ come from the closures of OPLs. The only new case is $a[E]b$, which is given by the following grammar. If, by induction, $G$ defines the same language as $E$, with axiom $S_E$, then we add to $G$ the following rules, where $S$ is the new axiom, and $S$, $S'$ are nonterminals not used in $G$:
– $S \to \eta(a)\, S_E\, \eta(b)$, if $a \doteq b$ in $M$;
– $S \to \eta(a)\, S'$ and $S' \to S_E\, \eta(b)$, if $a \lessdot b$ in $M$;
– $S \to S'\, \eta(b)$ and $S' \to \eta(a)\, S_E$, if $a \gtrdot b$ in $M$.
Let us call this new grammar $G'$. The grammar for $a[E]b$ is then the one obtained by applying the construction for intersection between $G'$ and the maxgrammar for $M$. This intersection is to check that $a \lessdot \mathcal{L}(S_E)$ and $\mathcal{R}(S_E) \gtrdot b$; if it is not the case, according to the semantics of $a[E]b$, the resulting language is empty. □
Next we show that OPEs can express any language that is definable through an MSO formula as defined in Section 2.3. Thanks to the fact that the same MSO logic can express exactly OPLs [27] and to Theorem 15 we will obtain our first major result, i.e., the equivalence of MSO, OPG, OP automata (see e.g., [28]), and OPE.
In order to construct an OPE from a given MSO formula we follow the traditional path adopted for regular languages (as explained, e.g., in [33]) and augment it to deal with the new $\mathbf{x}_i \curvearrowright \mathbf{x}_j$ relation. For an MSO formula $\varphi$, let $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_r$ be the set of first order variables occurring in $\varphi$, and $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_s$ be the set of second order variables. We use the new alphabet $B_{p,q} = \Sigma \times \{0,1\}^p \times \{0,1\}^q$, where $p \geq r$ and $q \geq s$.
Themain idea is that the { , } p part of the alphabet is used to encode the value of the firstorder variables (e.g. for p = r = 4 , (1 , , , stands for both the positions xxx and xxx ),while the { , } q part of the alphabet is used for the second order variables. Hence, weare interested in the language K p,q formed by all strings where the components encodingthe first order variables contain exactly one occurrence of 1. We also use this definition C k := { c ∈ B p,q | the ( k + 1) -th component of c = 1 } . Theorem 16.
For every MSO formula $\varphi$ on an OP alphabet $(\Sigma, M)$ there is an OPE $E$ on $M$ such that $L_M(E) = L(\varphi)$.
Proof. By induction on $\varphi$'s structure; the construction is standard for regular operations, the only difference is $\mathbf{x}_i \curvearrowright \mathbf{x}_j$.
Following Büchi's theorem, we use the alphabet $B_{p,q}$ to encode interpretations of free variables. The set $K_{p,q}$ of strings where each component encoding a first-order variable contains exactly one 1 is given by the following regular expression:
$$K_{p,q} = \bigcap_{1 \leq i \leq p} \left( B_{p,q}^* C_i B_{p,q}^* - B_{p,q}^* C_i B_{p,q}^* C_i B_{p,q}^* \right).$$
Disjunction and negation are naturally translated into $\cup$ and $\neg$. As in Büchi's theorem, for $\exists \mathbf{x}_i \psi$ (resp. $\exists \mathbf{X}_j \psi$) we first build the expression $E_\psi$ for $\psi$ on the alphabet $B_{p,q}$; the expression $E$ for the quantified formula is then obtained from $E_\psi$ by erasing by projection the component $i$ (resp. $j$) from the alphabet $B_{p,q}$. The order relation $\mathbf{x}_i < \mathbf{x}_j$ is represented by $K_{p,q} \cap B_{p,q}^* C_i B_{p,q}^* C_j B_{p,q}^*$.
Last, the OPE for $\mathbf{x}_i \curvearrowright \mathbf{x}_j$ is $B_{p,q}^* C_i [B_{p,q}^+] C_j B_{p,q}^*$. □
After having complemented the characterization of OPLs in terms of OPEs, we now enter the analysis of the critical subclass of aperiodic OPLs: in this section we show that the languages defined by star-free OPEs coincide with the FO-definable OPLs; in Section 5 that NC OPLs are closed w.r.t. boolean operations and concatenation and therefore SF OPEs define NC OPLs; in Section 6 we provide a new characterization of OPLs in terms of MSO formulas by exploiting a control graph associated with a BDR OPG; finally, in Section 7 we show that such MSO formulas can be made FO when the OPL is NC.
Lemma 17 (Flat Normal Form).
Any star-free OPE can be written in the following form, called flat normal form:
$$\bigcup_i \bigcap_j t_{i,j}$$
where the elements $t_{i,j}$ have either the form $L_{i,j}\, a_{i,j} \Delta b_{i,j}\, R_{i,j}$, or $L_{i,j}\, a_{i,j} \nabla b_{i,j}\, R_{i,j}$, or $H_{i,j}$, for $a_{i,j}, b_{i,j} \in \Sigma$, and $L_{i,j}$, $R_{i,j}$, $H_{i,j}$ star-free regular expressions.
Proof. The lemma is a consequence of the distributive and De Morgan properties, together with the following identities, where $\circ_1, \circ_2 \in \{\Delta, \nabla\}$, and $L_k$ are star-free regular expressions, $1 \leq k \leq 3$:
$$a[E]b = a \Delta b \cap aEb$$
$$L_1 a_1 \circ_1 a_2 L_2 a_3 \circ_2 a_4 L_3 = L_1 a_1 \circ_1 a_2 L_2 a_3 \Sigma^+ a_4 L_3 \cap L_1 a_1 \Sigma^+ a_2 L_2 a_3 \circ_2 a_4 L_3$$
$$\neg(L_1 a_1 \Delta a_2 L_2) = L_1 a_1 \nabla a_2 L_2 \cup \neg(L_1 a_1 \Sigma^+ a_2 L_2)$$
$$\neg(L_1 a_1 \nabla a_2 L_2) = L_1 a_1 \Delta a_2 L_2 \cup \neg(L_1 a_1 \Sigma^+ a_2 L_2)$$
The first two identities are immediate, while the last two are based on the idea that the only non-regular constraints of the left-hand negations are respectively $a_1 \nabla a_2$ or $a_1 \Delta a_2$, that represent strings that are not in the set only because of their structure. □
Theorem 18.
For every FO formula ϕ on an OP alphabet ( Σ, M ) there is a star-freeOPE E on M such that L M ( E ) = L ( ϕ ) .Proof. Consider the ϕ formula, and its set of first order variables: like in Section 3, B p = Σ × { , } p (the q components are absent, being ϕ a first order formula), and theset K p of strings where each component encoding a variable is such that there existsonly one 1.First, K p is star-free: K p = (cid:92) ≤ i ≤ p ( B ∗ p C i B ∗ p − B ∗ p C i B ∗ p C i B ∗ p ) . Disjunction and negation are naturally translated into ∪ and ¬ ; xxx i < xxx j is coveredby the star-free OPE K p ∩ B ∗ p C i B ∗ p C j B ∗ p .The xxx i (cid:121) xxx j formula is like in the second order case, i.e. is translated into B ∗ p C i [ B + p ] C j B ∗ p , which is star-free. periodicity, Star-freeness, and First-order Definability of Structured CF Languages 17 For the existential quantification, the problem is that star-free (OP and regular)languages are not closed under projections. Like in the regular case, the idea is toleverage the encoding of the evaluation of first-order variables, because there is onlyone position in which the component is 1 (see K p ), to use the bijective renamings π ( a, v , v , ..., v p − ,
0) = ( a, v , v , ..., v p − ) , and π ( a, v , v , ..., v p − ,
1) = ( a, v , v , ..., v p − ) , where the last component is the oneencoding the quantified variable. Notice that the bijective renaming does not change the Σ component of the symbol, thus maintaining all the OP precedence relations.Let E ϕ be the star-free OPE on the alphabet B p for the formula ϕ , with x a freevariable in it. Let us assume w.l.o.g. that the evaluation of x is encoded by the lastcomponent of B p ; let B = Σ × { , } p − × { } , and A = Σ × { , } p − × { } .The OPE for ∃ xϕ is obtained from the OPE for ϕ through the bijective renaming π ,and considering all the cases in which the symbol from A can occur.First, let E (cid:48) be a OPE in flat normal form, equivalent to E ϕ (Lemma 17). The FOsemantics is such that L ( ϕ ) = L M ( E (cid:48) ) = L M ( E (cid:48) ) ∩ B ∗ AB ∗ .By construction, E (cid:48) is a union of intersections of elements L i,j a i,j ∆b i,j R i,j , or L i,j a i,j ∇ b i,j R i,j , or H i,j , where a i,j , b i,j ∈ Σ , and L i,j , R i,j , H i,j are star-free regularlanguages.In the intersection between E (cid:48) and B ∗ AB ∗ , all the possible cases in which thesymbol in A can occur in E (cid:48) ’s terms must be considered: e.g. in L i,j a i,j ∆b i,j R i,j it couldoccur in the L i,j prefix, or in a i,j ∆b i,j , or in R i,j . More precisely, L i,j a i,j ∆b i,j R i,j ∩ B ∗ AB ∗ = ( L i,j ∩ B ∗ AB ∗ ) a i,j ∆b i,j R i,j ∪ L i,j ( a i,j ∆b i,j ∩ B ∗ AB ∗ ) R i,j ∪ L i,j a i,j ∆b i,j ( R i,j ∩ B ∗ AB ∗ ) (the ∇ case is analogous, H i,j is immediate, being regular star-free).The cases in which the symbol from A occurs in L i,j or R i,j are easy, because theyare by construction regular star-free languages, hence we can use one of the standardregular approaches found in the literature (e.g. by using the splitting lemma in [18]). Theonly differences are in the factors a i,j ∆b i,j , or a i,j ∇ b i,j .Let us consider the case a i,j ∆b i,j ∩ B ∗ AB ∗ . 
The cases a i,j ∈ A or b i,j ∈ A are like ( L i,j ∩ B ∗ AB ∗ ) and ( R i,j ∩ B ∗ AB ∗ ) , respectively, because L i,j a i,j and b i,j R i,j arealso regular star-free ( ∇ is analogous).The remaining cases are a i,j ∆b i,j ∩ B + AB + and a i,j ∇ b i,j ∩ B + AB + . By def-inition of ∆ , a i,j ∆b i,j ∩ B + AB + = a i,j [ B ∗ AB ∗ ] b i,j , and its bijective renaming is π ( a i,j )[ π ( B ∗ ) π ( A ) π ( B ∗ )] π ( b i,j ) = a (cid:48) i,j [ B + p − ] b (cid:48) i,j , where π ( a i,j ) = a (cid:48) i,j , and π ( b i,j ) = b (cid:48) i,j , which is a star-free OPE. By definition of ∇ , a i,j ∇ b i,j ∩ B + AB + = ¬ ( a i,j [ B + p ] b i,j ) ∩ a i,j B + p b i,j ∩ B + AB + = ¬ ( a i,j [ B + p ] b i,j ) ∩ a i,j B ∗ AB ∗ b i,j .Hence, its renaming is ¬ ( π ( a i,j )[ π ( B ∗ p ) π ( B p ) π ( B ∗ p )] π ( b i,j )) ∩ π ( a i,j B ∗ ) π ( A ) π ( B ∗ b i,j ) = ¬ ( a (cid:48) i,j [ B + p − ] b (cid:48) i,j ) ∩ a (cid:48) i,j B + p − b (cid:48) i,j , a star-free OPE. (cid:117)(cid:116) Theorem 19.
For every star-free OPE E on an OP alphabet ( Σ, M ) , there is a FOformula ϕ on ( Σ, M ) such that L M ( E ) = L ( ϕ ) .Proof. The proof is by induction on E ’s structure. Of course, singletons are easilyfirst-order definable; for negation and union we use ¬ and ∨ as natural.Like in the case of star-free regular languages, concatenation is less immediate, andit is based on formula relativization . Consider two FO formulae ϕ and ψ , and assumew.l.o.g. that their variables are disjunct, and let xxx be a variable not used in neither of them. To construct a relativized variant of ϕ , called ϕ Counting and non-counting parenthesis languages are closed w.r.t. tocomplement. Thus, counting and non-counting OPLs are closed w.r.t. complement w.r.t.the max-language defined by any OPM.Proof. We give the proof for counting languages which also implies the closure ofnon-counting ones.By definition of counting parenthesis language and from Theorem 1 of [12], if L p is counting there exist strings x, u, v, z, y and integers n, m with n > , m > suchthat xv n + r zu n + r y ∈ L for all r = km > but not for all r > . Thus, the complementof L p contains infinitely many strings xv n + i zu n + i y ∈ L p but not all of them sincefor some i , i = km . Thus, for ¬ L p too there is no n such that xv n zu n y ∈ L iff xv n + r zu n + r y ∈ L for all r ≥ .The same holds for the unparenthesized version of L p if it is an OPL. (cid:117)(cid:116) Theorem 21. Non-counting parenthesis languages and non-counting OPLs are closedw.r.t. union and therefore w.r.t. intersection.Proof. Let L , L be two NC parenthesis languages/OPLs. Assume by contradiction that L = L ∪ L be counting. Thus, there exist strings x, u, v, z, y such that for infinitelymany m , xv n zu n y ∈ L but for no n xv n zu n y ∈ L iff xv n + r zu n + r y ∈ L for all r ≥ .Hence, the same property must hold for at least one of L and L which therefore wouldbe counting. 
(cid:117)(cid:116) Notice that, unlike the case of complement, counting languages are not closed w.r.t.union and intersection, whether they are regular or parenthesis or OP languages. Theorem 22. Non-counting OPLs are closed w.r.t. concatenation. periodicity, Star-freeness, and First-order Definability of Structured CF Languages 19 Proof. Recall from [14] that OPLs with compatible OPM are closed w.r.t. concatenation.Thus, let L , L be NC OPLs, and G = ( Σ, V N , P , S ) , G = ( Σ, V N , P , S ) theirrespective BDR OPGs. Let also L p , L p , be their respective parenthesized languagesand G p , G p , their respective parenthesized grammars. We also recall that in generalthe parenthesized version L p of L = L · L is not the parenthesized concatenation ofthe parenthesized versions of L and L , i.e., L p may differ from (cid:76) L (cid:48) p · L (cid:48) p (cid:77) , where (cid:76) L (cid:48) p (cid:77) = L p and (cid:76) L (cid:48) p (cid:77) = L p , because the OP concatenation may cause the syntaxtrees of L and L to coalesce.The construction given in [14] builds a grammar G whose nonterminal alphabetincludes V N , V N and a set of pairs [ A , A ] with A ∈ V N , A ∈ V N ; the axioms of G are the pairs [ X , X ] with X ∈ S , X ∈ S . In essence (Lemmas 18 through 21of [14]) G ’s derivations are such that [ X , X ] ∗ == ⇒ G x [ A , A ] y , [ A , A ] ∗ == ⇒ G u implies u = w · z for some w, z and X ∗ == ⇒ G xA , A ∗ == ⇒ G w , X ∗ == ⇒ G A y , A ∗ == ⇒ G z . Noticethat some substrings of x · w , resp. z · y , may be derived from nonterminals belongingto V N , resp. 
V N , as the consequence of rules of type [ A , A ] → α [ B , B ] β with α ∈ V ∗ , β ∈ V ∗ , where [ B , B ] could be missing; also, any string γ derivable in G contains at most one nonterminal of type [ A , A ] (see Figure 5).Suppose, by contradiction, that G has a counting derivation [ X , X ] ∗ == ⇒ G x [ A , A ] y ∗ == ⇒ G xu m [ A , A ] v m y ∗ == ⇒ G xu m zv m y (one of u m , v m could be empty either in L orin L p ) whereas [ A , A ] does not derive u [ A , A ] v : this would imply the derivations A ∗ == ⇒ G u m A , A ∗ == ⇒ G A v m which would be counting in G and G since theywould involve the same nonterminals in the pairs [ A i , A j ] . Figure 5 shows a countingderivation of G derived by the concatenation of two counting derivations of G and G ;in this case neither u m nor v m are empty.If instead the counting derivation of G were derived from nonterminals belonging to V N , (resp. V N ) that derivation would exist identical for G (resp. G ). (cid:117)(cid:116) Thanks to the above closure properties we deduce the following important propertyof OPEs. Theorem 23. The OPLs defined through star-free OPEs are NC.Proof. Thanks to Lemma 17 we only need to consider OPEs in flat normal form:they consist of star-free regular expressions combined through boolean operations andconcatenation with a∆b and a ∇ b operators. a∆b = a [ Σ + ] b is obviously NC; a ∇ b is theintersection of the negation of a∆b with the regular star-free expression aΣ + b . Thanksto the above closure properties of NC OPLs, star-free OPEs are NC. (cid:117)(cid:116) This is a minor deviation from the formulation given in [14] since in that paper it was assumedthat grammars have only one axiom. Note that the G produced by the construction is BD if so are G and G , but it could benot necessarily BDR; however, if a BDR OPG has a counting derivation, any equivalent BDgrammar has also a counting derivation.0 Dino Mandrioli, Matteo Pradella, Stefano Crespi Reghizzi x X A uuu... 
A z yX A vvv...A z x [X X ]uuu... z z yvvv...[A A ][A A ] m m Fig. 5. An example of paired derivations combined by the concatenation construction. In this casethe last character of u is in . = relation with the first character of v .periodicity, Star-freeness, and First-order Definability of Structured CF Languages 21 In this cornerstone section we show how any OPL can be expressed as a combinationof a “skeleton language” –the max-language associated with the OPM– combined witha “regular control”. Such a regular control, defined through a graph derived from theOPG, can be translated in the traditional way into MSO formulas –which become FOif the language defined by the graph is noncounting [31]–. These formulas, suitablycomplemented by the (cid:121) relation, express the language generated by the source OPG.The following definition of control graph associates a regular language with everynonterminal symbol of the grammar. Definition 24 (control graph). Let G = ( Σ, V N , P, S ) be an OPG. The control graph of G , denoted by C ( G ) =( Q, Σ, δδδ ) , is the graph having vertices or states Q and edges defined by δδδ relation anddenoted by an arrow −→ , labelled by elements in Σ ∗ , defined as follows. – Q = V (cid:3) N ∪ V (cid:2) N , where V (cid:3) N (resp. V (cid:2) N ) = { A (cid:3) (resp. A (cid:2) ) | A ∈ V N } . – Let W be the set: W = { w ∈ Σ + | A → βwγ ∈ G, β = β (cid:48) B or ε, γ = Cγ (cid:48) or ε } . 
(4)
The macroedges, denoted by a boldface arrow $\boldsymbol{\longrightarrow}$, define the macro $\delta$ relation $\boldsymbol{\delta}$. They are associated with an OPG according to the following table, where $w \in W$:

$$\begin{array}{l|l} \text{rule} & \text{edge} \\ \hline A \to B\gamma & A^\downarrow \xrightarrow{\varepsilon} B^\downarrow \\ A \to wB\gamma & A^\downarrow \xrightarrow{w} B^\downarrow \\ A \to \beta B & B^\uparrow \xrightarrow{\varepsilon} A^\uparrow \\ A \to \beta Bw & B^\uparrow \xrightarrow{w} A^\uparrow \\ A \to \beta BwC\gamma & B^\uparrow \xrightarrow{w} C^\downarrow \\ A \to w & A^\downarrow \xrightarrow{w} A^\uparrow \end{array}$$

For a given control graph, the regular languages consisting of the paths going from state to state are named control languages; in particular, for any grammar nonterminal $A$, we will denote the set $\{x \mid A^\downarrow \xrightarrow{x} A^\uparrow\}$ as $R_A$.
Notice that the triple $C(G) = (Q, \Sigma, \boldsymbol{\delta})$ defining a control graph can be seen as the homonymous triple of a finite automaton, in the extended version introduced in Section 2.1, without altering the properties of the defined languages (the regular ones). This simplifying notation will allow us, in the following, to state a more immediate correspondence between the terminal parts of grammar rules and graph edges, without introducing useless intermediate steps. For this reason the edges of a control graph are called macroedges and the transitions macrosteps. Whenever needed to avoid confusion, the arrows denoting graph edges and macroedges will be labeled with a subscript indicating the $\delta$ relation they belong to.
Intuitively, a state of type $A^\downarrow$ denotes that a path of the control graph visiting the syntax tree of a string generated by $G$ is touching the nonterminal $A$ while following a top-down direction; conversely, it visits $A^\uparrow$ while following a bottom-up direction. We will see (Theorem 26) that the frontier of a syntax tree rooted in nonterminal $A$ is a path of the control graph, going from $A^\downarrow$ to $A^\uparrow$ (of course, since such paths form regular languages, they also include strings that are not in $L_G(A)$).
An example of control graph expressed in terms of macrosteps can be found in Figure 14.
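The rule-to-edge table of the control graph definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the state encoding (`(A, "down")` and `(A, "up")` for the descending and ascending copies of a nonterminal), the function names, the restriction to single-character terminals, and the small sample grammar are all hypothetical choices and are not taken from the paper. The sketch assumes right-hand sides in operator form (no two adjacent nonterminals).

```python
from collections import deque

def segments(rhs, nonterminals):
    """Split a right-hand side into nonterminals and maximal terminal runs."""
    segs = []
    for sym in rhs:
        if sym in nonterminals:
            segs.append(("N", sym))
        elif segs and segs[-1][0] == "T":
            segs[-1] = ("T", segs[-1][1] + sym)  # extend the current terminal run
        else:
            segs.append(("T", sym))
    return segs

def control_graph(rules, nonterminals):
    """Return the set of macroedges (source state, label, target state)."""
    edges = set()
    for lhs, rhs in rules:
        src, label = (lhs, "down"), ""           # start in the descending copy of lhs
        for kind, val in segments(rhs, nonterminals):
            if kind == "T":
                label = val                      # terminal run read by the next edge
            else:
                edges.add((src, label, (val, "down")))
                src, label = (val, "up"), ""     # resume from the ascending copy
        edges.add((src, label, (lhs, "up")))     # close in the ascending copy of lhs
    return edges

def in_control_language(edges, nt, x):
    """Is x the label of a path from the descending to the ascending copy of nt?"""
    start, goal = ((nt, "down"), 0), ((nt, "up"), len(x))
    seen, queue = {start}, deque([start])
    while queue:
        state, pos = queue.popleft()
        if (state, pos) == goal:
            return True
        for src, label, dst in edges:
            if src == state and x.startswith(label, pos):
                nxt = (dst, pos + len(label))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False

# Hypothetical sample grammar: A -> aBc | d ; B -> aAb
RULES = [("A", ["a", "B", "c"]), ("A", ["d"]), ("B", ["a", "A", "b"])]
EDGES = control_graph(RULES, {"A", "B"})
```

With this sample grammar the control language of `A` accepts `aadbc` but also `aad`, which is not generated by `A`: the control language only constrains the order of terminal runs along the path, while the matching of the two sides is left to the OPM structure.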
We already know that the MSO logic defined in Section 2.3 as an extension of thetraditional logic for regular languages defines exactly the family of OPLs. In this sectionwe show a way to obtain an MSO formula equivalent to an OPG directly from its controlgraph: the final goal is to obtain from such a construction an FO formula instead of anMSO one in the case that the OPL is aperiodic.Intuitively the (cid:121) relation, which is the only new element w.r.t. the traditional MSOlogic for regular languages, “embraces” the string x generated by some grammar non-terminal A , thus it must be A (cid:3) x −→−→−→ A (cid:2) . Next we provide the details of the MSOconstruction.First, we resume from previous papers about logic characterization of OPL [27,26]the following TreeC formula which states that the positions x , . . . , x n , with n > , ofa string are, in order, the positions of the terminal characters of a grammar rule rhs and x , x n +1 are the positions of the character immediately at the left and immediately at theright of the subtree generated by that rule: TreeC( x , x , . . . , x n , x n +1 ) := x (cid:121) x n +1 ∧ (cid:86) ≤ i ≤ n x i + 1 = x i +1 ∨ x i (cid:121) x i +1 ∧ (cid:86) i +1 Fig. 6. An example of the TreeC relation for a rule A → aBbcCdD (with a ( x ) , b ( x ) , c ( x ) , d ( x ) ). where the disjunction is considered over the rules of G and B j are either ε or are thenonterminals occurring in the rhs of the production.Finally formula χ states that the strings included between χ := (cid:94) A ∈ V N ψ A ∧ ∃ e (cid:32) e + 1) ∧ ¬∃ y ( e + 1 < y ) ∧ (cid:95) A ∈ S ϕ A (0 , e + 1) (cid:33) (7) Example 25. Consider the following OPG G NL , with S = { A, B } . A → aBcA | aBcB | ac, B → bAcA | bAcB | bc Let ϕ A and ϕ B be the MSO formulas defining the regular languages R A and R B ,and ϕ A ( x , y ) and ϕ B ( x , y ) their respective relativized versions. 
Then the ψ A formulafor nonterminal A of G NL is: ∀ x , y ϕ A ( x , y ) ∧ x (cid:121) y ⇒∃ x , x TreeC( x , x , x , y ) ∧ a ( x ) ∧ c ( x ) ∧ ϕ B ( x , x ) ∧ ϕ A ( x , y ) ∧ x + 1 = x ∨∃ x , x TreeC( x , x , x , y ) ∧ a ( x ) ∧ c ( x ) ∧ ϕ B ( x , x ) ∧ ϕ B ( x , y ) ∧ x + 1 = x ∨∃ x , x (cid:18) TreeC( x , x , x , y ) ∧ a ( x ) ∧ c ( x ) ∧ x + 1 = x ∧ x + 1 = x ∧ x + 1 = y (cid:19) (8)Notice that we purposedly avoided some obvious simplifications to emphasize thegeneral structure of the ψ formula. Theorem 26 (Regular Control). Let G = ( Σ, V N , P, S ) be a BDR OPG, M its OPM, C ( G ) its control graph, ψ A the formula defined above for each of G ’s nonterminals.Then, for any A ∈ V N , x ∈ L ( A ) if and only if x (cid:15) ϕ A (0 , | x | + 1) ∧ ψ A .Proof. First of all, we note that A (cid:3) x −−→ A (cid:2) iff x (cid:15) ϕ A (0 , | x | + 1) , i.e. R A = { x | x (cid:15) ϕ A (0 , | x | + 1) } , by construction of C ( G ) and of ϕ A .The proof is by induction on the height m of the syntax trees rooted in A . Base : m = 1 . If A == ⇒ G x , with x = c . . . c n , i.e. A → x is a production of G , then x (cid:15) TreeC(0 , . . . , n + 1) and x (cid:15) c i ( i ) for every i = 1 . . . n . Also, it is A (cid:3) x −→ A (cid:2) , by construction of C ( G ) . Hence, x (cid:15) ϕ A (0 , | x | + 1) ∧ ψ A .Vice versa, it is x (cid:15) ϕ A (0 , | x | + 1) ∧ ψ A , with x = (cid:108) c . = c . = . . . c n (cid:109) .Therefore: (i) x ∈ R A , (ii) x (cid:15) (cid:121) | x | + 1 , and (iii) x (cid:15) c i ( i ) for every i = 1 . . . n . (ii) and (iii) imply that there exists a production B → x , but being G BDR, B must be A . Hence, x ∈ L ( A ) . Induction : m > . Let us consider any A → B c B . . . c n B n ∈ P , c i ∈ Σ , wheresome B i could be absent – we assume for simplicity that they are all present; the casewhere some of them is missing can be promptly adapted. Case A == ⇒ G B c B . . . c n B n ∗ == ⇒ G w c w c w . . . 
c n w n = x implies x (cid:15) ϕ A (0 , | x | + 1) ∧ ψ A . Induction hypothesis: for each i = 0 . . . n , B i ∗ == ⇒ G w i implies w i (cid:15) ϕ B i (0 , | w i | + 1) ∧ ψ B i .Let x i be the position of c i in x (i.e. x (cid:15) c i ( x i ) ), i = 1 . . . n . Being A == ⇒ G B c B . . . c n B n ∗ == ⇒ G w c w c w . . . c n w n = x , the structure of x is such that (cid:108) w (cid:109) c (cid:108) w (cid:109) . . . c n (cid:108) w n (cid:109) . Hence, x (cid:15) x i − (cid:121) x i , i = 1 . . . n , and (cid:121) | x | + 1 . By construction of C ( A ) , A (cid:3) ε −→ B (cid:3) , B (cid:2) i − c i −→ B (cid:3) i , i = 1 . . . n , B (cid:2) n ε −→ A (cid:2) ,so we have A (cid:3) x −→ A (cid:2) . This means x (cid:15) ϕ A (0 , | x | + 1) . By induction hypothesis, w i (cid:15) ϕ B i (0 , | w i | + 1) implies x (cid:15) ϕ B i ( x i , x i +1 ) ; also, x (cid:15) ϕ B (0 , x ) and x (cid:15) ϕ B n ( x n , | x | + 1) . Hence, x (cid:15) TreeC(0 , x . . . x n , | x | + 1) . Therefore, let ψ A be ∀ x , y ψ (cid:48) A ( x , y ) . By induction hypothesis, ψ (cid:48) A holds in all the substrings w i . Theonly new case for the values of x , y that make the left hand side of the implication of ψ (cid:48) A true is x = 0 and y = | x | + 1 : in this case we proved that x (cid:15) ψ (cid:48) A (0 , | x | + 1) .Hence, x (cid:15) ϕ A (0 , | x | + 1) ∧ ψ A . Case x (cid:15) ϕ A (0 , | x | + 1) ∧ ψ A implies A == ⇒ G B c B . . . c n B n ∗ == ⇒ G w c w c w . . . c n w n = x . Induction hypothesis: for each i = 0 . . . n , w i (cid:15) ϕ B i (0 , | w i | + 1) ∧ ψ B i implies B i ∗ == ⇒ G w i .The hypothesis x (cid:15) ϕ A (0 , | x | + 1) ∧ ψ A guarantees that for at least one rule of G , A → B c B c ...c n B n among x ’s positions there exist x . . . x n such that x (cid:15) TreeC(0 , x . . . x n , | x | + 1) and c ( x i ) = c i | i = 1 . . . n . Thus x = w c . . . c n w n and, by the induction hypothesis, for each i = 0 . . . n , there exist unique B i such that B i ∗ == ⇒ G w i . 
Since $G$ is BDR we conclude that $A$ is the unique nonterminal of $G$ such that $A \stackrel{*}{\Rightarrow}_G x$. □
From Theorem 26 we immediately derive the following main
Corollary 27. For any BDR OPG $G$, $L(G)$ is the set of strings satisfying the corresponding formula $\chi$.
The above formulas are based on subformulas $\varphi_A$ which define the regular languages of paths within the control graph. It is a rather natural intuition that if the control graph of an OPG defines NC regular control languages, then the OPL of the grammar is NC as well (see the next Section 7 for a more accurate explanation of this intuition). From the classic literature, we can define equivalent first-order formulas $\varphi'_A$, therefore obtaining a first-order $MSO_{\Sigma,M}$ formula $\chi'$; thus, we obtain a first important result:
Corollary 28. If the control graph of an OPG $G$ defines languages $R_A$, $A$ denoting any nonterminal character of $G$, that are NC, then $L(G)$ can be defined through a FO formula.
Unfortunately, we will soon see that there are NC OPLs such that the control graph of their (unique up to a nonterminal isomorphism) BDR OPG defines counting regular languages $R_A$. Thus, the following, highly technical, section is devoted to transforming the original BDR grammar of a NC OPL and its control graph into equivalent ones where the controlling regular languages involved in the above formulas are NC and therefore FO definable.
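Whether a regular control language is NC can be tested mechanically. A minimal sketch follows, relying on the classical characterization (Schützenberger, McNaughton and Papert) that a regular language is noncounting iff the transition monoid of its minimal DFA is aperiodic, i.e. every element $f$ satisfies $f^k = f^{k+1}$ for some $k$. The DFA encoding (one tuple per letter, mapping state index to state index), the function names, and the two sample automata are assumptions made for illustration; the input is assumed to be a complete, minimal DFA, and accepting states play no role in the test.

```python
def transition_monoid(n_states, delta):
    """All state transformations induced by words, as tuples of length n_states."""
    identity = tuple(range(n_states))
    generators = [tuple(t) for t in delta.values()]
    monoid, frontier = {identity}, [identity]
    while frontier:
        f = frontier.pop()
        for g in generators:
            h = tuple(g[f[q]] for q in range(n_states))  # read f's word, then g's letter
            if h not in monoid:
                monoid.add(h)
                frontier.append(h)
    return monoid

def power(f, k):
    """The k-fold composition of the transformation f with itself."""
    result = tuple(range(len(f)))
    for _ in range(k):
        result = tuple(f[q] for q in result)
    return result

def is_aperiodic(n_states, delta):
    """True iff every element f of the transition monoid has f^n == f^(n+1).

    On n states the non-periodic prefix of the powers of f has length below n,
    so comparing the n-th and (n+1)-th powers is enough.
    """
    n = n_states
    return all(power(f, n) == power(f, n + 1)
               for f in transition_monoid(n_states, delta))

# Hypothetical minimal DFAs: (aa)* over {a}, and a*b over {a, b} (state 2 is a sink).
EVEN_A = {"a": (1, 0)}
A_STAR_B = {"a": (0, 2, 2), "b": (1, 2, 2)}
```

On these samples the test rejects $(aa)^*$, whose single letter acts as a permutation of order two, and accepts $a^* b$.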
The previous section showed that, if an OPL is controlled by a control graph whose path labels from descending to corresponding ascending states are NC regular languages, then the OPL can be defined through a FO formula; by adding the intuition that, if languages $R_A$, where $A$ denotes any nonterminal of the original grammar, are NC, then the original OPL is NC as well, we would obtain a sufficient condition for FO-expressibility of NC OPLs.

This is not our goal, however: we want to show that any NC OPL can be expressed by means of a FO formula. Unfortunately, it is immediate to realize that there are NC OPLs whose languages $R_A$ of the control graph of their BDR grammar are counting, as shown by the following simple example:

Example 29. Consider the grammar below:

$$A \to aBc \mid d; \qquad B \to aAb$$

The regular control language $R_A$ is $(aa)^* d (bc)^*$. Notice, however, that Theorem 26 still holds if we replace $R_A$ by the NC language $a^* d (bc)^*$: intuitively, it is the OPM, and therefore the $\curvearrowright$ relation, which imposes that each $b$ and each $c$ are paired with a single $a$, so that for each sequence belonging to $(bc)^*$ we implicitly count an even number of $a$.

Generalizing this natural intuition into a rigorous replacement of the original control graph of any OPG with a different NC one which preserves Theorem 26 is the target of this section. To achieve it, we need a rather articulated path which is outlined below:

1. First, in the same way as in [12], we build a linear grammar $G_L$ associated with the original OPG $G$ (which is always assumed to be BDR) such that $L(G_L)$ is NC iff so is $L(G)$.
2. Then, we derive from the control graph of $G_L$ another control graph $\bar{C}(G_L)$ whose regular languages are NC. This will require a rather sophisticated transformation of the original $C(G_L)$.
3. The original grammar $G$ is transformed into an equivalent one $G'$ (no more BDR) whose nonterminals are pairs of states of the transformed control graph $\bar{C}(G_L)$ of type $(X^{\downarrow}_A, X^{\uparrow}_A)$ where one or more of them are homomorphically mapped into single nonterminals $A$ of $G$, and such that its control graph $C(G')$ exhibits only NC control languages.
4. Finally, the original Theorem 26 is extended to the case of the transformed grammar $G'$ and its new control graph. At this point, the MSO formalization of any OPL provided in Section 6.1 automatically becomes an FO one thanks to the fact that each subformula $\varphi_A$ defines a NC regular language.

A linear production of the form $A \to uBv$ such that $B \in V_N$, $u \neq \varepsilon$ and $v \neq \varepsilon$ is called bilateral. A linear grammar is bilateral if it contains only bilateral productions and terminal productions. Thus, a bilateral grammar may not contain productions that are null, renaming, left-linear or right-linear.

The following definition slightly modifies a similar one given in [12].

Definition 31 (Linearized grammar). Let $G = (\Sigma, V_N, P, S)$ be a BDR OPG. Its associated linearized grammar $G_L$ is $(\Sigma_L, V_N, P_L, S)$, where $\Sigma_L = \Sigma \cup \bar{\Sigma} \cup \{\bar{\varepsilon}_L, \bar{\varepsilon}_R\}$, $\bar{\Sigma} = \{\bar{C} \mid C \in V_N\}$, $P_L = \{A \to \alpha B \beta \mid \alpha, \beta \in \bar{\Sigma}^+,\ A \to h(\alpha) B h(\beta) \in P\}$, where $h(a) = a \in \Sigma$, $h(\bar{C}) = C \in V_N$, $h(\bar{\varepsilon}_L) = h(\bar{\varepsilon}_R) = \varepsilon$.

Example 32. Consider the grammar $G^{NL}$ of Example 25. Its associated linearized grammar $G^{NL}_L$, with $\Sigma = \{a, b, c, \bar{A}, \bar{B}, \bar{\varepsilon}_R\}$, $W = \{a, b, c, b\bar{A}c, a\bar{B}c, c\bar{A}, c\bar{B}, ac, bc, \bar{\varepsilon}_R\}$, and the same axioms as $G^{NL}$, has the following productions:

$$
\begin{array}{l}
A \to a\bar{B}cA\bar{\varepsilon}_R \mid aBc\bar{A} \mid a\bar{B}cB\bar{\varepsilon}_R \mid aBc\bar{B} \mid ac, \\
B \to b\bar{A}cA\bar{\varepsilon}_R \mid bAc\bar{A} \mid b\bar{A}cB\bar{\varepsilon}_R \mid bAc\bar{B} \mid bc
\end{array}
$$

A linearized grammar is evidently bilateral and BDR.
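The linearization of Definition 31 can be sketched in a few lines. The sketch reads the definition the way Example 32 applies it: for every production and every nonterminal occurrence in its right-hand side, that occurrence is kept, every other nonterminal is replaced by its barred terminal, and a dummy $\bar{\varepsilon}_L$ or $\bar{\varepsilon}_R$ is added only when the left or right context would otherwise be empty; terminal productions are copied unchanged. The function names, the string encodings `bar(X)`, `eps_L`, `eps_R`, and the productions assumed for $G^{NL}$ (recovered by applying $h$ to the productions of Example 32) are illustrative assumptions.

```python
EPS_L, EPS_R = "eps_L", "eps_R"  # stand-ins for the two dummy terminals


def bar(symbol, nonterminals):
    """Barred terminal for a nonterminal; terminals are left unchanged."""
    return f"bar({symbol})" if symbol in nonterminals else symbol


def linearize(productions, nonterminals):
    """Linearized productions, one per nonterminal occurrence of each rule."""
    linear = []
    for lhs, rhs in productions:
        positions = [p for p, s in enumerate(rhs) if s in nonterminals]
        if not positions:  # terminal production: kept as it is
            linear.append((lhs, tuple(rhs)))
            continue
        for p in positions:
            alpha = [bar(s, nonterminals) for s in rhs[:p]] or [EPS_L]
            beta = [bar(s, nonterminals) for s in rhs[p + 1:]] or [EPS_R]
            linear.append((lhs, tuple(alpha + [rhs[p]] + beta)))
    return linear


# Productions assumed for G^NL (hypothetical reconstruction, see the lead-in).
G_NL = [
    ("A", "aBcA"), ("A", "aBcB"), ("A", "ac"),
    ("B", "bAcA"), ("B", "bAcB"), ("B", "bc"),
]

for lhs, rhs in linearize(G_NL, {"A", "B"}):
    print(lhs, "->", " ".join(rhs))
```

Under these assumptions the ten productions printed coincide, up to ordering, with those listed in Example 32.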
The linearized grammar has a different terminal alphabet, and therefore a different OPM, than the original grammar from which it is derived, but it is still an OPG since its new OPM is clearly conflict-free (the two separate "dummy $\varepsilon$" have been introduced just to avoid the risk of conflicts). It is not guaranteed, however, that an OPG with $\doteq$-acyclic OPM has an associated linearized grammar enjoying the same property. Such a hypothesis, however, is not necessary to guarantee the following results (indeed, it is only necessary to guarantee the existence of a maxgrammar generating the universal language $\Sigma^*$; see also further comments in the conclusions).

The following lemma is a trivial adaptation of the analogous Lemma 1 of [12] to Definition 31.

Lemma 33. Let $G$ be a BDR OPG and $G_L$ its associated linearized grammar. $L(G_L)$ is NC iff so is $L(G)$.

This simple but fundamental lemma formalizes the fact that the aperiodicity property can be checked by looking only at the paths traversing the syntax trees from the root to the leaves, neglecting their ramifications.

The next definition and property are taken from [10].

Definition 34 (Counter). For a given FA (without $\varepsilon$-moves) a counter is a pair $(X, u)$, where $X$ is a sequence of different states $q_1 q_2 \ldots q_k$, with $k > 1$, and $u$ is a nonempty string such that for $1 \le i \le k$, $q_i \xrightarrow{u}_{\delta^*} q_{(i+1) \bmod k}$; $k$ is called the order of the counter. For a counter $C = (X, u)$, the sequence $X$ is called the counter sequence of $C$ and $u$ the string of $C$.

Proposition 35. If a FA $\mathcal{A}$ is counter-free, i.e., has no counter, then $L(\mathcal{A})$ is noncounting, or aperiodic.

Notice that the converse of this statement only holds in the case of minimized deterministic FA [31].

Notation. For simplicity we will make use of the abbreviated notation $q \boldsymbol{\xrightarrow{z}}_{\boldsymbol{\delta}} q'$ introduced in Definition 24 to denote a macro-step transition on the control graph that scans a string $z$ in the finite set $W$ of Eq. (4), p. 21.
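Definition 34 lends itself to a direct, if naive, search. The following sketch looks for a counter in a finite automaton without $\varepsilon$-moves by enumerating candidate strings $u$ up to a length bound and, for each of them, looking for a cycle of at least two distinct states in the relation "reachable by reading $u$". The bound `max_len`, the function names, and the toy automata are assumptions for illustration; since the search is bounded, a `None` result only means that no counter was found within the bound.

```python
from itertools import product


def word_relation(delta, states, word):
    """Map each state p to the set of states reachable from p by reading word."""
    rel = {}
    for p in states:
        current = {p}
        for sym in word:
            current = {q for s in current for q in delta.get((s, sym), ())}
        rel[p] = current
    return rel


def find_cycle(rel, states):
    """Distinct states q1..qk (k > 1) with q_i -> q_(i+1) and qk -> q1, or None."""
    def extend(path):
        for nxt in sorted(rel[path[-1]]):
            if nxt == path[0] and len(path) > 1:
                return path
            # Only cycles whose smallest state comes first are explored.
            if nxt not in path and nxt > path[0]:
                found = extend(path + [nxt])
                if found:
                    return found
        return None

    for start in sorted(states):
        found = extend([start])
        if found:
            return found
    return None


def find_counter(delta, states, alphabet, max_len):
    """First counter (sequence, string) with a string of length <= max_len."""
    for n in range(1, max_len + 1):
        for word in product(alphabet, repeat=n):
            cycle = find_cycle(word_relation(delta, states, word), states)
            if cycle:
                return cycle, "".join(word)
    return None


# Two states swapped by a: the pair (A B, a) is a counter of order 2.
FLIP = {("A", "a"): {"B"}, ("B", "a"): {"A"}}
# A single self-loop on a: no counter.
LOOP = {("A", "a"): {"A"}}

print(find_counter(FLIP, {"A", "B"}, "a", 3))  # (['A', 'B'], 'a')
print(find_counter(LOOP, {"A"}, "a", 3))       # None
```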
With this macro-step notation, for a linearized grammar $G_L$, every path of its control graph belonging to some $R_A$ is articulated into a sequence of macro-steps whose states belong to $V_N^{\downarrow}$ followed by a sequence which traverses the corresponding nodes of $V_N^{\uparrow}$ in the reverse order. Accordingly, a counter sequence may only contain a sequence of $C(G_L)$'s nodes that correspond to the grammar nonterminals. It is immediate to verify that the control graph of a linearized grammar exhibits a counter iff it exhibits a counter consisting just of macro-steps.

Let $C = (X, u)$ be a counter with $X = A_1 A_2 \ldots A_k$, $A_i \boldsymbol{\xrightarrow{u}} A_{(i+1) \bmod k}$, for $1 \le i \le k$. Let also $u = z_1 z_2 \ldots z_j$, $j \ge 1$, be the factorization into strings $z_i$ of the set $W$ corresponding to the macro-steps of the path $A_i \boldsymbol{\xrightarrow{u}} A_{(i+1) \bmod k}$: notice that such a factorization is the same for all $i$ since the OPM imposes the same parenthesization of $u$ in any path.

The following lemma allows us to reason about the NC property of linear OPLs without considering explicitly the parenthesized versions of their grammars.

Lemma 36. Let $G_L$ be a bilateral linear OPG, $C(G_L)$ its control graph, $G_{Lp}$ the parenthesized version of $G_L$, and $C(G_{Lp})$ its control graph. Then, for any nonterminal $A$ of $G_L$ the control language $R^p_A$ is NC iff so is $R_A$.

Proof. If $R^p_A$ is counting, then obviously so is $R_A$.

Vice versa, suppose by contradiction that for all $k$ $R_A$ contains a string $x y^k z$ but not $x y^{k+1} z$. Notice that for $k$ sufficiently large the parenthesized version $y^k_p$ of $y^k$ must contain either only open or only closed parentheses.

Let us assume w.l.o.g. that $y^k_p$ begins with an open (resp. ends with a closed) parenthesis; otherwise consider a suitable permutation thereof. If all occurrences of $y_p$ itself begin with an open parenthesis (resp. end with a closed one), then $R^p_A$ is counting too; otherwise for some $r \le k$ it must be $u = y^r_p$ without a parenthesis between two consecutive occurrences of $y$; but this would imply a conflict in the OPM.
□

Definition 37 (Counter table). We use an array with the following scheme, called a counter table $T$, to orderly and completely represent the (macro)transitions which may occur within a counter $C = (X = A_1 A_2 \ldots A_k,\ u = z_1 z_2 \ldots z_j)$:

$$
\begin{array}{l}
A_1 \boldsymbol{\xrightarrow{z_1}} B^1_1 \boldsymbol{\xrightarrow{z_2}} B^2_1 \ \ldots\ B^{j-1}_1 \boldsymbol{\xrightarrow{z_j}} A_2 \\
A_2 \boldsymbol{\xrightarrow{z_1}} B^1_2 \boldsymbol{\xrightarrow{z_2}} B^2_2 \ \ldots\ B^{j-1}_2 \boldsymbol{\xrightarrow{z_j}} A_3 \\
\quad \cdots \\
A_k \boldsymbol{\xrightarrow{z_1}} B^1_k \boldsymbol{\xrightarrow{z_2}} B^2_k \ \ldots\ B^{j-1}_k \boldsymbol{\xrightarrow{z_j}} A_1
\end{array}
\qquad (9)
$$

With reference to the above Table (9), the sequence of macro-steps looping from $A_1$ to $A_1$ is called the path of the counter table.

Thus, a counter table defines a "matrix of counters" consisting of its columns: in the case of Table (9) the first column $A_1, A_2, \ldots, A_k$ together with the string $u$ will be used as the reference counter of the table. Each cyclic permutation of each column is another counter with the same string, whereas each column, e.g. $(B^1_1 B^1_2 \ldots B^1_k,\ z_2 z_3 \ldots z_j z_1)$, is a counter whose string is a cyclic permutation of $u$. For any counter of a counter table, its associated path is the sequence of macro-steps looping from its first state to itself. The above remarks lead to the following formal definition:

Definition 38. Let $T$ be a counter table expressed in the form of Table (9); the conventionally designated counter $C = (X = A_1 A_2 \ldots A_k,\ u = z_1 z_2 \ldots z_j)$ is named its reference counter; all columns $(B^r_1 B^r_2 \ldots B^r_k,\ z_r z_{(r+1) \bmod j} \ldots z_{r-1})$ are named horizontal cyclic permutations of the reference counter; all counters $C = (X = A_l A_{l+1} \ldots A_k \ldots A_{l-1},\ u = z_1 z_2 \ldots z_j)$ are named vertical cyclic permutations of the reference counter; horizontal-vertical and vertical-horizontal cyclic permutations are the natural combination of the two permutations.

If we apply cyclic permutations to the whole path producing a counter $C = (X = A_1 A_2 \ldots A_k,\ u = z_1 z_2 \ldots z_j)$, and therefore a complete counter table, we obtain a family of counter tables associated with the original Table (9).
We decide, therefore, to choose arbitrarily an "entry point" of any path producing a counter. Such an entry point uniquely determines a counter table $T$ and therefore a unique reference counter. Furthermore, for convenience, if the same path $A_i \boldsymbol{\xrightarrow{u}} A_{(i+1) \bmod k}$, for $1 \le i \le k$, can also be read as $A_i \boldsymbol{\xrightarrow{u'}} A_{(i+1) \bmod k'}$, with $u = u'^{\,r}$, $k' = k \cdot r$, we represent the unique associated $T$ by choosing the minimum of such $u$s (and the maximum of the $k$s). All elements of the table (states, transitions, counter sequences) will be referred to through this unique $T$, ignoring the other tables of its "family". Whenever needed we will identify a counter table, its counter sequences, and any element thereof, through a unique index, as $T[i]$, $X[i]$, $A_l[i]$, respectively.

Notice that a counter table uniquely defines a collection of counters (among them the first column being chosen as its reference counter), but the same counter may be a counter, whether a reference counter or not, of different tables. This case arises, for instance, when the linearized grammar contains two productions such as $A_1 \to z_1 B^1_1 v$ and $A_1 \to z_1 C^1_1 w$. Then the same counter $C = (X = A_1 A_2 \ldots A_k,\ u = z_1 z_2 \ldots z_j)$ occurs in a counter table that necessarily differs from the one represented in Table (9) in at least one of the intermediate states $B^i_h$.

Notice also that the various counters of a counter table may not be disjoint. Consider, for instance, the following sequence of transitions

$$A \boldsymbol{\xrightarrow{a}} B,\quad B \boldsymbol{\xrightarrow{b}} C,\quad C \boldsymbol{\xrightarrow{c}} B,\quad B \boldsymbol{\xrightarrow{a}} D,\quad D \boldsymbol{\xrightarrow{b}} E,\quad E \boldsymbol{\xrightarrow{c}} A$$

which constitute a counter table.
In this counter table nonterminal $B$ occurs twice by using two different transitions; thus, we obtain the counters $(AB, abc)$, $(BD, bca)$, $(CE, cab)$. Furthermore, the same transition $B \boldsymbol{\xrightarrow{b}} C$ can also be used to exit the counter table, after having executed the loop $B \boldsymbol{\xrightarrow{b}} C$, $C \boldsymbol{\xrightarrow{c}} B$, instead of continuing the counter table with $B \boldsymbol{\xrightarrow{a}} D$.

Definition 39 (Paired Paths). Let $C(G_L)$ be the control graph of a linearized grammar $G_L$. Let $A_1 \Rightarrow u_1 A_2 v_1 \ldots \Rightarrow u_1 \ldots u_{n-1} A_n v_{n-1} \ldots v_1$, with $u = u_1 u_2 \ldots u_{n-1}$, $v = v_{n-1} \ldots v_1$, be a derivation for $G_L$. Then the paths $A^{\downarrow}_1 \boldsymbol{\xrightarrow{u_1}} A^{\downarrow}_2, \ldots, A^{\downarrow}_{n-1} \boldsymbol{\xrightarrow{u_{n-1}}} A^{\downarrow}_n$, and $A^{\uparrow}_n \boldsymbol{\xrightarrow{v_{n-1}}} A^{\uparrow}_{n-1}, \ldots, A^{\uparrow}_2 \boldsymbol{\xrightarrow{v_1}} A^{\uparrow}_1$, called, respectively, descending and ascending, are paired (by such derivation).

Two counter tables are paired iff their paths, or cyclic permutations thereof, are paired; two counters are paired iff their associated paths are paired; therefore so are the counter tables they belong to.

If the control graph of a linearized grammar $G_L$ is counter-free, then $L(G_L)$ is NC. Notice, in fact, that

1. $C(G_L)$ has no $\varepsilon$-moves, thus Definition 34 of counter-free is well-posed for it;
2. If, by contradiction, $G_L$, which is BDR, admitted a counting derivation, such a derivation would imply two paired counters of $C(G_L)$.

Unfortunately such a condition is only sufficient but not necessary to guarantee that $L(G_L)$ is NC, as shown by Example 29. Thus, according to the path outlined at the beginning of Section 7, our next goal is to transform $C(G_L)$ into a control graph, denoted as $\bar{C}(G_L)$, whose regular languages are NC and which will drive the construction of a grammar $G'$, equivalent to the original $G$, such that its control graph defines NC $R_A$s for its nonterminals. The construction of $\bar{C}(G_L)$ will exploit the following lemma, which makes use of the notion of paired counters:

Lemma 40.
If $G_L$ is NC, then $C(G_L)$ either has no paired counters or, for any two paired counters, the orders of the descending and ascending counter are coprime numbers.

Proof. Assume, by contradiction, that the counters $C^{\downarrow} = (X^{\downarrow}, u)$, $C^{\uparrow} = (Y^{\uparrow}, v)$ are paired by the derivation $A_1 \overset{*}{\Rightarrow} u^k A_1 v^h$ and that for some $j, r, s > 1$, $k = j \cdot r$, $h = j \cdot s$. Let $X^{\downarrow} = A^{\downarrow}_1 \ldots A^{\downarrow}_k$, $Y^{\uparrow} = A^{\uparrow}_h \ldots A^{\uparrow}_1$, with $A_h = A_k$. This means that for some $j$, $A_1 \overset{*}{\Rightarrow} u^j A_j v^j \overset{*}{\Rightarrow} u^{2j} A_{2j} v^{2j} \ldots \overset{*}{\Rightarrow} u^k A_1 v^h$; thus $(A^{\downarrow}_1 A^{\downarrow}_j A^{\downarrow}_{2j} \ldots A^{\downarrow}_k,\ u^j)$ and $(A^{\uparrow}_h A^{\uparrow}_{h-j} A^{\uparrow}_{h-2j} \ldots A^{\uparrow}_1,\ v^j)$ are two paired counters as well, which correspond to a counting derivation of $G_L$. □

Example 41. The productions $A \to aBb$ and $B \to aAb$ generate the two paired counters of order 2 of the control graph: $(A^{\downarrow} B^{\downarrow}, a)$ paired with $(B^{\uparrow} A^{\uparrow}, b)$. Instead, the productions $A_1 \to aA_2 f$, $A_2 \to bA_3 g$, $A_3 \to aA_4 h$, $A_4 \to bA_5 f$, $A_5 \to aA_6 g$, $A_6 \to bA_1 h$ generate the following sequence of descending counters of order 3 paired with ascending counters of order 2:

$$
\begin{array}{ll}
(A^{\downarrow}_1 A^{\downarrow}_3 A^{\downarrow}_5,\ ab), & (A^{\uparrow}_1 A^{\uparrow}_4,\ hgf) \\
(A^{\downarrow}_6 A^{\downarrow}_2 A^{\downarrow}_4,\ ba), & (A^{\uparrow}_6 A^{\uparrow}_3,\ gfh) \\
(A^{\downarrow}_5 A^{\downarrow}_1 A^{\downarrow}_3,\ ab), & (A^{\uparrow}_5 A^{\uparrow}_2,\ fhg) \\
(A^{\downarrow}_4 A^{\downarrow}_6 A^{\downarrow}_2,\ ba), & (A^{\uparrow}_4 A^{\uparrow}_1,\ hgf) \\
(A^{\downarrow}_3 A^{\downarrow}_5 A^{\downarrow}_1,\ ab), & (A^{\uparrow}_3 A^{\uparrow}_6,\ gfh) \\
(A^{\downarrow}_2 A^{\downarrow}_4 A^{\downarrow}_6,\ ba), & (A^{\uparrow}_2 A^{\uparrow}_5,\ fhg)
\end{array}
$$

By looking at the second case of Example 41 we notice that for each couple of paired counter sequences there is just one nonterminal that belongs to both of them. This remark is easily generalized to the following lemma:

Lemma 42. Let $L(G_L)$ be NC.
If in $C(G_L)$ there are two paired counters $C^{\downarrow} = (X^{\downarrow}, u)$, $C^{\uparrow} = (Y^{\uparrow}, v)$, there exists only one $A$ such that $A^{\downarrow} \in X^{\downarrow}$, $A^{\uparrow} \in Y^{\uparrow}$.

Proof. Let $|X^{\downarrow}| = k$ and $|Y^{\uparrow}| = h$, with $h$ and $k$ coprime, thanks to Lemma 40. The two paired counters correspond to a NC derivation of $G_L$, $A_1 \overset{*}{\Rightarrow} x A_t y \overset{*}{\Rightarrow} u^k A_1 v^h$, with no repeated nonterminals $A_t$. The total length of the derivation is $h \cdot k$ and each $A_t$ belongs to a set, marked $\downarrow$, of cardinality $k$ in the table $T[i]$ of $C^{\downarrow}$ and to a set, marked $\uparrow$, of cardinality $h$ in the table $T[f]$ of $C^{\uparrow}$. Thus, for any couple $(X^{\downarrow}, Y^{\uparrow})$ paired by the two counter tables, there exists exactly one $A$ such that $A^{\downarrow} \in X^{\downarrow}$, $A^{\uparrow} \in Y^{\uparrow}$, by virtue of the Chinese remainder theorem. □

On the basis of the above lemmas the construction of $\bar{C}(G_L)$ aims at replacing any ascending and descending counter with a loop $X \boldsymbol{\xrightarrow{u}}_{\bar{\boldsymbol{\delta}}^*} X$, where $X$ is a suitable new state in $\bar{C}(G_L)$ representing a whole counter sequence of $C(G_L)$; thanks to Lemma 40, the new loop will be paired with a path that is not a counter or with another loop which in turn replaces a counter whose order is coprime w.r.t. the order of the other one. By virtue of Lemma 42, in turn, this will allow us to disambiguate which element of the counter sequence corresponds to the $G_L$'s nonterminal deriving the various instances of string $u$.

This basic idea, however, cannot be implemented in a trivial way such as replacing all states belonging to a counter sequence by a single state representing the whole sequence. Consider, for instance, a grammar containing the following productions:

$$A \to aBc \mid h \qquad B \to aAd \mid bCd \qquad C \to bAd$$

which produce the fragment of control graph depicted in Figure 7.

Fig. 7. A control graph including a descending counter.
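The counting argument in the proof of Lemma 42 can be checked numerically on small cases. The sketch below rests on a modelling assumption that is not spelled out in the text: the nonterminals of the $h \cdot k$-step derivation are indexed $0, \ldots, hk-1$, a descending counter sequence (of cardinality $k$) collects the indices that are congruent modulo $h$, and an ascending one (of cardinality $h$) collects the indices that are congruent modulo $k$. This indexing matches the second case of Example 41, where $h = 2$ and $k = 3$. The function names are hypothetical.

```python
from math import gcd


def shared_indices(h, k):
    """For each (descending, ascending) pair of classes, the indices they share."""
    steps = range(h * k)
    shared = {}
    for a in range(h):      # descending sequence: indices congruent to a mod h
        for b in range(k):  # ascending sequence: indices congruent to b mod k
            shared[(a, b)] = [t for t in steps if t % h == a and t % k == b]
    return shared


def unique_common_nonterminal(h, k):
    """True iff every descending/ascending pair shares exactly one index."""
    return all(len(ts) == 1 for ts in shared_indices(h, k).values())


print(gcd(2, 3), unique_common_nonterminal(2, 3))  # 1 True
print(gcd(2, 4), unique_common_nonterminal(2, 4))  # 2 False
```

With coprime orders every pair of sequences meets in exactly one nonterminal, which is what the Chinese remainder theorem guarantees; with orders 2 and 4 some pairs share two indices and others none.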
The control graph of Figure 7 has a descending counter $(A^{\downarrow} B^{\downarrow}, a)$ paired with the ascending path $A^{\uparrow} \xrightarrow{d} B^{\uparrow} \xrightarrow{c} A^{\uparrow}$. If we simply replace the descending path $A^{\downarrow} \xrightarrow{a} B^{\downarrow} \xrightarrow{a} A^{\downarrow}$ with a self-loop $AB^{\downarrow} \xrightarrow{a} AB^{\downarrow}$ by coalescing the two states into one state denoted by $AB^{\downarrow}$, we obtain as a side effect a new counter $(AB^{\downarrow} C^{\downarrow}, b)$; if we further collapse $AB^{\downarrow} C^{\downarrow}$ into $ABC^{\downarrow}$ we reduce the descending part of the control graph to a single state with two self-loops labeled $a$, $b$: at this point, once a path reaches the state $A^{\uparrow}$ and reads the symbol $d$, it is impossible to decide whether such an "ascending $d$" should be paired with a previous descending $b$ or $a$, since both are labeling a self-loop on the unique state $ABC^{\downarrow}$.

The construction we devised for such a $\bar{C}(G_L)$ is therefore more complex. It is articulated into two steps: first a $\hat{C}(G_L)$ "equivalent" to $C(G_L)$, in a sense that will be made precise in Lemma 44, is built, which suitably splits some states belonging to counters in such a way that each new instance thereof belongs to exactly one counter table; then the further construction $\bar{C}(G_L)$ collapses all counter sequences into single states that allow repeating the "basic counter string $u$" any number of times, instead of $k$ times. Thus, each path of the original control graph $C(G_L)$ of type, say, $A^{\downarrow} \xrightarrow{u^k} A^{\downarrow}$ that realizes a counter $(X^{\downarrow}, u)$ of order $k$ will be replaced by $k$ paths $X^{\downarrow} \xrightarrow{u} X^{\downarrow}$ (apart from a transient that will be explained later). Thanks to Lemma 40, if $G_L$ is NC, it will not be paired with another counter $(Y^{\uparrow}, v)$, or, if this happens, the order $h$ of the other counter will be coprime with $k$; thus, thanks to Lemma 42, it will be possible to associate each couple of paired counters of the control graph of $G_L$ with a unique derivation of the grammar.

Construction of $\hat{C}(G_L)$.
Intuitively, the aim of $\hat{C}(G_L)$ is to produce "non-intersecting counter tables", i.e., counter tables such that $T[i] \neq T[j]$ implies that the counter sequences of $T[i]$ are all disjoint from those of $T[j]$. This is obtained by creating one instance of state $A$, say $A[i]$, for each counter table $T[i]$ that $A$ belongs to, where the index $i$ binds the state instance to the table.

The construction below applies equally to states of type $A^{\downarrow}$ and to states of type $A^{\uparrow}$, according to Definition 24. Notice that macro-transitions of the type $A^{\downarrow} \xrightarrow{z} A^{\uparrow}$, which correspond to $G_L$'s productions $A \to z$, $z \in W$, cannot belong to any counter table of $C(G_L)$, but $A^{\downarrow}$ and/or $A^{\uparrow}$ can belong to some descending or ascending counter. In other words, possible counters either involve states in $V_N^{\downarrow}$ only or in $V_N^{\uparrow}$ only.

The construction of $\hat{C}(G_L) = (\hat{Q}, \Sigma, \hat{\delta})$ starts from $C(G_L) = (Q, \Sigma, \delta)$, i.e., it is a process where $\hat{Q}$ and $\hat{\delta}$ are initialized as $Q$ and $\delta$, and modifies them in the following way. When the transformations below apply identically to descending and ascending paths we omit labeling the states of the control graph as $\downarrow$ or $\uparrow$.

First, we label any counter table $T$ with a unique index $i$.

Then, all states belonging to $T[i]$ are also labeled in the same way, so that if a state $A$ belongs to different counter tables, $T[i]$ and $T[j]$, $i \neq j$, it will be split into different states $A[i]$ and $A[j]$; if instead it belongs to just one counter with only one associated table, for convenience it will be labeled with the same index $i$ identifying the table. If it does not belong to any counter table, it remains unlabeled.

Then, $\hat{C}(G_L)$'s transitions are defined as follows:

– For every macro-transition $A \boldsymbol{\xrightarrow{f}}_{\boldsymbol{\delta}} B$ where $A$ and $B$ are both descending or both ascending, for all $m$ copies $A[1], A[2], \ldots, A[m]$ of $A$ and $n$ copies $B[1], B[2], \ldots, B[n]$ of $B$, $A \boldsymbol{\xrightarrow{f}}_{\boldsymbol{\delta}} B$ is replaced by $m \cdot n$ macro-transitions $A[i] \boldsymbol{\xrightarrow{f}}_{\hat{\boldsymbol{\delta}}} B[j]$, where $A[i]$ and/or $B[j]$ remain $A$ and/or $B$ if they do not belong to any counter table.

– For every transition $A^{\downarrow} \boldsymbol{\xrightarrow{f}}_{\boldsymbol{\delta}} A^{\uparrow}$, if $A$ belongs to some descending and/or ascending counter (thus it is labeled $A^{\downarrow}[i]$ and/or $A^{\uparrow}[j]$), all possible $A^{\downarrow}[i] \boldsymbol{\xrightarrow{f}}_{\hat{\boldsymbol{\delta}}} A^{\uparrow}[j]$ replace the original macro-transition.

Fig. 8. $C(G_L)$.

Example 43. Consider the fragment of a control graph $C(G_L)$ (which could be either a descending or an ascending part thereof) depicted in Figure 8. The corresponding fragment of $\hat{C}(G_L)$ is given in Figure 9. The example shows the case of two counter tables sharing some states. Notice that in general the construction of $\hat{C}(G_L)$ increases the number of counters, which are all isomorphic to the original one: for instance, in the case of Figure 9, instead of the path $A \xrightarrow{a} H \xrightarrow{b} L \xrightarrow{a} B \xrightarrow{b} A$, we have $A[1] \xrightarrow{a} H[1] \xrightarrow{b} L[1] \xrightarrow{a} B[1] \xrightarrow{b} A[1]$, but also $A[1] \xrightarrow{a} H[2] \xrightarrow{b} L[1] \xrightarrow{a} B[1] \xrightarrow{b} A[1]$, $A[1] \xrightarrow{a} H[1] \xrightarrow{b} L[2] \xrightarrow{a} B[1] \xrightarrow{b} A[1]$, and so on. We will see, however, that, despite the increased number of paths, none of them will generate a counting path after the further transformation from $\hat{C}(G_L)$ to $\bar{C}(G_L)$.

Fig. 9. $\hat{C}(G_L)$; states belonging to different counter tables are depicted in different colors.

Lemma 44.
For each pair $(A^{\downarrow}, A^{\uparrow})$ of $C(G_L)$, $A^{\downarrow} \boldsymbol{\xrightarrow{z}}_{\boldsymbol{\delta}} A^{\uparrow}$ iff either $A^{\downarrow} \boldsymbol{\xrightarrow{z}}_{\hat{\boldsymbol{\delta}}} A^{\uparrow}$ or, for all $A^{\downarrow}[i]$, $A^{\uparrow}[j]$, $A^{\downarrow} \boldsymbol{\xrightarrow{z}}_{\hat{\boldsymbol{\delta}}} A^{\uparrow}[j]$ or $A^{\downarrow}[i] \boldsymbol{\xrightarrow{z}}_{\hat{\boldsymbol{\delta}}} A^{\uparrow}$ or $A^{\downarrow}[i] \boldsymbol{\xrightarrow{z}}_{\hat{\boldsymbol{\delta}}} A^{\uparrow}[j]$.

By projecting the counters of $\hat{C}(G_L)$ through the homomorphism $h(A[i]) = A$, $h(B) = B$ for all $B$ that do not belong to any counter, one obtains exactly the counter tables and the counters of $C(G_L)$.

Proof. Paths of $C(G_L)$ that do not touch any state belonging to some counter table are found identically in $\hat{C}(G_L)$. If the path of a counter table $T[i]$ of $C(G_L)$ touches a sequence of states $H, K, \ldots, L$, $\hat{C}(G_L)$ also has the path obtained by replacing $H$ by $H[i]$, $K$ by $K[i]$, etc., $i$ being the index of $T[i]$. It is also always possible to "jump" from a table $T[i]$ to another table $T[j]$ by using the transition target $B[j]$ instead of $B[i]$.
Conversely, for each $A[i]$, $B[j]$, whether $i = j$ or not, if in $\hat{C}(G_L)$ there is the macro-transition $A[i] \boldsymbol{\xrightarrow{f}}_{\hat{\boldsymbol{\delta}}} B[j]$, this means that in $C(G_L)$ there was $A \boldsymbol{\xrightarrow{f}}_{\boldsymbol{\delta}} B$.

Furthermore, the construction of $\hat{C}(G_L)$ does not produce counters that are not the image of $C(G_L)$'s counters under $h^{-1}$, since all its transitions involving some $A[i]$ come from a corresponding $C(G_L)$'s transition with $A$ in place of $A[i]$. □

Construction of $\bar{C}(G_L)$. As anticipated, the core of $\bar{C}(G_L)$'s construction moves from $\hat{C}(G_L)$ and, roughly speaking, consists in collapsing all states belonging to a counter sequence of a given counter table into a single new state named as the counter sequence itself and labeled by the index of the table it belongs to.

$\bar{C}(G_L)$ behaves exactly as $C(G_L)$ until it reaches a state that belongs to some counter table, say $T[i]$ with reference counter $C = (X[i], u)$. At that point it uses the single state, say $A_1[i]$, belonging to $X[i]$ as an "entry point" to $T[i]$; it follows the whole path $A_1[i] \xrightarrow{u} A_2[i] \ldots A_k[i] \xrightarrow{u} A_1[i]$ of the table up to the last step that would "close" the counter; at this point its next transition, instead of going back to $A_1[i]$, enters a new state, named counter sequence state, representing the whole counter sequence $X[i]$ that includes the state $A_1[i]$.

Then, $\bar{C}(G_L)$ loops along the horizontal cyclic permutations of the counter, therefore without counting the repetitions of the counter string $u$; in other words it "forgets the vertical cyclic permutations" of the counter table. When $\bar{C}(G_L)$ exits from the loop, say by reading $f$, it nondeterministically reaches any node that can be reached by any state belonging to the counter state it is leaving.
Notice that exit from the loop occursonly as a consequence of a transition that in C ( GL ) was not part of the counter table;such a transition may lead either to a state that does not belong to the table, such as L h −−→ R in Figure 8, or to a state that is still part of the table, such as A c −−→ L in thesame figure. In the latter case the same table can be re-entered, i.e., the original countingpath may be resumed, but this must happen only by going into the single entry point ofthe table, not directly into the counter sequence state containing it (the reason of thischoice will be clear later); for instance in the case of Figure 8, the transition that reads c (from the counter sequence containing A ) leads to instances of L , not to the countersequence state(s) containing it. Notice also that the transition A c −−→ L may also occur in ¯ C ( GL ) during the “transient” before entering the counter sequence state: this means thatthe counting path is interrupted before being completed for the first time and possiblyresumed from scratch (with a different entry point).Obviously, ¯ C ( GL ) will exhibit all behaviors of C ( GL ) plus more; we will seehowever, that pairing such, say, descending behaviors with the ascending ones will allowto discard those that are not compatible with GL ’s derivations.We now describe in detail the construction of ¯ C ( GL ) .Let ( X, u ) with X = A . . . A k , u = z z . . . z j , j ≥ , z i ∈ W , denote anycounter of a counter table T of C ( GL ) ; to simplify the notation we will avoid theindex identifying the single tables whenever not necessary. Let also { ( Y m , u | m ) | m = 1 , . . . , j − } denote its horizontal cyclic permutations (if any, i.e., if j > ),where Y m = B m B m . . . B mk , u | m = z ( m +1) mod j z ( m +2) mod j . . . z m mod j , j ≥ .For every m = 1 , . . . , j − , l = 1 , . . . , k , let B ml z m −→−→−→ δδδ B m +1 l , B jl z j −→−→−→ δδδ A l +1) mod k . 
periodicity, Star-freeness, and First-order Definability of Structured CF Languages 35 Points 1 through 6 of the construction below are identical whether they are appliedto states belonging to descending or ascending paths; thus we will not mark those stateswith (cid:3) or (cid:2) .1. For each counter sequence X [ i ] = A [ i ] . . . A k [ i ] of counter table T [ i ] we de-fine the pipeline P P L ( X [ i ]) as the k cyclic permutations of the sequence of allstates A [ i ] B [ i ] B [ i ] . . . A [ i ] . . . B j − k [ i ] traversed by the whole path of the ta-ble, i.e, the permutations starting with A l [ i ] , with ≤ l ≤ k followed by thenew state X [ i ] , called a counter sequence state . For instance, with reference toFigure 9, P P L ( A [1] L [1]) consists of the two sequences A [1] H [1] L [1] B [1] and L [1] B [1] A [1] H [1] both followed by the state AL [1] . Similarly, P P L ( H [1] B [1]) consists of the two sequences H [1] L [1] B [1] A [1] and B [1] A [1] H [1] L [1] followedby the state HB [1] .For each counter table, all sequences of all pipelines of its counters are disjoint.Thus, for each table with counter sequences of order k and string u consisting of j elements in W a collection of ( k · j ) different copies of the original k · j states ofthe table plus j counter sequence states are in the state space ¯ Q besides all originalstates that do not participate in any counter table. Notation To distinguish the k · j replicas of the sequences that, for each pipelinelead to the counter sequence states, we add a second index to the one denoting thecounter table, ranging from to k · j − ; the -th copy, e.g., H [2 , , will denotethe entry point of each sequence of the pipeline.Let us now build ¯ C ( GL ) ’s (macro)transitions ¯ δ ¯ δ ¯ δ .2. All transitions that do not involve states belonging to counter tables are replicatedidentically from ˆ δ ˆ δ ˆ δ and therefore from δδδ .3. 
For all sequences of all pipelines of all tables T [ i ] with string u = z z . . . z j ,reference counter sequence X [ i ] = A [ i ] . . . A k [ i ] , and its horizontal permutations Y m [ i ] = B m [ i ] B m [ i ] . . . B mk [ i ] , u | m = z ( m +1) mod j z ( m +2) mod j . . . z m mod j , j ≥ , all original transitions of the table are replicated identically for each sequence,but the last one that would “close the counter”: precisely, A l [ i, z −→−→−→ B l [ i, , B m − l [ i, r ] z m −→−→−→ B ml [ i, r +1] for ≤ l ≤ k , ≤ m ≤ j − , r = l · ( j − m − . Inplace of transition, B j − k [ i, k · j − z j −→−→−→ A [ i, , the transition B j − k [ i, k · j − z j −→−→−→ X [ i ] is added to ¯ δ ¯ δ ¯ δ . In other words, this first set of transitions allows to enter a countersequence state from any state belonging to it, only by starting from the entry point ofthe pipeline associated with that state, then following the whole path of the countertable and, at its last step entering the new state of type counter sequence, of whichthe entry point is a member.As a particular case, if j = 1 , there is only one counter sequence state X [ i ] , allsequences of the pipeline have length k , and consist of transitions A l [ i, r ] u −→−→−→ A ( l +1) mod k [ i, r + 1] , with ≤ r ≤ k − , but the last one which is A l [ i, k − u −→−→−→ X [ i ] for some l .Notice that in some cases the same transition could be used as part of a countertable path and as an exit way to it; since it leads to a state still belonging to thecounter table, its target will be the entry point of a pipeline of the same counter table.Example 46 illustrates this case. 4. For all counter sequence states X [ i ] = A [ i ] . . . A k [ i ] , Y [ i ] = B [ i ] . . . 
B k [ i ] of atable T [ i ] , if for any A l [ i ] , B p [ i ] , z m , A l [ i ] z m −→−→−→ ˆ δδδ B p [ i ] (then it is also A l (cid:48) [ i ] z m −→−→−→ ˆ δδδ B p (cid:48) [ i ] for all A l (cid:48) [ i ] ∈ X [ i ] and B p (cid:48) [ i ] ∈ Y [ i ] ) we put X [ i ] z m −→−→−→ ¯ δδδ Y [ i ] . Similarly fortransitions in ˆ δδδ going from some B p [ i ] to some A l [ i ] or to some other B p (cid:48) [ i ] ∈ Y (cid:48) [ i ] .Thus, once ¯ C ( GL ) entered a counter table with string u it can accept any number of u s, plus possibly a prefix thereof, without counting them.5. Entering a counter. Counters can be entered only through the entry points of theirpipelines. This means that for each transition A x −→−→−→ ˆ δ ˆ δ ˆ δ B that does not belong to thecounter table T [ i ] but leads to a state B [ i ] thereof (notice that A could either belongor not to T [ i ] ) we add – only – A x −→−→−→ ¯ δ ¯ δ ¯ δ B [ i, . All other elements of the pipelinesequences that are not entry point can be accessed only through the transitions builtin point 3 above.6. Exiting a counter. Counters can be exited in two ways: either in the transient beforeentering the counter sequence state, or exiting the loop that repeats the string u any number of times without counting them. In the former case this is obtained byadding, for each original transition of C ( GL ) that departs from a state of the countertable T [ i ] and does not belong to the table, say A x −→−→−→ δδδ B , an instance thereof forall occurrences of A [ i, r ] in the various pipelines of the counters. 
Notice that the target state $B$ of such transitions could either belong to the same table (as in the case of transition $A \xrightarrow{c} L$ of Figure 8) or not: in the positive case it must be only the entry point labeled $B[i,0]$ of the pipelines; in the negative case it could be a single state not belonging to any counter table or the entry point of some pipeline of a different table.
Exiting the counter from the counter sequence state is obtained similarly by replicating the original transition $A \xrightarrow{x}_{\delta} B$ for the target state $B$ in the same way as in the previous case and by replacing the source state $A$ with the counter sequence state $X[i]$ containing it.

7. Finally, for each production $A \to x$ of $G_L$:
  – If $A$ does not belong to any counter of $C(G_L)$, only $A^{\downarrow} \xrightarrow{x} A^{\uparrow}$ is in $\bar{\delta}$ (this is already implied by point 2 above).
  – If there is some $A^{\downarrow}[i]$ in $\hat{Q}$ but no $A^{\uparrow}[f]$, i.e., $A$ belongs to some descending counter but to no ascending one, we set both $A^{\downarrow}[i,r] \xrightarrow{x} A^{\uparrow}$ for each $r$ and $X^{\downarrow}[i] \xrightarrow{x} A^{\uparrow}$, where $A^{\downarrow}[i,r]$ may denote either an entry point of the pipeline ($r = 0$) or any other singleton element thereof.
  – If instead $A^{\downarrow}$ does not belong to any counter but there is some $A^{\uparrow}[f]$, we set only $A^{\downarrow} \xrightarrow{x}_{\bar{\delta}} A^{\uparrow}[f,0]$; no transition $A^{\downarrow} \xrightarrow{x} Y^{\uparrow}[f]$ or $A^{\downarrow} \xrightarrow{x}_{\bar{\delta}} A^{\uparrow}[f,r]$ with $r \neq 0$ is set, however: this is due to our convention that counters can only be entered through the single states that are entry points of a pipeline, whereas, once the counter sequence state has been entered, the counter must be exited only therefrom.
  – If in $\hat{\delta}$ there are transitions $A^{\downarrow}[i] \xrightarrow{x} A^{\uparrow}[f]$, i.e., $A$ belongs both to a descending counter $X^{\downarrow}$ and to an ascending one $Y^{\uparrow}$ of $C(G_L)$, then $A^{\downarrow}[i,r] \xrightarrow{x} A^{\uparrow}[f,0]$, with $r \geq 0$, and $X^{\downarrow}[i] \xrightarrow{x} A^{\uparrow}[f,0]$ are in $\bar{\delta}$, but neither $A^{\downarrow}[i,r] \xrightarrow{x} Y^{\uparrow}[f]$, nor $X^{\downarrow}[i] \xrightarrow{x} Y^{\uparrow}[f]$, nor $A^{\downarrow}[i,r] \xrightarrow{x} A^{\uparrow}[f,s]$, nor $X^{\downarrow}[i] \xrightarrow{x} A^{\uparrow}[f,s]$, with $s \neq 0$, are included in $\bar{\delta}$, for the same reason as above.

Fig. 10. The $\bar{C}(G_L)$ fragment derived from the $C(G_L)$ and $\hat{C}(G_L)$ of Example 43. The gray boxes represent a collection of source or target states with the names indicated in the box.

To illustrate the main features of the above construction, as a first example, consider again the fragment of Example 43: the corresponding fragment of $\bar{C}(G_L)$ is depicted in Figure 10; see also the further Example 48.
The following example, instead, explains why we introduced the pipelines as an input for counter sequence states.

Example 45. The control graph of Figure 7 has shown that simply collapsing the states of a counter sequence into a single state produces undesired side effects, such as spurious counters. A first repair could consist in keeping the original states (of $\hat{C}(G_L)$) and using them as an entry for the compound states: in some sense, a pipeline of length 1.
This solution too, however, is not enough.
Consider, for instance, the fragment of control graph in Figure 11, no matter whether it represents a descending or an ascending fraction of the whole graph; it contains just one counter table with counters $(AC, ab)$ and $(BD, ba)$; thus, the corresponding fraction of $\hat{C}(G_L)$ is isomorphic to the original graph.
A possible version of $\bar{C}(G_L)$ making use of single states to enter the counter sequence states is given in Figure 12, which shows a new counter table with counters $(AP, ac)$ and $((BD)Q, ca)$ that do not correspond to the behavior of the original control graph.

Fig. 11. A fragment of control graph with one counter table.

The source of the problem lies in the fact that the path $cac$ reentering state $A$ after leaving $BD$ "forgot" that its source was $D$, not $B$; thus, it can go on in a way that does not separate the two cases. The construction of $\bar{C}(G_L)$ making use of the full pipelines, on the contrary, "compels" the path to reenter the counter from scratch, i.e., from the "real" $A$, not from a different member of the same counter. This is why counters may be entered only through their entry points.

Fig. 12. An erroneous attempt to build a $\bar{C}(G_L)$ version of the control graph fragment of Figure 11.

Finally, the example below points out that in some cases the same transition can be used to follow the path of a counter table, but also to exit it, depending on the context within which it occurs.

Example 46. Consider the counter table, say the $i$-th, consisting of the transition sequence $A \xrightarrow{a} B$, $B \xrightarrow{b} C$, $C \xrightarrow{c} B$, $B \xrightarrow{a} D$, $D \xrightarrow{b} E$, $E \xrightarrow{c} A$. It produces pipelines with two occurrences of symbol $B$ with different indices; thus, the same transition, e.g., $B \xrightarrow{b} C$, is used both to follow the path of the counter table and to exit it, but starting from different states, as shown in Figure 13.

Fig. 13. A significant fragment of the $\bar{C}(G_L)$ derived from the transition sequence $A \xrightarrow{a} B$, $B \xrightarrow{b} C$, $C \xrightarrow{c} B$, $B \xrightarrow{a} D$, $D \xrightarrow{b} E$, $E \xrightarrow{c} A$. For simplicity other similar pipelines have been omitted.

Lemma 47. For any nonterminal $A$ of $G_L$, the regular languages consisting of all paths of $\bar{C}(G_L)$ going from any one of $A^{\downarrow}$, $A^{\downarrow}[i,r]$, $X^{\downarrow}[i]$, with $A \in X^{\downarrow}[i]$, to any one of $A^{\uparrow}$, $A^{\uparrow}[f,r]$, $Y^{\uparrow}[f]$, with $A \in Y^{\uparrow}[f]$, are NC.

Proof. The original "pure counters" of $\hat{C}(G_L)$ have been "broken" by replacing the arrows that would complete the string $u^k$ with transitions that enter a loop accepting $u^*$. Thus, any pipeline associated with a counter whose string is $u$ accepts sequences $u^m$, with $m \geq k$. All paths of $\bar{C}(G_L)$ that do not touch counter sequence states existed in $C(G_L)$ too, up to the homomorphism that erases the indexes of the duplicated states.
The only transitions that are not replicas of transitions existing in $\hat{C}(G_L)$ (and in $C(G_L)$) are those exiting the counter sequence states, since they are derived from transitions originating from some of the states belonging to the counter sequence, say $X$. If such transitions originate paths that do not lead to any pipeline, i.e., that do not correspond to paths of $C(G_L)$ leading to some counter table, then such paths cannot contain any counter since they simply replicate paths of $C(G_L)$ with no counters. Suppose, instead, that such a path, after reading a string $z$, reaches the entry point of a pipeline which, through a string $v^j$, leads to a new counter: thus, the reading of $z$ is only a finite prefix of a path that leads from a counter sequence to another one (if instead the path of the pipeline reading $v^j$ is abandoned before reaching the counter sequence state, it continues by replicating a path that existed already in $C(G_L)$ without counters, up to a renaming of some states).
Notice that, as a particular case, the new counter string $v$ could be $u$ again but referring to a different counter table, therefore with disjoint states.
As a further special case, however, it could even happen that $z$ is $u^s$ (it cannot be $u = z^s$ because, by convention, $u$ is the minimal string that can be associated with the counter table; see Definition 37) and, by reading $z$, $\bar{C}(G_L)$ re-enters a pipeline of the same table so that after going through the whole pipeline we reach again state $X$. In this case we would have closed a loop from $X$ to $X$ by reading the string $u^{s+k}$; thus, $\bar{C}(G_L)$ would not be counter free. Nevertheless, it is aperiodic since, together with $u^{s+k}$, we would also find all strings $u^{s+k+n}$ for any $n \geq 0$, because from $X$ we can read any string in $u^*$. □

At this point it would be possible to prove again Theorem 26 and its Corollary 27 for any $G_L$ by suitably replacing formulas $\varphi_A$ with formulas referring to $\bar{C}(G_L)$ instead of $C(G_L)$. We would thus obtain FO definability of linear OPLs. This result, however, has already been obtained with much less effort. Here we want to achieve the general result for any NC OPL.

Let now $G$ be a BDR OPG, $G_L$ its associated linearized OPG, $C(G_L)$ the original control graph of $G_L$, and $\hat{C}(G_L)$, $\bar{C}(G_L)$ its respective transformations obtained through their constructions (remember that $\bar{C}(G_L)$ has been built starting from $\hat{C}(G_L)$). A new grammar $G'$ equivalent to $G$ is built according to the following procedure:
  – The nonterminal alphabet of $G'$, $V'_N$, consists of:
    • All pairs $(A^{\downarrow}, A^{\uparrow})$ where $A^{\downarrow}$, $A^{\uparrow}$ are singleton states of $\bar{Q}$, i.e., states of $\bar{C}(G_L)$ other than counter sequence states. They include also singleton states belonging to pipelines, i.e., states of type $A^{\downarrow}[i,r]$ and/or $A^{\uparrow}[l,s]$ if $A$ belongs to some descending or ascending counter.
    • All pairs $(X_A^{\downarrow}, A^{\uparrow})$, $(A^{\downarrow}, X_A^{\uparrow})$ where $A^{\downarrow}$ and $A^{\uparrow}$ are singleton states of $\bar{Q}$ not belonging to any descending, resp. ascending, counter and $X_A^{\downarrow}$ and $X_A^{\uparrow}$ are the counter sequence states containing $A^{\downarrow}$ and $A^{\uparrow}$, respectively.
    • The pairs $(X_A^{\downarrow}, Y_A^{\uparrow})$, $(X_A^{\downarrow}, A^{\uparrow}[l,s])$, $(A^{\downarrow}[i,r], Y_A^{\uparrow})$ where $X_A^{\downarrow}$ and $Y_A^{\uparrow}$ are the counter sequence states belonging to two paired counter tables $T[i]$, $T[l]$ and $(A^{\downarrow}[i,r], A^{\uparrow}[l,s])$ are elements of the corresponding pipelines. Thanks to Lemma 42, $(X_A^{\downarrow}, Y_A^{\uparrow})$ uniquely identifies a nonterminal $A$ of $G$.
    • The same elements as in the point above where $X_A^{\downarrow}$ and $Y_A^{\uparrow}$ are the counter sequence states belonging to two non-paired counter tables $T[i]$, $T[l]$, with the exclusion of the pair $(X_A^{\downarrow}, Y_A^{\uparrow})$.
  – For convenience, in the following construction we use the notation $[X_A]^{\downarrow}$ (resp., $[X_A]^{\uparrow}$) to denote either the singleton state $A^{\downarrow}$ (resp. $A^{\uparrow}$) or any counter sequence state $X_A$ containing $A$, or any element of the corresponding pipeline.
  – For every production $A \to x$ of $G$ the following productions are in $P'$, for all $[X_A]^{\downarrow}$:
    • if $A$ does not belong to any ascending counter, then $([X_A]^{\downarrow}, A^{\uparrow}) \to x$;
    • if $A$ belongs to an ascending counter, say $l$, then $([X_A]^{\downarrow}, A^{\uparrow}[l,0]) \to x$ (see point 7 of the construction of $\bar{C}(G_L)$).
  – For every production $A \to B_0 x_1 \ldots x_n B_n$ of $G$ (with $x_i \in W_G$), where, as usual, $B_0$ and $B_n$ may be missing, consider the following cases:
    1. $A$ does not belong to any counter, either descending or ascending. Then the following productions are in $P'$: $(A^{\downarrow}, A^{\uparrow}) \to ([Y_{B_0}]^{\downarrow}, [Y_{B_0}]^{\uparrow})\, x_1 \ldots x_n\, ([Y_{B_n}]^{\downarrow}, [Y_{B_n}]^{\uparrow})$ where, for each $k$, $[Y_{B_k}]^{\downarrow}$ is $B_k^{\downarrow}$ if $B_k$ does not belong to any descending counter, and $B_k^{\downarrow}[i,0]$ for any $i$ such that $B_k$ belongs to a counter table $T[i]$. The $[Y_{B_k}]^{\uparrow}$ components are all the ones defined in $V'_N$.
    2. $A$ belongs to a descending counter table $T[i]$ but not to any ascending one. Then the following productions are in $P'$:
      • if no $B_k$ belongs to $T[i]$, then $([X_A]^{\downarrow}, A^{\uparrow}) \to ([Y_{B_0}]^{\downarrow}, [Y_{B_0}]^{\uparrow})\, x_1 \ldots x_n\, ([Y_{B_n}]^{\downarrow}, [Y_{B_n}]^{\uparrow})$ where $[X_A]^{\downarrow}$ stands for all $A^{\downarrow}[i,r]$ plus $X_A^{\downarrow}[i]$, and for each $k$, $[Y_{B_k}]^{\downarrow}$ is $B_k^{\downarrow}$ if $B_k$ does not belong to any descending counter, and $B_k^{\downarrow}[l,0]$ for any $l$ such that $B_k$ belongs to a counter table $T[l]$, with $l \neq i$.
      • if there exists a $k$ such that $B_k$ belongs to $T[i]$ (there can be at most one such $k$ because of the construction of $\hat{C}(G_L)$), then $([X_A]^{\downarrow}, A^{\uparrow}) \to ([Y_{B_0}]^{\downarrow}, [Y_{B_0}]^{\uparrow})\, x_1 \ldots x_n\, ([Y_{B_n}]^{\downarrow}, [Y_{B_n}]^{\uparrow})$ where, if $[X_A]^{\downarrow}$ is $A^{\downarrow}[i,r]$, with $0 \leq r \leq j-1$, where $j$ is the length of the pipeline, $[Y_{B_k}]^{\downarrow}$ is $B_k^{\downarrow}[i,r+1]$; if $[X_A]^{\downarrow}$ is $A^{\downarrow}[i,j]$ or $X_A^{\downarrow}[i]$, $[Y_{B_k}]^{\downarrow}$ is $Y_{B_k}^{\downarrow}[i]$; all remaining elements of the rhs, including $[Y_{B_k}]^{\uparrow}$, are as in the previous item.
    3. $A$ belongs to an ascending counter table $T[l]$ but not to any descending one. Then the following productions are in $P'$:
      • If none of the $B_k$ belongs to $T[l]$, then the lhs is $(A^{\downarrow}, A^{\uparrow}[l,0])$ and the nonterminals $([Y_{B_k}]^{\downarrow}, [Y_{B_k}]^{\uparrow})$ of the rhs are all those existing in the nonterminal alphabet of $G'$.
      • If there exists a unique $B_k$ belonging to $T[l]$, then $(A^{\downarrow}, [X_A]^{\uparrow}) \to ([Y_{B_0}]^{\downarrow}, [Y_{B_0}]^{\uparrow})\, x_1 \ldots x_n\, ([Y_{B_n}]^{\downarrow}, [Y_{B_n}]^{\uparrow})$ where, if $[Y_{B_k}]^{\uparrow}$ is $B_k^{\uparrow}[l,s]$, with $0 \leq s \leq j-1$, $[X_A]^{\uparrow}$ is $A^{\uparrow}[l,s+1]$; if $[Y_{B_k}]^{\uparrow}$ is $B_k^{\uparrow}[l,j]$ or $Y_{B_k}^{\uparrow}[l]$, $[X_A]^{\uparrow}$ is $X_A^{\uparrow}[l]$; all remaining elements of the rhs, including $[Y_{B_k}]^{\downarrow}$, are as in the previous bullet.
    4. The case where $A$ belongs to a descending counter table $T[i]$ and to a paired ascending one $T[l]$ can be treated as a natural combination of the previous ones, keeping in mind Lemma 42.
    5. $A$ belongs to a descending counter table $T[i]$ and to an ascending one $T[l]$ that are not paired. In this case only one of the two tables can be followed by the derivation. In other words, a derivation $A \stackrel{*}{\Rightarrow} u^k A v$ is interrupted to move to another "semicounting derivation" $A \stackrel{*}{\Rightarrow} z A w^h$. In this case both possibilities are applied: all elements $[X_A]^{\uparrow}$ of the ascending pipeline, including the counter sequence state, are paired with single elements of the descending pipeline excluding the counter sequence state, and conversely, in all compatible ways. The elements of the rhs are built in the same way as in points 3 and 2 above, respectively.
    For instance, if $A$ belongs to a descending counter $(AB^{\downarrow}[1], a)$ and to an ascending one $(AC^{\uparrow}[2], b)$, a production $A \to aBb$ belonging to both counter tables becomes the following productions of $G'$: $([X_A]^{\downarrow}, [X'_A]^{\uparrow}) \to a\,([Y_B]^{\downarrow}, [Y_B]^{\uparrow})\,b$, $([X'_A]^{\downarrow}, [X_A]^{\uparrow}) \to a\,([Y_B]^{\downarrow}, [Y_B]^{\uparrow})\,b$, where $[X_A]^{\downarrow}$ (resp. $[X_A]^{\uparrow}$) stands for any element of the descending (resp. ascending) pipeline, including $(AB^{\downarrow}[1], a)$ (resp. $(AC^{\uparrow}[2], b)$), and $[X'_A]^{\downarrow}$ (resp. $[X'_A]^{\uparrow}$) stands for any element of the descending (resp. ascending) pipeline, excluding $(AB^{\downarrow}[1], a)$ (resp.
$(AC^{\uparrow}[2], b)$). See also Example 49.
  – The axioms of $G'$ are:
    • the pairs $(A^{\downarrow}, A^{\uparrow})$ where $A$ is an axiom of $G$ that does not occur in any counter table, whether descending or ascending;
    • all pairs $(A^{\downarrow}, [X_A]^{\uparrow})$ where $A$ is an axiom of $G$ that does not occur in any descending counter table but occurs in some ascending ones;
    • all pairs $(A^{\downarrow}[i,0], [X_A]^{\uparrow})$ where $A$ is an axiom of $G$ that belongs to the descending counter table $T[i]$ and $[X_A]^{\uparrow}$ denotes either $A^{\uparrow}$ or any element of an ascending pipeline (including the counter sequence state), depending on whether or not $A$ belongs to some ascending counter table.

Intuitively, $G'$ splits all of $G$'s nonterminals into pairs representing elements of the descending and ascending paths of $C(G_L)$ involving the same nonterminal of $G$. If one of the states of $C(G_L)$ belongs to a counter sequence, this is recorded in the name of the new nonterminal symbol, which can be an element of the corresponding pipeline. If a derivation is following a descending or an ascending path of the syntax tree that is part of a counter table, say the $i$-th, then that part of the path must obey the constraints given by the $i$-th pipeline. Such constraints are given by $\bar{C}(G_L)$ since all paths root-to-leaves and back of $G'$ are the same as those of $G_L$. Notice that, whereas $G$ is BDR, $G'$ is not; it may also contain useless nonterminals.

Example 48.
To summarize, consider again the G NL grammar of Example 25 and itslinearized version G NL L of Example 32.The control graph of G NL L is given in Figure 14: it exhibits three ascending counters ( A (cid:2) B (cid:2) , c ¯ A ) , ( A (cid:2) B (cid:2) , c ¯ B ) , ( A (cid:2) B (cid:2) , ¯ ε R ) ; notice that the third one has no impact on the Besides, of course, those related to the case when the production is used to remain in the samecounter table.periodicity, Star-freeness, and First-order Definability of Structured CF Languages 43 counting property since we also have the self loops A (cid:2) ¯ ε R −→−→−→ ¯ δ ¯ δ ¯ δ A (cid:2) , B (cid:2) ¯ ε R −→−→−→ ¯ δ ¯ δ ¯ δ B (cid:2) . Thecorresponding ¯ C ( G NL L ) is given in Figure 15. Fig. 14. The control graph C ( G NL L ) G (cid:48) NL ’s nonterminal alphabet is the set: { ( A (cid:3) , A (cid:2) [ i, j ]) , ( A (cid:3) , AB (cid:2) [ i ]) , ( B (cid:3) , B (cid:2) [ i, j ]) , ( B (cid:3) , AB (cid:2) [ i ]) | i = 1 , , , j = 0 , } ,A significant sample of G (cid:48) NL ’s rules is given below. ( A (cid:3) , A [ i, (cid:2) ) → ac ( B (cid:3) , B [ i, (cid:2) ) → bc From the original G ’s rule A → aBcB we obtain the following rules: ( A (cid:3) , A (cid:2) [ k, → a ( B (cid:3) , [ Y (cid:2) B [ i ]]) c ( B (cid:3) , [ Y (cid:2) B [ l ]]) , where [ Y (cid:2) B [ i ]] , resp. [ Y (cid:2) B [ l ]] , standsfor either B (cid:2) [ i, or B (cid:2) [ i, or AB (cid:2) [ i ] , with i, l = 1 , , , k (cid:54) = i, l . 
( A (cid:3) , A (cid:2) [ i, → a ( B (cid:3) , B (cid:2) [ i, c ( B (cid:3) , [ Y (cid:2) B [ l ]]) , ( A (cid:3) , AB (cid:2) [ i ]) → a ( B (cid:3) , B (cid:2) [ i, c ( B (cid:3) , [ Y (cid:2) B [ l ]]) , ( A (cid:3) , A (cid:2) [ l, → a ( B (cid:3) , [ Y (cid:2) B [ i ]]) c ( B (cid:3) , B (cid:2) [ l, , ( A (cid:3) , AB (cid:2) [ l ]) → a ( B (cid:3) , [ Y (cid:2) B [ i ]]) c ( B (cid:3) , B (cid:2) [ l, .The rationale of the construction is that any (ascending, in this case) counter canbe interrupted leading only to the entry point of a different counter (or to a state notbelonging to any counter, in the general case). If instead we are following a specificcounter marked by its index i , the sequence of the states (in this case the ascendingcomponent of G (cid:48) ’s nonterminal) must follow the sequence imposed by the i -th pipeline,whereas the other nonterminals, which correspond to the ¯ B terminals of G NL L may beof any type. The remaining rules of G (cid:48) should now be easily inferred by analogy. Fig. 15. The control graph ¯ C ( G NL L ) . The upper part of the graph concerning the descendingpaths is not reported being identical to the original one of C ( G NL L ) .periodicity, Star-freeness, and First-order Definability of Structured CF Languages 45 The following example instead enlightens the ambiguity of G (cid:48) as a consequence ofintroducing repeated rhs and the case of a grammar nonterminal belonging to both anascending and a descending counter, but not paired. Example 49. Consider the following grammar G cross , with S = { A, B } A → aBc , B → aAb | aCb | h , C → dBb It is easy to realize that C ( G cross ) has a descending counter C (cid:3) = ( A (cid:3) B (cid:3) , a ) andan ascending one C (cid:2) = ( B (cid:2) C (cid:2) , b ) . Notice that the production B → aAb is used inboth counter tables. Without providing explicitly the whole grammar G (cid:48) Cross we display ¯ C ( G cross ) in Figure 16. Fig. 16. 
The control graph ¯ C ( G cross ) A first derivation of G Cross is B ===== ⇒ G Cross h . Since B is an axiom of G Cross , h ∈ L ( G ) . In G (cid:48) Cross h can be derived –in one step– by the lhss ( B (cid:3) [1 , , B (cid:2) [2 , , ( B (cid:3) [1 , , B (cid:2) [2 , , ( AB (cid:3) [1] , B (cid:2) [2 , ; however, since only ( B (cid:3) [1 , , B (cid:2) [2 , is anaxiom of G (cid:48) , h can be derived as a string of L ( G (cid:48) ) only through that nonterminal; thederivation ( AB (cid:3) [1] , B (cid:2) [2 , ⇒ G (cid:48) Cross h , instead, could be used elsewhere as part of alonger G (cid:48) Cross derivation. The fact that in the lhs of G (cid:48) Cross rule occur the labels of twodifferent counter tables denotes the possibility that it belongs to two different counters.Imagine now that h occurs in the context d − b . This means that dhb has beenderived in G Cross by C ===== ⇒ G Cross dhb ; thus, no ambiguity remains and the only possi- ble lhs for all rhs d ( B (cid:3) [1 , , B (cid:2) [2 , b , d ( B (cid:3) [1 , , B (cid:2) [2 , b , d ( AB (cid:3) [1] , B (cid:2) [2 , b is ( C (cid:3) , C (cid:2) [2 , .The next derivation step of G Cross necessarily involves reducing the rhs aCb to B .This step, however, could be a further step of the ascending counter C or could interruptthe ascending counter and become a –last– step of the descending counter C . Thus,we have two possible groups of lhs for a ( C (cid:3) , C (cid:2) [2 , b , namely { ( B [1 , (cid:3) , BC (cid:2) [2]) , ( B [1 , (cid:3) , BC (cid:2) [2]) } and { ( B [1 , (cid:3) , B (cid:2) [1 , , ( B [1 , (cid:3) , B (cid:2) [1 , , ( AB [1] (cid:3) , B (cid:2) [1 , } .Notice, instead, that point 5. 
of G (cid:48) construction excludes the lhs ( AB [1] (cid:3) , BC (cid:2) [2]) whichwould be superfluous.If the next reduction involves the context a − c only C will be followed by applyingambiguously one of the rules ( A [1 , (cid:3) , A (cid:2) ) → a ( B [1 , (cid:3) , B (cid:2) [1 , c , ( A [1 , (cid:3) , A (cid:2) ) → a ( AB [1] (cid:3) , B (cid:2) [1 , c , ( AB [1] (cid:3) , A (cid:2) ) → a ( AB [1] (cid:3) , B (cid:2) [1 , c , ( A [1 , (cid:3) , A (cid:2) ) → a ( B [1 , (cid:3) , BC (cid:2) [2]) c , ( A [1 , (cid:3) , A (cid:2) ) → a ( AB [1] (cid:3) , BC (cid:2) [2]) c , ( AB [1] (cid:3) , A (cid:2) ) → a ( AB [1] (cid:3) , BC (cid:2) [2]) c .Symmetrically, if the next reduction involves the context d − b only C will befollowed. Lemma 50. Let G be a BDR OPG and G (cid:48) the grammar derived therefrom accordingto the above procedure. For every A ∈ V N A ∗ == ⇒ G x iff for some ([ X A ] (cid:3) , [ X A ] (cid:2) ) , ([ X A ] (cid:3) , [ X A ] (cid:2) ) ∗ == ⇒ G (cid:48) x .Proof. Base of the induction. By construction of G (cid:48) , A == ⇒ G x iff for all [ X A ] (cid:3) , either ([ X A ] (cid:3) , A (cid:2) ) → x , or ([ X A ] (cid:3) , A (cid:2) [ i, → x , for any i such that A belongs to a countertable T [ i ] . Moreover, by construction of ¯ C ( GL ) , [ X A ] (cid:3) x −→−→−→ ¯ δδδ A (cid:2) or [ X A ] (cid:3) x −→−→−→ ¯ δδδ A (cid:2) [ i, ,for all [ X A ] (cid:3) . Inductive step. 1. From G (cid:48) to G . Assume that for h ≤ p and for each A ∈ V N , ([ X A ] (cid:3) , [ X A ] (cid:2) ) h == ⇒ G (cid:48) x for some ([ X A ] (cid:3) , [ X A ] (cid:2) ) , implies A h == ⇒ G x . Consider a derivation ([ X A ] (cid:3) , [ X A ] (cid:2) ) ∗ == ⇒ G (cid:48) x ([ X B ] (cid:3) , [ X B ] (cid:2) ) x . . . ([ X B n ] (cid:3) , [ X B n ] (cid:2) ) ∗ == ⇒ G (cid:48) x . . . 
w n , with ([ X B k ] (cid:3) , [ X B k ] (cid:2) ) h == ⇒ G (cid:48) w k , h ≤ p , x k ∈ W (notice that W is the same both for G and G (cid:48) ); as for the con-struction of G (cid:48) , we treat only the case where ([ X B ] (cid:3) , [ X B ] (cid:2) ) is missing and ([ X B n ] (cid:3) , [ X B n ] (cid:2) ) is present since the other cases are fully similar.By the induction hypothesis B k ∗ == ⇒ G w k . By construction of ¯ C ( GL ) , for some [ X A ] (cid:3) , [ X A ] (cid:2) , [ Y B ] (cid:3) , [ Y B ] (cid:2) the following transitions are in ¯ δδδ : [ X A ] (cid:3) x −→−→−→ [ Y B ] (cid:3) , [ Y B n ] (cid:2) ¯ ε R −→−→−→ [ X A ] (cid:2) ; [ Y B ] (cid:2) x ¯ B ... ¯ B k − x k −→−→−→ [ Y B k ] (cid:3) , ≤ k ≤ n ; [ Y B k ] (cid:2) x k +1 ¯ B k +1 ... ¯ B n −→−→−→ [ X A ] (cid:2) , ≤ k ≤ n − ; with the additional constraint that, if [ X A ] (cid:2) is an X (cid:2) A [ i ] or A (cid:2) [ i, j ] for some i, j with j > , then [ Y B k ] (cid:2) is B (cid:2) [ i, j ] or B (cid:2) [ i, j − , respectively. periodicity, Star-freeness, and First-order Definability of Structured CF Languages 47 This means that for some D (cid:3) in [ X A ] (cid:3) , H (cid:3) k in [ Y B k ] (cid:3) , D was lhs of a production of G such as D → x H . . . x n H n . For each k , however, Y (cid:3) B k is paired with a unique B (cid:2) k or with an Y (cid:2) such that there is exactly one B such that B (cid:3) k ∈ Y (cid:3) B k and B (cid:2) k ∈ Y (cid:2) so that for a unique B k = H k ∗ == ⇒ G w k . Thus x B . . . x n B n is a unique rhs of G with a unique lhs D = A , so that A ∗ == ⇒ G x .2. From G to G (cid:48) . Conversely, assume that for h ≤ p and for each A ∈ V N , A h == ⇒ G x implies that for some ([ X A ] (cid:3) , [ X A ] (cid:2) ) , ([ X A ] (cid:3) , [ X A ] (cid:2) ) h == ⇒ G (cid:48) x (NB: there could beseveral ones since G (cid:48) is not BDR). Consider a derivation A == ⇒ G x B . . . B n ∗ == ⇒ G x w n . . . w n , with B k h == ⇒ G w k , h ≤ p . 
By the induction hypothesis there exists atleast one derivation ([ X B k ] (cid:3) , [ X B k ] (cid:2) ) h == ⇒ G (cid:48) w k for each k .The construction of G (cid:48) produces from G ’s production A → x B ...B n all possi-ble rules ([ X A ] (cid:3) , [ X A ] (cid:2) ) → x ([ X B ] (cid:3) , [ X B ] (cid:2) ) x . . . ([ X B n ] (cid:3) , [ X B n ] (cid:2) ) that arecompatible with ¯ δδδ according to the above construction. Thus, there exists at leastone rule in G (cid:48) ([ X A ] (cid:3) , [ X A ] (cid:2) ) → x ([ X B ] (cid:3) , [ X B ] (cid:2) ) x . . . ([ X B n ] (cid:3) , [ X B n ] (cid:2) ) foreach ([ X B k ] (cid:3) , [ X B k ] (cid:2) ) ∗ == ⇒ G (cid:48) w k . (cid:117)(cid:116) By taking into account how G (cid:48) axioms are derived from those of G we immediatelyobtain the main theorem: Theorem 51. The OPG G and the OPG G (cid:48) built from it on the basis of the aboveconstruction are structurally equivalent. The structural equivalence is an obvious consequence of the fact that the two grammarsshare the same OPM.The control graph of grammar G (cid:48) , C ( G (cid:48) ) , is defined through a natural modificationof the original Definition 24: precisely, V (cid:3) N is the set of the left elements of V (cid:48) N , and V (cid:2) N the set of right elements thereof.Figure 17 displays a fragment of C ( G (cid:48) ) for the grammar of Example 48. Whereasthe transitions from descending states are complete, for simplicity only the entry pointsof the ascending part of the graph are displayed.The following theorem extends Theorem 26 to the grammars such as G (cid:48) derivedfrom BDR OPGs. Theorem 52. Consider formulas 6, 7 where the subscript A is replaced by all pairs ([ X A ] (cid:3) , [ X A ] (cid:2) ) as defined in the construction of G (cid:48) . Thus formula ϕ ([ X A ] (cid:3) , [ X A ] (cid:2) ) definesthe set { x | [ X A ] (cid:3) x −→ [ X A ] (cid:2) } . 
For any $([X_A]^{\downarrow}, [X_A]^{\uparrow}) \in V'_N$, $x \in L(([X_A]^{\downarrow}, [X_A]^{\uparrow}))$ if and only if $\varphi_{([X_A]^{\downarrow}, [X_A]^{\uparrow})}(0, |x|+1) \wedge \psi_{([X_A]^{\downarrow}, [X_A]^{\uparrow})}$ hold.

Proof. The proof is almost identical to that of Theorem 26, the only difference coming from the fact that $G'$ is not BDR. Thus, e.g., in the base of the induction, instead of just one production $A \to x$ we may have several ones of type $([X_A]^{\downarrow}, [X_A]^{\uparrow}) \to x$, each one of them satisfying $\psi_{([X_A]^{\downarrow}, [X_A]^{\uparrow})}$ with the corresponding lhs. □

Fig. 17. A fragment of the control graph $C(G')$. The upper part of the graph depicts the descending (single) states; the lower part shows only the entry points of the ascending pipelines.

The following theorem is the last step to achieve FO definability of aperiodic OPLs.

Theorem 53. Let $G'$ be the grammar built from any NC BDR OPG $G$ according to the procedure given above and let $C(G')$ be its control graph. Then, for each $([X_A]^{\downarrow}, [X_A]^{\uparrow})$ of $G'$ the set of paths $[X_A]^{\downarrow} \xrightarrow{w_i} [X_A]^{\uparrow}$ is a NC regular language.

Proof. The fact that the set of paths is a regular language follows immediately from the definition of the automaton as in Theorem 26.
Consider a generic path $[X_A]^{\downarrow} \xrightarrow{w} [X_A]^{\uparrow}$ of $C(G')$ with $w = x v^n y$ and $n$ sufficiently large, e.g., larger than the size of the nonterminal alphabet of $G'$. Thus, there must exist a subpath of $[X_A]^{\downarrow} \xrightarrow{w} [X_A]^{\uparrow}$ such as $[X_B^1]^{\downarrow} \xrightarrow{v} [X_B^2]^{\downarrow} \xrightarrow{v} \ldots \xrightarrow{v} [X_B^n]^{\downarrow}$, with $v = w_1 x_1 w_2 x_2 \ldots$ where the $w_i$ are well parenthesized according to the OPM and $x_i \in W$, or similarly for an ascending path.
Notice in fact that, since the parenthesization of $v$ is uniquely determined by the OPM, the $[X_B^l]$, $1 \leq l \leq n$, are either all $[X_B^l]^{\downarrow}$ or all $[X_B^l]^{\uparrow}$.
If for some $i$, $[X_B^i]^{\downarrow} = [X_B^{i+1}]^{\downarrow}$, then it is also $[X_A]^{\downarrow} \xrightarrow{x v^{n+r} y} [X_A]^{\uparrow}$ for every $r \geq 0$.
Suppose instead that for some $k$, $[X_B^1]^{\downarrow} \xrightarrow{v} [X_B^2]^{\downarrow} \xrightarrow{v} \ldots [X_B^k]^{\downarrow} \xrightarrow{v} [X_B^1]^{\downarrow}$ with $[X_B^i]^{\downarrow} \neq [X_B^j]^{\downarrow}$ for $i \neq j$.
Since the original grammar $G$ is BDR, for each $w_i$ there exists a unique $C_i$ such that $C_i \stackrel{*}{\Rightarrow}_G w_i$. Thus, $B_l^{\downarrow} \xrightarrow{\bar{v}}_{\delta} B_{(l+1) \bmod k}^{\downarrow}$ in $C(G_L)$, where $\bar{v}$ is obtained from $v$ by replacing each $w_i$ with $\bar{C}_i$; since $(B_1^{\downarrow} \ldots B_k^{\downarrow}, \bar{v})$ is a counter of $C(G_L)$, by construction of $\bar{C}(G_L)$ it is also $X_B^{\downarrow} \xrightarrow{\bar{C}_1 x_1 \bar{C}_2 x_2 \ldots}_{\bar{\delta}} X_B^{\downarrow}$ for $X_B^{\downarrow} = B_1^{\downarrow} \ldots B_k^{\downarrow}$, and any path including $\bar{v}^k$ must also include the counter sequence state $X_B^{\downarrow}$. By replacing back $\bar{C}_i$ with $w_i$ we obtain $X_B^{\downarrow} \xrightarrow{v} X_B^{\downarrow}$ as part of the path $[X_B^1]^{\downarrow} \xrightarrow{v} [X_B^2]^{\downarrow} \xrightarrow{v} \ldots [X_B^k]^{\downarrow} \xrightarrow{v} [X_B^1]^{\downarrow}$; thus $[X_A]^{\downarrow} \xrightarrow{w'} [X_A]^{\uparrow}$ for all $w' = x v^{n+r} y$ with $r \geq 0$. □

As a consequence of Theorem 53 all formulas $\varphi_{([X_A]^{\downarrow}, [X_A]^{\uparrow})}$ can be written in FO logic, so that the original MSO formulas 6, 7 become FO once applied to grammar $G'$.
Finally, we have obtained our main result:

Theorem 54. Aperiodic operator precedence languages are FO definable.

Figure 18 summarizes the results presented in this paper together with previous related ones. The external boxes represent equivalent ways to express general OPLs, whereas the internal ones represent equivalent ways to express aperiodic OPLs.
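The noncounting property that underlies Lemma 47 and Theorems 53 and 54 can be illustrated in the plain regular case by a small sketch. It compares a "pure counter" automaton, whose single cycle reads $u$ exactly $k$ times, with a "broken" version in which the transition closing the cycle is redirected to an absorbing state that loops on $u$. This is only an illustration under simplifying assumptions and not the paper's construction: the counter string $u$ is reduced to a single letter, automata are complete DFAs encoded as Python tuples, and all names (`transition_semigroup`, `is_aperiodic`, `pure_counter`, `broken_counter`, `accepted_lengths`) are hypothetical. The check relies on a classical fact: a finite semigroup is aperiodic iff every element $m$ satisfies $m^n = m^{n+1}$ for some $n$, and an aperiodic transition semigroup is sufficient for the accepted language to be noncounting (it is also necessary only when the DFA is minimal).

```python
def compose(first, second):
    """Transformation obtained by applying `first`, then `second`."""
    return tuple(second[q] for q in first)


def transition_semigroup(delta):
    """All state transformations induced by nonempty words.

    `delta` maps each letter to a tuple t, where t[q] is the successor of q.
    """
    gens = [tuple(t) for t in delta.values()]
    seen = set(gens)
    frontier = list(seen)
    while frontier:
        m = frontier.pop()
        for g in gens:
            mg = compose(m, g)
            if mg not in seen:
                seen.add(mg)
                frontier.append(mg)
    return seen


def is_aperiodic(delta):
    """True iff every element m satisfies m^n == m^(n+1) for some n."""
    for m in transition_semigroup(delta):
        powers = [m]
        nxt = compose(m, m)
        while nxt not in powers:
            powers.append(nxt)
            nxt = compose(nxt, m)
        # `nxt` is the first repeated power; the cycle of powers has
        # length 1 exactly when it repeats the last distinct power.
        if nxt != powers[-1]:
            return False
    return True


def pure_counter(k):
    """Cycle of k states on letter 'u': counts occurrences of u modulo k."""
    return {"u": tuple((q + 1) % k for q in range(k))}


def broken_counter(k):
    """Pipeline of k states followed by an absorbing state k looping on 'u'."""
    return {"u": tuple(min(q + 1, k) for q in range(k + 1))}


def accepted_lengths(step, start, finals, max_len):
    """Lengths n <= max_len such that u^n is accepted (one-letter alphabet)."""
    lengths, q = [], start
    for n in range(max_len + 1):
        if q in finals:
            lengths.append(n)
        q = step[q]
    return lengths


if __name__ == "__main__":
    print(is_aperiodic(pure_counter(3)))    # False: the cycle counts modulo 3
    print(is_aperiodic(broken_counter(3)))  # True: the loop on u absorbs the count
    print(accepted_lengths(pure_counter(3)["u"], 0, {0}, 9))    # [0, 3, 6, 9]
    print(accepted_lengths(broken_counter(3)["u"], 0, {3}, 9))  # [3, 4, 5, 6, 7, 8, 9]
```

In this toy setting the broken automaton accepts $u^m$ for every $m \geq k$, which mirrors the statement in the proof of Lemma 47 that a pipeline associated with a counter of string $u$ accepts the sequences $u^m$ with $m \geq k$.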
Figure 18 immediately suggests a first further research step, i.e., making the internal triangle a square, as well as the external one: we conjecture that, once the concept of NC OPLs has been put in the appropriate framework, a further characterization thereof in terms of a suitable subclass of OPAs should be possible, but so far we have not pursued this option. A further benefit coming from such an extension would be avoiding the hypothesis of $\doteq$-acyclic OPMs: this restriction is necessary only to guarantee that an OPG can generate the whole $\Sigma^*$ but is not necessary for OPAs, which indeed have a slightly more expressive power than OPGs.
We also hope that the articulated path that we used to prove that NC OPLs are FO definable can be made shorter and more direct, although we cannot forget that even in the case of regular languages such proof paths are rather complex (see, e.g., [31]).
The most exciting goal that we wish to pursue and that we submit to the theoretical computer science community, however, is the completion of the great historical path that, for regular languages, led from the first characterization in terms of MSO logic to the restricted case of FO characterization of NC regular languages, to the temporal logic one which in turn is first-order complete, thanks to Kamp's theorem [35], and, ultimately, to the striking success of model checking techniques.
Some proposals of temporal logics extending the classical linear or branching time ones to cope with the typical nesting structure of context-free languages have already been offered in the literature.
E.g., [29] presents an FO-complete temporal logic tospecify properties of paths in tree-languages; [1,3,7] present different cases of temporallogics extended to deal with VPLs ; they also prove FO-completeness of such logics butdo not afford the relation between FO and MSO versions of their logics, neither do theydeal with aperiodicity for VPLs .We too have already designed a first example of temporal logic for OPLs [11] andbuilt an algorithm that derives an OPA from a formula of this logic of exponential size in Alternatively, OPGs could be extended with productions allowing for rhs that include regularexpressions [14,16] but we avoided this option to keep the notation not too cumbersome. As announced in the abstract, it should be now clear that, as a corollary of our result, one canalso obtain an FO logic characterization of NC VPLs.0 Dino Mandrioli, Matteo Pradella, Stefano Crespi Reghizzi Legend All boxes denote classes of OPLs with a common conflict-free OPM:MSO denotes languages defined through MSO formulasFO denotes languages defined through FO formulas A denotes languages defined through operator precedence automata [27] E denotes languages defined through OPEs E SF denotes languages defined through star-free OPEs G denotes languages defined through OPGs G NC denotes aperiodic OPLs, i.e., languages defined through NC OPGsArrows between boxes denote language family inclusion; they are labeled by the reference pointing to where the propertyhas been proved, either to previous literature or to a section of this paper. Fig. 18. The relations among the various characterizations of OPLs and their aperiodic subclass.periodicity, Star-freeness, and First-order Definability of Structured CF Languages 51 the length of the formula. 
Thanks to the result of this paper, and to the fact that most, if not all, of the context-free languages for practical applications are aperiodic, the final goal of building model checkers that cover a much wider application field than that of regular languages (and of various structured context-free languages, such as VPLs, too) with comparable computational complexity does not seem unreachable.

References

1. Alur, R., Arenas, M., Barceló, P., Etessami, K., Immerman, N., Libkin, L.: First-order and temporal logics for nested words. Logical Methods in Computer Science 4(4) (2008)
2. Alur, R., Madhusudan, P.: Adding nesting structure to words. J. ACM 56(3) (2009)
3. Alur, R., Chaudhuri, S., Madhusudan, P.: Software model checking using languages of nested trees. ACM Trans. Program. Lang. Syst. 33(5), 15:1–15:45 (2011), https://doi.org/10.1145/2039346.2039347
4. Alur, R., Fisman, D.: Colored nested words. In: Dediu, A., Janousek, J., Martín-Vide, C., Truthe, B. (eds.) Language and Automata Theory and Applications - 10th International Conference, LATA 2016, Prague, Czech Republic, March 14-18, 2016, Proceedings. Lecture Notes in Computer Science, vol. 9618, pp. 143–155. Springer (2016), https://doi.org/10.1007/978-3-319-30000-9_11
5. Autebert, J., Berstel, J., Boasson, L.: Context-free languages and pushdown automata. In: Handbook of Formal Languages (1), pp. 111–174 (1997), https://doi.org/10.1007/978-3-642-59136-5_3
6. Barenghi, A., Crespi Reghizzi, S., Mandrioli, D., Panella, F., Pradella, M.: Parallel parsing made practical. Sci. Comput. Program. 112(3), 195–226 (2015), https://doi.org/10.1016/j.scico.2015.09.002
7. Bozzelli, L., Sánchez, C.: Visibly linear temporal logic. In: Demri, S., Kapur, D., Weidenbach, C. (eds.) Automated Reasoning - 7th International Joint Conference, IJCAR 2014, Held as Part of the Vienna Summer of Logic, VSL 2014, Vienna, Austria, July 19-22, 2014. Proceedings. Lecture Notes in Computer Science, vol. 8562, pp. 418–433. Springer (2014), https://doi.org/10.1007/978-3-319-08587-6_33
8. von Braunmühl, B., Verbeek, R.: Input-driven languages are recognized in log n space. In: Proceedings of the Symposium on Fundamentals of Computation Theory, Lect. Notes Comput. Sci. 158. pp. 40–51. Springer (1983)
9. Büchi, J.R.: Weak Second-Order Arithmetic and Finite Automata. Mathematical Logic Quarterly 6(1-6), 66–92 (1960)
10. Chevalier, F., D'Souza, D., Prabhakar, P.: Counter-free input-determined timed automata. In: Raskin, J., Thiagarajan, P.S. (eds.) Formal Modeling and Analysis of Timed Systems, 5th International Conference, FORMATS 2007, Salzburg, Austria, October 3-5, 2007, Proceedings. Lecture Notes in Computer Science, vol. 4763, pp. 82–97. Springer (2007), https://doi.org/10.1007/978-3-540-75454-1_8
11. Chiari, M., Mandrioli, D., Pradella, M.: Temporal logic and model checking for operator precedence languages. In: Orlandini, A., Zimmermann, M. (eds.) Proceedings Ninth International Symposium on Games, Automata, Logics, and Formal Verification, GandALF 2018, Saarbrücken, Germany, 26-28th September 2018. EPTCS, vol. 277, pp. 161–175 (2018), https://doi.org/10.4204/EPTCS.277.12
12. Crespi Reghizzi, S., Guida, G., Mandrioli, D.: Noncounting Context-Free Languages. J. ACM 25, 571–580 (1978)
13. Crespi Reghizzi, S., Guida, G., Mandrioli, D.: Operator Precedence Grammars and the Noncounting Property. SIAM J. Comput. 10, 174–191 (1981)
14. Crespi Reghizzi, S., Mandrioli, D.: Operator Precedence and the Visibly Pushdown Property. J. Comput. Syst. Sci. 78(6), 1837–1867 (2012)
15. Crespi Reghizzi, S., Mandrioli, D., Martin, D.F.: Algebraic Properties of Operator Precedence Languages. Information and Control 37(2), 115–133 (May 1978)
16. Crespi Reghizzi, S., Pradella, M.: Beyond operator-precedence grammars and languages. Journal of Computer and System Sciences 113, 18–41 (2020)
17. Crespi Reghizzi, S., Mandrioli, D.: A class of grammar generating non-counting languages. Inf. Process. Lett. 7(1), 24–26 (1978), https://doi.org/10.1016/0020-0190(78)90033-9
18. Diekert, V., Gastin, P.: First-order definable languages. In: Logic and Automata: History and Perspectives, Texts in Logic and Games. pp. 261–306. Amsterdam University Press (2008)
19. Elgot, C.C.: Decision problems of finite automata design and related arithmetics. Trans. Am. Math. Soc. 98(1), 21–52 (1961)
20. Ésik, Z., Iván, S.: Aperiodicity in tree automata. In: Bozapalidis, S., Rahonis, G. (eds.) Algebraic Informatics, Second International Conference, CAI 2007, Thessaloniki, Greece, May 21-25, 2007, Revised Selected and Invited Papers. Lecture Notes in Computer Science, vol. 4728, pp. 189–207. Springer (2007), https://doi.org/10.1007/978-3-540-75414-5_12
21. Floyd, R.W.: Syntactic Analysis and Operator Precedence. J. ACM 10(3), 316–333 (1963)
22. Harrison, M.A.: Introduction to Formal Language Theory. Addison Wesley (1978)
23. Heuter, U.: First-order properties of trees, star-free expressions, and aperiodicity. ITA 25, 125–145 (1991), https://doi.org/10.1051/ita/1991250201251
24. Langholm, T.: A descriptive characterisation of linear languages. Journal of Logic, Language and Information 15(3), 233–250 (2006), https://doi.org/10.1007/s10849-006-9016-z
25. Lautemann, C., Schwentick, T., Thérien, D.: Logics for context-free languages. In: Pacholski, L., Tiuryn, J. (eds.) Computer Science Logic, 8th International Workshop, CSL '94, Kazimierz, Poland, September 25-30, 1994, Selected Papers. Lecture Notes in Computer Science, vol. 933, pp. 205–216. Springer (1994)
26. Lonati, V., Mandrioli, D., Panella, F., Pradella, M.: First-order logic definability of free languages. In: Beklemishev, L.D., Musatov, D.V. (eds.) Computer Science - Theory and Applications - 10th International Computer Science Symposium in Russia, CSR 2015, Listvyanka, Russia, July 13-17, 2015, Proceedings. Lecture Notes in Computer Science, vol. 9139, pp. 310–324. Springer (2015)
27. Lonati, V., Mandrioli, D., Panella, F., Pradella, M.: Operator precedence languages: Their automata-theoretic and logic characterization. SIAM J. Comput. 44(4), 1026–1088 (2015)
28. Mandrioli, D., Pradella, M.: Generalizing input-driven languages: Theoretical and practical benefits. Computer Science Review 27, 61–87 (2018), https://doi.org/10.1016/j.cosrev.2017.12.001
29. Marx, M.: Conditional XPath. ACM Transactions on Database Systems 30(4), 929–959 (Dec 2005)
30. McNaughton, R.: Parenthesis Grammars. J. ACM 14(3), 490–500 (1967)
31. McNaughton, R., Papert, S.: Counter-free Automata. MIT Press, Cambridge, USA (1971)
32. Nowotka, D., Srba, J.: Height-Deterministic Pushdown Automata. In: Kucera, L., Kucera, A. (eds.) MFCS 2007, Český Krumlov, Czech Republic, August 26-31, 2007, Proceedings. LNCS, vol. 4708, pp. 125–134. Springer (2007)
33. Pin, J.: Logic on words. In: Current Trends in Theoretical Computer Science, pp. 254–273 (2001)
34. Potthoff, A.: First-order logic on finite trees. In: Mosses, P.D., Nielsen, M., Schwartzbach, M.I. (eds.) TAPSOFT'95: Theory and Practice of Software Development, 6th International Joint Conference CAAP/FASE, Aarhus, Denmark, May 22-26, 1995, Proceedings. Lecture Notes in Computer Science, vol. 915, pp. 125–139. Springer (1995), https://doi.org/10.1007/3-540-59293-8_191
35. Rabinovich, A.: A proof of Kamp's theorem. Logical Methods in Computer Science 10(1) (2014), https://doi.org/10.2168/LMCS-10(1:14)2014