Positive First-order Logic on Words

Denis Kuperberg
CNRS, LIP, ENS Lyon
Email: [email protected]

January 22, 2021
Abstract
We study FO+, a fragment of first-order logic on finite words, where monadic predicates can only appear positively. We show that there is an FO-definable language that is monotone in monadic predicates but not definable in FO+. This provides a simple proof that Lyndon's preservation theorem fails on finite structures. We additionally show that given a regular language, it is undecidable whether it is definable in FO+.

Introduction

Preservation theorems in first-order logic (FO) establish a link between semantic and syntactic properties [AG97, Ros08]. We will be particularly interested here in Lyndon's theorem [Lyn59], which states that if a first-order formula is monotone in a predicate P (a semantic property), then it is equivalent to a formula that is positive in P (a syntactic property). As for other preservation theorems, this result may fail when restricting the class of structures considered. Whether Lyndon's theorem holds when restricted to finite structures was an open problem for 28 years. It was finally shown to fail on finite structures in [AG87], with a very difficult proof using a large array of techniques from different fields of mathematics, such as probability theory, topology, lattice theory, and analytic number theory. A simpler but still quite intricate proof of this fact was later given by [Sto95], using Ehrenfeucht-Fraïssé games on grid-like structures equipped with two binary predicates.

The goal of this paper is to further restrict the class of structures under consideration, by allowing only finite words. Our purpose is twofold:

• Find out whether Lyndon's theorem holds on finite words, and investigate the relation of this framework with the more general case of finite structures.

• From the point of view of language theory: study the natural fragment of FO+-definable languages; in particular, given a regular language, can we decide whether it is FO+-definable?

We will therefore work in this paper with the particular signature associated with finite words: one binary predicate (the total order), and a finite set of monadic predicates (encoding the alphabet). Recall that FO on words is a well-studied logic, defining a proper fragment of the regular languages. This fragment has many equivalent characterizations: definability by star-free expressions, aperiodic monoids, LTL formulas, etc. [DG08, Sch65, Kam68, MP71].

Contributions
We define a semantic notion of monotone languages on ordered alphabets: the language is required to be closed under replacing a letter by a bigger one. This generalizes the monotonicity condition on monadic predicates in the sense of Lyndon. The negation-free logic FO+ can only define monotone languages, and can be seen as a fragment of the standard FO logic on words.

We show that Lyndon's theorem fails on finite words, by building a regular language that is monotone and FO-definable, but not FO+-definable. This proof uses a variant of Ehrenfeucht-Fraïssé games that characterizes FO+-definability, introduced in [Sto95], and instantiated here on finite words. As a corollary, using suitable axiomatizations of finite words, we obtain the failure of Lyndon's theorem on finite structures, in a much simpler way than in [AG87, Sto95].

Finally, answering our second objective, we show that FO+-definability is undecidable for regular languages. This result is obtained using a reduction from the Turing Machine Mortality problem [Hoo66]. To our knowledge, this is the first example of a natural class of regular languages for which membership is undecidable.

Related works
Monotone complexity
Positive fragments of first-order logic play a prominent role in complexity theory. Indeed, an active research program consists in studying positive fragments of complexity classes. This includes for instance trying to lift equivalent characterizations of a class to their positive versions, or investigating whether a semantic and a syntactic definition of the positive variant of a class are equivalent. See [GS92] for an introduction to monotone complexity, and [LSS96, Ste94] for examples of characterizations of the positive versions of the classes P and NP, in particular through extensions of first-order logic. The aforementioned paper [AG87], which was the first to show the failure of Lyndon's theorem on finite structures, does so by reproving in particular an important result on monotone circuit complexity first proved in [FSS81]: Monotone-AC⁰ ≠ Monotone ∩ AC⁰.

Membership in subclasses of regular languages

A recurring question in language theory is to decide, given a regular language, whether it belongs to some fixed natural class of languages. (The concept of a "natural" class is of course a bit informal here; for instance, we can think of classes inductively defined via a syntax.) A famous instance: can a given regular language be defined in FO with at most k quantifier alternations? Recent works obtained decidability results for this question, but only for the first 3 levels of the quantifier alternation hierarchy [PZ19]. For higher levels, the problem remains open. Let us also mention the generalized star-height problem [PST89]: can a given regular language be defined by an extended regular expression (with complement allowed) with no nesting of Kleene stars? In this case it is not even known whether all regular languages can be defined in this way.

Quantitative extensions
First-order logic on words has been extended to quantitative settings, which naturally yields a negation-free syntax, because complementation becomes problematic in these settings. This is the case in the theory of regular cost functions [Col12, KVB12], and in other quantitative extensions concerned with boundedness properties, such as MSO+U [Boj04] or Magnitude MSO [Col13]. We hope that the present work can shed light on these extensions as well.
Notations and prerequisites

If i, j ∈ ℕ, we write [i, j] for the set {i, i + 1, ..., j}. Throughout the paper, A denotes a finite alphabet. The set of finite words over A is A*. The length of u ∈ A* is denoted |u|. We write dom(u) = [0, |u| − 1] for the set of positions of a word u. If u is a word and i ∈ dom(u), we write u[i] for the letter at position i, and u[..i] for the prefix of u up to position i. Similarly, u[i..j] is the infix of u from position i to j, and u[i..] is the suffix of u starting at position i.

We will assume that the reader is familiar with the notion of regular languages of finite words, and with some ways to define such languages: finite automata (DFA for deterministic and NFA for non-deterministic), finite monoids, and first-order logic. See e.g. [DG08] for an introduction to all the needed material.

In this paper we consider that the finite alphabet A is equipped with a partial order ≤_A. This partial order is naturally extended to words componentwise: a₁a₂...aₙ ≤_A b₁b₂...bₘ if n = m and for all i ∈ [1, n] we have aᵢ ≤_A bᵢ.

A special case that will be of interest here is when the alphabet is built as the powerset of a set P of predicates, i.e. A = P(P), and the order ≤_A is inclusion. We will call this a powerset alphabet. Taking A = P(P) is standard in settings such as verification and model theory, where several predicates can be considered independently of each other at each position.

Powerset alphabets constitute a particular case of ordered alphabets. The results obtained in this paper are valid for both the powerset case and the general case. Due to the nature of the results (existence of a counter-example and an undecidability result), it is enough to prove them in the particular case of powerset alphabets to cover both cases. Moreover, the powerset alphabet case allows us to directly establish a link with Lyndon's theorem, which is stated in the framework of model theory. For these reasons, we will keep the more general notion of ordered alphabet for generic definitions, but will prove our main results on powerset alphabets, in order to directly obtain the stronger version of these results.

We fix A a finite ordered alphabet.

Definition 1.
We say that a language L ⊆ A* is monotone if for all u ∈ L and v ≥_A u, we have v ∈ L.

Example 2.
Let A = {a, b} with a ≤_A b. Then A*bA* is monotone, but A*aA* is not monotone.

Definition 3.
Let L ⊆ A*. The monotone closure of L is the language L↑ = {v ∈ A* | ∃u ∈ L, u ≤_A v}. It is the smallest monotone language containing L. In particular, if a ∈ A, we write a↑ for the set {b ∈ A | a ≤_A b}.

Lemma 4.
Given an NFA 𝒜, we can compute in time O(|𝒜| · |A|) an NFA 𝒜↑ for the monotone closure of L(𝒜).

Proof. We build an NFA 𝒜↑ from 𝒜 by replacing every transition (p, a, q) of 𝒜 by the transitions (p, b, q) for all b ∈ a↑. It is straightforward to verify that 𝒜↑ is an NFA for L↑: any run of 𝒜↑ on some word v can be mapped to a run of 𝒜 on some u ≤_A v.

Theorem 5.
Given a regular language L ⊆ A*, it is decidable whether L is monotone. The problem is in P if L is given by a DFA, and Pspace-complete if L is given by an NFA.

Proof. Notice that if 𝒜 is an NFA, L(𝒜) is monotone if and only if L(𝒜↑) ⊆ L(𝒜). This shows that the problem is in Pspace in general, and that it is in P when 𝒜 is a DFA, since it then reduces to checking emptiness of the intersection of 𝒜↑ with the complement of 𝒜. We show that the general problem is Pspace-hard by reducing from NFA universality. Let 𝒜 be an NFA over an alphabet A. We build the ordered alphabet B = A ∪ {a, b}, where a, b ∉ A and a ≤_B b is the only non-trivial inequality in B. We build an NFA ℬ recognizing aA* + bL(𝒜), using standard NFA constructions. We have that L(𝒜) = A* if and only if L(ℬ) is monotone, thereby completing the Pspace-hardness reduction.
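For intuition, the closure construction of Lemma 4 and the monotonicity criterion from this proof (L(𝒜) is monotone iff L(𝒜↑) ⊆ L(𝒜)) can be sketched on a toy two-letter ordered alphabet. The NFA encoding below (sets of transition triples) is ours, and the inclusion is checked naively by enumerating short words rather than through the product construction, so this is an illustration rather than the polynomial-time procedure from the theorem.

```python
from itertools import product

# Toy sketch of Lemma 4 / Theorem 5. Order: a <= b.
UP = {"a": {"a", "b"}, "b": {"b"}}  # upward closure of each letter


def closure(delta):
    """Transitions of the closed NFA (Lemma 4): each transition
    (p, a, q) is replaced by (p, b, q) for every b >= a."""
    return {(p, b, q) for (p, a, q) in delta for b in UP[a]}


def accepts(delta, init, finals, word):
    """Standard NFA membership by subset propagation."""
    states = {init}
    for c in word:
        states = {q for p in states for (p2, a, q) in delta
                  if p2 == p and a == c}
    return bool(states & finals)


def monotone_up_to(delta, init, finals, max_len):
    """L(A) is monotone iff L(A^) is included in L(A); here we only
    test the inclusion on all words of length <= max_len."""
    up = closure(delta)
    return all(accepts(delta, init, finals, w)
               for n in range(max_len + 1)
               for w in map("".join, product("ab", repeat=n))
               if accepts(up, init, finals, w))


# Example 2: A* b A* is monotone, A* a A* is not.
sees_b = {(0, "a", 0), (0, "b", 1), (1, "a", 1), (1, "b", 1)}
sees_a = {(0, "b", 0), (0, "a", 1), (1, "a", 1), (1, "b", 1)}
print(monotone_up_to(sees_b, 0, {1}, 5))  # True
print(monotone_up_to(sees_a, 0, {1}, 5))  # False
```

The negative case fails because the closed automaton for A*aA* accepts the word b (obtained by raising a to b), which the original automaton rejects.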
The main idea of positive FO, which we denote FO+, is to guarantee via a syntactic restriction that it only defines monotone languages. Notice that since monotone languages are not closed under complement, we cannot allow negation in the syntax of FO+. This means we have to add dual versions of the classical operators of first-order logic. This naturally yields the following syntax for FO+:

ϕ, ψ := a↑(x) | x ≤ y | x < y | ϕ ∨ ψ | ϕ ∧ ψ | ∃x.ϕ | ∀x.ϕ

As usual, variables x, y, ... range over the positions of the input word. The semantics is the same as for classical FO on words, with the notable exception that a↑(x) is true if and only if x is labelled by some b ∈ a↑. Unlike in classical FO, it is not possible to require that a position is labelled by a letter a, except when a↑ = {a}. This is necessary to guarantee that only monotone languages can be defined.

Formal semantics of FO+

If ϕ is a formula with free variables FV(ϕ), its semantics is a set ⟦ϕ⟧ of pairs of the form (u, α), where u ∈ A* and α : FV(ϕ) → dom(u) is a valuation for the free variables. We write indifferently u, α ⊨ ϕ or (u, α) ∈ ⟦ϕ⟧ to signify that (u, α) is accepted by ϕ. If FV(ϕ) = ∅, we simply write u ⊨ ϕ instead of (u, ∅) ⊨ ϕ. In this case, the language recognized by ϕ is {u ∈ A* | u ⊨ ϕ}. We define ⟦ϕ⟧ by induction on ϕ:

• u, α ⊨ a↑(x) if u[α(x)] ≥_A a.
• u, α ⊨ x ≤ y if α(x) ≤ α(y).
• u, α ⊨ x < y if α(x) < α(y).
• ⟦ϕ ∨ ψ⟧ = ⟦ϕ⟧ ∪ ⟦ψ⟧.
• ⟦ϕ ∧ ψ⟧ = ⟦ϕ⟧ ∩ ⟦ψ⟧.
• u, α ⊨ ∃x.ϕ if there exists i ∈ dom(u) such that (u, α[x ↦ i]) ∈ ⟦ϕ⟧.
• u, α ⊨ ∀x.ϕ if for all i ∈ dom(u), we have (u, α[x ↦ i]) ∈ ⟦ϕ⟧.

Example 6.
On the alphabet A = {a, b, c} with a ≤_A b:

• ∀x.a↑(x) recognizes (a + b)*.
• ∃x.b↑(x) recognizes A*bA*.

Remark 7.
In the powerset alphabet framework where A = P(P), we can naturally view FO+ as the negation-free fragment of first-order logic, by having atomic predicates range directly over P instead of A = P(P). We can then drop the a↑ notation, as predicates from P are considered independently of each other. This way, p(x) is true if and only if the letter S ∈ A labelling x contains p. A letter predicate S↑(x) in the former syntax can then be expressed as ⋀_{p∈S} p(x), so FO+ based on predicates from P is indeed equivalent to FO+ based on A. We will adopt this convention when working on powerset alphabets.

Example 8.
Let A = P(P) with P = {a, b}. The formula ∃x, y. x ≤ y ∧ a(x) ∧ b(y) recognizes A*{a,b}A* + A*{a}A*{b}A*.

Expressive power of FO+

Lemma 9.
Assume the order on A is trivial, i.e. no two distinct letters are comparable. Then all languages are monotone, and any FO-definable language is FO+-definable.

Proof. The fact that all languages are monotone in this case follows from the fact that for two words u, v we have u ≤_A v if and only if u = v. If L is definable by an FO formula ϕ, we can build an FO+ formula ψ from ϕ by pushing negations to the leaves, using the usual rewritings such as ¬(ϕ ∧ ψ) = ¬ϕ ∨ ¬ψ and ¬(∃x.ϕ) = ∀x.¬ϕ. For every letter a ∈ A and variable x, we then replace all occurrences of ¬a(x) by ⋁_{b≠a} b(x).

Lemma 10.
The logic FO+ can only define monotone languages.

Proof. By induction on formulas; see Appendix A.1 for details.

It is natural to ask whether the converse of Lemma 10 holds: if a language is FO-definable and monotone, is it necessarily FO+-definable? This is the purpose of Section 4.

Definition 11.
The quantifier rank of a formula ϕ, written qr(ϕ), is its number of nested quantifiers. It can be defined by induction in the following way: if ϕ is atomic then qr(ϕ) = 0; otherwise, qr(ϕ ∧ ψ) = qr(ϕ ∨ ψ) = max(qr(ϕ), qr(ψ)) and qr(∃x.ϕ) = qr(∀x.ϕ) = qr(ϕ) + 1.

EF+ games

We will explain here how FO+-definability can be captured by an ordered variant of Ehrenfeucht-Fraïssé games, which we call EF+-games. This notion was defined in [Sto95] for general structures; we instantiate it here on words. We define the n-round EF+-game on two words u, v ∈ A*, written EF+_n(u, v). This game is played between two players, Spoiler and Duplicator.

If k ∈ ℕ, a k-position of the game is of the form (u, α, v, β), where α : [1, k] → dom(u) and β : [1, k] → dom(v) are valuations for k variables in u and v respectively. We can think of α and β as giving the positions of k previously placed tokens in u and v. A k-position (u, α, v, β) is valid if for all i ∈ [1, k] we have u[α(i)] ≤_A v[β(i)], and for all i, j ∈ [1, k], α(i) ≤ α(j) if and only if β(i) ≤ β(j). Notice the difference with usual EF-games: here we do not ask that tokens placed in the same round carry the same label, but that the label in u is ≤_A-smaller than the label in v. This feature is intended to capture FO+ instead of FO.

The game starts from the 0-position (u, ∅, v, ∅). At each round, starting from a k-position (u, α, v, β), the game is played as follows. If (u, α, v, β) is not valid, then Spoiler wins. Otherwise, if k = n, then Duplicator wins. Otherwise, Spoiler chooses a position in one of the two words and places pebble number k + 1 on it. Duplicator answers by placing pebble number k + 1 on a position of the other word. Let us call α′ and β′ the extensions of α and β with these new pebbles.
If (u, α′, v, β′) is not a valid (k + 1)-position, then Spoiler immediately wins the game; otherwise, the game moves to the next round from the (k + 1)-position (u, α′, v, β′). We write u ⪯_n v when Duplicator has a winning strategy in EF+_n(u, v).

Theorem 12 ([Sto95, Thm 2.4]). We have u ⪯_n v if and only if for every formula ϕ of FO+ with qr(ϕ) ≤ n, we have (u ⊨ ϕ) ⇒ (v ⊨ ϕ).

Since the proof of Theorem 12 does not appear in [Sto95], we give it in Appendix A.2 for completeness.
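For small instances, the game admits a direct brute-force solver, which can serve as a sanity check on the definitions above. The encoding below (words as Python strings over a two-letter ordered alphabet, the order given explicitly as the relation LEQ) is ours and not from the paper; the search simply tries every Spoiler move against every possible Duplicator answer.

```python
# Brute-force solver for the n-round EF+ game on words.
LEQ = {("a", "a"), ("b", "b"), ("a", "b")}  # order: a <= b


def duplicator_wins(u, v, n):
    def valid(al, be):
        # labels in u must be <=_A the matching labels in v,
        # and the relative order of tokens must agree
        return (all((u[i], v[j]) in LEQ for i, j in zip(al, be)) and
                all((al[i] <= al[j]) == (be[i] <= be[j])
                    for i in range(len(al)) for j in range(len(al))))

    def win(al, be, rounds):
        """Can Duplicator win from this position?"""
        if not valid(al, be):
            return False
        if rounds == 0:
            return True
        spoiler_in_u = all(any(win(al + (i,), be + (j,), rounds - 1)
                               for j in range(len(v)))
                           for i in range(len(u)))
        spoiler_in_v = all(any(win(al + (i,), be + (j,), rounds - 1)
                               for i in range(len(u)))
                           for j in range(len(v)))
        return spoiler_in_u and spoiler_in_v

    return win((), (), n)


print(duplicator_wins("ab", "bb", 3))  # True: mirror Spoiler's position
print(duplicator_wins("b", "a", 1))    # False: b has no image above it
print(duplicator_wins("b", "ab", 1))   # False
```

Consistent with Theorem 12, the last pair is separated by the quantifier-rank-1 formula ∀x.b↑(x), which holds on b but not on ab.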
Corollary 13.
A language L is not FO+-definable if and only if for all n ∈ ℕ, there exists (u, v) ∈ L × (A* \ L) such that u ⪯_n v.

Proof. ⇐: Let n ∈ ℕ; there exists (u, v) ∈ L × (A* \ L) such that u ⪯_n v. By Theorem 12, any formula of quantifier rank n accepting u must accept v, so no formula of quantifier rank n recognizes L. This is true for all n ∈ ℕ, so L is not FO+-definable.

⇒ (contrapositive): Assume there exists n ∈ ℕ such that for all (u, v) ∈ L × (A* \ L), u ⋠_n v. By Theorem 12, this means that for all (u, v) ∈ L × (A* \ L), there exists a formula ϕ_{u,v} of quantifier rank n accepting u but not v. Since there are finitely many FO+ formulas of rank n up to logical equivalence [Lib04, Lem 3.13], the set of formulas F = {ϕ_{u,v} | (u, v) ∈ L × (A* \ L)} can be chosen finite. We define ψ = ⋁_{u∈L} ⋀_{v∉L} ϕ_{u,v}, where the conjunctions and the disjunction are finite since F is finite. For all u ∈ L, u is accepted by ⋀_{v∉L} ϕ_{u,v}, hence by ψ; and conversely, a word accepted by ψ must be accepted by some ⋀_{v∉L} ϕ_{u,v}, so it cannot be outside L.

We will now answer the natural question posed in Section 3.2: is every FO-definable monotone language FO+-definable?

A counter-example language
Theorem 14.
There is an FO-definable monotone language K on a powerset alphabet that is not FO+-definable.

Let P = {a, b, c} and A = P(P), ordered by inclusion. We write (ab), (bc), (ca) for the letters {a, b}, {b, c}, {a, c} respectively, and ⊤ for {a, b, c}. If x ∈ P we will often write x instead of {x} to lighten notations. We now define the desired language by:

K := (a↑ b↑ c↑)* + A* ⊤ A*.

We claim that K satisfies the requirements of Theorem 14.

Lemma 15. K is monotone and FO-definable.

Proof. The fact that K is monotone is straightforward from its definition, as the union of two monotone languages. To show that K is FO-definable, we can simply use the classical characterizations of first-order definable languages [DG08], by verifying for instance that its 5-state minimal automaton is counter-free, or that its 21-element syntactic monoid is aperiodic; see Appendix A.3 for details and representations of these objects. In addition, it is useful to give an intuition of how an FO formula can describe the language K, as we will later build on this understanding in Section 5. We describe the behaviour of such a formula in Appendix A.4.

Lemma 16. K is not FO+-definable.

Proof. We establish this using Corollary 13. Let n ∈ ℕ, and N = 2ⁿ. We define u = (abc)^N and v = [(ab)(bc)(ca)]^{N−1}(ab)(bc). Notice that u ∈ K, and v ∉ K because |v| is not a multiple of 3 and v contains no occurrence of ⊤. It remains to show that u ⪯_n v to conclude. We give a strategy for Duplicator in EF+_n(u, v). The strategy is an adaptation of the classical strategy showing that (aa)* is not FO-definable [Lib04]. We consider that at the beginning, tokens first, last are placed on the first and last positions of u, and first′, last′ on the first and last positions of v.
The strategy of Duplicator during the game is then as follows: every time Spoiler places a token in one of the words, Duplicator answers in the other by replicating the closest distance (and direction) to an existing token. This strategy is illustrated in Figure 1.

[Figure 1: An example of Duplicator's strategy for n = 3, on u = (abc)^8 and v = [(ab)(bc)(ca)]^7(ab)(bc), with numbered moves of Spoiler and Duplicator.]

We have to show that this strategy allows Duplicator to play n rounds without losing the game. This proof is similar to the classical one for (aa)*; see e.g. [Lib04]. The main intuition is that the length of the non-matching intervals between u and v is at worst divided by 2 at each round, and it starts at 2ⁿ, so Duplicator can survive n rounds. A detailed proof can be found in Appendix A.5.

Lyndon's theorem on finite structures

In this section we consider first-order logic on arbitrary signatures and unconstrained structures.
Definition 17.
A formula ϕ is monotone in a predicate P if whenever a structure S is a model of ϕ, any structure S′ obtained from S by adding tuples to P is also a model of ϕ.

Example 18.
On graphs, where the only predicate is the edge predicate, the formula asking for the existence of a triangle is monotone, but the formula stating that the graph is not a clique is not monotone.
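Example 18 can be verified exhaustively on small graphs. The sketch below (graphs on 4 vertices represented as sets of edges) uses our own encoding: a property is monotone precisely when adding a single edge never destroys it.

```python
from itertools import combinations, product

V = range(4)
EDGES = list(combinations(V, 2))  # all 6 possible edges


def has_triangle(g):
    """Does the edge set g contain a triangle?"""
    return any({(x, y), (y, z), (x, z)} <= g
               for x, y, z in combinations(V, 3))


def not_clique(g):
    return len(g) < len(EDGES)


def monotone(prop):
    """prop is monotone iff adding one edge never destroys it,
    checked over every graph on 4 vertices."""
    for bits in product([0, 1], repeat=len(EDGES)):
        g = {e for e, b in zip(EDGES, bits) if b}
        if prop(g):
            for e in EDGES:
                if not prop(g | {e}):
                    return False
    return True


print(monotone(has_triangle))  # True
print(monotone(not_clique))    # False
```

The negative witness is any graph missing exactly one edge: it is not a clique, but adding that edge makes it one.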
Definition 19.
A formula ϕ is positive in P if it never uses P under a negation.

Let us recall the statement of Lyndon's theorem, which holds on general (possibly infinite) structures:

Theorem 20 ([Lyn59, Cor 2.1]). If ψ is an FO formula monotone in predicates P₁, ..., Pₙ, then it is equivalent to a formula positive in predicates P₁, ..., Pₙ.

We will now see explicitly how the language K from Section 4.1 can be used to show that Lyndon's theorem fails on finite structures. The failure of this theorem on finite structures was first shown in [AG87] with a very difficult proof, then reproved in [Sto95] with a simpler one, using the Ehrenfeucht-Fraïssé technique. Still, the proof from [Sto95] is quite involved compared to the one we present here.

Several monotone predicates
We will use here the fact that if P = {a, b, c} is a set of monadic predicates, then a finite model over the signature (≤, a, b, c) where the order ≤ is total is simply a finite word over the powerset alphabet A = P(P). Therefore, in order to view our words as general finite structures, it suffices to axiomatize the fact that ≤ is a total order. This can be done with the formula ψ_tot = (∀x, y. x ≤ y ∨ y ≤ x) ∧ (∀x, y, z. x ≤ y ∧ y ≤ z ⇒ x ≤ z) ∧ (∀x, y. x ≤ y ∧ y ≤ x ⇒ x = y) ∧ (∀x. x ≤ x). Notice that ψ_tot is not monotone in the predicate ≤.

If we allow monotonicity in several predicates, then we directly obtain the failure of Lyndon's theorem on finite structures. Indeed, let ϕ be the FO formula defining K, obtained in Lemma 15, and let ψ = ϕ ∧ ψ_tot. Then ψ is monotone in the predicates a, b, c, and the finite structures satisfying ψ are exactly the words of K. However, as we proved in Theorem 14, no first-order formula that is positive in the predicates a, b, c can define the same class of structures.

Single monotone predicate
Other formulations of Lyndon's theorem use a single monotone predicate, as in [Sto95]. We can encode the language K in this framework, by using one binary predicate A to represent all letter predicates. Notice that apart from the empty word, all words of K have at least 3 positions. The first three positions are denoted 0, 1, 2. Let ψ be an FO formula stating that there are at least 3 elements 0, 1, 2, and that for all y ∉ {0, 1, 2} and for all x, A(x, y) holds. We build ϕ′ from the formula ϕ recognizing the language K by replacing every occurrence of a(x) (resp. b(x), c(x)) by A(x, 0) (resp. A(x, 1), A(x, 2)). We consider ψ′ = ψ_tot ∧ ψ ∧ ϕ′, which accepts exactly the non-empty words of K. No formula positive in A can recognize this class of structures, since otherwise we could obtain from it an FO+ formula contradicting Theorem 14. This is done by replacing every occurrence of A(x, y) by (a(x) ∧ y = 0) ∨ (b(x) ∧ y = 1) ∨ (c(x) ∧ y = 2) ∨ y ≥ 3.

Closure under surjective homomorphisms
Lyndon's theorem is also often stated in the following way: if an FO formula defines a class of structures closed under surjective homomorphisms, then it is equivalent to a positive formula. This formulation is equivalent to saying that the formula is monotone in all predicates. We can deal with this framework as well, by incorporating a predicate ≰ into the signature, and replacing the formula ψ obtained above by ψ″ = (∃x, y. x ≤ y ∧ x ≰ y) ∨ (ψ ∧ ∀x, y. (x ≤ y ∨ x ≰ y)). This way, the monotonicity constraints on ≤ and ≰ become trivial: the only structures of interest, where ψ″ is not trivially true or false, are those where ≰ is indeed the negation of ≤. We can therefore axiomatize in ψ_tot the fact that ≤ is a total order, using ≤ and ≰ freely.

Undecidability of FO+-definability

This section is dedicated to the proof of the following theorem:
Theorem 21.
The following problem is undecidable: given a regular language L on a powerset alphabet, is L FO+-definable?

We start by describing the problem we will reduce from, called Turing Machine (TM) Mortality. The TM Mortality problem asks, given a deterministic TM M, whether there exists a bound n ∈ ℕ such that from any finite configuration (state of the machine, position on the tape, and content of the tape), the machine halts in at most n steps. We say that M is mortal if such an n exists.

Theorem 22 ([Hoo66]). The TM Mortality problem is undecidable.

Remark 23. The standard mortality problem as formulated in [Hoo66] does not ask for a uniform bound on the halting time, and allows for infinite configurations, but it is well known that the two formulations are equivalent via a compactness argument. Indeed, if for all n ∈ ℕ the TM has a run of length at least n from some configuration Cₙ, then we can find a configuration C that is a limit of a subsequence of (Cₙ)_{n∈ℕ}, so that M has an infinite run from C.

Notice that the initial and final states of M play no role here, so we will omit them in the description of M. Indeed, we can assume that M halts whenever there is no transition from the current configuration. Let M = (Γ, Q, ∆) be a deterministic TM, where Γ is the alphabet of M, Q its set of states, and ∆ ⊆ Q × Γ × Q × Γ × {←, →} its (deterministic) transition table. We want to build a regular language L such that L is FO+-definable if and only if M is mortal.

We will also assume without loss of generality that Q is partitioned into Q₁, Q₂, Q₃, and that all possible successors of a state in Q₁ (resp. Q₂, Q₃) are in Q₂ (resp. Q₃, Q₁). We will say that p has type i if p ∈ Q_i. The successor type of 1 (resp. 2, 3) is 2 (resp. 3, 1).

Our goal is now to start from an instance M of TM Mortality, and define a regular language L such that L is FO+-definable if and only if M is mortal.

The language L_base

The base alphabet
We first define a base alphabet, which will be used to encode configurations of the TM M:

A_base = Γ ∪ (∆ × Γ) ∪ (Γ × ∆) ∪ (∆ × Γ × ∆) ∪ (Q × Γ) ∪ { }.

We write a_δ (resp. a_{δ′}, a_{δ′δ}) for the letters from ∆ × Γ (resp. Γ × ∆, ∆ × Γ × ∆), and [q.a] for the letters of Q × Γ. The letter [q.a] is used to encode the position of the reading head, q ∈ Q being the current state of the machine and a ∈ Γ the letter it is reading. A letter a_δ will be used to encode a position of the tape that the reading head just left, via a transition δ writing an a at this position. A letter a_{δ′} will be used for a position of the tape containing a that the reading head is about to enter via a transition δ′. We use a_{δ′δ} if both are simultaneously true, i.e. the reading head is coming back to the position it just visited. Finally, the letter

Configuration words
The encoding of a configuration of M is therefore a word of the form (for example) a₀a₁...(a_{i−1})_{δ′}[q.a_i](a_{i+1})_δ...a_n. The letter (a_{i+1})_δ indicates that the reading head came from the right via a transition δ = (·, ·, q, a_{i+1}, ←), where · is a placeholder for an unknown element. The letter (a_{i−1})_{δ′} indicates that the head will go in the next step to the left via a transition δ′ = (q, a_i, ·, ·, ←).

A word u ∈ (A_base)* is a configuration word if it encodes a configuration of M with no incoherence. More formally, u is a configuration word if u contains exactly one letter from Q × Γ (the reading head), and either one a_δ and one b_{δ′} located on either side of the head, or just one letter a_{δ′δ} adjacent to the head. Moreover, the labels δ and δ′ both have to be coherent with the current content of the tape.

Remark 24.
Because we ask these predecessor and successor labellings to be present, configuration words only encode TM configurations that have both a predecessor and a successor configuration.

The type of a configuration word is simply the type in {1, 2, 3} of the unique state it contains. Let us call C ⊆ (A_base)* the language of configuration words. The language C is partitioned into C₁, C₂, C₃ according to the type of the configuration word. It is straightforward to verify that each C_i is an FO-definable language.

We can now define the language L_base. The basic idea is that we want L_base to be (C₁C₂C₃)*, but in order to avoid unnecessary bookkeeping later in the proof, we do not want to care about the endpoints being C₁ and C₃; we only require that a factor from C₁ appears at least once. This gives for L_base the more complicated expression:

(ε + C₃ + C₂C₃)(C₁C₂C₃)*(C₁ + C₁C₂ + C₁C₂C₃).

Notice that L_base cannot verify that the sequence is an actual run of M, since it only controls that the immediate neighbourhood of the reading head is valid, and that the types succeed each other according to the 1-2-3 cycle. The rest of the tape can be arbitrarily changed from one configuration word to the next.

The ambiguous alphabet A_amb

We now define another alphabet A_amb (amb for ambiguous), consisting of some unordered pairs of letters from A_base.
An unordered pair {a, b} is in A_amb if a can be replaced by b in the encodings of two successive configurations of M. Thus, let A_amb be the following set of unordered pairs (we write the "predecessor" element first, to facilitate reading):

• {a_δ, a}, for a ∈ Γ, δ ∈ ∆
• {a, a_{δ′}}, for a ∈ Γ, δ′ ∈ ∆
• {a_{δ′}, [q.a]}, for δ′ = (·, ·, q, ·, ·) ∈ ∆
• {a_{δ′δ}, [q.a]}, for δ′ = (p, ·, q, ·, d) ∈ ∆, δ = (q, a, p, ·, −d) ∈ ∆
• {[p.a], b_δ}, for δ = (p, a, ·, b, ·) ∈ ∆
• {[p.a], b_{δ′δ}}, for δ = (p, a, q, b, d) ∈ ∆, δ′ = (q, ·, ·, ·, −d) ∈ ∆

Notice that every letter of A_amb has a well-defined "predecessor" element: even the possible ambiguity regarding letters a_{δ′δ} is resolved thanks to the type constraint on transitions of M. For readability, we will write each pair with the predecessor element first.

We can now define the alphabet A = A_base ∪ A_amb, partially ordered by a <_A b if a ∈ A_base, b ∈ A_amb, and a ∈ b. For now we use the general formalism of ordered alphabets for simplicity. We will later describe in Remark 42 how the construction is easily modified to fit into the powerset alphabet framework.

Lemma 25. If u₁, u₂ ∈ C encode two successive configurations of the same length, then there exists v ∈ A* such that u₁ ≤_A v and u₂ ≤_A v.

Proof. It suffices to take each letter of v to be the pair of the corresponding letters of u₁ and u₂ whenever these letters differ. For instance, if u₁ = a a b_{δ′} [p.a] c_δ c and u₂ = a a_{δ″} [q.b] d_{δ′} c c, then v = a {a, a_{δ″}} {b_{δ′}, [q.b]} {[p.a], d_{δ′}} {c_δ, c} c.

Lemma 26.
Let u₁, u₂ ∈ C, and v ∈ A* such that u₁ ≤_A v and u₂ ≤_A v. Then either u₁ = u₂, or one is the successor configuration of the other.

Proof. The alphabet A_amb is defined precisely to enforce this. Let [p.a] and [q.b] be the reading heads in u₁ and u₂. If p and q have the same type, then the a_δ, a_{δ′} extra labellings and the definition of A_amb are not compatible with the reading head changing position, so the heads must appear as the same letter [p.a] at the same position. In this case, this forces u₁ = u₂, as any difference would result in an incompatibility with the definition of C or A_amb. If p and q do not have the same type, then one of them, say p, is the predecessor in the 1-2-3 cycle order. Then the next transition δ′ labelling a letter adjacent to the reading head in u₁ must yield state q, and the previous transition δ next to [q.b] must be equal to δ′, in order to avoid violating the local constraints imposed by C and A_amb. Therefore, locally there is a valid transition between u₁ and u₂. On positions away from the reading heads of u₁ and u₂ (labelled or not), the alphabet A ensures that the letters from Γ are the same in both words.

Lemma 27.
It is impossible to have three distinct words u_1, u_2, u_3 ∈ C and v ∈ A* such that for all i ∈ {1, 2, 3}, u_i ≤_A v.

Proof. By Lemma 26, any pair from {u_1, u_2, u_3} must encode two consecutive configurations of M. However, since the reading head must move at each step, from u_1 to u_2 and from u_2 to u_3, this means the reading head moves either 0 or 2 positions between u_1 and u_3, which yields a contradiction.

We finally define L to be the monotone closure of L_base on alphabet A, so that L can contain letters from A_amb.

Lemma 28. L is FO-definable and monotone.

Proof. The language L is monotone by construction. The fact that L is FO-definable can be obtained by combining the fact that C_1, C_2, C_3 are all FO-definable, together with the fact that the language K from Section 4.1 is FO-definable as well, by Lemma 15. We also need Lemma 27 to guarantee that the equivalent of the letter ⊤ from K never appears.

The goal of this section is to prove that L is FO+-definable if and only if M is mortal, using Corollary 13. The idea is that runs of M will allow us to build instances of EF+ games for L, with longer runs of M corresponding to Duplicator winning more rounds. Conversely, we will show that if M is mortal, then Spoiler wins any EF+ game in a fixed number of rounds.

4.6.1 M not mortal ⟹ L not FO+-definable

Let n ∈ ℕ; we aim to build u ∈ L and v ∉ L such that u ≼_n v. There is a configuration from which M has a run of length N + 3, with N = 2^{n+1} + 1. Let u = u_0 u_1 … u_N be an encoding of this run where each u_i ∈ C, and where we omitted the first and last configurations of the run, which may not be representable in C by Remark 24. Here all the u_i's are of the same length K, the size of the tape needed for this run. By Lemma 25, for each i ∈ [0, N−1] there exists v_i ∈ A* such that u_i ≤_A v_i and u_{i+1} ≤_A v_i. We build v = u_0 v_1 … v_{N−2} u_N.
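The letterwise merge used in Lemma 25 to build each covering word v_i can be sketched in a few lines of Python. This is an illustrative encoding only, not the paper's formal alphabet: base letters are modelled as strings, a letter of A_amb covering two base letters as a frozenset, and the names `merge` and `below` are ours.

```python
def merge(u1, u2):
    """Build a word v with u1 <=_A v and u2 <=_A v, as in Lemma 25:
    take the union of the letters of u1 and u2 wherever they differ."""
    assert len(u1) == len(u2), "the two configuration words must have equal length"
    return [a if a == b else frozenset({a, b}) for a, b in zip(u1, u2)]

def below(u, v):
    """Letterwise order <=_A: a base letter a is below an ambiguous
    letter s exactly when a is one of the two letters covered by s."""
    return len(u) == len(v) and all(
        a == c or (isinstance(c, frozenset) and a in c) for a, c in zip(u, v)
    )

# a toy pair of successive configuration words, in the spirit of Lemma 25
u1 = ["a", "a", "b'", "[p.a]", "c_d", "c"]
u2 = ["a", "a''", "[q.b]", "d'", "c", "c"]
v = merge(u1, u2)
assert below(u1, v) and below(u2, v)
```

Lemma 27 then says that no third distinct configuration word can sit below such a v.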
Notice that v ∉ L: the types of u_0 and u_N force them to be separated by a number of configuration blocks congruent to N−1 modulo 3, but in v they are separated by only N−2 blocks. It remains to show that u ≼_n v. It is a simple adaptation of the proof of Lemma 16, so we will just sketch the idea.

Let us consider that initially, there is a pair of initial (resp. final) tokens at the beginning (resp. end) of u, v. We will consider that the initial tokens are "blue", and the final ones are "yellow". In the following, a pair of corresponding tokens in u, v will be blue (resp. yellow) if they are at the same distance to the beginning (resp. end) of the word. When Spoiler plays a token in some u_i (resp. v_i), Duplicator will look at the color of the closest token in u (resp. v), and answer with a token of the same color, i.e. by playing in v_i (resp. u_i) for blue, and in v_{i−1} (resp. u_{i+1}) for yellow. The invariant is that after k rounds, the gap between blue and yellow tokens (counted in blocks of u or v) is at least 2^{n−k}. This invariant guarantees that Duplicator wins the n-round game, since this gap will never be empty.

4.6.2 M mortal ⟹ L FO+-definable

Let M be a mortal TM, and n be the length of a maximal run of M, starting from any configuration. We will show that L is FO+-definable, by giving a strategy for Spoiler in EF+_{f(n)}(u, v) for any u ∈ L and v ∉ L, where the number of rounds f(n) depends only on n, and not on u, v.

Let us start with some auxiliary definitions on configuration words. If u ∈ C is a configuration word, let us define its height h(u) to be the length of the run starting in u, and not going outside of the tape specified in u. If u ∈ C, let us also define its n-approximation α_n(u) as the maximal word in (A_base)^{≤n} · (Q × Γ) · (A_base)^{≤n} that is an infix of u. I.e., we remove the letters whose distance to the reading head is bigger than n. Here are a few properties of the height:

Lemma 29.
• for all u ∈ C, we have 0 < h(u) < n.
• for all u ∈ C and x, y ∈ Γ*, we have h(xuy) ≥ h(u).
• for all u ∈ C, we have h(u) = h(α_n(u)).
• if v ∈ C is the successor configuration of u ∈ C, then h(v) = h(u) − 1.

Proof. The first item is a consequence of the fact that M is mortal with bound n, and moreover we ask that all words from C have a predecessor configuration and a successor one (Remark 24). The second item comes from the fact that the run of length h(u) starting in u is still possible when adding a context x, y, which is not affected by this run. The third item uses the fact that a run can only visit the n-approximation of u, so the context outside of α_n(u) does not affect the height h(u). The fourth item is a basic consequence of the definition of the height.

Corollary 30.
The height of a configuration word u is an FO+-definable property, i.e. for all k ∈ ℕ there exists an FO+ formula h_k such that h_k accepts a configuration word u ∈ C if and only if h(u) = k.

Proof. From Lemma 29, the formula h_k can simply use a lookup table to verify that α_n(u) is of height k. Using FO+ instead of FO is not a restriction when we assume the input to be in (A_base)*. When evaluated on A*, the formula h_k will accept the monotone closure of configuration words of height k.

Remark 31.
We use here the fact that computation is done locally around the reading head to obtain Corollary 30. This seems to make Turing machines more suited to this reduction than e.g. cellular automata, where computation is done in parallel on the whole tape.

Thanks to the height abstraction, we will show that we can focus on playing a special kind of abstracted EF-game.

The integer game
Let Σ_base = [0, n] and Σ_amb = { \binom{i}{i−1} | 1 ≤ i ≤ n }. Let Σ = Σ_base ∪ Σ_amb, ordered by i ≤_Σ \binom{i}{i−1} and i−1 ≤_Σ \binom{i}{i−1} for all \binom{i}{i−1} ∈ Σ_amb.

We define the n-integer game as follows: it is played on an arena (u, v) with u ∈ (Σ_base)* and v ∈ (Σ_amb)*. If we note i (resp. j) the first (resp. last) letter of u, then the first (resp. last) letter of v is \binom{i}{i−1} (resp. \binom{j+1}{j}).

The rest of the rules are very close to those of EF+(u, v): in each round, Spoiler plays a token in u or v, and Duplicator has to answer with a token in the other word, while maintaining the order between tokens, and the constraint that the label of a token in u is ≤_Σ-smaller than the label of its counterpart in v. We add an additional neighbouring constraint for Duplicator: consecutive tokens in one word must be related to consecutive tokens in the other, and in this case, if two tokens of v are in consecutive positions labelled \binom{i}{i−1}\binom{j}{j−1}, the corresponding tokens in u must be labelled either i, j or i−1, j−1. A mix i, j−1 or i−1, j is not allowed.

Figure 2: A position of the integer game.
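The constraints of the n-integer game can be made concrete with a short Python sketch. The encoding is ours, for illustration only: a letter \binom{i}{i−1} of Σ_amb is modelled as the pair (i, i−1), and we check a full letterwise matching of u against v rather than an interactive play.

```python
def valid_match(u_labels, v_labels):
    """Check the integer-game constraints on a fully matched pair:
    u_labels[k] is the label of the u-token matched to the v-token
    labelled v_labels[k] (a pair (i, i-1))."""
    # label constraint: a u-label t below (i, i-1) means t in {i, i-1}
    for t, (i, lo) in zip(u_labels, v_labels):
        if t not in (i, lo):
            return False
    # neighbouring constraint: facing (i,i-1)(j,j-1), the u-labels must
    # be i, j or i-1, j-1 -- a mix i, j-1 or i-1, j is not allowed
    for k in range(len(u_labels) - 1):
        i, j = v_labels[k][0], v_labels[k + 1][0]
        if (u_labels[k], u_labels[k + 1]) not in ((i, j), (i - 1, j - 1)):
            return False
    return True

assert valid_match([2, 3], [(2, 1), (3, 2)])      # both choices "up"
assert valid_match([1, 2], [(2, 1), (3, 2)])      # both choices "down"
assert not valid_match([2, 2], [(2, 1), (3, 2)])  # forbidden mix i, j-1
```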
Lemma 32.
For all n ∈ ℕ, Spoiler can win any n-integer game in 2n rounds.

Proof. We proceed by induction on n. For n = 1, the constraints on the game force u ∈ {0, 1}* and v ∈ \binom{1}{0}*. We can have Spoiler play on the last occurrence of 1 in u, and on the successor position labelled 0. Duplicator cannot respond to these two moves while respecting the neighbouring constraint, so Spoiler wins in 2 moves.

Assume now that for some n ≥ 1, Spoiler wins any n-integer game in 2n moves, and consider an (n+1)-integer game arena (u, v). If the letters n+1 and \binom{n+1}{n} do not appear in u, v respectively, then Spoiler can win in 2n moves by induction hypothesis.

If the letter n+1 does not appear in u, then let y be the first position labelled \binom{n+1}{n} in v. By definition of the integer game, y cannot be the first position of v, otherwise u would start with n+1. We choose position y in v for the first move of Spoiler; let x be the position in u answered by Duplicator; we have u[x] = n. We can assume that x is not the first position of u, otherwise Spoiler can win in the next move. If Spoiler were to play x−1 in u, with u[x−1] = i, by the neighbouring constraint Duplicator would be forced to answer y−1 in v, with label \binom{i+1}{i}. This shows that the words u[..x−1] and v[..y−1] form a correct n-integer arena, as the integer n+1 is not present anymore, and all other constraints are respected. Therefore, Spoiler can win by playing 2n moves in these prefixes. This gives a total of 2n + 1 moves in the original (n+1)-integer game.

Finally, if the letter n+1 does appear in u, Spoiler starts by playing the position x in u corresponding to the last occurrence of n+1 in u. Duplicator must answer a position y labelled \binom{n+1}{n}. As before, using the neighbouring constraint, we know that if i = u[x+1], then v[y+1] = \binom{i}{i−1}. Therefore, the words u[x+1..] and v[y+1..] form an (n+1)-integer game arena, and moreover the letter n+1 does not appear in u[x+1..] (by choice of x). Using the previous case, we know that Spoiler can win from there in 2n + 1 moves, playing only on u[x+1..] and v[y+1..]. This gives a total of 2n + 2 moves in the original (n+1)-integer game, thereby completing the induction proof.

Remark 33.
Lemma 32 still holds if the definition of the n-integer game is generalized to include the symmetric case where, if we note i, j the first and last letters of u respectively, v starts with \binom{i+1}{i} and ends with \binom{j}{j−1}. Indeed, it suffices to consider the mirror images of u and v to show that Spoiler wins in the same number of rounds.

From the integer game to the original EF+-game.

Let us now show how we can use this integer game to describe a strategy for Spoiler in the original EF+-game. Let u ∈ L and v ∉ L, and recall that n is the length of a maximal run of the Turing machine M. We will show that Spoiler wins EF+_{f(n)}(u, v), for some f(n) depending solely on n.

Without loss of generality we can assume that u ∈ L_base. This is because there exists u' ∈ L_base with u' ≤_A u, and we can consider the pair (u', v) instead of (u, v). Indeed, if Spoiler wins on (u', v), then the same strategy is winning on (u, v), since the winning condition is only easier for him in (u, v). Thus we can write u = u_0 u_1 … u_N, where each u_i is in C. Let us also write v = v_0 v_1 … v_T, where each v_i is a block of the corresponding decomposition of v. We will describe a strategy for Spoiler in EF+(u, v) that is winning in a number f(n) of rounds only depending on n.

We will reuse ideas from the explicit formula from Lemma 15, developed in Appendix A.4. Let us call local factor a factor of the form v_i v_{i+1}. A local factor is forbidden if it is not a factor of any word in L. If v contains a forbidden local factor, Spoiler can win in a constant number of moves (at most 5), by pointing the problematic positions in this local factor, which Duplicator will not be able to replicate in u. We therefore assume from now on that v does not contain a forbidden local factor.

Definition 34.
A factor v_i of v is compatible with type j ∈ {1, 2, 3} if there exists u' ∈ C_j with u' ≤_A v_i. The set-type of v_i is { j | v_i is compatible with j }.

By Lemma 27, each v_i is compatible with at most 2 distinct types in {1, 2, 3}. If v_i is compatible with 2 types, then one is the predecessor (resp. successor) of the other in the 1-2-3 cycle order, and we call it the first type (resp. second type) of v_i. We will consider that v_0 (resp. v_T) is only compatible with type(u_0) (resp. type(u_N)). Indeed, if Duplicator matches v_0 to a word u_i with i ≠ 0, Spoiler can win the game in the next round, by choosing a position in u_0 (and the same argument applies for v_T).

Definition 35.
A factor of the form v_i v_{i+1} … v_j of v is called ambiguous if each v_k is compatible with two types, and the set-types succeed each other in the cycle order {1,2} → {2,3} → {3,1} → {1,2}. For instance if the set-type of v_i is {1,2}, then v_{i+1} must have set-type {2,3}, etc. An ambiguous factor is maximal if it is not contained in a strictly larger ambiguous factor.

Definition 36.
A factor v_i of v is called an anchor if either i = 0, i = T, or v_{i−1} v_i v_{i+1} is not ambiguous. If v_i is an anchor, we can uniquely define its anchor type. It is simply its type if i = 0 or i = T, and otherwise, since v_{i−1} v_i v_{i+1} is not ambiguous, we define the anchor type of v_i to be the only possible type for v_i that does not create an incoherence with its two neighbours.

Example 37.
Assume v_1 has set-type {2, 3}, v_2 has set-type {3, 1}, and v_3 has set-type {2, 3}. Then v_2 is an anchor, and its anchor type is 1. The type 3 is indeed impossible for v_2, since its successor type 1 is not in the set-type of v_3.

Notice that if Duplicator maps an anchor v_i to a word u_j such that type(u_j) is not the anchor type of v_i, then Spoiler can win in at most 5 moves, by pointing to a contradiction with the immediate neighbourhood of v_i.

Definition 38.
A maximal ambiguous factor v_i v_{i+1} … v_j is coherent if v_{i−1} v_i … v_j v_{j+1} ∈ L, and this is witnessed by the anchor types of v_{i−1} and v_{j+1}. In other words, v_i v_{i+1} … v_j is coherent if the anchor types at the extremities are either both compatible with the first type of v_i and v_j respectively, or both compatible with the second type. Here "compatible" is taken in the sense of the 1-2-3 cycle order.

Example 39.
Let w = v_i v_{i+1} … v_j be a maximal ambiguous factor, where v_i has set-type {1, 2} and v_j has set-type {2, 3}. Assume v_{i−1} has anchor type 1, so it is compatible with the second type of v_i. This means that for w to be coherent, we need v_{j+1} to have anchor type 1, in order to be compatible with the second type of v_j as well.

Lemma 40. v contains a maximal ambiguous factor w that is not coherent.

Proof. Assume that all maximal ambiguous factors of v are coherent. Since v does not contain forbidden local factors, we have that the anchor types of two consecutive anchors follow the 1-2-3 order. This means that the anchor types, together with the coherence of maximal ambiguous factors, witness that v ∈ L. Since we know that v ∉ L, this is a contradiction.

We are now ready to describe the strategy of Spoiler. Spoiler will place a token at the beginning of w, and a token at the end of w. Because w is not coherent, Duplicator is forced to answer with the first type for one of these tokens, and with the second type for the other: otherwise Spoiler immediately wins by exposing the incoherence with the anchors delimiting w.

Spoiler can now play only between these existing tokens, and import the strategy from the integer game, by abstracting each word u_i by its height, and each word v_j ≥_A u', u'' (where u', u'' ∈ C, h(u') = 1 + h(u'')) by \binom{h(u')}{h(u'')}. The factors of u and v delimited by these two tokens then form an arena of the integer game, possibly in the generalized sense of Remark 33.

Lemma 41.
If Duplicator does not comply with the rules of the integer game, then Spoiler can punish this in at most log n rounds.

Proof. If u_i is matched to v_j, and their n-approximations do not match, this can be punished by Spoiler using log n rounds (with a dichotomy strategy, or n rounds with a naive strategy). This means that by Lemma 29, Spoiler can enforce the basic rule of the integer game, stating that if integer t is matched to \binom{s+1}{s}, then t = s + 1 or t = s. Using the correspondence between EF+-games and FO+-definability, this property can also be seen via Corollary 30.

If neighbours are matched with non-neighbours, then it suffices for Spoiler to point out the violation. It remains to treat the neighbouring constraint: assume u_i u_{i+1} is matched to v_j v_{j+1}, and type(u_i) is the first (resp. second) type of v_j while type(u_{i+1}) is the second (resp. first) type of v_{j+1}. By definition of L, type(u_{i+1}) must be the successor type of type(u_i); for instance, without loss of generality, type(u_i) = 1 and type(u_{i+1}) = 2. Then, the set-type of v_j is {1, 2} (resp. {3, 1}) and the set-type of v_{j+1} is {1, 2} (resp. {2, 3}). This contradicts the fact that v_j v_{j+1} is part of an ambiguous factor, as set-types should follow each other in the order {1,2} → {2,3} → {3,1}.

Combining these arguments with Lemma 32, we obtain that following this strategy, Spoiler will win in at most f(n) = 2 + 2n + log n + 5 rounds, by punishing Duplicator as soon as Duplicator loses the n-integer game. Using Corollary 13, we obtain that L is FO+-definable, with a formula of quantifier rank at most f(n). This concludes the proof of Theorem 21.

Remark 42.
The alphabet A can be turned into a powerset alphabet, by adding all subsets of A_base absent from A_amb, rejecting any word containing ∅ but no new non-empty subset, and accepting any word containing a new non-empty subset. This shows that this undecidability result still holds in the special case of powerset alphabets.

Conclusion

We believe this paper gives an example of fruitful interaction between automata theory and model theory. Indeed, a classical result of model theory, the failure of Lyndon's theorem on finite structures, has been greatly simplified by considering regular languages. Conversely, this question coming from model theory, when considered on regular languages, yields the first (to our knowledge) natural fragment of regular languages with undecidable membership problem. We hope that the tools developed in this paper can be further used in both fields, and that this will encourage more interactions of this form in the future. In the short term, we are interested in extending these techniques to the framework of cost functions, see [Kup14, Kup], and to other extensions of regular languages.
Acknowledgements.
I am grateful to Thomas Colcombet for bringing this topic to my attention, and in particular for asking the question FO+ =? monotone FO, as well as for many interesting exchanges. Thanks also to Amina Doumane and Sam Van Gool for helpful discussions, and to Anupam Das and Natacha Portier for comments on earlier versions of this document.
References

[AG87] Miklós Ajtai and Yuri Gurevich. Monotone versus positive.
J. ACM, 34(4):1004–1015, October 1987. [AG97] Natasha Alechina and Yuri Gurevich.
Syntax vs. semantics on finite structures, pages 14–33. Springer Berlin Heidelberg, Berlin, Heidelberg, 1997. [Boj04] Mikołaj Bojańczyk. A bounding quantifier. In Jerzy Marcinkowski and Andrzej Tarlecki, editors,
Computer Science Logic, pages 41–55, Berlin, Heidelberg, 2004. Springer Berlin Heidelberg. [Col11] Thomas Colcombet. Green's relations and their use in automata theory. In
Language and Automata Theory and Applications, 5th International Conference, LATA 2011, Tarragona, Spain, May 26–31, 2011. Proceedings, volume 6638 of
Lecture Notes in Computer Science, pages 1–21. Springer, 2011. [Col12] Thomas Colcombet. Regular cost functions, part I: Logic and algebra over words. Volume 9, December 2012. [Col13] Thomas Colcombet. Magnitude monadic logic over words and the use of relative internal set theory. In , pages 123–123, 2013. [DG08] Volker Diekert and Paul Gastin. First-order definable languages. In
Logic and Automata: History and Perspectives, Texts in Logic and Games, pages 261–306. Amsterdam University Press, 2008. [FSS81] M. Furst, J. B. Saxe, and M. Sipser. Parity, circuits, and the polynomial-time hierarchy. In , pages 260–270, 1981. [GS92] Michelangelo Grigni and Michael Sipser. Monotone complexity. In
Proceedings of the London Mathematical Society Symposium on Boolean Function Complexity, pages 57–75, USA, 1992. Cambridge University Press. [Hoo66] Philip K. Hooper. The undecidability of the Turing machine immortality problem.
Journal of Symbolic Logic, 31(2):219–234, 1966. [Kam68] Hans W. Kamp.
Tense Logic and the Theory of Linear Order. PhD thesis, University of California, Los Angeles, 1968. [Kup] Denis Kuperberg. Erratum for [Kup14]. http://perso.ens-lyon.fr/denis.kuperberg/papers/Erratum.pdf. [Kup14] Denis Kuperberg. Linear temporal logic for regular cost functions.
Logical Methods in Computer Science, 10(1), 2014. [KVB12] Denis Kuperberg and Michael Vanden Boom. On the expressive power of cost logics over infinite words. In
Automata, Languages, and Programming, pages 287–298, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg. [Lib04] Leonid Libkin.
Elements of Finite Model Theory. Springer, August 2004. [LSS96] C. Lautemann, T. Schwentick, and I. A. Stewart. On positive P. In , page 162. IEEE Computer Society, 1996. [Lyn59] Roger C. Lyndon. Properties preserved under homomorphism.
Pacific J. Math., 9(1):143–154, 1959. [MP71] Robert McNaughton and Seymour A. Papert.
Counter-Free Automata (M.I.T. Research Monograph No. 65). The MIT Press, 1971. [PST89] Jean-Éric Pin, Howard Straubing, and Denis Thérien. New results on the generalized star-height problem. Volume 349, pages 458–467, February 1989. [PZ19] Thomas Place and Marc Zeitoun. Going higher in first-order quantifier alternation hierarchies on words.
J. ACM, 66(2), March 2019. [Ros08] Benjamin Rossman. Homomorphism preservation theorems.
J. ACM, 55, July 2008. [Sch65] M. P. Schützenberger. On finite monoids having only trivial subgroups.
Information and Control, 8(2):190–194, 1965. [Ste94] Iain A. Stewart. Logical description of monotone NP problems.
Journal of Logic and Computation, 4(4):337–357, 1994. [Sto95] Alexei P. Stolboushkin. Finitely monotone properties. In
LICS, SanDiego, California, USA, June 26-29, 1995 , pages 324–330. IEEE Com-puter Society, 1995. 22
Appendix
A.1 Proof of Lemma 10
We prove here that any language definable by FO+ is monotone. This is done by induction on the FO+ formula ϕ, where the induction property is strengthened to include possible free variables: for all (u, α) ∈ ⟦ϕ⟧ and v ≥_A u, we have (v, α) ∈ ⟦ϕ⟧.

Base cases: Let (u, α) ∈ ⟦a↑(x)⟧ and v ≥_A u. We have v[α(x)] ≥_A u[α(x)] ≥_A a, so (v, α) ∈ ⟦a↑(x)⟧. Let (u, α) ∈ ⟦x ≤ y⟧ and v ≥_A u. We have α(x) ≤ α(y), so (v, α) ∈ ⟦x ≤ y⟧. The argument for < instead of ≤ is identical.

Induction cases: Let (u, α) ∈ ⟦ϕ ∨ ψ⟧ and v ≥_A u. We have (u, α) ∈ ⟦ϕ⟧ or (u, α) ∈ ⟦ψ⟧. Therefore, by induction hypothesis, (v, α) ∈ ⟦ϕ⟧ or (v, α) ∈ ⟦ψ⟧, hence (v, α) ∈ ⟦ϕ ∨ ψ⟧. The argument for ϕ ∧ ψ is identical. Let (u, α) ∈ ⟦∃x.ϕ⟧ and v ≥_A u. There exists i ∈ dom(u) such that (u, α[x ↦ i]) ∈ ⟦ϕ⟧. By induction hypothesis, (v, α[x ↦ i]) ∈ ⟦ϕ⟧. Hence, (v, α) ∈ ⟦∃x.ϕ⟧. The argument for ∀ is identical.

A.2 Proof of Theorem 12
The proof is an adaptation of the classical proof of correctness of EF-games, see e.g. [Lib04]. Since FO+ is a fragment of FO, we can directly use the following lemma:

Lemma 43 ([Lib04, Lem 3.13]). Let n, k ∈ ℕ. Up to logical equivalence, there are finitely many formulas of quantifier rank at most n using k free variables.

We will now show a strengthening of Theorem 12, where free variables are incorporated in the statement:
Theorem 44.
Let n, k ∈ ℕ, u, v ∈ A*, and let α : [1, k] → dom(u) and β : [1, k] → dom(v) be valuations for k variables x_1, …, x_k in u, v respectively. Then Duplicator wins EF+_n(u, α, v, β) if and only if for any FO+ formula ϕ with qr(ϕ) ≤ n using k free variables x_1, …, x_k, we have u, α ⊨ ϕ ⟹ v, β ⊨ ϕ.

Proof. We prove this by induction on n.

Base case n = 0: Notice that quantifier-free formulas of FO+ are just positive boolean combinations of atomic formulas, that either compare the values of the free variables, or assert that the label of a free variable is ≤_A-greater than some letter a ∈ A. Consider that there is a quantifier-free formula ϕ with k free variables accepting u, α but rejecting v, β. This happens if and only if there is a variable x_i such that u[α(x_i)] ≰_A v[β(x_i)], or if two variables x_i, x_j are not in the same order according to α and β. That is, this happens if and only if (u, α, v, β) is not a valid k-position, i.e. if and only if Spoiler wins the 0-round game EF+_0(u, α, v, β).

Induction case: Assume there is an FO+ formula ϕ with qr(ϕ) ≤ n, accepting u, α but not v, β. The formula ϕ is a positive combination of atomic formulas, formulas of the form ∃x.ψ, and formulas of the form ∀x.ψ. Therefore, one of these formulas accepts u, α but not v, β. If it is an atomic formula, then Spoiler immediately wins EF+_n(u, α, v, β) as in the base case.

If it is a formula of the form ∃x.ψ, then Spoiler can use the following strategy: pick a position p witnessing that the formula is true for u, α, and play the position p in u. Duplicator will answer a position p' in v, and the game will move to (u, α', v, β'), where α' = α[x ↦ p] and β' = β[x ↦ p']. Since the formula ψ has quantifier rank at most n−1, and accepts u, α' but not v, β', by induction hypothesis Spoiler can win in the remaining n−1 rounds.

If it is a formula of the form ∀x.ψ, then Spoiler can do the following: pick a position p' witnessing that the formula is false for v, β, and play the position p' in v. Duplicator will answer a position p in u, and the game will move to (u, α', v, β'), where α' = α[x ↦ p] and β' = β[x ↦ p']. Since the formula ψ has quantifier rank at most n−1, and accepts u, α' but not v, β', by induction hypothesis Spoiler can win in the remaining n−1 rounds.

Conversely, assume now that any FO+ formula of quantifier rank at most n accepting u, α must accept v, β, and let us give a strategy for Duplicator in EF+_n(u, α, v, β).

Suppose Spoiler places pebble x at position p in u. Let α' = α[x ↦ p]. By Lemma 43, up to logical equivalence, there is only a finite set F of FO+ formulas of rank at most n−1 with k+1 free variables accepting u, α'. Let ψ = ⋀_{ϕ∈F} ϕ. Then u, α is accepted by the formula ∃x.ψ of rank n (as witnessed by p), so by assumption we also have v, β ⊨ ∃x.ψ. This means there is a p' ∈ dom(v) such that v, β' ⊨ ψ, where β' = β[x ↦ p']. Duplicator can answer position p' in v, and by induction hypothesis he will win the remainder of the game, since every formula of F accepts v, β'.

Suppose now that Spoiler places pebble x at position p' in v. Let β' = β[x ↦ p']. Let F be the finite set of formulas (up to equivalence) of quantifier rank at most n−1 with k+1 free variables that reject v, β'. Let ψ = ⋁_{ϕ∈F} ϕ, and ψ' = ∀x.ψ. By construction, x = p' witnesses that ψ' does not accept v, β. Our assumption implies that it does not accept u, α either. So there is p ∈ dom(u) such that u, α' ⊭ ψ, where α' = α[x ↦ p]. Duplicator can answer position p in u. If a formula ϕ of rank at most n−1 is true in u, α', then by construction it cannot appear in F, therefore it is also true in v, β'. By induction hypothesis, Duplicator wins the remaining (n−1)-round game (u, α', v, β').
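The base case above reduces to checking that (u, α, v, β) is a valid k-position. This check can be sketched in Python; the encoding is ours for illustration: letters of a powerset alphabet are frozensets, and the order ≤_A is set inclusion.

```python
def valid_position(u, v, alpha, beta):
    """A k-position (u, alpha, v, beta) is valid iff the two valuations
    are order-isomorphic and each pointed letter of u is <=_A the
    corresponding pointed letter of v (here: frozenset inclusion)."""
    k = len(alpha)
    assert len(beta) == k
    # the order (and equalities) between tokens must be preserved
    for i in range(k):
        for j in range(k):
            if (alpha[i] < alpha[j]) != (beta[i] < beta[j]):
                return False
    # monotone letter constraint: u[alpha(x_i)] <=_A v[beta(x_i)]
    return all(u[alpha[i]] <= v[beta[i]] for i in range(k))

a, ab = frozenset("a"), frozenset("ab")
assert valid_position([a, a], [ab, ab], [0, 1], [0, 1])      # Duplicator survives round 0
assert not valid_position([a, a], [ab, ab], [0, 1], [1, 0])  # order not preserved
assert not valid_position([ab], [a], [0], [0])               # {a,b} is not below {a}
```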
A.3 Automaton and monoid for the language K

Recall that K = (a↑ b↑ c↑)* + A* ⊤ A*, with A = P({a, b, c}). We show here that K is FO-definable, using the characterizations of [DG08] on both the minimal automaton and the syntactic monoid.

Minimal automaton

The minimal deterministic finite automaton (DFA) 𝒜 recognizing K is depicted in Figure 3. We note ¬a = {∅, {b}, {c}, {b, c}} the sub-alphabet of A of letters not containing a, and similarly for ¬b and ¬c. The edges going to the rejecting state ⊥ are grayed and dashed, and the ones going to the accepting sink state q_⊤ are grayed, for readability. We also note a' = a↑ \ {⊤} = {{a}, \binom{a}{b}, \binom{c}{a}}, and similarly for b', c'.

Figure 3: The minimal DFA 𝒜 of K

To show that K is FO-definable, it suffices to show that 𝒜 is counter-free, i.e. that there is no word u ∈ A* such that there are two distinct states p, q of 𝒜 and k ∈ ℕ with p →^u q and q →^{u^k} p. Assume such a u exists. Since the only non-trivial strongly connected component in 𝒜 is {q_a, q_b, q_c}, these states are the only candidates for p, q. Since p, q are distinct, it means |u| is not a multiple of 3, and u induces a 3-cycle: either q_a →^u q_b →^u q_c →^u q_a if |u| ≡ 1 mod 3, or q_a →^u q_c →^u q_b →^u q_a if |u| ≡ 2 mod 3. In both cases, the word u can be read from all states of {q_a, q_b, q_c} while staying in this component; in particular its first letter can be read from all three states while staying in this component. Such a letter does not exist, so we reach a contradiction. The DFA 𝒜 is counter-free, so K is FO-definable [DG08].

Syntactic monoid
It is also instructive to see what the syntactic monoid of K looks like, in particular to get a first intuition on how an FO formula can be defined for K. We depict this monoid M in Figure 4, using the eggbox representation based on Green's relations: boxes are J-classes, lines are R-classes, columns are L-classes, and cells are H-classes. See [Col11] for an introduction to Green's relations and the eggbox representation. The syntactic morphism h : A* → M is easily inferred, as the elements of the monoid in h(A) are directly named after the letter mapping to them.

Figure 4: The syntactic monoid M of K

The accepting part of M is F = {1, \binom{a}{b}\binom{b}{c}\binom{c}{a}, \binom{c}{a}\binom{a}{b}\binom{b}{c}, abc, ⊤}. To show that K is FO-definable, it suffices to verify that M is aperiodic, which is directly visible on Figure 4, as all H-classes are singletons (see [Col11]).

A.4 An explicit FO formula for the language K

Recall that K = (a↑ b↑ c↑)* + A* ⊤ A*. We describe here the behaviour of a formula witnessing that K is FO-definable. The A* ⊤ A* part of K is just there to rule out words containing ⊤ by accepting them, which can be done by a formula ∃x.⊤(x). So we just need to design a formula ϕ for K' = (a↑ b↑ c↑)*, assuming the letter ⊤ does not appear; the final formula will then be ϕ ∨ ∃x.
(cid:62) ( x ).We will call forbidden pattern any word that is not an infix of a word in K (cid:48) .Let us call anchor a position x such that either x is labelled by a singleton, or x is labelled by (cid:0) ab (cid:1) (resp. (cid:0) bc (cid:1) , (cid:0) ca (cid:1) ) with x + 1 labelled by a letter different from (cid:0) bc (cid:1) (resp. (cid:0) ca (cid:1) , (cid:0) ab (cid:1) ). The idea is that if x is an anchor position of u ∈ K (cid:48) , then thereis only one possibility for the value of x mod 3. If the first position is labelledby a letter from a ↑ , we will consider that it is an anchor labelled a , otherwisewe will reject the input word. Similarly, the last position is either a c anchor orcauses immediate rejection of the word. If x, y are successive anchor positions(i.e. with no other anchor positions between them), the word u [ x + 1 ..y − (cid:0) ab (cid:1)(cid:0) bc (cid:1)(cid:0) ca (cid:1) ) ∗ . We say that an anchor x goes right-up (resp. right-down ) if we can replace the letter (cid:0) αβ (cid:1) by α (resp. β ) at position x + 1 without having a forbidden pattern in the immediate neighbourhood of x . Notice that x can not go both right-up and right-down. We define in thesame way the left-up and left-down property by replacing x + 1 with x −
For instance, consider u = [ab][bc][ca][ab][bc] c [ab][bc][ca][ab][bc][bc][ca][ab][bc]. Apart from the first and last positions, there are two anchors: x = 5, labelled c, and y = 10, labelled [bc] (because it is followed by another [bc]).

[Figure 5: A visualization of anchors.]

The anchor x goes left-up and right-up, while the anchor y goes left-up and right-down. If d ∈ {up, down} is a direction, we say that two successive anchors x < y agree on d if x goes right-d and y goes left-d. We say that x and y agree if they agree on some d.

Now, the formula ϕ will express the following properties:

• for all consecutive anchors x, x+1, the letters at positions x, x+1, x+2 do not form a forbidden pattern (omitting x+2 if x+1 is the last position);

• all non-consecutive successive anchors agree.

For instance, the formula accepts the word u above, as the anchors 0 and x agree on up, x and y agree on up, and y and the last position agree on down.

It is routine to verify that these properties can be expressed in FO, and that they indeed characterize the language K′.

A.5 Detailed proof of Lemma 16
We show here that the strategy of Duplicator defined in the proof of Theorem 14 of Section 4.1 indeed guarantees that Duplicator wins EF+_n(u, v). We will generally write p, p′ for related tokens, p being the position in u and p′ the position in v.

The proof works by showing that the following invariant holds after i rounds where Duplicator did not lose: if tokens in positions p < q in u are related to tokens p′ < q′ in v, and u[p..q] ≰_A v[p′..q′], then, writing d = q−p and d′ = q′−p′, we have d = d′+1 and d ≥ 2^{n−i}. In other words, if we call wrong interval a factor u[p..q] or v[p′..q′] such that u[p..q] ≰_A v[p′..q′], the invariant states that after i rounds, the length of the smallest wrong interval in u is at least 2^{n−i}, and corresponding wrong intervals differ by 1, the one in u being longer. Before the first round, this invariant is true, as the only tokens are at the endpoints of u and v, and we have |u| = |v|+1 and |u| ≥ 2^n.

Now, assume the invariant true at round i, and consider round i+1. When Spoiler plays a token in one of the words, two cases can happen. If it is played between previous tokens p, q (resp. p′, q′) such that u[p..q] ≤_A v[p′..q′], then Duplicator simply answers with the corresponding position in the other word, and the smallest wrong interval is not affected. If on the contrary the new token is played in a minimal wrong interval, say at position r in u[p..q], then Duplicator answers by preserving the smaller of the distances r−p and q−r. For instance, if r−p < q−r, Duplicator answers r′ = p′ + (r−p).
We can notice that, by definition of the words u and v, and since u[p] ≤ v[p′] by the rules of the game, we have u[p..r] ≤_A v[p′..r′], and in particular u[r] ≤_A v[r′], so the move of Duplicator is legal. Moreover, since q−r > r−p, we have q−r ≥ (q−p)/2, so using the induction hypothesis, q−r ≥ 2^{n−(i+1)}. Moreover, since we had (q−p) = (q′−p′)+1, we now have (q−r) = (q−p)−(r−p) = (q′−p′)+1−(r′−p′) = (q′−r′)+1, so the invariant is preserved. The case where r−p ≥ q−r is symmetrical. If on the other hand Spoiler plays in v a position r′ in a wrong interval v[p′..q′], then min(r′−p′, q′−r′) will be strictly smaller than 2^{n−(i+1)}, and will be replicated by the answer r of Duplicator in u[p..q]. This means that the new smallest wrong interval created in u will have length at least 2^{n−(i+1)}+1 ≥ 2^{n−(i+1)}, so the invariant is preserved in this case as well.
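The arithmetic behind the invariant can be checked mechanically: whichever position Spoiler plays inside a wrong interval, the side that remains wrong keeps at least half of the original length, so starting from length 2^n Duplicator survives n rounds. A small sanity-check sketch of this halving bound (mine, not code from the paper):

```python
# Spoiler plays position r strictly inside a wrong interval of length d;
# Duplicator keeps the shorter distance, so the side that remains wrong
# has length max(r, d - r). Spoiler's best move minimizes that quantity.
def best_remaining(d):
    return min(max(r, d - r) for r in range(1, d))

n = 10
d = 2 ** n
for i in range(n):
    d = best_remaining(d)
    assert d >= 2 ** (n - i - 1)  # the invariant's lower bound 2^{n-(i+1)} holds
# after n rounds the bound 2^0 = 1 is still met, so Duplicator survives n rounds
```

This only models the interval lengths, not the labels; the letter-comparison part of the argument is handled by the ≤_A conditions in the proof above.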