Nondeterministic Syntactic Complexity
Robert S. R. Myers, Stefan Milius⋆, and Henning Urbat⋆⋆
Friedrich-Alexander-Universität Erlangen-Nürnberg
Abstract
We introduce a new measure on regular languages: their nondeterministic syntactic complexity. It is the least degree of any extension of the ‘canonical boolean representation’ of the syntactic monoid. Equivalently, it is the least number of states of any subatomic nondeterministic acceptor. It turns out that essentially all previous structural work on nondeterministic state-minimality computes this measure. Our approach rests on an algebraic interpretation of nondeterministic finite automata as deterministic finite automata endowed with semilattice structure. Crucially, the latter form a self-dual category.
⋆ Supported by Deutsche Forschungsgemeinschaft (DFG) under projects MI 717/5-2 and MI 717/7-1, and as part of the Research and Training Group 2475 “Cybercrime and Forensic Computing” (393541319/GRK2475/1-2019).
⋆⋆ Supported by Deutsche Forschungsgemeinschaft (DFG) under proj. SCHR 1118/8-2.

Regular languages admit a plethora of equivalent representations: finite automata, finite monoids, regular expressions, formulas of monadic second-order logic, and numerous others. In many cases, the most succinct representation is given by a nondeterministic finite automaton (nfa). Therefore, the investigation of state-minimal nfas is of both computational and mathematical interest. However, this turns out to be surprisingly intricate; in fact, the task of minimizing an nfa, or even of deciding whether a given nfa is minimal, is known to be PSPACE-complete [23]. One intuitive reason is that minimal nfas lack structure: a language may have many non-isomorphic minimal nondeterministic acceptors, and there are no clearly identified and easily verifiable mathematical properties distinguishing them from non-minimal ones. As a consequence, all known algorithms for nfa minimization (and related problems such as inclusion or universality testing) require some form of exhaustive search [9, 11, 26]. This sharply contrasts with the situation for minimal deterministic finite automata (dfa): they can be characterized by a universal property making them unique up to isomorphism, which immediately leads to efficient minimization.

In the present paper, we work towards the goal of bringing more structure into the theory of nondeterministic state-minimality. To this end, we propose a novel algebraic perspective on nfas resting on boolean representations of monoids, i.e. morphisms M → JSL(S, S) from a monoid M into the endomorphism monoid of a finite join-semilattice S. Our focus lies on quotient monoids of the free monoid Σ∗ recognizing a given regular language L ⊆ Σ∗.
The largest such monoid is Σ∗ itself, while the smallest one is the syntactic monoid syn(L). For both of them, L induces a canonical boolean representation

Σ∗ → JSL(SLD(L), SLD(L)) and syn(L) → JSL(SLD(L), SLD(L))

on the semilattice SLD(L) of all finite unions of left derivatives of L. The first representation gives rise to an algebraic characterization of minimal nfas:

Theorem.
The size of a state-minimal nfa for L equals the least degree of any extension of the canonical representation of Σ∗ induced by L.

Here, the degree of a representation refers to the number of join-irreducibles of the underlying semilattice. In the light of this result, it is natural to ask for an analogous automata-theoretic perspective on the canonical representation of syn(L) and its extensions. For this purpose, we introduce the class of subatomic nfas, a generalization of atomic nfas earlier introduced by Brzozowski and Tamm [6]. In order to get a handle on them, we employ an algebraic framework that interprets nfas in terms of JSL-dfas, i.e. deterministic finite automata in the category of semilattices. In this setting, the semilattice SLD(L) used in the canonical representations naturally arises as the minimal JSL-dfa for the language L. We shall demonstrate that much of the structure theory of (sub-)atomic nfas reduces to the observation that the category of JSL-dfas is self-dual. Our main result gives an algebraic characterization of minimal subatomic nfas:

Theorem. The size of a state-minimal subatomic nfa for L equals the least degree of any extension of the canonical representation of syn(L).

We call the measure suggested by the above theorem the nondeterministic syntactic complexity of the language L. It turns out to be extremely natural: as illustrated in Section 5, essentially all existing work on the structure of state-minimal nfas implicitly identifies classes of languages whose nondeterministic state complexity equals their nondeterministic syntactic complexity, and thus is actually concerned with computing minimal subatomic acceptors.

We start by introducing some notation and terminology used in the paper.
Semilattices. A (join-)semilattice is a poset (S, ≤_S) in which every finite subset X ⊆ S has a least upper bound, a.k.a. join, denoted by ⋁X. A morphism of semilattices is a map preserving all finite joins. Let JSL denote the category of join-semilattices and their morphisms. An element j of a semilattice S is join-irreducible if for all finite subsets X ⊆ S with j = ⋁X one has j ∈ X. Let J(S) = { j ∈ S : j is join-irreducible }. Let 2 = {0, 1} denote the two-element semilattice with 0 ≤ 1. Since 2 ≅ (P(1), ⊆) is the free semilattice on a single generator, morphisms from 2 into a semilattice S correspond uniquely to elements of S. Similarly, a morphism f : S → 2 corresponds to a prime filter F = f⁻¹[1] ⊆ S, i.e. an upwards closed subset such that ⋁X ∈ F implies X ∩ F ≠ ∅ for every finite subset X ⊆ S. If S is finite, prime filters are precisely the sets F = { s ∈ S : s ≰ s_0 } for s_0 ∈ S. If S is a subsemilattice of a semilattice T, every prime filter F of S can be extended to the prime filter T \ (↓(S \ F)) of T, where ↓X = { t ∈ T : t ≤ x for some x ∈ X } denotes the down-closure of a subset X ⊆ T. Equivalently, every morphism f : S → 2 can be extended to a morphism g : T → 2. In category-theoretic terminology, this means that the semilattice 2 forms an injective object of JSL.
The category
JSL_f of finite semilattices is self-dual [25]. The equivalence functor JSL_f ≃ JSL_f^op sends a semilattice S to its dual semilattice S^op obtained by reversing the order, and a morphism f : S → T to the morphism f∗ : T^op → S^op mapping t ∈ T to the ≤_S-largest element s ∈ S with f(s) ≤_T t. Note that f is adjoint to f∗: for s ∈ S and t ∈ T we have f(s) ≤_T t iff s ≤_S f∗(t).

Languages. A language is a subset L of Σ∗, the set of finite words over an alphabet Σ. We let L̄ = Σ∗ \ L denote the complement and L^r = { w^r : w ∈ L } the reverse, where w^r = a_n … a_1 for w = a_1 … a_n. The left derivatives, right derivatives and two-sided derivatives of L are, respectively, given by u⁻¹L = { w ∈ Σ∗ : uw ∈ L }, Lv⁻¹ = { w ∈ Σ∗ : wv ∈ L } and u⁻¹Lv⁻¹ = { w ∈ Σ∗ : uwv ∈ L } for u, v ∈ Σ∗. More generally, for U ⊆ Σ∗ the language U⁻¹L = ⋃_{u∈U} u⁻¹L is called the left quotient of L w.r.t. U. We define the following sets of languages generated by L:
– LD(L) = { u⁻¹L : u ∈ Σ∗ }, the set of all left derivatives of L;
– SLD(L), its closure under finite union;
– BLD(L), its closure under all set-theoretic boolean operations;
– BLRD(L), its closure under all boolean operations and right derivatives.
In other words, SLD(L) is the ∪-semilattice of all left quotients of L, or equivalently, the ∪-subsemilattice of P(Σ∗) generated by all left derivatives. Moreover, BLD(L) and BLRD(L) form the boolean subalgebras of P(Σ∗) generated by all left derivatives and all two-sided derivatives, respectively.

In this section, we set up the algebraic framework in which nondeterministic automata can be studied. Since it involves considering several different types of automata, it is convenient to view them all as instances of a general categorical concept. For the rest of this paper, let Σ denote a fixed finite input alphabet.

Definition 3.1.
Let C be a category and let X, Y ∈ C be two fixed objects. An automaton in C is a quadruple (S, δ, i, f) consisting of an object S ∈ C of states, a family δ = (δ_a : S → S)_{a∈Σ} of morphisms representing transitions, and two morphisms i : X → S and f : S → Y representing initial and final states. A morphism between automata (S, δ, i, f) and (S′, δ′, i′, f′) is given by a morphism h : S → S′ in C preserving transitions, initial states and final states, i.e. satisfying

h ∘ i = i′, h ∘ δ_a = δ′_a ∘ h and f′ ∘ h = f for all a ∈ Σ.

Let Aut(C) denote the category of automata in C and their morphisms.

Notation 3.2. We put δ_w := δ_{a_n} ∘ ··· ∘ δ_{a_1} for w = a_1 … a_n in Σ∗.

Example 3.3. (1) An automaton D = (S, δ, i, f) in
Set, the category of sets and functions, with X = 1 and Y = 2, is precisely a classical deterministic automaton. It is called a dfa if S is finite. We identify the map i : 1 → S with an initial state s_0 = i(∗) ∈ S, and the map f : S → 2 with the set F = f⁻¹[1] ⊆ S of final states. The language L(D, s) accepted by a state s ∈ S is the set of all words w ∈ Σ∗ such that δ_w(s) ∈ F. The language L(D) accepted by D is the language accepted by the state s_0.
(2) An automaton N = (S, δ, i, f) in Rel, the category of sets and relations, with X = Y = 1, is precisely a classical nondeterministic automaton. It is called an nfa if S is finite. We identify i ⊆ 1 × S with a set I ⊆ S of initial states and f ⊆ S × 1 with a set F ⊆ S of final states. Thus, in our view an nfa may have multiple initial states. The language L(N, R) accepted by a subset R ⊆ S consists of all w ∈ Σ∗ such that (r, s) ∈ δ_w for some r ∈ R and s ∈ F. The language L(N) accepted by N is the language accepted by the set I.
(3) An automaton A = (S, δ, i, f) in JSL with X = Y = 2, shortly a JSL-automaton, is given by a semilattice S of states, a family δ = (δ_a : S → S)_{a∈Σ} of semilattice morphisms specifying transitions, an initial state s_0 ∈ S (corresponding to i : 2 → S), and a prime filter F ⊆ S of final states (corresponding to f : S → 2). It is called a JSL-dfa if S is finite. The language accepted by a state s ∈ S or by the automaton A, resp., is defined as for deterministic automata.

Remark 3.4 (JSL-dfas vs. nfas).
Dfas, nfas and JSL-dfas are expressively equivalent; they all accept precisely the regular languages. The interest of JSL-dfas is that they constitute an algebraic representation of nfas:
(1) Every JSL-dfa A = (S, δ, s_0, F) induces an equivalent nfa J(A) on the set J(S) of join-irreducibles of S. Given s, t ∈ J(S) and a ∈ Σ, there is a transition s −a→ t in J(A) iff t ≤ δ_a(s); the initial states are those s ∈ J(S) with s ≤ s_0, and the final states form the set J(S) ∩ F.
(2) Conversely, for every nfa N = (Q, δ, I, F), the subset construction yields an equivalent JSL-dfa P(N) with states P(Q) (the ∪-semilattice of subsets of Q), transitions Pδ_a : P(Q) → P(Q), X ↦ δ_a[X], initial state I ∈ P(Q), and final states those subsets of Q containing some state from F. Note that J(P(Q)) ≅ Q.
It follows that the task of finding a state-minimal nfa for a given language is equivalent to finding a JSL-dfa with a minimum number of join-irreducibles [4]. This idea has recently been extended to a general coalgebraic framework [32, 39].

Recall that the minimal dfa [7] for a regular language L, denoted by dfa(L), has states LD(L) (the set of left derivatives of L), transitions K −a→ a⁻¹K for K ∈ LD(L) and a ∈ Σ, initial state L = ε⁻¹L, and final states those K ∈ LD(L) containing ε. Up to isomorphism, it can be characterized as the unique dfa accepting L that is reachable (i.e. every state is reachable from the initial state via transitions) and simple (i.e. any two distinct states accept distinct languages). We now develop the analogous concepts for JSL-automata; they are instances of the categorical theory of minimality due to Arbib and Manes [3] and Goguen [15]. Let us first observe that every language has two canonical infinite
JSL-acceptors:

Definition 3.5. Let L ⊆ Σ∗ be a language.
(1) The initial JSL-automaton Init(L) for L has states P_f(Σ∗) (the ∪-semilattice of finite subsets of Σ∗), initial state {ε}, final states all X ∈ P_f(Σ∗) with X ∩ L ≠ ∅, and transitions X ↦ Xa = { xa : x ∈ X } for X ∈ P_f(Σ∗) and a ∈ Σ.
(2) The final JSL-automaton Fin(L) for L has states P(Σ∗) (the ∪-semilattice of all languages), initial state L, final states all languages K containing ε, and transitions K ↦ a⁻¹K for K ∈ P(Σ∗) and a ∈ Σ.
As suggested by the terminology, these automata form the initial and the final object in the category of JSL-automata accepting L:

Lemma 3.6 [3, 15]. For every JSL-automaton A = (S, δ, s_0, F) accepting the language L ⊆ Σ∗, there exist unique JSL-automata morphisms e_A : Init(L) → A and m_A : A → Fin(L). The map e_A sends {w_1, …, w_n} ∈ P_f(Σ∗) to the state ⋁_{i=1}^n δ_{w_i}(s_0), and the map m_A sends a state s ∈ S to L(A, s), the language accepted by s.

Definition 3.7. A JSL-automaton A = (S, δ, s_0, F) is called
(1) reachable if the unique morphism e_A : Init(L) → A is surjective, i.e. every state is of the form ⋁_{i=1}^n δ_{w_i}(s_0) for some w_1, …, w_n ∈ Σ∗;
(2) simple if the unique morphism m_A : A → Fin(L) is injective, i.e. any two distinct states accept distinct languages;
(3) minimal if it is both reachable and simple.

Remark 3.8. (1) The category
Aut(JSL) has a factorization system given by surjective and injective morphisms. Thus, for every JSL-automata morphism h : (S, δ, i, f) → (S′, δ′, i′, f′) with image factorization h = m ∘ e in JSL, where e : S ↠ S″ is surjective and m : S″ ↣ S′ is injective, there exists a unique JSL-automaton structure (S″, δ″, i″, f″) on S″ making both e and m automata morphisms. We call e the coimage and m the image of h. Subautomata and quotient automata of JSL-automata are represented by injective and surjective morphisms, respectively.
(2) Every JSL-automaton A has a unique reachable subautomaton reach(A) ↣ A, the reachable part of A. It is the smallest subautomaton of A and arises as the image of the unique morphism e_A : Init(L) → A. Thus, A is reachable iff A ≅ reach(A) iff A has no proper subautomaton. Let us emphasize that a state in reach(A) is not necessarily reachable when A is viewed as an ordinary dfa. For distinction, we thus call a state JSL-reachable if it lies in reach(A), and dfa-reachable if it is reachable in the usual sense.
(3) Dually, every JSL-automaton A has a unique simple quotient automaton A ↠ simple(A), the simplification of A. It is the smallest quotient automaton of A and arises as the coimage of the unique morphism m_A : A → Fin(L). Thus, A is simple iff A ≅ simple(A) iff A has no proper quotient automaton.
(4) Every language L ⊆ Σ∗ has a minimal JSL-automaton, unique up to isomorphism. It can be constructed as the image of the unique automata morphism h_L : Init(L) → Fin(L). Since h_L sends {w_1, …, w_n} ∈ P_f(Σ∗) to the language ⋃_{i=1}^n w_i⁻¹L, the minimal automaton of L is the subautomaton SLD(L) of Fin(L) carried by the semilattice of finite unions of left derivatives of L.

Example 3.9.
The minimal JSL-dfa accepting L = {a, aa} is shown below, with the dashed lines representing the partial order.

[Figure: the JSL-dfa with the five states {ε, a}⁻¹L, a⁻¹L, L, (aa)⁻¹L and ∅, together with their a-transitions and the partial order.]

Remark 3.10.
The self-duality of JSL_f lifts to a self-duality of the category of JSL-dfas. The equivalence functor Aut(JSL_f) ≃ Aut(JSL_f)^op maps a JSL-dfa A = (S, (δ_a : S → S)_{a∈Σ}, i : 2 → S, f : S → 2) to its dual automaton

A^op = (S^op, (δ_a∗ : S^op → S^op)_{a∈Σ}, f∗ : 2 → S^op, i∗ : S^op → 2),

using that 2^op ≅ 2. Thus, the initial state of A^op is the ≤_S-largest non-final state of A, and its final states are those s ∈ S with s_0 ≰_S s. Given s, t ∈ S and a ∈ Σ, there is a transition s −a→ t in A^op iff t is the ≤_S-largest state with δ_a(t) ≤_S s.

The dualization of JSL-dfas can be seen as an algebraic generalization of the reversal operation on nfas. Recall that the reverse of an nfa N is the nfa N^r obtained by flipping all transitions and swapping initial and final states. If N accepts the language L, then N^r accepts the reverse language L^r.

Lemma 3.11.
For each nfa N = (Q, δ, I, F), we have the JSL-dfa isomorphism

[P(N)]^op ≅ P(N^r), X ↦ X̄ = Q \ X.

The following lemma summarizes some important properties of A^op:

Lemma 3.12. Let A = (S, δ, i, f) be a JSL-dfa.
(1) For every s ∈ S, we have L(A^op, s) = { w ∈ Σ∗ : δ_{w^r}(s_0) ≰_S s }.
(2) If A accepts the language L, then A^op accepts the reverse language L^r.
(3) We have [reach(A)]^op ≅ simple(A^op). Thus, A is reachable iff A^op is simple.

Our next goal is to give, for every regular language L, dual characterizations of SLD(L), BLD(L) and BLRD(L), the JSL-subautomata of Fin(L) carried by all finite unions of left derivatives, boolean combinations of left derivatives and boolean combinations of two-sided derivatives, respectively. These results form the core of our duality-based approach to (sub-)atomic nfas in the next section. The minimal JSL-dfa SLD(L) admits the following dual description:

Proposition 3.13.
For every regular language L, the minimal JSL-dfas for L and L^r are dual. More precisely, we have the JSL-dfa isomorphism

dr_L : [SLD(L^r)]^op ≅ SLD(L), K ↦ (K̄^r)⁻¹L,

where K̄ denotes the complement of K.

Remark 3.14. (1) The isomorphism dr_L induces a bijection between the left and right factors of L, i.e. the inclusion-maximal left/right solutions of X · Y ⊆ L. Conway [10] observed that the left and right factors are respectively { K^r : K ∈ SLD(L^r) } and { K : K ∈ SLD(L) } and that they biject. Backhouse [5] observed that they are dually isomorphic posets. Proposition 3.13 provides an explicit automata-theoretic lattice isomorphism arising canonically via duality.
(2) The isomorphism dr_L is tightly connected to the dependency relation [18, 20] of a regular language L, i.e. the binary relation given by

DR_L ⊆ LD(L) × LD(L^r), DR_L(u⁻¹L, v⁻¹L^r) :⟺ uv^r ∈ L.

Its restriction DR^j_L := DR_L ∩ (J(SLD(L)) × J(SLD(L^r))) to the ∪-irreducible left derivatives of L and L^r is called the reduced dependency relation. The following theorem shows that the semilattice of left quotients and the dependency relation are essentially the same concepts. In part (3), we use that the isomorphism dr_L restricts to a bijection between the ∪-irreducible derivatives of L^r and the meet-irreducible elements of the lattice SLD(L), whose set is denoted M(SLD(L)).

Theorem 3.15 (Dependency theorem). (1) We have the JSL-isomorphism

SLD(L) ≅ ({ DR_L[X] : X ⊆ LD(L) }, ∪, ∅), K ↦ { v⁻¹L^r : v ∈ K^r }.

Note that its codomain forms a subsemilattice of P(LD(L^r)).
(2) For all u, v ∈ Σ∗ we have DR_L(u⁻¹L, v⁻¹L^r) ⟺ u⁻¹L ⊈ dr_L(v⁻¹L^r).
(3) The following diagram in Rel commutes: the relation DR^j_L from J(SLD(L)) to J(SLD(L^r)), followed by the bijection dr_L : J(SLD(L^r)) ≅ M(SLD(L)), equals the relation ⊈ from J(SLD(L)) to M(SLD(L)).
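As a hedged illustration of Theorem 3.15(1), the following Python sketch computes the rows DR_L[u⁻¹L] of the dependency relation from a complete dfa all of whose states are reachable, and then counts the join-irreducibles of the ∪-semilattice generated by these rows, which by the theorem is isomorphic to SLD(L). It assumes the standard fact that the left derivatives v⁻¹L^r can be represented by the sets { q : δ_{v^r}(q) ∈ F } of dfa states. The function names and the dictionary encoding of δ are illustrative choices and are not taken from the text.

```python
def reverse_reachable_subsets(states, alphabet, delta, finals):
    """All sets {q : delta_{v^r}(q) in F}; each one stands for a derivative v^{-1}L^r."""
    start = frozenset(finals)
    seen, todo = {start}, [start]
    while todo:
        p = todo.pop()
        for a in alphabet:
            # Appending the letter a to v corresponds to taking the a-preimage.
            pre = frozenset(q for q in states if delta[q, a] in p)
            if pre not in seen:
                seen.add(pre)
                todo.append(pre)
    return seen


def dependency_rows(states, alphabet, delta, finals):
    """Row of a state q: the set DR_L[u^{-1}L] for any word u leading to q."""
    subsets = reverse_reachable_subsets(states, alphabet, delta, finals)
    return {q: frozenset(p for p in subsets if q in p) for q in states}


def join_irreducible_rows(rows):
    """Join-irreducibles of the union-semilattice generated by the rows."""
    gens = set(rows.values())
    result = set()
    for g in gens:
        # Union of all generators strictly below g (empty union = empty set).
        below = frozenset().union(*(h for h in gens if h < g))
        if g and below != g:
            result.add(g)
    return result


# Minimal dfa of L = {a, aa} from Example 3.9; state 3 is the sink (hypothetical encoding).
STATES = [0, 1, 2, 3]
DELTA = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 3}
ROWS = dependency_rows(STATES, "a", DELTA, {1, 2})
print(len(join_irreducible_rows(ROWS)))  # 3
```

For this language the three join-irreducible rows correspond to the derivatives L, a⁻¹L and (aa)⁻¹L, while the row of the sink state is empty and represents ∅.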
Let us now turn to a dual characterization of the JSL-dfa BLD(L):

Proposition 3.16. For every regular language L, the JSL-dfa BLD(L) is dual to the subset construction of the minimal dfa for L^r:

[BLD(L)]^op ≅ P(dfa(L^r)).

The isomorphism maps { w_1⁻¹L^r, …, w_n⁻¹L^r } ∈ P(dfa(L^r)) to ⋂_{i=1}^n \overline{At(w_i^r)}, the intersection of the complements of the atoms At(w_i^r), where At(x) is the unique atom (= join-irreducible) of BLD(L) containing x.

To state the dual characterization of BLRD(L), we recall two standard concepts from algebraic language theory [33]. The transition monoid of a deterministic automaton D = (S, δ, i, f) is the image tm(D) ⊆ Set(S, S) of the morphism

Σ∗ → Set(S, S), w ↦ δ_w.

Thus, tm(D) is carried by the set of extended transition maps δ_w (w ∈ Σ∗) with multiplication given by δ_v • δ_w = δ_{vw} and unit id_S = δ_ε : S → S. We may view tm(D) as a deterministic automaton with initial state id_S, final states all δ_w such that w is accepted by D, and transitions δ_w −a→ δ_{wa} for w ∈ Σ∗ and a ∈ Σ. This automaton accepts the same language as D. The syntactic monoid syn(L) of a regular language L ⊆ Σ∗ is the transition monoid of its minimal dfa:

syn(L) = tm(dfa(L)).

Equivalently, syn(L) is the quotient monoid of the free monoid Σ∗ modulo the syntactic congruence of L, i.e. the monoid congruence on Σ∗ given by

v ≡_L w iff ∀x, y ∈ Σ∗ : xvy ∈ L ⟺ xwy ∈ L.

The associated surjective monoid morphism μ_L : Σ∗ ↠ syn(L), mapping w ∈ Σ∗ to its congruence class [w]_L ∈ syn(L), is called the syntactic morphism.

Proposition 3.17.
For every regular language L, the JSL-dfa BLRD(L) is dual to the subset construction of syn(L^r), viewed as a dfa:

[BLRD(L)]^op ≅ P(syn(L^r)).

The isomorphism maps { [w_1]_{L^r}, …, [w_n]_{L^r} } ∈ P(syn(L^r)) to ⋂_{i=1}^n \overline{At(w_i^r)}, with At(x) denoting the unique atom of BLRD(L) containing x.

Our final duality result in this section concerns the transition semiring [35], a generalization of the transition monoid to JSL-automata. Note that the monoid JSL(S, S) of endomorphisms of a semilattice S forms an idempotent semiring with join defined pointwise: for any f, g : S → S, the morphism f ∨ g : S → S is given by s ↦ f(s) ∨ g(s). The transition semiring of a JSL-automaton A = (S, δ, i, f) is the image ts(A) ⊆ JSL(S, S) of the semiring morphism

P_f(Σ∗) → JSL(S, S), {w_1, …, w_n} ↦ ⋁_{i=1}^n δ_{w_i}.

Here P_f(Σ∗) is the free idempotent semiring on Σ, with composition given by concatenation of languages and join given by union. Thus, ts(A) is the semiring carried by all morphisms ⋁_{i=1}^n δ_{w_i} for w_1, …, w_n ∈ Σ∗, with join given as above and multiplication ⋁_j δ_{v_j} • ⋁_i δ_{w_i} = ⋁_{i,j} δ_{v_j w_i}. We view ts(A) as a JSL-automaton with initial state id_S = δ_ε, final states all ⋁_i δ_{w_i} such that some w_i is accepted by A, and transitions ⋁_{i=1}^n δ_{w_i} −a→ ⋁_{i=1}^n δ_{w_i a} for w_1, …, w_n ∈ Σ∗ and a ∈ Σ. This JSL-automaton is reachable and accepts the same language as A. It has the following dual characterization:

Notation 3.18.
Given a simple JSL-automaton A = (S, δ, i, f), the subautomaton of Fin(L) obtained by closing S (viewed as a set of languages) under right derivatives is called the right-derivative closure of A and denoted rdc(A).

Proposition 3.19. Let A be a reachable JSL-dfa. Then the transition semiring of A, viewed as a JSL-dfa, is dual to the right-derivative closure of A^op:

[ts(A)]^op ≅ rdc(A^op).

Note that both [ts(A)]^op and rdc(A^op) are simple, hence subautomata of Fin(L). Thus, the isomorphism just expresses that their states accept the same languages.

Based upon the duality results of the previous section, we will now introduce our algebraic approach to nondeterministic state minimality. It rests on the concept of a representation of a monoid on a finite semilattice.
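The monoids appearing in these duality results are finitely computable. As a minimal sketch (not the authors' implementation), the following Python code computes the transition monoid tm(D) of a dfa by closing the identity map under the letter transitions, using the convention δ_{wa} = δ_a ∘ δ_w of Notation 3.2; applied to a minimal dfa it yields the syntactic monoid, since syn(L) = tm(dfa(L)). The encoding of transition maps as tuples and all function names are assumptions made for illustration.

```python
def transition_monoid(states, alphabet, delta):
    """Map each element delta_w of tm(D) (as a tuple) to one witness word w."""
    identity = tuple(states)          # delta_epsilon
    words = {identity: ""}
    todo = [identity]
    while todo:
        m = todo.pop()                # m[i] = delta_w(states[i])
        for a in alphabet:
            ma = tuple(delta[q, a] for q in m)   # delta_{wa} = delta_a after delta_w
            if ma not in words:
                words[ma] = words[m] + a
                todo.append(ma)
    return words


def multiply(states, mv, mw):
    """delta_v • delta_w = delta_{vw}: first apply delta_v, then delta_w."""
    index = {q: i for i, q in enumerate(states)}
    return tuple(mw[index[q]] for q in mv)


# Minimal dfa of L = {a, aa}; state 3 is the sink (hypothetical encoding).
STATES = [0, 1, 2, 3]
DELTA = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 3}
MONOID = transition_monoid(STATES, "a", DELTA)
print(sorted(MONOID.values()))  # ['', 'a', 'aa', 'aaa']
```

Here the four elements are the classes of ε, a, aa and of all longer words, so syn(L) has four elements for L = {a, aa}.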
Definition 4.1 (Boolean representation). Let M be a monoid.
(1) A boolean representation of M is given by a finite semilattice S together with a monoid morphism ρ : M → JSL(S, S). The degree of ρ is deg(ρ) := |J(S)|.
(2) Given boolean representations ρ_i : M → JSL(S_i, S_i), i = 1, 2, an equivariant map f : ρ_1 → ρ_2 is a JSL-morphism f : S_1 → S_2 such that f(ρ_1(m)(s)) = ρ_2(m)(f(s)) for all m ∈ M and s ∈ S_1. If f is injective, we say that the representation ρ_2 extends ρ_1.

Remark 4.2. (1) The above representations are called boolean because semilattices are precisely semimodules over the boolean semiring 2 = {0, 1} with 1 + 1 = 1. For more on representations over general commutative semirings, see [21].
(2) The category of boolean representations of M coincides with the functor category JSL_f^M, viewing M as a one-object category.

Definition 4.3 (Canonical representation). For every regular language L, the canonical boolean representation of the syntactic monoid syn(L) is given by

κ_L : syn(L) → JSL(SLD(L), SLD(L)), [w]_L ↦ λK. w⁻¹K.

It induces the canonical boolean representation of the free monoid Σ∗ given by

κ_L ∘ μ_L : Σ∗ → JSL(SLD(L), SLD(L)), w ↦ λK. w⁻¹K,

where μ_L : Σ∗ ↠ syn(L) is the syntactic morphism. The representation κ_L ∘ μ_L amounts to constructing the transition semiring of the minimal JSL-automaton SLD(L), i.e. the syntactic semiring [35] of L.

Example 4.4. We describe the canonical boolean representation κ_{L_n} for the language L_n := (0 + 1)∗ 1 (0 + 1)^n, n ∈ N. Let S := 2^{n+1}_⊥ be the semilattice of binary words of length n + 1, ordered pointwise, with an additional bottom element ⊥. Then SLD(L_n) is isomorphic to S, as witnessed by the isomorphism

f : S ≅ SLD(L_n), f(⊥) = ∅, f(w) = w⁻¹L_n.

Thus, κ_{L_n} is isomorphic to the representation ρ : syn(L_n) → JSL(S, S) where:
(1) ρ([0]_{L_n}) : S → S performs a left-shift (distinct from left-rotate);
(2) ρ([1]_{L_n}) : S → S performs a left-shift and sets the last bit to 1.
Finally, deg(κ_{L_n}) = deg(ρ) = 1 + |J(2^{n+1})| = n + 2 is the number of states of the usual minimal nfa for L_n.

Example 4.5. We describe the canonical boolean representation κ_L for the language L = a_1(a_2 + a_3) + a_2(a_1 + a_3) + a_3(a_1 + a_2) over Σ = {a_1, a_2, a_3}. Consider the ∪-semilattice M = {∅, {a_1, a_2}, {a_1, a_3}, {a_2, a_3}, Σ}. Then SLD(L) is isomorphic to the product semilattice 2 × M × 2 via

f : SLD(L) ≅ 2 × M × 2, f(X) = (X ∩ Σ², X ∩ Σ, X ∩ {ε}).

Note that each of the first and third components is either ∅ or one other set, so it may be identified with an element of 2. For i = 1, 2, 3, consider the morphisms

α_i : 2 → M, α_i(1) = Σ \ {a_i};
β_i : M → 2, β_i(S) = 1 ⟺ a_i ∈ S;
γ : 2 → 2, γ(1) = 0;
δ : M × 2 × 2 → 2 × M × 2, δ(x, y, z) = (z, x, y).

Then κ_L is isomorphic to ρ : syn(L) → JSL(2 × M × 2, 2 × M × 2) where ρ([a_i]_L) is the composite

2 × M × 2 −(α_i × β_i × γ)→ M × 2 × 2 −δ→ 2 × M × 2.

Thus, deg(κ_L) = deg(ρ) = 1 + 3 + 1 = 5. An analogous description of κ_L exists for any language L where each word has the same length.

The next theorem links minimal nfas and representations.
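The degree computation of Example 4.4 can be checked by brute force for small n. The following Python sketch builds the semilattice S = 2^{n+1}_⊥, verifies that the two shift maps of the example preserve finite joins (so that they are JSL-morphisms), and counts the join-irreducibles of S. Encoding ⊥ as `None` and binary words as tuples is an assumption of this sketch, as are all function names.

```python
from itertools import product

BOTTOM = None  # the additional bottom element ⊥ (encoding chosen for this sketch)


def elements(n):
    """All elements of S = 2^{n+1} with an extra bottom element."""
    return [BOTTOM] + list(product((0, 1), repeat=n + 1))


def join(x, y):
    """Binary join: pointwise 'or' on words, with BOTTOM as neutral element."""
    if x is BOTTOM:
        return y
    if y is BOTTOM:
        return x
    return tuple(a | b for a, b in zip(x, y))


def leq(x, y):
    return join(x, y) == y


def shift0(x):
    """rho([0]): left-shift, filling in 0."""
    return BOTTOM if x is BOTTOM else x[1:] + (0,)


def shift1(x):
    """rho([1]): left-shift, then set the last bit to 1."""
    return BOTTOM if x is BOTTOM else x[1:] + (1,)


def preserves_joins(f, elems):
    """f preserves the empty join (bottom) and all binary joins."""
    return f(BOTTOM) is BOTTOM and all(
        f(join(x, y)) == join(f(x), f(y)) for x in elems for y in elems
    )


def join_irreducibles(elems):
    """Elements that are not the join of the elements strictly below them."""
    result = []
    for j in elems:
        below = BOTTOM
        for x in elems:
            if x != j and leq(x, j):
                below = join(below, x)
        if j is not BOTTOM and below != j:
            result.append(j)
    return result


for n in range(4):
    S = elements(n)
    assert preserves_joins(shift0, S) and preserves_joins(shift1, S)
    assert len(join_irreducibles(S)) == n + 2
```

The join-irreducibles found are the all-zero word together with the n + 1 words containing a single 1, in agreement with deg(ρ) = n + 2.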
Definition 4.6. The nondeterministic state complexity ns(L) of a regular language L is the least number of states of any nfa accepting L.

Theorem 4.7. For every regular language L, the nondeterministic state complexity ns(L) is the least degree of any boolean representation extending the canonical representation κ_L ∘ μ_L : Σ∗ → JSL(SLD(L), SLD(L)).

Proof (Sketch). (1) Given a k-state nfa N = (Q, δ, I, F) accepting L, consider the subsemilattice langs(N) = simple(P(N)) of P(Σ∗) on all languages accepted by subsets of Q. The embedding SLD(L) ↪ langs(N) yields an extension of κ_L ∘ μ_L. Since the semilattice langs(N) is generated by the languages accepted by single states of N, this extension has degree at most k.
(2) Conversely, let ρ : Σ∗ → JSL(S, S) be a boolean representation of degree k extending κ_L ∘ μ_L, witnessed by an injective equivariant map h : SLD(L) ↣ S. One can equip S with a JSL-dfa structure making h an automata morphism. Since morphisms preserve accepted languages, it follows that S accepts L. Then the nfa of join-irreducibles of S, see Remark 3.4, is a k-state nfa accepting L. □

As an application, let us return to the dependency relation DR_L introduced in Remark 3.14(2). Recall that a biclique of a relation R ⊆ X × Y (viewed as a bipartite graph) is a subset of the form X′ × Y′ ⊆ R, where X′ ⊆ X and Y′ ⊆ Y. A biclique cover of R is a set C of bicliques with R = ⋃C. The bipartite dimension dim(R) is the least cardinality of any biclique cover of R.

Theorem 4.8 (Gruber-Holzer [18]). For every regular language L, we have dim(DR_L) ≤ ns(L).

We give a new algebraic proof of this result based on boolean representations.
Proof. (1) The task of computing biclique covers is well-known to be equivalent to the set basis problem. Given a family C ⊆ P(Y) of subsets of a finite set Y, a set basis for C is a family B ⊆ P(Y) such that each element of C can be expressed as a union of elements of B. A relation R ⊆ X × Y has a biclique cover of size k iff the family C_R = { R[x] : x ∈ X } ⊆ P(Y) of neighborhoods of nodes in X has a set basis of size k.
(2) Given an instance C ⊆ P(Y) of the set basis problem, consider the ∪-subsemilattice ⟨C⟩ ⊆ P(Y) generated by C, i.e. the semilattice of all unions of sets in C. We claim that C has a set basis of size at most k iff there exists an extension of ⟨C⟩ of degree at most k, i.e. a monomorphism ⟨C⟩ ↣ S into some finite semilattice S with |J(S)| ≤ k.
For the “only if” direction, suppose that B ⊆ P(Y) is a set basis of C of size at most k. Then the embedding ⟨C⟩ ↪ ⟨B⟩ gives an extension of ⟨C⟩ with the desired property: since the semilattice ⟨B⟩ has a set of generators with at most k elements, it has at most k join-irreducibles.
For the “if” direction, suppose that m : ⟨C⟩ ↣ S with |J(S)| ≤ k is given. Since the free semilattice P(Y) is an injective object of JSL [19, Corollary 2.9], there exists a morphism f : S → P(Y) extending the embedding ⟨C⟩ ↪ P(Y). Consider the image S′ ⊆ P(Y) of f, leading to a commutative diagram in which f factorizes as a surjection e : S ↠ S′ followed by the inclusion S′ ⊆ P(Y), and the composite of m : ⟨C⟩ ↣ S with f is the inclusion ⟨C⟩ ⊆ P(Y). We thus have ⟨C⟩ ⊆ S′ ⊆ P(Y). Every set of generators of the semilattice S′ is a basis of C. Since the morphism e is surjective, we have |J(S′)| ≤ |J(S)| ≤ k, i.e. S′ has a set of generators with at most k elements.
(3) Let C_{DR_L} ⊆ P(LD(L^r)) be the instance of the set basis problem corresponding to the dependency relation DR_L ⊆ LD(L) × LD(L^r). Note that ⟨C_{DR_L}⟩ consists of all DR_L[X] for X ⊆ LD(L). Thus, Theorem 3.15(1) shows that ⟨C_{DR_L}⟩ ≅ SLD(L). In particular, every extension of the canonical boolean representation of Σ∗ yields an extension of the semilattice ⟨C_{DR_L}⟩ of the same degree. Therefore, by parts (1) and (2) and Theorem 4.7, we have dim(DR_L) ≤ ns(L), as required. □

Theorem 4.7 motivates the following definition, which can be considered the key concept of our paper:

Definition 4.9.
The nondeterministic syntactic complexity nµ(L) of a regular language L is the least degree of any boolean representation of syn(L) extending the canonical boolean representation κ_L : syn(L) → JSL(SLD(L), SLD(L)).

Just as the degrees of boolean representations of Σ∗ determine the state complexity of nfas, the measure nµ(L) admits an automata-theoretic characterization in terms of subatomic nfas, given in Theorem 4.14 below.

Definition 4.10. An nfa accepting the language L is called
(1) atomic if each state accepts a language from BLD(L), and
(2) subatomic if each state accepts a language from BLRD(L).

The notion of an atomic nfa goes back to Brzozowski and Tamm [6], as does the following characterization.

Notation 4.11. For any nfa N, let rsc(N) denote the dfa obtained via the reachable subset construction, i.e. the dfa-reachable part of P(N).

Theorem 4.12. An nfa N is atomic iff rsc(N^r) is a minimal dfa.

We present a new conceptual proof, interpreting this theorem as an instance of the self-duality of JSL-dfas.
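The criterion of Theorem 4.12 lends itself to a direct computational test. The following Python fragment is a minimal sketch under our own assumptions, not part of the original development: an nfa is encoded as a tuple (states, alphabet, transitions, initial states, final states), the helper names reverse, rsc, count_classes and is_atomic are hypothetical, and minimality of the reachable dfa is checked by Moore-style partition refinement. The two sample automata are toy examples chosen for illustration.

```python
def reverse(nfa):
    """Reverse nfa N^r: flip every transition, swap initial and final states."""
    states, sigma, delta, init, final = nfa
    rdelta = {}
    for (q, a), targets in delta.items():
        for p in targets:
            rdelta.setdefault((p, a), set()).add(q)
    return (states, sigma, rdelta, set(final), set(init))


def rsc(nfa):
    """Reachable subset construction: the reachable part of the powerset dfa."""
    states, sigma, delta, init, final = nfa
    start = frozenset(init)
    trans, seen, todo = {}, {start}, [start]
    while todo:
        X = todo.pop()
        for a in sigma:
            Y = frozenset(p for q in X for p in delta.get((q, a), ()))
            trans[(X, a)] = Y
            if Y not in seen:
                seen.add(Y)
                todo.append(Y)
    accepting = {X for X in seen if X & frozenset(final)}
    return (seen, sigma, trans, start, accepting)


def count_classes(dfa):
    """Number of language-equivalence classes of states (Moore refinement)."""
    states, sigma, trans, start, accepting = dfa
    letters = sorted(sigma)
    cls = {X: int(X in accepting) for X in states}
    while True:
        # Signature of a state: its own class plus the classes of its successors.
        sig = {X: (cls[X],) + tuple(cls[trans[(X, a)]] for a in letters)
               for X in states}
        ids = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        if len(ids) == len(set(cls.values())):
            return len(ids)
        cls = {X: ids[sig[X]] for X in states}


def is_atomic(nfa):
    """Theorem 4.12: N is atomic iff rsc(N^r) is a minimal dfa."""
    dfa = rsc(reverse(nfa))
    return count_classes(dfa) == len(dfa[0])


# Assumed toy example: a dfa, viewed as an nfa, for words over {a, b} ending in a.
ends_in_a = ({0, 1}, {"a", "b"},
             {(0, "a"): {1}, (0, "b"): {0}, (1, "a"): {1}, (1, "b"): {0}},
             {0}, {1})

# Assumed toy example: a 2-cycle with both states initial; it accepts a*,
# but state 0 alone accepts (aa)*, which is not in BLD(a*).
two_cycle = ({0, 1}, {"a"}, {(0, "a"): {1}, (1, "a"): {0}}, {0, 1}, {0})

print(is_atomic(ends_in_a), is_atomic(two_cycle))
```

On these inputs the test reports the first automaton as atomic and the second as not atomic, in line with Definition 4.10: both states of the first accept left derivatives of its language, whereas the second has a state accepting (aa)*.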
Proof (Sketch). Let L be the language accepted by N. We establish the theorem by showing each of the following statements to be equivalent to the next one:
(1) N is atomic.
(2) There exists a JSL-automata morphism from P(N) to BLD(L).
(3) There exists a JSL-automata morphism from P(dfa(L^r)) to P(N^r).
(4) There exists a dfa morphism from dfa(L^r) to P(N^r).
(5) There exists a dfa morphism from dfa(L^r) to rsc(N^r).
(6) rsc(N^r) is a minimal dfa.
The key step is (2) ⇔ (3), which follows via duality from Lemmas 3.11 and 3.12, and Proposition 3.16. All remaining equivalences follow from the definitions. ⊓⊔

The next theorem gives an analogous characterization of subatomic nfas. Again, the proof is based on duality.
Theorem 4.13.
An nfa N accepting the language L is subatomic iff the transition monoid of rsc(N^r) is isomorphic to the syntactic monoid syn(L^r).

Proof (Sketch). Each of the following statements is equivalent to the next one:
(1) N is subatomic.
(2) There exists a JSL-dfa morphism from P(N) to BLRD(L).
(3) There exists a JSL-dfa morphism from rdc(simple(P(N))) to BLRD(L).
(4) There exists a JSL-dfa morphism from P(syn(L^r)) to ts(reach(P(N^r))).
(5) There exists a dfa morphism from syn(L^r) to ts(reach(P(N^r))).
(6) There exists a dfa morphism from syn(L^r) to tm(rsc(N^r)).
(7) The monoids syn(L^r) and tm(rsc(N^r)) are isomorphic.
The equivalence (3) ⇔ (4) follows via duality from Lemma 3.11, Proposition 3.17 and Proposition 3.19. All remaining equivalences follow from the definitions. ⊓⊔

We are prepared to state the main result of our paper, an automata-theoretic characterization of the nondeterministic syntactic complexity:
Theorem 4.14.
For every regular language L, the nondeterministic syntactic complexity nµ(L) is the least number of states of any subatomic nfa accepting L.

Proof (Sketch).
(1) Let N be a k-state subatomic nfa accepting the language L. As in the proof of Theorem 4.7, we consider the semilattice langs(N) = simple(P(N)). Then
ρ : syn(L) → JSL(langs(N), langs(N)), [w]_L ↦ λK. w^{-1}K,
is a representation of syn(L) of degree at most k extending κ_L.
(2) Conversely, let ρ : syn(L) → JSL(S, S) be a boolean representation extending κ_L, and let h : SLD(L) ↣ S be the embedding. As in the proof of Theorem 4.7, we can equip S with the structure of a JSL-dfa making h an automata morphism. Its nfa of join-irreducibles, see Remark 3.4, is a subatomic nfa accepting L with deg(ρ) states. ⊓⊔

We conclude this section with the observation that the state complexity of unrestricted nfas, subatomic nfas and atomic nfas generally differs:
Example 4.15 (Subatomic more succinct than atomic).
Consider the language L accepted by the nfa N shown below, along with the minimal dfas for L and L^r. Each automaton has exactly one initial state, namely 0.

[Figure: transition diagrams of N, dfa(L) and dfa(L^r).]

Brzozowski and Tamm [6] showed that there is no atomic nfa with four states accepting L. However, N is subatomic: one can verify that the transition monoids of dfa(L^r) and rsc(N^r) both have 22 elements. Since the former is the syntactic monoid of L^r, and hence a quotient of the latter, the two monoids are isomorphic, and so Theorem 4.13 applies.

Example 4.16 (Subatomic less succinct than general nfas).
There is a regular language for which no state-minimal nfa is subatomic:
L := { a^n : n ∈ N, n ≠ 5 } ⊆ {a}∗.
It is accepted by the following nfa:

[Figure: transition diagram of an nfa accepting L.]

An exhaustive search shows that no subatomic nfa with five states accepts L. In fact, L is the unique (!) unary language with ns(L) ≤ 5 such that ns(L) < nµ(L). Moreover, the above nfa and its reverse are the only state-minimal nfas for L.

While subatomic nfas are generally less succinct than unrestricted ones, all structural results concerning nondeterministic state complexity we have encountered in the literature are actually about nondeterministic syntactic complexity: they implicitly identify classes of languages where the two measures coincide. In the present section, we illustrate this in a few selected applications.
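The monoid comparison behind Theorem 4.13, as used in Example 4.15, can also be carried out mechanically. The Python fragment below is a minimal sketch under our own assumptions: nfas are encoded as tuples (states, alphabet, transitions, initial states, final states), and the helper names reverse, rsc, minimize, transition_monoid and is_subatomic are hypothetical. It relies on the fact that syn(L^r) is the transition monoid of the minimal dfa for L^r and a quotient of tm(rsc(N^r)), so the two monoids are isomorphic iff they have the same number of elements. The sample automata are toy examples, not the automata of Examples 4.15 and 4.16.

```python
def reverse(nfa):
    """Reverse nfa N^r: flip every transition, swap initial and final states."""
    states, sigma, delta, init, final = nfa
    rdelta = {}
    for (q, a), targets in delta.items():
        for p in targets:
            rdelta.setdefault((p, a), set()).add(q)
    return (states, sigma, rdelta, set(final), set(init))


def rsc(nfa):
    """Reachable subset construction: the reachable part of the powerset dfa."""
    states, sigma, delta, init, final = nfa
    start = frozenset(init)
    trans, seen, todo = {}, {start}, [start]
    while todo:
        X = todo.pop()
        for a in sigma:
            Y = frozenset(p for q in X for p in delta.get((q, a), ()))
            trans[(X, a)] = Y
            if Y not in seen:
                seen.add(Y)
                todo.append(Y)
    accepting = {X for X in seen if X & frozenset(final)}
    return (seen, sigma, trans, start, accepting)


def minimize(dfa):
    """Quotient of a reachable dfa by language equivalence (Moore refinement)."""
    states, sigma, trans, start, accepting = dfa
    letters = sorted(sigma)
    cls = {X: int(X in accepting) for X in states}
    while True:
        sig = {X: (cls[X],) + tuple(cls[trans[(X, a)]] for a in letters)
               for X in states}
        ids = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        stable = len(ids) == len(set(cls.values()))
        cls = {X: ids[sig[X]] for X in states}
        if stable:
            break
    qtrans = {(cls[X], a): cls[trans[(X, a)]] for X in states for a in letters}
    return (set(cls.values()), sigma, qtrans, cls[start],
            {cls[X] for X in accepting})


def transition_monoid(dfa):
    """All maps on states induced by words, as tuples over a fixed state order."""
    states, sigma, trans, start, accepting = dfa
    order = list(states)
    index = {X: i for i, X in enumerate(order)}
    gens = [tuple(index[trans[(X, a)]] for X in order) for a in sigma]
    identity = tuple(range(len(order)))
    monoid, todo = {identity}, [identity]
    while todo:
        t = todo.pop()
        for g in gens:
            u = tuple(g[i] for i in t)  # apply t first, then the letter map g
            if u not in monoid:
                monoid.add(u)
                todo.append(u)
    return monoid


def is_subatomic(nfa):
    """Theorem 4.13, via a size comparison of the two transition monoids."""
    d = rsc(reverse(nfa))
    return len(transition_monoid(d)) == len(transition_monoid(minimize(d)))


# Assumed toy example accepting the words over {a, b} that start with a.
# State "e" accepts {ε}, which lies in BLRD(L) but not in BLD(L).
starts_with_a = ({"e", "A", "B"}, {"a", "b"},
                 {("A", "a"): {"e", "A", "B"}, ("B", "b"): {"e", "A", "B"}},
                 {"A"}, {"e"})

# Assumed toy example: a 2-cycle with both states initial, accepting a*.
two_cycle = ({0, 1}, {"a"}, {(0, "a"): {1}, (1, "a"): {0}}, {0, 1}, {0})

print(is_subatomic(starts_with_a), is_subatomic(two_cycle))
```

For the first automaton, rsc(N^r) has three states while the minimal dfa for L^r has two, so the automaton is not atomic by Theorem 4.12; both transition monoids nevertheless have three elements, so it is subatomic. The unary 2-cycle fails the test, consistent with the fact that unary nfas are atomic iff they are subatomic.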
For unary languages L ⊆ {a}∗, two-sided derivatives are left derivatives. Thus, a unary nfa is atomic iff it is subatomic.

Example 5.1 (Cyclic unary languages). A unary language L is cyclic if its minimal dfa is a cycle [16]. We claim that ns(L) = nµ(L). To see this, let d := |LD(L)| be the period (i.e. number of states) of the minimal dfa. By Fact 1 of [16] (originally from [22]) every state-minimal nfa N accepting L is a disjoint union of cyclic dfas whose periods divide d. (In [16], nfas are restricted to have a single initial state and so are distinguished from unions of dfas; the latter are valid nfas from our perspective.) Then |rsc(N^r)| = d: we have |rsc(N^r)| ≥ d since rsc(N^r) is a dfa accepting L = L^r and d is the size of the minimal dfa for L, and |rsc(N^r)| ≤ d because after d steps, each cycle will be back in its initial state. Thus N is atomic by Theorem 4.12 and hence subatomic.

We deduce the following result for (not necessarily unary) regular languages:

Theorem 5.2. If syn(L) is a cyclic group, then ns(L) = nµ(L).

Proof (Sketch). Suppose that syn(L) = tm(dfa(L)) is cyclic. Then there exists w ∈ Σ∗ such that the map λX. w^{-1}X : LD(L) → LD(L) generates tm(dfa(L)). Fix an alphabet Σ_0 = {a_0} disjoint from Σ and consider the unary language
L_0 := { a_0^n : n ∈ N, w^n ∈ L } ⊆ Σ_0∗.
Let g : Σ_0∗ → Σ∗ be the monoid morphism where g(a_0) := w. Then we have the JSL-isomorphism
f : SLD(L_0) ≅ SLD(L), f(X^{-1}L_0) := [g[X]]^{-1}L.
For each a ∈ Σ choose n_a ∈ N such that a^{-1}K = (w^{n_a})^{-1}K for all K ∈ LD(L). The respective transition endomorphisms of the JSL-automata SLD(L_0) and SLD(L) determine each other in the sense that the following squares commute:
f ∘ a_0^{-1}(−) = w^{-1}(−) ∘ f and f ∘ (a_0^{n_a})^{-1}(−) = a^{-1}(−) ∘ f.
Then ns(L) = ns(L_0) by Theorem 4.7 and nµ(L) = nµ(L_0) by Theorem 4.14. Moreover, by Example 5.1 we know that ns(L_0) = nµ(L_0), so the claim follows.

Example 5.3 (nµ(L) no larger than Chrobak normal form). A unary nfa is in Chrobak normal form [8, 13] if it has a single initial state and at most one state with multiple successors, all of which lie in disjoint cycles. We claim that for any nfa N in Chrobak normal form accepting the language L, we have
nµ(L) ≤ |N|,
where |N| denotes the number of states of N. To see this, observe that each state of N up to and including the unique choice state accepts some left derivative of L. The successors of the choice state collectively accept a derivative u^{-1}L; this language is cyclic because it is a finite union of cyclic languages. Therefore, by Example 5.1 we may replace the cycles by an atomic nfa accepting u^{-1}L, without increasing the number of states. The resulting nfa is atomic.

Since every unary nfa on n states can be transformed into an nfa in Chrobak normal form with O(n^2) states [8, Lemma 4.3], we get:

Corollary 5.4. If L is a unary regular language, then nµ(L) = O(ns(L)^2).

There are several natural classes of regular languages for which canonical state-minimal nondeterministic acceptors have been identified. We show that these acceptors are actually subatomic. In our arguments, we frequently consider the length of a finite semilattice S, i.e. the maximum length n of any ascending chain s_0 < s_1 < ... < s_n in S. Note that since every element is uniquely determined by the set of join-irreducibles below it, the length of S is at most |J(S)|.

Example 5.5 (Bideterministic and biseparable languages).
(1) A language is called bideterministic if it is accepted by a dfa whose reverse is also a dfa. In this case, the minimal dfa is a minimal nfa [34, 38]. Bideterministic languages have been studied in the context of automata learning [2] and coding theory, where they are known as rectangular codes [27, 36].
We show that for every bideterministic language L,
ns(L) = nµ(L) = |LD(L)|.
To this end, we first note that by [36, Theorem 3.1] a language L ⊆ Σ∗ is bideterministic iff the left derivatives of L are pairwise disjoint. This implies that SLD(L) is a boolean algebra with atoms LD(L). Since the length of a boolean algebra equals the number of atoms (= join-irreducibles), we conclude that for every finite semilattice extension SLD(L) ↣ S, the semilattice S has length at least |LD(L)|. Thus, |LD(L)| ≤ |J(S)|, so any representation ρ extending κ_L or κ_L ∘ µ_L satisfies |LD(L)| ≤ deg(ρ). Hence, ns(L) = nµ(L) = |LD(L)| by Theorems 4.7 and 4.14. In particular, the minimal dfa of L is a minimal nfa.
(2) A language L is biseparable if SLD(L) is a boolean algebra [28]. (Actually, [28] defines biseparability as a property of nfas, and characterizes biseparable nfas as those accepting a language L for which no ∪-irreducible left derivative is contained in the union of other ∪-irreducible left derivatives. This is equivalent to the lattice SLD(L) being boolean, i.e. to L being 'biseparable' in our sense.) For every biseparable language L, the canonical residual automaton [12], i.e. the nfa N_L of join-irreducibles of the minimal JSL-dfa SLD(L), is a state-minimal nfa; it is subatomic because every state of N_L accepts a derivative of L. This follows exactly as in (1): our argument only used that SLD(L) is a boolean algebra.

Example 5.6 (Maximal reachability).
A folklore result asserts that if N is an nfa whose accepted language L satisfies |LD(L)| = 2^{|N|}, then N is state-minimal. Since LD(L) forms the set of states of the minimal dfa for L and rsc(N) accepts L, the dfa rsc(N) has at least 2^{|N|} states, and so rsc(N) = P(N). It follows that the JSL-dfa P(N) is reachable and simple, hence isomorphic to the minimal JSL-dfa SLD(L). This proves that SLD(L) is a boolean algebra, i.e. L is a biseparable language. We conclude from Example 5.5(2) that ns(L) = nµ(L) = |N| and N_L is a subatomic minimal nfa.

Example 5.7 (BiRFSA and topological languages).
So far, SLD(L) has been a boolean algebra. But the argument in Example 5.5 also applies when SLD(L) is a distributive lattice, noting that the length of a finite distributive lattice is equal to the number of its join-irreducibles [17, Corollary 2.14]. Languages with this property are called topological [1]. It thus follows as in Example 5.5(2) that for any topological language L, the canonical residual automaton N_L is subatomic and a state-minimal nfa. Thus, ns(L) = nµ(L) = |J(SLD(L))|.

There is another class of languages where N_L is known to be a state-minimal nfa, the biRFSA languages [28]. A language L is called biRFSA if N_L is isomorphic to (N_{L^r})^r. Surprisingly, these languages are exactly the topological ones:

(1) Suppose that L is topological. Recall that N_L is the nfa of join-irreducibles of the minimal JSL-dfa. Thus, it has states J(SLD(L)) and transitions given by X --a--> Y iff Y ⊆ a^{-1}X for a ∈ Σ. Moreover, a join-irreducible j is initial iff j ⊆ L and final iff ε ∈ j. Since the lattice SLD(L) is distributive, we have a canonical bijection between its join- and meet-irreducibles:
τ : J(SLD(L)) ≅ M(SLD(L)), τ(j) = ⋃ { X ∈ SLD(L) : j ⊈ X }.
Let θ : J(SLD(L)) ≅ J(SLD(L^r)) be the unique map satisfying dr_L ∘ θ = τ, where dr_L : J(SLD(L^r)) ≅ M(SLD(L)) is the restriction of the isomorphism of Proposition 3.13. One can show θ to be an nfa isomorphism from N_L to (N_{L^r})^r. Thus, L is biRFSA.

(2) Suppose that L is biRFSA. Then we have a surjective JSL-morphism
[P(J(SLD(L)))]^op ≅ P(J(SLD(L^r))) --e_{L^r}--> SLD(L^r) ≅ [SLD(L)]^op,
where the first isomorphism follows from N_L ≅ (N_{L^r})^r and Lemma 3.11, the second isomorphism is given by Proposition 3.13, and e_{L^r} sends X ⊆ J(SLD(L^r)) to ⋃X. The dual of this morphism is the injective JSL-morphism
m_L : SLD(L) ↣ P(J(SLD(L)))
sending K ∈ SLD(L) to the set of all j ∈ J(SLD(L)) with j ⊆ K. Note that e_L ∘ m_L = id_{SLD(L)}, showing that SLD(L) is a retract of P(J(SLD(L))). Since JSL-retracts of finite distributive lattices are distributive, see e.g. [31, Lemma 2.2.3.15], it follows that SLD(L) is distributive. Thus, L is topological.

Example 5.8 (Extremal languages).
Call a language extremal if SLD(L) has length |J(SLD(L))|, i.e. we have an extremal lattice in the sense of Markowsky [29]. Again, the argument of Example 5.5 applies and we get ns(L) = nµ(L) = |J(SLD(L))|. Topological languages are extremal since every distributive lattice is an extremal lattice, although extremal languages need not be topological. Both classes are naturally characterized in terms of the reduced dependency relation:
(1) L is topological iff DR^j_L is essentially an order relation ≤_P ⊆ P × P of a finite poset [30, Example 2.2.12].
(2) L is extremal iff DR^j_L is upper unitriangularizable [29, Theorem 11].
The latter means that the adjacency matrix of the bipartite graph DR^j_L can be put in upper triangular form with ones along the diagonal, by permuting rows and columns. An order relation is upper unitriangularizable because it may be extended to a linear order.

Motivated by the duality theory of deterministic finite automata over semilattices, we introduced a natural class of nondeterministic finite automata called subatomic nfas and studied their state complexity in terms of boolean representations of syntactic monoids. Furthermore, we demonstrated that a large body of previous work on state minimization of general nfas actually constructs minimal subatomic ones. There are several directions for future work.

As illustrated by Theorem 4.8, the dependency relation DR_L forms a useful tool for proving lower bounds on nfas. It is also a key element of the Kameda-Weiner algorithm [26, 37] for minimizing nfas, which rests on computing biclique covers of DR_L. We aim to give an algebraic interpretation of dependency relations based on the representation of finite semilattices by contexts [24], which can be augmented to a categorical equivalence between JSL_f and a suitable category of bipartite graphs [31]. Under this equivalence, JSL-dfas correspond to dependency automata; in particular, the minimal JSL-dfa SLD(L) corresponds to a dependency automaton whose underlying bipartite graph is precisely the dependency relation DR_L. We expect that this observation can lead to a fresh algebraic perspective on the Kameda-Weiner algorithm, as well as a generalization of it computing minimal (sub-)atomic nfas.

On a related note, we also intend to investigate the complexity of the minimization problem for (sub-)atomic nfas. While minimizing general nfas is PSPACE-complete, even if the input automaton is a dfa, we conjecture that the additional structure present in (sub-)atomic acceptors will simplify their minimization to an NP-complete task. First evidence in this direction is provided by Geldenhuys, van der Merwe, and van Zijl [14], whose work implies that minimal atomic nfas can be efficiently computed in practice using SAT solvers.

References
1. Adámek, J., Myers, R.S., Urbat, H., Milius, S.: On continuous nondeterminism and state minimality. In: Proc. 30th Conference on the Mathematical Foundations of Programming Semantics (MFPS XXX). vol. 308, pp. 3–23 (2014)
2. Angluin, D.: Inference of reversible languages. J. ACM 29(3), 741–765 (1982)
3. Arbib, M.A., Manes, E.G.: Adjoint machines, state-behavior machines, and duality. Journal of Pure and Applied Algebra 6(3), 313–344 (1975)
4. Arbib, M.A., Manes, E.G.: Fuzzy machines in a category. Bulletin of the Australian Mathematical Society 13(2), 169–210 (1975)
5. Backhouse, R.: Factor theory and the unity of opposites. Journal of Logical and Algebraic Methods in Programming 85(5, Part 2), 824–846 (2016)
6. Brzozowski, J., Tamm, H.: Theory of átomata. Theoretical Computer Science 539, 13–27 (2014)
7. Brzozowski, J.A.: Derivatives of regular expressions. J. ACM 11(4), 481–494 (Oct 1964)
8. Chrobak, M.: Finite automata and unary languages. Theoretical Computer Science 47, 149–158 (1986)
9. Clemente, L., Mayr, R.: Efficient reduction of nondeterministic automata with application to language inclusion testing. Logical Methods in Computer Science Volume 15, Issue 1 (2019)
10. Conway, J.H.: Regular Algebra and Finite Machines. Printed in GB by William Clowes & Sons Ltd (1971)
11. De Wulf, M., Doyen, L., Henzinger, T.A., Raskin, J.F.: Antichains: A new algorithm for checking universality of finite automata. In: Ball, T., Jones, R.B. (eds.) Computer Aided Verification. pp. 17–30. Springer (2006)
12. Denis, F., Lemay, A., Terlutte, A.: Residual finite state automata. In: Ferreira, A., Reichel, H. (eds.) STACS 2001: 18th Annual Symposium on Theoretical Aspects of Computer Science Dresden, Germany, February 15–17, 2001 Proceedings. pp. 144–157. Springer Berlin Heidelberg, Berlin, Heidelberg (2001)
13. Gawrychowski, P.: Chrobak normal form revisited, with applications. In: Bouchou-Markhoff, B., Caron, P., Champarnaud, J.M., Maurel, D. (eds.) Implementation and Application of Automata. pp. 142–153. Springer Berlin Heidelberg, Berlin, Heidelberg (2011)
14. Geldenhuys, J., van der Merwe, B., van Zijl, L.: Reducing nondeterministic finite automata with SAT solvers. In: Yli-Jyrä, A., Kornai, A., Sakarovitch, J., Watson, B. (eds.) Finite-State Methods and Natural Language Processing. pp. 81–92. Springer Berlin Heidelberg, Berlin, Heidelberg (2010)
15. Goguen, J.A.: Discrete-time machines in closed monoidal categories. I. J. Comput. Syst. Sci. 10(1), 1–43 (1975)
16. Gramlich, G.: Probabilistic and nondeterministic unary automata. In: Proc. of Math. Foundations of Computer Science, Springer, LNCS 2747, 2003. pp. 460–469. Springer (2003)
17. Grätzer, G.: General Lattice Theory. Birkhäuser Verlag, 2. edn. (1998)
18. Gruber, H., Holzer, M.: Finding lower bounds for nondeterministic state complexity is hard. In: Ibarra, O.H., Dang, Z. (eds.) Developments in Language Theory: 10th International Conference, DLT 2006, Santa Barbara, CA, USA, June 26-29, 2006. Proceedings. pp. 363–374. Springer Berlin Heidelberg, Berlin, Heidelberg (2006)
19. Horn, A., Kimura, N.: The category of semilattices. Algebra Univ. 1, 26–38 (1971)
20. Hromkovič, J., Seibert, S., Karhumäki, J., Klauck, H., Schnitger, G.: Communication complexity method for measuring nondeterminism in finite automata. Information and Computation 172(2), 202–217 (2002)
21. Izhakian, Z., Rhodes, J., Steinberg, B.: Representation theory of finite semigroups over semirings. Journal of Algebra 336(1), 139–157 (2011)
22. Jiang, T., McDowell, E., Ravikumar, B.: The structure and complexity of minimal nfa's over a unary alphabet. International Journal of Foundations of Computer Science 02(02), 163–182 (1991)
23. Jiang, T., Ravikumar, B.: Minimal NFA problems are hard. SIAM Journal on Computing 22(6), 1117–1141 (1993)
24. Jipsen, P.: Categories of algebraic contexts equivalent to idempotent semirings and domain semirings. In: Kahl, W., Griffin, T.G. (eds.) Relational and Algebraic Methods in Computer Science. pp. 195–206. Springer Berlin Heidelberg, Berlin, Heidelberg (2012)
25. Johnstone, P.T.: Stone spaces. Cambridge University Press (1982)
26. Kameda, T., Weiner, P.: On the state minimization of nondeterministic finite automata. IEEE Transactions on Computers C-19(7), 617–627 (1970)
27. Kschischang, F.R.: The trellis structure of maximal fixed-cost codes. IEEE Transactions on Information Theory 42(6), 1828–1838 (1996)
28. Latteux, M., Roos, Y., Terlutte, A.: Minimal NFA and biRFSA languages. RAIRO - Theoretical Informatics and Applications 43(2), 221–237 (2009)
29. Markowsky, G.: Primes, irreducibles and extremal lattices. Order 9, 265–290 (09 1992)
30. Myers, R.S.R.: Nondeterministic automata and JSL-dfas. CoRR abs/2007.06031 (2020), https://arxiv.org/abs/2007.06031
31. Myers, R.S.R.: Representing semilattices as relations. CoRR abs/2007.10277 (2020), https://arxiv.org/abs/2007.10277
32. Myers, R.S.R., Adámek, J., Milius, S., Urbat, H.: Coalgebraic constructions of canonical nondeterministic automata. Theoretical Computer Science 604, 81–101 (2015)
33. Pin, J.É.: Mathematical foundations of automata theory (September 2020), available at
34. Pin, J.E.: On reversible automata. In: Simon, I. (ed.) LATIN '92. pp. 401–416. Springer Berlin Heidelberg, Berlin, Heidelberg (1992)
35. Polák, L.: Syntactic semiring of a language. In: Sgall, J., Pultr, A., Kolman, P. (eds.) Mathematical Foundations of Computer Science 2001: 26th International Symposium, MFCS 2001 Mariánské Lázne, Czech Republic, August 27–31, 2001 Proceedings. pp. 611–620. Springer Berlin Heidelberg, Berlin, Heidelberg (2001)
36. Shankar, P., Dasgupta, A., Deshmukh, K., Rajan, B.: On viewing block codes as finite automata. Theoretical Computer Science 290(3), 1775–1797 (2003)
37. Tamm, H.: New interpretation and generalization of the Kameda-Weiner method. In: Chatzigiannakis, I., Mitzenmacher, M., Rabani, Y., Sangiorgi, D. (eds.) ICALP 2016, Rome, Italy. LIPIcs, vol. 55, pp. 116:1–116:12. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2016)
38. Tamm, H., Ukkonen, E.: Bideterministic automata and minimal representations of regular languages. Theoretical Computer Science 328(1), 135–149 (2004)
39. van Heerdt, G., Moerman, J., Sammartino, M., Silva, A.: A (co)algebraic theory of succinct automata. Journal of Logical and Algebraic Methods in Programming 105, 112–125 (2019)
A Appendix
This appendix provides full proofs and additional details on the examples that were omitted for space reasons.
Proof of Lemma 3.11
Let N = (Q, δ, I, F). We claim that the semilattice isomorphism
h : [P(Q)]^op ≅ P(Q), X ↦ Q \ X,
gives an isomorphism of JSL-dfas from [P(N)]^op to P(N^r).

Preservation of the initial state. The initial state of [P(N)]^op is Q \ F, the largest non-final state of P(N). Thus h maps it to F, the initial state of P(N^r).

Preservation of final states. By definition, a state X is final in [P(N)]^op iff I ⊈ X. This is equivalent to h(X) ∩ I ≠ ∅, i.e. to h(X) being final in P(N^r).

Preservation of transitions. Let X, Y ∈ P(Q) and a ∈ Σ such that X --a--> Y is a transition in [P(N)]^op. By definition, Y is the set of all q ∈ Q with δ_a[q] ⊆ X. Thus, Q \ Y is the set of all q ∈ Q such that δ_a[q] ∩ (Q \ X) ≠ ∅. This means that (Q \ X) --a--> (Q \ Y) is a transition in P(N^r).

Proof of Lemma 3.12

(1) Let g : S → 2 be the semilattice morphism with g(x) = 0 iff x ≤_S s. Then, for any word w = a_1 ... a_n in Σ∗, we have δ_{w^r}(s_0) ≰_S s iff the morphism
2 --i--> S --δ_{a_n}--> S ... S --δ_{a_1}--> S --g--> 2
is equal to id : 2 → 2. This is the case iff the dual morphism
2 --g^*--> S^op --δ^*_{a_1}--> S^op ... S^op --δ^*_{a_n}--> S^op --i^*--> 2
is equal to id : 2 → 2. Since g^* maps 1 to s, this means precisely that the state s of A^op accepts w.
(2) follows from part (1) by choosing s to be the initial state of A^op, i.e. the largest non-final state of A.
(3) follows via duality: the smallest subautomaton reach(A) of A dualizes to the smallest quotient automaton simple(A^op) of A^op.

Proof of Proposition 3.13
By Lemma 3.12, the dual of a minimal JSL-dfa accepting L^r is a minimal JSL-dfa accepting L. Thus, by the uniqueness of minimal automata, the unique JSL-automata morphism from [SLD(L^r)]^op to SLD(L), mapping the state K of [SLD(L^r)]^op to the language L([SLD(L^r)]^op, K) it accepts, is an isomorphism. It only remains to verify that this language is equal to ((Σ∗ \ K)^r)^{-1}L. To this end, we compute for all w ∈ Σ∗:
w ∈ L([SLD(L^r)]^op, K)
⇔ (w^r)^{-1}L^r ⊈ K (by Lemma 3.12(1))
⇔ ∃x ∈ Σ∗ \ K : w^r x ∈ L^r
⇔ ∃y ∈ (Σ∗ \ K)^r : yw ∈ L
⇔ w ∈ ((Σ∗ \ K)^r)^{-1}L.

Proof of Theorem 3.15

(1) We need to show that
α : SLD(L) → ({DR_L[X] : X ⊆ LD(L)}, ∪, ∅), α(K) := { v^{-1}L^r : v ∈ K^r },
is an isomorphism. To this end, let K ∈ SLD(L), say K = K_1 ∪ ... ∪ K_n for K_i ∈ LD(L). We show that
α(K) = DR_L[{K_1, ..., K_n}],
which immediately implies that α is a well-defined isomorphism of semilattices. To this end, we compute for all v ∈ Σ∗:
v^{-1}L^r ∈ α(K)
⇔ v ∈ K^r
⇔ ∃i : v ∈ K_i^r
⇔ ∃i : v^r ∈ K_i
⇔ ∃i : v^{-1}L^r ∈ DR_L[K_i]
⇔ v^{-1}L^r ∈ DR_L[{K_1, ..., K_n}].

(2) Let us first note that the isomorphism dr_L from Proposition 3.13 has the following alternative description:
dr_L(U^{-1}L^r) = ⋃ { K ∈ LD(L) : K ∩ U^r = ∅ } for every U ⊆ Σ∗. (A.1)
In fact, for every w ∈ Σ∗ we compute:
w ∈ dr_L(U^{-1}L^r)
⇔ w ∈ ((Σ∗ \ U^{-1}L^r)^r)^{-1}L (def. dr_L)
⇔ ∃v ∈ Σ∗ \ U^{-1}L^r : v^r w ∈ L
⇔ ∃v ∈ Σ∗ : [ v^r w ∈ L ∧ ∀u ∈ U : uv ∉ L^r ]
⇔ ∃y ∈ Σ∗ : [ yw ∈ L ∧ ∀u ∈ U : y u^r ∉ L ]
⇔ ∃y ∈ Σ∗ : [ w ∈ y^{-1}L ∧ y^{-1}L ∩ U^r = ∅ ]
⇔ w ∈ ⋃ { K ∈ LD(L) : K ∩ U^r = ∅ }.
It thus follows for all u, v ∈ Σ∗:
u^{-1}L ⊈ dr_L(v^{-1}L^r)
⇔ u^{-1}L ∉ { K ∈ LD(L) : v^r ∉ K }
⇔ v^r ∈ u^{-1}L
⇔ DR_L(u^{-1}L, v^{-1}L^r).

(3) follows immediately from (2), restricted to DR^j_L.

Proof of Proposition 3.16

(1) Let At(x) denote the unique atom of BLD(L) containing the word x ∈ Σ∗. For any v, w ∈ Σ∗ we have v^{-1}L^r = w^{-1}L^r iff At(v^r) = At(w^r). In fact,
v^{-1}L^r = w^{-1}L^r
iff ∀x ∈ Σ∗ : vx ∈ L^r ⇔ wx ∈ L^r
iff ∀y ∈ Σ∗ : v^r ∈ y^{-1}L ⇔ w^r ∈ y^{-1}L
iff At(v^r) = At(w^r).
In the final step, we use that the boolean algebra
BLD(L) is generated by the left derivatives of L, so two words belong to the same atom iff they belong to the same left derivatives.

(2) It follows that the map h : P(dfa(L^r)) → [BLD(L)]^op defined by
{ w_1^{-1}L^r, ..., w_n^{-1}L^r } ↦ ⋂_{i=1}^{n} (Σ∗ \ At(w_i^r))
gives a well-defined isomorphism of semilattices. It remains to prove that it is an automata morphism.

Preservation of the initial state. The initial state {L^r} of P(dfa(L^r)) is mapped to Σ∗ \ At(ε). This is the largest non-final state of BLD(L), i.e. the initial state of [BLD(L)]^op.

Preservation of final states. Recall that the final states of [BLD(L)]^op are those languages in BLD(L) not containing L. Thus,
{ w_1^{-1}L^r, ..., w_n^{-1}L^r } is final in P(dfa(L^r))
iff w_i ∈ L^r for some i
iff w_i^r ∈ L for some i
iff At(w_i^r) ⊆ L for some i
iff L ⊈ Σ∗ \ At(w_i^r) for some i
iff L ⊈ ⋂_{i=1}^{n} (Σ∗ \ At(w_i^r))
iff ⋂_{i=1}^{n} (Σ∗ \ At(w_i^r)) is final in [BLD(L)]^op.

Preservation of transitions. Since the semilattice P(dfa(L^r)) is generated by the left derivatives of L^r, it suffices to prove that for each w ∈ Σ∗ and a ∈ Σ we have the transition
h({w^{-1}L^r}) --a--> h({a^{-1}w^{-1}L^r}), i.e. Σ∗ \ At(w^r) --a--> Σ∗ \ At(a w^r)
in [BLD(L)]^op. But this is immediate because a^{-1}At(a w^r) ⊇ At(w^r).

Proof of Proposition 3.17
The proof is analogous to that of Proposition 3.16.

(1) Let At(x) denote the atom of BLRD(L) containing the word x ∈ Σ∗. For any two words v, w ∈ Σ∗ we have v ≡_{L^r} w iff At(v^r) = At(w^r). In fact,
v ≡_{L^r} w
iff ∀x, y ∈ Σ∗ : v ∈ x^{-1}L^r y^{-1} ⇔ w ∈ x^{-1}L^r y^{-1}
iff ∀s, t ∈ Σ∗ : v^r ∈ s^{-1}L t^{-1} ⇔ w^r ∈ s^{-1}L t^{-1}
iff At(v^r) = At(w^r).
In the final step, we use that the boolean algebra BLRD(L) is generated by the two-sided derivatives of L, so two words belong to the same atom iff they belong to the same two-sided derivatives.

(2) It follows that the map h : P(syn(L^r)) → [BLRD(L)]^op defined by
{ [w_1]_{L^r}, ..., [w_n]_{L^r} } ↦ ⋂_{i=1}^{n} (Σ∗ \ At(w_i^r))
gives a well-defined isomorphism of semilattices. It remains to prove that it is an automata morphism.

Preservation of the initial state. The initial state {[ε]_{L^r}} of P(syn(L^r)) is mapped to Σ∗ \ At(ε). This is the largest non-final state of BLRD(L), i.e. the initial state of [BLRD(L)]^op.

Preservation of final states. The final states of [BLRD(L)]^op are those languages in BLRD(L) not containing L. Thus,
{ [w_1]_{L^r}, ..., [w_n]_{L^r} } is final in P(syn(L^r))
iff w_i ∈ L^r for some i
iff w_i^r ∈ L for some i
iff At(w_i^r) ⊆ L for some i
iff L ⊈ Σ∗ \ At(w_i^r) for some i
iff L ⊈ ⋂_{i=1}^{n} (Σ∗ \ At(w_i^r))
iff ⋂_{i=1}^{n} (Σ∗ \ At(w_i^r)) is final in [BLRD(L)]^op.

Preservation of transitions. Since the semilattice P(syn(L^r)) is generated by the elements of syn(L^r), it suffices to prove that for each w ∈ Σ∗ and a ∈ Σ we have the transition
h({[w]_{L^r}}) --a--> h({[wa]_{L^r}}), i.e. Σ∗ \ At(w^r) --a--> Σ∗ \ At(a w^r)
in [BLRD(L)]^op. But this is immediate because a^{-1}At(a w^r) ⊇ At(w^r).

Proof of Proposition 3.19
Let A = (S, δ, s_0, F). For any K ⊆ Σ∗ we put δ_K := ⋁_{w ∈ K} δ_w.

(1) We first show that
L([ts(A)]^op, δ_K) = ⋃_{v ∈ Σ∗} L(A^op, δ_{vK}(s_0)) (v^r)^{-1} for each K ⊆ Σ∗. (A.2)
To see this, we compute for all u ∈ Σ∗:
u ∈ L([ts(A)]^op, δ_K)
iff δ_{u^r} ≰ δ_K (by Lemma 3.12(1))
iff ∃v ∈ Σ∗ : δ_{u^r}(δ_v(s_0)) ≰_S δ_K(δ_v(s_0)) (since A is reachable)
iff ∃v ∈ Σ∗ : δ_{v u^r}(s_0) ≰_S δ_{vK}(s_0)
iff ∃v ∈ Σ∗ : u v^r ∈ L(A^op, δ_{vK}(s_0)) (by Lemma 3.12(1))
iff ∃v ∈ Σ∗ : u ∈ L(A^op, δ_{vK}(s_0)) (v^r)^{-1}.

(2) For any w ∈ Σ∗, consider the two semilattice morphisms
γ_w : ts(A) → ts(A), f ↦ δ_w ∘ f,
φ_w : ts(A) → ts(A), f ↦ f ∘ δ_w,
along with their dual morphisms γ_w^*, φ_w^* : [ts(A)]^op → [ts(A)]^op. We claim that
L([ts(A)]^op, δ_K)(w^r)^{-1} = L([ts(A)]^op, φ_w^*(δ_K)) for each K ⊆ Σ∗. (A.3)
To see this, we compute as follows for all u ∈ Σ∗, where ≤ is the order of the semilattice JSL(S, S):
u ∈ L([ts(A)]^op, φ_w^*(δ_K))
iff id_S ≰ (γ_{u^r})^*(φ_w^*(δ_K)) (def. L(−, −))
iff id_S ≰ (φ_w ∘ γ_{u^r})^*(δ_K)
iff φ_w ∘ γ_{u^r}(id_S) ≰ δ_K (by adjointness)
iff δ_{w u^r} ≰ δ_K
iff γ_{w u^r}(id_S) ≰ δ_K
iff id_S ≰ (γ_{w u^r})^*(δ_K) (by adjointness)
iff u w^r ∈ L([ts(A)]^op, δ_K) (def. L(−, −))
iff u ∈ L([ts(A)]^op, δ_K)(w^r)^{-1}.

(3) We are ready to prove the proposition. Since both [ts(A)]^op and rdc(A^op) are simple JSL-dfas, and thus can be viewed as subautomata of Fin(L), it suffices to show that they contain the same languages. The inclusion [ts(A)]^op ⊆ rdc(A^op) follows from (A.2). For the reverse inclusion, since [ts(A)]^op is closed under right derivatives by (A.3), we only need to prove that A^op ⊆ [ts(A)]^op. To this end, we show that, for any s ∈ S,
L(A^op, s) = L([ts(A)]^op, δ_K), where K = { w ∈ Σ∗ : δ_w(s_0) ≤_S s }.
For the proof, we first note that for all u ∈ Σ∗,
δ_u(s_0) ≤_S s ⇔ ∀v ∈ Σ∗ : δ_{vu}(s_0) ≤_S δ_{vK}(s_0). (A.4)
In fact, "⇐" follows by taking v = ε; we have s = δ_K(s_0) because A is reachable. For "⇒", suppose that δ_u(s_0) ≤_S s. Then u ∈ K and therefore
δ_{vu}(s_0) ≤_S ⋁_{w ∈ K} δ_{vw}(s_0) = δ_{vK}(s_0).
We now compute
u ∈ L(A^op, s)
iff δ_{u^r}(s_0) ≰_S s (by Lemma 3.12(1))
iff ∃v ∈ Σ∗ : δ_{v u^r}(s_0) ≰_S δ_{vK}(s_0) (by (A.4))
iff ∃v ∈ Σ∗ : u v^r ∈ L(A^op, δ_{vK}(s_0)) (by Lemma 3.12(1))
iff ∃v ∈ Σ∗ : u ∈ L(A^op, δ_{vK}(s_0)) (v^r)^{-1}
iff u ∈ L([ts(A)]^op, δ_K) (by (A.2)).
This concludes the proof.

Proof of Theorem 4.7
Let $d(L)$ denote the least degree of any boolean representation extending the canonical representation $\kappa_L \circ \mu_L$.

(1) A boolean presentation of $\Sigma^*$ is given by a finite semilattice $S$ together with a family of semilattice morphisms $\delta = (\delta_a : S \to S)_{a \in \Sigma}$. An equivariant map between boolean presentations $(S, \delta)$ and $(S', \delta')$ is a semilattice morphism $h : S \to S'$ with $\delta'_a \circ h = h \circ \delta_a$ for all $a \in \Sigma$. If $S$ carries a $\mathsf{JSL}$-automata structure $(S, \delta, i, f)$ and $h$ is monic, there exists an automata structure on $S'$ making $h$ an automata morphism: put $i' := h \circ i$, and choose $f' : S' \to 2$ with $f' \circ h = f$. Such an $f'$ exists because the semilattice $2$ is an injective object of $\mathsf{JSL}$.
\[
\begin{array}{ccccccc}
2 & \xrightarrow{\;i\;} & S & \xrightarrow{\;\delta_a\;} & S & \xrightarrow{\;f\;} & 2 \\
 & {\scriptstyle i'}\searrow & \downarrow{\scriptstyle h} & & \downarrow{\scriptstyle h} & \nearrow{\scriptstyle f'} & \\
 & & S' & \xrightarrow{\;\delta'_a\;} & S' & &
\end{array}
\]

(2) To prove $d(L) \le \mathrm{ns}(L)$, suppose that $N$ is an nfa accepting the language $L$. Consider the $\mathsf{JSL}$-subautomaton $\mathsf{langs}(N) = \mathsf{simple}(\mathcal{P}(N))$ of $\mathsf{Fin}(L)$ carried by the semilattice of all languages accepted by subsets of $N$. Note that $\mathsf{SLD}(L)$ is a subautomaton of $\mathsf{langs}(N)$: every finite union $\bigcup_i w_i^{-1} L$ of left derivatives of $L$ is accepted by the set of all states of $N$ reachable on input $w_i$ for some $i$. Thus, the inclusion map $\mathsf{SLD}(L) \hookrightarrow \mathsf{langs}(N)$ defines an extension of the canonical representation $\kappa_L \circ \mu_L$. Since the semilattice $\mathsf{langs}(N)$ is generated by the set of languages accepted by single states of $N$, it follows that the degree of this representation is at most the number of states of $N$.

(3) To prove $\mathrm{ns}(L) \le d(L)$, suppose that $(S, \delta)$ is a boolean representation of $\Sigma^*$ of degree $k$ extending $\kappa_L \circ \mu_L$, witnessed by an injective equivariant map $h : \mathsf{SLD}(L) \hookrightarrow S$. By part (1), we can equip $S$ with a $\mathsf{JSL}$-dfa structure making $h$ an automata morphism. Since morphisms preserve accepted languages, it follows that $S$ accepts $L$. The automaton $S$ has $k$ join-irreducibles, so Remark 3.4 shows that there exists an nfa on $k$ states accepting $L$.

Proof of Theorem 4.12
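As an aside on the counting argument in parts (2) and (3) of the proof of Theorem 4.7: the degree used there is the number of join-irreducibles of the carrying semilattice, and it can never exceed the size of a generating set. The following Python sketch illustrates this on a toy model. It assumes (this is not from the paper) that a finite semilattice is given concretely as a union-closed family of finite sets with the empty set as bottom; the helper names `join_closure` and `join_irreducibles` are hypothetical.

```python
def join_closure(generators):
    """Close a family of finite sets under finite unions.

    The empty union contributes the bottom element (the empty set).
    """
    closed = {frozenset()}
    for g in map(frozenset, generators):
        # After this step, `closed` holds all unions of the generators seen so far.
        closed |= {x | g for x in closed}
    return closed


def join_irreducibles(semilattice):
    """Non-bottom elements that are not the union of the elements strictly below them.

    Assumption: the bottom element of `semilattice` is the empty set.
    """
    result = set()
    for x in semilattice:
        below = [y for y in semilattice if y < x]  # `<` is proper inclusion
        if x and frozenset().union(*below) != x:
            result.add(x)
    return result


# Toy stand-in for langs(N): three generating "state languages", one redundant.
generators = [{"a"}, {"b"}, {"a", "b"}]
S = join_closure(generators)
degree = len(join_irreducibles(S))
```

Here the third generator is the join of the other two, so only two join-irreducibles remain although three generators were supplied; this mirrors the bound "degree at most the number of states" in part (2).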
Remark A.1.
The subset construction, restricted to dfas, gives rise to a left adjoint $\mathcal{P} : \mathsf{Aut}(\mathsf{Set}_f) \to \mathsf{Aut}(\mathsf{JSL}_f)$ between the categories of dfas and $\mathsf{JSL}$-dfas. Thus, for any dfa $D$ and any $\mathsf{JSL}$-dfa $A$, there is a bijective correspondence between dfa morphisms from $D$ to $A$ and $\mathsf{JSL}$-dfa morphisms from $\mathcal{P}(D)$ to $A$.

Our proof of Theorem 4.12 is essentially an instance of the self-duality of $\mathsf{JSL}$-dfas. Let $L$ be the language accepted by $N$. We establish the theorem by showing that each of the following statements is equivalent to the next one:
(1) $N$ is atomic.
(2) There exists a $\mathsf{JSL}$-automata morphism from $\mathcal{P}(N)$ to $\mathsf{BLD}(L)$.
(3) There exists a $\mathsf{JSL}$-automata morphism from $\mathcal{P}(\mathsf{dfa}(L^r))$ to $\mathcal{P}(N^r)$.
(4) There exists a dfa morphism from $\mathsf{dfa}(L^r)$ to $\mathcal{P}(N^r)$.
(5) There exists a dfa morphism from $\mathsf{dfa}(L^r)$ to $\mathsf{rsc}(N^r)$.
(6) $\mathsf{rsc}(N^r)$ is a minimal dfa.

Ad (1) ⇔ (2). The unique automata morphism $m_{\mathcal{P}(N)} : \mathcal{P}(N) \to \mathsf{Fin}(L)$ maps every state of $\mathcal{P}(N)$ to the language it accepts. Thus, $N$ is atomic iff $m_{\mathcal{P}(N)}$ factorizes through the subautomaton $\mathsf{BLD}(L)$ of $\mathsf{Fin}(L)$.

Ad (2) ⇔ (3). This follows via duality from Lemma 3.11, Lemma 3.12 and Proposition 3.16.
Ad (3) ⇔ (4). This follows from Remark A.1.
Ad (4) ⇔ (5). Since $\mathsf{dfa}(L^r)$ is a reachable dfa, every dfa morphism from $\mathsf{dfa}(L^r)$ to $\mathcal{P}(N^r)$ factorizes through the dfa-reachable part $\mathsf{rsc}(N^r)$ of $\mathcal{P}(N^r)$.

Ad (5) ⇔ (6). Every dfa morphism from $\mathsf{dfa}(L^r)$ to $\mathsf{rsc}(N^r)$ is an isomorphism: it is injective because $\mathsf{dfa}(L^r)$ is a simple dfa and surjective because $\mathsf{rsc}(N^r)$ is a reachable dfa. Conversely, if $\mathsf{rsc}(N^r)$ is a minimal dfa, then it is isomorphic to $\mathsf{dfa}(L^r)$ by the uniqueness of minimal dfas.

Proof of Theorem 4.13
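A computational aside on Theorem 4.12: the equivalence (1) ⇔ (6) established above reduces atomicity of an nfa N to minimality of the dfa rsc(N^r). The Python sketch below tests this criterion on small examples. The dictionary encoding of nfas and all function names are assumptions made for illustration, and minimality is decided by a plain Moore-style partition refinement rather than by any procedure from the paper.

```python
def reverse_nfa(nfa):
    """Reverse an nfa: flip every transition, swap initial and final states."""
    delta = {}
    for (p, a), targets in nfa["delta"].items():
        for q in targets:
            delta.setdefault((q, a), set()).add(p)
    return {"alphabet": nfa["alphabet"], "delta": delta,
            "initial": set(nfa["final"]), "final": set(nfa["initial"])}


def rsc(nfa):
    """Reachable subset construction; returns (states, delta, finals) of a dfa."""
    start = frozenset(nfa["initial"])
    states, todo, delta = {start}, [start], {}
    while todo:
        X = todo.pop()
        for a in nfa["alphabet"]:
            Y = frozenset(q for p in X for q in nfa["delta"].get((p, a), ()))
            delta[(X, a)] = Y
            if Y not in states:
                states.add(Y)
                todo.append(Y)
    finals = {X for X in states if X & nfa["final"]}
    return states, delta, finals


def is_minimal(states, delta, finals, alphabet):
    """Moore refinement: a reachable dfa is minimal iff no two states are equivalent."""
    block = {X: int(X in finals) for X in states}
    while True:
        # Signature of a state: its own block plus the blocks of its successors.
        sigs = {X: (block[X],) + tuple(block[delta[(X, a)]] for a in alphabet)
                for X in states}
        ids = {sig: i for i, sig in enumerate(sorted(set(sigs.values())))}
        if len(ids) == len(set(block.values())):  # partition is stable
            return len(ids) == len(states)
        block = {X: ids[sigs[X]] for X in states}


def is_atomic(nfa):
    """Criterion (1) <=> (6): N is atomic iff rsc(N^r) is a minimal dfa."""
    states, delta, finals = rsc(reverse_nfa(nfa))
    return is_minimal(states, delta, finals, nfa["alphabet"])


# A reachable two-state dfa for a* (viewed as an nfa), and its reversal.
D = {"alphabet": "a", "delta": {(0, "a"): {1}, (1, "a"): {0}},
     "initial": {0}, "final": {0, 1}}
N = reverse_nfa(D)
```

In this example `is_atomic(D)` holds, whereas `is_atomic(N)` fails: rsc(N^r) is the two-state dfa D itself, which is not minimal for a*.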
Let us first recall the concept of algebraic language recognition [33].
Remark A.2.
A finite monoid $M$ is said to recognize the language $L \subseteq \Sigma^*$ if there exists a monoid morphism $h : \Sigma^* \to M$ and a subset $P \subseteq M$ with $L = h^{-1}[P]$. Regular languages are exactly the languages recognizable by finite monoids. In fact, we have the following connections between monoids and dfas:

(1) If $L$ is recognized by a finite monoid $M$ via $h : \Sigma^* \to M$ and $P \subseteq M$, then $M$ can be viewed as a dfa accepting $L$, with transitions $m \xrightarrow{a} m \bullet h(a)$ for $m \in M$ and $a \in \Sigma$, initial state $1_M$, and final states $P$.

(2) Conversely, if $L$ is accepted by a dfa $D = (S, \delta, s_0, F)$, then the transition monoid $\mathsf{tm}(D)$ recognizes $L$ via the morphism $h : \Sigma^* \twoheadrightarrow \mathsf{tm}(D)$, $w \mapsto \delta_w$, and $P = \{ \delta_w : w \in L \}$. In particular, the syntactic monoid recognizes $L$ via the syntactic morphism $\mu_L : \Sigma^* \twoheadrightarrow \mathsf{syn}(L)$. It can be characterized as the least quotient monoid of $\Sigma^*$ recognizing $L$: for any surjective monoid morphism $h : \Sigma^* \twoheadrightarrow M$ recognizing $L$, there is a unique morphism $g : M \twoheadrightarrow \mathsf{syn}(L)$ with $\mu_L = g \circ h$:
\[
\begin{array}{ccc}
 & \Sigma^* & \\
{\scriptstyle h}\swarrow & & \searrow{\scriptstyle \mu_L} \\
M & \overset{g}{\dashrightarrow} & \mathsf{syn}(L)
\end{array}
\]

(3) Finally, there is a tight connection between morphisms of monoids and dfas. Suppose that two surjective monoid morphisms $h_i : \Sigma^* \twoheadrightarrow M_i$ and subsets $P_i \subseteq M_i$ for $i = 1, 2$ are given, and view $M_1$ and $M_2$ as dfas. Then every dfa morphism $g : M_1 \to M_2$ makes the triangle below commute:
\[
\begin{array}{ccc}
 & \Sigma^* & \\
{\scriptstyle h_1}\swarrow & & \searrow{\scriptstyle h_2} \\
M_1 & \overset{g}{\dashrightarrow} & M_2
\end{array}
\]
In fact, $M_1$ and $M_2$ accept the same language $L$ and $\Sigma^*$ can be seen as the initial dfa accepting $L$ when equipped with $L \subseteq \Sigma^*$ as the set of final states. From the surjectivity of $h_1$ it easily follows that $g$ is a monoid morphism. Conversely, every monoid morphism $g$ making the above triangle commute and satisfying $g[P_1] = P_2$ is a dfa morphism.

Remark A.3.
For any $\mathsf{JSL}$-dfa $A$, the dfa-reachable part of $\mathsf{ts}(\mathsf{reach}(A))$ is $\mathsf{tm}(A_r)$, where $A_r$ denotes the dfa-reachable part of $A$. In fact, letting $\mathsf{reach}(A) = (S, \delta, s_0, F)$ and $A_r = (S_r, \delta_r, s_{0,r}, F_r)$, we have that $A_r$ is a sub-dfa of $\mathsf{reach}(A)$. Then the map $(\delta_r)_w \mapsto \delta_w$ gives a well-defined injective dfa morphism from $\mathsf{tm}(A_r)$ to $\mathsf{ts}(\mathsf{reach}(A))$, using that the semilattice $S$ is generated by the subset $S_r \subseteq S$. Thus, $\mathsf{tm}(A_r)$ is a sub-dfa of $\mathsf{ts}(\mathsf{reach}(A))$. Since it is reachable, it is isomorphic to the dfa-reachable part of $\mathsf{ts}(\mathsf{reach}(A))$.

With these preparations, we are ready to prove Theorem 4.13. Again, the argument crucially rests on the self-duality of $\mathsf{JSL}$-dfas. We show that each of the following statements is equivalent to the next one:
(1) $N$ is subatomic.
(2) There exists a $\mathsf{JSL}$-dfa morphism from $\mathcal{P}(N)$ to $\mathsf{BLRD}(L)$.
(3) There exists a $\mathsf{JSL}$-dfa morphism from $\mathsf{rdc}(\mathsf{simple}(\mathcal{P}(N)))$ to $\mathsf{BLRD}(L)$.
(4) There exists a $\mathsf{JSL}$-dfa morphism from $\mathcal{P}(\mathsf{syn}(L^r))$ to $\mathsf{ts}(\mathsf{reach}(\mathcal{P}(N^r)))$.
(5) There exists a dfa morphism from $\mathsf{syn}(L^r)$ to $\mathsf{ts}(\mathsf{reach}(\mathcal{P}(N^r)))$.
(6) There exists a dfa morphism from $\mathsf{syn}(L^r)$ to $\mathsf{tm}(\mathsf{rsc}(N^r))$.
(7) The monoids $\mathsf{syn}(L^r)$ and $\mathsf{tm}(\mathsf{rsc}(N^r))$ are isomorphic.

Ad (1) ⇔ (2). The unique automata morphism $m_{\mathcal{P}(N)} : \mathcal{P}(N) \to \mathsf{Fin}(L)$ maps every state of $\mathcal{P}(N)$ to the language it accepts. Thus, $N$ is subatomic iff $m_{\mathcal{P}(N)}$ factorizes through the subautomaton $\mathsf{BLRD}(L)$ of $\mathsf{Fin}(L)$.

Ad (2) ⇔ (3). This is clear since $\mathsf{BLRD}(L)$ is closed under right derivatives.

Ad (3) ⇔ (4). This follows via duality from Lemma 3.11, Proposition 3.17 and Proposition 3.19.
Ad (4) ⇔ (5). This follows from Remark A.1.
Ad (5) ⇔ (6). Putting $A = \mathcal{P}(N^r)$ in Remark A.3, we see that $\mathsf{tm}(\mathsf{rsc}(N^r))$ is the dfa-reachable part of $\mathsf{ts}(\mathsf{reach}(\mathcal{P}(N^r)))$. Since $\mathsf{syn}(L^r)$ is reachable as a dfa, it follows that every dfa morphism into $\mathsf{ts}(\mathsf{reach}(\mathcal{P}(N^r)))$ factorizes through $\mathsf{tm}(\mathsf{rsc}(N^r))$.

Ad (6) ⇒ (7). Let $q_{N^r} : \Sigma^* \twoheadrightarrow \mathsf{tm}(\mathsf{rsc}(N^r))$ denote the canonical monoid morphism mapping $w \in \Sigma^*$ to the transition morphism $\delta_w$ of the dfa $\mathsf{rsc}(N^r)$. Note that the dfa structure of $\mathsf{tm}(\mathsf{rsc}(N^r))$ is precisely the one induced by $q_{N^r}$. Thus, given a dfa morphism $h : \mathsf{syn}(L^r) \to \mathsf{tm}(\mathsf{rsc}(N^r))$ we know that the following diagram commutes by initiality, see Remark A.2(3):
\[
\begin{array}{ccc}
 & \Sigma^* & \\
{\scriptstyle \mu_{L^r}}\swarrow & & \searrow{\scriptstyle q_{N^r}} \\
\mathsf{syn}(L^r) & \overset{h}{\dashrightarrow} & \mathsf{tm}(\mathsf{rsc}(N^r))
\end{array}
\tag{A.5}
\]
Then $h$ is necessarily a monoid morphism because $\mu_{L^r}$ is surjective. Since $q_{N^r}$ recognizes the language $L^r$, we get a unique monoid morphism $g : \mathsf{tm}(\mathsf{rsc}(N^r)) \to \mathsf{syn}(L^r)$ with $g \circ q_{N^r} = \mu_{L^r}$. It follows that $h$ is an isomorphism with $h^{-1} = g$.

Ad (7) ⇒ (6). Suppose that the monoids $\mathsf{syn}(L^r)$ and $\mathsf{tm}(\mathsf{rsc}(N^r))$ are isomorphic. Let again $g : \mathsf{tm}(\mathsf{rsc}(N^r)) \to \mathsf{syn}(L^r)$ be the unique monoid morphism with $g \circ q_{N^r} = \mu_{L^r}$. Then $g$ is surjective because $\mu_{L^r}$ is. Since $\mathsf{syn}(L^r)$ and $\mathsf{tm}(\mathsf{rsc}(N^r))$ have the same number of elements, it follows that $g$ is also injective, i.e. an isomorphism of monoids. Then Remark A.2(3) shows that its inverse $g^{-1} : \mathsf{syn}(L^r) \to \mathsf{tm}(\mathsf{rsc}(N^r))$ is a dfa morphism.

Proof of Theorem 4.14
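A computational aside on Theorem 4.13: by (1) ⇔ (7), an nfa N is subatomic iff tm(rsc(N^r)) and syn(L^r) are isomorphic, and the argument for (7) ⇒ (6) shows that this holds exactly when the two monoids have the same number of elements. The Python sketch below compares these sizes, computing syn(L^r) as the transition monoid of the minimal dfa for L^r. The dictionary encoding of nfas and all function names are assumptions made for illustration; the monoids are enumerated by brute force, so this is only meant for small automata.

```python
def reverse_nfa(nfa):
    """Reverse an nfa: flip every transition, swap initial and final states."""
    delta = {}
    for (p, a), targets in nfa["delta"].items():
        for q in targets:
            delta.setdefault((q, a), set()).add(p)
    return {"alphabet": nfa["alphabet"], "delta": delta,
            "initial": set(nfa["final"]), "final": set(nfa["initial"])}


def rsc(nfa):
    """Reachable subset construction; returns (states, delta, finals) of a dfa."""
    start = frozenset(nfa["initial"])
    states, todo, delta = {start}, [start], {}
    while todo:
        X = todo.pop()
        for a in nfa["alphabet"]:
            Y = frozenset(q for p in X for q in nfa["delta"].get((p, a), ()))
            delta[(X, a)] = Y
            if Y not in states:
                states.add(Y)
                todo.append(Y)
    finals = {X for X in states if X & nfa["final"]}
    return states, delta, finals


def nerode_classes(states, delta, finals, alphabet):
    """Moore refinement; maps each state to the id of its equivalence class."""
    block = {X: int(X in finals) for X in states}
    while True:
        sigs = {X: (block[X],) + tuple(block[delta[(X, a)]] for a in alphabet)
                for X in states}
        ids = {sig: i for i, sig in enumerate(sorted(set(sigs.values())))}
        refined = {X: ids[sigs[X]] for X in states}
        if len(ids) == len(set(block.values())):  # partition is stable
            return refined
        block = refined


def transition_monoid_size(states, delta, alphabet):
    """Number of distinct state maps induced by words (including the empty word)."""
    order = list(states)
    index = {X: i for i, X in enumerate(order)}
    gens = [tuple(index[delta[(X, a)]] for X in order) for a in alphabet]
    identity = tuple(range(len(order)))
    seen, todo = {identity}, [identity]
    while todo:
        f = todo.pop()
        for g in gens:
            h = tuple(g[j] for j in f)  # first apply f, then the letter
            if h not in seen:
                seen.add(h)
                todo.append(h)
    return len(seen)


def is_subatomic(nfa):
    """Criterion (1) <=> (7), tested by comparing |tm(rsc(N^r))| and |syn(L^r)|."""
    alphabet = nfa["alphabet"]
    states, delta, finals = rsc(reverse_nfa(nfa))
    cls = nerode_classes(states, delta, finals, alphabet)
    # Quotient of rsc(N^r) by state equivalence: the minimal dfa for L^r.
    min_delta = {(cls[X], a): cls[delta[(X, a)]] for X in states for a in alphabet}
    return (transition_monoid_size(states, delta, alphabet)
            == transition_monoid_size(set(cls.values()), min_delta, alphabet))


# A two-state dfa for a*, its reversal, and an nfa for words ending in a.
D = {"alphabet": "a", "delta": {(0, "a"): {1}, (1, "a"): {0}},
     "initial": {0}, "final": {0, 1}}
N = reverse_nfa(D)
M = {"alphabet": "ab", "delta": {(0, "a"): {0, 1}, (0, "b"): {0}},
     "initial": {0}, "final": {1}}
```

In these examples `D` and `M` pass the test, while `N` does not: tm(rsc(N^r)) is the two-element cyclic group, whereas the syntactic monoid of a* is trivial.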
Let $a(L)$ denote the least number of states of any subatomic nfa accepting $L$. We are to prove $a(L) = \mathrm{n}\mu(L)$.

(1) To prove $\mathrm{n}\mu(L) \le a(L)$, suppose that $N$ is a subatomic nfa accepting the language $L$. Consider the subsemilattice $\mathsf{langs}(N) = \mathsf{simple}(\mathcal{P}(N))$ of $\mathsf{Fin}(L)$ of all languages accepted by subsets of $N$. We claim that
\[
\rho : \mathsf{syn}(L) \to \mathsf{JSL}(\mathsf{langs}(N), \mathsf{langs}(N)), \quad [w]_L \mapsto \lambda K.\, w^{-1} K
\]
is a boolean representation of $\mathsf{syn}(L)$ extending the canonical one. This is obvious once we prove $\rho$ to be a well-defined map, i.e. $v \equiv_L w$ implies $v^{-1} K = w^{-1} K$ for $v, w \in \Sigma^*$ and $K \in \mathsf{langs}(N)$. Since $K \in \mathsf{BLRD}(L)$, the boolean algebra generated by all two-sided derivatives of $L$, and derivatives commute with all set-theoretic boolean operations, we can assume w.l.o.g. that $K = s^{-1} L t^{-1}$ for some $s, t \in \Sigma^*$. Then, for all $x \in \Sigma^*$,
\[
\begin{array}{lll}
x \in v^{-1} K & \iff x \in v^{-1} s^{-1} L t^{-1} & \\
 & \iff svxt \in L & \\
 & \iff swxt \in L & \text{since } v \equiv_L w \\
 & \iff x \in w^{-1} s^{-1} L t^{-1} & \\
 & \iff x \in w^{-1} K, &
\end{array}
\]
proving that $v^{-1} K = w^{-1} K$, as required. Since the semilattice $\mathsf{langs}(N)$ is generated by the set of languages accepted by single states of $N$, it follows that $\deg(\rho)$ is at most the number of states of $N$.

(2) To prove $a(L) \le \mathrm{n}\mu(L)$, let $\rho : \mathsf{syn}(L) \to \mathsf{JSL}(S, S)$ be a boolean representation of $\mathsf{syn}(L)$ extending the canonical one. Then $\rho \circ \mu_L : \Sigma^* \to \mathsf{JSL}(S, S)$ extends the canonical presentation $\kappa_L \circ \mu_L$ of $\Sigma^*$, and so, as in the proof of Theorem 4.7, we can equip $S$ with the structure of a $\mathsf{JSL}$-dfa $A = (S, \delta, i, f)$ accepting $L$. Its extended transition morphism for $w \in \Sigma^*$ is given by
\[
\delta_w : S \to S, \quad s \mapsto \rho([w]_L)(s).
\]
In particular, $v \equiv_L w$ implies $\delta_v = \delta_w$, which shows that every state of $A$ accepts a union of syntactic congruence classes of $L$. Since
\[
[w]_L = \bigcap_{xwy \in L} x^{-1} L y^{-1} \;\cap\; \bigcap_{xwy \notin L} \overline{x^{-1} L y^{-1}},
\]
it follows that all languages accepted by states of $A$ lie in $\mathsf{BLRD}(L)$.
Therefore, the nfa $N$ of join-irreducibles of $A$ (see Remark 3.4) is a subatomic nfa with $\deg(\rho)$ states accepting $L$.

Proof of Theorem 5.2

(1) Suppose that $\mathsf{syn}(L) = \mathsf{tm}(\mathsf{dfa}(L))$ is cyclic. Then there exists $w_0 \in \Sigma^*$ such that the map $\lambda X.\, w_0^{-1} X : \mathsf{LD}(L) \to \mathsf{LD}(L)$ generates $\mathsf{tm}(\mathsf{dfa}(L))$. We claim that, for all $K, M \subseteq \Sigma^*$,
\[
K^{-1} L = M^{-1} L \quad \text{iff} \quad [\, \forall n \in \mathbb{N} : w_0^n \in K^{-1} L \iff w_0^n \in M^{-1} L \,]. \tag{A.6}
\]
The "only if" direction is trivial. For the converse, suppose that $K^{-1} L \ne M^{-1} L$. W.l.o.g. we may assume that there exists $w \in K^{-1} L \setminus M^{-1} L$. Choose $i_1, \ldots, i_k$ and $j_1, \ldots, j_m$ such that $K^{-1} L = \bigcup_{p=1}^{k} (w_0^{i_p})^{-1} L$ and $M^{-1} L = \bigcup_{r=1}^{m} (w_0^{j_r})^{-1} L$. Moreover, choose $n \in \mathbb{N}$ such that $w^{-1} L = (w_0^n)^{-1} L$. Then we have $w \in (w_0^{i_p})^{-1} L$ for some $p$ and thus $w_0^{i_p} \in w^{-1} L = (w_0^n)^{-1} L$, using that $\mathsf{tm}(\mathsf{dfa}(L))$ is a commutative monoid. Thus, $w_0^n \in (w_0^{i_p})^{-1} L \subseteq K^{-1} L$. On the other hand, we have $w \notin (w_0^{j_r})^{-1} L$ for all $r$, so the same argument shows that $w_0^n \notin M^{-1} L$.

(2) Fix an alphabet $\Sigma_0 = \{ a_0 \}$ disjoint from $\Sigma$ and consider the unary language
\[
L_0 := \{ a_0^n : n \in \mathbb{N},\ w_0^n \in L \} \subseteq \Sigma_0^*.
\]
Let $g : \Sigma_0^* \to \Sigma^*$ be the monoid morphism where $g(a_0) := w_0$. We claim that the following map is a $\mathsf{JSL}$-isomorphism:
\[
f : \mathsf{SLD}(L_0) \xrightarrow{\;\cong\;} \mathsf{SLD}(L), \quad f(X^{-1} L_0) := g[X]^{-1} L.
\]
To see that $f$ is well-defined and injective, we prove for all $X, Y \subseteq \Sigma_0^*$:
\[
X^{-1} L_0 = Y^{-1} L_0 \quad \text{iff} \quad g[X]^{-1} L = g[Y]^{-1} L.
\]
In fact, we have
\[
\begin{array}{ll}
 & X^{-1} L_0 = Y^{-1} L_0 \\
\text{iff} & \forall n \in \mathbb{N} : a_0^n \in X^{-1} L_0 \iff a_0^n \in Y^{-1} L_0 \\
\text{iff} & \forall n \in \mathbb{N} : [\exists a_0^k \in X : a_0^{n+k} \in L_0] \iff [\exists a_0^m \in Y : a_0^{n+m} \in L_0] \\
\text{iff} & \forall n \in \mathbb{N} : [\exists a_0^k \in X : w_0^{n+k} \in L] \iff [\exists a_0^m \in Y : w_0^{n+m} \in L] \\
\text{iff} & \forall n \in \mathbb{N} : [\exists a_0^k \in X : w_0^n \in (g(a_0)^k)^{-1} L] \iff [\exists a_0^m \in Y : w_0^n \in (g(a_0)^m)^{-1} L] \\
\text{iff} & \forall n \in \mathbb{N} : w_0^n \in g[X]^{-1} L \iff w_0^n \in g[Y]^{-1} L \\
\text{iff} & g[X]^{-1} L = g[Y]^{-1} L
\end{array}
\]
where the final step uses (A.6). This proves $f$ to be well-defined and injective. Moreover, it immediately follows from the definition that $f$ is surjective and preserves finite unions.

(3) For each $a \in \Sigma$ choose $n_a \in \mathbb{N}$ such that $a^{-1} K = (w_0^{n_a})^{-1} K$ for all $K \in \mathsf{LD}(L)$. The respective transition endomorphisms of the $\mathsf{JSL}$-automata $\mathsf{SLD}(L_0)$ and $\mathsf{SLD}(L)$ determine each other in the sense that the following diagrams commute:
\[
\begin{array}{ccc}
\mathsf{SLD}(L_0) & \xrightarrow[\cong]{\;f\;} & \mathsf{SLD}(L) \\
\downarrow{\scriptstyle a_0^{-1}(-)} & & \downarrow{\scriptstyle w_0^{-1}(-)} \\
\mathsf{SLD}(L_0) & \xrightarrow[\cong]{\;f\;} & \mathsf{SLD}(L)
\end{array}
\qquad
\begin{array}{ccc}
\mathsf{SLD}(L_0) & \xrightarrow[\cong]{\;f\;} & \mathsf{SLD}(L) \\
\downarrow{\scriptstyle (a_0^{n_a})^{-1}(-)} & & \downarrow{\scriptstyle a^{-1}(-)} \\
\mathsf{SLD}(L_0) & \xrightarrow[\cong]{\;f\;} & \mathsf{SLD}(L)
\end{array}
\]
It follows that extensions of the canonical representations $\kappa_{L_0}$ and $\kappa_{L_0} \circ \mu_{L_0}$ correspond uniquely to extensions of the canonical representations $\kappa_L$ and $\kappa_L \circ \mu_L$, respectively. Therefore, $\mathrm{ns}(L_0) = \mathrm{ns}(L)$ by Theorem 4.7 and $\mathrm{n}\mu(L_0) = \mathrm{n}\mu(L)$ by Theorem 4.14. Moreover, from Example 5.1 we know that $\mathrm{ns}(L_0) = \mathrm{n}\mu(L_0)$, and so $\mathrm{ns}(L) = \mathrm{n}\mu(L)$ as claimed.

Details for Example 5.7
We prove that the map $\theta$ gives an nfa isomorphism from $N_L$ to $(N_{L^r})^r$. Note first that if $\theta(u^{-1} L) = v^{-1} L^r$, we have
\[
u^{-1} L \subseteq X \iff v^r \in X \quad \text{for } X \in \mathsf{LD}(L).
\]
In fact,
\[
\begin{array}{lll}
u^{-1} L \subseteq X & \iff X \nsubseteq \tau(u^{-1} L) & \text{def. } \tau \\
 & \iff X \nsubseteq \mathsf{dr}_L(v^{-1} L^r) & \tau = \mathsf{dr}_L \circ \theta \\
 & \iff \mathsf{DR}_L(X, v^{-1} L^r) & \text{by Theorem 3.15} \\
 & \iff v^r \in X & \text{def. } \mathsf{DR}_L
\end{array}
\]
With this preparation, we verify that $\theta$ satisfies the properties of an nfa morphism:

Preservation of initial and final states. Let $u^{-1} L \in J(\mathsf{SLD}(L))$ and $\theta(u^{-1} L) = v^{-1} L^r$. Then
\[
u^{-1} L \subseteq L \iff v^r \in L \iff v \in L^r \iff \varepsilon \in v^{-1} L^r.
\]
A symmetric argument, exchanging the roles of $L$ and $L^r$, shows that $\varepsilon \in u^{-1} L \iff v^{-1} L^r \subseteq L^r$. Thus, the state $u^{-1} L$ is initial/final in $N_L$ iff $v^{-1} L^r$ is initial/final in $(N_{L^r})^r$.

Preservation of transitions. Let $u_1^{-1} L, u_2^{-1} L \in J(\mathsf{SLD}(L))$ and $\theta(u_1^{-1} L) = v_1^{-1} L^r$, $\theta(u_2^{-1} L) = v_2^{-1} L^r$. For each $a \in \Sigma$, we need to show that there is a transition $u_1^{-1} L \xrightarrow{a} u_2^{-1} L$ in $N_L$ iff there is a transition $v_1^{-1} L^r \xrightarrow{a} v_2^{-1} L^r$ in $(N_{L^r})^r$. In fact:
\[
\begin{array}{ll}
u_2^{-1} L \subseteq (u_1 a)^{-1} L & \iff v_2^r \in (u_1 a)^{-1} L \\
 & \iff u_1 a v_2^r \in L \\
 & \iff v_2 a u_1^r \in L^r \\
 & \iff u_1^r \in (v_2 a)^{-1} L^r \\
 & \iff v_1^{-1} L^r \subseteq (v_2 a)^{-1} L^r
\end{array}
\]