A Lower Bound for Primality of Finite Languages
Philip Sieder
19th February 2019
Abstract
A regular language $L$ is said to be prime if it is not the product of two non-trivial languages. Martens et al. settled the exact complexity of deciding primality for deterministic finite automata in 2010. For finite languages, Mateescu et al. and Wieczorek suspect the NP-completeness of primality, but no actual bounds are given. Using the techniques of Martens et al., we prove the NP lower bound and give a $\Pi^P_2$ upper bound for deciding primality of finite languages given as deterministic finite automata.
1 Introduction

Coming from number theory, the primality of regular languages is a quite natural problem. As integers have a unique prime factorisation, one could hope to decompose languages into indecomposable (and therefore possibly simpler) languages. Unfortunately the decompositions of languages do not behave as nicely as those of numbers. A language, if decomposable, can have different decompositions. The number of prime factors is not unique, and different decompositions need not have any prime factors in common [MSY98, Section 4]. Therefore the most interesting question is whether a language can be decomposed at all, or in other words whether a language is prime.

As in number theory, the complexity of a primality test (for regular languages) was pinpointed relatively recently. Martens et al. [MNS10] showed that the problem is PSPACE-complete. For finite languages in particular, there are pursuits by Mateescu et al. [MSY98] and Wieczorek [Wie10], but, besides an NP-completeness conjecture, no actual bounds have been given. Using the ideas of Martens et al., we prove an NP lower bound and a $\Pi^P_2$ upper bound for the problem. So again languages behave considerably worse than numbers, where primality can be tested in polynomial time.

In Section 2 we establish the notation and give definitions for the general language theoretical facts we need. In Section 3 we give some insight on the necessary properties for studying primality of regular languages. Those enable the proof of the $\Pi^P_2$ upper bound at the end of the section. Section 4 provides the NP-hardness by establishing a chain of polynomial time reductions, similar to the one in the proof of Martens, Niewerth and Schwentick. In the final Section 5, we give a brief compilation of what is yet to be determined.

2 Preliminaries

In this section we will introduce the basic concepts and notations. We omit the facts about complexity classes and polynomial time reduction. For those concepts and definitions we refer to Papadimitriou's book [Pap94]. First let us fix some general symbols:
Notation.
$[a, b] := \{ m \in \mathbb{Z} \mid a \le m \le b \}$ with $a, b \in \mathbb{Z}$ integers.
$n\mathbb{Z} := \{ n \cdot m \mid m \in \mathbb{Z} \}$ with $n \in \mathbb{Z}$ an integer.
For a computational decision problem PROBLEM, ¬PROBLEM describes the same problem with negated answer.

Now we will introduce the most important concepts about regular languages and finite automata we use. Since this part is mostly to fix the notation, we do not give much explanation or motivation and the definitions might have minor inaccuracies. For a more thorough understanding of those concepts we refer to the book of Hopcroft et al. [HMRU00].
Definition 1.
A (finite) alphabet is a finite set $\Sigma$ of letters. A word $w = a_1 \dots a_n$ is a finite sequence of letters $a_i \in \Sigma$ and $|w| = |a_1 \dots a_n| := n$ is the length of the word. The empty word (of length zero) is written as $\varepsilon$. For two words $v = a_1 \dots a_m$ and $w = b_1 \dots b_n$, $v \circ w := vw := a_1 \dots a_m b_1 \dots b_n$ describes the concatenation of the two words $v$ and $w$.
The Kleene closure of $\Sigma$ is $\Sigma^* := \bigcup_{n \ge 0} \Sigma^n$ where $\Sigma^n$ denotes the set of all words over the alphabet $\Sigma$ with length $n$. Additionally $\Sigma^+ := \bigcup_{n \ge 1} \Sigma^n$ is the set of all words with positive length. A language $L \subseteq \Sigma^*$ is a set of words. A finite language is a language containing only finitely many words. For two languages $L_1$ and $L_2$ over an alphabet $\Sigma$, the term $L_1 \circ L_2 := L_1 L_2 := \{ vw \in \Sigma^* \mid v \in L_1 \text{ and } w \in L_2 \}$ describes the product (or concatenation) of the two languages.

Definition 2 (finite automaton).
A nondeterministic finite automaton (NFA) $M$ is a tuple $(Q, \Sigma, \delta, I, F)$ where $Q$ is a finite set of states, $\Sigma$ is a finite alphabet, $\delta : Q \times \Sigma \to 2^Q$ is the transition function, $I \subseteq Q$ is the set of initial states and $F \subseteq Q$ is the set of accepting states. The automaton is called a deterministic finite automaton (DFA) if $|I| = 1$ and for all $q \in Q$ and all $a \in \Sigma$ the inequality $|\delta(q, a)| \le 1$ holds.

Remark.
In this paper, if not explicitly mentioned otherwise, an "automaton" is a DFA.
We allow $\delta(q, a) = \emptyset$ for DFAs to simplify their specification. To get a model where $\delta$ is a total function one only has to add a sink state $g$ such that $\delta(q, a)$ equals $\{g\}$ instead of $\emptyset$ and $\delta(g, a) = \{g\}$ for all $a \in \Sigma$. When a transition function is defined in this paper, a pair $(q, a) \in Q \times \Sigma$ that is not listed means $\delta(q, a) = \emptyset$. Furthermore, if $\delta(q, a) = \{q'\}$ is a singleton, we write $\delta(q, a) = q'$.

Notation.
Let $(Q, \Sigma, \delta, I, F)$ be an NFA, $S \subseteq Q$, $w \in \Sigma^*$ and $a \in \Sigma$. Then we define
• $\delta(S, a) := \bigcup_{q \in S} \delta(q, a)$
• $\delta(S, w)$ inductively as $\delta(S, \varepsilon) := S$ and $\delta(S, aw) := \delta(\delta(S, a), w)$ (the states reached from $S$ after reading $w$)
• $\delta^*(S, w)$ inductively as $\delta^*(S, \varepsilon) := \emptyset$ and $\delta^*(S, aw) := \delta(S, a) \cup \delta^*(\delta(S, a), w)$ (all states visited from $S$ by reading $w$)
If $S = \{q\}$ is a singleton, we write $\delta(q, w)$ and $\delta^*(q, w)$.

Definition 3.
Let $M = (Q, \Sigma, \delta, I, F)$ be an NFA.
The language $L(M) := \{ w \in \Sigma^* \mid \delta(I, w) \cap F \ne \emptyset \}$ is the language defined by $M$.
A language $L \subseteq \Sigma^*$ is called regular if there is an NFA $M$ such that $L = L(M)$.

Remark.
Every regular language $L$ has a DFA $M$ such that $L = L(M)$.

Corollary 4.
Every finite language is regular.
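To make the conventions of this section concrete, here is a minimal Python sketch of a partial DFA in the sense of Definition 2 and the remark after it. The class name `DFA`, its method names and the example automaton are illustrative assumptions and not part of the paper; a missing pair `(q, a)` in the dictionary plays the role of $\delta(q, a) = \emptyset$.

```python
from typing import Dict, Hashable, Iterable, Optional, Tuple

State = Hashable
Letter = str


class DFA:
    """Partial DFA (Q, Sigma, delta, {s}, F).

    A pair (q, a) missing from `delta` stands for delta(q, a) being empty,
    so no explicit sink state is needed.
    """

    def __init__(self, delta: Dict[Tuple[State, Letter], State],
                 start: State, accepting: Iterable[State]) -> None:
        self.delta = dict(delta)
        self.start = start
        self.accepting = frozenset(accepting)

    def run(self, state: State, word: str) -> Optional[State]:
        """Return delta(state, word), or None if the run gets stuck."""
        for letter in word:
            if (state, letter) not in self.delta:
                return None
            state = self.delta[(state, letter)]
        return state

    def accepts(self, word: str) -> bool:
        """w is in L(M) iff delta(s, w) is an accepting state."""
        end = self.run(self.start, word)
        return end is not None and end in self.accepting


# The finite language {a, ab} as a three-state DFA: 0 -a-> 1 -b-> 2.
example = DFA({(0, "a"): 1, (1, "b"): 2}, start=0, accepting={1, 2})
```

With this representation, `example.accepts("ab")` holds while `example.accepts("b")` does not, because the run on `b` gets stuck in the start state.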
3 An introduction to primality of regular languages

In this section we give the definitions, important properties and known results about the primality of regular and finite languages. First, we start with a definition of primality.
Definition 5 (Primality).
A regular language $L \subseteq \Sigma^*$ is called decomposable if there are languages $L_1, L_2 \subseteq \Sigma^*$ with $L_1 \ne \{\varepsilon\} \ne L_2$ such that $L = L_1 \circ L_2$. If $L$ is not decomposable it is called prime.

Remark.
As we see in Theorem 11, it makes no difference whether we require $L_1$ and $L_2$ to be regular languages.

The definition leads to the following decision problem:

Problem 6.
Primality_regular
Input: A regular language $L$ over a finite alphabet $\Sigma$ given as a DFA
Question: Is $L$ prime?

The exact complexity of this problem was determined relatively recently:

Theorem 7 ([MNS10, Corollary 6.10]).
Primality_regular is PSPACE-complete.

For finite languages the exact complexity of the problem is not yet known. To the best of our knowledge, the NP-hardness, which we prove in Theorem 14, was not known before. Let us start with a definition of the problem.

Problem 8.
Primality_finite
Input: A finite language $L$ over a finite alphabet $\Sigma$ given as a DFA
Question: Is $L$ prime?

The problem was examined before: The paper of Mateescu et al. [MSY98] establishes some notions, gives general results and treats examples. They suspect NP-completeness for Primality_finite, but only give a double exponential algorithm [MSY98, Theorem 3.1 and below]. Wieczorek [Wie10] takes a less theoretical approach, as he offers an optimised deterministic algorithm for a finite language given as a list. If the finite language is given as a list of words, the primality problem is obviously in coNP: One guesses a split into two parts for every word and checks whether all combinations of a first part of one word and a second part of another word are again in the given language. As the description of a finite language as a list can be exponentially larger than the corresponding DFA (for instance the language of all words of a specific length), the algorithm is not useful for our problem.

To check for primality of a language $L$, one has to consider whether there are languages $L_1$ and $L_2$ that decompose $L = L_1 L_2$. Because we have to work with the DFA of $L$, we should examine the states in which the words actually get split. That leads to the following definition and results:

Definition 9.
Let $L$ be a regular language, given as a DFA $M = (Q, \Sigma, \delta, \{s\}, F)$ and $P \subseteq Q$ a set of states. We call $P$ a partition set and define the regular languages
$L_1^P := \{ w \in \Sigma^* \mid \delta(s, w) \in P \}$ and $L_2^P := \bigcap_{p \in P} \{ w \in \Sigma^* \mid \delta(p, w) \in F \}$.

Remark.
The languages $L_1^P$ and $L_2^P$ are regular because $(Q, \Sigma, \delta, \{s\}, P)$ is an automaton for $L_1^P$ and $(Q, \Sigma, \delta, \{p\}, F)$ is an automaton for $\{ w \in \Sigma^* \mid \delta(p, w) \in F \}$ and an intersection of regular languages is regular again [HMRU00, Section 4.2].

Lemma 10.
Let $L$ be a regular language given as a DFA $M = (Q, \Sigma, \delta, \{s\}, F)$ and let $P \subseteq Q$ be any subset, then $L_1^P L_2^P \subseteq L$.
Proof. Let $w_1 w_2 \in L_1^P L_2^P$ with $w_i \in L_i^P$, then $\delta(s, w_1) \in P$ by the definition of $L_1^P$ and therefore $\delta(s, w_1 w_2) = \delta(\delta(s, w_1), w_2) \in F$ by the definition of $L_2^P$.

Theorem 11 ([MSY98, Lemma 3.1]).
Let $L$ be a regular language, given as a DFA $M = (Q, \Sigma, \delta, \{s\}, F)$, let $L = L_1 L_2$ be a decomposition of $L$ and let $P := \{ q \in Q \mid q = \delta(s, w) \text{ for some } w \in L_1 \}$ be the set of "border" states. Then $L_1 \subseteq L_1^P$, $L_2 \subseteq L_2^P$ and $L = L_1^P L_2^P$ is a decomposition of $L$ into two regular languages.
Proof.
$L_1 \subseteq L_1^P$: Let $w \in L_1$, then $\delta(s, w) \in P$ and therefore $w \in L_1^P$.
$L_2 \subseteq L_2^P$: Suppose $w \in L_2 \setminus L_2^P$, that means $w \in L_2$ and there is a $p \in P$ such that $\delta(p, w) \notin F$. Let $v \in L_1$ such that $\delta(s, v) = p$. Then $vw \notin L$, because $\delta(s, vw) = \delta(\delta(s, v), w) = \delta(p, w) \notin F$, but at the same time $vw \in L_1 L_2 = L$. That contradicts the existence of $w \in L_2 \setminus L_2^P$.
$L = L_1^P L_2^P$: The inclusion $L \subseteq L_1^P L_2^P$ follows directly from $L = L_1 L_2$ and $L_i \subseteq L_i^P$ for $i \in \{1, 2\}$. The other inclusion was given in Lemma 10.

The theorem enables us to limit our search for decompositions to the ones that arise from this construction. The problem is, after guessing a partition set $P$, to actually check whether $L \subseteq L_1^P L_2^P$. Unfortunately the intersection of $O(n)$ sets and the concatenation of two languages is not efficient, as both can lead to an exponential blow-up of the number of states.

We do not use the following theorem from Wieczorek [Wie10], which is included for readers interested in further research. It allows one to reduce the states that have to be considered for $P$, but a reduction beyond $O(n)$ is neither obvious nor likely.

Theorem 12 ([Wie10, Theorem 3]).
Let $L$ be a decomposable finite language with a minimal DFA $M = (Q, \Sigma, \delta, \{s\}, F)$. Then there is a partition set $P$ with $L = L_1^P L_2^P$ such that for all $p \in P$ either $|\{ a \in \Sigma \mid \delta(p, a) \ne \emptyset \}| > 1$ or $(p \in F) \wedge (\exists w \in \Sigma^* : \delta(p, w) \in F)$ holds.

Unfortunately we did not close the gap between the NP lower and the $\Pi^P_2$ upper bound. But let us at least provide a proof for the $\Pi^P_2$ upper bound:

Proposition 13.
Primality_finite is in $\Pi^P_2$.
Proof. The definitions for the polynomial hierarchy can be found in Papadimitriou's book [Pap94, Section 17.2]. We will argue that ¬Primality_finite is in $\Sigma^P_2$ by the characterisation of [Pap94, Chapter 17, Corollary 2]:
¬Primality_finite $= \{ L = L(Q, \Sigma, \delta, I, F) \mid \exists P \subseteq Q \ \forall w \in L : (L, P, w) \in R \}$, where $(L, P, w) \in R :\iff w \in L_1^P L_2^P$.
Using Theorem 11 and Lemma 10, the right side is a characterisation of ¬Primality_finite. We have to check that the relation $R$ is polynomial-time decidable and is polynomially balanced. For a finite language $L$ let $M = (Q, \Sigma, \delta, \{s\}, F)$ be the DFA of $L$ and $n$ its size.
The relation is polynomial-time decidable: One simulates $M$ on the input $w$ and stores the set $P_w := \delta^*(s, w) \cap P$ (counting the start state $s$ itself if $s \in P$) and the remaining characters of $w$ (when reaching $p \in P_w$) in $W \subseteq \Sigma^*$. If $P_w = \emptyset$, we reject. Otherwise we simulate for all $v \in W$ and all $p \in P$ the automaton $M_p := (Q, \Sigma, \delta, \{p\}, F)$ on $v$. If there is at least one $v$ such that $v \in L(M_p)$ for all $p \in P$, as required by the intersection in the definition of $L_2^P$, we accept, or else we reject. So the test takes at most time $O(n + n^2 \cdot n)$.
The relation is polynomially balanced as well, since the partition set has at most $n$ elements and $w$ has length at most $n - 1$ ($M$ is acyclic since the language is finite).
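The partition-set machinery of Definition 9, Lemma 10 and Theorem 11, together with the relation $R$ from the proof above, can be tried out on small automata. The following Python sketch is an illustration only: all function names are assumptions, the DFA is a plain dictionary from `(state, letter)` to `state`, the language is assumed to be non-empty, and the search over all partition sets is deliberately naive (exponential in $|Q|$), so it says nothing about the complexity bounds.

```python
from itertools import combinations


def run(delta, state, word):
    """delta(state, word) for a partial DFA; None if the run gets stuck."""
    for letter in word:
        state = delta.get((state, letter))
        if state is None:
            return None
    return state


def language(delta, start, accepting):
    """All words accepted from `start`; terminates only for acyclic DFAs."""
    words, stack = set(), [(start, "")]
    while stack:
        state, word = stack.pop()
        if state in accepting:
            words.add(word)
        stack.extend((target, word + a)
                     for (q, a), target in delta.items() if q == state)
    return words


def partition_languages(delta, start, accepting, P):
    """L_1^P (words leading from s into P) and L_2^P (accepted from every p in P)."""
    left = language(delta, start, P)
    right = set.intersection(*(language(delta, p, accepting) for p in P))
    return left, right


def in_product(delta, start, accepting, P, word):
    """Relation R: is `word` in L_1^P L_2^P?  Tries every split word = u v."""
    for cut in range(len(word) + 1):
        u, v = word[:cut], word[cut:]
        if run(delta, start, u) in P and all(
                run(delta, p, v) in accepting for p in P):
            return True
    return False


def find_decomposition(delta, start, accepting, states):
    """Search all non-empty partition sets P (exponential in |Q|)."""
    words = language(delta, start, accepting)
    for size in range(1, len(states) + 1):
        for subset in combinations(sorted(states), size):
            P = frozenset(subset)
            left, right = partition_languages(delta, start, accepting, P)
            if left == {""} or right == {""}:
                continue  # a factor {epsilon} is not allowed
            if all(in_product(delta, start, accepting, P, w) for w in words):
                return left, right
    return None


# {aa, ab, ba, bb} = {a, b}{a, b} is decomposable, {a, bb} is prime.
square = {(0, "a"): 1, (0, "b"): 1, (1, "a"): 2, (1, "b"): 2}
prime = {(0, "a"): 1, (0, "b"): 2, (2, "b"): 3}
```

For `square` with accepting state 2, the partition set $P = \{1\}$ yields the factors $\{a, b\}$ and $\{a, b\}$. For `prime` with accepting states 1 and 3, no partition set passes the test, which matches a direct check that $\{a, bb\}$ has no non-trivial decomposition.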
4 NP-hardness of Primality_finite

In this section we prove the following main theorem of the paper:
Theorem 14.
Primality_finite is NP-hard (for languages given as DFAs).

We will start with the NP-complete problem SquareTiling_edge and build the following chain of polynomial reductions:
NP ≤ SquareTiling_edge ≤ SquareTiling_rel ≤ ¬ConcatenationEquivalence_finite ≤ Primality_finite

The chain is actually quite similar to the one in the work of Martens et al. [MNS10, Sections 5.2 and 6.2]. They reference a different form of tiling and use a special case of concatenation equivalence.

4.1 From SquareTiling_edge to SquareTiling_rel

We start with a tiling problem whose complexity is stated in the book of Garey and Johnson [GJ79]. Then we will adapt the problem to a better fitting variant.
Problem 15.
SquareTiling_edge
Input: A set of colours $C$, a set of tiles $T \subseteq C^4$ and a natural number $n \le |C|$; a tile $(a, b, c, d) \in T$ has four edges with corresponding colours
Question: Is there a tiling, i.e. an $n \times n$ square $A \in T^{n \times n}$ of tiles, such that all adjacent tiles $A(i, j) = (a, b, c, d)$ and $A(i, j+1) = (\alpha, \beta, \gamma, \delta)$ resp. $A(i+1, j) = (\tilde{a}, \tilde{b}, \tilde{c}, \tilde{d})$ fulfil $b = \delta$ resp. $c = \tilde{a}$?

Proposition 16 ([GJ79, GP13]).
SquareTiling_edge is NP-complete.

Problem 17.
SquareTiling_rel
Input: A set of tiles $\Theta$, relations $V, H \subseteq \Theta \times \Theta$ and a natural number $n \in \mathbb{N}$
Question: Is there a tiling, i.e. an $n \times n$ square $T \in \Theta^{n \times n}$, such that adjacent tiles are in the horizontal relation $H$ resp. the vertical relation $V$:
$\forall i \ \forall j < n : (T(i, j), T(i, j+1)) \in H$
$\forall i < n \ \forall j : (T(i, j), T(i+1, j)) \in V$

Remark.
Alternatively we write $T(i \cdot n + j) := T(i, j)$ and get a list where $(T(m), T(m+1)) \in H$ for $1 \le m < n^2 \wedge m \notin n\mathbb{Z}$ and $(T(m), T(m+n)) \in V$ for $1 \le m \le n^2 - n$ has to be fulfilled.

Proposition 18.
SquareTiling_rel is NP-hard.
Proof. Let $C$, $T$ and $n$ be an input for SquareTiling_edge. Let
$H := \{ ((a, b, c, d), (\alpha, \beta, \gamma, \delta)) \in T \times T \mid b = \delta \}$,
$V := \{ ((a, b, c, d), (\alpha, \beta, \gamma, \delta)) \in T \times T \mid c = \alpha \}$
and $\Theta := T$. Then there is a tiling for SquareTiling_rel$(\Theta, H, V, n)$ if and only if there is one for SquareTiling_edge$(C, T, n)$. The construction of $\Theta$, $H$ and $V$ obviously works in polynomial time.

Remark.
One can translate SquareTiling_rel to SquareTiling_edge as well, as outlined in a paper of van Emde Boas [vEB97, p. 7].

Footnote (to the source cited for Proposition 16): The source only mentions that the directed Hamiltonian path problem is reduced to SquareTiling_edge. To give the interested reader a basis for the proof: $n$ is the number of vertices, the series of vertices in the Hamiltonian path is written on the diagonal of the $n \times n$ square and the corresponding edges are on the diagonals above and below the main diagonal.
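The construction in the proof of Proposition 18 is small enough to be checked by brute force on tiny inputs. The Python sketch below is an illustration under assumptions: tiles are 4-tuples $(a, b, c, d)$ as in Problem 15, squares are indexed from 0 instead of 1, all function names are invented for this example, and the search enumerates every $n \times n$ square, so it is exponential and only meant for very small instances.

```python
from itertools import product


def edge_to_rel(tiles):
    """Proposition 18: Theta := T, H := {b = delta}, V := {c = alpha}."""
    theta = set(tiles)
    H = {(x, y) for x in theta for y in theta if x[1] == y[3]}
    V = {(x, y) for x in theta for y in theta if x[2] == y[0]}
    return theta, H, V


def squares(theta, n):
    """All n x n squares over theta, as dicts (i, j) -> tile with 0-based indices."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    for choice in product(sorted(theta), repeat=n * n):
        yield dict(zip(cells, choice))


def has_rel_tiling(theta, H, V, n):
    """SquareTiling_rel by exhaustive search."""
    return any(
        all((T[i, j], T[i, j + 1]) in H for i in range(n) for j in range(n - 1))
        and all((T[i, j], T[i + 1, j]) in V for i in range(n - 1) for j in range(n))
        for T in squares(theta, n))


def has_edge_tiling(tiles, n):
    """SquareTiling_edge by exhaustive search, using b = delta and c = a~ directly."""
    return any(
        all(A[i, j][1] == A[i, j + 1][3] for i in range(n) for j in range(n - 1))
        and all(A[i, j][2] == A[i + 1, j][0] for i in range(n - 1) for j in range(n))
        for A in squares(tiles, n))


# Three tiny tile sets over the colours {0, 1}.
single = {(0, 1, 0, 1)}                    # fits next to and below itself
broken = {(0, 1, 1, 0)}                    # fits nowhere
checker = {(0, 1, 1, 0), (1, 0, 0, 1)}     # only a checkerboard pattern works
```

On all three tile sets the two searches agree for $n = 2$: `single` and `checker` admit a tiling, `broken` does not.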
4.2 From SquareTiling_rel to ConcatenationEquivalence_finite

This is the most interesting reduction in the chain. Here a truly original idea, not present in the proof of Martens et al. [MNS10], is necessary.
Problem 19.
ConcatenationEquivalence_finite
Input: Finite languages $L_1$, $L_2$ and $L_3$ over a finite alphabet $\Sigma$ given as DFAs
Question: Does $L_3 = L_1 L_2$ hold?

Now we will reduce SquareTiling_rel to ¬ConcatenationEquivalence_finite. The most interesting point, compared to the regular language case, is that we work over the alphabet $\Theta \times [1, n^2]$ instead of just $\Theta$. This allows us, for a word in $L_1 L_2$, to detect the point where we jump from $L_1$ to $L_2$.

Proposition 20.
ConcatenationEquivalence_finite is coNP-complete.
Proof. It is obviously in coNP, since a word in $L_3 \setminus L_1 L_2$ or in $L_1 L_2 \setminus L_3$ is a witness for $L_3 \ne L_1 L_2$ and the longest words to consider have length linear in the size of the given DFAs, as the languages are finite.
So we get to the coNP-hardness. Suppose we can solve ConcatenationEquivalence_finite. Let $n$, $\Theta$ and $V, H \subseteq \Theta \times \Theta$ be an input for SquareTiling_rel. We define
$L_1 := \{ (t_1, 1)(t_2, 2) \dots (t_m, m) \in (\Theta \times [1, n^2])^* \mid m \le n^2 - 1 \}$,
$L_2 := \bigcup_{1 \le m \le n^2,\ m \notin n\mathbb{Z}} \{ (t_m, m)(t_{m+1}, m+1) \dots (t_{n^2}, n^2) \in (\Theta \times [1, n^2])^* \mid (t_m, t_{m+1}) \notin H \}$
$\qquad \cup \bigcup_{1 \le m \le n^2 - n} \{ (t_m, m)(t_{m+1}, m+1) \dots (t_{n^2}, n^2) \in (\Theta \times [1, n^2])^* \mid (t_m, t_{m+n}) \notin V \}$
and
$L_3 := L_1 L_2 \cup \{ (t_1, 1)(t_2, 2) \dots (t_{n^2}, n^2) \in (\Theta \times [1, n^2])^{n^2} \}$.
The DFAs of the defined languages have size polynomial in the size of the input and can be constructed in polynomial time as shown below.
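For very small inputs the three languages just defined can be written out as explicit word sets, which makes it possible to test the claim of Lemma 21 (that $L_3 = L_1 L_2$ holds exactly when there is no legal tiling) directly. The Python sketch below is such a sanity check, not the polynomial automaton construction of the proof: words are tuples of `(tile, number)` letters, positions follow the list numbering $1, \dots, n^2$ of the remark after Problem 17, and all function names are illustrative assumptions.

```python
from itertools import product


def numbered_words(theta, first, last):
    """All words (t_first, first) ... (t_last, last); {()} if last == first - 1."""
    numbers = range(first, last + 1)
    return {tuple(zip(ts, numbers))
            for ts in product(sorted(theta), repeat=len(numbers))}


def build_languages(theta, H, V, n):
    """Return (L3, L1 L2, set of all properly numbered tiling words)."""
    N = n * n
    # L1: properly numbered prefixes of length m <= n^2 - 1 (including epsilon).
    L1 = set().union(*(numbered_words(theta, 1, m) for m in range(N)))
    # L2: suffixes starting at position m with a fault directly at position m.
    L2 = set()
    for m in range(1, N + 1):
        for word in numbered_words(theta, m, N):
            t = [tile for tile, _ in word]  # t[k] is the tile t_{m+k}
            h_fault = m % n != 0 and (t[0], t[1]) not in H
            v_fault = m <= N - n and (t[0], t[n]) not in V
            if h_fault or v_fault:
                L2.add(word)
    tilings = numbered_words(theta, 1, N)
    L1L2 = {u + v for u in L1 for v in L2}
    return L1L2 | tilings, L1L2, tilings


def is_legal(word, H, V, n):
    """Check the list conditions on T(m) for a properly numbered tiling word."""
    t = [None] + [tile for tile, _ in word]  # 1-based: t[m] is T(m)
    N = n * n
    return (all((t[m], t[m + 1]) in H for m in range(1, N) if m % n != 0)
            and all((t[m], t[m + n]) in V for m in range(1, N - n + 1)))


def lemma_21_agrees(theta, H, V, n):
    """True iff (L3 == L1 L2) coincides with 'there is no legal tiling'."""
    L3, L1L2, tilings = build_languages(theta, H, V, n)
    no_legal_tiling = not any(is_legal(w, H, V, n) for w in tilings)
    return (L3 == L1L2) == no_legal_tiling


theta = {"x", "y"}
full = {(s, t) for s in theta for t in theta}   # every pair is allowed
checker = {("x", "y"), ("y", "x")}              # only alternating tiles
```

For $n = 2$ and $H = \emptyset$ every tiling word already has a horizontal fault at position 1, so the two languages coincide. With `full` or `checker` as both relations a legal tiling exists and its word lies in $L_3$ but not in $L_1 L_2$.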
The automaton for $L_1$ is pretty simple and has $n^2$ states, one for each word length $0, \dots, n^2 - 1$:

[Figure: a chain of states $0, 1, \dots, n^2 - 1$ with start state $0$, where the transition from state $m - 1$ to state $m$ is labelled $\Theta \times \{m\}$.]

The automaton for $L_2$ is more complicated and depends on $V$ and $H$, but it is polynomial in size. We will give two automata $M_V$ and $M_H$ with polynomial sizes such that
$L(M_H) = \bigcup_{1 \le m \le n^2,\ m \notin n\mathbb{Z}} \{ (t_m, m)(t_{m+1}, m+1) \dots (t_{n^2}, n^2) \in (\Theta \times [1, n^2])^* \mid (t_m, t_{m+1}) \notin H \}$
and
$L(M_V) = \bigcup_{1 \le m \le n^2 - n} \{ (t_m, m)(t_{m+1}, m+1) \dots (t_{n^2}, n^2) \in (\Theta \times [1, n^2])^* \mid (t_m, t_{m+n}) \notin V \}$.
Obviously $L_2 = L(M_H \cup M_V)$ and the union automaton still has polynomial size [Yu97, Proof of Theorem 2.1].

The automaton $M_H$ is constructed as follows: The set of states is
$Q_H := \{ s_H \} \mathbin{\dot\cup} \{ \varsigma_{t,m} \mid t \in \Theta, m \in [1, n^2] \setminus n\mathbb{Z} \} \mathbin{\dot\cup} \{ \varsigma_i \mid i \in [2, n^2] \}$.
The automaton has to check for a word $(t_1, m_1)(t_2, m_2) \dots (t_k, m_k)$ whether $(t_1, t_2) \notin H$ and whether $m_{i+1} = m_i + 1$ for all $i$. So after reading the first letter the state has to store $t_1$ and every state has to store the most recent $m_i$. Therefore after the first character we go to the corresponding state $\varsigma_{t_1, m_1}$. If the next character fulfils both $(t_1, t_2) \notin H$ and $m_2 = m_1 + 1$, we only have to check $m_{i+1} = m_i + 1$ from then on. Hence we only store the most recent $m_i$, by going to the state $\varsigma_{m_i}$. Once we get to $\varsigma_{n^2}$ we accept. If otherwise there was any mistake we stop the run at that point.
Here is a formal definition of the transition function $\delta_H$:
$\delta_H(s_H, (t, m)) := \varsigma_{t,m}$ for $1 \le m < n^2$ and $m \notin n\mathbb{Z}$
$\delta_H(\varsigma_{t,m}, (t', m+1)) := \varsigma_{m+1}$ for $1 \le m < n^2$ and $(t, t') \notin H$
$\delta_H(\varsigma_m, (t, m+1)) := \varsigma_{m+1}$ for $1 < m < n^2$
The automaton is then defined as $M_H := (Q_H, \Theta \times [1, n^2], \delta_H, \{ s_H \}, \{ \varsigma_{n^2} \})$ and has $|\Theta| \cdot (n^2 - n) + n^2$ states, counting the start state.

The automaton $M_V$ is quite similar. The only difference is that we have to check for a word $(t_1, m_1)(t_2, m_2) \dots (t_k, m_k)$ whether $(t_1, t_{n+1}) \notin V$. Therefore we need the additional states $\sigma_{t,m,o}$, where $o$ stores how many characters away from $(t_1, m_1)$ we already are. Hence we get the following set of states
$Q_V := \{ s_V \} \mathbin{\dot\cup} \{ \sigma_{t,m,o} \mid t \in \Theta, m \in [1, n^2 - n], o \in [0, n - 1] \} \mathbin{\dot\cup} \{ \sigma_i \mid i \in [n + 1, n^2] \}$,
the transition function
$\delta_V(s_V, (t, m)) := \sigma_{t,m,0}$ for $1 \le m \le n^2 - n$
$\delta_V(\sigma_{t,m,i}, (t', m+i+1)) := \sigma_{t,m,i+1}$ for $0 \le i < n - 1$
$\delta_V(\sigma_{t,m,n-1}, (t', m+n)) := \sigma_{m+n}$ for $1 \le m \le n^2 - n$ and $(t, t') \notin V$
$\delta_V(\sigma_m, (t, m+1)) := \sigma_{m+1}$ for $n + 1 \le m < n^2$
and finally the DFA is given as $M_V := (Q_V, \Theta \times [1, n^2], \delta_V, \{ s_V \}, \{ \sigma_{n^2} \})$ with $|\Theta| \cdot (n^2 - n) \cdot n + n^2 - n + 1$ states, again counting the start state.

So at last we have to show that
$L_3 = L_1 L_2 \cup \{ (t_1, 1)(t_2, 2) \dots (t_{n^2}, n^2) \in (\Theta \times [1, n^2])^{n^2} \}$
has a polynomial-sized automaton. Using this description, we might get an exponential blow-up from the concatenation, but (with the help of the $[1, n^2]$ part of the alphabet) the language can be characterised a bit differently. It basically contains all properly numbered tilings and additionally those with one jump and a forbidden tiling (with a fault, either vertically or horizontally, directly after the jump). An automaton for this can be constructed using a DFA $M_2 = (Q_2, \Theta \times [1, n^2], \delta_2, \{ s_2 \}, F_2)$ that accepts $L_2$.
As set of states we use $Q := (Q_2 \setminus \{ s_2 \}) \mathbin{\dot\cup} [0, n^2]$ and the transition function is as follows:
$\delta(q, (t, m)) := q + 1$ if $m = q + 1$ and $0 \le q < n^2$,
$\delta(q, (t, m)) := \delta_2(s_2, (t, m))$ if $m \ne q + 1$ and $q \in [0, n^2 - 1]$,
$\delta(q, (t, m)) := \delta_2(q, (t, m))$ if $q \in Q_2 \setminus \{ s_2 \}$.
The idea is to check for legal numbering with the states $[0, n^2]$. If there is a leap in the numbering, we jump into the automaton for $L_2$. The jump is only offered from the states $0, \dots, n^2 - 1$, because words in $L_1$ have length at most $n^2 - 1$. So the automaton is given by $M := (Q, \Theta \times [1, n^2], \delta, \{ 0 \}, \{ n^2 \} \cup F_2)$. Obviously $L(M) = L_3$ holds and $M$ has polynomial size.

Thus DFAs for $L_1$, $L_2$ and $L_3$ are constructed in polynomial time and have polynomial size in the size of the tiling problem. Combining this with the following Lemma 21 yields a reduction from SquareTiling_rel to ¬ConcatenationEquivalence_finite. That SquareTiling_rel is NP-hard (Proposition 18) completes the proof.

Lemma 21.
Let $n$, $\Theta$, $V, H \subseteq \Theta \times \Theta$ be an input for SquareTiling_rel and $L_1$, $L_2$ and $L_3$ constructed as above. Then $L_3 = L_1 L_2$ if and only if there is no legal tiling.
Proof. A word in $\{ (t_1, 1)(t_2, 2) \dots (t_{n^2}, n^2) \in (\Theta \times [1, n^2])^{n^2} \}$ can be interpreted as a tiling, where $T(j) = t_j$. Every word $w_1 w_2 \in \{ (t_1, 1)(t_2, 2) \dots (t_{n^2}, n^2) \in (\Theta \times [1, n^2])^{n^2} \} \cap L_1 L_2$ with $w_i \in L_i$ represents a tiling that violates the given relations: Let $w_2 = (t_m, m)(t_{m+1}, m+1) \dots (t_{n^2}, n^2) \in L_2$, then either $(t_m, t_{m+1}) \notin H$ or $(t_m, t_{m+n}) \notin V$, which contradicts a legal tiling.
Let $L_3 = L_1 L_2$, then $\{ (t_1, 1)(t_2, 2) \dots (t_{n^2}, n^2) \in (\Theta \times [1, n^2])^{n^2} \} \subseteq L_1 L_2$, so every possible tiling violates the relations and therefore there is no legal tiling. On the other hand, if there is no legal tiling, then every possible tiling violates a relation. Hence $\{ (t_1, 1)(t_2, 2) \dots (t_{n^2}, n^2) \in (\Theta \times [1, n^2])^{n^2} \} \subseteq L_1 L_2$, which yields $L_3 = L_1 L_2$.

4.3 From ConcatenationEquivalence_finite to Primality_finite
Theorem 14.
Primality_finite is NP-hard.

The following proof is similar to [MNS10, Proof of Theorem 6.4]. The difference is that they treat (non-finite) regular languages and therefore reduce from the problem $L_1 L_2 = \Sigma^*$ (so for them $L_3 = \Sigma^*$).

Proof of Theorem 14.
Let $L_1$, $L_2$ and $L_3$ be finite languages over the alphabet $\Sigma$ given as DFAs. We want to construct a language $A$ such that $A$ is decomposable if and only if $L_3 = L_1 L_2$, which reduces ConcatenationEquivalence_finite to ¬Primality_finite and proves the theorem by the coNP-hardness of ConcatenationEquivalence_finite (Proposition 20).
If $L_3 = \emptyset$, $L_3 = \{\varepsilon\}$, $L_1 = \emptyset$ or $L_2 = \emptyset$, then it is easy to check whether $L_3 = L_1 L_2$. So we can assume $\emptyset \ne L_3 \ne \{\varepsilon\}$ and $L_1 \ne \emptyset \ne L_2$.
Let $\Sigma' := \{ a' \mid a \in \Sigma \}$ be a disjoint copy of the alphabet and let $\$ \notin \Sigma \mathbin{\dot\cup} \Sigma'$ be an additional letter. $L_1'$ and $L_2'$ are the respective languages over $\Sigma'$.
Now we define the language
$A := L_3 \cup L_1 \$ L_2' \cup L_1' \$ L_2 \cup L_1' \$\$ L_2'$.
The language's DFA is obviously constructible in polynomial time.
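The shape of $A$ is easiest to see on explicit word sets. The Python sketch below builds $A$ together with the two candidate factors $A_1 := L_1 \cup L_1'\$$ and $A_2 := L_2 \cup \$L_2'$ that appear in Lemma 22, and compares $A_1 A_2$ with $A$. It rests on assumptions made only for this illustration: $\Sigma$ consists of lower-case letters, the primed copy $\Sigma'$ is modelled by the corresponding upper-case letters, and the function names are invented. It checks only the equality $A = A_1 A_2$, not primality of $A$.

```python
def concat(X, Y):
    """Product of two finite languages given as sets of strings."""
    return {x + y for x in X for y in Y}


def primed(language):
    """Copy of a language over the disjoint alphabet Sigma' (upper case here)."""
    return {word.upper() for word in language}


def build_A(L1, L2, L3):
    """Return A and the candidate factors A1, A2 of Lemma 22."""
    L1p, L2p = primed(L1), primed(L2)
    A = (set(L3)
         | concat(concat(L1, {"$"}), L2p)
         | concat(concat(L1p, {"$"}), L2)
         | concat(concat(L1p, {"$$"}), L2p))
    A1 = set(L1) | concat(L1p, {"$"})
    A2 = set(L2) | concat({"$"}, L2p)
    return A, A1, A2


L1 = {"a", "ab"}
L2 = {"", "b"}
product_12 = concat(L1, L2)          # {"a", "ab", "abb"}

A_equal, A1, A2 = build_A(L1, L2, product_12)
A_differ, _, _ = build_A(L1, L2, {"a", "ab"})
```

Here `concat(A1, A2) == A_equal` holds, while `A_differ` misses the word `abb` that $A_1 A_2$ contains, in line with the statement that $A = A_1 A_2$ exactly when $L_3 = L_1 L_2$.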
Lemma 22.
The language $A$ is either prime or its only non-trivial decomposition is $A_1 \circ A_2$ with $A_1 := L_1 \cup L_1' \$$ and $A_2 := L_2 \cup \$ L_2'$.

Remark. The proof of Martens et al. [MNS10, Claim 6.5] in the paper's appendix works nearly word for word. It is rather technical and adds no real value. For the sake of completeness we provide one regardless.
Proof.
Suppose $A = A_l A_r$ is a non-trivial decomposition. We first show that $A_l \subseteq \Sigma^* \cup \Sigma'^* \$$ and symmetrically $A_r \subseteq \Sigma^* \cup \$ \Sigma'^*$.
Suppose $A_l$ contains a word $w_l$ with two $\$$-letters in it or where a symbol from $\Sigma$ precedes a $\$$-sign. In both cases, for $w_l w_r$ to be in $A$, $w_r$ has to be in $\Sigma'^*$. Thus $A_r \subseteq \Sigma'^*$ and, since the decomposition is non-trivial, $A_r \supsetneq \{\varepsilon\}$. The language $L_3 \subseteq \Sigma^*$ contains at least one word $v$ of length $\ge 1$ (see premises). So we can concatenate $v \in L_3 \subseteq A_l$ (because $A_r \subseteq \Sigma'^*$ and $L_3 \subseteq A$) with a word $\varepsilon \ne w \in A_r$ and should get $vw \in A_l A_r = A$. That is a contradiction, as $A$ does not contain a word in $\Sigma^+ \Sigma'^+$.
The language $A_r$ includes at least one word $w$ containing a $\$$, because $A \supseteq L_1' \$\$ L_2' \ne \emptyset$ (we assume $L_1 \ne \emptyset \ne L_2$) and any word in $A_l$ contains at most one $\$$. If any word $v \in A_l$ includes a $\$$ not as its last character, we get a contradiction because $vw \in A_l A_r = A$ contains two $\$$-signs that are not next to each other. That cannot happen for a word in $A$.
Now we know that every word in $A_l$ contains at most one $\$$-sign and, if it contains one, the $\$$ is the last sign and is not preceded by a letter in $\Sigma$. Hence $A_l \subseteq \Sigma^* \cup \Sigma'^* \$$. Symmetrically (one can look at the reversed languages) $A_r \subseteq \Sigma^* \cup \$ \Sigma'^*$.
The intersection $A \cap \Sigma^* \$ \Sigma'^* = L_1 \$ L_2'$ together with the structures of $A_l$ and $A_r$ yields $A_l \cap \Sigma^* = L_1$ and symmetrically $A_r \cap \Sigma^* = L_2$.
Similarly $A \cap \Sigma'^* \$\$ \Sigma'^* = L_1' \$\$ L_2'$ along with the structures of $A_l$ and $A_r$ implies $A_l \cap \Sigma'^* \$ = L_1' \$$ and $A_r \cap \$ \Sigma'^* = \$ L_2'$. Thus $A_l = L_1 \cup L_1' \$$ and $A_r = L_2 \cup \$ L_2'$.

Proposition 23.
$A$ is decomposable if and only if $L_3 = L_1 L_2$.
Proof. If $A$ is decomposable, we know $A = A_1 A_2$ as defined in Lemma 22. Since $A \cap \Sigma^* = L_3$, $A_1 \cap \Sigma^* = L_1$, $A_2 \cap \Sigma^* = L_2$ and $A = A_1 A_2$, $L_3 = L_1 L_2$ holds.
If on the other hand $L_3 = L_1 L_2$, obviously $A = A_1 A_2$ as in Lemma 22.

So accordingly we get that the coNP-hard problem ConcatenationEquivalence_finite is reducible to the complement of
Primality_finite. Therefore the original problem Primality_finite is NP-hard.

5 Final remarks

There are still many open questions related to Primality_finite. Obviously the exact complexity has to be determined. Our attempts to find an NP-algorithm failed, so perhaps the lower bound has to be improved further. A coNP lower bound for primality with a list as input would strongly hint at a higher lower bound for Primality_finite. Having the input as an NFA would be yet another problem to consider. In that case basically nothing is known, since Theorem 11 is not applicable in its current form.

Aside from these variants for the input, the decomposition into three, four or generally into $m$ languages is a problem to consider (for all the input variants). A priori we do not know much about that. For lists we still get coNP for fixed $m$ by naively guessing the partition. And for DFAs we can check, for all possible decompositions into two languages, whether those languages are decomposable again. That approach clearly is not efficient.

In summary, there are still many open questions regarding the complexity of decompositions of finite languages.
Acknowledgement.
I want to thank Prof. Dr. Wim Martens for his guidance and support, Dr. Matthias Niewerth for his advice, especially for the idea to "count" the tiles, and Johannes Doleschal for proofreading.
References
[GJ79] Michael R. Garey and David S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman & Co., New York, NY, USA, 1979.
[HMRU00] John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman, Introduction to Automata Theory, Languages, and Computation, 2nd ed., Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2000.
[MNS10] Wim Martens, Matthias Niewerth, and Thomas Schwentick, Schema Design for XML Repositories: Complexity and Tractability, Proceedings of the Twenty-ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (New York, NY, USA), PODS '10, ACM, 2010, pp. 239–250.
[MSY98] Alexandru Mateescu, Arto Salomaa, and Sheng Yu, On the Decomposition of Finite Languages, Tech. report, 1998.
[Pap94] Christos H. Papadimitriou, Computational Complexity, Theoretical computer science, Addison-Wesley, 1994.
[vEB97] Peter van Emde Boas, The convenience of tilings, Lecture Notes in Pure and Applied Mathematics (1997), 331–363.
[Wie10] Wojciech Wieczorek,