A Lower Bound for Primality of Finite Languages

Abstract

A regular language $L$ is said to be prime if it is not the product of two non-trivial languages. Martens et al. settled the exact complexity of deciding primality for deterministic finite automata in 2010. For finite languages, Mateescu et al. and Wieczorek suspect the NP-completeness of primality, but no actual bounds are given. Using techniques of Martens et al., we prove the NP lower bound and give a $\Pi^P_2$ upper bound for deciding primality of finite languages given as deterministic finite automata.


Philip Sieder

19th February 2019

arXiv [cs.FL]


1 Introduction

Coming from number theory, the primality of regular languages is a quite natural problem. As integers have a unique prime factorisation, one could hope to decompose languages into indecomposable (and therefore possibly simpler) languages. Unfortunately, decompositions of languages do not behave as nicely as those of numbers. A decomposable language can have several different decompositions: neither is the number of prime factors unique, nor do different decompositions need to share prime factors [MSY98, Section 4]. Therefore the most interesting question is whether a language can be decomposed at all, or in other words whether a language is prime.

As in number theory, the complexity of a primality test (for regular languages) was pinpointed relatively recently. Martens et al. [MNS10] showed that the problem is PSPACE-complete. For finite languages in particular, there is work by Mateescu et al. [MSY98] and Wieczorek [Wie10], but, besides an NP-completeness conjecture, no actual bounds have been given. Using the ideas of Martens et al., we prove an NP lower bound and a $\Pi^P_2$ upper bound for the problem. So again languages behave much worse than numbers, where primality can be tested in polynomial time.

In Section 2 we establish the notation and give definitions for the general language-theoretical facts we need. In Section 3 we give some insight into the properties necessary for studying primality of regular languages. Those enable the proof of the $\Pi^P_2$ upper bound at the end of the section. Section 4 provides the NP-hardness by establishing a chain of polynomial-time reductions, similar to the one in the proof of Martens, Niewerth and Schwentick. In the final Section 5, we give a brief compilation of what is yet to be determined.

2 Preliminaries

In this section we introduce the basic concepts and notation. We omit the facts about complexity classes and polynomial-time reductions; for those concepts and definitions we refer to Papadimitriou's book [Pap94]. First let us fix some general symbols:

Notation.

$[a, b] := \{ m \in \mathbb{Z} \mid a \le m \le b \}$ for integers $a, b \in \mathbb{Z}$.

$n\mathbb{Z} := \{ n \cdot m \mid m \in \mathbb{Z} \}$ for an integer $n \in \mathbb{Z}$.

For a computational decision problem PROBLEM, $\neg$PROBLEM denotes the same problem with negated answer.

Now we introduce the most important concepts about regular languages and finite automata that we use. Since this part is mostly meant to fix the notation, we do not give much explanation or motivation, and the definitions might have minor inaccuracies. For a more thorough understanding of those concepts we refer to the book of Hopcroft et al. [HMRU00].

Definition 1.

A (finite) alphabet is a finite set $\Sigma$ of letters. A word $w = a_1 \dots a_n$ is a finite sequence of letters $a_i \in \Sigma$, and $|w| = |a_1 \dots a_n| := n$ is the length of the word. The empty word (of length zero) is written as $\varepsilon$. For two words $v = a_1 \dots a_m$ and $w = b_1 \dots b_n$, $v \circ w := vw := a_1 \dots a_m b_1 \dots b_n$ denotes the concatenation of the two words $v$ and $w$.

The Kleene closure of $\Sigma$ is $\Sigma^* := \bigcup_{n \ge 0} \Sigma^n$, where $\Sigma^n$ denotes the set of all words over the alphabet $\Sigma$ of length $n$. Additionally, $\Sigma^+ := \bigcup_{n \ge 1} \Sigma^n$ is the set of all words of positive length. A language $L \subseteq \Sigma^*$ is a set of words. A finite language is a language containing only finitely many words. For two languages $L_1$ and $L_2$ over an alphabet $\Sigma$, the term $L_1 \circ L_2 := L_1 L_2 := \{ vw \in \Sigma^* \mid v \in L_1 \text{ and } w \in L_2 \}$ denotes the product (or concatenation) of the two languages.

Definition 2 (finite automaton).

A nondeterministic finite automaton (NFA) $M$ is a tuple $(Q, \Sigma, \delta, I, F)$ where $Q$ is a finite set of states, $\Sigma$ is a finite alphabet, $\delta : Q \times \Sigma \to 2^Q$ is the transition function, $I \subseteq Q$ is the set of initial states and $F \subseteq Q$ is the set of accepting states. The automaton is called a deterministic finite automaton (DFA) if $|I| = 1$ and for all $q \in Q$ and all $a \in \Sigma$ the inequality $|\delta(q, a)| \le 1$ holds.

Remark.

In this paper, if not explicitly mentioned otherwise, an “automaton” is a DFA. We allow $\delta(q, a) = \emptyset$ for DFAs to simplify their specification. To get a model where $\delta$ is a total function, one only has to add a sink state $g$ such that $\delta(q, a)$ equals $\{g\}$ instead of $\emptyset$ and $\delta(g, a) = \{g\}$ for all $a \in \Sigma$. When a transition function is defined in this paper, a pair $(q, a) \in Q \times \Sigma$ that is not mentioned means $\delta(q, a) = \emptyset$. Furthermore, if $\delta(q, a) = \{q'\}$ is a singleton, we write $\delta(q, a) = q'$.

Notation.

Let $(Q, \Sigma, \delta, I, F)$ be an NFA, $S \subseteq Q$, $w \in \Sigma^*$ and $a \in \Sigma$. Then we define

• $\delta(S, a) := \bigcup_{q \in S} \delta(q, a)$
• $\delta(S, w)$ inductively as $\delta(S, aw) := \delta(\delta(S, a), w)$ (the states reached from $S$ after reading $w$)
• $\delta^*(S, w)$ inductively as $\delta^*(S, aw) := \delta(S, a) \cup \delta^*(\delta(S, a), w)$ (all states visited from $S$ by reading $w$)

If $S = \{q\}$ is a singleton, we write $\delta(q, w)$ and $\delta^*(q, w)$.

Definition 3.

Let $M = (Q, \Sigma, \delta, I, F)$ be an NFA. The language $L(M) := \{ w \in \Sigma^* \mid \delta(I, w) \cap F \ne \emptyset \}$ is the language defined by $M$. A language $L \subseteq \Sigma^*$ is called regular if there is an NFA $M$ such that $L = L(M)$.

Remark.

Every regular language $L$ has a DFA $M$ such that $L = L(M)$.

Corollary 4.

Every finite language is regular.
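As a small illustration of Definitions 1 to 3, finite languages can be modelled as Python sets of strings, and a DFA with a partial transition function as a dictionary. This is only a sketch: the names `concat` and `DFA` and the dictionary encoding of $\delta$ are assumptions made here for illustration and do not come from the paper.

```python
def concat(L1, L2):
    """Product L1 ∘ L2 of two finite languages given as sets of strings."""
    return {v + w for v in L1 for w in L2}


class DFA:
    """Partial DFA (Q, Sigma, delta, {s}, F).

    delta is a dict mapping (state, letter) to a state; a missing key
    plays the role of delta(q, a) = ∅.
    """

    def __init__(self, delta, start, accepting):
        self.delta = delta
        self.start = start
        self.accepting = set(accepting)

    def run(self, state, word):
        """Return delta(state, word), or None if the run gets stuck."""
        for letter in word:
            state = self.delta.get((state, letter))
            if state is None:
                return None
        return state

    def accepts(self, word):
        return self.run(self.start, word) in self.accepting
```

For example, `DFA({(0, "a"): 1, (1, "b"): 2}, 0, {1, 2})` describes the finite language consisting of the words `a` and `ab`.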

3 An introduction to primality of regular languages

In this section we give the definitions, important properties and known results about the primality of regular and finite languages. We start with a definition of primality.

Definition 5 (Primality).

A regular language $L \subseteq \Sigma^*$ is called decomposable if there are languages $L_1, L_2 \subseteq \Sigma^*$ with $L_1 \ne \{\varepsilon\} \ne L_2$ such that $L = L_1 \circ L_2$. If $L$ is not decomposable, it is called prime.

Remark.

As we see in Theorem 11, it makes no difference whether we require $L_1$ and $L_2$ to be regular languages.

The definition leads to the following decision problem:

Problem 6.

Primality_regular

Input: A regular language $L$ over a finite alphabet $\Sigma$, given as a DFA

Question: Is $L$ prime?

The exact complexity of this problem was determined relatively recently:

Theorem 7 ([MNS10, Corollary 6.10]).

Primality_regular is PSPACE-complete.

For finite languages the exact complexity of the problem is not yet known. To the best of our knowledge, the NP-hardness, which we prove in Theorem 14, was not known before. Let us start with a definition of the problem.

Problem 8.

Primality_finite

Input: A finite language $L$ over a finite alphabet $\Sigma$, given as a DFA

Question: Is $L$ prime?

The problem was examined before: The paper of Mateescu et al. [MSY98] establishes some notions, gives general results and treats examples. They suspect NP-completeness for Primality_finite, but only give a doubly exponential algorithm [MSY98, Theorem 3.1 and below]. Wieczorek [Wie10] takes a less theoretical approach and offers an optimised deterministic algorithm for a finite language given as a list. If the finite language is given as a list of words, the primality problem is obviously in coNP: One guesses a partition into two parts for every word and checks whether all combinations of a first part of one word and a second part of another word are again in the given language. As the description of a finite language as a list can be exponentially larger than the corresponding DFA (for instance for the language of all words of a specific length), the algorithm is not useful for our problem.

To check for primality of a language $L$, one has to consider whether there are languages $L_1$ and $L_2$ that decompose $L = L_1 L_2$. Because we have to work with the DFA of $L$, we should examine the states in which the words actually get split. That leads to the following definition and results:

Definition 9.

Let $L$ be a regular language, given as a DFA $M = (Q, \Sigma, \delta, \{s\}, F)$, and $P \subseteq Q$ a set of states. We call $P$ a partition set and define the regular languages

$L_1^P := \{ w \in \Sigma^* \mid \delta(s, w) \in P \}$ and $L_2^P := \bigcap_{p \in P} \{ w \in \Sigma^* \mid \delta(p, w) \in F \}$.

Remark.

The languages $L_1^P$ and $L_2^P$ are regular because $(Q, \Sigma, \delta, \{s\}, P)$ is an automaton for $L_1^P$, $(Q, \Sigma, \delta, \{p\}, F)$ is an automaton for $\{ w \in \Sigma^* \mid \delta(p, w) \in F \}$, and an intersection of regular languages is regular again [HMRU00, Section 4.2].

Lemma 10.

Let $L$ be a regular language given as a DFA $M = (Q, \Sigma, \delta, \{s\}, F)$ and let $P \subseteq Q$ be any subset. Then $L_1^P L_2^P \subseteq L$.

Proof. Let $w_1 w_2 \in L_1^P L_2^P$ with $w_i \in L_i^P$. Then $\delta(s, w_1) \in P$ by the definition of $L_1^P$ and therefore $\delta(s, w_1 w_2) = \delta(\delta(s, w_1), w_2) \in F$ by the definition of $L_2^P$.

Theorem 11 ([MSY98, Lemma 3.1]).

Let $L$ be a regular language, given as a DFA $M = (Q, \Sigma, \delta, \{s\}, F)$, let $L = L_1 L_2$ be a decomposition of $L$ and let $P := \{ q \in Q \mid q = \delta(s, w) \text{ for some } w \in L_1 \}$ be the set of “border” states. Then $L_1 \subseteq L_1^P$, $L_2 \subseteq L_2^P$, and $L = L_1^P L_2^P$ is a decomposition of $L$ into two regular languages.

Proof.

$L_1 \subseteq L_1^P$: Let $w \in L_1$. Then $\delta(s, w) \in P$ and therefore $w \in L_1^P$.

$L_2 \subseteq L_2^P$: Suppose $w \in L_2 \setminus L_2^P$, that is, $w \in L_2$ and there is a $p \in P$ such that $\delta(p, w) \notin F$. Let $v \in L_1$ be such that $\delta(s, v) = p$. Then $vw \notin L$, because $\delta(s, vw) = \delta(\delta(s, v), w) = \delta(p, w) \notin F$, but at the same time $vw \in L_1 L_2 = L$. That contradicts the existence of $w \in L_2 \setminus L_2^P$.

$L = L_1^P L_2^P$: The inclusion $L \subseteq L_1^P L_2^P$ follows directly from $L = L_1 L_2$ and $L_i \subseteq L_i^P$ for $i \in \{1, 2\}$. The other inclusion was given in Lemma 10.

The theorem enables us to limit our search for decompositions to the ones that arise from this construction. The problem is, after guessing a partition set $P$, to actually check whether $L \subseteq L_1^P L_2^P$. Unfortunately, neither the intersection of $O(n)$ sets nor the concatenation of two languages is efficient, as both can lead to an exponential blow-up of the number of states.

We do not use the following theorem from Wieczorek [Wie10], which is included for readers interested in further research. It allows us to reduce the states that have to be considered for $P$, but a reduction beyond $O(n)$ is neither obvious nor likely.

Theorem 12 ([Wie10, Theorem 3]).

Let $L$ be a decomposable finite language with a minimal DFA $M = (Q, \Sigma, \delta, \{s\}, F)$. Then there is a partition set $P$ with $L = L_1^P L_2^P$ such that for all $p \in P$ either $|\{ a \in \Sigma \mid \delta(p, a) \ne \emptyset \}| > 1$ or $(p \in F) \wedge (\exists w \in \Sigma^* : \delta(p, w) \in F)$ holds.

Unfortunately we did not close the gap between the NP lower and the $\Pi^P_2$ upper bound. But let us at least provide a proof for the $\Pi^P_2$ upper bound:

Proposition 13.

Primality_finite is in $\Pi^P_2$.

Proof. The definitions for the polynomial hierarchy can be found in Papadimitriou's book [Pap94, Section 17.2]. We will argue that $\neg$Primality_finite is in $\Sigma^P_2$ by the characterisation of [Pap94, Chapter 17, Corollary 2]:

$\neg\text{Primality}_{\text{finite}} = \{ L = L(Q, \Sigma, \delta, I, F) \mid \exists P \subseteq Q \ \forall w \in L : (L, P, w) \in R \}$, where $(L, P, w) \in R :\iff w \in L_1^P L_2^P$.

By Theorem 11 and Lemma 10, the right-hand side is a characterisation of $\neg$Primality_finite. We have to check that the relation $R$ is polynomial-time decidable and polynomially balanced. For a finite language $L$ let $M = (Q, \Sigma, \delta, \{s\}, F)$ be the DFA of $L$ and $n$ its size.

The relation is polynomial-time decidable: One simulates $M$ on the input $w$ and stores the set $P_w := \delta^*(s, w) \cap P$ and, whenever a state $p \in P_w$ is reached, the remaining characters of $w$ in $W \subseteq \Sigma^*$. If $P_w = \emptyset$, we reject. Otherwise we simulate, for all $v \in W$ and all $p \in P$, the automaton $M_p := (Q, \Sigma, \delta, \{p\}, F)$ on $v$. If there is at least one $v \in W$ such that $v \in L(M_p)$ for all $p \in P$, as the definition of $L_2^P$ demands, we accept; otherwise we reject. Since $|W| \le n$, $|P| \le n$ and $|v| \le n$, the test takes time at most $O(n + n \cdot n^2)$.

The relation is polynomially balanced as well, since the partition set has at most $n$ elements and $w$ has length at most $n - 1$ ($M$ is acyclic since the language is finite).
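Theorem 11 implies that a decomposition, if one exists, can be found among the pairs $(L_1^P, L_2^P)$. The following sketch makes this concrete by a naive search over all partition sets of a small acyclic DFA. It enumerates the languages explicitly, so it takes exponential time and is not the procedure from the proof above. The function names and the dictionary encoding of $\delta$ (single-character letters, missing keys meaning $\emptyset$) are assumptions made for illustration, and the exclusion of the trivial factor $\{\varepsilon\}$ is added explicitly, following Definition 5.

```python
from itertools import combinations


def words_from(delta, state, accepting):
    """All words w with delta(state, w) in accepting (the DFA must be acyclic)."""
    result = {""} if state in accepting else set()
    for (q, letter), target in delta.items():
        if q == state:
            result |= {letter + w for w in words_from(delta, target, accepting)}
    return result


def is_decomposable(states, delta, start, accepting):
    """Search all partition sets P for a non-trivial L = L_1^P L_2^P."""
    L = words_from(delta, start, accepting)
    for size in range(1, len(states) + 1):
        for P in combinations(states, size):
            # L_1^P: words that lead from the start state into P
            L1 = words_from(delta, start, set(P))
            # L_2^P: words accepted from every state of P
            L2 = set.intersection(*(words_from(delta, p, accepting) for p in P))
            if L1 != {""} and L2 != {""} and {v + w for v in L1 for w in L2} == L:
                return True
    return False
```

With this sketch, the language of all words of length two over `a` and `b` is reported as decomposable (take $P$ to be the middle state), whereas the language consisting of `ab` and `c` is reported as prime.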

4 NP-hardness of Primality_finite

In this section we prove the following main theorem of the paper:

Theorem 14.

Primality_finite is NP-hard (for languages given as DFAs).

We will start with the NP-complete problem SquareTiling_edge and build the following chain of polynomial reductions:

NP ≤ SquareTiling_edge ≤ SquareTiling_rel ≤ ¬ConcatenationEquivalence_finite ≤ Primality_finite

The chain is actually quite similar to the one in the work of Martens et al. [MNS10, Sections 5.2 and 6.2]. They reference a different form of tiling and use a special case of concatenation equivalence.

4.1 From SquareTiling_edge to SquareTiling_rel

We start with a tiling problem whose complexity is stated in the book of Garey and Johnson [GJ79]. Then we will adapt the problem to a better fitting variant.

Problem 15.

SquareTiling_edge

Input: A set of colours $C$, a set of tiles $T \subseteq C^4$ and a natural number $n \le |C|$. A tile $(a, b, c, d) \in T$ has four edges with the corresponding colours.

Question: Is there a tiling, i.e. an $n \times n$ square $A \in T^{n \times n}$ of tiles, such that all adjacent tiles $A(i, j) = (a, b, c, d)$ and $A(i, j+1) = (\alpha, \beta, \gamma, \delta)$, resp. $A(i+1, j) = (\tilde{a}, \tilde{b}, \tilde{c}, \tilde{d})$, fulfil $b = \delta$, resp. $c = \tilde{a}$?
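The adjacency conditions of SquareTiling_edge can be checked mechanically for a given square. The sketch below assumes that a tile $(a, b, c, d)$ lists its colours in the order top, right, bottom, left, which is the reading under which $b = \delta$ compares horizontally adjacent edges and $c = \tilde{a}$ compares vertically adjacent edges. The function name and the list-of-rows encoding are hypothetical.

```python
def is_edge_tiling(A):
    """Check the SquareTiling_edge conditions for an n x n list of rows A.

    Each tile is a tuple (a, b, c, d), read here as (top, right, bottom, left).
    """
    n = len(A)
    for i in range(n):
        for j in range(n):
            a, b, c, d = A[i][j]
            # right edge must match the left edge of the right neighbour (b = delta)
            if j + 1 < n and b != A[i][j + 1][3]:
                return False
            # bottom edge must match the top edge of the tile below (c = a~)
            if i + 1 < n and c != A[i + 1][j][0]:
                return False
    return True
```

Only verification is shown; finding a tiling is the NP-complete part of the problem.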

Proposition 16 ([GJ79, GP13]).

SquareTiling_edge is NP-complete.

Problem 17.

SquareTiling_rel

Input: A set of tiles $\Theta$, relations $V, H \subseteq \Theta \times \Theta$ and a natural number $n \in \mathbb{N}$

Question: Is there a tiling, i.e. an $n \times n$ square $T \in \Theta^{n \times n}$, such that adjacent tiles are in the horizontal relation $H$, resp. the vertical relation $V$:

$\forall i \ \forall j < n : (T(i, j), T(i, j+1)) \in H$

$\forall i < n \ \forall j : (T(i, j), T(i+1, j)) \in V$

Remark.

Alternatively we write $T(i \cdot n + j) := T(i, j)$ and get a list in which $(T(m), T(m+1)) \in H$ for $1 \le m < n^2$ with $m \notin n\mathbb{Z}$ and $(T(m), T(m+n)) \in V$ for $1 \le m \le n^2 - n$ have to be fulfilled.

Proposition 18.

SquareTiling_rel is NP-hard.

Proof. Let $C$, $T$ and $n$ be an input for SquareTiling_edge. Let

$H := \{ ((a, b, c, d), (\alpha, \beta, \gamma, \delta)) \in T \times T \mid b = \delta \}$,

$V := \{ ((a, b, c, d), (\alpha, \beta, \gamma, \delta)) \in T \times T \mid c = \alpha \}$

and $\Theta := T$. Then there is a tiling for SquareTiling_rel$(\Theta, H, V, n)$ if and only if there is one for SquareTiling_edge$(C, T, n)$. The construction of $\Theta$, $H$ and $V$ obviously works in polynomial time.

Remark.

One can translate SquareTiling_rel to SquareTiling_edge as well, as outlined in a paper of van Emde Boas [vEB97, p. 7].

Footnote to Proposition 16: The source only mentions that the directed Hamiltonian path problem is reduced to SquareTiling_edge. To give the interested reader a basis for the proof: $n$ is the number of vertices, the sequence of vertices in the Hamiltonian path is written on the diagonal of the $n \times n$ square, and the corresponding edges are on the diagonals above and below the main diagonal.
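The reduction in the proof of Proposition 18 is short enough to write down directly. The following sketch builds $\Theta$, $H$ and $V$ from a set of edge-coloured tiles and adds a brute-force search for a relational tiling, which is exponential and only meant for tiny instances. Tiles are assumed to be tuples in the order (top, right, bottom, left), lists are indexed from 0 instead of 1, and all function names are hypothetical.

```python
from itertools import product


def edge_to_rel(tiles):
    """Map a SquareTiling_edge tile set to (Theta, H, V) as in Proposition 18."""
    H = {(s, t) for s in tiles for t in tiles if s[1] == t[3]}  # b = delta
    V = {(s, t) for s in tiles for t in tiles if s[2] == t[0]}  # c = alpha
    return set(tiles), H, V


def is_rel_tiling(T, n, H, V):
    """T is a list of n*n tiles, row by row, with 0-based positions."""
    for m in range(n * n):
        # horizontal check, skipped at the end of a row
        if (m + 1) % n != 0 and (T[m], T[m + 1]) not in H:
            return False
        # vertical check, skipped in the last row
        if m + n < n * n and (T[m], T[m + n]) not in V:
            return False
    return True


def has_rel_tiling(theta, n, H, V):
    """Brute-force search over all |Theta|^(n*n) candidate squares."""
    return any(is_rel_tiling(T, n, H, V)
               for T in product(sorted(theta), repeat=n * n))
```

A single tile whose left and right colours agree and whose top and bottom colours agree tiles every square, while a tile with different left and right colours admits only the $1 \times 1$ tiling.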

4.2 From SquareTiling_rel to ConcatenationEquivalence_finite

This is the most interesting reduction in the chain. Here a truly original idea, not present in the proof of Martens et al. [MNS10], is necessary.

Problem 19.

ConcatenationEquivalence_finite

Input: Finite languages $L_1$, $L_2$ and $L_3$ over a finite alphabet $\Sigma$, given as DFAs

Question: Does $L_3 = L_1 L_2$ hold?

Now we will reduce SquareTiling_rel to $\neg$ConcatenationEquivalence_finite. The most interesting point, compared to the regular language case, is that we work over the alphabet $\Theta \times [1, n^2]$ instead of just $\Theta$. This allows us, for a word in $L_1 L_2$, to detect the point where we jump from $L_1$ to $L_2$.

Proposition 20.

ConcatenationEquivalence_finite is coNP-complete.

Proof. It is obviously in coNP, since a word in $L_3 \setminus L_1 L_2$ or in $L_1 L_2 \setminus L_3$ is a witness for $L_3 \ne L_1 L_2$ and the longest words to consider have length $O(n)$, as the languages are finite.

So we get to the coNP-hardness. Suppose we can solve ConcatenationEquivalence_finite. Let $n$, $\Theta$ and $V, H \subseteq \Theta \times \Theta$ be an input for SquareTiling_rel. We define

$L_1 := \{ (t_1, 1)(t_2, 2) \dots (t_m, m) \in (\Theta \times [1, n^2])^* \mid m \le n^2 - 1 \}$,

$L_2 := \bigcup_{1 \le m \le n^2,\ m \notin n\mathbb{Z}} \{ (t_m, m)(t_{m+1}, m+1) \dots (t_{n^2}, n^2) \in (\Theta \times [1, n^2])^* \mid (t_m, t_{m+1}) \notin H \} \ \cup \ \bigcup_{1 \le m \le n^2 - n} \{ (t_m, m)(t_{m+1}, m+1) \dots (t_{n^2}, n^2) \in (\Theta \times [1, n^2])^* \mid (t_m, t_{m+n}) \notin V \}$

and

$L_3 := L_1 L_2 \cup \{ (t_1, 1)(t_2, 2) \dots (t_{n^2}, n^2) \in (\Theta \times [1, n^2])^{n^2} \}$.

The sizes of the DFAs of the defined languages are polynomial in the size of the input, and the DFAs can be constructed in polynomial time, as shown below.
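The three languages just defined can be enumerated explicitly for very small instances, which makes it possible to observe the property the reduction relies on: $L_3 = L_1 L_2$ should hold exactly when no legal tiling exists. The sketch below represents a word as a tuple of letters $(t, m)$ and uses brute force throughout, so it is exponential and unrelated to the polynomial-size automata of the proof. All function names are hypothetical.

```python
from itertools import product


def numbered(theta, lo, hi):
    """All words (t_lo, lo)(t_lo+1, lo+1)...(t_hi, hi) as tuples of letters."""
    return {tuple(zip(ts, range(lo, hi + 1)))
            for ts in product(theta, repeat=hi - lo + 1)}


def build_languages(theta, n, H, V):
    N = n * n
    # L1: properly numbered prefixes of length at most n^2 - 1 (including epsilon)
    L1 = set().union(*(numbered(theta, 1, m) for m in range(0, N)))
    # L2: properly numbered suffixes that start with a fault
    L2 = set()
    for m in range(1, N + 1):
        for w in numbered(theta, m, N):
            t = [letter[0] for letter in w]  # t[k] is the tile t_{m+k}
            h_fault = m % n != 0 and (t[0], t[1]) not in H
            v_fault = m <= N - n and (t[0], t[n]) not in V
            if h_fault or v_fault:
                L2.add(w)
    L3 = {u + v for u in L1 for v in L2} | numbered(theta, 1, N)
    return L1, L2, L3


def has_legal_tiling(theta, n, H, V):
    N = n * n
    for t in product(theta, repeat=N):  # t[m - 1] is the tile t_m
        ok_h = all((t[m - 1], t[m]) in H for m in range(1, N) if m % n != 0)
        ok_v = all((t[m - 1], t[m + n - 1]) in V for m in range(1, N - n + 1))
        if ok_h and ok_v:
            return True
    return False
```

For $n = 2$ and two tiles the languages contain at most a few hundred words, so comparing `L3` with the explicit product of `L1` and `L2` is instantaneous.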

The automaton for $L_1$ is pretty simple and has $n^2 - 1$ states (not counting the start state):

[Figure: chain automaton start → 1 → … → $n^2 - 1$ with transition labels $\Theta \times \{1\}, \Theta \times \{2\}, \dots, \Theta \times \{n^2 - 1\}$]

The automaton for $L_2$ is more complicated and depends on $V$ and $H$, but it is polynomial in size. We will give two automata $M_V$ and $M_H$ with polynomial sizes such that

$L(M_H) = \bigcup_{1 \le m \le n^2,\ m \notin n\mathbb{Z}} \{ (t_m, m)(t_{m+1}, m+1) \dots (t_{n^2}, n^2) \in (\Theta \times [1, n^2])^* \mid (t_m, t_{m+1}) \notin H \}$

and

$L(M_V) = \bigcup_{1 \le m \le n^2 - n} \{ (t_m, m)(t_{m+1}, m+1) \dots (t_{n^2}, n^2) \in (\Theta \times [1, n^2])^* \mid (t_m, t_{m+n}) \notin V \}$.

Obviously $L_2 = L(M_H \cup M_V)$ and the union automaton still has polynomial size [Yu97, Proof of Theorem 2.1].

The automaton $M_H$ is constructed as follows: The set of states is

$Q_H := \{s_H\} \mathbin{\dot\cup} \{ \varsigma_{t,m} \mid t \in \Theta, m \in [1, n^2] \setminus n\mathbb{Z} \} \mathbin{\dot\cup} \{ \varsigma_i \mid i \in [2, n^2] \}$.

The automaton has to check for a word $(t_1, m_1)(t_2, m_2) \dots (t_k, m_k)$ whether $(t_1, t_2) \notin H$ and whether $m_{i+1} = m_i + 1$ for all $i$. So after reading the first letter the state has to store $t_1$, and every state has to store the most recent $m_i$. Therefore after the first character we go to the corresponding state $\varsigma_{t_1, m_1}$. If the next character fulfils both $(t_1, t_2) \notin H$ and $m_2 = m_1 + 1$, we only have to check $m_{i+1} = m_i + 1$ from then on. Hence we only store the most recent $m_i$ by going to the state $\varsigma_{m_i}$. Once we get to $\varsigma_{n^2}$ we accept. If there is any mistake, we stop the run at that point.

Here is a formal definition of the transition function $\delta_H$:

$\delta_H(s_H, (t, m)) := \varsigma_{t,m}$ for $1 \le m < n^2$ and $m \notin n\mathbb{Z}$

$\delta_H(\varsigma_{t,m}, (t', m+1)) := \varsigma_{m+1}$ for $1 \le m < n^2$ and $(t, t') \notin H$

$\delta_H(\varsigma_m, (t, m+1)) := \varsigma_{m+1}$ for $1 < m < n^2$

The automaton is then defined as $M_H := (Q_H, \Theta \times [1, n^2], \delta_H, \{s_H\}, \{\varsigma_{n^2}\})$ and has $|\Theta| \cdot (n^2 - n) + n^2 - 1$ states (not counting the start state).

The automaton $M_V$ is quite similar. The only difference is that we have to check for a word $(t_1, m_1)(t_2, m_2) \dots (t_k, m_k)$ whether $(t_1, t_{n+1}) \notin V$. Therefore we need the additional states $\sigma_{t,m,o}$, where $o$ stores how many characters away from $(t_1, m_1)$ we already are. Hence we get the following set of states

$Q_V := \{s_V\} \mathbin{\dot\cup} \{ \sigma_{t,m,o} \mid t \in \Theta, m \in [1, n^2 - n], o \in [0, n-1] \} \mathbin{\dot\cup} \{ \sigma_i \mid i \in [n+1, n^2] \}$,

the transition function

$\delta_V(s_V, (t, m)) := \sigma_{t,m,0}$ for $1 \le m \le n^2 - n$

$\delta_V(\sigma_{t,m,i}, (t', m+i+1)) := \sigma_{t,m,i+1}$ for $0 \le i < n - 1$

$\delta_V(\sigma_{t,m,n-1}, (t', m+n)) := \sigma_{m+n}$ for $1 \le m \le n^2 - n$ and $(t, t') \notin V$

$\delta_V(\sigma_m, (t, m+1)) := \sigma_{m+1}$ for $n + 1 \le m < n^2$

and finally the DFA is given as $M_V := (Q_V, \Theta \times [1, n^2], \delta_V, \{s_V\}, \{\sigma_{n^2}\})$ with $|\Theta| \cdot (n^2 - n) \cdot n + n^2 - n$ states (again not counting the start state).

So at last we have to show that

$L_3 = L_1 L_2 \cup \{ (t_1, 1)(t_2, 2) \dots (t_{n^2}, n^2) \in (\Theta \times [1, n^2])^{n^2} \}$

has a polynomial-sized automaton. Using this description, we might get an exponential blow-up from the concatenation, but (with the help of the $[1, n^2]$ part of the alphabet) the language can be characterised a bit differently. It basically contains all properly numbered tilings and additionally those words with one jump in the numbering and a forbidden tiling (with a fault, either vertically or horizontally, directly after the jump). An automaton for this can be constructed using a DFA $M_2 = (Q_2, \Theta \times [1, n^2], \delta_2, \{s_2\}, F_2)$ that accepts $L_2$.

As set of states we use $Q := (Q_2 \setminus \{s_2\}) \mathbin{\dot\cup} [0, n^2]$ and the transition function is as follows:

$\delta(q, (t, m)) := \begin{cases} q + 1 & m = q + 1,\ 0 \le q < n^2 \\ \delta_2(s_2, (t, m)) & m \ne q + 1,\ q \in [0, n^2] \\ \delta_2(q, (t, m)) & q \in Q_2 \setminus \{s_2\} \end{cases}$

The idea is to check for legal numbering with the states $[0, n^2]$. If there is a leap in the numbering, we jump into the automaton for $L_2$. So the automaton is given by $M := (Q, \Theta \times [1, n^2], \delta, \{0\}, \{n^2\} \cup F_2)$. Obviously $L(M) = L_3$ holds and $M$ has polynomial size.

Thus DFAs for $L_1$, $L_2$ and $L_3$ are constructed in polynomial time and have polynomial size in the size of the tiling problem. Combining this with the following Lemma 21 yields a reduction from SquareTiling_rel to $\neg$ConcatenationEquivalence_finite. The NP-hardness of SquareTiling_rel (Proposition 18) completes the proof.

Lemma 21.

Let $n$, $\Theta$, $V, H \subseteq \Theta \times \Theta$ be an input for SquareTiling_rel and let $L_1$, $L_2$ and $L_3$ be constructed as above. Then $L_3 = L_1 L_2$ if and only if there is no legal tiling.

Proof. A word in $\{ (t_1, 1)(t_2, 2) \dots (t_{n^2}, n^2) \in (\Theta \times [1, n^2])^{n^2} \}$ can be interpreted as a tiling, where $T(j) = t_j$. Every word $w_1 w_2 \in \{ (t_1, 1)(t_2, 2) \dots (t_{n^2}, n^2) \in (\Theta \times [1, n^2])^{n^2} \} \cap L_1 L_2$ with $w_i \in L_i$ represents a tiling that violates the given relations: Let $w_2 = (t_m, m)(t_{m+1}, m+1) \dots (t_{n^2}, n^2) \in L_2$. Then either $(t_m, t_{m+1}) \notin H$ or $(t_m, t_{m+n}) \notin V$, which contradicts a legal tiling.

Let $L_3 = L_1 L_2$. Then $\{ (t_1, 1)(t_2, 2) \dots (t_{n^2}, n^2) \in (\Theta \times [1, n^2])^{n^2} \} \subseteq L_1 L_2$, so every possible tiling violates the relations and therefore there is no legal tiling. On the other hand, if there is no legal tiling, then every possible tiling violates a relation. Hence $\{ (t_1, 1)(t_2, 2) \dots (t_{n^2}, n^2) \in (\Theta \times [1, n^2])^{n^2} \} \subseteq L_1 L_2$, which yields $L_3 = L_1 L_2$.

4.3 From ConcatenationEquivalence_finite to Primality_finite

Theorem 14.

Primality_finite is NP-hard.

The following proof is similar to [MNS10, Proof of Theorem 6.4]. The difference is that, since they treat (non-finite) regular languages, they reduce from the problem $L_1 L_2 = \Sigma^*$ (so for them $L_3 = \Sigma^*$).

Proof of Theorem 14.

Let $L_1$, $L_2$ and $L_3$ be finite languages over the alphabet $\Sigma$ given as DFAs. We want to construct a language $A$ such that $A$ is decomposable if and only if $L_3 = L_1 L_2$, which reduces ConcatenationEquivalence_finite to $\neg$Primality_finite and proves the theorem by the coNP-hardness of ConcatenationEquivalence_finite (Proposition 20).

If $L_3 = \emptyset$, $L_3 = \{\varepsilon\}$, $L_1 = \emptyset$ or $L_2 = \emptyset$, then it is easy to check whether $L_3 = L_1 L_2$. So we can assume $\emptyset \ne L_3 \ne \{\varepsilon\}$ and $L_1 \ne \emptyset \ne L_2$.

Let $\Sigma' := \{ a' \mid a \in \Sigma \}$ be a disjoint copy of the alphabet and let $\$ \notin \Sigma \mathbin{\dot\cup} \Sigma'$ be an additional letter. $L_1'$ and $L_2'$ are the respective copies of $L_1$ and $L_2$ over $\Sigma'$.

Now we define the language

$A := L_3 \cup L_1 \$ L_2' \cup L_1' \$ L_2 \cup L_1' \$\$ L_2'$.

A DFA for this language can obviously be constructed in polynomial time.
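For explicitly listed languages the construction of $A$ is a one-liner, which helps to see why it encodes the question $L_3 = L_1 L_2$. In the sketch below, words are Python strings over lowercase letters, the primed copy $\Sigma'$ is modelled by the corresponding uppercase letters, and the extra letter is the character `$`. These encodings and the function names are assumptions made for illustration only.

```python
def cat(X, Y):
    """Product of two finite languages given as sets of strings."""
    return {x + y for x in X for y in Y}


def primed(L):
    """Copy of L over the disjoint alphabet Sigma' (uppercase letters here)."""
    return {w.upper() for w in L}


def build_A(L1, L2, L3):
    """A := L3 ∪ L1 $ L2' ∪ L1' $ L2 ∪ L1' $$ L2'."""
    P1, P2 = primed(L1), primed(L2)
    return (L3
            | cat(cat(L1, {"$"}), P2)
            | cat(cat(P1, {"$"}), L2)
            | cat(cat(P1, {"$$"}), P2))


def candidate_product(L1, L2):
    """The product (L1 ∪ L1'$)(L2 ∪ $L2'), expanded by brute force."""
    return cat(L1 | cat(primed(L1), {"$"}), L2 | cat({"$"}, primed(L2)))
```

Expanding `candidate_product(L1, L2)` gives the last three parts of $A$ together with $L_1 L_2$, so it coincides with `build_A(L1, L2, L3)` exactly when $L_3 = L_1 L_2$.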

Lemma 22.

The language $A$ is either prime or its only non-trivial decomposition is $A_1 \circ A_2$ with $A_1 := L_1 \cup L_1' \$$ and $A_2 := L_2 \cup \$ L_2'$.

Remark. The proof of Martens et al. [MNS10, Claim 6.5] in the paper's appendix works nearly word for word. It is rather technical and adds no real value. For the sake of completeness we provide one regardless.

Proof.

Suppose $A = A_l A_r$ is a non-trivial decomposition. We first show that $A_l \subseteq \Sigma^* \cup (\Sigma')^* \$$ and symmetrically $A_r \subseteq \Sigma^* \cup \$ (\Sigma')^*$.

Suppose $A_l$ contains a word $w_l$ with two occurrences of the letter $\$$ or in which a symbol from $\Sigma$ precedes a $\$$. In both cases, for $w_l w_r$ to be in $A$, $w_r$ has to be in $(\Sigma')^*$. Thus $A_r \subseteq (\Sigma')^*$ and, since the decomposition is non-trivial, $A_r \ne \{\varepsilon\}$. The language $L_3 \subseteq \Sigma^*$ contains at least one word $v$ of length $\ge 1$ (see premises). So we can concatenate $v \in L_3 \subseteq A_l$ (because $A_r \subseteq (\Sigma')^*$ and $L_3 \subseteq A$) with a word $\varepsilon \ne w \in A_r$ and should get $vw \in A_l A_r = A$. That is a contradiction, as $A$ does not contain a word in $\Sigma^+ (\Sigma')^+$.

The language $A_r$ includes at least one word $w$ containing a $\$$, because $A \supseteq L_1' \$\$ L_2' \ne \emptyset$ (we assume $L_1 \ne \emptyset \ne L_2$) and any word in $A_l$ contains at most one $\$$. If any word $v \in A_l$ includes a $\$$ that is not its last character, we get a contradiction, because $vw \in A_l A_r = A$ contains two occurrences of $\$$ that are not next to each other. That cannot happen for a word in $A$.

Now we know that every word in $A_l$ contains at most one $\$$ and, if it contains one, the $\$$ is the last letter and is not preceded by a letter in $\Sigma$. Hence $A_l \subseteq \Sigma^* \cup (\Sigma')^* \$$. Symmetrically (one can look at the reversed languages) $A_r \subseteq \Sigma^* \cup \$ (\Sigma')^*$.

The intersection $A \cap \Sigma^* \$ (\Sigma')^* = L_1 \$ L_2'$ together with the structures of $A_l$ and $A_r$ yields $A_l \cap \Sigma^* = L_1$ and symmetrically $A_r \cap \Sigma^* = L_2$.

Similarly $A \cap (\Sigma')^* \$\$ (\Sigma')^* = L_1' \$\$ L_2'$ along with the structures of $A_l$ and $A_r$ implies $A_l \cap (\Sigma')^* \$ = L_1' \$$ and $A_r \cap \$ (\Sigma')^* = \$ L_2'$. Thus $A_l = L_1 \cup L_1' \$$ and $A_r = L_2 \cup \$ L_2'$.

Proposition 23.

$A$ is decomposable if and only if $L_3 = L_1 L_2$.

Proof. If $A$ is decomposable, we know $A = A_1 A_2$ as defined in Lemma 22. Since $A \cap \Sigma^* = L_3$, $A_1 \cap \Sigma^* = L_1$, $A_2 \cap \Sigma^* = L_2$ and $A = A_1 A_2$, $L_3 = L_1 L_2$ holds.

If on the other hand $L_3 = L_1 L_2$, obviously $A = A_1 A_2$ as in Lemma 22.

So accordingly we get that the coNP-hard problem ConcatenationEquivalence_finite is reducible to the complement of Primality_finite. Therefore the original problem Primality_finite is NP-hard.

5 Final remarks

There are still many open questions related to Primality_finite. Obviously the exact complexity has to be determined. Our attempts to find an NP-algorithm failed, so perhaps the lower bound has to be improved further. A coNP lower bound for primality with a list as input would strongly hint at a higher lower bound for Primality_finite. Having the input as an NFA would be yet another problem to consider. In that case basically nothing is known, since Theorem 11 is not applicable in its current form.

Aside from these variants for the input, the decomposition into three, four or generally into $m$ languages is a problem to consider (for all the input variants). A priori we do not know much about that. For lists we still get coNP for fixed $m$ by naively guessing the partition. And for DFAs we can check, for all possible decompositions into two languages, whether those languages are decomposable again. That approach clearly is not efficient.

In summary, there are still many open questions regarding the complexity of decompositions of finite languages.

Acknowledgement.

I want to thank Prof. Dr. Wim Martens for his guidance and support, Dr. Matthias Niewerth for his advice, especially for the idea to “count” the tiles, and Johannes Doleschal for proofreading.

References

[GJ79] Michael R. Garey and David S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman & Co., New York, NY, USA, 1979.

[HMRU00] John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman, Introduction to Automata Theory, Languages, and Computation, 2nd ed., Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2000.

[MNS10] Wim Martens, Matthias Niewerth, and Thomas Schwentick, Schema Design for XML Repositories: Complexity and Tractability, Proceedings of the Twenty-ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (New York, NY, USA), PODS ’10, ACM, 2010, pp. 239–250.

[MSY98] Alexandru Mateescu, Arto Salomaa, and Sheng Yu, On the Decomposition of Finite Languages, Tech. report, 1998.

[Pap94] Christos H. Papadimitriou, Computational Complexity, Theoretical computer science, Addison-Wesley, 1994.

[vEB97] Peter van Emde Boas, The convenience of tilings, Lecture Notes in Pure and Applied Mathematics (1997), 331–363.

[Wie10] Wojciech Wieczorek,