The Complexity of Learning Linear Temporal Formulas from Examples
Nathanaël Fijalkow and Guillaume Lagarde
CNRS, LaBRI, Université de Bordeaux, France
The Alan Turing Institute, London, United Kingdom
Abstract
In this paper we initiate the study of the computational complexity of learning linear temporal logic (LTL) formulas from examples. We construct approximation algorithms for fragments of LTL and prove hardness results; in particular we obtain tight bounds for approximation of the fragment containing only the next operator and conjunctions, and prove NP-completeness results for many fragments.

Introduction

We are interested in this paper in the complexity of learning formulas of Linear Temporal Logic (LTL) from examples, in a passive scenario: from a set of positive and negative words, the objective is to construct a formula, as small as possible, which satisfies the positive words and does not satisfy the negative words.

Passive learning of languages has a long history paved with negative results. Learning automata is notoriously difficult from a theoretical perspective, as witnessed by the original NP-hardness result for learning a Deterministic Finite Automaton (DFA) from examples ([Gol78]). This line of hardness results culminates with the inapproximability result of [PW93], stating that there is no polynomial time algorithm for learning a DFA from examples even up to a polynomial approximation of its size.

One approach to cope with such hardness results is to change representation, for instance replacing automata by logical formulas; their syntactic structure makes them more amenable to principled search algorithms. There is a range of potential logical formalisms to choose from, depending on the application domain. Linear Temporal Logic ([Pnu77]) is a prominent logic for specifying temporal properties over words; it has become a de facto standard in many fields such as model checking, program analysis, and motion planning for robotics. A key property making LTL a strong candidate as a concept class is that its syntax does not include variables, contributing to the fact that LTL formulas are typically easy to interpret and therefore useful as explanations.

Over the past five to ten years learning temporal logics (of which LTL is the core) has become an active research area, with applications including program specification ([LPB15]) and anomaly and fault detection ([BVPA+…]); these and other works study the problem of learning LTL formulas from examples.
Our contributions.
We present a set of results for three fragments of LTL. For all three fragments we show that the learning problem is NP-complete.

• In Section 3 we study LTL(X, ∧), which is the fragment containing only the next operator and conjunctions. We obtain matching upper and lower bounds on approximation algorithms: we show that there exists a polynomial time log(n)-approximation algorithm for learning LTL(X, ∧), and that this approximation ratio cannot be improved by polynomial time algorithms.

• In Section 4 we study LTL(F, ∧), which is the fragment containing only the eventually operator and conjunctions. We construct an n-approximation algorithm and show that there is no polynomial time log(n)-approximation algorithm.

• In Section 5 we study LTL(F, X, ∧, ∨), which is the fragment containing the eventually and next operators, conjunctions, and disjunctions.

We conclude in Section 6, listing remaining open problems.

Preliminaries

Unless otherwise specified we use the alphabet Σ = {a, b} of size 2. We index words from position 1 (not 0); the letter at position i in the word w is w(i), so w = w(1) ... w(ℓ). The empty word is ε.

The syntax of Linear Temporal Logic (LTL) includes atomic formulas c ∈ Σ, the boolean operators ∧ and ∨, and the temporal operators X, F, and G. The semantics of LTL over finite words is defined inductively over formulas, through the notation w, i ⊨ φ, where w ∈ Σ* is a word of length ℓ, i ∈ [1, ℓ] is a position in w, and φ an LTL formula. The definition is given below for the atomic formulas and the temporal operators X, F, and G, with boolean operators interpreted as usual.

• w, i ⊨ c if w(i) = c.

• w, i ⊨ X φ if i < ℓ and w, i + 1 ⊨ φ. It is called the neXt operator.

• w, i ⊨ F φ if w, i′ ⊨ φ for some i′ ∈ [i, ℓ]. It is called the eventually operator.

• w, i ⊨ G φ if w, i′ ⊨ φ for all i′ ∈ [i, ℓ]. It is called the Globally operator.

LTL also includes an until operator U extending both F and G; in this paper we only consider fragments of LTL(G, F, X, ∧, ∨).

We then write w ⊨ φ if w, 1 ⊨ φ, and say that w satisfies φ. We consider fragments of LTL by specifying which boolean connectives and temporal operators are allowed. For instance LTL(X, ∧) is the set of all LTL formulas using only atomic formulas, conjunctions, and the next operator. The full logic we consider here is LTL = LTL(F, G, X, ∧, ∨). The size of a formula is the size of its syntactic tree. We say that two formulas are equivalent if they have the same semantics.

The LTL learning problem.
The LTL learning decision problem is:

INPUT: u_1, ..., u_n, v_1, ..., v_m ∈ Σ* and k ∈ ℕ,
QUESTION: does there exist an LTL formula φ of size at most k such that for all j ∈ [1, n] we have u_j ⊨ φ, and for all j ∈ [1, m] we have v_j ⊭ φ?

In that case we say that φ separates u_1, ..., u_n from v_1, ..., v_m, or simply that φ is a separating formula if the words are clear from the context. We call u_1, ..., u_n the positive words and v_1, ..., v_m the negative words. The LTL learning problem is analogously defined for any fragment of LTL.

Parameters for complexity analysis.
Without loss of generality we can assume that n = m (duplicating words to obtain an equal number of positive and negative words). Therefore the three important parameters for the complexity of the LTL learning problem are: n, the number of words; ℓ, the maximum length of the words; and k, the desired size of the formula.

Representation.
The words given as input are represented in a natural way. We emphasise a subtlety in the representation of k: it can be given in binary (a standard assumption) or in unary. In the first case, the input size is O(n · ℓ + log(k)), so the formula φ we are looking for may be exponential in the input size! Therefore it is not clear a priori that the LTL learning problem is in NP. Opting for a unary encoding, the input size becomes O(n · ℓ + k), and in that case an easy argument shows that the LTL learning problem is in NP. We follow the standard representation: k is given in binary, and therefore it is not immediate that the LTL learning problem is in NP.

Convention.
Typically, i ∈ [1, ℓ] is a position in a word and j ∈ [1, n] is used for indexing words.

A naive algorithm.
Let us start our complexity analysis of the learning problem for LTL by constructing a naive algorithm for the whole logic.

Theorem 1. There exists an algorithm for learning LTL in time and space O(exp(k) · n · ℓ), where exp(k) is exponential in k.

Notice that the dependence of the algorithm presented in Theorem 1 is linear in n and ℓ, and exponential only in k; but since k is represented in binary, this is potentially a doubly-exponential algorithm.

Proof.
For a formula φ ∈ LTL, we write ⟨φ⟩ : {u_1, ..., u_n, v_1, ..., v_n} → {0, 1}^ℓ for the function defined by ⟨φ⟩(w)(i) = 1 if w, i ⊨ φ and ⟨φ⟩(w)(i) = 0 if w, i ⊭ φ, for w ∈ {u_1, ..., u_n, v_1, ..., v_n}. Note that φ is separating if and only if ⟨φ⟩(u_j)(1) = 1 and ⟨φ⟩(v_j)(1) = 0 for all j ∈ [1, n]. The algorithm simply consists in enumerating all formulas φ of LTL of size at most k inductively, constructing ⟨φ⟩, and checking whether φ is separating. Initially we construct ⟨a⟩ and ⟨b⟩; then, once we have computed ⟨φ⟩ and ⟨ψ⟩, we can compute ⟨φ ∧ ψ⟩, ⟨φ ∨ ψ⟩, ⟨X φ⟩, ⟨F φ⟩, and ⟨G φ⟩ in time O(n · ℓ). To conclude, we note that the number of formulas of LTL of size at most k is exponential in k.

Approximation algorithms.
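Both the naive algorithm above and the approximation algorithms discussed next rely on the same satisfaction and separation checks. A minimal Python sketch of the finite-word semantics and the separation test; the tuple encoding and all names are ours, not the authors':

```python
# Formulas are tuples: ("atom", c), ("and", p, q), ("or", p, q),
# ("X", p), ("F", p), ("G", p).  Positions are 1-indexed, as in the paper.

def sat(w, i, phi):
    """Does w, i |= phi hold?  w is a string, i a 1-indexed position."""
    op = phi[0]
    if op == "atom":
        return w[i - 1] == phi[1]
    if op == "and":
        return sat(w, i, phi[1]) and sat(w, i, phi[2])
    if op == "or":
        return sat(w, i, phi[1]) or sat(w, i, phi[2])
    if op == "X":  # neXt: requires a successor position
        return i < len(w) and sat(w, i + 1, phi[1])
    if op == "F":  # eventually: some position i' in [i, l]
        return any(sat(w, j, phi[1]) for j in range(i, len(w) + 1))
    if op == "G":  # globally: all positions i' in [i, l]
        return all(sat(w, j, phi[1]) for j in range(i, len(w) + 1))
    raise ValueError(f"unknown operator {op!r}")

def separates(phi, positives, negatives):
    """phi satisfies every positive word and no negative word."""
    return all(sat(u, 1, phi) for u in positives) and \
           not any(sat(v, 1, phi) for v in negatives)

# X(a ∧ X b): "the second letter is a and the third is b".
phi = ("X", ("and", ("atom", "a"), ("X", ("atom", "b"))))
assert separates(phi, ["bab", "aab"], ["ba", "abb"])
```

The naive algorithm of Theorem 1 amounts to enumerating such tuples bottom-up by size and calling a check like `separates` on each; memoising the per-position truth values gives the ⟨φ⟩ tables of the proof.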
The goal of this paper is to understand the complexity of learning fragments of LTL and to construct efficient approximation algorithms. An α-approximation algorithm for learning LTL (or some fragment of LTL) does the following: the algorithm either determines that there is no separating formula, or constructs a separating formula φ which has size at most α · m, with m the size of a minimal separating formula.

The asymptotics can be obtained using classical techniques from Analytic Combinatorics [FS08]: the number of LTL formulas of size k is asymptotically equivalent to C · A^k / √(πk) for some constants A, C obtained from the generating function.

LTL(X, ∧)

Normalisation
We first state and prove a normalisation lemma for formulas in LTL(X, ∧). We define the class of "patterns" as formulas generated by the following grammar:

P := X^i c | X^i (c ∧ P)   with i ≥ 0 and c ∈ Σ.

Unravelling the definition, we get the following general form for patterns:

P = X^{i_1 − 1} (c_1 ∧ X^{i_2 − i_1} (··· ∧ X^{i_p − i_{p−1}} c_p) ···),

with 1 ≤ i_1 < i_2 < ··· < i_p and c_1, ..., c_p ∈ Σ. It is equivalent to the (larger in size) formula ⋀_{q ∈ [1, p]} X^{i_q − 1} c_q, which states that for each q ∈ [1, p], the letter in position i_q is c_q.

To determine the size of a pattern P we look at two parameters: its last position last(P) = i_p and its width width(P) = p. The size of P is last(P) + 2 (width(P) − 1). This exposes the tension when constructing LTL(X, ∧) formulas: do we increase the last position, to reach further letters in the words, or the width, to further restrict the set of satisfying words?

Lemma 1.
For every formula φ ∈ LTL(X, ∧) there exists an equivalent pattern of size smaller than or equal to that of φ.

Proof. We proceed by induction on φ.

• Atomic formulas are already a special case of patterns.

• If φ = X φ′, by induction hypothesis we get a pattern P equivalent to φ′; then X P is a pattern and is equivalent to φ.

• If φ = φ_1 ∧ φ_2, by induction hypothesis we get two patterns P_1 and P_2 equivalent to φ_1 and φ_2. We use the inductive definition for patterns to show that P_1 ∧ P_2 is equivalent to another pattern. We focus on the case P_1 = X^{i_1} (c_1 ∧ P′_1) and P_2 = X^{i_2} (c_2 ∧ P′_2); the other cases are simpler instances of this one. There are two cases: i_1 = i_2 or i_1 ≠ i_2.

If i_1 = i_2, either c_1 ≠ c_2, and then P_1 ∧ P_2 is equivalent to false, which is the pattern c_1 ∧ c_2; or c_1 = c_2, and then P_1 ∧ P_2 is equivalent to X^{i_1} (c_1 ∧ P′_1 ∧ P′_2). By induction hypothesis P′_1 ∧ P′_2 is equivalent to a pattern P′, so the pattern X^{i_1} (c_1 ∧ P′) is equivalent to P_1 ∧ P_2, hence to φ.

If i_1 ≠ i_2, without loss of generality i_1 < i_2; then P_1 ∧ P_2 is equivalent to X^{i_1} (c_1 ∧ P′_1 ∧ X^{i_2 − i_1} (c_2 ∧ P′_2)). By induction hypothesis P′_1 ∧ X^{i_2 − i_1} (c_2 ∧ P′_2) is equivalent to a pattern P′, so the pattern X^{i_1} (c_1 ∧ P′) is equivalent to P_1 ∧ P_2, hence to φ.

A first simple corollary of Lemma 1 is a non-deterministic polynomial time algorithm.

Theorem 2. The learning problem for
LTL(X, ∧) is in NP.

Proof. Let u_1, ..., u_n, v_1, ..., v_n be a set of 2n words of length at most ℓ. Thanks to Lemma 1, if there exists a separating formula φ, then there exists a separating pattern of size no larger than that of φ. Patterns, however, have polynomially bounded size: both the last position and the width are at most ℓ, so the size of a pattern is at most 3ℓ − 2 = O(ℓ).

In other words, if there exists a separating formula, then there exists one of size linear in ℓ. A non-deterministic algorithm guesses such a formula and checks whether it is indeed separating in (deterministic) time O(n · ℓ).

An approximation algorithm
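Both the NP certificate of Theorem 2 and the greedy algorithm of this section manipulate patterns through the set of positions they constrain. A small Python sketch of this correspondence, under our own naming (this is an illustration, not the authors' code):

```python
# A pattern is determined by its set of positions I and, for each i in I,
# the letter the positive words all carry at position i.

def pattern_from_positions(I, positives):
    """Return {i: c} if all positive words agree at each position i in I,
    else None (no pattern over these positions satisfies all u_j)."""
    pat = {}
    for i in sorted(I):
        letters = {u[i - 1] for u in positives}
        if len(letters) != 1:
            return None
        pat[i] = letters.pop()
    return pat

def pattern_satisfies(pat, w):
    """w satisfies the pattern iff w carries the required letter at each i."""
    return all(i <= len(w) and w[i - 1] == c for i, c in pat.items())

def pattern_size(pat):
    """size = last(P) + 2 * (width(P) - 1), as in the text."""
    return max(pat) + 2 * (len(pat) - 1)

pat = pattern_from_positions({2, 3}, ["aab", "bab"])
assert pat == {2: "a", 3: "b"}
assert pattern_size(pat) == 5  # the pattern X(a ∧ X b) has 5 syntax-tree nodes
```

Checking a candidate pattern against all 2n words costs O(n · ℓ), which is the separation check used by both results.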
Theorem 3.
There exists a O(n · ℓ³) time log(n)-approximation algorithm for learning LTL(X, ∧).

Algorithm 1:
The greedy algorithm returning a log(n)-approximation of a minimal separating LTL(X, ∧)-formula with last position ℓ.

Data: words u_1, ..., u_n, v_1, ..., v_n of length at most ℓ.

X ← {i ∈ [1, ℓ] : ∃c ∈ Σ, ∀j ∈ [1, n], u_j(i) = c}
for i ∈ X do Y_i ← {j ∈ [1, n] : v_j(i) ≠ u_1(i) = u_2(i) = ··· = u_n(i)}
I_0 ← ∅ ; C_0 ← ∅ ; x ← 0
repeat
    i_x ← argmax_{i ∈ X \ I_x} Card(Y_i \ C_x)
    I_{x+1} ← I_x ∪ {i_x}
    C_{x+1} ← C_x ∪ Y_{i_x}
    x ← x + 1
until C_x = [1, n] or I_x = X
if C_x = [1, n] then return the pattern corresponding to I_x
else return "no separating formula"

Proof.
Let u_1, ..., u_n, v_1, ..., v_n be a set of 2n words of length at most ℓ. Thanks to Lemma 1 we are looking for a separating pattern:

P = X^{i_1 − 1} (c_1 ∧ X^{i_2 − i_1} (··· ∧ X^{i_p − i_{p−1}} c_p) ···).

For a pattern P we define I(P) = {i_q ∈ [1, ℓ] : q ∈ [1, p]}. Note that last(P) = max I(P) and width(P) = Card(I(P)).

We define the set X = {i ∈ [1, ℓ] : ∃c ∈ Σ, ∀j ∈ [1, n], u_j(i) = c}. Note that P satisfies u_1, ..., u_n if and only if I(P) ⊆ X. Further, given I ⊆ X, we can construct a pattern P such that I(P) = I and P satisfies u_1, ..., u_n: we simply choose c_q = u_1(i_q) = ··· = u_n(i_q) for q ∈ [1, p]. We call P the pattern corresponding to I.

Recall that the size of the pattern P is last(P) + 2 (width(P) − 1), so minimising a separating pattern means minimising both the last position last(P) and the width width(P).

Let us consider the following easier problem: construct a log(n)-approximation of a minimal separating pattern with fixed last position. Assuming we have such an algorithm, we obtain a log(n)-approximation of a minimal separating pattern by running the previous algorithm on the prefixes of length ℓ′ for each ℓ′ ∈ [1, ℓ].

We now focus on the question of constructing a log(n)-approximation of a minimal separating pattern with fixed last position; we refer to Algorithm 1 for the pseudocode. For a set I, we write C_I = ⋃ {Y_i : i ∈ I}: the pattern corresponding to I does not satisfy v_j if and only if j ∈ C_I. In particular, the pattern corresponding to I is separating if and only if C_I = [1, n].

The algorithm constructs a set I incrementally through the sequence (I_x)_{x ≥ 0}, with the following easy invariant: for x ≥ 0, we have C_x = C_{I_x}. The algorithm is greedy: I_x is augmented with the i ∈ X \ I_x maximising the number of words added to C_x, which is the cardinality of Y_i \ C_x.

We now prove that this yields a log(n)-approximation algorithm. Let P_opt be a minimal separating pattern with last position ℓ, inducing I_opt = I(P_opt) ⊆ [1, ℓ] of cardinality m. Note that C_{I_opt} = [1, n]. We let n_x = n − |C_x| and show the following by induction on x ≥ 0:

n_{x+1} ≤ n_x · (1 − 1/m) = n_x · (m − 1)/m.

We claim that there exists i ∈ X \ I_x such that Card(Y_i \ C_x) ≥ n_x / m. Indeed, assume towards contradiction that for all i ∈ X \ I_x we have Card(Y_i \ C_x) < n_x / m; then there is no set I of cardinality m such that C_I ⊇ [1, n] \ C_x, contradicting the existence of I_opt. Thus there exists i ∈ X \ I_x such that Card(Y_i \ C_x) ≥ n_x / m, implying that the algorithm chooses such an i and n_{x+1} ≤ n_x − n_x / m = n_x · (1 − 1/m).

The proved inequality implies n_x ≤ n · (1 − 1/m)^x. This quantity is less than 1 for x ≥ log(n) · m, implying that the algorithm stops after at most log(n) · m steps. Consequently, the pattern corresponding to I has size at most log(n) · |P_opt|, completing the claim on approximation.

A naive complexity analysis yields an implementation of Algorithm 1 running in time O(n · ℓ²), leading to an overall complexity of O(n · ℓ³) by running Algorithm 1 on the prefixes of length ℓ′ of u_1, ..., u_n, v_1, ..., v_n for each ℓ′ ∈ [1, ℓ].

Hardness results
Theorem 4.
The LTL(X, ∧) learning problem is NP-hard, and there are no (1 − o(1)) · log(n) polynomial time approximation algorithms unless P = NP, even for a single positive word.

Note that Theorem 3 and Theorem 4 yield matching upper and lower bounds on approximation algorithms for learning
LTL(X, ∧).

The hardness result stated in Theorem 4 follows from a reduction from the set cover problem, which we define now. The set cover decision problem is: given S_1, ..., S_ℓ subsets of [1, n] and k ∈ ℕ, does there exist I ⊆ [1, ℓ] of size at most k such that ⋃_{i ∈ I} S_i = [1, n]? In that case we say that I is a cover. An α-approximation algorithm returns a cover of size at most α · k, where k is the size of a minimal cover. The following results form the state of the art for solving exact and approximate variants of the set cover problem.

Theorem 5 ([DS14]). The set cover problem is NP-complete, and there are no (1 − o(1)) · log(n) polynomial time approximation algorithms unless P = NP.

Proof (of Theorem 4). We construct a reduction from set cover. Let S_1, ..., S_ℓ be subsets of [1, n] and k ∈ ℕ. Let us consider the word u = a^{ℓ+1}, and for each j ∈ [1, n] and i ∈ [1, ℓ], writing v_j(i) for the i-th letter of v_j:

v_j(i) = b if j ∈ S_i, and v_j(i) = a if j ∉ S_i,

and we set v_j(ℓ + 1) = a for every j ∈ [1, n]. We also add v_{n+1} = a^ℓ b.

We claim that there is a cover of size k if and only if there is a formula of size ℓ + 2k + 1 separating u from v_1, ..., v_{n+1}.

Thanks to Lemma 1 we can restrict our attention to patterns, i.e. formulas of the form (we adjust the indexing for technical convenience)

φ = X^{i_1 − 1} (c_1 ∧ X^{i_2 − i_1} (··· ∧ X^{i_{p+1} − i_p} c_{p+1}) ···),

for some positions i_1 < ··· < i_{p+1} and letters c_1, ..., c_{p+1} ∈ Σ. If φ satisfies u, then necessarily c_1 = ··· = c_{p+1} = a. This implies that if φ does not satisfy v_{n+1}, then necessarily i_{p+1} = ℓ + 1.

We associate to φ the set I = {i_1 < ··· < i_p}. Note that φ is equivalent to ⋀_{q ∈ [1, p]} X^{i_q − 1} a ∧ X^ℓ a, and the size of φ is (ℓ + 1) + 2p = ℓ + 2 |I| + 1. Thus φ separates u from v_1, ..., v_{n+1} if and only if I is a cover. Indeed, I is a cover if and only if for every j ∈ [1, n] there exists i ∈ I such that j ∈ S_i, which is equivalent to: for every j ∈ [1, n] we have v_j ⊭ φ.

LTL(F, ∧)

As we will see,
LTL(F, ∧) over an alphabet of size 2 is very weak. This degeneracy vanishes when considering alphabets of size at least 3. Let us fix a (finite) alphabet Σ.

Minimal formulas
Instead of defining a normal form as we did for LTL(X, ∧), we characterise the expressive power of LTL(F, ∧) and construct, for each property expressible in this logic, a minimal formula.

Let us consider two words u = u(1) ... u(ℓ′) and v = v(1) ... v(ℓ). We say that u is a subword of v if there exists an increasing φ : [1, ℓ′] → [1, ℓ] such that v(φ(i)) = u(i), and that u is a factor of v if v = v_1 u v_2 for two words v_1, v_2. For example abba is a subword of babaaaaba, but not a factor, and bba is a factor of babbaa. We say that a word is non-repeating if every two consecutive letters are different.

Lemma 2. For every formula φ ∈ LTL(F, ∧), either it is equivalent to false or there exists a finite set of non-repeating words w_1, ..., w_p and c ∈ Σ ∪ {ε} such that for every word z, z ⊨ φ if and only if: for all q ∈ [1, p], w_q is a subword of z, and z starts with c.

Proof.
We proceed by induction over φ.

• For the atomic formula c ∈ Σ, the property is satisfied using the empty set of words and c.

• If φ = F φ′, by induction hypothesis we get w_1, ..., w_p and c for φ′. We let w′_q = c w_q if w_q(1) ≠ c and w′_q = w_q otherwise; then z ⊨ φ if and only if for all q ∈ [1, p], w′_q is a subword of z and z starts with ε (the latter condition is always satisfied).

• If φ = φ_1 ∧ φ_2, by induction hypothesis we get w_{1,1}, ..., w_{1,p_1}, c_1 for φ_1 and w_{2,1}, ..., w_{2,p_2}, c_2 for φ_2. There are two cases. If c_1 and c_2 are non-empty and c_1 ≠ c_2, then φ is equivalent to false. Otherwise, either both are non-empty and equal, or at least one is ε, say c_1. In both cases, u ⊨ φ if and only if for all (e, q) ∈ {1} × [1, p_1] ∪ {2} × [1, p_2], w_{e,q} is a subword of u and u starts with c_2.

Lemma 2 gives a characterisation of the properties expressible in LTL(F, ∧). It implies that over an alphabet of size 2 the fragment LTL(F, ∧) is very weak. Indeed, there are very few non-repeating words over the alphabet Σ = {a, b}: only the prefixes of abab... and baba.... This implies that formulas in LTL(F, ∧) over Σ = {a, b} can only place lower bounds on the number of alternations between a and b (starting from a or from b) and check whether the word starts with a or b. In particular, the LTL(F, ∧) learning problem over this alphabet is (almost) trivial and thus not interesting. Hence we now assume that Σ has size at least 3.

We move back from semantics to syntax, and show how to construct minimal formulas. Let w_1, ..., w_p be a finite set of non-repeating words and c ∈ Σ ∪ {ε}; we define a formula φ as follows. The prefixes of w_1, ..., w_p are organised in a forest (a set of trees): a node is labelled by a prefix w of some w_1, ..., w_p, and its children are the words wc′ which are prefixes of some w_1, ..., w_p. The leaves are labelled by w_1, ..., w_p. We interpret each tree t as a formula φ_t in LTL(F, ∧), in an inductive fashion: if t is labelled wc′ for c′ ∈ Σ and has subtrees t_1, ..., t_q, then

φ_t = F (c′ ∧ ⋀_i φ_{t_i}).

If c = ε, the formula associated to w_1, ..., w_p and c is the conjunction of the formulas for each tree of the forest; and if c ∈ Σ, then the formula additionally has a conjunct c.

As an example, consider the set of words ab, ac, bab, and the letter a. The forest corresponding to ab, ac, bab contains two trees: one contains the nodes b, ba, bab, and the other one the nodes a, ab, ac. The two corresponding formulas are

F(b ∧ F(a ∧ F b))  ;  F(a ∧ F b ∧ F c).

And the formula corresponding to the set of words ab, ac, bab and the letter a is

a ∧ F(b ∧ F(a ∧ F b)) ∧ F(a ∧ F b ∧ F c).

Lemma 3.
For every non-repeating words w_1, ..., w_p and c ∈ Σ ∪ {ε}, the formula φ constructed above is minimal, meaning there is no smaller equivalent formula.

Applying the construction above to a single non-repeating word w = c_1 ... c_p, we obtain what we call a "fattern" (a pattern with an F):

F = F(c_1 ∧ F(··· ∧ F c_p) ···).

We say that the non-repeating word w induces the fattern F above, and conversely that the fattern F induces the word w. The size of a fattern F is 3 |w| − 1. A fattern may additionally be grounded, taking the form c ∧ F′; in that case the letter c is added at the beginning of w and the size is 3 |w| − 2.

Lemma 4.
Let u_1, ..., u_n, v_1, ..., v_n be words. If there exists φ ∈ LTL(F, ∧) separating u_1, ..., u_n from v_1, ..., v_n, then there exists a conjunction of at most n fatterns separating u_1, ..., u_n from v_1, ..., v_n.

Proof. Thanks to Lemma 2, to the separating formula φ we can associate a finite set of non-repeating words w_1, ..., w_p and c ∈ Σ ∪ {ε} such that for every word z, z ⊨ φ if and only if: for all q ∈ [1, p], w_q is a subword of z, and z starts with c.

For j ∈ [1, n], since v_j does not satisfy φ, either v_j does not start with c, or for some q ∈ [1, p] the word w_q is not a subword of v_j. For each j ∈ [1, n] such that v_j starts with c, we pick one q_j ∈ [1, p] for which w_{q_j} is not a subword of v_j, and consider the set {w_{q_j} : j ∈ [1, n]} together with c ∈ Σ ∪ {ε}. The formula induced by the construction above is a conjunction of at most n fatterns, and it separates u_1, ..., u_n from v_1, ..., v_n.

A first corollary of Lemmas 2 and 3 is a non-deterministic polynomial time algorithm.

Theorem 6.
The learning problem for LTL(F, ∧) is in NP.

Proof. Let u_1, ..., u_n, v_1, ..., v_n be a set of 2n words of length at most ℓ. Assume there exists a separating formula φ; thanks to Lemma 4 there exists a conjunction of at most n fatterns separating u_1, ..., u_n from v_1, ..., v_n. Fatterns, however, have polynomially bounded size: the size of a fattern is at most 3ℓ − 1 = O(ℓ).

In other words, if there exists a separating formula, then there exists one of size at most O(n · ℓ). A non-deterministic algorithm guesses such a formula and checks whether it is indeed separating in (deterministic) polynomial time.

A dynamic programming algorithm
Let us define an intermediate problem called shortest subword: the input is u_1, ..., u_n, v_1, ..., v_n, and the goal is to find a shortest word w such that for all j ∈ [1, n], w is a subword of u_j and not a subword of v_j.

Lemma 4 and Lemma 6 imply that learning LTL(F, ∧) in both cases of a single positive word and a single negative word is equivalent to the shortest subword problem, since minimising the size of a fattern is equivalent to minimising the size of the word it induces. In particular, this implies that the shortest subword problem is NP-complete. Let us construct an algorithm for solving the shortest subword problem and then discuss its consequences for learning LTL(F, ∧).

Lemma 5. There exists an algorithm solving the shortest subword problem running in time O(n · (min{2^ℓ, ℓ^{2n}} + |Σ| · ℓ)).

We use Python-inspired notation for suffixes: we let w(k:) denote the word obtained from w by starting at position k.

Let us write i = (i_1, ..., i_n, i′_1, ..., i′_n) for a tuple of positions in each of the 2n words. We include for each word the special position ω. Let R(i) be the length of a shortest word w such that for all j ∈ [1, n], w is a subword of u_j(i_j:) and not a subword of v_j(i′_j:). We construct a dynamic programming algorithm populating the table R; the goal is to compute R(1, 1, ..., 1).

Proof.
The key equality on which Algorithm 2 relies is

R(i) = min( 1 + R(i_1 + 1, ni_2 + 1, ..., ni_n + 1, ni′_1 + 1, ..., ni′_n + 1), R(i_1 + 1, i_2, ..., i_n, i′_1, ..., i′_n) ),

where ni_j = ind(u_j, u_1(i_1), i_j) and ni′_j = ind(v_j, u_1(i_1), i′_j), with the conventions ω + 1 = ω and that the first term is ∞ when ni_j = ω for some j.

Algorithm 2: The dynamic programming algorithm solving the shortest subword problem.

Data: words u_1, ..., u_n, v_1, ..., v_n of length at most ℓ.

for all (i_1, ..., i_n) do R(i_1, ..., i_n, ω, ..., ω) ← 0
for all i with i_j = ω for some j and i′_j ≠ ω for some j do R(i) ← ∞
for all w ∈ {u_1, ..., v_n}, c ∈ Σ, i do ind(w, c, i) ← min {i′ ≥ i : w(i′) = c} (ω if there is none)
for i_1 = ℓ down to 1, for all (i_2, ..., i_n, i′_1, ..., i′_n) do
    for j ∈ [1, n] do ni_j ← ind(u_j, u_1(i_1), i_j) ; ni′_j ← ind(v_j, u_1(i_1), i′_j)
    x ← ∞ if ni_j = ω for some j, else R(i_1 + 1, ni_2 + 1, ..., ni_n + 1, ni′_1 + 1, ..., ni′_n + 1)
    y ← R(i_1 + 1, i_2, ..., i_n, i′_1, ..., i′_n)
    R(i) ← min(1 + x, y)
return R(1, ..., 1)

The key equality corresponds to the following case distinction: consider a shortest such word w from i, together with the increasing functions φ_j (resp. φ′_j) embedding w into each u_1, ..., u_n (resp. tracking v_1, ..., v_n). Then:

• either φ_1(1) = i_1, and then necessarily φ_j(1) ≥ ni_j for j ∈ [2, n] and φ′_j(1) ≥ ni′_j, so w(2:) is a shortest subword starting from (i_1 + 1, ni_2 + 1, ..., ni_n + 1, ni′_1 + 1, ..., ni′_n + 1);

• or φ_1(1) > i_1, and then w is a shortest subword starting from (i_1 + 1, i_2, ..., i_n, i′_1, ..., i′_n).

Complexity analysis. There are at most 2^ℓ subwords, and at most ℓ^{2n} tuples; both give an upper bound on the number of iterations. Processing each is done in time O(n), since we need to query the values ind(u_j, u_1(i_1), i_j) and ind(v_j, u_1(i_1), i′_j) for j ∈ [1, n]. The naive algorithm to compute all the values ind(w, c, i) runs in time O(n · |Σ| · ℓ²), but this can easily be reduced to a running time of O(n · |Σ| · ℓ).

We now show how to instantiate Algorithm 2 for learning LTL(F, ∧).

Theorem 7.

• There exists a O(n · (min{2^ℓ, ℓ^{n+1}} + |Σ| · ℓ)) time algorithm for learning LTL(F, ∧) with a single negative word.

• There exists a O(n · (min{2^ℓ, ℓ^{n+1}} + |Σ| · ℓ)) time algorithm for learning LTL(F, ∧) with a single positive word.
• There exists a O(n² · (min{2^ℓ, ℓ^{n+1}} + |Σ| · ℓ)) time n-approximation algorithm for learning LTL(F, ∧).

Proof. Let us first consider the case of a single negative word. Thanks to Lemma 4 we can restrict our attention to fatterns, so in this case learning LTL(F, ∧) is equivalent to the shortest subword problem with a single negative word. Instantiating Lemma 5, we obtain a O(n · (min{2^ℓ, ℓ^{n+1}} + |Σ| · ℓ)) time algorithm for learning LTL(F, ∧) with a single negative word.

The case of a single positive word is similar, invoking Lemma 6 instead of Lemma 4.

Let us now consider the general problem of learning LTL(F, ∧). The algorithm is the following: for each j ∈ [1, n] we run the algorithm for learning LTL(F, ∧) with a single negative word, constructing a formula φ_j separating u_1, ..., u_n from v_j. The algorithm then outputs the formula ψ = ⋀_{j ∈ [1, n]} φ_j. Indeed, ψ separates u_1, ..., u_n from v_1, ..., v_n. We now claim that |ψ| ≤ n · m, where m is the size of a minimal formula in LTL(F, ∧) separating u_1, ..., u_n from v_1, ..., v_n. Let φ be such a formula; then for all j ∈ [1, n] it also separates u_1, ..., u_n from v_j, so |φ_j| ≤ |φ|, implying that |ψ| ≤ n · |φ|.

Hardness results
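The hardness below concerns exactly this shortest subword problem. As a point of comparison with the dynamic programming algorithm, tiny instances can be brute-forced directly; a Python sketch (exponential in the answer length, for testing only; names are ours):

```python
from itertools import product

def is_subword(w, z):
    """Is w a (scattered) subword of z?  Greedy matching suffices."""
    it = iter(z)
    return all(c in it for c in w)  # membership test consumes the iterator

def shortest_subword(alphabet, positives, negatives):
    """Shortest w that is a subword of every positive word and of no
    negative word; None if there is no such word."""
    maxlen = min(len(u) for u in positives)
    for length in range(1, maxlen + 1):
        for w in map("".join, product(alphabet, repeat=length)):
            if all(is_subword(w, u) for u in positives) and \
               not any(is_subword(w, v) for v in negatives):
                return w
    return None

assert is_subword("abba", "babaaaaba")   # the example from the text
assert shortest_subword("abc", ["abc"], ["ab"]) == "c"
```

By Lemmas 4 and 6, such a word corresponds directly to a separating fattern, which is what makes this problem equivalent to learning LTL(F, ∧) with a single positive or negative word.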
Theorem 8.
The LTL(F, ∧) learning problem is NP-hard, and there are no (1 − o(1)) · log(n) polynomial time approximation algorithms unless P = NP, even with a single positive word.

The result follows from a reduction from the hitting set problem. The hitting set decision problem is: given C_1, ..., C_n subsets of [1, ℓ] and k ∈ ℕ, does there exist a subset H of [1, ℓ] of size at most k such that for every j ∈ [1, n] we have H ∩ C_j ≠ ∅? In that case we say that H is a hitting set.

The hitting set problem is an equivalent formulation of the set cover problem, but it is here technically more convenient to construct a reduction from the hitting set problem. The hardness results stated in Theorem 5 apply to the hitting set problem as well.

For proving the correctness of the reduction we need a normalisation lemma specialised to the case of a single positive word.

Lemma 6.
Let u, v_1, ..., v_n be words. If there exists φ ∈ LTL(F, ∧) separating u from v_1, ..., v_n, then there exists a fattern of size smaller than or equal to that of φ separating u from v_1, ..., v_n.

Proof. Thanks to Lemma 2, to the separating formula φ we can associate a finite set of non-repeating words w_1, ..., w_p and c ∈ Σ ∪ {ε} such that for every word z, z ⊨ φ if and only if: for all q ∈ [1, p], w_q is a subword of z, and z starts with c.

Since u satisfies φ, it starts with c and for all q ∈ [1, p], w_q is a subword of u. For each q ∈ [1, p] there exists an embedding φ_q mapping the positions of w_q to u. Let us write w for the word obtained by considering all positions mapped by some φ_q for q ∈ [1, p]. By definition w is a subword of u, and for all q ∈ [1, p], w_q is a subword of w. It follows that the fattern induced by w separates u from v_1, ..., v_n. The length of w is at most the sum of the lengths of the w_q for q ∈ [1, p], hence the fattern induced by w is no larger than the original formula φ.

We can now prove Theorem 8.

Proof.
We construct a reduction from the hitting set problem. Let C_1, ..., C_n be subsets of [1, ℓ] and k ∈ ℕ. Let us consider the alphabet [0, ℓ]; we define the word u = 0 1 2 ... ℓ. For each j ∈ [1, n] we let [1, ℓ] \ C_j = {a_{j,1} < ··· < a_{j,m_j}}, and define v_j = 0 a_{j,1} ... a_{j,m_j}.

We claim that there exists a hitting set of size at most k if and only if there exists a formula in LTL(F, ∧) of size at most 3k − 1 separating u from v_1, ..., v_n.

Let H = {c_1, ..., c_k} be a hitting set of size k with c_1 < c_2 < ··· < c_k; we construct the (non-grounded) fattern induced by w = c_1 ... c_k; it separates u from v_1, ..., v_n and has size 3k − 1.

Conversely, let φ be a formula in LTL(F, ∧) of size at most 3k − 1 separating u from v_1, ..., v_n. Thanks to Lemma 6 we can assume that φ is a fattern; let w = c_1 ... c_k be the non-repeating word it induces. Necessarily c_1 < c_2 < ··· < c_k. If φ is grounded then c_1 = 0, but then the (non-grounded) fattern induced by c_2 ... c_k is also separating, so we can assume that φ is not grounded. We let H = {c_1, ..., c_k}, and argue that H is a hitting set. Indeed, H is a hitting set if and only if for every j ∈ [1, n] we have H ∩ C_j ≠ ∅, which is equivalent to: for every j ∈ [1, n] we have v_j ⊭ φ; indeed for c_i ∈ H ∩ C_j, by definition c_i does not appear in v_j, so v_j ⊭ F c_i.

LTL(F, X, ∧, ∨)

Theorem 9.
The learning problem for LTL(F, X, ∧, ∨) is in NP.

Proof. Let u_1, …, u_n, v_1, …, v_n be a set of 2n words, all of length ℓ. We note that there always exists a separating formula:

⋁_{j ∈ [1, n]} ⋀_{i ∈ [1, ℓ]} X^{i−1} u_j(i).

This formula has size O(n · ℓ²), which is polynomial in the size of the input; it can moreover be factorised to yield a formula of size O(n · ℓ). A non-deterministic algorithm guesses a formula of size at most O(n · ℓ²) and checks whether it is indeed separating in (deterministic) polynomial time.

We note that the argument applies to any fragment containing X, ∧, and ∨; in particular this shows that the learning problem for LTL = LTL(G, F, X, ∧, ∨) is in NP.

Hardness result
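The hardness argument in this section eliminates disjunctions through an operation D(φ), defined formally below, which maps a formula of LTL(F, X, ∧, ∨) to a set of disjunction-free formulas. As a sanity check, the operation and the separation condition can be sketched in Python; the tuple encoding of formulas and all function names are our own convention:

```python
# Formulas encoded as tuples (our convention):
# ('atom', c), ('and', f, g), ('or', f, g), ('X', f), ('F', f)

def sat(w, i, phi):
    """Does the finite word w satisfy phi at (1-based) position i?"""
    op = phi[0]
    if op == 'atom':
        return i <= len(w) and w[i - 1] == phi[1]
    if op == 'and':
        return sat(w, i, phi[1]) and sat(w, i, phi[2])
    if op == 'or':
        return sat(w, i, phi[1]) or sat(w, i, phi[2])
    if op == 'X':  # strict next: requires a successor position
        return i + 1 <= len(w) and sat(w, i + 1, phi[1])
    if op == 'F':  # eventually: some position j >= i
        return any(sat(w, j, phi[1]) for j in range(i, len(w) + 1))
    raise ValueError(f"unknown operator {op}")

def D(phi):
    """Disjunction elimination: the set of LTL(F, X, and) formulas
    obtained by resolving every 'or' in phi to one of its sides."""
    op = phi[0]
    if op == 'atom':
        return {phi}
    if op == 'and':
        return {('and', l, r) for l in D(phi[1]) for r in D(phi[2])}
    if op == 'or':
        return D(phi[1]) | D(phi[2])
    if op in ('X', 'F'):
        return {(op, psi) for psi in D(phi[1])}
    raise ValueError(f"unknown operator {op}")

def separates(u, vs, phi):
    """phi separates u from vs: u satisfies phi and no v in vs does."""
    return sat(u, 1, phi) and not any(sat(v, 1, phi) for v in vs)

# Lemma 7 below, on a toy instance: if phi separates,
# then some member of D(phi) separates as well.
phi = ('or', ('atom', 'a'), ('X', ('atom', 'a')))
u, vs = "ab", ["bb"]
assert separates(u, vs, phi)
assert any(separates(u, vs, psi) for psi in D(phi))
```

On this toy instance the disjunct ('atom', 'a') is the member of D(φ) that already separates, matching the statement of the lemma.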
We show that the reduction constructed in Section 3 extends to LTL(F, X, ∧, ∨).

Theorem 10. The LTL(F, X, ∧, ∨) learning problem is NP-hard, and there is no polynomial time (1 − o(1)) · log(n)-approximation algorithm unless P = NP, even for a single positive word.

We prove that the reduction constructed in Theorem 4 is also a reduction from set cover to the LTL(F, X, ∧, ∨) learning problem. To prove this result we need a reduction lemma for disjunctions, which we state and prove now. Let φ ∈ LTL(F, X, ∧, ∨); we define D(φ) ⊆ LTL(F, X, ∧) by induction:

• If φ = c then D(φ) = {c}.
• If φ = φ_1 ∧ φ_2 then D(φ) = {ψ_1 ∧ ψ_2 : ψ_1 ∈ D(φ_1), ψ_2 ∈ D(φ_2)}.
• If φ = φ_1 ∨ φ_2 then D(φ) = D(φ_1) ∪ D(φ_2).
• If φ = X φ′ then D(φ) = {X ψ : ψ ∈ D(φ′)}.
• If φ = F φ′ then D(φ) = {F ψ : ψ ∈ D(φ′)}.

Lemma 7.
For any u, v_1, …, v_n, if φ separates u from v_1, …, v_n, then there exists ψ ∈ D(φ) which separates u from v_1, …, v_n.

Proof. We proceed by induction on φ.

• If φ = c this is clear.
• If φ = φ_1 ∧ φ_2 then D(φ) = {ψ_1 ∧ ψ_2 : ψ_1 ∈ D(φ_1), ψ_2 ∈ D(φ_2)}. Since φ separates u from v_1, …, v_n, there exist I_1, I_2 ⊆ [1, n] such that I_1 ∪ I_2 = [1, n], φ_1 separates u from {v_i : i ∈ I_1}, and φ_2 separates u from {v_i : i ∈ I_2}. By the induction hypothesis applied to both φ_1 and φ_2, there exist ψ_1 ∈ D(φ_1) separating u from {v_i : i ∈ I_1} and ψ_2 ∈ D(φ_2) separating u from {v_i : i ∈ I_2}. It follows that ψ_1 ∧ ψ_2 separates u from v_1, …, v_n, and ψ_1 ∧ ψ_2 ∈ D(φ).
• If φ = φ_1 ∨ φ_2 then D(φ) = D(φ_1) ∪ D(φ_2). Since φ separates u from v_1, …, v_n, either φ_1 or φ_2 does as well; without loss of generality let us say that φ_1 separates u from v_1, …, v_n. The induction hypothesis yields ψ ∈ D(φ_1) separating u from v_1, …, v_n, and ψ ∈ D(φ).
• The cases φ = X φ′ and φ = F φ′ follow directly from the induction hypothesis.

We now prove Theorem 10.

Proof.
Let u, v_1, …, v_{n+1} be the words constructed by the reduction. We claim that if there exists a formula in LTL(F, X, ∧, ∨) separating u from v_1, …, v_{n+1}, then there exists a formula in LTL(X, ∧) separating u from v_1, …, v_{n+1} of size smaller than or equal to the original formula. The proof goes in two steps:

• from LTL(F, X, ∧, ∨) to LTL(F, X, ∧);
• from LTL(F, X, ∧) to LTL(X, ∧).

Let φ ∈ LTL(F, X, ∧, ∨) separating u from v_1, …, v_{n+1}. Thanks to Lemma 7 there exists ψ ∈ D(φ) separating u from v_1, …, v_{n+1}. All formulas in D(φ) are smaller than or equal to φ, which finishes the proof of the first step.

Let φ ∈ LTL(F, X, ∧); we define [φ] ∈ LTL(X, ∧) by induction:

• If φ = a then [φ] = a.
• If φ = φ_1 ∧ φ_2 then [φ] = [φ_1] ∧ [φ_2].
• If φ = X φ′ then [φ] = X [φ′].
• If φ = F φ′ then [φ] = [φ′].

We claim that if φ separates u from v_1, …, v_{n+1}, then [φ] separates u from v_1, …, v_{n+1}. To prove this we establish three properties.

1. For every word w, w ⊨ [φ] implies w ⊨ φ.
2. Let i ∈ [2, ℓ + 1] and i′ ∈ [1, i − 1]. If u, i ⊨ φ then v_{n+1}, i′ ⊨ φ.
3. If u ⊨ φ and v_{n+1} ⊭ φ, then u ⊨ [φ].

Here are the proofs of these three properties.

1. By induction on φ, we prove that w, i ⊨ [φ] implies w, i ⊨ φ.

• If φ = a then [φ] = a, so the property is trivial.
• If φ = φ_1 ∧ φ_2 then [φ] = [φ_1] ∧ [φ_2], so the property follows by induction hypothesis.
• If φ = X φ′ then [φ] = X [φ′], so the property follows by induction hypothesis.
• If φ = F φ′ then [φ] = [φ′]. Assume w, i ⊨ [φ], meaning w, i ⊨ [φ′]. By the induction hypothesis this implies that w, i ⊨ φ′, which in turn implies that w, i ⊨ F φ′ (choose i′ = i in the definition of the semantics of F).

2. Recall that u = a^{ℓ+1} and v_{n+1} = a^ℓ b. By induction on φ, we prove that for all i ∈ [2, ℓ + 1] and i′ ∈ [1, i − 1], u, i ⊨ φ implies v_{n+1}, i′ ⊨ φ.
• If φ ∈ {a, b}: since u, i ⊨ φ, necessarily φ = a, so v_{n+1}, i′ ⊨ φ (indeed i′ ≤ ℓ, so v_{n+1}(i′) = a).
• If φ = φ_1 ∧ φ_2, the property follows by induction hypothesis.
• If φ = X φ′: we have u, i ⊨ φ, so i + 1 ≤ ℓ + 1 and u, i + 1 ⊨ φ′. By the induction hypothesis v_{n+1}, i′ + 1 ⊨ φ′, implying that v_{n+1}, i′ ⊨ X φ′ = φ.
• If φ = F φ′: we have u, i ⊨ φ, so there exists i″ ∈ [i, ℓ + 1] such that u, i″ ⊨ φ′. By the induction hypothesis v_{n+1}, i″ − 1 ⊨ φ′, with i″ − 1 ∈ [i − 1, ℓ], implying that for every i′ ∈ [1, i − 1] we have v_{n+1}, i′ ⊨ F φ′, so v_{n+1}, i′ ⊨ φ.

3. By induction on φ, we prove that for all i ∈ [1, ℓ + 1], if u, i ⊨ φ and v_{n+1}, i ⊭ φ, then u, i ⊨ [φ].

• If φ ∈ {a, b}, then [φ] = φ, so the property is trivial.
• If φ = φ_1 ∧ φ_2: since u, i ⊨ φ we have u, i ⊨ φ_1 and u, i ⊨ φ_2, and since v_{n+1}, i ⊭ φ, either v_{n+1}, i ⊭ φ_1 or v_{n+1}, i ⊭ φ_2. Let us consider the first case, the other being symmetric: v_{n+1}, i ⊭ φ_1. By the induction hypothesis applied to φ_1 we get u, i ⊨ [φ_1]. For φ_2, if v_{n+1}, i ⊭ φ_2 then the induction hypothesis yields u, i ⊨ [φ_2] as well; otherwise, observe that since u = a^{ℓ+1}, satisfaction on u is monotone in the position (if u, j ⊨ ψ for some j ≥ i then u, i ⊨ ψ), from which u, i ⊨ φ_2 implies u, i ⊨ [φ_2] by a direct induction on φ_2. Since [φ] = [φ_1] ∧ [φ_2], it follows that u, i ⊨ [φ].
• If φ = X φ′, the property follows by induction hypothesis.
• If φ = F φ′, then [φ] = [φ′]. Since u, i ⊨ φ, there exists i′ ∈ [i, ℓ + 1] such that u, i′ ⊨ φ′. The second property implies that necessarily i′ = i: indeed if i′ > i we would have v_{n+1}, i ⊨ φ′, implying that v_{n+1}, i ⊨ φ. It follows that u, i ⊨ φ′. Since v_{n+1}, i ⊭ φ, in particular v_{n+1}, i ⊭ φ′. By the induction hypothesis this implies that u, i ⊨ [φ′], equivalently u, i ⊨ [φ].

Thanks to these three properties we can show that if φ separates u from v_1, …, v_{n+1}, then [φ] separates u from v_1, …, v_{n+1}. Since for each j ∈ [1, n + 1] we have v_j ⊭ φ, the first property implies that v_j ⊭ [φ]. Since u ⊨ φ and v_{n+1} ⊭ φ, the third property implies that u ⊨ [φ].

Dual results and open problems
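The dualisation at the heart of this section can be checked mechanically. Below is a minimal Python sketch; the tuple encoding and function names are our own convention, and note that on finite words a strict X and its dual disagree at the last position, a boundary case the checks below avoid:

```python
# Formulas as tuples (our convention): ('atom', c), ('not', c),
# ('and', f, g), ('or', f, g), ('X', f), ('F', f), ('G', f)

def sat(w, i, phi):
    """Satisfaction of phi at (1-based) position i of the finite word w."""
    op = phi[0]
    if op == 'atom':
        return w[i - 1] == phi[1]
    if op == 'not':
        return w[i - 1] != phi[1]
    if op == 'and':
        return sat(w, i, phi[1]) and sat(w, i, phi[2])
    if op == 'or':
        return sat(w, i, phi[1]) or sat(w, i, phi[2])
    if op == 'X':   # strict next
        return i + 1 <= len(w) and sat(w, i + 1, phi[1])
    if op == 'F':   # eventually
        return any(sat(w, j, phi[1]) for j in range(i, len(w) + 1))
    if op == 'G':   # always
        return all(sat(w, j, phi[1]) for j in range(i, len(w) + 1))
    raise ValueError(f"unknown operator {op}")

def dual(phi):
    """The dual formula: a -> not a, X -> X, F -> G, G -> F, and <-> or."""
    op = phi[0]
    if op == 'atom':
        return ('not', phi[1])
    if op == 'not':
        return ('atom', phi[1])
    if op == 'and':
        return ('or', dual(phi[1]), dual(phi[2]))
    if op == 'or':
        return ('and', dual(phi[1]), dual(phi[2]))
    if op == 'X':
        # On finite words a strict X and its dual disagree at the last
        # position; the checks below stay away from that boundary case.
        return ('X', dual(phi[1]))
    if op == 'F':
        return ('G', dual(phi[1]))
    if op == 'G':
        return ('F', dual(phi[1]))
    raise ValueError(f"unknown operator {op}")

# u satisfies dual(phi) if and only if u does not satisfy phi:
for w in ("cba", "ccc", "aab"):
    for phi in (('F', ('atom', 'a')),
                ('and', ('atom', 'c'), ('F', ('atom', 'a')))):
        assert sat(w, 1, dual(phi)) == (not sat(w, 1, phi))
```

This is exactly the translation under which LTL(F, ∧) results transfer to LTL(G, ∨), as described in the text that follows.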
Towards stating the remaining most interesting open problems, let us first give an easy dualisation argument. We define the duals as follows: the dual of a is ¬a, the dual of X is X, the dual of F is G, the dual of G is F, the dual of ∧ is ∨, and the dual of ∨ is ∧. For a formula φ we write φ̄ for the formula obtained from φ by dualising each atom and operator inductively. Clearly, u ⊨ φ̄ if and only if u ⊭ φ. Consequently, φ separates u_1, …, u_n from v_1, …, v_n if and only if φ̄ separates v_1, …, v_n from u_1, …, u_n. Using this duality, LTL(X, ∧) becomes LTL(X, ∨), LTL(F, ∧) becomes LTL(G, ∨), and LTL(F, X, ∧, ∨) becomes LTL(G, X, ∧, ∨). Accordingly, all results we obtained for the three fragments apply to their duals.

We have shown in Section 4 that there is no polynomial time (1 − o(1)) · log(n)-approximation algorithm, and constructed an n-approximation algorithm (exponential in the number of words).

Open problem 1.
Does there exist a polynomial time O(log(n))-approximation algorithm for learning LTL(F, ∧)?

We have proved that the learning problem is NP-complete for the fragments LTL(X, ∧), LTL(F, ∧), LTL(F, X, ∧, ∨), and their duals. The reduction used for proving the last result does not extend to full LTL (indeed G a separates u from v_1, …, v_{n+1}).

Open problem 2.
Is the learning problem NP-complete for full LTL?

Acknowledgments
We thank Daniel Neider for introducing us to this fascinating problem.
References

[BVPA+16] Giuseppe Bombara, Cristian Ioan Vasile, Francisco Penedo Alvarez, Hirotoshi Yasuoka, and Calin Belta. A decision tree approach to data classification using signal temporal logic. In Hybrid Systems: Computation and Control, HSCC, 2016.

[CM19] Alberto Camacho and Sheila A. McIlraith. Learning interpretable models expressed in linear temporal logic. In International Conference on Automated Planning and Scheduling, ICAPS, 29, 2019.

[DS14] Irit Dinur and David Steurer. Analytical approach to parallel repetition. In Symposium on Theory of Computing, STOC, pages 624–633, 2014.

[EGN20] Rüdiger Ehlers, Ivan Gavran, and Daniel Neider. Learning properties in LTL ∩ ACTL from positive examples only. In Formal Methods in Computer Aided Design, FMCAD, 2020.

[FS08] Philippe Flajolet and Robert Sedgewick. Analytic Combinatorics. Cambridge University Press, 2008.

[Gol78] E. Mark Gold. Complexity of automaton identification from given data. Information and Control, 37(3):302–320, 1978.

[KMS+19] Joseph Kim, Christian Muise, Ankit Shah, Shubham Agarwal, and Julie Shah. Bayesian inference of linear temporal logic specifications for contrastive explanations. In International Joint Conference on Artificial Intelligence, IJCAI, 2019.

[LPB15] Caroline Lemieux, Dennis Park, and Ivan Beschastnikh. General LTL specification mining. In International Conference on Automated Software Engineering, ASE, 2015.

[NG18] Daniel Neider and Ivan Gavran. Learning linear temporal properties. In Formal Methods in Computer Aided Design, FMCAD, pages 1–10, 2018.

[Pnu77] Amir Pnueli. The temporal logic of programs. In Symposium on Foundations of Computer Science, SFCS, 1977.

[PW93] Leonard Pitt and Manfred K. Warmuth. The minimum consistent DFA problem cannot be approximated within any polynomial. Journal of the ACM, 40(1):95–142, 1993.

[RFN20] Rajarshi Roy, Dana Fisman, and Daniel Neider. Learning interpretable models in the property specification language. In