Cobham's Theorem and Automaticity

Abstract

We make certain bounds in Krebs' proof of Cobham's theorem explicit and obtain corresponding upper bounds on the length of a common prefix of an aperiodic a -automatic sequence and an aperiodic b -automatic sequence, where a and b are multiplicatively independent. We also show that an automatic sequence cannot have arbitrarily large factors in common with a Sturmian sequence.


Lucas Mol and Narad Rampersad ∗
Department of Mathematics and Statistics
University of Winnipeg
{l.mol,n.rampersad}@uwinnipeg.ca

Jeffrey Shallit †
School of Computer Science
University of Waterloo

Manon Stipulanti ‡
Department of Mathematics
University of Liège

December 17, 2018

∗ The author is supported by an NSERC Discovery Grant.
† The author is supported by an NSERC Discovery Grant.
‡ The author is supported by FRIA Grant 1.E030.16.

1 Introduction

This paper is concerned with the following question: Given a $b$-automatic sequence $f$ and a sequence $g$ from some other family of sequences $G$, how similar can $f$ and $g$ be? By "similar" we could mean several things:

1. $f$ and $g$ are identical;
2. $f$ and $g$ have a long common prefix;
3. $f$ and $g$ have a factor of length $n$ in common for infinitely many $n$;
4. $f$ and $g$ have the same set of factors of length $n$ for all sufficiently large $n$;
5. $f$ and $g$ agree on a set of positions of density $1$.

If $G$ is the family of $a$-automatic sequences, where $a$ and $b$ are multiplicatively independent (that is, $a$ and $b$ are not powers of the same integer), then we have some answers. Notably, Cobham's theorem [6] states that $f$ and $g$ can be identical only if $f$ and $g$ are ultimately periodic. Recently, Krebs [8] has given a very short and elegant proof of Cobham's theorem. Much of what we do in the first part of this paper is based on this proof of Cobham's theorem. We also note that Byszewski and Konieczny [4] generalized Cobham's theorem by showing that if $f$ and $g$ coincide on a set of positions of density $1$, then they are periodic on a set of positions of density $1$.

One of the main results of this paper concerns the "long common prefix" measure of similarity. In particular we give explicit bounds (in terms of the number of states of the automata generating the sequences) on how long $f$ and $g$ can agree before they are forced to agree forever.
As an example of a result of this type, consider the following generalization of the Fine–Wilf theorem [10, Theorem 2.3.5]: If $f \in w\{w,x\}^{\omega}$ and $g \in x\{w,x\}^{\omega}$ ($w$ and $x$ are finite words) agree on a prefix of length $|w| + |x| - \gcd(|w|,|x|)$, then $f = g$. (Here the notation $\{w,x\}^{\omega}$ denotes the set of infinite words of the form $U_1 U_2 U_3 \cdots$, where each $U_i \in \{w,x\}$.) In our setting, where $f$ is an $a$-automatic sequence and $g$ is a $b$-automatic sequence, we obtain our bounds on the length of the common prefix by following the proof of Krebs and making explicit several of the bounds that appear in this proof. Our result answers a question posed by Zamboni (personal communication), who asked how long a sequence generated by a $b$-uniform morphism and one generated by an $a$-uniform morphism can agree before the two sequences are forced to be equal.

This problem of bounding the length of the common prefix of $f$ and $g$ is related to the concept of $b$-automaticity of infinite sequences [9], which measures the minimum number of states of a base-$b$ automaton that computes the length-$n$ prefix of the sequence. In particular, we are able to get a lower bound on the $b$-automaticity of an $a$-automatic sequence.

Regarding the property of having "arbitrarily large factors in common", it is not difficult to see that even distinct aperiodic $a$-automatic and $b$-automatic sequences can have arbitrarily large factors in common. For example, the characteristic sequences of powers of $2$ and $3$ respectively are $2$-automatic and $3$-automatic respectively, and clearly have arbitrarily large runs of $0$'s in common. The problem in this case is to show that in general such large factors necessarily have some simple structure; however, we do not address this question in this paper.

If we now change the family $G$ of sequences from $a$-automatic to Sturmian, then it is somewhat easier to answer these kinds of questions.
Sturmian sequences are those given by the first differences of sequences of the form $(\lfloor n\alpha + \beta \rfloor)_{n \geq 0}$, where $0 \leq \alpha, \beta < 1$ and $\alpha$ is irrational [3]. The number $\alpha$ is called the slope of the Sturmian sequence and the number $\beta$ is called the intercept. It is well-known that a Sturmian sequence cannot be $b$-automatic. This follows from the fact that the limiting frequency of $1$'s in a Sturmian sequence is $\alpha$, whereas if a letter in a $b$-automatic sequence has a limiting frequency, that frequency must be rational [6, Thm. 6, p. 180].

The problem of determining the maximum length of a common prefix of a $b$-automatic sequence and a Sturmian sequence was examined by Shallit [9]. Upper bounds on the length of the common prefix can be deduced from the automaticity results given by Shallit. In the present paper we answer, in the negative, the question, "Can a Sturmian sequence and a $b$-automatic sequence have arbitrarily large finite factors in common?"

Byszewski and Konieczny [4] examine these questions for the family of generalized polynomial functions (these are sequences defined by expressions involving algebraic operations along with the floor function). This family contains the family of Sturmian sequences as a subset. In recent work [5], they have extended some of the results of this paper to this more general class.

We also mention the work of Tapsoba [11]. Recall that the complexity of a word $s$ is the function counting the number of distinct factors of length $n$ in $s$. It is also well-known that Sturmian words have the minimum possible complexity $n + 1$ achievable by an aperiodic infinite word. Tapsoba shows another distinction between automatic sequences and Sturmian words by giving a formula for the minimal complexity function of the fixed point of an injective $k$-uniform binary morphism and comparing this to the complexity function of Sturmian words.
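The definition of Sturmian sequences and the complexity statement above can be checked on a small example. The following is only a sketch under stated assumptions: it fixes the slope $\alpha = (\sqrt{5}-1)/2$ and intercept $\beta = 0$ (our choice, not the paper's), indexes the first differences from $n = 0$, and counts factors in a finite prefix, which can only confirm the count $n + 1$ for small $n$. The function names are hypothetical.

```python
from math import isqrt

def sturmian_golden(length):
    """First `length` first differences of floor(n * alpha), n >= 0,
    for the assumed slope alpha = (sqrt(5) - 1) / 2 and intercept 0.
    floor(n * alpha) is computed exactly as (isqrt(5 n^2) - n) // 2."""
    def floor_n_alpha(n):
        return (isqrt(5 * n * n) - n) // 2
    return [floor_n_alpha(n + 1) - floor_n_alpha(n) for n in range(length)]

def thue_morse(length):
    """Prefix of the Thue-Morse sequence: parity of the binary digit sum."""
    return [bin(n).count("1") % 2 for n in range(length)]

def complexity(seq, n):
    """Number of distinct length-n factors occurring in the finite list seq."""
    return len({tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)})

s = sturmian_golden(2000)
t = thue_morse(2000)
sturmian_counts = [complexity(s, n) for n in range(1, 11)]
thue_morse_counts = [complexity(t, n) for n in range(1, 4)]
```

On this prefix the Sturmian counts are $n + 1$ for $1 \leq n \leq 10$, while the Thue–Morse word already has $4$ factors of length $2$, so its complexity exceeds $n + 1$.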
2 $a$-automatic and $b$-automatic sequences

This section is largely based on the work of Krebs [8] and so we will mostly stick to the notation used in his paper. The reader should read this section in conjunction with Krebs' paper; we occasionally omit details that can be found there.

Let $b \geq 2$ and let $w \in \{0, 1, 2, \ldots\}^*$. Write $w = w_{n-1} w_{n-2} \cdots w_0$, where each $w_i \in \{0, 1, 2, \ldots\}$. We define the number $[w]_b$ by
$$[w]_b = w_{n-1} b^{n-1} + w_{n-2} b^{n-2} + \cdots + w_1 b + w_0.$$
Typically, one restricts $w$ to be over the canonical digit set $\{0, 1, \ldots, b-1\}$, in which case every natural number $x$ has a unique representation $w$ such that $x = [w]_b$ and $w$ does not begin with a $0$ (the number $0$ is represented by the empty string). In this case, we use $\langle x \rangle_b$ to denote this representation $w$.

However, Krebs' proof requires the use of a larger digit set. Let $D_b$ denote the digit set $\{0, 1, \ldots, 2b\}$. Over this digit set, numbers may no longer have unique representations, even with the restriction that the representation must begin with a non-zero digit. We use the notation $(x)_{D_b}$ to refer to some particular representation of $x$ over the digit set $D_b$ that does not begin with the digit zero, without necessarily specifying which representation it is. Note also that if some representation $(x)_{D_b}$ has length $n$, then
$$x \leq 2b \sum_{i=0}^{n-1} b^i = \frac{2b(b^n - 1)}{b - 1} \leq 2b^{n+1}.$$

A deterministic finite automaton with output (DFAO) is a $6$-tuple $(S, D, \delta, s_0, \Delta, F)$, where $S$ is a finite set of states, $D$ is a finite input alphabet, $\delta : S \times D \to S$ is the transition function, $s_0 \in S$ is the initial state, $\Delta$ is a finite output alphabet, and $F : S \to \Delta$ is the output function. See [2] for more details.

Let $D$ be a set of non-negative digits containing $\{0, 1, \ldots, b-1\}$. A sequence $(f_x)_{x \in \mathbb{N}}$ is $(b, D)$-automatic if there is a DFAO $M = (S, D, \delta, s_0, \Delta, F)$ such that $f_{[w]_b} = F(\delta(s_0, w))$ for all $w \in D^*$. Note that for each $x$, the DFAO $M$ must produce the same output for all $w \in D^*$ satisfying $x = [w]_b$. The DFAO $M$ is called a $(b, D)$-DFAO. A sequence is $b$-automatic if it is $(b, \{0, 1, \ldots, b-1\})$-automatic, and the automaton $M$ in this case is called a $b$-DFAO.
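The notation $[w]_b$, $\langle x \rangle_b$ and $(x)_{D_b}$ can be made concrete with a few lines of code. This is a minimal sketch, not part of the paper: the function names are hypothetical, words are stored most significant digit first, and the enumeration of representations is a brute-force search that is only practical for short words.

```python
from itertools import product

def value(w, b):
    """Return [w]_b for a digit sequence w (most significant digit first)."""
    x = 0
    for digit in w:
        x = x * b + digit
    return x

def canonical(x, b):
    """Return <x>_b as a list of digits; 0 is represented by the empty list."""
    digits = []
    while x > 0:
        x, r = divmod(x, b)
        digits.append(r)
    return digits[::-1]

def representations(x, b, length):
    """All words of the given length over D_b = {0, ..., 2b} that have
    value x and do not begin with the digit 0 (brute-force search)."""
    if length == 0:
        return [()] if x == 0 else []
    return [w for w in product(range(2 * b + 1), repeat=length)
            if w[0] != 0 and value(w, b) == x]

def largest_value(b, n):
    """Largest number with a length-n representation over D_b."""
    return value([2 * b] * n, b)
```

For instance, `representations(40, 2, 4)` contains `(4, 0, 3, 2)`, one of several non-canonical representations of $40$ over $D_2$, while `canonical(40, 2)` gives the unique canonical word $101000$.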
Krebs begins his proof by showing that a sequence $f$ is automatic with respect to representations over the canonical base-$b$ digit set if and only if it is automatic with respect to representations over the digit set $D_b$. The reverse direction can be seen by noting that given a $(b, D_b)$-DFAO generating $f$, one obtains a $b$-DFAO generating $f$ simply by deleting the transitions on all digits other than $\{0, 1, \ldots, b-1\}$. The forward direction is proved using two results: the first is a modification of [2, Theorem 6.8.6] and the second can be found in [7, Proposition 7.1.4]. The first result [2, Theorem 6.8.6] states that if a sequence $f$ is generated by a $b$-DFAO $M$, then so is the sequence obtained by first applying a transducer $T$ to the input and then feeding the output of $T$ to $M$. As presented in [2], this result requires $T$ to map words over the digit set $\{0, 1, \ldots, b-1\}$ to words over the same digit set; however, the proof is easily modified to allow $T$ to map words over any digit set to words over $\{0, 1, \ldots, b-1\}$. Krebs therefore applies this modified version of [2, Theorem 6.8.6] where $T$ is the transducer of [7, Proposition 7.1.4], which converts input over a non-canonical digit set (in our case $D_b$) to the canonical digit set for a given base $b$ (this is called normalization). The result of this operation is therefore a $(b, D_b)$-DFAO computing $f$. We now discuss the details of this construction with the aim of obtaining a reasonably small $(b, D_b)$-DFAO computing $f$.

Let $N$ be the transducer of [7, Lemma 7.1.1], which converts from the digit set $D_b$ to the digit set $\{0, 1, \ldots, b-1\}$ and reads its input from least significant digit to most significant digit. The number of states of $N$ is determined by the quantity
$$m = \max\{|e - d| : e \in D_b,\ d \in \{0, 1, \ldots, b-1\}\};$$
in particular, the state set of $N$ is defined to be $Q = \{s \in \mathbb{N} : s < m/(b-1)\}$. In our case, we have $m = 2b$, and furthermore, for $b = 2$ we have $2b/(b-1) = 4$ and for $b > 2$ we have $2 < 2b/(b-1) \leq 3$. We therefore set $\gamma = 4$ if $b = 2$ and $\gamma = 3$ if $b > 2$, so that $Q = \{s \in \mathbb{N} : s < \gamma\}$.

The set of transitions of $N$ is
$$E = \{ s \xrightarrow{e|d} s' : s + e = bs' + d \}.$$
The initial state is $0$ and the output function $\omega$ maps each state $s \in Q$ to $\langle s \rangle_b$. Note that $N$ is subsequential, or "input-deterministic". To see this, suppose we have two transitions $s \xrightarrow{e|d'} s'$ and $s \xrightarrow{e|d''} s''$. Then $bs' + d' = bs'' + d''$, which we can rewrite as $(s' - s'')b = d'' - d'$. However, we have $|d'' - d'| < b$, so $|s' - s''| < 1$, which implies $s' = s''$ and $d' = d''$.

[Figure 1: The transducer $N$ in base $2$ converting $D_2$ into $\{0, 1\}$.]

On input $u = e_n e_{n-1} \cdots e_0$ over $D_b$, the transducer $N$ produces output $v = \omega(s) d_n d_{n-1} \cdots d_0$ over $\{0, 1, \ldots, b-1\}$, where $s$ is the state reached by $N$ after reading $u$ (starting from the least significant digit $e_0$), and $[u]_b = [v]_b$.

Example 1. Throughout this section, we illustrate the proof with the case $b = 2$. In this case, the transducer $N$ is the one given in Figure 1. For instance, on input $u = 4032$ over $D_2$, the transitions of $N$ are
$$0 \xrightarrow{2|0} 1 \xrightarrow{3|0} 2 \xrightarrow{0|0} 1 \xrightarrow{4|1} 2,$$
so $N$ outputs $v = \langle 2 \rangle_2\, 1000 = 101000$, which is the canonical base-$2$ expansion of $[u]_2 = 40$.

Let $M = (S, \{0, 1, \ldots, b-1\}, \delta, I, \Delta, F)$ be a $b$-DFAO generating a $b$-automatic sequence $f$. Recall that our convention is that a $b$-DFAO reads its input from most significant digit to least significant digit.

Example 1 (Continued). We now consider the Thue–Morse sequence $t = 01101001\cdots$ which is the fixed point of the morphism $\tau : 0 \mapsto 01,\ 1 \mapsto 10$. It is well known that the Thue–Morse sequence $t$ is $2$-automatic and can be generated by the $2$-DFAO $M = (S, \{0, 1\}, \delta, I, \Delta, F)$ with $S = \{0, 1\} = \Delta$, $I = 0$, and $F(s) = s$ for $s \in S$, drawn in Figure 2.

Let $M' = (S', D_b, \delta', I', \Delta, F')$ be the $(b, D_b)$-DFAO defined as follows (again, it reads its input from most significant digit to least significant digit). We define
$$S' = \{\{(s_0, 0), (s_1, 1), \ldots, (s_{\gamma-1}, \gamma-1)\} : s_0, s_1, \ldots, s_{\gamma-1} \in S\},$$
and
$$I' = \{(\delta(I, \langle q \rangle_b), q) : 0 \leq q < \gamma\}.$$
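The transducer $N$ and the automaton $M'$ (whose transition function $\delta'$ and output function $F'$ are defined in the text that follows) lend themselves to a direct mechanical check. The sketch below is an illustration under assumptions rather than the authors' implementation: all function names are hypothetical, a state $\{(s_0,0),\ldots,(s_{\gamma-1},\gamma-1)\}$ of $M'$ is stored as the tuple $(s_0,\ldots,s_{\gamma-1})$, and the Thue–Morse automaton of Example 1 is encoded as $\delta(s,d) = s \oplus d$ with $F(s) = s$.

```python
from itertools import product

def canonical(x, b):
    """<x>_b as a list of digits, most significant first; [] for x = 0."""
    digits = []
    while x > 0:
        x, r = divmod(x, b)
        digits.append(r)
    return digits[::-1]

def normalize(u, b):
    """Run N on u (digits over D_b, most significant first). N consumes u
    from its least significant digit; a transition s --e|d--> s' satisfies
    s + e = b*s' + d. Returns the output word and the visited states."""
    state, out, path = 0, [], [0]
    for e in reversed(u):
        state, d = divmod(state + e, b)
        out.append(d)
        path.append(state)
    return canonical(state, b) + out[::-1], path

def build_m_prime(delta, initial, b):
    """Return (I', delta') for M', given the b-DFAO transition function."""
    gamma = 4 if b == 2 else 3

    def run(s, word):
        for d in word:
            s = delta(s, d)
        return s

    start = tuple(run(initial, canonical(q, b)) for q in range(gamma))

    def step(t, e):
        # For each state q' of N, follow its unique transition q' --e|d--> q
        # and apply delta(., d) to the M-state paired with q in t.
        nxt = []
        for q_prime in range(gamma):
            q, d = divmod(q_prime + e, b)
            nxt.append(delta(t[q], d))
        return tuple(nxt)

    return start, step

# Thue-Morse 2-DFAO of Example 1 (assumed encoding: delta = XOR, F = identity).
start, step = build_m_prime(lambda s, d: s ^ d, 0, 2)

def m_prime_output(u):
    t = start
    for e in u:
        t = step(t, e)
    return t[0]  # F'(t) = F(s), where (s, 0) is the pair in t with second coordinate 0

def reachable_states():
    seen, todo = {start}, [start]
    while todo:
        t = todo.pop()
        for e in range(5):
            nxt = step(t, e)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen

def agrees_with_thue_morse(max_len):
    """Compare M' with the Thue-Morse sequence on all words over D_2."""
    for n in range(1, max_len + 1):
        for u in product(range(5), repeat=n):
            x = 0
            for e in u:
                x = 2 * x + e
            if m_prime_output(u) != bin(x).count("1") % 2:
                return False
    return True
```

With this encoding, `normalize([4, 0, 3, 2], 2)` reproduces the run of Example 1, and `reachable_states()` enumerates the states of $M'$ that can actually be reached from $I'$.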

[Figure 2: The $2$-DFAO $M$ generating the Thue–Morse sequence.]

Clearly we have $I' \in S'$. For any $t \in S'$ and $e \in D_b$, we define
$$\delta'(t, e) = \bigcup_{(s,q) \in t} \left\{ (\delta(s, d), q') : q' \xrightarrow{e|d} q \text{ in } N \right\}.$$
Finally, for $t \in S'$, define $F'(t) = F(s)$, where $(s, 0) \in t$ (by the definition of $S'$, there is a unique such $s \in S$).

We first show that $\delta'$ is well-defined. Let $t \in S'$ and $e \in D_b$, and we will show that $\delta'(t, e) \in S'$. We need to show that for every state $p$ of $N$ (i.e., every $p \in Q$) the set $\delta'(t, e)$ contains a unique element of the form $(s, p)$, where $s \in S$. Let $p \in Q$ be a state of $N$. Since $N$ is input-deterministic, there is exactly one outgoing transition from $p$ in $N$ with input symbol $e$, say $p \xrightarrow{e|d} q$ in $N$. Since $(s, q) \in t$ for exactly one $s \in S$ (by definition of $S'$), we conclude that $(\delta(s, d), p) \in \delta'(t, e)$, and it is the unique element in $\delta'(t, e)$ with second coordinate $p$.

Now we show that $M'$ computes the same automatic sequence as $M$. For any $u = u_m \cdots u_0 \in D_b^*$ that doesn't begin with $0$, there exists exactly one $v = v_n \cdots v_0 \in \{0, 1, \ldots, b-1\}^*$ that doesn't begin with $0$ such that $[u]_b = [v]_b$. Namely, $v = \langle [u]_b \rangle_b$. Note that $m \leq n \leq m + 2$. We need to show that if $(s, 0) \in \delta'(I', u)$, then $\delta(I, v) = s$. Suppose that $(s, 0) \in \delta'(I', u)$. Then in $N$, we have
$$0 \xrightarrow{u_0|v_0} q_0 \xrightarrow{u_1|v_1} q_1 \xrightarrow{u_2|v_2} \cdots \xrightarrow{u_m|v_m} q_m,$$
and $\langle q_m \rangle_b = v_n \cdots v_{m+1}$. Therefore, we have $(\delta(I, v_n \cdots v_{m+1}), q_m) \in I'$, and retracing the steps of $M'$, we conclude that $\delta(I, v) = s$.

Informally, $M'$ works through the transducer $N$ in the reverse direction, while computing the transitions of $M$ on the output. Since we are working through the transducer backwards, there are $\gamma$ possible places to start, each corresponding to a different backwards path through the transducer. Further, if we start working backwards from state $q$ in the transducer, then the output function of the transducer will be $\langle q \rangle_b$.
The output function of the transducer is read first by $M'$, which explains the definition of $I'$. Only when we reach the end of the input string do we know which backwards path through the transducer was correct (the one that started at state $0$), so $M'$ computes the transitions of $M$ for all $\gamma$ paths along the way.

We have therefore shown how, given a $b$-DFAO $M$ for $f$, to produce a $(b, D_b)$-DFAO $M'$ that also generates $f$. Furthermore, the $(b, D_b)$-DFAO $M'$ has at most
$$|S|^{\gamma} \leq |S|^{4} \tag{1}$$
states.

[Figure 3: The $(2, D_2)$-DFAO $M'$ computing the Thue–Morse sequence; states are shaded according to their output.]

Example 1 (Continued). In Figure 3, we give the $(2, D_2)$-DFAO $M'$ (omitting all unreachable states) that computes the Thue–Morse sequence. We also give its transition table in Table 1. To that aim, recall that $\gamma = 4$. From Figure 2, we also get
$$I' = \{(\delta(I, \langle q \rangle_2), q) : 0 \leq q < \gamma\} = \{(\delta(I, \epsilon), 0), (\delta(I, 1), 1), (\delta(I, 10), 2), (\delta(I, 11), 3)\} = \{(0,0), (1,1), (1,2), (0,3)\}.$$

Table 1: The transition function $\delta'$ of $M'$ as a function of $t \in S'$ and $e \in \{0, 1, 2, 3, 4\}$. Each state $\{(s_0,0), (s_1,1), (s_2,2), (s_3,3)\}$ is abbreviated by the word $s_0 s_1 s_2 s_3$.

t       e=0     e=1     e=2     e=3     e=4
0110    0110    1101    1010    0100    1001
1101    1010    0100    1001    0011    0110
1010    1001    0011    0110    1100    1001
0100    0110    1100    1001    0010    0101
1001    1001    0010    0101    1011    0110
0011    0101    1011    0110    1101    1010
1011    1001    0011    0110    1101    1010
1100    1010    0100    1001    0010    0101
0010    0101    1011    0110    1100    1001
0101    0110    1100    1001    0011    0110

We also compute $M'$ on two different words $u \in D_2^*$. Take $u = 4032 \in D_2^*$ whose canonical base-$2$ expansion is $v = 101000$. The transitions are
$$I' = \{(0,0), (1,1), \mathbf{(1,2)}, (0,3)\} \xrightarrow{4} \{(1,0), \mathbf{(0,1)}, (0,2), (1,3)\} \xrightarrow{0} \{(1,0), (0,1), \mathbf{(0,2)}, (1,3)\} \xrightarrow{3} \{(1,0), \mathbf{(0,1)}, (1,2), (1,3)\} \xrightarrow{2} \{\mathbf{(0,0)}, (1,1), (1,2), (0,3)\}.$$
By definition of $F'$, we have $F'(\{(0,0), (1,1), (1,2), (0,3)\}) = F(0) = 0$. Thus the automaton $M'$ outputs $0$ after reading $u$, just as the automaton $M$ does when reading $v$. The second coordinates of the ordered pairs in bold are the states of the "correct path" through the transducer $N$, in reverse:
$$0 \xrightarrow{2|0} 1 \xrightarrow{3|0} 2 \xrightarrow{0|0} 1 \xrightarrow{4|1} 2.$$
The first coordinate of the bolded pair in $I'$ is $\delta(I, \langle 2 \rangle_2) = \delta(I, 10) = 1$, and the first coordinates of the remaining bolded pairs are determined by starting from state $\delta(I, 10) = 1$ in $M$ and following the transitions of $M$ given by the output labels of the above path through $N$ (again, working backwards through $N$):
$$\delta(I, 10) = 1 \xrightarrow{1} 0 \xrightarrow{0} 0 \xrightarrow{0} 0 \xrightarrow{0} 0 = \delta(I, v).$$
This illustrates how, on input $u$, $M'$ computes $F(\delta(I, v))$, which is exactly the output of $M$ on input $v$.

As a second illustration, take $u' = 2014 \in D_2^*$ whose canonical base-$2$ expansion is $v' = 10110$. On the input $u'$, the transitions of $M'$ are
$$I' = \{(0,0), \mathbf{(1,1)}, (1,2), (0,3)\} \xrightarrow{2} \{\mathbf{(1,0)}, (0,1), (1,2), (0,3)\} \xrightarrow{0} \{(1,0), \mathbf{(0,1)}, (0,2), (1,3)\} \xrightarrow{1} \{(0,0), (0,1), \mathbf{(1,2)}, (0,3)\} \xrightarrow{4} \{\mathbf{(1,0)}, (0,1), (0,2), (1,3)\}.$$
Similarly, $F'(\{(1,0), (0,1), (0,2), (1,3)\}) = F(1) = 1$, so the automaton $M'$ outputs $1$ after reading $u'$, agreeing with the output of $M$ on input $v'$. Again, we have bolded the ordered pairs corresponding to the "correct path" through the transducer $N$.

We end this section with some remarks on the construction. We hope that the reader is convinced that the construction we have described works for any digit set containing $\{0, 1, \ldots, b-1\}$ and not just the digit set $D_b$. Furthermore, Krebs has pointed out (private communication) that the number of states needed for the construction can be improved by changing the digit set from $D_b$ to $\{0, 1, \ldots, 2b-2\}$. Recall that our construction results in a DFAO with $|S|^{\gamma}$ states. If $b = 2$, then we have $\gamma = 4$, while if $b > 2$, then we have $\gamma = 3$. However, if we change the digit set as suggested by Krebs, we improve this to $|S|^2$ states. Krebs' proof of Cobham's Theorem works just as well with this new choice of digit set; however, a number of bounds and constants in his proof would have to be modified. We do not present these modifications here; we just note that it is possible to do it.

2.3 Upper bound on longest common prefix

Having dealt with the conversion to the larger digit set required by Krebs, we now proceed with the Diophantine approximation result used by Krebs.
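The Diophantine statement that follows (Lemma 2) asserts that powers $a^m$ and $b^n$ with $|a^m - b^n| \leq \epsilon b^n$ can be found with exponents bounded by $\eta((b-1)/\epsilon + 1)$. The sketch below illustrates this by brute-force search; it is not the pigeonhole argument of the proof. The function names are hypothetical, $\epsilon$ is assumed to be passed as an exact `Fraction`, and the search is restricted to $m \geq 1$ to skip the trivial pair $m = n = 0$.

```python
from fractions import Fraction
from math import floor

def ceil_log(base, x):
    """Least integer k >= 0 with base**k >= x (exact integer arithmetic)."""
    k, power = 0, 1
    while power < x:
        power *= base
        k += 1
    return k

def eta(a, b):
    """eta = max(ceil(log_a b), ceil(log_b a))."""
    return max(ceil_log(a, b), ceil_log(b, a))

def search_bound(a, b, eps):
    """Integer part of eta * ((b - 1) / eps + 1), the bound in the lemma."""
    return floor(eta(a, b) * ((b - 1) / eps + 1))

def find_close_powers(a, b, eps):
    """First pair (m, n), m >= 1, within the bound such that
    |a**m - b**n| <= eps * b**n; None if no such pair is found."""
    limit = search_bound(a, b, eps)
    for m in range(1, limit + 1):
        for n in range(0, limit + 1):
            if abs(a ** m - b ** n) <= eps * b ** n:
                return m, n
    return None

pair = find_close_powers(2, 3, Fraction(1, 10))
```

For $a = 2$, $b = 3$ and $\epsilon = 1/10$ the bound is $42$, and the search stops at $2^8 = 256$ and $3^5 = 243$, which differ by $13 \leq 24.3$.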

Lemma 2. Let $a, b \geq 2$ be integers and let $\epsilon$ be a positive real number. Define $\eta := \max\{\lceil \log_a b \rceil, \lceil \log_b a \rceil\}$. There are non-negative integers $m, n < \eta((b-1)/\epsilon + 1)$ such that $|a^m - b^n| \leq \epsilon b^n$.

Proof. First suppose that $a \geq b$. Let $(f_x)_{x \in \mathbb{N}}$ be the sequence such that $a^x b^{-f_x} \in [1, b)$ for all $x \in \mathbb{N}$. Then $0 \leq (\log_b a)x - f_x$, so $f_x \leq (\log_b a)x$. Now by the pigeonhole principle there exist $x < y \leq (b-1)/\epsilon + 1$ such that
$$\left| a^y b^{-f_y} - a^x b^{-f_x} \right| \leq \epsilon;$$
i.e.,
$$\left| a^{y-x} - b^{f_y - f_x} \right| \leq \epsilon b^{f_y} a^{-x} \leq \epsilon b^{f_y - f_x}.$$
Thus, we have
$$m = y - x \leq y \leq (b-1)/\epsilon + 1$$
and
$$n = f_y - f_x \leq f_y \leq (\log_b a)y \leq (\log_b a)((b-1)/\epsilon + 1) \leq \eta((b-1)/\epsilon + 1),$$
as required.

Now suppose that $a < b$. Applying the previous argument with $a^{\lceil \log_a b \rceil}$ in place of $a$ (where $\lceil \rho \rceil$ denotes the least integer greater than or equal to $\rho$), we find that
$$m = \lceil \log_a b \rceil (y - x) \leq \lceil \log_a b \rceil y \leq \lceil \log_a b \rceil ((b-1)/\epsilon + 1) \leq \eta((b-1)/\epsilon + 1),$$
and
$$n = f_y - f_x \leq f_y \leq \lceil \log_a b \rceil (\log_b a) y \leq \lceil \log_a b \rceil (\log_b a)((b-1)/\epsilon + 1) \leq \eta((b-1)/\epsilon + 1),$$
as required (the final inequality above follows from the fact that $\log_b a < 1$ in this case).

As in Lemma 2, define $\eta := \max\{\lceil \log_a b \rceil, \lceil \log_b a \rceil\}$ and also define $\theta := \max\{a, b\}$. We now define
$$E(a, b, R, S) := \eta\left[ 6\left( 2\theta^{(S+1)(R+1)} + 1 \right)(\theta - 1) + 1 \right],$$
$$A(a, b, R, S) := \left( 2\theta^{(S+1)(R+1)} + 2 \right) \theta^{E(a,b,R,S)},$$
and note that both these functions are symmetric under exchange of their first two arguments and also under exchange of their last two arguments.

Theorem 3.

Let $a, b \geq 2$ be multiplicatively independent integers. Let $g = (g_x)_{x \in \mathbb{N}}$ be computed by a DFAO $M_a = (S_a, D_a, \delta_a, s_{0,a}, \Delta_a, F_a)$ in base $a$ and let $f = (f_x)_{x \in \mathbb{N}}$ be computed by a DFAO $M_b = (S_b, D_b, \delta_b, s_{0,b}, \Delta_b, F_b)$ in base $b$. Suppose that $f$ and $g$ agree on a prefix of length $A(a, b, |S_a|, |S_b|)$. Then $f$ and $g$ are equal and ultimately periodic.

Proof. Let $S_\infty$ be the subset of states of $M_b$ consisting of all states $s$ with the property that there are infinitely many numbers $x$ such that some representation $(x)_{D_b}$ reaches state $s$ in $M_b$. For each $s \in S_\infty$, we claim that there must exist a state $t \in S_a$ and positive integers $x_{st}$ and $y_{st}$ such that some base-$b$ representations $(x_{st})_{D_b}$ and $(y_{st})_{D_b}$ both lead to state $s$ in $M_b$ and some base-$a$ representations $(x_{st})_{D_a}$ and $(y_{st})_{D_a}$ both lead to state $t$ in $M_a$. We show this by giving an explicit upper bound on $x_{st}$ and $y_{st}$.

If a string $W$ has length at least $|S_b|$, then any computation of $M_b$ on $W$ repeats a state. Since for each $s \in S_\infty$ there are infinitely many $(x)_{D_b}$ that reach state $s$, there must exist some number $x$, some representation $(x)_{D_b}$, and some factorization $(x)_{D_b} = uw$ with the following properties:

• $|(x)_{D_b}| \leq |S_b|$.
• There exists a non-empty $v$ such that $|v| \leq |S_b|$ and $uv^i w$ reaches $s$ for all $i \geq 0$.

For $0 \leq i \leq |S_a|$, let $x_i$ be the integer such that $(x_i)_{D_b} = uv^i w$. Then the numbers $x_i$ are all distinct. Now consider the states reached in $M_a$ by some choice of representations $(x_i)_{D_a}$, for $0 \leq i \leq |S_a|$. There must be two such numbers $x_i$ and $x_j$ such that $(x_i)_{D_a}$ and $(x_j)_{D_a}$ reach the same state $t$ in $M_a$. We choose these as our $x_{st}$ and $y_{st}$. Finally, we note that for $0 \leq i \leq |S_a|$, we have $|(x_i)_{D_b}| \leq |S_b|(|S_a| + 1)$, which gives the bound $x_{st}, y_{st} \leq 2b^{|S_b|(|S_a|+1)+1}$.

Let
$$\xi := \max\{x_{st}, y_{st} \mid s \in S_\infty\} + 1 \leq 2b^{|S_b|(|S_a|+1)+1} + 1.$$
By Lemma 2, there exist $m, n \leq \eta[6\xi(b-1) + 1] \leq E(a, b, |S_a|, |S_b|)$ such that
$$6\xi\,|a^m - b^n| \leq b^n.$$
As defined in [8], let $p_{st} := (x_{st} - y_{st})(a^m - b^n)$ (swapping $x_{st}$ and $y_{st}$ if necessary, so that $p_{st} > 0$), and note that from [8] we have $p_{st} \leq b^n/6$. Let $z$ be any integer such that
$$z,\ z + p_{st} \in \left[ \tfrac{1}{3} b^n, \tfrac{5}{3} b^n \right].$$
In particular, there exist representations $(z)_{D_b}$ and $(z + p_{st})_{D_b}$ such that $|(z)_{D_b}|, |(z + p_{st})_{D_b}| \leq n$. In what follows, we specifically use the representations of $z$ and $z + p_{st}$ that satisfy this condition on their lengths. We also note that by the calculation in [8], we have $z - y_{st}(a^m - b^n) \leq 2a^m$, so there is also a representation $(z - y_{st}(a^m - b^n))_{D_a}$ of length at most $m$.

Let $x$ be any integer such that some representation $(x)_{D_b}$ goes to state $s$ in $M_b$. Recall that $(x_{st})_{D_b}$ and $(y_{st})_{D_b}$ go to state $s$ in $M_b$ and $(x_{st})_{D_a}$ and $(y_{st})_{D_a}$ go to state $t$ in $M_a$. If $f$ and $g$ agree on a sufficiently long prefix (to be specified later), then we have
$$\begin{aligned}
f_{xb^n + z} &= f_{y_{st} b^n + z} && \text{(since $(x)_{D_b}$ and $(y_{st})_{D_b}$ go to state $s$ in $M_b$)} \\
&= f_{y_{st} a^m + z - y_{st}(a^m - b^n)} && \text{(rewriting the index)} \\
&= g_{y_{st} a^m + z - y_{st}(a^m - b^n)} && \text{(since $f$ and $g$ agree)} \\
&= g_{x_{st} a^m + z - y_{st}(a^m - b^n)} && \text{(since $(y_{st})_{D_a}$ and $(x_{st})_{D_a}$ go to state $t$ in $M_a$)} \\
&= g_{x_{st} b^n + z + p_{st}} && \text{(rewriting the index)} \\
&= f_{x_{st} b^n + z + p_{st}} && \text{(since $f$ and $g$ agree)} \\
&= f_{xb^n + z + p_{st}} && \text{(since $(x_{st})_{D_b}$ and $(x)_{D_b}$ go to state $s$ in $M_b$)}.
\end{aligned}$$
For this calculation to be correct, the two sequences $f$ and $g$ should agree on a prefix of length
$$\max\{y_{st}, x_{st} \mid s \in S_\infty\}\, a^m + z - y_{st}(a^m - b^n) \leq (\xi - 1)a^m + z - y_{st}(a^m - b^n) \leq (\xi - 1)a^m + 2a^m \leq (\xi + 1)a^m.$$
Now $\xi \leq 2b^{|S_b|(|S_a|+1)+1} + 1$, so we have
$$(\xi + 1)a^m \leq \left( 2b^{|S_b|(|S_a|+1)+1} + 2 \right) a^m \leq \left( 2b^{|S_b|(|S_a|+1)+1} + 2 \right) a^{E(a,b,|S_a|,|S_b|)} \leq A(a, b, |S_a|, |S_b|).$$
Thus, if $f$ and $g$ agree on a prefix of length $A(a, b, |S_a|, |S_b|)$, then $f$ has a local period
$$p_{st} \leq \tfrac{1}{6} b^n \leq \tfrac{1}{6} b^{E(a,b,|S_a|,|S_b|)}$$
on the interval
$$\left[ (x + 1/3) b^n, (x + 5/3) b^n \right].$$
By the same argument as in [8], the sequence $f$ is ultimately periodic. We will show further that the periodicity begins after a prefix of length at most
$$\left( 2b^{|S_b|+1} + \tfrac{1}{3} \right) b^n \leq \left( 2b^{|S_b|+1} + \tfrac{1}{3} \right) b^{E(a,b,|S_a|,|S_b|)}.$$
Any representation $(x)_{D_b}$ of length $|S_b|$ must reach a state in $S_\infty$. Therefore if $x = 2b^{|S_b|+1}$, then for every $y \geq x$, every representation $(y)_{D_b}$ reaches a state in $S_\infty$. Now by the argument of [8], the sequence $f$ has period $p_f := p_{st}$ starting from index
$$i_f := \left( 2b^{|S_b|+1} + \tfrac{1}{3} \right) b^n.$$
By a similar argument (with the roles of $f$ and $g$ reversed) we find that if $f$ and $g$ agree on a prefix of length $A(a, b, |S_a|, |S_b|)$, then $g$ has period $p_g$ starting from some index $i_g$, where $p_g$ and $i_g$ are defined analogously to $p_f$ and $i_f$. Now, we have
$$\max\{p_f, p_g\} \leq \tfrac{1}{6}\, \theta^{E(a,b,|S_a|,|S_b|)},$$
and
$$\begin{aligned}
\max\{i_f, i_g\} &\leq \max\left\{ \left( 2b^{|S_b|+1} + \tfrac{1}{3} \right) b^{E(a,b,|S_a|,|S_b|)}, \left( 2a^{|S_a|+1} + \tfrac{1}{3} \right) a^{E(b,a,|S_b|,|S_a|)} \right\} \\
&\leq \max\left\{ \left( 2b^{|S_b|+1} + \tfrac{1}{3} \right), \left( 2a^{|S_a|+1} + \tfrac{1}{3} \right) \right\} \theta^{E(a,b,|S_a|,|S_b|)},
\end{aligned}$$
so
$$\max\{i_f, i_g\} + p_f + p_g \leq \max\left\{ \left( 2b^{|S_b|+1} + \tfrac{2}{3} \right), \left( 2a^{|S_a|+1} + \tfrac{2}{3} \right) \right\} \theta^{E(a,b,|S_a|,|S_b|)} \leq A(a, b, |S_a|, |S_b|).$$
Therefore, by the Fine–Wilf theorem [10, Theorem 2.3.5], the sequences $f$ and $g$ are equal.

In the next corollary, let $\exp_r(x)$ denote the function $r^x$.

Corollary 4.

Let $a, b \geq 2$ be multiplicatively independent integers. Let $g = (g_x)_{x \in \mathbb{N}}$ and $f = (f_x)_{x \in \mathbb{N}}$ be sequences over a set $\Delta$ of size $d$. Suppose that $g$ is computed by an $a$-DFAO $M'_a$ with $R$ states and $f$ is computed by a $b$-DFAO $M'_b$ with $S$ states. There is a positive constant $C$ (depending only on $a$ and $b$) such that if $f$ and $g$ agree on a prefix of length
$$\exp_\theta(\exp_\theta(C R^4 S^4)) \tag{2}$$
then $f$ and $g$ are equal and ultimately periodic.

Proof. We have previously observed that conversion from a $b$-DFAO to a $(b, D_b)$-DFAO increases the number of states to at most the quantity (1). We apply the bound of Theorem 3 with $|S_a| = R^4$ and $|S_b| = S^4$. Simplifying the resulting expression, we find that there is a positive constant $C$ such that the bound of Theorem 3 is at most the quantity (2).

Note that the bound on the length of the common prefix that we obtain seems absurdly large compared to what seems likely to be the optimal bound. It is not too difficult to give an example where the common prefix has length that is (singly) exponential in the size of the defining automata. For instance, let $g$ be the constant (and hence $a$-automatic) sequence $(0, 0, 0, 0, \ldots)$. Fix some positive integer $N$ and let $f$ be the characteristic sequence of the set $\{b^n - 1 : n \geq N\}$. Then $f$ is an aperiodic $b$-automatic sequence. Indeed, a $b$-DFAO $M$ generating $f$ can be obtained from the $N + 2$ state DFA accepting the regular language $0^*(b-1)^N(b-1)^*$ by making the accepting state output $1$ and all other states output $0$. Then $M$ has $N + 2$ states and the sequences $f$ and $g$ agree on a prefix of length $b^N - 1$.

Now we examine the connection to automaticity. The $b$-automaticity of a sequence $g$ is the function $A^b_g(n)$ whose value is the least number of states required in a $b$-DFAO that computes a prefix of $g$ of length $n$. Shallit [9, Proposition 1(c)] showed that if $g$ is not $b$-automatic, then there is a constant $c$ such that $A^b_g(n) \geq c \log_b n$ for infinitely many $n$.

Corollary 5.

Let a, b be multiplicatively independent integers with a, b ≥ . There is apositive constant D such that the b -automaticity A bg ( n ) of an aperiodic a -automatic sequence g satisﬁes A bg ( n ) > D (log log n ) / , for all n ≥ .Proof. Let M a be an a -DFAO computing g and let M b,n be a b -DFAO computing a sequence f that agrees with g on a preﬁx of length n . Suppose that M a has E states and that M b,n has S n states. Since g is aperiodic, by (2) we have n < exp θ (exp θ ( CE ( S n ) )) Treating E as a constant, we get S n > (cid:18) C / E (cid:19) (log θ log θ n ) / = D (log log n ) / , for some positive constant D . 13ote that while this may seem weaker than the c log b n lower bound mentioned previously,the former only holds for inﬁnitely many n , whereas our lower bound holds for all n . Withoutthe assumption that g is a -automatic, the b -automaticity of g could potentially be constantfor long stretches, and only for very sparsely distributed values of n satisfy A bg ( n ) ≥ c log b n .Our result shows that under the assumption that g is a -automatic, the function A bg ( n ) cannotbe constant for too long.On the other hand, our lower bound on the b -automaticity does seem to be rather weakcompared to what can be proved for speciﬁc sequences. Shallit [9] showed that if p is theﬁxed point of → , → , then for k odd, we have A kp ( n ) = Ω( n / /k ) , and if t is the ﬁxed point of → , → (the Thue–Morse word), then for k odd, we have A kt ( n ) = Ω( n / /k / ) . b -automatic and Sturmiansequences As mentioned in the introduction, the problem of bounding the length of the longest commonpreﬁx of a b -automatic sequence and a Sturmian sequence was addressed by Shallit [9]. In thissection, we show that two such sequences cannot have arbitrarily large factors in common.Our main result is the following: Theorem 6.

Let $f$ be a $b$-automatic sequence and let $g$ be a Sturmian sequence. There exists a constant $C$ (depending on $f$ and $g$) such that if $f$ and $g$ have a factor in common of length $n$, then $n \leq C$.

Note that this result would follow fairly easily from the frequency results mentioned previously, if $f$ is uniformly recurrent (meaning that every factor $z$ of $f$ occurs infinitely often, and with bounded gap size between two consecutive occurrences). However, unlike Sturmian sequences, automatic sequences need not be uniformly recurrent: consider, for example, the -automatic sequence that is the characteristic sequence of the powers of . Our proof is therefore based on the finiteness of the $b$-kernel of $f$, along with the uniform distribution property of Sturmian sequences (this is similar to the techniques used in [9]).

Proof.

Let $f = f_0 f_1 \cdots$ and $g = g_0 g_1 \cdots$, where $g$ has slope $\alpha$ and intercept $\beta$. Since the factors of a Sturmian word do not depend on $\beta$, without loss of generality, we may suppose that $\beta = 0$ (or, in other words, that $g$ is a characteristic word). Then $g$ can be defined by the following rule:

$$g_n = \begin{cases} 1, & \text{if } \{(n+1)\alpha\} < \alpha; \\ 0, & \text{otherwise.} \end{cases}$$

Here $\{\cdot\}$ denotes the fractional part of a real number.

Suppose that for some integer $L$, the words $f$ and $g$ have a factor of length $L$ in common: i.e., for some $i \leq j$, we have

$$f_i \cdots f_{i+L-1} = g_j \cdots g_{j+L-1}.$$

(We may assume $i \leq j$ since $g$ is recurrent, but this is not important for what follows.)

Suppose that the $b$-kernel of $f$,

$$\{ (f_{n b^r + s})_{n \geq 0} : r \geq 0 \text{ and } 0 \leq s < b^r \},$$

has $Q$ distinct elements. Let $r$ satisfy $b^r > Q$. Then there exist integers s , s with ≤ s { d α } . Choose ε > 0 such that ε < { d α } − { d α } . Note that d − d = s − s , so ε does not depend on $L$ (or $I$). Since $b^r \alpha$ is irrational, if $I$ is sufficiently large, then by Kronecker's theorem (which asserts that the set of points $\{n\alpha\}$ is dense in $(0, 1)$) there exists $N \in I$ such that { N ( b r α ) + d α } ∈ [ α, α + ε ] . By the choice of ε, this implies that { N ( b r α ) + d α } ≥ α and { N ( b r α ) + d α } < α, contradicting the assumption that $N$ satisfies one of (3) or (4). The contradiction means that $L$ must be bounded by some constant $C$, which proves the theorem.

Example 7.

Consider the Thue–Morse word $t = 01101001\cdots$, and the Fibonacci word $f = 01001010\cdots$ given by the fixed point of $0 \to 01$ and $1 \to 0$. The latter is Sturmian. The set of common factors is

{ε, 0, 1, 00, 01, 10, 001, 010, 100, 101, 0010, 0100, 0101, 1001, 1010, 00101, 01001, 10010, 10100, 010010, 100101, 101001, 0100101, 1010010, 10100101},

so $C = 8$.

Final thoughts

As noted at the end of Section 2, the Ω((log log n ) / ) lower bound we obtain on the $b$-automaticity of an aperiodic $a$-automatic sequence is surely not optimal. Sequences with $O(\log n)$ (i.e., "low") $b$-automaticity are called $b$-quasiautomatic in [9]. It seems unlikely that an aperiodic $a$-automatic sequence can even be $b$-quasiautomatic. Known examples of $b$-quasiautomatic sequences strongly resemble $b$-automatic sequences. For example, the fixed point of the morphism $c \to cba$, $a \to aa$, $b \to b$, starting with $c$, is -quasiautomatic, but not -automatic [9]. Similarly, the fixed point of → , → is not -automatic [1] but is conjectured to be -quasiautomatic [9]. Curiously, this latter sequence is automatic with respect to the positional numeration system (and a certain choice of canonical representations) whose place values are given by the sequence ((2 n − ( − n ) / n ≥ [1].

We conclude by again mentioning the problem stated in the introduction of characterizing the common factors of a $b$-automatic sequence and an $a$-automatic sequence. Can the method of Krebs be applied to this problem?

Acknowledgments

We thank Jean-Paul Allouche for helpful discussions. The normalization construction of Section 2.2 was obtained in discussions with Émilie Charlier, Julien Leroy, and Michel Rigo of the University of Liège. We thank them for their help with this problem.

After we posted an initial version of this paper on the arXiv, Thijmen Krebs contacted us with a number of very helpful comments. He clarified some important points regarding his paper, and gave several suggestions which greatly simplified and improved the presentation of the normalization construction. We are very grateful for his feedback, which significantly improved the exposition in Section 2.2.
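As a computational footnote to Example 7, the set of common factors of the Thue–Morse word and the Fibonacci word can be checked by brute force. The following Python sketch is not taken from the paper: it builds finite prefixes of both words from their morphisms and intersects their sets of short factors. The helper names `iterate_morphism` and `factors_up_to`, the prefix length 5000, and the search window of 12 are illustrative assumptions. A finite prefix can confirm that a factor is common; the absence of longer common factors in the output relies on the prefixes being long enough to contain every factor up to the window length.

```python
def iterate_morphism(rules, start, min_length):
    """Return a prefix (of length >= min_length) of the fixed point of a morphism.

    Assumes rules[start] begins with start, so each iterate is a prefix of the next.
    """
    word = start
    while len(word) < min_length:
        word = "".join(rules[letter] for letter in word)
    return word


def factors_up_to(word, max_len):
    """Return the set of factors of `word` of length 0..max_len (empty word included)."""
    return {
        word[i:i + n]
        for n in range(max_len + 1)
        for i in range(len(word) - n + 1)
    }


PREFIX_LENGTH = 5000  # assumption: long enough to contain every short factor
MAX_LEN = 12          # search window, chosen larger than the bound C = 8

# Thue-Morse word: fixed point of 0 -> 01, 1 -> 10 starting with 0.
thue_morse = iterate_morphism({"0": "01", "1": "10"}, "0", PREFIX_LENGTH)
# Fibonacci word: fixed point of 0 -> 01, 1 -> 0.
fibonacci = iterate_morphism({"0": "01", "1": "0"}, "0", PREFIX_LENGTH)

# Factors (up to the window length) occurring in both prefixes.
common = factors_up_to(thue_morse, MAX_LEN) & factors_up_to(fibonacci, MAX_LEN)
longest = max(common, key=len)

print(len(common), len(longest), longest)
```

With these parameters the script reports 25 common factors (counting the empty word), the longest being 10100101 of length 8, in agreement with $C = 8$. The reason is visible in the words themselves: the Fibonacci word contains no factor 11, and the longest stretch of the Thue–Morse word between two consecutive occurrences of 11 is 10100101.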

References

[1] G. Allouche, J.-P. Allouche, and J. Shallit, "Kolam indiens, dessins sur le sable aux îles Vanuatu, courbe de Sierpinski et morphismes de monoïde", Ann. Inst. Fourier, Grenoble 56 (2006), 2115–2130.

[2] J.-P. Allouche and J. Shallit, Automatic Sequences: Theory, Applications, Generalizations, Cambridge, 2003.

[3] J. Berstel and P. Séébold, "Sturmian words", in M. Lothaire, ed., Algebraic Combinatorics on Words, Cambridge University Press, 2002, pp. 40–97.

[4] J. Byszewski and J. Konieczny, "Automatic sequences and generalized polynomials". Preprint available at https://arxiv.org/abs/1705.08979 .

[5] J. Byszewski and J. Konieczny, "Factors of generalised polynomials and automatic sequences", Indag. Math. (N.S.) 29 (2018), no. 3, 981–985.

[6] A. Cobham, "Uniform tag sequences", Math. Systems Theory 6 (1972), 164–192.

[7] M. Lothaire, Algebraic Combinatorics on Words, Cambridge, 2002.

[8] T. Krebs, "A more reasonable proof of Cobham's Theorem". Preprint available at https://arxiv.org/abs/1801.06704 .

[9] J. Shallit, "Automaticity IV: sequences, sets, and diversity", J. Théorie des Nombres de Bordeaux, 8 (1996), 347–367.

[10] J. Shallit,


Submitted on 3 Sep 2018 (v1), last revised 14 Dec 2018 (this version, v2)
