k-Spectra of Weakly-c-Balanced Words

Abstract

A word $u$ is a scattered factor of $w$ if $u$ can be obtained from $w$ by deleting some of its letters. That is, there exist (potentially empty) words $u_1, u_2, \ldots, u_n$ and $v_0, v_1, \ldots, v_n$ such that $u = u_1 u_2 \cdots u_n$ and $w = v_0 u_1 v_1 u_2 v_2 \cdots u_n v_n$. We consider the set of length-$k$ scattered factors of a given word $w$, called here the $k$-spectrum and denoted $\operatorname{ScatFact}_k(w)$. We prove a series of properties of the sets $\operatorname{ScatFact}_k(w)$ for binary strictly balanced and, respectively, $c$-balanced words $w$, i.e., words over a two-letter alphabet where the number of occurrences of each letter is the same or, respectively, one letter has $c$ more occurrences than the other. In particular, we consider the question of which cardinalities $n = |\operatorname{ScatFact}_k(w)|$ are obtainable, for a positive integer $k$, when $w$ is either a strictly balanced binary word of length $2k$ or a $c$-balanced binary word of length $2k - c$. We also consider the problem of reconstructing words from their $k$-spectra.

Full PDF

arXiv [cs.FL], May

k-Spectra of Weakly-c-Balanced Words

Joel D. Day, Pamela Fleischmann, Florin Manea, and Dirk Nowotka

Loughborough University, UK, [email protected]
Kiel University, Germany, {fpa,flm,dn}@informatik.uni-kiel.de

Abstract.

A word $u$ is a scattered factor of $w$ if $u$ can be obtained from $w$ by deleting some of its letters. That is, there exist (potentially empty) words $u_1, u_2, \ldots, u_n$ and $v_0, v_1, \ldots, v_n$ such that $u = u_1 u_2 \cdots u_n$ and $w = v_0 u_1 v_1 u_2 v_2 \cdots u_n v_n$. We consider the set of length-$k$ scattered factors of a given word $w$, called here the $k$-spectrum and denoted $\operatorname{ScatFact}_k(w)$. We prove a series of properties of the sets $\operatorname{ScatFact}_k(w)$ for binary weakly-0-balanced and, respectively, weakly-$c$-balanced words $w$, i.e., words over a two-letter alphabet where the number of occurrences of each letter is the same or, respectively, one letter has $c$ more occurrences than the other. In particular, we consider the question of which cardinalities $n = |\operatorname{ScatFact}_k(w)|$ are obtainable, for a positive integer $k$, when $w$ is either a weakly-0-balanced binary word of length $2k$ or a weakly-$c$-balanced binary word of length $2k - c$. We also consider the problem of reconstructing words from their $k$-spectra.

Given a word $w$, a scattered factor (also called scattered subword, or simply subword in the literature) is a word obtained by removing one or more factors from $w$. More formally, $u$ is a scattered factor of $w$ if there exist $u_1, \ldots, u_n \in \Sigma^*$ and $v_0, \ldots, v_n \in \Sigma^*$ such that $u = u_1 u_2 \cdots u_n$ and $w = v_0 u_1 v_1 u_2 \cdots u_n v_n$. Consequently a scattered factor of $w$ can be thought of as a representation of $w$ in which some parts are missing. As such, there is considerable interest in the relationship of a word and its scattered factors from both a theoretical and practical point of view. For an introduction to the study of scattered factors, see Chapter 6 of [9]. On the one hand, it is easy to imagine how, in any situation where discrete, linear data is read from an imperfect input (such as when sequencing DNA or during the transmission of a digital signal), scattered factors form a natural model, as multiple parts of the input may be missed, but the rest will remain unaffected and in-sequence.
For instance, various applications and connections of this model in verification are discussed in [14,6] within a language theoretic framework, while applications of the model in DNA sequencing are discussed in [4] in an algorithmic framework. On the other hand, from a more algebraic perspective, there have been efforts to bridge the gap between the non-commutative field of combinatorics on words and traditional commutative mathematics via Parikh matrices (cf. e.g., [11,13]), which are closely related to, and influenced by, the topic of scattered factors.

The set (or also in some cases, multi-set) of scattered factors of a word $w$, denoted $\operatorname{ScatFact}(w)$, is typically exponentially large in the length of $w$, and contains a lot of redundant information in the sense that, for $k' < k \le |w|$, a word of length $k'$ is a scattered factor of $w$ if and only if it is a scattered factor of a scattered factor of $w$ of length $k$. This has led to the idea of $k$-spectra: the set of all length-$k$ scattered factors of a word. For example, the 3-spectrum of the word $\mathtt{ababbb}$ is the set $\{\mathtt{aab}, \mathtt{aba}, \mathtt{abb}, \mathtt{bab}, \mathtt{bbb}\}$. Note that unlike some literature, we do not consider the $k$-spectra to be the multi-set of scattered factors in the present work, but rather ignore the multiplicities. This distinction is non-trivial as there are significant variations in the properties based on these different definitions (cf. e.g., [10]). Also, the notion of $k$-spectra is closely related to the classical notion of factor complexity of words, which counts, for each positive integer $k$, the number of distinct factors of length $k$ of a word. Here, the cardinality of the $k$-spectrum of a word gives the number of the word's distinct scattered factors of length $k$.

One of the most fundamental questions about $k$-spectra of words, and indeed sets of scattered factors in general, is that of recognition: given a set $S$ of words (of length $k$), is $S$ a subset of a $k$-spectrum of some word?
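The 3-spectrum example given above can be reproduced by brute force. The following is a minimal Python sketch and is not taken from the paper: the function names `is_scattered_factor` and `k_spectrum` are hypothetical, and the enumeration visits every choice of $k$ positions, so it is only meant for short words.

```python
from itertools import combinations


def is_scattered_factor(u: str, w: str) -> bool:
    """Return True if u can be obtained from w by deleting letters."""
    remaining = iter(w)
    # `ch in remaining` consumes the iterator up to the first match,
    # so the letters of u are matched from left to right in w.
    return all(ch in remaining for ch in u)


def k_spectrum(w: str, k: int) -> set[str]:
    """Set (not multi-set) of the length-k scattered factors of w."""
    return {"".join(w[i] for i in pos) for pos in combinations(range(len(w)), k)}


print(sorted(k_spectrum("ababbb", 3)))
# ['aab', 'aba', 'abb', 'bab', 'bbb']
```

Because the result is a Python `set`, multiplicities are discarded, which matches the convention adopted in this work.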
In general, it remains a long standing goal of the theory to give a "nice" descriptive characterisation of scattered factor sets (and $k$-spectra), and to better understand their structure [9].

Another fundamental question concerning $k$-spectra, and one well motivated in several applications, is the question of reconstruction: given a word $w$ of length $n$, what is the smallest value $k$ such that $w$ is uniquely determined by its $k$-spectrum? This question was addressed and solved successively in a variety of cases. In particular, in [3], the exact bound of $\lfloor n/2 \rfloor + 1$ is given in the general case. Other variations, including for the definition of $k$-spectra where multiplicities are also taken into account, are considered in [10], while [7] considers the question of reconstructing words from their palindromic scattered factors.

In the current work, we consider $k$-spectra in the restricted setting of a binary alphabet $\Sigma = \{\mathtt{a}, \mathtt{b}\}$. For such an alphabet, we can always identify the natural number $c \in \mathbb{N}_0$ which describes how weakly balanced a word is: $c$ is the difference between the number of $\mathtt{a}$s and the number of $\mathtt{b}$s. Thus, it seems natural to categorise all words over $\Sigma$ according to this difference: a binary word where one letter has exactly $c$ more occurrences than the other one is called weakly-$c$-balanced. In Section 3 the cardinalities of $k$-spectra of weakly-$c$-balanced words of length $2k - c$ are investigated. Our first results concern the minimal and maximal cardinality $\operatorname{ScatFact}_k$ might have. We show that, for weakly-0-balanced words, the cardinality ranges between $k + 1$ and $2^k$, and determine exactly for which words of length $2k$ these values are reached. In the case of weakly-$c$-balanced words, we are able to replicate the result regarding the minimal cardinality of $\operatorname{ScatFact}_k$, but the case of maximal cardinality seems to be more complicated. To this end, it seems that the words containing many alternations between the two letters of the alphabet have larger sets $\operatorname{ScatFact}_k$.
Therefore, we first investigate the scattered factors of the words which are prefixes of $(\mathtt{ab})^\omega$ and give a precise description of all scattered factors of any length of such words. That is, not only do we compute the cardinality of $\operatorname{ScatFact}_k(w)$, for all such words $w$, but we also describe a way to obtain directly the respective scattered factors, without repetitions. We use this to describe exactly the sets $\operatorname{ScatFact}_i$ for the word $(\mathtt{ab})^{k-c}\mathtt{a}^c$, which seems a good candidate for a weakly-$c$-balanced word with many distinct scattered factors.

Further, in Section 4, we explore in more depth the cardinalities of $\operatorname{ScatFact}_k(w)$ for weakly-0-balanced words $w$ of length $2k$. We obtain for these words that the smallest three numbers which are possible cardinalities for their $k$-spectra are $k + 1$, $2k$, and $3k - 3$, thus identifying two gaps in the set of such cardinalities. Among other results on this topic, we show that for every constant $i$ there exists a word $w$ of length $2k$ such that $|\operatorname{ScatFact}_k(w)| \in \Theta(n^i)$; we also show how such a word can be constructed.

Finally, in Section 5, we also approach the question of reconstructing weakly-0-balanced words from $k$-spectra in the specific case that the spectra are also limited to weakly-0-balanced words only. While we are not able to resolve the question completely, we conjecture that the situation is similar to the general case: the smallest value $k$ such that $w$ is uniquely determined by its $k$-spectrum is $k = \frac{|w|}{2} + 1$ if $\frac{|w|}{2}$ is odd and $k = \frac{|w|}{2} + 2$ otherwise, in the case when $w$ contains at most two blocks of $\mathtt{b}$s.

After introducing a series of basic definitions, preliminaries, and notations, the organisation of the paper follows the description above. The proofs can be found in [2].

Let $\mathbb{N}$ be the set of natural numbers, $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$, and let $\mathbb{N}_{\ge k}$ be the set of all natural numbers greater than or equal to $k$. Let $[n]$ denote the set $\{1, \ldots, n\}$ and $[n]_0 = [n] \cup \{0\}$ for an $n \in \mathbb{N}$.

We consider words $w$ over the alphabet $\Sigma = \{\mathtt{a}, \mathtt{b}\}$. $\Sigma^*$ denotes the set of all finite words over $\Sigma$, also called binary words. $\Sigma^\omega$ denotes the set of all infinite words over $\Sigma$, also called binary infinite words. The empty word is denoted by $\varepsilon$ and $\Sigma^+$ is the free semigroup $\Sigma^* \setminus \{\varepsilon\}$. The length of a word $w$ is denoted by $|w|$. Let $\Sigma^{\le k} := \{w \in \Sigma^* \mid |w| \le k\}$ and let $\Sigma^k$ be the set of all words of length exactly $k \in \mathbb{N}$. The number of occurrences of a letter $\mathtt{a} \in \Sigma$ in a word $w \in \Sigma^*$ is denoted by $|w|_{\mathtt{a}}$. The $i$th letter of a word $w$ is given by $w[i]$ for $i \in [|w|]$. For a given word $w \in \Sigma^n$ the reversal of $w$ is defined by $w^R = w[n]\, w[n-1] \cdots w[2]\, w[1]$. The powers of $w \in \Sigma^*$ are defined recursively by $w^0 = \varepsilon$, $w^n = w\, w^{n-1}$ for $n \in \mathbb{N}$.

A word $w \in \Sigma^*$ is called weakly-$c$-balanced if $\big||w|_{\mathtt{a}} - |w|_{\mathtt{b}}\big| = c$ for $c \in \mathbb{N}_0$.
Thus weakly-0-balanced words have the same number of $\mathtt{a}$s and $\mathtt{b}$s. Let $\Sigma^*_{wzb}$ be the set of all weakly-0-balanced words over $\Sigma$. For example, $\mathtt{abaa}$ is weakly-2-balanced, $\mathtt{aba}$ is weakly-1-balanced, while $\mathtt{abbaba}$ is weakly-0-balanced.

A word $u \in \Sigma^*$ is a factor of $w \in \Sigma^*$ if $w = xuy$ holds for some words $x, y \in \Sigma^*$. Moreover, $u$ is a prefix of $w$ if $x = \varepsilon$ holds and a suffix if $y = \varepsilon$ holds. The factor of $w$ from the $i$th to the $j$th letter will be denoted by $w[i..j]$ for $1 \le i \le j \le |w|$. Given a letter $z \in \Sigma$ and a word $w \in \Sigma^*$, a block of $z$ is a factor $u = w[i..j]$ with $u = z^{j-i+1}$, such that either $i = 1$ or $w[i-1] \ne z$, and either $j = |w|$ or $w[j+1] \ne z$. For example, the word $\mathtt{abaaabaabb}$ has 3 $\mathtt{a}$-blocks and 3 $\mathtt{b}$-blocks. Scattered factors and $k$-spectra are defined as follows.

Definition 1.

A word $u = a_1 \cdots a_n \in \Sigma^n$, for $n \in \mathbb{N}$, is a scattered factor of a word $w \in \Sigma^+$ if there exist $v_0, \ldots, v_n \in \Sigma^*$ with $w = v_0 a_1 v_1 \cdots v_{n-1} a_n v_n$. Let $\operatorname{ScatFact}(w)$ denote the set of $w$'s scattered factors and consider additionally $\operatorname{ScatFact}_k(w)$ and $\operatorname{ScatFact}_{\le k}(w)$ as the two subsets of $\operatorname{ScatFact}(w)$ which contain only the scattered factors of length $k \in \mathbb{N}$ or the ones up to length $k \in \mathbb{N}$.

The sets $\operatorname{ScatFact}_{\le k}(w)$ and $\operatorname{ScatFact}_k(w)$ are also known as the full $k$-spectrum and, respectively, the $k$-spectrum of a word $w \in \Sigma^*$ (see [1], [10], [12]) and moreover, scattered factors are often called subwords or scattered subwords. Obviously the $k$-spectrum is empty for $k > |w|$ and contains exactly $w$'s letters for $k = 1$ and only $w$ for $k = |w|$. Considering the word $w = \mathtt{abba}$, the other spectra are given by $\operatorname{ScatFact}_2(w) = \{\mathtt{a}^2, \mathtt{b}^2, \mathtt{ab}, \mathtt{ba}\}$ and $\operatorname{ScatFact}_3(w) = \{\mathtt{ab}^2, \mathtt{aba}, \mathtt{b}^2\mathtt{a}\}$.

It is worth noting that if $u$ is a scattered factor of $w$, and $v$ is a scattered factor of $u$, then $v$ is a scattered factor of $w$. Additionally, notice two important symmetries regarding $k$-spectra. For $w \in \Sigma^*$ and the renaming morphism $\overline{\,\cdot\,} : \Sigma \to \Sigma$ with $\overline{\mathtt{a}} = \mathtt{b}$ and $\overline{\mathtt{b}} = \mathtt{a}$ we have $\operatorname{ScatFact}(w^R) = \{u^R \mid u \in \operatorname{ScatFact}(w)\}$ and $\operatorname{ScatFact}(\overline{w}) = \{\overline{u} \mid u \in \operatorname{ScatFact}(w)\}$. Thus, from a structural point of view, it is sufficient to consider only one representative from the equivalence classes induced by the equivalence relation where $w_1$ is equivalent to $w_2$ whenever $w_2$ is obtained by a composition of reversals and renamings from $w_1$. Considering w.l.o.g. the order $\mathtt{a} < \mathtt{b}$ on $\Sigma$, we choose the lexicographically smallest word as representative from each class. As such, we will mostly analyse the $k$-spectra of words starting with $\mathtt{a}$. We shall make use of this fact extensively in Section 4.

3 k-Spectra of Weakly-c-Balanced Words

In the current section, we consider the combinatorial properties of $k$-spectra of weakly-$c$-balanced finite words.
In particular, we are interested in the cardinalities of the $k$-spectra and in the question: which cardinalities are (not) possible? Since the $k$-spectra of $\mathtt{a}^n$ and $\mathtt{b}^n$ are just $\{\mathtt{a}^k\}$ and $\{\mathtt{b}^k\}$ respectively for all $n \in \mathbb{N}$ and $k \in [n]_0$, we assume $|w|_{\mathtt{a}}, |w|_{\mathtt{b}} > 0$ for $w \in \Sigma^*$. It is a straightforward observation that not every subset of $\Sigma^k$ is a $k$-spectrum of some word $w$. For example, for $k = 2$, $\mathtt{aa}$ and $\mathtt{bb}$ can only both be scattered factors of a word containing both $\mathtt{a}$s and $\mathtt{b}$s, which therefore has either $\mathtt{ab}$ or $\mathtt{ba}$ as a scattered factor as well. Thus, there is no word $w$ such that $\operatorname{ScatFact}_2(w) = \{\mathtt{aa}, \mathtt{bb}\}$.

In general, for any word containing only $\mathtt{a}$s or only $\mathtt{b}$s, there will be exactly one scattered factor of each length, while for words containing both $\mathtt{a}$s and $\mathtt{b}$s, the smallest $k$-spectra are realised for words of the form $w = \mathtt{a}^n\mathtt{b}$ (up to renaming and reversal), for which $\operatorname{ScatFact}_k(w) = \{\mathtt{a}^k, \mathtt{a}^{k-1}\mathtt{b}\}$ for each $k \in [|w| - 1]$. On the other hand, as Proposition 5 shows, the maximal $k$-spectra are those containing all words of length $k$, and hence have size $2^k$, achieved by e.g. $w = (\mathtt{ab})^n$ for $n \ge k$. Note that when weakly-0-balanced words are considered, the same maximum applies, since $(\mathtt{ab})^n$ is weakly-0-balanced, while the minimum does not, since $\mathtt{a}^n\mathtt{b}$ is not weakly-0-balanced.

It is straightforward to enumerate all possible $k$-spectra, and describe the words realising them, for $k \le 2$; hence we shall generally consider only $k$-spectra in the sequel for which $k \ge 3$. Our first result generalises the previous observation about minimal-size $k$-spectra.

Theorem 2.

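For $k \in \mathbb{N}_{\ge 3}$, $c \in [k-1]_0$, $i \in [c]_0$, and a weakly-$c$-balanced word $w \in \Sigma^{2k-c}$, we have $|\operatorname{ScatFact}_{k-i}(w)| \ge k - c + 1$, where equality holds if and only if $w \in \{\mathtt{a}^k\mathtt{b}^{k-c}, \mathtt{a}^{k-c}\mathtt{b}^k, \mathtt{b}^k\mathtt{a}^{k-c}, \mathtt{b}^{k-c}\mathtt{a}^k\}$. Moreover, if $w \in \Sigma^{2k}_{wzb} \setminus \{\mathtt{a}^k\mathtt{b}^k, \mathtt{b}^k\mathtt{a}^k\}$, then $|\operatorname{ScatFact}_k(w)| \ge k + 3$.

Proof. Consider firstly only weakly-0-balanced words, i.e. $c = 0$, and w.l.o.g. only $w = \mathtt{a}^k\mathtt{b}^k$. The cases $k = 1$ and $k = 2$ are the induction basis. The word $\mathtt{a}^k\mathtt{b}^k$ obviously has all words $\mathtt{a}^r\mathtt{b}^s$ with $r, s \in [k]_0$ and $r + s = k$ as scattered factors of length $k$, thus $k + 1$ many. This proves the $\Leftarrow$-direction.

Consider now a word $w \in \Sigma^{2k}_{wzb} \setminus \{\mathtt{a}^k\mathtt{b}^k, \mathtt{b}^k\mathtt{a}^k\}$. Since $w$ is not of this form, $w$ contains a factor $\mathtt{aba}$ or $\mathtt{bab}$. Assume w.l.o.g. that $w = x\,\mathtt{aba}\,y$ holds for $x, y \in \Sigma^*$ with $|x| + |y| = 2k - 3$. Since $w \in \Sigma^{2k}_{wzb}$, it follows that $|x|_{\mathtt{b}}$ or $|y|_{\mathtt{b}}$ is not zero. Choose w.l.o.g. $z_1, z_2 \in \Sigma^*$ with $y = z_1 \mathtt{b} z_2$, which implies $w = x\,\mathtt{aba}\,z_1 \mathtt{b} z_2$. Consequently $|x z_1 z_2|_{\mathtt{a}} = |x z_1 z_2|_{\mathtt{b}} = k - 2$.

Case 1: $x z_1 z_2 = \mathtt{a}^{k-2}\mathtt{b}^{k-2}$. By induction $|\operatorname{ScatFact}_{k-2}(x z_1 z_2)| = (k - 2) + 1 = k - 1$. Let $u$ be a scattered factor of $x z_1 z_2$ of length k −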

2. Then there exist u , u , and u such that u is ascattered factor of x , u of z , and u of z respectively. Consequently u aa u u , u ab u u , and u ba u u are diﬀerent elements of ScatFact k ( w ). Each scattered factor of xz z is of theform a r b s for r, s ∈ [ k − . We will now prove in which cases the aforementionedscattered factors are diﬀerent. Consider u = u u u = a r b s and u ′ = u ′ u ′ u ′ = a r ′ b s ′ to be diﬀerent scattered factors of this form, i.e. r = r ′ and s = s ′ . Set α = u aa u u , β = u ′ aa u ′ u ′ α = u ba u u , β = u ′ ba u ′ u ′ α = u ab u u , β = u ab u u . If u = a r , u u = a r b s and u ′ = a r ′ , u ′ u ′ = a r ′ b s ′ with r + r = r and r ′ + r ′ = r ′ , we get because of r = r ′ , r = − α = a r +2 b s = a r ′ +2 b s = β ,α = a r +2 b s = a r ′ ba r ′ +1 b s ′ = β α = a r ba r +1 b s = a r ′ ba r ′ +1 b s ′ = β . f u = a r , u u = a r b s and u ′ = a r ′ b s ′ , u ′ u ′ = b s ′ with r + r = r , s ′ + s ′ = s ′ , and s ′ = 0 (already in the previous case) we get because of s ′ = 0, α = a r +2 b s = a r ′ b s ′ aab s ′ = β ,α = a r +2 b s = a r ′ b s ′ bab s ′ = β α = a r ba r +1 b s = a r ′ b s ′ bab s ′ = β . If u = a r b s , u u = b s and u ′ = a r ′ b s ′ , u ′ u ′ = b s ′ with r + r = r , s ′ + s ′ = s ′ , and s , s ′ = 0 (already in the previous case) we get because of r ′ = r and s , s ′ = 0, α = a r b s aab s = a r ′ b s ′ aab s ′ = β ,α = a r b s aab s = a r ′ b s ′ bab s ′ = β α = a r b s bab s = a r ′ b s ′ bab s ′ = β . Consequently α and α are all diﬀerent and we get 2( k −

1) many diﬀerentscattered factors. Assume now additionally | r − r ′ | = 3. If u = a r , u u = a r b s and u ′ = a r ′ , u ′ u ′ = a r ′ b s ′ with r + r = r and r ′ + r ′ = r ′ , we get becauseof s ′ = 0, r ′ = r , r ′ = r + 1 α = a r +2 b s == a r ′ aba r ′ b s ′ = β ,α = a r ba r +1 b s = a r ′ aba r ′ b s ′ = β ,α = a r aba r b s = a r ′ aba r ′ b s ′ = β , If u = a r , u u = a r b s and u ′ = a r ′ b s ′ , u ′ u ′ = b s ′ with r + r = r , s ′ + s ′ = s ′ , and s ′ = 0 (already in the previous case) we get because of s ′ = 0, r ′ = r + 2, α = a r +2 b s = a r ′ b s ′ abb s ′ = β ,α = a r ba r +1 b s = a r ′ b s ′ abb s ′ = β α = a r aba r b s = a r ′ b s ′ abb s ′ = β . If u = a r b s , u u = b s and u ′ = a r ′ b s ′ , u ′ u ′ = b s ′ with r + r = r , s ′ + s ′ = s ′ , and s , s ′ = 0 (already in the previous case) we get because of r ′ = r and s , s ′ = 0, r ′ = r + 2, α = a r b s aab s = a r ′ b s ′ abb s ′ = β ,α = a r b s bab s = a r ′ b s ′ abb s ′ = β α = a r b s abb s = a r ′ b s ′ abb s ′ = β . Consequently we have another ⌊ k − ⌋ +1 diﬀerent scattered factors. This sums upto | ScatFact k ( w ) | ≥ k − > k + 1. An immediate result is that the k -spectrumas at least k + 3 elements for k ≥

5. For k = 3 and k = 4 the results can beeasily veriﬁed by testing. case 2: xz z = a k − b k − In this case all words of the form a r abaa s for r + s = k − r ∈ [ | x | a ] , and s ∈ [ | y | a ] are | x | a + 1 diﬀerent scattered factors of length k of w . Analogously all b r ′ abab s ′ with r ′ + s ′ = k − r ′ ∈ [ | x | b ] , s ′ ∈ [ | y | b ] are | x | b +1 diﬀerent scatteredfactors of length k of w . All these factors are diﬀerent and additionally w has a k and b k as scattered factors. Hence | ScatFact k ( w ) | ≥ | x | a + | x | b + 4 = | x | + 4holds. Since the length of w is 2 k , the length of xy is 2 k − x and y have diﬀerent lengths. Assume w.l.o.g. | x | > | y | , i.e. | x | ≥ k −

1. Thisimplies | ScatFact k ( w ) | ≥ k + 3 follows. This proves the claim for c = 0.Assume now c > w = a k b k − c . By the previous part we know | ScatFact k − c ( w ) | = k − c + 1 if and only if w = a k − c b k − c . The claim about the( k − c )-spectrum follows immediately by ScatFact k − c ( w ) = ScatFact k − c ( a k b k − c )since the prepended a s do not change the ( k − c )-spectrum. For i ∈ [ c − noticethat x ∈ ScatFact k − i ( a k b k − c ) implies that a x (resp. x b , x a , b x ) is a scatteredfactor of a k b k − c of length k − i + 1. Thus | ScatFact k − i +1 ( w ) | ≥ k − c + 1 follows.On the other hand a scattered factor of a k b k − c of length k − i + 1 is exactly ofthis form, since it can neither start with b ( a k b k − c has only ( k − c ) occurrencesof b ) nor contain ba resp. ab (this would be the implication of a scattered factorbeing of the form a x ′ with | x ′ | = k − i , x ′ ScatFact k − i ( a k b k − c )). ⊓⊔ Remark 3.

Theorem 2 immediately answers in the negative the question whether a given set $S \subseteq \Sigma^k$, with $|S| < k + 1$ or $|S| = k + 2$, is a $k$-spectrum of a word $w \in \Sigma^{2k}_{wzb}$.

Theorem 2 shows that the smallest cardinality of the $k$-spectrum of a word $w$ is reached when the letters in $w$ are nicely ordered, both for weakly-0-balanced words as well as for weakly-$c$-balanced words with $c > 0$. The largest cardinality is, not surprisingly, reached for words where the alternation of $\mathtt{a}$ and $\mathtt{b}$ letters is, in a sense, maximal, e.g., for $w = (\mathtt{ab})^k$. To this end, one can show a general result.

Theorem 4.

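For $w \in \Sigma^*$, the $k$-spectrum of $w$ is $\Sigma^k$ if and only if $\{\mathtt{ab}, \mathtt{ba}\}^k \cap \operatorname{ScatFact}_{2k}(w) \ne \emptyset$.

Proof. We will show this result by induction. For $k = 1$, the equivalence is: $\operatorname{ScatFact}_1(w) = \Sigma$ iff $\{\mathtt{ab}, \mathtt{ba}\} \cap \operatorname{ScatFact}_2(w) \ne \emptyset$. If both $\mathtt{a}$ and $\mathtt{b}$ are scattered factors of $w$, then $\mathtt{ab}$ or $\mathtt{ba}$ has to be a factor and thus a scattered factor of $w$. On the other hand, if $w$ has $\mathtt{ab}$ or $\mathtt{ba}$ as a scattered factor, it has $\mathtt{a}$ and $\mathtt{b}$ as scattered factors.

Assume now that the equivalence holds for an arbitrary but fixed $k - 1 \in \mathbb{N}$. We will show it holds for $k$.

For the $\Leftarrow$-direction consider $u \in \{\mathtt{ab}, \mathtt{ba}\}^k \cap \operatorname{ScatFact}_{2k}(w)$. Thus, $u \in \{\mathtt{ab}, \mathtt{ba}\}^{k-1}\{\mathtt{ab}, \mathtt{ba}\}$ and hence there exists $u' \in \{\mathtt{ab}, \mathtt{ba}\}^{k-1}$ with $u \in u'\{\mathtt{ab}, \mathtt{ba}\}$. By induction we have $\operatorname{ScatFact}_{k-1}(u') = \Sigma^{k-1}$. For any $x \in \Sigma^k$ there exists $x' \in \Sigma^{k-1}$ with $x \in x'\{\mathtt{a}, \mathtt{b}\}$. This implies that there exist $v_0, \ldots, v_{k-1} \in \Sigma^*$ with $u' = v_0\, x'[1]\, v_1 \cdots x'[k-1]\, v_{k-1}$, since $x' \in \operatorname{ScatFact}_{k-1}(u')$. By $u \in v_0\, x'[1]\, v_1 \cdots x'[k-1]\, v_{k-1}\{\mathtt{ab}, \mathtt{ba}\}$ it follows in both cases, namely $x = x'\mathtt{a}$ or $x = x'\mathtt{b}$, that $x$ is a scattered factor of $u$ and hence $x \in \operatorname{ScatFact}_k(w)$. This proves the inclusion $\Sigma^k \subseteq \operatorname{ScatFact}_k(w)$. By $\operatorname{ScatFact}_k(w) \subseteq \Sigma^k$ the first direction is proven.

For the $\Rightarrow$-direction assume $\operatorname{ScatFact}_k(w) = \Sigma^k$. Assume w.l.o.g. $w[|w|] = \mathtt{a}$. Choose $x, y \in \Sigma^*$ with $w = xy$, $x[|x|] = \mathtt{b}$, and $y \in \mathtt{a}^*$. As $\Sigma^{k-1}\mathtt{b} \subset \Sigma^k$, it follows that $\Sigma^{k-1}\mathtt{b} \subseteq \operatorname{ScatFact}_k(x)$. Clearly, this means that $\Sigma^{k-1} \subseteq \operatorname{ScatFact}_{k-1}(x[1..|x|-1])$, so by the induction hypothesis $\{\mathtt{ab}, \mathtt{ba}\}^{k-1} \cap \operatorname{ScatFact}_{2k-2}(x[1..|x|-1]) \ne \emptyset$. Thus, $\{\mathtt{ab}, \mathtt{ba}\}^{k-1}\, x[|x|]\, \mathtt{a} \cap \operatorname{ScatFact}_{2k}(w[1..|x|+1]) \ne \emptyset$, because $w[1..|x|+1] = x[1..|x|-1]\,\mathtt{ba}$. Hence, $\{\mathtt{ab}, \mathtt{ba}\}^{k-1}\mathtt{ba} \cap \operatorname{ScatFact}_{2k}(w) \ne \emptyset$. The conclusion follows. ⊓⊔

The previous theorem has an immediate consequence, which exactly characterises the weakly-0-balanced words of length $2k$ for which the maximal cardinality of $\operatorname{ScatFact}_k(w)$ is reached.

Proposition 5.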

For $k \in \mathbb{N}_{\ge 3}$ and $w \in \Sigma^{2k}_{wzb}$ we have $w \in \{\mathtt{ab}, \mathtt{ba}\}^k$ if and only if $\operatorname{ScatFact}_k(w) = \Sigma^k$.

Proof. If $w \in \{\mathtt{ab}, \mathtt{ba}\}^k$, then $\{\mathtt{ab}, \mathtt{ba}\}^k \cap \operatorname{ScatFact}_{2k}(w) \ne \emptyset$ and the claim follows by Theorem 4. On the other hand, if $\operatorname{ScatFact}_k(w) = \Sigma^k$ then $\{\mathtt{ab}, \mathtt{ba}\}^k \cap \operatorname{ScatFact}_{2k}(w) \ne \emptyset$ and, since $|w| = 2k$, we get $w \in \{\mathtt{ab}, \mathtt{ba}\}^k$. ⊓⊔

To see why from $w \in \{\mathtt{ab}, \mathtt{ba}\}^k$ it follows that $\operatorname{ScatFact}_k(w) = \Sigma^k$, note that, by definition, a word $w \in \{\mathtt{ab}, \mathtt{ba}\}^k$ is just a concatenation of $k$ blocks from $\{\mathtt{ab}, \mathtt{ba}\}$. To construct the scattered factors of $w$, we can simply select from each block either the $\mathtt{a}$ or the $\mathtt{b}$. The resulting output is a word of length $k$, where in each position we could choose the letter freely. Consequently, we can produce all words in $\Sigma^k$ in this way. The other implication follows by induction.

Generalising Proposition 5 for weakly-$c$-balanced words requires a more sophisticated approach. A generalisation would be to consider $w \in \{\mathtt{ab}, \mathtt{ba}\}^{k-c}\mathtt{a}^c$. By Theorem 4 we have $\operatorname{ScatFact}_{k-c}(w) = \Sigma^{k-c}$. But the size of $\operatorname{ScatFact}_{k-i}(w)$ for $i \in [c-1]_0$ depends on the specific choice of $w$. To see why, consider the words $w_1 = \mathtt{baabba}$ and $w_2 = (\mathtt{ba})^3$. Then by Proposition 5, $|\operatorname{ScatFact}_3(w_1)| = 8 = |\operatorname{ScatFact}_3(w_2)|$. However, when we append an $\mathtt{a}$ to the end of both $w_1$ and $w_2$, we see that in fact $|\operatorname{ScatFact}_4(w_1\mathtt{a})| = 11 \ne 12 = |\operatorname{ScatFact}_4(w_2\mathtt{a})|$. The main difference between weakly-0-balanced and weakly-$c$-balanced words for $c > 0$ lies in the factors $\mathtt{a}^2$ and $\mathtt{b}^2$ occurring in $w$.

In the remaining part of this section we present a series of results for weakly-$c$-balanced words. Intuitively, the words with many alternations between $\mathtt{a}$ and $\mathtt{b}$ have more distinct scattered factors. So, we will focus mainly on such words. Our first result is a direct consequence of Theorem 4. The second result concerns words avoiding $\mathtt{a}^2$ and $\mathtt{b}^2$ and gives a method to identify efficiently the $\ell$-spectra of words which are prefixes of $(\mathtt{ab})^\omega$, for all $\ell$.
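The counts in the example of $w_1 = \mathtt{baabba}$ and $w_2 = (\mathtt{ba})^3$ given earlier, together with the criterion of Theorem 4, can be checked by brute force on small inputs. The sketch below is an illustration only and is not part of the paper: `k_spectrum` and `has_full_spectrum` are hypothetical helper names, and the check of Theorem 4 simply searches the $2k$-spectrum for a word from $\{\mathtt{ab}, \mathtt{ba}\}^k$.

```python
from itertools import combinations, product


def k_spectrum(w: str, k: int) -> set[str]:
    """Brute-force set of the length-k scattered factors of w."""
    return {"".join(w[i] for i in pos) for pos in combinations(range(len(w)), k)}


def has_full_spectrum(w: str, k: int) -> bool:
    """Criterion of Theorem 4: some word of {ab, ba}^k is a scattered factor of w."""
    spectrum_2k = k_spectrum(w, 2 * k)
    return any(
        "".join(blocks) in spectrum_2k
        for blocks in product(("ab", "ba"), repeat=k)
    )


w1, w2 = "baabba", "bababa"
print(len(k_spectrum(w1, 3)), len(k_spectrum(w2, 3)))                  # 8 8
print(len(k_spectrum(w1 + "a", 4)), len(k_spectrum(w2 + "a", 4)))      # 11 12
print(has_full_spectrum(w1 + "a", 3), has_full_spectrum(w1 + "a", 4))  # True False
```

The last line reflects that both extended words have length 7, so neither can contain a scattered factor of length 8 and neither 4-spectrum can equal $\Sigma^4$; the two words nevertheless differ in how many of the 16 candidates they realise.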
Finally, we are able to derive a way to efficiently enumerate (and count) the scattered factors of length $k$ of $(\mathtt{ab})^{k-c}\mathtt{a}^c$.

Corollary 6.

For $k \in \mathbb{N}_{\ge 3}$, $c \in [k]_0$, and $w \in \Sigma^{2k-c}$ weakly-$c$-balanced, the cardinality of $\operatorname{ScatFact}_{k-c}(w)$ is exactly $2^{k-c}$ if and only if $\operatorname{ScatFact}_{2(k-c)}(w) \cap \{\mathtt{ab}, \mathtt{ba}\}^{k-c} \ne \emptyset$.

Proof. The claim follows directly by Theorem 4. ⊓⊔

As announced, we further focus our investigation on the words $w = (\mathtt{ab})^{k-c}\mathtt{a}^c$. By Theorem 4 we have $\operatorname{ScatFact}_i(w) = \Sigma^i$ for all $i \in [k-c]_0$. For all $i$ with $k - c < i \le k$, a more sophisticated counting argument is needed. Intuitively, a scattered factor of length $i$ of $(\mathtt{ab})^{k-c}\mathtt{a}^c$ consists of a part that is a scattered factor (of arbitrary length) of $(\mathtt{ab})^{k-c}$ followed by a (possibly empty) suffix of $\mathtt{a}$s. Thus, a full description of the $\ell$-spectra of words that occur as prefixes of $(\mathtt{ab})^\omega$, for all appropriate $\ell$, is useful. To this end, we introduce the notion of a deleting sequence: for a word $w$ and a scattered factor $u$ of $w$ the deleting sequence contains (in strictly increasing order) the positions of $w$ that have to be deleted to obtain $u$.

Definition 7.

For $w \in \Sigma^*$, $\sigma = (s_1, \ldots, s_\ell) \in [|w|]^\ell$, with $\ell \le |w|$ and $s_i < s_{i+1}$ for all $i \in [\ell - 1]$, is a deleting sequence. The scattered factor $u_\sigma$ associated to a deleting sequence $\sigma$ is $u_\sigma = u_1 \cdots u_{\ell+1}$, where $u_1 = w[1..s_1 - 1]$, $u_{\ell+1} = w[s_\ell + 1..|w|]$, and $u_i = w[s_{i-1} + 1..s_i - 1]$ for $2 \le i \le \ell$. Two sequences $\sigma, \sigma'$ with $u_\sigma = u_{\sigma'}$ are called equivalent.

For the word $w = \mathtt{abbaa}$ and $\sigma = (1, 2, 4)$ the associated scattered factor is $u_\sigma = \mathtt{ba}$. Since $\mathtt{ba}$ can also be generated by $(1, 2, 5)$, $(1, 3, 4)$, and $(1, 3, 5)$, these sequences are equivalent. In order to determine the $\ell$-spectrum of a word $w \in \Sigma^n$ for $\ell, n \in \mathbb{N}$, we can determine how many equivalence classes the equivalence defined above has, for sequences of length $k = n - \ell$. The following three lemmas characterise the equivalence of deleting sequences.

Lemma 8.

Let $w \in \Sigma^n$ be a prefix of $(\mathtt{ab})^\omega$. Let $\sigma = (s_1, \ldots, s_k)$ be a deleting sequence for $w$ such that there exists an index $j$ with $s_{j-1} < s_j - 1$ and $s_j + 1 = s_{j+1}$. Then $\sigma$ is equivalent to $\sigma' = (s_1, \ldots, s_{j-1}, s_j - 1, s_{j+1} - 1, s_{j+2}, \ldots, s_k)$, i.e., $\sigma'$ is the sequence $\sigma$ where both $s_j$ and $s_{j+1}$ were decreased by 1.

Proof. Since $s_{j-1} < s_j - 1$, the factor $u_\sigma$ contains the letter $w[s_j - 1]$. If $w[s_j] = \mathtt{a}$ then $w[s_{j+1}] = w[s_j + 1] = \mathtt{b}$ and $w[s_j - 1] = \mathtt{b}$. Clearly, when deleting $w[s_j - 1]$ and $w[s_j]$ according to the sequence $\sigma'$, the $\mathtt{b}$ that was corresponding to $w[s_j - 1]$ in $u_\sigma$ is replaced by the $\mathtt{b}$ corresponding to $w[s_{j+1}]$, which is not deleted. So, in the end, $u_{\sigma'} = u_\sigma$. The case $w[s_j] = \mathtt{b}$ is analogous. ⊓⊔

Lemma 9.

Let $w \in \Sigma^n$ be a prefix of $(\mathtt{ab})^\omega$. Let $\sigma = (s_1, \ldots, s_k)$ be a deleting sequence for $w$. Then there exists an integer $j \ge 0$ such that $\sigma$ is equivalent to the deleting sequence $(1, 2, \ldots, j, s'_{j+1}, \ldots, s'_k)$, where $s'_{j+1} > j + 1$ and $s'_i > s'_{i-1} + 1$, for all $j < i \le k$. Moreover, $j \ge 1$ if and only if $\sigma$ contained two consecutive positions or $\sigma$ started with 1.

Proof. Let $\sigma_0 = \sigma$. For $i \ge 0$, we iteratively transform $\sigma_i$ into $\sigma_{i+1}$ as follows: if $\sigma_i$ contains on consecutive positions the numbers $g, t, t + 1, h$, such that $g < t - 1$ and $h > t + 2$, we replace them by $g, t - 1, t, h$ and obtain the sequence $\sigma_{i+1}$. By Lemma 8, $\sigma_i$ is equivalent to $\sigma_{i+1}$. It is clear that after finitely many steps we will reach a sequence $\sigma_\ell$ which cannot be transformed anymore. We take $\sigma' = \sigma_\ell$ and it is immediate that it will have the required form. ⊓⊔

Lemma 10.

Let $w \in \Sigma^n$ be a prefix of $(\mathtt{ab})^\omega$. Let $\sigma_1 = (1, 2, \ldots, j_1, s'_{j_1+1}, \ldots, s'_k)$, where $s'_{j_1+1} > j_1 + 1$ and $s'_i > s'_{i-1} + 1$ for all $j_1 < i \le k$, and $\sigma_2 = (1, 2, \ldots, j_2, s''_{j_2+1}, \ldots, s''_k)$, where $s''_{j_2+1} > j_2 + 1$ and $s''_i > s''_{i-1} + 1$ for all $j_2 < i \le k$. If $\sigma_1 \ne \sigma_2$ then $\sigma_1$ and $\sigma_2$ are not equivalent (i.e., $u_{\sigma_1} \ne u_{\sigma_2}$).

Proof. We first consider the case $j_1 = j_2$. Let $\ell$ be minimum such that $s'_\ell \ne s''_\ell$. We can assume without loss of generality that $s'_\ell < s''_\ell$. Then $u_{\sigma_1}$ and $u_{\sigma_2}$ share the same prefix of length $t = (s'_\ell - 1) - (\ell - 1)$, which ends with $w[s'_\ell - 1]$ and is followed by $w[s'_\ell + 1]$ in $u_{\sigma_1}$ and, respectively, by $w[s'_\ell]$ in $u_{\sigma_2}$. But $w[s'_\ell + 1] \ne w[s'_\ell]$, so $u_{\sigma_1} \ne u_{\sigma_2}$.

Further, we consider the case when $j_1 < j_2$ (the case $j_2 < j_1$ is symmetric); assume, as a convention, that $s''_{k+1} = 0$ and let $d = j_2 - j_1$. Clearly, $j_1$ and $j_2$ must have the same parity, or $u_{\sigma_1}$ and $u_{\sigma_2}$ would start with different letters, so they would not be equal. Let $\ell$ be the minimum integer such that $s'_\ell - j_1 \ne s''_{\ell+d} - j_2$; because $s''_{k+1} = 0$ by convention, we have $\ell \le k$. If both $\ell$ and $\ell + d$ are at most $k$, then we get similarly to the case $j_1 = j_2$ that $u_{\sigma_1} \ne u_{\sigma_2}$. In the case when $\ell \le k < \ell + d$, then, by length reasons, all positions $p > s'_\ell$ (so, including $s'_\ell + 1$) in $w$ should belong to $\sigma_1$, a contradiction. This concludes our proof. ⊓⊔

Lemmas 8, 9, and 10 show that the representatives of the equivalence classes w.r.t. the equivalence relation between deleting sequences, introduced in Definition 7, are the sequences $(1, 2, \ldots, j, s'_{j+1}, \ldots, s'_k)$, where $s'_{j+1} > j + 1$ and $s'_i > s'_{i-1} + 1$, for all $j < i \le k$. For a fixed j ≥

1, the number of such sequences is $\binom{(n-j-1)-(k-j)+1}{k-j} = \binom{n-k}{k-j}$. For $j = 0$, we have $\binom{(n-1)-k+1}{k} = \binom{n-k}{k}$ nonequivalent sequences (note that none starts with 1, as those were counted for $j = 1$ already). In total, we have, for a word $w$ of length $n$ which is a prefix of $(\mathtt{ab})^\omega$, exactly $\sum_{j \in [k]_0} \binom{n-k}{k-j}$ nonequivalent deleting sequences of length $k$, so $\sum_{j \in [k]_0} \binom{n-k}{k-j}$ different scattered factors of length $n - k$. In the above formula, we assume that $\binom{a}{b} = 0$ when $a < b$.

Moreover, the distinct scattered factors of length $\ell = n - k$ of $w$ can be obtained efficiently as follows. For $j$ from 0 to $k$, delete the first $j$ letters of $w$. For all choices of $k - j$ positions in $w[j+2..n]$, such that no two of these positions are consecutive, delete the letters on the respective positions. The resulting word is a member of $\operatorname{ScatFact}_\ell(w)$, and we never obtain the same word twice by this procedure. The next theorem follows from the above.

Theorem 11.

Let $w$ be a word of length $n$ which is a prefix of $(\mathtt{ab})^\omega$. Then $|\operatorname{ScatFact}_\ell(w)| = \sum_{j \in [n-\ell]_0} \binom{\ell}{n-\ell-j}$.

A straightforward consequence of the above theorem is that, if $\ell \le n - \ell$ then $|\operatorname{ScatFact}_\ell(w)| = 2^\ell$. With Theorem 11, we can now completely characterise the cardinality of the $\ell$-spectra of the weakly-$c$-balanced word $(\mathtt{ab})^{k-c}\mathtt{a}^c$ for $\ell \le k$.

Theorem 12.

Let $w = (ab)^{k-c}a^c$ for $k \in \mathbb{N}$, $c \in [k]$. Then, for $i \leq k-c$ we have $|\ScatFact_i(w)| = 2^i$. For $k \geq i > k-c$ we have $|\ScatFact_i(w)| = 2^{k-c} + \sum_{j \in [(i+c)-k-1]_0} |\ScatFact_{i-j-1}((ab)^{k-c-1}a)|$.

Proof. We only need to show the claim for $k \geq i > k-c$, as the other part follows immediately from Theorem 4.

We give a method to count the scattered factors of $w = (ab)^{k-c}a^c$. To begin with, we have the scattered factor $a^i$. All the other scattered factors must contain a letter $b$. Thus, we count separately the scattered factors of the form $u\,ba^j$, for each $j \in [i-1]_0$. This is equivalent to counting in how many ways we can choose $u$: for each such $u$ we just have to append $ba^j$ at the end to get a scattered factor of the desired length $i$. Thus, $|u| = i-j-1$. If $j \geq c$ then $u$ should occur as a scattered factor of $(ab)^{k-j-1}a$ (in order to be able to append $ba^j$ at its end and still stay a scattered factor of $w$), while if $j < c$ then $u$ should occur as a scattered factor of $(ab)^{k-c-1}a$. In the first case, the length of the scattered factor $u$ we want to generate is less than half of the length of the word $(ab)^t a$ from which we generate it. So, there are $2^{i-j-1}$ choices for $u$. In the second case, if $j \geq (i+c)-k$, again, the length of $u$ is less than half of the length of the word $(ab)^{k-c-1}a$ from which we generate it. So, there are $2^{i-j-1}$ choices for $u$ again. Finally, if $j < (i+c)-k$, then $i-j-1 > k-c-1$, and we need Theorem 4 to generate $u$. There are $|\ScatFact_{i-j-1}((ab)^{k-c-1}a)|$ ways to choose $u$ in this case. Summing all these up, and using $\sum_{j=i+c-k}^{i-1} 2^{i-j-1} = 2^{k-c}-1$, we get the result from the statement:

$$1 + \sum_{j=i+c-k}^{i-1} 2^{i-j-1} + \sum_{j \in [i+c-k-1]_0} |\ScatFact_{i-j-1}((ab)^{k-c-1}a)| = 2^{k-c} + \sum_{j \in [i+c-k-1]_0} |\ScatFact_{i-j-1}((ab)^{k-c-1}a)|.$$

This concludes our proof. ⊓⊔

As in the case of the scattered factors of prefixes of $(ab)^\omega$, we have a precise and efficient way to generate the scattered factors of $w = (ab)^{k-c}a^c$. For scattered factors of length $i \leq k-c$ of $w$, we just generate all possible words of length $i$. For greater $i$, on top of $a^i$, we generate separately the scattered factors of the form $u\,ba^j$, for each $j \in [i-1]_0$. It is clear that, in such a word, $|u| = i-j-1$. If $j \geq c$ then $u$ must be a scattered factor of $(ab)^{k-j-1}a$, while if $j < c$ then $u$ must be a scattered factor of $(ab)^{k-c-1}a$. If $j \geq (i+c)-k$ then, by Theorem 11, $u$ can take all $2^{i-j-1}$ possible values. For smaller values of $j$, we need to generate $u$ of length $i-j-1$ as a scattered factor of $(ab)^{k-c-1}a$, by the method described after Proposition 5.

Nevertheless, Theorems 11 and 12 are useful to see that, in order to determine the cardinality of the sets of scattered factors of words consisting of alternating $a$s and $b$s or, respectively, of $(ab)^{k-c}a^c$, it is not necessary to generate these sets explicitly.

k-Spectra of Weakly-0-Balanced Words

In the last section, characterisations of the smallest and the largest $k$-spectra of words of a given length were presented (Theorem 2 and Proposition 5). In this section the part in between will be investigated for weakly-0-balanced words (i.e. words of length $2k$ with $k$ occurrences of each letter). As before, we keep the standing lower-bound assumption on $k$. In the particular case $k = 3$, we have already proven that the $k$-spectrum with minimal cardinality has 4 elements and that the maximal cardinality is 8.
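For small parameters, the counts discussed above can be cross-checked by brute force. The following Python sketch is not part of the paper: the helper names (`spectrum`, `theorem11_count`, `alternating_prefix`, `balanced_words`) are illustrative assumptions, and the enumeration is exponential, so it is only meant for small $n$ and $k$. It computes $k$-spectra directly from the definition, compares their sizes with the closed formula of Theorem 11 for prefixes of $(ab)^\omega$, and lists the spectrum sizes of all weakly-0-balanced words of length 6.

```python
from itertools import combinations
from math import comb


def spectrum(w, k):
    """Return the k-spectrum of w: all distinct length-k scattered factors."""
    # combinations() keeps the letters in their original order, so every
    # tuple is a subsequence of w; the set removes duplicates.
    return {"".join(letters) for letters in combinations(w, k)}


def theorem11_count(n, l):
    """Closed formula of Theorem 11 for the length-n prefix of (ab)^omega."""
    # math.comb(a, b) returns 0 when b > a, matching the paper's convention.
    return sum(comb(l, n - l - j) for j in range(n - l + 1))


def alternating_prefix(n):
    """Length-n prefix of the infinite word ababab..."""
    return ("ab" * n)[:n]


def balanced_words(k):
    """All binary words of length 2k with exactly k a's and k b's."""
    words = []
    for a_positions in combinations(range(2 * k), k):
        letters = ["b"] * (2 * k)
        for p in a_positions:
            letters[p] = "a"
        words.append("".join(letters))
    return words


if __name__ == "__main__":
    # Compare brute force with the formula of Theorem 11 for small n.
    for n in range(1, 9):
        w = alternating_prefix(n)
        for l in range(1, n + 1):
            assert len(spectrum(w, l)) == theorem11_count(n, l)

    # Spectrum sizes reached by weakly-0-balanced words of length 6 (k = 3).
    sizes = sorted({len(spectrum(w, 3)) for w in balanced_words(3)})
    print(sizes)
```

For $k = 3$ this prints `[4, 6, 7, 8]`, which agrees with the minimum 4 and maximum 8 recalled above and with the absence of cardinality 5.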
Moreover, as mentioned in Remark 3, a $k$-spectrum of cardinality 5 does not exist for weakly-0-balanced words of length $2k$. The question remains whether $k$-spectra of cardinalities 6 and 7 exist, and if so, for which words.

Before showing that a $k$-spectrum of cardinality $2^k-1$ also exists for all $k$ under consideration, we prove that only scattered factors of the form $b^{i+1}a^{k-i-1}$ for $i \in [k-2]_0$ (up to renaming and reversal) can be "taken out" from the full set of possible scattered factors independently, that is, without requiring the removal of further scattered factors as well. In particular, if a word of length $k$ of another form is absent from the set of scattered factors of $w$, then $|\ScatFact_k(w)| < 2^k-1$.

Lemma 13.

If for w ∈ Σ kwzb there exists u / ∈ ScatFact k ( w ) with u / ∈ { b i a k − i | i ∈ [ k − } ∪ { a i b k − i | i ∈ [ k − } , then | ScatFact k ( w ) | < k − .Proof. Let be i ∈ [ k − . Consider ﬁrstly u = b r a s for r + s = k and r [ i ] ∪ { k − i, . . . , k } and Σ k \{ u } ⊃ ScatFact k ( w ) for a word w ∈ Σ kwzb . If b r +1 a s − is also not a scattered factor of w , the claim is proven (in this case two elements of Σ k are missing in ScatFact k ( w )). Assume b r +1 a s − ∈ ScatFact( w ). This impliesthat (possibly intertwined) ( s −

1) occurrences of a follow ( r + 1) occurrences of b . Since u is not a scattered factor of w , after these ( s − a s only b s may occur.If b r − a s b is not a scattered factor, the claim is again proven and so supposethat it is one. This implies that the ( r − b s are preceded by a s and not by b s. This implies that b r +1 a s − is not a scattered factor and that contradicts theassumption. Consider now u = u b r a s b t u with | u | = k not to be a scatteredfactor of w for r, s, t ∈ N . Following the same arguments as before, the claim isproven if u b r − a s b t +1 u is not a scattered factor and hence it is assumed tobe one. This implies that exactly | u | b b s occur before b r − . This implies that u b r +1 a s b t − u is not a scattered factor of w of length k . Analogously it can beroven that scattered factors containing the switch from a to b and back to a cannot lead to the cardinality 2 k − ⊓⊔ Proposition 14.

For k ∈ N ≥ and w ∈ Σ kwzb , the set ScatFact k ( w ) has k − elements if and only if w ∈ { ( ab ) i a b ( ab ) k − i − | i ∈ [ k − } (up to renamingand reversal). In particular ScatFact k ( w ) = Σ k \{ b i +1 a k − i − } holds for w =( ab ) i a b ( ab ) k − i − with i ∈ [ k − .Proof. Let be i ∈ [ k − . First ” ⇐ ” will be proven and for that consider w = ( ab ) i a b ( ab ) k − i − . By Lemma 5 followsScatFact i (( ab ) i ) = Σ i and ScatFact k − i − (( ab ) k − i − ) = Σ k − i − . With ScatFact ( a b ) = { aa , ab , bb } the k -spectrum of w has at least 3 · i · k − i − = 3 · k − = 2 k − k − elements. Notice that by this construction, scatteredfactors with a ba at the middle position cannot be reached. For this reason wehave to have a look at w ’s remaining scattered factors not being gained by theabove construction. This means that not only i letters are allowed to be takenof the ﬁrst part and not only k − i − ab ) i one can notice that all binary numbers (en-coded by a , b ) of length i are scattered factors of ( ab ) i − a . Appending to thesescattered factors a b implies that nearly all binary numbers are in the i + 1-spectrum of ab i . Appending now an a from the middle part and then each ofthe words from the last part leads to nearly all remaining scattered factors ofthe k -spectrum of w . The only missing word is b i + i , since the last b cannot bereached within the ﬁrst part. This implies that the word b i +1 a k − i − is not inthe k -spectrum of w since with the (i+1) th b the middle part is reached and thelast part contains only k − i − a s. This concludes | ScatFact k ( w ) | = 2 k − | ScatFact k ( w ) | = 2 k − b i +1 a k − i − for an i ∈ [ k − is missing in the k -spectrum of w . Moreover thisis exactly the only element missing. Fix an i ∈ [ k − and set u = b i +1 a k − i − .The proof will be very technically and exclude step by step all other possibili-ties than w being ( ab ) i a b ( ab ) k − i − . Firstly consider i = k −

2. This implies u = b k − a . In this case w has to end in b but not in b since otherwise b k − a would not be a scattered factor. If w were of the form w bab , | w | a = k − | w | b = k − b k − a is not a scatteredfactor. If w ended in a b , a k − ba would be excluded. Hence, w ends in a b .Suppose at last that w = ( ab ) ℓ a b w holds for ℓ < k − w ∈ Σ ∗ . Then w has each ( k − ℓ − a and b . Thus b ℓ +1 a k − ℓ − is not a scattered factor oflength k . This proofs that for i = k − w = ( ab ) k − a b is implied by b k − a being the only excluded scattered factor from Σ k . Hence assume i ∈ [ k − . Supposition: w ends in b ℓ for ℓ ≥ i < k − b k − a ScatFact k ( w ) follows and since i + 1 < k − u .In the next step it will be shown that exactly k − i − ab are asuﬃx of w . Supposition: w = w b ( ab ) ℓ f ℓ > k − i − b i +1 a k − i − would not be a scattered factor of w . If ℓ < k − i − b k − ℓ − a ℓ +1 would not be a scattered factor since w has( k − a and ( k − ℓ − b . Supposition: w = w a ( ba ) ℓ b In this case | w | a = k − − ℓ and | w | b = k − ℓ − a k − − ℓ b ℓ +1 a is not in the k -spectrum of w .Consequently there exists a w such that w = w b ( ab ) k − i − holds. In the nextit will be shown that b has to be preceded by a . Supposition: w = w b ( ab ) k − i − Here w has ( i + 2) a and ( i − b and hence b i a k − i − b is not a scattered factorof length k of w . Supposition: w = w bab ( ab ) k − i − This implies a i +2 bab k − i ScatFact k ( w ) since w has i + 1 occurrences of a and i − b .This proofs that a b ( ab ) k − i − is a suﬃx of w . The case that this is precededby another a is excluded since then a i ba k − i − would not be in the k -spectrumof k . In the last step it will be shown that the ﬁrst occurrence of a is at thepoint 2 ℓ . Supposition: w = ( ab ) ℓ a w for ℓ = i If ℓ is smaller than i , | w | a = k − ℓ − | w | b = k − ℓ hold and b ℓ +1 a k − ℓ − ScatFact k ( w ) follows. 
If $\ell$ is greater than $i$, then, in contradiction to the main assumption, $b^{i+1}a^{k-i-1}$ is a scattered factor, because $b^{i+1}$ is a scattered factor of $(ab)^\ell$ and $k-\ell+\ell-(i+1) = k-i-1$ occurrences of $a$ are left in the rest of $w$. Combining $w = (ab)^i a^2 w'$ and $w = w'' a^2b^2(ab)^{k-i-2}$ for suitable words $w'$, $w''$, the claim that $w$ is of the form $(ab)^i a^2b^2(ab)^{k-i-2}$ is proven. ⊓⊔

By Proposition 14 we get that 7 is a possible cardinality of the set of scattered factors of length 3 of weakly-0-balanced words of length 6 and, moreover, that exactly the words $a^2b^2ab$ and $aba^2b^2$ (and the symmetric words obtained by reversal and renaming) have seven different scattered factors. The following theorem demonstrates that there always exists a weakly-0-balanced word $w$ of length $2k$ such that $|\ScatFact_k(w)| = 2k$. Thus, for the case $k = 3$, the question whether six is a possible cardinality of $\ScatFact_3(w)$ can also be answered positively.

Theorem 15.

The k -spectrum of a word w ∈ Σ kwzb has exactly k elements ifand only if w ∈ { a k − bab k − , a k − b k a } holds (up to renaming and reversal).Moreover, there does not exist a weakly- -balanced word w ∈ Σ kwzb with a k -spectrum of cardinality k − i for i ∈ [ k − .Proof. Consider ﬁrst w = a k − bab k − . Since the k -spectrum of a k b k is a subsetof the k -spectrum of w , the k -spectrum of w has at least k + 1 elements. Ad-ditionally w has the scattered factors of the form a i bab k − − i , which sum up to k −

1. Hence | ScatFact k ( w ) | = k + 1 + k − k holds. Moreover a k − b k a hasall elements of a k b k ’s k -spectrum as scattered factors. Here the word has in ad-dition all words of the form a i b k − − i a as scattered factors which sum up to k − k .he other direction will be proven by contraposition following the two maincases a k − bab k − and a k − b k a . Assume ﬁrst w = a ℓ b x for ℓ ∈ [ k − ≥ . Notice that it does not have tobe considered that the word starts with one a , since this is symmetric to thereversal of the case a k − b k a . This implies | x | a = k − ℓ and | x | b = k −

1. Noticehere k − ℓ < k −

1. Thus, there exists a scattered factor x ′ of x of length 2( k − ℓ )with | x ′ | a = | x ′ | b = k − ℓ . By Lemma 2 follows | ScatFact k − ℓ ( y ) | = k − ℓ + 1 ⇔ y ∈ { a k − ℓ b k − ℓ , b k − ℓ a k − ℓ } and | ScatFact k − ℓ ( y ) | > k − ℓ +1 otherwise. This implies that the ( k − ℓ )-spectrumof x ′ is minimal with respect to cardinality if x ′ is either a k − ℓ b k − ℓ or b k − ℓ a k − ℓ .For giving a lower bound of the cardinality of w ’s scattered factor set of length k , it is suﬃcient to only take these both options into consideration. This impliesthat it is not necessary to examine the cases where x contains other scatteredfactors with both k − ℓ a and b . case 1: x ′ = a k − ℓ b k − ℓ Thus x contains ℓ − b which are not in x ′ . case a: x = b ℓ − a k − ℓ b k − ℓ In this case w = a ℓ b ℓ a k − ℓ b k − ℓ holds and that the k -spectrum of a k b k is a subsetof ScatFact k ( w ) follows. case i: ℓ < k − ℓ For all s ∈ [ ℓ ] the words a ℓ − s b s a k − ℓ , . . . , a ℓ b s a k − ℓ − s are well-deﬁned and sumup to s + 1. Moreover for every s ∈ [ k − ℓ ] exists r ∈ N and exist r , s ∈ N such that the words a r b s a r b s with s + r + s + r = k are all distinct anddistinct to the aforementioned. Thus, in this case k + 1 + ℓ X s =1 ( s + 1) + k − ℓ = 2 k + 1 − ℓ + ℓ ( ℓ + 1)2 + ℓ ≥ k + 4is a lower bound for ScatFact k ( w ). case ii: ℓ > k − ℓ Consider here for r ∈ [ k − ℓ ] the words b ℓ − r a r b k − ℓ , . . . , b ℓ a r b k − ℓ − r . For ﬁxed r these are r + 1. Moreover in this case for all r ∈ [ ℓ ] exist s , r ∈ N and s ∈ N such that the words a r b s a r b s with s + r + s + r = l are all distinct anddistinct to the aforementioned. In total this sums up to k + 1 + k − ℓ X r =1 ( r + 1) + ℓ = k + 1 + ( k − ℓ )( k − ℓ + 1)2 + ( k − ℓ ) + ℓ ≥ k + 4diﬀerent scattered factors. case b: x = a k − ℓ b k − Thus, w = a ℓ ba k − ℓ b k − holds. Here it holds as well that the k -spectrum of a k b k is a subset of ScatFact k ( w ). 
Moreover all words of the form ba r b s for r + s = k − r ∈ [ k − ℓ ] are diﬀerent scattered factors, i.e. k − ℓ many. Additionally theords a r bab s for r + s = k − r, s > k + 1 + k − k − k − k ( w ). This proves the claim for k ≥ case 2: x ′ = b k − ℓ a k − ℓ Consequently x ∈ { b k − a k − ℓ , b k − ℓ a k − ℓ b ℓ − } holds. case a: x = b k − a k − ℓ Hence w = a ℓ b k a k − ℓ . Here only ℓ + 1 diﬀerent scattered factors are of the form a r b s exist and k − ℓ of the form b s a r with r + s = k (notice that the latter ones areonly k − ℓ since among all of them one is in common with the ﬁrst ones). Finallyconsider the words of the form a r b s a r with r + r + s = k and r , r , s > ℓ + 1 + k − ℓ + k . By a k ∈ ScatFact k ( w ), | ScatFact k ( w ) | ≥ k + 2follows. case b: x = b k − ℓ a k − ℓ b ℓ − In this case w = a ℓ b k − ℓ +1 a k − ℓ b ℓ − holds. Here the cardinality of the k -spectrumof w is determined analogously to case 1a. ⊓⊔ By Proposition 14 and Theorem 15 the possible cardinalities of ScatFact ( w )for weakly-0-balanced words w of length 6 are completely characterized. Theo-rem 15 determines the ﬁrst gap in the set of cardinalities of | ScatFact k ( w ) | for w ∈ Σ kwzb : there does not exist a word w ∈ Σ kwzb with | ScatFact k ( w ) | = k + i + 1for i ∈ [ k −

2] and k ≥

3, since all words that are not of the form a k b k , b k a k , a k − bab k − , or a k − b k a have a scattered factor set of cardinality at least2 k + 1. As the size of this ﬁrst gap is linear in k , it is clear that the larger k is,the more unlikely it is to ﬁnd a k -spectrum of a small cardinality.In the following we will prove that the cardinalities 2 k + 1 up to 3 k − k − k + 1 and 2 k (witnessed by, e.g. a k − b k a ). Lemma 16.

For i ∈ (cid:2) ⌊ k ⌋ (cid:3) and j ∈ [ k − – | ScatFact k ( a k − i b k a i ) | = k ( i + 1) − i + 1 for k ≥ , – | ScatFact k ( a k − b ab k − ) | = 3 k − , – | ScatFact k ( a k − b j ab k − j a ) | = k (2 j + 2) − j + 2 for k ≥ , and – | ScatFact k ( a k − b j a b k − j ) | = k (2 j + 1) − j + 2 .Proof. For the ﬁrst claim, let be i ∈ (cid:2) ⌊ k ⌋ (cid:3) ≥ . The k -spectrum of a k − i b k a i con-tains exactly all words of the form a r b s a t with r + s + t = k , t ∈ [ i ] , r ∈ [ k − i ] ,and s ∈ [ k ] . If t and r are ﬁxed, s is uniquely determined. Since all these scat-tered factors are diﬀerent, the k -spectrum has ( i + 1)( k − i + 1) = k ( i + 1) − i − a k − b ab k − areof four diﬀerent forms: b r ab t , a r b s a , a r b s , and a r b s ab s . Notice that all thesescattered factors are diﬀerent if in the second one s is chosen greater than orequal to 1 and in the last one r, s , s ≥ s ∈ [2] there are enough a at the beginningfor padding from the left. The third form leads to k + 1 diﬀerent scattered ashown in Theorem 2. The last one is a little bit more complicated. Notice ﬁrstlythat r is at most k − s , s > s and s , namely as 1. If r is k − s = 1 and s = 2 or vice versa. For r ∈ [ k −

5] there existalways 2 possibilities for the b s between the a s. This leads to 2( k −

5) possibilities.Allover it sums up to 2 + 2 + k + 1 + 1 + 2 + 2( k −

5) = 8 + 3 k −

10 = 3 k − a r b s , b r a s b r , a r b s a r , and a r b s a r b s ,where with appropriate chosen exponents no factors is counted twice. Also asbefore, i can be chosen in (cid:2) ⌊ k ⌋ (cid:3) , since otherwise the proof is analogous for k − i .The ﬁrst form contributes k + 1 elements. The second and third form contribute2 i each, since s resp. r range in [2]. For the last form a distinction is necessary.If r = k − a k − bab is the only scattered factor. If r is smaller than k − i possibilities for each r ∈ [ k −

3] lead to scattered factors. Allover this sumsup to k + 1 + 2 i + 2 i + 1 + 2 i ( k −

4) = k (2 i + 1) − i + 2. By this the ﬁrst claimis proven.For the second claim again scattered factors of diﬀerent forms will be distin-guished. Since also here the minimal k -spectrum is a subset of the k -spectrum of w , these k + 1 elements counts for the cardinality. There exists i many scatteredfactors of the form a r b s a and k − a r b s a , since with the last a alloccurrences of b are before it. Assuming w.l.o.g. again that i is at most k only b k − a is a scattered factor of the form b s a r . The scattered factors of the form b r ab r a contribute i many. The remaining two forms need again a case analysis.There exists exactly one scattered factor of the form a r b s ab s for r = k − a r b s ab s a for r = k −

4. If r resp. r aresmaller there exists i diﬀerent scattered factors for each choice of r ∈ [ k −

4] resp. r ∈ [ k − k +1+ k − i + i +1+ i +1+ i ( k − k ( i −

4) =2 k + 2 + 3 i + ik − i + ik − i = k (2 + 2 i ) − i + 2. ⊓⊔ Notice that for i ∈ (cid:2) ⌊ k ⌋ (cid:3) the sequence ( k (2 i +1) − i +2) i is increasing and itsminimum is 3 k − i ∈ (cid:2) ⌊ k ⌋ (cid:3) the sequence ( k (2 i +2) − i +2) i is increasingand its minimum is 4 k −

4. The following lemma only gives lower bounds forspeciﬁc forms of words, since, on the one hand, it proves to be suﬃcient for theTheorem 18 which describes the second gap, and, on the other hand, the proofsshow that the formulas describing the exact number of scattered factors of aspeciﬁc form are getting more and more complicated. It has to be shown thatalso words starting with i letters a , for i ∈ [ k − k -spectrum of greater(as lower is already excluded) cardinality. By Lemma 16 only words with anothertransition from a ’s to b ’s need to be considered, ( w = a r b s w a r b s ). W.l.o.g.we can assume s to be maximal, such that w starts with an a , and similarly,by maximality of r , ends with a b , thus only words of the form a r b s . . . a r n b s n have to be considered, and by Proposition 5, it is suﬃcient to investigate n < k . Lemma 17. – | ScatFact k ( a k − b i ab j ab k − i − j ) | ≥ k − for i, j ∈ [ k − , i + j ≤ k − , | ScatFact k ( a k − b s a r b s a r b s ) | ≥ k − for s + s + s = k , r + r =2 , s > , r , r , s , s ≥ , – | ScatFact k ( a r b s . . . a r n b s n ) | ≥ k − for r ≤ k − , P i ∈ [ n ] r i = P i ∈ [ n ] s i = k , and r i , s i ≥ .Proof. For the ﬁrst claim, choose i, j ∈ [ k − a r b s for r, s ∈ [ k ] are scattered factors of w ij and by Lemma 2 follows that w ij has k + 1 scattered factors of this form. Scattered factors of the form a r b s a r can occur in three variants. In the ﬁrst variant only the second block of a isinvolved after the ﬁrst block of b , namely the second single a is not involved.Since i ∈ [ k −

2] holds, for each s ∈ [ i ] exists r , r ( r = 1) such that a r b s a r isa scattered factor of w ij , i.e. w ij has additionally i scattered factors. The secondvariant uses the a of each the second and the third a -block. This only scatteredfactors of the form a r b s a r are of interest, the second b -block is not involved.If i + j = k − i − i new elements are in the k -spectrum. If only the a from the third blockis involved then j (resp. j −

1) new elements are in the spectrum. This sums upto at least 2 i + j − a r b s a r . A similar distinction leadsto the number of scattered factors of the form a r b s a r b s . Assume ﬁrst r = 1and for this only the a from the second a -block. This implies that either only b from the second block or from the second and third block can be taken for thelast b -block in the scattered factor. Moreover r , s , s are at most k −

3. Foreach choice of r in [ k −

3] there are min { j, k − − i } possibilities, which leadsto i ( k − j − j + k − j − X ℓ =1 k − − ℓ ! = 6 i + 1 12 k i − kji − ki + 1 12 j i + 1 12 ji. If b from the second and third block are allowed, all of the second block have tooccur for obtaining diﬀerent scattered factors to the previous ones. Thus, i ( k − j − i − j + k − j − i − X ℓ =1 k − − ℓ ! = kij + 12 k − k − ik − jk − ij − i j − ij + 1 12 i + 1 12 j + 12 i + 12 j . If both, the second and the third a -block, are involved ik − i − ij − i additionalscattered factors are in the k -spectrum. This all sums up to k + 1 + 9 i − k i − ki + 12 j i − ji + 12 j + 12 i + 12 j . Since either i ≥ ij or j ≥ ij and i, j ∈ [ k −

3] hold, this is greater than orequal to 1 12 k − k + 9 12 ≥ k − . otice that additionally there exist scattered factors of other forms, which en-large the concrete k -spectrum.For the second claim, consider ﬁrst the case, when s = 0, r = 0, or r = 0.This leads to words of the form matching Lemma 16 and consequently the k -spectrum has k (2 i +1) − i +2 ≥ k − > k − s = 0 holds and all other exponents are at least 1. By Lemma 16 followsagain that each such word has at least k (2 i + 2) − i + 2 ≥ k − > k − k − a k is a scattered factor and a k − i b i for s n also. Noticehere, that the proof leads to s n − scattered factors, if in the claim s n = 0 wouldbe allowed. Consider now the scattered factors of the form a i b j for i, j ∈ [ k ]. Let m be the number of the block in which the i th a occurs. If s m + · · · + s n ≥ k − i holds, a i b k − i is a scattered factor of w . Consider the opposite. This implies thatfrom the m th till the n th block less then k − i b occur. Thus in the blocks 1 to i there occur more than i b . Since the i th a is in the m th block, from this point tillthe end there are k − i a . Hence b i a k − i is a scattered factor of w . So in each caseat least one scattered factor occurs, i.e. at least k + 1 scattered factors of thisform are in the k -spectrum. Notice here, that the argument holds still if s m = 0is allowed. With a similar argumentation the number of occurrences of the form a i b j a k − i − j will be shown. If for a speciﬁc i, j -combination a i b j a k − i − j is not ascattered factor, then choose m , m such that the i th a is in block m and thej th b after that is in block m . Thus in the blocks m + 1 to n are less than k − i − j a . Let r ′ m be the a in the m block which don’t belong to a i . Then r ′ m + · · · + r m contains more than k − j a since k − j − i a occur in the m ’ th to the n th block. Thus a r ′ m b s m . . . 
a r m b s ′ m is a scattered factor of length atleast k + 1 where s ′ m describes the part of the m block until the j th b . If1 < m , m < n holds, ba k − j − b j − is a scattered factor of w . If m = m = 1holds, a k − j − bab is a scattered factor. If both are equal to n , ba k − j − b j − is ascattered factor. In both cases the last b exist even if s m = 0 holds, since thescattered factor ends in the examined block m . If m < m holds, there existsa factor of length > k which can be narrowed to a factor starting in a , ending in b , and having at least one switch from b back to a and back to b . This concludesto at least ( k − scattered factors of the form a i b j a k − i − j (or a diﬀerent onein exchange). By k − k + 3 ≥ k − k ≥ ⊓⊔ By Lemmas 16 and 17 we are able to prove the following theorem, whichshows the second gap in the set of cardinalities of ScatFact k for words in Σ kwzb . Theorem 18.

For k ≥ there does not exist a word w ∈ Σ kwzb with k -spectrumof cardinality k + i for i ∈ [ k − . In other words, i.e. between k + 1 and k − is a cardinality-gap.Proof. Theorems 2 and 15 show that exactly the words a k b k , a k − , bab k − , and a k − b k a have k -spectra of cardinality less than or equal to 2 k . By Lemma 16nd 17 follows that a k − b k a has a k -spectrum of cardinality 3 k −

3. Assume a w ∈ Σ kwzb \{ a k b k , a k − ba b k − , a k − b k a , a k − b k a } . Since renaming and reversaldo not inﬂuence the cardinality, it can be assumed w.l.o.g. that w starts with a .By assumption w does not start with a k . If w starts with a k − , w = a k − b i ab k − i follows with i ∈ [ k − ≥ and by Lemma 16 the k -spectrum has ( i +1) k − i +6 ≥ k − > k − k − a . and it is shown that words starting with at least two and at most k − a lead to k -spectra of cardinality greater than 3 k − ⊓⊔ Going further, we analyse the larger possible cardinalities of ScatFact k , tryingto see what values are achievable (even if only asymptotically, in some cases). Corollary 19.

All square numbers greater than or equal to four occur as the cardinality of the $k$-spectrum of a word $w \in \Sigma^{2k}_{wzb}$; in particular, $|\ScatFact_k(a^{k/2}b^k a^{k/2})| = \left(\frac{k}{2}+1\right)^2$ holds for $k$ even.

Proof. Apply Lemma 16 with $i = \frac{k}{2}$. This implies that the cardinality of the $k$-spectrum of $a^{k/2}b^k a^{k/2}$ is

$$k\left(\frac{k}{2}+1\right) - \frac{k^2}{4} + 1 = \frac{k^2}{4} + k + 1 = \left(\frac{k}{2}+1\right)^2.$$

⊓⊔

Inspired by the previous corollary, we can show the following result concerning the asymptotic behaviour of the cardinality of $\ScatFact_k$ for words of length $2k$.

Proposition 20.

Let i > be a ﬁxed (constant) integer. Let d = ⌊ ki ⌋ and r = k − di , and d ′ = ⌊ ki − ⌋ and r ′ = k − d ′ ( i − . Then the following hold: – the word a r b r ( a d b d ) i has Θ ( k i − ) scattered factors of length n ; – the word a r b r ′ ( a d b d ′ ) i − a d has Θ ( k i − ) scattered factors of length n .Proof. Let us ﬁrst show the upper bounds. The following algorithm can be usedto ﬁnd the scattered factors of length k of a r b r ( a d b d ) i . Choose 2 numbers q and q from [ i ] , and 2 i − r , . . . , r i − from [ d ] . Let r i = k − ( q + q + P j ∈ [2 i − r j ). If r i ≥ w ′ = a q b q ( a r b r )( a r b r ) · · · ( a r i − b r i )is a scattered factor of a r b r ( a d b d ) i , and all scattered factors of length k of thisword have this form. From the construction of w ′ , because d ≤ ki , it followsthat there are at most O ( i k i − ) possible ways to obtain it. As i is seen as aconstant, this means that a r b r ( a d b d ) i has O ( n i − ) scattered factors of length k .In the same way one can show that a r b r ′ ( a d b d ′ ) i − a d has O ( n i − ) scatteredfactors of length n .Let us now show the lower bounds. We ﬁrst consider the word a r b r ( a d b d ) i .As i is constant, let us assume that k > i (2 i − i − . Clearly, k ( i − i (2 i − < k i − ≤ ki − ≤ ≤ ki and d + r ≥ ki . We generate scattered factors of the word a r b r ( a d b d ) i asfollows. We ﬁrstly choose 2 i − r , . . . , r i − between k ( i − i (2 i − and k i − .Under our assumptions, the word w ′′ = b r ( a r b r ) · · · ( a r i − b r i − )is a scattered factor of the suﬃx b d ( a d b d ) i − of a r b r ( a d b d ) i . Let r = k − P j ∈ [2 i − r j . We have r ≤ ki ≤ d + r , so a r w ′′ is a scattered factor of a r ( a d b d ) i ,so also of a r b r ( a d b d ) i . Moreover, each choice of a tuple ( r , . . . , r i − ) leads to adiﬀerent scattered factor of a r b r ( a d b d ) i . 
The total number of tuples we chooseis (cid:18) k i − − k ( i − i (2 i − (cid:19) i − = (cid:18) ki (2 i − (cid:19) i − . So the total number of scattered factors of length k of a r b r ( a d b d ) i is at least (cid:16) ki (2 i − (cid:17) i − . As the total number of scattered factors of length k of a r b r ( a d b d ) i is also O ( k i − ), we get that a r b r ( a d b d ) i has Θ ( k i − ) scattered factors of length k . The proof that a r b r ′ ( a d b d ′ ) i − a d has Θ ( n i − ) scattered factors of length k follows in a very similar manner. ⊓⊔ Remark 21.

Let i be an integer, and consider k another integer divisible by i .Consider the word w k = ( a ki b ki ) i . The exact number of scattered factors of length k of w k equals to the number C (cid:0) k, i, ki (cid:1) of weak 2 i -compositions of k , whoseterms are bounded by ki , i.e., the number of ways in which k can be writtenas a sum P j ∈ [2 i ] r j where r j ∈ (cid:2) ki (cid:3) . From Proposition 20 we also get that thisnumber is Θ ( n k − ), but we also have: C (cid:18) k, i, ki (cid:19) = X ≤ j

E > C (cid:18) k, i, ki (cid:19) ≤ E · X ≤ j

0. This seems to be an interestingcombinatorial inequality in itself.One can also show as in Proposition 20 that the number of scattered factors oflength k of w k , which have, at their turn, ( ab ) i as a scattered factor, is Θ ( k i − ).This number also equals the number C ′ (cid:0) k, i, ki (cid:1) of 2 i -compositions of k whoseterms are strictly positive integers upper bounded by ki , i.e., the number of waysin which k can be written as a sum P j ∈ [2 i ] r j where r j ∈ (cid:2) ki (cid:3) . Just as above,from this we get P ≤ j

0. Again, this inequality seemsinteresting to us.e will end this analysis with the conjecture that, in contrast to the ﬁrstgap, which always starts immediately after the ﬁrst obtainable cardinality, thelast gap ends earlier the larger k is. More precisely, if w = a b ( ab ) k − − i ba ( ab ) i for k ∈ N ≥ , i ∈ [ k − then | ScatFact k ( w ) | = 2 k − − i .At the end of this section, we will brieﬂy introduce θ -palindromes in thisspeciﬁc setting. Let θ : Σ ∗ → Σ ∗ be an antimorphic involution, i.e. θ ( uv ) = θ ( v ) θ ( u ) and θ is the identity on Σ ∗ . By Σ = { a , b } only the identity andrenaming are such mappings. The ﬁxed points of θ are called θ -palindromes( ab .θ ( b ) θ ( a )) and exactly the words where w R = w holds. They were studiedin diﬀerent ﬁelds well (see e.g., [5], [8]). A word w ∈ Σ kwzb is a θ -palindromeiﬀ either w ∈ { a w ′ b , b w ′ a } for some θ -palindrome w ′ ∈ Σ k − wzb or additionally w = a k b k a k in the case that k is even. Two cardinality results for θ -palindromesare presented in Lemma 16 and Corollary 19. We believe that persuing the k -spectra of θ -palindromes may lead to a deeper insight of which cardinalities canbe reached, but due to space restrictions we will only mention one conjecturehere, which may already show that cardinalities are somehow propagating for θ -palindromes. Notice that this conjecture implies that indeed similar to thesecond gap here 4 k − k − − i for i ∈ [ k − Conjecture 22.

The k -spectrum of w = ab k − a k − b has 4( k −

1) elements andmoreover if w ′ = w R with a k -spectrum of cardinality ℓ ∈ N ≥ then the scat-tered factor set of a w b has cardinality 2 ℓ − k -Spectra In the ﬁnal section we consider the slightly diﬀerent problem of reconstructinga word from its scattered factors, or more speciﬁcally in this case, k -spectra.More generally, we are interested in how much information about a (weakly-0-balanced) word w is contained in its scattered factors, and more precisely, whichscattered factors are not necessary or useful for reconstructing the word w , ordistinguishing it from others. Since w is a scattered factor of itself, it is trivialthat the scattered factor of length | w | is suﬃcient to uniquely reconstruct w . Onthe other hand, all words over { a , b } ∗ containing both letters will have the same1-spectrum. Thus we see that the length of the scattered factors of a word w plays a role in how much information about w they contain. This relationshipis described more precisely by the following result of Dress and Erd¨os [3] alongwith the fact that (cf. e.g. Proposition 5) a word of length 2 k is not uniquelydetermined by its scattered factors of length k . Proposition 23 (Dress and Erd¨os [3]). If ScatFact k +1 ( w ) = ScatFact k +1 ( w ′ ) holds for w, w ′ ∈ Σ ≤ k then w = w ′ follows.Proof (for w , w ′ being weakly- -balanced). We give a procedure for uniquelyreconstructing w from ScatFact k ( w ). For all i, j ∈ N such that i + j = k , askhether a i ba j ∈ ScatFact k ( w ). Since there are exactly i + j occurrences of a in w , all are accounted for in the (potential) scattered factor a i ba j , and thus theanswer is ‘yes’ if and only if there are one or more b s between the i th and ( i +1) th occurrences of a in w . Hence after these queries, we know exactly which a s areconsecutive (i.e. do not have a b between them) in w . Similarly we ask for all i, j ∈ N such that i + j = k , ask whether b i ab j ∈ ScatFact k ( w ). 
By symmetry, this tells us exactly which $b$s are consecutive. This is sufficient information to specify $w$ completely. ⊓⊔

In the proof of Proposition 23, a pivotal role is played by scattered factors which contain many $a$s and few $b$s or vice versa. The question arises whether this is due to the fact that these scattered factors contain inherently more information about the structure of the whole word than, e.g., weakly-0-balanced ones. In the general case, the answer is, at least sometimes, yes: we cannot distinguish between, e.g., two words in $\{a\}^*$ by their weakly-0-balanced scattered factors, as the only such factor is $\varepsilon$. The same problem arises for all words which have a sufficiently uneven ratio of $a$s to $b$s.

However, if in addition we consider only weakly-0-balanced words, then the situation changes. We conjecture that in fact, for these words $w$, the weakly-0-balanced scattered factors are just as informative about $w$ as the unbalanced ones. More formally, we believe the following adaptation of Proposition 23 holds:

Conjecture 24.

Let $k \in \mathbb{N}$. Let $k' = k+1$ for odd $k$, and $k' = k+2$ for even $k$. Let $w, w' \in \Sigma^{2k}_{\mathrm{wzb}}$ such that $\operatorname{ScatFact}_{k'}(w) \cap \Sigma^{k'}_{\mathrm{wzb}} = \operatorname{ScatFact}_{k'}(w') \cap \Sigma^{k'}_{\mathrm{wzb}}$. Then $w = w'$.

While we do not resolve the conjecture, we give an example of a subclass of words for which it holds true, namely when there are at most two blocks of $b$s (and therefore by symmetry if there are at most two blocks of $a$s).

Proposition 25.

Let $k \in \mathbb{N}$. If $k$ is odd, then each word $w \in a^* b^* a^* b^* a^* \cap \Sigma^{2k}_{\mathrm{wzb}}$ is uniquely determined by the set $\operatorname{ScatFact}_{k+1}(w) \cap \Sigma^{k+1}_{\mathrm{wzb}}$. Similarly, if $k$ is even, then each word $w \in a^* b^* a^* b^* a^* \cap \Sigma^{2k}_{\mathrm{wzb}}$ is uniquely determined by the set $\operatorname{ScatFact}_{k+2}(w) \cap \Sigma^{k+2}_{\mathrm{wzb}}$.

Proof. As in the proof of Proposition 23, we give an algorithm for uniquely reconstructing $w$. W.l.o.g., let $k$ be odd. The case that $k$ is even is easily adapted. Let $w = a^i b^j a^\ell b^{k-j} a^{k-i-\ell}$ and let $S = \operatorname{ScatFact}_{k+1}(w) \cap \Sigma^*_{\mathrm{wzb}}$.

Firstly, we deal with the case that $\ell = 0$. Note that we can decide whether $\ell = 0$ by querying whether there exists a scattered factor $u \in S$ such that $u \in a^* b^+ a^+ b^+ a^*$. Now, if $\ell = 0$, we have $w = a^i b^k a^{k-i}$. Since $k$ is odd, exactly one of $i$, $k-i$ will be at most $\frac{k-1}{2}$. We can decide which one by querying whether $a^{\frac{k+1}{2}} b^{\frac{k+1}{2}} \in S$. W.l.o.g., suppose $i \leq \frac{k-1}{2}$ (so the query returns "no"). The other case is symmetric. Then note that $a^i b^{\frac{k+1}{2}} a^{\frac{k+1}{2}-i} \in S$ but $a^{i+1} b^{\frac{k+1}{2}} a^{\frac{k+1}{2}-i-1} \notin S$. Thus the exact value of $i$ (and therefore $k-i$) can be inferred directly from observing scattered factors of this form in $S$.

Now consider the case that $\ell \neq 0$. Note that there exists $u \in b^+ a^{\frac{k+1}{2}} b^+ \cap S$ if and only if $\ell \geq \frac{k+1}{2}$. Suppose firstly that $\ell \geq \frac{k+1}{2}$. Then $i + (k-i-\ell) \leq \frac{k-1}{2}$. Thus we can determine $i$ and $k-i-\ell$ (and therefore $\ell$) by looking for the maximum $m_1, m_2$ such that there exists $u \in a^{m_1} b^+ a^+ b^+ a^{m_2}$ with $u \in S$ ($i$ is the maximum $m_1$ while $k-i-\ell$ is the maximum $m_2$). Moreover, exactly one of $j$, $k-j$ will be less than $\frac{k+1}{2}$. We can decide which one by querying whether $a^{\frac{k+1}{2}} b^{\frac{k+1}{2}} \in S$. If so, it must be that $k-j \geq \frac{k+1}{2}$. Suppose that this is the case (the other case is symmetric). Then as before, we can determine the exact value of $j$ by looking at the scattered factors of the form $b^m a^+ b^+ a^*$ (i.e., $j$ is the maximum $m$) and we are done.

Finally, we consider the case that $0 < \ell < \frac{k+1}{2}$. Then $\ell$ can be uniquely determined as the maximum $m$ such that there exists $u \in a^* b^+ a^m b^+ a^*$ with $u \in S$. In order to determine $i$ (or equivalently $k-i-\ell$), we look for the maximum $m_1, m_2$ such that there exist $u_1 \in a^{m_1} b^+ a^+ b^+ a^*$ and $u_2 \in a^* b^+ a^+ b^+ a^{m_2}$ with $u_1, u_2 \in S$. In particular at least one of $m_1, m_2$ must be strictly less than $\frac{k-1}{2}$. If $m_1 < \frac{k-1}{2}$, then $i = m_1$, and if $m_2 < \frac{k-1}{2}$ then $k-\ell-i = m_2$. In either case, since $\ell$ is already known, this uniquely determines both $i$ and $k-i-\ell$.

It remains to determine $j$ (or equivalently $k-j$). Recall that exactly one of $j$, $k-j$ will be less than $\frac{k+1}{2}$. Let $m_1$ be the maximum $m$ such that there exists $u \in a^* b^m a^+ b^+ a^*$ with $u \in S$ and let $m_2$ be the maximum $m$ such that there exists $u \in a^* b^+ a^+ b^m a^*$ with $u \in S$. Note that $m_1, m_2 \leq \frac{k-1}{2}$. If $m_1 < \frac{k-1}{2}$ (resp. $m_2 < \frac{k-1}{2}$), then $j = m_1$ (resp. $k-j = m_2$), and thus $j$ and $k-j$ can be inferred. If $m_1 = m_2 = \frac{k-1}{2}$, then either $j = \frac{k-1}{2}$ or $k-j = \frac{k-1}{2}$. Now, if $k-i-\ell < \frac{k+1}{2}$, there exists $u \in a^* b^{\frac{k+1}{2}} a^+ a^{k-i-\ell}$ with $u \in S$ if and only if $j = \frac{k+1}{2}$ (in which case $k-j = \frac{k-1}{2}$). On the other hand, if $k-i-\ell \geq \frac{k+1}{2}$, then $i < \frac{k+1}{2}$ and there exists $u \in a^i a^+ b^{\frac{k+1}{2}} a^*$ with $u \in S$ if and only if $k-j = \frac{k+1}{2}$ (in which case $j = \frac{k-1}{2}$). In either case, all exponents are known and we have uniquely reconstructed $w$. $\square$

The difficulty in proving Conjecture 24 seems to arise from the fact that, for different pairs of words $w, w' \in \Sigma_{\mathrm{wzb}}$, the set of scattered factors which distinguish them, namely the symmetric difference of $\operatorname{ScatFact}_k(w) \cap \Sigma^k_{\mathrm{wzb}}$ and $\operatorname{ScatFact}_k(w') \cap \Sigma^k_{\mathrm{wzb}}$ (for appropriate $k$), varies considerably. This is unlike the proof(s) of Proposition 23, where the set of distinguishing scattered factors is always made up of words of the same form, regardless of the choice of $w$ and $w'$.
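The contrast between the two kinds of distinguishing sets can be explored by brute force on short words. The following Python sketch enumerates spectra, extracts their weakly-0-balanced part, and implements the query procedure from the proof of Proposition 23 for weakly-0-balanced words of length $2k$. The function names `scat_fact`, `balanced_part` and `reconstruct` are illustrative choices and do not come from the paper, and the enumeration is exponential in the word length, so it is only meant for small experiments.

```python
from itertools import combinations


def scat_fact(w, k):
    """Return the k-spectrum of w: the set of its length-k scattered factors."""
    return {"".join(w[i] for i in idx) for idx in combinations(range(len(w)), k)}


def balanced_part(w, k):
    """Return the weakly-0-balanced members of the k-spectrum of w."""
    return {u for u in scat_fact(w, k) if u.count("a") == u.count("b")}


def reconstruct(spectrum, k):
    """Rebuild a weakly-0-balanced word of length 2k (k >= 1) from its
    (k+1)-spectrum, using only membership queries of the forms
    a^i b a^(k-i) and b^i a b^(k-i), as in the proof of Proposition 23."""
    # gap_has_b[i]: is there a b after the i-th a and before the (i+1)-th a?
    # (i = 0 means before the first a, i = k means after the last a.)
    gap_has_b = ["a" * i + "b" + "a" * (k - i) in spectrum for i in range(k + 1)]

    # Split the k occurrences of b into maximal blocks: the i-th and (i+1)-th
    # b belong to different blocks iff some a lies between them.
    b_blocks, size = [], 1
    for i in range(1, k):
        if "b" * i + "a" + "b" * (k - i) in spectrum:
            b_blocks.append(size)
            size = 1
        else:
            size += 1
    b_blocks.append(size)

    # Interleave: each gap that contains b's receives the next block of b's.
    blocks = iter(b_blocks)
    out = []
    for i in range(k + 1):
        if gap_has_b[i]:
            out.append("b" * next(blocks))
        if i < k:
            out.append("a")
    return "".join(out)


print(reconstruct(scat_fact("ababba", 4), 3))  # prints: ababba
```

The symmetric difference of two balanced parts is then simply `balanced_part(w1, k) ^ balanced_part(w2, k)`, which makes it easy to see how the distinguishing set changes from one pair of words to the next, whereas `reconstruct` always asks the same $2k + 1$ fixed-form queries.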
As an example, consider the words $w = ababab$, $w' = bababa$, and $w'' = ababba$. Then the symmetric difference of $\operatorname{ScatFact}_4(w) \cap \Sigma^4_{\mathrm{wzb}}$ and $\operatorname{ScatFact}_4(w') \cap \Sigma^4_{\mathrm{wzb}}$ is $\{aabb, bbaa\}$. On the other hand, considering $\operatorname{ScatFact}_4(w) \cap \Sigma^4_{\mathrm{wzb}}$ and $\operatorname{ScatFact}_4(w'') \cap \Sigma^4_{\mathrm{wzb}}$, the symmetric difference is $\{baab\}$.

We have considered properties of $k$-spectra of weakly-0-balanced words. In particular, in Section 3 we give several insights into the structure of the set of all $k$-spectra of weakly-0-balanced words of length $2k$ by considering for which numbers $n$ there exists $w$ such that the $k$-spectrum of $w$ has cardinality $n$. In particular, we characterise the first two gaps in the possibilities for each $k$ which are regular (in the sense that the first and second gaps are always from $k+2$ to $2k -$ […] $k+1$ to $3k -$ […] $k$-spectra. We note that this is, in a sense, as hard as in the general case. However, we also conjecture that even if we consider only the scattered factors which are also weakly-0-balanced, then the situation remains the same, in the sense that it can be achieved for the same choices of $k$. Resolving this conjecture appears to require some new approach, however, since the techniques for the general case are not easily adapted.

As mentioned at the end of Section 3, some of the weakly-0-balanced words are $\theta$-palindromes. Since the $\theta$-palindromes of length $2k$ are constructible from the ones of length $2(k-1)$ (except, for each even $k$, exactly one $\theta$-palindrome), we surmised that the structure and properties propagate. Moreover, we expected that the knowledge of the word's second half helps in finding the cardinalities of the $k$-spectra. Nevertheless, we were only able to get results for $\theta$-palindromes in the same manner as for the other words, but we still believe that the structure of the $\theta$-palindromes can reveal more insights with further work.

References

1. J. Berstel and J. Karhumäki. Combinatorics on words – A tutorial. BEATCS: Bulletin of the European Association for Theoretical Computer Science, 79, 2003.

2. Joel D. Day, Pamela Fleischmann, Florin Manea, and Dirk Nowotka. k-spectra of weakly-c-balanced words. https://arxiv.org/abs/1904.09125, 2019.

3. A. W. M. Dress and P. Erdős. Reconstructing words from subwords in linear time. Annals of Combinatorics, 8(4):457–462, 2004.

4. Cees H. Elzinga, Sven Rahmann, and Hui Wang. Algorithms for subsequence combinatorics. Theor. Comput. Sci., 409(3):394–404, 2008.

5. S. Z. Fazekas, F. Manea, R. Mercas, and K. Shikishima-Tsuji. The pseudopalindromic completion of regular languages. Information and Computation, 239:222–236, 2014.

6. Simon Halfon, Philippe Schnoebelen, and Georg Zetzsche. Decidability, complexity, and expressiveness of first-order logic over the subword ordering. In Proc. LICS 2017, pages 1–12, 2017.

7. Š. Holub and K. Saari. On highly palindromic words. Discrete Applied Mathematics, 157:953–959, 2009.

8. L. Kari and K. Mahalingam. Watson-Crick palindromes in DNA computing. Natural Computing, 9(2):297–316, 2010.

9. M. Lothaire. Combinatorics on Words. Cambridge University Press, 1997.

10. J. Maňuch. Characterization of a word by its subwords. In Developments in Language Theory, pages 210–219. World Scientific, 1999.

11. A. Mateescu, A. Salomaa, and S. Yu. Subword histories and Parikh matrices. Journal of Computer and System Sciences, 68(1):1–21, 2004.

12. G. Rozenberg and A. Salomaa, editors. Handbook of Formal Languages (3 volumes). Springer, 1997.

13. A. Salomaa. Connections between subwords and certain matrix mappings. Theoretical Computer Science, 340(2):188–203, 2005.

14. Georg Zetzsche. The complexity of downward closure comparisons. In Proc. ICALP 2016, volume 55 of LIPIcs, pages 123:1–123:14, 2016.


Submitted on 19 Apr 2019 (v1), last revised 24 May 2019 (this version, v2)
