Kolmogorov complexity, Lovasz local lemma and critical exponents
arXiv [math.CO]
A. Rumyantsev
Moscow State University, Russia, Mathematics Department, Logic and algorithms theory division
Abstract.
D. Krieger and J. Shallit have proved that every real number greater than 1 is a critical exponent of some sequence [1]. We show how this result can be derived from some general statements about sequences whose subsequences have (almost) maximal Kolmogorov complexity. In this way one can also construct a sequence that has no “approximate” fractional powers with exponent that exceeds a given value.
Let $w = w_0 w_1 \ldots$ be an infinite binary sequence. For any finite set $A \subset \mathbb{N}$ let $w(A)$ be the binary string of length $|A|$ formed by the bits $w_i$ with $i \in A$ (in the same order as in $w$). We want to construct a sequence $w$ such that the strings $w(A)$ have high Kolmogorov complexity for all simple $A$. (See [3] for the definition and properties of Kolmogorov complexity. We use prefix complexity and denote it by $K$, but plain complexity can also be used with minimal changes.)

Theorem 1.
Let $g$ be a positive real number less than 1. Then there exist a sequence $w$ and an integer $N$ such that for any finite set $A$ of cardinality at least $N$ the inequality
$$K(A, w(A) \mid t) > g \cdot |A|$$
holds for some $t \in A$.

Here $K(A, w(A) \mid t)$ is the conditional Kolmogorov complexity of the pair $(A, w(A))$ relative to $t$.

Proof. This result is a consequence of the Lovasz local lemma (see, e.g., [4] for a proof):
Lemma. Assume that a finite sequence of events $A_1, \ldots, A_n$ is given, for each $i$ some subset $N(i) \subset \{1, \ldots, n\}$ of “neighbors” is fixed, and positive reals $e_1, \ldots, e_n$ are chosen in such a way that
$$\Pr[A_i] \le e_i \prod_{j \in N(i),\, j \ne i} (1 - e_j)$$
and for every $i$ the event $A_i$ is independent of the family of all $A_j$ with $j \notin N(i)$, $j \ne i$. Then the probability of the event “not $A_1$ and not $A_2$ and ... and not $A_n$” is at least $(1 - e_1) \cdot \ldots \cdot (1 - e_n)$.

The standard compactness argument shows that it is enough (for some $N$; the choice of $N$ will be explained later) to construct an arbitrarily long finite sequence $w$ that satisfies the statement of Theorem 1. Let us fix the desired length of this (long) sequence. For any set $A$ of cardinality at least $N$ (whose elements do not exceed this length) and any string $Z$ of length $|A|$ such that $K(A, Z \mid t) < g \cdot |A|$ for every $t \in A$, consider the event $w(A) = Z$; the set $A$ is called the support of this event. We have to prove that the complements of these events have a non-empty intersection.

This is done by using the Lovasz lemma. Let us choose some $b$ between $g$ and 1. Let $e_i$ be $2^{-bs}$ where $s$ is the size of the support of the $i$th event. For each event $w(A) = Z$ the neighbor events are the events $w(A') = Z'$ such that the supports $A$ and $A'$ have a nonempty intersection. Let us check the assumptions of the Lovasz lemma.

First, an event $A_i$ is independent of any family of events whose supports do not intersect the support of $A_i$.

Second, let $w(A) = Z$ be an event and let $n$ be the cardinality of $A$. The probability of this event is $2^{-n}$. We have to check that $2^{-n}$ does not exceed $2^{-bn}$ multiplied by the product of the factors $(1 - 2^{-bm})$ over all neighbor events (where $m$ is the size of the support of the corresponding event).

This product can be split into parts according to possible intersection points. (If there are several intersection points, let us select and fix one of them.)
Then for any $t \in A$ and for any $m$ there are at most $2^{gm}$ factors that belong to the $t$-part and have size $m$, since there exist at most $2^{gm}$ objects that have complexity less than $gm$ (relative to $t$). Then we take a product over all $m$ and multiply the results for all $t$ (there are $n$ of them). The condition of the Lovasz lemma (that we need to check) takes the form
$$2^{-n} \le 2^{-bn} \prod_{m \ge N} \left(1 - 2^{-bm}\right)^{2^{gm} n}$$
or (after we remove the common exponent $n$)
$$2^{b-1} \le \prod_{m \ge N} \left(1 - 2^{-bm}\right)^{2^{gm}}.$$
The Bernoulli inequality guarantees that this is true if
$$2^{b-1} \le 1 - \sum_{m \ge N} 2^{gm} 2^{-bm}.$$
Since the left hand side is less than 1 and the geometric series converges, this inequality is true for a suitable $N$. (Let us repeat how the proof goes: we start with $b \in (g, 1)$, then we choose $N$ using the convergence of the series, then for any finite number of events we apply the Lovasz lemma, and then we use compactness.) (End of proof.)

The inequality established in this theorem has a useful corollary:
$$K(w(A) \mid t) > g \cdot |A| - K(A \mid t) - O(1),$$
since $K(A, w(A) \mid t) \le K(A \mid t) + K(w(A) \mid t) + O(1)$. For example, if $A$ is an interval, then $K(A \mid t)$ is $o(|A|)$, so this term (as well as the additive constant $O(1)$) can be absorbed by a small change in $g$, and we obtain the following corollary (“Levin’s lemma”, see [2] for a discussion and further references): for any $g < 1$ there exists a sequence $w$ such that all its substrings of sufficiently large length $n$ have complexity at least $gn$.

Let $X$ be a string over some alphabet, and let $Y$ be its prefix. Then the string $Z = X \ldots XY$ is called a fractional power of $X$ and the ratio $|Z| / |X|$ is its exponent. The critical exponent of an infinite sequence $w$ is the least upper bound of all exponents of fractional powers that are substrings of $w$. D. Krieger and J. Shallit [1] have proved the following result:

Theorem 2.
For any real $a > 1$ there exists an infinite sequence that has critical exponent $a$.

Informally speaking, when constructing such a sequence, we need to achieve two goals. First, we have to guarantee (for rational numbers $r$ less than $a$ but arbitrarily close to $a$) that our sequence contains $r$-powers; second, we have to guarantee that it does not contain $q$-powers for $q > a$. Each goal is easy to achieve when considered separately. For the first one, we can just insert some $r$-power for every rational $r < a$. For the second goal we can use the sequence with complex substrings: since every $q$-power has complexity about $1/q$ of its length (the number of free bits in it), Levin’s sequence does not contain long $q$-powers if $q > 1/g$.

The real problem is to combine these two goals: after we fix the repetition pattern needed to ensure the first requirement (i.e., after we decide which bits in a sequence should coincide) we need to choose the values of the “free” bits in such a way that no other (significant) repetitions arise. For that, let us first prove a general statement about Kolmogorov complexity of subsequences in the case when some bits are repeated.

Let $\sim$ be an equivalence relation on $\mathbb{N}$. We assume that all equivalence classes are finite and the relation itself is computable; moreover, we assume that for a given $x$ one can effectively list the equivalence class of $x$. This relation is used as a repetition pattern: we consider only sequences $w$ that follow $\sim$, i.e., only sequences $w$ such that $w_i = w_j$ if $i \sim j$. For any set $A \subset \mathbb{N}$ we consider the number of free bits in $A$, i.e., the number of equivalence classes that have a non-empty intersection with $A$; it is denoted $f_A$ in the sequel.

There are countably many equivalence classes. Let us assign natural numbers to them (say, in the increasing order of minimal elements) and let $c(i)$ be the number of the equivalence class that contains $i$.
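The numbering $c(i)$ and the count of free bits $f_A$ can be made concrete on a finite prefix of $\mathbb{N}$. The following Python sketch is only an illustration: the function names, the cutoff `limit`, and the toy relation (indices in the window 4..9 are equivalent when they differ by a multiple of 4) are assumptions made for this example, not part of the construction.

```python
from typing import Callable, Dict, FrozenSet, Iterable


def class_numbering(
    equivalence_class: Callable[[int], FrozenSet[int]], limit: int
) -> Dict[int, int]:
    """Return c(i) for 0 <= i < limit.

    Classes are numbered 0, 1, 2, ... in increasing order of their minimal
    elements. Scanning i upward meets every class first at its minimum,
    so the numbers are handed out in exactly that order.
    """
    c: Dict[int, int] = {}
    number_of_minimum: Dict[int, int] = {}
    for i in range(limit):
        smallest = min(equivalence_class(i))
        if smallest not in number_of_minimum:
            number_of_minimum[smallest] = len(number_of_minimum)
        c[i] = number_of_minimum[smallest]
    return c


def free_bits(c: Dict[int, int], A: Iterable[int]) -> int:
    """f_A: the number of equivalence classes that intersect A."""
    return len({c[i] for i in A})


def toy_class(x: int) -> FrozenSet[int]:
    """Hypothetical pattern: inside the window 4..9 indices that differ by
    a multiple of 4 are equivalent; every other index is a singleton."""
    if 4 <= x < 10:
        return frozenset(y for y in range(4, 10) if (y - x) % 4 == 0)
    return frozenset({x})


c = class_numbering(toy_class, 12)
window_free_bits = free_bits(c, range(4, 10))  # classes {4,8}, {5,9}, {6}, {7}
```

In this toy pattern the window of six positions carries only four free bits, because positions 8 and 9 repeat positions 4 and 5.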
Then every sequence $w$ that follows the repetition pattern $\sim$ has the form $w_i = \tau_{c(i)}$ for some binary sequence $\tau$, where $c \colon \mathbb{N} \to \mathbb{N}$ is the numbering just defined.

Now we assume that the equivalence relation $\sim$ (as explained above) and a constant $g < 1$ are fixed.

Theorem 3. There exist a sequence $w$ that follows the pattern $\sim$ and an integer $N$ with the following properties: for every finite set $A$ with $f_A > N$ there exists $t \in A$ such that
$$K(w(A) \mid t) > g \cdot f_A - K(A \mid t) - \log m(t),$$
where $m(t)$ is the “multiplicity” of $t$, i.e., the number of bits in its equivalence class.

(Note that if all equivalence classes are singletons, then $\log m(t)$ disappears, $f_A$ is the cardinality of $A$ and we get the corollary already mentioned.)

Proof. Let $w_i = \tau_{c(i)}$ where $\tau$ is a sequence that satisfies the statement of Theorem 1 (with the same $g$). For any $A$ let $B$ be the set of all $c(i)$ for $i \in A$. Then $|B| = f_A$. Theorem 1 guarantees that $K(B, \tau(B) \mid u) > g \cdot |B|$ for some $u \in B$. Since $u \in B$, there exists some $t \in A$ such that $c(t) = u$. To specify $t$ when $u$ is known, we need $\log m(t)$ bits, so $K(t \mid u) \le \log m(t) + O(1)$. After $t$ is known, we need $K(A \mid t)$ additional bits to specify $A$ and $K(w(A) \mid t)$ bits to specify $w(A)$. Knowing $A$ and $w(A)$, we then reconstruct $B$ and $\tau(B)$. Therefore,
$$g \cdot |B| < K(B, \tau(B) \mid u) \le \log m(t) + K(A \mid t) + K(w(A) \mid t) + O(1),$$
which implies the desired inequality (with an additional term $O(1)$, which can be compensated by a small change in $g$).

Assume that $1 < a < b$. First, let us show that Theorem 3 implies the existence of a binary sequence $w$ that contains fractional powers of all rational exponents less than $a$, but does not contain long fractional powers of exponents greater than $b$.

To construct such a sequence, let $r_1, r_2, \ldots$ be all rational numbers between 1 and $a$. For each $r_i = p_i / q_i$ we “implant” a fractional power of exponent $r_i$ in the sequence: we select some interval of length $p_i$ and decide that this interval should be a fractional power of some string of length $q_i$ (and exponent $r_i$). This means that we declare two indices in this interval equivalent if they differ by a multiple of $q_i$. (The intervals for different $i$ are disjoint.)
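The implanting step can be sketched in Python on a finite prefix. Everything named here is hypothetical: the helper names, the two sample intervals (exponents 7/4 and 9/6), and in particular the use of random free bits, which merely stand in for the bits supplied by Theorem 3 and do not by themselves rule out other repetitions.

```python
import random
from typing import List, Tuple


def implant_pattern(length: int, intervals: List[Tuple[int, int, int]]) -> List[int]:
    """Return rep, where rep[i] is the least index equivalent to i.

    Each (start, p, q) declares positions start .. start + p - 1 an interval
    in which indices differing by a multiple of q are equivalent.
    The intervals are assumed to be disjoint and to fit inside range(length).
    """
    rep = list(range(length))
    for start, p, q in intervals:
        for i in range(start + q, start + p):
            rep[i] = rep[i - q]  # i inherits the class of i - q
    return rep


def fill_free_bits(rep: List[int], rng: random.Random) -> List[int]:
    """Choose one bit per class and copy it to every member of the class."""
    w: List[int] = []
    for i, r in enumerate(rep):
        # r == i marks the first member of a class: a free bit.
        w.append(rng.randint(0, 1) if r == i else w[r])
    return w


def has_period(s: List[int], q: int) -> bool:
    """True if s[i] == s[i + q] wherever both positions exist."""
    return all(s[i] == s[i + q] for i in range(len(s) - q))


# Two implanted powers: length 7 with period 4, and length 9 with period 6.
intervals = [(5, 7, 4), (40, 9, 6)]
rep = implant_pattern(60, intervals)
w = fill_free_bits(rep, random.Random(0))
```

Whatever values the free bits take, `w[5:12]` has period 4 and `w[40:49]` has period 6, so the prefix contains fractional powers of exponents 7/4 and 3/2; the number of free bits inside each interval equals its period.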
We call these intervals active intervals. We assume that the distance between two active intervals is much bigger than the lengths of these two intervals (see below why this is useful).

[Figure: two active intervals $X_1 Y_1$ and $X_2 Y_2$, of lengths $r_1 |X_1|$ and $r_2 |X_2|$.]

Fig. 1. Two fractional powers of exponents $r_1$ and $r_2$ are implanted; $Y_i$ is a prefix of $X_i$ (in this example the exponents are less than 2, so only one full period is shown).

Evidently, any sequence that follows this repetition pattern has critical exponent at least $a$.

Let us choose some $g$ between $a/b$ and 1 and apply Theorem 3 with this $g$ to the pattern explained above. We get a bit sequence; let us prove that it does not contain long fractional powers of exponent greater than $b$. Indeed, it is easy to see that the density of free bits in this pattern is at least $1/a$, i.e., for any interval $A$ of length $l$ the number of free bits in it, $f_A$, is at least $l/a$. Indeed, if $A$ intersects two or more active intervals, then all bits between them are free, and the distance between the intervals is large compared to the interval sizes. So we may assume that $A$ intersects only one active interval. All subintervals of the active interval have the same repetition period, and the density of free bits is minimal when $A$ is maximal, i.e., coincides with the entire active interval. The bits outside the active interval are free (no equivalences), so they can only increase the fraction of free bits.

On the other hand, a fractional power of exponent $b$ and length $l$ has complexity at most $l/b + O(\log l)$ (we specify the length of the string and the $l/b$ bits that form the period). For long enough strings we then get a contradiction with the statement of Theorem 3, since $a/b < g$.

To get rid of short fractional powers of exponent greater than $b$ we can add an additional layer of symbols that prevents them. In other terms, consider a sequence in a finite alphabet that follows (almost) the same repetition pattern but has no other repetitions (not prescribed by the pattern) on short distances. It is easy to construct such a sequence; for example, we may assume that $q_i$ is a multiple of $i!$
and then consider a periodic sequence with any large period $M$; it will destroy all periods that are not multiples of $M$, i.e., all short periods and only finitely many of the $q_i$ (the latter does not change the critical exponent). The Cartesian product of these two sequences (the $i$th letter is the pair formed by the $i$th letters of both sequences) has critical exponent between $a$ and $b$.

In fact, we even get a stronger result:

Theorem 4.
For any $a$ and $b$ such that $1 < a < b$ there exists a sequence $w$ that has fractional powers of exponent $r$ for all $r < a$ but does not have approximate fractional powers of exponent $b$ or more: there exists some $e > 0$ such that any substring of length $n$ is $en$-far from any fractional power in terms of Hamming distance (we need to change at least $en$ symbols of the sequence to get a fractional power of length $n$).

Indeed, a change of an $e$-fraction of the bits in a sequence of length $n$ increases its complexity by at most $H(e) n + O(\log n)$, where
$$H(e) = -e \log e - (1 - e) \log (1 - e).$$
Therefore, we need to change a constant fraction of bits to compensate for the difference in complexities (between the lower bound guaranteed by Theorem 3 and the upper bound due to approximate periodicity). (End of proof.)
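The entropy term comes from counting: a string at Hamming distance at most $en$ from a given string of length $n$ is determined by the set of changed positions, and for $e \le 1/2$ there are at most $2^{H(e) n}$ such sets. A small Python check of this standard counting bound, with logarithms to base 2 (the function names are chosen for this illustration only):

```python
import math


def binary_entropy(e: float) -> float:
    """H(e) = -e log2(e) - (1 - e) log2(1 - e), with H(0) = H(1) = 0."""
    if e in (0.0, 1.0):
        return 0.0
    return -e * math.log2(e) - (1 - e) * math.log2(1 - e)


def log2_hamming_ball(n: int, e: float) -> float:
    """log2 of the number of binary strings of length n that differ from a
    fixed string in at most floor(e * n) positions."""
    radius = math.floor(e * n)
    return math.log2(sum(math.comb(n, k) for k in range(radius + 1)))


# Compare the exact count with the entropy bound for a few sample values.
samples = [(n, e) for n in (50, 200, 1000) for e in (0.05, 0.1, 0.25)]
bound_holds = all(log2_hamming_ball(n, e) <= binary_entropy(e) * n for n, e in samples)
```

For small $e$ the value $H(e)$ is small, so a small fraction of changed bits cannot bridge a complexity gap that is linear in $n$.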
The same construction (with some refinement) can be used to get a sequence with a given critical exponent.
Theorem 5. (Krieger and Shallit) For any real number $a > 1$ there exists a sequence that has critical exponent $a$.

(This proof follows the suggestions of D. Krieger, who informed the author about the problem and suggested applying Theorem 1 to it. See [1] for the original proof. The author thanks D. Krieger for the explanations and both authors of [1] for the permission to cite their paper.)

Again, let us consider a repetition pattern that guarantees all exponents less than $a$ and apply Theorem 3 with some $g$ close to 1. This (as we have seen) prevents powers with exponents greater than $a/g$; the problem is how to get rid of intermediate exponents.

To do this, we should distinguish between two possibilities: (a) an unwanted power is an extension of the prescribed one (it has the same period, which unexpectedly has more repetitions) and (b) an unwanted power is not an extension. The first type of unwanted powers can be prevented by adding brackets around each active interval (in a special layer: we take a Cartesian product of the sequence and this layer).

It remains to explain why unwanted repetitions of the second type do not exist (for $g$ close enough to 1). Consider any fractional power with exponent greater than $a$. There are two possibilities:

(1) It intersects at least two active intervals. Then it contains all free bits between these intervals, and (since we assume that the distances are large compared to the lengths of the intervals) the density of free bits is close to 1, so an exponent greater than $a$ is impossible.

(2) It intersects only one active interval. The same argument (about the density of free bits) shows that if the endpoints of this fractional power deviate significantly from the endpoints of the active interval, then the density of free bits is significantly greater than $1/a$ and we again get a contradiction. Therefore, taking $g$ close to 1 we may guarantee that the distance between the endpoints of the fractional power and those of the active interval is a small fraction of the length of the active interval.
Then we get two different periods in the intersection of the fractional power and the active interval. One (“old”) is inherited from the repetition pattern; the second one (“new”) is due to the fact that we consider a fractional power. (The periods are different, otherwise we are in case (a).) The period lengths are close to each other. Indeed, if the new period is significantly longer, then the exponent is less than $a$; if the new period is significantly shorter, then the complexity bound decreases and we again get a contradiction.

Now note that two periods $t_1$ and $t_2$ in a string guarantee the period $|t_1 - t_2|$ near the endpoints of this string (at a distance equal to the difference between the string length and the smaller of these periods). Therefore we get a period that is a small fraction of the string length on an interval whose length is a non-negligible fraction of the string length. This again significantly decreases the complexity of the string, and this contradicts the lower bound on the complexity. (End of the proof.)

Remark. This proof uses some parameters that have to be chosen properly. For a given $a$ we choose $g$ that is close enough to 1 and makes the arguments about “sufficiently small” and “significantly different” things in the last paragraph valid for long strings. Then we choose a repetition pattern where the lengths of the active intervals are multiples of factorials and the distances between them grow much faster than the lengths of the active intervals. Then we apply Theorem 3 to this pattern. Finally, we look at the length $N$ provided by this theorem and prevent all shorter periods by an additional layer. Another layer is used for brackets. These layers destroy only finitely many of the prescribed patterns and the unwanted short periods.

References
1. D. Krieger and J. Shallit, “Every real number greater than 1 is a critical exponent”. Accepted to Theoret. Comput. Sci.
2. A. Yu. Rumyantsev and M. A. Ushakov, Forbidden Substrings, Kolmogorov Complexity and Almost Periodic Sequences, Springer, Lecture Notes in Computer Science, Volume 3884 / 2006, STACS 2006, pp. 396–407.
3. M. Li and P. Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications, 2nd ed. N.Y.: Springer, 1997.
4. Rajeev Motwani, Prabhakar Raghavan,