
When are Kalman-Filter Restless Bandits Indexable?

Christopher Dance and Tomi Silander
Xerox Research Centre Europe, Grenoble, France
May 2015

Abstract

We study the restless bandit associated with an extremely simple scalar Kalman filter model in discrete time. Under certain assumptions, we prove that the problem is indexable in the sense that the Whittle index is a non-decreasing function of the relevant belief state. In spite of the long history of this problem, this appears to be the first such proof. We use results about Schur-convexity and mechanical words, which are particular binary strings intimately related to palindromes.

We study the problem of monitoring several time series so as to maintain a precise belief while minimising the cost of sensing. Such problems can be viewed as POMDPs with belief-dependent rewards [3] and their applications include active sensing [7], attention mechanisms for multiple-object tracking [22], as well as online summarisation of massive data from time-series [4]. Specifically, we discuss the restless bandit [24] associated with the discrete-time Kalman filter [19].

Restless bandits generalise bandit problems [6, 8] to situations where the state of each arm (project, site or target) continues to change even if the arm is not played. As with bandit problems, the states of the arms evolve independently given the actions taken, suggesting that there might be efficient algorithms for large-scale settings, based on calculating an index for each arm, which is a real number associated with the (belief-)state of that arm alone. However, while bandits always have an optimal index policy (select the arm with the largest index), it is known that no index policy can be optimal for some discrete-state restless bandits [17] and such problems are in general PSPACE-hard even to approximate to any non-trivial factor [10]. Further, in this paper we address restless bandits with real-valued rather than discrete states.

On the other hand, Whittle proposed a natural index policy for restless bandits [24], but this policy only makes sense when the restless bandit is indexable, as we now explain. Say we have $n$ restless bandits and we are constrained to play $m$ arms at each time. Whittle considered relaxing this constraint by only requiring that the time-average number of arms played is $m$. Now the optimal average cost for this relaxed problem is a lower bound on the optimal average cost for the original problem. Also, the relaxed problem can be separated into $n$ single-arm problems by the method of Lagrange multipliers, making it relatively easy to solve. In this separated version of the relaxed problem, each arm behaves identically to an arm in the original problem, except that an additional price $\lambda$ is charged each time the arm is played, where $\lambda$ corresponds to the Lagrange multiplier for the relaxed constraint. Now let us consider a family of optimal policies which achieves the optimal cost-to-go $Q_i(x, u; \lambda)$ for a single arm $i$ with price $\lambda$ and which takes actions $u = \pi_i(x; \lambda)$ when in state $x$, where $u = 0$ means passive and $u = 1$ means active.
At first glance, we might intuitively suppose that it becomes less and less attractive to be active as the price $\lambda$ increases, so that as the price is increased beyond some value $\lambda_i(x)$, the optimal action switches from active to passive. At this price we are indifferent between being active and passive, so that $Q_i(x, 0; \lambda_i(x)) = Q_i(x, 1; \lambda_i(x))$. Such a value $\lambda_i(x)$ is called the Whittle index for arm $i$ in state $x$. Indeed, if there is a family of optimal policies for which $\pi_i(x; \lambda_{hi}) \leq \pi_i(x; \lambda_{lo})$ for all states $x$ and all pairs of prices $\lambda_{hi} \geq \lambda_{lo}$, then an optimal solution to the relaxed problem for price $\lambda$ is to activate arm $i$ if and only if $\lambda < \lambda_i(x)$. If a restless bandit satisfies this condition, it is said to be indexable. It is important to note that some restless bandits are not indexable, so that activating arm $i$ if and only if $\lambda < \lambda_i(x)$ does not correspond to an optimal solution to the relaxed problem. Indeed, in a study of small randomly-generated problems, Weber and Weiss [23] found that roughly 10% of problems were not indexable.

As a policy based on $\lambda_i(x)$ is so good for the relaxed problem when the arms are indexable, this motivates us to use $\lambda_i(x)$ as a heuristic for the original problem. This heuristic is called Whittle's index policy and at each time it activates the $m$ arms with the highest indices $\lambda_i(x)$. Further motivation for studying indexability is that for ordinary bandits the Whittle index reduces to the Gittins index, making the Whittle index policy optimal when only one arm may be active at each time, that is, when $m = 1$. More generally, Whittle's index policy is not optimal for some restless bandit problems even when the arms are indexable, but indexability is still a rather useful concept, since if all arms are indexable and certain other conditions hold, Whittle's policy is asymptotically optimal, as we now explain.
Consider a sequence of restless bandit problems parameterised by the number of indexable arms $n$ and in which $m = \alpha n$ of the arms can be simultaneously active for some fixed $\alpha \in (0, 1)$. As $n$ tends to infinity, the time-average cost per arm for Whittle's index policy converges to the time-average cost per arm for an optimal policy, provided a certain fluid approximation has a unique fixed point. This result was first demonstrated by Weber and Weiss [23] who, for simplicity of exposition, only considered the symmetric case in which the $n$ arms have identical costs and transition probabilities. Recently, Verloop [20] extended this result to asymmetric cases involving multiple types of arms. Interestingly, this extension also covers cases where new arms arrive and old arms depart.

Restless bandits associated with scalar Kalman(-Bucy) filters in continuous time were recently shown to be indexable [12] and the corresponding discrete-time problem has attracted considerable attention over a long period [15, 11, 16, 21]. However, that attention has produced no satisfactory proof of indexability, even for scalar time-series and even if we assume that there is a monotone optimal policy for the single-arm problem, which is a policy that plays the arm if and only if the relevant belief-state exceeds some threshold (here the relevant belief-state is a posterior variance). Theorem 1 of this paper addresses that gap. After formalising the problem (Section 2), we describe the concepts and intuition (Section 3) behind the main result (Section 4). The main tools are mechanical words (which are not sufficiently well-known) and Schur convexity. As these tools are associated with rather general theorems, we believe that future work (Section 5) should enable substantial generalisation of our results.
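Once an index is available, Whittle's index policy described above is a simple selection rule: at each time, activate the $m$ arms whose current (belief-)states have the largest indices. The Python sketch below is illustrative only. The index functions are hypothetical placeholders supplied by the caller (computing $\lambda_i(x)$ for the Kalman-filter arms is the subject of the rest of the paper), and breaking ties in favour of the lower arm number is an arbitrary choice.

```python
from __future__ import annotations

import heapq
from typing import Callable, Sequence


def whittle_index_policy(states: Sequence[float],
                         index_fns: Sequence[Callable[[float], float]],
                         m: int) -> list[int]:
    """Return the arms to activate: the m arms whose current index is largest.

    states[i] is the (belief-)state of arm i and index_fns[i] maps that state
    to the arm's index lambda_i(x). Both are supplied by the caller.
    """
    if not 0 <= m <= len(states):
        raise ValueError("m must lie between 0 and the number of arms")
    indices = [fn(x) for fn, x in zip(index_fns, states)]
    # heapq.nlargest is equivalent to sorted(..., reverse=True)[:m], which is
    # stable, so equal indices are resolved in favour of the lower arm number.
    chosen = heapq.nlargest(m, range(len(states)), key=lambda i: indices[i])
    return sorted(chosen)


# Hypothetical example: three arms whose index is simply their posterior variance.
print(whittle_index_policy([0.2, 1.5, 0.9], [lambda x: x] * 3, m=2))
```

The policy itself needs nothing beyond the per-arm indices; this is what makes the indexability question studied here practically important.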

We consider the problem of tracking $N$ time-series, which we call arms, in discrete time. The state $Z_{i,t} \in \mathbb{R}$ of arm $i$ at time $t \in \mathbb{Z}_+$ evolves as a standard-normal random walk independent of everything but its immediate past ($\mathbb{Z}_+$, $\mathbb{R}_-$ and $\mathbb{R}_+$ all include zero). The action space is $\mathcal{U} := \{1, \ldots, N\}$. Action $u_t = i$ makes an expensive observation $Y_{i,t}$ of arm $i$ which is normally distributed about $Z_{i,t}$ with precision $b_i \in \mathbb{R}_+$, and we receive cheap observations $Y_{j,t}$ of each other arm $j$ with precision $a_j \in \mathbb{R}_+$, where $a_j < b_j$ and $a_j = 0$ means no observation at all. Let $Z_t, Y_t, H_t, F_t$ be the state, observation, history and observed history, so that
$$Z_t := (Z_{1,t}, \ldots, Z_{N,t}), \quad Y_t := (Y_{1,t}, \ldots, Y_{N,t}), \quad H_t := ((Z_0, u_0, Y_0), \ldots, (Z_t, u_t, Y_t)), \quad F_t := ((u_0, Y_0), \ldots, (u_t, Y_t)).$$
Then we formalise the above as ($\mathbb{1}\{\cdot\}$ is the indicator function)
$$Z_{i,0} \sim N(0, 1), \qquad Z_{i,t+1} \mid H_t \sim N(Z_{i,t}, 1), \qquad Y_{i,t} \mid H_{t-1}, Z_t, u_t \sim N\!\left(Z_{i,t}, \frac{1}{a_i \mathbb{1}\{u_t \neq i\} + b_i \mathbb{1}\{u_t = i\}}\right).$$
(The case $E[(Z_{i,t+1} - Z_{i,t})^2] \neq 1$ reduces to this one by a change of variables.) Thus the posterior belief is given by the Kalman filter as $Z_{i,t} \mid F_t \sim N(\hat{Z}_{i,t}, x_{i,t})$, where the posterior mean is $\hat{Z}_{i,t} \in \mathbb{R}$ and the error variance $x_{i,t} \in \mathbb{R}_+$ satisfies $x_{i,t+1} = \phi_{i,\mathbb{1}\{u_{t+1} = i\}}(x_{i,t})$, where
$$\phi_{i,0}(x) := \frac{x + 1}{a_i x + a_i + 1} \quad \text{and} \quad \phi_{i,1}(x) := \frac{x + 1}{b_i x + b_i + 1}. \tag{1}$$

Problem KF1.

Let $\pi$ be a policy so that $u_t = \pi(F_{t-1})$. Let $x^\pi_{i,t}$ be the error variance under $\pi$. The problem is to choose $\pi$ so as to minimise the following objective for discount factor $\beta \in [0, 1)$, namely the discounted sum of the error variances $x^\pi_{i,t}$ with weights $w_i \in \mathbb{R}_+$ plus observation costs $h_i \in \mathbb{R}_+$ for $i = 1, \ldots, N$:
$$E\left[\sum_{t=0}^{\infty} \sum_{i=1}^{N} \beta^t \left\{ h_i \mathbb{1}\{u_t = i\} + w_i x^\pi_{i,t} \right\}\right] = \sum_{t=0}^{\infty} \sum_{i=1}^{N} \beta^t \left\{ h_i \mathbb{1}\{u_t = i\} + w_i x^\pi_{i,t} \right\},$$
where the equality follows as (1) is a deterministic mapping (and assuming $\pi$ is deterministic).

Single-Arm Problem and Whittle Index.

Now fix an arm $i$ and write $x^\pi_t, \phi_\alpha(\cdot), \ldots$ instead of $x^\pi_{i,t}, \phi_{i,\alpha}(\cdot), \ldots$. Say there are now two actions $u_t \in \{0, 1\}$, where the active action $u_t = 1$ (an expensive observation) costs $h + \nu$ with $\nu \in \mathbb{R}$. The single-arm problem is to choose a policy, which here is an action sequence $\pi := (u_0, u_1, \ldots)$, so as to minimise
$$V^\pi(x \mid \nu) := \sum_{t=0}^{\infty} \beta^t \left\{ (h + \nu) u_t + w x^\pi_t \right\} \quad \text{where } x^\pi_0 = x. \tag{2}$$
Let $Q(x, \alpha \mid \nu)$ be the optimal cost-to-go in this problem if the first action must be $\alpha$ and let $\pi^*$ be an optimal policy, so that $Q(x, \alpha \mid \nu) := (h + \nu)\alpha + w x + \beta V^{\pi^*}(\phi_\alpha(x) \mid \nu)$. For any fixed $x \in \mathbb{R}_+$, the value of $\nu$ for which actions $u_0 = 0$ and $u_0 = 1$ are both optimal is known as the Whittle index $\lambda^W(x)$, assuming it exists and is unique. In other words,
$$\text{the Whittle index } \lambda^W(x) \text{ is the solution to } Q(x, 0 \mid \lambda^W(x)) = Q(x, 1 \mid \lambda^W(x)). \tag{3}$$
Let us consider a policy which takes action $u_0 = \alpha$ and then acts optimally, producing actions $u^{\alpha*}_t(x)$ and error variances $x^{\alpha*}_t(x)$. Then (3) gives
$$\sum_{t=0}^{\infty} \beta^t \left\{ (h + \lambda^W(x)) u^{0*}_t(x) + w x^{0*}_t(x) \right\} = \sum_{t=0}^{\infty} \beta^t \left\{ (h + \lambda^W(x)) u^{1*}_t(x) + w x^{1*}_t(x) \right\}.$$
Solving this linear equation for the index $\lambda^W(x)$ gives
$$\lambda^W(x) = \frac{w \sum_{t=1}^{\infty} \beta^t \left( x^{0*}_t(x) - x^{1*}_t(x) \right)}{\sum_{t=0}^{\infty} \beta^t \left( u^{1*}_t(x) - u^{0*}_t(x) \right)} - h, \tag{4}$$
where the $t = 0$ terms of the variance sums cancel because both orbits start at $x^{0*}_0(x) = x^{1*}_0(x) = x$. Whittle [24] recognised that for his index policy (play the arm with the largest $\lambda^W(x)$) to make sense, any arm which receives an expensive observation for added cost $\nu$ must also receive an expensive observation for added cost $\nu' < \nu$. Such problems are said to be indexable. The question resolved by this paper is whether Problem KF1 is indexable. Equivalently, is $\lambda^W(x)$ non-decreasing in $x \in \mathbb{R}_+$? We make the following intuitive assumption about threshold (monotone) policies.

[Figure 1: two panels plotting $x_{t+1}$ against $x_t$ together with the maps $\phi_0$ and $\phi_1$; the left panel shows the orbit $x^{0*}_t$ through points A to E, the right panel the orbit $x^{1*}_t$ through points F to J.]
Figure 1: Orbit $x^{0*}_t(x)$ traces the path ABCDE... for the word $01w = 01101$. Orbit $x^{1*}_t(x)$ traces the path FGHIJ... for the word $10w = 10101$. Word $w = 101$ is a palindrome.

A1.

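For some $\bar{x} \in \mathbb{R}_+$ depending on $\nu \in \mathbb{R}$, the policy $u_t = \mathbb{1}\{x_t \geq \bar{x}\}$ is optimal for problem (2).

Note that under A1, definition (3) means that the policy $u_t = \mathbb{1}\{x_t > \bar{x}\}$ is also optimal (taking $\nu = \lambda^W(x)$, so that the threshold is $\bar{x} = x$), so we can choose
$$u^{0*}_t(x) := \begin{cases} 0 & \text{if } x^{0*}_{t-1}(x) \leq x \\ 1 & \text{otherwise} \end{cases} \qquad x^{0*}_t(x) := \begin{cases} \phi_0(x^{0*}_{t-1}(x)) & \text{if } x^{0*}_{t-1}(x) \leq x \\ \phi_1(x^{0*}_{t-1}(x)) & \text{otherwise} \end{cases}$$
$$u^{1*}_t(x) := \begin{cases} 0 & \text{if } x^{1*}_{t-1}(x) < x \\ 1 & \text{otherwise} \end{cases} \qquad x^{1*}_t(x) := \begin{cases} \phi_0(x^{1*}_{t-1}(x)) & \text{if } x^{1*}_{t-1}(x) < x \\ \phi_1(x^{1*}_{t-1}(x)) & \text{otherwise} \end{cases} \tag{5}$$
where $x^{0*}_0(x) = x^{1*}_0(x) = x$. We refer to $x^{0*}_t(x), x^{1*}_t(x)$ as the $x$-threshold orbits (Figure 1). We are now ready to state our main result.

Theorem 1.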

Suppose a threshold policy (A1) is optimal for the single-arm problem (2). Then Problem KF1 is indexable. Specifically, for any $b > a \geq 0$ let
$$\phi_0(x) := \frac{x + 1}{a x + a + 1}, \qquad \phi_1(x) := \frac{x + 1}{b x + b + 1},$$
and for any $w \in \mathbb{R}_+$, $h \in \mathbb{R}$ and $0 < \beta < 1$, let
$$\lambda^W(x) := \frac{w \sum_{t=1}^{\infty} \beta^t \left( x^{0*}_t(x) - x^{1*}_t(x) \right)}{\sum_{t=0}^{\infty} \beta^t \left( u^{1*}_t(x) - u^{0*}_t(x) \right)} - h, \tag{6}$$
in which the action sequences $u^{0*}_t(x), u^{1*}_t(x)$ and error variance sequences $x^{0*}_t(x), x^{1*}_t(x)$ are given in terms of $\phi_0, \phi_1$ by (5). Then $\lambda^W(x)$ is a continuous and non-decreasing function of $x \in \mathbb{R}_+$.

We are now ready to describe the key concepts underlying this result.
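To make (1), (5) and (6) concrete, the following Python sketch evaluates the index numerically by simulating the two $x$-threshold orbits and truncating the infinite sums; it is an illustration, not the authors' code. It assumes the action-timing convention of the single-arm problem (2), in which action $u_t$ maps $x_t$ to $x_{t+1} = \phi_{u_t}(x_t)$ and the first action is taken at $t = 0$ (the reading of (4) and (6) that matches the definition of $Q(x, \alpha \mid \nu)$), and it breaks ties at the threshold as in (5). The helper names, the parameters $a = 0.3$, $b = 2$, $\beta = 0.9$ and the 500-step horizon are arbitrary choices made for the example.

```python
def phi(x: float, precision: float) -> float:
    """Error-variance map of (1). The prior variance after the unit random-walk
    step is P = x + 1, and an observation of the given precision yields
    P / (1 + precision * P); precision 0 means no observation."""
    return (x + 1.0) / (precision * x + precision + 1.0)


def whittle_index(x: float, a: float, b: float, beta: float,
                  w: float = 1.0, h: float = 0.0, horizon: int = 500) -> float:
    """Truncated evaluation of the index (6) at state x, using x as the threshold.

    Orbit 0 starts passive (ties at the threshold count as passive);
    orbit 1 starts active (ties count as active), as in (5).
    """
    num = den = 0.0
    s0 = s1 = x
    for t in range(horizon):
        act0 = 0 if s0 <= x else 1
        act1 = 0 if s1 < x else 1
        den += beta ** t * (act1 - act0)
        s0 = phi(s0, b if act0 else a)
        s1 = phi(s1, b if act1 else a)
        num += beta ** (t + 1) * (s0 - s1)
    return w * num / den - h


# Arbitrary example parameters with b > a >= 0 and 0 < beta < 1.
xs = [0.02 * k for k in range(1, 126)]
values = [whittle_index(x, a=0.3, b=2.0, beta=0.9) for x in xs]
# Theorem 1 predicts a non-decreasing sequence (up to rounding).
print(all(v2 >= v1 - 1e-9 for v1, v2 in zip(values, values[1:])))
```

The truncation error decays like $\beta^{\text{horizon}}$, so the horizon only needs to grow when $\beta$ is close to 1.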

Words.

In this paper, a word $w$ is a string in $\{0, 1\}^*$ with $k$th letter $w_k$ and $w_{i:j} := w_i w_{i+1} \ldots w_j$. The empty word is $\epsilon$, the concatenation of words $u, v$ is $uv$, the word that is the $n$-fold repetition of $w$ is $w^n$, the infinite repetition of $w$ is $w^\omega$ and $\tilde{w}$ is the reverse of $w$, so $w = \tilde{w}$ means $w$ is a palindrome. The length of $w$ is $|w|$ and $|w|_u$ is the number of times that word $u$ appears in $w$, overlaps included.

Christoffel, Sturmian and Mechanical Words.

It turns out that the action sequences in (5) are given by such words, so the following definitions are central to this paper. The Christoffel tree (Figure 2) is an infinite complete binary tree [5] in which each node is labelled with a pair $(u, v)$ of words. The root is $(0, 1)$ and the children of $(u, v)$ are $(u, uv)$ and $(uv, v)$.

[Figure 2: Part of the Christoffel tree, showing the root (0, 1) and its descendants down to the level containing (0, 0001), ..., (0111, 1).]

The Christoffel words are the words 0, 1 and $uv$ for all $(u, v)$ in that tree. The fractions $|uv|_1/|uv|_0$ form the Stern-Brocot tree [9], which contains each positive rational number exactly once. Also, infinite paths in the Stern-Brocot tree converge to the positive irrational numbers. Analogously, Sturmian words could be thought of as infinitely-long Christoffel words.

Alternatively, among many known characterisations, the Christoffel words can be defined as the words 0, 1 and $0w1$ in which, letting $a := |0w1|_1/|0w1|$,
$$w_n := \lfloor (n+1)a \rfloor - \lfloor na \rfloor \quad \text{for } n = 1, 2, \ldots, |w|,$$
for any relatively prime natural numbers $|0w1|_0$ and $|0w1|_1$. For example, $|0w1|_0 = 3$ and $|0w1|_1 = 2$ give $a = 2/5$, $w = 010$ and $0w1 = 00101$. The Sturmian words are then the infinite words $0 w_1 w_2 \cdots$ where, for $n = 1, 2, \ldots$ and $a \in (0, 1) \setminus \mathbb{Q}$,
$$w_n := \lfloor (n+1)a \rfloor - \lfloor na \rfloor.$$
We use the notation $0w1$ for a mechanical word, where the set of mechanical words is the union of the Christoffel and Sturmian words [13]. (Note that the mechanical words are sometimes defined in terms of infinite repetitions of the Christoffel words.)

Majorisation.

As in [14], let $x, y \in \mathbb{R}^m$ and let $x_{(i)}$ and $y_{(i)}$ be their elements sorted in ascending order. We say $x$ is weakly supermajorised by $y$ and write $x \prec^w y$ if
$$\sum_{k=1}^{j} x_{(k)} \geq \sum_{k=1}^{j} y_{(k)} \quad \text{for all } j = 1, \ldots, m.$$
If this is an equality for $j = m$ we say $x$ is majorised by $y$ and write $x \prec y$. It turns out that
$$x \prec y \iff \sum_{k=1}^{j} x_{[k]} \leq \sum_{k=1}^{j} y_{[k]} \text{ for } j = 1, \ldots, m - 1, \text{ with equality for } j = m,$$
where $x_{[k]}, y_{[k]}$ are the sequences sorted in descending order. For $x, y \in \mathbb{R}^m$ we have [14]
$$x \prec y \iff \sum_{i=1}^{m} f(x_i) \leq \sum_{i=1}^{m} f(y_i) \text{ for all convex functions } f : \mathbb{R} \to \mathbb{R}.$$
More generally, a real-valued function $\phi$ defined on a subset $A$ of $\mathbb{R}^m$ is said to be Schur-convex on $A$ if $x \prec y$ implies that $\phi(x) \leq \phi(y)$.

Möbius Transformations.

Let $\mu_A(x)$ denote the Möbius transformation
$$\mu_A(x) := \frac{A_{11} x + A_{12}}{A_{21} x + A_{22}} \quad \text{where } A \in \mathbb{R}^{2 \times 2}.$$
Möbius transformations such as $\phi_0(\cdot), \phi_1(\cdot)$ are closed under composition, so for any word $w$ we define
$$\phi_w(x) := \phi_{w_{|w|}} \circ \cdots \circ \phi_{w_2} \circ \phi_{w_1}(x) \quad \text{and} \quad \phi_\epsilon(x) := x.$$

Intuition.

Here is the intuition behind our main result. For any $x \in \mathbb{R}_+$, the orbits in (5) correspond to a particular mechanical word $0w1$ (Figure 1). Specifically, for any word $u$, let $y_u$ be the fixed point of the mapping $\phi_u$ on $\mathbb{R}_+$, so that $\phi_u(y_u) = y_u$ and $y_u \in \mathbb{R}_+$. Then the word corresponding to $x$ is 1 for $0 \leq x \leq y_1$, $0w1$ for $x \in [y_{01w}, y_{10w}]$ and 0 for $y_0 \leq x < \infty$. In passing we note that these fixed points are sorted in ascending order by the ratio $\rho := |w|_0/|w|_1$ of counts of 0s to counts of 1s, as illustrated by Figure 3. Interestingly, it turns out that the ratio $\rho$ is a piecewise-constant yet continuous function of $x$, reminiscent of the Cantor function.

Figure 3: Lower fixed points $y_{01w}$ of Christoffel words (black dots), majorisation points for those words (black circles) and the tree of $\phi_w(0)$ (blue). [Vertical axis: $y_{01w}$ and $\phi_w(0)$; horizontal axis: a letter-count ratio normalised by $|01w|$.]

Also, composition of Möbius transformations corresponds to matrix multiplication, so that $\mu_A \circ \mu_B(x) = \mu_{AB}(x)$ for any $A, B \in \mathbb{R}^{2 \times 2}$. Thus, the index (6) can be written in terms of the orbits of a linear system (11) given by $01w$ and $10w$. Further, if $A \in \mathbb{R}^{2 \times 2}$ and $\det(A) = 1$ then the gradient of the corresponding Möbius transformation is
$$\frac{d\mu_A(x)}{dx} = \frac{1}{(A_{21} x + A_{22})^2},$$
a convex function of the linear quantity $A_{21} x + A_{22}$. So the gradient of the index is the difference of the sums of a convex function of the linear-system orbits. Such sums are Schur-convex functions, and it follows that the index is increasing because one orbit weakly supermajorises the other, as we now explain for a mechanical word $0w1$, for which $w$ is a palindrome.

Further, if $w$ is a palindrome, it turns out that the difference between the linear-system orbits increases with $x$. So we might define the majorisation point for $w$ as the $x$ for which one orbit majorises the other. Quite remarkably, if $w$ is a palindrome then the majorisation point is $\phi_w(0)$ (Proposition 7). Indeed, the black circles and blue dots of Figure 3 coincide. Finally, $\phi_w(0)$ is less than or equal to $y_{01w}$, which is the least $x$ for which the orbits correspond to the word $0w1$. Indeed, the blue dots of Figure 3 are below the corresponding black dots. Thus one orbit does indeed supermajorise the other.
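The intuition above can be checked numerically for small Christoffel words. The following Python sketch is an illustration with arbitrary parameters $a = 0.3$, $b = 2$, not the authors' code. It represents $\phi_0, \phi_1$ by the matrices $F = \begin{pmatrix} 1 & 1 \\ a & a+1 \end{pmatrix}$ and $G = \begin{pmatrix} 1 & 1 \\ b & b+1 \end{pmatrix}$ (so $\phi_0 = \mu_F$ and $\phi_1 = \mu_G$), forms $M(w)$ as an ordered matrix product, and then checks four things for each word $0w1$: that $w$ is a palindrome; that $\phi_w(0) \leq y_{01w} < y_{10w}$; that a threshold $x$ in the middle of $[y_{01w}, y_{10w}]$ produces the word $0w1$; and that, at $x = \phi_w(0)$, the partial sums $T_j$ of the differences between the two linear-system orbits (the quantities used in the proof of Proposition 7) are non-negative and vanish at $j = m$. The helpers `threshold_word` and `partial_sums` are hypothetical names introduced here, and starting the threshold orbit at $\phi_1(x)$ follows the convention of the supplementary material.

```python
import math


def mat_mul(A, B):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def M(word, a, b):
    """Matrix product M(w) = M(w_|w|) ... M(w_1), with M(0) = F and M(1) = G."""
    F = ((1.0, 1.0), (a, a + 1.0))
    G = ((1.0, 1.0), (b, b + 1.0))
    R = ((1.0, 0.0), (0.0, 1.0))
    for letter in word:
        # Left-multiply, so that the first letter of the word acts first on x.
        R = mat_mul(G if letter == "1" else F, R)
    return R


def mobius(A, x):
    """mu_A(x) = (A11 x + A12) / (A21 x + A22)."""
    return (A[0][0] * x + A[0][1]) / (A[1][0] * x + A[1][1])


def fixed_point(word, a, b):
    """Fixed point y_w of phi_w: the positive root of
    A21 y^2 + (A22 - A11) y - A12 = 0 with A = M(w); assumes A21 > 0."""
    A = M(word, a, b)
    p, q, r = A[1][0], A[1][1] - A[0][0], -A[0][1]
    return (-q + math.sqrt(q * q - 4.0 * p * r)) / (2.0 * p)


def christoffel_words(depth):
    """Words uv for the nodes (u, v) in the top `depth` levels of the Christoffel tree."""
    level, words = [("0", "1")], []
    for _ in range(depth):
        words += [u + v for u, v in level]
        level = [child for u, v in level for child in ((u, u + v), (u + v, v))]
    return words


def threshold_word(x, a, b, steps=200, max_period=30):
    """Shortest period of the actions along the x-threshold orbit started at phi_1(x)."""
    s, actions = mobius(M("1", a, b), x), []
    for _ in range(steps):
        act = "1" if s >= x else "0"
        actions.append(act)
        s = mobius(M(act, a, b), s)
    for p in range(1, max_period + 1):
        if all(actions[i] == actions[i + p] for i in range(steps - p)):
            return "".join(actions[:p])
    return None


def partial_sums(w, x, a, b, n=0):
    """T_1(x), ..., T_m(x) for block n of the linear-system orbits (11)."""
    l_word, u_word = "10" + w, "01" + w
    sums, total = [], 0.0
    for k in range(1, len(l_word) + 1):
        X = M(l_word * n + l_word[:k], a, b)
        Y = M(u_word * n + u_word[:k], a, b)
        total += (X[1][0] * x + X[1][1]) - (Y[1][0] * x + Y[1][1])
        sums.append(total)
    return sums


a, b = 0.3, 2.0  # arbitrary example with b > a > 0
for word in christoffel_words(3):
    w = word[1:-1]
    lo, hi = fixed_point("01" + w, a, b), fixed_point("10" + w, a, b)
    x0 = mobius(M(w, a, b), 0.0)  # phi_w(0), the claimed majorisation point
    T = partial_sums(w, x0, a, b)
    print(word, w == w[::-1], x0 <= lo < hi,
          threshold_word((lo + hi) / 2, a, b) == word, round(T[-1], 9), min(T) >= -1e-9)
```

Each printed row should show `True` in every boolean column and a final partial sum that is zero up to rounding. The `steps` and `max_period` limits are arbitrary and only need to exceed the lengths of the words tested.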

The Möbius transformations of (1) satisfy the following assumption for $I := \mathbb{R}_+$. We prove in the supplementary material that the fixed point $y_w$ of word $w$ (the solution to $\phi_w(x) = x$ on $I$) is unique.

Assumption A2. Functions $\phi_0 : I \to I$ and $\phi_1 : I \to I$, where $I$ is an interval of $\mathbb{R}$, are increasing and non-expansive, so for all $x, y \in I$ with $x < y$ and for $k \in \{0, 1\}$ we have
$$\underbrace{\phi_k(x) < \phi_k(y)}_{\text{increasing}} \quad \text{and} \quad \underbrace{\phi_k(y) - \phi_k(x) < y - x}_{\text{non-expansive}}.$$
Furthermore, the fixed points $y_0, y_1$ of $\phi_0, \phi_1$ on $I$ satisfy $y_1 < y_0$.

Hence the following two propositions (proved in the supplementary material) apply to $\phi_0, \phi_1$ of (1) on $I = \mathbb{R}_+$.

Proposition 1.

Suppose A2 holds, $x \in I$ and $w$ is a non-empty word. Then
$$x < \phi_w(x) \iff \phi_w(x) < y_w \iff x < y_w \quad \text{and} \quad x > \phi_w(x) \iff \phi_w(x) > y_w \iff x > y_w.$$
For a given $x$, in the notation of (5), we call the shortest word $u$ such that $(u^{1*}_2(x), u^{1*}_3(x), \ldots) = u^\omega$ the $x$-threshold word. Proposition 2 generalises a recent result about $x$-threshold words in a setting where $\phi_0, \phi_1$ are linear [18].

Proposition 2.

Suppose A2 holds and $0w1$ is a mechanical word. Then $0w1$ is the $x$-threshold word if and only if $x \in [y_{01w}, y_{10w}]$. Also, if $x, x' \in I$ with $x \geq y_0$ and $x' \leq y_1$, then the $x$- and $x'$-threshold words are 0 and 1, respectively.

We also use the following very interesting fact (Proposition 4.2 on p. 28 of [5]).

Proposition 3.

Suppose $0w1$ is a mechanical word. Then $w$ is a palindrome.

$M(w)$ and Prefix Sums $S(w)$

Definition.

Assume that $a, b \in \mathbb{R}_+$ and $a < b$. Consider the matrices
$$F := \begin{pmatrix} 1 & 1 \\ a & a + 1 \end{pmatrix}, \quad G := \begin{pmatrix} 1 & 1 \\ b & b + 1 \end{pmatrix} \quad \text{and} \quad K := \begin{pmatrix} -1 & -1 \\ 0 & 1 \end{pmatrix},$$
so that the Möbius transformations $\mu_F, \mu_G$ are the functions $\phi_0, \phi_1$ of (1) and $GF - FG = (b - a) K$. Given any word $w \in \{0, 1\}^*$, we define the matrix product $M(w)$ as
$$M(w) := M(w_{|w|}) \cdots M(w_1), \quad \text{where } M(\epsilon) := I, \ M(0) := F \text{ and } M(1) := G,$$
where $I \in \mathbb{R}^{2 \times 2}$ is the identity, and the prefix sum $S(w)$ as the matrix polynomial
$$S(w) := \sum_{k=1}^{|w|} M(w_{1:k}), \quad \text{where } S(\epsilon) := 0 \text{ (the all-zero matrix)}. \tag{7}$$
For any $A \in \mathbb{R}^{2 \times 2}$, let $\operatorname{tr}(A)$ be the trace of $A$, let $A_{ij} = [A]_{ij}$ be the entries of $A$ and let $A \geq 0$ mean that all entries of $A$ are non-negative.

Remark.

Clearly, $\det(F) = \det(G) = 1$, so that $\det(M(w)) = 1$ for any word $w$. Also, $S(w)$ corresponds to the partial sums of the linear-system orbits, as hinted in the previous section. The following proposition captures the role of palindromes (proof in the supplementary material).

Proposition 4.

Suppose $w$ is a word, $p$ is a palindrome and $n \in \mathbb{Z}_+$. Then
1. $M(p) = \begin{pmatrix} \frac{fh+1}{h+f} & f \\ h - \frac{fh+1}{h+f} & h \end{pmatrix}$ for some $f, h \in \mathbb{R}$,
2. $\operatorname{tr}(M(10p)) = \operatorname{tr}(M(01p))$,
3. if $u \in \{p(10p)^n, (10p)^n\}$ then $M(u) - M(\tilde{u}) = \lambda K$ for some $\lambda \in \mathbb{R}_-$,
4. if $w$ is a prefix of $p$ then $[M(p(10p)^n w)]_{22} \leq [M(p(01p)^n w)]_{22}$,
5. $[M((10p)^n w)]_{21} \geq [M((01p)^n w)]_{21}$,
6. $[M((10p)^n 1)]_{21} \geq [M((01p)^n 0)]_{21}$.

We now demonstrate a surprisingly simple relation between $S(w)$ and $M(w)$.

Proposition 5.

Suppose $w$ is a palindrome. Then
$$[S(w)]_{21} = [M(w)]_{22} - 1 \quad \text{and} \quad [S(w)]_{22} = [M(w)]_{12} + [S(w)]_{21}. \tag{8}$$
Furthermore, if $\Delta_k := [S(10w) M(w(10w)^k) - S(01w) M(w(01w)^k)]_{22}$, then
$$\Delta_k = 0 \quad \text{for all } k \in \mathbb{Z}_+. \tag{9}$$

Proof.

Let us write $M := M(w)$ and $S := S(w)$. We prove (8) by induction on $|w|$. In the base case $w \in \{\epsilon, 0, 1\}$. For $w = \epsilon$, $[M]_{22} - 1 = 0 = [S]_{21}$ and $[M]_{12} + [S]_{21} = 0 = [S]_{22}$. For $w \in \{0, 1\}$, $[M]_{22} - 1 = c = [S]_{21}$ and $[M]_{12} + [S]_{21} = 1 + c = [S]_{22}$ for some $c \in \{a, b\}$. For the inductive step, in accordance with Claim 1 of Proposition 19, assume $w \in \{1v1, 0v0\}$ for some word $v$ satisfying
$$M(v) = \begin{pmatrix} \frac{fh+1}{h+f} & f \\ h - \frac{fh+1}{h+f} & h \end{pmatrix}, \qquad S(v) = \begin{pmatrix} c & d \\ h - 1 & f + h - 1 \end{pmatrix}$$
for some $c, d, f, h \in \mathbb{R}$. For $w = 1v1$,
$$M' := M(1v1) = G M(v) G \quad \text{and} \quad S' := S(1v1) = G M(v) G + S(v) G + G.$$
Calculating the corresponding matrix products and sums gives $[S']_{21} = [M']_{22} - 1$ and $[S']_{22} - [S']_{21} = bh + 2h + bf + f = [M']_{12}$, as claimed. For $w = 0v0$ the same calculation applies with $G$ replaced by $F = G|_{b=a}$. This completes the proof of (8).

Furthermore Part.

Let $A := S(w) F G + F G + G$ and $B := S(w) G F + G F + F$, so that $A = S(10w)$ and $B = S(01w)$. Then
$$\Delta_k = [(A (M(w) F G)^k - B (M(w) G F)^k) M(w)]_{22} \tag{10}$$
by definition of $S(\cdot)$. By Claim 1 of Proposition 19 and (8) we know that
$$M(w) = \begin{pmatrix} \frac{fh+1}{h+f} & f \\ h - \frac{fh+1}{h+f} & h \end{pmatrix}, \qquad S(w) = \begin{pmatrix} c & d \\ h - 1 & f + h - 1 \end{pmatrix}$$
for some $c, d, f, h \in \mathbb{R}$. Substituting these expressions and the definitions of $F, G$ into the definitions of $A, B$ and then into (10) for $k \in \{0, 1\}$ directly gives $\Delta_0 = \Delta_1 = 0$ (although this calculation is long).

Now consider the case $k \geq 2$. Claim 2 of Proposition 19 says $\operatorname{tr}(M(10w)) = \operatorname{tr}(M(01w))$ and clearly $\det(M(10w)) = \det(M(01w)) = 1$. Thus we can diagonalise as $M(w) F G =: U D U^{-1}$ and $M(w) G F =: V D V^{-1}$ with $D := \operatorname{diag}(\lambda, 1/\lambda)$ for some $\lambda \geq 1$. Hence
$$\Delta_k = [A U D^k U^{-1} M(w) - B V D^k V^{-1} M(w)]_{22} =: \gamma_1 \lambda^k + \gamma_2 \lambda^{-k}.$$
So, if $\lambda = 1$ then $\Delta_k = \gamma_1 + \gamma_2 = \Delta_0$, and we already showed that $\Delta_0 = 0$. Otherwise $\lambda \neq 1$, so $\Delta_0 = \Delta_1 = 0$ implies $\gamma_1 + \gamma_2 = \gamma_1 \lambda + \gamma_2 \lambda^{-1} = 0$, which gives $\gamma_1 = \gamma_2 = 0$. Thus for any $k \in \mathbb{Z}_+$ we have $\Delta_k = \gamma_1 \lambda^k + \gamma_2 \lambda^{-k} = 0$.

Majorisation

The following is a straightforward consequence of results in [14], proved in the supplementary material. We emphasise that the notation $\prec^w$ has nothing to do with the notion of $w$ as a word.

Proposition 6.

Suppose $x, y \in \mathbb{R}^m_+$ and $f : \mathbb{R} \to \mathbb{R}$ is a symmetric function that is convex and decreasing on $\mathbb{R}_+$. Then
$$x \prec^w y \text{ and } \beta \in [0, 1] \implies \sum_{i=1}^{m} \beta^i f(x_{(i)}) \leq \sum_{i=1}^{m} \beta^i f(y_{(i)}).$$
For any $x \in \mathbb{R}$ and any fixed word $w$, define the sequences, for $n \in \mathbb{Z}_+$ and $k = 1, \ldots, m$,
$$x_{nm+k}(x) := [M((10w)^n (10w)_{1:k})\, v(x)]_2, \qquad \sigma^{(n)}_x := (x_{nm+1}(x), \ldots, x_{nm+m}(x)),$$
$$y_{nm+k}(x) := [M((01w)^n (01w)_{1:k})\, v(x)]_2, \qquad \sigma^{(n)}_y := (y_{nm+1}(x), \ldots, y_{nm+m}(x)), \tag{11}$$
where $m := |10w|$ and $v(x) := (x, 1)^T$.

Proposition 7.

Suppose $w$ is a palindrome and $x \geq \phi_w(0)$. Then $\sigma^{(n)}_x$ and $\sigma^{(n)}_y$ are ascending sequences on $\mathbb{R}_+$ and $\sigma^{(n)}_x \prec^w \sigma^{(n)}_y$ for any $n \in \mathbb{Z}_+$.

Proof. Clearly $\phi_w(0) \geq 0$, so $x \geq 0$ and $v(x) \geq 0$. So for any word $u$ and letter $c \in \{0, 1\}$ we have $M(uc) v(x) = M(c) M(u) v(x) \geq M(u) v(x) \geq 0$, as $M(c) \geq I$. Thus $x_{k+1}(x) \geq x_k(x) \geq 0$ and $y_{k+1}(x) \geq y_k(x) \geq 0$. In conclusion, $\sigma^{(n)}_x$ and $\sigma^{(n)}_y$ are ascending sequences on $\mathbb{R}_+$.

Now $\phi_w(0) = [M(w)]_{12}/[M(w)]_{22}$. Thus $[A v(\phi_w(0))]_2 = [A M(w)]_{22}/[M(w)]_{22}$ for any $A \in \mathbb{R}^{2 \times 2}$. So
$$x_{nm+k}(\phi_w(0)) - y_{nm+k}(\phi_w(0)) = \frac{1}{[M(w)]_{22}} \left[ \left( M((10w)^n (10w)_{1:k}) - M((01w)^n (01w)_{1:k}) \right) M(w) \right]_{22} \leq 0$$
for $k = 2, \ldots, m$ by Claim 4 of Proposition 19. So all but the first term of the sum $T_m(\phi_w(0))$ is non-positive, where
$$T_j(x) := \sum_{k=1}^{j} \left( x_{nm+k}(x) - y_{nm+k}(x) \right).$$
Thus $T_1(\phi_w(0)) \geq T_2(\phi_w(0)) \geq \cdots \geq T_m(\phi_w(0))$. But
$$T_m(\phi_w(0)) = \frac{1}{[M(w)]_{22}} \sum_{k=1}^{m} \left[ \left( M((10w)^n (10w)_{1:k}) - M((01w)^n (01w)_{1:k}) \right) M(w) \right]_{22} = \frac{1}{[M(w)]_{22}} \left[ S(10w) M(w(10w)^n) - S(01w) M(w(01w)^n) \right]_{22} = 0,$$
where the last step follows from (9). So $T_j(\phi_w(0)) \geq 0$ for $j = 1, \ldots, m$. Yet Claims 5 and 6 of Proposition 19 give
$$\frac{d}{dx} T_j(x) = \sum_{k=1}^{j} \left[ M((10w)^n (10w)_{1:k}) - M((01w)^n (01w)_{1:k}) \right]_{21} \geq 0.$$
So for $x \geq \phi_w(0)$ we have $T_j(x) \geq 0$ for $j = 1, \ldots, m$, which means that $\sigma^{(n)}_x \prec^w \sigma^{(n)}_y$.

Theorem 1.

The index $\lambda^W(x)$ of (6) is continuous and non-decreasing for $x \in \mathbb{R}_+$.

Proof. As the weight $w$ is non-negative and the cost $h$ is a constant, we only need to prove the result for $\lambda(x) := \lambda^W(x)|_{w=1, h=0}$, and we can use $w$ to denote a word. By Proposition 2, $x \in [y_{01w}, y_{10w}]$ for some mechanical word $0w1$. (Cases $x \notin (y_1, y_0)$ are clarified in the supplementary material.) Let us show that the hypotheses of Proposition 7 are satisfied by $w$ and $x$. Firstly, $w$ is a palindrome by Proposition 3. Secondly, $\phi_{w01}(0) \geq 0$, and since $\phi_w(\cdot)$ is monotonically increasing, it follows that $\phi_w \circ \phi_{w01}(0) \geq \phi_w(0)$. Equivalently, $\phi_{01w} \circ \phi_w(0) \geq \phi_w(0)$, so that $\phi_w(0) \leq y_{01w}$ by Proposition 1. Hence $x \geq y_{01w} \geq \phi_w(0)$.

Thus Proposition 7 applies, showing that the sequences $\sigma^{(n)}_x$ and $\sigma^{(n)}_y$, with elements $x_{nm+k}(x)$ and $y_{nm+k}(x)$ as defined in (11), are non-decreasing sequences on $\mathbb{R}_+$ with $\sigma^{(n)}_x \prec^w \sigma^{(n)}_y$. Also, $1/x^2$ is a symmetric function that is convex and decreasing on $\mathbb{R}_+$. Therefore Proposition 6 applies, giving
$$\sum_{k=1}^{m} \left( \frac{\beta^{nm+k-1}}{(y_{nm+k}(x))^2} - \frac{\beta^{nm+k-1}}{(x_{nm+k}(x))^2} \right) \geq 0 \quad \text{for all } n \in \mathbb{Z}_+, \text{ where } m := |10w|. \tag{12}$$
Also, Proposition 2 shows that the $x$-threshold orbits are $(\phi_{u_1}(x), \ldots, \phi_{u_{1:k}}(x), \ldots)$ and $(\phi_{l_1}(x), \ldots, \phi_{l_{1:k}}(x), \ldots)$, where $u := (01w)^\omega$ and $l := (10w)^\omega$. So the denominator of (6) is
$$\sum_{k=0}^{\infty} \beta^k \left( \mathbb{1}\{l_{k+1} = 1\} - \mathbb{1}\{u_{k+1} = 1\} \right) = \sum_{k=0}^{\infty} \beta^{mk} (1 - \beta) \implies \lambda(x) = \frac{1 - \beta^m}{1 - \beta} \sum_{k=1}^{\infty} \beta^{k-1} \left( \phi_{u_{1:k}}(x) - \phi_{l_{1:k}}(x) \right).$$
Note that $\frac{d}{dx} \frac{ex + f}{gx + h} = \frac{1}{(gx + h)^2}$ for any $eh - fg = 1$, and that $\phi_{u_{1:k}}$ and $\phi_{l_{1:k}}$ have denominators $y_k(x)$ and $x_k(x)$ respectively. Then (12) gives
$$\frac{d\lambda(x)}{dx} = \frac{1 - \beta^m}{1 - \beta} \sum_{n=0}^{\infty} \sum_{k=1}^{m} \left( \frac{\beta^{nm+k-1}}{(y_{nm+k}(x))^2} - \frac{\beta^{nm+k-1}}{(x_{nm+k}(x))^2} \right) \geq 0.$$
But $\lambda(x)$ is continuous for $x \in \mathbb{R}_+$ (as shown in the supplementary material). Therefore we conclude that $\lambda(x)$ is non-decreasing for $x \in \mathbb{R}_+$.

One might attempt to prove that assumption A1 holds using general results about monotone optimal policies for two-action MDPs based on submodularity [2] or multimodularity [1]. However, we find counter-examples to the required submodularity condition.
Rather, we are optimistic that the ideas of this paper themselves offer an alternative approach to proving A1. It would then be natural to extend our results to settings where the underlying state evolves as $Z_{t+1} \mid H_t \sim N(m Z_t, 1)$ for some multiplier $m \neq 1$, and to cost functions other than the variance. Finally, the question of the indexability of the discrete-time Kalman filter in multiple dimensions remains open.

References

[1] E. Altman, B. Gaujal, and A. Hordijk. Multimodularity, convexity, and optimization properties. Mathematics of Operations Research, 25(2):324–347, 2000.
[2] E. Altman and S. Stidham Jr. Optimality of monotonic policies for two-action Markovian decision processes, with applications to control of queues with delayed information. Queueing Systems, 21(3-4):267–291, 1995.
[3] M. Araya, O. Buffet, V. Thomas, and F. Charpillet. A POMDP extension with belief-dependent rewards. In Neural Information Processing Systems, pages 64–72, 2010.
[4] A. Badanidiyuru, B. Mirzasoleiman, A. Karbasi, and A. Krause. Streaming submodular maximization: Massive data summarization on the fly. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 671–680, 2014.
[5] J. Berstel, A. Lauve, C. Reutenauer, and F. Saliola. Combinatorics on Words: Christoffel Words and Repetitions in Words. CRM Monograph Series, 2008.
[6] S. Bubeck and N. Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Foundations and Trends in Machine Learning, Vol. 5. NOW, 2012.
[7] Y. Chen, H. Shioi, C. Montesinos, L. P. Koh, S. Wich, and A. Krause. Active detection via adaptive submodularity. In Proceedings of The 31st International Conference on Machine Learning, pages 55–63, 2014.
[8] J. Gittins, K. Glazebrook, and R. Weber. Multi-armed bandit allocation indices. John Wiley & Sons, 2011.
[9] R. Graham, D. Knuth, and O. Patashnik. Concrete Mathematics: A Foundation for Computer Science. Addison-Wesley, 1994.
[10] S. Guha, K. Munagala, and P. Shi. Approximation algorithms for restless bandit problems. Journal of the ACM, 58(1):3, 2010.
[11] B. La Scala and B. Moran. Optimal target tracking with restless bandits. Digital Signal Processing, 16(5):479–487, 2006.
[12] J. Le Ny, E. Feron, and M. Dahleh. Scheduling continuous-time Kalman filters. IEEE Trans. Automatic Control, 56(6):1381–1394, 2011.
[13] M. Lothaire. Algebraic combinatorics on words. Cambridge University Press, 2002.
[14] A. Marshall, I. Olkin, and B. Arnold. Inequalities: Theory of majorization and its applications. Springer Science & Business Media, 2010.
[15] L. Meier, J. Peschon, and R. Dressler. Optimal control of measurement subsystems. IEEE Trans. Automatic Control, 12(5):528–536, 1967.
[16] J. Niño-Mora and S. Villar. Multitarget tracking via restless bandit marginal productivity indices and Kalman filter in discrete time. In Proceedings of the 48th IEEE Conference on Decision and Control, pages 2905–2910, 2009.
[17] R. Ortner, D. Ryabko, P. Auer, and R. Munos. Regret bounds for restless Markov bandits. In Algorithmic Learning Theory, pages 214–228. Springer, 2012.
[18] B. Rajpathak, H. Pillai, and S. Bandyopadhyay. Analysis of stable periodic orbits in the one dimensional linear piecewise-smooth discontinuous map. Chaos, 22(3):033126, 2012.
[19] T. Thiele. Sur la compensation de quelques erreurs quasi-systématiques par la méthode des moindres carrés. C. A. Reitzel, 1880.
[20] I. Verloop. Asymptotic optimal control of multi-class restless bandits. CNRS Technical Report, hal-00743781, 2014.
[21] S. Villar. Restless bandit index policies for dynamic sensor scheduling optimization. PhD thesis, Statistics Department, Universidad Carlos III de Madrid, 2012.
[22] E. Vul, G. Alvarez, J. B. Tenenbaum, and M. J. Black. Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model. In Neural Information Processing Systems, pages 1955–1963, 2009.
[23] R. R. Weber and G. Weiss. On an index policy for restless bandits. Journal of Applied Probability, pages 637–648, 1990.
[24] P. Whittle. Restless bandits: Activity allocation in a changing world. Journal of Applied Probability, pages 287–298, 1988.

Supplementary Material: Introduction

The results used but not proved in the main paper are given here as:
• Proposition 9, which was used to show that $\phi_w(0) \leq x$,
• Proposition 16, for the range of $x$ giving a specific mechanical word,
• Proposition 17, showing the index is continuous for $x \in \mathbb{R}_+$,
• Proposition 19, showing the properties of $M(p)$ when $p$ is a palindrome, and
• Proposition 20, for weak supermajorisation with $\beta \neq 1$.
A clarification of the extreme cases of Theorem 1 of the main paper is presented in the final section.

$x$-Threshold Policies to Mechanical Words

Some concepts relating to mechanical words appeared as early as 1771 in Jean Bernoulli's study of continued fractions (Berstel et al., 2008). The term "mechanical sequences" appears in the work of Morse and Hedlund (Am. J. Math., Vol. 62, No. 1, 1940, pp. 1-42), who had just introduced the term "symbolic dynamics". Morse and Hedlund studied the concept from the perspective of sequences of the form $\lfloor c + k\beta \rfloor$ for $c, \beta \in \mathbb{R}$ and $k \in \mathbb{Z}$. They also studied the concept from the perspective of differential equations, motivating the term "Sturmian sequences". Since that time there has been tremendous progress in the study of such sequences from the perspective of Combinatorics on Words (Lothaire, 2001). However, the recent (and highly approachable) paper of Rajpathak, Pillai and Bandyopadhyay (Chaos, Vol. 22, 2012) on the piecewise-linear map-with-a-gap discovers such sequences without recognising them as mechanical sequences. Proposition 16 of this section is a substantial generalisation of that result, and we could not find this proposition explicitly stated in the literature. Our result is not surprising if one has the intuition that there is a topological conjugacy between the maps of this section and the piecewise-linear map-with-a-gap. However, it might be difficult to explicitly identify the appropriate topological conjugacy and thereby prove our result for all cases considered here.
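The floor-function description of mechanical sequences mentioned above can be compared directly with the Christoffel tree of the main paper. The Python sketch below is an illustration, not part of the original supplement. It generates the words $uv$ of the tree, rebuilds each one from its letter counts with the formula $w_n = \lfloor (n+1)a \rfloor - \lfloor na \rfloor$, $a = |0w1|_1/|0w1|$, using exact rational arithmetic to avoid rounding in the floors, and checks that the ratios $|uv|_1/|uv|_0$ are all distinct, as expected for nodes of the Stern-Brocot tree. The function names are hypothetical.

```python
from fractions import Fraction
from math import floor, gcd


def christoffel_tree_words(depth):
    """Words uv for the nodes (u, v) in the top `depth` levels of the Christoffel tree."""
    level, words = [("0", "1")], []
    for _ in range(depth):
        words += [u + v for u, v in level]
        level = [child for u, v in level for child in ((u, u + v), (u + v, v))]
    return words


def christoffel_from_counts(zeros, ones):
    """Christoffel word 0w1 with the given coprime letter counts, where
    w_n = floor((n + 1) a) - floor(n a) for n = 1, ..., |w| and a = ones / (zeros + ones)."""
    if gcd(zeros, ones) != 1:
        raise ValueError("letter counts must be relatively prime")
    length = zeros + ones
    a = Fraction(ones, length)
    w = "".join(str(floor((n + 1) * a) - floor(n * a)) for n in range(1, length - 1))
    return "0" + w + "1"


words = christoffel_tree_words(5)
rebuilt = [christoffel_from_counts(t.count("0"), t.count("1")) for t in words]
ratios = {Fraction(t.count("1"), t.count("0")) for t in words}
print(rebuilt == words, len(ratios) == len(words))
```

Both printed checks should be `True`: each tree word is recovered from its letter counts alone, and no ratio repeats.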
Let π denote a word consisting of a string of 0s and 1s in which the k th letter is π k and letters i, i + 1 , . . . , j are π i : j . Let | π | be the length of π and | π | w for a word w be the number oftimes that word w appears in π . Let (cid:15) denote the empty word and π ω denote the inﬁnite wordconstructed by repeatedly concatenating π .Consider two functions φ : I → I and φ : I → I where I is an interval of R . We deﬁnethe transformation φ π : I → I for any word π by the composition φ π ( x ) := φ π | π | ◦ · · · ◦ φ π ◦ φ π ( x ) . Let y π ∈ I be the ﬁxed point of φ π , so φ π ( y π ) = y π , assuming a unique ﬁxed point on I exists.Given x ∈ I , we call the sequence ( x k : k ≥

1) the x -threshold orbit for φ , φ if x = φ ( x ) , x k +1 = (cid:40) φ ( x k ) if x k ≥ xφ ( x k ) if x k < x for k ≥ . We call π the x -threshold word for φ , φ if it is the shortest word such that x k +1 = φ ( π ω ) k ( x k )for all k ≥

1. We shall just write x -threshold orbit and x -threshold word where φ , φ are obviousfrom the context. 12or p ≥

1, let L p , R p be the morphisms (substitutions) L p : (cid:40) → p +1 → p R p : (cid:40) → p → p +1 . We say π is a valid word if π ∈ { , } or π ∈ { L p ( w ) , R p ( w ) : p ≥ } for some valid word w . Remark.

The morphisms L p , R p generate the Christoﬀel tree so valid words are mechanicalwords . To see this, note that the Christoﬀel tree is generated by the following morphisms(Berstel et al , 2008, p. 37) G : (cid:40) → →

01 ˜ D : (cid:40) → → . We may translate (from English to French) as L p = G p ◦ ˜ D and R p = ˜ D p ◦ G so any compositionof L p and R p can be written as a composition of G and ˜ D . Likewise, any composition of G and˜ D can be written as a composition of L p and R p . Speciﬁcally if p k , q k , p k +1 ≥ · · · ◦ G p k − ◦ ˜ D q k ◦ G p k +1 ◦ ˜ D ◦ · · · = · · · ◦ ( G p k − ◦ ˜ D ) ◦ ( ˜ D q k − ◦ G ) ◦ ( G p k +1 − ◦ ˜ D ) ◦ · · · = · · · ◦ L p k − ◦ R q k − ◦ L p k +1 − ◦ · · · whereas if q k = 1 we have · · · ◦ G p k − ◦ ˜ D ◦ G p k +1 ◦ ˜ D ◦ · · · = · · · ◦ ( G p k − ◦ ˜ D ) ◦ ( G p k +1 ◦ ˜ D ) ◦ · · · = · · · ◦ L p k − ◦ L p k +1 ◦ · · · . A symmetric argument holds if p k = 1 or p k +1 = 1. Throughout, we make the following assumption about φ , φ . The existence of ﬁxed points y , y is addressed immediately thereafter. Assumption A2.

Functions φ : I → I , φ : I → I , where I is an interval of R , areincreasing and non-expansive. Equivalently, for all x, y ∈ I : x < y and for k ∈ { , } we have φ k ( x ) < φ k ( y ) (cid:124) (cid:123)(cid:122) (cid:125) increasing and φ k ( y ) − φ k ( x ) < y − x (cid:124) (cid:123)(cid:122) (cid:125) non-expansive . Furthermore, the ﬁxed points y , y of φ , φ satisfy y < y . Proposition 8.

Suppose A2 holds, that x ∈ I and that w is any non-empty word. Then φ w ( x ) is increasing and non-expansive. Further, the ﬁxed point y w exists and is unique.Proof. First we show that φ w ( x ) is increasing, by induction. In the base case, | w | = 1 and theclaim follows from A2. For the inductive step assume φ u ( x ) is increasing, where w = au forsome a ∈ { , } and word u . Then for any x, y ∈ I : x < y , φ w ( y ) = φ u ( φ a ( y )) > φ u ( φ a ( x )) as φ a ( y ) > φ a ( x ) and φ u is increasing= φ w ( x ) . Therefore φ w is increasing. 13ow we show that φ w ( x ) is non-expansive, by induction. If | w | = 1 then this follows from A2.Else, say φ u ( x ) is non-expansive where w = ua and a ∈ { , } . Then for any x, y ∈ I : x < y , φ w ( y ) − φ w ( x ) = φ a ( φ u ( y )) − φ a ( φ u ( x )) < φ u ( y ) − φ u ( x ) as φ u ( y ) > φ u ( x ) and φ a is non-expansive < y − x as φ u is non-expansive.Therefore φ w is non-expansive.Let ψ ( x ) := max { φ ( x ) , φ ( x ) } . As φ is non-expansive we have y = φ ( y ) > φ ( y ) + y − y which rearranges to give φ ( y ) < y , so that ψ ( y ) = y . Also ψ is increasing as φ , φ areincreasing, so φ w ( y ) ≤ ψ ( | w | ) ( y ) = y .We now prove that y w exists. The argument of the previous paragraph shows that g ( x ) := x − φ w ( x ) satisﬁes g ( y ) ≥

0. A symmetric argument leads to the conclusion that g ( y ) ≤ g ( x ) is a continuous function, so by the intermediate value theorem, there is some y ∈ [ y , y ] for which g ( y ) = 0. Equivalently y = φ w ( y ). Therefore a ﬁxed point y w exists.To show that the ﬁxed point is unique, suppose both y and z are ﬁxed points with y > z .As φ w is non-expansive we have φ w ( y ) − φ w ( z ) y − z <

1. Yet, as φ w ( y ) = y, φ w ( z ) = z we have φ w ( y ) − φ w ( z ) y − z = 1 . This is a contradiction. Therefore the ﬁxed point is unique.Given a word w , the next proposition shows when the transformation φ w increases or de-creases its argument and what might be deduced from such an increase or decrease. Proposition 9.

Suppose A2 holds, x ∈ I and w is any non-empty word. Then x < φ w ( x ) ⇔ φ w ( x ) < y w ⇔ x < y w and x > φ w ( x ) ⇔ φ w ( x ) > y w ⇔ x > y w . Proof.

We use Proposition 8 throughout the argument without further mention.Say x < y w . As φ w is increasing, φ w ( x ) < φ w ( y w ) = y w where the equality is the deﬁnition of y w . Also, as φ w is non-expansive, y w = φ w ( y w ) < φ w ( x ) + y w − x which rearranges to give x < φ w ( x ).Now say x > y w . As above, we then have φ w ( x ) > φ w ( y w ) = y w and y w = φ w ( y w ) > φ w ( x ) + y w − x so that x > φ w ( x ).The contrapositive of x > y w ⇒ φ w ( x ) > y w is φ w ( x ) ≤ y w ⇒ x ≤ y w . But if φ w ( x ) (cid:54) = y w then x (cid:54) = y w as φ w is increasing and therefore injective. Thus φ w ( x ) < y w ⇒ x < y w .The contrapositive of x > y w ⇒ x > φ w ( x ) is x ≤ φ w ( x ) ⇒ x ≤ y w . But if x (cid:54) = φ w ( x ) then x (cid:54) = y w as y w is a ﬁxed point. So we can conclude that x < φ w ( x ) ⇒ x < y w .By symmetry, φ w ( x ) > y w ⇒ x > y w and x > φ w ( x ) ⇒ x > y w . This completes theproof. Proposition 10.

Suppose A2 holds and π is any word satisfying | π | | π | > . Then y < y π π =: s q for some q ≥

0. Thus y π = φ π ( y π ) ≤ φ s q ( y ) as φ π is increasing= φ s ( y ) as φ (cid:15) ( y ) = φ ( y ) = y > φ s ( y ) by Proposition 9 ≥ y by repeating the same argument if | s | > y π ≤ y . Therefore y π > y .A symmetrical argument leads to the conclusion that y π < y . Proposition 11.

If A2 holds and n ≥ then y n − < y n − < y n and y n < y n − y n − by Proposition 9 so that φ n − ( y n − ) = φ n − ( φ ( y n − )) > φ n − ( y n − ) = y n − so Proposition 9 gives y n − > y n − . Furthermore y n = φ ( y n − ) by deﬁnition of y π and y n − < y by Proposition 10 sothat φ ( y n − ) > y n − by Proposition 9. Thus y n > y n − .The proof that y n < y n − < y n − is symmetrical. Proposition 12.

Suppose A2 holds, M ∈ { L q , R q : q ≥ } and ˜ w is any word. Let ˜ y v be theﬁxed point of ˜ φ v := φ M ( v ) for any word v and let w M (0 ˜ w . Then ˜ x ∈ [˜ y

01 ˜ w , ˜ y

10 ˜ w ] ⇔ x := φ q (˜ x ) ∈ [ y w , y w ] . Proof.

Say M = L q . Note that φ q (˜ y

01 ˜ w ) = φ q ( y L q (01 ˜ w ) ) as ˜ y v is the ﬁxed point of ˜ φ v = φ L q ( v ) = φ q ( y q L q (1 ˜ w ) ) as L q (0) = 0 q y L q (1 ˜ w )0 q as φ a ( y ab ) = y ba for any words a, b = y w as 0 w L q (0 ˜ w

1) = 0 L q (1 ˜ w )0 q φ q (˜ y

10 ˜ w ) = φ q ( y L q (10 ˜ w ) )= φ q ( y q L q (0 ˜ w ) )= y L q (0 ˜ w )0 q = y w as 0 w L q (0 ˜ w )0 q . Proposition 8 shows that ˜ y

01 ˜ w , ˜ y

10 ˜ w exist. So the above equalities show that an inverse φ ( − q ( x ) exists for x ∈ { y w , y w } . As φ q is increasing and continuous, we have x ∈ [ y w , y w ] ⇔ ˜ x ∈ [ φ ( − q ( y w ) , φ ( − q (( y w )] = [˜ y

01 ˜ w , ˜ y

10 ˜ w ] . The proof for M = R q is symmetric. 15 .3 x -Threshold Words Proposition 13.

Suppose A2 holds, π is the x -threshold word and n ≥ . Then1. x ≤ y n − ⇒ | π ω | n = 0 x ≥ y n − ⇒ | π ω | n − = 0 x ≥ y n − ⇒ | π ω | n = 0 x ≤ y n − ⇒ | π ω | n − = 0 Proof. If x ≤ y then it follows from Proposition 9 that the x -threshold word is π = 1. Likewiseif x > y then the x -threshold word is π = 0. In these cases Claims 1 and 2 hold, so in thefollowing we assume that y < x ≤ y . Claim 1:

Let ( x k ) the x -threshold orbit. If ( π ω ) k : k + n − = 0 n − for some k , then x k + n − = φ n − ( x k ) by deﬁnition of ( x k ) ≥ φ n − ( φ ( x )) as x k ≥ φ ( x ) for all k ≥ φ n − is increasing= φ n − ( x ) ≥ x if x ≤ y n − by Proposition 9.But if x k + n − ≥ x then π k + n − = 1 by deﬁnition π . Therefore | π | n = 0 . Claim 2:

Let ( x k ) be the x -threshold orbit. If ( π ω ) k : k + n − = 10 n − for some k , then x k + n = φ n − ( x k ) < φ n − ( φ ( x )) as x k < φ ( x ) for all k ≥ φ n − is increasing= φ n − ( x ) ≤ x if x ≥ y n − by Proposition 9.But if x k + n < x then ( π ω ) k + n = 0. Therefore | π | n − = 0 . The proof of Claims 3 and 4 is symmetrical.

Proposition 14.

Suppose A2 holds and π is a x -threshold word. Then1. | π | > ⇒ π = L n ( w ) for some word w and some n ≥ | π | > ⇒ π = R n ( w ) for some word w and some n ≥ Proof.

First, applying Claims 1 and 3 of Proposition 13 with n = 2 we have | π | = 0 for x ≤ y and | π | = 0 for x ≥ y . Furthermore y = φ ( y ) > y by Proposition 9. Thus π cannot contain both 00 and 11.So, if | π | > π is of the form 0 q q . . . with strings of 0s separated by individual1s. Let q := min k q k . By Propositions 11 and 13, I q := ( y q − , y q ) is the only set of x valuesfor which π ω can contain 10 q

1. Thus π ω can only contain both 10 q q +1 F q := I q ∩ I q +1 = ( y q − , y q ) ∩ ( y q , y q +1 ) = ( y q , y q )noting Proposition 11 gives y q − < y q − < y q < y q . Finally, we have F q ∩ F q (cid:48) = ∅ for q (cid:54) = q (cid:48) , which also follows from Proposition 11. Thus if | π | > π is a concatenation of L q (0) and L q (1). Equivalently π = L q ( w ) for some word w and some q ≥ Proposition 15.

Suppose A2 holds and π is a x -threshold word. Then π is a valid word. roof. There are three cases to consider: either | π | = | π | = 0 or | π | > | π | > First case:

The only non-empty words not containing 00 or 11 are 0 , , (01) n , (10) n forsome n ≥

1. Now x -threshold words start with 0 unless x ≤ y (in which case π = 1) so π (cid:54) = (10) n . Further, the x -threshold word was deﬁned to be the shortest word such that suchthat x k +1 = A ( π ω ) k x k so this leaves us with the options 0 , ,

01. These are all valid words.

Second case: If π contains 00, we may write π = L q ( w ) for some word w , by Proposition 14.Now from point x k on the x -threshold orbit we have π k : k + q = 0 q +1 if and only if φ q ( x k ) < x which corresponds to x k < φ ( − q )0 ( x ) =: ˜ x . So the word w corresponds to a ˜ x -threshold orbit(˜ x k : k ≥

1) for ψ ( x ) := φ q +1 ( x ) , ψ ( x ) := φ q ( x ). To spell it out, we have˜ x = ψ (˜ x ) , ˜ x k +1 = ψ w k (˜ x k ) , w k = (cid:40) x k ≥ ˜ x x k < ˜ x for k ≥ y π as the ﬁxed point ˜ y π = ψ π (˜ y π ).Now ψ , ψ are non-negative, as φ , φ are non-negative. Also ψ , ψ are monotonicallyincreasing and non-expansive by Proposition 8. Further, φ q +1 ( y q ) = φ q ( φ ( y q )) > φ q ( y q ) = y q so that y q +1 > y q by Proposition 9. But by deﬁnition ˜ y = y q +1 and ˜ y = y q , so that˜ y < ˜ y . Therefore ψ , ψ satisfy A2. Third case:

We prove that π = R q ( w ) for some positive integer q and word w . We alsoshow that word w is a ˆ x -threshold word for a pair of functions (say) χ , χ which satisfy A2.The argument is symmetric to the second case, so it is omitted.In conclusion, either1. π ∈ { , , L (1) } which are valid words2. π = L q ( w ) where w is a ˜ x -threshold word for ψ , ψ which satisfy Propositions 8-14 andtherefore w satisﬁes this conclusion3. or π = R q ( w ) where w is a ˆ x -threshold word for χ , χ which satisfy Propositions 8-14and therefore w satisﬁes this conclusion.Thus π is a valid word. This completes the proof.The following proposition shows that all valid words are x -threshold words and tells usexplicitly which values of x produce a given valid word. It is one of the key results of the mainpaper. Proposition 16.

Suppose A2 is satisﬁed and w is any valid word. Then w is the x -threshold word ⇔ x ∈ [ y w , y w ] . Proof.

Let V := { L q (1) , R q (1) : q ≥ } , V n +1 := { L q ( v ) , R q ( v ) : v ∈ V n , q ≥ } . Note that V contains L q (0) = 0 q +1 L q +1 (1) and R q (0) = 01 q which for q ≥ R q − (1) and for q = 1 equals 01 = L (1). Thus ∪ ∞ n =1 V n is the set of all valid words of form 0 w H n : 0 w ∈ V n is the x -threshold word ⇔ x ∈ [ y w , y w ] Base case ( H ). Say 0 w q x -threshold word. Then x > φ (10 q ) n q − ( x ) for all n ≥ φ (010 q − ) n ( φ q − ( x )) ⇒ x ≥ lim n →∞ φ (010 q − ) n ( φ q − ( x )) = y q − . x -threshold word also gives x ≤ φ q ( x ). Therefore x ≥ y q by Proposi-tion 9. Thus if 0 q x -threshold word then x ∈ [ y w , y w ].Now say x ∈ [ y q − , y q ]. Proposition 10 gives y < x < y so that the x -threshold orbit( x k ) is contained in ( y , y ). So Proposition 9 shows that φ ( x k ) > x k and φ ( x k ) < x k for all k ≥

0. So to prove that the x -threshold word is 0 q φ (10 q ) n q − ( x ) < x and φ (10 q ) n ( x ) ≥ x for all n ≥

0. But if x ≥ y q − then for all n ≥ x ≥ φ (010 q − ) n ( x ) by Proposition 9 > φ (010 q − ) n ( φ q − ( x )) as y q − < y q − ≤ x by Claim 3 of Proposition 11= φ (10 q ) n q − ( x ) . Also if x ≤ y q then φ (10 q ) n ( x ) ≥ x for all n ≥ w q x ∈ [ y w , y w ] implies that 0 w x -threshold word.For 0 w q , the proof that π = 01 q ⇔ x ∈ [ y w , y w ] is symmetric, so it is omitted. Inductive Step.

Assume 0 ˜ w H n .Say 0 w L q (0 ˜ w k i := | L q (((0 ˜ w ω ) i − ) | + 1 so ( π ω ) k i is aligned with the start ofthe i th letter of (0 ˜ w ω . Let x k := φ ((10 w ) ω ) k ( x ) , ˜ x i := x k i , x = φ q (˜ x ) and let ˜ y v denote theﬁxed point of ˜ φ v := φ L q ( v ) for any word v . Then we have L q (0 ˜ w

1) is the x -threshold word for φ , φ ⇔ ((0 w ω ) k i : k i + q = 0 q +1 if and only if φ q ( x k i ) < x ⇔ ((0 ˜ w ω ) i = 0 if and only if ˜ x i < ˜ x ⇔ w x -threshold word for ˜ φ , ˜ φ ⇔ ˜ x ∈ [˜ y

01 ˜ w , ˜ y

10 ˜ w ] as 0 ˜ w H n ⇔ x ∈ [ y w , y w ] by Proposition 12Symmetrically we may conclude that π = 0 w R q (0 ˜ w ⇔ x ∈ [ y w , y w ]. Therefore H n +1 is true.This completes the proof. We showed that the Whittle index is increasing on the domain of each ﬁxed Christoﬀel word.However, we also need to show that the index is continuous as we move between words. Sohere we prove the following proposition.

Proposition 17.

Suppose λ ( · ) is as in the main paper. Then λ ( x ) is a continuous function of x ∈ R + . We use the following deﬁnitions.

Deﬁnition.

Let ˜ w be the reverse of word w , w ω be the word constructed by concatenating w inﬁnitely many times, | w | be the length of word w and | w | u be the number of times thatword u is a factor of w . Deﬁnition.

For a possibly-inﬁnite word w and numbers x ∈ R , β ∈ (0 ,

1) we have S ( a ω b, x ) = S ( a ω , x ) . (14)Further, if x a = φ a ( x a ) then the formula for the sum of a geometric progression gives S ( a ω , x a ) = S ( a, x a )1 − β | a | . (15) Deﬁnition.

Let X π be the range of x for which the x -threshold word is π .The following construction is closely related to the beautiful Christoﬀel tree (Berstel et al ,2008).

Deﬁnition.

Consider the mapping C which takes a sequence of words and returns asequence containing the original words mingled with the concatenation of neighbouring wordsas follows: C (( a, b, c, d, . . . , x, y, z )) := ( a, ab, b, bc, c, cd, d, . . . , x, xy, y, yz, z ) . Now consider the sequences t k := C ( k ) ((0 , k ≥

0. The ﬁrst few such sequences are t = (0 , t = (0 , , t = (0 , , , , t = (0 , , , , , , , , . Remark. If u ∈ t k then | u | ≥ k ≥

0. Now suppose u, v are adjacent in t k and wehave | uv | ≥ k + 2. Then t k +1 contains u, uv, v from which we can construct uuv and uvv . But | uuv | = | u | + | uv | ≥ k + 2 = k + 3 and | uvv | = | uv | + | v | ≥ k + 2 + 1 = k + 3. Thus, byinduction, we have shown that | uv | ≥ k + 2 for any adjacent pair u, v in t k and any k ≥

0. (16)

We gather the results needed to prove Proposition 17. Most of these results these relate to thenotion that if | x − y | is small and a, b are the x - and y -threshold words, then words a, b usuallyhave a long common preﬁx, although this is not always the case.The following simple result is repeatedly used in the other Lemmas of this subsection. Lemma 1.

Suppose (0 a , b is a standard pair. Then a b = b a .Proof. As (0 a , b

1) is a standard pair, 0 a b w a , b , w a, b, w are palindromes. Thus a b = w = ˜ w = ˜ b a = b a. If (0 a , b

1) is a standard pair, then the interval X b is immediately to the left of X a b ω .Since the words 0 b a b ω can diﬀer within the ﬁrst few letters, continuity of λ ( x ) at x = sup X b is not obvious. Similarly, X (0 a ω b is immediately to the left of X a . However,the factors 1 − β | (0 a ω b | and 1 − β | a | appearing in the deﬁnitions of the corresponding Whittleindices are diﬀerent for | a | < ∞ . Thus continuity of λ ( x ) at x = sup X a is not obvious. Thenext two Lemmas address these questions. 19 emma 2. Suppose (0 a , b is a standard pair and let x = φ b ( x ) . Then λ (0 b , x ) = λ (0 a b ω , x ) . Proof.

The right-hand side λ (0 a b ω , x ) involves the sum S (10 a b ω , x ) = S (10 b a (10 b ) ω , x ) by Lemma 1= S (10 b, x ) + β | b | S (01 a (10 b ) ω , φ b ( x )) by 13= S (10 b, x ) + β | b | S (01 a (10 b ) ω , x ) as x = φ b ( x )= (1 − β | b | ) S ((10 b ) ω , x ) + β | b | S (01 a (10 b ) ω , x ) by 15 . (17)Now we note that repeated application of Lemma 1 gives01 a (10 b ) ω = 01 a b (10 b ) ω = 01 b a (10 b ) ω = (01 b ) ω a. (18)Thus λ (0 a b ω , x ) = 1 − β | a b ω | − β ( S ((01 a b ω ) ω , x ) − S ((10 a b ω ) ω , x ))= S (01 a b ω , x ) − S (10 a b ω , x )1 − β by 14= 1 − β | b | − β ( S (01 a (10 b ) ω , x ) − S ((10 b ) ω , x )) by 17= 1 − β | b | − β ( S ((01 b ) ω , x ) − S ((10 b ) ω , x )) by 18= λ (0 b , x ) . This completes the proof.

Lemma 3.

Suppose (0 a , b is a standard pair and let x = φ a ( x ) . Then λ ((0 a ω b , x ) = λ (0 a , x ) . Proof.

The left-hand side λ ((0 a ω b , x ) involves the sum S (01( a ω b , x ) = S (01( a ω , x ) by 14= S (01 a, x ) + β | a | S ((10 a ) ω , φ a ( x )) by 13= S (01 a, x ) + β | a | S ((10 a ) ω , x ) as x = φ a ( x )= (1 − β | a | ) S ((01 a ) ω , x ) + β | a | S ((10 a ) ω , x ) by 15 . (19)Thus λ ((0 a ω b , x ) = 1 − β | (0 a ω b | − β ( S ((01( a ω b ω , x ) − S ((10( a ω b ω , x ))= 11 − β ( S (01( a ω b , x ) − S ((10 a ) ω , x )) by 14= 1 − β | a | − β ( S ((01 a ) ω , x ) − S ((10 a ) ω , x )) by 19= λ (0 a , x ) . This completes the proof. 20o demonstrate continuity at other points, we will need to rely on the fact that nearbywords often have a long common preﬁx as shown by the following two Lemmas.

Lemma 4.

Suppose (0 a , b is a subsequence of t k for some k ≥ . Then b a is a preﬁxof both (0 a ω and b (01 b ) ω .Proof. Let a = b · · · indicate that b is a preﬁx of word a and consider the statements A ( a, b ) : ( a ω = b · · · and B ( a, b ) : ( b ω = a · · · . It suﬃces to show that A ( a, b ) and B ( a, b ) are true for any adjacent words 0 a , b t k for k ≥

0. This is because A ( a, b ) ⇒ (0 a ω = 0 a a ω = 0 a b · · · = 0 b a · · · where the last equality follows from Lemma 1 and B ( a, b ) ⇒ b (01 b ) ω = 0 b b ω = 0 b a · · · which are the claims of the Lemma.We shall use induction. Take t = (0 , , , ,

01) as the base case. We must showthat A (0 , (cid:15) ) , B (0 , (cid:15) ) , A ( (cid:15), , B ( (cid:15),

1) are true. However these statements are respectively that(001) ω = (cid:15) · · · , (01) ω = 0 · · · , (10) ω = 1 · · · , (101) ω = (cid:15) · · · and are all true.Otherwise, say A ( a, b ) , B ( a, b ) are true for any adjacency 0 a , b t k . Let 0 a b c c = a b = b a using Lemma 1 again. Then the statements A ( a, c ) , B ( a, c ) , A ( c, b ) , B ( c, b ) are all true as( a ω = a a ω = a b · · · = c · · · by A ( a, b ) and as c = a b ( c ω = c · · · = a · · · as c = a b ( c ω = c · · · = b · · · as c = b a ( b ω = b b ω = b a · · · = c · · · by B ( a, b ) and as c = b a .Thus A ( a, b ) , B ( a, b ) are true for all adjacencies 0 a , b t k +1 . This completes the proof. Lemma 5.

Suppose a , b are adjacent in t k and that c lies strictly between them in t k (cid:48) for some < k < k (cid:48) . Then c b a · · · .Proof. The interval of t k (cid:48) between 0 a , b a , b t k (cid:48) − k was constructed from 0 ,

1. Thus 0 c a q b · · · for some positive integer q . Nowrecall that 0 b a = 0 a b by Lemma 1. Thus 0 c a q − a b · · · = (0 a q − b a · · · =0 b (01 a ) q · · · = 0 b a · · · as claimed.Although the existence of a long common preﬁx for nearby words suggests continuity, toprove anything we must bound the residual after removing the long common preﬁx. Thefollowing Lemma is one way to achieve this. Lemma 6.

The highest point on the orbits ( φ ((01 w ) ω ) k ( x ) : k ≥

0) and ( φ ((10 w ) ω ) k ( x ) : k ≥

0) is x + 1 since 0 w x -threshold word. The terms a k , b k of the discounted sums S ( u, φ s ( y )) =: ∞ (cid:88) k =0 β k a k and S ( u (cid:48) , φ s (cid:48) ( y )) =: ∞ (cid:88) k =0 β k b k are from the orbits ( φ ((01 w ) ω ) k ( y ) : k ≥

0) and ( φ ((10 w ) ω ) k ( y ) : k ≥

0) and φ u (cid:48)(cid:48) ( x ) ≥ φ u (cid:48)(cid:48) ( y )for any word u (cid:48)(cid:48) as x ≥ y . Therefore terms a k , b k , are also no higher than φ ( x ) ≤ x + 1. Furthermore, terms a k , b k are non-negative, so that | a k − b k | ≤ x + 1. Thus | S ( u, φ s ( y )) − S ( u (cid:48) , φ s (cid:48) ( y )) | ≤ (cid:80) ∞ k =0 β k | a k − b k | ≤ (cid:80) ∞ k =0 β k ( x + 1) = x +11 − β . λ ( π, x ) is continuous, a bound on its slope is helpful. Lemma 7.

Suppose x ≥ and that w is a valid word. Then | λ (cid:48) (0 w , x ) | ≤ − β ) .Proof. The deﬁnition of λ (0 w , x ) gives | λ (cid:48) (0 w , x ) | ≤ − β | w | − β ∞ (cid:88) k =0 β k (cid:12)(cid:12)(cid:12) φ (cid:48) ((01 w ) ω ) k ( x ) − φ (cid:48) ((10 w ) ω ) k ( x ) (cid:12)(cid:12)(cid:12) ≤ − β ∞ (cid:88) k =0 β k = 1(1 − β ) where the second inequality follows as 0 ≤ β | w | < ≤ φ (cid:48) u ( x ) ≤ u since0 ≤ φ (cid:48) ( x ) ≤ φ (cid:48) ( x ) ≤ φ , φ of the main paper. Lemma 8.

Suppose φ ( x ) and φ ( x ) are as in the main paper and x ∈ R + . Then φ ( x ) <φ ( x ) .Proof. The deﬁnitions of φ , φ give φ ( x ) − φ ( x ) =( b − a ) ( ab + b + a ) x + (2 ab + 3 b + 3 a + 2) x + ab + 2 b + 2 a + 3(( ab + b + a ) x + ab + b + 2 a + 1)(( ab + b + a ) x + ab + 2 b + a + 1)which is positive as b > a and x ≥ (cid:15), δ ) deﬁnition in which we will put δ = l k where l k is deﬁned in the following Lemma. Lemma 9.

For any (cid:15) > there is a k < ∞ such that < l k := inf {| X π | : π ∈ t k } < (cid:15). Proof.

Say 0 a , b t k . Then by construction of t k + i , the gap ( z b , z a ) contains2 i − t k + i \ t k . Each of these intervals is at most z a − z b i − in length. Thus lim k →∞ l k = 0. This demonstrates the existence of a k < ∞ such that l k < (cid:15) .To show that l k > k , we shall demonstrate that assuming l k = 0 leads to acontradiction. If l k = 0 then there is some word 0 w ∈ t k such that z w = z w =: x .Therefore φ w ( x ) = φ w ( x ). Now in R + , functions φ ( x ) , φ ( x ) have inverses, so φ − w ( x ) iswell-deﬁned. Therefore φ ( x ) = φ − w ◦ φ w ( x ) = φ − w ◦ φ w ( x ) = φ ( x )which contradicts Lemma 8 as x ≥ Proof.

We wish to show that for any (cid:15) >

0, there exists a δ > | x − y | < δ we have ∆ := | λ ( x ) − λ ( y ) | < (cid:15) . Without loss of generality we assume that x ≥ y .Speciﬁcally, we shall put δ = l k > l k is as deﬁned in Lemma 9 and k is any positiveinteger such that l k (1 − β ) < (cid:15) and such that 2 x +1(1 − β ) β k +1 < (cid:15) . The existence of such a k isguaranteed by Lemma 9 and because β ∈ (0 , a , b x - and y -threshold words. If these words are the same then∆ = | λ (0 a , x ) − λ (0 a , y ) | ≤ | y − x | sup z ∈ [ x,y ] | λ (cid:48) (0 a , z ) | ≤ | y − x | (1 − β ) ≤ l k (1 − β ) < (cid:15) | y − x | < δ = l k and thefourth from the deﬁnition of k . 22therwise 0 a (cid:54) = 0 b

1. In this case, let (0 e , b

1) be the standard pair for word 0 b

1, let a = φ a ( a ) and ¯ b = φ b (¯ b ). Noting that y ≤ ¯ b ≤ a ≤ x , our strategy is to write∆ = | ∆ + ∆ + ∆ + ∆ + ∆ + ∆ | ∆ := λ (0 b , y ) − λ (0 b , ¯ b )∆ := λ (0 b , ¯ b ) − λ (0 e b ω , ¯ b )∆ := λ (0 e b ω , ¯ b ) − λ ((0 a ω , ¯ b )∆ := λ ((0 a ω , ¯ b ) − λ ((0 a ω , a )∆ := λ ((0 a ω , a ) − λ (0 a , a )∆ := λ (0 a , a ) − λ (0 a , x ) . Lemma 7 and the choice of δ give | ∆ | + | ∆ | + | ∆ | ≤ ¯ b − y + a − ¯ b + x − a (1 − β ) < l k (1 − β ) ≤ (cid:15) = ∆ = 0 . (21)It remains to consider ∆ . It follows from the deﬁnition of l k , that for some adjacent words0 c , d t k : either 0 a c a c d e b ω is a word strictly between 0 c d

1. Thus by Lemma 5 wehave (0 a ω = 0 pu and 0 e b ω = 0 pv where p := d c and u, v are the appropriate suﬃxes.Therefore the deﬁnition of λ ( w, x ) gives | ∆ | = (cid:12)(cid:12) λ ((0 a ω , ¯ b ) − λ (0 d b ω , ¯ b ) (cid:12)(cid:12) = 11 − β (cid:12)(cid:12)(cid:12)(cid:12) S (01 p, ¯ b ) + β | p | S ( u, φ p (¯ b )) − S (10 p, ¯ b ) − β | p | S ( u, φ p (¯ b )) − S (01 p, ¯ b ) − β | p | S ( v, φ p (¯ b )) + S (10 p, ¯ b ) + β | p | S ( v, φ p (¯ b )) (cid:12)(cid:12)(cid:12)(cid:12) = β | p | − β (cid:12)(cid:12) S ( u, φ p (¯ b )) − S ( u, φ p (¯ b )) − S ( v, φ p (¯ b )) + S ( v, φ p (¯ b )) (cid:12)(cid:12) ≤ β | p | − β (cid:0)(cid:12)(cid:12) S ( u, φ p (¯ b )) − S ( u, φ p (¯ b )) (cid:12)(cid:12) + (cid:12)(cid:12) S ( v, φ p (¯ b )) − S ( v, φ p (¯ b )) (cid:12)(cid:12)(cid:1) ≤ β | p | − β (cid:18) a + 11 − β + ¯ b + 11 − β (cid:19) ≤ β k +1 (1 − β ) x + 1) < (cid:15) a ≤ ¯ b ≤ x and ﬁnally from the deﬁnition of k .Finally, coupling 20, 21 and 22 and using the triangle inequality gives∆ < (cid:15) (cid:15) (cid:15). This completes the proof. M ( w ) Recall the deﬁnitions about words from the main paper, particularly that ˜ w is the reverse of w .Also, recall the deﬁnitions of matrices F, G, K, M ( w ). The ﬁrst of the following propositions isused to prove the second. The second appears in the main paper.23 roposition 18. Suppose w, w (cid:48) are any words. Then1. det( M ( w )) = 1 , M ( ˜ w ) = KM ( w ) − K ,3. M ( w ) = (cid:18) e f eh − f h (cid:19) for some e, f, h ∈ R ,4. M ( w ) − M ( ˜ w ) = λK for some λ ∈ R ,5. [ M ( w w (cid:48) )] [ M ( w w (cid:48) )] ≥ [ M ( w w (cid:48) )] [ M ( w w (cid:48) )] ,6. [ M ( w )] ≥ [ M ( w )] .Proof. det( M ( w )) = (cid:81) | w | i =1 det( M ( w i )) = 1 as det( F ) = det( G ) = 1 gives Claim 1.Claim 2.

The deﬁnitions of

F, G, K give KF = F − K, KG = G − K . Thus KM ( w ) = M ( w | w | ) − · · · M ( w ) − K = M ( ˜ w ) − K . The result follows as K = I . Claim 3.

Put M ( w ) =: (cid:18) e fg h (cid:19) and solve det( M ( w )) = 1 = eg − hf for g . Claim 4.

Substituting Claim 2 and Claim 3 in Claim 4 gives M ( w ) − KM ( w ) − K = ( h − e − g ) K. Claim 5.

Put M := M ( w ) , N := M ( w (cid:48) ). We calculate[ N GF M ] [ N F GM ] − [ N GF M ] [ N F GM ] = ( b − a )( M M − M M )(( ab + b + a ) N + ( b + a + 2) N N + N ) ≥ b > a ≥

0, det( M ) = 1 and N ≥

0. The result follows as

N F GM ≥ N GF M ≥ Claim 6. If w = (cid:15) then [ M ( w )] − [ M ( w )] = 1 ≥

0. Otherwise we use induction on | w | toshow that M ( w ) v ≥ v := ( − , T . In the base case w ∈ { , } so M ( w ) v = (cid:18) c c (cid:19) (cid:18) − (cid:19) = (cid:18) (cid:19) ≥ c ∈ { a, b } . For the inductive step, assume w = { u, u } for some word u satisfying M ( u ) v ≥

0. Then M ( w ) v = (cid:18) c c (cid:19) M ( u ) v ≥ c ∈ { a, b } .As [ M ( w ) v ] = [ M ( w )] − [ M ( w )] , this completes the proof. Proposition 19.

Suppose w is a word, p is a palindrome and n ≥ Z + . Then1. M ( p ) = (cid:32) fh +1 h + f f h − h + f h (cid:33) for some f, h ∈ R ,2. tr ( M (10 p )) = tr ( M (01 p )) ,3. If u ∈ { p (10 p ) n , (10 p ) n } then M ( u ) − M (˜ u ) = λK for some λ ∈ R − ,4. If w is a preﬁx of p then [ M ( p (10 p ) n w )] ≤ [ M ( p (01 p ) n w )] ,5. [ M ((10 p ) n w )] ≥ [ M ((01 p ) n w )] ,6. [ M ((10 p ) n ≥ [ M ((01 p ) n . roof. In this proof, we refer to Claim k of Proposition 18 as P k . Claim 1.

P2 gives M ( p ) = KM ( p ) − K as p = ˜ p . But in the notation of P3, [ M ( p )] =[ KM ( p ) − K ] says e = h − ( eh − /f . Solve this for e and substitute in P3. Claim 2.

Noting that GF − F G = ( b − a ) K , the notation of Claim 1 givestr( M (01 p )) − tr( M (10 p )) = tr( M ( p )( GF − GF )) = ( b − a )tr (cid:32)(cid:32) fh +1 h + f f h − h + f h (cid:33) K (cid:33) = 0 . Claim 3.

Note we can move from u to ˜ u just by swapping some 10 for 01. So, repeated applica-tion of P5 gives the inequality [ M ( u )] [ M ( u )] ≤ [ M (˜ u )] [ M (˜ u )] . But the denominators of this inequality areequal (and non-negative) as P4 gives [ M ( u )] − [ M (˜ u )] = λ (cid:48) K = 0 for some λ (cid:48) ∈ R . Thusthis inequality reduces to [ M ( u )] ≤ [ M (˜ u )] . Yet P4 also gives [ M ( u ) − M (˜ u )] = λK which combined with the previous sentence says that λK ≤

0. As K = 1, this gives λ ∈ R − . Claim 4.

Let s be the corresponding suﬃx so p = ws and M ( p (10 p ) n w ) − M ( p (01 p ) n w ) = M ( s ) − ( M ( p (10 p ) n +1 ) − M ( p (01 p ) n +1 )) =: A. But Claim 3 with u = p (10 p ) n +1 gives[ A ] = λ [ M ( s ) − K ] (cid:124) (cid:123)(cid:122) (cid:125) for some λ ≤ = [ KM (˜ s )] (cid:124) (cid:123)(cid:122) (cid:125) by P2 = λ ([ M (˜ s )] − [ M (˜ s )] ) ≤ (cid:124)(cid:123)(cid:122)(cid:125) by P6. Claim 5. As M ( w ) ≥

0, Claim 3 with u = (10 p ) n

10 gives[ M ( w )( M ((10 p ) n − M ((01 p ) n = λ [ M ( w ) K ] = λ [ − M ( w )] ≥ . Claim 6.

Let E := (cid:18) (cid:19) . Then G − F = ( b − a ) E ≥

0, so that[ GM ((10 p ) n ) − F M ((01 p ) n )] = [( b − a ) EM ((10 p ) n ) + F M ((10 p ) n ) − F M ((01 p ) n )] ≥ [ M ((10 p ) n − M ((01 p ) n ≥

10 Majorisation

In the main paper, we used one result about majorisation which was similar-but-not-identicalto any results in Marshall, Olson and Arnold (2011). Let us prove that result.

Proposition 20.

Suppose x, y ∈ R m + and f : R → R is a symmetric function that is convexand decreasing on R + . Then x ≺ w y and β ∈ [0 , ⇒ (cid:80) mi =1 β i f ( x ( i ) ) ≥ (cid:80) mi =1 β i f ( y ( i ) ) .Proof. As the claim relates to x ( i ) and y ( i ) we assume that x i and y i are in ascending order.Marshall et al (3H2B, page 133) says that if g : A → R is a non-decreasing and convexfunction on A ⊆ R and ( u , . . . , u m ) is a non-increasing and non-negative sequence, then forall non-increasing sequences ( p , . . . , p m ) the function φ ( a ) := (cid:80) mi =1 u i g ( p i ) is Schur-convex.Indeed the function f is increasing and convex for p ∈ R − (such as p = − x and p = − y )and ( β, . . . , β m ) is a non-increasing and non-negative sequence for β ∈ [0 , p , . . . , p m ) on R m − the function ψ ( p ) := (cid:80) mi =1 β i f ( p i ) is Schur-convex.Recall ( ibid , page 12) that a ∈ R m is said to be weakly submajorised by b ∈ R m , written a ≺ w b if k (cid:88) i =1 a [ i ] ≤ k (cid:88) i =1 b [ i ] , k = 1 , . . . , m where a [ i ] denotes a in descending order25nd that x ≺ w y ⇔ − a ≺ w − b ( ibid , page 13).However ( ibid , 3A8, page 87) if φ ( p ) is a real function on A ⊂ R m which is non-decreasingin each argument p i and Schur-convex on A and p ≺ w q on A then φ ( p ) ≤ φ ( q ).Indeed, the function ψ ( p ) = (cid:80) mi =1 β i f ( p i ) is a real function on R m − which is non-decreasingin each argument and Schur-convex on R m − for all non-increasing sequences ( p , . . . , p m ). Fur-thermore, − y ≺ w − x as x ≺ w y . Therefore ψ ( y ) = ψ ( − y ) ≤ ψ ( − x ) = ψ ( x ) as claimed.

11 Clariﬁcation of Theorem 1 for ≤ x ≤ y or y ≤ x < ∞ Recall the following deﬁnitions and assumption from the main paper F := (cid:18) a a (cid:19) , G := (cid:18) b b (cid:19) , E := (cid:18) (cid:19) , v ( x ) := (cid:18) z (cid:19) , b > a ≥ . If 0 ≤ x ≤ y or y ≤ x < ∞ then the relevant linear systems, (9) in the main paper, are( M (1 k +1 ) − M (01 k )) v ( x ) = ( G − F ) G k v ( x ) = ( b − a ) EG k v ( x ) ≥ M (10 k ) − M (0 k +1 )) v ( x ) = ( G − F ) F k v ( x ) = ( b − a ) EF k v ( x ) ≥ (cid:41) for k ∈ Z + where both inequalities follow as E, F, G are all ≥

0, as b > a and as x ≥ min { y , y } ≥ . Therefore all cumulative sums of the above expressions are non-negative so the derivative of thenumerator of the Whittle index is non-negative by the same weak-supermajorisation argumentas in the main paper.Meanwhile, the denominator of the index in these cases is ∞ (cid:88) k =0 β k ((1 ω ) k +1 − (01 ω ) k +1 ) = β = ∞ (cid:88) k =0 β k ((10 ω ) k +1 − (0 ω ) k +1 )which is non-negative. Therefore the rest of the proof of Theorem 1 follows as in the mainpaper.In fact we could say that the majorisation point, which is φ w (0) for words 0 w − F v ( −

1) = Gv ( −

1) = v (0). Also, Ev ( −

1) = (0 , T . Thus for all k ∈ Z + , EG k v ( − ≥ EF k v ( − ≥ Ev ( − − (cid:15) ) < (cid:15) > ..