On the Expressive Power of Higher-Order Pushdown Systems
Logical Methods in Computer Science, Volume 16, Issue 3, 2020, pp. 11:1–11:69. https://lmcs.episciences.org/ Submitted Apr. 18, 2014; published Aug. 20, 2020.
PAWEŁ PARYS, University of Warsaw, ul. Banacha 2, 02-097 Warszawa, Poland
e-mail address: [email protected]
Abstract.
We show that deterministic collapsible pushdown automata of second order can recognize a language that is not recognizable by any deterministic higher-order pushdown automaton (without collapse) of any order. This implies that there exists a tree generated by a second-order collapsible pushdown system (equivalently, by a recursion scheme of second order) that is not generated by any deterministic higher-order pushdown system (without collapse) of any order (equivalently, by any safe recursion scheme of any order). As a side effect, we present a pumping lemma for deterministic higher-order pushdown automata, which potentially can be useful for other applications.

1. Introduction
Already in the 70's, Maslov [Mas74, Mas76] generalized the concept of pushdown automata to higher-order pushdown automata (n-PDA) by allowing the stack to contain other stacks rather than just atomic elements. In the last decade, renewed interest in these automata has arisen. They are now studied not only as acceptors of string languages, but also as generators of graphs and trees. It was an interesting problem whether the class of trees generated by n-PDA coincides with the class of trees generated by order-n recursion schemes. Knapik, Niwiński, and Urzyczyn [KNU02] showed something similar but different: that this class coincides with the class of trees generated by safe order-n recursion schemes (safety is a syntactic restriction on the recursion scheme), and Caucal [Cau02] gave another characterization: trees of order n + 1 are obtained from trees of order n by an MSO-interpretation of a graph, followed by application of unfolding.

Driven by the question whether safety implies a semantical restriction on recursion schemes, Hague, Murawski, Ong, and Serre [HMOS08] extended the model of n-PDA to order-n collapsible pushdown automata (n-CPDA) by introducing a new stack operation called collapse, and proved that the class of trees generated by n-CPDA coincides with the class of trees generated by order-n recursion schemes (earlier, Knapik, Niwiński, Urzyczyn, and Walukiewicz [KNUW05] introduced panic automata, a model equivalent to 2-CPDA).

Key words and phrases: Higher-order pushdown systems, collapse, higher-order recursion schemes.
This is a full version of our conference paper [Par12b].
Work supported by the National Science Center (decision DEC-2012/07/D/ST6/02443). The author holds a post-doctoral position supported by Warsaw Center of Mathematics and Computer Science.
LOGICAL METHODS IN COMPUTER SCIENCE. DOI: 10.23638/LMCS-16(3:11)2020. © Paweł Parys, licensed under Creative Commons (CC).
Let us mention that these trees have decidable MSO theory [Ong06], and that higher-order recursion schemes have close connections with verification of some real-life higher-order programs [Kob09].

Nevertheless, it was still an open question whether these two hierarchies of trees are in fact the same hierarchy. This problem was stated in Knapik et al. [KNU02] and repeated in other papers concerning higher-order pushdown automata [KNUW05, AdMO05, Ong06, HMOS08]. A partial answer to this question was given in our previous paper [Par11]: there is a tree generated by a 2-CPDA that is not generated by any 2-PDA. We prove the following stronger property.
Theorem 1.1.
There is a tree generated by a 2-CPDA (equivalently, by a recursion scheme of order 2) that is not generated by any n-PDA, for any n (equivalently, by any safe recursion scheme of any order).

This confirms that the correspondence between higher-order recursion schemes and higher-order pushdown automata is not perfect. The tree used in Theorem 1.1 (after some adaptations) comes from Knapik et al. [KNU02] and from that time was conjectured to be a good example.

In this paper we work with PDA that recognize words instead of generating trees. While in general PDA used to recognize word languages can be nondeterministic, trees generated by PDA closely correspond to word languages recognized by deterministic PDA. Technically, we prove the following theorem, from which Theorem 1.1 follows (it is shown in Section 3 how these theorems are related).
Theorem 1.2.
There is a language recognized by a deterministic 2-CPDA that is not recognized by any deterministic n-PDA, for any n.

As a side effect, in Section 9 we present a pumping lemma for higher-order pushdown automata. Although its formulation is not very natural, we believe it may be useful for some other applications. The lemma is similar to the pumping lemma from another paper of ours [Par12c]; see Section 9 for some comments. Earlier, several pumping lemmas related to the second order of the pushdown hierarchy were proposed [Hay73, Gil96, Kar11].

This paper is an extended version of our conference paper [Par12b]. The proof of Theorem 1.1 goes along the same lines, but with essential differences in details. The part about types (Section 7) was simplified slightly, at the cost of complicating other parts (which was necessary since Theorem 7.3 is now proven in a weaker form than in the conference paper).
1.1. Related Work.
One may ask a similar question for word languages instead of trees: is there a language recognized by a CPDA that is not recognized by any (nondeterministic) PDA? This is an independent problem. The answer is known only for order 2, and it is the opposite: one can see that in a 2-CPDA the collapse operation can be simulated by nondeterminism, hence 2-PDA and 2-CPDA recognize the same languages [AdMO05]. It is also an open question whether all word languages recognized by CPDA are context-sensitive.

We have shown [Par12a] that the collapse operation increases the expressive power of deterministic higher-order pushdown automata with data. In this model of automata each letter of the input word is equipped with a data value, which comes from an infinite set; these data values can be stored on the stack and compared with other data values. In such a setting the proof becomes easier than in the no-data case considered in this paper.
One can consider configuration graphs of n-PDA and n-CPDA, and their ε-closures. We know [HMOS08] that there is a 2-CPDA whose configuration graph has undecidable MSO theory, hence it is not a configuration graph of an n-PDA, nor an ε-closure of such, as they all have decidable MSO theories.

Engelfriet [Eng91] showed that the hierarchies of word languages and of trees generated by PDA are strict (that is, for each n there is a language recognized by an n-PDA that is not recognized by any (n − 1)-PDA); an analogous strictness result for n-CPDA is known as well (for each n there is a tree generated by an n-CPDA that is not generated by any (n − 1)-CPDA).

2. Preliminaries
For natural numbers a, b, where b ≥ a − 1, by [a, b] we denote the set {a, . . . , b} (which is empty if b = a − 1). The number n is used exclusively for the order of pushdown automata, which is usually assumed to be fixed and known implicitly.

We now define stacks of order k (k-stacks for short). Traditionally, a 0-stack is just a single symbol, and a k-stack for k ≥ 1 is a nonempty list of (k − 1)-stacks. For a k-stack that is a part of an r-stack for k < r, it is convenient to know where this k-stack is located in the r-stack. For this reason, we equip every element of a stack with its position, written as a vector of natural numbers. Thus, for a fixed alphabet Γ (of stack symbols), a stack of order 0 is a pair (γ, x), where γ ∈ Γ and x = (x_n, x_{n−1}, . . . , x_1) is a vector of n positive integers, called a position. Then, for k ∈ [1, n] we define k-stacks by induction: a k-stack is a list [s_1, s_2, . . . , s_m] of nonempty (k − 1)-stacks, for which there exist numbers x_n, x_{n−1}, . . . , x_{k+1} such that, for i ∈ [1, m], all positions in s_i are of the form (x_n, x_{n−1}, . . . , x_{k+1}, i, y_{k−1}, y_{k−2}, . . . , y_1). By Γ^{k*} and Γ^{k+} we denote the set of order-k stacks, and the set of nonempty order-k stacks, respectively, where k ∈ [0, n]. The top of a stack is on the right.

For example, when we have a 3-stack s, and n = 5, then the second 0-stack of the third 1-stack (counting from the bottom) of the bottommost 2-stack of s is of the form (γ, (x_5, x_4, 1, 3, 2)). The numbers x_5 and x_4 say where s is located in an imaginary 5-stack; the numbers x_5 and x_4 should be the same in the whole s.

For a k-stack s^k, where k ∈ [0, n − 1], let p^{+1}(s^k) be the k-stack obtained from s^k by increasing the (n − k)-th coordinate of all its positions by 1.
For example, p^{+1}((γ, (2, 1))) = (γ, (2, 2)), and p^{+1}([(γ, (2, 1)), (γ, (2, 2))]) = [(γ, (3, 1)), (γ, (3, 2))] (here n = 2). When for two k-stacks s^k, t^k we write s^k = t^k, we mean that not only their contents are equal, but also the positions contained in their 0-stacks are equal; thus, when s^k and t^k come from the same n-stack, this actually means that s^k and t^k refer to the same k-stack.

While comparing two stacks, we sometimes need to ignore positions contained in their 0-stacks, and compare only their contents. For a k-stack s^k, let the positionless stack pos↓(s^k) be the list of lists of . . . of lists of stack symbols obtained from s^k by removing positions from all 0-stacks. We say that two k-stacks s^k, t^k are positionless-equal, denoted s^k ≅ t^k, when pos↓(s^k) = pos↓(t^k). When s^n_− is a positionless n-stack, there is a unique n-stack s^n such that s^n_− = pos↓(s^n); we write pos_+(s^n_−) for s^n.
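The position machinery can be made concrete in a few lines of code. The sketch below is our illustration, not part of the paper: 0-stacks are encoded as (symbol, position) pairs, k-stacks as nested Python lists, and the helper names bump, p_plus_one, and pos_down (standing for p^{+1} and pos↓) are ours; 'g' stands for a stack symbol γ.

```python
# Sketch (ours): 0-stacks as (symbol, position) pairs, k-stacks as nested
# lists; positions are tuples (x_n, ..., x_1).

def bump(s, idx, depth):
    """Increase the coordinate at (0-based) index idx of every position
    occurring in the depth-stack s by 1."""
    if depth == 0:
        gamma, pos = s
        return (gamma, pos[:idx] + (pos[idx] + 1,) + pos[idx + 1:])
    return [bump(t, idx, depth - 1) for t in s]

def p_plus_one(s, n, k):
    """p^{+1} on a k-stack inside an n-stack: increase the (n-k)-th
    coordinate (that is, x_{k+1}) of all positions by 1."""
    return bump(s, n - k - 1, k)

def pos_down(s, k):
    """pos_down (pos with a down arrow in the text): forget positions,
    keeping only the nested structure of stack symbols."""
    return s[0] if k == 0 else [pos_down(t, k - 1) for t in s]

# The examples from the text, with n = 2:
assert p_plus_one(('g', (2, 1)), 2, 0) == ('g', (2, 2))
assert p_plus_one([('g', (2, 1)), ('g', (2, 2))], 2, 1) == \
    [('g', (3, 1)), ('g', (3, 2))]
```

Note that for a k-stack the bumped coordinate is x_{k+1}, the index of this k-stack within the (k+1)-stack that contains it; this is exactly what the push operation below needs when it duplicates a topmost stack.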
The size of a k-stack s^k, denoted |s^k|, is the number of (k − 1)-stacks on its list. When s^k = [s_1, s_2, . . . , s_m] ∈ Γ^{k*}, and s^{k−1} ∈ Γ^{(k−1)+}, and [s_1, s_2, . . . , s_m, s^{k−1}] is a valid k-stack, we denote this k-stack by s^k : s^{k−1}. The operator ":" is assumed to be right associative (i.e., s^3 : s^2 : s^1 = s^3 : (s^2 : s^1)). When 0 ≤ k ≤ r, and s^r = t^r : t^{r−1} : · · · : t^k ∈ Γ^{r+}, by top^k(s^r) we denote the topmost k-stack of s^r, that is, t^k. We use the name positionless topmost k-stack for pos↓(top^k(·)).

When Γ is fixed, the stack operations of order k ≥ 1 are pop^k, and push^k_γ for each γ ∈ Γ. We can apply them to a nonempty r-stack for r ≥ k, which gives the following:
• pop^k(s^r : s^{r−1} : · · · : s^k : s^{k−1}) = s^r : s^{r−1} : · · · : s^k, that is, we remove the topmost (k − 1)-stack; the result is defined only when the topmost k-stack contains at least two (k − 1)-stacks;
• push^k_γ(s^r : s^{r−1} : · · · : s^0) = s^r : s^{r−1} : · · · : s^{k+1} : (s^k : s^{k−1} : · · · : s^0) : p^{+1}(s^{k−1} : s^{k−2} : · · · : s^1 : (γ, x)) for s^0 = (γ′, x), that is, we duplicate the topmost (k − 1)-stack, replacing the topmost symbol of the copy by γ, and adjusting appropriately all positions.

A deterministic word-recognizing pushdown automaton of order n (n-DPDA for short) is a tuple (A, Γ, γ_I, Q, q_I, F, δ), where A is an input alphabet, Γ is a stack alphabet, γ_I ∈ Γ is an initial stack symbol, Q is a set of states, q_I ∈ Q is an initial state, F ⊆ Q is a set of accepting states, and δ is a transition function that maps every element of Q × Γ into one of the following objects:
• read(q⃗), where q⃗ : A → Q is an injective function, or
• (q, op), where q ∈ Q and op is a stack operation of order at most n.

A configuration of A consists of a state and of a nonempty n-stack, that is, it is an element of Q × Γ^{n+}. The initial configuration consists of the initial state q_I and of the n-stack containing only one 0-stack, enclosing the initial stack symbol γ_I. We use the notation π_i((p_1, . . . , p_k)) = p_i; in particular, for a configuration c, π_1(c) denotes its state, and π_2(c) its stack. Additionally, for a set X of tuples we define π_i(X) to be {π_i(p) : p ∈ X}. In order to shorten the notation, for a configuration c we sometimes write top^k(c) or pop^k(c) for top^k(π_2(c)) or pop^k(π_2(c)), respectively.

We use the shorthand δ(c), for a configuration c, to denote δ(π_1(c), pos↓(top^0(c))). A configuration d is a successor of a configuration c if
• δ(c) = read(q⃗), and d = (q⃗(a), π_2(c)) for some a ∈ A, or
• δ(c) = (q, op), and d = (q, op(π_2(c))).
Notice that a configuration c has
• |A| successors, if the transition is read(q⃗);
• no successors, if the operation is pop^k but there is only one (k − 1)-stack in the topmost k-stack;
• one successor, otherwise.

Next, we define a run of A. For 0 ≤ i ≤ m, let c_i be a configuration. A run R from c_0 to c_m is a sequence c_0, c_1, . . . , c_m such that, for each i ∈ [1, m], c_i is a successor of c_{i−1}. We set R(i) = c_i and call |R| = m the length of R. The subrun R↾_{i,j} is c_i, c_{i+1}, . . . , c_j. For runs R, S with R(|R|) = S(0), we write R ◦ S for the composition of R and S, which is defined as expected. Sometimes we also consider infinite runs, in which the sequence c_0, c_1, c_2, . . . is infinite. However, unless stated explicitly, a run is finite.

In the classical definition the topmost symbol can be changed only when k = 1 (for k ≥ 2 it is required that γ = γ′). We make this (unimportant) extension to have a uniform definition of push^k for all k.

The word read by a run is a word over the input alphabet A. For a run from a configuration c to its successor d, it is the empty word if the transition between them is of the form (q, op).
If the transition is read(q⃗), this is the one-letter word consisting of the letter a for which π_1(d) = q⃗(a) (this letter is determined uniquely, as q⃗ is injective). For a longer run R this is defined as the concatenation of the words read by the subruns R↾_{i−1,i} for i ∈ [1, |R|]. A run is accepting if it ends in a configuration whose state is accepting. A word w is accepted by A if it is read by some accepting run starting in the initial configuration. The language recognized by A is the set of words accepted by A.

2.1. Collapsible 2-DPDA. In Section 4 we also use deterministic collapsible pushdown automata of order 2 (2-DCPDA for short). Such automata are defined like 2-DPDA, with the following differences. A 0-stack now contains three parts: a symbol from Γ, a position, and a natural number; still, only the symbol (together with a state) is used to determine which transition is performed from a configuration. The push^1_γ operation sets the number in the topmost 0-stack to the current size of the 2-stack (while push^2_γ does not modify these numbers). We also have a new stack operation collapse. Its result collapse(s) is obtained from s by removing the topmost 1-stacks of s, so that only k − 1 of them remain, where k is the number stored in top^0(s) (intuitively, we remove all 1-stacks on which the topmost 0-stack is present).

3. Relation between Word Languages and Trees
In this section we describe how word languages recognized by DPDA are related to trees generated by PDA. Before seeing how Theorem 1.2 implies Theorem 1.1, we need to define how n-PDA are used to generate trees. We consider ranked, potentially infinite trees. Besides the input alphabet A we have a function rank : A → N; a tree node labelled by some a ∈ A always has rank(a) children.

Automata used to generate trees are defined like DPDA or DCPDA (in particular they are deterministic as well), with the difference that they do not have the set of accepting states, and that instead of the read(q⃗) transitions, there are branch(a, q_1, q_2, . . . , q_{rank(a)}) transitions, for a ∈ A, and for pairwise distinct states q_1, q_2, . . . , q_{rank(a)} ∈ Q. If the transition from c is δ(c) = branch(a, q_1, q_2, . . . , q_{rank(a)}), then in a successor d of c we have π_2(d) = π_2(c) and π_1(d) = q_i for some i ∈ [1, rank(a)] (in particular c has no successors if rank(a) = 0).

Let T(A) be the set of all configurations c of A reachable from the initial one, such that a branch transition should be performed from c. If there is a configuration of A reachable from the initial one from which there is no run to a configuration from T(A), then by definition A does not generate any tree. Otherwise, the tree generated by A has runs from the initial configuration to a configuration from T(A) as its nodes. A node R is labelled by the a ∈ A such that δ(R(|R|)) = branch(a, q_1, q_2, . . . , q_{rank(a)}). A node S is its i-th child (1 ≤ i ≤ rank(a)) if S is the composition of R and a run S′ that uses a branch transition only as its first transition, and for which π_1(S′(1)) = q_i. Notice that the graph obtained this way is really an A-labelled ranked tree.

We now show how Theorem 1.1 follows from Theorem 1.2. Let L ⊆ A* be the language recognized by a 2-DCPDA A that is not recognized by any n-DPDA, for any n (such an L exists by Theorem 1.2).
First, we transform A into a 2-DCPDA B, recognizing L as well, such that each configuration of B reachable from the initial one has a successor. Observe that the only reason why in A there may be configurations with no successors is that A may want to empty a stack using a pop operation. To avoid such situations, B should have a bottom-of-stack marker ⊥ on the bottom of each 1-stack, and on the bottom of the 2-stack (a 1-stack containing only the ⊥ marker). Thus, B starts with the ⊥ marker as the initial stack symbol, and performs push^2_⊥ and push^1_{γ_I}, placing the original initial stack symbol γ_I. Then, whenever A blocks because it wants to empty a stack, in B the bottom-of-stack marker is uncovered; in such a situation B starts some loop with no accepting state. There is also a technical detail: a pop operation that would block A can in B enter an accepting state; to overcome this problem, every pop operation ending in an accepting state should first end in some auxiliary, non-accepting state, from which (if the bottom-of-stack marker is not seen) the accepting state is reached.

Next, we create a tree-generating 2-CPDA C, which generates a tree over the alphabet B = {X, Y, Z}, where rank(X) = |A| and rank(Y) = rank(Z) = 1. It is obtained from B in two steps. First, we replace each transition read(q⃗) of B by the transition branch(X, q⃗(a_1), q⃗(a_2), . . . , q⃗(a_{|A|})), where A = {a_1, . . . , a_{|A|}}. Then, in each transition we replace the resulting state q by a fresh auxiliary state q′, and from q′ (for any topmost stack symbol) we perform the transition branch(Y, q) if q was accepting, or the transition branch(Z, q) if q was not accepting (this way, after each step of the original automaton, we perform a transition branch(Y, ·) or branch(Z, ·)). Notice that from each configuration of C reachable from the initial one, there exists a run to a configuration from T(C), as required by the definition of a tree-generating CPDA.
Let t_C be the tree generated by C. Finally, suppose that t_C can also be generated by some n-PDA D (without collapse). From D we create a word-recognizing n-DPDA E. We replace each transition of the form branch(X, q_1, q_2, . . . , q_{|A|}) of D by the transition read(q⃗), where q⃗(a_i) = q_i. We replace each transition branch(Y, q) of D by the transition (p, push^1_γ) for a fresh accepting state p and some stack symbol γ; from (p, γ) we perform the transition (q, pop^1) (thus, we replace branch(Y, q) by a pass through an accepting state). The same is done for a branch(Z, q) transition, but then the fresh state p is not accepting.

Notice that E recognizes L; this contradicts our assumptions about L, so t_C is not generated by any n-PDA. Indeed, take any word w ∈ L. We have an accepting run of B that reads w and starts in the initial configuration. This run corresponds to a run of C, that is, to a path p in t_C from the root to a Y-labelled node. Letters of w tell us which child the path p chooses in X-labelled nodes: if the i-th letter of w is a_j, then from the i-th X-labelled node of p, the path continues to the j-th child. This path p corresponds also to a run of D, and thus to a run of E. This run starts in the initial configuration, ends with an accepting state, and reads w; thus, E accepts w. Similarly, each word accepted by E is also accepted by B.

We also recall that a tree is generated by a recursion scheme of order 2 if and only if it is generated by a 2-CPDA [HMOS08], and that a tree is generated by a safe recursion scheme of order n if and only if it is generated by an n-PDA [KNU02]; this implies the "equivalently" parts of Theorem 1.1.

4. The Separating Language
In this section we define a language U that can be recognized by a 2-DCPDA, but not by any n-DPDA, for any n. It is a language over the alphabet A = {[, ], ⋆, ♯}. For a word w ∈ {[, ], ⋆}* we define stars(w) as follows. Whenever in some prefix of w there are more closing brackets than opening brackets, stars(w) = 0. Also when in the whole w we have the same number of opening and closing brackets, stars(w) = 0. Otherwise, let stars(w) be the number of stars in w before the last opening bracket that is not closed. Let U be the set of words w♯^{stars(w)+1}, for any w ∈ {[, ], ⋆}* (i.e., these are words w consisting of brackets and stars, followed by stars(w) + 1 sharp symbols).

Figure 1: The stack of a 1-DPDA after reading the word w_0

It is known that languages similar to U can be recognized by a 2-DCPDA (cf., e.g., Aehlig, de Miranda, and Ong [AdMO05]), but for completeness we briefly show it below. The 2-DCPDA uses three stack symbols: X (used to mark the bottom of 1-stacks), Y (used to count brackets), and Z (used to mark the bottommost 1-stack). The initial symbol is X. The automaton first pushes Z, makes a copy of the 1-stack (i.e., it performs push^2_Z), and pops Z (hence the first 1-stack is marked with Z, unlike any other 1-stack used later). Then, for an opening bracket we push Y, for a closing bracket we pop Y, and for a star we perform push^2_γ (where γ is the topmost stack symbol). Hence for each star we have a 1-stack, and on the last 1-stack we have as many Y symbols as the number of currently open brackets. If for a closing bracket the topmost symbol is X, it means that in the word read so far we have more closing brackets than opening brackets; in this case we should accept suffixes of the form {[, ], ⋆}*♯, which is easy.

Finally, the ♯ symbol is read.
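As a sanity check, the definition of U and the collapse-based recognizer can be replayed in code. The sketch below is our illustration, not part of the paper: '*' and '#' are ASCII stand-ins for ⋆ and ♯, the function names stars, in_U, and dcpda_sharps are ours, and the bookkeeping of bottom 1-stacks differs slightly from the prose (our setup keeps one extra 1-stack), but the number of accepted sharps comes out the same.

```python
def stars(w):
    """stars(w) for w over {'[', ']', '*'}: 0 if some prefix has more ']'
    than '[' or if every '[' is matched; otherwise the number of stars
    before the last opening bracket that is not closed."""
    unmatched = []                    # indices of currently unmatched '['
    for i, c in enumerate(w):
        if c == '[':
            unmatched.append(i)
        elif c == ']':
            if not unmatched:
                return 0              # a prefix with too many ']'
            unmatched.pop()
    if not unmatched:
        return 0                      # equally many '[' and ']'
    return w[:unmatched[-1]].count('*')

def in_U(word):
    """Membership in U = { w #^(stars(w)+1) : w over {'[', ']', '*'} }."""
    w = word.rstrip('#')
    return set(w) <= set('[]*') and word == w + '#' * (stars(w) + 1)

def dcpda_sharps(w):
    """Simulate the 2-DCPDA on w and return how many '#' it accepts
    afterwards.  1-stacks are lists of (symbol, number) pairs; the number
    is the collapse information set by push^1."""
    s = [[('X', 0), ('Z', 1)], [('X', 0)]]   # after the initial Z dance
    for c in w:
        if c == '[':                          # push^1_Y, remembering |s|
            s[-1] = s[-1] + [('Y', len(s))]
        elif c == ']':
            if s[-1][-1][0] == 'X':           # too many ']': stars = 0
                return 1
            s[-1] = s[-1][:-1]                # pop^1
        else:                                 # '*': push^2, copy top 1-stack
            s = s + [list(s[-1])]
    if s[-1][-1][0] == 'X':                   # all brackets matched
        return 1
    k = s[-1][-1][1]                          # collapse: keep k-1 1-stacks
    s = s[:k - 1]
    return len(s)                             # one '#' per remaining 1-stack
```

On every word we tried, dcpda_sharps(w) equals stars(w) + 1, so the simulated automaton accepts exactly the words of U; in particular, for a word with an unmatched bracket, collapse throws away precisely the 1-stacks created by stars read after that bracket.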
If the topmost symbol is X, we have read as many opening brackets as closing brackets, hence we should accept one ♯ symbol. Otherwise, the topmost Y symbol corresponds to the last opening bracket that is not closed. We execute the collapse operation. It leaves the 1-stacks created by the stars read before this bracket, except one (plus the first 1-stack). Thus, the number of 1-stacks is precisely equal to stars(w). Now we should read as many ♯ symbols as we have 1-stacks, plus one (after each ♯ symbol we perform pop^2), and then accept.

In the remaining part of the paper we prove that no n-DPDA can recognize U; in particular, all automata appearing in the following sections do not use collapse.

5. Overview of the Proof
Before we start the real proof, in this section we present its general structure, on an intuitive level. Let us first see why U cannot be recognized by any 1-DPDA A. Consider the input word

w_0 = [⋆^{n_1}[⋆^{n_2} . . . [⋆^{n_N}[⋆^{m_{N+1}}]⋆^{m_N}] · · · ⋆^{m_2}]⋆^{m_1}[

(where each bracket is matched, except the last opening bracket). Notice that stars(w_0) equals the sum of all the n_i and m_i, so A, after reading w_0, has to store all these numbers in its stack. Thus, it first stores the number n_1 on the stack (by repeating some stack symbol n_1 times), then it can mark that there was an opening bracket, then it stores n_2, and so on (see Figure 1); none of these numbers can be removed later. Now consider the prefix w_{0,i} of w_0 that ends just after the i-th closing bracket. Since A is deterministic, the stack at the end of w_{0,i} looks similar: it is just shorter, but for sure it ends to the right of the vertical line, which denotes the stack size after the last opening bracket. We see that
Vol. 16:3 stars ( w ,i ) = n + · · · + n N − i . Thus, when A sees a (cid:93) after w ,i , it has to remove (ignore)the numbers above n N − i , and sum the rest. In particular it passes the vertical line in somestate q i . We see that for each i , at the moment of crossing this line, the stack is the same(everything to the right of the line is removed), only the state q i can differ. So in fact each q i has to be different, since for each i we expect a different behavior. This is a contradictionwhen N is greater than the number of states.It follows that A is of order at least 2, and while reading w at some moment a pushof order 2 has to be performed, where in the topmost 1-stack we don’t remember some ofthe numbers n i or m i (for example, in order to recognize w , after each ] we can copy thetopmost 1-stack, and remove a fragment of its copy, so that the matching opening bracket ison the top). But now we can consider the word w = w (cid:63) n (cid:48) w (cid:63) n (cid:48) . . . w (cid:63) n (cid:48) N w (cid:63) m (cid:48) N +1 ] (cid:63) m (cid:48) N ] · · · (cid:63) m (cid:48) ] (cid:63) m (cid:48) [ , where the numbers n i , m i in each copy of w are independent (so in fact each w is a differentword). Notice that each w ends by an unmatched opening bracket; they are matched by theclosing brackets at the end of w . We can now almost repeat the previous reasoning. First, stars ( w ) equals the sum of all numbers, so they all have to be kept on the stack. Then, wedraw a line after reading the last w (that is, separating the 1-stacks created before thatmoment from those created later). By the order-1 argument, some number from each w isnot present in the topmost 1-stack after reading this w , so it cannot be present above theline. Next, for each i we try to end the word already after the i -th closing bracket (amongthose at the end of w , not those inside words w ). 
When we have a ♯ after each of these prefixes, we have to go below the line and behave differently (include a different subset of those values which are not present above the line), so we have to cross the line in different states. This is again a contradiction when N is greater than the number of states. Continuing by induction, nesting the words w_n again, we can show that for each order of the DPDA there is a problem.

Although the above idea of the proof looks simple, formalizing it is not straightforward. We have to deal with the following issues:
(1) Above we have argued why a 1-DPDA cannot deal correctly with the word w_0. But in fact we should consider an arbitrary n-DPDA, and prove that it is impossible that it stores all the numbers from w_0 inside one 1-stack. Then a problem arises: when crossing "the line" it is no longer true that the stack can only be of one form. Indeed, the topmost 1-stack has one fixed form, but we can cross the line in a copy of this 1-stack, with anything below this 1-stack. We can even cross the line multiple times, in several copies of the 1-stack. Thus, it is no longer true that the number of states gives the number of ways in which we can visit a substack. The ways of visiting a substack are described by types of stacks and by types of sequences of configurations, defined in Section 7. The key point is that there are finitely many types for a fixed DPDA.
(2) Where exactly is a number stored in a stack? And where exactly should "the line" be placed? This is not sharp, since a DPDA may delay some stack operations by keeping information in its state, and it may temporarily create some fancy redundant structures on the stack, which are removed later in the run. To deal with this issue, in Section 8 we define milestone configurations.
Intuitively, these are configurations in which no additional garbage is present on the stack.
(3) Finally, why would it be wrong if, while reading the ♯ symbols, the automaton did not visit a place where a number contributing to stars(·) is stored? Maybe, accidentally, this number is equal to some other amount in the stack. Or maybe it was propagated to some other region of the stack by some involved manipulations. To overcome this difficulty, in Section 9 we prove a pumping lemma. It allows us to change any of the numbers in the input word without altering too much of the whole stack. If some number (included in stars(·)) is changed, the DPDA has to enter the part of the stack changed by the pumping lemma; otherwise it would incorrectly accept, after the same number of ♯ symbols, two words with different stars(·).

6. The History Function and Special Runs
We begin this section by defining the history function. Then we define two classes of runs that are particularly interesting for us, namely k-upper runs and k-returns.

For any run R and any k-stack s^k of R(|R|), where k ∈ [0, n], we define a k-stack hist(R, s^k). Intuitively, hist(R, s^k) is the (unique) k-stack of R(0) which evolved into the k-stack s^k in R(|R|). Formally, we define hist(R, s^k) by induction on the length of R, starting with the case of k = 0. When |R| = 0, we take hist(R, s^0) = s^0. Consider now a longer run R = S ◦ T with |T| = 1. We take hist(R, s^0) = hist(S, s^0) if the last transition of R is a read or performs some pop^k, as well as if the transition performs push^r_γ and s^0 is not in the topmost (r − 1)-stack of R(|R|). If the last transition of R performs push^r_γ and s^0 is in the topmost (r − 1)-stack of R(|R|), then hist(R, s^0) = hist(S, t^0), where t^0 is equal to s^0 with the (n − r + 1)-th coordinate of its position decreased by 1 (i.e., t^0 is the 0-stack of T(0) from which s^0 was obtained as a copy). Notice that (for technical convenience) hist works in this way also for the topmost 0-stack, although the content of the topmost 0-stack changes during the push^r_γ operation. For k > 0, we define hist(R, s^k) to be the k-stack of R(0) containing hist(R, s^0) for all 0-stacks s^0 in s^k (observe that when s^0, t^0 are two 0-stacks in s^k, the 0-stacks hist(R, s^0) and hist(R, t^0) are in the same k-stack).

It is important to notice that whenever R = S ◦ T, then hist(S, hist(T, s^k)) = hist(R, s^k). In the sequel we extensively use this property, which we call compositionality of histories.

For k ∈ [0, n], we say that a run R is k-upper if hist(R, top^k(R(|R|))) = top^k(R(0)); let up_k be the set of all such runs. Intuitively, a run R is k-upper when the topmost k-stack of R(|R|) is a copy of the topmost k-stack of R(0), but possibly some changes were made to it. Notice that up_n contains all runs, that up_k ⊆ up_l for k ≤ l, and that for a run R ◦ S with S ∈ up_k it holds that R ∈ up_k ⇐⇒ R ◦ S ∈ up_k (the last property is by compositionality of histories).

For k ∈ [1, n], a run R is a k-return if
• hist(R, top^{k−1}(R(|R|))) = top^{k−1}(pop^k(R(0))), and
• R↾_{i,|R|} ∉ up_{k−1} for all i ∈ [0, |R| − 1].
Let ret_k be the set of k-returns. Observe that ret_k ⊆ up_k. Intuitively, R is a k-return when the topmost k-stack of R(|R|) is obtained from the topmost k-stack of R(0) by removing its topmost (k − 1)-stack.

Example 6.1.
Consider a 2-DPDA, and its run R of length 6 in which pos↓(π_2(R(0))) = [[a, b], [c, d]], and in which the operations between consecutive configurations are push^2_e, pop^1, pop^2, pop^1, push^1_d, pop^1 (recall that, by our definition, a push of any order can change the topmost stack symbol). The contents of the stacks of the configurations in the run, and the subruns being k-upper
Vol. 16:3
runs and k-returns are presented in Table 1.

Table 1: Stack contents of the example run, and subruns being k-upper runs and k-returns

 j | pos↓(π_2(R(j)))     | i: R↾_{i,j} ∈ up_0 | i: R↾_{i,j} ∈ up_1 | i: R↾_{i,j} ∈ ret_1 | i: R↾_{i,j} ∈ ret_2
 0 | [[a,b],[c,d]]       | 0                  | 0                  | −                   | −
 1 | [[a,b],[c,d],[c,e]] | 0,1                | 0,1                | −                   | −
 2 | [[a,b],[c,d],[c]]   | 2                  | 0,1,2              | 0,1                 | −
 3 | [[a,b],[c,d]]       | 0,3                | 0,3                | −                   | 1,2
 4 | [[a,b],[c]]         | 4                  | 0,3,4              | 0,3                 | −
 5 | [[a,b],[c,d]]       | 4,5                | 0,3,4,5            | −                   | −
 6 | [[a,b],[c]]         | 4,6                | 0,3,4,5,6          | 5                   | −

Notice that R is not a 1-return. We have hist(R↾_{4,5}, (d, (2, 2))) = (c, (2, 1)).

6.1. Basic Properties of Runs.
We now state several easy propositions, which are useful later, and also give more intuition about the above definitions.
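To make the definitions concrete, the run of Example 6.1 can be replayed mechanically. The following Python sketch (ours, not part of the paper) models positionless order-2 stacks as nested lists, with the topmost element last; push_2^e copies the topmost 1-stack and rewrites its topmost symbol, matching the convention that a push of any order can change the topmost stack symbol:

```python
# Illustrative sketch (not from the paper): order-2 stacks as lists of
# 1-stacks, each 1-stack a list of symbols; the topmost element is last.
# We replay the run of Example 6.1 and record the stack after each step.

def push2(s, g):
    s.append(s[-1][:-1] + [g])   # copy topmost 1-stack, rewrite its top symbol

def push1(s, g):
    s[-1].append(g)              # push a fresh 0-stack (symbol) of value g

def pop1(s):
    s[-1].pop()                  # remove the topmost 0-stack

def pop2(s):
    s.pop()                      # remove the topmost 1-stack

stack = [['a', 'b'], ['c', 'd']]
trace = [[row[:] for row in stack]]
for op, arg in [(push2, 'e'), (pop1, None), (pop2, None),
                (pop1, None), (push1, 'd'), (pop1, None)]:
    op(stack, arg) if arg else op(stack)
    trace.append([row[:] for row in stack])

assert trace[1] == [['a', 'b'], ['c', 'd'], ['c', 'e']]
assert trace[3] == [['a', 'b'], ['c', 'd']]
assert trace[6] == [['a', 'b'], ['c']]
```

The recorded trace reproduces the stack-contents column of Table 1.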
Proposition 6.2.
Let R be a k-upper run (where k ∈ [0, n]) such that R↾_{i,|R|} ∉ up_k for each i ∈ [1, |R|−1]. Then either
• top_k(R(0)) ≅ top_k(R(|R|)); additionally for every 0-stack s in top_k(R(|R|)), hist(R, s) is the corresponding 0-stack in top_k(R(0)), or
• |R| = 1 and the only transition of R performs pop_r for r ≤ k, or push_r^γ for r ≤ k.

Proof. For |R| ≤ 1 the statement can be checked directly. Otherwise, consider the topmost k-stack of R(|R|). It is covered by the first operation of R, and then it is not the topmost k-stack until R(|R|). Thus, it remains unchanged (we have the first possibility).

Next, we give four propositions about k-upper runs and k-returns.

Proposition 6.3.
Let R be a k-upper run, where k ∈ [1, n]. Then R is (k−1)-upper if and only if |top_k(R(0))| ≤ |top_k(R(i))| for each i ∈ [0, |R|] such that R↾_{i,|R|} ∈ up_k.

Proposition 6.4.
Let S ◦ T be a (k−1)-upper run in which T is k-upper, where k ∈ [1, n]. Then S is (k−1)-upper.

Proposition 6.5.
Let R be a run that is not (k−1)-upper, where k ∈ [1, n]. Suppose that R↾_{0,j} is (k−1)-upper for the greatest index j ∈ [0, |R|−1] such that R↾_{j,|R|} is k-upper (in particular such an index j exists). Then R is a k-return.

Proposition 6.6.
Let R be a k-return, where k ∈ [1, n]. Then pop_k(top_k(R(0))) ≅ top_k(R(|R|)). Additionally for every 0-stack s in top_k(R(|R|)), hist(R, s) is the corresponding 0-stack in pop_k(top_k(R(0))).

Proof of Propositions 6.3–6.6. Recall that (k−1)-upper runs and k-returns are special cases of k-upper runs. Thus in all four propositions we have a k-upper run R, where k ∈ [1, n] (where for Proposition 6.4 we take R = S ◦ T). Let X denote the set of those indices i ∈ [0, |R|] for which R↾_{i,|R|} is k-upper. Notice that 0 ∈ X and |R| ∈ X.

Figure 2: Illustrations to Equality (6.1) and Propositions 6.3–6.6. Particular columns represent a k-stack in consecutive configurations of a run. Arrows show the value of the hist function. The run on the last diagram is a k-return.

For i ∈ X, let r_i = |top_k(R(i))|, and let s_i(r) be the r-th (k−1)-stack of top_k(R(i)) (for r ∈ [1, r_i]). We claim that for all b, e ∈ X with b ≤ e, and for each r ∈ [1, r_e],

  hist(R↾_{b,e}, s_e(r)) = s_b(min({r} ∪ {r_l : l ∈ X ∧ b ≤ l < e})).   (6.1)

See Figure 2 for an illustration. We prove Equality (6.1) by induction on e − b. For e = b it is true. For the induction step consider the smallest e′ ∈ X that is greater than e. Notice that for all l ∈ [e+1, e′−
1] necessarily R↾_{l,e′} ∉ up_k (since R↾_{l,e′} ∈ up_k implies that R↾_{l,|R|} ∈ up_k, that is, that l ∈ X), so the subrun R↾_{e,e′} is in one of the forms described by Proposition 6.2. For both of them we see that for each r ∈ [1, r_{e′}],

  hist(R↾_{e,e′}, s_{e′}(r)) = s_e(min{r, r_e}).

Together with the induction assumption for b, e, this implies Equality (6.1) for b, e′.

We also claim that for all b, e ∈ X with b ≤ e,

  R↾_{b,e} ∈ up_{k−1} ⇐⇒ r_b ≤ r_i for each i ∈ X ∩ [b, e].   (6.2)

Indeed, R↾_{b,e} is (k−1)-upper if and only if hist(R↾_{b,e}, s_e(r_e)) = s_b(r_b), and, as we see from Equality (6.1), the latter holds if and only if r_b ≤ r_i for each i ∈ X ∩ [b, e].

Proposition 6.3 follows directly from Equivalence (6.2) used with b = 0 and e = |R|.

In order to prove Proposition 6.4, we suppose that R = S ◦ T, that R is (k−1)-upper, and that T is k-upper. Using Equivalence (6.2) with b = 0 and e = |R|, we obtain that r_0 ≤ r_i for each i ∈ X. Since T is k-upper, |S| ∈ X. Thus, we can use Equivalence (6.2) with b = 0 and e = |S|; it tells us that S is (k−1)-upper.

Concerning Proposition 6.5, suppose that R ∉ up_{k−1}, let j = max(X ∩ [0, |R|−1]), and suppose that R↾_{0,j} ∈ up_{k−1}. If j < |R|−1, then from Proposition 6.2 applied to R↾_{j,|R|} we obtain that hist(R↾_{j,|R|}, top_{k−1}(R(|R|))) = top_{k−1}(R(j)), that is, that R↾_{j,|R|} is (k−1)-upper. But then R, being a composition of the (k−1)-upper runs R↾_{0,j} and R↾_{j,|R|}, would be (k−1)-upper, while it is not. Thus, j = |R|−1. By Equivalence (6.2), the assumptions R↾_{0,|R|−1} ∈ up_{k−1} and R ∉ up_{k−1} imply that r_0 ≤ r_i for each i ∈ X \ {|R|}, but not for i = |R|. It follows that r_0 = r_{|R|−1} = r_{|R|} + 1, since |r_{|R|−1} − r_{|R|}| ≤ 1. From Equality (6.1) we deduce that hist(R, top_{k−1}(R(|R|))) = top_{k−1}(pop_k(R(0))), and from Equivalence (6.2) that R↾_{i,|R|} ∉ up_{k−1} for i ∈ X ∩ [0, |R|−1]. For i ∉ X, we also have that R↾_{i,|R|} ∉ up_k ⊇ up_{k−1} by the definition of X. Thus R is a k-return.
Finally, suppose that R is a k-return. By definition, this implies that R↾_{i,|R|} is not (k−1)-upper for every i ∈ X \ {|R|}, so, by Equivalence (6.2), for every i ∈ X \ {|R|} there is some j ∈ X such that j > i and r_i > r_j. By transitivity, it actually holds that r_i > r_{|R|} for each i ∈ X \ {|R|}. Thus, Equality (6.1) implies that hist(R, s_{|R|}(r)) = s_0(r) for each r ∈ [1, r_{|R|}]; moreover, hist(R↾_{i,|R|}, s_{|R|}(r)) ≠ top_{k−1}(R(i)) for all i < |R|, which implies that s_{|R|}(r) is an unmodified copy of s_0(r) (a (k−1)-stack can be modified only while being the topmost one). Thus top_k(R(|R|)) consists of the r_{|R|} bottommost (k−1)-stacks of top_k(R(0)), also in the sense of the history function. By the definition of a k-return, hist(R, s_{|R|}(r_{|R|})) = s_0(r_0 − 1), and hence r_{|R|} = r_0 − 1.

Proposition 6.7.
Let R be a run such that its first transition performs push_k^γ, and R↾_{1,|R|} is a k-return, where k ∈ [1, n]. Then top_k(R(0)) ≅ top_k(R(|R|)). Additionally, for every 0-stack s in top_k(R(|R|)), hist(R, s) is the corresponding 0-stack in top_k(R(0)).

Characterization of Returns and Upper Runs.
Next we give two propositions, which describe possible forms of upper runs and returns.
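Before the characterizations, here is a quick machine check (ours, not the paper's) of Proposition 6.3 on the data of Example 6.1, for k = 1 and the subruns ending at j = 6: the membership sets below are read off Table 1, and r_i is the size of the topmost 1-stack of R(i):

```python
# Toy check (not from the paper) of Proposition 6.3 on Example 6.1, k = 1:
# a 1-upper subrun R|_{b,6} is 0-upper iff r_b <= r_i for every i in [b,6]
# such that R|_{i,6} is 1-upper.
up0 = {4, 6}               # i such that R|_{i,6} is 0-upper (Table 1, j = 6)
up1 = {0, 3, 4, 5, 6}      # i such that R|_{i,6} is 1-upper (Table 1, j = 6)
r = [2, 2, 1, 2, 1, 2, 1]  # r_i = |top_1(R(i))|, read off the stack contents

for b in up1:              # Proposition 6.3 applies to 1-upper subruns
    criterion = all(r[b] <= r[i] for i in up1 if b <= i)
    assert (b in up0) == criterion
```

For instance, R↾_{0,6} fails the size criterion because r_4 = 1 < r_0 = 2, and indeed it is not 0-upper.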
Proposition 6.8.
A run R is k-upper (where k ∈ [0, n]) if and only if
(1) |R| = 0, or
(2) |R| = 1, and the only transition of R is read, or it performs push_r^γ for any r, or pop_r for r ≤ k, or
(3) the first transition of R performs push_r^γ for r ≥ k+1, and R↾_{1,|R|} is an r-return, or
(4) R is a composition of two nonempty k-upper runs.

Proof. The right-to-left implication is almost immediate; in Case (3) we use Proposition 6.7. Concentrate on the left-to-right implication. If |R| = 0, then we have Case (1). Suppose that |R| ≥ 1. Notice that the first transition, between R(0) and R(1), cannot perform pop_r for r ≥ k+1, as such an operation removes the topmost k-stack of R(0), which contradicts the assumption that R is k-upper. Thus, if |R| = 1, then we have Case (2). Suppose that |R| ≥ 2. If the first transition is read, or performs pop_r for r ≤ k, or push_r^γ for r ≤ k, then both R↾_{0,1} and R↾_{1,|R|} are k-upper; we have Case (4). We can do the same when the operation is push_r^γ for r ≥ k+1 and R↾_{1,|R|} is k-upper.

The remaining case is that the first operation is push_r^γ for r ≥ k+1 and R↾_{1,|R|} is not k-upper. Notice that hist(R↾_{0,1}, s_k) = top_k(R(0)) holds only for two k-stacks of R(1): for s_k = top_k(R(1)) and for s_k = top_k(pop_r(R(1))). So, because R is k-upper and R↾_{1,|R|} is not k-upper, which by definition means that hist(R, top_k(R(|R|))) = top_k(R(0)) and hist(R↾_{1,|R|}, top_k(R(|R|))) ≠ top_k(R(1)), it has to be hist(R↾_{1,|R|}, top_k(R(|R|))) = top_k(pop_r(R(1))). Thus, also hist(R↾_{1,|R|}, top_{r−1}(R(|R|))) = top_{r−1}(pop_r(R(1))). Let x be the smallest positive index for which R↾_{x,|R|} is (r−1)-upper. Then hist(R↾_{1,x}, top_{r−1}(R(x))) = top_{r−1}(pop_r(R(1))) (by compositionality of histories, because hist(R↾_{x,|R|}, top_{r−1}(R(|R|))) = top_{r−1}(R(x)) and hist(R↾_{1,|R|}, top_{r−1}(R(|R|))) = top_{r−1}(pop_r(R(1)))), and there is no i ∈ [1, x−1] such that R↾_{i,x} is (r−1)-upper (because R↾_{i,x} ∈ up_{r−1} and R↾_{x,|R|} ∈ up_{r−1} would imply that R↾_{i,|R|} ∈ up_{r−1}). Thus, R↾_{1,x} is an r-return. The knowledge at this point of the proof is summarized in Figure 3.

Figure 3: Illustration for the proof of Proposition 6.8.
Figure 4: Illustration for the proof of Proposition 6.9.

If x = |R|, then we have Case (3). For the remaining part of the proof suppose that x < |R|. Let s_k = hist(R↾_{x,|R|}, top_k(R(|R|))). Because R↾_{x,|R|} ∈ up_{r−1} and top_k(R(|R|)) is in top_{r−1}(R(|R|)) (recall that k ≤ r−1), the k-stack s_k is in top_{r−1}(R(x)). On the other hand, because R ∈ up_k, by compositionality of histories we know that hist(R↾_{0,x}, s_k) = top_k(R(0)). Proposition 6.7 applied to R↾_{0,x} (its first operation is push_r^γ, and R↾_{1,x} is an r-return) implies that s_k = top_k(R(x)), that is, that R↾_{0,x} and R↾_{x,|R|} are k-upper. Thus, we have Case (4).

Proposition 6.9.
A run R is an r-return (where r ∈ [1, n]) if and only if
(1) |R| = 1, and the only transition of R performs pop_r, or
(2) the first transition of R is read, or it performs pop_k for k < r, or push_k^γ for k ≠ r, and R↾_{1,|R|} is an r-return, or
(3) the first transition of R performs push_k^γ for k ≥ r, and R↾_{1,|R|} is a composition of a k-return and an r-return.

Proof. Let us analyze the right-to-left implication, which is easier. Case (1) is trivial. In Case (2) we observe that hist(R↾_{0,1}, top_{r−1}(pop_r(R(1)))) = top_{r−1}(pop_r(R(0))) (it is important that k ≠ r in the case of push_k^γ), and hence

  hist(R, top_{r−1}(R(|R|))) = hist(R↾_{0,1}, hist(R↾_{1,|R|}, top_{r−1}(R(|R|)))) = hist(R↾_{0,1}, top_{r−1}(pop_r(R(1)))) = top_{r−1}(pop_r(R(0))).

In particular, this implies that R is not (r−1)-upper. Moreover, R↾_{i,|R|} is not (r−1)-upper for i ∈ [1, |R|−1] because R↾_{1,|R|} is an r-return. Thus, R is an r-return.

In Case (3), let x be the index in which the k-return ends (i.e., the k-return ends in R(x)). The situation is depicted in Figure 4. Recall that k ≥ r. By Proposition 6.7, hist(R↾_{0,x}, top_{r−1}(pop_r(R(x)))) = top_{r−1}(pop_r(R(0))). Since hist(R↾_{x,|R|}, top_{r−1}(R(|R|))) = top_{r−1}(pop_r(R(x))), we conclude that hist(R, top_{r−1}(R(|R|))) = top_{r−1}(pop_r(R(0))). This in particular implies that R is not (r−1)-upper. Because R↾_{x,|R|} is an r-return, R↾_{i,|R|} cannot be (r−1)-upper for i ∈ [x, |R|−1]. If R↾_{i,|R|} was (r−1)-upper for some i ∈ [1, x−1], then hist(R↾_{i,x}, top_{r−1}(pop_r(R(x)))) = top_{r−1}(R(i)). This would imply that R↾_{i,x} is (k−1)-upper (both when k > r and when k = r), which is impossible, because R↾_{1,x} is a k-return. We conclude that R is an r-return.

Concentrate now on the left-to-right implication. Before starting the proof, notice that in order to prove that R↾_{x,|R|} ∈ ret_r for some x ∈ [0, |R|], it is enough to check that hist(R↾_{x,|R|}, top_{r−1}(R(|R|))) = top_{r−1}(pop_r(R(x))): the condition that R↾_{i,|R|} ∉ up_{r−1} for all i ∈ [x, |R|−1] is ensured by the fact that R itself is an r-return.
Of course |R| ≥ 1. Because R is an r-return,

  hist(R↾_{0,1}, hist(R↾_{1,|R|}, top_{r−1}(R(|R|)))) = hist(R, top_{r−1}(R(|R|))) = top_{r−1}(pop_r(R(0))).   (6.3)

Observe that the first operation, between R(0) and R(1), cannot be pop_k for k ≥ r+1, as after such an operation there would be no (r−1)-stack s_{r−1} of R(1) such that hist(R↾_{0,1}, s_{r−1}) = top_{r−1}(pop_r(R(0))), which contradicts Equality (6.3).

Suppose that the first operation of R is pop_r. In this situation, the only (r−1)-stack s_{r−1} of R(1) such that hist(R↾_{0,1}, s_{r−1}) = top_{r−1}(pop_r(R(0))) is s_{r−1} = top_{r−1}(R(1)), and thus we have hist(R↾_{1,|R|}, top_{r−1}(R(|R|))) = top_{r−1}(R(1)) by Equality (6.3). This means that R↾_{1,|R|} is (r−1)-upper. Because a proper suffix of an r-return cannot be (r−1)-upper, necessarily |R| = 1; we have Case (1).

Next, suppose that the first operation is read, or pop_k for k ≤ r−1, or push_k^γ for k ≤ r−1. In this situation, the only (r−1)-stack s_{r−1} of R(1) such that hist(R↾_{0,1}, s_{r−1}) = top_{r−1}(pop_r(R(0))) is s_{r−1} = top_{r−1}(pop_r(R(1))), and thus hist(R↾_{1,|R|}, top_{r−1}(R(|R|))) = top_{r−1}(pop_r(R(1))), by Equality (6.3). In consequence, R↾_{1,|R|} is an r-return; we have Case (2).

Finally, suppose that the first operation of R is push_k^γ for k ≥ r. If k > r, then there are two (r−1)-stacks s_{r−1} of R(1) such that hist(R↾_{0,1}, s_{r−1}) = top_{r−1}(pop_r(R(0))), namely s_{r−1} = top_{r−1}(pop_r(R(1))) and s_{r−1} = top_{r−1}(pop_r(pop_k(R(1)))). If k = r, only the latter possibility remains: s_{r−1} = top_{r−1}(pop_r(pop_k(R(1)))). By Equality (6.3), hist(R↾_{1,|R|}, top_{r−1}(R(|R|))) has to be one of these two (r−1)-stacks.

Suppose first that hist(R↾_{1,|R|}, top_{r−1}(R(|R|))) = top_{r−1}(pop_r(R(1))) and k > r. Then R↾_{1,|R|} is an r-return; we have Case (2).

The opposite possibility is that hist(R↾_{1,|R|}, top_{r−1}(R(|R|))) = top_{r−1}(pop_r(pop_k(R(1)))). Because top_{r−1}(R(|R|)) and top_{r−1}(pop_r(pop_k(R(1)))) are in top_k(R(|R|)) and top_k(R(1)), respectively (recall that k ≥ r), this implies that R↾_{1,|R|} ∈ up_k. Let x be the smallest positive index such that R↾_{1,x} ∉ up_{k−1} and R↾_{x,|R|} ∈ up_k (it exists: in the worst case we can take x = |R|, since R↾_{1,|R|} ∉ up_{k−1} and R↾_{|R|,|R|} ∈ up_k). Because R↾_{1,|R|} and R↾_{x,|R|} are k-upper, also R↾_{1,x} is k-upper. Moreover, because R↾_{1,x} ∉ up_{k−1} and R↾_{1,1} ∈ up_{k−1}, necessarily x > 1. Let also j be the greatest index in [1, x−1] such that R↾_{j,x} ∈ up_k (it exists, because R↾_{1,x} ∈ up_k and 1 ∈ [1, x−1]). Then R↾_{j,|R|} (a composition of two k-upper runs) is k-upper, and thus R↾_{1,j} is (k−1)-upper, by the minimality of x. In such a situation, Proposition 6.5 implies that R↾_{1,x} is a k-return. Let t_{r−1} = hist(R↾_{x,|R|}, top_{r−1}(R(|R|))). Because R↾_{x,|R|} ∈ up_k and top_{r−1}(R(|R|)) is in top_k(R(|R|)) (since k ≥ r), we have that t_{r−1} is in top_k(R(x)). On the other hand, because R ∈ ret_r, by compositionality of histories we know that hist(R↾_{0,x}, t_{r−1}) = top_{r−1}(pop_r(R(0))). Proposition 6.7 applied to R↾_{0,x} (its first operation is push_k^γ, and R↾_{1,x} is a k-return) implies that t_{r−1} = top_{r−1}(pop_r(R(x))), that is, that R↾_{x,|R|} ∈ ret_r; we have Case (3).

Types and Sequence Equivalence
In this section we assign to each configuration a type from a finite set. The slogan is that configurations with the same positionless topmost k-stacks and the same type are starting points of similar k-upper runs. We start with an example.

Example 7.1.
Consider a 3-DPDA that (while being in some state) can perform the following 1-upper run: it executes pop_1, push_3, and then it starts analyzing the topmost 2-stack using pop_1 and pop_2; when a 0-stack containing a fixed stack symbol a is found, the automaton performs pop_3; the run ends in the same state as it begins. As an effect of this run, only the topmost 0-stack is removed, so this is indeed a 1-upper run. Notice that it can be executed only when the topmost 2-stack contains the a symbol, and can be repeated as long as the topmost 1-stack is nonempty. Consider now two configurations of this 3-DPDA, having the same positionless topmost 1-stack. If additionally the topmost 2-stacks of both configurations contain the a symbol, then from each of them we can start the 1-upper run described above, and repeat it the same number of times.

Because a 1-upper run can arbitrarily modify the topmost 1-stack, we consider configurations having the same positionless topmost 1-stack. On the other hand, we summarize the rest of the stack in a small piece of information, called a type. In this example we only need to know whether there is the a symbol in the topmost 2-stack (below the topmost 1-stack). In general, whenever a 3-DPDA removes the topmost 1-stack and starts analyzing the stack below, next it has to remove the whole topmost 2-stack (since we consider a 1-upper run). Thus, for each entering state (i.e., the state when removing the topmost 1-stack) we only need to know the exit state (i.e., the state when removing the whole topmost 2-stack). For higher orders the situation is slightly more complicated, but similar.

There is also a second goal of this section. Suppose that we have a sequence of configurations, all having the same positionless topmost k-stack and the same type. Then, as said above, from each of them we can execute a similar k-upper run. Typically, these k-upper runs are prefixes of some accepting runs.
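The repeated round of Example 7.1 can be sketched in code. The script below is ours, not part of the paper, and the exact operation sequence is partly our reconstruction of the example; it models order-3 stacks as nested lists and repeats the 1-upper run until the topmost 1-stack is emptied:

```python
# Illustrative sketch (not from the paper). One round: pop1; push3 (protect
# the stack with a copy of the topmost 2-stack); scan the copy with pop1/pop2
# until the symbol 'a' is uncovered; finally pop3 discards the destroyed copy.

def one_round(s):
    s[-1][-1].pop()                      # pop1: remove the topmost 0-stack
    s.append([t[:] for t in s[-1]])      # push3: copy the topmost 2-stack
    while True:                          # analyze the copy, searching for 'a'
        if not s[-1][-1]:
            s[-1].pop()                  # pop2: drop an emptied 1-stack
        elif s[-1][-1][-1] == 'a':
            break                        # found the 0-stack containing 'a'
        else:
            s[-1][-1].pop()              # pop1
    s.pop()                              # pop3: remove the destroyed copy

stack = [[['a'], ['b', 'b', 'b']]]       # 'a' sits below the topmost 1-stack
rounds = 0
while stack[-1][-1]:                     # repeat while topmost 1-stack nonempty
    one_round(stack)
    rounds += 1

assert rounds == 3 and stack == [[['a'], []]]
```

Each round removes exactly one 0-stack from the topmost 1-stack (the net effect of a 1-upper run), and a round succeeds only because the scan eventually uncovers the symbol a, matching the discussion above.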
We want to determine whether such accepting runs can read an unbounded number of ♯ symbols, or not. (For technical reasons, we consider n-returns instead of accepting runs.)

For this section we fix an n-DPDA A with stack alphabet Γ, state set Q, and input alphabet A that contains a distinguished symbol ♯. Moreover, we fix a morphism ϕ : A* → M into a finite monoid M. For a run R reading a word w, by ϕ(R) we denote ϕ(w), and by ♯(R) we denote the number of sharps in w. The goal of the morphism is to describe when two upper runs read a similar word: we want to distinguish input words evaluating to different elements of M.

Recall that when both R ◦ S and S are k-upper runs, then R is k-upper as well. It follows that any nonempty k-upper run R can be uniquely represented as a composition of the maximal number of nonempty k-upper runs R_1 ◦ ··· ◦ R_r: we keep on cutting off minimal suffixes that are k-upper (notice that infixes or even prefixes of R_i can be k-upper, but suffixes are not). We compare k-upper runs using the following definition of being (k, ϕ)-parallel.

Definition 7.2.
Let R = R_1 ◦ ··· ◦ R_r and S = S_1 ◦ ··· ◦ S_s be k-upper runs decomposed into the maximal number of nonempty k-upper runs. We say that R and S are (k, ϕ)-parallel when r = s, and for each i ∈ [1, r] it holds that ϕ(R_i) = ϕ(S_i) and top_k(R_i(0)) ≅ top_k(S_i(0)),
as well as top_k(R(|R|)) ≅ top_k(S(|S|)). In particular, two runs R, S of length 0 are (k, ϕ)-parallel when top_k(R(0)) ≅ top_k(S(0)). When saying that two runs are (k, ϕ)-parallel we implicitly mean that they are k-upper.

We claim that if runs R and S are (k, ϕ)-parallel, and R is divided in any way into k-upper runs R = R′_1 ◦ ··· ◦ R′_m, then S can be as well divided into k-upper runs S = S′_1 ◦ ··· ◦ S′_m such that for each i ∈ [1, m] it holds that ϕ(R′_i) = ϕ(S′_i) and top_k(R′_i(0)) ≅ top_k(S′_i(0)), as well as top_k(R(|R|)) ≅ top_k(S(|S|)). Indeed, on the one hand, each nonempty R′_i can be further subdivided into k-upper runs of the finest decomposition. On the other hand, for each empty R′_i we can insert an empty S′_i into the sequence for S.

As already mentioned, to each configuration c we assign its (A, ϕ)-type (simply called type when A and ϕ are fixed), which comes from a finite set. Before giving a definition, we state two theorems, which describe required properties of our types.

Theorem 7.3.
Let R be a k-upper run, where k ∈ [0, n], and let c be a configuration having the same (A, ϕ)-type and the same positionless topmost k-stack as R(0). Then from c we can start a run that is (k, ϕ)-parallel to R.

In addition to types, we also define an equivalence relation over infinite sequences of configurations of A, called (A, ϕ)-sequence-equivalence, which has finitely many equivalence classes. The goal is to specify whether the number of ♯ symbols read by a run constructed in Theorem 7.3 is big or small. However, instead of having “big” and “small” numbers, we say whether their sequence is bounded or unbounded. This is made precise in the following theorem.

Theorem 7.4.
Let R ◦ R′ be a run in which R is k-upper and R′ is an n-return, where k ∈ [0, n]. Let c_1, c_2, ... and d_1, d_2, ... be infinite sequences of configurations that are (A, ϕ)-sequence-equivalent, and in which all configurations have the same (A, ϕ)-type and the same positionless topmost k-stack as R(0). Then for each i there exist runs S_i ◦ S′_i from c_i, and T_i ◦ T′_i from d_i, in which S_i and T_i are (k, ϕ)-parallel to R, and S′_i and T′_i are n-returns such that ϕ(S′_i) = ϕ(T′_i) = ϕ(R′), and such that the sequences ♯(S_1 ◦ S′_1), ♯(S_2 ◦ S′_2), ... and ♯(T_1 ◦ T′_1), ♯(T_2 ◦ T′_2), ... are either both bounded or both unbounded.

Let us mention briefly how this theorem is used in Section 10. We consider there a configuration c reached after reading a complicated word, containing some blocks of stars, separated by some brackets. Using a pumping lemma developed in Section 9, we increase the number of stars read in one of such blocks, obtaining configurations c_1, c_2, ... at the end of the run (where consecutive configurations are reached after reading more and more stars in the considered block). It is ensured that all c_i have the same (A, ϕ)-type, and the same positionless topmost k-stack. Likewise, we increase the number of stars read in some other block of stars, obtaining configurations d_1, d_2, .... Having more blocks of stars than classes of the (A, ϕ)-sequence-equivalence relation, we can ensure that the sequences c_1, c_2, ... and d_1, d_2, ... are (A, ϕ)-sequence-equivalent, by the pigeonhole principle.
Theorem 7.4 says that the two sequences of configurations cannot be distinguished by runs (of a specific form) starting in these configurations: the runs contain sharps corresponding to stars either from both considered blocks of stars (and then both sequences ♯(S_1 ◦ S′_1), ♯(S_2 ◦ S′_2), ... and ♯(T_1 ◦ T′_1), ♯(T_2 ◦ T′_2), ... are unbounded) or from none of them (and then both these sequences are bounded).

The n-returns in Theorem 7.4 should be understood as accepting runs. Indeed, in Section 10 we increase by 1 the order of an arbitrary (n−1)-DPDA, and we add a pop_n operation just before reaching an accepting state; after such a modification, a run is accepting if and only if it is an n-return. This trick is performed only for uniformity of presentation: instead of considering accepting runs as a separate concept, we see them as a special case of returns (and returns are used anyway).

One may be puzzled by the fact that Theorem 7.3 talks about a k-upper run, while Theorem 7.4 about a k-upper run composed with an n-return. This difference is application-driven: the first theorem needs to be used without an n-return, while the second one with an n-return. In fact Theorem 7.3 is true also with an n-return at the end, and Theorem 7.4 also without an n-return.

Example 7.5.
Consider the 3-DPDA and the 1-upper run from Example 7.1, with the difference that now whenever a b symbol is removed from the stack during the analysis of the topmost 2-stack, the DPDA reads the ♯ symbol from the input. Additionally, suppose that when the topmost 1-stack becomes empty (a bottom-of-stack symbol is uncovered), the DPDA performs pop_3; this pop_3 serves as the 3-return R′. Then basically we need two equivalence classes of sequences of configurations (recall that only for sequences with the same positionless topmost 1-stack the relation is meaningful): one where the number of b symbols in the topmost 2-stacks in the configurations is bounded, and one where this number is unbounded. Depending on this fact, the runs read either a bounded or an unbounded number of sharps. Of course in general we need more classes than just two (“bounded” and “unbounded”), because, for example, another 1-upper run (having a different image under ϕ) might read one sharp per each c symbol found on the stack (instead of the b symbols).

The rest of this section is devoted to defining types and sequence-equivalence, and proving Theorems 7.3 and 7.4. This is independent from the rest of the paper.

7.1. Definition of Types.
The types considered here are similar to the stack automata of Broadbent, Carayol, Hague, and Serre [BCHS12], as well as to the intersection types of Kobayashi [Kob09]. Notice, however, that we extend them by a productive/nonproductive flag, which is not present there. This flag is essential for our proof, since we want to estimate the number of ♯ symbols read by a run, not just to determine the existence of some kind of runs. On the other hand, in the conference version of the current paper [Par12b] we were using types that were directly describing returns (while here returns correspond to using an assumption); these types were more complicated.
Run Descriptors.
We label stacks by run descriptors. To label a k-stack s_k, where k ∈ [0, n], we can use a run descriptor from a set T^k. The sets T^k are defined inductively as follows:

  T^k = Q × P(M × T^n) × P(M × T^{n−1}) × ··· × P(M × T^{k+1}) × {np, pr},

where P(X) denotes the power set of X. We use lowercase Greek letters (σ, τ, ...) to denote elements of T^k, uppercase Greek letters (Ψ, Φ, ...) to denote subsets of M × T^k, and uppercase Greek letters with a tilde (Ψ̃, Φ̃, ...) to denote subsets of T^k; to all of them we often attach k in superscript.

A run descriptor in T^k is of the form σ = (p, Ψ^n, Ψ^{n−1}, ..., Ψ^{k+1}, f). Its first coordinate, p, is called the state of σ. The sets Ψ^i, for i ∈ [k+1, n], are called assumption sets of σ, and are denoted ass^i(σ). The last coordinate, f, is called a productivity flag of σ. When f = np, we say that σ is nonproductive; otherwise, it is productive. By T_np and T_pr we denote
the subsets of ⋃_{k∈[0,n]} T^k containing only nonproductive and productive run descriptors, respectively.

A run descriptor σ assigned to some k-stack s_k describes a run that starts in a configuration with state p and topmost k-stack s_k. The run descriptor “can be used” only when the stack t^n : t^{n−1} : ··· : t^{k+1} : s_k in this configuration is such that for each i ∈ [k+1, n] to the i-stack t^i we have assigned π(Ψ^i). An assumption (m, τ) ∈ Ψ^i is used when (a copy of) the stack t^i becomes uncovered. The run descriptor τ describes a run from such a configuration d; this run is a suffix of the run from c = (p, t^n : t^{n−1} : ··· : t^{k+1} : s_k). The run from c to d, which uncovers t^i, is an i-return. The monoid element m describes the word w read by the return: m = ϕ(w).

Beside the state p and the assumption sets, in σ we also have a productivity flag. Roughly speaking, the run descriptor σ is productive if s_k is itself responsible for reading some ♯ symbols. It means that either some reading of a ♯ symbol is performed “inside s_k”, or some productive run descriptor (coming from some assumption set Ψ^i) is used at least twice as an assumption (the latter also increases the number of ♯ symbols read, since some reading described by this productive assumption is repeated). Thanks to the productivity flag, we can estimate the number of ♯ symbols read, by calculating the number of productive run descriptors used.

One may wonder which runs have a description by a run descriptor. The answer is that all runs: we do not restrict ourselves to any specific kind of runs at this point.

We now give more intuitions on run descriptors, in particular cases. Run descriptors in T^n, assigned to stacks s^n of the maximal order n, are simply of the form (p, f). When the starting state p is fixed, we only have two run descriptors: (p, np) and (p, pr).
The former describes runs from (p, s^n) that do not read any ♯ symbols, while the latter those that do read some ♯ symbols.

A run descriptor in T^{n−1} is of the form (p, Ψ^n, f). When assigned to a stack s^{n−1}, it describes a run R from a configuration of the form c = (p, t^n : s^{n−1}). It is possible that R never visits t^n, and only builds on top of s^{n−1} (i.e., R is (n−1)-upper). Then Ψ^n is empty, and the flag f simply says whether R reads some ♯ symbols. The opposite case is that R uncovers the stack t^n in some configuration d = (q, t^n), that is, that some prefix R↾_{0,i} of R is an n-return. In this situation Ψ^n = {(ϕ(R↾_{0,i}), τ)}, where τ describes the suffix R↾_{i,|R|}. Because we are considering the highest order, when t^n is uncovered in R(i) there are no other copies of t^n. This means that only a single assumption may be used for t^n (i.e., |Ψ^n| ≤ 1), and the flag f simply says whether the prefix R↾_{0,i} reads some ♯ symbol.

For run descriptors in T^{n−2} the situation becomes more interesting. We explain this by means of an example.

Example 7.6.
Consider a run R such that • R (0) = ( p , t n : t n − : s n − ), • R (cid:22) ,i is an ( n − R ( i ) = ( p , ( t n : t n − : u n − ) : p +1 ( t n − )) (notice that R (cid:22) ,i performs some push nγ without a corresponding pop n ); • R (cid:22) i,j is an n -return with R ( j ) = ( p , t n : t n − : u n − ); • R (cid:22) j,k is an ( n − R ( k ) = ( p , t n : t n − ); • R (cid:22) k,l is an n -return with R ( l ) = ( p , t n ).Let us see how such a run is described by a run descriptor σ = ( p , Ψ n , Ψ n − , f ) ∈ T n − ,which can be assigned to the ( n − s n − . Necessarily Ψ n is a singleton containing ol. 16:3 ON THE EXPRESSIVE POWER OF HIGHER-ORDER PUSHDOWN SYSTEMS 11:19 ( ϕ ( R (cid:22) ,l ) , τ ) for τ describing the suffix R (cid:22) l, | R | . The set Ψ n − contains in general two elements:( ϕ ( R (cid:22) ,i ) , τ ) for τ describing the suffix R (cid:22) i, | R | , and ( ϕ ( R (cid:22) ,k ) , τ ) for τ describing the suffix R (cid:22) k, | R | . If τ (cid:54) = τ , then f says whether R (cid:22) ,i or R (cid:22) j,k reads some (cid:93) symbol (and the flagsin τ , τ , τ are responsible for the subruns R (cid:22) i,k , R (cid:22) k,l , and R (cid:22) l, | R | , respectively). In mayalso happen that τ = τ . In this situation, we say that the run descriptor τ is used twiceas an assumption. Then f = pr if R (cid:22) ,i or R (cid:22) j,k reads some (cid:93) symbol, but also when τ isproductive (i.e., when a productive run descriptor is used more than once as an assumption).The intuition for this is that now while looking at run descriptors in π (Ψ n − ) it is notvisible that there are two subruns R (cid:22) i,j and R (cid:22) k,l reading (cid:93) symbols, as they both correspondto the same assumption; by setting f = pr we reflect the fact that R reads more (cid:93) symbolsthan in the situation when every assumption would be used only once. We remark that a run descriptor σ = ( p, Ψ n , Ψ n − , . . . 
..., Ψ_{k+1}, f) should not be seen as a classical implication of the form "if the stacks below the topmost k-stack satisfy assumptions Ψ_n, Ψ_{n−1}, ..., Ψ_{k+1}, then there exists a run satisfying some properties". It is much closer to an implication in linear logic. Indeed, the conclusion of the implication is trivial, as it basically says only that there exists some (arbitrary) run. The interesting information about the run is contained in the sets of assumptions: by specifying an assumption, we say that there is a suffix of the run corresponding to this assumption. In particular, it is essential that there are no redundant assumptions (every assumption "has to be used", at least once). Moreover, the information whether some assumptions are used more than once is also recorded, in the productivity flag.
Composers.
For m ∈ M and Ψ ⊆ M × T_k we use the notation m ◦ Ψ for {(m·m′, σ) : (m′, σ) ∈ Ψ}. Given a run descriptor σ = (p, Ψ_n, Ψ_{n−1}, ..., Ψ_{l+1}, f) ∈ T_l, for k ∈ [l, n] by red_k(σ) we denote the "reduced" run descriptor (p, Ψ_n, Ψ_{n−1}, ..., Ψ_{k+1}, g) ∈ T_k in which
g = np ⇔ (f = np, and π(Ψ_i) ⊆ T_np for each i ∈ [l+1, k]).
The following proposition is a direct consequence of the definition.
Proposition 7.7.
For 0 ≤ l ≤ j ≤ k ≤ n and σ ∈ T_l it holds that red_k(red_j(σ)) = red_k(σ).
We now define composers, which are used to compose run descriptors corresponding to smaller stacks into run descriptors corresponding to greater stacks.
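Before turning to composers, the reduction red_k and Proposition 7.7 can be exercised in code. The following is a hedged sketch under our own simplified encoding of run descriptors (a triple of state, assumption sets indexed by order, and productivity flag); none of the names below come from the paper.

```python
# A sketch of red_k in a simplified encoding of run descriptors (ours, not the
# paper's): a descriptor in T_l is (state, ass, flag), where ass is a tuple of
# (order, frozenset of (monoid element, descriptor) pairs) for orders l+1..n,
# and flag is "np" (nonproductive) or "pr" (productive).

def ass_at(sigma, i):
    """The assumption set of sigma at order i (empty if absent)."""
    for j, s in sigma[1]:
        if j == i:
            return s
    return frozenset()

def red(sigma, l, k):
    """red_k: forget the assumption sets of orders l+1..k; the new flag is
    "np" iff the old flag is "np" and all forgotten assumptions are
    nonproductive descriptors."""
    state, ass, flag = sigma
    dropped = [t for i in range(l + 1, k + 1) for (_, t) in ass_at(sigma, i)]
    g = "np" if flag == "np" and all(t[2] == "np" for t in dropped) else "pr"
    return (state, tuple((j, s) for j, s in ass if j > k), g)

tau_np = ("q", (), "np")      # descriptors of the topmost order (here n = 3)
tau_pr = ("q", (), "pr")
sigma = ("p", ((1, frozenset({("m", tau_pr)})),
               (3, frozenset({("m'", tau_np)}))), "np")   # a descriptor in T_0

# Proposition 7.7: red_k(red_j(sigma)) = red_k(sigma) for l <= j <= k <= n.
assert red(red(sigma, 0, 1), 1, 2) == red(sigma, 0, 2)
assert red(red(sigma, 0, 2), 2, 3) == red(sigma, 0, 3)
# Forgetting the order-1 set absorbs the productive assumption tau_pr into the flag:
assert red(sigma, 0, 1)[2] == "pr"
assert red(sigma, 0, 0) == sigma
```

The flag update is the essential point: productivity of forgotten assumptions is not lost but folded into the flag of the reduced descriptor.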
Definition 7.8.
Consider a tuple (Φ_k, Φ_{k−1}, ..., Φ_l; Ψ_k; f), where 0 ≤ l ≤ k ≤ n, Φ_i ⊆ M × T_i for each i ∈ [l, k], Ψ_k ⊆ M × T_k, and f ∈ {np, pr}. Such a tuple is called a composer if
(C1) Φ_i = ⋃{m ◦ ass_i(σ) : (m, σ) ∈ Φ_l} for each i ∈ [l+1, k],
(C2) Ψ_k = {(m, red_k(σ)) : (m, σ) ∈ Φ_l},
(C3) |π(Ψ_k)| = |π(Φ_l)| (which means that each σ ∈ π(Φ_l) gives a different red_k(σ)), and
(C4) f = np if and only if π(ass_i(σ)) ∩ π(ass_i(τ)) ⊆ T_np for each i ∈ [l+1, k] and each σ, τ ∈ π(Φ_l) such that σ ≠ τ.
One can imagine two possible definitions of "the same assumption": we may compare either only run descriptors (in our case, τ_1 = τ_2), or pairs consisting of a monoid element and a run descriptor (in our case, (ϕ(R↾_{0,i}), τ_1) = (ϕ(R↾_{0,k}), τ_2)). At first glance both definitions look equally good, or the latter definition even seems to be more natural than the former. It turns out, however, that the latter definition is problematic. The difficulty is that for pairs that are originally different, (m_1, σ) ≠ (m_2, σ), it may happen that after multiplying them by a monoid element they become equal, (m·m_1, σ) = (m·m_2, σ). We prefer to avoid this, and hence we stick to the former definition.
Suppose that we have a k-stack t^k = s^k : s^{k−1} : · · · : s^l, where, for i ∈ [l, k], elements of Φ_i "are assigned" to s^i. Intuitively, a tuple (Φ_k, Φ_{k−1}, ..., Φ_l; Ψ_k; f) is a composer if in such a situation the set Ψ_k can be assigned to the whole stack t^k. As we see, the definition does not depend on the stacks that are actually composed, only on the sets Φ_i assigned to these stacks. One can think about Φ_k, Φ_{k−1}, ..., Φ_l as inputs to the composer, and about Ψ_k and f as outputs. Nevertheless, already Φ_l determines all remaining coordinates of the composer (when l and k are fixed). We remark that not every set Φ_l ⊆ M × T_l can be used in a composer, due to Condition (C3) of the definition.
The definition says that run descriptors assigned to t^k are of the form red_k(σ) for σ assigned to s^l (the reason is that if the topmost k-stack of a configuration is t^k, then its topmost l-stack is s^l). Moreover, the assumptions of σ that are contained in ass_i(σ) for i ∈ [l+1, k] have to be realized by the stacks s^i. We notice that the run descriptor red_k(σ) is productive when σ is productive or some of the assumptions in ass_i(σ) for i ∈ [l+1, k] is productive (cf. the definition of red_k(σ)); in other words, in a run corresponding to the run descriptor, the part corresponding to the stack t^k is productive when some part corresponding to s^i for some i ∈ [l, k] is productive.
The composer itself also has a productivity flag f. The intuition is that we set this flag to pr if the runs described by elements of Ψ_k read more ♯ symbols than those described by elements of Φ_k, Φ_{k−1}, ..., Φ_l, in total. This is the case when a productive run descriptor (coming from some Φ_i for i ∈ [l+1, k]) is used as an assumption for more than one element of Φ_l.
While summing over all run descriptors from all Φ_i, such a run descriptor is added only once, but it contributes to more than one run descriptor in Ψ_k.
Another important issue is that in Φ_k, Φ_{k−1}, ..., Φ_l we only have elements that really contribute while constructing elements of Ψ_k. Simultaneously, we require that every run descriptor in Ψ_k has exactly one realization by run descriptors from Φ_k, Φ_{k−1}, ..., Φ_l (i.e., it is obtained by reducing exactly one run descriptor from Φ_l). The justification for both these properties is the same: we want to "approximate" the number of ♯ symbols read by runs described by elements of Ψ_k while looking at the number of ♯ symbols read by runs described by elements of Φ_k, Φ_{k−1}, ..., Φ_l. Any redundant run descriptors in Φ_k, Φ_{k−1}, ..., Φ_l would bias our calculations; multiple decompositions of a single element of Ψ_k would also bias our calculations.
In the sets Φ_k, Φ_{k−1}, ..., Φ_l and Ψ_k we also have monoid elements, not only run descriptors. Intuitively, a pair (m, σ) ∈ Ψ_k (or (m, σ) ∈ Φ_i) corresponds to a run consisting of two parts: the first part reads a word evaluating to m and uncovers the stack t^k (s^i, respectively); the second part starts when the topmost k-stack is t^k (the topmost i-stack is s^i), and it is described by σ. The monoid elements in Ψ_k are the same as in Φ_l, since uncovering t^k means uncovering s^l. On the other hand, elements in Φ_i for i ∈ [l+1, k] are obtained as the composition of m coming from (m, σ) ∈ Φ_l (describing the word read before uncovering s^l) and of m′ coming from a particular assumption (m′, τ) ∈ ass_i(σ) (describing the word read after uncovering s^l, but before uncovering s^i).
Example 7.9.
Suppose that n = 2. Let τ_np ∈ T_1 ∩ T_np and τ_pr ∈ T_1 ∩ T_pr. Consider the following elements of T_0:
σ_1 = (p, Ψ, {(m_1, τ_np), (m_2, τ_pr)}, np),
σ_2 = (p, Ψ, {(m_3, τ_np)}, np),
σ′_2 = (p, Ψ, {(m_3, τ_np)}, pr),
σ_3 = (q, Φ, {(m_4, τ_np), (m_5, τ_pr)}, pr),
and the following elements of T_1:
ξ_1 = (p, Ψ, pr), ξ_2 = (p, Ψ, np), ξ_3 = (q, Φ, pr).
It holds that red_1(σ_i) = ξ_i for i ∈ {1, 2, 3}, and red_1(σ′_2) = ξ_1. We have composers resulting in a single pair from M × T_1, like
({(m_0·m_1, τ_np), (m_0·m_2, τ_pr)}, {(m_0, σ_1)}; {(m_0, ξ_1)}; np), or
({(m_0·m_3, τ_np)}, {(m_0, σ′_2)}; {(m_0, ξ_1)}; np).
Notice that these two composers have the same "output set". We may also repeat the same run descriptor with multiple monoid elements:
({(m_6·m_3, τ_np), (m_7·m_3, τ_np)}, {(m_6, σ′_2), (m_7, σ′_2)}; {(m_6, ξ_1), (m_7, ξ_1)}; np).
Here, the situation that m_6·m_3 = m_7·m_3 is allowed as well (and then the first set has only a single element). Observe that all the above composers are nonproductive, even though they involve productive run descriptors. Next, we can also have composers involving multiple run descriptors, like
({(m_0·m_3, τ_np)}, {(m_0, σ′_2), (m_0, σ_2)}; {(m_0, ξ_1), (m_0, ξ_2)}; np).
This composer is again nonproductive: although there is a run descriptor τ_np that appears in assumption sets of both σ′_2 and σ_2, this run descriptor τ_np is nonproductive. But if a productive run descriptor τ_pr appears in assumption sets of two run descriptors (like for σ_1 and σ_3), we have a productive composer:
({(m_0·m_1, τ_np), (m_0·m_4, τ_np), (m_0·m_2, τ_pr), (m_0·m_5, τ_pr)}, {(m_0, σ_1), (m_0, σ_3)}; {(m_0, ξ_1), (m_0, ξ_3)}; pr).
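Conditions (C1)–(C4) can also be checked mechanically. The following is a hedged sketch in our own encoding (the free monoid of words stands in for M, and none of the helper names come from the paper); it verifies a minimal composer with a single order-1 assumption.

```python
# A sketch of the composer conditions (C1)-(C4) of Definition 7.8. A run
# descriptor is (state, ass, flag), with ass a tuple of (order, frozenset of
# (monoid element, descriptor) pairs); monoid elements are words, so the
# product is concatenation and the neutral element is "".

def ass_at(sigma, i):
    for j, s in sigma[1]:
        if j == i:
            return s
    return frozenset()

def red(sigma, l, k):
    """red_k: forget assumption sets of orders l+1..k, updating the flag."""
    state, ass, flag = sigma
    dropped = [t for i in range(l + 1, k + 1) for (_, t) in ass_at(sigma, i)]
    g = "np" if flag == "np" and all(t[2] == "np" for t in dropped) else "pr"
    return (state, tuple((j, s) for j, s in ass if j > k), g)

def proj(pairs):                      # pi: forget the monoid coordinate
    return {sigma for (_, sigma) in pairs}

def is_composer(phi, psi_k, f, l, k):
    """phi maps each i in [l, k] to Phi_i; psi_k is the output set."""
    # (C1): Phi_i is the union of m . ass_i(sigma) over (m, sigma) in Phi_l.
    c1 = all(phi[i] == {(m + m2, t) for (m, s) in phi[l]
                        for (m2, t) in ass_at(s, i)}
             for i in range(l + 1, k + 1))
    # (C2): Psi_k consists of the pairs (m, red_k(sigma)).
    c2 = psi_k == {(m, red(s, l, k)) for (m, s) in phi[l]}
    # (C3): distinct descriptors in Phi_l reduce to distinct descriptors.
    c3 = len(proj(psi_k)) == len(proj(phi[l]))
    # (C4): f = "np" iff no productive descriptor is shared between the
    # assumption sets of two distinct descriptors of Phi_l.
    shared_pr = any(t[2] == "pr"
                    for s in proj(phi[l]) for u in proj(phi[l]) if s != u
                    for i in range(l + 1, k + 1)
                    for t in proj(ass_at(s, i)) & proj(ass_at(u, i)))
    c4 = (f == "np") == (not shared_pr)
    return c1 and c2 and c3 and c4

# A tiny instance with n = k = 1, l = 0: one order-1 assumption.
tau = ("q", (), "np")                                  # a descriptor in T_1
sigma = ("p", ((1, frozenset({("a", tau)})),), "np")   # a descriptor in T_0
phi = {0: {("m", sigma)}, 1: {("ma", tau)}}
psi = {("m", red(sigma, 0, 1))}
assert is_composer(phi, psi, "np", 0, 1)
```

Tuples that fail one of the four checks, such as ones with redundant pairs in an input set or with two realizations of an output descriptor, are rejected by the same function.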
As negative examples, we have the following tuples, which are not composers:
({(m_0·m_3, τ_np), (m_0·m_2, τ_pr)}, {(m_0, σ′_2)}; {(m_0, ξ_1)}; np), and
({(m_6·m_1, τ_np), (m_6·m_2, τ_pr), (m_7·m_3, τ_np)}, {(m_6, σ_1), (m_7, σ′_2)}; {(m_6, ξ_1), (m_7, ξ_1)}; np).
In the first tuple, the first set contains a redundant pair (m_0·m_2, τ_pr), which is forbidden. In the second tuple, we have simultaneously two realizations of ξ_1: one using σ_1, and one using σ′_2; this is forbidden as well.
In the next proposition we notice that composers are associative.
Proposition 7.10.
Let 0 ≤ l ≤ j ≤ k ≤ n. For all fixed Φ_k, Φ_{k−1}, ..., Φ_l, Ψ_k and f ∈ {np, pr}, the following two conditions are equivalent:
• (Φ_k, Φ_{k−1}, ..., Φ_l; Ψ_k; f) is a composer, and
• (Φ_j, Φ_{j−1}, ..., Φ_l; Ψ_j; f_1) and (Φ_k, Φ_{k−1}, ..., Φ_{j+1}, Ψ_j; Ψ_k; f_2) are composers for some Ψ_j and f_1, f_2 such that (f = np) ⇔ (f_1 = f_2 = np).
Proof. We use symbols X and X_1, X_2 for the tuples from the first and the second item, respectively. First, notice that the definition of a composer instantiated either for X, or for both X_1 and X_2 simultaneously, implies that Φ_i ⊆ M × T_i for i ∈ [l, k], and Ψ_k ⊆ M × T_k. While proving the right-to-left implication, Condition (C2) of the definition of a composer
Vol. 16:3 instantiated for X implies that Ψ j = { ( m, red j ( σ )) : ( m, σ ) ∈ Φ l } . While proving theopposite implication, we choose Ψ j in this way; the effect is that X satisfies Condition (C2).We see that ass i ( red j ( σ )) = ass i ( σ ) for i ∈ [ j + 1 , k ] and σ ∈ T l , so Condition (C1) for X is equivalent to the conjunction of Condition (C1) for X and X .By Proposition 7.7 we have that red k ( red j ( σ )) = red k ( σ ) for σ ∈ T l , which implies thatCondition (C2) instantiated for X is equivalent to Condition (C2) instantiated for X .Notice that | π (Ψ k ) | ≤ | π (Ψ j ) | ≤ | π (Φ l ) | , because π (Ψ k ) = { red k ( σ ) : σ ∈ π (Ψ j ) } and π (Ψ j ) = { red j ( σ ) : σ ∈ π (Φ l ) } . Thus | π (Ψ k ) | = | π (Φ l ) | if and only if | π (Ψ k ) | = | π (Ψ j ) | = | π (Φ l ) | ; we obtain equivalence for Condition (C3).Finally, Condition (C4) for X says that f = np if and only if π ( ass i ( σ )) ∩ π ( ass i ( τ )) ⊆T np for each i ∈ [ l +1 , j ] and each σ, τ ∈ π (Φ l ) such that σ (cid:54) = τ ; otherwise f = pr . For X and f we have the same with i ∈ [ j +1 , k ] (here we use again the fact that ass i ( red j ( σ )) = ass i ( σ )).Thus, while proving the right-to-left implication we have these equivalences for f and f ;they imply that ( f = np ) ⇔ ( f = f = np ). While proving the left-to-right implication wedefine f and f so that this is satisfied; then we have Condition (C4) for X and X , andthe equivalence ( f = np ) ⇔ ( f = f = np ). Derivation Trees.
Next, we say when a run descriptor σ from T_0 can be assigned to a stack symbol γ. To this end, we define how statements of the form γ ⊢ σ can be derived. Such a statement means that from a configuration with topmost stack symbol γ one can start a run described by the run descriptor σ (assuming that the stacks below γ satisfy the assumptions of σ). Actually, it is not enough to define when γ ⊢ σ is true; we explicitly need to handle derivation trees justifying such statements. Thus, in Definition 7.11 we define the notion of a derivation tree for γ ⊢ σ. Having such a derivation tree D, γ ⊢ σ is called the conclusion of D, and σ is called the run descriptor of D and is denoted rd(D).
Definition 7.11.
We define the set of derivation trees as the smallest set satisfying the following conditions. Let p be a state, and γ a stack symbol.
(1) A triple (empty, γ, p) is a derivation tree for γ ⊢ (p, ∅, ∅, ..., ∅, np).
(2) Suppose that δ(γ, p) = read(q⃗) and that D′ is a derivation tree for γ ⊢ τ, where the state of τ is q⃗(a) for some a ∈ A. Denote Φ_i = ϕ(a) ◦ ass_i(τ) for i ∈ [1, n]. Then (read, p, D′) is a derivation tree for γ ⊢ (p, Φ_n, Φ_{n−1}, ..., Φ_1, f), where f = np if and only if τ ∈ T_np and a ≠ ♯.
(3) Suppose that δ(γ, p) = (q, pop_k), and that τ_k ∈ T_k is a run descriptor with state q. Then (pop, γ, p, τ_k) is a derivation tree for
γ ⊢ (p, ass_n(τ_k), ass_{n−1}(τ_k), ..., ass_{k+1}(τ_k), {(1_M, τ_k)}, ∅, ..., ∅, np).
(4) Suppose that δ(γ, p) = (q, push^k_α), and that D′ is a derivation tree for α ⊢ τ, where the state of τ is q. Denote Ψ_i = ass_i(τ) for i ∈ [1, n]. Moreover, suppose that (Φ_k, Φ_{k−1}, ..., Φ_0; Ψ_k; f) is a composer, and D is a set of derivation trees, all having the stack element γ in their conclusion, and such that {rd(E) : E ∈ D} = π(Φ_0) and |D| = |π(Φ_0)|. Let
Υ_i = Ψ_i for i ∈ [k+1, n], Υ_k = Φ_k, and Υ_i = Ψ_i ∪ Φ_i for i ∈ [1, k−1].
Then (push, γ, p, D′, D) is a derivation tree for γ ⊢ (p, Υ_n, Υ_{n−1}, ..., Υ_1, g), where g = np if and only if f = np, and {τ} ∪ π(Φ_0) ⊆ T_np, and π(Ψ_i) ∩ π(Φ_i) ⊆ T_np for each i ∈ [1, k−1].
The depth of a derivation tree D, denoted depth(D), is defined naturally: it is 0 in Cases (1) and (3), 1 + depth(D′) in Case (2), and 1 + max(depth(D′), max_{E∈D} depth(E)) in Case (4).
Notice that a derivation tree D determines its conclusion. This can be seen by induction on the structure of D.
In Cases (1) and (3) this is immediate. In Case (2), γ and τ are determined by D′, and the letter a is determined by p and by the state of τ (recall that q⃗ is required to be injective, by the definition of the read operation); this already fixes the whole conclusion of D. Case (4) is the most complicated one. First, we can see that π(Φ_0) and all Ψ_i are fixed by D′ and D. Then, by the definition of a composer, we have that the set Φ_0 has to contain those pairs (m, τ) ∈ M × π(Φ_0) for which (m, red_k(τ)) ∈ Ψ_k. The set Φ_0 fixes the rest of the composer, and thus the whole conclusion of D.
We now comment on the intuitions behind Definition 7.11. Let D be a derivation tree for γ ⊢ σ, where the state of σ is p. We should have in mind a run R with R(0) = (p, s^n : s^{n−1} : · · · : s^1 : u_γ), where pos↓(u_γ) = γ. Basically, the run descriptor σ describes such a run, and the derivation tree D specifies parts of this run for which the 0-stack u_γ is responsible.
Case (1) corresponds to an empty run (|R| = 0). Thus, the assumption sets of σ are empty (stacks s^i are never uncovered), and the run descriptor is nonproductive.
In the remaining cases, nonempty runs are considered. Case (2) talks about a run R starting with a read operation. Say that this operation reads a symbol a, and ends in a state q⃗(a). The derivation tree D′ (and, in particular, its run descriptor τ) describes the suffix R↾_{1,|R|}. Thus, we require that the state in τ is q⃗(a). Because R↾_{0,1} does not modify the stack, assumptions from ass_i(τ), referring to top_i(pop_i(R(1))), refer simultaneously to top_i(pop_i(R(0))) (for i ∈ [1, n]). This is expressed by the fact that as ass_i(σ) we almost take assumptions from ass_i(τ). The only difference is that we multiply the monoid element in every pair by ϕ(a) on the left.
This is because monoid elements in ass_i(τ) correspond to some prefixes of R↾_{1,|R|}, while monoid elements in ass_i(σ) should talk about prefixes of the whole R; the latter prefixes additionally read the symbol a at the very beginning. Our new run descriptor σ is productive either when the first operation reads the ♯ symbol, or when the run descriptor τ, describing the rest of the run, is productive. Recall that q⃗ is required to be injective (cf. the definition of the read operation), so seeing the state q⃗(a) we can determine the symbol a that was read.
In Case (3) the first operation of R is pop_k, which leads to a configuration of the form (q, s^n : s^{n−1} : · · · : s^k). The suffix R↾_{1,|R|} is described by a run descriptor τ_k. In particular, the state in τ_k should be q. For i ∈ [1, k−1] the i-stack s^i is never uncovered (it is destroyed by the pop_k operation), so the assumption set ass_i(σ) is set to ∅. The k-stack s^k is uncovered after the first operation, so we put τ_k to the assumption set ass_k(σ), together with the neutral element of the monoid (because the word read by R↾_{0,1} is empty). This is the only pair in ass_k(σ), because before R(1) no copies of s^k are created, so s^k cannot be uncovered again. For i ∈ [k+1, n], the assumption set ass_i(τ_k) talks about uncovering s^i, and hence it is taken as ass_i(σ) (the monoid elements remain unchanged, because R↾_{0,1} does not read any symbols). Notice that, unlike for the read operation, we do not include in D any derivation tree talking about τ_k. This is because in D we only describe the part of R for which the
Vol. 16:3 u γ is responsible: on the one hand, after the read operation, the 0-stack u γ is still onthe top of the stack and is responsible for continuing the run; on the other hand, after the pop k operation, u γ is no longer present on the stack. The run descriptor σ is nonproductive: u γ is not responsible for reading any (cid:93) symbol, and every assumption of σ is used only once(for i ∈ [ k + 1 , n ], we simply pass every assumption from ass i ( σ ) to τ k ; independently, τ k may use these assumptions more than once, but this is a responsibility of τ k , not of σ ).Finally, we have Case (4), where the first operation of R is push kα , leading to a con-figuration of the form ( q, s n : s n − : · · · : s k +1 : t k : p +1 ( s k − : s k − : · · · : s : u α )) with t k = s k : s k − : · · · : s : u γ and with pos ↓ ( u α ) = α . A run descriptor τ (having state q )describes the suffix R (cid:22) , | R | . In D we include a derivation tree D (cid:48) for α (cid:96) τ , talking aboutthe parts of R (cid:22) , | R | for which the new topmost 0-stack u α is responsible, because u γ isresponsible for creating u α . For i ∈ [1 , k − ∪ [ k + 1 , n ], the assumption set ass i ( τ ) refersto the i -stack s i , and hence we take assumptions from ass i ( τ ) to ass i ( σ ). The assumptionset ass k ( τ ), in turn, refers to the k -stack t k , not directly to s k . Because of that, we needa composer to decompose the assumption set ass k ( τ ) to sets Φ k , Φ k − , . . . , Φ , referring tostacks s k , s k − , . . . , s , u γ . Then, for i ∈ [1 , k ], assumptions from Φ i are taken to ass i ( σ ).Elements of Φ refer to our 0-stack u γ , and hence should be realized by our derivationtree. Thus, for every run descriptor in Φ we provide a derivation tree (in the set D ). Weshould set σ to be productive if u γ is responsible for reading some (cid:93) symbols, or when someproductive run descriptor from an assumption set is used more than once. 
Maybe this happens inside τ or inside some run descriptor from Φ_0; when any of them is productive, we set σ to be productive. But it may also happen that a productive run descriptor is used as an assumption by τ and by some run descriptor from Φ_0, or that a productive run descriptor is used as an assumption by multiple run descriptors from Φ_0 (in the latter case, the composer is productive); in both these situations we also set σ to be productive.
As already said, a derivation tree D is not a representation of the whole run R; it only talks about parts of R for which the topmost 0-stack u_γ is responsible. It is not, however, a precise representation even of these parts. This can be seen in Case (4) of the definition. It is possible that τ uses some run descriptor ξ from the assumption set ass_k(τ) more than once. This means that we have multiple suffixes of the run R described by ξ. But in D we include only one derivation tree talking about ξ, corresponding to one of the suffixes of R. Thus, in a sense, a derivation tree for γ ⊢ σ is a proof that a run described by σ exists, rather than a way of representing any such run.
Example 7.12.
Consider the 2-DPDA A whose stack alphabet is {γ} and input alphabet is A = {a, b, ♯}, with the following transitions:
δ(q_7, γ) = (q_6, push^1_γ), δ(q_6, γ) = (q_5, push^2_γ), δ(q_5, γ) = (q_4, pop_1),
δ(q_4, γ) = read(q⃗) with q⃗(a) = q_1, q⃗(b) = q_2, q⃗(♯) = q_3, and
δ(q_i, γ) = (q_5, pop_2) for i ∈ {1, 2, 3}.
Take also a monoid M = {1, ne} with 1·1 = 1 and 1·ne = ne·1 = ne·ne = ne, and a morphism ϕ : A* → M that maps the empty word to 1 and all nonempty words to ne.
Consider run descriptors
σ_i = (q_i, {(1, (q_5, pr))}, ∅, np) for i ∈ {1, 2, 3}, and
σ_{4,f} = (q_4, {(ne, (q_5, pr))}, ∅, f) for f ∈ {np, pr}.
They describe runs that are compositions of a 2-return, and of a run that starts in the state q_5 and reads some ♯ symbols (is productive). For i ∈ {1, 2, 3}, the 2-return should not read anything. Thus, in this case D_i = (pop, γ, q_i, (q_5, pr)) is a derivation tree for γ ⊢ σ_i. On the other hand, for i = 4 the 2-return should read a nonempty word, which should contain some ♯ symbol when f = pr. We can derive γ ⊢ σ_{4,pr} by D_{4,♯} = (read, q_4, D_3), and γ ⊢ σ_{4,np} by D_{4,a} = (read, q_4, D_1), as well as by D_{4,b} = (read, q_4, D_2).
For f ∈ {np, pr}, denote σ̄_{4,f} = red_1(σ_{4,f}), that is, σ̄_{4,f} = (q_4, {(ne, (q_5, pr))}, f). The next run descriptors that we consider are
σ_{5,f} = (q_5, {(ne, (q_5, pr))}, {(1, σ̄_{4,f})}, np) for f ∈ {np, pr}.
They describe runs whose some suffix (that starts before reading anything, and after a 1-return) is described by σ̄_{4,f}, and some other suffix (that starts after reading a nonempty word, and after a 2-return) is described by (q_5, pr). We can derive γ ⊢ σ_{5,f} by D_{5,f} = (pop, γ, q_5, σ̄_{4,f}), for f ∈ {np, pr}.
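The monoid M = {1, ne} and the morphism ϕ fixed above can be made concrete, and the morphism property checked exhaustively on a few words. This is a sketch in our own notation; M and ϕ themselves are exactly as in the example.

```python
# The two-element monoid M = {1, ne} of Example 7.12: 1 is the neutral
# element, and ne is absorbing (any product involving ne equals ne).

ONE, NE = "1", "ne"

def mul(x, y):
    return ONE if x == ONE and y == ONE else NE

def phi(word):
    """The morphism A* -> M: the empty word maps to 1, nonempty words to ne."""
    return ONE if word == "" else NE

words = ["", "a", "b#", "ab#a"]
# phi is a morphism: phi(uv) = phi(u) * phi(v) for all words u, v.
assert all(phi(u + v) == mul(phi(u), phi(v)) for u in words for v in words)
# mul is associative with ONE neutral, so M is indeed a monoid.
m = [ONE, NE]
assert all(mul(x, mul(y, z)) == mul(mul(x, y), z) for x in m for y in m for z in m)
assert all(mul(ONE, x) == x == mul(x, ONE) for x in m)
```

In other words, the only information the monoid tracks is whether a word is empty; this is all that the flags np/pr need from the input.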
Notice that the derivation trees D_{5,f} (and the former ones as well) specify precisely only a prefix of a run; more precisely, the fragment of the run that corresponds to the topmost 0-stack, before using an assumption. It is not checked that the assumption can be fulfilled by any stack that would be placed below the topmost 0-stack. We can equally well have a derivation tree like (pop, γ, q_5, ξ) with ξ = (q_4, {(1, (q_4, pr))}, pr), which derives γ ⊢ (q_5, {(1, (q_4, pr))}, {(1, ξ)}, np), although it is impossible to deliver ξ by any 1-stack (because, e.g., there is no 2-return that ends in the state q_4).
Next, for f_1, f_2 ∈ {np, pr} consider run descriptors
σ_{6,f_1,f_2} = (q_6, {(ne, (q_5, pr))}, {(1, σ̄_{4,f_1}), (ne, σ̄_{4,f_2})}, f), where f = pr ⇔ f_1 = f_2 = pr.
We can derive γ ⊢ σ_{6,f_1,f_2} by D_{6,f_1,f_2} = (push, γ, q_6, D_{5,f_1}, {D_{5,f_2}}). The derivation tree has to specify fragments of the run for which the topmost 0-stack is responsible. When applied to a stack s^2 : s^1 : s^0, the push^2_γ operation (resulting in (s^2 : s^1 : s^0) : p+1(s^1 : s^0)) splits the topmost 0-stack s^0 into two copies; D_{6,f_1,f_2} says that, with the new copy of s^0, we should continue according to the derivation tree D_{5,f_1}. The run descriptor rd(D_{5,f_1}) (i.e., σ_{5,f_1}) says that the 1-stack s^1 will be used with run descriptor σ̄_{4,f_1} (before reading anything), and that the 2-stack s^2 : s^1 : s^0 will be used with run descriptor (q_5, pr) (after reading a nonempty word). The definition of our derivation tree involves a composer
({(ne, (q_5, pr))}, {(ne, σ̄_{4,f_2})}, {(ne, σ_{5,f_2})}; {(ne, (q_5, pr))}; np).
It says that in order to provide (q_5, pr) by the 2-stack s^2 : s^1 : s^0, we have to provide σ̄_{4,f_2} by s^1, and σ_{5,f_2} by s^0, and (q_5, pr) by s^2. Because (q_5, pr) for s^2 : s^1 : s^0 will be used after reading a nonempty word (the pair in the output set of the composer is (ne, (q_5, pr))), also the run descriptors for s^2, s^1, s^0 will be used after reading a nonempty word.
The derivation tree D_{6,f_1,f_2} says also how the lower copy of s^0 provides σ_{5,f_2}; this is described by the derivation tree D_{5,f_2}. In the order-1 assumption set of σ_{6,f_1,f_2} we put both assumptions, (1, σ̄_{4,f_1}) and (ne, σ̄_{4,f_2}); the former needs to be provided by the new copy of s^1, and the latter by the lower copy of s^1. Notice that σ_{6,pr,pr} is productive, because for f_1 = f_2 = pr the same (productive) run descriptor σ̄_{4,pr} is used for s^1 twice.
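The derivation trees built so far form a small hierarchy, and the depth measure from Definition 7.11 can be transcribed directly. Below is a sketch; the tuple encodings are simplified stand-ins for trees of the shape of D_3, D_{4,♯}, D_{5,f}, D_{6,f_1,f_2} (run descriptors are abbreviated to strings, since only the tree shape matters for the depth).

```python
# depth(D) from Definition 7.11: 0 for empty and pop nodes, 1 + depth(D') for
# read nodes, and 1 + the maximum over all subtrees for push nodes.

def depth(tree):
    kind = tree[0]
    if kind in ("empty", "pop"):
        return 0
    if kind == "read":
        return 1 + depth(tree[2])
    if kind == "push":
        return 1 + max(depth(tree[3]),
                       max((depth(e) for e in tree[4]), default=0))
    raise ValueError(f"unknown node kind: {kind}")

# Simplified stand-ins for some of the trees of this example:
d3 = ("pop", "gamma", "q3", "(q5, pr)")        # like D_3
d4 = ("read", "q4", d3)                        # like D_{4,#}
d5 = ("pop", "gamma", "q5", "sigma-bar-4")     # like D_{5,f}
d6 = ("push", "gamma", "q6", d5, [d5])         # like D_{6,f1,f2}
d7 = ("push", "gamma", "q7", d6, [d4])         # like D_{7,#,#}

assert [depth(t) for t in (d3, d4, d5, d6, d7)] == [0, 1, 0, 1, 2]
```

Note that pop nodes contribute depth 0 even though they carry an assumption: the tree does not descend into the descriptor delivered by the lower stack.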
Consider also run descriptors σ_{7,f} = (q_7, {(ne, (q_5, pr))}, ∅, f) for f ∈ {np, pr}. We can derive γ ⊢ σ_{7,np} by D_{7,a,a} = (push, γ, q_7, D_{6,np,np}, {D_{4,a}}). After performing push^1_γ from s^2 : s^1 : s^0, we obtain s^2 : (s^1 : s^0) : p+1(s^0). The derivation tree D_{6,np,np} specifies the behavior of the new topmost 0-stack. The composer
(∅, {(1, σ_{4,np}), (ne, σ_{4,np})}; {(1, σ̄_{4,np}), (ne, σ̄_{4,np})}; np)
specifies how the assumptions of rd(D_{6,np,np}) (i.e., of σ_{6,np,np}) can be realized by the 1-stack s^1 : s^0. It says that (the lower copy of) s^0 should provide σ_{4,np}, and the derivation tree D_{4,a} specifies how it is provided. The whole D_{7,a,a} corresponds to a run that reads the letter a twice. Similarly, γ ⊢ σ_{7,np} can be derived by D_{7,b,b} = (push, γ, q_7, D_{6,np,np}, {D_{4,b}}), which corresponds to a run that reads the letter b twice. On the other hand, there is no derivation tree corresponding to a run that first reads the letter a, and then the letter b (or vice versa). The reason is that we have to provide exactly one realization of σ_{4,np} for s^0; this can be either D_{4,a}, or D_{4,b}, but not both. The statement γ ⊢ σ_{7,pr} can be derived by D_{7,♯,♯} = (push, γ, q_7, D_{6,pr,pr}, {D_{4,♯}}), by D_{7,♯,a} = (push, γ, q_7, D_{6,pr,np}, {D_{4,♯}, D_{4,a}}), by D_{7,a,♯} = (push, γ, q_7, D_{6,np,pr}, {D_{4,♯}, D_{4,a}}), and by similar derivation trees using the letter b instead of a. Notice that here we have sets with two derivation trees, D_{4,♯} and D_{4,a}; this is because one provides a realization for σ_{4,pr}, and the other for σ_{4,np}.
Finally, consider run descriptors
τ_1 = (q_3, ∅, ∅, np), τ_2 = (q_4, ∅, ∅, pr), and τ_3 = (q_5, ∅, {(1, τ̄_2)}, np) with τ̄_2 = red_1(τ_2) = (q_4, ∅, pr).
Run descriptors τ_1 and τ_2 have empty assumption sets, hence they describe runs that never uncover stacks that are below the topmost 0-stack (in other words, these are runs whose every prefix is 0-upper). We can derive γ ⊢ τ_1 by E_1 = (empty, γ, q_3), which corresponds to the empty run; γ ⊢ τ_2 by E_2 = (read, q_4, E_1), which corresponds to the run that reads ♯ and stops; and γ ⊢ τ_3 by E_3 = (pop, γ, q_5, τ̄_2).
Annotated Stacks.
A derivation tree provides information about a part of a run R for which a particular 0-stack is responsible. In order to describe the whole R, we have to specify derivation trees for all 0-stacks in R(0). To this end, we annotate stacks using sets of derivation trees.
An annotated k-stack is a positionless k-stack over an extended alphabet, whose elements are pairs (γ, D), where γ ∈ Γ and D is a set of derivation trees having conclusions with the stack symbol γ, and different run descriptors (that is, rd(D) = rd(E) for D, E ∈ D implies D = E). Annotated stacks are denoted using boldface letters, often with their order written in the superscript: s^1, t^2, etc. The projection of each letter in an annotated k-stack s^k to the Γ coordinate is denoted by st(s^k). (It turns out that while considering annotated k-stacks we do not need positions, so for notational convenience we assume that annotated stacks are positionless, i.e., their 0-stacks do not contain positions, conversely to non-annotated stacks.)
We also define the type of an annotated k-stack, which is a subset of T_k:
type((γ, D)) = {rd(D) : D ∈ D},
type([]) = ∅,
type(s^k : s^{k−1}) = {red_k(σ) : σ ∈ type(s^{k−1})}.
We always want to annotate stacks in a consistent way. Intuitively, when a run descriptor assigned to some stack element requires some assumptions, then the part of the stack that is below has to deliver annotations fulfilling these assumptions. Simultaneously, all annotations have to be useful: they cannot provide derivation trees for run descriptors that do not appear as assumptions of annotations assigned higher in the stack. To formalize this, we define below when an annotated k-stack s^k is well-formed.
Definition 7.13. Each annotated 0-stack, and the empty annotated k-stack for each k ≥ 1, is well-formed. An annotated k-stack s^k : s^{k−1} is well-formed if both s^k and s^{k−1} are well-formed, and type(s^k) = ⋃{π(ass_k(σ)) : σ ∈ type(s^{k−1})}, and |type(s^k : s^{k−1})| = |type(s^{k−1})|.
In the sequel, generally we only consider well-formed annotated stacks (except for some moments when we first define an annotated stack, and then we prove that it is well-formed). Notice that besides the condition mentioned earlier (saying that type(s^k) provides exactly the assumptions for run descriptors in type(s^{k−1})), we also have a second condition, |type(s^k : s^{k−1})| = |type(s^{k−1})|, saying that every run descriptor in type(s^k : s^{k−1}) has exactly one realization by a composition of run descriptors from type(s^k) and type(s^{k−1}). Both conditions have the same goal, which is also the same as for analogous conditions in the definition of a composer (Definition 7.8): we want to estimate the number of ♯ symbols read by a run by looking at the number of productive run descriptors associated to 0-stacks in an annotated stack. Two realizations of the same run descriptor, as well as realizations of a run descriptor not appearing in an assumption set, would bias these calculations.
The definition of types and well-formedness connects only the type of s^k : s^{k−1} with the types of s^k and s^{k−1}, but similar conditions can be written for a stack of the form s^k : s^{k−1} : · · · : s^l.
Proposition 7.14.
Let 0 ≤ l ≤ k ≤ n, and let s = s^k : s^{k−1} : · · · : s^l be an annotated k-stack in which each s^i is well-formed. Then,
(T1) type(s) = {red_k(σ) : σ ∈ type(s^l)}.
Moreover, s is well-formed if and only if
(T2) type(s^i) = ⋃{π(ass_i(σ)) : σ ∈ type(s^l)} for every i ∈ [l+1, k],
(T3) |type(s)| = |type(s^l)|.
Proof. Induction on k − l. Suppose first that k − l = 0. In this case, Item (T1) holds because s = s^l and because red_k(σ) = σ when σ ∈ T_k. Moreover s is well-formed by assumption, Item (T2) is true because the interval [l+1, k] is empty, and Item (T3) is true because s = s^l; thus we have the equivalence.
Suppose now that k − l ≥
1, and denote t = s^{k−1} : s^{k−2} : · · · : s^l. We then have s = s^k : t. We apply the induction assumption to t = s^{k−1} : s^{k−2} : · · · : s^l. By Item (T1) of the induction assumption, type(t) = {red_{k−1}(σ) : σ ∈ type(s^l)}, and by the definition of the type of s = s^k : t,
type(s) = {red_k(σ) : σ ∈ type(t)} = {red_k(red_{k−1}(σ)) : σ ∈ type(s^l)}.
Recalling that red_k(red_{k−1}(σ)) = red_k(σ), we obtain Item (T1).
Suppose that s is well-formed. Then, in particular, t is well-formed, so Items (T2) and (T3) hold for the substack t by the induction assumption. Item (T2) from the induction
assumption is the same as our Item (T2) for i ∈ [l+1, k−1]. Moreover, by well-formedness of s = s^k : t,
type(s^k) = ⋃{π(ass_k(σ)) : σ ∈ type(t)} = ⋃{π(ass_k(red_{k−1}(σ))) : σ ∈ type(s^l)}.
Recalling that ass_k(red_{k−1}(σ)) = ass_k(σ) we obtain Item (T2) for i = k. Moreover, Item (T3) of the induction assumption says that |type(t)| = |type(s^l)|, and by well-formedness of s = s^k : t we have that |type(s)| = |type(t)|, thus |type(s)| = |type(s^l)| (Item (T3)).
Conversely, suppose that Items (T2) and (T3) hold for s. Because type(s) is the image of type(t) under the function red_k, and type(t) is the image of type(s^l) under the function red_{k−1}, we necessarily have |type(s)| ≤ |type(t)| ≤ |type(s^l)|, and thus |type(s)| = |type(s^l)| (Item (T3)) implies that |type(s)| = |type(t)| = |type(s^l)|; we thus have Item (T3) for t. Moreover, Item (T2) for t is a direct consequence of this item for s (we only restrict the considered orders i to [l+1, k−1]); thus t is well-formed. Using Item (T2) for i = k, the equality ass_k(σ) = ass_k(red_{k−1}(σ)), and Item (T1) for t we obtain that
type(s^k) = ⋃{π(ass_k(σ)) : σ ∈ type(s^l)} = ⋃{π(ass_k(red_{k−1}(σ))) : σ ∈ type(s^l)} = ⋃{π(ass_k(σ)) : σ ∈ type(t)}.
Together with the equality |type(s)| = |type(t)| this implies that s = s^k : t is well-formed.
An annotated stack s is called singular if |type(s)| = 1.
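Definition 7.13 and the type function lend themselves to a direct transcription. The following is a hedged sketch under our own simplified encoding (0-stacks carry their run-descriptor sets directly, since derivation trees matter only through rd); it is an illustration, not the paper's construction.

```python
# A sketch of types and well-formedness (Definition 7.13): a 0-stack is the
# frozenset of run descriptors assigned to it, and a k-stack is a list of
# (k-1)-stacks with the topmost one last. Descriptors are (state, ass, flag)
# with ass a tuple of (order, frozenset of (monoid element, descriptor)) pairs.

def ass_at(sigma, i):
    for j, s in sigma[1]:
        if j == i:
            return s
    return frozenset()

def proj(pairs):
    return {d for (_, d) in pairs}

def red(sigma, l, k):
    state, ass, flag = sigma
    dropped = [t for i in range(l + 1, k + 1) for (_, t) in ass_at(sigma, i)]
    g = "np" if flag == "np" and all(t[2] == "np" for t in dropped) else "pr"
    return (state, tuple((j, s) for j, s in ass if j > k), g)

def type_of(stack, k):
    """Type of a k-stack: for k = 0 the assigned descriptors; otherwise the
    red_k-image of the type of the topmost (k-1)-stack."""
    if k == 0:
        return set(stack)
    if not stack:
        return set()
    return {red(s, k - 1, k) for s in type_of(stack[-1], k - 1)}

def well_formed(stack, k):
    """Definition 7.13: the rest of the stack delivers exactly the order-k
    assumptions of the top, and each descriptor has one realization."""
    if k == 0 or not stack:
        return True
    rest, top = stack[:-1], stack[-1]
    top_type = type_of(top, k - 1)
    needed = set().union(*(proj(ass_at(s, k)) for s in top_type))
    return (well_formed(rest, k) and well_formed(top, k - 1)
            and type_of(rest, k) == needed
            and len(type_of(stack, k)) == len(top_type))

# A 1-stack of two 0-stacks: the top one assumes tau at order 1, and the
# bottom one delivers it (red_1 of its descriptor equals tau).
tau = ("q", (), "np")
sigma_b = ("q", (), "np")                                # red_1 gives tau
sigma_t = ("p", ((1, frozenset({("w", tau)})),), "np")
b0, t0 = frozenset({sigma_b}), frozenset({sigma_t})
assert well_formed([b0, t0], 1)       # the assumption tau is delivered
assert not well_formed([t0], 1)       # nobody delivers tau below the top
```

Removing the delivering 0-stack, or adding a second one with the same reduced descriptor, breaks well-formedness in exactly the two ways discussed above.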
When an annotated n-stack sⁿ is singular, we define conf(sⁿ) to be the configuration (q, pos⁺(st(sⁿ))), where q is the state of the only run descriptor in type(sⁿ).
As the type of a configuration c, denoted type_{A,ϕ}(c), we take the union of type(top(sⁿ)) over all well-formed singular annotated n-stacks sⁿ such that conf(sⁿ) = c,
type_{A,ϕ}(c) = ⋃ { type(top(sⁿ)) : sⁿ well-formed, conf(sⁿ) = c }.
We remark that in the union we could also allow well-formed annotated stacks sⁿ that are not necessarily singular, but are such that pos⁺(st(sⁿ)) is the stack of c and the state of all run descriptors in type(sⁿ) is the state of c. On the other hand, in general there does not exist a single well-formed annotated stack sⁿ such that type(top(sⁿ)) = type_{A,ϕ}(c). Namely, we can have a situation like in Example 7.9, where we cannot assign both σ and σ′ to the topmost 0-stack, as both of them result in the same run descriptor ξ for the topmost 1-stack.
Actually, we see a direct connection between the well-formedness property and composers.
Proposition 7.15.
Let 0 ≤ l ≤ k ≤ n, let s = s_k : s_{k−1} : · · · : s_l be an annotated k-stack in which each s_i is well-formed, and let Ψ_k ⊆ M × T_k. The following two conditions are equivalent:
• there exists a composer (Φ_k, Φ_{k−1}, . . . , Φ_l; Ψ_k; f) such that π(Φ_i) = type(s_i) for each i ∈ [l, k], and
• s is well-formed and π(Ψ_k) = type(s).
Proof. Suppose first that we have a composer (Φ_k, Φ_{k−1}, . . . , Φ_l; Ψ_k; f) such that
π(Φ_i) = type(s_i) for i ∈ [l, k]. (7.1)
Notice that
type(s) = {red_k(σ) : σ ∈ type(s_l)} = {red_k(σ) : σ ∈ π(Φ_l)} = π(Ψ_k), (7.2)
where the consecutive equalities follow from Item (T1) of Proposition 7.14, from Equality (7.1), and from Condition (C2) of Definition 7.8, respectively. Moreover, for i ∈ [l+1, k],
type(s_i) = π(Φ_i) = ⋃{π(ass_i(σ)) : σ ∈ π(Φ_l)} = ⋃{π(ass_i(σ)) : σ ∈ type(s_l)}, (7.3)
where the equalities are consequences of Equality (7.1), of Condition (C1) of Definition 7.8, and of Equality (7.1) again. Simultaneously,
|type(s)| = |π(Ψ_k)| = |π(Φ_l)| = |type(s_l)|, (7.4)
where the consecutive equalities follow from Equality (7.2), from Condition (C3) of Definition 7.8, and from Equality (7.1), respectively. Equalities (7.3) and (7.4) give Items (T2) and (T3) of Proposition 7.14, which implies that s is well-formed; together with Equality (7.2) this gives the thesis.
Conversely, suppose that
s is well-formed and π(Ψ_k) = type(s). (7.5)
We define
Φ_l = {(m, σ) : σ ∈ type(s_l) ∧ (m, red_k(σ)) ∈ Ψ_k}, and (7.6)
Φ_i = ⋃{ m ◦ ass_i(σ) : (m, σ) ∈ Φ_l } for i ∈ [l+1, k]. (7.7)
By Equality (7.5) and by Item (T1) of Proposition 7.14 we have that
π(Ψ_k) = type(s) = {red_k(σ) : σ ∈ type(s_l)}. (7.8)
This means that for every σ ∈ type(s_l) there is some m such that (m, red_k(σ)) ∈ Ψ_k.
In the light of Equality (7.6) this implies that
π(Φ_l) = type(s_l). (7.9)
On the other hand, by Equality (7.7) and by Item (T2) of Proposition 7.14, for i ∈ [l+1, k], we have that
π(Φ_i) = ⋃{π(ass_i(σ)) : σ ∈ π(Φ_l)} = type(s_i). (7.10)
Let us now check particular conditions of Definition 7.8. Condition (C1) is immediate from Equality (7.7). By Equality (7.6), for every pair (m, σ) ∈ Φ_l, the pair (m, red_k(σ)) is in Ψ_k. Conversely, by Equality (7.8), every pair in Ψ_k is of the form (m, red_k(σ)) with σ ∈ type(s_l), hence (m, σ) ∈ Φ_l by Equality (7.6). This implies Condition (C2). Using consecutively Equality (7.5), Item (T3) of Proposition 7.14, and Equality (7.9), we obtain Condition (C3):
|π(Ψ_k)| = |type(s)| = |type(s_l)| = |π(Φ_l)|.
Condition (C4) always holds for some f ∈ {np, pr}. Thus (Φ_k, Φ_{k−1}, . . . , Φ_l; Ψ_k; f) is a composer, which together with Equalities (7.9) and (7.10) gives the thesis.
Example 7.16.
This is a continuation of Example 7.12. With the derivation trees considered there,
s₁ = [[(γ, {E₁}), (γ, {E₂})], [(γ, ∅), (γ, {D_{1,♯,a}})]] and
s₂ = [[(γ, {E₁}), (γ, {E₂})], [(γ, ∅), (γ, {D_{1,♯}, D_{1,a}}), (γ, {D_{0,pr}, D_{0,np}})]]
are well-formed (singular) annotated 2-stacks. On the other hand, the following annotated 2-stacks are not well-formed:
s₃ = [[(γ, {E₁}), (γ, {E₂})], [(γ, {E₁}), (γ, {D_{1,♯,a}})]],
s₄ = [[(γ, {E₁}), (γ, {E₂})], [(γ, ∅), (γ, {D_{1,♯}, D_{1,a}, D_{1,b}}), (γ, {D_{0,pr}, D_{0,np}})]], and
s₅ = [[(γ, {E₁}), (γ, {E₂})], [(γ, ∅), (γ, {D_{1,♯}}), (γ, {D_{0,pr}, D_{0,np}})]].
In s₃, we provide a spare derivation tree E₁, not needed by the derivation trees assigned above. In s₄, two derivation trees, D_{1,a} and D_{1,b}, provide the same run descriptor. Finally, in s₅, we are missing a derivation tree that would provide σ_{1,np}.
When Ψ̃ is a subset of the type of a well-formed annotated k-stack s, we can remove some of the annotations in s in order to obtain a well-formed annotated k-stack s↾Ψ̃ whose type is Ψ̃. We do this by induction:
• For k = 0, we restrict the set of derivation trees in s to those trees whose run descriptor is in Ψ̃.
• The type of the empty stack is empty, so we need not restrict it in any way.
• For s = s_k : s_{k−1}, we restrict s_{k−1} to the set Φ̃ containing those σ ∈ type(s_{k−1}) for which red_k(σ) ∈ Ψ̃, and we restrict s_k to ⋃_{σ ∈ Φ̃} π(ass_k(σ)).
Plan for the Remaining Part of the Section.
We have already defined types of configurations, as needed for Theorem 7.3. The type of a configuration c is defined via the existence of annotated stacks s such that conf(s) = c. Theorem 7.3 says that if two configurations c, d have the same type, and we have a run starting in c, then a similar run starts in d. Roughly, the strategy of the proof is as follows. First, based on the run starting in c, we construct an annotated stack s with conf(s) = c, corresponding to this run. More precisely, we do not process the whole run in this way, but rather its particular fragments. Then, because the types of c and d are equal, there exists an annotated stack t with conf(t) = d, and with rd(t) = rd(s). Having t we proceed in the opposite direction: based on the annotated stack t we construct a run starting in d, satisfying the thesis of the theorem.
Recall that an annotated stack is, roughly, a description of a run. The run described by an annotated stack is called an annotated run, and is defined in Subsection 7.2. In Lemma 7.33 (located in Subsection 7.4) we prove that annotated runs have the expected form, that is, that using an assumption of a run descriptor corresponds to performing a return. In Subsection 7.5 we present the opposite direction: how to construct an annotated stack based on a run. The proof of Theorem 7.3 is finalized in Subsection 7.6.
Simultaneously, we prepare ourselves for a proof of Theorem 7.4, which additionally talks about the number of ♯ symbols read by a run. We thus need to estimate the number of ♯ symbols read by an annotated run. To this end, in Subsection 7.2 we define two numbers corresponding to an annotated stack s, namely low(s) and high(s). They provide a lower bound and, respectively, an upper bound for the number of ♯ symbols read by an annotated run starting in s, as shown in Lemma 7.22.
(We also define there a third number, len(s), and we prove that it gives an upper bound for the length of an annotated run starting in s. Its role is auxiliary: it is only needed for showing that the constructed annotated run is finite. Actually, this is needed already while proving Theorem 7.3.) The essential property is that high(s) can be bounded by a function of low(s), as shown in Proposition 7.32, contained in Subsection 7.3. Thus low(s) itself estimates the number of ♯ symbols read by an annotated run starting in s. With such a function low we can define sequence-equivalence, as needed in Theorem 7.4. Namely, having a sequence of configurations (all of the same type), for every run descriptor σ it is enough to know one thing: whether there is a sequence of annotated stacks corresponding to these configurations (and to the run descriptor σ), and such that the values of low for these annotated stacks are bounded. If this is the case, using these annotated stacks we can reproduce runs that read a bounded number of ♯ symbols. If not, then a sequence of annotated stacks corresponding to the considered configurations also exists, but the values of low for these annotated stacks are unbounded, and in effect the reproduced runs read an unbounded number of ♯ symbols. A proof of Theorem 7.4, following these ideas, is given in Subsection 7.7.
7.2. Annotated Runs.
In this subsection we describe how, having an annotated stack, we can reproduce a run. This is formalized in the notion of annotated runs. We also relate the number of ♯ symbols read by a run to the number of productive run descriptors in the annotations of the starting configuration. Later, in Subsection 7.5, we do the converse: we show how to construct an annotated stack based on a run. Recall that when a well-formed annotated stack is singular, then its topmost 0-stack is singular as well (cf. Item (T3) of Proposition 7.14).
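Before the formal definition, the mechanism can be illustrated by a drastically simplified, order-1-only sketch: every 0-stack carries one derivation tree telling the automaton what to do next, and following successors reproduces a run. All names below are ours, not the paper's; the real definition (Definition 7.17 below) handles arbitrary orders, sets of trees, and restrictions of annotations.

```python
# A toy, order-1-only model of the "successor" idea: each 0-stack is a pair
# (stack symbol, derivation tree), and the topmost tree dictates the next move.
# Trees (our own simplified shape): ("empty",), ("read", letter, tree),
# ("pop",), or ("push", symbol, tree_for_new_top, tree_left_below).
from typing import Optional

Stack = list  # list of (symbol, tree) pairs; the top is the last element

def successor(s: Stack) -> Optional[tuple]:
    """Return (next stack, letter read) or None if the tree says 'empty'."""
    symbol, tree = s[-1]
    if tree[0] == "empty":
        return None
    if tree[0] == "read":                      # consume a letter, keep the stack
        _, letter, rest = tree
        return s[:-1] + [(symbol, rest)], letter
    if tree[0] == "pop":                       # drop the topmost 0-stack
        return s[:-1], ""
    _, new_sym, top_tree, below_tree = tree    # "push": duplicate the top,
    return s[:-1] + [(symbol, below_tree), (new_sym, top_tree)], ""

def sharps_read(s: Stack) -> int:
    """Follow successors until the annotation runs out; count '#' letters."""
    count = 0
    while (step := successor(s)) is not None:
        s, letter = step
        count += letter == "#"
    return count

# This annotation pushes a copy of γ, reads '#' on it, pops it,
# then reads '#' once more and stops: two '#'s are read in total.
tree = ("push", "γ", ("read", "#", ("pop",)),
        ("read", "#", ("empty",)))
print(sharps_read([("γ", tree)]))  # → 2
```

In this miniature, the number of `"read"` nodes labelled `"#"` in the annotation bounds the number of ♯ symbols read, which is the shape of the counting argument (low, high, len) developed later in the subsection.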
Definition 7.17.
Let s = s_n : s_{n−1} : · · · : s₀ be a well-formed singular annotated n-stack, where s₀ = (γ, {D}). We define the successor of s.
(1) If D = (empty, γ, p), then s has no successor.
(2) If D = (read, p, D′), then the successor is s_n : s_{n−1} : · · · : s₁ : (γ, {D′}).
(3) If D = (pop, γ, p, τ_k), then the successor is s_n : s_{n−1} : · · · : s_k.
(4) Suppose that D = (push, γ, p, D′, 𝒟). Let α, k, Ψ_i, Φ_i be as in Definition 7.11(4). In this situation, the successor of s is
s_n : s_{n−1} : · · · : s_{k+1} : t_k : s_{k−1}↾π(Ψ_{k−1}) : s_{k−2}↾π(Ψ_{k−2}) : · · · : s₁↾π(Ψ₁) : (α, {D′}),
where t_k = s_k↾π(Φ_k) : s_{k−1}↾π(Φ_{k−1}) : · · · : s₁↾π(Φ₁) : (γ, 𝒟).
We notice that in Case (4) for i ∈ [1, k] we have that π(Φ_i) ⊆ type(s_i), and for i ∈ [1, k−1] we have that π(Ψ_i) ⊆ type(s_i), and thus the restrictions are legal. Indeed, from Proposition 7.14 applied to s we know that type(s_i) = π(ass_i(rd(D))) for all i ∈ [1, n]; moreover, by Definition 7.11(4), ass_i(σ) = Ψ_i ∪ Φ_i for i ∈ [1, k−1], and ass_k(σ) = Φ_k.
Proposition 7.18.
Let t be the successor of a well-formed singular annotated n-stack s. Then t is singular, well-formed, and conf(t) is a successor of conf(s) (in the considered automaton).
Proof. Let s = s_n : s_{n−1} : · · · : s₀, and s₀ = (γ, {D}), and σ = rd(D). All s_i are well-formed, because s is well-formed. Moreover, by Proposition 7.14, type(s_i) = π(ass_i(σ)) for all i ∈ [1, n]. We have several cases depending on the shape of D. We cannot have D = (empty, γ, p), as then s has no successor.
Suppose that D = (read, p, D′). Recall from Definition 7.11 that π(ass_i(rd(D′))) = π(ass_i(σ)) for all i ∈ [1, n]. By Proposition 7.14, t is singular (its type is a singleton {red_n(rd(D′))}) and well-formed. Moreover, δ(γ, p) = read(⃗q), where ⃗q(a) equals the state of rd(D′) for some a ∈ A. The transition from conf(s) reading this a leads to conf(t).
Next, suppose that D = (pop, γ, p, τ_k). By Definition 7.11 we have that π(ass_k(σ)) = {τ_k} (hence type(s_k) = {τ_k}) and ass_i(σ) = ass_i(τ_k) (hence type(s_i) = π(ass_i(τ_k))) for i ∈ [k+1, n]. Proposition 7.14 applied to t = s_n : s_{n−1} : · · · : s_k implies that it is singular and well-formed. Moreover, δ(γ, p) = (q, pop_k), where q is the state of τ_k, so the transition from conf(s) leads to conf(t).
Finally, suppose that D = (push, γ, p, D′, 𝒟). Let α, k, Ψ_i, Φ_i, f be as in Definition 7.11(4), and t_k as in Definition 7.17(4). Obviously, type(s_i↾π(Φ_i)) = π(Φ_i), for all i ∈ [1, k]; moreover, type((γ, 𝒟)) = {rd(E) : E ∈ 𝒟} = π(Φ₀), by Definition 7.11(4). Furthermore, again by this definition, (Φ_k, Φ_{k−1}, . . . , Φ₀; Ψ_k; f) is a composer. It follows from Proposition 7.15 that t_k is well-formed, and type(t_k) = π(Ψ_k) (recall that t_k = s_k↾π(Φ_k) : s_{k−1}↾π(Φ_{k−1}) : · · · : s₁↾π(Φ₁) : (γ, 𝒟)). By Definition 7.11(4), for all i ∈ [1, n] we have that Ψ_i = ass_i(rd(D′)). Moreover,
• for i ∈ [k+1, n], we have ass_i(σ) = Ψ_i by Definition 7.11(4), so type(s_i) = π(ass_i(σ)) = π(Ψ_i) = π(ass_i(rd(D′)));
• type(t_k) = π(Ψ_k) = π(ass_k(rd(D′)));
• for i ∈ [1, k−1], type(s_i↾π(Ψ_i)) = π(Ψ_i) = π(ass_i(rd(D′))).
In effect t, which is a composition of these stacks and of (α, {D′}), is singular and well-formed by Proposition 7.14. Additionally, δ(γ, p) = (q, push_α^k), where q is the state of rd(D′), so the transition from conf(s) leads to conf(t).
An annotated run R is a sequence s₀, . . . , s_m of well-formed singular n-stacks in which s_i is the successor of s_{i−1} for each i ∈ [1, m].
By replacing each s_i by conf(s_i) we obtain a run denoted st(R).
Notice that an annotated stack s may have fewer successors than conf(s). Indeed, in the case of D = (empty, γ, p) there are no successors of s, but conf(s) may have successors. Similarly, in the case of D = (read, p, D′) there is exactly one successor of s (the state in rd(D′) determines which letter should be read), while in a run from conf(s) we can read any letter.
Example 7.19.
Recall the 2-DPDA A from Example 7.12, and the annotated stack
s₁ = [[(γ, {E₁}), (γ, {E₂})], [(γ, ∅), (γ, {D_{1,♯,a}})]]
from Example 7.16. The successors of s₁ are, consecutively,
[[(γ, {E₁}), (γ, {E₂})], [(γ, ∅), (γ, {D_{1,♯}, D_{1,a}}), (γ, {D_{0,pr}, D_{0,np}})]],
[[(γ, {E₁}), (γ, {E₂})], [(γ, ∅), (γ, {D_{1,a}}), (γ, {D_{0,np}})], [(γ, ∅), (γ, {D_{1,♯}}), (γ, {D_{0,pr}})]],
[[(γ, {E₁}), (γ, {E₂})], [(γ, ∅), (γ, {D_{1,a}}), (γ, {D_{0,np}})], [(γ, ∅), (γ, {D_{1,♯}})]],
[[(γ, {E₁}), (γ, {E₂})], [(γ, ∅), (γ, {D_{1,a}}), (γ, {D_{0,np}})], [(γ, ∅), (γ, {D₁})]],
[[(γ, {E₁}), (γ, {E₂})], [(γ, ∅), (γ, {D_{1,a}}), (γ, {D_{0,np}})]],
[[(γ, {E₁}), (γ, {E₂})], [(γ, ∅), (γ, {D_{1,a}})]],
[[(γ, {E₁}), (γ, {E₂})], [(γ, ∅), (γ, {D₁})]],
[[(γ, {E₁}), (γ, {E₂})]],
[[(γ, {E₁})]],
[[(γ, {E₀})]];
the last of them has no more successors. In the transition between the first and the second line, the pair D_{0,pr}, D_{0,np} says that the new topmost 0-stack should be annotated by D_{0,pr}, and that the previously topmost 0-stack should be annotated by D_{0,np}. Because ass₁(rd(D_{0,pr})) = {σ_{1,pr}} and ass₁(rd(D_{0,np})) = {σ_{1,np}}, we know that D_{1,♯} should be taken to the topmost 1-stack (we have that red₁(rd(D_{1,♯})) = σ_{1,pr}), and that D_{1,a} should be left in the second topmost 1-stack (we have that red₁(rd(D_{1,a})) = σ_{1,np}).
We can see that not every run is of the form st(R) for some annotated run R. For example, this is the case for the run that starts in (q, pos⁺([[γ, γ], [γ, γ]])) and reads a, then b, and then ♯.
Indeed, in order to obtain such a run as an annotated run, to the topmost 0-stack we would have to assign a hypothetical derivation tree D_{1,a,b}, saying that we should first read a and then b, but there is no such derivation tree (as already explained in Example 7.12). Another run that is not of the form st(R) for any annotated run R is the run that starts in (q, pos⁺([[γ, γ], [γ, γ, γ]])) and reads a, then b, and then ♯. This time the problem is that to the second topmost 0-stack we cannot assign simultaneously D_{1,a} and D_{1,b}, as they both have the same run descriptor. More generally, the push in Case (4) of Definition 7.17 leaves the same annotations in the original substack as in the copied substack, up to a restriction, which forces the fragments of the annotated run corresponding to these substacks to be the same.
A priori there might exist an infinite annotated run. But, as we see below, this is impossible: always after some number of steps we reach an annotated stack with no successors (Case (1) of Definition 7.17). Moreover, we show that the number of ♯ symbols read by the run starting in an annotated stack s can be estimated by the number of productive run descriptors in the annotations of s. To this end, to each well-formed annotated stack s we assign three natural numbers: low(s), high(s), and len(s). The first two of them give a lower and an upper bound on the number of ♯ symbols read by our run, and the last one gives an upper bound on the length of the run.
Definition 7.20.
For positive integers m₁, . . . , m_k we define pow(m₁, . . . , m_k) by induction on k:
pow() = 1, and pow(m₁, m₂, . . . , m_k) = (1 + m₁)^{pow(m₂,...,m_k)} − 1.
Notice that, in particular, pow(m₁) = m₁ and pow(m₁, m₂) = (1 + m₁)^{m₂} − 1.
Definition 7.21.
For a well-formed annotated k-stack s we define natural numbers low(s), high(s), and len(s) by induction on the structure of s.
• If s = (γ, 𝒟), we take
low(s) = |type(s) ∩ T_pr|,
high(s) = ∏_{D ∈ 𝒟 : rd(D) ∈ T_pr} C_{depth(D)}, and
len(s) = ∏_{D ∈ 𝒟} C_{depth(D)},
where C_z is defined inductively: C₀ = 2, and C_{z+1} = (2·|T|)^n · (C_z)^{|T|+1}.
• We take low([ ]) = 0 and high([ ]) = len([ ]) = 1.
• If s = s_k : s_{k−1}, we take
low(s) = ∑_{σ ∈ type(s_{k−1})} ( low(s_k↾π(ass_k(σ))) + low(s_{k−1}↾{σ}) ),
high(s) = ∏_{σ ∈ type(s_{k−1})} pow( high(s_k↾π(ass_k(σ))), high(s_{k−1}↾{σ}) ), and
len(s) = ∏_{σ ∈ type(s_{k−1})} pow( len(s_k↾π(ass_k(σ))), len(s_{k−1}↾{σ}) ).
The three numbers are interesting for us because of the following lemma. Recall that, for a run R, by ♯(R) we denote the number of ♯ symbols read by R.
Lemma 7.22. If R is an annotated run, then
low(R(0)) ≤ ♯(st(R)) + low(R(|R|)),
high(R(0)) ≥ ♯(st(R)) + high(R(|R|)), and
len(R(0)) ≥ |R| + len(R(|R|)).
This is one of the key lemmas of this section. We now give some examples and intuitions behind this lemma, and behind the definitions of low, high, and len; after that, we prove this lemma.
We see that the last inequality of Lemma 7.22 bounds the length of an annotated run R by len(R(0)), that is, by a function of the annotated stack R(0), in which the annotated run starts. Similarly, the second inequality bounds the number of ♯ symbols read by R, by another function of R(0), namely by high(R(0)). The additional components added on the right of these inequalities only strengthen them. Conversely, the role of the first inequality is to give a lower bound for the number of ♯ symbols read by R. If R is maximal (i.e., cannot be prolonged), then the topmost 0-stack of R(|R|) is annotated by a derivation tree of the form (empty, γ, q), and all other 0-stacks are annotated by empty sets. In effect low(R(|R|)) = 0, and we simply obtain that low(R(0)) ≤ ♯(st(R)).
But for an arbitrary run R, which may end prematurely, even before reading any ♯ symbol, we have to add low(R(|R|)) on the right side of the inequality.
Roughly speaking, low(R(0)) counts the number of productive run descriptors in the annotations of R(0). The intuition is that every productive run descriptor is responsible for increasing the number of ♯ symbols read, so the first inequality of Lemma 7.22 should hold with such a definition of low.
We see that the function high takes into account the same productive run descriptors as low, but instead of sums we use products and the pow function. Indeed, it is shown in Proposition 7.27 that if all run descriptors in the annotations of a substack s_{k−1} are nonproductive, then high(s_{k−1}) = 1 (and low(s_{k−1}) = 0). Suppose that s_{k−1} is singular, that is, its type is a singleton {σ}. If s_k : s_{k−1} is well-formed, we have type(s_k) = π(ass_k(σ)). Observe that pow(x, 1) = (1 + x) − 1 = x for every x, and thus
high(s_k : s_{k−1}) = pow( high(s_k↾π(ass_k(σ))), high(s_{k−1}↾{σ}) ) = pow( high(s_k), high(s_{k−1}) ) = pow( high(s_k), 1 ) = high(s_k).
This is similar to the behavior of the low function, as in such a case we also have
low(s_k : s_{k−1}) = low(s_k↾π(ass_k(σ))) + low(s_{k−1}↾{σ}) = low(s_k) + low(s_{k−1}) = low(s_k) + 0 = low(s_k).
Technical details of the definition of high were chosen so that it is possible to perform a proof, but two facts are important here. First, nonproductive run descriptors cannot be responsible for increasing the number of ♯ symbols read, and thus (in order to obtain the second inequality of Lemma 7.22) it is enough to have a function high that takes into account only productive run descriptors (i.e., ignores nonproductive run descriptors). Second, because high(R(0)) and low(R(0)) are counting the same productive run descriptors, only in a different way, the two values are related. Namely, high(R(0)) can be bounded by a function of low(R(0)). This essential property is shown in Proposition 7.32, in the next subsection.
The function len is defined very similarly to high, but it takes into account all run descriptors, not only productive ones. Thus, roughly, it depends on the size of the annotated stack.
Example 7.23.
We continue the previous examples, concerning the 2-DPDA A from Example 7.12. Consider the annotated stack s₁ = [[t₄, t₃], [t₂, t₁]], where
t₁ = (γ, {D_{1,♯,a}}), t₂ = (γ, ∅), t₃ = (γ, {E₂}), t₄ = (γ, {E₁}).
We have that
type(t₁) = {rd(D_{1,♯,a})} = {σ_{1,pr}}, π(ass₁(σ_{1,pr})) = ∅,
type(t₂) = ∅,
type(t₃) = {rd(E₂)} = {τ₂}, π(ass₁(τ₂)) = {τ₁},
type(t₄) = {rd(E₁)} = {τ₁}, π(ass₁(τ₁)) = ∅.
In effect
type([t₂, t₁]) = {red₁(σ) : σ ∈ type(t₁)} = {red₁(σ_{1,pr})}, π(ass₂(red₁(σ_{1,pr}))) = {(q, pr)},
type([t₂]) = {red₁(σ) : σ ∈ type(t₂)} = ∅,
type([t₄, t₃]) = {red₁(σ) : σ ∈ type(t₃)} = {red₁(τ₂)}, π(ass₂(red₁(τ₂))) = ∅,
type([t₄]) = {red₁(σ) : σ ∈ type(t₄)} = {red₁(τ₁)},
and finally
type([[t₄, t₃]]) = {red₂(σ) : σ ∈ type([t₄, t₃])} = {red₂(red₁(τ₂))} = {(q, pr)}.
Recall that τ₂ ∈ T_np and σ_{1,pr}, τ₁ ∈ T_pr. We can compute low(s₁) as follows:
low([t₄]) = ∑_{σ ∈ type(t₄)} ( low([ ]↾π(ass₁(σ))) + low(t₄↾{σ}) ) = low([ ]↾∅) + low(t₄↾{τ₁}) = low([ ]) + low(t₄) = 0 + 1 = 1,
low([[t₄, t₃]]) = ∑_{σ ∈ type([t₄, t₃])} ( low([ ]↾π(ass₂(σ))) + low([t₄, t₃]↾{σ}) )
= low([ ]↾∅) + low([t₄, t₃]↾{red₁(τ₂)}) = 0 + low([t₄, t₃])
= ∑_{σ ∈ type(t₃)} ( low([t₄]↾π(ass₁(σ))) + low(t₃↾{σ}) )
= low([t₄]↾{τ₁}) + low(t₃↾{τ₂}) = low([t₄]) + low(t₃) = 1 + 0 = 1,
low([t₂]) = ∑_{σ ∈ type(t₂)} ( low([ ]↾π(ass₁(σ))) + low(t₂↾{σ}) ) = 0,
low([t₂, t₁]) = ∑_{σ ∈ type(t₁)} ( low([t₂]↾π(ass₁(σ))) + low(t₁↾{σ}) )
= low([t₂]↾∅) + low(t₁↾{σ_{1,pr}}) = low([t₂]) + low(t₁) = 0 + 1 = 1,
low(s₁) = ∑_{σ ∈ type([t₂, t₁])} ( low([[t₄, t₃]]↾π(ass₂(σ))) + low([t₂, t₁]↾{σ}) )
= low([[t₄, t₃]]↾{(q, pr)}) + low([t₂, t₁]↾{red₁(σ_{1,pr})})
= low([[t₄, t₃]]) + low([t₂, t₁]) = 1 + 1 = 2.
We see that the restrictions of annotated stacks appearing in the above formulas do not modify these annotated stacks (this is the case because all annotations are either singletons or empty sets).
Next, we compute high(s₁). To this end, recall that depth(D_{1,♯,a}) = 2 and depth(E₁) = 1. This time we give the formulas ignoring the restrictions, as again they do not change anything:
high([t₄, t₃]) = pow(high([t₄]), high(t₃)) = pow(pow(high([ ]), high(t₄)), high(t₃))
= pow(pow(1, C_{depth(E₁)}), 1) = pow(pow(1, C₁), 1) = 2^{C₁} − 1,
high([t₂, t₁]) = pow(high([t₂]), high(t₁)) = pow(1, C_{depth(D_{1,♯,a})}) = pow(1, C₂) = 2^{C₂} − 1,
high(s₁) = pow(high([[t₄, t₃]]), high([t₂, t₁]))
= pow(pow(high([ ]), high([t₄, t₃])), high([t₂, t₁]))
= pow(pow(1, 2^{C₁} − 1), 2^{C₂} − 1) = 2^{(2^{C₁}−1)·(2^{C₂}−1)} − 1.
Notice that high(t₃) = 1, because type(t₃) contains only a nonproductive run descriptor.
Similarly, we can compute len(s₁), but this should also take into account E₂, whose depth is 0:
len([t₄, t₃]) = pow(len([t₄]), len(t₃)) = pow(pow(len([ ]), len(t₄)), len(t₃))
= pow(pow(1, C_{depth(E₁)}), C_{depth(E₂)}) = pow(pow(1, C₁), C₀) = 2^{C₀·C₁} − 1,
len([t₂, t₁]) = pow(len([t₂]), len(t₁)) = pow(1, C_{depth(D_{1,♯,a})}) = pow(1, C₂) = 2^{C₂} − 1,
len(s₁) = pow(len([[t₄, t₃]]), len([t₂, t₁]))
= pow(pow(len([ ]), len([t₄, t₃])), len([t₂, t₁]))
= pow(pow(1, 2^{C₀·C₁} − 1), 2^{C₂} − 1) = 2^{(2^{C₀·C₁}−1)·(2^{C₂}−1)} − 1.
We have said that low counts the number of productive run descriptors in all annotations. This is a good high-level intuition, but strictly speaking this is not true. Indeed, say that we have two run descriptors σ, σ′ ∈ type(s_{k−1}) such that ass_k(σ) = ass_k(σ′). Then, in the formula for low(s_k : s_{k−1}) we add low(s_k↾π(ass_k(σ))) twice (once for σ, and once for σ′). This is illustrated by the next example.
Example 7.24.
Consider the 2-DPDA A depicted below; its stack alphabet is {γ}, and its input alphabet is {♯}.
[diagram of A: states q₀, q₁, . . . connected by push, pop, and ♯-reading transitions]
This time we consider the trivial monoid M = {1}. Denote σ_i = (q_i, {(1, (q, pr))}, pr) for i ∈ {1, 2, 3}. We are interested in the derivation trees
D₀ = (push, γ, q₀, (pop, γ, q₁, σ₁), {(pop, γ, q₂, σ₂)}),
D_i = (pop, γ, q_i, σ₃) for i ∈ {1, 2}, and
D₃ = (read, q₃, (pop, γ, q₄, (q₅, pr))).
The annotated 1-stack s = [(γ, {D₃}), (γ, {D₁, D₂}), (γ, {D₀})] is well-formed. Notice that the run descriptor of D₃ is productive, while the run descriptors of D₀, D₁, and D₂ are nonproductive. We can see, though, that D₀ uses both D₁ and D₂ as assumptions, and both D₁ and D₂ use D₃ as an assumption. In effect low(s) counts the productive run descriptor rd(D₃) twice:
low(s) = low([(γ, {D₃}), (γ, {D₁, D₂})]) + low((γ, {D₀}))
= low([(γ, {D₃})]) + low((γ, {D₁})) + low([(γ, {D₃})]) + low((γ, {D₂})) + 0
= 1 + 0 + 1 + 0 + 0 = 2.
Remark 7.25.
Consider the function exp_n(k) defined by exp₀(k) = k and exp_{n+1}(k) = 2^{exp_n(k)}. One can construct an n-DPDA A that recognizes the language {♯^k a ♯^{exp_{n−1}(k)} : k ∈ ℕ} (see Blumensath [Blu08, Example 9] for a very similar construction). After reading a prefix ♯^k a, the number of 0-stacks in the n-stack s of A is linear in k. It is possible to annotate s, resulting in an annotated stack s′, such that the maximal annotated run starting from s′ reads exp_{n−1}(k) ♯ symbols. It follows that the high function (and thus len as well) has to be at least (n−1)-fold exponential in the size of the n-stack. According to our definition, high and len are (in the worst case) (n+1)-fold exponential, which is slightly larger than necessary. We believe that it is possible to save these two exponentiations, at the cost of complicating proofs.
We now prove Lemma 7.22; the proof fills the rest of this subsection. We start by proving some (in)equalities regarding the pow function.
Proposition 7.26.
The following is true for all positive integers:
pow(a₁, . . . , a_k, pow(b₁, . . . , b_l)) = pow(a₁, . . . , a_k, b₁, . . . , b_l), (7.11)
pow(a₁, . . . , a_k, pow(c₀, c₁, . . . , c_l), b₁, . . . , b_l) ≤ pow(a₁, . . . , a_k, c₀, b₁·c₁, . . . , b_l·c_l), (7.12)
pow(a₁, . . . , a_{i−1}, a_i^x, a_{i+1}, . . . , a_{k−1}, a_k) ≤ pow(a₁, . . . , a_{k−1}, x·a_k) for i < k, (7.13)
pow(a₁, . . . , a_{k−1}, a_k) + 1 ≤ pow(a₁, . . . , a_{k−1}, a_k + 1), (7.14)
pow(a₁, . . . , a_k) · pow(b₁, . . . , b_k) ≤ pow(a₁·b₁, . . . , a_k·b_k). (7.15)
Equality (7.11) can be shown by induction on k . For k = 0 we have that pow ( pow ( b , . . . , b l )) = (1 + pow ( b , . . . , b l )) pow () −
1= (1 + pow ( b , . . . , b l )) − pow ( b , . . . , b l ) , and for k > pow ( a , . . . , a k , pow ( b , . . . , b l )) = (1 + a ) pow ( a ,...,a k , pow ( b ,...,b l )) −
1= (1 + a ) pow ( a ,...,a k ,b ,...,b l ) − pow ( a , . . . , a k , b , . . . , b l ) . For Inequality (7.12) suppose first that k = 0. By Inequality (7.15), which we provebelow, it follows that pow ( pow ( c , c , . . . , c l ) , b , . . . , b l ) = (1 + pow ( c , c , . . . , c l )) pow ( b ,...,b l ) −
1= (1 + (1 + c ) pow ( c ,...,c l ) − pow ( b ,...,b l ) −
1 = (1 + c_0)^{pow(b_1,...,b_l) · pow(c_1,...,c_l)} − 1
≤ (1 + c_0)^{pow(b_1 c_1,...,b_l c_l)} − 1 = pow(c_0, b_1 c_1, ..., b_l c_l).

It is easy to see that pow is monotone, thus the general form of Inequality (7.12) follows from the above special form thanks to Equality (7.11):

pow(a_1, ..., a_k, pow(c_0, c_1, ..., c_l), b_1, ..., b_l)
= pow(a_1, ..., a_k, pow(pow(c_0, c_1, ..., c_l), b_1, ..., b_l))
≤ pow(a_1, ..., a_k, pow(c_0, b_1 c_1, ..., b_l c_l)) = pow(a_1, ..., a_k, c_0, b_1 c_1, ..., b_l c_l).

Heading toward proving Inequality (7.13), we first show that

x · pow(a_{i+1}, ..., a_k) ≤ pow(a_{i+1}, ..., a_{k−1}, x·a_k), (7.16)

where 0 ≤ i < k, and the numbers x, a_{i+1}, ..., a_k are positive integers. This is shown by induction on k − i. When k − i = 1, we simply have that

x · pow(a_k) = x · ((1 + a_k) − 1) = (1 + x·a_k) − 1 = pow(x·a_k).

Suppose that k − i > 1. Notice that x·b ≤ b^x for all x ∈ N and b ≥ 2. Thus,

x · pow(a_{i+1}, ..., a_k) = x · ((1 + a_{i+1})^{pow(a_{i+2},...,a_k)} − 1)
≤ x · (1 + a_{i+1})^{pow(a_{i+2},...,a_k)} − 1
≤ (1 + a_{i+1})^{x · pow(a_{i+2},...,a_k)} − 1
≤ (1 + a_{i+1})^{pow(a_{i+2},...,a_{k−1}, x·a_k)} − 1 = pow(a_{i+1}, ..., a_{k−1}, x·a_k).

Above, the first inequality holds because x ≥ 1; the second inequality follows from the inequality x·b ≤ b^x, where we notice that (1 + a_{i+1})^{pow(a_{i+2},...,a_k)} ≥ 1 + a_{i+1} ≥ 2; the third inequality follows from the induction assumption and from monotonicity. Having Inequality (7.16), we obtain

pow(a_i^x, a_{i+1}, ..., a_k) = (1 + a_i^x)^{pow(a_{i+1},...,a_k)} − 1
≤ (1 + a_i)^{x · pow(a_{i+1},...,a_k)} − 1
≤ (1 + a_i)^{pow(a_{i+1},...,a_{k−1}, x·a_k)} − 1 = pow(a_i, a_{i+1}, ..., a_{k−1}, x·a_k).

This gives the thesis, by Equality (7.11) and by monotonicity of pow:

pow(a_1, ..., a_{i−1}, a_i^x, a_{i+1}, ..., a_{k−1}, a_k) = pow(a_1, ..., a_{i−1}, pow(a_i^x, a_{i+1}, ..., a_{k−1}, a_k))
≤ pow(a_1, ..., a_{i−1}, pow(a_i, a_{i+1}, ..., a_{k−1}, x·a_k)) = pow(a_1, ..., a_{k−1}, x·a_k).

Inequality (7.14) is shown by induction on k. For k = 1 we simply have

pow(a_1) + 1 = ((1 + a_1) − 1) + 1 = a_1 + 1 = pow(a_1 + 1).

For k > 1, because a_1 ≥ 1,

pow(a_1, ..., a_{k−1}, a_k) + 1 = (1 + a_1)^{pow(a_2,...,a_k)} − 1 + 1
≤ (1 + a_1)^{pow(a_2,...,a_k)+1} − 1
≤ (1 + a_1)^{pow(a_2,...,a_{k−1}, a_k+1)} − 1 = pow(a_1, ..., a_{k−1}, a_k + 1).

Inequality (7.15) is also shown by induction on k. For k = 0 the thesis is trivial: pow() · pow() = 1 · 1 = 1 = pow(). Suppose now that k ≥ 1, and denote x = pow(a_2, ..., a_k) and y = pow(b_2, ..., b_k). We claim that

((1 + a_1)^x − 1) · ((1 + b_1)^y − 1) ≤ (1 + a_1 b_1)^{x·y} − 1. (7.17)

Let us prove this inequality. By symmetry, we can assume that x ≥ y. We have three cases. If x = y = 1, Inequality (7.17) simply says that

((1 + a_1) − 1) · ((1 + b_1) − 1) = a_1 b_1 ≤ (1 + a_1 b_1)^{1·1} − 1.

Next, suppose that x ≥ 2 and y = 1. We see that

0 ≤ (b_1 − 1)^2,
0 ≤ b_1^2 − 2 b_1 + 1,
4 b_1 ≤ b_1^2 + 2 b_1 + 1,
4 b_1 ≤ (b_1 + 1)^2,
b_1 ≤ ((b_1 + 1)/2)^2.

Because x ≥ 2 and (b_1 + 1)/2 ≥ 1, it follows that

b_1 ≤ ((b_1 + 1)/2)^x. (7.18)

Next, observe that

0 ≤ (a_1 − 1)(b_1 − 1),
0 ≤ a_1 b_1 − a_1 − b_1 + 1,
a_1 b_1 + a_1 + b_1 + 1 ≤ 2 a_1 b_1 + 2,
(1 + a_1)(b_1 + 1) ≤ 2(1 + a_1 b_1),
(1 + a_1) · ((b_1 + 1)/2) ≤ 1 + a_1 b_1. (7.19)

Using Inequalities (7.18) and (7.19) we obtain Inequality (7.17):

((1 + a_1)^x − 1) · ((1 + b_1) − 1) = (1 + a_1)^x · b_1 − b_1 ≤ (1 + a_1)^x · b_1 − 1
≤ (1 + a_1)^x · ((b_1 + 1)/2)^x − 1 ≤ (1 + a_1 b_1)^{x·1} − 1.

The remaining case is when x ≥ y ≥ 2. In this case we have that

((1 + a_1)^x − 1) · ((1 + b_1)^y − 1) ≤ (1 + a_1)^x (1 + b_1)^y − 1 ≤ (1 + a_1)^x (1 + b_1)^x − 1
≤ (1 + a_1 b_1)^x (1 + a_1 b_1)^x − 1 = (1 + a_1 b_1)^{x·2} − 1 ≤ (1 + a_1 b_1)^{x·y} − 1.

Thus, we have shown Inequality (7.17) in all cases. Using this inequality and the induction assumption, we can conclude that

pow(a_1, ..., a_k) · pow(b_1, ..., b_k) = ((1 + a_1)^x − 1) · ((1 + b_1)^y − 1) ≤ (1 + a_1 b_1)^{x·y} − 1
≤ (1 + a_1 b_1)^{pow(a_2 b_2,...,a_k b_k)} − 1 = pow(a_1 b_1, a_2 b_2, ..., a_k b_k).

Heading toward the proof of Lemma 7.22, we now observe some auxiliary properties.
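The inequalities proved above are elementary but easy to misstate; they can be spot-checked numerically. The sketch below assumes the recursive definition of pow used throughout this section, pow() = 1 and pow(a_1, ..., a_k) = (1 + a_1)^{pow(a_2,...,a_k)} − 1 (so that pow(x) = x); the name `pow_` is ours, chosen to avoid shadowing Python's built-in `pow`.

```python
from itertools import product

def pow_(*args):
    # pow(a_1, ..., a_k) = (1 + a_1)^pow(a_2, ..., a_k) - 1, with pow() = 1.
    if not args:
        return 1
    return (1 + args[0]) ** pow_(*args[1:]) - 1

# pow(x) = x, and the flattening law (7.11):
assert pow_(5) == 5
assert pow_(2, pow_(2, 2)) == pow_(2, 2, 2)

# (7.14): pow(a_1, ..., a_k) + 1 <= pow(a_1, ..., a_{k-1}, a_k + 1)
for a, b, c in product(range(1, 4), repeat=3):
    assert pow_(a, b, c) + 1 <= pow_(a, b, c + 1)

# (7.15): pow(a_1, ..., a_k) * pow(b_1, ..., b_k) <= pow(a_1 b_1, ..., a_k b_k)
for a1, a2, b1, b2 in product(range(1, 4), repeat=4):
    assert pow_(a1, a2) * pow_(b1, b2) <= pow_(a1 * b1, a2 * b2)

# (7.16): x * pow(a_{i+1}, ..., a_k) <= pow(a_{i+1}, ..., a_{k-1}, x * a_k)
for x, a, b in product(range(1, 4), repeat=3):
    assert x * pow_(a, b) <= pow_(a, x * b)
```

These checks only exercise small argument ranges — the values of pow grow as towers of exponentials, which is exactly why the proofs above manipulate the arguments rather than the values.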
Proposition 7.27.
Let s be a well-formed annotated stack. If type(s) ⊆ T_np then low(s) = 0 and high(s) = 1; otherwise low(s) ≥ 1 and high(s) ≥ 2.

Proof. By induction on the structure of s. In the base cases of a 0-stack and of an empty k-stack, the thesis follows directly from Definition 7.21. In the induction step denote s = s_k : s_{k−1}. Recall that type(s) = {red_k(σ) : σ ∈ type(s_{k−1})} (by the definition of types), so

type(s) ⊆ T_np ⇔ ∀σ ∈ type(s_{k−1}). red_k(σ) ∈ T_np.

Moreover,

red_k(σ) ∈ T_np ⇔ (σ ∈ T_np ∧ π(ass_k(σ)) ⊆ T_np)

for σ ∈ T^{k−1} (by the definition of red_k). It follows that

type(s) ⊆ T_np ⇔ ∀σ ∈ type(s_{k−1}). (σ ∈ T_np ∧ π(ass_k(σ)) ⊆ T_np).

If type(s) ⊆ T_np then, by the induction assumption, low(s_k↾π(ass_k(σ))) = low(s_{k−1}↾{σ}) = 0 and high(s_k↾π(ass_k(σ))) = high(s_{k−1}↾{σ}) = 1 for all σ ∈ type(s_{k−1}); in effect

low(s) = ∑_{σ ∈ type(s_{k−1})} (low(s_k↾π(ass_k(σ))) + low(s_{k−1}↾{σ})) = ∑_{σ ∈ type(s_{k−1})} (0 + 0) = 0, and

high(s) = ∏_{σ ∈ type(s_{k−1})} pow(high(s_k↾π(ass_k(σ))), high(s_{k−1}↾{σ})) = ∏_{σ ∈ type(s_{k−1})} pow(1, 1) = ∏_{σ ∈ type(s_{k−1})} 1 = 1.

Conversely, if type(s) ⊄ T_np then, by the induction assumption, at least one among low(s_k↾π(ass_k(σ))) and low(s_{k−1}↾{σ}) for σ ∈ type(s_{k−1}) is positive (and all others are non-negative); in effect low(s), being their sum, is positive. Similarly, at least one among high(s_k↾π(ass_k(σ))) and high(s_{k−1}↾{σ}) is greater than 1 (and all others are positive); in effect some pow(high(s_k↾π(ass_k(σ))), high(s_{k−1}↾{σ})) is greater than 1 (notice that pow(2, 1) = (1 + 2) − 1 = 2 and pow(1, 2) = (1 + 1)^2 − 1 = 3, and that pow is monotone), and thus their product high(s) is greater than 1.

Proposition 7.28.
For every well-formed annotated stack s,

low(s) = ∑_{σ ∈ type(s)} low(s↾{σ}),  high(s) = ∏_{σ ∈ type(s)} high(s↾{σ}),  and  len(s) = ∏_{σ ∈ type(s)} len(s↾{σ}).
We analyze Definition 7.21. Suppose first that s = (γ, 𝒟) (i.e., that s is of order 0). Because type(s↾{σ}) = {σ},

low(s) = |type(s) ∩ T_pr| = ∑_{σ ∈ type(s)} |{σ} ∩ T_pr| = ∑_{σ ∈ type(s)} |type(s↾{σ}) ∩ T_pr| = ∑_{σ ∈ type(s)} low(s↾{σ}).

Recall that type(s) = {rd(D) : D ∈ 𝒟}, and that s↾{σ} = (γ, {D ∈ 𝒟 : rd(D) = σ}); thus

high(s) = ∏_{D ∈ 𝒟 : rd(D) ∈ T_pr} C_{depth(D)} = ∏_{σ ∈ type(s)} ∏_{D ∈ 𝒟 : rd(D) = σ ∈ T_pr} C_{depth(D)} = ∏_{σ ∈ type(s)} high(s↾{σ}), and

len(s) = ∏_{D ∈ 𝒟} C_{depth(D)} = ∏_{σ ∈ type(s)} ∏_{D ∈ 𝒟 : rd(D) = σ} C_{depth(D)} = ∏_{σ ∈ type(s)} len(s↾{σ}).

If s = [ ], then type(s) = ∅, and thus

low(s) = 0 = ∑_{σ ∈ ∅} low(s↾{σ}),  high(s) = 1 = ∏_{σ ∈ ∅} high(s↾{σ}),  and  len(s) = 1 = ∏_{σ ∈ ∅} len(s↾{σ}).

Finally, suppose that s = s_k : s_{k−1}. Recall that type(s) = {red_k(τ) : τ ∈ type(s_{k−1})}. Moreover, by well-formedness of s, for every σ ∈ type(s) there is exactly one τ ∈ type(s_{k−1}) such that red_k(τ) = σ; denote it τ_σ. By the definition of a restriction, we have that s↾{σ} = s_k↾π(ass_k(τ_σ)) : s_{k−1}↾{τ_σ}. Recalling that type(s↾{σ}) = {σ}, we obtain

low(s) = ∑_{τ ∈ type(s_{k−1})} (low(s_k↾π(ass_k(τ))) + low(s_{k−1}↾{τ})) = ∑_{σ ∈ type(s)} (low(s_k↾π(ass_k(τ_σ))) + low(s_{k−1}↾{τ_σ})) = ∑_{σ ∈ type(s)} low(s↾{σ}),

high(s) = ∏_{τ ∈ type(s_{k−1})} pow(high(s_k↾π(ass_k(τ))), high(s_{k−1}↾{τ})) = ∏_{σ ∈ type(s)} pow(high(s_k↾π(ass_k(τ_σ))), high(s_{k−1}↾{τ_σ})) = ∏_{σ ∈ type(s)} high(s↾{σ}), and

len(s) = ∏_{τ ∈ type(s_{k−1})} pow(len(s_k↾π(ass_k(τ))), len(s_{k−1}↾{τ})) = ∏_{σ ∈ type(s)} pow(len(s_k↾π(ass_k(τ_σ))), len(s_{k−1}↾{τ_σ})) = ∏_{σ ∈ type(s)} len(s↾{σ}).

Proposition 7.29.
Let 0 ≤ l ≤ k ≤ n, and let s = s_k : s_{k−1} : · · · : s_l be a well-formed annotated k-stack that is singular. In this situation

low(s) = ∑_{i=l}^{k} low(s_i),  high(s) = pow(high(s_k), high(s_{k−1}), ..., high(s_l)),  and  len(s) = pow(len(s_k), len(s_{k−1}), ..., len(s_l)).
Induction on k − l. For k − l = 0, we simply have s = s_k; both sides of each equality are the same (recall that pow(x) = x for every x). Suppose that k − l ≥ 1, and denote t = s_{k−1} : s_{k−2} : · · · : s_l; we have that s = s_k : t. Because s is well-formed and type(s) is a singleton, also type(t) is a singleton {σ}, where type(s_k) = π(ass_k(σ)). In effect, restricting t to {σ} or s_k to π(ass_k(σ)) does not change the annotated stacks, so, by definition,

low(s) = ∑_{σ ∈ type(t)} (low(s_k↾π(ass_k(σ))) + low(t↾{σ})) = low(s_k) + low(t).

Similarly,

high(s) = pow(high(s_k), high(t))  and  len(s) = pow(len(s_k), len(t)).

From the induction assumption we know that

low(t) = ∑_{i=l}^{k−1} low(s_i),  high(t) = pow(high(s_{k−1}), high(s_{k−2}), ..., high(s_l)),  and  len(t) = pow(len(s_{k−1}), len(s_{k−2}), ..., len(s_l)).

By substituting this into the equalities for low(s), high(s), and len(s) we obtain the thesis, where in the case of high and len we additionally use Equality (7.11).

Proposition 7.30.
Let 0 ≤ l ≤ k ≤ n, let s = s_k : s_{k−1} : · · · : s_l be a well-formed annotated k-stack, and let σ ∈ type(s_l). In this situation

s↾{red_k(σ)} = s_k↾π(ass_k(σ)) : s_{k−1}↾π(ass_{k−1}(σ)) : · · · : s_{l+1}↾π(ass_{l+1}(σ)) : s_l↾{σ}.
Proposition 7.14 used for s implies that type(s) = {red_k(σ′) : σ′ ∈ type(s_l)} and |type(s)| = |type(s_l)|; the latter means that red_k(σ′) = red_k(σ″) implies σ′ = σ″ for σ′, σ″ ∈ type(s_l).

Observe that top_l(s↾{red_k(σ)}) equals s_l restricted to a subset of type(s_l). Proposition 7.14 used for s↾{red_k(σ)} implies that this subset is a singleton {σ′}, and that red_k(σ′) = red_k(σ). This implies that σ′ = σ, by the previous paragraph. Using also Item (T2) of Proposition 7.14, we see that

s↾{red_k(σ)} = s_k↾π(ass_k(σ)) : s_{k−1}↾π(ass_{k−1}(σ)) : · · · : s_{l+1}↾π(ass_{l+1}(σ)) : s_l↾{σ},

as required.

Next, we observe how the functions low, high, and len interplay with composing annotated stacks.

Lemma 7.31.
Let 0 ≤ l ≤ k ≤ n, let (Φ_k, Φ_{k−1}, ..., Φ_l; Ψ_k; f) be a composer, and let s = s_k : s_{k−1} : · · · : s_l be a well-formed annotated k-stack such that type(s_i) = π(Φ_i) for each i ∈ [l, k]. In this situation

∑_{i=l}^{k} low(s_i) ≤ low(s), (7.20)
∑_{i=l}^{k} low(s_i) < low(s) if f = pr, (7.21)
pow(high(s_k), high(s_{k−1}), ..., high(s_{l+1}), |T|^n · high(s_l)) ≥ high(s), (7.22)
pow(high(s_k), high(s_{k−1}), ..., high(s_{l+1}), high(s_l)) ≥ high(s) if f = np, (7.23)
pow(len(s_k), len(s_{k−1}), ..., len(s_{l+1}), |T|^n · len(s_l)) ≥ len(s). (7.24)
Because type(s_l) = π(Φ_l), Proposition 7.14 used for s implies that type(s) = {red_k(σ) : σ ∈ π(Φ_l)} and |type(s)| = |π(Φ_l)|, which means that the mapping defined by σ ↦ red_k(σ) is a bijection between π(Φ_l) and type(s). Moreover, for σ ∈ π(Φ_l),

s↾{red_k(σ)} = s_k↾π(ass_k(σ)) : s_{k−1}↾π(ass_{k−1}(σ)) : · · · : s_{l+1}↾π(ass_{l+1}(σ)) : s_l↾{σ},

by Proposition 7.30.

For i ∈ [l+1, k] and σ ∈ π(Φ_l) denote H^i_σ = high(s_i↾π(ass_i(σ))). Using Proposition 7.28, the above property, and Proposition 7.29, we obtain that

low(s) = ∑_{τ ∈ type(s)} low(s↾{τ}) = ∑_{σ ∈ π(Φ_l)} low(s↾{red_k(σ)})
= ∑_{σ ∈ π(Φ_l)} (low(s_l↾{σ}) + ∑_{i=l+1}^{k} low(s_i↾π(ass_i(σ))))
= low(s_l) + ∑_{i=l+1}^{k} ∑_{σ ∈ π(Φ_l)} low(s_i↾π(ass_i(σ))), and (7.25)

high(s) = ∏_{τ ∈ type(s)} high(s↾{τ}) = ∏_{σ ∈ π(Φ_l)} high(s↾{red_k(σ)})
= ∏_{σ ∈ π(Φ_l)} pow(H^k_σ, H^{k−1}_σ, ..., H^{l+1}_σ, high(s_l↾{σ})). (7.26)

For each i ∈ [l+1, k] it holds that type(s_i) = π(Φ_i) = ⋃{π(ass_i(σ)) : σ ∈ π(Φ_l)} (by the definition of a composer, Condition (C1)), so, by Proposition 7.28,

low(s_i) = ∑_{τ ∈ type(s_i)} low(s_i↾{τ}) ≤ ∑_{σ ∈ π(Φ_l)} ∑_{τ ∈ π(ass_i(σ))} low(s_i↾{τ}) = ∑_{σ ∈ π(Φ_l)} low(s_i↾π(ass_i(σ))). (7.27)

Altogether, Equality (7.25) and Inequality (7.27) used for all i ∈ [l+1, k] imply Inequality (7.20).

For Inequality (7.21), recall from the definition of a composer (Condition (C4)) that if f = pr, then for some i ∈ [l+1, k], some τ ∈ T_pr appears in π(ass_i(σ)) simultaneously for two different σ ∈ π(Φ_l). By Proposition 7.27, low(s_i↾{τ}) ≥ 1 for this τ (because it is productive). Thus, some positive component low(s_i↾{τ}) appears in two sums ∑_{τ ∈ π(ass_i(σ))} low(s_i↾{τ}) in Inequality (7.27) used for this i, so this inequality (and, in effect, Inequality (7.20)) becomes strict.

Using Equality (7.26), Inequality (7.15), and Proposition 7.28 we obtain

high(s) = ∏_{σ ∈ π(Φ_l)} pow(H^k_σ, H^{k−1}_σ, ..., H^{l+1}_σ, high(s_l↾{σ}))
≤ pow(∏_{σ ∈ π(Φ_l)} H^k_σ, ∏_{σ ∈ π(Φ_l)} H^{k−1}_σ, ..., ∏_{σ ∈ π(Φ_l)} H^{l+1}_σ, ∏_{σ ∈ π(Φ_l)} high(s_l↾{σ}))
= pow(∏_{σ ∈ π(Φ_l)} H^k_σ, ∏_{σ ∈ π(Φ_l)} H^{k−1}_σ, ..., ∏_{σ ∈ π(Φ_l)} H^{l+1}_σ, high(s_l)). (7.28)

Observe that by restricting an annotated stack we can only decrease the value of high. Thus, for each i ∈ [l+1, k],

∏_{σ ∈ π(Φ_l)} H^i_σ = ∏_{σ ∈ π(Φ_l)} high(s_i↾π(ass_i(σ))) ≤ ∏_{σ ∈ π(Φ_l)} high(s_i) = (high(s_i))^{|π(Φ_l)|} ≤ (high(s_i))^{|T|}.

The last inequality is true because |π(Φ_l)| ≤ |T^l| ≤ |T|. Using Inequality (7.13) we move the |T| exponents (there are at most n of them) into the last argument of pow and we obtain Inequality (7.22):

high(s) ≤ pow((high(s_k))^{|T|}, (high(s_{k−1}))^{|T|}, ..., (high(s_{l+1}))^{|T|}, high(s_l))
≤ pow(high(s_k), high(s_{k−1}), ..., high(s_{l+1}), |T|^n · high(s_l)).

Now suppose that f = np. It implies, for each i ∈ [l+1, k], that each τ ∈ π(Φ_i) ∩ T_pr belongs to the set π(ass_i(σ)) only for one σ ∈ π(Φ_l), so all the common factors are equal to 1 (cf. Proposition 7.27):

∏_{σ ∈ π(Φ_l)} H^i_σ = ∏_{σ ∈ π(Φ_l)} ∏_{τ ∈ π(ass_i(σ))} high(s_i↾{τ}) = ∏_{τ ∈ π(Φ_i)} high(s_i↾{τ}) = high(s_i).

By substituting this to Inequality (7.28) we obtain Inequality (7.23). Inequality (7.24) is obtained in the same way as Inequality (7.22), because the definitions of len and high differ only in the base case.

We are now ready to prove Lemma 7.22.
Proof of Lemma 7.22.
It is enough to prove the lemma for annotated runs of length 1. Then the result for longer runs follows by an immediate induction. Thus, assume that |R| = 1, and denote R(0) = s_n : s_{n−1} : · · · : s_0 with s_0 = (γ, {D}). Recall that our goal is to prove the following inequalities:

low(R(0)) ≤ ♯(st(R)) + low(R(1)),  high(R(0)) ≥ ♯(st(R)) + high(R(1)),  and  len(R(0)) ≥ 1 + len(R(1)).

We have four cases, depending on the shape of D.

Case 1.
It is impossible that D is of the form (empty, γ, p), since then R(0) would not have a successor.

Case 2.
Suppose that D = (read, p, D′). Then R(1) = s_n : s_{n−1} : · · · : s_1 : (γ, {D′}). Because both R(0) and R(1) are singular, by Proposition 7.29 we have that

low(R(0)) = low(s_0) + ∑_{i=1}^{n} low(s_i)  and  low(R(1)) = low((γ, {D′})) + ∑_{i=1}^{n} low(s_i).

Thus, the required inequality about low can be restated as

low(s_0) ≤ ♯(st(R)) + low((γ, {D′})).

It holds when rd(D) ∈ T_np (as then low(s_0) = 0). If rd(D) ∈ T_pr, then low(s_0) = 1, and either rd(D′) ∈ T_pr or the letter read by st(R) is ♯, so the right side is positive.

Again by Proposition 7.29 we have that

high(R(0)) = pow(high(s_n), high(s_{n−1}), ..., high(s_1), high(s_0)), and
high(R(1)) = pow(high(s_n), high(s_{n−1}), ..., high(s_1), high((γ, {D′}))).

If rd(D) ∈ T_np, then ♯(st(R)) = 0 and rd(D′) ∈ T_np. In this case high(s_0) = high((γ, {D′})) = 1, so the two sides of the inequality are equal: high(R(0)) = ♯(st(R)) + high(R(1)). If rd(D) ∈ T_pr, then using Inequality (7.14) we obtain

high(R(0)) = pow(high(s_n), high(s_{n−1}), ..., high(s_1), C_{depth(D)})
≥ pow(high(s_n), high(s_{n−1}), ..., high(s_1), C_{depth(D′)} + 1)
≥ pow(high(s_n), high(s_{n−1}), ..., high(s_1), C_{depth(D′)}) + 1 ≥ ♯(st(R)) + high(R(1)).

In the same way we obtain the required inequality for len.

Case 3.
Suppose that D = (pop, γ, p, τ_k), where τ_k ∈ T^k. Then R(1) = s_n : s_{n−1} : · · · : s_k. The operation between conf(R(0)) and conf(R(1)) is pop_k, so ♯(st(R)) = 0. For i ∈ [1, k−1] we have ass_i(rd(D)) = ∅, and by well-formedness of R(0) we have that type(s_i) = π(ass_i(rd(D))) (cf. Proposition 7.14); in effect type(s_i) = ∅, hence low(s_i) = 0 and high(s_i) = 1. By Definition 7.11(3), rd(D) ∈ T_np, so low(s_0) = 0 and high(s_0) = 1; moreover len(s_0) = C_0 = 2. Because R(0) and R(1) are singular, by Proposition 7.29 we can write

low(R(0)) = ∑_{i=0}^{n} low(s_i) = ∑_{i=k}^{n} low(s_i) = ♯(st(R)) + low(R(1)),

high(R(0)) = pow(high(s_n), high(s_{n−1}), ..., high(s_k), 1, ..., 1, 1) = pow(high(s_n), high(s_{n−1}), ..., high(s_k)) = ♯(st(R)) + high(R(1)), and

len(R(0)) = pow(len(s_n), len(s_{n−1}), ..., len(s_k), 1, ..., 1, 2)
≥ pow(len(s_n), len(s_{n−1}), ..., len(s_k), 1, ..., 1, 1) + 1
= pow(len(s_n), len(s_{n−1}), ..., len(s_k)) + 1 = 1 + len(R(1)),

as required.

Case 4.
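Case 3 silently uses two facts about pow: padding the argument list with 1's does not change the value (the emptied orders below k contribute nothing), while raising the last argument from 1 to 2 gains at least 1 by Inequality (7.14). A small sanity check, again assuming the recursive definition of pow used in this section (the name `pow_` is ours):

```python
def pow_(*args):
    # pow(a_1, ..., a_k) = (1 + a_1)^pow(a_2, ..., a_k) - 1, with pow() = 1.
    if not args:
        return 1
    return (1 + args[0]) ** pow_(*args[1:]) - 1

# Trailing 1's vanish: pow(a, b, 1, ..., 1) = pow(a, b).
assert pow_(3, 2, 1, 1, 1) == pow_(3, 2)
assert pow_(2, 1, 1) == pow_(2)

# Raising the last argument from 1 to 2 gains at least 1 (cf. (7.14)).
assert pow_(3, 2, 1, 1, 2) >= pow_(3, 2, 1, 1, 1) + 1
```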
Suppose that D = (push, γ, p, D′, 𝒟_0). Let α and k be such that δ(γ, p) performs push_k^α; this transition does not read anything, so ♯(st(R)) = 0. By Definition 7.11(4) we have a composer (Φ_k, Φ_{k−1}, ..., Φ_0; Ψ_k; f) such that π(Φ_0) = {rd(E) : E ∈ 𝒟_0}. Denote also Ψ_i = ass_i(rd(D′)) for i ∈ [1, k−1]. Then

R(1) = s_n : s_{n−1} : · · · : s_{k+1} : t_k : s_{k−1}↾π(Ψ_{k−1}) : s_{k−2}↾π(Ψ_{k−2}) : · · · : s_1↾π(Ψ_1) : (α, {D′}),

where t_k = s_k↾π(Φ_k) : s_{k−1}↾π(Φ_{k−1}) : · · · : s_1↾π(Φ_1) : (γ, 𝒟_0).

From Lemma 7.31 we obtain the following inequality, which is strict if f = pr:

∑_{i=1}^{k} low(s_i↾π(Φ_i)) + low((γ, 𝒟_0)) ≤ low(t_k). (7.29)

Because R(0) is well-formed, type(s_i) = π(ass_i(rd(D))) for all i ∈ [1, n] (cf. Proposition 7.14). By Definition 7.11(4), ass_k(rd(D)) = Φ_k, and ass_i(rd(D)) = Ψ_i ∪ Φ_i for i ∈ [1, k−1]; thus type(s_k) = π(Φ_k) and type(s_i) = π(Ψ_i) ∪ π(Φ_i) for i ∈ [1, k−1]. In effect

low(s_k) = low(s_k↾π(Φ_k)), and (7.30)

low(s_i) = ∑_{σ ∈ type(s_i)} low(s_i↾{σ}) ≤ ∑_{σ ∈ π(Ψ_i)} low(s_i↾{σ}) + ∑_{σ ∈ π(Φ_i)} low(s_i↾{σ}) = low(s_i↾π(Ψ_i)) + low(s_i↾π(Φ_i)) for i ∈ [1, k−1], (7.31)

where the equalities in the second formula are by Proposition 7.28. Moreover, if π(Ψ_i) ∩ π(Φ_i) ⊄ T_np for some i ∈ [1, k−1], then Inequality (7.31) for this i is strict (some component low(s_i↾{τ}) with τ ∈ π(Ψ_i) ∩ π(Φ_i) ∩ T_pr, which is positive by Proposition 7.27, appears in both sums on the right side, and only once on the left side). Because R(0) and R(1) are singular,

low(R(0)) = ∑_{i=0}^{n} low(s_i), and
low(R(1)) = ∑_{i=k+1}^{n} low(s_i) + low(t_k) + ∑_{i=1}^{k−1} low(s_i↾π(Ψ_i)) + low((α, {D′}))

by Proposition 7.29. We apply (In)equalities (7.30) and (7.31) to the formula for low(R(0)); next, we substitute Inequality (7.29); we obtain

low(R(0)) ≤ ∑_{i=k+1}^{n} low(s_i) + ∑_{i=1}^{k−1} low(s_i↾π(Ψ_i)) + ∑_{i=1}^{k} low(s_i↾π(Φ_i)) + low(s_0)
≤ ∑_{i=k+1}^{n} low(s_i) + ∑_{i=1}^{k−1} low(s_i↾π(Ψ_i)) + low(t_k) − low((γ, 𝒟_0)) + low(s_0)
= low(R(1)) − low((α, {D′})) − low((γ, 𝒟_0)) + low(s_0) ≤ low(R(1)) + low(s_0).

If {rd(D′)} ∪ π(Φ_0) ⊄ T_np, the last inequality is strict, as we have removed negative components. Because low(s_0) ≤ 1, if some of the above inequalities was strict, we can remove low(s_0), and we obtain low(R(0)) ≤ low(R(1)), as required. On the other hand, if none of these inequalities was strict, then π(Ψ_i) ∩ π(Φ_i) ⊆ T_np for each i ∈ [1, k−1], and f = np, and {rd(D′)} ∪ π(Φ_0) ⊆ T_np; from Definition 7.11(4) it follows that in this case rd(D) ∈ T_np, so low(s_0) = 0, and we obtain the required inequality as well.

Next, we prove the inequality for high. Denote

a_i = high(s_i) for i ∈ [k+1, n],
a_i = high(s_i↾π(Ψ_i)) for i ∈ [1, k−1],
b_i = high(s_i↾π(Φ_i)) for i ∈ [1, k].

Suppose first that rd(D) ∈ T_np. Then, by Definition 7.11(4), {rd(D′)} ∪ π(Φ_0) ⊆ T_np, and f = np, and π(Ψ_i) ∩ π(Φ_i) ⊆ T_np for each i ∈ [1, k−1]; thus high(s_0) = high((γ, 𝒟_0)) = high((α, {D′})) = 1. Recall that type(s_i) = π(Ψ_i) ∪ π(Φ_i) for i ∈ [1, k−1]; thus

high(s_i) = ∏_{σ ∈ type(s_i)} high(s_i↾{σ}) = ∏_{σ ∈ π(Ψ_i)} high(s_i↾{σ}) · ∏_{σ ∈ π(Φ_i)} high(s_i↾{σ}) = a_i · b_i for i ∈ [1, k−1]; (7.32)

the second equality above holds because for σ ∈ π(Ψ_i) ∩ π(Φ_i) ⊆ T_np we have high(s_i↾{σ}) = 1 (cf. Proposition 7.27). Because f = np, from Lemma 7.31 we know that

pow(b_k, b_{k−1}, ..., b_1, 1) = pow(b_k, b_{k−1}, ..., b_1, high((γ, 𝒟_0))) ≥ high(t_k).

Using Proposition 7.29 and Equalities (7.32), then Inequality (7.12), then the above inequality, and then again Proposition 7.29, we obtain

high(R(0)) = pow(a_n, a_{n−1}, ..., a_{k+1}, b_k, a_{k−1} b_{k−1}, a_{k−2} b_{k−2}, ..., a_1 b_1, 1)
≥ pow(a_n, a_{n−1}, ..., a_{k+1}, pow(b_k, b_{k−1}, ..., b_1, 1), a_{k−1}, a_{k−2}, ..., a_1, 1)
≥ pow(a_n, a_{n−1}, ..., a_{k+1}, high(t_k), a_{k−1}, a_{k−2}, ..., a_1, 1) = high(R(1)).

Next, suppose that rd(D) ∈ T_pr. Then Lemma 7.31 gives us the inequality

pow(b_k, b_{k−1}, ..., b_1, |T|^n · high((γ, 𝒟_0))) ≥ high(t_k). (7.33)

By definition it holds that

high(s_0) = C_{depth(D)} = (2|T|)^n · (C_{depth(D)−1})^{|T|+1}
≥ 2^{k−1} · |T|^n · C_{depth(D′)} · ∏_{E ∈ 𝒟_0} C_{depth(E)}
≥ 2^{k−1} · |T|^n · high((α, {D′})) · high((γ, 𝒟_0)).

Using Inequality (7.13) we replace 2^{k−1} in the last argument of pow by the exponent 2 on the k−1 arguments of orders 1 to k−1; moreover (high(s_i))^2 ≥ a_i b_i for each i ∈ [1, k−1]. Together with Inequality (7.12) and Inequality (7.33) this gives

high(R(0)) = pow(high(s_n), high(s_{n−1}), ..., high(s_1), high(s_0))
≥ pow(high(s_n), high(s_{n−1}), ..., high(s_1), 2^{k−1} · |T|^n · high((α, {D′})) · high((γ, 𝒟_0)))
≥ pow(high(s_n), high(s_{n−1}), ..., high(s_k), (high(s_{k−1}))^2, (high(s_{k−2}))^2, ..., (high(s_1))^2, |T|^n · high((α, {D′})) · high((γ, 𝒟_0)))
≥ pow(a_n, a_{n−1}, ..., a_{k+1}, b_k, a_{k−1} b_{k−1}, a_{k−2} b_{k−2}, ..., a_1 b_1, |T|^n · high((α, {D′})) · high((γ, 𝒟_0)))
≥ pow(a_n, a_{n−1}, ..., a_{k+1}, pow(b_k, b_{k−1}, ..., b_1, |T|^n · high((γ, 𝒟_0))), a_{k−1}, a_{k−2}, ..., a_1, high((α, {D′})))
≥ pow(a_n, a_{n−1}, ..., a_{k+1}, high(t_k), a_{k−1}, a_{k−2}, ..., a_1, high((α, {D′}))) = high(R(1)).

The inequality for len can be proved in a very similar way as that for high in the case rd(D) ∈ T_pr.

Relating Upper and Lower Bounds.
As already mentioned, it is meaningful to consider the functions low and high because they are closely related: one is bounded if the other is bounded. This is shown in the following proposition.
Proposition 7.32.
There exists a function H : N → N such that for each configuration c and each run descriptor σ ∈ type_{A,ϕ}(c) there exists a well-formed annotated n-stack s for which type(top_0(s)) = {σ}, and conf(s) = c, and high(s) ≤ H(low(s)).

Proof. Let d be a number such that for each derivation tree there exists a derivation tree with the same conclusion and depth at most d; such a number exists, because there are only finitely many possible conclusions. For each k ≥ 0 we define a function N_k : N → N, and we take H = N_n. The definition is inductive: N_k(0) = 1, and, for L > 0,

N_0(L) = (C_d)^{|T|},
N_k(L) = (pow(N_k(L−1), N_{k−1}(L)))^{|T^{k−1}|} for k > 0,

where C_d is the constant from Definition 7.21.

By definition of a type, for each configuration c and each run descriptor σ ∈ type_{A,ϕ}(c) there exists a well-formed annotated n-stack s such that type(top_0(s)) = {σ} and conf(s) = c. We can assume without loss of generality that all derivation trees in s have depth at most d: we can safely replace each tree by another (smaller) tree having the same conclusion. Thus, it is enough to prove that for each well-formed annotated k-stack s, in which all derivation trees have depth at most d, it holds that high(s) ≤ N_k(low(s)).

Denote L = low(s). If L = 0 then high(s) = 1 = N_k(L), thanks to Proposition 7.27. Suppose that L > 0. In this case we prove the thesis by induction on the structure of s. For a stack s = (γ, 𝒟) of order 0 it holds that

high(s) ≤ ∏_{D ∈ 𝒟} C_{depth(D)} ≤ (C_d)^{|T|} = N_0(L).

Next, consider a stack s = s_k : s_{k−1}. Recall that low(s) equals the sum of low for s_k↾π(ass_k(σ)) and s_{k−1}↾{σ} over all σ ∈ type(s_{k−1}). We have two cases. One possibility is that low(s_k↾π(ass_k(σ))) = L for some σ ∈ type(s_{k−1}). Then low for all other considered stacks is 0, so their high is 1. Using the induction assumption we obtain

high(s) ≤ pow(N_k(L), 1) · ∏_{τ ∈ type(s_{k−1}) \ {σ}} pow(1, 1) = N_k(L).

The opposite situation is that low(s_k↾π(ass_k(σ))) ≤ L − 1 for all σ ∈ type(s_{k−1}). Observing that N_k is monotone, by the induction assumption high(s_k↾π(ass_k(σ))) ≤ N_k(L−1) and high(s_{k−1}↾{σ}) ≤ N_{k−1}(L) for each σ ∈ type(s_{k−1}), so we obtain

high(s) ≤ ∏_{σ ∈ type(s_{k−1})} pow(N_k(L−1), N_{k−1}(L)) ≤ N_k(L).
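The recurrence defining N_k can be exercised on toy values. The constants below are stand-ins (the real C_d and the cardinalities |T| and |T^{k−1}| come from Definition 7.21 and the type system, and are far larger), so this only illustrates the shape of the bound H = N_n and the monotonicity in L used in the proof:

```python
from functools import lru_cache

# Hypothetical stand-in constants, only to exercise the recurrence.
C_D = 2        # stand-in for C_d
T_SIZE = 2     # stand-in for |T| and for each |T^{k-1}|

def pow_(*args):
    # pow(a_1, ..., a_k) = (1 + a_1)^pow(a_2, ..., a_k) - 1, with pow() = 1.
    if not args:
        return 1
    return (1 + args[0]) ** pow_(*args[1:]) - 1

@lru_cache(maxsize=None)
def N(k, L):
    # N_k(0) = 1; N_0(L) = (C_d)^|T|; N_k(L) = pow(N_k(L-1), N_{k-1}(L))^|T^{k-1}|.
    if L == 0:
        return 1
    if k == 0:
        return C_D ** T_SIZE
    return pow_(N(k, L - 1), N(k - 1, L)) ** T_SIZE

# Monotonicity in L, as used in the proof; even with these tiny constants
# the values explode quickly with k and L.
assert N(1, 0) <= N(1, 1) <= N(1, 2)
```

Note that already N(2, 2) is astronomically large with any constants, mirroring the non-elementary growth of high for higher orders.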
Assumptions Are Used in Returns.
Our next goal is to formally prove that whenever an assumption of a run descriptor is used in an annotated run, then we have a return.
Lemma 7.33.
Let s = s_n : s_{n−1} : · · · : s_0 be a well-formed singular annotated n-stack, where type(s_0) = {σ}. If (m, ξ) ∈ ass_r(σ), then there exists an annotated run R starting in s such that st(R) is an r-return, ϕ(st(R)) = m, and top_r(R(|R|)) = s_r↾{ξ}.

Proof. We use induction on len(s). Thanks to Lemma 7.22 we can always use the induction assumption for the successor of s. We have several cases depending on the shape of the derivation tree D in s_0 (that is, on the first operation in an annotated run starting in s). We use the characterization of returns from Proposition 6.9.

Case 1. If D = (empty, γ, p) then ass_r(σ) = ∅, so the assumptions cannot hold.

Case 2.
Suppose that D = (read, p, D′). Then the successor t of s differs from s only in the topmost 0-stack; the new topmost 0-stack has type {τ} such that ass_r(σ) = ϕ(a) ∘ ass_r(τ), where a is the letter read by the step between conf(s) and conf(t). Because (m, ξ) ∈ ass_r(σ), there exists m′ such that m = ϕ(a) · m′ and (m′, ξ) ∈ ass_r(τ). By the induction assumption for t, there exists an annotated run SS starting in t such that st(SS) is an r-return, ϕ(st(SS)) = m′, and top_r(SS(|SS|)) = s_r↾{ξ}. Together with the step between s and t, it gives us an annotated run as required.

Case 3.
Suppose that D = (pop, γ, p, τ), where τ ∈ T^k. The successor of s is t = s_n : s_{n−1} : · · · : s_k. Recall that ass_i(σ) = ∅ for i < k, so r ≥ k. Moreover, ass_k(σ) = {(1_M, τ)}, and type(s_k) = π(ass_k(σ)) = {τ} by well-formedness of s (cf. Proposition 7.14). If r = k, then (m, ξ) = (1_M, τ). In this case the annotated run of length 1 satisfies the thesis. Otherwise r > k, and (m, ξ) ∈ ass_r(σ) = ass_r(τ). Then as well (m, ξ) ∈ ass_r(τ′), where τ′ is the run descriptor in the type of top_0(s_k) (since τ = red_k(τ′)). The induction assumption for t gives us an annotated run SS starting in t such that st(SS) is an r-return, ϕ(st(SS)) = m, and top_r(SS(|SS|)) = s_r↾{ξ}. Together with the step between s and t, it gives us an annotated run as required.

Case 4.
Suppose that D = (push, γ, p, D′, 𝒟_0). Let α, k, Ψ_i, Φ_i be as in Definition 7.11(4). The successor of s is

t = s_n : s_{n−1} : · · · : s_{k+1} : t_k : s_{k−1}↾π(Ψ_{k−1}) : s_{k−2}↾π(Ψ_{k−2}) : · · · : s_1↾π(Ψ_1) : (α, {D′}),

where t_k = s_k↾π(Φ_k) : s_{k−1}↾π(Φ_{k−1}) : · · · : s_1↾π(Φ_1) : (γ, 𝒟_0). Recall from our previous proofs that type(s_i) = π(Ψ_i) for i ∈ [k+1, n], and type(t_k) = π(Ψ_k). If (m, ξ) ∈ Ψ_r and r ≠ k, then the induction assumption for t gives us an annotated run SS starting in t such that st(SS) is an r-return, ϕ(st(SS)) = m, and top_r(SS(|SS|)) = s_r↾{ξ}; together with the step between s and t, it gives us an annotated run as required.

Otherwise (m, ξ) ∈ Φ_r and r ≤ k. Recall that we have a composer (Φ_k, Φ_{k−1}, ..., Φ_0; Ψ_k; f). The definition of a composer (Condition (C1)) gives us some (m_1, τ) ∈ Φ_0 and m_2 ∈ M such that m = m_1 · m_2 and (m_2, ξ) ∈ ass_r(τ). We see that (m_1, red_k(τ)) ∈ Ψ_k (by Condition (C2) of the definition). The induction assumption for t and (m_1, red_k(τ)) ∈ Ψ_k gives us an annotated run SS starting in t such that st(SS) is a k-return, ϕ(st(SS)) = m_1, and top_k(SS(|SS|)) = t_k↾{red_k(τ)}. By Proposition 7.30,

t_k↾{red_k(τ)} = s_k↾π(ass_k(τ)) : s_{k−1}↾π(ass_{k−1}(τ)) : · · · : s_1↾π(ass_1(τ)) : (γ, 𝒟_0)↾{τ}.

Recalling that r ≤ k, the induction assumption for SS(|SS|) and (m_2, ξ) ∈ ass_r(τ) gives us an annotated run T starting in SS(|SS|) such that st(T) is an r-return, ϕ(st(T)) = m_2, and top_r(T(|T|)) = s_r↾{ξ}. The step between s and t composed with SS and then with T gives us an annotated run as required.

7.5. Completeness of Types.
In the previous subsection we have proved soundness of the type system, which means that if a run descriptor is contained in the type of a configuration, then a corresponding run exists from this configuration. As usual, we need the opposite direction (completeness) as well; that is, having a run from a configuration, we want to imply that the corresponding run descriptor is in the type of this configuration. This is shown in Lemma 7.34, which is a converse of Lemma 7.33. While reversing Lemma 7.33 we have to remember that not every run can be extended to an annotated run, so in Lemma 7.34 we need to use runs, not annotated runs.
Lemma 7.34.
Let R be an r-return, and let ξ ∈ type_{A,ϕ}(R(|R|)). Then there exists a run descriptor σ ∈ type_{A,ϕ}(R(0)) such that (ϕ(R), red_r(ξ)) ∈ ass_r(σ).

Before proving Lemma 7.34, we first state four auxiliary lemmas. These lemmas are used not only in the proof of Lemma 7.34, but also in the next subsection. Actually, each of these lemmas has a part denoted by (⋆); these parts are needed only in the next subsection.
Lemma 7.35.
Let R be a run of length 1 whose transition is read, and let τ ∈ type_{A,ϕ}(R(1)). Then there exists σ ∈ type_{A,ϕ}(R(0)) such that ass_i(σ) = ϕ(R) ∘ ass_i(τ) for each i ∈ [1, n]. Moreover, there exists a well-formed singular annotated 0-stack v such that type(v) = {σ}, and the following is satisfied.

(⋆) Let s′ be a well-formed annotated n-stack with top_0(s′) = v. Then there exists an annotated run SS of length 1 such that SS(0) = s′, the transition of st(SS) is read, it reads the same letter as the transition of R, and type(top_0(SS(1))) = {τ}.

Proof. By definition of type_{A,ϕ}, there exists a well-formed annotated stack s = s_n : s_{n−1} : · · · : s_1 : (γ, {D′}) such that rd(D′) = τ and conf(s) = R(1). Well-formedness of s implies that type(s_i) = π(ass_i(τ)) for each i ∈ [1, n] (cf. Proposition 7.14). When p is the state of R(0), Definition 7.11(2) implies that D = (read, p, D′) is a derivation tree with conclusion γ ⊢ σ, where σ = (p, Φ_n, Φ_{n−1}, ..., Φ_1, f) and Φ_i = ϕ(R) ∘ ass_i(τ) for each i ∈ [1, n]. Because π(ass_i(σ)) = π(ass_i(τ)), the annotated stack s_n : s_{n−1} : · · · : s_1 : (γ, {D}) is well-formed (again, cf. Proposition 7.14), so σ ∈ type_{A,ϕ}(R(0)).

In order to prove Property (⋆), as v we take (γ, {D}). Clearly type(v) = {rd(D)} = {σ}. Consider now any well-formed annotated n-stack s′ with top_0(s′) = v. Let SS be the annotated run from s′ to its successor. Because the topmost 0-stack of s′ is annotated by D = (read, p, D′), the successor of s′ indeed exists, and the transition of st(SS) is read. The state of R(1) and the state of conf(SS(1)) are the same (namely π(rd(D′))), so st(SS) reads the same letter as R. Moreover, top_0(SS(1)) = (γ, {D′}), so type(top_0(SS(1))) = {rd(D′)} = {τ}, as required.

Lemma 7.36.
Let R be a run of length 1 performing pop_k, and let τ ∈ type_{A,ϕ}(R(1)). Then there exists σ ∈ type_{A,ϕ}(R(0)) such that ass_i(σ) = ass_i(τ) for each i ∈ [k+1, n], and ass_k(σ) = {(M, red_k(τ))}. Moreover, there exists a well-formed singular annotated k-stack v_k such that type(top_0(v_k)) = {σ}, and the following is satisfied.
(⋆) Let s′ be a well-formed annotated n-stack with top_k(s′) = v_k. Then there exists an annotated run SS of length 1 such that SS(0) = s′, and st(SS) performs pop_k, and type(top_0(SS(1))) = {τ}.

Proof. Denote R(0) = (p, s_n : s_{n−1} : · · · : s_0); then π(R(1)) = s_n : s_{n−1} : · · · : s_k. By definition of type_{A,ϕ}, there exists a well-formed annotated stack s = s_n : s_{n−1} : · · · : s_k such that type(top_0(s_k)) = {τ}, and st(s_i) = pos↓(s_i) for each i ∈ [k, n]. Then, by Proposition 7.14, type(s_k) = {red_k(τ)}. Well-formedness of s implies that type(s_i) = π(ass_i(τ)) for each i ∈ [k+1, n] (cf. Proposition 7.14). For i ∈ [1, k−1] let s_i be the well-formed annotated i-stack such that type(s_i) = ∅ and st(s_i) = pos↓(s_i) (we annotate s_i by empty sets). Finally, by Definition 7.11(3), D = (pop, s, p, red_k(τ)) is a derivation tree with conclusion γ ⊢ σ, where γ = pos↓(s_0) and
σ = (p, ass_n(τ), ass_{n−1}(τ), . . . , ass_{k+1}(τ), {(M, red_k(τ))}, ∅, . . . , ∅, np),
so s_0 = (γ, {D}) has type {σ}. Using Proposition 7.14 we observe that s_n : s_{n−1} : · · · : s_0 is well-formed, so σ ∈ type_{A,ϕ}(R(0)).

In order to prove Property (⋆), as v_k we take s_k : s_{k−1} : · · · : s_0. Clearly type(top_0(v_k)) = type(s_0) = {σ}. Consider now any well-formed annotated n-stack s′ with top_k(s′) = v_k. Let SS be the annotated run from s′ to its successor. Because the topmost 0-stack of s′ is annotated by D, the successor of s′ indeed exists, and st(SS) performs pop_k. Moreover, top_k(SS(1)) = s_k, so type(top_0(SS(1))) = type(top_0(s_k)) = {τ}.

Lemma 7.37.
Let R be a run of length 1 performing push_k^α, and let τ ∈ type_{A,ϕ}(R(1)). Then there exists σ ∈ type_{A,ϕ}(R(0)) such that ass_i(τ) ⊆ ass_i(σ) for each i ∈ [1, n] \ {k}. Moreover, there exists a well-formed singular annotated 0-stack v such that type(v) = {σ}, and the following is satisfied.

(⋆) Let s′ be a well-formed annotated n-stack with top_0(s′) = v. Then there exists an annotated run SS of length 1 such that SS(0) = s′, and st(SS) performs push_k^α, and type(top_0(SS(1))) = {τ}.

Proof. Before starting the actual proof, we observe that for each pair of well-formed annotated stacks s, t such that st(s) = st(t) we can construct a well-formed annotated stack s ⊕ t whose type is type(s) ∪ type(t) and such that st(s ⊕ t) = st(s). We construct s ⊕ t by induction on the structure of s. Denote Ψ̃ = type(t) \ type(s). If s is of order 0, then we take s ⊕ t = (γ, D ∪ D′), where s = (γ, D) and t↾_Ψ̃ = (γ, D′). If s = t = [], then s ⊕ t = [] is fine. If s = s_j : s_{j−1} and t↾_Ψ̃ = t_j : t_{j−1}, then as s ⊕ t we take (s_j ⊕ t_j) : (s_{j−1} ⊕ t_{j−1}); observe that it is well-formed, because the types of s and t↾_Ψ̃ are disjoint.

Denote R(0) = (p, s_n : s_{n−1} : · · · : s_1 : (γ, x)); then π(R(1)) equals
s_n : s_{n−1} : · · · : s_{k+1} : (s_k : s_{k−1} : · · · : s_1 : (γ, x)) : (s_{k−1} : s_{k−2} : · · · : s_1 : (α, x)).
By definition of type_{A,ϕ}, there exists a well-formed annotated stack s = s_n : s_{n−1} : · · · : s_1 : (α, {D′}) in which s_k = t_k : t_{k−1} : · · · : t_1 : (γ, D), such that rd(D′) = τ, and st(s_i) = pos↓(s_i) for each i ∈ [1, n] \ {k}, and st(t_i) = pos↓(s_i) for each i ∈ [1, k]. Denote Ψ_i = ass_i(τ) for each i ∈ [1, n]. Well-formedness of s implies that type(s_i) = π(Ψ_i) for each i ∈ [1, n] (cf. Proposition 7.14), and, thanks to Proposition 7.15, there exists a composer (Φ_k, Φ_{k−1}, . . . , Φ_0; Ψ_k; f) such that type(t_i) = π(Φ_i) for each i ∈ [1, k] and {rd(E) : E ∈ D} = π(Φ_0). By Definition 7.11(4), D = (push, γ, p, D′, D) is a derivation tree with conclusion γ ⊢ σ, where
σ = (p, Ψ_n, Ψ_{n−1}, . . . , Ψ_{k+1}, Φ_k, Ψ_{k−1} ∪ Φ_{k−1}, Ψ_{k−2} ∪ Φ_{k−2}, . . . , Ψ_1 ∪ Φ_1, g).
Using Proposition 7.14 we observe that the annotated stack
s_n : s_{n−1} : · · · : s_{k+1} : t_k : (s_{k−1} ⊕ t_{k−1}) : (s_{k−2} ⊕ t_{k−2}) : · · · : (s_1 ⊕ t_1) : (γ, {D})
is well-formed, so σ ∈ type_{A,ϕ}(R(0)).

In order to prove Property (⋆), as v we take (γ, {D}). Clearly type(v) = {rd(D)} = {σ}. Consider now any well-formed annotated n-stack s′ with top_0(s′) = v. Let SS be the annotated run from s′ to its successor. Because the topmost 0-stack of s′ is annotated by D, the successor of s′ indeed exists, and st(SS) performs push_k^α. Moreover, top_0(SS(1)) = (α, {D′}), so type(top_0(SS(1))) = {rd(D′)} = {τ}.

Lemma 7.38.
Let R be a run in which R↾_{0,1} performs push_k^α and R↾_{1,|R|} is a k-return, let τ ∈ type_{A,ϕ}(R(|R|)), and let ρ ∈ type_{A,ϕ}(R(1)) be such that (ϕ(R), red_k(τ)) ∈ ass_k(ρ). Then there exists σ ∈ type_{A,ϕ}(R(0)) such that ϕ(R) ◦ ass_i(τ) ⊆ ass_i(σ) for each i ∈ [1, k]. Moreover, there exists a well-formed singular annotated 0-stack v such that type(v) = {σ}, and the following is satisfied.

(⋆) Let s′ be a well-formed annotated n-stack with top_0(s′) = v. Then there exists an annotated run SS of length 1 such that SS(0) = s′, and st(SS) performs push_k^α, and type(top_0(SS(1))) = {ρ}, and τ ∈ type(top_0(pop_k(SS(1)))).

Proof. Denote R(0) = (p, s_n : s_{n−1} : · · · : s_1 : (γ, x)); then π(R(1)) is as in the previous lemma, and top_k(R(0)) ≅ top_k(R(|R|)) (due to Proposition 6.7). The definition of type_{A,ϕ} gives us a well-formed annotated n-stack u such that type(top_0(u)) = {τ} and conf(u) = R(|R|), and a well-formed annotated stack s = s_n : s_{n−1} : · · · : s_1 : (α, {D′}) such that rd(D′) = ρ and conf(s) = R(1). Denote u_k = top_k(u). Then type(top_0(u_k)) = {τ}, and st(u_k) = pos↓(s_k : s_{k−1} : · · · : s_1 : (γ, x)), and st(s_i) = pos↓(s_i) for each i ∈ [1, n] \ {k}, and st(s_k) = pos↓(s_k : s_{k−1} : · · · : s_1 : (γ, x)). Denote Ψ_i = ass_i(ρ) for each i ∈ [1, n]. Well-formedness of s implies that type(s_i) = π(Ψ_i) for each i ∈ [1, n] (cf. Proposition 7.14). By Proposition 7.14, type(u_k) = {red_k(τ)}. Thanks to the assumption red_k(τ) ∈ π(ass_k(ρ)) we have that type(u_k) ⊆ π(Ψ_k).
In effect, the annotated stack u_k ⊕ s_k has type π(Ψ_k), equal to the type of s_k, but additionally τ ∈ type(top_0(u_k ⊕ s_k)) (recalling the construction from the previous proof, we see that into u_k ⊕ s_k we take all annotations from u_k and some annotations from s_k). Denote u_k ⊕ s_k = t_k : t_{k−1} : · · · : t_1 : (γ, D). By Proposition 7.15 we have a composer (Φ_k, Φ_{k−1}, . . . , Φ_0; Ψ_k; f) such that type(t_i) = π(Φ_i) for each i ∈ [1, k] and {rd(E) : E ∈ D} = π(Φ_0). Because τ ∈ π(Φ_0) and (ϕ(R), red_k(τ)) ∈ Ψ_k, it holds that (ϕ(R), τ) ∈ Φ_0 (thanks to Conditions (C2) and (C3) of the definition of a composer), which implies ϕ(R) ◦ ass_i(τ) ⊆ Φ_i for each i ∈ [1, k] (thanks to Condition (C1) of the definition). By Definition 7.11(4), D = (push, γ, p, D′, D) is a derivation tree with conclusion γ ⊢ σ, where
σ = (p, Ψ_n, Ψ_{n−1}, . . . , Ψ_{k+1}, Φ_k, Ψ_{k−1} ∪ Φ_{k−1}, Ψ_{k−2} ∪ Φ_{k−2}, . . . , Ψ_1 ∪ Φ_1, g).
We observe that the annotated stack
s_n : s_{n−1} : · · · : s_{k+1} : t_k : (s_{k−1} ⊕ t_{k−1}) : (s_{k−2} ⊕ t_{k−2}) : · · · : (s_1 ⊕ t_1) : (γ, {D})
is well-formed (by Proposition 7.14), so σ ∈ type_{A,ϕ}(R(0)).

In order to prove Property (⋆), as v we take (γ, {D}). Clearly type(v) = {rd(D)} = {σ}. Consider now any well-formed annotated n-stack s′ with top_0(s′) = v. Let SS be the annotated run from s′ to its successor. Because the topmost 0-stack of s′ is annotated by D, the successor of s′ indeed exists, and st(SS) performs push_k^α. Moreover, top_0(SS(1)) = (α, {D′}), so type(top_0(SS(1))) = {rd(D′)} = {ρ}. On the other hand, top_0(pop_k(SS(1))) is (γ, D), so its type is {rd(E) : E ∈ D} = π(Φ_0), and we know that τ ∈ π(Φ_0).

Proof of Lemma 7.34.
Recall that we are given an r-return R, and a run descriptor ξ ∈ type_{A,ϕ}(R(|R|)), and we have to show the existence of a run descriptor σ ∈ type_{A,ϕ}(R(0)) such that (ϕ(R), red_r(ξ)) ∈ ass_r(σ). We use induction on the length of the r-return R. Proposition 6.9 gives us the possible forms of R; we analyze these cases.

Suppose first that |R| = 1 and the only transition of R performs pop_r. We take σ from Lemma 7.36, where we take ξ as τ and r as k. By assumption ϕ(R) = M, so (ϕ(R), red_r(ξ)) ∈ ass_r(σ).

Next, suppose that R↾_{1,|R|} is an r-return, and the first transition of R is read, or performs pop_k for k < r, or push_k^α for k ≠ r. The induction assumption for R↾_{1,|R|} gives us a run descriptor τ ∈ type_{A,ϕ}(R(1)) such that (ϕ(R↾_{1,|R|}), red_r(ξ)) ∈ ass_r(τ), and Lemma 7.35, or 7.36, or 7.37, respectively, used for R↾_{0,1}, gives us a run descriptor σ ∈ type_{A,ϕ}(R(0)) such that ϕ(R↾_{0,1}) ◦ ass_r(τ) ⊆ ass_r(σ) (where ϕ(R↾_{0,1}) may be nontrivial only when the transition is read).

Finally, suppose that the first transition of R performs push_k^α for k ≥ r and R↾_{1,|R|} = S ◦ T for some k-return S and r-return T. The induction assumption for T gives us a run descriptor τ ∈ type_{A,ϕ}(T(0)) such that (ϕ(T), red_r(ξ)) ∈ ass_r(τ), and the induction assumption for S gives us a run descriptor ρ ∈ type_{A,ϕ}(R(1)) such that (ϕ(S), red_k(τ)) ∈ ass_k(ρ). Using Lemma 7.38 for R↾_{0,1} ◦ S we obtain a run descriptor σ ∈ type_{A,ϕ}(R(0)) such that ϕ(S) ◦ ass_r(τ) ⊆ ass_r(σ) (recalling that r ≤ k), so (ϕ(R), red_r(ξ)) = (ϕ(S) ◦ ϕ(T), red_r(ξ)) ∈ ϕ(S) ◦ ass_r(τ) ⊆ ass_r(σ).

7.6. Reproducing Upper Runs.
Till now we were using types to describe returns from a configuration, but thanks to the decomposition given by Proposition 6.8 we can also describe r-upper runs. This is stated in the following lemma.

Lemma 7.39.
Let R be an r-upper run (where r ∈ [0, n]), and let τ ∈ type_{A,ϕ}(R(|R|)). Then there exists a run descriptor σ ∈ type_{A,ϕ}(R(0)) and a monotone function f_R : ℕ → ℕ such that the following is satisfied.

(⋆) Let s be a well-formed annotated n-stack such that type(top_0(s)) = {σ} and top_r(conf(s)) ≅ top_r(R(0)). Then there exists a well-formed annotated n-stack t such that type(top_0(t)) = {τ}, and there exists a run S from conf(s) to conf(t) that is (r, ϕ)-parallel to R, and such that low(s) ≤ f_R(♯(S) + low(t)), and f_R(high(s)) ≥ ♯(S) + high(t).

The idea behind the proof of this lemma is that the run R can be split into parts of two kinds. First, we have parts for which the topmost r-stack of R(0) is responsible. Since in conf(s) the topmost r-stack is the same, we can execute them also from conf(s). Second, we have parts not controlled by the topmost r-stack of R(0), but (according to Proposition 6.8) these are returns. Analogous returns can be executed from conf(s), because of the run descriptor σ in type(top_0(s)).

Two aspects of the statement of the lemma can be understood based on the above idea. First, the correction function f_R is really needed (i.e., the lemma would be false if the identity function was always taken as f_R). Second, there need not exist an annotated run from s to t; we only prove the existence of a (non-annotated) run from conf(s) to conf(t). The justification of both these phenomena is the same: while creating the run from conf(s), we completely ignore the annotations contained in the topmost r-stack of s (while an annotated run from s necessarily follows them); instead, as long as the topmost r-stack of st(s) controls the run, we copy steps of the run R (only after leaving the topmost r-stack do we start using the annotations from s). In a sense, f_R describes how much can be lost while ignoring annotations in the topmost r-stack of s (recall that, while ignoring annotations, this r-stack is the same as in R(0), thus fixed; only the annotations are not fixed).

Before proving Lemma 7.39 we show how Theorem 7.3 follows from it. For this purpose the inequalities regarding low and high are redundant; they are used later to prove Theorem 7.4.

Proof of Theorem 7.3.
Recall that we are given a k-upper run R, and a configuration c having the same (A, ϕ)-type and the same positionless topmost k-stack as R(0). Consider the run descriptor τ = (π(R(|R|)), ∅, . . . , ∅, np) and observe that τ ∈ type_{A,ϕ}(R(|R|)) (we annotate the topmost 0-stack of R(|R|) by the derivation tree from Definition 7.11(1)). Applying Lemma 7.39 we obtain a run descriptor σ ∈ type_{A,ϕ}(R(0)) = type_{A,ϕ}(c). Then we take any well-formed annotated n-stack s such that type(top_0(s)) = {σ} and conf(s) = c, which exists by the definition of type_{A,ϕ}. Since top_k(conf(s)) ≅ top_k(R(0)), from Property (⋆) of Lemma 7.39 we obtain a run S that starts in c and is (k, ϕ)-parallel to R, as required.

Below, we give an auxiliary lemma, showing how to construct the function f_R needed for Lemma 7.39.

Lemma 7.40.
Let k ∈ [0, n], and let v_k be a well-formed singular annotated k-stack. Then there exists a monotone function f_{v_k} : ℕ → ℕ such that for all well-formed singular annotated n-stacks s, s′ with s = s_n : s_{n−1} : · · · : s_{k+1} : s_k and s′ = s_n : s_{n−1} : · · · : s_{k+1} : v_k and st(s_k) = st(v_k) it holds that low(s) ≤ f_{v_k}(low(s′)) and f_{v_k}(high(s)) ≥ high(s′).

Proof. We first define the function f_{v_k}, and then we show that it satisfies the thesis. Suppose that k and v_k are fixed. We construct f_{v_k}(N) by induction on N. Consider some N ∈ ℕ. First, we ensure that f_{v_k}(N) ≥ N + low(s_k) for all annotated k-stacks s_k such that st(s_k) = st(v_k). This is possible, because low(s_k) equals the number of productive run descriptors altogether in the types of all 0-stacks in s_k. So, although there are infinitely many annotated k-stacks s_k such that st(s_k) = st(v_k), the value of low(s_k) is bounded by the number of 0-stacks in st(s_k) times |T|. Next, we ensure that f_{v_k}(N) ≥ pow(a_n, a_{n−1}, . . . , a_{k+1}, high(v_k)) for all tuples (a_n, a_{n−1}, . . . , a_{k+1}, a_k) of positive integers such that pow(a_n, a_{n−1}, . . . , a_{k+1}, a_k) = N. Notice that there are only finitely many such tuples (in particular, none of the a_i may be greater than N). Finally, we ensure that f_{v_k}(N) ≥ f_{v_k}(N − 1) (unless N = 0), in order to ensure monotonicity of f_{v_k} (since we are defining f_{v_k} by induction, f_{v_k}(N − 1) is already defined).

Consider now well-formed singular annotated n-stacks s, s′ such that s = s_n : s_{n−1} : · · · : s_{k+1} : s_k, and s′ = s_n : s_{n−1} : · · · : s_{k+1} : v_k, and st(s_k) = st(v_k). Using Proposition 7.29 and the properties of f_{v_k} ensured in its definition, we obtain the required inequalities:

low(s) = low(s_k) + Σ_{i=k+1}^{n} low(s_i) ≤ f_{v_k}(Σ_{i=k+1}^{n} low(s_i)) ≤ f_{v_k}(low(v_k) + Σ_{i=k+1}^{n} low(s_i)) = f_{v_k}(low(s′)),

f_{v_k}(high(s)) = f_{v_k}(pow(high(s_n), high(s_{n−1}), . . . , high(s_{k+1}), high(s_k))) ≥ pow(high(s_n), high(s_{n−1}), . . . , high(s_{k+1}), high(v_k)) = high(s′).

Proof of Lemma 7.39.
The proof is by induction on the length of the r-upper run R. Proposition 6.8 gives us the possible forms of R; we analyze these cases.

If R has length 0, then we can take τ as σ and the identity function as f_R; given s we take it as t, and as S we take the run of length 0 from conf(s).

Suppose that R has length 1, and its transition either is read or performs push_k^α. Then we construct σ and v out of R and τ as in Lemma 7.35 or Lemma 7.37, respectively. Recall that σ ∈ type_{A,ϕ}(R(0)), as needed. As f_R we take the function f_v constructed in Lemma 7.40 for the annotated 0-stack v. Next, we are given a well-formed annotated n-stack s such that type(top_0(s)) = {σ} and top_r(conf(s)) ≅ top_r(R(0)). Let s′ be the annotated n-stack obtained from s by replacing its topmost 0-stack with v. Because type(v) = {σ} = type(top_0(s)), we have that s′ is also well-formed. As t we take the successor of s′, and as S the one-step run from conf(s′) (i.e., from conf(s)) to conf(t). By Property (⋆) of Lemma 7.35 or Lemma 7.37, respectively, the successor of s′ indeed exists, and type(top_0(t)) = {τ}; moreover, the run S performs the same transition as R, and in the case of read it reads the same letter, so S is (r, ϕ)-parallel to R. Recalling that f_R satisfies the thesis of Lemma 7.40, and using Lemma 7.22, we obtain the required inequalities:

low(s) ≤ f_R(low(s′)) ≤ f_R(♯(S) + low(t)), and f_R(high(s)) ≥ high(s′) ≥ ♯(S) + high(t).

Next, suppose that R has length 1 and performs pop_k for k ≤ r. We construct σ and v_k out of R and τ as in Lemma 7.36. Recall that σ ∈ type_{A,ϕ}(R(0)), as needed. As f_R we take the function f_{v_k} constructed in Lemma 7.40 for the annotated k-stack v_k.
Then, we are given an annotated stack s = s_n : s_{n−1} : · · · : s_{k+1} : s_k such that type(top_0(s)) = {σ} and top_r(conf(s)) ≅ top_r(R(0)). Consider s′ = s_n : s_{n−1} : · · · : s_{k+1} : v_k. Because type(top_0(v_k)) = {σ} = type(top_0(s)), by Proposition 7.14 we have that type(v_k) = {red_k(σ)} = type(top_k(s)); in effect s′ is also well-formed. As t we take the successor of s′, and as S the one-step run from conf(s′) to conf(t). By Property (⋆) of Lemma 7.36, the successor of s′ indeed exists, and type(top_0(t)) = {τ}, and the transition of S performs pop_k. Clearly S is (r, ϕ)-parallel to R. The required inequalities are obtained in the same way as in the previous case, due to Lemmas 7.40 and 7.22.

Next, suppose that R↾_{0,1} performs push_k^α and R↾_{1,|R|} is a k-return, where k ≥ r + 1. This case is similar, but slightly more complicated. First, using Lemma 7.34, we construct a run descriptor ρ ∈ type_{A,ϕ}(R(1)) such that (ϕ(R), red_k(τ)) ∈ ass_k(ρ). Then, we construct σ and v out of R, τ, and ρ as in Lemma 7.38. Recall that σ ∈ type_{A,ϕ}(R(0)), as needed. As f_R we take the function f_v constructed in Lemma 7.40 for the annotated 0-stack v. When we are given s, we proceed as follows. First, as s′ we take the annotated n-stack obtained from s by replacing its topmost 0-stack with v. As in the previous cases, s′ is well-formed. Let SS be the one-step annotated run from s′, and let SS(1) = u_n : u_{n−1} : · · · : u_0. By Property (⋆) of Lemma 7.38, SS indeed exists, and st(SS) performs push_k^α, and type(u_0) = {ρ}, and τ ∈ type(top_0(u_k)).
Because (ϕ(R), red_k(τ)) ∈ ass_k(ρ), Lemma 7.33 gives us an annotated run T starting in SS(1) such that st(T) is a k-return, ϕ(st(T)) = ϕ(R), and top_k(T(|T|)) = u_k↾_{red_k(τ)}. As S we take st(SS ◦ T), and as t we take T(|T|). Proposition 7.30 implies that {τ} = type(top_0(u_k↾_{red_k(τ)})) = type(top_0(t)). By Proposition 6.7 we obtain that top_k(R(0)) ≅ top_k(R(|R|)) and top_k(S(0)) ≅ top_k(S(|S|)). Since k ≥ r + 1, top_r(R(|R|)) ≅ top_r(S(|S|)) as well. Together with ϕ(S) = ϕ(S↾_{1,|S|}) = ϕ(R↾_{1,|R|}) = ϕ(R) this means that R and S are (r, ϕ)-parallel, because by definition no suffix of a k-return can be (k − 1)-upper (hence neither r-upper). The inequalities are obtained as in the previous cases.

Finally, suppose that R is a composition of shorter r-upper runs R_1 and R_2. The induction assumption used for R_2 and for τ gives us a run descriptor ρ ∈ type_{A,ϕ}(R_2(0)) and a function f_2. Then, the induction assumption used for R_1 and for ρ gives us a run descriptor σ ∈ type_{A,ϕ}(R_1(0)) and a function f_1. As f_R we take a monotone function such that for each pair a, b of natural numbers it holds that f_R(a) ≥ f_1(a) + f_2(f_1(a)) and f_R(a + b) ≥ f_1(a + f_2(b)). Then, we are given a well-formed annotated n-stack s such that type(top_0(s)) = {σ} and top_r(conf(s)) ≅ top_r(R(0)). From the induction assumption for R_1 we obtain a well-formed annotated n-stack u such that type(top_0(u)) = {ρ}, and a run S_1 from conf(s) to conf(u) being (r, ϕ)-parallel to R_1. Then, from the induction assumption for R_2 we obtain a well-formed annotated n-stack t such that type(top_0(t)) = {τ}, and a run S_2 from conf(u) to conf(t) being (r, ϕ)-parallel to R_2. As S we take the composition of S_1 and S_2; it is (r, ϕ)-parallel to R.
Using the inequalities from the induction assumption we obtain

low(s) ≤ f_1(♯(S_1) + low(u)) ≤ f_1(♯(S_1) + f_2(♯(S_2) + low(t))) ≤ f_R(♯(S_1) + ♯(S_2) + low(t)) = f_R(♯(S) + low(t)),

f_R(high(s)) ≥ f_1(high(s)) + f_2(f_1(high(s))) ≥ ♯(S_1) + high(u) + f_2(♯(S_1) + high(u)) ≥ ♯(S_1) + f_2(high(u)) ≥ ♯(S_1) + ♯(S_2) + high(t) = ♯(S) + high(t).

Sequence-Equivalence.
In the final part of this section we define sequence-equivalence, and we prove Theorem 7.4.
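Definition 7.41 below selects those run descriptors that can be realized with a uniformly bounded high value along the whole sequence of configurations. The selection principle can be sketched as a finite abstraction; all names here are hypothetical, and a genuine check would quantify over all of ℕ rather than over a finite index range:

```python
def stype(descriptors, achievable_highs, indices):
    """Finite abstraction of stype (Definition 7.41): keep sigma iff at
    every index i some well-formed annotated stack of type {sigma}
    realizes the configuration c_i, i.e. achievable_highs(sigma, i) is
    nonempty.  On a finite index range any such choice of stacks is
    trivially bounded; the actual definition additionally requires one
    common bound on high(s_i) over the whole infinite sequence."""
    return {sigma for sigma in descriptors
            if all(achievable_highs(sigma, i) for i in indices)}

# A toy instance: 'sigma' is realizable at every index with high = 1,
# while 'tau' fails to be realizable at index 3.
def toy(descriptor, i):
    if descriptor == 'sigma':
        return {1}
    return set() if i == 3 else {i + 1}
```

Two sequences of configurations are then sequence-equivalent exactly when this selection yields the same set of descriptors for both.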
Definition 7.41.
Let (c_i)_{i=1}^{∞} be a sequence of configurations. We define stype((c_i)_{i=1}^{∞}) ⊆ T to be the set of those σ ∈ T for which there exists a sequence of well-formed annotated n-stacks (s_i)_{i=1}^{∞} such that type(top_0(s_i)) = {σ} and conf(s_i) = c_i for each i, and the sequence (high(s_i))_{i=1}^{∞} is bounded (notice that we require the same type {σ} for all i). We say that two sequences of configurations, (c_i)_{i=1}^{∞} and (d_i)_{i=1}^{∞}, are (A, ϕ)-sequence-equivalent when it holds that stype((c_i)_{i=1}^{∞}) = stype((d_i)_{i=1}^{∞}).

Proof of Theorem 7.4.
Recall that we are given a run R ◦ R′ in which R is k-upper and R′ is an n-return; we are also given two infinite sequences of configurations c_1, c_2, . . . and d_1, d_2, . . . that are (A, ϕ)-sequence-equivalent, and in which all configurations have the same (A, ϕ)-type and the same positionless topmost k-stack as R(0). Our goal is to construct, for each i, runs S_i ◦ S′_i from c_i, and T_i ◦ T′_i from d_i, in which S_i and T_i are (k, ϕ)-parallel to R, and S′_i and T′_i are n-returns such that ϕ(S′_i) = ϕ(T′_i) = ϕ(R′), and such that the sequences ♯(S_1 ◦ S′_1), ♯(S_2 ◦ S′_2), . . . and ♯(T_1 ◦ T′_1), ♯(T_2 ◦ T′_2), . . . are either both bounded or both unbounded. Let ξ = (π(R′(|R′|)), ∅, . . . , ∅, np). We see that ξ ∈ type_{A,ϕ}(R′(|R′|)), because we can annotate the topmost 0-stack (γ, x) by {(empty, γ, π(R′(|R′|)))} and all other 0-stacks by ∅. Lemma 7.34 applied to R′ and ξ implies that type_{A,ϕ}(R′(0)) contains a run descriptor τ such that (ϕ(R′), red_n(ξ)) ∈ ass_n(τ). Then, Lemma 7.39 applied to R and τ gives us a run descriptor σ ∈ type_{A,ϕ}(R(0)) and a function f_R. We have two cases.

Case 1.
Suppose first that σ ∈ stype((c_i)_{i=1}^{∞}) (hence also σ ∈ stype((d_i)_{i=1}^{∞})). Then we have a sequence of annotated n-stacks (s_i)_{i=1}^{∞} such that type(top_0(s_i)) = {σ} and conf(s_i) = c_i for each i, and the sequence (high(s_i))_{i=1}^{∞} is bounded. Recall that the topmost k-stacks of c_i and of R(0) are positionless-equal, for each i. We use Property (⋆) of Lemma 7.39 for the annotated stack s_i. We obtain a well-formed annotated n-stack t_i such that type(top_0(t_i)) = {τ}, and a run S_i from c_i to conf(t_i) being (k, ϕ)-parallel to R and such that f_R(high(s_i)) ≥ ♯(S_i) + high(t_i). Next, for each i we apply Lemma 7.33 for t_i and for the pair (ϕ(R′), red_n(ξ)). We obtain an annotated run SS′_i starting in t_i such that st(SS′_i) is an n-return, ϕ(st(SS′_i)) = ϕ(R′), and type(SS′_i(|SS′_i|)) = {red_n(ξ)}. Let S′_i = st(SS′_i), and u_i = SS′_i(|SS′_i|). Thanks to Lemma 7.22, high(t_i) ≥ ♯(S′_i) + high(u_i). Because (high(s_i))_{i=1}^{∞} is bounded, we see that the sequence (♯(S_i ◦ S′_i))_{i=1}^{∞} is bounded as well.

We perform the same construction for (d_i)_{i=1}^{∞}, obtaining runs T_i ◦ T′_i from d_i such that (♯(T_i ◦ T′_i))_{i=1}^{∞} is bounded.

Case 2.
This is the opposite case: we suppose that σ ∉ stype((c_i)_{i=1}^{∞}). Recall that σ ∈ type_{A,ϕ}(R(0)) = type_{A,ϕ}(c_i) for each i. Using Proposition 7.32 we construct a well-formed annotated n-stack s_i such that type(top_0(s_i)) = {σ}, and conf(s_i) = c_i, and high(s_i) ≤ H(low(s_i)) for a function H not depending on i. Our assumption ensures that (high(s_i))_{i=1}^{∞} is unbounded, so (low(s_i))_{i=1}^{∞} is unbounded as well. We construct the runs exactly in the same way as in Case 1, but this time we concentrate on the opposite inequalities. For each i it holds that low(s_i) ≤ f_R(♯(S_i) + low(t_i)) ≤ f_R(♯(S_i ◦ S′_i) + low(u_i)). Additionally low(u_i) = 0, because type(u_i) = type(SS′_i(|SS′_i|)) = {red_n(ξ)} ⊆ T_np (cf. Proposition 7.27). It follows that (♯(S_i ◦ S′_i))_{i=1}^{∞} is unbounded, and similarly (♯(T_i ◦ T′_i))_{i=1}^{∞}.

8. Milestone Configurations
In this section we define so-called milestone configurations and we show their basic properties. The intuitions are as follows. Consider a long run reading only stars. Looking globally, the stack grows (or remains unchanged). Locally, however, some parts of the stack might be constructed, and a few steps later removed. In order to handle this behavior, we concentrate on those configurations of the run in which the stack is minimal (in an appropriate sense) and will not be destroyed later; they are called milestone configurations.

The idea of considering milestone configurations comes from Kartzow [Kar10], but our definition is slightly different (namely, his definition is relative to a run, which can be arbitrary, while our definition is absolute: we always consider the run reading only stars).

For this section we fix an n-DPDA A with stack alphabet Γ and with input alphabet A containing a distinguished symbol denoted ⋆ (star).

Definition 8.1.
We say that a configuration c is a milestone (or a milestone configuration) if there exists an infinite run R from c reading only stars, and an infinite set I of indices such that 0 ∈ I, and R↾_{i,j} ∈ up for all i, j ∈ I, i ≤ j.

Example 8.2.
Consider a DPDA of order 3. Suppose that there is a run that begins in a stack pos⁺([[[a, a]]]), and performs forever the following sequence of operations, in a loop: push^a, push^a, pop, push^a, pop, push^a. Then the positionless topmost 2-stack is, alternately, [[a, a]], or [[a, a], [a, a]], or [[a, a], [a]]. This run does not read any symbols, so it is a degenerate case of an infinite run that reads only stars. Configurations with positionless topmost 2-stack [[a, a]] are milestones (and no other configurations in this run). To obtain a less degenerate case, we may consider a loop of transitions as above, but containing additionally a read transition; when a star is read, the loop continues (we do not care what happens when any other symbol is read). Then again configurations having [[a, a]] as the topmost 2-stack are milestones.

If c is a milestone, R the (unique) infinite run from c reading only stars, and I a set as in the definition of a milestone, then for each i ∈ I the configuration R(i) is a milestone as well. The following lemma shows that in fact the set I can contain all indices i for which R(i) is a milestone.

Lemma 8.3.
Let R be a run between two milestone configurations. If R reads only stars, then it is 0-upper.

Proof. We prove by induction on n − k, where k ∈ [0, n], that each run R as in the lemma is k-upper. Trivially each run is n-upper. Now suppose that the thesis holds for some k > 0, and consider a run R between two milestone configurations that reads only stars. Let S be the infinite run that starts in R(0) and reads only stars (since R(0) is a milestone, the run is really infinite); R is its prefix. Notice that we can find a milestone S(i) such that i ≥ |R| and S↾_{0,i} is (k−1)-upper: it suffices to take any i ≥ |R| from the infinite set I from Definition 8.1. From the induction assumption we know that S↾_{|R|,i} is k-upper. We conclude that S↾_{0,|R|} = R is (k−1)-upper, since S↾_{0,i} = S↾_{0,|R|} ◦ S↾_{|R|,i}.

Another important property is that in a very long run reading only stars we can find a milestone configuration. What “very long” means of course depends on the size of the configuration where the run starts.

Lemma 8.4.
Let l ∈ [1, n]. There exists a function β, assigning a natural number to every positionless l-stack, with the following property. Let R be a run that reads only stars, let s^l_{|R|} be an l-stack of R(|R|), and let s^l_i = hist(R↾_{i,|R|}, s^l_{|R|}) for all i ∈ [0, |R|]. If there exist at least β(pos↓(s^l_0)) indices i such that s^l_i = top_l(R(i)), then for some index i the configuration R(i) is a milestone and s^l_i = top_l(R(i)).

Corollary 8.5. If R is an infinite run reading only stars, then for infinitely many indices i the configuration R(i) is a milestone.

Proof. To obtain a first milestone configuration, it is enough to use Lemma 8.4 for l := n and for the prefix of R of length β(pos↓(π(R(0)))). We repeat this procedure for the remaining suffix of R. (There also exists a direct proof of this corollary, not presented here, which is much easier than the proof of Lemma 8.4.)

In order to get some intuition on Lemma 8.4, let us first see why it works for l = n. In this case s^n_0 is just the whole n-stack of R(0). Moreover, the assumption that there exist at least β(pos↓(s^n_0)) indices i such that s^n_i = top_n(R(i)) simply expresses that |R| + 1 ≥ β(pos↓(s^n_0)).
Paweł Parys
Vol. 16:3
Thus, the lemma says that if we have a long enough run that starts in a configuration with stack s^l_0 and reads only stars, then the run reaches a milestone configuration. This, in turn, means that we cannot decrease the stack s^l_0 forever. Indeed, recall the intuition that a milestone configuration is a minimal configuration, that is, such that the run reading only stars never visits a "smaller" configuration. It is just enough to consider the infinite run reading only stars, and take the "smallest" configuration visited by this run; this should be a milestone configuration.

When l < n, the lemma concentrates on the history of a single l-stack s^l_{|R|} (another point of view is that it concentrates on the future of a single l-stack s^l_0). We look at the fragments of R where this l-stack is the topmost l-stack; the length of these fragments is required to be at least β(pos↓(s^l_0)) in total. The lemma says that regardless of what happens in other fragments of R, controlled by other parts of the n-stack of R(0) (outside of s^l_0), the stack s^l_0 can itself ensure that a milestone configuration is reached.

In the remaining part of the section we prove Lemma 8.4. Our proof strategy is as follows. The indices i for which s^l_i is the topmost l-stack give us a decomposition of an infix of R into many l-upper runs. As a first step, consecutively for k = l − 1, l − 2, . . . , 0, we divide an infix of R into many k-upper runs. Then, among the borders of the constructed 0-upper runs we find two configurations having the same type. Using Theorem 7.3 we can replicate the 0-upper run between them into arbitrarily many consecutive 0-upper runs, proving that these two configurations are milestones.

The division of an infix of R into k-upper runs is described using k-advancing sets, defined as follows. Assuming that R is fixed, a set I_k ⊆ [0, |R|] is called k-advancing if

∅ ≠ I_k = { i ∈ [min I_k, max I_k] : R↾i,max I_k ∈ up_k }.
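The defining equation of a k-advancing set can be checked mechanically. Below is a small illustrative sketch (not the paper's formalism): the predicate `is_k_upper(i, j)`, standing in for "R↾i,j ∈ up_k", is a hypothetical oracle, replaced here by a toy relation.

```python
def is_k_advancing(candidate, is_k_upper):
    """A nonempty set I is k-advancing iff
    I = { i in [min I, max I] : the subrun R|i,max I is k-upper }."""
    if not candidate:
        return False
    lo, hi = min(candidate), max(candidate)
    induced = {i for i in range(lo, hi + 1) if is_k_upper(i, hi)}
    return candidate == induced

# Toy oracle standing in for "R|i,j is k-upper": here, pretend the subrun
# R|i,j is k-upper exactly when j - i is divisible by 3.
toy = lambda i, j: (j - i) % 3 == 0

print(is_k_advancing({0, 3, 6}, toy))  # True
print(is_k_advancing({0, 2, 6}, toy))  # False: 2 is listed but R|2,6 is not k-upper
```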
Notice that when min I_k ≤ i ≤ j ∈ I_k, then i belongs to I_k if and only if R↾i,j is k-upper. In other words, a k-advancing set not only gives us a decomposition into k-upper runs, but also these k-upper runs cannot be further subdivided into shorter k-upper runs. The following auxiliary lemma describes our induction step.

Lemma 8.6.
Let k ∈ [1, n], and N ∈ ℕ. There exists a function f^k_N : ℕ → ℕ, having the following property. Let R be a run that reads only stars, and let I_k be a k-advancing set. If |I_k| ≥ f^k_N(|top_k(R(min I_k))|), then there exists a (k−1)-advancing subset I_{k−1} ⊆ I_k of size at least N, such that pos↓(top_{k−1}(R(min I_{k−1}))) is one of the (k−1)-stacks in pos↓(top_k(R(min I_k))).

Proof. We prove the lemma by induction on N. For N = 1 we can take f^k_1(r) := 1, and then I_{k−1} := {min I_k}. Let now N ≥
2. We take

f^k_N(r) := 1 + Σ_{m=1}^{r} f^k_{N−1}(m + 1).

Fix some R and I_k satisfying the assumptions. Let a := min I_k and r := |top_k(R(min I_k))|. For each j ∈ I_k denote r_j := |top_k(R(j))| and m_j := min { r_i : i ∈ I_k ∧ i ≤ j }. Notice that 1 ≤ m_j ≤ r (because r_a = r) and that m_j ≥ m_{j′} for j ≤ j′. From the formula for f^k_N(r) we see that for some m we have at least f^k_{N−1}(m + 1) + 1 indices j ∈ I_k such that m_j = m, by the pigeonhole principle (if for every m there were at most f^k_{N−1}(m + 1) such indices, in total we would have at most Σ_{m=1}^{r} f^k_{N−1}(m + 1) = f^k_N(r) − 1 elements of I_k, but we have at least f^k_N(r) of them). Choose some such m; let b be the first index such that m_b = m, and e the last such index. We see that m = r_b. Let c be the next element of I_k after b (of course c ≤ e). Notice that r_c ≤ r_b + 1 = m + 1; this follows from Proposition 6.2 used for the run R↾b,c. Thus,

|I_k ∩ [c, e]| ≥ f^k_{N−1}(m + 1) ≥ f^k_{N−1}(r_c).

We use the induction assumption for I_k ∩ [c, e]. We obtain a (k−1)-advancing subset J_{k−1} ⊆ I_k ∩ [c, e] of size at least N −
1. We take

I_{k−1} := { i ∈ [b, min J_{k−1}] : R↾i,max J_{k−1} ∈ up_{k−1} } ∪ J_{k−1}.

We easily see that I_{k−1} is (k−1)-advancing: we have added to J_{k−1} exactly these indices for which the appropriate run is (k−1)-upper. Because r_i ≥ m_i = m = r_b for each i ∈ I_k ∩ [b, e] (hence, in particular, for each i ∈ I_k ∩ [b, max J_{k−1}]), Proposition 6.3 implies that R↾b,max J_{k−1} is (k−1)-upper. Thus b ∈ I_{k−1}; in addition to the N − 1 elements of J_{k−1}, the set I_{k−1} contains at least one additional element b.

Finally, we show that pos↓(top_{k−1}(R(b))) is one of the (k−1)-stacks in pos↓(top_k(R(a))). We know that r_i ≥ m_i > m = r_b for each i ∈ I_k ∩ [a, b − 1], so R↾i,b is not (k−1)-upper for any such i; this means that the topmost (k−1)-stack of R(b) was not modified since R(a). On the other hand R↾a,b is k-upper; thus, indeed, the topmost (k−1)-stack of R(b) is one of the (k−1)-stacks of the topmost k-stack of R(a) (while ignoring positions annotating the stacks).

While using Theorem 7.3 we need to ensure that the replicated run reads only stars. For this reason, fix a finite monoid M, and a morphism ϕ : A* → M, such that its value ϕ(w) determines whether a word w consists only of stars. Our second auxiliary lemma is used to conclude the proof of Lemma 8.4.

Lemma 8.7.
Let S be a nonempty 0-upper run reading only stars, in which S(|S|) has the same (A, ϕ)-type and the same topmost stack symbol as S(0) (where ϕ is as above). Then S can be extended into a run S ◦ T ◦ U reading only stars, where T and U are nonempty 0-upper runs, and U(|U|) has the same (A, ϕ)-type and the same topmost stack symbol as U(0). As a consequence, S(0) is a milestone.

Proof. First, we observe that for each r ∈ ℕ we can construct a composition S_1 ◦ · · · ◦ S_r of r nonempty 0-upper runs, reading only stars, in which S_1 = S. For r = 1 this is trivially true. Suppose that we have such a composition for some r. Then Theorem 7.3 applied to this composition and to S(|S|) (where we use the fact that S(|S|) has the same (A, ϕ)-type and the same positionless topmost 0-stack as (S_1 ◦ · · · ◦ S_r)(0)) gives us a run that starts in S(|S|) and is (A, ϕ)-parallel to S_1 ◦ · · · ◦ S_r. Recalling the definition of being (A, ϕ)-parallel, we see that this run is a composition S′_1 ◦ · · · ◦ S′_r of r nonempty 0-upper runs reading only stars. Together with S at the beginning, they give a longer composition as required.

Take such a composition for r equal to the number of stack symbols in our alphabet Γ, times the number of (A, ϕ)-types, plus two. Then, by the pigeonhole principle, we can find two indices i, j ∈ [2, r] with i < j for which S_j(|S_j|) has the same (A, ϕ)-type and the same topmost stack symbol as S_i(|S_i|). Skipping the part after S_j, we obtain a composition S ◦ T ◦ U as required.

We can repeat the same construction for U, and append two more nonempty 0-upper runs, out of which the second has equal (A, ϕ)-type and topmost stack symbol at its two
ends. Continuing this forever, we obtain an infinite run reading only stars, divided into 0-upper runs. Since it starts in S(0), this configuration is a milestone.

Figure 5: An example configuration at the end of a run of a 2-DPDA, and an analogous configuration after pumping. The 2-stack grows from left to right. White symbols were already present in R(0); dark gray symbols were created while reading stars at the beginning of R; light gray symbols were created later.

Proof of Lemma 8.4.
Fix some l-stack s^l_0. Let N_0 be equal to the number of stack symbols in the alphabet, times the number of (A, ϕ)-types, plus one, where again ϕ checks whether a word consists only of stars. For k ∈ [1, l] we take N_k = f^k_{N_{k−1}}(r_k), where r_k is the maximal size of a k-stack that appears in s^l_0, and f^k_{N_{k−1}} is the function from Lemma 8.6. We define β(pos↓(s^l_0)) := N_l.

Now take a run R and l-stacks s^l_i for i ∈ [0, |R|], such that the assumptions of the lemma are satisfied. First, for each k ∈ [0, l] we want to construct a k-advancing set I_k of size at least N_k, such that pos↓(top_k(R(min I_k))) is one of the k-stacks in pos↓(s^l_0).

As I_l we take the set of those indices i for which s^l_i = top_l(R(i)). It is immediate from the definitions that I_l is l-advancing (recall that s^l_i = hist(R↾i,j, s^l_j) for i ≤ j). By assumption |I_l| ≥ β(pos↓(s^l_0)) = N_l. Moreover, s^l_{min I_l} was not modified from the beginning of the run (as it was not the topmost l-stack), so this l-stack is positionless-equal to s^l_0.

Then by induction on l − k, we construct I_{k−1} out of I_k using Lemma 8.6. Notice that the size of the topmost k-stack of R(min I_k) is at most r_k, so we can indeed obtain I_{k−1} of size at least N_{k−1}.

Finally, we have a 0-advancing set I_0 such that |I_0| ≥ N_0. Observe that in I_0 we can find two indices i, j with i < j such that R(j) has the same (A, ϕ)-type and the same topmost stack symbol as R(i), by the pigeonhole principle (recall the definition of N_0 from the first paragraph of the proof). Lemma 8.7 applied to R↾i,j proves that R(i) is a milestone. By construction i ∈ I_0 ⊆ · · · ⊆ I_l, so s^l_i = top_l(R(i)).

9. Pumping Lemma
In this section we present a pumping lemma, which can be used to change the number of stars read in some place of a run, without changing the rest of the run too much. For this section we fix an n-DPDA A with input alphabet A containing a ⋆ symbol. We also fix a morphism ϕ : A* → M into a finite monoid.

We start with an intuitive explanation of the pumping lemma. In the situation that we consider, we have a milestone configuration from which we start runs that first read some number of stars, and later also other symbols. One possibility is that most of these runs are k-upper (for some k), except maybe some runs reading a small number of stars at the beginning. Then we are unable to use our pumping lemma, but we gain the knowledge that our run is k-upper. The opposite situation is that there are runs from this configuration that are not k-upper and read arbitrarily many stars at the beginning; our pumping lemma talks about this situation. Consider such a run R of a 2-DPDA, whose last configuration is depicted on the left of Figure 5. It starts in a milestone, so its initial fragment that reads only stars is basically 0-upper. This means that the automaton builds on top of the stack of R(0) (depicted in white), without modifying it; also in the copies of the topmost 1-stack the original part is not modified (the automaton can inspect this part, but then it has to be removed). We consider a run that is not 0-upper, so later, when we start reading other symbols than stars, the "white part" of the topmost 1-stack is uncovered; its content is the same as in R(0). By assumption there exists a run from the same configuration that reads more (arbitrarily many) stars at the beginning, and is not 0-upper. When it uncovers the "white part" of the topmost 1-stack, this part is exactly the same as in the original run, so these runs can continue in the same way.
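The stack discipline described in this paragraph can be made concrete with a minimal toy model of an order-2 stack; `Stack2` and its operation names are our own illustration, not the paper's formal definitions. Here `push2` duplicates the topmost 1-stack, so destroying part of a copy and discarding it uncovers the original, unmodified "white part":

```python
# A minimal model of an order-2 stack (a stack of stacks). This illustrates
# only the stack discipline, not the automaton A itself.

class Stack2:
    def __init__(self, bottom="⊥"):
        self.stacks = [[bottom]]  # list of 1-stacks; the last one is topmost

    def push1(self, a):           # push a symbol on the topmost 1-stack
        self.stacks[-1].append(a)

    def pop1(self):               # remove the topmost symbol
        return self.stacks[-1].pop()

    def push2(self):              # duplicate the topmost 1-stack
        self.stacks.append(list(self.stacks[-1]))

    def pop2(self):               # remove the topmost 1-stack
        return self.stacks.pop()

    def top1(self):
        return list(self.stacks[-1])

s = Stack2()
s.push1("a"); s.push1("b")        # "white" symbols, present from the start
s.push2()                         # work on a copy of the topmost 1-stack...
s.push1("c"); s.pop1(); s.pop1()  # ...inspect and destroy part of the copy
s.pop2()                          # discard the copy
print(s.top1())                   # the original 1-stack is intact: ['⊥', 'a', 'b']
```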
This is depicted on the right of the figure.

Next, we state our pumping lemma. For uniformity of presentation, we refer there also to (−1)-upper runs; by convention, no run is (−1)-upper.

Theorem 9.1 (Pumping lemma). For each milestone configuration c there exists a number pb(c) having the following property. Let R ◦ R′ be a run starting in c, where R is not (k−1)-upper and reads a word beginning with at least pb(c) stars, and R′ is k-upper. In such a situation, for each l ∈ ℕ there exists a run S ◦ S′ starting in c, and such that ϕ(S) = ϕ(R), and S reads a word beginning with at least l stars, and S′ is (k, ϕ)-parallel to R′.

Let us mention that another pumping lemma for higher-order pushdown automata was presented in a former paper by the author [Par12c]. There are several differences between these two lemmas. An advantage of the former lemma is that it gives a precise value for pb(c), in terms of the size of c. Moreover, it works not only for deterministic PDA, but also for nondeterministic PDA in which the ε-closure of the configuration graph is finitely-branching. On the other hand, the former pumping lemma is only given for k = 0. Additionally, it just says that the length of the word read by the run increases, not necessarily the number of stars at its beginning. The former pumping lemma was later generalized to collapsible pushdown automata [KP12].

In the rest of the section we present a proof of Theorem 9.1. Its essence is as described above: we consider the moment when the run ceases to be (k−1)-upper. Typically this happens during a pop_k operation; then our topmost k-stack becomes positionless-equal to pop_k(top_k(R(0))). This can also happen during a pop_r operation for some r > k. Then we can obtain another topmost k-stack, but altogether we have only finitely many possibilities. At least one of these possibilities happens for runs reading an arbitrarily large number of stars at the beginning, by the pigeonhole principle; we can stick to this possibility.
Next, when we change the number of stars read at the beginning, we still land in a configuration having the same positionless topmost k-stack as in the original run, when the run ceases to be (k−1)-upper. When k < n −
1, the type of the rest of the stack is important as well (the latter fragment of the run can perform returns visiting interiors of our stack; the existence
of such returns is described by the type). This is not a problem, since the type comes from a finite set, so we can assume that it is fixed as well.

The most difficult part of the proof is to show that, indeed, when the run ceases to be (k−1)-upper, there are only finitely many possibilities for the topmost k-stack. This is shown in Corollary 9.3. It is based on Lemma 9.2, in which we analyze the situation just after reading the stars.

In order to state Lemma 9.2, we need two definitions. For a run R starting in a configuration c, and for a k-stack s^k in some configuration R(i), where k ∈ [1, n], we say that s^k is c-clear in R(i) (with respect to R) when hist(R↾0,i, top_{k−1}(s^k)) ≠ top_{k−1}(c). Moreover, for a configuration c, and for k ∈ [1, n], let S^k(c) be the smallest set of positionless k-stacks such that if R is a run that starts in c and reads only stars, and s^k is a k-stack of R(|R|) that is c-clear with respect to R, then pos↓(s^k) ∈ S^k(c).

Lemma 9.2.
For each milestone configuration c, and for k ∈ [1, n], the set S^k(c) is finite.

Proof. Let X(c) be the set containing all positionless k-stacks of c, and additionally pos↓(pop_k(top_k(c))); clearly X(c) is finite. We claim that every positionless k-stack in S^k(c) can be obtained from a positionless k-stack t^k ∈ X(c) by applying at most β(t^k) push and pop operations, where β is the function from Lemma 8.4; this immediately implies that S^k(c) is finite.

Fix a run R that starts in c and reads only stars, and fix a c-clear k-stack s^k of R(|R|). Consider the smallest index i for which the k-stack hist(R↾i,|R|, s^k) is c-clear in R(i); denote t^k = hist(R↾i,|R|, s^k). We claim that pos↓(t^k) ∈ X(c). Indeed, either i = 0 and t^k is one of the k-stacks of c, or hist(R↾i−1,|R|, s^k) is not c-clear in R(i − 1). In the latter case the k-stack becomes c-clear in the next configuration, so necessarily this is the topmost k-stack, and the operation between these configurations is pop_k. We see that R↾0,i−1 is (k−1)-upper while R↾0,i is not (k−1)-upper; hence R↾0,i is a k-return, and Proposition 6.6 implies that t^k = top_k(R(i)) ≅ pop_k(top_k(c)); thus, pos↓(t^k) ∈ X(c).

Observe that t^k can be changed in R↾i,|R| only when it is the topmost k-stack. If there exist at most β(pos↓(t^k)) indices j ∈ [i, |R|] such that hist(R↾j,|R|, s^k) = top_k(R(j)), then s^k can be obtained from t^k by applying at most β(pos↓(t^k)) push and pop operations, as we wanted to prove.

It remains to prove that indeed there are at most β(pos↓(t^k)) indices j ∈ [i, |R|] such that hist(R↾j,|R|, s^k) = top_k(R(j)). Suppose to the contrary that there are more than β(pos↓(t^k)) such indices j.
Then we can use Lemma 8.4 for R↾i,|R|; it gives us an index j such that the configuration R(j) is a milestone and hist(R↾j,|R|, s^k) = top_k(R(j)). Because both c and R(j) are milestones, we know that R↾0,j is 0-upper, thanks to Lemma 8.3. One case is that i = 0; then hist(R, s^k) ≠ top_k(c) (because, by the definition of i, hist(R, s^k) is c-clear in c), and we know that hist(R↾j,|R|, s^k) = top_k(R(j)), so R↾0,j is not k-upper; in particular it cannot be 0-upper, a contradiction. Otherwise, as already observed, R↾0,i is not (k−1)-upper. But hist(R↾i,|R|, s^k) = top_k(R(i)), which implies that R↾i,j is k-upper. But R↾0,j is 0-upper, hence (k−1)-upper, and it decomposes as R↾0,i ◦ R↾i,j with a k-upper second part, so R↾0,i would be (k−1)-upper as well; a contradiction.

Corollary 9.3.
For each milestone configuration c there exists a finite set S ( c ) of configu-rations having the following property. Let k ∈ [0 , n ] , let R be a run starting in c , and let r ∈ [0 , | R | ] be such that R (cid:22) ,r reads only stars. Suppose that R is not ( k − -upper, butfor each i ∈ [ r, | R | − either R (cid:22) ,i is ( k − -upper or R (cid:22) i, | R | is not k -upper. Then we can ol. 16:3 ON THE EXPRESSIVE POWER OF HIGHER-ORDER PUSHDOWN SYSTEMS 11:65 find a configuration d ∈ S ( c ) having the same ( A , ϕ ) -type and the same positionless topmost k -stack as R ( | R | ) .Proof. There are only finitely many possible values of an ( A , ϕ )-type of a configuration.Thus, it is enough to show, for each k , that there are only finitely many possible positionlesstopmost k -stacks over all configurations R ( | R | ) satisfying the assumptions. For k = 0 this istrivial as a positionless 0-stack contains just one symbol. Suppose that k ≥
1. We have two cases.

First suppose that R↾i,|R| is k-upper for some i ∈ [r, |R| − 1]; fix such an i. Then by assumption R↾0,i is (k−1)-upper, while R is not; this is possible only because of the final fragment R↾i,|R|. Proposition 6.5 says that R is necessarily a k-return. Thus, top_k(R(|R|)) ≅ pop_k(top_k(R(0))) (cf. Proposition 6.6); the content of this k-stack is fixed.

The other case is that R↾i,|R| is not k-upper for every i ∈ [r, |R| − 1]. Then the topmost k-stack of R(|R|) is an unchanged copy of some k-stack of R(r). As R is not (k−1)-upper, this k-stack of R(r) is c-clear; it thus belongs to the set S^k(c), which is finite by Lemma 9.2.

Proof of Theorem 9.1.
Consider the infinite run P starting at the milestone configuration c and reading only stars. Consider first the degenerate case when in P only finitely many stars are read. As pb(c) we take their number, plus one. Then the thesis is satisfied trivially, as there is no run that starts in c and reads a word beginning with pb(c) stars. So for the rest of the proof suppose that P reads infinitely many stars.

Let S(c) be the set from Corollary 9.3 (used for c). For each i ≥ 0 we define a set T_i ⊆ [0, n] × S(c) × M as follows. A triple (j, d, m) belongs to T_i if and only if there exists a run R from c such that the word read by R begins with (at least) i stars, and ϕ(R) = m, and R(|R|) has the same (A, ϕ)-type and the same positionless topmost j-stack as d. By definition T_{i+1} ⊆ T_i (for each i), and there are only finitely many possible sets, so from some moment every T_i is the same. As pb(c) we take a positive number such that T_i = T_{pb(c)} for all i ≥ pb(c).

Consider now a run R ◦ R′ starting in c, where R is not (k−1)-upper and reads a word beginning with at least pb(c) stars, and R′ is k-upper, for some k ∈ [0, n]. Consider also a number l. Our goal is to construct a run S ◦ S′ starting in c and such that ϕ(S) = ϕ(R), and S reads a word beginning with at least l stars, and S′ is (k, ϕ)-parallel to R′. Without loss of generality, we can assume that l ≥ pb(c). Let r be an index such that R↾0,r reads exactly pb(c) stars.
Without loss of generality, we can assume that there is no i ∈ [r, |R| − 1] such that R↾0,i is not (k−1)-upper while R↾i,|R| is k-upper (if such an i exists, we move the subrun R↾i,|R| to R′, that is, we use the pumping lemma for R↾0,i ◦ (R↾i,|R| ◦ R′), and then in the resulting S′ we find the subrun (k, ϕ)-parallel to R↾i,|R| and we move it back to S).

We use Corollary 9.3 for R and r; its assumptions are satisfied thanks to our "without loss of generality" assumption. We obtain some d ∈ S(c) that has the same (A, ϕ)-type and the same positionless topmost k-stack as R(|R|). It means that (k, d, ϕ(R)) ∈ T_{pb(c)}. Because T_{pb(c)} = T_l, there exists a run S from c such that the word read by S begins with (at least) l stars, and ϕ(S) = ϕ(R), and S(|S|) has the same (A, ϕ)-type and the same topmost k-stack as R(|R|).

Finally, we use Theorem 7.3 for R′ in order to obtain an accepting run S′ that starts in S(|S|) and is (k, ϕ)-parallel to R′.
10. Why U Cannot Be Recognized?
In this section we prove that the language U cannot be recognized by a deterministic higher-order pushdown automaton. Notice that our techniques presented in previous sections were quite general (not too closely tied to the language U). We believe that they can be useful for other purposes, for instance, to analyze the behavior of some automata (in particular automata whose main objective is to count and compare the number of times a symbol appears on the input).

Of course our proof is by contradiction: suppose that for some n we have an (n−1)-DPDA recognizing U. We construct an n-DPDA A that works as follows. First it performs a push_n operation. Then it simulates the (n−1)-DPDA (without using push_n and pop_n operations). When the (n−1)-DPDA accepts, A performs a pop_n operation and afterwards accepts. Clearly, A recognizes U as well (here we use the fact that no word in U is a prefix of another word in U). Such a normalization allows us to use Theorem 7.4, as in A every accepting run is an n-return.

Fix a finite monoid M and a morphism λ : A* → M that checks whether a word is of the form ♯* (some number of ♯ symbols), or of the form ⋆*]⋆* (a closing bracket surrounded by some number of stars), or of neither of these two forms. This means that λ(u) ≠ λ(v) for all words u, v being of different forms. Let N be the number of equivalence classes of the (A, λ)-sequence-equivalence relation, times the number of (A, λ)-types, plus one. Consider the following words:

w_0 = [,
w_{k+1} = w_k^N ]^N [   for k ∈ [0, n − 1],

where the number in the superscript (in this case N) denotes the number of repetitions of a word. For a word w, its pattern is the word obtained from w by removing its letters other than brackets (leaving only brackets).
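For small parameter values, the family of words w_k defined above (ignoring the stars, which may be interleaved anywhere) can be generated directly; this toy sketch just unfolds the definition, and the function names are ours, not the paper's.

```python
def w(k, N):
    """w_0 = '[' and w_{k+1} = w_k^N ']'^N '['  (stars omitted)."""
    word = "["
    for _ in range(k):
        word = word * N + "]" * N + "["
    return word

def pattern(word):
    """The pattern of a word: all letters other than brackets removed."""
    return "".join(c for c in word if c in "[]")

print(w(0, 3))            # [
print(w(1, 3))            # [[[]]][
print(len(w(2, 3)))       # 25 -- the words grow quickly with k
print(pattern("⋆⋆[⋆]♯"))  # []
```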
Fix a morphism ϕ : A* → M such that from its value ϕ(w) we can deduce
• whether the word w contains the ♯ symbol, and
• whether the pattern of w is longer than |w_n| (recall that n is the order of A), and
• the exact value of the pattern of w, whenever this pattern is not longer than |w_n|.

We fix a run R, and an index z(w) for each prefix w of w_n, such that the following holds. The run R begins in the initial configuration. Between R(0) and R(z(ε)) only stars are read. For each prefix w of w_n, the configuration R(z(w)) is a milestone. Just after z(w), the run R reads pb(R(z(w))) stars, where pb is the function from Theorem 9.1 used for the morphism ϕ. If w = va (where a is a single letter), the word read by R between R(z(v)) and R(z(w)) consists of a surrounded by some number of stars. Of course such a run R exists: we read stars until we reach a milestone (this succeeds thanks to Corollary 8.5), then we read as many stars as required by the pumping lemma, then we read the next letter of w_n, and so on (because A accepts U, it will never block).

It is important to analyze relations between configurations R(z(v)) for some prefixes v of w_n. In order to avoid complicated subscripts, for any prefixes v, w of w_n we denote

⟨v, w⟩ := R↾z(v),z(w).

By construction of A, for every prefix v of w_n the run ⟨v, w_n⟩ is (n−1)-upper (A cannot perform the final pop_n operation before reading some ♯ symbol). This contradicts the following key lemma (taken for k = n − 1 and u = ε).

Figure 6: Illustration of runs appearing in the proof (where N = 4, x = 1, y = 3). Recall that stars can appear between letters of the words.

Lemma 10.1.
Let k ∈ [−1, n − 1], and let u be a word such that uw_{k+1} is a prefix of w_n. Then there exists a prefix v of w_{k+1} such that v ≠ w_{k+1} and ⟨uv, uw_{k+1}⟩ is not k-upper.

Proof. The proof is by induction on k. For k = −1 the thesis holds trivially, since no run is (−1)-upper (we can take v = ε). Let now k ≥
0. Figure 6 may be helpful in finding the different runs present in the proof below. Suppose that the thesis of the lemma does not hold. Then for each prefix v of w_{k+1} the run ⟨uv, uw_{k+1}⟩ is k-upper. From this we get the following property ♥.

Let v′ be a prefix of w_{k+1}, and v a prefix of v′. Then ⟨uv, uv′⟩ is k-upper.

In the proof, we construct two sequences of accepting runs, with many extra stars inserted in two different places. Namely, in one sequence we insert extra stars before the last opening bracket that is not closed, and in the other sequence after this bracket. In effect, the number of sharp symbols read by runs in one of these sequences should be unbounded, and in the other bounded. The sequences are constructed in such a way that this violates Theorem 7.4, which says that the number of sharp symbols read by runs in these sequences is either bounded in both sequences or unbounded in both sequences.

Now we come to details. By the induction assumption (where uw_k^{i−1} is taken as u), for each i ∈ [1, N] there exists a prefix v_i of w_k such that ⟨uw_k^{i−1}v_i, uw_k^i⟩ is not (k−1)-upper. Since ⟨uw_k^i, uw_k^N⟩ is k-upper (property ♥), from Proposition 6.4 we know that ⟨uw_k^{i−1}v_i, uw_k^N⟩ cannot be (k−1)-upper. Now we apply the pumping lemma (Theorem 9.1): for each i ∈ [1, N] we use it for ⟨uw_k^{i−1}v_i, uw_k^N⟩ ◦ ⟨uw_k^N, uw_{k+1}⟩. Recall from the definition of R that the word read by ⟨uw_k^{i−1}v_i, uw_k^N⟩ begins with such a number of stars that the pumping lemma can be used. For each number l we obtain a run S_{i,l} ◦ S′_{i,l}, such that ϕ(S_{i,l}) = ϕ(⟨uw_k^{i−1}v_i, uw_k^N⟩), and S_{i,l} reads a word beginning with at least l stars, and S′_{i,l} is (k, ϕ)-parallel to ⟨uw_k^N, uw_{k+1}⟩; let d_{i,l} = S_{i,l}(|S_{i,l}|).
Notice that the run R↾0,z(uw_k^{i−1}v_i) ◦ S_{i,l} starts in the initial configuration, ends in d_{i,l}, and reads a word having pattern uw_k^N.

Because there are finitely many possible (A, λ)-types, we can assume that type_{A,λ}(d_{i,l}) = type_{A,λ}(d_{i,j}) for each i ∈ [1, N] and all l and j. Indeed, we can choose (for each i separately) some value of type_{A,λ}(d_{i,l}) that appears infinitely often, and then we take the subsequence of only these d_{i,l} that give this value.

Since there are more possible indices i ∈ [1, N] than the number of classes of the (A, λ)-sequence-equivalence relation, times the number of (A, λ)-types, there have to exist two indices x, y with 1 ≤ x < y ≤ N such that type_{A,λ}(d_{x,0}) = type_{A,λ}(d_{y,0}), and the sequences
d_{x,0}, d_{x,1}, . . . and d_{y,0}, d_{y,1}, . . . are (A, λ)-sequence-equivalent. From now on we fix these two indices x, y. Furthermore, because S′_{i,l} is (k, ϕ)-parallel to ⟨uw_k^N, uw_{k+1}⟩ for each i ∈ [1, N] and each l, we know that the topmost k-stacks of all d_{x,l} and of all d_{y,l} are positionless-equal.

Let R′ be a prefix of S′_{x,0} that is (k, ϕ)-parallel to ⟨uw_k^N, uw_k^N ]^{N−x}⟩. Notice that R′ consists of N − x runs, each of which is k-upper and reads a word of the form ⋆*]⋆* (a closing bracket surrounded by some number of stars). Let also R″ be an n-return that starts in R′(|R′|) and reads only ♯ symbols (because A recognizes U, there is an accepting run R″ that starts in R′(|R′|) and reads only ♯ symbols; by construction of A, it is an n-return).

Finally, we use Theorem 7.4 for λ (as ϕ), sequences d_{x,0}, d_{x,1}, . . . (as c_0, c_1, . . .) and d_{y,0}, d_{y,1}, . . . (as d_0, d_1, . . .), and for the run R′ ◦ R″. As noticed above (in particular because R′(0) = d_{x,0}), the configurations R′(0), and d_{x,l}, and d_{y,l} for each l all have the same (A, λ)-types and positionless topmost k-stacks. Thus, the assumptions of the theorem are satisfied. For each l, we obtain runs S_l = S′_l ◦ S″_l (from d_{x,l}) and T_l = T′_l ◦ T″_l (from d_{y,l}). The word read by any of these runs contains N − x closing brackets with some number of stars around them, and after them some number of ♯ symbols.

The runs R↾0,z(uw_k^{x−1}v_x) ◦ S_{x,l} ◦ S_l and R↾0,z(uw_k^{y−1}v_y) ◦ S_{y,l} ◦ T_l for each l have pattern uw_k^N ]^{N−x}. In this pattern the last opening bracket that is not closed is the last bracket of the x-th w_k after u.
Recall that the configurations d_{x,l} were obtained by pumping inside the x-th w_k, so before this bracket; for l → ∞ the number of stars inserted there is unbounded. From the definition of the language U it follows that the sequence ♯(S_0), ♯(S_1), . . . has to be unbounded. On the other hand, the configurations d_{y,l} were obtained by pumping inside the y-th w_k, so after the last opening bracket that was not closed (as y > x). For each l the number of stars before this bracket is the same. From the definition of the language U it follows that the sequence ♯(T_0), ♯(T_1), . . . has to be constant, hence bounded. This contradicts the thesis of Theorem 7.4, which says that these sequences are either both bounded or both unbounded.

Acknowledgment
We thank A. Kartzow and the anonymous reviewers of this paper and of its conference version for their constructive comments.
References [AdMO05] Klaus Aehlig, Jolie G. de Miranda, and C.-H. Luke Ong. Safety is not a restriction at level 2for string languages. In Vladimiro Sassone, editor,
FoSSaCS , volume 3441 of
Lecture Notes inComputer Science , pages 490–504. Springer, 2005.[BCHS12] Christopher H. Broadbent, Arnaud Carayol, Matthew Hague, and Olivier Serre. A saturationmethod for collapsible pushdown systems. In Artur Czumaj, Kurt Mehlhorn, Andrew M. Pitts,and Roger Wattenhofer, editors,
ICALP (2) , volume 7392 of
Lecture Notes in Computer Science ,pages 165–176. Springer, 2012.[Blu08] Achim Blumensath. On the structure of graphs in the Caucal hierarchy.
Theor. Comput. Sci. ,400(1-3):19–45, 2008.[Cau02] Didier Caucal. On infinite terms having a decidable monadic theory. In Krzysztof Diks andWojciech Rytter, editors,
MFCS , volume 2420 of
Lecture Notes in Computer Science , pages165–176. Springer, 2002. Notice that we cannot use (cid:104) uw Nk , uw Nk ] N − x (cid:105) instead of R (cid:48) , because do not know anything about the( A , λ )-type of R ( z ( uw Nk ))). ol. 16:3 ON THE EXPRESSIVE POWER OF HIGHER-ORDER PUSHDOWN SYSTEMS 11:69 [Eng91] Joost Engelfriet. Iterated stack automata and complexity classes. Inf. Comput. , 95(1):21–75,1991.[Gil96] Robert H. Gilman. A shrinking lemma for indexed languages.
Theor. Comput. Sci., 163(1&2):277–281, 1996.
[Hay73] Takeshi Hayashi. On derivation trees of indexed grammars. Publ. RIMS, Kyoto Univ., 9:61–92, 1973.
[HK13] Alexander Heußner and Alexander Kartzow. Reachability in higher-order-counters. In Krishnendu Chatterjee and Jiří Sgall, editors, MFCS, volume 8087 of Lecture Notes in Computer Science, pages 528–539. Springer, 2013.
[HMOS08] Matthew Hague, Andrzej S. Murawski, C.-H. Luke Ong, and Olivier Serre. Collapsible pushdown automata and recursion schemes. In LICS, pages 452–461. IEEE Computer Society, 2008.
[Kar10] Alexander Kartzow. Collapsible pushdown graphs of level 2 are tree-automatic. In Jean-Yves Marion and Thomas Schwentick, editors, STACS, volume 5 of LIPIcs, pages 501–512. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2010.
[Kar11] Alexander Kartzow. A pumping lemma for collapsible pushdown graphs of level 2. In Marc Bezem, editor, CSL, volume 12 of LIPIcs, pages 322–336. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2011.
[KNU02] Teodor Knapik, Damian Niwiński, and Paweł Urzyczyn. Higher-order pushdown trees are easy. In Mogens Nielsen and Uffe Engberg, editors, FoSSaCS, volume 2303 of Lecture Notes in Computer Science, pages 205–222. Springer, 2002.
[KNUW05] Teodor Knapik, Damian Niwiński, Paweł Urzyczyn, and Igor Walukiewicz. Unsafe grammars and panic automata. In Luís Caires, Giuseppe F. Italiano, Luís Monteiro, Catuscia Palamidessi, and Moti Yung, editors, ICALP, volume 3580 of Lecture Notes in Computer Science, pages 1450–1461. Springer, 2005.
[KO09] Naoki Kobayashi and C.-H. Luke Ong. Complexity of model checking recursion schemes for fragments of the modal mu-calculus. In Susanne Albers, Alberto Marchetti-Spaccamela, Yossi Matias, Sotiris E. Nikoletseas, and Wolfgang Thomas, editors, ICALP (2), volume 5556 of Lecture Notes in Computer Science, pages 223–234. Springer, 2009.
[Kob09] Naoki Kobayashi. Model-checking higher-order functions. In António Porto and Francisco Javier López-Fraguas, editors, PPDP, pages 25–36. ACM, 2009.
[KP12] Alexander Kartzow and Paweł Parys. Strictness of the collapsible pushdown hierarchy. In Branislav Rovan, Vladimiro Sassone, and Peter Widmayer, editors, MFCS, volume 7464 of Lecture Notes in Computer Science, pages 566–577. Springer, 2012.
[Mas74] A. N. Maslov. The hierarchy of indexed languages of an arbitrary level. Soviet Math. Dokl., 15:1170–1174, 1974.
[Mas76] A. N. Maslov. Multilevel stack automata. Problems of Information Transmission, 12:38–43, 1976.
[Ong06] C.-H. Luke Ong. On model-checking trees generated by higher-order recursion schemes. In LICS, pages 81–90. IEEE Computer Society, 2006.
[Par11] Paweł Parys. Collapse operation increases expressive power of deterministic higher order pushdown automata. In Thomas Schwentick and Christoph Dürr, editors, STACS, volume 9 of LIPIcs, pages 603–614. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2011.
[Par12a] Paweł Parys. Higher-order pushdown systems with data. In Marco Faella and Aniello Murano, editors, GandALF, volume 96 of EPTCS, pages 210–223, 2012.
[Par12b] Paweł Parys. On the significance of the collapse operation. In LICS, pages 521–530. IEEE, 2012.
[Par12c] Paweł Parys. A pumping lemma for pushdown graphs of any level. In Christoph Dürr and Thomas Wilke, editors, STACS, volume 14 of LIPIcs, pages 54–65. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2012.
This work is licensed under the Creative Commons Attribution License. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/