Representation of ordered trees with a given degree distribution
arXiv [cs.DS]

Dekel Tsur ∗

Abstract
The degree distribution of an ordered tree $T$ with $n$ nodes is $\vec{n} = (n_0, \ldots, n_{n-1})$, where $n_i$ is the number of nodes in $T$ with $i$ children. Let $\mathcal{N}(\vec{n})$ be the number of trees with degree distribution $\vec{n}$. We give a data structure that stores an ordered tree $T$ with $n$ nodes and degree distribution $\vec{n}$ using $\log \mathcal{N}(\vec{n}) + O(n/\log^t n)$ bits for every constant $t$. The data structure answers tree queries in constant time. This improves the current data structures with lowest space for ordered trees: the structure of Jansson et al. [JCSS 2012] that uses $\log \mathcal{N}(\vec{n}) + O(n \log\log n/\log n)$ bits, and the structure of Navarro and Sadakane [TALG 2014] that uses $2n + O(n/\log^t n)$ bits for every constant $t$.

A problem that has been studied extensively in recent years is the design of succinct data structures that store a tree while supporting queries on the tree, such as finding the parent of a node or computing the lowest common ancestor of two nodes [1–16,18,19]. The problem of storing a static ordinal tree was studied in [2, 3, 5–8, 11–14, 16]. These papers show that an ordinal tree with $n$ nodes can be stored using $2n + o(n)$ bits while answering queries in constant time. The space of $2n + o(n)$ bits matches the lower bound of $2n - \Theta(\log n)$ bits for this problem. In most of these papers, the $o(n)$ term is $\Omega(n \log\log n/\log n)$. The only exception is the data structure of Navarro and Sadakane [16], which uses $2n + O(n/\log^t n)$ bits for every constant $t$.

Jansson et al. [12] studied the problem of storing a tree with a given degree distribution. The degree distribution of an ordered tree $T$ with $n$ nodes is $\vec{n} = (n_0, \ldots, n_{n-1})$, where $n_i$ is the number of nodes in $T$ with $i$ children. Let $\mathcal{N}(\vec{n})$ be the number of trees with degree distribution $\vec{n}$. Jansson et al. showed a data structure that stores a tree $T$ with degree distribution $\vec{n}$ using $\log \mathcal{N}(\vec{n}) + O(n \log\log n/\log n)$ bits, and answers tree queries in constant time.
This data structure is based on a Huffman code that stores the sequence of node degrees (in preorder). A different data structure was given by Farzan and Munro [5]. The space complexity of this structure is $\log \mathcal{N}(\vec{n}) + O(n \log\log n/\sqrt{\log n})$ bits. The data structure of Farzan and Munro is based on a tree decomposition approach.

In this paper, we give a data structure that stores a tree $T$ using $\log \mathcal{N}(\vec{n}) + O(n/\log^t n)$ bits, for every constant $t$. This result improves both the data structure of Navarro and Sadakane [16] (since $\log \mathcal{N}(\vec{n}) \le 2n$) and the data structure of Jansson et al. [12]. Our data structure supports many tree queries, which are answered in constant time. See Table 1 for some of the queries supported by our data structure.

∗ Department of Computer Science, Ben-Gurion University of the Negev. Email: [email protected]

Table 1: Some of the queries supported by our data structure.
  depth(x)               The depth of x.
  height(x)              The height of x.
  num_descendants(x)     The number of descendants of x.
  parent(x)              The parent of x.
  lca(x, y)              The lowest common ancestor of x and y.
  level_ancestor(x, i)   The ancestor y of x for which depth(y) = depth(x) − i.
  degree(x)              The number of children of x.
  child_rank(x)          The rank of x among its siblings.
  child_select(x, i)     The i-th child of x.
  pre_rank(x)            The preorder rank of x.
  pre_select(i)          The i-th node in the preorder.

Our data structure is based on two components. The first component is the tree decomposition method of Farzan and Munro [5]. While Farzan and Munro used two levels of decomposition, we use an arbitrarily large constant number of levels. The second component is the aB-tree of Patrascu [17], which is a structure for storing an array of poly-logarithmic size with almost optimal space, while supporting queries on the array in constant time. This structure has been used for storing trees in [16,20]. However, in these papers the tree is converted to an array, and tree queries are handled using queries on the array.
In this paper we give a generalized aB-tree which can directly store an object from some decomposable family of objects. This generalization may be useful for the design of succinct data structures for other problems.

The rest of this paper is organized as follows. In Section 2 we describe the tree decomposition of Farzan and Munro [5]. In Section 3 we generalize the aB-tree structure of Patrascu [17]. Finally, in Section 4, we describe our data structure for ordinal trees.

One component of our data structure is the tree decomposition of Farzan and Munro [5]. In this section we describe a slightly modified version of this decomposition.
Lemma 1.
For a tree $T$ with $n$ nodes and an integer $L$, there is a collection $\mathcal{D}_{T,L}$ of subtrees of $T$ with the following properties.
1. Every edge of $T$ appears in exactly one tree of $\mathcal{D}_{T,L}$.
2. The size of every tree in $\mathcal{D}_{T,L}$ is at most $2L + 1$ and at least 2.
3. The number of trees in $\mathcal{D}_{T,L}$ is $O(n/L)$.
4. For every $T' \in \mathcal{D}_{T,L}$, at most two nodes of $T'$ can appear in other trees of $\mathcal{D}_{T,L}$. These nodes are called the boundary nodes of $T'$.
5. A boundary node of a tree $T' \in \mathcal{D}_{T,L}$ can be either a root of $T'$ or a leaf of $T'$. In the latter case the node will be called the boundary leaf of $T'$.
6. For every $T' \in \mathcal{D}_{T,L}$, there are at most two maximal intervals $I_1$ and $I_2$ such that a node $x \in T$ is a non-root node of $T'$ if and only if the preorder rank of $x$ is in $I_1 \cup I_2$.
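The first, second and fourth properties of Lemma 1 can be checked mechanically for a candidate decomposition. The following is a minimal sketch, not part of the paper: the function name `check_decomposition`, the parent-array input format, and the representation of each subtree as a set of nodes are all assumptions made for illustration. It also assumes the size bounds are 2 and 2L + 1 (matching the bound of 2L + 1 nodes that the paper states later for macro trees), and it does not verify connectivity or the remaining properties.

```python
from collections import Counter

def check_decomposition(parent, parts, L):
    """Check properties 1, 2 and 4 of Lemma 1 for a candidate decomposition.

    parent[v] is the parent of node v (None for the root).
    parts is a list of node sets, one per subtree (hypothetical format).
    A tree edge belongs to a part if both of its endpoints do.
    """
    n = len(parent)
    edge_count = Counter()
    for part in parts:
        # Property 2 (assumed bounds): between 2 and 2L + 1 nodes.
        if not (2 <= len(part) <= 2 * L + 1):
            return False
        for v in part:
            p = parent[v]
            if p is not None and p in part:
                edge_count[(p, v)] += 1
    # Property 1: every tree edge appears in exactly one part.
    for v in range(n):
        if parent[v] is not None and edge_count[(parent[v], v)] != 1:
            return False
    # Property 4: at most two nodes of a part appear in other parts.
    membership = Counter(v for part in parts for v in part)
    for part in parts:
        if sum(1 for v in part if membership[v] > 1) > 2:
            return False
    return True

# A path 0-1-2-3-4 split at node 2, which is the only shared node.
path_parent = [None, 0, 1, 2, 3]
good = check_decomposition(path_parent, [{0, 1, 2}, {2, 3, 4}], L=2)
bad = check_decomposition(path_parent, [{0, 1, 2}, {1, 2, 3, 4}], L=2)
```

In the second call the edge between nodes 1 and 2 lies in both parts, so the check fails on the first property.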
We now describe an algorithm that generates the decomposition of Lemma 1 (the algorithm is based on the algorithm of Farzan and Munro, with minor changes). The algorithm uses a procedure pack$(x, x_1, \ldots, x_k)$ that receives a node $x$ and some children $x_1, \ldots, x_k$ of $x$, where each child $x_i$ has an associated subtree $S_i$ of $T$ that contains $x_i$ and some of its descendants. Each tree $S_i$ has size at most $L - 1$. The procedure merges the trees $S_1, \ldots, S_k$ into larger trees as follows.
1. For each $i$, add the node $x$ to $S_i$, and make $x_i$ the child of $x$.
2. $i \leftarrow 1$.
3. Merge $S_i$ with the trees $S_{i+1}, S_{i+2}, \ldots$ (by merging their roots) and stop when the merged tree has at least $L$ nodes, or when there are no more children of $x$.
4. Let $S_j$ be the last tree merged with $S_i$. If $j < k$, set $i \leftarrow j + 1$ and go to step 3.

We say that a node $x \in T$ is heavy if $|T\langle x \rangle| \ge L$, where $T\langle x \rangle$ is the subtree of $T$ that contains $x$ and all its descendants. A heavy node is type 2 if it has at least two heavy children, and otherwise it is type 1.

The decomposition algorithm has two phases. In the first phase the algorithm processes the type 2 heavy nodes. Let $x$ be a type 2 heavy node and let $x_1, \ldots, x_k$ be the children of $x$. Suppose that the heavy nodes among $x_1, \ldots, x_k$ are $x_{h_1}, \ldots, x_{h_{k'}}$, where $h_1 < \cdots < h_{k'}$. The algorithm adds to $\mathcal{D}_{T,L}$ the following trees.
1. A subtree whose nodes are $x$ and the parent of $x$ (if $x$ is not the root of $T$).
2. For $i = 1, \ldots, k'$, a subtree whose nodes are $x$ and $x_{h_i}$.
3. For $i = 1, \ldots, k' + 1$, the subtrees generated by pack$(x, x_{h_{i-1}+1}, \ldots, x_{h_i - 1})$, where the subtree associated with each $x_j$ is $T\langle x_j \rangle$. We assume here that $h_0 = 0$ and $h_{k'+1} = k + 1$, so that the first call starts at $x_1$ and the last call ends at $x_k$.

In the second phase, the algorithm processes maximal paths of type 1 heavy nodes. Let $x_1, \ldots, x_k$ be a maximal path of type 1 heavy nodes ($x_i$ is the child of $x_{i-1}$ for all $i$). If $x_k$ has a heavy child, denote this child by $x'$. Let $S$ be a subtree of $T$ containing $x_1$ and its descendants, except $x'$ and its descendants if $x'$ exists. Let $i$ be the maximal index such that $|S\langle x_i \rangle| \ge L$. If no such index exists, $i = 1$. Now, run pack$(x_i, y_1, \ldots, y_d)$, where $y_1, \ldots, y_d$ are the children of $x_i$ in $S$. The subtree associated with each $y_j$ is $S\langle y_j \rangle$. Each tree generated by procedure pack is added to $\mathcal{D}_{T,L}$. If $i > 1$, add to $\mathcal{D}_{T,L}$ the subtree whose nodes are $\{x_i, x_{i-1}\}$, and continue recursively on the path $x_1, \ldots, x_{i-1}$.

For a tree $T$ and an integer $L$ we define a tree $\mathcal{T}_{T,L}$ as follows. The tree $\mathcal{T}_{T,L}$ has a node $v_S$ for every tree $S \in \mathcal{D}_{T,L}$, and a node $v_r$ which is the root of the tree. For two trees $S_1, S_2 \in \mathcal{D}_{T,L}$, $v_{S_1}$ is the parent of $v_{S_2}$ in $\mathcal{T}_{T,L}$ if and only if the root of $S_2$ is the boundary leaf of $S_1$. The node $v_r$ is the parent of $v_S$ for every $S \in \mathcal{D}_{T,L}$ such that the root of $S$ is the root of $T$.

Observation 2.
For every tree $S \in \mathcal{D}_{T,L}$, if $v_S$ is a leaf of $\mathcal{T}_{T,L}$, the only node of $S$ that is a heavy node of $T$ is the root of $S$. Otherwise, the set of nodes of $S$ that are heavy nodes of $T$ consists of all the nodes on the path from the root of $S$ to the boundary leaf of $S$.

3 Generalized aB-trees
In this section we describe the aB-tree (augmented B-tree) structure of Patrascu [17], and then generalize it. An aB-tree is a data structure that stores an array with elements from a set $\Sigma$. Let $\mathcal{A}$ be the set of all such arrays. Let $B$ be an integer (not necessarily constant), and let $f : \mathcal{A} \to \Phi$ be a function that has the following property: there is a function $f' : \mathbb{N} \times \Phi^B \to \Phi$ such that for every array $A \in \mathcal{A}$ whose size is divisible by $B$, $f(A) = f'(|A|, f(A_1), \ldots, f(A_B))$, where $A = A_1 \cdots A_B$ is a partition of $A$ into $B$ equal sized sub-arrays.

Let $A \in \mathcal{A}$ be an array of size $m = B^t$. An aB-tree of $A$ is a $B$-ary tree defined as follows. The root $r$ of the tree stores $f(A)$. The array $A$ is partitioned into $B$ sub-arrays of size $m/B$, and we recursively build aB-trees for these sub-arrays. The $B$ roots of these trees are the children of $r$. The recursion stops when the sub-array has size 1.

An aB-tree supports queries on $A$ using the following algorithm. The algorithm performs a descent in the aB-tree starting at the root. At each node $v$, the algorithm decides to which child of $v$ to go by examining the $f$ values stored at the children of $v$. We assume that if these values are packed into one word, the decision is performed in constant time. When a leaf is reached, the algorithm returns the answer to the query. Let $\mathcal{N}(n, \alpha)$ denote the number of arrays $A \in \mathcal{A}$ of size $n$ with $f(A) = \alpha$. The following theorem is the main result in [17].

Theorem 3.
If $B = O(w/\log(|A| + |\Phi|))$ (where $w \ge \log n$ is the word size), the aB-tree of an array $A$ can be stored using at most $\log \mathcal{N}(|A|, f(A)) + 2$ bits. The time for performing a query is $O(\log_B |A|)$ using pre-computed tables of size $O(|\Sigma| + |\Phi|^{B+1} + B \cdot |\Phi|^B)$.

We note that the value $f(A)$ is required in order to answer queries, and the space for storing this value is not included in the bound $\log \mathcal{N}(|A|, f(A)) + 2$ of the theorem.

In the rest of this section we generalize Theorem 3. Let $\mathcal{A}$ be a set of objects (for example, $\mathcal{A}$ can be a set of ordered trees). As before, assume there is a function $f : \mathcal{A} \to \Phi_1$. We assume that $f(A)$ encodes the size of $A$ (namely, $|A|$ can be computed from $f(A)$). Suppose that there is a decomposition algorithm that receives an object $A \in \mathcal{A}$ and generates sub-objects $A_1, \ldots, A_B$ (some of these objects can be of size 0) and a value $\beta \in \Phi_2$ which contains the information necessary to reconstruct $A$ from $A_1, \ldots, A_B$. Formally, we denote by Decompose$(A) = (\beta, A_1, \ldots, A_B)$ the output of the decomposition algorithm. We also define a function $g : \mathcal{A} \to \Phi_2$ by $g(A) = \beta$ and functions $f_i : \mathcal{A} \to \Phi_1$ by $f_i(A) = f(A_i)$. Let $\mathcal{F} = \{(g(A), f_1(A), \ldots, f_B(A)) : A \in \mathcal{A}\}$. We assume that the decomposition algorithm has the following properties.
(P1) There is a function Join $: \Phi_2 \times \mathcal{A}^B \to \mathcal{A}$ such that Join(Decompose$(A)$) $= A$ for every $A \in \mathcal{A}$.
(P2) Decompose(Join$(\beta, A_1, \ldots, A_B)$) $= (\beta, A_1, \ldots, A_B)$ for every $A_1, \ldots, A_B \in \mathcal{A}$ and $\beta \in \Phi_2$ such that $(\beta, f(A_1), \ldots, f(A_B)) \in \mathcal{F}$.
(P3) There is a function $f' : \mathcal{F} \to \Phi_1$ such that $f(A) = f'(g(A), f_1(A), \ldots, f_B(A))$ for every $A \in \mathcal{A}$.
(P4) There is a constant $\delta \le B/2$ such that if Decompose$(A) = (\beta, A_1, \ldots, A_B)$, then $|A_i| \le \delta |A|/B$ for all $i$.

Let $\mathcal{N}(\alpha, \beta)$ denote the number of objects $A \in \mathcal{A}$ for which $f(A) = \alpha$ and $g(A) = \beta$. Let $\mathcal{X}_{\alpha,\beta} = \{(\vec{\alpha}, \vec{\beta}) : \vec{\alpha} = (\alpha_1, \ldots, \alpha_B) \in \Phi_1^B,\ \vec{\beta} \in \Phi_2^B,\ (\beta, \alpha_1, \ldots, \alpha_B) \in \mathcal{F},\ f'(\beta, \alpha_1, \ldots, \alpha_B) = \alpha\}$.

Lemma 4.
For every $\alpha \in \Phi_1$ and $\beta \in \Phi_2$,
$$\sum_{((\alpha_1, \ldots, \alpha_B), (\beta_1, \ldots, \beta_B)) \in \mathcal{X}_{\alpha,\beta}} \ \prod_{i=1}^{B} \mathcal{N}(\alpha_i, \beta_i) = \mathcal{N}(\alpha, \beta).$$
Proof.
Let $\mathcal{A}_1$ be the set of all tuples $(A_1, \ldots, A_B) \in \mathcal{A}^B$ such that $((f(A_1), \ldots, f(A_B)), (g(A_1), \ldots, g(A_B))) \in \mathcal{X}_{\alpha,\beta}$. Let $\mathcal{A}_2$ be the set of all $A \in \mathcal{A}$ such that $f(A) = \alpha$ and $g(A) = \beta$. We need to show that $|\mathcal{A}_1| = |\mathcal{A}_2|$. Define a mapping $h$ by $h(A_1, \ldots, A_B) = $ Join$(\beta, A_1, \ldots, A_B)$. We will show that $h$ is a bijection from $\mathcal{A}_1$ to $\mathcal{A}_2$.

Fix $(A_1, \ldots, A_B) \in \mathcal{A}_1$ and denote $A = $ Join$(\beta, A_1, \ldots, A_B)$. By the definition of $\mathcal{A}_1$ and $\mathcal{X}_{\alpha,\beta}$, $(\beta, f(A_1), \ldots, f(A_B)) \in \mathcal{F}$, and by Property (P2), Decompose$(A) = (\beta, A_1, \ldots, A_B)$. Hence, $f_i(A) = f(A_i)$ for all $i$ and $g(A) = \beta$. We have $f(A) = f'(g(A), f_1(A), \ldots, f_B(A)) = f'(\beta, f(A_1), \ldots, f(A_B)) = \alpha$, where the first equality follows from Property (P3) and the third equality follows from the definition of $\mathcal{X}_{\alpha,\beta}$. We have also shown above that $g(A) = \beta$. Therefore, $h(A_1, \ldots, A_B) \in \mathcal{A}_2$.

The mapping $h$ is injective due to Property (P2). We next show that $h$ is surjective. Fix $A \in \mathcal{A}_2$. By definition, $f(A) = \alpha$ and $g(A) = \beta$. Let Decompose$(A) = (\beta, A_1, \ldots, A_B)$. By Property (P1), $h(A_1, \ldots, A_B) = A$, so it remains to show that $(A_1, \ldots, A_B) \in \mathcal{A}_1$. By definition, $(\beta, f(A_1), \ldots, f(A_B)) = (\beta, f_1(A), \ldots, f_B(A)) \in \mathcal{F}$. By Property (P3), $f'(\beta, f(A_1), \ldots, f(A_B)) = f(A) = \alpha$. Therefore, $(A_1, \ldots, A_B) \in \mathcal{A}_1$.

We now define a generalization of an aB-tree. A generalized aB-tree of an object $A \in \mathcal{A}$ is defined as follows. The root $r$ of the tree stores $f(A)$ and $g(A)$. Suppose that Decompose$(A) = (\beta, A_1, \ldots, A_B)$. Recursively build aB-trees for $A_1, \ldots, A_B$; the roots of these trees are the children of $r$. The recursion stops when the object has size 1 or 0.

The following theorem generalizes Theorem 3. The proof of the theorem is very similar to the proof of Theorem 3 and uses Lemma 4 in order to bound the space.

Theorem 5.
If $B = O(w/\log(|\Phi_1| + |\Phi_2|))$, the generalized aB-tree of an object $A \in \mathcal{A}$ can be stored using at most $\log \mathcal{N}(f(A), g(A)) + 2$ bits. The time for performing a query is $O(\log_B |A|)$ using pre-computed tables of size $O(a + |\Phi_1|^B \cdot |\Phi_2|^B \cdot (|\Phi_1| \cdot |\Phi_2| + B))$, where $a$ is the number of objects in $\mathcal{A}$ of size 1.

For a tree $T$ with degree distribution $\vec{n} = (n_0, \ldots, n_{n-1})$ define the tree degree entropy $H^*(T) = \frac{1}{n} \sum_{i : n_i > 0} n_i \log \frac{n}{n_i}$. Since $nH^*(T) = \log \mathcal{N}(\vec{n}) + O(\log n)$, it suffices to show a data structure for $T$ that uses $nH^*(T) + O(n/\log^t n)$ bits for any constant $t$.

Let $t$ be some constant. Define $B = \log^{1/3} n$ and $L = \log^{t+2} n$. As in [17], define $e(i)$ to be the rounding up of $\log \frac{n}{n_i}$ to a multiple of $1/L$. If $n_i = 0$, $e(i)$ is the rounding up of $\log n$ to a multiple of $1/L$. For a tree $S$ define $E(S) = \sum_{i=1}^{|S|} e(\mathrm{degree}(\mathrm{pre\_select}_S(i)))$, where $\mathrm{pre\_select}_S(i)$ is the $i$-th node of $S$ in preorder. Let $\Sigma = \{i \le n - 1 : n_i > 0\}$. We say that a tree $S$ is a $\Sigma$-tree if for every node $x$ of $S$, except perhaps the root, $\mathrm{degree}(x) \in \Sigma$.

Lemma 6.
For every $m \le n$ and $a \ge 0$, the number of $\Sigma$-trees $S$ with $m$ nodes and $E(S) = a$ is at most $2^{a+1}$.
Proof.
For a string $A$ over the alphabet $\Sigma$ define $E(A) = \sum_{i=1}^{|A|} e(A[i])$. Let $\mathcal{N}(m, a)$ be the number of strings $A$ over $\Sigma$ with length $m$ and $E(A) = a$. We first prove by induction on $m$ that $\mathcal{N}(m, a) \le 2^a$ (we note that this inequality was stated in [17] without a proof). The base $m = 0$ is trivial. We now prove the induction step. Let $A$ be a string of length $m$ with $E(A) = a$. Clearly, $e(A[1]) \le a$, since otherwise $E(A) > a$, contradicting the assumption that $E(A) = a$. If we remove $A[1]$ from $A$, we obtain a string $A'$ of length $m - 1$ with $E(A') = E(A) - e(A[1]) \ge 0$. Therefore, $\mathcal{N}(m, a) = \sum_{i \in \Sigma : e(i) \le a} \mathcal{N}(m - 1, a - e(i))$. Using the induction hypothesis, we obtain that
$$\mathcal{N}(m, a) \le \sum_{i \in \Sigma : e(i) \le a} 2^{a - e(i)} \le \sum_{i \in \Sigma} 2^{a - \log \frac{n}{n_i}} = 2^a \sum_{i \in \Sigma} \frac{n_i}{n} = 2^a.$$

We now bound the number of $\Sigma$-trees with $m$ nodes and $E(S) = a$. We say that a $\Sigma$-tree is of type 1 if the degree of its root is in $\Sigma$, and otherwise the tree is of type 2. With every $\Sigma$-tree $S$ we associate a string $A_S$ in which $A_S[i] = \mathrm{degree}(\mathrm{pre\_select}_S(i))$. If $S$ is a type 1 $\Sigma$-tree then $A_S$ is a string over the alphabet $\Sigma$ and $E(A_S) = E(S)$. Therefore, the number of type 1 $\Sigma$-trees $S$ with $m$ nodes and $E(S) = a$ is at most $\mathcal{N}(m, a) \le 2^a$. If $S$ is a type 2 $\Sigma$-tree then $A_S[2..m]$ is a string over the alphabet $\Sigma$ and $E(A_S[2..m]) = E(S) - a'$, where $a'$ is the rounding up of $\log n$ to a multiple of $1/L$. Since there are at most $m$ ways to choose the degree of the root of $S$, it follows that the number of type 2 $\Sigma$-trees $S$ with $m$ nodes and $E(S) = a$ is at most $m \cdot \mathcal{N}(m - 1, a - a') \le n \cdot 2^{a - a'} \le 2^a$. Summing the two bounds gives at most $2^{a+1}$ trees.

To build our data structure on $T$, we first partition $T$ into macro trees using the decomposition algorithm of Lemma 1 with parameter $L$. On each macro tree $S$ we build a generalized aB-tree as follows.

Let $\mathcal{A}$ be the set of all $\Sigma$-trees with at most $2L + 1$ nodes, and in which one of the leaves may be designated a boundary leaf. We first describe procedure Decompose. For a tree $S \in \mathcal{A}$, Decompose$(S)$ generates subtrees $S_1, \ldots, S_B$ of $S$ by applying the algorithm of Lemma 1 on $S$ with parameter $L(S) = \Theta(|S|/B)$, where the constant hidden in the $\Theta$ notation is chosen such that the number of trees in the decomposition is at most $B$ (such a constant exists due to part 3 of Lemma 1). This algorithm generates subtrees $S_1, \ldots, S_k$ of $S$, with $k \le B$. The subtrees $S_1, \ldots, S_k$ are numbered according to the preorder ranks of their roots, and two subtrees with a common root are numbered according to the preorder rank of the first child of the root.
If $k < B$ we add empty subtrees $S_{k+1}, \ldots, S_B$.

We next describe the mappings $f : \mathcal{A} \to \Phi_1$ and $g : \mathcal{A} \to \Phi_2$. Recall that $g(S)$ is the information required to reconstruct $S$ from $S_1, \ldots, S_B$. In our case, $g(S)$ is the balanced parenthesis string of the tree $\mathcal{T}_{S,L(S)}$. The number of nodes in $\mathcal{T}_{S,L(S)}$ is $k + 1$. Since $k \le B$, $g(S)$ is a binary string of length at most $2B + 2$. Thus, $|\Phi_2| = O(2^{2B})$.

We define $f(S)$ to be a vector $(E(S), |S|, s_S, s'_S, s''_S, d_S, l_S, p_S)$ whose components are defined as follows.
• $s_S = |S\langle x \rangle|$, where $x$ is the rightmost child of the root of $S$ (recall that $S\langle x \rangle$ is the subtree of $S$ containing $x$ and its descendants).
• $s'_S = |S\langle x' \rangle|$, where $x'$ is the child of the root of $S$ which is on the path between the root of $S$ and the boundary leaf of $S$. If $S$ does not have a boundary leaf, $s'_S = 0$.
• $s''_S = \max_y |S\langle y \rangle|$, where the maximum is taken over every node $y$ of $S$ whose parent is on the path between the root of $S$ and the boundary leaf of $S$, and $y$ is not on this path. If $S$ does not have a boundary leaf, the maximum is taken over all children $y$ of the root of $S$.
• $d_S$ is the number of children of the root of $S$.
• $l_S$ is the distance between the root of $S$ and the boundary leaf of $S$. If $S$ does not have a boundary leaf, $l_S = 0$.
• $p_S$ is the number of nodes in $S$ that appear before the boundary leaf of $S$ in the preorder of $S$. If $S$ does not have a boundary leaf, $p_S = 0$.

We note that the value $E(S)$ is required in order to bound the space of the aB-trees. The values $|S|$, $s_S$, $s'_S$, $s''_S$, $d_S$, and $l_S$ are required in order to satisfy Property (P2) of Section 3 (see the proof of Lemma 9 below). These values are also used for answering queries. Finally, the value $p_S$ is needed to answer queries.

The values $|S|, s_S, s'_S, s''_S, d_S, l_S, p_S$ are integers bounded by $L$. Moreover, $E(S)$ is a multiple of $1/L$ and $E(S) \le L(\log n + 1/L) = L \log n + 1$. Therefore, $|\Phi_1| = O(L^7 \cdot L^2 \log n) = O(L^9 \log n)$. It follows that the condition $B = O(w/\log(|\Phi_1| + |\Phi_2|))$ of Theorem 5 is satisfied (since $B = \log^{1/3} n$ and $w/\log(|\Phi_1| + |\Phi_2|) = \Omega(w/B) = \Omega(\log^{2/3} n)$). Moreover, the size of the lookup tables of Theorem 5 is $O(2^{B(B+1)}) = O(\sqrt{n})$.

The following lemmas show that Properties (P1)–(P4) of Section 3 are satisfied.

Lemma 7.
Property (P1) is satisfied.
Proof.
We define the function Join as follows. Given a balanced parenthesis string $\beta$ of a tree $S_\beta$ and trees $S_1, \ldots, S_B$, the tree $S = $ Join$(\beta, S_1, \ldots, S_B)$ is constructed as follows. For $i = 1, \ldots, B$, associate the tree $S_i$ to the node $\mathrm{pre\_select}_{S_\beta}(i + 1)$. For every internal node $v$ in $S_\beta$, merge the boundary leaf of the tree $S_i$ associated with $v$, and the roots of the trees associated with the children of $v$ (if $v$ is the root of $S_\beta$, just merge the roots of the trees associated with the children of $v$). By definition, Join(Decompose$(S)$) $= S$ for every tree $S$.

Lemma 8.
Let $S = $ Join$(\beta, S_1, \ldots, S_B)$. If a node $x \in S$ is a boundary node of some tree $S_i$, the values of $|S\langle x \rangle|$ and $\mathrm{degree}(x)$ can be computed from $\beta, f(S_1), \ldots, f(S_B)$.
Proof.
Let $S_\beta$ be the tree whose balanced parenthesis string is $\beta$. Assume that $x$ is not the root of $S$ (the proof for the case when $x$ is the root is similar). Therefore, $x$ is the boundary leaf of some tree $S_i$. Let $I_1$ be a set containing every index $j \ne i$ such that $\mathrm{pre\_select}_{S_\beta}(j + 1)$ is a descendant of $\mathrm{pre\_select}_{S_\beta}(i + 1)$. Observe that $|S\langle x \rangle| = 1 + \sum_{j \in I_1} (|S_j| - 1)$. Let $I_2$ be a set containing every index $j$ such that $\mathrm{pre\_select}_{S_\beta}(j + 1)$ is a child of $\mathrm{pre\_select}_{S_\beta}(i + 1)$. By part 1 of Lemma 1, $\mathrm{degree}(x) = \sum_{j \in I_2} d_{S_j}$. The lemma now follows since $I_1, I_2$ can be computed from $\beta$, and $|S_j|, d_{S_j}$ are components of $f(S_j)$.

Lemma 9.
Property (P2) is satisfied.
Proof.
Suppose that $\beta \in \Phi_2$ is a balanced parenthesis string and $S_1, \ldots, S_B \in \mathcal{A}$ are trees such that $(\beta, f(S_1), \ldots, f(S_B)) \in \mathcal{F}$ (recall that $\mathcal{F} = \{(g(S), f_1(S), \ldots, f_B(S)) : S \in \mathcal{A}\}$). We need to show that Decompose(Join$(\beta, S_1, \ldots, S_B)$) $= (\beta, S_1, \ldots, S_B)$. Denote $S = $ Join$(\beta, S_1, \ldots, S_B)$. By the definition of $\mathcal{F}$, there is a tree $S^*$ such that $g(S^*) = \beta$ and $f_i(S^*) = f(S_i)$ for all $i$. Denote Decompose$(S^*) = (\beta, S^*_1, \ldots, S^*_B)$. Let $S_\beta$ be a tree whose balanced parenthesis string is $\beta$.

By Lemma 8, $|S| = |S^*|$ and therefore $L(S) = L(S^*)$. Recall that a node of $S$ or $S^*$ is heavy if the size of its subtree is at least $L(S)$. Define the skeleton of a tree to be the subtree that contains the heavy nodes of the tree. We first claim that the skeleton of $S^*$ can be reconstructed from $\beta, f(S^*_1), \ldots, f(S^*_B)$. To prove this claim, define trees $P_1, \ldots, P_B$, where $P_i$ is a path of length $l_{S^*_i}$. By Observation 2, the skeleton of $S^*$ is isomorphic to Join$(\beta, P_1, \ldots, P_B)$.

We now show that $S$ and $S^*$ have isomorphic skeletons. Consider some subtree $S_i$ such that $\mathrm{pre\_select}_{S_\beta}(i + 1)$ is not a leaf of $S_\beta$. Let $x$ be the boundary leaf of $S_i$, and let $x^*$ be the boundary leaf of $S^*_i$. By Lemma 8 and Observation 2, $|S\langle x \rangle| = |S^*\langle x^* \rangle| \ge L(S^*) = L(S)$, so $x$ is a heavy node of $S$. Therefore, all the nodes of $S$ that are on the path between the root of $S_i$ and the boundary leaf of $S_i$ are heavy nodes of $S$ (this follows from the fact that all ancestors of a heavy node are heavy). Let $S'$ be the subtree of $S$ containing all the nodes of $S$ that are nodes on the path between the root and the boundary leaf of $S_i$, for every $S_i$ such that $\mathrm{pre\_select}_{S_\beta}(i + 1)$ is not a leaf of $S_\beta$. Since $l_{S_i} = l_{S^*_i}$ for all $i$, it follows that $S'$ is isomorphic to Join$(\beta, P_1, \ldots, P_B)$ and to the skeleton of $S^*$. It remains to show that $S'$ is the skeleton of $S$.
Assume to the contrary that there is a heavy node $y$ of $S$ which is not in $S'$. We can choose such $y$ whose parent $x$ is in $S'$. Let $S_i$ be the tree containing $y$. Since $y$ is not on the path between the root and the boundary leaf of $S_i$, all the descendants of $y$ are in $S_i$. Since $x$ is on the path between the root and the boundary leaf of $S_i$ (if $S_i$ does not have a boundary leaf, $x$ is the root of $S_i$), $s''_{S_i} \ge |S\langle y \rangle| \ge L(S) = L(S^*)$. It follows that $s''_{S^*_i} = s''_{S_i} \ge L(S^*)$, which means that $S^*_i$ has a heavy node which is not on the path between the root and the boundary leaf. This contradicts Observation 2. Therefore, $S$ and $S^*$ have isomorphic skeletons.

We now prove that Decompose$(S) = (\beta, S_1, \ldots, S_B)$. Suppose we run the decomposition algorithm on $S$ and on $S^*$. In the first phase of the algorithm, the algorithm processes type 2 heavy nodes. Since $S$ and $S^*$ have isomorphic skeletons, there is a bijection between the type 2 heavy nodes of $S$ and the type 2 heavy nodes of $S^*$. Let $x$ be a type 2 heavy node of $S$ and let $x^*$ be the corresponding type 2 heavy node of $S^*$. Let $x^*_1, \ldots, x^*_k$ be the children of $x^*$, and let $x^*_{h_1}, \ldots, x^*_{h_{k'}}$ be the heavy children of $x^*$. When processing $x^*$, the decomposition algorithm generates the following subtrees of $S^*$.
1. A subtree whose nodes are $x^*$ and its parent.
2. For $j = 1, \ldots, k'$, a subtree whose nodes are $x^*$ and $x^*_{h_j}$.
3. For $j = 1, \ldots, k' + 1$, the subtrees generated by pack$(x^*, x^*_{h_{j-1}+1}, \ldots, x^*_{h_j - 1})$.

For every subtree $S^*_j$ of the first two types above that is generated when processing $x^*$, the subtree $S_j$ is generated when processing $x$ (since the number of heavy children of $x$ is equal to the number of heavy children of $x^*$). We now consider the subtrees of the third type. Suppose without loss of generality that $h_1 > 1$. Consider the call to pack$(x^*, x^*_1, \ldots, x^*_{h_1 - 1})$. The first tree generated by this call, denoted $S^*_a$, consists of $x^*$, some children $x^*_1, \ldots, x^*_l$ of $x^*$, and all the descendants of $x^*_1, \ldots, x^*_l$, where $l = d_{S^*_a}$. From the definition of procedure pack, $\sum_{j=1}^{l-1} |S^*\langle x^*_j \rangle| < L(S^*) - 1$. Additionally, if $l < h_1 - 1$, then $\sum_{j=1}^{l} |S^*\langle x^*_j \rangle| \ge L(S^*) - 1$.

Let $I = \{\mathrm{pre\_rank}(x^*_j) - 1 : j = 1, \ldots, h_1 - 1\}$. We have that $h_1 - 1 = \sum_{j \in I} d_{S^*_j}$. The number of children of $x$ before the first heavy child of $x$ is equal to $\sum_{j \in I} d_{S_j} = \sum_{j \in I} d_{S^*_j} = h_1 - 1$. Let $x_1, \ldots, x_{h_1 - 1}$ be these children.

Since $d_{S_a} = d_{S^*_a} = l$, when the decomposition algorithm processes the node $x$ of $S$ we have
$$\sum_{j=1}^{l-1} |S\langle x_j \rangle| = |S_a| - s_{S_a} - 1 = |S^*_a| - s_{S^*_a} - 1 = \sum_{j=1}^{l-1} |S^*\langle x^*_j \rangle| < L(S^*) - 1 = L(S) - 1.$$
Additionally, if $l < h_1 - 1$,
$$\sum_{j=1}^{l} |S\langle x_j \rangle| = |S_a| - 1 = |S^*_a| - 1 = \sum_{j=1}^{l} |S^*\langle x^*_j \rangle| \ge L(S^*) - 1 = L(S) - 1.$$
Therefore, the first tree generated by the call to pack$(x, x_1, \ldots, x_{h_1 - 1})$ is $S_a$. Continuing with the same arguments, we obtain that for every tree $S^*_j$ generated by a call to pack$(x^*, \cdot)$ when processing $x^*$, the tree $S_j$ is generated by a call to pack$(x, \cdot)$ when processing $x$.

Now consider the second phase of the algorithm. Let $x^*_1, \ldots, x^*_k$ be a maximal path of type 1 heavy nodes of $S^*$, and let $x_1, \ldots, x_k$ be the corresponding maximal path of type 1 heavy nodes of $S$. For simplicity, assume that $x^*_k$ does not have a heavy child. Let $S^*_{a_1}, S^*_{a_2}, \ldots$ be the subtrees generated by pack$(x^*_i, y^*_1, y^*_2, \ldots)$, where $y^*_1, y^*_2, \ldots$ are the children of $x^*_i$. Let $S^*_a$ be the subtree from $S^*_{a_1}, S^*_{a_2}, \ldots$ that contains $x^*_k$. Let $l = l_{S^*_a} = l_{S_a}$. By the definition of the decomposition algorithm, $s'_{S^*_a} = |S^*\langle x^*_{k-l+1} \rangle| < L(S^*)$. Moreover, if $l < k - 1$, $1 + \sum_j (|S^*_{a_j}| - 1) = |S^*\langle x^*_{k-l} \rangle| \ge L(S^*)$. Therefore, $|S\langle x_{k-l+1} \rangle| = s'_{S_a} = s'_{S^*_a} < L(S)$, and if $l < k - 1$, $|S\langle x_{k-l} \rangle| = 1 + \sum_j (|S_{a_j}| - 1) = 1 + \sum_j (|S^*_{a_j}| - 1) \ge L(S)$. Therefore, when processing the path $x_1, \ldots, x_k$, the decomposition algorithm makes a call to pack$(x_i, y_1, y_2, \ldots)$, where $y_1, y_2, \ldots$ are the children of $x_i$. Using the same argument used for the first phase of the algorithm, we obtain that the trees $S_{a_1}, S_{a_2}, \ldots$ are generated by pack$(x_i, y_1, y_2, \ldots)$.

Lemma 10.
Property (P3) is satisfied.
Proof.
Let $S$ be a tree and Decompose$(S) = (\beta, S_1, \ldots, S_B)$. Recall that $f(S) = (E(S), |S|, s_S, s'_S, s''_S, d_S, l_S, p_S)$ and $g(S) = \beta$ is the balanced parenthesis string of $\mathcal{T}_{S,L(S)}$. A node $x$ of $S$ is called an inner boundary node if it is a boundary node of some subtree $S_i$. By definition, $E(S)$ is equal to $\sum_{i=1}^{B} (E(S_i) - e(d_{S_i}))$ plus the sum of $e(\mathrm{degree}(x))$ for every inner boundary node $x$ of $S$. By Lemma 8, every such value $e(\mathrm{degree}(x))$ can be computed from $g(S), f(S_1), \ldots, f(S_B)$. Therefore, $E(S)$ can be computed from $g(S), f(S_1), \ldots, f(S_B)$.

Similarly, $|S|$ is equal to $\sum_{i=1}^{B} (|S_i| - 1)$ plus the number of inner boundary nodes of $S$. The number of inner boundary nodes of $S$ is equal to the number of internal nodes in $\mathcal{T}_{S,L(S)}$. Thus, $|S|$ can be computed from $g(S), f(S_1), \ldots, f(S_B)$.

We next consider $s_S$. Let $x$ be the rightmost child of the root of $S$. Let $v_{S_i}$ be the rightmost child of the root of $\mathcal{T}_{S,L(S)}$. Then, the tree $S_i$ contains both the root of $S$ and $x$. If $x$ is not the boundary leaf of $S_i$ then all the descendants of $x$ are in $S_i$. Thus, $s_S = s_{S_i}$. Otherwise, by Lemma 8, $s_S$ can be computed from $g(S), f(S_1), \ldots, f(S_B)$.

The other components of $f(S)$ can also be computed from $g(S), f(S_1), \ldots, f(S_B)$. We omit the details.

Lemma 11.
Property (P4) is satisfied.
Proof.
The lemma follows from part 2 of Lemma 1.

Our data structure for the tree $T$ consists of the following components.
• For each macro tree $S$, the aB-tree of $S$, stored using Theorem 5.
• For each macro tree $S$, the values $f(S)$ and $g(S)$.
• Additional information and data structures for handling queries, which will be described later.

The space of the aB-trees and the values $f(S), g(S)$ is $\sum_S (\log \mathcal{N}(f(S), g(S)) + 2 + \lceil \log |\Phi_1| \rceil + \lceil \log |\Phi_2| \rceil)$, where the summation is over every macro tree $S$. By part 3 of Lemma 1, the number of macro trees is $O(n/L)$, and therefore
$$\sum_S (2 + \lceil \log |\Phi_1| \rceil + \lceil \log |\Phi_2| \rceil) = O(n/L \cdot B) = O(n/\log^t n).$$
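As a rough check of the last bound, the arithmetic can be spelled out. The estimate below is a sketch under stated assumptions rather than a new claim: it assumes the parameter choices $L = \log^{t+2} n$ and $B = \log^{1/3} n$, that the number of macro trees is $O(n/L)$, and that $\log |\Phi_1| + \log |\Phi_2| = O(B)$ (the first term is $O(\log L + \log\log n)$ and the second is $O(B)$).

```latex
\begin{align*}
\sum_S \bigl(2 + \lceil \log |\Phi_1| \rceil + \lceil \log |\Phi_2| \rceil\bigr)
  &= O\!\left(\frac{n}{L}\right) \cdot O(B) \\
  &= O\!\left(\frac{n \log^{1/3} n}{\log^{t+2} n}\right)
   = O\!\left(\frac{n}{\log^{t+5/3} n}\right)
   \subseteq O\!\left(\frac{n}{\log^{t} n}\right).
\end{align*}
```

Under these assumptions the overhead is smaller than the target bound by a factor of $\log^{5/3} n$, which leaves slack for the $O(n/L \cdot \log n)$ terms used for the auxiliary query structures.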
9e next bound P S log N ( f ( S ) , g ( S )). Since E ( S ) , | S | are components of f ( S ), we havefrom Lemma 6 that N ( f ( S ) , g ( S )) ≤ E ( S )+1 . Therefore, P S log N ( f ( S ) , g ( S )) ≤ P S ( E ( S )+1). By definition, P S E ( S ) is equal to E ( T ) + P S e ( d S ) minus the sum of e (degree( x ))for every node x of T which is a boundary node of some macro tree. Therefore, X S E ( S ) ≤ E ( T )+ X S e ( d S ) ≤ ( nH ∗ ( T )+ O ( n/L ))+ O ( n/L · log n ) = nH ∗ ( T )+ O ( n/ log t n ) . Most of the queries on T are handled in a similar way these queries are handled in thedata structure of Farzan and Munro [5]. We give some examples below. We assume thata node x in T is represented by its preorder number. In order to compute the macro treethat contains a node x , we store the following structures. • A rank-select structure on a binary string B of length n in which B [ x ] = 1 if nodes x and x − • An array M in which M [ i ] is the number of the macro tree that contains node x = select ( B, i ).By part 6 of Lemma 1, the number of ones in B is O ( n/L ). Therefore, the space for B is O ( n/L · log L ) + O ( n/ log t n ) = O ( n/ log t n ) bits (using the rank-select structure ofPatrascu [17]), and the space for M is O ( n/L · log n ) = O ( n/ log t n ) bits.For handling depth( x ) queries, the data structure stores the depths of the roots ofthe macro trees. The required space is O ( n/L · log n ) = O ( n/ log t n ) bits. Answering adepth( x ) query is done by finding the macro tree S containing x . Then, add the depth ofthe root of S (which is stored in the data structure) to the distance between x and theroot of S . The latter value is computed using the aB-tree of S . It suffices to describehow to compute this value when the aB-tree is stored naively. Recall that the root of theaB-tree corresponds to S , and the children of the root corresponds to subtrees S , . . . , S B of S . 
Finding the subtree $S_i$ that contains $x$ can be done using a lookup table indexed by $g(S)$, $|S_1|, \ldots, |S_B|$, and $p_{S_1}, \ldots, p_{S_B}$. Next, compute the distance between the root of $S_i$ and the root of $S$ using a lookup table indexed by $g(S)$ and $l_{S_1}, \ldots, l_{S_B}$. Then the query algorithm descends to the $i$-th child of the root of the aB-tree and continues the computation in a similar manner.

The handling of level ancestor queries differs from the way these queries are handled in the structure of Farzan and Munro. We define weights on the edges of $\mathcal{T}_{T,L}$ as follows. For every non-root node $v_S$ in $\mathcal{T}_{T,L}$, the weight of the edge between $v_S$ and its parent is $l_S$. The data structure stores a weighted ancestor structure on $\mathcal{T}_{T,L}$. We use the structure of Navarro and Sadakane [16], which has $O(1)$ query time. The space of this structure is $O(n' \log n' \cdot \log(n'W) + n'W/\log^{t'}(n'W))$ for every constant $t'$, where $n' = |\mathcal{T}_{T,L}|$ and $W$ is the maximum weight of an edge of $\mathcal{T}_{T,L}$. Since $n' = O(n/L)$ and $W = O(L)$, we obtain that the space is $O(n/\log^t n)$ bits.

In order to answer a $\mathrm{level\_ancestor}(x, d)$ query, first find the macro tree $S$ that contains $x$. Then use the aB-tree of $S$ to find $\mathrm{level\_ancestor}(x, d)$ if this node is in $S$. Otherwise, let $r$ be the root of $S$ and let $d'$ be the distance between $r$ and $x$ ($d'$ is computed using the aB-tree). Next, perform a $\mathrm{level\_ancestor}(\mathrm{parent}(v_S), d - d')$ query on $\mathcal{T}_{T,L}$, and let $v_{S'}$ be the answer. Let $v_{S''}$ be the child of $v_{S'}$ which is an ancestor of $v_S$. The node $\mathrm{level\_ancestor}(x, d)$ is in the macro tree $S''$, and it can be found using a query on the aB-tree of $S''$.

References

[1] D. Arroyuelo, P. Davoodi, and S. R. Satti. Succinct dynamic cardinal trees. Algorithmica, 74(2):742–777, 2016.
[2] D. Benoit, E. D. Demaine, J. I. Munro, R. Raman, V. Raman, and S. S. Rao. Representing trees of higher degree.
Algorithmica, 43(4):275–292, 2005.
[3] O. Delpratt, N. Rahman, and R. Raman. Engineering the LOUDS succinct tree representation. In Proc. 5th Workshop on Experimental and Efficient Algorithms (WEA), pages 134–145, 2006.
[4] A. Farzan and J. I. Munro. Succinct representation of dynamic trees. Theoretical Computer Science, 412(24):2668–2678, 2011.
[5] A. Farzan and J. I. Munro. A uniform paradigm to succinctly encode various families of trees. Algorithmica, 68(1):16–40, 2014.
[6] R. F. Geary, N. Rahman, R. Raman, and V. Raman. A simple optimal representation for balanced parentheses. Theoretical Computer Science, 368(3):231–246, 2006.
[7] R. F. Geary, R. Raman, and V. Raman. Succinct ordinal trees with level-ancestor queries. ACM Transactions on Algorithms, 2(4):510–534, 2006.
[8] A. Golynski, R. Grossi, A. Gupta, R. Raman, and S. S. Rao. On the size of succinct indices. In Proc. 15th European Symposium on Algorithms (ESA), pages 371–382, 2007.
[9] A. Gupta, W.-K. Hon, R. Shah, and J. S. Vitter. A framework for dynamizing succinct data structures. In Proc. 34th International Colloquium on Automata, Languages and Programming (ICALP), pages 521–532, 2007.
[10] M. He, J. I. Munro, and S. R. Satti. Succinct ordinal trees based on tree covering. ACM Transactions on Algorithms, 8(4):42, 2012.
[11] G. Jacobson. Space-efficient static trees and graphs. In Proc. 30th Symposium on Foundation of Computer Science (FOCS), pages 549–554, 1989.
[12] J. Jansson, K. Sadakane, and W.-K. Sung. Ultra-succinct representation of ordered trees with applications. J. of Computer and System Sciences, 78(2):619–631, 2012.
[13] J. I. Munro, R. Raman, V. Raman, and S. S. Rao. Succinct representations of permutations and functions. Theoretical Computer Science, 438:74–88, 2012.
[14] J. I. Munro and V. Raman. Succinct representation of balanced parentheses and static trees. SIAM J. on Computing, 31(3):762–776, 2001.
[15] J. I. Munro, V. Raman, and A. J. Storm. Representing dynamic binary trees succinctly. In Proc. 12th Symposium on Discrete Algorithms (SODA), pages 529–536, 2001.
[16] G. Navarro and K. Sadakane. Fully-functional static and dynamic succinct trees. ACM Transactions on Algorithms, 10(3):article 16, 2014.
[17] M. Pătraşcu. Succincter. In Proc. 49th Symposium on Foundation of Computer Science (FOCS), pages 305–313, 2008.
[18] R. Raman, V. Raman, and S. R. Satti. Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Transactions on Algorithms, 3(4):43, 2007.
[19] R. Raman and S. S. Rao. Succinct dynamic dictionaries and trees. In Proc. 30th International Colloquium on Automata, Languages and Programming (ICALP), pages 357–368, 2003.
[20] D. Tsur. Succinct representation of labeled trees.