Multidimensional segment trees can do range updates in poly-logarithmic time
Multidimensional segment trees can do range queries and updates in logarithmic time
Nabil Ibtehaz, M. Kaykobad, and M. Sohel Rahman
{kaykobad, msrahman}@cse.buet.ac.bd
Department of CSE, BUET, ECE Building, West Palasi, Dhaka-1205, Bangladesh
* Corresponding author
November 6, 2018
Abstract
Updating and querying on a range is a classical algorithmic problem with a multitude of applications. The Segment Tree data structure is particularly notable in handling the range query and update operations. A Segment Tree divides the range into disjoint segments and merges them together to perform range queries and range updates elegantly. Although this data structure is remarkably potent for 1-dimensional problems, it falls short in higher dimensions. Lazy Propagation enables the operations to be computed in $O(\log n)$ time in a single dimension. However, the concept of lazy propagation could not be translated to higher dimensional cases, which imposes a time complexity of $O(n^{k-1} \log n)$ for operations on $k$-dimensional data. In this paper, we have made an attempt to emulate the idea of lazy propagation differently so that it can be applied for 2-dimensional cases. Moreover, the proposed modification is capable of performing any general aggregate function similar to the original Segment Tree, and can also be extended to even higher dimensions. Our proposed algorithm manages to perform range queries and updates in $O(\log^2 n)$ time for a 2-dimensional problem, which becomes $O(\log^d n)$ for a $d$-dimensional situation.

Index terms:
Dynamic range query, Lazy Propagation, Multidimensional data, Range sum query, Segment Tree, Tree data structures

1 Introduction
Range queries appear frequently in different problems in Computer Science. Many interesting and useful problems can also be reduced to range queries. For example, the problem of computing the Lowest Common Ancestor (LCA) of two nodes in a tree can be reduced to a range minimum query problem [1] to solve it efficiently [2]. Range updates and range queries also have diverse applications in domains like database theory [3, 4], sensor networks [5], image processing [6], cryptography [7], computational geometry [8, 9], and geographic information systems [10].

In its simplest form, the dynamic range query problem involves two operations, namely, query and update. Suppose we are considering range sum queries and updates; then a query for a given range will return the sum of all the elements within the supplied range. On the other hand, an update in this context adds a specific value (given with the update request) to each of the elements within the range. In the range query and update context, the aim is to construct a data structure that can be efficiently used to answer queries and handle updates.

The Segment Tree [8] data structure is particularly notable in handling the range query and update operations. A Segment Tree is a complete binary tree where the nodes operate on segments. These segments are divided into two equal or near equal parts recursively and are merged later on to perform dynamic range query operations efficiently in logarithmic time. Compared to other similar data structures like Fenwick Trees [11], Segment Trees are more versatile, which has led to diverse applications of this tree-based data structure. Segment Trees have applications in networking [12, 13], computer vision [14], and computational geometry [15, 16], to name a few.

However, if we consider update operations, Segment Trees have only been successful in a single dimension. In higher dimensions, they suffer greatly due to their inability to handle range updates efficiently.
In one dimension, Segment Trees exploit the idea of Lazy Propagation, which allows them to perform range updates in logarithmic time [17]. Unfortunately, this trick does not generalize to higher dimensions. In spite of the heavy use of Segment Trees in solving different problems in the literature, we do not find much work focusing on improving the Segment Tree itself. In this paper, we have made an attempt to improve the performance of the update operation of the Segment Tree in higher dimensions. In particular, we propose and incorporate a novel idea of scaled update and partial query that, in combination with the concept of lazy propagation, equips Segment Trees with the capability to perform range update operations efficiently in the two dimensional context (Section 3). We also discuss how our approach generalizes to higher ($> 2$) dimensions (Section 4). To the best of our knowledge, this is the first (successful) attempt to achieve this feat, and we believe that our modified Segment Tree data structure will be extremely useful in diverse applications, especially since other available data structures fall short in this regard.
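To make the one-dimensional baseline concrete, the following is a minimal sketch of a range-add / range-sum Segment Tree with lazy propagation. It is an illustration under stated assumptions, not the authors' code: the class and method names are hypothetical, indices are 0-based, the array starts at zero, and lazy values are left in place and accumulated during the query rather than pushed down.

```python
class LazySegmentTree:
    """Range-add / range-sum Segment Tree over indices 0..n-1 (hypothetical names)."""

    def __init__(self, n):
        self.n = n
        self.value = [0] * (4 * n)  # sum of the node's segment, own lazy included
        self.lazy = [0] * (4 * n)   # pending add for every element of the segment

    def update(self, lo, hi, c, node=1, left=0, right=None):
        """Add c to every element in [lo, hi] (inclusive)."""
        if right is None:
            right = self.n - 1
        if hi < left or right < lo:          # disjoint: nothing to do
            return
        if lo <= left and right <= hi:       # fully covered: record lazily and stop
            self.value[node] += (right - left + 1) * c
            self.lazy[node] += c
            return
        mid = (left + right) // 2
        self.update(lo, hi, c, 2 * node, left, mid)
        self.update(lo, hi, c, 2 * node + 1, mid + 1, right)
        # Backtrack: children plus this node's own pending lazy contribution.
        self.value[node] = (self.value[2 * node] + self.value[2 * node + 1]
                            + self.lazy[node] * (right - left + 1))

    def query(self, lo, hi, node=1, left=0, right=None, carried=0):
        """Return the sum over [lo, hi]; `carried` is the lazy total of all ancestors."""
        if right is None:
            right = self.n - 1
        if hi < left or right < lo:
            return 0
        if lo <= left and right <= hi:
            return self.value[node] + carried * (right - left + 1)
        mid = (left + right) // 2
        carried += self.lazy[node]
        return (self.query(lo, hi, 2 * node, left, mid, carried)
                + self.query(lo, hi, 2 * node + 1, mid + 1, right, carried))
```

Both operations visit $O(\log n)$ nodes because recursion stops at the first node whose segment is fully covered by the requested range.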
In this paper we primarily discuss range sum queries and updates over a two dimensional array. A sub-array of a two dimensional array $A$ is $A[x_1 : x_2, y_1 : y_2]$, where $x_1, x_2$ are coordinates along the 1st dimension ($x$ dimension), and $y_1, y_2$ are coordinates along the 2nd dimension ($y$ dimension). It consists of all the elements $A[x, y]$ such that $x_1 \le x \le x_2$ and $y_1 \le y \le y_2$. The query operation on the aforementioned sub-array returns the sum of all the elements within the sub-array:

$$\mathrm{query}(A[x_1 : x_2, y_1 : y_2]) = \sum_{x = x_1}^{x_2} \sum_{y = y_1}^{y_2} A[x, y]$$

The update operation, on the other hand, adds a constant value $c$ to all the elements of the sub-array:

$$\mathrm{update}(A[x_1 : x_2, y_1 : y_2], c) \rightarrow A[x, y] = A[x, y] + c, \quad \forall\, x_1 \le x \le x_2,\ y_1 \le y \le y_2$$

A Segment Tree can perform range queries and updates in logarithmic time following a divide and conquer approach [8]. It is defined on a segment, which is recursively divided into two small segments and merged together to compute aggregate functions on the segments. Although Segment Tree following this approach can only perform point updates, the concept of ‘Lazy Propagation’ allows the data structure to perform range queries and updates in logarithmic time as well. Segment Trees can be extended to higher dimensions by successively cascading Segment Trees. In a 2D Segment Tree, each of the nodes of the Segment Tree contains another Segment Tree therein. Thus, two segments along the two dimensions are simultaneously considered. However, lazy propagation concept does not generalize to higher dimensions and hence 2D (or higher dimensional) Segment Trees cannot do better than $O(n^{d-1} \log n)$ time query and update operations, where $n$ is the maximum size of the array in any one dimension and $d$ is the number of dimensions [18]. The operations of Segment Tree are illustrated in Figure 1. A more comprehensive overview on Segment Trees can be found in Appendix A.

Figure 1: 1 Dimensional Segment Tree, with panels (a) Segment Tree, (b) Update(3,4) and (c) Query(3,3). 1a shows the structure of the Segment Tree, which is a complete binary tree and all the nodes are defined on a segment, while the leaf nodes work on individual indices. 1b illustrates the lazy propagation operation. While updating the range (3,4), upon entering the node operating on (3,4) range, we update lazy values and backtrack instead of traversing to the individual leaf nodes. 1c demonstrates the query operation. While querying on the index 3, we keep dividing the regions into 2 segments until the node is reached. Along the way the lazy values are passed and finally, the value from the node is retrieved, combining them we obtain the result of the query.

2.3 Terminologies
For the rest of the paper, we denote the 1st (2nd) dimension as $x$-dimension ($y$-dimension) or simply $x$ ($y$). We use the term region to mean the sub-array of the array spanned by that region, i.e., indices of the elements. Also by $\log n$ we imply $\log_2 n$. We also define the terms $x$-superregion and $x$-subregion as follows.

Definition 2.1. A region $r \equiv [x_1 : x_2, y_1 : y_2]$ is an $x$-superregion of a region $r' \equiv [x'_1 : x'_2, y'_1 : y'_2]$, if the ranges along $y$-dimension are equal in both cases (i.e., $y_1 = y'_1$, $y_2 = y'_2$), but the range along $x$-dimension of $r'$ is a proper subrange of that of $r$ ($[x_1 : x_2] \cup [x'_1 : x'_2] = [x_1 : x_2]$, $[x_1 : x_2] \cap [x'_1 : x'_2] = [x'_1 : x'_2]$, $[x_1 : x_2] \ne [x'_1 : x'_2]$). On the other hand, in the above, $r'$ is said to be an $x$-subregion of the region $r$.

In our approach, we propose and implement two different types of updates in the Segment Tree data structure (see Section 3.1.2 for details) as defined below.
Definition 2.2 (Dispersed and Intended Updates). Consider updating a region $r \equiv [x_{start} : x_{end}, y_{start} : y_{end}]$. Now suppose $r$ is an $x$-subregion of $r' \equiv [x'_1 : x'_2, y'_1 : y'_2]$. Also suppose that $r$ is either an $x$-superregion of $r'' \equiv [x''_1 : x''_2, y''_1 : y''_2]$, or completely matches with $r''$. Then, we refer to the update operation as ‘Dispersed Update’ (‘Intended Update’) when we are updating the node $n$ operating on the region $r'$ ($r''$).

Furthermore, we classify the queries into two classes as defined below.
Definition 2.3 (Partial and Complete Queries). While querying on a range $r \equiv [x_{start} : x_{end}, y_{start} : y_{end}]$, we perform a Partial (Complete) Query on node $n$ operating on range $r' \equiv [x_1 : x_2, y_1 : y_2]$ such that $r$ is an $x$-subregion ($x$-superregion) of $r'$ (or completely matches with $r'$).

In our proposed 2D Segment Tree, at each node, we store two types of values and lazy updates, namely, Global and Local. Global values and global lazy updates propagate along the first dimension. These are the results of Intended Updates and are used both in Partial and Complete Queries. Local values and local lazy updates, on the contrary, remain confined within the first dimension and do not propagate. They account for Dispersed Updates and are only considered during Complete Queries. In case of Partial Queries, we dilute the values stored in the nodes by scaling them, and we only consider the global values stored in the nodes. In case of Complete Queries, we do not dilute the values stored in the nodes and consider both local and global values. We discuss the details in the next section.

In what follows we will be using the following notations with respect to a Segment Tree. A node $n$ of a Segment Tree is defined by the following attributes:
• n.range : The range of the node specifies the segment it is operating on.
• n.size : The size denotes the length, i.e., the number of elements in the segment it is operating on.
• n.value : The value of a node equals the result of the function under consideration when operated on the segment the node is defined on. For example, for our range sum query problem, the value represents the sum of all the elements in the segment.
• n.lazy : The value of the lazy update of a node signifies that all the descendants of that node should be updated by this value implicitly.

In our proposed modification we decompose the values and lazy updates into two components, ‘Global’ and ‘Local’. Therefore these quantities are denoted by n.global.value, n.global.lazy, n.local.value, n.local.lazy for the Global and Local components respectively.

In this section we present our proposed approach, prove its correctness and analyze the time and space complexity thereof.
In what follows we assume that we are handling the dynamic range sum query problem over a two dimensional array $A$ of size $n \times m$.

The construction of our proposed 2D Segment Tree is almost identical to that of the original 2D Segment Tree. However, in our proposed Segment Tree, at every node, we store two types of values, namely, local values and global values (as discussed in the previous section). The reason for this and further details thereof will be spelled out shortly.
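The per-node storage described above can be pictured with a small sketch. The layout below is one possible arrangement, not the authors' implementation: the names `Component`, `InnerNode` and `build_empty_tree` are hypothetical, the array is assumed to start at zero, and the common $4n$ array bound is used for each dimension.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One (value, lazy) pair; the tree keeps a Global and a Local copy."""
    value: int = 0
    lazy: int = 0


@dataclass
class InnerNode:
    """Node of a 1D tree over the y dimension (hypothetical layout)."""
    global_: Component = field(default_factory=Component)  # 'global' is a Python keyword
    local: Component = field(default_factory=Component)


def build_empty_tree(n, m):
    """Allocate an outer tree over x whose nodes each hold an inner tree over y.

    Uses the common 4*n array bound per dimension; all values start at zero.
    """
    return [[InnerNode() for _ in range(4 * m)] for _ in range(4 * n)]
```

Each outer index stands for a segment along $x$, and the list stored there is the inner Segment Tree over $y$ for that segment.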
The goal of an update operation in our context is to update a region $r \equiv [x_{start} : x_{end}, y_{start} : y_{end}]$ by adding a constant value $c$ (supplied with the update request) to each of the elements of the array $A$ residing within the range $r$. Similar to the original 2D Segment Tree algorithm, our modified algorithm first divides the regions along the first dimension ($x$ dimension), and then starts breaking the regions along the second dimension ($y$ dimension). The update algorithm starts from the root node $n_R$, which is defined on the entire range along the $x$ dimension, i.e., $[1 : n]$. If the region $r$ we intend to update covers the entire range along the $x$ dimension, i.e., $x_{start} = 1$, $x_{end} = n$, we do not break the regions along the $x$ dimension; rather we go inside the root node $n_R$, which itself contains a 1D Segment Tree $T_{n_R}$. We then start dividing the range along the $y$ dimension following the classical 1D Segment Tree algorithm. We again start from the root node of the 1D Segment Tree $T_{n_R}$, which accounts for all the regions whose range along the $x$ dimension is $[1 : n]$. We then keep breaking the regions along the $y$ dimension until we obtain a node $n_{R_j}$ that covers the region $r$ we are updating, or one of its subregions. Then we update the global value of the node $n_{R_j}$ and set the global lazy value as shown in Equations (1) and (2) below.

$$n_{R_j}.global.value = n_{R_j}.global.value + n_{R_j}.size \times c \tag{1}$$

$$n_{R_j}.global.lazy = n_{R_j}.global.lazy + c \tag{2}$$

After updating the node $n_{R_j}$, as we backtrack to the root of $T_{n_R}$, along the way for each node $n_{R_{j'}}$ we modify its global value as follows (Equation (3)): we assign the sum of (a) the global values of its left child $n_{R_{j'}|l}$ and right child $n_{R_{j'}|r}$ and (b) the product of the global lazy value and the size of the node.

$$n_{R_{j'}}.global.value = n_{R_{j'}|l}.global.value + n_{R_{j'}|r}.global.value + n_{R_{j'}}.global.lazy \times n_{R_{j'}}.size \tag{3}$$

Here we are updating the global values stored in each node, and as we mentioned earlier in Section 2.3 we term them Intended Updates. Now, it may be the case that the intended update region $r$ along the $x$ dimension does not span the entire range $[1 : n]$. In such a scenario we divide the region along the $x$ dimension further until we arrive at a node $n_i$ that operates on a region $[x_1 : x_2]$ such that it is completely contained inside the range of $r$ along the $x$ dimension, $[x_{start} : x_{end}]$. Similar to the root node, the node $n_i$ encapsulates another 1D Segment Tree $T_{n_i}$ that operates on all the regions that span the range $[x_1 : x_2]$ completely, i.e., the root node of tree $T_{n_i}$ is defined on the region $[x_1 : x_2, 1 : m]$. Then we perform the same operations discussed above for the tree $T_{n_R}$.

As we are dividing the regions along the $x$ dimension, not all divided regions are subregions of the intended region. On the other hand, the intended region can be a subregion of some divided regions. There may also exist some regions that overlap with the update region and some regions that are completely disjoint from it. Similar to the classical Segment Tree update algorithm, we can safely ignore the disjoint regions. For updating a region $r' \equiv [x'_1 : x'_2]$ having overlaps with $r$, we first trim down the intended update region $r$ along the $x$ dimension to get the trimmed range $[x'_{start} : x'_{end}]$, such that $x'_{start} = \max(x_{start}, x'_1)$ and $x'_{end} = \min(x_{end}, x'_2)$. Clearly, if the range of the $x$ dimension of $r$ is contained completely within that of $r'$, the trimmed region stays the same.
Thus, for ease of implementation, we perform this trimming in both cases.

For these cases, after dividing along the $x$ dimension, we start dividing the regions along the $y$ dimension following the same rules applied earlier. However, we now perform a dispersed update as follows. Suppose we intend to update the region $r$, and we are currently at a node $n$ operating on $r'$. Now let us assume that $r$ is either a subregion of $r'$ or it intersects with $r'$. As stated in the previous paragraph we perform a trimming on $r$ as a general rule. Then, we distribute the effect of updating $r$ over the whole region $r'$ and use a scaled value $c'$ (instead of $c$) as defined below:

$$c' = scaling \times c = \frac{x'_{end} - x'_{start} + 1}{x_2 - x_1 + 1} \times c \tag{4}$$

Here, $[x'_{start} : x'_{end}]$ is the trimmed range (as discussed above); $[x_1 : x_2]$ is the range covered by the region along the $x$ dimension; recall that $c$ is the value supplied with the (original) update request.

This scaled value $c'$ is then dispersed in the entire region through our dispersed update, which updates the local values of the nodes as follows.

$$n_{i_j}.local.value = n_{i_j}.local.value + n_{i_j}.size \times c' \tag{5}$$

$$n_{i_j}.local.lazy = n_{i_j}.local.lazy + c' \tag{6}$$

Similar to the global values, the local values of the parent nodes are also updated as we backtrack to the root node. Suppose that during a stage of backtracking we are at node $n_{i_j}$ having $n_{i_j|l}$ and $n_{i_j|r}$ as its left and right child respectively. Then, the local value of node $n_{i_j}$ is updated as follows.

$$n_{i_j}.local.value = n_{i_j|l}.local.value + n_{i_j|r}.local.value + n_{i_j}.local.lazy \times n_{i_j}.size \tag{7}$$

The pseudocode of the update operation is presented as Algorithm 1.

Recall that the query operation must return the sum of all the elements in the supplied range/region $r \equiv [x_{start} : x_{end}, y_{start} : y_{end}]$. To do that, our algorithm starts from the root node $n_R$.
We start dividing the regions along the $x$ dimension until the query range along the $x$ dimension (i.e., $[x_{start} : x_{end}]$) is obtained. Note that, as we travel from one node to another while dividing the regions along the $x$ dimension, even in cases where ranges along the $x$ dimension mismatch, we perform a query on the Segment Tree inside each node. This query is termed a ‘Partial Query’, as we only consider the global values of the nodes. Moreover, we dilute, i.e., scale down the values stored in the nodes, such that only the actual contribution of the region represented by the node to the queried region is considered as follows. If we are at a node $n$ that operates on the region $[x_1 : x_2, y_1 : y_2]$, then the scaling factor will be as follows.

$$scaling_n = \frac{x_{end} - x_{start} + 1}{x_2 - x_1 + 1} \tag{8}$$

Similar to the proposed update operation, the query region is trimmed such that only the query region that falls within the region operated on by the node is considered. In addition to considering the global values stored in the nodes, we also propagate the global lazy values stored in the nodes. These lazy values do not require scaling as they are multiplied by the size of the query region. The partial result obtained from a node $n$ is therefore as follows.

$$partial\ result_n = n.global.value \times scaling_n + \sum_{v = root\ node}^{parent\ of\ n} v.global.lazy \times query\ region.size \tag{9}$$

On the other hand, while dividing the regions along the $x$ dimension, if we obtain a region $r'$ such that the range along the $x$ dimension is either equal to or contained within that of the query region, we perform a ‘Complete Query’. For a ‘Complete Query’ we consider both the local and the global values. Similarly, both the local and global lazy updates are propagated. Moreover, no scaling is performed as the region already corresponds to the query region. The complete result obtained from a node $n$ is therefore as follows.

$$Complete\ Result_n = n.global.value + n.local.value + \sum_{v = root\ node}^{parent\ of\ n} (v.global.lazy + v.local.lazy) \times query\ region.size \tag{10}$$

After performing both the ‘Partial Query’ and ‘Complete Query’ the results are back-propagated to the root node. From there we compute the output of the query operation by combining, i.e., adding all these results together.

$$Query([x_{start} : x_{end}, y_{start} : y_{end}]) = \sum_{n_i \in N} Partial\ Result(n_i) + \sum_{n_j \in N_x} Complete\ Result(n_j) \tag{11}$$

Here, $N_x$ is the set of all the visited nodes that operate on regions which either equal or are $x$-subregions of the queried region $r$. The set $N$ comprises the rest of the visited nodes.

The pseudocode of the query operation is presented as Algorithm 2.

Lemma 3.1. ‘Intended updates’ update the regions as well as their subregions precisely.

Proof.
From the steps of our proposed update algorithm (Section 3.1.2), it is trivial to show that the algorithm updates the actual regions correctly. In all the situations, the goal of the proposed algorithm is to keep on dividing the ranges until the actual update region or its subregion is obtained. After reaching that, the algorithm updates the global values stored in the node and starts backtracking to the root node. In cases when the algorithm reaches the actual region but not its subregion, the subregions are still updated as the updates are stored in the global values, which are passed to the descendant subregions anyway. Hence, it is ensured that the ‘Intended Updates’ not only precisely update the actual region, but also its subregions.

From the definition of the 1D Segment Tree, while updating a region $r$ with a value $c$, a node $n$ operating on a subregion $r'$ is updated using the following rule:

$$n.value = n.value + c \times n.size \tag{12}$$

Recall that $n.size$ refers to the size of the region the node is operating on, i.e., $n.size = \|r'\|$. Unfortunately, when working on 2 or higher dimensions, it is not possible to perform such operations along all but the innermost dimension. This motivated us to propose relaxed updates along the $x$ dimension, which we termed ‘Dispersed Updates’. Now we have the following lemma.

Lemma 3.2.
The effect of updating a proper subregion of a region is realized by ‘Dispersed Updates’.

Proof. Since, unlike the $x$ dimension, the ranges along the $y$ dimension are broken completely while performing the ‘Dispersed Updates’, without any loss of generality we can consider a proper subregion as a proper $x$-subregion. Suppose we intend to update $r \equiv [x_{start} : x_{end}, y_{start} : y_{end}]$, and the node $n$ is associated to $r' \equiv [x_1 : x_2, y_{start} : y_{end}]$, which is a proper $x$-superregion of $r$. So, we have $r' \cup r = r'$, $r' \cap r = r$, $r \ne r'$. When updating the node $n$ by adding $c$, according to Equation (12) we should be adding $c$ to all the $\|r'\|$ elements of the subarray the node $n$ is defined on. But in reality, only $\|r\|$ elements are actually being updated. Thus the node value erroneously increases by $c \times \|r'\|$, differing from the expected increment of $c \times \|r\|$. Now, ‘Dispersed Update’ scales the update value $c$ to $c' = c \times s$ by a ratio $s = \frac{\|r\|}{\|r'\|}$, where $r'$ is covered by node $n$ and we intend to update $r$. Hence, when this $c'$ is used instead of $c$, the stored value of node $n$ is increased by $c \times \|r\|$, instead of increasing by $c \times \|r'\|$ wrongfully.

Recall that a dispersed update only modifies the local values of a node $n$ (i.e., n.value and n.lazy). Lemma 3.2 claims that a dispersed update accurately captures the modifications of the subregions beneath it. Hence we have the following easy corollary.

Corollary 3.2.1.
Local values (both value and lazy) link a node $n$ to the updates of its descendants.

Now recall that while modifying a node $n$ operating on a region $r' \equiv [x_1 : x_2, y_1 : y_2]$ that intersects with the region of update $r \equiv [x_{start} : x_{end}, y_{start} : y_{end}]$, our proposed algorithm trims $r$ to $[x'_{start} : x'_{end}, y_{start} : y_{end}]$ such that $x'_{start} = \max(x_{start}, x_1)$ and $x'_{end} = \min(x_{end}, x_2)$. Following the arguments from Lemma 3.2 we have the following corollary.

Corollary 3.2.2.
Trimming down the update regions (as mentioned above) allows the intersecting regions to be updated properly.
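The trimming rule and the scaled value $c'$ of Equation (4) can be illustrated with a short sketch. The function names are hypothetical, ranges are inclusive, and exact fractions are an assumption made here to avoid rounding; the paper does not prescribe a number type.

```python
from fractions import Fraction


def trim(x_start, x_end, x1, x2):
    """Clip the update range [x_start, x_end] to the node range [x1, x2]."""
    return max(x_start, x1), min(x_end, x2)


def dispersed_value(c, x_start, x_end, x1, x2):
    """Scaled value c' of Equation (4) for a node covering [x1, x2] along x."""
    t_start, t_end = trim(x_start, x_end, x1, x2)
    if t_start > t_end:  # disjoint ranges contribute nothing
        return Fraction(0)
    return Fraction(t_end - t_start + 1, x2 - x1 + 1) * c
```

For instance, an update of $c = 8$ on the $x$-range $[3 : 4]$ seen from a node covering $[1 : 8]$ gives $c' = 2$; spread over all 8 rows of the node, this adds the same total as adding 8 to the 2 rows actually updated, which is the argument of Lemma 3.2.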
Lemma 3.3.
Diluting the values ensures the contribution of a region to its proper subregion.

Proof.
Suppose we are considering a node $n$, defined on region $r' \equiv [x_1 : x_2, y_1 : y_2]$, whereas the region of query $r \equiv [x_{start} : x_{end}, y_{start} : y_{end}]$ is a proper subregion of $r'$. Suppose $n.global.value = v$. This accounts for updating all the elements contained in that region. However, this value $v$ is the representative of all $\|r'\|$ elements encompassed by the region $r'$. But we are interested in only the $\|r\|$ elements of them belonging to the region $r$. Hence, returning the global value $v$ is wrong (and over-estimates the actual value). Therefore, we dilute the actual value stored in a node $n$ by a factor as follows:

$$scaling = \frac{x_{end} - x_{start} + 1}{x_2 - x_1 + 1} = \frac{\|r\|}{\|r'\|} \tag{13}$$

$$diluted\ value = scaling \times node.global.value = \frac{\|r\|}{\|r'\|} \times node.global.value \tag{14}$$

Thus we are able to consider the updates of a region while querying on its subregion.

We remark that while considering the lazy updates, since we multiply them by the number of elements in the query region, $\|r\|$, it estimates the actual value. Now, in the lazy propagation process of the classical 1D Segment Tree, updates that are to be performed on a number of adjacent segments are stored in an ancestor node of the tree, which represents a bigger segment covering the segments to be updated. During query time these updates are passed down and distributed properly such that the effects of the updates are fulfilled [17]. Unfortunately, following the classical Segment Tree algorithm, it is only possible to implement lazy propagation in the innermost dimension, i.e., the $y$ dimension [18]. As a result, it is impossible to retrieve the updates that were imposed on regions which may overlap with the query region but extend further along the $x$ dimension. Now we present and prove the following lemma.

Lemma 3.4. ‘Partial Query’ mimics the lazy propagation procedure along the $x$ dimension.

Proof.
Partial query only considers the global values (value and lazy) of a node and repeatedly performs queries until the actual query region is reached. Consider a node $n$ associated to a region $[x_1 : x_2, y_1 : y_2]$. Suppose a number of ancestors of this node are updated. In order to ensure efficiency, the update algorithm terminates after modifying those parent nodes but not node $n$. Thus, while querying on node $n$, it is imperative to retrieve this information.

Since these updates were ‘Intended Updates’, the global values were altered. By Lemma 3.3, through diluting the values the actual updates of the region caused by the encompassing regions can be obtained. Therefore, by repeatedly performing this ‘Partial Query’ for every node visited while dividing the region along the $x$ dimension, all the updates that were performed on the $x$-superregions are obtained. Thus, it emulates the lazy propagation scheme along the $x$ dimension.

While performing the ‘Complete Query’, we consider both the local and global values of a node $n$. The global values are results of ‘Intended Updates’, which correspond to the updates made on that node, i.e., region. On the other hand, local values originate from ‘Dispersed Updates’, which capture the effects of updating the subregions of that region (Lemma 3.2). Hence, from the definitions and the proposed algorithm, it is evident that ‘Complete Query’ acknowledges both the updates to a region and its subregions. So we have the following lemma.

Lemma 3.5. ‘Complete Query’ not only considers the updates to a region but also the updates to the subregions of that region.
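The dilution step behind Lemma 3.3 and the ‘Partial Query’ can be sketched as follows. This is an illustration of Equations (13) and (14) only, not the full query algorithm; the function name is hypothetical and exact fractions are an assumption of the sketch.

```python
from fractions import Fraction


def diluted_value(global_value, x_start, x_end, x1, x2):
    """Equations (13)-(14): share of a node's global value owed to the query.

    The node covers [x1, x2] along x and the query covers [x_start, x_end];
    the query range is first trimmed to the node range.
    """
    t_start, t_end = max(x_start, x1), min(x_end, x2)
    if t_start > t_end:
        return Fraction(0)
    scaling = Fraction(t_end - t_start + 1, x2 - x1 + 1)
    return scaling * global_value
```

The result is exact here because an Intended Update adds the same constant to every element of the node's region, so each row along $x$ carries an equal share of the stored sum.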
Theorem 3.6.
The Update operation is correct.

Proof.
In order to establish the correctness of the update operation, it suffices to ensure that when updating a specific region, the proposed update algorithm modifies all the nodes associated to that region appropriately. Suppose we are updating a region $[x_{start} : x_{end}, y_{start} : y_{end}]$. Without any loss of generality, five types of regions can be specified as follows (also see Figure 2):
1. $R_1$ is the actual update region.
2. $R_2$ belongs to the regions that encompass $R_1$ as a subregion.
3. $R_3$ belongs to the subregions of $R_1$.
4. $R_4$ belongs to the regions that intersect with $R_1$.
5. $R_5$ belongs to the regions that are disjoint with $R_1$.

In Lemma 3.1 we have proved that the proposed algorithm updates both the update region and its subregions properly ($R_1$ and $R_3$). Lemma 3.2 demonstrated how the proposed ‘Dispersed Update’ captures the outcome of updating the region $R_1$ in the regions $R_2$. Corollary 3.2.2 extends this idea to the intersecting regions $R_4$. Finally we can safely omit the disjoint regions (i.e., $R_5$) completely (Section 3.1.2). Hence the result follows.

Theorem 3.7.
The Query operation is correct.

Proof.
While querying on a region $r$, it is necessary that all the updates that were performed on regions associated with $r$ are compiled. Suppose we are querying on a region $[x_{start} : x_{end}, y_{start} : y_{end}]$. Without any loss of generality, values and updates of 3 types of regions should be considered as follows (also see Figure 3):
1. $R_1$ is the region being queried on itself.
2. $R_2$ belongs to the superregions of $R_1$.
3. $R_3$ belongs to the subregions of $R_1$.

Figure 2: Different types of regions to consider while updating the region $R_1$. These include the superregions ($R_2$), subregions ($R_3$), intersecting regions ($R_4$) and completely disjoint regions ($R_5$).

Figure 3: Different types of regions to consider while querying on a region $R_1$. Along with analyzing the region $R_1$ itself, it suffices to consider the superregions, $R_2$ (ancestors) and subregions, $R_3$ (descendants). The ancestors are connected through global values, whereas the descendants are linked by local values.

Lemma 3.3 and Lemma 3.4 show that the query algorithm compiles all the updates of the superregions of the region queried upon through ‘Partial Queries’. On the other hand, Lemma 3.5 proves the effectiveness of ‘Complete Query’ in determining the values and the updates of the queried region and its subregions. Hence, as shown in Figure 3, while querying on any region $[x_{start} : x_{end}, y_{start} : y_{end}]$ the effects of updating the superregions are passed through global values (value and lazy) and the impact of updating the subregions is compiled by combining the local values (value and lazy). Therefore, the proposed query algorithm can query on any region correctly.

Finally, based on the above arguments, particularly Theorem 3.6 and Theorem 3.7, we can state the following:

Theorem 3.8 (Correctness). The proposed data structure handles range sum queries and dynamic updates correctly.
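One practical way to sanity-check an implementation against the correctness claims above is to compare it with a brute-force oracle that applies the query and update definitions of Section 2 directly. The sketch below is such a harness under stated assumptions: `BruteForceGrid` and `cross_check` are hypothetical names, indices are 1-based, the array starts at zero, and `candidate` stands for any object exposing `update(x1, x2, y1, y2, c)` and `query(x1, x2, y1, y2)`.

```python
import random


class BruteForceGrid:
    """Reference for the Section 2 definitions on a 1-indexed n x m array."""

    def __init__(self, n, m):
        self.a = [[0] * (m + 1) for _ in range(n + 1)]  # row/column 0 unused

    def update(self, x1, x2, y1, y2, c):
        for x in range(x1, x2 + 1):
            for y in range(y1, y2 + 1):
                self.a[x][y] += c

    def query(self, x1, x2, y1, y2):
        return sum(self.a[x][y]
                   for x in range(x1, x2 + 1)
                   for y in range(y1, y2 + 1))


def cross_check(candidate, n, m, rounds=200, seed=0):
    """Replay random operations on `candidate` and the oracle; True if they agree."""
    rng = random.Random(seed)
    oracle = BruteForceGrid(n, m)
    for _ in range(rounds):
        x1, x2 = sorted(rng.randint(1, n) for _ in range(2))
        y1, y2 = sorted(rng.randint(1, m) for _ in range(2))
        if rng.random() < 0.5:
            c = rng.randint(-5, 5)
            oracle.update(x1, x2, y1, y2, c)
            candidate.update(x1, x2, y1, y2, c)
        elif oracle.query(x1, x2, y1, y2) != candidate.query(x1, x2, y1, y2):
            return False
    return True
```

The oracle costs $O(n \times m)$ per operation, so it is only suitable for small arrays; its role is to catch disagreements, not to measure performance.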
A Segment Tree on an array of length $n$ is a binary tree with height $\lceil \log n \rceil$. The total number of nodes in a Segment Tree is $1 + 2 + 2^2 + 2^3 + \dots + 2^{\lceil \log n \rceil} = 2 \times 2^{\lceil \log n \rceil} - 1 \approx 2 \times n$. Now our proposed algorithm holds another Segment Tree inside each node. So, for a 2D array of size $n \times m$ the total space required is $O(2 \times n \times 2 \times m) = O(4 \times n \times m) = O(n \times m)$.
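The node counts used in this space bound can be checked numerically with a small sketch. The function names are hypothetical; `count_nodes` counts the nodes created by recursive near-equal halving, while `full_tree_bound` evaluates the geometric sum for the padded complete tree.

```python
from math import ceil, log2


def count_nodes(n):
    """Nodes of a Segment Tree built by recursive halving over n elements."""
    if n == 1:
        return 1
    left = (n + 1) // 2          # near-equal split
    return 1 + count_nodes(left) + count_nodes(n - left)


def full_tree_bound(n):
    """1 + 2 + ... + 2^ceil(log2 n): node count of the padded complete tree."""
    return 2 * 2 ** ceil(log2(n)) - 1
```

Recursive halving creates exactly $2n - 1$ nodes, and the padded bound stays below $4n$, so a 2D tree of trees needs fewer than $16 \times n \times m$ slots, which is still $O(n \times m)$.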
A 1D Segment Tree (on an $n$-sized array) efficiently performs update and query operations in logarithmic time, i.e., $O(\log n)$ [8]. This follows trivially since the Segment Tree continuously divides the segments into two equal parts and a segment of length $n$ can be divided into two equal parts at most $\log n$ times. Our proposed 2D Segment Tree similarly visits $O(\log n)$ nodes. But as each of these nodes contains another 1D Segment Tree inside, an additional time complexity of $O(\log m)$ is introduced. This additional $O(\log m)$ time complexity accounts for both repeated ‘Dispersed Updates’ and ‘Partial Queries’ at the nodes. Therefore, the overall time complexity becomes $O(\log n \times \log m) = O(\log^2 n)$, assuming $n \ge m$, a huge improvement over the previous result of $O(n \log n)$ [18].

In order to examine the time complexity experimentally, we performed a number of random tests. We implemented 2D Segment Trees using our approach on two dimensional arrays of size $n \times n$ for $n = 5$ to $900$. We then performed 100 random updates. Each update was followed by 100 random queries. The average time needed for the operations was calculated and plotted in a graph. We fit a $c \log^2 n$ curve and it was seen that for both update and query functions the time required follows this curve. These results are presented in Figure 4. However, the times needed for updates seem a bit noisy, which is due to the fact that we took fewer samples for update operations (100) compared to the query operations ($100 \times 100 = 10000$).

Measuring the time duration of the operations is not reliable enough, as some resources of the CPU may be occupied by other processes during one computation but free during another; CPUs are also prone to thermal throttling. Thus, we also calculated the average number of nodes visited for the operations. These results are presented in Figure 5. Similarly, in this case, it can be seen that the required number of steps follows the shape of a fitted $c \log^2 n$ curve.

Figure 4: Average time needed for update and query operations by the proposed Segment Tree

Figure 5: Average number of nodes visited during update and query operations by the proposed Segment Tree

Till now, we have focused on modifying the Segment Tree data structure in such a way that it can solve the range sum query problem over a two dimensional array. However, the proposed algorithm can not only be extended to higher dimensional problems but can also be utilized with other types of aggregate functions. In this section, we briefly mention the generalization of our approach. First, we present the following theorem.
Theorem 4.1. The proposed algorithm can perform range sum queries on $d$-dimensional data in $O(\log^d n)$ time.

Proof. For $d = 1$, we can use the classical 1D Segment Tree; thus it suffices to prove the theorem for $d \geq 2$. We prove it by mathematical induction.

Base case: $d = 2$. The base case is proved by Theorem 3.8 and Section 3.3.2.

Induction step: Suppose the statement is true for some $p \in \mathbb{Z}$, $p \geq 2$. Hence, we are capable of performing dynamic range sum queries on $p$-dimensional data in $O(\log^p n)$ time. This also establishes that for $p$-dimensional data it is possible to implement the proposed scheme of lazy propagation by using the ‘Dispersed Updates’ and ‘Partial Queries’. Now, let us introduce another external dimension to the $p$-dimensional problem. Since the $p$-dimensional subspace already supports lazy propagation within it, we can simply propagate the global values from the new outermost dimension to the inner $p$-dimensional space. Thus, lazy propagation can be implemented. Furthermore, assuming the range of the outer dimension is of length $n$, it can be halved at most $O(\log n)$ times. Thus, the overall time complexity becomes $O(\log n \times \log^p n) = O(\log^{p+1} n)$.

Thus, if $d = p$ satisfies the statement, $d = p + 1$ satisfies it as well. Hence, the result follows.

Finally, we discuss how the proposed algorithm can be extended to other aggregate functions. In the proposed algorithm we pass information from the $x$ dimension to the $y$ dimension through the ‘Partial Query’ and the ‘Dispersed Update’. Together, they mimic the lazy propagation procedure. The most important part of both ideas is to scale the regions properly. Through scaling, we account for the effects on the individual elements of the subarray. Since the operations always impose an identical change on all the elements of a region, by considering how the individual elements change we can separate out the effect of updating a certain region of the whole space. Thus almost any aggregate function can be scaled in this way, so that only a selected portion of the region is considered.

Since all our analyses were focused on performing sum queries, we performed divisions to scale them. However, if we were to perform multiplication queries, it would require computing roots to ensure proper scaling.
On the other hand, for operations like AND or OR, no complicated scaling is necessary. Hence it is always possible to extract how an individual element is modified while performing an aggregate operation, and it is possible to split up the effect of updating a specific portion of the region. Thus, the proposed algorithm can be extended to other aggregate functions as well.

Our Python [19] implementation of the proposed algorithm, along with its generalization to other dynamic range queries, can be found at: https://github.com/robin-0/Multidimensional-Segment-Tree

Range query problems are among the most frequent problems in computer science. For the one dimensional variant, these problems can be solved efficiently and effortlessly using Segment Trees [17]. In this section, we describe the relative suitability of the various algorithms and data structures for solving the two dimensional range sum query problem and present a comparison with our proposed algorithm.

In a naive brute force manner, one can update ranges and compute range queries on an $n \times m$ array in $O(nm)$ time, which becomes highly inconvenient when the number of queries is high. As a result, a number of data structures and algorithms have been proposed to solve the range query and update problem efficiently.

Sparse tables, developed using the ideas of dynamic programming [17], are capable of answering queries in constant time. However, the downside of this data structure is that it is immutable; after initializing it in $O(nm \log n \log m)$ time, the individual elements cannot be updated. Square root decomposition, the technique underlying Mo's algorithm, partitions the 2D array into $p \times q$ grids with $p = \sqrt{n}$, $q = \sqrt{m}$ [17]. This data structure precomputes the sums of the elements of these grids. Using this grid information, it is capable of updating the entries, but it still requires a time complexity of $O(\sqrt{n} \times \sqrt{m})$.

Tree based data structures are promising in this regard.
Quadtrees follow a divide and conquer approach, recursively dividing the entire region into four square or near-square subregions and combining them in order to perform range queries [20]. This procedure is quite similar to that of a Segment Tree. Although a Quadtree performs query and update operations in $O(\log n)$ time (provided $n = m$) when working on a square region, it suffers greatly when the regions are not square. In the worst cases, for example when the regions are of size $1 \times n$ or $n \times 1$, it has to traverse all the way down to the individual leaf nodes. This makes the time complexity $O(n)$ (for $d$-dimensional problems this becomes $O(n^{d-1})$) [18].

Fenwick Trees were designed to work on cumulative frequency tables [11]; however, they can be adapted to perform range queries as well. The original Fenwick Tree was only capable of performing point updates and range queries, but it can be slightly modified to perform range updates and point queries as well [17]. Mishra developed a novel approach to perform range updates and range queries simultaneously using Fenwick Trees [18]. Although this data structure manages to perform these operations in $O(\log n \log m)$ time ($O(\log^2 n)$, when $n \geq m$) and $O(n \times m)$ memory, it requires an exponential number of trees to do so ($4^d$ for a $d$-dimensional problem) [18]. Furthermore, in order to perform a query it requires 4 query operations on the trees, and an update demands a total of 36 update operations on the trees, each running in $O(\log n \log m)$ time. These requirements increase exponentially with the number of dimensions. All these issues make this data structure quite cumbersome, even though it is the only existing data structure capable of performing these queries in asymptotically logarithmic time [18]. Moreover, the algorithm presented for this data structure is strictly for range sum queries only, without any provision for generality.

In this paper, we have presented an innovative approach to perform range updates and range queries using 2D Segment Trees. Our proposed algorithm requires $O(\log n \log m)$ time for these operations ($O(\log^2 n)$, when $n \geq m$) and $O(n \times m)$ memory. Also, our algorithm holds good for higher dimensional cases and for other aggregate functions, similar to the original Segment Tree. A relative comparison of the various data structures and algorithms is presented in Table 1.
Table 1: Comparison of the Different Approaches

Data Structure | Operation Time Complexity | Memory Requirement
--- | --- | ---
Naive approach | $O(nm)$ | $O(1)$
Sparse Table | $O(1)$ | $O(nm \log n \log m)$
Sqrt Decomposition | $O(\sqrt{n} \sqrt{m})$ | $O(\sqrt{n} \sqrt{m})$
Quadtree (assuming $n = m$) | $O(\log n)$ for a square region; $O(n)$ otherwise | $O(n^2)$
Fenwick Tree | $O(\log n \log m)$ | $O(nm)$
Proposed Segment Tree | $O(\log n \log m)$ | $O(nm)$

In this paper, we have developed a novel approach to perform dynamic range queries efficiently on higher dimensional data using Segment Trees. This introduces many new opportunities to utilize this highly versatile data structure in solving complex multidimensional problems. The future directions of our research will be to reduce some unorthodox problems to range query and update problems and to exploit the effectiveness of the proposed Segment Tree in solving such problems.
A Overview of the Original Segment Tree Algorithm
A.1 Definition
A Segment Tree $T$, defined on an array $A$, is a complete binary tree [8]. Any intermediate node $n_i$ operates on a segment $i_{start}$ to $i_{end}$ and stores the value of a function $f([A[i_{start}], \dots, A[i_{end}]])$ computed on that segment. The node $n_i$ has left and right child nodes $n_l$ and $n_r$, which are defined on the ranges $[i_{start} \dots \lfloor \frac{i_{start} + i_{end}}{2} \rfloor]$ and $[\lfloor \frac{i_{start} + i_{end}}{2} \rfloor + 1 \dots i_{end}]$ respectively. This subdivision continues until the leaf nodes of the tree are reached. A leaf node $n_l$ is coupled to a single element $A[l]$ of the array $A$, and it stores the value of the function $f([A[l]])$ computed on that specific element.

Segment Tree data structures are particularly useful for updating and querying on a segment, as is apparent from their name. Although Segment Trees are capable of computing a diverse set of functions, the simplest application of a Segment Tree is to solve the range sum query problem.

A.2 Updating the Segment Tree
A Segment Tree can modify an element of the array $A$ it is defined on and subsequently update the values of all the corresponding segments efficiently in logarithmic time. Suppose we want to add a constant value $c$ to the $i$-th element of the array $A$. In order to update that element we start from the root node of the Segment Tree, which covers the entire segment 1 to $n$. Next, we divide the segment into two segments, 1 to $\lfloor n/2 \rfloor$ and $\lfloor n/2 \rfloor + 1$ to $n$, which correspond to the left and right child of the root node respectively. Then, we visit the node that represents the segment containing the index $i$, and repeat the process until the leaf node operating on index $i$ is reached. There, we update the value of that node by adding $c$ to it. Finally, we backtrack to the root node and, along the way, update all nodes by setting their value to the sum of their two children.

leaf_node.value = leaf_node.value + c
intermediate_node.value = left_child.value + right_child.value

A.3 Querying on a Range
In order to query the sum of the elements of a range $[i, \dots, j]$, we follow an approach similar to the update operation. Again we start from the root node and keep dividing the segments into two equal parts until we obtain segments that lie completely within the query range. Then we recursively add the values of all such segments and return their sum. The query operation is also performed in logarithmic time, $O(\log n)$.

A.4 Lazy Propagation
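As a baseline for the discussion of lazy propagation, the point update of Section A.2 and the range query of Section A.3 can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the class name `SegmentTree`, the array-based node numbering (children of node `k` at `2k` and `2k + 1`) and the assumption of a non-empty input list are choices made here for illustration.

```python
class SegmentTree:
    """Range-sum Segment Tree over a 1-indexed, non-empty array (point updates only)."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (4 * self.n)  # 4n slots are enough for this numbering
        self._build(1, 1, self.n, values)

    def _build(self, node, lo, hi, values):
        if lo == hi:
            self.tree[node] = values[lo - 1]  # leaf stores a single element
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid, values)
        self._build(2 * node + 1, mid + 1, hi, values)
        self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]

    def update(self, i, c, node=1, lo=1, hi=None):
        """Add c to A[i], then refresh every segment on the way back to the root."""
        if hi is None:
            hi = self.n
        if lo == hi:
            self.tree[node] += c
            return
        mid = (lo + hi) // 2
        if i <= mid:
            self.update(i, c, 2 * node, lo, mid)
        else:
            self.update(i, c, 2 * node + 1, mid + 1, hi)
        self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]

    def query(self, i, j, node=1, lo=1, hi=None):
        """Return A[i] + ... + A[j] by summing the segments fully inside [i, j]."""
        if hi is None:
            hi = self.n
        if j < lo or hi < i:          # segment disjoint from the query range
            return 0
        if i <= lo and hi <= j:       # segment completely within the query range
            return self.tree[node]
        mid = (lo + hi) // 2
        return (self.query(i, j, 2 * node, lo, mid)
                + self.query(i, j, 2 * node + 1, mid + 1, hi))
```

For example, on the array `[1, 2, 3, 4, 5]`, `query(2, 4)` returns 9, and after `update(3, 10)` the same query returns 19. Updating a whole range with this structure requires one such call per element, which is the cost that lazy propagation removes.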
In Section A.2 we described the algorithm to update a specific array element using a Segment Tree. In order to update a range, we can simply repeat that step $m$ times, where $m$ is the length of the range. However, this takes $O(m \log n)$ time, which is $O(n \log n)$ in the worst case. The Segment Tree overcomes this limitation through an elegant process called lazy propagation. In order to do so, we keep another value called the ‘lazy update’ in each of the nodes. The lazy update value accounts for updating all nodes that are descendants of the current node, or equivalently all proper subsegments of the segment we are working on. While querying, these lazy update values are passed on to the child nodes, and thus the lazy updates are propagated to the deeper nodes of the tree.

In this process, instead of strictly going down to the individual leaf nodes, whenever we are at a node whose segment lies completely within the update range, we perform a lazy update at that node. We do this by adding to the value of the node the product of $c$ and the length of the segment operated on by that node. We also increment the lazy update value by $c$, which signifies that the elements of all the subsegments of this segment should also be updated by adding $c$ to them. This method of updating closely resembles the query operation, where we work not only on leaf nodes but also on intermediate nodes enclosed in the query segment.

Lazy propagation allows the Segment Tree to update ranges efficiently in logarithmic time, $O(\log n)$, as well.

A.5 Higher Dimensional Segment Trees
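Before moving to higher dimensions, the one-dimensional lazy propagation scheme of Section A.4 can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the class name `LazySegmentTree` and the helper `_push` are hypothetical, the array starts as all zeros, and pending values are pushed down whenever the recursion descends past a node (during updates as well as queries), which is one common way to realize the idea.

```python
class LazySegmentTree:
    """Range-add, range-sum Segment Tree on indices 1..n (initially all zeros)."""

    def __init__(self, n):
        self.n = n
        self.value = [0] * (4 * n)  # sum over each node's segment
        self.lazy = [0] * (4 * n)   # pending addition owed to the node's children

    def _push(self, node, lo, hi):
        """Pass the pending lazy update of an internal node on to its two children."""
        pending = self.lazy[node]
        if pending == 0:
            return
        mid = (lo + hi) // 2
        for child, c_lo, c_hi in ((2 * node, lo, mid), (2 * node + 1, mid + 1, hi)):
            self.value[child] += pending * (c_hi - c_lo + 1)
            self.lazy[child] += pending
        self.lazy[node] = 0

    def update(self, i, j, c, node=1, lo=1, hi=None):
        """Add c to every element of A[i..j]."""
        if hi is None:
            hi = self.n
        if j < lo or hi < i:            # disjoint segment
            return
        if i <= lo and hi <= j:         # segment inside the range: lazy update
            self.value[node] += c * (hi - lo + 1)
            self.lazy[node] += c
            return
        self._push(node, lo, hi)
        mid = (lo + hi) // 2
        self.update(i, j, c, 2 * node, lo, mid)
        self.update(i, j, c, 2 * node + 1, mid + 1, hi)
        self.value[node] = self.value[2 * node] + self.value[2 * node + 1]

    def query(self, i, j, node=1, lo=1, hi=None):
        """Return A[i] + ... + A[j]."""
        if hi is None:
            hi = self.n
        if j < lo or hi < i:
            return 0
        if i <= lo and hi <= j:
            return self.value[node]
        self._push(node, lo, hi)
        mid = (lo + hi) // 2
        return (self.query(i, j, 2 * node, lo, mid)
                + self.query(i, j, 2 * node + 1, mid + 1, hi))
```

Both operations stop at segments that lie completely inside the requested range, so each touches $O(\log n)$ nodes. The difficulty discussed next is that this push-down step has no direct counterpart across the outer dimension of a tree of trees.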
The 1-dimensional Segment Tree can be extended to higher dimensional cases as well. The most basic one is the 2-dimensional Segment Tree, which is defined on a matrix or 2D array $A$ of size $n \times m$.

In this case, we have a Segment Tree $T$, which is a complete binary tree. The nodes of this tree operate on segments along the first dimension, i.e., the root node is defined on the range 1 to $n$. Similar to the one dimensional Segment Tree, the root node has a left child and a right child, which operate on the ranges 1 to $\lfloor \frac{n+1}{2} \rfloor$ and $\lfloor \frac{n+1}{2} \rfloor + 1$ to $n$ respectively. The child nodes of these intermediate nodes similarly divide the working segment into two subsegments, and this division goes on until the leaf nodes are reached.

However, in this variant of the Segment Tree, instead of containing the computed value of a function over a segment, the nodes contain another Segment Tree inside. This second layer Segment Tree $T'$ is defined on segments along the 2nd dimension. This implies that the root node of tree $T'$ operates on the segment 1 to $m$, its child nodes operate on the ranges 1 to $\lfloor \frac{m+1}{2} \rfloor$ and $\lfloor \frac{m+1}{2} \rfloor + 1$ to $m$, and so on.

Thus, effectively a 2D Segment Tree is a Segment Tree of Segment Trees. This two layer representation offers two ranges along two different dimensions. These two ranges define a rectangular region of the two dimensional array the tree is defined on. For example, suppose we are at a node $n_i$ of the 1st layer Segment Tree, which works on the segment $i_1$ to $i_2$ along the 1st dimension. Now, inside the node $n_i$ the 2nd layer of the Segment Tree is defined. Let us assume that in the 2nd layer we are at a node $n_j$ that operates on the range $j_1$ to $j_2$ along the 2nd dimension. Thus we are at the same time considering two ranges, $i_1$ to $i_2$ along the first dimension and $j_1$ to $j_2$ along the second dimension. This limits our analysis to the region $A[i_1 : i_2,\; j_1 : j_2]$.
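The two layer structure described above can be sketched in Python as follows. This is a minimal illustration of the classical Segment Tree of Segment Trees, restricted to point updates and rectangular sum queries on a grid that starts as all zeros; it is not the proposed algorithm, and the class name `SegmentTree2D` and its method names are hypothetical.

```python
class SegmentTree2D:
    """Segment Tree of Segment Trees over an n x m grid of zeros (1-indexed)."""

    def __init__(self, n, m):
        self.n, self.m = n, m
        # tree[x_node][y_node] holds the sum over the rectangle spanned by
        # the first-layer segment of x_node and the second-layer segment of y_node.
        self.tree = [[0] * (4 * m) for _ in range(4 * n)]

    def update(self, x, y, c):
        """Add c to the single cell A[x][y]."""
        self._update_x(1, 1, self.n, x, y, c)

    def _update_x(self, x_node, x1, x2, x, y, c):
        # Every first-layer node whose segment contains x owns an inner tree to refresh.
        self._update_y(x_node, 1, 1, self.m, y, c)
        if x1 == x2:
            return
        mid = (x1 + x2) // 2
        if x <= mid:
            self._update_x(2 * x_node, x1, mid, x, y, c)
        else:
            self._update_x(2 * x_node + 1, mid + 1, x2, x, y, c)

    def _update_y(self, x_node, y_node, y1, y2, y, c):
        self.tree[x_node][y_node] += c  # this rectangle contains the cell
        if y1 == y2:
            return
        mid = (y1 + y2) // 2
        if y <= mid:
            self._update_y(x_node, 2 * y_node, y1, mid, y, c)
        else:
            self._update_y(x_node, 2 * y_node + 1, mid + 1, y2, y, c)

    def query(self, xs, xe, ys, ye):
        """Return the sum over the region A[xs:xe, ys:ye] (inclusive bounds)."""
        return self._query_x(1, 1, self.n, xs, xe, ys, ye)

    def _query_x(self, x_node, x1, x2, xs, xe, ys, ye):
        if xe < x1 or x2 < xs:
            return 0
        if xs <= x1 and x2 <= xe:
            return self._query_y(x_node, 1, 1, self.m, ys, ye)
        mid = (x1 + x2) // 2
        return (self._query_x(2 * x_node, x1, mid, xs, xe, ys, ye)
                + self._query_x(2 * x_node + 1, mid + 1, x2, xs, xe, ys, ye))

    def _query_y(self, x_node, y_node, y1, y2, ys, ye):
        if ye < y1 or y2 < ys:
            return 0
        if ys <= y1 and y2 <= ye:
            return self.tree[x_node][y_node]
        mid = (y1 + y2) // 2
        return (self._query_y(x_node, 2 * y_node, y1, mid, ys, ye)
                + self._query_y(x_node, 2 * y_node + 1, mid + 1, y2, ys, ye))
```

Each operation visits $O(\log n)$ first-layer nodes and, inside each, $O(\log m)$ second-layer nodes. Updating a whole rectangle with this sketch would need one `update` call per cell, which motivates the limitation discussed next.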
Thus, using the two dimensional Segment Tree we can perform updates and queries on a region of a two dimensional array.

However, the two dimensional Segment Tree can be considered only for point updates [algo max]. Its shortcoming is that it does not support lazy propagation. Mishra [18] proved that the 2D Segment Tree can perform lazy propagation only along the last, i.e., the 2nd, dimension, which effectively makes the time complexity of the update operation $O(n \log m)$.

Similarly, by cascading more layers of Segment Trees, higher dimensional Segment Trees can be constructed and utilized. However, these higher dimensional variants also fall short with respect to lazy propagation, as they are only capable of supporting it along the innermost dimension [18]. As a result, for a $k$-dimensional tree, the time complexity becomes $O(n^{k-1} \log n)$.

B Pseudocodes
Here we present the pseudocode of the proposed Update and Query operations as Algorithm 1 and Algorithm 2 respectively.

Our Python [19] implementation of the proposed algorithm can be found at: https://github.com/robin-0/Multidimensional-Segment-Tree

Algorithm 1 Update
UpdateByX(rootNode, 1, n, x_start, x_end, y_start, y_end, c)    ▷ Starting from the root node

function UpdateByX(node, x1, x2, x_start, x_end, y_start, y_end, c)
    if (x1 : x2) is within (x_start : x_end) then    ▷ Intended Update
        UpdateByY(newNode, x1, x2, 1, m, x_start, x_end, y_start, y_end, c)
    else if (x1 : x2) is outside (x_start : x_end) then    ▷ Disjoint Region
        return
    else
        x_mid = ⌊(x1 + x2) / 2⌋
        UpdateByX(leftChild, x1, x_mid, x_start, x_end, y_start, y_end, c)
        UpdateByX(rightChild, x_mid + 1, x2, x_start, x_end, y_start, y_end, c)
        x'_start = max(x_start, x1)    ▷ Trimming
        x'_end = min(x_end, x2)
        scaling = (x'_end − x'_start + 1) / (x2 − x1 + 1)    ▷ Dispersed Update
        UpdateByY(newNode, x1, x2, 1, m, x'_start, x'_end, y_start, y_end, c × scaling)
    end if
end function

function UpdateByY(node, x1, x2, y1, y2, x_start, x_end, y_start, y_end, c)
    if (y1 : y2) is within (y_start : y_end) then
        if (x1 : x2) is within (x_start : x_end) then    ▷ Intended Update
            node.global.value = node.global.value + c × (x2 − x1 + 1) × (y2 − y1 + 1)
            node.global.lazy = node.global.lazy + c
        else    ▷ Dispersed Update
            node.local.value = node.local.value + c × (x2 − x1 + 1) × (y2 − y1 + 1)
            node.local.lazy = node.local.lazy + c
        end if
    else if (y1 : y2) is outside (y_start : y_end) then    ▷ Disjoint Region
        return
    else
        y_mid = ⌊(y1 + y2) / 2⌋
        UpdateByY(leftChild, x1, x2, y1, y_mid, x_start, x_end, y_start, y_end, c)
        UpdateByY(rightChild, x1, x2, y_mid + 1, y2, x_start, x_end, y_start, y_end, c)
        node.local.value = leftChild.local.value + rightChild.local.value + node.local.lazy × (x2 − x1 + 1) × (y2 − y1 + 1)
        node.global.value = leftChild.global.value + rightChild.global.value + node.global.lazy × (x2 − x1 + 1) × (y2 − y1 + 1)
    end if
end function

Algorithm 2 Query
QueryByX(node, 1, n, x_start, x_end, y_start, y_end)    ▷ Starting from the root node

function QueryByX(node, x1, x2, x_start, x_end, y_start, y_end)
    if (x1 : x2) is within (x_start : x_end) then    ▷ Complete Query
        return QueryByY(newNode, x1, x2, 1, m, x_start, x_end, y_start, y_end, 0)
    else if (x1 : x2) is outside (x_start : x_end) then    ▷ Disjoint Region
        return 0
    else
        x_mid = ⌊(x1 + x2) / 2⌋
        x'_start = max(x_start, x1)    ▷ Trimming
        x'_end = min(x_end, x2)
        return QueryByY(newNode, x1, x2, 1, m, x'_start, x'_end, y_start, y_end, 0)    ▷ Partial Query
               + QueryByX(leftChild, x1, x_mid, x_start, x_end, y_start, y_end)
               + QueryByX(rightChild, x_mid + 1, x2, x_start, x_end, y_start, y_end)
    end if
end function

function QueryByY(node, x1, x2, y1, y2, x_start, x_end, y_start, y_end, lazy)
    if (y1 : y2) is within (y_start : y_end) then
        if (x1 : x2) is within (x_start : x_end) then    ▷ Complete Query
            return node.local.value + node.global.value + lazy × (x2 − x1 + 1) × (y2 − y1 + 1)
        else    ▷ Partial Query
            scaling = (x_end − x_start + 1) / (x2 − x1 + 1)    ▷ Diluting values
            return node.global.value × scaling + lazy × (x_end − x_start + 1) × (y2 − y1 + 1)
        end if
    else if (y1 : y2) is outside (y_start : y_end) then    ▷ Disjoint Region
        return 0
    else
        y_mid = ⌊(y1 + y2) / 2⌋
        if (x1 : x2) is within (x_start : x_end) then    ▷ Complete Query
            return QueryByY(leftChild, x1, x2, y1, y_mid, x_start, x_end, y_start, y_end, lazy + node.global.lazy + node.local.lazy)
                   + QueryByY(rightChild, x1, x2, y_mid + 1, y2, x_start, x_end, y_start, y_end, lazy + node.global.lazy + node.local.lazy)
        else    ▷ Partial Query
            return QueryByY(leftChild, x1, x2, y1, y_mid, x_start, x_end, y_start, y_end, lazy + node.global.lazy)
                   + QueryByY(rightChild, x1, x2, y_mid + 1, y2, x_start, x_end, y_start, y_end, lazy + node.global.lazy)
        end if
    end if
end function

References

[1] Michael A Bender and Martin Farach-Colton. The LCA problem revisited. In Latin American Symposium on Theoretical Informatics, pages 88–94. Springer, 2000.
[2] Dov Harel and Robert Endre Tarjan. Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing, 13(2):338–355, 1984.
[3] Bernd-Uwe Pagel, Hans-Werner Six, Heinrich Toben, and Peter Widmayer. Towards an analysis of range query performance in spatial data structures. In
Proceedings of the twelfth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, pages 214–221. ACM, 1993.
[4] Einar Mykletun and Gene Tsudik. Aggregation queries in the database-as-a-service model. In IFIP Annual Conference on Data and Applications Security and Privacy, pages 89–103. Springer, 2006.
[5] Xin Li, Young Jin Kim, Ramesh Govindan, and Wei Hong. Multi-dimensional range queries in sensor networks. In Proceedings of the 1st international conference on Embedded networked sensor systems, pages 63–75. ACM, 2003.
[6] Yining Deng, BS Manjunath, Charles Kenney, Michael S Moore, and Hyundoo Shin. An efficient color representation for image retrieval. IEEE Transactions on image processing, 10(1):140–147, 2001.
[7] Dan Boneh and Brent Waters. Conjunctive, subset, and range queries on encrypted data. In Theory of Cryptography Conference, pages 535–554. Springer, 2007.
[8] Mark De Berg, Marc Van Kreveld, Mark Overmars, and Otfried Schwarzkopf. Computational geometry. In Computational geometry, pages 1–17. Springer, 1997.
[9] KS Easwarakumar and T Hema. Bits-tree-an efficient data structure for segment storage and query processing. arXiv preprint arXiv:1501.03435, 2015.
[10] Hanan Samet, Azriel Rosenfeld, Clifford A Shaffer, and Robert E Webber. A geographic information system using quadtrees. Pattern Recognition, 17(6):647–656, 1984.
[11] Peter M Fenwick. A new data structure for cumulative frequency tables. Software: Practice and Experience, 24(3):327–336, 1994.
[12] Changxi Zheng, Guobin Shen, Shipeng Li, and Scott Shenker. Distributed segment tree: Support of range query and cover query over DHT. In IPTPS, 2006.
[13] Pankaj Gupta and Nick McKeown. Algorithms for packet classification. IEEE Network, 15(2):24–32, 2001.
[14] Xing Mei, Xun Sun, Weiming Dong, Haitao Wang, and Xiaopeng Zhang. Segment-tree based cost aggregation for stereo matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 313–320, 2013.
[15] Jon Louis Bentley and Derick Wood. An optimal worst case algorithm for reporting intersections of rectangles. IEEE Transactions on Computers, (7):571–577, 1980.
[16] Bernard Chazelle, Herbert Edelsbrunner, Leonidas J Guibas, and Micha Sharir. Algorithms for bichromatic line-segment problems and polyhedral terrains. Algorithmica, 11(2):116–132, 1994.
[17] Steven Halim and Felix Halim. Competitive Programming 3. Lulu Independent Publish, 2013.
[18] Pushkar Mishra. A new algorithm for updating and querying sub-arrays of multidimensional arrays. arXiv preprint arXiv:1311.6093, 2013.
[19] Guido Van Rossum et al. Python programming language. In USENIX Annual Technical Conference, volume 41, page 36, 2007.
[20] Hanan Samet. An overview of quadtrees, octrees, and related hierarchical data structures. In