Access-Adaptive Priority Search Tree
Haley Massa and Jeffrey Uhlmann
Dept. of Electrical Engineering and Computer Science, University of Missouri-Columbia
Abstract: In this paper we show that the priority search tree of McCreight, which was originally developed to satisfy a class of spatial search queries on 2-dimensional points, can be adapted to the problem of dynamically maintaining a set of keys so that the query complexity adapts to the distribution of queried keys. Presently, the best-known example of such a data structure is the splay tree, which dynamically reconfigures itself during each query so that frequently accessed keys move to the top of the tree and thus can be retrieved with fewer comparisons than keys that are lower in the tree. However, while the splay tree is conjectured to offer optimal adaptive amortized query complexity, it may require $O(n)$ time for an individual query. We show that an access-adaptive priority search tree (AAPST) can provide competitive adaptive query performance while ensuring $O(\log n)$ worst-case query performance, thus potentially making it more suitable for certain interactive (e.g., online and real-time) applications for which the response time must be bounded.

Keywords: search trees, adaptive data structures, adaptive search trees, priority search tree, splay tree.
I. INTRODUCTION
Many applications demand the efficient satisfaction of key-retrieval queries from a dynamically maintained search structure (database) of $n$ keys. In many of these applications certain keys are queried much more frequently than others, and this nonuniform sampling from the set of $n$ keys can potentially be exploited by a distribution-sensitive search structure to surpass the $O(\log n)$ comparison-based theoretical lower bound on the expected number of comparisons per query that applies in the uniform case.

The splay tree, developed by Daniel Sleator and Robert Tarjan [4], is a self-adjusting [1] binary search tree that adapts its structure to the pattern of accesses to its keys. Splay trees differ from standard balanced BSTs by performing rotations that migrate frequently accessed keys to the top of the tree so that the search paths to those keys are shorter during future queries. A novelty of the splay tree is that it does not enforce balance at all times during a given sequence of $n$ updates and/or queries, but it does guarantee that the complexity of any such sequence is $O(n \log n)$. The value of the splay tree as an access-sensitive search structure is that it can offer sequence time complexity approaching $O(n)$ if the access distribution of keys is highly nonuniform. By contrast, a standard balanced BST (e.g., AVL, red-black, etc. [3]) provides no access-distribution sensitivity and thus can be expected to require $O(n \log n)$ time to perform a sequence of $O(n)$ operations. A natural question is whether it is possible to combine the access-sensitive properties of the splay tree with the efficient worst-case properties of a balanced BST.

In this paper we show that the priority search tree [2], which was published in the same year as the splay tree (1985), can be applied to achieve adaptive query performance that is competitive with the splay tree while providing superior worst-case optimal $O(\log n)$ update and query complexity.
This access-adaptive priority search tree (AAPST) is described in the following section. We then provide practical comparisons of the AAPST and the splay tree in the form of simulation results with varying degrees of nonuniformity in the sampling of query keys.

II. ADAPTIVE PRIORITY SEARCH TREE
The priority search tree (PST) is a data structure introduced by Edward McCreight in 1985 with the objective of storing a set of $n$ points in $\mathbb{R}^2$ in a way that allows for $O(\log n)$ update complexity, i.e., insertion or deletion of a point, and $O(\log n + k)$ complexity for semi-infinite 2-dimensional range queries, where $k$ is the number of returned objects. This complexity is achieved by maintaining the points simultaneously in BST order on the $x$-coordinates and heap order on the $y$-coordinates within the same binary tree structure. The data structure allows five main operations on a dataset $D$ of ordered pairs to be performed efficiently:
1) Insert an ordered pair into $D$.
2) Delete an ordered pair from $D$.
3) Given integers $x_0$, $x_1$, and $y_1$, among all the pairs $(x, y)$ in $D$ such that $x_0 \le x \le x_1$ and $y \le y_1$, find the pair whose $x$ is minimal.
4) Given integers $x_0$ and $x_1$, among all the pairs $(x, y)$ in $D$ such that $x_0 \le x \le x_1$, find the pair whose $y$ is minimal.
5) Given integers $x_0$, $x_1$, and $y_1$, enumerate all $k$ pairs $(x, y)$ in $D$ such that $x_0 \le x \le x_1$ and $y \le y_1$.

The priority search tree was the first data structure to support 2-dimensional spatial search queries within the same $O(\log n + k)$ complexity as the 1-dimensional range queries offered by balanced binary search trees (BSTs), while also supporting $O(\log n)$ update operations. Specifically, operations 1-4 have time complexity $O(\log n)$ and operation 5 has $O(\log n + k)$ complexity.

Each node in a priority search tree contains exactly one ordered pair $(x, y)$. A maximum PST is constructed so that the $y$-value of every child node is less than or equal to that of its parent, whereas in a minimum PST the $y$-value of a child node is greater than or equal to its parent's $y$-value. In both cases, the $x$-value of every node in a left subtree is strictly less than that of every node in the corresponding right subtree. Furthermore, the cardinality of a node's right subtree is equal to or one less than that of the node's left subtree, ensuring balance.

While the priority search tree was originally created to store two-dimensional coordinates, we introduce here an alternative use of the data structure. With slight modifications to its construction and operations, the priority search tree can be used as a distribution-sensitive search structure. We will call this specialized structure an access-adaptive priority search tree (AAPST). The AAPST stores each key of a dataset as the $x$-value of a node and the key's access frequency as the $y$-value.
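As a rough illustration, the AAPST representation and the three-step modified access algorithm given in this section (find and increment; return if the count does not exceed the parent's; otherwise delete and reinsert) can be sketched in Python. This is a minimal sketch under a stated simplification, not the authors' implementation: the key set is assumed to be fixed when the tree is built, so instead of McCreight's fully dynamic balanced PST it uses a static balanced skeleton of pivot keys with one leaf per key (2n - 1 slots for n pairs), into which (key, count) pairs are placed by standard top-down PST insertion in max-heap order. The names `Node`, `build_skeleton`, `AAPST`, `access`, and the `comparisons` counter are hypothetical; the counter tallies the two comparisons per visited node (stored key and pivot key) described in Section III.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Node:
    pivot: int                     # split key: keys <= pivot lie in the left subtree
    left: Optional[Node] = None
    right: Optional[Node] = None
    key: Optional[int] = None      # stored (key, count) pair; None marks an empty slot
    count: int = 0                 # access frequency, kept in max-heap order


def build_skeleton(keys: list[int]) -> Node:
    """Balanced skeleton over sorted distinct keys, one leaf per key (assumed fixed set)."""
    if len(keys) == 1:
        return Node(pivot=keys[0])
    mid = len(keys) // 2
    return Node(pivot=keys[mid - 1],
                left=build_skeleton(keys[:mid]),
                right=build_skeleton(keys[mid:]))


class AAPST:
    def __init__(self, keys: Iterable[int]) -> None:
        ordered = sorted(set(keys))
        if not ordered:
            raise ValueError("need at least one key")
        self.root = build_skeleton(ordered)
        self.comparisons = 0       # key comparisons made by access()
        for k in ordered:
            self._insert(k, 0)

    def _insert(self, key: int, count: int) -> None:
        """PST insertion: walk the key's BST path, pushing lower counts downward."""
        node = self.root
        while node.key is not None:
            if count > node.count:  # heap order: the larger count stays higher
                node.key, key = key, node.key
                node.count, count = count, node.count
            node = node.left if key <= node.pivot else node.right
        node.key, node.count = key, count

    def _remove_at(self, node: Node) -> None:
        """PST deletion: refill the hole by promoting the larger-count child pair."""
        while True:
            kids = [c for c in (node.left, node.right)
                    if c is not None and c.key is not None]
            if not kids:
                node.key, node.count = None, 0
                return
            best = max(kids, key=lambda c: c.count)
            node.key, node.count = best.key, best.count
            node = best

    def access(self, key: int) -> bool:
        """Find key, increment its count, and restore heap order if needed."""
        node: Optional[Node] = self.root
        parent: Optional[Node] = None
        while node is not None and node.key is not None:
            self.comparisons += 1              # compare with the stored key
            if node.key == key:
                break
            self.comparisons += 1              # compare with the pivot (BST order)
            parent, node = node, (node.left if key <= node.pivot else node.right)
        else:
            return False                       # key is not in the tree
        node.count += 1                        # step 1: increment its priority
        if parent is None or node.count <= parent.count:
            return True                        # step 2: heap order still holds
        count = node.count
        self._remove_at(node)                  # step 3: delete the pair ...
        self._insert(key, count)               # ... and reinsert it from the root
        return True
```

For example, after `t = AAPST(range(1, 9))` and ten calls to `t.access(7)`, key 7 sits at the root with count 10, so each later access to it costs a single key comparison. Because the skeleton height is about $\log_2 n$ and the search, deletion, and reinsertion each walk at most one root-to-leaf path, every access in this sketch is $O(\log n)$ in the worst case.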
In an AAPST, the skeleton of the tree maintains a BST ordering of the keys while the access frequencies associated with the keys are maintained in heap order. By slightly altering the search algorithms of a regular priority search tree, any search key can be found (or determined to be absent) in an AAPST in $O(\log n)$ time. The principal change to the standard PST is the incrementing of the priorities (access frequencies) associated with the keys. Specifically, when a key is accessed by either an update or a query, its access count is incremented, and the key-priority pair may then need to move up toward the root to restore heap order. More precisely, a modified access/query algorithm can be defined as follows:
1) Find the query key and increment its associated priority.
2) If the incremented priority does not exceed the priority of the key-priority pair in its parent node, then return.
3) Otherwise, delete the pair and reinsert it using the standard PST update algorithms.
Step 1 takes time proportional to the pair's depth in the tree, and this is the complexity of the operation in all cases in which the updated priority does not affect the heap order; otherwise, the complexity of the operation is dominated by the $O(\log n)$ complexities of the standard PST update algorithms. This establishes the worst-case $O(\log n)$ complexity of the new adaptive query algorithm.

III. COMPARATIVE PERFORMANCE RESULTS
In this section we examine the relative performance characteristics of a conventional balanced binary search tree (BST), a splay tree, and the AAPST. In the case of uniformly sampled query keys, the splay tree and AAPST incur extra overhead compared to the BST. In the case of the splay tree, this overhead takes the form of extra comparisons performed as the tree is restructured. In the case of the AAPST, the overhead takes the form of an extra key comparison per node visited: one comparison against the key stored at the node (whose placement is determined by heap order), and another against the pivot key stored at the node, which is used to traverse the tree in BST order. Therefore, the goal of our tests is to examine how the relative number of key comparisons performed during searches of each data structure is affected by the distribution of queried keys. We should expect the BST to be superior in the uniform case, while the splay tree and AAPST should perform better as the distribution becomes increasingly nonuniform.

We define a value $p$, $0 \le p \le 1$, to parameterize our key-access testing distributions, with $p = 0$ representing a uniform random distribution of key accesses; $p = 1$ representing an exponentially distributed sequence of accesses with the most frequently accessed key representing approximately of the accesses, the next representing of the accesses, etc., such that $O(\log n)$ of the keys comprise $O(n)$ of the accesses; and $0 < p < 1$ representing a weighted mixture of accesses from the two distributions.

As can be seen in Figure 1, the splay tree and AAPST perform comparably but are outperformed by the BST because of its lower overhead. In other words, the overhead of adaptivity incurred by the splay tree and AAPST does not yield dividends when keys are queried uniformly at random. Figure 2 shows that the relative performance advantage of the BST decreases when of the key accesses follow exponentially distributed access frequencies.
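The $p$-parameterized access distribution defined above can be sketched as a simple mixture sampler: with probability $p$ a key is drawn from an exponentially decaying (geometric) popularity ranking, and otherwise uniformly. This is an illustrative assumption rather than the authors' test harness: `make_sampler` and `ratio` are hypothetical names, the keys are assumed to be listed from most to least popular, and `ratio` is a free stand-in for the per-rank access fractions of the exponential component, not a value taken from the experiments.

```python
import random
from itertools import accumulate
from typing import Callable, Sequence


def make_sampler(keys: Sequence[int], p: float, ratio: float,
                 seed: int = 0) -> Callable[[], int]:
    """Return a function that draws one query key from the p-mixture.

    keys  -- keys listed from most to least popular (assumed ranking)
    p     -- mixture weight: 0 gives uniform draws, 1 gives purely exponential draws
    ratio -- hypothetical decay factor between successive popularity ranks
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must lie in (0, 1)")
    rng = random.Random(seed)
    # Exponentially decaying weights: rank i is drawn with weight ratio**i.
    cum = list(accumulate(ratio ** i for i in range(len(keys))))

    def sample() -> int:
        if rng.random() < p:                             # exponential component
            return rng.choices(keys, cum_weights=cum, k=1)[0]
        return rng.choice(keys)                          # uniform component

    return sample
```

With `p=0.0` every draw is uniform, with `p=1.0` every draw comes from the decaying ranking, and with `p=0.5` each draw is equally likely to come from either component, which mirrors the equal-mix setting discussed for Figure 3.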
Figure 3 shows that when there is an equal mix of keys sampled from the uniform and exponential distributions, the three search structures perform comparably. As the fraction of exponentially sampled keys increases further, Figures 4 and 5 show that the distribution-sensitivity properties of the splay tree and AAPST provide a significant performance advantage over the BST as the access frequencies tend toward an exponential distribution.

Fig. 1. This figure shows the average number of key comparisons for query keys drawn from a uniform distribution for datasets of increasing size $n$. The expected number of key comparisons performed by the BST is $\log n$, while the splay tree and AAPST perform roughly twice as many comparisons per query.
Fig. 2. This figure shows the average number of key comparisons when the / of the query keys are sampled with exponential frequency and the remaining are sampled uniformly.
Fig. 3. This figure shows the average number of key comparisons when half of the query keys are sampled with exponential frequency and the remaining half are sampled uniformly.
Fig. 4. This figure shows the average number of key comparisons when of the query keys are sampled according to an exponential distribution.
Fig. 5. This figure shows the average number of key comparisons when all query keys are sampled according to an exponential distribution, i.e., the frequency of access of different keys decreases exponentially.

IV. DISCUSSION
The principal contribution of this paper is the demonstration that the classical priority search tree of McCreight can be reinterpreted so that, instead of storing 2-dimensional points, it supports the access-sensitive storage and retrieval of 1-dimensional keys. Our simulation results show that the new access-adaptive priority search tree (AAPST) offers access-sensitive performance comparable to that of the splay tree while bounding the complexity of each operation, a property that is needed for interactive applications that must impose strict constraints on the worst-case response time of each operation. Future work will examine finer-grained performance characteristics of the AAPST and their relevance to practical applications.

REFERENCES
[1] B. Allen and I. Munro, "Self-organizing search trees," Journal of the ACM, 25(4): 526–535, 1978.
[2] E. McCreight, "Priority search trees," SIAM Journal on Computing, 14(2): 257–276, 1985.
[3] R. Sedgewick, "Balanced Trees," Algorithms, Addison-Wesley, 1983.
[4] D. D. Sleator and R. E. Tarjan, "Self-adjusting binary search trees," Journal of the ACM, 32(3): 652–686, 1985.
Haley Massa is an undergraduate research student in computer science and applied mathematics at the University of Missouri-Columbia. In addition to algorithms and data structures, Haley enjoys studying number theory and full-stack web development.