Abstract

The problem of Text Indexing is a fundamental algorithmic problem in which one wishes to preprocess a text so that occurrences of query patterns can be located quickly. In the ever-evolving world of dynamic and on-line data, there is also a need for solutions that index texts arriving on-line, i.e., one character at a time, while still locating such patterns quickly. In this paper, a new solution for on-line indexing is presented: an on-line suffix tree construction in O(log log n + log log |Σ|) worst-case expected time per character, where n is the size of the string and Σ is the alphabet. This improves upon all previously known on-line suffix tree constructions for general alphabets, at the cost of the running time holding only in expectation. The main idea is to reduce the problem of constructing a suffix tree on-line to an interesting variant of the order maintenance problem, which may be of independent interest. In the well-known order maintenance problem, one wishes to maintain a dynamic list L of size n under insertions, deletions, and order queries; in an order query, one is given two nodes of L and must determine which of them precedes the other in L. In the Predecessor search on Dynamic Subsets of an Ordered Dynamic List problem (POLP), one must additionally maintain dynamic subsets of L such that, given some u ∈ L, the predecessor of u in any subset can be located quickly. This paper provides an efficient data structure for the POLP with worst-case expected bounds that match the best currently known bounds for predecessor search in the RAM model, improving over a solution that may be implicitly obtained from Dietz [Die89]. Furthermore, this paper improves or simplifies bounds for several additional applications, including fully-persistent arrays and the Order-Maintenance Problem.
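To make the POLP interface concrete, here is a deliberately naive reference sketch of the operations it involves: insertion and deletion in L, order queries, subset membership updates, and predecessor queries within a subset. It is not the data structure from this paper. Every operation here takes linear time, whereas the paper targets predecessor-search bounds. The class and method names (`NaivePOLP`, `insert_after`, `precedes`, `predecessor`, and so on) are hypothetical. The choice of a *strict* predecessor (the last subset member strictly before u) is also an assumption made for illustration.

```python
class NaivePOLP:
    """Hypothetical, naive illustration of POLP semantics (not the paper's structure).

    L is stored as a Python list of node ids; subsets are Python sets.
    All operations are O(n) or worse and serve only to pin down the interface.
    """

    def __init__(self):
        self.order_list = []  # the ordered dynamic list L
        self.subsets = {}     # subset name -> set of node ids drawn from L

    def insert_after(self, u, new):
        # Insert node `new` immediately after `u` in L; u=None inserts at the front.
        pos = 0 if u is None else self.order_list.index(u) + 1
        self.order_list.insert(pos, new)

    def delete(self, u):
        # Remove u from L and from every subset that contains it.
        self.order_list.remove(u)
        for members in self.subsets.values():
            members.discard(u)

    def precedes(self, u, v):
        # Order query: does u come before v in L?
        return self.order_list.index(u) < self.order_list.index(v)

    def add_to_subset(self, name, u):
        self.subsets.setdefault(name, set()).add(u)

    def remove_from_subset(self, name, u):
        self.subsets.get(name, set()).discard(u)

    def predecessor(self, name, u):
        # Assumption: strict predecessor, i.e. the last member of subset `name`
        # that lies strictly before u in L; returns None if no such member exists.
        pos = self.order_list.index(u)
        best, best_pos = None, -1
        for w in self.subsets.get(name, ()):
            p = self.order_list.index(w)
            if best_pos < p < pos:
                best, best_pos = w, p
        return best


polp = NaivePOLP()
polp.insert_after(None, "a")
polp.insert_after("a", "c")
polp.insert_after("a", "b")   # L is now a, b, c
polp.add_to_subset("S", "a")
polp.add_to_subset("S", "c")
print(polp.predecessor("S", "b"))  # prints a
```

The point of the POLP is that subsets are defined relative to an order that itself changes under insertions. A plain predecessor structure over fixed integer keys therefore does not apply directly, because the "keys" (positions in L) shift as L is updated.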
