On-line Indexing for General Alphabets via Predecessor Queries on Subsets of an Ordered List
Abstract
The problem of Text Indexing is a fundamental algorithmic problem in which one wishes to preprocess a text in order to quickly locate pattern queries within the text. In the ever-evolving world of dynamic and on-line data, there is also a need for solutions that index texts arriving on-line, i.e., one character at a time, while still being able to quickly locate such patterns. In this paper, a new solution for on-line indexing is presented by providing an on-line suffix tree construction in $O(\log\log n + \log\log|\Sigma|)$ worst-case expected time per character, where $n$ is the size of the string and $\Sigma$ is the alphabet. This improves upon all previously known on-line suffix tree constructions for general alphabets, at the cost of having the run time in expectation.
The main idea is to reduce the problem of constructing a suffix tree on-line to an interesting variant of the order maintenance problem, which may be of independent interest. In the famous order maintenance problem, one wishes to maintain a dynamic list $L$ of size $n$ under insertions, deletions, and order queries. In an order query, one is given two nodes from $L$ and must determine which node precedes the other in $L$. In the Predecessor search on Dynamic Subsets of an Ordered Dynamic List problem (POLP), it is also necessary to maintain dynamic subsets of $L$ such that, given some $u \in L$, it will be possible to quickly locate the predecessor of $u$ in any subset. This paper provides an efficient data structure capable of solving the POLP with worst-case expected bounds that match the currently best known bounds for predecessor search in the RAM model, improving over a solution which may be implicitly obtained from Dietz [Die89].
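To make the POLP operations concrete, the following is a minimal reference sketch in Python. It is a naive model with linear-time operations, not the data structure of this paper, and it achieves none of the stated bounds. The class name `NaivePOLP`, the method names, and the use of string labels for subsets are all hypothetical, and the sketch assumes that the predecessor of $u$ means the last subset member strictly before $u$ in $L$.

```python
class NaivePOLP:
    """Naive reference model of the POLP interface (hypothetical names).

    Every operation scans the list, so this only illustrates the
    semantics; it is not the efficient structure from the paper.
    """

    def __init__(self):
        self.order = []    # the ordered list L, front to back
        self.subsets = {}  # subset label -> set of nodes in that subset

    def insert_after(self, node, prev=None):
        """Insert node right after prev, or at the front if prev is None."""
        pos = 0 if prev is None else self.order.index(prev) + 1
        self.order.insert(pos, node)

    def delete(self, node):
        """Remove node from L and from every subset."""
        self.order.remove(node)
        for members in self.subsets.values():
            members.discard(node)

    def precedes(self, u, v):
        """Order query: does u come before v in L?"""
        return self.order.index(u) < self.order.index(v)

    def add_to_subset(self, label, node):
        self.subsets.setdefault(label, set()).add(node)

    def remove_from_subset(self, label, node):
        self.subsets.get(label, set()).discard(node)

    def predecessor(self, label, u):
        """Last member of the subset strictly before u in L, or None.

        Assumption: 'predecessor' is strict, so u itself never counts.
        """
        members = self.subsets.get(label, set())
        for node in reversed(self.order[: self.order.index(u)]):
            if node in members:
                return node
        return None


polp = NaivePOLP()
polp.insert_after("a")
polp.insert_after("c", "a")
polp.insert_after("b", "a")       # L is now a, b, c
polp.add_to_subset("S", "a")
polp.add_to_subset("S", "b")
print(polp.precedes("b", "c"))    # order query
print(polp.predecessor("S", "c")) # predecessor of c within subset S
```

Note that the query node `u` need not belong to the subset being searched; it only has to be a node of $L$, which is what distinguishes this setting from predecessor search over a fixed universe of integer keys.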
Furthermore, this paper improves or simplifies bounds for several additional applications, including fully persistent arrays and the order maintenance problem.