Tatiana A. Starikovskaya
University of Bristol
Publication
Featured research published by Tatiana A. Starikovskaya.
symposium on discrete algorithms | 2015
Maxim A. Babenko; Paweł Gawrychowski; Tomasz Kociumaka; Tatiana A. Starikovskaya
We present an improved wavelet tree construction algorithm and discuss its applications to a number of rank/select problems for integer keys and strings. Given a string of length n over an alphabet of size σ ≤ n, our method builds the wavelet tree in O(n log σ / √(log n)) time, improving upon the state-of-the-art algorithm by a factor of √(log n). As a consequence, given an array of n integers we can construct in O(n √(log n)) time a data structure consisting of O(n) machine words and capable of answering rank/select queries for the subranges of the array in O(log n / log log n) time. This is a log log n-factor improvement in query time compared to Chan and Patrascu (SODA 2010) and a √(log n)-factor improvement in construction time compared to Brodal et al. (Theor. Comput. Sci. 2011). Next, we switch to a stringological context and propose a novel notion of wavelet suffix trees. For a string w of length n, this data structure occupies O(n) words, takes O(n √(log n)) time to construct, and simultaneously captures the combinatorial structure of substrings of w while enabling efficient top-down traversal and binary search. In particular, with a wavelet suffix tree we are able to answer in O(log |x|) time the following two natural analogues of rank/select queries for suffixes of substrings: 1) For substrings x and y of w (given by their endpoints), count the number of suffixes of x that are lexicographically smaller than y; 2) For a substring x of w (given by its endpoints) and an integer k, find the k-th lexicographically smallest suffix of x. We further show that wavelet suffix trees allow computing a run-length-encoded Burrows-Wheeler transform of a substring x of w (again, given by its endpoints) in O(s log |x|) time, where s denotes the length of the resulting run-length encoding. This answers a question by Cormode and Muthukrishnan (SODA 2005), who considered an analogous problem for Lempel-Ziv compression.
All our algorithms, except for the construction of wavelet suffix trees, which additionally requires O(n) time in expectation, are deterministic and operate in the word RAM model.
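For readers unfamiliar with the underlying structure, the following is a minimal sketch of a plain pointer-based wavelet tree that answers rank queries in O(log σ) time. It is a textbook-style illustration only and not the construction algorithm of the abstract above; the class name `WaveletTree` and its layout (one prefix-count list per node) are assumptions made for this example.

```python
class WaveletTree:
    """Plain wavelet tree over integer symbols (illustrative, not optimized)."""

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            lo, hi = (min(seq), max(seq)) if seq else (0, 0)
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        # prefix[i] = how many of the first i symbols go to the left child
        self.prefix = [0]
        if lo == hi or not seq:
            return
        mid = (lo + hi) // 2
        for c in seq:
            self.prefix.append(self.prefix[-1] + (c <= mid))
        self.left = WaveletTree([c for c in seq if c <= mid], lo, mid)
        self.right = WaveletTree([c for c in seq if c > mid], mid + 1, hi)

    def rank(self, c, i):
        """Number of occurrences of symbol c among the first i symbols."""
        if i == 0 or c < self.lo or c > self.hi:
            return 0
        if self.lo == self.hi:
            return i
        mid = (self.lo + self.hi) // 2
        to_left = self.prefix[i]
        if c <= mid:
            return self.left.rank(c, to_left)
        return self.right.rank(c, i - to_left)


seq = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
wt = WaveletTree(seq)
print(wt.rank(1, 4))  # occurrences of 1 in seq[:4]
```

Each level halves the symbol range, so a query visits one node per level; the faster construction and the subrange rank/select queries described in the abstract need considerably more machinery than this sketch shows.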
mathematical foundations of computer science | 2012
Tatiana A. Starikovskaya
We present an algorithm which computes the Lempel-Ziv factorization of a word W of length n on an alphabet Σ of size σ online in the following sense: it reads W starting from the left, and, after reading each r = O(log_σ n) characters of W, updates the Lempel-Ziv factorization. The algorithm requires O(n log σ) bits of space and O(n log² n) time. The basis of the algorithm is a sparse suffix tree combined with wavelet trees.
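To make the object being computed concrete, here is a naive offline sketch of a Lempel-Ziv factorization. It assumes the common definition in which each factor is either a letter that has not occurred before or the longest prefix of the unread part that also starts at an earlier position (overlap with the factor itself allowed); the abstract does not spell out which variant is used, and the function name `lz_factorize` is hypothetical. The sketch runs in polynomial time with no index at all, so it says nothing about the online, space-efficient algorithm described above.

```python
def lz_factorize(w):
    """Naive Lempel-Ziv factorization of the string w (illustrative only)."""
    factors = []
    i, n = 0, len(w)
    while i < n:
        best = 0
        # Try every earlier starting position j and extend the match greedily.
        for j in range(i):
            length = 0
            while i + length < n and w[j + length] == w[i + length]:
                length += 1
            best = max(best, length)
        step = max(best, 1)  # a fresh letter forms a factor of length 1
        factors.append(w[i:i + step])
        i += step
    return factors


print(lz_factorize("abababb"))
```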
combinatorial pattern matching | 2012
Gregory Kucherov; Yakov Nekrich; Tatiana A. Starikovskaya
We study a new variant of the string matching problem called cross-document string matching, which is the problem of indexing a collection of documents to support an efficient search for a pattern in a selected document, where the pattern itself is a substring of another document. Several variants of this problem are considered, and efficient linear-space solutions are proposed with query time bounds that either do not depend at all on the pattern size or depend on it in a very limited way (doubly logarithmic). As a side result, we propose an improved solution to the weighted level ancestor problem.
european symposium on algorithms | 2015
Raphaël Clifford; Allyx Fontaine; Ely Porat; Benjamin Sach; Tatiana A. Starikovskaya
We consider the problem of dictionary matching in a stream. Given a set of strings, known as a dictionary, and a stream of characters arriving one at a time, the task is to report each time some string in our dictionary occurs in the stream. We present a randomised algorithm which takes O(log log(k + m)) time per arriving character and uses O(k log m) words of space, where k is the number of strings in the dictionary and m is the length of the longest string in the dictionary.
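The problem statement can be illustrated with the classical Aho-Corasick automaton, which also consumes the stream one character at a time. This is a deterministic baseline whose space grows with the total length of the dictionary, not the randomised small-space algorithm of the abstract; the class name `StreamMatcher` and its `feed` method are assumptions made for this sketch.

```python
from collections import deque


class StreamMatcher:
    """Aho-Corasick automaton fed one character at a time (baseline sketch)."""

    def __init__(self, dictionary):
        self.goto = [{}]   # trie transitions per node
        self.fail = [0]    # failure links
        self.out = [[]]    # dictionary words ending at each node
        for word in dictionary:
            node = 0
            for ch in word:
                if ch not in self.goto[node]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[node][ch] = len(self.goto) - 1
                node = self.goto[node][ch]
            self.out[node].append(word)
        # Breadth-first pass to set failure links and merge outputs.
        queue = deque(self.goto[0].values())
        while queue:
            u = queue.popleft()
            for ch, v in self.goto[u].items():
                f = self.fail[u]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[v] = self.goto[f].get(ch, 0)
                self.out[v] = self.out[v] + self.out[self.fail[v]]
                queue.append(v)
        self.state = 0

    def feed(self, ch):
        """Consume one stream character; return the words ending here."""
        while self.state and ch not in self.goto[self.state]:
            self.state = self.fail[self.state]
        self.state = self.goto[self.state].get(ch, 0)
        return self.out[self.state]


matcher = StreamMatcher(["he", "she", "his", "hers"])
for position, ch in enumerate("ushers"):
    for word in matcher.feed(ch):
        print(position, word)
```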
european symposium on algorithms | 2014
Tomasz Kociumaka; Tatiana A. Starikovskaya; Hjalte Wedel Vildhøj
Given m documents of total length n, we consider the problem of finding a longest string common to at least d ≥ 2 of the documents. This problem is known as the longest common substring (LCS) problem and has a classic \(\mathcal{O}(n)\) space and \(\mathcal{O}(n)\) time solution (Weiner [FOCS’73], Hui [CPM’92]). However, the use of linear space is impractical in many applications. In this paper we show that for any trade-off parameter 1 ≤ τ ≤ n, the LCS problem can be solved in \(\mathcal{O}(\tau)\) space and \(\mathcal{O}(n^2/\tau)\) time, thus providing the first smooth deterministic time-space trade-off from constant to linear space. The result uses a new and very simple algorithm, which computes a τ-additive approximation to the LCS in \(\mathcal{O}(n^2/\tau)\) time and \(\mathcal{O}(1)\) space. We also show a time-space trade-off lower bound for deterministic branching programs, which implies that any deterministic RAM algorithm solving the LCS problem on documents from a sufficiently large alphabet in \(\mathcal{O}(\tau)\) space must use \(\Omega(n\sqrt{\log(n/(\tau\log n))/\log\log(n/(\tau\log n))})\) time.
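The constant-space end of this trade-off can be illustrated, for the special case of two documents, by a folklore scan over all alignments: for every relative shift of the two strings, walk along the aligned positions and track the longest run of matches. The sketch below is that folklore method, not the algorithm of the paper; it uses quadratic time and a constant number of counters, and the function name `lcs_constant_space` is hypothetical.

```python
def lcs_constant_space(s, t):
    """Return (length, start_in_s) of a longest common substring of s and t.

    Uses O(len(s) * len(t)) time and O(1) extra space (illustrative only).
    """
    best_len, best_start = 0, 0
    # shift = i - j for aligned positions s[i] and t[j]
    for shift in range(-(len(t) - 1), len(s)):
        run = 0
        i = max(shift, 0)
        j = i - shift
        while i < len(s) and j < len(t):
            if s[i] == t[j]:
                run += 1
                if run > best_len:
                    best_len, best_start = run, i - run + 1
            else:
                run = 0
            i += 1
            j += 1
    return best_len, best_start


length, start = lcs_constant_space("xabcdy", "zzabcdzz")
print("xabcdy"[start:start + length])
```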
computer science symposium in russia | 2008
Maxim A. Babenko; Tatiana A. Starikovskaya
Given a set of N strings A = {α1, ..., αN} of total length n over alphabet Σ one may ask to find, for each 2 ≤ K ≤ N, the longest substring β that appears in at least K strings in A. It is known that this problem can be solved in O(n) time with the help of suffix trees. However, the resulting algorithm is rather complicated (in particular, it involves answering certain least common ancestor queries in O(1) time). Also, its running time and memory consumption may depend on |Σ|. This paper presents an alternative, remarkably simple approach to the above problem, which relies on the notion of suffix arrays. Once the suffix array of some auxiliary O(n)-length string is computed, one needs a simple O(n)-time postprocessing to find the requested longest substring. Since a number of efficient and simple linear-time algorithms for constructing suffix arrays have recently been developed (with constant not depending on |Σ|), our approach seems to be quite practical.
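The general idea of working on a suffix array of a concatenated string can be sketched as follows for a single value of K. The sketch makes several simplifying assumptions that are not taken from the paper: the auxiliary string joins the inputs with one unique separator per string, the suffix array is built by naive sorting, and the postprocessing is a sliding window with a naive minimum, so it is far from linear time. The function name `longest_common_substring_k` is hypothetical.

```python
from collections import Counter


def longest_common_substring_k(strings, k):
    """Longest substring occurring in at least k >= 2 of the given strings."""
    # Symbols are tuples so that separators never equal real characters.
    text, owner = [], []
    for idx, s in enumerate(strings):
        text.extend((1, ch) for ch in s)
        owner.extend([idx] * len(s))
        text.append((0, idx))  # unique separator, sorts before any character
        owner.append(-1)
    n = len(text)
    sa = sorted(range(n), key=lambda i: text[i:])  # naive suffix array

    def lcp(a, b):
        length = 0
        while a + length < n and b + length < n and text[a + length] == text[b + length]:
            length += 1
        return length

    adj = [0] + [lcp(sa[r - 1], sa[r]) for r in range(1, n)]
    first = len(strings)  # suffixes starting at a separator occupy the first ranks
    seen = Counter()
    best, best_pos = 0, 0
    lo = first
    for hi in range(first, n):
        seen[owner[sa[hi]]] += 1
        # Shrink from the left while the window still covers k strings.
        while lo < hi:
            d = owner[sa[lo]]
            if seen[d] > 1 or len(seen) > k:
                seen[d] -= 1
                if seen[d] == 0:
                    del seen[d]
                lo += 1
            else:
                break
        if len(seen) >= k:
            common = min(adj[lo + 1:hi + 1])
            if common > best:
                best, best_pos = common, sa[hi]
    return "".join(ch for _, ch in text[best_pos:best_pos + best])


print(longest_common_substring_k(["banana", "ananas", "bandana"], 2))
```

Because every separator occurs exactly once, a common prefix of two suffixes can never extend across a string boundary, which is why the window minimum directly gives a substring shared by the covered strings.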
data compression, communications and processing | 2011
Roman Kolpakov; Gregory Kucherov; Tatiana A. Starikovskaya
Problems of Information Transmission | 2011
Maxim A. Babenko; Tatiana A. Starikovskaya
combinatorial pattern matching | 2013
Tatiana A. Starikovskaya; Hjalte Wedel Vildhøj
combinatorial pattern matching | 2016
Tatiana A. Starikovskaya
In the longest common substring problem we are given two strings of length n and must find a substring of maximal length that occurs in both strings. It is well-known that the problem can be solved in linear time, but the solution is not robust and can vary greatly when the input strings are changed even by one letter. To circumvent this, Leimeister and Morgenstern introduced the problem of the longest common substring with k