Tatiana A. Starikovskaya

Publication

Featured research published by Tatiana A. Starikovskaya.

symposium on discrete algorithms | 2015

Wavelet trees meet suffix trees

Maxim A. Babenko; Paweł Gawrychowski; Tomasz Kociumaka; Tatiana A. Starikovskaya

We present an improved wavelet tree construction algorithm and discuss its applications to a number of rank/select problems for integer keys and strings. Given a string of length n over an alphabet of size σ ≤ n, our method builds the wavelet tree in O(n log σ / √(log n)) time, improving upon the state-of-the-art algorithm by a factor of √(log n). As a consequence, given an array of n integers we can construct in O(n √(log n)) time a data structure consisting of O(n) machine words and capable of answering rank/select queries for the subranges of the array in O(log n / log log n) time. This is a log log n-factor improvement in query time compared to Chan and Patrascu (SODA 2010) and a √(log n)-factor improvement in construction time compared to Brodal et al. (Theor. Comput. Sci. 2011). Next, we switch to stringological context and propose a novel notion of wavelet suffix trees. For a string w of length n, this data structure occupies O(n) words, takes O(n √(log n)) time to construct, and simultaneously captures the combinatorial structure of substrings of w while enabling efficient top-down traversal and binary search. In particular, with a wavelet suffix tree we are able to answer in O(log |x|) time the following two natural analogues of rank/select queries for suffixes of substrings: 1) For substrings x and y of w (given by their endpoints) count the number of suffixes of x that are lexicographically smaller than y; 2) For a substring x of w (given by its endpoints) and an integer k, find the k-th lexicographically smallest suffix of x. We further show that wavelet suffix trees allow one to compute a run-length-encoded Burrows-Wheeler transform of a substring x of w (again, given by its endpoints) in O(s log |x|) time, where s denotes the length of the resulting run-length encoding. This answers a question by Cormode and Muthukrishnan (SODA 2005), who considered an analogous problem for Lempel-Ziv compression.
All our algorithms, except for the construction of wavelet suffix trees, which additionally requires O(n) time in expectation, are deterministic and operate in the word RAM model.
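To make the rank/select queries on subranges concrete, here is a minimal sketch of a textbook pointer-based wavelet tree over integers. It is not the construction from the abstract: it is built naively in O(n log σ) time, stores plain prefix-count lists instead of packed bit vectors, and the names `WaveletTree`, `rank`, and `kth` are illustrative assumptions.

```python
class WaveletTree:
    """Naive pointer-based wavelet tree over integer symbols (illustrative only)."""

    def __init__(self, seq, lo=None, hi=None):
        seq = list(seq)
        if lo is None:
            lo, hi = (min(seq), max(seq)) if seq else (0, 0)
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        # prefix[i] = number of the first i symbols that go to the left child
        self.prefix = [0]
        if lo == hi or not seq:
            return
        mid = (lo + hi) // 2
        for c in seq:
            self.prefix.append(self.prefix[-1] + (c <= mid))
        self.left = WaveletTree([c for c in seq if c <= mid], lo, mid)
        self.right = WaveletTree([c for c in seq if c > mid], mid + 1, hi)

    def rank(self, c, i):
        """Number of occurrences of symbol c among the first i symbols."""
        if c < self.lo or c > self.hi:
            return 0
        node = self
        while node.lo != node.hi:
            if node.left is None:  # empty internal node: nothing to count
                return 0
            mid = (node.lo + node.hi) // 2
            zeros = node.prefix[i]
            if c <= mid:
                i, node = zeros, node.left
            else:
                i, node = i - zeros, node.right
        return i

    def kth(self, l, r, k):
        """k-th smallest value (1-based) in seq[l:r]; assumes 1 <= k <= r - l."""
        node = self
        while node.lo != node.hi:
            zl, zr = node.prefix[l], node.prefix[r]
            if k <= zr - zl:
                l, r, node = zl, zr, node.left
            else:
                k -= zr - zl
                l, r, node = l - zl, r - zr, node.right
        return node.lo


wt = WaveletTree([3, 1, 4, 1, 5, 9, 2, 6])
print(wt.rank(1, 4))    # occurrences of 1 in the first four symbols
print(wt.kth(2, 7, 3))  # third smallest value in positions 2..6
```

Each query walks one root-to-leaf path, so it costs O(log σ) steps in this sketch; the faster bounds quoted in the abstract rely on techniques not shown here.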

mathematical foundations of computer science | 2012

Computing Lempel-Ziv factorization online

Tatiana A. Starikovskaya

We present an algorithm which computes the Lempel-Ziv factorization of a word W of length n on an alphabet Σ of size σ online in the following sense: it reads W starting from the left, and, after reading each r = O(log_σ n) characters of W, updates the Lempel-Ziv factorization. The algorithm requires O(n log σ) bits of space and O(n log² n) time. The basis of the algorithm is a sparse suffix tree combined with wavelet trees.
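For readers unfamiliar with the object being computed, the sketch below produces a Lempel-Ziv factorization by brute force. It assumes one common definition (each factor is either a fresh letter or the longest prefix of the remaining text that also starts at an earlier position, overlaps allowed); the abstract does not say which variant the paper uses. The function name `lz_factorize` is hypothetical, and this quadratic-or-worse offline loop is not the online, space-efficient algorithm described above.

```python
def lz_factorize(w):
    """Brute-force Lempel-Ziv factorization of the string w (illustrative only)."""
    factors = []
    i, n = 0, len(w)
    while i < n:
        best = 0
        # Try every earlier starting position j and extend the match greedily.
        for j in range(i):
            length = 0
            while i + length < n and w[j + length] == w[i + length]:
                length += 1
            best = max(best, length)
        step = max(best, 1)  # a letter with no earlier occurrence is its own factor
        factors.append(w[i:i + step])
        i += step
    return factors


print(lz_factorize("abababb"))
```

Concatenating the returned factors always reproduces the input, which is a convenient sanity check.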

combinatorial pattern matching | 2012

Cross-document pattern matching

Gregory Kucherov; Yakov Nekrich; Tatiana A. Starikovskaya

We study a new variant of the string matching problem called cross-document string matching, which is the problem of indexing a collection of documents to support an efficient search for a pattern in a selected document, where the pattern itself is a substring of another document. Several variants of this problem are considered, and efficient linear-space solutions are proposed with query time bounds that either do not depend at all on the pattern size or depend on it in a very limited way (doubly logarithmic). As a side result, we propose an improved solution to the weighted level ancestor problem.

european symposium on algorithms | 2015

Dictionary Matching in a Stream

Raphaël Clifford; Allyx Fontaine; Ely Porat; Benjamin Sach; Tatiana A. Starikovskaya

We consider the problem of dictionary matching in a stream. Given a set of strings, known as a dictionary, and a stream of characters arriving one at a time, the task is to report each time some string in our dictionary occurs in the stream. We present a randomised algorithm which takes O(log log(k + m)) time per arriving character and uses O(k log m) words of space, where k is the number of strings in the dictionary and m is the length of the longest string in the dictionary.
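The problem statement can be illustrated with the classical Aho-Corasick automaton, sketched below as a baseline. This is a swapped-in textbook technique, not the randomised algorithm from the abstract: it is deterministic and stores the whole dictionary, so its space grows with the total dictionary length rather than O(k log m) words. The class name `StreamMatcher` and method `feed` are assumptions, and dictionary strings are assumed non-empty.

```python
from collections import deque


class StreamMatcher:
    """Aho-Corasick automaton consuming a stream one character at a time."""

    def __init__(self, dictionary):
        self.goto = [{}]   # trie transitions per state
        self.fail = [0]    # failure links
        self.out = [[]]    # dictionary words ending at each state
        for word in dictionary:
            state = 0
            for ch in word:
                if ch not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[state][ch] = len(self.goto) - 1
                state = self.goto[state][ch]
            self.out[state].append(word)
        # Breadth-first pass: failure links of shallower states are ready first.
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for ch, t in self.goto[s].items():
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(ch, 0)
                self.out[t] = self.out[t] + self.out[self.fail[t]]
                queue.append(t)
        self.state = 0

    def feed(self, ch):
        """Consume one character; return the dictionary words ending here."""
        while self.state and ch not in self.goto[self.state]:
            self.state = self.fail[self.state]
        self.state = self.goto[self.state].get(ch, 0)
        return list(self.out[self.state])


matcher = StreamMatcher(["he", "she", "his", "hers"])
for position, ch in enumerate("ushers"):
    for word in matcher.feed(ch):
        print(position, word)
```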

european symposium on algorithms | 2014

Sublinear Space Algorithms for the Longest Common Substring Problem

Tomasz Kociumaka; Tatiana A. Starikovskaya; Hjalte Wedel Vildhøj

Given m documents of total length n, we consider the problem of finding a longest string common to at least d ≥ 2 of the documents. This problem is known as the longest common substring (LCS) problem and has a classic \(\mathcal{O}(n)\) space and \(\mathcal{O}(n)\) time solution (Weiner [FOCS’73], Hui [CPM’92]). However, the use of linear space is impractical in many applications. In this paper we show that for any trade-off parameter 1 ≤ τ ≤ n, the LCS problem can be solved in \(\mathcal{O}(\tau)\) space and \(\mathcal{O}(n^2/\tau)\) time, thus providing the first smooth deterministic time-space trade-off from constant to linear space. The result uses a new and very simple algorithm, which computes a τ-additive approximation to the LCS in \(\mathcal{O}(n^2/\tau)\) time and \(\mathcal{O}(1)\) space. We also show a time-space trade-off lower bound for deterministic branching programs, which implies that any deterministic RAM algorithm solving the LCS problem on documents from a sufficiently large alphabet in \(\mathcal{O}(\tau)\) space must use \(\Omega(n\sqrt{\log(n/(\tau\log n))/\log\log(n/(\tau\log n))})\) time.

computer science symposium in russia | 2008

Computing longest common substrings via suffix arrays

Maxim A. Babenko; Tatiana A. Starikovskaya

Given a set of N strings A = {α_1, ..., α_N} of total length n over alphabet Σ one may ask to find, for each 2 ≤ K ≤ N, the longest substring β that appears in at least K strings in A. It is known that this problem can be solved in O(n) time with the help of suffix trees. However, the resulting algorithm is rather complicated (in particular, it involves answering certain least common ancestor queries in O(1) time). Also, its running time and memory consumption may depend on |Σ|. This paper presents an alternative, remarkably simple approach to the above problem, which relies on the notion of suffix arrays. Once the suffix array of some auxiliary O(n)-length string is computed, one needs a simple O(n)-time postprocessing to find the requested longest substring. Since a number of efficient and simple linear-time algorithms for constructing suffix arrays have recently been developed (with constant not depending on |Σ|), our approach seems to be quite practical.
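The suffix-array idea can be sketched for the simplest case of two strings (N = K = 2). This is an illustration under stated assumptions, not the paper's algorithm for all K: the suffix array is built by plain sorting rather than a linear-time construction, the separator character `"\x00"` is assumed not to occur in either input, and the function name `longest_common_substring` is hypothetical. The postprocessing step scans adjacent suffixes that originate from different strings and keeps the largest common prefix.

```python
def longest_common_substring(a, b):
    """Longest common substring of a and b via a suffix array of a + sep + b."""
    sep = "\x00"  # assumption: this character occurs in neither a nor b
    s = a + sep + b
    n = len(s)

    # Naive suffix array: sort suffix start positions by the suffix text.
    sa = sorted(range(n), key=lambda i: s[i:])
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r

    # Kasai's algorithm: lcp[r] is the common-prefix length of sa[r-1] and sa[r].
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0

    # Linear scan: only pairs with one suffix from a and one from b count.
    best_len, best_pos = 0, 0
    for r in range(1, n):
        from_a_prev = sa[r - 1] < len(a)
        from_a_curr = sa[r] < len(a)
        if from_a_prev != from_a_curr and lcp[r] > best_len:
            best_len, best_pos = lcp[r], sa[r]
    return s[best_pos:best_pos + best_len]


print(longest_common_substring("xabcdy", "zzabcdw"))
```

Because the separator occurs exactly once, no common prefix can extend across it, so every reported match lies entirely inside both inputs.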

combinatorial pattern matching | 2016

Longest common substring with approximately k mismatches

Tatiana A. Starikovskaya

In the longest common substring problem we are given two strings of length n and must find a substring of maximal length that occurs in both strings. It is well-known that the problem can be solved in linear time, but the solution is not robust and can vary greatly when the input strings are changed even by one letter. To circumvent this, Leimeister and Morgenstern introduced the problem of the longest common substring with

data compression, communications and processing | 2011

Pattern Matching on Sparse Suffix Trees

Roman Kolpakov; Gregory Kucherov; Tatiana A. Starikovskaya

Problems of Information Transmission | 2011

Computing the longest common substring with one mismatch

Maxim A. Babenko; Tatiana A. Starikovskaya

combinatorial pattern matching | 2013