Tomasz Waleń
University of Warsaw
Publications
Featured research published by Tomasz Waleń.
String Processing and Information Retrieval | 2010
Maxime Crochemore; Costas S. Iliopoulos; Marcin Kubica; Jakub Radoszewski; Wojciech Rytter; Tomasz Waleń
A breakthrough in the field of text algorithms was the discovery that the maximal number of runs in a string of length n is O(n) and that they can all be computed in O(n) time. We study some applications of this result. New simpler O(n) time algorithms are presented for a few classical string problems: computing all distinct k-th string powers for a given k, in particular squares for k = 2, and finding all local periods in a given string of length n. Additionally, we present an efficient algorithm for testing primitivity of factors of a string and computing their primitive roots. Applications of runs, despite their importance, are underrepresented in the existing literature (approximately one page in the paper of Kolpakov & Kucherov, 1999). In this paper we attempt to fill this gap. We use Lyndon words and introduce the Lyndon structure of runs as a useful tool when computing powers. In problems related to periods we use some versions of the Manhattan skyline problem.
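To make "all runs of a string" concrete, here is a deliberately naive Python sketch. The function name `runs` and the 0-based, end-inclusive `(start, end, period)` output format are assumptions for illustration; it takes roughly O(n²) time rather than the O(n) time discussed above, and it is not the authors' algorithm.

```python
def runs(s):
    """Return all runs of s as (start, end, period), 0-based, end inclusive.

    Brute-force sketch: for each candidate period p, find maximal stretches
    of positions i with s[i] == s[i + p]. A stretch of at least p positions
    gives a maximal factor of length >= 2p that has period p.
    """
    n = len(s)
    best = {}  # (start, end) -> smallest period found for that interval
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if s[k] != s[k + p]:
                k += 1
                continue
            a = k
            while k + p < n and s[k] == s[k + p]:
                k += 1
            # positions a..k-1 all satisfy s[i] == s[i + p]
            if k - a >= p:
                key = (a, k - 1 + p)
                # the same interval is also found for multiples of its
                # smallest period; keep only the smallest one
                best[key] = min(p, best.get(key, p))
    return sorted((i, j, p) for (i, j), p in best.items())


print(runs("aabaab"))
```

For instance, `runs("aabaab")` yields `[(0, 1, 1), (0, 5, 3), (3, 4, 1)]`. Within a run `(start, end, p)`, every factor of length 2p is a square whose root has length p.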
Language and Automata Theory and Applications | 2010
Maxime Crochemore; Costas S. Iliopoulos; Marcin Kubica; Jakub Radoszewski; Wojciech Rytter; Tomasz Waleń
A run is an inclusion-maximal occurrence in a string (as a subinterval) of a repetition v with a period p such that 2p ≤ |v|. The maximal number of runs in a string of length n has been thoroughly studied and is known to be between 0.944n and 1.029n. In this paper we investigate cubic runs, in which the shortest period p satisfies 3p ≤ |v|. We show an upper bound of 0.5n on the maximal number of such runs in a string of length n, and construct an infinite sequence of words over a binary alphabet for which the lower bound is 0.406n.
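Restricting attention to cubic runs only changes the length threshold from 2p to 3p. The brute-force Python sketch below (hypothetical name `cubic_runs`, roughly O(n²) time, not the paper's method) can be used to spot-check small inputs against the bounds quoted above.

```python
def cubic_runs(s):
    """Return cubic runs of s as (start, end, period), 0-based, end inclusive.

    A maximal stretch of (k - a) positions with s[i] == s[i + p] yields a
    factor of length (k - a) + p with period p; it is cubic when
    (k - a) + p >= 3p, i.e. when k - a >= 2p.
    """
    n = len(s)
    best = {}  # (start, end) -> smallest period seen for that interval
    for p in range(1, n // 3 + 1):
        k = 0
        while k + p < n:
            if s[k] != s[k + p]:
                k += 1
                continue
            a = k
            while k + p < n and s[k] == s[k + p]:
                k += 1
            if k - a >= 2 * p:
                key = (a, k - 1 + p)
                # keep the smallest period, since multiples find the same interval
                best[key] = min(p, best.get(key, p))
    return sorted((i, j, p) for (i, j), p in best.items())


w = "abababa"
print(cubic_runs(w), len(cubic_runs(w)) / len(w))
```

Here `"abababa"` contains the single cubic run `(0, 6, 2)`, while `"aabaab"` contains none, since its runs are all shorter than three times their period.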
Information Processing Letters | 2013
Marcin Kubica; Tomasz Kulczyński; Jakub Radoszewski; Wojciech Rytter; Tomasz Waleń
We say that two sequences x and w of length m are order-isomorphic (of the same “shape”) if w[i] ⩽ w[j] if and only if x[i] ⩽ x[j] for each i, j ∈ [1, m]. We present a simple linear-time algorithm for checking if a given sequence y of length n contains a factor which is order-isomorphic to a given pattern x. A factor is a subsequence of consecutive symbols of y, so we call our problem the consecutive permutation pattern matching. The (general) permutation pattern matching problem is related to general subsequences and is known to be NP-complete. We show that the situation for consecutive subsequences is significantly different and present an O(n + m) time algorithm under a natural assumption that the symbols of x can be sorted in O(m) time; otherwise the time is O(n + m log m). In our algorithm we use a modification of the classical Knuth–Morris–Pratt string matching algorithm.
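The KMP-based idea can be sketched in Python under two simplifying assumptions: the pattern values are pairwise distinct (the ⩽-based definition above also covers ties, which this sketch does not handle), and the predecessor/successor preprocessing is done naively in O(m²) time instead of by sorting. All helper names are hypothetical. For each pattern position k, the sketch records the indices of the nearest smaller and nearest larger value among x[0..k−1]; a window already order-isomorphic to x[0..k−1] extends to length k+1 exactly when the new symbol falls strictly between the corresponding window symbols.

```python
def neighbours(x):
    """For each j: indices (a, b) of the predecessor and successor of x[j]
    among x[0..j-1], or None if absent. Naive O(m^2); assumes distinct x."""
    res = []
    for j, v in enumerate(x):
        a = b = None
        for i in range(j):
            if x[i] < v and (a is None or x[i] > x[a]):
                a = i
            if x[i] > v and (b is None or x[i] < x[b]):
                b = i
        res.append((a, b))
    return res


def fits(seq, start, k, nb):
    """Does seq[start + k] extend a match of x[0..k-1] starting at start?"""
    a, b = nb[k]
    v = seq[start + k]
    return (a is None or seq[start + a] < v) and (b is None or v < seq[start + b])


def op_match(x, y):
    """Start positions of factors of y order-isomorphic to x (KMP-style)."""
    m, n = len(x), len(y)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    nb = neighbours(x)
    # fail[j]: length of the longest proper order-isomorphic border of x[:j]
    fail = [0] * (m + 1)
    k = 0
    for j in range(1, m):
        while k > 0 and not fits(x, j - k, k, nb):
            k = fail[k]
        if fits(x, j - k, k, nb):
            k += 1
        fail[j + 1] = k
    out, k = [], 0
    for i in range(n):
        while k > 0 and not fits(y, i - k, k, nb):
            k = fail[k]
        if fits(y, i - k, k, nb):
            k += 1
        if k == m:
            out.append(i - m + 1)
            k = fail[k]
    return out


print(op_match([1, 3, 2], [5, 9, 7, 2, 8, 6, 4]))
```

With pattern shape "low, high, middle", the windows `[5, 9, 7]` and `[2, 8, 6]` match, so the call returns `[0, 3]`.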
Nucleic Acids Research | 2014
Grzegorz Chojnowski; Tomasz Waleń; Janusz M. Bujnicki
The RNA Bricks database (http://iimcb.genesilico.pl/rnabricks) stores information about recurrent RNA 3D motifs and their interactions, found in experimentally determined RNA structures and in RNA–protein complexes. In contrast to other similar tools (RNA 3D Motif Atlas, RNA Frabase, Rloom), RNA motifs, i.e. ‘RNA bricks’, are presented in the molecular environment in which they were determined, including RNA, protein, metal ions, water molecules and ligands. All nucleotide residues in RNA bricks are annotated with structural quality scores that describe real-space correlation coefficients with the electron density data (if available), backbone geometry and possible steric conflicts, which can be used to identify poorly modeled residues. The database is also equipped with an algorithm for 3D motif search and comparison. The algorithm compares spatial positions of backbone atoms of the user-provided query structure and of stored RNA motifs, without relying on sequence or secondary structure information. This enables the identification of local structural similarities among evolutionarily related and unrelated RNA molecules. In addition, the search utility enables searching ‘RNA bricks’ according to sequence similarity, and makes it possible to identify motifs with modified ribonucleotide residues at specific positions.
Theoretical Computer Science | 2014
Maxime Crochemore; Costas S. Iliopoulos; Marcin Kubica; Jakub Radoszewski; Wojciech Rytter; Tomasz Waleń
A breakthrough in the field of text algorithms was the discovery of the fact that the maximal number of runs in a word of length n is O(n) and that they can all be computed in O(n) time. We study some applications of this result. New simpler O(n) time algorithms are presented for classical textual problems: computing all distinct k-th word powers for a given k, in particular squares for k=2, and finding all local periods in a given word of length n. Additionally, we present an efficient algorithm for testing primitivity of factors of a word and computing their primitive roots. Applications of runs, despite their importance, are underrepresented in existing literature (approximately one page in the paper of Kolpakov and Kucherov, 1999 [25,26]). In this paper we attempt to fill in this gap. We use Lyndon words and introduce the Lyndon structure of runs as a useful tool when computing powers. In problems related to periods we use some versions of the Manhattan skyline problem.
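Testing primitivity of a factor rests on a classical fact about words: the first occurrence of w in ww after position 0 starts at the length of the primitive root of w, and w is primitive exactly when that position is |w|. The Python sketch below applies this per query with the built-in `str.find`; the helper names are hypothetical, and the point of the paper is to answer such queries much faster after preprocessing.

```python
def primitive_root(w):
    """Primitive root of a non-empty word w.

    w occurs in w + w at position |root(w)| and at no smaller positive
    position, so the first hit after index 0 gives the root length.
    """
    if not w:
        raise ValueError("w must be non-empty")
    r = (w + w).find(w, 1)
    return w[:r]


def is_primitive(w):
    """w is primitive iff it is not of the form u^k with k >= 2."""
    return primitive_root(w) == w


def factor_primitive_root(y, i, j):
    """Hypothetical query helper: primitive root of y[i..j], 0-based, inclusive."""
    return primitive_root(y[i:j + 1])


print(factor_primitive_root("abaabaab", 0, 5))
```

The factor `"abaaba"` is the square of `"aba"`, so the query prints `aba`.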
Symposium on Theoretical Aspects of Computer Science | 2008
Costas S. Iliopoulos; Maxime Crochemore; Marcin Kubica; M. Sohel Rahman; Tomasz Waleń
The Range Next Value problem (Problem RNV) is a recent, interesting variant of the range search problems, where the query is for the immediate next (or equal) value of a given number within a given interval of an array. Problem RNV was introduced and studied very recently by Crochemore et al. [Finding Patterns In Given Intervals, MFCS 2007]. In this paper, we present improved algorithms for Problem RNV. We also show how this problem can be used to achieve optimal query time for a number of interesting variants of the classic pattern matching problems.
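A common baseline for Range Next Value queries is a merge-sort tree: a segment tree whose nodes store the sorted values of their interval, so that a query combines O(log n) binary searches. The Python sketch below (class name hypothetical) uses O(n log n) space and O(log² n) query time; it is a textbook structure, not the improved algorithms of the paper.

```python
from bisect import bisect_left


class RangeNextValue:
    """Merge-sort tree. query(i, j, v) returns the smallest a[k] >= v with
    i <= k <= j (0-based, inclusive), or None if no such value exists."""

    def __init__(self, a):
        self.n = len(a)
        self.tree = [[] for _ in range(4 * max(1, self.n))]
        if self.n:
            self._build(1, 0, self.n - 1, a)

    def _build(self, node, lo, hi, a):
        if lo == hi:
            self.tree[node] = [a[lo]]
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid, a)
        self._build(2 * node + 1, mid + 1, hi, a)
        # Each node keeps the sorted values of its interval (sorted() for brevity).
        self.tree[node] = sorted(self.tree[2 * node] + self.tree[2 * node + 1])

    def query(self, i, j, v):
        if self.n == 0 or i > j:
            return None
        return self._query(1, 0, self.n - 1, i, j, v)

    def _query(self, node, lo, hi, i, j, v):
        if j < lo or hi < i:
            return None                      # node interval disjoint from [i, j]
        if i <= lo and hi <= j:
            vals = self.tree[node]           # fully covered: one binary search
            p = bisect_left(vals, v)
            return vals[p] if p < len(vals) else None
        mid = (lo + hi) // 2
        found = [r for r in (self._query(2 * node, lo, mid, i, j, v),
                             self._query(2 * node + 1, mid + 1, hi, i, j, v))
                 if r is not None]
        return min(found) if found else None


rnv = RangeNextValue([5, 1, 8, 3, 9, 2])
print(rnv.query(1, 4, 4))
```

In `[5, 1, 8, 3, 9, 2]`, positions 1 to 4 hold `1, 8, 3, 9`, and the smallest of these that is at least 4 is `8`.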
Journal of Combinatorial Optimization | 2006
Sergey Bereg; Marcin Kubica; Tomasz Waleń; Binhai Zhu
In this paper, we present a new model for RNA multiple sequence structural alignment based on the longest common subsequence. We consider both the off-line and on-line cases. For the off-line case, i.e., when the longest common subsequence is given as a linear graph with n vertices, we first present a polynomial O(n^2) time algorithm to compute its maximum nested loop. We then consider a slightly different problem, the Maximum Loop Chain problem, and present an algorithm which runs in O(n^5) time. For the on-line case, i.e., given m RNA sequences of lengths n, compute the longest common subsequence of them such that this subsequence either induces a maximum nested loop or the maximum number of matches, we present efficient algorithms using dynamic programming when m is small.
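Since the model is built on the longest common subsequence, the textbook two-sequence LCS dynamic program is a useful reference point. The Python sketch below (function name hypothetical) computes one LCS of two RNA strings in O(nm) time; it does not model nested loops, the Maximum Loop Chain problem, or the m-sequence case studied in the paper.

```python
def lcs(u, v):
    """One longest common subsequence of u and v (classic O(|u|*|v|) DP)."""
    n, m = len(u), len(v)
    # L[i][j] = LCS length of the suffixes u[i:] and v[j:]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if u[i] == v[j]:
                L[i][j] = L[i + 1][j + 1] + 1
            else:
                L[i][j] = max(L[i + 1][j], L[i][j + 1])
    # Trace back one optimal solution from the top-left corner.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if u[i] == v[j]:
            out.append(u[i])
            i += 1
            j += 1
        elif L[i + 1][j] >= L[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


print(lcs("AGCU", "ACU"))
```

Taking a match whenever the current symbols agree is safe in the traceback, because in that case `L[i][j]` is exactly `L[i + 1][j + 1] + 1`.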
Conference on Current Trends in Theory and Practice of Informatics | 2009
Maxime Crochemore; Costas S. Iliopoulos; Marcin Kubica; Wojciech Rytter; Tomasz Waleń
Suffix arrays provide a powerful data structure to solve several questions related to the structure of all the factors of a string. We show how they can be used to compute efficiently two new tables storing different types of previous factors (past segments) of a string. The concept of a longest previous factor is inherent to Ziv-Lempel factorization of strings in text compression, as well as in statistics of repetitions and symmetries. The longest previous reverse factor for a given position i is the longest factor starting at i, such that its reverse copy occurs before, while the longest previous non-overlapping factor is the longest factor v starting at i which has an exact copy occurring before. The previous copies of the factors are required to occur in the prefix ending at position i − 1. We design algorithms computing the table of longest previous reverse factors (LPrF table) and the table of longest previous non-overlapping factors (LPnF table). The latter table is useful to compute repetitions while the former is a useful tool for extracting symmetries. These tables are computed, using two previously computed read-only arrays (SUF and LCP) composing the suffix array, in linear time on any integer alphabet. The tables have not been explicitly considered before, but they have several applications and they are natural extensions of the LPF table which has been studied thoroughly before. Our results improve on the previous ones in several ways. The running time of the computation no longer depends on the size of the alphabet, which drops a log factor. Moreover, the newly introduced tables store additional information on the structure of the string, helpful to improve, for example, gapped palindrome detection and text compression using reverse factors.
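The two table definitions can be checked directly with a quadratic-or-worse Python sketch that uses substring search instead of the suffix array; the function names are hypothetical, and this is only a reference for small inputs, not the linear-time method described above. Both loops may grow the length greedily: if a factor (or its reverse) occurs in the prefix y[0..i−1], so do all of its prefixes (or their reverses).

```python
def lpnf(y):
    """LPnF[i]: length of the longest factor starting at i that has an
    exact copy inside the prefix y[:i] (positions 0..i-1)."""
    n = len(y)
    table = []
    for i in range(n):
        length = 0
        while i + length < n and y[i:i + length + 1] in y[:i]:
            length += 1
        table.append(length)
    return table


def lprf(y):
    """LPrF[i]: length of the longest factor starting at i whose reverse
    occurs inside the prefix y[:i]."""
    n = len(y)
    table = []
    for i in range(n):
        length = 0
        while i + length < n and y[i:i + length + 1][::-1] in y[:i]:
            length += 1
        table.append(length)
    return table


print(lpnf("abcba"), lprf("abcba"))
```

On `"abcba"` the tables differ at position 3: the factor `"ba"` has no exact earlier copy, but its reverse `"ab"` occurs at the start, so LPnF gives `[0, 0, 0, 1, 1]` and LPrF gives `[0, 0, 0, 2, 1]`.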
String Processing and Information Retrieval | 2012
Tomasz Kociumaka; Jakub Radoszewski; Wojciech Rytter; Tomasz Waleń
We present several efficient data structures for answering queries related to periods in words. For a given word w of length n, the Period Query, given a factor of w (represented by an interval), returns its shortest period and a compact representation of all periods. Several algorithmic solutions are proposed that balance the data structure space (ranging from O(n) to O(n log n)) and the query time complexity (ranging from O(log^(1+ε) n) to O(log n)).
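Without any preprocessing, a Period Query can be answered in time linear in the length of the factor using the KMP failure (prefix) function: p is a period of v exactly when v has a border of length |v| − p. The Python sketch below (hypothetical names) is that baseline, which the data structures above are designed to beat; it returns a plain list of periods rather than the compact representation mentioned in the abstract.

```python
def prefix_function(w):
    """pi[i] = length of the longest proper border of w[:i + 1] (KMP)."""
    pi = [0] * len(w)
    for i in range(1, len(w)):
        k = pi[i - 1]
        while k > 0 and w[i] != w[k]:
            k = pi[k - 1]
        if w[i] == w[k]:
            k += 1
        pi[i] = k
    return pi


def all_periods(w):
    """All periods of a non-empty word w, in increasing order."""
    n = len(w)
    pi = prefix_function(w)
    result = []
    b = pi[-1]
    while b > 0:          # walk the border chain from longest to shortest
        result.append(n - b)
        b = pi[b - 1]
    result.append(n)      # |w| itself is always a (trivial) period
    return result


def period_query(y, i, j):
    """Hypothetical Period Query on the factor y[i..j] (0-based, inclusive):
    returns (shortest period, list of all periods) in O(j - i + 1) time."""
    periods = all_periods(y[i:j + 1])
    return periods[0], periods


print(period_query("xabaabz", 1, 5))
```

The factor `"abaab"` has the border `"ab"` and no longer one, so its periods are 3 and 5 and the query returns `(3, [3, 5])`.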
European Journal of Combinatorics | 2013
Maxime Crochemore; Lucian Ilie; Costas S. Iliopoulos; Marcin Kubica; Wojciech Rytter; Tomasz Waleń
The Longest Previous Factor array gives, for each position i in a string y, the length of the longest factor (substring) of y that occurs both at i and to the left of i in y. The Longest Previous Factor array is central in many text compression techniques as well as in the most efficient algorithms for detecting motifs and repetitions occurring in a text. Computing the Longest Previous Factor array usually requires the Suffix Array and the Longest Common Prefix array. We give the first time-space optimal algorithm that computes the Longest Previous Factor array, given the Suffix Array and the Longest Common Prefix array. We also give the first linear-time algorithm that computes the permutation that, applied to the Longest Common Prefix array, produces the Longest Previous Factor array.
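The role of the Suffix Array and the Longest Common Prefix array can be seen in a short Python sketch. For the suffix at position p, the earlier suffix sharing the longest prefix with it is its nearest neighbour in suffix-array order, to the left or to the right, that starts before p; two monotonic-stack passes find these neighbours and the range minima of LCP in linear time. This is a plain stack-based illustration with O(n) extra space, not the time-space optimal algorithm of the paper, and the naive suffix array construction and all function names are assumptions for the example.

```python
def suffix_array_and_lcp(y):
    """Naive construction, for illustration only."""
    n = len(y)
    sa = sorted(range(n), key=lambda k: y[k:])
    lcp = [0] * n  # lcp[r] = lcp(y[sa[r-1]:], y[sa[r]:]); lcp[0] = 0
    for r in range(1, n):
        a, b = sa[r - 1], sa[r]
        length = 0
        while a + length < n and b + length < n and y[a + length] == y[b + length]:
            length += 1
        lcp[r] = length
    return sa, lcp


def lpf_from_sa_lcp(sa, lcp):
    """LPF[p] = max lcp with a nearest SA-neighbour starting before p."""
    n = len(sa)
    lpf = [0] * n
    # Left pass: each stack entry is (rank, min of lcp between the entry
    # below it and itself); starting positions increase up the stack.
    stack = []
    for r in range(n):
        cur = lcp[r]                 # lcp with rank r - 1, the current top
        while stack and sa[stack[-1][0]] > sa[r]:
            cur = min(cur, stack.pop()[1])
        if stack:                    # nearest smaller-position rank on the left
            lpf[sa[r]] = cur
        stack.append((r, cur))
    # Right pass: symmetric, scanning ranks from right to left.
    stack = []
    for r in range(n - 1, -1, -1):
        cur = lcp[r + 1] if r + 1 < n else 0
        while stack and sa[stack[-1][0]] > sa[r]:
            cur = min(cur, stack.pop()[1])
        if stack:
            lpf[sa[r]] = max(lpf[sa[r]], cur)
        stack.append((r, cur))
    return lpf


def lpf(y):
    sa, lcp = suffix_array_and_lcp(y)
    return lpf_from_sa_lcp(sa, lcp)


print(lpf("abaab"))
```

For `"abaab"` the result is `[0, 0, 1, 2, 1]`: for example, the suffix `"ab"` at position 3 shares the prefix `"ab"` with the earlier suffix at position 0.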