Tomáš Flouri

Heidelberg Institute for Theoretical Studies

Publication

Featured research published by Tomáš Flouri.

Bioinformatics and Biomedicine | 2010

An algorithm for mapping short reads to a dynamically changing genomic sequence

Tomáš Flouri; Jan Holub; Costas S. Iliopoulos; Solon P. Pissis

The constant advances in sequencing technology have redefined the way genome sequencing is performed. Modern sequencers can produce tens of millions of short sequences (reads) in a single experiment, at a much lower cost than was previously possible. Because of this massive amount of data, efficient algorithms for mapping these reads to reference sequences are in great demand, and many such algorithms have been published recently. In this paper, we study a different version of this problem: mapping these reads to a dynamically changing genomic sequence. We propose a new practical algorithm that employs a suitable data structure taking into account potential dynamic effects (replacements, insertions, deletions) on the genomic sequence. The experimental results demonstrate that the proposed approach can be applied to the problem of mapping millions of reads to multiple genomic sequences.
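
The abstract does not describe the data structure itself. As a rough illustration of the general idea of an index that stays usable while the reference changes, here is a hypothetical k-mer hash index in Python. The class name, the seed-and-verify strategy, and the restriction to single-base substitutions are all our assumptions, not the paper's method. A substitution only invalidates the k-mers that overlap the edited position, so an update touches at most k index entries; insertions and deletions would shift coordinates and need a different design.

```python
from collections import defaultdict


class DynamicKmerIndex:
    """Hypothetical k-mer hash index over a mutable reference.

    Assumption: only single-base substitutions are supported; insertions
    and deletions would shift coordinates and need a different design.
    """

    def __init__(self, reference: str, k: int) -> None:
        self.seq = list(reference)
        self.k = k
        # Maps each k-mer to the set of reference positions where it starts.
        self.index = defaultdict(set)
        for start in range(len(self.seq) - k + 1):
            self.index[self._kmer(start)].add(start)

    def _kmer(self, start: int) -> str:
        return "".join(self.seq[start:start + self.k])

    def replace(self, pos: int, base: str) -> None:
        """Substitute one base and refresh only the k-mers overlapping it."""
        first = max(0, pos - self.k + 1)
        last = min(pos, len(self.seq) - self.k)
        for start in range(first, last + 1):
            self.index[self._kmer(start)].discard(start)
        self.seq[pos] = base
        for start in range(first, last + 1):
            self.index[self._kmer(start)].add(start)

    def map_read(self, read: str, max_mismatches: int = 0) -> list:
        """Seed with the read's first k-mer, then verify by Hamming distance."""
        hits = []
        for start in sorted(self.index.get(read[:self.k], ())):
            window = self.seq[start:start + len(read)]
            if len(window) < len(read):
                continue  # the read would run off the end of the reference
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.append(start)
        return hits


idx = DynamicKmerIndex("ACGTACGTAA", k=3)
print(idx.map_read("ACGT"))   # [0, 4]
idx.replace(5, "T")           # reference becomes ACGTATGTAA
print(idx.map_read("ACGT"))   # [0]
print(idx.map_read("ATGT"))   # [4]
```

Because each read is seeded with its first k-mer only, a mismatch inside that seed causes the read to be missed; practical mappers typically use several seeds per read.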

Systematic Biology | 2015

The phylogenetic likelihood library.

Tomáš Flouri; F. Izquierdo-Carrasco; Diego Darriba; Andre J. Aberer; L.-T. Nguyen; B.Q. Minh; A. von Haeseler; Alexandros Stamatakis

We introduce the Phylogenetic Likelihood Library (PLL), a highly optimized application programming interface for developing likelihood-based phylogenetic inference and postanalysis software. The PLL implements appropriate data structures and functions that allow users to quickly implement common, error-prone, and labor-intensive tasks, such as likelihood calculations, model parameter and branch length optimization, and tree space exploration. The highly optimized and parallelized implementation of the phylogenetic likelihood function, together with thorough documentation, provides a framework for rapid development of scalable parallel phylogenetic software. Using two likelihood-based phylogenetic codes as examples, we show that the PLL improves the sequential performance of current software by a factor of 2–10 while requiring only 1 month of programming time for integration. We show that, when numerical scaling for preventing floating-point underflow is enabled, the double-precision likelihood calculations in the PLL are up to 1.9 times faster than those in BEAGLE. On an empirical DNA dataset with 2000 taxa, the AVX version of the PLL is 4 times faster than BEAGLE (scaling enabled and required). The PLL is available at http://www.libpll.org under the GNU General Public License (GPL).
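
For readers unfamiliar with what such a library computes, the following is a minimal Python sketch of Felsenstein's pruning algorithm for a single alignment site under the Jukes-Cantor (JC69) model, with per-node rescaling of the kind the abstract refers to for preventing floating-point underflow. It does not use the PLL's actual C API; the tree encoding, the 2^-256 rescaling threshold, and all function names are our assumptions.

```python
import math

# Hypothetical tree encoding: a node is (content, branch_length), where content
# is a nucleotide string for a leaf or a list of child nodes for an internal node.
STATES = "ACGT"
SCALE_THRESHOLD = 2.0 ** -256   # assumed threshold for rescaling partials
SCALE_FACTOR = 2.0 ** 256


def jc69(i: int, j: int, t: float) -> float:
    """Jukes-Cantor probability of state i changing to state j along length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partials(node, log_scalers: list) -> list:
    """Conditional likelihoods of the subtree below node, one per state."""
    content, _ = node
    if isinstance(content, str):
        return [1.0 if s == content else 0.0 for s in STATES]
    vec = [1.0] * 4
    for child in content:
        child_vec = partials(child, log_scalers)
        t = child[1]
        for i in range(4):
            vec[i] *= sum(jc69(i, j, t) * child_vec[j] for j in range(4))
    # Rescale small partials and remember the log factor to undo it later.
    if 0.0 < max(vec) < SCALE_THRESHOLD:
        vec = [v * SCALE_FACTOR for v in vec]
        log_scalers.append(math.log(SCALE_FACTOR))
    return vec


def site_log_likelihood(root) -> float:
    """Log-likelihood of one site, with uniform JC69 base frequencies."""
    log_scalers = []
    vec = partials(root, log_scalers)
    return math.log(sum(0.25 * v for v in vec)) - sum(log_scalers)


tree = ([("A", 0.1), ("C", 0.2)], 0.0)
print(site_log_likelihood(tree))
```

Without the rescaling step, a deep tree with hundreds of taxa can drive the partials below the smallest representable double and make the log-likelihood undefined; production libraries also vectorize these loops over sites and states, whereas this sketch favors readability.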

Information Processing Letters | 2015

Longest common substrings with k mismatches

Tomáš Flouri; Emanuele Giaquinta; Kassian Kobert; Esko Ukkonen

The longest common substring with k-mismatches problem is to find, given two strings S1 and S2, a longest substring A1 of S1 and A2 of S2 such that the Hamming distance between A1 and A2 is ≤ k. We introduce a practical O(nm) time and O(1) space solution for this problem, where n and m are the lengths of S1 and S2, respectively. This algorithm can also be used to compute the matching statistics with k-mismatches of S1 and S2 in O(nm) time and O(m) space. Moreover, we also present a theoretical solution for the k = 1 case which runs in O(n log m) time, assuming m ≤ n, and uses O(m) space, improving over the existing O(nm) time and O(m) space bound of Babenko and Starikovskaya [1]. Highlights: Two new algorithms for the longest common substring with k mismatches problem. A practical solution for arbitrary k which uses constant space. A theoretical solution for one mismatch which runs in quasilinear time.
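
A minimal Python sketch of a diagonal sliding-window approach that achieves O(nm) time with O(1) extra space: each diagonal d of the implicit n × m comparison matrix pairs position i of S1 with position i - d of S2, and a two-pointer window along that diagonal keeps at most k mismatches, so each position enters and leaves the window at most once. This is our own reconstruction for illustration and may differ in detail from the authors' algorithm; the function name and return convention are hypothetical.

```python
def lcs_k_mismatches(s1: str, s2: str, k: int) -> tuple:
    """Return (length, start_in_s1, start_in_s2) for a longest pair of
    equal-length substrings whose Hamming distance is at most k."""
    n, m = len(s1), len(s2)
    best = (0, 0, 0)
    # Diagonal d pairs s1[i] with s2[i - d]; d runs from -(m - 1) to n - 1.
    for d in range(-(m - 1), n):
        i0 = max(d, 0)            # first s1 index on this diagonal
        j0 = i0 - d               # matching s2 index
        length = min(n - i0, m - j0)
        left = 0                  # window start, as an offset along the diagonal
        mismatches = 0            # mismatches inside the current window
        for right in range(length):
            if s1[i0 + right] != s2[j0 + right]:
                mismatches += 1
            # Shrink from the left until at most k mismatches remain.
            while mismatches > k:
                if s1[i0 + left] != s2[j0 + left]:
                    mismatches -= 1
                left += 1
            if right - left + 1 > best[0]:
                best = (right - left + 1, i0 + left, j0 + left)
    return best


print(lcs_k_mismatches("abcde", "abxde", 1))  # (5, 0, 0)
print(lcs_k_mismatches("abcde", "abxde", 0))  # (2, 0, 0)
```

Only a handful of integer variables are kept besides the input strings, which is where the constant extra space comes from; there are n + m - 1 diagonals of total length nm, giving the O(nm) running time.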

BMC Bioinformatics | 2013

libgapmis: extending short-read alignments

Nikolaos Alachiotis; Simon A. Berger; Tomáš Flouri; Solon P. Pissis; Alexandros Stamatakis

Background: A wide variety of short-read alignment programmes have been published recently to tackle the problem of mapping millions of short reads to a reference genome, focusing on different aspects of the procedure such as time and memory efficiency, sensitivity, and accuracy. These tools allow for a small number of mismatches in the alignment; however, their ability to allow for gaps varies greatly, with many performing poorly or not allowing them at all. The seed-and-extend strategy is applied in most short-read alignment programmes. After a substring of the reference sequence has been aligned against the high-quality prefix of a short read (the seed), an important problem is to find the best possible alignment between the succeeding substring of the reference sequence and the remaining low-quality suffix of the read (the extension). The fact that the reads are rather short, and that the gap occurrence frequency observed in various studies is rather low, suggests that aligning (parts of) those reads with a single gap is in fact desirable.

Results: In this article, we present libgapmis, a library for extending pairwise short-read alignments. Apart from the standard CPU version, it includes ultrafast SSE- and GPU-based implementations. libgapmis is based on an algorithm computing a modified version of the traditional dynamic-programming matrix for sequence alignment. Extensive experimental results demonstrate that the functions of the CPU version provided in this library accelerate the computations by a factor of 20 compared to other programmes. The analogous SSE- and GPU-based implementations accelerate the computations by a factor of 6 and 11, respectively, compared to the CPU version. The library also provides the user the flexibility to split the read into fragments, based on the observed gap occurrence frequency and the length of the read, thereby allowing for a variable, but bounded, number of gaps in the alignment.

Conclusions: We present libgapmis, a library for extending pairwise short-read alignments. We show that libgapmis is better suited and more efficient than existing algorithms for this task. The importance of our contribution is underlined by the fact that the provided functions may be seamlessly integrated into any short-read alignment pipeline. The open-source code of libgapmis is available at http://www.exelixis-lab.org/gapmis.

Science | 2015

Response to Comment on “Phylogenomics resolves the timing and pattern of insect evolution”

Karl M. Kjer; Jessica L. Ware; Jes Rust; Torsten Wappler; Robert Lanfear; Lars S. Jermiin; Xin Zhou; Horst Aspöck; Ulrike Aspöck; Rolf G. Beutel; Alexander Blanke; A. Donath; Tomáš Flouri; Paul B. Frandsen; P. Kapli; Akito Y. Kawahara; Harald Letsch; C. Mayer; Duane D. McKenna; Karen Meusemann; Oliver Niehuis; Ralph S. Peters; Brian M. Wiegmann; David K. Yeates; B.M. von Reumont; Alexandros Stamatakis; Bernhard Misof

Tong et al. comment on the accuracy of the dating analysis presented in our work on the phylogeny of insects and provide a reanalysis of our data. They replace log-normal priors with uniform priors and add a “roachoid” fossil as a calibration point. Although the reanalysis provides an interesting alternative viewpoint, we maintain that our choices were appropriate.

Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine | 2011

Approximate string-matching with a single gap for sequence alignment

Tomáš Flouri; Kunsoo Park; Kimon Frousios; Solon P. Pissis; Costas S. Iliopoulos; German Tischler

This paper deals with the approximate string-matching problem with Hamming distance and a single gap for sequence alignment. We consider an extension of the approximate string-matching problem with Hamming distance, by also allowing the existence of a single gap, either in the text or in the pattern. This problem is strongly and directly motivated by the next-generation re-sequencing procedure. We present a general algorithm that requires O(nm) time, where n is the length of the text and m is the length of the pattern, but this can be reduced to O(mβ) time if the maximum length β of the gap is given.
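
The abstract states the bounds but not the method. One straightforward way to reach O(mβ) time, which may well differ from the paper's algorithm, is to compute prefix mismatch counts on the main diagonal once and then, for each gap length g ≤ β, sweep suffix mismatch counts on the diagonal shifted by g. The problem framing here (the whole pattern is aligned against the start of the text, the cost is the number of mismatches, and there is no gap penalty) and the function name are our assumptions.

```python
def best_single_gap(p: str, t: str, beta: int):
    """Align all of p against the start of t with at most one gap of length <= beta.

    Returns (mismatches, gap_length, gap_position, gap_side); gap_side is
    "pattern" when text characters face the gap, "text" when pattern
    characters do, and None when the best alignment has no gap.
    Assumes len(t) >= len(p).
    """
    m = len(p)
    # prefix[i] = mismatches between p[:i] and t[:i] on the main diagonal.
    prefix = [0] * (m + 1)
    for i in range(m):
        prefix[i + 1] = prefix[i] + (p[i] != t[i])
    best = (prefix[m], 0, 0, None)  # the gap-free alignment
    for g in range(1, beta + 1):
        # Gap in the pattern: t[i:i + g] is skipped, so p[i:] pairs with t[i + g:].
        if len(t) >= m + g:
            suffix = 0
            for i in range(m, -1, -1):
                if i < m:
                    suffix += p[i] != t[i + g]
                if prefix[i] + suffix < best[0]:
                    best = (prefix[i] + suffix, g, i, "pattern")
        # Gap in the text: p[i:i + g] is skipped, so p[i + g:] pairs with t[i:].
        if g <= m:
            suffix = 0
            for i in range(m - g, -1, -1):
                if i < m - g:
                    suffix += p[i + g] != t[i]
                if prefix[i] + suffix < best[0]:
                    best = (prefix[i] + suffix, g, i, "text")
    return best


print(best_single_gap("ACGTACGT", "ACGTTTACGTAA", beta=3))  # (0, 2, 4, 'pattern')
```

Each gap length costs one O(m) sweep per side, so the total is O(mβ) on top of the O(m) prefix pass. Because there is no gap penalty, ties go to the gap-free alignment; a practical aligner would normally also penalize the gap.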

IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2013

Boosting the Performance of Bayesian Divergence Time Estimation with the Phylogenetic Likelihood Library

Diego Darriba; Andre J. Aberer; Tomáš Flouri; Tracy A. Heath; Fernando Izquierdo-Carrasco; Alexandros Stamatakis

We present a substantially improved and parallelized version of DPPDiv, a software tool for estimating species divergence times and lineage-specific substitution rates on a fixed tree topology. The improvement is achieved by integrating the DPPDiv code with the Phylogenetic Likelihood Library (PLL), a fast, optimized, and parallelized collection of functions for conducting likelihood computations on phylogenetic trees. We show that integrating the PLL into a likelihood-based application is straightforward, since it took the first author (DD) a programming effort of only one month, without prior knowledge of either DPPDiv or the PLL. We achieve sequential speedups of a factor of two to three and near-optimal parallel speedups up to 48 threads on sufficiently large datasets. Hence, with a programming effort of one month, we were able to improve DPPDiv's time-to-solution on parallel systems by two orders of magnitude and also to substantially improve its ability to infer divergence times on large-scale datasets.

Theoretical Computer Science | 2013

Enhanced string covering

Tomáš Flouri; Costas S. Iliopoulos; Tomasz Kociumaka; Solon P. Pissis; Simon J. Puglisi; William F. Smyth; Wojciech Tyczyński

A factor u of a string y is a cover of y if every letter of y lies within some occurrence of u in y; thus every cover u is also a border (both a prefix and a suffix) of y. If u is a cover of a superstring of y, then u is a seed of y. Covers and seeds are two formalisations of quasiperiodicity, and there exist linear-time algorithms for computing all the covers and seeds of y. A string y covered by u thus generalises the idea of a repetition, that is, a string composed of exact concatenations of u. Even though a string is coverable somewhat more frequently than it is a repetition, a string that can be covered by a single u is still rare. As a result, seeking a more generally applicable and descriptive notion of cover, many articles have been written on the computation of a minimum k-cover of y, that is, the minimum-cardinality set of strings of length k that collectively cover y. Unfortunately, this computation turns out to be NP-hard. Therefore, in this article, we propose new, simple, easily computed, and widely applicable notions of string covering that provide an intuitive and useful characterisation of a string: the enhanced cover, the enhanced left cover, and the enhanced left seed.
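
To make the definitions of border and cover concrete, here is a Python sketch, our own and not taken from the article, that enumerates the borders of y with the KMP prefix function and counts how many positions of y each border covers; a border covers all n positions exactly when it is a cover of y. As we read the abstract, an enhanced cover is a border that covers as many positions of y as possible, but the precise definition (including tie-breaking) should be taken from the article itself. The function names are hypothetical, and the sketch does not run in linear time.

```python
def prefix_function(y: str) -> list:
    """pi[i] = length of the longest proper border of y[:i + 1] (KMP)."""
    pi = [0] * len(y)
    for i in range(1, len(y)):
        k = pi[i - 1]
        while k > 0 and y[i] != y[k]:
            k = pi[k - 1]
        if y[i] == y[k]:
            k += 1
        pi[i] = k
    return pi


def borders(y: str) -> list:
    """Lengths of all nonempty proper borders of y, longest first."""
    if not y:
        return []
    pi = prefix_function(y)
    result, b = [], pi[-1]
    while b > 0:
        result.append(b)
        b = pi[b - 1]
    return result


def covered_positions(y: str, u: str) -> int:
    """Number of positions of y lying inside some occurrence of u."""
    covered, reach = 0, 0          # reach = first position not yet counted
    start = y.find(u)
    while start != -1:
        end = start + len(u)
        covered += end - max(start, reach)
        reach = end
        start = y.find(u, start + 1)   # overlapping occurrences are allowed
    return covered


def coverage_by_border(y: str) -> dict:
    """Map each proper border length to the number of positions it covers."""
    return {b: covered_positions(y, y[:b]) for b in borders(y)}


print(coverage_by_border("abaaba"))  # {3: 6, 1: 4}
```

For y = "abaaba" the proper borders are "aba" and "a"; "aba" covers all six positions and is therefore a cover, while "a" covers only four.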

Recent Patents on DNA & Gene Sequences | 2013

GapMis: a Tool for Pairwise Sequence Alignment with a Single Gap

Tomáš Flouri; Kimon Frousios; Costas S. Iliopoulos; Kunsoo Park; Solon P. Pissis; German Tischler

MOTIVATION: Pairwise sequence alignment has received a new motivation due to the advent of recent patents in next-generation sequencing technologies, particularly so for the application of re-sequencing (the assembly of a genome directed by a reference sequence). After the fast alignment between a factor of the reference sequence and a high-quality fragment of a short read by a short-read alignment programme, an important problem is to find the alignment between a relatively short succeeding factor of the reference sequence and the remaining low-quality part of the read allowing a number of mismatches and the insertion of a single gap in the alignment. RESULTS: We present GapMis, a tool for pairwise sequence alignment with a single gap. It is based on a simple algorithm, which computes a different version of the traditional dynamic programming matrix. The presented experimental results demonstrate that GapMis is more suitable and efficient than most popular tools for this task.

Artificial Intelligence Applications and Innovations | 2012

GapMis-OMP: Pairwise Short-Read Alignment on Multi-core Architectures

Tomáš Flouri; Costas S. Iliopoulos; Kunsoo Park; Solon P. Pissis

Pairwise sequence alignment has received a new motivation due to the advent of next-generation sequencing technologies, particularly so for the application of re-sequencing—the assembly of a genome directed by a reference sequence. After the fast alignment between a factor of the reference sequence and a high-quality fragment of a short read by a short-read alignment programme, an important problem is to find the alignment between a relatively short succeeding factor of the reference sequence and the remaining low-quality fragment of the read allowing a number of mismatches and the insertion of a single gap in the alignment. In this article, we present GapMis-OMP, a tool for pairwise short-read alignment that works on multi-core architectures. It is designed to compute the alignments between all the sequences in a first set of sequences and all those from a second one in parallel. The presented experimental results demonstrate that GapMis-OMP is more efficient than most popular tools.