Carl Barton
Queen Mary University of London
Publication
Featured research published by Carl Barton.
Algorithms for Molecular Biology | 2014
Carl Barton; Costas S. Iliopoulos; Solon P. Pissis
Abstract
Background: Circular string matching is a problem which naturally arises in many biological contexts. It consists in finding all occurrences of the rotations of a pattern of length m in a text of length n. There exist optimal average-case algorithms for exact circular string matching. Approximate circular string matching is a rather undeveloped area.
Results: In this article, we present a suboptimal average-case algorithm for exact circular string matching requiring time O(n). Based on our solution for the exact case, we present two fast average-case algorithms for approximate circular string matching with k-mismatches, under the Hamming distance model, requiring time O(n) for moderate values of k, that is k = O(m / log m). We show how the same results can be easily obtained under the edit distance model. The presented algorithms are also implemented as library functions. Experimental results demonstrate that the functions provided in this library accelerate the computations by more than three orders of magnitude compared to a naïve approach.
Conclusions: We present two fast average-case algorithms for approximate circular string matching with k-mismatches, and show that they also perform very well in practice. The importance of our contribution is underlined by the fact that the provided functions may be seamlessly integrated into any biological pipeline. The source code of the library is freely available at http://www.inf.kcl.ac.uk/research/projects/asmf/.
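To make the problem statement concrete, the following is a minimal brute-force sketch of circular string matching with k mismatches under the Hamming distance. It is not the authors' algorithm or their library interface. It only illustrates the kind of naïve baseline the abstract compares against, and the function names (`rotations`, `hamming`, `naive_circular_match`) are hypothetical.

```python
def rotations(x):
    """Return the set of distinct rotations of the pattern x."""
    return {x[i:] + x[:i] for i in range(len(x))}


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(1 for c, d in zip(a, b) if c != d)


def naive_circular_match(x, t, k=0):
    """Start positions i such that t[i:i+m] is within Hamming distance k
    of some rotation of x (k=0 gives exact circular matching).

    Brute force: every window of t is compared with every rotation,
    so the running time is O(n * m^2) in the worst case.
    """
    m = len(x)
    rots = rotations(x)
    return [
        i
        for i in range(len(t) - m + 1)
        if any(hamming(r, t[i:i + m]) <= k for r in rots)
    ]


if __name__ == "__main__":
    print(naive_circular_match("abc", "xbcaxcabx"))       # exact
    print(naive_circular_match("abc", "xbcaxcabx", k=1))  # one mismatch
```

With k = 0 the sketch reports the windows that equal a rotation of the pattern; raising k admits windows that differ from some rotation in at most k positions.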
language and automata theory and applications | 2015
Carl Barton; Costas S. Iliopoulos; Solon P. Pissis
Approximate string matching is the problem of finding all factors of a text \(t\) of length \(n\) that are at a distance at most \(k\) from a pattern \(x\) of length \(m\). Approximate circular string matching is the problem of finding all factors of \(t\) that are at a distance at most \(k\) from \(x\) or from any of its rotations. In this article, we present a new algorithm for approximate circular string matching under the edit distance model with optimal average-case search time \(\mathcal {O}(n(k + \log m) /m)\). Optimal average-case search time can also be achieved by the algorithms for multiple approximate string matching (Fredriksson and Navarro, 2004) using \(x\) and its rotations as the set of multiple patterns. Here we reduce the preprocessing time and space requirements compared to that approach.
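The definition above can be illustrated with a simple baseline, sketched here under stated assumptions: run the classical dynamic-programming search for approximate matching under the edit distance (Sellers' algorithm) once per rotation of the pattern and merge the results. This is not the algorithm of the article and does not achieve its average-case bound; the function names are hypothetical, and the sketch reports end positions of matching factors rather than the factors themselves.

```python
def sellers_end_positions(x, t, k):
    """End positions j (exclusive) such that some factor of t ending at j
    is at edit distance at most k from x.

    Column-by-column dynamic programming: entry i of the column for text
    prefix t[:j] holds the minimum edit distance between x[:i] and any
    suffix of t[:j]. Entry 0 is always 0, so a match may start anywhere.
    """
    m = len(x)
    prev = list(range(m + 1))  # column for the empty text prefix
    ends = set()
    if prev[m] <= k:
        ends.add(0)
    for j, c in enumerate(t, start=1):
        cur = [0] * (m + 1)
        for i in range(1, m + 1):
            cur[i] = min(
                prev[i - 1] + (x[i - 1] != c),  # match or substitution
                prev[i] + 1,                    # extra text letter
                cur[i - 1] + 1,                 # missing pattern letter
            )
        if cur[m] <= k:
            ends.add(j)
        prev = cur
    return ends


def naive_circular_edit_match(x, t, k):
    """Sorted end positions of factors of t within edit distance k of x
    or of any rotation of x. Costs O(n * m) per distinct rotation."""
    rots = {x[i:] + x[:i] for i in range(len(x))}
    ends = set()
    for r in rots:
        ends |= sellers_end_positions(r, t, k)
    return sorted(ends)


if __name__ == "__main__":
    print(naive_circular_edit_match("abc", "xxbcaxx", 0))
```

Treating the rotations as a set of patterns mirrors the multiple-pattern view mentioned in the abstract, but here each pattern is searched independently, which is what makes the sketch slow.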
BMC Bioinformatics | 2014
Carl Barton; Alice Héliou; Laurent Mouchard; Solon P. Pissis
Background: An absent word of a word y of length n is a word that does not occur in y. It is a minimal absent word if all its proper factors occur in y. Minimal absent words have been computed in genomes of organisms from all domains of life; their computation also provides a fast alternative for measuring approximation in sequence comparison. There exists an O(n)-time and O(n)-space algorithm for computing all minimal absent words on a fixed-sized alphabet based on the construction of suffix automata (Crochemore et al., 1998). No implementation of this algorithm is publicly available. There also exists an O(n^2)-time and O(n)-space algorithm for the same problem based on the construction of suffix arrays (Pinho et al., 2009). An implementation of this algorithm was also provided by the authors and is currently the fastest available.
Results: Our contribution in this article is twofold: first, we bridge this unpleasant gap by presenting an O(n)-time and O(n)-space algorithm for computing all minimal absent words based on the construction of suffix arrays; and second, we provide the respective implementation of this algorithm. Experimental results, using real and synthetic data, show that this implementation outperforms the one by Pinho et al. The open-source code of our implementation is freely available at http://github.com/solonas13/maw.
Conclusions: Classical notions for sequence comparison are increasingly being replaced by other similarity measures that refer to the composition of sequences in terms of their constituent patterns. One such measure is the minimal absent words. In this article, we present a new linear-time and linear-space algorithm for the computation of minimal absent words based on the suffix array.
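The definition of a minimal absent word can be checked directly on small inputs. The sketch below is a brute-force illustration only: it enumerates all factors of y explicitly, so it is far from the linear-time suffix-array algorithm described in the abstract, and the function name `minimal_absent_words` and its `alphabet` parameter are hypothetical. It relies on the observation that a word w of length at least 2 is a minimal absent word exactly when w is absent while its longest proper prefix and longest proper suffix both occur in y.

```python
def minimal_absent_words(y, alphabet):
    """Return the sorted list of minimal absent words of y over alphabet.

    Brute force: stores every non-empty factor of y, which takes
    quadratic space in the number of factors, so this is only
    suitable for short words.
    """
    n = len(y)
    factors = {y[i:j] for i in range(n) for j in range(i + 1, n + 1)}

    # A single letter is a minimal absent word if it never occurs in y
    # (its only proper factor is the empty word).
    maws = {a for a in alphabet if a not in factors}

    # Longer candidates: extend an occurring factor f by one letter b.
    # w = f + b is minimal absent if w is absent and w[1:] occurs,
    # because every proper factor of w lies inside f or inside w[1:].
    for f in factors:
        for b in alphabet:
            w = f + b
            if w not in factors and w[1:] in factors:
                maws.add(w)
    return sorted(maws)


if __name__ == "__main__":
    print(minimal_absent_words("abaab", "ab"))
```

For y = abaab over the alphabet {a, b}, the sketch yields aaa, aaba, bab and bb.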
combinatorial pattern matching | 2016
Carl Barton; Tomasz Kociumaka; Solon P. Pissis; Jakub Radoszewski
The problem of finding factors of a text string which are identical or similar to a given pattern string is a central problem in computer science. A generalised version of this problem consists in implementing an index over the text to support efficient on-line pattern queries. We study this problem in the case where the text is weighted: for every position of the text and every letter of the alphabet a probability of occurrence of this letter at this position is given. Sequences of this type, also called position weight matrices, are commonly used to represent imprecise or uncertain data. A weighted sequence may represent many different strings, each with probability of occurrence equal to the product of probabilities of its letters at subsequent positions. Given a probability threshold \(1/z\), we say that a pattern string \(P\) matches a weighted text at position \(i\) if the product of probabilities of the letters of
Springer (Reference) | 2015
Carl Barton; Alice Héliou; Laurent Mouchard; Solon P. Pissis
symposium on experimental and efficient algorithms | 2015
Carl Barton; Costas S. Iliopoulos; Ritu Kundu; Solon P. Pissis; Ahmad Retha; Fatima Vayani
Algorithms for Molecular Biology | 2014
Carl Barton; Costas S. Iliopoulos; Solon P. Pissis
conference on combinatorial optimization and applications | 2016
Carl Barton; Chang Liu; Solon P. Pissis
Springer International Publishing | 2015
Carl Barton; Solon P. Pissis
ACM | 2013
Carl Barton; Tomáš Flouri; Costas S. Iliopoulos; Solon P. Pissis