Costas S. Iliopoulos
King's College London
Publication
Featured research published by Costas S. Iliopoulos.
string processing and information retrieval | 2010
Maxime Crochemore; Costas S. Iliopoulos; Marcin Kubica; Jakub Radoszewski; Wojciech Rytter; Tomasz Waleń
A breakthrough in the field of text algorithms was the discovery that the maximal number of runs in a string of length n is O(n) and that they can all be computed in O(n) time. We study some applications of this result. New, simpler O(n)-time algorithms are presented for a few classical string problems: computing all distinct k-th powers in a string for a given k (in particular, squares for k = 2), and finding all local periods in a given string of length n. Additionally, we present an efficient algorithm for testing the primitivity of factors of a string and computing their primitive roots. Despite their importance, applications of runs are underrepresented in the existing literature (approximately one page in the paper of Kolpakov & Kucherov, 1999). In this paper we attempt to fill this gap. We use Lyndon words and introduce the Lyndon structure of runs as a useful tool for computing powers. In problems related to periods we use some versions of the Manhattan skyline problem.
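To make "primitivity" and "primitive root" concrete, here is a minimal sketch that handles one factor at a time. It is not the paper's runs-based method, which answers many factor queries efficiently. It relies on a classical fact: a non-empty word w is primitive exactly when its first occurrence in ww at a position ≥ 1 is at position |w|. When w is not primitive, that first occurrence lies at the length of its primitive root. The function names are hypothetical.

```python
def primitive_root(w: str) -> str:
    """Primitive root of a non-empty word w (w itself if w is primitive)."""
    if not w:
        raise ValueError("w must be non-empty")
    # First occurrence of w in ww after position 0 = length of the primitive root.
    i = (w + w).find(w, 1)
    return w[:i]


def is_primitive(w: str) -> bool:
    """w is primitive iff it is not u^k for some word u and k >= 2."""
    return (w + w).find(w, 1) == len(w)


def factor_primitive_root(s: str, i: int, j: int) -> str:
    """Hypothetical naive query: primitive root of the factor s[i:j]."""
    return primitive_root(s[i:j])


print(primitive_root("abab"), is_primitive("aab"), factor_primitive_root("xababy", 1, 5))
```

Each query above costs time linear in the factor length. The point of the paper's approach is to avoid paying that cost for every query.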
language and automata theory and applications | 2010
Maxime Crochemore; Costas S. Iliopoulos; Marcin Kubica; Jakub Radoszewski; Wojciech Rytter; Tomasz Waleń
A run is an inclusion-maximal occurrence in a string (as a subinterval) of a repetition v with a period p such that 2p≤|v|. The maximal number of runs in a string of length n has been thoroughly studied and is known to be between 0.944n and 1.029n. In this paper we investigate cubic runs, in which the shortest period p satisfies 3p≤|v|. We show an upper bound of 0.5n on the maximal number of such runs in a string of length n, and construct an infinite sequence of words over a binary alphabet for which the lower bound is 0.406n.
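The definitions of runs and cubic runs can be checked on small strings with a brute-force enumerator. The sketch below is quadratic and is meant only to illustrate the definitions, not to reflect the paper's counting arguments or any linear-time algorithm. For each candidate period p, it finds maximal intervals where s[i] = s[i+p]. It keeps an interval only if the interval has length at least 2p and its *smallest* period is exactly p, as the definition requires. Function names are hypothetical.

```python
def smallest_period(s: str) -> int:
    """Smallest period of s: len(s) minus its longest proper border (KMP failure function)."""
    n = len(s)
    if n == 0:
        return 0
    fail = [0] * n
    k = 0
    for i in range(1, n):
        while k > 0 and s[i] != s[k]:
            k = fail[k - 1]
        if s[i] == s[k]:
            k += 1
        fail[i] = k
    return n - fail[-1]


def runs(s: str) -> list[tuple[int, int, int]]:
    """All runs as (start, end, p): s[start:end] is maximal, has smallest period p, length >= 2p."""
    n = len(s)
    found = set()
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if s[i] != s[i + p]:
                i += 1
                continue
            # Extend to the right while period p still holds.
            j = i
            while j + p < n and s[j] == s[j + p]:
                j += 1
            start, end = i, j + p
            if end - start >= 2 * p and smallest_period(s[start:end]) == p:
                found.add((start, end, p))
            i = j  # s[j] != s[j + p] (or we hit the end), so the next interval starts later
    return sorted(found)


def cubic_runs(s: str) -> list[tuple[int, int, int]]:
    """Runs whose length is at least three times their shortest period."""
    return [(a, b, p) for (a, b, p) in runs(s) if b - a >= 3 * p]


print(runs("aabaabaa"))
print(cubic_runs("aaabbb"))
```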
Algorithmica | 1988
Alberto Apostolico; Costas S. Iliopoulos; Gad M. Landau; Baruch Schieber; Uzi Vishkin
Many string manipulations can be performed efficiently on suffix trees. In this paper a CRCW parallel RAM algorithm is presented that constructs the suffix tree associated with a string of $n$ symbols in $O(\log n)$ time with $n$ processors. The algorithm requires $\Theta(n^2)$ space. However, the space needed can be reduced to $O(n^{1+\varepsilon})$ for any $0 < \varepsilon \le 1$, with a corresponding slow-down proportional to $1/\varepsilon$. Efficient parallel procedures are also given for some string problems that can be solved with suffix trees.
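The abstract mentions string problems that suffix trees solve efficiently, and pattern lookup is one such problem. A sequential, uncompressed suffix *trie* built in quadratic time is enough to illustrate it. This is a teaching sketch only. It is neither a compacted suffix tree nor the parallel CRCW construction from the paper. The `None` key used to mark where a suffix ends is an implementation choice made here.

```python
def build_suffix_trie(s: str) -> dict:
    """Naive O(n^2) suffix trie: nested dicts keyed by character.
    The key None stores the start positions of suffixes that end at that node."""
    root: dict = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.setdefault(ch, {})
        node.setdefault(None, []).append(i)
    return root


def occurrences(trie: dict, pattern: str) -> list[int]:
    """Start positions of pattern: walk down, then collect every suffix end below."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return []
        node = node[ch]
    out: list[int] = []
    stack = [node]
    while stack:
        nd = stack.pop()
        for key, child in nd.items():
            if key is None:
                out.extend(child)
            else:
                stack.append(child)
    return sorted(out)


trie = build_suffix_trie("banana")
print(occurrences(trie, "ana"))
```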
workshop on algorithms in bioinformatics | 2015
Roberto Grossi; Costas S. Iliopoulos; Robert Mercaş; Nadia Pisanti; Solon P. Pissis; Ahmad Retha; Fatima Vayani
Sequence comparison is a fundamental step in many important tasks in bioinformatics. Traditional algorithms for measuring approximation in sequence comparison are based on the notions of distance or similarity, and are generally computed through sequence alignment techniques. Circular genome structure is a common phenomenon in nature, but specialized alignment techniques for circular sequence comparison are computationally expensive, requiring super-quadratic to cubic time in the length of the sequences. In this paper, we introduce a new distance measure based on q-grams, and show how it can be computed efficiently for circular sequence comparison. Experimental results, using real and synthetic data, demonstrate orders-of-magnitude superiority of our approach in terms of efficiency, while maintaining accuracy very competitive with the state of the art.
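As background on q-gram distances, the sketch below computes the classical q-gram distance: the L1 distance between the q-gram count vectors of the two sequences. It takes the q-grams *around the circle* by wrapping the first q−1 characters, so rotating either sequence does not change the result. This is an assumed illustration and **not** the specific distance measure introduced in the paper. The function names are hypothetical.

```python
from collections import Counter


def circular_qgram_profile(s: str, q: int) -> Counter:
    """Counts of the len(s) q-grams of s read as a circular string."""
    n = len(s)
    if not 1 <= q <= n:
        raise ValueError("need 1 <= q <= len(s)")
    ext = s + s[: q - 1]  # wrap around so every start position yields a full q-gram
    return Counter(ext[i:i + q] for i in range(n))


def circular_qgram_distance(x: str, y: str, q: int) -> int:
    """L1 distance between circular q-gram profiles (rotation-invariant by construction)."""
    px, py = circular_qgram_profile(x, q), circular_qgram_profile(y, q)
    return sum(abs(px[g] - py[g]) for g in set(px) | set(py))


print(circular_qgram_distance("abcd", "cdab", 2))
```

Computing a profile takes time linear in the sequence length (for fixed q). This avoids the super-quadratic cost of circular alignment, although a pure profile distance ignores the order of q-grams.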
mathematical foundations of computer science | 2007
Maxime Crochemore; Costas S. Iliopoulos; M. Sohel Rahman
In this paper, we study the pattern matching problem in given intervals. Depending on whether the intervals are given a priori for preprocessing, given during the query along with the pattern, or both, we develop solutions for different variants of this problem. In particular, we present efficient indexing schemes for each of these variants.
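The variant in which the interval arrives with the query can be sketched with a plain suffix array. Binary search finds the block of suffixes that start with the pattern, and the occurrences are then filtered against the query interval. This assumes an occurrence must lie entirely inside T[i..j] (0-based, inclusive). Filtering costs time proportional to *all* occurrences of the pattern, which is exactly the overhead the paper's indexing schemes are designed to avoid. Names are hypothetical.

```python
def build_suffix_array(s: str) -> list[int]:
    """Naive construction by sorting suffixes; adequate for a sketch."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def pattern_range(s: str, sa: list[int], p: str) -> tuple[int, int]:
    """Half-open range [left, right) of sa whose suffixes start with p."""
    m = len(p)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= p
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    left, hi = lo, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > p
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid
    return left, lo


def occurrences_in_interval(s: str, sa: list[int], p: str, i: int, j: int) -> list[int]:
    """Occurrences of p lying entirely inside s[i..j] (inclusive bounds)."""
    left, right = pattern_range(s, sa, p)
    m = len(p)
    return sorted(k for k in sa[left:right] if i <= k and k + m - 1 <= j)


s = "abracadabra"
sa = build_suffix_array(s)
print(occurrences_in_interval(s, sa, "abra", 1, 10))
```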
International Journal of Computer Mathematics | 2002
Emilios Cambouropoulos; Maxime Crochemore; Costas S. Iliopoulos; Laurent Mouchard; Yoan J. Pinzón
Here we introduce two new notions of approximate matching with applications in computer-assisted music analysis. We present algorithms for each notion of approximation, both for approximate string matching and for computing approximate squares.
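The abstract does not define the two notions. The sketch below *assumes* the definitions commonly used in this line of music-retrieval work, where pitches are encoded as integers. A window is a δ-match if every pitch differs from the pattern by at most δ. It is a γ-match if the differences sum to at most γ. The search is a naive scan of all windows, and it reports positions that satisfy both conditions. This is illustration only, not the paper's algorithms.

```python
def delta_gamma_occurrences(text: list[int], pattern: list[int],
                            delta: int, gamma: int) -> list[int]:
    """Hypothetical naive (delta, gamma)-matching on integer pitch sequences.
    Reports start i if every |text[i+k] - pattern[k]| <= delta and their sum <= gamma."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    hits = []
    for i in range(len(text) - m + 1):
        diffs = [abs(t - q) for t, q in zip(text[i:i + m], pattern)]
        if max(diffs) <= delta and sum(diffs) <= gamma:
            hits.append(i)
    return hits


melody = [60, 62, 64, 65, 67]  # e.g. MIDI note numbers
print(delta_gamma_occurrences(melody, [61, 63, 65], delta=1, gamma=3))
```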
Information Processing Letters | 2001
Maxime Crochemore; Costas S. Iliopoulos; Yoan J. Pinzón; James F. Reid
This paper presents a new practical bit-vector algorithm for solving the well-known Longest Common Subsequence (LCS) problem. Given two strings of lengths m and n, with n ⩾ m, we present an algorithm which determines the length p of an LCS in O(nm/w) time and O(m/w) space, where w is the number of bits in a machine word. This algorithm can be thought of as a column-wise “parallelization” of the classical dynamic programming approach. It is very efficient in practice: assuming m ⩽ w, the length of an LCS of two strings can be computed in linear time and constant (additional/working) space.
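Below is a minimal sketch of a bit-vector LCS-length computation in this style. It uses the update V ← (V + (V & M[c])) | (V & ~M[c]), with V initialised to all ones over m bits. The LCS length is the number of zero bits left in V. Python's arbitrary-precision integers stand in for an array of ⌈m/w⌉ machine words, so the O(nm/w) word-level cost is not reproduced literally. The names `lcs_length` and `match` are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Bit-parallel LCS length: one bit per position of a, one update per character of b."""
    m = len(a)
    if m == 0 or not b:
        return 0
    mask = (1 << m) - 1
    # match[c] has bit i set iff a[i] == c
    match: dict[str, int] = {}
    for i, ch in enumerate(a):
        match[ch] = match.get(ch, 0) | (1 << i)
    v = mask  # all ones: nothing matched yet
    for ch in b:
        mc = match.get(ch, 0)
        u = v & mc
        # The addition propagates carries through runs of ones (the column-wise DP step).
        v = ((v + u) | (v & ~mc)) & mask
    return m - bin(v).count("1")  # zero bits = LCS length


print(lcs_length("ABCBDAB", "BDCABA"))
```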
Information Processing Letters | 1991
Alberto Apostolico; Martin Farach; Costas S. Iliopoulos
A string w covers another string z if every position of z is within some occurrence of w in z. Clearly, every string is covered by itself. A string that is covered only by itself is superprimitive. We show that the property of being superprimitive is testable on a string of n symbols in O(n) time and space.
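The definitions of covering and superprimitivity can be tested directly with a naive check. This is not the paper's O(n) algorithm. Any proper cover of z must be both a prefix and a suffix of z, so z is superprimitive exactly when none of its proper borders covers it. The function names are hypothetical.

```python
def covers(w: str, z: str) -> bool:
    """True if every position of z lies inside some occurrence of w in z."""
    if not w or not z:
        return False
    starts = []
    pos = z.find(w)
    while pos != -1:
        starts.append(pos)
        pos = z.find(w, pos + 1)
    if not starts or starts[0] != 0 or starts[-1] != len(z) - len(w):
        return False
    # Consecutive occurrences must overlap or touch.
    return all(b - a <= len(w) for a, b in zip(starts, starts[1:]))


def is_superprimitive(z: str) -> bool:
    """z is superprimitive iff no proper border of z covers z (naive, roughly quadratic)."""
    for b in range(1, len(z)):
        if z[:b] == z[-b:] and covers(z[:b], z):
            return False
    return True


print(covers("aba", "abaaba"), is_superprimitive("abaaba"), is_superprimitive("aba"))
```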
Genomics | 2013
Kimon Frousios; Costas S. Iliopoulos; Thomas Schlitt; Michael A. Simpson
The study of DNA sequence variation has been transformed by recent advances in DNA sequencing technologies. Determining the functional consequences of sequence variant alleles offers potential insight into how genotype may influence phenotype. Even within protein-coding regions of the genome, establishing the consequences of variation on gene and protein function is challenging and requires substantial laboratory investigation. However, a series of bioinformatics tools have been developed to predict whether non-synonymous variants are neutral or disease-causing. In this study we evaluate the performance of nine such methods (SIFT, PolyPhen2, SNPs&GO, PhD-SNP, PANTHER, Mutation Assessor, MutPred, Condel and CAROL) and develop CoVEC (Consensus Variant Effect Classification), a tool that integrates the prediction results from four of these methods. We demonstrate that the CoVEC approach outperforms most individual methods, highlighting the benefit of combining results from multiple tools.
Theoretical Computer Science | 1997
Costas S. Iliopoulos; Dennis W. G. Moore; William F. Smyth
A (finite) Fibonacci string $F_n$ is defined as follows: $F_0 = b$, $F_1 = a$; for every integer $n \ge 2$, $F_n = F_{n-1}F_{n-2}$. For $n \ge 1$, the length of $F_n$ is denoted by $f_n = |F_n|$. The infinite Fibonacci string $F$ is the string which contains every $F_n$, $n \ge 1$, as a prefix. Apart from their general theoretical importance, Fibonacci strings are often cited as worst-case examples for algorithms which compute all the repetitions or all the “Abelian squares” in a given string. In this paper we provide a characterization of all the squares in $F$, hence in every prefix $F_n$; this characterization naturally gives rise to a $\Theta(f_n)$ algorithm which specifies all the squares of $F_n$ in an appropriate encoding. This encoding is made possible by the fact that the squares of $F_n$ occur consecutively, in “runs”, the number of which is $\Theta(f_n)$. By contrast, the known general algorithms for the computation of the repetitions in an arbitrary string require $\Theta(f_n \log f_n)$ time (and produce $\Theta(f_n \log f_n)$ outputs) when applied to a Fibonacci string $F_n$.
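The recursive definition F₀ = b, F₁ = a, Fₙ = Fₙ₋₁Fₙ₋₂ translates directly into code. Listing every square occurrence explicitly shows why Fibonacci strings are used as worst cases: the number of explicitly listed occurrences grows faster than the string length. The sketch below generates Fₙ and lists square occurrences (i, p), meaning s[i:i+p] = s[i+p:i+2p], by brute force. It is not the paper's Θ(fₙ) run-based encoding, and the function names are hypothetical.

```python
def fibonacci_string(n: int) -> str:
    """F_0 = 'b', F_1 = 'a', F_n = F_{n-1} F_{n-2}."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "b"
    prev, cur = "b", "a"
    for _ in range(n - 1):
        prev, cur = cur, cur + prev
    return cur


def square_occurrences(s: str) -> list[tuple[int, int]]:
    """All (i, p) with s[i:i+p] == s[i+p:i+2p]; brute force, roughly cubic time."""
    return [(i, p)
            for p in range(1, len(s) // 2 + 1)
            for i in range(len(s) - 2 * p + 1)
            if s[i:i + p] == s[i + p:i + 2 * p]]


f5 = fibonacci_string(5)
print(f5, square_occurrences(f5))
```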