Yoan J. Pinzón
King's College London
Publication
Featured research published by Yoan J. Pinzón.
International Journal of Computer Mathematics | 2002
Emilios Cambouropoulos; Maxime Crochemore; Costas S. Iliopoulos; Laurent Mouchard; Yoan J. Pinzón
Here we introduce two new notions of approximate matching with application in computer assisted music analysis. We present algorithms for each notion of approximation: for approximate string matching and for computing approximate squares.
Information Processing Letters | 2001
Maxime Crochemore; Costas S. Iliopoulos; Yoan J. Pinzón; James F. Reid
This paper presents a new practical bit-vector algorithm for solving the well-known Longest Common Subsequence (LCS) problem. Given two strings of length m and n, n ⩾ m, we present an algorithm which determines the length p of an LCS in O(nm/w) time and O(m/w) space, where w is the number of bits in a machine word. This algorithm can be thought of as column-wise “parallelization” of the classical dynamic programming approach. Our algorithm is very efficient in practice, where computing the length of an LCS of two strings can be done in linear time and constant (additional/working) space by assuming that m ⩽ w.
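The column-wise bit-vector update can be illustrated with a minimal Python sketch. It is an assumption-laden illustration rather than the authors' code: the function name `lcs_length_bitvector` is hypothetical, and Python's arbitrary-precision integers stand in for the ⌈m/w⌉ machine words, so the sketch shows the recurrence but not the word-level cost.

```python
def lcs_length_bitvector(a: str, b: str) -> int:
    """Length of an LCS of a and b via a bit-vector column update (sketch)."""
    m = len(a)
    if m == 0:
        return 0
    mask = (1 << m) - 1
    # match[c] has bit i set exactly when a[i] == c.
    match = {}
    for i, ch in enumerate(a):
        match[ch] = match.get(ch, 0) | (1 << i)
    v = mask  # all ones: no LCS increments recorded yet
    for ch in b:
        y = match.get(ch, 0)
        # One dynamic-programming column processed with a constant
        # number of word operations (add, and, or, not).
        v = ((v + (v & y)) | (v & ~y)) & mask
    # Each zero bit in v marks a row where the LCS length increases by one.
    return m - bin(v).count("1")
```

For example, `lcs_length_bitvector("abc", "abc")` returns 3 and `lcs_length_bitvector("ab", "ba")` returns 1.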
Journal of Discrete Algorithms | 2005
Kjell Lemström; Gonzalo Navarro; Yoan J. Pinzón
We consider the problems of (1) longest common subsequence (LCS) of two given strings in the case where the first may be shifted by some constant (that is, transposed) to match the second, and (2) transposition-invariant text searching using indel distance. These problems have applications in music comparison and retrieval. We introduce two novel techniques to solve these problems efficiently. The first is based on the branch and bound method, the second on bit-parallelism. Our branch and bound algorithm computes the longest common transposition-invariant subsequence (LCTS) in time O((m² + log log σ) log σ) in the best case and O((m² + log σ) σ) in the worst case, where m and σ, respectively, are the length of the strings and the size of the alphabet. On the other hand, we show that the same problem can be solved by using bit-parallelism and thus obtain a speedup of O(w / log m) over the classical algorithms, where the computer word has w bits. The advantage of this latter algorithm over the present bit-parallel ones is that it allows the use of more complex distances, including general integer weights. Since our branch and bound method is very flexible, it can be further improved by combining it with other efficient algorithms such as our novel bit-parallel algorithm. We experiment on several combination possibilities and discuss which are the best settings for each of those combinations. Our algorithms are easily extended to other musically relevant cases, such as δ-matching and polyphony (where there are several parallel texts to be considered). We also show how our bit-parallel algorithm is adapted to text searching and illustrate its effectiveness in complex cases where the only known competing method is the use of brute force.
Lecture Notes in Computer Science | 2001
Costas S. Iliopoulos; Laurent Mouchard; Yoan J. Pinzón
The approximate string matching problem is to find all locations at which a pattern of length m matches a substring of a text of length n with at most k differences. The program agrep is a simple and practical bit-vector algorithm for this problem. In this paper we consider the following incremental version of the problem: given an appropriate encoding of a comparison between A and bB, can one compute the answer for A and B, and the answer for A and Bc with equal efficiency, where b and c are additional symbols? Here we present an elegant and very easy to implement bit-vector algorithm for answering these questions that requires only O(n⌈m/w⌉) time, where n is the length of A, m is the length of B and w is the number of bits in a machine word. We also present an O(nm⌈h/w⌉) algorithm for the fixed-length approximate string matching problem: given a text t, a pattern p and an integer h, compute the optimal alignment of all substrings of p of length h and a substring of t.
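As background for the agrep-style approach mentioned above, the following is a minimal sketch of bit-vector matching with at most k differences, in the spirit of the Wu and Manber shift-and scheme. It does not implement the incremental or fixed-length algorithms of this paper; the function name `approx_search` and the convention of returning end positions are assumptions made for illustration.

```python
def approx_search(pattern: str, text: str, k: int) -> list[int]:
    """End positions in text where pattern matches with <= k differences (sketch)."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1
    accept = 1 << (m - 1)
    # table[c] has bit i set exactly when pattern[i] == c.
    table = {}
    for i, ch in enumerate(pattern):
        table[ch] = table.get(ch, 0) | (1 << i)
    # rows[d]: bit i set when pattern[:i+1] matches a suffix of the text
    # read so far with at most d differences.
    rows = [((1 << d) - 1) & mask for d in range(k + 1)]
    ends = []
    for j, ch in enumerate(text):
        b = table.get(ch, 0)
        prev_old = rows[0]
        rows[0] = ((rows[0] << 1) | 1) & b
        for d in range(1, k + 1):
            old = rows[d]
            rows[d] = (
                (((old << 1) | 1) & b)                   # match
                | prev_old                               # insertion
                | (((prev_old | rows[d - 1]) << 1) | 1)  # substitution, deletion
            ) & mask
            prev_old = old
        if rows[k] & accept:
            ends.append(j)
    return ends
```

For example, `approx_search("abc", "abd", 1)` returns `[1, 2]`: "ab" matches with one deletion and "abd" with one substitution.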
Journal of Discrete Algorithms | 2005
Maxime Crochemore; Costas S. Iliopoulos; Gonzalo Navarro; Yoan J. Pinzón; Alejandro Salinger
(δ, γ)-matching is a string matching problem with applications to music retrieval. The goal is, given a pattern P_{1…m} and a text T_{1…n} on an alphabet of integers, to find the occurrences P′ of the pattern in the text such that (i) ∀ 1 ⩽ i ⩽ m, |P_i − P′_i| ⩽ δ, and (ii) ∑_{1 ⩽ i ⩽ m} |P_i − P′_i| ⩽ γ. The problem makes sense for δ ⩽ γ ⩽ δm. Several techniques for (δ, γ)-matching have been proposed, based on bit-parallelism or on skipping characters. We first present an O(mn log(γ)/w) worst-case time and O(n) average-case time bit-parallel algorithm (where w is the number of bits in the computer word). It improves the previous O(mn log(δm)/w) worst-case time algorithm of the same type. Second, we combine our bit-parallel algorithm with suffix automata to obtain the first algorithm that skips characters using both δ and γ. This algorithm examines fewer characters than any previous approach, as the others do just δ-matching and check the γ-condition on the candidates. We implemented our algorithms and drew experimental results on real music, showing that our algorithms are superior to current alternatives with high values of δ.
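The two conditions of (δ, γ)-matching can be made concrete with a brute-force check. This is only a reference sketch of the problem definition, running in O(mn) time; it is not the bit-parallel or suffix-automaton algorithm of the paper, and the function name `delta_gamma_occurrences` is hypothetical.

```python
def delta_gamma_occurrences(pattern, text, delta, gamma):
    """Start positions where text (delta, gamma)-matches pattern (brute force)."""
    m = len(pattern)
    hits = []
    for start in range(len(text) - m + 1):
        total = 0
        for p, t in zip(pattern, text[start:start + m]):
            diff = abs(p - t)
            if diff > delta:       # condition (i): per-position bound
                break
            total += diff
            if total > gamma:      # condition (ii): bound on the sum
                break
        else:
            hits.append(start)
    return hits
```

With the pitch sequences `pattern = [60, 62, 64]` and `text = [61, 62, 66, 60, 63, 64, 10]`, the call `delta_gamma_occurrences(pattern, text, 2, 3)` returns `[0, 3]`; tightening γ to 2 leaves only `[3]`.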
String Processing and Information Retrieval | 2008
In-Bok Lee; Juan Mendivelso; Yoan J. Pinzón
This paper defines a new pattern matching problem by combining two paradigms: δγ-matching and parameterized matching. The solution is essentially obtained by a combination of bit-parallel techniques and a reduction to a graph matching problem. The time complexity of the algorithm is O(nm), assuming text size n, pattern size m and a constant size alphabet.
String Processing and Information Retrieval | 2012
Juan Mendivelso; In-Bok Lee; Yoan J. Pinzón
This paper defines a new string matching problem by combining two paradigms: function matching and δγ-matching. The result is an approximate variant of function matching where two equal-length strings X and Y match if there exists a function that maps X to a string X′ such that X′ and Y are δγ-similar. We propose an O(nm) algorithm for finding all the matches of a pattern P_{1…m} in a text T_{1…n}.
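The match condition for two equal-length strings can be sketched as follows. This illustrates the definition only and is not the paper's O(nm) search algorithm. It assumes an unbounded integer alphabet, so each distinct symbol of X may be mapped to any integer independently; the function names are hypothetical. For each symbol, the cheapest image is the median of the aligned Y values, clamped to the interval allowed by δ.

```python
from collections import defaultdict

def function_match_cost(x, y, delta):
    """Minimum sum of |f(x_i) - y_i| over functions f respecting delta, or None."""
    groups = defaultdict(list)
    for a, b in zip(x, y):
        groups[a].append(b)  # Y values aligned with each symbol of X
    total = 0
    for values in groups.values():
        values.sort()
        # f(a) must lie within delta of every aligned Y value.
        lo, hi = values[-1] - delta, values[0] + delta
        if lo > hi:
            return None
        median = values[(len(values) - 1) // 2]
        image = min(max(median, lo), hi)  # clamp the median into [lo, hi]
        total += sum(abs(image - v) for v in values)
    return total

def delta_gamma_function_match(x, y, delta, gamma):
    """True if some function maps x to a string that is delta-gamma-similar to y."""
    if len(x) != len(y):
        return False
    cost = function_match_cost(x, y, delta)
    return cost is not None and cost <= gamma
```

For example, `delta_gamma_function_match([1, 2, 1, 2], [10, 20, 11, 21], 1, 2)` is `True`, while the same call with γ = 1 is `False`.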
Conference on Current Trends in Theory and Practice of Informatics | 2005
Heikki Hyyrö; Yoan J. Pinzón; Ayumi Shinohara
The approximate string matching problem is to find all locations at which a query p of length m matches a substring of a text t of length n with at most k differences (insertions, deletions, substitutions). The fastest solutions in practice for this problem are the bit-parallel NFA simulation algorithms of Wu & Manber [4] and Baeza-Yates & Navarro [1], and the bit-parallel dynamic programming algorithm of Myers [3]. In this paper we present modified versions of these algorithms to deal with the restricted case where only insertions and deletions (called indel for short) are permitted. We also show test results with the algorithms.
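The restricted indel case can be stated precisely with a plain dynamic-programming reference. The sketch below is not one of the bit-parallel algorithms presented in the paper; it only shows the recurrence they would have to reproduce, and the name `indel_search` is hypothetical.

```python
def indel_search(pattern: str, text: str, k: int) -> list[int]:
    """End positions where pattern matches with <= k insertions/deletions (sketch)."""
    m = len(pattern)
    col = list(range(m + 1))  # column for the empty text prefix
    ends = []
    for j, c in enumerate(text):
        new = [0]  # a match may start at any text position
        for i in range(1, m + 1):
            if pattern[i - 1] == c:
                new.append(col[i - 1])
            else:
                # No substitution: a mismatch costs one deletion or one insertion.
                new.append(1 + min(col[i], new[i - 1]))
        col = new
        if col[m] <= k:
            ends.append(j)
    return ends
```

For example, `indel_search("abc", "abd", 1)` returns `[1]`: replacing "c" by "d" counts as two differences under indel distance, so only the occurrence ending at "ab" qualifies.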
Lecture Notes in Computer Science | 2005
Heikki Hyyrö; Yoan J. Pinzón; Ayumi Shinohara
The task of approximate string matching is to find all locations at which a pattern string p of length m matches a substring of a text string t of length n with at most k differences. It is common to use Levenshtein distance [5], which allows the differences to be single-character insertions, deletions and substitutions. Recently, in [3], the IndelMYE, IndelWM and IndelBYN algorithms were introduced as modified versions of the bit-parallel algorithms of Myers [6], Wu & Manber [10] and Baeza-Yates & Navarro [1], respectively. These modified versions were made to support the indel distance (only single-character insertions and/or deletions are allowed). In this paper we present an improved version of IndelMYE that makes better use of the bit-operations and runs 24.5 percent faster in practice. In the end we present a complete set of experimental results to support our findings.
String Processing and Information Retrieval | 2004
Kjell Lemström; Gonzalo Navarro; Yoan J. Pinzón
We consider the problem of longest common subsequence (LCS) of two given strings in the case where the first may be shifted by some constant (i.e. transposed) to match the second. For this longest common transposition invariant subsequence (LCTS) problem, which has applications in, for instance, music comparison, we develop a branch and bound algorithm with best case time O((m² + log log σ) log σ) and worst case time O((m² + log σ) σ), where m and σ are the length of the strings and the number of possible transpositions, respectively. This compares favorably against the O(σm²) naive algorithm in most cases and, for large m, against the O(m² log log m) time algorithm of [2].
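The naive baseline mentioned above can be sketched by running one classical LCS computation per transposition. This is not the branch and bound algorithm of the paper. The function names are hypothetical, and as a small shortcut the sketch tries only transpositions that align at least one pair of symbols, rather than all σ values.

```python
def lcs_length(a, b):
    """Classical O(|a||b|) dynamic programming for the LCS length."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def lcts_naive(a, b):
    """Return (length, transposition) of a longest common transposition-invariant subsequence."""
    best_len, best_t = 0, 0
    # Only shifts that make some a[i] + t equal some b[j] can give a non-empty LCS.
    for t in sorted({y - x for x in a for y in b}):
        length = lcs_length([x + t for x in a], b)
        if length > best_len:
            best_len, best_t = length, t
    return best_len, best_t
```

For example, `lcts_naive([60, 62, 64, 65], [67, 69, 70, 71, 72])` returns `(4, 7)`: shifting the first sequence up by 7 makes it a subsequence of the second.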