Heikki Hyyrö
University of Tampere
Publication
Featured research published by Heikki Hyyrö.
combinatorial pattern matching | 2002
Heikki Hyyrö; Gonzalo Navarro
We present a new bit-parallel technique for approximate string matching. We build on two previous techniques. The first one [Myers, J. of the ACM, 1999], searches for a pattern of length m in a text of length n permitting k differences in O(mn/w) time, where w is the width of the computer word. The second one [Navarro and Raffinot, ACM JEA, 2000], extends a sublinear-time exact algorithm to approximate searching. The latter technique makes use of an O(kmn/w) time algorithm [Wu and Manber, Comm. ACM, 1992] for its internal workings. This algorithm is slow but flexible enough to support all the required operations. In this paper we show that the faster algorithm of Myers can be adapted to support all those operations. This involves extending it to compute edit distance, to search for any pattern suffix, and to detect in advance the impossibility of a later match. The result is an algorithm that performs better than the original version of Navarro and Raffinot and that is the fastest for several combinations of m, k and alphabet sizes that are useful, for example, in natural language searching and computational biology.
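For orientation, the O(mn/w) building block mentioned above can be sketched as follows. This is a minimal sketch of a common textbook formulation of Myers' bit-vector algorithm for the case m ≤ w, not the extended variant the paper develops: it does not search for pattern suffixes or detect in advance that a later match is impossible. The function name `bpm_search` and its return format are assumptions made for illustration. Python integers are unbounded, so the explicit `mask` stands in for an m-bit machine word.

```python
def bpm_search(pattern, text, k):
    """Return end positions j where pattern matches a substring of
    text ending at text[j] with at most k differences.
    Illustrative sketch; assumes len(pattern) >= 1."""
    m = len(pattern)
    mask = (1 << m) - 1          # emulates an m-bit word
    high = 1 << (m - 1)          # bit of the last pattern row

    # peq[c] has bit i set iff pattern[i] == c
    peq = {}
    for i, c in enumerate(pattern):
        peq[c] = peq.get(c, 0) | (1 << i)

    pv, mv = mask, 0             # vertical deltas: all +1 at the start
    score = m                    # value of the last row of the DP column
    ends = []
    for j, c in enumerate(text):
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask    # horizontal delta +1
        mh = pv & xh                     # horizontal delta -1
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Shifting in 0 models the all-zero top row used for searching
        # (a match may start at any text position).
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= k:
            ends.append(j)
    return ends
```

Each text character costs a constant number of word operations, which is where the O(mn/w) bound comes from when the pattern spans ⌈m/w⌉ words.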
ACM Journal of Experimental Algorithms | 2005
Heikki Hyyrö; Kimmo Fredriksson; Gonzalo Navarro
Bit-parallelism permits executing several operations simultaneously over a set of bits or numbers stored in a single computer word. This technique permits searching for the approximate occurrences of a pattern of length m in a text of length n in time O(⌈m/w⌉n), where w is the number of bits in the computer word. Although this is asymptotically the optimal bit-parallel speedup over the basic O(mn) time algorithm, it wastes bit-parallelism's power in the common case where m is much smaller than w, since w−m bits in the computer words are unused. In this paper, we explore different ways to increase the bit-parallelism when the search pattern is short. First, we show how multiple patterns can be packed into a single computer word so as to search for all of them simultaneously. Instead of spending O(rn) time to search for r patterns of length m ≤ w/2, we need O(⌈rm/w⌉n) time. Second, we show how the mechanism permits boosting the search for a single pattern of length m ≤ w/2, which can be searched for in O(⌈n/⌊w/m⌋⌉) bit-parallel steps instead of O(n). Third, we show how to extend these algorithms so that the time bounds essentially depend on k instead of m, where k is the maximum number of differences permitted. Finally, we show how the ideas can be applied to other problems such as multiple exact string matching and one-against-all computation of edit distance and longest common subsequences. Our experimental results show that the new algorithms work well in practice, obtaining significant speedups over the best existing alternatives, especially on short patterns and a moderate number of differences allowed. This work fills an important gap in the field, where little work has focused on very short patterns.
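The packing idea can be illustrated on the simpler problem of multiple exact string matching, which the abstract lists as one application. The sketch below is not the paper's algorithm (which packs approximate-matching bit vectors); it is a plain Shift-And automaton in which r patterns of equal length m each occupy their own m-bit field of one integer, so a single shift, OR and AND advances all r searches at once. The function name `packed_shift_and` and the `(start, pattern_index)` result format are assumptions made for illustration.

```python
def packed_shift_and(patterns, text):
    """Find all exact occurrences of equal-length patterns in text.
    Pattern f lives in bits f*m .. f*m + m - 1 of the state word.
    Returns (start_position, pattern_index) pairs. Illustrative sketch."""
    m = len(patterns[0])
    if m == 0 or any(len(p) != m for p in patterns):
        raise ValueError("patterns must be non-empty and of equal length")

    init = 0     # lowest bit of every field
    final = 0    # highest bit of every field
    table = {}   # table[c]: bit f*m + i set iff patterns[f][i] == c
    for f, p in enumerate(patterns):
        init |= 1 << (f * m)
        final |= 1 << (f * m + m - 1)
        for i, c in enumerate(p):
            table[c] = table.get(c, 0) | (1 << (f * m + i))

    state = 0
    hits = []
    for j, c in enumerate(text):
        # The shift carries each field's top bit into the next field's
        # lowest bit, but that bit is set by `init` anyway, so the
        # fields do not interfere with each other.
        state = ((state << 1) | init) & table.get(c, 0)
        ends = state & final
        while ends:                      # visit only the fields that matched
            low = ends & -ends
            f = (low.bit_length() - 1) // m
            hits.append((j - m + 1, f))
            ends ^= low
    return hits
```

For example, `packed_shift_and(["abc", "bcd", "cab"], "abcdcab")` returns `[(0, 0), (1, 1), (4, 2)]`. With a real w-bit word, the packing holds ⌊w/m⌋ patterns per word, which corresponds to the O(⌈rm/w⌉n) bound quoted above.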
Algorithmica | 2005
Heikki Hyyrö; Gonzalo Navarro
We present a new bit-parallel technique for approximate string matching. We build on two previous techniques. The first one, BPM (Myers, 1999), searches for a pattern of length m in a text of length n permitting k differences in O(⌈m/w⌉n) time, where w is the width of the computer word. The second one, ABNDM (Navarro and Raffinot, 2000), extends a sublinear-time exact algorithm to approximate searching. ABNDM relies on another algorithm, BPA (Wu and Manber, 1992), which makes use of an O(k⌈m/w⌉n) time algorithm for its internal workings. BPA is slow but flexible enough to support all operations required by ABNDM. We improve previous ABNDM analyses, showing that it is average-optimal in number of inspected characters, although the overall complexity is higher because of the O(k⌈m/w⌉) work done per inspected character. We then show that the faster BPM can be adapted to support all the operations required by ABNDM. This involves extending it to compute edit distance, to search for any pattern suffix, and to detect in advance the impossibility of a later match. The solution to those challenges is based on the concept of a witness, which permits sampling some dynamic programming matrix values to bound, deduce or compute others fast. The resulting algorithm is average-optimal for m ≤ w, assuming the alphabet size is constant. In practice, it performs better than the original ABNDM and is the fastest algorithm for several combinations of m, k and alphabet sizes that are useful, for example, in natural language searching and computational biology. To show that the concept of witnesses can be used in further scenarios, we also improve a recent variant of BPM. The use of witnesses greatly improves the running time of this algorithm too.
Lecture Notes in Computer Science | 2004
Heikki Hyyrö; Kimmo Fredriksson; Gonzalo Navarro
discovery science | 2004
Shunsuke Inenaga; Hideo Bannai; Heikki Hyyrö; Ayumi Shinohara; Masayuki Takeda; Kenta Nakai; Satoru Miyano
workshop on algorithms in bioinformatics | 2004
Hideo Bannai; Heikki Hyyrö; Ayumi Shinohara; Masayuki Takeda; Kenta Nakai; Satoru Miyano
Journal of Discrete Algorithms | 2005
Heikki Hyyrö
Information Processing Letters | 2008
Heikki Hyyrö
string processing and information retrieval | 2008
Heikki Hyyrö
conference on current trends in theory and practice of informatics | 2005
Heikki Hyyrö; Yoan J. Pinzón; Ayumi Shinohara
Bit-parallelism permits executing several operations simultaneously over a set of bits or numbers stored in a single computer word. This technique permits searching for the approximate occurrences of a pattern of length m in a text of length n in time O(⌈m/w⌉n), where w is the number of bits in the computer word. Although this is asymptotically the optimal speedup over the basic O(mn) time algorithm, it wastes bit-parallelism's power in the common case where m is much smaller than w, since w−m bits in the computer words go unused.