Kimmo Fredriksson
University of Eastern Finland
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Kimmo Fredriksson.
string processing and information retrieval | 2005
Kimmo Fredriksson; Szymon Grabowski
We develop a new exact bit-parallel string matching algorithm, based on the Shift-Or algorithm (Baeza-Yates & Gonnet, 1992). Assuming that the pattern representation fits into a single computer word, this algorithm has optimal O(n logσm / m) average running time, as well as optimal O(n) worst case running time, where n, m and σ are the sizes of the text, the pattern, and the alphabet, respectively. We also study several implementation details. The experimental results show that our algorithm is the fastest in most of the cases where it can be applied, displacing even the long-standing BNDM (Navarro & Raffinot, 2000) family of algorithms. Finally, we show how to adapt our techniques for the Shift-Add algorithm (Baeza-Yates & Gonnet, 1992), obtaining optimal time for searching under Hamming distance.
ACM Journal of Experimental Algorithms | 2004
Kimmo Fredriksson; Gonzalo Navarro
We present a new algorithm for multiple approximate string matching. It is based on reading backwards enough l-grams from text windows so as to prove that no occurrence can contain the part of the window read, and then shifting the window.We show analytically that our algorithm is optimal on average. Hence our first contribution is to fill an important gap in the area, since no average-optimal algorithm existed for multiple approximate string matching.We consider several variants and practical improvements to our algorithm, and show experimentally that they are resistant to the number of patterns and the fastest for low difference ratios, displacing the long-standing best algorithms. Hence our second contribution is to give a practical algorithm for this problem, by far better than any existing alternative in many cases of interest. On real-life texts, our algorithm is especially interesting for computational biology applications.In particular, we show that our algorithm can be successfully used to search for one pattern, where many more competing algorithms exist. Our algorithm is also average-optimal in this case, being the second after that of Chang and Marr. However, our algorithm permits higher difference ratios than Chang and Marr, and this is our third contribution.In practice, our algorithm is competitive in this scenario too, being the fastest for low difference ratios and moderate alphabet sizes. This is our fourth contribution, which also answers affirmatively the question of whether a practical average-optimal approximate string-matching algorithm existed.
Information Processing Letters | 2006
Kimmo Fredriksson; Maxim Mozgovoy
In parameterized string matching the pattern P matches a substring t of the text T if there exist a bijective mapping from the symbols of P to the symbols of t. We give simple and practical algorithms for finding all such pattern occurrences in sublinear time on average. The algorithms work for a single and multiple patterns.
Information Processing Letters | 2003
Kimmo Fredriksson
Given a text T[1...n] and a pattern P[1...m] over some alphabet Σ of size σ, we want to find all the (exact) occurrences of P in T. The well-known shift-or algorithm solves this problem in time O(n⌈m/w⌉), where w is the number of bits in machine word, using bit-parallelism. We show how to extend the bit-parallelism in another direction, using super-alphabets. This gives a speed-up by a factor s, where s is the number of characters processed simultaneously. The algorithm is implemented, and we show that it works well in practice too. The result is the fastest known algorithm for exact string matching for short patterns and small alphabets.
combinatorial pattern matching | 2002
Kimmo Fredriksson; Gonzalo Navarro; Esko Ukkonen
We give fast filtering algorithms to search for a 2- dimensional pattern in a 2-dimensional text allowing any rotation of the pattern. We consider the cases of exact and approximate matching under several matching models, improving the previous results. For a text of size n?n characters and a pattern of size m?m characters, the exact matching takes average time O(n2 log m/m2), which is optimal. If we allow k mismatches of characters, then our best algorithm achieves O(n2k log m/m2) average time, for reasonable k values. For large k, we obtain an O(n2k3/2?log m/m) average time algorithm. We generalize the algorithms for the matching model where the sum of absolute differences between characters is at most k. Finally, we show how to make the algorithms optimal in the worst case, achieving the lower bound ?(n2m3).
combinatorial pattern matching | 1998
Kimmo Fredriksson; Esko Ukkonen
We consider the problem of finding the occurrences of two-dimensional pattern P[1..m, 1..m] in two-dimensional text T[1..n, 1..n] when also rotations of P are allowed. A fast filtration-type algorithm is developed that finds in T the locations where a rotated P can occur. The, corresponding rotations are also found. The algorithm first reads from P a linear string of length m in all θ(m2) orientations that are relevant. We also show that the number of different orientations which P can have is θ(m3). The text T is scanned with Aho-Corasick string matching automaton to find the occurrences of any of these θ(m2) linear strings of length m. Each such occurrence indicates a potential set of occurrences of whole P which are then checked. Some preliminary running times of a prototype implementation of the method are reported.
string processing and information retrieval | 2005
Maxim Mozgovoy; Kimmo Fredriksson; Daniel R. White; Mike Joy; Erkki Sutinen
The large class sizes typical for an undergraduate programming course mean that it is nearly impossible for a human marker to accurately detect plagiarism, particularly if some attempt has been made to hide the copying. While it would be desirable to be able to detect all possible code transformations we believe that there is a minimum level of acceptable performance for the application of detecting student plagiarism. It would be useful if the detector operated at a level that meant for a piece of work to fool the algorithm would require that the student spent a large amount of time on the assignment and had a good enough understanding to do the work without plagiarising.
Theoretical Computer Science | 2004
Gonzalo Navarro; Kimmo Fredriksson
We show that the average number of characters examined to search for r random patterns of length m in a text of length n over a uniformly distributed alphabet of size σ cannot be less than Ω(n logσ(rm)/m). When we permit up to k insertions, deletions, and/or substitutions of characters in the occurrences of the patterns, the lower bound becomes Ω(n(k + logσ(rm))/m). This generalizes previous single-pattern lower bounds of Yao (for exact matching) and of Chang and Marr (for approximate matching), and proves the optimality of several existing multipattern search algorithms.
string processing and information retrieval | 2002
Kimmo Fredriksson
Given a text T[1 . . . n] and a pattern P[1 . . . m] over some alphabet ? of size ?, finding the exact occurrences of P in T requires at least ?(n log? m/m) character comparisons on average, as shown in [19]. Consequently, it is believed that this lower bound implies also an ?(n log? m/m) lower bound for the execution time of an optimal algorithm. However, in this paper we show how to obtain an O(n/m) average time algorithm. This is achieved by slightly changing the model of computation, and with a modification of an existing algorithm. Our technique uses a super-alphabet for simulating suffix automaton. The space usage of the algorithm is O(?m). The technique can be applied to many other string matching algorithms, including dictionary matching, which is also solved in expected time O(n/m), and approximate matching allowing k edit operations (mismatches, insertions or deletions of characters). This is solved in expected time O(nk/m) for k ? O(m/log? m). The known lower bound for this problem is ?(n(k + log? m)/m), given in [6]. Finally we show how to adopt a similar technique to the shift-or algorithm, extending its bit-parallelism in another direction. This gives a speed-up by a factor s, where s is the number of characters processed simultaneously. Some of the algorithms are implemented, and we show that the methods work well in practice too. This is especially true for the shift-or algorithm, which in some cases works faster than predicted by the theory. The result is the fastest known algorithm for exact string matching for short patterns and small alphabets. All the methods and analyses assume the RAM model of computation, and that each symbol is coded in b = ?log2 ?? bits. They work for larger b too, but the speed-up is decreased.
Journal of Discrete Algorithms | 2009
Kimmo Fredriksson; Szymon Grabowski
The exact string matching problem is to find the occurrences of a pattern of length m from a text of length n symbols. We develop a novel and unorthodox filtering technique for this problem. Our method is based on transforming the problem into multiple matching of carefully chosen pattern subsequences. While this is seemingly more difficult than the original problem, we show that the idea leads to very simple algorithms that are optimal on average. We then show how our basic method can be used to solve multiple string matching as well as several approximate matching problems in average optimal time. The general method can be applied to many existing string matching algorithms. Our experimental results show that the algorithms perform very well in practice.