Kazuyuki Narisawa
Tohoku University
Publication
Featured research published by Kazuyuki Narisawa.
Discovery Science | 2007
Kazuyuki Narisawa; Hideo Bannai; Kohei Hatano; Masayuki Takeda
We propose an unsupervised method for detecting spam documents from a given set of documents, based on equivalence relations on strings. We give three measures for quantifying the alienness (i.e., how different they are from others) of substrings within the documents. A document is then classified as spam if it contains a substring that is in an equivalence class with a high degree of alienness. The proposed method is unsupervised, language independent, and scalable. Computational experiments conducted on data collected from Japanese web forums show that the method successfully detects spam.
Combinatorial Pattern Matching | 2007
Kazuyuki Narisawa; Shunsuke Inenaga; Hideo Bannai; Masayuki Takeda
This paper considers enumeration of substring equivalence classes introduced by Blumer et al. [1]. They used the equivalence classes to define an index structure called compact directed acyclic word graphs (CDAWGs). In text analysis, considering these equivalence classes is useful since they group together redundant substrings with essentially identical occurrences. In this paper, we present how to enumerate those equivalence classes using suffix arrays. Our algorithm uses rank and lcp arrays for traversing the corresponding suffix trees, but does not need any other additional data structure. The algorithm runs in time linear in the length of the input string. We show experimental results comparing the running time and space consumption of our algorithm with those of suffix tree and CDAWG based approaches.
Information & Computation | 2015
Tomohiro I; Wataru Matsubara; Kouji Shimohira; Shunsuke Inenaga; Hideo Bannai; Masayuki Takeda; Kazuyuki Narisawa; Ayumi Shinohara
We address the problems of detecting and counting various forms of regularities in a string represented as a straight-line program (SLP), which is essentially a context-free grammar in Chomsky normal form. Given an SLP of size n that represents a string s of length N, our algorithm computes all runs and squares in s in \(O(n^3 h)\) time and \(O(n^2)\) space, where h is the height of the derivation tree of the SLP. We also show an algorithm to compute all gapped-palindromes in \(O(n^3 h + g n h \log N)\) time and \(O(n^2)\) space, where g is the length of the gap. As one of the main components of the above solution, we propose a new technique called approximate doubling which seems to be a useful tool for a wide range of algorithms on SLPs. Indeed, we show that the technique can be used to compute the periods and covers of the string in \(O(n^2 h)\) time and \(O(n h (n + \log^2 N))\) time, respectively.
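To make the parameters n, N, and h concrete, here is a minimal Python sketch of an SLP in Chomsky normal form. The list-based encoding, the function names `slp_lengths` and `char_at`, and the convention that a terminal rule has height 0 are assumptions made for illustration; none of this is taken from the paper, and it does not implement the approximate doubling technique.

```python
def slp_lengths(rules):
    """Compute the expansion length and derivation height of every variable.

    rules[i] is either a single character (terminal rule) or a pair (l, r)
    of earlier variable indices with l, r < i (rule X_i -> X_l X_r).
    """
    lengths, heights = [], []
    for rule in rules:
        if isinstance(rule, str):
            lengths.append(1)
            heights.append(0)  # assumed convention: terminals have height 0
        else:
            l, r = rule
            lengths.append(lengths[l] + lengths[r])
            heights.append(1 + max(heights[l], heights[r]))
    return lengths, heights


def char_at(rules, lengths, var, pos):
    """Return character pos (0-based) of the expansion of variable var.

    Walks down the derivation tree without expanding the string,
    so the cost is bounded by the height of the variable.
    """
    while not isinstance(rules[var], str):
        l, r = rules[var]
        if pos < lengths[l]:
            var = l
        else:
            pos -= lengths[l]
            var = r
    return rules[var]


# A Fibonacci-like SLP: X0 = b, X1 = a, X_i = X_{i-1} X_{i-2} for i >= 2.
rules = ["b", "a", (1, 0), (2, 1), (3, 2), (4, 3)]
lengths, heights = slp_lengths(rules)
root = len(rules) - 1
expanded = "".join(char_at(rules, lengths, root, p) for p in range(lengths[root]))
print(lengths[root], heights[root], expanded)
```

In this toy grammar the six rules (n = 6) describe a string of length N = 8; for longer grammars of the same shape N grows exponentially in n, which is why algorithms that work directly on the SLP are stated in terms of n and h rather than N.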
Conference on Current Trends in Theory and Practice of Informatics | 2013
Takashi Katsura; Kazuyuki Narisawa; Ayumi Shinohara; Hideo Bannai; Shunsuke Inenaga
We propose a new variant of pattern matching on a multi-set of strings, or multi-tracks, called permuted-matching, that looks for occurrences of a multi-track pattern of length m with M tracks, in a multi-track text of length n with N tracks over Σ. We show that the problem can be solved in O(nN log|Σ|) time and O(mM + N) space, and further in O(nN) time and space when assuming an integer alphabet. For the case where the numbers of strings in the text and pattern are equal (full-permuted-matching), we propose a new index structure called the multi-track suffix tree, as well as an O(nN log|Σ|) time and O(nN) space construction algorithm. Using this structure, we can solve the full-permuted-matching problem in O(mN log|Σ| + occ) time for any multi-track pattern of length m with N tracks which occurs occ times.
Journal of Discrete Algorithms | 2015
Heikki Hyyrö; Kazuyuki Narisawa; Shunsuke Inenaga
We discuss the problem of edit distance computation under a dynamic setting, where one of the two compared strings may be modified by single-character edit operations and we wish to keep the edit distance information between the strings up-to-date. A previous algorithm by Kim and Park (2004) [6] solves a more limited problem where modifications can be done only at the ends of the strings (so-called decremental or incremental edits) and the edit operations have (essentially) unit costs. If the lengths of the two strings are m and n, their algorithm requires \(O(m + n)\) time per modification. We propose a simple and practical algorithm that (1) allows arbitrary non-negative costs for the edit operations and (2) allows the modifications to be done at arbitrary positions. If the latter string is modified at position \(j^*\), our algorithm requires \(O(\min\{rc(m + n), mn\})\) time, where \(r = \min\{j^*, n - j^* + 1\}\) and c is the maximum edit operation cost. This equals \(O(m + n)\) in the simple decremental/incremental unit cost case. Our experiments indicate that the algorithm performs much faster than the theoretical worst-case time limit \(O(mn)\) in the general case with arbitrary edit costs and modification positions. The main practical limitation of the algorithm is its \(\Theta(mn)\) memory requirement for storing the edit distance information.
Mathematical Foundations of Computer Science | 2013
Tomohiro I; Wataru Matsubara; Kouji Shimohira; Shunsuke Inenaga; Hideo Bannai; Masayuki Takeda; Kazuyuki Narisawa; Ayumi Shinohara
We solve the problems of detecting and counting various forms of regularities in a string represented as a Straight Line Program (SLP). Given an SLP of size n that represents a string s of length N, our algorithm computes all runs and squares in s in \(O(n^3 h)\) time and \(O(n^2)\) space, where h is the height of the derivation tree of the SLP. We also show an algorithm to compute all gapped-palindromes in \(O(n^3 h + g n h \log N)\) time and \(O(n^2)\) space, where g is the length of the gap. The key technique of the above solution also allows us to compute the periods and covers of the string in \(O(n^2 h)\) time and \(O(n h (n + \log^2 N))\) time, respectively.
Conference on Current Trends in Theory and Practice of Informatics | 2009
Heikki Hyyrö; Kazuyuki Narisawa; Shunsuke Inenaga
String comparison is a fundamental task in theoretical computer science, with applications in, e.g., spelling correction and computational biology. Edit distance is a classic similarity measure between two given strings A and B. It is the minimum total cost for transforming A into B, or vice versa, using three types of edit operations: single-character insertions, deletions, and/or substitutions.
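The definition above can be illustrated with the textbook dynamic programming recurrence. The following Python sketch is the standard static computation, not the algorithm of this paper; the function name `edit_distance` and the cost parameters `ins_cost`, `del_cost`, and `sub_cost` are names chosen here for illustration.

```python
def edit_distance(a, b, ins_cost=1, del_cost=1, sub_cost=1):
    """Minimum total cost of transforming a into b.

    Classic O(len(a) * len(b)) dynamic program that keeps only two rows.
    prev[j] holds the distance between a[:i-1] and b[:j].
    """
    m, n = len(a), len(b)
    # Transforming the empty prefix of a into b[:j] takes j insertions.
    prev = [j * ins_cost for j in range(n + 1)]
    for i in range(1, m + 1):
        # Transforming a[:i] into the empty string takes i deletions.
        curr = [i * del_cost] + [0] * n
        for j in range(1, n + 1):
            same = a[i - 1] == b[j - 1]
            curr[j] = min(
                prev[j - 1] + (0 if same else sub_cost),  # match or substitute
                prev[j] + del_cost,                       # delete a[i-1]
                curr[j - 1] + ins_cost,                   # insert b[j-1]
            )
        prev = curr
    return prev[n]


print(edit_distance("kitten", "sitting"))  # unit costs
```

With unit costs, transforming "kitten" into "sitting" needs two substitutions and one insertion, so the call above prints 3.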
Conference on Current Trends in Theory and Practice of Informatics | 2017
Shintaro Narisada; Diptarama; Kazuyuki Narisawa; Shunsuke Inenaga; Ayumi Shinohara
We introduce new types of approximate palindromes called single-arm-gapped palindromes (SAGPs). A SAGP contains a gap in either its left or right arm, which is in the form of either \(wguc u^R w^R\) or \(wuc u^Rgw^R\), where w and u are non-empty strings, \(w^R\) and \(u^R\) are their reversed strings respectively, g is a gap, and c is either a single character or the empty string. We classify SAGPs into two groups: those which have \(ucu^R\) as a maximal palindrome (type-1), and the others (type-2). We propose several algorithms to compute all type-1 SAGPs with longest arms occurring in a given string using suffix arrays, and then a linear-time algorithm based on suffix trees. We also show how to compute type-2 SAGPs with longest arms in linear time. We perform some preliminary experiments to evaluate practical performances of the proposed methods.
Conference on Current Trends in Theory and Practice of Informatics | 2017
Yohei Ueki; Diptarama; Masatoshi Kurihara; Yoshiaki Matsuoka; Kazuyuki Narisawa; Ryo Yoshinaka; Hideo Bannai; Shunsuke Inenaga; Ayumi Shinohara
We consider the longest common subsequence (LCS) problem with the restriction that the common subsequence is required to consist of substrings of length at least k. First, we show an O(mn) time algorithm for the problem which gives a better worst-case running time than existing algorithms, where m and n are lengths of the input strings. Furthermore, we mainly consider the LCS in at least k length order-isomorphic substrings problem. We show that the problem can also be solved in O(mn) worst-case time by an easy-to-implement algorithm.
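One way to reach O(mn) time for the first problem is sketched below in Python. It is an illustrative dynamic program under stated assumptions, not necessarily the formulation used in the paper: the function name `lcs_at_least_k` and the tables `L`, `M`, and `C` are names introduced here, and k is assumed to be at least 1. `L[i][j]` is the length of the longest common suffix of the two prefixes, `C[i][j]` is the answer for the prefixes, and `M[i][j]` is the best answer that ends with a common block of length at least k at positions i and j.

```python
def lcs_at_least_k(a, b, k):
    """Length of a longest common subsequence of a and b that can be split
    into blocks, each a common substring of length at least k (k >= 1)."""
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]  # longest common suffix length
    M = [[0] * (n + 1) for _ in range(m + 1)]  # best value ending with a block here
    C = [[0] * (n + 1) for _ in range(m + 1)]  # best value for the prefixes
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            best = max(C[i - 1][j], C[i][j - 1])
            if L[i][j] >= k:
                # Either the last block has length exactly k ...
                M[i][j] = C[i - k][j - k] + k
                # ... or it extends a block that already ended at (i-1, j-1).
                if L[i][j] > k:
                    M[i][j] = max(M[i][j], M[i - 1][j - 1] + 1)
                best = max(best, M[i][j])
            C[i][j] = best
    return C[m][n]


print(lcs_at_least_k("abcdxy", "abzcdxy", 2))
```

In the example the common subsequence "abcdxy" splits into the blocks "ab" and "cdxy", both of length at least 2, so the printed value is 6. With k = 1 the recurrence reduces to the ordinary LCS.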
String Processing and Information Retrieval | 2012
Kazuhiko Kusano; Kazuyuki Narisawa; Ayumi Shinohara
A run (also called maximal repetition) in a word is a non-extendable repetition. Finding the maximum number ρ(n) of runs in a string of length n is a challenging problem. Although it is known that ρ(n)≤1.029n for any n and there exists large n such that ρ(n)≥0.945n, the exact value of ρ(n) is still unknown. Several algorithms have been proposed to count runs in a string efficiently, and ρ(n) can be obtained for small n by these algorithms. In this paper, we focus on computing ρ(n) for given length parameter n, instead of exhaustively counting all runs for every string of length n. We report exact values of ρ(n) for binary strings for n≤66, together with the strings which contain ρ(n) runs.
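For reference, the exhaustive baseline that the paper aims to avoid can be sketched in a few lines of Python. This is a brute-force illustration only, not the method of the paper; the function names `runs` and `max_runs_binary` are chosen here, and the code assumes the usual definition of a run as a maximal substring whose smallest period p satisfies length ≥ 2p. Intervals are reported as 0-based half-open pairs.

```python
from itertools import product


def runs(s):
    """Return all runs of s as sorted triples (start, end, period).

    s[start:end] has smallest period `period`, length >= 2 * period,
    and cannot be extended to the left or right with the same period.
    """
    n = len(s)
    found = {}
    # Periods are tried in increasing order, so the first period recorded
    # for an interval is its smallest one.
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if s[k] != s[k + p]:
                k += 1
                continue
            start = k
            while k + p < n and s[k] == s[k + p]:
                k += 1
            # s[start : k + p] is a maximal stretch with period p.
            if k - start >= p:  # length is at least 2p
                found.setdefault((start, k + p), p)
    return sorted((i, j, p) for (i, j), p in found.items())


def max_runs_binary(n):
    """Brute-force maximum number of runs over all binary strings of length n."""
    return max(len(runs("".join(t))) for t in product("ab", repeat=n))


print(runs("aabaab"))
print(max_runs_binary(5))
```

For "aabaab" this reports the two occurrences of "aa" and the whole string with period 3. The brute-force maximum enumerates all 2^n strings, so it is only usable for very small n, which is the motivation for the more careful search described in the abstract.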