Kazuyuki Narisawa | Researchain

Publication

Featured research published by Kazuyuki Narisawa.

Discovery Science | 2007

Unsupervised spam detection based on string alienness measures

Kazuyuki Narisawa; Hideo Bannai; Kohei Hatano; Masayuki Takeda

We propose an unsupervised method for detecting spam documents from a given set of documents, based on equivalence relations on strings. We give three measures for quantifying the alienness (i.e. how different they are from others) of substrings within the documents. A document is then classified as spam if it contains a substring that is in an equivalence class with a high degree of alienness. The proposed method is unsupervised, language independent, and scalable. Computational experiments conducted on data collected from Japanese web forums show that the method successfully discovers spam.

Combinatorial Pattern Matching | 2007

Efficient computation of substring equivalence classes with suffix arrays

Kazuyuki Narisawa; Shunsuke Inenaga; Hideo Bannai; Masayuki Takeda

This paper considers enumeration of substring equivalence classes introduced by Blumer et al. [1]. They used the equivalence classes to define an index structure called compact directed acyclic word graphs (CDAWGs). In text analysis, considering these equivalence classes is useful since they group together redundant substrings with essentially identical occurrences. In this paper, we present how to enumerate those equivalence classes using suffix arrays. Our algorithm uses rank and lcp arrays for traversing the corresponding suffix trees, but does not need any other additional data structure. The algorithm runs in time linear in the length of the input string. We show experimental results comparing the running time and space consumption of our algorithm with those of suffix tree and CDAWG-based approaches.
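
As a small illustration of the arrays this abstract mentions, the following Python sketch builds a suffix array by naive sorting and then derives the rank and lcp arrays with the standard Kasai-style scan. It does not enumerate the equivalence classes themselves, and it is not the authors' implementation: the function names are illustrative, and the naive sort is chosen for brevity, so the construction here is not linear-time.

```python
def suffix_array(s):
    # Naive construction: sort suffix start positions by the suffix text.
    # Fine for short inputs; real linear-time constructions are more involved.
    return sorted(range(len(s)), key=lambda i: s[i:])


def rank_and_lcp(s, sa):
    n = len(s)
    # rank[i] = position of the suffix starting at i within the suffix array
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    # lcp[r] = longest common prefix of suffixes sa[r - 1] and sa[r]; lcp[0] = 0
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # the next suffix shares at least h - 1 characters
    return rank, lcp


sa = suffix_array("banana")
rank, lcp = rank_and_lcp("banana", sa)
print(sa, rank, lcp)
```

Adjacent suffix-array entries with a large lcp value share a long prefix, which is the kind of information a suffix-tree traversal would otherwise read off internal nodes.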

Information and Computation | 2015

Detecting regularities on grammar-compressed strings

Tomohiro I; Wataru Matsubara; Kouji Shimohira; Shunsuke Inenaga; Hideo Bannai; Masayuki Takeda; Kazuyuki Narisawa; Ayumi Shinohara

We address the problems of detecting and counting various forms of regularities in a string represented as a straight-line program (SLP), which is essentially a context-free grammar in Chomsky normal form. Given an SLP of size \(n\) that represents a string \(s\) of length \(N\), our algorithm computes all runs and squares in \(s\) in \(O(n^3 h)\) time and \(O(n^2)\) space, where \(h\) is the height of the derivation tree of the SLP. We also show an algorithm to compute all gapped-palindromes in \(O(n^3 h + g n h \log N)\) time and \(O(n^2)\) space, where \(g\) is the length of the gap. As one of the main components of the above solution, we propose a new technique called approximate doubling, which seems to be a useful tool for a wide range of algorithms on SLPs. Indeed, we show that the technique can be used to compute the periods and covers of the string in \(O(n^2 h)\) time and \(O(n h (n + \log^2 N))\) time, respectively.
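
To make the SLP representation concrete, here is a minimal Python sketch. It assumes a hypothetical encoding in which each rule maps a variable either to a single character or to a pair of variables, and it counts a terminal rule as height 1 (a convention chosen here, not taken from the paper). It shows only how lengths, height, and random access can be computed without expanding the string; it does not implement the regularity-detection algorithms.

```python
def slp_tables(rules):
    """Return (length, height) dicts for every variable, without expansion."""
    length, height = {}, {}

    def visit(x):
        if x in length:
            return
        rule = rules[x]
        if isinstance(rule, str):          # terminal rule X -> c
            length[x], height[x] = 1, 1    # assumed convention: leaf height 1
        else:                              # binary rule X -> Y Z
            left, right = rule
            visit(left)
            visit(right)
            length[x] = length[left] + length[right]
            height[x] = 1 + max(height[left], height[right])

    for x in rules:
        visit(x)
    return length, height


def char_at(rules, length, x, i):
    """Character at 0-based position i of the string derived from x."""
    while not isinstance(rules[x], str):
        left, right = rules[x]
        if i < length[left]:
            x = left
        else:
            i -= length[left]
            x = right
    return rules[x]


def expand(rules, x):
    """Fully decompress x (may be exponentially long; for checking only)."""
    rule = rules[x]
    return rule if isinstance(rule, str) else expand(rules, rule[0]) + expand(rules, rule[1])


# Hypothetical example: an SLP for a short Fibonacci-like string.
FIB = {"X1": "b", "X2": "a", "X3": ("X2", "X1"),
       "X4": ("X3", "X2"), "X5": ("X4", "X3")}
length, height = slp_tables(FIB)
print(expand(FIB, "X5"), length["X5"], height["X5"])
```

Each random access walks one root-to-leaf path, so it costs time proportional to the height, which is one reason the height appears as a parameter in bounds for SLP algorithms.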

Conference on Current Trends in Theory and Practice of Informatics | 2013

Permuted Pattern Matching on Multi-track Strings

Takashi Katsura; Kazuyuki Narisawa; Ayumi Shinohara; Hideo Bannai; Shunsuke Inenaga

We propose a new variant of pattern matching on a multi-set of strings, or multi-tracks, called permuted-matching, that looks for occurrences of a multi-track pattern of length m with M tracks, in a multi-track text of length n with N tracks over Σ. We show that the problem can be solved in O(nN log|Σ|) time and O(mM + N) space, and further in O(nN) time and space when assuming an integer alphabet. For the case where the number of strings in the text and pattern are equal (full-permuted-matching), we propose a new index structure called the multi-track suffix tree, as well as an O(nN log|Σ|) time and O(nN) space construction algorithm. Using this structure, we can solve the full-permuted-matching problem in O(mN log|Σ| + occ) time for any multi-track pattern of length m with N tracks which occurs occ times.
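
The following Python sketch illustrates the full-permuted-matching problem with a naive baseline. It assumes the reading that a position matches when the multiset of length-m windows taken from the text tracks equals the multiset of pattern tracks, that is, the tracks agree up to reordering. The function name is hypothetical, and the code is neither the multi-track suffix tree nor any algorithm from the paper.

```python
from collections import Counter


def full_permuted_matches(text_tracks, pattern_tracks):
    """Naive check at every position; assumes equal, non-zero track counts."""
    if not text_tracks or len(text_tracks) != len(pattern_tracks):
        raise ValueError("full-permuted-matching needs the same number of tracks")
    n = len(text_tracks[0])
    m = len(pattern_tracks[0])
    target = Counter(pattern_tracks)  # multiset of pattern tracks
    hits = []
    for i in range(n - m + 1):
        window = Counter(track[i:i + m] for track in text_tracks)
        if window == target:          # same tracks, in any order
            hits.append(i)
    return hits


print(full_permuted_matches(["abcab", "bcabc"], ["bc", "ab"]))
```

This baseline re-reads every window from scratch, so it is useful only as a reference against which faster methods can be checked.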

Journal of Discrete Algorithms | 2015

Dynamic edit distance table under a general weighted cost function

Heikki Hyyrö; Kazuyuki Narisawa; Shunsuke Inenaga

We discuss the problem of edit distance computation under a dynamic setting, where one of the two compared strings may be modified by single-character edit operations and we wish to keep the edit distance information between the strings up-to-date. A previous algorithm by Kim and Park (2004) solves a more limited problem where modifications can be done only at the ends of the strings (so-called decremental or incremental edits) and the edit operations have (essentially) unit costs. If the lengths of the two strings are \(m\) and \(n\), their algorithm requires \(O(m + n)\) time per modification. We propose a simple and practical algorithm that (1) allows arbitrary non-negative costs for the edit operations and (2) allows the modifications to be done at arbitrary positions. If the latter string is modified at position \(j^*\), our algorithm requires \(O(\min\{rc(m + n), mn\})\) time, where \(r = \min\{j^*, n - j^* + 1\}\) and \(c\) is the maximum edit operation cost. This equals \(O(m + n)\) in the simple decremental/incremental unit cost case. Our experiments indicate that the algorithm performs much faster than the theoretical worst-case time limit \(O(mn)\) in the general case with arbitrary edit costs and modification positions. The main practical limitation of the algorithm is its \(\Theta(mn)\) memory requirement for storing the edit distance information.

Mathematical Foundations of Computer Science | 2013

Detecting Regularities on Grammar-Compressed Strings

Tomohiro I; Wataru Matsubara; Kouji Shimohira; Shunsuke Inenaga; Hideo Bannai; Masayuki Takeda; Kazuyuki Narisawa; Ayumi Shinohara

We solve the problems of detecting and counting various forms of regularities in a string represented as a Straight Line Program (SLP). Given an SLP of size \(n\) that represents a string \(s\) of length \(N\), our algorithm computes all runs and squares in \(s\) in \(O(n^3 h)\) time and \(O(n^2)\) space, where \(h\) is the height of the derivation tree of the SLP. We also show an algorithm to compute all gapped-palindromes in \(O(n^3 h + g n h \log N)\) time and \(O(n^2)\) space, where \(g\) is the length of the gap. The key technique of the above solution also allows us to compute the periods and covers of the string in \(O(n^2 h)\) time and \(O(n h (n + \log^2 N))\) time, respectively.

Conference on Current Trends in Theory and Practice of Informatics | 2009

Dynamic Edit Distance Table under a General Weighted Cost Function

Heikki Hyyrö; Kazuyuki Narisawa; Shunsuke Inenaga

String comparison is a fundamental task in theoretical computer science, with applications in, for example, spelling correction and computational biology. Edit distance is a classic similarity measure between two given strings A and B. It is the minimum total cost for transforming A into B, or vice versa, using three types of edit operations: single-character insertions, deletions, and/or substitutions.
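
The definition above corresponds to the textbook dynamic program sketched below in Python. This is the static table computation only, with one fixed cost per operation type as a simplifying assumption; it is not the dynamic update algorithm of the paper, and the parameter names are illustrative.

```python
def edit_distance(a, b, ins_cost=1, del_cost=1, sub_cost=1):
    """Minimum total cost of transforming a into b (costs assumed >= 0)."""
    m, n = len(a), len(b)
    # d[i][j] = minimum cost of transforming a[:i] into b[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * del_cost          # delete every character of a[:i]
    for j in range(1, n + 1):
        d[0][j] = j * ins_cost          # insert every character of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = 0 if a[i - 1] == b[j - 1] else sub_cost
            d[i][j] = min(
                d[i - 1][j] + del_cost,     # delete a[i-1]
                d[i][j - 1] + ins_cost,     # insert b[j-1]
                d[i - 1][j - 1] + diag,     # match or substitute
            )
    return d[m][n]


print(edit_distance("kitten", "sitting"))
```

The table holds (m + 1)(n + 1) entries, so editing one of the strings and recomputing from scratch costs O(mn) time, which is the cost a dynamic algorithm aims to avoid.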

Conference on Current Trends in Theory and Practice of Informatics | 2017

Computing Longest Single-arm-gapped Palindromes in a String

Shintaro Narisada; Diptarama; Kazuyuki Narisawa; Shunsuke Inenaga; Ayumi Shinohara

We introduce new types of approximate palindromes called single-arm-gapped palindromes (SAGPs). An SAGP contains a gap in either its left or right arm, and is in the form of either \(wguc u^R w^R\) or \(wuc u^Rgw^R\), where w and u are non-empty strings, \(w^R\) and \(u^R\) are their reversed strings respectively, g is a gap, and c is either a single character or the empty string. We classify SAGPs into two groups: those which have \(ucu^R\) as a maximal palindrome (type-1), and the others (type-2). We propose several algorithms to compute all type-1 SAGPs with longest arms occurring in a given string using suffix arrays, and then a linear-time algorithm based on suffix trees. We also show how to compute type-2 SAGPs with longest arms in linear time. We perform some preliminary experiments to evaluate the practical performance of the proposed methods.

Conference on Current Trends in Theory and Practice of Informatics | 2017

Longest Common Subsequence in at Least k Length Order-Isomorphic Substrings

Yohei Ueki; Diptarama; Masatoshi Kurihara; Yoshiaki Matsuoka; Kazuyuki Narisawa; Ryo Yoshinaka; Hideo Bannai; Shunsuke Inenaga; Ayumi Shinohara

We consider the longest common subsequence (LCS) problem with the restriction that the common subsequence is required to consist of substrings of length at least k. First, we show an O(mn) time algorithm for the problem which gives a better worst-case running time than existing algorithms, where m and n are the lengths of the input strings. Furthermore, we mainly consider the problem of finding the LCS in order-isomorphic substrings of length at least k. We show that the problem can also be solved in O(mn) worst-case time by an easy-to-implement algorithm.
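
One way to reach O(mn) time for the first problem (exact matching, not the order-isomorphic variant) is the dynamic program sketched below in Python. It is an illustrative formulation and may differ from the authors' algorithm; the table names `suf`, `end`, and `best` are assumptions introduced here.

```python
def lcs_at_least_k(a, b, k):
    """Length of the longest common subsequence built from common
    substrings (blocks) of length at least k; assumes k >= 1."""
    m, n = len(a), len(b)
    # suf[i][j]:  length of the longest common suffix of a[:i] and b[:j]
    # end[i][j]:  best value when a block of length >= k ends exactly at (i, j)
    # best[i][j]: best value for the prefixes a[:i] and b[:j]
    suf = [[0] * (n + 1) for _ in range(m + 1)]
    end = [[0] * (n + 1) for _ in range(m + 1)]
    best = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                suf[i][j] = suf[i - 1][j - 1] + 1
            cand = 0
            if suf[i][j] >= k:
                # start a fresh block of exactly k characters ending here
                cand = best[i - k][j - k] + k
                if suf[i][j] > k:
                    # or extend a block that already ended at (i-1, j-1)
                    cand = max(cand, end[i - 1][j - 1] + 1)
            end[i][j] = cand
            best[i][j] = max(best[i - 1][j], best[i][j - 1], cand)
    return best[m][n]


print(lcs_at_least_k("ABCBDAB", "BDCABA", 2))
```

With k = 1 the recurrence reduces to the ordinary LCS dynamic program, which gives a quick sanity check.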

String Processing and Information Retrieval | 2012

Computing maximum number of runs in strings

Kazuhiko Kusano; Kazuyuki Narisawa; Ayumi Shinohara

A run (also called maximal repetition) in a word is a non-extendable repetition. Finding the maximum number ρ(n) of runs in a string of length n is a challenging problem. Although it is known that ρ(n)≤1.029n for any n and there exists large n such that ρ(n)≥0.945n, the exact value of ρ(n) is still unknown. Several algorithms have been proposed to count runs in a string efficiently, and ρ(n) can be obtained for small n by these algorithms. In this paper, we focus on computing ρ(n) for given length parameter n, instead of exhaustively counting all runs for every string of length n. We report exact values of ρ(n) for binary strings for n≤66, together with the strings which contain ρ(n) runs.
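
To make the definition of a run concrete, the following Python sketch lists the runs of a single string by brute force: for each candidate period p it finds the maximal stretches where s[i] == s[i + p] and keeps those spanning at least 2p characters. It is a simple quadratic-time illustration with hypothetical function names, not one of the efficient counting algorithms the abstract refers to, and it does not compute ρ(n).

```python
def runs(s):
    """Return the set of runs of s as half-open intervals (start, end)."""
    n = len(s)
    found = set()
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if s[i] != s[i + p]:
                i += 1
                continue
            # extend the stretch of positions k with s[k] == s[k + p]
            k = i
            while k + p < n and s[k] == s[k + p]:
                k += 1
            # s[i : k + p] has period p and cannot be extended either way
            if k - i >= p:                 # length k + p - i is at least 2p
                found.add((i, k + p))      # the set drops repeats found
            i = k                          # again at multiples of the period
    return found


def count_runs(s):
    return len(runs(s))


print(sorted(runs("mississippi")))
```

An interval that is maximal for a period p and at least 2p long is also maximal for its smallest period, so collecting intervals in a set counts each run exactly once.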
