Shunsuke Inenaga | Researchain


Publication

Featured research published by Shunsuke Inenaga.

Combinatorial Pattern Matching | 2005

On-Line Construction of Compact Directed Acyclic Word Graphs

Shunsuke Inenaga; Hiromasa Hoshino; Ayumi Shinohara; Masayuki Takeda; Setsuo Arikawa; Giancarlo Mauri; Giulio Pavesi

Many different index structures, providing efficient solutions to problems related to pattern matching, have been introduced so far. Examples of these structures are suffix trees and directed acyclic word graphs (DAWGs), which can be efficiently constructed in linear time and space. Compact directed acyclic word graphs (CDAWGs) are an index structure preserving some features of both suffix trees and DAWGs, and require less space than both of them. An algorithm which directly constructs CDAWGs in linear time and space was first introduced by Crochemore and Verin, based on McCreight's algorithm for constructing suffix trees. In this work, we present a novel on-line linear-time algorithm that builds the CDAWG for a single string as well as for a set of strings, inspired by Ukkonen's on-line algorithm for constructing suffix trees.

Theoretical Computer Science | 2009

Efficient algorithms to compute compressed longest common substrings and compressed palindromes

Wataru Matsubara; Shunsuke Inenaga; Akira Ishino; Ayumi Shinohara; Tomoyuki Nakamura; Kazuo Hashimoto

This paper studies two problems on compressed strings described in terms of straight line programs (SLPs). One is to compute the length of the longest common substring of two given SLP-compressed strings, and the other is to compute all palindromes of a given SLP-compressed string. In order to solve these problems efficiently (in polynomial time w.r.t. the compressed size), decompression is never feasible, since the decompressed size can be exponentially large. We develop combinatorial algorithms that solve these problems in O(n^4 log n) time with O(n^3) space, and in O(n^4) time with O(n^2) space, respectively, where n is the size of the input SLP-compressed strings.
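To illustrate why decompression is ruled out, here is a minimal sketch (not taken from the paper) of an SLP whose derived string doubles in length with every rule. The rule encoding and the names `slp_length` and `doubling_slp` are assumptions made for this example only.

```python
from typing import Dict, Tuple, Union

# Assumed encoding: a rule is either a single character (terminal rule)
# or a pair of variable names (concatenation rule).
Rule = Union[str, Tuple[str, str]]


def slp_length(rules: Dict[str, Rule], start: str) -> int:
    """Length of the string derived from `start`, computed without decompressing."""
    memo: Dict[str, int] = {}

    def length(var: str) -> int:
        if var not in memo:
            rule = rules[var]
            if isinstance(rule, str):
                memo[var] = 1  # terminal rule derives one character
            else:
                left, right = rule
                memo[var] = length(left) + length(right)
        return memo[var]

    return length(start)


def doubling_slp(n: int) -> Dict[str, Rule]:
    """An SLP with n rules: X0 -> 'a', and Xi -> X(i-1) X(i-1) for i >= 1."""
    rules: Dict[str, Rule] = {"X0": "a"}
    for i in range(1, n):
        rules[f"X{i}"] = (f"X{i - 1}", f"X{i - 1}")
    return rules


# 31 rules describe a string of 2**30 characters.
rules = doubling_slp(31)
derived_length = slp_length(rules, "X30")
```

The length is obtained in time proportional to the number of rules, whereas writing out the derived string would take time and space exponential in that number.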

Mathematical Foundations of Computer Science | 2003

Inferring Strings from Graphs and Arrays

Hideo Bannai; Shunsuke Inenaga; Ayumi Shinohara; Masayuki Takeda

This paper introduces a new problem of inferring strings from graphs, and inferring strings from arrays. Given a graph G or an array A, we infer a string that suits the graph, or the array, under some condition. Firstly, we solve the problem of finding a string w such that the directed acyclic subsequence graph (DASG) of w is isomorphic to a given graph G. Secondly, we consider directed acyclic word graphs (DAWGs) in terms of string inference. Finally, we consider the problem of finding a string w of a minimal size alphabet, such that the suffix array (SA) of w is identical to a given permutation p = p_1, ..., p_n of integers 1, ..., n. Each of our three algorithms solving the above problems runs in linear time with respect to the input size.
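As a small illustration of the suffix array variant of the problem, the sketch below builds a suffix array naively and checks whether a candidate string realizes a given permutation. It is not the paper's linear-time inference algorithm; the function names are hypothetical, and the 1-based indexing follows the abstract.

```python
from typing import List


def suffix_array(w: str) -> List[int]:
    """Naive suffix array: 1-based start positions of suffixes in lexicographic order."""
    return [i + 1 for i in sorted(range(len(w)), key=lambda i: w[i:])]


def realizes(w: str, p: List[int]) -> bool:
    """True if the suffix array of w is identical to the permutation p."""
    return suffix_array(w) == p


sa_banana = suffix_array("banana")  # [6, 4, 2, 1, 5, 3]
```

The inference problem runs in the opposite direction: given only the permutation, find a string over the smallest possible alphabet for which `realizes` would return true.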

SIAM Journal on Computing | 2017

The "runs" theorem

Hideo Bannai; Tomohiro I; Shunsuke Inenaga; Yuto Nakashima; Masayuki Takeda; Kazuya Tsuruta

We give a new characterization of maximal repetitions (or runs) in strings based on Lyndon words. The characterization leads to a proof of what was known as the “runs” conjecture [R. M. Kolpakov an...

Language and Automata Theory and Applications | 2009

Counting Parameterized Border Arrays for a Binary Alphabet

Tomohiro I; Shunsuke Inenaga; Hideo Bannai; Masayuki Takeda

The parameterized pattern matching problem is a kind of pattern matching problem, where a pattern is considered to occur in a text when there exists a renaming bijection on the alphabet with which the pattern can be transformed into a substring of the text. A parameterized border array (p-border array) is an analogue of a border array of a standard string, which is also known as the failure function of the Morris-Pratt pattern matching algorithm. In this paper we present a linear time algorithm to verify if a given integer array is a valid p-border array for a binary alphabet. We also show a linear time algorithm to compute all binary parameterized strings sharing a given p-border array. In addition, we give an algorithm which computes all p-border arrays of length at most n, where n is a given threshold. This algorithm runs in time linear in the number of output p-border arrays.
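The two notions can be compared with a short sketch. `border_array` is the standard Morris-Pratt failure function; `p_border_array` is a naive, non-linear-time reading of the parameterized version, assuming every symbol may be renamed (as for a binary alphabet of parameter symbols). Neither is the paper's verification algorithm, and all function names are chosen for this example.

```python
from typing import List, Tuple


def border_array(s: str) -> List[int]:
    """border[i] = length of the longest proper border of s[:i+1] (Morris-Pratt)."""
    border = [0] * len(s)
    k = 0
    for i in range(1, len(s)):
        while k > 0 and s[i] != s[k]:
            k = border[k - 1]  # fall back to the next shorter border
        if s[i] == s[k]:
            k += 1
        border[i] = k
    return border


def canonical(s: str) -> Tuple[int, ...]:
    """Rename symbols by order of first appearance; equal forms mean a p-match."""
    ids = {}
    return tuple(ids.setdefault(c, len(ids)) for c in s)


def p_border_array(s: str) -> List[int]:
    """Naive p-border array: longest proper prefix that p-matches a suffix."""
    result = []
    for i in range(1, len(s) + 1):
        best = 0
        for j in range(i - 1, 0, -1):
            if canonical(s[:j]) == canonical(s[i - j:i]):
                best = j
                break
        result.append(best)
    return result
```

For example, `border_array("ab")` is `[0, 0]`, while `p_border_array("ab")` is `[0, 1]`, because renaming a to b maps the prefix "a" onto the suffix "b".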

Journal of Discrete Algorithms | 2013

Fast q-gram mining on SLP compressed strings

Keisuke Goto; Hideo Bannai; Shunsuke Inenaga; Masayuki Takeda

We present simple and efficient algorithms for calculating q-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP of size n that represents string T, we present an O(qn) time and space algorithm that computes the occurrence frequencies of all q-grams in T. Computational experiments show that our algorithm and its variation are practical for small q, actually running faster on various real string data, compared to algorithms that work on the uncompressed text. We also discuss applications in data mining and classification of string data, for which our algorithms can be useful.

Journal of Bioinformatics and Computational Biology | 2004

EFFICIENTLY FINDING REGULATORY ELEMENTS USING CORRELATION WITH GENE EXPRESSION

Hideo Bannai; Shunsuke Inenaga; Ayumi Shinohara; Masayuki Takeda; Satoru Miyano

We present an efficient algorithm for detecting putative regulatory elements in the upstream DNA sequences of genes, using gene expression information obtained from microarray experiments. Based on a generalized suffix tree, our algorithm looks for motif patterns whose appearance in the upstream region is most correlated with the expression levels of the genes. We are able to find the optimal pattern, in time linear in the total length of the upstream sequences. We implement and apply our algorithm to publicly available microarray gene expression data, and show that our method is able to discover biologically significant motifs, including various motifs which have been reported previously using the same data set. We further discuss applications for which the efficiency of the method is essential, as well as possible extensions to our algorithm.

Discovery Science | 2001

A Practical Algorithm to Find the Best Episode Patterns

Masahiro Hirao; Shunsuke Inenaga; Ayumi Shinohara; Masayuki Takeda; Setsuo Arikawa

An episode pattern is a generalization of a subsequence pattern in which the length of the substring containing the subsequence is bounded. Given two sets of strings, consider an optimization problem to find a best episode pattern that is common to one set but not common in the other set. The problem is known to be NP-hard. We give a practical algorithm to solve it exactly.
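A minimal sketch of the matching notion described above, assuming an episode pattern is given as a string together with a window bound k. It only tests whether one pattern occurs in one text; it does not address the optimization problem over two sets of strings, and the name `episode_occurs` is hypothetical.

```python
def episode_occurs(text: str, pattern: str, k: int) -> bool:
    """True if some substring of `text` of length at most k contains
    `pattern` as a subsequence."""
    if not pattern:
        return True
    for start in range(len(text)):
        if text[start] != pattern[0]:
            continue
        # Greedily match the pattern inside the window text[start:start + k].
        j = 0
        for pos in range(start, min(len(text), start + k)):
            if text[pos] == pattern[j]:
                j += 1
                if j == len(pattern):
                    return True
    return False
```

For instance, the pattern "abc" occurs in "axbxxc" as a subsequence spanning six characters, so it matches with k = 6 but not with k = 5.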

Combinatorial Pattern Matching | 2006

On-Line linear-time construction of word suffix trees

Shunsuke Inenaga; Masayuki Takeda

Suffix trees are the key data structure for text string matching, and are used in wide application areas such as bioinformatics and data compression. Sparse suffix trees are a kind of suffix tree that represents only a subset of the suffixes of the input string. In this paper we study word suffix trees, which are one variation of sparse suffix trees. Let D be a dictionary of words and w be a string in D+, namely, w is a sequence w1 ⋯ wk of k words in D. The word suffix tree of w w.r.t. D is a path-compressed trie that represents only the k suffixes of the form wi ⋯ wk. A typical example of its application is word- and phrase-level search on natural language documents. Andersson et al. proposed an algorithm to build word suffix trees in O(n) expected time with O(k) space. In this paper we present a new word suffix tree construction algorithm with O(n) running time and O(k) space in the worst case. Our algorithm is on-line, which means that it can sequentially process the characters in the input, one by one, from left to right.

Conference on Current Trends in Theory and Practice of Informatics | 2014

Shortest Unique Substrings Queries in Optimal Time

Kazuya Tsuruta; Shunsuke Inenaga; Hideo Bannai; Masayuki Takeda

We present an optimal, linear time algorithm for the shortest unique substring problem, thus improving the algorithm by Pei et al. (ICDE 2013). Our implementation is simple and based on suffix arrays. Computational experiments show that our algorithm is much more efficient in practice, compared to that of Pei et al.
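To make the problem statement concrete, here is a brute-force sketch, assuming the query is a 0-based position p and the answer is a shortest substring that covers p and occurs exactly once in the text. It is far from linear time and uses no suffix arrays, so it is not the algorithm of the paper; the function names are chosen for this example.

```python
from typing import Tuple


def count_occurrences(text: str, sub: str) -> int:
    """Number of (possibly overlapping) occurrences of `sub` in `text`."""
    count = 0
    i = text.find(sub)
    while i != -1:
        count += 1
        i = text.find(sub, i + 1)
    return count


def shortest_unique_substring(text: str, p: int) -> Tuple[int, int]:
    """Half-open interval (start, end) of a shortest substring of `text`
    that covers position p and occurs exactly once."""
    n = len(text)
    if not 0 <= p < n:
        raise ValueError("p must be a valid position in text")
    for length in range(1, n + 1):
        # Try every window of this length that contains position p.
        for start in range(max(0, p - length + 1), min(p, n - length) + 1):
            if count_occurrences(text, text[start:start + length]) == 1:
                return start, start + length
    raise AssertionError("unreachable: the whole text occurs exactly once")
```

In "abab", position 1 is covered by the unique substring "ba", so the query returns `(1, 3)`; position 0 needs the longer "aba", giving `(0, 3)`.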
