Ayumi Shinohara | Researchain

Publication

Featured researches published by Ayumi Shinohara.

algorithmic learning theory | 1991

Teachability in computational learning

Ayumi Shinohara; Satoru Miyano

This paper considers computational learning from the viewpoint of teaching. We introduce a notion of teachability with which we establish a relationship between learnability and teachability. We also discuss the complexity issues of a teacher in relation to learning.

combinatorial pattern matching | 1997

An Improved Pattern Matching Algorithm for Strings in Terms of Straight-Line Programs

Masamichi Miyazaki; Ayumi Shinohara; Masayuki Takeda

We show an efficient pattern-matching algorithm for strings that are succinctly described in terms of straight-line programs, in which the constants are symbols and the only operation is concatenation. In this paper, both the text T and the pattern P are given by straight-line programs. The length of the text T (pattern P, resp.) may grow exponentially with respect to its description size ‖T‖ = n (‖P‖ = m, resp.). We show a new combinatorial property concerning the periodic occurrences of a pattern in a text. Based on this property, we develop an O(n²m²) time algorithm using O(nm) space, which outputs a compact representation of all occurrences of P in T. This is superior to the algorithm proposed by Karpinski et al. [11], which runs in O((n+m)⁴ log(n+m)) time using O((n+m)³) space, and finds only one occurrence. Moreover, our algorithm is much simpler than theirs.
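To make the idea of a "compact representation of all occurrences" concrete, here is a minimal Python sketch. It assumes the representation is a list of arithmetic progressions, and it works on an ordinary uncompressed string, so it only illustrates the output format and is not the algorithm of the paper. The function names `occurrences` and `to_progressions` are hypothetical.

```python
def occurrences(text, pattern):
    """All start positions of pattern in text (naive scan, overlaps allowed)."""
    positions = []
    i = text.find(pattern)
    while i != -1:
        positions.append(i)
        i = text.find(pattern, i + 1)
    return positions


def to_progressions(positions):
    """Greedily group sorted positions into (start, step, count) runs.

    A run (s, d, k) stands for the positions s, s + d, ..., s + (k - 1) * d.
    A single isolated position is stored with step 0 and count 1.
    """
    runs = []
    i, n = 0, len(positions)
    while i < n:
        start = positions[i]
        if i + 1 == n:
            runs.append((start, 0, 1))
            break
        step = positions[i + 1] - start
        j = i + 1
        # Extend the run while the gap between neighbours stays the same.
        while j + 1 < n and positions[j + 1] - positions[j] == step:
            j += 1
        runs.append((start, step, j - i + 1))
        i = j + 1
    return runs


text = "ab" * 5 + "c" + "ab" * 3
print(to_progressions(occurrences(text, "abab")))
```

When a periodic pattern occurs many times in a periodic text, the number of runs stays small even though the number of individual positions grows, which is what makes such a representation compact.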

New Generation Computing | 1993

A machine discovery from amino acid sequences by decision trees over regular patterns

Setsuo Arikawa; Satoru Miyano; Ayumi Shinohara; Yasuhito Mukouchi; Takeshi Shinohara

This paper describes a machine learning system that discovered a “negative motif” in transmembrane domain identification from amino acid sequences, and reports its experiments on protein data using the PIR database. We introduce a decision tree whose nodes are labeled with regular patterns. As a hypothesis, the system produces such a decision tree for a small number of randomly chosen positive and negative examples from PIR. Experiments show that our system finds reasonable hypotheses very successfully. As a theoretical foundation, we show that the class of languages defined by decision trees of depth at most d over k-variable regular patterns is polynomial-time learnable in the sense of probably approximately correct (PAC) learning for any fixed d, k ≥ 0.

combinatorial pattern matching | 2005

On-Line Construction of Compact Directed Acyclic Word Graphs

Shunsuke Inenaga; Hiromasa Hoshino; Ayumi Shinohara; Masayuki Takeda; Setsuo Arikawa; Giancarlo Mauri; Giulio Pavesi

Many different index structures, providing efficient solutions to problems related to pattern matching, have been introduced so far. Examples of these structures are suffix trees and directed acyclic word graphs (DAWGs), which can be efficiently constructed in linear time and space. Compact directed acyclic word graphs (CDAWGs) are an index structure preserving some features of both suffix trees and DAWGs, and require less space than both of them. An algorithm which directly constructs CDAWGs in linear time and space was first introduced by Crochemore and Verin, based on McCreight's algorithm for constructing suffix trees. In this work, we present a novel on-line linear-time algorithm that builds the CDAWG for a single string as well as for a set of strings, inspired by Ukkonen's on-line algorithm for constructing suffix trees.

data compression conference | 1998

Multiple pattern matching in LZW compressed text

Takuya Kida; Masayuki Takeda; Ayumi Shinohara; Masamichi Miyazaki; Setsuo Arikawa

We address the problem of searching in LZW compressed text directly, and present a new algorithm for finding multiple patterns by simulating the move of the Aho-Corasick (1975) pattern matching machine. The new algorithm finds all occurrences of multiple patterns, whereas the algorithm proposed by Amir, Benson, and Farach (see Journal of Computer and System Sciences, vol. 52, p. 299-307, 1996) finds only the first occurrence of a single pattern. The new algorithm runs in O(n + m² + r) time using O(n + m²) space, where n is the length of the compressed text, m is the total length of the patterns, and r is the number of occurrences of the patterns. We implemented a simple version of the algorithm, and showed that it is approximately twice as fast as decompression followed by a search using the Aho-Corasick machine.
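For orientation, the baseline that the abstract compares against (decompress first, then search) can be sketched in Python as follows. This is a textbook LZW codec with unbounded code width, and the search step uses a naive `bytes.find` loop as a stand-in for the Aho-Corasick machine. The function names are hypothetical, and none of this reproduces the paper's algorithm, which searches without decompressing.

```python
def lzw_compress(data: bytes):
    """Textbook LZW: emit one integer code per longest known phrase."""
    dictionary = {bytes([i]): i for i in range(256)}
    w = b""
    codes = []
    for b in data:
        wc = w + bytes([b])
        if wc in dictionary:
            w = wc
        else:
            codes.append(dictionary[w])
            dictionary[wc] = len(dictionary)  # register the new phrase
            w = bytes([b])
    if w:
        codes.append(dictionary[w])
    return codes


def lzw_decompress(codes):
    """Inverse of lzw_compress, rebuilding the dictionary on the fly."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            entry = w + w[:1]  # phrase defined by the code that uses it
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        dictionary[len(dictionary)] = w + entry[:1]
        w = entry
    return b"".join(out)


def find_all(text: bytes, patterns):
    """Naive multi-pattern search; a stand-in for Aho-Corasick."""
    hits = []
    for p in patterns:
        start = text.find(p)
        while start != -1:
            hits.append((start, p))
            start = text.find(p, start + 1)
    return sorted(hits)


codes = lzw_compress(b"abababab")
print(find_all(lzw_decompress(codes), [b"aba", b"bb"]))
```

The cost of this baseline is proportional to the decompressed length, whereas the bound stated in the abstract depends on the compressed length n.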

international conference on algorithms and complexity | 2000

Speeding Up Pattern Matching by Text Compression

Yusuke Shibata; Takuya Kida; Shuichi Fukamachi; Masayuki Takeda; Ayumi Shinohara; Takeshi Shinohara; Setsuo Arikawa

Byte pair encoding (BPE) is a simple universal text compression scheme. Decompression is very fast and requires small work space. Moreover, it is easy to decompress an arbitrary part of the original text. However, it has not been so popular, since the compression is rather slow and the compression ratio is not as good as that of other methods such as Lempel-Ziv type compression. In this paper, we bring out a potential advantage of BPE compression. We show that it is well suited, from a practical viewpoint, to compressed pattern matching, where the goal is to find a pattern directly in compressed text without decompressing it explicitly. We compare running times to find a pattern in (1) BPE compressed files, (2) Lempel-Ziv-Welch compressed files, and (3) original text files, in various situations. Experimental results show that pattern matching in BPE compressed text is even faster than matching in the original text. Thus the BPE compression reduces not only the disk space but also the searching time.
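The compression scheme itself is short enough to sketch. The following Python code is a minimal, unoptimized illustration of byte pair encoding, assuming the usual formulation in which the most frequent adjacent byte pair is repeatedly replaced by a byte value that does not occur in the data. The names `bpe_compress` and `bpe_decompress` and the in-memory substitution table are hypothetical, not the file format used in the paper's experiments.

```python
from collections import Counter


def bpe_compress(data: bytes):
    """Return (compressed bytes, table mapping new byte -> replaced pair)."""
    seq = list(data)
    present = set(seq)
    unused = [b for b in range(256) if b not in present]
    table = {}
    while unused:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so nothing more to gain
        sym = unused.pop()
        table[sym] = pair
        out, i = [], 0
        while i < len(seq):
            # Replace left-to-right, skipping both bytes of a matched pair.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return bytes(seq), table


def bpe_decompress(compressed: bytes, table):
    """Expand every substituted byte back into the bytes it stands for."""
    def expand(sym):
        if sym in table:
            left, right = table[sym]
            return expand(left) + expand(right)
        return bytes([sym])

    return b"".join(expand(s) for s in compressed)


packed, table = bpe_compress(b"abababab abab")
assert bpe_decompress(packed, table) == b"abababab abab"
```

Because each compressed byte expands independently through the table, any part of the text can be decoded on its own, which matches the abstract's remark about decompressing an arbitrary part of the original text.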

Theoretical Computer Science | 2003

Collage system: a unifying framework for compressed pattern matching

Takuya Kida; Tetsuya Matsumoto; Yusuke Shibata; Masayuki Takeda; Ayumi Shinohara; Setsuo Arikawa

We introduce a general framework which is suitable to capture the essence of compressed pattern matching according to various dictionary-based compressions. It is a formal system to represent a string by a pair consisting of a dictionary D and a sequence S of phrases in D. The basic operations are concatenation, truncation, and repetition. We also propose a compressed pattern matching algorithm for the framework. The goal is to find all occurrences of a pattern in a text without decompression, which is one of the most active topics in string matching. Our framework includes such compression methods as the Lempel-Ziv family (LZ77, LZSS, LZ78, LZW), RE-PAIR, SEQUITUR, and the static dictionary-based method. The proposed algorithm runs in O((‖D‖ + |S|) · height(D) + m² + r) time with O(‖D‖ + m²) space, where ‖D‖ is the size of D, |S| is the number of tokens in S, height(D) is the maximum dependency of tokens in D, m is the pattern length, and r is the number of pattern occurrences. For a subclass of the framework that contains no truncation, the time complexity is O(‖D‖ + |S| + m² + r).
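A small Python sketch may help picture the pair (D, S). It assumes a hypothetical tuple encoding for the dictionary rules, covering the three basic operations named in the abstract, and it simply expands the text. It does not perform compressed pattern matching, and the rule names are illustrative rather than the paper's notation.

```python
# Hypothetical rule encoding (not the paper's notation):
#   ("sym", c)             a single symbol c
#   ("cat", X, Y)          concatenation of tokens X and Y
#   ("rep", X, k)          token X repeated k times
#   ("drop_prefix", X, k)  token X with its first k symbols removed
#   ("drop_suffix", X, k)  token X with its last k symbols removed

def expand(D, token):
    """Return the string that a dictionary token stands for."""
    kind, *args = D[token]
    if kind == "sym":
        return args[0]
    if kind == "cat":
        return expand(D, args[0]) + expand(D, args[1])
    if kind == "rep":
        return expand(D, args[0]) * args[1]
    if kind == "drop_prefix":
        return expand(D, args[0])[args[1]:]
    if kind == "drop_suffix":
        s = expand(D, args[0])
        return s[:len(s) - args[1]]
    raise ValueError(f"unknown rule kind: {kind}")


def decode(D, S):
    """Concatenate the expansions of the tokens in sequence S."""
    return "".join(expand(D, t) for t in S)


D = {
    "A": ("sym", "a"),
    "B": ("sym", "b"),
    "AB": ("cat", "A", "B"),
    "R": ("rep", "AB", 3),          # ababab
    "T": ("drop_prefix", "R", 1),   # babab
}
S = ["R", "T", "A"]
print(decode(D, S))
```

In this encoding, a method such as LZW would use only "sym" and "cat" rules, which falls into the truncation-free subclass mentioned at the end of the abstract.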

combinatorial pattern matching | 1995

Pattern-matching for strings with short descriptions

Marek Karpinski; Wojciech Rytter; Ayumi Shinohara

We consider strings which are succinctly described. The description is in terms of straight-line programs in which the constants are symbols and the only operation is concatenation. Such descriptions correspond to systems of recurrences or to context-free grammars generating single words. The descriptive size of a string is the length n of a straight-line program (or size of a grammar) which defines this string. Usually the strings of descriptive size n are of exponential length. Fibonacci and Thue-Morse words are examples of such strings. We show that for a pattern P and text T of descriptive sizes m, n, an occurrence of P in T can be found (if there is any) in time polynomial with respect to n. This is nontrivial, since the actual lengths of P and T could be exponential, and none of the known string-matching algorithms is directly applicable. Our first tool is the periodicity lemma, which allows us to represent some sets of exponentially many positions in terms of feasibly many arithmetic progressions. The second tool is arithmetic: a simple application of Euclid's algorithm. Hence a textual problem for exponentially long strings is reduced here to simple arithmetic on integers with (only) linearly many bits. We also present an NP-complete version of pattern matching for shortly described strings.
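As a concrete instance of a succinct description, the Python sketch below builds a straight-line program for Fibonacci words, assuming the common convention F1 = b, F2 = a, Fi = F(i-1) F(i-2). It computes lengths and reads single symbols without ever expanding the string. The dictionary encoding and the function names are hypothetical, and this is not the pattern-matching algorithm of the paper.

```python
def fibonacci_slp(n):
    """SLP as a dict: variable -> symbol, or pair of earlier variables."""
    rules = {1: "b", 2: "a"}
    for i in range(3, n + 1):
        rules[i] = (i - 1, i - 2)  # F_i = F_{i-1} F_{i-2}
    return rules


def lengths(rules):
    """Length of each variable's expansion, computed bottom-up."""
    out = {}
    for var in sorted(rules):
        rhs = rules[var]
        out[var] = 1 if isinstance(rhs, str) else out[rhs[0]] + out[rhs[1]]
    return out


def char_at(rules, lens, var, i):
    """Symbol at 0-based position i of the string derived from var."""
    while not isinstance(rules[var], str):
        left, right = rules[var]
        if i < lens[left]:
            var = left
        else:
            i -= lens[left]
            var = right
    return rules[var]


def expand(rules, var):
    """Full decompression; only sensible for small variables."""
    rhs = rules[var]
    if isinstance(rhs, str):
        return rhs
    return expand(rules, rhs[0]) + expand(rules, rhs[1])


rules = fibonacci_slp(80)
lens = lengths(rules)
print(lens[80], char_at(rules, lens, 80, lens[80] - 1))
```

With 80 rules the derived string is far too long to materialize, yet `lengths` and `char_at` touch each rule at most once per query. This is the sense in which work on the description, rather than on the string, can stay polynomial in n.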

Theoretical Computer Science | 2009

Efficient algorithms to compute compressed longest common substrings and compressed palindromes

Wataru Matsubara; Shunsuke Inenaga; Akira Ishino; Ayumi Shinohara; Tomoyuki Nakamura; Kazuo Hashimoto

This paper studies two problems on compressed strings described in terms of straight-line programs (SLPs). One is to compute the length of the longest common substring of two given SLP-compressed strings, and the other is to compute all palindromes of a given SLP-compressed string. In order to solve these problems efficiently (in polynomial time w.r.t. the compressed size), decompression is never feasible, since the decompressed size can be exponentially large. We develop combinatorial algorithms that solve these problems in O(n⁴ log n) time with O(n³) space, and in O(n⁴) time with O(n²) space, respectively, where n is the size of the input SLP-compressed strings.

combinatorial pattern matching | 2000

A Boyer-Moore Type Algorithm for Compressed Pattern Matching

Yusuke Shibata; Tetsuya Matsumoto; Masayuki Takeda; Ayumi Shinohara; Setsuo Arikawa

We apply the Boyer-Moore technique to compressed pattern matching for text strings described in terms of a collage system, which is a formal framework that captures various dictionary-based compression methods. For a subclass of collage systems that contain no truncation, our new algorithm runs in O(‖D‖ + n·m + m² + r) time using O(‖D‖ + m²) space, where ‖D‖ is the size of dictionary D, n is the compressed text length, m is the pattern length, and r is the number of pattern occurrences. For a general collage system, the time complexity is O(height(D)·(‖D‖ + n) + n·m + m² + r), where height(D) is the maximum dependency of tokens in D. We showed that the algorithm specialized for the so-called byte pair encoding (BPE) is very fast in practice. In fact, it runs about 1.2 to 3.0 times faster than the exact match routine of the software package agrep, known as the fastest pattern matching tool.
