
Publication


Featured research published by Takuya Kida.


Data Compression Conference | 1998

Multiple pattern matching in LZW compressed text

Takuya Kida; Masayuki Takeda; Ayumi Shinohara; Masamichi Miyazaki; Setsuo Arikawa

We address the problem of searching directly in LZW compressed text, and present a new algorithm for finding multiple patterns by simulating the moves of the Aho-Corasick (1975) pattern matching machine. The new algorithm finds all occurrences of multiple patterns, whereas the algorithm proposed by Amir, Benson, and Farach (Journal of Computer and System Sciences, vol. 52, pp. 299-307, 1996) finds only the first occurrence of a single pattern. The new algorithm runs in O(n + m² + r) time using O(n + m²) space, where n is the length of the compressed text, m is the total length of the patterns, and r is the number of occurrences of the patterns. We implemented a simple version of the algorithm and show that it is approximately twice as fast as decompression followed by a search using the Aho-Corasick machine.
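
As a rough illustration of the idea, the sketch below (Python, with illustrative names; not the authors' implementation) builds a small Aho-Corasick automaton and runs it over LZW phrases, memoising the state transition and the matches produced by each (state, code) pair so repeated phrases are handled without rescanning them. Matches that span phrase boundaries are still found because the automaton state is carried from one phrase to the next.

```python
# A minimal sketch, assuming simplified data structures; not the authors'
# implementation. It illustrates simulating Aho-Corasick (AC) moves over
# LZW phrases: the transition and the matches caused by each (state, code)
# pair are memoised, so a repeated phrase is handled without rescanning it.
from collections import deque

def ac_build(patterns):
    """Build goto/fail/output tables of a plain Aho-Corasick automaton."""
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:
        s = 0
        for c in p:
            if c not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].add(p)
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for c, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and c not in goto[f]:
                f = fail[f]
            fail[t] = goto[f][c] if c in goto[f] and goto[f][c] != t else 0
            out[t] |= out[fail[t]]
    return goto, fail, out

def ac_step(goto, fail, state, c):
    while state and c not in goto[state]:
        state = fail[state]
    return goto[state].get(c, 0)

def lzw_compress(text):
    """Toy LZW encoder: returns the code sequence and the phrase table."""
    dic = {chr(i): i for i in range(256)}
    phrases = dict(enumerate(chr(i) for i in range(256)))
    w, codes = "", []
    for c in text:
        if w + c in dic:
            w += c
        else:
            codes.append(dic[w])
            phrases[len(dic)] = w + c
            dic[w + c] = len(dic)
            w = c
    if w:
        codes.append(dic[w])
    return codes, phrases

def search_lzw(codes, phrases, patterns):
    """Find all (position, pattern) occurrences by scanning LZW codes."""
    goto, fail, out = ac_build(patterns)
    cache = {}   # (state, code) -> (state after the phrase, matches inside it)
    hits, state, pos = [], 0, 0
    for code in codes:
        phrase = phrases[code]
        key = (state, code)
        if key not in cache:
            s, found = state, []
            for i, c in enumerate(phrase, 1):
                s = ac_step(goto, fail, s, c)
                for p in out[s]:
                    found.append((i - len(p), p))   # offset inside the phrase
            cache[key] = (s, found)
        state, found = cache[key]
        hits.extend((pos + off, p) for off, p in found)
        pos += len(phrase)
    return hits

codes, phrases = lzw_compress("abracadabra abracadabra")
print(search_lzw(codes, phrases, ["abra", "cad"]))
```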


International Conference on Algorithms and Complexity | 2000

Speeding Up Pattern Matching by Text Compression

Yusuke Shibata; Takuya Kida; Shuichi Fukamachi; Masayuki Takeda; Ayumi Shinohara; Takeshi Shinohara; Setsuo Arikawa

Byte pair encoding (BPE) is a simple universal text compression scheme. Decompression is very fast and requires little working space. Moreover, it is easy to decompress an arbitrary part of the original text. However, it has not been widely used, since compression is rather slow and the compression ratio is not as good as that of other methods such as Lempel-Ziv type compression. In this paper we bring out a potential advantage of BPE compression: it is very well suited, from a practical viewpoint, to compressed pattern matching, where the goal is to find a pattern directly in compressed text without decompressing it explicitly. We compare running times to find a pattern in (1) BPE compressed files, (2) Lempel-Ziv-Welch compressed files, and (3) original text files, in various situations. Experimental results show that pattern matching in BPE compressed text is even faster than matching in the original text. Thus BPE compression reduces not only the disk space but also the searching time.
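
For concreteness, here is a minimal BPE sketch (illustrative only; the scheme described here reuses unused byte values, whereas this sketch allocates fresh symbol ids above 255): it repeatedly replaces the most frequent pair of adjacent symbols, and decompression is a simple recursive table expansion, which is why it is so fast.

```python
# A minimal BPE sketch, assuming fresh symbol ids above 255 for new pairs
# (the original scheme reuses unused byte values); illustrative only.
from collections import Counter

def bpe_compress(data: bytes):
    """Return (symbol sequence, rules), rules mapping new symbol -> pair."""
    seq, rules, next_sym = list(data), {}, 256
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:                     # stop when no pair repeats
            break
        rules[next_sym] = pair
        out, i = [], 0
        while i < len(seq):              # replace every occurrence of the pair
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_sym); i += 2
            else:
                out.append(seq[i]); i += 1
        seq, next_sym = out, next_sym + 1
    return seq, rules

def bpe_expand(sym, rules):
    """Decompression is just recursive table lookup, hence very fast."""
    if sym < 256:
        return bytes([sym])
    a, b = rules[sym]
    return bpe_expand(a, rules) + bpe_expand(b, rules)

seq, rules = bpe_compress(b"abcabcabcabd")
assert b"".join(bpe_expand(s, rules) for s in seq) == b"abcabcabcabd"
```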


Theoretical Computer Science | 2003

Collage system: a unifying framework for compressed pattern matching

Takuya Kida; Tetsuya Matsumoto; Yusuke Shibata; Masayuki Takeda; Ayumi Shinohara; Setsuo Arikawa

We introduce a general framework which is suitable to capture the essence of compressed pattern matching according to various dictionary-based compression methods. It is a formal system to represent a string by a pair of a dictionary D and a sequence S of phrases in D. The basic operations are concatenation, truncation, and repetition. We also propose a compressed pattern matching algorithm for the framework. The goal is to find all occurrences of a pattern in a text without decompression, which is one of the most active topics in string matching. Our framework includes such compression methods as the Lempel-Ziv family (LZ77, LZSS, LZ78, LZW), RE-PAIR, SEQUITUR, and the static dictionary-based method. The proposed algorithm runs in O((||D|| + |S|) · height(D) + m² + r) time with O(||D|| + m²) space, where ||D|| is the size of D, |S| is the number of tokens in S, height(D) is the maximum dependency of tokens in D, m is the pattern length, and r is the number of pattern occurrences. For the subclass of the framework that contains no truncation, the time complexity is O(||D|| + |S| + m² + r).
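
A tiny sketch of the representation (with an assumed tuple encoding of D; not the paper's notation): a dictionary entry is either a primitive character or is built from earlier entries by concatenation, truncation, or repetition, and the text is the concatenation of the phrases named by S. The quantity height(D) in the bound corresponds to the depth of these chained definitions.

```python
# A minimal sketch of a collage system, with an assumed tuple encoding of D.
def eval_token(D, i, memo):
    """Expand token i of dictionary D into the string it represents."""
    if i in memo:
        return memo[i]
    op = D[i]
    if op[0] == "prim":                   # ("prim", c): a single character
        s = op[1]
    elif op[0] == "cat":                  # ("cat", j, k): D[j] followed by D[k]
        s = eval_token(D, op[1], memo) + eval_token(D, op[2], memo)
    elif op[0] == "trunc":                # ("trunc", j, k): drop the last k chars
        s = eval_token(D, op[1], memo)[:-op[2]]
    else:                                 # ("rep", j, k): k-fold repetition
        s = eval_token(D, op[1], memo) * op[2]
    memo[i] = s
    return s

def eval_collage(D, S):
    """The represented text is the concatenation of the phrases named by S."""
    memo = {}
    return "".join(eval_token(D, i, memo) for i in S)

# D defines "a", "b", "ab", "ababab" (repetition), and "ababa" (truncation).
D = [("prim", "a"), ("prim", "b"), ("cat", 0, 1), ("rep", 2, 3), ("trunc", 3, 1)]
S = [4, 0]
assert eval_collage(D, S) == "ababaa"
```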


Data Compression Conference | 2001

Faster approximate string matching over compressed text

Gonzalo Navarro; Takuya Kida; Masayuki Takeda; Ayumi Shinohara; Setsuo Arikawa

Approximate string matching on compressed text was an open problem for almost a decade. The two existing solutions are very new. Although they represent important complexity breakthroughs, in most practical cases they are not useful, in the sense that they are slower than uncompressing the text and then searching the uncompressed text. We present a different approach, which reduces the problem to multipattern searching of pattern pieces plus local decompression and direct verification of candidate text areas. We show experimentally that this solution is 10-30 times faster than previous work and up to three times faster than the trivial approach of uncompressing and searching, thus becoming the first practical solution to the problem.
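
The reduction can be sketched over plain text as follows (illustrative; the paper applies it to compressed text, where the piece search is a compressed-domain multipattern search rather than str.find): with at most k errors, at least one of k+1 pattern pieces must appear unchanged (pigeonhole), so exact occurrences of the pieces give candidate areas, which are then verified by dynamic programming.

```python
# A minimal sketch over uncompressed text, with helper names assumed here.
def occurrences_in_window(window, pattern, k):
    """Sellers DP: end positions in window of matches with <= k edits."""
    m = len(pattern)
    col = list(range(m + 1))              # one column of the DP matrix
    ends = []
    for j, c in enumerate(window, 1):
        prev_diag, col[0] = col[0], 0     # any text position may start a match
        for i in range(1, m + 1):
            cur = min(col[i] + 1, col[i - 1] + 1,
                      prev_diag + (pattern[i - 1] != c))
            prev_diag, col[i] = col[i], cur
        if col[m] <= k:
            ends.append(j)
    return ends

def approx_search(text, pattern, k):
    """Pigeonhole filter: search k+1 exact pieces, verify candidate windows."""
    m = len(pattern)
    step = max(1, m // (k + 1))
    pieces = [(i, pattern[i:i + step]) for i in range(0, step * (k + 1), step)]
    ends = set()
    for off, piece in pieces:
        start = text.find(piece)
        while start != -1:
            lo = max(0, start - off - k)               # candidate area around
            hi = min(len(text), start - off + m + k)   # the piece occurrence
            ends.update(lo + e for e in occurrences_in_window(text[lo:hi], pattern, k))
            start = text.find(piece, start + 1)
    return sorted(ends)                                # match end positions

print(approx_search("abracadabra", "acad", 1))
```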


String Processing and Information Retrieval | 1999

A unifying framework for compressed pattern matching

Takuya Kida; Yusuke Shibata; Masayuki Takeda; Ayumi Shinohara; Setsuo Arikawa

We introduce a general framework which is suitable to capture the essence of compressed pattern matching according to various dictionary-based compression methods, and propose a compressed pattern matching algorithm for the framework. The goal is to find all occurrences of a pattern in a text without decompression, which is one of the most active topics in string matching. Our framework includes such compression methods as the Lempel-Ziv family (LZ77, LZSS, LZ78, LZW) (J. Ziv and A. Lempel, 1978), byte-pair encoding, and the static dictionary-based method. Technically, our pattern matching algorithm extends the one for LZW compressed text presented by A. Amir et al. (1996).


Data Compression Conference | 2009

Suffix Tree Based VF-Coding for Compressed Pattern Matching

Takuya Kida

We propose an efficient variable-length-to-fixed-length code (VF code for short), called the ST-VF code, which uses a suffix tree pruned by frequency as its parse tree. VF codes, as typified by the Tunstall code, have a property favourable for compressed pattern matching: it is unnecessary to identify codeword boundaries in the compressed text, since all codewords have the same length.
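
To make the variable-to-fixed idea concrete, here is a Tunstall-style sketch over a memoryless source model (illustrative only; the ST-VF code prunes a suffix tree by frequency rather than growing a probability-based tree). Leaves of the parse tree are phrases, and each phrase is emitted as a fixed-length codeword, so codeword boundaries are implicit.

```python
# A minimal Tunstall-style sketch (not the ST-VF construction itself).
import heapq

def build_parse_tree(probs, num_codewords):
    """Repeatedly expand the most probable leaf until the budget is reached."""
    leaves = [(-p, s) for s, p in probs.items()]
    heapq.heapify(leaves)
    while len(leaves) + len(probs) - 1 <= num_codewords:
        negp, s = heapq.heappop(leaves)              # most probable leaf
        for sym, p in probs.items():                 # give it all children
            heapq.heappush(leaves, (negp * p, s + sym))
    return [s for _, s in leaves]

def vf_encode(text, leaves, codeword_bits):
    """Greedy longest-prefix parse; every phrase becomes a fixed-size code."""
    code = {s: i for i, s in enumerate(sorted(leaves))}
    max_len = max(map(len, leaves))
    bits, i = [], 0
    while i < len(text):
        for l in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + l] in code:
                bits.append(format(code[text[i:i + l]], f"0{codeword_bits}b"))
                i += l
                break
        else:
            raise ValueError("tail not covered; a real coder pads or falls back")
    return "".join(bits)

leaves = build_parse_tree({"a": 0.7, "b": 0.3}, 8)    # 8 phrases -> 3-bit codes
print(vf_encode("aaabaaab", leaves, 3))
```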


International Conference on Ubiquitous Information Management and Communication | 2009

Efficient serial episode mining with minimal occurrences

Hideyuki Ohtani; Takuya Kida; Takeaki Uno; Hiroki Arimura

Knowledge discovery from large data sets has become increasingly important in various fields; in particular, mining time-series data has attracted much attention. This paper studies the problem of finding frequent episodes appearing in a sequence of events. We propose an efficient depth-first search algorithm for mining frequent serial episodes in a given event sequence using the notion of right-minimal occurrences. We then present techniques for speeding up the algorithm, namely occurrence-deliver and tail-redundancy pruning. Finally, we report experiments on real datasets that evaluate the usefulness of the proposed methods.
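
The occurrence notion can be sketched as follows (illustrative; this uses the standard minimal-occurrence definition, which may differ in detail from the paper's right-minimal occurrences, and omits the depth-first enumeration, occurrence-deliver, and pruning that are the paper's contribution). A serial episode is an ordered list of event types, and a minimal occurrence is a window containing the episode in order such that no strictly smaller sub-window does.

```python
# A minimal sketch for a single serial episode (helper name assumed here).
def minimal_occurrences(events, episode):
    """Return (start, end) index pairs of minimal occurrences of episode."""
    occs = []
    for end in range(len(events)):
        if events[end] != episode[-1]:
            continue
        # Match the remaining episode events backwards, each as late as
        # possible; this yields the latest feasible start for this end.
        i, pos = len(episode) - 2, end - 1
        while i >= 0 and pos >= 0:
            if events[pos] == episode[i]:
                i -= 1
            pos -= 1
        if i < 0:
            start = pos + 1
            # The window is minimal only if its start is strictly later than
            # that of the previously found occurrence.
            if not occs or occs[-1][0] < start:
                occs.append((start, end))
    return occs

# Episode ABC occurs minimally in windows [0, 2] and [3, 6], but not [0, 6].
print(minimal_occurrences(list("ABCABBC"), list("ABC")))
```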


String Processing and Information Retrieval | 2004

A Space-Saving Linear-Time Algorithm for Grammar-Based Compression

Hiroshi Sakamoto; Takuya Kida; Shinichi Shimozono

We present a space-efficient linear-time approximation algorithm for the grammar-based compression problem, which asks, for a given string, for a smallest context-free grammar deriving that string. The algorithm consumes only O(g* log g*) space and achieves the worst-case approximation ratio O(log g* · log n), where n is the size of the input and g* is the size of an optimum grammar. Experimental results on typical benchmarks demonstrate that our algorithm is practical and efficient.
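
For context, the object being optimised can be written down directly (a small illustration with an assumed representation, not the paper's algorithm): a straight-line context-free grammar in which every nonterminal derives exactly one string, whose size is the total length of all right-hand sides; grammar-based compression seeks a smallest such grammar, of size g*.

```python
# A small illustration of grammar-based compression (assumed representation).
def derive(grammar, sym):
    """Expand a symbol of a straight-line grammar into the string it derives."""
    if sym not in grammar:                      # terminal symbol
        return sym
    return "".join(derive(grammar, s) for s in grammar[sym])

def grammar_size(grammar):
    """Grammar size = total length of all right-hand sides."""
    return sum(len(rhs) for rhs in grammar.values())

# A 12-character string derived by a grammar of size 7; the optimisation
# problem asks for a smallest such grammar.
G = {"S": ["B", "B", "B"], "B": ["A", "A"], "A": ["a", "b"]}
assert derive(G, "S") == "abababababab"
assert grammar_size(G) == 7
```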


Data Compression Conference | 2010

An Efficient Algorithm for Almost Instantaneous VF Code Using Multiplexed Parse Tree

Satoshi Yoshida; Takuya Kida

The almost instantaneous VF (AIVF) code proposed by Yamamoto and Yokoo in 2001, one of the variable-length-to-fixed-length codes, uses a set of parse trees and achieves a good compression ratio. However, it needs much more time and space for both encoding and decoding than an ordinary VF code does. In this paper we prove that the set of parse trees can be multiplexed into a compact single tree that simulates the original encoding and decoding procedures. Our technique reduces the total number of nodes from O(2^l k) to O(2^l k - k²), where l and k are the codeword length and the alphabet size, respectively. Experimental results show that, using this technique, we can encode and decode natural language texts over three times faster.


String Processing and Information Retrieval | 2010

Training parse trees for efficient VF coding

Takashi Uemura; Satoshi Yoshida; Takuya Kida; Tatsuya Asai; Seishi Okamoto

We address the problem of improving variable-length-to-fixed-length codes (VF codes), which have favourable properties for fast compressed pattern matching but only moderate compression ratios. The compression ratio of a VF code depends on the parse tree that is used as its dictionary. We propose a method that trains the parse tree by scanning the input text repeatedly, and we show experimentally that it rapidly improves the compression ratio of VF codes to the level of state-of-the-art compression methods.
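
A rough sketch of the training idea (with assumed details; the paper's actual procedure may differ): parse the text with the current phrase set, count how often each phrase is used, then extend frequently used phrases and evict rarely used ones so that the fixed codeword budget is spent where it pays off, and repeat over several scans.

```python
# A rough sketch of iterative parse-tree training (details assumed here).
from collections import Counter

def greedy_parse(text, phrases):
    """Greedy longest-match parse; single characters are always available."""
    max_len = max(map(len, phrases))
    out, i = [], 0
    while i < len(text):
        for l in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + l] in phrases:
                out.append(text[i:i + l]); i += l; break
    return out

def train_parse_tree(text, budget, rounds=5):
    phrases = set(text)                     # start from single characters
    for _ in range(rounds):
        parsed = greedy_parse(text, phrases)
        used = Counter(parsed)
        # Propose extensions: each parsed phrase plus its following character.
        candidates, pos = Counter(), 0
        for p in parsed:
            pos += len(p)
            if pos < len(text):
                candidates[p + text[pos]] += 1
        for cand, cnt in candidates.most_common(budget):
            phrases.add(cand)
            used[cand] += cnt               # proxy usefulness for new phrases
        # Evict the least useful multi-character phrases to respect the budget.
        removable = sorted((used[p], p) for p in phrases if len(p) > 1)
        while len(phrases) > budget and removable:
            _, p = removable.pop(0)
            phrases.discard(p)
    return phrases

text = "abracadabra abracadabra abracadabra"
phrases = train_parse_tree(text, budget=16)
print(len(greedy_parse(text, phrases)))     # number of codewords after training
```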

Collaboration


Takuya Kida's top co-authors.

Shuichi Fukamachi

Kyushu Institute of Technology


Takeshi Shinohara

Kyushu Institute of Technology
