Hiroki Arimura | Researchain

Publication

Featured research published by Hiroki Arimura.

combinatorial pattern matching | 2001

Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications

Toru Kasai; Gunho Lee; Hiroki Arimura; Setsuo Arikawa; Kunsoo Park

We present a linear-time algorithm to compute the longest common prefix information in suffix arrays. As two applications of our algorithm, we show that our algorithm is crucial to the effective use of block-sorting compression, and we present a linear-time algorithm to simulate the bottom-up traversal of a suffix tree with a suffix array combined with the longest common prefix information.
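
The abstract does not spell out the method, so the following is only a minimal Python sketch of the widely used rank-based LCP computation associated with this paper. It visits suffixes in text order and reuses the previous match length minus one. Because h drops by at most one per step and never exceeds n, the total work for the LCP step is O(n). The helper `suffix_array` is a naive, hypothetical stand-in used only to produce input; it is not linear time.

```python
def suffix_array(s):
    # Naive O(n^2 log n) construction; only used to feed the LCP step below.
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    """lcp[r] = length of the longest common prefix of suffixes sa[r-1] and sa[r]; lcp[0] = 0."""
    n = len(s)
    rank = [0] * n
    for r, p in enumerate(sa):
        rank[p] = r
    lcp = [0] * n
    h = 0
    for i in range(n):                 # process suffixes in text order, not sorted order
        if rank[i] > 0:
            j = sa[rank[i] - 1]        # suffix just before s[i:] in sorted order
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1                 # s[i+1:] shares at least h-1 chars with its predecessor
        else:
            h = 0                      # smallest suffix has no predecessor
    return lcp
```

For `"banana"`, the sorted suffixes are `a, ana, anana, banana, na, nana`, giving `lcp_array("banana", suffix_array("banana")) == [0, 1, 3, 0, 0, 2]`.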

discovery science | 2004

An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases

Takeaki Uno; Tatsuya Asai; Yuzo Uchida; Hiroki Arimura

The class of closed patterns is a well-known condensed representation of frequent patterns and has recently attracted considerable interest. In this paper, we propose an efficient algorithm, LCM (Linear time Closed pattern Miner), for mining frequent closed patterns from large transaction databases. The main theoretical contribution is our proposed prefix-preserving closure extension of closed patterns, which enables us to search all frequent closed patterns in a depth-first manner, in time linear in the number of frequent closed patterns. Our algorithm does not need any storage space for previously obtained patterns, whereas existing algorithms do. Performance comparisons of LCM with straightforward algorithms demonstrate the advantages of our prefix-preserving closure extension.
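
A simplified Python sketch of depth-first search by prefix-preserving closure extension, assuming items are non-negative integers and transactions are Python sets. From a closed set P with core index `core`, each item i > core not in P is added, the closure Q is taken, and Q is kept as a child only if it adds no item smaller than i. Every frequent closed set is then reached exactly once with no record of earlier outputs. The function name `lcm_closed` and the naive closure computation are illustrative assumptions, not LCM's optimized implementation.

```python
def lcm_closed(transactions, minsup):
    """Enumerate (closed_itemset, support) pairs with support >= minsup.

    Assumes items are non-negative ints so that core index -1 precedes all items.
    The root is closure(empty set), which may itself be empty.
    """
    items = sorted(set().union(*transactions))

    def closure(occ):
        # Items shared by every transaction in the occurrence list.
        return frozenset.intersection(*(frozenset(transactions[t]) for t in occ))

    out = []

    def rec(P, occ, core):
        out.append((P, len(occ)))
        for i in items:
            if i <= core or i in P:
                continue
            new_occ = [t for t in occ if i in transactions[t]]
            if len(new_occ) < minsup:
                continue
            Q = closure(new_occ)
            # Prefix-preserving test: Q must agree with P on all items smaller than i.
            if {x for x in Q if x < i} == {x for x in P if x < i}:
                rec(Q, new_occ, i)

    all_occ = list(range(len(transactions)))
    if len(all_occ) >= minsup:
        rec(closure(all_occ) if all_occ else frozenset(), all_occ, -1)
    return out
```

On `[{1, 2, 3}, {1, 2}, {2, 3}, {1, 2, 3}]` with `minsup=2`, this yields the four closed sets `{2}`, `{1, 2}`, `{1, 2, 3}`, and `{2, 3}`, each once.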

Proceedings of the 1st international workshop on open source data mining | 2005

LCM ver.3: collaboration of array, bitmap and prefix tree for frequent itemset mining

Takeaki Uno; Masashi Kiyomi; Hiroki Arimura

For a transaction database, a frequent itemset is an itemset included in at least a specified number of transactions. When finding all frequent itemsets, the heaviest task is computing the frequency of each candidate itemset. Previous studies have used roughly three data structures and algorithms for this computation: bitmaps, prefix trees, and array lists. Each has its own advantages and disadvantages depending on the density of the input database. In this paper, we propose an efficient way to combine these three data structures so that the combination gives the best performance in every case.
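
To make the bitmap option concrete, here is a small Python sketch of bitmap-based frequency counting alone. It does not show how LCM ver.3 combines bitmaps with prefix trees and array lists. Each item maps to an integer whose bit t is set when transaction t contains it, so the frequency of an itemset is the popcount of the AND of its items' bitmaps. The names `item_bitmaps` and `support` are hypothetical.

```python
from functools import reduce


def item_bitmaps(transactions):
    """Map each item to an int whose bit t is set iff transaction t contains the item."""
    bitmaps = {}
    for t, trans in enumerate(transactions):
        for item in trans:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << t)
    return bitmaps


def support(itemset, bitmaps, n_transactions):
    """Frequency of itemset = number of set bits in the AND of its items' bitmaps."""
    all_rows = (1 << n_transactions) - 1   # the empty itemset occurs in every transaction
    mask = reduce(lambda acc, item: acc & bitmaps.get(item, 0), itemset, all_rows)
    return bin(mask).count("1")
```

Bitmaps make each intersection a few word operations, which pays off on dense data. On sparse data most bits are zero, which is one reason array-list occurrence sets are preferred there.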

european conference on principles of data mining and knowledge discovery | 2002

Optimized Substructure Discovery for Semi-structured Data

Kenji Abe; Shinji Kawasoe; Tatsuya Asai; Hiroki Arimura; Setsuo Arikawa

In this paper, we consider the problem of discovering interesting substructures from a large collection of semi-structured data in the framework of optimized pattern discovery. We model semi-structured data and patterns with labeled ordered trees, and present an efficient algorithm that discovers the best labeled ordered trees that optimize a given statistical measure, such as information entropy or classification accuracy, in a collection of semi-structured data. We give theoretical analyses of the computational complexity of the algorithm for patterns with bounded and unbounded size. Experiments show that the algorithm performs well and discovers interesting patterns on real datasets.
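
Optimized tree-pattern discovery requires a candidate generator over labeled ordered trees. Below is a minimal Python sketch of rightmost expansion, a standard duplicate-free way to enumerate such trees. The encoding of a tree as a preorder tuple of `(depth, label)` pairs and the function names are assumptions, and scoring by entropy or accuracy is omitted. A new node may be attached only as the last child of a node on the rightmost path. Every tree has exactly one parent (itself minus its last preorder node), so each tree is generated once.

```python
def rightmost_extensions(tree, labels):
    """Yield every tree obtained by adding one node as the last child of a
    rightmost-path node. tree: tuple of (depth, label) pairs in preorder."""
    last_depth = tree[-1][0]
    for depth in range(1, last_depth + 2):   # parent depth ranges over 0..last_depth
        for lab in labels:
            yield tree + ((depth, lab),)


def enumerate_trees(max_size, labels):
    """Enumerate each labeled ordered tree with 1..max_size nodes exactly once."""
    frontier = [((0, lab),) for lab in labels]
    result = list(frontier)
    for _ in range(max_size - 1):
        frontier = [t for tree in frontier for t in rightmost_extensions(tree, labels)]
        result.extend(frontier)
    return result
```

With a single label, the number of trees of each size follows the Catalan numbers (1, 1, 2, 5, ...), a quick sanity check that nothing is duplicated or missed.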

symposium on theoretical aspects of computer science | 1994

Finding Minimal Generalizations for Unions of Pattern Languages and Its Application to Inductive Inference from Positive Data

Hiroki Arimura; Takeshi Shinohara; Setsuko Otsuki

A pattern is a string of constant symbols and variables. The language defined by a pattern p is the set of constant strings obtained from p by substituting nonempty constant strings for the variables in p. In this paper we are concerned with polynomial-time inference from positive data of the class of unions of a bounded number of pattern languages. We introduce a syntactic notion of minimal multiple generalizations (mmg for short) to study the inferability of classes of unions. If a pattern p is obtained from another pattern q by substituting nonempty patterns for variables in q, q is said to be more general than p. A set of patterns defines the union of their languages. A set Q of patterns is said to be more general than a set P of patterns if for every pattern p in P there exists a pattern q in Q that is more general than p. Clearly, a more general set of patterns defines a larger union. A k-minimal multiple generalization (k-mmg) of a set S of strings is a minimally general set of at most k patterns that defines a union containing S. The syntactic notion of minimality enables us to efficiently compute a candidate for a semantically minimal concept. We present a general methodology for designing an efficient algorithm to find a k-mmg. Under some conditions, an mmg can be used as an appropriate hypothesis for inductive inference from positive data. As a result, several classes of unions of pattern languages are shown to be polynomial-time inferable from positive data.
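
The two definitions above, membership in L(p) and "q is more general than p", can both be checked by one backtracking matcher. This is a brute-force Python sketch, not the paper's mmg algorithm. It assumes a hypothetical encoding where constants are 1-character `str` tokens and variables are `int` tokens. For generality, the variables of p are simply treated as fresh constants, so q matches p exactly when p = qθ for some nonempty substitution θ. Membership for general patterns is NP-complete, so the search is exponential in the worst case.

```python
def matches(pattern, word, binding=None):
    """True if some substitution of NONEMPTY token strings for the variables
    (int tokens) of pattern turns it into word. Constants are str tokens."""
    word = tuple(word)
    binding = {} if binding is None else binding
    if not pattern:
        return not word
    head, rest = pattern[0], pattern[1:]
    if isinstance(head, str):                       # constant symbol must match exactly
        return bool(word) and word[0] == head and matches(rest, word[1:], binding)
    if head in binding:                             # repeated variable: reuse its value
        val = binding[head]
        return word[:len(val)] == val and matches(rest, word[len(val):], binding)
    for k in range(1, len(word) + 1):               # try every nonempty prefix
        binding[head] = word[:k]
        if matches(rest, word[k:], binding):
            return True
        del binding[head]                           # undo before trying a longer prefix
    return False


def more_general(q, p):
    """q is more general than p iff p is obtained from q by substituting nonempty
    patterns for q's variables; p's variables are treated as fresh constants."""
    return matches(q, p)
```

For example, with `x, y = 0, 1`, the pattern `(x, "a", x)` generates `"bab"` but not `"baa"`, and `(x, y)` is more general than `("a", x, "b")`.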

international conference on data mining | 2002

Online algorithms for mining semi-structured data stream

Tatsuya Asai; Hiroki Arimura; Kenji Abe; Shinji Kawasoe; Setsuo Arikawa

In this paper, we study an online data mining problem for streams of semi-structured data such as XML data. Modeling semi-structured data and patterns as labeled ordered trees, we present an online algorithm, StreamT, that receives fragments of unseen, possibly infinite semi-structured data in document order through a data stream, and can return the current set of frequent patterns immediately on request at any time. A crucial part of our algorithm is the incremental maintenance of the occurrences of possibly frequent patterns using a tree sweeping technique. We also give modifications of the algorithm for other online mining models. We present theoretical and empirical analyses to evaluate the performance of the algorithm.

algorithmic learning theory | 1997

Learning Acyclic First-Order Horn Sentences from Entailment

Hiroki Arimura

This paper considers the problem of learning an unknown first-order Horn sentence H* from examples of Horn clauses that H* either implies or does not imply. In particular, we deal with ACH(k), a subclass of first-order Horn sentences called acyclic constrained Horn programs of constant arity k. ACH(k) allows recursion, disjunctive definitions, and the use of function symbols. We present an algorithm that exactly identifies every target Horn program H* in ACH(k) in time polynomial in p, m, and n using $O(pmn^{k+1})$ entailment equivalence queries and $O(pm^2n^{2k+1})$ request-for-hint queries, where p is the number of predicates, m is the number of clauses contained in H*, and n is the size of the longest counterexample. This algorithm combines saturation and least general generalization operators to invert resolution steps. Next, by replacing request-for-hint queries with entailment membership queries, we obtain a polynomial-time learning algorithm using entailment equivalence and entailment membership queries for a subclass of ACH(k). Finally, we show that any algorithm that learns ACH(k) using entailment equivalence and entailment membership queries makes $\Omega(mn^k)$ queries, and that the use of entailment cannot be eliminated to learn ACH(k) even when both equivalence and membership queries for ground atoms are allowed.
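
The least general generalization (lgg) operator mentioned above is easiest to see on terms. This Python sketch shows Plotkin-style anti-unification of two first-order terms only, not the lgg of clauses or the saturation-based learner. It assumes a hypothetical encoding: compound terms and constants are tuples `(functor, arg1, ..., argn)` and variables are `str`. Fresh variables are named `X0, X1, ...`, assumed not to clash with input variable names. The shared table maps each pair of differing subterms to the same variable, which keeps the result least general.

```python
def lgg(s, t, table=None):
    """Least general generalization (anti-unification) of two first-order terms.
    Terms: tuples (functor, args...) for compounds/constants, str for variables."""
    table = {} if table is None else table
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        # Same functor and arity: generalize argument-wise with a shared table.
        return (s[0],) + tuple(lgg(a, b, table) for a, b in zip(s[1:], t[1:]))
    if s == t:
        return s
    # Different subterms: the same pair always maps to the same fresh variable.
    if (s, t) not in table:
        table[(s, t)] = f"X{len(table)}"
    return table[(s, t)]
```

For instance, the lgg of `p(a, f(a))` and `p(b, f(b))` is `p(X0, f(X0))`: reusing `X0` for both occurrences of the pair `(a, b)` is what distinguishes the least general generalization from the trivial `p(X0, X1)`.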

algorithmic learning theory | 1996

Inductive Inference of Unbounded Unions of Pattern Languages from Positive Data

Takeshi Shinohara; Hiroki Arimura

A pattern is a string consisting of constant symbols and variables. The language of a pattern is the set of constant strings obtained by substituting nonempty constant strings for the variables in the pattern. For any fixed k, the class of unions of at most k pattern languages has already been shown to be inferable from positive data. The class of all unions of arbitrarily (but finitely) many pattern languages is not inferable, because any constant string defines the singleton set consisting of itself, so the class of unions contains all finite languages as well as infinite ones. A proper pattern is a pattern that contains at least one variable; the language of a proper pattern is infinite. In this paper, we consider the class of unions when patterns are restricted to be proper and show that this class is not inferable from positive data. A regular pattern is a pattern that contains at most one occurrence of each variable. When regular patterns are restricted not to contain more than l consecutive occurrences of constant symbols for some l, the class of unions is shown to be inferable from positive data.
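
The syntactic restrictions defined above are simple predicates on a pattern. The following Python sketch uses a hypothetical encoding in which constants are `str` tokens and variables are `int` tokens. `in_restricted_class` checks regularity together with the bound l on consecutive constants. It is only a membership test for the pattern class, not the inference procedure.

```python
def is_proper(pattern):
    """A proper pattern contains at least one variable (int token)."""
    return any(isinstance(tok, int) for tok in pattern)


def is_regular(pattern):
    """A regular pattern contains each variable at most once."""
    vars_ = [tok for tok in pattern if isinstance(tok, int)]
    return len(vars_) == len(set(vars_))


def max_constant_run(pattern):
    """Length of the longest block of consecutive constant symbols."""
    best = run = 0
    for tok in pattern:
        run = run + 1 if isinstance(tok, str) else 0
        best = max(best, run)
    return best


def in_restricted_class(pattern, l):
    """Regular pattern with no more than l consecutive constant symbols."""
    return is_regular(pattern) and max_constant_run(pattern) <= l
```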

research in computational molecular biology | 2000

On approximation algorithms for local multiple alignment

Tatsuya Akutsu; Hiroki Arimura; Shinichi Shimozono

This paper studies the local multiple alignment problem, which is also known as the general consensus patterns problem. Given protein or DNA sequences, local multiple alignment asks us to locate a region (i.e., a substring) of fixed length in each sequence so that the score determined from the set of regions is optimized. We consider the following scoring schemes: the score indicating the average information content, the score defined by Li et al., and the sum-of-pairs score. We prove that local multiple alignment is NP-hard under each of these scoring schemes. In addition, we prove that local multiple alignment is APX-hard under the average information content scoring. This implies that, unless P = NP, there is no polynomial-time algorithm whose worst-case approximation error can be arbitrarily specified (precisely, no polynomial-time approximation scheme). Several related theoretical results are provided. We also performed computational experiments on approximation algorithms for local multiple alignment under the average information content scoring. The results suggest that the Gibbs sampling algorithm proposed by Lawrence et al. performs best.
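
To illustrate the problem statement, here is a brute-force Python sketch that picks one fixed-length window per sequence and maximizes an information-content score. Assumptions: the score is the common relative-entropy form, summing f·log2(f/q) over columns and letters with a uniform background q over `alphabet`. The paper's exact "average" normalization may differ. The search tries every combination of start positions, so its cost grows exponentially with the number of sequences, in line with the hardness results above. Function names are hypothetical.

```python
import itertools
import math
from collections import Counter


def information_content(regions, alphabet):
    """Sum over columns of sum_a f_a * log2(f_a / q), with uniform background q (assumed)."""
    n, q = len(regions), 1.0 / len(alphabet)
    score = 0.0
    for column in zip(*regions):
        for count in Counter(column).values():
            f = count / n
            score += f * math.log2(f / q)
    return score


def best_local_alignment(seqs, length, alphabet="ACGT"):
    """Exhaustively try every choice of one length-`length` window per sequence."""
    ranges = [range(len(s) - length + 1) for s in seqs]
    best = None
    for starts in itertools.product(*ranges):
        regions = [s[i:i + length] for s, i in zip(seqs, starts)]
        sc = information_content(regions, alphabet)
        if best is None or sc > best[0]:
            best = (sc, starts)
    return best   # (score, start positions), or None if some sequence is too short
```

With `["TTACGT", "ACGTTT", "TACGTT"]` and length 4, the only window common to all three sequences is `ACGT`, at starts `(2, 0, 1)`. Each fully conserved DNA column contributes 2 bits, for a total of 8.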

algorithmic learning theory | 1995

Learning unions of tree patterns using queries

Hiroki Arimura; Hiroki Ishizaka; Takeshi Shinohara

This paper characterizes the polynomial-time learnability of $TP_k$, the class of collections of at most k first-order terms. A collection in $TP_k$ defines the union of the languages defined by each first-order term in the collection. Unfortunately, the class $TP_k$ is not polynomial-time learnable in most learning frameworks under standard assumptions in computational complexity theory. To overcome this computational hardness, we relax the learning problem by allowing a learning algorithm to make membership queries. We present a polynomial-time algorithm that exactly learns every concept in $TP_k$ using $O(kn)$ equivalence queries and $O(k^2 n \cdot \max\{k, n\})$ membership queries, where n is the size of the longest counterexample given so far. In the proof, we use a technique of replacing each restricted subset query with several membership queries under some condition on the set of function symbols. As corollaries, we obtain the polynomial-time PAC-learnability and the polynomial-time predictability of $TP_k$ when membership queries are available. We also show a lower bound of $\Omega(kn)$ on the number of queries necessary to learn $TP_k$ using both types of queries. Further, we show that neither type of query can be eliminated to achieve efficient learning of $TP_k$. Finally, we apply our results to learning a class of restricted logic programs, called unit clause programs.
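
A membership query for a concept in $TP_k$ asks whether a term belongs to the union of the languages of at most k term patterns. This Python sketch answers that question by first-order matching. It is not the learning algorithm. It assumes a hypothetical encoding where compound terms and constants are tuples `(functor, args...)` and variables are `str`. Unlike pattern-language matching over strings, first-order matching is deterministic, so no backtracking is needed.

```python
def match(pattern, term, subst=None):
    """Return a substitution theta (dict) with pattern*theta == term, or None.
    Terms: tuples (functor, args...) for compounds/constants, str for variables."""
    subst = {} if subst is None else subst
    if isinstance(pattern, str):                        # variable
        if pattern in subst:
            return subst if subst[pattern] == term else None
        subst[pattern] = term
        return subst
    if (not isinstance(term, tuple) or pattern[0] != term[0]
            or len(pattern) != len(term)):
        return None                                     # functor or arity mismatch
    for p_arg, t_arg in zip(pattern[1:], term[1:]):
        if match(p_arg, t_arg, subst) is None:
            return None
    return subst


def in_union(patterns, term):
    """Membership in the union of the languages defined by a collection of term patterns."""
    return any(match(p, term) is not None for p in patterns)
```

For example, `f(X, X)` matches `f(a, a)` with `X = a` but not `f(a, b)`, while the collection `{f(X, X), g(Y)}` contains `g(f(a, b))`.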
