Publications


Featured research published by Seishi Okamoto.


International Conference on Case-Based Reasoning | 1995

An Average-Case Analysis of k-Nearest Neighbor Classifier

Seishi Okamoto; Ken Satoh

In this paper, we perform an average-case analysis of the k-nearest neighbor classifier (k-NNC) for a subclass of Boolean threshold functions. Our average-case analysis is based on a formal computation of the predictive accuracy of the classifier under the assumption of noise-free Boolean features and a uniform instance distribution. The predictive accuracy is represented as a function of the number of features, the threshold, the number of training instances, and the number of nearest neighbors. We also present the predictive behavior of the classifier by systematically varying the values of the parameters of the accuracy function. Plotting the behavior of the classifier for varying values of k, we observe that the performance of the classifier improves as k increases, reaches a maximum, and then starts to deteriorate. We further investigate the relationship between the number of training instances and the optimal value of k, and observe that the optimal k increases gradually as the number of training instances increases.
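
The closed-form accuracy function itself is not reproduced in the abstract, but the setting it models is easy to simulate. Below is a minimal Monte Carlo sketch, not the paper's formal computation, that estimates k-NN accuracy on a Boolean threshold target under noise-free features and a uniform instance distribution; all function names and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def threshold_concept(x, theta):
    """Boolean threshold target: positive iff at least theta features are 1."""
    return int(x.sum() >= theta)

def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training instances closest in Hamming distance."""
    dists = np.abs(train_x - x).sum(axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    return int(train_y[nearest].sum() * 2 >= k)

def estimate_accuracy(n_features=8, theta=4, n_train=50, k=3, trials=2000):
    correct = 0
    for _ in range(trials):
        train_x = rng.integers(0, 2, size=(n_train, n_features))
        train_y = np.array([threshold_concept(x, theta) for x in train_x])
        test_x = rng.integers(0, 2, size=n_features)
        correct += knn_predict(train_x, train_y, test_x, k) == threshold_concept(test_x, theta)
    return correct / trials

# Vary k to see accuracy rise to a peak and then fall off.
for k in (1, 3, 5, 9, 15):
    print(k, estimate_accuracy(k=k))
```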


Theoretical Computer Science | 2003

Effects of domain characteristics on instance-based learning algorithms

Seishi Okamoto; Nobuhiro Yugami

This paper presents average-case analyses of instance-based learning algorithms. The algorithms analyzed employ a variant of the k-nearest neighbor classifier (k-NN). Our analysis deals with a monotone m-of-n target concept with irrelevant attributes and handles three types of noise: relevant attribute noise, irrelevant attribute noise, and class noise. We formally represent the expected classification accuracy of k-NN as a function of domain characteristics, including the number of training instances, the numbers of relevant and irrelevant attributes, the threshold number in the target concept, the probability of each attribute, the noise rate for each type of noise, and k. We also explore the behavioral implications of the analyses by presenting the effects of domain characteristics on the expected accuracy of k-NN and on the optimal value of k for artificial domains.
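
As a rough companion to the analysis, the sketch below simulates the kind of domain it describes: an m-of-n target with additional irrelevant attributes and the three noise types (relevant-attribute, irrelevant-attribute, and class noise), evaluated with a plain k-NN. The names and noise rates are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_domain(n_inst, n_rel, n_irrel, m, p=0.5,
                  rel_noise=0.05, irrel_noise=0.05, class_noise=0.05):
    """m-of-n target over n_rel relevant attributes, plus n_irrel irrelevant ones."""
    x = (rng.random((n_inst, n_rel + n_irrel)) < p).astype(int)
    y = (x[:, :n_rel].sum(axis=1) >= m).astype(int)
    # Flip attribute values and labels with the given noise rates.
    x[:, :n_rel] ^= (rng.random((n_inst, n_rel)) < rel_noise)
    x[:, n_rel:] ^= (rng.random((n_inst, n_irrel)) < irrel_noise)
    y ^= (rng.random(n_inst) < class_noise)
    return x, y

def knn_accuracy(n_train=100, n_test=500, k=5, **domain):
    tx, ty = sample_domain(n_train, **domain)
    sx, sy = sample_domain(n_test, **domain)
    correct = 0
    for x, y in zip(sx, sy):
        nearest = np.argsort(np.abs(tx - x).sum(axis=1), kind="stable")[:k]
        correct += int(ty[nearest].sum() * 2 >= k) == y
    return correct / n_test

print(knn_accuracy(n_rel=5, n_irrel=5, m=3))
```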


Conference on Computational Natural Language Learning | 2008

A Fast Boosting-based Learner for Feature-Rich Tagging and Chunking

Tomoya Iwakura; Seishi Okamoto

Combinations of features contribute to a significant improvement in accuracy on tasks such as part-of-speech (POS) tagging and text chunking, compared with using atomic features alone. However, selecting combinations of features when learning from large-scale, feature-rich training data requires long training times. We propose a fast boosting-based algorithm for learning rules represented by combinations of features. Our algorithm constructs a set of rules by repeatedly selecting several rules from a small proportion of candidate rules, where the candidates are generated from a subset of all the features with a technique similar to beam search. We then propose POS tagging and text chunking methods based on our learning algorithm. Our tagger and chunker use candidate POS tags or chunk tags for each word, collected from automatically tagged data. We evaluate our methods on English POS tagging and text chunking. The experimental results show that the training time of our algorithm is about 50 times shorter on average than that of Support Vector Machines with a polynomial kernel, while maintaining state-of-the-art accuracy and faster classification speed.
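
The core idea, choosing rules built from feature combinations out of a small, beam-limited candidate pool inside a boosting loop, can be pictured roughly as follows. This is a simplified toy with binary features and weighted-correlation rule scoring, not the paper's algorithm; every identifier here is an assumption.

```python
import numpy as np
from itertools import combinations

def boost_rules(X, y, rounds=10, beam=20):
    """Toy boosting loop over rules that are conjunctions of binary features.

    X: (n, d) 0/1 feature matrix; y: labels in {-1, +1}. Each round, candidate
    rules are single features or pairs drawn only from the `beam` features with
    the highest weighted correlation with y (a beam-search-like restriction).
    """
    n = X.shape[0]
    w = np.full(n, 1.0 / n)
    rules = []
    for _ in range(rounds):
        scores = np.abs((w * y) @ X)                 # beam step: most promising atomic features
        top = np.argsort(scores)[::-1][:beam]
        candidates = [(f,) for f in top] + list(combinations(top, 2))
        best, best_gain = candidates[0], -1.0
        for feats in candidates:
            fires = X[:, list(feats)].all(axis=1)    # rule fires where all its features are 1
            gain = abs(np.sum(w * y * np.where(fires, 1, -1)))
            if gain > best_gain:
                best, best_gain = feats, gain
        h = np.where(X[:, list(best)].all(axis=1), 1, -1)
        err = np.clip(np.sum(w[h != y]), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        rules.append((best, alpha))
        w *= np.exp(-alpha * y * h)                  # AdaBoost weight update
        w /= w.sum()
    return rules
```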


Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2000

Fast Discovery of Interesting Rules

Nobuhiro Yugami; Yuiko Ohta; Seishi Okamoto

Extracting interesting rules from databases is an important field of knowledge discovery. Typically, an enormous number of rules are embedded in a database, and one of the essential abilities of a discovery system is to evaluate the interestingness of rules in order to filter out the less interesting ones. This paper proposes a new criterion of rule interestingness based on exceptionality. The criterion evaluates the exceptionality of a rule by comparing its accuracy with that of simpler, more general rules. We also propose a discovery algorithm, DIG, to efficiently extract rules that are interesting with respect to this criterion.
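
As a rough illustration of the exceptionality idea, the sketch below scores a candidate rule by how much its accuracy (confidence) exceeds that of its immediate generalizations, i.e. the rules obtained by dropping one condition. The data layout and scoring function are assumptions made for illustration; this is not the DIG algorithm.

```python
def accuracy(rows, conditions, target):
    """Confidence of the rule `conditions -> target` over a list of dict rows."""
    covered = [r for r in rows if all(r.get(a) == v for a, v in conditions)]
    if not covered:
        return 0.0
    return sum(r.get(target[0]) == target[1] for r in covered) / len(covered)

def exceptionality(rows, conditions, target):
    """How much the full rule beats its best one-condition-dropped generalization."""
    general = [tuple(c for c in conditions if c != dropped) for dropped in conditions]
    best_general = max(accuracy(rows, g, target) for g in general)
    return accuracy(rows, conditions, target) - best_general

rows = [
    {"outlook": "sunny", "windy": False, "play": True},
    {"outlook": "sunny", "windy": False, "play": True},
    {"outlook": "sunny", "windy": True,  "play": False},
    {"outlook": "rain",  "windy": False, "play": False},
    {"outlook": "rain",  "windy": True,  "play": False},
]
rule = (("outlook", "sunny"), ("windy", False))
print(exceptionality(rows, rule, ("play", True)))   # positive: the combined rule is exceptional
```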


String Processing and Information Retrieval | 2010

Training parse trees for efficient VF coding

Takashi Uemura; Satoshi Yoshida; Takuya Kida; Tatsuya Asai; Seishi Okamoto

We address the problem of improving variable-length-to-fixed-length codes (VF codes), which have favourable properties for fast compressed pattern matching but only moderate compression ratios. The compression ratio of a VF code depends on the parse tree used as its dictionary. We propose a method that trains a parse tree by scanning the input text repeatedly, and we show experimentally that it quickly improves the compression ratio of VF codes to the level of state-of-the-art compression methods.
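
For readers unfamiliar with VF codes, the sketch below shows the basic mechanism: a parse tree (here a plain trie over a small set of strings) serves as the dictionary, the text is greedily parsed into registered substrings, and each parsed substring receives a fixed-length codeword (its index). Everything here, including the dictionary contents, is an illustrative assumption rather than the authors' construction; the repeated-scanning training step is sketched separately below the next abstract.

```python
def build_trie(strings):
    """Dictionary parse tree as nested dicts; each registered string gets a fixed-length index."""
    root, codes = {}, {}
    for i, s in enumerate(strings):
        node = root
        for ch in s:
            node = node.setdefault(ch, {})
        codes[s] = i
    return root, codes

def vf_encode(text, root, codes):
    """Greedy longest-match parse; every parsed substring becomes one fixed-length codeword."""
    out, i = [], 0
    while i < len(text):
        node, j, last = root, i, None
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if text[i:j] in codes:
                last = j
        if last is None:                 # single characters are assumed to be registered
            out.append(codes[text[i]])
            i += 1
        else:
            out.append(codes[text[i:last]])
            i = last
    return out

# Single characters plus a few frequent substrings form the dictionary.
root, codes = build_trie(["a", "b", "c", "ab", "abc", "bca"])
print(vf_encode("abcabcab", root, codes))
```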


Journal of Information Processing | 2012

Improving Parse Trees for Efficient Variable-to-Fixed Length Codes

Satoshi Yoshida; Takashi Uemura; Takuya Kida; Tatsuya Asai; Seishi Okamoto

We address the problem of improving variable-length-to-fixed-length codes (VF codes). A VF code, as we treat it here, is an encoding scheme that parses an input text into variable-length substrings and then assigns a fixed-length codeword to each parsed substring. VF codes have favourable properties for fast decoding and fast compressed pattern matching, but their compression ratios are worse than those of the latest compression methods. The compression ratio of a VF code depends on the parse tree used as its dictionary. To obtain a better compression ratio, we present several methods for improving the construction of parse trees. All of them are heuristic solutions, since constructing the optimal parse tree is intractable. We compared our methods with previous VF codes and showed experimentally that their compression ratios reach the level of state-of-the-art compression methods.
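
One way to picture the kind of heuristic improvement the abstract alludes to is an iterative reallocation of the fixed codeword budget: re-parse the text, then swap the least-used dictionary strings for extensions of the most-used ones. The sketch below, reusing the toy `build_trie`/`vf_encode` helpers from the previous entry, is a speculative illustration of that loop and not any of the paper's actual methods.

```python
from collections import Counter

def improve_dictionary(text, strings, rounds=5):
    """Toy reallocation heuristic: keep the dictionary size (and hence the codeword
    length) fixed, but trade rarely used strings for extensions of heavily used ones."""
    size = len(strings)
    for _ in range(rounds):
        root, codes = build_trie(strings)
        parsed = vf_encode(text, root, codes)
        names = {i: s for s, i in codes.items()}
        usage = Counter(names[c] for c in parsed)
        # Collect candidate extensions (parsed string + following character) seen in this parse.
        ext, pos = Counter(), 0
        for c in parsed:
            s = names[c]
            pos += len(s)
            if pos < len(text):
                ext[s + text[pos]] += 1
        # Rank: single characters first (needed as a fallback), then by usage, then by extension count.
        pool = sorted(set(strings) | set(ext), key=lambda s: (len(s) > 1, -usage[s], -ext[s]))
        strings = pool[:size]
    return strings

print(improve_dictionary("abcabcabcab", ["a", "b", "c", "ab", "ba", "ca"]))
```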


International Conference on Case-Based Reasoning | 1997

Theoretical Analysis of Case Retrieval Method Based on Neighborhood of a New Problem

Seishi Okamoto; Nobuhiro Yugami

The retrieval of similar cases is often performed by using the neighborhood of a new problem. This neighborhood is usually defined by a fixed number of the most similar cases (the k nearest neighbors) to the problem. This paper deals with an alternative definition of neighborhood that comprises the cases within a certain distance, d, of the problem. We present an average-case analysis of a classifier, the d-nearest neighborhood method (d-NNh), that retrieves the cases in this neighborhood and predicts their majority class as the class of the problem. Our analysis deals with m-of-n/l target concepts and handles three types of noise. We formally compute the expected classification accuracy of d-NNh and then explore its predicted behavior. By combining this exploration for d-NNh with the corresponding one for the k-nearest neighbor method (k-NN) from our previous study, we compare the predicted behavior of each in noisy domains. Our formal analysis is supported by Monte Carlo simulations.
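
The difference between the two neighborhood definitions is easy to see in code. The toy sketch below contrasts the fixed-size neighborhood (k nearest cases) with the fixed-radius one (all cases within Hamming distance d) used by d-NNh; the tie-breaking rule and the fallback when the d-neighborhood is empty are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

def knn_predict(cases, labels, query, k):
    """Majority class among the k cases nearest to the query (Hamming distance)."""
    dists = np.abs(cases - query).sum(axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    return int(labels[nearest].sum() * 2 >= k)

def dnnh_predict(cases, labels, query, d):
    """Majority class among all cases within distance d of the query (d-NNh)."""
    dists = np.abs(cases - query).sum(axis=1)
    inside = labels[dists <= d]
    if inside.size == 0:                      # empty neighborhood: fall back to 1-NN
        return int(labels[np.argmin(dists)])
    return int(inside.sum() * 2 >= inside.size)

rng = np.random.default_rng(2)
cases = rng.integers(0, 2, size=(30, 6))
labels = (cases[:, :4].sum(axis=1) >= 2).astype(int)   # a 2-of-4 target with 2 irrelevant attributes
query = rng.integers(0, 2, size=6)
print(knn_predict(cases, labels, query, k=5), dnnh_predict(cases, labels, query, d=1))
```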


EWCBR '94: Selected Papers from the Second European Workshop on Advances in Case-Based Reasoning | 1994

An Average Predictive Accuracy of the Nearest Neighbor Classifier

Seishi Okamoto; Ken Satoh

The definition of similarity between cases is a key issue in case-based reasoning. The nearest neighbor method provides a basic mechanism for defining similarity in case-based reasoning systems. In this paper, we perform an average-case analysis of the nearest neighbor classifier for conjunctive classes. We formally compute the predictive accuracy of the nearest neighbor classifier, represented as a function of the number of training cases and the numbers of relevant attributes for the classes. We also plot the predictive behavior of the classifier by substituting actual values into the parameters of the accuracy function. These plots help us understand the relationships between the parameters of the accuracy function and the predictive accuracy of the nearest neighbor classifier. Our investigation focuses on how the numbers of relevant attributes for the classes affect the predictive accuracy.
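
The setting can again be approximated by simulation. The sketch below uses a single conjunctive target (class 1 iff a conjunction of relevant attributes holds, class 0 otherwise) and estimates 1-NN accuracy as the number of relevant attributes and the training-set size vary; this simplification and all parameter values are assumptions, not the paper's formal computation.

```python
import numpy as np

rng = np.random.default_rng(3)

def conjunctive_label(x, n_rel):
    """Class 1 iff the first n_rel attributes are all 1 (a conjunctive concept)."""
    return int(x[:n_rel].all())

def nn_accuracy(n_attrs=8, n_rel=3, n_train=30, trials=2000):
    correct = 0
    for _ in range(trials):
        train = rng.integers(0, 2, size=(n_train, n_attrs))
        train_y = np.array([conjunctive_label(x, n_rel) for x in train])
        query = rng.integers(0, 2, size=n_attrs)
        nearest = np.argmin(np.abs(train - query).sum(axis=1))
        correct += int(train_y[nearest]) == conjunctive_label(query, n_rel)
    return correct / trials

for n_rel in (2, 4, 6):
    print(n_rel, [nn_accuracy(n_rel=n_rel, n_train=n) for n in (10, 30, 100)])
```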


Pacific Rim International Conference on Artificial Intelligence | 2014

An AdaBoost for Efficient Use of Confidences of Weak Hypotheses on Text Categorization

Tomoya Iwakura; Takahiro Saitou; Seishi Okamoto

We propose a boosting algorithm based on AdaBoost that uses real-valued weak hypotheses, which return the confidence of their classifications as real numbers, together with an approximated upper bound on the training error. The approximated upper bound is derived with Bernoulli's inequality, and it enables us to analytically calculate a confidence value that guarantees a reduction of the original upper bound. Experimental results on the Reuters-21578 data set and an Amazon review data set show that our boosting algorithm with a perceptron attains better accuracy than Support Vector Machines, decision-stump-based boosting algorithms, and a perceptron alone.
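
For context, a generic confidence-rated AdaBoost round looks roughly like the sketch below. The weight shown here is the classic Schapire-Singer choice, not the Bernoulli-inequality bound derived in the paper, and the weak learner (e.g. a perceptron trained on weighted data) is only a placeholder parameter.

```python
import numpy as np

def confidence_rated_adaboost(X, y, weak_learner, rounds=50):
    """Generic AdaBoost loop for weak hypotheses h(x) returning real-valued
    confidences in [-1, 1]; labels y must be in {-1, +1}.

    alpha uses the standard 0.5 * ln((1 + r) / (1 - r)); the paper replaces this
    step with a value derived from a Bernoulli-inequality upper bound.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        h = weak_learner(X, y, w)                 # placeholder: returns a callable hypothesis
        preds = np.clip(h(X), -1.0, 1.0)
        r = np.sum(w * y * preds)                 # weighted correlation with the labels
        alpha = 0.5 * np.log((1 + r + 1e-12) / (1 - r + 1e-12))
        ensemble.append((alpha, h))
        w *= np.exp(-alpha * y * preds)           # confidence-rated weight update
        w /= w.sum()
    return lambda X_new: np.sign(sum(a * h(X_new) for a, h in ensemble))
```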


Database Systems for Advanced Applications | 2010

Chimera: stream-oriented XML filtering/querying engine

Tatsuya Asai; Shinichiro Tago; Hiroya Inakoshi; Seishi Okamoto; Masayuki Takeda

In this paper, we study the problem of filtering and querying massive XML data against a large set of XPath patterns in Univariate XPath. Based on XSIGMA, an efficient matching engine for linear XPath patterns with Boolean expressions over keywords, and a twig evaluator over event streams, we propose an XPath filtering/querying engine, Chimera, which runs fast and stably for any XPath patterns without the heavy preprocessing of the queried data often used by existing native XML databases and RDBs. Chimera also runs much faster than those engines against thousands of XPath patterns. We implemented Chimera and demonstrated its effectiveness through several experiments on artificial and real datasets.
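
Chimera itself is not sketched here, but the basic stream-filtering idea, matching simple linear, child-axis-only XPath patterns against an event stream with a path stack, can be illustrated as follows. The pattern syntax and helper names are assumptions for illustration and cover none of Chimera's Boolean-keyword or twig features.

```python
import io
import xml.etree.ElementTree as ET

def filter_stream(xml_source, patterns):
    """Report which linear, child-axis-only patterns (e.g. '/catalog/book/title')
    match elements in a streamed XML document, together with their text."""
    targets = {tuple(p.strip("/").split("/")): p for p in patterns}
    stack, matches = [], []
    for event, elem in ET.iterparse(xml_source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:                                    # "end": the element's text is now complete
            if tuple(stack) in targets:
                matches.append((targets[tuple(stack)], (elem.text or "").strip()))
            stack.pop()
            elem.clear()                         # keep memory bounded while streaming
    return matches

doc = io.BytesIO(
    b"<catalog><book><title>Data Compression</title></book>"
    b"<book><title>Boosting</title></book></catalog>"
)
print(filter_stream(doc, ["/catalog/book/title"]))
```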
