Nobuhiro Yugami | Researchain

Publication

Featured research published by Nobuhiro Yugami.

Theoretical Computer Science | 2003

Effects of domain characteristics on instance-based learning algorithms

Seishi Okamoto; Nobuhiro Yugami

This paper presents average-case analyses of instance-based learning algorithms. The algorithms analyzed employ a variant of the k-nearest neighbor classifier (k-NN). Our analysis deals with a monotone m-of-n target concept with irrelevant attributes, and handles three types of noise: relevant attribute noise, irrelevant attribute noise, and class noise. We formally represent the expected classification accuracy of k-NN as a function of domain characteristics including the number of training instances, the number of relevant and irrelevant attributes, the threshold number in the target concept, the probability of each attribute, the noise rate for each type of noise, and k. We also explore the behavioral implications of the analyses by presenting the effects of domain characteristics on the expected accuracy of k-NN and on the optimal value of k for artificial domains.

Discovery Science | 2003

Mining Interesting Patterns Using Estimated Frequencies from Subpatterns and Superpatterns

Yukiko Yoshida; Yuiko Ohta; Ken Kobayashi; Nobuhiro Yugami

In knowledge discovery in databases, the number of discovered patterns is often too large for humans to understand, so less important ones must be filtered out. For this purpose, a number of interestingness measures for patterns have been introduced, and conventional ones evaluate a pattern by how much its actual frequency exceeds the value predicted from its subpatterns. These measures may assign high scores not only to a pattern consisting of a set of strongly correlated items but also to its subpatterns, and in many cases it is unnecessary to select all these subpatterns as interesting. To reduce this redundancy, we propose a new approach to evaluating the interestingness of patterns. We use a measure of interestingness that evaluates how much the actual frequency of a pattern exceeds the value predicted not only from its subpatterns but also from its superpatterns. By adding an estimate from superpatterns, our measure filters out redundant subpatterns more effectively than conventional measures. We discuss the effectiveness of our interestingness measure through a set of experimental results.
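
The conventional kind of measure mentioned above can be illustrated with a minimal sketch. It assumes the simplest possible prediction from subpatterns, namely that the single items of a pattern occur independently; the function names `frequency` and `lift_from_items` are hypothetical, and the abstract does not specify the paper's own estimate from superpatterns, so none is attempted here.

```python
def frequency(transactions, pattern):
    """Fraction of transactions that contain every item of the pattern."""
    pattern = set(pattern)
    return sum(pattern <= set(t) for t in transactions) / len(transactions)


def lift_from_items(transactions, pattern):
    """Ratio of the actual frequency to the frequency predicted from
    single-item subpatterns under an independence assumption.

    Assumption for illustration only: the prediction is the product of
    the item frequencies. Every item must occur at least once, otherwise
    the prediction is zero and the ratio is undefined.
    """
    predicted = 1.0
    for item in pattern:
        predicted *= frequency(transactions, [item])
    return frequency(transactions, pattern) / predicted


transactions = [{"a", "b"}, {"a", "b"}, {"c"}, {"c"}]
score = lift_from_items(transactions, {"a", "b"})
```

A score well above 1 means the pattern occurs more often than its items alone would suggest, which is the sense in which conventional measures call it interesting.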

Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2000

Fast Discovery of Interesting Rules

Nobuhiro Yugami; Yuiko Ohta; Seishi Okamoto

Extracting interesting rules from databases is an important field of knowledge discovery. Typically, an enormous number of rules are embedded in a database, and one of the essential abilities of a discovery system is to evaluate the interestingness of rules in order to filter out the less interesting ones. This paper proposes a new criterion of rule interestingness based on exceptionality. The criterion evaluates the exceptionality of rules by comparing their accuracy with that of simpler and more general rules. We also propose a discovery algorithm, DIG, to extract interesting rules with respect to the criterion effectively.

Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2014

MultiAspectSpotting: Spotting Anomalous Behavior within Count Data Using Tensor

Koji Maruhashi; Nobuhiro Yugami

Methods for finding anomalous behaviors are attracting much attention, especially for very large datasets that have several attributes, each with tens of thousands of categorical values. For example, security engineers try to find anomalous behaviors, i.e., remarkable attacks that differ greatly from the day’s trend of attacks, on the basis of intrusion detection system logs with source IPs, destination IPs, port numbers, and additional information. However, a large number of abnormal records are caused by noise; these can recur more abnormally than those caused by anomalous behaviors, and the two kinds are hard to distinguish from each other. To tackle these difficulties, we propose a two-step anomaly detection method. First, we detect abnormal records as individual anomalies using statistical anomaly detection, which can be improved by Poisson Tensor Factorization. Next, we gather the individual anomalies into groups of records with similar attribute values, which can be implemented by CANDECOMP/PARAFAC (CP) Decomposition. We conduct experiments using datasets injected with synthesized anomalies and show that our method can spot anomalous behaviors effectively. Moreover, our method can spot interesting patterns within some real-world datasets such as IDS logs and web-access logs.

International Conference on Case-Based Reasoning | 1997

Theoretical Analysis of Case Retrieval Method Based on Neighborhood of a New Problem

Seishi Okamoto; Nobuhiro Yugami

The retrieval of similar cases is often performed by using the neighborhood of a new problem. The neighborhood is usually defined by a certain fixed number of most similar cases (k nearest neighbors) to the problem. This paper deals with an alternative definition of neighborhood that comprises the cases within a certain distance, d, from the problem. We present an average-case analysis of a classifier, the d-nearest neighborhood method (d-NNh), that retrieves cases in this neighborhood and predicts their majority class as the class of the problem. Our analysis deals with m-of-n/l target concepts, and handles three types of noise. We formally compute the expected classification accuracy of d-NNh, then we explore the predicted behavior of d-NNh. By combining this exploration for d-NNh with the one for the k-nearest neighbor method (k-NN) in our previous study, we compare the predicted behavior of each in noisy domains. Our formal analysis is supported by Monte Carlo simulations.
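
The distance-based neighborhood described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: cases are tuples of binary attributes compared by Hamming distance, an empty neighborhood falls back to a `default` class, and ties go to the smallest label. The names `hamming` and `d_nnh_predict` are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of attribute positions where two cases differ."""
    return sum(x != y for x, y in zip(a, b))


def d_nnh_predict(cases, query, d, default=0):
    """Predict the majority class among stored cases within distance d.

    cases: list of (attributes, label) pairs.
    Assumptions for illustration: Hamming distance, `default` is returned
    when no case lies within distance d, ties go to the smallest label.
    """
    votes = Counter(
        label for attrs, label in cases if hamming(attrs, query) <= d
    )
    if not votes:
        return default
    top = max(votes.values())
    return min(label for label, count in votes.items() if count == top)


cases = [((1, 1, 0), 1), ((1, 0, 0), 1), ((0, 0, 1), 0)]
prediction = d_nnh_predict(cases, (1, 1, 1), d=1)
```

Unlike k-NN, the number of retrieved cases is not fixed: it grows or shrinks with how densely the stored cases surround the query.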

Conference on Artificial Intelligence for Applications | 1992

An assumption-based combinatorial optimization system

Hirotaka Hara; Nobuhiro Yugami; Hiroyuki Yoshida

An assumption-based combinatorial optimization system is proposed for solving combinatorial optimization problems. The system is a local search method in which a solution is formulated as a set of assumptions. Minimal support for the objective function is a minimal set of assumptions that guarantees the value of the objective function. Using minimal support, the system finds an approximately optimal solution efficiently because it reduces the number of neighbors, prevents loops in the search, prunes the search space, and never stays at a locally optimal solution. The system was applied to a job-shop scheduling problem, and its effectiveness compared with other methods was demonstrated.

IEEE Region 10 Conference | 2016

Pyramid stack data stream mining for handling concept-drifting

Zhuoran Xu; Cuiqin Hou; Yingju Xia; Jun Sun; Hiroya Inakoshi; Nobuhiro Yugami

Data stream mining has gained growing attention recently. Concept drift is a particular problem in data stream mining, in which the distribution of the data may change over time. Most current methods try either to estimate the current distribution or to reconstruct it from a mixture of old distributions. They suffer from estimation error and reconstruction error, respectively. In this paper, we find that a classifier that fits the current distribution can be obtained more directly than with current methods by ensembling classifiers trained on increasing amounts of recent data. This strategy guarantees that no matter when and how concept drift happens, there is always a classifier that suits the current data distribution. Our method therefore only needs to select the classifier for the current distribution out of all the classifiers we hold, which is much easier than estimation or reconstruction. We test our method on four real-world data sets. Compared with other methods, our method achieves the best average accuracy.
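
The selection idea in this abstract can be illustrated with a toy sketch. Everything concrete here is an assumption rather than a detail from the paper: window sizes double, the learner is a trivial majority-class predictor, and the classifier is chosen by its accuracy on a small hold-out of the newest labels. The names `train_majority` and `pyramid_select` are hypothetical.

```python
from collections import Counter


def train_majority(labels):
    """Toy learner: always predicts the most common label in its window."""
    return Counter(labels).most_common(1)[0][0]


def pyramid_select(stream, holdout=2):
    """Train one toy classifier per nested window of recent labels and
    return (accuracy, window_size, prediction) for the best one.

    Assumptions for illustration: doubling window sizes, selection by
    accuracy on the newest `holdout` labels, and a preference for the
    larger window when accuracies are equal.
    """
    train, recent = stream[:-holdout], stream[-holdout:]
    sizes, size = [], 1
    while size < len(train):
        sizes.append(size)
        size *= 2
    sizes.append(len(train))  # the largest window covers all training data

    best = None
    for size in sizes:
        prediction = train_majority(train[-size:])
        accuracy = sum(prediction == y for y in recent) / len(recent)
        if best is None or accuracy >= best[0]:
            best = (accuracy, size, prediction)
    return best


# Labels drift from 0 to 1 partway through the stream.
stream = [0] * 12 + [1] * 5 + [1, 1]
best = pyramid_select(stream)
```

In this toy stream the windows that reach too far back are dominated by pre-drift labels, so a shorter window is selected; without drift, the largest window would win the tie-break.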

Database Systems for Advanced Applications | 2014

Discovery of Areas with Locally Maximal Confidence from Location Data

Hiroya Inakoshi; Hiroaki Morikawa; Tatsuya Asai; Nobuhiro Yugami; Seishi Okamoto

A novel algorithm is presented for discovering areas having locally maximized confidence of an association rule on a collection of location data. Although location data obtained from GPS-equipped devices have promising applications, those GPS points are usually not uniformly distributed in two-dimensional space. As a result, substantial insights might be missed by data mining algorithms that discover admissible or rectangular areas under the assumption that the GPS data points are distributed uniformly. The proposed algorithm composes transitively connected groups of irregular meshes that have locally maximized confidence. There is thus no need to assume uniformity, which enables the discovery of areas not limited to a certain class of shapes. Iterative removal of the meshes in accordance with the local maximum property enables the algorithm to perform 50 times faster than state-of-the-art ones.

Discovery Science | 2000

Discovery of M-of-N Concepts for Classification

Nobuhiro Yugami

The purpose of a knowledge discovery system is to discover interesting patterns in a given database. There exist many types of patterns, and this paper focuses on the discovery of classification rules from a set of training instances represented by attribute values and class labels. A classification rule restricts the values of attributes in its body and predicts the class of an instance that satisfies the body. Usually, a body is a conjunction of conditions on attribute values. This paper deals with a different type of rule whose body is a threshold function that requires at least m of its n conditions to be satisfied. Such rules have much more representational power than rules with conjunctive bodies and are suitable for many real-world problems, such as the diagnosis of diseases, in which observing more symptoms of a certain disease leads to a more confident diagnosis [3], [8].
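
A threshold body of this kind is easy to state in code. The sketch below is only an illustration of the rule form, not of the paper's discovery algorithm; the helper name `m_of_n_rule` and the symptom attributes are hypothetical.

```python
def m_of_n_rule(conditions, m):
    """Build a rule body that holds when at least m of the n conditions do.

    conditions: list of predicates over an instance (here, a dict).
    With m equal to n the body is an ordinary conjunction; with m equal
    to 1 it is a disjunction.
    """
    def body(instance):
        return sum(1 for cond in conditions if cond(instance)) >= m
    return body


# Hypothetical diagnosis rule: at least 2 of 3 symptoms are present.
symptoms = [
    lambda x: x["fever"],
    lambda x: x["cough"],
    lambda x: x["fatigue"],
]
rule = m_of_n_rule(symptoms, m=2)
patient = {"fever": True, "cough": False, "fatigue": True}
matches = rule(patient)
```

A conjunctive rule would need a separate rule for each pair of symptoms to express the same condition, which is one way to see the extra representational power.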

Discovery Science | 1998

Instance Guided Rule Induction

Nobuhiro Yugami; Yuiko Ohta; Seishi Okamoto

This paper proposes a new supervised induction algorithm, IGR, that uses each training instance as a guide for rule induction. IGR learns a set of if-then rules by inducing a pseudo-optimum classification rule for each training instance. IGR weights the induced rules by the number of training instances they cover and classifies new instances by weighted majority voting. Experimental results with twenty datasets in the UCI repository show that IGR can induce more accurate classification rules than existing learning algorithms such as C4.5, AQ, and LazyDT. The experiments also show that IGR does not generate too many rules even when it is applied to large problems.
