Yasuhiko Morimoto | Researchain

Publication

Featured researches published by Yasuhiko Morimoto.

Journal of Computer and System Sciences | 1999

Mining Optimized Association Rules for Numeric Attributes

Takeshi Fukuda; Yasuhiko Morimoto; Shinichi Morishita; Takeshi Tokuyama

Given a huge database, we address the problem of finding association rules for numeric attributes, such as (Balance ∈ I) ⇒ (CardLoan = yes), which implies that bank customers whose balances fall in a range I are likely to use card loans with a probability greater than p. The above rule is interesting only if the range I has some special feature with respect to the interrelation between Balance and CardLoan. It is required that the number of customers whose balances are contained in I (called the support of I) is sufficient and also that the probability p of the condition CardLoan = yes being met (called the confidence ratio) be much higher than the average probability of the condition over all the data. Our goal is to realize a system that finds such appropriate ranges automatically. We mainly focus on computing two optimized ranges: one that maximizes the support on the condition that the confidence ratio is at least a given threshold value, and another that maximizes the confidence ratio on the condition that the support is at least a given threshold number. Using techniques from computational geometry, we present novel algorithms that compute the optimized ranges in linear time if the data are sorted. Since sorting data with respect to each numeric attribute is expensive in the case of huge databases that occupy much more space than the main memory, we instead apply randomized bucketing as the preprocessing method and thus obtain an efficient rule-finding system. Tests show that our implementation is fast not only in theory but also in practice. The efficiency of our algorithm enables us to compute optimized rules for all combinations of hundreds of numeric and Boolean attributes in a reasonable time.
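To make the optimized-support range concrete, the following is a minimal sketch in Python. It assumes the data have already been bucketed, so each bucket carries a record count and a count of records satisfying the Boolean condition. It is a quadratic brute force over bucket ranges, not the linear-time algorithm of the paper, and the function name, parameter names, and sample numbers are hypothetical.

```python
from itertools import accumulate


def optimized_support_range(counts, hits, min_conf):
    """Return (start, end, support, confidence) for the contiguous bucket
    range with the largest support whose confidence is at least min_conf,
    or None if no range qualifies.

    counts[i]: number of records in bucket i (hypothetical input layout)
    hits[i]:   number of those records that meet the Boolean condition
    """
    # Prefix sums give the support and hit count of any range in O(1).
    prefix_counts = [0] + list(accumulate(counts))
    prefix_hits = [0] + list(accumulate(hits))
    best = None
    for start in range(len(counts)):
        for end in range(start, len(counts)):
            support = prefix_counts[end + 1] - prefix_counts[start]
            if support == 0:
                continue  # confidence is undefined for an empty range
            hit = prefix_hits[end + 1] - prefix_hits[start]
            # hit / support >= min_conf, written without a division
            if hit >= min_conf * support:
                if best is None or support > best[2]:
                    best = (start, end, support, hit / support)
    return best


# Made-up buckets over Balance; hits are customers with CardLoan = yes.
sample_counts = [100, 80, 120, 90, 110]
sample_hits = [5, 30, 70, 40, 10]
print(optimized_support_range(sample_counts, sample_hits, 0.5))
```

With the sample data and a confidence threshold of 0.5, the sketch selects buckets 2 through 3, which together hold 210 records. The optimized-confidence range can be found the same way by swapping the roles of the constraint and the objective.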

knowledge discovery and data mining | 2001

Mining frequent neighboring class sets in spatial databases

Yasuhiko Morimoto

We consider the problem of finding neighboring class sets. Objects of each instance of a neighboring class set are grouped using their Euclidean distances from each other. Location-based services have recently been growing along with mobile computing infrastructure such as cellular phones and PDAs. Therefore, we expect to see the development of spatial databases that contain a very large number of access records including location information. The most typical type would be a database of point objects. Records of the objects may consist of the requested service name and the number of packets transmitted, in addition to the x and y coordinate values indicating where the request came from. The algorithm presented here efficiently finds sets of service names that were frequently close to each other in the spatial database. For example, it may find a frequent neighboring class set in which ticket and timetable are frequently requested close to each other. By recognizing this, location-based service providers can promote a ticket service to customers who access the timetable.
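As a rough illustration of the idea, the sketch below counts how often two different service names occur within a given Euclidean distance of each other. It is a simplified, quadratic pairwise count restricted to sets of size two, not the algorithm of the paper, and the record layout, function name, and sample points are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations
from math import dist  # Euclidean distance, available since Python 3.8


def neighboring_class_pairs(points, radius):
    """Count, for each unordered pair of distinct classes, the number of
    point pairs that lie within `radius` of each other.

    points: list of (class_name, (x, y)) tuples (hypothetical layout)
    """
    counts = Counter()
    for (class_a, pos_a), (class_b, pos_b) in combinations(points, 2):
        if class_a != class_b and dist(pos_a, pos_b) <= radius:
            counts[frozenset((class_a, class_b))] += 1
    return counts


# Made-up access records: service name and request location.
records = [
    ("ticket", (0.0, 0.0)),
    ("timetable", (0.5, 0.0)),
    ("ticket", (10.0, 10.0)),
    ("timetable", (10.0, 10.5)),
    ("weather", (50.0, 50.0)),
]
for pair, count in neighboring_class_pairs(records, 1.0).items():
    print(sorted(pair), count)
```

A pair whose count exceeds a chosen frequency threshold would be a candidate frequent neighboring class set of size two. Larger sets and large databases need a spatial index rather than this all-pairs scan.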

ACM Transactions on Database Systems | 2001

Data Mining with optimized two-dimensional association rules

Takeshi Fukuda; Yasuhiko Morimoto; Shinichi Morishita; Takeshi Tokuyama

We discuss data mining based on association rules for two numeric attributes and one Boolean attribute. For example, in a database of bank customers, Age and Balance are two numeric attributes, and CardLoan is a Boolean attribute. Taking the pair (Age, Balance) as a point in two-dimensional space, we consider an association rule of the form ((Age, Balance) ∈ P) ⇒ (CardLoan = Yes), which implies that bank customers whose ages and balances fall within a planar region P tend to take out credit card loans with a high probability. We consider two classes of regions, rectangles and admissible (i.e., connected and x-monotone) regions. For each class, we propose efficient algorithms for computing the regions that give optimal association rules for gain, support, and confidence, respectively. We have implemented the algorithms for admissible regions as well as several advanced functions based on them in our data mining system named SONAR (System for Optimized Numeric Association Rules), where the rules are visualized by using a graphic user interface to make it easy for users to gain an intuitive understanding of rules.

symposium on applications and the internet | 2003

Extracting spatial knowledge from the web

Yasuhiko Morimoto; Masaki Aono; Michael E. Houle; Kevin S. McCurley

The content of the World-Wide Web is pervaded by information of a geographical or spatial nature, particularly location information such as addresses, postal codes, and telephone numbers. We present a system for extracting spatial knowledge from collections of Web pages gathered by Web-crawling programs. For each page determined to contain location information, we apply geocoding techniques to compute geographic coordinates, such as latitude-longitude pairs. Next, we augment the location information with keyword descriptors extracted from Web page contents. We then apply spatial data mining techniques on the augmented location information to derive spatial knowledge.

very large data bases | 1997

Efficient Construction of Regression Trees with Range and Region Splitting

Yasuhiko Morimoto; Hiromu Ishii; Shinichi Morishita

We propose a method for constructing regression trees with range and region splitting. We present an efficient algorithm for computing the optimal two-dimensional region that minimizes the mean squared error of an objective numeric attribute in a given database. As two-dimensional regions, we consider a class R of grid-regions, such as “x-monotone,” “rectilinear-convex,” and “rectangular,” in the plane associated with two numeric attributes. We compute the optimal region R. We propose to use a test that splits data into those that lie inside the region R and those that lie outside the region in the construction of regression trees. Experiments confirm that the use of region splitting gives compact and accurate regression trees in many domains.
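The one-dimensional case, range splitting, can be sketched in a few lines. The Python example below searches all ranges of a single numeric attribute and returns the one whose inside/outside split minimizes the total squared error of the target around each side's mean. It is a brute force for illustration only, not the region algorithm of the paper; it assumes distinct attribute values, and all names and sample data are hypothetical.

```python
def best_range_split(xs, ys):
    """Return (lo, hi, sse): the range [lo, hi] of x whose inside/outside
    split minimizes the summed squared error of y around each side's mean.
    Assumes the x values are distinct. Returns None for fewer than 2 records.
    """
    pts = sorted(zip(xs, ys))
    n = len(pts)
    # Prefix sums of y and y^2 over the records sorted by x.
    sum_y, sum_y2 = [0.0], [0.0]
    for _, y in pts:
        sum_y.append(sum_y[-1] + y)
        sum_y2.append(sum_y2[-1] + y * y)

    def sse(count, total, total_sq):
        # Sum of squared deviations from the mean: sum(y^2) - (sum y)^2 / n
        return total_sq - total * total / count if count else 0.0

    best = None
    for i in range(n):
        for j in range(i, n):
            inside = j - i + 1
            outside = n - inside
            if outside == 0:
                continue  # a split must leave records on both sides
            in_y = sum_y[j + 1] - sum_y[i]
            in_y2 = sum_y2[j + 1] - sum_y2[i]
            total = (sse(inside, in_y, in_y2)
                     + sse(outside, sum_y[n] - in_y, sum_y2[n] - in_y2))
            if best is None or total < best[2]:
                best = (pts[i][0], pts[j][0], total)
    return best


# Made-up data: the target is high only for middle values of x.
print(best_range_split([1, 2, 3, 4, 5, 6], [1, 1, 5, 5, 1, 1]))
```

On the sample data the best range is [3, 4], which separates the two target levels exactly. A single threshold test of the form x ≤ c cannot do so, which is the motivation for range and region tests.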

Constraints - An International Journal | 1998

Implementation and evaluation of decision trees with range and region splitting

Yasuhiko Morimoto; Takeshi Fukuda; Shinichi Morishita; Takeshi Tokuyama

We propose an extension of an entropy-based heuristic for constructing a decision tree from a large database with many numeric attributes. When it comes to handling numeric attributes, conventional methods are inefficient if any numeric attributes are strongly correlated. Our approach offers one solution to this problem. For each pair of numeric attributes with strong correlation, we compute a two-dimensional association rule with respect to these attributes and the objective attribute of the decision tree. In particular, we consider a family R of grid-regions in the plane associated with the pair of attributes. For R ∈ R, the data can be split into two classes: data inside R and data outside R. We compute the region R_opt ∈ R that minimizes the entropy of the splitting, and add the splitting associated with R_opt (for each pair of strongly correlated attributes) to the set of candidate tests in an entropy-based heuristic. We give efficient algorithms for cases in which R is (1) x-monotone connected regions, (2) based-monotone regions, (3) rectangles, and (4) rectilinear convex regions. The algorithm has been implemented as a subsystem of SONAR (System for Optimized Numeric Association Rules) developed by the authors. We have confirmed that we can compute the optimal region efficiently, and diverse experiments show that our approach can create compact trees whose accuracy is comparable with or better than that of conventional trees. More importantly, we can grasp non-linear correlation among numeric attributes which could not be found without our region splitting.

international conference on management of data | 1996

SONAR: system for optimized numeric association rules

Takeshi Fukuda; Yasuhiko Morimoto; Shinichi Morishita; Takeshi Tokuyama

Recent progress in technologies for data input has made it easier for finance and retail organizations to collect massive amounts of data and to store them on disk at a low cost. Such organizations are interested in extracting from these huge databases previously unnoticed information that inspires new marketing strategies. In this demonstration, we introduce SONAR, a system for mining optimized association rules from databases with numeric data as well as Boolean data. An example of an association rule is

international symposium on algorithms and computation | 1996

Interval Finding and Its Application to Data Mining

Takeshi Fukuda; Yasuhiko Morimoto; Shinichi Morishita; Takeshi Tokuyama

In this paper, we investigate inverse problems of the interval query problem in application to data mining. Let I be the set of all intervals on U = {0, 1, 2, ..., n}. Consider an objective function f(I), conditional functions u_i(I) on I, and define an optimization problem of finding the interval I maximizing f(I) subject to u_i(I) > K_i for given real numbers K_i (i = 1, 2, ..., h). We propose efficient algorithms to solve the above optimization problem if the objective function is either additive or quotient, and the conditional functions are additive, where a function f is additive if \(f(I) = \sum_{i \in I} \hat{f}(i)\), extending a function \(\hat{f}\) on U, and quotient if it is represented as a quotient of two additive functions. We use computational-geometric methods such as convex hull, range searching, and multidimensional divide-and-conquer.
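The simplest instance of this problem has an additive objective and no conditional functions (h = 0): find the interval that maximizes the sum of the per-element values. The Python sketch below solves that special case with the classic linear-time scan known as Kadane's algorithm. It is offered only as an illustration of the problem statement and is not one of the paper's algorithms, which also handle constraints and quotient objectives; the function name and sample values are hypothetical.

```python
def max_additive_interval(f_hat):
    """Return (lo, hi, value) for the non-empty interval [lo, hi] that
    maximizes f(I) = sum of f_hat[i] for i in I, with no side constraints.
    f_hat must be a non-empty sequence of numbers.
    """
    best_lo = best_hi = cur_lo = 0
    best = cur = f_hat[0]
    for i in range(1, len(f_hat)):
        if cur < 0:
            # A negative running sum can only hurt; restart the interval at i.
            cur, cur_lo = f_hat[i], i
        else:
            cur += f_hat[i]
        if cur > best:
            best, best_lo, best_hi = cur, cur_lo, i
    return best_lo, best_hi, best


# Made-up per-element values on U = {0, ..., 6}.
print(max_additive_interval([2, -3, 4, -1, 2, -5, 3]))
```

For the sample values the best interval is [2, 4] with value 5. Adding constraints of the form u_i(I) > K_i rules out this simple scan, because the best feasible interval may no longer be built from locally best prefixes.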

IEEE Transactions on Knowledge and Data Engineering | 2002

Algorithms for finding attribute value group for binary segmentation of categorical databases

Yasuhiko Morimoto; Takeshi Fukuda; Takeshi Tokuyama

We consider the problem of finding a set of attribute values that give a high quality binary segmentation of a database. The quality of a segmentation is defined by an objective function suitable for the user's objective, such as mean squared error, mutual information, or χ², each of which is defined in terms of the distribution of a given target attribute. Our goal is to find value groups on a given conditional domain that split databases into two segments, optimizing the value of an objective function. Though the problem is intractable for general objective functions, there are feasible algorithms for finding high quality binary segmentations when the objective function is convex, and we prove that the typical criteria mentioned above are all convex. We propose two practical algorithms, based on computational geometry techniques, which find a much better value group than conventional heuristics.
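For one familiar special case, a single numeric target scored by squared error, a value group can be found with a well-known shortcut from the regression-tree literature: sort the categories by their mean target value and test only the prefixes of that order. The Python sketch below implements that shortcut. It is not either of the paper's algorithms, which address more general target distributions, and the function name and sample data are assumptions.

```python
from collections import defaultdict


def best_value_group(values, targets):
    """Return (group, sse): the set of category values whose in/out split
    minimizes the total squared error of the numeric target.
    Returns None if there are fewer than two distinct categories.
    """
    # Per-category count, sum, and sum of squares of the target.
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for v, y in zip(values, targets):
        s = stats[v]
        s[0] += 1
        s[1] += y
        s[2] += y * y
    # Order categories by mean target; for squared error the optimal
    # binary partition is a prefix of this order.
    order = sorted(stats, key=lambda v: stats[v][1] / stats[v][0])
    n = sum(s[0] for s in stats.values())
    total_y = sum(s[1] for s in stats.values())
    total_y2 = sum(s[2] for s in stats.values())

    best = None
    cnt, sy, sy2 = 0, 0.0, 0.0
    for k, v in enumerate(order[:-1]):  # keep at least one category outside
        c, a, b = stats[v]
        cnt, sy, sy2 = cnt + c, sy + a, sy2 + b
        inside = sy2 - sy * sy / cnt
        outside = (total_y2 - sy2) - (total_y - sy) ** 2 / (n - cnt)
        if best is None or inside + outside < best[1]:
            best = (set(order[:k + 1]), inside + outside)
    return best


# Made-up records: a categorical attribute and a numeric target.
cats = ["a", "b", "c", "d", "a", "b", "c", "d"]
ys = [1, 9, 2, 10, 1, 9, 2, 10]
print(best_value_group(cats, ys))
```

On the sample data the sketch groups a with c, the two categories with low target means, leaving b and d in the other segment.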

knowledge discovery and data mining | 1997