Minoru Kawahara | Researchain

Publication

Featured research published by Minoru Kawahara.

hawaii international conference on system sciences | 2000

An application of text mining: bibliographic navigator powered by extended association rules

Minoru Kawahara; Hiroyuki Kawano

In this paper, we discuss the implementation and performance of the bibliographic navigator we have developed using text mining. We categorize the different attributes and extend the mining association algorithm in order to provide more helpful rules. With our proposed algorithm, more interesting rules are derived from relationships between several categorized attributes in bibliographic databases such as INSPEC. We also evaluate the performance of our proposed algorithms on the bibliographic navigator under development and discuss improvements to the navigator.

hawaii international conference on system sciences | 2001

Mining association algorithm with threshold based on ROC analysis

Minoru Kawahara; Hiroyuki Kawano

The mining association algorithm is one of the most important data mining algorithms for deriving association rules at high speed from huge databases. However, the algorithm tends to derive rules that contain noise, such as stopwords, so some systems remove the noise using noise filters. We have been improving the algorithm and developing navigation systems for semi-structured data that use it, and we also use a dictionary to remove noise from derived association rules. In order to derive effective rules, it is very important to determine system parameters such as the threshold values of minimum support and minimum confidence. We have adapted ROC analysis to the algorithm in our navigation systems and have evaluated the performance of derived rules. In this paper, we import the parameters from the ROC analysis into the algorithm in order to propose extended mining association algorithms. Moreover, we evaluate the performance of our proposed algorithms using an experimental database and show how they can derive effective association rules. We also show that our proposed algorithms can remove stopwords automatically from raw data.
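To make the role of the minimum support and minimum confidence thresholds concrete, the following is a minimal sketch of the standard support/confidence computation for rules of the form query -> keyword. It is not the authors' extended algorithm; the function names, the toy documents, and the threshold values are all hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Standard confidence: support(A and C) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


def derive_rules(transactions, query, minsup, minconf):
    """Return (keyword, support, confidence) for rules query -> keyword
    that pass both thresholds."""
    query = set(query)
    candidates = set().union(*transactions) - query
    rules = []
    for keyword in sorted(candidates):
        s = support(transactions, query | {keyword})
        c = confidence(transactions, query, {keyword})
        if s >= minsup and c >= minconf:
            rules.append((keyword, s, c))
    return rules


# Toy "documents" as keyword sets (made-up data); "the" plays the stopword.
docs = [
    {"data", "mining", "the"},
    {"data", "mining", "rules", "the"},
    {"data", "warehouse", "the"},
    {"web", "search", "the"},
]

rules = derive_rules(docs, {"data"}, minsup=0.5, minconf=0.6)
for keyword, s, c in rules:
    print(keyword, round(s, 3), round(c, 3))
```

On this toy data the stopword "the" passes both thresholds along with "mining", which illustrates the kind of noise the abstract describes.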

data warehousing and knowledge discovery | 2000

Mondou: Information Navigator with Visual Interface

Hiroyuki Kawano; Minoru Kawahara

Since 1995, we have been developing the web search engine Mondou using data mining techniques (http://www.kuamp.kyoto-u.ac.jp/labs/infocom/mondou/index_e.html) in order to discover helpful information for web search operations. In our previous work, we focused on the computing cost of deriving associative keywords and proposed a method to determine system parameters such as the Minsup and Minconf threshold values. Moreover, we evaluated the ROC performance of keywords derived by weighted association algorithms. In this paper, we try to implement two kinds of Java applets in our Mondou system: an ROC graph for selecting associative keywords, and document clustering. This visual interface shows the characteristics of association rules on the ROC graph together with the Minsup values. It also provides a document clustering function in order to visualize retrieved documents.

pacific rim conference on communications, computers and signal processing | 2001

Mining association algorithm with improved threshold based on ROC analysis

Minoru Kawahara; Hiroyuki Kawano

The mining association algorithm is one of the most popular data mining algorithms for deriving association rules at high speed from huge databases. We have been developing navigation systems for semi-structured data such as Web data and bibliographic data. To guide beginners, our systems present the association rules derived by the algorithm. However, the algorithm tends to derive rules that contain noise such as stopwords, so many systems use noise filters to remove it. In order to remove the noise automatically and derive more effective rules, we proposed an algorithm that uses the true positive rate and the false positive rate of derived rules in a database, based on ROC analysis. In this paper, we correct the parameters to improve the extended mining association algorithm. Moreover, we evaluate the performance of our proposed algorithm using an experimental database and show how it can derive effective association rules. We also show that our proposed algorithms can remove stopwords automatically from raw data.
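The abstract does not spell out how the true positive and false positive rates of a rule are defined, so the sketch below rests on an assumption: documents matching the query are treated as positives, all others as negatives, and a keyword's rates are the fractions of each group that contain it. Under that reading a stopword appears almost equally often in both groups, so its two rates are close. The `min_gap` threshold and all names are hypothetical.

```python
def roc_rates(transactions, query, keyword):
    """Return (TP rate, FP rate) of keyword for a query.

    Assumption: documents containing the query are positives,
    all remaining documents are negatives.
    """
    query = set(query)
    positives = [t for t in transactions if query <= t]
    negatives = [t for t in transactions if not query <= t]
    tp = sum(keyword in t for t in positives) / len(positives) if positives else 0.0
    fp = sum(keyword in t for t in negatives) / len(negatives) if negatives else 0.0
    return tp, fp


def filter_keywords(transactions, query, min_gap):
    """Keep keywords whose TP rate exceeds their FP rate by at least min_gap."""
    query = set(query)
    candidates = set().union(*transactions) - query
    kept = []
    for keyword in sorted(candidates):
        tp, fp = roc_rates(transactions, query, keyword)
        if tp - fp >= min_gap:
            kept.append(keyword)
    return kept


# Toy "documents" as keyword sets (made-up data); "the" plays the stopword.
docs = [
    {"data", "mining", "the"},
    {"data", "mining", "rules", "the"},
    {"data", "warehouse", "the"},
    {"web", "search", "the"},
]

print(roc_rates(docs, {"data"}, "the"))
print(filter_keywords(docs, {"data"}, min_gap=0.5))
```

Here "the" has identical rates in both groups and is dropped without any stopword dictionary, while "mining" is kept.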

discovery science | 2002

Extended Association Algorithm Based on ROC Analysis for Visual Information Navigator

Hiroyuki Kawano; Minoru Kawahara

It is very important to derive association rules at high speed from huge databases. However, the typical fast mining algorithms for text databases tend to derive meaningless rules involving stopwords, so many researchers try to remove these noisy rules by using various filters. In our research, we improve the association algorithm and develop information navigation systems for text data using a visual interface, and we also apply a dictionary to remove noisy keywords from derived association rules. In order to remove noisy keywords automatically, we propose an algorithm based on the true positive rate and the false positive rate in ROC analysis. Moreover, in order to remove stopwords automatically from raw association rules, we introduce several threshold values from the ROC analysis into our proposed mining algorithm. We evaluate the performance of our proposed mining algorithms on a bibliographic database.

data warehousing and knowledge discovery | 2003

Parallel Vector Computing Technique for Discovering Communities on the Very Large Scale Web Graph

Kikuko Kawase; Minoru Kawahara; Takeshi Iwashita; Hiroyuki Kawano; Masanori Kawazawa

The study of authoritative pages and community discovery from enormous Web content has attracted many researchers. One link-based analysis method, the HITS algorithm, calculates authority scores as the eigenvector of an adjacency matrix created from the Web graph. It was considered impossible to compute the eigenvector of a very large-scale Web graph using previous techniques, because the calculation requires enormous memory space. We make it possible using data compression and parallel computation.
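For readers unfamiliar with HITS, the following is a minimal sketch of the textbook power iteration on a tiny graph stored as an edge list. It does not reproduce the data compression or parallel vector techniques of the paper; the function names and the example graph are hypothetical.

```python
def normalize(vec):
    """Scale a vector to unit Euclidean length (left unchanged if all zero)."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm else vec


def hits(edges, n, iterations=50):
    """Textbook HITS power iteration.

    edges: list of (source, destination) node indices.
    Returns (authority, hub) score lists of length n.
    """
    auth = [1.0] * n
    hub = [1.0] * n
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of pages linking to it.
        new_auth = [0.0] * n
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub score of a page: sum of authority scores of pages it links to.
        new_hub = [0.0] * n
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return auth, hub


# Made-up graph: pages 0, 1 and 3 link to page 2; page 0 also links to page 1.
edges = [(0, 2), (1, 2), (3, 2), (0, 1)]
auth, hub = hits(edges, n=4)
print("top authority:", auth.index(max(auth)))
print("top hub:", hub.index(max(hub)))
```

The edge list avoids materializing the dense adjacency matrix, which hints at why memory becomes the bottleneck once the graph has a very large number of pages.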

international symposium on database applications in non traditional environments | 1999

Performance evaluation and visualization of association rules using receiver operating characteristic graph

Minoru Kawahara; Hiroyuki Kawano

We have been developing the web search engine Mondou using weighted association rules. Providing search users with associative keywords derived by a text mining algorithm is very helpful. Moreover, based on the experimental results of our web search system, we try to implement these mining functions in a web-based intelligent database navigation system using the source program files of the commercial text database OpenText. In particular, in order to derive appropriate keywords at small computing cost, we focus carefully on how to determine system parameters such as the Minsup and Minconf threshold values. In this paper, we use ROC graph techniques to evaluate the performance and characteristics of derived rules. We propose an ROC analytical model of our search system and evaluate the performance of weighted association rules by the ROC convex hull method. Moreover, we propose a method which visualizes association rules on an ROC graph to show the relationship between a query and the derived association rules. Using the INSPEC database, we try to specify the optimal threshold values for deriving effective rules from typical bibliographic data.
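As an illustration of the ROC convex hull idea, the sketch below computes the upper convex hull of a set of (FP rate, TP rate) points, each of which could stand for one threshold setting. This is the generic construction, not the authors' implementation; the point values and function names are made up.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Upper convex hull of ROC points given as (fp_rate, tp_rate).

    The trivial classifiers (0, 0) and (1, 1) are always included.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last hull point while it lies on or below the line
        # from the point before it to the new point.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Made-up operating points, e.g. one per (Minsup, Minconf) setting.
settings = [(0.1, 0.5), (0.3, 0.6), (0.4, 0.9)]
print(roc_convex_hull(settings))
```

Points that fall below the hull, such as (0.3, 0.6) here, are never optimal for any combination of class distribution and error costs, so only the hull vertices need to be considered as candidate settings.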

discovery science | 1999

Mining Association Algorithm Based on ROC Convex Hull Method in Bibliographic Navigation System

Minoru Kawahara; Hiroyuki Kawano

Minoru Kawahara (1) and Hiroyuki Kawano (2)
(1) Data Processing Center, Kyoto University, Kyoto 6068501, Japan, http://www.kudpc.kyoto-u.ac.jp/~kawahara/index.html
(2) Department of Systems Science, Kyoto University, Kyoto 6068501, Japan, http://www.kuamp.kyoto-u.ac.jp/~kawano/index.html

In order to resolve or ease retrieval difficulties on bibliographic databases, we have been developing a bibliographic navigation system that implements our proposed mining algorithms [1]. Our navigation system shows related keywords derived from the query entered by a user, and guides users to retrieve appropriate bibliographies. Although the thresholds used in the mining association algorithm are usually given by the system administrator, methods are required for setting thresholds that derive appropriate association rules for a bibliographic navigation system. In this paper, we propose a method which specifies the optimal thresholds based on ROC (Receiver Operating Characteristic) analysis [2] and evaluate the performance of the method on our practical navigation system.

According to [2], ROC graphs have long been used in signal detection theory to depict tradeoffs between hit rate and false alarm rate. ROC graphs illustrate the behavior of a classifier without regard to class distribution or error cost, and so they decouple classification performance from these factors. The ROC convex hull method compares multiple classifiers on an ROC graph and specifies the optimal classifier, the one which supplies the highest performance. An ROC graph uses two parameters, the true positive rate TP and the false positive rate FP, to characterize classifiers. If FP is plotted on the X axis and TP on the Y axis for several instances, a curve is drawn, which is called the ROC curve. A curve drawn nearer the point where TP is higher and FP is lower, that is, the most-northwest line, is better.

Although an ROC graph illustrates classification performance separated from class distribution and cost, the ROC convex hull method can take them into account. Assume that c(classification, class) is a two-place error cost function, where c(n, P) is the cost of a false negative error and c(y, N) is the cost of a false positive error, and that p(P) is the prior probability of a positive instance, so that the prior probability of a negative instance is p(N) = 1 − p(P). Then the slope of an iso-performance line can be represented by p(N)/p(P) · c(y, N)/c(n, P).

S. Arikawa, K. Furukawa (Eds.): DS'99, LNAI 1721, pp. 333–334, 1999.
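The iso-performance slope can be turned into a small worked example. The sketch below computes the slope p(N)/p(P) · c(y, N)/c(n, P) and then picks, from a list of candidate (FP, TP) points, the one that an iso-performance line of that slope touches with the highest TP-axis intercept. The candidate points, priors, and costs are made-up values, and the function names are hypothetical.

```python
def iso_performance_slope(p_pos, cost_fp, cost_fn):
    """Slope p(N)/p(P) * c(y, N)/c(n, P) of an iso-performance line.

    p_pos:   prior probability of a positive instance, p(P)
    cost_fp: cost of a false positive error, c(y, N)
    cost_fn: cost of a false negative error, c(n, P)
    """
    p_neg = 1.0 - p_pos
    return (p_neg / p_pos) * (cost_fp / cost_fn)


def best_operating_point(points, slope):
    """Point (fp, tp) lying on the iso-performance line with the largest
    TP-axis intercept, i.e. the point maximizing tp - slope * fp."""
    return max(points, key=lambda pt: pt[1] - slope * pt[0])


# Made-up candidate points, e.g. the vertices of an ROC convex hull.
candidates = [(0.0, 0.0), (0.1, 0.5), (0.4, 0.9), (1.0, 1.0)]

# Balanced classes and equal error costs give a slope of 1.
balanced = iso_performance_slope(p_pos=0.5, cost_fp=1.0, cost_fn=1.0)
print(balanced, best_operating_point(candidates, balanced))

# Rare positives make the line steeper and favor a lower FP rate.
skewed = iso_performance_slope(p_pos=0.2, cost_fp=1.0, cost_fn=1.0)
print(best_operating_point(candidates, skewed))
```

With balanced classes the point (0.4, 0.9) is selected, while with positives at a prior of 0.2 the steeper line selects the more conservative point (0.1, 0.5).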

international conference on semantic computing | 1999

ROC Performance Evaluation of Web-Based Bibliographic Navigator using Extended Association Rules

Minoru Kawahara; Hiroyuki Kawano

It is very effective to provide search users with meaningful keywords derived by a text mining algorithm. We are developing our search engine “Mondou” using weighted association rules as a web-based intelligent database navigation system. In this paper, we focus on the computing cost of deriving appropriate keywords, and we carefully determine system parameters such as the Minsup and Minconf threshold values. In order to evaluate the performance and characteristics of derived rules, we use ROC graph techniques. We propose an ROC analytical model of our search system, and we evaluate the performance of weighted association rules by the ROC convex hull method. In particular, we try to specify the optimal threshold values for deriving effective rules from the INSPEC database, a very large bibliographic database.

Systems and Computers in Japan | 2007