Iko Pramudiono
University of Tokyo
Publication
Featured research published by Iko Pramudiono.
knowledge discovery and data mining | 2003
Iko Pramudiono; Masaru Kitsuregawa
FP-growth has become a popular algorithm for mining frequent patterns. Its underlying data structure, the FP-tree, has allowed significant performance improvements over previously reported algorithms. However, that special data structure also restricts the ability to extend the algorithm further. There is also a potential problem when the FP-tree cannot fit into memory. In this paper, we report on the parallel execution of FP-growth. We examine the bottlenecks of the parallelization and a method to balance the execution efficiently in a shared-nothing environment.
international conference on management of data | 2004
Iko Pramudiono; Masaru Kitsuregawa
Data mining has been widely recognized as a powerful tool to explore added value from large-scale databases. One data mining technique, generalized association rule mining with taxonomy, has the potential to discover more useful knowledge than ordinary flat association rule mining by taking application-specific information into account. We propose FP-tax, an algorithm based on the pattern-growth mining paradigm that employs a tree structure to compress the database. Two methods to traverse the tree structure are examined: Bottom-Up and Top-Down. Experimental results show that both methods significantly outperform the classic Cumulate algorithm; in particular, Top-Down FP-tax can achieve performance two orders of magnitude better than Cumulate.
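The idea of generalized association rule mining with a taxonomy can be illustrated with a minimal Python sketch. This is not FP-tax or Cumulate; it is a naive pair-counting example, and the taxonomy, item names, and function names are hypothetical. Each transaction is extended with the ancestors of its items, so that patterns can surface at a higher level of the taxonomy even when no single leaf item is frequent.

```python
from collections import Counter
from itertools import combinations

# Hypothetical taxonomy for illustration: child -> parent.
TAXONOMY = {
    "jacket": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
}

def ancestors(item, taxonomy):
    """Return all ancestors of item by following child -> parent links."""
    result = []
    while item in taxonomy:
        item = taxonomy[item]
        result.append(item)
    return result

def extend_transaction(transaction, taxonomy):
    """Add every ancestor of every item to the transaction."""
    items = set(transaction)
    for item in transaction:
        items.update(ancestors(item, taxonomy))
    return items

def generalized_pair_support(transactions, taxonomy):
    """Count 2-itemsets over taxonomy-extended transactions."""
    counts = Counter()
    for t in transactions:
        extended = sorted(extend_transaction(t, taxonomy))
        for a, b in combinations(extended, 2):
            # An item and its own ancestor always co-occur, so skip them.
            if a in ancestors(b, taxonomy) or b in ancestors(a, taxonomy):
                continue
            counts[(a, b)] += 1
    return counts
```

With the transactions `[["jacket", "shoes"], ["ski pants", "shoes"], ["shirt"]]`, neither leaf-level pair occurs more than once, but the generalized pair `("outerwear", "shoes")` occurs in two transactions.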
mobile data management | 2002
Iko Pramudiono; Takahiko Shintani; Katsumi Takahashi; Masaru Kitsuregawa
The rapid growth of Internet access from mobile users has emphasised the importance of location-specific information on the Web. A unique Web service called Mobile Info Search (MIS) from NTT Laboratories gathers information and provides location-aware search facilities. We performed association rule mining and sequence pattern mining on an access log accumulated at the MIS site in order to gain insight into the behavior of mobile users regarding spatial information on the Web. Details of the Web log mining process and the rules we derived are reported in this paper.
database and expert systems applications | 2003
Iko Pramudiono; Masaru Kitsuregawa
Frequent pattern mining has become a fundamental technique for many data mining tasks. Many modern frequent pattern mining algorithms, such as FP-growth, adopt a tree structure to compress the database into a compact in-memory data structure. Recent studies show that the tree structure can be efficiently mined using the frequent pattern growth methodology. A higher level of performance improvement can be expected from parallel execution. In particular, PC clusters are gaining popularity as a cost-effective parallel platform for data-intensive tasks like data mining. However, we have to address many issues, such as space distribution on each node and skew handling, to efficiently mine frequent patterns from a tree structure in a shared-nothing environment. We develop a framework to address those issues using a novel granularity control mechanism and tree remerging. The common framework can be enhanced with temporal constraints to mine web access patterns. We devise an improved support counting procedure to reduce the additional communication overhead. A real implementation using up to 32 nodes confirms that a good speedup ratio can be achieved even in a skewed environment.
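How a tree structure compresses a transaction database can be sketched as follows. This is a minimal, single-node FP-tree construction in Python, assuming the usual scheme of ordering frequent items by descending frequency; the class and function names are hypothetical, and the sketch does not cover the parallel framework, granularity control, or tree remerging described in the abstract.

```python
from collections import Counter

class FPNode:
    """One node of the prefix tree: an item and how many transactions share this prefix."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_support):
    """Build a prefix tree over the frequent items of each transaction."""
    # First pass: count in how many transactions each item appears.
    freq = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in freq.items() if c >= min_support}

    root = FPNode(None)
    # Second pass: insert each transaction as a path of frequent items.
    for t in transactions:
        # Descending frequency, then name, so shared prefixes line up deterministically.
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent

def count_nodes(node):
    """Number of nodes below the given node (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())
```

For the transactions `[["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]` with a minimum support of 2, the eight frequent item occurrences are stored in five tree nodes, because transactions that share a prefix share the corresponding path.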
australasian database conference | 2002
Masaru Kitsuregawa; Masashi Toyoda; Iko Pramudiono
The emergence of the WWW has opened new frontiers for database research. Web mining has become a hot topic, since the rapid expansion and chaotic nature of the WWW have exposed technical challenges as well as interesting discoveries. In general, web mining can be classified into web structure mining and web usage mining. Here we introduce two applications of web mining: in the first, we identify web communities by mining the web structure; in the second, we mine the web usage of mobile internet users on a location-aware search engine. Those applications require heavy computational power as well as good scalability. A cluster of commodity PCs is a suitable platform for handling such applications. Here we also report some approaches for optimal parallel execution of mining algorithms on a PC cluster.
data warehousing and knowledge discovery | 2000
Takeshi Yoshizawa; Iko Pramudiono; Masaru Kitsuregawa
Data mining is becoming increasingly important as databases grow ever larger and the need to explore hidden rules in them becomes widely recognized. Database systems are currently dominated by relational databases, and the ability to perform data mining using standard SQL queries would greatly ease the implementation of data mining. However, the performance of SQL-based data mining is known to fall behind that of specialized implementations and expensive commercial mining tools. In this paper we present an evaluation of SQL-based data mining on a commercial RDBMS (IBM DB2 UDB EEE). We examine some techniques to reduce I/O cost by using View and Subquery. Those queries can be more than 6 times faster than the SETM SQL query reported previously. In addition, we evaluated performance in a parallel database environment and compared the results with a commercial data mining tool (IBM Intelligent Miner). We show that SQL-based data mining can achieve sufficient performance through SQL query customization and database tuning.
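The general idea of counting itemset support with standard SQL can be sketched with a self-join. The Python example below uses the standard-library `sqlite3` module rather than DB2, and the table name `sales`, its columns, and the function name are assumptions for illustration. It is not the SETM query or the View and Subquery variants evaluated in the paper.

```python
import sqlite3

def frequent_pairs(transactions, min_support):
    """Return (item_a, item_b, support) rows for pairs meeting min_support."""
    conn = sqlite3.connect(":memory:")
    # One row per (transaction id, item), the usual layout for SQL-based mining.
    conn.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(tid, item)
         for tid, items in enumerate(transactions)
         for item in set(items)],
    )
    # Self-join on the transaction id; a.item < b.item avoids duplicate pairs.
    rows = conn.execute(
        """
        SELECT a.item, b.item, COUNT(*) AS support
        FROM sales a JOIN sales b
          ON a.tid = b.tid AND a.item < b.item
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
        ORDER BY a.item, b.item
        """,
        (min_support,),
    ).fetchall()
    conn.close()
    return rows
```

The join produces every item pair within a transaction, which is where most of the I/O cost of this style of query arises; the grouping and `HAVING` clause then apply the minimum support threshold.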
knowledge discovery and data mining | 2002
Bowo Prasetyo; Iko Pramudiono; Katsumi Takahashi; Masaru Kitsuregawa
The navigational behavior of website visitors can be extracted from web access log files with data mining techniques such as sequential pattern mining. Visualization of the discovered patterns is very helpful for understanding how visitors navigate over the various pages of a site. Several web log visualization tools have been developed, but they are far from satisfactory: they do not effectively provide both a global view of visitor access and individual traversal paths. Here we introduce Naviz, an interactive web log visualization system designed to overcome those drawbacks. It combines a two-dimensional graph of visitor access traversals that takes appropriate web traversal properties into account, i.e. hierarchization by traversal traffic and grouping of related pages, with facilities for filtering traversal paths by specifying visited pages and path attributes such as number of hops, support, and confidence. The tool also provides support for modern dynamic web pages. We apply the tool to visualize the results of a data mining study on web log data of Mobile Townpage, a directory service of phone numbers in Japan for i-Mode mobile internet users. The results indicate that our system can easily handle thousands of discovered patterns to reveal interesting navigational behavior such as success paths, exit paths, and lost paths.
data warehousing and knowledge discovery | 1999
Iko Pramudiono; Takahiko Shintani; Takayuki Tamura; Masaru Kitsuregawa
Data mining has been widely recognized as a powerful tool to explore added value from large-scale databases. One data mining technique, generalized association rule mining with taxonomy, has the potential to discover more useful knowledge than ordinary flat association mining by taking application-specific information into account. We propose SQL queries, named TTR-SQL and TH-SQL, to perform this kind of mining and evaluate them on a PC cluster. Those queries can be more than 30% faster than the Apriori-based SQL query reported previously. Although an RDBMS has powerful query processing ability through SQL, most data mining systems use specialized implementations to achieve better performance. There is a tradeoff between performance and portability: performance is not necessarily sufficiently high, but seamless integration with an existing RDBMS would be considerably advantageous. Since relational databases are already very popular, the feasibility of generalized association rule mining can be explored using the proposed SQL queries instead of purchasing expensive mining software. In addition, parallel relational databases are now also widely accepted. We show that parallelizing the SQL execution can match the performance of native programs with 10 to 15 nodes. Since most organizations have many PCs that are not fully utilized, such resources can be exploited to improve performance significantly.
databases in networked information systems | 2003
Masaru Kitsuregawa; Iko Pramudiono
The need for scalable and efficient frequent pattern mining has driven the development of parallel algorithms. Cost-effective platforms like PC clusters are also becoming widely available. Modern algorithms for frequent pattern mining employ complicated tree structures. Here we report the development of tree-based parallel mining algorithms on a PC cluster: Parallel FP-growth and an extension to mine web access patterns called Parallel WAP-mine.
databases in networked information systems | 2002
Masaru Kitsuregawa; Iko Pramudiono; Yusuke Ohura; Masashi Toyoda
Web mining is now a popular term for techniques that analyze data from the World Wide Web (WWW). Here we report some of our experiences in large-scale web mining. The first is the development of a user query recommendation system based on web usage mining of a commercial web directory service, and the second is cyber community mining from the web structure of the Japanese domain.