Hua-Fu Li

National Chiao Tung University

Publication

Featured research published by Hua-Fu Li.

international workshop on research issues in data engineering | 2005

Online mining (recently) maximal frequent itemsets over data streams

Hua-Fu Li; Suh-Yin Lee; Man-Kwan Shan

A data stream is a massive, open-ended sequence of data elements continuously generated at a rapid rate. Mining data streams is more difficult than mining static databases because of the huge, high-speed and continuous characteristics of streaming data. In this paper, we propose a new one-pass algorithm called DSM-MFI (Data Stream Mining for Maximal Frequent Itemsets), which mines the set of all maximal frequent itemsets in landmark windows over data streams. A new summary data structure called the summary frequent itemset forest (SFI-forest) is developed for incrementally maintaining the essential information about maximal frequent itemsets embedded in the stream so far. Theoretical analysis and experimental studies show that the proposed algorithm is efficient and scalable for mining the set of all maximal frequent itemsets over the entire history of the data streams.
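To make the term concrete: a frequent itemset is maximal when none of its proper supersets is also frequent. The following is a minimal brute-force sketch of that definition only. It is not DSM-MFI and does not use an SFI-forest; the function names `frequent_itemsets` and `maximal_itemsets` and the sample data are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Count every subset of every transaction (exponential; for illustration only)."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    # Keep only the itemsets that reach the minimum support count.
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


def maximal_itemsets(frequent):
    """Keep the frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}


# Hypothetical sample data, not taken from the paper.
demo_transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
demo_frequent = frequent_itemsets(demo_transactions, min_count=2)
demo_maximal = maximal_itemsets(demo_frequent)
```

With a minimum count of 2, every single item and every pair is frequent in the sample, but only the three pairs are maximal, since the triple occurs just once.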

Knowledge and Information Systems | 2008

DSM-FI: an efficient algorithm for mining frequent itemsets in data streams

Hua-Fu Li; Man-Kwan Shan; Suh-Yin Lee

Online mining of data streams is an important data mining problem with broad applications. However, it is also a difficult problem since the streaming data possess some inherent characteristics. In this paper, we propose a new single-pass algorithm, called DSM-FI (data stream mining for frequent itemsets), for online incremental mining of frequent itemsets over a continuous stream of online transactions. According to the proposed algorithm, each transaction of the stream is projected into a set of sub-transactions, and these sub-transactions are inserted into a new in-memory summary data structure, called SFI-forest (summary frequent itemset forest), for maintaining the set of all frequent itemsets embedded in the transaction data stream generated so far. Finally, the set of all frequent itemsets is determined from the current SFI-forest. Theoretical analysis and experimental studies show that the proposed DSM-FI algorithm uses stable memory, makes only one pass over an online transactional data stream, and outperforms the existing algorithms for one-pass mining of frequent itemsets.
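The projection-and-insert step described above can be sketched roughly as follows. This sketch assumes that the sub-transactions are the suffixes of the lexicographically sorted transaction and that the summary structure is a plain forest of counted prefix trees. Both are assumptions made for illustration; the abstract does not spell out the actual SFI-forest layout, and `Node`, `suffix_projections`, and `insert` are hypothetical names.

```python
class Node:
    """One prefix-tree node: an occurrence counter plus child nodes keyed by item."""

    def __init__(self):
        self.count = 0
        self.children = {}


def suffix_projections(transaction):
    """Assumed projection: every suffix of the sorted, de-duplicated transaction."""
    items = sorted(set(transaction))
    return [items[i:] for i in range(len(items))]


def insert(forest, sub_transaction):
    """Walk or create the path for one sub-transaction, incrementing counts."""
    children = forest
    for item in sub_transaction:
        node = children.setdefault(item, Node())
        node.count += 1
        children = node.children


# Hypothetical stream of transactions, processed in a single pass.
demo_stream = [["a", "c", "b"], ["b", "c"], ["a", "b"]]
demo_forest = {}
for transaction in demo_stream:
    for sub in suffix_projections(transaction):
        insert(demo_forest, sub)
```

Under this assumed projection, each transaction containing an item contributes exactly one suffix that starts with that item, so the count stored at a root equals the support of the corresponding single item.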

international conference on data mining | 2006

Incremental Mining of Sequential Patterns over a Stream Sliding Window

Chin-Chuan Ho; Hua-Fu Li; Fang-Fei Kuo; Suh-Yin Lee

Incremental mining of sequential patterns from data streams is one of the most challenging problems in mining data streams. However, previous work on mining sequential patterns from data streams has focused almost exclusively on streams of item-sequences, not streams of itemset-sequences. In this paper, we propose an efficient single-pass algorithm, called IncSPAM, to maintain the set of sequential patterns from itemset-sequence streams with a transaction-sensitive sliding window. An effective bit-sequence representation of items is used in the proposed algorithm to reduce the time and memory needed to slide the windows. Experiments show that the proposed IncSPAM algorithm is efficient for mining sequential patterns over data streams.

data warehousing and knowledge discovery | 2011

Fast and memory efficient mining of high-utility itemsets from data streams: with and without negative item profits

Hua-Fu Li; Hsin-Yun Huang; Suh-Yin Lee

Mining utility itemsets from data streams is one of the most interesting research issues in data mining and knowledge discovery. In this paper, two efficient sliding window-based algorithms, MHUI-BIT (Mining High-Utility Itemsets based on BITvector) and MHUI-TID (Mining High-Utility Itemsets based on TIDlist), are proposed for mining high-utility itemsets from data streams. Based on the sliding window-based framework of the proposed approaches, two effective representations of item information, Bitvector and TIDlist, and a lexicographical tree-based summary data structure, LexTree-2HTU, are developed to improve the efficiency of discovering high-utility itemsets with positive profits from data streams. Experimental results show that the proposed algorithms outperform the existing approaches for discovering high-utility itemsets from data streams over sliding windows. In addition, we propose adapted versions of MHUI-BIT and MHUI-TID to handle the case of mining utility itemsets with negative item profits. Experiments show that these variants are efficient approaches for mining high-utility itemsets with negative item profits over stream transaction-sensitive sliding windows.
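As a rough illustration of what "utility" means here, the sketch below uses the common definition in which the utility of an itemset in a transaction is the sum of quantity times unit profit over its items, summed across the transactions that contain the whole itemset. The TID-list lookup is only loosely inspired by the TIDlist representation named above. The data layout, the function names, and the sample values are assumptions; neither the LexTree-2HTU structure nor the negative-profit variants are modelled.

```python
def build_tidlists(window):
    """Map each item to the set of transaction ids (TIDs) in the window that contain it."""
    tidlists = {}
    for tid, quantities in window.items():
        for item in quantities:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def itemset_utility(itemset, window, tidlists, profit):
    """Total utility of a non-empty itemset over the transactions containing all its items."""
    tids = set.intersection(*(tidlists.get(item, set()) for item in itemset))
    return sum(window[tid][item] * profit[item] for tid in tids for item in itemset)


# Hypothetical window: TID -> {item: purchased quantity}, plus unit profits.
demo_window = {
    1: {"a": 2, "b": 1},
    2: {"a": 1, "c": 3},
    3: {"a": 1, "b": 2, "c": 1},
}
demo_profit = {"a": 3, "b": 5, "c": 1}
demo_tidlists = build_tidlists(demo_window)
demo_utility_ab = itemset_utility(["a", "b"], demo_window, demo_tidlists, demo_profit)
```

In the sample, the itemset {a, b} appears in transactions 1 and 3, contributing 11 and 13 respectively, so it would count as high-utility for any threshold up to 24.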

systems, man and cybernetics | 2006

Efficient Maintenance and Mining of Frequent Itemsets over Online Data Streams with a Sliding Window

Hua-Fu Li; Chin-Chuan Ho; Man-Kwan Shan; Suh-Yin Lee

Online mining of streaming data is one of the most important issues in data mining. In this paper, we propose an efficient one-pass algorithm, called MFI-TransSW (mining frequent itemsets over a transaction-sensitive sliding window), to mine the set of all frequent itemsets in data streams with a transaction-sensitive sliding window. An effective bit-sequence representation of items is used in the proposed algorithm to reduce the time and memory needed to slide the windows. The experiments show that the proposed algorithm not only attains highly accurate mining results, but also runs significantly faster and consumes less memory than existing algorithms for mining frequent itemsets over recent data streams.
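One plausible reading of the bit-sequence idea is sketched below: each item keeps one bit per transaction in the window, sliding the window is a one-bit shift, and the support of an itemset is the number of set bits in the bitwise AND of its items' sequences. This is a sketch under those assumptions rather than the published MFI-TransSW procedure; the class `BitSequenceWindow` and its methods are hypothetical names.

```python
class BitSequenceWindow:
    """Sliding window over the last `width` transactions, one bit-sequence per item."""

    def __init__(self, width):
        self.width = width
        self.mask = (1 << width) - 1  # keeps only the newest `width` bits
        self.bits = {}                # item -> int used as a bit-sequence

    def slide(self, transaction):
        """Append a transaction; the oldest one falls off the high end of every sequence."""
        new_items = set(transaction)
        for item in new_items:
            self.bits.setdefault(item, 0)
        for item in list(self.bits):
            shifted = (self.bits[item] << 1) & self.mask
            if item in new_items:
                shifted |= 1          # lowest bit marks the newest transaction
            if shifted:
                self.bits[item] = shifted
            else:
                del self.bits[item]   # item no longer occurs anywhere in the window

    def support(self, itemset):
        """Count window transactions containing every item of a non-empty itemset."""
        combined = self.mask
        for item in itemset:
            combined &= self.bits.get(item, 0)
        return bin(combined).count("1")


# Hypothetical stream; the window holds the three most recent transactions.
demo_window = BitSequenceWindow(width=3)
for transaction in [["a", "b"], ["b", "c"], ["a", "b", "c"], ["a"]]:
    demo_window.slide(transaction)
```

After the fourth transaction arrives, the first one has been shifted out, so the window covers `["b", "c"]`, `["a", "b", "c"]`, and `["a"]` without any rescan of earlier data.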

Computer Networks | 2006

DSM-PLW: single-pass mining of path traversal patterns over streaming web click-sequences

Hua-Fu Li; Suh-Yin Lee; Man-Kwan Shan

Mining Web click streams is an important data mining problem with broad applications. However, it is also a difficult problem since the streaming data possess some interesting characteristics, such as unknown or unbounded length, possibly a very fast arrival rate, inability to backtrack over previously arrived click-sequences, and a lack of system control over the order in which the data arrive. In this paper, we propose a projection-based, single-pass algorithm, called DSM-PLW (Data Stream Mining for Path traversal patterns in a Landmark Window), for online incremental mining of path traversal patterns over a continuous stream of maximal forward references generated at a rapid rate. According to the algorithm, each maximal forward reference of the stream is projected into a set of reference-suffix maximal forward references, and these reference-suffix maximal forward references are inserted into a new in-memory summary data structure, called SP-forest (Summary Path traversal pattern forest), which is an extended prefix tree-based data structure for storing essential information about frequent reference sequences of the stream so far. The set of all maximal reference sequences is determined from the SP-forest by a depth-first-search mechanism, called MRS-mining (Maximal Reference Sequence mining). Theoretical analysis and experimental studies show that the proposed algorithm has gently growing memory requirements and makes only one pass over the streaming data.

international world wide web conferences | 2004

On mining webclick streams for path traversal patterns

Hua-Fu Li; Suh-Yin Lee; Man-Kwan Shan

Mining user access patterns from a continuous stream of Web-clicks presents new challenges over traditional Web usage mining in a large static Web-click database. Modeling user access patterns as maximal forward references, we present a single-pass algorithm, StreamPath, for online discovery of frequent path traversal patterns from an extended prefix tree-based data structure that stores the compressed and essential information about users' moving histories in the stream. Theoretical analysis and performance evaluation show that the space requirement of StreamPath is limited to a logarithmic boundary, and that its execution time is fast compared with previous multiple-pass algorithms [2].
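For readers unfamiliar with the term, a maximal forward reference is usually defined as a forward path through the site that ends where the user first moves backward to an already visited page. The sketch below extracts such references from a single click sequence under that usual definition. It illustrates the modeling step only, not StreamPath itself; the function name and the sample clicks are assumptions.

```python
def maximal_forward_references(clicks):
    """Split one click sequence into maximal forward references."""
    references = []
    path = []
    moving_forward = False
    for page in clicks:
        if page in path:
            # Backward move: emit the path once, then rewind to the revisited page.
            if moving_forward:
                references.append(list(path))
            path = path[: path.index(page) + 1]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward:
        references.append(list(path))  # the final forward run also ends a reference
    return references


# Hypothetical session: single letters stand for pages.
demo_clicks = list("ABCDCBEGHGWAOUOV")
demo_references = ["".join(ref) for ref in maximal_forward_references(demo_clicks)]
```

For the sample session, the extracted references are ABCD, ABEGH, ABEGW, AOU, and AOV; consecutive backward clicks such as D to C to B produce only one reference because the path is emitted only when a forward run is interrupted.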

international conference on data mining | 2006

A New Algorithm for Maintaining Closed Frequent Itemsets in Data Streams by Incremental Updates

Hua-Fu Li; Chin-Chuan Ho; Fang-Fei Kuo; Suh-Yin Lee

Online mining of closed frequent itemsets over streaming data is one of the most important issues in mining data streams. In this paper, we propose an efficient one-pass algorithm, NewMoment, to maintain the set of closed frequent itemsets in data streams with a transaction-sensitive sliding window. An effective bit-sequence representation of items is used in the proposed algorithm to reduce the time and memory needed to slide the windows. Experiments show that the proposed algorithm not only attains highly accurate mining results, but also runs significantly faster and consumes less memory than the existing algorithm Moment for mining closed frequent itemsets over recent data streams.

web intelligence | 2005

DSM-TKP: Mining Top-K Path Traversal Patterns over Web Click-Streams

Hua-Fu Li; Suh-Yin Lee; Man-Kwan Shan

Online, single-pass mining of Web click streams poses some interesting computational issues, such as unbounded length of streaming data, possibly very fast arrival rate, and just one scan over previously arrived click-sequences. In this paper, we propose a new, single-pass algorithm, called DSM-TKP (data stream mining for top-k path traversal patterns), for mining top-k path traversal patterns, where k is the desired number of path traversal patterns to be mined. An effective summary data structure called TKP-forest (top-k path forest) is used to maintain the essential information about the top-k path traversal patterns of the click-stream so far. Experimental studies show that the DSM-TKP algorithm uses stable memory and makes only one pass over the streaming data.

Journal of Information Science | 2011

Incremental mining of closed inter-transaction itemsets over data stream sliding windows

Shih-Chuan Chiu; Hua-Fu Li; Jiun-Long Huang; Hsin-Han You

Mining inter-transaction association rules is one of the most interesting issues in data mining research. However, in a data stream environment, previous approaches cannot produce the result for newly arriving data together with the original database without recomputing over the whole database. In this paper, we propose an incremental mining algorithm, called DSM-CITI (Data Stream Mining for Closed Inter-Transaction Itemsets), for discovering the set of all frequent inter-transaction itemsets from data streams. In the framework of DSM-CITI, a new in-memory summary data structure, ITP-tree, is developed to maintain frequent inter-transaction itemsets. Moreover, algorithm DSM-CITI is able to construct ITP-tree incrementally and uses the property to avoid unnecessary updates. Experimental studies show that the proposed algorithm is efficient and scalable for mining frequent inter-transaction itemsets over stream sliding windows.
