Chang-Hung Lee | Researchain

Publication

Featured research published by Chang-Hung Lee.

Conference on Information and Knowledge Management | 2001

Sliding-window filtering: an efficient algorithm for incremental mining

Chang-Hung Lee; Cheng-Ru Lin; Ming-Syan Chen

We explore in this paper an effective sliding-window filtering (abbreviated as SWF) algorithm for incremental mining of association rules. In essence, by partitioning a transaction database into several partitions, algorithm SWF employs a filtering threshold in each partition to deal with candidate itemset generation. Under SWF, the cumulative information from mining previous partitions is selectively carried over toward the generation of candidate itemsets for subsequent partitions. Algorithm SWF not only significantly reduces I/O and CPU cost through cumulative filtering and scan reduction techniques but also effectively controls memory utilization through sliding-window partitioning. Algorithm SWF is particularly powerful for efficient incremental mining of an ongoing time-variant transaction database. By utilizing proper scan reduction techniques, algorithm SWF needs only one scan of the incremented dataset. The I/O cost of SWF is orders of magnitude smaller than that required by prior methods, thus resolving the performance bottleneck. Experimental studies are performed to evaluate the performance of algorithm SWF. The improvement achieved by algorithm SWF becomes even more prominent as the incremented portion of the dataset increases and as the size of the database increases.
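
A minimal Python sketch of the cumulative filtering idea described above, restricted to 2-itemsets. It assumes each partition is a list of transactions (Python sets of items) and that the minimum support `min_sup` is a fraction; the function names `swf_candidates` and `verify_candidates` are hypothetical. Each candidate records the partition where it was first kept and its count since then, and it is dropped as soon as that count falls below ceil(min_sup × number of transactions since its start partition). The sketch omits the window-sliding step (discarding an expired partition) and the generation of larger itemsets, so it illustrates the filtering principle rather than reproducing the published algorithm.

```python
from itertools import combinations
from math import ceil


def swf_candidates(partitions, min_sup):
    """Cumulative filtering of 2-itemsets across partitions (simplified sketch).

    Returns {pair: (start_partition, count_since_start)}.
    """
    sizes = [len(p) for p in partitions]
    candidates = {}
    for i, part in enumerate(partitions):
        # Count every 2-itemset that occurs in the current partition.
        local = {}
        for t in part:
            for pair in combinations(sorted(t), 2):
                local[pair] = local.get(pair, 0) + 1
        # Carry existing candidates forward and apply the cumulative threshold.
        for pair in list(candidates):
            start, count = candidates[pair]
            count += local.pop(pair, 0)
            if count >= ceil(min_sup * sum(sizes[start:i + 1])):
                candidates[pair] = (start, count)
            else:
                del candidates[pair]
        # Pairs not already tracked must pass this partition's own threshold.
        for pair, count in local.items():
            if count >= ceil(min_sup * sizes[i]):
                candidates[pair] = (i, count)
    return candidates


def verify_candidates(partitions, candidates, min_sup):
    """One full scan to obtain exact counts and keep the truly frequent pairs."""
    total = sum(len(p) for p in partitions)
    counts = dict.fromkeys(candidates, 0)
    for part in partitions:
        for t in part:
            for pair in counts:
                if set(pair) <= t:
                    counts[pair] += 1
    return {p: c for p, c in counts.items() if c >= ceil(min_sup * total)}


if __name__ == "__main__":
    db = [
        [{"a", "b"}, {"a", "c"}, {"b", "c"}],
        [{"a", "b"}, {"a", "b", "c"}],
        [{"a", "c"}, {"a", "b"}],
    ]
    cands = swf_candidates(db, 0.5)
    print(verify_candidates(db, cands, 0.5))  # {('a', 'b'): 4}
```

An itemset that is frequent over the whole database cannot fall below the threshold in every segment it was filtered over, so the surviving candidates form a superset of the frequent 2-itemsets; the single verification scan then supplies exact counts, since a candidate that started in a later partition only carries a partial count.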

IEEE Transactions on Knowledge and Data Engineering | 2003

Progressive partition miner: an efficient algorithm for mining general temporal association rules

Chang-Hung Lee; Ming-Syan Chen; Cheng-Ru Lin

We explore a new problem of mining general temporal association rules in publication databases. In essence, a publication database is a set of transactions where each transaction T is a set of items, each of which has an individual exhibition period. The current model of association rule mining is not able to handle a publication database because of two fundamental problems: 1) lack of consideration of the exhibition period of each individual item and 2) lack of an equitable support counting basis for each item. To remedy this, we propose an innovative algorithm, progressive-partition-miner (abbreviated as PPM), to discover general temporal association rules in a publication database. The basic idea of PPM is to first partition the publication database in light of the exhibition periods of items and then progressively accumulate the occurrence count of each candidate 2-itemset based on the intrinsic partitioning characteristics. Algorithm PPM is also designed to employ a filtering threshold in each partition to prune cumulatively infrequent 2-itemsets early. Because the number of candidate 2-itemsets generated by PPM is very close to the number of frequent 2-itemsets, the scan reduction technique can be employed to effectively reduce the number of database scans. Explicitly, the execution time of PPM is orders of magnitude smaller than that required by other competitive schemes directly extended from existing methods. The correctness of PPM is proven and some of its theoretical properties are derived. Sensitivity analysis of various parameters is conducted to provide many insights into algorithm PPM.

International Conference on Data Mining | 2001

On mining general temporal association rules in a publication database

Chang-Hung Lee; Cheng-Ru Lin; Ming-Syan Chen

In this paper, we explore a new problem of mining general temporal association rules in publication databases. In essence, a publication database is a set of transactions where each transaction T is a set of items, each with an individual exhibition period. The current model of association rule mining is not able to handle a publication database because of two fundamental problems: (1) lack of consideration of the exhibition period of each individual item; and (2) lack of an equitable support counting basis for each item. To remedy this, we propose an innovative algorithm, progressive-partition-miner (PPM), to discover general temporal association rules in a publication database. The basic idea of PPM is to first partition the publication database according to the exhibition periods of items and then progressively accumulate the occurrence count of each candidate 2-itemset based on the intrinsic partitioning characteristics. PPM is also designed to employ a filtering threshold in each partition to prune cumulatively infrequent 2-itemsets at an early stage. Explicitly, the execution time of PPM is orders of magnitude smaller than that required by schemes directly extended from existing methods.

Information Systems | 2005

Sliding window filtering: an efficient method for incremental mining on a time-variant database

Chang-Hung Lee; Cheng-Ru Lin; Ming-Syan Chen

Recently, several important database applications have called for the design of efficient techniques for incremental mining of association rules. In response to this need, we explore in this paper an effective sliding-window filtering (abbreviated as SWF) algorithm for incremental mining of association rules. In essence, by partitioning a transaction database into several partitions, algorithm SWF employs a filtering threshold in each partition to deal with candidate itemset generation. Under SWF, the cumulative information from mining previous partitions is selectively carried over toward the generation of candidate itemsets for subsequent partitions. Algorithm SWF not only significantly reduces I/O and CPU cost through cumulative filtering and scan reduction techniques but also effectively controls memory utilization through sliding-window partitioning. More importantly, algorithm SWF is particularly powerful for efficient incremental mining of an ongoing time-variant transaction database. By utilizing proper scan reduction techniques, algorithm SWF needs only one scan of the incremented dataset. The I/O cost of SWF is orders of magnitude smaller than that required by prior methods, thus resolving the performance bottleneck. Extensive experimental studies are performed to evaluate the performance of algorithm SWF. Sensitivity analysis of various parameters is conducted to provide many insights into algorithm SWF. The improvement achieved by algorithm SWF becomes even more prominent as the incremented portion of the dataset increases and as the size of the database increases.

International Conference on Data Mining | 2002

Mining general temporal association rules for items with different exhibition periods

Cheng-Yue Chang; Ming-Syan Chen; Chang-Hung Lee

In this paper we explore a new model of mining general temporal association rules from large databases where the exhibition periods of the items are allowed to differ from one another. Note that in this new model, the downward closure property, which all prior Apriori-based algorithms relied upon to attain good efficiency, is no longer valid. As a result, how to efficiently generate candidate itemsets from large databases has become the major challenge. To address this issue, we develop an efficient algorithm, referred to as algorithm SPF (standing for Segmented Progressive Filter). The basic idea behind SPF is to first segment the database into sub-databases in such a way that items in each sub-database have either a common starting time or a common ending time. Then, for each sub-database, SPF progressively filters candidate 2-itemsets with cumulative filtering thresholds either forward or backward in time. This feature allows SPF to adopt the scan reduction technique by generating all candidate k-itemsets (k>2) from candidate 2-itemsets directly. The experimental results show that algorithm SPF significantly outperforms other schemes extended from prior methods in terms of execution time and scalability.
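
The scan reduction technique mentioned above builds candidates of every size from the candidate 2-itemsets without counting the intermediate levels against the database, so that all of them can be verified in a single later scan. The sketch below shows the generic Apriori-style form of that step: join itemsets that share all but their last item, then prune any candidate with a missing (k-1)-subset. Whether SPF applies exactly this join-and-prune step is an assumption: the abstract notes that downward closure no longer holds in SPF's model, and the sketch does not model SPF's per-segment handling. The function name `candidates_from_c2` is hypothetical.

```python
from itertools import combinations


def candidates_from_c2(c2):
    """Generate candidate k-itemsets (k > 2) directly from candidate 2-itemsets."""
    prev = {tuple(sorted(pair)) for pair in c2}
    levels = {2: prev}
    k = 3
    while prev:
        nxt = set()
        for a in prev:
            for b in prev:
                # Join two (k-1)-itemsets that agree on all but their last item.
                if a[:-1] == b[:-1] and a[-1] < b[-1]:
                    cand = a + (b[-1],)
                    # Prune: every (k-1)-subset must itself be a candidate.
                    if all(sub in prev for sub in combinations(cand, k - 1)):
                        nxt.add(cand)
        if nxt:
            levels[k] = nxt
        prev = nxt
        k += 1
    return levels


if __name__ == "__main__":
    c2 = {("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")}
    print(candidates_from_c2(c2)[3])  # {('a', 'b', 'c')}
```

Here ('b', 'c', 'd') is never produced because ('c', 'd') is not a candidate 2-itemset. The closer the candidate 2-itemsets are to the truly frequent ones, as the abstracts above emphasize, the fewer spurious larger candidates this step creates.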

Very Large Data Bases | 2008

Efficient algorithms for incremental Web log mining with dynamic thresholds

Jian Chih Ou; Chang-Hung Lee; Ming-Syan Chen

With the fast increase in Web activities, Web data mining has recently become an important research topic and is receiving a significant amount of interest from both academic and industrial environments. While existing methods are efficient for mining frequent path traversal patterns from the access information contained in a log file, these approaches are likely to over-evaluate associations. Explicitly, most previous studies of mining path traversal patterns are based on the model of a uniform support threshold, where a single support threshold is used to determine frequent traversal patterns without taking into consideration such important factors as the length of a pattern, the positions of Web pages, and the importance of a particular pattern. As a result, a low support threshold will lead to many uninteresting patterns being derived, whereas a high support threshold may cause some interesting patterns with lower supports to be ignored. In view of this, this paper broadens the horizon of frequent path traversal pattern mining by introducing a flexible model of mining Web traversal patterns with dynamic thresholds. Specifically, we study and apply the Markov chain model to determine the support thresholds of Web documents; further, by properly employing some effective techniques devised for joining reference sequences, the proposed algorithm, dynamic threshold miner (DTM), not only possesses the capability of mining with dynamic thresholds but also significantly improves execution efficiency and contributes to the incremental mining of Web traversal patterns. The performance of algorithm DTM and of extensions of existing methods is comparatively analyzed with synthetic and real Web logs. It is shown that adopting algorithm DTM is very advantageous in reducing the number of unnecessary rules produced and leads to prominent performance improvement.
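
The abstract does not say how DTM turns the Markov chain model into per-document support thresholds, so the sketch below covers only the standard first step such a model rests on: estimating first-order transition probabilities from the sessions in a Web log and computing the probability of a traversal path under that chain. Representing sessions as lists of page identifiers and the function names are assumptions for illustration; how these probabilities would feed a dynamic threshold is left open, as in the abstract.

```python
from collections import defaultdict


def transition_probs(sessions):
    """Estimate P(next page | current page) from consecutive page pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for cur, nxt in zip(session, session[1:]):
            counts[cur][nxt] += 1
    probs = {}
    for cur, nexts in counts.items():
        total = sum(nexts.values())
        probs[cur] = {page: c / total for page, c in nexts.items()}
    return probs


def path_probability(probs, path):
    """Probability of following `path` under the first-order chain."""
    p = 1.0
    for cur, nxt in zip(path, path[1:]):
        p *= probs.get(cur, {}).get(nxt, 0.0)  # unseen transitions get 0
    return p


if __name__ == "__main__":
    sessions = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"]]
    probs = transition_probs(sessions)
    print(probs["B"])  # {'C': 0.5, 'D': 0.5}
    print(path_probability(probs, ["A", "B", "C"]))
```

A path that the chain itself makes unlikely (for example, a long path through rarely linked pages) would naturally call for a different threshold than a short, highly probable one, which is the kind of distinction a single uniform threshold cannot make.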

IEEE Transactions on Computers | 2002

Processing distributed mobile queries with interleaved remote mobile joins

Chang-Hung Lee; Ming-Syan Chen

Query processing in a mobile computing environment involves join processing among different sites, which include static servers and mobile computers. Because of the asymmetric features of a mobile computing environment, conventional query processing for a distributed database cannot be directly applied to a mobile computing system. In this paper, we first explore some unique features of a mobile environment and then, in light of these features, devise methods for both join processing and query processing. Remote mobile joins are said to be effectual if, when interleaved into a join sequence, they are able to reduce the data transmission cost required for distributed mobile query processing. Since mobile relations are employed as reducers in our proposed query processing cost model, more mobile joins in the query processing lead to less data transmitted through the network. With proper scheduling, interleaving effectual remote mobile joins into a query schedule can significantly reduce the total amount of data transmitted among different sites. A simulator is developed to evaluate the performance of the devised algorithms. Our results show that the approach of interleaving the processing of distributed mobile queries with effectual remote mobile joins is not only efficient but also effective in reducing the total data transmission cost required to process distributed mobile queries.

Knowledge Discovery and Data Mining | 2002

Distributed data mining in a chain store database of short transactions

Cheng Ru Lin; Chang-Hung Lee; Ming-Syan Chen; Philip S. Yu

In this paper, we broaden the horizon of traditional rule mining by introducing a new framework of causality rule mining in a distributed chain store database. Specifically, the causality rule explored in this paper consists of a sequence of triggering events and a set of consequential events, and is designed with the capability of mining non-sequential, inter-transaction information. Hence, causality rule mining provides a very general framework for rule derivation. Note, however, that the procedure of causality rule mining is very costly, particularly in the presence of a huge number of candidate sets and a distributed database, and in our opinion cannot be dealt with by direct extensions of existing rule mining methods. Consequently, we devise in this paper a series of level matching algorithms, including Level Matching (LM), Level Matching with Selective Scan (LMS), and Distributed Level Matching (Distributed LM), to minimize the computing cost needed for the distributed data mining of causality rules. In addition, time window constraints are also taken into consideration in the development of our algorithms. As a result of properly employing the technologies of level matching and selective scan, the proposed algorithms present good efficiency and scalability in the mining of local and global causality rules. Scale-up experiments show that the proposed algorithms scale well with the number of sites and the number of customer transactions.

Index Terms: knowledge discovery, distributed data mining, causality rules, triggering events, consequential events

International Conference on Distributed Computing Systems | 2001

Distributed query processing in the Internet: exploring relation replication and network characteristics

Chang-Hung Lee; Ming-Syan Chen

We introduce the concept of a network graph for distributed query processing. Semijoins and joins are termed contributive replicated semijoins and contributive replicated joins, respectively, when they are interleaved into a join sequence to reduce the data transmission cost required in a network with replicated relations. Our solution procedure consists of three consecutive steps, namely relation selection, join sequence scheduling, and merge processing. A simulator is developed to evaluate the performance of the devised algorithms. Our results show that the approach of interleaving a join sequence with contributive replicated semijoins/joins is not only efficient in its execution but also effective in reducing the total data transmission cost required to process distributed queries.
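
The data-reduction effect of a semijoin, the basic reducer behind the contributive semijoins above, can be shown with a toy cost model in which transmission cost is the number of attribute values shipped between two sites. The sketch assumes relations are lists of Python dicts and that R must be brought to the site holding S; the cost model and the function names are illustrative assumptions that ignore replication, network topology, and join-sequence scheduling, which the paper's procedure addresses.

```python
def semijoin(r, s, attr):
    """R semijoin S: the tuples of r whose `attr` value also appears in s."""
    keys = {t[attr] for t in s}
    return [t for t in r if t[attr] in keys]


def join(r, s, attr):
    """Equi-join of r and s on `attr` (nested loops, for illustration)."""
    return [{**a, **b} for a in r for b in s if a[attr] == b[attr]]


def cost_ship_whole(r):
    """Ship every value of R to S's site."""
    return sum(len(t) for t in r)


def cost_ship_with_semijoin(r, s, attr):
    """Ship S's distinct join values to R's site, then ship back only reduced R."""
    keys = {t[attr] for t in s}
    return len(keys) + sum(len(t) for t in semijoin(r, s, attr))


if __name__ == "__main__":
    r = [{"id": i, "x": i * 10, "y": i % 3} for i in range(10)]
    s = [{"id": 1, "z": 5}, {"id": 3, "z": 6}]
    print(cost_ship_whole(r), cost_ship_with_semijoin(r, s, "id"))  # 30 8
    # The reduced R produces the same join result as the full R.
    assert join(semijoin(r, s, "id"), s, "id") == join(r, s, "id")
```

The semijoin pays off here because only 2 of R's 10 tuples match. When most tuples match, shipping the join values costs more than it saves, which is why reducers have to be chosen and scheduled selectively rather than applied everywhere.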

Database Systems for Advanced Applications | 2001

Using remote joins for the processing of distributed mobile queries

Chang-Hung Lee; Ming-Syan Chen

Query processing in a mobile computing environment involves join processing among different sites, which include static servers and mobile computers. In this paper, we first present some unique features of a mobile environment and then, in light of these features, devise methods for both join processing and query processing. Remote mobile joins are said to be effectual if, when interleaved into a join sequence, they are able to reduce the data transmission cost required for distributed mobile query processing. It can be verified that algorithms using effectual remote joins reduce the total data transmission cost of processing a distributed mobile query. A simulator is developed to evaluate the performance of the devised algorithms. Our results show that the approach of interleaving the processing of distributed mobile queries with effectual remote mobile joins is not only efficient but also effective in reducing the total data transmission cost required to process distributed mobile queries.
