Kun Ta Chuang
National Cheng Kung University
Publication
Featured research published by Kun Ta Chuang.
International Conference on Data Engineering | 2007
Shan-Hung Wu; Kun Ta Chuang; Chung Min Chen; Ming-Syan Chen
Current approaches to k nearest neighbor (KNN) search in mobile sensor networks require a certain kind of indexing support. This index could be either a centralized spatial index or an in-network data structure that is distributed over the sensor nodes. Creation and maintenance of these index structures, to reflect the network dynamics due to sensor node mobility, may result in long query response time and low battery efficiency, thus limiting their practical use. In this paper, we propose a maintenance-free, itinerary-based approach called density-aware itinerary KNN query processing (DIKNN). DIKNN divides the search area into multiple cone-shaped areas centered at the query point. It then performs a query dissemination and response collection itinerary in each of the cone-shaped areas in parallel. The design of the DIKNN scheme also takes into account challenging issues such as the dynamic adjustment of the search radius (in terms of number of hops) according to spatial irregularity or mobility of sensor nodes. The simulation results show that DIKNN yields substantially better performance and scalability than previous work, both as k increases and as the sensor node mobility increases. It outperforms the runner-up with up to 50% saving in energy consumption and up to 40% reduction in query response time, while rendering the same level of query result accuracy.
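The geometric step described above, dividing the area around the query point into cones, can be illustrated with a minimal sketch. It assumes 2D node coordinates and equal-angle sectors; the function names and the `num_sectors` parameter are hypothetical, and the sketch does not reproduce DIKNN's itineraries, hop-based radius adjustment, or choice of parallelism.

```python
import math

def sector_index(query, node, num_sectors):
    """Return the index of the equal-angle cone around `query` that contains `node`."""
    dx = node[0] - query[0]
    dy = node[1] - query[1]
    # Normalize the polar angle into [0, 2*pi).
    angle = math.atan2(dy, dx) % (2 * math.pi)
    width = 2 * math.pi / num_sectors
    # Clamp to guard against floating-point rounding at the 2*pi boundary.
    return min(int(angle // width), num_sectors - 1)

def partition_by_sector(query, nodes, num_sectors):
    """Group nodes by the cone they fall in (assumed equal-angle cones)."""
    sectors = [[] for _ in range(num_sectors)]
    for node in nodes:
        sectors[sector_index(query, node, num_sectors)].append(node)
    return sectors
```

Each resulting group could then be searched independently, which is the property the parallel itineraries rely on.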
Very Large Data Bases | 2008
Kun Ta Chuang; Jiun-Long Huang; Ming-Syan Chen
We explore in this paper a practically interesting mining task: retrieving top-k (closed) itemsets under a memory constraint. Specifically, as opposed to most previous works that concentrate on improving the mining efficiency or on reducing the memory size by best effort, we first attempt to specify an upper bound on the memory that can be used for mining frequent itemsets. To comply with the upper bound of the memory consumption, two efficient algorithms, called MTK and MTK_Close, are devised for mining frequent itemsets and closed itemsets, respectively, without specifying the subtle minimum support. Instead, users only need to give a more human-understandable parameter, namely the desired number of frequent (closed) itemsets k. In practice, it is quite challenging to constrain the memory consumption while also efficiently retrieving top-k itemsets. To effectively achieve this, MTK and MTK_Close are devised as level-wise search algorithms, where the number of candidates being generated-and-tested in each database scan will be limited. A novel search approach, called δ-stair search, is utilized in MTK and MTK_Close to effectively assign the available memory for testing candidate itemsets with various itemset-lengths, which leads to a small number of required database scans. As demonstrated in the empirical study on real data and synthetic data, instead of only providing the flexibility of striking a compromise between the execution efficiency and the memory consumption, MTK and MTK_Close can both achieve high efficiency and have a constrained memory bound, which makes them practical algorithms for mining frequent patterns.
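To make the task concrete, the following minimal sketch returns the k itemsets with the highest support, so the user supplies k rather than a minimum support. It is a brute-force baseline, not MTK or MTK_Close: it enumerates every itemset up to an assumed length limit, applies no memory bound, and does not implement δ-stair search. The names `top_k_itemsets` and `max_len` and the tie-breaking rule are hypothetical.

```python
from collections import Counter
from itertools import combinations
import heapq

def top_k_itemsets(transactions, k, max_len=3):
    """Return the k (itemset, support) pairs with the highest support."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        # Count every itemset of length 1..max_len contained in this transaction.
        for size in range(1, min(max_len, len(items)) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    # Highest support first; ties go to shorter, then lexicographically smaller itemsets.
    return heapq.nsmallest(
        k, counts.items(), key=lambda kv: (-kv[1], len(kv[0]), kv[0])
    )
```

Because every candidate is held in `counts` at once, memory grows with the number of distinct itemsets, which is exactly the cost that a memory-constrained, level-wise search is meant to avoid.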
Knowledge Discovery and Data Mining | 2005
Kun Ta Chuang; Ming-Syan Chen; Wen Chieh Yang
We explore in this paper a progressive sampling algorithm, called Sampling Error Estimation (SEE), which aims to identify an appropriate sample size for mining association rules. SEE has two advantages over previous works in the literature. First, SEE is highly efficient because an appropriate sample size can be determined without the need of executing association rules. Second, the identified sample size of SEE is very accurate, meaning that association rules can be highly efficiently executed on a sample of this size to obtain a sufficiently accurate result. This is attributed to the merit of SEE for being able to significantly reduce the influence of randomness by examining several samples with the same size in one database scan. As validated by experiments on various real data and synthetic data, SEE can achieve very prominent improvement in efficiency and also the resulting accuracy over previous works.
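The idea of examining several same-size samples in one database scan can be sketched as follows. The abstract does not state how SEE draws its samples or measures sampling error, so this sketch substitutes standard reservoir sampling (Algorithm R) run independently for each sample; the function name, parameters, and seed are assumptions made for illustration.

```python
import random

def multi_reservoir(stream, num_samples, sample_size, seed=0):
    """Draw `num_samples` uniform samples of `sample_size` records in a single pass."""
    rng = random.Random(seed)
    reservoirs = [[] for _ in range(num_samples)]
    for i, record in enumerate(stream):
        for reservoir in reservoirs:
            if i < sample_size:
                # Fill phase: the first `sample_size` records go into every reservoir.
                reservoir.append(record)
            else:
                # Replacement phase: keep the record with probability sample_size / (i + 1).
                j = rng.randint(0, i)  # inclusive on both ends
                if j < sample_size:
                    reservoir[j] = record
    return reservoirs
```

Comparing a statistic such as item frequencies across the returned samples gives a rough sense of how much results vary at that sample size, without mining association rules on each sample.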
IEEE Transactions on Knowledge and Data Engineering | 2008
Hung Leng Chen; Kun Ta Chuang; Ming-Syan Chen
Sampling has been recognized as an important technique to improve the efficiency of clustering. However, with sampling applied, those points which are not sampled will not have their labels after the normal process. Although there is a straightforward approach in the numerical domain, the problem of how to allocate those unlabeled data points into proper clusters remains as a challenging issue in the categorical domain. In this paper, a mechanism named MAximal Resemblance Data Labeling (abbreviated as MARDL) is proposed to allocate each unlabeled data point into the corresponding appropriate cluster based on the novel categorical clustering representative, namely, N-Nodeset Importance Representative (abbreviated as NNIR), which represents clusters by the importance of the combinations of attribute values. MARDL has two advantages: 1) MARDL exhibits high execution efficiency and 2) MARDL can achieve high intracluster similarity and low intercluster similarity, which are regarded as the most important properties of clusters, thus benefiting the analysis of cluster behaviors. MARDL is empirically validated on real and synthetic data sets, and is shown to be significantly more efficient than prior works while attaining results of high quality.
Computer Software and Applications Conference | 2001
Ching-Huang Yun; Kun Ta Chuang; Ming-Syan Chen
In this paper, we devise an efficient algorithm for clustering market-basket data items. In view of the nature of market-basket data, we introduce a novel measurement, called the small-large (abbreviated as SL) ratio, and utilize this ratio to perform the clustering. With this SL ratio measurement, we develop an efficient clustering algorithm for data items to minimize the SL ratio in each group. The proposed algorithm not only incurs an execution time that is significantly smaller than that of prior work but also leads to clustering results of very good quality.
IEEE Transactions on Knowledge and Data Engineering | 2008
Shan-Hung Wu; Kun Ta Chuang; Chung Min Chen; Ming-Syan Chen
The K-nearest neighbors (KNN) query has been of significant interest in many studies and has become one of the most important spatial queries in mobile sensor networks. Applications of KNN queries may include vehicle navigation, wildlife social discovery, and squad/platoon searching on the battlefields. Current approaches to KNN search in mobile sensor networks require a certain kind of indexing support. This index could be either a centralized spatial index or an in-network data structure that is distributed over the sensor nodes. Creation and maintenance of these index structures, to reflect the network dynamics due to sensor node mobility, may result in long query response time and low battery efficiency, thus limiting their practical use. In this paper, we propose a maintenance-free itinerary-based approach called density-aware itinerary KNN query processing (DIKNN). DIKNN divides the search area into multiple cone-shaped areas centered at the query point. It then performs a query dissemination and response collection itinerary in each of the cone-shaped areas in parallel. The design of the DIKNN scheme takes into account several challenging issues such as the trade-off between degree of parallelism and network interference on query response time, and the dynamic adjustment of the search radius (in terms of number of hops) according to spatial irregularity or mobility of sensor nodes. To optimize the performance of DIKNN, a detailed analytical model is derived that automatically determines the most suitable degree of parallelism under various network conditions. This model is validated by extensive simulations. The simulation results show that DIKNN yields substantially better performance and scalability than previous work, both as k increases and as the sensor node mobility increases. It outperforms the runner-up with up to a 50 percent saving in energy consumption and up to a 40 percent reduction in query response time, while rendering the same level of query result accuracy.
International Conference on Data Mining | 2005
Hung Leng Chen; Kun Ta Chuang; Ming-Syan Chen
Sampling has been recognized as an important technique to improve the efficiency of clustering. However, with sampling applied, those points which are not sampled will not have their labels. Although there is a straightforward approach in the numerical domain, the problem of how to allocate those unlabeled data points into proper clusters remains as a challenging issue in the categorical domain. In this paper, a mechanism named MAximal Resemblance Data Labeling (abbreviated as MARDL) is proposed to allocate each unlabeled data point into the corresponding appropriate cluster based on the novel categorical clustering representative, namely, Node Importance Representative (abbreviated as NIR), which represents clusters by the importance of attribute values. MARDL has two advantages: (1) MARDL exhibits high execution efficiency; (2) after each unlabeled data is allocated into the proper cluster, MARDL preserves clustering characteristics, i.e., high intra-cluster similarity and low inter-cluster similarity. MARDL is empirically validated via real and synthetic data sets, and is shown to be not only more efficient than prior methods but also attaining results of better quality.
Very Large Data Bases | 2008
Kun Ta Chuang; Jiun-Long Huang; Ming-Syan Chen
In this paper, we show that a power-law relationship and a self-similar phenomenon appear in the itemset support distribution. The itemset support distribution refers to the distribution of the count of itemsets versus their supports. Exploring the characteristics of these natural phenomena is useful to many applications, such as guiding the performance tuning of frequent-itemset mining. However, due to the explosive number of itemsets, it is prohibitively expensive to retrieve large numbers of itemsets before we identify the characteristics of the itemset support distribution in targeted data. As such, we also propose a valid and cost-effective algorithm, called algorithm PPL, to extract characteristics of the itemset support distribution. Furthermore, to fully explore the advantages of our discovery, we also propose novel mechanisms with the help of PPL to solve two important problems: (1) determining a subtle parameter for mining approximate frequent itemsets over data streams; and (2) determining the sufficient sample size for mining frequent patterns. As validated in our experimental results, PPL can efficiently and precisely identify the characteristics of the itemset support distribution in various real data. In addition, empirical studies also demonstrate that our mechanisms for those two challenging problems are orders of magnitude better than previous works, showing the prominent advantage of PPL as an important pre-processing means for mining applications.
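The itemset support distribution defined above (number of itemsets versus support) can be computed directly on small data, and a power law would show up as a roughly straight line on log-log axes. The sketch below is illustrative only and is not PPL: it enumerates every itemset up to an assumed length, which is the expensive step PPL is designed to avoid, and it estimates the exponent with a plain least-squares fit. All function and parameter names are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math

def support_distribution(transactions, max_len=2):
    """Map each support value to the number of itemsets that have that support."""
    supports = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, min(max_len, len(items)) + 1):
            for combo in combinations(items, size):
                supports[combo] += 1
    return Counter(supports.values())

def loglog_slope(distribution):
    """Least-squares slope of log(count) against log(support)."""
    if len(distribution) < 2:
        raise ValueError("need at least two distinct support values")
    xs = [math.log(s) for s in distribution.keys()]
    ys = [math.log(c) for c in distribution.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

A slope near a negative constant across the support range is what a power-law relationship would predict; a simple fit like this is only a rough diagnostic.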
IEEE Transactions on Circuits and Systems for Video Technology | 2015
Min Hsiang Yang; Chun-Rong Huang; Wan Chen Liu; Shu Zhe Lin; Kun Ta Chuang
Recently, most background modeling approaches represent distributions of background changes by using parametric models such as Gaussian mixture models. Because of significant illumination changes and dynamically moving backgrounds over time, variations of background changes are hard to model with parametric background models. Moreover, how to efficiently and effectively update parameters of parametric models to reflect background changes remains a problem. In this paper, we propose a novel coarse-to-fine detection theory algorithm to extract foreground objects on the basis of nonparametric background and foreground models represented by binary descriptors. We update background and foreground models by a first-in-first-out strategy to maintain the most recently observed background and foreground instances. As shown in the experiments, our method achieves better foreground extraction results and fewer false alarms on surveillance videos with lighting changes and dynamic backgrounds in both collected and CDnet 2012 benchmark data sets.
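The first-in-first-out update of a nonparametric model can be sketched with a bounded queue of binary descriptors. This is a minimal illustration under stated assumptions: descriptors are represented as Python integers used as bit patterns, matching uses a Hamming-distance threshold, and the class name, `capacity`, and `max_distance` are hypothetical. It does not reproduce the paper's descriptor or its coarse-to-fine detection procedure.

```python
from collections import deque

def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

class FifoModel:
    """Keeps only the most recently observed descriptors (assumed fixed capacity)."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest entry automatically when full.
        self.samples = deque(maxlen=capacity)

    def update(self, descriptor):
        self.samples.append(descriptor)

    def matches(self, descriptor, max_distance):
        """True if any stored descriptor is within `max_distance` bits."""
        return any(hamming(descriptor, s) <= max_distance for s in self.samples)
```

Keeping one such model for background instances and one for foreground instances mirrors the two models mentioned above, with the oldest observations ageing out as new ones arrive.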
International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems | 2012
Bai En Shie; Ji Hong Cheng; Kun Ta Chuang; Vincent S. Tseng
Mobile sequential pattern mining is an emerging topic in data mining with wide applications, such as planning mobile commerce environments and managing online shopping websites. However, an important factor, the actual utilities of items (profit, in this work), is not considered, and thus some valuable patterns cannot be found. Therefore, previous studies [8, 9] addressed the problem of mining high utility mobile sequential patterns (abbreviated as UMSPs). Nevertheless, the tree-based algorithms may not perform efficiently, since mobile transaction sequences are often too complex to form compressed tree structures. A novel algorithm, namely UM-Span (high Utility Mobile Sequential Pattern mining), is proposed for efficiently mining UMSPs in this work. UM-Span finds UMSPs with a projected-database-based framework. It does not need additional database scans to find actual UMSPs, which are the bottleneck of utility mining. Experimental results show that UM-Span outperforms the state-of-the-art UMSP mining algorithms under various conditions.