Tuong Le
Ton Duc Thang University
Publication
Featured research published by Tuong Le.
International Journal of Machine Learning and Cybernetics | 2016
Bay Vo; Tuong Le; Frans Coenen; Tzung-Pei Hong
Frequent itemset mining is a fundamental element with respect to many data mining problems directed at finding interesting patterns in data. Recently the PrePost algorithm, a new algorithm for mining frequent itemsets based on the idea of N-lists, which in most cases outperforms other current state-of-the-art algorithms, has been presented. This paper proposes an improved version of PrePost, the N-list and Subsume-based algorithm for mining Frequent Itemsets (NSFI) algorithm that uses a hash table to enhance the process of creating the N-lists associated with 1-itemsets and an improved N-list intersection algorithm. Furthermore, two new theorems are proposed for determining the “subsume index” of frequent 1-itemsets based on the N-list concept. Using the subsume index, NSFI can identify groups of frequent itemsets without determining the N-list associated with them. The experimental results show that NSFI outperforms PrePost in terms of runtime and memory usage and outperforms dEclat in terms of runtime.
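The subsume idea can be illustrated with a minimal sketch. It assumes plain transaction-id sets in place of the paper's N-lists, and the helper names (`tidsets`, `subsume_index`, `support`) are hypothetical. Here an item b is taken to be in the subsume index of item a when every transaction containing a also contains b, so adding b to a leaves the support unchanged.

```python
def tidsets(transactions):
    """Map each item to the set of transaction ids that contain it."""
    index = {}
    for tid, items in enumerate(transactions):
        for item in items:
            index.setdefault(item, set()).add(tid)
    return index


def subsume_index(transactions):
    """For each item a, list the other items found in every transaction with a."""
    tids = tidsets(transactions)
    return {
        a: sorted(b for b in tids if b != a and tids[a] <= tids[b])
        for a in tids
    }


def support(transactions, itemset):
    """Count the transactions that contain every item of the itemset."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t)


# Toy data (an assumption for illustration, not taken from the paper).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"b", "c"},
    {"b", "d"},
]
index = subsume_index(transactions)
# "b" appears wherever "a" appears, so {a, b} has the same support as {a}
# and needs no separate support computation.
same_support = support(transactions, {"a", "b"}) == support(transactions, {"a"})
```

This tidset version only shows why a subsume index lets groups of itemsets be reported without further intersections; it does not reproduce the N-list construction or the paper's theorems.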
Expert Systems With Applications | 2015
Quyen Huynh-Thi-Le; Tuong Le; Bay Vo; Bac Le
Using N-list structure for mining top-rank-k frequent patterns effectively. Subsume concept was also used to speed up the runtime of the mining process. The experiment was conducted to show the effectiveness of the proposed algorithm. Frequent pattern mining generates a lot of candidates, which requires a lot of memory usage and mining time. In real applications, a small number of frequent patterns are used. Therefore, the mining of top-rank-k frequent patterns, which limits the number of mined frequent patterns by ranking them in frequency, has received increasing interest. This paper proposes the iNTK algorithm, which is an improved version of the NTK algorithm, for mining top-rank-k frequent patterns. This algorithm employs an N-list structure to represent patterns. The subsume concept is used to speed up the process of mining top-rank-k patterns. The experiments are conducted to evaluate iNTK and NTK in terms of mining time and memory usage for eight datasets. The experimental results show that iNTK is more efficient and faster than NTK.
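As a rough illustration of the task itself, the brute-force sketch below assumes that "top-rank-k" means keeping every pattern whose support is among the k highest distinct support values. It enumerates all itemsets and is not the iNTK or NTK algorithm; the function names are hypothetical.

```python
from itertools import combinations


def all_supports(transactions):
    """Support of every itemset that occurs in at least one transaction."""
    items = sorted(set().union(*transactions))
    supports = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count > 0:
                supports[candidate] = count
    return supports


def top_rank_k(transactions, k):
    """Keep patterns whose support is one of the k highest distinct supports."""
    supports = all_supports(transactions)
    kept_values = sorted(set(supports.values()), reverse=True)[:k]
    return {p: s for p, s in supports.items() if s in kept_values}


# Toy data (an assumption for illustration).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"b", "c"},
    {"b", "d"},
]
rank_1 = top_rank_k(transactions, 1)  # only patterns with the highest support
rank_2 = top_rank_k(transactions, 2)  # adds patterns with the second-highest support
```

Because several patterns can share a support value, the result for a given k may contain more than k patterns.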
Engineering Applications of Artificial Intelligence | 2014
Tuong Le; Bay Vo
Erasable itemset (EI) mining is an interesting variation of frequent itemset mining which allows managers to carefully consider their production plans to ensure the stability of the factory. Existing algorithms for EI mining require a lot of time and memory. This paper proposes an effective algorithm, called mining erasable itemsets (MEI), which uses the divide-and-conquer strategy and the difference pidset (dPidset) concept for mining EIs fully. Some theorems for efficiently computing itemset information to reduce mining time and memory usage are also derived. Experimental results show that MEI outperforms existing approaches in terms of both the mining time and memory usage. Moreover, the proposed algorithm is capable of mining EIs with higher thresholds than those obtained using existing approaches.
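The sketch below illustrates the erasable-itemset setting under stated assumptions: each product has a set of component items and a profit, the gain of an itemset is the total profit of products using at least one of its items, and an itemset is erasable when its gain does not exceed a threshold fraction of the total profit. The data layout and function names are hypothetical, and the last lines only show the set-difference identity that makes a difference-based structure attractive; this is not the MEI algorithm.

```python
def pidset(products, itemset):
    """Ids of products that use at least one item of the itemset."""
    wanted = set(itemset)
    return {pid for pid, (items, _) in products.items() if wanted & items}


def gain(products, pids):
    """Total profit of the given product ids."""
    return sum(products[pid][1] for pid in pids)


def is_erasable(products, itemset, threshold):
    """True when the profit lost by removing the itemset is within the threshold."""
    total = sum(profit for _, profit in products.values())
    return gain(products, pidset(products, itemset)) <= threshold * total


# Toy product database: id -> (component items, profit). Assumed for illustration.
products = {
    1: ({"a", "b"}, 100),
    2: ({"b", "c"}, 200),
    3: ({"c", "d"}, 300),
    4: ({"d"}, 400),
}

# Gain of {a, b} computed directly from the union of product ids ...
direct = gain(products, pidset(products, {"a", "b"}))
# ... and incrementally, adding only the products that b contributes beyond a.
extra = pidset(products, {"b"}) - pidset(products, {"a"})
incremental = gain(products, pidset(products, {"a"})) + gain(products, extra)
```

Storing only the difference set means the gain of a longer itemset can be updated from a shorter one without rebuilding the full product-id set.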
Expert Systems With Applications | 2015
Tuong Le; Bay Vo
Two theorems for fast determining closed patterns based on N-list structure are presented. An N-list-based algorithm for mining closed patterns is then proposed. The proposed algorithm outperforms a number of classical algorithms in terms of runtime and memory usage in most cases. Frequent closed patterns (FCPs), a condensed representation of frequent patterns, have been proposed for the mining of (minimal) non-redundant association rules to improve performance in terms of memory usage and mining time. Recently, the N-list structure has been proven to be very efficient for mining frequent patterns. This study proposes an N-list-based algorithm for mining FCPs called NAFCP. Two theorems for fast determining FCPs based on the N-list structure are proposed. The N-list structure provides a much more compact representation compared to previously proposed vertical structures, reducing the memory usage and mining time required for mining FCPs. The experimental results show that NAFCP outperforms previous algorithms in terms of runtime and memory usage in most cases.
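To make the notion of a closed pattern concrete, here is a brute-force sketch using the standard definition: a frequent pattern is closed when no proper superset has the same support. It is not NAFCP and uses no N-lists; the function names and toy data are assumptions.

```python
from itertools import combinations


def frequent_supports(transactions, min_support):
    """Support of every itemset that meets the minimum support count."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_support:
                result[frozenset(candidate)] = count
    return result


def closed_patterns(transactions, min_support):
    """Frequent patterns with no proper superset of equal support."""
    freq = frequent_supports(transactions, min_support)
    # A superset with equal support is itself frequent, so checking
    # against the frequent patterns alone is sufficient.
    return {
        p: s
        for p, s in freq.items()
        if not any(p < q and s == freq[q] for q in freq)
    }


transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"b", "c"},
    {"b", "d"},
]
closed = closed_patterns(transactions, min_support=2)
```

In this toy data {a} is frequent but not closed, because {a, b} occurs in exactly the same transactions.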
Applied Intelligence | 2014
Bay Vo; Tuong Le; Tzung-Pei Hong; Bac Le
Incremental mining has attracted the attention of many researchers due to its usefulness in online applications. Many algorithms have thus been proposed for incrementally mining frequent itemsets. Maintaining a frequent-itemset lattice (FIL) is difficult for databases with large numbers of frequent itemsets, especially huge databases, due to the storage of links of nodes in the lattice. However, generating association rules from a FIL has been shown to be more effective than traditional methods such as directly generating rules from frequent itemsets or frequent closed itemsets. Therefore, when the number of frequent itemsets is not huge (i.e., they can be stored in the lattice without excessive memory overhead), the lattice-based approach outperforms approaches which mine association rules from frequent itemsets/frequent closed itemsets. However, incremental algorithms for building FILs have not yet been proposed. This paper proposes an effective approach for the maintenance of a FIL based on the pre-large concept in incremental mining. The building process of a FIL is first improved using two proposed theorems regarding the paternity relation between two nodes in the lattice. An effective approach for maintaining a FIL with dynamically inserted data is then proposed based on the pre-large and the diffset concepts. The experimental results show that the proposed approach outperforms the batch approach for building a FIL in terms of execution time.
Asian Conference on Intelligent Information and Database Systems | 2014
Giang Nguyen; Tuong Le; Bay Vo; Bac Le
Erasable itemset mining, first introduced in 2009, is an interesting variation of pattern mining. Managers can use erasable itemsets to plan a factory's production. Besides the problem of mining erasable itemsets, mining top-rank-k erasable itemsets is an interesting and practical problem. In this paper, we first propose a new structure, called dPID_List, and two theorems associated with it. Then, an improved algorithm for mining top-rank-k erasable itemsets using the dPID_List structure is developed. The effectiveness of the proposed method is demonstrated by comparisons with the VM algorithm in terms of mining time and memory usage on three datasets.
Expert Systems With Applications | 2017
Tung Kieu; Bay Vo; Tuong Le; Zhi-Hong Deng; Bac Le
The problem of mining top-k co-occurrence items with sequential pattern is defined. Sid-set and Sid-sets with an index are used to reduce the scanning time. Three algorithms including NAM, VAM, and VIAM for mining top-k co-occurrence items with sequential pattern are proposed. Two pruning techniques are proposed and used in VAM and VIAM algorithms to improve the processing time. Frequent sequential pattern mining has become one of the most important tasks in data mining. It has many applications, such as sequential analysis, classification, and prediction. How to generate candidates and how to control the combinatorially explosive number of intermediate subsequences are the most difficult problems. Intelligent systems such as recommender systems, expert systems, and business intelligence systems use only a few patterns, namely those that satisfy a number of defined conditions. Challenges include the mining of top-k patterns, top-rank-k patterns, closed patterns, and maximal patterns. In many cases, end users need to find itemsets that occur with a sequential pattern. Therefore, this paper proposes approaches for mining top-k co-occurrence items usually found with a sequential pattern. The Naive Approach Mining (NAM) algorithm discovers top-k co-occurrence items by directly scanning the sequence database to determine the frequency of items. The Vertical Approach Mining (VAM) algorithm is based on vertical database scanning. The Vertical with Index Approach Mining (VIAM) algorithm is based on a vertical database with index scanning. VAM and VIAM use pruning strategies to reduce the search space, thus improving performance. VAM and VIAM are especially effective in mining the co-occurrence items of a long input pattern. The three algorithms were evaluated using real-world databases. The experimental results show that these algorithms perform well, especially VAM and VIAM.
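A direct-scan approach in the spirit of the naive method can be sketched as follows. The simplifications are assumptions: each sequence is a flat list of single items, a sequence contains the pattern when the pattern is a subsequence of it, and items already in the pattern are not counted. The function names are hypothetical and the paper's exact definitions may differ.

```python
from collections import Counter


def contains_pattern(sequence, pattern):
    """True when pattern occurs in sequence as an order-preserving subsequence."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator up to the matched item,
    # so later pattern items must appear after earlier ones.
    return all(item in remaining for item in pattern)


def top_k_cooccurrence(database, pattern, k):
    """Scan every sequence and count items that appear alongside the pattern."""
    counts = Counter()
    for sequence in database:
        if contains_pattern(sequence, pattern):
            counts.update(set(sequence) - set(pattern))
    # Sort by descending count, then by item name for a stable result.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))[:k]


# Toy sequence database (an assumption for illustration).
database = [
    ["a", "b", "c", "d"],
    ["a", "c", "b"],
    ["b", "a", "d"],
    ["a", "e", "b", "d"],
]
top_2 = top_k_cooccurrence(database, ["a", "b"], 2)
```

Every query rescans the whole database, which is the cost that vertical formats and pruning are meant to reduce.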
Expert Systems With Applications | 2017
Bay Vo; Sang Pham; Tuong Le; Zhi-Hong Deng
An N-list structure is used to compress the dataset for mining Maximal Frequent Patterns. A pruning technique is then proposed and used in INLA-MFP algorithm to improve the runtime and memory usage. Experiments were conducted to show that INLA-MFP outperforms well-known algorithms. Mining maximal frequent patterns (MFPs) is an approach that limits the number of frequent patterns (FPs) to help intelligent systems operate efficiently. Many approaches have been proposed for mining MFPs, but the complexity of the problem is enormous. Therefore, the run time and memory usage are still large. Recently, the N-list structure has been proposed and verified to be very effective for mining FPs, frequent closed patterns, and top-rank-k FPs. Therefore, this paper uses the N-list structure for mining MFPs. A pruning technique is also proposed to prune branches to reduce the search space. This technique is applied to an algorithm called INLA-MFP (improved N-list-based algorithm for mining maximal frequent patterns) for mining MFPs. Experiments were conducted to evaluate the effectiveness of the proposed algorithm. The experimental results show that INLA-MFP outperforms two state-of-the-art algorithms for mining MFPs.
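For reference, a maximal frequent pattern is a frequent pattern with no frequent proper superset. The brute-force sketch below only illustrates that definition on toy data; it is not INLA-MFP, applies no pruning, and uses hypothetical function names.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """All itemsets contained in at least min_support transactions."""
    items = sorted(set().union(*transactions))
    frequent = set()
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_support:
                frequent.add(frozenset(candidate))
    return frequent


def maximal_patterns(transactions, min_support):
    """Frequent itemsets that have no frequent proper superset."""
    frequent = frequent_itemsets(transactions, min_support)
    return {p for p in frequent if not any(p < q for q in frequent)}


transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"b", "c"},
    {"b", "d"},
]
maximal = maximal_patterns(transactions, min_support=2)
```

Every frequent pattern is a subset of some maximal one, which is why the maximal set is a much smaller summary, although it does not record the supports of the subsets.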
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery | 2014
Tuong Le; Bay Vo; Giang Nguyen
Pattern mining, one of the most important problems in data mining, involves finding existing patterns in data. This article provides a survey of the available literature on a variant of pattern mining, namely erasable itemset (EI) mining. EI mining was first presented in 2009 and META is the first algorithm to solve this problem. Since then, a number of algorithms, such as VME, MERIT, and dMERIT+, have been proposed for mining EI. MEI, proposed in 2014, is currently the best algorithm for mining EIs. In this study, the META, VME, MERIT, dMERIT+, and MEI algorithms are described and compared in terms of mining time and memory usage. WIREs Data Mining Knowl Discov 2014, 4:356–379. doi: 10.1002/widm.1137
Applied Intelligence | 2015
Giang Nguyen; Tuong Le; Bay Vo; Bac Le
Erasable itemset mining, first proposed in 2009, is an interesting problem in supply chain optimization. The dPidset structure, a very effective structure for mining erasable itemsets, was introduced in 2014. The dPidset structure outperforms previous structures such as PID_List and NC_Set. Algorithms based on dPidset can effectively mine erasable itemsets. However, for very dense datasets, the mining time and memory usage are large. Therefore, this paper proposes an effective approach that uses the subsume concept for mining erasable itemsets for very dense datasets. The subsume concept is used to determine early the information of a large number of erasable itemsets without the usual computational cost. Then, the erasable itemsets for very dense datasets (EIFDD) algorithm, which uses the subsume concept and the dPidset structure for the erasable itemset mining of very dense datasets, is proposed. An illustrative example is given to demonstrate the proposed algorithm. Finally, an experiment is conducted to show the effectiveness of EIFDD.