Wensheng Gan

Harbin Institute of Technology Shenzhen Graduate School

Publication

Featured research published by Wensheng Gan.

Knowledge-Based Systems | 2016

Efficient algorithms for mining high-utility itemsets in uncertain databases

Jerry Chun-Wei Lin; Wensheng Gan; Philippe Fournier-Viger; Tzung-Pei Hong; Vincent S. Tseng

High-utility itemset mining (HUIM) is a useful set of techniques for discovering patterns in transaction databases, which considers both quantity and profit of items. However, most algorithms for mining high-utility itemsets (HUIs) assume that the information stored in databases is precise, i.e., that there is no uncertainty. But in many real-life applications, an item or itemset is not only present or absent in transactions but is also associated with an existence probability. This is especially the case for data collected experimentally or using noisy sensors. In the past, many algorithms were proposed to mine frequent itemsets in uncertain databases. But no approach for mining HUIs in an uncertain database has yet been proposed, although uncertainty is commonly seen in real-world applications. In this paper, a novel framework, named potential high-utility itemset mining (PHUIM) in uncertain databases, is proposed to efficiently discover not only the itemsets with high utilities but also the itemsets with high existence probabilities in an uncertain database based on the tuple uncertainty model. The PHUI-UP algorithm (potential high-utility itemsets upper-bound-based mining algorithm) is first presented to mine potential high-utility itemsets (PHUIs) using a level-wise search. Since PHUI-UP adopts a generate-and-test approach to mine PHUIs, it suffers from the problem of repeatedly scanning the database. To address this issue, a second algorithm named PHUI-List (potential high-utility itemsets PU-list-based mining algorithm) is also proposed. The latter directly mines PHUIs without generating candidates, thanks to a novel probability-utility-list (PU-list) structure, thus greatly improving the scalability of PHUI mining. Substantial experiments were conducted on both real-life and synthetic datasets to assess the performance of the two designed algorithms in terms of runtime, number of patterns, memory consumption, and scalability.
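
To make the two criteria concrete, here is a minimal Python sketch of the potential high-utility test under a tuple uncertainty model, where each transaction carries a single existence probability. The toy data, the function names, and the absolute thresholds `min_utility` and `min_probability` are illustrative assumptions, and the brute-force enumeration is a stand-in, not PHUI-UP or PHUI-List.

```python
from itertools import combinations

# Hypothetical toy database: (existence probability, {item: purchased quantity}).
TRANSACTIONS = [
    (0.9, {"a": 2, "b": 1}),
    (0.6, {"a": 1, "c": 3}),
    (0.4, {"a": 1, "b": 2, "c": 1}),
]
UNIT_PROFIT = {"a": 5, "b": 3, "c": 1}  # assumed profit per unit sold


def utility_and_probability(itemset, transactions, unit_profit):
    """Sum utility and existence probability over transactions containing itemset."""
    utility = 0
    probability = 0.0
    for prob, quantities in transactions:
        if all(item in quantities for item in itemset):
            utility += sum(quantities[i] * unit_profit[i] for i in itemset)
            probability += prob
    return utility, probability


def potential_high_utility_itemsets(transactions, unit_profit,
                                    min_utility, min_probability):
    """Brute force: keep every itemset that passes both thresholds."""
    items = sorted(unit_profit)
    found = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u, p = utility_and_probability(itemset, transactions, unit_profit)
            if u >= min_utility and p >= min_probability:
                found[itemset] = (u, p)
    return found


result = potential_high_utility_itemsets(TRANSACTIONS, UNIT_PROFIT, 15, 1.2)
```

With these assumed thresholds only {a} and {a, b} pass both tests; {a, c}, for example, has utility 14 and summed probability 1.0, so it fails each of them.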

Advanced Engineering Informatics | 2015

A fast updated algorithm to maintain the discovered high-utility itemsets for transaction modification

Jerry Chun-Wei Lin; Wensheng Gan; Tzung-Pei Hong

High-utility itemset mining (HUIM) is a critical issue that concerns not only the occurrence frequencies of itemsets, as in association-rule mining (ARM), but also the quantity and profit factors found in real-life applications. Many algorithms have been developed to efficiently mine high-utility itemsets (HUIs) from a static database. Discovered HUIs may become invalid, or new HUIs may arise, when transactions are inserted, deleted, or modified. Existing approaches must re-process the updated database and re-mine HUIs each time, because previously discovered HUIs are not maintained. A pre-large concept was previously proposed to efficiently maintain and update the discovered information in ARM, but it cannot be directly applied to HUIM. In this paper, a maintenance algorithm for transaction modification (PRE-HUI-MOD), based on a new pre-large strategy, is presented to efficiently maintain and update the discovered HUIs. When transactions in the original database are modified, the discovered information is divided into three parts with nine cases. A specific procedure is then performed to maintain and update the discovered information for each case. With the designed PRE-HUI-MOD algorithm, the original database need not be rescanned until the accumulated total utility of the modified transactions reaches the designed safety bound, which greatly reduces the cost of multiple database scans compared with batch-mode approaches.

International C* Conference on Computer Science & Software Engineering | 2015

Mining High-Utility Itemsets with Multiple Minimum Utility Thresholds

Jerry Chun-Wei Lin; Wensheng Gan; Philippe Fournier-Viger; Tzung-Pei Hong

High-utility itemset mining (HUIM) is an emerging topic in data mining. It consists of discovering high-utility itemsets (HUIs), i.e. groups of items (itemsets) that generate a high profit in transactional databases. Several algorithms have been proposed for this task. However, they suffer from an important limitation, which is to rely on a single minimum utility threshold as the sole criterion for identifying HUIs. In this paper, we address this issue by introducing the novel framework of HUIM with multiple minimum utility thresholds (HUIM-MMU). According to this framework, the user may specify different thresholds for each item, to discover HUIs. To perform HUIM-MMU, we first present an algorithm named HUI-MMU, which relies on a new sorted downward closure (SDC) property and least minimum utility threshold (LMU). Furthermore, an improved algorithm, namely HUI-MMUTID, is also proposed based on TID-index strategy, to increase mining performance. Substantial experiments both on real-life and synthetic datasets show that the two proposed algorithms can efficiently and effectively discover the complete set of HUIs in transactional databases while considering multiple minimum utility thresholds.

Engineering Applications of Artificial Intelligence | 2015

RWFIM: Recent weighted-frequent itemsets mining

Jerry Chun-Wei Lin; Wensheng Gan; Philippe Fournier-Viger; Tzung-Pei Hong

In recent years, weighted frequent itemsets mining (WFIM) has become a critical issue in data mining, since it can discover more useful and interesting patterns in real-world applications than traditional frequent itemsets mining. Many algorithms have been developed to find weighted frequent itemsets (WFIs), but without considering time sensitivity. The out-of-date information they discover may, however, be meaningless and useless for decision making. In this paper, a novel framework, namely recent weighted-frequent itemsets mining (RWFIM), is proposed to take both weight and time-sensitive constraints into account. A projection-based algorithm, RWFIM-P, is first proposed for mining recent weighted-frequent itemsets (RWFIs). It uses a projection-and-test mechanism to discover RWFIs recursively. With RWFIM-P, the entire database is projected and divided into several sub-databases according to the currently processed itemset, thus reducing computational costs and memory requirements. A second algorithm, RWFIM-PE, improves on RWFIM-P with the Estimated Weight of 2-itemset Pruning (EW2P) strategy, which mines RWFIs without generating unpromising candidates and thus avoids the corresponding projection computations of RWFIM-P. Experiments on several real-world and synthetic datasets evaluate the two proposed algorithms against traditional WFIM in terms of execution time, number of generated RWFIs, and scalability under varied settings of the two minimum thresholds.

Knowledge-Based Systems | 2016

Efficient mining of high-utility itemsets using multiple minimum utility thresholds

Jerry Chun-Wei Lin; Wensheng Gan; Philippe Fournier-Viger; Tzung-Pei Hong; Justin Zhan

In the field of data mining, the topic of high-utility itemset mining (HUIM) has recently gained a lot of attention from researchers as it takes many factors into account that are useful for decision-making by retail managers. In the past, many algorithms have been presented for HUIM but most of them suffer from the limitation of using a single minimum utility threshold to identify high-utility itemsets (HUIs). For real-life applications, finding itemsets using a single threshold is inadequate and unfair since each item is different. Hence, the diversity or importance of each item should be considered. This paper proposes a solution to this issue by defining the novel task of HUIM with multiple minimum utility thresholds (named as HUIM-MMU). This task lets users specify a different minimum utility threshold for each item to identify more useful and specific HUIs, which would generate more profits when compared to HUIs discovered based on a single minimum utility threshold. The HUI-MMU algorithm is designed to mine HUIs in a level-wise manner. The sorted downward closure (SDC) property and the least minimum utility (LMU) concept are developed to avoid a combinatorial explosion for identifying HUIs and to ensure the completeness and correctness of HUI-MMU for discovering HUIs. Meanwhile, two improved algorithms, namely HUI-MMUTID and HUI-MMUTE, are presented based on the TID-index and EUCP strategies. Those strategies can be used to speed up the mining performance to discover HUIs. Substantial experiments on both real-life and synthetic datasets show that the designed algorithms can efficiently and effectively discover the complete set of HUIs in databases by considering multiple minimum utility thresholds.
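
The effect of per-item thresholds can be sketched in a few lines of Python. This is a minimal illustration, not the HUI-MMU algorithm: it assumes that an itemset inherits the smallest threshold among its items and that the least minimum utility (LMU) is the smallest threshold over all items. The toy data, threshold values, and function names are hypothetical.

```python
# Hypothetical toy database: each transaction maps item -> purchased quantity.
TRANSACTIONS = [
    {"a": 2, "b": 1},
    {"a": 1, "c": 3},
    {"a": 1, "b": 2, "c": 1},
]
UNIT_PROFIT = {"a": 5, "b": 3, "c": 1}        # assumed profit per unit
ITEM_THRESHOLD = {"a": 22, "b": 10, "c": 12}  # assumed per-item minimum utilities


def utility(itemset, transactions, unit_profit):
    """Total utility of itemset over the transactions that contain all its items."""
    return sum(
        sum(t[i] * unit_profit[i] for i in itemset)
        for t in transactions
        if all(i in t for i in itemset)
    )


def itemset_threshold(itemset, item_threshold):
    """Assumption: an itemset uses the smallest threshold among its items."""
    return min(item_threshold[i] for i in itemset)


def least_minimum_utility(item_threshold):
    """Assumption: the LMU is the smallest threshold over all items."""
    return min(item_threshold.values())


def is_high_utility(itemset, transactions, unit_profit, item_threshold):
    """Compare the itemset's utility against its own inherited threshold."""
    return (utility(itemset, transactions, unit_profit)
            >= itemset_threshold(itemset, item_threshold))
```

In this toy setting {a} has utility 20 against a threshold of 22 and is rejected, while its superset {a, b} has utility 24 against a threshold of 10 and is accepted. A subset failing while its superset passes is the kind of situation that rules out naive pruning and motivates a dedicated closure property.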

Applied Intelligence | 2016

Weighted frequent itemset mining over uncertain databases

Jerry Chun-Wei Lin; Wensheng Gan; Philippe Fournier-Viger; Tzung-Pei Hong; Vincent S. Tseng

Frequent itemset mining (FIM) is a fundamental research topic, which consists of discovering useful and meaningful relationships between items in transaction databases. However, FIM suffers from two important limitations. First, it assumes that all items have the same importance. Second, it ignores the fact that data collected in a real-life environment is often inaccurate, imprecise, or incomplete. To address these issues and mine more useful and meaningful knowledge, the problems of weighted and uncertain itemset mining have been respectively proposed, where a user may respectively assign weights to items to specify their relative importance, and specify existential probabilities to represent uncertainty in transactions. However, no work has addressed both of these issues at the same time. In this paper, we address this important research problem by designing a new type of patterns named high expected weighted itemset (HEWI) and the HEWI-Uapriori algorithm to efficiently discover HEWIs. The HEWI-Uapriori finds HEWIs using an Apriori-like two-phase approach. The algorithm introduces a property named high upper-bound expected weighted downward closure (HUBEWDC) to early prune the search space and unpromising itemsets. Substantial experiments on real-life and synthetic datasets are conducted to evaluate the performance of the proposed algorithm in terms of runtime, memory consumption, and number of patterns found. Results show that the proposed algorithm has excellent performance and scalability compared with traditional methods for weighted-itemset mining and uncertain itemset mining.
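
A small Python sketch shows why combining weights with uncertainty calls for an upper-bound property. It assumes, for illustration only, that an itemset's weight is the mean of its item weights, that its expected support is the sum over transactions of the product of its items' existential probabilities, and that the expected weighted support is the product of the two. The data and function names are hypothetical, and the paper's exact definitions may differ.

```python
from math import prod

# Hypothetical uncertain database: each transaction maps item -> existential probability.
TRANSACTIONS = [
    {"a": 0.5, "b": 0.8},
    {"a": 1.0, "c": 0.5},
    {"a": 0.5, "b": 0.5, "c": 1.0},
]
WEIGHT = {"a": 0.2, "b": 0.6, "c": 1.0}  # assumed user-assigned item weights


def itemset_weight(itemset, weight):
    """Assumption: the weight of an itemset is the mean weight of its items."""
    return sum(weight[i] for i in itemset) / len(itemset)


def expected_support(itemset, transactions):
    """Sum, over containing transactions, of the product of item probabilities."""
    return sum(
        prod(t[i] for i in itemset)
        for t in transactions
        if all(i in t for i in itemset)
    )


def expected_weighted_support(itemset, transactions, weight):
    """Assumed measure: itemset weight times expected support."""
    return itemset_weight(itemset, weight) * expected_support(itemset, transactions)
```

Under these assumptions {a} scores 0.2 × 2.0 = 0.4, while its superset {a, c} scores 0.6 × 1.0 = 0.6. Because a superset can score higher than its subset, the measure itself is not anti-monotone, so pruning has to rely on an upper bound instead.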

Advanced Engineering Informatics | 2016

Fast algorithms for mining high-utility itemsets with various discount strategies

Jerry Chun-Wei Lin; Wensheng Gan; Philippe Fournier-Viger; Tzung-Pei Hong; Vincent S. Tseng

In recent years, mining high-utility itemsets (HUIs) has become a key topic in data mining. However, most of the developed algorithms make the unrealistic assumption that the unit profits of items remain unchanged over time. In real-life situations, the profit of an item or itemset varies as a function of cost prices, sales prices, and sales strategies. In this paper, a novel framework for mining HUIs under various discount strategies (HUID) is introduced, together with two algorithms. HUID-tp is based on various discount strategies and a novel downward closure property to mine the complete set of HUIs. HUID-Miner relies on a compact data structure (the Positive-and-Negative Utility-list, PNU-list) and new pruning strategies to efficiently discover HUIs without candidate generation, while considerably reducing the size of the search space. Furthermore, the Estimated Utility Co-occurrence Strategy, which stores the relationships between 2-itemsets, is adopted in an improved algorithm, HUID-EMiner, to speed up computation. An extensive experimental study carried out on several real-life datasets shows the performance of the proposed algorithms.

Knowledge-Based Systems | 2016

FHN: An efficient algorithm for mining high-utility itemsets with negative unit profits

Jerry Chun-Wei Lin; Philippe Fournier-Viger; Wensheng Gan

High utility itemset mining is an emerging data mining task, which consists of discovering highly profitable itemsets (called high utility itemsets) in very large transactional databases. Many algorithms have been proposed to efficiently discover high utility itemsets, but most of them assume that items may only have positive unit profits. However, in real-world transactional databases, items (products) often have positive or negative unit profits. Mining high utility itemsets in a transactional database where items have positive or negative unit profits is a computationally expensive task, and it is thus desirable to design more efficient algorithms. To address this issue, we propose an efficient algorithm named FHN (Faster High-Utility itemset miner with Negative unit profits). It relies on a novel PNU-list (Positive-and-Negative Utility-list) structure to efficiently mine high utility itemsets, while considering both positive and negative unit profits. Moreover, several pruning strategies are introduced in FHN to reduce the number of candidate itemsets, and thus enhance the performance of FHN. Extensive experimental results on both real-life and synthetic datasets show that the proposed FHN algorithm is in general two to three orders of magnitude faster and can use up to 200 times less memory than the state-of-the-art algorithm HUINIV-Mine. Moreover, it is shown that FHN performs especially well on dense datasets.
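
The difficulty with negative unit profits can be illustrated with a short Python sketch. It compares an itemset's utility with two transaction-weighted upper bounds: a naive one that sums every item in each containing transaction, and one that sums only positive-profit items. Restricting the bound to positive items is a common remedy in this line of work; that FHN uses exactly this definition is an assumption here, and the data and function names are hypothetical.

```python
# Hypothetical toy database: each transaction maps item -> purchased quantity.
TRANSACTIONS = [
    {"a": 2, "b": 1},
    {"a": 1, "c": 3},
    {"a": 1, "b": 2, "c": 1},
]
UNIT_PROFIT = {"a": 5, "b": -3, "c": 1}  # item "b" is assumed to be sold at a loss


def utility(itemset, transactions, unit_profit):
    """Total utility of itemset over the transactions that contain all its items."""
    return sum(
        sum(t[i] * unit_profit[i] for i in itemset)
        for t in transactions
        if all(i in t for i in itemset)
    )


def transaction_utility(quantities, unit_profit, positive_only):
    """Utility of a whole transaction, optionally ignoring negative-profit items."""
    return sum(
        q * unit_profit[i]
        for i, q in quantities.items()
        if not positive_only or unit_profit[i] > 0
    )


def transaction_weighted_utility(itemset, transactions, unit_profit, positive_only):
    """Sum of transaction utilities over the transactions containing itemset."""
    return sum(
        transaction_utility(t, unit_profit, positive_only)
        for t in transactions
        if all(i in t for i in itemset)
    )
```

For {a}, the true utility is 20. The naive bound evaluates to 15 and would wrongly allow {a} to be pruned at a threshold of, say, 18, whereas the positive-only bound evaluates to 24 and remains a valid overestimate.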

The Scientific World Journal | 2015

An incremental high-utility mining algorithm with transaction insertion

Jerry Chun-Wei Lin; Wensheng Gan; Tzung-Pei Hong; Binbin Zhang

Association-rule mining is commonly used to discover useful and meaningful patterns in very large databases. It considers only the occurrence frequencies of items to reveal the relationships among itemsets. Traditional association-rule mining is, however, not suitable for real-world applications, since the items a customer purchases may differ in factors such as profit or quantity. High-utility mining was designed to overcome the limitations of association-rule mining by considering both the quantity and profit measures. Most high-utility mining algorithms are designed to handle a static database. Few studies address dynamic high-utility mining with transaction insertion, and those that do require database rescans and suffer from the combinatorial explosion of the pattern-growth mechanism. In this paper, an efficient incremental algorithm for transaction insertion is designed, based on utility-list structures, to reduce computations without candidate generation. The enumeration tree and the relationships between 2-itemsets are also adopted in the proposed algorithm to speed up the computations. Several experiments are conducted to show the performance of the proposed algorithm in terms of runtime, memory consumption, and number of generated patterns.

Knowledge and Information Systems | 2017

FDHUP: Fast algorithm for mining discriminative high utility patterns

Jerry Chun-Wei Lin; Wensheng Gan; Philippe Fournier-Viger; Tzung-Pei Hong; Han-Chieh Chao

Recently, high utility pattern mining (HUPM) has been extensively studied. Many approaches for HUPM have been proposed in recent years, but most of them aim at mining HUPs without any consideration for their frequency. This has the major drawback that any combination of a low utility item with a very high utility pattern is regarded as a HUP, even if this combination has low affinity and contains items that rarely co-occur. Thus, frequency should be a key criterion to select HUPs. To address this issue, and derive high utility interesting patterns (HUIPs) with strong frequency affinity, the HUIPM algorithm was proposed. However, it recursively constructs a series of conditional trees to produce candidates and then derive the HUIPs. This procedure is time-consuming and may lead to a combinatorial explosion when the minimum utility threshold is set relatively low. In this paper, an efficient algorithm named fast algorithm for mining discriminative high utility patterns (DHUPs) with strong frequency affinity (FDHUP) is proposed to efficiently discover DHUPs by considering both the utility and frequency affinity constraints. Two compact structures named EI-table and FU-tree and three pruning strategies are introduced in the proposed algorithm to reduce the search space, and efficiently and effectively discover DHUPs. An extensive experimental study shows that the proposed FDHUP algorithm considerably outperforms the state-of-the-art HUIPM algorithm in terms of execution time, memory consumption, and scalability.
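
The drawback described above, where a rarely co-purchased item rides on a high-utility one, can be sketched in Python. The sketch assumes one common way of modeling frequency affinity: the affinity of a pattern in a transaction is the smallest quantity among its items, and the affinitive utility multiplies that quantity by the summed unit profits. These definitions, the toy data, and the function names are assumptions for illustration and are not taken from the FDHUP paper.

```python
# Hypothetical toy database: each transaction maps item -> purchased quantity.
TRANSACTIONS = [
    {"a": 10, "b": 1},
    {"a": 8, "b": 1},
    {"a": 9},
]
UNIT_PROFIT = {"a": 5, "b": 1}  # assumed profit per unit


def utility(itemset, transactions, unit_profit):
    """Plain utility: every purchased unit of every item in the pattern counts."""
    return sum(
        sum(t[i] * unit_profit[i] for i in itemset)
        for t in transactions
        if all(i in t for i in itemset)
    )


def affinitive_utility(itemset, transactions, unit_profit):
    """Assumed affinity model: only units bought together, min quantity, count."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            affinity = min(t[i] for i in itemset)  # co-occurring quantity
            total += affinity * sum(unit_profit[i] for i in itemset)
    return total
```

Here {a, b} has a plain utility of 92, almost all of it contributed by item a, but an affinitive utility of only 12, which reflects how weakly the two items are tied together.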
