
Jerry Chun-Wei Lin

Harbin Institute of Technology Shenzhen Graduate School

Publication

Featured research published by Jerry Chun-Wei Lin.

european conference on machine learning | 2016

The SPMF Open-Source Data Mining Library Version 2

Philippe Fournier-Viger; Jerry Chun-Wei Lin; Antonio Gomariz; Ted Gueniche; Azadeh Soltani; Zhihong Deng; Hoang Thanh Lam

SPMF is an open-source data mining library, specialized in pattern mining, offering implementations of more than 120 data mining algorithms. It has been used in more than 310 research papers to solve applied problems in a wide range of domains, from authorship attribution to restaurant recommendation. Its implementations are also commonly used as benchmarks in research papers, and it has been integrated into several data analysis software programs. After three years of development, this paper introduces the second major revision of the library, named SPMF 2, which provides (1) more than 60 new algorithm implementations (including novel algorithms for sequence prediction), (2) an improved user interface with pattern visualization, (3) a novel plug-in system, (4) improved performance, and (5) support for text mining.

Knowledge Based Systems | 2016

Efficient algorithms for mining high-utility itemsets in uncertain databases

Jerry Chun-Wei Lin; Wensheng Gan; Philippe Fournier-Viger; Tzung-Pei Hong; Vincent S. Tseng

High-utility itemset mining (HUIM) is a useful set of techniques for discovering patterns in transaction databases, which considers both the quantity and the profit of items. However, most algorithms for mining high-utility itemsets (HUIs) assume that the information stored in databases is precise, i.e., that there is no uncertainty. But in many real-life applications, an item or itemset is not only present or absent in transactions but is also associated with an existence probability. This is especially the case for data collected experimentally or using noisy sensors. Many algorithms have been proposed to mine frequent itemsets in uncertain databases, but mining HUIs in an uncertain database has not yet been addressed, although uncertainty is commonly seen in real-world applications. In this paper, a novel framework, named potential high-utility itemset mining (PHUIM) in uncertain databases, is proposed to efficiently discover not only the itemsets with high utilities but also the itemsets with high existence probabilities in an uncertain database based on the tuple uncertainty model. The PHUI-UP algorithm (potential high-utility itemsets upper-bound-based mining algorithm) is first presented to mine potential high-utility itemsets (PHUIs) using a level-wise search. Since PHUI-UP adopts a generate-and-test approach to mine PHUIs, it suffers from the problem of repeatedly scanning the database. To address this issue, a second algorithm named PHUI-List (potential high-utility itemsets PU-list-based mining algorithm) is also proposed. The latter directly mines PHUIs without generating candidates, thanks to a novel probability-utility-list (PU-list) structure, thus greatly improving the scalability of PHUI mining. Substantial experiments were conducted on both real-life and synthetic datasets to assess the performance of the two designed algorithms in terms of runtime, number of patterns, memory consumption, and scalability.
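
As a rough illustration of the two measures that the abstract above combines, the following Python sketch scores every itemset in a tiny uncertain database by its utility and by its existence probability. It is a brute-force enumeration, not PHUI-UP or PHUI-List. The toy data, the function names, the choice to sum transaction probabilities as the itemset probability, and the choice to express both thresholds as ratios (of total database utility and of database size) are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy data under a tuple uncertainty model:
# each transaction is (existence probability, {item: purchased quantity}).
TRANSACTIONS = [
    (0.9, {"a": 2, "b": 1}),
    (0.6, {"a": 1, "c": 3}),
    (0.8, {"a": 1, "b": 2, "c": 1}),
    (0.4, {"b": 4}),
]
PROFIT = {"a": 5, "b": 2, "c": 1}  # hypothetical unit profits


def utility_and_probability(itemset, transactions, profit):
    """Sum the utility and the existence probability of `itemset` over
    every transaction that contains all of its items."""
    total_utility = 0
    total_probability = 0.0
    for prob, items in transactions:
        if all(i in items for i in itemset):
            total_utility += sum(items[i] * profit[i] for i in itemset)
            total_probability += prob
    return total_utility, total_probability


def brute_force_phuis(transactions, profit, min_util_ratio, min_prob_ratio):
    """Enumerate every itemset and keep those meeting both thresholds.

    Assumption: thresholds are ratios of the total database utility and
    of the number of transactions, respectively.
    """
    db_utility = sum(
        q * profit[i] for _, items in transactions for i, q in items.items()
    )
    min_util = min_util_ratio * db_utility
    min_prob = min_prob_ratio * len(transactions)
    all_items = sorted(profit)
    result = {}
    for size in range(1, len(all_items) + 1):
        for itemset in combinations(all_items, size):
            u, p = utility_and_probability(itemset, transactions, profit)
            if u >= min_util and p >= min_prob:
                result[itemset] = (u, p)
    return result


if __name__ == "__main__":
    for itemset, (u, p) in brute_force_phuis(TRANSACTIONS, PROFIT, 0.3, 0.3).items():
        print(itemset, u, round(p, 2))
```

With the toy data and both ratios set to 0.3, the itemsets {a}, {b}, {a, b} and {a, c} pass both thresholds. Enumerating all itemsets is exponential in the number of items, which is the cost that pruning strategies such as those in the paper are meant to avoid.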

mexican international conference on artificial intelligence | 2015

EFIM: A Highly Efficient Algorithm for High-Utility Itemset Mining

Souleymane Zida; Philippe Fournier-Viger; Jerry Chun-Wei Lin; Cheng-Wei Wu; Vincent S. Tseng

High-utility itemset mining (HUIM) is an important data mining task with wide applications. In this paper, we propose a novel algorithm named EFIM (EFficient high-utility Itemset Mining), which introduces several new ideas to more efficiently discover high-utility itemsets both in terms of execution time and memory. EFIM relies on two upper bounds named sub-tree utility and local utility to more effectively prune the search space. It also introduces a novel array-based utility counting technique named Fast Utility Counting to calculate these upper bounds in linear time and space. Moreover, to reduce the cost of database scans, EFIM proposes efficient database projection and transaction merging techniques. An extensive experimental study on various datasets shows that EFIM is in general two to three orders of magnitude faster and consumes up to eight times less memory than the state-of-the-art algorithms d²HUP, HUI-Miner, HUP-Miner, FHM and UP-Growth+.
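
The two ideas mentioned above, array-based counting and transaction merging, can be sketched in simplified form. The Python below is not EFIM: instead of the paper's sub-tree utility and local utility bounds, it accumulates the well-known transaction-weighted utilization (TWU) upper bound in an array with one bin per item, and it merges transactions with identical item sets using a dictionary. The toy database and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy database: each transaction is a list of (item_id, utility),
# where utility (quantity * unit profit) is assumed to be precomputed.
DB = [
    [(0, 10), (1, 2)],
    [(0, 5), (2, 3)],
    [(0, 5), (1, 4)],
]


def twu_by_array(transactions, num_items):
    """Single pass over the database: add each transaction's total utility
    to the bin of every item it contains (one array slot per item)."""
    bins = [0] * num_items
    for transaction in transactions:
        transaction_utility = sum(u for _, u in transaction)
        for item, _ in transaction:
            bins[item] += transaction_utility
    return bins


def merge_identical(transactions):
    """Merge transactions that contain exactly the same items by summing
    their per-item utilities, so later scans touch fewer rows."""
    merged = defaultdict(lambda: defaultdict(int))
    for transaction in transactions:
        key = tuple(sorted(item for item, _ in transaction))
        for item, u in transaction:
            merged[key][item] += u
    return [sorted(per_item.items()) for per_item in merged.values()]


if __name__ == "__main__":
    print(twu_by_array(DB, 3))
    print(merge_identical(DB))
```

Merging leaves the per-item bins unchanged, since the summed utilities of the merged rows equal those of the original rows; the benefit is only in the number of rows scanned.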

Knowledge and Information Systems | 2017

EFIM: a fast and memory efficient algorithm for high-utility itemset mining

Souleymane Zida; Philippe Fournier-Viger; Jerry Chun-Wei Lin; Cheng-Wei Wu; Vincent S. Tseng

In recent years, high-utility itemset mining has emerged as an important data mining task. However, it remains computationally expensive both in terms of runtime and memory consumption. It is thus an important challenge to design more efficient algorithms for this task. In this paper, we address this issue by proposing a novel algorithm named EFIM (EFficient high-utility Itemset Mining), which introduces several new ideas to more efficiently discover high-utility itemsets. EFIM relies on two new upper bounds named revised sub-tree utility and local utility to more effectively prune the search space. It also introduces a novel array-based utility counting technique named Fast Utility Counting to calculate these upper bounds in linear time and space. Moreover, to reduce the cost of database scans, EFIM proposes efficient database projection and transaction merging techniques named High-utility Database Projection and High-utility Transaction Merging (HTM), also performed in linear time. An extensive experimental study on various datasets shows that EFIM is in general two to three orders of magnitude faster than the state-of-the-art algorithms d²HUP, HUI-Miner, HUP-Miner, FHM and UP-Growth+ on dense datasets and performs quite well on sparse datasets. Moreover, a key advantage of EFIM is its low memory consumption.

machine learning and data mining in pattern recognition | 2015

Efficient Mining of High-Utility Sequential Rules

Souleymane Zida; Philippe Fournier-Viger; Cheng-Wei Wu; Jerry Chun-Wei Lin; Vincent S. Tseng

High-utility pattern mining is an important data mining task having wide applications. It consists of discovering patterns generating a high profit in databases. Recently, the task of high-utility sequential pattern mining has emerged to discover patterns generating a high profit in sequences of customer transactions. However, a well-known limitation of sequential patterns is that they do not provide a measure of the confidence or probability that they will be followed. This greatly hampers their usefulness for several real applications such as product recommendation. In this paper, we address this issue by extending the problem of sequential rule mining for utility mining. We propose a novel algorithm named HUSRM (High-Utility Sequential Rule Miner), which includes several optimizations to mine high-utility sequential rules efficiently. An extensive experimental study with four datasets shows that HUSRM is highly efficient and that its optimizations improve its execution time by up to 25 times and its memory usage by up to 50%.

Advanced Engineering Informatics | 2015

A fast updated algorithm to maintain the discovered high-utility itemsets for transaction modification

Jerry Chun-Wei Lin; Wensheng Gan; Tzung-Pei Hong

High-utility itemsets mining (HUIM) is a critical issue which concerns not only the occurrence frequencies of itemsets in association-rule mining (ARM), but also the factors of quantity and profit in real-life applications. Many algorithms have been developed to efficiently mine high-utility itemsets (HUIs) from a static database. Discovered HUIs may become invalid or new HUIs may arise when transactions are inserted, deleted or modified. Existing approaches are required to re-process the updated database and re-mine HUIs each time, as previously discovered HUIs are not maintained. Previously, a pre-large concept was proposed to efficiently maintain and update the discovered information in ARM, which cannot be directly applied to HUIM. In this paper, a maintenance (PRE-HUI-MOD) algorithm with transaction modification based on a new pre-large strategy is presented to efficiently maintain and update the discovered HUIs. When the transactions are consequentially modified from the original database, the discovered information is divided into three parts with nine cases. A specific procedure is then performed to maintain and update the discovered information for each case. Based on the designed PRE-HUI-MOD algorithm, it is unnecessary to rescan the original database until the accumulative total utility of the modified transactions achieves the designed safety bound, which can greatly reduce the computations of multiple database scans when compared to the batch-mode approaches.

international c* conference on computer science & software engineering | 2015

Mining High-Utility Itemsets with Multiple Minimum Utility Thresholds

Jerry Chun-Wei Lin; Wensheng Gan; Philippe Fournier-Viger; Tzung-Pei Hong

Engineering Applications of Artificial Intelligence | 2015

RWFIM: Recent weighted-frequent itemsets mining

Jerry Chun-Wei Lin; Wensheng Gan; Philippe Fournier-Viger; Tzung-Pei Hong

Knowledge Based Systems | 2016

Efficient mining of high-utility itemsets using multiple minimum utility thresholds

Jerry Chun-Wei Lin; Wensheng Gan; Philippe Fournier-Viger; Tzung-Pei Hong; Justin Zhan

industrial conference on data mining | 2016

PHM: Mining Periodic High-Utility Itemsets

Philippe Fournier-Viger; Jerry Chun-Wei Lin; Quang-Huy Duong; Thu-Lan Dam
