Philippe Fournier-Viger

Publication

Featured research published by Philippe Fournier-Viger.

European Conference on Machine Learning | 2016

The SPMF Open-Source Data Mining Library Version 2

Philippe Fournier-Viger; Jerry Chun-Wei Lin; Antonio Gomariz; Ted Gueniche; Azadeh Soltani; Zhihong Deng; Hoang Thanh Lam

SPMF is an open-source data mining library, specialized in pattern mining, offering implementations of more than 120 data mining algorithms. It has been used in more than 310 research papers to solve applied problems in a wide range of domains, from authorship attribution to restaurant recommendation. Its implementations are also commonly used as benchmarks in research papers, and it has been integrated into several data analysis software programs. After three years of development, this paper introduces the second major revision of the library, named SPMF 2, which provides (1) more than 60 new algorithm implementations (including novel algorithms for sequence prediction), (2) an improved user interface with pattern visualization, (3) a novel plug-in system, (4) improved performance, and (5) support for text mining.

Knowledge Based Systems | 2016

Efficient algorithms for mining high-utility itemsets in uncertain databases

Jerry Chun-Wei Lin; Wensheng Gan; Philippe Fournier-Viger; Tzung-Pei Hong; Vincent S. Tseng

High-utility itemset mining (HUIM) is a useful set of techniques for discovering patterns in transaction databases, which considers both the quantity and the profit of items. However, most algorithms for mining high-utility itemsets (HUIs) assume that the information stored in databases is precise, i.e., that there is no uncertainty. In many real-life applications, however, an item or itemset is not simply present or absent in a transaction but is also associated with an existence probability. This is especially the case for data collected experimentally or using noisy sensors. Many algorithms have been proposed to mine frequent itemsets in uncertain databases, but mining HUIs in an uncertain database has not yet been addressed, although uncertainty is common in real-world applications. In this paper, a novel framework, named potential high-utility itemset mining (PHUIM) in uncertain databases, is proposed to efficiently discover itemsets that have not only high utilities but also high existence probabilities in an uncertain database, based on the tuple uncertainty model. The PHUI-UP algorithm (potential high-utility itemsets upper-bound-based mining algorithm) is first presented to mine potential high-utility itemsets (PHUIs) using a level-wise search. Since PHUI-UP adopts a generate-and-test approach to mine PHUIs, it suffers from the problem of repeatedly scanning the database. To address this issue, a second algorithm named PHUI-List (potential high-utility itemsets PU-list-based mining algorithm) is also proposed. The latter mines PHUIs directly, without generating candidates, thanks to a novel probability-utility-list (PU-list) structure, thus greatly improving the scalability of PHUI mining. Substantial experiments were conducted on both real-life and synthetic datasets to assess the performance of the two designed algorithms in terms of runtime, number of patterns, memory consumption, and scalability.
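To make the idea of a potential high-utility itemset concrete, here is a minimal brute-force sketch in Python. It is not PHUI-UP or PHUI-List. It assumes that the tuple uncertainty model attaches one existence probability to each transaction, that an itemset's utility is the sum of quantity times unit profit over the transactions containing it, and that its probability is the sum of the probabilities of those transactions. The toy data, the function names, and the absolute thresholds `min_utility` and `min_probability` are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy database: (item -> purchased quantity, existence probability).
# Assumption: one probability per transaction (tuple uncertainty).
TRANSACTIONS = [
    ({"a": 2, "b": 1}, 0.9),
    ({"a": 1, "c": 3}, 0.6),
    ({"a": 1, "b": 2, "c": 1}, 0.8),
    ({"b": 1, "c": 1}, 0.4),
]
# Hypothetical unit profit of each item.
UNIT_PROFIT = {"a": 5, "b": 2, "c": 1}


def utility_and_probability(itemset, transactions, unit_profit):
    """Sum utility and existence probability over transactions containing itemset."""
    total_utility = 0
    total_probability = 0.0
    for quantities, probability in transactions:
        if all(item in quantities for item in itemset):
            total_utility += sum(quantities[i] * unit_profit[i] for i in itemset)
            total_probability += probability
    return total_utility, total_probability


def potential_high_utility_itemsets(transactions, unit_profit,
                                    min_utility, min_probability):
    """Enumerate every itemset and keep those passing both thresholds.

    Exhaustive enumeration is exponential in the number of items; it only
    illustrates the definition, not an efficient mining strategy.
    """
    items = sorted({item for quantities, _ in transactions for item in quantities})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility, probability = utility_and_probability(
                itemset, transactions, unit_profit)
            if utility >= min_utility and probability >= min_probability:
                result[itemset] = (utility, probability)
    return result
```

With the toy data above, `potential_high_utility_itemsets(TRANSACTIONS, UNIT_PROFIT, 14, 1.5)` keeps only `("a",)` and `("a", "b")`. The itemset `("a", "c")` reaches a utility of 14 but is rejected because its summed probability is only about 1.4.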

Knowledge and Information Systems | 2017

EFIM: a fast and memory efficient algorithm for high-utility itemset mining

Souleymane Zida; Philippe Fournier-Viger; Jerry Chun-Wei Lin; Cheng-Wei Wu; Vincent S. Tseng

In recent years, high-utility itemset mining has emerged as an important data mining task. However, it remains computationally expensive both in terms of runtime and memory consumption. It is thus an important challenge to design more efficient algorithms for this task. In this paper, we address this issue by proposing a novel algorithm named EFIM (EFficient high-utility Itemset Mining), which introduces several new ideas to more efficiently discover high-utility itemsets. EFIM relies on two new upper bounds named revised sub-tree utility and local utility to more effectively prune the search space. It also introduces a novel array-based utility counting technique named Fast Utility Counting to calculate these upper bounds in linear time and space. Moreover, to reduce the cost of database scans, EFIM proposes efficient database projection and transaction merging techniques named High-utility Database Projection and High-utility Transaction Merging (HTM), also performed in linear time. An extensive experimental study on various datasets shows that EFIM is in general two to three orders of magnitude faster than the state-of-the-art algorithms d2HUP, HUI-Miner, HUP-Miner, FHM and UP-Growth+ on dense datasets and performs quite well on sparse datasets. Moreover, a key advantage of EFIM is its low memory consumption.

Knowledge Based Systems | 2016

An efficient algorithm for mining the top-k high utility itemsets, using novel threshold raising and pruning strategies

Quang-Huy Duong; Bo Liao; Philippe Fournier-Viger; Thu-Lan Dam

Top-k high utility itemset mining is the process of discovering the k itemsets having the highest utilities in a transactional database. In recent years, several algorithms have been proposed for this task. However, it remains very expensive both in terms of runtime and memory consumption. The reason is that current algorithms often generate a huge number of candidate itemsets and are unable to prune the search space effectively. In this paper, we address this issue by proposing a novel algorithm named kHMC to discover the top-k high utility itemsets more efficiently. Unlike several algorithms for top-k high utility itemset mining, kHMC discovers high utility itemsets using a single phase. Furthermore, it employs three strategies named RIU, CUD, and COV to raise its internal minimum utility threshold effectively, and thus reduce the search space. The COV strategy introduces a novel concept of coverage. The concept of coverage can be employed to prune the search space in high utility itemset mining, or to raise the threshold in top-k high utility itemset mining, as proposed in this paper. Furthermore, kHMC relies on a novel co-occurrence pruning technique named EUCPT to avoid performing costly join operations for calculating the utilities of itemsets. Moreover, a novel pruning strategy named TEP is proposed for reducing the search space. To evaluate the performance of the proposed algorithm, extensive experiments have been conducted on six datasets having various characteristics. Results show that the proposed algorithm outperforms the state-of-the-art TKO and REPT algorithms for top-k high utility itemset mining both in terms of memory consumption and runtime.

Knowledge Based Systems | 2016

Efficient mining of high-utility itemsets using multiple minimum utility thresholds

Jerry Chun-Wei Lin; Wensheng Gan; Philippe Fournier-Viger; Tzung-Pei Hong; Justin Zhan

In the field of data mining, the topic of high-utility itemset mining (HUIM) has recently gained a lot of attention from researchers, as it takes into account many factors that are useful for decision-making by retail managers. In the past, many algorithms have been presented for HUIM, but most of them suffer from the limitation of using a single minimum utility threshold to identify high-utility itemsets (HUIs). For real-life applications, finding itemsets using a single threshold is inadequate and unfair since each item is different. Hence, the diversity or importance of each item should be considered. This paper proposes a solution to this issue by defining the novel task of HUIM with multiple minimum utility thresholds (named HUIM-MMU). This task lets users specify a different minimum utility threshold for each item to identify more useful and specific HUIs, which would generate more profits when compared to HUIs discovered based on a single minimum utility threshold. The HUI-MMU algorithm is designed to mine HUIs in a level-wise manner. The sorted downward closure (SDC) property and the least minimum utility (LMU) concept are developed to avoid a combinatorial explosion when identifying HUIs and to ensure the completeness and correctness of HUI-MMU for discovering HUIs. Meanwhile, two improved algorithms, namely HUI-MMUTID and HUI-MMUTE, are presented based on the TID-index and EUCP strategies. Those strategies can be used to speed up the discovery of HUIs. Substantial experiments on both real-life and synthetic datasets show that the designed algorithms can efficiently and effectively discover the complete set of HUIs in databases by considering multiple minimum utility thresholds.

Industrial Conference on Data Mining | 2016

PHM: Mining Periodic High-Utility Itemsets

Philippe Fournier-Viger; Jerry Chun-Wei Lin; Quang-Huy Duong; Thu-Lan Dam

High-utility itemset mining is the task of discovering high-utility itemsets, i.e. sets of items that yield a high profit in a customer transaction database. High-utility itemsets are useful, as they provide information about profitable sets of items bought by customers to retail store managers, who can then use this information to make strategic marketing decisions. An inherent limitation of traditional high-utility itemset mining algorithms is that they are unsuited to discovering recurring customer purchase behavior, although such behavior is common in real-life situations (for example, a customer may buy some products every day, week or month). In this paper, we address this limitation by proposing the task of periodic high-utility itemset mining. The goal is to discover groups of items that are periodically bought by customers and generate a high profit. An algorithm named PHM (Periodic High-utility itemset Miner) is proposed to efficiently enumerate all periodic high-utility itemsets. Experimental results show that the PHM algorithm is efficient, and can filter a huge number of non-periodic patterns to reveal only the desired periodic high-utility itemsets.

Advanced Engineering Informatics | 2016

Fast algorithms for mining high-utility itemsets with various discount strategies

Jerry Chun-Wei Lin; Wensheng Gan; Philippe Fournier-Viger; Tzung-Pei Hong; Vincent S. Tseng

Knowledge Based Systems | 2016

FHN: An efficient algorithm for mining high-utility itemsets with negative unit profits

Jerry Chun-Wei Lin; Philippe Fournier-Viger; Wensheng Gan

Engineering Applications of Artificial Intelligence | 2016

Mining high-utility itemsets based on particle swarm optimization

Jerry Chun-Wei Lin; Lu Yang; Philippe Fournier-Viger; Jimmy Ming-Thai Wu; Tzung-Pei Hong; Leon Shyue-Liang Wang; Justin Zhan

Applied Intelligence | 2016

An efficient algorithm for mining top-rank-k frequent patterns

Thu-Lan Dam; Kenli Li; Philippe Fournier-Viger; Quang-Huy Duong
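The periodic high-utility itemset task described in the PHM abstract above can be illustrated with a small brute-force sketch in Python. It is not the PHM algorithm. The abstract does not state how periodicity is measured, so the sketch assumes a period is the gap between consecutive positions of transactions containing the itemset, with the start and end of the database counted as boundaries, and it keeps an itemset when its largest period is at most `max_period` and its utility is at least `min_utility`. Both thresholds, the function names, and the toy data are hypothetical.

```python
from itertools import combinations

# Hypothetical toy database, ordered in time: item -> purchased quantity.
TRANSACTIONS = [
    {"bread": 1, "milk": 2},
    {"bread": 1, "wine": 1},
    {"bread": 2, "milk": 1},
    {"milk": 1},
    {"bread": 1, "milk": 1, "wine": 1},
    {"bread": 1},
]
# Hypothetical unit profit of each item.
UNIT_PROFIT = {"bread": 1, "milk": 2, "wine": 10}


def periods(positions, database_size):
    """Gaps between consecutive occurrences, using 0 and database_size as boundaries."""
    boundaries = [0] + positions + [database_size]
    return [later - earlier for earlier, later in zip(boundaries, boundaries[1:])]


def periodic_high_utility_itemsets(transactions, unit_profit,
                                   min_utility, max_period):
    """Exhaustively test every itemset; exponential, for illustration only."""
    items = sorted({item for transaction in transactions for item in transaction})
    size_of_db = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            positions = []
            utility = 0
            # Positions are 1-based so the first gap is measured from 0.
            for position, transaction in enumerate(transactions, start=1):
                if all(item in transaction for item in itemset):
                    positions.append(position)
                    utility += sum(transaction[i] * unit_profit[i] for i in itemset)
            if not positions:
                continue
            largest_period = max(periods(positions, size_of_db))
            if utility >= min_utility and largest_period <= max_period:
                result[itemset] = (utility, largest_period)
    return result
```

With the toy data, `periodic_high_utility_itemsets(TRANSACTIONS, UNIT_PROFIT, 10, 2)` keeps `("milk",)` and `("bread", "milk")`. Itemsets containing wine have the highest utilities but are filtered out because wine is absent for three consecutive positions, which is the kind of non-periodic pattern the abstract says is removed.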
