Philippe Lenca
Institut Mines-Télécom
Publication
Featured research published by Philippe Lenca.
European Journal of Operational Research | 2008
Philippe Lenca; Patrick Meyer; Benoît Vaillant; Stéphane Lallich
Data mining algorithms, especially those used for unsupervised learning, generate a large quantity of rules. This applies in particular to the Apriori family of algorithms for mining association rules. It is hence impossible for an expert in the field being mined to assess all of these rules. To help with this task, many measures that evaluate the interestingness of rules have been developed. They make it possible to filter and sort a set of rules automatically with respect to given goals. Since these measures may produce different results, and as experts have different understandings of what a good rule is, we propose in this article a new direction for selecting the best rules: a two-step solution to the problem of recommending one or more user-adapted interestingness measures. First, a description of interestingness measures, based on meaningful classical properties, is given. Second, a multicriteria decision aid process is applied to this analysis and illustrates the benefit that a user who is not a data mining expert can gain from such methods.
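As a rough illustration of what an interestingness measure computes, the sketch below evaluates three classical measures (support, confidence, lift) for a rule A -> B from transaction counts. The function name `rule_measures` and the toy counts are assumptions made for illustration; they are not taken from the article, which studies a larger set of measures.

```python
def rule_measures(n, n_a, n_b, n_ab):
    """Return support, confidence and lift of the rule A -> B.

    n    : total number of transactions
    n_a  : transactions containing A
    n_b  : transactions containing B
    n_ab : transactions containing both A and B
    """
    if not (0 < n_a <= n and 0 < n_b <= n and 0 <= n_ab <= min(n_a, n_b)):
        raise ValueError("inconsistent counts")
    support = n_ab / n                 # P(A and B)
    confidence = n_ab / n_a            # P(B | A)
    lift = confidence / (n_b / n)      # P(B | A) / P(B)
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical counts: 1000 transactions, 200 with A, 500 with B, 150 with both.
m = rule_measures(n=1000, n_a=200, n_b=500, n_ab=150)
```

Two measures can rank the same rules differently (a rule may have high confidence but a lift close to 1), which is why the choice of measure matters to the user.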
discovery science | 2004
Benoît Vaillant; Philippe Lenca; Stéphane Lallich
It is a common issue that KDD processes may generate a large number of patterns, depending on the algorithm used and its parameters. It is hence impossible for an expert to assess these patterns. This may be the case with the well-known Apriori algorithm. One of the methods used to cope with such an amount of output relies on interestingness measures. Arguing that selecting interesting rules also means using an adapted measure, we present an experimental study of the behaviour of 20 measures on 10 datasets. This study is compared to a previous analysis of formal and meaningful properties of the measures, by means of two clusterings. One of the goals of this study is to enhance our previous approach. The two approaches seem to be complementary and could be useful for the problem of a user's choice of a measure.
Quality Measures in Data Mining | 2007
Philippe Lenca; Benoît Vaillant; Patrick Meyer; Stéphane Lallich
It is a common problem that KDD processes may generate a large number of patterns, depending on the algorithm used and its parameters. It is hence impossible for an expert to assess these patterns. This is the case with the well-known Apriori algorithm. One of the methods used to cope with such an amount of output relies on association rule interestingness measures. Arguing that selecting interesting rules also means using an adapted measure, we present a formal and an experimental study of 20 measures. The experimental studies carried out on 10 data sets lead to an experimental classification of the measures. This study is compared to an analysis of the formal and meaningful properties of the measures. Finally, the properties are used in a multi-criteria decision analysis in order to select, amongst the available measures, the one or those that best take into account the user’s needs. These approaches seem to be complementary and could be useful in solving the problem of a user’s choice of measure.
knowledge discovery and data mining | 2008
Philippe Lenca; Stéphane Lallich; Thanh-Nghi Do; Nguyen-Khang Pham
In data mining, large differences in prior class probabilities, known as the class imbalance problem, have been reported to hinder the performance of classifiers such as decision trees. Dealing with imbalanced and cost-sensitive data has been recognized as one of the 10 most challenging problems in data mining research. In decision tree learning, many measures are based on the concept of Shannon's entropy. A major characteristic of these entropies is that they take their maximal value when the distribution of the modalities of the class variable is uniform. To deal with the class imbalance problem, we proposed an off-centered entropy which takes its maximum value for a distribution fixed by the user. This distribution can be the a priori distribution of the class variable modalities or a distribution taking into account the costs of misclassification. Other authors have proposed an asymmetric entropy. In this paper we present the concepts of the three entropies and compare their effectiveness on 20 imbalanced data sets. All our experiments are based on the C4.5 decision tree algorithm, in which only the entropy function is modified. The results are promising and show the value of off-centered entropies for dealing with the problem of class imbalance.
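The idea of shifting the maximum of the entropy can be sketched for the two-class case as follows. The construction used here (rescale the class probability piecewise-linearly so that a user-chosen value `theta` maps to 0.5, then apply Shannon's entropy) is an assumption made for illustration and may differ from the exact definition used in the paper; the function names are hypothetical.

```python
import math


def shannon_entropy(p):
    """Binary Shannon entropy in bits; maximal (1.0) at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def off_centered_entropy(p, theta):
    """Binary entropy whose maximum is shifted from 0.5 to theta.

    Assumption: p is rescaled piecewise-linearly so that 0 -> 0,
    theta -> 0.5 and 1 -> 1, then the usual Shannon entropy is applied.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    if p <= theta:
        pi = p / (2 * theta)
    else:
        pi = (p + 1 - 2 * theta) / (2 * (1 - theta))
    return shannon_entropy(pi)


# With a minority class expected in 10% of cases, a node with a 10%/90%
# split is treated as maximally uncertain, and a 50%/50% split is not.
at_prior = off_centered_entropy(0.1, theta=0.1)
at_half = off_centered_entropy(0.5, theta=0.1)
```

With `theta = 0.5` the function reduces to the usual Shannon entropy, so a standard split criterion is the special case of a balanced reference distribution.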
advances in information technology | 2009
Komate Amphawan; Philippe Lenca; Athasit Surarerks
Temporal periodicity of patterns can be regarded as an important criterion for measuring the interestingness of frequent patterns in several applications. A frequent pattern is said to be periodic-frequent if it appears at a regular interval. In this paper, we introduce the problem of mining the top-k periodic-frequent patterns, i.e. the periodic patterns with the k highest supports. An efficient single-pass algorithm using a best-first search strategy without a support threshold, called MTKPP (Mining Top-K Periodic-frequent Patterns), is proposed. Our experiments show that our proposal is efficient.
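The problem statement can be made concrete with a brute-force sketch. It is not MTKPP: it enumerates small patterns exhaustively instead of using a best-first search, and it assumes that the periodicity of a pattern is the largest gap between consecutive occurrences, counting the gaps from the start and to the end of the database. The names `periodicity`, `top_k_periodic` and `max_len` are hypothetical.

```python
from itertools import combinations


def periodicity(tids, n_transactions):
    """Largest gap between consecutive occurrences, counting the gaps
    from the start (tid 0) and to the end of the database."""
    points = [0] + sorted(tids) + [n_transactions]
    return max(b - a for a, b in zip(points, points[1:]))


def top_k_periodic(transactions, k, max_period, max_len=2):
    """Brute-force top-k periodic patterns, ranked by support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    results = []
    for size in range(1, max_len + 1):
        for pattern in combinations(items, size):
            # Transaction ids are 1-based.
            tids = [t for t, tx in enumerate(transactions, start=1)
                    if set(pattern) <= tx]
            if tids and periodicity(tids, n) <= max_period:
                results.append((pattern, len(tids)))
    # Highest support first; ties broken alphabetically.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results[:k]


db = [{"a", "b"}, {"a"}, {"a", "b"}, {"b"}, {"a", "b"}, {"a"}]
top = top_k_periodic(db, k=2, max_period=2)
# top == [(("a",), 5), (("b",), 4)]
```

No support threshold is needed: the user supplies only `k` and the maximum allowed period.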
EGC (best of volume) | 2010
Thanh-Nghi Do; Philippe Lenca; Stéphane Lallich; Nguyen-Khang Pham
The random forests method is one of the most successful ensemble methods. However, random forests do not perform well when dealing with very-high-dimensional data in the presence of dependencies. In this case one can expect that there exist many combinations between the variables, and unfortunately the usual random forests method does not effectively exploit this situation. We here investigate a new approach for supervised classification with a huge number of numerical attributes. We propose a random oblique decision trees method. It consists of randomly choosing a subset of predictive attributes and using an SVM as a split function on these attributes. We compare, on 25 datasets, the effectiveness with classical measures (e.g. precision, recall, F1-measure and accuracy) of random forests of random oblique decision trees with SVMs and random forests of C4.5. Our proposal has significantly better performance on very-high-dimensional datasets, with slightly better results on lower-dimensional datasets.
Data Mining | 2010
Yannick Le Bras; Philippe Lenca; Stéphane Lallich
Many studies have shown the limits of the support/confidence framework used in Apriori-like algorithms to mine association rules. Many efficient implementations exist, based on the anti-monotonicity property of the support. But candidate set generation is still costly, and many rules are uninteresting or redundant. In addition, one can miss interesting rules such as nuggets. We are thus facing a complexity issue and a quality issue.
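The anti-monotonicity property mentioned above says that an itemset can never be more frequent than any of its subsets, so a candidate with an infrequent subset can be discarded without counting it. A minimal level-wise sketch, assuming a small in-memory database (the function names are hypothetical and this is not any particular published implementation):

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(itemset <= tx for tx in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise search: a candidate is counted only if all of its
    (k-1)-subsets were frequent at the previous level."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support]
    frequent = list(level)
    while level:
        prev = set(level)
        # Join frequent k-itemsets that differ by exactly one item.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        level = [c for c in candidates
                 if all(frozenset(s) in prev
                        for s in combinations(c, len(c) - 1))
                 and support(c, transactions) >= min_support]
        frequent.extend(level)
    return frequent


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
found = frequent_itemsets(db, min_support=0.5)
```

Here `{a, b, c}` is pruned without a database scan because its subset `{b, c}` is infrequent. Even with this pruning, the candidate generation step is the costly part the abstract refers to.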
research challenges in information science | 2010
Ion Railean; Cristina Stolojescu; Sorin Moga; Philippe Lenca
In this article we propose an approach for predicting traffic time series based on the association of the Stationary Wavelet Transform (SWT) with Artificial Neural Networks (ANN). We focused on comparing the quality of forecasting obtained using different configurations of the ANN. We tested our different configurations using real traffic data recorded at each base station belonging to a WiMAX network developed by Alcatel. We compared our approach with previous forecasting models using ANNs and showed the performance of our neural network configuration.
knowledge discovery and data mining | 2010
Komate Amphawan; Athasit Surarerks; Philippe Lenca
Temporal periodicity of itemset appearance can be regarded as an important criterion for measuring the interestingness of itemsets in several applications. A frequent itemset is said to be periodic-frequent in a database if it appears at a regular interval given by the user. In this paper, we propose a concept of the approximate periodicity of each itemset. Moreover, a new tree-based data structure, called ITL-tree (Interval Transaction-ids List tree), is proposed. Our tree structure maintains an approximation of the occurrence information in a highly compact manner for periodic-frequent itemset mining. A pattern-growth mining approach is used to generate all periodic-frequent itemsets by a bottom-up traversal of the ITL-tree for user-given periodicity and support thresholds. The performance study shows that our data structure is very efficient for mining periodic-frequent itemsets with approximate periodicity results.
Expert Systems With Applications | 2015
Komate Amphawan; Philippe Lenca
Mining top-k frequent-regular closed patterns with minimal length is proposed. A new compact bit-vector representation is designed. An efficient single-pass algorithm is proposed. Frequent-regular pattern mining has recently attracted much work. Most approaches focus on discovering a complete set of patterns under user-given support and regularity threshold constraints. This leads to several quantitative and qualitative drawbacks. First, it is often difficult to set an appropriate support threshold. Second, algorithms produce a huge number of patterns, many of them redundant. Third, most of the patterns are very small, and it is arduous to extract interesting relationships among items. To reduce the number of patterns, a common solution is to consider the desired number k of outputs and to mine the top-k patterns. In addition, this approach does not require setting a support threshold. To cope with redundancy and to capture interesting relationships among items, we suggest focusing on closed patterns and introducing a minimal length constraint. We thus propose to mine the top-k frequent-regular closed patterns with minimal length. An efficient single-pass algorithm, called TFRC-Mine, and a new compact bit-vector representation which allows uninteresting candidates to be pruned, are designed. Experiments show that the proposed algorithm is efficient at producing longer, non-redundant patterns, and that the new data representation is efficient in both computational time and memory usage.
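To give a feel for why a bit-vector representation helps, the sketch below stores the occurrences of each item as an integer bitmask, so that the occurrences of a longer pattern are obtained with a single bitwise AND. This is a generic illustration, not the compact representation designed for TFRC-Mine; the function names are hypothetical, and regularity is assumed to be the largest gap between consecutive occurrences, including the gaps before the first and after the last occurrence.

```python
def to_bitvector(tids):
    """Encode a set of 1-based transaction ids as an integer bitmask."""
    bits = 0
    for t in tids:
        bits |= 1 << (t - 1)
    return bits


def support(bits):
    """Number of transactions in which the pattern occurs."""
    return bin(bits).count("1")


def regularity(bits, n_transactions):
    """Largest gap between consecutive occurrences, including the gaps
    before the first and after the last occurrence."""
    last, worst = 0, 0
    for t in range(1, n_transactions + 1):
        if (bits >> (t - 1)) & 1:
            worst = max(worst, t - last)
            last = t
    return max(worst, n_transactions - last)


# Hypothetical occurrences of items a and b in a database of 8 transactions.
a = to_bitvector([1, 2, 4, 5, 7])
b = to_bitvector([2, 4, 5, 8])
ab = a & b  # occurrences of the pattern {a, b}: transactions 2, 4 and 5
```

In this representation a pattern is closed when no superset has the same bitmask, so closedness checks reduce to integer comparisons.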