Fabrice Muhlenbach | Researchain

Publication

Featured research published by Fabrice Muhlenbach.

intelligent information systems | 2004

Identifying and Handling Mislabelled Instances

Fabrice Muhlenbach; Stéphane Lallich; Djamel Abdelkader Zighed

Data mining and knowledge discovery aim at producing useful and reliable models from data. Unfortunately, some databases contain noisy data which perturb the generalization of the models. An important source of noise consists of mislabelled training instances. We offer a new approach which improves classification accuracy by using a preliminary filtering procedure. An example is suspect when, in its neighbourhood defined by a geometrical graph, the proportion of examples of the same class is not significantly greater than in the database itself. Such suspect examples in the training data can be removed or relabelled. The filtered training set is then provided as input to learning algorithms. Our experiments on ten benchmarks from the UCI Machine Learning Repository, using 1-NN as the final algorithm, show that removal gives better results than relabelling. Removal maintains the generalization error rate when we introduce from 0 to 20% of noise on the class, especially when classes are well separable. The proposed filtering method is finally compared to the relaxation relabelling scheme.

international symposium on methodologies for intelligent systems | 2002

Improving Classification by Removing or Relabeling Mislabeled Instances

Stéphane Lallich; Fabrice Muhlenbach; Djamel A. Zighed

Databases commonly contain noisy data. An important source of noise consists of mislabeled training instances. We present a new approach that improves classification accuracy in such a case by using a preliminary filtering procedure. An example is suspect when, in its neighborhood defined by a geometrical graph, the proportion of examples of the same class is not significantly greater than in the whole database. Such suspect examples in the training data can be removed or relabeled. The filtered training set is then provided as input to a learning algorithm. Our experiments on ten benchmarks from the UCI Machine Learning Repository, using 1-NN as the final algorithm, show that removing gives better results than relabeling. Removing maintains the generalization error rate when we introduce from 0 to 20% of noise on the class, especially when classes are well separable.
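The filtering idea in this abstract can be sketched in a few lines. This is only a minimal illustration under loud assumptions, not the authors' procedure: a k-nearest-neighbour neighbourhood stands in for the geometrical graph, and a plain comparison with the global class proportion stands in for the significance test. The names `suspect_indices`, `remove_suspects`, and the parameter `k` are hypothetical.

```python
from collections import Counter
from math import dist


def suspect_indices(points, labels, k=3):
    """Flag examples whose neighbourhood does not favour their own class.

    Assumptions (not from the paper): the neighbourhood is the k nearest
    points, and "not significantly greater" is replaced by a plain "<=".
    """
    n = len(points)
    global_prop = {c: m / n for c, m in Counter(labels).items()}
    suspects = []
    for i in range(n):
        # Indices of the k nearest other points.
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: dist(points[i], points[j]),
        )[:k]
        same = sum(labels[j] == labels[i] for j in neighbours) / k
        if same <= global_prop[labels[i]]:
            suspects.append(i)
    return suspects


def remove_suspects(points, labels, k=3):
    """Return the training set with suspect examples removed."""
    bad = set(suspect_indices(points, labels, k))
    keep = [i for i in range(len(points)) if i not in bad]
    return [points[i] for i in keep], [labels[i] for i in keep]
```

Removal is shown rather than relabelling because the abstract reports that removal gave the better results; the filtered set would then be passed to the final learner (1-NN in the experiments).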

international conference on data mining | 2002

Multivariate supervised discretization, a neighborhood graph approach

Fabrice Muhlenbach; Ricco Rakotomalala

We present a new discretization method in the context of supervised learning. This method, called HyperCluster Finder, is characterized by its supervised and polythetic behavior. The method is based on the notion of clusters and proceeds in two steps. First, a neighborhood graph is constructed from the learning database to discover homogeneous clusters. Second, the minimal and maximal values of each cluster are projected onto each dimension in order to define boundaries that cut each continuous attribute into a set of intervals. The discretization abilities of this method are illustrated by some examples, in particular the XOR problem.
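The second step described above, projecting each cluster's extent onto every dimension, can be illustrated as follows. This is a sketch under assumptions: the clusters are taken as already found (the neighborhood-graph step is not reproduced), every cluster minimum and maximum is kept as a candidate boundary, and the function name `cut_points` is hypothetical. How the published method merges or selects boundaries is not stated in the abstract.

```python
def cut_points(clusters):
    """Collect per-dimension boundaries from the extent of each cluster.

    clusters: list of clusters, each a list of equal-length numeric tuples.
    Returns one sorted list of candidate boundaries per dimension.
    Assumption: every cluster min and max is kept as a boundary.
    """
    dims = len(clusters[0][0])
    cuts = []
    for d in range(dims):
        bounds = set()
        for cluster in clusters:
            values = [p[d] for p in cluster]
            bounds.add(min(values))
            bounds.add(max(values))
        cuts.append(sorted(bounds))
    return cuts


# XOR-like layout: four homogeneous clusters, one per corner of the unit square.
xor_clusters = [
    [(0.1, 0.1), (0.2, 0.3)],  # class 0, lower left
    [(0.8, 0.9), (0.9, 0.7)],  # class 0, upper right
    [(0.1, 0.8), (0.3, 0.9)],  # class 1, upper left
    [(0.7, 0.2), (0.9, 0.1)],  # class 1, lower right
]
boundaries = cut_points(xor_clusters)
```

In this layout no single attribute separates the classes on its own, which is why a multivariate view of the clusters is needed before boundaries are placed on each axis.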

Pattern Recognition | 2003

A test to control a region growing process within a hierarchical graph

Stéphane Lallich; Fabrice Muhlenbach; Jean-Michel Jolion

Hierarchical graphs using a decimation process can be applied to an image to obtain a segmentation. However, this method encounters limitations. First, the decimation proceeds at each step in a general way on the entire image. Second, a criterion is needed to stop the iterative process when the best segmentation is obtained. We propose a statistical test to control the region growing process. This test, based on Moran's spatial autocorrelation coefficient, controls the decimation process both globally and locally at the same time. Results on a composite image indicate that the local and global tests are a powerful tool for optimizing the image segmentation process.
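For readers unfamiliar with Moran's coefficient, the sketch below computes the standard Moran's I on an arbitrary adjacency structure with unit weights. It shows only the coefficient, not the statistical test built on it in the paper; the function name `morans_i` and the edge-list representation are assumptions made for illustration.

```python
def morans_i(values, edges):
    """Moran's I for values on a graph with unit-weight undirected edges.

    values: list of numbers, one per node (e.g. pixel or region grey level).
    edges: list of (i, j) index pairs; each pair is counted in both
    directions, as in a symmetric binary weight matrix.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = 2 * len(edges)  # sum of all weights w_ij
    cross = 2 * sum(dev[i] * dev[j] for i, j in edges)
    var = sum(d * d for d in dev)
    return (n / w_total) * cross / var


chain = [(0, 1), (1, 2), (2, 3)]
smooth = morans_i([1, 1, 0, 0], chain)       # neighbours tend to agree
alternating = morans_i([1, 0, 1, 0], chain)  # neighbours always differ
```

Positive values indicate that adjacent nodes carry similar values, negative values that they differ, which is the kind of signal a region growing process can use to decide whether merging is still justified.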

international conference on data mining | 2009

A New Clustering Algorithm Based on Regions of Influence with Self-Detection of the Best Number of Clusters

Fabrice Muhlenbach; Stéphane Lallich

Clustering methods usually require the user to know the best number of clusters, or another parameter, e.g. a threshold, which is not always easy to provide. This paper proposes a new graph-based clustering method called GBC which automatically detects the best number of clusters, without requiring any other parameter. In this method based on regions of influence, a graph is constructed and the edges of the graph having the highest values are cut according to a hierarchical divisive procedure. An index calculated from the average size of the cut edges self-detects the most appropriate number of clusters. The results of GBC for 3 quality indices (Dunn, Silhouette and Davies-Bouldin) are compared with those of K-Means, Ward's hierarchical clustering method and DBSCAN on 8 benchmarks. The experiments show the good performance of GBC in the case of well separated clusters, even if the data are unbalanced, non-convex or contain outliers, whatever the shape of the clusters.
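The divisive step can be sketched as follows. This is not GBC itself: the sketch assumes the region-of-influence graph is a relative neighbourhood graph, and it takes the number of clusters as an explicit argument because the abstract does not give the index that GBC uses to detect it. All function names are hypothetical.

```python
from itertools import combinations
from math import dist


def relative_neighbourhood_graph(points):
    """Edges (length, i, j) such that no third point is closer to both ends."""
    n = len(points)
    edges = []
    for i, j in combinations(range(n), 2):
        d_ij = dist(points[i], points[j])
        blocked = any(
            max(dist(points[i], points[k]), dist(points[j], points[k])) < d_ij
            for k in range(n)
            if k not in (i, j)
        )
        if not blocked:
            edges.append((d_ij, i, j))
    return edges


def component_labels(n, edges):
    """Label connected components with a small union-find."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for _, i, j in edges:
        parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def cut_longest_edges(points, n_clusters):
    """Remove the longest edges until the graph splits into n_clusters parts.

    Assumption: n_clusters is supplied by the caller; GBC detects it itself.
    """
    edges = sorted(relative_neighbourhood_graph(points))  # shortest first
    labels = component_labels(len(points), edges)
    while len(set(labels)) < n_clusters and edges:
        edges.pop()  # drop the current longest edge
        labels = component_labels(len(points), edges)
    return labels
```

Calling `cut_longest_edges(points, 2)` on two well separated groups of points returns one component label per point, with the long bridging edges removed first.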

web intelligence | 2010

Discovering Research Communities by Clustering Bibliographical Data

Fabrice Muhlenbach; Stéphane Lallich

Today’s world is characterized by a multiplicity of interconnections through many types of links between people, which is why mining social networks appears to be an important topic. Extracting information from social networks has become a challenging problem, particularly in the case of the discovery of community structures. Mining bibliographical data can be useful to find communities of researchers. In this paper we propose a formal definition of the similarity and dissimilarity between individuals of a social network and show how a graph-based clustering method can extract research communities from the DBLP database.

Proceedings of the 2014 International Workshop on Web Intelligence and Smart Sensing | 2014

Can Sequence Mining Improve Your Morning Mood? Toward a Precise Non-invasive Smart Clock

Zakaria M. Djedou; Fabrice Muhlenbach; Pierre Maret; Guillaume Lopez

The aim of this paper is to present our preliminary approach and work in progress in the design of sequence mining techniques for a new smart alarm clock. This alarm clock will wake the user at the most physiologically opportune moment in a predefined time frame. We rely on a wearable biosensor collecting various signals (ECG, movement, temperature) and on algorithms that dynamically mine the sequences of heterogeneous data to identify sleep cycles. The system is intended to be less intrusive and more accurate than others. This paper presents the underlying domains, the method and the experiments we are implementing.

Neurocomputing | 2015

Comparison of two topological approaches for dealing with noisy labeling

Fabien Rico; Fabrice Muhlenbach; Djamel A. Zighed; Stéphane Lallich

This paper focuses on the detection of likely mislabeled instances in a learning dataset. In order to detect potentially mislabeled samples, two solutions are considered, both based on the same framework of topological graphs. The first is a statistical approach based on Cut Edges Weighted statistics (CEW) in the neighborhood graph. The second solution is a Relaxation Technique (RT) that optimizes a local criterion in the neighborhood graph. The evaluations by ROC curves show good results, since almost 90% of the mislabeled instances are retrieved at a cost of less than 20% false positives. The removal of samples detected as mislabeled by our approaches generally leads to an improvement in the performance of classical machine learning algorithms.

international conference on mobile and ubiquitous systems: networking and services | 2013

Integration and Evolution of Data Mining Models in Ubiquitous Health Telemonitoring Systems

Vladimer Kobayashi; Pierre Maret; Fabrice Muhlenbach; Pierre-René Lhérisson

Ubiquitous Health Telemonitoring Systems collect low-level data with the aim of improving the health condition of patients. Models from data mining are created to compute indicators regarding their status and activity (habits, abnormalities). Models can also help generate feedback and recommendations for patients as well as for remote formal and informal caregivers. Essential features are that the models can be easily updated whenever new information is available and that data generated from the models are as readily accessible as sensed data. This paper addresses the challenge of conveniently incorporating in a Ubiquitous Health Telemonitoring System the creation, the use, and the updating of data mining models. We conducted first runs and generated results showing the feasibility as well as the effectiveness of the system.

international conference theory and practice digital libraries | 2018

Metadata Enrichment of Multi-disciplinary Digital Library: A Semantic-Based Approach

Hussein T. Al-Natsheh; Lucie Martinet; Fabrice Muhlenbach; Fabien Rico; Djamel A. Zighed

In scientific digital libraries, papers from different research communities can be described by community-dependent keywords even if they share a semantically similar topic. Articles that are not tagged with enough keyword variations are poorly indexed in any information retrieval system, which limits potentially fruitful exchanges between scientific disciplines. In this paper, we introduce a novel experimentally designed pipeline for multi-label semantic-based tagging developed for open-access metadata digital libraries. The approach starts by learning from a standard scientific categorization and a sample of topic-tagged articles to find semantically relevant articles and enrich their metadata accordingly. Our proposed pipeline aims to enable researchers to reach articles from various disciplines that tend to use different terminologies. It allows retrieving semantically relevant articles given a limited known variation of search terms. In addition to achieving an accuracy that is higher than an expanded query based method using a topic synonym set extracted from a semantic network, our experiments also show a higher computational scalability versus other comparable techniques. We created a new benchmark extracted from the open-access metadata of a scientific digital library and published it along with the experiment code to allow further research on the topic.
