Jean-Charles Lamirel
French Institute for Research in Computer Science and Automation
Publications
Featured research published by Jean-Charles Lamirel.
Scientometrics | 2004
Jean-Charles Lamirel; Claire François; Shadi Al Shehabi; Martial Hoffmann
The information analysis process includes a cluster analysis or classification step associated with expert validation of the results. In this paper, we propose new Recall/Precision measures for estimating the quality of cluster analysis. These measures derive both from Galois lattice theory and from the Information Retrieval (IR) domain. Unlike classical inertia measures, they have the main advantage of being independent both of the classification method and of the difference between the intrinsic dimension of the data and that of the clusters. We present two experiments that use the MultiSOM model, an extension of Kohonen's SOM model, as the cluster analysis method. Our first experiment, on patent data, shows how our measures can be used to compare viewpoint-oriented classification methods, such as MultiSOM, with global cluster analysis methods, such as WebSOM. Our second experiment, carried out within the EICSTES EEC project, is an original Webometrics experiment that combines content and link classification starting from a large, non-homogeneous set of web pages. This experiment highlights the fact that break-even points between our different Recall/Precision measures can be used to determine an optimal number of clusters for web data classification. The contents of the clusters obtained at different break-even points are compared to determine the quality of the resulting maps.
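The break-even idea can be illustrated with a short Python sketch. It assumes that Recall and Precision values have already been computed for a series of candidate cluster counts; the function name `break_even_points` and the sample values below are hypothetical, not results from the paper. The sketch simply reports the counts at which the two curves meet or cross.

```python
def break_even_points(ks, recall, precision):
    """Return the cluster counts at which recall and precision meet or cross.

    ks, recall and precision are parallel lists (hypothetical inputs).
    A point is reported where recall - precision is exactly zero, or where
    it changes sign between two consecutive candidate counts; in the latter
    case the count with the smaller recall/precision gap is kept.
    """
    points = []
    for i in range(len(ks)):
        diff = recall[i] - precision[i]
        if diff == 0:
            points.append(ks[i])
        elif i > 0:
            prev = recall[i - 1] - precision[i - 1]
            if prev != 0 and (prev > 0) != (diff > 0):
                points.append(ks[i] if abs(diff) <= abs(prev) else ks[i - 1])
    return points
```

For example, `break_even_points([4, 9, 16, 25, 36], [0.90, 0.80, 0.70, 0.60, 0.50], [0.40, 0.60, 0.72, 0.80, 0.85])` returns `[16]`, the count at which these invented curves cross.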
Scientometrics | 2001
Xavier Polanco; Claire François; Jean-Charles Lamirel
We argue in favour of artificial neural networks for exploratory data analysis, clustering and mapping. We propose the Kohonen self-organizing map (SOM) for clustering and mapping according to a multi-map extension, which is consequently called Multi-SOM. First, the Kohonen SOM algorithm is presented. Then the following improvements are detailed: the way of naming the clusters, the division of the map into logical areas, and the map generalization mechanism. The multi-map display, founded on the inter-map communication mechanism, is described, and the notion of viewpoint is introduced. We show the value of Multi-SOM for visualization, exploration or browsing, as well as for scientific and technical information analysis. A case study in patent analysis on transgenic plants illustrates the use of Multi-SOM. We also show that the inter-map communication mechanism supports the monitoring of the plants to which patented genetic technology is applied; these plants form the first map. The other four related maps provide information about the plant parts concerned, the target pathology, the transgenic techniques used to make these plants resistant, and finally the firms involved in genetic engineering and patenting. We also propose a method of analysis for using this computer-based multi-map environment. Finally, we discuss some critical remarks about the proposed approach in its current state, and we conclude with the advantages it provides for knowledge-oriented science and technology watch. Along these lines, we conclude by introducing the notion of knowledge indicators.
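The classical Kohonen SOM that Multi-SOM extends can be sketched in plain Python. This is a minimal sketch of the standard online algorithm, assuming Euclidean distance, a Gaussian neighbourhood on a rectangular grid, and a learning rate and radius that decay linearly; the function names and default values are illustrative choices, not the settings used in the paper. It does not implement the Multi-SOM extensions (cluster naming, logical areas, generalization, inter-map communication).

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid cell whose weight vector is closest (Euclidean) to x."""
    return min(weights, key=lambda cell: sum((a - b) ** 2 for a, b in zip(weights[cell], x)))


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map on a list of equal-length vectors.

    Illustrative defaults: learning rate and neighbourhood radius both
    decay linearly with the number of presented samples.
    """
    rng = random.Random(seed)
    radius0 = max(rows, cols) / 2.0
    # One weight vector per grid cell, initialised from random data points.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)  # present samples in a random order each epoch
        for x in samples:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            radius = max(radius0 * (1.0 - frac), 0.5)
            bmu = best_matching_unit(weights, x)
            for cell, w in weights.items():
                # Squared grid distance between this cell and the winning cell.
                d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))  # Gaussian neighbourhood
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])  # pull the cell towards the sample
            step += 1
    return weights
```

After training, `best_matching_unit(weights, x)` gives the map cell (cluster) of a document vector `x`. With well-separated groups of points, the groups would typically settle on different cells, which is the property the clustering and mapping steps rely on.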
international symposium on neural networks | 2011
Jean-Charles Lamirel; Raghvendra Mall; Pascal Cuxac; Ghada Safi
Neural clustering algorithms show high performance in the general context of the analysis of homogeneous textual datasets. This is especially true for the recent adaptive versions of these algorithms, like the incremental growing neural gas algorithm (IGNG) and the labeling-maximization-based incremental growing neural gas algorithm (IGNG-F). In this paper we highlight a drastic decrease in the performance of these algorithms, as well as of more classical ones, when a heterogeneous textual dataset is used as input. Specific quality measures and cluster labeling techniques that are independent of the clustering method are used for precise performance evaluation. We propose new variations of the incremental growing neural gas algorithm that incrementally exploit knowledge about the clusters' current labeling together with cluster distance measures. This solution leads to significant performance gains for all types of datasets, especially for the clustering of complex heterogeneous textual data.
intelligent information systems | 2015
Jean-Charles Lamirel; Pascal Cuxac; Aneesh Sreevallabh Chivukula; Kafil Hajlaoui
Feature maximization is a cluster quality metric that favors clusters with maximum feature representation with regard to their associated data. In this paper we show that a simple adaptation of this metric can provide a highly efficient feature selection and feature contrasting model in the context of supervised classification. The method is tested on different types of textual datasets. The paper shows that the proposed method provides a very significant performance increase compared with state-of-the-art methods in all the studied cases, even when only a single bag-of-words model is used for data description. Interestingly, the most significant performance gain is obtained for the classification of highly unbalanced, highly multidimensional and noisy data with a high degree of similarity between the classes.
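The feature recall, feature precision and feature F-measure behind feature maximization can be sketched in plain Python. This follows our reading of the definitions in the feature-maximization literature: for a feature f and a class c, feature recall is the weight of f in c divided by the weight of f over all classes, feature precision is the weight of f in c divided by the total feature weight of c, and the feature F-measure is their harmonic mean. The selection rule used here (keep f for c when its F-measure reaches both the mean F-measure of f over the classes containing it and the overall mean) is a simplified assumption; the non-strict comparisons are our choice so that a feature exclusive to one class is kept, feature contrasting is omitted, and the function names and toy data are hypothetical.

```python
from collections import defaultdict


def feature_f_measures(docs):
    """docs: list of (class_label, {feature: non-negative weight}) pairs.

    Returns {class_label: {feature: feature F-measure}}.
    """
    # Total weight of each feature inside each class.
    class_feat = defaultdict(lambda: defaultdict(float))
    for label, feats in docs:
        for f, w in feats.items():
            class_feat[label][f] += w
    # Total weight of each feature over all classes (feature recall denominator).
    feat_total = defaultdict(float)
    for feats in class_feat.values():
        for f, w in feats.items():
            feat_total[f] += w
    ff = {}
    for label, feats in class_feat.items():
        class_total = sum(feats.values())  # feature precision denominator
        ff[label] = {}
        for f, w in feats.items():
            if w <= 0:
                continue  # skip zero-weight features to avoid 0/0
            recall = w / feat_total[f]
            precision = w / class_total
            ff[label][f] = 2 * recall * precision / (recall + precision)
    return ff


def select_features(ff):
    """Keep feature f for class c when FF_c(f) is at least the mean FF of f
    over the classes containing f and at least the overall mean FF
    (simplified, assumed selection rule)."""
    per_feature = defaultdict(list)
    for scores in ff.values():
        for f, v in scores.items():
            per_feature[f].append(v)
    feat_mean = {f: sum(vs) / len(vs) for f, vs in per_feature.items()}
    global_mean = sum(feat_mean.values()) / len(feat_mean)
    return {
        label: sorted(f for f, v in scores.items() if v >= feat_mean[f] and v >= global_mean)
        for label, scores in ff.items()
    }
```

With `docs = [("a", {"x": 2, "y": 1}), ("a", {"x": 1}), ("b", {"y": 2, "z": 2})]`, `select_features(feature_f_measures(docs))` returns `{"a": ["x"], "b": ["z"]}`: the feature y, shared by both classes, is dropped.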
Scientometrics | 2012
Jean-Charles Lamirel
The objective of this paper is to propose a new unsupervised incremental approach for following the evolution of research themes in a given scientific discipline, in terms of emergence or decline. Such behaviors can be detected by various filtering methods; our choice, however, is to exploit neural clustering methods in a multi-view context. This new approach makes it possible to take into account the incremental and chronological aspects of information, opening the way to the detection of convergences and divergences of research themes at a large scale.
Scientometrics | 2004
Jean-Charles Lamirel; Shadi Al Shehabi; Claire François; Xavier Polanco
This paper presents a compound approach for Webometrics based on an extension of the self-organizing multi-map MultiSOM model. The goal of this new approach is to combine link and domain clustering in order to increase the reliability and precision of Webometrics studies. The proposed extension of the MultiSOM model is based on a Bayesian network-oriented approach. A first experiment shows that the behaviour of this extension is consistent with its expected properties for Webometrics. A second experiment is carried out on a representative Web dataset from the EICSTES IST project. In this latter experiment, each map represents a particular viewpoint extracted from the Web data description, and the resulting maps represent either thematic or link classifications. The experiment shows empirically that communication between these classifications gives Webometrics new explanatory capabilities.
Scientometrics | 2013
Pascal Cuxac; Jean-Charles Lamirel; Valérie Bonvallot
The disambiguation of named entities is a challenge in many fields, such as scientometrics, social networks, record linkage, citation analysis and the semantic web. Name ambiguities can arise from misspellings, typographical or OCR errors, abbreviations, omissions and so on. Searching for the names of persons or organizations is therefore difficult as soon as a single name may appear in many different forms. This paper proposes two approaches to disambiguating the affiliations of authors of scientific papers in bibliographic databases: the first assumes that a training dataset is available and uses a Naive Bayes model; the second assumes that no learning resource is available and uses a semi-supervised approach that mixes soft clustering and Bayesian learning. The results are encouraging, and the approach is already partially applied in a scientific survey department. However, our experiments also highlight a limitation of our approach: it cannot efficiently process highly unbalanced data. Alternative solutions are possible for future developments, particularly using a recent clustering algorithm that relies on feature maximization.
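The supervised variant can be illustrated with a standard multinomial Naive Bayes classifier over affiliation tokens. This is a minimal sketch, assuming affiliations are reduced to lowercase alphanumeric tokens and mapped to a canonical institution label, with add-one (Laplace) smoothing; the class name `AffiliationNB`, the tokenizer and the toy affiliation strings are hypothetical, and the semi-supervised soft-clustering variant is not shown.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase alphanumeric tokens of an affiliation string."""
    return re.findall(r"[a-z0-9]+", text.lower())


class AffiliationNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative)."""

    def fit(self, examples):
        # examples: list of (raw affiliation string, canonical label) pairs.
        self.class_counts = Counter()
        self.token_counts = defaultdict(Counter)
        vocab = set()
        for text, label in examples:
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            vocab.update(tokens)
        self.vocab_size = len(vocab)
        self.n_docs = sum(self.class_counts.values())
        return self

    def predict(self, text):
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.n_docs)  # log prior of the label
            total = sum(self.token_counts[label].values())
            for t in tokens:
                # Smoothed log likelihood: unseen tokens still get a non-zero probability.
                score += math.log((self.token_counts[label][t] + 1) / (total + self.vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on a few variant spellings per institution (for instance "Dept. of Computer Science, Univ. of Exampleton" and "Computer Sci. Dept, University of Exampleton", both labelled `UEX-CS`), the model can map a new variant such as "Univ Exampleton, computer science" to the same canonical label. Such a model relies on per-class token counts, which is consistent with the abstract's remark that highly unbalanced data are hard to handle.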
discovery science | 2012
Kafil Hajlaoui; Pascal Cuxac; Jean-Charles Lamirel; Claire François
This paper focuses on a subtask of the QUAERO research program, a major innovative research project on the automatic processing of multimedia and multilingual content. The objective discussed in this article is to propose a new method for classifying scientific papers according to an international patent classification plan for the same field. The practical purpose of this work is to provide an assistance tool for experts evaluating the originality and novelty of a patent, by offering them the most relevant scientific citations. This issue raises new challenges in categorization research, because the patent classification plan is not directly adapted to the structure of scientific documents, classes have high citation or cited topic, and the available examples are not always evenly distributed across the different learning classes. As a solution to this problem, we propose an improved K-nearest-neighbors (KNN) algorithm based on the exploitation of association rules occurring between the index terms of the documents and those of the patent classes. Using a reference dataset of patents from the field of pharmacology on the one hand, and a bibliographic dataset of the same field from the Medline collection on the other, we show that this new approach, which combines the advantages of numerical and symbolic approaches, considerably improves categorization performance compared with the usual categorization methods.
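The improved method builds on ordinary K-nearest-neighbors categorization. This minimal sketch shows only that baseline, assuming documents are sparse term-weight dictionaries compared by cosine similarity and a similarity-weighted vote among the k nearest labelled documents; the function names and toy class labels are hypothetical, and the association-rule enhancement described in the abstract is not reproduced here.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(query, labelled, k=3):
    """Rank labelled (vector, class) pairs by cosine similarity to the query
    and return the class with the largest similarity-weighted vote."""
    neighbours = sorted(labelled, key=lambda item: cosine(query, item[0]), reverse=True)[:k]
    votes = Counter()
    for vec, label in neighbours:
        votes[label] += cosine(query, vec)
    return votes.most_common(1)[0][0] if votes else None
```

In the setting described above, the labelled vectors would be patents indexed by class and the query a scientific paper; the similarity-weighted vote is one common design choice that reduces the influence of distant neighbours, which matters when classes are unevenly populated.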
international conference on industrial, engineering and other applications of applied intelligent systems | 2010
Jean-Charles Lamirel; Zied Boulila; Maha Ghribi; Pascal Cuxac
Neural clustering algorithms show high performance in the usual context of the analysis of homogeneous textual datasets. This is especially true for the recent adaptive versions of these algorithms, like the incremental growing neural gas algorithm (IGNG). Nevertheless, this paper clearly highlights the drastic decrease in performance of these algorithms, as well as of more classical ones, when a heterogeneous textual dataset is used as input. A new incremental growing neural gas algorithm that incrementally exploits knowledge from the clusters' current labeling is proposed as an alternative to the original distance-based algorithm. This solution yields a very significant increase in performance for the clustering of heterogeneous textual data. Moreover, it gives the proposed algorithm a truly incremental character.
knowledge discovery and data mining | 2013
Jean-Charles Lamirel; Pascal Cuxac; Aneesh Sreevallabh Chivukula; Kafil Hajlaoui
Feature maximization is a cluster quality metric that favors clusters with maximum feature representation with regard to their associated data. In this paper we go one step further by showing that a straightforward adaptation of this metric can provide a highly efficient feature selection and feature contrasting model in the context of supervised classification. More specifically, we show that this technique can enhance the performance of classification methods whilst very significantly outperforming (+80%) state-of-the-art feature selection techniques for the classification of unbalanced, highly multidimensional and noisy textual data with a high degree of similarity between the classes.