Publication


Featured research published by Ludovic Lebart.


Archive | 2014

Exploring Textual Data

Ludovic Lebart; André Salem; Lisette Berry

Researchers in a number of disciplines deal with large text sets requiring both text management and text analysis. Faced with a large amount of textual data collected in marketing surveys, literary investigations, historical archives and documentary databases, these researchers require assistance with organizing, describing and comparing texts. Exploring Textual Data demonstrates how exploratory multivariate statistical methods such as correspondence analysis and cluster analysis can be used to help investigate, assimilate and evaluate textual data. The main text does not contain any strictly mathematical demonstrations, making it accessible to a large audience. The book is user-friendly, with proofs placed in the appendices. Definitions of concepts, implementations of procedures and rules for reading and interpreting results are fully explored. A succession of examples is intended to allow the reader to appreciate the variety of actual and potential applications and the complementary processing methods. A glossary of terms is provided.


Archive | 1996

Assessing Sample Variability in the Visualization Techniques related to Principal Component Analysis : Bootstrap and Alternative Simulation Methods.

Frederic Chateau; Ludovic Lebart

The bootstrap, a distribution-free resampling technique (Efron, 1979), is frequently used to assess the variance of estimators or to produce tolerance areas on visualization diagrams derived from principal axes techniques (correspondence analysis (CA), principal component analysis (PCA)). Gifi (1981), Meulman (1982) and Greenacre (1984) did pioneering work in the context of two-way or multiple correspondence analysis. In the case of principal component analysis, Diaconis and Efron (1983), Holmes (1985, 1989), Stauffer et al. (1985) and Daudin et al. (1988) addressed the problem of choosing the relevant number of axes, and proposed confidence intervals for points in the subspace spanned by the principal axes. These parameters are computed after the realization of each replicated sample, and involve constraints that depend on these samples. Several procedures have been proposed to overcome these difficulties: partial replications using supplementary elements (Greenacre), use of a three-way analysis to process simultaneously the whole set of replications (Holmes), filtering techniques involving reordering of axes and procrustean rotations (Milan and Whittaker, 1995).
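
As a rough illustration of the basic resampling idea only (not of the PCA-specific procedures listed in the abstract), the sketch below estimates the variance of an estimator by drawing replicates with replacement. The function name `bootstrap_variance`, the sample values and the number of replicates are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_variance(sample, estimator, n_replicates=1000, seed=0):
    """Estimate the variance of `estimator` by resampling `sample` with replacement.

    Hypothetical helper for illustration; the seed makes the run reproducible.
    """
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(n_replicates):
        # One bootstrap replicate: n draws with replacement from the sample.
        resample = rng.choices(sample, k=n)
        replicates.append(estimator(resample))
    # Spread of the estimator across replicates approximates its sampling variance.
    return statistics.variance(replicates)


# Made-up observations, used only to exercise the function.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 2.5]
var_mean = bootstrap_variance(data, statistics.mean)
```

For the sample mean, the result should be close to the sample's population variance divided by its size, which gives a quick sanity check on the resampling loop.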


Archive | 1998

Correspondence Analysis, Discrimination, and Neural Networks

Ludovic Lebart

Summary: Correspondence Analysis of contingency tables (CA) is closely related to a particular Supervised Multilayer Perceptron (MLP) or can be described as an Unsupervised MLP as well. The unsupervised MLP model is also linked to various types of stochastic approximation algorithms that mimic the cognition process involved in reading and comprehending a data table.


Applied Stochastic Models and Data Analysis | 1998

Text Mining in different languages

Ludovic Lebart

The purpose of Text Mining is to describe and explore textual data, to uncover structural traits, and to make predictions. The field of application covers Information Retrieval, the processing of responses to open-ended questions in sample surveys, and the processing of textual corpora of a more general nature. At the intersection of Corpus Linguistics and Exploratory Statistical Analysis, a series of language-independent tools and methods can perform most of the previously mentioned tasks, including the assessment and validation of the obtained results, be it visualization or categorization. Multiple confusion matrices calculated on test samples characterize the quality of the prediction as well as the structure of the prediction errors. In the case of multinational surveys and corpora, they allow us to make comparisons among several countries, in spite of the very heterogeneous character of the basic information (texts in different languages).
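
To make the role of a confusion matrix concrete, here is a minimal sketch that tabulates true against predicted categories on a test sample. The category labels and the helper names `confusion_matrix` and `accuracy` are hypothetical and do not come from the paper.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) pairs; rows are true classes, columns are predicted."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    classes = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = [[counts[(t, p)] for p in classes] for t in classes]
    return classes, matrix


def accuracy(matrix):
    """Share of test items on the diagonal, i.e. correctly predicted."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Made-up test-sample labels for three categories.
true_labels = ["A", "A", "B", "B", "C", "C"]
predicted = ["A", "C", "B", "A", "C", "C"]
classes, matrix = confusion_matrix(true_labels, predicted)
```

The off-diagonal cells show which categories are mistaken for which, which is the "structure of errors" the abstract refers to. Computing one such matrix per country would be one way to set up the cross-country comparison.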


Archive | 1998

Visualization of Textual Data

Ludovic Lebart; André Salem; Lisette Berry

How do we go about applying the multivariate techniques defined through the pedagogical examples in the preceding chapters to real-life situations? The information is so complex, and the possible points of view so numerous, that it is impossible to recommend a single path leading from the problem to a definitive solution. In this chapter we shall rather attempt to recognize different ways of delaying somewhat the moment when the user must necessarily intercede in an interpretative manner. Our aim, briefly, is to extend the scope of the analysis that is controllable and reproducible — we choose to use these simple words instead of the perhaps more controversial terms objective and automatic.


Workshop on Self-Organizing Maps | 2006

Assessing self organizing maps via contiguity analysis

Ludovic Lebart

Contiguity analysis is a straightforward generalization of linear discriminant analysis in which the partition of elements is replaced by a more general graph structure. Applied to the graph induced by a Self Organizing Map (SOM), contiguity analysis provides a set of linear projectors leading to a planar representation as close as possible to the SOM. As expected, such projectors may only concern local parts of the SOMs. They allow us to visualize the shapes of the clusters (convex hulls of the projections of the elements belonging to a cluster) and the pattern of the elements within each cluster. In some contexts, it is possible to project the bootstrap replicates of the elements, and therefore to produce confidence areas for elements via a standard partial bootstrap procedure.


Archive | 2005

Extraction of the Useful Words from a Decisional Corpus. Contribution of Correspondence Analysis

Mónica Bécue-Bertaut; Martin Rajman; Ludovic Lebart; Eric Gaussier

In the framework of the JuriSent case study, carried out within the European NEMIS thematic network, we analyze the contribution of text mining techniques to improve the consultation of jurisprudence textual databases. We mainly focus on correspondence analysis (CA) techniques, but also provide some insights on similar visualization techniques, such as self organizing maps (Kohonen maps), and review the potential impact of various Natural Language pre-processing techniques. CA is described in more detail, as well as its use in all the steps of the analysis. A concrete example is provided to illustrate the value of the results obtained with CA techniques for an enhanced access to the studied jurisprudence corpus.
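
Correspondence analysis compares the rows of a contingency table (for instance documents by words) through the chi-square distance between their profiles. The sketch below computes that distance for a small made-up table. The function name and the counts are assumptions, and the full CA decomposition into principal axes is not shown.

```python
import math


def chi2_row_distance(table, i, j):
    """Chi-square distance between the profiles of rows i and j.

    Assumes every row and every column of `table` has a non-zero sum.
    """
    total = sum(sum(row) for row in table)
    n_cols = len(table[0])
    # Column masses: share of the grand total carried by each column.
    col_mass = [sum(row[k] for row in table) / total for k in range(n_cols)]
    sum_i, sum_j = sum(table[i]), sum(table[j])
    squared = sum(
        (table[i][k] / sum_i - table[j][k] / sum_j) ** 2 / col_mass[k]
        for k in range(n_cols)
    )
    return math.sqrt(squared)


# Hypothetical word counts: rows are documents, columns are words.
table = [
    [2, 4, 6],
    [1, 2, 3],
    [5, 1, 0],
]
d_same_profile = chi2_row_distance(table, 0, 1)
d_different = chi2_row_distance(table, 0, 2)
```

The first two rows are proportional, so their profiles coincide and their distance is zero whatever the document lengths. This is why CA is suited to comparing texts of unequal size.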


Archive | 1998

Classification problems in text analysis and information retrieval

Ludovic Lebart

The specific complexity of textual data sets (free answers in surveys, documentary databases, etc.) is emphasized. Recent research trends show that classification techniques (both discrimination and unsupervised clustering) are widely used and have great potential in both Information Retrieval and Text Mining.


Archive | 2004

Validation Techniques in Text Mining (with Application to the Processing of Open-ended Questions)

Ludovic Lebart

Clustering methods and principal axes techniques both play a major role in the computerized exploration of textual corpora. However, most of the outputs of these unsupervised procedures are difficult to assess. We focus on two issues: external validation, which involves external data and allows for classical statistical tests, and internal validation, which is based on resampling techniques such as the bootstrap and other Monte Carlo methods. In the domain of textual data, these techniques can efficiently tackle the difficult problem of the plurality of statistical units (words, lemmas, segments, sentences, respondents).


Archive | 1998

Textual Statistics Scope and Applications

Ludovic Lebart; André Salem; Lisette Berry

The study of texts using statistical methods constitutes a field of interest known as textual statistics. In recent years there have been important changes in the general context of this domain of research, as well as its objectives and the methodological principles it utilizes.

Collaboration


Dive into Ludovic Lebart's collaborations.

Top Co-Authors

André Salem

University of Paris III: Sorbonne Nouvelle

Marie Piron

Institut de recherche pour le développement

Martin Rajman

École Polytechnique Fédérale de Lausanne

François Roubaud

Institut de recherche pour le développement

Jean-Pierre Cling

Institut de recherche pour le développement

Mireille Razafindrakoto

Institut de recherche pour le développement

Michael Greenacre

Barcelona Graduate School of Economics
