Georges Hébrail | Researchain

Publication

Featured research published by Georges Hébrail.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 1992

Automatic document classification: natural language processing, statistical analysis, and expert system techniques used together

M. J. Blosseville; Georges Hébrail; M. G. Monteil; N. Pénot

In this paper we describe an automated method of classifying research project descriptions: a human expert classifies a sample set of projects into a set of disjoint and pre-defined classes, and then the computer learns from this sample how to classify new projects into these classes. Both textual and non-textual information associated with the projects are used in the learning and classification phases. Textual information is processed by two methods of analysis: a natural language analysis followed by a statistical analysis. Non-textual information is processed by a symbolic learning technique. We present the results of some experiments done on real data: two different classifications of our research projects.

Questiió: Quaderns d'Estadística, Sistemes, Informatica i Investigació Operativa | 2001

Practical data mining in a large utility company

Georges Hébrail

We present in this paper the main applications of data mining techniques at Électricité de France, the French national electric power company. These include electric load curve analysis and prediction of customer characteristics. Closely related to data mining techniques are data warehouse management problems: we show that statistical methods can help manage data consistency and provide accurate reports even when data are missing.

Archive | 2000

Generation of Symbolic Objects from Relational Databases

Véronique Stéphan; Georges Hébrail; Yves Lechevallier

In former chapters, we have defined the concept of a ‘symbolic object’ in a formal way (with various levels of generality) and illustrated these definitions and the related terminology by many examples. Thereby we have emphasized the two-level-paradigm where symbolic objects were created quite naturally when aggregating single individuals (described by classical single-valued variables) into classes, and describing the more or less complex properties of these classes. Here, we focus on the generalization process from a classical dataset extracted from a relational database. We also define a specialization step which aims at reducing over-generalization. Finally, we present how to build a symbolic dataset from several datasets by applying a join operator.
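The generalization step described above can be illustrated with a minimal sketch. It assumes the simplest possible description of a class: an interval for each numeric variable and a set of observed values for each categorical variable. The function name, the column names and the sample rows are hypothetical and are not taken from the chapter.

```python
from collections import defaultdict


def generalize(rows, group_key, numeric_vars, categorical_vars):
    """Aggregate individual rows into one symbolic description per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    symbolic = {}
    for name, members in groups.items():
        description = {}
        for var in numeric_vars:
            values = [m[var] for m in members]
            description[var] = (min(values), max(values))  # interval
        for var in categorical_vars:
            description[var] = {m[var] for m in members}  # set of observed values
        symbolic[name] = description
    return symbolic


# Hypothetical individuals, as they might be extracted from a relational table.
rows = [
    {"region": "north", "power_kw": 6, "tariff": "base"},
    {"region": "north", "power_kw": 9, "tariff": "peak"},
    {"region": "south", "power_kw": 3, "tariff": "base"},
    {"region": "south", "power_kw": 36, "tariff": "base"},
]
symbolic_objects = generalize(rows, "region", ["power_kw"], ["tariff"])
```

In this toy data the "south" interval is stretched by a single large value, which is the kind of over-generalization a specialization step could be expected to reduce.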

European Conference on Principles of Data Mining and Knowledge Discovery | 1997

Interactive Interpretation of Hierarchical Clustering

Eric Boudaillier; Georges Hébrail

Automatic clustering methods are part of data mining methods. They aim at building clusters of items so that similar items fall into the same cluster while dissimilar ones fall into separate clusters. One particular class is hierarchical clustering, in which nested clusters are formed recursively to grow a tree that approximates the similarities between items. We propose a new interactive interface to help the user interpret the result of such a clustering process according to the item characteristics. The prototype has been applied successfully to a special case of items that lend themselves to graphical representation (electric load curves), but it can also be used with other types of curves or with more standard items.
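For readers unfamiliar with how such a tree is grown, here is a minimal sketch of agglomerative clustering with single linkage and Euclidean distance. Both choices are assumptions made for illustration; the abstract does not say which linkage or distance the prototype uses, and the four short "load curves" are made up.

```python
import math


def distance(a, b):
    """Euclidean distance between two curves of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(items):
    """Return the merge history [(cluster_a, cluster_b, distance), ...]."""
    clusters = [(i,) for i in range(len(items))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(distance(items[p], items[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical curves: two "evening peak" shapes and two "morning peak" shapes.
curves = [
    [1.0, 1.0, 5.0, 5.0],
    [1.0, 1.2, 5.0, 5.1],
    [4.0, 4.0, 1.0, 1.0],
    [4.1, 4.0, 1.0, 0.9],
]
merges = single_linkage(curves)
```

The merge history is the tree: each entry records which two clusters were joined and at what distance, which is the information an interpretation interface would let the user explore.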

International Conference on Management of Data | 2015

Chiaroscuro: Transparency and Privacy for Massive Personal Time-Series Clustering

Tristan Allard; Georges Hébrail; Florent Masseglia; Esther Pacitti

The advent of on-body/at-home sensors connected to personal devices leads to the generation of fine-grained, highly sensitive personal data at an unprecedented rate. However, despite the promises of large-scale analytics, there are obvious privacy concerns that prevent individuals from sharing their personal data. In this paper, we propose Chiaroscuro, a complete solution for clustering personal data with strong privacy guarantees. The execution sequence produced by Chiaroscuro is massively distributed on personal devices, coping with arbitrary connections and disconnections. Chiaroscuro builds on our novel data structure, called Diptych, which allows the participating devices to collaborate privately by combining encryption with differential privacy. Our solution yields a high clustering quality while minimizing the impact of the differentially private perturbation. Chiaroscuro is both correct and secure. Finally, we provide an experimental validation of our approach on both real and synthetic sets of time-series.

Archive | 2004

Building Small Scale Models of Multi-Entity Databases By Clustering

Georges Hébrail; Yves Lechevallier

A framework is proposed to build small scale models of very large databases describing several entities and their relationships. In the first part, it is shown that the use of sampling is not a good solution when several entities are stored in a database. In the second part, a model is proposed which is based on clustering all entities of the database and storing aggregates on the clusters and on the relationships between the clusters. The last part of the paper discusses the different problems which are raised by this approach. Some solutions are proposed: in particular, the link with symbolic data analysis is established.
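The idea of storing aggregates on clusters and on the relationships between clusters can be sketched as follows. The sketch assumes two entity types (customers and contracts), cluster labels that have already been computed by some clustering method, and simple counts as the stored aggregates; all names and data are hypothetical.

```python
from collections import Counter


def small_scale_model(customer_cluster, contract_cluster, links):
    """Summarize two entity tables and their relationship at cluster level."""
    customer_sizes = Counter(customer_cluster.values())
    contract_sizes = Counter(contract_cluster.values())
    # One aggregate per pair of clusters: how many links connect them.
    link_counts = Counter(
        (customer_cluster[cust], contract_cluster[contr])
        for cust, contr in links
    )
    return customer_sizes, contract_sizes, link_counts


# Hypothetical cluster assignments and customer-to-contract links.
customer_cluster = {"c1": "A", "c2": "A", "c3": "B"}
contract_cluster = {"k1": "X", "k2": "Y", "k3": "Y", "k4": "Y"}
links = [("c1", "k1"), ("c1", "k2"), ("c2", "k3"), ("c3", "k4")]

customer_sizes, contract_sizes, link_counts = small_scale_model(
    customer_cluster, contract_cluster, links
)
```

Unlike a random sample of each table, the cluster-level link counts keep the relationship between the two entity types intact, which is one way to read the abstract's argument against sampling.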

IEEE PES Asia Pacific Power and Energy Engineering Conference | 2016

Spatial estimation of electricity consumption using socio-demographic information

Jiali Mei; Yannig Goude; Georges Hébrail; Nicolas Kong

Electric power consumption is known at fine temporal scales (e.g. hourly) for geographical zones corresponding to the electric network service divisions (e.g. substations). In several applications, there is a strong need to estimate past consumption or to forecast future consumption for different divisions, for example at the town, district or city block level. The deployment of smart meters gives only a partial answer to this problem because they usually do not provide exhaustive measures at such temporal scales. We propose in this paper a generic approach to estimate electric consumption for any geographical zone from source zones where fine-grained consumption data is available, additionally using socio-demographic information. The approach is evaluated on both real and simulated data.
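To make the problem concrete, the sketch below shows a deliberately naive baseline, not the approach proposed in the paper: the hourly load of one source zone is split across the target zones it contains in proportion to their population. Population stands in for the socio-demographic information; the function name, zone names and figures are hypothetical.

```python
def disaggregate(source_load, target_population):
    """Split one source zone's load across target zones by population share."""
    total_population = sum(target_population.values())
    return {
        zone: [load * pop / total_population for load in source_load]
        for zone, pop in target_population.items()
    }


# Hypothetical substation load for three hours, and two districts it serves.
substation_load = [100.0, 200.0, 150.0]
population = {"district_1": 3000, "district_2": 1000}

estimates = disaggregate(substation_load, population)
```

By construction the district estimates add up to the substation load at every hour. The baseline gives every district the same curve shape, a limitation that richer socio-demographic information could help address.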

Ingénierie Des Systèmes D'information | 2002

Getting right answers from incomplete multidimensional databases

Sabine Goutier; Georges Hébrail; Véronique Stéphan

When dealing with large volumes of data, the aggregated values of OLAP data cubes are often spoiled by errors due to missing values in the detailed data. This paper proposes adjusting aggregate answers, noting that the non-missing values constitute a biased sample of the true result of the query. Using basic random sampling theory, we show that two different problems can be solved: (1) the case of missing tuples in the database, (2) the case of missing values appearing in the attributes used to build the data cube dimensions. These concepts are integrated within the OLAP data cube model by adjusting the data cube measures with a well-chosen weighting system. An algorithm (the ROWN method) minimizes the number of necessary weighting systems. A proof-of-concept implementation on the ORACLE EXPRESS system is briefly described at the end of the paper.

Résumé (translated from French). In the OLAP context, missing values in the detailed data affect the quality of the aggregates of a data cube. Considering that the set of non-missing values constitutes a biased sample of the true result of the query, we propose a method for adjusting the aggregates. By adapting classical sampling methods, we show how to handle: (1) the case of missing tuples in the database, (2) the case of missing values in the attributes forming the dimensions of a data cube. The adjustment is carried out by integrating a weighting system within the data cube. An algorithm (the ROWN method) determines the weighting systems while minimizing their number. An implementation on ORACLE EXPRESS is briefly described at the end.
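The weighting idea can be illustrated on a single cube cell. This sketch is not the ROWN method; it only shows the basic sampling-theory adjustment for missing tuples, under the assumption that the tuples present in a cell behave like a random sample of the tuples that should be there. The function name and the figures are hypothetical.

```python
def adjusted_sum(cell_values, expected_count):
    """Estimate a cell total when only some of its tuples are present."""
    observed = [v for v in cell_values if v is not None]
    if not observed:
        return None  # nothing to extrapolate from
    # Weight each observed value by the inverse of the observed fraction.
    return sum(observed) * expected_count / len(observed)


# Hypothetical cell: monthly consumption (kWh) of 5 customers, 2 of them missing.
cell = [120.0, None, 80.0, 100.0, None]

naive_total = sum(v for v in cell if v is not None)
estimated_total = adjusted_sum(cell, expected_count=5)
```

Here the naive total of 300 kWh understates the cell, while the weighted estimate scales it up by 5/3. Storing such weights alongside the cube measures is one way to picture the "weighting system" mentioned in the abstract.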

Archive | 2000

DB2SO: A Software for Building Symbolic Objects from Databases

Georges Hébrail; Yves Lechevallier

The SODAS project, funded by the EC, has developed software for extending statistical data analysis methods to more complex objects. Objects processed by these methods are complex in the sense that they represent groups of individuals and capture the variation within each group. Within the context of the SODAS project, the complex objects are called symbolic objects. In this paper, we present a part of the SODAS software that enables the user to acquire datasets of symbolic objects by extracting information from relational databases.

Archive | 1998

The SODAS Project: a Software for Symbolic Data Analysis

Georges Hébrail

This paper presents an ESPRIT European project whose goal is to develop prototype software for symbolic data analysis. Symbolic data analysis is an extension of standard methods of data analysis (such as clustering, discrimination, or factorial analysis) to more complex data structures, called symbolic objects. After a short presentation of the model of symbolic objects, the different parts of the software are briefly described.
