Publication


Featured research published by Jean-Hugues Chauchat.


International Conference on Management of Data | 2010

Business intelligence for small and middle-sized entreprises

Oksana Grabova; Jérôme Darmont; Jean-Hugues Chauchat; Iryna Zolotaryova

Data warehouses are the core of decision support systems, which nowadays are used by all kinds of enterprises worldwide. Although many studies have been conducted on the need for decision support systems (DSSs) in small businesses, most of them adopt existing solutions and approaches, which are appropriate for large-scale enterprises but inadequate for small and middle-sized enterprises. Small enterprises require cheap, lightweight architectures and tools (hardware and software) providing online data analysis. In order to ensure these features, we review web-based business intelligence approaches. For real-time analysis, the traditional OLAP architecture is cumbersome and storage-costly; therefore, we also review in-memory processing. Consequently, this paper discusses the existing approaches and tools working in main memory and/or with web interfaces (including freeware tools), relevant for small and middle-sized enterprises in decision making.


International Journal of Computational Intelligence Systems | 2009

N-grams based feature selection and text representation for Chinese Text Classification

Zhihua Wei; Duoqian Miao; Jean-Hugues Chauchat; Rui Zhao; Wen Li

This paper discusses text representation and feature selection strategies for Chinese text classification based on n-grams. A two-step feature selection strategy is proposed, which combines preprocessing within classes with feature selection among classes. Four feature selection methods and three text representation weights are compared in exhaustive experiments. Both a C-SVC classifier and a Naive Bayes classifier are used to assess the results. All experiments are performed on the Chinese corpus TanCorpV1.0, which includes more than 14,000 texts divided into 12 classes. Our experiments concern: (1) the performance comparison among different feature selection strategies: absolute text frequency, relative text frequency, absolute n-gram frequency and relative n-gram frequency; (2) the comparison of the sparseness and feature correlation in the “text by feature” matrices produced by four feature selection methods; (3) the performance comparison among three term weights: 0/1 logical value, n-gr...
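As a rough illustration of the ideas named in this abstract, the sketch below extracts character bigrams, counts in how many texts each bigram occurs, keeps the bigrams that pass a relative text-frequency threshold, and builds a 0/1 vector. It is not the authors' implementation: the function names, the threshold value, the toy texts, and the reading of "text frequency" as "number of texts containing the n-gram" are all assumptions made for the example.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def text_frequency(texts, n=2):
    """Count, for each n-gram, the number of texts it occurs in (assumed definition)."""
    counts = Counter()
    for text in texts:
        counts.update(set(char_ngrams(text, n)))  # each text counts once per n-gram
    return counts


def select_features(texts, n=2, min_relative_tf=0.5):
    """Keep n-grams found in at least min_relative_tf of the texts (hypothetical threshold)."""
    counts = text_frequency(texts, n)
    threshold = min_relative_tf * len(texts)
    return sorted(gram for gram, c in counts.items() if c >= threshold)


def binary_vector(text, features, n=2):
    """Represent a text with 0/1 logical weights over the selected features."""
    present = set(char_ngrams(text, n))
    return [1 if f in present else 0 for f in features]


# Toy texts, invented for this example only.
toy_texts = ["北京大学", "北京天气", "大学生活"]
features = select_features(toy_texts)
vector = binary_vector("北京天气", features)
print(features, vector)
```

Character n-grams need no word segmentation step, which is one practical reason they are often considered for Chinese text.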


Psychiatry Research: Neuroimaging | 1994

Seasonality of birth and ventricular enlargement in chronic schizophrenia

Thierry d'Amato; Thierry Rochet; Jean Dalery; Jean-Hugues Chauchat; Jean-Pierre Martin; Michel Marie-Cardine

Many studies have established that birth dates during the winter and early spring months are more common in schizophrenic patients than in the general population. It has been hypothesized that children born in winter are more likely to be exposed to environmental factors which could lead to the development of schizophrenia later in life. Another finding of interest has been the demonstration in brain-imaging studies that mild ventricular enlargement is more often found in schizophrenic patients than in healthy control subjects. In the present report, an increased incidence of ventricular enlargement was found in schizophrenic patients born in the winter months. Although the relationship between seasonality of birth and brain abnormalities is unclear, these phenomena could be partly linked.


Portuguese Conference on Artificial Intelligence | 2007

N-grams and morphological normalization in text classification: a comparison on a Croatian-English parallel corpus

Artur Šilić; Jean-Hugues Chauchat; Bojana Dalbelo Bašić; Annie Morin

In this paper we compare n-grams and morphological normalization, two inherently different text-preprocessing methods, used for text classification on a Croatian-English parallel corpus. Our approach to comparing different text preprocessing techniques is based on measuring computational performance (execution time and memory consumption), as well as classification performance. We show that although n-grams achieve classifier performance comparable to traditional word-based feature extraction and can act as a substitute for morphological normalization, they are computationally much more demanding.


Psychiatry Research: Neuroimaging | 1992

Relationship between symptoms rated with the Positive and Negative Syndrome Scale and brain measures in schizophrenia.

Thierry d'Amato; Thierry Rochet; Jean Dalery; Annie Laurent; Jean-Hugues Chauchat; Jean-Louis Terra; Michel Marie-Cardine

The Positive and Negative Syndrome Scale (PANSS) was used to rate clinical symptoms in 42 inpatients with schizophrenia before they were examined by computed tomography. Significantly higher mean size of lateral and third ventricles, and higher mean cortical atrophy were found in schizophrenic patients compared with healthy control subjects. Ventricular enlargement and cortical atrophy were significantly related to low scores on the Composite subscale of the PANSS. Positive correlations were observed mainly with negative items such as blunted affect, emotional withdrawal, difficulties in abstract thinking, passive-apathetic social withdrawal, and lack of spontaneity of conversation. Additional positive correlations were observed with two items from the General Psychopathology subscale (mannerisms and disorientation). Inverse correlations were found with most positive items. These results suggest a relationship between brain structural abnormalities and the symptomatology of schizophrenia recorded with PANSS.


Artificial Intelligence in Medicine in Europe | 2009

Subgroup Discovery in Data Sets with Multi-dimensional Responses: A Method and a Case Study in Traumatology

Lan Umek; Blaž Zupan; Marko Toplak; Annie Morin; Jean-Hugues Chauchat; Gregor Makovec; Dragica Smrke

Biomedical experimental data sets often include many features both at input (description of cases, treatments, or experimental parameters) and output (outcome description). State-of-the-art data mining techniques can deal with such data, but consider only one output feature at a time, disregarding any dependencies among them. In this paper, we propose a technique that can treat many output features simultaneously, aiming at finding subgroups of cases that are similar both in input and output space. The method is based on k-medoids clustering and analysis of contingency tables, and reports on case subgroups with significant dependency in input and output space. We have used this technique in explorative analysis of clinical data on femoral neck fractures. The subgroups discovered in our study were considered meaningful by the participating domain expert, and sparked a number of ideas for hypotheses to be further tested experimentally.
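The contingency-table step mentioned in this abstract can be pictured with a small sketch. It assumes that cases have already been clustered twice, once in input space and once in output space, and it cross-tabulates the two labelings and computes a Pearson chi-square statistic. The abstract does not say which test the authors used, so the choice of statistic, the function names, and the label lists are assumptions for illustration; the clustering itself is not shown.

```python
from collections import Counter


def contingency_table(input_labels, output_labels):
    """Cross-tabulate two cluster labelings of the same cases."""
    rows = sorted(set(input_labels))
    cols = sorted(set(output_labels))
    counts = Counter(zip(input_labels, output_labels))
    return [[counts[(r, c)] for c in cols] for r in rows]


def chi_square(table):
    """Pearson chi-square statistic of a contingency table (assumed test)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Invented cluster labels for eight cases (not data from the study).
input_clusters = [0, 0, 0, 0, 1, 1, 1, 1]
output_clusters = [0, 0, 0, 1, 1, 1, 1, 0]

table = contingency_table(input_clusters, output_clusters)
stat = chi_square(table)
print(table, stat)  # [[3, 1], [1, 3]] 2.0
```

A large statistic relative to the table's degrees of freedom would point to dependency between input and output clusters, which is the kind of signal the abstract describes.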


Advances in Social Networks Analysis and Mining | 2009

Definition and Measures of an Opinion Model for Mining Forums

Anna Stavrianou; Julien Velcin; Jean-Hugues Chauchat

Online discussion systems in the form of forums have recently been analyzed by using graphs and social network techniques. Each forum is regarded as a social network and is modeled by a graph whose vertices represent forum participants. In this paper, we focus on the structure and the opinion content of the forum posts, and we examine the social network that develops from a semantic point of view. We formally define an opinion-oriented model whose purpose is to provide complementary information to the knowledge extracted by the social network model. We define and present measures that can give important information regarding the opinion flow as well as the general attitude of users and towards users throughout the whole forum. Applying our model to a real forum found on the Web shows the additional information that can be extracted.


Archive | 2001

Sampling Strategy for Building Decision Trees from Very Large Databases Comprising Many Continuous Attributes

Jean-Hugues Chauchat; Ricco Rakotomalala

We propose a fast and efficient sampling strategy to build decision trees from a very large database, even when there are many continuous attributes which must be discretized at each step. Successive samples are used, one at each tree node. After a brief description of two fast sequential simple random sampling methods, we apply elements of statistical theory in order to determine the sample size that is sufficient at each step to obtain a decision tree as efficient as one built on the whole database. Applying the method to a simulated database (of virtually infinite size) and to five usual benchmarks confirms that when the database is large and contains many numerical attributes, our strategy of fast sampling at each node (with a sample size of about n = 300 or 500) speeds up the mining process while maintaining the accuracy of the classifier.
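The abstract does not name its two sequential sampling methods. As a hedged illustration of what drawing a simple random sample in a single sequential pass can look like, the sketch below uses classic selection sampling (Knuth's Algorithm S), which may or may not be one of the methods in the paper. The function name and arguments are hypothetical; the sample size of 300 is taken from the abstract.

```python
import random


def sequential_sample(records, population_size, n, rng=None):
    """Draw a simple random sample of size n in one pass over the records.

    Each record is kept with probability
    (items still needed) / (records not yet examined),
    which yields exactly n items when population_size >= n.
    """
    rng = rng or random.Random()
    chosen = []
    needed = n
    for seen, record in enumerate(records):
        if needed == 0:
            break  # sample complete, stop scanning
        remaining = population_size - seen
        if rng.random() * remaining < needed:
            chosen.append(record)
            needed -= 1
    return chosen


# Example: sample 300 row indices out of 10,000, as might be done at one tree node.
node_rows = range(10_000)
sample = sequential_sample(node_rows, len(node_rows), 300, random.Random(0))
print(len(sample))
```

Because records are visited in order and the scan stops as soon as the sample is full, the cost per node depends on how far the scan runs rather than on repeated random access to the database.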


Intelligent Data Analysis | 2009

Textual features for corpus visualization using correspondence analysis

Sasa Petrovic; Bojana Dalbelo Bašić; Annie Morin; Blaž Zupan; Jean-Hugues Chauchat

Explorative data analysis in text mining essentially relies on effective visualization techniques which can expose hidden relationships among documents and reveal correspondence between documents and their features. In text mining, the documents are most often represented by feature vectors of very high dimensions, requiring dimensionality reduction to obtain visual projections in two- or three-dimensional space. Correspondence analysis is an unsupervised approach that allows for construction of low-dimensional projection space with simultaneous placement of both documents and features, making it ideal for explorative analysis in text mining. Its present use, however, has been limited to word-based features. In this paper, we investigate how this particular document representation compares to the representation with letter n-grams and word n-grams, and find that these alternative representations yield better results in separating documents of different class. We perform our experimental analysis on a bilingual Croatian-English parallel corpus, allowing us to additionally explore the impact of features in different languages on the quality of visualizations.
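For readers unfamiliar with correspondence analysis, its standard textbook formulation can be written as follows. The notation is chosen here and is not taken from the paper: N is assumed to be a documents-by-features count matrix with grand total n.

```latex
P = \frac{1}{n} N, \qquad
r = P\,\mathbf{1}, \qquad
c = P^{\top}\mathbf{1}, \qquad
D_r = \operatorname{diag}(r), \qquad
D_c = \operatorname{diag}(c)

S = D_r^{-1/2}\,\bigl(P - r c^{\top}\bigr)\,D_c^{-1/2} = U \Sigma V^{\top}

F = D_r^{-1/2}\, U \Sigma
\quad \text{(document coordinates)}, \qquad
G = D_c^{-1/2}\, V \Sigma
\quad \text{(feature coordinates)}
```

Keeping the first two or three columns of F and G places documents and features in the same low-dimensional space, which is the simultaneous placement the abstract refers to.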


Expert Systems With Applications | 2012

Visualization of temporal text collections based on Correspondence Analysis

Artur Šilić; Annie Morin; Jean-Hugues Chauchat; Bojana Dalbelo Bašić

In this paper, we present CatViz (Temporally-Sliced Correspondence Analysis Visualization). This novel method visualizes relationships through time and is suitable for large-scale temporal multivariate data. We couple CatViz with clustering methods, whereupon we introduce the concept of final centroid transfer, which enables the correspondence of clusters in time. Although CatViz can be used on any type of temporal data, we show how it can be applied to the task of exploratory visual analysis of text collections. We present a successful concept of employing feature-type filtering to present different aspects of textual data. We performed case studies on large collections of French and English news articles. In addition, we conducted a user study that confirms the usefulness of our method. We present typical tasks of exploratory text analysis and discuss application procedures that an analyst might perform. We believe that CatViz is general and highly applicable to large data sets because of its intuitiveness, effectiveness, and robustness. We expect that it will enable a better understanding of texts in huge historical archives.
