Mathias Géry | Researchain

Publication

Featured research published by Mathias Géry.

ACM Symposium on Applied Computing | 2011

Entropy based feature selection for text categorization

Christine Largeron; Christophe Moulin; Mathias Géry

In text categorization, feature selection can be essential not only for reducing the index size but also for improving classifier performance. In this article, we propose a feature selection criterion called Entropy based Category Coverage Difference (ECCD). This criterion is based on the distribution across categories of the documents containing the term, and it also takes into account the term's entropy. On a large collection of XML documents from the Wikipedia encyclopedia, ECCD compares favorably with the usual feature selection methods based on document frequency (DF), information gain (IG), mutual information (MI), χ², odds ratio and GSS. Moreover, this comparative study confirms the effectiveness of feature selection techniques derived from the χ² statistic.
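The abstract compares ECCD against χ²-based selection. As a hedged illustration of that baseline (not of ECCD itself, whose exact formula is given in the paper), the sketch below scores a term/category pair with the standard χ² statistic on a 2x2 document contingency table, and computes the Shannon entropy of a term's document counts across categories, the kind of quantity an entropy-based criterion draws on. The function names and the base-2 logarithm are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Standard chi-square score of a term for one category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - c * b) ** 2 / denom


def category_entropy(doc_counts: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a term's document counts per category.

    Low entropy: the term is concentrated in few categories.
    High entropy: the term is spread evenly over the categories.
    """
    total = sum(doc_counts)
    if total == 0:
        return 0.0
    h = 0.0
    for count in doc_counts:
        if count > 0:
            p = count / total
            h -= p * math.log2(p)
    return h


def top_k_terms(scores: Dict[str, float], k: int) -> List[str]:
    """Keep the k terms with the highest selection score."""
    return sorted(scores, key=lambda t: scores[t], reverse=True)[:k]


if __name__ == "__main__":
    scores = {
        "xml": chi_square(10, 0, 0, 10),  # appears only in the category
        "the": chi_square(5, 5, 5, 5),    # spread evenly, uninformative
    }
    print(top_k_terms(scores, 1))  # ['xml']
    print(category_entropy([4, 4]), category_entropy([8, 0]))  # 1.0 0.0
```

A selection criterion of this family ranks every term by its score (often the maximum or average over categories) and keeps the top k as index features.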

Pattern Recognition | 2014

Fisher Linear Discriminant Analysis for text-image combination in multimedia information retrieval

Christophe Moulin; Christine Largeron; Christophe Ducottet; Mathias Géry; Cécile Barat

In multimedia information retrieval, combining different modalities (text, image, audio or video) provides additional information and generally improves overall system performance. For this purpose, linear combination is presented as a simple, flexible and effective method. However, it requires choosing the weight assigned to each modality. This issue is still an open problem and is addressed in this paper. Our approach, based on Fisher Linear Discriminant Analysis, aims to learn these weights for multimedia documents composed of text and images. Text and images are both represented with the classical bag-of-words model. Our method was tested on the ImageCLEF 2008 and 2009 datasets. Results demonstrate that our combination approach not only outperforms the use of the textual modality alone but also learns nearly optimal weights with an efficient computation. Moreover, the method can combine more than two modalities without increasing the complexity and thus the computing time.

Highlights:
- We model text and image documents with a bag-of-words approach.
- We use Fisher LDA to learn the weights assigned to each modality.
- We evaluate our model on the ImageCLEF 2008 and 2009 datasets.
- Our model outperforms the use of the textual modality alone.
- Our method learns nearly optimal weights with an efficient computation.
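To make the weight-learning idea concrete, here is a minimal sketch, assuming each training document comes with one similarity score per modality and a relevance label; the classical Fisher direction w = S_w⁻¹(m_rel − m_nonrel) is then used directly as the combination weights. This is the textbook Fisher LDA computation, not necessarily the paper's exact procedure; the function names, the score matrix layout and the normalization of the weights are assumptions.

```python
import numpy as np


def fisher_modality_weights(scores: np.ndarray, relevant: np.ndarray) -> np.ndarray:
    """Fisher discriminant direction used as linear-combination weights.

    scores:   (n_docs, n_modalities) array, one similarity score per modality
              (e.g. column 0 = text score, column 1 = image score).
    relevant: (n_docs,) boolean array, True for documents judged relevant.
    Returns weights normalized so their absolute values sum to 1.
    """
    pos, neg = scores[relevant], scores[~relevant]
    m_pos, m_neg = pos.mean(axis=0), neg.mean(axis=0)
    # Within-class scatter matrix S_w = S_relevant + S_nonrelevant.
    s_w = (pos - m_pos).T @ (pos - m_pos) + (neg - m_neg).T @ (neg - m_neg)
    # Fisher direction w = S_w^{-1} (m_pos - m_neg), solved without an explicit inverse.
    w = np.linalg.solve(s_w, m_pos - m_neg)
    return w / np.abs(w).sum()


def combine(scores: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted linear combination of the per-modality scores."""
    return scores @ weights


if __name__ == "__main__":
    scores = np.array([
        [0.9, 0.6], [0.8, 0.7], [0.7, 0.5],   # relevant documents
        [0.2, 0.4], [0.3, 0.2], [0.1, 0.3],   # non-relevant documents
    ])
    relevant = np.array([True, True, True, False, False, False])
    w = fisher_modality_weights(scores, relevant)
    print(w)  # approximately [0.667 0.333]: text weighted twice as much as image
    print(combine(scores, w))
```

Adding a third modality only adds a column to `scores`, so S_w grows from 2x2 to 3x3 while the rest of the computation is unchanged.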

Advances in Social Networks Analysis and Mining | 2012

Combining Relations and Text in Scientific Network Clustering

David Combe; Christine Largeron; Elöd Egyed-Zsigmond; Mathias Géry

In this paper, we present different combined clustering methods and evaluate their performance and results on a dataset with ground truth. This dataset, built from several sources, contains a scientific social network in which textual data is associated with each vertex and the classes are known. While clustering has been widely studied in both graph clustering and unsupervised learning, combined clustering, which simultaneously exploits the relationships between vertices and the attributes describing them, is relatively new. We argue that, depending on the kind of data available and the type of results wanted, the choice of clustering method is important, and we present some concrete examples to illustrate this.

Knowledge and Information Systems | 2012

BM25t: a BM25 extension for focused information retrieval

Mathias Géry; Christine Largeron

This paper addresses the integration of XML tags into a term-weighting function for focused XML information retrieval (IR). Our model allows us to consider a certain kind of structural information: tags that represent a logical structure (e.g., title, section, paragraph) as well as other tags (e.g., bold, italic, center). We take into account the influence of a tag by estimating the probability that this tag distinguishes relevant terms from the others. These weights are then integrated into a term-weighting function. Experiments on a large collection from the INEX 2008 XML IR evaluation campaign showed improvements in focused XML retrieval.
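As a hedged sketch of how per-tag weights can enter a BM25-style term score, the code below lets each occurrence of a term count with the weight of its enclosing tag, then feeds that weighted frequency into a standard BM25 term score. In BM25t the tag weights are probabilities estimated from relevance information, and the exact way they are integrated is defined in the paper; here `tag_weights` is a hypothetical hand-filled dictionary, the weighted-tf integration is an illustrative assumption, and k1 = 1.2, b = 0.75 are common BM25 defaults rather than the paper's settings.

```python
import math
from typing import Dict, List

K1 = 1.2   # common BM25 default, not taken from the paper
B = 0.75   # common BM25 default, not taken from the paper


def tag_weighted_tf(occurrence_tags: List[str], tag_weights: Dict[str, float]) -> float:
    """Term frequency in which each occurrence counts with the weight of its tag.

    occurrence_tags lists, for each occurrence of the term in the element,
    the tag enclosing it. Tags absent from tag_weights count as 1.0,
    so with an empty dict this reduces to the ordinary term frequency.
    """
    return sum(tag_weights.get(tag, 1.0) for tag in occurrence_tags)


def bm25_term_score(tf: float, doc_len: float, avg_doc_len: float,
                    n_docs: int, doc_freq: int, k1: float = K1, b: float = B) -> float:
    """Contribution of one query term to the BM25 score of one element."""
    # Lucene-style idf, which stays positive even for very common terms.
    idf = math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)


if __name__ == "__main__":
    weights = {"title": 2.0, "b": 1.5}  # hypothetical tag weights
    tags = ["title", "p", "p"]          # term seen once in a title, twice in paragraphs
    plain = bm25_term_score(len(tags), 100, 120, 1000, 50)
    boosted = bm25_term_score(tag_weighted_tf(tags, weights), 100, 120, 1000, 50)
    print(plain, boosted)  # boosted > plain: the title occurrence counts double
```

Because the BM25 term score saturates in tf, a heavily weighted tag raises the score but cannot make a single term dominate without bound.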

Advances in Focused Retrieval | 2009

UJM at INEX 2008: Pre-impacting of Tags Weights

Mathias Géry; Christine Largeron; Franck Thollard

This paper addresses the impact of structure on the term-weighting function in the context of focused information retrieval (IR). Our model considers a certain kind of structural information: tags that represent logical structure (title, section, paragraph, etc.) and tags related to formatting (bold, italic, center, etc.). We take into account the influence of tags by estimating the probability that a tag distinguishes relevant terms. This weight is integrated into the term-weighting function. Experiments on a large collection during the INEX 2008 IR competition showed improvements for focused retrieval.

Web Intelligence | 2008

Integrating Structure in the Probabilistic Model for Information Retrieval

Mathias Géry; Christine Largeron; Franck Thollard

In databases and on the World Wide Web, many documents are in a structured format (e.g. XML). In this article, we propose to extend the classical probabilistic IR model to take structure into account through the weighting of tags. Our approach includes a learning step in which the weight of each tag is computed. This weight estimates the probability that the tag distinguishes the most relevant terms. Our model has been evaluated on a large collection during the INEX IR evaluation campaigns.

Advances in Social Networks Analysis and Mining | 2012

Getting Clusters from Structure Data and Attribute Data

David Combe; Christine Largeron; Elöd Egyed-Zsigmond; Mathias Géry

While clustering has been widely studied in both graph clustering and unsupervised learning, combined clustering, which simultaneously exploits the relationships between vertices and the attributes describing them, is relatively new. In this paper, we present different scenarios for this task and evaluate their performance and results on a dataset with ground truth. The dataset is built from several sources and contains a scientific social network in which textual data is associated with each vertex and the classes are known. We argue that, depending on the kind of data available and the type of results wanted, the choice of clustering method is important, and we present some concrete examples to illustrate this.

Cross-Language Evaluation Forum | 2008

UJM at ImageCLEFwiki 2008

Christophe Moulin; Cécile Barat; Mathias Géry; Christophe Ducottet; Christine Largeron

This paper reports our multimedia information retrieval experiments carried out for the ImageCLEF track (ImageCLEFwiki [10]). We propose a new multimedia model combining textual and/or visual information, which supports textual, visual or multimedia queries. We evaluate the model on ImageCLEF data and compare the results obtained with the different modalities. Our multimedia document model is based on a vector of textual and visual terms. Textual terms correspond to words, while visual terms are computed from local colour features. We obtain good results using only the textual part, and we show that the visual information is useful in some particular cases.

Intelligent Data Analysis | 2015

I-Louvain: An Attributed Graph Clustering Method

David Combe; Christine Largeron; Mathias Géry; Előd Egyed-Zsigmond

Modularity estimates the quality of a partition of a graph into communities of highly interconnected vertices. In this article, we introduce a complementary measure, based on inertia and specifically designed to evaluate the quality of a partition with respect to real-valued attributes describing the vertices. We also propose I-Louvain, a graph node clustering method that uses our criterion, combined with Newman's modularity, to detect communities in attributed graphs where real-valued attributes are associated with the vertices. Our experiments show that combining relational information with the attributes detects communities more efficiently than using only one type of information. In addition, our method is more robust to data degradation.
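I-Louvain combines Newman's modularity with the authors' inertia-based criterion. The inertia term is defined in the paper and is not reproduced here; as a hedged sketch of the relational half only, the function below computes Newman's modularity for a given partition of an undirected, unweighted graph. The edge-list and dictionary representation are assumptions made for illustration. A Louvain-style method greedily moves vertices to neighbouring communities whenever this increases the quality score, then aggregates communities and repeats.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(edges: Iterable[Tuple[Hashable, Hashable]],
               community: Dict[Hashable, int]) -> float:
    """Newman modularity of a partition of an undirected, unweighted graph.

    Uses the per-community form Q = sum_c [ L_c / m - (d_c / 2m)^2 ],
    where m is the number of edges, L_c the number of edges inside
    community c, and d_c the sum of the degrees of the vertices in c.
    """
    edge_list = list(edges)
    m = len(edge_list)
    if m == 0:
        return 0.0
    internal: Dict[int, int] = defaultdict(int)
    degree_sum: Dict[int, int] = defaultdict(int)
    for u, v in edge_list:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
    merged = {v: 0 for v in range(6)}
    print(modularity(edges, split))   # 5/14, about 0.357
    print(modularity(edges, merged))  # 0.0
```

Putting every vertex in one community always gives Q = 0, so a positive value means the partition has more internal edges than a random graph with the same degrees would.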

Cross-Language Evaluation Forum | 2009

Combining text/image in wikipediaMM task 2009

Christophe Moulin; Cécile Barat; Cédric Lemaitre; Mathias Géry; Christophe Ducottet; Christine Largeron

This paper reports our multimedia information retrieval experiments carried out for the ImageCLEF Wikipedia task 2009. We extend our previous multimedia model, defined as a vector of textual and visual information based on a bag-of-words approach [6]. We extract additional textual information from the original Wikipedia articles and compute several image descriptors (local colour and texture features). We show that linearly combining textual and visual information significantly improves the results.
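The abstract reports that a linear combination of textual and visual information improves results. A minimal sketch of such late fusion follows, assuming each modality already yields a score per document. The min-max normalization, the dictionary representation and the value of `alpha` are illustrative assumptions, not details from the paper; choosing `alpha` is the weight-selection problem that the 2014 Pattern Recognition paper listed on this page addresses with Fisher LDA.

```python
from typing import Dict


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one modality's scores to [0, 1] so the modalities are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_fusion(text: Dict[str, float], visual: Dict[str, float],
                  alpha: float) -> Dict[str, float]:
    """alpha * text + (1 - alpha) * visual, after min-max normalization.

    A document missing from one modality's result list gets 0 for it.
    """
    t, v = min_max(text), min_max(visual)
    return {d: alpha * t.get(d, 0.0) + (1.0 - alpha) * v.get(d, 0.0)
            for d in set(t) | set(v)}


if __name__ == "__main__":
    text = {"a": 2.0, "b": 1.0, "c": 0.0}
    visual = {"a": 0.0, "b": 1.0, "c": 0.5}
    fused = linear_fusion(text, visual, alpha=0.8)  # alpha is a hypothetical value
    print(sorted(fused, key=lambda d: fused[d], reverse=True))  # ['a', 'b', 'c']
```

Normalizing first matters because raw textual and visual similarity scores usually live on different scales, so an unnormalized weighted sum would be dominated by whichever modality has the larger range.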
