Publication


Featured research published by Christophe Moulin.


ACM Symposium on Applied Computing | 2011

Entropy based feature selection for text categorization

Christine Largeron; Christophe Moulin; Mathias Géry

In text categorization, feature selection can be essential not only for reducing the index size but also for improving the performance of the classifier. In this article, we propose a feature selection criterion called Entropy-based Category Coverage Difference (ECCD). This criterion is based on the distribution of the documents containing the term across the categories, while also taking the entropy of that distribution into account. ECCD compares favorably with the usual feature selection methods based on document frequency (DF), information gain (IG), mutual information (MI), χ², odds ratio and GSS on a large collection of XML documents from the Wikipedia encyclopedia. Moreover, this comparative study confirms the effectiveness of feature selection techniques derived from the χ² statistic.
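
A minimal illustrative sketch of the kind of criterion the abstract describes, combining per-category document coverage with the entropy of a term's category distribution. This is not the exact ECCD formula from the paper; the combination below is an assumption made for illustration only.

# Illustrative entropy-weighted coverage criterion (NOT the paper's exact ECCD formula).
# It only reuses the two ingredients named in the abstract: (1) the difference in
# document coverage of a term between categories and (2) the entropy of the term's
# distribution over categories.
import numpy as np

def entropy_weighted_coverage(doc_term: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """doc_term: (n_docs, n_terms) binary term-presence matrix.
    labels: (n_docs,) integer category ids.
    Returns one score per term; higher = more discriminative."""
    categories = np.unique(labels)
    # Documents containing each term, counted per category: (n_cats, n_terms)
    counts = np.vstack([doc_term[labels == c].sum(axis=0) for c in categories])
    p = counts / (counts.sum(axis=0) + 1e-12)
    # Normalized entropy of the term's category distribution (0 = concentrated, 1 = uniform)
    ent = -(p * np.log2(p + 1e-12)).sum(axis=0) / np.log2(len(categories))
    # Coverage difference: best per-category coverage rate minus mean coverage elsewhere
    cat_sizes = np.array([(labels == c).sum() for c in categories])[:, None]
    coverage = counts / cat_sizes
    best = coverage.max(axis=0)
    rest = (coverage.sum(axis=0) - best) / (len(categories) - 1)
    return (best - rest) * (1.0 - ent)

# Keep, e.g., the 1000 highest-scoring terms:
# keep = np.argsort(scores)[::-1][:1000]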


Pattern Recognition | 2014

Fisher Linear Discriminant Analysis for text-image combination in multimedia information retrieval

Christophe Moulin; Christine Largeron; Christophe Ducottet; Mathias Géry; Cécile Barat

In multimedia information retrieval, combining different modalities (text, image, audio or video) provides additional information and generally improves overall system performance. For this purpose, linear combination is a simple, flexible and effective method. However, it requires choosing the weight assigned to each modality, which remains an open problem and is the issue addressed in this paper. Our approach, based on Fisher Linear Discriminant Analysis, aims to learn these weights for multimedia documents composed of text and images, both represented with the classical bag-of-words model. Our method was tested on the ImageCLEF 2008 and 2009 datasets. Results demonstrate that our combination approach not only outperforms the use of the textual modality alone but also provides a nearly optimal learning of the weights with an efficient computation. Moreover, the method allows combining more than two modalities without increasing the complexity and thus the computing time. Highlights: We model text and image documents with a bag-of-words approach. We use Fisher LDA to learn the weight assigned to each modality. We experiment with our model on the ImageCLEF 2008 and 2009 datasets. Our model outperforms the use of the textual modality alone. Our method provides a nearly optimal learning of the weights with an efficient computation.
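
A minimal sketch of learning linear-combination weights with Fisher Linear Discriminant Analysis, assuming each (query, document) pair is described by a small score vector (e.g. [text score, image score]) and a relevance label. The function names and normalization are illustrative, not the paper's exact protocol.

# Closed-form Fisher LDA direction used as modality weights.
import numpy as np

def fisher_lda_weights(scores: np.ndarray, relevant: np.ndarray) -> np.ndarray:
    """scores: (n_pairs, n_modalities) per-modality relevance scores.
    relevant: (n_pairs,) boolean relevance judgments.
    Returns the direction maximizing between-class over within-class scatter."""
    pos, neg = scores[relevant], scores[~relevant]
    mu_pos, mu_neg = pos.mean(axis=0), neg.mean(axis=0)
    # Within-class scatter matrix
    sw = np.cov(pos, rowvar=False) * (len(pos) - 1) + np.cov(neg, rowvar=False) * (len(neg) - 1)
    w = np.linalg.solve(sw, mu_pos - mu_neg)   # classic LDA solution
    return w / np.abs(w).sum()                 # normalize the weights' magnitudes

# Fused score for a new pair: np.dot(w, [text_score, image_score])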


Cross Language Evaluation Forum | 2008

UJM at ImageCLEFwiki 2008

Christophe Moulin; Cécile Barat; Mathias Géry; Christophe Ducottet; Christine Largeron

This paper reports our multimedia information retrieval experiments carried out for the ImageCLEF track (ImageCLEFwiki [10]). We propose a new multimedia model combining textual and/or visual information, which makes it possible to perform textual, visual, or multimedia queries. We experiment with the model on ImageCLEF data and compare the results obtained using the different modalities. Our multimedia document model is based on a vector of textual and visual terms. Textual terms correspond to textual words, while the visual ones are computed using local colour features. We obtain good results using only the textual part and we show that the visual information is useful in some particular cases.
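
A standard bag-of-visual-words sketch of how "visual terms" can be built from local colour features, so that an image becomes a histogram that can sit next to textual terms in one vector. The clustering method and vocabulary size are assumptions, not the paper's exact choices.

# Build a visual vocabulary with k-means and turn an image into a visual-term histogram.
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors: np.ndarray, n_words: int = 500) -> KMeans:
    """all_descriptors: (n_patches_total, d) local colour features from training images."""
    return KMeans(n_clusters=n_words, n_init=4, random_state=0).fit(all_descriptors)

def image_histogram(descriptors: np.ndarray, vocab: KMeans) -> np.ndarray:
    """descriptors: (n_patches, d) local features of one image."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)   # L1-normalized "visual term" vector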


Content Based Multimedia Indexing | 2010

Fusion of tf.idf weighted bag of visual features for image classification

Christophe Moulin; Cécile Barat; Christophe Ducottet

The bag-of-visual-words representation is commonly used in image classification. Features are extracted from images and clustered into a visual vocabulary. Images can then be represented as normalized histograms of visual words, similarly to textual documents represented as weighted vectors of terms. As a result, text categorization techniques are applicable to image classification. In this paper, our contribution is twofold. First, we propose a suitable term frequency-inverse document frequency (tf.idf) weighting scheme to characterize the importance of visual words. Second, we present a method to fuse different bags of visual words obtained with different vocabularies. We show that using our tf.idf normalization and this fusion leads to better classification rates than other normalization methods, other fusion schemes or other approaches evaluated on the SIMPLIcity collection.
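
A sketch of tf.idf weighting applied to visual-word histograms and a simple fusion of several vocabularies by concatenation; the exact weighting variant and fusion rule used in the paper may differ.

# tf.idf weighting of visual words and late fusion of several vocabularies.
import numpy as np

def tfidf(histograms: np.ndarray) -> np.ndarray:
    """histograms: (n_images, n_visual_words) raw counts of visual words."""
    tf = histograms / np.maximum(histograms.sum(axis=1, keepdims=True), 1.0)
    df = (histograms > 0).sum(axis=0)                       # images containing each word
    idf = np.log(histograms.shape[0] / np.maximum(df, 1.0))
    return tf * idf

def fuse(*weighted_bags: np.ndarray) -> np.ndarray:
    """Fuse bags built from different vocabularies (e.g. different descriptors
    or vocabulary sizes) by concatenating the weighted vectors."""
    return np.hstack(weighted_bags)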


Cross Language Evaluation Forum | 2009

Combining text/image in WikipediaMM task 2009

Christophe Moulin; Cécile Barat; Cédric Lemaitre; Mathias Géry; Christophe Ducottet; Christine Largeron

This paper reports our multimedia information retrieval experiments carried out for the ImageCLEF Wikipedia task 2009. We extend our previous multimedia model, defined as a vector of textual and visual information based on a bag-of-words approach [6]. We extract additional textual information from the original Wikipedia articles and compute several image descriptors (local colour and texture features). We show that linearly combining textual and visual information significantly improves the results.
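
A minimal sketch of the kind of linear text/image score fusion the abstract describes, assuming each modality already yields a bag-of-words vector for documents and queries. The cosine scoring and the example weight are illustrative assumptions.

# Linear fusion of per-modality cosine similarities.
import numpy as np

def modality_score(doc_vec: np.ndarray, query_vec: np.ndarray) -> float:
    """Cosine similarity between bag-of-words vectors of one modality."""
    denom = np.linalg.norm(doc_vec) * np.linalg.norm(query_vec)
    return float(doc_vec @ query_vec / denom) if denom else 0.0

def fused_score(text_doc, text_query, vis_doc, vis_query, alpha: float = 0.7) -> float:
    """alpha weighs the textual modality, (1 - alpha) the visual one (illustrative value)."""
    return alpha * modality_score(text_doc, text_query) + \
           (1 - alpha) * modality_score(vis_doc, vis_query)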


Advances in Focused Retrieval | 2009

UJM at INEX 2008 XML Mining Track

Mathias Géry; Christine Largeron; Christophe Moulin

This paper reports our experiments carried out for the INEX XML Mining track, which consists of developing categorization (or classification) and clustering methods for XML documents. We represent XML documents as vectors of indexed terms. For our first participation, the purpose of our experiments is twofold. Firstly, our overall aim is to set up a text-only categorization approach that can be used as a baseline for further work taking into account the structure of the XML documents. Secondly, our goal is to define two criteria (CC and CCE), based on term distribution, for reducing the size of the index. The results of our baseline are good and, using our two criteria, we improve them while slightly reducing the index; the results are slightly worse when we sharply reduce the size of the term index.
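
A sketch of a text-only categorization baseline in the spirit of the abstract. The paper's CC and CCE index-reduction criteria are not reproduced here; capping the vocabulary size stands in for them, and the pipeline components are illustrative choices.

# Text-only categorization baseline with a crude index-size limit.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def baseline(train_texts, train_labels, max_terms: int = 20000):
    # max_features limits the index size; the paper uses its own CC/CCE criteria instead.
    model = make_pipeline(TfidfVectorizer(max_features=max_terms), LinearSVC())
    return model.fit(train_texts, train_labels)

# clf = baseline(xml_docs_as_text, categories)
# predictions = clf.predict(test_docs_as_text)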


Intelligent Data Analysis | 2012

MCut: a thresholding strategy for multi-label classification

Christine Largeron; Christophe Moulin; Mathias Géry

Multi-label classification is a frequent task in machine learning, notably in text categorization. When binary classifiers are not suited, an alternative consists of using a multiclass classifier that provides, for each document, a score per category, and then applying a thresholding strategy to select the set of categories to assign to the document. The common thresholding strategies, such as the RCut, PCut and SCut methods, need a training step to determine the value of the threshold. To overcome this limitation, we propose a new strategy, called MCut, which automatically estimates a value for the threshold. This method does not have to be trained and does not need any parametrization. Experiments performed on two textual corpora, the XML Mining 2009 and RCV1 collections, show that the MCut strategy is on par with the state of the art while being easy to implement and parameter-free.
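
A short sketch of the MCut idea as described in the abstract: place the threshold in the largest gap between consecutive sorted scores, so no training step or parameter is needed. Details may differ from the paper's exact definition.

# MCut-style thresholding for one document's category scores.
import numpy as np

def mcut_select(scores: np.ndarray) -> np.ndarray:
    """scores: (n_categories,) scores of one document. Returns selected category indices."""
    s = np.sort(scores)[::-1]                 # scores in decreasing order
    gaps = s[:-1] - s[1:]                     # differences between consecutive scores
    i = int(np.argmax(gaps))                  # position of the maximum gap
    threshold = (s[i] + s[i + 1]) / 2.0       # cut in the middle of that gap
    return np.where(scores > threshold)[0]

# Example: mcut_select(np.array([0.9, 0.85, 0.2, 0.1])) selects categories 0 and 1.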


SSPR&SPR'10: Proceedings of the 2010 Joint IAPR International Conference on Structural, Syntactic, and Statistical Pattern Recognition | 2010

Impact of visual information on text and content based image retrieval

Christophe Moulin; Christine Largeron; Mathias Géry

Nowadays, multimedia documents composed of text and images are increasingly common, thanks to the Internet and the growing capacity of data storage, and it is more and more important to be able to retrieve needles in this huge haystack. In this paper, we present a multimedia document model which combines textual and visual information. Using a bag-of-words approach, it represents a document with one vector per modality. Given a multimedia query, our model combines the scores obtained for each modality and returns a list of relevant retrieved documents. This paper studies the influence of the weight given to the visual information relative to the textual information. Experiments on the multimedia ImageCLEF collection show that results can be improved by learning this weight parameter.
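
A sketch of how such a weight can be studied or learned: sweep the visual/textual weight on a validation set and keep the value with the best retrieval metric. The grid and the idea of passing an evaluation callback are illustrative, not the paper's exact protocol.

# Grid search over the modality weight on validation queries.
import numpy as np

def best_alpha(text_scores: np.ndarray, visual_scores: np.ndarray,
               evaluate, grid=np.linspace(0.0, 1.0, 21)):
    """text_scores, visual_scores: (n_queries, n_docs) score matrices.
    evaluate: callable mapping a fused score matrix to a retrieval metric (e.g. MAP).
    Returns (alpha, metric) for the best textual weight alpha."""
    results = [(a, evaluate(a * text_scores + (1 - a) * visual_scores)) for a in grid]
    return max(results, key=lambda t: t[1])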


INEX'09: Proceedings of Focused Retrieval and Evaluation, 8th International Conference of the Initiative for the Evaluation of XML Retrieval | 2009

UJM at INEX 2009 XML mining track

Christine Largeron; Christophe Moulin; Mathias Géry


Extraction et gestion des connaissances (EGC'2012) | 2012

Apprentissage par analyse linéaire discriminante des paramètres de fusion pour la recherche d'information multimédia texte-image [Learning fusion parameters by linear discriminant analysis for text-image multimedia information retrieval]

Christophe Moulin; Christine Largeron; Cécile Barat; Mathias Géry; Christophe Ducottet
