Cyril Goutte
National Research Council
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Cyril Goutte.
NeuroImage | 1999
Cyril Goutte; Peter Aundal Toft; Egill Rostrup; Finn Årup Nielsen; Lars Kai Hansen
Analysis of fMRI time series is often performed by extracting one or more parameters for the individual voxels. Methods based, e.g., on various statistical tests are then used to yield parameters corresponding to probability of activation or activation strength. However, these methods do not indicate whether sets of voxels are activated in a similar way or in different ways. Typically, delays between two activated signals are not identified. In this article, we use clustering methods to detect similarities in activation between voxels. We employ a novel metric that measures the similarity between the activation stimulus and the fMRI signal. We present two different clustering algorithms and use them to identify regions of similar activations in an fMRI experiment involving a visual stimulus.
european conference on information retrieval | 2005
Cyril Goutte; Eric Gaussier
We address the problems of 1/ assessing the confidence of the standard point estimates, precision, recall and F-score, and 2/ comparing the results, in terms of precision, recall and F-score, obtained using two different methods. To do so, we use a probabilistic setting which allows us to obtain posterior distributions on these performance indicators, rather than point estimates. This framework is applied to the case where different methods are run on different datasets from the same source, as well as the standard situation where competing results are obtained on the same data.
international conference on computational linguistics | 2004
John Blatz; Erin Fitzgerald; George F. Foster; Simona Gandrabur; Cyril Goutte; Alex Kulesza; Alberto Sanchís; Nicola Ueffing
We present a detailed study of confidence estimation for machine translation. Various methods for determining whether MT output is correct are investigated, for both whole sentences and words. Since the notion of correctness is not intuitively clear in this context, different ways of defining it are proposed. We present results on data from the NIST 2003 Chinese-to-English MT evaluation.
international acm sigir conference on research and development in information retrieval | 2005
Eric Gaussier; Cyril Goutte
Non-negative Matrix Factorization (NMF, [5]) and Probabilistic Latent Semantic Analysis (PLSA, [4]) have been successfully applied to a number of text analysis tasks such as document clustering. Despite their different inspirations, both methods are instances of multinomial PCA [1]. We further explore this relationship and first show that PLSA solves the problem of NMF with KL divergence, and then explore the implications of this relationship.
Neural Computation | 1997
Cyril Goutte
The no-free-lunch theorems (Wolpert & Macready, 1995) have sparked heated debate in the computational learning community. A recent communication (Zhu & Rohwer, 1996) attempts to demonstrate the inefficiency of cross-validation on a simple problem. We elaborate on this result by considering a broader class of cross-validation. When used more strictly, cross-validation can yield the expected results on simple examples.
meeting of the association for computational linguistics | 2004
Eric Gaussier; Jean-Michel Renders; Irina Matveeva; Cyril Goutte; Hervé Déjean
We present a geometric view on bilingual lexicon extraction from comparable corpora, which allows to re-interpret the methods proposed so far and identify unresolved problems. This motivates three new methods that aim at solving these problems. Empirical evaluation shows the strengths and weaknesses of these methods, as well as a significant gain in the accuracy of extracted lexicons.
Human Brain Mapping | 2001
Cyril Goutte; Lars Kai Hansen; Matthew G. Liptrot; Egill Rostrup
Clustering functional magnetic resonance imaging (fMRI) time series has emerged in recent years as a possible alternative to parametric modeling approaches. Most of the work so far has been concerned with clustering raw time series. In this contribution we investigate the applicability of a clustering method applied to features extracted from the data. This approach is extremely versatile and encompasses previously published results [Goutte et al., 1999 ] as special cases. A typical application is in data reduction: as the increase in temporal resolution of fMRI experiments routinely yields fMRI sequences containing several hundreds of images, it is sometimes necessary to invoke feature extraction to reduce the dimensionality of the data space. A second interesting application is in the meta‐analysis of fMRI experiment, where features are obtained from a possibly large number of single‐voxel analyses. In particular this allows the checking of the differences and agreements between different methods of analysis. Both approaches are illustrated on a fMRI data set involving visual stimulation, and we show that the feature space clustering approach yields nontrivial results and, in particular, shows interesting differences between individual voxel analysis performed with traditional methods. Hum. Brain Mapping 13:165–183, 2001.
empirical methods in natural language processing | 2005
Michel Simard; Nicola Cancedda; Bruno Cavestro; Marc Dymetman; Eric Gaussier; Cyril Goutte; Kenji Yamada; Philippe Langlais; Arne Mauser
This paper presents a phrase-based statistical machine translation method, based on non-contiguous phrases, i.e. phrases with gaps. A method for producing such phrases from a word-aligned corpora is proposed. A statistical translation model is also presented that deals such phrases, as well as a training method based on the maximization of translation accuracy, as measured with the NIST evaluation metric. Translations are produced by means of a beam-search decoder. Experimental results are presented, that demonstrate how the proposed method allows to better generalize from the training data.
european conference on information retrieval | 2002
Eric Gaussier; Cyril Goutte; Kris Popat; Francine Chen
We propose a new hierarchical generative model for textual data, where words may be generated by topic specific distributions at any level in the hierarchy. This model is naturally well-suited to clustering documents in preset or automatically generated hierarchies, as well as categorising new documents in an existing hierarchy. Training algorithms are derived for both cases, and illustrated on real data by clustering news stories and categorising newsgroup messages. Finally, the generative model may be used to derive a Fisher kernel expressing similarity between documents.
international acm sigir conference on research and development in information retrieval | 2008
Massih Reza Amini; Tuong Vinh Truong; Cyril Goutte
This paper presents a boosting based algorithm for learning a bipartite ranking function (BRF) with partially labeled data. Until now different attempts had been made to build a BRF in a transductive setting, in which the test points are given to the methods in advance as unlabeled data. The proposed approach is a semi-supervised inductive ranking algorithm which, as opposed to transductive algorithms, is able to infer an ordering on new examples that were not used for its training. We evaluate our approach using the TREC-9 Ohsumed and the Reuters-21578 data collections, comparing against two semi-supervised classification algorithms for ROCArea (AUC), uninterpolated average precision (AUP), mean precision@50 (TP) and Precision-Recall (PR) curves. In the most interesting cases where there are an unbalanced number of irrelevant examples over relevant ones, we show our method to produce statistically significant improvements with respect to these ranking measures.