Armelle Brun
French Institute for Research in Computer Science and Automation
Publication
Featured research published by Armelle Brun.
string processing and information retrieval | 2001
Brigitte Bigi; Armelle Brun; Jean Paul Haton; Kamel Smaïli; Imed Zitouni
This work presents several statistical methods for topic identification on two kinds of textual data: newspaper articles and e-mails. Five methods are tested on these two corpora: topic unigrams, cache model, TFIDF classifier, topic perplexity, and weighted model. Our work aims to study these methods by confronting them with very different data. This study is very fruitful for our research. Statistical topic identification methods depend not only on a corpus, but also on its type. One of the methods achieves a topic identification rate of 80% on a general newspaper corpus but does not exceed 30% on the e-mail corpus. Another method gives the best result on e-mails, but does not behave the same way on a newspaper corpus. We also show in this paper that almost all our methods achieve good results in retrieving the first two manually annotated labels.
string processing and information retrieval | 2000
Armelle Brun; Kamel Smaïli; Jean Paul Haton
We present several methods for topic detection on newspaper articles, using either a general vocabulary or topic-specific vocabularies. Specific vocabularies are determined manually or statistically. In both cases, we aim at finding the most representative words of a topic. Several methods have been tested. The first is based on perplexity and achieves a 100% topic identification rate on large test corpora when the first two propositions are taken into account. Other methods are based on statistical counts and achieve a 94% identification rate on smaller test corpora. The major challenge of this work is to identify topics from only a few words so that, during speech recognition, the most suitable language model can be selected.
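As an illustration of perplexity-based topic identification, the following is a minimal sketch, not the authors' implementation. It assumes add-one smoothed unigram models over a shared vocabulary and toy training data; the names `train_unigram`, `perplexity`, and `rank_topics` are hypothetical. The topic whose model gives the lowest perplexity on the document is ranked first.

```python
import math
from collections import Counter


def train_unigram(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary (assumed smoothing)."""
    counts = Counter(t for t in tokens if t in vocab)
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def perplexity(model, tokens):
    """Perplexity of a token sequence; out-of-vocabulary words are skipped."""
    known = [t for t in tokens if t in model]
    if not known:
        return float("inf")
    log_prob = sum(math.log(model[t]) for t in known)
    return math.exp(-log_prob / len(known))


def rank_topics(models, tokens):
    """Topics sorted from lowest to highest perplexity on the document."""
    return sorted(models, key=lambda topic: perplexity(models[topic], tokens))


# Toy training data, invented for this example only.
training = {
    "sports": "match team goal player team score match".split(),
    "finance": "market bank stock price bank rate market".split(),
}
vocab = {w for toks in training.values() for w in toks}
models = {topic: train_unigram(toks, vocab) for topic, toks in training.items()}

document = "the team won the match with a late goal".split()
ranking = rank_topics(models, document)
```

Keeping the whole ranking rather than only the best topic makes it possible to count an identification as correct when the reference label is among the first two propositions, as the abstract describes.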
european conference on information retrieval | 2007
Anne Boyer; Armelle Brun
The identification of reliable and interesting items on the Internet is becoming more and more difficult and time-consuming. This paper is a position paper describing our intended work in the framework of multimedia information retrieval by browsing techniques within web navigation. It relies on a usage-based indexing of resources: we ignore the nature, the content and the structure of resources. We describe a new approach taking advantage of the similarity between statistical modeling of language and document retrieval systems. A syntax of usage is computed, from which a Statistical Grammar of Usage (SGU) is designed. An SGU enables resource classification, which supports a personalized navigation assistant tool. It relies both on collaborative filtering, to compute virtual communities of users, and on classical statistical language models. The resulting SGU is community-dependent.
information sciences, signal processing and their applications | 2007
Armelle Brun; David Langlois; Kamel Smaïli
This study examines an original way to take advantage of distant information in statistical language models. We show that it is possible to use n-gram models considering histories different from those used during training. These models are called crossing context models. Our study deals with classical and distant n-gram models. A mixture of four models is proposed and evaluated. A bigram linear mixture achieves an improvement of 14% in terms of perplexity. Moreover, the trigram mixture outperforms the standard trigram by 5.6%. These improvements have been obtained without increasing the complexity of standard n-gram models. The resulting mixture language model has been integrated into a speech recognition system. Its evaluation shows a slight improvement in terms of word error rate on the data used for the francophone evaluation campaign ESTER [1]. Finally, the impact of the proposed crossing context language models on performance is presented according to various speakers.
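The idea of a linear mixture of classical and distant n-gram models can be sketched as follows. This is an illustrative simplification under stated assumptions, not the crossing context models of the paper: it mixes only two components (a classical bigram and a distance-2 bigram), uses add-one smoothing, fixed weights, and toy data. The names `train_pair_model` and `mixture_perplexity` are hypothetical.

```python
import math
from collections import Counter


def train_pair_model(tokens, distance, vocab):
    """Add-one smoothed P(w_i | w_{i-distance}); distance=1 is a classical bigram."""
    pair_counts = Counter(zip(tokens, tokens[distance:]))  # (context, word) pairs
    context_counts = Counter(tokens[:-distance])
    size = len(vocab)

    def prob(word, context):
        return (pair_counts[(context, word)] + 1) / (context_counts[context] + size)

    return prob


def mixture_perplexity(tokens, components, weights):
    """Perplexity under a linear mixture; components are (prob_fn, distance) pairs."""
    start = max(distance for _, distance in components)
    log_prob = 0.0
    n = 0
    for i in range(start, len(tokens)):
        # Linear interpolation: weighted sum of the component probabilities.
        p = sum(
            lam * prob(tokens[i], tokens[i - distance])
            for lam, (prob, distance) in zip(weights, components)
        )
        log_prob += math.log(p)
        n += 1
    return math.exp(-log_prob / n)


# Toy data, invented for this example only.
train = "the cat sat on the mat the dog sat on the rug".split()
vocab = set(train)
bigram = train_pair_model(train, 1, vocab)
distant = train_pair_model(train, 2, vocab)

test = "the cat sat on the rug".split()
ppl = mixture_perplexity(test, [(bigram, 1), (distant, 2)], [0.7, 0.3])
```

The weights here are fixed by hand and must sum to 1; in practice they would typically be estimated on held-out data.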
information sciences, signal processing and their applications | 2007
Anne Boyer; Armelle Brun
Due to the huge amount of information available via the Internet, the identification of reliable and interesting items is becoming more and more difficult and time-consuming. We propose to cope with this difficulty by integrating a grammar of usage. We choose to ignore the nature, the content and the structure of resources. This paper is a position paper describing our intended work in the framework of multimedia information retrieval by browsing techniques within digital libraries. It relies on a usage-based indexing of resources. We describe a new approach taking advantage of the similarity between statistical modeling of language and document retrieval systems. We build a syntax of usages and design a statistical grammar of usage (SGU) that allows resource classification for personalized filtering.
conference of the international speech communication association | 1999
Kamel Smaïli; Armelle Brun; Imed Zitouni; Jean Paul Haton
recent advances in natural language processing | 2001
Brigitte Bigi; Armelle Brun; Jean-Paul Haton; Kamel Smaïli; Imed Zitouni
Journées d'Etude sur la Parole - JEP'04 | 2004
Armelle Brun; Christophe Cerisara; Dominique Fohr; Irina Illina; David Langlois; Odile Mella; Kamel Smaïli
conference of the international speech communication association | 2002
Armelle Brun; Kamel Smaïli; Jean Paul Haton
Workshop ESTER | 2005
Armelle Brun; Christophe Cerisara; Dominique Fohr; Irina Illina; David Langlois; Odile Mella