Armelle Brun

French Institute for Research in Computer Science and Automation


Publication

Featured research published by Armelle Brun.

string processing and information retrieval | 2001

A comparative study of topic identification on newspaper and e-mail

Brigitte Bigi; Armelle Brun; Jean Paul Haton; Kamel Smaïli; Imed Zitouni

This work presents several statistical methods for topic identification on two kinds of textual data: newspaper articles and e-mails. Five methods are tested on these two corpora: topic unigrams, cache model, TFIDF classifier, topic perplexity, and weighted model. Our work aims to study these methods by applying them to very different data. This comparison proves instructive: statistical topic identification methods depend not only on a corpus but also on its type. One of the methods achieves a topic identification rate of 80% on a general newspaper corpus but does not exceed 30% on the e-mail corpus. Another method gives the best results on e-mails but does not behave the same way on the newspaper corpus. We also show in this paper that almost all our methods achieve good results in retrieving the first two manually annotated labels.
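
To make one of the five methods concrete, here is a minimal sketch of a TFIDF topic classifier. It is not the authors' implementation: whitespace tokenization, one vector per topic built from its concatenated training texts, a smoothed IDF (the same form as scikit-learn's `smooth_idf` option), and cosine-similarity ranking are all assumptions, and the toy training sentences are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


class TfidfTopicClassifier:
    """Ranks topics by cosine similarity between TF-IDF vectors.

    Each topic is represented by a single vector built from all of its
    training documents concatenated together.
    """

    def fit(self, docs_by_topic):
        # Treat each topic's concatenated text as one "document" for IDF.
        self.topic_counts = {
            topic: Counter(tok for doc in docs for tok in tokenize(doc))
            for topic, docs in docs_by_topic.items()
        }
        n_topics = len(self.topic_counts)
        doc_freq = Counter()
        for counts in self.topic_counts.values():
            doc_freq.update(counts.keys())
        # Smoothed IDF: words seen in every topic keep a small positive weight.
        self.idf = {
            w: math.log((1 + n_topics) / (1 + df)) + 1.0 for w, df in doc_freq.items()
        }
        self.topic_vectors = {t: self._weigh(c) for t, c in self.topic_counts.items()}
        return self

    def _weigh(self, counts):
        # Words never seen in training are dropped from the vector.
        return {w: tf * self.idf[w] for w, tf in counts.items() if w in self.idf}

    @staticmethod
    def _cosine(u, v):
        dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
        norm_u = math.sqrt(sum(x * x for x in u.values()))
        norm_v = math.sqrt(sum(x * x for x in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    def rank(self, text):
        """Returns topic names, best match first."""
        query = self._weigh(Counter(tokenize(text)))
        scores = {t: self._cosine(query, vec) for t, vec in self.topic_vectors.items()}
        return sorted(scores, key=scores.get, reverse=True)


clf = TfidfTopicClassifier().fit({
    "sport": ["the match ended with a late goal", "the team won the match"],
    "economy": ["the bank raised interest rates", "markets fell as rates rose"],
})
print(clf.rank("a goal in the final match"))  # ['sport', 'economy']
```

Returning a ranked list rather than a single label makes it easy to score "first two labels" retrieval, as the abstract does, by checking the top two entries.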

string processing and information retrieval | 2000

Experiment analysis in newspaper topic detection

Armelle Brun; Kamel Smaïli; Jean Paul Haton

We present several methods for topic detection on newspaper articles, using either a general vocabulary or topic-specific vocabularies. The specific vocabularies are determined either manually or statistically; in both cases, we aim to find the most representative words of a topic. Several methods have been tested. The first one is based on perplexity and achieves a 100% topic identification rate on large test corpora when the first two proposed topics are taken into account. The other methods are based on statistical counts and achieve a 94% identification rate on smaller test corpora. The major challenge of this work is to identify topics from only a few words, so that the most appropriate language model can be selected during speech recognition.
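
The perplexity-based approach can be sketched as follows: train one language model per topic, score a test text under each, and rank topics by increasing perplexity, counting a hit when the true topic is among the first two. This is a hedged illustration, not the paper's system: the add-one smoothed unigram models, the single unknown-word class, and the helper names (`build_models`, `rank_topics`, `top_k_correct`) and toy corpus are all assumptions.

```python
import math
from collections import Counter


class UnigramTopicModel:
    """Add-one smoothed unigram model for one topic (assumed model order)."""

    def __init__(self, tokens, vocab_size):
        self.counts = Counter(tokens)
        self.total = len(tokens)
        self.vocab_size = vocab_size

    def prob(self, word):
        return (self.counts[word] + 1) / (self.total + self.vocab_size)

    def perplexity(self, tokens):
        # PP = exp(-(1/N) * sum_i log P(w_i))
        log_sum = sum(math.log(self.prob(w)) for w in tokens)
        return math.exp(-log_sum / len(tokens))


def build_models(corpus_by_topic):
    tokenized = {t: text.lower().split() for t, text in corpus_by_topic.items()}
    vocab = {w for toks in tokenized.values() for w in toks}
    # +1 reserves probability mass for a single unknown-word class.
    vocab_size = len(vocab) + 1
    return {t: UnigramTopicModel(toks, vocab_size) for t, toks in tokenized.items()}


def rank_topics(models, text):
    """Topics sorted from lowest to highest perplexity on the text."""
    tokens = text.lower().split()
    return sorted(models, key=lambda t: models[t].perplexity(tokens))


def top_k_correct(models, text, true_topic, k=2):
    """A hit when the true topic is among the k lowest-perplexity topics."""
    return true_topic in rank_topics(models, text)[:k]


models = build_models({
    "sport": "goal match team win goal",
    "economy": "bank rates market inflation",
    "weather": "rain sun wind forecast",
})
print(rank_topics(models, "team scores a goal")[0])  # sport
```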

european conference on information retrieval | 2007

Natural language processing for usage based indexing of web resources

Anne Boyer; Armelle Brun

Identifying reliable and interesting items on the Internet is becoming more and more difficult and time-consuming. This position paper describes our intended work on multimedia information retrieval through browsing techniques within web navigation. It relies on a usage-based indexing of resources: we ignore the nature, the content, and the structure of resources. We describe a new approach that takes advantage of the similarity between statistical language modeling and document retrieval systems. A syntax of usage is computed to build a Statistical Grammar of Usage (SGU). An SGU enables resource classification, which supports a personalized navigation assistant tool. It relies both on collaborative filtering, used to compute virtual communities of users, and on classical statistical language models. The resulting SGU is community-dependent.
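
A toy version of the idea, combining collaborative filtering with a statistical language model over navigation, might look like the sketch below. Every concrete choice is an assumption rather than the paper's design: Jaccard similarity over visited resources defines a user's community (the user plus the `k` most similar users), a bigram model over resource-to-resource transitions stands in for the SGU, and the user and resource identifiers are hypothetical.

```python
from collections import Counter, defaultdict


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def community_of(user, sessions_by_user, k=2):
    """User-based collaborative filtering: the user plus the k most similar users."""
    target = [r for s in sessions_by_user[user] for r in s]
    others = [
        (jaccard(target, [r for s in sessions for r in s]), other)
        for other, sessions in sessions_by_user.items()
        if other != user
    ]
    others.sort(reverse=True)
    return [user] + [other for _, other in others[:k]]


def usage_bigrams(sessions):
    """Counts resource-to-resource transitions: a bigram model of navigation."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for prev, nxt in zip(session, session[1:]):
            transitions[prev][nxt] += 1
    return transitions


def suggest_next(user, current, sessions_by_user, k=2):
    """Most frequent next resource after `current` within the user's community."""
    members = community_of(user, sessions_by_user, k)
    model = usage_bigrams(s for m in members for s in sessions_by_user[m])
    counts = model.get(current)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


sessions_by_user = {
    "u1": [["home", "catalog", "paper42"]],
    "u2": [["home", "catalog", "paper42"], ["catalog", "paper42"]],
    "u3": [["home", "catalog", "paper7"]],
    "u4": [["news", "sports"]],
}
print(suggest_next("u1", "catalog", sessions_by_user, k=1))  # paper42
```

Because the bigram counts are rebuilt from the community's sessions only, two users with different communities can receive different suggestions for the same resource, which mirrors the "community-dependent" property described above.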

information sciences, signal processing and their applications | 2007

Improving language models by using distant information

Armelle Brun; David Langlois; Kamel Smaïli

This study examines an original way to take advantage of distant information in statistical language models. We show that it is possible to use n-gram models with histories different from those used during training. These models are called crossing context models. Our study deals with both classical and distant n-gram models. A mixture of four models is proposed and evaluated. A bigram linear mixture achieves a 14% improvement in perplexity, and the trigram mixture outperforms the standard trigram by 5.6%. These improvements are obtained without increasing the complexity of standard n-gram models. The resulting mixture language model has been integrated into a speech recognition system, where it yields a slight improvement in word error rate on the data of the French-language evaluation campaign ESTER [1]. Finally, we present the impact of the proposed crossing context language models on performance for various speakers.
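
For readers unfamiliar with distant n-grams, the sketch below shows a classical distant bigram, which predicts w_i from w_{i-d} instead of w_{i-1}, combined with a standard bigram in a linear mixture, and evaluates the mixture's perplexity. It does not reproduce the paper's crossing context models or its four-model mixture: add-one smoothing, a closed vocabulary, the fixed weights 0.7/0.3 (rather than weights tuned on held-out data), and the toy sentences are assumptions.

```python
import math
from collections import Counter


class DistantBigram:
    """P(w_i | w_{i-d}) with add-one smoothing; d=1 is the ordinary bigram."""

    def __init__(self, tokens, distance, vocab):
        self.d = distance
        self.V = len(vocab)
        self.pair_counts = Counter(
            (tokens[i - distance], tokens[i]) for i in range(distance, len(tokens))
        )
        self.history_counts = Counter(
            tokens[i - distance] for i in range(distance, len(tokens))
        )

    def prob(self, history_word, word):
        return (self.pair_counts[(history_word, word)] + 1) / (
            self.history_counts[history_word] + self.V
        )


class LinearMixture:
    """P(w_i | h) = sum_d lambda_d * P_d(w_i | w_{i-d}), weights summing to 1."""

    def __init__(self, models, weights):
        assert abs(sum(weights) - 1.0) < 1e-9, "mixture weights must sum to 1"
        self.models, self.weights = models, weights
        self.max_d = max(m.d for m in models)

    def prob(self, tokens, i):
        # Each component looks back at its own distance d from position i.
        return sum(
            lam * m.prob(tokens[i - m.d], tokens[i])
            for m, lam in zip(self.models, self.weights)
        )

    def perplexity(self, tokens):
        # Scored from position max_d so that every component has a history.
        positions = range(self.max_d, len(tokens))
        log_sum = sum(math.log(self.prob(tokens, i)) for i in positions)
        return math.exp(-log_sum / len(positions))


train = "the cat sat on the mat the dog sat on the rug".split()
vocab = set(train)
bigram = DistantBigram(train, 1, vocab)
distant = DistantBigram(train, 2, vocab)
mix = LinearMixture([bigram, distant], [0.7, 0.3])
held_out = "the cat sat on the rug".split()
print(mix.perplexity(held_out))
```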

information sciences, signal processing and their applications | 2007

Towards a statistical grammar of usage for document retrieval in digital libraries

Anne Boyer; Armelle Brun

Due to the huge amount of information available on the Internet, identifying reliable and interesting items is becoming more and more difficult and time-consuming. We propose to cope with this difficulty by integrating a grammar of usage, deliberately ignoring the nature, the content, and the structure of resources. This position paper describes our intended work on multimedia information retrieval through browsing techniques within digital libraries. It relies on a usage-based indexing of resources. We describe a new approach that takes advantage of the similarity between statistical language modeling and document retrieval systems. We build a syntax of usage and design a statistical grammar of usage (SGU) that enables resource classification for personalized filtering.

conference of the international speech communication association | 1999