Publication


Featured research published by Brigitte Bigi.


ACM Transactions on Information Systems | 2001

An information-theoretic approach to automatic query expansion

Claudio Carpineto; Renato De Mori; Giovanni Romano; Brigitte Bigi

Techniques for automatic query expansion from top retrieved documents have shown promise for improving retrieval effectiveness on large collections; however, they often rest on empirical grounds, and cross-system comparisons are scarce. Using ideas from Information Theory, we present a computationally simple and theoretically justified method for assigning scores to candidate expansion terms. These scores are used to select and weight expansion terms within Rocchio's framework for query reweighting. We compare ranking with information-theoretic query expansion against ranking with other query expansion techniques, showing that the former achieves better retrieval effectiveness on several performance measures. We also discuss the effect on retrieval effectiveness of the main parameters involved in automatic query expansion, such as data sparseness, query difficulty, number of selected documents, and number of selected terms, pointing out interesting relationships.
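To make the idea of information-theoretic term scoring concrete, here is a minimal sketch. It assumes a common KL-divergence-based score, score(t) = p_R(t) log(p_R(t) / p_C(t)), where R is the set of top retrieved documents and C the whole collection, followed by a simple Rocchio-style reweighting in which normalized scores are added to the original query weights. The function names, the `alpha`/`beta`/`n_terms` parameters and the toy data are illustrative assumptions, not the paper's exact formulation or settings.

```python
import math
from collections import Counter
from typing import Dict, List


def kld_term_scores(top_docs: List[List[str]], collection: List[List[str]]) -> Dict[str, float]:
    """Score each term of the top retrieved documents R by its contribution to the
    KL divergence between R's term distribution and the collection's (C):
        score(t) = p_R(t) * log(p_R(t) / p_C(t))
    Terms never seen in the collection statistics are skipped."""
    r_counts = Counter(t for doc in top_docs for t in doc)
    c_counts = Counter(t for doc in collection for t in doc)
    r_total, c_total = sum(r_counts.values()), sum(c_counts.values())
    scores = {}
    for term, n in r_counts.items():
        if c_counts[term] == 0:
            continue  # avoid division by zero if R is not part of C
        p_r = n / r_total
        p_c = c_counts[term] / c_total
        scores[term] = p_r * math.log(p_r / p_c)
    return scores


def rocchio_expand(query: Dict[str, float], scores: Dict[str, float],
                   n_terms: int = 10, alpha: float = 1.0, beta: float = 1.0) -> Dict[str, float]:
    """Rocchio-style reweighting (hypothetical variant): new weight =
    alpha * original weight + beta * (score / best score) for the n_terms
    highest-scoring terms with a positive score."""
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n_terms]
    expanded = {t: alpha * w for t, w in query.items()}
    if not top or top[0][1] <= 0:
        return expanded  # nothing informative to add
    best = top[0][1]
    for term, s in top:
        if s > 0:
            expanded[term] = expanded.get(term, 0.0) + beta * s / best
    return expanded


top_docs = [["jaguar", "car", "speed"], ["jaguar", "car"]]
collection = top_docs + [["cat", "animal"], ["road", "the"], ["the", "news"]]
scores = kld_term_scores(top_docs, collection)
print(rocchio_expand({"jaguar": 1.0}, scores, n_terms=2))
```

Terms that are much more frequent in the top documents than in the collection as a whole receive the highest scores, which is what makes them good expansion candidates.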


Language and Technology Conference | 2011

A Multilingual Text Normalization Approach

Brigitte Bigi

The creation of text corpora requires a sequence of processing steps to constitute and normalize the corpus, and then to exploit it directly in a given application. This paper presents a generic approach to text normalization and concentrates on methodology and linguistic engineering, which serve to develop a multi-purpose multilingual text corpus. The approach was applied to written texts in French, English, Spanish, Vietnamese, Khmer and Chinese, and to speech transcriptions in French, English, Italian, Chinese and Taiwanese. It consists of splitting the text normalization problem into a set of minor sub-problems that are as language-independent as possible. A set of text corpus normalization tools with linked resources and a document structuring method are proposed and distributed under the terms of the GPL license.
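The idea of splitting normalization into small, mostly language-independent sub-problems can be illustrated as a pipeline of text-to-text functions. This is a minimal sketch under assumptions: the step names, their order and the tiny French number table are hypothetical and do not reproduce the paper's GPL-licensed tools.

```python
import re
import unicodedata
from typing import Callable, Dict, Iterable


def to_nfc(text: str) -> str:
    """Compose Unicode characters (e.g. 'e' + combining accent -> 'é')."""
    return unicodedata.normalize("NFC", text)


def strip_punctuation(text: str) -> str:
    """Remove every character whose Unicode category is punctuation (P*).
    This also drops apostrophes and hyphens, which a real normalizer
    would handle per language."""
    return "".join(c for c in text if not unicodedata.category(c).startswith("P"))


def collapse_spaces(text: str) -> str:
    """Replace runs of whitespace with a single space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def lowercase(text: str) -> str:
    return text.lower()


def make_number_expander(number_words: Dict[str, str]) -> Callable[[str], str]:
    """Language-dependent step: spell out tokens found in a per-language table."""
    def expand(text: str) -> str:
        return " ".join(number_words.get(tok, tok) for tok in text.split())
    return expand


def normalize(text: str, steps: Iterable[Callable[[str], str]]) -> str:
    """Apply the normalization steps in order."""
    for step in steps:
        text = step(text)
    return text


french_steps = [to_nfc, strip_punctuation, collapse_spaces, lowercase,
                make_number_expander({"2": "deux", "3": "trois"})]
print(normalize("Il  a 3 chats,  et 2 chiens !", french_steps))
# prints: il a trois chats et deux chiens
```

Only the last step depends on the language; supporting another language in this sketch means swapping its table while reusing the other steps.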


Workshop on Statistical Machine Translation | 2009

Mining a Comparable Text Corpus for a Vietnamese-French Statistical Machine Translation System

Thi Ngoc Diep Do; Viet Bac Le; Brigitte Bigi; Laurent Besacier; Eric Castelli

This paper presents our first attempt at constructing a Vietnamese-French statistical machine translation system. Since Vietnamese is an under-resourced language, we concentrate on building a large Vietnamese-French parallel corpus. A document alignment method based on publication date, special words and sentence alignment results is proposed. The paper also presents an application of the obtained parallel corpus to the construction of a Vietnamese-French statistical machine translation system, where the use of different units for Vietnamese (syllables, words, or their combinations) is discussed.
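A document alignment step that combines publication dates with "special words" can be sketched as follows. Assumptions: special words are taken to be tokens such as numbers and names that tend to appear unchanged in both languages, candidate pairs must be published within a hypothetical `max_days` window, and overlap is measured with Jaccard similarity. The sentence-alignment criterion mentioned in the abstract is omitted here.

```python
from datetime import date
from typing import List, Set, Tuple

Doc = Tuple[str, date, Set[str]]  # (doc_id, publication date, special words)


def align_documents(vi_docs: List[Doc], fr_docs: List[Doc], max_days: int = 2):
    """For each Vietnamese document, pick the French document published within
    max_days whose special words overlap most (Jaccard similarity).
    Returns (vi_id, fr_id, score) triples; documents without any overlap stay unpaired."""
    pairs = []
    for vi_id, vi_date, vi_words in vi_docs:
        best_id, best_score = None, 0.0
        for fr_id, fr_date, fr_words in fr_docs:
            if abs((vi_date - fr_date).days) > max_days:
                continue  # outside the publication-date window
            union = vi_words | fr_words
            score = len(vi_words & fr_words) / len(union) if union else 0.0
            if score > best_score:
                best_id, best_score = fr_id, score
        if best_id is not None:
            pairs.append((vi_id, best_id, best_score))
    return pairs


vi = [("v1", date(2008, 5, 1), {"2008", "APEC", "Hà Nội"})]
fr = [("f1", date(2008, 5, 2), {"2008", "APEC", "Hanoi"}),
      ("f2", date(2008, 5, 1), {"Paris", "2007"}),
      ("f3", date(2008, 6, 1), {"2008", "APEC", "Hà Nội"})]
print(align_documents(vi, fr))
```

Here "f3" shares the most special words but falls outside the date window, so "v1" is paired with "f1".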


International Conference on Acoustics, Speech, and Signal Processing | 2008

Word/sub-word lattices decomposition and combination for speech recognition

Viet Bac Le; Sopheap Seng; Laurent Besacier; Brigitte Bigi

This paper presents the benefit of using multiple lexical units in the post-processing stage of an ASR system. Since sub-word units can reduce the high out-of-vocabulary rate and compensate for the lack of text resources in statistical language modeling, we propose several methods to decompose, normalize and combine word and sub-word lattices generated by different ASR systems. Using a sub-word information table, every word in a lattice can be decomposed into sub-word units. The decomposed lattices can then be combined into a common lattice to generate a confusion network. This lattice combination scheme yields an absolute syllable error rate reduction of about 1.4% over the sentence MAP baseline method for a Vietnamese ASR task. The proposed method also outperforms N-best list combination and voting.
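The decomposition of word arcs into sub-word arcs through a lookup table can be sketched as below. This is a simplified illustration, not the paper's method: the lattice is reduced to a flat list of time-stamped arcs (no explicit nodes), each word's duration is split evenly across its sub-words, each sub-word arc inherits the word arc's score, and the table entry is hypothetical (entries are assumed non-empty). Lattice combination and confusion-network generation are not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Arc:
    start: float  # start time in seconds
    end: float    # end time in seconds
    label: str    # lexical unit carried by the arc
    score: float  # e.g. the arc's posterior probability


def decompose(arcs: List[Arc], subword_table: Dict[str, List[str]]) -> List[Arc]:
    """Replace each word arc with a chain of sub-word arcs looked up in subword_table.
    Words absent from the table are kept whole. The word's duration is split
    evenly across its sub-words, and each sub-word arc inherits the word's score."""
    result = []
    for arc in arcs:
        units = subword_table.get(arc.label, [arc.label])
        step = (arc.end - arc.start) / len(units)
        for i, unit in enumerate(units):
            start = arc.start + i * step
            # the last sub-word ends exactly where the word ends
            end = arc.end if i == len(units) - 1 else arc.start + (i + 1) * step
            result.append(Arc(start, end, unit, arc.score))
    return result


table = {"học_sinh": ["học", "sinh"]}  # hypothetical word -> syllable entry
lattice = [Arc(0.0, 0.6, "học_sinh", 0.9), Arc(0.6, 0.9, "giỏi", 0.8)]
for arc in decompose(lattice, table):
    print(arc)
```

Once every system's output is expressed in the same sub-word units, the arcs become directly comparable, which is the precondition for combining lattices from different recognizers.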


IEEE Automatic Speech Recognition and Understanding Workshop | 1997

Combined models for topic spotting and topic-dependent language modeling

Brigitte Bigi; R. De Mori; Marc El-Bèze; Thierry Spriet

A new statistical method for language modeling and spoken document classification is proposed. It is based on a mixture of topic-dependent probabilities. Each topic-dependent probability is in turn a mixture of n-gram probabilities and the probability of Kullback-Leibler (KL) distances between keyword unigrams and the distribution obtained from the content of a cache memory. Experimental results on topic classification using a corpus of 60 million words from the French newspaper Le Monde show the excellent performance of the cache memory and its complementary role in providing different statistics for the decision process.
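The KL-distance component can be illustrated in isolation: build a smoothed keyword unigram distribution from the cache (the recently seen words), compare it with each topic's keyword distribution, and pick the closest topic. This is a minimal sketch under stated assumptions: the additive smoothing constant, the direction D(cache || topic), and the toy French keywords are choices made here, and the paper's full model (which mixes this with n-gram probabilities) is not reproduced.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Set


def unigram(words: Iterable[str], vocab: Set[str], eps: float = 1e-3) -> Dict[str, float]:
    """Additively smoothed unigram distribution over a fixed keyword vocabulary;
    words outside the vocabulary are ignored."""
    counts = Counter(w for w in words if w in vocab)
    total = sum(counts.values()) + eps * len(vocab)
    return {w: (counts[w] + eps) / total for w in vocab}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """D(p || q) = sum_w p(w) log(p(w) / q(w)); smoothing keeps q(w) > 0."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def closest_topic(cache: Iterable[str], topic_texts: Dict[str, Iterable[str]], vocab: Set[str]):
    """Return the topic whose keyword distribution is closest (in KL) to the cache,
    together with all distances."""
    p_cache = unigram(cache, vocab)
    distances = {t: kl_divergence(p_cache, unigram(words, vocab))
                 for t, words in topic_texts.items()}
    return min(distances, key=distances.get), distances


vocab = {"match", "but", "vote", "élection"}
topics = {"sport": ["match", "but", "but", "match"],
          "politique": ["vote", "élection", "vote"]}
cache = ["but", "match", "but", "le", "de"]
print(closest_topic(cache, topics, vocab)[0])
```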


Annual Meeting of the Special Interest Group on Discourse and Dialogue | 2015

A SIP of CoFee: A Sample of Interesting Productions of Conversational Feedback

Laurent Prévot; Jan Gorisch; Roxane Bertrand; Emilien Gorene; Brigitte Bigi

Feedback utterances are among the most frequent utterances in dialogue. Feedback is also a crucial aspect of linguistic theories that take into account social interaction involving language. This paper introduces the corpora and datasets of a project scrutinizing this kind of feedback utterance in French. We present the genesis of the corpora involved in the project (about 16 hours in total of transcribed and phone force-aligned speech). We introduce the resulting datasets and discuss how they are being used in ongoing work focusing on the form-function relationship of conversational feedback. All the corpora created and the datasets produced within this project will be made available for research purposes.


Journal on Multimodal User Interfaces | 2013

A multimodal study of answers to disruptions

Brigitte Bigi; Cristel Portes; Agnès Steuckardt; Marion Tellier

The interaction between Members of Parliament (MPs) is convention-based and rule-regulated. As instantiations of individual and group confrontations, parliamentary debates display well-regulated competing discursive processes. Unauthorised interruptions are spontaneous verbal reactions of MPs who interrupt the current speaker. This paper focuses on the current speaker's answers to these disruptions. It introduces an annotation scheme for a political debate dataset consisting mainly of video and audio annotations. The annotations range from general linguistic to domain-specific information; some were produced with automatic tools and some manually. One of the goals is to use this information to predict the categories of the speaker's answers to the disruptions. A typology of such answers is proposed, and an automatic categorization system based on a multimodal parametrization is successfully applied.


Revista de Estudos da Linguagem | 2018

Automatic Segmentation of Spontaneous Speech / Segmentação automática da fala espontânea

Brigitte Bigi; Christine Meunier

In most cases, the analysis of phonetic entities in speech requires aligning the speech recording with its phonetic transcription. However, studies on automatic segmentation have predominantly been carried out on samples of read or prepared speech, since spontaneous speech is a more informal activity, produced without any preparation. As a consequence, numerous phenomena occur in spontaneous speech, such as hesitations, repetitions, feedback, backchannels, non-standard elisions, reduction phenomena, truncated words and, most commonly, non-standard pronunciations. Events such as laughter, noises and filled pauses are also very common in spontaneous speech. This article aims to compare read and spontaneous speech in order to evaluate the impact of speaking style on a speech segmentation task. The article describes the solution implemented in the SPPAS software for the automatic segmentation of read and spontaneous speech. This solution consists mainly of two aspects: support for an Enriched Orthographic Transcription to optimize grapheme-to-phoneme conversion, and support for the forced alignment of the following events: filled pauses, laughter and noises. These events represent less than 1% of the occurrences in read speech and about 6% in spontaneous speech. They occur in at most 3% of the Inter-Pausal Units of a read speech corpus and in 20% to 36% of the Inter-Pausal Units of spontaneous speech corpora. The Unit Boundary Positioning Accuracy (UBPA) of the proposed forced-alignment system is 96% for read speech and 96.48% for spontaneous speech, with a delta of 40 ms.
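The accuracy figures above are tolerance-based: an automatically placed boundary counts as correct when it lies within a delta (here 40 ms) of the manually placed one. The sketch below shows that idea under a simplifying assumption, namely that the manual and automatic boundary lists describe the same units in the same order. It is not the SPPAS evaluation code, and the boundary values are invented for illustration.

```python
from typing import Sequence


def boundary_accuracy(reference: Sequence[float], hypothesis: Sequence[float],
                      delta: float = 0.040) -> float:
    """Percentage of unit boundaries (in seconds) whose automatic position lies
    within +/- delta of the manual reference. Assumes both sequences list the
    boundaries of the same units, in the same order."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of boundaries")
    if not reference:
        raise ValueError("no boundaries to evaluate")
    hits = sum(1 for r, h in zip(reference, hypothesis) if abs(r - h) <= delta)
    return 100.0 * hits / len(reference)


manual = [0.10, 0.25, 0.40, 0.62]
automatic = [0.11, 0.30, 0.41, 0.62]
print(boundary_accuracy(manual, automatic))  # 75.0: only the 0.25 s boundary is off by more than 40 ms
```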


International Conference Oriental COCOSDA held jointly with Conference on Asian Spoken Language Research and Evaluation | 2015

Automatic word segmentation for spoken Cantonese

R Fung; Brigitte Bigi

Though Cantonese is the most influential variety of Chinese other than Mandarin, only a limited number of Cantonese corpora are available for linguistic studies. Among the essential steps of building a corpus, word segmentation is necessary but highly challenging because Cantonese lacks clear word boundaries. This paper reports the construction and evaluation of an open-source automatic word segmenter for Cantonese. The tool is a component of the multilingual SPPAS program, designed to be used directly by linguists. It is free software distributed under a GPL license. The effectiveness of the tool was evaluated by comparing manual and automatic segmentations of samples from a spoken Cantonese corpus. High precision and recall were found in our study. Upon completion, the tool will promote the development of more Cantonese corpora for language-related studies.
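Precision and recall for word segmentation are commonly computed by treating each word as a span of character offsets: a hypothesized word is correct only if both of its boundaries match a word in the manual segmentation. The abstract does not detail its evaluation protocol, so the following is a generic sketch of that standard word-level measure, with an invented Cantonese example.

```python
from typing import List, Set, Tuple


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmentation to the set of (start, end) character offsets of its words."""
    spans, pos = set(), 0
    for w in words:
        spans.add((pos, pos + len(w)))
        pos += len(w)
    return spans


def segmentation_scores(reference: List[str], hypothesis: List[str]):
    """Word-level precision, recall and F1 of a hypothesized segmentation."""
    if not reference or "".join(reference) != "".join(hypothesis):
        raise ValueError("both segmentations must cover the same non-empty string")
    ref, hyp = word_spans(reference), word_spans(hypothesis)
    correct = len(ref & hyp)
    precision = correct / len(hyp)
    recall = correct / len(ref)
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


manual = ["我哋", "今日", "去", "飲茶"]
automatic = ["我哋", "今", "日", "去", "飲茶"]
print(segmentation_scores(manual, automatic))
```

Splitting 今日 into two words yields 3 correct words out of 5 proposed (precision 3/5) and out of 4 expected (recall 3/4).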


Language and Technology Conference | 2013

A Phonetization Approach for the Forced-Alignment Task in SPPAS

Brigitte Bigi

The phonetization of text corpora requires a sequence of processing steps and resources to convert a normalized text into its constituent phones and then to exploit it directly in a given application. This paper presents a generic approach to text phonetization and concentrates on phonetizing unknown words. This serves to develop a phonetizer for the forced-alignment application. The proposed approach is dictionary-based and as language-independent as possible. It is used for French, English, Spanish, Italian, Catalan, Polish, Mandarin Chinese, Taiwanese, Cantonese and Japanese in the SPPAS software, a tool distributed under the terms of the GPL license.
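A dictionary-based phonetizer needs a fallback for words missing from the dictionary. The abstract does not say how SPPAS handles them, so the sketch below assumes one simple strategy: cover the unknown word from left to right with the longest substrings that are dictionary entries, and concatenate their pronunciations. The lexicon entries and SAMPA-like phone symbols are hypothetical.

```python
from typing import Dict, List, Optional


def phonetize(word: str, lexicon: Dict[str, List[str]]) -> Optional[List[str]]:
    """Look the word up in a pronunciation dictionary. For an unknown word,
    cover it left to right with the longest substrings that are dictionary
    entries and concatenate their pronunciations.
    Returns None if some part of the word matches no entry."""
    if word in lexicon:
        return list(lexicon[word])
    phones, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest substring first
            piece = word[i:j]
            if piece in lexicon:
                phones.extend(lexicon[piece])
                i = j
                break
        else:
            return None  # no dictionary entry starts at position i
    return phones


lexicon = {  # hypothetical entries with SAMPA-like phone symbols
    "sun": ["s", "V", "n"],
    "flower": ["f", "l", "aU", "@"],
}
print(phonetize("sunflower", lexicon))
```

Greedy longest match is only a heuristic: it may pick a split that a linguist would reject, which is why a real phonetizer would typically rank alternative decompositions.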

Collaboration



Top Co-Authors

Laurent Besacier
Centre national de la recherche scientifique

Marion Tellier
Aix-Marseille University

Eric Castelli
Centre national de la recherche scientifique

Gale Stam
National Louis University

Viet Bac Le
Centre national de la recherche scientifique

Cristel Portes
Aix-Marseille University