Publication


Featured research published by Thomas Pellegrini.


IberSPEECH | 2012

Impact of Age in ASR for the Elderly: Preliminary Experiments in European Portuguese

Thomas Pellegrini; Isabel Trancoso; Annika Hämäläinen; António Calado; Miguel Sales Dias; Daniela Braga

Standard automatic speech recognition (ASR) systems use acoustic models typically trained with speech of young adult speakers. Ageing is known to alter speech production in ways that require ASR systems to be adapted, in particular at the level of acoustic modeling. This paper reports ASR experiments that illustrate the impact of speaker age on speech recognition performance. A large read speech corpus in European Portuguese allowed us to measure statistically significant performance differences among age groups ranging from 60- to 90-year-old speakers. An increase of 41% relative (11.9% absolute) in word error rate was observed between 60-65-year-old and 81-86-year-old speakers. This paper also reports experiments on retraining acoustic models (AMs), further illustrating the impact of ageing on ASR performance. Differentiated gains were observed depending on the age range of the adaptation data used to retrain the acoustic models.
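The relative and absolute figures quoted above jointly pin down the underlying word error rates. As a quick illustrative check (the baseline figure below is derived from the abstract's two numbers, not reported directly in the paper):

```python
# Relative vs. absolute WER increase: an 11.9-point absolute rise that
# corresponds to a 41% relative rise implies the baseline WER of the
# younger (60-65) group.
absolute_increase = 11.9   # WER points
relative_increase = 0.41   # 41% relative

baseline_wer = absolute_increase / relative_increase   # implied 60-65 WER
older_wer = baseline_wer + absolute_increase           # implied 81-86 WER

print(f"implied baseline WER: {baseline_wer:.1f}%")      # ~29.0%
print(f"implied older-group WER: {older_wer:.1f}%")      # ~40.9%
```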


International Conference on Multimedia and Expo | 2009

Audio contributions to semantic video search

Isabel Trancoso; Thomas Pellegrini; José Portelo; Hugo Meinedo; Miguel Bugalho; Alberto Abad; João Paulo Neto

This paper summarizes the contributions to semantic video search that can be derived from the audio signal. Because of space restrictions, the emphasis will be on non-linguistic cues. The paper thus covers what is generally known as audio segmentation, as well as audio event detection. Using machine learning approaches, we have built detectors for over 50 semantic audio concepts.


Spoken Language Technology Workshop | 2010

Multimedia learning materials

José Lopes; Isabel Trancoso; Rui Correia; Thomas Pellegrini; Hugo Meinedo; Nuno J. Mamede; Maxine Eskenazi

This paper describes the integration of multimedia documents in the Portuguese version of REAP, a tutoring system for vocabulary learning. The documents result from the pipeline processing of Broadcast News videos that automatically segments the audio files, transcribes them, adds punctuation and capitalization, and breaks them into stories classified by topics. The integration of these materials in REAP was done in a way that tries to decrease the impact of potential errors of the automatic chain in the learning process.


ACM Transactions on Accessible Computing | 2015

Automatic Assessment of Speech Capability Loss in Disordered Speech

Thomas Pellegrini; Lionel Fontan; Julie Mauclair; Jérôme Farinas; Charlotte Alazard-Guiu; Marina Robert; Peggy Gatignol

In this article, we report on the use of an automatic technique to assess pronunciation in the context of several types of speech disorders. Even if such tools already exist, they are more widely used in a different context, namely, Computer-Assisted Language Learning, in which the objective is to assess nonnative pronunciation by detecting learners’ mispronunciations at segmental and/or suprasegmental levels. In our work, we sought to determine if the Goodness of Pronunciation (GOP) algorithm, which aims to detect phone-level mispronunciations by means of automatic speech recognition, could also detect segmental deviances in disordered speech. Our main experiment is an analysis of speech from people with unilateral facial palsy. This pathology may impact the realization of certain phonemes such as bilabial plosives and sibilants. Speech read by 32 speakers at four different clinical severity grades was automatically aligned and GOP scores were computed for each phone realization. The highest scores, which indicate large dissimilarities with standard phone realizations, were obtained for the most severely impaired speakers. The corresponding speech subset was manually transcribed at phone level; 8.3% of the phones differed from standard pronunciations extracted from our lexicon. The GOP technique allowed the detection of 70.2% of mispronunciations with an equal rate of about 30% of false rejections and false acceptances. Finally, to broaden the scope of the study, we explored the correlation between GOP values and speech comprehensibility scores on a second corpus, composed of sentences recorded by six people with speech impairments due to cancer surgery or neurological disorders. Strong correlations were achieved between GOP scores and subjective comprehensibility scores (about 0.7 absolute). Results from both experiments tend to validate the use of GOP to measure speech capability loss, a dimension that could be used as a complement to physiological measures in pathologies causing speech disorders.
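In its standard formulation (Witt and Young), a GOP score is the frame-normalised difference between the log-likelihood of the expected phone and that of the best competing phone. The sketch below is a generic illustration of that computation, not the authors' implementation; all numbers are invented:

```python
def gop_score(target_loglik, competing_logliks, n_frames):
    """Frame-normalised Goodness of Pronunciation sketch: absolute
    difference between the log-likelihood of the expected phone and
    that of the best competing phone, divided by segment length."""
    best_competitor = max(competing_logliks)
    return abs(target_loglik - best_competitor) / n_frames

# Toy log-likelihoods for one 12-frame realisation of a phone.
# A larger score indicates a larger deviation from the canonical model.
score = gop_score(target_loglik=-85.0,
                  competing_logliks=[-80.0, -95.0, -110.0],
                  n_frames=12)
print(f"GOP = {score:.3f}")
```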


Language and Technology Conference | 2009

Error detection in broadcast news ASR using Markov chains

Thomas Pellegrini; Isabel Trancoso

This article addresses error detection in broadcast news automatic transcription, as a post-processing stage. Based on the observation that many errors appear in bursts, we investigated the use of Markov Chains (MC) for their temporal modelling capabilities. Experiments were conducted on a large American English broadcast news corpus from NIST. Common features in error detection were used, all decoder-based. MC classification performance was compared with a discriminative maximum entropy model (Maxent), currently used in our in-house decoder to estimate confidence measures, and also with Gaussian Mixture Models (GMM). The MC classifier obtained the best results, detecting 16.2% of the errors with the lowest classification error rate (CER) of 16.7%. Compared with the GMM classifier, the MC lowered the number of false detections by 23.5% relative. The Maxent system achieved the same CER, but detected only 7.2% of the errors.
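To illustrate why a Markov chain suits bursty errors, here is a toy two-state chain (all probabilities invented, not taken from the paper) in which observing an error raises the probability that the next word is also an error:

```python
import math

# P(next state | current state); states: 'C' = correct, 'E' = error.
# After an error, another error is far more likely (0.5) than after a
# correct word (0.1): this is the burstiness assumption.
trans = {'C': {'C': 0.9, 'E': 0.1},
         'E': {'C': 0.5, 'E': 0.5}}
init = {'C': 0.9, 'E': 0.1}

def sequence_loglik(labels):
    """Log-probability of a label sequence under the chain."""
    ll = math.log(init[labels[0]])
    for prev, cur in zip(labels, labels[1:]):
        ll += math.log(trans[prev][cur])
    return ll

# A burst 'CEEC' is more probable than isolated errors 'CECE'
# under this chain.
print(sequence_loglik(list("CEEC")), sequence_loglik(list("CECE")))
```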


Conference on Computer as a Tool | 2011

Browsing videos by automatically detected audio events

Virginia Barbosa; Thomas Pellegrini; Miguel Bugalho; Isabel Trancoso

This paper focuses on Audio Event Detection (AED), a research area which aims to substantially enhance the access to audio in multimedia content. With the ever-growing quantity of multimedia documents uploaded on the Web, automatic description of the audio content of videos can provide very useful information to index, archive and search multimedia documents. Preliminary experiments with a sound effects corpus showed good results for training models. However, the performance on the real data test set, where there are overlapping audio events and continuous background noise, is lower. This paper describes the AED framework and methodologies used to build six audio event detectors, based on statistical machine learning tools (Support Vector Machines). The detectors showed promising improvements when background noise was added to the training data, which consists of clean sound effects that differ considerably from the audio events found in real-life videos and movies. A graphical interface prototype is also presented that allows browsing a movie by its content and provides an audio event description with time codes.


Conference of the International Speech Communication Association | 2016

Inferring phonemic classes from CNN activation maps using clustering techniques

Thomas Pellegrini; Sandrine Mouysset

Today's state of the art in speech recognition involves deep neural networks (DNN). In recent years, a certain research effort has been invested in characterizing the feature representations learned by DNNs. In this paper, we focus on convolutional neural networks (CNN) trained for phoneme recognition in French. We report clustering experiments performed on activation maps extracted from the different layers of a CNN comprised of two convolution and sub-sampling layers followed by three dense layers. Our goal was to gain insights into phone separability and the phonemic categories inferred by the network, and how they vary across successive layers. Two directions were explored with both linear and non-linear clustering techniques. First, we imposed a number of 33 classes, equal to the number of context-independent phone models for French, in order to assess the phoneme separability power of the different layers. As expected, we observed that this power increases with layer depth: from 34% to 74% in F-measure from the first convolution layer to the last dense layer, when using spectral clustering. Second, optimal numbers of classes were automatically inferred through inter- and intra-cluster measure criteria. We analyze these classes in terms of standard French phonological features.


Conference of the International Speech Communication Association | 2016

Sinusoidal modelling for ecoacoustics

Patrice Guyot; Alice Eldridge; Ying Chen Eyre-Walker; Alison Johnston; Thomas Pellegrini; Mika Peck

Biodiversity assessment is a central and urgent task, necessary for monitoring the changes to ecological systems and understanding the factors which drive these changes. Technological advances are providing new approaches to monitoring, which are particularly useful in remote regions. Situated within the framework of the emerging field of ecoacoustics, there is growing interest in the possibility of extracting ecological information from digital recordings of the acoustic environment. Rather than focusing on the identification of individual species, an increasing number of automated indices attempt to summarise acoustic activity at the community level, in order to provide a proxy for biodiversity. Originally designed for speech processing, sinusoidal modelling has previously been used as a bioacoustic tool, for example to detect particular bird species. In this paper, we demonstrate the use of sinusoidal modelling as a proxy for bird abundance. Using data from acoustic surveys made during the breeding season in UK woodland, the number of extracted sinusoidal tracks is shown to correlate with estimates of bird abundance made by expert ornithologists listening to the recordings. We also report ongoing work exploring a new approach to investigating the composition of calls in spectro-temporal space, which constitutes a promising new method for ecoacoustic biodiversity assessment.
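Sinusoidal modelling typically starts by picking spectral peaks in each analysis frame and then linking them into tracks over time; counting the resulting tracks gives the abundance proxy described above. The fragment below sketches only the peak-picking step, on an invented magnitude spectrum:

```python
def pick_peaks(magnitudes, threshold):
    """Return indices of local maxima above `threshold`: candidate
    sinusoidal partials in one frame's magnitude spectrum."""
    return [i for i in range(1, len(magnitudes) - 1)
            if magnitudes[i] > threshold
            and magnitudes[i] > magnitudes[i - 1]
            and magnitudes[i] > magnitudes[i + 1]]

# Toy 7-bin spectrum: two clear peaks stand out above the threshold.
spectrum = [0.1, 0.8, 0.3, 0.2, 0.9, 0.4, 0.1]
print(pick_peaks(spectrum, threshold=0.5))   # [1, 4]
```

Linking peaks with similar frequencies across consecutive frames (not shown) yields the tracks whose count correlates with the ornithologists' abundance estimates.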


Conference of the International Speech Communication Association | 2015

Predicting disordered speech comprehensibility from Goodness of Pronunciation scores

Lionel Fontan; Thomas Pellegrini; Julia Olcoz; Alberto Abad

Speech production assessment in disordered speech relies on tests such as intelligibility and/or comprehensibility tests. These tests are subjective and time-consuming for both the patients and the practitioners. In this paper, we report on the use of automatically-derived pronunciation scores to predict comprehensibility ratings, on a pilot development corpus comprised of 120 utterances recorded by 12 speakers with distinct pathologies. We found high correlation values (0.81) between Goodness Of Pronunciation (GOP) scores and comprehensibility ratings. We compare the use of a baseline implementation of the GOP algorithm with a variant called forced-GOP, which showed better results. A linear regression model allowed us to predict comprehensibility scores with a 20.9% relative error, compared to the reference scores given by two expert judges. A correlation value of 0.74 was obtained between the manual and the predicted scores. Most of the prediction errors concern the speakers with the most extreme ratings (the lowest or the highest values), showing that the predicted score range was globally more limited than that of the manual scores, due to the simplicity of the model.
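The regression step can be sketched with ordinary least squares on invented data (all values below are illustrative, not the paper's corpus); a negative slope matches the finding that higher GOP scores, indicating more deviant pronunciation, go with lower comprehensibility:

```python
# Toy OLS fit of comprehensibility ratings from per-speaker mean GOP.
gop = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]      # mean GOP per speaker (invented)
rating = [8.5, 7.8, 6.9, 6.1, 5.2, 4.4]   # expert rating (invented)

n = len(gop)
mean_x, mean_y = sum(gop) / n, sum(rating) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(gop, rating))
         / sum((x - mean_x) ** 2 for x in gop))
intercept = mean_y - slope * mean_x

def predict(x):
    """Predicted comprehensibility for a given mean GOP score."""
    return intercept + slope * x

print(f"slope = {slope:.2f}")   # negative: higher GOP, lower rating
```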


Processing of the Portuguese Language | 2014

Automatically Recognising European Portuguese Children’s Speech

Annika Hämäläinen; Hyongsil Cho; Sara Candeias; Thomas Pellegrini; Alberto Abad; Michael Tjalve; Isabel Trancoso; Miguel Sales Dias

This paper reports findings from an analysis of errors made by an automatic speech recogniser trained and tested with 3- to 10-year-old European Portuguese children's speech. We expected and were able to identify frequent pronunciation error patterns in the children's speech. Furthermore, we were able to correlate some of these pronunciation error patterns with automatic speech recognition errors. The findings reported in this paper are of phonetic interest but will also be useful for improving the performance of automatic speech recognisers aimed at children representing the target population of the study.

Collaboration


Dive into Thomas Pellegrini's collaborations.

Top Co-Authors

Isabel Trancoso

Instituto Superior Técnico


Jorge Baptista

University of the Algarve
