Arlindo Veiga
University of Coimbra
Publication
Featured research published by Arlindo Veiga.
IberSPEECH | 2012
Arlindo Veiga; Dirce Celorico; Jorge Proença; Sara Candeias; Fernando Perdigão
This study presents an approach to the task of automatically classifying and detecting speaking styles. The detection of speaking styles is useful for the segmentation of multimedia data into consistent parts and has important applications, such as identifying speech segments to train acoustic models for speech recognition. In this work the database consists of daily news broadcasts from Portuguese television, in which two main speaking styles are evident: read speech from voice-overs and anchors, and spontaneous speech from interviews and commentaries. Using a combination of phonetic and prosodic features, we can separate these two speaking styles with good accuracy (93.7% read, 69.5% spontaneous). This is performed in two steps. The first step separates the speech segments from the non-speech audio segments, and the second step classifies read versus spontaneous speaking style. The use of phonetic and prosodic features provides complementary information that leads to an improvement in the classification and detection task.
processing of the portuguese language | 2008
José Lopes; Cláudio Neves; Arlindo Veiga; Alexandre M. A. Maciel; Carla Lopes; Fernando Perdigão; Luis A. S. V. de Sa
This paper describes the development of a robust speech recognition system using a database collected in the scope of the Tecnovoz project. The speech recognition system is speaker independent, robust to noise and operates on a small-footprint embedded hardware platform. Issues concerning the database, the training of the acoustic models, the noise suppression front-end and the recognizer's confidence measure are addressed in the paper. Although the database was especially designed for specific small-vocabulary tasks, the best system performance was obtained using triphone models rather than whole-word models.
european signal processing conference | 2015
Jorge Proença; Arlindo Veiga; Fernando Perdigão
This paper presents an approach to the Query-by-Example task of finding spoken queries in speech databases when the intended match may be non-exact or slightly complex. The system is low-resource, as it addresses the case where the language of the queries and of the searched audio is unspecified. Our method is based on a modified Dynamic Time Warping (DTW) algorithm that uses posteriorgrams and extracts intricate paths to account for special cases of query match such as word re-ordering, lexical variations and filler content. The system was evaluated on the MediaEval 2014 task of Query by Example Search on Speech (QUESST), where the spoken data is from different languages, unknown to the participant. We combined the results of five DTW modifications computed on the output of three phoneme recognizers of different languages. The combination of all systems provided the best performance overall and improved detection of complex case queries.
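As a rough illustration of the kind of search described above, the following is a minimal sketch of plain subsequence DTW over posteriorgrams. It is not the modified algorithm from the paper: the function names, the negative-log inner-product frame distance and the normalization by query length are all assumptions chosen for illustration.

```python
import math


def frame_distance(p, q, eps=1e-10):
    """Distance between two posterior vectors: -log of their inner product.

    The eps floor (an assumed value) avoids log(0) for orthogonal frames.
    """
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))


def subsequence_dtw(query, audio):
    """Find the best match of `query` anywhere inside `audio`.

    Both arguments are lists of posterior vectors (one per frame).
    Returns (cost normalized by query length, index of the last matched
    audio frame). The match may start at any audio frame at no cost.
    """
    n, m = len(query), len(audio)
    inf = float("inf")
    # cost[i][j]: best accumulated cost of aligning query[0..i]
    # with an audio segment that ends at frame j.
    cost = [[inf] * m for _ in range(n)]
    for j in range(m):
        cost[0][j] = frame_distance(query[0], audio[j])  # free start point
    for i in range(1, n):
        for j in range(m):
            best_prev = cost[i - 1][j]
            if j > 0:
                best_prev = min(best_prev, cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = frame_distance(query[i], audio[j]) + best_prev
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end] / n, end
```

A lower normalized cost indicates a better match; a detection threshold on that cost would have to be tuned on held-out data.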
processing of the portuguese language | 2014
Arlindo Veiga; Carla Lopes; Luis A. S. V. de Sa; Fernando Perdigão
This paper presents a study on keyword spotting systems based on acoustic similarity between a filler model and a keyword model. The ratio between the keyword model likelihood and the generic (filler) model likelihood is used by the classifier to detect relevant peak values that indicate keyword occurrences. We have changed the standard keyword spotting scheme to allow keyword detection in a single forward step. We propose a new log-likelihood ratio normalization to minimize the effect of word length on the classifier performance. Tests show the effectiveness of our normalization method against two other methods. Experiments were performed on continuous speech utterances of the Portuguese TECNOVOZ database (read sentences) with keywords of several lengths.
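To make the role of length normalization concrete, here is a small sketch of a log-likelihood ratio score divided by the number of frames. This per-frame normalization is a common baseline and is only an assumption here; the abstract does not specify the normalization the authors propose, and the function names and candidate format are hypothetical.

```python
def normalized_llr(keyword_loglik, filler_loglik, num_frames):
    """Per-frame log-likelihood ratio between keyword and filler models.

    Dividing by the segment length (in frames) reduces the dependence of
    the score on keyword duration. This is a generic baseline, not the
    normalization proposed in the paper.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return (keyword_loglik - filler_loglik) / num_frames


def detect_keywords(candidates, threshold):
    """Keep candidates whose normalized score reaches the threshold.

    Each candidate is a hypothetical tuple:
    (label, keyword_loglik, filler_loglik, num_frames).
    """
    return [
        label
        for label, kw, filler, frames in candidates
        if normalized_llr(kw, filler, frames) >= threshold
    ]
```

With this normalization, a long keyword and a short keyword that fit their models equally well per frame receive the same score, so a single threshold can be shared across keywords.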
Journal of the Brazilian Computer Society | 2013
Arlindo Veiga; Sara Candeias; Fernando Perdigão
This paper addresses the problem of grapheme-to-phoneme conversion to create a pronunciation dictionary from a vocabulary of the most frequent words in European Portuguese. A system based on a mixed approach, founded on a stochastic model with embedded rules for stressed vowel assignment, is described. The implemented model can generate pronunciations for unrestricted words; however, a dictionary with the 40k most frequent words was constructed and corrected interactively. The dictionary includes homographs with multiple pronunciations. The vocabulary was defined using the CETEMPúblico corpus. The model and dictionary are publicly available.
processing of the portuguese language | 2012
Carla Lopes; Arlindo Veiga; Fernando Perdigão
This paper introduces a European Portuguese speech database containing spoken material recorded from children. The need for such a database arose from the need to train phone models for the development of a computer-aided speech therapy system. Articulatory disorders affect a significant number of children of pre-school age. We propose a system intended to assist and reinforce conventional speech therapy programs. Through the systematic use of games, it learns which phones the child has the most difficulty pronouncing. The child is then led to practise the production of those phones by playing games. Another benefit of a children's speech database is that accurate children's phone recognition is only possible using training data that reflects the population of users. This is a difficult task due to the high pitch of children's speech.
processing of the portuguese language | 2014
Jorge Proença; Arlindo Veiga; Sara Candeias; João Lemos; Cristina Januário; Fernando Perdigão
This study intends to identify acoustic and phonetic characteristics of the speech of Parkinson’s Disease (PD) patients, who usually manifest hypokinetic dysarthria. A speech database has been collected from a control group and from a group of patients with similar PD severity, but with different degrees of hypokinetic dysarthria. First and second formant frequencies of vowels in continuous speech were analyzed. Several classifiers were built using phonetic features and a range of acoustic features based on cepstral coefficients with the objective of identifying hypokinetic dysarthria. Results show a centralization of vowel formant frequencies for PD speech, as expected. However, some of the features highlighted in the literature for discriminating PD speech were not always found to be statistically significant. The automatic classification tasks to identify the most problematic speakers resulted in high precision and sensitivity by using two formant metrics simultaneously and in even higher performance by using acoustic dynamic parameters.
international conference on signal processing | 2008
Cláudio Neves; Arlindo Veiga; Luis A. S. V. de Sa; Fernando Perdigão
A powerful feature extraction system for noise-robust speech recognition was standardized by ETSI. The system was developed for distributed speech recognition (DSR) and includes an advanced front-end (AFE) to be implemented in client terminals, which send the extracted parameters to a remote server that runs a speech recognition engine. In view of the integration of a noise-robust front-end in an embedded speech recognition system, which simultaneously performs the feature extraction and the speech recognition tasks, we propose a modified implementation of the front-end with lower computational requirements. Using the Aurora 2 speech database, we evaluate the impact on performance of the blind equalization (BE) block, the gain factorization (GF) block and the SNR-dependent waveform processing (SWP) block that are used in the AFE. We conclude that our modified front-end, using cepstral mean normalization (CMN) and dropping BE, GF and SWP, outperforms the AFE in a practical task.
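For readers unfamiliar with cepstral mean normalization, the sketch below shows its simplest per-utterance form: the mean of each cepstral coefficient over all frames is subtracted from every frame. The function name and the list-of-lists input format are assumptions, and an embedded system such as the one described may well use a running or windowed mean instead.

```python
def cepstral_mean_normalization(frames):
    """Subtract the per-coefficient mean over an utterance.

    `frames` is a list of cepstral vectors (lists of floats), one per
    frame, all of the same dimension. Returns new normalized vectors.
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    # Mean of each cepstral coefficient across the whole utterance.
    means = [sum(f[k] for f in frames) / num_frames for k in range(dim)]
    return [[f[k] - means[k] for k in range(dim)] for f in frames]
```

Because a stationary channel adds a roughly constant offset in the cepstral domain, removing the mean compensates for it at very low computational cost.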
processing of the portuguese language | 2014
Vanessa Marquiafável; Christopher Shulby; Arlindo Veiga; Jorge Proença; Sara Candeias; Fernando Perdigão
The correct automatic pronunciation of words is a nontrivial problem, even for inflexions of Portuguese verbs, and has not yet been systematically solved if verbal irregularity is taken into account. The purpose of this work is to enhance a grapheme-to-phoneme system with a verb pronunciation system for both varieties of Portuguese, Brazilian (BP) and European (EP), given only the infinitive form of the verb. The most common verbs for BP and EP (1000 and 2600 respectively) constituted our database to test the pronunciation system. A detailed and systematic analysis of regular and non-regular pronunciation forms of the inflected verbs was performed, and an index of irregularity for verb pronunciation is proposed. A rule-based algorithm to pronounce all inflexions according to verb paradigms is also described. The defined paradigms are, with a high level of certainty, representative of all the verbs of Portuguese.
processing of the portuguese language | 2010
Carla Lopes; Arlindo Veiga; Fernando Perdigão
Phone recognition experiments give information about the confusions between phones. Grouping the most confusable phones and making a multilevel hierarchical classification should improve phone recognition. In this paper a clustering method based on the phone confusion matrix is investigated for the data-driven generation of phonetic broad classes (PBC) of the Portuguese language. The method is based on a statistical similarity measurement rather than on acoustic or phonetic knowledge. Results are presented for two phone recognisers (TIMIT corpus and Portuguese TECNOVOZ database).
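The idea of deriving broad classes from confusions can be sketched as an agglomerative clustering over a confusion matrix. The similarity used below (the symmetrized, row-normalized confusion rate) and the average-linkage merging rule are assumptions for illustration; the abstract does not state which statistical similarity measure the authors use.

```python
def confusion_similarity(conf):
    """Symmetric similarity from a confusion matrix of raw counts.

    conf[i][j] counts how often phone i was recognized as phone j.
    Rows are normalized to rates (each row is assumed to have a
    positive sum), then averaged with the transpose.
    """
    n = len(conf)
    rates = [[c / sum(row) for c in row] for row in conf]
    return [[0.5 * (rates[i][j] + rates[j][i]) for j in range(n)]
            for i in range(n)]


def cluster_phones(labels, conf, num_classes):
    """Merge the most mutually confusable groups until num_classes remain."""
    sim = confusion_similarity(conf)
    clusters = [[i] for i in range(len(labels))]
    while len(clusters) > num_classes:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                # Average-linkage score between the two groups.
                score = sum(sim[i][j] for i, j in pairs) / len(pairs)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(labels[i] for i in group) for group in clusters]
```

On a toy matrix in which two plosives are mostly confused with each other and two vowels likewise, asking for two classes would be expected to separate plosives from vowels.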