Publication


Featured research published by Asunción Moreno.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Voice Conversion Based on Weighted Frequency Warping

Daniel Erro; Asunción Moreno; Antonio Bonafonte

Any modification applied to speech signals has an impact on their perceptual quality. In particular, voice conversion to modify a source voice so that it is perceived as a specific target voice involves prosodic and spectral transformations that produce significant quality degradation. Choosing among the current voice conversion methods represents a trade-off between the similarity of the converted voice to the target voice and the quality of the resulting converted speech, both rated by listeners. This paper presents a new voice conversion method termed Weighted Frequency Warping that has a good balance between similarity and quality. This method uses a time-varying piecewise-linear frequency warping function and an energy correction filter, and it combines typical probabilistic techniques and frequency warping transformations. Compared to standard probabilistic systems, Weighted Frequency Warping results in a significant increase in quality scores, whereas the conversion scores remain almost unaltered. This paper carefully discusses the theoretical aspects of the method and the details of its implementation, and the results of an international evaluation of the new system are also included.
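The core of the method, a piecewise-linear frequency warping function, can be illustrated with a minimal sketch. The anchor frequencies below are hypothetical illustration values, not parameters estimated by the actual system, and `numpy.interp` stands in for the paper's warping machinery:

```python
import numpy as np

def piecewise_linear_warp(freqs, anchors_src, anchors_tgt):
    """Map source frequencies to target frequencies through a
    piecewise-linear warping function defined by anchor pairs.
    The anchors (e.g. formant positions) are illustrative, not
    the learned parameters of the paper's system."""
    return np.interp(freqs, anchors_src, anchors_tgt)

# Hypothetical anchor frequencies in Hz; endpoints are pinned so
# the warp covers the full analysis band.
src = [0.0, 500.0, 1500.0, 2500.0, 4000.0]
tgt = [0.0, 650.0, 1700.0, 2600.0, 4000.0]

freq_axis = np.linspace(0, 4000, 9)
warped = piecewise_linear_warp(freq_axis, src, tgt)
```

Because the warp is monotonic and pinned at the band edges, spectral envelopes keep their ordering and bandwidth after conversion, which is one reason frequency-warping methods preserve quality better than purely statistical mappings.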


IEEE Transactions on Audio, Speech, and Language Processing | 2010

INCA Algorithm for Training Voice Conversion Systems From Nonparallel Corpora

Daniel Erro; Asunción Moreno; Antonio Bonafonte

Most existing voice conversion systems, particularly those based on Gaussian mixture models, require a set of paired acoustic vectors from the source and target speakers to learn their corresponding transformation function. The alignment of phonetically equivalent source and target vectors is not problematic when the training corpus is parallel, which means that both speakers utter the same training sentences. However, in some practical situations, such as cross-lingual voice conversion, it is not possible to obtain such parallel utterances. With an aim towards increasing the versatility of current voice conversion systems, this paper proposes a new iterative alignment method that allows pairing phonetically equivalent acoustic vectors from nonparallel utterances from different speakers, even under cross-lingual conditions. This method is based on existing voice conversion techniques, and it does not require any phonetic or linguistic information. Subjective evaluation experiments show that the performance of the resulting voice conversion system is very similar to that of an equivalent system trained on a parallel corpus.
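The iterative convert / pair / re-estimate loop can be sketched roughly as follows. A plain least-squares linear map stands in for the GMM-based conversion function of the real INCA algorithm, and the data are synthetic, so this is only an illustration of the alignment idea, not the published method:

```python
import numpy as np

def inca_style_align(X, Y, n_iter=5):
    """Iteratively pair source vectors X with target vectors Y:
    convert X with the current transform, pair each converted
    vector with its nearest neighbour in Y, then re-estimate the
    transform from the pairs. A linear map replaces the GMM-based
    conversion of the actual INCA algorithm (illustration only)."""
    W = np.eye(X.shape[1])  # start from the identity transform
    idx = np.arange(len(X))
    for _ in range(n_iter):
        Xc = X @ W
        # nearest target neighbour for each converted source vector
        d = ((Xc[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        idx = d.argmin(axis=1)
        # least-squares re-estimation of the transform from the pairs
        W, *_ = np.linalg.lstsq(X, Y[idx], rcond=None)
    return W, idx

# Synthetic nonparallel data: the target set is a transformed,
# reshuffled copy of the source set, so no pairing is given a priori.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
Y = X @ np.array([[2.0, 0.0], [0.0, 0.5]]) + rng.normal(scale=0.01, size=(40, 2))
rng.shuffle(Y)  # destroy any parallel ordering
W, idx = inca_style_align(X, Y)
```

The key property, as in the paper, is that no phonetic or linguistic labels are used: the pairing emerges from acoustic proximity alone and is refined as the conversion function improves.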


International Journal of Speech Technology | 2001

Annotation in the SpeechDat Projects

Henk van den Heuvel; L.W.J. Boves; Asunción Moreno; Maurizio Omologo; Gaël Richard; Eric Sanders

A large set of spoken language resources (SLR) for various European languages is being compiled in several SpeechDat projects with the aim of training and testing speech recognizers for voice-driven services, mainly over telephone lines. This paper focuses on the annotation conventions applied to the SpeechDat SLR. These SLR contain typical examples of short monologue speech utterances with simple orthographic transcriptions in a hierarchically simple annotation structure. The annotation conventions and their underlying principles are described and compared to approaches used for related SLR. The synchronization of the orthographic transcriptions with the corresponding speech files is addressed, and the impact of the selected approach for capturing specific phonological and phonetic phenomena is discussed. In the SpeechDat projects, a number of tools have been developed to carry out the transcription of the speech. In this paper, a short description of these tools and their properties is provided. For all SpeechDat projects, an internal validity check of the databases and their annotations is carried out. The procedure of this validation campaign, the performed evaluations, and some of the results are presented.


Speech Communication | 2009

Multidialectal Spanish acoustic modeling for speech recognition

Mónica Caballero; Asunción Moreno; Albino Nogueiras

In recent years, language resources for speech recognition have been collected for many languages and, specifically, for global languages. One characteristic of global languages is their wide geographical dispersion and, consequently, their wide phonetic, lexical, and semantic dialectal variability. Even if the collected data is huge, it is difficult to represent dialectal variants accurately. This paper deals with multidialectal acoustic modeling for Spanish. The goal is to create a set of multidialectal acoustic models that represents the sounds of the Spanish language as spoken in Latin America and Spain. A comparative study of different methods for combining data across dialects is presented. The developed approaches are based on decision-tree clustering algorithms. They differ in whether a multidialectal phone set is defined, and in the decision-tree structure applied. In addition, a common overall phonetic transcription for all dialects is proposed. This transcription can be used in combination with all the proposed acoustic modeling approaches. The overall transcription combined with approaches based on a multidialectal phone set leads to a fully dialect-independent recognizer, capable of recognizing any dialect even in the total absence of training data for that dialect. Multidialectal systems are evaluated on data collected in five different countries: Spain, Colombia, Venezuela, Argentina and Mexico. The best multidialectal systems show a relative improvement of 13% over the results obtained with monodialectal systems. Experiments with dialect-independent systems have been conducted to recognize speech from Chile, a dialect not seen in the training process. The recognition results obtained for this dialect are similar to those obtained for the other dialects.
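The notion of combining shared phones, whose training data can be pooled across dialects, with dialect-specific ones can be illustrated with toy inventories. The symbols below are a deliberately simplified, hypothetical subset (e.g. Castilian /T/ for the seseo distinction, Rioplatense /Z/), not the phone sets actually used in the paper:

```python
# Hypothetical, heavily simplified phone inventories per dialect
# (SAMPA-like symbols); real Spanish inventories are larger.
inventories = {
    "Spain":     {"a", "e", "i", "o", "u", "s", "x", "T", "L"},
    "Mexico":    {"a", "e", "i", "o", "u", "s", "x"},
    "Argentina": {"a", "e", "i", "o", "u", "s", "x", "Z"},
}

# Phones present in every dialect: their training data can be pooled.
shared = set.intersection(*inventories.values())

# Phones needing dialect-specific models (e.g. Castilian /T/).
specific = {d: inv - shared for d, inv in inventories.items()}

# The multidialectal phone set is the union over all dialects.
multidialectal = set.union(*inventories.values())
```

In a decision-tree clustering framework, questions about dialect membership can then decide, per context, whether a state is shared across dialects or split, which is essentially the trade-off the paper's approaches explore.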


international conference on acoustics, speech, and signal processing | 2002

Multi-dialectal Spanish speech recognition

Albino Nogueiras; Mónica Caballero; Asunción Moreno

Spanish is a global language, spoken in a large number of countries with considerable dialectal variability. This paper deals with the suitability of using a single multi-dialectal acoustic model for all the Spanish variants spoken in Europe and Latin America. The objective is twofold. First, it allows all the available databases to be used to jointly train and improve the same system. Second, it allows a single system to serve all Spanish speakers. The paper describes the rule-based phonetic transcription used for each dialectal variant, the selection of the shared and the dialect-specific phonemes to be modeled in a multi-dialectal recognition system, and the results of a multi-dialectal system dealing with dialects in and out of the training set.


Archive | 1995

Keyword Spotting, an Application for Voice Dialing

Eduardo Lleida; José B. Mariño; Josep M. Salavedra; Asunción Moreno

The problem of detecting a given set of words in fluent speech is one of the most interesting topics in speech recognition. In this paper, we deal with the modelling and rejection of non-keyword speech for the Spanish language. As a real-time application, we present TELEMACO, a word-spotting system that detects dialing commands in fluent speech for the IBERCOM Spanish telephone system, running in a Windows environment.
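A standard way to reject non-keyword speech, of the kind this paper addresses, is to score a putative keyword against a filler (garbage) model and threshold the log-likelihood ratio. The functions and numbers below are illustrative assumptions, not TELEMACO's actual scoring:

```python
def keyword_confidence(kw_loglik: float, filler_loglik: float) -> float:
    """Log-likelihood ratio between the keyword model and the
    filler (non-keyword) model for the same stretch of speech."""
    return kw_loglik - filler_loglik

def accept(kw_loglik: float, filler_loglik: float, threshold: float = 2.0) -> bool:
    """Accept a detection only when the keyword model beats the
    filler model by at least `threshold`, a tunable operating
    point trading missed keywords against false alarms."""
    return keyword_confidence(kw_loglik, filler_loglik) >= threshold

# Illustrative scores: a clear keyword vs. out-of-vocabulary speech.
hit = accept(-40.0, -45.0)   # keyword model wins by 5
miss = accept(-40.0, -39.0)  # filler model fits better
```

Raising the threshold rejects more non-keyword speech at the cost of missing genuine dialing commands, which is the central tuning decision in any word-spotting deployment.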


NATO ASI: Speech recognition and understanding: recent advances, trends and applications | 1992

RAMSES: A Spanish Demisyllable Based Continuous Speech Recognition System

José B. Mariño; Climent Nadeu; Asunción Moreno; Eduardo Lleida; Enrique Monte; Antonio Bonafonte

A continuous speech recognition system (called RAMSES) has been built based on the demisyllable as phonetic unit and tools from connected speech recognition. Speech is parameterized by band-pass lifted LPC-cepstra, and demisyllables are represented by hidden Markov models (HMM). In this paper, the application of this system to recognize integer numbers from zero to one thousand is described. The paper contains a general overview of the system, an outline of the grammar inference, a description of the HMM training procedure, and an assessment of the recognition performance in a speaker-independent experiment.
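A demisyllable splits each syllable around its vowel, so every syllable contributes an initial (onset plus vowel) and a final (vowel plus coda) unit. The orthographic splitter below is a hypothetical toy for illustration; RAMSES operates on phonetic, not orthographic, units:

```python
VOWELS = set("aeiou")

def demisyllables(syl: str) -> tuple[str, str]:
    """Split a lower-case, orthographic Spanish syllable into an
    initial demisyllable (onset + vowel nucleus) and a final
    demisyllable (vowel nucleus + coda). A toy approximation of
    the phonetic demisyllable units used in RAMSES."""
    i = next(k for k, ch in enumerate(syl) if ch in VOWELS)  # nucleus start
    j = max(k for k, ch in enumerate(syl) if ch in VOWELS)   # nucleus end
    return syl[: j + 1], syl[i:]

# e.g. the digit syllables "dos" and "tres":
#   "dos"  -> initial "do",  final "os"
#   "tres" -> initial "tre", final "es"
```

Because consonant-vowel transitions are kept inside each unit, demisyllables capture coarticulation better than context-independent phones while needing far fewer models than whole syllables.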


conference of the international speech communication association | 2001

Speech Emotion Recognition Using Hidden Markov Models

Albino Nogueiras; Asunción Moreno; Antonio Bonafonte; José B. Mariño


language resources and evaluation | 2000

SPEECHDAT-CAR: a Large Speech Database for Automotive Environments

Asunción Moreno; Børge Lindberg; Christoph Draxler; Gaël Richard; Khalid Choukri; Stephan Euler; Jeffrey Allen


conference of the international speech communication association | 1993

Albayzin speech database: design of the phonetic corpus.

Asunción Moreno; Dolors Poch; Antonio Bonafonte; Eduardo Lleida; Joaquim Llisterri; José B. Mariño; Climent Nadeu

Collaboration


Dive into Asunción Moreno's collaboration.

Top Co-Authors

Antonio Bonafonte (Polytechnic University of Catalonia)
José B. Mariño (Polytechnic University of Catalonia)
H. van den Heuvel (Radboud University Nijmegen)
Albino Nogueiras (Polytechnic University of Catalonia)
Daniel Erro (Polytechnic University of Catalonia)
Gaël Richard (Université Paris-Saclay)
Climent Nadeu (Polytechnic University of Catalonia)
Enric Monte (Polytechnic University of Catalonia)