Iker Luengo
University of the Basque Country
Publication
Featured research published by Iker Luengo.
IEEE Transactions on Multimedia | 2010
Iker Luengo; Eva Navas; Inmaculada Hernáez
The definition of parameters is a crucial step in the development of a system for identifying emotions in speech. Although there is no agreement on which are the best features for this task, it is generally accepted that prosody carries most of the emotional information. Most works in the field use some kind of prosodic features, often in combination with spectral and voice quality parametrizations. Nevertheless, no systematic study has been done comparing these features. This paper presents the analysis of the characteristics of features derived from prosody, spectral envelope, and voice quality as well as their capability to discriminate emotions. In addition, early fusion and late fusion techniques for combining different information sources are evaluated. The results of this analysis are validated with experimental automatic emotion identification tests. Results suggest that spectral envelope features outperform the prosodic ones. Even when different parametrizations are combined, the late fusion of long-term spectral statistics with short-term spectral envelope parameters provides an accuracy comparable to that obtained when all parametrizations are combined.
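The difference between the two fusion strategies named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the weighted-average combination rule, and all numeric values below are assumptions chosen only for illustration.

```python
def early_fusion(*feature_vectors):
    """Early fusion: concatenate per-source features into one vector
    that a single classifier would then consume."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def late_fusion(score_sets, weights=None):
    """Late fusion: each source has its own classifier; their per-class
    scores are combined afterwards (here by a weighted average, an assumption)."""
    if weights is None:
        weights = [1.0] * len(score_sets)
    total = sum(weights)
    fused = {
        label: sum(w * scores[label] for w, scores in zip(weights, score_sets)) / total
        for label in score_sets[0]
    }
    best = max(fused, key=fused.get)
    return best, fused


# Hypothetical toy values, not data from the paper.
prosodic = [180.0, 95.0]          # e.g. mean F0 and F0 range
spectral = [0.12, -0.40, 0.33]    # e.g. long-term spectral statistics
fused_vector = early_fusion(prosodic, spectral)

spectral_scores = {"anger": 0.6, "joy": 0.3, "sadness": 0.1}
prosodic_scores = {"anger": 0.2, "joy": 0.5, "sadness": 0.3}
label, fused_scores = late_fusion([spectral_scores, prosodic_scores], weights=[2.0, 1.0])
```

In the early variant the classifier sees all features at once; in the late variant each information source can be modelled separately and only the decisions are merged.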
International Conference on Acoustics, Speech, and Signal Processing | 2007
Iker Luengo; Ibon Saratxaga; Eva Navas; Inmaculada Hernáez; Jon Sanchez; Iñaki Sainz
This paper presents a novel pitch detection algorithm (PDA) based on classical cepstrum calculation followed by dynamic programming. The algorithm has been evaluated with a 60-minute database containing 60 speakers and different recording conditions and environments. A second reference database has also been used. In addition, the performance of four popular PDAs has been evaluated with the same databases. The results show the good performance of the described algorithm in noisy conditions. Furthermore, the paper is a first initiative to evaluate widely used PDAs over an extensive and realistic database.
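As a rough illustration of the two stages mentioned in this abstract, the sketch below picks cepstral pitch candidates per frame and then chooses a smooth track with a Viterbi-style search. It is only a sketch under stated assumptions, not the published algorithm: the function names, the search range (50 to 400 Hz), the number of candidates, and the octave-jump penalty are all hypothetical, and a naive DFT is used to keep the example dependency-free.

```python
import cmath
import math


def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log magnitude spectrum (naive O(N^2) DFT)."""
    n = len(frame)
    twiddle = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]
    spectrum = [sum(frame[t] * twiddle[(k * t) % n] for t in range(n)) for k in range(n)]
    log_mag = [math.log(abs(x) + 1e-12) for x in spectrum]  # small floor avoids log(0)
    return [sum(log_mag[k] * twiddle[(k * q) % n].conjugate() for k in range(n)).real / n
            for q in range(n)]


def pitch_candidates(frame, fs, fmin=50.0, fmax=400.0, top=3):
    """Return the `top` highest cepstral values in the pitch range as (f0_hz, strength)."""
    cep = real_cepstrum(frame)
    lo, hi = int(fs / fmax), min(int(fs / fmin), len(frame) // 2)
    ranked = sorted(range(lo, hi + 1), key=lambda q: cep[q], reverse=True)[:top]
    return [(fs / q, cep[q]) for q in ranked]


def best_track(candidates_per_frame, jump_penalty=1.0):
    """Dynamic programming: maximise summed strength minus a penalty
    proportional to the pitch jump (in octaves) between consecutive frames."""
    scores = [s for _, s in candidates_per_frame[0]]
    back = []
    for prev, cur in zip(candidates_per_frame, candidates_per_frame[1:]):
        new_scores, pointers = [], []
        for f_cur, s_cur in cur:
            options = [scores[i] - jump_penalty * abs(math.log2(f_cur / f_prev))
                       for i, (f_prev, _) in enumerate(prev)]
            i_best = max(range(len(options)), key=options.__getitem__)
            new_scores.append(options[i_best] + s_cur)
            pointers.append(i_best)
        scores = new_scores
        back.append(pointers)
    idx = max(range(len(scores)), key=scores.__getitem__)
    path = [idx]
    for pointers in reversed(back):  # follow back-pointers to the first frame
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [frame[i][0] for frame, i in zip(candidates_per_frame, path)]


# Synthetic test signal (an assumption): a 100 Hz pulse train sampled at 8 kHz.
fs, n = 8000, 512
pulse_frame = [1.0 if t % 80 == 0 else 0.0 for t in range(n)]
cands = pitch_candidates(pulse_frame, fs)
track = best_track([cands, cands, cands])
```

The dynamic programming step matters when a single frame's strongest candidate is an octave error: the transition penalty lets neighbouring frames outvote it.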
Text, Speech and Dialogue | 2004
Eva Navas; Inmaculada Hernáez; Amaia Castelruiz; Iker Luengo
This paper presents a database designed to extract prosodic models of emotional speech for use in speech synthesis for standard Basque. A database of acted speech, which uses a corpus containing both neutral texts and texts semantically related to emotion, has been recorded for the six basic emotions: anger, disgust, fear, joy, sadness and surprise. Subjective evaluation of the database shows that emotions are accurately identified, so it can be used to study prosodic models of emotion in Basque.
Text, Speech and Dialogue | 2005
Eva Navas; Inmaculada Hernáez; Iker Luengo; Jon Sanchez; Ibon Saratxaga
This paper presents the analysis made to assess the suitability of semantically neutral corpora for studying emotional speech. Two corpora have been used: one with neutral texts that were common to all emotions and the other with texts related to the emotion. Subjective and objective analyses have been performed. In the subjective test, the common corpus achieved good recognition rates, although worse than those obtained with specific texts. In the objective analysis, differences among emotions are larger for common texts than for specific texts, indicating that the expression of emotions was more exaggerated in the common corpus. This is convenient for emotional speech synthesis, but not for emotion recognition. So, in this case, the common corpus is suitable for the prosodic modeling of emotions to be used in speech synthesis, but for emotion recognition specific texts are more convenient.
COST 2102'07 Proceedings of the 2007 COST action 2102 international conference on Verbal and nonverbal communication behaviours | 2007
Eva Navas; Inmaculada Hernáez; Iker Luengo; Iñaki Sainz; Ibon Saratxaga; Jon Sanchez
In expressive speech synthesis, some method of mimicking the way one specific speaker expresses emotions is needed. In this work we have studied the suitability of long-term prosodic parameters and short-term spectral parameters to reflect emotions in speech, by analyzing the results of two automatic emotion classification systems. Those systems have been trained with different emotional monospeaker databases recorded in standard Basque that include six emotions. Both of them are able to differentiate among emotions for a specific speaker with very high identification rates (above 75%), but the models are not applicable to other speakers (identification rates drop to 20%). Therefore, in the synthesis process, the control of both spectral and prosodic features is essential to get expressive speech, and when a change in speaker is desired, the values of the parameters should be re-estimated.
Iberoamerican Congress on Pattern Recognition | 2004
Eva Navas; Inmaculada Hernáez; Amaia Castelruiz; Jon Sanchez; Iker Luengo
This paper presents the acoustical study of an emotional speech database in standard Basque to determine the set of parameters that can be used for the recognition of emotions. The database is divided into two parts, one with neutral texts and another one with texts semantically related to the emotion. The study is performed on both parts, in order to know whether the same criteria may be used to recognize emotions independently of the semantic content of the text. Mean F0, F0 range, maximum positive slope in the F0 curve, mean phone duration and RMS energy are analyzed. The parameters selected can distinguish emotions in both corpora, so they are suitable for emotion recognition.
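Several of the parameters listed in this abstract are simple statistics of a frame-level F0 contour and of the waveform. The following sketch shows one possible way to compute them; it is not taken from the paper. The function names, the convention that unvoiced frames are marked with 0.0, the choice to measure slope only between consecutive voiced frames, and the sample contour are all assumptions.

```python
import math


def f0_statistics(f0_hz, frame_step_s):
    """Mean F0 (Hz), F0 range (Hz) and maximum positive slope (Hz/s).

    Assumes unvoiced frames are marked with 0.0 and that slope is only
    measured between two consecutive voiced frames.
    """
    voiced = [f for f in f0_hz if f > 0.0]
    if not voiced:
        raise ValueError("no voiced frames in the contour")
    slopes = [(b - a) / frame_step_s
              for a, b in zip(f0_hz, f0_hz[1:]) if a > 0.0 and b > 0.0]
    return {
        "mean_f0": sum(voiced) / len(voiced),
        "f0_range": max(voiced) - min(voiced),
        "max_positive_slope": max([s for s in slopes if s > 0.0], default=0.0),
    }


def rms_energy(samples):
    """Root-mean-square energy of a block of waveform samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


# Hypothetical contour with a 10 ms frame step; 0.0 marks unvoiced frames.
contour = [0.0, 110.0, 120.0, 135.0, 0.0, 130.0, 125.0]
stats = f0_statistics(contour, frame_step_s=0.01)
energy = rms_energy([1.0, -1.0, 1.0, -1.0])
```

Mean phone duration is omitted here because it needs a phone-level segmentation, which this sketch does not assume.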
Conference on Computer as a Tool | 2005
Iñaki Sainz; Eva Navas; Jon Sanchez; Iker Luengo; Inmaculada Hernáez
This paper presents the development of an oral interface to control any Windows application by means of speech, providing a user-friendly interface. This front-end is fully configurable using plain text files and can manage any program with a graphical environment that runs under the Windows operating system, using functions from the Windows API. Hidden Markov models provide the speech recognition ability, using recursive training of triphone models built with a SpeechDat database. The text-to-speech system uses an MBROLA-based algorithm and is integrated in a dynamic library. The application is aimed at users with some vision or movement handicap and is designed to be used in the Basque language.
Conference of the International Speech Communication Association | 2005
Iker Luengo; Eva Navas; Inmaculada Hernáez
IEEE Transactions on Audio, Speech, and Language Processing | 2006
Eva Navas; Inmaculada Hernáez; Iker Luengo
Conference of the International Speech Communication Association | 2009
Iker Luengo; Eva Navas; Inmaculada Hernáez