Inmaculada Hernáez
University of the Basque Country
Publications
Featured research published by Inmaculada Hernáez.
IEEE Transactions on Multimedia | 2010
Iker Luengo; Eva Navas; Inmaculada Hernáez
The definition of parameters is a crucial step in the development of a system for identifying emotions in speech. Although there is no agreement on which features are best for this task, it is generally accepted that prosody carries most of the emotional information. Most works in the field use some kind of prosodic features, often in combination with spectral and voice quality parametrizations. Nevertheless, no systematic study comparing these features has been carried out. This paper analyzes the characteristics of features derived from prosody, spectral envelope, and voice quality, as well as their capability to discriminate emotions. In addition, early fusion and late fusion techniques for combining different information sources are evaluated. The results of this analysis are validated with automatic emotion identification experiments. Results suggest that spectral envelope features outperform prosodic ones. Moreover, when different parametrizations are combined, the late fusion of long-term spectral statistics with short-term spectral envelope parameters provides an accuracy comparable to that obtained when all parametrizations are combined.
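The two combination strategies evaluated in this paper can be illustrated with a minimal Python sketch. It assumes NumPy, fixed-length utterance-level feature vectors, and a weighted sum of per-class scores as the late fusion rule; the feature values, scores, label set and equal default weights are hypothetical and are not taken from the paper, which may use a different fusion rule.

```python
import numpy as np

def early_fusion(feature_sets):
    """Early fusion: concatenate the feature vectors of one utterance coming from
    different sources (e.g. prosodic, spectral, voice quality) before classification."""
    return np.concatenate([np.asarray(f, dtype=float) for f in feature_sets], axis=-1)

def late_fusion(score_sets, weights=None):
    """Late fusion: combine the per-class scores (e.g. log-likelihoods) produced by
    separate classifiers, one per information source, with a weighted sum."""
    scores = np.stack([np.asarray(s, dtype=float) for s in score_sets])  # (n_sources, n_classes)
    if weights is None:
        weights = np.full(len(scores), 1.0 / len(scores))  # equal weights by default
    return np.asarray(weights, dtype=float) @ scores  # (n_classes,)

EMOTIONS = ["anger", "joy", "sadness"]  # hypothetical label set for the example

prosodic = [180.0, 35.0]      # hypothetical long-term F0 mean and std (Hz)
spectral = [0.5, -1.2, 0.3]   # hypothetical spectral envelope coefficients
fused_vector = early_fusion([prosodic, spectral])  # a single 5-dimensional vector

spectral_scores = [-1.0, -3.0, -2.0]  # hypothetical per-emotion classifier outputs
prosodic_scores = [-2.0, -1.5, -2.5]
fused_scores = late_fusion([spectral_scores, prosodic_scores])
print(fused_vector.shape, EMOTIONS[int(np.argmax(fused_scores))])  # (5,) anger
```

Early fusion lets a single classifier model correlations between sources, while late fusion keeps one model per source and only combines their decisions, which also allows per-source weights to be tuned afterwards.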
Iberoamerican Congress on Pattern Recognition | 2003
Juan J. Igarza; Iñaki Goirizelaia; Koldo Espinosa; Inmaculada Hernáez; Raúl Méndez; Jon Sanchez
Most people are used to signing documents, so the handwritten signature is a trusted and natural method for verifying user identity, one that reduces the cost of password maintenance and decreases the risk of eBusiness fraud. In the proposed system, identity is securely verified and an authentic electronic signature is created using biometric dynamic signature verification. Shape, speed, stroke order, off-tablet motion, pen pressure and timing information are captured and analyzed during the real-time act of signing. The captured values are unique to an individual and virtually impossible to duplicate. This paper presents a study of various hidden Markov model (HMM)-based techniques for signature verification. Different topologies are compared in order to obtain an optimized, high-performance signature verification system, and signal normalization preprocessing makes the system robust with respect to writer variability.
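As a rough illustration of HMM-based verification, the sketch below scores a feature sequence with a left-to-right HMM using the forward algorithm, and applies per-channel z-score normalization as one possible form of signal normalization. It assumes NumPy, diagonal Gaussian emissions, a simple topology without skip transitions and a stay probability of 0.6; these choices and all function names are hypothetical, and the topologies actually compared in the paper are not reproduced here.

```python
import numpy as np

def normalize_signature(x):
    """Z-score each captured channel (e.g. x, y, pressure) over the whole signature,
    reducing the effect of position, size and pressure offsets between signings."""
    x = np.asarray(x, dtype=float)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # constant channels: avoid division by zero
    return (x - x.mean(axis=0)) / std

def left_to_right_log_transitions(n_states, p_stay=0.6):
    """Log transition matrix of a left-to-right HMM without skips: each state
    either stays or moves to the next one, and the last state is absorbing."""
    a = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        a[i, i] = p_stay
        a[i, i + 1] = 1.0 - p_stay
    a[-1, -1] = 1.0
    with np.errstate(divide="ignore"):
        return np.log(a)  # forbidden transitions become -inf

def gaussian_log_emissions(obs, means, variances):
    """Log density of each frame under each state's diagonal Gaussian; shape (T, n_states)."""
    diff = obs[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(np.log(2 * np.pi * variances)[None] + diff**2 / variances[None], axis=2)

def forward_log_likelihood(obs, log_a, means, variances):
    """Forward algorithm in the log domain; the model is forced to start in state 0."""
    log_b = gaussian_log_emissions(np.asarray(obs, dtype=float), means, variances)
    alpha = np.full(log_a.shape[0], -np.inf)
    alpha[0] = log_b[0, 0]
    for t in range(1, len(log_b)):
        # alpha[j] = log sum_i exp(alpha[i] + log_a[i, j]) + log_b[t, j]
        alpha = np.logaddexp.reduce(alpha[:, None] + log_a, axis=0) + log_b[t]
    return np.logaddexp.reduce(alpha)

# Toy 3-state model over 2-D features; a reversed sequence plays the role of a forgery.
means = np.array([[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]])
variances = np.ones_like(means)
log_a = left_to_right_log_transitions(3)
genuine = np.array([[-1, -1], [-1, -1], [0, 0], [0, 0], [1, 1], [1, 1]], dtype=float)
forged = genuine[::-1]
print(forward_log_likelihood(genuine, log_a, means, variances)
      > forward_log_likelihood(forged, log_a, means, variances))  # True
```

In a verification setting, the claimed identity would be accepted when the (length-normalized) log-likelihood exceeds a threshold chosen on held-out genuine and forged signatures; the left-to-right constraint is what makes the model sensitive to the temporal order of the strokes.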
International Conference on Acoustics, Speech, and Signal Processing | 2007
Iker Luengo; Ibon Saratxaga; Eva Navas; Inmaculada Hernáez; Jon Sanchez; Iñaki Sainz
A novel pitch detection algorithm (PDA) based on classical cepstrum calculation followed by dynamic programming is presented in this paper. The algorithm has been evaluated on a 60-minute database containing 60 speakers and different recording conditions and environments. A second reference database has also been used. In addition, the performance of four popular PDAs has been evaluated on the same databases. The results show that the described algorithm performs well in noisy conditions. Furthermore, the paper is a first initiative to evaluate widely used PDAs on an extensive and realistic database.
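A minimal sketch of cepstral pitch estimation followed by dynamic programming might look as follows. It assumes NumPy, 32 ms Hann frames at 16 kHz, a search range of 80 to 400 Hz, a local cost given by the normalized cepstral peak height, and a transition cost proportional to the jump in log-F0; all of these, including the function names, are illustrative assumptions rather than the authors' algorithm, and the sketch makes no voiced/unvoiced decision.

```python
import numpy as np

def cepstral_candidates(x, fs, frame_len=512, hop=160, f0_min=80.0, f0_max=400.0):
    """Real cepstrum of each Hann-windowed frame, kept only at the lags (in samples)
    that correspond to pitch values between f0_min and f0_max."""
    lags = np.arange(int(np.floor(fs / f0_max)), int(np.ceil(fs / f0_min)) + 1)
    if 2 * lags[-1] >= frame_len:
        raise ValueError("frame too short for the lowest pitch in the search range")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    ceps = np.empty((n_frames, len(lags)))
    for i in range(n_frames):
        frame = x[i * hop:i * hop + frame_len] * window
        log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-10)  # small floor avoids log(0)
        ceps[i] = np.fft.irfft(log_mag, n=frame_len)[lags]
    return lags, ceps

def track_pitch(lags, ceps, fs, jump_weight=2.0):
    """Viterbi search over lag candidates: the local cost rewards a high cepstral
    peak, the transition cost penalizes large jumps in log-F0 between frames."""
    local = -ceps / (np.abs(ceps).max(axis=1, keepdims=True) + 1e-12)
    log_lag = np.log(lags)
    trans = jump_weight * np.abs(log_lag[:, None] - log_lag[None, :])
    cost = local[0].copy()
    back = np.zeros(ceps.shape, dtype=int)
    for t in range(1, len(ceps)):
        total = cost[:, None] + trans  # total[i, j]: come from lag i, move to lag j
        back[t] = np.argmin(total, axis=0)
        cost = total[back[t], np.arange(len(lags))] + local[t]
    path = np.empty(len(ceps), dtype=int)
    path[-1] = int(np.argmin(cost))
    for t in range(len(ceps) - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return fs / lags[path]  # F0 in Hz for every frame

fs = 16000
n = np.arange(8000)                      # 0.5 s of signal
x = (n % 80 == 0).astype(float)          # pulse train, period 80 samples = 200 Hz
x += 0.01 * np.random.default_rng(0).standard_normal(len(x))
lags, ceps = cepstral_candidates(x, fs)
f0 = track_pitch(lags, ceps, fs)
print(np.median(f0))
```

With this synthetic 200 Hz pulse train plus light noise, the median estimate should lie close to 200 Hz. The log-F0 transition cost is what discourages frame-to-frame octave jumps, such as switching to the second cepstral peak at twice the true period.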
IEEE Transactions on Audio, Speech, and Language Processing | 2010
Daniel Erro; Eva Navas; Inmaculada Hernáez; Ibon Saratxaga
Voice conversion has traditionally focused on the spectrum. Current systems lack a solid prosody conversion method suitable for different speaking styles. Recently, the unit selection technique has been applied to transform emotional intonation contours. This paper goes one step further: it explores strategies for training and configuring the selection cost function in an emotion conversion application. The proposed system, which uses accent groups as basic intonation units and also converts phoneme durations and intensity, is evaluated by means of a carefully designed subjective test involving the big six emotions. Although the expressiveness of the converted sentences is still far from that of natural emotional speech, satisfactory results are obtained when different configurations are used for different emotions.
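Unit selection with a configurable cost function can be sketched as a Viterbi search over candidate units, where the total cost is a weighted sum of target costs and concatenation costs. In this hypothetical Python sketch, each accent-group unit is reduced to an F0 contour and a duration, the target cost compares mean F0 (in semitones) and duration, and the concatenation cost penalizes the F0 jump at the join; the weights `w_target` and `w_concat` merely stand in for the kind of parameters the paper trains and configures, and none of these definitions come from the paper.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Unit:
    """Hypothetical accent-group unit: a short F0 contour (Hz) and a duration (s)."""
    f0: np.ndarray
    duration: float

def target_cost(target, unit, w_f0=1.0, w_dur=1.0):
    # Mismatch in mean F0 (semitones) and in duration between desired and candidate unit
    d_f0 = 12 * abs(np.log2(unit.f0.mean() / target.f0.mean()))
    d_dur = abs(unit.duration - target.duration)
    return w_f0 * d_f0 + w_dur * d_dur

def concat_cost(prev, unit):
    # F0 jump (semitones) between the end of the previous unit and the start of the next
    return 12 * abs(np.log2(unit.f0[0] / prev.f0[-1]))

def select_units(targets, candidates, w_target=1.0, w_concat=1.0):
    """Viterbi search; candidates[i] lists the database units available for targets[i].
    Returns the index of the chosen candidate for each target position."""
    cost = [w_target * target_cost(targets[0], u) for u in candidates[0]]
    back = []
    for i in range(1, len(targets)):
        new_cost, ptr = [], []
        for u in candidates[i]:
            totals = [cost[k] + w_concat * concat_cost(p, u)
                      for k, p in enumerate(candidates[i - 1])]
            k_best = int(np.argmin(totals))
            ptr.append(k_best)
            new_cost.append(totals[k_best] + w_target * target_cost(targets[i], u))
        cost = new_cost
        back.append(ptr)
    path = [int(np.argmin(cost))]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

targets = [Unit(np.array([200.0, 220.0]), 0.30), Unit(np.array([220.0, 180.0]), 0.25)]
candidates = [
    [Unit(np.array([120.0, 130.0]), 0.30), Unit(np.array([200.0, 220.0]), 0.30)],
    [Unit(np.array([220.0, 180.0]), 0.25), Unit(np.array([100.0, 90.0]), 0.40)],
]
print(select_units(targets, candidates))  # [1, 0]
```

Using different configurations for different emotions would then amount to choosing separate weight settings per target emotion.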
Text, Speech and Dialogue | 2004
Eva Navas; Inmaculada Hernáez; Amaia Castelruiz; Iker Luengo
This paper presents a database designed to extract prosodic models of emotional speech for use in speech synthesis for standard Basque. A database of acted speech has been recorded for the six basic emotions (anger, disgust, fear, joy, sadness and surprise), using a corpus that contains both neutral texts and texts semantically related to emotion. Subjective evaluation of the database shows that the emotions are accurately identified, so it can be used to study prosodic models of emotion in Basque.
International Symposium on Universal Communication | 2008
K. Arrieta; Igor Leturia; Urtza Iturraspe; A.D. de Ilarraza; Kepa Sarasola; Inmaculada Hernáez; Eva Navas
AnHitz is a project promoted by the Basque Government to develop language technologies for the Basque language. The participants in AnHitz are research groups with very different backgrounds: text processing, speech processing and multimedia. The project aims to further develop existing language, speech and visual technologies for Basque; so far it has produced 7 language resources, 9 NLP tools and 5 applications. In addition, in the last year of the project, these resources and tools (both pre-existing ones and those generated in the project) are being integrated for the first time into a content management application for Basque with a natural language communication interface. This application consists of a Question Answering system and a Cross-Lingual Information Retrieval (CLIR) system in the area of Science and Technology. The interaction between the system and the user will be in Basque (CLIR results that are not in Basque will be translated by Machine Translation), using Speech Synthesis, Automatic Speech Recognition and a Visual Interface. The resources, technologies and tools being developed are already at a very advanced stage, and the implementation of the content management application that integrates them all is under way and due to be completed by October 2008.
Text, Speech and Dialogue | 2005
Eva Navas; Inmaculada Hernáez; Iker Luengo; Jon Sanchez; Ibon Saratxaga
This paper presents the analysis carried out to assess the suitability of semantically neutral corpora for studying emotional speech. Two corpora have been used: one with neutral texts common to all emotions, and another with texts related to each emotion. Both subjective and objective analyses have been performed. In the subjective test, the common corpus achieved good recognition rates, although worse than those obtained with emotion-specific texts. In the objective analysis, differences among emotions are larger for common texts than for specific texts, indicating that emotions were expressed in a more exaggerated way in the common corpus. This is convenient for emotional speech synthesis, but not for emotion recognition. Therefore, the common corpus is suitable for the prosodic modeling of emotions for speech synthesis, whereas specific texts are more convenient for emotion recognition.
COST 2102'07: Proceedings of the 2007 COST Action 2102 International Conference on Verbal and Nonverbal Communication Behaviours | 2007
Eva Navas; Inmaculada Hernáez; Iker Luengo; Iñaki Sainz; Ibon Saratxaga; Jon Sanchez
Expressive speech synthesis requires some method of mimicking the way a specific speaker expresses emotions. In this work we have studied the suitability of long-term prosodic parameters and short-term spectral parameters for reflecting emotions in speech, by analyzing the results of two automatic emotion classification systems. These systems have been trained with different emotional single-speaker databases recorded in standard Basque that include six emotions. Both of them are able to differentiate among emotions for a specific speaker with very high identification rates (above 75%), but the models are not applicable to other speakers (identification rates drop to 20%). Therefore, in the synthesis process, control of both spectral and prosodic features is essential to obtain expressive speech, and when a change of speaker is desired, the parameter values should be re-estimated.
Biometric Technology for Human Identification Conference | 2005
Juan J. Igarza; Inmaculada Hernáez; Iñaki Goirizelaia; Koldo Espinosa; Jon Escolar
In this paper we present work on off-line signature verification that continues a previous work using Left-to-Right Hidden Markov Models (LR-HMM), extending those models to static, or off-line, signature processing by means of the results of image connectivity analysis. The chain encoding of the perimeter points of each blob obtained by this analysis is an ordered set of points in the plane, running clockwise around the blob's perimeter. Two models are generated depending on how the blobs obtained from the connectivity analysis are ordered. In the first, blobs are ordered by perimeter length. In the second, blobs are ordered in their natural reading order, i.e. from top to bottom and left to right. Finally, two LR-HMM models are trained using the (x, y) coordinates of the chain codes obtained with these two techniques, together with a set of local geometrical features derived from them, such as polar coordinates referred to the center of ink, local radii, segment lengths and local tangent angle. Verification results of the two techniques are compared on a biometric database containing skilled forgeries.
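The two blob orderings and the local geometric features can be sketched in Python as follows. The sketch assumes NumPy, blobs given as arrays of clockwise (x, y) perimeter points in image coordinates (y grows downward), reading order defined by each blob's top edge and then its left edge, the centroid of the contour points as the center of ink, and "local radius" interpreted as the distance to that center; all of these interpretations and the function names are assumptions, not the paper's definitions.

```python
import numpy as np

def perimeter_length(blob):
    """Length of the closed contour through the blob's ordered perimeter points."""
    closed = np.vstack([blob, blob[:1]])
    return float(np.sum(np.hypot(*np.diff(closed, axis=0).T)))

def order_blobs(blobs, mode="perimeter"):
    """Return blob indices: longest perimeter first, or reading order
    (top edge first, ties broken by left edge)."""
    if mode == "perimeter":
        return sorted(range(len(blobs)), key=lambda i: -perimeter_length(blobs[i]))
    if mode == "reading":
        return sorted(range(len(blobs)), key=lambda i: (blobs[i][:, 1].min(), blobs[i][:, 0].min()))
    raise ValueError(f"unknown mode: {mode}")

def local_features(points):
    """Per-point observation vector: x, y, radius and angle relative to the center
    of ink, segment length to the next point, and local tangent angle."""
    pts = np.asarray(points, dtype=float)
    rel = pts - pts.mean(axis=0)  # assumption: centroid of contour points = center of ink
    radius = np.hypot(rel[:, 0], rel[:, 1])
    angle = np.arctan2(rel[:, 1], rel[:, 0])
    d = np.diff(pts, axis=0, append=pts[-1:])  # last point gets a zero-length segment
    seg_len = np.hypot(d[:, 0], d[:, 1])
    tangent = np.arctan2(d[:, 1], d[:, 0])
    return np.column_stack([pts, radius, angle, seg_len, tangent])

small = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)  # top-left blob
large = np.array([[5, 5], [9, 5], [9, 9], [5, 9]], dtype=float)  # lower-right blob
blobs = [small, large]
print(order_blobs(blobs, "perimeter"))  # [1, 0]
print(order_blobs(blobs, "reading"))    # [0, 1]
obs = local_features(np.vstack([blobs[i] for i in order_blobs(blobs, "reading")]))
print(obs.shape)  # (8, 6)
```

The resulting observation sequence could then be fed to an LR-HMM in the same way as an on-line signature, which is the point of turning a static image into an ordered set of points.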
Speech Communication | 2008
Eva Navas; Inmaculada Hernáez; Iñaki Sainz
This paper presents the evaluation of automatic break insertion for standard Basque. Basque is an agglutinative, inflected language, and POS features, widely used for other languages, are not enough to accurately predict where breaks should be inserted in the text. Other morpho-syntactic features, such as grammatical case and information about syntagms, have also been taken into account. CARTs have been used to predict break locations on a textual corpus gathered specifically for this study, from which sentence-internal punctuation marks have been removed. After applying parameter selection to the whole morpho-syntactic feature set, the best features were used to build two CARTs: T1, which gives the same importance to deletion and insertion errors, and T2, which tries to minimize insertion errors. The objective evaluation of the break insertion algorithms gives a κ statistic of 0.518 and an F-measure of 0.757 for the T1 tree. The algorithms have also been evaluated subjectively, and although T1 had better objective measures, it made more serious errors than T2.
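The objective measures reported for the break predictors can be computed by treating each word juncture as a binary break/no-break decision. This Python sketch assumes that κ denotes Cohen's kappa and that F is the F-measure of the break class; the toy labels and the function name are hypothetical, and the insertion and deletion counts correspond to the two error types that T1 weights equally and T2 tries to reduce.

```python
import numpy as np

def break_metrics(reference, predicted):
    """Compare predicted and reference breaks, one 0/1 label per word juncture.
    Returns Cohen's kappa, precision/recall/F-measure for the 'break' class,
    and the counts of insertion (false break) and deletion (missed break) errors."""
    ref = np.asarray(reference, dtype=bool)
    pred = np.asarray(predicted, dtype=bool)
    n = ref.size
    tp = int(np.sum(ref & pred))
    ins = int(np.sum(~ref & pred))   # insertion errors
    dele = int(np.sum(ref & ~pred))  # deletion errors
    tn = n - tp - ins - dele
    p_obs = (tp + tn) / n            # observed agreement
    # agreement expected by chance from the marginal break/no-break rates
    p_exp = ((tp + ins) * (tp + dele) + (tn + dele) * (tn + ins)) / n**2
    kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else 1.0
    precision = tp / (tp + ins) if tp + ins else 0.0
    recall = tp / (tp + dele) if tp + dele else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"kappa": kappa, "precision": precision, "recall": recall, "f": f,
            "insertions": ins, "deletions": dele}

reference = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]
m = break_metrics(reference, predicted)
print(round(m["kappa"], 3), round(m["f"], 3), m["insertions"], m["deletions"])  # 0.467 0.667 1 1
```

Kappa corrects the raw agreement for the fact that most junctures carry no break, which is why it is reported alongside the F-measure rather than plain accuracy.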