Iñaki Sainz

University of the Basque Country

Publication

Featured research published by Iñaki Sainz.

IEEE Journal of Selected Topics in Signal Processing | 2014

Harmonics Plus Noise Model Based Vocoder for Statistical Parametric Speech Synthesis

Daniel Erro; Iñaki Sainz; Eva Navas; Inma Hernaez

This article explores the potential of the harmonics plus noise model of speech in the development of a high-quality vocoder applicable in statistical frameworks, particularly in modern speech synthesizers. It presents an extensive explanation of all the different alternatives considered during the design of the HNM-based vocoder, together with the corresponding objective and subjective experiments, and a careful description of its implementation details. Three aspects of the analysis have been investigated: refinement of the pitch estimation using quasi-harmonic analysis, study and comparison of several spectral envelope analysis procedures, and strategies to analyze and model the maximum voiced frequency. The performance of the resulting vocoder is shown to be similar to that of state-of-the-art vocoders in synthesis tasks.

International Conference on Acoustics, Speech, and Signal Processing | 2007

Evaluation of Pitch Detection Algorithms Under Real Conditions

Iker Luengo; Ibon Saratxaga; Eva Navas; Inmaculada Hernáez; Jon Sanchez; Iñaki Sainz

A novel algorithm based on classical cepstrum calculation followed by dynamic programming is presented in this paper. The algorithm has been evaluated with a 60-minute database containing 60 speakers and different recording conditions and environments. A second reference database has also been used. In addition, the performance of four popular pitch detection algorithms (PDAs) has been evaluated with the same databases. The results show the good performance of the described algorithm in noisy conditions. Furthermore, the paper is a first initiative to evaluate widely used PDAs over an extensive and realistic database.
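
As a rough illustration of the cepstral stage mentioned in this abstract, the sketch below estimates F0 for a single frame by picking the largest real-cepstrum peak inside a plausible pitch range. It is not the authors' algorithm: the dynamic programming step, windowing and any voicing decision are left out, and the function names (`dft`, `real_cepstrum`, `cepstral_f0`, `make_pulse_train`), the 60 to 400 Hz search range and the synthetic test frame are all assumptions made for the example.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform; adequate for one short frame."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    twiddle = [cmath.exp(sign * 2j * math.pi * k / n) for k in range(n)]
    out = [sum(values[t] * twiddle[(k * t) % n] for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def real_cepstrum(frame, eps=1e-12):
    """Inverse DFT of the log-magnitude spectrum of the frame."""
    log_mag = [math.log(abs(v) + eps) for v in dft(frame)]
    return [v.real for v in dft(log_mag, inverse=True)]


def cepstral_f0(frame, fs, f_min=60.0, f_max=400.0):
    """Return fs / q, where q is the quefrency (in samples) of the largest
    cepstral value between fs / f_max and fs / f_min."""
    cep = real_cepstrum(frame)
    q_lo = int(fs / f_max)
    q_hi = min(int(fs / f_min), len(frame) // 2)
    q_peak = max(range(q_lo, q_hi + 1), key=lambda q: cep[q])
    return fs / q_peak


def make_pulse_train(n_samples, period, decay):
    """Hypothetical test signal: decaying pulses every `period` samples."""
    frame = [0.0] * n_samples
    for m, pos in enumerate(range(0, n_samples, period)):
        frame[pos] = decay ** m
    return frame


# A 512-sample frame at 8 kHz with one pulse every 64 samples (125 Hz).
frame = make_pulse_train(512, 64, 0.7)
print(cepstral_f0(frame, 8000))
```

A per-frame peak pick like this is prone to octave errors on real speech; choosing among several candidate peaks across consecutive frames is the kind of problem a dynamic programming pass is typically used for.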

International Conference on Acoustics, Speech, and Signal Processing | 2011

HNM-based MFCC+F0 extractor applied to statistical speech synthesis

Daniel Erro; Iñaki Sainz; Eva Navas; Inma Hernaez

Currently, the statistical framework based on Hidden Markov Models (HMMs) plays a relevant role in speech synthesis, while voice conversion systems based on Gaussian Mixture Models (GMMs) are almost standard. In both cases, statistical modeling is applied to learn distributions of acoustic vectors extracted from speech signals, each vector containing a suitable parametric representation of one speech frame. The overall performance of the systems is often limited by the accuracy of the underlying speech parameterization and reconstruction method. The method presented in this paper allows accurate MFCC extraction and high-quality reconstruction of speech signals assuming a Harmonics plus Noise Model (HNM). Its suitability for high-quality HMM-based speech synthesis is shown through subjective tests.

COST 2102'07 Proceedings of the 2007 COST action 2102 international conference on Verbal and nonverbal communication behaviours | 2007

Meaningful parameters in emotion characterisation

Eva Navas; Inmaculada Hernáez; Iker Luengo; Iñaki Sainz; Ibon Saratxaga; Jon Sanchez

In expressive speech synthesis, some method of mimicking the way one specific speaker expresses emotions is needed. In this work we have studied the suitability of long-term prosodic parameters and short-term spectral parameters to reflect emotions in speech, by analysing the results of two automatic emotion classification systems. Those systems have been trained with different emotional single-speaker databases recorded in standard Basque that include six emotions. Both of them are able to differentiate among emotions for a specific speaker with very high identification rates (above 75%), but the models are not applicable to other speakers (identification rates drop to 20%). Therefore, in the synthesis process the control of both spectral and prosodic features is essential to get expressive speech, and when a change of speaker is desired the values of the parameters should be re-estimated.

Speech Communication | 2008

Evaluation of automatic break insertion for an agglutinative and inflected language

Eva Navas; Inmaculada Hernáez; Iñaki Sainz

This paper presents the evaluation of automatic break insertion for standard Basque. Basque is an agglutinative and inflected language, and POS features, widely used for other languages, are not enough to accurately predict the insertion of breaks in the text. Other morpho-syntactic features, such as grammatical case and information about syntagms, have also been taken into account. With a textual corpus specially gathered for this study, from which the sentence-internal punctuation marks have been removed, CARTs have been used to predict break locations. After applying parameter selection to the whole morpho-syntactic feature set, the best features were employed to build two CARTs: T1, which gives the same importance to deletion and insertion errors, and T2, which tries to minimize insertion errors. The objective evaluation of the break insertion algorithms gives a κ statistic of 0.518 and an F of 0.757 for the T1 tree. The algorithms have also been subjectively evaluated, and although T1 had better objective measures, the number of serious errors made by this tree is larger than the number made by T2.
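
For readers unfamiliar with the two objective measures quoted in this abstract, the following sketch shows one common way to compute an F-measure and Cohen's κ for break prediction, treating each word juncture as a binary break/no-break decision. The abstract does not state the exact definitions used in the paper, so the balanced F (F1), the two-class Cohen's κ, the function name `break_scores` and the toy label sequences are all assumptions.

```python
def break_scores(reference, predicted):
    """Return (F-measure, Cohen's kappa) for two equal-length 0/1 sequences,
    where 1 marks a break at a word juncture and 0 marks no break."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(reference)
    tp = sum(1 for r, p in zip(reference, predicted) if r and p)
    fp = sum(1 for r, p in zip(reference, predicted) if not r and p)  # insertions
    fn = sum(1 for r, p in zip(reference, predicted) if r and not p)  # deletions
    tn = n - tp - fp - fn

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)

    # Observed agreement versus agreement expected by chance.
    p_obs = (tp + tn) / n
    p_exp = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else 0.0
    return f_measure, kappa


# Toy example with eight junctures (made-up labels, not data from the paper).
reference = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]
f_measure, kappa = break_scores(reference, predicted)
print(round(f_measure, 3), round(kappa, 3))
```

In this framing an inserted break is a false positive and a deleted break is a false negative, so a tree tuned to minimize insertion errors trades recall for precision.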

Conference on Computer as a Tool | 2005

Front-End for the Oral Control of Applications in Windows Environments

Iñaki Sainz; Eva Navas; Jon Sanchez; Iker Luengo; Inmaculada Hernáez

This paper presents the development of an oral interface to control any Windows application by means of speech, providing a user-friendly interface. This front-end is fully configurable using plain text files and can manage any program with a graphical environment that runs under the Windows operating system, using functions from the Windows API. Hidden Markov models provide the speech recognition ability, using recursive training of triphone models built with a SpeechDat database. The text-to-speech system uses an MBROLA-based algorithm and is integrated in a dynamic library. The application is aimed at users with visual or motor impairments and is designed to be used in the Basque language.

Conference of the International Speech Communication Association | 2011