Ignasi Iriondo
La Salle University
Publication
Featured research published by Ignasi Iriondo.
non-linear speech processing | 2009
Ignasi Iriondo; Santiago Planet; Joan-Claudi Socoró; Elisa Martínez; Francesc Alías; Carlos Monzo
This paper presents an automatic system able to enhance expressiveness in speech corpora recorded from acted or stimulated speech. The system is trained with the results of a subjective evaluation carried out on a reduced subset of the original corpus. Once trained, it can check the complete corpus and automatically prune the unclear utterances, i.e. those whose expressive style differs from the one intended for the corpus. The content that most closely matches the subjective classification remains in the resulting corpus. An expressive speech corpus in Spanish, designed and recorded for speech synthesis purposes, has been used to test the proposal. The automatic refinement has been applied to the whole corpus and the result has been validated with a second subjective test.
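A minimal sketch of the pruning idea described in this abstract, under assumptions of our own (a random-forest classifier, placeholder prosodic statistics and an illustrative confidence threshold; the paper's exact features and thresholds are not reproduced here): train on the subjectively rated subset, then keep only the utterances whose predicted style matches the intended one.

```python
# Sketch of expressive-corpus pruning: train on the subjectively rated subset,
# then keep only utterances whose predicted style matches the intended style.
# Feature extraction and thresholds are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def prune_corpus(rated_feats, rated_labels, all_feats, intended_style, min_proba=0.6):
    """Return indices of utterances judged consistent with the intended style."""
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(rated_feats, rated_labels)            # learn from the subjective test results
    proba = clf.predict_proba(all_feats)          # score the complete corpus
    style_idx = list(clf.classes_).index(intended_style)
    keep = np.where(proba[:, style_idx] >= min_proba)[0]
    return keep                                   # unclear utterances are pruned

# Example with random placeholder data (prosodic statistics per utterance)
rng = np.random.default_rng(0)
rated_X, rated_y = rng.normal(size=(100, 20)), rng.choice(["happy", "neutral"], 100)
corpus_X = rng.normal(size=(1000, 20))
kept = prune_corpus(rated_X, rated_y, corpus_X, intended_style="happy")
```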
ieee intelligent vehicles symposium | 2008
Elisa Martínez; Marta Diaz; Javier Melenchón; Josh A. Montero; Ignasi Iriondo; Joan Claudi Socoró
An artificial vision system for vehicles is proposed in this article to alert drivers of potential head-on collisions. It is capable of detecting a frontal collision with any type of obstacle that may appear in the vehicle's path. The system operates on a sequence of images recorded by a camera located in the moving vehicle and computes the Time-to-Contact from an analysis of the optical flow, which allows the vehicle's motion to be estimated from the image sequence.
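As a rough illustration of the Time-to-Contact idea (not the authors' implementation): for an approaching surface, the divergence of the optical flow field is inversely related to the Time-to-Contact. The sketch below, assuming OpenCV's Farneback dense flow and an illustrative divergence-to-TTC relation, shows the general computation.

```python
# Sketch of Time-to-Contact (TTC) estimation from dense optical flow.
# For a fronto-parallel approaching surface, div(flow) ~ 2 / TTC (per frame);
# the constants and the averaging region here are illustrative assumptions.
import cv2
import numpy as np

def time_to_contact(prev_gray, curr_gray, fps=30.0):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    # Divergence of the flow field: d(u)/dx + d(v)/dy
    du_dx = np.gradient(flow[..., 0], axis=1)
    dv_dy = np.gradient(flow[..., 1], axis=0)
    divergence = np.mean(du_dx + dv_dy)      # averaged over the whole frame here
    if divergence <= 1e-6:                   # no looming motion detected
        return np.inf
    ttc_frames = 2.0 / divergence            # frames until contact
    return ttc_frames / fps                  # seconds until contact
```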
non-linear speech processing | 2007
Ignasi Iriondo; Santiago Planet; Joan-Claudi Socoró; Francesc Alías
This paper presents the validation of the expressiveness of an acted oral corpus produced to be used in speech synthesis. Firstly, an objective validation has been conducted by means of automatic emotion identification techniques using statistical features extracted from the prosodic parameters of speech. Secondly, a listening test has been performed with a subset of utterances. The relationship between both objective and subjective evaluations is analyzed and the obtained conclusions can be useful to improve the following steps related to expressive speech synthesis.
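A small sketch of the objective-validation step described above, under assumptions of our own (a handful of utterance-level statistics over F0 and energy, and an SVM evaluated by cross-validation; the paper's actual feature set and classifiers are not reproduced):

```python
# Sketch: utterance-level statistics over prosodic contours feed an emotion classifier.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def prosodic_stats(f0, energy):
    """Utterance-level statistics of the F0 and energy contours (voiced frames only)."""
    voiced = f0[f0 > 0]
    feats = []
    for contour in (voiced, energy):
        feats += [contour.mean(), contour.std(), contour.min(),
                  contour.max(), np.ptp(contour)]
    return np.array(feats)

# X: one row of statistics per utterance, y: intended emotion labels (placeholder data)
rng = np.random.default_rng(1)
X = np.vstack([prosodic_stats(rng.uniform(0, 300, 200), rng.uniform(size=200))
               for _ in range(120)])
y = rng.choice(["happy", "sad", "neutral"], size=120)
scores = cross_val_score(SVC(kernel="rbf", C=1.0), X, y, cv=5)
```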
international conference on acoustics, speech, and signal processing | 2007
Ignasi Iriondo; Joan Claudi Socoró; Francesc Alías
This paper presents the use of analogical learning, in particular case-based reasoning, for the automatic generation of prosody from text that has been automatically tagged with prosodic features. This is a corpus-based method for the quantitative modelling of prosody to be used in a Spanish text-to-speech system. The main objective is the development of a method for predicting the three main prosodic parameters: the fundamental frequency (F0) contour, segmental duration and energy. Both objective and subjective experiments have been conducted in order to evaluate the accuracy of our proposal.
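Case-based reasoning here amounts to retrieving stored cases similar to a new text unit and reusing their observed prosody. A minimal stand-in, assuming a k-nearest-neighbour retrieval over illustrative textual attributes (not the paper's actual case representation or adaptation step):

```python
# Sketch of case-based prosody prediction as nearest-neighbour retrieval:
# each "case" pairs textual-prosodic attributes with its observed (F0, duration, energy).
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(2)
case_attributes = rng.normal(size=(500, 8))      # one row per stored case (placeholder)
case_solutions = rng.normal(size=(500, 3))       # [F0_mean, duration, energy] per case

cbr = KNeighborsRegressor(n_neighbors=5, weights="distance")
cbr.fit(case_attributes, case_solutions)          # "retain" phase: store the case base

new_units = rng.normal(size=(10, 8))              # units of the sentence to synthesize
predicted_prosody = cbr.predict(new_units)        # retrieve + adapt by weighted averaging
```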
agent-directed simulation | 2004
Ignasi Iriondo; Francesc Alías; Javier Melenchón; M. Angeles Llorca
This paper describes an initial approach to emotional speech synthesis in Catalan based on a diphone-concatenation TTS system. The main goal of this work is to develop a simple prosodic model for expressive synthesis. This model is obtained from an emotional speech collection artificially generated by means of a copy-prosody experiment. Once the emotional content of this collection had been validated, the model was automated and incorporated into our TTS system. Finally, the automatic speech synthesis system has been evaluated by means of a perceptual test, yielding encouraging results.
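To give a flavour of what a "simple prosodic model for expressive synthesis" can look like, the sketch below applies per-emotion scaling factors to a neutral prosody prediction. The emotions and factor values are illustrative placeholders, not the model derived in the paper.

```python
# Sketch of a simple rule-like prosodic model: scale pitch, duration and energy per emotion.
NEUTRAL = {"f0_mean_hz": 120.0, "duration_s": 0.08, "energy_db": 60.0}

EMOTION_FACTORS = {                     # placeholder values for illustration only
    "joy":     {"f0": 1.30, "dur": 0.85, "energy": 1.10},
    "sadness": {"f0": 0.85, "dur": 1.20, "energy": 0.90},
    "anger":   {"f0": 1.15, "dur": 0.90, "energy": 1.20},
}

def apply_emotion(neutral_prosody, emotion):
    f = EMOTION_FACTORS[emotion]
    return {
        "f0_mean_hz": neutral_prosody["f0_mean_hz"] * f["f0"],
        "duration_s": neutral_prosody["duration_s"] * f["dur"],
        "energy_db":  neutral_prosody["energy_db"] * f["energy"],
    }

print(apply_emotion(NEUTRAL, "joy"))
```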
international conference on image processing | 2003
J. Melenchon; F. De la Torre; Ignasi Iriondo; Francesc Alías; Elisa Martínez; L. Vicent
This paper presents a new method named text-to-visual synthesis with appearance models (TEVISAM) for generating videorealistic talking heads. In a first step, the system automatically learns a person-specific facial appearance model (PSFAM). The PSFAM allows all facial components (e.g. eyes, mouth, etc.) to be modeled independently, and it is used to animate the face dynamically from the input text. As reported by other researchers, one of the key aspects in visual synthesis is the coarticulation effect. To address this problem, we introduce a new interpolation method in the high-dimensional appearance space that makes it possible to create photorealistic and videorealistic avatars. In this work, preliminary experiments synthesizing virtual avatars from text are reported. Summarizing, this paper introduces three novelties: first, we make use of color PSFAMs to animate virtual avatars; second, we introduce a nonlinear high-dimensional interpolation to achieve videorealistic animations; finally, the method allows new expressions to be generated by modeling the different facial elements.
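The following sketch only illustrates the general idea of interpolating between keyframes in an appearance-parameter space to smooth coarticulation; the cosine ease used here is an assumed stand-in for the paper's high-dimensional interpolation, and the appearance model itself (decoding parameters to images) is assumed given.

```python
# Sketch: interpolate viseme keyframes in appearance-parameter space for smooth transitions.
import numpy as np

def interpolate_visemes(key_params, frames_per_transition=10):
    """key_params: (n_visemes, n_appearance_params) keyframes in appearance space."""
    frames = []
    for a, b in zip(key_params[:-1], key_params[1:]):
        for i in range(frames_per_transition):
            t = i / frames_per_transition
            w = 0.5 - 0.5 * np.cos(np.pi * t)   # nonlinear ease between keyframes
            frames.append((1.0 - w) * a + w * b)
    frames.append(key_params[-1])
    return np.vstack(frames)

# Each interpolated parameter vector would then be decoded to an image by the PSFAM.
keys = np.random.default_rng(3).normal(size=(4, 50))
trajectory = interpolate_visemes(keys)
```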
articulated motion and deformable objects | 2004
Javier Melenchón; Lourdes Meler; Ignasi Iriondo
A new algorithm for the incremental learning and non-intrusive tracking of the appearance of a previously unseen face is presented. The computation is done in a causal fashion: the information for a given frame is combined only with that of previous frames. To achieve this aim, a novel way of simultaneously and incrementally computing the Singular Value Decomposition (SVD) and the mean of the data is explained in this work. Previously developed methods for computing the SVD iteratively are taken into account, and a novel way to extract the mean from a factorised matrix using the SVD is obtained. Moreover, the results are achieved with linear computational cost and sublinear memory requirements with respect to the size of the data. Some experimental results are also reported.
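A compact sketch of an incremental SVD that also tracks the data mean, in the spirit of the method summarized above. This is an illustrative reimplementation following the standard block-update construction with a mean-shift correction column, not the authors' exact algorithm.

```python
# Sketch of a causal, incremental SVD with mean tracking (block updates).
import numpy as np

class IncrementalSVD:
    def __init__(self, rank):
        self.rank = rank
        self.mean = None     # running mean of the observed columns
        self.U = None        # left singular vectors of the centered data
        self.S = None        # singular values
        self.n = 0           # number of columns seen so far

    def update(self, block):
        """block: (dim, b) matrix with b new observations as columns."""
        b = block.shape[1]
        mu_b = block.mean(axis=1, keepdims=True)
        if self.U is None:   # first block: plain SVD of the centered data
            self.mean, self.n = mu_b, b
            U, S, _ = np.linalg.svd(block - mu_b, full_matrices=False)
            self.U, self.S = U[:, :self.rank], S[:self.rank]
            return
        # Extra column accounts for the shift of the mean between old and new data
        shift = np.sqrt(self.n * b / (self.n + b)) * (mu_b - self.mean)
        B = np.hstack([block - mu_b, shift])
        proj = self.U.T @ B                    # component inside the current subspace
        resid = B - self.U @ proj              # component outside it
        Q, R = np.linalg.qr(resid)
        k = self.S.shape[0]
        K = np.block([[np.diag(self.S), proj],
                      [np.zeros((R.shape[0], k)), R]])
        Uk, Sk, _ = np.linalg.svd(K, full_matrices=False)
        self.U = np.hstack([self.U, Q]) @ Uk[:, :self.rank]
        self.S = Sk[:self.rank]
        self.mean = (self.n * self.mean + b * mu_b) / (self.n + b)
        self.n += b
```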
international work-conference on artificial and natural neural networks | 2007
Ignasi Iriondo; Santiago Planet; Francesc Alías; Joan Claudi Socoró; Elisa Martínez
This paper presents the validation of the expressive content of an acted corpus produced to be used in speech synthesis. Acted speech can be rather lacking in authenticity, so validation of its expressiveness is required. The goal is to obtain an automatic classifier able to prune the bad utterances, i.e. those with wrong expressiveness. Firstly, a subjective test has been conducted with almost ten percent of the corpus utterances. Secondly, objective techniques have been applied by means of automatic identification of emotions, using different algorithms on statistical features computed over the speech prosody. The relationship between both evaluations is established by an attribute selection process guided by a metric that measures the matching between the utterances misclassified by the listeners and those misclassified by the automatic process. The experiments show that this approach can be useful to identify a subset of utterances with poor or wrong expressive content.
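One way to read the matching metric described above is as a set-overlap score between the utterances that listeners misclassify and those the classifier misclassifies. The sketch below uses an F-measure over those sets as an assumed, illustrative form of such a metric; the paper's exact definition is not reproduced.

```python
# Sketch of an agreement score between human and automatic misclassifications,
# usable as the objective that guides attribute selection.
def misclassified(labels_true, labels_pred):
    return {i for i, (t, p) in enumerate(zip(labels_true, labels_pred)) if t != p}

def agreement_score(human_pred, machine_pred, intended):
    """F-measure between the human- and machine-misclassified utterance sets."""
    h = misclassified(intended, human_pred)
    m = misclassified(intended, machine_pred)
    if not h or not m:
        return 0.0
    precision = len(h & m) / len(m)
    recall = len(h & m) / len(h)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

# A feature-selection loop would keep the attribute subset that maximizes this score.
```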
non-linear speech processing | 2011
Santiago Planet; Ignasi Iriondo
This paper presents an approach to improving emotion recognition from spontaneous speech. We used a wrapper method to reduce an acoustic feature set and feature-level fusion to merge it with a set of linguistic features. The proposed system was evaluated with the FAU Aibo Corpus, considering the same emotion set as the Interspeech 2009 Emotion Challenge. The main contribution of this work is that, with the reduced feature set, it improves on the results obtained in this Challenge and on the combination of the best ones. We built this set by selecting 28 acoustic and 5 linguistic features from an original set of 389 parameters and concatenating the corresponding feature vectors.
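A sketch of the two ingredients mentioned above, a wrapper-style reduction of the acoustic set and feature-level fusion by vector concatenation, under assumptions of our own (scikit-learn's sequential forward selection with a naive Bayes classifier and placeholder data; note that forward selection over hundreds of features is slow and shown here only for shape):

```python
# Sketch: wrapper feature selection on the acoustic set, then feature-level fusion.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.feature_selection import SequentialFeatureSelector

rng = np.random.default_rng(4)
X_acoustic = rng.normal(size=(300, 389))      # large acoustic set per utterance
X_linguistic = rng.normal(size=(300, 5))      # linguistic features per utterance
y = rng.choice(["Anger", "Emphatic", "Neutral", "Positive", "Rest"], size=300)

# Wrapper selection: keep the acoustic features that help the classifier most
selector = SequentialFeatureSelector(GaussianNB(), n_features_to_select=28, cv=3)
selector.fit(X_acoustic, y)
X_acoustic_reduced = selector.transform(X_acoustic)

# Feature-level fusion: concatenate the reduced acoustic and the linguistic vectors
X_fused = np.hstack([X_acoustic_reduced, X_linguistic])
clf = GaussianNB().fit(X_fused, y)
```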
international symposium on signal processing and information technology | 2003
J. Melenchon; Ignasi Iriondo; Joan Claudi Socoró; E. Martínez; L. Meler
This paper proposes a new method for lip animation of a personalized facial model from auditory speech. It is based on Bayesian estimation and person-specific facial appearance models (PSFAM). Initially, a video of a speaking person is recorded, from which the visual and acoustic features of the speaker and their relationship are learnt. First, the visual information of the speaker is stored in a color PSFAM by means of a registration algorithm. Second, the auditory features are extracted from the waveform attached to the recorded video sequence. Third, the relationship between the learnt PSFAM and the auditory features of the speaker is represented by Bayesian estimators. Finally, subjective perceptual tests are reported in order to measure the intelligibility of the preliminary results when synthesizing isolated words.
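As a simple illustration of a Bayesian audio-to-visual mapping (an assumed stand-in, not the estimators actually used in the paper): model audio features and appearance parameters as jointly Gaussian over the recorded frames, then estimate the appearance parameters for a new audio frame as the conditional mean.

```python
# Sketch of an MMSE (conditional-mean) estimator mapping audio features to PSFAM parameters.
import numpy as np

def fit_joint_gaussian(audio, visual):
    """audio: (n, da), visual: (n, dv), aligned per video frame."""
    data = np.hstack([audio, visual])
    mu = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    return mu, cov, audio.shape[1]

def estimate_visual(mu, cov, da, audio_frame):
    mu_a, mu_v = mu[:da], mu[da:]
    S_aa, S_va = cov[:da, :da], cov[da:, :da]
    # Conditional mean of the joint Gaussian: E[v | a] = mu_v + S_va S_aa^{-1} (a - mu_a)
    return mu_v + S_va @ np.linalg.solve(S_aa, audio_frame - mu_a)

rng = np.random.default_rng(5)
audio = rng.normal(size=(400, 13))     # e.g. MFCC-like features per frame (placeholder)
visual = rng.normal(size=(400, 30))    # PSFAM appearance parameters per frame (placeholder)
mu, cov, da = fit_joint_gaussian(audio, visual)
lip_params = estimate_visual(mu, cov, da, audio[0])
```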