Publication


Featured research published by Mireia Diez.


International Conference on Biometrics | 2013

The 2013 speaker recognition evaluation in mobile environment

Elie Khoury; B. Vesnicer; Javier Franco-Pedroso; Ricardo Paranhos Velloso Violato; Z. Boulkenafet; L. M. Mazaira Fernandez; Mireia Diez; J. Kosmala; Houssemeddine Khemiri; T. Cipr; Rahim Saeidi; Manuel Günther; J. Zganec-Gros; R. Zazo Candil; Flávio Olmos Simões; M. Bengherabi; A. Alvarez Marquina; Mikel Penagarikano; Alberto Abad; M. Boulayemen; Petr Schwarz; D.A. van Leeuwen; J. Gonzalez-Dominguez; M. Uliani Neto; E. Boutellaa; P. Gómez Vilda; Amparo Varona; Dijana Petrovska-Delacrétaz; Pavel Matejka; Joaquin Gonzalez-Rodriguez

This paper evaluates the performance of the twelve primary systems submitted to the evaluation on speaker verification in the context of a mobile environment using the MOBIO database. The mobile environment provides a challenging and realistic test-bed for current state-of-the-art speaker verification techniques. Results in terms of equal error rate (EER), half total error rate (HTER) and detection error trade-off (DET) confirm that the best performing systems are based on total variability modeling, and are the fusion of several sub-systems. Nevertheless, the good old UBM-GMM based systems are still competitive. The results also show that the use of additional data for training as well as gender-dependent features can be helpful.
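The two scalar metrics named in this abstract can be illustrated with a minimal Python sketch. It assumes that higher scores mean "same speaker" and that the HTER threshold is fixed beforehand on held-out development scores; the function names and the toy scores in the usage note are hypothetical and are not taken from the evaluation.

```python
def error_rates(targets, impostors, threshold):
    """Return (FAR, FRR) when trials scoring >= threshold are accepted."""
    far = sum(s >= threshold for s in impostors) / len(impostors)
    frr = sum(s < threshold for s in targets) / len(targets)
    return far, frr


def equal_error_rate(targets, impostors):
    """Approximate the EER by trying every observed score as a threshold."""
    best = None
    for thr in sorted(set(targets) | set(impostors)):
        far, frr = error_rates(targets, impostors, thr)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]  # (EER estimate, threshold where FAR and FRR meet)


def half_total_error_rate(targets, impostors, threshold):
    """HTER: the mean of FAR and FRR at a threshold chosen in advance."""
    far, frr = error_rates(targets, impostors, threshold)
    return (far + frr) / 2
```

For example, `equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])` picks the threshold at which one of four impostors is accepted and one of four targets is rejected. A DET curve would plot the same FAR/FRR pairs over all thresholds.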


International Conference on Acoustics, Speech, and Signal Processing | 2014

High-performance Query-by-Example Spoken Term Detection on the SWS 2013 evaluation

Luis Javier Rodriguez-Fuentes; Amparo Varona; Mikel Penagarikano; Germán Bordel; Mireia Diez

In recent years, the task of Query-by-Example Spoken Term Detection (QbE-STD), which aims to find occurrences of a spoken query in a set of audio documents, has gained the interest of the research community for its versatility in settings where untranscribed, multilingual and acoustically unconstrained spoken resources, or spoken resources in low-resource languages, must be searched. This paper describes and reports experimental results for a QbE-STD system that achieved the best performance in the recent Spoken Web Search (SWS) evaluation, held as part of MediaEval 2013. Though not optimized for speed, the system operates faster than real-time. The system exploits high-performance phone decoders to extract frame-level phone posteriors (a common representation in QbE-STD tasks). Then, given a query and an audio document, a distance matrix is computed between their phone posterior representations, followed by a newly introduced distance normalization technique and an iterative Dynamic Time Warping (DTW) matching procedure with some heuristic pruning. Results show that remarkable performance improvements can be achieved by using multiple examples per query and, especially, through the late (score-level) fusion of different subsystems, each based on a different set of phone posteriors.
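The core matching step (a distance matrix between posterior sequences followed by DTW) can be sketched in Python as below. This is only an illustration under stated assumptions: the frame distance is taken to be the negative log of the inner product of two posterior vectors, and the DTW aligns a whole query against a whole segment. The paper's distance normalization, iterative search and pruning are not reproduced, and all function names are hypothetical.

```python
import math


def frame_distance(p, q, eps=1e-10):
    """-log of the inner product of two posterior vectors (an assumed metric)."""
    return -math.log(max(sum(a * b for a, b in zip(p, q)), eps))


def distance_matrix(query, document):
    """Rows follow query frames, columns follow document frames."""
    return [[frame_distance(q, d) for d in document] for q in query]


def dtw_cost(dist):
    """Length-normalised cost of the cheapest monotonic path through dist."""
    n, m = len(dist), len(dist[0])
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend the cheapest of the three allowed predecessor cells.
            acc[i][j] = dist[i - 1][j - 1] + min(
                acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1]
            )
    return acc[n][m] / (n + m)
```

A lower cost means a better match. A detection system would additionally let the path start and end anywhere in the document, which this sketch leaves out.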


Spoken Language Technology Workshop | 2012

On the use of phone log-likelihood ratios as features in spoken language recognition

Mireia Diez; Amparo Varona; Mikel Penagarikano; Luis Javier Rodriguez-Fuentes; Germán Bordel

This paper presents an alternative feature set to the traditional MFCC-SDC used in acoustic approaches to Spoken Language Recognition: the log-likelihood ratios of phone posterior probabilities, hereafter Phone Log-Likelihood Ratios (PLLR), produced by a phone recognizer. In this work, an iVector system trained on this set of features (plus dynamic coefficients) is evaluated and compared to (1) an acoustic iVector system (trained on the MFCC-SDC feature set) and (2) a phonotactic (Phone-lattice-SVM) system, using two different benchmarks: the NIST 2007 and 2009 LRE datasets. iVector systems trained on PLLR features proved to be competitive, reaching or even outperforming the MFCC-SDC-based iVector and the phonotactic systems. The fusion of the proposed approach with the acoustic and phonotactic systems provided even more significant improvements, outperforming state-of-the-art systems on both benchmarks.
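To make the feature definition concrete, here is a minimal Python sketch that turns one frame of phone posteriors into log-likelihood ratios. The abstract does not give the formula, so the sketch assumes the plain log-odds form log(p / (1 - p)) for each phone; the exact normalization used in the paper may differ. The function name and the clipping constant `eps` are hypothetical.

```python
import math


def pllr(posteriors, eps=1e-6):
    """Map one frame of phone posteriors to log-odds, log(p / (1 - p))."""
    feats = []
    for p in posteriors:
        p = min(max(p, eps), 1.0 - eps)  # clip so the logarithm stays finite
        feats.append(math.log(p / (1.0 - p)))
    return feats
```

A posterior of 0.5 maps to 0, larger posteriors map to positive values and smaller ones to negative values, so the features are no longer confined to the interval [0, 1].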


Energy Conversion and Management | 1995

Solar radiation incident on tilted surfaces in Burgos, Spain: Isotropic models

A. de Miguel; J. Bilbao; Mireia Diez

A number of hourly measurements of global solar radiation on a horizontal surface, taken during the period 1981–1986 in Burgos, Spain, have been separated into diffuse and beam components. The components of global solar radiation were used to calculate the monthly average hourly and daily values on an inclined surface. The results obtained using isotropic models have been compared, tabulated and plotted against the angle of tilt for summer, winter and all year. The optimum tilt angle and the solar radiation on a south facing tilted surface have been calculated for different periods of time and by three different models. The optimum tilt angle values range from 7° in June to 70° in December and January.
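The abstract does not name the three isotropic models it compares. As an illustration of what an isotropic model computes, the Python sketch below uses the widely cited Liu and Jordan form, in which sky diffuse radiation is assumed uniform over the sky dome. The function name, the default ground albedo of 0.2 and the beam tilt factor `rb` passed in by the caller are assumptions made for this example.

```python
import math


def tilted_irradiation(beam, diffuse, tilt_deg, rb, albedo=0.2):
    """Radiation on a tilted surface under an isotropic-sky assumption.

    beam, diffuse: components measured on a horizontal surface.
    rb: ratio of beam radiation on the tilted surface to that on the horizontal.
    """
    beta = math.radians(tilt_deg)
    sky = diffuse * (1.0 + math.cos(beta)) / 2.0  # visible fraction of the sky
    ground = (beam + diffuse) * albedo * (1.0 - math.cos(beta)) / 2.0
    return beam * rb + sky + ground
```

With a tilt of 0° and `rb = 1` the function returns the horizontal global radiation unchanged, which is a quick sanity check. Scanning `tilt_deg` for the maximum is one way to look for an optimum tilt angle, given a value of `rb` for each angle.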


IEEE Automatic Speech Recognition and Understanding Workshop | 2011

Multi-site heterogeneous system fusions for the Albayzin 2010 Language Recognition Evaluation

Luis Javier Rodriguez-Fuentes; Mikel Penagarikano; Amparo Varona; Mireia Diez; Germán Bordel; David Martinez; Jesús Villalba; Antonio Miguel; Alfonso Ortega; Eduardo Lleida; Alberto Abad; Oscar Koller; Isabel Trancoso; Paula Lopez-Otero; Laura Docio-Fernandez; Carmen García-Mateo; Rahim Saeidi; Mehdi Soufifar; Tomi Kinnunen; Torbjørn Svendsen; Pasi Fränti

The best language recognition performance is commonly obtained by fusing the scores of several heterogeneous systems. Regardless of the fusion approach, it is assumed that different systems may contribute complementary information, either because they are developed on different datasets, or because they use different features or different modeling approaches. Most authors apply fusion as a last resort for improving performance based on an existing set of systems. Though relative performance gains decrease as larger sets of systems are considered, the best performance is usually attained by fusing all the available systems, which may lead to high computational costs. In this paper, we aim to discover which technologies combine best through fusion and to analyse the factors (data, features, modeling methodologies, etc.) that may explain such good performance. Results are presented and discussed for a number of systems provided by the participating sites and the organizing team of the Albayzin 2010 Language Recognition Evaluation. We hope the conclusions of this work help research groups make better decisions in developing language recognition technology.


IEEE Signal Processing Letters | 2014

On the Projection of PLLRs for Unbounded Feature Distributions in Spoken Language Recognition

Mireia Diez; Amparo Varona; Mikel Penagarikano; Luis Javier Rodriguez-Fuentes; Germán Bordel

The so-called Phone Log-Likelihood Ratio (PLLR) features have been recently introduced as a novel and effective way of retrieving acoustic-phonetic information in spoken language and speaker recognition systems. In this letter, an in-depth insight into the PLLR feature space is provided and the multidimensional distribution of these features is analyzed in a language recognition system. The study reveals that PLLR features are confined to a subspace that strongly bounds PLLR distributions. To enhance the information retrieved by the system, PLLR features are projected onto a hyperplane that provides a more suitable representation of the subspace where the features lie. After applying the projection method, PCA is used to decorrelate the features. Gains attained at each step of the proposed approach are outlined and compared to simple PCA projection. Experiments carried out on NIST 2007, 2009 and 2011 LRE datasets demonstrate the effectiveness of the proposed method, which yields up to a 27% relative improvement with regard to the system based on the original features.


IEEE Signal Processing Letters | 2014

On the Complementarity of Phone Posterior Probabilities for Improved Speaker Recognition

Mireia Diez; Amparo Varona; Mikel Penagarikano; Luis Javier Rodriguez-Fuentes; Germán Bordel

In this letter, we apply Phone Log-Likelihood Ratio (PLLR) features to the task of speaker recognition. PLLRs, which are computed on the phone posterior probabilities provided by phone decoders, convey acoustic-phonetic information in a sequence of frame-level vectors, and therefore can be easily plugged into traditional acoustic systems, just by replacing the Mel-Frequency Cepstral Coefficients (MFCC) or an alternate representation. To study the performance of the proposed features, MFCC-based and PLLR-based systems are trained under an i-vector-PLDA approach. Results on the NIST 2010 and 2012 Speaker Recognition Evaluation databases show that, despite yielding lower performance than the acoustic system, the system based on PLLR features does provide significant gains when both systems are fused, which reveals a complementarity among features, and provides a suitable and effective way of using higher level phonetic information in speaker recognition systems.


Iberian Conference on Pattern Recognition and Image Analysis | 2011

On the use of dot scoring for speaker diarization

Mireia Diez; Mikel Penagarikano; Amparo Varona; Luis Javier Rodriguez-Fuentes; Germán Bordel

In this paper, an alternative agglomerative hierarchical clustering approach for speaker diarization, based on dot scoring, is presented. Dot scoring is a simple and fast technique used in speaker verification that makes use of a linearized procedure to score test segments against target models. In our speaker diarization approach, speech segments are represented by MAP-adapted GMM zero- and first-order statistics, dot scoring is applied to compute a similarity measure between segments (or clusters), and finally an agglomerative clustering algorithm is applied until no pair of clusters exceeds a similarity threshold. This diarization system was developed for the Albayzin 2010 Speaker Diarization Evaluation on broadcast news. Results show that the lowest error rate that the clustering algorithm could attain for the evaluation set was around 20% and that over-segmentation was the main source of degradation, due to the lack of robustness in the estimation of statistics for short segments.
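The stopping rule described in this abstract (merge until no pair of clusters exceeds a similarity threshold) can be sketched in Python as follows. This is a simplified stand-in, not the paper's system: each segment is represented by a plain vector instead of GMM statistics, the similarity is a length-normalised dot product, and merged clusters simply add their vectors. All names are hypothetical.

```python
import math


def similarity(a, b):
    """Length-normalised dot product, used here in place of dot scoring."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def agglomerate(segments, threshold):
    """Merge the most similar pair until no pair exceeds the threshold."""
    clusters = [([i], list(v)) for i, v in enumerate(segments)]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = similarity(clusters[i][1], clusters[j][1])
                if s > best:
                    best, pair = s, (i, j)
        if pair is None:
            break  # no remaining pair is similar enough to merge
        i, j = pair
        members = clusters[i][0] + clusters[j][0]
        stats = [x + y for x, y in zip(clusters[i][1], clusters[j][1])]
        clusters[i] = (members, stats)  # merged cluster accumulates statistics
        del clusters[j]
    return [sorted(m) for m, _ in clusters]
```

The function returns lists of segment indices, one list per hypothesised speaker. Short segments give noisy vectors, which is consistent with the degradation the abstract reports for short segments.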


Language Resources and Evaluation | 2016

KALAKA-3: a database for the assessment of spoken language recognition technology on YouTube audios

Luis Javier Rodriguez-Fuentes; Mikel Penagarikano; Amparo Varona; Mireia Diez; Germán Bordel

KALAKA-3 is a speech database specifically designed for the development and evaluation of Spoken Language Recognition (SLR) systems. The database provides TV broadcast speech for training, and audio data extracted from YouTube videos for tuning and testing. The database was created to support the Albayzin 2012 Language Recognition Evaluation (LRE), which featured two language recognition tasks, both dealing with European languages. The first one involved six target languages (Basque, Catalan, English, Galician, Portuguese and Spanish) for which there was plenty of training data, whereas the second one involved four target languages (French, German, Greek and Italian) for which no training data was provided. This second task tried to simulate the use case of low resource languages. Two separate sets of YouTube audio files were provided to test the performance of language recognition systems on both tasks. To allow open-set tests, these datasets included speech in 11 additional (Out-Of-Set) European languages. In this paper, we first discuss the design issues considered when creating the database and describe the data collection procedure. Then, we present the results attained in the Albayzin 2012 LRE, along with the performance of state-of-the-art systems on the four evaluation tracks defined on the database. Both series of results demonstrate the usefulness of KALAKA-3 as a challenging benchmark for the advancement of SLR technology. As far as we know, this is the first database specifically designed to benchmark SLR technology on YouTube audios.


International Conference on Pattern Recognition | 2014

Optimizing PLLR Features for Spoken Language Recognition

Mireia Diez; Amparo Varona; Mikel Penagarikano; Luis Javier Rodriguez-Fuentes; Germán Bordel

Phone Log-Likelihood Ratios (PLLR) have been recently introduced as features for spoken language and speaker recognition systems. This representation has proven to be an effective way of retrieving acoustic-phonotactic information into frame-level vectors, which can be easily plugged into state-of-the-art systems. In a previous work, we began the search for reduced representations of PLLRs, as a means of reducing computational costs. In this paper, we extend this search, by looking for the optimal compromise between feature vector size and system performance. Results achieved by Principal Component Analysis projection on the PLLR space are extensively analyzed. Also, to evaluate the effect of using larger temporal contexts, a Shifted Delta transformation is applied (and its optimal configuration explored) on highly reduced sets of PCA-projected PLLR features, leading to further performance improvements over the best PCA-projected PLLR set.
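A Shifted Delta transformation stacks delta features computed at several future time offsets, which widens the temporal context of each frame. The Python sketch below follows the usual N-d-P-k parameterisation (N coefficients per frame, delta spread d, shift P between blocks, k blocks). The default values and the repetition of edge frames are assumptions made for illustration; the abstract does not state the configuration that was found optimal.

```python
def shifted_delta(frames, d=1, p=3, k=7):
    """Stack k delta blocks, each shifted by p frames, for every frame."""
    last = len(frames) - 1

    def frame(t):
        return frames[min(max(t, 0), last)]  # repeat edge frames (an assumption)

    out = []
    for t in range(len(frames)):
        vec = []
        for i in range(k):
            ahead = frame(t + i * p + d)
            behind = frame(t + i * p - d)
            vec.extend(a - b for a, b in zip(ahead, behind))
        out.append(vec)
    return out
```

Each output vector has N × k components, so applying the transformation to a strongly reduced PCA projection keeps the final dimensionality manageable.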

Collaboration


Dive into Mireia Diez's collaborations.

Top Co-Authors

Amparo Varona

University of the Basque Country

Mikel Penagarikano

University of the Basque Country

Germán Bordel

University of the Basque Country

Lukas Burget

Brno University of Technology

Pavel Matejka

Brno University of Technology

Johan Rohdin

Tokyo Institute of Technology

Oldrich Plchot

Brno University of Technology