Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Daniel Garcia-Romero is active.

Publication


Featured research published by Daniel Garcia-Romero.


Lecture Notes in Computer Science | 2003

A comparative evaluation of fusion strategies for multimodal biometric verification

Julian Fierrez-Aguilar; Javier Ortega-Garcia; Daniel Garcia-Romero; Joaquin Gonzalez-Rodriguez

The aim of this paper on multimodal biometric verification is twofold: on the one hand, we review score fusion strategies reported in the literature; on the other, we experimentally compare a selection of them on the MCYT multimodal database, using three monomodal baseline experts: i) our face verification system, based on a global face-appearance representation scheme; ii) our minutiae-based fingerprint verification system; and iii) our on-line signature verification system, based on HMM modeling of temporal functions. We also propose and discuss a new strategy that generates a combined multimodal score by means of Support Vector Machine (SVM) classifiers, from which user-independent and user-dependent fusion schemes are derived and evaluated.
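The SVM-based score fusion described above can be sketched as follows. This is a minimal illustration with synthetic scores, not the MCYT data or the paper's exact configuration; the score distributions and the linear kernel are assumptions.

```python
# Hedged sketch: score-level fusion of three monomodal experts with an SVM.
# Each trial contributes one score per modality; the SVM learns a fused
# decision boundary over the 3-dimensional score vectors.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Rows: [face_score, fingerprint_score, signature_score] (synthetic).
genuine = rng.normal(loc=1.0, scale=0.5, size=(100, 3))
impostor = rng.normal(loc=-1.0, scale=0.5, size=(100, 3))
X = np.vstack([genuine, impostor])
y = np.array([1] * 100 + [0] * 100)

fusion = SVC(kernel="linear").fit(X, y)
# The signed distance to the separating hyperplane serves as the fused score.
fused_scores = fusion.decision_function(X)
accuracy = fusion.score(X, y)
```

A user-dependent variant would train one such fusion function per enrolled user instead of a single global one.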


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2012

Multicondition training of Gaussian PLDA models in i-vector space for noise and reverberation robust speaker recognition

Daniel Garcia-Romero; Xinhui Zhou; Carol Y. Espy-Wilson

We present a multicondition training strategy for Gaussian Probabilistic Linear Discriminant Analysis (PLDA) modeling of i-vector representations of speech utterances. The proposed approach uses a multicondition set to train a collection of individual subsystems that are tuned to specific conditions. A final verification score is obtained by combining the individual scores according to the posterior probability of each condition given the trial at hand. The performance of our approach is demonstrated on a subset of the interview data of NIST SRE 2010. Significant robustness to the adverse noise and reverberation conditions included in the multicondition training set is obtained. The system is also shown to generalize to unseen conditions.
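The score-combination step above can be sketched in a few lines: each condition-specific subsystem produces a score, and the final score weights them by the posterior probability of each condition given the trial. The numbers below are illustrative, not from the paper.

```python
# Posterior-weighted combination of condition-specific subsystem scores.
import numpy as np

subsystem_scores = np.array([2.1, 0.4, -1.3])       # e.g. clean, noisy, reverberant
condition_loglik = np.array([-10.0, -12.5, -15.0])  # log p(trial | condition), hypothetical

# Softmax over the log-likelihoods yields the condition posteriors
# (assuming a uniform prior over conditions).
posteriors = np.exp(condition_loglik - condition_loglik.max())
posteriors /= posteriors.sum()

# Final verification score: posterior-weighted average of subsystem scores.
final_score = float(posteriors @ subsystem_scores)
```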


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014

Supervised domain adaptation for I-vector based speaker recognition

Daniel Garcia-Romero; Alan McCree

In this paper, we present a comprehensive study on supervised domain adaptation of PLDA-based i-vector speaker recognition systems. After describing the system parameters subject to adaptation, we study the impact of their adaptation on recognition performance. Using the recently designed domain adaptation challenge, we observe that adaptation of the PLDA parameters (i.e., the across-class and within-class covariances) produces the largest gains. Length-normalization is also important, whereas using an in-domain UBM and T matrix is not crucial. For PLDA adaptation, we compare four approaches: three are proposed in this work, and a fourth was previously published. Overall, the four techniques are successful at leveraging varying amounts of labeled in-domain data, and their performance is quite similar. However, our approaches are less involved, and two of them are applicable to a larger class of models (low-rank across-class).
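One simple form of the PLDA parameter adaptation discussed above is to interpolate the across-class and within-class covariances between out-of-domain and in-domain estimates. This sketch uses random covariance matrices and a hypothetical interpolation weight alpha; the actual estimators in the paper differ.

```python
# Sketch: covariance interpolation as a stand-in for PLDA adaptation.
import numpy as np

rng = np.random.default_rng(1)
dim = 4

def random_cov(rng, dim):
    # Build a symmetric positive-definite matrix.
    A = rng.normal(size=(dim, dim))
    return A @ A.T + dim * np.eye(dim)

Sigma_ac_out, Sigma_wc_out = random_cov(rng, dim), random_cov(rng, dim)
Sigma_ac_in, Sigma_wc_in = random_cov(rng, dim), random_cov(rng, dim)

alpha = 0.7  # hypothetical weight on the (scarce) in-domain estimate
Sigma_ac = alpha * Sigma_ac_in + (1 - alpha) * Sigma_ac_out
Sigma_wc = alpha * Sigma_wc_in + (1 - alpha) * Sigma_wc_out

# A convex combination of SPD matrices is still SPD, so the adapted
# parameters remain valid PLDA covariances.
eigvals = np.linalg.eigvalsh(Sigma_wc)
```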


IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | 2011

Linear versus mel frequency cepstral coefficients for speaker recognition

Xinhui Zhou; Daniel Garcia-Romero; Ramani Duraiswami; Carol Y. Espy-Wilson; Shihab A. Shamma

Mel-frequency cepstral coefficients (MFCC) have been the dominant features in speaker recognition as well as in speech recognition. However, based on theories of speech production, some speaker characteristics associated with the structure of the vocal tract, particularly the vocal tract length, are reflected more in the high-frequency range of speech. This insight suggests that a linear frequency scale may provide advantages over the mel scale in speaker recognition. Using two state-of-the-art speaker recognition back-ends (a Joint Factor Analysis system and a Probabilistic Linear Discriminant Analysis system), this study compares the performance of MFCC and LFCC (linear frequency cepstral coefficients) on the NIST SRE (Speaker Recognition Evaluation) 2010 extended-core task. Our results on SRE10 show that, while the two are complementary, LFCC consistently outperforms MFCC, mainly due to its better performance on the female trials. This can be explained by the relatively shorter vocal tract in females and the resulting higher formant frequencies: LFCC benefits female speech by better capturing the spectral characteristics of the high-frequency region. In addition, our results show some advantage of LFCC over MFCC on reverberant speech. LFCC is as robust as MFCC in babble noise, but not in white noise. We conclude that LFCC should be more widely used, at least for female trials, by the mainstream speaker recognition community.
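The mel-versus-linear contrast above comes down to where the filterbank centers are placed. The sketch below computes filter centers only (full cepstral extraction omitted); the 24-filter, 4 kHz setup is illustrative, not the paper's configuration.

```python
# Sketch: mel vs. linear placement of filterbank center frequencies.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

n_filters, f_max = 24, 4000.0
linear_centers = np.linspace(0.0, f_max, n_filters)                 # LFCC-style
mel_centers = mel_to_hz(np.linspace(0.0, hz_to_mel(f_max), n_filters))  # MFCC-style

# On the mel scale, filters crowd at low frequencies, so the spacing
# between neighboring filters grows with frequency; the linear scale
# keeps it constant, resolving the high-frequency region more finely.
mel_spacing_low = mel_centers[1] - mel_centers[0]
mel_spacing_high = mel_centers[-1] - mel_centers[-2]
linear_spacing = linear_centers[1] - linear_centers[0]
```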


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2010

Automatic acquisition device identification from speech recordings

Daniel Garcia-Romero; Carol Y. Espy-Wilson

In this paper we present a study on the automatic identification of acquisition devices when only the output speech recordings are available. We propose a statistical characterization of the frequency response of the device, contextualized by the speech content. In particular, the intrinsic characteristics of the device are captured by a template constructed by appending the means of a Gaussian mixture trained on the device's speech recordings. This study focuses on two classes of acquisition devices: landline telephone handsets and microphones. Three publicly available databases are used to assess the performance of linear- and mel-scaled cepstral coefficients. A Support Vector Machine classifier is used to perform closed-set identification experiments. The results show classification accuracies higher than 90 percent across the eight telephone handsets and eight microphones tested.
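The template construction described above can be sketched directly: fit a Gaussian mixture on recordings from one device and stack its component means into a single vector (a supervector-style template). Synthetic 13-dimensional features and a tiny 8-component mixture stand in for real cepstral data.

```python
# Sketch: device template = concatenation of GMM component means.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
features = rng.normal(size=(500, 13))   # stand-in for cepstral frames of one device

gmm = GaussianMixture(n_components=8, random_state=0).fit(features)

# Appending the 8 mean vectors (13-dim each) yields a 104-dim template,
# which a closed-set SVM classifier could then compare across devices.
template = gmm.means_.reshape(-1)
```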


Pattern Recognition Letters | 2005

Adapted user-dependent multimodal biometric authentication exploiting general information

Julian Fierrez-Aguilar; Daniel Garcia-Romero; Javier Ortega-Garcia; Joaquin Gonzalez-Rodriguez

A novel adapted strategy for combining general and user-dependent knowledge at the decision level in multimodal biometric authentication is presented. User-independent, user-dependent, and adapted fusion and decision schemes are compared using a bimodal system based on fingerprint and written signature. The adapted approach is shown to outperform the other strategies considered in this paper. Exploiting the available information for training the fusion function is also shown to be better than using it for post-fusion trained decisions.


IEEE Spoken Language Technology Workshop (SLT) | 2014

Improving speaker recognition performance in the domain adaptation challenge using deep neural networks

Daniel Garcia-Romero; Xiaohui Zhang; Alan McCree; Daniel Povey

Traditional i-vector speaker recognition systems use a Gaussian mixture model (GMM) to collect sufficient statistics (SS). Recently, replacing this GMM with a deep neural network (DNN) has shown promising results. In this paper, we explore the use of DNNs to collect SS for the unsupervised domain adaptation task of the Domain Adaptation Challenge (DAC). We show that collecting SS with a DNN trained on out-of-domain data boosts the speaker recognition performance of an out-of-domain system by more than 25%. Moreover, we integrate the DNN into an unsupervised adaptation framework that uses agglomerative hierarchical clustering with a stopping criterion based on unsupervised calibration, and show that the initial gains of the out-of-domain system carry over to the final adapted system. Even though the DNN is trained on out-of-domain data, the final adapted system produces a relative improvement of more than 30% over the best published results on this task.
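The sufficient-statistics collection mentioned above has the same form whether the per-frame posteriors come from a GMM or from a DNN senone classifier. The sketch below uses random posteriors in place of either model's output; the dimensions are illustrative.

```python
# Sketch: zeroth- and first-order sufficient statistics from frame posteriors.
import numpy as np

rng = np.random.default_rng(3)
n_frames, n_components, feat_dim = 200, 16, 20
feats = rng.normal(size=(n_frames, feat_dim))     # acoustic features
logits = rng.normal(size=(n_frames, n_components))

# Softmax per frame: in an i-vector system these posteriors come from
# the GMM (or, in the DNN variant, from the senone classifier).
post = np.exp(logits - logits.max(axis=1, keepdims=True))
post /= post.sum(axis=1, keepdims=True)

N = post.sum(axis=0)   # zeroth-order stats: soft frame count per component
F = post.T @ feats     # first-order stats: posterior-weighted feature sums
```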


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2004

Exploiting general knowledge in user-dependent fusion strategies for multimodal biometric verification

Julian Fierrez-Aguilar; Daniel Garcia-Romero; Javier Ortega-Garcia; Joaquin Gonzalez-Rodriguez

A novel strategy for combining general and user-dependent knowledge in a multimodal biometric verification system is presented. It is based on SVM classifiers and trade-off coefficients introduced into the standard SVM training problem. Experiments are reported on a bimodal biometric system based on fingerprint and on-line signature traits. Three fusion strategies are compared: user-independent, user-dependent, and the proposed adapted user-dependent approach. The suggested approach outperforms the other two; in particular, a remarkable relative improvement of 68% in EER with respect to the user-independent approach is achieved. The severe and very common problem of training-data scarcity in the user-dependent strategy is also alleviated by the proposed scheme, resulting in a relative improvement of 40% in EER compared to the raw user-dependent strategy.


Pattern Recognition | 2005

Rapid and brief communication: Bayesian adaptation for user-dependent multimodal biometric authentication

Julian Fierrez-Aguilar; Daniel Garcia-Romero; Javier Ortega-Garcia; Joaquin Gonzalez-Rodriguez

A novel score-level fusion strategy based on Bayesian adaptation for user-dependent multimodal biometric authentication is presented. In the proposed method, the fusion function is adapted for each user based on prior information extracted from a pool of users. Experimental results are reported using on-line signature and fingerprint verification subsystems on the MCYT real bimodal database. The proposed scheme outperforms both user-independent and user-dependent standard approaches. As compared to non-adapted user-dependent fusion, relative improvements of 80% and 55% are obtained for small and large training set sizes, respectively.
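The Bayesian-adaptation idea above can be sketched as relevance-weighted interpolation: the user-dependent fusion parameters are pulled toward a user-independent prior, with the amount of adaptation growing with the user's training count. The relevance factor r and the weight vectors are hypothetical, not the paper's values.

```python
# Sketch: MAP-style adaptation of per-user fusion weights toward a pooled prior.
import numpy as np

w_prior = np.array([0.5, 0.5])  # user-independent (pooled) fusion weights
w_user = np.array([0.8, 0.2])   # weights estimated from this user's scarce data
n_user, r = 5, 10.0             # user's training count; relevance factor (hypothetical)

# With few user samples, beta is small and the prior dominates; as
# n_user grows, the adapted weights approach the user-specific estimate.
beta = n_user / (n_user + r)
w_adapted = beta * w_user + (1 - beta) * w_prior
```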


IEEE Spoken Language Technology Workshop (SLT) | 2016

Deep neural network-based speaker embeddings for end-to-end speaker verification

David Snyder; Pegah Ghahremani; Daniel Povey; Daniel Garcia-Romero; Yishay Carmiel; Sanjeev Khudanpur

In this study, we investigate an end-to-end text-independent speaker verification system. The architecture consists of a deep neural network that takes a variable-length speech segment and maps it to a speaker embedding. The objective function separates same-speaker and different-speaker pairs, and is reused during verification. Similar systems have recently shown promise for text-dependent verification, but we believe this remains unexplored for the text-independent task. We show that, given a large number of training speakers, the proposed system outperforms an i-vector baseline in equal error rate (EER) and at low miss rates. Relative to the baseline, the end-to-end system reduces EER by 13% on average and 29% pooled across test conditions. The fused system achieves reductions of 32% average and 38% pooled.
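The verification step can be sketched as follows, assuming the network maps a variable-length segment to a fixed embedding via temporal pooling. The weights here are random, purely to show the shapes and the scoring; the actual architecture and training objective are in the paper.

```python
# Sketch: variable-length segment -> fixed speaker embedding -> cosine score.
import numpy as np

rng = np.random.default_rng(4)
W = rng.normal(size=(40, 128))  # hypothetical frame-level projection (untrained)

def embed(segment):
    # segment: (n_frames, 40) -> length-normalized embedding of dim 128.
    hidden = np.tanh(segment @ W)
    pooled = hidden.mean(axis=0)        # temporal average pooling
    return pooled / np.linalg.norm(pooled)

# Segments of different lengths map to embeddings of the same size.
enroll = embed(rng.normal(size=(300, 40)))
test_seg = embed(rng.normal(size=(173, 40)))

# With unit-norm embeddings, the dot product is the cosine similarity.
score = float(enroll @ test_seg)
```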

Collaboration


Dive into Daniel Garcia-Romero's collaborations.

Top Co-Authors

Alan McCree (Johns Hopkins University)
David Snyder (Johns Hopkins University)
Javier Ortega-Garcia (Autonomous University of Madrid)
Julian Fierrez-Aguilar (Autonomous University of Madrid)
Daniel Povey (Johns Hopkins University)
Gregory Sell (Johns Hopkins University)