Doroteo Torre Toledano
Autonomous University of Madrid
Publications
Featured research published by Doroteo Torre Toledano.
IEEE Transactions on Audio, Speech, and Language Processing | 2007
Joaquin Gonzalez-Rodriguez; Philip Rose; Daniel Ramos; Doroteo Torre Toledano; Javier Ortega-Garcia
Forensic DNA profiling is acknowledged as the model for a scientifically defensible approach in forensic identification science, as it meets the most stringent court admissibility requirements, demanding transparency in the scientific evaluation of evidence and testability of systems and protocols. In this paper, we propose a unified approach to forensic speaker recognition (FSR) oriented to fulfilling these admissibility requirements within a framework that is transparent, testable, and understandable, both for scientists and fact-finders. We show how the evaluation of DNA evidence, which is based on a probabilistic similarity-typicality metric in the form of likelihood ratios (LR), can also be generalized to continuous LR estimation, thus providing a common framework for phonetic-linguistic methods and automatic systems. We highlight the importance of calibration and give examples using LRs derived from diphthongal F-patterns and LRs obtained in NIST SRE06 tasks. The application of the proposed approach in daily casework remains a sensitive issue, and special caution is enjoined. Our objective is to show how traditional and automatic FSR methodologies can be made transparent and testable while remaining conscious of their present limitations. We conclude with a discussion of the combined use of traditional and automatic approaches and of current challenges for the admissibility of speech evidence.
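As a rough illustration of the likelihood-ratio framework discussed in this abstract, the following Python sketch scores a piece of evidence as the ratio between a similarity (same-speaker) model and a typicality (different-speaker) model. The Gaussian score models and their parameters are hypothetical, chosen only for illustration; they are not the systems or data used in the paper.

```python
from scipy.stats import norm

# Hypothetical score models: same-speaker (similarity) and different-speaker
# (typicality) score distributions, modelled here as univariate Gaussians.
SAME_MU, SAME_SIGMA = 2.0, 1.0
DIFF_MU, DIFF_SIGMA = -1.0, 1.5

def likelihood_ratio(evidence_score: float) -> float:
    """LR = p(score | same speaker) / p(score | different speaker)."""
    p_same = norm.pdf(evidence_score, SAME_MU, SAME_SIGMA)
    p_diff = norm.pdf(evidence_score, DIFF_MU, DIFF_SIGMA)
    return p_same / p_diff

# A comparison score of 1.8 gives a strength-of-evidence value:
print(f"LR = {likelihood_ratio(1.8):.2f}")  # LR > 1 supports the same-speaker hypothesis
```

In practice, as the abstract stresses, such LRs need to be calibrated before they can be reported as strength of evidence.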
Pattern Recognition | 2007
Julian Fierrez; Javier Ortega-Garcia; Doroteo Torre Toledano; Joaquin Gonzalez-Rodriguez
The baseline corpus of a new multimodal database, acquired in the framework of the FP6 EU BioSec Integrated Project, is presented. The corpus consists of fingerprint images acquired with three different sensors, frontal face images from a webcam, iris images from an iris sensor, and voice utterances acquired both with a close-talk headset and a distant webcam microphone. The BioSec baseline corpus includes real multimodal data from 200 individuals in two acquisition sessions. In this contribution, the acquisition setup and protocol are outlined, and the contents of the corpus, including data and population statistics, are described. The database will be publicly available for research purposes by mid-2006.
Pattern Analysis and Applications | 2010
Julian Fierrez; Javier Galbally; Javier Ortega-Garcia; Manuel Freire; Fernando Alonso-Fernandez; Daniel Ramos; Doroteo Torre Toledano; Joaquin Gonzalez-Rodriguez; Juan A. Sigüenza; J. Garrido-Salas; E. Anguiano; Guillermo González-de-Rivera; R. Ribalda; Marcos Faundez-Zanuy; Juan Antonio Ortega; Valentín Cardeñoso-Payo; A. Viloria; Carlos Vivaracho; Q.-I. Moro; J. J. Igarza; J. Sanchez; I. Hernaez; C. Orrite-Uruñuela; F. Martinez-Contreras; J. J. Gracia-Roche
A new multimodal biometric database, acquired in the framework of the BiosecurID project, is presented together with the description of the acquisition setup and protocol. The database includes eight unimodal biometric traits, namely: speech, iris, face (still images, videos of talking faces), handwritten signature and handwritten text (on-line dynamic signals, off-line scanned images), fingerprints (acquired with two different sensors), hand (palmprint, contour-geometry) and keystroking. The database comprises 400 subjects and presents features such as: realistic acquisition scenario, balanced gender and population distributions, availability of information about particular demographic groups (age, gender, handedness), acquisition of replay attacks for speech and keystroking, skilled forgeries for signatures, and compatibility with other existing databases. All these characteristics make it very useful in research and development of unimodal and multimodal biometric systems.
Interacting with Computers | 2006
Doroteo Torre Toledano; Rubén Fernández Pozo; Álvaro Hernández Trapote; Luis A. Hernández Gómez
As a result of the evolution of the field of biometrics, a new breed of techniques and methods for user identity recognition and verification has appeared, based on the recognition and verification of several biometric features considered unique to each individual. Signature and voice characteristics, facial features, and iris and fingerprint patterns have all been used to identify a person or simply to verify that the person is who he or she claims to be. Although still relatively new, these technologies have already reached a level of development that allows their commercialization. However, there is a lack of studies devoted to the evaluation of these technologies from a user-centered perspective. This paper is intended to promote user-centered design and evaluation of biometric technologies. Towards this end, we have developed a platform to perform empirical evaluations of commercial biometric identity verification systems, including fingerprint, voice and signature verification. In this article, we present an initial empirical study in which we evaluate and compare these systems and try to gain insight into the factors that are crucial for their usability.
PLOS ONE | 2016
Ruben Zazo; Alicia Lozano-Diez; Javier Gonzalez-Dominguez; Doroteo Torre Toledano; Joaquin Gonzalez-Rodriguez
Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) have recently outperformed other state-of-the-art approaches, such as i-vectors and Deep Neural Networks (DNNs), in automatic Language Identification (LID), particularly when dealing with very short utterances (∼3 s). In this contribution we present an open-source, end-to-end LSTM RNN system running on limited computational resources (a single GPU) that outperforms a reference i-vector system on a subset of the NIST Language Recognition Evaluation (8 target languages, 3 s task) by up to 26%. This result is in line with previously published research using proprietary LSTM implementations and huge computational resources, which made those results hard to reproduce. Further, we extend those previous experiments by modeling unseen languages (out-of-set, OOS, modeling), which is crucial in real applications. Results show that an LSTM RNN with OOS modeling is able to detect these languages and generalizes robustly to unseen OOS languages. Finally, we also analyze the effect of even more limited test data (from 2.25 s down to 0.1 s), showing that with as little as 0.5 s an accuracy of over 50% can be achieved.
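A minimal sketch of an end-to-end LSTM classifier for language identification, in the spirit of the system described above. PyTorch, the layer sizes, the feature dimension and the eight-language output are assumptions made for illustration, not the published configuration.

```python
import torch
import torch.nn as nn

class LSTMLanguageID(nn.Module):
    """Sequence of acoustic feature frames -> posterior over target languages."""
    def __init__(self, feat_dim=20, hidden=256, n_languages=8):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.output = nn.Linear(hidden, n_languages)

    def forward(self, frames):              # frames: (batch, time, feat_dim)
        _, (h_n, _) = self.lstm(frames)     # last hidden state acts as the
        return self.output(h_n[-1])         # utterance-level embedding -> logits

# One 3-second utterance at 100 frames/s with 20-dimensional features:
model = LSTMLanguageID()
logits = model(torch.randn(1, 300, 20))
print(logits.softmax(dim=-1))               # posterior over the 8 target languages
```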
EURASIP Journal on Advances in Signal Processing | 2009
Rubén Fernández Pozo; José Luis Blanco Murillo; Luis A. Hernández Gómez; Eduardo López Gonzalo; José Alcázar Ramírez; Doroteo Torre Toledano
This study is part of an ongoing collaborative effort between the medical and the signal processing communities to promote research on applying standard Automatic Speech Recognition (ASR) techniques for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based detection could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we describe an acoustic search for distinctive apnoea voice characteristics. We also study abnormal nasalization in OSA patients by modelling vowels in nasal and nonnasal phonetic contexts using Gaussian Mixture Model (GMM) pattern recognition on speech spectra. Finally, we present experimental findings regarding the discriminative power of GMMs applied to severe apnoea detection. We have achieved an 81% correct classification rate, which is very promising and underpins the interest in this line of inquiry.
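The following sketch illustrates a GMM-based two-class decision of the kind described above, using scikit-learn; the random placeholder features, mixture sizes and decision threshold are illustrative assumptions rather than the study's actual setup.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical frame-level spectral features (e.g. MFCC-like vectors) per class.
healthy_frames = np.random.randn(5000, 13)          # placeholder training data
apnoea_frames = np.random.randn(5000, 13) + 0.5     # placeholder training data

# One GMM per class, trained on frames pooled over the respective speakers.
gmm_healthy = GaussianMixture(n_components=16).fit(healthy_frames)
gmm_apnoea = GaussianMixture(n_components=16).fit(apnoea_frames)

def classify(utterance_frames: np.ndarray) -> str:
    """Average per-frame log-likelihood ratio decides the class of an utterance."""
    llr = gmm_apnoea.score(utterance_frames) - gmm_healthy.score(utterance_frames)
    return "apnoea" if llr > 0 else "healthy"

print(classify(np.random.randn(300, 13)))
```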
IEEE Transactions on Audio, Speech, and Language Processing | 2006
Doroteo Torre Toledano; Jesus Gomez Villardebo; Luis A. Hernández Gómez
This paper presents an algorithm for formant tracking using HMMs and analyzes the influence of HMM initialization, training and context-dependency on the accuracy of the formant tracks obtained with the HMMs. Formant trackers usually include two different phases: one in which the speech is analyzed and formant candidates are obtained, and another in which, by imposing different constraints, the most likely formants are chosen. While the first stage usually relies on standard spectrum estimation techniques, the second stage has evolved notably in recent years. Traditionally, the second phase imposes continuity constraints on the formant selection process. Lately there has been ongoing research on including phonemic knowledge in the second stage to make formant tracking more reliable. To incorporate phonemic knowledge, newer approaches make use of the orthographic transcription of the speech utterance. From the orthographic transcription, the phonemic transcription is obtained, and from this and the speech itself a phonemic segmentation can be derived. This phonemic segmentation, along with the phonemic transcription and some knowledge of the nominal formant positions for the different phonemes, provides extra information that can be used to obtain more accurate formant tracks. This paper presents a complete HMM-based, data-driven algorithm for formant tracking suitable for combining different levels of acoustic and phonemic information. A detailed analysis of the performance of this algorithm is presented for different initialization strategies using different levels of knowledge, different degrees of training, and context-independent and context-dependent HMMs. Experimental speaker-dependent results show that the efficient use of phonemic information in HMM training and context-dependent modeling significantly reduces the formant tracking error rate, especially for formants F2 and F3.
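To make the two-phase idea concrete, here is a toy version of the continuity-constrained candidate-selection stage mentioned above (not the paper's HMM-based algorithm): each frame offers several formant candidates, and a dynamic-programming pass picks one per frame, trading deviation from a nominal frequency against frame-to-frame jumps. The nominal frequency and weights are made up for the example.

```python
import numpy as np

def select_track(candidates, nominal=500.0, continuity_weight=0.01):
    """Pick one candidate frequency per frame by minimising deviation from a
    nominal target plus a frame-to-frame jump cost (continuity constraint)."""
    cost = [np.abs(np.array(candidates[0]) - nominal)]   # cost[t][i]: best cost ending in candidate i
    back = []
    for t in range(1, len(candidates)):
        prev, cur = np.array(candidates[t - 1]), np.array(candidates[t])
        jump = continuity_weight * np.abs(cur[:, None] - prev[None, :])
        total = cost[-1][None, :] + jump
        back.append(total.argmin(axis=1))
        cost.append(np.abs(cur - nominal) + total.min(axis=1))
    # Backtrace the cheapest path of candidates
    idx = [int(np.argmin(cost[-1]))]
    for b in reversed(back):
        idx.append(int(b[idx[-1]]))
    idx.reverse()
    return [candidates[t][i] for t, i in enumerate(idx)]

# Three frames with a few F1 candidates each (Hz):
print(select_track([[480, 720], [500, 690], [510, 700]]))
```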
Computer Speech & Language | 2014
Ana Montero Benavides; Rubén Fernández Pozo; Doroteo Torre Toledano; José Luis Blanco Murillo; Eduardo López Gonzalo; Luis A. Hernández Gómez
international conference on acoustics, speech, and signal processing | 2005
Nicolás Morales; John H. L. Hansen; Doroteo Torre Toledano
Eurasip Journal on Audio, Speech, and Music Processing | 2013
Javier Tejedor; Doroteo Torre Toledano; Xavier Anguera; Amparo Varona; Lluís F. Hurtado; Antonio Miguel; José Colás