Publication


Featured research published by Luciana Ferrer.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2015

Advances in deep neural network approaches to speaker recognition

Mitchell McLaren; Yun Lei; Luciana Ferrer

The recent application of deep neural networks (DNN) to speaker identification (SID) has resulted in significant improvements over current state-of-the-art on telephone speech. In this work, we report a similar achievement in DNN-based SID performance on microphone speech. We consider two approaches to DNN-based SID: one that uses the DNN to extract features, and another that uses the DNN during feature modeling. Modeling is conducted using the DNN/i-vector framework, in which the traditional universal background model is replaced with a DNN. The recently proposed use of bottleneck features extracted from a DNN is also evaluated. Systems are first compared with a conventional universal background model (UBM) Gaussian mixture model (GMM) i-vector system on the clean conditions of the NIST 2012 speaker recognition evaluation corpus, where a lack of robustness to microphone speech is found. Several methods of DNN feature processing are then applied to bring significantly greater robustness to microphone speech. To direct future research, the DNN-based systems are also evaluated in the context of audio degradations including noise and reverberation.
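The DNN/i-vector framework described above replaces the GMM-UBM's per-frame Gaussian occupation probabilities with senone posteriors from a DNN when accumulating the sufficient statistics that feed i-vector extraction. A minimal sketch of that statistics-accumulation step (toy random features and posteriors; the i-vector extractor itself, i.e. total-variability matrix training, is omitted):

```python
import numpy as np

def baum_welch_stats(feats, posteriors):
    """Zeroth- and first-order Baum-Welch statistics.

    In a conventional UBM/i-vector system, `posteriors` are per-frame
    Gaussian occupation probabilities from a GMM; in the DNN/i-vector
    framework they come instead from a DNN trained to predict senones.

    feats:      (T, D) array of frame-level acoustic features
    posteriors: (T, C) array, each row summing to 1 over the C classes
    """
    N = posteriors.sum(axis=0)   # (C,) zeroth-order (soft counts)
    F = posteriors.T @ feats     # (C, D) first-order (weighted feature sums)
    return N, F

# Toy example: 4 frames, 2-dim features, 3 classes (senones or Gaussians)
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 2))
logits = rng.normal(size=(4, 3))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

N, F = baum_welch_stats(feats, posteriors)
print(N.shape, F.shape)  # (3,) (3, 2)
```

Because each posterior row sums to one, the zeroth-order statistics always sum to the number of frames, exactly as in the GMM case.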


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2016

Exploring the role of phonetic bottleneck features for speaker and language recognition

Mitchell McLaren; Luciana Ferrer; Aaron Lawson

Using bottleneck features extracted from a deep neural network (DNN) trained to predict senone posteriors has resulted in new, state-of-the-art technology for language and speaker identification. For language identification, the features' dense phonetic information is believed to enable improved performance by better representing language-dependent phone distributions. For speaker recognition, the role of these features is less clear, given that a bottleneck layer near the DNN output layer is thought to contain limited speaker information. In this article, we analyze the role of bottleneck features in these identification tasks by varying the DNN layer from which they are extracted, under the hypothesis that speaker information is traded for dense phonetic information as the layer moves toward the DNN output layer. Experiments support this hypothesis under certain conditions, and highlight the benefit of using a bottleneck layer close to the DNN output layer when DNN training data is matched to the evaluation conditions, and a layer more central to the DNN otherwise.
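The layer-variation experiment above amounts to truncating a trained feed-forward DNN at a chosen depth and taking that layer's activations as the bottleneck features. A sketch under stated assumptions (random placeholder weights standing in for a trained senone-classification DNN; ReLU hidden layers assumed):

```python
import numpy as np

def forward_to_layer(x, weights, biases, k):
    """Forward a feature frame through the first k layers of a
    feed-forward DNN and return layer k's activations, i.e. treat
    that layer as the bottleneck output."""
    h = x
    for W, b in list(zip(weights, biases))[:k]:
        h = np.maximum(0.0, h @ W + b)  # ReLU hidden layers
    return h

# Placeholder network: input .. hidden layers .. senone output layer
rng = np.random.default_rng(1)
dims = [20, 64, 64, 80, 64, 500]
weights = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(dims, dims[1:])]
biases = [np.zeros(b) for b in dims[1:]]

x = rng.normal(size=20)
bn_mid = forward_to_layer(x, weights, biases, 3)   # a layer central to the DNN
bn_late = forward_to_layer(x, weights, biases, 4)  # a layer close to the output
print(bn_mid.shape, bn_late.shape)  # (80,) (64,)
```

Varying `k` toward the output layer is the knob the paper turns: later layers carry denser phonetic information, earlier layers retain more speaker information.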


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2016

A phonetically aware system for speech activity detection

Luciana Ferrer; Martin Graciarena; Vikramjit Mitra

Speech activity detection (SAD) is an essential component of most speech processing tasks and greatly influences the performance of the systems. Noise and channel distortions remain a challenge for SAD systems. In this paper, we focus on a dataset of highly degraded signals, developed under the DARPA Robust Automatic Transcription of Speech (RATS) program. On this challenging data, the best-performing systems are those based on deep neural networks (DNN) trained to predict speech/non-speech posteriors for each frame. We propose a novel two-stage approach to SAD that attempts to model phonetic information in the signal more explicitly than in current systems. In the first stage, a bottleneck DNN is trained to predict posteriors for senones. The activations at the bottleneck layer are then used as input to a second DNN, trained to predict the speech/non-speech posteriors. We test performance on two datasets, with matched and mismatched channels compared to those in the training data. On the matched channels, the proposed approach leads to gains of approximately 35% relative to our best single-stage DNN SAD system. On mismatched channels, the proposed system obtains comparable performance to our baseline, indicating more work needs to be done to improve robustness to mismatched data.
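The two-stage pipeline described above can be sketched end to end: stage one maps each frame to bottleneck activations, and stage two maps those activations to a speech/non-speech posterior that is thresholded per frame. Everything here is a toy stand-in (a random linear map for the bottleneck DNN, a single logistic layer for the second DNN):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def two_stage_sad(feats, bottleneck_fn, sad_weights, sad_bias, threshold=0.5):
    """Two-stage SAD sketch: bottleneck features from a senone DNN
    (stage 1) feed a second classifier (here a single logistic layer
    standing in for the second DNN) that outputs per-frame speech
    posteriors (stage 2), thresholded into speech/non-speech decisions."""
    bn = np.stack([bottleneck_fn(f) for f in feats])  # (T, B) bottleneck feats
    post = sigmoid(bn @ sad_weights + sad_bias)       # (T,) speech posteriors
    return post >= threshold                          # (T,) boolean decisions

# Toy stand-ins for the two trained networks
rng = np.random.default_rng(2)
W1 = rng.normal(scale=0.1, size=(20, 8))
bottleneck_fn = lambda f: np.maximum(0.0, f @ W1)
sad_w = rng.normal(scale=0.5, size=8)

feats = rng.normal(size=(10, 20))
decisions = two_stage_sad(feats, bottleneck_fn, sad_w, 0.0)
print(decisions.shape, decisions.dtype)  # (10,) bool
```

The point of the design is that stage one is trained on a phonetic target (senones), so stage two classifies frames in a space that already encodes phonetic structure rather than raw acoustics.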


Odyssey 2016 | 2016

Analyzing the Effect of Channel Mismatch on the SRI Language Recognition Evaluation 2015 System

Mitchell McLaren; Diego Castán; Luciana Ferrer

We present the work done by our group for the 2015 language recognition evaluation (LRE) organized by the National Institute of Standards and Technology (NIST), along with an extended post-evaluation analysis. The focus of this evaluation was the development of language recognition systems for clusters of closely related languages using training data released by NIST. This training data contained a highly imbalanced sample from the languages of interest. The SRI team submitted several systems to LRE’15. Major components included (1) bottleneck features extracted from Deep Neural Networks (DNNs) trained to predict English senones, with multiple DNNs trained using a variety of acoustic features; (2) data-driven Discrete Cosine Transform (DCT) contextualization of features for traditional Universal Background Model (UBM) i-vector extraction and for input to a DNN for bottleneck feature extraction; (3) adaptive Gaussian backend scoring; (4) a newly developed multiresolution neural network backend; and (5) cluster-specific N-way fusion of scores. We compare results on our development dataset with those on the evaluation data and find significantly different conclusions about which techniques were useful for each dataset. This difference was due mostly to a large unexpected mismatch in acoustic and channel conditions between the two datasets. We provide a post-evaluation analysis revealing that the successful approaches for this evaluation included the use of bottleneck features, and a well-defined development dataset appropriate for mismatched conditions.
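Among the components listed above, the Gaussian backend is the most self-contained: one Gaussian per language class over i-vectors, with a single shared covariance, scored by per-class log-likelihood. A minimal non-adaptive sketch on synthetic data (the evaluation system's *adaptive* variant, which updates the model using test data, is not shown):

```python
import numpy as np

def gaussian_backend_scores(train_x, train_y, test_x):
    """Per-class log-likelihood scores under class-mean Gaussians with a
    single shared covariance, a common backend for i-vector language
    recognition. Assumes integer labels 0..K-1."""
    classes = np.unique(train_y)
    means = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    resid = train_x - means[train_y]            # residuals around class means
    cov = resid.T @ resid / len(train_x)        # shared within-class covariance
    icov = np.linalg.inv(cov)
    # log N(x; mu_c, cov), up to a class-independent constant
    diff = test_x[:, None, :] - means[None, :, :]        # (N, K, D)
    return -0.5 * np.einsum('nkd,de,nke->nk', diff, icov, diff)

# Synthetic 4-dim "i-vectors" from two well-separated language classes
rng = np.random.default_rng(3)
train_x = np.concatenate([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
train_y = np.array([0] * 50 + [1] * 50)
test_x = np.array([[0.0, 0, 0, 0], [3.0, 3, 3, 3]])

pred = gaussian_backend_scores(train_x, train_y, test_x).argmax(axis=1)
print(pred)  # [0 1]
```

The shared covariance is what makes the scores comparable across classes; the adaptive version re-estimates it to absorb the kind of train/test channel mismatch this paper analyzes.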


Odyssey | 2010

Improving Language Recognition with Multilingual Phone Recognition and Speaker Adaptation Transforms

Andreas Stolcke; Murat Akbacak; Luciana Ferrer; Sachin S. Kajarekar; Colleen Richey; Nicolas Scheffer; Elizabeth Shriberg


Conference of the International Speech Communication Association | 2001

Improving Performance of a Keyword Spotting System by Using a New Confidence Measure

Luciana Ferrer; Claudio Estienne


Archive | 2013

Trial-Based Calibration for Speaker Recognition in Unseen Conditions

Mitchell McLaren; Aaron Lawson; Luciana Ferrer; Nicolas Scheffer; Yun Lei


Conference of the International Speech Communication Association (Interspeech) | 2015

Speech-Based Assessment of PTSD in a Military Population using Diverse Feature Classes

Dimitra Vergyri; Bruce Knoth; Elizabeth Shriberg; Vikramjit Mitra; Mitchell McLaren; Luciana Ferrer; Pablo Garcia; Charles R. Marmar


Conference of the International Speech Communication Association (Interspeech) | 2015

Mitigating the Effects of Non-Stationary Unseen Noises on Language Recognition Performance

Luciana Ferrer; Mitchell McLaren; Aaron Lawson; Martin Graciarena


Odyssey 2018: The Speaker and Language Recognition Workshop | 2018

Approaches to Multi-domain Language Recognition

Mitchell McLaren; Mahesh Kumar Nandwana; Diego Castán; Luciana Ferrer

Collaboration


Dive into Luciana Ferrer's collaborations.

Top Co-Authors

Mahesh Kumar Nandwana

University of Texas at Dallas
