Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Allen Stauffer is active.

Publication


Featured research published by Allen Stauffer.


International Conference on Acoustics, Speech, and Signal Processing | 2011

Survey and evaluation of acoustic features for speaker recognition

Aaron Lawson; Pavel Vabishchevich; Mark C. Huggins; Paul A. Ardis; Brandon Battles; Allen Stauffer

This study seeks to quantify the effectiveness of a broad range of acoustic features for speaker identification and their impact in feature fusion. Sixteen different acoustic features are evaluated under nine different acoustic, channel, and speaking-style conditions. Three major types of features are examined: traditional (MFCC, PLP, LPCC, etc.), innovative (PYKFEC, MVDR, etc.), and extensions of these (frequency-constrained LPCC, LFCC). All features were then fused in binary and three-way fusion to determine the complementarity between features and their impact on accuracy. Results were surprising, with the MVDR feature having the highest performance of any single feature, and LPCC-based features having the greatest impact on fusion effectiveness. Commonly used features like PLP and MFCC did not achieve the best results in any category. It was further found that removing the perceptually motivated warping from MFCC, MVDR, and PYKFEC improved the performance of these features significantly.
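For readers who want to experiment along these lines, here is a minimal sketch in Python of two of the feature families discussed: MFCC (traditional) and a linear-frequency cepstral variant with the perceptual warping removed, plus weighted-sum score fusion. It assumes librosa, scipy, and numpy are available; the function names, fusion weight, and parameter values are illustrative, not the study's configuration.

import numpy as np
import librosa
from scipy.fftpack import dct

def extract_mfcc(path, n_mfcc=20):
    # One of the "traditional" features: mel-warped cepstra.
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, coeffs)

def extract_lfcc(path, n_coeffs=20):
    # Linear-frequency cepstra: the same idea as MFCC but without the
    # perceptually motivated mel warping that the study found harmful.
    y, sr = librosa.load(path, sr=None)
    log_spec = np.log(np.abs(librosa.stft(y)) ** 2 + 1e-10)
    return dct(log_spec, axis=0, norm='ortho')[:n_coeffs].T

def fuse_scores(score_a, score_b, w=0.5):
    # Binary score-level fusion: a weighted sum of two recognizers' scores.
    return w * score_a + (1.0 - w) * score_b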


International Conference on Acoustics, Speech, and Signal Processing | 2009

Perturbation and pitch normalization as enhancements to speaker recognition

Aaron Lawson; M. Linderman; Matthew R. Leonard; Allen Stauffer; B. B. Pokines; Michael A. Carlin

This study proposes an approach to improving speaker recognition through the process of minute vocal tract length perturbation of training files, coupled with pitch normalization for both train and test data. The notion of perturbation as a method for improving the robustness of training data for supervised classification is taken from the field of optical character recognition, where distorting characters within a certain range has shown strong improvements across disparate conditions. This paper demonstrates that acoustic perturbation, in this case analysis, distortion, and resynthesis of vocal tract length for a given speaker, significantly improves speaker recognition when the resulting files are used to augment or replace the training data. A pitch normalization technique is also discussed, which is combined with perturbation to improve open-set speaker recognition from an EER of 20% to 6.7%.
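As a rough illustration of the two ingredients, and not the paper's analysis/distortion/resynthesis method, the sketch below approximates vocal tract length perturbation by linearly warping the STFT frequency axis, and normalizes pitch with an off-the-shelf pitch shifter. librosa is assumed; the warp factors and shift amount are illustrative.

import numpy as np
import librosa

def vtl_perturb(y, alpha=1.05):
    # Approximate vocal tract length perturbation: linearly warp the
    # STFT frequency axis by a factor alpha close to 1.
    spec = librosa.stft(y)
    warped = np.zeros_like(spec)
    n_bins = spec.shape[0]
    for k in range(n_bins):
        src = int(round(k / alpha))   # source bin for warped bin k
        if src < n_bins:
            warped[k] = spec[src]
    return librosa.istft(warped)

def pitch_normalize(y, sr, n_steps):
    # Shift pitch toward a common reference; n_steps would be derived
    # from the speaker's measured median F0 (estimation not shown).
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)

# Augment a training file with slightly shorter/longer vocal tracts:
# variants = [vtl_perturb(y, a) for a in (0.95, 1.0, 1.05)]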


Journal of the Acoustical Society of America | 2018

Nonlinear waveform distortion: Assessment and detection of clipping on speech data and systems

John H. L. Hansen; Allen Stauffer; Wei Xia

Speech, speaker, and language systems have traditionally relied on carefully collected speech material for training acoustic models. There is an overwhelming abundance of publicly accessible audio material available for training. A major challenge, however, is that such found data is not professionally recorded, and therefore may contain a wide diversity of background noise, nonlinear distortions, or other unknown environmental or technology-based contamination or mismatch. There is a critical need for automatic analysis to screen such unknown data sets before acoustic model development training, or to perform input audio purity screening prior to classification. In this study, we propose a waveform-based clipping detection algorithm for naturalistic audio streams and analyze the impact of clipping at different severities on speech quality measures and automatic speaker identification systems. We use the TIMIT and NIST-SRE08 speech corpora as case studies. The results show, as expected, that clipping introduces a nonlinear distortion into clean speech data, which reduces speech quality and performance for speaker recognition. We also investigate what degree of clipping can be present to sustain effective speech system performance. The proposed detection system, which will be released, could potentially contribute to new audio collections for speech and language technology development.
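A bare-bones waveform-based clipping check, offered as a simplified stand-in for the proposed detector (whose actual design the abstract does not spell out): measure how much of the signal sits pinned at the waveform's extremes. Both thresholds below are illustrative, not the paper's tuned values.

import numpy as np

def clipping_rate(y, level=0.99):
    # Fraction of samples at or above `level` times the peak magnitude;
    # hard clipping piles probability mass at the rails.
    peak = np.max(np.abs(y))
    if peak == 0.0:
        return 0.0
    return float(np.mean(np.abs(y) >= level * peak))

def is_clipped(y, level=0.99, max_rate=0.01):
    # Flag a file when more than ~1% of its samples sit at the rails.
    return clipping_rate(y, level) > max_rate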


Journal of the Acoustical Society of America | 2008

Channel mitigation approach for automatic language identification

Allen Stauffer; Aaron Lawson

A major obstacle to overcome in language identification (LID) performance is the impact of varying channel conditions. In-house experiments show that LID performance drops by 10-12% across channels. The focus of this project is to mitigate the impact of channel conditions and artifacts on LID, and to provide an understanding of how channel-robust models may be created by combining data from across corpora. Our approach involved creating composite cross-channel language models from multiple corpora, testing them with data from three corpora, and comparing the results to those obtained from same-channel and pure cross-channel experiments. Our hypotheses were that (1) same-channel models would be the most accurate, (2) purely cross-channel models would be considerably less accurate, and (3) composite model accuracy would fall in between that of the same-channel and cross-channel models. Results were surprising: while pure cross-channel tests performed the worst, with an average of 11% loss in accurac...
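The composite-model idea can be sketched as pooling frame-level features from several corpora or channels before fitting a single model per language. The snippet below uses a diagonal-covariance GMM from scikit-learn purely as a stand-in; the abstract does not say what modeling the original system used, and every name and parameter here is illustrative.

import numpy as np
from sklearn.mixture import GaussianMixture

def train_composite_model(feature_sets, n_components=64):
    # feature_sets: one (frames, dims) array per corpus/channel.
    # Pooling exposes the language model to several channel conditions.
    pooled = np.vstack(feature_sets)
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    gmm.fit(pooled)
    return gmm

def language_score(gmm, features):
    # Average per-frame log-likelihood under a language's composite model.
    return float(np.mean(gmm.score_samples(features)))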


Journal of the Acoustical Society of America | 2007

External factors impacting the performance of speaker identification: Multisession audio research project (MARP) corpus experiments

Stanley Wenndt; Aaron Lawson; Allen Stauffer

This study looks at the effect of data conditions on automated speech processing systems. The goal is to better understand the impact of acoustical features on accuracy and to develop more robust features. A speaker identification (SID) system was used for the experiments. To explore this issue, a new longitudinal database was collected involving 60 speakers over 18 months. This corpus allowed us to examine four data factors that impact SID: (1) intersession variability, (2) question intonation, (3) text-dependency (identical phonetic content), and (4) whispered speech. First, we found that intersession SID suffered an average loss in accuracy of 17%, independent of the time latency between sessions. Second, differing intonation conditions in train and test hurt SID performance by 5%. Third, text-dependent data showed the most dramatic impact, where using phonetically identical test and train sentences yielded 0% error. However, replacing the target speaker's content with a random text-independent sente...


Conference of the International Speech Communication Association | 2009

The Multi-Session Audio Research Project (MARP) Corpus: Goals, Design and Initial Findings

Aaron Lawson; Allen Stauffer; Edward J. Cupples; Stanley J. Wenndt; W. P. Bray; John J. Grieco


Conference of the International Speech Communication Association | 2009

Long Term Examination of Intra-Session and Inter-Session Speaker Variability

Aaron Lawson; Allen Stauffer; Brett Y. Smolenski; B. B. Pokines; Matthew R. Leonard; Edward J. Cupples


Conference of the International Speech Communication Association | 2009

Speaker Recognition on Lossy Compressed Speech Using the Speex Codec

Allen Stauffer; Aaron Lawson


Conference of the International Speech Communication Association | 2016

Open Language Interface for Voice Exploitation (OLIVE)

Aaron Lawson; Mitchell McLaren; Harry Bratt; Martin Graciarena; Horacio Franco; Christopher George; Allen Stauffer; Chris D. Bartels; Julien van Hout


Journal of the Acoustical Society of America | 2008

Automatic data enhancement for language identification using voice generation

Aaron Lawson; Matthew Linderman; Michael A. Carlin; Allen Stauffer

Collaboration


Dive into Allen Stauffer's collaborations.

Top Co-Authors

Edward J. Cupples

Air Force Research Laboratory


Mahesh Kumar Nandwana

University of Texas at Dallas


Matthew R. Leonard

University of Texas at Dallas


Mitchell McLaren

Queensland University of Technology
