Publication


Featured research published by Aaron Lawson.


International Conference on Acoustics, Speech, and Signal Processing | 2011

Survey and evaluation of acoustic features for speaker recognition

Aaron Lawson; Pavel Vabishchevich; Mark C. Huggins; Paul A. Ardis; Brandon Battles; Allen Stauffer

This study seeks to quantify the effectiveness of a broad range of acoustic features for speaker identification and their impact in feature fusion. Sixteen different acoustic features are evaluated under nine different acoustic, channel, and speaking-style conditions. Three major types of features are examined: traditional (MFCC, PLP, LPCC, etc.), innovative (PYKFEC, MVDR, etc.), and extensions of these (frequency-constrained LPCC, LFCC). All features were then fused in binary and three-way combinations to determine the complementarity between features and their impact on accuracy. The results were surprising: the MVDR feature had the highest performance of any single feature, and LPCC-based features had the greatest impact on fusion effectiveness. Commonly used features such as PLP and MFCC did not achieve the best results in any category. It was further found that removing the perceptually motivated warping from MFCC, MVDR, and PYKFEC improved the performance of these features significantly.
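
The finding that stripping the perceptual warping helps is easy to make concrete. Below is a minimal NumPy sketch of linear-frequency cepstral coefficients (LFCC), i.e. an MFCC-style pipeline with a linearly spaced filterbank in place of the mel-warped one; the function name and all parameter values are illustrative defaults, not the settings used in the study.

```python
import numpy as np
from scipy.fftpack import dct

def lfcc(signal, sr=16000, n_fft=512, hop=160, n_filters=24, n_ceps=13):
    """Linear-frequency cepstral coefficients: an MFCC-style pipeline
    whose filterbank is linearly spaced, i.e. no perceptual (mel) warping."""
    # Short-time power spectrum over Hamming-windowed frames
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
    spec = np.abs(np.fft.rfft(frames * np.hamming(n_fft), axis=1)) ** 2
    # Triangular filters on a *linear* frequency grid; swapping this grid
    # for a mel-warped one would turn the output back into MFCCs
    edges = np.linspace(0, sr / 2, n_filters + 2)
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, c, hi = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[i, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    # Log filterbank energies followed by a DCT, as in standard MFCC
    energies = np.log(spec @ fbank.T + 1e-10)
    return dct(energies, type=2, axis=1, norm='ortho')[:, :n_ceps]
```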


Conference of the International Speech Communication Association | 2016

The Speakers in the Wild (SITW) Speaker Recognition Database

Mitchell McLaren; Luciana Ferrer; Diego Castán; Aaron Lawson

The Speakers in the Wild (SITW) speaker recognition database contains hand-annotated speech samples from open-source media for the purpose of benchmarking text-independent speaker recognition technology on single- and multi-speaker audio acquired across unconstrained or “wild” conditions. The database consists of recordings of 299 speakers, with an average of eight different sessions per person. Unlike existing databases for speaker recognition, this data was not collected under controlled conditions and thus contains real noise, reverberation, intra-speaker variability, and compression artifacts. These factors are often convolved in the real world, as the SITW data shows, and they make SITW a challenging database for single- and multi-speaker recognition.


IEEE International Conference on Technologies for Homeland Security | 2013

Recent developments in voice biometrics: Robustness and high accuracy

Nicolas Scheffer; Luciana Ferrer; Aaron Lawson; Yun Lei; Mitchell McLaren

Recently, researchers have tackled difficult voice biometrics problems that resonate with the defense and research communities. These problems include non-ideal recording conditions that are frequently found in operational scenarios, such as noise, reverberation, degraded channels, and compressed audio. In this article, we highlight SRI's innovations that resulted from the IARPA Biometrics Exploitation Science & Technology (BEST) and the DARPA Robust Automatic Transcription of Speech (RATS) programs, as well as SRI's approach for codec-degraded speech. We show how these advancements support the case for the biometrics community adopting the use of speaker recognition.


Conference of the International Speech Communication Association | 2016

On the Issue of Calibration in DNN-Based Speaker Recognition Systems

Mitchell McLaren; Diego Castán; Luciana Ferrer; Aaron Lawson

This article is concerned with the issue of calibration in the context of Deep Neural Network (DNN) based approaches to speaker recognition. DNNs have set a new standard in the technology when used in place of the traditional universal background model (UBM) for feature alignment, or to augment traditional features with those extracted from a bottleneck layer of the DNN. These techniques provide extremely good performance for constrained trial conditions that are well matched to development conditions. However, when applied to unseen conditions or a wide variety of conditions, some DNN-based techniques offer poor calibration performance. Through analysis on both the PRISM and the recently released Speakers in the Wild (SITW) corpora, we illustrate that bottleneck features hinder calibration if used in the calculation of first-order Baum-Welch statistics during i-vector extraction. We propose a hybrid alignment framework, which stems from our previous work on DNN senone alignment, that uses the bottleneck features only for the alignment of features during statistics calculation. This framework not only addresses the issue of calibration, but also provides a more computationally efficient system based on bottleneck features with improved discriminative power.
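
To make the hybrid-alignment idea concrete, here is a schematic NumPy sketch, not the authors' implementation: the frame-to-component posteriors come from one feature stream (the bottleneck features), while the sufficient statistics for i-vector extraction are accumulated from another (e.g., MFCCs). The gmm_post helper is a hypothetical callable returning per-frame component responsibilities.

```python
import numpy as np

def hybrid_baum_welch_stats(align_feats, stat_feats, gmm_post):
    """Accumulate Baum-Welch sufficient statistics for i-vector extraction,
    aligning frames with one feature stream but collecting statistics from
    another. gmm_post(X) -> (T, C) responsibilities is assumed to exist."""
    gamma = gmm_post(align_feats)   # (T, C): posteriors from bottleneck features
    N = gamma.sum(axis=0)           # zeroth-order stats, one count per component
    F = gamma.T @ stat_feats        # first-order stats, (C, D), from e.g. MFCCs
    return N, F
```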


International Conference on Acoustics, Speech, and Signal Processing | 2009

Perturbation and pitch normalization as enhancements to speaker recognition

Aaron Lawson; M. Linderman; Matthew R. Leonard; Allen Stauffer; B. B. Pokines; Michael A. Carlin

This study proposes an approach to improving speaker recognition through minute vocal tract length perturbation of training files, coupled with pitch normalization for both training and test data. The notion of perturbation as a method for improving the robustness of training data for supervised classification is taken from the field of optical character recognition, where distorting characters within a certain range has shown strong improvements across disparate conditions. This paper demonstrates that acoustic perturbation, in this case analysis, distortion, and resynthesis of vocal tract length for a given speaker, significantly improves speaker recognition when the resulting files are used to augment or replace the training data. A pitch normalization technique is also discussed, which is combined with perturbation to improve open-set speaker recognition from an EER of 20% to 6.7%.
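
The 20% to 6.7% figure is an equal error rate (EER), the operating point where the false-accept and false-reject rates coincide. A small NumPy sketch of how that metric is computed from trial scores (a generic evaluation routine, not code from the paper):

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Find the threshold where the false-reject rate on target trials
    equals the false-accept rate on impostor trials, and return that rate."""
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    fr = np.array([(target_scores < t).mean() for t in thresholds])     # false rejects
    fa = np.array([(impostor_scores >= t).mean() for t in thresholds])  # false accepts
    i = np.argmin(np.abs(fr - fa))   # closest crossing of the two curves
    return (fr[i] + fa[i]) / 2
```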


Archive | 2014

Identifying User Demographic Traits Through Virtual-World Language Use

Aaron Lawson; John Murray

The paper presents approaches for identifying real-world demographic attributes based on language use in the virtual world. We apply features developed from the classic literature on sociolinguistics and sound symbolism to data collected from virtual-world chat and avatar naming to determine participants’ age and gender. We also examine participants’ use of avatar names across virtual worlds and how these names are employed to project a consistent identity across environments, which we call “traveling characteristics.”
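
As a rough illustration of text-based demographic prediction, a generic character n-gram baseline is sketched below; it is not the sociolinguistic and sound-symbolism feature set the paper develops, and chat_texts/labels are hypothetical placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Character n-grams capture spelling, lengthening ("soooo"), and other
# surface cues of chat text that tend to correlate with age and gender.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4), min_df=2),
    LogisticRegression(max_iter=1000),
)
# chat_texts: list of utterances; labels: e.g. age bands or gender
# model.fit(chat_texts, labels)
# model.predict(["lol omg that boss fight was brutal"])
```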


Conference of the International Speech Communication Association | 2018

Analysis of Complementary Information Sources in the Speaker Embeddings Framework

Mahesh Kumar Nandwana; Mitchell McLaren; Diego Castán; Julien van Hout; Aaron Lawson

Deep neural network (DNN)-based speaker embeddings have resulted in new, state-of-the-art text-independent speaker recognition technology. However, very limited effort has been made to understand DNN speaker embeddings. In this study, we analyze the behavior of speaker recognition systems based on speaker embeddings with different front-end features, including the standard Mel frequency cepstral coefficients (MFCC), as well as power-normalized cepstral coefficients (PNCC) and perceptual linear prediction (PLP). Using a speaker recognition system based on DNN speaker embeddings and probabilistic linear discriminant analysis (PLDA), we compared different approaches to leveraging complementary information using score-, embedding-, and feature-level combination. We report our results on the Speakers in the Wild (SITW) and NIST SRE 2016 datasets. We found that the first and second embedding layers are complementary in nature. By applying score- and embedding-level fusion, we demonstrate relative improvements in equal error rate of 17% on NIST SRE 2016 and 10% on SITW over the baseline system.
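
The two fusion strategies compared in the paper can be sketched in a few lines of NumPy; the equal weights and the concatenate-then-normalize recipe below are illustrative assumptions, not the trained configuration from the study.

```python
import numpy as np

def score_fusion(score_lists, weights=None):
    """Score-level fusion: a weighted sum of per-system trial scores.
    In practice weights are trained (e.g. by logistic regression) on a
    held-out set; equal weights are used here for illustration."""
    S = np.vstack(score_lists)   # (n_systems, n_trials)
    w = np.full(len(S), 1.0 / len(S)) if weights is None else np.asarray(weights)
    return w @ S

def embedding_fusion(emb_a, emb_b):
    """Embedding-level fusion: concatenate embeddings from two systems (or
    two DNN layers) and length-normalize before PLDA scoring."""
    e = np.concatenate([emb_a, emb_b], axis=-1)
    return e / np.linalg.norm(e, axis=-1, keepdims=True)
```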


Journal of the Acoustical Society of America | 2018

The speakers in the room corpus

Aaron Lawson; Karl Ni; Colleen Richey; Zeb Armstrong; Martin Graciarena; Todd Stavish; Cory Stephenson; Jeff Hetherly; Paul Gamble; María Auxiliadora Barrios

The speakers in the room (SITR) corpus is a collaboration between Lab41 and SRI International, designed to be a freely available data set for speech and acoustics research in noisy room conditions. The main focus of the corpus is on distant-microphone collection in a series of four rooms of different sizes and configurations. Both foreground speech and background adversarial sounds are played through high-quality speakers in each room to create multiple, realistic acoustic environments. The foreground speech is played from a randomly rotating speaker to emulate head motion. Foreground speech consists of files from LibriVox audio collections, and the background distractor sounds consist of babble, music, HVAC, TV/radio, dogs, vehicles, and weather sounds drawn from the MUSAN collection. Each room has multiple sessions to exhaustively cover the background/foreground combinations, and the audio is collected with twelve different microphones (omnidirectional lavalier, studio cardioid, and piezoelectric) placed strategically around the room. The resulting data set was designed to enable acoustic research on event detection, background detection, source separation, speech enhancement, source distance, and sound localization, as well as speech research on speaker recognition, speech activity detection, speech recognition, and language recognition.


Computer Vision and Pattern Recognition | 2017

Spotting Audio-Visual Inconsistencies (SAVI) in Manipulated Video

Robert C. Bolles; J. Brian Burns; Martin Graciarena; Andreas Kathol; Aaron Lawson; Mitchell McLaren; Thomas Mensink

This paper is part of a larger effort to detect manipulations of video by searching for and combining the evidence of multiple types of inconsistencies between the audio and visual channels. Here, we focus on inconsistencies between the types of scene detected in the audio and visual modalities (e.g., audio suggesting a small indoor room versus visuals showing an urban outdoor scene), and inconsistencies in speaker identity tracking over a video given audio speaker features and visual face features (e.g., a voice change, but no talking-face change). The scene inconsistency task was complicated by mismatches in the categories used in current visual scene and audio scene collections. To deal with this, we employed a novel semantic mapping method. The speaker identity inconsistency process was challenged by the complexity of comparing face tracks and audio speech clusters, requiring a novel method of fusing these two sources. Our progress on both tasks was demonstrated on two collections of tampered videos.
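
As a toy illustration of the speaker-identity inconsistency idea, the sketch below flags audio speaker-change times with no nearby face-track change; this simplistic time matching is an assumption for illustration, not the fusion method the paper develops.

```python
def identity_inconsistencies(audio_changes, face_changes, tol=0.5):
    """Return audio speaker-change times (seconds) that have no face-track
    change within `tol` seconds -- e.g. a voice change with no talking-face
    change, which is evidence of a possible manipulation."""
    return [t for t in audio_changes
            if not any(abs(t - f) <= tol for f in face_changes)]
```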


Multimedia Data Mining and Analytics | 2015

Detection of Demographics and Identity in Spontaneous Speech and Writing

Aaron Lawson; Luciana Ferrer; Wen Wang; John Murray

This chapter focuses on the automatic identification of demographic traits and identity in both speech and writing. We address language use in the virtual world of online games and text entry on mobile devices in the form of chat, email, and nicknames, and demonstrate text factors that correlate with demographics, such as age, gender, personality, and interaction style. Also presented here is work on speaker identification in spontaneous language use, where we describe the state of the art in verification, feature extraction, modeling, and calibration across multiple environmental conditions. Finally, we bring speech and writing together to explore approaches to user authentication that span language in general. We discuss how speech-specific factors such as intonation and writing-specific features such as spelling, punctuation, and typing correction correlate with and predict one another as a function of users’ sociolinguistic characteristics.

Collaboration


Dive into Aaron Lawson's collaborations.

Top Co-Authors

Allen Stauffer (University of Texas at Dallas)
Yun Lei (University of Texas at Dallas)
Mahesh Kumar Nandwana (University of Texas at Dallas)