Rif A. Saurous
Publication
Featured research published by Rif A. Saurous.
international conference on acoustics, speech, and signal processing | 2017
Shawn Hershey; Sourish Chaudhuri; Daniel P. W. Ellis; Jort F. Gemmeke; Aren Jansen; R. Channing Moore; Manoj Plakal; Devin Platt; Rif A. Saurous; Bryan Seybold; Malcolm Slaney; Ron J. Weiss; Kevin W. Wilson
Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task.
international conference on acoustics, speech, and signal processing | 2017
Pascal Getreuer; Thad Hughes; Richard F. Lyon; Rif A. Saurous
Robust and far-field speech recognition is critical to enable true hands-free communication. In far-field conditions, signals are attenuated due to distance. To improve robustness to loudness variation, we introduce a novel frontend called per-channel energy normalization (PCEN). The key ingredient of PCEN is the use of an automatic gain control based dynamic compression to replace the widely used static (such as log or root) compression. We evaluate PCEN on the keyword spotting task. On our large rerecorded noisy and far-field eval sets, we show that PCEN significantly improves recognition performance. Furthermore, we model PCEN as neural network layers and optimize high-dimensional PCEN parameters jointly with the keyword spotting acoustic model. The trained PCEN frontend demonstrates significant further improvements without increasing model complexity or inference-time cost.
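The abstract above describes replacing static log or root compression with dynamic compression based on automatic gain control. The following is a minimal NumPy sketch of that idea, assuming the commonly cited PCEN form: each channel's energy is divided by a smoothed version of itself, then root-compressed with an offset. The function name `pcen`, the first-frame initialization of the smoother, and the default parameter values are illustrative assumptions that do not appear on this page, and the trainable, jointly optimized variant mentioned in the abstract is not shown.

```python
import numpy as np


def pcen(energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Sketch of per-channel energy normalization.

    energy: array of shape (frames, channels) with non-negative
            filterbank energies.
    s:      smoothing coefficient of the first-order IIR smoother.
    alpha:  gain-control strength (values near 1 normalize more).
    delta, r: offset and exponent of the root compression.
    All default values here are assumed for illustration.
    """
    energy = np.asarray(energy, dtype=np.float64)
    smoothed = np.empty_like(energy)
    m = energy[0]  # assumption: start the smoother at the first frame
    for t in range(energy.shape[0]):
        # Per-channel running estimate of recent energy.
        m = (1.0 - s) * m + s * energy[t]
        smoothed[t] = m
    # Automatic gain control: divide by the smoothed energy.
    gain_normalized = energy / (eps + smoothed) ** alpha
    # Dynamic range compression in place of a static log.
    return (gain_normalized + delta) ** r - delta ** r


# Usage: the same signal at two loudness levels (synthetic data).
rng = np.random.default_rng(0)
quiet = rng.uniform(0.5, 1.5, size=(50, 4))
loud = 100.0 * quiet
pcen_gap = np.abs(pcen(loud) - pcen(quiet)).max()
log_gap = np.abs(np.log(loud) - np.log(quiet)).max()
```

In this sketch `pcen_gap` stays small while `log_gap` equals log(100), which illustrates why gain-normalized features are less sensitive to overall loudness than log-compressed ones.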
conference of the international speech communication association | 2017
Rj Skerry-Ryan; Daisy Stanton; Yonghui Wu; Ron J. Weiss; Navdeep Jaitly; Zongheng Yang; Ying Xiao; Zhifeng Chen; Samy Bengio; Quoc V. Le; Yannis Agiomyrgiannakis; Robert A. J. Clark; Rif A. Saurous
international conference on acoustics, speech, and signal processing | 2018
Jonathan Shen; Ruoming Pang; Ron Weiss; Mike Schuster; Navdeep Jaitly; Zongheng Yang; Zhifeng Chen; Yu Zhang; Rj Skerry-Ryan; Rif A. Saurous; Yannis Agiomyrgiannakis; Yonghui Wu
international conference on learning representations | 2017
Dustin Tran; Matthew D. Hoffman; Rif A. Saurous; Eugene Brevdo; Kevin P. Murphy; David M. Blei
Archive | 2017
Rj Skerry-Ryan; Daisy Stanton; Yonghui Wu; Ron Weiss; Navdeep Jaitly; Zongheng Yang; Ying Xiao; Zhifeng Chen; Samy Bengio; Quoc V. Le; Yannis Agiomyrgiannakis; Rob Clark; Rif A. Saurous
international conference on machine learning | 2018
Daisy Stanton; Yu Zhang; Rj Skerry-Ryan; Eric Battenberg; Joel Shor; Ying Xiao; Ye Jia; Fei Ren; Rif A. Saurous
international conference on artificial intelligence and statistics | 2016
Elad Eban; Mariano Schain; Alan Mackey; Ariel Gordon; Rif A. Saurous
international conference on machine learning | 2018
Alexander A. Alemi; Ben Poole; Ian Fischer; Joshua V. Dillon; Rif A. Saurous; Kevin P. Murphy
international conference on machine learning | 2018
Rj Skerry-Ryan; Eric Battenberg; Ying Xiao; Daisy Stanton; Joel Shor; Ron J. Weiss; Rob Clark; Rif A. Saurous