Rif A. Saurous | Researchain

Publications

Featured research published by Rif A. Saurous.

international conference on acoustics, speech, and signal processing | 2017

CNN architectures for large-scale audio classification

Shawn Hershey; Sourish Chaudhuri; Daniel P. W. Ellis; Jort F. Gemmeke; Aren Jansen; R. Channing Moore; Manoj Plakal; Devin Platt; Rif A. Saurous; Bryan Seybold; Malcolm Slaney; Ron J. Weiss; Kevin W. Wilson

Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task.

international conference on acoustics, speech, and signal processing | 2017

Trainable frontend for robust and far-field keyword spotting

Pascal Getreuer; Thad Hughes; Richard F. Lyon; Rif A. Saurous

Robust and far-field speech recognition is critical to enable true hands-free communication. In far-field conditions, signals are attenuated due to distance. To improve robustness to loudness variation, we introduce a novel frontend called per-channel energy normalization (PCEN). The key ingredient of PCEN is the use of dynamic compression based on automatic gain control to replace the widely used static (such as log or root) compression. We evaluate PCEN on the keyword spotting task. On our large rerecorded noisy and far-field evaluation sets, we show that PCEN significantly improves recognition performance. Furthermore, we model PCEN as neural network layers and optimize high-dimensional PCEN parameters jointly with the keyword spotting acoustic model. The trained PCEN frontend demonstrates significant further improvements without increasing model complexity or inference-time cost.
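A minimal sketch of the PCEN idea follows, written from the formulation as we understand it from the paper: a per-channel smoother M(t,f) = (1 - s) M(t-1,f) + s E(t,f) tracks the loudness of each filterbank channel, the energy is divided by that smoothed value (the automatic gain control step), and the result is compressed with a root instead of a log: PCEN(t,f) = (E(t,f) / (eps + M(t,f))^alpha + delta)^r - delta^r. The function name `pcen`, the default parameter values, and the choice to initialize M with the first frame are illustrative assumptions, not the paper's settings; the paper also learns these parameters per channel, whereas this sketch uses fixed scalars.

```python
def pcen(energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization over a time-frequency energy matrix.

    energy: list of frames; each frame is a list of non-negative per-channel
    energies (e.g. mel filterbank outputs). Returns a matrix of the same shape.
    Defaults are illustrative assumptions, not tuned or published values.
    """
    if not energy:
        return []
    num_channels = len(energy[0])
    # Assumption: seed the smoother with the first frame to avoid a warm-up ramp.
    smoothed = list(energy[0])
    out = []
    for frame in energy:
        row = []
        for f in range(num_channels):
            # First-order IIR smoothing of each channel's energy over time.
            smoothed[f] = (1.0 - s) * smoothed[f] + s * frame[f]
            # Automatic gain control: divide by a power of the smoothed energy.
            agc = frame[f] / (eps + smoothed[f]) ** alpha
            # Dynamic root compression with bias; subtracting delta**r makes
            # zero input map to zero output.
            row.append((agc + delta) ** r - delta ** r)
        out.append(row)
    return out
```

With alpha = 1 and eps = 0, scaling every input energy by a constant leaves the output unchanged, which illustrates the loudness robustness the abstract targets; with alpha slightly below 1, the invariance is approximate.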

conference of the international speech communication association | 2017

Tacotron: Towards End-to-End Speech Synthesis

RJ Skerry-Ryan; Daisy Stanton; Yonghui Wu; Ron J. Weiss; Navdeep Jaitly; Zongheng Yang; Ying Xiao; Zhifeng Chen; Samy Bengio; Quoc V. Le; Yannis Agiomyrgiannakis; Robert A. J. Clark; Rif A. Saurous

international conference on acoustics, speech, and signal processing | 2018

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Jonathan Shen; Ruoming Pang; Ron Weiss; Mike Schuster; Navdeep Jaitly; Zongheng Yang; Zhifeng Chen; Yu Zhang; RJ Skerry-Ryan; Rif A. Saurous; Yannis Agiomyrgiannakis; Yonghui Wu

international conference on learning representations | 2017

Deep Probabilistic Programming

Dustin Tran; Matthew D. Hoffman; Rif A. Saurous; Eugene Brevdo; Kevin P. Murphy; David M. Blei

Archive | 2017

Tacotron: A Fully End-to-End Text-to-Speech Synthesis Model

RJ Skerry-Ryan; Daisy Stanton; Yonghui Wu; Ron Weiss; Navdeep Jaitly; Zongheng Yang; Ying Xiao; Zhifeng Chen; Samy Bengio; Quoc V. Le; Yannis Agiomyrgiannakis; Rob Clark; Rif A. Saurous

international conference on machine learning | 2018