Publication


Featured research published by Rif A. Saurous.


International Conference on Acoustics, Speech, and Signal Processing | 2017

CNN architectures for large-scale audio classification

Shawn Hershey; Sourish Chaudhuri; Daniel P. W. Ellis; Jort F. Gemmeke; Aren Jansen; R. Channing Moore; Manoj Plakal; Devin Platt; Rif A. Saurous; Bryan Seybold; Malcolm Slaney; Ron J. Weiss; Kevin W. Wilson

Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task.


International Conference on Acoustics, Speech, and Signal Processing | 2017

Trainable frontend for robust and far-field keyword spotting

Pascal Getreuer; Thad Hughes; Richard F. Lyon; Rif A. Saurous

Robust and far-field speech recognition is critical to enable true hands-free communication. In far-field conditions, signals are attenuated due to distance. To improve robustness to loudness variation, we introduce a novel frontend called per-channel energy normalization (PCEN). The key ingredient of PCEN is the use of an automatic gain control based dynamic compression to replace the widely used static (such as log or root) compression. We evaluate PCEN on the keyword spotting task. On our large rerecorded noisy and far-field eval sets, we show that PCEN significantly improves recognition performance. Furthermore, we model PCEN as neural network layers and optimize high-dimensional PCEN parameters jointly with the keyword spotting acoustic model. The trained PCEN frontend demonstrates significant further improvements without increasing model complexity or inference-time cost.
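The abstract above describes PCEN only in words. As a rough illustration, the sketch below follows the formulation commonly given for PCEN: a per-channel first-order smoother M(t, f) = (1 - s) M(t-1, f) + s E(t, f), followed by PCEN(t, f) = (E(t, f) / (eps + M(t, f))^alpha + delta)^r - delta^r. The function name `pcen`, the list-of-lists input layout, the smoother initialisation, and the default parameter values are assumptions made for this example and are not taken from the abstract; the trainable, jointly optimized version mentioned above is not shown.

```python
def pcen(energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Apply per-channel energy normalization to a [time][channel] grid.

    energy: list of frames, each a list of non-negative filterbank energies.
    Returns a list of frames with the same shape.
    All defaults are illustrative assumptions, not values from the abstract.
    """
    if not energy:
        return []
    # Assumption: start the smoother from the first frame.
    smoothed = list(energy[0])
    out = []
    for frame in energy:
        # Per-channel first-order IIR smoother:
        # M[t] = (1 - s) * M[t-1] + s * E[t]
        smoothed = [(1.0 - s) * m + s * e for m, e in zip(smoothed, frame)]
        # Automatic gain control (divide by the smoothed energy), then
        # root compression with offset delta instead of a static log.
        out.append([
            (e / (eps + m) ** alpha + delta) ** r - delta ** r
            for e, m in zip(frame, smoothed)
        ])
    return out


if __name__ == "__main__":
    quiet = pcen([[1.0, 1.0]] * 50)      # constant low-energy input
    loud = pcen([[100.0, 100.0]] * 50)   # same input, 100 times louder
    print(quiet[-1][0], loud[-1][0])
```

With these assumed settings, the two printed values are close to each other even though the inputs differ by a factor of 100, whereas a static log compression would shift the output by log(100). This is the loudness robustness the abstract refers to.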


Conference of the International Speech Communication Association | 2017

Tacotron: Towards End-to-End Speech Synthesis

Rj Skerry-Ryan; Daisy Stanton; Yonghui Wu; Ron J. Weiss; Navdeep Jaitly; Zongheng Yang; Ying Xiao; Zhifeng Chen; Samy Bengio; Quoc V. Le; Yannis Agiomyrgiannakis; Robert A. J. Clark; Rif A. Saurous


International Conference on Acoustics, Speech, and Signal Processing | 2018

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Jonathan Shen; Ruoming Pang; Ron Weiss; Mike Schuster; Navdeep Jaitly; Zongheng Yang; Zhifeng Chen; Yu Zhang; Rj Skerry-Ryan; Rif A. Saurous; Yannis Agiomyrgiannakis; Yonghui Wu


International Conference on Learning Representations | 2017

Deep Probabilistic Programming

Dustin Tran; Matthew D. Hoffman; Rif A. Saurous; Eugene Brevdo; Kevin P. Murphy; David M. Blei


Archive | 2017

Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model

Rj Skerry-Ryan; Daisy Stanton; Yonghui Wu; Ron Weiss; Navdeep Jaitly; Zongheng Yang; Ying Xiao; Zhifeng Chen; Samy Bengio; Quoc V. Le; Yannis Agiomyrgiannakis; Rob Clark; Rif A. Saurous


International Conference on Machine Learning | 2018

Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

Daisy Stanton; Yu Zhang; RJ Skerry-Ryan; Eric Battenberg; Joel Shor; Ying Xiao; Ye Jia; Fei Ren; Rif A. Saurous


International Conference on Artificial Intelligence and Statistics | 2016

Scalable Learning of Non-Decomposable Objectives

Elad Eban; Mariano Schain; Alan Mackey; Ariel Gordon; Rif A. Saurous


International Conference on Machine Learning | 2018

Fixing a Broken ELBO

Alexander A. Alemi; Ben Poole; Ian Fischer; Joshua V. Dillon; Rif A. Saurous; Kevin P. Murphy


International Conference on Machine Learning | 2018

Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron

Rj Skerry-Ryan; Eric Battenberg; Ying Xiao; Daisy Stanton; Joel Shor; Ron J. Weiss; Rob Clark; Rif A. Saurous

Collaboration


Dive into Rif A. Saurous's collaborations.

Top Co-Authors


Ron Weiss

Massachusetts Institute of Technology
