Publication


Featured research published by Awni Y. Hannun.


Computer Speech & Language | 2017

Building DNN acoustic models for large vocabulary speech recognition

Andrew L. Maas; Peng Qi; Ziang Xie; Awni Y. Hannun; Christopher T. Lengerich; Daniel Jurafsky; Andrew Y. Ng

Deep neural networks (DNNs) are now a central component of nearly all state-of-the-art speech recognition systems. Building neural network acoustic models requires several design decisions, including network architecture, size, and training loss function. This paper offers an empirical investigation of which aspects of DNN acoustic model design are most important for speech recognition system performance. We report DNN classifier performance and final speech recognizer word error rates, and compare DNNs using several metrics to quantify factors influencing differences in task performance. Our first set of experiments uses the standard Switchboard benchmark corpus, which contains approximately 300 hours of conversational telephone speech. We compare standard DNNs to convolutional networks, and present the first experiments using locally connected, untied neural networks for acoustic modeling. We additionally build systems on a corpus of 2,100 hours of training data by combining the Switchboard and Fisher corpora. This larger corpus allows us to more thoroughly examine the performance of large DNN models, with up to ten times more parameters than those typically used in speech recognition systems. Our results suggest that a relatively simple DNN architecture and optimization technique produces strong results. These findings, along with previous work, help establish a set of best practices for building DNN hybrid speech recognition systems with maximum likelihood training. Our experiments in DNN optimization additionally serve as a case study for training DNNs with discriminative loss functions for speech tasks, as well as DNN classifiers more generally.
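
For orientation, the sketch below illustrates the kind of hybrid DNN acoustic model the paper studies: a deep feedforward network mapping a spliced window of spectral frames to senone (tied HMM-state) posteriors, trained with a frame-level cross-entropy loss. This is not the authors' code; the framework (PyTorch) and all layer sizes, the context window, and the senone count are illustrative assumptions.

```python
# A minimal sketch (assumed PyTorch; not the authors' code) of a hybrid
# DNN acoustic model: a deep feedforward network mapping a spliced window
# of spectral frames to senone (tied HMM-state) posteriors, trained with
# frame-level cross-entropy. All dimensions below are illustrative.
import torch
import torch.nn as nn

class DNNAcousticModel(nn.Module):
    def __init__(self, context_frames=11, feat_dim=40,
                 hidden_dim=2048, num_layers=5, num_senones=9000):
        super().__init__()
        layers, in_dim = [], context_frames * feat_dim  # spliced input
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
            in_dim = hidden_dim
        layers.append(nn.Linear(in_dim, num_senones))   # senone logits
        self.net = nn.Sequential(*layers)

    def forward(self, x):  # x: (batch, context_frames * feat_dim)
        return self.net(x)

model = DNNAcousticModel()
criterion = nn.CrossEntropyLoss()        # frame-level maximum likelihood
frames = torch.randn(32, 11 * 40)        # a batch of spliced feature frames
targets = torch.randint(0, 9000, (32,))  # forced-alignment senone labels
loss = criterion(model(frames), targets)
loss.backward()
```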


Conference of the International Speech Communication Association | 2016

Learning Multiscale Features Directly from Waveforms

Zhenyao Zhu; Jesse Engel; Awni Y. Hannun

Deep learning has dramatically improved the performance of speech recognition systems through learning hierarchies of features optimized for the task at hand. However, true end-to-end learning, where features are learned directly from waveforms, has only recently reached the performance of hand-tailored representations based on the Fourier transform. In this paper, we detail an approach that uses convolutional filters to push past the inherent tradeoff between temporal and frequency resolution that exists in spectral representations. At increased computational cost, we show that increasing temporal resolution via reduced stride and increasing frequency resolution via additional filters delivers significant performance improvements. Further, we find more efficient representations by simultaneously learning at multiple scales, leading to an overall decrease in word error rate on a difficult internal speech test set by 20.7% relative to networks with the same number of parameters trained on spectrograms.
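
To make the multiscale idea concrete, here is a hedged sketch (in PyTorch, not the authors' implementation) of a frontend that applies two parallel 1-D convolution banks directly to the raw waveform: one with short filters and a small stride for fine temporal resolution, and one with long filters and a large stride for finer frequency resolution, with the two streams pooled to a common frame rate and stacked. All filter counts, kernel widths, and strides are illustrative assumptions.

```python
# A hedged sketch (assumed PyTorch; not the authors' implementation) of a
# multiscale convolutional frontend applied directly to raw waveforms.
# Two parallel 1-D convolution banks trade temporal resolution (stride)
# against frequency resolution (filter length); their outputs are pooled
# to a common frame rate and stacked. All hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiscaleFrontend(nn.Module):
    def __init__(self):
        super().__init__()
        # Fine temporal scale: short filters, small stride.
        self.fine = nn.Conv1d(1, 64, kernel_size=80, stride=40)
        # Coarse temporal scale: long filters, large stride
        # (finer frequency resolution per filter).
        self.coarse = nn.Conv1d(1, 64, kernel_size=400, stride=160)

    def forward(self, wav):                    # wav: (batch, 1, samples)
        fine = F.relu(self.fine(wav))          # (batch, 64, T_fine)
        coarse = F.relu(self.coarse(wav))      # (batch, 64, T_coarse)
        # Pool the fine stream down to the coarse frame rate, then
        # concatenate the two scales along the channel axis.
        fine = F.adaptive_avg_pool1d(fine, coarse.shape[-1])
        return torch.cat([fine, coarse], dim=1)  # (batch, 128, T_coarse)

frontend = MultiscaleFrontend()
features = frontend(torch.randn(8, 1, 16000))  # one second of 16 kHz audio
print(features.shape)                          # torch.Size([8, 128, 98])
```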


arXiv: Computation and Language | 2014

Deep Speech: Scaling up end-to-end speech recognition

Awni Y. Hannun; Carl Case; Jared Casper; Bryan Catanzaro; Greg Diamos; Erich Elsen; Ryan Prenger; Sanjeev Satheesh; Shubho Sengupta; Adam Coates; Andrew Y. Ng


International Conference on Machine Learning | 2016

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

Dario Amodei; Sundaram Ananthanarayanan; Rishita Anubhai; Jingliang Bai; Eric Battenberg; Carl Case; Jared Casper; Bryan Catanzaro; Qiang Cheng; Guoliang Chen; Jie Chen; Jingdong Chen; Zhijie Chen; Mike Chrzanowski; Adam Coates; Greg Diamos; Ke Ding; Niandong Du; Erich Elsen; Jesse Engel; Weiwei Fang; Linxi Fan; Christopher Fougner; Liang Gao; Caixia Gong; Awni Y. Hannun; Tony Han; Lappi Vaino Johannes; Bing Jiang; Cai Ju


arXiv: Computation and Language | 2014

First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs

Andrew L. Maas; Awni Y. Hannun; Daniel Jurafsky; Andrew Y. Ng


arXiv: Computer Vision and Pattern Recognition | 2017

Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks

Pranav Rajpurkar; Awni Y. Hannun; Masoumeh Haghpanahi; Codie Bourn; Andrew Y. Ng


Archive | 2014

Increasing Deep Neural Network Acoustic Model Size for Large Vocabulary Continuous Speech Recognition

Andrew L. Maas; Awni Y. Hannun; Christopher T. Lengerich; Peng Qi; Daniel Jurafsky; Andrew Y. Ng


International Conference on Machine Learning | 2016

Persistent RNNs: stashing recurrent weights on-chip

Gregory F. Diamos; Shubho Sengupta; Bryan Catanzaro; Mike Chrzanowski; Adam Coates; Erich Elsen; Jesse Engel; Awni Y. Hannun; Sanjeev Satheesh


Archive | 2015

SYSTEMS AND METHODS FOR SPEECH TRANSCRIPTION

Awni Y. Hannun; Carl Case; Jared Casper; Bryan Catanzaro; Gregory F. Diamos; Erich Elsen; Ryan Prenger; Sanjeev Satheesh; Shubhabrata Sengupta; Adam Coates; Andrew Y. Ng


Archive | 2017

END-TO-END SPEECH RECOGNITION

Bryan Catanzaro; Jingdong Chen; Mike Chrzanowski; Erich Elsen; Jesse Engel; Christopher Fougner; Xu Han; Awni Y. Hannun; Ryan Prenger; Sanjeev Satheesh; Shubhabrata Sengupta; Dani Yogatama; Chong Wang; Jun Zhan; Zhenyao Zhu; Dario Amodei
