Publication


Featured research published by Ananya Misra.


IEEE Transactions on Audio, Speech, and Language Processing | 2017

Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition

Tara N. Sainath; Ron J. Weiss; Kevin W. Wilson; Bo Li; Arun Narayanan; Ehsan Variani; Michiel Bacchiani; Izhak Shafran; Andrew W. Senior; Kean K. Chin; Ananya Misra; Chanwoo Kim

Multichannel automatic speech recognition (ASR) systems commonly separate speech enhancement, including localization, beamforming, and postfiltering, from acoustic modeling. In this paper, we perform multichannel enhancement jointly with acoustic modeling in a deep neural network framework. Inspired by beamforming, which leverages differences in the fine time structure of the signal at different microphones to filter energy arriving from different directions, we explore modeling the raw time-domain waveform directly. We introduce a neural network architecture, which performs multichannel filtering in the first layer of the network, and show that this network learns to be robust to varying target speaker direction of arrival, performing as well as a model that is given oracle knowledge of the true target speaker direction. Next, we show how performance can be improved by factoring the first layer to separate the multichannel spatial filtering operation from a single channel filterbank which computes a frequency decomposition. We also introduce an adaptive variant, which updates the spatial filter coefficients at each time frame based on the previous inputs. Finally, we demonstrate that these approaches can be implemented more efficiently in the frequency domain. Overall, we find that such multichannel neural networks give a relative word error rate improvement of more than 5% compared to a traditional beamforming-based multichannel ASR system and more than 10% compared to a single channel waveform model.
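The first-layer idea in this abstract lends itself to a compact illustration. The following is a minimal NumPy sketch of learned multichannel filter-and-sum on raw waveforms, not the authors' implementation; the function name multichannel_filter_layer and all shapes and values are illustrative assumptions.

import numpy as np

def multichannel_filter_layer(waveforms, filters):
    # waveforms: (num_channels, num_samples) raw time-domain signals
    # filters:   (num_filters, num_channels, filter_len) learned taps
    # returns:   (num_filters, num_samples - filter_len + 1) feature maps
    num_filters, num_channels, filter_len = filters.shape
    out = np.zeros((num_filters, waveforms.shape[1] - filter_len + 1))
    for f in range(num_filters):
        for c in range(num_channels):
            # Convolving with a time-reversed kernel gives the cross-correlation
            # a conv layer computes; summing over channels makes this a
            # filter-and-sum beamformer with learned taps.
            out[f] += np.convolve(waveforms[c], filters[f, c][::-1], mode="valid")
    return out

# Example: 2 microphones, 1 s at 16 kHz, 128 filters of 25 ms (400 taps).
x = np.random.randn(2, 16000)
w = np.random.randn(128, 2, 400) * 0.01
features = multichannel_filter_layer(x, w)  # shape (128, 15601)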


International Conference on Acoustics, Speech, and Signal Processing | 2012

Mobile music modeling, analysis and recognition

Pavel Golik; Boulos Harb; Ananya Misra; Michael Riley; Alex Rudnick; Eugene Weinstein

We present an analysis of music modeling and recognition techniques in the context of mobile music matching, substantially improving on the techniques presented in [1]. We accomplish this by adapting the features specifically to this task and by introducing new modeling techniques that make it possible to use a corpus of noisy, channel-distorted data to improve mobile music recognition quality. We report the results of an extensive empirical investigation of the system's robustness under realistic channel effects and distortions. We show an improvement in recognition accuracy from explicit duration modeling of music phonemes and from integrating the expected noise environment into the training process. Finally, we propose the use of frame-to-phoneme alignment for high-level structural analysis of polyphonic music.
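One step described above, integrating the expected noise environment into training, amounts to mixing clean training audio with noise at controlled signal-to-noise ratios. The sketch below illustrates that mixing step only; mix_at_snr and its arguments are hypothetical names, not the paper's code.

import numpy as np

def mix_at_snr(clean, noise, snr_db):
    # Tile or trim the noise to the length of the clean signal.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[:len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    # Scale the noise so that 10 * log10(clean_power / scaled_noise_power)
    # equals the requested SNR in dB.
    scale = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

# Example: corrupt one training clip at 10 dB SNR before feature extraction.
clip = np.random.randn(16000)
noise = np.random.randn(48000)
noisy_clip = mix_at_snr(clip, noise, snr_db=10.0)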


New Era for Robust Speech Recognition: Exploiting Deep Learning | 2017

Raw Multichannel Processing Using Deep Neural Networks

Tara N. Sainath; Ron J. Weiss; Kevin W. Wilson; Arun Narayanan; Michiel Bacchiani; Bo Li; Ehsan Variani; Izhak Shafran; Andrew W. Senior; Kean K. Chin; Ananya Misra; Chanwoo Kim

Multichannel automatic speech recognition (ASR) systems commonly separate speech enhancement, including localization, beamforming, and postfiltering, from acoustic modeling. In this chapter, we perform multichannel enhancement jointly with acoustic modeling in a deep-neural-network framework. Inspired by beamforming, which leverages differences in the fine time structure of the signal at different microphones to filter energy arriving from different directions, we explore modeling the raw time-domain waveform directly. We introduce a neural network architecture which performs multichannel filtering in the first layer of the network and show that this network learns to be robust to varying target speaker direction of arrival, performing as well as a model that is given oracle knowledge of the true target speaker direction. Next, we show how performance can be improved by factoring the first layer to separate the multichannel spatial filtering operation from a single-channel filterbank which computes a frequency decomposition. We also introduce an adaptive variant, which updates the spatial filter coefficients at each time frame based on the previous inputs. Finally, we demonstrate that these approaches can be implemented more efficiently in the frequency domain. Overall, we find that such multichannel neural networks give a relative word error rate improvement of more than 5% compared to a traditional beamforming-based multichannel ASR system and more than 10% compared to a single-channel waveform model.
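Complementing the sketch after the journal version above, the following is a minimal NumPy sketch of the factored first layer this chapter describes: a few short spatial filters (learned look directions) followed by a shared single-channel filterbank. The name factored_first_layer and all shapes are assumptions for illustration.

import numpy as np

def factored_first_layer(waveforms, spatial_filters, filterbank):
    # waveforms:       (num_channels, num_samples) raw signals
    # spatial_filters: (num_looks, num_channels, spatial_len) short spatial taps
    # filterbank:      (num_bands, bank_len) shared frequency decomposition
    # Stage 1: filter-and-sum toward each learned look direction.
    looks = []
    for p in range(spatial_filters.shape[0]):
        y = sum(np.convolve(waveforms[c], spatial_filters[p, c][::-1], mode="valid")
                for c in range(waveforms.shape[0]))
        looks.append(y)
    # Stage 2: the same single-channel filterbank applied to every look
    # direction, so spatial filtering and frequency decomposition are factored.
    return np.stack([
        np.stack([np.convolve(y, band[::-1], mode="valid") for band in filterbank])
        for y in looks
    ])  # shape (num_looks, num_bands, out_len)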


Conference of the International Speech Communication Association | 2012

Speech/Nonspeech Segmentation in Web Videos

Ananya Misra


Conference of the International Speech Communication Association | 2017

Acoustic Modeling for Google Home

Bo Li; Tara N. Sainath; Arun Narayanan; Joe Caroselli; Michiel Bacchiani; Ananya Misra; Izhak Shafran; Hasim Sak; Golan Pundak; Kean K. Chin; Khe Chai Sim; Ron J. Weiss; Kevin W. Wilson; Ehsan Variani; Chanwoo Kim; Olivier Siohan; Mitchel Weintraub; Erik McDermott; Richard C. Rose; Matt Shannon


Conference of the International Speech Communication Association | 2015

Time-frequency masking for large scale robust speech recognition

Yuxuan Wang; Ananya Misra; Kean K. Chin


Conference of the International Speech Communication Association | 2015

Large-scale, sequence-discriminative, joint adaptive training for masking-based robust ASR

Arun Narayanan; Ananya Misra; Kean K. Chin


Archive | 2012

Client/server-based statistical phrase distribution display and associated text entry technique

Jeffrey S. Sorensen; Megan Teresa Ghastin; Ananya Misra; Ravindran Rajakumar; Julia Anna Maria Neidert


International Conference on Acoustics, Speech, and Signal Processing | 2018

Spectral Distortion Model for Training Phase-Sensitive Deep-Neural Networks for Far-Field Speech Recognition

Chanwoo Kim; Tara N. Sainath; Arun Narayanan; Ananya Misra; Rajeev Nongpiur; Michiel Bacchiani


Conference of the International Speech Communication Association | 2018

Domain Adaptation Using Factorized Hidden Layer for Robust Automatic Speech Recognition

Khe Chai Sim; Arun Narayanan; Ananya Misra; Anshuman Tripathi; Golan Pundak; Tara N. Sainath; Parisa Haghani; Bo Li; Michiel Bacchiani

Collaboration


Dive into Ananya Misra's collaborations.

Top Co-Authors

Chanwoo Kim
Carnegie Mellon University

Ehsan Variani
Johns Hopkins University