
Publications

Featured research published by Mike Schuster.


International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2013

Statistical parametric speech synthesis using deep neural networks

Heiga Zen; Andrew W. Senior; Mike Schuster

Conventional approaches to statistical parametric speech synthesis typically use decision tree-clustered context-dependent hidden Markov models (HMMs) to represent probability densities of speech parameters given texts. Speech parameters are generated from the probability densities to maximize their output probabilities, and then a speech waveform is reconstructed from the generated parameters. This approach is reasonably effective but has a couple of limitations, e.g., decision trees are inefficient at modeling complex context dependencies. This paper examines an alternative scheme that is based on a deep neural network (DNN). The relationship between input texts and their acoustic realizations is modeled by a DNN. The use of the DNN can address some limitations of the conventional approach. Experimental results show that the DNN-based systems outperformed the HMM-based systems with similar numbers of parameters.
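The core mapping the abstract describes, from a per-frame linguistic feature vector to a vector of acoustic parameters, can be sketched as a plain feed-forward network. The layer sizes and random weights below are invented for illustration; a real system would train them on aligned text/speech data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 300 linguistic features in, 40 acoustic
# parameters (e.g. spectral + excitation features) out, two hidden layers.
D_IN, D_HID, D_OUT = 300, 256, 40

# Randomly initialized weights stand in for trained parameters.
W1 = rng.standard_normal((D_IN, D_HID)) * 0.01
W2 = rng.standard_normal((D_HID, D_HID)) * 0.01
W3 = rng.standard_normal((D_HID, D_OUT)) * 0.01

def dnn_acoustic_model(x):
    """Map one frame's linguistic feature vector to acoustic parameters."""
    h1 = np.tanh(x @ W1)      # hidden layer 1
    h2 = np.tanh(h1 @ W2)     # hidden layer 2
    return h2 @ W3            # linear output layer (regression)

frame_features = rng.standard_normal(D_IN)
acoustic_params = dnn_acoustic_model(frame_features)
print(acoustic_params.shape)  # (40,)
```

Unlike a decision tree, which routes each context to exactly one leaf, every weight here contributes to every context, which is one way to model the complex context dependencies the abstract mentions.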


IEEE Signal Processing Magazine | 2015

Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends

Zhen-Hua Ling; Shiyin Kang; Heiga Zen; Andrew W. Senior; Mike Schuster; Xiaojun Qian; Helen M. Meng; Li Deng

Hidden Markov models (HMMs) and Gaussian mixture models (GMMs) are the two most common types of acoustic models used in statistical parametric approaches for generating low-level speech waveforms from high-level symbolic inputs via intermediate acoustic feature sequences. However, these models have their limitations in representing complex, nonlinear relationships between the speech generation inputs and the acoustic features. Inspired by the intrinsically hierarchical process of human speech production and by the successful application of deep neural networks (DNNs) to automatic speech recognition (ASR), deep learning techniques have also been applied successfully to speech generation, as reported in recent literature. This article systematically reviews these emerging speech generation approaches, with the dual goal of helping readers gain a better understanding of the existing techniques as well as stimulating new work in the burgeoning area of deep learning for parametric speech generation.


International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2008

Deploying GOOG-411: Early lessons in data, measurement, and testing

Michiel Bacchiani; Francoise Beaufays; Johan Schalkwyk; Mike Schuster; Brian Strope

We describe our early experience building and optimizing GOOG-411, a fully automated, voice-enabled, business finder. We show how taking an iterative approach to system development allows us to optimize the various components of the system, thereby progressively improving user-facing metrics. We show the contributions of different data sources to recognition accuracy. For business listing language models, we see a nearly linear performance increase with the logarithm of the amount of training data. To date, we have improved our correct accept rate by 25% absolute, and increased our transfer rate by 35% absolute.
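The reported near-linear relationship between accuracy and the logarithm of training-data size can be checked with a simple log-linear fit. The numbers below are invented for illustration, not the paper's measurements.

```python
import numpy as np

# Hypothetical measurements: business-listing LM training-set size vs.
# recognition accuracy (illustrative numbers, not the paper's data).
data_sizes = np.array([1e4, 1e5, 1e6, 1e7])
accuracy = np.array([0.55, 0.63, 0.71, 0.79])

# Fit accuracy = slope * log10(size) + intercept.
slope, intercept = np.polyfit(np.log10(data_sizes), accuracy, 1)
print(slope)  # accuracy gained per decade of training data
```

A constant gain per decade of data is exactly the "linear in the logarithm" behavior the abstract describes: each tenfold increase in training data buys roughly the same accuracy improvement.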


International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2012

Japanese and Korean voice search

Mike Schuster; Kaisuke Nakajima

This paper describes challenges and solutions for building a successful voice search system as applied to Japanese and Korean at Google. We describe the techniques used to deal with an infinite vocabulary, how modeling completely in the written domain for the language model and dictionary can avoid some system complexity, and how we built dictionaries, language models and acoustic models in this framework. We show how to deal with the difficulty of scoring results for multiple-script languages, where ambiguities arise. The development of voice search for these languages led to a significant simplification of the original process to build a system for any new language, which in part became our default process for internationalization of voice search.


Pacific Rim International Conference on Artificial Intelligence (PRICAI) | 2010

Speech recognition for mobile devices at Google

Mike Schuster

We briefly describe here some of the content of a talk to be given at the conference.


International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2005

Efficient generation of high-order context dependent weighted finite state transducers for speech recognition

Mike Schuster; Takaaki Hori

This paper describes an algorithm for efficient building of weighted finite state transducers for speech recognition when high-order context-dependent models of order K > 3 (i.e., wider than triphones) with tied states are used. We show how an algorithm to build a part of the needed composed transducers directly from the decision trees, in combination with an improved compilation process, can lead to much faster, simpler and more memory-efficient compilation. In our case, it also resulted in substantially smaller final networks. With the described algorithm, it is simple to use high-order full cross-word models with little overhead directly within a one-pass time-synchronous search, which we test by comparing resulting final network sizes, recognition rates and speed on a large, spontaneous Japanese speech database. Using the proposed algorithm, it is possible to do real-time recognition using full cross-word quinphones with a large acoustic model in about 125 MB of memory at about 9% search error.
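The decision-tree state tying that makes such wide contexts tractable can be illustrated with a toy example: contexts that answer the same tree questions share one state, so the number of distinct states does not grow exponentially in K. The phone classes and questions below are invented for illustration and are far simpler than a real phonetic decision tree.

```python
# Toy sketch of decision-tree state tying for wide-context models.
VOWELS = set("aiueo")

def tied_state(context):
    """Map a K-phone context window (center phone in the middle) to a
    tied state ID by answering a tiny fixed set of questions."""
    mid = len(context) // 2
    center, left = context[mid], context[mid - 1]
    # Two illustrative questions: is the center phone a vowel?
    # Is the immediately preceding phone a vowel?
    return (center in VOWELS, left in VOWELS)

# Two different quinphone (K = 5) contexts that answer the questions the
# same way share one state, regardless of the outer phones:
a = tied_state(("k", "a", "t", "a", "k"))
b = tied_state(("s", "a", "t", "o", "n"))
print(a == b)  # True: both tie to the same state
```

Because the transducer only needs one arc per tied state rather than one per raw K-phone context, the composed network stays compact even for large K.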


Systems and Computers in Japan | 1999

Phoneme boundary estimation using bidirectional recurrent neural networks and its applications

Toshiaki Fukada; Mike Schuster; Yoshinori Sagisaka

This paper describes a phoneme boundary estimation method based on bidirectional recurrent neural networks (BRNNs). Experimental results showed that the proposed method could estimate segment boundaries significantly better than an HMM or a multilayer perceptron-based method. Furthermore, we incorporated the BRNN-based segment boundary estimator into the HMM-based and segment model-based recognition systems. As a result, we confirmed that (1) BRNN outputs were effective for improving the recognition rate and reducing computational time in an HMM-based recognition system and (2) segment lattices obtained by the proposed methods dramatically reduce the computational complexity of segment model-based recognition.
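The BRNN idea, scoring each frame with both past and future context by running one recurrence forward and one backward through the utterance, can be sketched in a few lines of numpy. Dimensions and random weights below are invented; a trained model would learn them from labeled boundary data.

```python
import numpy as np

rng = np.random.default_rng(1)

D_IN, D_HID, T = 13, 32, 50   # hypothetical: 13 features/frame, 50 frames

# Separate weights for the forward and backward recurrences,
# plus an output layer over the concatenated hidden states.
Wx_f = rng.standard_normal((D_IN, D_HID)) * 0.1
Wh_f = rng.standard_normal((D_HID, D_HID)) * 0.1
Wx_b = rng.standard_normal((D_IN, D_HID)) * 0.1
Wh_b = rng.standard_normal((D_HID, D_HID)) * 0.1
W_out = rng.standard_normal((2 * D_HID, 1)) * 0.1

def brnn_boundary_scores(x):
    """x: (T, D_IN) frames -> (T,) per-frame boundary scores in (0, 1)."""
    n = x.shape[0]
    h_f = np.zeros((n, D_HID))
    h_b = np.zeros((n, D_HID))
    h = np.zeros(D_HID)
    for t in range(n):                 # forward pass: sees past context
        h = np.tanh(x[t] @ Wx_f + h @ Wh_f)
        h_f[t] = h
    h = np.zeros(D_HID)
    for t in reversed(range(n)):       # backward pass: sees future context
        h = np.tanh(x[t] @ Wx_b + h @ Wh_b)
        h_b[t] = h
    logits = np.concatenate([h_f, h_b], axis=1) @ W_out
    return 1.0 / (1.0 + np.exp(-logits[:, 0]))   # sigmoid per frame

scores = brnn_boundary_scores(rng.standard_normal((T, D_IN)))
print(scores.shape)  # (50,)
```

Having both recurrences available at every frame is what lets the model use the whole utterance when deciding whether a frame is a phoneme boundary, which a left-to-right HMM cannot do directly.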


IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | 2005

Construction of weighted finite state transducers for very wide context-dependent acoustic models

Mike Schuster; Takaaki Hori

A previous paper by the authors described an algorithm for efficient construction of weighted finite state transducers for speech recognition when high-order context-dependent models of order K > 3 (i.e., wider than triphones) with tied state observation distributions are used, and showed practical application of the algorithm up to K = 5 (quinphones). In this paper we give additional details of the improved implementation and analyze the algorithm's practical runtime requirements and memory footprint for context orders up to K = 13 (±6 phones of context) when building fully cross-word capable WFSTs for large vocabulary speech recognition tasks. We show that for typical systems it is possible to use any practical context order K ≤ 13 without having to fear an exponential explosion of the search space, since the necessary state ID to phone transducer (resembling a phone loop observing all possible K-phone constraints) can be built in a few minutes at most. The paper also gives some implementation details of how we efficiently collect context statistics and build phonetic decision trees for very wide context-dependent acoustic models.


arXiv: Distributed, Parallel, and Cluster Computing | 2015

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

Martín Abadi; Ashish Agarwal; Paul Barham; Eugene Brevdo; Zhifeng Chen; Craig Citro; Gregory S. Corrado; Andy Davis; Jeffrey Dean; Matthieu Devin; Sanjay Ghemawat; Ian J. Goodfellow; Andrew Harp; Geoffrey Irving; Michael Isard; Yangqing Jia; Rafal Jozefowicz; Lukasz Kaiser; Manjunath Kudlur; Josh Levenberg; Dan Mané; Rajat Monga; Sherry Moore; Derek Gordon Murray; Chris Olah; Mike Schuster; Jonathon Shlens; Benoit Steiner; Ilya Sutskever; Kunal Talwar


arXiv: Computation and Language | 2016

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Yonghui Wu; Mike Schuster; Zhifeng Chen; Quoc V. Le; Mohammad Norouzi; Wolfgang Macherey; Maxim Krikun; Yuan Cao; Qin Gao; Klaus Macherey; Jeff Klingner; Apurva Shah; Melvin Johnson; Xiaobing Liu; Łukasz Kaiser; Stephan Gouws; Yoshikiyo Kato; Taku Kudo; Hideto Kazawa; Keith Stevens; George Kurian; Nishant Patil; Wei Wang; Cliff Young; Jason Smith; Jason Riesa; Alex Rudnick; Oriol Vinyals; Greg Corrado; Macduff Hughes

Collaboration

Top co-author: Takaaki Hori (Mitsubishi Electric Research Laboratories).
