Publication


Featured research published by Mike Plumpe.


international conference on spoken language processing | 1996

Whistler: a trainable text-to-speech system

Xuedong Huang; Alex Acero; Jim Adcock; Hsiao-Wuen Hon; John Goldsmith; Jingsong Liu; Mike Plumpe

We introduce Whistler, a trainable text-to-speech (TTS) system that automatically learns the model parameters from a corpus. Both prosody parameters and concatenative speech units are derived through probabilistic learning methods that have been used successfully for speech recognition. Whistler can produce synthetic speech that sounds very natural and resembles the acoustic and prosodic characteristics of the original speaker. The underlying technologies used in Whistler can significantly facilitate the process of creating generic TTS systems for a new language, a new voice, or a new speech style.


international conference on acoustics, speech, and signal processing | 1998

Automatic generation of synthesis units for trainable text-to-speech systems

Hsiao-Wuen Hon; Alex Acero; Xuedong Huang; Jingsong Liu; Mike Plumpe

The Whistler text-to-speech engine was designed so that we can automatically construct the model parameters from training data. This paper describes in detail the design issues of constructing the synthesis unit inventory automatically from speech databases. The automatic process includes (1) determining the scalable synthesis units that can reflect the spectral variations of different allophones; (2) segmenting the recorded sentences into phonetic segments; and (3) selecting good instances of each synthesis unit to generate the best synthesized sentence at run time. These processes are all derived through probabilistic learning methods that are aimed at the same optimization criteria. Through this automatic unit generation, Whistler can automatically produce synthetic speech that sounds very natural and resembles the acoustic characteristics of the original speaker.
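
Step (3) of the abstract above, selecting good instances for each synthesis unit, can be illustrated with a minimal sketch. The abstract does not say how instances are scored, so this example assumes each segmented instance already carries a log-likelihood under its unit's model; the `Instance` type, its field names, and `select_instances` are hypothetical and are not taken from the Whistler system.

```python
from collections import defaultdict
from typing import NamedTuple


class Instance(NamedTuple):
    unit: str              # synthesis unit label (hypothetical naming)
    utterance_id: str      # recorded sentence the segment was cut from
    log_likelihood: float  # ASSUMED score under the unit's model


def select_instances(instances, k=2):
    """Keep the k highest-scoring instances of each synthesis unit."""
    by_unit = defaultdict(list)
    for inst in instances:
        by_unit[inst.unit].append(inst)
    return {
        unit: sorted(group, key=lambda i: i.log_likelihood, reverse=True)[:k]
        for unit, group in by_unit.items()
    }


if __name__ == "__main__":
    candidates = [
        Instance("ae", "utt01", -12.5),
        Instance("ae", "utt07", -9.8),
        Instance("ae", "utt12", -15.1),
        Instance("t", "utt03", -4.2),
    ]
    for unit, kept in select_instances(candidates, k=2).items():
        print(unit, [i.utterance_id for i in kept])
```

Keeping several instances per unit, rather than one, leaves the run-time synthesizer a choice among candidates; how that choice is made is outside what the abstract states.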


international conference on acoustics, speech, and signal processing | 1997

Recent improvements on Microsoft's trainable text-to-speech system - Whistler

Xuedong Huang; Alex Acero; Hsiao-Wuen Hon; Yun-Cheng Ju; Jingsong Liu; Scott Meredith; Mike Plumpe

The Whistler text-to-speech engine was designed so that we can automatically construct the model parameters from training data. This paper focuses on the improvements in prosody and acoustic modeling, which are all derived through probabilistic learning methods. Whistler can produce synthetic speech that sounds very natural and resembles the acoustic and prosodic characteristics of the original speaker. The underlying technologies used in Whistler can significantly facilitate the process of creating generic TTS systems for a new language, a new voice, or a new speech style. The Whistler TTS engine supports the Microsoft Speech API and requires less than 3 MB of working memory.


international conference on acoustics, speech, and signal processing | 2001

MiPad: a multimodal interaction prototype

Xuedong Huang; Alex Acero; Ciprian Chelba; Li Deng; Jasha Droppo; Doug Duchene; Joshua T. Goodman; Hsiao-Wuen Hon; Derek Jacoby; Li Jiang; Ricky Loynd; Milind Mahajan; Peter Mau; Scott Meredith; Salman Mughal; Salvado Neto; Mike Plumpe; Kuansan Steury; Gina Venolia; Kuansan Wang; Ye-Yi Wang

Dr. Who is a Microsoft research project aiming at creating a speech-centric multimodal interaction framework, which serves as the foundation for the .NET natural user interface. MiPad is the application prototype that demonstrates compelling user advantages for wireless personal digital assistant (PDA) devices. MiPad fully integrates continuous speech recognition (CSR) and spoken language understanding (SLU) to enable users to accomplish many common tasks using a multimodal interface and wireless technologies. It tries to solve the problem of pecking with tiny styluses or typing on minuscule keyboards in today's PDAs. Unlike a cellular phone, MiPad avoids speech-only interaction. It incorporates a built-in microphone that activates whenever a field is selected. As a user taps the screen or uses a built-in roller to navigate, the tapping action narrows the number of possible instructions for spoken word understanding. MiPad currently runs on a Windows CE Pocket PC with a Windows 2000 machine where speech recognition is performed. The Dr. Who CSR engine uses a unified CFG and n-gram language model. The Dr. Who SLU engine is based on a robust chart parser and a plan-based dialog manager. The paper discusses MiPad's design, implementation work in progress, and a preliminary user study in comparison to the existing pen-based PDA interface.
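
The abstract's remark that tapping a field narrows the set of possible spoken instructions can be illustrated with a small sketch. Everything here is an assumption made for illustration: the field names, the per-field vocabularies, the `(word, score)` hypothesis format, and the function names are hypothetical and do not describe the actual MiPad or Dr. Who implementation.

```python
# Hypothetical per-field vocabularies; a real system would use grammars.
FIELD_VOCABULARY = {
    "to": {"alex", "li", "mike"},
    "date": {"today", "tomorrow", "monday"},
}


def narrow_hypotheses(selected_field, hypotheses):
    """Drop recognizer hypotheses that the selected field cannot accept.

    `hypotheses` is a list of (word, score) pairs; higher score is better.
    With no known field selected, nothing is filtered out.
    """
    allowed = FIELD_VOCABULARY.get(selected_field)
    if allowed is None:
        return list(hypotheses)
    return [(word, score) for word, score in hypotheses if word in allowed]


def recognize(selected_field, hypotheses):
    """Return the best-scoring word that survives narrowing, or None."""
    candidates = narrow_hypotheses(selected_field, hypotheses)
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    heard = [("mike", 0.6), ("monday", 0.5)]
    print(recognize("date", heard))  # the date field rules out "mike"
    print(recognize(None, heard))    # no field selected, no narrowing
```

The point of the sketch is only that the selected field acts as context: an acoustically stronger hypothesis can lose to a weaker one that fits the field.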


conference of the international speech communication association | 2000

Large-Vocabulary Speech Recognition under Adverse Acoustic Environments

Li Deng; Alex Acero; Mike Plumpe; Xuedong Huang


conference of the international speech communication association | 2000

MiPad: a next generation PDA prototype.

Xuedong Huang; Alex Acero; Ciprian Chelba; Li Deng; Doug Duchene; Joshua T. Goodman; Hsiao-Wuen Hon; Derek Jacoby; Li Jiang; Ricky Loynd; Milind Mahajan; Peter Mau; Scott Meredith; Salman Mughal; Salvado Neto; Mike Plumpe; Kuansan Wang; Ye-Yi Wang


conference of the international speech communication association | 1998

HMM-based smoothing for concatenative speech synthesis.

Mike Plumpe; Alex Acero; Hsiao-Wuen Hon; Xuedong Huang


conference of the international speech communication association | 1998

Speech, silence, music and noise classification of TV broadcast material.

Ara Samouelian; Jordi Robert-Ribes; Mike Plumpe


SSW | 1998

Which is more important in a concatenative text to speech system - pitch, duration, or spectral discontinuity?

Mike Plumpe; Scott Meredith


Archive | 2000

Large Vocabulary Continuous Speech Recognition under Adverse Conditions

Li Deng; Alex Acero; Mike Plumpe; Xuedong Huang
