Publication


Featured research published by Puming Zhan.


International Conference on Acoustics, Speech, and Signal Processing | 1997

Speaker normalization based on frequency warping

Puming Zhan; Martin Westphal

The speaker dependence of a speech recognition system stems from the speaker dependence of the speech features. Variation in vocal tract shape is the major source of inter-speaker variation in these features, although other sources also contribute. In this paper, we address speaker normalization, which aims to normalize speakers' vocal tract length based on frequency warping (FWP). The FWP is implemented in the front-end preprocessing of our speech recognition system. We investigate formant-based and ML-based FWP in linear and nonlinear warping modes, and compare them in detail. All experimental results are based on our JANUS3 large vocabulary continuous speech recognition system and the Spanish Spontaneous Scheduling Task database (SSST).
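As a rough illustration of what frequency warping with a per-speaker factor can look like, the sketch below shows a linear warp, a piecewise-linear warp that keeps the band edge fixed, and a grid search over candidate factors. The abstract does not give the paper's actual warping functions or selection criterion, so every name here (`warp_linear`, `warp_piecewise_linear`, `pick_warp_factor`, `breakpoint_ratio`) and the scoring callback are assumptions made for illustration only.

```python
def warp_linear(freq_hz, alpha):
    """Linear warping: scale every frequency by the factor alpha."""
    return alpha * freq_hz


def warp_piecewise_linear(freq_hz, alpha, f_max_hz, breakpoint_ratio=0.8):
    """Scale by alpha below a breakpoint, then follow a straight line to
    (f_max_hz, f_max_hz) so the warped axis still ends at the band edge.

    breakpoint_ratio is a hypothetical parameter, not taken from the paper.
    """
    f0 = breakpoint_ratio * f_max_hz
    if alpha * f0 >= f_max_hz:
        raise ValueError("alpha is too large for this breakpoint")
    if freq_hz <= f0:
        return alpha * freq_hz
    slope = (f_max_hz - alpha * f0) / (f_max_hz - f0)
    return alpha * f0 + slope * (freq_hz - f0)


def pick_warp_factor(candidates, score):
    """Grid search: return the candidate alpha with the highest score.

    score is a caller-supplied function; in an ML-style setup it would be
    the likelihood of the speaker's warped features under the acoustic
    model (assumed here, not specified by the abstract).
    """
    return max(candidates, key=score)
```

In use, one would evaluate `pick_warp_factor` over a small grid such as 0.88 to 1.12 for each speaker and apply the chosen warp to the filterbank frequencies before feature extraction.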


International Conference on Acoustics, Speech, and Signal Processing | 1997

Janus-III: speech-to-speech translation in multiple languages

Alon Lavie; Alex Waibel; Lori S. Levin; Michael Finke; Donna Gates; Marsal Gavaldà; Torsten Zeppenfeld; Puming Zhan

This paper describes JANUS-III, our most recent version of the JANUS speech-to-speech translation system. We present an overview of the system and focus on how the system design facilitates speech translation between multiple languages and allows for easy adaptation to new source and target languages. We also describe our methodology for evaluating end-to-end system performance with a variety of source and target languages. For system development and evaluation, we have experimented with both push-to-talk and cross-talk recording conditions. To date, our system has achieved performance levels of over 80% acceptable translations on transcribed input, and over 70% acceptable translations on speech input recognized with a 75-90% word accuracy. Our current major research is concentrated on enhancing the capabilities of the system to deal with input in broad and general domains.


European Conference on Artificial Intelligence | 1996

End-to-End Evaluation in JANUS: A Speech-to-speech Translation System

Donna Gates; Alon Lavie; Lori S. Levin; Alex Waibel; Marsal Gavaldà; Laura Mayfield; Monika Woszczyna; Puming Zhan

JANUS is a multi-lingual speech-to-speech translation system designed to facilitate communication between two parties engaged in a spontaneous conversation in a limited domain. In this paper we describe our methodology for evaluating translation performance. Our current focus is on end-to-end evaluations: the evaluation of the translation capabilities of the system as a whole. The main goal of our end-to-end evaluation procedure is to determine translation accuracy on a test set of previously unseen dialogues. Other goals include evaluating the effectiveness of the system in conveying domain-relevant information and in detecting and dealing appropriately with utterances (or portions of utterances) that are out-of-domain. End-to-end evaluations are performed in order to verify the general coverage of our knowledge sources, to guide our development efforts, and to track our improvement over time. We discuss our evaluation procedures, the criteria used for assigning scores to translations produced by the system, and the tools developed for performing this task. Recent Spanish-to-English performance evaluation results are presented as an example.


International Conference on Acoustics, Speech, and Signal Processing | 1996

JANUS-II: translation of spontaneous conversational speech

Alex Waibel; Michael Finke; Donna Gates; Marsal Gavaldà; Thomas Kemp; Alon Lavie; Lori S. Levin; Martin Maier; Laura Mayfield; Arthur E. McNair; Ivica Rogina; Kaori Shima; Tilo Sloboda; Monika Woszczyna; Torsten Zeppenfeld; Puming Zhan

JANUS-II is a research system to design and test components of speech-to-speech translation systems as well as a research prototype for such a system. We focus on two aspects of the system: (1) the new features of the speech recognition component JANUS-SR, and (2) the end-to-end performance of JANUS-II, including a comparison of two machine translation strategies used for JANUS-MT (PHOENIX and GLR*).


International Conference on Spoken Language Processing | 1996

Translation of conversational speech with JANUS-II

Alon Lavie; Alex Waibel; Lori S. Levin; Donna Gates; Marsal Gavaldà; Torsten Zeppenfeld; Puming Zhan; Oren Glickman

We investigate the possibility of translating continuous spoken conversations in a cross-talk environment. This is a task known to be difficult for human translators due to several factors. It is characterized by rapid and even overlapping turn taking, a high degree of coarticulation, and fragmentary language. We describe experiments using both push-to-talk and cross-talk recording conditions. Our results indicate that conversational speech recognition and translation is possible, even in a free cross-talk environment. To date, our system has achieved performance levels of over 80% acceptable translations on transcribed input, and over 70% acceptable translations on speech input recognized with a 70-80% word accuracy. The system's performance on spontaneous conversations recorded in a cross-talk environment is shown to be as good as, and even slightly superior to, that in the simpler and easier push-to-talk scenario.


International Conference on Spoken Language Processing | 1996

JANUS-II: towards spontaneous Spanish speech recognition

Puming Zhan; Klaus Ries; Marsal Gavaldà; Donna Gates; Alon Lavie; Alex Waibel

JANUS-II is a research system for investigating various issues in speech-to-speech translation and has been implemented for translation in many languages. In this paper, we address the Spanish speech recognition part of JANUS-II. First, we report the bootstrapping and optimization of the recognition system. Then we investigate the difference between push-to-talk and cross-talk dialogs, which are two different kinds of data in our database. We give a detailed noise analysis for the push-to-talk and cross-talk dialogs and present some recognition results for comparison. We have observed that the cross-talk dialogs are harder than the push-to-talk dialogs for speech recognition because they are noisier. Currently, the error rate of our Spanish recognizer is 27% for the push-to-talk test set and 32% for the cross-talk test set.
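For readers unfamiliar with how a recognizer error rate of this kind is usually obtained, the sketch below computes a word error rate as the word-level edit distance between a reference transcript and the recognizer output, divided by the reference length. The abstract does not state the exact scoring procedure used, so this is the standard definition offered as an assumption, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Calling `word_error_rate(reference_text, recognized_text)` on each utterance and aggregating the edit counts over a test set gives a figure comparable in kind to the percentages quoted above; word accuracy is commonly reported as one minus this value.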


Archive | 1997

Vocal Tract Length Normalization for Large Vocabulary Continuous Speech Recognition

Puming Zhan; Alex Waibel


Conference of the International Speech Communication Association | 1997

Speaker normalization and speaker adaptation - a combination for conversational speech recognition

Puming Zhan; Martin Westphal; Michael Finke; Alex Waibel


Archive | 1996

Switchboard April 1996 Evaluation Report

Michael Finke; Torsten Zeppenfeld; Michael R. Maier; Betty Mayfield; Klaus Ries; Puming Zhan; John D. Lafferty; Alex Waibel


Spoken Language Translation | 1997

Expanding the Domain of a Multi-lingual Speech-to-Speech Translation System

Alon Lavie; Lori S. Levin; Puming Zhan; Maite Taboada; Donna Gates; Mirella Lapata; Cortis Clark; Matthew Broadhead; Alex Waibel

Collaboration

Top Co-Authors

Alex Waibel, Karlsruhe Institute of Technology

Alon Lavie, Carnegie Mellon University

Donna Gates, Carnegie Mellon University

Lori S. Levin, Carnegie Mellon University

Marsal Gavaldà, Carnegie Mellon University

Michael Finke, Carnegie Mellon University

Klaus Ries, Carnegie Mellon University

Laura Mayfield, Carnegie Mellon University

Martin Westphal, Carnegie Mellon University