Publication


Featured research published by Dagen Wang.


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Robust Speech Rate Estimation for Spontaneous Speech

Dagen Wang; Shrikanth Narayanan

In this paper, we propose a direct method for speech rate estimation from acoustic features, without requiring any automatic speech transcription. We compare various spectral and temporal signal analysis and smoothing strategies to better characterize the underlying syllable structure from which speech rate is derived. The proposed algorithm extends spectral subband correlation methods by including temporal correlation and the use of prominent spectral subbands, improving the signal correlation essential for syllable detection. Furthermore, to address some of the practical robustness issues in previously proposed methods, we introduce novel components into the algorithm, such as the use of pitch confidence for filtering spurious syllable envelope peaks, a magnifying window for tackling smearing between neighboring syllables, and relative peak measure thresholds for pseudo-peak rejection. We also describe an automated approach for learning algorithm parameters from data, finding the optimal settings through Monte Carlo simulations and parameter sensitivity analysis. Final experimental evaluations are conducted on a portion of the Switchboard corpus for which manual phonetic segmentation information and published results for direct comparison are available. The results show a correlation coefficient of 0.745 with respect to the ground truth based on manual segmentation, about a 17% improvement over the current best single estimator and an 11% improvement over the multiestimator evaluated on the same Switchboard database.
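The peak-counting core of such an estimator can be sketched as follows. This is a minimal illustration of the idea of counting prominent envelope peaks with a relative-threshold rejection step; the envelope, sampling rate, and threshold value are illustrative placeholders, not the paper's tuned parameters (which were learned via Monte Carlo simulation).

```python
import numpy as np

def estimate_speech_rate(envelope, sr, rel_thresh=0.3):
    """Estimate syllable rate (syllables/second) from a smoothed
    correlation envelope sampled at sr frames per second."""
    # Strict local maxima of the envelope are syllable candidates.
    peaks = [i for i in range(1, len(envelope) - 1)
             if envelope[i] > envelope[i - 1] and envelope[i] > envelope[i + 1]]
    if not peaks:
        return 0.0
    # Relative peak measure: reject pseudo peaks far below the tallest one.
    max_h = max(envelope[i] for i in peaks)
    kept = [i for i in peaks if envelope[i] >= rel_thresh * max_h]
    return len(kept) / (len(envelope) / sr)
```

With four synthetic syllable bumps in a two-second envelope, the sketch returns 2.0 syllables per second.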


IEEE Transactions on Audio, Speech, and Language Processing | 2007

An Acoustic Measure for Word Prominence in Spontaneous Speech

Dagen Wang; Shrikanth Narayanan

An algorithm for automatic speech prominence detection is reported in this paper. We describe a comparative analysis of various acoustic features for word prominence detection and report results using a spoken dialog corpus with manually assigned prominence labels. The focus is on features, such as spectral intensity and speech rate, that are extracted directly from speech with a correlation-based approach, without requiring explicit linguistic or phonetic knowledge. Additionally, various pitch-based measures are studied with respect to their discriminating ability for prominence detection. A parametric scheme for modeling pitch plateaus is proposed, and this feature alone is found to outperform traditional local pitch statistics. Two sets of experiments explore the usefulness of the acoustic score generated from these features. The first set focuses on the more traditional approach of word prominence detection based on a manually tagged corpus; a 76.8% classification accuracy was achieved on a corpus of role-playing spoken dialogs. Due to the difficulty of manually tagging speech prominence into discrete levels (categories), the second set of experiments evaluates the score indirectly. Specifically, through experiments on the Switchboard corpus, it is shown that the proposed acoustic score can discriminate between content words and function words in a statistically significant way. The relation between speech prominence and content/function words is also explored. Since prominent words tend to be predominantly content words, and since content words can be automatically marked from text-derived part-of-speech (POS) information, it is shown that the proposed acoustic score can be indirectly cross-validated through POS information.


IEEE Automatic Speech Recognition and Understanding Workshop | 2003

Transonics: a speech to speech system for English-Persian interactions

Shrikanth Narayanan; Sankaranarayanan Ananthakrishnan; Robert Belvin; E. Ettelaie; Shadi Ganjavi; Panayiotis G. Georgiou; C. M. Hein; S. Kadambe; Kevin Knight; Daniel Marcu; Howard Neely; Naveen Srinivasamurthy; David R. Traum; Dagen Wang

In this paper, we describe the first phase of development of our speech-to-speech system between English and Modern Persian under the DARPA Babylon program. We give an overview of the various system components: the front-end ASR, the machine translation system, and the speech generation system. Challenges such as the sparseness of available spoken language data, and the solutions employed to maximize the benefit obtained from these limited resources, are examined. Efforts in the creation of the user interface and the underlying dialog management system for mediated communication are described.


International Conference on Acoustics, Speech, and Signal Processing | 2004

A multi-pass linear fold algorithm for sentence boundary detection using prosodic cues

Dagen Wang; Shrikanth Narayanan

We propose a multi-pass linear fold algorithm for sentence boundary detection in spontaneous speech. It uses only prosodic cues and does not rely on segmentation information from a speech recognition decoder. We focus on features based on pitch breaks and pitch durations, study their local and global structural properties, and relate them to sentence boundaries. In the first step, the algorithm, which requires no training, automatically finds a set of candidate pitch breaks by simple curve fitting. In the next step, by exploiting statistical properties of sentence boundaries and disfluencies, the algorithm finds the sentence boundaries within these candidate pitch breaks. With this simple method, which uses no explicit segmentation information from an ASR, a 25% error rate was achieved on a randomly selected portion of the Switchboard corpus. The results are comparable with those of methods that include word segmentation information and can be used in conjunction with them to improve overall performance and confidence.
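A minimal stand-in for the candidate-detection step: instead of the paper's curve fitting, this sketch simply flags unvoiced stretches in a frame-level F0 track that are long enough to be candidate boundary pitch breaks. The frame size and duration threshold are assumed values, not taken from the paper.

```python
def find_pitch_breaks(f0, frame_s=0.01, min_break_s=0.3):
    """Return (start, end) frame index pairs for unvoiced runs
    (f0 == 0) lasting at least min_break_s seconds."""
    breaks, start = [], None
    # A trailing voiced sentinel flushes a run that ends at the track's end.
    for i, v in enumerate(list(f0) + [1.0]):
        if v == 0 and start is None:
            start = i
        elif v != 0 and start is not None:
            if (i - start) * frame_s >= min_break_s:
                breaks.append((start, i))
            start = None
    return breaks
```

In the paper's pipeline, a second pass would then filter these candidates using statistical properties of boundaries versus disfluencies; here only the candidate step is shown.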


Asilomar Conference on Signals, Systems and Computers | 2002

A confidence-score based unsupervised MAP adaptation for speech recognition

Dagen Wang; Shrikanth Narayanan

In this paper, a method of confidence-score-based MAP (maximum a posteriori) adaptation for speech recognition is proposed and evaluated. Using confidence scores to dynamically set the weight of the priors is shown to yield a good performance improvement in unsupervised incremental adaptation. The side effect of vocabulary mismatch in adaptation is also effectively controlled in this way. The paper first gives a theoretical analysis and then presents experimental results; several extensions are made and discussed as well.
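The core idea, letting a confidence score control how strongly new data pulls a model parameter away from its prior, can be sketched as a one-dimensional MAP mean update. The way confidence enters (scaling the effective data count) and the symbol names are illustrative assumptions, not the paper's exact formulation.

```python
def map_adapt_mean(prior_mean, frames, confidence, tau=10.0):
    """MAP update of a scalar Gaussian mean, where the recognizer's
    confidence score (0..1) scales the effective count of the
    adaptation data, so low-confidence utterances barely move it."""
    n_eff = confidence * len(frames)          # confidence-weighted data count
    data_mean = sum(frames) / len(frames)     # ML estimate from new data
    return (tau * prior_mean + n_eff * data_mean) / (tau + n_eff)
```

With full confidence, ten frames at 1.0 pull a prior mean of 0.0 halfway (to 0.5, given tau=10); with zero confidence, the prior is left untouched.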


International Conference on Acoustics, Speech, and Signal Processing | 2005

Speech rate estimation via temporal correlation and selected sub-band correlation

Shrikanth Narayanan; Dagen Wang

In this paper, we propose a novel method for speech rate estimation that does not require automatic speech recognition. It extends spectral subband correlation methods by including temporal correlation and the selection of prominent spectral subbands for correlation. Furthermore, to address some of the practical issues in previously published methods, we introduce novel components into the algorithm, such as the use of pitch confidence, a magnifying window, a relative peak measure, and a relative threshold. By selecting the parameters and thresholds from realistic development sets, the method achieves a 0.972 correlation coefficient on syllable count estimation and a 0.706 correlation on speech rate estimation, about a 6.9% improvement over the current best single estimator and a 3.5% improvement over the current multi-estimator, evaluated on the same Switchboard database.


International Conference on Acoustics, Speech, and Signal Processing | 2006

Speech Recognition Engineering Issues in Speech to Speech Translation System Design for Low Resource Languages and Domains

Shrikanth Narayanan; Panayiotis G. Georgiou; Abhinav Sethy; Dagen Wang; Murtaza Bulut; Shiva Sundaram; Emil Ettelaie; Sankaranarayanan Ananthakrishnan; Horacio Franco; Kristin Precoda; Dimitra Vergyri; Jing Zheng; Wen Wang; Ramana Rao Gadde; Martin Graciarena; Victor Abrash; Michael W. Frandsen; Colleen Richey

Engineering automatic speech recognition (ASR) for speech-to-speech (S2S) translation systems, especially targeting languages and domains without readily available spoken language resources, is immensely challenging for a number of reasons. In addition to contending with the conventional data-hungry needs of speech acoustic and language modeling, these designs must accommodate varying requirements imposed by the domain's needs and characteristics, the target device and usage modality (such as phrase-based or spontaneous free-form interaction, with or without visual feedback), and the huge spoken-language variability arising from socio-linguistic and cultural differences among users. This paper, using case studies of creating speech translation systems between English and languages such as Pashto and Farsi, describes some of the practical issues and the solutions developed for multilingual ASR. These include novel acoustic and language modeling strategies such as language-adaptive recognition, active-learning-based language modeling, class-based language models that better exploit resource-poor language data, efficient search strategies including N-best and confidence generation to aid multiple-hypothesis translation, use of dialog information and clever interface choices to facilitate ASR, and audio interface design for meeting both usability and robustness requirements.


International Conference on Acoustics, Speech, and Signal Processing | 2005

An unsupervised quantitative measure for word prominence in spontaneous speech

Dagen Wang; Shrikanth Narayanan

An unsupervised approach for automatic speech prominence detection is proposed in this paper. The algorithm scores prominence by fusing different acoustic feature sets derived from the speech signal's correlation envelope. In addition, we investigate part of speech (POS) as a linguistic correlate of speech prominence. We also underscore the inadequacy of the traditional approach to prominence detection of heuristically tagging speech prominence into discrete levels (categories). Instead, we propose keeping the prominence score continuous, evaluating it by correlation with POS, and leaving it for further processing by other applications such as natural language understanding. Furthermore, in contrast to most previous studies, we evaluate prominence scoring on spontaneous speech data (the Switchboard corpus). Our experimental results indicate that the proposed prominence score can robustly distinguish between content word and function word classes.
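One simple way to realize such a continuous, unsupervised score is to z-normalize each acoustic feature across the words of an utterance and average the normalized values per word. This fusion rule is an illustrative assumption for the sketch below, not the paper's exact scheme.

```python
import statistics

def fuse_prominence(feature_matrix):
    """Fuse per-word acoustic features into one continuous prominence
    score per word. Rows are words; columns are features (e.g. envelope
    intensity, duration). Each column is z-normalized, then averaged."""
    cols = list(zip(*feature_matrix))
    z_cols = []
    for col in cols:
        mu = statistics.fmean(col)
        sd = statistics.pstdev(col) or 1.0  # guard against constant columns
        z_cols.append([(v - mu) / sd for v in col])
    # Average the normalized features word by word.
    return [statistics.fmean(word) for word in zip(*z_cols)]
```

Keeping the output continuous, as the abstract argues, lets a downstream consumer (e.g. a language-understanding module) threshold or rank words as needed instead of committing to discrete prominence categories up front.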


Conference of the International Speech Communication Association | 2009

Iterative Sentence-Pair Extraction from Quasi-Parallel Corpora for Machine Translation

Ruhi Sarikaya; Sameer Maskey; R. Zhang; Ea-Ee Jan; Dagen Wang; Bhuvana Ramabhadran; Salim Roukos


Archive | 2007

Spoken Translation System Using Meta Information Strings

Shrikanth Narayanan; Panayiotis G. Georgiou; Murtaza Bulut; Dagen Wang

Collaboration

Dagen Wang's top co-authors:

Shrikanth Narayanan (University of Southern California)
Panayiotis G. Georgiou (University of Southern California)
Daniel Marcu (University of Southern California)
David R. Traum (University of Southern California)
Kevin Knight (University of Southern California)
Shadi Ganjavi (University of Southern California)