Bowen Zhou
University of Colorado Boulder
Publications
Featured research published by Bowen Zhou.
IEEE Transactions on Speech and Audio Processing | 2005
John H. L. Hansen; Rongqing Huang; Bowen Zhou; Michael Seadle; John R. Deller; Aparna Gurijala; Mikko Kurimo; Pongtep Angkititrakul
In this study, we discuss a number of issues in audio stream phrase recognition for information retrieval for the new National Gallery of the Spoken Word (NGSW). NGSW is the first large-scale repository of its kind, consisting of speeches, news broadcasts, and recordings of historical content from the 20th century. We propose a system diagram and discuss critical tasks associated with effective audio information retrieval, including advanced audio segmentation, speech recognition model adaptation for acoustic background noise and speaker variability, and natural language processing for text query requests. A number of questions regarding copyright assessment, metadata construction, and digital watermarking must also be addressed for a sustainable audio collection of this magnitude. Our experimental online system, entitled "SpeechFind," is presented, which allows for audio retrieval from a portion of the NGSW corpus. We discuss a number of research challenges in addressing the overall task of robust phrase searching in unrestricted audio corpora.
IEEE Transactions on Speech and Audio Processing | 2005
Bowen Zhou; John H. L. Hansen
In many speech and audio applications, it is first necessary to partition and classify acoustic events prior to voice coding for communication or speech recognition for spoken document retrieval. In this paper, we propose an efficient approach for unsupervised audio stream segmentation and clustering via the Bayesian Information Criterion (BIC). The proposed method extends an earlier formulation by Chen and Gopalakrishnan. In our formulation, Hotelling's T² statistic is used to pre-select candidate segmentation boundaries, followed by BIC to make the final segmentation decision. The proposed algorithm also incorporates a variable-size increasing window scheme and a skip-frame test. Our experiments show that the final algorithm is faster by a factor of 100 than that of Chen and Gopalakrishnan, while achieving a 6.7% reduction in the acoustic boundary miss rate at the expense of a 5.7% increase in false alarm rate on DARPA Hub4 1997 evaluation data. The approach is particularly successful for short segment turns of less than 2 s in duration. The results suggest that the proposed algorithm is sufficiently effective and efficient for audio stream segmentation applications.
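The abstract reports the results but not the formulas. As a rough illustration only, the NumPy sketch below shows the standard ΔBIC boundary test with a Hotelling's T² pre-screen. The penalty weight `lam`, the `margin`, and the use of the whole-window covariance in the T² statistic are simplifying assumptions of this sketch, not details taken from the paper; the variable-size window and skip-frame heuristics are omitted.

```python
import numpy as np

def delta_bic(X, i, lam=1.0):
    """Delta-BIC for a candidate boundary at frame i in feature window X (N x d).

    Compares one full-covariance Gaussian over X against two Gaussians over
    the split halves; positive values favor declaring a boundary.
    """
    N, d = X.shape
    X1, X2 = X[:i], X[i:]
    logdet = np.linalg.slogdet(np.cov(X, rowvar=False))[1]
    logdet1 = np.linalg.slogdet(np.cov(X1, rowvar=False))[1]
    logdet2 = np.linalg.slogdet(np.cov(X2, rowvar=False))[1]
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(N)
    return 0.5 * (N * logdet - len(X1) * logdet1 - len(X2) * logdet2) - penalty

def hotelling_t2(X, i):
    """Hotelling's T^2 distance between the two halves split at frame i.

    Uses the whole-window covariance as a simplification of the pooled
    covariance.
    """
    N, _ = X.shape
    X1, X2 = X[:i], X[i:]
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    return (len(X1) * len(X2) / N) * diff @ np.linalg.solve(cov, diff)

def find_boundary(X, margin=20):
    """Pre-select the peak-T^2 frame, then confirm it with delta-BIC.

    margin keeps enough frames on each side for stable covariance estimates.
    """
    cands = range(margin, len(X) - margin)
    i_star = max(cands, key=lambda i: hotelling_t2(X, i))
    return i_star if delta_bic(X, i_star) > 0 else None
```

The cheap T² scan is what buys the speed-up: the expensive log-determinants in ΔBIC are computed only at the pre-selected candidate rather than at every frame.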
IEEE Transactions on Speech and Audio Processing | 2005
Bowen Zhou; John H. L. Hansen
It is widely believed that strong correlations exist across an utterance as a consequence of the time-invariant characteristics of the speaker and acoustic environment. This paper verifies that the first primary eigendirections of the utterance covariance matrix are speaker dependent. Based on this observation, a novel family of fast speaker adaptation algorithms entitled Eigenspace Mapping (EigMap) is proposed. The proposed algorithms are applied to continuous-density hidden Markov model (HMM) based speech recognition. The EigMap algorithm rapidly constructs discriminative acoustic models in the test speaker's eigenspace by preserving discriminative information learned from the baseline models along the directions of the test speaker's eigenspace. Moreover, the adapted models are compressed by discarding model parameters that are assumed to contain no discriminative information. The core idea of EigMap can be extended in many ways, and a family of algorithms based on EigMap is described in this paper. Unsupervised adaptation experiments show that EigMap is effective in improving baseline models using very limited amounts of adaptation data, with performance superior to conventional adaptation techniques such as MLLR and block-diagonal MLLR. A relative improvement of 18.4% over the baseline recognizer is achieved using EigMap with only about 4.5 s of adaptation data. Furthermore, it is demonstrated that EigMap is additive to MLLR, as it encompasses important speaker-dependent discriminative information: a significant relative improvement of 24.6% over the baseline is observed with 4.5 s of adaptation data by combining the MLLR and EigMap techniques.
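The full EigMap model transformation is not specified in the abstract, but the observation it builds on, that the leading eigendirections of a single utterance's covariance are speaker dependent, is easy to illustrate. A minimal NumPy sketch, assuming generic frame-level features such as MFCCs:

```python
import numpy as np

def primary_eigendirections(frames, k=5):
    """Top-k eigendirections of one utterance's feature covariance.

    frames: (N, d) array of acoustic feature vectors (e.g., MFCCs) from a
    single utterance. Per the paper's observation, these leading directions
    are largely speaker dependent, so a few seconds of speech suffice to
    anchor a test-speaker eigenspace for rapid adaptation.
    """
    cov = np.cov(frames, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]    # keep the k largest
    return eigvecs[:, order].T               # (k, d), one direction per row
```

With roughly 4.5 s of speech at a 10 ms frame rate, `frames` holds a few hundred vectors, which is enough to estimate these few leading directions even though it is far too little data for conventional reestimation of the full model.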
International Conference on Acoustics, Speech, and Signal Processing | 2003
Bowen Zhou; John H. L. Hansen
It is widely believed that strong correlations exist across an utterance as a consequence of the time-invariant characteristics of the speaker and acoustic environment. This paper verifies that the first primary eigendirections of the utterance covariance matrix are speaker dependent. Based on this observation, a fast speaker adaptation algorithm entitled Eigenspace Mapping (EigMap) is proposed and described. EigMap rapidly adapts the speaker-independent models by constructing discriminative acoustic models in the test speaker's eigenspace. Unsupervised adaptation experiments show that EigMap is effective in improving baseline models using very limited amounts of adaptation data, with performance superior to conventional adaptation techniques such as block-diagonal MLLR. A relative improvement of 18.4% over the baseline recognizer is achieved using EigMap with only about 4.5 seconds of adaptation data. It is also demonstrated that EigMap is additive to MLLR, as it encompasses speaker-dependent discriminative information: a significant relative improvement of 24.6% over the baseline is observed by combining the MLLR and EigMap techniques.
International Conference on Acoustics, Speech, and Signal Processing | 2002
Bowen Zhou; John H. L. Hansen
In this paper, we extend our previously proposed algorithm, Structural Maximum Likelihood Eigenspace Mapping (SMLEM), for rapid speaker adaptation. The SMLEM algorithm directly adapts speaker-independent (SI) acoustic models to a test speaker by mapping the mixture Gaussian components from an SI eigenspace to speaker-dependent (SD) eigenspaces in a maximum likelihood manner, using very limited adaptation data. In the previous SMLEM paper, we presented encouraging results obtained by adapting only the static feature components. In this paper, we propose a multi-stream approach in which both the static and dynamic feature streams are adapted. For small amounts of adaptation data ranging from 15 to 50 seconds, superior performance is demonstrated over both standard MLLR and block-diagonal MLLR.
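To make the multi-stream idea concrete, here is a minimal sketch of splitting a Gaussian mean into static and dynamic streams and adapting each with its own linear map. The 13-dim static / 26-dim delta+delta-delta layout, and the per-stream matrices `map_static` and `map_dynamic`, are illustrative assumptions standing in for the maximum-likelihood eigenspace mappings the paper estimates; they are not the paper's actual transforms.

```python
import numpy as np

def split_streams(mean, static_dim=13):
    """Split a model mean vector into static and dynamic streams.

    Assumes a 13-dim static / 26-dim dynamic MFCC layout for illustration;
    the abstract only states that the two streams are adapted as separate
    streams.
    """
    return mean[:static_dim], mean[static_dim:]

def adapt_mean_multistream(mean, map_static, map_dynamic, static_dim=13):
    """Apply an independently estimated linear map to each stream.

    map_static (13x13) and map_dynamic (26x26) are placeholders for the
    per-stream eigenspace mappings estimated in a maximum-likelihood manner.
    """
    s, d = split_streams(mean, static_dim)
    return np.concatenate([map_static @ s, map_dynamic @ d])
```

Treating the streams separately keeps each transform small, which matters when only 15 to 50 seconds of adaptation data are available to estimate the mapping parameters.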
Conference of the International Speech Communication Association | 2000
Bowen Zhou; John H. L. Hansen
Nordic Signal Processing Symposium | 2004
John H. L. Hansen; Rongqing Huang; Praful Mangalath; Bowen Zhou; Michael Seadle; John R. Deller
Conference of the International Speech Communication Association | 2002
Bowen Zhou; John H. L. Hansen
Conference of the International Speech Communication Association | 2002
Bowen Zhou; Yuqing Gao; Jeffrey S. Sorensen; Zijian Diao; Michael Picheny
Conference of the International Speech Communication Association | 2000
John H. L. Hansen; Bowen Zhou; Murat Akbacak; Ruhi Sarikaya; Bryan L. Pellom