Bowen Zhou
University of Colorado Boulder
Publications
Featured research published by Bowen Zhou.
IEEE Transactions on Speech and Audio Processing | 2005
John H. L. Hansen; Rongqing Huang; Bowen Zhou; Michael Seadle; John R. Deller; Aparna Gurijala; Mikko Kurimo; Pongtep Angkititrakul
In this study, we discuss a number of issues in audio stream phrase recognition for information retrieval for the new National Gallery of the Spoken Word (NGSW). NGSW is the first large-scale repository of its kind, consisting of speeches, news broadcasts, and recordings of historical content from the 20th century. We propose a system diagram and discuss critical tasks associated with effective audio information retrieval, including advanced audio segmentation, speech recognition model adaptation for acoustic background noise and speaker variability, and natural language processing for text query requests. A number of questions regarding copyright assessment, metadata construction, and digital watermarking must also be addressed for a sustainable audio collection of this magnitude. Our experimental online system, entitled "SpeechFind," is presented, which allows for audio retrieval from a portion of the NGSW corpus. We discuss a number of research challenges in addressing the overall task of robust phrase searching in unrestricted audio corpora.
IEEE Transactions on Speech and Audio Processing | 2005
Bowen Zhou; John H. L. Hansen
In many speech and audio applications, it is first necessary to partition and classify acoustic events prior to voice coding for communication or speech recognition for spoken document retrieval. In this paper, we propose an efficient approach for unsupervised audio stream segmentation and clustering via the Bayesian Information Criterion (BIC). The proposed method extends an earlier formulation by Chen and Gopalakrishnan. In our formulation, Hotelling's T² statistic is used to pre-select candidate segmentation boundaries, followed by BIC to make the final segmentation decision. The proposed algorithm also incorporates a variable-size increasing window scheme and a skip-frame test. Our experiments show that the final algorithm is faster by a factor of 100 than that of Chen and Gopalakrishnan, while achieving a 6.7% reduction in the acoustic boundary miss rate at the expense of a 5.7% increase in false alarm rate on DARPA Hub4 1997 evaluation data. The approach is particularly successful for short segment turns of less than 2 s in duration. The results suggest that the proposed algorithm is sufficiently effective and efficient for audio stream segmentation applications.
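The abstract reports the results but not the formulas. As a rough illustration only, the NumPy sketch below shows the standard ΔBIC boundary test with a Hotelling's T² pre-screen. The penalty weight `lam`, the `margin`, and the use of the whole-window covariance in the T² statistic are simplifying assumptions of this sketch, not details taken from the paper; the variable-size window and skip-frame heuristics are omitted.

```python
import numpy as np

def delta_bic(X, i, lam=1.0):
    """Delta-BIC for a candidate boundary at frame i in feature window X (N x d).

    Compares one full-covariance Gaussian over X against two Gaussians over
    the split halves; positive values favor declaring a boundary.
    """
    N, d = X.shape
    X1, X2 = X[:i], X[i:]
    logdet = np.linalg.slogdet(np.cov(X, rowvar=False))[1]
    logdet1 = np.linalg.slogdet(np.cov(X1, rowvar=False))[1]
    logdet2 = np.linalg.slogdet(np.cov(X2, rowvar=False))[1]
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(N)
    return 0.5 * (N * logdet - len(X1) * logdet1 - len(X2) * logdet2) - penalty

def hotelling_t2(X, i):
    """Hotelling's T^2 distance between the two halves split at frame i.

    Uses the whole-window covariance as a simplification of the pooled
    covariance.
    """
    N, _ = X.shape
    X1, X2 = X[:i], X[i:]
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    return (len(X1) * len(X2) / N) * diff @ np.linalg.solve(cov, diff)

def find_boundary(X, margin=20):
    """Pre-select the peak-T^2 frame, then confirm it with delta-BIC.

    margin keeps enough frames on each side for stable covariance estimates.
    """
    cands = range(margin, len(X) - margin)
    i_star = max(cands, key=lambda i: hotelling_t2(X, i))
    return i_star if delta_bic(X, i_star) > 0 else None
```

The cheap T² scan is what buys the speed-up: the expensive log-determinants in ΔBIC are computed only at the pre-selected candidate rather than at every frame.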
IEEE Transactions on Speech and Audio Processing | 2005
Bowen Zhou; John H. L. Hansen
It is widely believed that strong correlations exist across an utterance as a consequence of the time-invariant characteristics of the speaker and acoustic environment. This paper verifies that the first primary eigendirections of the utterance covariance matrix are speaker dependent. Based on this observation, a novel family of fast speaker adaptation algorithms entitled Eigenspace Mapping (EigMap) is proposed. The proposed algorithms are applied to continuous-density hidden Markov model (HMM) based speech recognition. The EigMap algorithm rapidly constructs discriminative acoustic models in the test speaker's eigenspace by preserving discriminative information learned from the baseline models along the directions of the test speaker's eigenspace. Moreover, the adapted models are compressed by discarding model parameters that are assumed to contain no discriminative information. The core idea of EigMap can be extended in many ways, and a family of algorithms based on EigMap is described in this paper. Unsupervised adaptation experiments show that EigMap is effective in improving baseline models using very limited amounts of adaptation data, with performance superior to conventional adaptation techniques such as MLLR and block-diagonal MLLR. A relative improvement of 18.4% over the baseline recognizer is achieved using EigMap with only about 4.5 s of adaptation data. Furthermore, it is demonstrated that EigMap is additive to MLLR, as it encompasses important speaker-dependent discriminative information: a significant relative improvement of 24.6% over the baseline is observed with 4.5 s of adaptation data by combining the MLLR and EigMap techniques.
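The full EigMap model transformation is not specified in the abstract, but the observation it builds on, that the leading eigendirections of a single utterance's covariance are speaker dependent, is easy to illustrate. A minimal NumPy sketch, assuming generic frame-level features such as MFCCs:

```python
import numpy as np

def primary_eigendirections(frames, k=5):
    """Top-k eigendirections of one utterance's feature covariance.

    frames: (N, d) array of acoustic feature vectors (e.g., MFCCs) from a
    single utterance. Per the paper's observation, these leading directions
    are largely speaker dependent, so a few seconds of speech suffice to
    anchor a test-speaker eigenspace for rapid adaptation.
    """
    cov = np.cov(frames, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]    # keep the k largest
    return eigvecs[:, order].T               # (k, d), one direction per row
```

With roughly 4.5 s of speech at a 10 ms frame rate, `frames` holds a few hundred vectors, which is enough to estimate these few leading directions even though it is far too little data for conventional reestimation of the full model.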
International Conference on Acoustics, Speech, and Signal Processing | 2003
Bowen Zhou; John H. L. Hansen
It is widely believed that strong correlations exist across an utterance as a consequence of the time-invariant characteristics of the speaker and acoustic environment. This paper verifies that the first primary eigendirections of the utterance covariance matrix are speaker dependent. Based on this observation, a fast speaker adaptation algorithm entitled Eigenspace Mapping (EigMap) is proposed and described. EigMap rapidly adapts the speaker-independent models by constructing discriminative acoustic models in the test speaker's eigenspace. Unsupervised adaptation experiments show that EigMap is effective in improving baseline models using very limited amounts of adaptation data, with performance superior to conventional adaptation techniques such as block-diagonal MLLR. A relative improvement of 18.4% over the baseline recognizer is achieved using EigMap with only about 4.5 seconds of adaptation data. It is also demonstrated that EigMap is additive to MLLR, as it encompasses speaker-dependent discriminative information: a significant relative improvement of 24.6% over the baseline is observed by combining the MLLR and EigMap techniques.
International Conference on Acoustics, Speech, and Signal Processing | 2002
Bowen Zhou; John H. L. Hansen
In this paper, we extend our previously proposed algorithm, Structural Maximum Likelihood Eigenspace Mapping (SMLEM), for rapid speaker adaptation. The SMLEM algorithm directly adapts speaker-independent (SI) acoustic models to a test speaker by mapping the mixture Gaussian components from an SI eigenspace to speaker-dependent (SD) eigenspaces in a maximum likelihood manner, using very limited adaptation data. In the previous SMLEM paper, we presented encouraging results obtained by adapting only the static feature components. In this paper, we propose a multi-stream approach in which both the static and dynamic feature streams are adapted. For small amounts of adaptation data ranging from 15 to 50 seconds, superior performance is demonstrated over both standard MLLR and block-diagonal MLLR.
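To make the multi-stream idea concrete, here is a minimal sketch of splitting a Gaussian mean into static and dynamic streams and adapting each with its own linear map. The 13-dim static / 26-dim delta+delta-delta layout, and the per-stream matrices `map_static` and `map_dynamic`, are illustrative assumptions standing in for the maximum-likelihood eigenspace mappings the paper estimates; they are not the paper's actual transforms.

```python
import numpy as np

def split_streams(mean, static_dim=13):
    """Split a model mean vector into static and dynamic streams.

    Assumes a 13-dim static / 26-dim dynamic MFCC layout for illustration;
    the abstract only states that the two streams are adapted as separate
    streams.
    """
    return mean[:static_dim], mean[static_dim:]

def adapt_mean_multistream(mean, map_static, map_dynamic, static_dim=13):
    """Apply an independently estimated linear map to each stream.

    map_static (13x13) and map_dynamic (26x26) are placeholders for the
    per-stream eigenspace mappings estimated in a maximum-likelihood manner.
    """
    s, d = split_streams(mean, static_dim)
    return np.concatenate([map_static @ s, map_dynamic @ d])
```

Treating the streams separately keeps each transform small, which matters when only 15 to 50 seconds of adaptation data are available to estimate the mapping parameters.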
Conference of the International Speech Communication Association | 2000
Bowen Zhou; John H. L. Hansen
Nordic Signal Processing Symposium | 2004
John H. L. Hansen; Rongqing Huang; Praful Mangalath; Bowen Zhou; Michael Seadle; John R. Deller
Conference of the International Speech Communication Association | 2002
Bowen Zhou; John H. L. Hansen
Conference of the International Speech Communication Association | 2002
Bowen Zhou; Yuqing Gao; Jeffrey S. Sorensen; Zijian Diao; Michael Picheny
Conference of the International Speech Communication Association | 2000
John H. L. Hansen; Bowen Zhou; Murat Akbacak; Ruhi Sarikaya; Bryan L. Pellom