Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Zhanjiang Song is active.

Publication


Featured research published by Zhanjiang Song.


International Conference on Acoustics, Speech, and Signal Processing | 2001

Automatic generation of pronunciation lexicons for Mandarin spontaneous speech

William Byrne; Veera Venkataramani; Terri Kamm; Thomas Fang Zheng; Zhanjiang Song; Pascale Fung; Y. Liu; Umar Ruhi

Pronunciation modeling for large vocabulary speech recognition attempts to improve recognition accuracy by identifying and modeling pronunciations that are not in the ASR system's pronunciation lexicon. Pronunciation variability in spontaneous Mandarin is studied using the newly created CASS corpus of phonetically annotated spontaneous speech. Pronunciation modeling techniques developed for English are applied to this corpus to train pronunciation models, which are then used for Mandarin broadcast news transcription.
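The core object behind this kind of pronunciation modeling is a lexicon that admits several weighted surface forms per word. The sketch below illustrates that idea only, not the paper's method; the words, phone strings, and probabilities are invented and are not taken from the CASS corpus.

```python
# Minimal sketch of a pronunciation lexicon with weighted surface variants.
# The word, phone strings, and probabilities are hypothetical examples.
from collections import defaultdict

class PronunciationLexicon:
    def __init__(self):
        # word -> list of (phone_sequence, probability)
        self.entries = defaultdict(list)

    def add_variant(self, word, phones, prob):
        self.entries[word].append((tuple(phones), prob))

    def variants(self, word):
        # Most likely surface form first.
        return sorted(self.entries[word], key=lambda v: -v[1])

lexicon = PronunciationLexicon()
# Canonical and reduced forms, as might be observed in spontaneous speech.
lexicon.add_variant("shenme", ["sh", "en", "m", "e"], 0.7)
lexicon.add_variant("shenme", ["sh", "em", "e"], 0.3)

for phones, prob in lexicon.variants("shenme"):
    print(" ".join(phones), prob)
```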


Speech Communication | 2006

A tree-based kernel selection approach to efficient Gaussian mixture model-universal background model based speaker identification

Zhenyu Xiong; Thomas Fang Zheng; Zhanjiang Song; Frank K. Soong; Wenhu Wu

We propose a tree-based kernel selection (TBKS) algorithm as a computationally efficient approach to Gaussian mixture model–universal background model (GMM–UBM) based speaker identification. All Gaussian components in the universal background model are first clustered hierarchically into a tree, and the corresponding acoustic space is mapped into structurally partitioned regions. When identifying a speaker, each test input feature vector is scored against only a small subset of all Gaussian components. As a result of this TBKS process, computational complexity can be significantly reduced. We further improve the efficiency of the proposed system by applying the previously proposed observation reordering based pruning (ORBP) to screen out unlikely candidate speakers. The approach is evaluated on a speech database of 1031 speakers, in both clean and noisy conditions. The experimental results show that by integrating TBKS and ORBP we can speed up computation by a factor of 15.8 with only a very slight degradation of identification performance, i.e., a relative error rate increase of 1%, compared with a baseline GMM–UBM system. The improved search efficiency is also robust to additive noise.
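As a rough picture of the selection idea only (not the paper's exact tree construction or scoring rule), the sketch below groups a toy UBM's Gaussian means into one level of clusters and scores each frame against just the components of the nearest clusters; the component count, cluster count, and diagonal-covariance model are illustrative assumptions.

```python
# Sketch of kernel selection for a GMM-UBM: cluster the UBM's Gaussian means,
# then score each frame only against components in the nearest clusters.
# All sizes and the one-level "tree" are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

# Toy "UBM": M diagonal-covariance Gaussians over D-dimensional features.
M, D, K = 256, 13, 16
means = rng.normal(size=(M, D))
variances = np.full((M, D), 1.0)
weights = np.full(M, 1.0 / M)

# Cluster the component means with a few k-means iterations.
centroids = means[rng.choice(M, K, replace=False)].copy()
for _ in range(10):
    assign = np.argmin(((means[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    for k in range(K):
        members = assign == k
        if members.any():
            centroids[k] = means[members].mean(axis=0)
assign = np.argmin(((means[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)

def log_gauss(x, mu, var):
    # Diagonal-covariance Gaussian log-density.
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var).sum(-1)

def frame_score(x, top_clusters=2):
    # Score a frame against only the components in the nearest clusters.
    d = ((centroids - x) ** 2).sum(-1)
    selected = np.isin(assign, np.argsort(d)[:top_clusters])
    if not selected.any():          # fall back to full scoring if needed
        selected[:] = True
    ll = log_gauss(x, means[selected], variances[selected])
    return np.logaddexp.reduce(np.log(weights[selected]) + ll)

print("approximate frame log-likelihood:", frame_score(rng.normal(size=D)))
```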


International Conference on Acoustics, Speech, and Signal Processing | 2005

Combining selection tree with observation reordering pruning for efficient speaker identification using GMM-UBM

Zhenyu Xiong; Thomas Fang Zheng; Zhanjiang Song; Wenhu Wu

In this paper a new method of reducing the computational load of Gaussian mixture model-universal background model (GMM-UBM) based speaker identification is proposed. To speed up the selection of the N-best Gaussian mixtures in a UBM, a selection tree (ST) structure and its associated operations are proposed. Combined with the existing observation reordering pruning (ORP) method, which was proposed for rapid pruning of unlikely speaker model candidates, the proposed method achieves a much larger computation reduction factor than either individual method. Experimental results show that a GMM-UBM system used in conjunction with ST and ORP can speed up computation by a factor of about 16 with an error rate increase of only about 1% compared with a baseline GMM-UBM system.
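The pruning half of the combination can be pictured as follows: accumulate per-frame scores for each candidate speaker and discard candidates that fall too far behind the current best. The sketch below is only that general idea; the frame order, the fixed margin, and the random scores are placeholders, not the paper's ORP criterion.

```python
# Sketch of pruning unlikely speaker candidates during frame-by-frame scoring.
# The fixed margin and the randomly generated scores are purely illustrative.
import numpy as np

rng = np.random.default_rng(1)
num_speakers, num_frames = 50, 200

# Hypothetical per-frame log-likelihoods of each candidate speaker model.
frame_ll = rng.normal(size=(num_speakers, num_frames))

def prune_and_identify(frame_ll, margin=5.0):
    n_spk, n_frm = frame_ll.shape
    active = np.ones(n_spk, dtype=bool)
    totals = np.zeros(n_spk)
    for t in range(n_frm):
        totals[active] += frame_ll[active, t]
        best = totals[active].max()
        # Drop candidates trailing the current best by more than the margin;
        # they are unlikely to catch up.
        active[active] = totals[active] >= best - margin
        if active.sum() == 1:
            break
    totals[~active] = -np.inf
    return int(np.argmax(totals))

print("identified speaker index:", prune_and_identify(frame_ll))
```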


International Symposium on Chinese Spoken Language Processing | 2006

CCC speaker recognition evaluation 2006: overview, methods, data, results and perspective

Thomas Fang Zheng; Zhanjiang Song; Lihong Zhang; Michael Brasser; Wei Wu; Jing Deng

For the special session on speaker recognition of the 5th International Symposium on Chinese Spoken Language Processing (ISCSLP 2006), the Chinese Corpus Consortium (CCC), the session organizer, developed a speaker recognition evaluation (SRE) to serve as a platform for developers in this field to evaluate their speaker recognition systems using two databases provided by the CCC. In this paper, the objective of the evaluation and the methods and data used are described. The results of the evaluation are also presented.


International Symposium on Chinese Spoken Language Processing | 2004

A two-step keyword spotting method based on context-dependent a posteriori probability

Thomas Fang Zheng; Jing Li; Zhanjiang Song; Mingxing Xu

Keyword weighting plays an important role in traditional keyword spotting (KWS) systems: it helps detect keyword candidates in an utterance so that they will not be missed. However, if the keywords are over-weighted, there will be a high number of false alarms, which slows down the system and may introduce rejection errors; on the other hand, if the keywords are insufficiently weighted, the detection rate is not guaranteed. It is therefore difficult to strike a compromise in keyword weighting. A two-step KWS method based on context-dependent a posteriori probability (CDAPP) is proposed in this paper as a way to solve this problem. The first step uses a continuous speech recognition method to generate a sequence of acoustic symbols for the second step, which performs a fuzzy keyword search. Preliminary experiments show that the proposed strategy is promising and warrants further investigation.
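The second step can be pictured as an approximate string search over the symbol sequence produced by the first pass. The sketch below uses a plain edit-distance substring match in place of the paper's CDAPP scoring, and the syllable strings are invented examples.

```python
# Sketch of a fuzzy keyword search over a recognizer's symbol sequence.
# A plain edit-distance substring match stands in for CDAPP scoring here.

def min_edit_distance_substring(keyword, symbols):
    """Smallest edit distance between `keyword` and any substring of `symbols`."""
    n, m = len(keyword), len(symbols)
    prev = [0] * (m + 1)  # matching may start anywhere in the symbol stream
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if keyword[i - 1] == symbols[j - 1] else 1
            curr[j] = min(prev[j - 1] + cost,   # substitute / match
                          prev[j] + 1,          # skip a keyword symbol
                          curr[j - 1] + 1)      # absorb an extra stream symbol
        prev = curr
    return min(prev)  # matching may end anywhere in the symbol stream

# Hypothetical first-pass output and keyword, as syllable sequences.
symbols = "ni hao qing wen tian qi zen me yang".split()
keyword = "tian qi".split()

dist = min_edit_distance_substring(keyword, symbols)
print("keyword detected" if dist <= 1 else "no hit", "(edit distance:", dist, ")")
```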


International Conference on Machine Learning and Cybernetics | 2005

Using predictive differential power spectrum and subband mel-spectrum centroid for robust speaker recognition in stationary noises

Jing Deng; Thomas Fang Zheng; Zhanjiang Song; Jian Liu; Wenhu Wu

In state-of-the-art speaker recognition systems, mel-scaled frequency cepstral coefficients (MFCCs) are perhaps the most widely used front-end features. One of the major issues with MFCCs is that they are very sensitive to additive noise. In this paper, two methods for robust speech front-ends are proposed. The first uses a predictive difference function to calculate the differential power spectrum (DPS) as precisely as possible in order to restore the power spectrum of the original clean speech; the spectrum in the traditional MFCC calculation is then replaced with this estimated spectrum, and the extracted features are referred to as predictive differential power spectrum (PDPS) based cepstral coefficients (PDPSCCs). The second incorporates subband power information with subband mel-spectrum centroid information after the outputs of the traditional mel filter banks; the extracted features are referred to as subband mel-spectrum centroid (SMSC) based cepstral coefficients (SMSCCCs). PDPSCCs and SMSCCCs are compared with cepstral mean subtraction (CMS) based, spectral subtraction (SS) based, and differential power spectrum (DPS) based cepstral coefficients at different noise levels. Experimental results show that PDPSCCs and SMSCCCs are more effective in enhancing the robustness of a speaker recognition system; with the CMS method, the average error rate can be reduced by 12.2% in comparison with DPS-based cepstral coefficients.
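A bare-bones way to picture the first idea is to difference the power spectrum along the frequency axis before the mel filter bank, since stationary additive noise has a nearly flat spectrum that differencing attenuates. The sketch below uses a simple first-order difference rather than the paper's predictive difference function, and the frame length and filter-bank parameters are placeholder choices.

```python
# Sketch of MFCC-style features computed from a differential power spectrum.
# A first-order frequency difference replaces the paper's predictive
# difference function; all parameter values are illustrative.
import numpy as np

def mel_filterbank(num_filters, n_fft, sr):
    # Standard triangular mel filters.
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    inv_mel = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = inv_mel(np.linspace(mel(0), mel(sr / 2), num_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((num_filters, n_fft // 2 + 1))
    for i in range(num_filters):
        lo, ctr, hi = bins[i], bins[i + 1], bins[i + 2]
        if ctr > lo:
            fb[i, lo:ctr] = (np.arange(lo, ctr) - lo) / (ctr - lo)
        if hi > ctr:
            fb[i, ctr:hi] = (hi - np.arange(ctr, hi)) / (hi - ctr)
    return fb

def dps_cepstra(frame, sr=16000, n_fft=512, num_filters=26, num_ceps=13):
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    # Differential power spectrum: difference along the frequency axis.
    dps = np.abs(np.diff(power, append=power[-1]))
    log_energies = np.log(mel_filterbank(num_filters, n_fft, sr) @ dps + 1e-10)
    # DCT-II to decorrelate, as in standard MFCC extraction.
    n = np.arange(num_filters)
    dct = np.cos(np.pi * np.outer(np.arange(num_ceps), n + 0.5) / num_filters)
    return dct @ log_energies

frame = np.random.default_rng(2).normal(size=400)  # one 25 ms frame at 16 kHz
print(dps_cepstra(frame).shape)  # -> (13,)
```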


Conference of the International Speech Communication Association | 2000

CASS: A phonetically transcribed corpus of Mandarin spontaneous speech

Aijun Li; Fang Zheng; William Byrne; Pascale Fung; Terri Kamm; Yi Liu; Zhanjiang Song; Umar Ruhi; Veera Venkataramani; Xiaoxia Chen


Conference of the International Speech Communication Association | 2000

The phonetic labeling on read and spontaneous discourse corpora.

Aijun Li; Xiaoxia Chen; Guohua Sun; Wu Hua; Zhigang Yin; Yiqing Zu; Fang Zheng; Zhanjiang Song


Conference of the International Speech Communication Association | 2001

Modeling Pronunciation Variation Using Context-Dependent Weighting and B/S Refined Acoustic Modeling

Thomas Fang Zheng; Zhanjiang Song; Pascale Fung; William Byrne


Conference of the International Speech Communication Association | 2002

Reducing pronunciation lexicon confusion and using more data without phonetic transcription for pronunciation modeling

Thomas Fang Zheng; Zhanjiang Song; Pascale Fung; William Byrne

Collaboration


Dive into Zhanjiang Song's collaborations.

Top Co-Authors

Pascale Fung

Hong Kong University of Science and Technology

Terri Kamm

National Institute of Standards and Technology
