Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Hongbin Suo is active.

Publication


Featured research published by Hongbin Suo.


IEICE Transactions on Information and Systems | 2008

Automatic Language Identification with Discriminative Language Characterization Based on SVM

Hongbin Suo; Ming Li; Ping Lu; Yonghong Yan

Robust automatic language identification (LID) is the task of identifying the language from a short utterance spoken by an unknown speaker. The mainstream approaches include parallel phone recognition language modeling (PPRLM), support vector machines (SVM), and Gaussian mixture models (GMMs). These systems map the cepstral features of spoken utterances into high-level scores using classifiers. In this paper, in order to increase the dimension of the score vector and alleviate the inter-speaker variability within the same language, multiple data groups based on supervised speaker clustering are employed to generate discriminative language characterization score vectors (DLCSV). Back-end SVM classifiers are used to model the probability distribution of each target language in the DLCSV space. Finally, the output scores of the back-end classifiers are calibrated by a pair-wise posterior probability estimation (PPPE) algorithm. The proposed language identification frameworks are evaluated on the 2003 NIST Language Recognition Evaluation (LRE) databases, and the experiments show that the system described in this paper produces results comparable to existing systems. In particular, the SVM framework achieves an equal error rate (EER) of 4.0% in the 30-second task, outperforming state-of-the-art systems by more than 30% relative error reduction. In addition, the proposed PPRLM and GMM systems achieve EERs of 5.1% and 5.0%, respectively.
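
The back-end described above lends itself to a compact sketch. The following snippet, with placeholder array names and random data standing in for real DLCSV features, trains an RBF SVM back-end whose calibrated multiclass posteriors come from libsvm's pairwise coupling, which is comparable in spirit to the PPPE step; it is an illustration under those assumptions, not the authors' implementation.

```python
# Hedged sketch: a back-end SVM over per-utterance score vectors with
# calibrated multiclass posteriors. The DLCSV front end is assumed to have
# already produced one score vector per utterance; names here are illustrative.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
dlcsv = rng.normal(size=(200, 24))        # placeholder DLCSV score vectors
labels = rng.integers(0, 4, size=200)     # placeholder target-language labels

# probability=True makes libsvm fit sigmoid calibrators and combine the
# pairwise class probabilities (Wu-Lin-Weng coupling), i.e. a pairwise
# posterior probability estimate in the same spirit as PPPE.
backend = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
backend.fit(dlcsv, labels)

posteriors = backend.predict_proba(dlcsv[:5])   # calibrated language posteriors
print(posteriors.round(3))
```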


International Conference on Natural Computation | 2007

The Design of Backend Classifiers in PPRLM System for Language Identification

Hongbin Suo; Ming Li; Tantan Liu; Ping Lu; Yonghong Yan

This paper demonstrates the design approach for classifying the back-end features of a PPRLM (Parallel Phone Recognition and Language Modeling) system. A variety of features, and their combinations, extracted by language-dependent recognizers were evaluated on the National Institute of Standards and Technology (NIST) Language Recognition Evaluation (LRE) 2003 corpus. Three well-known classifiers, the Gaussian mixture model (GMM), the support vector machine (SVM), and the feed-forward neural network (NN), are proposed to classify these high-level features, which are generated by n-gram language model scoring and one-pass decoding based on the acoustic models of the PPRLM system. Finally, log-likelihood ratio (LLR) normalization is applied to the target-language scores in the back-end processing, which enhances language recognition performance.
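
As an illustration of the LLR normalization mentioned above, the sketch below applies one common form of the normalization to a matrix of per-language log scores; the exact variant used in the paper may differ.

```python
# Hedged sketch: log-likelihood ratio (LLR) normalization of back-end
# language scores, a common variant in the LID literature.
import numpy as np
from scipy.special import logsumexp

def llr_normalize(log_scores: np.ndarray) -> np.ndarray:
    """log_scores: (n_utterances, n_languages) log-likelihood-like scores.
    Returns each language's score minus the log of the average likelihood
    of all competing languages."""
    n_utt, n_lang = log_scores.shape
    out = np.empty_like(log_scores)
    for l in range(n_lang):
        rest = np.delete(log_scores, l, axis=1)
        out[:, l] = log_scores[:, l] - (logsumexp(rest, axis=1) - np.log(n_lang - 1))
    return out

scores = np.log(np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]))
print(llr_normalize(scores).round(3))
```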


EURASIP Journal on Audio, Speech, and Music Processing | 2008

Using SVM as back-end classifier for language identification

Hongbin Suo; Ming Li; Ping Lu; Yonghong Yan

Robust automatic language identification (LID) is the task of identifying the language from a short utterance spoken by an unknown speaker. One of the mainstream approaches, parallel phone recognition language modeling (PPRLM), has achieved very good performance. The log-likelihood ratio (LLR) algorithm has recently been proposed to normalize the posterior probabilities output by the back-end classifiers in PPRLM systems. A support vector machine (SVM) with a radial basis function (RBF) kernel is adopted as the back-end classifier. However, the output of a conventional SVM classifier is not a probability, so we use a pair-wise posterior probability estimation (PPPE) algorithm to calibrate the output of each classifier. The proposed approaches are evaluated on the 2005 National Institute of Standards and Technology (NIST) Language Recognition Evaluation databases, and experiments show that the systems described in this paper produce results comparable to the existing state of the art.
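
To make the calibration step concrete, the sketch below couples pairwise class probabilities into a single posterior vector. It uses the classical Hastie-Tibshirani coupling iteration as a stand-in; the paper's PPPE follows a Wu-Lin-Weng-style formulation, which optimizes a slightly different objective but serves the same purpose.

```python
# Hedged sketch: turning pairwise class probabilities into one posterior
# vector via Hastie-Tibshirani pairwise coupling (an illustrative substitute
# for the paper's PPPE calibration).
import numpy as np

def couple_pairwise(r: np.ndarray, n_iter: int = 100) -> np.ndarray:
    """r[i, j] is the estimated probability of class i beating class j
    (r[i, j] + r[j, i] == 1, diagonal ignored). Returns class posteriors."""
    k = r.shape[0]
    p = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        mu = p[:, None] / (p[:, None] + p[None, :] + 1e-12)
        for i in range(k):
            num = sum(r[i, j] for j in range(k) if j != i)
            den = sum(mu[i, j] for j in range(k) if j != i)
            p[i] *= num / max(den, 1e-12)
        p /= p.sum()
    return p

# three languages; pairwise estimates from per-pair SVM calibrators
r = np.array([[0.0, 0.8, 0.7],
              [0.2, 0.0, 0.6],
              [0.3, 0.4, 0.0]])
print(couple_pairwise(r).round(3))   # language posteriors summing to 1
```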


International Symposium on Neural Networks | 2009

A Novel Fuzzy-Based Automatic Speaker Clustering Algorithm

Haipeng Wang; Xiang Zhang; Hongbin Suo; Qingwei Zhao; Yonghong Yan

Fuzzy clustering has proved successful in various fields in the recent past. In this paper, we introduce fuzzy clustering algorithms into the domain of automatic speaker clustering and present a novel fuzzy-based hierarchical speaker clustering algorithm that applies fuzzy theory to state-of-the-art agglomerative hierarchical clustering. The method follows a bottom-up strategy and determines fuzzy memberships according to a membership propagation strategy, which propagates fuzzy memberships through the iterative process of hierarchical clustering. Further analysis reveals that the method is an extension of the conventional hierarchical clustering algorithm. Experimental results show that it exhibits competitive performance compared with conventional k-means, fuzzy c-means, and agglomerative hierarchical clustering algorithms.
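
A rough impression of the idea can be had from the sketch below: standard agglomerative clustering of speaker features followed by fuzzy-c-means-style memberships computed against the resulting centroids. The paper propagates memberships during the merging itself, which this after-the-fact assignment does not reproduce; data and parameters are illustrative.

```python
# Hedged sketch: agglomerative hierarchical clustering of speaker features,
# then fuzzy memberships computed against the cluster centroids (an
# illustration of attaching fuzziness to AHC, not the paper's propagation rule).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import cdist

def fuzzy_memberships(x: np.ndarray, n_clusters: int, m: float = 2.0):
    labels = fcluster(linkage(x, method="ward"), n_clusters, criterion="maxclust")
    centroids = np.stack([x[labels == k].mean(axis=0) for k in np.unique(labels)])
    d = np.maximum(cdist(x, centroids), 1e-9)        # point-to-centroid distances
    inv = d ** (-2.0 / (m - 1.0))                    # fuzzy c-means membership form
    return inv / inv.sum(axis=1, keepdims=True)      # rows are fuzzy memberships

rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0, 1, (20, 8)), rng.normal(4, 1, (20, 8))])
print(fuzzy_memberships(feats, n_clusters=2)[:3].round(2))
```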


Expert Systems With Applications | 2012

Maximum A Posteriori Linear Regression for language recognition

Jinchao Yang; Xiang Zhang; Hongbin Suo; Li Lu; Jianping Zhang; Yonghong Yan

This paper proposes the use of Maximum A Posteriori Linear Regression (MAPLR) transforms as features for language recognition. Rather than estimating the transforms using maximum likelihood linear regression (MLLR), MAPLR incorporates prior information about the transforms into the estimation process, using maximum a posteriori (MAP) estimation as the criterion to derive them. Through multiple MAPLR adaptations, each spoken utterance is converted into a discriminative transform supervector consisting of one target-language transform vector and the non-target transform vectors. SVM classifiers are employed to model the discriminative MAPLR transform supervectors. This system achieves performance comparable to that obtained with state-of-the-art approaches and better than MLLR. Experimental results on the 2007 NIST Language Recognition Evaluation (LRE) databases show relative reductions of 4% in EER and 9% in minimum cost when the language recognition system uses MAPLR instead of MLLR in the 30-second tasks, and further improvement is gained by combining with state-of-the-art systems. This yields gains of 6% in EER and 11% in minDCF compared with the combination of only the MMI system and the GMM-SVM system.
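
The supervector construction can be sketched as follows, assuming the per-language MAPLR (or MLLR) transforms have already been estimated by an upstream acoustic model; the sizes and names below are illustrative placeholders, not the paper's configuration.

```python
# Hedged sketch: stacking per-language adaptation transforms into one
# supervector per utterance and classifying with a linear SVM. Estimating the
# MAPLR transforms themselves is assumed to have happened upstream.
import numpy as np
from sklearn.svm import LinearSVC

n_utt, n_lang, dim = 100, 4, 39                 # placeholder sizes
rng = np.random.default_rng(0)

# one (dim x (dim+1)) affine transform per utterance per language recognizer
transforms = rng.normal(size=(n_utt, n_lang, dim, dim + 1))
labels = rng.integers(0, n_lang, size=n_utt)

# supervector = concatenation of all per-language transforms, flattened
supervectors = transforms.reshape(n_utt, -1)

clf = LinearSVC(C=1.0).fit(supervectors, labels)
print(clf.decision_function(supervectors[:2]).round(2))   # per-language scores
```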


Web Information Systems Modeling | 2009

WAPS: An Audio Program Surveillance System for Large Scale Web Data Stream

Jie Gao; Yanqing Sun; Hongbin Suo; Qingwei Zhao; Yonghong Yan

We address the problem of effectively monitoring audio programs on the web. The paper presents how to construct such an audio program surveillance system using several state-of-the-art speech technologies, taking the real-world system WAPS (Web Audio Program Surveillance) as an example. WAPS is described in detail in terms of the challenges it faces, its system architecture, and its component modules. An objective evaluation of the whole system is also given. Experiments show that WAPS achieves satisfactory performance on both artificially created data and real web data.


International Conference on Acoustics, Speech, and Signal Processing | 2012

Factor analysis of Laplacian approach for speaker recognition

Jinchao Yang; Chunyan Liang; Lin Yang; Hongbin Suo; Junjie Wang; Yonghong Yan

In this study, we introduce a new factor analysis of Laplacian approach to speaker recognition under the support vector machine (SVM) framework. The Laplacian-projected supervector from the proposed approach, which finds an embedding that preserves local information via locality preserving projections (LPP), is believed to contain speaker-dependent information. The proposed method was compared with the state-of-the-art total variability approach on the 2010 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE) corpus, and the comparison shows that it is effective.
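
A minimal locality preserving projection, the embedding underlying the Laplacian-projected supervectors, might look like the sketch below; the neighbourhood size, kernel width, and dense eigen-solver are illustrative choices rather than the paper's configuration.

```python
# Hedged sketch: a small locality preserving projection (LPP) over placeholder
# GMM supervectors. Builds a kNN graph with heat-kernel weights, forms the
# graph Laplacian, and solves the LPP generalized eigenproblem.
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def lpp(x: np.ndarray, n_components: int, k: int = 5, sigma: float = 1.0):
    n = len(x)
    d2 = cdist(x, x, "sqeuclidean")
    w = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]            # k nearest neighbours
        w[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    w = np.maximum(w, w.T)                           # symmetric affinity graph
    deg = np.diag(w.sum(axis=1))
    lap = deg - w                                    # graph Laplacian
    a = x.T @ lap @ x
    b = x.T @ deg @ x + 1e-6 * np.eye(x.shape[1])    # regularized for stability
    vals, vecs = eigh(a, b)                          # ascending eigenvalues
    return x @ vecs[:, :n_components]                # keep smallest eigenvalues

rng = np.random.default_rng(0)
supervectors = rng.normal(size=(60, 20))             # placeholder supervectors
print(lpp(supervectors, n_components=5).shape)       # (60, 5)
```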


EURASIP Journal on Advances in Signal Processing | 2012

Low-dimensional representation of Gaussian mixture model supervector for language recognition

Jinchao Yang; Xiang Zhang; Hongbin Suo; Li Lu; Jianping Zhang; Yonghong Yan

In this article, we propose a new feature for the framework of SVM-based language recognition by introducing the idea of total variability, used in speaker recognition, to language recognition. We regard the new feature as a low-dimensional representation of the Gaussian mixture model supervector, and propose a multiple total variability (MTV) language recognition system based on the total variability (TV) language recognition system. Our experiments show that the total factor vector contains language-dependent information and that the multiple total factor vector contains even more. Experimental results on the 2007 National Institute of Standards and Technology (NIST) Language Recognition Evaluation (LRE) databases show that MTV outperforms TV in the 30-second tasks, and that both the TV and MTV systems can achieve performance similar to that obtained by state-of-the-art approaches. The best performance of our acoustic language recognition systems can be further improved by combining the two new systems.
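
The final combination step can be illustrated with a simple score-level fusion of the TV and MTV system outputs; the equal weights below are a placeholder, since in practice fusion weights are trained on a development set.

```python
# Hedged sketch: linear score-level fusion of two language recognition
# systems (e.g. the TV and MTV back-ends). Weights and scores are illustrative.
import numpy as np

def fuse(scores_tv: np.ndarray, scores_mtv: np.ndarray, w: float = 0.5):
    """Both inputs are (n_utterances, n_languages) calibrated scores."""
    return w * scores_tv + (1.0 - w) * scores_mtv

tv = np.array([[1.2, -0.3, 0.1]])
mtv = np.array([[0.9, 0.2, -0.4]])
print(fuse(tv, mtv))
```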


IEEE International Conference on Cloud Computing Technology and Science | 2011

Language recognition with language total variability

Jinchao Yang; Xiang Zhang; Hongbin Suo; Li Lu; Jianping Zhang; Yonghong Yan

In this paper, we introduce the idea of total variability, used in speaker recognition, to language recognition and propose a new recognition system named the language total variability (LTV) recognition system. Our experiments show that the language total factor vector includes language-dependent information; what is more, they show that the language total factor vector contains different language-dependent information. Experimental results on the 2007 National Institute of Standards and Technology (NIST) Language Recognition Evaluation (LRE) databases show that the proposed LTV system can achieve performance similar to that obtained with state-of-the-art approaches, and further improvement is obtained by combining the new system with state-of-the-art systems. This leads to relative improvements of 5.7% in EER and 13.2% in minDCF compared with the performance of the combination of the MMI and GMM-SVM systems.


IEICE Transactions on Information and Systems | 2008

Robust Speaker Clustering Using Affinity Propagation

Xiang Zhang; Ping Lu; Hongbin Suo; Qingwei Zhao; Yonghong Yan

In this letter, a recently proposed clustering algorithm named affinity propagation is introduced for the task of speaker clustering. The algorithm executes quickly and finds clusters with low error. However, experiments show that the speaker purity of affinity propagation is not satisfactory. We therefore propose a hybrid approach that combines affinity propagation with agglomerative hierarchical clustering to improve clustering performance. Experiments show that, compared with traditional agglomerative hierarchical clustering, the hybrid method achieves better performance on the test corpora.
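
One way to read the hybrid approach is sketched below with scikit-learn: affinity propagation first over-segments the data into exemplar-level clusters, and agglomerative clustering then merges those exemplars down to the speaker count. The data, parameters, and two-speaker target are illustrative, not the letter's setup.

```python
# Hedged sketch: two-stage speaker clustering. Stage 1 uses affinity
# propagation to pick exemplars; stage 2 merges the exemplars with
# agglomerative clustering, then each segment inherits its exemplar's speaker.
import numpy as np
from sklearn.cluster import AffinityPropagation, AgglomerativeClustering

rng = np.random.default_rng(0)
segments = np.vstack([rng.normal(0, 1, (30, 12)), rng.normal(5, 1, (30, 12))])

# stage 1: affinity propagation gives fine-grained clusters with exemplars
ap = AffinityPropagation(random_state=0).fit(segments)
centers = ap.cluster_centers_                      # one exemplar per AP cluster

# stage 2: agglomerative clustering merges exemplars to the speaker count
ahc = AgglomerativeClustering(n_clusters=2, linkage="average").fit(centers)

# map every segment to the speaker label of its exemplar
speaker = ahc.labels_[ap.labels_]
print(np.bincount(speaker))                        # segments per speaker
```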

Collaboration


Dive into Hongbin Suo's collaboration.

Top Co-Authors

Yonghong Yan, Chinese Academy of Sciences
Xiang Zhang, Chinese Academy of Sciences
Ming Li, Chinese Academy of Sciences
Qingwei Zhao, Chinese Academy of Sciences
Ping Lu, Chinese Academy of Sciences
Li Lu, Chinese Academy of Sciences
Jianping Zhang, Chinese Academy of Sciences
Jinchao Yang, Chinese Academy of Sciences
Xiao Wu, Chinese Academy of Sciences
Chuan Cao, Chinese Academy of Sciences