Publication


Featured research published by Xianyu Zhao.


International Conference on Acoustics, Speech, and Signal Processing | 2009

Variational Bayesian Joint factor analysis for speaker verification

Xianyu Zhao; Yuan Dong; Jian Zhao; Liang Lu; Jiqing Liu; Haila Wang

Joint factor analysis (JFA) has been successfully applied to speaker verification to tackle speaker and session variability. From a Bayesian standpoint, it is beneficial to account for the uncertainties in JFA to better characterize speaker enrollment and verification, e.g., by representing the target speaker model with the posterior distribution of the latent speaker factors and by evaluating the model likelihood by integrating over all latent factors. However, in a JFA model with a large number of latent factors, performing these computations exactly is computationally demanding. In this paper, an alternative approach based on variational Bayesian inference is developed to explore the uncertainties in JFA in an approximate yet efficient way. In this method, the fully correlated posterior distribution is approximated by a factorized variational distribution to facilitate inference, and a tight lower bound on the model likelihood is derived. Experimental results on the 10sec4w-10sec4w task of the 2006 NIST SRE show that variational Bayesian JFA achieves better performance than JFA using point estimates.
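To illustrate what "approximating a fully correlated posterior by a factorized distribution" means, here is a minimal toy sketch, not the paper's JFA model: it assumes a single linear-Gaussian model x = V y + e with prior y ~ N(0, I) and isotropic noise, and runs mean-field coordinate ascent so that q(y) is a product of independent Gaussians. The function name `mean_field_posterior` and all parameters are hypothetical.

```python
def mean_field_posterior(V, x, noise_var, sweeps=100):
    """Factorized Gaussian posterior q(y) = prod_i N(m_i, s_i) for x = V y + e.

    Toy assumptions: prior y ~ N(0, I), noise e ~ N(0, noise_var * I).
    V is a list of rows (n_obs x n_fac); x has length n_obs.
    """
    n_obs, n_fac = len(V), len(V[0])
    # Exact posterior precision L = I + V^T V / noise_var (generally non-diagonal,
    # i.e. the true posterior over the factors is fully correlated).
    L = [[(1.0 if i == j else 0.0)
          + sum(V[d][i] * V[d][j] for d in range(n_obs)) / noise_var
          for j in range(n_fac)] for i in range(n_fac)]
    # Linear term b = V^T x / noise_var.
    b = [sum(V[d][i] * x[d] for d in range(n_obs)) / noise_var
         for i in range(n_fac)]
    m = [0.0] * n_fac
    for _ in range(sweeps):
        # Coordinate-ascent update of each factor's mean, holding the others fixed.
        for i in range(n_fac):
            rest = sum(L[i][j] * m[j] for j in range(n_fac) if j != i)
            m[i] = (b[i] - rest) / L[i][i]
    # Mean-field variances ignore the off-diagonal precision terms.
    s = [1.0 / L[i][i] for i in range(n_fac)]
    return m, s
```

For a Gaussian model like this one, the mean-field means converge to the exact posterior means, while the factorized variances are smaller than the exact marginal variances; that loss of correlation is the price paid for tractable inference.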


International Conference on Acoustics, Speech, and Signal Processing | 2009

The effect of language factors for robust speaker recognition

Liang Lu; Yuan Dong; Xianyu Zhao; Jiqing Liu; Haila Wang

Results of the NIST speaker recognition evaluations in recent years show that speaker recognition systems developed mainly on English training data suffer from a language gap: performance on non-English trials is much worse than on English trials. This paper addresses that problem. Building on the conventional joint factor analysis model, we introduce language factors, which are meant to capture the language characteristics of each training and test utterance, and perform compensation by removing the language factors to shrink the differences between languages. Experiments on 2006 NIST SRE data show that language factor compensation alone can reduce the performance gap between English and non-English trials, and that score-level combination with eigenchannel compensation can further improve performance on non-English trials. For example, for the female subset, we observed a relative EER reduction of about 19% compared with eigenchannel session variability compensation alone.
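The compensation step, removing the language-factor component from an utterance representation, can be sketched as a subspace projection. This is a simplified sketch under stated assumptions: the language subspace `U` is assumed to have orthonormal columns and the language factors are estimated by plain least squares, whereas the paper estimates them inside the JFA model. The function name is hypothetical.

```python
def remove_language_factors(s, U):
    """Remove the part of a supervector s that lies in the language subspace.

    Assumption: U is given as a list of rows (dim x k) whose k columns are
    orthonormal, so the least-squares language-factor estimate is z = U^T s.
    """
    dim, k = len(U), len(U[0])
    z = [sum(U[d][j] * s[d] for d in range(dim)) for j in range(k)]  # language factors
    return [s[d] - sum(U[d][j] * z[j] for j in range(k)) for d in range(dim)]
```

For example, with `U = [[1.0], [0.0], [0.0]]`, the vector `[3.0, 2.0, 1.0]` becomes `[0.0, 2.0, 1.0]`: only the coordinate spanned by the (toy) language direction is removed.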


International Conference on Acoustics, Speech, and Signal Processing | 2008

Nonlinear kernel nuisance attribute projection for speaker verification

Xianyu Zhao; Yuan Dong; Hao Yang; Jian Zhao; Liang Lu; Haila Wang

Nuisance attribute projection (NAP) has been successfully applied in SVM-based speaker verification systems; it improves performance by projecting out the dimensions of the SVM feature space that cause unwanted variability in the kernel. Previous studies of NAP focused mainly on linear and generalized linear kernel SVMs. In this paper, NAP in nonlinear kernel SVMs, e.g., with polynomial or Gaussian kernels, is investigated. Instead of performing explicit feature expansion and projection in the high-dimensional feature space, kernel principal component analysis is employed to find the nuisance dimensions, and NAP is carried out implicitly by incorporating it into compensated kernel functions. Experimental results on the 2006 NIST SRE corpus indicate the effectiveness of such nonlinear kernel NAP: compared with linear NAP, nonlinear NAP with a Gaussian kernel obtained about an 11% relative improvement in equal error rate (EER).
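The idea of doing NAP implicitly through a compensated kernel can be sketched with kernel evaluations only. This sketch assumes a single nuisance direction taken as the top (uncentered) kernel-PCA direction of feature-space differences between same-speaker, different-session utterance pairs; the paper's exact choice of nuisance data, centering, and number of directions is not given in the abstract. All function names and the RBF `gamma=0.5` are hypothetical.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def top_eigpair(G, iters=200):
    """Largest eigenvalue/eigenvector of a symmetric PSD matrix by power iteration."""
    n = len(G)
    vec = [1.0] * n  # assumes the start vector is not orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(G[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in w))
        vec = [v / norm for v in w]
    lam = sum(vec[i] * sum(G[i][j] * vec[j] for j in range(n)) for i in range(n))
    return lam, vec

def nap_compensated_kernel(kernel, pairs):
    """Build k'(x, y) = <P phi(x), P phi(y)> with P = I - v v^T.

    pairs: (a, b) utterances from the same speaker in different sessions; the
    nuisance direction v is the top kernel-PCA direction of phi(a) - phi(b).
    """
    # Gram matrix of the nuisance difference vectors d_p = phi(a_p) - phi(b_p).
    G = [[kernel(ap, aq) - kernel(ap, bq) - kernel(bp, aq) + kernel(bp, bq)
          for (aq, bq) in pairs] for (ap, bp) in pairs]
    lam, beta = top_eigpair(G)
    alpha = [bt / math.sqrt(lam) for bt in beta]  # makes ||v|| = 1

    def proj(x):  # <v, phi(x)>, computed only through kernel evaluations
        return sum(al * (kernel(a, x) - kernel(b, x))
                   for al, (a, b) in zip(alpha, pairs))

    return lambda x, y: kernel(x, y) - proj(x) * proj(y)
```

With a linear kernel and toy pairs that differ only along the first coordinate, the compensated kernel reduces to the product of the second coordinates, which is a convenient sanity check; swapping in `rbf_kernel` gives the Gaussian-kernel case without ever forming the feature expansion.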


International Conference on Acoustics, Speech, and Signal Processing | 2007

SVM-Based Speaker Verification by Location in the Space of Reference Speakers

Xianyu Zhao; Yuan Dong; Hao Yang; Jian Zhao; Haila Wang

In this paper, we investigate SVM-based speaker verification by location in the space of reference speakers. A speaker's location is represented by a vector of log-likelihoods of the utterance data given reference speaker models. Channel or session variability in speaker locations, caused by microphones, acoustic environments, etc., impairs verification performance. To reduce such variability, within-class covariance normalization (WCCN), nuisance attribute projection (NAP), and their combination are applied, yielding significant performance improvements. Experimental results on a NIST SRE 2006 task show that this location SVM system achieves performance comparable to a state-of-the-art cepstral GMM-UBM verification system, and that fusing the two gives additional gains.
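The location vector described above can be sketched as follows. To keep it short, each reference speaker is assumed to be a single diagonal-covariance Gaussian rather than a full GMM, and log-likelihoods are averaged per frame; both choices are assumptions, not details from the abstract. The function names are hypothetical.

```python
import math

def diag_gauss_loglik(frame, mean, var):
    """Log-density of one feature frame under a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (f - m) ** 2 / v
                      for f, m, v in zip(frame, mean, var))

def speaker_location(frames, reference_models):
    """Location vector: average per-frame log-likelihood under each reference model.

    Assumption: each reference speaker is a single diagonal Gaussian (mean, var)
    rather than a full GMM, to keep the sketch short.
    """
    return [sum(diag_gauss_loglik(f, mean, var) for f in frames) / len(frames)
            for mean, var in reference_models]
```

The resulting fixed-length vector (one entry per reference speaker) is what the SVM, after WCCN or NAP, would consume.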


Acta Automatica Sinica | 2009

Studies on Model Distance Normalization Approach in Text-independent Speaker Verification

Yuan Dong; Liang Lu; Xianyu Zhao; Jian Zhao

Model distance normalization (D-Norm) is a useful score normalization approach in automatic speaker verification (ASV) systems. Its main advantage is that, unlike other state-of-the-art score normalization approaches, it needs no additional speech data or external speaker population. However, it has drawbacks; e.g., the Monte Carlo estimation of the Kullback-Leibler (KL) distance in conventional D-Norm is time-consuming and computationally costly. In this paper, D-Norm is investigated and its principles are explored from a perspective different from the original one. In addition, the paper proposes a simplified D-Norm that uses an upper bound on the KL divergence between two statistical speaker models as the measure of model distance. Experiments on the NIST 2006 SRE corpus show that the simplified D-Norm achieves system performance similar to the conventional approach while greatly reducing computational complexity.
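One well-known closed-form upper bound of this kind applies when both GMMs share component weights and diagonal variances and differ only in their means, as with mean-only MAP adaptation from a common UBM; by convexity of the KL divergence, the mixture KL is bounded by the weighted sum of per-component KLs. Whether this is exactly the bound used in the paper is an assumption, and the D-Norm scaling of scores by this distance is not shown. The function name is hypothetical.

```python
def kl_upper_bound(weights, means_a, means_b, variances):
    """Upper bound on KL(a || b) for two GMMs that share weights and diagonal
    variances and differ only in their means (e.g. both MAP-adapted from one UBM).

    Uses KL(a||b) <= sum_m w_m * KL(N(mu_am, S_m) || N(mu_bm, S_m)),
    where each Gaussian term reduces to 0.5 * sum_d (mu_amd - mu_bmd)^2 / var_md.
    """
    return sum(w * 0.5 * sum((ma - mb) ** 2 / v
                             for ma, mb, v in zip(mu_a, mu_b, var))
               for w, mu_a, mu_b, var in zip(weights, means_a, means_b, variances))
```

Because the bound needs only the model parameters, it replaces Monte Carlo sampling with one pass over the component means, which is the source of the speed-up the abstract describes.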


International Symposium on Chinese Spoken Language Processing | 2006

Discriminative transformation for sufficient adaptation in text-independent speaker verification

Hao Yang; Yuan Dong; Xianyu Zhao; Jian Zhao; Haila Wang

In conventional text-independent speaker verification using a Gaussian mixture model with a universal background model (GMM-UBM), the discriminability between speaker models and the UBM is crucial to system performance. In this paper, we present a method based on heteroscedastic linear discriminant analysis (HLDA) that enhances the discriminability between speaker models and the UBM. The technique aims to discriminate between the individual Gaussian distributions in the feature space. After the discriminative transformation, the overlap between Gaussian distributions can be reduced. As a result, some Gaussian components of a target speaker model can be adapted more fully during maximum a posteriori (MAP) adaptation, and these components gain more discriminative capability over the UBM. Results on the NIST 2004 speaker recognition corpora show that this method provides significant performance improvements over the baseline system.


International Symposium on Chinese Spoken Language Processing | 2008

Eigenchannel Compensation and Symmetric Score for Robust Text-Independent Speaker Verification

Yuan Dong; Jian Zhao; Liang Lu; Jiqing Liu; Xianyu Zhao; Haila Wang

The negative effect of session variability on speaker verification performance has become increasingly severe. This paper discusses eigenchannel compensation and investigates a symmetric scoring method to diminish session variability and further enhance performance. Experiments were conducted on the core tests of the 2006 and 2008 Speaker Recognition Evaluation (SRE) corpora of the National Institute of Standards and Technology (NIST). The results demonstrate that eigenchannel compensation yields a substantial improvement and that symmetric scoring, as a measure of cross similarity, further improves performance moderately. Overall, system performance improves significantly: the equal error rate (EER) drops from 9.74% to 5.08% (a 47.8% relative reduction) on SRE06 and from 16.26% to 9.42% (42.1%) on SRE08, while the detection cost function (DCF) drops from 0.0456 to 0.0263 (42.3%) on SRE06 and from 0.0692 to 0.0449 (35.1%) on SRE08.
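The percentages quoted in this abstract are relative reductions with respect to the baseline, i.e. (before - after) / before. A short check using only the figures quoted above (the function name is hypothetical):

```python
def relative_reduction(before, after):
    """Relative reduction of an error measure, as a percentage of the baseline."""
    return 100.0 * (before - after) / before

# (baseline, improved) figures quoted in the abstract
results = {
    "EER SRE06": (9.74, 5.08),
    "EER SRE08": (16.26, 9.42),
    "DCF SRE06": (0.0456, 0.0263),
    "DCF SRE08": (0.0692, 0.0449),
}
for name, (before, after) in results.items():
    print(f"{name}: {relative_reduction(before, after):.1f}%")
```

This prints 47.8%, 42.1%, 42.3%, and 35.1%, matching the reported numbers.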


Tsinghua Science & Technology | 2008

Advances in SVM-Based System Using GMM Super Vectors for Text-Independent Speaker Verification

Jian Zhao; Yuan Dong; Xianyu Zhao; Hao Yang; Liang Lu; Haila Wang

For text-independent speaker verification, the Gaussian mixture model (GMM) with a universal background model strategy and the GMM with support vector machines (SVMs) are the two most commonly used methodologies. Recently, a new SVM-based speaker verification method using GMM supervectors has been proposed. This paper describes the construction of a new speaker verification system and investigates the use of nuisance attribute projection and test normalization to further enhance performance. Experiments were conducted on the core test of the 2006 NIST speaker recognition evaluation corpus. The results indicate that an SVM-based speaker verification system using GMM supervectors can achieve appealing performance. With nuisance attribute projection and test normalization, system performance improves significantly: the equal error rate drops from 7.78% to 4.92% and the detection cost function from 0.0376 to 0.0251.
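A GMM supervector stacks the adapted component means into one long vector that an SVM can consume, and test normalization (T-norm) standardizes a trial score with statistics from impostor models scored on the same test utterance. The sketch below assumes diagonal covariances and the sqrt(weight) / standard-deviation scaling that is a common choice for linear supervector kernels; the abstract does not specify the scaling, the cohort, or whether a population or sample standard deviation is used. All function names are hypothetical.

```python
import math
import statistics

def gmm_supervector(weights, means, variances):
    """Concatenate adapted component means, each scaled by sqrt(w_m) / sigma_md.

    Assumption: diagonal covariances and the KL-divergence-motivated scaling
    often used in linear GMM-supervector kernels.
    """
    sv = []
    for w, mu, var in zip(weights, means, variances):
        sv.extend(math.sqrt(w) * m / math.sqrt(v) for m, v in zip(mu, var))
    return sv

def linear_sv_kernel(sv_a, sv_b):
    """Linear SVM kernel between two supervectors."""
    return sum(a * b for a, b in zip(sv_a, sv_b))

def t_norm(score, cohort_scores):
    """Test normalization: standardize a score by impostor-cohort statistics
    computed on the same test utterance (population std used here)."""
    mu = statistics.mean(cohort_scores)
    sigma = statistics.pstdev(cohort_scores)
    return (score - mu) / sigma
```

A 2-component, 2-dimensional GMM yields a 4-dimensional supervector; real systems use hundreds of components, so the supervector dimension is the component count times the feature dimension.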


International Conference on Acoustics, Speech, and Signal Processing | 2006

Multigrained Model Adaptation With MAP and Reference Speaker Weighting For Text-Independent Speaker Verification

Xianyu Zhao; Yuan Dong; Jun Luo; Hao Yang; Haila Wang

When traditional maximum a posteriori (MAP) adaptation is used to adapt a universal background model (UBM), model components with little or no enrollment data remain essentially unchanged in the derived speaker model. These components have weak discriminative capability against the background model and impair subsequent verification performance. In this paper, we present a new speaker adaptation method that combines MAP and reference speaker weighting (RSW) adaptation in a hierarchical, multigrained mode. It enables all model components to be updated in a way that strikes a good balance between model complexity and the available data. Experimental results on the NIST speaker recognition evaluation confirm that the new method improves performance compared with MAP or RSW adaptation alone.
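Why components with little data "remain unchanged" under MAP can be seen from standard relevance-MAP mean adaptation: each component's new mean interpolates between its data mean and the UBM mean with weight n / (n + r), where n is the component's soft frame count. The sketch assumes frame posteriors are already computed from the UBM and uses a relevance factor of 16 as an illustrative default, not a value from the paper; the RSW part of the paper's method is not shown. The function name is hypothetical.

```python
def map_adapt_means(ubm_means, frames, posteriors, relevance=16.0):
    """Relevance-MAP adaptation of GMM means.

    posteriors[t][m] is the occupation probability of component m for frame t
    (assumed precomputed from the UBM). relevance=16 is an illustrative default.
    """
    adapted = []
    for m, mu in enumerate(ubm_means):
        n = sum(p[m] for p in posteriors)  # soft frame count for component m
        if n == 0.0:
            adapted.append(list(mu))  # no enrollment data: stays at the UBM mean
            continue
        ex = [sum(p[m] * x[d] for p, x in zip(posteriors, frames)) / n
              for d in range(len(mu))]  # posterior-weighted data mean
        alpha = n / (n + relevance)  # data-dependent adaptation coefficient
        adapted.append([alpha * e + (1.0 - alpha) * u for e, u in zip(ex, mu)])
    return adapted
```

With three frames assigned entirely to the first of two components, that component moves only 3/19 of the way toward its data mean, and the second component stays exactly at its UBM mean, which is the weakness the multigrained MAP/RSW scheme targets.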


International Conference on Natural Computation | 2009

Nonlinear Nuisance Attribute Projection in Combined Kernels for SVM-Based Speaker Verification

Yuan Dong; Liang Lu; Xianyu Zhao; Haila Wang

This paper investigates nonlinear nuisance attribute projection (NAP) in combined kernels for SVM-based speaker verification. The combined-kernel approach enables the SVM classifier to use several kinds of kernels together, e.g., linear and RBF kernels, for better classification. To compensate for session variability, one of the major causes of performance degradation, nonlinear kernel NAP is used to project out the attributes in the nuisance space, which mainly contains intra-speaker variability. Experiments on the NIST 2006 SRE corpora show that the combined-kernel approach outperforms the conventional single-kernel SVM approach, and that nonlinear NAP further enhances these gains.
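A combined kernel can be sketched as a nonnegative weighted sum of base kernels, which is again a valid (positive semidefinite) kernel. The equal weights and the RBF `gamma=0.5` below are hypothetical; the abstract does not say how the paper chooses or learns the weights, and the nonlinear NAP compensation applied on top is not shown.

```python
import math

def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def combined_kernel(x, y, weights=(0.5, 0.5), kernels=(linear_kernel, rbf_kernel)):
    """Nonnegative weighted sum of base kernels; the result is again a valid kernel."""
    return sum(w * k(x, y) for w, k in zip(weights, kernels))
```

For orthogonal unit vectors such as `[1.0, 0.0]` and `[0.0, 1.0]`, the linear part contributes nothing and the combined value comes entirely from the RBF term, illustrating how each base kernel captures a different notion of similarity.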

Collaboration


Top Co-Authors

Yuan Dong

Beijing University of Posts and Telecommunications

Jian Zhao

Beijing University of Posts and Telecommunications

Liang Lu

University of Edinburgh

Hao Yang

Beijing University of Posts and Telecommunications

Jiqing Liu

Beijing University of Posts and Telecommunications