Yi-Hsiang Chao
National Chiao Tung University
Publications
Featured research published by Yi-Hsiang Chao.
International Conference on Acoustics, Speech, and Signal Processing | 2005
Wei-Ho Tsai; Shih-Sian Cheng; Yi-Hsiang Chao; Hsin-Min Wang
The paper investigates the problem of automatically grouping unknown speech utterances based on their associated speakers. The proposed method utilizes the vector space model, which was originally developed in document-retrieval research, to characterize each utterance as a tf-idf-based vector of acoustic terms, thereby deriving a reliable measurement of similarity between utterances. To define the required acoustic terms that are most representative in terms of voice characteristics, the Eigenvoice approach is applied to the utterances to be clustered, which creates a set of eigenvector-based terms. To further improve speaker-clustering performance, the proposed method encompasses a mechanism of blind relevance feedback for refining the inter-utterance similarity measure.
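The vector space model mentioned above can be illustrated with a minimal sketch. It assumes each utterance has already been reduced to a list of discrete acoustic term labels (the names `e1` to `e4` are hypothetical), and it uses a plain tf-idf weighting with cosine similarity. It does not reproduce the eigenvoice-based term definition or the blind relevance feedback step from the paper.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each non-empty document (a list of term labels) to a sparse tf-idf dict."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical utterances, each a bag of acoustic term labels.
utterances = [
    ["e1", "e1", "e2"],
    ["e1", "e2", "e2"],
    ["e3", "e3", "e4"],
]
vecs = tfidf_vectors(utterances)
sim_same = cosine(vecs[0], vecs[1])  # utterances sharing terms
sim_diff = cosine(vecs[0], vecs[2])  # utterances with no common terms
```

In this toy setting the first two utterances share terms and score a high similarity, while the third shares none and scores zero. A clustering step would then group utterances by these pairwise similarities.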
International Conference on Pattern Recognition | 2006
Shih-Sian Cheng; Yi-Hsiang Chao; Hsin-Min Wang; Hsin-Chia Fu
This paper presents a genetic algorithm (GA) for K-means clustering. Instead of the widely applied string-of-group-numbers encoding, we encode the prototypes of the clusters into the chromosomes. The crossover operator is designed to exchange prototypes between two chromosomes. The one-step K-means algorithm is used as the mutation operator. Hence, the proposed GA is called the prototypes-embedded genetic K-means algorithm (PGKA). With the inherent evolution process of evolutionary algorithms, PGKA outperforms the classical K-means algorithm; compared with other GA-based approaches, PGKA is more efficient and better suited to large-scale data sets.
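The ingredients named in the abstract (prototype-encoded chromosomes, a crossover that exchanges prototypes, and one K-means step as mutation) can be sketched as follows. This is an illustrative reading only: the one-point crossover, the elitist selection, the population size, and the function names (`pgka_sketch`, `kmeans_step`) are assumptions, not details taken from the paper.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def sse(points, protos):
    """Sum of squared distances from each point to its nearest prototype."""
    return sum(min(dist2(p, c) for c in protos) for p in points)

def kmeans_step(points, protos):
    """One K-means iteration (assign, then re-average), used here as mutation."""
    groups = [[] for _ in protos]
    for p in points:
        nearest = min(range(len(protos)), key=lambda i: dist2(p, protos[i]))
        groups[nearest].append(p)
    # An empty cluster keeps its old prototype (an assumption of this sketch).
    return [
        tuple(sum(x) / len(g) for x in zip(*g)) if g else c
        for g, c in zip(groups, protos)
    ]

def crossover(a, b, rng):
    """Exchange prototypes between two chromosomes at a random cut (needs k >= 2)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def pgka_sketch(points, k, pop_size=6, generations=10, seed=0):
    rng = random.Random(seed)
    # Each chromosome is a list of k prototypes, initialised from data points.
    pop = [rng.sample(points, k) for _ in range(pop_size)]
    for _ in range(generations):
        rng.shuffle(pop)
        children = []
        for a, b in zip(pop[0::2], pop[1::2]):
            children.extend(crossover(a, b, rng))
        children = [kmeans_step(points, c) for c in children]
        # Elitist selection: keep the chromosomes with the lowest SSE.
        pop = sorted(pop + children, key=lambda c: sse(points, c))[:pop_size]
    return pop[0]

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
best = pgka_sketch(points, k=2)
best_sse = sse(points, best)
```

Because the chromosome stores prototypes rather than one group label per data point, its length depends on the number of clusters and not on the size of the data set, which is consistent with the abstract's claim about large data sets.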
International Conference on Acoustics, Speech, and Signal Processing | 2005
Yi-Hsiang Chao; Hsin-Min Wang; Ruei-Chuan Chang
Clearly, the linear discriminant classifier is not robust enough to cope with most real-world data classification problems. Kernel Fisher discriminant analysis (KFDA) tries to increase the expressiveness of the discriminant based on the high order statistics of the data set. In this paper, we propose the GMM-based KFDA with the Bhattacharyya kernel to obtain a transformation, called a speaker eigenspace, based on which the transformed MFCC features are more discriminative for speaker recognition. In our approach, the eigenspace is directly constructed from the complete GMM parameter set, rather than the supervectors considering mean vectors only as in the eigenvoice approach. Moreover, FDA, which is believed to be more appropriate for classification accuracy than principal component analysis (PCA), is applied for eigenspace construction. The speaker identification experiments show that the new features outperform the MFCC features, in particular when the amount of enrollment data for each speaker is very small.
International Conference on Pattern Recognition | 2006
Yi-Hsiang Chao; Wei-Ho Tsai; Hsin-Min Wang; Ruei-Chuan Chang
Real-world applications often involve a binary hypothesis testing problem in which one of the two hypotheses is ill-defined and hard to characterize precisely by a single measure. In this paper, we develop a framework that integrates multiple hypothesis testing measures into a unified decision basis, and apply kernel-based classification techniques, namely, kernel Fisher discriminant (KFD) and support vector machine (SVM), to optimize the integration. Experiments conducted on speaker verification demonstrate the superiority of our approaches over the predominant approaches.
IEEE Transactions on Audio, Speech, and Language Processing | 2008
Yi-Hsiang Chao; Wei-Ho Tsai; Hsin-Min Wang; Ruei-Chuan Chang
Speaker verification can be viewed as a task of modeling and testing two hypotheses: the null hypothesis and the alternative hypothesis. Since the alternative hypothesis involves unknown impostors, it is usually hard to characterize a priori. In this paper, we propose improving the characterization of the alternative hypothesis by designing two decision functions based, respectively, on a weighted arithmetic combination and a weighted geometric combination of discriminative information derived from a set of pretrained background models. The parameters associated with the combinations are then optimized using two kernel discriminant analysis techniques, namely, the kernel Fisher discriminant (KFD) and support vector machine (SVM). The proposed approaches have two advantages over existing methods. The first is that they embed a trainable mechanism in the decision functions. The second is that they convert variable-length utterances into fixed-dimension characteristic vectors, which are easily processed by kernel discriminant analysis. The results of speaker-verification experiments conducted on two speech corpora show that the proposed methods outperform conventional likelihood ratio-based approaches.
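One plausible way to picture the two combinations is as log-likelihood ratios in which the alternative hypothesis is a weighted arithmetic or a weighted geometric mean of background-model likelihoods. The sketch below assumes that reading. The function names, the fixed weights, and the log-likelihood values are hypothetical, and the KFD or SVM training of the weights described in the abstract is not shown.

```python
import math

def wgc_llr(log_p_target, log_p_background, weights):
    """LLR with a weighted geometric combination of background likelihoods.

    log(prod_i p_i ** w_i) equals sum_i w_i * log p_i.
    """
    return log_p_target - sum(w * lp for w, lp in zip(weights, log_p_background))

def wac_llr(log_p_target, log_p_background, weights):
    """LLR with a weighted arithmetic combination of background likelihoods.

    log(sum_i w_i * p_i) is computed with the log-sum-exp trick for stability.
    """
    m = max(log_p_background)
    s = sum(w * math.exp(lp - m) for w, lp in zip(weights, log_p_background))
    return log_p_target - (m + math.log(s))

# Hypothetical log-likelihoods of one utterance under the target model
# and under three pretrained background models.
log_p_target = -10.0
log_p_background = [-12.0, -14.0, -11.0]
weights = [0.5, 0.3, 0.2]  # assumed non-negative and summing to 1

score_wgc = wgc_llr(log_p_target, log_p_background, weights)
score_wac = wac_llr(log_p_target, log_p_background, weights)
```

With weights that sum to one, the weighted arithmetic mean is never smaller than the weighted geometric mean, so `score_wac` cannot exceed `score_wgc`. Either score would be compared against a threshold to accept or reject the claimed speaker.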
Speech Communication | 2014
Yi-Hsiang Chao
Kernel methods are powerful techniques that have been widely discussed and successfully applied to pattern recognition problems. Kernel-based speaker verification has also been developed using the concept of a sequence kernel, which can deal with variable-length patterns such as speech. However, constructing a proper kernel that is closely tied to speaker verification is still an open issue. In this paper, we propose newly defined kernels derived from the likelihood ratio (LR) test, named the LR-based kernels, which integrate kernel methods with the LR-based speaker verification framework tightly and intuitively by embedding an LR in the kernel function. The proposed kernels have two advantages over existing methods. The first is that they can compute the kernel function without needing to represent the variable-length speech as a fixed-dimension vector in advance. The second is that they have a trainable mechanism in the kernel computation using the Multiple Kernel Learning (MKL) algorithm. Our experimental results show that the proposed methods outperform conventional speaker verification approaches.
Multimedia and Ubiquitous Engineering | 2013
Yi-Hsiang Chao
The support vector machine (SVM) has proven powerful in pattern recognition problems. SVM-based speaker verification has also been developed using the concept of a sequence kernel, which can deal with variable-length patterns such as speech. In this paper, we propose a new kernel function, named the Log-Likelihood Ratio (LLR)-based composite sequence kernel. This kernel can be jointly optimized with the SVM training via the Multiple Kernel Learning (MKL) algorithm, and it can also operate on speech utterances intuitively in the kernel function by embedding an LLR in the sequence kernel. Our experimental results show that the proposed method outperforms the conventional speaker verification approaches.
Fuzzy Systems and Knowledge Discovery | 2012
Yi-Hsiang Chao
T-norm and GMM-UBM have been two predominant log-likelihood ratio (LLR)-based approaches for speaker verification over the last decade. In this paper, we embed T-norm and GMM-UBM in new approaches based on the cross likelihood ratio (CLR), named the pairwise LLR measures, for speaker identification tasks. These pairwise LLR measures can compensate to some extent for the conventional speaker identification method, especially when client speaker models are not robust. Our experimental results show that the proposed pairwise LLR methods outperform the conventional GMM-UBM speaker identification approach.
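For readers unfamiliar with the two baseline scores, the sketch below shows the textbook forms: a GMM-UBM log-likelihood ratio, and T-norm as standardization of a raw score against cohort-model scores. The pairwise, CLR-based measures proposed in the paper are not reproduced, since the abstract does not give their form. All names and numbers are hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics

def gmm_ubm_llr(log_p_client, log_p_ubm):
    """Log-likelihood ratio of a client model against a universal background model."""
    return log_p_client - log_p_ubm

def t_norm(raw_score, cohort_scores):
    """Standardize a raw score by the mean and spread of cohort-model scores.

    Assumes the cohort scores are not all identical (non-zero deviation).
    """
    mu = statistics.mean(cohort_scores)
    sigma = statistics.pstdev(cohort_scores)
    return (raw_score - mu) / sigma

def identify(scores_by_speaker):
    """Closed-set identification: pick the speaker with the highest score."""
    return max(scores_by_speaker, key=scores_by_speaker.get)

# Hypothetical log-likelihoods of one test utterance.
log_p_ubm = -45.0
log_p_clients = {"spk_a": -41.0, "spk_b": -44.5}

llr_scores = {
    name: gmm_ubm_llr(lp, log_p_ubm) for name, lp in log_p_clients.items()
}
winner = identify(llr_scores)
normalized = t_norm(llr_scores["spk_a"], [1.0, 2.0, 3.0])
```

Identification here simply takes the speaker with the largest score, so any measure that makes scores more comparable across weakly trained client models can change which speaker wins.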
Multimedia Signal Processing | 2011
Yi-Hsiang Chao
In likelihood ratio (LR)-based speaker verification, it is usually difficult to characterize the alternative hypothesis precisely. To better characterize the alternative hypothesis, we propose to incorporate two effective speaker verification approaches based on weighted geometric combination (WGC) and weighted arithmetic combination (WAC) into the support vector machine (SVM) via a new sequence kernel function, named the LR-based composite sequence kernel. This new kernel can be regarded as a unified framework for characterizing the alternative hypothesis by virtue of the complementary information that the WGC and WAC approaches can contribute. Our experimental results show that the proposed sequence kernel method outperforms the conventional speaker verification approaches.
International Symposium on Chinese Spoken Language Processing | 2010
Yi-Hsiang Chao; Wei-Ho Tsai; Hsin-Min Wang
The support vector machine (SVM) has proven powerful in binary classification problems. To adapt the SVM to the speaker verification problem, the concept of a sequence kernel has been developed, which maps variable-length speech data into fixed-dimension vectors. However, constructing a suitable sequence kernel for speaker verification is still an open issue. In this paper, we propose a new sequence kernel, named the log-likelihood ratio (LLR)-based sequence kernel, to incorporate LLR-based speaker verification approaches into the SVM without needing to represent variable-length speech data as fixed-dimension vectors in advance. Our experimental results show that the proposed sequence kernels outperform the conventional kernel-based approaches.