Publication


Featured research published by Shih-Sian Cheng.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

BIC-Based Speaker Segmentation Using Divide-and-Conquer Strategies With Application to Speaker Diarization

Shih-Sian Cheng; Hsin-Min Wang; Hsin-Chia Fu

In this paper, we propose three divide-and-conquer approaches for Bayesian information criterion (BIC)-based speaker segmentation. The approaches detect speaker changes by recursively partitioning a large analysis window into two sub-windows and recursively verifying the merging of two adjacent audio segments using DeltaBIC, a widely adopted distance measure between two audio segments. We compare our approaches with three popular distance-based approaches, namely Chen and Gopalakrishnan's window-growing-based approach, Siegler et al.'s fixed-size sliding-window approach, and Delacourt and Wellekens's DISTBIC approach, by performing computational cost analysis and conducting speaker change detection experiments on two broadcast news data sets. The results show that the proposed approaches are more efficient and achieve higher segmentation accuracy than the compared distance-based approaches. In addition, we apply the segmentation approaches discussed in this paper to the speaker diarization task. The experimental results show that a more effective segmentation approach leads to better diarization accuracy.
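The DeltaBIC test these approaches rely on can be sketched as follows. This is a minimal illustration assuming single full-covariance Gaussian segment models and a tunable penalty weight `lam`, not the authors' implementation:

```python
import numpy as np

def delta_bic(x, y, lam=1.0):
    """DeltaBIC between two adjacent segments, each modeled by one
    full-covariance Gaussian. Positive values favor a change point
    (two models); negative values favor merging (one model)."""
    z = np.vstack([x, y])
    n, d = z.shape

    def logdet(s):
        # log-determinant of the ML (biased) covariance estimate
        return np.linalg.slogdet(np.cov(s, rowvar=False, bias=True))[1]

    # Likelihood-ratio gain of splitting, minus the BIC complexity penalty
    gain = 0.5 * (n * logdet(z) - len(x) * logdet(x) - len(y) * logdet(y))
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - lam * penalty
```

A positive value at a candidate boundary suggests a speaker change; the divide-and-conquer strategies apply this test recursively to sub-windows rather than growing a single window.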


Proc. ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition, Tokyo, 2003 | 2005

MATBN: A Mandarin Chinese Broadcast News Corpus

Hsin-Min Wang; Berlin Chen; Jen-Wei Kuo; Shih-Sian Cheng

The MATBN Mandarin Chinese broadcast news corpus contains a total of 198 hours of broadcast news from the Public Television Service Foundation (Taiwan) with corresponding transcripts. The primary purpose of this collection is to provide training and testing data for continuous speech recognition evaluation in the broadcast news domain. In this paper, we briefly introduce the speech corpus and report on some preliminary statistical analysis and speech recognition evaluation results.


IEEE Transactions on Neural Networks | 2009

Model-Based Clustering by Probabilistic Self-Organizing Maps

Shih-Sian Cheng; Hsin-Chia Fu; Hsin-Min Wang

In this paper, we consider the learning process of a probabilistic self-organizing map (PbSOM) as a model-based data clustering procedure that preserves the topological relationships between data clusters in a neural network. Based on this concept, we develop a coupling-likelihood mixture model for the PbSOM that extends the reference vectors in Kohonen's self-organizing map (SOM) to multivariate Gaussian distributions. We also derive three expectation-maximization (EM)-type algorithms, called the SOCEM, SOEM, and SODAEM algorithms, for learning the model (PbSOM) based on the maximum-likelihood criterion. SOCEM is derived by using the classification EM (CEM) algorithm to maximize the classification likelihood; SOEM is derived by using the EM algorithm to maximize the mixture likelihood; and SODAEM is a deterministic annealing (DA) variant of SOCEM and SOEM. Moreover, by shrinking the neighborhood size, SOCEM and SOEM can be interpreted, respectively, as DA variants of the CEM and EM algorithms for Gaussian model-based clustering. The experimental results show that the proposed PbSOM learning algorithms achieve comparable data clustering performance to that of the deterministic annealing EM (DAEM) approach, while maintaining the topology-preserving property.


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Automatic Speaker Clustering Using a Voice Characteristic Reference Space and Maximum Purity Estimation

Wei-Ho Tsai; Shih-Sian Cheng; Hsin-Min Wang

This paper investigates the problem of automatically grouping unknown speech utterances based on their associated speakers. In attempts to determine which utterances should be grouped together, it is necessary to measure the voice similarities between utterances. Since most existing methods measure the inter-utterance similarities based directly on the spectrum-based features, the resulting clusters may not be well-related to speakers, but to various acoustic classes instead. This study remedies this shortcoming by projecting utterances onto a reference space trained to cover the generic voice characteristics underlying the whole utterance collection. The resultant projection vectors naturally reflect the relationships of voice similarities among all the utterances, and hence are more robust against interference from nonspeaker factors. Then, a clustering method based on maximum purity estimation is proposed, with the aim of maximizing the similarities between utterances within all the clusters. This method employs a genetic algorithm to determine the cluster to which each utterance should be assigned, which overcomes the limitation of conventional hierarchical clustering that the final result can only reach a local optimum. In addition, the proposed clustering method adapts a Bayesian information criterion to determine how many clusters should be created.
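As a rough illustration of the kind of within-cluster objective such a genetic search can maximize (the paper's exact maximum purity estimator differs), consider the average pairwise similarity inside clusters; `sim` here is a hypothetical precomputed utterance-similarity matrix:

```python
import numpy as np

def within_cluster_similarity(sim, labels):
    """Average pairwise similarity between utterances that share a cluster.
    A genetic algorithm can search over `labels` to maximize this score,
    avoiding the greedy local optima of hierarchical merging."""
    total, pairs = 0.0, 0
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        for i in range(len(idx)):
            for j in range(i + 1, len(idx)):
                total += sim[idx[i], idx[j]]
                pairs += 1
    return total / pairs if pairs else 0.0
```

A correct speaker assignment scores higher than a shuffled one whenever same-speaker utterances are genuinely more similar to each other.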


International Conference on Acoustics, Speech, and Signal Processing | 2005

Clustering speech utterances by speaker using Eigenvoice-motivated vector space models

Wei-Ho Tsai; Shih-Sian Cheng; Yi-Hsiang Chao; Hsin-Min Wang

The paper investigates the problem of automatically grouping unknown speech utterances based on their associated speakers. The proposed method utilizes the vector space model, which was originally developed in document-retrieval research, to characterize each utterance as a tf-idf-based vector of acoustic terms, thereby deriving a reliable measurement of similarity between utterances. To define the required acoustic terms that are most representative in terms of voice characteristics, the Eigenvoice approach is applied to the utterances to be clustered, which creates a set of eigenvector-based terms. To further improve speaker-clustering performance, the proposed method encompasses a mechanism of blind relevance feedback for refining the inter-utterance similarity measure.
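The tf-idf weighting and cosine comparison can be sketched as follows; the term-count matrix here is a stand-in for counts of the Eigenvoice-derived acoustic terms, whose construction is beyond this sketch:

```python
import numpy as np

def tfidf_cosine(term_counts):
    """tf-idf vectors from per-utterance acoustic-term counts, and the
    cosine-similarity matrix between utterances. Assumes every utterance
    is non-empty and every term occurs in at least one utterance."""
    # Term frequency: counts normalized per utterance
    tf = term_counts / term_counts.sum(axis=1, keepdims=True)
    # Inverse document frequency over the utterance collection
    idf = np.log(len(term_counts) / (term_counts > 0).sum(axis=0))
    v = tf * idf
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    return v @ v.T  # cosine similarities between all utterance pairs
```

Utterances dominated by the same acoustic terms end up with high cosine similarity, which is the inter-utterance measure the clustering then operates on.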


EURASIP Journal on Advances in Signal Processing | 2004

A model-selection-based self-splitting Gaussian mixture learning with application to speaker identification

Shih-Sian Cheng; Hsin-Min Wang; Hsin-Chia Fu

We propose a self-splitting Gaussian mixture learning (SGML) algorithm for Gaussian mixture modeling. The SGML algorithm is deterministic and is able to find an appropriate number of components of the Gaussian mixture model (GMM) based on a self-splitting validity measure, the Bayesian information criterion (BIC). It starts with a single component in the feature space and splits adaptively during the learning process until the most appropriate number of components is found. The SGML algorithm also performs well in learning a GMM with a given component number. In our experiments on clustering of a synthetic data set and the text-independent speaker identification task, we observed the ability of SGML to perform model-based clustering and to automatically determine the model complexity of the speaker GMMs for speaker identification.
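The BIC-driven choice of component number that SGML relies on can be illustrated on toy 1-D data; the plain EM with deterministic quantile initialization below is a stand-in for the paper's self-splitting procedure, not a reimplementation of it:

```python
import numpy as np

def em_gmm_1d(x, k, iters=50):
    """Fit a 1-D Gaussian mixture with EM and return its log-likelihood."""
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)  # deterministic spread init
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: per-point, per-component densities and responsibilities
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted updates of weights, means, variances
        nk = r.sum(axis=0) + 1e-12
        w, mu = nk / len(x), (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return np.log(dens.sum(axis=1)).sum()

def select_k_by_bic(x, k_max=4):
    """Try k = 1..k_max and keep the k with the lowest BIC; the parameter
    count 3k - 1 covers k means, k variances, and k - 1 free weights."""
    bics = {k: -2.0 * em_gmm_1d(x, k) + (3 * k - 1) * np.log(len(x))
            for k in range(1, k_max + 1)}
    return min(bics, key=bics.get)
```

On clearly bimodal data the BIC penalty stops the component count at two: the likelihood gain of extra components no longer outweighs the added parameters.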


International Conference on Pattern Recognition | 2006

A Prototypes-Embedded Genetic K-means Algorithm

Shih-Sian Cheng; Yi-Hsiang Chao; Hsin-Min Wang; Hsin-Chia Fu

This paper presents a genetic algorithm (GA) for K-means clustering. Instead of the widely applied string-of-group-numbers encoding, we encode the prototypes of the clusters into the chromosomes. The crossover operator is designed to exchange prototypes between two chromosomes. The one-step K-means algorithm is used as the mutation operator. Hence, the proposed GA is called the prototypes-embedded genetic K-means algorithm (PGKA). With the inherent evolution process of evolutionary algorithms, PGKA outperforms the classical K-means algorithm; compared with other GA-based approaches, PGKA is more efficient and better suited to large-scale data sets.
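The one-step K-means mutation operator can be sketched directly (the prototype-exchange crossover would then swap rows between two such prototype arrays); the data shapes here are illustrative:

```python
import numpy as np

def one_step_kmeans(x, prototypes):
    """One K-means iteration, as used for PGKA's mutation operator:
    assign each point to its nearest prototype, then move each
    prototype to the centroid of its assigned points."""
    # Squared Euclidean distance from every point to every prototype
    d = ((x[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    new = prototypes.copy()
    for k in range(len(prototypes)):
        members = x[labels == k]
        if len(members):  # leave prototypes of empty clusters untouched
            new[k] = members.mean(axis=0)
    return new, labels
```

Because each chromosome stores the prototype array itself, applying this step as mutation nudges a candidate solution toward a local K-means fixed point while the GA explores globally.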


International Conference on Multimedia and Expo | 2011

Automatic annotation of Web videos

Shih-Wei Sun; Yu-Chiang Frank Wang; Yao-Ling Hung; Chia-Ling Chang; Kuan-Chieh Chen; Shih-Sian Cheng; Hsin-Min Wang; Hong-Yuan Mark Liao

Most Web videos are captured in uncontrolled environments (e.g., by freely moving cameras at low resolution), which makes automatic video annotation very difficult. To address this problem, we present a robust moving foreground object detection method followed by the integration of features collected from heterogeneous domains. We advance SIFT feature matching and present a probabilistic framework to construct consensus foreground object templates (CFOT). The CFOT can detect moving foreground objects of interest across video frames, which allows us to extract visual features from foreground regions of interest. Together with audio features, these improve the resulting annotation accuracy. We conduct experiments and achieve promising results on a Web video dataset collected from YouTube.


International Journal of Speech Technology | 2004

The SoVideo Mandarin Chinese Broadcast News Retrieval System

Hsin-Min Wang; Shih-Sian Cheng; Yong-cheng Chen

This paper describes the SoVideo broadcast news retrieval system for Mandarin Chinese. The system is based on technologies such as large vocabulary continuous speech recognition for Mandarin Chinese, automatic story segmentation, and information retrieval. Currently, the database consists of 177 hours of broadcast news, which yielded 3,264 stories by automatic story segmentation. We discuss the development and evaluation of each component of the retrieval system.


International Conference on Acoustics, Speech, and Signal Processing | 2008

BIC-based audio segmentation by divide-and-conquer

Shih-Sian Cheng; Hsin-Min Wang; Hsin-Chia Fu

Audio segmentation has received increasing attention in recent years for its potential applications in automatic indexing and transcription of audio data. Among existing audio segmentation approaches, the BIC-based approach proposed by Chen and Gopalakrishnan is the most well-known for its high accuracy. However, this window-growing-based segmentation approach suffers from a high computation cost. In this paper, we propose using the efficient divide-and-conquer strategy in audio segmentation. Our approaches detect acoustic changes by recursively partitioning an analysis window into two sub-windows using DeltaBIC. The results of experiments conducted on broadcast news data demonstrate that our approaches not only incur a lower computation cost but also achieve higher segmentation accuracy than window-growing-based segmentation.

Collaboration


Dive into Shih-Sian Cheng's collaboration.

Top Co-Authors

Hsin-Chia Fu
National Chiao Tung University

Wei-Ho Tsai
National Taipei University of Technology

Berlin Chen
National Taiwan Normal University

Chia-Ping Chen
National Sun Yat-sen University

Chun-Han Tseng
National Sun Yat-sen University