Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Jen-Tzung Chien is active.

Publication


Featured research published by Jen-Tzung Chien.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2002

Discriminant waveletfaces and nearest feature classifiers for face recognition

Jen-Tzung Chien; Chia-Chen Wu

Feature extraction, discriminant analysis, and classification rules are three crucial issues for face recognition. We present hybrid approaches that handle these three issues together. For feature extraction, we apply the multiresolution wavelet transform to extract the waveletface. We also perform linear discriminant analysis on waveletfaces to reinforce discriminant power. During classification, the nearest feature plane (NFP) and nearest feature space (NFS) classifiers are explored for robust decisions in the presence of wide facial variations. Their relationships to the conventional nearest neighbor and nearest feature line classifiers are demonstrated. In the experiments, the discriminant waveletface incorporated with the NFS classifier achieves the best face recognition performance.
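The NFP and NFS decisions reduce to point-to-line and point-to-subspace distances from the query feature to spans of stored prototypes. A minimal numpy sketch of these two distances (function names are illustrative, not from the paper):

```python
import numpy as np

def nfl_distance(x, f1, f2):
    """Distance from query x to the feature line through prototypes f1 and f2."""
    d = f2 - f1
    t = np.dot(x - f1, d) / np.dot(d, d)  # projection parameter along the line
    foot = f1 + t * d                     # foot of the perpendicular
    return np.linalg.norm(x - foot)

def nfs_distance(x, F):
    """Distance from query x to the subspace spanned by the columns of F."""
    coef, *_ = np.linalg.lstsq(F, x, rcond=None)  # least-squares projection
    return np.linalg.norm(x - F @ coef)
```

The classifier assigns the query to the class whose prototypes yield the smallest such distance.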


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Adaptive Bayesian Latent Semantic Analysis

Jen-Tzung Chien; Meng-Sung Wu

Due to the vast growth of data collections, statistical document modeling has become increasingly important in language processing. Probabilistic latent semantic analysis (PLSA) is a popular approach whereby semantics and statistics can be effectively captured for modeling. However, PLSA is highly sensitive to the task domain, which changes continuously in real-world documents. In this paper, a novel Bayesian PLSA framework is presented. We focus on an incremental learning algorithm that solves the problem of updating the model with new-domain articles. This algorithm improves document modeling by incrementally extracting up-to-date latent semantic information to match the changing domains at run time. By representing the priors of the PLSA parameters with Dirichlet densities, the posterior densities belong to the same distribution family, so a reproducible prior/posterior mechanism is available for incremental learning from constantly accumulated documents. An incremental PLSA algorithm is constructed to accomplish the parameter estimation as well as the hyperparameter updating. Compared with standard PLSA using maximum likelihood estimation, the proposed approach is capable of dynamic document indexing and modeling. We also present maximum a posteriori PLSA for corrective training. Experiments on information retrieval and document categorization demonstrate the superiority of the Bayesian PLSA methods.
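The reproducible prior/posterior mechanism rests on Dirichlet-multinomial conjugacy: the posterior after one batch of documents is again a Dirichlet, so it can serve as the prior for the next batch. A toy sketch of that hyperparameter update (a generic conjugate update, not the paper's full incremental EM procedure):

```python
import numpy as np

def dirichlet_posterior(alpha, counts):
    """Conjugate update: Dirichlet(alpha) prior + multinomial counts
    -> Dirichlet(alpha + counts) posterior, which becomes the prior
    for the next batch of documents (incremental learning)."""
    return np.asarray(alpha, float) + np.asarray(counts, float)

def map_multinomial(alpha):
    """MAP estimate of the multinomial parameters under Dirichlet(alpha)."""
    a = np.asarray(alpha, float)
    return (a - 1.0) / (a.sum() - a.size)
```

Chaining `dirichlet_posterior` over successive batches gives the same hyperparameters as a single batch over all the data, which is what makes the incremental scheme consistent.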


IEEE Signal Processing Magazine | 2012

Large-Vocabulary Continuous Speech Recognition Systems: A Look at Some Recent Advances

George Saon; Jen-Tzung Chien

Over the past decade or so, several advances have been made to the design of modern large vocabulary continuous speech recognition (LVCSR) systems to the point where their application has broadened from early speaker dependent dictation systems to speaker-independent automatic broadcast news transcription and indexing, lectures and meetings transcription, conversational telephone speech transcription, open-domain voice search, medical and legal speech recognition, and call center applications, to name a few. The commercial success of these systems is an impressive testimony to how far research in LVCSR has come, and the aim of this article is to describe some of the technological underpinnings of modern systems. It must be said, however, that, despite the commercial success and widespread adoption, the problem of large-vocabulary speech recognition is far from being solved: background noise, channel distortions, foreign accents, casual and disfluent speech, or unexpected topic change can cause automated systems to make egregious recognition errors. This is because current LVCSR systems are not robust to mismatched training and test conditions and cannot handle context as well as human listeners despite being trained on thousands of hours of speech and billions of words of text.


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Dirichlet Class Language Models for Speech Recognition

Jen-Tzung Chien; Chuang-Hua Chueh

Latent Dirichlet allocation (LDA) was successfully developed for document modeling due to its generalization to unseen documents through latent topic modeling. LDA calculates the probability of a document under the bag-of-words scheme without considering the order of words. Accordingly, LDA cannot be directly adopted to predict words in speech recognition systems. This work presents a new Dirichlet class language model (DCLM), which projects the sequence of history words onto a latent class space and calculates a marginal likelihood over the uncertainty of the classes, expressed by Dirichlet priors. A Bayesian class-based language model is established, and a variational Bayesian procedure is presented for estimating the DCLM parameters. Furthermore, the long-distance class information is continuously updated using the large-span history words and is dynamically incorporated into class mixtures for a cache DCLM. Different language models are experimentally evaluated using the Wall Street Journal (WSJ) corpus. The amount of training data and the size of vocabulary are evaluated. We find that the cache DCLM effectively characterizes unseen n-gram events and stores the class information for long-distance language modeling. This approach outperforms the other class-based and topic-based language models in terms of perplexity and recognition accuracy. The DCLM and cache DCLM achieve relative word error rate reductions of 3% to 5% over the LDA topic-based language model with different sizes of training data.
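Marginalizing over Dirichlet uncertainty is what lets a model assign nonzero probability to unseen events: the predictive distribution mixes observed counts with the prior pseudo-counts. A generic Dirichlet-multinomial predictive (an illustration of this smoothing effect, not the DCLM's variational procedure):

```python
import numpy as np

def dirichlet_predictive(counts, alpha):
    """Predictive distribution after marginalizing the multinomial
    parameters over their Dirichlet posterior (Dirichlet-multinomial):
    p(w) = (n_w + alpha_w) / sum_v (n_v + alpha_v)."""
    c = np.asarray(counts, float)
    a = np.asarray(alpha, float)
    return (c + a) / (c.sum() + a.sum())
```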


IEEE Transactions on Audio, Speech, and Language Processing | 2006

A new independent component analysis for speech recognition and separation

Jen-Tzung Chien; Bo-Cheng Chen

This paper presents a novel nonparametric likelihood ratio (NLR) objective function for independent component analysis (ICA). This function is derived through the statistical hypothesis test of independence of random observations. A likelihood ratio function is developed to measure the confidence toward independence. We accordingly estimate the demixing matrix by maximizing the likelihood ratio function and apply it to transform data into the independent component space. Conventionally, the test of independence was established by assuming Gaussian-distributed data, which is improper for realizing ICA. To avoid assuming Gaussianity in hypothesis testing, we propose a nonparametric approach in which the distributions of the random variables are calculated using kernel density functions. A new ICA is then fulfilled through the NLR objective function. Interestingly, we apply the proposed NLR-ICA algorithm for unsupervised learning of unknown pronunciation variations. The clusters of speech hidden Markov models are estimated to characterize multiple pronunciations of subword units for robust speech recognition. The NLR-ICA is also applied to separate linear mixtures of speech and audio signals. In the experiments, NLR-ICA achieves better speech recognition performance compared to parametric and nonparametric minimum-mutual-information ICA.
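Avoiding the Gaussian assumption hinges on nonparametric estimates of the marginal densities. A minimal Gaussian-kernel density estimator of the kind such methods rely on (the bandwidth h is a free parameter here, not a value from the paper):

```python
import numpy as np

def gaussian_kde(samples, x, h):
    """Nonparametric density estimate at points x from 1-D samples,
    using a Gaussian kernel with bandwidth h."""
    z = (np.asarray(x) - np.asarray(samples)[:, None]) / h
    k = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)  # kernel at each (sample, point) pair
    return k.sum(axis=0) / (len(samples) * h)
```

Because each kernel integrates to one, the estimate is itself a proper density, which is what the likelihood-ratio test of independence requires.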


IEEE Transactions on Speech and Audio Processing | 1999

Online hierarchical transformation of hidden Markov models for speech recognition

Jen-Tzung Chien

This paper proposes a novel framework of online hierarchical transformation of hidden Markov model (HMM) parameters for adaptive speech recognition. Our goal is to incrementally transform (or adapt) all the HMM parameters to a new acoustical environment even though most HMM units are unseen in the observed adaptation data. We establish a hierarchical tree of HMM units and apply the tree to dynamically search the transformation parameters for individual HMM mixture components. The transformation framework is formulated according to the approximate Bayesian estimate, where the prior statistics and the transformation parameters can be jointly and incrementally refreshed after each consecutive block of adaptation data. Using this formulation, only the refreshed prior statistics and the current block of data are needed for online transformation. In a series of speaker adaptation experiments on the recognition of 408 Mandarin syllables, we examine the effects of constructing various types of hierarchical trees. The efficiency and effectiveness of the proposed method on incremental adaptation of all HMM units are also confirmed. In addition, we demonstrate the superiority of the proposed online transformation over Huo's online adaptation (see ibid., vol.5, p.161-72, 1997) for a wide range of adaptation data.
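The key property of the online formulation is that only the refreshed prior statistics and the current data block are needed; earlier blocks never have to be stored. A one-dimensional sketch of such a recursive prior/posterior mean update (a generic conjugate update, not the paper's hierarchical transformation):

```python
import numpy as np

def incremental_map_mean(prior_mean, prior_count, block):
    """Recursive MAP-style update of a Gaussian mean: the posterior
    statistics from the current block become the prior for the next
    block, so earlier blocks need not be retained."""
    block = np.asarray(block, float)
    n = block.shape[0]
    count = prior_count + n
    mean = (prior_count * prior_mean + block.sum(axis=0)) / count
    return mean, count
```

Applying the update block by block yields the same estimate as one batch update over all the data, which is what makes the incremental scheme exact rather than approximate at this level.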


international conference on acoustics, speech, and signal processing | 2009

Latent Dirichlet learning for document summarization

Ying-Lang Chang; Jen-Tzung Chien

Automatic summarization is developed to extract the representative contents or sentences from a large corpus of documents. This paper presents a new hierarchical representation of words, sentences and documents in a corpus, and infers the Dirichlet distributions for latent topics and latent themes at the word level and sentence level, respectively. The sentence-based latent Dirichlet allocation (SLDA) is accordingly established for document summarization. Different from vector space summarization, SLDA is built to fit the fine structure of text documents and is specifically designed for sentence selection. SLDA acts as a sentence mixture model with a mixture of Dirichlet themes, which are used to generate the latent topics of the observed words. The theme model inherently distinguishes sentences in a summarization system. In the experiments, the proposed SLDA outperforms other methods for document summarization in terms of precision, recall and F-measure.
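Once per-sentence theme distributions are inferred, extractive selection can be as simple as ranking sentences by how sharply they concentrate on a theme. A toy selection step (the scoring rule is illustrative, not SLDA's actual inference):

```python
import numpy as np

def select_summary(sentence_theme_probs, k):
    """Rank sentences by their peak latent-theme probability and keep
    the indices of the top k as the extractive summary."""
    scores = np.asarray(sentence_theme_probs).max(axis=1)  # theme concentration per sentence
    return np.argsort(scores)[::-1][:k].tolist()
```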


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2008

Maximum Confidence Hidden Markov Modeling for Face Recognition

Jen-Tzung Chien; Chih-Pin Liao

This paper presents a hybrid framework of feature extraction and hidden Markov modeling (HMM) for two-dimensional pattern recognition. Importantly, we explore a new discriminative training criterion to assure model compactness and discriminability. This criterion is derived from the hypothesis test theory via maximizing the confidence of accepting the hypothesis that observations are from target HMM states rather than competing HMM states. Accordingly, we develop the maximum confidence hidden Markov modeling (MC-HMM) for face recognition. Under this framework, we merge a transformation matrix to extract discriminative facial features. The closed-form solutions to continuous-density HMM parameters are formulated. Attractively, the hybrid MC-HMM parameters are estimated under the same criterion and converged through the expectation-maximization procedure. From the experiments on the FERET database and GTFD, we find that the proposed method obtains robust segmentation in the presence of different facial expressions, orientations, and so forth. In comparison with the maximum likelihood and minimum classification error HMMs, the proposed MC-HMM achieves higher recognition accuracies with lower feature dimensions.
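The confidence criterion contrasts the likelihood of the target HMM states against the pooled competing states. A schematic log-likelihood-ratio form of that contrast (a simplified stand-in for the MC-HMM training criterion, not its closed-form solution):

```python
import numpy as np

def state_confidence(log_lik_target, log_lik_competing):
    """Log likelihood ratio of the target-state hypothesis against the
    pooled competing states: positive values favor the target."""
    return log_lik_target - np.logaddexp.reduce(np.asarray(log_lik_competing))
```

Maximizing such a confidence over the training data pushes target states away from their competitors, which is the discriminative effect the abstract describes.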


IEEE Transactions on Speech and Audio Processing | 2005

Predictive hidden Markov model selection for speech recognition

Jen-Tzung Chien; Sadaoki Furui

This paper surveys a series of model selection approaches and presents a novel predictive information criterion (PIC) for hidden Markov model (HMM) selection. The approximate Bayesian using Viterbi approach is applied for PIC selection of the best HMMs providing the largest prediction information for generalization of future data. When the perturbation of HMM parameters is expressed by a product of conjugate prior densities, the segmental prediction information is derived at the frame level without Laplacian integral approximation. In particular, a multivariate t distribution is attained to characterize the prediction information corresponding to HMM mean vector and precision matrix. When performing model selection in tree structure HMMs, we develop a top-down prior/posterior propagation algorithm for estimation of structural hyperparameters. The prediction information is determined so as to choose the best HMM tree model. Different from maximum likelihood (ML) and minimum description length (MDL) selection criteria, the parameters of PIC chosen HMMs are computed via maximum a posteriori estimation. In the evaluation of continuous speech recognition using decision tree HMMs, the PIC criterion outperforms ML and MDL criteria in building a compact tree structure with moderate tree size and higher recognition rate.


Speech Communication | 2010

Joint acoustic and language modeling for speech recognition

Jen-Tzung Chien; Chuang-Hua Chueh

In a traditional model of speech recognition, the acoustic and linguistic information sources are assumed independent of each other. The parameters of the hidden Markov model (HMM) and the n-gram are separately estimated for maximum a posteriori classification. However, speech features and lexical words are inherently correlated in natural language, and the lack of coupling between these models leads to inefficiencies. This paper reports on joint acoustic and linguistic modeling for speech recognition, using the acoustic evidence in estimating the linguistic model parameters, and vice versa, according to the maximum entropy (ME) principle. The discriminative ME (DME) models are exploited by using features from competing sentences. Moreover, a mutual ME (MME) model is built for the sentence posterior probability, which is maximized to estimate the model parameters by characterizing the dependence between acoustic and linguistic features. The N-best Viterbi approximation is presented for implementing the DME and MME models. Additionally, the new models incorporate high-order feature statistics and word regularities. In the experiments, the proposed methods increase the sentence posterior probability or model separation. Recognition errors are significantly reduced in comparison with separate HMM and n-gram model estimation, from 32.2% to 27.4% on the MATBN corpus and from 5.4% to 4.8% on the WSJ corpus (5K condition).
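The ME combination treats acoustic and linguistic scores as features in a single log-linear posterior over the N-best hypotheses. A minimal softmax-style sketch of such a posterior (the feature and weight layout are assumptions for illustration):

```python
import numpy as np

def log_linear_posterior(features, weights):
    """Maximum-entropy (log-linear) posterior over N-best hypotheses:
    p(s | x) proportional to exp(sum_i w_i * f_i(s, x)).
    features: one row of feature values per hypothesis."""
    scores = np.asarray(features, float) @ np.asarray(weights, float)
    scores -= scores.max()          # shift for numerical stability
    p = np.exp(scores)
    return p / p.sum()
```

Training then amounts to choosing the weights so the posterior favors the correct transcription over its competitors.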

Collaboration


Dive into Jen-Tzung Chien's collaborations.

Top Co-Authors

Chuang-Hua Chueh, National Cheng Kung University
Hsin-Lung Hsieh, National Cheng Kung University
Meng-Sung Wu, National Cheng Kung University
Chih-Hsien Huang, National Cheng Kung University
Chuan-Wei Ting, National Cheng Kung University
Chung-Chien Hsu, National Chiao Tung University
Man-Wai Mak, Hong Kong Polytechnic University
Ying-Lan Chang, National Cheng Kung University
Sadaoki Furui, Tokyo Institute of Technology