Chuang-Hua Chueh
National Cheng Kung University
Publications
Featured research published by Chuang-Hua Chueh.
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Jen-Tzung Chien; Chuang-Hua Chueh
Latent Dirichlet allocation (LDA) has been successfully developed for document modeling because it generalizes to unseen documents through latent topic modeling. LDA calculates the probability of a document under the bag-of-words scheme without considering the order of words. Accordingly, LDA cannot be directly adopted for word prediction in speech recognition systems. This work presents a new Dirichlet class language model (DCLM), which projects the sequence of history words onto a latent class space and calculates a marginal likelihood over the uncertainties of the classes, which are expressed by Dirichlet priors. A Bayesian class-based language model is established, and a variational Bayesian procedure is presented for estimating the DCLM parameters. Furthermore, the long-distance class information is continuously updated using the large-span history words and is dynamically incorporated into the class mixtures for a cache DCLM. Different language models are experimentally evaluated on the Wall Street Journal (WSJ) corpus, and the effects of training data size and vocabulary size are examined. We find that the cache DCLM effectively characterizes unseen n-gram events and stores the class information for long-distance language modeling. This approach outperforms other class-based and topic-based language models in terms of perplexity and recognition accuracy. The DCLM and cache DCLM achieve a relative word error rate reduction of 3% to 5% over the LDA topic-based language model for different sizes of training data.
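The marginalization over class uncertainty described above can be sketched numerically. The toy example below uses invented dimensions and pseudo-counts (not the paper's parameters): it computes a word distribution as a class mixture whose weights are the mean of a Dirichlet prior conditioned on the history.

```python
import numpy as np

# Toy sketch of a DCLM-style prediction; all values are illustrative.
rng = np.random.default_rng(0)
V, C = 5, 3            # vocabulary size, number of latent classes

# p(word | class): each row is a word distribution for one class.
word_given_class = rng.dirichlet(np.ones(V), size=C)

# Dirichlet pseudo-counts standing in for the projection of history
# words onto the class space (estimated by variational Bayes in the paper).
alpha_history = np.array([2.0, 1.0, 0.5])

# Under a Dirichlet prior, the expected class mixture is the normalized
# pseudo-count vector: E[theta_c] = alpha_c / sum(alpha).
class_weights = alpha_history / alpha_history.sum()

# Marginal word probability: sum_c p(w | c) * E[theta_c]
p_word = class_weights @ word_given_class
assert np.isclose(p_word.sum(), 1.0)
```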
Speech Communication | 2010
Jen-Tzung Chien; Chuang-Hua Chueh
In a traditional speech recognition model, acoustic and linguistic information sources are assumed to be independent of each other. The parameters of the hidden Markov model (HMM) and the n-gram are separately estimated for maximum a posteriori classification. However, speech features and lexical words are inherently correlated in natural language, and the lack of coupling between the two models leads to inefficiencies. This paper reports on joint acoustic and linguistic modeling for speech recognition, in which the acoustic evidence is used in estimating the linguistic model parameters, and vice versa, according to the maximum entropy (ME) principle. Discriminative ME (DME) models are exploited by using features from competing sentences. Moreover, a mutual ME (MME) model is built for the sentence posterior probability, which is maximized to estimate the model parameters by characterizing the dependence between acoustic and linguistic features. An N-best Viterbi approximation is presented for implementing the DME and MME models. Additionally, the new models incorporate high-order feature statistics and word regularities. In the experiments, the proposed methods increase the sentence posterior probability or the model separation. Recognition errors are significantly reduced compared with separate HMM and n-gram estimation: from 32.2% to 27.4% on the MATBN corpus and from 5.4% to 4.8% on the WSJ corpus (5K condition).
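The log-linear form behind the ME models above can be illustrated on an N-best list. This is a minimal sketch with made-up acoustic and language-model scores and hypothetical weights, not the DME/MME estimation itself:

```python
import math

# For each candidate sentence: (acoustic log-score, LM log-score).
# All feature values and weights are invented for illustration.
nbest = {
    "sentence A": (-10.0, -4.0),
    "sentence B": (-11.0, -3.5),
    "sentence C": (-13.0, -3.0),
}
weights = (1.0, 0.8)   # hypothetical lambda_acoustic, lambda_linguistic

def posterior(nbest, weights):
    """p(sentence | features) ∝ exp(sum_i lambda_i * f_i(sentence))."""
    scores = {s: sum(w * f for w, f in zip(weights, feats))
              for s, feats in nbest.items()}
    z = sum(math.exp(v) for v in scores.values())   # partition function
    return {s: math.exp(v) / z for s, v in scores.items()}

post = posterior(nbest, weights)
best = max(post, key=post.get)   # decoded sentence
```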
spoken language technology workshop | 2008
Jen-Tzung Chien; Chuang-Hua Chueh
Latent Dirichlet allocation (LDA) has been successfully applied to document modeling and classification. LDA calculates the document probability based on a bag-of-words scheme without considering the sequence of words; it discovers the topic structure at the document level, which differs from the concern of word prediction in speech recognition. In this paper, we present a new latent Dirichlet language model (LDLM) for word-sequence modeling. A new Bayesian framework is introduced by incorporating Dirichlet priors to characterize the uncertainty of the latent topics of n-gram events, and a robust topic-based language model is established accordingly. In the experiments, we implement the LDLM for continuous speech recognition and obtain better performance than the probabilistic latent semantic analysis (PLSA) based language model.
IEEE Transactions on Audio, Speech, and Language Processing | 2012
Jen-Tzung Chien; Chuang-Hua Chueh
Latent Dirichlet allocation (LDA) is a topic model that is powerful in capturing latent topic information from natural language. However, the topic information in text streams, e.g. meeting recordings, lecture transcriptions and conversational dialogues, is inherently heterogeneous and nonstationary, without explicit boundaries. It is difficult to train a precise topic model from the observed text streams. Furthermore, the usage of words in different paragraphs within a document varies with composition style. In this paper, we present a new hierarchical segmentation model (HSM) in which the heterogeneous topic information at the stream level and the word variations at the document level are characterized. We incorporate contextual topic information in stream-level segmentation. The topic similarity between sentences is used to form a beta distribution reflecting the prior knowledge of document boundaries in a text stream. The distribution of the segmentation variable is adaptively updated to achieve flexible segmentation and is used to group coherent sentences into a topic-specific document. For each pseudo-document, we further use a Markov chain to detect the stylistic segments within the document. The words in a segment are accordingly generated by the same composition style, which differs from the style of the next segment. Each segment is represented by a Markov state, so the word variations within a document are compensated. The whole model is trained by a variational Bayesian EM procedure and is evaluated on the TDT2 corpus. Experimental results show the benefits of the proposed HSM in terms of perplexity, segmentation error, detection accuracy and F measure.
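The step from sentence topic similarity to a beta-distributed boundary prior can be sketched as follows. The mapping from cosine similarity to Beta parameters is a hypothetical choice made for illustration, not the paper's parameterization:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two topic-proportion vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Topic proportions for two adjacent sentences (illustrative values).
sent_a = np.array([0.7, 0.2, 0.1])
sent_b = np.array([0.1, 0.2, 0.7])

sim = cosine(sent_a, sent_b)          # low similarity -> likely boundary

# One simple parameterization: put more prior mass on "boundary"
# when the topic similarity is low.
a, b = 1.0 + (1.0 - sim), 1.0 + sim   # Beta(a, b) over the boundary variable
prior_boundary = a / (a + b)          # mean of the Beta distribution
```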
international conference on acoustics, speech, and signal processing | 2010
Chuang-Hua Chueh; Jen-Tzung Chien
Traditional n-gram language models suffer from insufficient long-distance information. The cache language model, which captures the dynamics of word occurrences in a cache, can compensate for this weakness. This paper presents a new topic cache model for speech recognition based on the latent Dirichlet language model, where the latent topic structure is explored from n-gram events and employed for word prediction. In particular, the long-distance topic information is continuously updated from the large-span historical words and dynamically incorporated in generating the topic mixtures through Bayesian learning. The topic cache language model effectively characterizes unseen n-gram events and maintains a topic cache for long-distance language modeling. In experiments on the Wall Street Journal corpus, the proposed method achieves better performance than the baseline n-gram and other related language models in terms of perplexity and recognition accuracy.
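The continuous update of topic information from the large-span history can be sketched as below. The vocabulary, topic distributions, and interpolation weight are invented for illustration and are not the paper's estimates:

```python
import numpy as np

rng = np.random.default_rng(1)
V, K = 6, 2
word_given_topic = rng.dirichlet(np.ones(V), size=K)   # p(w | topic)
topic_counts = np.ones(K)                              # Dirichlet pseudo-counts

def observe(word_id):
    """Add each topic's posterior responsibility for an observed
    history word to the running pseudo-counts (the cache update)."""
    global topic_counts
    resp = word_given_topic[:, word_id] * topic_counts
    topic_counts = topic_counts + resp / resp.sum()

for w in [0, 0, 3]:        # large-span history words
    observe(w)

topic_mix = topic_counts / topic_counts.sum()
p_cache = topic_mix @ word_given_topic     # topic-cache word distribution
p_ngram = np.full(V, 1.0 / V)              # stand-in baseline n-gram
lam = 0.3                                  # hypothetical mixing weight
p_final = lam * p_cache + (1 - lam) * p_ngram
```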
international conference on acoustics, speech, and signal processing | 2008
Chuang-Hua Chueh; Jen-Tzung Chien
Language model adaptation aims to adapt a general model to a domain-specific model so that the adapted model matches the lexical information in test data. Minimum discrimination information (MDI) is a popular mechanism for language model adaptation: the Kullback-Leibler distance to the background model is minimized subject to the constraints found in the adaptation data. MDI adaptation with unigram constraints has been successfully applied to speech recognition owing to its computational efficiency. However, unigram features contain only low-level information about the adaptation articles, which is too coarse to attain precise adaptation. Accordingly, it is desirable to induce high-order features and explore finer-grained information for language model adaptation when the adaptation data are abundant. In this study, we focus on adaptively selecting reliable features by re-sampling and calculating statistical confidence intervals. We identify the reliable regions and build inequality constraints for MDI adaptation. In this way, the reliable intervals can be used for adaptation, so that interval estimation is achieved rather than point estimation, and the features are selected automatically within the whole procedure. In the experiments, we apply the proposed method to broadcast news transcription and obtain significant improvement over conventional MDI adaptation with unigram features for different amounts of adaptation data.
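The re-sampling step can be sketched with a bootstrap percentile interval over a toy adaptation corpus. The corpus, the word, and the confidence level are invented for the example:

```python
import random

random.seed(0)
corpus = ["stock"] * 8 + ["market"] * 5 + ["the"] * 37   # toy adaptation data

def bootstrap_interval(corpus, word, n_resamples=1000, alpha=0.05):
    """Percentile bootstrap interval for a unigram relative frequency."""
    freqs = []
    for _ in range(n_resamples):
        sample = random.choices(corpus, k=len(corpus))   # resample with replacement
        freqs.append(sample.count(word) / len(sample))
    freqs.sort()
    lo = freqs[int(alpha / 2 * n_resamples)]
    hi = freqs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

lo, hi = bootstrap_interval(corpus, "stock")
# The MDI constraint then becomes lo <= p_adapted("stock") <= hi,
# an inequality, instead of forcing the point estimate 8/50 exactly.
```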
international symposium on chinese spoken language processing | 2004
Chuang-Hua Chueh; Jen-Tzung Chien; Hsin-Min Wang
In this paper, we propose an adaptive statistical language model that successfully incorporates semantic information into an n-gram model. Traditional n-gram models exploit only the immediate context of the history. We first introduce the semantic topic as a new source for extracting long-distance information for language modeling, and then adopt the maximum entropy (ME) approach, instead of the conventional linear interpolation method, to integrate the semantic information with the n-gram model. Under the ME approach, each information source gives rise to a set of constraints, which should be satisfied to achieve the hybrid model. In the experiments, the ME language models, trained on the China Times newswire corpus, achieved a 40% perplexity reduction over the baseline bigram model.
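The constraint view of the ME approach can be illustrated with a toy log-linear model fitted so that its feature expectations match target values from two hypothetical information sources. All events, features, and targets below are invented; the fitting is plain gradient ascent on the ME dual rather than the training used in the paper:

```python
import math

events = ["e1", "e2", "e3", "e4"]
# f[0]: an n-gram-style binary feature, f[1]: a topic-style binary feature.
features = {"e1": (1, 1), "e2": (1, 0), "e3": (0, 1), "e4": (0, 0)}
targets = (0.6, 0.5)        # desired expectations E_p[f_i] (one per source)
lambdas = [0.0, 0.0]

def model_probs(lambdas):
    """p(e) ∝ exp(sum_i lambda_i * f_i(e))."""
    weights = {e: math.exp(sum(l * f for l, f in zip(lambdas, features[e])))
               for e in events}
    z = sum(weights.values())
    return {e: w / z for e, w in weights.items()}

# Gradient ascent: move each lambda toward matching its constraint.
for _ in range(5000):
    p = model_probs(lambdas)
    for i in range(2):
        model_exp = sum(p[e] * features[e][i] for e in events)
        lambdas[i] += 0.5 * (targets[i] - model_exp)

p = model_probs(lambdas)
expectations = [sum(p[e] * features[e][i] for e in events) for i in range(2)]
```

At the solution both constraints are satisfied simultaneously, which is the sense in which the ME framework integrates the two information sources rather than linearly interpolating them.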
international conference on acoustics, speech, and signal processing | 2006
Chuang-Hua Chueh; Jen-Tzung Chien
Traditionally, a speech recognition system is built under the assumption that the acoustic and linguistic information sources are independent. The parameters of the hidden Markov model and the n-gram are estimated individually and then plugged into a maximum a posteriori classification rule. However, acoustic and linguistic features are correlated in essence, and modeling performance is limited accordingly. This study aims to relax the independence assumption and achieve sophisticated acoustic and linguistic modeling for speech recognition. We propose an integrated approach based on the maximum entropy (ME) principle, in which acoustic and linguistic features are optimally merged in a unified framework. The correlations between acoustic and linguistic features are explored and properly represented in the integrated models. Owing to the flexibility of the ME model, we can further combine other high-level linguistic features. In the experiments, we apply the proposed methods to broadcast news transcription using the MATBN database and obtain significant improvement over a conventional speech recognition system built with individual maximum likelihood training.
international symposium on chinese spoken language processing | 2010
Chuang-Hua Chueh; Jen-Tzung Chien
In a robust information retrieval system, documents should be represented by considering the variations of word distributions in different paragraphs or segments. A nonstationary latent Dirichlet allocation (NLDA) model was established by incorporating a Markov chain to detect the stylistic segments in a heterogeneous document. Each segment corresponds to a particular style and is generated by different word distributions. However, NLDA is constrained to a fixed number of segments regardless of document length. This paper presents a new adaptive segment model (ASM) that adaptively builds the topic-based document model with different segment numbers. By incorporating a multinomial hidden variable with a Dirichlet prior, the inference procedure for the ASM parameters is built through a variational Bayes EM algorithm. In the experiments, the proposed ASM is evaluated for spoken document retrieval using the TDT2 corpus. ASM achieves better performance than LDA and NLDA.
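The role of a hidden variable over segment numbers can be sketched as a simple posterior comparison. The candidate likelihoods and prior pseudo-counts below are invented placeholders, not values inferred by the ASM:

```python
import numpy as np

candidate_counts = [1, 2, 3, 4]            # possible numbers of segments
# Made-up per-candidate segmentation log-likelihoods.
log_likelihood = np.array([-120.0, -104.0, -101.0, -100.5])
# Dirichlet pseudo-counts favoring fewer segments (hypothetical choice).
alpha = np.array([4.0, 3.0, 2.0, 1.0])

# Prior mean of the multinomial parameter under Dirichlet(alpha).
log_prior = np.log(alpha / alpha.sum())

# Unnormalized log posterior over the segment-count variable.
log_post = log_likelihood + log_prior
post = np.exp(log_post - log_post.max())   # stabilized softmax
post /= post.sum()
best_count = candidate_counts[int(np.argmax(post))]
```

Here the prior pulls the choice back from 4 segments to 3, illustrating how the hidden variable trades data fit against segment number.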
spoken language technology workshop | 2008
Chuang-Hua Chueh; Jen-Tzung Chien
Continuous representation of word sequences can effectively alleviate the data sparseness problem of n-gram language models, in which words are treated as discrete variables and unseen events are prone to occur. This problem becomes increasingly severe when extracting long-distance regularities for high-order n-gram models. Rather than considering the discrete word space, we construct a continuous space of word sequences in which the latent topic information is extracted. The continuous vector is formed by the topic posterior probabilities, and the least-squares projection matrix from the discrete word space to the continuous topic space is estimated accordingly. Unseen words can then be predicted through the new continuous latent topic language model. In experiments on continuous speech recognition, we obtain significant performance improvement over the conventional topic-based language model.
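The least-squares projection described above can be sketched with synthetic data: learn a matrix mapping discrete word-count vectors onto continuous topic posteriors, then embed a new history. All data here are random stand-ins, not corpus statistics:

```python
import numpy as np

rng = np.random.default_rng(2)
V, K, N = 8, 3, 20
X = rng.random((N, V))                     # word-count vectors (one per row)
X /= X.sum(axis=1, keepdims=True)          # normalize to relative frequencies
T = rng.dirichlet(np.ones(K), size=N)      # topic posteriors for each row

# Least-squares projection: find P minimizing ||X P - T||_F^2.
P, *_ = np.linalg.lstsq(X, T, rcond=None)

# Project a new history into the continuous topic space.
new_hist = rng.random(V)
new_hist /= new_hist.sum()
topic_vec = new_hist @ P                   # continuous representation
```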