Publication


Featured research published by Yun-Nung Chen.


international joint conference on natural language processing | 2015

Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding

Yun-Nung Chen; William Yang Wang; Anatole Gershman; Alexander I. Rudnicky

Spoken dialogue systems (SDS) typically require a predefined semantic ontology to train a spoken language understanding (SLU) module. In addition to the annotation cost, a key challenge for designing such an ontology is to define a coherent slot set while considering the slots' complex relations. This paper introduces a novel matrix factorization (MF) approach to learn latent feature vectors for utterances and semantic elements without the need for corpus annotations. Specifically, our model learns the semantic slots for a domain-specific SDS in an unsupervised fashion, and carries out semantic parsing using latent MF techniques. To further consider the global semantic structure, such as inter-word and inter-slot relations, we augment the latent MF-based model with a knowledge graph propagation model based on a slot-based semantic graph and a word-based lexical graph. Our experiments show that the proposed MF approaches produce better SLU models that are able to predict semantic slots and word patterns jointly, taking their relations and domain-specificity into account.
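
The full model in the paper is a weighted implicit-feedback MF combined with knowledge graph propagation; as a rough, toy-scale sketch of the factorization step alone, the snippet below factorizes a binary utterance-by-feature matrix (columns standing for words and candidate slots, all values invented for the example) and reads slot predictions off the dense reconstruction.

```python
# Toy sketch of the MF idea: factorize a binary utterance-by-feature matrix
# so unobserved slot entries receive scores from the latent factors.
import numpy as np

rng = np.random.default_rng(0)

# 4 utterances x 6 features (first 4 columns = words, last 2 = slots).
M = np.array([
    [1, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 1],
], dtype=float)

k, lam, lr = 3, 0.01, 0.05                       # latent dim, L2 weight, step size
U = 0.1 * rng.standard_normal((M.shape[0], k))   # utterance factors
V = 0.1 * rng.standard_normal((M.shape[1], k))   # word/slot factors

for _ in range(500):             # gradient descent on squared reconstruction error
    E = M - U @ V.T
    U += lr * (E @ V - lam * U)
    V += lr * (E.T @ U - lam * V)

scores = U @ V.T                 # dense scores: high values on unobserved slot
print(np.round(scores, 2))       # columns suggest implicit semantic slots
```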


spoken language technology workshop | 2014

Leveraging frame semantics and distributional semantics for unsupervised semantic slot induction in spoken dialogue systems

Yun-Nung Chen; William Yang Wang; Alexander I. Rudnicky

Distributional semantics and frame semantics are two representative views on language understanding in the statistical world and the linguistic world, respectively. In this paper, we combine the best of both worlds to automatically induce the semantic slots for spoken dialogue systems. Given a collection of unlabeled audio files, we exploit continuous-valued word embeddings to augment a probabilistic frame-semantic parser that identifies key semantic slots in an unsupervised fashion. In experiments, our results on a real-world spoken dialogue dataset show that the distributional word representations significantly improve the adaptation of FrameNet-style parses of ASR decodings to the target semantic space; that, compared to a state-of-the-art baseline, a 13% relative average precision improvement is achieved by leveraging word vectors trained on two 100-billion-word datasets; and that the proposed technology can be used to reduce the cost of designing task-oriented spoken dialogue systems.
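
A minimal sketch of the adaptation step, not the paper's parser or data: candidate slots (FrameNet-style frame names) are re-ranked by the cosine similarity of their embeddings to a centroid of in-domain seed words. The four-dimensional vectors, slot names, and seed words below are all invented for illustration.

```python
# Re-rank frame-semantic slot candidates by embedding similarity to the
# target domain; toy 4-d vectors stand in for large-corpus embeddings.
import numpy as np

emb = {                                   # hypothetical embedding lookup
    "food":          np.array([0.9, 0.1, 0.0, 0.2]),
    "expensiveness": np.array([0.7, 0.3, 0.1, 0.1]),
    "locale":        np.array([0.2, 0.8, 0.1, 0.0]),
    "sounds":        np.array([0.0, 0.1, 0.9, 0.3]),  # noise from ASR errors
}

domain_seeds = ["food", "locale"]         # words characterizing the domain
centroid = np.mean([emb[w] for w in domain_seeds], axis=0)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

ranked = sorted(emb, key=lambda s: cos(emb[s], centroid), reverse=True)
print(ranked)                             # in-domain slots float up, noise sinks
```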


international conference on acoustics, speech, and signal processing | 2011

Improved spoken term detection with graph-based re-ranking in feature space

Yun-Nung Chen; Chia-ping Chen; Hung-yi Lee; Chun-an Chan; Lin-Shan Lee

This paper presents a graph-based approach for spoken term detection. Each first-pass retrieved utterance is a node on a graph, and the edge between two nodes is weighted by the similarity between the two utterances evaluated in feature space. The score of each node is then modified by contributions from its neighbors via a random walk or a modified version of it, on the rationale that utterances similar to many high-scoring utterances should themselves receive higher relevance scores. In this way, the global similarity structure of all first-pass retrieved utterances can be jointly considered. Experimental results show that this new approach offers significantly better performance than the previously proposed pseudo-relevance feedback approach, which considers primarily the local similarity relationships between first-pass retrieved utterances, and that the two approaches can be cascaded to provide even better results.
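
A minimal sketch of the re-ranking update under the usual random-walk formulation: first-pass scores are interpolated with scores propagated over a row-normalized similarity graph. The similarity values and scores below are invented for the example.

```python
# Random-walk-style score propagation: s <- (1 - a) * s0 + a * W @ s,
# where W is the row-normalized utterance-similarity matrix.
import numpy as np

s0 = np.array([0.9, 0.8, 0.3, 0.2])     # first-pass relevance scores
sim = np.array([                         # pairwise utterance similarity
    [0.0, 0.7, 0.1, 0.0],
    [0.7, 0.0, 0.2, 0.1],
    [0.1, 0.2, 0.0, 0.6],
    [0.0, 0.1, 0.6, 0.0],
])
W = sim / sim.sum(axis=1, keepdims=True) # row-normalize edge weights

alpha, s = 0.6, s0.copy()
for _ in range(50):                      # iterate to (near) convergence
    s = (1 - alpha) * s0 + alpha * W @ s

print(np.round(s, 3))   # utterances near high-scoring neighbors gain score
```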


spoken language technology workshop | 2010

Automatic key term extraction from spoken course lectures using branching entropy and prosodic/semantic features

Yun-Nung Chen; Yu Huang; Sheng-yi Kong; Lin-Shan Lee

This paper proposes a set of approaches to automatically extract key terms from spoken course lectures, drawing on audio signals, ASR transcriptions, and slides. We divide the key terms into two types, key phrases and keywords, and develop different approaches to extract each in turn. We extract key phrases using right/left branching entropy, and extract keywords by learning from three sets of features: prosodic features, lexical features, and semantic features from Probabilistic Latent Semantic Analysis (PLSA). The learning approaches include an unsupervised method (K-means exemplar) and two supervised ones (AdaBoost and neural networks). Very encouraging preliminary results were obtained on a corpus of course lectures, and we find that all of the approaches and feature sets proposed here are useful.
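
As a small illustration of the branching-entropy criterion (assuming a toy token corpus, not the lecture data), the function below computes right branching entropy: the entropy of the next-token distribution stays low inside a phrase and spikes at a natural boundary. Left branching entropy is the mirror image over preceding tokens.

```python
# Right branching entropy for key-phrase boundary detection on a toy corpus.
import math
from collections import Counter

corpus = ("hidden markov model is a model . "
          "hidden markov model can be trained . "
          "hidden markov model of speech .").split()

def right_branching_entropy(tokens, phrase):
    """Entropy of the token following each occurrence of `phrase`."""
    n, followers = len(phrase), Counter()
    for i in range(len(tokens) - n):
        if tuple(tokens[i:i + n]) == phrase:
            followers[tokens[i + n]] += 1
    total = sum(followers.values())
    return -sum(c / total * math.log2(c / total) for c in followers.values())

for phrase in [("hidden",), ("hidden", "markov"), ("hidden", "markov", "model")]:
    print(phrase, round(right_branching_entropy(corpus, phrase), 3))
# Entropy is 0 inside the phrase and jumps to log2(3) after the full phrase,
# signaling "hidden markov model" as a key-phrase candidate.
```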


international conference on acoustics, speech, and signal processing | 2013

An empirical investigation of sparse log-linear models for improved dialogue act classification

Yun-Nung Chen; William Yang Wang; Alexander I. Rudnicky

Previous work on dialogue act classification has primarily focused on dense generative and discriminative models. However, since automatic speech recognition (ASR) outputs are often noisy, dense models may produce biased estimates and overfit to the training data. In this paper, we study sparse modeling approaches to improve dialogue act classification, since sparse models maintain a compact feature space that is robust to noise. To test this, we investigate various element-wise frequentist shrinkage models such as the lasso, ridge, and elastic net, as well as structured sparsity models and a hierarchical sparsity model that embed the dependency structure and interactions among local features. In our experiments on a real-world dataset, when augmenting N-best word- and phone-level ASR hypotheses with confusion network features, our best sparse log-linear model obtains a relative improvement of 19.7% over a rule-based baseline, a significant 3.7% improvement over a traditional non-sparse log-linear model, and outperforms a state-of-the-art SVM model by 2.2%.
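
The paper's models include structured and hierarchical sparsity; as a simpler stand-in, the sketch below compares plain L1 (lasso), L2 (ridge), and elastic-net penalties on a log-linear (logistic regression) classifier with scikit-learn, using an invented toy dataset in place of ASR hypotheses.

```python
# Compare sparse vs. dense penalties on the same log-linear classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

utts = ["what time is it", "could you repeat that", "set an alarm",
        "sorry say that again", "what is the weather", "please repeat"]
acts = ["question", "repeat", "command", "repeat", "question", "repeat"]

X = CountVectorizer(ngram_range=(1, 2)).fit_transform(utts)

for penalty, kwargs in [("l1", {"solver": "liblinear"}),
                        ("l2", {"solver": "liblinear"}),
                        ("elasticnet", {"solver": "saga", "l1_ratio": 0.5})]:
    clf = LogisticRegression(penalty=penalty, C=1.0, max_iter=1000, **kwargs)
    clf.fit(X, acts)
    print(penalty, "non-zero weights:", int((clf.coef_ != 0).sum()))
# The L1 / elastic-net models keep far fewer active features, which is the
# robustness-to-noisy-ASR argument the paper tests at scale.
```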


north american chapter of the association for computational linguistics | 2015

Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding

Yun-Nung Chen; William Yang Wang; Alexander I. Rudnicky

A key challenge of designing a coherent semantic ontology for spoken language understanding is to consider inter-slot relations. In practice, however, it is difficult for domain experts and professional annotators to define a coherent slot set while considering various lexical, syntactic, and semantic dependencies. In this paper, we exploit typed syntactic dependency theory for unsupervised induction and filling of semantic slots in spoken dialogue systems. More specifically, we build two knowledge graphs: a slot-based semantic graph and a word-based lexical graph. To jointly consider word-to-word, word-to-slot, and slot-to-slot relations, we use a random walk inference algorithm to combine the two knowledge graphs, guided by dependency grammars. The experiments show that considering inter-slot relations is crucial for generating a more coherent and complete slot set, resulting in a better spoken language understanding model while enhancing the interpretability of semantic slots.
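
A minimal sketch of a random walk over two coupled graphs, not the paper's exact inference algorithm: within-graph transitions are interpolated with cross-graph word-to-slot transitions, and slot scores are read from the resulting distribution. All adjacency values below are invented.

```python
# Random walk coupling a slot-slot graph S and a word-word graph L through
# a word-to-slot mapping R, so lexical and inter-slot evidence combine.
import numpy as np

S = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])  # slot graph
L = np.array([[0., 1.], [1., 0.]])                         # lexical graph
R = np.array([[1., 0., 0.], [0., 0., 1.]])                 # word -> slot links

def norm(M):
    return M / np.maximum(M.sum(axis=1, keepdims=True), 1e-12)

slot_scores, word_scores, beta = np.ones(3) / 3, np.ones(2) / 2, 0.5
for _ in range(100):                   # beta: within- vs. cross-graph weight
    slot_scores = beta * norm(S).T @ slot_scores + (1 - beta) * norm(R).T @ word_scores
    word_scores = beta * norm(L).T @ word_scores + (1 - beta) * norm(R) @ slot_scores
    slot_scores /= slot_scores.sum()   # keep both as distributions
    word_scores /= word_scores.sum()

print(np.round(slot_scores, 3))        # slots supported by both graphs win
```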


spoken language technology workshop | 2014

Deriving local relational surface forms from dependency-based entity embeddings for unsupervised spoken language understanding

Yun-Nung Chen; Dilek Hakkani-Tür; Gokhan Tur

Recent work has shown a trend toward leveraging web-scale structured semantic knowledge resources such as Freebase for open-domain spoken language understanding (SLU). Knowledge graphs provide sufficient but ambiguous relations for the same entity, which can be used as statistical background knowledge to infer possible relations when interpreting user utterances. This paper proposes an approach to capture relational surface forms by mapping dependency-based contexts of entities from the text domain to the spoken domain. Relational surface forms are learned from dependency-based entity embeddings, which encode the contexts of entities from dependency trees in a deep learning model. The derived surface forms carry functional dependency to the entities and convey explicit expressions of relations. The experiments demonstrate the effectiveness of leveraging derived relational surface forms as local cues together with prior background knowledge.
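
As a hedged illustration of where such embeddings come from (in the spirit of dependency-based embeddings, which this line of work builds on), the snippet below extracts typed dependency contexts from a tiny hand-written parse; a skip-gram-style model trained on such pairs yields entity embeddings in which entities sharing relational surface forms end up close together.

```python
# Extract (word, typed-dependency-context) training pairs from a parse.
# The tiny hand-written parse stands in for real parser output.
parse = [                        # (head, relation, dependent)
    ("directed", "nsubj", "cameron"),
    ("directed", "dobj", "avatar"),
    ("avatar", "det", "the"),
]

pairs = []
for head, rel, dep in parse:
    pairs.append((head, f"{rel}_{dep}"))      # head sees typed child context
    pairs.append((dep, f"{rel}^-1_{head}"))   # dependent sees inverse context

for word, context in pairs:
    print(word, "->", context)
# e.g. objects of "directed" share the context dobj^-1_directed, the kind of
# relational surface form the paper derives from the embedding space.
```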


international conference on acoustics, speech, and signal processing | 2012

Utterance-level latent topic transition modeling for spoken documents and its application in automatic summarization

Hung-yi Lee; Yun-Nung Chen; Lin-Shan Lee

In this paper, we propose an utterance-level latent topic transition model to estimate the latent topics behind utterances, and test its performance on extractive speech summarization. In this model, the latent topic weights behind an utterance are estimated, and these topic weights evolve from one utterance to the next in a spoken document through a topic transition function represented by a matrix. We explore different ways of obtaining the topic transition matrices used in the model, and find that using a set of matrices estimated from utterance clusters of a training spoken document set is very useful. In preliminary experiments on speech summarization, this model offered additional performance improvements when used with the widely used Probabilistic Latent Semantic Analysis (PLSA).
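
A minimal sketch of the transition idea under a simple linear assumption: consecutive topic-weight vectors from training utterances give a least-squares estimate of a transition matrix, which then predicts the next utterance's topics. The topic weights below are invented, and the paper's clustering of training utterances into multiple matrices is omitted.

```python
# Estimate a linear topic transition matrix T from consecutive utterance
# topic vectors (solve Z[:-1] @ T ~= Z[1:]), then predict the next topics.
import numpy as np

Z = np.array([                       # toy topic weights, 5 utterances x 3 topics
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.3, 0.6, 0.1],
    [0.1, 0.7, 0.2],
    [0.1, 0.4, 0.5],
])

A, B = Z[:-1], Z[1:]                       # pairs (z_t, z_{t+1})
T, *_ = np.linalg.lstsq(A, B, rcond=None)  # least-squares transition matrix

z = np.array([0.7, 0.2, 0.1])              # topic weights of a new utterance
z_next = np.clip(z @ T, 0, None)
z_next /= z_next.sum()                     # renormalize to a distribution
print(np.round(z_next, 3))                 # predicted topics of next utterance
```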


international conference on acoustics, speech, and signal processing | 2017

End-to-end joint learning of natural language understanding and dialogue manager

Xuesong Yang; Yun-Nung Chen; Dilek Z. Hakkani-Tur; Paul A. Crook; Xiujun Li; Jianfeng Gao; Li Deng

Natural language understanding and dialogue policy learning are both essential in conversational systems that predict the next system action in response to the current user utterance. Conventional approaches aggregate separate models for natural language understanding (NLU) and system action prediction (SAP) into a pipeline that is sensitive to the noisy output of error-prone NLU. To address this issue, we propose an end-to-end deep recurrent neural network with limited contextual dialogue memory that jointly trains NLU and SAP on DSTC4 multi-domain human-human dialogues. Experiments show that our proposed model significantly outperforms state-of-the-art pipeline models on both NLU and SAP, which indicates that the joint model can mitigate the effects of noisy NLU output and that the NLU model can be refined by error signals backpropagated from the extra supervision of system actions.
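
A toy-scale sketch of the joint architecture (the sizes, data, and single-LSTM encoder are stand-ins, not the DSTC4 setup): one shared encoder feeds a per-token slot-tagging head for NLU and an utterance-level action head for SAP, and the summed loss lets SAP errors backpropagate into the NLU encoder.

```python
# Joint NLU + SAP model: shared encoder, two heads, one summed loss.
import torch
import torch.nn as nn

class JointNLUSAP(nn.Module):
    def __init__(self, vocab=100, emb=32, hid=64, n_slots=5, n_acts=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        self.slot_head = nn.Linear(hid, n_slots)  # per-token slot tags (NLU)
        self.act_head = nn.Linear(hid, n_acts)    # per-utterance action (SAP)

    def forward(self, tokens):
        h, (last, _) = self.encoder(self.embed(tokens))
        return self.slot_head(h), self.act_head(last[-1])

model = JointNLUSAP()
tokens = torch.randint(0, 100, (8, 12))           # batch of 8 toy utterances
slot_gold = torch.randint(0, 5, (8, 12))
act_gold = torch.randint(0, 4, (8,))

slot_logits, act_logits = model(tokens)
loss = (nn.functional.cross_entropy(slot_logits.reshape(-1, 5), slot_gold.reshape(-1))
        + nn.functional.cross_entropy(act_logits, act_gold))
loss.backward()                                   # joint gradient update
print(float(loss))
```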


international conference on acoustics, speech, and signal processing | 2013

Bootstrapping Text-to-Speech for speech processing in languages without an orthography

Sunayana Sitaram; Sukhada Palkar; Yun-Nung Chen; Alok Parlikar; Alan W. Black

Speech synthesis technology has reached the stage where, given a well-designed corpus of audio and accurate transcriptions, an at least understandable synthesizer can be built without necessarily resorting to new innovations. However, many languages do not have a well-defined writing system, yet such languages could still greatly benefit from speech systems. In this paper we consider the case where we have a (potentially large) single-speaker database but no transcriptions and no standardized way to write them. To address this scenario, we propose a method that bootstraps synthetic voices purely from speech data, using a novel combination of automatic speech recognition and automatic word segmentation. Our experimental results on speech corpora in two languages, English and German, show that synthetic voices built with this method are close to understandable. The method is language-independent and can thus be used to build synthetic voices from a speech corpus in any new language.
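
A heavily simplified sketch of the bootstrapping intuition, with toy phone strings standing in for an ASR phone decoder and a frequency-based longest-match segmenter standing in for the paper's word segmentation (the real pipeline also re-decodes and re-trains the synthesizer iteratively):

```python
# Mine recurring phone n-grams as pseudo-words, then segment the phone
# streams so a synthesizer could be trained on pseudo-word transcriptions.
from collections import Counter

phone_streams = [                      # stand-in for phone-decoded recordings
    "h ax l ow w er l d".split(),
    "h ax l ow dh eh r".split(),
]

def frequent_ngrams(streams, max_n=4, min_count=2):
    """Recurring phone n-grams become pseudo-word candidates."""
    counts = Counter()
    for s in streams:
        for n in range(2, max_n + 1):
            for i in range(len(s) - n + 1):
                counts[tuple(s[i:i + n])] += 1
    return {g for g, c in counts.items() if c >= min_count}

def segment(stream, lexicon, max_n=4):
    """Greedy longest-match segmentation into pseudo-words."""
    words, i = [], 0
    while i < len(stream):
        for n in range(min(max_n, len(stream) - i), 0, -1):
            if n == 1 or tuple(stream[i:i + n]) in lexicon:
                words.append("".join(stream[i:i + n])); i += n
                break
    return words

lexicon = frequent_ngrams(phone_streams)
for s in phone_streams:
    print(segment(s, lexicon))         # "haxlow" emerges as a pseudo-word
```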

Collaboration


Top co-authors of Yun-Nung Chen:

Ming Sun (Carnegie Mellon University)
Lin-Shan Lee (National Taiwan University)
Shang-Yu Su (National Taiwan University)
Xiujun Li (University of Wisconsin-Madison)
William Yang Wang (Carnegie Mellon University)
Hung-yi Lee (National Taiwan University)