
Publication


Featured research published by Puyang Xu.


IEEE Automatic Speech Recognition and Understanding Workshop | 2013

Convolutional neural network based triangular CRF for joint intent detection and slot filling

Puyang Xu; Ruhi Sarikaya

We describe a joint model for intent detection and slot filling based on convolutional neural networks (CNN). The proposed architecture can be perceived as a neural network (NN) version of the triangular CRF model (TriCRF), in which the intent label and the slot sequence are modeled jointly and their dependencies are exploited. Our slot filling component is a globally normalized CRF-style model, as opposed to the left-to-right models in recent NN-based slot taggers. Its features are automatically extracted through CNN layers and shared by the intent model. We show that our slot model component generates state-of-the-art results, outperforming CRF significantly. Our joint model outperforms the standard TriCRF by 1% absolute for both intent and slot. On a number of other domains, our joint model achieves 0.7-1% and 0.9-2.1% absolute gains over the independent modeling approach for intent and slot, respectively.
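The shared-feature idea can be sketched in miniature: one feature function feeds both an intent scorer and a sequence-level (Viterbi-decoded) slot tagger. This is a toy sketch with an invented vocabulary, labels, and weights, and with one-hot word indicators standing in for the paper's CNN features; it is not the trained model.

```python
# Toy sketch of joint intent/slot scoring over shared features
# (hypothetical vocabulary and weights, not the paper's model).

INTENTS = ["book_flight", "play_music"]
SLOTS = ["O", "B-city", "B-song"]

def token_features(word):
    # Stand-in for CNN-extracted features: a one-hot word indicator.
    return {f"w={word}": 1.0}

# Hypothetical weights shared conceptually via the same feature space.
INTENT_W = {("book_flight", "w=boston"): 2.0, ("play_music", "w=jazz"): 2.0}
SLOT_W = {("B-city", "w=boston"): 2.0, ("B-song", "w=jazz"): 2.0}
TRANS = {("O", "O"): 0.1}  # CRF-style transition scores; default 0.0

def score_intent(words, intent):
    return sum(INTENT_W.get((intent, f), 0.0) * v
               for w in words for f, v in token_features(w).items())

def viterbi_slots(words):
    # Globally normalized, sequence-level decoding over slot tags.
    prev = {s: 0.0 for s in SLOTS}
    back = []
    for w in words:
        feats = token_features(w)
        cur, ptr = {}, {}
        for s in SLOTS:
            emit = sum(SLOT_W.get((s, f), 0.0) * v for f, v in feats.items())
            best_p = max(prev, key=lambda p: prev[p] + TRANS.get((p, s), 0.0))
            cur[s] = prev[best_p] + TRANS.get((best_p, s), 0.0) + emit
            ptr[s] = best_p
        back.append(ptr)
        prev = cur
    tag = max(prev, key=prev.get)  # backtrace from the best final tag
    tags = [tag]
    for ptr in reversed(back[1:]):
        tag = ptr[tag]
        tags.append(tag)
    return list(reversed(tags))

words = ["fly", "to", "boston"]
intent = max(INTENTS, key=lambda i: score_intent(words, i))
slots = viterbi_slots(words)
```

The point of the sketch is the sharing: both decisions read the same per-token features, which is what lets the joint model exploit intent/slot dependencies.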


IEEE Automatic Speech Recognition and Understanding Workshop | 2009

Self-supervised discriminative training of statistical language models

Puyang Xu; Damianos Karakos; Sanjeev Khudanpur

A novel self-supervised discriminative training method for estimating language models for automatic speech recognition (ASR) is proposed. Unlike traditional discriminative training methods that require transcribed speech, only untranscribed speech and a large text corpus are required. An exponential form is assumed for the language model, as done in maximum entropy estimation, but the model is trained from the text using a discriminative criterion that targets word confusions actually witnessed in first-pass ASR output lattices. Specifically, model parameters are estimated to maximize the likelihood ratio between words w in the text corpus and w's cohorts in the test speech, i.e. other words that w competes with in the test lattices. Empirical results are presented to demonstrate statistically significant improvements over a 4-gram language model on a large vocabulary ASR task.
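The likelihood-ratio objective can be illustrated with a minimal gradient sketch. The cohort sets, learning rate, and unigram features below are invented stand-ins for what would come from real first-pass ASR lattices.

```python
import math
from collections import defaultdict

# Minimal sketch of the cohort-based discriminative update
# (illustrative cohorts, not real lattice confusions).

theta = defaultdict(float)  # one weight per word (unigram features)

# For each word w in the text corpus, its "cohorts": words w was
# confusable with in first-pass lattices on untranscribed speech.
cohorts = {"there": ["their", "they're"], "two": ["to", "too"]}

def update(w, lr=0.5):
    # One gradient step on log [ exp(theta_w) / sum_v exp(theta_v) ],
    # where v ranges over w and its cohort set.
    comp = [w] + cohorts[w]
    z = sum(math.exp(theta[v]) for v in comp)
    for v in comp:
        p = math.exp(theta[v]) / z
        theta[v] += lr * ((1.0 if v == w else 0.0) - p)

for _ in range(50):
    for w in cohorts:
        update(w)
```

After training, each text-corpus word is scored above the words it competed with in the lattices, which is exactly the confusion the criterion targets.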


International Conference on Acoustics, Speech, and Signal Processing | 2014

Contextual domain classification in spoken language understanding systems using recurrent neural network

Puyang Xu; Ruhi Sarikaya

In a multi-domain, multi-turn spoken language understanding session, information from the history often greatly reduces the ambiguity of the current turn. In this paper, we apply the recurrent neural network (RNN) to exploit contextual information for query domain classification. The Jordan-type RNN directly sends the vector of output distribution to the next query turn as additional input features to the convolutional neural network (CNN). We evaluate our approach against SVM with and without contextual features. On our contextually labeled dataset, we observe a 1.4% absolute (8.3% relative) improvement in classification error rate over the non-contextual SVM, and 0.9% absolute (5.5% relative) improvement over the contextual SVM.
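The Jordan-style recurrence is easy to sketch: the previous turn's output posterior is simply appended to the current turn's input features. A toy linear classifier with hypothetical weights stands in for the paper's CNN.

```python
import math

# Sketch of Jordan-style context: the previous turn's domain posterior
# becomes extra input features for the current turn (toy weights).

DOMAINS = ["weather", "music"]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def classify_turn(text_feats, prev_posterior, W_text, W_ctx):
    scores = []
    for d in range(len(DOMAINS)):
        s = sum(W_text[d].get(f, 0.0) for f in text_feats)
        # Contextual features: the previous turn's output distribution.
        s += sum(W_ctx[d][k] * prev_posterior[k] for k in range(len(DOMAINS)))
        scores.append(s)
    return softmax(scores)

# Hypothetical weights: "play" is ambiguous; context favors the
# domain the session is already in.
W_text = [{"rain": 2.0}, {"song": 2.0, "play": 0.3}]
W_ctx = [[1.5, 0.0], [0.0, 1.5]]

uniform = [0.5, 0.5]
p1 = classify_turn(["song"], uniform, W_text, W_ctx)  # clearly music
p2 = classify_turn(["play"], p1, W_text, W_ctx)       # ambiguous turn
```

With the first turn's posterior fed forward, the ambiguous second turn is classified more confidently than it would be from its own words alone.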


International Conference on Acoustics, Speech, and Signal Processing | 2012

Hallucinated n-best lists for discriminative language modeling

Kenji Sagae; Maider Lehr; Emily Prud'hommeaux; Puyang Xu; Nathan Glenn; Damianos Karakos; Sanjeev Khudanpur; Brian Roark; Murat Saraclar; Izhak Shafran; Daniel M. Bikel; Chris Callison-Burch; Yuan Cao; Keith B. Hall; Eva Hasler; Philipp Koehn; Adam Lopez; Matt Post; Darcey Riley

This paper investigates semi-supervised methods for discriminative language modeling, whereby n-best lists are “hallucinated” for given reference text and are then used for training n-gram language models using the perceptron algorithm. We perform controlled experiments on a very strong baseline English CTS system, comparing three methods for simulating ASR output, and compare the results with training with “real” n-best list output from the baseline recognizer. We find that methods based on extracting phrasal cohorts - similar to methods from machine translation for extracting phrase tables - yielded the largest gains of our three methods, achieving over half of the WER reduction of the fully supervised methods.
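The hallucinate-then-train loop can be sketched as follows. The phrasal cohort table here is invented (real cohorts are extracted from ASR output), and unigram features stand in for richer n-gram features.

```python
from collections import defaultdict

# Sketch: hallucinate pseudo n-best hypotheses from reference text via
# phrasal cohorts, then train a perceptron reranker (toy cohort table).

COHORTS = {("great",): [("grate",)], ("their",): [("there",)]}

def hallucinate(reference):
    # Emit pseudo-hypotheses by swapping in confusable phrases.
    hyps = []
    for i, w in enumerate(reference):
        for alt in COHORTS.get((w,), []):
            hyps.append(reference[:i] + list(alt) + reference[i + 1:])
    return hyps

def unigram_feats(sent):
    f = defaultdict(int)
    for w in sent:
        f[w] += 1
    return f

def perceptron_train(references, epochs=3):
    w = defaultdict(float)
    score = lambda s: sum(w[f] * v for f, v in unigram_feats(s).items())
    for _ in range(epochs):
        for ref in references:
            # Reference last, so initial ties pick a hallucinated hypothesis
            # and force an update.
            hyps = hallucinate(ref) + [ref]
            best = max(hyps, key=score)
            if best != ref:  # structured perceptron update toward reference
                for f, v in unigram_feats(ref).items():
                    w[f] += v
                for f, v in unigram_feats(best).items():
                    w[f] -= v
    return w

w = perceptron_train([["a", "great", "idea"], ["their", "turn"]])
```

The trained weights prefer the reference phrasing over its hallucinated confusions, mirroring how real n-best lists from a recognizer would be used.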


International Conference on Acoustics, Speech, and Signal Processing | 2002

Structurally discriminative graphical models for automatic speech recognition - results from the 2001 Johns Hopkins Summer Workshop

Geoffrey Zweig; Jeff A. Bilmes; Thomas S. Richardson; Karim Filali; Karen Livescu; Puyang Xu; K. Jackson; Yigal Brandman; Eric D. Sandness; E. Holtz; J. Torres; B. Byrne

In recent years there has been growing interest in discriminative parameter training techniques, resulting from notable improvements in speech recognition performance on tasks ranging in size from digit recognition to Switchboard. Typified by Maximum Mutual Information training, these methods assume a fixed statistical modeling structure, and then optimize only the associated numerical parameters (such as means, variances, and transition matrices). In this paper, we explore the significantly different methodology of discriminative structure learning. Here, the fundamental dependency relationships between random variables in a probabilistic model are learned in a discriminative fashion, and are learned separately from the numerical parameters. In order to apply the principles of structural discriminability, we adopt the framework of graphical models, which allows an arbitrary set of variables with arbitrary conditional independence relationships to be modeled at each time frame. We present results using a new graphical modeling toolkit (described in a companion paper) from the recent 2001 Johns Hopkins Summer Workshop. These results indicate that significant gains result from discriminative structural analysis of both conventional MFCC and novel AM-FM features on the Aurora continuous digits task.


International Conference on Acoustics, Speech, and Signal Processing | 2012

Semi-supervised discriminative language modeling for Turkish ASR

Arda Çelebi; Hasim Sak; Erinç Dikici; Murat Saraclar; Maider Lehr; Emily Prud'hommeaux; Puyang Xu; Nathan Glenn; Damianos Karakos; Sanjeev Khudanpur; Brian Roark; Kenji Sagae; Izhak Shafran; Daniel M. Bikel; Chris Callison-Burch; Yuan Cao; Keith B. Hall; Eva Hasler; Philipp Koehn; Adam Lopez; Matt Post; Darcey Riley

We present our work on semi-supervised learning of discriminative language models where the negative examples for sentences in a text corpus are generated using confusion models for Turkish at various granularities, specifically, word, sub-word, syllable and phone levels. We experiment with different language models and various sampling strategies to select competing hypotheses for training with a variant of the perceptron algorithm. We find that morph-based confusion models with a sample selection strategy aiming to match the error distribution of the baseline ASR system give the best performance. We also observe that substituting half of the supervised training examples with those obtained in a semi-supervised manner gives similar results.


International Conference on Acoustics, Speech, and Signal Processing | 2012

Continuous space discriminative language modeling

Puyang Xu; Sanjeev Khudanpur; Maider Lehr; Emily Prud'hommeaux; Nathan Glenn; Damianos Karakos; Brian Roark; Kenji Sagae; Murat Saraclar; Izhak Shafran; Daniel M. Bikel; Chris Callison-Burch; Yuan Cao; Keith B. Hall; Eva Hasler; Philipp Koehn; Adam Lopez; Matt Post; Darcey Riley

Discriminative language modeling is a structured classification problem. Log-linear models have been previously used to address this problem. In this paper, the standard dot-product feature representation used in log-linear models is replaced by a non-linear function parameterized by a neural network. Embeddings are learned for each word and features are extracted automatically through the use of convolutional layers. Experimental results show that as a stand-alone model the continuous space model yields significantly lower word error rate (1% absolute), while having a much more compact parameterization (60%-90% smaller). When combined with the baseline scores, our approach performs equally well.


IEEE Automatic Speech Recognition and Understanding Workshop | 2011

Randomized maximum entropy language models

Puyang Xu; Sanjeev Khudanpur; Asela Gunawardana

We address the memory problem of maximum entropy language models (MELM) with very large feature sets. Randomized techniques are employed to remove all large, exact data structures in MELM implementations. To avoid the dictionary structure that maps each feature to its corresponding weight, the feature hashing trick [1] [2] can be used. We also replace the explicit storage of features with a Bloom filter. We show with extensive experiments that false positive errors of Bloom filters and random hash collisions do not degrade model performance. Both perplexity and WER improvements are demonstrated by building MELM that would otherwise be prohibitively large to estimate or store.
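Both randomized structures are straightforward to sketch. The array sizes, hash counts, and features below are illustrative choices, not the paper's settings.

```python
import hashlib

# Sketch of the two randomized structures: feature hashing maps any
# feature string to a slot in a fixed weight array (no feature->index
# dictionary), and a Bloom filter replaces explicit feature storage.

M = 1 << 12              # size of hashed weight array (illustrative)
weights = [0.0] * M

def h(s, seed):
    return int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)

def feat_slot(feature):
    # Hashing trick: collisions are tolerated rather than resolved.
    return h(feature, 0) % M

# Bloom filter over the observed feature set (k = 3 hash functions).
BLOOM_BITS = 1 << 14
bloom = bytearray(BLOOM_BITS // 8)

def bloom_add(feature):
    for seed in (1, 2, 3):
        i = h(feature, seed) % BLOOM_BITS
        bloom[i // 8] |= 1 << (i % 8)

def bloom_maybe_contains(feature):
    return all(bloom[(i := h(feature, seed) % BLOOM_BITS) // 8] >> (i % 8) & 1
               for seed in (1, 2, 3))

for f in ["the cat", "cat sat"]:
    bloom_add(f)
    weights[feat_slot(f)] += 0.5   # stand-in for a training update

def feature_weight(f):
    # Consult the hashed weights only for features the filter may
    # contain; a false positive merely reads a (likely zero) slot.
    return weights[feat_slot(f)] if bloom_maybe_contains(f) else 0.0
```

Neither structure stores feature strings, which is the source of the memory savings; the paper's finding is that the resulting hash collisions and false positives do not hurt accuracy.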


Neural Information Processing Systems | 2011

A Model for Temporal Dependencies in Event Streams

Asela Gunawardana; Christopher Meek; Puyang Xu


Empirical Methods in Natural Language Processing | 2011

Efficient Subsampling for Training Complex Language Models

Puyang Xu; Asela Gunawardana; Sanjeev Khudanpur

Collaboration


Dive into Puyang Xu's collaborations.

Top Co-Authors

Kenji Sagae

University of Southern California
