Publications


Featured research published by Ashish Vaswani.


north american chapter of the association for computational linguistics | 2016

Supertagging With LSTMs.

Ashish Vaswani; Yonatan Bisk; Kenji Sagae; Ryan Musa

In this paper we present new state-of-the-art performance on CCG supertagging and parsing. Our model outperforms existing approaches by an absolute gain of 1.5%. We analyze the performance of several neural models and demonstrate that while feed-forward architectures can compete with bidirectional LSTMs on POS tagging, models that encode the complete sentence are necessary for the long-range syntactic information encoded in supertags.
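
The contrast the abstract draws, between feed-forward taggers that see only a local window and bidirectional LSTMs that encode the whole sentence, can be made concrete with a small sketch. The model below is a generic BiLSTM tagger in PyTorch, not the authors' implementation; the vocabulary size, embedding width, and tag count are placeholder values.

```python
# Minimal BiLSTM sequence tagger (generic sketch, placeholder sizes).
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size=50_000, emb_dim=128, hidden_dim=256, num_tags=500):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # bidirectional=True lets each position see the full sentence, which the
        # paper argues is needed for long-range supertag information.
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, word_ids):                 # (batch, seq_len)
        h, _ = self.lstm(self.embed(word_ids))   # (batch, seq_len, 2 * hidden_dim)
        return self.out(h)                       # per-token tag scores

if __name__ == "__main__":
    tagger = BiLSTMTagger()
    scores = tagger(torch.randint(0, 50_000, (2, 12)))  # toy batch of 2 sentences
    print(scores.shape)                                  # torch.Size([2, 12, 500])
```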


empirical methods in natural language processing | 2014

Aligning context-based statistical models of language with brain activity during reading

Leila Wehbe; Ashish Vaswani; Kevin Knight; Tom M. Mitchell

Many statistical models for natural language processing exist, including context-based neural networks that (1) model the previously seen context as a latent feature vector, (2) integrate successive words into the context using some learned representation (embedding), and (3) compute output probabilities for incoming words given the context. On the other hand, brain imaging studies have suggested that during reading, the brain (a) continuously builds a context from the successive words, and every time it encounters a word it (b) fetches its properties from memory and (c) integrates it with the previous context with a degree of effort that is inversely proportional to how probable the word is. This hints at a parallelism between the neural networks and the brain in modeling context (1 and a), representing the incoming words (2 and b), and integrating them (3 and c). We explore this parallelism to better understand the brain processes and the neural networks' representations. We study the alignment between the latent vectors used by neural networks and brain activity observed via magnetoencephalography (MEG) when subjects read a story. For that purpose we apply the neural network to the same text the subjects are reading, and explore the ability of these three vector representations to predict the observed word-by-word brain activity. Our results show, first, that before a new word i is read, brain activity is well predicted by the neural network's latent representation of context, and the predictability decreases as the brain integrates the word and changes its own representation of context. Second, the neural network embedding of word i can predict the MEG activity when word i is presented to the subject, revealing that it is correlated with the brain's own representation of word i. Moreover, we find that the activity is predicted in different regions of the brain with varying delay. The delay is consistent with the placement of each region on the processing pathway that starts in the visual cortex and moves to higher-level regions. Finally, we show that the output probability computed by the neural network agrees with the brain's own assessment of the probability of word i, as it can be used to predict the brain activity after word i's properties have been fetched from memory and the brain is in the process of integrating it into the context.
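
The core analysis described above, predicting word-by-word MEG activity from the network's context, embedding, and probability representations, reduces to a regularized linear mapping evaluated on held-out words. The snippet below is a minimal, hypothetical illustration of that step using scikit-learn ridge regression on random stand-in data; the dimensions and the data are placeholders, not the study's recordings or its actual network.

```python
# Sketch: regress per-word brain activity onto a network's latent vectors.
# All data here are random stand-ins with hypothetical dimensions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_words, vec_dim, n_sensors = 1000, 200, 306   # placeholder sizes

context_vectors = rng.normal(size=(n_words, vec_dim))  # network's latent context
meg_activity = rng.normal(size=(n_words, n_sensors))   # recorded response per word

X_tr, X_te, y_tr, y_te = train_test_split(
    context_vectors, meg_activity, test_size=0.2, random_state=0)
model = Ridge(alpha=1.0).fit(X_tr, y_tr)
# A high held-out R^2 would indicate the latent vectors track brain activity.
print("held-out R^2:", model.score(X_te, y_te))
```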


empirical methods in natural language processing | 2014

Beyond Parallel Data: Joint Word Alignment and Decipherment Improves Machine Translation

Qing Dou; Ashish Vaswani; Kevin Knight

Inspired by previous work in which decipherment is used to improve machine translation, we propose a new idea that combines word alignment and decipherment into a single learning process. We use EM to estimate the model parameters, maximizing not only the probability of the parallel corpus but also that of the monolingual corpus. We apply our approach to improve Malagasy-English machine translation, where only a small amount of parallel data is available. In our experiments, we observe gains of 0.9 to 2.1 BLEU over a strong baseline.
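
The combined objective the abstract describes has two likelihood terms: one from the parallel corpus, and one from the monolingual corpus where the source side is marginalized out as in decipherment. The toy computation below only illustrates that combination; the translation table, the unigram source model, and the word pairs are made-up stand-ins, not the paper's model.

```python
# Toy illustration of a joint objective: parallel likelihood plus
# monolingual (decipherment-style) likelihood under one translation table.
import math

t = {"maison": {"house": 0.7, "home": 0.3},   # toy t(f | e)
     "chat":   {"cat": 0.9, "chat": 0.1}}
p_e = {"house": 0.4, "home": 0.2, "cat": 0.3, "chat": 0.1}  # toy source unigram model

parallel = [("house", "maison"), ("cat", "chat")]  # observed (e, f) pairs
monolingual_f = ["maison", "chat"]                 # target-only words

log_parallel = sum(math.log(t[f][e]) for e, f in parallel)
# Without a source side we marginalize over all candidate source words.
log_mono = sum(math.log(sum(p_e[e] * t[f].get(e, 0.0) for e in p_e))
               for f in monolingual_f)

print("joint objective:", log_parallel + log_mono)
```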


north american chapter of the association for computational linguistics | 2016

Simple, Fast Noise-Contrastive Estimation for Large RNN Vocabularies.

Barret Zoph; Ashish Vaswani; Jonathan May; Kevin Knight

We present a simple algorithm to efficiently train language models with noise-contrastive estimation (NCE) on graphics processing units (GPUs). Our NCE-trained language models achieve significantly lower perplexity on the One Billion Word Benchmark language modeling challenge, and contain one-sixth of the parameters of the best single model in Chelba et al. (2013). When incorporated into a strong Arabic-English machine translation system, they give a strong boost in translation quality. We release a toolkit so that others may also train large-scale, large-vocabulary LSTM language models with NCE, parallelizing computation across multiple GPUs.
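
NCE avoids the full softmax over a large vocabulary by turning each training step into a binary classification between the observed word and k sampled noise words, which is what makes large-vocabulary GPU training cheap. The function below is the generic NCE loss formulation, not the paper's released toolkit; k and the noise log-probabilities are placeholders.

```python
# Generic noise-contrastive estimation (NCE) loss sketch for an LM output layer.
import math
import torch
import torch.nn.functional as F

def nce_loss(scores_target, scores_noise, logq_target, logq_noise, k):
    """scores_*: unnormalized model log-scores; logq_*: log noise probabilities.
    Shapes: scores_target (batch,), scores_noise (batch, k)."""
    # delta(x) = s(x) - log(k * q(x)); the observed word should be classified
    # as data, the k sampled words as noise.
    delta_target = scores_target - (math.log(k) + logq_target)
    delta_noise = scores_noise - (math.log(k) + logq_noise)
    # -log sigmoid(x) == softplus(-x)
    loss = F.softplus(-delta_target) + F.softplus(delta_noise).sum(dim=1)
    return loss.mean()

if __name__ == "__main__":
    batch, k = 4, 16
    loss = nce_loss(torch.randn(batch), torch.randn(batch, k),
                    torch.full((batch,), -9.0), torch.full((batch, k), -9.0), k)
    print(loss.item())
```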


north american chapter of the association for computational linguistics | 2016

Name Tagging for Low-resource Incident Languages based on Expectation-driven Learning

Boliang Zhang; Xiaoman Pan; Tianlu Wang; Ashish Vaswani; Heng Ji; Kevin Knight; Daniel Marcu

In this paper we tackle a challenging name tagging problem in an emergent setting: the tagger needs to be built within a few hours for a new incident language (IL) using very few resources. Inspired by observing how human annotators attack this challenge, we propose a new expectation-driven learning framework. In this framework we rapidly acquire, categorize, structure, and zoom in on IL-specific expectations (rules, features, patterns, gazetteers, etc.) from various non-traditional sources: consulting and encoding linguistic knowledge from native speakers, mining and projecting patterns from both monolingual and cross-lingual corpora, and typing based on cross-lingual entity linking. We also propose a cost-aware combination approach to compose expectations. Experiments on seven low-resource languages demonstrate the effectiveness and generality of this framework: we are able to set up a name tagger for a new IL within two hours and achieve 33.8%-65.1% F-score.
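
As a heavily simplified illustration of composing such expectations, the sketch below combines a tiny gazetteer with one surface pattern under hand-set weights. The names, pattern, and weights are hypothetical and only convey the flavor of a cost-aware combination; they are not the paper's actual rules or system.

```python
# Toy combination of two "expectations": a mined gazetteer and a projected
# surface pattern, weighted by hand-set costs (all values hypothetical).
import re

gazetteer = {"kathmandu": "GPE", "unicef": "ORG"}
patterns = [(re.compile(r"^[A-Z][a-z]+pur$"), "GPE", 0.6)]
GAZETTEER_WEIGHT = 0.9

def tag(token):
    votes = {}
    label = gazetteer.get(token.lower())
    if label:
        votes[label] = votes.get(label, 0.0) + GAZETTEER_WEIGHT
    for pattern, pat_label, weight in patterns:
        if pattern.match(token):
            votes[pat_label] = votes.get(pat_label, 0.0) + weight
    return max(votes, key=votes.get) if votes else "O"

print([(t, tag(t)) for t in ["UNICEF", "Janakpur", "visited", "Kathmandu"]])
```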


international joint conference on natural language processing | 2015

Unifying Bayesian Inference and Vector Space Models for Improved Decipherment

Qing Dou; Ashish Vaswani; Kevin Knight; Chris Dyer

We introduce into Bayesian decipherment a base distribution derived from similarities of word embeddings. We use Dirichlet multinomial regression (Mimno and McCallum, 2012) to learn a mapping between ciphertext and plaintext word embeddings from non-parallel data. Experimental results show that the base distribution is highly beneficial to decipherment, improving state-of-the-art decipherment accuracy from 45.8% to 67.4% for Spanish/English, and from 5.1% to 11.2% for Malagasy/English.
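
A rough picture of how embedding similarity can supply a base distribution for the sampler: map a ciphertext word's embedding through a learned matrix and normalize its similarity to plaintext candidates. The matrix and embeddings below are random stand-ins, and this omits the Dirichlet multinomial regression machinery the paper actually uses.

```python
# Sketch: embedding similarity -> base distribution over plaintext candidates
# for one ciphertext word (random stand-in embeddings and mapping).
import numpy as np

rng = np.random.default_rng(0)
d = 50
plain_vocab = ["house", "home", "cat", "dog"]
plain_emb = {w: rng.normal(size=d) for w in plain_vocab}
cipher_emb = rng.normal(size=d)            # embedding of one ciphertext word
M = rng.normal(size=(d, d)) * 0.01         # learned cipher->plain mapping (stand-in)

scores = np.array([cipher_emb @ M @ plain_emb[w] for w in plain_vocab])
base = np.exp(scores - scores.max())
base /= base.sum()                         # base distribution fed to the sampler
print(dict(zip(plain_vocab, base.round(3))))
```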


north american chapter of the association for computational linguistics | 2015

Model Invertibility Regularization: Sequence Alignment With or Without Parallel Data.

Tomer Levinboim; Ashish Vaswani; David Chiang

We present Model Invertibility Regularization (MIR), a method that jointly trains two directional sequence alignment models, one in each direction, and takes into account the invertibility of the alignment task. By coupling the two models through their parameters (as opposed to through their inferences, as in Liang et al.'s Alignment by Agreement (ABA) and Ganchev et al.'s Posterior Regularization (PostCAT)), our method seamlessly extends to all IBM-style word alignment models as well as to alignment without parallel data. Our proposed algorithm is mathematically sound and inherits convergence guarantees from EM. We evaluate MIR on two tasks: (1) on word alignment, applying MIR to fertility-based models, we attain higher F-scores than ABA and PostCAT; (2) on Japanese-to-English back-transliteration without parallel data, applied to the decipherment model of Ravi and Knight, MIR learns sparser models that close the gap in whole-name error rate by 33% relative to a model trained on parallel data, and further beats a previous approach by Mylonakis et al.
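
One way to picture the parameter coupling, as a loose sketch rather than the paper's exact objective, is a penalty that pushes the composition of the two directional translation tables toward the identity.

```python
# Loose illustration of an invertibility-style penalty coupling two
# directional translation tables (not the exact MIR objective).
import numpy as np

rng = np.random.default_rng(0)

def row_normalize(m):
    return m / m.sum(axis=1, keepdims=True)

T_ef = row_normalize(rng.random((5, 6)))   # P(f | e), source -> target
T_fe = row_normalize(rng.random((6, 5)))   # P(e | f), target -> source

# If the two directions agree on an invertible alignment, composing them
# should roughly recover the identity.
penalty = np.linalg.norm(T_ef @ T_fe - np.eye(5)) ** 2
print("invertibility penalty:", penalty)
```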


empirical methods in natural language processing | 2016

Unsupervised Neural Hidden Markov Models.

Ke M. Tran; Yonatan Bisk; Ashish Vaswani; Daniel Marcu; Kevin Knight

In this work, we present the first results for neuralizing an unsupervised Hidden Markov Model. We evaluate our approach on tag induction. Our approach outperforms existing generative models and is competitive with the state of the art, though with a simpler model that is easily extended to include additional context.
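
The "neuralization" mentioned above can be pictured as replacing the HMM's transition and emission lookup tables with distributions computed from tag and word embeddings. The sketch below is a generic parameterization with placeholder sizes, not the paper's exact architecture.

```python
# Sketch: HMM transition and emission distributions produced from embeddings
# instead of count-based lookup tables (generic illustration, placeholder sizes).
import torch
import torch.nn as nn

K, V, d = 10, 5000, 64        # hidden tags, vocabulary size, embedding dim

tag_emb = nn.Embedding(K, d)
word_emb = nn.Embedding(V, d)
trans_mlp = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, K))

tags = torch.arange(K)
log_trans = torch.log_softmax(trans_mlp(tag_emb(tags)), dim=-1)          # (K, K)
log_emit = torch.log_softmax(tag_emb(tags) @ word_emb.weight.T, dim=-1)  # (K, V)
print(log_trans.shape, log_emit.shape)
```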


neural information processing systems | 2017

Attention is All you Need

Ashish Vaswani; Noam Shazeer; Niki Parmar; Jakob Uszkoreit; Llion Jones; Aidan N. Gomez; Lukasz Kaiser; Illia Polosukhin
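
No abstract is shown for this entry. For reference, the scaled dot-product attention at the heart of the Transformer introduced in this paper computes softmax(QK^T / sqrt(d_k)) V; the snippet below is a minimal single-head sketch with placeholder shapes, not the paper's full multi-head, multi-layer model.

```python
# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
# Minimal single-head sketch with placeholder tensor shapes.
import math
import torch

def attention(q, k, v):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (..., len_q, len_k)
    return torch.softmax(scores, dim=-1) @ v           # (..., len_q, d_v)

q = torch.randn(2, 8, 64)    # (batch, query positions, d_k)
k = torch.randn(2, 10, 64)
v = torch.randn(2, 10, 64)
print(attention(q, k, v).shape)   # torch.Size([2, 8, 64])
```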


empirical methods in natural language processing | 2013

Decoding with Large-Scale Neural Language Models Improves Translation

Ashish Vaswani; Yinggong Zhao; Victoria Fossum; David Chiang

Collaboration


Dive into Ashish Vaswani's collaborations.

Top Co-Authors

David Chiang (University of Notre Dame)
Kevin Knight (University of Southern California)
Antonio Roque (University of Southern California)
David R. Traum (University of Southern California)
Susan Robinson (University of Southern California)