Ashish Vaswani
University of Southern California
Publications
Featured research published by Ashish Vaswani.
north american chapter of the association for computational linguistics | 2016
Ashish Vaswani; Yonatan Bisk; Kenji Sagae; Ryan Musa
In this paper, we present new state-of-the-art performance on CCG supertagging and parsing. Our model outperforms existing approaches by an absolute gain of 1.5%. We analyze the performance of several neural models and demonstrate that while feed-forward architectures can compete with bidirectional LSTMs on POS tagging, models that encode the complete sentence are necessary for the long-range syntactic information encoded in supertags.
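The comparison hinges on encoding the full sentence with a bidirectional LSTM rather than a fixed local window. Below is a minimal sketch of such a BiLSTM supertagger in PyTorch; the vocabulary size, tag-set size, and layer dimensions are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class BiLSTMSupertagger(nn.Module):
    """Sketch of a bidirectional LSTM sequence tagger (hypothetical sizes)."""
    def __init__(self, vocab_size=10000, num_tags=425, emb_dim=100, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # The bidirectional LSTM encodes the complete sentence, giving every
        # position access to long-range context from both directions.
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_tags)

    def forward(self, word_ids):
        # word_ids: (batch, seq_len) tensor of word indices
        states, _ = self.encoder(self.embed(word_ids))
        return self.out(states)  # (batch, seq_len, num_tags) supertag scores

# Toy usage: score supertags for two sentences of seven word ids each.
tagger = BiLSTMSupertagger()
logits = tagger(torch.randint(0, 10000, (2, 7)))
print(logits.shape)  # torch.Size([2, 7, 425])
```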
empirical methods in natural language processing | 2014
Leila Wehbe; Ashish Vaswani; Kevin Knight; Tom M. Mitchell
Many statistical models for natural language processing exist, including context-based neural networks that (1) model the previously seen context as a latent feature vector, (2) integrate successive words into the context using some learned representation (embedding), and (3) compute output probabilities for incoming words given the context. On the other hand, brain imaging studies have suggested that during reading, the brain (a) continuously builds a context from the successive words, and every time it encounters a word it (b) fetches its properties from memory and (c) integrates it with the previous context with a degree of effort that is inversely proportional to how probable the word is. This hints at a parallelism between the neural networks and the brain in modeling context (1 and a), representing the incoming words (2 and b), and integrating them (3 and c). We explore this parallelism to better understand brain processes and the neural networks' representations. We study the alignment between the latent vectors used by neural networks and brain activity observed via magnetoencephalography (MEG) when subjects read a story. For that purpose we apply the neural network to the same text the subjects are reading and explore the ability of these three vector representations to predict the observed word-by-word brain activity. Our novel results show that, first, before a new word i is read, brain activity is well predicted by the neural network's latent representation of context, and this predictability decreases as the brain integrates the word and updates its own representation of context. Second, the neural network embedding of word i can predict the MEG activity when word i is presented to the subject, revealing that it is correlated with the brain's own representation of word i. Moreover, we find that the activity is predicted in different regions of the brain with varying delays; the delays are consistent with each region's placement on the processing pathway that starts in the visual cortex and moves to higher-level regions. Finally, we show that the output probability computed by the neural network agrees with the brain's own assessment of the probability of word i, as it can be used to predict the brain activity after the word's properties have been fetched from memory and while the brain is in the process of integrating it into the context.
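The core analysis step is an encoding model that maps the network's vectors to the observed MEG signal. The sketch below illustrates that step with ridge regression on synthetic arrays; the shapes, sensor count, and the use of scikit-learn are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Illustrative stand-ins: one row per word occurrence in the story.
n_words, emb_dim, n_sensors = 500, 100, 306
context_vectors = rng.normal(size=(n_words, emb_dim))  # network's latent context
meg_activity = rng.normal(size=(n_words, n_sensors))   # MEG signal averaged per word

X_train, X_test, y_train, y_test = train_test_split(
    context_vectors, meg_activity, test_size=0.2, random_state=0)

# A single regularized linear map from representation space to sensor space.
model = Ridge(alpha=1.0).fit(X_train, y_train)
pred = model.predict(X_test)

# Per-sensor correlation between predicted and observed activity is a common
# way to score this kind of encoding analysis.
corr = [np.corrcoef(pred[:, s], y_test[:, s])[0, 1] for s in range(n_sensors)]
print(f"mean correlation across sensors: {np.mean(corr):.3f}")
```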
empirical methods in natural language processing | 2014
Qing Dou; Ashish Vaswani; Kevin Knight
Inspired by previous work in which decipherment is used to improve machine translation, we propose a new approach that combines word alignment and decipherment into a single learning process. We use EM to estimate the model parameters, maximizing the probability not only of the parallel corpus but also of the monolingual corpus. We apply our approach to improve Malagasy-English machine translation, where only a small amount of parallel data is available. In our experiments, we observe gains of 0.9 to 2.1 BLEU over a strong baseline.
north american chapter of the association for computational linguistics | 2016
Barret Zoph; Ashish Vaswani; Jonathan May; Kevin Knight
We present a simple algorithm to efficiently train language models with noise-contrastive estimation (NCE) on graphics processing units (GPUs). Our NCE-trained language models achieve significantly lower perplexity on the One Billion Word Benchmark language modeling challenge and contain one-sixth of the parameters of the best single model in Chelba et al. (2013). When incorporated into a strong Arabic-English machine translation system, they give a strong boost in translation quality. We release a toolkit so that others may also train large-scale, large-vocabulary LSTM language models with NCE, parallelizing computation across multiple GPUs.
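NCE sidesteps the softmax normalization by training the model to discriminate each observed word from k words drawn from a noise distribution. A minimal sketch of the per-batch loss is given below; the tensor shapes and the helper's signature are assumptions for illustration and do not reflect the released toolkit.

```python
import torch
import torch.nn.functional as F

def nce_loss(scores_true, scores_noise, logq_true, logq_noise, k):
    """Noise-contrastive estimation loss for one batch of target words.

    scores_true:  (batch,)    unnormalized model scores s(w) of the observed words
    scores_noise: (batch, k)  scores of k words sampled from the noise distribution
    logq_true:    (batch,)    log-probability of the observed words under the noise dist.
    logq_noise:   (batch, k)  log-probability of the sampled noise words
    """
    logk = torch.log(torch.tensor(float(k)))
    # Classify each observed word as "data" ...
    data_term = F.logsigmoid(scores_true - logq_true - logk)
    # ... and each sampled word as "noise".
    noise_term = F.logsigmoid(-(scores_noise - logq_noise - logk)).sum(dim=1)
    return -(data_term + noise_term).mean()

# Toy usage: batch of 4 targets, k = 10 noise samples, roughly uniform noise scores.
loss = nce_loss(torch.randn(4), torch.randn(4, 10),
                torch.full((4,), -9.2), torch.full((4, 10), -9.2), k=10)
print(loss.item())
```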
north american chapter of the association for computational linguistics | 2016
Boliang Zhang; Xiaoman Pan; Tianlu Wang; Ashish Vaswani; Heng Ji; Kevin Knight; Daniel Marcu
In this paper we tackle a challenging name tagging problem in an emergent setting: the tagger needs to be complete within a few hours for a new incident language (IL) using very few resources. Inspired by observing how human annotators attack this challenge, we propose a new expectation-driven learning framework. In this framework we rapidly acquire, categorize, structure and zoom in on IL-specific expectations (rules, features, patterns, gazetteers, etc.) from various non-traditional sources: consulting and encoding linguistic knowledge from native speakers, mining and projecting patterns from both mono-lingual and cross-lingual corpora, and typing based on cross-lingual entity linking. We also propose a cost-aware combination approach to compose expectations. Experiments on seven low-resource languages demonstrate the effectiveness and generality of this framework: we are able to set up a name tagger for a new IL within two hours and achieve 33.8%-65.1% F-score.
international joint conference on natural language processing | 2015
Qing Dou; Ashish Vaswani; Kevin Knight; Chris Dyer
We introduce into Bayesian decipherment a base distribution derived from similarities of word embeddings. We use Dirichlet multinomial regression (Mimno and McCallum, 2012) to learn a mapping between ciphertext and plaintext word embeddings from non-parallel data. Experimental results show that the base distribution is highly beneficial to decipherment, improving state-of-the-art decipherment accuracy from 45.8% to 67.4% for Spanish/English, and from 5.1% to 11.2% for Malagasy/English.
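The base distribution ties each ciphertext word to plaintext words through embedding similarity. The sketch below shows one simple way to turn a mapping between the two embedding spaces into a distribution over plaintext candidates; it is a toy illustration with a random placeholder mapping and does not implement the Dirichlet multinomial regression used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative embeddings: 50 ciphertext words and 80 plaintext words, 100 dims each.
cipher_emb = rng.normal(size=(50, 100))
plain_emb = rng.normal(size=(80, 100))

# A mapping from ciphertext embedding space into plaintext embedding space would
# be learned from non-parallel data; here it is just a random placeholder.
W = rng.normal(size=(100, 100))
projected = cipher_emb @ W

# Cosine similarity between each projected ciphertext word and every plaintext word.
projected /= np.linalg.norm(projected, axis=1, keepdims=True)
plain_norm = plain_emb / np.linalg.norm(plain_emb, axis=1, keepdims=True)
sims = projected @ plain_norm.T                       # (50, 80)

# A softmax over plaintext words gives one base distribution per ciphertext word.
base = np.exp(sims - sims.max(axis=1, keepdims=True))
base /= base.sum(axis=1, keepdims=True)
print(base.shape, base.sum(axis=1)[:3])               # each row sums to 1
```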
north american chapter of the association for computational linguistics | 2015
Tomer Levinboim; Ashish Vaswani; David Chiang
We present Model Invertibility Regularization (MIR), a method that jointly trains two directional sequence alignment models, one in each direction, and takes into account the invertibility of the alignment task. By coupling the two models through their parameters (as opposed to through their inferences, as in Liang et al.'s Alignment by Agreement (ABA) and Ganchev et al.'s Posterior Regularization (PostCAT)), our method seamlessly extends to all IBM-style word alignment models as well as to alignment without parallel data. Our proposed algorithm is mathematically sound and inherits convergence guarantees from EM. We evaluate MIR on two tasks: (1) on word alignment, applying MIR to fertility-based models, we attain higher F-scores than ABA and PostCAT; (2) on Japanese-to-English back-transliteration without parallel data, applied to the decipherment model of Ravi and Knight, MIR learns sparser models that close the gap in whole-name error rate by 33% relative to a model trained on parallel data and, further, beats a previous approach by Mylonakis et al.
empirical methods in natural language processing | 2016
Ke M. Tran; Yonatan Bisk; Ashish Vaswani; Daniel Marcu; Kevin Knight
In this work, we present the first results for neuralizing an Unsupervised Hidden Markov Model. We evaluate our approach on tag induction. Our approach outperforms existing generative models and is competitive with the state of the art, though with a simpler model that is easily extended to include additional context.
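In a neural HMM, the emission distribution p(word | tag) is produced by a neural scorer while training still maximizes the usual HMM marginal likelihood via the forward algorithm. The sketch below shows that combination in PyTorch; the sizes and the particular bilinear emission scorer are illustrative assumptions and far simpler than the paper's model.

```python
import torch
import torch.nn as nn

class NeuralEmissions(nn.Module):
    """Toy neural emission model: embeddings score p(word | tag) for every tag."""
    def __init__(self, vocab_size=5000, num_tags=45, emb_dim=64):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.tag_emb = nn.Embedding(num_tags, emb_dim)

    def log_emissions(self, word_ids):
        # word_ids: (seq_len,) -> (seq_len, num_tags) scores, normalized over the
        # vocabulary so that each tag defines a proper emission distribution.
        scores = self.word_emb(word_ids) @ self.tag_emb.weight.T
        logZ = torch.logsumexp(self.word_emb.weight @ self.tag_emb.weight.T, dim=0)
        return scores - logZ

def forward_log_likelihood(log_emit, log_trans, log_start):
    """Standard HMM forward algorithm in log space.

    log_emit:  (seq_len, num_tags)  emission log-probabilities per position
    log_trans: (num_tags, num_tags) log p(tag_t | tag_{t-1})
    log_start: (num_tags,)          log p(tag_1)
    """
    alpha = log_start + log_emit[0]
    for t in range(1, log_emit.size(0)):
        alpha = torch.logsumexp(alpha.unsqueeze(1) + log_trans, dim=0) + log_emit[t]
    return torch.logsumexp(alpha, dim=0)  # log p(sentence); maximize this with SGD
```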
neural information processing systems | 2017
Ashish Vaswani; Noam Shazeer; Niki Parmar; Jakob Uszkoreit; Llion Jones; Aidan N. Gomez; Lukasz Kaiser; Illia Polosukhin
empirical methods in natural language processing | 2013
Ashish Vaswani; Yinggong Zhao; Victoria Fossum; David Chiang