Yonatan Belinkov
Massachusetts Institute of Technology
Publications
Featured research published by Yonatan Belinkov.
meeting of the association for computational linguistics | 2017
Yonatan Belinkov; Nadir Durrani; Fahim Dalvi; Hassan Sajjad; James R. Glass
Neural machine translation (MT) models obtain state-of-the-art performance while maintaining a simple, end-to-end architecture. However, little is known about what these models learn about source and target languages during the training process. In this work, we analyze the representations learned by neural MT models at various levels of granularity and empirically evaluate the quality of the representations for learning morphology through extrinsic part-of-speech and morphological tagging tasks. We conduct a thorough investigation along several parameters: word-based vs. character-based representations, depth of the encoding layer, the identity of the target language, and encoder vs. decoder representations. Our data-driven, quantitative evaluation sheds light on important aspects of the neural MT system and its ability to capture word structure.
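A minimal sketch of the extrinsic probing setup described in the abstract, assuming encoder hidden states have already been extracted, one vector per word; the arrays below are random placeholders, and the simple classifier is an illustrative choice rather than the paper's exact probe.

```python
# Probing sketch: train a lightweight classifier on frozen NMT encoder states
# to predict POS tags, mirroring the extrinsic evaluation idea above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5000, 512))   # placeholder: one vector per word
pos_tags = rng.integers(0, 17, size=5000)       # placeholder POS tag ids

X_train, X_test, y_train, y_test = train_test_split(
    encoder_states, pos_tags, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=2000)       # simple probe over frozen features
probe.fit(X_train, y_train)
print("POS probing accuracy:", probe.score(X_test, y_test))
```

Higher probing accuracy is read as evidence that the frozen representations encode more morphological information.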
north american chapter of the association for computational linguistics | 2015
Yonatan Belinkov; Mitra Mohtarami; Scott Cyphers; James R. Glass
Continuous word and phrase vectors have proven useful in a number of NLP tasks. Here we describe our experience using them as a source of features for the SemEval-2015 task 3, consisting of two community question answering subtasks: Answer Selection, for categorizing answers as potential, good, and bad with regard to their corresponding questions; and YES/NO inference, for predicting a yes, no, or unsure response to a YES/NO question using all of its good answers. Our system ranked 6th and 1st in the English answer selection and YES/NO inference subtasks, respectively, and 2nd in the Arabic answer selection subtask.
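An illustrative sketch (not the authors' system) of how continuous word vectors can yield a similarity feature for answer selection: average the vectors of a question and a candidate answer and compare them by cosine similarity. The tiny embedding table and sentences are made up.

```python
# Hypothetical example: averaged word vectors as a question-answer similarity feature.
import numpy as np

def average_vector(tokens, embeddings, dim=300):
    """Mean of the known word vectors; zeros if nothing is in the vocabulary."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# `embeddings` stands in for a pretrained token -> vector mapping.
embeddings = {"install": np.ones(300), "driver": np.ones(300) * 0.5}
question = "how do i install the driver".split()
answer = "download and install the driver from the site".split()
feature = cosine(average_vector(question, embeddings),
                 average_vector(answer, embeddings))
print("cosine feature:", feature)
```

In a full system, features like this would be combined with others and fed to a supervised classifier.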
north american chapter of the association for computational linguistics | 2016
Mitra Mohtarami; Yonatan Belinkov; Wei-Ning Hsu; Yu Zhang; Tao Lei; Kfir Bar; Scott Cyphers; James R. Glass
Community question answering platforms need to automatically rank answers and questions with respect to a given question. In this paper, we present our approaches for the Answer Selection and Question Retrieval tasks of SemEval-2016 (task 3). We develop a bag-of-vectors approach with various vector- and text-based features, and different neural network approaches, including CNNs and LSTMs, to capture the semantic similarity between questions and answers for ranking purposes. Our evaluation demonstrates that our approaches significantly outperform the baselines.
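A hedged sketch of one of the neural ranking ideas mentioned above: encode the question and the answer with a shared LSTM and score the pair by cosine similarity. The vocabulary size, dimensions, and toy inputs are illustrative, not the paper's configuration.

```python
# Toy siamese LSTM scorer for question-answer similarity (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseLSTMScorer(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)

    def encode(self, ids):
        _, (h, _) = self.lstm(self.emb(ids))   # final hidden state as sentence vector
        return h[-1]

    def forward(self, q_ids, a_ids):
        return F.cosine_similarity(self.encode(q_ids), self.encode(a_ids))

model = SiameseLSTMScorer()
q = torch.randint(0, 10000, (2, 12))   # two toy question token-id sequences
a = torch.randint(0, 10000, (2, 20))   # two toy answer token-id sequences
print(model(q, a))                     # one similarity score per pair
```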
empirical methods in natural language processing | 2015
Yonatan Belinkov; James R. Glass
Arabic, Hebrew, and similar languages are typically written without diacritics, leading to ambiguity and posing a major challenge for core language processing tasks like speech recognition. Previous approaches to automatic diacritization employed a variety of machine learning techniques. However, they typically rely on existing tools like morphological analyzers and therefore cannot be easily extended to new genres and languages. We develop a recurrent neural network with long short-term memory layers for predicting diacritics in Arabic text. Our language-independent approach is trained solely from diacritized text without relying on external tools. We show experimentally that our model can rival state-of-the-art methods that have access to additional resources.
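A rough sketch of the character-level tagging setup the abstract describes: an LSTM reads undiacritized characters and predicts a diacritic label at each position. Layer sizes, the label inventory, and the use of a bidirectional stack here are placeholder assumptions, not the paper's exact configuration.

```python
# Illustrative per-character diacritic tagger (assumed sizes, toy inputs).
import torch
import torch.nn as nn

class DiacritizationTagger(nn.Module):
    def __init__(self, n_chars=50, n_diacritics=15, emb_dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(n_chars, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_diacritics)

    def forward(self, char_ids):               # (batch, seq_len)
        states, _ = self.lstm(self.emb(char_ids))
        return self.out(states)                # per-character diacritic scores

model = DiacritizationTagger()
chars = torch.randint(0, 50, (4, 30))          # toy batch of character ids
print(model(chars).shape)                      # torch.Size([4, 30, 15])
```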
meeting of the association for computational linguistics | 2017
Hassan Sajjad; Fahim Dalvi; Nadir Durrani; Ahmed Abdelali; Yonatan Belinkov; Stephan Vogel
Word segmentation plays a pivotal role in improving any Arabic NLP application, and considerable research effort has been devoted to improving its accuracy. Off-the-shelf tools, however, are: i) complicated to use and ii) domain/dialect dependent. We explore three language-independent alternatives to morphological segmentation: i) data-driven sub-word units, ii) characters as a unit of learning, and iii) word embeddings learned using a character CNN (Convolutional Neural Network). On the tasks of Machine Translation and POS tagging, we found that these methods achieve performance close to, and occasionally surpassing, the state of the art. In our analysis, we show that a neural machine translation system is sensitive to the ratio of source and target tokens, and a ratio close to 1 or greater gives optimal performance.
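A small diagnostic sketch (not from the paper's code) of the token-ratio observation above: compare the source/target token ratio under word-level vs. character-level segmentation. The transliterated example line is made up.

```python
# Compare source/target token ratios under two segmentation schemes.
def word_tokens(line):
    return line.split()

def char_tokens(line):
    # Characters as the unit of learning; spaces marked with a placeholder symbol.
    return [c if c != " " else "<sp>" for c in line.strip()]

src = "w qAl Altqryr An AlEdd tjAwz Almtwqe"           # toy transliterated Arabic line
tgt = "the report said the number exceeded expectations"

for name, segment in [("word", word_tokens), ("char", char_tokens)]:
    ratio = len(segment(src)) / len(word_tokens(tgt))
    print(f"{name:>4} segmentation: source/target token ratio = {ratio:.2f}")
```

Checking this ratio for a segmentation scheme is a cheap sanity check before committing to a full MT training run.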
meeting of the association for computational linguistics | 2015
Yonatan Belinkov; Alberto Barrón-Cedeño; Hamdy Mubarak
The task of answer selection in community question answering consists of identifying pertinent answers from a pool of user-generated comments related to a question. The recent SemEval-2015 introduced a shared task on community question answering, providing a corpus and evaluation scheme. In this paper we address the problem of answer selection in Arabic. Our proposed model draws on a rich set of features, including lexical and semantic similarities, vector representations, and rankings. We investigate the contribution of each set of features in a supervised setting. We show that combining the features with a linear support vector machine achieves better performance than the competition winner (F1 of 79.25 compared to 78.55).
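A minimal sketch of the supervised feature-combination step: a linear SVM over pre-computed similarity and ranking features for each question-answer pair. The feature matrix and labels below are random placeholders standing in for real extracted features.

```python
# Linear SVM over hand-crafted features for answer selection (placeholder data).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
features = rng.normal(size=(1000, 20))   # e.g. lexical, semantic, vector, rank features
labels = rng.integers(0, 2, size=1000)   # 1 = pertinent answer, 0 = not

clf = LinearSVC(C=1.0, max_iter=10000)
clf.fit(features[:800], labels[:800])
print("held-out accuracy:", clf.score(features[800:], labels[800:]))
```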
Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology | 2016
Roee Aharoni; Yoav Goldberg; Yonatan Belinkov
Morphological reinflection is the task of generating a target form given a source form and the morpho-syntactic attributes of the target (and, optionally, of the source). This work presents the submission of Bar Ilan University and the Massachusetts Institute of Technology for the morphological reinflection shared task held at SIGMORPHON 2016. The submission includes two recurrent neural network architectures for learning morphological reinflection from incomplete inflection tables while using several novel ideas for this task: morpho-syntactic attribute embeddings, modeling the concept of templatic morphology, bidirectional input character representations, and neural discriminative string transduction. The reported results for the proposed models over the ten languages in the shared task place this submission second or third (depending on the language) out of eight participating teams on all three sub-tasks, while training only on the Restricted category data.
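A hedged illustration of how a reinflection model can be conditioned on target attributes. The submission itself uses dedicated morpho-syntactic attribute embeddings; the sketch below shows only the simpler input-formatting variant in which attributes appear as extra tokens before the source characters, and the attribute names and example are made up.

```python
# Format a reinflection input: target attributes as tokens before the source characters.
def build_reinflection_input(source_form, target_attrs):
    """Prepend attribute tokens to the character sequence of the source form."""
    attr_tokens = [f"<{k}={v}>" for k, v in sorted(target_attrs.items())]
    return attr_tokens + list(source_form)

source = build_reinflection_input("laufen", {"pos": "V", "tense": "PST", "num": "PL"})
print(source)
# ['<num=PL>', '<pos=V>', '<tense=PST>', 'l', 'a', 'u', 'f', 'e', 'n']
```

A sequence-to-sequence model trained on such inputs then decodes the target form character by character.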
international conference on learning representations | 2017
Yossi Adi; Einat Kermany; Yonatan Belinkov; Ofer Lavi; Yoav Goldberg
meeting of the association for computational linguistics | 2013
Hassan Sajjad; Kareem Darwish; Yonatan Belinkov
Transactions of the Association for Computational Linguistics | 2014
Yonatan Belinkov; Tao Lei; Regina Barzilay; Amir Globerson