Mohammad Sadegh Rasooli
Columbia University
Publication
Featured research published by Mohammad Sadegh Rasooli.
empirical methods in natural language processing | 2015
Mohammad Sadegh Rasooli; Michael Collins
We present a novel method for the crosslingual transfer of dependency parsers. Our goal is to induce a dependency parser in a target language of interest without any direct supervision: instead we assume access to parallel translations between the target and one or more source languages, and to supervised parsers in the source language(s). Our key contributions are to show the utility of dense projected structures when training the target language parser, and to introduce a novel learning algorithm that makes use of dense structures. Results on several languages show an absolute improvement of 5.51% in average dependency accuracy over the state-of-the-art method of (Ma and Xia, 2014). Our average dependency accuracy of 82.18% compares favourably to the accuracy of fully supervised methods.
mexican international conference on artificial intelligence | 2011
Mohammad Sadegh Rasooli; Heshaam Faili; Behrouz Minaei-Bidgoli
One of the main tasks related to multiword expressions (MWEs) is compound verb identification. There has been much work on unsupervised identification of multiword verbs in many languages, but no notable work on Persian yet. Persian multiword verbs (known as compound verbs) are a kind of light verb construction (LVC) with syntactic flexibility, such as unrestricted word distance between the light verb and the nonverbal element. Furthermore, the nonverbal element can be inflected. These characteristics make the task very difficult in Persian. In this paper, we propose two unsupervised methods to automatically detect compound verbs in Persian. The first method applies bootstrapping, extending the pointwise mutual information (PMI) measure. The second uses the K-means clustering algorithm. Our experiments show that the proposed approaches outperform the baseline, which uses PMI as its association metric.
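As a rough illustration of the PMI association measure that the abstract uses as its baseline, the sketch below scores (nonverbal element, light verb) pairs from co-occurrence counts. The function name `pmi_scores` and the placeholder tokens are assumptions made for this example; the paper's bootstrapping extension and its actual feature extraction are not reproduced here.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Compute PMI for each observed (nonverbal, light_verb) pair.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), with probabilities
    estimated from relative frequencies in the observed pair list.
    """
    pair_counts = Counter(pairs)
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)
    total = len(pairs)

    scores = {}
    for (left, right), joint in pair_counts.items():
        p_joint = joint / total
        p_left = left_counts[left] / total
        p_right = right_counts[right] / total
        scores[(left, right)] = math.log2(p_joint / (p_left * p_right))
    return scores


# Toy observations with placeholder tokens (not real corpus data).
pairs = [("n1", "lv1"), ("n1", "lv1"), ("n2", "lv1"), ("n2", "lv2")]
scores = pmi_scores(pairs)
```

A pair that co-occurs more often than its parts would by chance gets a positive score, which is why PMI is a common starting point for ranking candidate multiword expressions.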
international conference natural language processing | 2011
Mohammad Sadegh Rasooli; Omid Kashefi; Behrouz Minaei-Bidgoli
In the computer era, the volume of digital documents being produced has overwhelmed traditional manual spell checking and, worse, machine-based text production and management has introduced a new type of misspelling: typographical errors. Given the intolerable human workload of spell checking digital texts and the undeniable accuracy and speed of computers, automatic spell checking is an important application of computer systems. Different users may have their own misspelling patterns or habits, so we believe that a traditional automatic spell checker with a fixed set of rules may not perform well for all kinds of misspelling patterns. Therefore, in this paper, we investigate the effect of adaptive spell checking for Persian, compared with traditional non-adaptive spell checking. Evaluation results show that, after a short period of use, adaptive spell checking is superior to and more efficient than traditional spell checking with a fixed set of rules.
meeting of the association for computational linguistics | 2014
Mohammad Sadegh Rasooli; Thomas Lippincott; Nizar Habash; Owen Rambow
We present a novel way of generating unseen words, which is useful for certain applications such as automatic speech recognition or optical character recognition in low-resource languages. We test our vocabulary generator on seven low-resource languages by measuring the decrease in out-of-vocabulary word rate on a held-out test set. The languages we study have very different morphological properties; we show how our results differ depending on the morphological complexity of the language. In our best result (on Assamese), our approach can predict 29% of the token-based out-of-vocabulary words with a small amount of unlabeled training data.
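The evaluation described above, measuring the drop in out-of-vocabulary (OOV) rate on held-out text, can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names `oov_rate` and `oov_reduction` and the English toy tokens are invented for illustration, and the word generator itself is not modeled.

```python
def oov_rate(vocabulary, test_tokens):
    """Fraction of test tokens (token-based, so repeats count) not in the vocabulary."""
    vocab = set(vocabulary)
    if not test_tokens:
        return 0.0
    missing = sum(1 for token in test_tokens if token not in vocab)
    return missing / len(test_tokens)


def oov_reduction(base_vocab, generated_words, test_tokens):
    """Relative decrease in token-based OOV rate after adding generated words."""
    before = oov_rate(base_vocab, test_tokens)
    after = oov_rate(set(base_vocab) | set(generated_words), test_tokens)
    return 0.0 if before == 0 else (before - after) / before


# Toy data (hypothetical): a tiny vocabulary plus hypothesized unseen word forms.
base_vocab = {"walk", "talk"}
generated = {"walked", "talking", "walks"}
held_out = ["walk", "walked", "talked", "talking", "walk"]

before = oov_rate(base_vocab, held_out)
reduction = oov_reduction(base_vocab, generated, held_out)
```

Here two of the three OOV tokens are covered by the generated words, so the relative reduction is two thirds; the 29% figure in the abstract is a quantity of this kind.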
conference of the european chapter of the association for computational linguistics | 2014
Mohammad Sadegh Rasooli; Joel R. Tetreault
Parsing disfluent sentences is a challenging task which involves detecting disfluencies as well as identifying the syntactic structure of the sentence. While there have been several recent studies on detecting disfluencies alone at a high performance level, there has been relatively little work on joint parsing and disfluency detection that has reached state-of-the-art performance in disfluency detection. We improve upon recent work in this joint task through the use of novel features and learning cascades to produce a model which performs at 82.6 F-score. It outperforms the previous best in disfluency detection on two different evaluations.
asia information retrieval symposium | 2011
Mohammad Sadegh Rasooli; Omid Kashefi; Behrouz Minaei-Bidgoli
The task of sentence and paragraph alignment is essential for preparing parallel texts that are needed in applications such as machine translation. The lack of sufficient linguistic data for under-resourced languages like Persian is a challenging issue. In this paper, we propose a hybrid sentence and paragraph alignment model for Persian-English parallel documents based on simple linguistic features as well as length similarity between sentences and paragraphs of the source and target languages. We use a small bilingual dictionary of Persian-English nouns, punctuation marks, and length similarity as alignment metrics. We combine these features in a linear model and use a genetic algorithm to learn the weights of the linear equation. Evaluation results show that the extracted features improve on the baseline model, which is purely length-based.
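The linear combination of the three feature types named in the abstract might look roughly like the sketch below. Everything specific here is an assumption for illustration: the feature definitions, the function names, the toy dictionary, and the weights (which the paper learns with a genetic algorithm, not shown) are hypothetical and not taken from the paper.

```python
def _ratio(a, b):
    """Similarity of two non-negative counts as min/max, 1.0 when both are zero."""
    return 1.0 if max(a, b) == 0 else min(a, b) / max(a, b)


def length_similarity(src, tgt):
    """Character-length similarity between source and target sentences."""
    return _ratio(len(src), len(tgt))


def punctuation_similarity(src, tgt, marks=".,;:!?"):
    """Similarity of punctuation counts (hypothetical mark set)."""
    return _ratio(sum(src.count(m) for m in marks),
                  sum(tgt.count(m) for m in marks))


def dictionary_overlap(src_tokens, tgt_tokens, dictionary):
    """Share of dictionary-covered source tokens whose translation appears in the target."""
    covered = [w for w in src_tokens if w in dictionary]
    if not covered:
        return 0.0
    targets = set(tgt_tokens)
    return sum(1 for w in covered if dictionary[w] in targets) / len(covered)


def alignment_score(src, tgt, dictionary, weights):
    """Weighted linear combination of the three features."""
    features = (
        dictionary_overlap(src.split(), tgt.split(), dictionary),
        punctuation_similarity(src, tgt),
        length_similarity(src, tgt),
    )
    return sum(w * f for w, f in zip(weights, features))


# Toy example with a one-entry dictionary and hand-picked (not learned) weights.
toy_dictionary = {"ketab": "book"}
toy_weights = (0.5, 0.2, 0.3)
good = alignment_score("ketab khub ast.", "the book is good.", toy_dictionary, toy_weights)
bad = alignment_score("ketab khub ast.", "it rained all day and night", toy_dictionary, toy_weights)
```

With a learned weight vector, an aligner would pick, for each source sentence, the candidate target with the highest score; a purely length-based baseline corresponds to putting all the weight on the last feature.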
international joint conference on natural language processing | 2015
Alireza Nourian; Mohammad Sadegh Rasooli; Mohsen Imany; Heshaam Faili
The Ezafe construction is an idiosyncratic phenomenon in the Persian language. It is a good indicator of phrase boundaries and dependency relations but mostly does not appear in the text. In this paper, we show that adding information about the Ezafe construction can give a 4.6% relative improvement in dependency parsing and a 9% relative improvement in shallow parsing. For evaluation purposes, Ezafe tags are manually annotated in the Persian dependency treebank. Furthermore, to be able to conduct experiments on shallow parsing, we develop a dependency-to-shallow-phrase-structure converter based on the Persian dependencies.
intelligent information systems | 2013
Maryam Aminian; Mohammad Sadegh Rasooli; Hossein Sameti
Automatic induction of semantic verb classes is one of the most challenging tasks in computational lexical semantics, with a wide variety of applications in natural language processing. The large number of Persian speakers and the lack of such semantic classes for Persian verbs have motivated us to use unsupervised algorithms for Persian verb clustering. In this paper, we report experiments on inducing the semantic classes of Persian verbs based on Levin's theory of verb classes. Syntactic information extracted from dependency trees is used as base features for clustering the verbs. Since there has been no manual classification of Persian verbs prior to this paper, we have prepared a manual classification of 265 verbs into 43 semantic classes. We show that the spectral clustering algorithm outperforms K-means and improves on the baseline algorithm by about 17% in F-measure and 0.13 in Rand index.
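For readers unfamiliar with the Rand index reported above, here is a minimal sketch of how it compares a predicted clustering against a manual classification. The function name `rand_index` and the toy verb labels are assumptions for this example; the paper's exact evaluation setup may differ.

```python
from itertools import combinations


def rand_index(gold, predicted):
    """Rand index: share of item pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two items
    in the same cluster, or both put them in different clusters.
    """
    agreements = 0
    total = 0
    for a, b in combinations(sorted(gold), 2):
        same_in_gold = gold[a] == gold[b]
        same_in_predicted = predicted[a] == predicted[b]
        agreements += same_in_gold == same_in_predicted
        total += 1
    return agreements / total if total else 1.0


# Toy data (hypothetical): four verbs, two gold classes, one verb misclustered.
gold = {"v1": 0, "v2": 0, "v3": 1, "v4": 1}
predicted = {"v1": 0, "v2": 0, "v3": 0, "v4": 1}
score = rand_index(gold, predicted)
```

In this toy case three of the six verb pairs are treated consistently by both clusterings, so the score is 0.5; identical clusterings score 1.0.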
Machine Translation | 2018
Mohammad Sadegh Rasooli; Noura Farra; Axinia Radeva; Tao Yu; Kathleen R. McKeown
We describe two transfer approaches for building sentiment analysis systems without having gold labeled data in the target language. Unlike previous work that is focused on using only English as the source language and a small number of target languages, we use multiple source languages to learn a more robust sentiment transfer model for 16 languages from different language families. Our approaches explore the potential of using an annotation projection approach and a direct transfer approach using cross-lingual word representations and neural networks. Whereas most previous work relies on machine translation, we show that we can build cross-lingual sentiment analysis systems without machine translation or even high quality parallel data. We have conducted experiments assessing the availability of different resources such as in-domain parallel data, out-of-domain parallel data, and in-domain comparable data. Our experiments show that we can build a robust transfer system whose performance can in some cases approach that of a supervised system.
north american chapter of the association for computational linguistics | 2013
Mohammad Sadegh Rasooli; Manouchehr Kouhestani; Amirsaeid Moloodi