Masood Ghayoomi
Free University of Berlin
Publications
Featured research published by Masood Ghayoomi.
Language Resources and Evaluation | 2011
Mahmood Bijankhan; Javad Sheykhzadegan; Mohammad Bahrani; Masood Ghayoomi
This paper addresses some of the issues encountered while building a written language resource, called ‘Peykare’, for contemporary Persian. After defining five linguistic varieties and 24 registers based on these varieties, we collected texts for Peykare to carry out a linguistic analysis, including cross-register differences. For the tokenization of Persian, we propose a descriptive generalization to normalize the orthographic variations found in texts. To annotate Peykare, we use the EAGLES guidelines, which result in a hierarchy of part-of-speech tags. To this end, we apply a semi-automatic annotation approach. We also pay special attention to the Ezafe construction and to homographs, both of which are important in Persian text analysis.
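The abstract does not spell out the normalization rules, so the following is only a minimal sketch of the kind of orthographic normalization it refers to. It assumes one common source of variation in Persian text: Arabic code points for yeh and kaf used in place of their Persian counterparts. The mapping table and the `normalize` function are illustrative names, not taken from the paper.

```python
# Hypothetical mapping: Arabic letter forms that often appear in Persian
# text, mapped to the Persian code points usually preferred.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})


def normalize(text: str) -> str:
    """Replace Arabic yeh/kaf with their Persian counterparts."""
    return text.translate(ARABIC_TO_PERSIAN)
```

Applying such a mapping before tokenization makes spellings that differ only in these code points compare as equal. A real normalizer for a corpus like Peykare would likely need further rules, for example for spacing conventions.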
International Multiconference on Computer Science and Information Technology | 2010
Stefan Müller; Masood Ghayoomi
In this paper, we discuss an HPSG grammar of Persian (PerGram) that is implemented in the TRALE system. We describe some of the phenomena that are currently covered. While working on the grammar, we developed a test suite with positive and negative examples from the linguistic literature. To test the coverage of the grammar on naturally occurring sentences, we use a subcorpus of a large Persian corpus.
International Conference on NLP | 2012
Masood Ghayoomi
Syntactically annotated data, such as a treebank, are used for training statistical parsers. One of the main concerns in developing statistical parsers is their sensitivity to the training data. Since data sparsity is the biggest challenge in data-oriented analyses, parsers perform poorly if they are trained on a small data set, or when the genres of the training and test data differ. In this paper, we propose a word-clustering approach using the Brown algorithm to overcome these problems. The proposed class-based model yields a coarser representation of the lexicon than the words themselves. In addition, we propose an extension to the clustering approach in which the POS tags of the words are also taken into consideration during clustering. We show that adding this information improves clustering performance, especially for homographs. In standard word clustering, homographs are treated identically, whereas the proposed extended model treats homographs as distinct and assigns them to different clusters. The experimental results show that class-based parsing outperforms word-based parsing in general. Moreover, we show that the proposed extension of class-based parsing is superior to the model that uses only words for clustering.
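The idea of separating homographs by their POS tags can be illustrated with a small input-preparation sketch. It assumes that each word is joined with its tag into a single `word_TAG` token before clustering. This format, the function names, and the toy English corpus are illustrative assumptions, not details from the paper, and the Brown clustering step itself is not shown.

```python
def tag_tokens(tagged_sentence):
    """Join each word with its POS tag so homographs become distinct types."""
    return [f"{word}_{pos}" for word, pos in tagged_sentence]


def vocabulary(sentences):
    """Collect the set of distinct token types in a list of sentences."""
    vocab = set()
    for sentence in sentences:
        vocab.update(sentence)
    return vocab


# Toy corpus: "book" occurs once as a verb and once as a noun.
corpus = [
    [("they", "PRON"), ("book", "VERB"), ("a", "DET"), ("flight", "NOUN")],
    [("she", "PRON"), ("reads", "VERB"), ("a", "DET"), ("book", "NOUN")],
]

plain = [[word for word, _ in sentence] for sentence in corpus]
tagged = [tag_tokens(sentence) for sentence in corpus]

plain_vocab = vocabulary(plain)    # "book" is a single type
tagged_vocab = vocabulary(tagged)  # "book_VERB" and "book_NOUN" are separate
```

With plain words, a clustering algorithm sees one type `book` and must place both uses in the same cluster. With tagged tokens, the two uses are separate types and can end up in different clusters.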
Linguistic Issues in Language Technology | 2012
Masood Ghayoomi
Int. J. of Asian Lang. Proc. | 2010
Masood Ghayoomi; Saeedeh Momtazi; Mahmood Bijankhan
North American Chapter of the Association for Computational Linguistics | 2010
Masood Ghayoomi
Language Resources and Evaluation | 2014
Masood Ghayoomi; Kiril Simov; Petya Osenova
Language Resources and Evaluation | 2012
Masood Ghayoomi
Language Resources and Evaluation | 2014
Masood Ghayoomi; Jonas Kuhn
Signal and Data Processing | 2017
Masood Ghayoomi