A. Cüneyd Tantuğ
Istanbul Technical University
Publication
Featured research published by A. Cüneyd Tantuğ.
international conference natural language processing | 2006
A. Cüneyd Tantuğ; Eşref Adalı; Kemal Oflazer
This paper describes the implementation of a two-level morphological analyzer for the Turkmen language. Like all Turkic languages, Turkmen is an agglutinative language with productive inflectional and derivational suffixes. In this work, we implemented a finite-state two-level morphological analyzer for Turkmen using the Xerox Finite State Tools.
international conference on intelligent engineering systems | 2012
Bahar Ilgen; Eşref Adalı; A. Cüneyd Tantuğ
Word Sense Disambiguation (WSD) is the task of choosing the most appropriate sense of a word that has multiple senses in a given context. Collocational features acquired from the words neighboring the ambiguous word are one of the important knowledge sources in this area. This paper explores effective sets of collocational features in Turkish in order to obtain better Turkish WSD systems. A lexical sample dataset of highly polysemous nouns and verbs has been prepared as the initial step of the work. Several supervised learning algorithms have been tested on this data with different feature sets to select the best-performing features for both nouns and verbs in Turkish. We also investigated the impact of several collocational features of polysemous words and evaluated the performance of several supervised machine learning algorithms.
international symposium on computer and information sciences | 2013
Bahar Ilgen; Eşref Adalı; A. Cüneyd Tantuğ
In this paper, the effect of different windowing schemes on word sense disambiguation accuracy is presented. The Turkish Lexical Sample Dataset has been used in the experiments. We took the samples of ambiguous verbs and nouns from the dataset and used bag-of-words properties as context information. The experiments have been repeated for different window sizes with several machine learning algorithms. We follow a 2/3 splitting strategy (2/3 for training, 1/3 for testing) and determine the most frequently used words in the training part. After removing stop words, we repeated the experiments using the most frequent 100, 75, 50 and 25 content words of the training data. Our findings show that using the most frequent 75 words as features improves accuracy for Turkish verbs. Similar results have been obtained for Turkish nouns when we use the most frequent 100 words of the training set. Considering this information, selected algorithms have been tested on varying window sizes {30, 15, 10 and 5}. Our findings show that the Naive Bayes and Functional Tree methods yielded better accuracy, and that a window size of ±5 gives the best average results for both the noun and the verb groups. The best results of the two groups are 65.8 and 56 percentage points above the most frequent sense baseline of the verb and noun groups, respectively.
international symposium on computer and information sciences | 2006
A. Cüneyd Tantuğ; Eşref Adalı; Kemal Oflazer
This paper presents a statistical lexical ambiguity resolution method in direct transfer machine translation models in which the target language is Turkish. Since direct transfer MT models do not have full syntactic information, most of the lexical ambiguity resolution methods are not very helpful. Our disambiguation model is based on statistical language models. We have investigated the performances of some statistical language model types and parameters in lexical ambiguity resolution for our direct transfer MT system.
international symposium on innovations in intelligent systems and applications | 2012
Bahar Ilgen; Eşref Adalı; A. Cüneyd Tantuğ
Word Sense Disambiguation (WSD) has become an even more important research area in recent years with the widespread use of Natural Language Processing (NLP) applications. The WSD task has two variants: the “Lexical Sample” and “All Words” approaches. The Lexical Sample approach disambiguates the occurrences of a small sample of previously selected target words, while in the latter all the words in a piece of text are disambiguated. In the scope of this work, a Lexical Sample Dataset for Turkish has been prepared. As a first step, highly ambiguous words in Turkish were selected. Text samples were then collected for the chosen words, and five taggers annotated the word senses. This paper summarizes the step-by-step process of building a Lexical Sample Dataset for Turkish and presents the results of some experiments on it.
ieee international conference on computer science and information technology | 2009
Murat Orhun; A. Cüneyd Tantuğ; Eşref Adalı; A. Coskun Sonmez
This paper describes the differences between Uyghur (spoken in Sin Kiang, China) and Turkish grammar at the sentence level. There is little research on natural language processing for Turkic languages other than Turkish. Uyghur is one of the old and rich languages in the Turkic language family. Even though both languages belong to the same language family, there are some important differences between them. For these reasons, it is not possible to implement a machine translation system between Uyghur and Turkish that works simply by word-by-word translation. All of the words in a sentence must be analyzed at the morphological level, and translation rules must be defined, in order to avoid losing the meaning of the original sentence. We hope this paper contributes to advanced studies of the Uyghur language in machine translation and natural language processing.
international conference on software engineering | 2010
Murat Orhun; A. Cüneyd Tantuğ; Eşref Adalı
In this paper, we present a rule-based model for morphological disambiguation of the Uyghur language. Morphological ambiguity is a challenging problem for agglutinative languages because a word can potentially take an unlimited number of suffixes. The more suffixes a language has, the more complex the ambiguity problem becomes. Uyghur is a Turkic language with many complicated suffixes. At present we have defined some rules to disambiguate Uyghur morphology with respect to modern Uyghur grammar. As in any natural language, many cases exist, and research on more specific cases is in progress.
language resources and evaluation | 2008
A. Cüneyd Tantuğ; Kemal Oflazer; Ilknur Durgar El-Kahlout
Archive | 2007
A. Cüneyd Tantuğ; Eşref Adalı; Kemal Oflazer
Int. J. of Asian Lang. Proc. | 2009
Murat Orhun; A. Cüneyd Tantuğ; Eşref Adalı