Tarik Kisla
Ege University
Publication
Featured research published by Tarik Kisla.
international conference on information technology new generations | 2008
Taner Dincer; Bahar Karaoglan; Tarik Kisla
In this paper, we present a stochastic part-of-speech tagger for Turkish. The tagger is developed primarily for information retrieval, but it can also serve as a lightweight PoS tagger for other purposes. The tagger uses a well-established hidden Markov model of the language with a closed lexicon that consists of a fixed number of letters from the word endings. We considered seven different lengths of word endings against 30 training corpus sizes. The best-case accuracy obtained is 90.2% with 5 characters. The main contribution of this paper is a way of constructing a closed vocabulary for part-of-speech tagging that can be useful for highly inflected languages like Turkish, Finnish, Hungarian, Estonian, and Czech.
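The closed-lexicon idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `ending_token` and `closed_lexicon` and the sample words are assumptions made for illustration, and only the mapping from words to fixed-length endings is shown, not the hidden Markov model itself.

```python
def ending_token(word: str, k: int = 5) -> str:
    """Map a word to its last k characters (the whole word if it is shorter)."""
    return word[-k:] if len(word) > k else word


def closed_lexicon(words, k: int = 5) -> set:
    """Collect the set of distinct endings seen in a word list."""
    return {ending_token(w, k) for w in words}


# Hypothetical sample words, already lowercased.
sample = ["evlerimizden", "kitaplar", "geldi", "ev"]
print([ending_token(w) for w in sample])
print(sorted(closed_lexicon(sample)))
```

Because every word is reduced to at most k trailing characters, the number of distinct symbols the tagger must model is bounded by the endings observed in training rather than by the full vocabulary.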
signal processing and communications applications conference | 2016
Senem Kumova; Bahar Karaoglan; Tarik Kisla
Identification of paraphrase sentence pairs is becoming increasingly prominent in natural language processing (e.g., plagiarism detection, summarization, machine translation). In this study, we propose employing the information gain measure to determine the value ranges of the paraphrase classification features on the renowned Microsoft Research Paraphrase Corpus (MSRP). The classification performance of value ranges determined by the information gain measure is compared with that of an alternative heuristic method using a Bayes classifier. The results show that the proposed method performs better than the heuristic method.
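As a rough illustration of using information gain to pick a value range for a numeric feature, the sketch below chooses a single cut point that best separates two classes. The abstract does not specify the procedure, so the binary split, the function names, and the toy data are all assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting at value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - remainder


def best_threshold(values, labels):
    """Return the observed value with the highest gain (needs >= 2 distinct values)."""
    candidates = sorted(set(values))[:-1]  # a cut above the maximum separates nothing
    return max(candidates, key=lambda t: information_gain(values, labels, t))


# Hypothetical feature values (e.g. a word-overlap score) with paraphrase labels.
values = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
cut = best_threshold(values, labels)
print(cut, information_gain(values, labels, cut))
```

Applying the same search recursively inside each resulting range would yield more than two ranges per feature.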
signal processing and communications applications conference | 2013
Senem Kumova Metin; Tarik Kisla; Bahar Karaoglan
Natural language processing can be seen as a signal processing problem when the characters, syllables, words, and punctuation marks in a text are considered as signals. In this article, we present a novel approach that detects text similarity in Turkish, based on the similarities of the lists of retrieved documents when the texts are given as queries to web search engines. The similarities between the URLs contained in the items of the returned lists are measured using statistical measures such as Euclidean, city-block, Chebyshev, cosine, correlation, Spearman, and Hamming distances. For the experiments, a corpus of 150 news articles was developed by gathering news on 50 different topics from 3 Turkish newspapers published during a certain time slot. News articles on the same topic published in different newspapers are considered similar texts. The statistical measures are applied to the resulting news × terms matrix, and for each article the other articles are ranked from most similar to least similar. If at least one of the top two matches an article manually marked as similar, it is counted as a success. Experimental results show that the cosine and correlation distances give the best performance, with 84% precision.
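The ranking and success criterion described in the abstract above can be sketched as follows, using cosine distance only. The matrix values, the function names, and the gold-standard sets are hypothetical, and the sketch does not reproduce the search-engine step that builds the matrix.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def rank_similar(matrix, query_index):
    """Rank all other rows from most to least similar to the query row."""
    query = matrix[query_index]
    others = [i for i in range(len(matrix)) if i != query_index]
    return sorted(others, key=lambda i: cosine_distance(query, matrix[i]))


def top2_success(matrix, query_index, gold):
    """Success if at least one of the two top-ranked rows is in the gold set."""
    return any(i in gold for i in rank_similar(matrix, query_index)[:2])


# Hypothetical 4 x 4 matrix: rows are news items, columns are term counts.
matrix = [
    [2, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
]
print(rank_similar(matrix, 0))
print(top2_success(matrix, 0, {1}))
```

Swapping `cosine_distance` for another distance function would allow the same ranking code to compare the measures listed in the abstract.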
conference on intelligent text processing and computational linguistics | 2016
Bahar Karaoglan; Tarik Kisla; Senem Kumova Metin
Because developing a corpus requires a long time and a lot of human effort, it is desirable to make it as resourceful as possible: rich in coverage, flexible, multipurpose, and expandable. Here we describe the steps we took in developing a Turkish paraphrase corpus, the factors we considered, the problems we faced, and how we dealt with them. Currently our corpus contains nearly 4000 sentences with the ratio of 60% paraphrase and 40% non-paraphrase sentence pairs. The sentence pairs are annotated on a 5-point scale: paraphrase, encapsulating, encapsulated, non-paraphrase, and opposite. The corpus is organized as a database integrated with a Turkish dictionary. The sources used so far are news texts from the Bilcon 2005 corpus, a set of professionally translated sentence pairs from the MSRP corpus, multiple Turkish translations from different languages included in the Tatoeba corpus, and user-generated paraphrases.
signal processing and communications applications conference | 2015
Tarik Kisla; Senem Kumova Metin; Bahar Karaoglan
Automatic identification of text similarity has found applications in information retrieval, text summarization, assessment of machine translation, assessment of question answering, word sense disambiguation, and many more. In this work, we report the results of discriminant analysis applied to determine the cumulative effect of the attributes used in the literature so far (ratio of common words, text lengths, common word sequences, synonyms, hypernyms, hyponyms) in detecting word similarity.
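Three of the surface attributes named in the abstract above can be computed as in the sketch below. The exact definitions used in the paper are not given, so the Jaccard-style overlap, the length ratio, and the longest shared contiguous word run are assumptions, as are the function names and example sentences. The discriminant analysis step is not shown.

```python
def common_word_ratio(a: str, b: str) -> float:
    """Shared distinct words divided by all distinct words (Jaccard overlap)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def length_ratio(a: str, b: str) -> float:
    """Shorter text length over longer text length, measured in words."""
    len_a, len_b = len(a.split()), len(b.split())
    if max(len_a, len_b) == 0:
        return 0.0
    return min(len_a, len_b) / max(len_a, len_b)


def longest_common_run(a: str, b: str) -> int:
    """Length of the longest contiguous word sequence shared by both texts."""
    x, y = a.lower().split(), b.lower().split()
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the run ending at the previous words
                best = max(best, cur[j])
        prev = cur
    return best


# Hypothetical sentence pair.
s1 = "the cat sat on the mat"
s2 = "a cat sat on a mat"
print(common_word_ratio(s1, s2), length_ratio(s1, s2), longest_common_run(s1, s2))
```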
2013 24th EAEEIE Annual Conference (EAEEIE 2013) | 2013
Bahar Karaoglan; Tarik Kisla
Information related to the same “information need” is available from many different sources widely spread on the web. This information may be worded or organized differently, depending on how different people conceptualize the domain and on design autonomy. Ontologies are seen as a way of overcoming this semantic heterogeneity and of bringing a common understanding for integrated access, that is, formulating a consolidated answer to a single query. In this work, we propose an ontology-based course prescription for individuals who wish to enhance their competencies in certain areas.
international symposium on computer and information sciences | 2007
İlker Kocabaş; Tarik Kisla; Bahar Karaoglan
Zipf's law of burstiness of content words has been studied less than his laws describing the relation between the rank and the frequency of words. Zipf counted the number of intervals of the same length between repetitions of words belonging to the same frequency class and, on a 260,000-word English corpus, showed empirically that the number of intervals, F, of a given size is inversely related to the interval size, I, between successive occurrences of a word: $F \propto I^{-p}$, where p varied between 1 and 1.3. In this study, we investigated the validity of the law of burstiness on a Turkish corpus of size 55,000 and found p varying between 0.5 and 0.8.
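The interval counting described in the abstract above, together with one simple way to estimate the exponent p, can be sketched as follows. The least-squares fit on log-log values is an assumption (the abstract does not say how p was estimated), and the function names and data are hypothetical.

```python
import math
from collections import Counter


def interval_counts(tokens, word):
    """Count how often each gap size occurs between successive uses of word."""
    positions = [i for i, t in enumerate(tokens) if t == word]
    return Counter(b - a for a, b in zip(positions, positions[1:]))


def fit_exponent(counts):
    """Estimate p in F ~ I**(-p) by least squares on (log I, log F).

    Requires at least two distinct interval sizes.
    """
    xs = [math.log(size) for size in counts]
    ys = [math.log(freq) for freq in counts.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope  # the log-log slope is -p


# Hypothetical token stream and a synthetic count table with F = 100 / I.
tokens = "a b a c c a".split()
print(interval_counts(tokens, "a"))
print(fit_exponent({1: 100, 2: 50, 4: 25}))
```

In practice the counts would be pooled over all words in a frequency class before fitting, as the abstract describes.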
Archive | 2009
Bulent Cavas; Bahar Karaoglan; Tarik Kisla
Procedia - Social and Behavioral Sciences | 2009
Tarik Kisla; Y. Deniz Arikan; Firat Sarsar
Society for Information Technology & Teacher Education International Conference | 2013
Firat Sarsar; Tarik Kisla