Kemal Oflazer | Researchain

Publication

Featured research published by Kemal Oflazer.

Archive | 2003

Building a Turkish Treebank

Kemal Oflazer; Bilge Say; Dilek Hakkani-Tür; Gokhan Tur

We present the issues that we have encountered in designing a treebank architecture for Turkish, along with the rationale for the choices we have made for various representation schemes. In the resulting representation, the information encoded in the complex agglutinative word structures is represented as a sequence of inflectional groups separated by derivational boundaries. The syntactic relations are encoded as labeled dependency relations among segments of lexical items marked by derivation boundaries. Our current work involves refining a set of treebank annotation guidelines and developing a sophisticated annotation tool with an extendable plug-in architecture for morphological analysis, morphological disambiguation and syntactic annotation disambiguation.

IEEE Transactions on Acoustics, Speech, and Signal Processing | 1983

Design and implementation of a single-chip 1-D median filter

Kemal Oflazer

The design and implementation of a VLSI chip for the one-dimensional median filtering operation is presented. The device is designed to operate on 8-bit sample sequences with a window size of five samples. Extensive pipelining and employment of systolic data-flow concepts at the bit level enable the chip to filter at rates up to ten megasamples per second. A configuration for using the chip for approximate two-dimensional median filtering operation is also presented.
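The filtering operation the chip performs can be illustrated in software. The following is a minimal sketch only, not the chip's bit-level systolic design: it assumes a plain sorted-window median, and the function name `median_filter_1d` and the edge handling (repeating the first and last sample) are illustrative choices that the abstract does not specify.

```python
def median_filter_1d(samples, window=5):
    """Sliding-window median over a sequence of samples (e.g. 8-bit values)."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    if not samples:
        return []
    half = window // 2
    # Assumption: edges are padded by repeating the first and last sample.
    padded = [samples[0]] * half + list(samples) + [samples[-1]] * half
    # The median of an odd-sized window is the middle element after sorting.
    return [sorted(padded[i:i + window])[half] for i in range(len(samples))]


# An isolated impulse (255) is removed by the five-sample window.
print(median_filter_1d([10, 10, 255, 10, 10, 10, 10]))
```

Sorting each window is the simplest way to state the operation; a hardware design would instead use comparison networks or bit-serial selection to reach the throughput described above.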

Computers and The Humanities | 2002

Statistical Morphological Disambiguation for Agglutinative Languages

Dilek Hakkani-Tür; Kemal Oflazer; Gokhan Tur

We present statistical models for morphological disambiguation in agglutinative languages, with a specific application to Turkish. Turkish presents an interesting problem for statistical models as the potential tag set size is very large because of the productive derivational morphology. We propose to handle this by breaking up the morphosyntactic tags into inflectional groups, each of which contains the inflectional features for each (intermediate) derived form. Our statistical models score the probability of each morphosyntactic tag by considering statistics over the individual inflectional groups and surface roots in trigram models. Among the four models that we have developed and tested, the simplest model ignoring the local morphotactics within words performs the best. Our best trigram model performs with 93.95% accuracy on our test data, getting all the morphosyntactic and semantic features correct. If we are just interested in syntactically relevant features and ignore a very small set of semantic features, then the accuracy increases to 95.07%.
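The idea of breaking a morphosyntactic tag into inflectional groups can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `^DB` derivation-boundary marker, the `+`-separated feature notation, the example analysis string, and the function name `split_inflectional_groups` are all assumed here and do not appear in the abstract.

```python
DB_MARKER = "^DB"  # assumed notation for a derivational boundary


def split_inflectional_groups(analysis):
    """Split 'root+Feat+...^DB+Feat+...' into a root and its inflectional groups."""
    # Everything before the first '+' is treated as the root.
    root, _, tags = analysis.partition("+")
    # Each derivational boundary starts a new inflectional group.
    groups = [group.strip("+") for group in tags.split(DB_MARKER)]
    return root, groups


# Hypothetical analysis of a noun derived into an adjective.
root, groups = split_inflectional_groups("ev+Noun+A3sg+Pnon+Loc^DB+Adj+Rel")
print(root)    # ev
print(groups)  # ['Noun+A3sg+Pnon+Loc', 'Adj+Rel']
```

Working with groups rather than whole tags keeps the number of distinct symbols a statistical model must estimate far smaller than the number of possible full tags.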

Computational Linguistics | 2008

Dependency Parsing of Turkish

Gülşen Eryiğit; Joakim Nivre; Kemal Oflazer

The suitability of different parsing methods for different languages is an important topic in syntactic parsing. Especially lesser-studied languages, typologically different from the languages for which methods have originally been developed, pose interesting challenges in this respect. This article presents an investigation of data-driven dependency parsing of Turkish, an agglutinative, free constituent order language that can be seen as the representative of a wider class of languages of similar type. Our investigations show that morphological structure plays an essential role in finding syntactic relations in such a language. In particular, we show that employing sublexical units called inflectional groups, rather than word forms, as the basic parsing units improves parsing accuracy. We test our claim on two different parsing methods, one based on a probabilistic model with beam search and the other based on discriminative classifiers and a deterministic parsing strategy, and show that the usefulness of sublexical units holds regardless of the parsing method. We examine the impact of morphological and lexical information in detail and show that, properly used, this kind of information can improve parsing accuracy substantially. Applying the techniques presented in this article, we achieve the highest reported accuracy for parsing the Turkish Treebank.

Natural Language Engineering | 2003

A statistical information extraction system for Turkish

Gokhan Tur; Dilek Hakkani-Tür; Kemal Oflazer

This paper presents the results of a study on information extraction from unrestricted Turkish text using statistical language processing methods. In languages like English, there is a very small number of possible word forms with a given root word. However, languages like Turkish have very productive agglutinative morphology. Thus, it is an issue to build statistical models for specific tasks using the surface forms of the words, mainly because of the data sparseness problem. In order to alleviate this problem, we used additional syntactic information, i.e. the morphological structure of the words. We have successfully applied statistical methods using both the lexical and morphological information to sentence segmentation, topic segmentation, and name tagging tasks. For sentence segmentation, we have modeled the final inflectional groups of the words and combined it with the lexical model, and decreased the error rate to 4.34%, which is 21% better than the result obtained using only the surface forms of the words. For topic segmentation, stems of the words (especially nouns) have been found to be more effective than using the surface forms of the words and we have achieved 10.90% segmentation error rate on our test set according to the weighted TDT-2 segmentation cost metric. This is 32% better than the word-based baseline model. For name tagging, we used four different information sources to model names. Our first information source is based on the surface forms of the words. Then we combined the contextual cues with the lexical model, and obtained some improvement. After this, we modeled the morphological analyses of the words, and finally we modeled the tag sequence, and reached an F-Measure of 91.56%, according to the MUC evaluation criteria. Our results are important in the sense that, using linguistic information, i.e. morphological analyses of the words, and a corpus large enough to train a statistical model significantly improves these basic information extraction tasks for Turkish.

Conference on Applied Natural Language Processing | 1994

Tagging and Morphological Disambiguation of Turkish Text

Kemal Oflazer; Ilker Kuruoz

Automatic text tagging is an important component in higher level analysis of text corpora, and its output can be used in many natural language processing applications. In languages like Turkish or Finnish, with agglutinative morphology, morphological disambiguation is a crucial process in tagging, as the structures of many lexical forms are morphologically ambiguous. This paper describes a POS tagger for Turkish text based on a full-scale two-level specification of Turkish morphology that is based on a lexicon of about 24,000 root words. This is augmented with a multiword and idiomatic construct recognizer and, most importantly, a morphological disambiguator based on local neighborhood constraints, heuristics and a limited amount of statistical information. The tagger also has functionality for statistics compilation and fine tuning of the morphological analyzer, such as logging erroneous morphological parses, commonly used roots, etc. Preliminary results indicate that the tagger can tag about 98-99% of the texts accurately with minimal user intervention. Furthermore, for sentences morphologically disambiguated with the tagger, an LFG parser developed for Turkish generates, on average, 50% fewer ambiguous parses and parses almost 2.5 times faster. The tagging functionality is not specific to Turkish, and can be applied to any language with a proper morphological analysis interface.

International Conference on Pattern Recognition | 1992

A rotation, scaling and translation invariant pattern classification system

Cem Yüceer; Kemal Oflazer

Presents a hybrid pattern classification system which can classify patterns in a rotation, scaling, and translation invariant manner. The system is based on preprocessing the input image to map it into a rotation, scaling, and translation invariant canonical form, which is then classified by a multilayer feedforward neural net. Results from a number of classification problems are also presented in the paper.

Computational Linguistics | 2001

Bootstrapping morphological analyzers by combining human elicitation and machine learning

Kemal Oflazer; Sergei Nirenburg; Marjorie McShane

This paper presents a semiautomatic technique for developing broad-coverage finite-state morphological analyzers for use in natural language processing applications. It consists of three components: elicitation of linguistic information from humans, a machine learning bootstrapping scheme, and a testing environment. The three components are applied iteratively until a threshold of output quality is attained. The initial application of this technique is for the morphology of low-density languages in the context of the Expedition project at NMSU Computing Research Laboratory. This elicit-build-test technique compiles lexical and inflectional information elicited from a human into a finite-state transducer lexicon and combines this with a sequence of morphographemic rewrite rules that is induced using transformation-based learning from the elicited examples. The resulting morphological analyzer is then tested against a test set, and any corrections are fed back into the learning procedure, which then builds an improved analyzer.

Pattern Recognition | 1993

A rotation, scaling, and translation invariant pattern classification system

Cem Yüceer; Kemal Oflazer

This paper describes a hybrid pattern classification system based on a pattern preprocessor and an artificial neural network classifier that can recognize patterns even when they are deformed by rotation, scaling, translation, or a combination of these. After a description of the system architecture, we provide experimental results from three different classification domains: classification of letters in the English alphabet, classification of the letters in the Japanese Katakana alphabet, and classification of geometric figures. For the first problem, our system can recognize patterns deformed by a single transformation with a success ratio well over 90%, and with an 89% success ratio when all three transformations are applied. For the second problem, the system performs very well for patterns deformed by scaling and translation but worse (about 75%) when rotations are involved. For the third problem, the success ratio is almost 100% when only a single transformation is applied and 88% when all three transformations are applied. The system is general purpose and has a reasonable noise tolerance.

Workshop on Statistical Machine Translation | 2006

Initial Explorations in English to Turkish Statistical Machine Translation

Ilknur Durgar El-Kahlout; Kemal Oflazer

This paper presents some very preliminary results for and problems in developing a statistical machine translation system from English to Turkish. Starting with a baseline word model trained from about 20K aligned sentences, we explore various ways of exploiting morphological structure to improve upon the baseline system. As Turkish is a language with complex agglutinative word structures, we experiment with morphologically segmented and disambiguated versions of the parallel texts in order to also uncover relations between morphemes and function words in one language with morphemes and function words in the other, in addition to relations between open class content words. Morphological segmentation on the Turkish side also conflates the statistics from allomorphs so that sparseness can be alleviated to a certain extent. We find that this approach, coupled with a simple grouping of most frequent morphemes and function words on both sides, improves the BLEU score from the baseline of 0.0752 to 0.0913 with the small training data. We close with a discussion on why one should not expect distortion parameters to model word-local morpheme ordering and that a new approach to handling complex morphotactics is needed.
