Publication


Featured research published by Burcu Can.


cross language evaluation forum | 2009

Clustering morphological paradigms using syntactic categories

Burcu Can; Suresh Manandhar

We propose a new clustering algorithm for the induction of morphological paradigms. Our method is unsupervised and exploits the syntactic categories of words acquired by an unsupervised syntactic category induction algorithm [1]. Previous research [2,3] on joint learning of morphology and syntax has shown that the two types of knowledge affect each other, making it possible to use one to help learn the other.


International Conference on Statistical Language and Speech Processing | 2016

Unsupervised Morphological Segmentation Using Neural Word Embeddings

Ahmet Üstün; Burcu Can

We present a fully unsupervised method for morphological segmentation. Unlike many morphological segmentation systems, our method is based on semantic features rather than orthographic features. In order to capture word meanings, word embeddings are obtained from a two-level neural network [11]. We compute the semantic similarity between words using the neural word embeddings, which forms our baseline segmentation model. We model morphotactics with a bigram language model based on maximum likelihood estimates by using the initial segmentations from the baseline. Results show that using semantic features helps to improve morphological segmentation especially in agglutinating languages like Turkish. Our method shows competitive performance compared to other unsupervised morphological segmentation systems.
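The morphotactics step described in the abstract above (a bigram language model with maximum likelihood estimates over initial segmentations) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy segmentations, the boundary symbols `<s>` and `</s>`, and the function names are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical boundary symbols marking the start and end of a word.
START, END = "<s>", "</s>"


def train_bigram_mle(segmentations):
    """Count morpheme bigrams over segmented words and return an MLE scorer."""
    bigram_counts = Counter()
    context_counts = Counter()
    for morphemes in segmentations:
        padded = [START, *morphemes, END]
        for prev, curr in zip(padded, padded[1:]):
            bigram_counts[(prev, curr)] += 1
            context_counts[prev] += 1

    def prob(prev, curr):
        # MLE: count(prev, curr) / count(prev); 0.0 if the context was never seen.
        if context_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, curr)] / context_counts[prev]

    return prob


def segmentation_probability(prob, morphemes):
    """Probability of a morpheme sequence under the bigram model."""
    padded = [START, *morphemes, END]
    p = 1.0
    for prev, curr in zip(padded, padded[1:]):
        p *= prob(prev, curr)
    return p


# Toy initial segmentations (assumed data, standing in for baseline output).
toy = [["ev", "ler"], ["ev", "de"], ["kitap", "lar"], ["ev", "ler", "de"]]
prob = train_bigram_mle(toy)
```

Because the estimates are unsmoothed, any segmentation containing an unseen morpheme transition receives probability zero; a practical system would likely need smoothing, which the abstract does not specify.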


international conference on computational linguistics | 2014

Methods and Algorithms for Unsupervised Learning of Morphology

Burcu Can; Suresh Manandhar

This paper is a survey of methods and algorithms for unsupervised learning of morphology. We provide a description of the methods and algorithms used for morphological segmentation from a computational linguistics point of view. We survey morphological segmentation methods, covering those based on minimum description length (MDL), maximum likelihood estimation (MLE), maximum a posteriori (MAP) estimation, and parametric and non-parametric Bayesian approaches. A review of the evaluation schemes for unsupervised morphological segmentation is also provided, along with a summary of evaluation results on the Morpho Challenge evaluations.


signal processing and communications applications conference | 2016

Clustering word roots syntactically

Mustafa Öztürk; Burcu Can

Distributional representations of words are used for both syntactic and semantic tasks. In this paper, two different methods are presented for clustering word roots. In the first method, the distributional model word2vec [1] is used for clustering word roots, whereas distributional approaches are generally used for words. For this purpose, the distributional similarities of roots are modeled and the roots are divided into syntactic categories (noun, verb, etc.). In the other method, two different models are proposed: an information-theoretic model and a probabilistic model. With a metric [8] based on mutual information and with another metric based on Jensen-Shannon divergence, similarities of word roots are calculated and clustering is performed using these metrics. Clustering word roots has a significant role in other natural language processing applications such as machine translation and question answering, and in other applications that include language generation. The resulting clusters achieve a purity of 0.92.
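Two quantities named in the abstract above, Jensen-Shannon divergence and cluster purity, have standard definitions that can be sketched briefly. The sketch assumes each root is represented by a discrete probability distribution (for example over contexts); that representation, the function names, and the toy data are assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded in [0, 1] with log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def purity(cluster_ids, gold_labels):
    """Fraction of items that carry the majority gold label of their cluster."""
    by_cluster = {}
    for cluster, gold in zip(cluster_ids, gold_labels):
        by_cluster.setdefault(cluster, Counter())[gold] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(gold_labels)


# Toy example (assumed data): five roots, two clusters, gold noun/verb labels.
toy_clusters = [0, 0, 0, 1, 1]
toy_gold = ["noun", "noun", "verb", "verb", "verb"]
toy_purity = purity(toy_clusters, toy_gold)
```

A lower Jensen-Shannon divergence indicates more similar distributions, so a similarity score for clustering could be derived from it, for instance as one minus the divergence; the exact metric used in the paper is not given in the abstract.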


soft computing and pattern recognition | 2013

A syllable-based Turkish speech recognition system by using time delay neural networks (TDNNs)

Burcu Can; Harun Artuner

In this paper, we present a model for Turkish speech recognition. The model is syllable-based, where the recognition is performed through syllables as speech recognition units. The main goal of the model is to recognize as much as possible of a given continuous speech by identifying only a small set of syllables in the language. For that purpose, only the syllable types with a higher frequency are selected for the recognition. The use of longer recognition units in speech recognition systems increases the success of the recognition since it is easier to detect the endpoints of syllables when compared to phonemes. On the other hand, word-based recognition requires a very large dataset that includes all the words and word forms in the language, which is another challenge. We therefore take advantage of Turkish being an orthographically transparent and syllabified language. Our model employs time delay neural networks (TDNNs) for learning syllables. We achieve an accuracy of 65.6% on our large vocabulary continuous speech corpus. In addition, we define an algorithm for the automatic detection of syllable boundaries, which gives an accuracy of 44%. The automatic syllable boundary detection module is used for the recognition of isolated syllables rather than continuous speech.


Computational Linguistics | 2018

Tree Structured Dirichlet Processes for Hierarchical Morphological Segmentation

Burcu Can; Suresh Manandhar

This article presents a probabilistic hierarchical clustering model for morphological segmentation. In contrast to existing approaches to morphology learning, our method allows learning hierarchical organization of word morphology as a collection of tree structured paradigms. The model is fully unsupervised and based on the hierarchical Dirichlet process. Tree hierarchies are learned along with the corresponding morphological paradigms simultaneously. Our model is evaluated on Morpho Challenge and shows competitive performance when compared to state-of-the-art unsupervised morphological segmentation systems. Although we apply this model for morphological segmentation, the model itself can also be used for hierarchical clustering of other types of data.


signal processing and communications applications conference | 2017

Stem-based PoS tagging for agglutinative languages

Necva Bölücü; Burcu Can

Words are made up of morphemes glued together in agglutinative languages. This makes it difficult to perform part-of-speech tagging for these languages due to sparsity. In this paper, we present two Hidden Markov Model based Bayesian PoS tagging models for agglutinative languages. Our first model is word-based and the second model is stem-based, where the stems of the words are obtained from two other unsupervised stemmers: HPS stemmer and Morfessor FlatCat. The results show that stemming improves the accuracy in PoS tagging. We present the results for Turkish as an agglutinative language and English as a morphologically poor language.


international conference on asian language processing | 2016

Modeling morpheme triplets with a three-level hierarchical Dirichlet process

Serkan Kumyol; Burcu Can

Morphemes are not independent units; they attach to each other based on morphotactics. However, most models in the literature assume that they are independent of each other in order to cope with the complexity. We introduce a language-independent model for unsupervised morphological segmentation using a hierarchical Dirichlet process (HDP). We model the morpheme dependencies in terms of morpheme trigrams in each word. Trigrams, bigrams and unigrams are modeled within a three-level HDP, where the trigram Dirichlet process (DP) uses the bigram DP and the bigram DP uses the unigram DP as the base distribution. The results show that modeling morpheme dependencies improves the F-measure noticeably in English, Turkish and Finnish.
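The chained base distributions described in the abstract above (trigram backs off to bigram, bigram to unigram) can be illustrated with a simplified predictive probability of the form (count + alpha * base) / (total + alpha). This is a sketch under strong assumptions, not the paper's model: it counts every observation at all three levels instead of tracking Chinese restaurant process tables, it performs no inference, and the uniform bottom-level base, the concentration value 1.0, the `<s>` padding symbol, and all names are hypothetical.

```python
from collections import Counter


class BackoffDP:
    """One level of the hierarchy with a simplified DP predictive probability."""

    def __init__(self, alpha, base):
        self.alpha = alpha          # concentration parameter (assumed value)
        self.base = base            # callable: (context, morpheme) -> probability
        self.counts = Counter()     # (context, morpheme) -> count
        self.totals = Counter()     # context -> count

    def observe(self, context, morpheme):
        self.counts[(context, morpheme)] += 1
        self.totals[context] += 1

    def prob(self, context, morpheme):
        # Observed counts are interpolated with the base (lower-level) estimate.
        numer = self.counts[(context, morpheme)] + self.alpha * self.base(context, morpheme)
        return numer / (self.totals[context] + self.alpha)


VOCAB = ["ev", "ler", "de", "kitap"]  # toy morpheme inventory (assumed)

# Each level uses the next shorter context as its base distribution.
unigram = BackoffDP(1.0, lambda ctx, m: 1.0 / len(VOCAB))
bigram = BackoffDP(1.0, lambda ctx, m: unigram.prob((), m))
trigram = BackoffDP(1.0, lambda ctx, m: bigram.prob(ctx[-1:], m))


def observe_word(morphemes):
    """Update all three levels with the morpheme trigrams of one segmented word."""
    padded = ["<s>", "<s>", *morphemes]
    for i in range(2, len(padded)):
        m = padded[i]
        unigram.observe((), m)
        bigram.observe((padded[i - 1],), m)
        trigram.observe((padded[i - 2], padded[i - 1]), m)


for word in (["ev", "ler"], ["ev", "de"]):
    observe_word(word)
```

With this toy data, `trigram.prob(("<s>", "ev"), "ler")` is higher than `trigram.prob(("<s>", "ev"), "kitap")`, while the unseen morpheme still receives non-zero mass through the backoff chain.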


conference on intelligent text processing and computational linguistics | 2016

Turkish PoS Tagging by Reducing Sparsity with Morpheme Tags in Small Datasets

Burcu Can; Ahmet Üstün; Murathan Kurfali

Sparsity is one of the major problems in natural language processing. The problem becomes even more severe in agglutinating languages, which are highly inflected. We deal with sparsity in Turkish by adopting morphological features for part-of-speech tagging. We learn inflectional and derivational morpheme tags in Turkish by using conditional random fields (CRF) and we employ the morpheme tags in part-of-speech (PoS) tagging by using hidden Markov models (HMMs) to mitigate sparsity. Results show that using morpheme tags in PoS tagging helps alleviate the sparsity in emission probabilities. Our model outperforms other hidden Markov model based PoS tagging models for small training datasets in Turkish. We obtain an accuracy of 94.1% in morpheme tagging and 89.2% in PoS tagging on a 5K training dataset.


CLEF (Working Notes) | 2009

Unsupervised Learning of Morphology by Using Syntactic Categories

Burcu Can; Suresh Manandhar

Collaboration


Dive into Burcu Can's collaborations.

Top Co-Authors

Ahmet Üstün

Middle East Technical University

Murathan Kurfali

Middle East Technical University

Cengiz Acartürk

Middle East Technical University

Serkan Kumyol

Middle East Technical University

Özkan Kiliç

Middle East Technical University