Jelena Prokić | Researchain

Publication

Featured research published by Jelena Prokić.

Conference of the European Chapter of the Association for Computational Linguistics | 2009

Evaluating the Pairwise String Alignment of Pronunciations

Martijn Wieling; Jelena Prokić; John Nerbonne

Pairwise string alignment (PSA) is an important general technique for obtaining a measure of similarity between two strings, used, for example, in dialectology, historical linguistics, transliteration, and in evaluating name distinctiveness. The current study focuses on evaluating different PSA methods at the alignment level rather than via the distances they induce. About 3.5 million pairwise alignments of Bulgarian phonetic dialect data are used to compare four algorithms with a manually corrected gold standard. The algorithms evaluated include three variants of the Levenshtein algorithm as well as the Pair Hidden Markov Model. Our results show that while all algorithms perform very well and align around 95% of all alignments correctly, there are specific qualitative differences in the (mis)alignments of the different algorithms.
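
As a rough illustration of what a pairwise alignment looks like, the following is a minimal sketch of the plain Levenshtein algorithm with a traceback. It assumes unit costs for insertion, deletion, and substitution, which is only the simplest variant and not necessarily any of the three variants evaluated in the paper. The function name `levenshtein_align` and the gap symbol `-` are illustrative choices.

```python
def levenshtein_align(a, b, gap="-"):
    """Return (distance, aligned_a, aligned_b) under unit edit costs.

    `a` and `b` are sequences of segments (plain strings work too).
    """
    n, m = len(a), len(b)
    # dist[i][j] holds the edit distance between a[:i] and b[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # deletion from a
                dist[i][j - 1] + 1,        # insertion into a
                dist[i - 1][j - 1] + sub,  # match or substitution
            )

    # Walk back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (
            0 if a[i - 1] == b[j - 1] else 1
        ):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            out_a.append(a[i - 1])
            out_b.append(gap)
            i -= 1
        else:
            out_a.append(gap)
            out_b.append(b[j - 1])
            j -= 1
    return dist[n][m], out_a[::-1], out_b[::-1]


if __name__ == "__main__":
    distance, row_a, row_b = levenshtein_align("kitten", "sitting")
    print(distance)
    print(" ".join(row_a))
    print(" ".join(row_b))
```

Several optimal alignments can exist for the same pair of strings. This sketch returns only one of them, chosen by the order of the checks in the traceback, which is one reason why evaluating at the alignment level can reveal differences that the distance alone hides.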

Conference of the European Chapter of the Association for Computational Linguistics | 2009

Multiple Sequence Alignments in Linguistics

Jelena Prokić; Martijn Wieling; John Nerbonne

In this study we apply and evaluate an iterative pairwise alignment program for producing multiple sequence alignments, ALPHAMALIG (Alonso et al., 2004), using as material the phonetic transcriptions of words used in Bulgarian dialectological research. To evaluate the quality of the multiple alignment, we propose two new methods based on comparing each column in the obtained alignments with the corresponding column in a set of gold standard alignments. Our results show that the alignments produced by ALPHAMALIG correspond well with the gold standard alignments, making this algorithm suitable for the automatic generation of multiple string alignments. Multiple string alignment is particularly interesting for historical reconstruction based on sound correspondences.
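
The idea of comparing alignments column by column can be sketched as follows. This is a simplified column-agreement ratio written for illustration only; it is not one of the two evaluation methods proposed in the paper, whose definitions are not given in the abstract. It assumes each alignment is a list of equal-length rows over the same words in the same order, with `-` as the gap symbol. The names `columns` and `column_agreement` are hypothetical.

```python
def columns(alignment, gap="-"):
    """Describe each column by the segment index it holds in every row.

    A gap is recorded as None, so two columns are equal only if they
    place exactly the same segments of the same words together.
    """
    next_index = [0] * len(alignment)
    result = []
    for k in range(len(alignment[0])):
        column = []
        for r, row in enumerate(alignment):
            if row[k] == gap:
                column.append(None)
            else:
                column.append(next_index[r])
                next_index[r] += 1
        result.append(tuple(column))
    return result


def column_agreement(predicted, gold, gap="-"):
    """Fraction of gold-standard columns reproduced exactly in `predicted`."""
    gold_cols = set(columns(gold, gap))
    pred_cols = set(columns(predicted, gap))
    return len(gold_cols & pred_cols) / len(gold_cols)


if __name__ == "__main__":
    # Toy strings, not real dialect transcriptions.
    gold = ["ab-c", "abdc", "a-dc"]
    predicted = ["ab-c", "abdc", "-adc"]
    print(column_agreement(predicted, gold))
```

Identifying columns by segment positions rather than by their symbols means that two alignments of different lengths can still be compared, as long as they align the same underlying words.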

International Journal of Humanities and Arts Computing | 2008

Recognising Groups among Dialects

Jelena Prokić; John Nerbonne

In this paper we apply various clustering algorithms to dialect pronunciation data. At the same time we propose several evaluation techniques that should be used in order to deal with the instability of the clustering techniques. The results show that three hierarchical clustering algorithms are not suitable for the data we are working with. The rest of the tested algorithms successfully detected a two-way split of the data into the Eastern and Western dialects. At the aggregate level that we used in this research, no further division of sites can be asserted with high confidence.

Meeting of the Association for Computational Linguistics | 2007

Identifying Linguistic Structure in a Quantitative Analysis of Dialect Pronunciation

Jelena Prokić

The aim of this paper is to present a new method for identifying linguistic structure in the aggregate analysis of language variation. The method consists of extracting the most frequent sound correspondences from the aligned transcriptions of words. Based on the extracted correspondences, every site is compared to all other sites, and a correspondence index is calculated for each site. This method enables us to identify sound alternations responsible for dialect divisions and to measure the extent to which each alternation is responsible for the divisions obtained by the aggregate analysis.
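
The extraction step can be sketched as a simple count over aligned positions. The snippet below assumes the transcriptions for two sites have already been aligned to equal length with `-` as the gap symbol, and it counts only positions where the two sites differ. The correspondence index itself is not defined in the abstract, so it is not reproduced here. The function name `correspondence_counts` and the toy strings are illustrative and are not taken from the paper's data.

```python
from collections import Counter


def correspondence_counts(aligned_pairs):
    """Count differing segment pairs across aligned word pairs.

    Each item of `aligned_pairs` is (row_a, row_b): the same word as
    transcribed at site A and site B, aligned to equal length.
    """
    counts = Counter()
    for row_a, row_b in aligned_pairs:
        if len(row_a) != len(row_b):
            raise ValueError("aligned rows must have equal length")
        for seg_a, seg_b in zip(row_a, row_b):
            if seg_a != seg_b:
                counts[(seg_a, seg_b)] += 1
    return counts


if __name__ == "__main__":
    # Toy aligned pairs for two hypothetical sites.
    pairs = [("b-el", "bjal"), ("ml-eko", "mljako"), ("den", "dan")]
    for (seg_a, seg_b), freq in correspondence_counts(pairs).most_common():
        print(seg_a, seg_b, freq)
```

Pairs are kept ordered (site A segment first), so `("e", "a")` and `("a", "e")` are counted separately. Whether to merge them depends on the analysis and is left open in this sketch.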

Scando Slavica | 2010

Quantitative and Traditional Classifications of Bulgarian Dialects Compared

H.P. Houtzagers; John Nerbonne; Jelena Prokić

Dialect classification is a classical problem in traditional dialectology. In the course of the last few decades, several quantitative approaches have been suggested as solutions for this problem, one of which uses “Levenshtein distance” for measuring linguistic distances between dialects. In the present paper we shall introduce the Levenshtein algorithm as well as two methods with which the resulting measurements can be analyzed, viz. multidimensional scaling and clustering. Then we shall apply these methods to the Bulgarian language area and present a quantitative classification of Bulgarian dialects. Finally, we shall compare the classification obtained to the most widely accepted traditional Bulgarian dialect map, analyze the similarities and differences, and evaluate our method.
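
To make the step from word-level Levenshtein distances to a distance between dialects concrete, here is a minimal sketch. It assumes that the distance between two sites is the mean of length-normalized Levenshtein distances over the words both sites share, with normalization by the longer word. This is a common simplification and is an assumption here, not a description of the paper's exact procedure. The names `levenshtein`, `site_distance`, and the toy word lists are hypothetical.

```python
def levenshtein(a, b):
    """Unit-cost edit distance between two sequences, using two rows."""
    prev = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        curr = [i]
        for j, seg_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + (seg_a != seg_b),   # match or substitution
            ))
        prev = curr
    return prev[-1]


def site_distance(words_a, words_b):
    """Mean normalized Levenshtein distance over concepts both sites share.

    `words_a` and `words_b` map a concept label to its transcription.
    """
    shared = sorted(set(words_a) & set(words_b))
    if not shared:
        raise ValueError("sites share no concepts")
    total = 0.0
    for concept in shared:
        x, y = words_a[concept], words_b[concept]
        total += levenshtein(x, y) / (max(len(x), len(y)) or 1)
    return total / len(shared)


if __name__ == "__main__":
    # Toy transcriptions for three hypothetical sites.
    sites = {
        "A": {"milk": "mleko", "white": "bel"},
        "B": {"milk": "mljako", "white": "bjal"},
        "C": {"milk": "mleko", "white": "bel"},
    }
    names = sorted(sites)
    matrix = [[site_distance(sites[p], sites[q]) for q in names] for p in names]
    for name, row in zip(names, matrix):
        print(name, [round(value, 3) for value in row])
```

The resulting symmetric matrix is the kind of input that multidimensional scaling or a clustering routine would take in the analysis step.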

Archive | 2006