Jelena Prokić
University of Groningen
Publication
Featured research published by Jelena Prokić.
Conference of the European Chapter of the Association for Computational Linguistics | 2009
Martijn Wieling; Jelena Prokić; John Nerbonne
Pairwise string alignment (PSA) is an important general technique for obtaining a measure of similarity between two strings, used, e.g., in dialectology, historical linguistics, transliteration, and in evaluating name distinctiveness. The current study focuses on evaluating different PSA methods at the alignment level instead of via the distances they induce. About 3.5 million pairwise alignments of Bulgarian phonetic dialect data are used to compare four algorithms with a manually corrected gold standard. The algorithms evaluated include three variants of the Levenshtein algorithm as well as the Pair Hidden Markov Model. Our results show that while all algorithms perform very well and align around 95% of all string pairs correctly, there are specific qualitative differences in the (mis)alignments of the different algorithms.
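As a rough illustration of what evaluating "at the alignment level" involves, the sketch below computes a plain unit-cost Levenshtein distance and also recovers one optimal alignment by backtracing. It is not any of the three Levenshtein variants evaluated in the paper, whose details are not given here; the function name `levenshtein_align`, the `"-"` gap symbol, and the unit costs are assumptions made for this example.

```python
def levenshtein_align(a, b):
    """Return (distance, alignment) for sequences a and b.

    Assumption: insertions, deletions and substitutions all cost 1.
    The alignment is a list of (segment_a, segment_b) pairs, with "-"
    (a hypothetical gap symbol) marking an insertion or deletion.
    """
    n, m = len(a), len(b)
    # dist[i][j] holds the edit distance between a[:i] and b[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # deletion from a
                dist[i][j - 1] + 1,        # insertion into a
                dist[i - 1][j - 1] + sub,  # match or substitution
            )

    # Backtrace from the bottom-right cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if a[i - 1] == b[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + sub:
                pairs.append((a[i - 1], b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((a[i - 1], "-"))
            i -= 1
        else:
            pairs.append(("-", b[j - 1]))
            j -= 1
    pairs.reverse()
    return dist[n][m], pairs


if __name__ == "__main__":
    distance, alignment = levenshtein_align("kitten", "sitting")
    print(distance)
    print(alignment)
```

Several alignments can share the same minimal distance, which is one reason two methods may agree on distances yet differ in the alignments they produce. The function accepts any sequences, so lists of phonetic segments work as well as plain strings.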
Conference of the European Chapter of the Association for Computational Linguistics | 2009
Jelena Prokić; Martijn Wieling; John Nerbonne
In this study we apply and evaluate an iterative pairwise alignment program for producing multiple sequence alignments, ALPHAMALIG (Alonso et al., 2004), using as material the phonetic transcriptions of words used in Bulgarian dialectological research. To evaluate the quality of the multiple alignment, we propose two new methods based on comparing each column in the obtained alignments with the corresponding column in a set of gold standard alignments. Our results show that the alignments produced by ALPHAMALIG correspond well with the gold standard alignments, making this algorithm suitable for the automatic generation of multiple string alignments. Multiple string alignment is particularly interesting for historical reconstruction based on sound correspondences.
International Journal of Humanities and Arts Computing | 2008
Jelena Prokić; John Nerbonne
In this paper we apply various clustering algorithms to dialect pronunciation data. At the same time we propose several evaluation techniques that should be used in order to deal with the instability of the clustering techniques. The results show that three hierarchical clustering algorithms are not suitable for the data we are working with. The rest of the tested algorithms successfully detected a two-way split of the data into the Eastern and Western dialects. At the aggregate level that we used in this research, no further division of sites can be asserted with high confidence.
Meeting of the Association for Computational Linguistics | 2007
Jelena Prokić
The aim of this paper is to present a new method for identifying linguistic structure in the aggregate analysis of language variation. The method consists of extracting the most frequent sound correspondences from the aligned transcriptions of words. Based on the extracted correspondences, every site is compared to all other sites, and a correspondence index is calculated for each site. This method enables us to identify sound alternations responsible for dialect divisions and to measure the extent to which each alternation is responsible for the divisions obtained by the aggregate analysis.
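The extraction step described above can be sketched as a simple frequency count over aligned segment pairs. This is only a minimal illustration: the input format, the `"-"` gap symbol, the decision to skip gaps, and the function name `count_correspondences` are all assumptions, and the paper's correspondence index itself is not reproduced here because its definition is not given in this abstract.

```python
from collections import Counter


def count_correspondences(aligned_pairs):
    """Count non-identical segment pairs across aligned transcriptions.

    aligned_pairs: iterable of (left, right) sequences of equal length,
    already aligned, with "-" as a hypothetical gap symbol.
    Assumption: gaps are skipped and the pair (x, y) is treated the same
    as (y, x), so each correspondence is stored in sorted order.
    """
    counts = Counter()
    for left, right in aligned_pairs:
        if len(left) != len(right):
            raise ValueError("aligned transcriptions must have equal length")
        for x, y in zip(left, right):
            if x != y and x != "-" and y != "-":
                counts[tuple(sorted((x, y)))] += 1
    return counts


if __name__ == "__main__":
    # Toy, made-up aligned transcriptions (not data from the paper).
    toy = [("abc", "abd"), ("xc", "xd"), ("a-", "ab")]
    print(count_correspondences(toy).most_common(3))
```

Calling `most_common(k)` on the returned counter gives the `k` most frequent correspondences, which could then serve as the basis for a per-site comparison.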
Scando Slavica | 2010
H.P. Houtzagers; John Nerbonne; Jelena Prokić
Dialect classification is a classical problem in traditional dialectology. In the course of the last few decades, several quantitative approaches have been suggested as solutions for this problem, one of which uses “Levenshtein distance” for measuring linguistic distances between dialects. In the present paper we shall introduce the Levenshtein algorithm as well as two methods with which the resulting measurements can be analyzed, viz. multidimensional scaling and clustering. Then we shall apply these methods to the Bulgarian language area and present a quantitative classification of Bulgarian dialects. Finally, we shall compare the classification obtained to the most widely accepted traditional Bulgarian dialect map, analyze the similarities and differences, and evaluate our method.
Archive | 2006
Sandra Kübler; Jelena Prokić
Verba : Anuario Galego de Filoloxia | 2012
Esteve Valls; John Nerbonne; Jelena Prokić; Martijn Wieling; Esteve Clua; Maria-Rosa Lloret
Serdica Journal of Computing | 2009
Jelena Prokić; John Nerbonne; Vladimir Zhobov; Petya Osenova; Kiril Simov; Thomas Zastrow; Erhard W. Hinrichs
Tools for linguistic variation, 2010, ISBN 978-84-9860-429-0, pp. 41-56 | 2010
John Nerbonne; Jelena Prokić; Martijn Wieling; Charlotte Gooskens
Meeting of the Association for Computational Linguistics | 2010
Jelena Prokić; Tim Van de Cruys