Stefan Gerdjikov
Sofia University
Publication
Featured research published by Stefan Gerdjikov.
international conference on management of data | 2014
Sebastian Wandelt; Dong Deng; Stefan Gerdjikov; Shashwat Mishra; Petar Mitankin; Manish Patil; Enrico Siragusa; Alexander Tiskin; Wei Wang; Jiaying Wang; Ulf Leser
String similarity search and its variants are fundamental problems with many applications in areas such as data integration, data quality, computational linguistics, or bioinformatics. A plethora of methods have been developed over the last decades. Obtaining an overview of the state of the art in this field is difficult, as results are published in various domains without much cross-talk, papers use different data sets and often study subtle variations of the core problems, and the sheer number of proposed methods exceeds the capacity of a single research group. In this paper, we report on the results of what is probably the largest benchmark ever performed in this field. To overcome the resource bottleneck, we organized the benchmark as an international competition, a workshop at EDBT/ICDT 2013. Various teams from different fields and from all over the world developed or tuned programs for two crisply defined problems. All algorithms were evaluated by an external group on two machines. Altogether, we compared 14 different programs on two string matching problems (k-approximate search and k-approximate join) using data sets of increasing sizes and with different characteristics from two different domains. We compare programs primarily by wall clock time, but also provide results on memory usage, indexing time, batch query effects and scalability in terms of CPU cores. Results were averaged over several runs and confirmed on a second, different hardware platform. A particularly interesting observation is that disciplines can and should learn more from each other, with the three best teams rooted in computational linguistics, databases, and bioinformatics, respectively.
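The two benchmark problems can be illustrated with a naive baseline. The following is a minimal sketch, assuming Levenshtein edit distance as the similarity measure; the function names are hypothetical and the code is far simpler than any of the indexed programs compared in the paper.

```python
def edit_distance_at_most(a, b, k):
    """Return the Levenshtein distance of a and b if it is <= k, else None."""
    if abs(len(a) - len(b)) > k:
        return None  # the length difference alone already exceeds k
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        if min(cur) > k:
            return None  # row minima never decrease, so we can stop early
        prev = cur
    return prev[-1] if prev[-1] <= k else None


def k_approximate_search(query, strings, k):
    """All strings within edit distance k of the query (linear scan)."""
    return [s for s in strings if edit_distance_at_most(query, s, k) is not None]


def k_approximate_join(strings, k):
    """All unordered pairs of strings within edit distance k (quadratic scan)."""
    return [(s, t)
            for i, s in enumerate(strings)
            for t in strings[i + 1:]
            if edit_distance_at_most(s, t, k) is not None]
```

A linear scan like this only serves as a correctness reference; competitive programs typically avoid comparing the query against every string.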
international conference on document analysis and recognition | 2013
Stefan Gerdjikov; Stoyan Mihov; Vladislav Nenchev
We describe a novel approach for the extraction of spelling variations from a list of instances. It relates distinctive infixes to distinctive infixes of referenced words. The distinctive infixes are extracted automatically from a (multi)set of instances and a referenced dictionary without any additional expert knowledge. Based on the spelling variations retrieved during a learning (training) phase, we develop a correction algorithm which suggests and ranks candidates for a particular noisy word. The main advantage of our approach is that it provides good corrections for unobserved noisy words while it is almost perfect on words observed during learning. Our experimental results on the normalisation of a typical reference corpus of Early Modern English letters [1] significantly improve over previous results of VARD2 [2]. We also achieve better results than those reported in [3] and [4] on the OCR-correction of the TREC-5 Confusion Track corpus [5].
Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage | 2014
Petar Mitankin; Stefan Gerdjikov; Stoyan Mihov
We present a novel approach to unsupervised noisy text correction. Our approach is based on automatic extraction of historical variation patterns by analysing the structure of the words from a historical corpus and comparing it with the structure of the contemporary dictionary. Based on the extracted variation patterns the core candidate generator, REBELS, produces correction candidates even outside the modern dictionary. Further, the sentence correction is complemented with a modern language model combined in a log-linear model. The quality of our unsupervised approach is empirically compared against a supervised system competitive with the state-of-the-art supervised text normalisation systems. The experiments show that our system delivers 81.79% normalisation accuracy of 17th century English historical texts in a fully unsupervised setup.
language and automata theory and applications | 2017
Stefan Gerdjikov; Stoyan Mihov
The input determinization of a finite-state transducer for constructing an equivalent subsequential transducer is performed by the well-known inductive transducer determinization procedure. This procedure has been shown to complete for rational functions with the bounded variation property. The result has been obtained for functions \(f : \varSigma ^*\rightarrow \mathcal {M}\), where \(\mathcal {M}\) is a free monoid, the monoid of non-negative real numbers with addition or a Cartesian product of those monoids. In this paper we generalize this result and define and prove sufficient conditions for a monoid \(\mathcal {M}\) and a rational function \(f : \varSigma ^*\rightarrow \mathcal {M}\), under which the transducer determinization procedure is applicable and terminates.
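For the classical case of string outputs (a free monoid), the inductive determinization procedure can be sketched as below. This is an illustrative sketch under stated assumptions, not the generalised construction of the paper: the transducer has no epsilon inputs, final states emit no extra output, and the names `determinize`, `run` and the `max_states` guard are hypothetical. The guard stands in for the fact that the procedure need not terminate when the bounded variation property fails.

```python
from os.path import commonprefix  # character-wise longest common prefix


def determinize(initial, finals, delta, alphabet, max_states=1000):
    """delta maps (state, symbol) to a list of (output_string, next_state)."""
    start = frozenset({(initial, "")})  # pairs of (state, delayed output)
    trans, final_out = {}, {}
    seen, todo = {start}, [start]
    while todo:
        S = todo.pop()
        outs = {w for q, w in S if q in finals}
        if len(outs) > 1:
            raise ValueError("transducer is not functional")
        if outs:
            final_out[S] = next(iter(outs))  # flush the delayed output
        for a in alphabet:
            succ = {(q2, w + o)
                    for q, w in S
                    for o, q2 in delta.get((q, a), [])}
            if not succ:
                continue
            prefix = commonprefix([w for _, w in succ])  # emit what is certain
            T = frozenset((q2, w[len(prefix):]) for q2, w in succ)
            trans[(S, a)] = (prefix, T)
            if T not in seen:
                if len(seen) >= max_states:
                    raise RuntimeError("state limit reached; may not terminate")
                seen.add(T)
                todo.append(T)
    return start, trans, final_out


def run(start, trans, final_out, word):
    """Apply the resulting subsequential transducer; None if word is rejected."""
    state, out = start, ""
    for a in word:
        if (state, a) not in trans:
            return None
        o, state = trans[(state, a)]
        out += o
    return out + final_out[state] if state in final_out else None


# Hypothetical example: a nondeterministic transducer for ab -> xz, ac -> xyw.
delta = {(0, "a"): [("x", 1), ("xy", 2)],
         (1, "b"): [("z", 3)],
         (2, "c"): [("w", 3)]}
det = determinize(0, {3}, delta, "abc")
```

Each state of the result is a set of pairs (original state, delayed output), and the longest common prefix of the pending outputs is emitted as early as possible.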
document analysis systems | 2014
Andrey Sariev; Vladislav Nenchev; Stefan Gerdjikov; Petar Mitankin; Hristo Ganchev; Stoyan Mihov; Tinko Tinchev
We present a new general and language independent approach to the noisy text correction problem developed and implemented in the framework of the CULTURA project. We briefly describe the core candidate generator, REBELS, the complete system concept, its efficient implementation based on functional automata and its immediate applications. The quality of the whole system is empirically established in different experimental settings where language and noise sources are varied.
language and automata theory and applications | 2018
Stefan Gerdjikov
In this paper we consider the problems of canonisation and minimisation of subsequential transducers with output in an arbitrary monoid. We show that these problems can be efficiently solved for a large class of monoids that includes the free monoids, the tropical monoid, and groups, and is closed under Cartesian product. We describe this class of monoids in terms of five simple axioms. The first four of them seem to be natural. For the last one, we show that it is also necessary.
Computational Geometry: Theory and Applications | 2008
Stefan Gerdjikov; Alexander Wolff
In this paper we consider the problem of decomposing a simple polygon into subpolygons that exclusively use vertices of the given polygon. We allow two types of subpolygons: pseudo-triangles and convex polygons. We call the resulting decomposition PT-convex. We are interested in minimum decompositions, i.e., in decomposing the input polygon into the least number of subpolygons. Allowing subpolygons of one of two types has the potential to reduce the complexity of the resulting decomposition considerably. The problem of decomposing a simple polygon into the least number of convex polygons has been considered before. We extend a dynamic-programming algorithm of Keil and Snoeyink for that problem to the case that both convex polygons and pseudo-triangles are allowed. Our algorithm determines such a decomposition in O(n^3) time and space, where n is the number of vertices of the polygon.
international conference on implementation and application of automata | 2018
Stefan Gerdjikov
In this paper we consider the problem of sequentialisation of rational functions \(f:\varSigma ^*\rightarrow \mathcal{M}\). We introduce a class of monoids that includes infinitary groups, free monoids, tropical monoids and is closed under Cartesian Product. For this class of monoids we provide a sequentialisation construction for transducers and appropriately generalise the notion of Twinning Property. We provide a construction to test the Twinning Property for transducers over the considered class of monoids and prove that it is a necessary and sufficient condition for the sequentialisation construction to terminate.
international conference on implementation and application of automata | 2017
Stefan Gerdjikov; Stoyan Mihov; Klaus U. Schulz
The standard construction of a bimachine from a functional transducer involves a preparation step for converting the transducer into an unambiguous transducer (a transducer is unambiguous if there exists at most one successful path for each label). The conversion involves a specialized determinization. We introduce a new construction principle where the transducer is directly translated into a bimachine. For any input word accepted by the transducer the bimachine exactly imitates one successful path of the transducer. For some classes of transducers the new construction can build a bimachine with an exponentially lower number of states compared to the standard construction. We first present a simple and generic variant of the construction. A second specialized version leads to better complexity bounds in terms of the size of the bimachine.
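To make the target object concrete, the sketch below shows how a bimachine processes a word: a left deterministic automaton reads the input left to right, a right deterministic automaton reads it right to left, and an output function combines both states with the current symbol. It illustrates only the general notion of a bimachine, not the construction from the paper; the example function and all names (`run_bimachine`, `rewrite`, the state labels) are hypothetical.

```python
def run_bimachine(word, left, right, psi, l0, r0):
    """Run a bimachine given as two transition dicts and an output function."""
    n = len(word)
    ls = [l0]                      # ls[i]: left state after reading word[:i]
    for a in word:
        ls.append(left[(ls[-1], a)])
    rs = [r0] * (n + 1)            # rs[i]: right state after word[i:] reversed
    for i in range(n - 1, -1, -1):
        rs[i] = right[(rs[i + 1], word[i])]
    # Position i sees the prefix before it (ls[i]) and the suffix after it.
    return "".join(psi(ls[i], word[i], rs[i + 1]) for i in range(n))


# Example function over {a, b, c}: rewrite every 'a' as 'x' if the whole
# word ends in 'c', otherwise as 'y'; copy all other letters unchanged.
ALPHABET = "abc"
left = {("L", s): "L" for s in ALPHABET}          # left context is irrelevant
right = {("empty", "c"): "ends_c",
         ("empty", "a"): "ends_other",
         ("empty", "b"): "ends_other"}
for state in ("ends_c", "ends_other"):
    for s in ALPHABET:
        right[(state, s)] = state                  # last letter is fixed


def psi(l, a, r):
    if a != "a":
        return a
    return "x" if r == "ends_c" else "y"


def rewrite(word):
    return run_bimachine(word, left, right, psi, "L", "empty")
```

The example needs unbounded lookahead to the end of the word, so it cannot be computed by a left-to-right sequential transducer alone, which is the kind of function bimachines are meant to handle.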
DH | 2013
Séamus Lawless; Cormac Hampson; Petar Mitankin; Stefan Gerdjikov