Darinka Verdonik
University of Maribor
Publications
Featured research published by Darinka Verdonik.
Language Resources and Evaluation | 2013
Darinka Verdonik; Iztok Kosem; Ana Zwitter Vitez; Simon Krek; Marko Stabej
In recent years, building reference speech corpora has been an important part of the work of providing the necessary linguistic infrastructure in many European countries, both for languages with many speakers (e.g., French, German, Spanish, Italian) and for those with fewer speakers (e.g., Swedish, Dutch, Czech, Slovak). This paper describes how a reference speech corpus was created and distributed to potential users, taking the Slovene corpus GOS as the case. It covers the corpus structure, fieldwork experience with recording, the labelling system, and the two levels of transcription (pronunciation-based and standardized), as well as the main characteristics of the corpus interface (a web concordancer) and the availability of the original corpus files.
Discourse Studies | 2008
Darinka Verdonik; Andrej Žgank; Agnes Pisanski Peterlin
The relationships between text or talk and the context are among the basic fields of pragmatic research and an insight into their nature may contribute to a better understanding of language use. In this article, we use the results of an analysis of discourse marker use in two different conversational genres (telephone conversation and television interviews) in an attempt to examine the impact of context on the use of discourse markers, generalized for each analysed genre. In the first stage of the analysis, we observe important differences between the two genres: discourse markers are far more frequently used in telephone conversations than in television interviews. In the second stage of the analysis, we identify several contextual factors which contribute to the differences in the use of discourse markers. In this way, we obtain insight into this particular aspect of genre context-talk relationships, and identify some of the characteristics of the genres in question.
Language Resources and Evaluation | 2007
Darinka Verdonik; Matej Rojc; Marko Stabej
Speech-to-speech translation technology has difficulty processing elements of spontaneity in conversation. We propose a discourse marker attribute in speech corpora to help overcome some of these problems. There have already been some attempts to annotate discourse markers in speech corpora. However, as there is no consensus on which expressions count as discourse markers, we have to reconsider how to set up a framework for annotation, and, in order to better understand what we gain by introducing a discourse marker category, we have to analyse their characteristics and functions in discourse. This is especially important for languages such as Slovenian, where little or no research on discourse markers has been carried out. The aims of this paper are to present a scheme for annotating discourse markers, based on the analysis of a corpus of Slovenian telephone conversations in the tourism domain, and to give some additional arguments, based on the characteristics and functions of discourse markers, that confirm their special status in conversation.
Pattern Recognition Letters | 2014
Mirjam Sepesy Maucec; Zdravko Kacic; Darinka Verdonik
This paper addresses the problem of statistical machine translation between highly inflected languages. Even when dealing with closely-related language pairs, statistical machine translation encounters problems if the parallel corpus is not big enough. To reduce the problem of data sparsity, we use the approach called factored translation, which has proven successful when translating between English and a morphologically rich language. We show that it is even more useful when translating between two highly inflected languages. The main contribution of the paper involves two extensions of the factored translation approach. First, we propose a new, more general asynchronous framework for training translation components, where lemmas in the lemma component and MSD tags in the MSD component are aligned independently of alignment done for surface word forms. The second contribution of the paper is a new technique for efficient use of a bilingual dictionary in the translation process. A dictionary is introduced into the lemma component to improve lexical translation. Dictionary use is based on entropy. We tested our enhanced translation approach on the Slovenian-Serbian language pair. The system was trained on a freely available OpenSubtitle corpus. The results show improvements in automatic scores (BLEU and TER). The approach could be used for other language pairs, especially if one or both are highly inflected.
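The abstract above states only that dictionary use "is based on entropy" and does not give the criterion. As a rough illustration of how such a rule could look, the sketch below computes the Shannon entropy of a lemma's lexical translation distribution and falls back to a bilingual dictionary when the learned distribution is too uncertain. The function names, the threshold value, and the toy probabilities are all assumptions made for this example; they are not taken from the paper.

```python
import math


def translation_entropy(probs):
    """Shannon entropy (in bits) of a lexical translation distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def choose_translation(lemma, model_table, dictionary, threshold=1.0):
    """Hypothetical selection rule (an assumption, not the paper's method):
    keep the learned translation when its distribution is confident
    (low entropy); otherwise prefer the dictionary entry if one exists."""
    probs = model_table.get(lemma, {})
    best = max(probs, key=probs.get) if probs else None
    if probs and translation_entropy(probs) <= threshold:
        return best
    # Uncertain or unseen lemma: use the dictionary, else the best guess.
    return dictionary.get(lemma) or best


# Toy lemma-level translation table; the probabilities are invented.
model_table = {
    "voda": {"voda": 0.9, "vodica": 0.1},
    "grad": {"zamak": 0.25, "dvorac": 0.25, "grad": 0.25, "tvrdjava": 0.25},
}
dictionary = {"grad": "zamak"}

confident = choose_translation("voda", model_table, dictionary)
uncertain = choose_translation("grad", model_table, dictionary)
```

Here the distribution for "voda" has low entropy, so the learned translation is kept, while the uniform distribution for "grad" has an entropy of 2 bits and triggers the dictionary lookup.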
International Journal of Speech Technology | 2007
Matej Rojc; Darinka Verdonik; Zdravko Kacic
This paper presents a framework for the efficient development and representation of morphological and phonetic lexicons for use in speech technology applications. When developing the lexicons, the solutions most appropriate for building speech technologies for a specific language have to be analyzed. The paper addresses issues such as the development of resources, good word coverage in general texts, efficient coding of lexicons, representation (regarding time and memory space), and the integration of lexicons in speech processing applications. The construction process within the proposed framework is based on finite-state machines and heterogeneous relation-graph structures; it significantly reduces the time and effort needed to construct large-scale lexicons, minimizes analysis errors, and represents the lexicons efficiently in terms of time and memory usage. The wordlist construction process presented in the paper also ensures that the constructed lexicons achieve high word coverage in general texts. SIlex lexicons are large-scale phonetic and morphological lexicons for the Slovenian language, constructed within the new framework using a purpose-built toolset, and represent valuable language resources for the development of various speech processing applications for the Slovenian language.
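To make the idea of a finite-state lexicon concrete, the sketch below stores word forms in a trie, which is the simplest kind of deterministic finite-state acceptor: each character is a transition and final states carry the morphological analyses. It is a minimal illustration only and does not reproduce the paper's framework, its heterogeneous relation-graph structures, or the SIlex data; the class name and the example tags are hypothetical.

```python
class LexiconTrie:
    """Word-form lexicon stored as a character trie (a deterministic
    finite-state acceptor whose final states hold analyses)."""

    _FINAL = None  # key marking a final state; cannot clash with a character

    def __init__(self):
        self.root = {}

    def add(self, form, lemma, tag):
        """Insert one word form together with its (lemma, tag) analysis."""
        node = self.root
        for ch in form:
            node = node.setdefault(ch, {})
        node.setdefault(self._FINAL, []).append((lemma, tag))

    def lookup(self, form):
        """Return all analyses of `form`, or an empty list if unknown."""
        node = self.root
        for ch in form:
            node = node.get(ch)
            if node is None:
                return []
        return list(node.get(self._FINAL, []))


lexicon = LexiconTrie()
# Illustrative entries; the tag strings are placeholders, not real MSD tags.
lexicon.add("miza", "miza", "NOUN-sg-nom")
lexicon.add("mize", "miza", "NOUN-sg-gen")
lexicon.add("mize", "miza", "NOUN-pl-nom")

analyses = lexicon.lookup("mize")
```

Shared prefixes such as "miz" are stored once, which hints at why finite-state representations save memory on inflected word lists; a production system would additionally merge shared suffixes by minimizing the automaton.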
Conference of the International Speech Communication Association | 2005
Andrej Zgank; Darinka Verdonik; Aleksandra Zögling Markuš; Zdravko Kacic
Journal of Pragmatics | 2010
Darinka Verdonik
Language Resources and Evaluation | 2004
Darinka Verdonik; Matej Rojc; Zdravko Kacic
Language Resources and Evaluation | 2004
Andrej Zgank; Tomaz Rotovnik; Mirjam Sepesy Maucec; Darinka Verdonik; Janez Kitak; Damjan Vlaj; Vladimir Hozjan; Zdravko Kacic; Bogomir Horvat
Language Resources and Evaluation | 2006
Darinka Verdonik; Matej Rojc