Publication


Featured research published by Špela Vintar.


International Journal of Medical Informatics | 2002

Semantic annotation for concept-based cross-language medical information retrieval

Martin Volk; Bärbel Ripplinger; Špela Vintar; Paul Buitelaar; Diana Raileanu; Bogdan Sacaleanu

We present a framework for concept-based cross-language information retrieval in the medical domain, which is under development in the MUCHMORE project. Our approach is based on using the Unified Medical Language System (UMLS) as the primary source of semantic data. Documents and queries are annotated with multiple layers of linguistic information. Linguistic processing includes part-of-speech tagging, morphological analysis, phrase recognition and the identification of medical terms and semantic relations between them. The paper describes experiments in monolingual and cross-language document retrieval, performed on a corpus of medical abstracts. Results show that linguistic processing, especially lemmatization and compound analysis for German, is a crucial step in achieving a good baseline performance. On the other hand, they show that semantic information, specifically the combined use of concepts and relations, increases the performance in monolingual and cross-language retrieval.


Text, Speech and Dialogue | 2005

The voiceTRAN speech-to-speech communicator

Jerneja Žganec-Gros; Tomaž Erjavec; Špela Vintar

The paper presents the design concept of the VoiceTRAN Communicator, which integrates speech recognition, machine translation and text-to-speech synthesis using the DARPA Galaxy architecture. The aim of the project is to build a robust speech-to-speech translation communicator able to translate simple domain-specific sentences in the Slovenian-English language pair. The project is a collaboration between several Slovenian research organizations that are active in human language technologies. We provide an overview of the task and describe the system architecture and the individual servers. We then describe the language resources that will be used and developed within the project. We conclude the paper with plans for evaluation of the VoiceTRAN Communicator.


Portuguese Conference on Artificial Intelligence | 2005

Unsupervised learning of multiword units from part-of-speech tagged corpora: does quantity mean quality?

Gaël Dias; Špela Vintar

This paper describes an original hybrid system that extracts multiword unit candidates from part-of-speech tagged corpora. While classical hybrid systems manually define local part-of-speech patterns that lead to the identification of well-known multiword units (mainly compound nouns), we automatically identify relevant syntactical patterns from the corpus. Word statistics are then combined with the endogenously acquired linguistic information in order to extract the most relevant sequences of words. As a result, (1) human intervention is avoided, providing total flexibility of use of the system, and (2) different multiword units like phrasal verbs, adverbial locutions and prepositional locutions may be identified. Finally, we propose an exhaustive evaluation of our architecture based on the multi-domain, bilingual Slovene-English IJS-ELAN corpus, where surprising results are evidenced. To our knowledge, this challenge has never been attempted before.


Perspectives: Studies in Translatology | 2016

A bird's eye view of lexical creativity in original vs. translated Slovene fiction

Špela Vintar

This paper addresses lexical creativity and applies corpus-based methods to, firstly, identify potentially creative lexemes and, secondly, compare translations into Slovene from different source languages (English, German, French, and Italian) with texts originally written in Slovene. The primary resource for our work is the Spook corpus of translated and original contemporary literary texts in Slovene. We attempt to capture lexical innovations by way of three methods: by looking into the words occurring only once (hapax legomena); by extracting words that occur in only one of the books; and, finally, by comparing the lexical inventories of original English and translated Slovene texts to large reference corpora EnTenTen and Gigafida, respectively. Our quantitative results imply that translators are at least as creative as authors in coining new words or using unexpected word forms, whereby it seems that the English–Slovene language pair contains the largest number of novel lexical items. The analysis of text-specific word lists reveals the special lexical properties of each single book, including specialised terminology, slang, and dialect vocabulary, as well as author- or translator-specific neologisms, borrowings, and coinings. While these findings cannot be generalised in terms of a prevailing translation strategy, results are encouraging because they show that – at least in our corpus – translators know how to be bold in their lexical choices and do not appear to be inferior to authors in their ability to create new words.
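The three methods named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer is a plain regular expression, the reference-corpus comparison is reduced to a set difference against a word list, and all function names and the toy data are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of Unicode letters (a simplifying assumption)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def hapax_legomena(books):
    """Return the words that occur exactly once across the whole collection."""
    counts = Counter(tok for text in books.values() for tok in tokenize(text))
    return {word for word, n in counts.items() if n == 1}


def book_specific_words(books):
    """Map each title to the words found in that book and in no other."""
    vocab = {title: set(tokenize(text)) for title, text in books.items()}
    doc_freq = Counter(word for words in vocab.values() for word in words)
    return {
        title: {word for word in words if doc_freq[word] == 1}
        for title, words in vocab.items()
    }


def unattested(words, reference_vocab):
    """Return the words missing from a reference word list (stand-in for a large corpus)."""
    return set(words) - set(reference_vocab)


# Toy data, invented for illustration only.
books = {
    "book_a": "the cat sat on the mat",
    "book_b": "the dog sat on the snorfle snorfle",
}
hapaxes = hapax_legomena(books)
specific = book_specific_words(books)
novel = unattested(specific["book_b"], {"the", "dog", "cat"})
```

On real data the candidates from each step would still need manual inspection, since rare words include typos, proper names and specialised terms as well as genuine coinages.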


Archive | 2016

Using WordNet-Based Word Sense Disambiguation to Improve MT Performance

Špela Vintar; Darja Fišer

We report on a series of experiments aimed at improving the machine translation of ambiguous lexical items by using WordNet-based unsupervised Word Sense Disambiguation (WSD) and comparing its results to three MT systems. Our experiments are performed for the English-Slovene language pair using UKB, a freely available graph-based word sense disambiguation system. Since the fine granularity of WordNet is often reported as problematic, we compare the performance of UKB using all WordNet senses with using sense clusters. Results are evaluated in three ways: a manual evaluation of WSD performance from an MT perspective, an analysis of agreement between the WSD-proposed equivalent and those suggested by the three systems, and finally by computing BLEU, NIST and METEOR scores for all translation versions. Our results show that WSD performs with an MT-relevant precision of 71% and that 21% of sense-related MT errors could be prevented by using unsupervised WSD. We also show that sense clusters improve MT-relevant precision.


Wissensmanagement | 2003

Ontologies in Cross-Language Information Retrieval.

Martin Volk; Špela Vintar; Paul Buitelaar


Archive | 2003

Evaluating Context Features for Medical Relation Mining

Špela Vintar; Ljupčo Todorovski; Daniel Sonntag; Paul Buitelaar


Language Resources and Evaluation | 2008

Harvesting Multi-Word Expressions from Parallel Corpora.

Špela Vintar; Darja Fišer


Meeting of the Association for Computational Linguistics | 2011

Building and Using Comparable Corpora for Domain-Specific Bilingual Lexicon Extraction

Darja Fišer; Nikola Ljubešić; Špela Vintar; Senja Pollak


Language Resources and Evaluation | 2002

An Efficient and Flexible Format for Linguistic and Semantic Annotation.

Špela Vintar; Paul Buitelaar; Bärbel Ripplinger; Bogdan Sacaleanu; Diana Raileanu; Detlef Prescher

Collaboration


Dive into Špela Vintar's collaborations.

Top Co-Authors

Darja Fišer

University of Ljubljana

Paul Buitelaar

German Research Centre for Artificial Intelligence

Senja Pollak

University of Ljubljana

Gaël Dias

University of Beira Interior

Nada Lavrač

University of Nova Gorica