Duško Vitas
University of Belgrade
Publication
Featured research published by Duško Vitas.
intelligent information systems | 2016
Miljana Mladenović; Jelena Mitrović; Cvetana Krstev; Duško Vitas
This paper presents a process of building a Sentiment Analysis Framework for Serbian (SAFOS). We created a hybrid method that uses a sentiment lexicon and Serbian WordNet (SWN) synsets assigned with sentiment polarity scores in the process of feature selection. Because stemming in morphologically rich languages (MRLs) may lose or distort the sentiment meaning of words, we decided to expand the sentiment lexicon, as well as the lexicon generated using SWN, by adding morphological forms of emotional terms and phrases. This was done using Serbian Morphological Electronic Dictionaries. A new feature reduction method for document-level sentiment polarity classification using maximum entropy modeling is proposed. It is based on mapping a large number of related feature candidates (sentiment words, phrases and their inflectional forms) to a few concepts and using those concepts as features. Testing was performed using 10-fold cross-validation and on test sets containing news and movie reviews. The results of all experiments show that sentiment feature mapping for feature set reduction achieves better results than the basic set of features. For both test sets, the best classification accuracy was achieved with the combination of unigram and bigram features reduced by sentiment feature mapping (78.3% for movie reviews and 79.2% for the news test set). In 10-fold cross-validation, the best average accuracy of 95.6% was obtained using unigrams as features, reduced by the mapping procedure.
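The feature mapping idea described in this abstract can be illustrated with a minimal sketch. It is not the SAFOS implementation: the tiny lexicon, the concept labels `POSITIVE` and `NEGATIVE`, and the function names are all assumptions made for illustration, and the maximum entropy classifier itself is left out. The sketch only shows how many inflected sentiment forms collapse into a few concept features before counting.

```python
from collections import Counter

# Toy lexicon (an assumption, not the SAFOS resources): each inflected
# sentiment form is mapped to a single concept label.
SENTIMENT_CONCEPTS = {
    "dobar": "POSITIVE", "dobra": "POSITIVE", "dobro": "POSITIVE",
    "odličan": "POSITIVE", "odlična": "POSITIVE",
    "loš": "NEGATIVE", "loša": "NEGATIVE", "loše": "NEGATIVE",
}


def map_features(tokens):
    """Replace every known sentiment form with its concept label."""
    return [SENTIMENT_CONCEPTS.get(t.lower(), t.lower()) for t in tokens]


def unigram_counts(tokens):
    """Count unigram features after the mapping step."""
    return Counter(map_features(tokens))


doc = "Film je dobar a gluma odlična ali kraj loš".split()
features = unigram_counts(doc)
```

Here the two positive forms end up in one `POSITIVE` feature, so a downstream classifier sees a much smaller feature set than it would with one feature per inflected form.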
international conference natural language processing | 2010
Cvetana Krstev; Ranka Stanković; Ivan Obradović; Duško Vitas; Miloš Utvić
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation of the proposed procedure on several different sets of data. Finally, we discuss some implementation issues and present how the same procedure is used for other languages.
text speech and dialogue | 2004
Gordana Pavlović-Lažetić; Duško Vitas; Cvetana Krstev
Text processing in Serbian is based on a system of electronic dictionaries in the Intex format. Although lexical recognition is successful for 75% to 90% of word forms (depending on the type of text), some categories of words remain unrecognized. In this paper we present two aspects of e-dictionary enhancement that provide for additional recognition of two important categories of words: named entities and words generally not recorded in traditional dictionaries. We first describe the structure and content of dictionaries of proper names, both personal and geographic, developed to recognize the corresponding classes of named entities. Then we present a set of lexical transducers expressing morphological rules governing word formation, developed for the recognition of unknown words. The resources presented significantly improve the lexical recognition process.
international conference natural language processing | 2006
Cvetana Krstev; Duško Vitas; Agata Savary
The paper describes the steps that were undertaken in order to start the production of a comprehensive morphological dictionary of compounds for Serbian. First, the classes of multi-word expressions to be covered by the dictionaries were determined. In the next step, useful sources of compounds were identified. The retrieved compounds were then classified according to their inflectional properties. The recently developed special finite-state transducers were then constructed for each of these classes; they produce all the variants and morphological forms of the compounds in the class. Finally, a software module was developed that facilitates the production of the dictionary of compound lemmas with all the necessary information in the required format.
text speech and dialogue | 2003
Cvetana Krstev; Gordana Pavlović-Lažetić; Ivan Obradović; Duško Vitas
In this paper we describe how the existing monolingual Serbian corpus, the bilingual Serbian/English (S/E) and Serbian/French (S/F) aligned corpora, and the appropriate morphological e-dictionaries have been used in the validation, development, and refinement of Serbian WordNet. The influence of different derivational processes, e.g. derivation of augmentatives/diminutives and possessive adjectives from nouns, on the structure of Serbian synsets is examined. Part of the experimental results that justify the applied approach is given.
conference of the european chapter of the association for computational linguistics | 2003
Duško Vitas; Cvetana Krstev
The technology of finite-state transducers is implemented to recognize, lemmatize and tag composite tenses in Serbian in a way that connects the auxiliary and main verb. The suggested approach uses a morphological electronic dictionary of simple words and appropriate local grammars.
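As a rough illustration of linking an auxiliary to its main verb, the sketch below looks for a present-tense form of the auxiliary "jesam" followed, within a small window, by an active past participle, and reports the pair as one perfect-tense unit with the lemma of the main verb. It is a simplification under stated assumptions, not the authors' local grammars: the two toy dictionaries, the `max_gap` parameter, and the output tags are hypothetical, and a plain window scan stands in for real finite-state transducers.

```python
# Toy dictionaries (assumptions for illustration only).
AUXILIARIES = {"sam": "1sg", "si": "2sg", "je": "3sg",
               "smo": "1pl", "ste": "2pl", "su": "3pl"}
PARTICIPLES = {"radio": "raditi", "radila": "raditi",
               "čitao": "čitati", "čitala": "čitati"}


def find_perfect_tense(tokens, max_gap=2):
    """Link each auxiliary to the first participle at most max_gap tokens after it."""
    matches = []
    for i, tok in enumerate(tokens):
        person = AUXILIARIES.get(tok.lower())
        if person is None:
            continue
        # Inspect positions i+1 .. i+1+max_gap, staying inside the sentence.
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            lemma = PARTICIPLES.get(tokens[j].lower())
            if lemma is not None:
                matches.append({"aux": i, "verb": j,
                                "lemma": lemma, "tag": f"perfect:{person}"})
                break
    return matches


result = find_perfect_tense("Ona je juče čitala knjigu".split())
```

The match records the positions of both parts, so the intervening adverb "juče" does not prevent the auxiliary and the participle from being tagged as a single composite tense.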
Semantic Keyword-based Search on Structured Data Sources | 2016
Ranka Stanković; Cvetana Krstev; Duško Vitas; Nikola Vulović; Olivera Kitanović
This paper outlines the main features of Biblisa, a tool that offers various possibilities for enhancing queries submitted to large collections of aligned parallel text residing in a bilingual digital library. Biblisa supports keyword queries as an intuitive way of specifying information needs. Keyword queries, initiated in Serbian or English, can be expanded semantically, morphologically, and into the other language, using different supporting monolingual and bilingual resources. Terminological and lexical resources are of various types, such as wordnets, electronic dictionaries, and SQL and NoSQL databases, which are distributed across different servers and accessed in various ways. The web application has been tested on a collection of texts from 3 journals and 2 projects, comprising 299 documents generated from TMX and stored in a NoSQL database. The tool supports full-text and metadata search, with extraction of concordance sentence pairs to support translation and terminology work.
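The three kinds of query expansion mentioned in this abstract can be sketched as follows. This is not Biblisa's code or API: the in-memory dictionaries stand in for the wordnets, e-dictionaries, and bilingual resources the paper refers to, and the function `expand_query` and its parameters are hypothetical names chosen for illustration.

```python
# Toy stand-ins for the lexical resources (assumptions for illustration only).
INFLECTIONS = {
    "kuća": ["kuća", "kuće", "kući", "kuću", "kućom"],
    "dom": ["dom", "doma", "domu", "domom"],
}
SYNONYMS = {"kuća": ["dom"]}
TRANSLATIONS = {"kuća": ["house"], "dom": ["home"]}


def expand_query(keyword, semantic=True, morphological=True, translate=True):
    """Expand one Serbian keyword into source-language and target-language terms."""
    lemmas = [keyword]
    if semantic:
        # Semantic expansion: add related lemmas (here, synonyms only).
        lemmas += SYNONYMS.get(keyword, [])
    source_terms = []
    for lemma in lemmas:
        # Morphological expansion: add every inflected form of each lemma.
        source_terms += INFLECTIONS.get(lemma, [lemma]) if morphological else [lemma]
    target_terms = []
    if translate:
        # Cross-lingual expansion: add translations for the other side of the aligned text.
        for lemma in lemmas:
            target_terms += TRANSLATIONS.get(lemma, [])
    return {"source": source_terms, "target": target_terms}


expanded = expand_query("kuća")
```

The source terms would be matched against the Serbian side of an aligned sentence pair and the target terms against the English side; each expansion can be switched off independently.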
Computational Linguistics - Applications | 2013
Cvetana Krstev; Ivan Obradović; Ranka Stanković; Duško Vitas
Efficient processing of Multi-Word Units (MWUs) in the course of developing morphological MWU dictionaries is not easy to achieve, especially for languages with complex morphological structures, such as Serbian. Manual development of this type of dictionary is a tedious and extremely slow process. To alleviate this problem we turned to our multipurpose software tool, dubbed LeXimir, in the production of lemmas for e-dictionaries of multi-word units. In addition, we developed a procedure aimed at making the production of MWU dictionary lemmas more efficient. This procedure, which strongly relies on our comprehensive e-dictionaries of Serbian simple words, was subsequently implemented as a new functionality of LeXimir. In this paper we present our approach and offer an evaluation of the performance of the new functionality of LeXimir, and hence of our procedure, obtained through two rounds of experiments on various types of data. The paper ends with a brief discussion of some further possible applications of both the procedure and LeXimir in various language processing tasks.
Advances in Computers | 2013
Stasa Vujicic Stankovic; Nemanja Kojic; Goran Rakocevic; Duško Vitas; Veljko Milutinovic
In this article, we propose one original classification and one extension thereof, which takes into consideration the relevant issues in Natural Language Processing. The newly introduced classification of Data Mining algorithms applies at the level of a single Wireless Sensor Network, and its extension to Concept Modeling applies at the level of a System of Wireless Sensor Networks. Most scientists in this field emphasize issues related to applications of Wireless Sensor Networks in different areas, while we emphasize the categorization of selected approaches from the open literature, to help application designers and developers better understand their options in different areas. Our main goal is to provide a good starting point for a more effective analysis leading to possible new solutions, possible improvements of existing solutions, and possible combinations of two or more existing solutions into new ones, using the hybridization principle. Another contribution of this article is a synergistic interdisciplinary review of problems in two areas: Data Mining and Natural Language Processing. This enables interoperability improvements on the interface between Wireless Sensor Networks that often share data in native natural languages.
Archive | 2012
Duško Vitas; Ljubomir Popović; Cvetana Krstev; Ivan Obradović; Gordana Pavlović-Lažetić; Mladen Stanojevic
META-NET is a Network of Excellence partially funded by the European Commission. The network currently consists of 54 research centres in 33 European countries. META-NET forges META, the Multilingual Europe Technology Alliance, a growing community of language technology professionals and organisations in Europe.