Publication


Featured research published by Cvetana Krstev.


intelligent information systems | 2016

Hybrid sentiment analysis framework for a morphologically rich language

Miljana Mladenović; Jelena Mitrović; Cvetana Krstev; Duško Vitas

This paper presents the process of building a Sentiment Analysis Framework for Serbian (SAFOS). We created a hybrid method that uses a sentiment lexicon and Serbian WordNet (SWN) synsets annotated with sentiment polarity scores in the process of feature selection. Because stemming in morphologically rich languages (MRLs) may cause words to lose their sentiment meaning or be assigned an incorrect one, we decided to expand the sentiment lexicon, as well as the lexicon generated using SWN, by adding morphological forms of emotional terms and phrases. This was done using Serbian Morphological Electronic Dictionaries. A new feature reduction method for document-level sentiment polarity classification using maximum entropy modeling is proposed. It is based on mapping a large number of related feature candidates (sentiment words, phrases and their inflectional forms) to a few concepts and using those concepts as features. Testing was performed with 10-fold cross-validation and on test sets containing news and movie reviews. The results of all experiments show that sentiment feature mapping for feature set reduction achieves better results than the basic set of features. For both test sets, the best classification accuracy scores were achieved for the combination of unigram and bigram features reduced by sentiment feature mapping (accuracy 78.3% for the movie review test set and 79.2% for the news test set). In 10-fold cross-validation, the best average accuracy score of 95.6% was obtained using unigrams as features, reduced by the mapping procedure.
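
The feature mapping idea in this abstract can be illustrated with a minimal Python sketch. It is not the SAFOS implementation: the lexicon `SENTIMENT_FORMS`, the concept labels `POSITIVE` and `NEGATIVE`, and the function `map_features` are hypothetical names chosen for illustration, and the handful of Serbian word forms stands in for the much larger dictionaries the paper describes.

```python
from collections import Counter

# Hypothetical toy lexicon: inflected sentiment forms -> one concept label.
# The actual SAFOS lexicons are not reproduced here.
SENTIMENT_FORMS = {
    "dobar": "POSITIVE", "dobra": "POSITIVE", "dobro": "POSITIVE",
    "loš": "NEGATIVE", "loša": "NEGATIVE", "loše": "NEGATIVE",
}

def map_features(tokens, lexicon=SENTIMENT_FORMS):
    """Replace each known sentiment form by its concept and count unigrams."""
    mapped = [lexicon.get(token.lower(), token.lower()) for token in tokens]
    return Counter(mapped)

# Two different inflected forms collapse into a single POSITIVE feature.
features = map_features("Film je dobar a gluma dobra".split())
```

The resulting counts could then be passed to any classifier; the abstract reports using maximum entropy modeling.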


international conference natural language processing | 2010

Automatic construction of a morphological dictionary of multi-word units

Cvetana Krstev; Ranka Stanković; Ivan Obradović; Duško Vitas; Miloš Utvić

The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation of the proposed procedure on several different sets of data. Finally, we discuss some implementation issues and present how the same procedure is used for other languages.


text speech and dialogue | 2004

Towards Full Lexical Recognition

Gordana Pavlović-Lažetić; Duško Vitas; Cvetana Krstev

Text processing in Serbian is based on the Intex format system of electronic dictionaries. Although lexical recognition is successful for 75% to 90% of word forms (depending on the type of text), some categories of words remain unrecognized. In this paper we present two aspects of e-dictionary enhancement that provide for additional recognition of two important categories of words: named entities and words generally not recorded in traditional dictionaries. We first describe the structure and content of dictionaries of proper names, both personal and geographic, developed to recognize the corresponding classes of named entities. Then we present a set of lexical transducers expressing morphological rules governing word formation, developed for the recognition of unknown words. The resources presented significantly improve the lexical recognition process.


international conference natural language processing | 2006

Prerequisites for a comprehensive dictionary of Serbian compounds

Cvetana Krstev; Duško Vitas; Agata Savary

The paper describes the steps that were undertaken in order to start the production of a comprehensive morphological dictionary of compounds for Serbian. First, the classes of multi-word expressions to be covered by the dictionaries were determined. In the next step, useful sources of compounds were identified. The retrieved compounds were then classified according to their inflectional properties. For each of these classes, recently developed special finite-state transducers were constructed that produce all the variants and morphological forms of the compounds in the class. Finally, a software module was developed that facilitates the production of the dictionary of compound lemmas with all the necessary information in the required format.


text speech and dialogue | 2003

Corpora Issues in Validation of Serbian Wordnet

Cvetana Krstev; Gordana Pavlović-Lažetić; Ivan Obradović; Duško Vitas

In this paper we describe how the existing monolingual Serbian corpus, the bilingual Serbian/English (S/E) and Serbian/French (S/F) aligned corpora, and the appropriate morphological e-dictionaries have been used in the validation, development, and refinement of Serbian WordNet. The influence of different derivational processes, e.g. derivation of augmentatives/diminutives and possessive adjectives from nouns, on the structure of Serbian synsets is examined. A part of the experimental results that justify the applied approach is given.


conference of the european chapter of the association for computational linguistics | 2003

Composite Tense Recognition and Tagging in Serbian

Duško Vitas; Cvetana Krstev

The technology of finite-state transducers is implemented to recognize, lemmatize and tag composite tenses in Serbian in a way that connects the auxiliary and main verb. The suggested approach uses a morphological electronic dictionary of simple words and appropriate local grammars.


trans. computational collective intelligence | 2017

Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources

Ranka Stanković; Cvetana Krstev; Ivan Obradović; Olivera Kitanović

Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs and the matching of the two representations. This paper outlines the introduction of document indexing as a possible solution to document representation. Documents within a large textual database developed for geological projects in the Republic of Serbia for many years were indexed using methods developed within digital humanities: bag-of-words and named entity recognition. Documents in this geological database are described by a summary report, and other data, such as title, domain, keywords, abstract, and geographical location. These metadata were used for generating a bag of words for each document with the aid of morphological dictionaries and transducers. Named entities within metadata were also recognized with the help of a rule-based system. Both the bag of words and the metadata were then used for pre-indexing each document. A combination of several tf-idf based measures was applied for selecting and ranking of retrieval results of indexed documents for a specific query and the results were compared with the initial retrieval system that was already in place. In general, a significant improvement has been achieved according to the standard information retrieval performance measures, where the InQuery method performed the best.
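
As a rough illustration of ranking pre-indexed documents by a tf-idf measure, the Python sketch below scores each bag of words against a query. It shows only the textbook tf-idf weighting, not the combination of measures or the InQuery method evaluated in the paper; the function `tf_idf_rank`, the document identifiers and the toy vocabulary are assumptions made for this example.

```python
import math
from collections import Counter

def tf_idf_rank(query_terms, docs):
    """Rank document ids by the sum of tf * log(N / df) over the query terms."""
    n_docs = len(docs)
    bags = {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}
    # Document frequency: in how many bags each term occurs.
    df = Counter()
    for bag in bags.values():
        df.update(bag.keys())
    scores = {}
    for doc_id, bag in bags.items():
        scores[doc_id] = sum(
            bag[term] * math.log(n_docs / df[term])
            for term in query_terms
            if term in bag
        )
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical bags of words, assumed to be already lemmatized.
docs = {
    "d1": ["bakar", "ruda", "bor"],
    "d2": ["ugalj", "ruda", "kolubara"],
    "d3": ["voda", "izvor"],
}
ranking = tf_idf_rank(["bakar", "ruda"], docs)
```

Here `d1` matches both query terms, `d2` matches only the more common one, and `d3` matches neither, so the ranking follows that order.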


Semantic Keyword-based Search on Structured Data Sources | 2016

Keyword-Based Search on Bilingual Digital Libraries

Ranka Stanković; Cvetana Krstev; Duško Vitas; Nikola Vulović; Olivera Kitanović

This paper outlines the main features of Biblisa, a tool that offers various possibilities for enhancing queries submitted to large collections of aligned parallel text residing in a bilingual digital library. Biblisa supports keyword queries as an intuitive way of specifying information needs. Keyword queries initiated in Serbian or English can be expanded semantically, morphologically and into the other language, using different supporting monolingual and bilingual resources. Terminological and lexical resources are of various types, such as wordnets, electronic dictionaries, SQL and NoSQL databases, which are distributed across different servers and accessed in various ways. The web application has been tested on a collection of texts from 3 journals and 2 projects, comprising 299 documents generated from TMX and stored in a NoSQL database. The tool allows full-text and metadata search, with extraction of concordance sentence pairs to support translation and terminology work.
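
The three kinds of query expansion mentioned in this abstract (semantic, morphological and cross-language) can be sketched in Python as plain table lookups. This is not how Biblisa is implemented: the dictionaries `SYNONYMS`, `MORPH_FORMS` and `TRANSLATIONS` and the function `expand_query` are hypothetical stand-ins for the wordnets, electronic dictionaries and bilingual resources the paper refers to.

```python
# Hypothetical toy resources; real ones would come from wordnets,
# morphological e-dictionaries and bilingual term lists.
SYNONYMS = {"rudnik": ["kop"]}
MORPH_FORMS = {"rudnik": ["rudnik", "rudnika", "rudniku", "rudnici"]}
TRANSLATIONS = {"rudnik": ["mine"], "kop": ["pit"]}

def expand_query(keyword):
    """Expand one keyword semantically, morphologically and into the other language."""
    # Semantic expansion: the keyword plus its listed synonyms.
    terms = [keyword] + SYNONYMS.get(keyword, [])
    expanded = []
    for term in terms:
        # Morphological expansion: all known inflected forms, else the term itself.
        expanded += MORPH_FORMS.get(term, [term])
        # Cross-language expansion: known translations of the term.
        expanded += TRANSLATIONS.get(term, [])
    # Remove duplicates while preserving order.
    return list(dict.fromkeys(expanded))

expanded_terms = expand_query("rudnik")
```

A keyword absent from all three tables is returned unchanged, so the expansion never narrows the original query.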


IKC 2015 Revised Selected Papers of the First COST Action IC1302 International KEYSTONE Conference on Semantic Keyword-based Search on Structured Data Sources - Volume 9398 | 2015

Indexing of Textual Databases Based on Lexical Resources: A Case Study for Serbian

Ranka Stanković; Cvetana Krstev; Ivan Obradović; Olivera Kitanović

In this paper we describe an approach to improving information retrieval results for large textual databases by pre-indexing documents using bag-of-words and named entity recognition. The approach was applied to a database of geological projects financed by the Republic of Serbia for several decades now. Each document within this database is described by a summary report, consisting of metadata on the geological project, such as title, domain, keywords, abstract, and geographical location. A bag of words was produced from these metadata with the help of morphological dictionaries and transducers, while named entities were recognized using a rule-based system. Both were then used for pre-indexing documents for information retrieval purposes where ranking of retrieved documents was based on several


Computational Linguistics - Applications | 2013

An Approach to Efficient Processing of Multi-word Units

Cvetana Krstev; Ivan Obradović; Ranka Stanković; Duško Vitas

Collaboration


Dive into Cvetana Krstev's collaboration.

Top Co-Authors

Agata Savary

François Rabelais University
