Zornitsa Kozareva | Researchain

Publication

Featured research published by Zornitsa Kozareva.

Meeting of the Association for Computational Linguistics | 2007

UA-ZBSA: A Headline Emotion Classification through Web Information

Zornitsa Kozareva; Borja Navarro; Sonia Vázquez; Andrés Montoyo

This paper presents a headline emotion classification approach based on frequency and co-occurrence information collected from the World Wide Web. The content words of a headline (nouns, verbs, adverbs and adjectives) are extracted in order to form different bags of word pairs with the joy, disgust, fear, anger, sadness and surprise emotions. For each pair, we compute the Mutual Information Score, which is obtained from the web occurrences of an emotion and the content words. Our approach is based on the hypothesis that groups of words which co-occur with a given emotion across many documents are highly likely to express the same emotion.
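
As a rough illustration of the scoring step, the sketch below estimates pointwise mutual information between a content word and an emotion from hit counts and picks the emotion with the highest summed score. It is a minimal sketch under assumptions, not the authors' system: the counts, the document total `N`, and the function names are hypothetical, and the paper's exact Mutual Information formula may differ.

```python
import math

# Hypothetical web hit counts (illustration only); the described approach
# obtains such counts from web queries.
N = 1_000_000  # assumed number of indexed documents
HITS = {"storm": 5_000, "win": 8_000, "fear": 20_000, "joy": 15_000}
CO_HITS = {
    ("storm", "fear"): 900, ("storm", "joy"): 30,
    ("win", "fear"): 60, ("win", "joy"): 1_200,
}


def pmi(word: str, emotion: str) -> float:
    """log2(P(w, e) / (P(w) P(e))) estimated from hit counts."""
    joint = CO_HITS.get((word, emotion), 0)
    if joint == 0:
        return 0.0  # simplification: unseen pairs contribute nothing
    return math.log2(joint * N / (HITS[word] * HITS[emotion]))


def classify_headline(content_words: list[str], emotions: list[str]) -> str:
    """Sum PMI over the headline's content words; return the top-scoring emotion."""
    scores = {e: sum(pmi(w, e) for w in content_words if w in HITS) for e in emotions}
    return max(scores, key=scores.get)


print(classify_headline(["storm"], ["fear", "joy"]))  # fear
print(classify_headline(["win"], ["fear", "joy"]))    # joy
```

Treating unseen pairs as zero is a deliberate simplification; a real system would need smoothing or a minimum-count threshold.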

Conference of the European Chapter of the Association for Computational Linguistics | 2006

Bootstrapping named entity recognition with automatically generated gazetteer lists

Zornitsa Kozareva

Current Named Entity Recognition systems suffer from a lack of hand-tagged data as well as performance degradation when moving to other domains. This paper explores two aspects: the automatic generation of gazetteer lists from unlabeled data, and the building of a Named Entity Recognition system with labeled and unlabeled data.

International Conference Natural Language Processing | 2006

Paraphrase identification on the basis of supervised machine learning techniques

Zornitsa Kozareva; Andrés Montoyo

This paper presents a machine learning approach for paraphrase identification which uses lexical and semantic similarity information. In the experimental studies, we examine the limitations of the designed attributes and the behavior of three machine learning classifiers. With the objective of increasing the final performance of the system, we scrutinize the influence of combining lexical and semantic information, as well as techniques for classifier combination.
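
The lexical side of such attributes can be sketched as a small feature extractor whose output a classifier would consume. The feature names, whitespace tokenization and choice of measures (token overlap and longest common subsequence) are assumptions for illustration; the semantic similarity attributes mentioned in the abstract are omitted.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common token subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lexical_features(s1: str, s2: str) -> dict[str, float]:
    """A few lexical similarity attributes for a sentence pair."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    v1, v2 = set(t1), set(t2)
    shared = len(v1 & v2)
    longest = max(len(t1), len(t2), 1)  # avoids division by zero on empty input
    return {
        "jaccard": shared / len(v1 | v2) if v1 | v2 else 0.0,
        "coverage_1": shared / len(v1) if v1 else 0.0,
        "coverage_2": shared / len(v2) if v2 else 0.0,
        "lcs_ratio": lcs_length(t1, t2) / longest,
        "length_ratio": min(len(t1), len(t2)) / longest,
    }


features = lexical_features("The cat sat on the mat", "The cat was on the mat")
print(features)
```

Each sentence pair becomes a fixed-length vector, which is what lets several different classifiers be trained and compared on the same attributes.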

Data and Knowledge Engineering | 2007

Combining data-driven systems for improving Named Entity Recognition

Zornitsa Kozareva; Óscar Ferrández; Andrés Montoyo; Rafael Muñoz; Armando Suárez; Jaime Gómez

The increasing flow of digital information requires the extraction, filtering and classification of pertinent information from large volumes of texts. All these tasks greatly benefit from involving a Named Entity Recognizer (NER) in the preprocessing stage. This paper proposes a completely automatic NER system. The NER task involves not only the identification of proper names (Named Entities) in natural language text, but also their classification into a set of predefined categories, such as names of persons, organizations (companies, government organizations, committees, etc.), locations (cities, countries, rivers, etc.) and miscellaneous (movie titles, sport events, etc.). Throughout the paper, we examine the differences between language models learned by different data-driven classifiers confronted with the same NLP task, as well as ways to exploit these differences to yield a higher accuracy than the best individual classifier. Three machine learning classifiers (Hidden Markov Model, Maximum Entropy and Memory Based Learning) are trained on the same corpus in order to resolve the NE task. After comparison, their outputs are combined using voting strategies. A comprehensive study and experimental work on the evaluation of our system, as well as a comparison with other systems, have been carried out within the framework of two specialized scientific competitions for NER, CoNLL-2002 and HAREM-2005. Finally, this paper describes the integration of our NER system into different NLP applications, namely Geographic Information Retrieval and Conceptual Modelling.
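
One simple form of the voting step is per-token majority voting, with ties resolved in favour of a preferred classifier. This is a hedged sketch: the BIO tags, the classifier keys and the tie-breaking order are hypothetical, and the paper's voting strategies may weight classifiers differently.

```python
from collections import Counter


def vote(predictions: dict[str, list[str]], priority: list[str]) -> list[str]:
    """Majority vote over per-token tags from several classifiers.

    Ties are broken in favour of the classifier that appears first in
    `priority` (for example, the best individual system on held-out data).
    """
    if set(priority) != set(predictions):
        raise ValueError("priority must name every classifier")
    lengths = {len(tags) for tags in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all tag sequences must have the same length")
    combined = []
    for i in range(lengths.pop()):
        counts = Counter(tags[i] for tags in predictions.values())
        top = max(counts.values())
        tied = {tag for tag, c in counts.items() if c == top}
        # The first classifier in priority order whose tag is among the winners decides.
        winner = next(predictions[name][i] for name in priority if predictions[name][i] in tied)
        combined.append(winner)
    return combined


predictions = {
    "hmm":    ["B-PER", "I-PER", "O", "B-LOC"],
    "maxent": ["B-PER", "O",     "O", "B-ORG"],
    "mbl":    ["B-ORG", "I-PER", "O", "B-MISC"],
}
print(vote(predictions, priority=["maxent", "mbl", "hmm"]))
# ['B-PER', 'I-PER', 'O', 'B-ORG']
```

On the last token all three classifiers disagree, so the tie-break order alone decides; that is where a weighted or accuracy-aware scheme would behave differently.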

International Conference on Computational Linguistics | 2006

An unsupervised language independent method of name discrimination using second order co-occurrence features

Ted Pedersen; Anagha Kulkarni; Roxana Angheluta; Zornitsa Kozareva; Thamar Solorio

Previous work by Pedersen, Purandare and Kulkarni (2005) has resulted in an unsupervised method of name discrimination that represents the context in which an ambiguous name occurs using second-order co-occurrence features. These contexts are then clustered in order to identify which are associated with different underlying named entities. The method also extracts descriptive and discriminating bigrams from each of the discovered clusters to serve as identifying labels. These methods have been shown to perform well with English text, although we believe them to be language independent since they rely on lexical features and use no syntactic features or external knowledge sources. In this paper we apply this methodology in exactly the same way to Bulgarian, English, Romanian, and Spanish corpora. We find that it attains discrimination accuracy that is consistently well above that of a majority classifier, thus providing support for the hypothesis that the method is language independent.
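
The idea behind second-order co-occurrence features can be sketched in a few lines: each word gets a vector of the words it co-occurs with, and each context of the ambiguous name is represented by the sum of its words' vectors, so two contexts can be similar without sharing any word. The toy sentences are made up, and average-link agglomerative clustering stands in for whatever clustering the original work used; the bigram labelling step is omitted.

```python
import math
from collections import Counter


def word_vectors(sentences: list[str]) -> dict[str, Counter]:
    """First-order vectors: for each word, counts of the words sharing a sentence with it."""
    vecs: dict[str, Counter] = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            vecs.setdefault(word, Counter()).update(t for j, t in enumerate(tokens) if j != i)
    return vecs


def context_vector(context: str, vecs: dict[str, Counter], target: str) -> Counter:
    """Second-order vector: sum of the first-order vectors of the context's words."""
    total = Counter()
    for word in context.lower().split():
        if word != target and word in vecs:
            total.update(vecs[word])
    return total


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(vectors: list[Counter], k: int) -> list[list[int]]:
    """Average-link agglomerative clustering of context vectors down to k clusters."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = sum(cosine(vectors[i], vectors[j]) for i in clusters[a] for j in clusters[b])
                sim /= len(clusters[a]) * len(clusters[b])
                if best is None or sim > best[0]:
                    best = (sim, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))
    return sorted(sorted(c) for c in clusters)


FEATURE_TEXT = [
    "basketball player scored points",
    "player scored game points",
    "country border river desert",
    "river desert border Amman",
]
contexts = ["Jordan scored points", "Jordan basketball game", "Jordan river border", "Jordan desert Amman"]
vecs = word_vectors(FEATURE_TEXT)
ctx = [context_vector(c, vecs, target="jordan") for c in contexts]
print(cluster(ctx, k=2))  # [[0, 1], [2, 3]]
```

Here "Jordan scored points" and "Jordan basketball game" share no word, yet they land in the same cluster because their words co-occur with the same neighbours. Nothing in the code is specific to English, which mirrors the language-independence argument.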

International Conference on Computational Linguistics | 2009

Determining the Polarity and Source of Opinions Expressed in Political Debates

Alexandra Balahur; Zornitsa Kozareva; Andrés Montoyo

In this paper we investigate different approaches we developed to classify opinion and discover opinion sources in text, using an affect, opinion and attitude lexicon. We apply these approaches to the discussion topics contained in a corpus of American Congressional speech data. We propose three approaches to classifying opinion at the speech-segment level: first, using similarity measures to the affect, opinion and attitude lexicon; second, using dependency analysis; and third, using SVM machine learning. Further, we study the impact of taking into account the source of the opinion and the consistency of the opinion expressed, and propose three methods to classify opinion at the speaker-intervention level, showing improvements over the classification of individual text segments. Finally, we propose a method to identify the party the opinion belongs to, through the identification of specific affective and non-affective lexicon used in the argumentation. We present the results obtained when evaluating the different methods, together with a discussion of the issues encountered and some possible solutions. We conclude that, even at a more general level, our approach performs better than classifiers trained on specific data.
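
A bare-bones version of lexicon-based classification at both the segment and the intervention level might look like the following. The word lists are tiny hypothetical stand-ins for the affect, opinion and attitude lexicon, and simple match counting replaces the similarity measures, dependency analysis and SVM used in the paper.

```python
import re

# Hypothetical toy lexicons standing in for the affect, opinion and attitude resources.
POSITIVE = {"support", "benefit", "good", "protect", "strong"}
NEGATIVE = {"oppose", "harm", "bad", "waste", "fail"}


def lexicon_score(text: str) -> int:
    """+1 for each positive lexicon match, -1 for each negative one."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)


def label(score: int) -> str:
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def classify_intervention(segments: list[str]) -> str:
    """Label a whole speaker intervention from the summed scores of its segments."""
    return label(sum(lexicon_score(s) for s in segments))


segments = [
    "We support this bill because it will protect families.",
    "Critics say it will harm the budget.",
]
print([label(lexicon_score(s)) for s in segments])  # ['positive', 'negative']
print(classify_intervention(segments))              # positive
```

The second segment is labelled negative in isolation, while pooling it with the rest of the intervention recovers the speaker's overall stance, which is the kind of gain the abstract reports for intervention-level classification.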

Mexican International Conference on Artificial Intelligence | 2005

Self-training and co-training applied to spanish named entity recognition

Zornitsa Kozareva; Boyan Bonev; Andrés Montoyo

The paper discusses the use of unlabeled data for Spanish Named Entity Recognition. Two techniques are used: self-training for detecting the entities in the text, and co-training for classifying the detected entities. We introduce a new co-training algorithm, which applies voting techniques to decide which unlabeled examples should be added to the training set at each iteration. We also propose a way to improve the performance of entity detection. Finally, a brief comparative study with existing co-training algorithms is presented.
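
The voting step that decides which unlabeled examples join the training set can be sketched independently of the classifiers themselves. The abstract does not give the exact rule, so the `min_votes` threshold, the `max_new` cap and the label set are assumptions; retraining each classifier and removing selected examples from the pool are left out.

```python
from collections import Counter


def select_by_vote(
    pool_predictions: list[tuple[str, list[str]]], min_votes: int, max_new: int
) -> list[tuple[str, str]]:
    """Choose unlabeled examples to add to the training set in one iteration.

    pool_predictions holds (example_id, [label from each classifier]).
    An example qualifies when its most common label gets at least `min_votes`
    votes; at most `max_new` examples are returned, strongest agreement first.
    """
    candidates = []
    for example_id, labels in pool_predictions:
        if not labels:
            continue  # no classifier produced a label for this example
        label, votes = Counter(labels).most_common(1)[0]
        if votes >= min_votes:
            candidates.append((votes, example_id, label))
    candidates.sort(key=lambda c: c[0], reverse=True)  # stable: ties keep pool order
    return [(example_id, label) for _, example_id, label in candidates[:max_new]]


pool = [
    ("s1", ["PER", "PER", "PER"]),
    ("s2", ["LOC", "ORG", "LOC"]),
    ("s3", ["ORG", "LOC", "MISC"]),
    ("s4", ["LOC", "LOC", "LOC"]),
]
print(select_by_vote(pool, min_votes=2, max_new=2))  # [('s1', 'PER'), ('s4', 'LOC')]
```

Capping additions per iteration and preferring unanimous agreement keeps early noisy labels from flooding the training set, a common concern in co-training.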

International Conference Natural Language Processing | 2005

Combining data-driven systems for improving named entity recognition

Zornitsa Kozareva; Óscar Ferrández; Andrés Montoyo; Rafael Muñoz; Armando Suárez

The increasing flow of digital information requires the extraction, filtering and classification of pertinent information from large volumes of texts. An important preprocessing step for these tasks is Named Entity Recognition (NER). In this paper we propose a completely automatic NER system, which involves the identification of proper names in texts and their classification into a set of predefined categories of interest such as Person names, Organizations (companies, government organizations, committees, etc.) and Locations (cities, countries, rivers, etc.). We examine the differences in the language models learned by different data-driven systems performing the same NLP task and how these differences can be exploited to yield a higher accuracy than the best individual system. Three NE classifiers (Hidden Markov Models, Maximum Entropy and Memory-based learner) are trained on the same corpus data and, after comparison, their outputs are combined using a voting strategy. The results are encouraging: the system achieves 98.5% accuracy for NE recognition and 84.94% accuracy for NE classification in Spanish.

North American Chapter of the Association for Computational Linguistics | 2015

Everyone Likes Shopping! Multi-class Product Categorization for e-Commerce

Zornitsa Kozareva

Online shopping caters to the needs of millions of users on a daily basis. To build an accurate system that can retrieve relevant products for a query like “MB252 with travel bags”, one requires product and query categorization mechanisms, which classify the text as Home&Garden>Kitchen&Dining>Kitchen Appliances>Blenders. One of the biggest challenges in e-Commerce is that providers like Amazon, eBay, Google, Yahoo! and Walmart organize products into different product taxonomies, making it hard and time-consuming for sellers to categorize goods for each shopping platform. To address this challenge, we propose an automatic product categorization mechanism, which assigns the correct product category from a taxonomy to a given product title. We conducted an empirical evaluation on 445,408 product titles and used a rich product taxonomy of 319 categories organized into 6 levels. We compared performance against multiple algorithms and found that the best-performing system reaches an F-score of .88.
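
The abstract does not say which algorithm performs best, so the following is only a hedged baseline: multinomial Naive Bayes (a swapped-in technique, not necessarily one of the compared systems) that maps a product title to a full taxonomy path. The training titles and the luggage path are hypothetical; only the Blenders path comes from the abstract.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", title.lower())


class NaiveBayesCategorizer:
    """Multinomial Naive Bayes over title tokens with add-one smoothing."""

    def fit(self, titles: list[str], paths: list[str]) -> "NaiveBayesCategorizer":
        self.class_counts = Counter(paths)
        self.token_counts: dict[str, Counter] = defaultdict(Counter)
        for title, path in zip(titles, paths):
            self.token_counts[path].update(tokenize(title))
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        self.total = len(paths)
        return self

    def log_prob(self, tokens: list[str], path: str) -> float:
        counts = self.token_counts[path]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.class_counts[path] / self.total)
        for t in tokens:
            if t in self.vocab:  # ignore tokens never seen in training
                score += math.log((counts[t] + 1) / denom)
        return score

    def predict(self, title: str) -> str:
        tokens = tokenize(title)
        return max(self.class_counts, key=lambda path: self.log_prob(tokens, path))


BLENDERS = "Home&Garden>Kitchen&Dining>Kitchen Appliances>Blenders"
LUGGAGE = "Luggage>Travel Bags"  # hypothetical path
titles = [
    "Oster 6-speed blender with glass jar",
    "Ninja professional blender 1000 watts",
    "Samsonite carry-on travel bag",
    "Rolling travel bags set of 3",
]
model = NaiveBayesCategorizer().fit(titles, [BLENDERS, BLENDERS, LUGGAGE, LUGGAGE])
print(model.predict("portable blender glass jar"))  # Home&Garden>Kitchen&Dining>Kitchen Appliances>Blenders
print(model.predict("travel bag"))                  # Luggage>Travel Bags
```

Treating each full path as a flat label is the simplest design; with 319 categories over 6 levels, a real system might instead exploit the hierarchy, for example by predicting level by level.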

Language Resources and Evaluation | 2013

Tailoring the automated construction of large-scale taxonomies using the web

Zornitsa Kozareva; Eduard H. Hovy

It has long been a dream to have available a single, centralized, semantic thesaurus or terminology taxonomy to support research in a variety of fields. Much human and computational effort has gone into constructing such resources, including the original WordNet and subsequent wordnets in various languages. To produce such resources one has to overcome well-known problems in achieving both wide coverage and internal consistency within a single wordnet and across many wordnets. In particular, one has to ensure that alternative valid taxonomizations covering the same basic terms are recognized and treated appropriately. In this paper we describe a pipeline of new, powerful, minimally supervised, automated algorithms that can be used to construct terminology taxonomies and wordnets, in various languages, by harvesting large amounts of online domain-specific or general text. We illustrate the effectiveness of the algorithms both to build localized, domain-specific wordnets and to highlight and investigate certain deeper ontological problems such as parallel generalization hierarchies. We show shortcomings and gaps in the manually-constructed English WordNet in various domains.
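
The abstract does not name the harvesting patterns. As an assumption, the sketch below shows the general idea of collecting is-a pairs from raw text with a classic Hearst-style pattern ("X such as Y, Z and W"). It handles single-word terms only and is not the authors' pipeline.

```python
import re
from collections import defaultdict

# "<hypernym> such as <x>, <y>, and <z>": single-word terms only.
SUCH_AS = re.compile(
    r"(\w+) such as (\w+(?:, (?!(?:and|or)\b)\w+)*(?:,? (?:and|or) \w+)?)"
)
# Separators inside the enumerated list: ", and" / " or " first, then a plain comma.
SEPARATOR = re.compile(r",?\s+(?:and|or)\s+|,\s*")


def harvest_isa(sentences: list[str]) -> dict[str, set[str]]:
    """Collect hypernym -> {hyponyms} pairs from raw sentences."""
    taxonomy: dict[str, set[str]] = defaultdict(set)
    for sentence in sentences:
        for match in SUCH_AS.finditer(sentence.lower()):
            hypernym, enumeration = match.group(1), match.group(2)
            taxonomy[hypernym].update(t for t in SEPARATOR.split(enumeration) if t)
    return dict(taxonomy)


sentences = [
    "Animals such as lions, tigers, and bears live in the park.",
    "We saw animals such as wolves.",
    "Languages such as Bulgarian or Spanish were covered.",
]
result = harvest_isa(sentences)
print(sorted(result["animals"]))    # ['bears', 'lions', 'tigers', 'wolves']
print(sorted(result["languages"]))  # ['bulgarian', 'spanish']
```

Pairs harvested this way are noisy and flat; turning them into a consistent multi-level taxonomy requires the further filtering and ordering steps that the abstract alludes to.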
