Viktor Pekar | Researchain

Publication

Featured research published by Viktor Pekar.

Journal of Vacation Marketing | 2008

Discovery of subjective evaluations of product features in hotel reviews

Viktor Pekar; Shiyan Ou

Automated discovery and analysis of customer opinions on the web holds a lot of promise for present-day practices of market research and customer relationship management. Opinion mining attempts to come up with ways to automatically analyse subjectivity expressed in natural language text. Previous research on the topic has shown that the overall subjectivity expressed in a document, such as a customer review, can be assessed with accuracy that is feasible in real-world applications. In this paper, we address the challenge of identification of customer opinions expressed towards specific features of a product, such as service quality and location of a hotel. The paper proposes and investigates a method to recognize the relationships between subjective expressions and references to features of a product. While the method has been evaluated on customer hotel reviews, it can potentially find application also in many tasks where concrete statements need to be extracted from documents on heterogeneous topics such as posts in forums, comments on blogs, or utterances in a chat room.

Machine Translation | 2006

Finding translations for low-frequency words in comparable corpora

Viktor Pekar; Ruslan Mitkov; Dimitar Blagoev; Andrea Mulloni

Statistical methods to extract translational equivalents from non-parallel corpora hold the promise of ensuring the required coverage and domain customisation of lexicons as well as accelerating their compilation and maintenance. Rare words and expressions, which have low corpus frequencies, remain a challenge for these methods. However, it is rare words such as newly introduced terminology and named entities that present the main interest for practical lexical acquisition. In this article, we study possibilities of improving the extraction of low-frequency equivalents from bilingual comparable corpora. Our work is carried out in the general framework which discovers equivalences between words of different languages using similarities between their occurrence patterns found in respective monolingual corpora. We develop a method that aims to compensate for insufficient amounts of corpus evidence on rare words: prior to measuring cross-language similarities, the method uses same-language corpus data to model co-occurrence vectors of rare words by predicting their unseen co-occurrences and smoothing rare, unreliable ones. Our experimental evaluation demonstrates that the proposed method delivers a consistent and significant improvement on the conventional approach to this task.
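The general idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the interpolation with nearest same-language neighbours, the function names (`cosine`, `smooth`, `translate_vector`), the parameters `k` and `alpha`, and the toy English and German data are all assumptions made for illustration only.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def smooth(word, vectors, k=2, alpha=0.5):
    """Interpolate a rare word's vector with the mean vector of its
    k nearest same-language neighbours (hypothetical smoothing scheme)."""
    target = vectors[word]
    neighbours = sorted(
        (w for w in vectors if w != word),
        key=lambda w: cosine(target, vectors[w]),
        reverse=True,
    )[:k]
    mean = Counter()
    for n in neighbours:
        for f, w in vectors[n].items():
            mean[f] += w / len(neighbours)
    features = set(target) | set(mean)
    return {f: alpha * target.get(f, 0.0) + (1 - alpha) * mean[f] for f in features}


def translate_vector(vec, dictionary):
    """Map co-occurrence features into the other language via a seed
    dictionary; features missing from the dictionary are dropped."""
    out = Counter()
    for f, w in vec.items():
        if f in dictionary:
            out[dictionary[f]] += w
    return dict(out)


# Toy data (invented): "automobile" is rare and has sparse evidence.
english = {
    "automobile": {"drive": 1.0},
    "car": {"drive": 5.0, "road": 3.0},
    "banana": {"eat": 4.0},
}
german = {
    "Auto": {"fahren": 4.0, "Strasse": 2.0},
    "Banane": {"essen": 3.0},
}
seed_dictionary = {"drive": "fahren", "road": "Strasse", "eat": "essen"}

raw = translate_vector(english["automobile"], seed_dictionary)
smoothed = translate_vector(smooth("automobile", english, k=1), seed_dictionary)
print("unsmoothed:", round(cosine(raw, german["Auto"]), 3))
print("smoothed:  ", round(cosine(smoothed, german["Auto"]), 3))
```

In this toy setting, smoothing lets the rare word borrow the unseen "road" co-occurrence from its neighbour "car", which brings its translated vector closer to the German candidate.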

Language and Technology Conference | 2006

Acquisition of Verb Entailment from Text

Viktor Pekar

The study addresses the problem of automatic acquisition of entailment relations between verbs. While this task has much in common with paraphrase acquisition, which aims to discover semantic equivalence between verbs, the main challenge of entailment acquisition is to capture asymmetric, or directional, relations. Motivated by the intuition that entailment often underlies the local structure of coherent text, we develop a method that discovers verb entailment using evidence about discourse relations between clauses available in a parsed corpus. In comparison with earlier work, the proposed method covers a much wider range of verb entailment types and learns the mapping between verbs with highly varied argument structures.

Discourse Anaphora and Anaphor Resolution Colloquium | 2007

Anaphora resolution: to what extent does it help NLP applications?

Ruslan Mitkov; Richard Evans; Constantin Orăsan; Le An Ha; Viktor Pekar

Papers discussing anaphora resolution algorithms or systems usually focus on the intrinsic evaluation of the algorithm/system and not on the issue of extrinsic evaluation. In the context of anaphora resolution, extrinsic evaluation concerns the impact of an anaphora resolution module on a larger NLP system of which it is part. In this paper we explore the extent to which the well-known anaphora resolution system MARS [1] can improve the performance of three NLP applications: text summarisation, term extraction and text categorisation. On the basis of the results so far we conclude that the deployment of anaphora resolution has a positive albeit limited impact.

Machine Translation | 2007

Methods for extracting and classifying pairs of cognates and false friends

Ruslan Mitkov; Viktor Pekar; Dimitar Blagoev; Andrea Mulloni

The identification of cognates has attracted the attention of researchers working in the area of Natural Language Processing, but the identification of false friends is still an under-researched area. This paper proposes novel methods for the automatic identification of both cognates and false friends from comparable bilingual corpora. The methods are not dependent on the existence of parallel texts, and make use of only monolingual corpora and a bilingual dictionary necessary for the mapping of co-occurrence data across languages. In addition, the methods do not require that the newly discovered cognates or false friends are present in the dictionary and hence are capable of operating on out-of-vocabulary expressions. These methods are evaluated on English, French, German and Spanish corpora in order to identify English–French, English–German, English–Spanish and French–Spanish pairs of cognates or false friends. The experiments were performed in two settings: (i) assuming ‘ideal’ extraction of cognates and false friends from plain-text corpora, i.e. when the evaluation data contains only cognates and false friends, and (ii) a real-world extraction scenario where cognates and false friends have to first be identified among words found in two comparable corpora in different languages. The evaluation results show that the developed methods identify cognates and false friends with very satisfactory results for both recall and precision, with methods that incorporate background semantic knowledge, in addition to co-occurrence data obtained from the corpora, delivering the best results.

International Conference on Computational Linguistics | 2004

Feature weighting for co-occurrence-based classification of words

Viktor Pekar; Michael Krkoska; Steffen Staab

The paper comparatively studies methods of feature weighting in application to the task of co-occurrence-based classification of words according to their meaning. We explore parameter optimization of several weighting methods frequently used for similar problems such as text classification. We find that successful application of all the methods crucially depends on a number of parameters; only a carefully chosen weighting procedure yields a consistent improvement on a classifier learned from non-weighted data.

ElectricDict '04 Proceedings of the Workshop on Enhancing and Using Electronic Dictionaries | 2004

Linguistic preprocessing for distributional classification of words

Viktor Pekar

The paper is concerned with automatic classification of new lexical items into synonymic sets on the basis of their co-occurrence data obtained from a corpus. Our goal is to examine the impact that different types of linguistic preprocessing of the co-occurrence material have on the classification accuracy. The paper comparatively studies several preprocessing techniques frequently used for this and similar tasks and draws conclusions about their relative merits. We find that a carefully chosen preprocessing procedure achieves a relative effectiveness improvement of up to 88%, depending on the classification method, in comparison to the window-based context delineation, while using a much smaller feature space.

International Conference Natural Language Processing | 2005

Information extraction from email announcements

Viktor Pekar

Public email announcements present a number of unique challenges for an Information Extraction (IE) system, such as the presence of both free and semi-structured text, inconsistent document layout and widely varying formats of template fillers. In this paper we describe a study of parametrisation of an IE method to determine settings that best suit the specifics of the task at hand.

International Conference on Computational Linguistics | 2014

UBham: Lexical Resources and Dependency Parsing for Aspect-Based Sentiment Analysis

Viktor Pekar; Naveed Afzal; Bernd Bohnet

This paper describes the system developed by the UBham team for the SemEval2014 Aspect-Based Sentiment Analysis task (Task 4). We present an approach based on deep linguistic processing techniques and resources, and explore the parameter space of these techniques applied to the different stages in this task and examine possibilities to exploit interdependencies between them.

Literary and Linguistic Computing | 2007

Discovery of Language Resources on the Web: Information Extraction from Heterogeneous Documents

Viktor Pekar; Richard Evans

The present article is concerned with the problem of automatic database population via information extraction (IE) from web pages obtained from heterogeneous sources, such as those retrieved by a domain crawler. Specifically, we address the task of filling single multi-field templates from individual documents, a common scenario that involves free-format documents with the same communicative goal such as job adverts, CVs, or meeting/seminar announcements. We discuss challenges that arise in this scenario and propose solutions to them at different levels of the processing of web page content. Our main focus is on the issue of information extraction, which we address with a two-step machine learning approach that first aims to determine segments of a page that are likely to contain relevant facts and then delimits specific natural language expressions with which to fill template fields. We also present a range of techniques for the enrichment of web pages with semantic annotations, such as recognition of named entities, domain terminology and coreference resolution, and examine their effect on the information extraction method. We evaluate the developed IE system on the task of automatically populating a database with information on language resources available on the web.
